
Quantum Circuit Born Machines

Published: May 22, 2024. Last updated: November 6, 2024.

Unsupervised generative modelling is a promising application for achieving practical quantum advantage on classical data because of its high complexity relative to supervised machine learning tasks, which makes it a suitable candidate for leveraging the potential of near-term quantum computers. A popular quantum generative model, the Quantum Circuit Born Machine (QCBM), has shown impressive results in modelling distributions across various datasets, both toy and real-world, and you will learn about it in this demo.


Generative modelling with QCBMs

Quantum Circuit Born Machines (QCBMs) show promise in unsupervised generative modelling, aiming to learn and represent classical dataset probability distributions through pure quantum states [1, 2]. They are popular due to their high expressive power [3]. Born machines leverage the probabilistic interpretation of quantum wavefunctions, representing probability distributions with pure quantum states instead of thermal distributions like Boltzmann machines. This allows Born machines to directly generate samples through projective measurements on qubits, offering a faster alternative to the Gibbs sampling approach [4].

For a dataset $\mathcal{D} = \{x\}$ with independent and identically distributed samples from an unknown target distribution $\pi(x),$ a QCBM is used to generate samples closely resembling the target distribution. The QCBM transforms the input product state $|\textbf{0} \rangle$ into a parameterized quantum state $|\psi_\boldsymbol{\theta}\rangle.$ Measuring this output state in the computational basis yields a sample of bits $x \sim p_\boldsymbol{\theta}(x),$ where

$$ p_\boldsymbol{\theta}(x) = |\langle x | \psi_\boldsymbol{\theta} \rangle|^2. $$

The objective is to align the model probability distribution $p_\boldsymbol{\theta}$ with the target distribution $\pi.$
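As a minimal illustration of this Born-rule relation, the following self-contained sketch (a toy two-qubit circuit, not the ansatz used later in this demo) checks that the probabilities returned by qml.probs() match the squared amplitudes of the prepared state:

import pennylane as qml
import numpy as np

dev_toy = qml.device("default.qubit", wires=2)

@qml.qnode(dev_toy)
def toy_probs(theta):
    # A small parameterized state |psi_theta> on two qubits
    qml.RY(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])

@qml.qnode(dev_toy)
def toy_state(theta):
    qml.RY(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.state()

theta = 0.7  # arbitrary example parameter
print(np.allclose(toy_probs(theta), np.abs(toy_state(theta)) ** 2))  # True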

In this tutorial, following [1], we will implement a gradient-based algorithm for the QCBM using PennyLane. We describe the model and the learning algorithm, followed by their application to the $3 \times 3$ Bars and Stripes dataset and a mixture of two Gaussian peaks.

To train the QCBM, we use the squared maximum mean discrepancy (MMD) as the loss function

$$ \mathcal{L}(\boldsymbol{\theta}) = \left\|\sum_{x} p_\boldsymbol{\theta}(x) \phi(x)- \sum_{x} \pi(x) \phi(x) \right\|^2, $$

where $\phi(x)$ maps $x$ to a higher-dimensional feature space. Using a kernel $K(x,y) = \phi(x)^T\phi(y)$ allows us to avoid computing $\phi(x)$ explicitly and to work in the lower-dimensional input space instead. We use the radial basis function (RBF) kernel for this purpose, which is defined as:

$$ K(x,y) = \frac{1}{c}\sum_{i=1}^c \exp \left( -\frac{|x-y|^2}{2\sigma_i^2} \right). $$

Here, $\sigma_i$ is the bandwidth parameter controlling the width of the $i$-th Gaussian kernel. $\mathcal{L}$ approaches zero if and only if $p_\boldsymbol{\theta}$ approaches $\pi$ [5]. To learn more about kernel methods, check out this demo.
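For intuition, the kernel can be evaluated directly for a pair of integer-valued samples; a minimal, self-contained sketch with example bandwidths similar to the ones chosen further below looks like this:

import numpy as np

def rbf_kernel(x, y, scales):
    # Average of Gaussian kernels, one per bandwidth sigma_i
    return np.mean([np.exp(-np.abs(x - y) ** 2 / (2 * s**2)) for s in scales])

scales = [0.25, 0.5, 1.0]  # example bandwidths
print(rbf_kernel(3, 5, scales))  # kernel value between samples x = 3 and y = 5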

We can now write the loss function in terms of $K(x,y)$ as

$$ \mathcal{L} = \underset{x, y \sim p_\boldsymbol{\theta}}{\mathbb{E}}[{K(x,y)}]-2\underset{x\sim p_\boldsymbol{\theta},y\sim \pi}{\mathbb{E}}[K(x,y)]+\underset{x, y \sim \pi}{\mathbb{E}}[K(x, y)] $$
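Since $p_\boldsymbol{\theta}$ and $\pi$ are distributions over a finite sample space here, these expectation values collapse into a single quadratic form in the difference of the probability vectors,

$$ \mathcal{L} = \sum_{x, y} \left[p_\boldsymbol{\theta}(x) - \pi(x)\right] K(x, y) \left[p_\boldsymbol{\theta}(y) - \pi(y)\right] = (p_\boldsymbol{\theta} - \pi)^T K \, (p_\boldsymbol{\theta} - \pi), $$

which is exactly what the MMD class below evaluates.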

Armed with these ingredients, we can write code for the QCBM and the loss function. We first define the MMD class for computing the squared MMD loss with the RBF kernel. Upon initialization, it computes and caches the kernel matrix.

import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

class MMD:

    def __init__(self, scales, space):
        gammas = 1 / (2 * (scales**2))
        sq_dists = jnp.abs(space[:, None] - space[None, :]) ** 2
        self.K = sum(jnp.exp(-gamma * sq_dists) for gamma in gammas) / len(scales)
        self.scales = scales

    def k_expval(self, px, py):
        # Kernel expectation value
        return px @ self.K @ py

    def __call__(self, px, py):
        pxy = px - py
        return self.k_expval(pxy, pxy)
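As a quick sanity check, a minimal, self-contained example (with a toy 8-element sample space and hypothetical distributions p and q) confirms that the squared MMD vanishes for identical distributions and is positive otherwise:

space_check = jnp.arange(8)                 # toy sample space
mmd_check = MMD(jnp.array([0.5, 1.0]), space_check)

p = jnp.ones(8) / 8                         # uniform distribution
q = jnp.zeros(8).at[0].set(1.0)             # point mass on outcome 0

print(mmd_check(p, p))  # ~0.0: identical distributions
print(mmd_check(p, q))  # > 0: distributions differ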

Defining a class lets us cache the kernel matrix instead of recomputing it every time we evaluate an expectation value. Next up, the QCBM class holds the definition of the quantum circuit Born machine and the objective function to minimize.

from functools import partial

class QCBM:

    def __init__(self, circ, mmd, py):
        self.circ = circ
        self.mmd = mmd
        self.py = py  # target distribution π(x)

    @partial(jax.jit, static_argnums=0)
    def mmd_loss(self, params):
        px = self.circ(params)
        return self.mmd(px, self.py), px

Learning the Bars and Stripes data distribution

We train the QCBM on the Bars and Stripes (BAS) dataset, which consists of binary black-and-white images of size $n \times n$ pixels. We consider $n = 3$ for this tutorial, which gives 14 valid configurations. Each image is represented as a flattened bitstring, so the quantum circuit will use 9 qubits in total.
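The count of 14 valid configurations follows from a short counting argument: there are $2^n$ possible stripe patterns and $2^n$ possible bar patterns, and the all-black and all-white images belong to both sets, so

$$ 2 \cdot 2^{n} - 2 = 2 \cdot 8 - 2 = 14. $$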

Data generation

import numpy as np

def get_bars_and_stripes(n):
    bitstrings = [list(np.binary_repr(i, n))[::-1] for i in range(2**n)]
    bitstrings = np.array(bitstrings, dtype=int)

    stripes = bitstrings.copy()
    stripes = np.repeat(stripes, n, 0)
    stripes = stripes.reshape(2**n, n * n)

    bars = bitstrings.copy()
    bars = bars.reshape(2**n * n, 1)
    bars = np.repeat(bars, n, 1)
    bars = bars.reshape(2**n, n * n)
    return np.vstack((stripes[0 : stripes.shape[0] - 1], bars[1 : bars.shape[0]]))

n = 3
size = n**2
data = get_bars_and_stripes(n)
print(data.shape)
(14, 9)

Each of the 14 data points has 9 binary features and represents a flattened bitstring. A visualization of one data point is shown below.

import matplotlib.pyplot as plt

sample = data[1].reshape(n, n)

plt.figure(figsize=(2, 2))
plt.imshow(sample, cmap="gray", vmin=0, vmax=1)
plt.grid(color="gray", linewidth=2)
plt.xticks([])
plt.yticks([])

for i in range(n):
    for j in range(n):
        text = plt.text(
            i, j, sample[j][i], ha="center", va="center", color="gray", fontsize=12,
        )

print(f"\nSample bitstring: {''.join(np.array(sample.flatten(), dtype='str'))}")
Sample bitstring: 100100100

We can plot the full dataset of $3 \times 3$ images.

plt.figure(figsize=(4, 4))
j = 1
for i in data:
    plt.subplot(4, 4, j)
    j += 1
    plt.imshow(np.reshape(i, (n, n)), cmap="gray", vmin=0, vmax=1)
    plt.xticks([])
    plt.yticks([])

Next we compute the integers represented by the valid configurations. We will use them later to evaluate the performance of the QCBM.

bitstrings = []
nums = []
for d in data:
    bitstrings += ["".join(str(int(i)) for i in d)]
    nums += [int(bitstrings[-1], 2)]
print(nums)
[0, 292, 146, 438, 73, 365, 219, 448, 56, 504, 7, 455, 63, 511]

Using the dataset, we can compute the target probability distribution $\pi(x),$ which is visualized below.

probs = np.zeros(2**size)
probs[nums] = 1 / len(data)

plt.figure(figsize=(12, 5))
plt.bar(np.arange(2**size), probs, width=2.0, label=r"$\pi(x)$")
plt.xticks(nums, bitstrings, rotation=80)

plt.xlabel("Samples")
plt.ylabel("Prob. Distribution")
plt.legend(loc="upper right")
plt.subplots_adjust(bottom=0.3)
plt.show()

We can observe that the target is a uniform distribution over the valid configurations: only their probabilities are non-zero, while the rest are zero. Next, we define a parameterized quantum circuit to train. This circuit will act as a generative model and thus realize a Born machine.

import pennylane as qml

np.random.seed(42)

n_qubits = size
dev = qml.device("default.qubit", wires=n_qubits)

n_layers = 6
wshape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = np.random.random(size=wshape)

@qml.qnode(dev)
def circuit(weights):
    qml.StronglyEntanglingLayers(weights=weights, ranges=[1] * n_layers, wires=range(n_qubits))
    return qml.probs()

jit_circuit = jax.jit(circuit)

We now instantiate the MMD and QCBM classes defined earlier and set up the Adam optimizer from Optax.

import optax

bandwidth = jnp.array([0.25, 0.5, 1])
space = jnp.arange(2**n_qubits)

mmd = MMD(bandwidth, space)
qcbm = QCBM(jit_circuit, mmd, probs)

opt = optax.adam(learning_rate=0.1)
opt_state = opt.init(weights)

We can also verify numerically that the two expressions for $\mathcal{L}$ given above agree: the compact form computed by the MMD class and the expanded form written in terms of kernel expectation values.

loss_1, px = qcbm.mmd_loss(weights)  # squared MMD
loss_2 = mmd.k_expval(px, px) - 2 * mmd.k_expval(px, probs) + mmd.k_expval(probs, probs)
print(loss_1)
print(loss_2)
0.06842073068350442
0.06842073068350445

Training

We define the update_step function, which

  • computes the squared MMD loss and its gradients,

  • applies the optimizer's update step,

  • updates the parameter values, and

  • calculates the KL divergence.

The KL divergence [6] is a measure of how far the predicted distribution $p_\boldsymbol{\theta}(x)$ is from the target distribution $\pi(x).$
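Explicitly, with the convention that terms where $\pi(x) = 0$ contribute zero (handled by jnp.nan_to_num in the code below),

$$ D_{\mathrm{KL}}(\pi \,\|\, p_\boldsymbol{\theta}) = \sum_x \pi(x) \log \frac{\pi(x)}{p_\boldsymbol{\theta}(x)} = -\sum_x \pi(x) \log \frac{p_\boldsymbol{\theta}(x)}{\pi(x)}. $$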

@jax.jit
def update_step(params, opt_state):
    (loss_val, qcbm_probs), grads = jax.value_and_grad(qcbm.mmd_loss, has_aux=True)(params)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    kl_div = -jnp.sum(qcbm.py * jnp.nan_to_num(jnp.log(qcbm_probs / qcbm.py)))
    return params, opt_state, loss_val, kl_div

history = []
divs = []
n_iterations = 100

for i in range(n_iterations):
    weights, opt_state, loss_val, kl_div = update_step(weights, opt_state)

    if i % 10 == 0:
        print(f"Step: {i} Loss: {loss_val:.4f} KL-div: {kl_div:.4f}")

    history.append(loss_val)
    divs.append(kl_div)
Step: 0 Loss: 0.0684 KL-div: 4.0735
Step: 10 Loss: 0.0439 KL-div: 1.7351
Step: 20 Loss: 0.0367 KL-div: 1.3296
Step: 30 Loss: 0.0292 KL-div: 1.0619
Step: 40 Loss: 0.0116 KL-div: 0.4977
Step: 50 Loss: 0.0039 KL-div: 0.2462
Step: 60 Loss: 0.0014 KL-div: 0.1348
Step: 70 Loss: 0.0009 KL-div: 0.1077
Step: 80 Loss: 0.0006 KL-div: 0.0852
Step: 90 Loss: 0.0004 KL-div: 0.0755

Visualizing the training results, we get the following plot.

fig, ax = plt.subplots(1, 2, figsize=(12, 5))

ax[0].plot(history)
ax[0].set_xlabel("Iteration")
ax[0].set_ylabel("MMD Loss")

ax[1].plot(divs, color="green")
ax[1].set_xlabel("Iteration")
ax[1].set_ylabel("KL Divergence")
plt.show()

Comparing the target probability distribution with the QCBM predictions, we can see that the model provides a good approximation.

qcbm_probs = np.array(qcbm.circ(weights))

plt.figure(figsize=(12, 5))

plt.bar(
    np.arange(2**size),
    probs,
    width=2.0,
    label=r"$\pi(x)$",
    alpha=0.4,
    color="tab:blue",
)
plt.bar(
    np.arange(2**size),
    qcbm_probs,
    width=2.0,
    label=r"$p_\theta(x)$",
    alpha=0.9,
    color="tab:green",
)

plt.xlabel("Samples")
plt.ylabel("Prob. Distribution")

plt.xticks(nums, bitstrings, rotation=80)
plt.legend(loc="upper right")
plt.subplots_adjust(bottom=0.3)
plt.show()

Testing

To test the performance of the model, we generate samples and compute $\chi \equiv P(x \text{ is a bar or a stripe}),$ which is a measure of generation quality.
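In other words, $\chi$ is the total probability mass the model assigns to the valid configurations, which we estimate from $N$ samples as the fraction of valid samples:

$$ \chi = \sum_{x \in \mathcal{D}} p_\boldsymbol{\theta}(x) \approx \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[x^{(i)} \in \mathcal{D}\right]. $$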

def circuit(weights):
    qml.StronglyEntanglingLayers(weights=weights, ranges=[1] * n_layers, wires=range(n_qubits))
    return qml.sample()

for N in [2000, 20000]:
    dev = qml.device("default.qubit", wires=n_qubits, shots=N)
    circ = qml.QNode(circuit, device=dev)
    preds = circ(weights)
    mask = np.any(np.all(preds[:, None] == data, axis=2), axis=1)  # check for row-wise equality
    chi = np.sum(mask) / N
    print(f"χ for N = {N}: {chi:.4f}")

print(f"χ for N = ∞: {np.sum(qcbm_probs[nums]):.4f}")
χ for N = 2000: 0.9280
χ for N = 20000: 0.9298
χ for N = ∞: 0.9293

A few of the samples are plotted below. The ones with a red border represent invalid images.

plt.figure(figsize=(8, 8))
j = 1
for i, m in zip(preds[:64], mask[:64]):
    ax = plt.subplot(8, 8, j)
    j += 1
    plt.imshow(np.reshape(i, (n, n)), cmap="gray", vmin=0, vmax=1)
    if ~m:
        plt.setp(ax.spines.values(), color="red", linewidth=1.5)
    plt.xticks([])
    plt.yticks([])

The model is able to learn the target distribution because the circuit has enough layers to be sufficiently expressive. Moreover, the authors of [1] argue that training a QCBM with deeper circuits does not suffer from the vanishing-gradients problem.

Learning a mixture of Gaussians

Now we use a QCBM to model a mixture of Gaussians

$$ \pi(x)\propto e^{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma}\right)^2}+e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma}\right)^2}, $$

with $x$ ranging from $0$ to $2^{n}-1,$ where $n$ is the number of qubits.

def mixture_gaussian_pdf(x, mus, sigmas):
    mus, sigmas = np.array(mus), np.array(sigmas)
    vars = sigmas**2
    values = [
        (1 / np.sqrt(2 * np.pi * v)) * np.exp(-((x - m) ** 2) / (2 * v)) for m, v in zip(mus, vars)
    ]
    values = np.sum([val / sum(val) for val in values], axis=0)
    return values / np.sum(values)

n_qubits = 6
x_max = 2**n_qubits
x_input = np.arange(x_max)
mus = [(2 / 7) * x_max, (5 / 7) * x_max]
sigmas = [x_max / 8] * 2
data = mixture_gaussian_pdf(x_input, mus, sigmas)

plt.plot(data, label=r"$\pi(x)$")
plt.legend()
plt.show()

In contrast to the Bars and Stripes dataset, the Gaussian mixture distribution exhibits a smooth and non-zero probability for every basis state. Similar to the previous experiment, we will create an ansatz and measure probabilities.

dev = qml.device("default.qubit", wires=n_qubits)

n_layers = 4
wshape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = np.random.random(size=wshape)

@qml.qnode(dev)
def circuit(weights):
    qml.StronglyEntanglingLayers(
        weights=weights, ranges=[1] * n_layers, wires=range(n_qubits)
    )
    return qml.probs()

jit_circuit = jax.jit(circuit)

qml.draw_mpl(circuit, level="device")(weights)
plt.show()

With the quantum circuit defined, we are ready to optimize the squared MMD loss function, following essentially the same code as in the Bars and Stripes case.

bandwidth = jnp.array([0.25, 60])
space = jnp.arange(2**n_qubits)

mmd = MMD(bandwidth, space)
qcbm = QCBM(jit_circuit, mmd, data)

opt = optax.adam(learning_rate=0.1)
opt_state = opt.init(weights)

history = []
divs = []
n_iterations = 100

for i in range(n_iterations):
    weights, opt_state, loss_val, kl_div = update_step(weights, opt_state)

    if i % 10 == 0:
        print(f"Step: {i} Loss: {loss_val:.4f} KL-div: {kl_div:.4f}")

    history.append(loss_val)
    divs.append(kl_div)
Step: 0 Loss: 0.0274 KL-div: 0.8073
Step: 10 Loss: 0.0031 KL-div: 0.2578
Step: 20 Loss: 0.0013 KL-div: 0.1006
Step: 30 Loss: 0.0006 KL-div: 0.0529
Step: 40 Loss: 0.0004 KL-div: 0.0350
Step: 50 Loss: 0.0002 KL-div: 0.0237
Step: 60 Loss: 0.0002 KL-div: 0.0228
Step: 70 Loss: 0.0001 KL-div: 0.0200
Step: 80 Loss: 0.0001 KL-div: 0.0128
Step: 90 Loss: 0.0001 KL-div: 0.0087

Finally, we plot the probabilities obtained from the QCBM as a histogram and compare them with the target probability distribution.

qcbm_probs = qcbm.circ(weights)

plt.plot(range(x_max), data, linestyle="-.", label=r"$\pi(x)$")
plt.bar(range(x_max), qcbm_probs, color="green", alpha=0.5, label="samples")

plt.xlabel("Samples")
plt.ylabel("Prob. Distribution")

plt.legend()
plt.show()

The histogram (green bars) aligns remarkably well with the exact probability distribution (blue dashed curve).

Conclusion

In this tutorial, we introduced and implemented a Quantum Circuit Born Machine (QCBM) using PennyLane. The learning algorithm is gradient-based and optimizes the squared MMD loss. We evaluated QCBMs on the Bars and Stripes dataset and on a mixture of two Gaussian peaks. One can also leverage the differentiable learning of QCBMs to solve combinatorial problems whose outputs are binary strings.

References

[1] Liu, Jin-Guo, and Lei Wang. “Differentiable learning of quantum circuit born machines.” Physical Review A 98.6 (2018): 062324.

[2] Benedetti, Marcello, et al. “A generative modeling approach for benchmarking and training shallow quantum circuits.” npj Quantum Information 5.1 (2019): 45.

[3] Du, Yuxuan, et al. “Expressive power of parametrized quantum circuits.” Physical Review Research 2.3 (2020): 033125.

[4] Ackley, David H., Geoffrey E. Hinton, and Terrence J. Sejnowski. “A learning algorithm for Boltzmann machines.” Cognitive Science 9.1 (1985): 147-169.

[5] Gretton, Arthur, et al. “A kernel method for the two-sample-problem.” Advances in Neural Information Processing Systems 19 (2006).

[6] Kullback, Solomon, and Richard A. Leibler. “On information and sufficiency.” The Annals of Mathematical Statistics 22.1 (1951): 79-86.

About the author


Gopal Ramesh Dahale


Gopal is a quantum research software engineer at Qkrishi and a Qiskit advocate interested in applications of quantum computing in machine learning, chemistry and combinatorial optimization problems.

Total running time of the script: (0 minutes 29.618 seconds)