Variational classifier

Maria Schuld

Published: October 10, 2019. Last updated: April 17, 2026.

In this tutorial, we show how to use PennyLane to implement variational quantum classifiers - quantum circuits that can be trained from labelled data to classify new data samples. The two examples used are inspired by two of the first papers that proposed variational circuits as supervised machine learning models: Farhi and Neven (2018) as well as Schuld et al. (2018).

More precisely, the first example shows that a variational circuit can be optimized to emulate the parity function

\[\begin{split}f: x \in \{0,1\}^{n} \rightarrow y = \begin{cases} 1 & \text{if } x \text{ contains an odd number of 1s} \\ 0 & \text{otherwise}. \end{cases}\end{split}\]

It demonstrates how to encode binary inputs into the initial state of the variational circuit, which is simply a computational basis state (basis encoding).
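As a classical point of reference, the parity function itself takes only a few lines of plain Python. The sketch below is illustrative: the helper name `parity` and the choice of four bits are assumptions made for this example and are not part of the tutorial's code.

```python
from itertools import product


def parity(bits):
    """Return 1 if the bitstring contains an odd number of 1s, else 0."""
    return sum(bits) % 2


# Enumerate every 4-bit input and collect the ones with odd parity.
all_inputs = list(product([0, 1], repeat=4))
odd_inputs = [x for x in all_inputs if parity(x) == 1]

print(len(all_inputs), len(odd_inputs))
```

Half of all bitstrings of a given length have odd parity, so the two classes are balanced.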

The second example shows how to encode real vectors as amplitude vectors into quantum states (amplitude encoding) and how to train a variational circuit to recognize the first two classes of flowers in the Iris dataset.

1. Fitting the parity function

Imports

We start by importing PennyLane, the PennyLane-provided version of NumPy, and an optimizer.

import pennylane as qp
from pennylane import numpy as np
from pennylane.optimize import NesterovMomentumOptimizer

Quantum and classical nodes

We then create a quantum device that will run our circuits.

dev = qp.device("default.qubit")

Variational classifiers usually define a “layer” or “block”, which is an elementary circuit architecture that gets repeated to build the full variational circuit.

Our circuit layer will use four qubits, or wires, and consists of an arbitrary rotation on every qubit, as well as a ring of CNOTs that entangles each qubit with its neighbour. Borrowing from machine learning, we call the parameters of the layer weights.

def layer(layer_weights):
    for wire in range(4):
        qp.Rot(*layer_weights[wire], wires=wire)

    for wires in ([0, 1], [1, 2], [2, 3], [3, 0]):
        qp.CNOT(wires)
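The hard-coded list of CNOT wire pairs follows a simple pattern. As a sketch, assuming we wanted the same ring on an arbitrary number of wires, the pairs and the number of trainable angles per layer could be generated as follows. The helper names `ring_pairs` and `num_layer_params` are hypothetical and do not appear in the tutorial.

```python
def ring_pairs(num_wires):
    """Control/target pairs linking each wire to its neighbour in a closed ring."""
    return [(w, (w + 1) % num_wires) for w in range(num_wires)]


def num_layer_params(num_wires):
    """One arbitrary rotation per wire, each parametrized by three angles."""
    return 3 * num_wires


print(ring_pairs(4))
print(num_layer_params(4))
```

For four wires this reproduces the pairs used in the layer above.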

We also need a way to encode data inputs \(x\) into the circuit, so that the measured output depends on the inputs. In this first example, the inputs are bitstrings, which we encode into the state of the qubits. The quantum state \(\psi\) after state preparation is a computational basis state that has 1s where \(x\) has 1s, for example

\[x = 0101 \rightarrow |\psi \rangle = |0101 \rangle .\]

The BasisState function provided by PennyLane is made to do just this. It expects x to be a list of zeros and ones, i.e. [0,1,0,1].

def state_preparation(x):
    qp.BasisState(x, wires=[0, 1, 2, 3])
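To make basis encoding concrete, the following plain-Python sketch builds the state vector that corresponds to a bitstring. It assumes the convention that the first wire is the leftmost (most significant) bit; the helper name `basis_state_vector` is made up for this illustration.

```python
def basis_state_vector(bits):
    """One-hot state vector for a bitstring, first bit most significant."""
    index = int("".join(str(b) for b in bits), 2)
    vector = [0.0] * (2 ** len(bits))
    vector[index] = 1.0
    return vector


psi = basis_state_vector([0, 1, 0, 1])
print(len(psi), psi.index(1.0))
```

The vector has a single non-zero amplitude, located at the position whose binary representation equals the input.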

Now we define the variational quantum circuit as this state preparation routine, followed by a repetition of the layer structure.

@qp.qnode(dev)
def circuit(weights, x):
    state_preparation(x)

    for layer_weights in weights:
        layer(layer_weights)

    return qp.expval(qp.PauliZ(0))

If we want to add a “classical” bias parameter, the variational quantum classifier also needs some post-processing. We define the full model as a sum of the output of the quantum circuit, plus the trainable bias.

def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

Cost

In supervised learning, the cost function is usually the sum of a loss function and a regularizer. We restrict ourselves to the standard square loss that measures the distance between target labels and model predictions.

def square_loss(labels, predictions):
    # We use a call to qp.math.stack to allow subtracting the arrays directly
    return np.mean((labels - qp.math.stack(predictions)) ** 2)

To monitor how many inputs the current classifier predicted correctly, we also define the accuracy, or the proportion of predictions that agree with a set of target labels.

def accuracy(labels, predictions):
    acc = sum(abs(l - p) < 1e-5 for l, p in zip(labels, predictions))
    acc = acc / len(labels)
    return acc
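To get a feel for these two metrics, here is a small worked example in plain Python. It re-implements both functions without PennyLane, uses made-up model outputs, and introduces a hypothetical helper `sign`; none of the numbers come from the tutorial's data.

```python
def square_loss(labels, predictions):
    """Mean squared difference between labels and predictions."""
    return sum((l - p) ** 2 for l, p in zip(labels, predictions)) / len(labels)


def accuracy(labels, predictions):
    """Fraction of predictions that equal their label (up to 1e-5)."""
    return sum(abs(l - p) < 1e-5 for l, p in zip(labels, predictions)) / len(labels)


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


labels = [1, -1, 1, -1]
outputs = [0.8, -0.6, -0.2, -1.0]  # made-up continuous model outputs

loss = square_loss(labels, outputs)
acc_raw = accuracy(labels, outputs)
acc_signed = accuracy(labels, [sign(o) for o in outputs])
print(loss, acc_raw, acc_signed)
```

Because the accuracy only counts predictions that match a label up to a small tolerance, continuous outputs should be mapped to -1 or 1 before they are passed to it; the raw outputs above would otherwise score far lower than their signs do.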

For learning tasks, the cost depends on the data: here, the features and labels considered in the current iteration of the optimization routine.

def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)

Optimization

Let’s now load and preprocess some data.

Note

The parity dataset’s train and test sets can be downloaded and should be placed in the subfolder variational_classifier/data.

data = np.loadtxt("variational_classifier/data/parity_train.txt", dtype=int)
X = np.array(data[:, :-1])
Y = np.array(data[:, -1])
Y = Y * 2 - 1  # shift label from {0, 1} to {-1, 1}

for x,y in zip(X, Y):
    print(f"x = {x}, y = {y}")

x = [0 0 0 1], y = 1
x = [0 0 1 0], y = 1
x = [0 1 0 0], y = 1
x = [0 1 0 1], y = -1
x = [0 1 1 0], y = -1
x = [0 1 1 1], y = 1
x = [1 0 0 0], y = 1
x = [1 0 0 1], y = -1
x = [1 0 1 1], y = 1
x = [1 1 1 1], y = -1
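As a sanity check, the shifted labels printed above can be compared against the parity function directly. The sketch below copies the ten training samples from the output and uses a hypothetical helper `parity_label`.

```python
def parity_label(bits):
    """Parity of a bitstring, shifted from {0, 1} to {-1, 1}."""
    return 2 * (sum(bits) % 2) - 1


# Training samples as printed above: (input bits, shifted label).
train_samples = [
    ([0, 0, 0, 1], 1), ([0, 0, 1, 0], 1), ([0, 1, 0, 0], 1),
    ([0, 1, 0, 1], -1), ([0, 1, 1, 0], -1), ([0, 1, 1, 1], 1),
    ([1, 0, 0, 0], 1), ([1, 0, 0, 1], -1), ([1, 0, 1, 1], 1),
    ([1, 1, 1, 1], -1),
]

mismatches = [(x, y) for x, y in train_samples if parity_label(x) != y]
print("Mismatches:", mismatches)
```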

We initialize the variables randomly (but fix a seed for reproducibility). Remember that one of the variables is used as a bias, while the rest are fed into the gates of the variational circuit.

np.random.seed(0)
num_qubits = 4
num_layers = 2
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

print("Weights:", weights_init)
print("Bias: ", bias_init)

Weights: [[[ 0.01764052  0.00400157  0.00978738]
  [ 0.02240893  0.01867558 -0.00977278]
  [ 0.00950088 -0.00151357 -0.00103219]
  [ 0.00410599  0.00144044  0.01454274]]

 [[ 0.00761038  0.00121675  0.00443863]
  [ 0.00333674  0.01494079 -0.00205158]
  [ 0.00313068 -0.00854096 -0.0255299 ]
  [ 0.00653619  0.00864436 -0.00742165]]]
Bias:  0.0

Next we create an optimizer instance and choose a batch size…

opt = NesterovMomentumOptimizer(0.5)
batch_size = 5

…and run the optimizer to train our model. We track the accuracy - the share of correctly classified data samples. For this we compute the outputs of the variational classifier and turn them into predictions in \(\{-1,1\}\) by taking the sign of the output.

weights = weights_init
bias = bias_init
for it in range(100):

    # Update the weights by one optimizer step, using only a limited batch of data
    batch_index = np.random.randint(0, len(X), (batch_size,))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    weights, bias = opt.step(cost, weights, bias, X=X_batch, Y=Y_batch)

    # Compute accuracy
    predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X]

    current_cost = cost(weights, bias, X, Y)
    acc = accuracy(Y, predictions)

    print(f"Iter: {it+1:4d} | Cost: {current_cost:0.7f} | Accuracy: {acc:0.7f}")

Iter:    1 | Cost: 2.3147651 | Accuracy: 0.5000000
Iter:    2 | Cost: 1.9664866 | Accuracy: 0.5000000
Iter:    3 | Cost: 1.9208589 | Accuracy: 0.5000000
Iter:    4 | Cost: 2.6276126 | Accuracy: 0.5000000
Iter:    5 | Cost: 0.9323119 | Accuracy: 0.6000000
Iter:    6 | Cost: 1.1903549 | Accuracy: 0.5000000
Iter:    7 | Cost: 2.0508989 | Accuracy: 0.4000000
Iter:    8 | Cost: 1.1275531 | Accuracy: 0.6000000
Iter:    9 | Cost: 1.1659803 | Accuracy: 0.6000000
Iter:   10 | Cost: 1.1349618 | Accuracy: 0.6000000
Iter:   11 | Cost: 0.9994063 | Accuracy: 0.6000000
Iter:   12 | Cost: 1.0812559 | Accuracy: 0.6000000
Iter:   13 | Cost: 1.2863155 | Accuracy: 0.6000000
Iter:   14 | Cost: 2.2658259 | Accuracy: 0.4000000
Iter:   15 | Cost: 1.1323724 | Accuracy: 0.6000000
Iter:   16 | Cost: 1.3439737 | Accuracy: 0.8000000
Iter:   17 | Cost: 2.0076168 | Accuracy: 0.6000000
Iter:   18 | Cost: 1.2685760 | Accuracy: 0.5000000
Iter:   19 | Cost: 1.6762475 | Accuracy: 0.5000000
Iter:   20 | Cost: 1.1868237 | Accuracy: 0.6000000
Iter:   21 | Cost: 1.4784687 | Accuracy: 0.6000000
Iter:   22 | Cost: 1.4599473 | Accuracy: 0.6000000
Iter:   23 | Cost: 0.9573269 | Accuracy: 0.6000000
Iter:   24 | Cost: 1.1657424 | Accuracy: 0.5000000
Iter:   25 | Cost: 1.0877087 | Accuracy: 0.4000000
Iter:   26 | Cost: 1.1683687 | Accuracy: 0.6000000
Iter:   27 | Cost: 2.1141689 | Accuracy: 0.6000000
Iter:   28 | Cost: 1.0272966 | Accuracy: 0.5000000
Iter:   29 | Cost: 0.9664085 | Accuracy: 0.5000000
Iter:   30 | Cost: 1.1287654 | Accuracy: 0.6000000
Iter:   31 | Cost: 1.4202360 | Accuracy: 0.4000000
Iter:   32 | Cost: 1.1286000 | Accuracy: 0.5000000
Iter:   33 | Cost: 1.9594333 | Accuracy: 0.4000000
Iter:   34 | Cost: 1.2811832 | Accuracy: 0.4000000
Iter:   35 | Cost: 0.8522775 | Accuracy: 0.7000000
Iter:   36 | Cost: 1.4765281 | Accuracy: 0.6000000
Iter:   37 | Cost: 0.9603287 | Accuracy: 0.6000000
Iter:   38 | Cost: 1.6031314 | Accuracy: 0.6000000
Iter:   39 | Cost: 1.1700888 | Accuracy: 0.4000000
Iter:   40 | Cost: 1.7571779 | Accuracy: 0.4000000
Iter:   41 | Cost: 1.9608116 | Accuracy: 0.6000000
Iter:   42 | Cost: 2.0802752 | Accuracy: 0.6000000
Iter:   43 | Cost: 1.1904884 | Accuracy: 0.3000000
Iter:   44 | Cost: 0.9941585 | Accuracy: 0.6000000
Iter:   45 | Cost: 1.0709609 | Accuracy: 0.5000000
Iter:   46 | Cost: 0.9780625 | Accuracy: 0.6000000
Iter:   47 | Cost: 1.1573709 | Accuracy: 0.6000000
Iter:   48 | Cost: 1.0235239 | Accuracy: 0.6000000
Iter:   49 | Cost: 1.2842469 | Accuracy: 0.5000000
Iter:   50 | Cost: 0.8549226 | Accuracy: 0.6000000
Iter:   51 | Cost: 0.5136787 | Accuracy: 1.0000000
Iter:   52 | Cost: 0.2488031 | Accuracy: 1.0000000
Iter:   53 | Cost: 0.0461277 | Accuracy: 1.0000000
Iter:   54 | Cost: 0.0293518 | Accuracy: 1.0000000
Iter:   55 | Cost: 0.0205454 | Accuracy: 1.0000000
Iter:   56 | Cost: 0.0352514 | Accuracy: 1.0000000
Iter:   57 | Cost: 0.0576767 | Accuracy: 1.0000000
Iter:   58 | Cost: 0.0291305 | Accuracy: 1.0000000
Iter:   59 | Cost: 0.0127137 | Accuracy: 1.0000000
Iter:   60 | Cost: 0.0058108 | Accuracy: 1.0000000
Iter:   61 | Cost: 0.0018002 | Accuracy: 1.0000000
Iter:   62 | Cost: 0.0014089 | Accuracy: 1.0000000
Iter:   63 | Cost: 0.0017489 | Accuracy: 1.0000000
Iter:   64 | Cost: 0.0021282 | Accuracy: 1.0000000
Iter:   65 | Cost: 0.0029876 | Accuracy: 1.0000000
Iter:   66 | Cost: 0.0035331 | Accuracy: 1.0000000
Iter:   67 | Cost: 0.0035540 | Accuracy: 1.0000000
Iter:   68 | Cost: 0.0025639 | Accuracy: 1.0000000
Iter:   69 | Cost: 0.0019459 | Accuracy: 1.0000000
Iter:   70 | Cost: 0.0015856 | Accuracy: 1.0000000
Iter:   71 | Cost: 0.0008439 | Accuracy: 1.0000000
Iter:   72 | Cost: 0.0005960 | Accuracy: 1.0000000
Iter:   73 | Cost: 0.0003122 | Accuracy: 1.0000000
Iter:   74 | Cost: 0.0002446 | Accuracy: 1.0000000
Iter:   75 | Cost: 0.0001745 | Accuracy: 1.0000000
Iter:   76 | Cost: 0.0001215 | Accuracy: 1.0000000
Iter:   77 | Cost: 0.0001141 | Accuracy: 1.0000000
Iter:   78 | Cost: 0.0001538 | Accuracy: 1.0000000
Iter:   79 | Cost: 0.0001871 | Accuracy: 1.0000000
Iter:   80 | Cost: 0.0001330 | Accuracy: 1.0000000
Iter:   81 | Cost: 0.0001380 | Accuracy: 1.0000000
Iter:   82 | Cost: 0.0001336 | Accuracy: 1.0000000
Iter:   83 | Cost: 0.0001483 | Accuracy: 1.0000000
Iter:   84 | Cost: 0.0001234 | Accuracy: 1.0000000
Iter:   85 | Cost: 0.0001359 | Accuracy: 1.0000000
Iter:   86 | Cost: 0.0001268 | Accuracy: 1.0000000
Iter:   87 | Cost: 0.0002270 | Accuracy: 1.0000000
Iter:   88 | Cost: 0.0000865 | Accuracy: 1.0000000
Iter:   89 | Cost: 0.0000774 | Accuracy: 1.0000000
Iter:   90 | Cost: 0.0000759 | Accuracy: 1.0000000
Iter:   91 | Cost: 0.0000607 | Accuracy: 1.0000000
Iter:   92 | Cost: 0.0000523 | Accuracy: 1.0000000
Iter:   93 | Cost: 0.0000536 | Accuracy: 1.0000000
Iter:   94 | Cost: 0.0000444 | Accuracy: 1.0000000
Iter:   95 | Cost: 0.0000384 | Accuracy: 1.0000000
Iter:   96 | Cost: 0.0000497 | Accuracy: 1.0000000
Iter:   97 | Cost: 0.0000263 | Accuracy: 1.0000000
Iter:   98 | Cost: 0.0000229 | Accuracy: 1.0000000
Iter:   99 | Cost: 0.0000339 | Accuracy: 1.0000000
Iter:  100 | Cost: 0.0000174 | Accuracy: 1.0000000

As we can see, the variational classifier learned to classify all bit strings from the training set correctly.
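The optimizer used for training is based on Nesterov momentum. As a rough illustration of the idea, the sketch below applies a generic textbook-style Nesterov update to the toy cost f(x) = x^2. It is not meant to mirror PennyLane's internal implementation; the function names, step size and momentum value are assumptions chosen for this example.

```python
def grad(x):
    """Gradient of the toy cost f(x) = x**2."""
    return 2.0 * x


def nesterov_minimize(x, stepsize=0.1, momentum=0.9, num_steps=200):
    """Minimize the toy cost with a Nesterov-style momentum update."""
    velocity = 0.0
    for _ in range(num_steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x - momentum * velocity
        velocity = momentum * velocity + stepsize * grad(lookahead)
        x = x - velocity
    return x


x_final = nesterov_minimize(3.0)
print(x_final)
```

The only difference from plain momentum is where the gradient is evaluated: at the point the accumulated velocity is about to carry the parameters to.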

But unlike optimization, in machine learning the goal is to generalize from limited data to unseen examples. Even if the variational quantum circuit was perfectly optimized with respect to the cost, it might not generalize, a phenomenon known as overfitting. The art of (quantum) machine learning is to create models and learning procedures that tend to find “good” minima, or those that lead to models which generalize well.

With this in mind, let’s look at a test set of examples we have not used during training:

data = np.loadtxt("variational_classifier/data/parity_test.txt", dtype=int)
X_test = np.array(data[:, :-1])
Y_test = np.array(data[:, -1])
Y_test = Y_test * 2 - 1  # shift label from {0, 1} to {-1, 1}

predictions_test = [np.sign(variational_classifier(weights, bias, x)) for x in X_test]

for x,y,p in zip(X_test, Y_test, predictions_test):
    print(f"x = {x}, y = {y}, pred={p}")

acc_test = accuracy(Y_test, predictions_test)
print("Accuracy on unseen data:", acc_test)

x = [0 0 0 0], y = -1, pred=-1.0
x = [0 0 1 1], y = -1, pred=-1.0
x = [1 0 1 0], y = -1, pred=-1.0
x = [1 1 1 0], y = 1, pred=1.0
x = [1 1 0 0], y = -1, pred=-1.0
x = [1 1 0 1], y = 1, pred=1.0
Accuracy on unseen data: 1.0

The quantum circuit has also learnt to predict all unseen examples perfectly well! This is actually remarkable, since the encoding strategy creates quantum states from the data that have zero overlap – and hence the states created from the test set have no overlap with the states created from the training set. There are many functional relations the variational circuit could learn from this kind of representation, but the classifier chooses to label bit strings according to our ground truth, the parity function.
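The zero-overlap claim can be checked classically. In the sketch below, each bitstring is represented by its one-hot state vector and we compute the inner product between every training and test input listed in the outputs above. The helper names are hypothetical.

```python
def basis_state_vector(bits):
    """One-hot state vector for a bitstring, first bit most significant."""
    index = int("".join(str(b) for b in bits), 2)
    vector = [0.0] * (2 ** len(bits))
    vector[index] = 1.0
    return vector


def overlap(a, b):
    """Inner product of two real state vectors."""
    return sum(x * y for x, y in zip(a, b))


train_inputs = ["0001", "0010", "0100", "0101", "0110",
                "0111", "1000", "1001", "1011", "1111"]
test_inputs = ["0000", "0011", "1010", "1110", "1100", "1101"]

overlaps = [
    overlap(basis_state_vector(tr), basis_state_vector(te))
    for tr in train_inputs
    for te in test_inputs
]
print(max(overlaps))
```

Since no test bitstring appears in the training set, every pair of encoded states is orthogonal.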

Let’s look at the second example, in which we use another encoding strategy.

2. Iris classification

We now move on to classifying data points from the Iris dataset, which are no longer simple bitstrings but represented as real-valued vectors. The vectors are 2-dimensional, but we will add some “latent dimensions” and therefore encode inputs into 2 qubits.

Quantum and classical nodes

State preparation is not as simple as when we represent a bitstring with a basis state. Every input x has to be translated into a set of angles which can get fed into a small routine for state preparation. To simplify things a bit, we will work with data from the positive subspace, so that we can ignore signs (which would require another cascade of rotations around the Z-axis).

The circuit is coded according to the scheme in Möttönen, et al. (2004), or—as presented for positive vectors only—in Schuld and Petruccione (2018). We also decomposed controlled Y-axis rotations into more basic gates, following Nielsen and Chuang (2010).

def get_angles(x):
    beta0 = 2 * np.arcsin(np.sqrt(x[1] ** 2) / np.sqrt(x[0] ** 2 + x[1] ** 2 + 1e-12))
    beta1 = 2 * np.arcsin(np.sqrt(x[3] ** 2) / np.sqrt(x[2] ** 2 + x[3] ** 2 + 1e-12))
    beta2 = 2 * np.arcsin(np.linalg.norm(x[2:]) / np.linalg.norm(x))

    return np.array([beta2, -beta1 / 2, beta1 / 2, -beta0 / 2, beta0 / 2])


def state_preparation(a):
    qp.RY(a[0], wires=0)

    qp.CNOT(wires=[0, 1])
    qp.RY(a[1], wires=1)
    qp.CNOT(wires=[0, 1])
    qp.RY(a[2], wires=1)

    qp.PauliX(wires=0)
    qp.CNOT(wires=[0, 1])
    qp.RY(a[3], wires=1)
    qp.CNOT(wires=[0, 1])
    qp.RY(a[4], wires=1)
    qp.PauliX(wires=0)

Let’s test if this routine actually works.

x = np.array([0.53896774, 0.79503606, 0.27826503, 0.0], requires_grad=False)
ang = get_angles(x)


@qp.qnode(dev)
def test(angles):
    state_preparation(angles)

    return qp.state()


state = test(ang)

print("x               : ", np.round(x, 6))
print("angles          : ", np.round(ang, 6))
print("amplitude vector: ", np.round(np.real(state), 6))

x               :  [0.538968 0.795036 0.278265 0.      ]
angles          :  [ 0.563975 -0.        0.       -0.975046  0.975046]
amplitude vector:  [ 0.538968  0.795036  0.278265 -0.      ]

The method computed the correct angles to prepare the desired state!
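To see why these angles work, note that the first rotation distributes the amplitude between the two halves of the vector, and the two controlled rotations then distribute it within each half. The following plain-Python sketch re-derives the amplitudes from the angles without simulating the circuit. It assumes a non-negative input vector of length four; `amplitudes_from_angles` is a hypothetical helper, and the clipping with `min` is added here only to guard against rounding errors.

```python
import math


def get_angles(x):
    """Rotation angles for a non-negative 4-vector (plain-Python version)."""
    beta0 = 2 * math.asin(min(1.0, abs(x[1]) / math.sqrt(x[0] ** 2 + x[1] ** 2 + 1e-12)))
    beta1 = 2 * math.asin(min(1.0, abs(x[3]) / math.sqrt(x[2] ** 2 + x[3] ** 2 + 1e-12)))
    norm = math.sqrt(sum(v ** 2 for v in x))
    beta2 = 2 * math.asin(min(1.0, math.sqrt(x[2] ** 2 + x[3] ** 2) / norm))
    return beta0, beta1, beta2


def amplitudes_from_angles(beta0, beta1, beta2):
    """Amplitudes produced by RY(beta2) on the first qubit followed by
    RY(beta0) or RY(beta1) on the second, depending on the first qubit."""
    c2, s2 = math.cos(beta2 / 2), math.sin(beta2 / 2)
    return [
        c2 * math.cos(beta0 / 2),  # amplitude of |00>
        c2 * math.sin(beta0 / 2),  # amplitude of |01>
        s2 * math.cos(beta1 / 2),  # amplitude of |10>
        s2 * math.sin(beta1 / 2),  # amplitude of |11>
    ]


x = [0.53896774, 0.79503606, 0.27826503, 0.0]
reconstructed = amplitudes_from_angles(*get_angles(x))
print([round(a, 6) for a in reconstructed])
```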

Note

The default.qubit simulator provides a shortcut to state_preparation with the command qp.StatePrep(x, wires=[0, 1]). On state simulators, this just replaces the quantum state with our (normalized) input. On hardware, the operation implements more sophisticated versions of the routine used above.

Since we are working with only 2 qubits now, we need to update the layer function. In addition, we redefine the cost function to pass the full batch of data to the state preparation of the circuit simultaneously, a technique similar to NumPy broadcasting.

def layer(layer_weights):
    for wire in range(2):
        qp.Rot(*layer_weights[wire], wires=wire)
    qp.CNOT(wires=[0, 1])


def cost(weights, bias, X, Y):
    # Transpose the batch of input data in order to make the indexing
    # in state_preparation work
    predictions = variational_classifier(weights, bias, X.T)
    return square_loss(Y, predictions)
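The effect of the transposition can be illustrated with plain Python lists. The batch is stored with one row of five angles per sample; after transposing, indexing the first position returns the first angle of every sample at once, which matches how state_preparation indexes its argument. The angle values below are made up.

```python
# Two samples, each with five made-up rotation angles.
features_batch = [
    [0.1, -0.2, 0.2, -0.3, 0.3],  # sample 0
    [0.4, -0.5, 0.5, -0.6, 0.6],  # sample 1
]

# Transpose: one entry per angle position, each holding all samples.
angles_by_position = list(zip(*features_batch))

print(angles_by_position[0])
```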

Data

We load the Iris data set. There is a bit of preprocessing to do in order to encode the inputs into the amplitudes of a quantum state. We will augment the data points by two so-called “latent dimensions”, making the size of the padded data point match the size of the state vector in the quantum device. We then need to normalize the data points, and finally, we translate the inputs x to rotation angles using the get_angles function we defined above.

Data preprocessing should always be done with the problem in mind; for example, if we do not add any latent dimensions, normalization erases any information on the length of the vectors and classes separated by this feature will not be distinguishable.
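The effect of padding can be demonstrated with two points that differ only in length. In the plain-Python sketch below, the helper names and the second point are assumptions made for illustration; the first point and the padding value of 0.1 are taken from the preprocessing used in this tutorial.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v ** 2 for v in vector))
    return [v / norm for v in vector]


def pad_and_normalize(point, pad_value=0.1, num_pad=2):
    """Append constant latent dimensions, then normalize."""
    return normalize(list(point) + [pad_value] * num_pad)


shorter, longer = [0.4, 0.75], [0.8, 1.5]  # same direction, different length

same_without_padding = all(
    abs(a - b) < 1e-12 for a, b in zip(normalize(shorter), normalize(longer))
)
differ_with_padding = any(
    abs(a - b) > 1e-3
    for a, b in zip(pad_and_normalize(shorter), pad_and_normalize(longer))
)
print(same_without_padding, differ_with_padding)
```

Without padding, both points collapse onto the same unit vector; with the constant latent dimensions, their lengths leave a trace in the normalized result.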

Note

The Iris dataset can be downloaded and should be placed in the subfolder variational_classifier/data.

data = np.loadtxt("variational_classifier/data/iris_classes1and2_scaled.txt")
X = data[:, 0:2]
print(f"First X sample (original)  : {X[0]}")

# pad the vectors to size 2^2=4 with constant values
padding = np.ones((len(X), 2)) * 0.1
X_pad = np.c_[X, padding]
print(f"First X sample (padded)    : {X_pad[0]}")

# normalize each input
normalization = np.sqrt(np.sum(X_pad**2, -1))
X_norm = (X_pad.T / normalization).T
print(f"First X sample (normalized): {X_norm[0]}")

# the angles for state preparation are the features
features = np.array([get_angles(x) for x in X_norm], requires_grad=False)
print(f"First features sample      : {features[0]}")

Y = data[:, -1]

First X sample (original)  : [0.4  0.75]
First X sample (padded)    : [0.4  0.75 0.1  0.1 ]
First X sample (normalized): [0.46420708 0.87038828 0.11605177 0.11605177]
First features sample      : [ 0.32973573 -0.78539816  0.78539816 -1.080839    1.080839  ]

These angles are our new features, which is why we have renamed X to “features” above. Let’s plot the stages of preprocessing and play around with the dimensions (dim1, dim2). Some of them still separate the classes well, while others are less informative.

import matplotlib.pyplot as plt

plt.figure()
plt.scatter(X[:, 0][Y == 1], X[:, 1][Y == 1], c="b", marker="o", ec="k")
plt.scatter(X[:, 0][Y == -1], X[:, 1][Y == -1], c="r", marker="o", ec="k")
plt.title("Original data")
plt.show()

plt.figure()
dim1 = 0
dim2 = 1
plt.scatter(X_norm[:, dim1][Y == 1], X_norm[:, dim2][Y == 1], c="b", marker="o", ec="k")
plt.scatter(X_norm[:, dim1][Y == -1], X_norm[:, dim2][Y == -1], c="r", marker="o", ec="k")
plt.title(f"Padded and normalised data (dims {dim1} and {dim2})")
plt.show()

plt.figure()
dim1 = 0
dim2 = 3
plt.scatter(features[:, dim1][Y == 1], features[:, dim2][Y == 1], c="b", marker="o", ec="k")
plt.scatter(features[:, dim1][Y == -1], features[:, dim2][Y == -1], c="r", marker="o", ec="k")
plt.title(f"Feature vectors (dims {dim1} and {dim2})")
plt.show()

This time we want to generalize from the data samples. This means that we want to train our model on one set of data and test its performance on a second set of data that has not been used in training. To monitor the generalization performance, the data is split into a training set and a validation set.

np.random.seed(0)
num_data = len(Y)
num_train = int(0.75 * num_data)
index = np.random.permutation(range(num_data))
feats_train = features[index[:num_train]]
Y_train = Y[index[:num_train]]
feats_val = features[index[num_train:]]
Y_val = Y[index[num_train:]]

# We need these later for plotting
X_train = X[index[:num_train]]
X_val = X[index[num_train:]]

Optimization

First we initialize the variables.

num_qubits = 2
num_layers = 6

weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

Again we minimize the cost, using the imported optimizer.

opt = NesterovMomentumOptimizer(0.01)
batch_size = 5

# train the variational classifier
weights = weights_init
bias = bias_init
for it in range(60):
    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, num_train, (batch_size,))
    feats_train_batch = feats_train[batch_index]
    Y_train_batch = Y_train[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, feats_train_batch, Y_train_batch)

    # Compute predictions on train and validation set
    predictions_train = np.sign(variational_classifier(weights, bias, feats_train.T))
    predictions_val = np.sign(variational_classifier(weights, bias, feats_val.T))

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y_train, predictions_train)
    acc_val = accuracy(Y_val, predictions_val)

    if (it + 1) % 2 == 0:
        _cost = cost(weights, bias, features, Y)
        print(
            f"Iter: {it + 1:5d} | Cost: {_cost:0.7f} | "
            f"Acc train: {acc_train:0.7f} | Acc validation: {acc_val:0.7f}"
        )

Iter:     2 | Cost: 1.6589456 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:     4 | Cost: 1.2054273 | Acc train: 0.4933333 | Acc validation: 0.5600000
Iter:     6 | Cost: 0.9740740 | Acc train: 0.4933333 | Acc validation: 0.7200000
Iter:     8 | Cost: 0.9660872 | Acc train: 0.6400000 | Acc validation: 0.6400000
Iter:    10 | Cost: 0.9569019 | Acc train: 0.6000000 | Acc validation: 0.6000000
Iter:    12 | Cost: 0.9445863 | Acc train: 0.4933333 | Acc validation: 0.7200000
Iter:    14 | Cost: 1.0339978 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:    16 | Cost: 1.0774217 | Acc train: 0.4933333 | Acc validation: 0.5600000
Iter:    18 | Cost: 0.9984426 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:    20 | Cost: 0.8975279 | Acc train: 0.5600000 | Acc validation: 0.7600000
Iter:    22 | Cost: 0.8451699 | Acc train: 0.6400000 | Acc validation: 0.6400000
Iter:    24 | Cost: 0.8337489 | Acc train: 0.5600000 | Acc validation: 0.5200000
Iter:    26 | Cost: 0.7832025 | Acc train: 0.6000000 | Acc validation: 0.6000000
Iter:    28 | Cost: 0.7397515 | Acc train: 0.6133333 | Acc validation: 0.6000000
Iter:    30 | Cost: 0.6690522 | Acc train: 0.6666667 | Acc validation: 0.6400000
Iter:    32 | Cost: 0.5640186 | Acc train: 0.8266667 | Acc validation: 0.8000000
Iter:    34 | Cost: 0.4765597 | Acc train: 0.8933333 | Acc validation: 0.8800000
Iter:    36 | Cost: 0.4144135 | Acc train: 0.9200000 | Acc validation: 0.9600000
Iter:    38 | Cost: 0.3569566 | Acc train: 0.9600000 | Acc validation: 1.0000000
Iter:    40 | Cost: 0.3186159 | Acc train: 0.9866667 | Acc validation: 1.0000000
Iter:    42 | Cost: 0.2853043 | Acc train: 0.9866667 | Acc validation: 1.0000000
Iter:    44 | Cost: 0.2652725 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    46 | Cost: 0.2525848 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    48 | Cost: 0.2444278 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    50 | Cost: 0.2436316 | Acc train: 0.9866667 | Acc validation: 1.0000000
Iter:    52 | Cost: 0.2376316 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    54 | Cost: 0.2307475 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    56 | Cost: 0.2341245 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    58 | Cost: 0.2292663 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter:    60 | Cost: 0.2241948 | Acc train: 1.0000000 | Acc validation: 1.0000000

We can plot the continuous output of the variational classifier for the first two dimensions of the Iris data set.

plt.figure()
cm = plt.cm.RdBu

# make data for decision regions
xx, yy = np.meshgrid(np.linspace(0.0, 1.5, 30), np.linspace(0.0, 1.5, 30))
X_grid = [np.array([x, y]) for x, y in zip(xx.flatten(), yy.flatten())]

# preprocess grid points like data inputs above
padding = 0.1 * np.ones((len(X_grid), 2))
X_grid = np.c_[X_grid, padding]  # pad each input
normalization = np.sqrt(np.sum(X_grid**2, -1))
X_grid = (X_grid.T / normalization).T  # normalize each input
features_grid = np.array([get_angles(x) for x in X_grid])  # angles are new features
predictions_grid = variational_classifier(weights, bias, features_grid.T)
Z = np.reshape(predictions_grid, xx.shape)

# plot decision regions
levels = np.arange(-1, 1.1, 0.1)
cnt = plt.contourf(xx, yy, Z, levels=levels, cmap=cm, alpha=0.8, extend="both")
plt.contour(xx, yy, Z, levels=[0.0], colors=("black",), linestyles=("--",), linewidths=(0.8,))
plt.colorbar(cnt, ticks=[-1, 0, 1])

# plot data
for color, label in zip(["b", "r"], [1, -1]):
    plot_x = X_train[:, 0][Y_train == label]
    plot_y = X_train[:, 1][Y_train == label]
    plt.scatter(plot_x, plot_y, c=color, marker="o", ec="k", label=f"class {label} train")
    plot_x = X_val[:, 0][Y_val == label]
    plot_y = X_val[:, 1][Y_val == label]
    plt.scatter(plot_x, plot_y, c=color, marker="^", ec="k", label=f"class {label} validation")

plt.legend()
plt.show()

We find that the variational classifier learnt a separating line between the datapoints of the two different classes, which allows it to classify even the unseen validation data with perfect accuracy.

About the author

Maria Schuld

Dedicated to making quantum machine learning a reality one day.

Total running time of the script: (0 minutes 29.307 seconds)
