In this tutorial, we show how to use PennyLane to implement variational quantum classifiers - quantum circuits that can be trained from labelled data to classify new data samples. The two examples used are inspired by two of the first papers that proposed variational circuits as supervised machine learning models: Farhi and Neven (2018) as well as Schuld et al. (2018).
More precisely, the first example shows that a variational circuit can be optimized to emulate the parity function
$f: x \in \{0,1\}^{\otimes n} \rightarrow y = \begin{cases} 1 & \text{if the bitstring } x \text{ contains an odd number of 1s} \\ 0 & \text{otherwise.} \end{cases}$
It demonstrates how to encode binary inputs into the initial state of the variational circuit, which is simply a computational basis state (basis encoding).
The second example shows how to encode real vectors as amplitude vectors into quantum states (amplitude encoding) and how to train a variational circuit to recognize the first two classes of flowers in the Iris dataset.
1. Fitting the parity function
Imports
We start by importing PennyLane, the PennyLane-provided version of NumPy, and an optimizer.
import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import NesterovMomentumOptimizer
Quantum and classical nodes
We then create a quantum device that will run our circuits.
dev = qml.device("default.qubit")
Variational classifiers usually define a “layer” or “block”, which is an elementary circuit architecture that gets repeated to build the full variational circuit.
Our circuit layer will use four qubits, or wires, and consists of an arbitrary rotation on every qubit, as well as a ring of CNOTs that entangles each qubit with its neighbour. Borrowing from machine learning, we call the parameters of the layer weights.
def layer(layer_weights):
    for wire in range(4):
        qml.Rot(*layer_weights[wire], wires=wire)

    for wires in ([0, 1], [1, 2], [2, 3], [3, 0]):
        qml.CNOT(wires)
We also need a way to encode data inputs $x$ into the circuit, so that the measured output depends on the inputs. In this first example, the inputs are bitstrings, which we encode into the state of the qubits. The quantum state $\psi$ after state preparation is a computational basis state that has 1s where $x$ has 1s, for example $x = 0101 \mapsto |\psi\rangle = |0101\rangle$.
The BasisState function provided by PennyLane is made to do just this. It expects x to be a list of zeros and ones, e.g. [0, 1, 0, 1].
def state_preparation(x):
    qml.BasisState(x, wires=[0, 1, 2, 3])
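As a quick sanity check (not part of the original demo; the QNode name below is purely illustrative), we can confirm that the bitstring 0101 is mapped to the computational basis state with index 5:

@qml.qnode(dev)
def check_basis_encoding(x):
    state_preparation(x)
    return qml.probs(wires=[0, 1, 2, 3])

# all probability mass sits on index 5, i.e. on |0101>
print(np.nonzero(check_basis_encoding(np.array([0, 1, 0, 1])))[0])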
Now we define the variational quantum circuit as this state preparation routine, followed by a repetition of the layer structure.
@qml.qnode(dev)
def circuit(weights, x):
    state_preparation(x)

    for layer_weights in weights:
        layer(layer_weights)

    return qml.expval(qml.PauliZ(0))
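To inspect the resulting architecture, we can draw the circuit for some small random weights and an example input (an optional illustration, not part of the original demo; weights_demo is a hypothetical variable):

weights_demo = 0.01 * np.random.randn(2, 4, 3)  # 2 layers, 4 qubits, 3 rotation angles each
print(qml.draw(circuit)(weights_demo, np.array([1, 0, 1, 0])))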
If we want to add a “classical” bias parameter, the variational quantum classifier also needs some post-processing. We define the full model as a sum of the output of the quantum circuit, plus the trainable bias.
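In symbols, the model output for an input $x$ with circuit parameters $\theta$ and bias $b$ is $f(x; \theta, b) = \langle \sigma_z^{(0)} \rangle + b$, where $\langle \sigma_z^{(0)} \rangle$ is the Pauli-Z expectation value measured on the first qubit of the circuit above.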
def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias
Cost
In supervised learning, the cost function is usually the sum of a loss function and a regularizer. We restrict ourselves to the standard square loss that measures the distance between target labels and model predictions.
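For $N$ samples with labels $y_i$ and model outputs $f(x_i)$, this loss reads $\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(x_i)\right)^2$, which is exactly what the function below computes.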
def square_loss(labels, predictions):
    # We use a call to qml.math.stack to allow subtracting the arrays directly
    return np.mean((labels - qml.math.stack(predictions)) ** 2)
To monitor how many inputs the current classifier predicted correctly, we also define the accuracy, or the proportion of predictions that agree with a set of target labels.
def accuracy(labels, predictions):
    acc = sum(abs(l - p) < 1e-5 for l, p in zip(labels, predictions))
    acc = acc / len(labels)
    return acc
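For example (an illustrative call, not from the original demo), if two of three predictions match their labels the accuracy is two thirds:

print(accuracy(np.array([1, -1, 1]), [1.0, 1.0, 1.0]))  # prints 0.666...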
For learning tasks, the cost depends on the data: here, the features and labels considered in the current iteration of the optimization routine.
def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)
Optimization
Let’s now load and preprocess some data.
Note
The parity dataset's train and test sets can be downloaded and should be placed in the subfolder variational_classifier/data.
data = np.loadtxt("variational_classifier/data/parity_train.txt", dtype=int)
X = np.array(data[:, :-1])
Y = np.array(data[:, -1])
Y = Y * 2 - 1 # shift label from {0, 1} to {-1, 1}
for x, y in zip(X, Y):
    print(f"x = {x}, y = {y}")
x = [0 0 0 1], y = 1
x = [0 0 1 0], y = 1
x = [0 1 0 0], y = 1
x = [0 1 0 1], y = -1
x = [0 1 1 0], y = -1
x = [0 1 1 1], y = 1
x = [1 0 0 0], y = 1
x = [1 0 0 1], y = -1
x = [1 0 1 1], y = 1
x = [1 1 1 1], y = -1
We initialize the variables randomly (but fix a seed for reproducibility). Remember that one of the variables is used as a bias, while the rest is fed into the gates of the variational circuit.
np.random.seed(0)
num_qubits = 4
num_layers = 2
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)
print("Weights:", weights_init)
print("Bias: ", bias_init)
Weights: [[[ 0.01764052 0.00400157 0.00978738]
[ 0.02240893 0.01867558 -0.00977278]
[ 0.00950088 -0.00151357 -0.00103219]
[ 0.00410599 0.00144044 0.01454274]]
[[ 0.00761038 0.00121675 0.00443863]
[ 0.00333674 0.01494079 -0.00205158]
[ 0.00313068 -0.00854096 -0.0255299 ]
[ 0.00653619 0.00864436 -0.00742165]]]
Bias: 0.0
Next we create an optimizer instance and choose a batch size…
opt = NesterovMomentumOptimizer(0.5)
batch_size = 5
…and run the optimizer to train our model. We track the accuracy - the share of correctly classified data samples. For this we compute the outputs of the variational classifier and turn them into predictions in $\{-1,1\}$ by taking the sign of the output.
weights = weights_init
bias = bias_init
for it in range(100):

    # Update the weights by one optimizer step, using only a limited batch of data
    batch_index = np.random.randint(0, len(X), (batch_size,))
    X_batch = X[batch_index]
    Y_batch = Y[batch_index]
    weights, bias = opt.step(cost, weights, bias, X=X_batch, Y=Y_batch)

    # Compute accuracy
    predictions = [np.sign(variational_classifier(weights, bias, x)) for x in X]

    current_cost = cost(weights, bias, X, Y)
    acc = accuracy(Y, predictions)

    print(f"Iter: {it+1:4d} | Cost: {current_cost:0.7f} | Accuracy: {acc:0.7f}")
Iter: 1 | Cost: 2.3147651 | Accuracy: 0.5000000
Iter: 2 | Cost: 1.9664866 | Accuracy: 0.5000000
Iter: 3 | Cost: 1.9208589 | Accuracy: 0.5000000
Iter: 4 | Cost: 2.6276126 | Accuracy: 0.5000000
Iter: 5 | Cost: 0.9323119 | Accuracy: 0.6000000
Iter: 6 | Cost: 1.1903549 | Accuracy: 0.5000000
Iter: 7 | Cost: 2.0508989 | Accuracy: 0.4000000
Iter: 8 | Cost: 1.1275531 | Accuracy: 0.6000000
Iter: 9 | Cost: 1.1659803 | Accuracy: 0.6000000
Iter: 10 | Cost: 1.1349618 | Accuracy: 0.6000000
Iter: 11 | Cost: 0.9994063 | Accuracy: 0.6000000
Iter: 12 | Cost: 1.0812559 | Accuracy: 0.6000000
Iter: 13 | Cost: 1.2863155 | Accuracy: 0.6000000
Iter: 14 | Cost: 2.2658259 | Accuracy: 0.4000000
Iter: 15 | Cost: 1.1323724 | Accuracy: 0.6000000
Iter: 16 | Cost: 1.3439737 | Accuracy: 0.8000000
Iter: 17 | Cost: 2.0076168 | Accuracy: 0.6000000
Iter: 18 | Cost: 1.2685760 | Accuracy: 0.5000000
Iter: 19 | Cost: 1.6762475 | Accuracy: 0.5000000
Iter: 20 | Cost: 1.1868237 | Accuracy: 0.6000000
Iter: 21 | Cost: 1.4784687 | Accuracy: 0.6000000
Iter: 22 | Cost: 1.4599473 | Accuracy: 0.6000000
Iter: 23 | Cost: 0.9573269 | Accuracy: 0.6000000
Iter: 24 | Cost: 1.1657424 | Accuracy: 0.5000000
Iter: 25 | Cost: 1.0877087 | Accuracy: 0.4000000
Iter: 26 | Cost: 1.1683687 | Accuracy: 0.6000000
Iter: 27 | Cost: 2.1141689 | Accuracy: 0.6000000
Iter: 28 | Cost: 1.0272966 | Accuracy: 0.5000000
Iter: 29 | Cost: 0.9664085 | Accuracy: 0.5000000
Iter: 30 | Cost: 1.1287654 | Accuracy: 0.6000000
Iter: 31 | Cost: 1.4202360 | Accuracy: 0.4000000
Iter: 32 | Cost: 1.1286000 | Accuracy: 0.5000000
Iter: 33 | Cost: 1.9594333 | Accuracy: 0.4000000
Iter: 34 | Cost: 1.2811832 | Accuracy: 0.4000000
Iter: 35 | Cost: 0.8522775 | Accuracy: 0.7000000
Iter: 36 | Cost: 1.4765281 | Accuracy: 0.6000000
Iter: 37 | Cost: 0.9603287 | Accuracy: 0.6000000
Iter: 38 | Cost: 1.6031314 | Accuracy: 0.6000000
Iter: 39 | Cost: 1.1700888 | Accuracy: 0.4000000
Iter: 40 | Cost: 1.7571779 | Accuracy: 0.4000000
Iter: 41 | Cost: 1.9608116 | Accuracy: 0.6000000
Iter: 42 | Cost: 2.0802752 | Accuracy: 0.6000000
Iter: 43 | Cost: 1.1904884 | Accuracy: 0.3000000
Iter: 44 | Cost: 0.9941585 | Accuracy: 0.6000000
Iter: 45 | Cost: 1.0709609 | Accuracy: 0.5000000
Iter: 46 | Cost: 0.9780625 | Accuracy: 0.6000000
Iter: 47 | Cost: 1.1573709 | Accuracy: 0.6000000
Iter: 48 | Cost: 1.0235239 | Accuracy: 0.6000000
Iter: 49 | Cost: 1.2842469 | Accuracy: 0.5000000
Iter: 50 | Cost: 0.8549226 | Accuracy: 0.6000000
Iter: 51 | Cost: 0.5136787 | Accuracy: 1.0000000
Iter: 52 | Cost: 0.2488031 | Accuracy: 1.0000000
Iter: 53 | Cost: 0.0461277 | Accuracy: 1.0000000
Iter: 54 | Cost: 0.0293518 | Accuracy: 1.0000000
Iter: 55 | Cost: 0.0205454 | Accuracy: 1.0000000
Iter: 56 | Cost: 0.0352514 | Accuracy: 1.0000000
Iter: 57 | Cost: 0.0576767 | Accuracy: 1.0000000
Iter: 58 | Cost: 0.0291305 | Accuracy: 1.0000000
Iter: 59 | Cost: 0.0127137 | Accuracy: 1.0000000
Iter: 60 | Cost: 0.0058108 | Accuracy: 1.0000000
Iter: 61 | Cost: 0.0018002 | Accuracy: 1.0000000
Iter: 62 | Cost: 0.0014089 | Accuracy: 1.0000000
Iter: 63 | Cost: 0.0017489 | Accuracy: 1.0000000
Iter: 64 | Cost: 0.0021282 | Accuracy: 1.0000000
Iter: 65 | Cost: 0.0029876 | Accuracy: 1.0000000
Iter: 66 | Cost: 0.0035331 | Accuracy: 1.0000000
Iter: 67 | Cost: 0.0035540 | Accuracy: 1.0000000
Iter: 68 | Cost: 0.0025639 | Accuracy: 1.0000000
Iter: 69 | Cost: 0.0019459 | Accuracy: 1.0000000
Iter: 70 | Cost: 0.0015856 | Accuracy: 1.0000000
Iter: 71 | Cost: 0.0008439 | Accuracy: 1.0000000
Iter: 72 | Cost: 0.0005960 | Accuracy: 1.0000000
Iter: 73 | Cost: 0.0003122 | Accuracy: 1.0000000
Iter: 74 | Cost: 0.0002446 | Accuracy: 1.0000000
Iter: 75 | Cost: 0.0001745 | Accuracy: 1.0000000
Iter: 76 | Cost: 0.0001215 | Accuracy: 1.0000000
Iter: 77 | Cost: 0.0001141 | Accuracy: 1.0000000
Iter: 78 | Cost: 0.0001538 | Accuracy: 1.0000000
Iter: 79 | Cost: 0.0001871 | Accuracy: 1.0000000
Iter: 80 | Cost: 0.0001330 | Accuracy: 1.0000000
Iter: 81 | Cost: 0.0001380 | Accuracy: 1.0000000
Iter: 82 | Cost: 0.0001336 | Accuracy: 1.0000000
Iter: 83 | Cost: 0.0001483 | Accuracy: 1.0000000
Iter: 84 | Cost: 0.0001234 | Accuracy: 1.0000000
Iter: 85 | Cost: 0.0001359 | Accuracy: 1.0000000
Iter: 86 | Cost: 0.0001268 | Accuracy: 1.0000000
Iter: 87 | Cost: 0.0002270 | Accuracy: 1.0000000
Iter: 88 | Cost: 0.0000865 | Accuracy: 1.0000000
Iter: 89 | Cost: 0.0000774 | Accuracy: 1.0000000
Iter: 90 | Cost: 0.0000759 | Accuracy: 1.0000000
Iter: 91 | Cost: 0.0000607 | Accuracy: 1.0000000
Iter: 92 | Cost: 0.0000523 | Accuracy: 1.0000000
Iter: 93 | Cost: 0.0000536 | Accuracy: 1.0000000
Iter: 94 | Cost: 0.0000444 | Accuracy: 1.0000000
Iter: 95 | Cost: 0.0000384 | Accuracy: 1.0000000
Iter: 96 | Cost: 0.0000497 | Accuracy: 1.0000000
Iter: 97 | Cost: 0.0000263 | Accuracy: 1.0000000
Iter: 98 | Cost: 0.0000229 | Accuracy: 1.0000000
Iter: 99 | Cost: 0.0000339 | Accuracy: 1.0000000
Iter: 100 | Cost: 0.0000174 | Accuracy: 1.0000000
As we can see, the variational classifier learned to classify all bit strings from the training set correctly.
But unlike in a pure optimization task, in machine learning the goal is to generalize from limited data to unseen examples. Even if the variational quantum circuit were perfectly optimized with respect to the cost on the training data, it might not generalize, a phenomenon known as overfitting. The art of (quantum) machine learning is to create models and learning procedures that tend to find "good" minima, i.e. those that lead to models which generalize well.
With this in mind, let’s look at a test set of examples we have not used during training:
data = np.loadtxt("variational_classifier/data/parity_test.txt", dtype=int)
X_test = np.array(data[:, :-1])
Y_test = np.array(data[:, -1])
Y_test = Y_test * 2 - 1 # shift label from {0, 1} to {-1, 1}
predictions_test = [np.sign(variational_classifier(weights, bias, x)) for x in X_test]
for x, y, p in zip(X_test, Y_test, predictions_test):
    print(f"x = {x}, y = {y}, pred={p}")
acc_test = accuracy(Y_test, predictions_test)
print("Accuracy on unseen data:", acc_test)
x = [0 0 0 0], y = -1, pred=-1.0
x = [0 0 1 1], y = -1, pred=-1.0
x = [1 0 1 0], y = -1, pred=-1.0
x = [1 1 1 0], y = 1, pred=1.0
x = [1 1 0 0], y = -1, pred=-1.0
x = [1 1 0 1], y = 1, pred=1.0
Accuracy on unseen data: 1.0
The quantum circuit has also learnt to predict all unseen examples perfectly well! This is actually remarkable, since the encoding strategy maps distinct bitstrings to mutually orthogonal quantum states – hence the states created from the test set have zero overlap with the states created from the training set. There are many functional relations the variational circuit could learn from this kind of representation, but the classifier chooses to label bit strings according to our ground truth, the parity function.
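We can check the zero-overlap claim directly (a minimal sketch, not part of the original demo, reusing the device and the basis-encoding state_preparation from above; the QNode name is illustrative):

@qml.qnode(dev)
def encoded_state(x):
    state_preparation(x)
    return qml.state()

# two distinct bitstrings are mapped to orthogonal basis states
s1 = encoded_state(np.array([0, 1, 0, 1]))
s2 = encoded_state(np.array([1, 1, 1, 1]))
print(np.abs(np.dot(np.conj(s1), s2)) ** 2)  # 0.0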
Let’s look at the second example, in which we use another encoding strategy.
2. Iris classification
We now move on to classifying data points from the Iris dataset, which are no longer simple bitstrings but represented as real-valued vectors. The vectors are 2-dimensional, but we will add some “latent dimensions” and therefore encode inputs into 2 qubits.
Quantum and classical nodes
State preparation is not as simple as when we represent a bitstring with a basis state. Every input x has to be translated into a set of angles which can get fed into a small routine for state preparation. To simplify things a bit, we will work with data from the positive subspace, so that we can ignore signs (which would require another cascade of rotations around the Z-axis).
The circuit is coded according to the scheme in Möttönen, et al. (2004), or—as presented for positive vectors only—in Schuld and Petruccione (2018). We also decomposed controlled Y-axis rotations into more basic gates, following Nielsen and Chuang (2010).
def get_angles(x):
    beta0 = 2 * np.arcsin(np.sqrt(x[1] ** 2) / np.sqrt(x[0] ** 2 + x[1] ** 2 + 1e-12))
    beta1 = 2 * np.arcsin(np.sqrt(x[3] ** 2) / np.sqrt(x[2] ** 2 + x[3] ** 2 + 1e-12))
    beta2 = 2 * np.arcsin(np.linalg.norm(x[2:]) / np.linalg.norm(x))

    return np.array([beta2, -beta1 / 2, beta1 / 2, -beta0 / 2, beta0 / 2])
def state_preparation(a):
    qml.RY(a[0], wires=0)

    qml.CNOT(wires=[0, 1])
    qml.RY(a[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(a[2], wires=1)

    qml.PauliX(wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(a[3], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(a[4], wires=1)
    qml.PauliX(wires=0)
Let’s test if this routine actually works.
x = np.array([0.53896774, 0.79503606, 0.27826503, 0.0], requires_grad=False)
ang = get_angles(x)
@qml.qnode(dev)
def test(angles):
    state_preparation(angles)
    return qml.state()
state = test(ang)
print("x : ", np.round(x, 6))
print("angles : ", np.round(ang, 6))
print("amplitude vector: ", np.round(np.real(state), 6))
x : [0.538968 0.795036 0.278265 0. ]
angles : [ 0.563975 -0. 0. -0.975046 0.975046]
amplitude vector: [ 0.538968 0.795036 0.278265 -0. ]
The method computed the correct angles to prepare the desired state!
Note
The default.qubit simulator provides a shortcut to state_preparation with the command qml.StatePrep(x, wires=[0, 1]). On state simulators, this just replaces the quantum state with our (normalized) input. On hardware, the operation implements more sophisticated versions of the routine used above.
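As a minimal sketch of this shortcut (not part of the original demo; it reuses the normalized input x from the test above and assumes a PennyLane version that provides qml.StatePrep):

@qml.qnode(dev)
def test_shortcut(x):
    qml.StatePrep(x, wires=[0, 1])
    return qml.state()

# should reproduce the same amplitude vector as the manual routine above
print(np.round(np.real(test_shortcut(x)), 6))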
Since we are working with only 2 qubits now, we need to update the layer function. In addition, we redefine the cost function to pass the full batch of data to the state preparation of the circuit simultaneously, a technique similar to NumPy broadcasting.
def layer(layer_weights):
    for wire in range(2):
        qml.Rot(*layer_weights[wire], wires=wire)
    qml.CNOT(wires=[0, 1])
def cost(weights, bias, X, Y):
    # Transpose the batch of input data in order to make the indexing
    # in state_preparation work
    predictions = variational_classifier(weights, bias, X.T)
    return square_loss(Y, predictions)
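To see the broadcasting in isolation (a hypothetical check, not in the original demo; the demo_* variables are made up), a single call with a transposed batch of feature vectors returns one model output per sample:

demo_weights = 0.01 * np.random.randn(6, 2, 3)
demo_inputs = np.random.uniform(0.1, 1.0, (4, 4))            # 4 positive 4-dimensional samples
demo_feats = np.array([get_angles(x) for x in demo_inputs])  # 4 angle vectors of length 5
print(variational_classifier(demo_weights, 0.0, demo_feats.T))  # 4 outputs, one per sample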
Data
We load the Iris data set. There is a bit of preprocessing to do in order to encode the inputs into the amplitudes of a quantum state. We will augment the data points by two so-called "latent dimensions", making the size of the padded data point match the size of the state vector in the quantum device. We then need to normalize the data points, and finally, we translate the inputs x to rotation angles using the get_angles function we defined above.
Data preprocessing should always be done with the problem in mind; for example, if we do not add any latent dimensions, normalization erases any information on the length of the vectors and classes separated by this feature will not be distinguishable.
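A tiny numerical illustration of this point (not from the original demo; v1, v2, p1, p2 are made-up examples): without padding, two vectors that differ only in length normalize to the same state, while the padded versions remain distinguishable.

v1, v2 = np.array([1.0, 1.0]), np.array([2.0, 2.0])
print(v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2))  # identical unit vectors

pad = np.array([0.1, 0.1])
p1 = np.concatenate([v1, pad])
p2 = np.concatenate([v2, pad])
print(p1 / np.linalg.norm(p1), p2 / np.linalg.norm(p2))  # different after padding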
Note
The Iris dataset can be downloaded here and should be placed in the subfolder variational_classifier/data.
data = np.loadtxt("variational_classifier/data/iris_classes1and2_scaled.txt")
X = data[:, 0:2]
print(f"First X sample (original) : {X[0]}")
# pad the vectors to size 2^2=4 with constant values
padding = np.ones((len(X), 2)) * 0.1
X_pad = np.c_[X, padding]
print(f"First X sample (padded) : {X_pad[0]}")
# normalize each input
normalization = np.sqrt(np.sum(X_pad**2, -1))
X_norm = (X_pad.T / normalization).T
print(f"First X sample (normalized): {X_norm[0]}")
# the angles for state preparation are the features
features = np.array([get_angles(x) for x in X_norm], requires_grad=False)
print(f"First features sample : {features[0]}")
Y = data[:, -1]
First X sample (original) : [0.4 0.75]
First X sample (padded) : [0.4 0.75 0.1 0.1 ]
First X sample (normalized): [0.46420708 0.87038828 0.11605177 0.11605177]
First features sample : [ 0.32973573 -0.78539816 0.78539816 -1.080839 1.080839 ]
These angles are our new features, which is why we have renamed X to “features” above. Let’s plot the stages of preprocessing and play around with the dimensions (dim1, dim2). Some of them still separate the classes well, while others are less informative.
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(X[:, 0][Y == 1], X[:, 1][Y == 1], c="b", marker="o", ec="k")
plt.scatter(X[:, 0][Y == -1], X[:, 1][Y == -1], c="r", marker="o", ec="k")
plt.title("Original data")
plt.show()
plt.figure()
dim1 = 0
dim2 = 1
plt.scatter(X_norm[:, dim1][Y == 1], X_norm[:, dim2][Y == 1], c="b", marker="o", ec="k")
plt.scatter(X_norm[:, dim1][Y == -1], X_norm[:, dim2][Y == -1], c="r", marker="o", ec="k")
plt.title(f"Padded and normalised data (dims {dim1} and {dim2})")
plt.show()
plt.figure()
dim1 = 0
dim2 = 3
plt.scatter(features[:, dim1][Y == 1], features[:, dim2][Y == 1], c="b", marker="o", ec="k")
plt.scatter(features[:, dim1][Y == -1], features[:, dim2][Y == -1], c="r", marker="o", ec="k")
plt.title(f"Feature vectors (dims {dim1} and {dim2})")
plt.show()
This time we want to generalize from the data samples. This means that we want to train our model on one set of data and test its performance on a second set of data that has not been used in training. To monitor the generalization performance, the data is split into training and validation sets.
np.random.seed(0)
num_data = len(Y)
num_train = int(0.75 * num_data)
index = np.random.permutation(range(num_data))
feats_train = features[index[:num_train]]
Y_train = Y[index[:num_train]]
feats_val = features[index[num_train:]]
Y_val = Y[index[num_train:]]
# We need these later for plotting
X_train = X[index[:num_train]]
X_val = X[index[num_train:]]
Optimization
First we initialize the variables.
num_qubits = 2
num_layers = 6
weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)
Again we minimize the cost, using the imported optimizer.
opt = NesterovMomentumOptimizer(0.01)
batch_size = 5
# train the variational classifier
weights = weights_init
bias = bias_init
for it in range(60):
    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, num_train, (batch_size,))
    feats_train_batch = feats_train[batch_index]
    Y_train_batch = Y_train[batch_index]
    weights, bias, _, _ = opt.step(cost, weights, bias, feats_train_batch, Y_train_batch)

    # Compute predictions on train and validation set
    predictions_train = np.sign(variational_classifier(weights, bias, feats_train.T))
    predictions_val = np.sign(variational_classifier(weights, bias, feats_val.T))

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y_train, predictions_train)
    acc_val = accuracy(Y_val, predictions_val)

    if (it + 1) % 2 == 0:
        _cost = cost(weights, bias, features, Y)
        print(
            f"Iter: {it + 1:5d} | Cost: {_cost:0.7f} | "
            f"Acc train: {acc_train:0.7f} | Acc validation: {acc_val:0.7f}"
        )
Iter: 2 | Cost: 1.6589456 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter: 4 | Cost: 1.2054273 | Acc train: 0.4933333 | Acc validation: 0.5600000
Iter: 6 | Cost: 0.9740740 | Acc train: 0.4933333 | Acc validation: 0.7200000
Iter: 8 | Cost: 0.9660872 | Acc train: 0.6400000 | Acc validation: 0.6400000
Iter: 10 | Cost: 0.9569019 | Acc train: 0.6000000 | Acc validation: 0.6000000
Iter: 12 | Cost: 0.9445863 | Acc train: 0.4933333 | Acc validation: 0.7200000
Iter: 14 | Cost: 1.0339978 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter: 16 | Cost: 1.0774217 | Acc train: 0.4933333 | Acc validation: 0.5600000
Iter: 18 | Cost: 0.9984426 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter: 20 | Cost: 0.8975279 | Acc train: 0.5600000 | Acc validation: 0.7600000
Iter: 22 | Cost: 0.8451699 | Acc train: 0.6400000 | Acc validation: 0.6400000
Iter: 24 | Cost: 0.8337489 | Acc train: 0.5600000 | Acc validation: 0.5200000
Iter: 26 | Cost: 0.7832025 | Acc train: 0.6000000 | Acc validation: 0.6000000
Iter: 28 | Cost: 0.7397515 | Acc train: 0.6133333 | Acc validation: 0.6000000
Iter: 30 | Cost: 0.6690522 | Acc train: 0.6666667 | Acc validation: 0.6400000
Iter: 32 | Cost: 0.5640186 | Acc train: 0.8266667 | Acc validation: 0.8000000
Iter: 34 | Cost: 0.4765597 | Acc train: 0.8933333 | Acc validation: 0.8800000
Iter: 36 | Cost: 0.4144135 | Acc train: 0.9200000 | Acc validation: 0.9600000
Iter: 38 | Cost: 0.3569566 | Acc train: 0.9600000 | Acc validation: 1.0000000
Iter: 40 | Cost: 0.3186159 | Acc train: 0.9866667 | Acc validation: 1.0000000
Iter: 42 | Cost: 0.2853043 | Acc train: 0.9866667 | Acc validation: 1.0000000
Iter: 44 | Cost: 0.2652725 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 46 | Cost: 0.2525848 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 48 | Cost: 0.2444278 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 50 | Cost: 0.2436316 | Acc train: 0.9866667 | Acc validation: 1.0000000
Iter: 52 | Cost: 0.2376316 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 54 | Cost: 0.2307475 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 56 | Cost: 0.2341245 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 58 | Cost: 0.2292663 | Acc train: 1.0000000 | Acc validation: 1.0000000
Iter: 60 | Cost: 0.2241948 | Acc train: 1.0000000 | Acc validation: 1.0000000
We can plot the continuous output of the variational classifier for the first two dimensions of the Iris data set.
plt.figure()
cm = plt.cm.RdBu
# make data for decision regions
xx, yy = np.meshgrid(np.linspace(0.0, 1.5, 30), np.linspace(0.0, 1.5, 30))
X_grid = [np.array([x, y]) for x, y in zip(xx.flatten(), yy.flatten())]
# preprocess grid points like data inputs above
padding = 0.1 * np.ones((len(X_grid), 2))
X_grid = np.c_[X_grid, padding] # pad each input
normalization = np.sqrt(np.sum(X_grid**2, -1))
X_grid = (X_grid.T / normalization).T # normalize each input
features_grid = np.array([get_angles(x) for x in X_grid]) # angles are new features
predictions_grid = variational_classifier(weights, bias, features_grid.T)
Z = np.reshape(predictions_grid, xx.shape)
# plot decision regions
levels = np.arange(-1, 1.1, 0.1)
cnt = plt.contourf(xx, yy, Z, levels=levels, cmap=cm, alpha=0.8, extend="both")
plt.contour(xx, yy, Z, levels=[0.0], colors=("black",), linestyles=("--",), linewidths=(0.8,))
plt.colorbar(cnt, ticks=[-1, 0, 1])
# plot data
for color, label in zip(["b", "r"], [1, -1]):
    plot_x = X_train[:, 0][Y_train == label]
    plot_y = X_train[:, 1][Y_train == label]
    plt.scatter(plot_x, plot_y, c=color, marker="o", ec="k", label=f"class {label} train")
    plot_x = (X_val[:, 0][Y_val == label],)
    plot_y = (X_val[:, 1][Y_val == label],)
    plt.scatter(plot_x, plot_y, c=color, marker="^", ec="k", label=f"class {label} validation")
plt.legend()
plt.show()

We find that the variational classifier learnt a separating line between the datapoints of the two different classes, which allows it to classify even the unseen validation data with perfect accuracy.
Maria Schuld
Dedicated to making quantum machine learning a reality one day.