{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data-reuploading classifier {#data_reuploading_classifier}\n===========================\n\n::: {.meta}\n:property=\\\"og:description\\\": Implement a single-qubit universal quantum\nclassifier using PennyLane. :property=\\\"og:image\\\":\n\n:::\n\n::: {.related}\ntutorial\\_variational\\_classifier Variational quantum classifier\ntutorial\\_multiclass\\_classification Multiclass margin classifier\ntutorial\\_expressivity\\_fourier\\_series Quantum models as Fourier series\n:::\n\n*Author: Shahnawaz Ahmed (shahnawaz.ahmed95\\@gmail.com). Last updated:\n19 Jan 2021.*\n\nA single-qubit quantum circuit which can implement arbitrary unitary\noperations can be used as a universal classifier much like a single\nhidden-layered Neural Network. As surprising as it sounds,\n[P\u00e9rez-Salinas et al. (2019)](https://arxiv.org/abs/1907.02085) discuss\nthis with their idea of \\'data reuploading\\'. It is possible to load a\nsingle qubit with arbitrary dimensional data and then use it as a\nuniversal classifier.\n\nIn this example, we will implement this idea with Pennylane - a python\nbased tool for quantum machine learning, automatic differentiation, and\noptimization of hybrid quantum-classical computations.\n\nBackground\n----------\n\nWe consider a simple classification problem and will train a\nsingle-qubit variational quantum circuit to achieve this goal. The data\nis generated as a set of random points in a plane $(x_1, x_2)$ and\nlabeled as 1 (blue) or 0 (red) depending on whether they lie inside or\noutside a circle. The goal is to train a quantum circuit to predict the\nlabel (red or blue) given an input point\\'s coordinate.\n\n![](../demonstrations/data_reuploading/universal_circles.png)\n\n### Transforming quantum states using unitary operations\n\nA single-qubit quantum state is characterized by a two-dimensional state\nvector and can be visualized as a point in the so-called Bloch sphere.\nInstead of just being a 0 (up) or 1 (down), it can exist in a\nsuperposition with say 30% chance of being in the $|0 \\rangle$ and 70%\nchance of being in the $|1 \\rangle$ state. This is represented by a\nstate vector\n$|\\psi \\rangle = \\sqrt{0.3}|0 \\rangle + \\sqrt{0.7}|1 \\rangle$ -the\nprobability \\\"amplitude\\\" of the quantum state. In general we can take a\nvector $(\\alpha, \\beta)$ to represent the probabilities of a qubit being\nin a particular state and visualize it on the Bloch sphere as an arrow.\n\n![](../demonstrations/data_reuploading/universal_bloch.png)\n\n### Data loading using unitaries\n\nIn order to load data onto a single qubit, we use a unitary operation\n$U(x_1, x_2, x_3)$ which is just a parameterized matrix multiplication\nrepresenting the rotation of the state vector in the Bloch sphere. E.g.,\nto load $(x_1, x_2)$ into the qubit, we just start from some initial\nstate vector, $|0 \\rangle$, apply the unitary operation $U(x_1, x_2, 0)$\nand end up at a new point on the Bloch sphere. Here we have padded 0\nsince our data is only 2D. P\u00e9rez-Salinas et al. (2019) discuss how to\nload a higher dimensional data point ($[x_1, x_2, x_3, x_4, x_5, x_6]$)\nby breaking it down in sets of three parameters\n($U(x_1, x_2, x_3), U(x_4, x_5, x_6)$).\n\n### Model parameters with data re-uploading\n\nOnce we load the data onto the quantum circuit, we want to have some\ntrainable nonlinear model similar to a neural network as well as a way\nof learning the weights of the model from data. This is again done with\nunitaries, $U(\\theta_1, \\theta_2, \\theta_3)$, such that we load the data\nfirst and then apply the weights to form a single layer\n$L(\\vec \\theta, \\vec x) = U(\\vec \\theta)U(\\vec x)$. In principle, this\nis just application of two matrix multiplications on an input vector\ninitialized to some value. In order to increase the number of trainable\nparameters (similar to increasing neurons in a single layer of a neural\nnetwork), we can reapply this layer again and again with new sets of\nweights,\n$L(\\vec \\theta_1, \\vec x) L(\\vec \\theta_2, , \\vec x) ... L(\\vec \\theta_L, \\vec x)$\nfor $L$ layers. The quantum circuit would look like the following:\n\n![](../demonstrations/data_reuploading/universal_layers.png)\n\n### The cost function and \\\"nonlinear collapse\\\"\n\nSo far, we have only performed linear operations (matrix\nmultiplications) and we know that we need to have some nonlinear\nsquashing similar to activation functions in neural networks to really\nmake a universal classifier (Cybenko 1989). Here is where things gets a\nbit quantum. After the application of the layers, we will end up at some\npoint on the Bloch sphere due to the sequence of unitaries implementing\nrotations of the input. These are still just linear transformations of\nthe input state. Now, the output of the model should be a class label\nwhich can be encoded as fixed vectors (Blue = $[1, 0]$, Red = $[0, 1]$)\non the Bloch sphere. We want to end up at either of them after\ntransforming our input state through alternate applications of data\nlayer and weights.\n\nWe can use the idea of the \\\"collapse\\\" of our quantum state into one or\nother class. This happens when we measure the quantum state which leads\nto its projection as either the state 0 or 1. We can compute the\nfidelity (or closeness) of the output state to the class label making\nthe output state jump to either $| 0 \\rangle$ or $|1\\rangle$. By\nrepeating this process several times, we can compute the probability or\noverlap of our output to both labels and assign a class based on the\nlabel our output has a higher overlap. This is much like having a set of\noutput neurons and selecting the one which has the highest value as the\nlabel.\n\nWe can encode the output label as a particular quantum state that we\nwant to end up in and use Pennylane to find the probability of ending up\nin that state after running the circuit. We construct an observable\ncorresponding to the output label using the\n[Hermitian](https://pennylane.readthedocs.io/en/latest/code/ops/qubit.html#pennylane.ops.qubit.Hermitian)\noperator. The expectation value of the observable gives the overlap or\nfidelity. We can then define the cost function as the sum of the\nfidelities for all the data points after passing through the circuit and\noptimize the parameters $(\\vec \\theta)$ to minimize the cost.\n\n$$\\texttt{Cost} = \\sum_{\\texttt{data points}} (1 - \\texttt{fidelity}(\\psi_{\\texttt{output}}(\\vec x, \\vec \\theta), \\psi_{\\texttt{label}}))$$\n\nNow, we can use our favorite optimizer to maximize the sum of the\nfidelities over all data points (or batches of datapoints) and find the\noptimal weights for classification. Gradient-based optimizers such as\nAdam (Kingma et. al., 2014) can be used if we have a good model of the\ncircuit and how noise might affect it. Or, we can use some gradient-free\nmethod such as L-BFGS (Liu, Dong C., and Nocedal, J., 1989) to evaluate\nthe gradient and find the optimal weights where we can treat the quantum\ncircuit as a black-box and the gradients are computed numerically using\na fixed number of function evaluations and iterations. The L-BFGS method\ncan be used with the PyTorch interface for Pennylane.\n\n### Multiple qubits, entanglement and Deep Neural Networks\n\nThe Universal Approximation Theorem declares that a neural network with\ntwo or more hidden layers can serve as a universal function\napproximator. Recently, we have witnessed remarkable progress of\nlearning algorithms using Deep Neural Networks.\n\nP\u00e9rez-Salinas et al. (2019) make a connection to Deep Neural Networks by\ndescribing that in their approach the \\\"layers\\\"\n$L_i(\\vec \\theta_i, \\vec x )$ are analogous to the size of the\nintermediate hidden layer of a neural network. And the concept of deep\n(multiple layers of the neural network) relates to the number of qubits.\nSo, multiple qubits with entanglement between them could provide some\nquantum advantage over classical neural networks. But here, we will only\nimplement a single qubit classifier.\n\n![](../demonstrations/data_reuploading/universal_dnn.png)\n\n\\\"Talk is cheap. Show me the code.\\\" - Linus Torvalds\n-----------------------------------------------------\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pennylane as qml\nfrom pennylane import numpy as np\nfrom pennylane.optimize import AdamOptimizer, GradientDescentOptimizer\n\nimport matplotlib.pyplot as plt\n\n\n# Set a random seed\nnp.random.seed(42)\n\n\n# Make a dataset of points inside and outside of a circle\ndef circle(samples, center=[0.0, 0.0], radius=np.sqrt(2 / np.pi)):\n \"\"\"\n Generates a dataset of points with 1/0 labels inside a given radius.\n\n Args:\n samples (int): number of samples to generate\n center (tuple): center of the circle\n radius (float: radius of the circle\n\n Returns:\n Xvals (array[tuple]): coordinates of points\n yvals (array[int]): classification labels\n \"\"\"\n Xvals, yvals = [], []\n\n for i in range(samples):\n x = 2 * (np.random.rand(2)) - 1\n y = 0\n if np.linalg.norm(x - center) < radius:\n y = 1\n Xvals.append(x)\n yvals.append(y)\n return np.array(Xvals, requires_grad=False), np.array(yvals, requires_grad=False)\n\n\ndef plot_data(x, y, fig=None, ax=None):\n \"\"\"\n Plot data with red/blue values for a binary classification.\n\n Args:\n x (array[tuple]): array of data points as tuples\n y (array[int]): array of data points as tuples\n \"\"\"\n if fig == None:\n fig, ax = plt.subplots(1, 1, figsize=(5, 5))\n reds = y == 0\n blues = y == 1\n ax.scatter(x[reds, 0], x[reds, 1], c=\"red\", s=20, edgecolor=\"k\")\n ax.scatter(x[blues, 0], x[blues, 1], c=\"blue\", s=20, edgecolor=\"k\")\n ax.set_xlabel(\"$x_1$\")\n ax.set_ylabel(\"$x_2$\")\n\n\nXdata, ydata = circle(500)\nfig, ax = plt.subplots(1, 1, figsize=(4, 4))\nplot_data(Xdata, ydata, fig=fig, ax=ax)\nplt.show()\n\n\n# Define output labels as quantum state vectors\ndef density_matrix(state):\n \"\"\"Calculates the density matrix representation of a state.\n\n Args:\n state (array[complex]): array representing a quantum state vector\n\n Returns:\n dm: (array[complex]): array representing the density matrix\n \"\"\"\n return state * np.conj(state).T\n\n\nlabel_0 = [[1], [0]]\nlabel_1 = [[0], [1]]\nstate_labels = np.array([label_0, label_1], requires_grad=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simple classifier with data reloading and fidelity loss\n=======================================================\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dev = qml.device(\"default.qubit\", wires=1)\n# Install any pennylane-plugin to run on some particular backend\n\n\n@qml.qnode(dev)\ndef qcircuit(params, x, y):\n \"\"\"A variational quantum circuit representing the Universal classifier.\n\n Args:\n params (array[float]): array of parameters\n x (array[float]): single input vector\n y (array[float]): single output state density matrix\n\n Returns:\n float: fidelity between output state and input\n \"\"\"\n for p in params:\n qml.Rot(*x, wires=0)\n qml.Rot(*p, wires=0)\n return qml.expval(qml.Hermitian(y, wires=[0]))\n\n\ndef cost(params, x, y, state_labels=None):\n \"\"\"Cost function to be minimized.\n\n Args:\n params (array[float]): array of parameters\n x (array[float]): 2-d array of input vectors\n y (array[float]): 1-d array of targets\n state_labels (array[float]): array of state representations for labels\n\n Returns:\n float: loss value to be minimized\n \"\"\"\n # Compute prediction for each input in data batch\n loss = 0.0\n dm_labels = [density_matrix(s) for s in state_labels]\n for i in range(len(x)):\n f = qcircuit(params, x[i], dm_labels[y[i]])\n loss = loss + (1 - f) ** 2\n return loss / len(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Utility functions for testing and creating batches\n==================================================\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def test(params, x, y, state_labels=None):\n \"\"\"\n Tests on a given set of data.\n\n Args:\n params (array[float]): array of parameters\n x (array[float]): 2-d array of input vectors\n y (array[float]): 1-d array of targets\n state_labels (array[float]): 1-d array of state representations for labels\n\n Returns:\n predicted (array([int]): predicted labels for test data\n output_states (array[float]): output quantum states from the circuit\n \"\"\"\n fidelity_values = []\n dm_labels = [density_matrix(s) for s in state_labels]\n predicted = []\n\n for i in range(len(x)):\n fidel_function = lambda y: qcircuit(params, x[i], y)\n fidelities = [fidel_function(dm) for dm in dm_labels]\n best_fidel = np.argmax(fidelities)\n\n predicted.append(best_fidel)\n fidelity_values.append(fidelities)\n\n return np.array(predicted), np.array(fidelity_values)\n\n\ndef accuracy_score(y_true, y_pred):\n \"\"\"Accuracy score.\n\n Args:\n y_true (array[float]): 1-d array of targets\n y_predicted (array[float]): 1-d array of predictions\n state_labels (array[float]): 1-d array of state representations for labels\n\n Returns:\n score (float): the fraction of correctly classified samples\n \"\"\"\n score = y_true == y_pred\n return score.sum() / len(y_true)\n\n\ndef iterate_minibatches(inputs, targets, batch_size):\n \"\"\"\n A generator for batches of the input data\n\n Args:\n inputs (array[float]): input data\n targets (array[float]): targets\n\n Returns:\n inputs (array[float]): one batch of input data of length `batch_size`\n targets (array[float]): one batch of targets of length `batch_size`\n \"\"\"\n for start_idx in range(0, inputs.shape[0] - batch_size + 1, batch_size):\n idxs = slice(start_idx, start_idx + batch_size)\n yield inputs[idxs], targets[idxs]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train a quantum classifier on the circle dataset\n================================================\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Generate training and test data\nnum_training = 200\nnum_test = 2000\n\nXdata, y_train = circle(num_training)\nX_train = np.hstack((Xdata, np.zeros((Xdata.shape[0], 1), requires_grad=False)))\n\nXtest, y_test = circle(num_test)\nX_test = np.hstack((Xtest, np.zeros((Xtest.shape[0], 1), requires_grad=False)))\n\n\n# Train using Adam optimizer and evaluate the classifier\nnum_layers = 3\nlearning_rate = 0.6\nepochs = 10\nbatch_size = 32\n\nopt = AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999)\n\n# initialize random weights\nparams = np.random.uniform(size=(num_layers, 3), requires_grad=True)\n\npredicted_train, fidel_train = test(params, X_train, y_train, state_labels)\naccuracy_train = accuracy_score(y_train, predicted_train)\n\npredicted_test, fidel_test = test(params, X_test, y_test, state_labels)\naccuracy_test = accuracy_score(y_test, predicted_test)\n\n# save predictions with random weights for comparison\ninitial_predictions = predicted_test\n\nloss = cost(params, X_test, y_test, state_labels)\n\nprint(\n \"Epoch: {:2d} | Cost: {:3f} | Train accuracy: {:3f} | Test Accuracy: {:3f}\".format(\n 0, loss, accuracy_train, accuracy_test\n )\n)\n\nfor it in range(epochs):\n for Xbatch, ybatch in iterate_minibatches(X_train, y_train, batch_size=batch_size):\n params, _, _, _ = opt.step(cost, params, Xbatch, ybatch, state_labels)\n\n predicted_train, fidel_train = test(params, X_train, y_train, state_labels)\n accuracy_train = accuracy_score(y_train, predicted_train)\n loss = cost(params, X_train, y_train, state_labels)\n\n predicted_test, fidel_test = test(params, X_test, y_test, state_labels)\n accuracy_test = accuracy_score(y_test, predicted_test)\n res = [it + 1, loss, accuracy_train, accuracy_test]\n print(\n \"Epoch: {:2d} | Loss: {:3f} | Train accuracy: {:3f} | Test accuracy: {:3f}\".format(\n *res\n )\n )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Results\n=======\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\n \"Cost: {:3f} | Train accuracy {:3f} | Test Accuracy : {:3f}\".format(\n loss, accuracy_train, accuracy_test\n )\n)\n\nprint(\"Learned weights\")\nfor i in range(num_layers):\n print(\"Layer {}: {}\".format(i, params[i]))\n\n\nfig, axes = plt.subplots(1, 3, figsize=(10, 3))\nplot_data(X_test, initial_predictions, fig, axes[0])\nplot_data(X_test, predicted_test, fig, axes[1])\nplot_data(X_test, y_test, fig, axes[2])\naxes[0].set_title(\"Predictions with random weights\")\naxes[1].set_title(\"Predictions after training\")\naxes[2].set_title(\"True test data\")\nplt.tight_layout()\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"References\n==========\n\n\\[1\\] P\u00e9rez-Salinas, Adri\u00e1n, et al. \\\"Data re-uploading for a universal\nquantum classifier.\\\" arXiv preprint arXiv:1907.02085 (2019).\n\n\\[2\\] Kingma, Diederik P., and Ba, J. \\\"Adam: A method for stochastic\noptimization.\\\" arXiv preprint arXiv:1412.6980 (2014).\n\n\\[3\\] Liu, Dong C., and Nocedal, J. \\\"On the limited memory BFGS method\nfor large scale optimization.\\\" Mathematical programming 45.1-3 (1989):\n503-528.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 0
}