{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alleviating barren plateaus with local cost functions\n=====================================================\n\n*Author: Thomas Storwick (). Posted: 9 Sep 2020.\nLast updated: 28 Jan 2021.*\n\nBarren Plateaus\n---------------\n\nBarren plateaus </demos/tutorial\\_barren\\_plateaus> are large\nregions of the cost function's parameter space where the variance of the\ngradient is almost 0; or, put another way, the cost function landscape\nis flat. This means that a variational circuit initialized in one of\nthese areas will be untrainable using any gradient-based algorithm.\n\nIn [\"Cost-Function-Dependent Barren Plateaus in Shallow Quantum Neural\nNetworks\"](https://arxiv.org/abs/2001.00550) \\[\\#Cerezo2020\\]\\_, Cerezo\net al. demonstrate the idea that the barren plateau phenomenon can,\nunder some circumstances, be avoided by using cost functions that only\nhave information from part of the circuit. These *local* cost functions\ncan be more robust against noise, and may have better-behaved gradients\nwith no plateaus for shallow circuits.\n\n![Taken from Cerezo et al.\n\\[\\#Cerezo2020\\]\\_.](../demonstrations/local_cost_functions/Cerezo_et_al_local_cost_functions.png){width=\"50%\"}\n\nMany variational quantum algorithms are constructed to use global cost\nfunctions. Information from the entire measurement is used to analyze\nthe result of the circuit, and a cost function is calculated from this\nto quantify the circuit's performance. A local cost function only\nconsiders information from a few qubits, and attempts to analyze the\nbehavior of the entire circuit from this limited scope.\n\nCerezo et al. also handily prove that these local cost functions are\nbounded by the global ones, i.e., if a global cost function is\nformulated in the manner described by Cerezo et al., then the value of\nits corresponding local cost function will always be less than or equal\nto the value of the global cost function.\n\nIn this notebook, we investigate the effect of barren plateaus in\nvariational quantum algorithms, and how they can be mitigated using\nlocal cost functions.\n\nWe first need to import the following modules.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pennylane as qml\nfrom pennylane import numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.ticker import LinearLocator, FormatStrFormatter\n\nnp.random.seed(42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visualizing the problem\n=======================\n\nTo start, let's look at the task of learning the identity gate across\nmultiple qubits. This will help us visualize the problem and get a sense\nof what is happening in the cost landscape.\n\nFirst we define a number of wires we want to train on. The work by\nCerezo et al. shows that circuits are trainable under certain regimes,\nso how many qubits we train on will effect our results.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"wires = 6\ndev = qml.device(\"default.qubit\", wires=wires, shots=10000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we want to define our QNodes and our circuit ansatz. For this\nsimple example, an ansatz that works well is simply a rotation along X,\nand a rotation along Y, repeated across all the qubits.\n\nWe will also define our cost functions here. Since we are trying to\nlearn the identity gate, a natural cost function is 1 minus the\nprobability of measuring the zero state, denoted here as\n$1 - p_{|0\\rangle}$.\n\n$$C = \\langle \\psi(\\theta) | \\left(I - |0\\rangle \\langle 0|\\right) | \\psi(\\theta) \\rangle =1-p_{|0\\rangle}$$\n\nWe will apply this across all qubits for our global cost function, i.e.,\n\n$$C_{G} = \\langle \\psi(\\theta) | \\left(I - |00 \\ldots 0\\rangle \\langle 00 \\ldots 0|\\right) | \\psi(\\theta) \\rangle = 1-p_{|00 \\ldots 0\\rangle}$$\n\nand for the local cost function, we will sum the individual\ncontributions from each qubit:\n\n$$C_L = \\langle \\psi(\\theta) | \\left(I - \\frac{1}{n} \\sum_j |0\\rangle \\langle 0|_j\\right)|\\psi(\\theta)\\rangle = 1 - \\sum_j p_{|0\\rangle_j}.$$\n\nIt might be clear to some readers now why this function can perform\nbetter. By formatting our local cost function in this way, we have\nessentially divided the problem up into multiple single-qubit terms, and\nsummed all the results up.\n\nTo implement this, we will define a separate QNode for the local cost\nfunction and the global cost function.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def global_cost_simple(rotations):\n for i in range(wires):\n qml.RX(rotations[0][i], wires=i)\n qml.RY(rotations[1][i], wires=i)\n return qml.probs(wires=range(wires))\n\ndef local_cost_simple(rotations):\n for i in range(wires):\n qml.RX(rotations[0][i], wires=i)\n qml.RY(rotations[1][i], wires=i)\n return [qml.probs(wires=i) for i in range(wires)]\n\nglobal_circuit = qml.QNode(global_cost_simple, dev)\n\nlocal_circuit = qml.QNode(local_cost_simple, dev)\n\ndef cost_local(rotations):\n return 1 - np.sum(local_circuit(rotations)[:,0])/wires\n\n\ndef cost_global(rotations):\n return 1 - global_circuit(rotations)[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To analyze each of the circuits, we provide some random initial\nparameters for each rotation.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"RX = np.random.uniform(low=-np.pi, high=np.pi)\nRY = np.random.uniform(low=-np.pi, high=np.pi)\nrotations = [[RX for i in range(wires)], [RY for i in range(wires)]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Examining the results:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\"Global Cost: {: .7f}\".format(cost_global(rotations)))\nprint(\"Local Cost: {: .7f}\".format(cost_local(rotations)))\nprint(\"--- Global Circuit ---\")\nprint(global_circuit.draw())\nprint(\"--- Local Circuit\")\nprint(local_circuit.draw())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With this simple example, we can visualize the cost function, and see\nthe barren plateau effect graphically. Although there are $2n$ (where\n$n$ is the number of qubits) parameters, in order to plot the cost\nlandscape we must constrain ourselves. We will consider the case where\nall X rotations have the same value, and all the Y rotations have the\nsame value.\n\nFirstly, we look at the global cost function. When plotting the cost\nfunction across 6 qubits, much of the cost landscape is flat, and\ndifficult to train (even with a circuit depth of only 2!). This effect\nwill worsen as the number of qubits increases.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def generate_surface(cost_function):\n Z = []\n Z_assembler = []\n\n X = np.arange(-np.pi, np.pi, 0.25)\n Y = np.arange(-np.pi, np.pi, 0.25)\n X, Y = np.meshgrid(X, Y)\n\n for x in X[0, :]:\n for y in Y[:, 0]:\n rotations = [[x for i in range(wires)], [y for i in range(wires)]]\n Z_assembler.append(cost_function(rotations))\n Z.append(Z_assembler)\n Z_assembler = []\n\n Z = np.asarray(Z)\n return Z\n\ndef plot_surface(surface):\n X = np.arange(-np.pi, np.pi, 0.25)\n Y = np.arange(-np.pi, np.pi, 0.25)\n X, Y = np.meshgrid(X, Y)\n fig = plt.figure()\n ax = fig.add_subplot(111, projection=\"3d\")\n surf = ax.plot_surface(X, Y, surface, cmap=\"viridis\", linewidth=0, antialiased=False)\n ax.set_zlim(0, 1)\n ax.zaxis.set_major_locator(LinearLocator(10))\n ax.zaxis.set_major_formatter(FormatStrFormatter(\"%.02f\"))\n plt.show()\n\n\nglobal_surface = generate_surface(cost_global)\nplot_surface(global_surface)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, when we change to the local cost function, the cost landscape\nbecomes much more trainable as the size of the barren plateau decreases.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"local_surface = generate_surface(cost_local)\nplot_surface(local_surface)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Those are some nice pictures, but how do they reflect actual\ntrainability? Let us try training both the local and global cost\nfunctions. To simplify this model, let's modify our cost function from\n\n$$C_{L} = 1-\\sum p_{|0\\rangle},$$\n\nwhere we sum the marginal probabilities of each qubit, to\n\n$$C_{L} = 1-p_{|0\\rangle},$$\n\nwhere we only consider the probability of a single qubit to be in the 0\nstate.\n\nWhile we're at it, let us make our ansatz a little more like one we\nwould encounter while trying to solve a VQE problem, and add\nentanglement.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def global_cost_simple(rotations):\n for i in range(wires):\n qml.RX(rotations[0][i], wires=i)\n qml.RY(rotations[1][i], wires=i)\n qml.broadcast(qml.CNOT, wires=range(wires), pattern=\"chain\")\n return qml.probs(wires=range(wires))\ndef local_cost_simple(rotations):\n for i in range(wires):\n qml.RX(rotations[0][i], wires=i)\n qml.RY(rotations[1][i], wires=i)\n qml.broadcast(qml.CNOT, wires=range(wires), pattern=\"chain\")\n return qml.probs(wires=[0])\n\nglobal_circuit = qml.QNode(global_cost_simple, dev)\n\nlocal_circuit = qml.QNode(local_cost_simple, dev)\n\ndef cost_local(rotations):\n return 1 - local_circuit(rotations)[0]\ndef cost_global(rotations):\n return 1 - global_circuit(rotations)[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, now that we've changed both our cost function and our\ncircuit, we will need to scan the cost landscape again.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"global_surface = generate_surface(cost_global)\nplot_surface(global_surface)\n\nlocal_surface = generate_surface(cost_local)\nplot_surface(local_surface)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems our changes didn't significantly alter the overall cost\nlandscape. This probably isn't a general trend, but it is a nice\nsurprise. Now, let us get back to training the local and global cost\nfunctions. Because we have a visualization of the total cost landscape,\nlet's pick a point to exaggerate the problem. One of the worst points in\nthe landscape is $(\\pi,0)$ as it is in the middle of the plateau, so\nlet's use that.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"rotations = np.array([[3.] * len(range(wires)), [0.] * len(range(wires))])\nopt = qml.GradientDescentOptimizer(stepsize=0.2)\nsteps = 100\nparams_global = rotations\nfor i in range(steps):\n # update the circuit parameters\n params_global = opt.step(cost_global, params_global)\n\n if (i + 1) % 5 == 0:\n print(\"Cost after step {:5d}: {: .7f}\".format(i + 1, cost_global(params_global)))\n if cost_global(params_global) < 0.1:\n break\nprint(global_circuit.draw())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After 100 steps, the cost function is still exactly 1. Clearly we are in\nan \"untrainable\" area. Now, let us limit ourselves to the local cost\nfunction and see how it performs.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"rotations = np.array([[3. for i in range(wires)], [0. for i in range(wires)]])\nopt = qml.GradientDescentOptimizer(stepsize=0.2)\nsteps = 100\nparams_local = rotations\nfor i in range(steps):\n # update the circuit parameters\n params_local = opt.step(cost_local, params_local)\n\n if (i + 1) % 5 == 0:\n print(\"Cost after step {:5d}: {: .7f}\".format(i + 1, cost_local(params_local)))\n if cost_local(params_local) < 0.05:\n break\nprint(local_circuit.draw())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It trained! And much faster than the global case. However, we know our\nlocal cost function is bounded by the global one, but just how much have\nwe trained it?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"cost_global(params_local)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interestingly, the global cost function is still 1. If we trained the\nlocal cost function, why hasn't the global cost function changed?\n\nThe answer is that we have trained the global cost a *little bit*, but\nnot enough to see a change with only 10000 shots. To see the effect,\nwe'll need to increase the number of shots to an unreasonable amount.\nInstead, making the backend analytic by setting shots to `None`, gives\nus the exact representation.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dev.shots = None\nglobal_circuit = qml.QNode(global_cost_simple, dev)\nprint(\n \"Current cost: \"\n + str(cost_global(params_local))\n + \".\\nInitial cost: \"\n + str(cost_global([[3.0 for i in range(wires)], [0 for i in range(wires)]]))\n + \".\\nDifference: \"\n + str(\n cost_global([[3.0 for i in range(wires)], [0 for i in range(wires)]])\n - cost_global(params_local)\n )\n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our circuit has definitely been trained, but not a useful amount. If we\nattempt to use this circuit, it would act the same as if we never\ntrained at all. Furthermore, if we now attempt to train the global cost\nfunction, we are still firmly in the plateau region. In order to fully\ntrain the global circuit, we will need to increase the locality\ngradually as we train.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def tunable_cost_simple(rotations):\n for i in range(wires):\n qml.RX(rotations[0][i], wires=i)\n qml.RY(rotations[1][i], wires=i)\n qml.broadcast(qml.CNOT, wires=range(wires), pattern=\"chain\")\n return qml.probs(range(locality))\n\ndef cost_tunable(rotations):\n return 1 - tunable_circuit(rotations)[0]\n\ndev.shots = 10000\ntunable_circuit = qml.QNode(tunable_cost_simple, dev)\nlocality = 2\nparams_tunable = params_local\nprint(cost_tunable(params_tunable))\nprint(tunable_circuit.draw())\n\nlocality = 2\nopt = qml.GradientDescentOptimizer(stepsize=0.1)\nsteps = 600\nfor i in range(steps):\n # update the circuit parameters\n params_tunable = opt.step(cost_tunable, params_tunable)\n\n runCost = cost_tunable(params_tunable)\n if (i + 1) % 10 == 0:\n print(\n \"Cost after step {:5d}: {: .7f}\".format(i + 1, runCost)\n + \". Locality: \"\n + str(locality)\n )\n\n if runCost < 0.1 and locality < wires:\n print(\"---Switching Locality---\")\n locality += 1\n continue\n elif runCost < 0.1 and locality >= wires:\n break\nprint(tunable_circuit.draw())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more thorough analysis\n========================\n\nNow the circuit can be trained, even though we started from a place\nwhere the global function has a barren plateau. The significance of this\nis that we can now train from every starting location in this example.\n\nBut, how often does this problem occur? If we wanted to train this\ncircuit from a random starting point, how often would we be stuck in a\nplateau? To investigate this, let's attempt to train the global cost\nfunction using random starting positions and count how many times we run\ninto a barren plateau.\n\nLet's use a number of qubits we are more likely to use in a real\nvariational circuit: n=10. We will say that after 400 steps, any run\nwith a cost function of less than 0.9 (chosen arbitrarily) will probably\nbe trainable given more time. Any run with a greater cost function will\nprobably be in a plateau.\n\nThis may take up to 15 minutes.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"samples = 10\nplateau = 0\ntrained = 0\nopt = qml.GradientDescentOptimizer(stepsize=0.2)\nsteps = 400\nwires = 8\n\ndev = qml.device(\"default.qubit\", wires=wires, shots=10000)\nglobal_circuit = qml.QNode(global_cost_simple, dev)\n\nfor runs in range(samples):\n print(\"--- New run! ---\")\n has_been_trained = False\n params_global = [\n [np.random.uniform(-np.pi, np.pi) for i in range(wires)],\n [np.random.uniform(-np.pi, np.pi) for i in range(wires)],\n ]\n for i in range(steps):\n # update the circuit parameters\n params_global = opt.step(cost_global, params_global)\n\n if (i + 1) % 20 == 0:\n print(\"Cost after step {:5d}: {: .7f}\".format(i + 1, cost_global(params_global)))\n if cost_global(params_global) < 0.9:\n has_been_trained = True\n break\n if has_been_trained:\n trained = trained + 1\n else:\n plateau = plateau + 1\n print(\"Trained: {:5d}\".format(trained))\n print(\"Plateau'd: {:5d}\".format(plateau))\n\n\nsamples = 10\nplateau = 0\ntrained = 0\nopt = qml.GradientDescentOptimizer(stepsize=0.2)\nsteps = 400\nwires = 8\n\ndev = qml.device(\"default.qubit\", wires=wires, shots=10000)\ntunable_circuit = qml.QNode(tunable_cost_simple, dev)\n\nfor runs in range(samples):\n locality = 1\n print(\"--- New run! ---\")\n has_been_trained = False\n params_tunable = [\n [np.random.uniform(-np.pi, np.pi) for i in range(wires)],\n [np.random.uniform(-np.pi, np.pi) for i in range(wires)],\n ]\n for i in range(steps):\n # update the circuit parameters\n params_tunable = opt.step(cost_tunable, params_tunable)\n\n runCost = cost_tunable(params_tunable)\n if (i + 1) % 10 == 0:\n print(\n \"Cost after step {:5d}: {: .7f}\".format(i + 1, runCost)\n + \". Locality: \"\n + str(locality)\n )\n\n if runCost < 0.5 and locality < wires:\n print(\"---Switching Locality---\")\n locality += 1\n continue\n elif runCost < 0.1 and locality >= wires:\n trained = trained + 1\n has_been_trained = True\n break\n if not has_been_trained:\n plateau = plateau + 1\n print(\"Trained: {:5d}\".format(trained))\n print(\"Plateau'd: {:5d}\".format(plateau))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the global case, anywhere between 70-80% of starting positions are\nuntrainable, a significant number. It is likely that, as the complexity\nof our ansatz\u2014and the number of qubits\u2014increases, this factor will\nincrease.\n\nWe can compare that to our local cost function, where every single area\ntrained, and most even trained in less time. While these examples are\nsimple, this local-vs-global cost behaviour has been shown to extend to\nmore complex problems.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"References\n==========\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}