{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adjoint Differentiation\n=======================\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Author: PennyLane dev team. Posted: 23 Nov 2021. Last updated: 23 Nov\n2021.*\n\n[Classical automatic\ndifferentiation](https://en.wikipedia.org/wiki/Automatic_differentiation#The_chain_rule,_forward_and_reverse_accumulation)\nhas two methods of calculation: forward and reverse. The optimal choice\nof method depends on the structure of the problem; is the function\nmany-to-one or one-to-many? We use the properties of the problem to\noptimize how we calculate derivatives.\n\nMost methods for calculating the derivatives of quantum circuits are\neither direct applications of classical gradient methods to quantum\nsimulations, or quantum hardware methods like parameter-shift where we\ncan only extract restricted pieces of information.\n\nAdjoint differentiation straddles these two strategies, taking benefits\nfrom each. On simulators, we can examine and modify the state vector at\nany point. At the same time, we know our quantum circuit holds specific\nproperties not present in an arbitrary classical computation.\n\nQuantum circuits only involve:\n\n1) initialization,\n\n$$|0\\rangle,$$\n\n2) application of unitary operators,\n\n$$|\\Psi\\rangle = U_{n} U_{n-1} \\dots U_0 |0\\rangle,$$\n\n3) measurement, such as estimating an expectation value of a Hermitian\n operator,\n\n$$\\langle M \\rangle = \\langle \\Psi | M | \\Psi \\rangle.$$\n\nSince all our operators are unitary, we can easily \"undo\" or \"erase\"\nthem by applying their adjoint:\n\n$$U^{\\dagger} U | \\phi \\rangle = |\\phi\\rangle.$$\n\nThe **adjoint differentiation method** takes advantage of the ability to\nerase, creating a time- and memory-efficient method for computing\nquantum gradients on state vector simulators. Tyson Jones and Julien\nGacon describe this algorithm in their paper [\"Efficient calculation of\ngradients in classical simulations of variational quantum\nalgorithms\"](https://arxiv.org/abs/2009.02823) .\n\nIn this demo, you will learn how adjoint differentiation works and how\nto request it for your PennyLane QNode. We will also look at the\nperformance benefits.\n\nTime for some code\n==================\n\nSo how does it work? Instead of jumping straight to the algorithm, let's\nexplore the above equations and their implementation in a bit more\ndetail.\n\nTo start, we import PennyLane and PennyLane's numpy:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pennylane as qml\nfrom pennylane import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also need a circuit to simulate:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dev = qml.device('default.qubit', wires=2)\n\nx = np.array([0.1, 0.2, 0.3])\n\n@qml.qnode(dev, diff_method=\"adjoint\")\ndef circuit(a):\n qml.RX(a[0], wires=0)\n qml.CNOT(wires=(0,1))\n qml.RY(a[1], wires=1)\n qml.RZ(a[2], wires=1)\n return qml.expval(qml.PauliX(wires=1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fast c++ simulator device `\"lightning.qubit\"` also supports adjoint\ndifferentiation, but here we want to quickly prototype a minimal version\nto illustrate how the algorithm works. We recommend performing adjoint\ndifferentiation on `\"lightning.qubit\"` for substantial performance\nincreases.\n\nWe will use the `circuit` QNode just for comparison purposes. Throughout\nthis demo, we will instead use a list of its operations `ops` and a\nsingle observable `M`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n_gates = 4\nn_params = 3\n\nops = [\n qml.RX(x[0], wires=0),\n qml.CNOT(wires=(0,1)),\n qml.RY(x[1], wires=1),\n qml.RZ(x[2], wires=1)\n]\nM = qml.PauliX(wires=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create our state by using the `\"default.qubit\"` methods\n`_create_basis_state` and `_apply_operation`.\n\nThese are private methods that you shouldn't typically use and are\nsubject to change without a deprecation period, but we use them here to\nillustrate the algorithm.\n\nInternally, the device uses a 2x2x2x... array to represent the state,\nwhereas the measurement `qml.state()` and the device attribute\n`dev.state` flatten this internal representation.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"state = dev._create_basis_state(0)\n\nfor op in ops:\n state = dev._apply_operation(state, op)\n \nprint(state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can think of the expectation $\\langle M \\rangle$ as an inner product\nbetween a bra and a ket:\n\n$$\\langle M \\rangle = \\langle b | k \\rangle = \\langle \\Psi | M | \\Psi \\rangle,$$\n\nwhere\n\n$$\\langle b | = \\langle \\Psi| M = \\langle 0 | U_0^{\\dagger} \\dots U_n^{\\dagger} M,$$\n\n$$| k \\rangle = |\\Psi \\rangle = U_n U_{n-1} \\dots U_0 |0\\rangle.$$\n\nWe could have attached $M$, a Hermitian observable ($M^{\\dagger}=M$), to\neither the bra or the ket, but attaching it to the bra side will be\nuseful later.\n\nUsing the `state` calculated above, we can create these $|b\\rangle$ and\n$|k\\rangle$ vectors.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bra = dev._apply_operation(state, M)\nket = state"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we use `np.vdot` to take their inner product. `np.vdot` sums over\nall dimensions and takes the complex conjugate of the first input.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"M_expval = np.vdot(bra, ket)\nprint(\"vdot : \", M_expval)\nprint(\"QNode : \", circuit(x))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We got the same result via both methods! This validates our use of\n`vdot` and device methods.\n\nBut the dividing line between what makes the \"bra\" and \"ket\" vector is\nactually fairly arbitrary. We can divide the two vectors at any point\nfrom one $\\langle 0 |$ to the other $|0\\rangle$. For example, we could\nhave used\n\n$$\\langle b_n | = \\langle 0 | U_1^{\\dagger} \\dots U_n^{\\dagger} M U_n,$$\n\n$$|k_n \\rangle = U_{n-1} \\dots U_1 |0\\rangle,$$\n\nand gotten the exact same results. Here, the subscript $n$ is used to\nindicate that $U_n$ was moved to the bra side of the expression. Let's\ncalculate that instead:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bra_n = dev._create_basis_state(0)\n\nfor op in ops:\n bra_n = dev._apply_operation(bra_n, op)\nbra_n = dev._apply_operation(bra_n, M)\nbra_n = dev._apply_operation(bra_n, ops[-1].inv())\n\nops[-1].inv() # returning the operation to an uninverted state\n\nket_n = dev._create_basis_state(0)\n\nfor op in ops[:-1]: # don't apply last operation\n ket_n = dev._apply_operation(ket_n, op)\n\nM_expval_n = np.vdot(bra_n, ket_n)\nprint(M_expval_n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Same answer!\n\nWe can calculate this in a more efficient way if we already have the\ninitial `state` $| \\Psi \\rangle$. To shift the splitting point, we don't\nhave to recalculate everything from scratch. We just remove the\noperation from the ket and add it to the bra:\n\n$$\\langle b_n | = \\langle b | U_n,$$\n\n$$|k_n\\rangle = U_n^{\\dagger} |k\\rangle .$$\n\nFor the ket vector, you can think of $U_n^{\\dagger}$ as \"eating\" its\ncorresponding unitary from the vector, erasing it from the list of\noperations.\n\nOf course, we actually work with the conjugate transpose of\n$\\langle b_n |$,\n\n$$|b_n\\rangle = U_n^{\\dagger} | b \\rangle.$$\n\nOnce we write it in this form, we see that the adjoint of the operation\n$U_n^{\\dagger}$ operates on both $|k_n\\rangle$ and $|b_n\\rangle$ to move\nthe splitting point right.\n\nLet's call this the \"version 2\" method.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bra_n_v2 = dev._apply_operation(state, M)\nket_n_v2 = state\n\nops[-1].inv()\n\nbra_n_v2 = dev._apply_operation(bra_n_v2, ops[-1])\nket_n_v2 = dev._apply_operation(ket_n_v2, ops[-1])\n\nops[-1].inv()\n\nM_expval_n_v2 = np.vdot(bra_n_v2, ket_n_v2)\nprint(M_expval_n_v2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Much simpler!\n\nWe can easily iterate over all the operations to show that the same\nresult occurs no matter where you split the operations:\n\n$$\\langle b_i | = \\langle b_{i+1}| U_{i},$$\n\n$$|k_{i+1} \\rangle = U_{i} |k_{i}\\rangle.$$\n\nRewritten, we have our iteration formulas\n\n$$| b_i \\rangle = U_i^{\\dagger} |b_{i+1}\\rangle,$$\n\n$$| k_i \\rangle = U_i^{\\dagger} |k_{i+1}\\rangle.$$\n\nFor each iteration, we move an operation from the ket side to the bra\nside. We start near the center at $U_n$ and reverse through the\noperations list until we reach $U_0$.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bra_loop = dev._apply_operation(state, M)\nket_loop = state\n\nfor op in reversed(ops):\n op.inv()\n bra_loop = dev._apply_operation(bra_loop, op)\n ket_loop = dev._apply_operation(ket_loop, op)\n op.inv()\n print(np.vdot(bra_loop, ket_loop))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally to Derivatives!\n=======================\n\nWe showed how to calculate the same thing a bunch of different ways. Why\nis this useful? Wherever we cut, we can stick additional things in the\nmiddle. What are we sticking in the middle? The derivative of a gate.\n\nFor simplicity's sake, assume each unitary operation $U_i$ is a function\nof a single parameter $\\theta_i$. For non-parametrized gates like CNOT,\nwe say its derivative is zero. We can also generalize the algorithm to\nmulti-parameter gates, but we leave those out for now.\n\nRemember that each parameter occurs twice in $\\langle M \\rangle$: once\nin the bra and once in the ket. Therefore, we use the product rule to\ntake the derivative with respect to both locations:\n\n$$\\frac{\\partial \\langle M \\rangle}{\\partial \\theta_i} = \n \\langle 0 | U_1^{\\dagger} \\dots \\frac{\\text{d} U_i^{\\dagger}}{\\text{d} \\theta_i} \\dots M \\dots U_i \\dots U_1 | 0\\rangle.$$\n\n$$+ \\langle 0 | U_1^{\\dagger} \\dots U_i^{\\dagger} \\dots M \\dots \\frac{\\text{d} U_i}{\\text{d} \\theta_i} \\dots U_1 |0\\rangle$$\n\nWe can now notice that those two components are complex conjugates of\neach other, so we can further simplify. Note that each term is not an\nexpectation value of a Hermitian observable, and therefore not\nguaranteed to be real. When we add them together, the imaginary part\ncancels out, and we obtain twice the value of the real part:\n\n$$= 2 \\cdot \\text{Re}\\left( \\langle 0 | U_1^{\\dagger} \\dots U_i^{\\dagger} \\dots M \\dots \\frac{\\text{d} U_i}{\\text{d} \\theta_i} \\dots U_1 |0\\rangle \\right).$$\n\nWe can take that formula and break it into its \"bra\" and \"ket\" halves\nfor a derivative at the $i$ th position:\n\n$$\\frac{\\partial \\langle M \\rangle }{\\partial \\theta_i } = \n2 \\text{Re} \\left( \\langle b_i | \\frac{\\text{d} U_i }{\\text{d} \\theta_i} | k_i \\rangle \\right)$$\n\nwhere\n\n$$\\langle b_i | = \\langle 0 | U_1^{\\dagger} \\dots U_n^{\\dagger} M U_n \\dots U_{i+1},$$\n\nNotice that $U_i$ does not appear in either the bra or the ket in the\nabove equations. These formulas differ from the ones we used when just\ncalculating the expectation value. For the actual derivative\ncalculation, we use a temporary version of the bra,\n\n$$\\langle \\tilde{b}_i | = \\langle b_i | \\frac{\\text{d} U_i}{\\text{d} \\theta_i},$$\n\nand use these to get the derivative\n\n$$\\frac{\\partial \\langle M \\rangle}{\\partial \\theta_i} = 2 \\text{Re}\\left( \\langle \\tilde{b}_i | k_i \\rangle \\right).$$\n\nBoth the bra and the ket can be calculated recursively:\n\n$$| b_{i} \\rangle = U^{\\dagger}_{i+1} |b_{i+1}\\rangle,$$\n\n$$| k_{i} \\rangle = U^{\\dagger}_{i} |k_{i+1}\\rangle.$$\n\nWe can iterate through the operations starting at $n$ and ending at $1$.\n\nWe do have to calculate initial state first, the \"forward\" pass:\n\n$$|\\Psi\\rangle = U_{n} U_{n-1} \\dots U_0 |0\\rangle.$$\n\nOnce we have that, we only have about the same amount of work to\ncalculate all the derivatives, instead of quadratically more work.\n\nDerivative of an Operator\n-------------------------\n\nOne final thing before we get back to coding: how do we get the\nderivative of an operator?\n\nMost parametrized gates can be represented in terms of Pauli Rotations,\nwhich can be written as\n\n$$U = e^{i c \\hat{G} \\theta}$$\n\nfor a Pauli matrix $\\hat{G}$, a constant $c$, and the parameter\n$\\theta$. Thus we can easily calculate their derivatives:\n\n$$\\frac{\\text{d} U}{\\text{d} \\theta} = i c \\hat{G} e^{i c \\hat{G} \\theta} = i c \\hat{G} U .$$\n\nLuckily, PennyLane already has a built-in function for calculating this.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"grad_op0 = qml.operation.operation_derivative(ops[0])\nprint(grad_op0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now for calculating the full derivative using the adjoint method!\n\nWe loop over the reversed operations, just as before. But if the\noperation has a parameter, we calculate its derivative and append it to\na list before moving on. Since the `operation_derivative` function spits\nback out a matrix instead of an operation, we have to use\n`dev._apply_unitary` instead to create $|\\tilde{b}_i\\rangle$.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bra = dev._apply_operation(state, M)\nket = state\n\ngrads = []\n\nfor op in reversed(ops):\n op.inv()\n ket = dev._apply_operation(ket, op)\n \n # Calculating the derivative\n if op.num_params != 0:\n dU = qml.operation.operation_derivative(op)\n \n bra_temp = dev._apply_unitary(bra, dU, op.wires)\n \n dM = 2 * np.real(np.vdot(bra_temp, ket))\n grads.append(dM)\n \n bra = dev._apply_operation(bra, op)\n op.inv()\n\n\n# Finally reverse the order of the gradients\n# since we calculated them in reverse\ngrads = grads[::-1]\n\nprint(\"our calculation: \", grads)\n\ngrad_compare = qml.grad(circuit)(x)\nprint(\"comparison: \", grad_compare)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It matches!!!\n\nIf you want to use adjoint differentiation without having to code up\nyour own method that can support arbitrary circuits, you can use\n`diff_method=\"adjoint\"` in PennyLane with `\"default.qubit\"` or\nPennyLane's fast C++ simulator `\"lightning.qubit\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dev_lightning = qml.device('lightning.qubit', wires=2)\n\n@qml.qnode(dev_lightning, diff_method=\"adjoint\")\ndef circuit_adjoint(a):\n qml.RX(a[0], wires=0)\n qml.CNOT(wires=(0,1))\n qml.RY(a[1], wires=1)\n qml.RZ(a[2], wires=1)\n return qml.expval(M)\n\nqml.grad(circuit_adjoint)(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Performance\n===========\n\nThe algorithm gives us the correct answers, but is it worth using?\nParameter-shift gradients require at least two executions per parameter,\nso that method gets more and more expensive with the size of the\ncircuit, especially on simulators. Backpropagation demonstrates decent\ntime scaling, but requires more and more memory as the circuit gets\nlarger. Simulation of large circuits is already RAM-limited, and\nbackpropagation constrains the size of possible circuits even more.\nPennyLane also achieves backpropagation derivatives from a Python\nsimulator and interface-specific functions. The `\"lightning.qubit\"`\ndevice does not support backpropagation, so backpropagation derivatives\nlose the speedup from an optimized simulator.\n\nWith adjoint differentiation on `\"lightning.qubit\"`, you can get the\nbest of both worlds: fast and memory efficient.\n\nBut how fast? The provided script\n[here](https://pennylane.ai/qml/demos/adjoint_diff_benchmarking.html)\ngenerated the following images on a mid-range laptop. The\nbackpropagation times were produced with the Python simulator\n`\"default.qubit\"`, while parameter-shift and adjoint differentiation\ntimes were calculated with `\"lightning.qubit\"`. The adjoint method\nclearly wins out for performance.\n\n![](../demonstrations/adjoint_diff/scaling.png){width=\"80%\"}\n\nConclusions\n===========\n\nSo what have we learned? Adjoint differentiation is an efficient method\nfor differentiating quantum circuits with state vector simulation. It\nscales nicely in time without excessive memory requirements. Now that\nyou've seen how the algorithm works, you can better understand what is\nhappening when you select adjoint differentiation from one of\nPennyLane's simulators.\n\nBibliography\n============\n\nJones and Gacon. Efficient calculation of gradients in classical\nsimulations of variational quantum algorithms.\n[](https://arxiv.org/abs/2009.02823)\n\nXiu-Zhe Luo, Jin-Guo Liu, Pan Zhang, and Lei Wang. Yao.jl: [Extensible,\nefficient framework for quantum algorithm\ndesign](https://quantum-journal.org/papers/q-2020-10-11-341/) , 2019\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}