How to calculate the Hessian of a classical-quantum hybrid model

Josh Izaac (PennyLane Team)

PennyLane QNodes support computing not just the first derivative, but the second derivative as well. Better yet, the second derivative can be computed on both simulators and quantum hardware.

In this how-to, we’ll show you how you can extract the Hessian of a hybrid model using PennyLane in three different ways.

Consider the following hybrid model:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(weights):
    # two layers of arbitrary single-qubit rotations plus an entangling CNOT
    for i in range(2):
        qml.Rot(*weights[i, 0], wires=0)
        qml.Rot(*weights[i, 1], wires=1)
        qml.CNOT(wires=[0, 1])
    return qml.probs(wires=0)

def f(weights, x):
    # classical pre-processing of the weights, quantum evaluation, then
    # classical post-processing of the output probabilities
    quantum = circuit(np.sin(weights))
    return np.sum(np.abs(quantum - x) / np.cos(x))

We can compute the Hessian of this cost function f,

$$\mathbf{H}(f)_{ij}(w, x) = \frac{\partial^2}{\partial w_i\partial w_j}f(w, x),$$

by taking the Jacobian of the gradient, i.e., \(\mathbf{H}(f) = \mathbf{J}(\nabla_w f)\). Let's begin by defining our trainable parameters weights and our input data x, and constructing the gradient function:

>>> weights = np.random.random(size=[2, 2, 3], requires_grad=True)
>>> x = np.array([0.54, 0.1], requires_grad=False)
>>> grad_fn = qml.grad(f)

If we want, we can evaluate our gradient function, but it’s not required!

>>> grad = grad_fn(weights, x)
>>> print(np.round(grad, 5))
[[[-0.      -0.43247  0.02584]
  [-0.      -0.00441  0.00115]]

 [[ 0.02654 -0.20502 -0.     ]
  [-0.      -0.      -0.     ]]]

We can now compute the Jacobian of our gradient function; this gives us the Hessian.

>>> hess_fn = qml.jacobian(grad_fn)
>>> H = hess_fn(weights, x)
>>> H.shape
(2, 2, 3, 2, 2, 3)

Note the shape of our Hessian: since the trainable weights have shape (2, 2, 3), and each entry of the Hessian involves two derivatives, the result carries one copy of these dimensions per derivative, giving shape (2, 2, 3, 2, 2, 3).
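If a standard two-dimensional Hessian matrix is more convenient, we can flatten these axes ourselves. A quick sketch, using the fact that the weights contain 2 × 2 × 3 = 12 trainable parameters:

>>> H_matrix = H.reshape(weights.size, weights.size)
>>> H_matrix.shape
(12, 12)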

TensorFlow

In the above, we used the built-in Autograd interface. However, the Hessian can also be computed when using the TensorFlow interface. Let's recreate our model from above, this time using TensorFlow and making sure to specify interface="tf":

import pennylane as qml
import tensorflow as tf
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="tf")
def circuit(weights):
    for i in range(2):
        # index the weights tensor elementwise; unpacking with * would require
        # iterating over the tensor, which TensorFlow disallows inside tf.function
        qml.Rot(weights[i, 0, 0], weights[i, 0, 1], weights[i, 0, 2], wires=0)
        qml.Rot(weights[i, 1, 0], weights[i, 1, 1], weights[i, 1, 2], wires=1)
        qml.CNOT(wires=[0, 1])
    return qml.probs(wires=0)

@tf.function
def f(weights, x):
    quantum = circuit(tf.sin(weights))
    return tf.reduce_sum(tf.abs(quantum - x) / tf.cos(x))

# use float64 to match the precision of the default.qubit simulator
weights = tf.Variable(np.random.random(size=[2, 2, 3]), dtype=tf.float64)
x = tf.constant([0.54, 0.1], dtype=tf.float64)

As before, we simply take the Jacobian of the gradient. Since we are computing a second derivative, we need two nested gradient tapes: the inner tape records the gradient computation, and the outer tape differentiates it again. Note that we use tape1.jacobian rather than tape1.gradient, since gradient would sum over the entries of grad instead of returning the full matrix of second derivatives:

with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        res = f(weights, x)
    grad = tape2.gradient(res, weights)
hess = tape1.jacobian(grad, weights)

We can check the shape of the Hessian, and extract the value of one of its elements:

>>> hess.shape
TensorShape([2, 2, 3, 2, 2, 3])
>>> hess[0, 1, 2, 1, 0, 1]
<tf.Tensor: shape=(), dtype=float64, numpy=2.904043724838658e-06>
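
The Hessian should also be symmetric. As a quick sanity check (a sketch, flattening the 2 × 2 × 3 = 12 trainable parameters into one axis), we can compare it to its transpose:

>>> hess_matrix = tf.reshape(hess, (12, 12))
>>> np.allclose(hess_matrix, tf.transpose(hess_matrix))
True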

PyTorch and the parameter-shift rule

If we want to extract the Hessian using PyTorch, we must use the torch.autograd.functional module. Aside from that, the process is very much the same. This time, let’s explicitly use the parameter-shift rule in order to compute the Hessian in a way that is compatible with hardware devices.
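
As a refresher, for gates such as qml.Rot whose parameters obey the two-term shift rule, the parameter-shift rule computes derivatives exactly from circuit evaluations at shifted parameter values,

$$\frac{\partial f}{\partial \theta} = \frac{f\left(\theta + \frac{\pi}{2}\right) - f\left(\theta - \frac{\pi}{2}\right)}{2},$$

and applying it twice yields second derivatives. This is what makes the approach hardware-compatible: only additional circuit executions are required, never backpropagation through the device.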

import pennylane as qml
import torch
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def circuit(weights):
    for i in range(2):
        qml.Rot(*weights[i, 0], wires=0)
        qml.Rot(*weights[i, 1], wires=1)
        qml.CNOT(wires=[0, 1])
    return qml.probs(wires=0)

def f(weights, x):
    quantum = circuit(torch.sin(weights))
    return torch.sum(torch.abs(quantum - x) / torch.cos(x))

weights = torch.tensor(np.random.random(size=[2, 2, 3]), requires_grad=True)
x = torch.tensor([0.54, 0.1], dtype=torch.float64, requires_grad=False)  # match the float64 weights
hess = torch.autograd.functional.hessian(f, (weights, x))

Note that the torch.autograd.functional.hessian function does not respect our requires_grad attributes! Instead, the output is a nested tuple of second derivatives taken with respect to both weights and x. Since we are interested in just the Hessian with respect to the weights, let's extract the (0, 0) element:

>>> hess[0][0].shape
torch.Size([2, 2, 3, 2, 2, 3])
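
The remaining entries of the nested tuple hold the mixed- and data-derivative blocks; for example, hess[0][1] contains the mixed second derivatives with respect to weights and x, with one axis per dimension of each argument:

>>> hess[0][1].shape
torch.Size([2, 2, 3, 2])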

We can also explore how many device evaluations were performed to extract the Hessian:

>>> dev.num_executions
313

To compute the Hessian, \(2p[(p-1)+1]+1 = 2p^2+1\) device evaluations are required, where \(p\) is the number of trainable parameters in the model. Further, PyTorch will always implicitly compute the Jacobian prior to computing the Hessian, requiring an additional \(2p\) evaluations. Let's verify this:

>>> p = weights.numel()
>>> 2*p * ((p-1) + 1) + 1 + 2*p
313