In quantum machine learning (QML), we need to optimize certain parameters \boldsymbol{\theta} of a variational quantum ansatz | \psi (\boldsymbol{\theta}) \rangle to minimize some specified cost function C(\boldsymbol{\theta}). How we choose to optimize the parameters is one of the key steps of every QML workflow. Specifically, the choice of different optimizers can highly influence the parameters obtained after an optimization routine as discussed in our previous post. For this reason, in this How-to blog, we will learn how to implement our own custom-built optimizer.
There are many optimizers one could use, from simpler gradient-based optimizers like SGD and ADAM (see code here), high-order gradient optimizers, such as AdaHessian, and gradient-free optimizers, such as the Nelder–Mead optimizer or the Rotosolve optimizer. Sometimes, these optimizers alone can’t do the job right. For simplicity, we will only focus on one gradient-based optimizer: the random coordinate descent (RCD) optimizer.
We first need a cost function to optimize. We will choose the cost function C defined by
where n is the number of qubits, p_i(\boldsymbol{\theta}) is the probability of measuring outcome i from a quantum circuit with parameters \boldsymbol{\theta} and y_i \in \mathbb{R}. These p_i values will be obtained from a simple randomly chosen quantum circuit that we aim to optimize.
We start by importing the libraries we will use:
import pennylane as qml from pennylane import numpy as np import matplotlib.pyplot as plt import seaborn as sns sns.set_theme() np.random.seed(101)
Then, we construct the two-qubit circuit we will use to obtain p_i:
dev = qml.device("default.qubit", wires=2) @qml.qnode(dev) # The inputs of this circuit are the parameters that we will optimize. def circuit(theta): # theta is an np.array of shape (2,2). qml.RX(theta[0, 0], wires=0) qml.RX(theta[0, 1], wires=1) qml.CNOT(wires=[0, 1]) qml.RY(theta[1, 0], wires=0) qml.RY(theta[1, 1], wires=1) qml.CNOT(wires=[0, 1]) return qml.probs(wires=[0, 1])
This circuit consists of four parametrized gates and one non-trainable entangling gate. We can visualize it with the method qml.draw_mpl
:
qml.draw_mpl(circuit)(np.random.rand(2,2))
Having defined our circuit, we define the cost function and array mentioned above:
# we will pick our set of target values {y_i} randomly y_target = np.array([0.7, -0.5, 1.6, -0.6], requires_grad=False) def cost(theta): x = np.sin(theta) probs = circuit(x) return np.sum(np.abs(np.log(probs) ** (-1)) - y_target)
NOTE: If your circuit happens to have a probability distribution that is exactly 0 for some outcome, this cost
function will diverge!
This circuit ansatz has four parameters that we denote by \boldsymbol{\theta}. Our goal is to find
where \boldsymbol{\theta}_\star are the parameters that minimize our cost function C .
Using built-in optimizers in PennyLane
Before we start defining our own optimizers, let us optimize the cost
function using a built-in optimizer from PennyLane
, namely, the qml.AdamOptimizer
optimizer.
from pennylane import AdamOptimizer
Every optimizer in pennylane.optimize
has a step
function that updates the parameters (theta
in our case) of our circuit. We can use this function to optimize the cost
function defined above:
n_steps = 1500 theta = np.random.rand(2, 2, requires_grad=True) costs_list = [] opt = AdamOptimizer() for i in range(1, n_steps+1): if i%100==0: print("Running... Current step: ", i) theta = opt.step(cost, theta) costs_list.append(cost(theta))
Finally, we can plot the learning curve of this routine.
plt.plot(costs_list) plt.xlabel("Steps") plt.ylabel("Cost function") plt.title("Built-in Adam")
Random coordinate descent
Random coordinate descent (RCD) is a gradient-based optimizer, meaning that the update rule makes use of the gradient of the cost function. This method is different from vanilla gradient descent since each update does not alter all of the parameters \boldsymbol{\theta}. Rather, one update consists of changing one of the parameters \theta_i.
Initializing our parameters \boldsymbol{\theta} = (\theta_1, \dots , \theta_d) randomly, we iterate through the following update rule:
- Randomly choose an integer i \in \{ 1, \dots , d\}.
- Update the corresponding parameter: \theta_i \leftarrow \theta_i - \eta_i \frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}.
Now, what we should learn to do is to take derivatives of quantum circuits!
There are several ways of calculating \frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}. You can read more about different methods of calculating derivatives on PennyLane’s documentation. For now, we will use finite differences to calculate the gradient of our circuit. This method allows us to approximate partial derivatives as
Note that we should not inherit RCDOptimizer
from GradientDescentOptimizer
since we only need partial derivatives per step (and calculating all derivatives for each time step is unnecessary).
Let us now implement a function that calculates this update rule:
def rcd_update(theta, eta=0.01 , h=0.005): i = np.random.randint(theta.size) shape = theta.shape dt = np.eye(1,theta.size,i) # we reshape theta to match the shape of dt theta = theta.reshape(-1) dtheta = (cost((theta+dt*h).reshape(shape)) - cost((theta-dt*h).reshape(shape)))/(2*h) # we reshape back to the original shape of theta return (theta - eta*dtheta*dt).reshape(shape)
If we want to implement our own optimizer object, like every other optimizer in pennylane
, we should create a class with the following structure:
class MyOptimizer(): """This is my own optimizer""" def __init__(): """Initializes our own optimizer""" pass def step(objective_fn, theta, *args, **kwargs): """ Args: - objective_fn to be minimized - theta Returns: - updated theta """ pass def step_and_cost(objective_fn, theta, *args, **kwargs): """ Args: - objective_fn to be minimized - theta Returns: - updated theta, cost of updated theta """ pass
We have already defined a step function (rcd_update
), thus, with a small re-write we can fill this skeleton and create a new RCDOptimizer
class.
class RCDOptimizer: def __init__(self, eta=0.01 , h=0.005): self.h = h self.eta = eta def step(self, objective_fn, theta): i = np.random.randint(theta.size) shape = theta.shape dt = np.eye(1,theta.size,i) theta = theta.reshape(-1) dtheta = (cost((theta+dt*self.h).reshape(shape)) -cost((theta-dt*self.h).reshape(shape)))/(2*self.h) return (theta - self.eta*dtheta*dt).reshape(shape) def step_and_cost(self, objective_fn, theta): theta = self.step(objective_fn, theta) return theta, objective_fn(theta)
By changing one line of our previous code, we can implement this new optimizer:
n_steps = 1500 theta = np.random.rand(2, 2, requires_grad=True) costs_list = [] opt = RCDOptimizer() for i in range(1, n_steps+1): if i%100==0: print("Running... Current step: ", i) theta = opt.step(cost, theta) costs_list.append(cost(theta))
Now, let’s visualize the optimization procedure of the RCDOptimizer
.
plt.plot(costs_list) plt.xlabel("Steps") plt.ylabel("Cost function") plt.title("Random coordinate descent")
This time we have achieved a better local minimum with RCDOptimizer
! I encourage you to try different random seeds and hyper-parameters of AdamOptimizer
and RCDOptimizer
and compare what works best for this problem.
Remarks
When running these two optimization algorithms - AdamOptimizer
and RCDOptimizer
- we get sets of parameters \boldsymbol{\theta} that achieve an approximate local minimum of a given cost function. There does not exist a rule of thumb on what optimizer to use for any QML task. This is why it is important to know how to change optimizers, or in this case, implement your own optimizer! If you want to learn more about different optimizers for your QML routines, check out PennyLane’s documentation.
About the author
Luis Mantilla Calderon
Luis is a summer resident at Xanadu. He works in quantum error correction and is interested in QML, quantum compilation, and BCI technology.