In quantum machine learning (QML), we often need to optimize the parameters of a variational quantum circuit to minimize a cost function.
There are many optimizers one could use, from simple gradient-based optimizers like SGD and Adam, to higher-order gradient optimizers such as AdaHessian, to gradient-free optimizers such as the Nelder–Mead or Rotosolve optimizers. Sometimes, these optimizers alone can’t do the job, and it pays to know how to build your own. For simplicity, we will focus on one gradient-based optimizer: the random coordinate descent (RCD) optimizer.
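Several of these are available directly in PennyLane. As a quick, minimal illustration (the step sizes below are arbitrary values I picked, not part of this post), they can be instantiated like so:

import pennylane as qml

# Gradient-based optimizers shipped with PennyLane.
sgd = qml.GradientDescentOptimizer(stepsize=0.1)
adam = qml.AdamOptimizer(stepsize=0.05)

# A gradient-free optimizer shipped with PennyLane.
rotosolve = qml.RotosolveOptimizer()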
We first need a cost function to optimize. We will choose the cost function

$$C(\theta) = \sum_{i} \left( \left| \frac{1}{\log p_i(x)} \right| - y_i \right), \qquad x = \sin(\theta),$$

where $p_i(x)$ are the output probabilities of a parametrized two-qubit circuit evaluated at the angles $x$, and $\{y_i\}$ is a fixed array of target values.
We start by importing the libraries we will use:
import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()
np.random.seed(101)
Then, we construct the two-qubit circuit we will use to obtain the probabilities $p_i(x)$:
dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(theta):
    # The input of this circuit is the set of parameters that we will optimize:
    # theta is an np.array of shape (2, 2).
    qml.RX(theta[0, 0], wires=0)
    qml.RX(theta[0, 1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(theta[1, 0], wires=0)
    qml.RY(theta[1, 1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])
This circuit consists of four parametrized gates and two non-trainable entangling gates. We can visualize it with the qml.draw_mpl method:
qml.draw_mpl(circuit)(np.random.rand(2,2))
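The circuit returns the probabilities of the four computational basis states of the two qubits. As a quick check (not part of the original post), we can evaluate it on random parameters:

probs = circuit(np.random.rand(2, 2))
print(probs)        # four probabilities, one per basis state
print(probs.sum())  # they sum to 1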
Having defined our circuit, we can define the cost function and the target array $\{y_i\}$ mentioned above:
# We will pick our set of target values {y_i} randomly.
y_target = np.array([0.7, -0.5, 1.6, -0.6], requires_grad=False)

def cost(theta):
    x = np.sin(theta)
    probs = circuit(x)
    return np.sum(np.abs(np.log(probs) ** (-1)) - y_target)
NOTE: If your circuit happens to output a probability of exactly 1 for some outcome, then $\log(p) = 0$ and this cost function will diverge!
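One way to guard against this (an addition of mine, not part of the original post) is to clip the probabilities away from 0 and 1 before taking the logarithm; here is a minimal sketch:

def safe_cost(theta, eps=1e-9):
    # Same cost as above, but with the probabilities clipped into [eps, 1 - eps]
    # so that log(p) is always finite and nonzero.
    x = np.sin(theta)
    probs = np.clip(circuit(x), eps, 1 - eps)
    return np.sum(np.abs(np.log(probs) ** (-1)) - y_target)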
This circuit ansatz has four parameters, which we denote by $x_{jk} = \sin(\theta_{jk})$, where $\theta$ is the $2 \times 2$ array of trainable parameters that we will optimize.
Using built-in optimizers in PennyLane
Before we start defining our own optimizers, let us optimize the cost function using a built-in optimizer from PennyLane, namely, the qml.AdamOptimizer.
from pennylane import AdamOptimizer
Every optimizer in pennylane.optimize has a step function that updates the parameters (theta in our case) of our circuit. We can use this function to optimize the cost function defined above:
n_steps = 1500
theta = np.random.rand(2, 2, requires_grad=True)
costs_list = []
opt = AdamOptimizer()

for i in range(1, n_steps + 1):
    if i % 100 == 0:
        print("Running... Current step: ", i)
    theta = opt.step(cost, theta)
    costs_list.append(cost(theta))
Finally, we can plot the learning curve of this routine.
plt.plot(costs_list)
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.title("Built-in Adam")
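As a small variation (not used in the loop above), every PennyLane optimizer also provides step_and_cost, which returns the updated parameters together with the objective value it evaluated at the pre-update parameters, so we can skip the extra cost(theta) call per iteration. The _alt variable names are mine, so the results above are not overwritten:

theta_alt = np.random.rand(2, 2, requires_grad=True)
costs_alt = []
opt_alt = AdamOptimizer()

for i in range(1, n_steps + 1):
    # step_and_cost returns the updated parameters and the cost evaluated
    # at the parameters *before* the update.
    theta_alt, prev_cost = opt_alt.step_and_cost(cost, theta_alt)
    costs_alt.append(prev_cost)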
Random coordinate descent
Random coordinate descent (RCD) is a gradient-based optimizer, meaning that the update rule makes use of the gradient of the cost function. This method differs from vanilla gradient descent in that each update does not alter all of the parameters $\theta$, but only a single, randomly chosen one.
Initializing our parameters $\theta$ randomly, at each step we:

- Randomly choose an integer $i \in \{1, \dots, d\}$, where $d$ is the number of parameters.
- Update the corresponding parameter: $\theta_i \leftarrow \theta_i - \eta \, \partial_{\theta_i} C(\theta)$.

We repeat these two steps for a fixed number of iterations (or until the cost converges).
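To make the update rule concrete before touching quantum circuits, here is a minimal classical sketch of these two steps on a toy quadratic cost (the function, step size, and variable names are arbitrary choices of mine):

def toy_cost(theta):
    # A simple quadratic with its minimum at (1, -2, 3).
    return (theta[0] - 1) ** 2 + (theta[1] + 2) ** 2 + (theta[2] - 3) ** 2

target = np.array([1.0, -2.0, 3.0])
theta_toy = np.zeros(3)
eta = 0.1

for _ in range(200):
    i = np.random.randint(theta_toy.size)       # randomly choose a coordinate
    grad_i = 2 * (theta_toy[i] - target[i])     # exact partial derivative of toy_cost
    theta_toy[i] = theta_toy[i] - eta * grad_i  # update only that coordinate

print(theta_toy, toy_cost(theta_toy))  # theta_toy approaches [1, -2, 3]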
Now, what we need to learn is how to take derivatives of quantum circuits!
There are several ways of calculating the partial derivative $\partial_{\theta_i} C(\theta)$ in PennyLane, such as the parameter-shift rule or automatic differentiation. For simplicity, we will approximate it here with a central finite difference, $\partial_{\theta_i} C(\theta) \approx \frac{C(\theta + h\, e_i) - C(\theta - h\, e_i)}{2h}$, where $e_i$ is the unit vector along the $i$-th coordinate and $h$ is a small step size.
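As a sanity check (my own addition, not in the original post), we can compare this finite-difference estimate for one coordinate against the gradient PennyLane computes automatically with qml.grad:

theta_check = np.random.rand(2, 2, requires_grad=True)
h = 0.005

# Finite-difference estimate of the partial derivative with respect to theta[0, 1].
shift = np.zeros((2, 2), requires_grad=False)
shift[0, 1] = h
fd = (cost(theta_check + shift) - cost(theta_check - shift)) / (2 * h)

# Automatic gradient of the same entry, for comparison.
exact = qml.grad(cost)(theta_check)[0, 1]
print(fd, exact)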
Note that we should not inherit RCDOptimizer from GradientDescentOptimizer, since we only need one partial derivative per step (and calculating all of the derivatives at every step would be unnecessary).
Let us now implement a function that calculates this update rule:
def rcd_update(theta, eta=0.01, h=0.005):
    # Randomly choose which parameter to update.
    i = np.random.randint(theta.size)
    shape = theta.shape
    # One-hot row vector of shape (1, theta.size) selecting coordinate i.
    dt = np.eye(1, theta.size, i)
    # We reshape theta to match the shape of dt.
    theta = theta.reshape(-1)
    # Central finite-difference estimate of the i-th partial derivative.
    dtheta = (cost((theta + dt * h).reshape(shape))
              - cost((theta - dt * h).reshape(shape))) / (2 * h)
    # We reshape back to the original shape of theta.
    return (theta - eta * dtheta * dt).reshape(shape)
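We can already try this update rule on its own (a quick check of mine, not in the original post):

theta_try = np.random.rand(2, 2, requires_grad=True)
print("initial cost:", cost(theta_try))

for _ in range(100):
    theta_try = rcd_update(theta_try)

print("cost after 100 RCD updates:", cost(theta_try))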
If we want to implement our own optimizer object, like every other optimizer in pennylane, we should create a class with the following structure:
class MyOptimizer():
    """This is my own optimizer"""

    def __init__(self):
        """Initializes our own optimizer"""
        pass

    def step(self, objective_fn, theta, *args, **kwargs):
        """
        Args:
            - objective_fn: function to be minimized
            - theta: current parameters
        Returns:
            - updated theta
        """
        pass

    def step_and_cost(self, objective_fn, theta, *args, **kwargs):
        """
        Args:
            - objective_fn: function to be minimized
            - theta: current parameters
        Returns:
            - updated theta, cost of updated theta
        """
        pass
We have already defined a step function (rcd_update); thus, with a small rewrite, we can fill in this skeleton and create a new RCDOptimizer class.
class RCDOptimizer:
    def __init__(self, eta=0.01, h=0.005):
        self.h = h
        self.eta = eta

    def step(self, objective_fn, theta):
        # Randomly choose which parameter to update.
        i = np.random.randint(theta.size)
        shape = theta.shape
        dt = np.eye(1, theta.size, i)
        theta = theta.reshape(-1)
        # Central finite-difference estimate of the i-th partial derivative.
        dtheta = (objective_fn((theta + dt * self.h).reshape(shape))
                  - objective_fn((theta - dt * self.h).reshape(shape))) / (2 * self.h)
        return (theta - self.eta * dtheta * dt).reshape(shape)

    def step_and_cost(self, objective_fn, theta):
        theta = self.step(objective_fn, theta)
        return theta, objective_fn(theta)
By changing one line of our previous code, we can implement this new optimizer:
n_steps = 1500
theta = np.random.rand(2, 2, requires_grad=True)
costs_list = []
opt = RCDOptimizer()

for i in range(1, n_steps + 1):
    if i % 100 == 0:
        print("Running... Current step: ", i)
    theta = opt.step(cost, theta)
    costs_list.append(cost(theta))
Now, let’s visualize the optimization procedure of the RCDOptimizer.
plt.plot(costs_list)
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.title("Random coordinate descent")
This time, we have achieved a better local minimum with RCDOptimizer! I encourage you to try different random seeds and hyperparameters of AdamOptimizer and RCDOptimizer and compare what works best for this problem.
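If you keep the two learning curves in separate lists, you can also compare the optimizers on the same axes. The names adam_costs and rcd_costs below are my own; in the loops above both runs reuse costs_list, so you would store them separately first:

plt.plot(adam_costs, label="Adam")
plt.plot(rcd_costs, label="RCD")
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.legend()
plt.title("Adam vs. random coordinate descent")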
Remarks
When running these two optimization algorithms - AdamOptimizer and RCDOptimizer - we get different sets of parameters, each converging to its own local minimum of the cost function.
About the author
Luis Mantilla Calderon
Luis is a summer resident at Xanadu. He works in quantum error correction and is interested in QML, quantum compilation, and BCI technology.