How to create your own optimizer

Luis Mantilla (Xanadu resident)

In quantum machine learning (QML), we need to optimize certain parameters \(\boldsymbol{\theta}\) of a variational quantum ansatz \(| \psi (\boldsymbol{\theta}) \rangle\) to minimize some specified cost function \(C(\boldsymbol{\theta})\). How we choose to optimize the parameters is one of the key steps of every QML workflow. In particular, the choice of optimizer can strongly influence the parameters obtained after an optimization routine, as discussed in our previous post. For this reason, in this how-to blog, we will learn how to implement our own custom-built optimizer.

There are many optimizers one could use, from simple gradient-based optimizers like SGD and Adam (see code here), to higher-order gradient optimizers such as AdaHessian, to gradient-free optimizers such as the Nelder–Mead or Rotosolve optimizers. Sometimes these optimizers alone can't get the job done, so it pays to know how to build your own. For simplicity, we will focus on a single gradient-based optimizer: the random coordinate descent (RCD) optimizer.
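
Several of these are already available as classes in PennyLane. As a quick sketch (the step-size values below are only illustrative), they can be instantiated like this:

import pennylane as qml

sgd = qml.GradientDescentOptimizer(stepsize=0.1)  # vanilla gradient descent
adam = qml.AdamOptimizer(stepsize=0.01)           # adaptive, gradient-based
rotosolve = qml.RotosolveOptimizer()              # gradient-free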

We first need a cost function to optimize. We will choose the cost function \(C\) defined by

$$ C(\boldsymbol{\theta}) = \sum_{i \in [2^n]} |\log(p_i(\boldsymbol{\theta}))^{-1} - y_i|, $$

where \(n\) is the number of qubits, \(p_i(\boldsymbol{\theta})\) is the probability of measuring outcome \(i\) from a quantum circuit with parameters \(\boldsymbol{\theta}\), and \(y_i \in \mathbb{R}\) are target values that we will pick randomly. The probabilities \(p_i\) will be obtained from a simple two-qubit circuit whose parameters we aim to optimize.

We start by importing the libraries we will use:

import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
np.random.seed(101)

Then, we construct the two-qubit circuit we will use to obtain \(p_i\):

dev = qml.device("default.qubit", wires=2)

# The inputs of this circuit are the parameters that we will optimize.
@qml.qnode(dev)
def circuit(theta):
    # theta is an np.array of shape (2, 2).
    qml.RX(theta[0, 0], wires=0)
    qml.RX(theta[0, 1], wires=1)

    qml.CNOT(wires=[0, 1])

    qml.RY(theta[1, 0], wires=0)
    qml.RY(theta[1, 1], wires=1)

    qml.CNOT(wires=[0, 1])
    return qml.probs(wires=[0, 1])

This circuit consists of four parametrized single-qubit rotations and two non-trainable entangling gates. We can visualize it with qml.draw_mpl:

qml.draw_mpl(circuit)(np.random.rand(2,2))
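
If you prefer a text-based diagram, qml.draw works the same way and prints directly to the terminal:

print(qml.draw(circuit)(np.random.rand(2, 2)))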

Having defined our circuit, we define the cost function and array mentioned above:

# we will pick our set of target values {y_i} randomly
y_target = np.array([0.7, -0.5, 1.6, -0.6], requires_grad=False)

def cost(theta):
    x = np.sin(theta)
    probs = circuit(x)
    return np.sum(np.abs(np.log(probs) ** (-1) - y_target))

NOTE: If your circuit happens to assign probability exactly 1 to some outcome, then \(\log(p_i) = 0\) and this cost function diverges! (Outcomes with probability exactly 0 also make np.log return -inf and emit warnings.)
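
If this becomes a problem, one possible workaround is to clip the probabilities away from 0 and 1 before taking the logarithm. Below is a minimal sketch of such a guarded cost function (the name safe_cost and the bound eps are our own choices, not part of the original recipe):

def safe_cost(theta, eps=1e-6):
    x = np.sin(theta)
    # keep probabilities strictly inside (0, 1) so log(p) is finite and nonzero
    probs = np.clip(circuit(x), eps, 1 - eps)
    return np.sum(np.abs(np.log(probs) ** (-1) - y_target))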

This circuit ansatz has four parameters that we denote by \(\boldsymbol{\theta}\). Our goal is to find

$$\boldsymbol{\theta}_\star = \operatorname{argmin}_{\boldsymbol{\theta}}\{ C(\boldsymbol{\theta}) \}, $$

where \(\boldsymbol{\theta}_\star\) are the parameters that minimize our cost function \(C\).

Using built-in optimizers in PennyLane

Before we start defining our own optimizer, let us optimize the cost function using a built-in optimizer from PennyLane, namely qml.AdamOptimizer.

from pennylane import AdamOptimizer

Every optimizer in pennylane.optimize has a step function that updates the parameters (theta in our case) of our circuit. We can use this function to optimize the cost function defined above:

n_steps = 1500
theta = np.random.rand(2, 2, requires_grad=True)
costs_list = []
opt = AdamOptimizer()
for i in range(1, n_steps+1):
    if i%100==0: print("Running... Current step: ", i)
    theta = opt.step(cost, theta)
    costs_list.append(cost(theta))

Finally, we can plot the learning curve of this routine.

plt.plot(costs_list)
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.title("Built-in Adam")

Random coordinate descent

Random coordinate descent (RCD) is a gradient-based optimizer, meaning that the update rule makes use of the gradient of the cost function. This method is different from vanilla gradient descent since each update does not alter all of the parameters \(\boldsymbol{\theta}\). Rather, one update consists of changing one of the parameters \(\theta_i\).

Initializing our parameters \(\boldsymbol{\theta} = (\theta_1, \dots , \theta_d)\) randomly, we iterate through the following update rule:

  1. Randomly choose an integer \(i \in \{ 1, \dots , d\}\).
  2. Update the corresponding parameter: \(\theta_i \leftarrow \theta_i - \eta_i \frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}\), where \(\eta_i\) is a learning rate (in our implementation we will use the same \(\eta\) for every coordinate).

All that is left is to learn how to take derivatives of quantum circuits!

There are several ways of calculating \(\frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}\). You can read more about the different methods of calculating derivatives in PennyLane's documentation. For now, we will use finite differences, which approximate the partial derivatives as

$$\frac{\partial C}{\partial \theta_i} \approx \frac{C(\boldsymbol{\theta} + h\boldsymbol{e_i}) - C(\boldsymbol{\theta} - h\boldsymbol{e_i}) }{2h}. $$
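
As a sanity check, we can compare a finite-difference estimate with the gradient computed automatically by qml.grad. The test point and step size below are arbitrary; here we check the partial derivative with respect to theta[0, 0]:

theta_test = np.random.rand(2, 2, requires_grad=True)
h = 0.005

# finite-difference estimate of dC/dtheta[0, 0]
shift = np.zeros((2, 2))
shift[0, 0] = h
fd = (cost(theta_test + shift) - cost(theta_test - shift)) / (2 * h)

# automatic gradient computed by PennyLane
exact = qml.grad(cost)(theta_test)[0, 0]

print(fd, exact)  # the two values should agree up to O(h^2)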

Note that we should not make RCDOptimizer inherit from GradientDescentOptimizer, since that optimizer computes the full gradient at every step, whereas RCD only needs a single partial derivative per step.

Let us now implement a function that calculates this update rule:

def rcd_update(theta, eta=0.01, h=0.005):
    # pick a random coordinate of theta
    i = np.random.randint(theta.size)
    shape = theta.shape
    # unit (row) vector with a 1 in the chosen coordinate
    dt = np.eye(1, theta.size, i)
    # flatten theta so it broadcasts with dt
    theta = theta.reshape(-1)
    # central finite-difference estimate of the chosen partial derivative
    dtheta = (cost((theta + dt * h).reshape(shape))
              - cost((theta - dt * h).reshape(shape))) / (2 * h)
    # update the chosen coordinate and reshape back to the original shape
    return (theta - eta * dtheta * dt).reshape(shape)
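
As a quick usage example (with arbitrary hyperparameters), we can already run a few updates with this helper before wrapping it in a class:

theta = np.random.rand(2, 2, requires_grad=True)
for _ in range(200):
    theta = rcd_update(theta, eta=0.01, h=0.005)
print(cost(theta))  # the cost typically decreases from its initial value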

To implement our own optimizer object, in the same style as the other optimizers in PennyLane, we should create a class with the following structure:

class MyOptimizer:
    """This is my own optimizer"""
    def __init__(self):
        """Initializes our own optimizer"""
        pass

    def step(self, objective_fn, theta, *args, **kwargs):
        """
        Args: - objective_fn to be minimized
              - theta
        Returns:
              - updated theta
        """
        pass

    def step_and_cost(self, objective_fn, theta, *args, **kwargs):
        """
        Args: - objective_fn to be minimized
              - theta
        Returns:
              - updated theta, cost of updated theta
        """
        pass

We have already defined a step function (rcd_update), so with a small rewrite we can fill in this skeleton and create a new RCDOptimizer class.

class RCDOptimizer:
    def __init__(self, eta=0.01, h=0.005):
        self.h = h
        self.eta = eta

    def step(self, objective_fn, theta):
        i = np.random.randint(theta.size)
        shape = theta.shape
        dt = np.eye(1, theta.size, i)
        theta = theta.reshape(-1)
        dtheta = (objective_fn((theta + dt * self.h).reshape(shape))
                  - objective_fn((theta - dt * self.h).reshape(shape))) / (2 * self.h)
        return (theta - self.eta * dtheta * dt).reshape(shape)

    def step_and_cost(self, objective_fn, theta):
        theta = self.step(objective_fn, theta)
        return theta, objective_fn(theta)

By changing a single line of our previous code (the optimizer instantiation), we can use this new optimizer:

n_steps = 1500
theta = np.random.rand(2, 2, requires_grad=True)
costs_list = []
opt = RCDOptimizer()
for i in range(1, n_steps+1):
    if i%100==0: print("Running... Current step: ", i)
    theta = opt.step(cost, theta)
    costs_list.append(cost(theta))
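
Since our class also provides step_and_cost, the loop body could equivalently record the cost returned by that method instead of calling cost a second time:

for i in range(1, n_steps + 1):
    theta, current_cost = opt.step_and_cost(cost, theta)
    costs_list.append(current_cost)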

Now, let’s visualize the optimization procedure of the RCDOptimizer.

plt.plot(costs_list)
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.title("Random coordinate descent")

This time we have achieved a better local minimum with RCDOptimizer! I encourage you to try different random seeds and hyper-parameters of AdamOptimizer and RCDOptimizer and compare what works best for this problem.
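
To compare the two optimizers side by side, one option is to store each learning curve under its own name and plot them on the same axes. The names adam_costs and rcd_costs below are hypothetical; they assume the two cost lists from the runs above were saved separately:

plt.plot(adam_costs, label="Adam")
plt.plot(rcd_costs, label="Random coordinate descent")
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.legend()
plt.title("Adam vs. RCD")
plt.show()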

Remarks

When running these two optimization algorithms, AdamOptimizer and RCDOptimizer, we obtain sets of parameters \(\boldsymbol{\theta}\) that achieve an approximate local minimum of the given cost function. There is no general rule of thumb for which optimizer to use in a given QML task. This is why it is important to know how to switch optimizers or, as in this case, implement your own! If you want to learn more about the different optimizers available for your QML routines, check out PennyLane's documentation.

Luis Mantilla

Luis is an MSc student at the University of British Columbia, where he is studying how to implement a quantum recurrent neural network. This summer, he is working in the architecture team as part of Xanadu's residency program.