In quantum machine learning (QML), we need to optimize certain parameters \(\boldsymbol{\theta}\) of a variational quantum ansatz \(| \psi (\boldsymbol{\theta}) \rangle\) to minimize some specified cost function \(C(\boldsymbol{\theta})\). How we choose to optimize the parameters is one of the key steps of every QML workflow. Specifically, the choice of different optimizers can highly influence the parameters obtained after an optimization routine as discussed in our previous post. For this reason, in this How-to blog, we will learn how to implement our own custom-built optimizer.

There are many optimizers one could use, from simpler gradient-based optimizers like SGD and ADAM (see code here), high-order gradient optimizers, such as AdaHessian, and gradient-free optimizers, such as the Nelder–Mead optimizer or the Rotosolve optimizer. Sometimes, these optimizers alone can’t do the job right. For simplicity, we will only focus on one gradient-based optimizer: the random coordinate descent (RCD) optimizer.

We first need a cost function to optimize. We will choose the cost function \(C\) defined by

where \(n\) is the number of qubits, \(p_i(\boldsymbol{\theta})\) is the probability of measuring outcome \(i\) from a quantum circuit with parameters \(\boldsymbol{\theta}\) and \(y_i \in \mathbb{R}\). These \(p_i\) values will be obtained from a simple randomly chosen quantum circuit that we aim to optimize.

We start by importing the libraries we will use:

```
import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
np.random.seed(101)
```

Then, we construct the two-qubit circuit we will use to obtain \(p_i\):

```
dev = qml.device("default.qubit", wires=2)
@qml.qnode(dev)
# The inputs of this circuit are the parameters that we will optimize.
def circuit(theta):
# theta is an np.array of shape (2,2).
qml.RX(theta[0, 0], wires=0)
qml.RX(theta[0, 1], wires=1)
qml.CNOT(wires=[0, 1])
qml.RY(theta[1, 0], wires=0)
qml.RY(theta[1, 1], wires=1)
qml.CNOT(wires=[0, 1])
return qml.probs(wires=[0, 1])
```

This circuit consists of four parametrized gates and one non-trainable entangling gate. We can visualize it with the method `qml.draw_mpl`

:

```
qml.draw_mpl(circuit)(np.random.rand(2,2))
```

Having defined our circuit, we define the cost function and array mentioned above:

```
# we will pick our set of target values {y_i} randomly
y_target = np.array([0.7, -0.5, 1.6, -0.6], requires_grad=False)
def cost(theta):
x = np.sin(theta)
probs = circuit(x)
return np.sum(np.abs(np.log(probs) ** (-1)) - y_target)
```

If your circuit happens to have a probability distribution that isNOTE:exactly0 for some outcome, this`cost`

function will diverge!

This circuit ansatz has four parameters that we denote by \(\boldsymbol{\theta}\). Our goal is to find

where \(\boldsymbol{\theta}_\star\) are the parameters that minimize our cost function \(C\).

### Using built-in optimizers in PennyLane

Before we start defining our own optimizers, let us optimize the `cost`

function using a built-in optimizer from `PennyLane`

, namely, the `qml.AdamOptimizer`

optimizer.

```
from pennylane import AdamOptimizer
```

Every optimizer in `pennylane.optimize`

has a `step`

function that updates the parameters (`theta`

in our case) of our circuit. We can use this function to optimize the `cost`

function defined above:

```
n_steps = 1500
theta = np.random.rand(2, 2, requires_grad=True)
costs_list = []
opt = AdamOptimizer()
for i in range(1, n_steps+1):
if i%100==0: print("Running... Current step: ", i)
theta = opt.step(cost, theta)
costs_list.append(cost(theta))
```

Finally, we can plot the learning curve of this routine.

```
plt.plot(costs_list)
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.title("Built-in Adam")
```

### Random coordinate descent

Random coordinate descent (RCD) is a gradient-based optimizer, meaning that the update rule makes use of the gradient of the cost function. This method is different from vanilla gradient descent since each update does not alter all of the parameters \(\boldsymbol{\theta}\). Rather, one update consists of changing *one* of the parameters \(\theta_i\).

Initializing our parameters \(\boldsymbol{\theta} = (\theta_1, \dots , \theta_d)\) randomly, we iterate through the following update rule:

- Randomly choose an integer \(i \in \{ 1, \dots , d\}\).
- Update the corresponding parameter: \(\theta_i \leftarrow \theta_i - \eta_i \frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}\).

Now, what we should learn to do is to take derivatives of quantum circuits!

There are several ways of calculating \(\frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}\). You can read more about different methods of calculating derivatives on PennyLane’s documentation. For now, we will use finite differences to calculate the gradient of our circuit. This method allows us to approximate partial derivatives as

Note that we should not inherit `RCDOptimizer`

from `GradientDescentOptimizer`

since we only need partial derivatives per step (and calculating all derivatives for each time step is unnecessary).

Let us now implement a function that calculates this update rule:

```
def rcd_update(theta, eta=0.01 , h=0.005):
i = np.random.randint(theta.size)
shape = theta.shape
dt = np.eye(1,theta.size,i)
# we reshape theta to match the shape of dt
theta = theta.reshape(-1)
dtheta = (cost((theta+dt*h).reshape(shape))
- cost((theta-dt*h).reshape(shape)))/(2*h)
# we reshape back to the original shape of theta
return (theta - eta*dtheta*dt).reshape(shape)
```

If we want to implement our own optimizer object, like every other optimizer in `pennylane`

, we should create a class with the following structure:

```
class MyOptimizer():
"""This is my own optimizer"""
def __init__():
"""Initializes our own optimizer"""
pass
def step(objective_fn, theta, *args, **kwargs):
"""
Args: - objective_fn to be minimized
- theta
Returns:
- updated theta
"""
pass
def step_and_cost(objective_fn, theta, *args, **kwargs):
"""
Args: - objective_fn to be minimized
- theta
Returns:
- updated theta, cost of updated theta
"""
pass
```

We have already defined a step function (`rcd_update`

), thus, with a small re-write we can fill this skeleton and create a new `RCDOptimizer`

class.

```
class RCDOptimizer:
def __init__(self, eta=0.01 , h=0.005):
self.h = h
self.eta = eta
def step(self, objective_fn, theta):
i = np.random.randint(theta.size)
shape = theta.shape
dt = np.eye(1,theta.size,i)
theta = theta.reshape(-1)
dtheta = (cost((theta+dt*self.h).reshape(shape))
-cost((theta-dt*self.h).reshape(shape)))/(2*self.h)
return (theta - self.eta*dtheta*dt).reshape(shape)
def step_and_cost(self, objective_fn, theta):
theta = self.step(objective_fn, theta)
return theta, objective_fn(theta)
```

By changing one line of our previous code, we can implement this new optimizer:

```
n_steps = 1500
theta = np.random.rand(2, 2, requires_grad=True)
costs_list = []
opt = RCDOptimizer()
for i in range(1, n_steps+1):
if i%100==0: print("Running... Current step: ", i)
theta = opt.step(cost, theta)
costs_list.append(cost(theta))
```

Now, let’s visualize the optimization procedure of the `RCDOptimizer`

.

```
plt.plot(costs_list)
plt.xlabel("Steps")
plt.ylabel("Cost function")
plt.title("Random coordinate descent")
```

This time we have achieved a better local minimum with `RCDOptimizer`

! I encourage you to try different random seeds and hyper-parameters of `AdamOptimizer`

and `RCDOptimizer`

and compare what works best for this problem.

### Remarks

When running these two optimization algorithms - `AdamOptimizer`

and `RCDOptimizer`

- we get sets of parameters \(\boldsymbol{\theta}\) that achieve an approximate local minimum of a given cost function. There does not exist a rule of thumb on what optimizer to use for any QML task. This is why it is important to know how to change optimizers, or in this case, implement your own optimizer! If you want to learn more about different optimizers for your QML routines, check out PennyLane’s documentation.