Frugal shot optimization with Rosalin

Author: PennyLane dev team. Posted: 19 May 2020. Last updated: 20 Jan 2021.

In this tutorial we investigate and implement the Rosalin (Random Operator Sampling for Adaptive Learning with Individual Number of Shots) optimizer from Arrasmith et al. [1]. In this paper, a strategy is introduced for reducing the number of shots required when optimizing variational quantum algorithms, by both:

  • Frugally adapting the number of shots used per parameter update, and
  • Performing a weighted sampling of operators from the cost Hamiltonian.

Background

While a large number of papers in variational quantum algorithms focus on the choice of circuit ansatz, cost function, gradient computation, or initialization method, the optimization strategy—an important component affecting both convergence time and quantum resource dependence—is not as frequently considered. Instead, common ‘out-of-the-box’ classical optimization techniques, such as gradient-free methods (COBYLA, Nelder-Mead), gradient descent, and quasi-Newton methods (L-BFGS), tend to be used.

However, for variational algorithms such as VQE, which involve evaluating a large number of non-commuting operators in the cost function, decreasing the number of quantum evaluations required for convergence, while still minimizing statistical noise, can be a delicate balance.

Recent work has highlighted that ‘quantum-aware’ optimization techniques can lead to marked improvements when training variational quantum algorithms:

  • Quantum natural gradient descent by Stokes et al. [2], which takes into account the quantum geometry during the gradient-descent update step.
  • The work of Sweke et al. [3], which shows that quantum gradient descent with a finite number of shots is equivalent to stochastic gradient descent, and has guaranteed convergence. Furthermore, combining a finite number of shots with weighted sampling of the cost function terms leads to doubly stochastic gradient descent.
  • The iCANS (individual Coupled Adaptive Number of Shots) optimization technique by Kübler et al. [4], which adapts the number of measurement shots during training by maximizing the expected gain per shot.

In this latest result by Arrasmith et al. [1], the idea of doubly stochastic gradient descent has been used to extend the iCANS optimizer, resulting in faster convergence.

Over the course of this tutorial, we will explore their results; beginning first with a demonstration of weighted random sampling of the cost Hamiltonian operators, before combining this with the shot-frugal iCANS optimizer to perform doubly stochastic Rosalin optimization.

Weighted random sampling

Consider a Hamiltonian \(H\) expanded as a weighted sum of operators \(h_i\) that can be directly measured:

\[H = \sum_{i=1}^N c_i h_i.\]

Due to the linearity of expectation values, the expectation value of this Hamiltonian can be expressed as the weighted sum of each individual term:

\[\langle H\rangle = \sum_{i=1}^N c_i \langle h_i\rangle.\]

In the doubly stochastic gradient descent demonstration, we estimated this expectation value by uniformly sampling a subset of the terms at each optimization step, and evaluating each term using the same finite number of shots.

However, what happens if we use a weighted approach to determine how to distribute our samples across the terms of the Hamiltonian? In weighted random sampling (WRS), the number of shots used to determine the expectation value \(\langle h_i\rangle\) is a discrete random variable distributed according to a multinomial distribution,

\[S \sim \text{Multinomial}(p_i),\]

with event probabilities

\[p_i = \frac{|c_i|}{\sum_i |c_i|}.\]

That is, the number of shots assigned to the measurement of the expectation value of the \(i\text{th}\) term of the Hamiltonian is drawn from a probability distribution proportional to the magnitude of its coefficient \(c_i\).

To see this strategy in action, consider the Hamiltonian

\[H = 2I\otimes X + 4 I\otimes Z - X\otimes X + 5Y\otimes Y + 2 Z\otimes X.\]

We can solve for the ground state energy using the variational quantum eigensolver (VQE) algorithm.

First, let’s import NumPy and PennyLane, and define our Hamiltonian.

import numpy as np
import pennylane as qml

# set the random seed
np.random.seed(4)

coeffs = [2, 4, -1, 5, 2]

obs = [
  qml.PauliX(1),
  qml.PauliZ(1),
  qml.PauliX(0) @ qml.PauliX(1),
  qml.PauliY(0) @ qml.PauliY(1),
  qml.PauliZ(0) @ qml.PauliX(1)
]

We can now create our quantum device (let’s use the default.qubit simulator), and begin constructing some QNodes to evaluate each observable. For our ansatz, we’ll use the StronglyEntanglingLayers.

from pennylane import expval
from pennylane.init import strong_ent_layers_uniform
from pennylane.templates.layers import StronglyEntanglingLayers

num_layers = 2
num_wires = 2

# create a device that estimates expectation values using a finite number of shots
non_analytic_dev = qml.device("default.qubit", wires=num_wires, analytic=False)

# create a device that calculates exact expectation values
analytic_dev = qml.device("default.qubit", wires=num_wires, analytic=True)

We use map() to map our ansatz over our list of observables, returning a collection of QNodes, one evaluating the expectation value of each term in the Hamiltonian.

qnodes = qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev, diff_method="parameter-shift")

Now, let’s set the total number of shots, and determine the probability for sampling each Hamiltonian term.

total_shots = 8000
prob_shots = np.abs(coeffs) / np.sum(np.abs(coeffs))
print(prob_shots)

Out:

[0.14285714 0.28571429 0.07142857 0.35714286 0.14285714]

We can now use SciPy to create our multinomially distributed random variable \(S\), using the number of trials (the total shot number) and the probability values:

from scipy.stats import multinomial
si = multinomial(n=total_shots, p=prob_shots)

Sampling from this distribution will provide the number of shots used to sample each term in the Hamiltonian:

samples = si.rvs()[0]
print(samples)
print(sum(samples))

Out:

[1191 2262  552 2876 1119]
8000

As expected, if we sum the sampled shots per term, we recover the total number of shots.

Let’s now create our cost function. Recall that the cost function must do the following:

  1. It must sample from the multinomial distribution we created above, to determine the number of shots \(s_i\) to use to estimate the expectation value of the \(i\text{th}\) Hamiltonian term.
  2. It then must estimate the expectation value \(\langle h_i\rangle\) by querying the required QNode.
  3. And, last but not least, estimate the expectation value \(\langle H\rangle = \sum_i c_i\langle h_i\rangle\).

def cost(params):
    # sample from the multinomial distribution
    shots_per_term = si.rvs()[0]

    result = 0

    for h, c, s in zip(qnodes, coeffs, shots_per_term):
        # set the number of shots
        h.device.shots = s

        # evaluate the QNode corresponding to
        # the Hamiltonian term, and add it on to our running sum
        result += c * h(params)

    return result

Evaluating our cost function with some initial parameters, we can check that it evaluates correctly.

init_params = strong_ent_layers_uniform(n_layers=num_layers, n_wires=num_wires)
print(cost(init_params))

Out:

-0.8395887630997874

Performing the optimization, with the number of shots randomly determined at each optimization step:

opt = qml.AdamOptimizer(0.05)
params = init_params

cost_wrs = []
shots_wrs = []

for i in range(100):
    params, _cost = opt.step_and_cost(cost, params)
    cost_wrs.append(_cost)
    shots_wrs.append(total_shots*i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_wrs[-1], shots_wrs[-1]))

Out:

Step 0: cost = -0.47971271815214855 shots used = 0
Step 1: cost = -1.6046714935746036 shots used = 8000
Step 2: cost = -2.3711582441456405 shots used = 16000
Step 3: cost = -3.027957214388223 shots used = 24000
Step 4: cost = -3.5159459178932124 shots used = 32000
Step 5: cost = -4.174569440117134 shots used = 40000
Step 6: cost = -4.399038233873806 shots used = 48000
Step 7: cost = -4.97578796538636 shots used = 56000
Step 8: cost = -4.986174103921648 shots used = 64000
Step 9: cost = -5.4723644093138475 shots used = 72000
Step 10: cost = -5.565504831455739 shots used = 80000
Step 11: cost = -6.020192611410446 shots used = 88000
Step 12: cost = -5.883734806920739 shots used = 96000
Step 13: cost = -6.223338234961309 shots used = 104000
Step 14: cost = -6.530929589596482 shots used = 112000
Step 15: cost = -6.55209167517164 shots used = 120000
Step 16: cost = -6.616263395709042 shots used = 128000
Step 17: cost = -6.701343782177316 shots used = 136000
Step 18: cost = -6.926249635222259 shots used = 144000
Step 19: cost = -6.893630348365093 shots used = 152000
Step 20: cost = -7.25951992995195 shots used = 160000
Step 21: cost = -7.33526091390357 shots used = 168000
Step 22: cost = -7.236203508675232 shots used = 176000
Step 23: cost = -7.313133236859679 shots used = 184000
Step 24: cost = -7.495089739160081 shots used = 192000
Step 25: cost = -7.404816912094075 shots used = 200000
Step 26: cost = -7.37681376104195 shots used = 208000
Step 27: cost = -7.344579270735176 shots used = 216000
Step 28: cost = -7.590572546476281 shots used = 224000
Step 29: cost = -7.367096851527457 shots used = 232000
Step 30: cost = -7.691416766514971 shots used = 240000
Step 31: cost = -7.5685461129010445 shots used = 248000
Step 32: cost = -7.543452291622964 shots used = 256000
Step 33: cost = -7.45022937841806 shots used = 264000
Step 34: cost = -7.699735579409215 shots used = 272000
Step 35: cost = -7.48507023774885 shots used = 280000
Step 36: cost = -7.712396980518955 shots used = 288000
Step 37: cost = -7.798746778552323 shots used = 296000
Step 38: cost = -7.258524045988669 shots used = 304000
Step 39: cost = -7.612477992812337 shots used = 312000
Step 40: cost = -7.644321507171171 shots used = 320000
Step 41: cost = -7.622252218175268 shots used = 328000
Step 42: cost = -7.659566275956822 shots used = 336000
Step 43: cost = -7.716392372535333 shots used = 344000
Step 44: cost = -7.70550059314149 shots used = 352000
Step 45: cost = -7.569574267252698 shots used = 360000
Step 46: cost = -7.6505339589939245 shots used = 368000
Step 47: cost = -7.573980249824434 shots used = 376000
Step 48: cost = -7.669518751840373 shots used = 384000
Step 49: cost = -7.87942099640493 shots used = 392000
Step 50: cost = -7.79359324246517 shots used = 400000
Step 51: cost = -7.640857698840708 shots used = 408000
Step 52: cost = -7.760214224863149 shots used = 416000
Step 53: cost = -7.7534866741462665 shots used = 424000
Step 54: cost = -7.823529894428987 shots used = 432000
Step 55: cost = -7.3421772191804795 shots used = 440000
Step 56: cost = -7.678629604324032 shots used = 448000
Step 57: cost = -7.785780985238394 shots used = 456000
Step 58: cost = -7.809678921909053 shots used = 464000
Step 59: cost = -7.740548636222224 shots used = 472000
Step 60: cost = -7.845219614969788 shots used = 480000
Step 61: cost = -7.852354115046689 shots used = 488000
Step 62: cost = -7.87987428603207 shots used = 496000
Step 63: cost = -7.729421977751639 shots used = 504000
Step 64: cost = -7.730096681753854 shots used = 512000
Step 65: cost = -7.785972022421943 shots used = 520000
Step 66: cost = -7.957770864474304 shots used = 528000
Step 67: cost = -7.8822165400177395 shots used = 536000
Step 68: cost = -7.671714190969816 shots used = 544000
Step 69: cost = -8.088343720314754 shots used = 552000
Step 70: cost = -7.668598000172826 shots used = 560000
Step 71: cost = -7.816750278067786 shots used = 568000
Step 72: cost = -7.79850716226346 shots used = 576000
Step 73: cost = -7.819275585642221 shots used = 584000
Step 74: cost = -7.904932714604841 shots used = 592000
Step 75: cost = -7.680104341111219 shots used = 600000
Step 76: cost = -7.758170742941555 shots used = 608000
Step 77: cost = -7.77567561639292 shots used = 616000
Step 78: cost = -7.807035648246735 shots used = 624000
Step 79: cost = -7.922820252746465 shots used = 632000
Step 80: cost = -7.875242047278865 shots used = 640000
Step 81: cost = -7.695550297275514 shots used = 648000
Step 82: cost = -7.869766403908718 shots used = 656000
Step 83: cost = -7.997038278663192 shots used = 664000
Step 84: cost = -7.824013035482636 shots used = 672000
Step 85: cost = -7.781748834372316 shots used = 680000
Step 86: cost = -7.874897583115409 shots used = 688000
Step 87: cost = -7.897156700629832 shots used = 696000
Step 88: cost = -7.77261627101102 shots used = 704000
Step 89: cost = -7.677512918968558 shots used = 712000
Step 90: cost = -7.619087063716822 shots used = 720000
Step 91: cost = -7.884491369760785 shots used = 728000
Step 92: cost = -8.079089757562398 shots used = 736000
Step 93: cost = -8.01176501738253 shots used = 744000
Step 94: cost = -7.905685278551548 shots used = 752000
Step 95: cost = -7.878790696475531 shots used = 760000
Step 96: cost = -7.6828741912022585 shots used = 768000
Step 97: cost = -7.809858196602689 shots used = 776000
Step 98: cost = -8.043075025868639 shots used = 784000
Step 99: cost = -8.03810533747292 shots used = 792000

Let’s compare this against an optimization not using weighted random sampling. Here, we will split the 8000 total shots evenly across all Hamiltonian terms, also known as uniform deterministic sampling.

non_analytic_dev.shots = total_shots // len(coeffs)

qnodes = qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev)
cost = qml.dot(coeffs, qnodes)

opt = qml.AdamOptimizer(0.05)
params = init_params

cost_adam = []
shots_adam = []

for i in range(100):
    params, _cost = opt.step_and_cost(cost, params)
    cost_adam.append(_cost)
    shots_adam.append(total_shots*i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_adam[-1], shots_adam[-1]))

Out:

Step 0: cost = -0.4750000000000003 shots used = 0
Step 1: cost = -1.2174999999999998 shots used = 8000
Step 2: cost = -2.4137500000000003 shots used = 16000
Step 3: cost = -2.8324999999999996 shots used = 24000
Step 4: cost = -3.4450000000000003 shots used = 32000
Step 5: cost = -4.0737499999999995 shots used = 40000
Step 6: cost = -4.271249999999999 shots used = 48000
Step 7: cost = -4.635 shots used = 56000
Step 8: cost = -5.08625 shots used = 64000
Step 9: cost = -5.425 shots used = 72000
Step 10: cost = -5.65625 shots used = 80000
Step 11: cost = -5.75625 shots used = 88000
Step 12: cost = -5.85 shots used = 96000
Step 13: cost = -6.225 shots used = 104000
Step 14: cost = -6.15 shots used = 112000
Step 15: cost = -6.6137500000000005 shots used = 120000
Step 16: cost = -6.661249999999999 shots used = 128000
Step 17: cost = -6.75625 shots used = 136000
Step 18: cost = -6.925 shots used = 144000
Step 19: cost = -7.18 shots used = 152000
Step 20: cost = -6.95125 shots used = 160000
Step 21: cost = -7.2375 shots used = 168000
Step 22: cost = -7.5200000000000005 shots used = 176000
Step 23: cost = -7.558749999999999 shots used = 184000
Step 24: cost = -7.387499999999999 shots used = 192000
Step 25: cost = -7.4925 shots used = 200000
Step 26: cost = -7.367500000000001 shots used = 208000
Step 27: cost = -7.54125 shots used = 216000
Step 28: cost = -7.365 shots used = 224000
Step 29: cost = -7.695 shots used = 232000
Step 30: cost = -7.60625 shots used = 240000
Step 31: cost = -7.4525 shots used = 248000
Step 32: cost = -7.60125 shots used = 256000
Step 33: cost = -7.48 shots used = 264000
Step 34: cost = -7.711250000000001 shots used = 272000
Step 35: cost = -7.46125 shots used = 280000
Step 36: cost = -7.58875 shots used = 288000
Step 37: cost = -7.641249999999999 shots used = 296000
Step 38: cost = -7.728750000000001 shots used = 304000
Step 39: cost = -7.9 shots used = 312000
Step 40: cost = -7.80875 shots used = 320000
Step 41: cost = -7.6875 shots used = 328000
Step 42: cost = -7.73125 shots used = 336000
Step 43: cost = -7.553750000000001 shots used = 344000
Step 44: cost = -7.6674999999999995 shots used = 352000
Step 45: cost = -7.687500000000001 shots used = 360000
Step 46: cost = -7.8 shots used = 368000
Step 47: cost = -8.012500000000001 shots used = 376000
Step 48: cost = -7.65125 shots used = 384000
Step 49: cost = -7.72 shots used = 392000
Step 50: cost = -7.88375 shots used = 400000
Step 51: cost = -7.58125 shots used = 408000
Step 52: cost = -7.732500000000001 shots used = 416000
Step 53: cost = -7.725 shots used = 424000
Step 54: cost = -7.9225 shots used = 432000
Step 55: cost = -7.755 shots used = 440000
Step 56: cost = -7.726250000000001 shots used = 448000
Step 57: cost = -7.9262500000000005 shots used = 456000
Step 58: cost = -7.75 shots used = 464000
Step 59: cost = -7.84625 shots used = 472000
Step 60: cost = -7.58375 shots used = 480000
Step 61: cost = -7.81375 shots used = 488000
Step 62: cost = -7.4825 shots used = 496000
Step 63: cost = -7.85375 shots used = 504000
Step 64: cost = -7.8025 shots used = 512000
Step 65: cost = -7.7700000000000005 shots used = 520000
Step 66: cost = -7.9399999999999995 shots used = 528000
Step 67: cost = -7.918749999999999 shots used = 536000
Step 68: cost = -7.92125 shots used = 544000
Step 69: cost = -7.938750000000001 shots used = 552000
Step 70: cost = -7.7825 shots used = 560000
Step 71: cost = -7.800000000000001 shots used = 568000
Step 72: cost = -7.7225 shots used = 576000
Step 73: cost = -7.7175 shots used = 584000
Step 74: cost = -7.843749999999999 shots used = 592000
Step 75: cost = -7.840000000000001 shots used = 600000
Step 76: cost = -7.9750000000000005 shots used = 608000
Step 77: cost = -7.79 shots used = 616000
Step 78: cost = -7.95 shots used = 624000
Step 79: cost = -7.9087499999999995 shots used = 632000
Step 80: cost = -7.967499999999999 shots used = 640000
Step 81: cost = -7.884999999999999 shots used = 648000
Step 82: cost = -7.79 shots used = 656000
Step 83: cost = -8.01375 shots used = 664000
Step 84: cost = -7.715000000000001 shots used = 672000
Step 85: cost = -8.05 shots used = 680000
Step 86: cost = -7.75625 shots used = 688000
Step 87: cost = -7.62875 shots used = 696000
Step 88: cost = -7.60125 shots used = 704000
Step 89: cost = -7.661250000000001 shots used = 712000
Step 90: cost = -7.77625 shots used = 720000
Step 91: cost = -7.97375 shots used = 728000
Step 92: cost = -7.90875 shots used = 736000
Step 93: cost = -8.12375 shots used = 744000
Step 94: cost = -7.858749999999999 shots used = 752000
Step 95: cost = -8.0025 shots used = 760000
Step 96: cost = -7.91375 shots used = 768000
Step 97: cost = -7.83375 shots used = 776000
Step 98: cost = -7.80125 shots used = 784000
Step 99: cost = -7.8225 shots used = 792000

Comparing these two techniques:

from matplotlib import pyplot as plt

plt.style.use("seaborn")
plt.plot(shots_wrs, cost_wrs, "b", label="Adam WRS")
plt.plot(shots_adam, cost_adam, "g", label="Adam")

plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.show()

We can see that weighted random sampling performs just as well as uniform deterministic sampling. However, weighted random sampling begins to show a non-negligible improvement over deterministic sampling for large Hamiltonians with highly non-uniform coefficients. For example, see Figs. 3 and 4 of Arrasmith et al. [1], which compare weighted random sampling VQE optimization for the \(\text{H}_2\) and \(\text{LiH}\) molecules.

Note

While not covered here, another approach that could be taken is weighted deterministic sampling. Here, the number of shots is distributed across terms as per

\[s_i = \left\lfloor N \frac{|c_i|}{\sum_i |c_i|}\right\rfloor,\]

where \(N\) is the total number of shots.
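
As a quick illustrative sketch (not part of the original demo), the deterministic allocation for our Hamiltonian could be computed as follows. Note that the floor operation may leave a few shots unassigned, which would have to be distributed separately:

import numpy as np

coeffs = np.array([2, 4, -1, 5, 2])
total_shots = 8000

# deterministic weighted allocation: floor(N * |c_i| / sum_i |c_i|)
shots_per_term = np.floor(total_shots * np.abs(coeffs) / np.sum(np.abs(coeffs)))
print(shots_per_term)
print(shots_per_term.sum())

Out:

[1142. 2285.  571. 2857. 1142.]
7997.0

Here, 3 of the 8000 shots are left over after rounding.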

Rosalin: Frugal shot optimization

We can see above that both methods optimize fairly well; weighted random sampling converges just as well as evenly distributing the shots across all Hamiltonian terms. However, deterministic shot distribution approaches will always have a minimum shot value required per expectation value, as below this threshold they become biased estimators. This is not the case with random sampling; as we saw in the doubly stochastic gradient descent demonstration, the introduction of randomness allows for as little as a single shot per expectation term, while still remaining an unbiased estimator.
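
To see why, suppose each individual shot is allocated to term \(h_i\) with probability \(p_i\), and that the single-shot outcome (an eigenvalue of \(h_i\)) is recorded with the weight \(c_i/p_i\). Averaging over both the term selection and the measurement outcome gives

\[\mathbb{E}\left[\frac{c_i h_i}{p_i}\right] = \sum_{i=1}^N p_i \frac{c_i \langle h_i\rangle}{p_i} = \sum_{i=1}^N c_i \langle h_i\rangle = \langle H\rangle,\]

so the weighted single-shot estimator is unbiased no matter how few shots are used. This is also why the estimate_hamiltonian method defined below divides each single-shot result by the probability \(p_i\).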

Using this insight, Arrasmith et al. [1] modified the iCANS frugal shot-optimization technique [4] to include weighted random sampling, making it ‘doubly stochastic’.

iCANS optimizer

Two variants of the iCANS optimizer were introduced in Kübler et al., iCANS1 and iCANS2. The iCANS1 optimizer, on which Rosalin is based, frugally distributes a shot budget across the partial derivatives of each parameter, which are computed using the parameter-shift rule. It works roughly as follows:

  1. The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min}\), for all partial derivatives.

  2. The parameter-shift rule is then used to estimate the gradient \(g_i\) for each parameter \(\theta_i\), as well as the variances \(v_i\) of the estimated gradients.

  3. Gradient descent is performed for each parameter \(\theta_i\), using the pre-defined learning rate \(\alpha\) and the gradient information \(g_i\):

    \[\theta_i = \theta_i - \alpha g_i.\]
  4. The improvement in the cost function per shot, for a specific parameter value, is then calculated via

    \[\gamma_i = \frac{1}{s_i} \left[ \left(\alpha - \frac{1}{2} L\alpha^2\right) g_i^2 - \frac{L\alpha^2}{2s_i}v_i \right],\]

    where:

    • \(L \leq \sum_i|c_i|\) is a bound on the Lipschitz constant of the variational quantum algorithm objective function,
    • \(c_i\) are the coefficients of the Hamiltonian, and
    • \(\alpha\) is the learning rate, which must be bounded such that \(\alpha < 2/L\) for the above expression to hold.
  5. Finally, the new values of \(s_i\) (shots for the partial derivative of parameter \(\theta_i\)) are given by:

    \[s_i = \frac{2L\alpha}{2-L\alpha}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.\]

In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\) respectively) are used when computing \(\gamma_i\) and \(s_i\).
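
As a rough numeric sketch of the shot rule in step 5 (the gradient and variance values below are made-up examples, not taken from the paper), parameters whose gradient estimates are noisy relative to their magnitude are assigned more shots:

import numpy as np

L = 14.0       # Lipschitz bound for our Hamiltonian: the sum of |c_i|
alpha = 0.07   # learning rate; must satisfy alpha < 2 / L ≈ 0.143

g = np.array([0.5, 0.05])  # example gradient estimates for two parameters
v = np.array([2.0, 2.0])   # example variances of those estimates

# step 5: the recommended shot numbers are proportional to v_i / g_i^2
s = np.ceil((2 * L * alpha) / (2 - L * alpha) * v / g ** 2)
print(s)

Out:

[  16. 1538.]

The parameter whose gradient is small relative to its variance receives roughly 100 times more shots.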

Note

In classical machine learning, the Lipschitz constant of the cost function is generally unknown. However, for a variational quantum algorithm with cost of the form \(f(x) = \langle \psi(x) | \hat{H} |\psi(x)\rangle\), an upper bound on the Lipschitz constant is given by \(L \leq \sum_i|c_i|\), where \(c_i\) are the coefficients of \(\hat{H}\) when decomposed into a linear combination of Pauli-operator tensor products.

Rosalin implementation

Let’s now modify iCANS above to incorporate weighted random sampling of Hamiltonian terms — the Rosalin frugal shot optimizer.

Rosalin takes several hyper-parameters:

  • min_shots: the minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be at least 2 for the variance of the gradients to be computed.
  • mu: The running average constant \(\mu\in[0, 1]\). Used to control how quickly the number of shots recommended for each gradient component changes.
  • b: Regularization bias. The bias should be kept small, but non-zero.
  • lr: The learning rate. Recall from above that the learning rate must be such that \(\alpha < 2/L = 2/\sum_i|c_i|\) (see the quick check below).
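
For our Hamiltonian, we can quickly check what this bound evaluates to (using the coeffs list defined earlier), confirming that the default learning rate of 0.07 used below is valid:

lipschitz = np.sum(np.abs(coeffs))
print(lipschitz)
print(2 / lipschitz)

Out:

14
0.14285714285714285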

Since the Rosalin optimizer has a state that must be preserved between optimization steps, let’s use a class to create our optimizer.

class Rosalin:

    def __init__(self, qnodes, coeffs, min_shots, mu=0.99, b=1e-6, lr=0.07):
        self.qnodes = qnodes
        self.coeffs = coeffs

        self.lipschitz = np.sum(np.abs(coeffs))

        if lr > 2 / self.lipschitz:
            raise ValueError(f"The learning rate must be less than {2 / self.lipschitz}")

        # hyperparameters
        self.min_shots = min_shots
        self.mu = mu  # running average constant
        self.b = b    # regularization bias
        self.lr = lr  # learning rate

        # keep track of the total number of shots used
        self.shots_used = 0
        # total number of iterations
        self.k = 0
        # Number of shots per parameter; initialized on the first
        # call to ``step``, once the parameter shape is known
        self.s = None

        # Running average of the parameter gradients
        self.chi = None
        # Running average of the variance of the parameter gradients
        self.xi = None

    def estimate_hamiltonian(self, params, shots):
        """Returns an array containing length ``shots`` single-shot estimates
        of the Hamiltonian. The shots are distributed randomly over
        the terms in the Hamiltonian, as per a Multinomial distribution.

        Since we are performing single-shot estimates, the QNodes must be
        set to 'sample' mode.
        """

        # determine the shot probability per term
        # determine the shot probability per term
        prob_shots = np.abs(self.coeffs) / np.sum(np.abs(self.coeffs))

        # construct the multinomial distribution, and sample
        # from it to determine how many shots to apply per term
        si = multinomial(n=int(shots), p=prob_shots)
        shots_per_term = si.rvs()[0]

        results = []
        for h, c, p, s in zip(self.qnodes, self.coeffs, prob_shots, shots_per_term):

            # if the number of shots is 0, do nothing
            if s == 0:
                continue

            # set the QNode device shots
            h.device.shots = s

            # evaluate the QNode corresponding to
            # the Hamiltonian term
            res = h(params)

            if s == 1:
                res = np.array([res])

            # Note that, unlike above, we divide each term by the
            # probability per shot. This keeps the single-shot
            # estimates of each term unbiased (importance sampling).
            results.append(c * res / p)

        return np.concatenate(results)

    def evaluate_grad_var(self, i, params, shots):
        """Evaluate the gradient, as well as the variance in the gradient,
        for the ith parameter in params, using the parameter-shift rule.
        """
        shift = np.zeros_like(params)
        shift[i] = np.pi / 2

        shift_forward = self.estimate_hamiltonian(params + shift, shots)
        shift_backward = self.estimate_hamiltonian(params - shift, shots)

        g = np.mean(shift_forward - shift_backward) / 2
        s = np.var((shift_forward - shift_backward) / 2, ddof=1)

        return g, s

    def step(self, params):
        """Perform a single step of the Rosalin optimizer."""
        if self.s is None:
            # initialize the per-parameter shot numbers, now that
            # the parameter shape is known
            self.s = np.zeros_like(params, dtype=np.float64) + self.min_shots

        # keep track of the number of shots run
        self.shots_used += int(2 * np.sum(self.s))

        # compute the gradient, as well as the variance in the gradient,
        # using the number of shots determined by the array s.
        grad = []
        S = []

        p_ind = list(np.ndindex(*params.shape))

        for l in p_ind:
            # loop through each parameter, performing
            # the parameter-shift rule
            g_, s_ = self.evaluate_grad_var(l, params, self.s[l])
            grad.append(g_)
            S.append(s_)

        grad = np.reshape(np.stack(grad), params.shape)
        S = np.reshape(np.stack(S), params.shape)

        # gradient descent update
        params = params - self.lr * grad

        if self.xi is None:
            self.chi = np.zeros_like(params, dtype=np.float64)
            self.xi = np.zeros_like(params, dtype=np.float64)

        # running average of the gradient variance
        self.xi = self.mu * self.xi + (1 - self.mu) * S
        xi = self.xi / (1 - self.mu ** (self.k + 1))

        # running average of the gradient
        self.chi = self.mu * self.chi + (1 - self.mu) * grad
        chi = self.chi / (1 - self.mu ** (self.k + 1))

        # determine the new optimum shots distribution for the next
        # iteration of the optimizer
        s = np.ceil(
            (2 * self.lipschitz * self.lr * xi)
            / ((2 - self.lipschitz * self.lr) * (chi ** 2 + self.b * (self.mu ** self.k)))
        )

        # apply an upper and lower bound on the new shot distributions,
        # to avoid the number of shots reducing below min(2, min_shots),
        # or growing too significantly.
        gamma = (
            (self.lr - self.lipschitz * self.lr ** 2 / 2) * chi ** 2
            - xi * self.lipschitz * self.lr ** 2 / (2 * s)
        ) / s

        argmax_gamma = np.unravel_index(np.argmax(gamma), gamma.shape)
        smax = s[argmax_gamma]
        self.s = np.clip(s, min(2, self.min_shots), smax)

        self.k += 1
        return params

Rosalin optimization

We are now ready to use our Rosalin optimizer to optimize the initial VQE problem. Note that we create our QNodes using measure="sample", since the Rosalin optimizer must be able to generate single-shot samples from our device.

rosalin_device = qml.device("default.qubit", wires=num_wires, analytic=False)
qnodes = qml.map(StronglyEntanglingLayers, obs, device=rosalin_device, measure="sample")

Let’s also create a separate cost function using an ‘exact’ quantum device, so that we can keep track of the exact cost function value at each iteration.

cost_analytic = qml.dot(coeffs, qml.map(StronglyEntanglingLayers, obs, device=analytic_dev))

Creating the optimizer and beginning the optimization:

opt = Rosalin(qnodes, coeffs, min_shots=10)
params = init_params

cost_rosalin = [cost_analytic(params)]
shots_rosalin = [0]

for i in range(60):
    params = opt.step(params)
    cost_rosalin.append(cost_analytic(params))
    shots_rosalin.append(opt.shots_used)
    print(f"Step {i}: cost = {cost_rosalin[-1]}, shots_used = {shots_rosalin[-1]}")

Out:

Step 0: cost = -3.4928090686245064, shots_used = 240
Step 1: cost = -2.8716811390235613, shots_used = 312
Step 2: cost = -3.7590197589831185, shots_used = 384
Step 3: cost = -5.515801105823636, shots_used = 480
Step 4: cost = -3.705413448871312, shots_used = 600
Step 5: cost = -4.173237407647809, shots_used = 744
Step 6: cost = -7.4432215997165, shots_used = 960
Step 7: cost = -6.929019562600036, shots_used = 1200
Step 8: cost = -6.636269032139257, shots_used = 1488
Step 9: cost = -7.4903187080168925, shots_used = 1944
Step 10: cost = -7.570806734706176, shots_used = 2616
Step 11: cost = -7.679827109083976, shots_used = 3600
Step 12: cost = -7.795213797809713, shots_used = 4968
Step 13: cost = -7.8538323834910475, shots_used = 7312
Step 14: cost = -7.847001164770654, shots_used = 9616
Step 15: cost = -7.821094066941344, shots_used = 11992
Step 16: cost = -7.6929309645559165, shots_used = 14320
Step 17: cost = -7.7048475611778215, shots_used = 16792
Step 18: cost = -7.84543616920051, shots_used = 20008
Step 19: cost = -7.835747356848822, shots_used = 23248
Step 20: cost = -7.8515150552621735, shots_used = 27832
Step 21: cost = -7.861469110295185, shots_used = 33376
Step 22: cost = -7.872794285996416, shots_used = 40240
Step 23: cost = -7.829591405193604, shots_used = 48832
Step 24: cost = -7.843611758167786, shots_used = 57736
Step 25: cost = -7.861785500970594, shots_used = 66904
Step 26: cost = -7.857465148841151, shots_used = 77824
Step 27: cost = -7.839957071390776, shots_used = 90160
Step 28: cost = -7.852613209139017, shots_used = 107006
Step 29: cost = -7.857477772825318, shots_used = 124886
Step 30: cost = -7.842793776430604, shots_used = 143266
Step 31: cost = -7.862044320075629, shots_used = 163568
Step 32: cost = -7.866923623322376, shots_used = 185072
Step 33: cost = -7.847402437464903, shots_used = 203000
Step 34: cost = -7.875655005023053, shots_used = 224648
Step 35: cost = -7.874972624883538, shots_used = 252998
Step 36: cost = -7.8716697615489295, shots_used = 282638
Step 37: cost = -7.879251527009243, shots_used = 313314
Step 38: cost = -7.873526406429234, shots_used = 346186
Step 39: cost = -7.878414470451739, shots_used = 381888
Step 40: cost = -7.8746187783525725, shots_used = 418896
Step 41: cost = -7.886226777536887, shots_used = 457920
Step 42: cost = -7.889146175993925, shots_used = 503592
Step 43: cost = -7.882846746233821, shots_used = 552676
Step 44: cost = -7.888745402270841, shots_used = 609010
Step 45: cost = -7.8877038208765455, shots_used = 664906
Step 46: cost = -7.8904655772386585, shots_used = 725050
Step 47: cost = -7.891289313111892, shots_used = 788968
Step 48: cost = -7.892048590240504, shots_used = 857736
Step 49: cost = -7.8939212955891565, shots_used = 930582
Step 50: cost = -7.894267505118622, shots_used = 1000158
Step 51: cost = -7.88870542405607, shots_used = 1083368
Step 52: cost = -7.889440477832928, shots_used = 1167120
Step 53: cost = -7.891802034000425, shots_used = 1254264
Step 54: cost = -7.8897412086542085, shots_used = 1350692
Step 55: cost = -7.892509162950194, shots_used = 1457786
Step 56: cost = -7.892505113480256, shots_used = 1563400
Step 57: cost = -7.893467959799664, shots_used = 1674534
Step 58: cost = -7.893304325751356, shots_used = 1793406
Step 59: cost = -7.892946674255748, shots_used = 1922096

Let’s compare this to a standard Adam optimization. We will use 100 shots per quantum evaluation; since the parameter-shift rule requires 2 quantum evaluations per parameter, each update step uses 200 shots per parameter.

adam_shots_per_eval = 100
adam_shots_per_step = 2 * adam_shots_per_eval * len(params.flatten())
print(adam_shots_per_step)

Out:

2400

Thus, Adam is using 2400 shots per update step.

params = init_params
opt = qml.AdamOptimizer(0.07)

non_analytic_dev.shots = adam_shots_per_eval
cost = qml.dot(
  coeffs,
  qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev, diff_method="parameter-shift")
)

cost_adam = [cost_analytic(params)]
shots_adam = [0]

for i in range(100):
    params = opt.step(cost, params)
    cost_adam.append(cost_analytic(params))
    shots_adam.append(adam_shots_per_step * (i + 1))
    print("Step {}: cost = {} shots_used = {}".format(i, cost_adam[-1], shots_adam[-1]))

Out:

Step 0: cost = -2.1215081049779547 shots_used = 2400
Step 1: cost = -3.138064307667297 shots_used = 4800
Step 2: cost = -3.903865289837866 shots_used = 7200
Step 3: cost = -4.507682568650019 shots_used = 9600
Step 4: cost = -4.992868673078652 shots_used = 12000
Step 5: cost = -5.397220584298201 shots_used = 14400
Step 6: cost = -5.737391742493613 shots_used = 16800
Step 7: cost = -6.0226329347198515 shots_used = 19200
Step 8: cost = -6.263398222785995 shots_used = 21600
Step 9: cost = -6.465261674088946 shots_used = 24000
Step 10: cost = -6.642321275431066 shots_used = 26400
Step 11: cost = -6.79901367303434 shots_used = 28800
Step 12: cost = -6.935111590568157 shots_used = 31200
Step 13: cost = -7.054654524467989 shots_used = 33600
Step 14: cost = -7.155567507674262 shots_used = 36000
Step 15: cost = -7.238946624798189 shots_used = 38400
Step 16: cost = -7.306633573696118 shots_used = 40800
Step 17: cost = -7.361893012873347 shots_used = 43200
Step 18: cost = -7.406163353304568 shots_used = 45600
Step 19: cost = -7.439471884054811 shots_used = 48000
Step 20: cost = -7.464642983727413 shots_used = 50400
Step 21: cost = -7.482135093155363 shots_used = 52800
Step 22: cost = -7.492965982678255 shots_used = 55200
Step 23: cost = -7.4995711208123526 shots_used = 57600
Step 24: cost = -7.504495071237245 shots_used = 60000
Step 25: cost = -7.508236629783636 shots_used = 62400
Step 26: cost = -7.512755576875523 shots_used = 64800
Step 27: cost = -7.521073754365091 shots_used = 67200
Step 28: cost = -7.529908984029604 shots_used = 69600
Step 29: cost = -7.5453558194210855 shots_used = 72000
Step 30: cost = -7.566103826658749 shots_used = 74400
Step 31: cost = -7.586593697229583 shots_used = 76800
Step 32: cost = -7.608123856223107 shots_used = 79200
Step 33: cost = -7.627060101684072 shots_used = 81600
Step 34: cost = -7.647119544050486 shots_used = 84000
Step 35: cost = -7.664689836652333 shots_used = 86400
Step 36: cost = -7.6787332388013185 shots_used = 88800
Step 37: cost = -7.688432645200169 shots_used = 91200
Step 38: cost = -7.69494802947398 shots_used = 93600
Step 39: cost = -7.699261015754831 shots_used = 96000
Step 40: cost = -7.701146491260424 shots_used = 98400
Step 41: cost = -7.70055797092466 shots_used = 100800
Step 42: cost = -7.698138076324305 shots_used = 103200
Step 43: cost = -7.694484794884271 shots_used = 105600
Step 44: cost = -7.692762362612309 shots_used = 108000
Step 45: cost = -7.691301914538373 shots_used = 110400
Step 46: cost = -7.690289452924415 shots_used = 112800
Step 47: cost = -7.692825466597301 shots_used = 115200
Step 48: cost = -7.698426326150402 shots_used = 117600
Step 49: cost = -7.70569685153181 shots_used = 120000
Step 50: cost = -7.71270798174784 shots_used = 122400
Step 51: cost = -7.719728739440944 shots_used = 124800
Step 52: cost = -7.7261764433031335 shots_used = 127200
Step 53: cost = -7.731758840605337 shots_used = 129600
Step 54: cost = -7.737486107940807 shots_used = 132000
Step 55: cost = -7.742883220750697 shots_used = 134400
Step 56: cost = -7.748378742512239 shots_used = 136800
Step 57: cost = -7.753133233487281 shots_used = 139200
Step 58: cost = -7.756478490455658 shots_used = 141600
Step 59: cost = -7.760357871331654 shots_used = 144000
Step 60: cost = -7.765368510137018 shots_used = 146400
Step 61: cost = -7.770584310342268 shots_used = 148800
Step 62: cost = -7.775237183666503 shots_used = 151200
Step 63: cost = -7.780213114403942 shots_used = 153600
Step 64: cost = -7.786537928974204 shots_used = 156000
Step 65: cost = -7.7934701200015875 shots_used = 158400
Step 66: cost = -7.799223986115196 shots_used = 160800
Step 67: cost = -7.802077100958286 shots_used = 163200
Step 68: cost = -7.803342383342252 shots_used = 165600
Step 69: cost = -7.804200856020879 shots_used = 168000
Step 70: cost = -7.805131083684176 shots_used = 170400
Step 71: cost = -7.80783114227738 shots_used = 172800
Step 72: cost = -7.80852611882117 shots_used = 175200
Step 73: cost = -7.810012472165346 shots_used = 177600
Step 74: cost = -7.811944961883073 shots_used = 180000
Step 75: cost = -7.815916061479068 shots_used = 182400
Step 76: cost = -7.8219164383912165 shots_used = 184800
Step 77: cost = -7.828221376600689 shots_used = 187200
Step 78: cost = -7.833006254177973 shots_used = 189600
Step 79: cost = -7.8363875131373835 shots_used = 192000
Step 80: cost = -7.8396352291244575 shots_used = 194400
Step 81: cost = -7.842450998127182 shots_used = 196800
Step 82: cost = -7.84447619621909 shots_used = 199200
Step 83: cost = -7.846066173612538 shots_used = 201600
Step 84: cost = -7.8468744330712 shots_used = 204000
Step 85: cost = -7.848638965889068 shots_used = 206400
Step 86: cost = -7.850825571201936 shots_used = 208800
Step 87: cost = -7.853653343602643 shots_used = 211200
Step 88: cost = -7.8568340468002775 shots_used = 213600
Step 89: cost = -7.859359135290533 shots_used = 216000
Step 90: cost = -7.861570849252088 shots_used = 218400
Step 91: cost = -7.864125171249038 shots_used = 220800
Step 92: cost = -7.8670631265084445 shots_used = 223200
Step 93: cost = -7.869145745867427 shots_used = 225600
Step 94: cost = -7.871590481712041 shots_used = 228000
Step 95: cost = -7.874015910508825 shots_used = 230400
Step 96: cost = -7.875480563579005 shots_used = 232800
Step 97: cost = -7.8765609486774055 shots_used = 235200
Step 98: cost = -7.87696783221222 shots_used = 237600
Step 99: cost = -7.877634614604139 shots_used = 240000

Plotting both experiments:

plt.style.use("seaborn")
plt.plot(shots_rosalin, cost_rosalin, "b", label="Rosalin")
plt.plot(shots_adam, cost_adam, "g", label="Adam")

plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.xlim(0, 300000)
plt.show()

The Rosalin optimizer performs significantly better than the Adam optimizer, approaching the ground state energy of the Hamiltonian with strikingly fewer shots.

While beyond the scope of this demonstration, the Rosalin optimizer can be modified in various other ways; for instance, by incorporating weighted hybrid sampling (which distributes some shots deterministically, with the remainder sampled randomly), or by building on the iCANS2 variant of the optimizer. Download this demonstration from the sidebar 👉 and give it a go! ⚛️
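
As a rough illustration of the first idea, here is a minimal sketch of what weighted hybrid sampling could look like; the function name hybrid_shot_distribution and the frac parameter are illustrative choices, not taken from the paper:

import numpy as np
from scipy.stats import multinomial

def hybrid_shot_distribution(coeffs, total_shots, frac=0.5):
    """Assign a fraction ``frac`` of the shots deterministically
    (weighted by |c_i|), and sample the remainder from a multinomial
    distribution over the same weights."""
    probs = np.abs(coeffs) / np.sum(np.abs(coeffs))

    # deterministic portion: floor of the weighted allocation
    deterministic = np.floor(frac * total_shots * probs).astype(int)

    # remaining shots are distributed randomly
    remainder = int(total_shots - deterministic.sum())
    random_part = multinomial(n=remainder, p=probs).rvs()[0]

    return deterministic + random_part

print(hybrid_shot_distribution(np.array([2, 4, -1, 5, 2]), 8000))

Whatever the split, the returned allocation always sums to total_shots, so the overall shot budget is preserved.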

References

[1] Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).
[2] James Stokes, Josh Izaac, Nathan Killoran, and Giuseppe Carleo. “Quantum Natural Gradient.” arXiv:1909.02108 (2019).
[3] Ryan Sweke, Frederik Wilde, Johannes Jakob Meyer, Maria Schuld, Paul K. Fährmann, Barthélémy Meynard-Piganeau, and Jens Eisert. “Stochastic gradient descent for hybrid quantum-classical optimization.” arXiv:1910.01155 (2019).
[4] Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).
