Frugal shot optimization with Rosalin

Author: PennyLane dev team. Posted: 19 May 2020. Last updated: 13 April 2021.

In this tutorial we investigate and implement the Rosalin (Random Operator Sampling for Adaptive Learning with Individual Number of shots) optimizer from Arrasmith et al. [1]. In this paper, a strategy is introduced for reducing the number of shots required when optimizing variational quantum algorithms, by both:

  • Frugally adapting the number of shots used per parameter update, and
  • Performing a weighted sampling of operators from the cost Hamiltonian.

Note

The Rosalin optimizer is available in PennyLane via the ShotAdaptiveOptimizer.

Background

While a large number of papers in variational quantum algorithms focus on the choice of circuit ansatz, cost function, gradient computation, or initialization method, the optimization strategy—an important component affecting both convergence time and quantum resource dependence—is not as frequently considered. Instead, common ‘out-of-the-box’ classical optimization techniques, such as gradient-free methods (COBYLA, Nelder–Mead), gradient descent, and Hessian-free methods (L-BFGS), tend to be used.

However, for variational algorithms such as VQE, which involve evaluating a large number of non-commuting operators in the cost function, decreasing the number of quantum evaluations required for convergence, while still minimizing statistical noise, can be a delicate balance.

Recent work has highlighted that ‘quantum-aware’ optimization techniques can lead to marked improvements when training variational quantum algorithms:

  • Quantum natural gradient descent by Stokes et al. [2], which takes into account the quantum geometry during the gradient-descent update step.
  • The work of Sweke et al. [3], which shows that quantum gradient descent with a finite number of shots is equivalent to stochastic gradient descent, and has guaranteed convergence. Furthermore, combining a finite number of shots with weighted sampling of the cost function terms leads to Doubly stochastic gradient descent.
  • The iCANS (individual Coupled Adaptive Number of Shots) optimization technique by Jonas Kübler et al. [4] adapts the number of measurement shots during training, by maximizing the expected gain per shot.

In this latest result by Arrasmith et al. [1], the idea of doubly stochastic gradient descent has been used to extend the iCANS optimizer, resulting in faster convergence.

Over the course of this tutorial, we will explore their results; beginning with a demonstration of weighted random sampling of the cost Hamiltonian operators, before combining this with the shot-frugal iCANS optimizer to perform doubly stochastic Rosalin optimization.

Weighted random sampling

Consider a Hamiltonian \(H\) expanded as a weighted sum of operators \(h_i\) that can be directly measured:

\[H = \sum_{i=1}^N c_i h_i.\]

Due to the linearity of expectation values, the expectation value of this Hamiltonian can be expressed as the weighted sum of each individual term:

\[\langle H\rangle = \sum_{i=1}^N c_i \langle h_i\rangle.\]

In the doubly stochastic gradient descent demonstration, we estimated this expectation value by uniformly sampling a subset of the terms at each optimization step, and evaluating each term using the same finite number of shots.

However, what happens if we use a weighted approach to determine how to distribute our samples across the terms of the Hamiltonian? In weighted random sampling (WRS), the number of shots used to determine the expectation value \(\langle h_i\rangle\) is a discrete random variable distributed according to a multinomial distribution,

\[S \sim \text{Multinomial}(p_i),\]

with event probabilities

\[p_i = \frac{|c_i|}{\sum_j |c_j|}.\]

That is, the number of shots assigned to the measurement of the expectation value of the \(i\text{th}\) term of the Hamiltonian is drawn from a probability distribution proportional to the magnitude of its coefficient \(c_i\).

To see this strategy in action, consider the Hamiltonian

\[H = 2I\otimes X + 4 I\otimes Z - X\otimes X + 5Y\otimes Y + 2 Z\otimes X.\]

We can solve for the ground state energy using the variational quantum eigensolver (VQE) algorithm.

First, let’s import NumPy and PennyLane, and define our Hamiltonian.

import numpy as np
import pennylane as qml

# set the random seed
np.random.seed(4)

coeffs = [2, 4, -1, 5, 2]

obs = [
  qml.PauliX(1),
  qml.PauliZ(1),
  qml.PauliX(0) @ qml.PauliX(1),
  qml.PauliY(0) @ qml.PauliY(1),
  qml.PauliZ(0) @ qml.PauliX(1)
]

We can now create our quantum device (let’s use the default.qubit simulator), and begin constructing some QNodes to evaluate each observable. For our ansatz, we’ll use the StronglyEntanglingLayers.

from pennylane import expval
from pennylane.init import strong_ent_layers_uniform
from pennylane.templates.layers import StronglyEntanglingLayers

num_layers = 2
num_wires = 2

# create a device that estimates expectation values using a finite number of shots
non_analytic_dev = qml.device("default.qubit", wires=num_wires, shots=100)

# create a device that calculates exact expectation values
analytic_dev = qml.device("default.qubit", wires=num_wires, shots=None)

We use qml.map() to map our ansatz over our list of observables, returning a collection of QNodes, each one evaluating the expectation value of one of the Hamiltonian terms.

qnodes = qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev, diff_method="parameter-shift")

Now, let’s set the total number of shots, and determine the probability for sampling each Hamiltonian term.

total_shots = 8000
prob_shots = np.abs(coeffs) / np.sum(np.abs(coeffs))
print(prob_shots)

Out:

[0.14285714 0.28571429 0.07142857 0.35714286 0.14285714]

We can now use SciPy to create our multinomially distributed random variable \(S\), using the number of trials (total shot number) and probability values:

from scipy.stats import multinomial
si = multinomial(n=total_shots, p=prob_shots)

Sampling from this distribution will provide the number of shots used to sample each term in the Hamiltonian:

samples = si.rvs()[0]
print(samples)
print(sum(samples))

Out:

[1191 2262  552 2876 1119]
8000

As expected, if we sum the sampled shots per term, we recover the total number of shots.

Let’s now create our cost function. Recall that the cost function must do the following:

  1. It must sample from the multinomial distribution we created above, to determine the number of shots \(s_i\) used to estimate the expectation value of the \(i\)th Hamiltonian term.
  2. It then must estimate each expectation value \(\langle h_i\rangle\) by querying the corresponding QNode with \(s_i\) shots.
  3. And, last but not least, combine the results to estimate the expectation value \(\langle H\rangle = \sum_i c_i\langle h_i\rangle\).

def cost(params):
    # sample from the multinomial distribution
    shots_per_term = si.rvs()[0]

    result = 0

    for h, c, s in zip(qnodes, coeffs, shots_per_term):

        # evaluate the QNode corresponding to
        # the Hamiltonian term, and add it on to our running sum
        result += c * h(params, shots=int(s))

    return result

Evaluating our cost function with some initial parameters, we can check that the cost function evaluates correctly.

init_params = strong_ent_layers_uniform(n_layers=num_layers, n_wires=num_wires)
print(cost(init_params))

Out:

-0.8395887630997874

Performing the optimization, with the number of shots randomly determined at each optimization step:

opt = qml.AdamOptimizer(0.05)
params = init_params

cost_wrs = []
shots_wrs = []

for i in range(100):
    params, _cost = opt.step_and_cost(cost, params)
    cost_wrs.append(_cost)
    shots_wrs.append(total_shots*i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_wrs[-1], shots_wrs[-1]))

Out:

Step 0: cost = -0.47971271815214855 shots used = 0
Step 1: cost = -1.6879973520840041 shots used = 8000
Step 2: cost = -2.437928256197112 shots used = 16000
Step 3: cost = -2.9300968884147647 shots used = 24000
Step 4: cost = -3.7779069617997116 shots used = 32000
Step 5: cost = -3.8889841568955115 shots used = 40000
Step 6: cost = -4.508059711766957 shots used = 48000
Step 7: cost = -4.71114219758592 shots used = 56000
Step 8: cost = -4.984457128293103 shots used = 64000
Step 9: cost = -5.597084424095087 shots used = 72000
Step 10: cost = -5.456976403687039 shots used = 80000
Step 11: cost = -5.736752824027413 shots used = 88000
Step 12: cost = -6.220317925041974 shots used = 96000
Step 13: cost = -6.45162161927903 shots used = 104000
Step 14: cost = -6.563539211112225 shots used = 112000
Step 15: cost = -6.487339064303318 shots used = 120000
Step 16: cost = -6.69261841162329 shots used = 128000
Step 17: cost = -6.909230576241427 shots used = 136000
Step 18: cost = -7.05156660241221 shots used = 144000
Step 19: cost = -7.163688069859358 shots used = 152000
Step 20: cost = -7.191791478058647 shots used = 160000
Step 21: cost = -7.191694602776715 shots used = 168000
Step 22: cost = -7.430122007574104 shots used = 176000
Step 23: cost = -7.245621601209081 shots used = 184000
Step 24: cost = -7.539044265851978 shots used = 192000
Step 25: cost = -7.532847998808006 shots used = 200000
Step 26: cost = -7.44257222073886 shots used = 208000
Step 27: cost = -7.439951968648378 shots used = 216000
Step 28: cost = -7.734568855081575 shots used = 224000
Step 29: cost = -7.618221322585628 shots used = 232000
Step 30: cost = -7.651544920606065 shots used = 240000
Step 31: cost = -7.5069088885777155 shots used = 248000
Step 32: cost = -7.780301321189146 shots used = 256000
Step 33: cost = -7.4456447455856445 shots used = 264000
Step 34: cost = -7.403560444278863 shots used = 272000
Step 35: cost = -7.666718876831026 shots used = 280000
Step 36: cost = -7.7178910518866415 shots used = 288000
Step 37: cost = -7.375680885292107 shots used = 296000
Step 38: cost = -7.665568049279896 shots used = 304000
Step 39: cost = -7.568101693343673 shots used = 312000
Step 40: cost = -7.524188200359864 shots used = 320000
Step 41: cost = -7.525528734255245 shots used = 328000
Step 42: cost = -7.57734861403185 shots used = 336000
Step 43: cost = -7.76844833198197 shots used = 344000
Step 44: cost = -7.797619087079373 shots used = 352000
Step 45: cost = -7.879148884805528 shots used = 360000
Step 46: cost = -7.744030492750696 shots used = 368000
Step 47: cost = -7.6484739221198765 shots used = 376000
Step 48: cost = -7.679623095926702 shots used = 384000
Step 49: cost = -7.607476988501242 shots used = 392000
Step 50: cost = -7.856041856821188 shots used = 400000
Step 51: cost = -7.644473030321983 shots used = 408000
Step 52: cost = -7.593159311741706 shots used = 416000
Step 53: cost = -7.606939212888227 shots used = 424000
Step 54: cost = -7.621128949485829 shots used = 432000
Step 55: cost = -7.743568287057952 shots used = 440000
Step 56: cost = -7.6325929460598525 shots used = 448000
Step 57: cost = -7.718256562367575 shots used = 456000
Step 58: cost = -7.861601938446393 shots used = 464000
Step 59: cost = -7.666115854972354 shots used = 472000
Step 60: cost = -7.644148944168839 shots used = 480000
Step 61: cost = -7.771569192260795 shots used = 488000
Step 62: cost = -7.776898446282362 shots used = 496000
Step 63: cost = -7.711006891533269 shots used = 504000
Step 64: cost = -7.748650044666392 shots used = 512000
Step 65: cost = -7.690723991927554 shots used = 520000
Step 66: cost = -7.694117031088106 shots used = 528000
Step 67: cost = -7.793250125674997 shots used = 536000
Step 68: cost = -7.926049735334674 shots used = 544000
Step 69: cost = -7.686292326080605 shots used = 552000
Step 70: cost = -7.745774212716911 shots used = 560000
Step 71: cost = -7.625346751584894 shots used = 568000
Step 72: cost = -7.846664469958039 shots used = 576000
Step 73: cost = -7.860275655123486 shots used = 584000
Step 74: cost = -7.593043619614097 shots used = 592000
Step 75: cost = -7.7969799318129045 shots used = 600000
Step 76: cost = -7.837545360539077 shots used = 608000
Step 77: cost = -7.845253964960701 shots used = 616000
Step 78: cost = -7.941652692590529 shots used = 624000
Step 79: cost = -7.967099906804574 shots used = 632000
Step 80: cost = -7.803163356121793 shots used = 640000
Step 81: cost = -7.665600401510319 shots used = 648000
Step 82: cost = -8.09158124610039 shots used = 656000
Step 83: cost = -7.774883584668083 shots used = 664000
Step 84: cost = -7.758175214036924 shots used = 672000
Step 85: cost = -7.9169924228411865 shots used = 680000
Step 86: cost = -7.670199051467696 shots used = 688000
Step 87: cost = -8.085682024006845 shots used = 696000
Step 88: cost = -7.8433919424579095 shots used = 704000
Step 89: cost = -7.755236580472145 shots used = 712000
Step 90: cost = -7.847624689390126 shots used = 720000
Step 91: cost = -8.122239105086607 shots used = 728000
Step 92: cost = -7.922374192271718 shots used = 736000
Step 93: cost = -7.904676929818973 shots used = 744000
Step 94: cost = -7.909417248833883 shots used = 752000
Step 95: cost = -8.06033491620787 shots used = 760000
Step 96: cost = -7.765636196903123 shots used = 768000
Step 97: cost = -7.801666008865329 shots used = 776000
Step 98: cost = -8.066513329432457 shots used = 784000
Step 99: cost = -7.8942080196569675 shots used = 792000

Let’s compare this against an optimization not using weighted random sampling. Here, we will split the 8000 total shots evenly across all Hamiltonian terms, also known as uniform deterministic sampling.

non_analytic_dev.shots = int(total_shots / len(coeffs))

qnodes = qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev)
cost = qml.dot(coeffs, qnodes)

opt = qml.AdamOptimizer(0.05)
params = init_params

cost_adam = []
shots_adam = []

for i in range(100):
    params, _cost = opt.step_and_cost(cost, params)
    cost_adam.append(_cost)
    shots_adam.append(total_shots*i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_adam[-1], shots_adam[-1]))

Out:

Step 0: cost = -0.38250000000000006 shots used = 0
Step 1: cost = -1.7450000000000006 shots used = 8000
Step 2: cost = -2.54875 shots used = 16000
Step 3: cost = -2.91 shots used = 24000
Step 4: cost = -3.4762500000000003 shots used = 32000
Step 5: cost = -4.08875 shots used = 40000
Step 6: cost = -4.586250000000001 shots used = 48000
Step 7: cost = -4.805 shots used = 56000
Step 8: cost = -4.925 shots used = 64000
Step 9: cost = -5.385000000000001 shots used = 72000
Step 10: cost = -5.4725 shots used = 80000
Step 11: cost = -5.63875 shots used = 88000
Step 12: cost = -5.796250000000001 shots used = 96000
Step 13: cost = -6.308750000000001 shots used = 104000
Step 14: cost = -6.2524999999999995 shots used = 112000
Step 15: cost = -6.706249999999999 shots used = 120000
Step 16: cost = -6.711250000000001 shots used = 128000
Step 17: cost = -6.803749999999999 shots used = 136000
Step 18: cost = -6.94375 shots used = 144000
Step 19: cost = -7.2837499999999995 shots used = 152000
Step 20: cost = -7.4 shots used = 160000
Step 21: cost = -7.38375 shots used = 168000
Step 22: cost = -7.40125 shots used = 176000
Step 23: cost = -7.4775 shots used = 184000
Step 24: cost = -7.58 shots used = 192000
Step 25: cost = -7.623749999999999 shots used = 200000
Step 26: cost = -7.49625 shots used = 208000
Step 27: cost = -7.58375 shots used = 216000
Step 28: cost = -7.6312500000000005 shots used = 224000
Step 29: cost = -7.13375 shots used = 232000
Step 30: cost = -7.47 shots used = 240000
Step 31: cost = -7.6075 shots used = 248000
Step 32: cost = -7.34875 shots used = 256000
Step 33: cost = -7.6525 shots used = 264000
Step 34: cost = -7.572500000000001 shots used = 272000
Step 35: cost = -7.390000000000001 shots used = 280000
Step 36: cost = -7.76375 shots used = 288000
Step 37: cost = -7.49 shots used = 296000
Step 38: cost = -7.61625 shots used = 304000
Step 39: cost = -7.695 shots used = 312000
Step 40: cost = -7.702499999999999 shots used = 320000
Step 41: cost = -7.59625 shots used = 328000
Step 42: cost = -7.733750000000001 shots used = 336000
Step 43: cost = -7.6875 shots used = 344000
Step 44: cost = -7.75875 shots used = 352000
Step 45: cost = -7.796250000000001 shots used = 360000
Step 46: cost = -7.7387500000000005 shots used = 368000
Step 47: cost = -7.92375 shots used = 376000
Step 48: cost = -7.6225 shots used = 384000
Step 49: cost = -7.8425 shots used = 392000
Step 50: cost = -7.74 shots used = 400000
Step 51: cost = -7.661250000000001 shots used = 408000
Step 52: cost = -7.786250000000001 shots used = 416000
Step 53: cost = -7.78875 shots used = 424000
Step 54: cost = -7.62375 shots used = 432000
Step 55: cost = -7.9375 shots used = 440000
Step 56: cost = -7.71625 shots used = 448000
Step 57: cost = -7.72375 shots used = 456000
Step 58: cost = -7.741250000000001 shots used = 464000
Step 59: cost = -7.811249999999999 shots used = 472000
Step 60: cost = -7.89 shots used = 480000
Step 61: cost = -7.74 shots used = 488000
Step 62: cost = -7.751250000000001 shots used = 496000
Step 63: cost = -7.71875 shots used = 504000
Step 64: cost = -7.695 shots used = 512000
Step 65: cost = -7.7325 shots used = 520000
Step 66: cost = -7.819999999999999 shots used = 528000
Step 67: cost = -7.981249999999999 shots used = 536000
Step 68: cost = -7.8 shots used = 544000
Step 69: cost = -7.89 shots used = 552000
Step 70: cost = -7.7125 shots used = 560000
Step 71: cost = -7.993750000000001 shots used = 568000
Step 72: cost = -7.772499999999999 shots used = 576000
Step 73: cost = -8.01125 shots used = 584000
Step 74: cost = -8.116249999999999 shots used = 592000
Step 75: cost = -7.9662500000000005 shots used = 600000
Step 76: cost = -7.7125 shots used = 608000
Step 77: cost = -7.8925 shots used = 616000
Step 78: cost = -7.967499999999999 shots used = 624000
Step 79: cost = -7.91375 shots used = 632000
Step 80: cost = -7.797499999999999 shots used = 640000
Step 81: cost = -7.9975000000000005 shots used = 648000
Step 82: cost = -7.99 shots used = 656000
Step 83: cost = -7.7124999999999995 shots used = 664000
Step 84: cost = -7.76875 shots used = 672000
Step 85: cost = -7.62 shots used = 680000
Step 86: cost = -7.822500000000001 shots used = 688000
Step 87: cost = -7.74625 shots used = 696000
Step 88: cost = -7.9137499999999985 shots used = 704000
Step 89: cost = -7.86125 shots used = 712000
Step 90: cost = -7.975 shots used = 720000
Step 91: cost = -7.89375 shots used = 728000
Step 92: cost = -8.1075 shots used = 736000
Step 93: cost = -7.775 shots used = 744000
Step 94: cost = -7.8999999999999995 shots used = 752000
Step 95: cost = -7.85625 shots used = 760000
Step 96: cost = -7.925000000000001 shots used = 768000
Step 97: cost = -8.0 shots used = 776000
Step 98: cost = -7.825000000000001 shots used = 784000
Step 99: cost = -7.999999999999999 shots used = 792000

Comparing these two techniques:

from matplotlib import pyplot as plt

plt.style.use("seaborn")
plt.plot(shots_wrs, cost_wrs, "b", label="Adam WRS")
plt.plot(shots_adam, cost_adam, "g", label="Adam")

plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.show()
[Figure: cost function value vs. number of shots, comparing Adam with weighted random sampling (WRS) to Adam with uniform deterministic sampling]

We can see that weighted random sampling performs just as well as uniform deterministic sampling. However, weighted random sampling begins to show a non-negligible improvement over deterministic sampling for large Hamiltonians with highly non-uniform coefficients. For example, see Figs. 3 and 4 of Arrasmith et al. [1], which compare weighted random sampling VQE optimization for both the \(\text{H}_2\) and \(\text{LiH}\) molecules.

Note

While not covered here, another approach that could be taken is weighted deterministic sampling. Here, the number of shots is distributed across terms as per

\[s_i = \left\lfloor N_{\text{shots}} \frac{|c_i|}{\sum_j |c_j|}\right\rfloor,\]

where \(N_{\text{shots}}\) is the total number of shots.
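As a quick illustration, here is a minimal sketch (our own, using the coefficients defined earlier in this tutorial) of this deterministic allocation. Note that the floor operation can leave a few shots unassigned; handing the remainder to the largest-weight term is a choice we make here for illustration, not something prescribed by the strategy above.

import numpy as np

coeffs = [2, 4, -1, 5, 2]  # the Hamiltonian coefficients from above
total_shots = 8000

# deterministic weighted allocation: floor(N_shots * |c_i| / sum_j |c_j|)
weights = np.abs(coeffs) / np.sum(np.abs(coeffs))
shots_det = np.floor(total_shots * weights).astype(int)

# the floor can leave a few shots unassigned; as a simple choice,
# we hand the remainder to the largest-weight term
shots_det[np.argmax(weights)] += total_shots - shots_det.sum()
print(shots_det, shots_det.sum())  # [1142 2285  571 2860 1142] 8000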

Rosalin: Frugal shot optimization

We can see above that both methods optimize fairly well; weighted random sampling converges just as well as evenly distributing the shots across all Hamiltonian terms. However, deterministic shot distribution approaches will always have a minimum shot value required per expectation value, as below this threshold they become biased estimators. This is not the case with random sampling; as we saw in the doubly stochastic gradient descent demonstration, the introduction of randomness allows for as little as a single shot per expectation term, while still remaining an unbiased estimator.
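To make this concrete, here is a small, purely classical sketch (plain NumPy, with made-up stand-in values mu playing the role of the exact term expectation values) showing that sampling a single term at a time, and dividing each sample by its selection probability, still gives an unbiased estimate of the full weighted sum:

import numpy as np

np.random.seed(0)

c = np.array([2.0, 4.0, -1.0, 5.0, 2.0])    # term coefficients
mu = np.array([0.1, -0.5, 0.3, -0.9, 0.2])  # stand-in term expectation values
p = np.abs(c) / np.sum(np.abs(c))           # term selection probabilities

exact = np.sum(c * mu)

# each single-term sample c_i * mu_i / p_i is unbiased, since
# E[c_i * mu_i / p_i] = sum_i p_i * (c_i * mu_i / p_i) = sum_i c_i * mu_i
idx = np.random.choice(len(c), size=100000, p=p)
samples = c[idx] * mu[idx] / p[idx]

print(exact, samples.mean())  # the sample mean approaches the exact value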

Using this insight, Arrasmith et al. [1] modified the iCANS frugal shot-optimization technique [4] to include weighted random sampling, making it ‘doubly stochastic’.

iCANS optimizer

Two variants of the iCANS optimizer were introduced in Kübler et al., iCANS1 and iCANS2. The iCANS1 optimizer, on which Rosalin is based, frugally distributes a shot budget across the partial derivatives with respect to each parameter, which are computed using the parameter-shift rule. It works roughly as follows:

  1. The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min}\), for all partial derivatives.

  2. The parameter-shift rule is then used to estimate the gradient \(g_i\) for each parameter \(\theta_i\), as well as the variance \(v_i\) of each estimated gradient.

  3. Gradient descent is performed for each parameter \(\theta_i\), using the pre-defined learning rate \(\alpha\) and the gradient information \(g_i\):

    \[\theta_i = \theta_i - \alpha g_i.\]
  4. The improvement in the cost function per shot, for a specific parameter value, is then calculated via

    \[\gamma_i = \frac{1}{s_i} \left[ \left(\alpha - \frac{1}{2} L\alpha^2\right) g_i^2 - \frac{L\alpha^2}{2s_i}v_i \right],\]

    where:

    • \(L \leq \sum_i|c_i|\) is a bound on the Lipschitz constant of the variational quantum algorithm's objective function,
    • \(c_i\) are the coefficients of the Hamiltonian, and
    • \(\alpha\) is the learning rate, which must satisfy \(\alpha < 2/L\) for the above expression to hold.
  5. Finally, the new values of \(s_i\) (shots for partial derivative of parameter \(\theta_i\)) is given by:

    \[s_i = \frac{2L\alpha}{2-L\alpha}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.\]

In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\) respectively) are used when computing \(\gamma_i\) and \(s_i\).
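To get a feel for steps 4 and 5, here is a minimal numeric sketch, with made-up gradient and variance values for two parameters, and omitting the running averages and regularization used in the full implementation below:

import numpy as np

L = 14.0      # Lipschitz bound: sum of |c_i| for our example Hamiltonian
alpha = 0.07  # learning rate, satisfying alpha < 2 / L ≈ 0.143

g = np.array([0.8, -0.05])  # made-up gradient estimates
v = np.array([0.5, 0.6])    # made-up variances of those estimates
s = np.array([10.0, 10.0])  # shots currently assigned per partial derivative

# expected gain per shot (step 4)
gamma = ((alpha - 0.5 * L * alpha ** 2) * g ** 2 - (L * alpha ** 2) / (2 * s) * v) / s

# recommended shots for the next step (step 5), rounded up as in the
# implementation below; the small-gradient, high-variance parameter
# is assigned far more shots
s_new = np.ceil((2 * L * alpha) / (2 - L * alpha) * v / g ** 2)
print(gamma)
print(s_new)  # approximately [2., 462.]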

Note

In classical machine learning, the Lipschitz constant of the cost function is generally unknown. However, for a variational quantum algorithm with cost of the form \(f(x) = \langle \psi(x) | \hat{H} |\psi(x)\rangle\), an upper bound on the Lipschitz constant is given by \(\sum_i|c_i|\), where \(c_i\) are the coefficients of \(\hat{H}\) when decomposed into a linear combination of Pauli-operator tensor products.

Rosalin implementation

Let’s now modify iCANS above to incorporate weighted random sampling of Hamiltonian terms — the Rosalin frugal shot optimizer.

Rosalin takes several hyper-parameters:

  • min_shots: the minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.
  • mu: The running average constant \(\mu\in[0, 1]\). Used to control how quickly the number of shots recommended for each gradient component changes.
  • b: Regularization bias. The bias should be kept small, but non-zero.
  • lr: The learning rate. Recall from above that the learning rate must be such that \(\alpha < 2/L = 2/\sum_i|c_i|\).

Since the Rosalin optimizer has a state that must be preserved between optimization steps, let’s use a class to create our optimizer.

class Rosalin:

    def __init__(self, qnodes, coeffs, min_shots, mu=0.99, b=1e-6, lr=0.07):
        self.qnodes = qnodes
        self.coeffs = coeffs

        self.lipschitz = np.sum(np.abs(coeffs))

        if lr > 2 / self.lipschitz:
            raise ValueError(f"The learning rate must be less than {2 / self.lipschitz}")

        # hyperparameters
        self.min_shots = min_shots
        self.mu = mu  # running average constant
        self.b = b    # regularization bias
        self.lr = lr  # learning rate

        # keep track of the total number of shots used
        self.shots_used = 0
        # total number of iterations
        self.k = 0
        # Number of shots per parameter; the shape is inherited from the
        # globally defined ``params`` array
        self.s = np.zeros_like(params, dtype=np.float64) + min_shots

        # Running average of the parameter gradients
        self.chi = None
        # Running average of the variance of the parameter gradients
        self.xi = None

    def estimate_hamiltonian(self, params, shots):
        """Returns an array containing length ``shots`` single-shot estimates
        of the Hamiltonian. The shots are distributed randomly over
        the terms in the Hamiltonian, as per a Multinomial distribution.

        Since we are performing single-shot estimates, the QNodes must be
        set to 'sample' mode.
        """

        # determine the shot probability per term
        prob_shots = np.abs(self.coeffs) / np.sum(np.abs(self.coeffs))

        # construct the multinomial distribution, and sample
        # from it to determine how many shots to apply per term
        si = multinomial(n=shots, p=prob_shots)
        shots_per_term = si.rvs()[0]

        results = []
        for h, c, p, s in zip(self.qnodes, self.coeffs, prob_shots, shots_per_term):

            # if the number of shots is 0, do nothing
            if s == 0:
                continue

            # evaluate the QNode corresponding to
            # the Hamiltonian term
            res = h(params, shots=int(s))

            if s == 1:
                res = np.array([res])

            # Note that, unlike above, we divide each term by the
            # probability per shot, so that each single-shot sample is an
            # unbiased estimator of the full Hamiltonian expectation value.
            results.append(c * res / p)

        return np.concatenate(results)

    def evaluate_grad_var(self, i, params, shots):
        """Evaluate the gradient, as well as the variance in the gradient,
        for the ith parameter in params, using the parameter-shift rule.
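
        The parameter-shift rule used here estimates the derivative as

            df/dtheta_i = [f(theta + (pi/2) e_i) - f(theta - (pi/2) e_i)] / 2,

        where e_i is the unit vector along the ith parameter, and each
        evaluation of f returns a vector of single-shot Hamiltonian
        estimates.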
        """
        shift = np.zeros_like(params)
        shift[i] = np.pi / 2

        shift_forward = self.estimate_hamiltonian(params + shift, shots)
        shift_backward = self.estimate_hamiltonian(params - shift, shots)

        g = np.mean(shift_forward - shift_backward) / 2
        s = np.var((shift_forward - shift_backward) / 2, ddof=1)

        return g, s

    def step(self, params):
        """Perform a single step of the Rosalin optimizer."""
        # keep track of the number of shots run
        self.shots_used += int(2 * np.sum(self.s))

        # compute the gradient, as well as the variance in the gradient,
        # using the number of shots determined by the array s.
        grad = []
        S = []

        p_ind = list(np.ndindex(*params.shape))

        for l in p_ind:
            # loop through each parameter, performing
            # the parameter-shift rule
            g_, s_ = self.evaluate_grad_var(l, params, self.s[l])
            grad.append(g_)
            S.append(s_)

        grad = np.reshape(np.stack(grad), params.shape)
        S = np.reshape(np.stack(S), params.shape)

        # gradient descent update
        params = params - self.lr * grad

        if self.xi is None:
            self.chi = np.zeros_like(params, dtype=np.float64)
            self.xi = np.zeros_like(params, dtype=np.float64)

        # running average of the gradient variance
        self.xi = self.mu * self.xi + (1 - self.mu) * S
        xi = self.xi / (1 - self.mu ** (self.k + 1))

        # running average of the gradient
        self.chi = self.mu * self.chi + (1 - self.mu) * grad
        chi = self.chi / (1 - self.mu ** (self.k + 1))

        # determine the new optimum shots distribution for the next
        # iteration of the optimizer
        s = np.ceil(
            (2 * self.lipschitz * self.lr * xi)
            / ((2 - self.lipschitz * self.lr) * (chi ** 2 + self.b * (self.mu ** self.k)))
        )

        # apply an upper and lower bound on the new shot distributions,
        # to avoid the number of shots reducing below min(2, min_shots),
        # or growing too significantly.
        gamma = (
            (self.lr - self.lipschitz * self.lr ** 2 / 2) * chi ** 2
            - xi * self.lipschitz * self.lr ** 2 / (2 * s)
        ) / s

        argmax_gamma = np.unravel_index(np.argmax(gamma), gamma.shape)
        smax = s[argmax_gamma]
        self.s = np.clip(s, min(2, self.min_shots), smax)

        self.k += 1
        return params

Rosalin optimization

We are now ready to use our Rosalin optimizer to optimize the initial VQE problem. Note that we create our QNodes using measure="sample", since the Rosalin optimizer must be able to generate single-shot samples from our device.

rosalin_device = qml.device("default.qubit", wires=num_wires, shots=100)
qnodes = qml.map(StronglyEntanglingLayers, obs, device=rosalin_device, measure="sample")

Let’s also create a separate cost function using an ‘exact’ quantum device, so that we can keep track of the exact cost function value at each iteration.

cost_analytic = qml.dot(coeffs, qml.map(StronglyEntanglingLayers, obs, device=analytic_dev))

Creating the optimizer and beginning the optimization:

opt = Rosalin(qnodes, coeffs, min_shots=10)
params = init_params

cost_rosalin = [cost_analytic(params)]
shots_rosalin = [0]

for i in range(60):
    params = opt.step(params)
    cost_rosalin.append(cost_analytic(params))
    shots_rosalin.append(opt.shots_used)
    print(f"Step {i}: cost = {cost_rosalin[-1]}, shots_used = {shots_rosalin[-1]}")

Out:

Step 0: cost = -5.976611864639143, shots_used = 240
Step 1: cost = -3.9696542358660727, shots_used = 288
Step 2: cost = -4.960189727105254, shots_used = 360
Step 3: cost = -4.580003760087767, shots_used = 456
Step 4: cost = -2.230216749128693, shots_used = 552
Step 5: cost = -3.6390262209635624, shots_used = 696
Step 6: cost = -6.407579837465835, shots_used = 1050
Step 7: cost = -7.4366536874312565, shots_used = 1578
Step 8: cost = -7.259604321778904, shots_used = 2250
Step 9: cost = -7.062132684694287, shots_used = 2970
Step 10: cost = -7.5539381823528915, shots_used = 3738
Step 11: cost = -7.530120251217978, shots_used = 4866
Step 12: cost = -7.620064018172076, shots_used = 6474
Step 13: cost = -7.749105026853709, shots_used = 8288
Step 14: cost = -7.7584669100105454, shots_used = 10388
Step 15: cost = -7.5476680907885845, shots_used = 12404
Step 16: cost = -7.802606000681813, shots_used = 14660
Step 17: cost = -7.819375105495885, shots_used = 17180
Step 18: cost = -7.813893056373781, shots_used = 19700
Step 19: cost = -7.818976697763795, shots_used = 22796
Step 20: cost = -7.847655565015213, shots_used = 26372
Step 21: cost = -7.854512274045721, shots_used = 30810
Step 22: cost = -7.855665819254089, shots_used = 35538
Step 23: cost = -7.843276666680198, shots_used = 40770
Step 24: cost = -7.82813895796069, shots_used = 45762
Step 25: cost = -7.796501914990248, shots_used = 51162
Step 26: cost = -7.871130124788932, shots_used = 56466
Step 27: cost = -7.866190872563943, shots_used = 62010
Step 28: cost = -7.780118268373553, shots_used = 68250
Step 29: cost = -7.843565291223448, shots_used = 74946
Step 30: cost = -7.840084824878835, shots_used = 81762
Step 31: cost = -7.863430860462219, shots_used = 88962
Step 32: cost = -7.863400771365601, shots_used = 96786
Step 33: cost = -7.828392469226825, shots_used = 104730
Step 34: cost = -7.845758777555817, shots_used = 114532
Step 35: cost = -7.862280441095794, shots_used = 122908
Step 36: cost = -7.866212335569502, shots_used = 131836
Step 37: cost = -7.859430128177042, shots_used = 140500
Step 38: cost = -7.856087432905534, shots_used = 150076
Step 39: cost = -7.850323433779115, shots_used = 159676
Step 40: cost = -7.834403598788763, shots_used = 170116
Step 41: cost = -7.849769789802028, shots_used = 181300
Step 42: cost = -7.86693841353118, shots_used = 192700
Step 43: cost = -7.865653895759861, shots_used = 204460
Step 44: cost = -7.853522061269157, shots_used = 217900
Step 45: cost = -7.885272132729725, shots_used = 231748
Step 46: cost = -7.88224395467864, shots_used = 245644
Step 47: cost = -7.884376349618622, shots_used = 259852
Step 48: cost = -7.8808911781003825, shots_used = 275164
Step 49: cost = -7.881035167671664, shots_used = 292444
Step 50: cost = -7.881931152903569, shots_used = 310300
Step 51: cost = -7.873486288144938, shots_used = 329452
Step 52: cost = -7.842973314288795, shots_used = 348532
Step 53: cost = -7.87101794797729, shots_used = 368644
Step 54: cost = -7.880857865087542, shots_used = 388828
Step 55: cost = -7.884163217633474, shots_used = 409132
Step 56: cost = -7.866452206380498, shots_used = 429076
Step 57: cost = -7.876255345278057, shots_used = 451468
Step 58: cost = -7.87369984074766, shots_used = 475348
Step 59: cost = -7.890243502630163, shots_used = 501460

Let’s compare this to a standard Adam optimization. Using 100 shots per quantum evaluation, each update step requires 2 quantum evaluations per parameter via the parameter-shift rule.

adam_shots_per_eval = 100
adam_shots_per_step = 2 * adam_shots_per_eval * len(params.flatten())
print(adam_shots_per_step)

Out:

2400

Thus, Adam is using 2400 shots per update step.

params = init_params
opt = qml.AdamOptimizer(0.07)

non_analytic_dev.shots = adam_shots_per_eval
cost = qml.dot(
  coeffs,
  qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev, diff_method="parameter-shift")
)

cost_adam = [cost_analytic(params)]
shots_adam = [0]

for i in range(100):
    params = opt.step(cost, params)
    cost_adam.append(cost_analytic(params))
    shots_adam.append(adam_shots_per_step * (i + 1))
    print("Step {}: cost = {} shots_used = {}".format(i, cost_adam[-1], shots_adam[-1]))

Out:

Step 0: cost = -2.03376839972733 shots_used = 2400
Step 1: cost = -3.0397515887713897 shots_used = 4800
Step 2: cost = -3.8459175082365666 shots_used = 7200
Step 3: cost = -4.505506895275778 shots_used = 9600
Step 4: cost = -5.0488106623708084 shots_used = 12000
Step 5: cost = -5.482162129547712 shots_used = 14400
Step 6: cost = -5.83880726147689 shots_used = 16800
Step 7: cost = -6.143933494222608 shots_used = 19200
Step 8: cost = -6.412317130720796 shots_used = 21600
Step 9: cost = -6.6534666682698 shots_used = 24000
Step 10: cost = -6.86746547637287 shots_used = 26400
Step 11: cost = -7.057043661341395 shots_used = 28800
Step 12: cost = -7.219548494479429 shots_used = 31200
Step 13: cost = -7.3445177518694456 shots_used = 33600
Step 14: cost = -7.435753942420535 shots_used = 36000
Step 15: cost = -7.497138548636965 shots_used = 38400
Step 16: cost = -7.529946318655265 shots_used = 40800
Step 17: cost = -7.537070813893377 shots_used = 43200
Step 18: cost = -7.525225697166624 shots_used = 45600
Step 19: cost = -7.5048251159723405 shots_used = 48000
Step 20: cost = -7.481487171246212 shots_used = 50400
Step 21: cost = -7.461106527571478 shots_used = 52800
Step 22: cost = -7.4490325775024075 shots_used = 55200
Step 23: cost = -7.444817343084735 shots_used = 57600
Step 24: cost = -7.4494913586937574 shots_used = 60000
Step 25: cost = -7.462969617594349 shots_used = 62400
Step 26: cost = -7.484518392550573 shots_used = 64800
Step 27: cost = -7.509533957688121 shots_used = 67200
Step 28: cost = -7.535240804873656 shots_used = 69600
Step 29: cost = -7.560642729685874 shots_used = 72000
Step 30: cost = -7.586205677180162 shots_used = 74400
Step 31: cost = -7.61260475402048 shots_used = 76800
Step 32: cost = -7.637117815005769 shots_used = 79200
Step 33: cost = -7.661716123608457 shots_used = 81600
Step 34: cost = -7.6852319189727165 shots_used = 84000
Step 35: cost = -7.708583289744081 shots_used = 86400
Step 36: cost = -7.729551671925802 shots_used = 88800
Step 37: cost = -7.7462558125604595 shots_used = 91200
Step 38: cost = -7.758965992155235 shots_used = 93600
Step 39: cost = -7.764889692835303 shots_used = 96000
Step 40: cost = -7.770298814247658 shots_used = 98400
Step 41: cost = -7.771938304013664 shots_used = 100800
Step 42: cost = -7.771490419427766 shots_used = 103200
Step 43: cost = -7.771665932203987 shots_used = 105600
Step 44: cost = -7.771775966399097 shots_used = 108000
Step 45: cost = -7.772019786144459 shots_used = 110400
Step 46: cost = -7.774409408800273 shots_used = 112800
Step 47: cost = -7.777544198411677 shots_used = 115200
Step 48: cost = -7.78057842461007 shots_used = 117600
Step 49: cost = -7.7865146226898805 shots_used = 120000
Step 50: cost = -7.793839215454196 shots_used = 122400
Step 51: cost = -7.802144039740554 shots_used = 124800
Step 52: cost = -7.809859012081808 shots_used = 127200
Step 53: cost = -7.818330164675909 shots_used = 129600
Step 54: cost = -7.826930993976666 shots_used = 132000
Step 55: cost = -7.834969848723968 shots_used = 134400
Step 56: cost = -7.842454395123664 shots_used = 136800
Step 57: cost = -7.849335152675151 shots_used = 139200
Step 58: cost = -7.853951071633944 shots_used = 141600
Step 59: cost = -7.858296868696565 shots_used = 144000
Step 60: cost = -7.862867672169834 shots_used = 146400
Step 61: cost = -7.865540080202736 shots_used = 148800
Step 62: cost = -7.867577632485199 shots_used = 151200
Step 63: cost = -7.869035010771334 shots_used = 153600
Step 64: cost = -7.870496374034538 shots_used = 156000
Step 65: cost = -7.871678720443278 shots_used = 158400
Step 66: cost = -7.872542373444428 shots_used = 160800
Step 67: cost = -7.873739299675017 shots_used = 163200
Step 68: cost = -7.874314293738313 shots_used = 165600
Step 69: cost = -7.875793149514538 shots_used = 168000
Step 70: cost = -7.877051911492931 shots_used = 170400
Step 71: cost = -7.878207264678217 shots_used = 172800
Step 72: cost = -7.879198045006914 shots_used = 175200
Step 73: cost = -7.880726987471535 shots_used = 177600
Step 74: cost = -7.882055795432435 shots_used = 180000
Step 75: cost = -7.88215282515028 shots_used = 182400
Step 76: cost = -7.881947191378357 shots_used = 184800
Step 77: cost = -7.881566349945106 shots_used = 187200
Step 78: cost = -7.881659168988012 shots_used = 189600
Step 79: cost = -7.881276797156975 shots_used = 192000
Step 80: cost = -7.879976174007023 shots_used = 194400
Step 81: cost = -7.878714918643873 shots_used = 196800
Step 82: cost = -7.877964404670651 shots_used = 199200
Step 83: cost = -7.8771022016203665 shots_used = 201600
Step 84: cost = -7.875562772172711 shots_used = 204000
Step 85: cost = -7.875602350174969 shots_used = 206400
Step 86: cost = -7.877141380119034 shots_used = 208800
Step 87: cost = -7.87925788505365 shots_used = 211200
Step 88: cost = -7.881144761009377 shots_used = 213600
Step 89: cost = -7.882250363744701 shots_used = 216000
Step 90: cost = -7.881748113564451 shots_used = 218400
Step 91: cost = -7.883533319932514 shots_used = 220800
Step 92: cost = -7.884779159318079 shots_used = 223200
Step 93: cost = -7.8868911005436555 shots_used = 225600
Step 94: cost = -7.888524224480213 shots_used = 228000
Step 95: cost = -7.888123287772768 shots_used = 230400
Step 96: cost = -7.8867800801467896 shots_used = 232800
Step 97: cost = -7.8853107450636415 shots_used = 235200
Step 98: cost = -7.883507674089132 shots_used = 237600
Step 99: cost = -7.881351067687096 shots_used = 240000

Plotting both experiments:

plt.style.use("seaborn")
plt.plot(shots_rosalin, cost_rosalin, "b", label="Rosalin")
plt.plot(shots_adam, cost_adam, "g", label="Adam")

plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.xlim(0, 300000)
plt.show()
[Figure: cost function value vs. number of shots, comparing the Rosalin optimizer to Adam]

The Rosalin optimizer performs significantly better than the Adam optimizer, approaching the ground state energy of the Hamiltonian with strikingly fewer shots.

While beyond the scope of this demonstration, the Rosalin optimizer can be modified in various other ways; for instance, by incorporating weighted hybrid sampling (which distributes some shots deterministically, with the remainder done randomly), or by adapting the variant iCANS2 optimizer.
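As a rough illustration of the first idea, here is a minimal sketch of weighted hybrid sampling. The deterministic fraction det_frac is our own free choice for illustration, not a value prescribed by the paper:

import numpy as np
from scipy.stats import multinomial

coeffs = np.array([2, 4, -1, 5, 2])
total_shots = 8000
det_frac = 0.5  # fraction of the shot budget assigned deterministically

p = np.abs(coeffs) / np.sum(np.abs(coeffs))

# deterministic portion: proportional allocation, rounded down
det_shots = np.floor(det_frac * total_shots * p).astype(int)

# the remaining shots are distributed randomly, as in WRS above
rand_shots = multinomial(n=total_shots - det_shots.sum(), p=p).rvs()[0]

shots_hybrid = det_shots + rand_shots
print(shots_hybrid, shots_hybrid.sum())  # the total budget is preserved

Download this demonstration from the sidebar 👉 and give it a go! ⚛️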

References

[1] Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).
[2] James Stokes, Josh Izaac, Nathan Killoran, and Giuseppe Carleo. “Quantum Natural Gradient.” arXiv:1909.02108 (2019).
[3] Ryan Sweke, Frederik Wilde, Johannes Jakob Meyer, Maria Schuld, Paul K. Fährmann, Barthélémy Meynard-Piganeau, and Jens Eisert. “Stochastic gradient descent for hybrid quantum-classical optimization.” arXiv:1910.01155 (2019).
[4] Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).
