Frugal shot optimization with Rosalin

In this tutorial we investigate and implement the Rosalin (Random Operator Sampling for Adaptive Learning with Individual Number of shots) optimizer from Arrasmith et al. [1]. In this paper, a strategy is introduced for reducing the number of shots required when optimizing variational quantum algorithms, by both:

  • Frugally adapting the number of shots used per parameter update, and
  • Performing a weighted sampling of operators from the cost Hamiltonian.

Background

While a large number of papers in variational quantum algorithms focus on the choice of circuit ansatz, cost function, gradient computation, or initialization method, the optimization strategy, an important component affecting both convergence time and quantum resource dependence, is not as frequently considered. Instead, common ‘out-of-the-box’ classical optimization techniques, such as gradient-free methods (COBYLA, Nelder-Mead), gradient descent, and quasi-Newton methods (L-BFGS), tend to be used.

However, for variational algorithms such as VQE, which involve evaluating a large number of non-commuting operators in the cost function, decreasing the number of quantum evaluations required for convergence, while still minimizing statistical noise, can be a delicate balance.

Recent work has highlighted that ‘quantum-aware’ optimization techniques can lead to marked improvements when training variational quantum algorithms:

  • Quantum natural gradient descent by Stokes et al. [2], which takes into account the quantum geometry during the gradient-descent update step.
  • The work of Sweke et al. [3], which shows that quantum gradient descent with a finite number of shots is equivalent to stochastic gradient descent, and has guaranteed convergence. Furthermore, combining a finite number of shots with weighted sampling of the cost function terms leads to doubly stochastic gradient descent.
  • The iCANS (individual Coupled Adaptive Number of Shots) optimization technique by Kübler et al. [4], which adapts the number of measurement shots during training by maximizing the expected gain per shot.

In the work of Arrasmith et al. [1], the idea of doubly stochastic gradient descent is used to extend the iCANS optimizer, resulting in faster convergence.

Over the course of this tutorial, we will explore their results, beginning with a demonstration of weighted random sampling of the cost Hamiltonian operators, before combining this with the shot-frugal iCANS optimizer to perform doubly stochastic Rosalin optimization.

Weighted random sampling

Consider a Hamiltonian \(H\) expanded as a weighted sum of operators \(h_i\) that can be directly measured:

\[H = \sum_{i=1}^N c_i h_i.\]

Due to the linearity of expectation values, the expectation value of this Hamiltonian can be expressed as the weighted sum of each individual term:

\[\langle H\rangle = \sum_{i=1}^N c_i \langle h_i\rangle.\]

In the doubly stochastic gradient descent demonstration, we estimated this expectation value by uniformly sampling a subset of the terms at each optimization step, and evaluating each term using the same finite number of shots.

However, what happens if we use a weighted approach to determine how to distribute our samples across the terms of the Hamiltonian? In weighted random sampling (WRS), the number of shots used to determine the expectation value \(\langle h_i\rangle\) is a discrete random variable distributed according to a multinomial distribution,

\[S \sim \text{Multinomial}(p_i),\]

with event probabilities

\[p_i = \frac{|c_i|}{\sum_j |c_j|}.\]

That is, the number of shots assigned to the measurement of the expectation value of the \(i\text{th}\) term of the Hamiltonian is drawn from a probability distribution proportional to the magnitude of its coefficient \(c_i\).

To see this strategy in action, consider the Hamiltonian

\[H = 2I\otimes X + 4 I\otimes Z - X\otimes X + 5Y\otimes Y + 2 Z\otimes Z.\]

We can solve for the ground state energy using the variational quantum eigensolver (VQE) algorithm.

First, let's import NumPy and PennyLane, and define our Hamiltonian.

import numpy as np
import pennylane as qml

# set the random seed
np.random.seed(2)

coeffs = [2, 4, -1, 5, 2]

obs = [
  qml.PauliX(1),
  qml.PauliZ(1),
  qml.PauliX(0) @ qml.PauliX(1),
  qml.PauliY(0) @ qml.PauliY(1),
  qml.PauliZ(0) @ qml.PauliZ(1)
]
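
Since this Hamiltonian acts on only two qubits, we can diagonalize it directly to find the exact ground state energy that VQE should approach. Below is a quick NumPy sketch of this check (the Pauli matrices are written out by hand; this is for reference only and is not part of the VQE workflow):

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
I = np.eye(2)

# build the 4x4 matrix of the Hamiltonian, with wire 0 as the
# first factor in each Kronecker product
H_matrix = (
    2 * np.kron(I, X)
    + 4 * np.kron(I, Z)
    - np.kron(X, X)
    + 5 * np.kron(Y, Y)
    + 2 * np.kron(Z, Z)
)

# exact ground state energy: approximately -7.90
print(np.min(np.linalg.eigvalsh(H_matrix)))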

We can now create our quantum device (let’s use the default.qubit simulator), and begin constructing some QNodes to evaluate each observable. For our ansatz, we’ll use the StronglyEntanglingLayers.

from pennylane import expval
from pennylane.init import strong_ent_layers_uniform
from pennylane.templates.layers import StronglyEntanglingLayers

num_layers = 2
num_wires = 2

# create a device that estimates expectation values using a finite number of shots
non_analytic_dev = qml.device("default.qubit", wires=num_wires, analytic=False)

# create a device that calculates exact expectation values
analytic_dev = qml.device("default.qubit", wires=num_wires, analytic=True)

We use qml.map() to map our ansatz over our list of observables, returning a collection of QNodes, each evaluating the expectation value of one Hamiltonian term.

qnodes = qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev)

Now, let’s set the total number of shots, and determine the probability for sampling each Hamiltonian term.

total_shots = 8000
prob_shots = np.abs(coeffs) / np.sum(np.abs(coeffs))
print(prob_shots)

Out:

[0.14285714 0.28571429 0.07142857 0.35714286 0.14285714]

We can now use SciPy to create our multinomially distributed random variable \(S\), using the number of trials (the total shot number) and the probability values:

from scipy.stats import multinomial
si = multinomial(n=total_shots, p=prob_shots)

Sampling from this distribution will provide the number of shots used to sample each term in the Hamiltonian:

samples = si.rvs()[0]
print(samples)
print(sum(samples))

Out:

[1180 2298  577 2838 1107]
8000

As expected, if we sum the sampled shots per term, we recover the total number of shots.

Let’s now create our cost function. Recall that the cost function must do the following:

  1. It must sample from the multinomial distribution we created above, to determine the number of shots \(s_i\) to use to estimate the expectation value of the ith Hamiltonian term.
  2. It then must estimate the expectation value \(\langle h_i\rangle\) by querying the required QNode.
  3. And, last but not least, estimate the expectation value \(\langle H\rangle = \sum_i c_i\langle h_i\rangle\).

def cost(params):
    # sample from the multinomial distribution
    shots_per_term = si.rvs()[0]

    result = 0

    for h, c, p, s in zip(qnodes, coeffs, prob_shots, shots_per_term):
        # set the number of shots
        h.device.shots = s

        # evaluate the QNode corresponding to
        # the Hamiltonian term, and add it on to our running sum
        result += c * h(params)

    return result

Evaluating our cost function with some initial parameters, we can check that it works correctly.

init_params = strong_ent_layers_uniform(n_layers=num_layers, n_wires=num_wires)
print(cost(init_params))

Out:

-1.550084950616114

Performing the optimization, with the number of shots randomly determined at each optimization step:

opt = qml.AdamOptimizer(0.05)
params = init_params

cost_wrs = []
shots_wrs = []

for i in range(100):
    params = opt.step(cost, params)
    cost_wrs.append(cost(params))
    shots_wrs.append(total_shots*i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_wrs[-1], shots_wrs[-1]))

Out:

Step 0: cost = -3.0208059439067583 shots used = 0
Step 1: cost = -3.771657737346778 shots used = 8000
Step 2: cost = -4.1918335876753705 shots used = 16000
Step 3: cost = -4.528861128268821 shots used = 24000
Step 4: cost = -4.78022120157168 shots used = 32000
Step 5: cost = -5.260060056875861 shots used = 40000
Step 6: cost = -5.16245782603232 shots used = 48000
Step 7: cost = -5.433694149953796 shots used = 56000
Step 8: cost = -5.465839536103188 shots used = 64000
Step 9: cost = -5.679329690536203 shots used = 72000
Step 10: cost = -5.375385788893826 shots used = 80000
Step 11: cost = -5.628981464381226 shots used = 88000
Step 12: cost = -5.4109207183901535 shots used = 96000
Step 13: cost = -5.703568999840614 shots used = 104000
Step 14: cost = -5.523693857029922 shots used = 112000
Step 15: cost = -5.595764316070951 shots used = 120000
Step 16: cost = -5.564592889242951 shots used = 128000
Step 17: cost = -5.8204773190306645 shots used = 136000
Step 18: cost = -5.839297130302995 shots used = 144000
Step 19: cost = -5.94567347957143 shots used = 152000
Step 20: cost = -6.054138142895432 shots used = 160000
Step 21: cost = -6.159357395749357 shots used = 168000
Step 22: cost = -6.246985881842835 shots used = 176000
Step 23: cost = -6.314466907526883 shots used = 184000
Step 24: cost = -6.3554073533506 shots used = 192000
Step 25: cost = -6.577211343628384 shots used = 200000
Step 26: cost = -6.778989934301061 shots used = 208000
Step 27: cost = -6.7523532490350995 shots used = 216000
Step 28: cost = -6.49404172920368 shots used = 224000
Step 29: cost = -6.514277164937701 shots used = 232000
Step 30: cost = -6.673083441695827 shots used = 240000
Step 31: cost = -6.598485805388185 shots used = 248000
Step 32: cost = -6.589656625941934 shots used = 256000
Step 33: cost = -6.609296991752516 shots used = 264000
Step 34: cost = -6.7383724409491075 shots used = 272000
Step 35: cost = -6.6785832337703335 shots used = 280000
Step 36: cost = -6.872757625919889 shots used = 288000
Step 37: cost = -6.708732300531025 shots used = 296000
Step 38: cost = -6.537050684213952 shots used = 304000
Step 39: cost = -6.662267945814211 shots used = 312000
Step 40: cost = -6.78580572101249 shots used = 320000
Step 41: cost = -6.868928884943361 shots used = 328000
Step 42: cost = -6.664023813736473 shots used = 336000
Step 43: cost = -6.7648051387785735 shots used = 344000
Step 44: cost = -6.656640213664698 shots used = 352000
Step 45: cost = -6.740435206132723 shots used = 360000
Step 46: cost = -6.8340045351379315 shots used = 368000
Step 47: cost = -6.840034482417571 shots used = 376000
Step 48: cost = -6.701170523435469 shots used = 384000
Step 49: cost = -7.025391068294496 shots used = 392000
Step 50: cost = -6.783514510875614 shots used = 400000
Step 51: cost = -6.793984527889651 shots used = 408000
Step 52: cost = -6.732852254187753 shots used = 416000
Step 53: cost = -6.688745260026209 shots used = 424000
Step 54: cost = -6.777224876958758 shots used = 432000
Step 55: cost = -6.744317633665775 shots used = 440000
Step 56: cost = -6.717729365299377 shots used = 448000
Step 57: cost = -6.721441950040284 shots used = 456000
Step 58: cost = -6.836623454022722 shots used = 464000
Step 59: cost = -6.5824319483454135 shots used = 472000
Step 60: cost = -6.791408167459501 shots used = 480000
Step 61: cost = -6.524963409420412 shots used = 488000
Step 62: cost = -6.691384830568775 shots used = 496000
Step 63: cost = -6.815866083137505 shots used = 504000
Step 64: cost = -6.854918311823749 shots used = 512000
Step 65: cost = -6.891139609263549 shots used = 520000
Step 66: cost = -6.7929559721419865 shots used = 528000
Step 67: cost = -6.844773533819947 shots used = 536000
Step 68: cost = -6.576201314584495 shots used = 544000
Step 69: cost = -7.074486949413229 shots used = 552000
Step 70: cost = -6.895103418540494 shots used = 560000
Step 71: cost = -6.825544193347103 shots used = 568000
Step 72: cost = -6.84757505029122 shots used = 576000
Step 73: cost = -6.749907749013109 shots used = 584000
Step 74: cost = -6.811140127059757 shots used = 592000
Step 75: cost = -6.642510561336519 shots used = 600000
Step 76: cost = -6.92152650709561 shots used = 608000
Step 77: cost = -6.95127200496328 shots used = 616000
Step 78: cost = -6.876052609712963 shots used = 624000
Step 79: cost = -7.075841486234874 shots used = 632000
Step 80: cost = -6.6078656044779835 shots used = 640000
Step 81: cost = -6.940275039066986 shots used = 648000
Step 82: cost = -6.90663122626091 shots used = 656000
Step 83: cost = -7.0483807638679155 shots used = 664000
Step 84: cost = -6.848694682025014 shots used = 672000
Step 85: cost = -7.021974538003514 shots used = 680000
Step 86: cost = -6.851058948953611 shots used = 688000
Step 87: cost = -6.674563059862608 shots used = 696000
Step 88: cost = -6.972003058694072 shots used = 704000
Step 89: cost = -7.045311784732022 shots used = 712000
Step 90: cost = -7.0894725659136375 shots used = 720000
Step 91: cost = -6.974366529167912 shots used = 728000
Step 92: cost = -7.046405519155454 shots used = 736000
Step 93: cost = -6.889270509556305 shots used = 744000
Step 94: cost = -6.875704944748242 shots used = 752000
Step 95: cost = -6.9265663588252835 shots used = 760000
Step 96: cost = -7.023414799602686 shots used = 768000
Step 97: cost = -7.109050317752904 shots used = 776000
Step 98: cost = -7.0010661839374055 shots used = 784000
Step 99: cost = -6.8988766725708 shots used = 792000

Let’s compare this against an optimization not using weighted random sampling. Here, we will split the 8000 total shots evenly across all Hamiltonian terms, also known as uniform deterministic sampling.

non_analytic_dev.shots = total_shots // len(coeffs)

qnodes = qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev)
cost = qml.dot(coeffs, qnodes)

opt = qml.AdamOptimizer(0.05)
params = init_params

cost_adam = []
shots_adam = []

for i in range(100):
    params = opt.step(cost, params)
    cost_adam.append(cost(params))
    shots_adam.append(total_shots*i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_adam[-1], shots_adam[-1]))

Out:

Step 0: cost = -3.4275000000000007 shots used = 0
Step 1: cost = -3.56875 shots used = 8000
Step 2: cost = -4.257499999999999 shots used = 16000
Step 3: cost = -4.67125 shots used = 24000
Step 4: cost = -4.6225000000000005 shots used = 32000
Step 5: cost = -5.59625 shots used = 40000
Step 6: cost = -4.9087499999999995 shots used = 48000
Step 7: cost = -5.20125 shots used = 56000
Step 8: cost = -5.415000000000001 shots used = 64000
Step 9: cost = -5.220000000000001 shots used = 72000
Step 10: cost = -5.48625 shots used = 80000
Step 11: cost = -5.49125 shots used = 88000
Step 12: cost = -5.53875 shots used = 96000
Step 13: cost = -5.22375 shots used = 104000
Step 14: cost = -5.3975 shots used = 112000
Step 15: cost = -5.697499999999999 shots used = 120000
Step 16: cost = -5.73375 shots used = 128000
Step 17: cost = -5.6275 shots used = 136000
Step 18: cost = -5.777500000000001 shots used = 144000
Step 19: cost = -5.8012500000000005 shots used = 152000
Step 20: cost = -6.097499999999999 shots used = 160000
Step 21: cost = -6.14375 shots used = 168000
Step 22: cost = -6.12875 shots used = 176000
Step 23: cost = -6.61 shots used = 184000
Step 24: cost = -6.4575 shots used = 192000
Step 25: cost = -6.557499999999999 shots used = 200000
Step 26: cost = -6.4750000000000005 shots used = 208000
Step 27: cost = -6.458749999999999 shots used = 216000
Step 28: cost = -6.4725 shots used = 224000
Step 29: cost = -6.3075 shots used = 232000
Step 30: cost = -6.785 shots used = 240000
Step 31: cost = -6.703750000000001 shots used = 248000
Step 32: cost = -6.63375 shots used = 256000
Step 33: cost = -6.8687499999999995 shots used = 264000
Step 34: cost = -6.703749999999999 shots used = 272000
Step 35: cost = -6.805 shots used = 280000
Step 36: cost = -6.54 shots used = 288000
Step 37: cost = -6.80375 shots used = 296000
Step 38: cost = -6.620000000000001 shots used = 304000
Step 39: cost = -6.67 shots used = 312000
Step 40: cost = -6.7437499999999995 shots used = 320000
Step 41: cost = -6.86875 shots used = 328000
Step 42: cost = -6.90375 shots used = 336000
Step 43: cost = -6.6575 shots used = 344000
Step 44: cost = -6.826250000000001 shots used = 352000
Step 45: cost = -6.8475 shots used = 360000
Step 46: cost = -6.7925 shots used = 368000
Step 47: cost = -6.66 shots used = 376000
Step 48: cost = -6.75375 shots used = 384000
Step 49: cost = -6.82625 shots used = 392000
Step 50: cost = -6.8125 shots used = 400000
Step 51: cost = -6.755 shots used = 408000
Step 52: cost = -6.922499999999999 shots used = 416000
Step 53: cost = -6.75875 shots used = 424000
Step 54: cost = -6.8425 shots used = 432000
Step 55: cost = -6.332500000000001 shots used = 440000
Step 56: cost = -6.797499999999999 shots used = 448000
Step 57: cost = -6.78 shots used = 456000
Step 58: cost = -6.8374999999999995 shots used = 464000
Step 59: cost = -6.80625 shots used = 472000
Step 60: cost = -6.634999999999999 shots used = 480000
Step 61: cost = -7.07125 shots used = 488000
Step 62: cost = -6.81375 shots used = 496000
Step 63: cost = -6.7475 shots used = 504000
Step 64: cost = -6.895 shots used = 512000
Step 65: cost = -6.9325 shots used = 520000
Step 66: cost = -6.71125 shots used = 528000
Step 67: cost = -6.845 shots used = 536000
Step 68: cost = -6.99 shots used = 544000
Step 69: cost = -7.106250000000001 shots used = 552000
Step 70: cost = -6.641249999999999 shots used = 560000
Step 71: cost = -6.5525 shots used = 568000
Step 72: cost = -7.01875 shots used = 576000
Step 73: cost = -6.910000000000001 shots used = 584000
Step 74: cost = -6.847500000000001 shots used = 592000
Step 75: cost = -6.8125 shots used = 600000
Step 76: cost = -7.14375 shots used = 608000
Step 77: cost = -6.7925 shots used = 616000
Step 78: cost = -6.6175 shots used = 624000
Step 79: cost = -6.7612499999999995 shots used = 632000
Step 80: cost = -6.95625 shots used = 640000
Step 81: cost = -6.7875 shots used = 648000
Step 82: cost = -7.017500000000001 shots used = 656000
Step 83: cost = -6.79375 shots used = 664000
Step 84: cost = -6.928750000000001 shots used = 672000
Step 85: cost = -7.03 shots used = 680000
Step 86: cost = -6.79875 shots used = 688000
Step 87: cost = -6.94625 shots used = 696000
Step 88: cost = -7.04125 shots used = 704000
Step 89: cost = -7.0625 shots used = 712000
Step 90: cost = -6.93 shots used = 720000
Step 91: cost = -7.0025 shots used = 728000
Step 92: cost = -6.9174999999999995 shots used = 736000
Step 93: cost = -6.8812500000000005 shots used = 744000
Step 94: cost = -6.965 shots used = 752000
Step 95: cost = -6.7124999999999995 shots used = 760000
Step 96: cost = -6.9262500000000005 shots used = 768000
Step 97: cost = -7.058749999999999 shots used = 776000
Step 98: cost = -6.937500000000001 shots used = 784000
Step 99: cost = -6.828749999999999 shots used = 792000

Comparing these two techniques:

from matplotlib import pyplot as plt

plt.style.use("seaborn")
plt.plot(shots_wrs, cost_wrs, "b", label="Adam WRS")
plt.plot(shots_adam, cost_adam, "g", label="Adam")

plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.show()
(Plot: cost function value versus number of shots, comparing Adam with weighted random sampling against standard Adam.)

We can see that weighted random sampling performs just as well as uniform deterministic sampling. However, weighted random sampling begins to show a non-negligible improvement over deterministic sampling for large Hamiltonians with highly non-uniform coefficients. For example, see Figs. 3 and 4 of Arrasmith et al. [1], which compare weighted random sampling VQE optimization for both \(\text{H}_2\) and \(\text{LiH}\) molecules.

Note

While not covered here, another approach that could be taken is weighted deterministic sampling. Here, the number of shots is distributed across terms as per

\[s_i = \left\lfloor N_{\text{tot}} \frac{|c_i|}{\sum_j |c_j|}\right\rfloor,\]

where \(N_{\text{tot}}\) is the total number of shots.
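
As a rough sketch, reusing the total_shots and prob_shots values defined earlier, this deterministic allocation gives:

shots_wds = np.floor(total_shots * prob_shots).astype(int)
print(shots_wds)        # [1142 2285  571 2857 1142]
print(shots_wds.sum())  # 7997 -- the floor leaves a few shots unassigned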

Rosalin: Frugal shot optimization

We can see above that both methods optimize fairly well; weighted random sampling converges just as well as evenly distributing the shots across all Hamiltonian terms. However, deterministic shot distribution approaches will always have a minimum shot value required per expectation value, as below this threshold they become biased estimators. This is not the case with random sampling; as we saw in the doubly stochastic gradient descent demonstration, the introduction of randomness allows for as little as a single shot per expectation term, while still remaining an unbiased estimator.
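
Before making this modification, we can sanity-check the unbiasedness claim with a small Monte-Carlo sketch. Note that this isolates the randomness of the term selection only: the per-term expectation values below are made-up numbers that we treat as exactly known.

rng = np.random.default_rng(1)

c = np.array([2.0, 4.0, -1.0, 5.0, 2.0])
h_exact = np.array([0.1, -0.3, 0.5, 0.2, -0.4])  # made-up term expectations
p = np.abs(c) / np.sum(np.abs(c))

# single-term estimator: pick term i with probability p_i and
# return the importance-weighted value c_i * <h_i> / p_i
idx = rng.choice(len(c), size=200000, p=p)
estimates = c[idx] * h_exact[idx] / p[idx]

print(np.dot(c, h_exact))  # exact <H> = -1.3
print(estimates.mean())    # the sample mean is close to -1.3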

Using this insight, Arrasmith et al. [1] modified the iCANS frugal shot-optimization technique [4] to include weighted random sampling, making it ‘doubly stochastic’.

iCANS optimizer

Two variants of the iCANS optimizer were introduced in Kübler et al., iCANS1 and iCANS2. The iCANS1 optimizer, on which Rosalin is based, frugally distributes a shot budget across the partial derivatives of each parameter, which are computed using the parameter-shift rule. It works roughly as follows:

  1. The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min}\), for all partial derivatives.

  2. The parameter-shift rule is then used to estimate the gradient \(g_i\) with respect to each parameter \(\theta_i\), as well as the variance \(v_i\) of each estimated gradient.

  3. Gradient descent is performed for each parameter \(\theta_i\), using the pre-defined learning rate \(\alpha\) and the gradient information \(g_i\):

    \[\theta_i = \theta_i - \alpha g_i.\]
  4. The improvement in the cost function per shot, for a specific parameter value, is then calculated via

    \[\gamma_i = \frac{1}{s_i} \left[ \left(\alpha - \frac{1}{2} L\alpha^2\right) g_i^2 - \frac{L\alpha^2}{2s_i}v_i \right],\]

    where:

    • \(L \leq \sum_i|c_i|\) is an upper bound on the Lipschitz constant of the variational quantum algorithm objective function,
    • \(c_i\) are the coefficients of the Hamiltonian, and
    • \(\alpha\) is the learning rate, which must be bounded such that \(\alpha < 2/L\) for the above expression to hold.
  5. Finally, the new value of \(s_i\) (the number of shots for the partial derivative with respect to parameter \(\theta_i\)) is given by:

    \[s_i = \frac{2L\alpha}{2-L\alpha}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.\]

In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\), respectively) are used when computing \(\gamma_i\) and \(s_i\).

Note

In classical machine learning, the Lipschitz constant of the cost function is generally unknown. However, for a variational quantum algorithm with cost of the form \(f(x) = \langle \psi(x) | \hat{H} |\psi(x)\rangle\), the Lipschitz constant is bounded above by \(\sum_i|c_i|\), where \(c_i\) are the coefficients of \(\hat{H}\) when decomposed into a linear combination of Pauli-operator tensor products.
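
To build intuition for the shot-update rule in step 5 above, here is a toy numerical check; the gradient and variance values are purely illustrative, and are not taken from the experiments in this tutorial.

L = 14.0                  # Lipschitz bound, the sum of |c_i| for our Hamiltonian
alpha = 0.07              # learning rate, satisfying alpha < 2 / L
g = np.array([0.8, 0.1])  # two illustrative gradient estimates
v = np.array([2.0, 2.0])  # equal gradient variances

s = np.ceil(2 * L * alpha / (2 - L * alpha) * (v / g ** 2))
print(s)  # [  7. 385.] -- the small, noisy gradient receives far more shots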

Rosalin implementation

Let’s now modify iCANS above to incorporate weighted random sampling of Hamiltonian terms — the Rosalin frugal shot optimizer.

Rosalin takes several hyper-parameters:

  • min_shots: the minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.
  • mu: The running average constant \(\mu\in[0, 1]\). Used to control how quickly the number of shots recommended for each gradient component changes.
  • b: Regularization bias. The bias should be kept small, but non-zero.
  • lr: The learning rate. Recall from above that the learning rate must satisfy \(\alpha < 2/L = 2/\sum_i|c_i|\). For our Hamiltonian, \(\sum_i|c_i| = 14\), so we require \(\alpha < 1/7 \approx 0.14\); the default value of 0.07 satisfies this.

Since the Rosalin optimizer has a state that must be preserved between optimization steps, let’s use a class to create our optimizer.

class Rosalin:

    def __init__(self, qnodes, coeffs, min_shots, mu=0.99, b=1e-6, lr=0.07):
        self.qnodes = qnodes
        self.coeffs = coeffs

        self.lipschitz = np.sum(np.abs(coeffs))

        if lr > 2 / self.lipschitz:
            raise ValueError("The learning rate must be less than ", 2 / self.lipschitz)

        # hyperparameters
        self.min_shots = min_shots
        self.mu = mu  # running average constant
        self.b = b    # regularization bias
        self.lr = lr  # learning rate

        # keep track of the total number of shots used
        self.shots_used = 0
        # total number of iterations
        self.k = 0
        # Number of shots per parameter. Note that this uses the shape
        # of the globally defined ``params`` from earlier in the script.
        self.s = np.zeros_like(params, dtype=np.float64) + min_shots

        # Running average of the parameter gradients
        self.chi = None
        # Running average of the variance of the parameter gradients
        self.xi = None

    def estimate_hamiltonian(self, params, shots):
        """Returns an array containing length ``shots`` single-shot estimates
        of the Hamiltonian. The shots are distributed randomly over
        the terms in the Hamiltonian, as per a Multinomial distribution.

        Since we are performing single-shot estimates, the QNodes must be
        set to 'sample' mode.
        """

        # determine the shot probability per term
        prob_shots = np.abs(coeffs) / np.sum(np.abs(coeffs))

        # construct the multinomial distribution, and sample
        # from it to determine how many shots to apply per term
        si = multinomial(n=shots, p=prob_shots)
        shots_per_term = si.rvs()[0]

        results = []
        for h, c, p, s in zip(self.qnodes, self.coeffs, prob_shots, shots_per_term):

            # if the number of shots is 0, do nothing
            if s == 0:
                continue

            # set the QNode device shots
            h.device.shots = s

            # evaluate the QNode corresponding to
            # the Hamiltonian term
            res = h(params)

            if s == 1:
                res = np.array([res])

            # Note that, unlike above, we divide each term by the
            # probability per shot. This is because we are sampling one at a time.
            results.append(c * res / p)

        return np.concatenate(results)

    def evaluate_grad_var(self, i, params, shots):
        """Evaluate the gradient, as well as the variance in the gradient,
        for the ith parameter in params, using the parameter-shift rule.
        """
        shift = np.zeros_like(params)
        shift[i] = np.pi / 2

        shift_forward = self.estimate_hamiltonian(params + shift, shots)
        shift_backward = self.estimate_hamiltonian(params - shift, shots)

        g = np.mean(shift_forward - shift_backward) / 2
        s = np.var((shift_forward - shift_backward) / 2, ddof=1)

        return g, s

    def step(self, params):
        """Perform a single step of the Rosalin optimizater."""
        # keep track of the number of shots run
        self.shots_used += int(2 * np.sum(self.s))

        # compute the gradient, as well as the variance in the gradient,
        # using the number of shots determined by the array s.
        grad = []
        S = []

        p_ind = list(np.ndindex(*params.shape))

        for l in p_ind:
            # loop through each parameter, performing
            # the parameter-shift rule
            g_, s_ = self.evaluate_grad_var(l, params, self.s[l])
            grad.append(g_)
            S.append(s_)

        grad = np.reshape(np.stack(grad), params.shape)
        S = np.reshape(np.stack(S), params.shape)

        # gradient descent update
        params = params - self.lr * grad

        if self.xi is None:
            self.chi = np.zeros_like(params, dtype=np.float64)
            self.xi = np.zeros_like(params, dtype=np.float64)

        # running average of the gradient variance
        self.xi = self.mu * self.xi + (1 - self.mu) * S
        xi = self.xi / (1 - self.mu ** (self.k + 1))

        # running average of the gradient
        self.chi = self.mu * self.chi + (1 - self.mu) * grad
        chi = self.chi / (1 - self.mu ** (self.k + 1))

        # determine the new optimum shots distribution for the next
        # iteration of the optimizer
        s = np.ceil(
            (2 * self.lipschitz * self.lr * xi)
            / ((2 - self.lipschitz * self.lr) * (chi ** 2 + self.b * (self.mu ** self.k)))
        )

        # apply an upper and lower bound on the new shot distributions,
        # to avoid the number of shots reducing below min(2, min_shots),
        # or growing too significantly.
        gamma = (
            (self.lr - self.lipschitz * self.lr ** 2 / 2) * chi ** 2
            - xi * self.lipschitz * self.lr ** 2 / (2 * s)
        ) / s

        argmax_gamma = np.unravel_index(np.argmax(gamma), gamma.shape)
        smax = s[argmax_gamma]
        self.s = np.clip(s, min(2, self.min_shots), smax)

        self.k += 1
        return params

Rosalin optimization

We are now ready to use our Rosalin optimizer to optimize the initial VQE problem. Note that we create our QNodes using measure="sample", since the Rosalin optimizer must be able to generate single-shot samples from our device.

rosalin_device = qml.device("default.qubit", wires=num_wires, analytic=False)
qnodes = qml.map(StronglyEntanglingLayers, obs, device=rosalin_device, measure="sample")

Let’s also create a separate cost function using an ‘exact’ quantum device, so that we can keep track of the exact cost function value at each iteration.

cost_analytic = qml.dot(coeffs, qml.map(StronglyEntanglingLayers, obs, device=analytic_dev))

Creating the optimizer and beginning the optimization:

opt = Rosalin(qnodes, coeffs, min_shots=10)
params = init_params

cost_rosalin = [cost_analytic(params)]
shots_rosalin = [0]

for i in range(60):
    params = opt.step(params)
    cost_rosalin.append(cost_analytic(params))
    shots_rosalin.append(opt.shots_used)
    print("Step {}: cost = {} shots_used = {}".format(i, cost_rosalin[-1], shots_rosalin[-1]))

Out:

Step 0: cost = -4.187036936265069 shots_used = 240
Step 1: cost = -3.036810921229055 shots_used = 312
Step 2: cost = -1.9905197912754213 shots_used = 360
Step 3: cost = -1.3755430372910906 shots_used = 432
Step 4: cost = -4.515529902204182 shots_used = 504
Step 5: cost = -2.72307465032099 shots_used = 576
Step 6: cost = -4.612824071967708 shots_used = 648
Step 7: cost = -2.835792039317653 shots_used = 744
Step 8: cost = -5.02283689964054 shots_used = 840
Step 9: cost = -5.561027468003262 shots_used = 1032
Step 10: cost = -6.302815853424333 shots_used = 1296
Step 11: cost = -6.032705987212449 shots_used = 1632
Step 12: cost = -6.197524298414042 shots_used = 1968
Step 13: cost = -6.711044720047259 shots_used = 2352
Step 14: cost = -6.604851046426948 shots_used = 2784
Step 15: cost = -7.09092505251282 shots_used = 3336
Step 16: cost = -7.42918139680909 shots_used = 3984
Step 17: cost = -7.424291838078807 shots_used = 4800
Step 18: cost = -7.505826265616944 shots_used = 5592
Step 19: cost = -7.5603750668162855 shots_used = 6480
Step 20: cost = -7.673510522269357 shots_used = 7416
Step 21: cost = -7.799204946149125 shots_used = 8304
Step 22: cost = -7.7773740487388805 shots_used = 9168
Step 23: cost = -7.78718601380687 shots_used = 10176
Step 24: cost = -7.83282941812867 shots_used = 11304
Step 25: cost = -7.6279709745712445 shots_used = 12528
Step 26: cost = -7.7730405743466955 shots_used = 13776
Step 27: cost = -7.736988862228367 shots_used = 15096
Step 28: cost = -7.640698472141413 shots_used = 16512
Step 29: cost = -7.758972060827432 shots_used = 18024
Step 30: cost = -7.815436433346498 shots_used = 19704
Step 31: cost = -7.809011461528314 shots_used = 21504
Step 32: cost = -7.807494224696059 shots_used = 23640
Step 33: cost = -7.7842454660702005 shots_used = 25968
Step 34: cost = -7.719004282489612 shots_used = 28560
Step 35: cost = -7.870511297142265 shots_used = 31128
Step 36: cost = -7.846837786420862 shots_used = 34104
Step 37: cost = -7.806130447701809 shots_used = 37512
Step 38: cost = -7.836728445955725 shots_used = 41376
Step 39: cost = -7.869285109614562 shots_used = 45384
Step 40: cost = -7.872616756800454 shots_used = 49776
Step 41: cost = -7.853078133040148 shots_used = 54240
Step 42: cost = -7.8771532233765305 shots_used = 58752
Step 43: cost = -7.85735365934473 shots_used = 63264
Step 44: cost = -7.781220650989157 shots_used = 67968
Step 45: cost = -7.847959524028933 shots_used = 72888
Step 46: cost = -7.870258414725502 shots_used = 78384
Step 47: cost = -7.8742816095250525 shots_used = 84072
Step 48: cost = -7.868020472598408 shots_used = 90240
Step 49: cost = -7.8075033366291935 shots_used = 96576
Step 50: cost = -7.883605011429525 shots_used = 103320
Step 51: cost = -7.8691240287848085 shots_used = 110736
Step 52: cost = -7.88361482387627 shots_used = 118728
Step 53: cost = -7.889713002787561 shots_used = 127344
Step 54: cost = -7.886354313781992 shots_used = 136728
Step 55: cost = -7.88053085899908 shots_used = 145944
Step 56: cost = -7.840279542540182 shots_used = 155304
Step 57: cost = -7.879039640339286 shots_used = 165744
Step 58: cost = -7.8787718125771224 shots_used = 177096
Step 59: cost = -7.861821722625106 shots_used = 188952

Let’s compare this to a standard Adam optimization. Using 100 shots per quantum evaluation, each update step requires 2 quantum evaluations per parameter (the two shifted evaluations of the parameter-shift rule).

adam_shots_per_eval = 100
adam_shots_per_step = 2 * adam_shots_per_eval * len(params.flatten())
print(adam_shots_per_step)

Out:

2400

Thus, Adam is using 2400 shots per update step.

params = init_params
opt = qml.AdamOptimizer(0.07)

non_analytic_dev.shots = adam_shots_per_eval
cost = qml.dot(coeffs, qml.map(StronglyEntanglingLayers, obs, device=non_analytic_dev))

cost_adam = [cost_analytic(params)]
shots_adam = [0]

for i in range(100):
    params = opt.step(cost, params)
    cost_adam.append(cost_analytic(params))
    shots_adam.append(adam_shots_per_step * (i + 1))
    print("Step {}: cost = {} shots_used = {}".format(i, cost_adam[-1], shots_adam[-1]))

Out:

Step 0: cost = -3.5369214151849864 shots_used = 2400
Step 1: cost = -4.390565915810495 shots_used = 4800
Step 2: cost = -4.900158700594325 shots_used = 7200
Step 3: cost = -5.181672985745442 shots_used = 9600
Step 4: cost = -5.310088327712991 shots_used = 12000
Step 5: cost = -5.340364750396398 shots_used = 14400
Step 6: cost = -5.314642688156364 shots_used = 16800
Step 7: cost = -5.277142615136198 shots_used = 19200
Step 8: cost = -5.243550842750599 shots_used = 21600
Step 9: cost = -5.231922547660483 shots_used = 24000
Step 10: cost = -5.249666681860763 shots_used = 26400
Step 11: cost = -5.302356906715139 shots_used = 28800
Step 12: cost = -5.387145573878129 shots_used = 31200
Step 13: cost = -5.501545661957397 shots_used = 33600
Step 14: cost = -5.642498616964824 shots_used = 36000
Step 15: cost = -5.800645688492173 shots_used = 38400
Step 16: cost = -5.9643074507274285 shots_used = 40800
Step 17: cost = -6.126376334818995 shots_used = 43200
Step 18: cost = -6.279123119000003 shots_used = 45600
Step 19: cost = -6.411753703515126 shots_used = 48000
Step 20: cost = -6.522365190543626 shots_used = 50400
Step 21: cost = -6.607528598116838 shots_used = 52800
Step 22: cost = -6.667388825256372 shots_used = 55200
Step 23: cost = -6.702030469632301 shots_used = 57600
Step 24: cost = -6.714487999813686 shots_used = 60000
Step 25: cost = -6.709421487721695 shots_used = 62400
Step 26: cost = -6.6931241732644935 shots_used = 64800
Step 27: cost = -6.6716266390949395 shots_used = 67200
Step 28: cost = -6.648655633843328 shots_used = 69600
Step 29: cost = -6.628684343251805 shots_used = 72000
Step 30: cost = -6.614559228963572 shots_used = 74400
Step 31: cost = -6.609487642287889 shots_used = 76800
Step 32: cost = -6.608099349694564 shots_used = 79200
Step 33: cost = -6.614845098729151 shots_used = 81600
Step 34: cost = -6.627335971673629 shots_used = 84000
Step 35: cost = -6.645587109975945 shots_used = 86400
Step 36: cost = -6.6661056997607515 shots_used = 88800
Step 37: cost = -6.68630198294033 shots_used = 91200
Step 38: cost = -6.703950460994744 shots_used = 93600
Step 39: cost = -6.718026880248049 shots_used = 96000
Step 40: cost = -6.730014414088037 shots_used = 98400
Step 41: cost = -6.738124408690313 shots_used = 100800
Step 42: cost = -6.745427040174652 shots_used = 103200
Step 43: cost = -6.750349304061132 shots_used = 105600
Step 44: cost = -6.753339679002335 shots_used = 108000
Step 45: cost = -6.756057762425343 shots_used = 110400
Step 46: cost = -6.757954138215228 shots_used = 112800
Step 47: cost = -6.759830681301706 shots_used = 115200
Step 48: cost = -6.7628313784205565 shots_used = 117600
Step 49: cost = -6.766966621986197 shots_used = 120000
Step 50: cost = -6.7709230513178476 shots_used = 122400
Step 51: cost = -6.776854520713336 shots_used = 124800
Step 52: cost = -6.782047054375711 shots_used = 127200
Step 53: cost = -6.788394188293627 shots_used = 129600
Step 54: cost = -6.795084363461055 shots_used = 132000
Step 55: cost = -6.80252050193897 shots_used = 134400
Step 56: cost = -6.809514624891964 shots_used = 136800
Step 57: cost = -6.814659138694676 shots_used = 139200
Step 58: cost = -6.818127272963677 shots_used = 141600
Step 59: cost = -6.820569181386316 shots_used = 144000
Step 60: cost = -6.823469824995868 shots_used = 146400
Step 61: cost = -6.8275514120999965 shots_used = 148800
Step 62: cost = -6.834114331170101 shots_used = 151200
Step 63: cost = -6.840422716414664 shots_used = 153600
Step 64: cost = -6.8465495859676055 shots_used = 156000
Step 65: cost = -6.851228489540773 shots_used = 158400
Step 66: cost = -6.856180962236583 shots_used = 160800
Step 67: cost = -6.861702868999014 shots_used = 163200
Step 68: cost = -6.86816332040616 shots_used = 165600
Step 69: cost = -6.874014129936109 shots_used = 168000
Step 70: cost = -6.879742746650653 shots_used = 170400
Step 71: cost = -6.884560504989609 shots_used = 172800
Step 72: cost = -6.8899874843590565 shots_used = 175200
Step 73: cost = -6.895725652914159 shots_used = 177600
Step 74: cost = -6.901268846339821 shots_used = 180000
Step 75: cost = -6.908232744468148 shots_used = 182400
Step 76: cost = -6.917061473399348 shots_used = 184800
Step 77: cost = -6.925612461360372 shots_used = 187200
Step 78: cost = -6.933256317683657 shots_used = 189600
Step 79: cost = -6.9397854974618545 shots_used = 192000
Step 80: cost = -6.946070692990286 shots_used = 194400
Step 81: cost = -6.952230228108258 shots_used = 196800
Step 82: cost = -6.95723410505337 shots_used = 199200
Step 83: cost = -6.964628360174405 shots_used = 201600
Step 84: cost = -6.9710360555877395 shots_used = 204000
Step 85: cost = -6.976315318818857 shots_used = 206400
Step 86: cost = -6.982264938425148 shots_used = 208800
Step 87: cost = -6.9896783894541805 shots_used = 211200
Step 88: cost = -6.995371631854548 shots_used = 213600
Step 89: cost = -7.002929904768311 shots_used = 216000
Step 90: cost = -7.011286041276102 shots_used = 218400
Step 91: cost = -7.0194966809722175 shots_used = 220800
Step 92: cost = -7.026630093122487 shots_used = 223200
Step 93: cost = -7.033391241989296 shots_used = 225600
Step 94: cost = -7.03823625356164 shots_used = 228000
Step 95: cost = -7.0414874440373865 shots_used = 230400
Step 96: cost = -7.042764357800749 shots_used = 232800
Step 97: cost = -7.044024324094519 shots_used = 235200
Step 98: cost = -7.0474826441732406 shots_used = 237600
Step 99: cost = -7.0531031950589185 shots_used = 240000

Plotting both experiments:

plt.style.use("seaborn")
plt.plot(shots_rosalin, cost_rosalin, "b", label="Rosalin")
plt.plot(shots_adam, cost_adam, "g", label="Adam")

plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.show()
(Plot: cost function value versus number of shots, comparing Rosalin against Adam.)

The Rosalin optimizer performs significantly better than the Adam optimizer, approaching the ground state energy of the Hamiltonian with strikingly fewer shots.

While beyond the scope of this demonstration, the Rosalin optimizer can be modified in various other ways; for instance, by incorporating weighted hybrid sampling (which distributes some shots deterministically, with the remainder done randomly; a sketch follows below), or by adapting the variant iCANS2 optimizer.
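
For instance, one plausible reading of weighted hybrid sampling (a sketch of a possible implementation, not necessarily the exact scheme of the paper) assigns the floor allocation from the note above deterministically, and distributes the leftover shots at random:

def hybrid_shots(total, probs):
    """Assign floor(total * p_i) shots deterministically to each term,
    then distribute the remaining shots via a multinomial draw."""
    deterministic = np.floor(total * probs).astype(int)
    leftover = total - deterministic.sum()
    return deterministic + np.random.multinomial(leftover, probs)

print(hybrid_shots(8000, prob_shots))

Download this demonstration from the sidebar 👉 and give it a go! ⚛️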

References

[1] Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).
[2] James Stokes, Josh Izaac, Nathan Killoran, and Giuseppe Carleo. “Quantum Natural Gradient.” arXiv:1909.02108 (2019).
[3] Ryan Sweke, Frederik Wilde, Johannes Jakob Meyer, Maria Schuld, Paul K. Fährmann, Barthélémy Meynard-Piganeau, and Jens Eisert. “Stochastic gradient descent for hybrid quantum-classical optimization.” arXiv:1910.01155 (2019).
[4] Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).
