Frugal shot optimization with Rosalin
Published: May 18, 2020. Last updated: October 06, 2024.
In this tutorial we investigate and implement the Rosalin (Random Operator Sampling for Adaptive Learning with Individual Number of shots) optimizer from Arrasmith et al. 1. In this paper, a strategy is introduced for reducing the number of shots required when optimizing variational quantum algorithms, by both:
Frugally adapting the number of shots used per parameter update, and
Performing a weighted sampling of operators from the cost Hamiltonian.
Note
The Rosalin optimizer is available in PennyLane via the ShotAdaptiveOptimizer.
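For instance, a minimal usage sketch (the term_sampling keyword enabling weighted random sampling follows the PennyLane documentation; check your installed version for the exact signature):

# a minimal sketch, assuming ``cost`` is a QNode returning the expectation value
# of a Hamiltonian on a shot-based device, and ``params`` are its trainable weights
opt = qml.ShotAdaptiveOptimizer(min_shots=10, term_sampling="weighted_random_sampling")
for _ in range(100):
    params = opt.step(cost, params)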
Background
While a large number of papers on variational quantum algorithms focus on the choice of circuit ansatz, cost function, gradient computation, or initialization method, the optimization strategy, an important component affecting both convergence time and quantum resource dependence, is not as frequently considered. Instead, common ‘out-of-the-box’ classical optimization techniques tend to be used, such as gradient-free methods (COBYLA, Nelder-Mead), gradient descent, and Hessian-free methods (L-BFGS).
However, for variational algorithms such as VQE, which involve evaluating a large number of non-commuting operators in the cost function, decreasing the number of quantum evaluations required for convergence, while still minimizing statistical noise, can be a delicate balance.
Recent work has highlighted that ‘quantum-aware’ optimization techniques can lead to marked improvements when training variational quantum algorithms:
Quantum natural gradient descent by Stokes et al. 2, which takes into account the quantum geometry during the gradient-descent update step.
The work of Sweke et al. 3, which shows that quantum gradient descent with a finite number of shots is equivalent to stochastic gradient descent, and has guaranteed convergence. Furthermore, combining a finite number of shots with weighted sampling of the cost function terms leads to Doubly stochastic gradient descent.
The iCANS (individual Coupled Adaptive Number of Shots) optimization technique by Kübler et al. 4 adapts the number of shots used per measurement during training, by maximizing the expected gain per shot.
In this latest result by Arrasmith et al. 1, the idea of doubly stochastic gradient descent has been used to extend the iCANS optimizer, resulting in faster convergence.
Over the course of this tutorial, we will explore their results; beginning first with a demonstration of weighted random sampling of the cost Hamiltonian operators, before combining this with the shot-frugal iCANS optimizer to perform doubly stochastic Rosalin optimization.
Weighted random sampling
Consider a Hamiltonian \(H\) expanded as a weighted sum of operators \(h_i\) that can be directly measured:

\[H = \sum_{i=1}^N c_i h_i.\]
Due to the linearity of expectation values, the expectation value of this Hamiltonian can be expressed as the weighted sum of each individual term:

\[\langle H\rangle = \sum_{i=1}^N c_i \langle h_i\rangle.\]
In the doubly stochastic gradient descent demonstration, we estimated this expectation value by uniformly sampling a subset of the terms at each optimization step, and evaluating each term by using the same finite number of shots \(N.\)
However, what happens if we use a weighted approach to determine how to distribute our samples across the terms of the Hamiltonian? In weighted random sampling (WRS), the number of shots used to determine the expectation value \(\langle h_i\rangle\) is a discrete random variable distributed according to a multinomial distribution,

\[S \sim \text{Multinomial}(p_i),\]

with event probabilities

\[p_i = \frac{|c_i|}{\sum_i |c_i|}.\]

That is, the number of shots assigned to the measurement of the expectation value of the \(i\text{th}\) term of the Hamiltonian is drawn from a probability distribution proportional to the magnitude of its coefficient \(c_i.\)
To see this strategy in action, consider the Hamiltonian

\[H = 2X_1 + 4Z_1 - X_0 X_1 + 5 Y_0 Y_1 + 2 Z_0 Z_1.\]
We can solve for the ground state energy using the variational quantum eigensolver (VQE) algorithm.
First, let’s import NumPy and PennyLane, and define our Hamiltonian.
import pennylane as qml
from pennylane import numpy as np
# set the random seed
np.random.seed(4)
coeffs = [2, 4, -1, 5, 2]
obs = [
    qml.PauliX(1),
    qml.PauliZ(1),
    qml.PauliX(0) @ qml.PauliX(1),
    qml.PauliY(0) @ qml.PauliY(1),
    qml.PauliZ(0) @ qml.PauliZ(1)
]
We can now create our quantum device (let’s use the default.qubit simulator).
num_layers = 2
num_wires = 2
# create a device that estimates expectation values using a finite number of shots
non_analytic_dev = qml.device("default.qubit", wires=num_wires, shots=100, seed=432423)
# create a device that calculates exact expectation values
analytic_dev = qml.device("default.qubit", wires=num_wires, shots=None)
Now, let’s set the total number of shots, and determine the probability for sampling each Hamiltonian term.
total_shots = 8000
prob_shots = np.abs(coeffs) / np.sum(np.abs(coeffs))
print(prob_shots)
[0.14285714 0.28571429 0.07142857 0.35714286 0.14285714]
We can now use SciPy to create our multinomially distributed random variable \(S,\) using the number of trials (total shot number) and probability values:
from scipy.stats import multinomial
si = multinomial(n=total_shots, p=prob_shots)
Sampling from this distribution will provide the number of shots used to sample each term in the Hamiltonian:
samples = si.rvs()[0]
print(samples)
print(sum(samples))
[1180 2328 544 2854 1094]
8000
As expected, if we sum the sampled shots per term, we recover the total number of shots.
Let’s now create our cost function. Recall that the cost function must do the following:
1. It must sample from the multinomial distribution we created above, to determine the number of shots \(s_i\) used to estimate the expectation value of the \(i\text{th}\) Hamiltonian term.
2. It must then estimate each expectation value \(\langle h_i\rangle\) by creating the required QNode. For our ansatz, we’ll use the StronglyEntanglingLayers template.
3. And, last but not least, it must estimate the expectation value \(\langle H\rangle = \sum_i c_i\langle h_i\rangle.\)
from pennylane.templates.layers import StronglyEntanglingLayers

@qml.qnode(non_analytic_dev, diff_method="parameter-shift", interface="autograd")
def qnode(weights, observable):
    StronglyEntanglingLayers(weights, wires=non_analytic_dev.wires)
    return qml.expval(observable)

def cost(params):
    # sample from the multinomial distribution
    shots_per_term = si.rvs()[0]

    result = 0

    for o, c, s in zip(obs, coeffs, shots_per_term):
        # evaluate the QNode corresponding to
        # the Hamiltonian term, and add it on to our running sum
        result += c * qnode(params, o, shots=int(s))

    return result
Evaluating our cost function with some initial parameters, we can check that the cost function evaluates correctly.
param_shape = StronglyEntanglingLayers.shape(n_layers=num_layers, n_wires=num_wires)
init_params = np.random.uniform(low=0.0, high=2*np.pi, size=param_shape, requires_grad=True)
print(cost(init_params))
-3.8651386132882037
Performing the optimization, with the number of shots randomly determined at each optimization step:
opt = qml.AdamOptimizer(0.05)
params = init_params
cost_wrs = []
shots_wrs = []
for i in range(100):
    params, _cost = opt.step_and_cost(cost, params)
    cost_wrs.append(_cost)
    shots_wrs.append(total_shots * i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_wrs[-1], shots_wrs[-1]))
Step 0: cost = -3.6700485485706293 shots used = 0
Step 1: cost = -4.953452212335504 shots used = 8000
Step 2: cost = -5.6667762770523185 shots used = 16000
Step 3: cost = -6.325683660980321 shots used = 24000
Step 4: cost = -7.046363093589159 shots used = 32000
Step 5: cost = -7.263571669118207 shots used = 40000
Step 6: cost = -7.516981726648618 shots used = 48000
Step 7: cost = -7.271864419248833 shots used = 56000
Step 8: cost = -7.4320666307426375 shots used = 64000
Step 9: cost = -7.4462917442068575 shots used = 72000
Step 10: cost = -7.6542915179682645 shots used = 80000
Step 11: cost = -7.329606531776502 shots used = 88000
Step 12: cost = -7.662542762739735 shots used = 96000
Step 13: cost = -7.584679230901346 shots used = 104000
Step 14: cost = -7.4833375481632824 shots used = 112000
Step 15: cost = -7.320962222883258 shots used = 120000
Step 16: cost = -7.500728630830073 shots used = 128000
Step 17: cost = -7.494763651968659 shots used = 136000
Step 18: cost = -7.680002301979257 shots used = 144000
Step 19: cost = -7.593757152737037 shots used = 152000
Step 20: cost = -7.629450874394776 shots used = 160000
Step 21: cost = -7.566698218227597 shots used = 168000
Step 22: cost = -7.859531302491554 shots used = 176000
Step 23: cost = -7.823465712076164 shots used = 184000
Step 24: cost = -7.6292747423561424 shots used = 192000
Step 25: cost = -8.013258655639712 shots used = 200000
Step 26: cost = -8.02893073982241 shots used = 208000
Step 27: cost = -7.787669802447964 shots used = 216000
Step 28: cost = -7.851582289012587 shots used = 224000
Step 29: cost = -7.9580237912624945 shots used = 232000
Step 30: cost = -7.7441316789247425 shots used = 240000
Step 31: cost = -8.047898343139 shots used = 248000
Step 32: cost = -7.900611219839868 shots used = 256000
Step 33: cost = -7.744580964577219 shots used = 264000
Step 34: cost = -7.890248201831718 shots used = 272000
Step 35: cost = -8.097835934510217 shots used = 280000
Step 36: cost = -7.9617617187764465 shots used = 288000
Step 37: cost = -7.8463946011807355 shots used = 296000
Step 38: cost = -7.792976922132706 shots used = 304000
Step 39: cost = -7.797739527552661 shots used = 312000
Step 40: cost = -8.043937079740244 shots used = 320000
Step 41: cost = -8.010037249305224 shots used = 328000
Step 42: cost = -7.894270264533468 shots used = 336000
Step 43: cost = -7.791970007324865 shots used = 344000
Step 44: cost = -7.942194323691684 shots used = 352000
Step 45: cost = -7.9478462201199065 shots used = 360000
Step 46: cost = -7.931687138510533 shots used = 368000
Step 47: cost = -7.937489362992923 shots used = 376000
Step 48: cost = -7.948214219148987 shots used = 384000
Step 49: cost = -7.852924744179088 shots used = 392000
Step 50: cost = -8.016606565340588 shots used = 400000
Step 51: cost = -7.866360902414698 shots used = 408000
Step 52: cost = -7.881238487673111 shots used = 416000
Step 53: cost = -7.760217342200983 shots used = 424000
Step 54: cost = -7.870842636206925 shots used = 432000
Step 55: cost = -7.846272686353321 shots used = 440000
Step 56: cost = -8.010074362806195 shots used = 448000
Step 57: cost = -8.02719214579063 shots used = 456000
Step 58: cost = -7.842498540696231 shots used = 464000
Step 59: cost = -8.04868061887861 shots used = 472000
Step 60: cost = -7.752355076563031 shots used = 480000
Step 61: cost = -7.91153754440824 shots used = 488000
Step 62: cost = -7.772285191783333 shots used = 496000
Step 63: cost = -7.9167124438772145 shots used = 504000
Step 64: cost = -7.944225160907276 shots used = 512000
Step 65: cost = -7.785745638270633 shots used = 520000
Step 66: cost = -8.023484688752811 shots used = 528000
Step 67: cost = -8.128541024638105 shots used = 536000
Step 68: cost = -7.859105941819826 shots used = 544000
Step 69: cost = -8.037851050948316 shots used = 552000
Step 70: cost = -7.976443423279927 shots used = 560000
Step 71: cost = -7.90038869658323 shots used = 568000
Step 72: cost = -7.997029058412236 shots used = 576000
Step 73: cost = -7.8807910978419 shots used = 584000
Step 74: cost = -7.651204034044203 shots used = 592000
Step 75: cost = -8.06666583767855 shots used = 600000
Step 76: cost = -7.965113367395665 shots used = 608000
Step 77: cost = -8.086141914345257 shots used = 616000
Step 78: cost = -7.813205342498767 shots used = 624000
Step 79: cost = -7.993441829700296 shots used = 632000
Step 80: cost = -7.996168167189228 shots used = 640000
Step 81: cost = -8.119572114110015 shots used = 648000
Step 82: cost = -8.014446278718644 shots used = 656000
Step 83: cost = -7.858820066304361 shots used = 664000
Step 84: cost = -7.926027618270085 shots used = 672000
Step 85: cost = -7.989984786055085 shots used = 680000
Step 86: cost = -7.85945796928511 shots used = 688000
Step 87: cost = -7.832527844591796 shots used = 696000
Step 88: cost = -8.010251558747166 shots used = 704000
Step 89: cost = -7.924020805756433 shots used = 712000
Step 90: cost = -7.864168979461082 shots used = 720000
Step 91: cost = -7.946644274357081 shots used = 728000
Step 92: cost = -7.827327742606052 shots used = 736000
Step 93: cost = -7.8544948481202095 shots used = 744000
Step 94: cost = -7.958281989618408 shots used = 752000
Step 95: cost = -7.704952737459829 shots used = 760000
Step 96: cost = -7.956393156493025 shots used = 768000
Step 97: cost = -7.94917011146214 shots used = 776000
Step 98: cost = -7.792258745178003 shots used = 784000
Step 99: cost = -7.960266938667684 shots used = 792000
Let’s compare this against an optimization not using weighted random sampling. Here, we will split the 8000 total shots evenly across all Hamiltonian terms, also known as uniform deterministic sampling.
@qml.qnode(non_analytic_dev, diff_method="parameter-shift", interface="autograd")
def qnode(weights, obs):
    StronglyEntanglingLayers(weights, wires=non_analytic_dev.wires)
    return qml.expval(obs)

def cost(params):
    shots_per_term = int(total_shots / len(coeffs))

    result = 0

    for o, c in zip(obs, coeffs):
        # evaluate the QNode corresponding to
        # the Hamiltonian term, and add it on to our running sum
        result += c * qnode(params, o, shots=shots_per_term)

    return result
opt = qml.AdamOptimizer(0.05)
params = init_params
cost_adam = []
shots_adam = []
for i in range(100):
    params, _cost = opt.step_and_cost(cost, params)
    cost_adam.append(_cost)
    shots_adam.append(total_shots * i)
    print("Step {}: cost = {} shots used = {}".format(i, cost_adam[-1], shots_adam[-1]))
Step 0: cost = -3.7399999999999998 shots used = 0
Step 1: cost = -4.93125 shots used = 8000
Step 2: cost = -5.802499999999999 shots used = 16000
Step 3: cost = -6.598750000000001 shots used = 24000
Step 4: cost = -7.05875 shots used = 32000
Step 5: cost = -7.35625 shots used = 40000
Step 6: cost = -7.597499999999999 shots used = 48000
Step 7: cost = -7.35 shots used = 56000
Step 8: cost = -7.58375 shots used = 64000
Step 9: cost = -7.378750000000001 shots used = 72000
Step 10: cost = -7.505 shots used = 80000
Step 11: cost = -7.4025 shots used = 88000
Step 12: cost = -7.56875 shots used = 96000
Step 13: cost = -7.614999999999999 shots used = 104000
Step 14: cost = -7.60125 shots used = 112000
Step 15: cost = -7.6325 shots used = 120000
Step 16: cost = -7.2212499999999995 shots used = 128000
Step 17: cost = -7.3425 shots used = 136000
Step 18: cost = -7.652500000000001 shots used = 144000
Step 19: cost = -7.72 shots used = 152000
Step 20: cost = -7.6175 shots used = 160000
Step 21: cost = -7.655 shots used = 168000
Step 22: cost = -7.94375 shots used = 176000
Step 23: cost = -7.84 shots used = 184000
Step 24: cost = -7.8925 shots used = 192000
Step 25: cost = -7.74625 shots used = 200000
Step 26: cost = -7.725 shots used = 208000
Step 27: cost = -7.65375 shots used = 216000
Step 28: cost = -7.67625 shots used = 224000
Step 29: cost = -7.803749999999999 shots used = 232000
Step 30: cost = -7.915 shots used = 240000
Step 31: cost = -7.71375 shots used = 248000
Step 32: cost = -7.411250000000001 shots used = 256000
Step 33: cost = -7.8275 shots used = 264000
Step 34: cost = -8.05 shots used = 272000
Step 35: cost = -7.875 shots used = 280000
Step 36: cost = -7.800000000000001 shots used = 288000
Step 37: cost = -7.936249999999999 shots used = 296000
Step 38: cost = -7.69375 shots used = 304000
Step 39: cost = -8.175 shots used = 312000
Step 40: cost = -7.7325 shots used = 320000
Step 41: cost = -7.772499999999999 shots used = 328000
Step 42: cost = -7.9 shots used = 336000
Step 43: cost = -7.800000000000001 shots used = 344000
Step 44: cost = -7.852500000000001 shots used = 352000
Step 45: cost = -7.726249999999999 shots used = 360000
Step 46: cost = -7.95 shots used = 368000
Step 47: cost = -7.85375 shots used = 376000
Step 48: cost = -8.0075 shots used = 384000
Step 49: cost = -7.89875 shots used = 392000
Step 50: cost = -7.79625 shots used = 400000
Step 51: cost = -7.49625 shots used = 408000
Step 52: cost = -7.80375 shots used = 416000
Step 53: cost = -8.0525 shots used = 424000
Step 54: cost = -7.86125 shots used = 432000
Step 55: cost = -7.9575 shots used = 440000
Step 56: cost = -7.9425 shots used = 448000
Step 57: cost = -8.01125 shots used = 456000
Step 58: cost = -8.0125 shots used = 464000
Step 59: cost = -7.856249999999999 shots used = 472000
Step 60: cost = -7.878749999999999 shots used = 480000
Step 61: cost = -7.831250000000001 shots used = 488000
Step 62: cost = -7.880000000000001 shots used = 496000
Step 63: cost = -7.8075 shots used = 504000
Step 64: cost = -7.98875 shots used = 512000
Step 65: cost = -7.76625 shots used = 520000
Step 66: cost = -7.64 shots used = 528000
Step 67: cost = -7.768750000000001 shots used = 536000
Step 68: cost = -7.87625 shots used = 544000
Step 69: cost = -7.903749999999999 shots used = 552000
Step 70: cost = -7.648750000000001 shots used = 560000
Step 71: cost = -7.8075 shots used = 568000
Step 72: cost = -7.7575 shots used = 576000
Step 73: cost = -7.9087499999999995 shots used = 584000
Step 74: cost = -7.8725000000000005 shots used = 592000
Step 75: cost = -7.7837499999999995 shots used = 600000
Step 76: cost = -7.968749999999999 shots used = 608000
Step 77: cost = -7.831249999999999 shots used = 616000
Step 78: cost = -8.18125 shots used = 624000
Step 79: cost = -7.759999999999999 shots used = 632000
Step 80: cost = -8.16875 shots used = 640000
Step 81: cost = -7.757499999999999 shots used = 648000
Step 82: cost = -8.055 shots used = 656000
Step 83: cost = -7.9662500000000005 shots used = 664000
Step 84: cost = -8.03375 shots used = 672000
Step 85: cost = -7.91375 shots used = 680000
Step 86: cost = -8.0 shots used = 688000
Step 87: cost = -7.733750000000001 shots used = 696000
Step 88: cost = -7.691250000000001 shots used = 704000
Step 89: cost = -8.065 shots used = 712000
Step 90: cost = -7.8425 shots used = 720000
Step 91: cost = -7.94125 shots used = 728000
Step 92: cost = -7.83125 shots used = 736000
Step 93: cost = -7.72875 shots used = 744000
Step 94: cost = -7.965000000000002 shots used = 752000
Step 95: cost = -7.95125 shots used = 760000
Step 96: cost = -7.775 shots used = 768000
Step 97: cost = -7.92125 shots used = 776000
Step 98: cost = -8.12625 shots used = 784000
Step 99: cost = -7.81125 shots used = 792000
Comparing these two techniques:
from matplotlib import pyplot as plt
plt.style.use("seaborn-v0_8")
plt.plot(shots_wrs, cost_wrs, "b", label="Adam WRS")
plt.plot(shots_adam, cost_adam, "g", label="Adam")
plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.show()

We can see that weighted random sampling performs just as well as the uniform deterministic sampling. However, weighted random sampling begins to show a non-negligible improvement over deterministic sampling for large Hamiltonians with highly non-uniform coefficients. For example, see Fig (3) and (4) of Arrasmith et al. 1, comparing weighted random sampling VQE optimization for both \(\text{H}_2\) and \(\text{LiH}\) molecules.
Note
While not covered here, another approach that could be taken is weighted deterministic sampling. Here, the number of shots is distributed across terms as per

\[s_i = \left\lfloor N \frac{|c_i|}{\sum_i |c_i|} \right\rfloor,\]

where \(N\) is the total number of shots.
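As a rough sketch (our own illustration, reusing coeffs and total_shots from above), this deterministic allocation could be computed as:

# weighted deterministic sampling: allocate shots proportionally to |c_i|,
# rounding down so the total budget is never exceeded
shots_det = np.floor(total_shots * np.abs(coeffs) / np.sum(np.abs(coeffs))).astype(int)
print(shots_det)  # [1142 2285  571 2857 1142], summing to 7997 for our Hamiltonian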
Rosalin: Frugal shot optimization
We can see above that both methods optimize fairly well; weighted random sampling converges just as well as evenly distributing the shots across all Hamiltonian terms. However, deterministic shot distribution approaches will always have a minimum shot value required per expectation value, as below this threshold they become biased estimators. This is not the case with random sampling; as we saw in the doubly stochastic gradient descent demonstration, the introduction of randomness allows for as little as a single shot per expectation term, while still remaining an unbiased estimator.
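As a quick numerical sanity check of this claim (a toy classical simulation of our own; the ‘exact’ expectation values below are made up for illustration), we can verify that a single-shot, importance-sampled estimator has the correct mean:

import numpy as vanilla_np  # plain NumPy suffices for this classical check

rng = vanilla_np.random.default_rng(1234)

c = vanilla_np.array([2.0, 4.0, -1.0, 5.0, 2.0])
h_exact = vanilla_np.array([0.1, -0.9, 0.3, -0.8, 0.5])  # made-up exact <h_i> values
p = vanilla_np.abs(c) / vanilla_np.abs(c).sum()

def single_shot_estimate():
    # pick a term with probability p_i, draw a single +1/-1 outcome with mean <h_i>,
    # and reweight by c_i / p_i so the estimator remains unbiased
    i = rng.choice(len(c), p=p)
    outcome = rng.choice([1.0, -1.0], p=[(1 + h_exact[i]) / 2, (1 - h_exact[i]) / 2])
    return c[i] * outcome / p[i]

estimates = [single_shot_estimate() for _ in range(100_000)]
print(vanilla_np.mean(estimates), vanilla_np.dot(c, h_exact))  # the two should be close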
Using this insight, Arrasmith et al. 1 modified the iCANS frugal shot-optimization technique 4 to include weighted random sampling, making it ‘doubly stochastic’.
iCANS optimizer
Two variants of the iCANS optimizer were introduced in Kübler et al., iCANS1 and iCANS2. The iCANS1 optimizer, on which Rosalin is based, frugally distributes a shot budget across the partial derivatives of each parameter, which are computed using the parameter-shift rule. It works roughly as follows:
1. The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min},\) for all partial derivatives.
2. The parameter-shift rule is then used to estimate the gradient \(g_i\) for each parameter \(\theta_i,\) as well as the variances \(v_i\) of the estimated gradients.
3. Gradient descent is performed for each parameter \(\theta_i,\) using the pre-defined learning rate \(\alpha\) and the gradient information \(g_i:\)

\[\theta_i = \theta_i - \alpha g_i.\]

4. The improvement in the cost function per shot, for a specific parameter value, is then calculated via

\[\gamma_i = \frac{1}{s_i} \left[ \left(\alpha - \frac{1}{2} L\alpha^2\right) g_i^2 - \frac{L\alpha^2}{2s_i}v_i \right],\]

where:

- \(L \leq \sum_i|c_i|\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function,
- \(c_i\) are the coefficients of the Hamiltonian, and
- \(\alpha\) is the learning rate, which must satisfy \(\alpha < 2/L\) for the above expression to hold.

5. Finally, the new value of \(s_i\) (the number of shots for the partial derivative of parameter \(\theta_i\)) is given by

\[s_i = \frac{2L\alpha}{2-L\alpha}\left(\frac{v_i}{g_i^2}\right) \propto \frac{v_i}{g_i^2}.\]
In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\) respectively) are used when computing \(\gamma_i\) and \(s_i.\)
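To make the shot-update rule concrete, here is a small numerical sketch (the gradient, variance, and shot values are made up purely for illustration; \(L = 14\) is the Lipschitz bound for the Hamiltonian used in this demo):

# a worked numerical example of the iCANS update, with made-up values
L = 14.0         # Lipschitz bound sum_i |c_i| for our Hamiltonian
alpha = 0.07     # learning rate, satisfying alpha < 2 / L ~ 0.143
g, v, s = 0.5, 2.0, 10.0  # hypothetical gradient, variance, and current shot count

# expected gain in the cost function per shot
gamma = ((alpha - 0.5 * L * alpha ** 2) * g ** 2 - L * alpha ** 2 * v / (2 * s)) / s

# recommended number of shots for the next iteration
s_new = (2 * L * alpha / (2 - L * alpha)) * (v / g ** 2)
print(gamma, s_new)  # roughly 0.0002 and 15.4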
Note
In classical machine learning, the Lipschitz constant of the cost function is generally unknown. However, for a variational quantum algorithm with cost of the form \(f(x) = \langle \psi(x) | \hat{H} |\psi(x)\rangle,\) an upper bound on the Lipschitz constant is given by \(L < \sum_i|c_i|,\) where \(c_i\) are the coefficients of \(\hat{H}\) when decomposed into a linear combination of Pauli-operator tensor products.
Rosalin implementation
Let’s now modify iCANS above to incorporate weighted random sampling of Hamiltonian terms — the Rosalin frugal shot optimizer.
Rosalin takes several hyper-parameters:

- min_shots: the minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.
- mu: the running average constant \(\mu\in[0, 1],\) used to control how quickly the number of shots recommended for each gradient component changes.
- b: the regularization bias. The bias should be kept small, but non-zero.
- lr: the learning rate. Recall from above that the learning rate must satisfy \(\alpha < 2/L = 2/\sum_i|c_i|.\)
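For our Hamiltonian we can check this bound explicitly; the default learning rate of 0.07 used below is well within it:

# verify the learning-rate bound for our Hamiltonian
lipschitz = np.sum(np.abs(coeffs))
print(lipschitz, 2 / lipschitz)  # 14 and ~0.143, so lr = 0.07 is safe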
Since the Rosalin optimizer has a state that must be preserved between optimization steps, let’s use a class to create our optimizer.
class Rosalin:

    def __init__(self, obs, coeffs, min_shots, mu=0.99, b=1e-6, lr=0.07):
        self.obs = obs
        self.coeffs = coeffs

        self.lipschitz = np.sum(np.abs(coeffs))

        if lr > 2 / self.lipschitz:
            raise ValueError("The learning rate must be less than ", 2 / self.lipschitz)

        # hyperparameters
        self.min_shots = min_shots
        self.mu = mu  # running average constant
        self.b = b    # regularization bias
        self.lr = lr  # learning rate

        # keep track of the total number of shots used
        self.shots_used = 0
        # total number of iterations
        self.k = 0
        # Number of shots per parameter
        # (note: this uses the globally defined ``params`` to infer the parameter shape)
        self.s = np.zeros_like(params, dtype=np.float64) + min_shots

        # Running average of the parameter gradients
        self.chi = None
        # Running average of the variance of the parameter gradients
        self.xi = None

    def estimate_hamiltonian(self, params, shots):
        """Returns an array containing length ``shots`` single-shot estimates
        of the Hamiltonian. The shots are distributed randomly over
        the terms in the Hamiltonian, as per a Multinomial distribution.

        Since we are performing single-shot estimates, the QNodes must be
        set to 'sample' mode.
        """
        # note that convergence depends on seed for random number generation
        rosalin_device = qml.device("default.qubit", wires=num_wires, shots=100, seed=93754352)

        # determine the shot probability per term
        prob_shots = np.abs(coeffs) / np.sum(np.abs(coeffs))

        # construct the multinomial distribution, and sample
        # from it to determine how many shots to apply per term
        si = multinomial(n=shots, p=prob_shots)
        shots_per_term = si.rvs()[0]

        results = []

        @qml.qnode(rosalin_device, diff_method="parameter-shift", interface="autograd")
        def qnode(weights, observable):
            StronglyEntanglingLayers(weights, wires=rosalin_device.wires)
            return qml.sample(observable)

        for o, c, p, s in zip(self.obs, self.coeffs, prob_shots, shots_per_term):
            # if the number of shots is 0, do nothing
            if s == 0:
                continue

            # evaluate the QNode corresponding to
            # the Hamiltonian term
            res = qnode(params, o, shots=int(s))

            if s == 1:
                res = np.array([res])

            # Note that, unlike above, we divide each term by the
            # probability per shot. This is because we are sampling one at a time.
            results.append(c * res / p)

        return np.concatenate(results)

    def evaluate_grad_var(self, i, params, shots):
        """Evaluate the gradient, as well as the variance in the gradient,
        for the ith parameter in params, using the parameter-shift rule.
        """
        shift = np.zeros_like(params)
        shift[i] = np.pi / 2

        shift_forward = self.estimate_hamiltonian(params + shift, shots)
        shift_backward = self.estimate_hamiltonian(params - shift, shots)

        g = np.mean(shift_forward - shift_backward) / 2
        s = np.var((shift_forward - shift_backward) / 2, ddof=1)

        return g, s

    def step(self, params):
        """Perform a single step of the Rosalin optimizer."""
        # keep track of the number of shots run
        self.shots_used += int(2 * np.sum(self.s))

        # compute the gradient, as well as the variance in the gradient,
        # using the number of shots determined by the array s.
        grad = []
        S = []

        p_ind = list(np.ndindex(*params.shape))

        for l in p_ind:
            # loop through each parameter, performing
            # the parameter-shift rule
            g_, s_ = self.evaluate_grad_var(l, params, self.s[l])
            grad.append(g_)
            S.append(s_)

        grad = np.reshape(np.stack(grad), params.shape)
        S = np.reshape(np.stack(S), params.shape)

        # gradient descent update
        params = params - self.lr * grad

        if self.xi is None:
            self.chi = np.zeros_like(params, dtype=np.float64)
            self.xi = np.zeros_like(params, dtype=np.float64)

        # running average of the gradient variance
        self.xi = self.mu * self.xi + (1 - self.mu) * S
        xi = self.xi / (1 - self.mu ** (self.k + 1))

        # running average of the gradient
        self.chi = self.mu * self.chi + (1 - self.mu) * grad
        chi = self.chi / (1 - self.mu ** (self.k + 1))

        # determine the new optimum shots distribution for the next
        # iteration of the optimizer
        s = np.ceil(
            (2 * self.lipschitz * self.lr * xi)
            / ((2 - self.lipschitz * self.lr) * (chi ** 2 + self.b * (self.mu ** self.k)))
        )

        # apply an upper and lower bound on the new shot distributions,
        # to avoid the number of shots reducing below min(2, min_shots),
        # or growing too significantly.
        gamma = (
            (self.lr - self.lipschitz * self.lr ** 2 / 2) * chi ** 2
            - xi * self.lipschitz * self.lr ** 2 / (2 * s)
        ) / s

        argmax_gamma = np.unravel_index(np.argmax(gamma), gamma.shape)
        smax = s[argmax_gamma]
        self.s = np.clip(s, min(2, self.min_shots), max(2, smax))

        self.k += 1
        return params
Rosalin optimization
We are now ready to use our Rosalin optimizer to optimize the initial VQE problem. But first let’s also create a separate cost function using an ‘exact’ quantum device, so that we can keep track of the exact cost function value at each iteration.
@qml.qnode(analytic_dev, interface="autograd")
def cost_analytic(weights):
    StronglyEntanglingLayers(weights, wires=analytic_dev.wires)
    return qml.expval(qml.Hamiltonian(coeffs, obs))
Creating the optimizer and beginning the optimization:
opt = Rosalin(obs, coeffs, min_shots=10)
params = init_params
cost_rosalin = [cost_analytic(params)]
shots_rosalin = [0]
for i in range(60):
    params = opt.step(params)
    cost_rosalin.append(cost_analytic(params))
    shots_rosalin.append(opt.shots_used)
    print(f"Step {i}: cost = {cost_rosalin[-1]}, shots_used = {shots_rosalin[-1]}")
/home/runner/work/qml/qml/.venv-build/lib/python3.10/site-packages/pennylane/numpy/tensor.py:152: RuntimeWarning: invalid value encountered in divide
res = super().__array_ufunc__(ufunc, method, *args, **kwargs)
Step 0: cost = -5.779295839256025, shots_used = 240
Step 1: cost = -6.805206893399627, shots_used = 288
Step 2: cost = -2.6893925392107434, shots_used = 336
Step 3: cost = -5.350062964632457, shots_used = 384
Step 4: cost = -5.831008801216585, shots_used = 504
Step 5: cost = -6.760493308594775, shots_used = 648
Step 6: cost = -6.482632372692123, shots_used = 864
Step 7: cost = -6.566349030575215, shots_used = 1056
Step 8: cost = -7.477307803043914, shots_used = 1368
Step 9: cost = -7.186835300333113, shots_used = 1920
Step 10: cost = -7.421053952343545, shots_used = 2640
Step 11: cost = -7.376261742482428, shots_used = 3644
Step 12: cost = -7.309061613928674, shots_used = 4964
Step 13: cost = -7.442581445605876, shots_used = 6548
Step 14: cost = -7.478988282580251, shots_used = 7868
Step 15: cost = -7.48868889398077, shots_used = 9284
Step 16: cost = -7.563982691559651, shots_used = 11858
Step 17: cost = -7.586957780522313, shots_used = 15016
Step 18: cost = -7.604513574880681, shots_used = 18572
Step 19: cost = -7.619780110568966, shots_used = 22398
Step 20: cost = -7.659513040694079, shots_used = 26602
Step 21: cost = -7.730909651085327, shots_used = 29914
Step 22: cost = -7.7600372692451725, shots_used = 33514
Step 23: cost = -7.7856638088292, shots_used = 37330
Step 24: cost = -7.8100849588818075, shots_used = 41338
Step 25: cost = -7.831884186772848, shots_used = 45610
Step 26: cost = -7.802887398609376, shots_used = 50530
Step 27: cost = -7.824376978414916, shots_used = 58382
Step 28: cost = -7.822802836870766, shots_used = 66658
Step 29: cost = -7.8330050077271345, shots_used = 75512
Step 30: cost = -7.805545389986744, shots_used = 85154
Step 31: cost = -7.831264262622566, shots_used = 96450
Step 32: cost = -7.850317461606631, shots_used = 105042
Step 33: cost = -7.844235749235165, shots_used = 118094
Step 34: cost = -7.845437388094773, shots_used = 132398
Step 35: cost = -7.857489470342101, shots_used = 147238
Step 36: cost = -7.8673603821301, shots_used = 162612
Step 37: cost = -7.869112523384946, shots_used = 179154
Step 38: cost = -7.865852000769199, shots_used = 196842
Step 39: cost = -7.862221526205571, shots_used = 215768
Step 40: cost = -7.851198822348627, shots_used = 228392
Step 41: cost = -7.85097345121944, shots_used = 242000
Step 42: cost = -7.853755812297292, shots_used = 255968
Step 43: cost = -7.879646586523319, shots_used = 270584
Step 44: cost = -7.878639601453184, shots_used = 285344
Step 45: cost = -7.837578781232576, shots_used = 300368
Step 46: cost = -7.85552848901955, shots_used = 316688
Step 47: cost = -7.865450319636475, shots_used = 333896
Step 48: cost = -7.883663477945893, shots_used = 351368
Step 49: cost = -7.877446282128802, shots_used = 369008
Step 50: cost = -7.883805248843979, shots_used = 386312
Step 51: cost = -7.887679775109563, shots_used = 404984
Step 52: cost = -7.889004195465957, shots_used = 423944
Step 53: cost = -7.891556144874796, shots_used = 444080
Step 54: cost = -7.886742350975194, shots_used = 465296
Step 55: cost = -7.886326968645518, shots_used = 486440
Step 56: cost = -7.888448457866568, shots_used = 508352
Step 57: cost = -7.890512814402705, shots_used = 530312
Step 58: cost = -7.889377868911154, shots_used = 553304
Step 59: cost = -7.884614189287923, shots_used = 576944
Let’s compare this to a standard Adam optimization. Using 100 shots per quantum evaluation, for each update step there are 2 quantum evaluations per parameter.
adam_shots_per_eval = 100
adam_shots_per_step = 2 * adam_shots_per_eval * len(params.flatten())
print(adam_shots_per_step)
2400
Thus, Adam is using 2400 shots per update step.
params = init_params
opt = qml.AdamOptimizer(0.07)
adam_dev = qml.device('default.qubit', shots=adam_shots_per_eval, seed=595905)
@qml.qnode(adam_dev, diff_method="parameter-shift", interface="autograd")
def cost(weights):
    StronglyEntanglingLayers(weights, wires=non_analytic_dev.wires)
    return qml.expval(qml.Hamiltonian(coeffs, obs))
cost_adam = [cost_analytic(params)]
shots_adam = [0]
for i in range(100):
    params = opt.step(cost, params)
    cost_adam.append(cost_analytic(params))
    shots_adam.append(adam_shots_per_step * (i + 1))
    print("Step {}: cost = {} shots_used = {}".format(i, cost_adam[-1], shots_adam[-1]))
Step 0: cost = -5.128325065574895 shots_used = 2400
Step 1: cost = -6.077888806505183 shots_used = 4800
Step 2: cost = -6.669387659810989 shots_used = 7200
Step 3: cost = -6.995155780313791 shots_used = 9600
Step 4: cost = -7.149135565909892 shots_used = 12000
Step 5: cost = -7.247518554637995 shots_used = 14400
Step 6: cost = -7.324118025755027 shots_used = 16800
Step 7: cost = -7.369807252900227 shots_used = 19200
Step 8: cost = -7.365879641931143 shots_used = 21600
Step 9: cost = -7.340915526703385 shots_used = 24000
Step 10: cost = -7.295077914262812 shots_used = 26400
Step 11: cost = -7.268348607909802 shots_used = 28800
Step 12: cost = -7.288542377363977 shots_used = 31200
Step 13: cost = -7.344993267610022 shots_used = 33600
Step 14: cost = -7.428107113997562 shots_used = 36000
Step 15: cost = -7.521001832744414 shots_used = 38400
Step 16: cost = -7.617367280851905 shots_used = 40800
Step 17: cost = -7.699135008266959 shots_used = 43200
Step 18: cost = -7.75576251948225 shots_used = 45600
Step 19: cost = -7.778134322455443 shots_used = 48000
Step 20: cost = -7.769683470735101 shots_used = 50400
Step 21: cost = -7.734538431593637 shots_used = 52800
Step 22: cost = -7.68951492138592 shots_used = 55200
Step 23: cost = -7.669653730774853 shots_used = 57600
Step 24: cost = -7.6691034263039874 shots_used = 60000
Step 25: cost = -7.69497564393327 shots_used = 62400
Step 26: cost = -7.732443334382091 shots_used = 64800
Step 27: cost = -7.777220576867574 shots_used = 67200
Step 28: cost = -7.817137394513477 shots_used = 69600
Step 29: cost = -7.84566315525906 shots_used = 72000
Step 30: cost = -7.856220761903944 shots_used = 74400
Step 31: cost = -7.851372880188038 shots_used = 76800
Step 32: cost = -7.835785605215614 shots_used = 79200
Step 33: cost = -7.824746067752327 shots_used = 81600
Step 34: cost = -7.813672992553275 shots_used = 84000
Step 35: cost = -7.803034802319635 shots_used = 86400
Step 36: cost = -7.810210781023406 shots_used = 88800
Step 37: cost = -7.828619121530458 shots_used = 91200
Step 38: cost = -7.846343932231347 shots_used = 93600
Step 39: cost = -7.86155305834996 shots_used = 96000
Step 40: cost = -7.868475215643945 shots_used = 98400
Step 41: cost = -7.863007418658066 shots_used = 100800
Step 42: cost = -7.848437376010034 shots_used = 103200
Step 43: cost = -7.832939780012174 shots_used = 105600
Step 44: cost = -7.831026527756556 shots_used = 108000
Step 45: cost = -7.843603945150899 shots_used = 110400
Step 46: cost = -7.8614209612410235 shots_used = 112800
Step 47: cost = -7.879967191299146 shots_used = 115200
Step 48: cost = -7.888573112202806 shots_used = 117600
Step 49: cost = -7.885565506002882 shots_used = 120000
Step 50: cost = -7.871077318717596 shots_used = 122400
Step 51: cost = -7.864989772267785 shots_used = 124800
Step 52: cost = -7.86398005926097 shots_used = 127200
Step 53: cost = -7.87321596480673 shots_used = 129600
Step 54: cost = -7.883245256959968 shots_used = 132000
Step 55: cost = -7.8879294183700335 shots_used = 134400
Step 56: cost = -7.892615796297896 shots_used = 136800
Step 57: cost = -7.89635501258833 shots_used = 139200
Step 58: cost = -7.897970628141338 shots_used = 141600
Step 59: cost = -7.896585230980312 shots_used = 144000
Step 60: cost = -7.889686262793565 shots_used = 146400
Step 61: cost = -7.887124460600172 shots_used = 148800
Step 62: cost = -7.879623746550409 shots_used = 151200
Step 63: cost = -7.879749349408954 shots_used = 153600
Step 64: cost = -7.8870820937807515 shots_used = 156000
Step 65: cost = -7.893955382029848 shots_used = 158400
Step 66: cost = -7.897896040152991 shots_used = 160800
Step 67: cost = -7.892594916057406 shots_used = 163200
Step 68: cost = -7.8828275731761694 shots_used = 165600
Step 69: cost = -7.8796645625087995 shots_used = 168000
Step 70: cost = -7.880185578884502 shots_used = 170400
Step 71: cost = -7.884563719057874 shots_used = 172800
Step 72: cost = -7.889696622147811 shots_used = 175200
Step 73: cost = -7.894309098391391 shots_used = 177600
Step 74: cost = -7.896381594918211 shots_used = 180000
Step 75: cost = -7.9004364720512585 shots_used = 182400
Step 76: cost = -7.902016906767931 shots_used = 184800
Step 77: cost = -7.901972501435424 shots_used = 187200
Step 78: cost = -7.898670614377715 shots_used = 189600
Step 79: cost = -7.893820214436694 shots_used = 192000
Step 80: cost = -7.894059279943024 shots_used = 194400
Step 81: cost = -7.89737760944112 shots_used = 196800
Step 82: cost = -7.898647408975192 shots_used = 199200
Step 83: cost = -7.896862162561549 shots_used = 201600
Step 84: cost = -7.888966980238634 shots_used = 204000
Step 85: cost = -7.882190248831656 shots_used = 206400
Step 86: cost = -7.876178144452561 shots_used = 208800
Step 87: cost = -7.87152067230592 shots_used = 211200
Step 88: cost = -7.871817907219619 shots_used = 213600
Step 89: cost = -7.878376982107158 shots_used = 216000
Step 90: cost = -7.885210000163964 shots_used = 218400
Step 91: cost = -7.889412839205933 shots_used = 220800
Step 92: cost = -7.894121739093724 shots_used = 223200
Step 93: cost = -7.895743941027305 shots_used = 225600
Step 94: cost = -7.892429384037089 shots_used = 228000
Step 95: cost = -7.8888232691583 shots_used = 230400
Step 96: cost = -7.883245928399467 shots_used = 232800
Step 97: cost = -7.882901887169076 shots_used = 235200
Step 98: cost = -7.8856510828744915 shots_used = 237600
Step 99: cost = -7.886925982292149 shots_used = 240000
Plotting both experiments:
plt.style.use("seaborn-v0_8")
plt.plot(shots_rosalin, cost_rosalin, "b", label="Rosalin")
plt.plot(shots_adam, cost_adam, "g", label="Adam")
plt.ylabel("Cost function value")
plt.xlabel("Number of shots")
plt.legend()
plt.xlim(0, 300000)
plt.show()

The Rosalin optimizer performs significantly better than the Adam optimizer, approaching the ground state energy of the Hamiltonian with strikingly fewer shots.
While beyond the scope of this demonstration, the Rosalin optimizer can be modified in various other ways; for instance, by incorporating weighted hybrid sampling (which distributes some shots deterministically, with the remainder done randomly), or by adapting the variant iCANS2 optimizer. Download this demonstration from the sidebar 👉 and give it a go! ⚛️
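For instance, a rough sketch of weighted hybrid sampling (our own illustration, reusing prob_shots, total_shots, and multinomial from above; the 50/50 split is a hypothetical choice):

# weighted hybrid sampling: assign half the budget deterministically,
# and sample the remaining shots from the multinomial distribution
frac_deterministic = 0.5
shots_det = np.floor(frac_deterministic * total_shots * prob_shots).astype(int)
remainder = int(total_shots - np.sum(shots_det))
shots_rand = multinomial(n=remainder, p=prob_shots).rvs()[0]
shots_per_term = shots_det + shots_rand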
References
1. Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).
2. James Stokes, Josh Izaac, Nathan Killoran, and Giuseppe Carleo. “Quantum Natural Gradient.” arXiv:1909.02108 (2019).
3. Ryan Sweke, Frederik Wilde, Johannes Jakob Meyer, Maria Schuld, Paul K. Fährmann, Barthélémy Meynard-Piganeau, and Jens Eisert. “Stochastic gradient descent for hybrid quantum-classical optimization.” arXiv:1910.01155 (2019).
4. Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).
About the author
Josh Izaac
Josh is a theoretical physicist, software tinkerer, and occasional baker. At Xanadu, he contributes to the development and growth of Xanadu’s open-source quantum software products.
Total running time of the script: (1 minute 35.695 seconds)