Evaluating analytic gradients of pulse programs on quantum computers

Korbinian Kottmann

Demos/
Optimization/
Evaluating analytic gradients of pulse programs on quantum computers

Evaluating analytic gradients of pulse programs on quantum computers

Korbinian Kottmann

Published: December 11, 2023. Last updated: September 22, 2025.

Are you tired of spending precious quantum resources on computing stochastic gradients of quantum pulse programs? In this demo we introduce ODEgen, a method to compute analytic gradients of pulse programs on quantum computers with high accuracy at lower cost! Learn about how ODEgen achieves this and convince yourself of its strengths with a numerical experiment in this demo.

Illustration of the ODEgen gradient method for pulse gates, compared to the stochastic parameter-shift rule — Illustration of ODEgen and stochastic parameter-shift (SPS) for computing gradients of quantum pulse programs. ODEgen offloads the complexity induced by the time-dynamics to a classical ODE solver, whereas SPS performs Monte-Carlo sampling in time.¶

Introduction

Many contemporary quantum computers are operated by steering the qubit state through an electromagnetic pulse. This can be modeled by means of a time-dependent Hamiltonian

\[H(\theta, t) = H_\text{drift} + \sum_{j=1}^{N_g} f_j(\theta, t) H_j\]

with time-dependent, parametrized pulse envelopes \(f_j(\theta, t)\) for each of the \(N_g\) pulse generators \(H_j,\) and a constant drift term \(H_\text{drift}.\) Evolving a quantum state under \(H(\theta, t)\) for some time T results in a unitary evolution \(U(\theta),\) to which we refer to as a pulse gate. A prominent example is superconducting qubit platforms as described in the demo on differentiable pulse programming or the demo about OQC’s Lucy.

The parameters \(\theta\) of \(H(\theta, t)\) determine the shape and strength of the pulse, and can be subject to optimization in applications like the variational quantum eigensolver (VQE) [3]. Gradient based optimization on hardware is possible by utilizing the stochastic parameter-shift (SPS) rule introduced in [4] and [5]. However, this method is intrinsically stochastic and may require a large number of shots.

In this demo, we are going to take a look at the recently introduced ODEgen method for computing analytic gradients of pulse gates [1]. It utilizes classical ordinary differential equation (ODE) solvers for computing gradient recipes of quantum pulse programs that can be executed on hardware.

SPS and ODEgen gradient rules

Let us start by deriving both the SPS and ODEgen rules.

We are interested in cost functions of the form

\[\mathcal{L}(\theta) = \langle 0 | U(\theta)^\dagger H_\text{obj} U(\theta) | 0 \rangle\]

where we compute the expectation value of some objective Hamiltonian \(H_\text{obj}\) (think e.g. quantum many-body Hamiltonian whose ground state energy we want to estimate). For simplicity, we assume a sole pulse gate \(U(\theta)\) generated by a pulse Hamiltonian \(H(\theta, t) = H_\text{drift} + \sum_{j=1}^{N_g} f_j(\theta, t) H_j.\) Further, let us assume the so-called pulse generators \(H_j\) in \(H(\theta, t)\) to be Pauli words, which will make SPS rule below a bit more digestible. For more details on the general cases beyond Pauli word generators we refer to the original paper [1].

Note

As a quick reminder, let us briefly review the parameter-shift rules for Pauli words. This will help to make the journey through the following a bit easier. For a loss function \(L(\theta) = \langle 0 | U(\theta)^\dagger H_\text{obj} U(\theta)\) with a gate \(U(\theta) = e^{-i \frac{\theta}{2} H}\) generated by a Pauli word \(H\) (think e.g. \(H = X_0 Y_1\)), one can compute the gradient in a hardware compatible manner via the parameter-shift rule

\[\frac{\partial}{\partial \theta} L(\theta) = \frac{1}{2} \left(L(\theta + \frac{\pi}{2}) - L(\theta - \frac{\pi}{2}) \right).\]

For the case of multiple gates generated by different Paulis \(H_j\) with different parameters \(\theta_j\), \(U(\theta) = \prod_j e^{-i \frac{\theta_j}{2} H_j},\) the rule still has the same form

\[\frac{\partial}{\partial \theta_j} L(\theta) = \frac{1}{2} \left(L(\theta + e_j\frac{\pi}{2}) - L(\theta - e_j \frac{\pi}{2}) \right).\]

Here, the shifts of \(\pm \frac{\pi}{2}\) are only applied to differentiated parameters via the basis vectors \(e_0 = (1,0,0,..),\) \(e_1 = (0,1,0,..)\) etc. Pulse gates generated from \(H(\theta, t)\) introduce two complications. First, the generators of the gates are now time-dependent. Second, the generator consists of a sum of non-commuting terms. These complications are addressed in the SPS rule and ODEgen method, described below.

SPS

We can compute the gradient of \(\mathcal{L}\) by means of the stochastic parameter-shift rule via

\[\frac{\partial}{\partial \theta_j} \mathcal{L}(\theta) = \int_0^T d\tau \sum_{i=1}^{N_g} \frac{\partial f_i(\theta, \tau)}{\partial \theta_j} \left(\tilde{\mathcal{L}}^+_i(\tau) - \tilde{\mathcal{L}}^-_i(\tau) \right).\]

The \(\tilde{\mathcal{L}}^\pm_i(\tau)\) are the original expectation values but under shifted evolutions \(U(T, \tau) e^{-i\left(\pm\frac{\pi}{4}\right)H_i} U(\tau, 0).\) As an exception, we explicitly state the evolution times and use the notation \(U(t_1, t_0)\) to indicate the evolution is going from time \(t_0\) to \(t_1.\) One important point to stress is that in the case of the SPS rule, the circuit structure changes, whereas in the regular parameter-shift rule the same circuits are executed with shifted parameters.

In practice, the integral is approximated via Monte Carlo integration

\[\frac{\partial}{\partial \theta_j} \mathcal{L}(\theta) \approx \frac{1}{N_s} \sum_{\tau \in \mathcal{U}([0, T])} \sum_{i=1}^{N_g} \frac{\partial f_i(\theta, \tau)}{\partial \theta_j} \left(\tilde{\mathcal{L}}^+_i(\tau) - \tilde{\mathcal{L}}^-_i(\tau) \right)\]

where the \(\tau \in \mathcal{U}([0, T])\) are sampled uniformly between \(0\) and \(T,\) and \(N_s\) is the number of Monte Carlo samples for the integration. The larger the number of samples, the better the approximation. This comes at the cost of more quantum resources \(\mathcal{R},\) expressed as the number of distinct expectation values executed on the quantum device,

\[\mathcal{R}_\text{SPS} = 2 N_s N_g.\]

ODEgen

In contrast, the recently introduced ODEgen method [1] has the advantage that it circumvents the need for Monte Carlo sampling by off-loading the complexity introduced by the time-dynamics to a differentiable ODE solver.

The first step of ODEgen is writing the derivative of a pulse unitary \(U(\theta)\) as

\[\frac{\partial}{\partial \theta_j} U(\theta) = -i U(\theta) \mathcal{H}_j\]

with a so-called effective generator \(\mathcal{H}_j\) for each of the parameters \(\theta_j\) (note that these are not the pulse generators \(H_j\) in \(H(\theta, t)\)). We can obtain \(\mathcal{H}_j\) classically by computing both \(\frac{\partial}{\partial \theta_j} U(\theta)\) and \(U(\theta)\) in a forward and backward pass through the ODE solver. We already use such a solver in PennyLane for simulating pulses in Here, we use it to generate parameter-shift rules that can be executed on hardware.

The next step is to decompose each effective generator into a basis the quantum computer can understand, and, in particular, can execute. We choose the typical Pauli basis and write

\[\mathcal{H}_j = \sum_\ell \omega_\ell^{(j)} P_\ell\]

for Pauli words \(P_\ell\) (e.g. \(P_\ell = X_0 Y_1\) for some \(\ell\)) with coefficients \(\omega_\ell^{(j)}.\) With this decomposition, we can write the gradient of the cost function in terms of parameter-shift rules that can be executed on a quantum computer:

\[\begin{split}\frac{\partial \mathcal{L}}{\partial \theta_j} = \langle 0 | \left[U(\theta)^\dagger H_\text{obj} U(\theta), -i\mathcal{H}_j \right] | 0 \rangle \\ = \sum_\ell 2 \omega_\ell^{(j)} \langle 0 | \left[U(\theta)^\dagger H_\text{obj} U(\theta), -\frac{i}{2} P_\ell \right] |0\rangle \\ = \sum_\ell 2 \omega_\ell^{(j)} \frac{d}{dx} \left[ \langle 0 | U(\theta)^\dagger e^{i\frac{x}{2} P_\ell} H_\text{obj} e^{-i\frac{x}{2} P_\ell} U(\theta)|0\rangle \right]_{x=0}.\end{split}\]

In particular, we can identify

\[L_\ell(x) = \langle 0 | U(\theta)^\dagger e^{i\frac{x}{2} P_\ell} H_\text{obj} e^{-i\frac{x}{2} P_\ell} U(\theta)|0\rangle\]

as an expectation value shifted by the dummy variable \(x,\) whose derivative is given by the standard two-term parameter-shift rule (see note above or this derivation). Overall, we have

\[\frac{\partial \mathcal{L}}{\partial \theta_j} = \sum_\ell \omega_\ell^{(j)} \left(L_\ell\left(\frac{\pi}{2}\right) - L_\ell\left(-\frac{\pi}{2}\right) \right).\]

The quantum resources for ODEgen are \(2\) executions for each non-zero Pauli term \(\omega_\ell^{(j)} P_\ell\) (i.e. non-zero for any \(j\) for a particular \(\ell\)) of the decomposition. This number is at most \(4^n-1\) for \(n\) qubits. A better upper bound is given by the dimension of the dynamical Lie algebra (DLA) of the pulse Hamiltonian. That is, the number of linearly independent operators that can be generated from nested commutators of the pulse generators and the drift term. An example would be a pulse Hamiltonian composed of terms \(X_0,\) \(X_1\) and \(Z_0 Z_1.\) A basis for the DLA of this pulse Hamiltonian is given by \(\{X_0, X_1, Z_0Z_1, Y_0Y_1, Y_0Z_0, Z_0Y_0\}.\) This tells us that only those terms can be non-zero in a decomposition of any effective generator from gates generated by such a pulse Hamiltonian.

Overall, ODEgen is well-suited for pulse Hamiltonians that act effectively on few qubits - as is the case for superconducting qubit and ion trap qubit architectures - or yield a small DLA.

Numerical experiment

We want to put ODEgen and SPS head to head in a variational quantum algorithm with the same available quantum resources. For this, we are going to perform the variational quantum eigensolver (VQE) on the Heisenberg model Hamiltonian

\[H_\text{obj} = X_0 X_1 + Y_0 Y_1 + Z_0 Z_1\]

for two qubits. The ground state of this Hamiltonian is the maximally entangled singlet state \(|\phi^- \rangle = (|01\rangle - |10\rangle)/\sqrt{2}\) with ground state energy \(-3.\) Let us define it in PennyLane and also import some libraries that we are going to need for this demo.

import pennylane as qml
import numpy as np
import jax.numpy as jnp
import jax

import optax

jax.config.update("jax_enable_x64", True)
jax.config.update("jax_platform_name", "cpu")

import matplotlib.pyplot as plt

H_obj = qml.sum(qml.PauliX(0) @ qml.PauliX(1), qml.PauliY(0) @ qml.PauliY(1), qml.PauliZ(0) @ qml.PauliZ(1))
E_exact = -3.
wires = H_obj.wires

We are going to consider a system of transmon qubits described by the Hamiltonian

\[H(\theta, t) = - \sum_i \frac{\omega_i}{2} Z_i + \sum_i \Omega_i(t) \sin(\nu_i t + \phi_i(t)) Y_i + \sum_{q, p \in \mathcal{C}} \frac{g_{qp}}{2} (X_i X_p + Y_i Y_p).\]

The first term describes the single qubits with frequencies \(\omega_i.\) The second term describes the driving with drive amplitudes \(\Omega_i,\) drive frequencies \(\nu_i\) and phases \(\phi_i.\) You can check out our recent demo on driving qubits on OQC’s Lucy if you want to learn more about the details of controlling transmon qubits. The third term describes the coupling between neighboring qubits. We only have two qubits and a simple topology of \(\mathcal{C} = \{(0, 1)\},\) hence only one coupling constant \(g_{01}.\) The coupling is necessary to generate entanglement, which is achieved with cross-resonant driving in fixed-coupling transmon systems, as is the case here.

We will use realistic parameters for the transmons, taken from the coaxmon design paper [6] (this is the blue-print for the transmon qubits in OQC’s Lucy that you can access on a pulse level in PennyLane). In order to prepare the singlet ground state, we will perform a cross-resonance pulse, i.e. driving one qubit at its coupled neighbor’s frequency for entanglement generation (see [6] or [2]) while simultaneously driving the other qubit on resonance. We choose a gate time of \(100 \text{ ns}.\) We will use a piecewise constant function pwc() to parametrize both the amplitude \(\Omega_i(t)\) and the phase \(\phi_i(t)\) in time, with t_bins = 10 time bins to allow for enough flexibility in the evolution.

T_CR = 100.   # gate time for two qubit drive (cross resonance)
qubit_freq = 2*np.pi*np.array([6.509, 5.963])
g = 2 * np.pi * 0.0123

def drive_field(T, wdrive):
    """Set the evolution time ``T`` and drive frequency ``wdrive``"""
    def wrapped(p, t):
        # The first len(p) values of the trainable params p characterize the pwc function
        amp = qml.pulse.pwc(T)(p[:len(p)//2], t)
        phi = qml.pulse.pwc(T)(p[len(p)//2:], t)
        return amp * jnp.sin(wdrive * t + phi)

    return wrapped

H_pulse = qml.dot(-0.5*qubit_freq, [qml.PauliZ(i) for i in wires])
H_pulse += g * (qml.PauliX(wires[0]) @ qml.PauliX(wires[1]) + qml.PauliY(wires[0]) @ qml.PauliY(wires[1]))

H_pulse += drive_field(T_CR, qubit_freq[0]) * qml.PauliY(wires[0]) # on-resonance driving of qubit 0
H_pulse += drive_field(T_CR, qubit_freq[0]) * qml.PauliY(wires[1]) # off-resonance driving of qubit 1

We can now define the cost function that computes the expectation value of the Heisenberg Hamiltonian after evolving the state with the parametrized pulse Hamiltonian. We then define the two separate qnodes with ODEgen and SPS as their differentiation methods, respectively.

def qnode0(params):
    qml.evolve(H_pulse)((params[0], params[1]), t=T_CR)
    return qml.expval(H_obj)

dev = qml.device("default.qubit", wires=range(2))

qnode_jax = qml.QNode(qnode0, dev, interface="jax")
value_and_grad_jax = jax.jit(jax.value_and_grad(qnode_jax))

num_split_times = 8
gradient_kwargs = {"use_broadcasting": True, "num_split_times": num_split_times}
qnode_sps = qml.QNode(
    qnode0,
    dev,
    interface="jax",
    diff_method=qml.gradients.stoch_pulse_grad,
    gradient_kwargs=gradient_kwargs,
)
value_and_grad_sps = jax.value_and_grad(qnode_sps)

qnode_odegen = qml.QNode(qnode0, dev, interface="jax", diff_method=qml.gradients.pulse_odegen)
value_and_grad_odegen = jax.value_and_grad(qnode_odegen)

We note that for as long as we use a simulator, there is naturally no difference between the gradients obtained from direct backpropagation and using ODEgen.

tbins = 10            # number of time bins per pulse
n_param_batch = 2     # number of parameter batches

x = jnp.ones((n_param_batch, tbins * 2))

res0, grad0 = value_and_grad_jax(x)
res1, grad1 = value_and_grad_odegen(x)
np.allclose(res0, res1, atol=1e-3), np.allclose(grad0, grad1, atol=1e-3)

(True, True)

This allows us to use direct backpropagation in this demo, which is always faster in simulation. We now have all the ingredients to run VQE with ODEgen and SPS. We define the following standard optimization loop and run it from the same random initial values with ODEgen and SPS gradients.

def run_opt(value_and_grad, theta, n_epochs=120, lr=0.1, b1=0.9, b2=0.999):

    optimizer = optax.adam(learning_rate=lr, b1=b1, b2=b2)
    opt_state = optimizer.init(theta)

    energy = np.zeros(n_epochs)
    gradients = []
    thetas = []

    @jax.jit
    def partial_step(grad_circuit, opt_state, theta):
        # SPS gradients don't allow for full jitting of the update step
        updates, opt_state = optimizer.update(grad_circuit, opt_state)
        theta = optax.apply_updates(theta, updates)

        return opt_state, theta

    ## Optimization loop
    for n in range(n_epochs):
        val, grad_circuit = value_and_grad(theta)
        opt_state, theta = partial_step(grad_circuit, opt_state, theta)

        energy[n] = val
        gradients.append(grad_circuit)
        thetas.append(theta)

    return thetas, energy

key = jax.random.PRNGKey(888)
theta0 = jax.random.normal(key, shape=(n_param_batch, tbins * 2))

thetaf_odegen, energy_odegen = run_opt(value_and_grad_jax, theta0)
thetaf_sps, energy_sps = run_opt(value_and_grad_sps, theta0)

plt.plot(np.array(energy_sps) - E_exact, label="SPS")
plt.plot(np.array(energy_odegen) - E_exact, label="ODEgen")
plt.legend()
plt.yscale("log")
plt.ylabel("$E(\\theta) - E_{{FCI}}$")
plt.xlabel("epochs")
plt.show()

We see that with analytic gradients (ODEgen), we can reach the ground state energy within 100 epochs, whereas with SPS gradients we cannot find the path towards the minimum due to the stochasticity of the gradient estimates. Note that the convergence of the optimization is sensitive to the initial guess. In this demonstration, both optimizations start from the same (random) initial point. This picture solidifies when repeating this procedure for multiple runs from different random initializations, as was demonstrated in [1].

We also want to make sure that this is a fair comparison in terms of quantum resources. In the case of ODEgen, we maximally have \(\mathcal{R}_\text{ODEgen} = 2 (4^n - 1) = 30\) expectation values. For SPS we have \(2 N_g N_s = 32\) (due to \(N_g = 2\) and \(N_s=8\) time samples per gradient that we chose in num_split_times above). Thus, overall, we require fewer quantum resources for ODEgen gradients while achieving better performance.

Conclusion

We introduced ODEgen for computing analytic gradients of pulse gates and showcased its advantages in simulation with an example using VQE. The method is particularly well-suited for quantum computing architectures that build complexity from few-qubit gates, as is the case for superconducting qubit architectures. We invite you to play with ODEgen yourself. Note that this feature is amenable to hardware and you can compute gradients on OQC’s Lucy via PennyLane. We show you how to connect to Lucy and run pulse gates in our recent demo. Running VQE using ODEgen on hardware has recently been demonstrated in [1] and you can directly find the code here.

References

About the author

Korbinian Kottmann

Quantum compilation & open source software

Total running time of the script: (12 minutes 3.278 seconds)

Share demo

Ask a question on the forum

About

Software & Documentation

Resources

Topics

Community & Support

Evaluating analytic gradients of pulse programs on quantum computers

Introduction

SPS and ODEgen gradient rules

SPS

ODEgen

Numerical experiment

Conclusion

References

About the author

Related Demos