The following is a guest post by Katharine Hyatt (Scientist @ Amazon Braket) and Daniela Becker (Senior Product Manager @ Amazon Braket).
Since 2020, PennyLane has been integrated with Amazon Braket, the quantum computing service on AWS, enabling hybrid quantum algorithms to run on a range of devices, from high-performance simulators to quantum processing units. Installing the PennyLane-Braket plugin gives you direct access to all Braket devices from PennyLane. The plugin comes pre-installed on Braket notebook instances, and you can check out the installation guide for more information.
Braket now supports the computation of gradients using the adjoint differentiation method, which offers runtime speedups for various quantum machine learning and optimization workloads. In this blog post we’re going to show how you can apply the expressiveness of PennyLane together with a powerful Braket backend to solve a quantum chemistry problem using adjoint gradients. In particular, we’re going to use Braket’s on-demand state vector simulator SV1, which can simulate circuits of up to 34 qubits. To learn more about it, take a look at the SV1 section of the Amazon Braket Developer Guide.
Finding a molecular ground state energy with ADAPT-VQE and a classical simulator
When studying chemical systems, we often want to find the ground state, or lowest-energy configuration, of a given molecule. Doing so allows us to compute physical properties of the molecule of interest. The ground state is the eigenvector of the molecule’s Hamiltonian (the operator describing the energies and interactions of the molecule’s electrons and nuclei) that corresponds to its smallest eigenvalue, the lowest energy.
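In equation form, if H denotes the molecular Hamiltonian, the ground state \lvert \psi_0 \rangle and its energy E_0 satisfy the eigenvalue problem below, and the variational principle, which VQE exploits, guarantees that any trial state yields an energy no lower than E_0:

H \lvert \psi_0 \rangle = E_0 \lvert \psi_0 \rangle,
\qquad
E_0 \le \frac{\langle \psi \rvert H \lvert \psi \rangle}{\langle \psi \vert \psi \rangle}
\quad \text{for every trial state } \lvert \psi \rangle .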
Computing the ground state using a classical computer can quickly become intractable if we consider the full Hamiltonian — usually one has to make some kind of approximation because of the exponential scaling of interactions in the system. Using the variational quantum eigensolver (VQE) algorithm on a quantum computer with sufficiently many fault-tolerant qubits would offer a way around this problem. While such quantum computers are still in development, we can model their operation using a classical simulator on small problems, to understand how VQE could perform once fault-tolerant quantum computers become available.
In order to find the ground state of a molecule, we construct a circuit that prepares an initial guess for the ground state. A common choice for this initial guess is the Hartree–Fock (HF) state. On top of it we apply parametrized single- or multi-electron excitation gates, and by iteratively optimizing the parameter values we build a circuit that evolves the initial guess towards the true ground state. We do this by computing the gradient of a cost function, in this case the expectation value of the molecule’s Hamiltonian. Excitations may involve a single electron, two (a “double” excitation), or even more. Including higher-order excitations increases the accuracy of the algorithm, but the total number of n-electron excitations grows rapidly with n. For example, the number of double excitations scales with the fourth power of the number of electrons, resulting in many hundreds of excitations even for a modest (10–12) number of electrons.
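As a rough illustration of this growth (ignoring the spin selection rules a quantum chemistry package would enforce, so the true counts are somewhat smaller), the number of candidate double excitations can be counted directly; the helper function below is purely illustrative:

from math import comb

def max_double_excitations(n_electrons, n_spin_orbitals):
    """Upper bound on the number of double excitations: choose 2 occupied
    spin orbitals out of n_electrons and 2 virtual ones out of the rest."""
    return comb(n_electrons, 2) * comb(n_spin_orbitals - n_electrons, 2)

print(max_double_excitations(10, 18))  # 45 * 28 = 1260
print(max_double_excitations(12, 24))  # 66 * 66 = 4356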
At first, this seems like a daunting problem: the number of parameters to optimize grows so quickly that the optimization would simply be too time consuming, even for relatively small molecules. Luckily, it turns out that most excitations contribute very little to the final ground state, so we can truncate the set of parameters and still calculate the ground state and its energy to reasonable accuracy.
If we can figure out which excitations can be discarded before performing the full optimization, we can dramatically cut down on the time needed to obtain a solution. We don’t know a priori which excitations are important, so we need a way to identify those we can discard.
The ADAPT-VQE algorithm, developed by Grimsley et al., allows us to perform this filtering. The steps are as follows (a minimal code sketch of the first filtering pass follows the list):
- Compute partial derivatives of the cost function with respect to all double excitations
- Filter out all double excitations with derivatives below some chosen cutoff
- Optimize the parameters of the remaining double excitations
- Compute partial derivatives of the cost function with respect to all single excitations, keeping the filtered-and-optimized double excitations fixed
- Filter out all single excitations with derivatives below some chosen cutoff value
- Optimize all remaining excitations
- Compute the final energy
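In PennyLane-flavoured pseudocode, the first filtering pass (steps 1 and 2 of the list above) might look roughly like this sketch. The cost QNode and the list of excitations are passed in as arguments, since their concrete definitions only appear later in this post; the cutoff value is an illustrative choice.

import pennylane as qml

def adapt_vqe_filter(circuit, excitations, cutoff=1.0e-5):
    """Return the subset of excitations whose initial partial derivative
    exceeds the cutoff, starting from zero-initialized parameters."""
    grad_fn = qml.grad(circuit, argnum=0)
    grads = grad_fn([0.0] * len(excitations), excitations=excitations)
    return [exc for exc, g in zip(excitations, grads) if abs(g) > cutoff]

# Example usage once a cost QNode and excitation list exist:
# doubles_select = adapt_vqe_filter(circuit_1, doubles)

The remaining steps, optimizing the selected doubles, repeating the derivative/filter pass for the single excitations with the optimized doubles held fixed, and finally optimizing everything that survives, reuse the same pattern.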
Accelerating ADAPT-VQE with the adjoint differentiation method
For this blog, we’ll look at a molecule many of us are familiar with from daily life: carbon dioxide, \mathrm{CO_2}. \mathrm{CO_2} has 22 electrons in total. For the simulation, we can pick a subset of these as “active” electrons, i.e. those that are excitable. In reality, many of the electrons are not excitable, or are only very rarely excited, and one does not need to consider all electrons as active to obtain accurate results. Here, we’ll set the number of active electrons to 10 and construct the Hamiltonian, the initial HF state, and the excitations in a few lines of code, using PennyLane’s qchem module:
import pennylane as qml
from pennylane import qchem
from pennylane import numpy as np

n_electrons = 10
symbols, coordinates = qchem.read_structure('qchem/co2.xyz')
H, qubits = qchem.molecular_hamiltonian(
    symbols,
    coordinates,
    method="pyscf",
    active_electrons=n_electrons,
    name="co2",
)
hf_state = qchem.hf_state(n_electrons, qubits)
print(f"Number of qubits: {qubits}")  # will be 18 qubits
Even for small molecules, the number of excitations can be quite large. We need to compute partial derivatives for each excitation in order to filter them. That means that, initially, we need to compute the gradient with respect to several hundred parameters! One method to compute the gradient is the “parameter-shift” technique, which requires 2P circuit executions, where P is the total number of parameters. This means that we’d need many hundreds of circuit executions to compute the gradient in our initial filtering pass. Although these are parallelizable, the time needed to complete the simulation may still be quite long.
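For reference, for a gate generated by a Pauli rotation (such as RX, RY, or RZ) the two-term parameter-shift rule reads as follows; gates with more complicated generators may require additional shifted evaluations, but the cost still grows linearly with the number of parameters:

\frac{\partial \langle H \rangle}{\partial \theta_i}
= \frac{1}{2} \left[ \langle H \rangle\!\left( \theta_i + \frac{\pi}{2} \right)
- \langle H \rangle\!\left( \theta_i - \frac{\pi}{2} \right) \right]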
Because we’re running on a classical simulator, there is another option: the adjoint differentiation method, which computes the full gradient in only “1+1” circuit executions. The adjoint method, originally introduced for differentiating classical algorithms, was first applied to quantum circuits by the authors of the Yao.jl software package. We can think of it as analogous to classical backpropagation: we run the circuit once, then step backwards through it to compute the partial derivatives. See the PennyLane demo on adjoint differentiation or the Amazon Braket example notebook on the method for in-depth treatments. This is only possible on a simulator, because it requires direct access to (and copies of) the state vector, which a physical QPU cannot provide.
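Schematically, writing the simulated state as \lvert \psi \rangle = U_N \cdots U_1 \lvert 0 \rangle, every partial derivative can be read off from a pair of state vectors that are updated incrementally while stepping backwards through the circuit:

\frac{\partial \langle H \rangle}{\partial \theta_k}
= 2\,\mathrm{Re}\!\left[ \langle b_k \rvert \frac{\partial U_k}{\partial \theta_k} \lvert \phi_{k-1} \rangle \right],
\qquad
\lvert \phi_{k-1} \rangle = U_{k-1} \cdots U_1 \lvert 0 \rangle,
\qquad
\lvert b_k \rangle = U_{k+1}^{\dagger} \cdots U_N^{\dagger} H\, U_N \cdots U_1 \lvert 0 \rangle .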
Running with the adjoint method on Amazon Braket’s SV1:
Note: To use Braket SV1, you must first create an account on AWS and also follow the setup instructions for accessing Braket from Python.
Using PennyLane with Amazon Braket, we can take advantage of the adjoint differentiation method when running on the on-demand state vector simulator SV1 in shots=0 mode. To tell SV1 to use the adjoint method, simply set diff_method='device' in your PennyLane QNode:
diff_method = 'device'

dev = qml.device(
    "braket.aws.qubit",
    device_arn="arn:aws:braket:::device/quantum-simulator/amazon/sv1",
    wires=qubits,
    shots=0,
)

@qml.qnode(dev, diff_method=diff_method)
def circuit_1(params, excitations):
    qml.BasisState(hf_state, wires=H.wires)
    for i, excitation in enumerate(excitations):
        if len(excitation) == 4:
            qml.DoubleExcitation(params[i], wires=excitation)
        else:
            qml.SingleExcitation(params[i], wires=excitation)
    return qml.expval(H)
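The snippets below also refer to a list of double excitations, doubles, and to a gradient function, circuit_gradient, which are not defined above. A minimal way to define them (assuming these variable names, chosen to match the later code) is:

# All spin-allowed single and double excitations for the chosen active space
singles, doubles = qchem.excitations(n_electrons, qubits)
print(f"Number of single excitations: {len(singles)}")
print(f"Number of double excitations: {len(doubles)}")

# Gradient of the cost QNode; with diff_method='device' this uses SV1's
# adjoint differentiation capability
circuit_gradient = qml.grad(circuit_1, argnum=0)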
With just a few library calls, we are able to set up a large VQE simulation for this molecule and, using SV1, complete the entire workflow in 27 minutes.
Comparing the adjoint method with parameter-shift
We can also easily run the parameter-shift method on SV1 thanks to PennyLane. Let’s compare how long it takes to compute a gradient for 100 parameters, keeping in mind that the adjoint method should only need “1+1” circuit executions and the parameter shift should need 200. SV1 supports task batching, which allows tasks to run concurrently on the on-demand simulator. This can lower the wallclock time for your workflow to complete. We’ll run the gradient computation using the adjoint method, parallelized parameter-shift, and serial parameter-shift and compare:
doubles_count = 100
doubles_select = doubles[:doubles_count]

# Construct the serial parameter-shift circuit
@qml.qnode(dev, diff_method='parameter-shift')
def circuit_1_ps_serial(params, excitations):
    qml.BasisState(hf_state, wires=H.wires)
    for i, excitation in enumerate(excitations):
        if len(excitation) == 4:
            qml.DoubleExcitation(params[i], wires=excitation)
        else:
            qml.SingleExcitation(params[i], wires=excitation)
    return qml.expval(H)

# Construct the batched device
dev_batched = qml.device(
    "braket.aws.qubit",
    device_arn="arn:aws:braket:::device/quantum-simulator/amazon/sv1",
    wires=qubits,
    shots=0,
    parallel=True,
)

# and the batched parameter-shift circuit
@qml.qnode(dev_batched, diff_method='parameter-shift')
def circuit_1_ps_batched(params, excitations):
    # must redefine due to new diff_method and device
    qml.BasisState(hf_state, wires=H.wires)
    for i, excitation in enumerate(excitations):
        if len(excitation) == 4:
            qml.DoubleExcitation(params[i], wires=excitation)
        else:
            qml.SingleExcitation(params[i], wires=excitation)
    return qml.expval(H)

params = [0.0] * doubles_count
grad_ps_serial = qml.grad(circuit_1_ps_serial, argnum=0)
grad_ps_batched = qml.grad(circuit_1_ps_batched, argnum=0)
Now we can compare the wallclock time to run the gradient computation with each method:
import time

ps_serial_start = time.time()
ps_serial_grads = grad_ps_serial(params, excitations=doubles_select)
ps_serial_stop = time.time()

ps_batched_start = time.time()
ps_batched_grads = grad_ps_batched(params, excitations=doubles_select)
ps_batched_stop = time.time()

ag_start = time.time()
adjoint_grads = circuit_gradient(params, excitations=doubles_select)
ag_stop = time.time()

print("Time to compute 100 partial derivatives with:")
print(f"Adjoint differentiation: {ag_stop - ag_start}")
print(f"Batched parameter-shift: {ps_batched_stop - ps_batched_start}")
print(f"Serial parameter-shift: {ps_serial_stop - ps_serial_start}")
A representative run yields the following times:
Time to compute 100 partial derivatives with:
Adjoint differentiation: 94.12 s
Batched parameter-shift: 19443.05 s
Serial parameter-shift: 27922.40 s
We should also confirm that the partial derivatives from the different methods agree (to within floating-point precision), as should be the case when running in shots=0 mode:
# The gradients from the different methods should agree to numerical precision
assert np.allclose(adjoint_grads, ps_serial_grads)
assert np.allclose(adjoint_grads, ps_batched_grads)
Note: Running the contents of this blog may result in simulation fees charged to your AWS account. The AWS Free Tier gives you one free hour of simulation time per month — for more information, visit the Braket pricing page. Researchers at accredited institutions can apply for credits to support experiments on Amazon Braket through the AWS Cloud Credits for Research program.
For this example, the adjoint method runs roughly 200 to 300 times faster than parameter-shift, even when the latter is accelerated with batching.
In sum, using the adjoint differentiation method on SV1 can deliver a strong performance improvement for your quantum workflow, accelerating your exploration and development. And using PennyLane, you can set up a complex problem in a few lines of code. This workflow is examined in more detail in this Amazon Braket Example notebook.
Where to go from here
We were able to set up a quantum chemistry problem with many parametrized gates and the 18 qubits needed to simulate the system. We filtered out most of the excitations that contribute little to the final answer, saving time by focusing the optimization only on those excitations that contribute most to the final result. This allowed us to reach a larger problem size, with more electrons and qubits, than we would have been able to otherwise. We encourage you to try out the ADAPT-VQE method and adjoint differentiation with other molecules, for example \mathrm{C_2O_2}, \mathrm{NH_3}, or \mathrm{N_2H_4}. SV1 supports up to 34 qubits, so simulating even molecules with more electrons than \mathrm{CO_2} may be possible. You can find the appropriate xyz structure files on PubChem. Since the first introduction of ADAPT-VQE, researchers in quantum chemistry have continued to develop algorithms appropriate for near-term quantum devices, one example of which is TETRIS-ADAPT-VQE. Researchers are also investigating different initial guesses that can accelerate the solution process, many of which are described in this overview of the state of VQE. We encourage you to also try out various algorithms and initial ansätze; if the whole ansatz is not already implemented, it can often be constructed easily from the extensive catalog of primitives and templates available in PennyLane.
About the authors
Katharine Hyatt
Katharine Hyatt is a scientist on the Amazon Braket team. Previously, she worked at the Center for Computational Quantum Physics at the Flatiron Institute, a division of the Simons Foundation. Past research focused on developing new numerical methods t...
Daniela Becker
Daniela Becker is a Senior Product Manager at Amazon Braket, a service designed to accelerate scientific research and software development for quantum computing. She is excited about helping customers leverage AWS to explore and experiment with quant...