How to choose your device

Aleksei Malyshev

One of the main driving forces that has enabled the explosive growth of machine learning (ML) in the past decade were the specialised ML frameworks (e.g., PyTorch, Tensorflow, and Jax), which provide high-quality and easily composable ML primitives as well as streamlined access to hardware acceleration units. In other words, an ML practitioner can nowadays spend most of their time thinking about the high-level structure of the developed algorithm, rather than about the low-level details of its implementation and execution on specific hardware.

In a similar vein, PennyLane aims to provide the same level of abstraction for the field of quantum machine learning (QML); a researcher should be able to design a quantum algorithm or circuit and then seamlessly run it on a quantum computer. Nevertheless, we currently lack a fault-tolerant quantum computer and can rely only on existing NISQ devices. While in an ideal case no particular device should be preferred over the others, in practice they are different in their nature. They are subject to different types of noise, have different sets of high-quality gates, and thus can yield different results for the same circuit.

Hence, PennyLane users need to be aware of the device on which they are running their code. And it’s not only about devices — software simulators can be configured in numerous ways, too. Therefore, being exposed to the variety of backends and their features is beneficial even at the prototyping stage. To help you navigate through the plethora of choices, we will consider a toy example of a differentiable quantum computing problem, go through the common steps of picking a device, and showcase some of their pros and cons.

The toy problem

For this how-to blog post we will consider a problem inspired by the PennyLane qubit rotation tutorial. We have the following variational circuit.

The toy problem circuit

We want to find such values of \(\alpha\), \(\beta\) and \(\gamma\) that the initial state \(|0\rangle\) is mapped to the final state \(|1\rangle\). These desired values exist because any unitary \(\hat{U}\) can be decomposed as follows:

\begin{equation} \hat{U} = e^{i\delta} \hat{R}_Z(\gamma) \hat{R}_Y(\beta) \hat{R}_Z(\alpha) , \quad \text{for some} \quad \alpha, \beta, \gamma, \delta \in \mathbb{R}. \end{equation}

Indeed, there exists at least one unitary \(\hat{U}\) (for example, Pauli \(\hat{X}\)) that can perform the desired transformation. In principle, we can find the optimal values for such a simple system analytically, but that would ruin the purpose of this guide, so let’s be diligent and use an optimization routine. In particular, we will look for the optimal \(\alpha\), \(\beta\) and \(\gamma\) iteratively, using the gradient descent optimization algorithm. If you are not familiar with the concepts of cost function and gradient descent, please have a look at the aforementioned qubit rotation tutorial.

An important digression

In this how-to we are interested in running the same circuit on different devices. However, the pipeline used in most PennyLane tutorials somewhat defeats this purpose. In most cases, you will first create a device:

import pennylane as qml

device = qml.device('default.qubit', wires=1)

and then use the qnode() decorator to bind the circuit function to this device:

def circuit(params):

Now, if you want to run the same circuit on another device, you basically have to copy-paste the body of circuit function, give it another name and apply the qnode() decorator again. Of course, this is far from optimal and very error-prone — as is true for any code copy-pasting!

We will stick to another approach: we will first define the circuit, and then use QNode constructor to bind it to different devices and create different QNodes. If you are not yet comfortable with circuits, devices and QNodes, you can refer to the qubit rotation tutorial — it does a great job in explaining these concepts.

A naïve approach

Now that we have agreed on our pipeline, let’s see what a naïve approach to our toy problem might look like!

1. As described above, let’s first define the circuit:

def circuit(params):
    qml.RZ(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RZ(params[2], wires=0)

    return qml.expval(qml.PauliZ(0))

Our cost function for the optimization purposes is the expected value of the \(\langle \hat{Z} \rangle\) observable, so we will be looking for its gradient with respect to the circuit parameters.

2. Then, we can choose a simple software simulator to run the circuit. For now, we want to be honest and we don’t want to compute the quantities exactly — just as if we were running our code on an actual quantum computer. We emulate this by choosing shots = 1000.

default_device = qml.device('default.qubit', 

3. Now, let’s fuse the circuit and device into a QNode, We use the parameter-shift differentiation rule to mimic a real quantum computer).

default_qnode = qml.QNode(circuit, 

4. Finally, it’s time to write the optimisation loop (note that we crudely measure and save the time spent on every iteration to showcase the advantages of different devices1)

import time

def default_optimize(qnode, init_params, step_num, verbose=False):
    params = init_params
    costs = []
    times = []
    optimizer = qml.AdamOptimizer()

    for i in range(step_num):
        start_time = time.time()
        params, cost = optimizer.step_and_cost(lambda params: qnode(params), params)
        times.append(time.time() - start_time)
        if verbose and (i + 1) % 50 == 0:
            print(f"Cost after step {i + 1}: {costs[-1]}")

    return costs, times

Okay, we’ve set everything up; let’s run the optimisation!

from pennylane import numpy as np

init_params = np.random.randn(3)
step_num = 500
costs, times = default_optimize(default_qnode, init_params, step_num, verbose=True)
Cost after step 50: 0.742
Cost after step 100: 0.168
Cost after step 150: -0.37
Cost after step 200: -0.784
Cost after step 250: -0.952
Cost after step 300: -0.994
Cost after step 350: -1.0
Cost after step 400: -1.0
Cost after step 450: -1.0
Cost after step 500: -1.0

Brilliant! The expectation value of \(\hat{Z}\) decreases as the optimization runs and reaches \(-1\), which is what we would expect from a qubit in state \(|1\rangle.\)

Possible improvements

The previous device gave us the desired result, but let’s step back for a moment and think about possible modifications that can make us happier (for example, which would make our code faster or would allow us to converge faster).

1. Exact calculations

Suppose we had another, more complicated circuit (still simulated classicaly) and we wanted to check whether the optimization works at all, i.e. whether it works under the mildest conditions ever — when all the quantities are evaluated exactly. For this, we would need to change our device to:

backprop_device = qml.device('default.qubit', wires=1) # We set shots = None

Perhaps the name of the new device have already hinted to you what the next step will be: since we are relying on brute-force calculations anyway, why bother with the parameter-shift rule? All our computations should form a perfectly valid computational graph, and thus we can leverage the magic of backpropagation. This would save us some time, because we need to run just one forward pass to compute the entire gradient. On the contrary, the parameter-shift rule has to make \(2 N_{\rm param.}\) forward passes (if the magic of backpropagation seems too magical, there is a dedicated backpropagation tutorial, which you are also very much encouraged to check). The corresponding change can be implemented as follows:

backprop_qnode = qml.QNode(circuit, 

2. Interfacing with existing ML packages

Let us now explore another direction for possible improvement. Previously, we used a “golden standard” optimizer — ADAM — but do we have to be limited by it? Suppose you’ve heard from you friend about another ✨shiny✨ optimizer — in our case let’s assume it’s the L-BFGS optimizer — and you are eager to try it. Unfortunately, PennyLane doesn’t have it implemented, but you know that PyTorch does. What can be done? Well, nothing could be simpler. PennyLane developers spend an awful lot of time making sure PennyLane can be seamlessly used with existing ML frameworks. In particular, any QNode can be made transparent to PyTorch and be a meaningful part of its computational graph. The only change to make is to set the interface keyword argument to 'torch':

torch_qnode = qml.QNode(circuit, 

We have to slightly modify our optimization function, but this change is only due to minor differences in PennyLane and PyTorch optimization routines:

import torch

def torch_optimize(qnode, init_params, step_num, verbose=False):
    params = torch.from_numpy(np.copy(init_params))
    params.requires_grad = True

    # In PyTorch, optimizer takes a list optimized tensors a constructor argument
    optimizer = torch.optim.LBFGS([params])
    costs = []
    times = []    
    for i in range(step_num):
        # Also, in PyTorch one has to define a cost closure to feed into L-BFGS 
        start_time = time.time()
        def closure():
            cost = qnode(params)
            if cost.requires_grad:

            return cost.detach().numpy()

        cost = optimizer.step(closure)
        times.append(time.time() - start_time)
        if verbose and (i + 1) % 50 == 0:
            print(f"Cost after step {i + 1}: {costs[-1]}")

    return costs, times

Let’s now run the code:

step_num = 500
costs, times = torch_optimize(torch_qnode, init_params, step_num, verbose=True)
Cost after step 50: -0.9999999999999999
Cost after step 100: -0.9999999999999999
Cost after step 150: -0.9999999999999999
Cost after step 200: -0.9999999999999999
Cost after step 250: -0.9999999999999999
Cost after step 300: -0.9999999999999999
Cost after step 350: -0.9999999999999999
Cost after step 400: -0.9999999999999999
Cost after step 450: -0.9999999999999999
Cost after step 500: -0.9999999999999999

Whoa! One can see that L-BFGSshines✨ indeed; after 50 iterations the value of \(\langle \hat{Z} \rangle\) is already \(-1\), while it takes ADAM roughly seven times longer to get there!

Disclaimer: of course, L-BFGS is not magic, we recorded the cost only in between so-called external iterations, which consist of several internal parameter updates made by L-BFGS. Hence, it would be more fair to compare execution times, which we will do later.

3. Using existing ML packages for tensor operations

We have employed some cool perks from PyTorch, but we left out a huge part part of it: we didn’t use either highly optimised tensor algebra routines or GPUs. Can we somehow benefit from them as well? Sure! In fact, we can ask PennyLane to run the entire computation using only PyTorch primitives, and it can be done this easily:

# Torch tensor arithmetics on CPU
torch_device = qml.device('default.qubit.torch', wires=1)
torch_qnode = qml.QNode(circuit, 

The code above would still run on CPU and the only change to enable GPU execution is to add torch_device='cuda' to the initialization of qml.device:

# Torch tensor arithmetics on GPU
torch_device = qml.device('default.qubit.torch', wires=1, torch_device='cuda')
torch_qnode = qml.QNode(circuit, 

The only thing left to take care of now is to make sure that tensors that enter the torch_qnode are placed on the same device as the QNode itself.

4. Adding some noise

Of course, PennyLane devices are not only aimed at making your life easier. In fact, some of them serve to make your life more realistic. So far we have enjoyed ourselves simulating quantum computers classically and calculating the gradients exactly, but executing the same circuit on a real quantum machine would be nowhere near as pleasant. It would be great to account for that — for example, by incorporating some noise effects in our calculations.

As soon as we start dealing with noise, we can no longer rely on the pure state formalism to describe the evolution of quantum states, and instead we have to switch to the formalism of mixed states. It might sound scary, but as many times before, it requires only a small change to our code: choosing the default.mixed simulator device:

mixed_device = qml.device('default.mixed', wires=1)

That’s it! Now we are free to add some noise to our circuit by slightly modifying it2:

def mixed_circuit(params):
    qml.RZ(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RZ(params[2], wires=0)

    # The noise is added here:
    qml.DepolarizingChannel(0.01, wires=0)

    return qml.expval(qml.PauliZ(0))

And we initialize a new QNode:

# default.mixed device supports backpropagation since only very recently, 
# but let's not be afraid of using new features!
mixed_qnode = qml.QNode(mixed_circuit, device=mixed_device, diff_method='backprop')

Running experiments

We have proposed enough simple modifications, let’s test them all! In our case, every optimization instance is defined by a QNode that’s used to run the calculations and by an optimization function (we have defined two so far: default_optimize and torch_optimize). We will keep these “experiment descriptions” as tuples in a dictionary, so that we can nicely iterate over them:

exp_names = ('default', 'backprop', 'torch', 'mixed')
exp_descriptions = {
    'default': (default_qnode, default_optimize),
    'backprop': (backprop_qnode, default_optimize),
    'torch': (torch_qnode, torch_optimize),
    'mixed': (mixed_qnode, default_optimize),

Now, we just iterate through the defined experiments and store the resulting cost curves as well as the time spent on iterations.

costs = {}
times = {}
step_num = 250
init_params = np.asarray([-1.21499306, -2.19300114, -0.49350792])
for exp_name, (qnode, opt_func) in exp_descriptions.items():    
    costs[exp_name], times[exp_name] = opt_func(qnode,

Finally, let’s plot the results: we plot both “Cost vs. Iteration” and “Cost vs. Time” plots in the same figure.

from matplotlib import pyplot as plt

fig, axes = plt.subplots(1, 2)    
fig.set_figwidth(2 * fig.get_figheight())
ax_idx_to_xlabel = {0: 'Iteration', 1: 'Time, s'}

for ax_idx in (0, 1):
    for exp_name in exp_descriptions:
        if ax_idx == 0:


The resulting curves

Let’s analyze the curves. We can see that they reflect their nature perfectly. The blue default curve corresponds to a “classic” variational optimization based on sampling the quantum circuit — and we can see how it “wiggles”, but still converges closely to a desired value of \(\langle\hat{Z}\rangle = -1\).

The orange backprop curve reflects calculations done exactly, resulting in the Cost–Iteration plot looking like a smoothened version of the default curve. However, if we look at the Cost–Time plot, we can see that it actually takes less time for backprop to converge — just because the gradient can be estimated after one forward pass.

The green torch curve demonstrates the advantage of the L-BFGS algorithm run on a PyTorch device. The Cost–Iteration plot shows that L-BFGS actually converges after a single iteration, and the Cost–Time plot shows that this iteration doesn’t take too long, even though the optimizer internally evaluates the cost function several times.

Finally, the red mixed curve stabilizes at some value \(\langle\hat{Z}\rangle > -1\) because the added noise makes it impossible to prepare the qubit in a perfect \(|1\rangle\) state. Nevertheless, we can see that the curve has still converged, and thus we conclude we have found the set of parameters which brings us to the goal of \(\langle\hat{Z}\rangle = -1\) as closely as possible.

The Device Zoo

Of course, PennyLane supports way more devices than what we have just overviewed, each having its own bells and whistles. If we try to get acquainted with all of them, this how-to will get overloaded. Let’s have a brief look at the available classes of devices and their highlights.

  1. Software simulators: different tensor algebra backends. This class of devices was represented by the default.torch in our how-to and they are ideal if you want to unleash the power of GPUs (or maybe even TPUs) onto your problem. Of course, apart from the PyTorch backend, there are devices interfacing with/running on TensorFlow and Jax. We should also mention the lightning.qubit device, which is a fast qubit simulator that uses C++ in the backend.
  2. Software simulators: different physics. We have already seen one of them (default.mixed), which allowed us to switch to the language of density matrix representation and explore the effects of noise. Similarly, if you are interested in continuous variable quantum computing, you might find the default.gaussian device useful, as it is tailored to simulate photonic systems.
  3. Cloud devices. IBM, IonQ, AWS, Honeywell, Rigetti, … There are so many quantum computing companies on the market providing cloud access to their NISQ devices (or sometimes high-performance software simulators). Why not trying one of them? If you are eager to try your code on an actual quantum computer, just install a plugin and use them almost out-of-the-box! More information can be found on the PennyLane Plugins and ecosystem page. Note that you need to write a PennyLane plugin whenever you want to interface with a new device, so when we say “device” we mean “plugin” and vice-versa.

Final remarks

To sum up, you might find the following tentative workflow useful:

  1. Start easy and use default.qubit with shots=None and backpropagation to prototype your algorithm.
  2. Get more realistic by setting shots to a finite number and/or using default.mixed.
  3. Make your code fly by using high-performance software simulators (possibly using GPUs and/or TPUs).
  4. If you are more interested in exploring the behaviour of real devices, just go for them!

Congratulations on completing this how-to, we hope you can now easily navigate through the PennyLane Device Zoo!

Aleksei Malyshev

Aleksei Malyshev

Aleksei is a DPhil student at the University of Oxford. He does research at the intersection of deep learning and quantum physics. In particular, he studies how to make neural networks represent quantum states of interest and how to extract useful information out of this representation. Apart from that, he is not that scared of the quantum winter because he is really keen to see what quantum Santa is like.

  1. Please note that measuring execution times with time.time() is very far from optimal. For example, if your operating system switched to another process amidst running your Python script (which it does quite often — otherwise it wouldn’t be multiprocessing), then the measured time might be much larger than the “pure” execution time. Nevertheless, using time.time() is good enough for the purpose of this how-to.
  2. Here we choose the depolarizing channel model of noise, but you can learn more about other noise models in the demo about noisy circuits. (Have you already noticed that PennyLane has demos, tutorials and how-tos on many basic yet important topics?)

Tags: device, qml, pytorch
Last modified: 00:01, 10 Jan 2023