How to parallelize QNode execution

Brian Doolittle (Xanadu resident)

Currently, only a handful of quantum computers are available for public use. The rising interest in quantum computing has resulted in long wait times for these public devices. When you run a job on a quantum computer through the cloud, most of the time can be spent waiting in a queue for the job to execute. More specifically, your local machine (e.g., a laptop) dispatches a job to a remote quantum device via a synchronous web request to an external API. While waiting for a response, your laptop idles, performing no useful computation.

In hybrid computing architectures, it is often desirable to collect data from many different quantum circuits. Naively, PennyLane would execute these circuits in series. That is, the first job is dispatched to the remote machine while your laptop idles. Once the first job finishes, the second job is dispatched. This process continues until all jobs have been executed on the remote quantum device. Clearly, single-threaded local execution of quantum circuits is very inefficient if the remote device is able to run parallel jobs. The solution is parallelization through multi-threading!

Multi-threading allows multiple web requests to be in flight at once. Only one thread is processed at a time; however, while one thread idles waiting for a response, another can run. In this manner, the parallel capacity of remote quantum devices can be used to dramatically reduce the time required to execute all of the circuits. PennyLane offers the ability to parallelize circuit execution across remote quantum devices using the QNodeCollection class. Under the hood, PennyLane uses Dask to multi-thread the web requests made to external quantum devices.
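To make the idea concrete, here is a minimal sketch of how threads overlap idle waiting time. It uses Python's standard concurrent.futures module rather than Dask, and fake_remote_job is a hypothetical stand-in for a blocking web request; it is not how PennyLane implements parallel execution internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_remote_job(job_id, wait=0.2):
    """Hypothetical stand-in for a blocking web request to a remote device."""
    time.sleep(wait)  # the thread idles here, like a laptop waiting on a queue
    return job_id * 2


def run_series(job_ids):
    # Each job is dispatched only after the previous one has returned.
    return [fake_remote_job(j) for j in job_ids]


def run_threaded(job_ids):
    # One thread per job: all requests are waiting at the same time.
    with ThreadPoolExecutor(max_workers=len(job_ids)) as pool:
        return list(pool.map(fake_remote_job, job_ids))


jobs = [0, 1, 2]

start = time.perf_counter()
series_results = run_series(jobs)
series_time = time.perf_counter() - start

start = time.perf_counter()
threaded_results = run_threaded(jobs)
threaded_time = time.perf_counter() - start

print(f"series:   {series_time:.2f} s")
print(f"threaded: {threaded_time:.2f} s")
```

Because each simulated job spends its time sleeping rather than computing, the threaded run should finish in roughly the time of a single job, while the series run takes about three times as long.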

In this PennyLane how-to guide, we will demonstrate how to parallelize circuit execution across IBM Quantum devices using the simple example of qubit state tomography. We will show how to construct a QNodeCollection and how to parallelize its execution using the parallel=True flag.

Setup for Parallel QNode Execution

Required Python Libraries

In addition to PennyLane, you’ll need the following libraries for this how-to guide:

  • Dask - a flexible library for parallel computing in Python.
  • Qiskit - an open-source SDK for working with IBM’s quantum computers.
  • PennyLane-Qiskit Plugin - integrates the Qiskit SDK with PennyLane’s hybrid computing platform.

Install dask with:

$ pip install "dask[delayed]"

Installation instructions for qiskit and the pennylane-qiskit plugin can be found on the Qiskit getting started page and the PennyLane Installation page, respectively.

IBM Q Account Integration

An IBM Q account and secret API token are needed to execute quantum circuits on IBM’s hardware and remote simulators. To obtain your IBM Q API token, follow these steps:

  1. Sign in or create an IBM Quantum account at
  2. The secret IBM Q API token can be copied from the welcome page of your profile.

Then use the IBM Q API token to get a provider context through qiskit.

from qiskit import IBMQ

token = "XYZ"   # secret IBM Q API token from above
provider = IBMQ.enable_account(token)

The Qiskit provider instance will give PennyLane access to the qiskit.ibmq devices.


By default, a qiskit.ibmq device will attempt to use an already active or stored IBM Q account. If one exists, then you can configure the device without the provider argument. Details on how to configure PennyLane to use your IBM Q account are found here.

Example: Parallelized Quantum State Tomography

Quantum state tomography is a procedure where an unknown quantum state is characterized by repeatedly measuring it. This requires a quantum state to be prepared many times and measured in a number of non-commuting bases. Since each measurement basis is run as a different quantum circuit, quantum state tomography is a simple example by which to demonstrate the advantage of parallelization. In this how-to guide, we won’t worry about the details of state tomography. Instead, we’ll show how to parallelize qubit state tomography circuits. To begin, we first import the dependencies.

import pennylane as qml
import pennylane.numpy as np

The first step for parallel circuit execution in PennyLane is to define a template circuit. For qubit state tomography, the circuit template simply prepares an arbitrary qubit state using qml.Rot().

# template circuit for qubit preparation
def qubit_rotation_circuit(params, wires):
    theta, phi, omega = params
    qml.Rot(theta, phi, omega, wires=wires)

The prepared qubit state is then measured in each of the Pauli bases. We express these measurements as a list of Pauli observables.

pauli_observables = [
    qml.PauliX(0),
    qml.PauliY(0),
    qml.PauliZ(0)
]

Before constructing the tomography circuits, we need to set up an IBM Q device.

ibm_dev = qml.device('qiskit.ibmq', wires=1,
    backend='ibmq_qasm_simulator', provider=provider)

The backend is chosen as the 'ibmq_qasm_simulator' because this hardware simulator has relatively short wait times and can process incoming web requests in parallel. Furthermore, the qubit circuit needs only one wire, and the provider argument is the one configured in the IBM Q Account Integration section.

It remains to construct the three parallelizable circuits for our qubit state tomography. This is easily done using PennyLane’s map() function, which constructs a QNodeCollection.

ibm_qnodes = qml.map(
    qubit_rotation_circuit,  # template circuit
    pauli_observables,       # measurement bases
    ibm_dev,                 # IBM simulator
    measure="expval"         # compute expectation values
)

Here, ibm_qnodes is an instance of a QNodeCollection. It can be understood as a list of three QNode instances which prepare the same qubit, but measure in different Pauli bases. The benefit of the QNodeCollection is that the contained QNode instances can be executed in parallel. The measure="expval" argument declares that the expectation value of each Pauli observable will be computed.

Parallel vs. Series Execution

To compare parallel and series execution we will use the following angle parameters to prepare the “unknown” qubit state.

# arguments for qml.Rot()
theta = np.pi/2
phi = np.pi/3
omega = np.pi/4
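As a sanity check for the results, the ideal (noise-free, infinite-shot) expectation values for these angles can be worked out by hand. The sketch below uses only the Python standard library and assumes the decomposition documented for qml.Rot, namely Rot(a, b, c) = RZ(c) RY(b) RZ(a) applied to the |0> state. The helper names (matmul, rz, ry, rot, expval) are illustrative and not part of PennyLane.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rz(angle):
    return [[cmath.exp(-0.5j * angle), 0], [0, cmath.exp(0.5j * angle)]]


def ry(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def rot(a, b, c):
    """Assumed decomposition of qml.Rot(a, b, c): RZ(c) RY(b) RZ(a)."""
    return matmul(rz(c), matmul(ry(b), rz(a)))


def expval(obs, state):
    """Expectation value <state|obs|state> for a single-qubit state."""
    o_psi = [sum(obs[i][k] * state[k] for k in range(2)) for i in range(2)]
    return sum(state[i].conjugate() * o_psi[i] for i in range(2)).real


# Same positional arguments as passed to qml.Rot() in the guide
theta, phi, omega = math.pi / 2, math.pi / 3, math.pi / 4

unitary = rot(theta, phi, omega)
state = [unitary[0][0], unitary[1][0]]  # first column = action on |0>

paulis = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}
ideal_expvals = {name: expval(op, state) for name, op in paulis.items()}
print(ideal_expvals)
```

Under this assumption, the second and third arguments set the polar and azimuthal angles of the Bloch vector, so the ideal values are sin(π/3)cos(π/4) ≈ 0.61 for X and Y and cos(π/3) = 0.5 for Z. Results from a finite number of shots will fluctuate around these numbers.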

First, we will execute the ibm_qnodes in series. Note that the IPython %time magic is used to measure the execution time.

>>> import time
>>> %time ibm_pauli_expvals = ibm_qnodes([theta, phi, omega])
CPU times: user 181 ms, sys: 19.9 ms, total: 201 ms
Wall time: 38.6 s

Now, we will compute the same expectation values in parallel.

>>> %time ibm_pauli_expvals = ibm_qnodes([theta, phi, omega], parallel=True)
CPU times: user 192 ms, sys: 19.5 ms, total: 211 ms
Wall time: 12.9 s

Wow, there was a 3x speedup in wall time when using the parallel=True flag! The speedup comes from the multi-threading provided by dask behind the scenes. Multi-threading allows PennyLane to send three web requests to the IBM Q simulator at once. As a result, it takes roughly one third of the time to execute the QNodeCollection.
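The %time magic is only available in IPython and Jupyter. In a plain Python script, a similar wall-time measurement can be sketched with time.perf_counter. The timed helper and fake_collection below are hypothetical names for illustration; fake_collection merely mimics the call signature of a QNodeCollection and does not contact any device.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_collection(params, parallel=False):
    """Hypothetical stand-in for ibm_qnodes(params, parallel=...)."""
    time.sleep(0.05)  # pretend to wait on a remote device
    return [0.0 for _ in range(3)]


result, elapsed = timed(fake_collection, [0.1, 0.2, 0.3], parallel=True)
print(f"wall time: {elapsed:.3f} s")
```

With the real collection, the call would be timed(ibm_qnodes, [theta, phi, omega], parallel=True).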


If dask is not installed, executing a QNodeCollection with the parallel=True flag raises an ImportError.
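One way to avoid this error is to check for dask before requesting parallel execution and fall back to series execution otherwise. This is a minimal sketch using the standard library; dask_available is a hypothetical helper name, and the commented-out call assumes the ibm_qnodes collection and angles defined earlier.

```python
import importlib.util


def dask_available():
    """Return True if the dask package can be imported in this environment."""
    return importlib.util.find_spec("dask") is not None


use_parallel = dask_available()
print(f"parallel execution enabled: {use_parallel}")

# With the collection from this guide, the flag would be passed as:
# ibm_pauli_expvals = ibm_qnodes([theta, phi, omega], parallel=use_parallel)
```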


Parallelizing jobs across remote quantum devices is an important step towards distributed hybrid computing. In this tutorial, we demonstrated how simple circuits can have dramatic speedups when parallelized across remote devices. In PennyLane, the general approach is simple:

  1. Construct a QNodeCollection.
  2. Execute the QNodeCollection with the parallel=True flag.

It is important to understand the limitations of parallelization in PennyLane. Currently, parallel processing using built-in simulator devices is not supported. One setting where parallelization does work is when web requests are being made to remote devices, provided the device can handle receiving and processing those requests in parallel. For example, if the dispatched jobs have to wait in the same queue for execution on the same device, parallelization won’t help.

We look forward to the future developments of the PennyLane dev team and general progress towards distributed hybrid computing architectures. Until then, happy hacking!