The following is a guest post by Matteo Paltenghi, a researcher and PhD candidate at the University of Stuttgart, discussing their recent research on testing quantum software platforms — including PennyLane — for both non-quantum and quantum-specific bugs.
Quantum computing is a rapidly evolving field that promises to revolutionize many domains such as cryptography, simulation, optimization and even machine learning. Since its inception, a lot of effort has been directed towards creating a reliable hardware platform to implement the fundamental quantum computing primitives such as qubits and the quantum gates between them. Making the computation on the hardware reliable and keeping the effect of noise to a minimum are certainly important milestones in unlocking the potential of quantum computing, but less emphasis is usually put on the reliability of the software stack. Indeed, buggy software can easily lead to useless results, just as much as a very noisy quantum computer can.
As a researcher at the intersection of software engineering and quantum computing, my work focuses on increasing the reliability of the quantum computing software stack. By addressing the intricacies of quantum software testing, we aim to ensure that the potential of quantum computing is realized to its fullest extent, offering robust solutions for the challenges ahead.
The quantum computing stack
The quantum computing stack can be divided into three parts:
- Quantum algorithms: the computational steps to solve a particular problem; this can be described as a program or as a specification of quantum gates to be applied.
- Quantum computing platforms: the software we use to convert our algorithm into something the machine can understand and execute; this includes components such as a conversion step to make the program compatible with the target hardware, as well as optimization passes.
- Quantum computer hardware: a specific underlying hardware technology used to run machine code and produce an output, such as superconducting qubits or photonic quantum computing.
Reliable quantum computing platforms are essentially the link between promising and novel ideas, described as algorithms, and their implementation and execution on real hardware that could provide an actual quantum speedup. Because of this pivotal role, the reliability of the quantum computing platform is crucial for the successful development of the field.
- Quantum software testing := the activity of testing the software used to convert quantum computations into machine code executable on a quantum computer. Broadly, it includes testing both quantum programs and algorithms as well as the platforms themselves; in this blog post we will focus on testing quantum software platforms.
Challenges in quantum software testing
Two popular examples of quantum computing platforms are PennyLane by Xanadu and Qiskit by IBM Quantum, but there are many more, each with a different focus, like Cirq by Google, the Quantum Development Kit by Microsoft, or TKET by Quantinuum. They all share some commonalities and challenges; importantly, they include pieces of software that an average programmer, not trained in quantum computing, would not know how to code correctly. This inherent difficulty, and the domain-specific knowledge required of the developers of a quantum computing platform, is one of the root causes for the existence of a new class of bugs that we call quantum-specific.
- Quantum-specific bugs := bugs in the software stack that require quantum computing knowledge to be spotted and fixed.
We can see two examples below; first we have a generic Python bug that could be spotted by any average Python programmer…
…and the following bug encompasses a mathematical formula that only a quantum developer would be able to spot as incorrect (the incorrect code fails to distinguish between two quantum-specific concepts, PauliTerm and PauliSum, when checking if term implements an identity function on the quantum state).
The existence of quantum-specific bugs is one of the main challenges faced by quantum software testing; it is a direct consequence of the fast-paced environment of quantum computing. Indeed, the code of quantum software platforms evolves really quickly, making it hard for quantum software testing to keep up with the implementation of new features.
Bugs in quantum computing platforms
The first step to tackle a new problem like quantum software testing is to better understand it. To do so, we ran an empirical study including 18 open source quantum computing platforms: PennyLane, ProjectQ, OpenQL, Qiskit Aer, Qiskit Ignis, Qiskit Terra, Tequila, Braket, dwave-system, XACC, QDK Q# Libraries, QDK Q# Compiler, QDK Q# Runtime, Cirq, Qulacs, PyQuil, Mitiq, and StrawberryFields.
We tried to understand the kinds of bugs that occur in this software, find out how frequent quantum-specific bugs are, and uncover some key observations that could help us better test these types of platforms. We inspected the GitHub history of these platforms looking for bug-fixing commits, and catalogued 223 bugs across the different platforms.
The key insights we found are:
- A significant portion (40%) of bugs are quantum-specific, motivating further development of dedicated testing approaches to target them.
- Among the modules within quantum platforms most vulnerable to quantum-specific bugs are core components, including those defining essential quantum abstractions such as Python classes for qubits or quantum circuits. This presents an opportunity for bug hunters like us, as exposing these bugs merely requires providing example quantum computations to the platform and executing them on a simulator. Thus, there's no necessity for fully error-corrected quantum computers to reliably identify bugs.
- A large portion of those bugs could be detected because the platform crashed when fed specific valid programs. This also means that we can find bugs relatively inexpensively, without comparing the output distribution a program is expected to produce against the one we actually get, which is a notoriously challenging task.
In the picture below we show the main components of a quantum computing platform and the number of traditional and quantum-specific bugs in them.
The list below shows the definitions and descriptions of the components we identified while inspecting the bugs.
- Quantum abstractions: Components that provide quantum programming constructs, such as qubits, gates, and circuits.
- Classical abstractions: They provide high-level constructs for classical programming tasks, e.g. data structures and sorting algorithms.
- Domain-specific abstractions: Some platforms offer specialized abstractions for expressing quantum algorithms in specific domains (e.g. chemistry).
- Intermediate representations: Code for creating and manipulating in-memory representations of quantum programs.
- Optimizations: Quantum compilers perform various optimizations to improve program efficiency, such as reducing circuit depth.
- Machine code generation: Component that translates high-level quantum programs into low-level instructions (e.g. QASM).
- Interface to quantum computer: Manages the communication with a backend that represents a real computer.
- Simulator: A software-based simulation environment that mimics quantum operations.
- Quantum state evaluation: Evaluates the state of a quantum program, e.g., by measuring qubits after computation.
- Testing and visualization: Components for testing quantum programs and visualizing results.
- Scripts and glue code: They facilitate communication and coordination between different components in the platform.
To find out more, take a look at our paper.
MorphQ: Metamorphic testing for quantum computing platforms
Given the large portion of quantum-specific bugs in these platforms, we need testing techniques that test the properties of the quantum programs manipulated and generated by those platforms. A recurring challenge in software engineering, known as the oracle problem, comes into play here. An oracle is a hypothetical entity that reliably provides the correct output for a given input query, essentially serving as a benchmark for assessing whether the system under test (here, a quantum computing platform) behaves as intended according to its specifications.
- Oracle problem := the challenge of distinguishing the correct behavior of a system from the incorrect one, given an input to the system. It is a fundamental problem in software testing, as it requires a mechanism or a procedure to determine whether a test has passed or failed.
Let’s imagine we write a quantum program with PennyLane and then we convert it to its underlying QASM (quantum assembly) representation to send it to an IBM quantum computer. How can we ensure that the given QASM representation is exactly equivalent to our program?
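For instance, the conversion itself might look like the following minimal sketch, assuming a PennyLane version where QuantumTape.to_openqasm is available (the exact API may differ across releases):

```python
import pennylane as qml
from pennylane.tape import QuantumTape

# Build a small quantum program as a tape (a Bell-state preparation).
ops = [qml.Hadamard(wires=0), qml.CNOT(wires=[0, 1])]
measurements = [qml.expval(qml.PauliZ(0))]
tape = QuantumTape(ops, measurements)

# Serialize it to an OpenQASM 2.0 string that a backend could consume.
qasm_code = tape.to_openqasm()
print(qasm_code)
# Is this QASM really equivalent to the tape we started from? That is
# exactly the oracle problem: we have no ground truth to compare against.
```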
An intriguing approach from software testing to tackle this challenge is to identify properties that specific pairs of programs compiled by our system should exhibit and then verify whether those properties hold true. This framework is called metamorphic testing: it consists of repeatedly generating an initial quantum program, called a source program, and then applying a transformation to the source code of that program to generate a new program, called a follow-up program, that has a known relationship with the source program.
An overview of the approach our team took, called MorphQ, is shown in the figure above. The name 'MorphQ' draws inspiration from its core concept of metamorphic testing (Morph-) and its application to quantum software (-Q). First, in stage 1, we generate a quantum program, after which we apply transformations (e.g. adding two canceling X gates or changing the settings of the execution environment) in stage 2, and then we run the program and check the output relationships in stage 3.
Let’s look at a concrete example. We start from a random circuit, like the one in the figure below…
…then reorder the qubit wires and generate another program that has the exact same gates but has them applied to different qubit indices, e.g. the gates acting on qubit 2 are consistently acting on qubit 1:
This approach then executes both programs and checks whether the two outputs are in the expected relationship. In this case, the outputs should be the same once we reorder the output bit strings of the second program using the inverse mapping, i.e., every bit at position 1 is moved back to position 2 in all the output bit strings.
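Here is a minimal sketch of this metamorphic relation, assuming Qiskit and qiskit-aer are installed; the permutation and helper below are ours for illustration, not MorphQ's actual generator:

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

perm = {0: 2, 1: 0, 2: 1}  # source qubit -> follow-up qubit

source = QuantumCircuit(3, 3)
source.h(0)
source.cx(0, 1)
source.x(2)
source.measure(range(3), range(3))

# Follow-up program: the exact same gates, applied to permuted qubit indices.
follow_up = QuantumCircuit(3, 3)
follow_up.h(perm[0])
follow_up.cx(perm[0], perm[1])
follow_up.x(perm[2])
follow_up.measure(range(3), range(3))

sim = AerSimulator()
counts_src = sim.run(transpile(source, sim), shots=4096).result().get_counts()
counts_fup = sim.run(transpile(follow_up, sim), shots=4096).result().get_counts()

def unpermute(bitstring, perm):
    # Qiskit bit strings are little-endian: the i-th character from the
    # right is qubit i. Undo the permutation so that bit q of the restored
    # string comes from follow-up bit perm[q].
    bits = bitstring[::-1]
    return "".join(bits[perm[q]] for q in range(len(bits)))[::-1]

remapped = {}
for key, n in counts_fup.items():
    new_key = unpermute(key, perm)
    remapped[new_key] = remapped.get(new_key, 0) + n

# Metamorphic oracle: the two count dictionaries should be statistically close.
print(counts_src)
print(remapped)
```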
A more advanced transformation is shown in the figure below, where the follow-up circuit is obtained by partitioning the original circuit into two subcircuits. They are run separately, and then the two output distributions are combined with a Cartesian product to obtain another distribution, which should be equivalent to the first one.
Regardless of the specific transformation, in both cases we expect the two circuits to lead to the same output in our application of the metamorphic circuit, because we focus on the so-called semantics-preserving transformations, namely transformations that maintain the same meaning of the program.
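As a rough sketch of how such a combination could work, assuming the partition cuts the circuit into two non-interacting groups of qubits (the helper name below is hypothetical):

```python
from itertools import product

def combine_distributions(counts_a, counts_b):
    """Combine the output counts of two independently executed subcircuits
    into the distribution of the full circuit via a Cartesian product."""
    shots_a, shots_b = sum(counts_a.values()), sum(counts_b.values())
    combined = {}
    for (bits_a, n_a), (bits_b, n_b) in product(counts_a.items(), counts_b.items()):
        # The two partitions are independent, so probabilities multiply.
        # NOTE: the concatenation order depends on the qubit-index convention.
        combined[bits_b + bits_a] = (n_a / shots_a) * (n_b / shots_b)
    return combined

# Example: partition {q0} runs H (about 50/50), partition {q1} runs X (always 1).
print(combine_distributions({"0": 512, "1": 512}, {"1": 1024}))
# {'10': 0.5, '11': 0.5}
```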
Thus, by executing both and comparing the distributions, we are able to find real bugs. We can also take a closer look at two examples of bugs found when running the program using Qiskit:
This first bug is triggered by a combination of two metamorphic transformations: Change of optimization level and Inject null-effect operations. The source is a main circuit with eleven qubits containing a subcircuit with ten qubits, which is optimized with an optimization pass of level 2. The transpilation of this program triggers a generic NumPy error message (shown in the comment). The bug is in a specific analysis part of the optimization, called CommutationAnalysis, which fails due to matrix multiplications with dimensions exceeding the maximum supported by NumPy.
The second example shows a bug that is triggered by the conversion between quantum circuits and QASM. Specifically, when the subcircuit with a classical register is converted to QASM and then back to a quantum circuit, it results in an error due to parsing invalid QASM code. The issue lies in the QASM exporter, which fails to handle subcircuit declarations with classical registers correctly, leading to the reported error.
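The underlying property is a round trip that should never fail. A hedged sketch, assuming a recent Qiskit version where the qiskit.qasm2 module is available:

```python
from qiskit import QuantumCircuit, qasm2

# Build a circuit with a classical register, the key ingredient of the bug above.
qc = QuantumCircuit(2, 2)
qc.h(0)
qc.cx(0, 1)
qc.measure([0, 1], [0, 1])

text = qasm2.dumps(qc)  # circuit -> OpenQASM 2.0 source

# Round trip: parsing the exporter's own output should never fail.
# Under a crash-based oracle, an exception here flags a potential bug.
try:
    round_tripped = qasm2.loads(text)
except qasm2.QASM2ParseError as err:
    print("potential exporter bug:", err)
```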
What makes MorphQ a practical approach?
Here is a note on the practicality of our testing approach.
Building on the insight of our previous study of bugs in quantum computing platforms, we decided to do the following when performing metamorphic testing using MorphQ:
- Run the source and follow-up programs on simulators, because running on real, noisy hardware could lead us to believe that there is a bug in the software when the problem is instead in the hardware. Moreover, our study of bugs revealed how many bugs live in the core abstractions of the platforms, which are exercised in the same way regardless of the target backend.
- Check for a more reliable lead-to-crash output relationship instead of comparing output distributions: due to the probabilistic nature of quantum computation, there is always a chance that two distributions differ just by chance and not because of a bug. As a consequence, MorphQ flags as a bug any case where one of the two programs crashes while the other runs successfully, as in the sketch below. Moreover, most of the bugs studied in our previous work were detected via crashes, showing that this is a promising signal.
Similarly, prior work with QDiff has shown that comparing output distributions to test a quantum computing platform is extremely brittle, especially when using real quantum computers. This reinforced our choice to run on simulators.
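To illustrate the crash-based oracle, here is a minimal, hypothetical sketch; the helper names are ours, and MorphQ's real implementation generates the programs and handles many more cases:

```python
def crash_oracle(run_source, run_follow_up):
    """Flag a bug when exactly one of two semantically equivalent programs
    crashes. `run_source` and `run_follow_up` are callables that execute
    each program on a simulator."""
    def crashed(run):
        try:
            run()
            return None
        except Exception as err:  # any platform-side failure
            return err

    err_src, err_fup = crashed(run_source), crashed(run_follow_up)
    if (err_src is None) != (err_fup is None):
        # One program crashed while the other did not: likely a platform bug,
        # since both programs are valid and semantically equivalent.
        return err_src or err_fup
    return None  # both ran (or both crashed): no divergence to report
```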
Conclusion
In conclusion, our exploration into quantum software testing unveils critical insights into the reliability and robustness of quantum computing platforms. By delving into the intricacies of quantum-specific bugs and leveraging metamorphic testing techniques, we've highlighted the importance of addressing these challenges to ensure the advancement and adoption of quantum technologies.
For you as a quantum developer or researcher, this means recognizing the significance of thorough testing methodologies, being cognizant of the potential pitfalls inherent in quantum software development, and knowing which components are the most bug-prone. Moving forward, researchers and developers alike can benefit from our findings by applying similar testing principles to their preferred quantum computing platform, whether it be PennyLane, Qiskit, or another. By embracing the methodologies outlined in our MorphQ paper and GitHub repository, you can already concretely contribute to the enhancement and refinement of quantum software testing practices, ultimately propelling the field (and your own projects) towards greater reliability and innovation.
About the author
Matteo Paltenghi
Hey there! I'm Matteo Paltenghi, a quantum computing researcher based in Stuttgart, Germany. I'm all about ensuring quantum software is rock-solid through rigorous testing methods. I've also dabbled in the intersection of software engineering and machine learning.