January 10, 2025
A brief history of optimizing T-gate counts with the op-T-mize dataset
Non-Clifford gates are widely considered the main source of headache in fault-tolerant quantum computing, due to the non-triviality of their implementation. Clifford gates are often implemented via transversal operations, which are considered to have negligible cost compared to their non-Clifford counterparts. Non-Clifford gates, on the other hand, require expensive protocols like magic state distillation and injection. When it comes to calculating the non-Clifford costs of a quantum circuit, T-gate counts have emerged as the metric to consider since many quantum error correcting codes use Clifford + T as their gate set for universal quantum computing.
Note that in NISQ (noisy intermediate-scale quantum) settings this is different as single qubit gates like Pauli rotations or T-gates are cheap, and two-qubit gates like \text{CNOT} are expensive in terms of time and error induced. We are not talking about NISQ settings here.
Even though recent work suggests that T-gates can be made cheaper such that minimizing Clifford gates becomes relevant as well, there has been a tremendous amount of work and progress in optimizing T-gate counts in quantum compilation. This has in part been driven by the availability of standard benchmark circuits that everyone can compare their methods against. To make these benchmark circuits even easier to access, we've added them to PennyLane. And because this benchmark dataset is never named, we took the liberty to call it the op-T-mize dataset.
In this blog post, we want to give a brief overview of the state of affairs in T-gate optimization and highlight some cornerstone papers as well as some of the latest developments (with no expectation of completeness). All mentioned papers below used the benchmark circuits in the op-T-mize dataset (with some slight variations).
Contents
A brief history of T-gate optimization
The T-gate optimization literature can roughly be divided in two schools: using phase polynomials [1, 2, 3, 4] and ZX-calculus [5, 6, 7] as intermediate representations of quantum circuits. In both cases, the assumed input is a (Clifford + T) circuit and the goal is to reduce the T-gate count as much as possible. Each representation has its Achilles' heel and it has been found that they are particularly strong in combination as they complement each other's deficits (to some degree at least).
Phase polynomials
Phase polynomials describe the action of gates composed of (\text{CNOT} + T) gates (in particular, no Hadamard gates inducing superposition). Technically, the action is described by a phase polynomial in combination with a so-called gate synthesis matrix. Because we usually mostly care about the phase polynomial, this representation is often just refered to as such in the literature. For example, we can write the action of \text{CNOT} as
We can convince ourselves that this indeed yields |0, 0\rangle \mapsto |0, 0\rangle, |0, 1\rangle \mapsto |0, 1\rangle, |1, 0\rangle \mapsto |1, 1\rangle, |1, 1\rangle \mapsto |1, 0\rangle.
Further, a T-gate acts on a single qubit state as
We can now compose the action of T and \text{CNOT} in this phase polynomial description and readily see that
To be mathematically compact, we can generally write |x_1, x_2, .. \rangle as a binary vector \boldsymbol{x} \in \mathbb{Z}_2^{n} for n qubits. A (\text{CNOT} + T) circuit U can then be compactly described in terms of its phase polynomial \phi and gate synthesis matrix A by
The gate synthesis matrix can be described by a \text{CNOT} circuit, which in the framework of optimizing solely T-gates is not of concern. So we mainly care about the phase polynomial part, e^{i \phi(\boldsymbol{x})}.
The ground work for these techniques has been laid in [1], with [2] giving additional improvements and an efficient C++ implementation, making it a de facto standard method (in particular, the third order duplicate and destroy algorithm — TODD).
The first crux of these methods is to split a (Clifford + T) circuit into a leading (\text{CNOT} + T) part and a remaining Clifford part (see figure below). This is always possible at the cost of introducing some auxiliary qubits. One can then further separate just the non-Clifford phase part U_f. The second crux is to read out a specific symmetric rank-3 tensor \mathcal{T} that describes the first part of the circuit, U_f, containing only non-Clifford gates. In [1] it was shown that optimizing the T gate count then reduces to finding a low-rank representation of \mathcal{T}, and there are different strategies and heuristics to achieve that.
The op-T-mize benchmark results from the original work (Polynomial-time T-depth Optimization of Clifford+T circuits via Matroid Partitioning [1]) have been improved upon by An Efficient Quantum Compiler that reduces T count [2] and several other works. Most recently, reinforcement learning [3] has been used to find good low-rank representations and cleverly inserting sub-circuits further reducing the T-gate count at the cost of introducing auxiliary qubits (so-called "gadgets"). The TODD algorithm has recently been further improved in terms of run-time and gate counts on op-T-mize circuits [4].
The Achilles' heel of these methods is the fact that phase polynomial descriptions cannot handle (too many) Hadamard gates — or superposition, for that matter. Formally this can be done in U |\boldsymbol{x}\rangle = \sum_j c_j e^{i \phi^j(\boldsymbol{x})} |A^j \boldsymbol{x}\rangle but there are exponentially many j for full descriptions. The required separation of input circuits into (\text{CNOT} + T) and Clifford parts leads to an overhead in Clifford gates and controlled operations.
ZX-calculus
ZX-calculus is a powerful graphical language to represent and reason about linear operators on qubits. You can check our introduction to the ZX-calculus for more details on the general framework. There are a lot of rewrite rules in ZX-calculus that allow one to manipulate operators. As such, it is well-suited for optimizing circuits and, in particular, reducing T-gate counts.
There is, however, the caveat that after manipulating a circuit in its ZX-diagram, there is no guarantee that a circuit can be extracted efficiently again. Hence there need to be extra rules on what manipulations are allowed and which ones aren't. The fundamental work for circuit optimization and, in particular, T-gate optimization has been laid in [5, 6, 7], where the authors introduce a canonical form of a circuit and a set of rewrite rules that guarantee efficient extraction of a valid circuit representation again.
These methods are readily available with Python (PyZX) and Rust (QuiZX) implementations on the ZX-calculus website. An example of how a circuit in form of a ZX-diagram can be optimized to reduce T-gate count can be seen in the figure below, taken from [5].
These methods can, in principle, handle any type of gate in (Clifford + T) circuits and, in particular, don't suffer from superposition induced by Hadamards. Their main caveat is the extraction procedure to retrieve a valid quantum circuit, which is only efficient when specific rewrite rules are applied, thereby limiting the possibilities for optimization.
It has been shown that a combination of phase polynomial and ZX-calculus techniques can lead to better results than each method on its own. This makes sense as they nicely complement each other's Achilles' heels. For example, see the last column in table 1 with the op-T-mize circuits in [5], where the author's ZX-based method is combined with TODD.
Op-T-mize dataset
Throughout the aforementioned papers, the authors use a specific set of logical circuits as the benchmark for their method in optimizing T-gate counts. In particular, you will find virtually the same table of circuits with slight variations in all papers. Having clearly defined and comparable benchmarks is a great driver for scientific progress. We want to contribute to that by making these circuits readily available in PennyLane Datasets. Because there is no commonly used name for these benchmark circuits (in fact, they are not named at all), we propose to call it the op-T-mize dataset due to its historic prevalence in T-gate optimization literature. Of course, these circuits can also be used to benchmark other metrics like \text{CNOT} count, circuit depth or overall gate count.
Literature
[1] Matthew Amy, Dmitri Maslov, Michele Mosca "Polynomial-time T-depth Optimization of Clifford+T circuits via Matroid Partitioning" arXiv:1303.2042, 2013.
[2] Luke Heyfron, Earl T. Campbell "An Efficient Quantum Compiler that reduces T count" arXiv:1712.01557, 2017.
[3] Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch et al "Quantum Circuit Optimization with AlphaTensor" arXiv:2402.14396, 2024.
[4] Vivien Vandaele "Lower T-count with faster algorithms" arXiv:2407.08695, 2024.
[5] Aleks Kissinger, John van de Wetering "Reducing T-count with the ZX-calculus" arXiv:1903.10477, 2019.
[6] Ross Duncan, Aleks Kissinger, Simon Perdrix, John van de Wetering "Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus" arXiv:1902.03178, 2019.
[7] Miriam Backens, Hector Miller-Bakewell, Giovanni de Felice, Leo Lobski, John van de Wetering "There and back again: A circuit extraction tale" arXiv:2003.01664, 2020.
About the author
Korbinian Kottmann
Quantum simulation & open source software