January 10, 2025

A brief history of optimizing T-gate counts with the op-T-mize dataset

Korbinian Kottmann

op-T-mize PennyLane dataset thumbnail image

Non-Clifford gates are widely considered the main source of headache in fault-tolerant quantum computing, due to the non-triviality of their implementation. Clifford gates are often implemented via transversal operations, which are considered to have negligible cost compared to their non-Clifford counterparts. Non-Clifford gates, on the other hand, require expensive protocols like magic state distillation and injection. When it comes to calculating the non-Clifford costs of a quantum circuit, $T$ -gate counts have emerged as the metric to consider since many quantum error correcting codes use Clifford + $T$ as their gate set for universal quantum computing.

Note that in NISQ (noisy intermediate-scale quantum) settings this is different as single qubit gates like Pauli rotations or $T$ -gates are cheap, and two-qubit gates like $\text{CNOT}$ are expensive in terms of time and error induced. We are not talking about NISQ settings here.

Even though recent work suggests that $T$ -gates can be made cheaper such that minimizing Clifford gates becomes relevant as well, there has been a tremendous amount of work and progress in optimizing $T$ -gate counts in quantum compilation. This has in part been driven by the availability of standard benchmark circuits that everyone can compare their methods against. To make these benchmark circuits even easier to access, we've added them to PennyLane. And because this benchmark dataset is never named, we took the liberty to call it the op-T-mize dataset.

In this blog post, we want to give a brief overview of the state of affairs in $T$ -gate optimization and highlight some cornerstone papers as well as some of the latest developments (with no expectation of completeness). All mentioned papers below used the benchmark circuits in the op-T-mize dataset (with some slight variations).

A brief history of T-gate optimization
- Phase polynomials
- ZX-calculus
Op-T-mize dataset
Literature

A brief history of T-gate optimization

The T-gate optimization literature can roughly be divided in two schools: using phase polynomials [1, 2, 3, 4] and ZX-calculus [5, 6, 7] as intermediate representations of quantum circuits. In both cases, the assumed input is a (Clifford + $T$ ) circuit and the goal is to reduce the T-gate count as much as possible. Each representation has its Achilles' heel and it has been found that they are particularly strong in combination as they complement each other's deficits (to some degree at least).

Phase polynomials

Phase polynomials describe the action of gates composed of $(\text{CNOT} + T)$ gates (in particular, no Hadamard gates inducing superposition). Technically, the action is described by a phase polynomial in combination with a so-called gate synthesis matrix. Because we usually mostly care about the phase polynomial, this representation is often just refered to as such in the literature. For example, we can write the action of $\text{CNOT}$ as

\text{CNOT}_{(1, 2)} |x_1, x_2\rangle = |x_1, x_1 \oplus x_2\rangle.

Further, a $T$ -gate acts on a single qubit state as

T |x\rangle = e^{i \frac{\pi}{4} x} |x\rangle.

We can now compose the action of $T$ and $\text{CNOT}$ in this phase polynomial description and readily see that

T_2 \text{CNOT}_{(1, 2)} |x_1, x_2\rangle = e^{i \frac{\pi}{4} (x_1 \oplus x_2)} |x_1, x_1 \oplus x_2\rangle.

To be mathematically compact, we can generally write $|x_1, x_2, .. \rangle$ as a binary vector $\boldsymbol{x} \in \mathbb{Z}_2^{n}$ for $n$ qubits. A $(\text{CNOT} + T)$ circuit $U$ can then be compactly described in terms of its phase polynomial $\phi$ and gate synthesis matrix $A$ by

U |\boldsymbol{x}\rangle = e^{i \phi(\boldsymbol{x})} |A \boldsymbol{x}\rangle.

The gate synthesis matrix can be described by a $\text{CNOT}$ circuit, which in the framework of optimizing solely T-gates is not of concern. So we mainly care about the phase polynomial part, $e^{i \phi(\boldsymbol{x})}$ .

The ground work for these techniques has been laid in [1], with [2] giving additional improvements and an efficient C++ implementation, making it a de facto standard method (in particular, the third order duplicate and destroy algorithm — TODD).

The first crux of these methods is to split a (Clifford + $T$ ) circuit into a leading $(\text{CNOT} + T)$ part and a remaining Clifford part (see figure below). This is always possible at the cost of introducing some auxiliary qubits. One can then further separate just the non-Clifford phase part $U_f$ . The second crux is to read out a specific symmetric rank- $3$ tensor $\mathcal{T}$ that describes the first part of the circuit, $U_f$ , containing only non-Clifford gates. In [1] it was shown that optimizing the $T$ gate count then reduces to finding a low-rank representation of $\mathcal{T},$ and there are different strategies and heuristics to achieve that.

$Fig. 1 taken from [2] illustrating the process of extracting the phase polynomial of the circuit and optimizing it via a low-rank decomposition of its symmetric tensor S (we call it \mathcal{T}).$ Fig. 1 taken from [2], illustrating the process of extracting the phase polynomial of the circuit and optimizing it via a low-rank decomposition of its symmetric tensor $S$ (we call it $\mathcal{T}$).

The op-T-mize benchmark results from the original work (Polynomial-time T-depth Optimization of Clifford+T circuits via Matroid Partitioning [1]) have been improved upon by An Efficient Quantum Compiler that reduces T count [2] and several other works. Most recently, reinforcement learning [3] has been used to find good low-rank representations and cleverly inserting sub-circuits further reducing the T-gate count at the cost of introducing auxiliary qubits (so-called "gadgets"). The TODD algorithm has recently been further improved in terms of run-time and gate counts on op-T-mize circuits [4].

Latest op-T-mize benchmark results from [4] (TOHPE).

The Achilles' heel of these methods is the fact that phase polynomial descriptions cannot handle (too many) Hadamard gates — or superposition, for that matter. Formally this can be done in $U |\boldsymbol{x}\rangle = \sum_j c_j e^{i \phi^j(\boldsymbol{x})} |A^j \boldsymbol{x}\rangle$ but there are exponentially many $j$ for full descriptions. The required separation of input circuits into $(\text{CNOT} + T)$ and Clifford parts leads to an overhead in Clifford gates and controlled operations.

ZX-calculus

ZX-calculus is a powerful graphical language to represent and reason about linear operators on qubits. You can check our introduction to the ZX-calculus for more details on the general framework. There are a lot of rewrite rules in ZX-calculus that allow one to manipulate operators. As such, it is well-suited for optimizing circuits and, in particular, reducing T-gate counts.

There is, however, the caveat that after manipulating a circuit in its ZX-diagram, there is no guarantee that a circuit can be extracted efficiently again. Hence there need to be extra rules on what manipulations are allowed and which ones aren't. The fundamental work for circuit optimization and, in particular, T-gate optimization has been laid in [5, 6, 7], where the authors introduce a canonical form of a circuit and a set of rewrite rules that guarantee efficient extraction of a valid circuit representation again.

These methods are readily available with Python (PyZX) and Rust (QuiZX) implementations on the ZX-calculus website. An example of how a circuit in form of a ZX-diagram can be optimized to reduce T-gate count can be seen in the figure below, taken from [5].

a) Original circuit. b) The circuit expanded as a ZX-diagram, with 21 T gates. c) Simplified ZX-diagram. d) 15 T gates remain after phase-teleportation.

Figure taken from [5]. a) Original circuit. b) The circuit expanded as a ZX-diagram, with 21 T gates. c) Simplified ZX-diagram. d) 15 T gates remain after phase-teleportation.

These methods can, in principle, handle any type of gate in (Clifford + $T$ ) circuits and, in particular, don't suffer from superposition induced by Hadamards. Their main caveat is the extraction procedure to retrieve a valid quantum circuit, which is only efficient when specific rewrite rules are applied, thereby limiting the possibilities for optimization.

It has been shown that a combination of phase polynomial and ZX-calculus techniques can lead to better results than each method on its own. This makes sense as they nicely complement each other's Achilles' heels. For example, see the last column in table 1 with the op-T-mize circuits in [5], where the author's ZX-based method is combined with TODD.

Op-T-mize benchmark results from [5] in 2019 showcasing the advantage of combining phase polynomial methods (i.e. TODD) and ZX-calculus for T-gate optimization.

Op-T-mize dataset

Throughout the aforementioned papers, the authors use a specific set of logical circuits as the benchmark for their method in optimizing T-gate counts. In particular, you will find virtually the same table of circuits with slight variations in all papers. Having clearly defined and comparable benchmarks is a great driver for scientific progress. We want to contribute to that by making these circuits readily available in PennyLane Datasets. Because there is no commonly used name for these benchmark circuits (in fact, they are not named at all), we propose to call it the op-T-mize dataset due to its historic prevalence in T-gate optimization literature. Of course, these circuits can also be used to benchmark other metrics like $\text{CNOT}$ count, circuit depth or overall gate count.

Literature

[1] Matthew Amy, Dmitri Maslov, Michele Mosca "Polynomial-time T-depth Optimization of Clifford+T circuits via Matroid Partitioning" arXiv:1303.2042, 2013.

[2] Luke Heyfron, Earl T. Campbell "An Efficient Quantum Compiler that reduces T count" arXiv:1712.01557, 2017.

[3] Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch et al "Quantum Circuit Optimization with AlphaTensor" arXiv:2402.14396, 2024.

[4] Vivien Vandaele "Lower T-count with faster algorithms" arXiv:2407.08695, 2024.

[5] Aleks Kissinger, John van de Wetering "Reducing T-count with the ZX-calculus" arXiv:1903.10477, 2019.

[6] Ross Duncan, Aleks Kissinger, Simon Perdrix, John van de Wetering "Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus" arXiv:1902.03178, 2019.

[7] Miriam Backens, Hector Miller-Bakewell, Giovanni de Felice, Leo Lobski, John van de Wetering "There and back again: A circuit extraction tale" arXiv:2003.01664, 2020.

About the author

Korbinian Kottmann

Korbinian likes simulating quantum systems, whether it be via tensor network methods during his PhD, or with quantum computers at Xanadu. Currently, he works on quantum compilation to make simulation algorithms go brrr.

Last modified: January 10, 2025

About

Software & Documentation

Resources

Topics

Community & Support