December 09, 2025

Compilation lay of the land: decomposition routines

Korbinian Kottmann

David Wierichs

Compilation lay of the land: decomposition routines

The term quantum compilation can mean very different things depending on who you are talking to.

It might refer to local circuit optimizations, to the translation between intermediate representations (IRs) for quantum circuits, or generally anything that operates on program descriptions that consist of, or contain, quantum instructions.

In this blog post, we want to highlight some of the landmark papers and established techniques in what we deem decomposition routines: the fundamental routines that break down some gate or family of gates into another gate set. Decomposition routines are a central aspect of quantum compilation that no compiler can work without, and that unlocks many other compilation techniques that rely on a specific gate set.

The importance of decompositions is mirrored in the age of the techniques we show here; with some landmark results dating all the way back to the early 2000s.

Unitary Synthesis

Unitary synthesis is the process that translates an arbitrary unitary matrix into a set of quantum gates. As such, it is the very first thing a compiler needs to do for any circuit component specified in terms of such a matrix. As unitary matrices grow exponentially in size with the number of qubits they act on, this naturally restricts unitary synthesis to smaller matrices. It serves as an essential subroutine wherever classical computation is used as a resource, such as (matrix product) state preparation.

Quantum Shannon Decomposition for reducing CNOT gates

The quantum Shannon decomposition is an archetypical unitary synthesis method based on recursive Cartan decompositions.

The Quantum Shannon Decomposition (QSD) by Shende, Bullock, and Markov is a cornerstone in unitary synthesis. It dates back to 2004 and we still keep coming back to it in 2025, for its mixed bag of decomposition and synthesis techniques, important cross-references from back in the day, and–maybe most often–for the mixed bag of circuit identities, concisely collected in circuit diagrams.

The main focus of earlier works like this was to provide the lowest possible CNOT count, which QSD achieved with $\frac{23}{48} 4^n$ in leading order. This number only recently got beaten by the Block ZXZ decomposition in 2024, with a new leading-order count of $\frac{22}{48} 4^n$ . It is worth noting that these decompositions all are based on the same recursive Cartan decomposition.

Reducing non-Clifford gates with auxiliary qubits

The unitary synthesis method introduced by Berry et al. is the best known in non-Clifford gate counts, but requires a significant number of auxiliary qubits.

The unitary synthesis method introcuced by Berry et al. is the best known in non-Clifford gate counts, but requires a significant number of auxiliary qubits.

While still relevant, we find that focusing solely on CNOT gates is somewhat antiquated in 2025, as the industry and community is shifting towards fault-tolerant quantum computing (FTQC). There, reducing non-Clifford gates, such as T and Toffoli gates, is much more important. Currently, the best known non-Clifford gate count for unitary synthesis is actually provided in a quantum chemistry applications paper (see section IIIa). At this point, it is worth highlighting that in order to achieve these gate counts, this method uses large auxiliary qubit registers, as can be seen in the example circuit with 3 qubits above. There, the auxiliary qubits of the two parallel QROMs are shown, while the extra qubits carrying an auxiliary phase gradient state, as well as the extra qubits for addition, are skipped.

Reducing non-Clifford gates without auxiliary qubits

This parameter-optimal decomposition has been introduced in Figure 3 in Mottonen, Vartiainen, Bergholm, Salomaa and later optimized in CNOT count in Bergholm, Vartiainen, Möttönen, Salomaa to be very close to the optimized QSD.

This parameter-optimal decomposition has been introduced in Figure 3 in Möttönen, Vartiainen, Bergholm, and Salomaa and later optimized in CNOT count in Bergholm, Vartiainen, Möttönen, and Salomaa to be very close to the optimized QSD.

When you don't have access to enough auxiliary qubits, the unitary synthesis method by Bergholm, Vartiainen, Möttönen, and Salomaa from 2004 is currently your best choice. It is optimal in the number of required rotation gates (each of which contributes a lot of T gates, as we shall later see), while being very close to Block ZXZ in CNOT count ( $\tfrac12 4^n$ vs the aforementioned $\tfrac{22}{48} 4^n$ ). Due to the mentioned focus on CNOTs, the community dismissed it in favour of the QSD, and then the Block ZXZ decomposition; quite an unfortunate tradeoff when we seek to implement a fault-tolerant quantum algorithm.

And by the way, if you are looking to implement innocent two-qubit operators, the literature knows how to synthesize those optimally! Probably our favourite resource for this is another work by Shende, Markov, and Bullock, although we took the liberty to summarize two-qubit synthesis in the PennyLane compilation hub as well.

Want to dive deeper?

Learn about the recursive Cartan decompositions that unify all QSD and QSD-adjacent unitary synthesis methods.
Check out our latest paper on unitary synthesis that achieves the best of both worlds: parameter-optimality and CNOT count optimality (note that it comes at a significant classical computional cost, though).

State preparation and demultiplexing

State preparation is realized by a series of multiplexed rotation gates, with the angles related to the input state coefficients as outlined in Eqs. (5) and (8) in Möttönen, Vartiainen, Bergholm, and Salomaa.

Preparing a state $|\psi\rangle$ on a quantum computer is a crucial subroutine. It is closely related to unitary synthesis, as you can think of it as realizing a single column of a unitary matrix $U = \begin{pmatrix}|\psi\rangle | * | * | .. | * \end{pmatrix}$ so that $U|0\rangle = |\psi\rangle$ . We dont care about the remaining columns marked by the $*$ , we just need $U$ to be unitary. An important routine for preparing arbitrary state vectors was provided by Möttönen, Vartiainen, Bergholm, and Salomaa in 2004.

Interested in exploring more?

Take a look at how multiplexed single-qubit gates are decomposed by the same authors.
Get a primer on matrix product states for quantum.
Learn how to prepare initial states for quantum chemistry.
Catch the latest with a brand-new compilation framework for state preparation by our colleagues at Xanadu.

Rotation discretization for FTQC

Discretization without auxiliary qubits

In most FTQC scenarios, only the (Clifford + T) gate set (or an equivalent discrete gate set) is supported. This means that rotation gates with arbitrary angles need to be discretized into a series of (Clifford + T) gates for some given precision. The Solovay-Kitaev algorithm is celebrated for solving this problem originally, but it did not find optimal solutions. Ross and Selinger found the optimal solution with a probabilistic algorithm involving finite number fields that is often referred to as gridsynth. In this work, which we consider a cornerstone of quantum compilation, each rotation gate roughly costs $3 \log_2(1/\epsilon)$ for a given precision $\epsilon$ . It comes with very nice geometric interpretation about so-called grid problems and unitaries.

$In gridsynth, the cheapest approximation from a finite number field to a target is sought, in an area given by the allowed error $\epsilon$.$

In gridsynth, the cheapest approximation from a finite number field to a target is sought, in an area given by the allowed error $\epsilon$ .

Code: Both methods are implemented in the PennyLane function qml.clifford_t_decomposition()

Discretization with auxiliary qubits

$In a repeat-until-success (RUS) procedure, a quantum gate $U$ is repeated until a desired measurement outcome is obtained. Otherwise, a correction operator $W^\dagger_i$ ensures correctness.$

In a repeat-until-success (RUS) procedure, a quantum gate $U$ is repeated until a desired measurement outcome is obtained. Otherwise, a correction operator $W^\dagger_i$ ensures correctness.

The optimal lower bound from gridsynth can be beaten when auxiliary qubits are available! The repeat-until-success (RUS) scheme does exactly that, achieving $\sim 1.2 \log_2(1/\epsilon)$ T gates per rotation. For this, it repeats a quantum gate until a desired measurement outcome on an entangled auxiliary qubit is obtained. For all the times that the desired result is not achieved, some correction operators are applied to guarantee the correctness of the procedure. This approach was further improved and generalized in the seminal work by Kliuchnikov et al., achieving $\sim 0.56 \log_2(1/\epsilon)$ T gates on average.

This concludes our brief foray into the land of decompositions, hopefully giving you an understanding of the current state-of-the-art results in the field. We hope you find the various resources as useful as we do in our day-to-day compilation research work, and look forward to the new decompositions that will be found in the future!

In the meantime, we'd love to hear about your favourite decomposition techniques. Maybe they are already covered in our compilation hub? Take a look and find out! And, if not, get in touch to let us know what techniques we are missing and should add.

Want to stay up to date on the latest trends in the wider quantum compilation research community? Make sure to check out our recap of the 2025 International Workshop on Quantum Compilation!

About the authors

Korbinian Kottmann

Quantum compilation & open source software

David Wierichs

I like to think about differentiation and representations of quantum programs, and I enjoy coding up research ideas and useful features for anyone to use in PennyLane.

Last modified: December 11, 2025

About

Software & Documentation

Resources

Topics

Community & Support