1. Demos/
  2. Quantum Computing/
  3. Unitary synthesis with recursive KAK decompositions

Unitary synthesis with recursive KAK decompositions

Published: May 30, 2025. Last updated: June 03, 2025.

Unitary synthesis is the process to decompose a given unitary matrix into quantum gates. Research efforts to obtain cheap decompositions have been going on for at least 25 years, successively reducing the gate counts and/or depth of the synthesized circuits. In this demo we will take a look at three unitary synthesis techniques: (one of) the first results proving universality of one-qubit and two-qubit gates for arbitrary unitaries 1, the probably most widely-known Quantum Shannon Decomposition (QSD) 2, and the Block-ZXZ decomposition 3 that currently holds the record for lowest CNOT count for any number of qubits.

Established unitary synthesis techniques implement variants of the same decomposition, here visualized as sandwiches.

Crucially, we will not just go through these three techniques separately, but we will explain how they are variants of the same underlying mathematical factorization, which is called a recursive KAK, or Cartan, decomposition 5. One of the recursion steps will be implemented by a Cosine-Sine Decomposition (CSD), which may be the most well-known KAK decomposition.

In the appendix, we will derive the CNOT and rotation gate counts for all three decompositions, both in their generic form and after applying manual optimizations.

We recommend that readers first familiarize themselves with KAK decompositions, for example using our introductory demo or our demo on compiling Hamiltonian simulation variationally, which follows reference 6.

KAK decompositions

A KAK, or Cartan, decomposition of a matrix Lie group \(\mathcal{G}\) with respect to a Lie subgroup \(\mathcal{K}\) allows us to factorize any group element \(G\in\mathcal{G}\) into \(K_1 A K_2\), where \(K_1,K_2\in\mathcal{K}\) and \(A\) lies in a so-called Cartan subgroup \(\mathcal{A}\). In a circuit, we would draw such a factorization:

Quantum circuit of a generic KAK decomposition.

To understand the nature of the matrix \(A\), consider the Lie algebra \(\mathfrak{g}\) of the group \(\mathcal{G}\) together with the algebra \(\mathfrak{k}\) of \(\mathcal{K}\), which is a subalgebra of \(\mathfrak{g}\). The orthogonal complement of \(\mathfrak{k}\) within \(\mathfrak{g}\) is the so-called horizontal space, \(\mathfrak{p}\). A (horizontal) Cartan subalgebra \(\mathfrak{a}\) then is any maximal Abelian subalgebra of \(\mathfrak{p}\), and the Cartan subgroup \(\mathcal{A}\) simply is the (Abelian) group generated by \(\mathfrak{a}\) via the exponential map, \(\mathcal{A} =\exp(\mathfrak{a})\).

Note that there are (infinitely) many possible choices for \(\mathfrak{a}\) and thus for \(\mathcal{A}\), and for each choice, the KAK decomposition exists. Also note that the KAK decomposition is a statement of existence, so it is still up to us to actually find the matrices \(K_1\), \(A\), and \(K_2\) for a given matrix \(G\).

There is a full classification of KAK decompositions, based on the groups \(\mathcal{G}\) and \(\mathcal{K}\). For \(\mathcal{G}\) being a classical real compact Lie group, we have ten types:

Type

\(\mathcal{G}\)

\(\mathcal{K}\)

rank

A

\(U(d)\times U(d)\)

\(U(d)\)

\(d\)

AI

\(U(d)\)

\(SO(d)\)

\(d\)

AII

\(U(2d)\)

\(Sp(d)\)

\(d\)

AIII \({}_{p,q}\)

\(U(p+q)\)

\(U(p)\times U(q)\)

\(\text{min}(p, q)\)

BD

\(SO(d)\times SO(d)\)

\(SO(d)\)

\(\lfloor d/ 2\rfloor\)

BDI \({}_{p,q}\)

\(SO(p+q)\)

\(SO(p)\times SO(q)\)

\(\text{min}(p, q)\)

DIII

\(SO(2d)\)

\(U(d)\)

\(d/2\)

C

\(Sp(d)\times Sp(d)\)

\(Sp(d)\)

\(d\)

CI

\(Sp(d)\)

\(U(d)\)

\(d\)

CII \({}_{p,q}\)

\(Sp(p+q)\)

\(Sp(p)\times Sp(q)\)

\(\text{min}(p, q)\)

Classification of KAK decompositions of the classical real compact Lie groups. Traditionally, they are defined on the special unitary group \(SU(d)\), which we have extended here by a global phase to \(U(d)\). The rank of the pair \((\mathcal{G},\mathcal{K})\) is the dimension of the Cartan subalgebra.

In the following, we will only use Cartan decompositions of type AIII or of type A, with a small exception of type AI for the Khaneja-Glaser decomposition. We follow Sec. IID and App. B of 5 and work with the unitary, rather than the special unitary group, throughout. While a Cartan decomposition of \(U(d)=SU(d)\times U(1)\) technically is not classified, Observation 1 in 5 allows us to extend a decomposition of \(SU(d)\) to \(U(d)\). It turns out that multiple technicalities become simpler with this. Now, let us look at those typed decompositions in detail.

Type-AIII decomposition

A type-AIII Cartan decomposition is defined for the unitary group \(U(d)\) in any dimension \(d\) and has a parameter \(p\), \(0\leq p\leq d\), that characterizes the subgroup \(\mathcal{K}\). Here we will only look at qubit systems (\(d=2^n\) for \(n\) qubits) and a particularly regular choice for \(p\) (\(p=d/2=2^{n-1}\)), corresponding to \(\mathcal{K}=U(2^{n-1})^{\times 2}\). In principle, there are many ways to embed this subgroup \(\mathcal{K}\) in the standard representation of \(\mathcal{G}=U(d)\) (i.e., in the representation as \(d\times d\) matrices with \(U^\dagger U=\mathbb{I}\)) but we will only use what’s maybe the most straightforward embedding, namely as block-diagonal matrices:

\[\begin{split}\mathcal{K}=U(2^{n-1})^{\times 2} \cong\left\{\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}\bigg| A, B\in U(2^{n-1})\right\}.\end{split}\]

Note that such a block-diagonal matrix can also be written as \(A\oplus B\). In a quantum circuit, such a block-diagonal matrix is implemented by a multiplexer, or Select operator (also see Select). Using the first qubit as most significant bit, an element from \(\mathcal{K}\) can be drawn like the following:

Block-diagonal unitary gate, also called multiplexer or Select operator.

The horizontal subspace for this decomposition is

\[\begin{split}\mathfrak{p}_{\text{AIII}} = \mathfrak{k}^\perp =\left\{\begin{pmatrix} 0 & C \\ -C^\dagger & 0 \end{pmatrix}\bigg| C \in \mathbb{C}^{2^{n-1}\times 2^{n-1}}\right\},\end{split}\]

and a valid Cartan subalgebra will be any \(2^{n-1}\)-dimensional Abelian subalgebra \(\mathfrak{a}\subset\mathfrak{p}\). For example, we may choose

\[\begin{split}\mathfrak{a}_{\text{AIII}} =\left\{\begin{pmatrix} 0 & x \\ -x & 0 \end{pmatrix}\bigg| x \in \mathfrak{u}_{\text{diag}}(2^{n-1})\right\},\end{split}\]

which is spanned by the basis elements \(\{iY\otimes |j\rangle\!\langle j|\}\). Here we denoted by \(\mathfrak{u}_{\text{diag}}\) the Abelian algebra of diagonal purely imaginary matrices, a subalgebra of \(\mathfrak{u}\).

We observe that the generators in this Cartan subalgebra will lead to an independent Pauli \(Y\) rotation on the first qubit for every computational basis state of all other qubits. This is again a Select operator, or multiplexer, which in this case also is referred to as a uniformly controlled rotation. We may draw it as

Cartan subgroup element implemented as Select, or multiplexed, operator. Also called uniformly controlled rotation.

Later on, we will also consider other Cartan subalgebras for this decomposition, which differ from \(\mathfrak{a}_{\text{AIII}}\) by some simple basis rotations.

The full KAK decomposition of type AIII with the choices and representations from above is given by the circuit diagram:

Generic type-AIII KAK decomposition of the unitary group on n qubits.

Implementation: Cosine-Sine decomposition

We have not yet discussed how one can obtain the block-diagonal matrices \(K_1\), \(K_2\) and the Cartan subgroup element \(A\) for a given matrix \(G\in U(2^n)\). Luckily, as has been observed repeatedly in the literature, the Cosine-Sine Decomposition (CSD) implements a generalized type-AIII Cartan decomposition that allows to use distinct subgroups \(U(p_1)\times U(d-p_1)\) and \(U(p_2)\times U(d-p_2)\) for \(K_1\) and \(K_2\). Here we will not care for the generalization and fix \(p_1=p_2=2^{n-1}\).

The CSD, as implemented for example in scipy.linalg.cossin with the setting separate=True, then computes the blocks for the block-diagonal matrices as well as the coefficients, or rotation angles, for \(A\in\exp(\mathfrak{a}_{\text{AIII}})\).

We may check that this works with the following code. We apply cossin to some input U, from which we create the block matrices \(K_1\) and \(K_2\) with numpy and the Cartan subgroup matrix \(A\) via a Select-applied, or multiplexed, qml.RY rotation. Then we check that those matrices multiplied together yield the input U again.

import pennylane as qml
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import cossin, eig, qr
from scipy.stats import unitary_group

def aiii_decomposition(U):
    """Compute an type-AIII KAK decomposition of ``U`` using a cosine-sine decomposition."""

    # Hilbert space dimension d=2**n
    d = len(U)
    # Pick p_1 = p_2 = 2**(n-1)
    return cossin(U, d//2, d//2, separate=True)

# Qubit count
n = 5
# Get a random unitary on n qubits
U = unitary_group.rvs(2**n, random_state=214)
# Compute AIII decomposition
(u_1, u_2), theta, (v_1, v_2) = aiii_decomposition(U)

# Check that the decomposition is valid
zero = np.zeros_like(u_1)
K_1 = np.block([[u_1, zero], [zero, u_2]])
ry_ops = [qml.RY(2 * th, 0) for th in theta]
A = qml.matrix(qml.Select(ry_ops, control=range(1, n)), wire_order=range(n))
K_2 = np.block([[v_1, zero], [zero, v_2]])

reconstructed_U = K_1 @ A @ K_2
print(np.allclose(reconstructed_U, U))
True

This worked!

If a different Cartan subalgebra is chosen, a suitable rotation needs to be applied. This is usually a matter of right-multiplying \(K_1\) (or left-multiplying \(K_2\)) by the rotation and re-interpreting the obtained angles to parametrize the new Cartan subgroup element.

For example, for the Block-ZXZ decomposition below, we would need to perform a rotation on the first qubit from the Pauli \(Y\) to the Pauli \(X\) basis. This rotation is given by \(\tfrac{1}{\sqrt{2}}(X+Y)\).

def aiii_decomposition_rotated(U):
    """Compute a type-AIII KAK decomposition of ``U`` using a cosine-sine decomposition
    and rotate the outputs so that the Cartan subalgebra is in the X basis on
    the first qubit."""
    d = len(U)
    n = int(np.round(np.log2(d)))
    (u_1, u_2), theta, (v_1, v_2) = cossin(U, d//2, d//2, separate=True)
    zero = np.zeros_like(u_1)
    K_1 = np.block([[u_1, zero], [zero, u_2]])
    K_2 = np.block([[v_1, zero], [zero, v_2]])
    rotation = (qml.X(0) + qml.Y(0)) / np.sqrt(2)
    rotation_mat = qml.matrix(rotation, wire_order=range(n))
    # Transform K_1 and K_2. No need to transform theta, just re-interpret as RX angles
    K_1 = K_1 @ rotation_mat
    K_2 = rotation_mat @ K_2
    return K_1, theta, K_2

new_K_1, theta, new_K_2 = aiii_decomposition_rotated(U)

# note that the block structure has been rotated:
fig, axs = plt.subplots(2, 2)
real, imag = "\mathfrak{Re}", "\mathfrak{Im}"
titles = [f"${part}(K_{i})$" for i in [1, 2] for part in [real, imag]]
data = [new_K_1.real, new_K_1.imag, new_K_2.real, new_K_2.imag]

for i, ax in enumerate(axs.flat):
    ax.set_title(titles[i])
    ax.imshow(data[i], cmap="bwr")
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()

# The decomposition is still valid:

rx_ops = [qml.RX(2 * th, 0) for th in theta]
new_A = qml.matrix(qml.Select(rx_ops, control=range(1, n)), wire_order=range(n))
reconstructed_U = new_K_1 @ new_A @ new_K_2
print(np.allclose(reconstructed_U, U))
$\mathfrak{Re}(K_1)$, $\mathfrak{Im}(K_1)$, $\mathfrak{Re}(K_2)$, $\mathfrak{Im}(K_2)$
True

Type-A decomposition

A type-A Cartan decomposition is defined for the “doubled” unitary group \(\mathcal{G}=U(d)^{\times 2}\) and its subgroup \(\mathcal{K}=U(d)\). As you may anticipate, we still only care about qubit space dimensions, \(d=2^n\), for some qubit count \(n\). Generically, we may use the representation of \(\mathcal{G}\) as block-diagonal matrices from above, and we will use the representation

\[\begin{split}\mathcal{K}=U(2^n) \cong\left\{\begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}\bigg| A\in U(2^n)\right\} =\left\{\mathbb{I}_2 \otimes A | A\in U(2^n)\right\}\end{split}\]

for the subgroup. Note that in a quantum circuit, such an element \(\mathbb{I}_2\otimes A\in\mathcal{K}\) simply is the gate \(A\) acting on all but the first qubit.

As \(\mathcal{K}\) is generated by matrices that do the same in the two diagonal blocks, the complementary horizontal space accordingly consists of elements that do the opposite in the two blocks,

\[\begin{split}\mathfrak{p} = \mathfrak{k}^\perp =\left\{\begin{pmatrix} C & 0 \\ 0 & -C \end{pmatrix}\bigg| C \in \mathfrak{u}(2^n)\right\}.\end{split}\]

A valid Cartan subalgebra is then any Abelian subalgebra of \(\mathfrak{p}\) with dimension \(2^n\). A particularly simple choice that will be important for us is

\[\begin{split}\mathfrak{a}_{\text{A}} =\left\{\begin{pmatrix} d & 0 \\ 0 & -d \end{pmatrix}\bigg| d \in \mathfrak{u}_{\text{diag}}(2^n)\right\},\end{split}\]

which is exactly of the form of \(\mathfrak{a}_{\text{AIII}}\) except for a basis change from Pauli \(Y\) to Pauli \(Z\). We thus may draw the complete type-A decomposition using \(\mathfrak{a}_{\text{A}}\) as

Generic type-A KAK decomposition of the doubled unitary group on n-1 qubits.

Implementation: Demultiplexing

Like for the type-AIII KAK decomposition above, we will need a numerical method that computes the matrices \(K_1, K_2\in\mathbb{I}\otimes U(2^{n-1})\) and \(A\in\exp(\mathfrak{a}_{\text{A}})\) for the type-A decomposition such that

\[G = U \oplus V = (\mathbb{I} \otimes U_1) A (\mathbb{I} \otimes U_2).\]

Luckily, this can be done with a standard eigenvalue decomposition and some additional matrix manipulations 2.

Consider the block-diagonal matrix \(G=U\oplus V\) and compute \(\delta=UV^\dagger\in U(2^{n-1})\). An eigenvalue decomposition of \(\delta\) then yields \(\delta=U_1 D^2 U_1^\dagger\) with \(U_1\in U(2^{n-1})\) and \(D^2\) a diagonal unitary. Then we may compute any square root of \(D\) and define \(U_2=D U_1^\dagger V\), which again is unitary. And with this, we already found our type-A Cartan decomposition:

\[\begin{split}(\mathbb{I}\otimes U_1) (D\oplus D^\dagger) (\mathbb{I}\otimes U_2) &=(U_1\oplus U_1) (D\oplus D^\dagger) (U_2\oplus U_2)\\ &=(U_1 D U_2) \oplus (U_1 D^\dagger U_2)\\ &=(\underset{=\delta=UV^\dagger}{\underbrace{U_1 DD U_1^\dagger}} V) \oplus (U_1 D^\dagger D U_1^\dagger V)\\ &=U\oplus V\\ &=G\end{split}\]

Note that the central matrix \(D\oplus D^\dagger\) is generated by some \(d\oplus (-d)\), which is an element of our choice for the Cartan subalgebra above, \(\mathfrak{a}_{\text{A}}\).

Let’s implement the demultiplexing step in code as well.

def demultiplex(U, V):
    """Compute the type-A KAK decomposition of the block-diagonal matrix U⊕V that
    corresponds to demultiplexing."""
    delta = U @ V.conj().T
    # Compute eigenvalue decomposition
    D_squared, U_1 = eig(delta)
    # Compute the square root by extracting phases and halving them
    phi = np.angle(D_squared) / 2
    U_2 = np.diag(np.exp(1j * phi)) @ U_1.conj().T @ V
    # Return the rotation angles for A, instead of the diagonal matrix D
    return U_1, phi, U_2

U_1, phi, U_2 = demultiplex(u_1, u_2)

The demultiplexed matrices make up \(K_1=u_1\oplus u_2\) from above:

rz_ops = [qml.RZ(-2 * p, 0) for p in phi]
demultiplex_A = qml.matrix(qml.Select(rz_ops, control=range(1, n)), wire_order=range(n))
demultiplex_K_1 = np.block([[U_1, zero], [zero, U_1]])
demultiplex_K_2 = np.block([[U_2, zero], [zero, U_2]])
reconstructed_K_1 = demultiplex_K_1 @ demultiplex_A @ demultiplex_K_2
print(np.allclose(reconstructed_K_1, K_1))
True

As for the CSD, rotations into other Cartan subalgebras can usually be performed by multiplying \(K_1\) and \(K_2\) from one side, and reinterpreting the coefficients for \(A\) (phi in the function above) in the new Cartan subalgebra.

Recursive KAK decompositions

A key observation to enable the use of KAK decompositions for unitary synthesis is that one may apply them repeatedly, or recursively. For this, a new KAK decomposition is applied to the subgroup elements \(K_1, K_2\) obtained in a previous decomposition step while maintaining the Cartan subgroup element \(A\). Denoting elements obtained at the \(k\)th decomposition step with a superscript \({\circ}^{(k)}\), we find for a group element \(G\) at the third recursion level:

\[\begin{split}G &= & & & K^{(1)}_1 & & & & A^{(1)} & & & & K^{(1)}_2 & & & \\ &= & K^{(2)}_1 & & A^{(2)}_1 & & K^{(2)}_2 & & A^{(1)} & & K^{(2)}_3 & & A^{(2)}_2 & & K^{(2)}_4 & \\ &=K^{(3)}_1 & A^{(3)}_1 & K^{(3)}_2 & A^{(2)}_1 & K^{(3)}_3 & A^{(3)}_2 & K^{(3)}_4 & A^{(1)} & K^{(3)}_5 & A^{(3)}_3 & K^{(3)}_6 & A^{(2)}_2 & K^{(3)}_7 & A^{(3)}_4 & K^{(3)}_4\end{split}\]

You may already anticipate that the two decomposition types we considered above fit together particularly well in this context: The subgroup \(\mathcal{K}=U(2^{n-1})^{\times 2}\) of a type-AIII decomposition is exactly the total group of a type-A decomposition, and the subgroup \(\mathcal{K}=U(2^{n-1})\) for a type-A decomposition is a valid total group for another type-AIII decomposition. This allows a recursion exclusively between these two types of decompositions:

\[U(2^n) \overset{\text{AIII}}{\longrightarrow} [U(2^{n-1})\times U(2^{n-1})] \overset{\text{A}}{\longrightarrow} U(2^{n-1}) \overset{\text{AIII}}{\longrightarrow} [U(2^{n-2})\times U(2^{n-2})] \cdots \overset{\text{AIII}}{\longrightarrow} [U(4)\times U(4)] \overset{\text{A}}{\longrightarrow} U(4)\]

This is exactly what the three unitary synthesis techniques below implement in the form of quantum circuits!

In the following, we will make the choices of Cartan subalgebra explicit, demonstrating that all three works realize the above recursion of type-AIII and type-A KAK decompositions, combined with an irregular decomposition of \(U(4)\) and/or some post-hoc optimization steps.

Khaneja-Glaser decomposition (2000)

In 2000, Khaneja and Glaser 1 demonstrated that one-qubit and two-qubit operations are universal for quantum computation by proving that any unitary matrix can be broken down recursively, “decoupling” one qubit at a time. Bullock later identified one part of this recursion to be a type-AIII KAK decomposition 7, and Dağlı et al. added that the remaining steps form KAK decompositions as well 8. Furthermore, this decomposition has inspired a series of related works, mostly searching for variations of the recursive KAK decomposition, that achieve quantum circuits with reduced CNOT counts.

The Khaneja-Glaser decomposition defines a set of commuting two-qubit operators

\[\mathcal{S}=\{X\otimes X, Y\otimes Y, Z\otimes Z\},\]

which together with \(\mathbb{I}_4\) span an Abelian four-dimensional algebra \(\mathfrak{a}_{\text{base}}=\operatorname{span}_{i\mathbb{R}}\mathcal{S}\cup\{\mathbb{I_4}\}\). This algebra is related to the algebra of diagonal matrices via the so-called “magic basis” rotation:

\[\begin{split}E = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i & 0 & 0 \\ 0 & 0 & i & 1 \\ 0 & 0 & i & -1 \\ 1 & -i & 0 & 0 \end{pmatrix},\quad E^\dagger \mathfrak{a}_{\text{base}} E = \mathfrak{u}_{\text{diag}}(4).\end{split}\]

To understand the choice of Cartan subalgebra for the type-AIII KAK decompositions of \(U(2^n)\), \(n\geq 3\), recall the generic choice from above:

\[\begin{split}\mathfrak{a}_{\text{AIII}} &=\left\{\begin{pmatrix} 0 & x \\ -x & 0 \end{pmatrix}\bigg| x \in \mathfrak{u}_{\text{diag}}(2^{n-1})\right\},\\ &=\operatorname{span}_{i\mathbb{R}}\left\{Y\otimes |j\rangle\langle j| | 0\leq j \leq 2^{n-1} \right\}.\end{split}\]

We rotate qubit \(0\) from the Pauli \(Y\) into the Pauli \(X\) basis, and qubits \(1\) to \(n-3\) from the computational basis into the Pauli \(X\) basis, which we can write:

\[\operatorname{span}_{i\mathbb{R}}\left\{X\otimes \{\mathbb{I},X\}^{\otimes (n-3)} \otimes |j\rangle\langle j| | 0\leq j \leq 4 \right\}.\]

In a final step, we apply the magic basis transformation \(E\) from above to the last two qubits (qubits \(n-2\) and \(n-1\)), which rotates them from the computational into the magic basis characterized by \(\mathcal{S}\).

\[\mathfrak{a}'_{\text{AIII}}(n) =\operatorname{span}_{i\mathbb{R}} \{X\otimes \{\mathbb{I},X\}^{\otimes (n-3)}\otimes \mathcal{S}\}.\]

For the type-A decompositions, the Cartan subalgebra is similar, but the first qubit is rotated into the Pauli \(Z\) basis instead:

\[\mathfrak{a}'_{\text{A}}(n) =\operatorname{span}_{i\mathbb{R}} \{Z\otimes \{\mathbb{I},X\}^{\otimes (n-3)}\otimes \mathcal{S}\}.\]

Collecting the type-AIII and type-A decompositions into one, the Khaneja-Glaser decomposition takes the following form:

Khaneja-Glaser decomposition of the unitary group on n qubits.

Terminating the recursion: two-qubit decomposition

To conclude the recursion, Khaneja and Glaser break down the remaining two-qubit unitaries from \(U(4)\) into single-qubit gates and two-qubit gates generated by \(\mathfrak{a}_{\text{base}}\). It turns out that this is a Cartan decomposition again, specified by the subgroup \(SO(4)\) and labeled as type AI. Generically, this decomposition would factorize a unitary matrix into two real-valued, orthogonal matrices and a diagonal matrix (the Cartan subgroup element).

However, the magic basis rotation \(E\) from above transforms orthogonal matrices into matrices of the form \(A\otimes B\) for two single-qubit (special) unitaries \(A,B\), and (as we mentioned above) it transforms diagonal matrices into matrices from \(\mathfrak{a}_{\text{base}}\). This means that the type-AI Cartan decomposition can be rotated by \(E\) to obtain a decomposition of the two-qubit unitaries into single-qubit unitaries and Cartan subgroup elements.

Math detail: accidental/exceptional isomorphism

The fact that the Lie groups of special orthogonal matrices (\(SO(4)\)) and single-qubit (special) unitaries on two qubits (\(SU(2)\times SU(2)\)) are isomorphic is a so-called accidental, or exceptional, isomorphism. In general, \(SO(d)\) is not isomorphic to two (or more) copies of some lower-dimensional unitary group(s).

As the Cartan subalgebras for the Khaneja-Glaser decomposition are obtained by rotating the canonical choices, they can be implemented in quantum circuits via the local rotations and any implementation of the generic subalgebra operators, see the info box in the next section.

Quantum Shannon decomposition (2004)

The Quantum Shannon Decomposition (QSD) was introduced by Shende et al. 2 in 2004, using the language of multiplexers, or Select operators. The individual statements provided in Theorems 1, 10 and 12 all are KAK decompositions, of types AIII (or AI), AIII, and A, respectively. The authors explicitly reference Theorem 10 as cosine-sine decomposition. As we covered those already above, we here just need to note down the Cartan subalgebras used by the QSD, which are simpler than those used by Khaneja and Glaser and match exactly the exemplary \(\mathfrak{a}_{\text{AIII}}\) and \(\mathfrak{a}_{\text{A}}\) given in the beginning. They are captured by multiplexed single-qubit rotations about the Pauli \(Y\) axis and Pauli \(Z\) axis, respectively, with the first of the qubits (on which the recursion currently acts) used as target, and all subsequent qubits used as controls. The alternating basis of the multiplexers is a consequence of the two different Cartan subalgebras used for the type-AIII and type-A decompositions.

Implementing a multiplexed rotation

As a subroutine, all decompositions require us to perform a multiplexed, also called uniformly controlled or Select-applied, single-qubit rotation, \(UCR_P(\vec{\theta})\) about some Pauli word \(P\). A possible implementation of \(UCR_P\) is given in the following circuit diagram:

Decomposition of a multiplexer on 3 qubits.

It takes \(2^k\) CNOT gates and \(2^k\) rotation gates for \(k\) control qubits.

Concatenating these statements (Theorems 10 and 12) results in the widely used QSD (Theorem 13), which is most easily expressed as a circuit drawing:

Quantum Shannon decomposition of the unitary group on n qubits.

You may notice that this is simply the combination of our circuit drawings for the generic KAK decompositions of types AIII and A from earlier. It is instructive to perform the recursion in a circuit diagram and to appreciate the circuit structure arising from the full QSD:

Recursively applied Quantum Shannon decomposition of the unitary group on n qubits.

Here, each of the simple boxes represents an arbitrary \(U(2^{n-2})\) gate.

Before moving on, it is important to mention that the QSD achieves its CNOT counts as given in Tab. 1 of 2 by ultimately decomposing \(U(4)\) with a standard 3-CNOT circuit as used for two-qubit synthesis, see 9 by the same authors, and by applying two optimization steps, see our gate counts appendix or App. A of 2.

Block-ZXZ decomposition (2024)

The third decomposition using the recursive KAK decomposition with types AIII and A is the so-called Block-ZXZ decomposition by Krol and Al-Ars 3. The important steps are virtually identical to a QSD, with the small difference that the Cartan subalgebra \(\mathfrak{a}_{\text{AIII}}\) is rotated on the first qubit, from the Pauli \(Y\) into the Pauli \(X\) basis. The subalgebra for the type-A decomposition remains unchanged.

The apparent key difference to the QSD lies in the transformation applied in addition to each type-AIII Cartan decomposition. We showcase the transformation into the block structure in form of a circuit diagram:

Transformations from a type-AIII KAK decomposed circuit to a circuit in Block-ZXZ form.

Afterwards, the block-diagonal matrices can be decomposed with the type-A decompositions, which is done with the same generic technique as for the QSD. Crucially, the transformation above appears to simplify the circuit structure by turning some blocks into the identity, but this has no effect on the CNOT count. The enhanced optimization performed by Krol and Al-Ars also applies to the more generic form after the recursive Cartan decomposition (see our gate counts appendix and Sec. 5 of 3).

In this generic form, the Block-ZXZ decomposition is almost identical to the QSD (can you spot the difference?):

Block-ZXZ decomposition of the unitary group on n qubits.

Conclusion

In this demo we learned how to chain KAK, or Cartan, decompositions together in order to obtain recursive decomposition techniques. In particular, we found that three important decompositions from the literature take this form, and even use Cartan decompositions of the same types. Ultimately, the difference in structure only stems from basis rotations of the Cartan subgroups and some manual optimization tricks, which are closely related for the Quantum Shannon decomposition and the Block-ZXZ decomposition.

If you are interested, take a read through the appendix which discusses the gate counts of the three decompositions and the mentioned optimization steps in detail.

References

1(1,2,3)

Navin Khaneja, Steffen Glaser (2000) Cartan Decomposition of SU(2^n), Constructive Controllability of Spin systems and Universal Quantum Computing. arXiv:quant-ph/0010100.

2(1,2,3,4,5,6)

Vivek V Shende, Stephen S Bullock, Igor L Markov (2006) Synthesis of Quantum Logic Circuits. arXiv:quant-ph/0406176, IEEE Trans. on Computer-Aided Design, Vol. 25, No. 6.

3(1,2,3,4)

Anna M Krol, Zaid Al-Ars (2024) Beyond Quantum Shannon: Circuit Construction for General n-Qubit Gates Based on Block ZXZ-Decomposition. arXiv:2403.13692.

4

Maximilian Balthasar Mansky, Santiago Londoño Castillo, Victor Ramos Puigvert, Claudia Linnhoff-Popien (2023). Near-optimal quantum circuit construction via Cartan decomposition. arXiv:2212.12934, Phys. Rev. A 108, 052607.

5(1,2,3)

David Wierichs, Maxwell West, Roy T. Forestano, M. Cerezo, Nathan Killoran (2025) Recursive Cartan decompositions for unitary synthesis. arXiv:2503.19014.

6

Efekan Kökcü, Thomas Steckmann, Yan Wang, J K Freericks, Eugene F Dumitrescu, Alexander F. Kemper (2021) Fixed Depth Hamiltonian Simulation via Cartan Decomposition. arXiv:2104.00728, Phys. Rev. Lett. 129, 070501

7

Stephen S Bullock (2004) Note on the Khaneja Glaser Decomposition. arXiv:quant-ph/0403141, Quantum Inf. Comput., Vol. 4, No. 5, 396-400.

8

Mehmet Dagli, Domenico D’Alessandro, Jonathan D H Smith (2007) A General Framework for Recursive Decompositions of Unitary Quantum Evolutions. arXiv:quant-ph/0701193, J. Phys. A: Math. Theor. 41 155302.

9

Vivek V Shende, Igor L Markov, Stephen S Bullock (2003) Minimal Universal Two-qubit Quantum Circuits. arXiv:quant-ph/0308033, Phys. Rev. A 69, 062321.

Appendix: Gate counts

Before concluding, let us count the gates in the three decompositions we discussed above. We will start with the decompositions as they are and then will comment on optimizations that reduce the CNOT count further. We will denote the CNOT (rotation gate) cost for a special unitary on \(n\) qubits as \(c_n\) (\(r_n\)).

Khaneja-Glaser decomposition

Khaneja and Glaser 1 decompose two-qubit special unitaries into circuits with leading and trailing single-qubit unitaries (12 single-parameter rotations in total) and a central Cartan subgroup element with three rotations generated by the operators in \(\mathcal{S}\) (except for the identity). They do not specify a decomposition into CNOTs, so that we need to fill in this gap. For comparability, we assume an optimal decomposition of the central part into three CNOTs and three rotation gates. Overall, we thus have \(c_2^{\text{KG}}=3\) and \(r_2^{\text{KG}}=15\).

For an arbitrary number of qubits \(n\), Khaneja and Glaser decompose one special unitary \(U\in SU(2^{n})\) into four special unitaries on \(n-1\) qubits, three \((n-1)\)-multiplexed \(R_Z\) or \(R_X\) rotations and a number of basis change rotation gates. The multiplexers take \(2^{n-1}\) CNOT gates and equally many rotation gates each (see info box in QSD section). The basis change rotations are composed of six layers of Hadamard gates on \(n-3\) qubits and six applications of \(E\) or \(E^\dagger\).

At this point, we can either decide to merge those rotations into the \(SU(2^{n-1})\) blocks, or to keep them in the decomposition. For the former, we get the unoptimized gate counts of the QSD below. For the latter, we note that a circuit to implement \(E\) is given by

Circuit for the magic basis rotation E.

and if we disregard single-qubit Clifford gates (including the Hadamard layers) for simplicity, this leads to six additional CNOT gates per qubit reduction.

Overall, we obtain the recursion relations

\[\begin{split}c_n^{\text{KG}} &= 4 c_{n-1}^{\text{KG}} + 3\cdot 2^{n-1} + 6,\\ r_n^{\text{KG}} &= 4 r_{n-1}^{\text{KG}} + 3\cdot 2^{n-1}.\end{split}\]

Resolving those recursions gives \(c_n^{\text{KG}} = a 4^n - 3 \cdot 2^{n-1} -2\) with \(c_2^{\text{KG}} = 16a - 8\), and \(r_n^{\text{KG}} = b 4^n - 3 \cdot 2^{n-1}\) with \(r_2^{\text{KG}}=16b - 6\), respectively. For the optimal two-qubit decomposition as starting point, we find

\[\begin{split}c_n^{\text{KG}} &= \frac{11}{16} 4^n - 3\cdot 2^{n-1} - 2,\\ r_n^{\text{KG}} &= \frac{21}{16} 4^n - 3\cdot 2^{n-1}.\end{split}\]

Note that some references give the CNOT count for the Khaneja-Glaser decomposition as \(\tilde{c}_n^{\text{KG}}=\tfrac{21}{16} 4^n - 3(n 2^{n-2} + 2^n)\), based on a decomposition in 4.

Khaneja and Glaser do not perform any optimizations so that we leave the counts at the above.

Quantum Shannon Decomposition

Without any optimizations, the QSD decomposes a unitary on \(n\) qubits into four unitaries on \(n-1\) qubits and three \((n-1)\)-multiplexed single-qubit \(R_Z\) and \(R_Y\) rotations. That is, we find the same behaviour as for the Khaneja-Glaser decomposition if we had merged the basis rotations into the smaller unitary blocks. Correspondingly, we get the recursion relation \(x_n=4 x_{n-1} +3\cdot 2^{n-1}\) for both, the CNOT count and the number of rotation gates. Using the optimal two-qubit decomposition then leads to

\[\begin{split}c_n^{\text{QSD}} &= \frac{9}{16} 4^n - 3\cdot 2^{n-1},\\ r_n^{\text{QSD}} &= \frac{21}{16} 4^n - 3\cdot 2^{n-1}.\end{split}\]

Shende et al. 2 then perform the following two optimizations: First, the decomposition of the multiplexed \(R_Y\) rotation is replaced by the following, which uses CZ instead of CNOT gates:

Decomposition of a multiplexed RY rotation on 3 qubits using CZ gates.

As the last CZ gate is (block) diagonal, it can be absorbed into the multiplexed \(SU(2^{n-1})\) operation on the right, before decomposing it with the type-A decomposition. This changes the recursion relation to \(c_n = 4 c_{n-1} + 3\cdot 2^{n-1} - 1\) and leaves the rotation count unchanged. The solution for the CNOT count is \(c_n = \tfrac{13}{24} 4^n -3\cdot 2^{n-1} + \tfrac{1}{3}\).

Second, Shende et al. consider the stage at the recursion just before decomposing the two-qubit unitaries, and with all multiplexed rotations left intact. They then recite the following alternative decomposition of two-qubit unitaries, where \(\Delta\) is a diagonal unitary operator:

Decomposition of a two-qubit unitary with a diagonal and two CNOTs.

Note that all two-qubit blocks in the QSD are separated by multiplexer controls, which commute with diagonal matrices. Therefore we can perform the following optimization:

CNOT optimization in the QSD using an alternative two-qubit decomposition.

Here we represented \(SU(2)\) gates with dotted lines and the remaining parts of the multiplexers with dashed lines. We start at the right-most two-qubit block, decompose it into the above form, and pull the diagonal \(\Delta\) to the left through the adjacent multiplexer (red). Then we absorb the diagonal into the second two-qubit block from the right (blue). This block can then be decomposed into the above form again, and the diagonal term can be moved through the next multiplexer (red). Continuing this, we find that we can decompose all two-qubit blocks into 2 CNOTs and 14 rotation gates, except for the left-most block which is decomposed in the conventional manner into 3 CNOTs and 15 rotations. As there are \(4^{n-2}\) two-qubit blocks, we save \(4^{n-2}-1\) CNOTs and rotation gates, arriving at

\[\begin{split}c_n^{\text{QSD,opt}} &= \tfrac{23}{48} 4^n -3\cdot 2^{n-1} + \tfrac{4}{3},\\ r_n^{\text{QSD,opt}} &= \tfrac{5}{4} 4^n - 3\cdot 2^{n-1} + 1.\end{split}\]

Block-ZXZ decomposition

The generic gate counts for the Block-ZXZ decomposition are the same as for the QSD. This is not too surprising, given their very similar structure. The optimization steps taken by Krol and Al-Ars 3 also are quite similar. In particular, the second optimization for the QSD is inherited as-is, and the first optimization is tweaked to allow for the removal of a second CNOT gate per step. It is easiest to understand in the generic KAK-decomposed circuit, i.e., before performing the transformation into the block form that we derived in the main text. We again demonstrate the optimization in a series of circuit equalities:

Optimization of the Block-ZXZ decomposition to remove two CNOT gates.

We first pulled out Hadamards from the middle multiplexed \(R_X\) rotation, making it an \(R_Z\) rotation instead, and recognized that the middle three gates form a multiplexed \(SU(2^{n-1})\) gate (green). Then we inserted two CNOTs to the left and right of the first and second Hadamard, respectively, and pulled one of them through the Hadamard gate (yellow). Afterwards we may recognize that the multiplexed block does not change when merging the CZ gates into it, and we revert the grouping step and the Hadamard transform (blue) Overall, we created two CNOT gates out of thin air, which will cancel with the outer-most CNOT gates of the decompositions for the multiplexed \(R_Z\) rotations (red).

This changes the recursion relation to \(c_n = 4 c_{n-1} + 3\cdot 2^{n-1} - 2\) with solution \(c_n = \tfrac{25}{48} 4^n -3\cdot 2^{n-1} + \tfrac{2}{3}\). Subtracting the savings from the second step (see QSD), we obtain

\[\begin{split}c_n^{\text{ZXZ,opt}} &= \tfrac{22}{48} 4^n -3\cdot 2^{n-1} + \tfrac{5}{3},\\ r_n^{\text{ZXZ,opt}} &= \tfrac{5}{4} 4^n - 3\cdot 2^{n-1} + 1.\end{split}\]

We note once more that the transformation into the special block form that is eponymous to the Block-ZXZ decomposition does not yield any CNOT or rotation gate reductions. This is because the controlled gates obtained through the transformation are decomposed like multiplexers, leading to the same cost as before the transformation.

To conclude, we plot the CNOT and rotation gate counts, comparing them to their respective lower bounds

\[\begin{split}c_n^{\text{min}} &= \tfrac{1}{4} (4^n - 3n - 1),\\ r_n^{\text{min}} &= 4^n - 1.\end{split}\]

The former is derived by counting the independent degrees of freedom one can add per CNOT gate in a circuit and equating this with the dimension of the group, \(4^n-1\). This dimension then also is the lower bound for the rotation count.

n = np.arange(2, 11)
c_n_KG = 11/16 * 4 ** n - 3 * 2**(n-1) - 2
r_n_KG = 21/16 * 4 ** n - 3 * 2**(n-1)

c_n_QSD = 9/16 * 4 ** n - 3 * 2**(n-1)
r_n_QSD = 21/16 * 4 ** n - 3 * 2**(n-1)

c_n_QSD_opt = 23/48 * 4 ** n - 3 * 2**(n-1) + 4/3
r_n_QSD_opt = 5 / 4 * 4 ** n - 3 * 2**(n-1) + 1

c_n_ZXZ = 11/24 * 4 ** n - 3 * 2**(n-1) + 5/3
r_n_ZXZ = 5 / 4 * 4 ** n - 3 * 2**(n-1) + 1

c_n_min = 1/4 * (4**n - 3 * n - 1)
r_n_min = 4**n - 1

all_c_n = [c_n_KG, c_n_QSD, c_n_QSD_opt, c_n_ZXZ, c_n_min]
all_r_n = [r_n_KG, r_n_QSD, r_n_QSD_opt, r_n_ZXZ, r_n_min]
labels = ["KG", "QSD", "QSD, opt", "Block-ZXZ", "Lower bound"]
colors = ["#DE8900", "#4D53C8", "#44ACE8", "#D7333B", "#007D33"]


fig, axs = plt.subplots(1, 2, figsize=(12, 5))
for c_n, r_n, label, color in zip(all_c_n, all_r_n, labels, colors):
    axs[0].plot(n, c_n / 4**n, label=label, color=color, lw=2)
    ls = ":" if label in ["QSD", "Block-ZXZ"] else "-"
    axs[1].plot(n, r_n / 4**n, label=label, color=color, ls=ls, lw=2)

ylabels = ["Number of CNOT gates ($/4^n$)", "Number of rotations ($/4^n$)"]
for ax, ylabel in zip(axs, ylabels):
    ax.legend()
    ax.set_xlabel("Number of qubits $n$")
    ax.set_ylabel(ylabel)
tutorial unitary synthesis kak

About the author

David Wierichs

David Wierichs

I like to think about differentiation and representations of quantum programs, and I enjoy coding up research ideas and useful features for anyone to use in PennyLane.

Total running time of the script: (0 minutes 0.610 seconds)