Efficient Rotations With Phase Gradient States

Emily Nobes

Demos/
Algorithms/
Efficient Rotations With Phase Gradient States

Efficient Rotations With Phase Gradient States

Emily Nobes

Published: May 20, 2026. Last updated: May 20, 2026.

If you care about efficient rotations, you should care about phase gradient states.

The field of quantum algorithms has long been focused on ensuring operations can be executed on quantum hardware as efficiently as possible. Especially when integrating error correction and techniques toward fault-tolerant quantum computing, carrying out the processes as effectively as possible using as few resources as possible is of central importance. It is unlikely that the best way to do this is universal; different operations will require different implementation strategies to concurrently maintain functionality and efficiency. Many strategies have been thrown around as to how to approach this problem, such as the well known GridSynth strategy or the often cited Solovay-Kitaev approach. This demo, however, will focus on the new kid on the block: the phase gradient state, a gate synthesis tool that achieves resource efficiency by relying on quantum addition to rotate a quantum state.

Efficient Rotations?

As it stands, quantum algorithmic efficiency tends to be quantified by the number of T gates required to execute the non-Clifford operations included in a system. Since each T gate represents a fixed \(\frac{\pi}{4}\) rotation, generating arbitrary angles that deviate from this fixed point becomes increasingly resource intensive the more we vary from \(\frac{\pi}{4}\). Arbitrary rotations are very expensive and, at the same time, foundational to some of the most important tools we have in quantum algorithms and computing, such as the quantum Fourier transform (QFT), quantum phase estimation, and the Trotterization algorithm. As a result, optimizing these operations in the fault-tolerant picture is very important [1].

To emphasize the benefit of T gate optimization, let us take the example of time evolution simulation on a computational grid. Here, a wavefunction discretized to \(N=2^n\) grid points, where \(n\) is the number of qubits required to represent the dimensions of the grid, can be effectively evolved in time via the application of a multiplexed rotation, which can be interpreted as a mesh of controlled rotations that apply position-dependent phases to all members of a given state. In the naïve approach, where each point on the grid is treated independently and receives an isolated, individual rotation, the system’s gate count will scale as \(\mathcal{O}(2^n\log_2(1/\epsilon))\), where \(\epsilon\) is the desired precision. Yikes.

A quick resource estimation for a single pass of this procedure can be carried out using PennyLane and the estimator tool. Note that, for an \(n\)-qubit system, SelectPauliRot() (a PennyLane operator that represents the aforementioned multiplexed operation) will apply a total of \(R=2^n\) rotations to address each possible state. Thus, for a 3-qubit system, 8 rotations will be applied in a single pass. The T count estimate here is made using the method outlined in [8] for gate synthesis, which is the default approach taken by PennyLane’s estimation function.

import pennylane as qp
import numpy as np
import pennylane.estimator as qre

n = 3 #Control Qubits
np.random.seed(35)
angles = np.random.rand(2**n)*2*np.pi

#Define Circuit Wires
data_wires = range(n)
target_wire = n

dev = qp.device("default.qubit")

@qp.qnode(dev)
def circuit_baseline():
  for wire in data_wires:
    qp.Hadamard(wire)
  qp.SelectPauliRot(angles, data_wires, target_wire, rot_axis="Y")
  return qp.probs(target_wire)

print(qre.estimate(circuit_baseline)())

--- Resources: ---
 Total wires: 4
   algorithmic wires: 4
   allocated wires: 0
     zero state: 0
     any state: 0
 Total gates : 363
   'T': 352,
   'CNOT': 8,
   'Hadamard': 3

From this estimate, we can see that even a single, simple pass of a rotation multiplexer requires over 300 T gates for a small set of randomly-chosen angles. Considering the cost of T gates and the fact that useful applications of this procedure will need to handle many more qubits and many more rotations, it is easy to see the cost will grow very quickly. We need a hero!

The Phase Gradient State

What if, instead of having to apply an individual, expensive rotation for each bit value to achieve the desired outcome, we could simply add a pre-defined phase state to the qubits as needed?

Phase gradient states are a type of catalytic resource state that can be used to facilitate rotations additively. For clarity, a state is “catalytic” if it is unchanged by a process that it plays a role in facilitating. To borrow (aptly) from the language of chemistry, it exists to catalyze (assist) a specific process (operation). The phase gradient state can be interpreted as a catalyst for phase shifts, invoking phase accumulation on other qubits without sacrificing its own properties. The state can be represented as

\[|\nabla_b\rangle=\frac{1}{\sqrt{B}}\sum_{x=0}^{B-1}e^{-2\pi i \frac{x}{B}}|x\rangle\]

where \(x\) represents a target computational basis state, which one can interpret as an address marker that dictates what rotation angle is applied. In product state form,

\[|\nabla_b\rangle=\otimes_{j=1}^b\frac{1}{\sqrt{2}}(|0\rangle+e^{-i\frac{2\pi}{2^j}}|1\rangle).\]

Here, \(b=\log_2(1/\epsilon)\) is the total number of qubits stored in the phase gradient register, \(B=2^b\) is the total number of possible states in the superposition between all qubits in the gradient register, and \(j\) is the index of a specific qubit within the register.

../../_images/PhaseShiftCircuitDiagram.png — *Equivalent circuits for executing a phase shift, in which the phase shift operator can be replaced with an addition step between a state and the phase gradient register*¶

The phase gradient state can be interpreted as acting as a pre-defined plane of stored angles that can be accessed and invoked on a target state as desired. This state can be prepared once, stored in an auxiliary register, and reused. To induce a rotation, an integer value \(k\), proportional to the rotation angle needs to be added to the gradient register, which is done most commonly using a controlled addition gate step, to invoke a phase shift on the target state corresponding to the “address marker” mentioned above. This procedure is summarized as

\[\begin{split}\begin{aligned} |\Psi\rangle|\nabla_b\rangle &= \alpha|0\rangle|\nabla_b\rangle+\beta|1\rangle|\nabla_b\rangle \\ C(Add_k)|\Psi\rangle|\nabla_b\rangle &= \alpha|0\rangle|\nabla_b\rangle+\beta|1\rangle Add_k |\nabla_b\rangle. \end{aligned}\end{split}\]

The controlled addition step can basically be interpreted as a “push” invoked by the added state on the phase gradient register. Via quantum addition, the gradient register is shifted by an amount equivalent to the binary weight of each data qubit that is added to it. Since the data state remains “stationary”, the two states will be out of phase by an amount equivalent to the shift experienced by the register following the addition operation.

../../_images/PhaseKickback.gif — The phase gradient register addition process can be imagined as a change in the relative phase between the data register and the phase gradient register. As depicted, a controlled addition between the two registers will result in the positional displacement of the phase gradient state which, in turn, causes the phase difference that can be associated with either state. Even though the gradient register shifts, the states in the data register can “pick up” the relative phase difference.¶

Since phase is relative, it can be said without issue that the data register has accumulated a phase equivalent to this shift. This process is referred to as phase kickback. Again thanks to the relative nature of this shift, the properties of the gradient register are globally unchanged, solidifying it as a catalytic resource.

\[\begin{split}\begin{aligned} C(Add_k)|\Psi\rangle|\nabla_b\rangle &= \alpha|0\rangle|\nabla_b\rangle+\beta|1\rangle e^{-\frac{2\pi i k}{B}} |\nabla_b\rangle \\ &= (\alpha|0\rangle+\beta e^{-\frac{2\pi i k}{B}} |1\rangle)|\nabla_b\rangle. \end{aligned}\end{split}\]

Thus, the phase gradient state essentially stores spatially dependent phases that can be applied to invoke rotations as a function of qubit position.

Phase Gradient Rotation Algorithm

A phase gradient state is encoded onto a register composed of \(b\) qubits.
An adder operation is performed between a constant \(k\) and the gradient register.
The phase gradient register shifts proportionally to the \(k\) value.
The shift in the gradient register causes the target qubit to accumulate a relative phase via phase kickback.
Since position shifts are relative and do not alter structure, the catalytic phase gradient state remains unchanged and can be reused as desired.

This structure can be easily extended to the more commonly used multiplexed case that we discussed at the beginning of this demonstration. In this case, we can use a quantum read only memory (QROM) to store each \(k\) value in parallel in a data register. To carry out the rotation, it is best practice to carry out a semi-out-of-place addition using a SemiAdder() to add this data register to the phase gradient register. This reduces the complexity bound to \(\mathcal{O}(2^n+\log_2(1/\epsilon))\). This reduction in complexity combined with the catalytic nature of the phase gradient state makes this approach to gate synthesis highly resource efficient.

../../_images/Multiplexer.png — *Simple multiplexed phase gradient addition* [7].¶

from pennylane.labs.transforms import select_pauli_rot_phase_gradient

prec = 1e-9 #Desired Accuracy
b = int(np.ceil(np.log2(1/prec))) #Gradient Register Size

#Define Auxiliary Wires for Phase Gradient Transformation
reg = qp.registers({
    "_reserved": n + 1,   #Don't overwrite the previously defined wires!
    "angle":     b,
    "gradient":  b,
    "work":      3 * b,
})
angle_wires    = reg["angle"]
gradient_wires = reg["gradient"]
work_wires     = reg["work"]

@qp.qnode(dev)
@select_pauli_rot_phase_gradient(angle_wires,gradient_wires,work_wires)
def circuit_phase_grad():
  for wire in data_wires:
    qp.Hadamard(wire)
  qp.SelectPauliRot(angles, data_wires, target_wire, rot_axis="Y")
  return qp.probs(target_wire)

print(qre.estimate(circuit_phase_grad)())

--- Resources: ---
 Total wires: 183
   algorithmic wires: 154
   allocated wires: 29
     zero state: 29
     any state: 0
 Total gates : 855
   'Toffoli': 37,
   'CNOT': 525,
   'X': 136,
   'Z': 1,
   'S': 2,
   'Hadamard': 154

Note that there are different ways to translate Toffoli gates (which are continuous) into T gate counts, but here we will take Gidney’s approximation of 1 Toffoli gate = 4 T gates [1], meaning this implementation requires an estimated 148 T gates. So, for the same task with the same goal, this operation has now reduced in T gate cost by more than half. Not too shabby!

The scale of these savings becomes increasingly clear as the size of the data register that stores the input state (and, therefore, the size of the system as a whole) increases. The plot below shows how drastically the resource requirements scale in the first case with even a small increase in system size.

Though the phase gradient approach requires an investment of resources to prepare the gradient register, the additive, multiplexed nature of the rotation procedure results in a very slow accumulation of resources, as elaborated below. It is, of course, important to acknowledge that the implementation of the phase gradient state requires additional qubits to be added to the system. Though this contributes to resource cost, the comparative cost of generating a usable T gate still outweighs this investment in the long run. Thus, even though this approach requires an upfront investment of resources, it pays off in the long-run as additional per-rotation costs accumulate much more slowly than in alternative methods.

Sizing Up Other Optimizations

The importance of cheaply decomposing arbitrary rotations has been known to the industry for some time. As a result, the phase gradient approach is not the only gate-synthesis strategy available in the quantum algorithmic toolkit. There are many nuances involved in asserting the ideal applications of each algorithm, but a full exploration will not be included here. Instead, comparing the per-rotation gate cost reveals the relative cost of each strategy, which must be weighed against the nuance of the specific application. Note that, for a QFT, \(R=N^2/2\).

Algorithm	Setup Cost (T)	Marginal Cost Per Rotation (T)	Total Cost for \(R\) Rotations (T)	Cost to Execute a QFT for \(N=10\) (T)
Solovay-Kitaev	0	\(\log_2^{3.97}(1/\epsilon)\) [4]	\(R \log_2^{3.97}(1/\epsilon)\)	\(3.61 \times 10^{7}\)
Multiplexed Phase Gradient	\(\log_2(1/\epsilon)\)	\(4\log_2(1/\epsilon)\) [1]	\(\log_2(1/\epsilon) + \frac{4R}{N} \log_2(1/\epsilon)\)	\(6.28 \times 10^{2}\)
GridSynth	0	\(3\log_2(1/\epsilon)\) [2]	\(3R \log_2(1/\epsilon)\)	\(4.48 \times 10^{3}\)
Repeat Until Success (RUS)	0	\(2.4\log_2(1/\epsilon)\) [3]	\(2.4R \log_2(1/\epsilon)\)	\(3.59 \times 10^{3}\)
Kliuchnikov	0	\(2\log_2(1/\epsilon)\) [5]	\(2R \log_2(1/\epsilon)\)	\(2.99 \times 10^{3}\)
Single Qubit Gate Approximation	0	\(0.56\log_2(1/\epsilon)\) [6]	\(0.56R \log_2(1/\epsilon)\)	\(8.37 \times 10^{2}\)

Though this comparison shows the comparatively low cost of the multiplexed phase gradient approach, it leaves a hanging question: why not just multiplex every method? QROM can, of course, be used to load states into a register in any computationally compatible scenario, but the fact that the phase gradient state is catalytic and accessible in a pre-configured register is what makes this efficient multiplexing approach possible. If the phase gradient state were destroyed each time it was applied to a state, it would be impossible to effectively carry out a simultaneous rotation on a register. Trying to multiplex a GridSynth process, for example, would require the simultaneous execution of gate sequences that vary with the control qubit states. This would cause the depth of the multiplexed operation to explode, eliminating any potential advantage. So, while phase gradients are expensive to prepare, they give us the luxury of a simple look-up table style approach.

Conclusion

Phase gradient states are becoming more and more present in state-of-the-art algorithms, such as dynamic simulations. When it comes to developing useful applications for the future’s quantum computers, it is crucial to keep in mind what resources will be required so that compatibility with hardware and physical systems remains reasonable. As emphasized, the additive nature of using phase gradient states for this application greatly reduces the T count of applying arbitrary rotations. Understanding of the theory behind phase gradient states and the role they can play in carrying out efficient rotations opens the door to several compilation tools in PennyLane. The phase gradient page in PennyLane’s compilation hub is the best place to go to learn more. The tools available can be applied to a plethora of applications. Give them a try the next time you are experimenting with quantum phase estimation, for example!

References

About the author

Emily Nobes

Emily is a Master of Science in Physics student at McGill University. She works on integrated photonic hardware and tends to dabble in quantum encryption. Emily is joining Xanadu as a resident for summer 2026.

Total running time of the script: (0 minutes 0.562 seconds)

Share demo

Ask a question on the forum

Efficient Rotations With Phase Gradient States

Efficient Rotations?

The Phase Gradient State

Sizing Up Other Optimizations

Conclusion

References

About the author

Related Demos

Research

Performance

Hardware and simulators

Demos library

Compilation hub

Quantum datasets

Teach

Learn

Codebook

Coding challenges

Videos

Glossary

Install

Features

PennyLane documentation

Catalyst documentation

Development guide

How-to guides

API

GitHub