
Circuits as Fourier series

Published: September 10, 2023. Last updated: November 05, 2024.

In this demo, we’re going to give an interactive, code-free introduction to the idea of quantum circuits as Fourier series. We’ll also discuss one of the main applications, called the parameter-shift rule. Concepts will be embodied in visualizations you can play with! We’ll assume some familiarity with the basics of quantum circuits. The last part of the demonstration is more advanced, and intended to give researchers a different way to think of the material. But beginners are also welcome!

Single-qubit gates

Consider a single-qubit Hermitian operator \(G.\) It acts on a two-dimensional space, and therefore has two eigenvalues: \(\kappa \pm \gamma,\) where \(\kappa\) is the average and \(2\gamma\) the difference between them. If we exponentiate \(G\), with a parameter \(\theta,\) we obtain a unitary gate:

\[U(\theta) = e^{i\theta G}.\]
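Although this demo is code-free, readers who like to check things numerically can reproduce this structure in a few lines of NumPy. The sketch below (not part of the original demo; the values of \(\kappa,\) \(\gamma,\) and \(\theta\) are arbitrary) builds a Hermitian \(G\) with eigenvalues \(\kappa \pm \gamma,\) exponentiates it, and inspects the eigenvalues of \(U(\theta),\) which are \(e^{i\theta(\kappa \pm \gamma)}.\)

```python
import numpy as np
from scipy.linalg import expm

# A single-qubit Hermitian operator G with eigenvalues kappa ± gamma.
kappa, gamma = 0.3, 0.8
V = np.linalg.qr(np.random.randn(2, 2) + 1j * np.random.randn(2, 2))[0]  # random eigenbasis
G = V @ np.diag([kappa - gamma, kappa + gamma]) @ V.conj().T

# Exponentiating G with a parameter theta gives the unitary U(theta).
theta = 0.7
U = expm(1j * theta * G)

# Its eigenvalues are e^{i theta (kappa ± gamma)}: a global phase e^{i theta kappa}
# times the phases e^{± i theta gamma}.
print(np.sort(np.angle(np.linalg.eigvals(U))))
print(np.sort([theta * (kappa - gamma), theta * (kappa + gamma)]))
```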

Up to an overall phase of \(e^{i\theta\kappa},\) which we can ignore because it is not measurable, the unitary \(U\) has eigenvalues \(e^ {\pm i \theta\gamma}\). In the eigenbasis (i.e., the basis of eigenvectors of \(G,\) or equivalently \(U\)), it is diagonal. We will draw this diagonal matrix as a magenta box:

We will work in the eigenbasis of \(U\) from now on. This means that a column vector \([1, 0]^T\) is the eigenvector associated with \(-\gamma,\) and \([0, 1]^T\) is associated with \(+\gamma:\)

Tip

Click the image to toggle between eigenvectors.

We’ve written the matrix as a box to suggest a different way to think of it: a gate in a quantum circuit. Instead of column vectors, we can use bra-ket notation, with basis states \(\vert0\rangle = [1, 0]^T\) and \(\vert 1\rangle = [0, 1]^T.\) We will also add some horizontal lines through the gate to suggest that the states are “piped” through and pick up the corresponding phase.

Tip

Click to toggle between basis states \(\vert 0\rangle\) and \(\vert 1\rangle.\) While the mouse is in the magenta box, its vertical position controls \(\gamma,\) with the top of the box corresponding to high frequencies, and the bottom of the box to low frequencies.

Frequency components

You may wonder why we have introduced the parameter \(\theta.\) The idea is that, if we have \(U(\theta)\) in our circuit, we can tune \(\theta\) to improve the results of the circuit for, say, approximating a state of interest. Viewed as functions of this tunable parameter \(\theta,\) the purple phases are exponentials of frequency \(\omega = \pm \gamma.\) Below, we plot the real and imaginary parts of these phases.

Tip

Click to toggle between basis states \(\vert 0\rangle\) and \(\vert 1\rangle.\) While the mouse is in the magenta box, its vertical position controls \(\gamma.\)

Usually, we start a wire in the \(\vert 0\rangle\) basis state. We can “split” this into a combination of \(\vert 0\rangle\) and \(\vert1\rangle\) by applying a gate \(W.\) We picture \(W\) as a blue gate below. The state \(\vert\psi(\theta)\rangle =U(\theta)W\vert 0\rangle\) will then be a superposition of \(e^{-i\theta\gamma}\vert0\rangle\) and \(e^{+i\theta\gamma}\vert1\rangle.\) Again, viewed as a function of \(\theta,\) it has both frequency components, storing them in the coefficient of the corresponding eigenstate.

Tip

Vertical mouse position in the blue box controls the relative weight of \(\vert0\rangle\) and \(\vert 1\rangle.\) Hovering towards the top places the most weight on \(\vert0\rangle,\) and towards the bottom on \(\vert1\rangle.\) Clicking the magenta box toggles between them, with vertical mouse position controlling \(\gamma.\)

Once we’ve prepared a state \(\vert\psi(\theta)\rangle,\) we can measure it with some Hermitian operator \(M.\) In the context of variational circuits, the result of a measurement can be used to optimize the parameter \(\theta.\) To this end, let’s define the expectation value as a function of \(\theta,\)

\[f(\theta) = \langle \psi(\theta)\vert M \vert\psi(\theta)\rangle.\]

We represent the measurement \(M\) as a yellow box, sandwiched between a circuit on the left preparing the ket \(\vert\psi(\theta)\rangle\) and an adjoint circuit preparing the bra \(\langle \psi(\theta)\vert.\)

Note

Note that taking the adjoint swaps \(\pm\gamma,\) and the order of elements in the circuit is inverted compared to the expression for \(f(\theta).\)

We can expand the bra and ket in terms of frequency components \(e^{\pm i\theta\gamma},\) so \(f(\theta)\) will be a sum of products of these terms. We show this below.

Tip

Click on magenta boxes to toggle between frequency components contributing to \(f(\theta).\) Vertical position in the yellow box controls the constant terms \(\langle 0 \vert M \vert0\rangle\) and \(\langle 1 \vert M \vert 1\rangle.\)

To recap what we’ve learned so far: the ket \(\vert\psi (\theta)\rangle\) is a linear combination of the terms \(e^ {-i\theta\gamma}\vert0\rangle\) and \(e^ {+i\theta\gamma}\vert1\rangle\). If we measure this state, the expectation value \(f(\theta) = \langle\psi(\theta)\vert M\vert\psi (\theta)\rangle\) is a linear combination of products of the frequency terms \(e^{\pm i\theta\gamma}.\) More formally, we can write

\[f(\theta) = c_{(-2)} e^{-i2\gamma\theta} + c_{(0)} + c_{(+2)} e^{+i2\gamma\theta}\]

for some coefficients \(c_{(-2)}, c_{(0)}, c_{(+2)}.\) General expressions of this form—sums of exponential terms with evenly spaced frequencies—are called Fourier series. This turns out to be a useful way to look at parameterized circuits!
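If you'd like to see this Fourier structure concretely, here is a small PennyLane sketch (not part of the original, code-free demo). We take \(W\) to be \(RY(1.2)\) and \(U(\theta)\) to be \(RZ(\theta) = e^{-i\theta Z/2},\) so \(\gamma = 1/2\) and the frequencies \(0, \pm 2\gamma\) are just \(0, \pm 1;\) the specific angle and the \(X\) measurement are arbitrary choices. Sampling \(f(\theta)\) on a grid and taking a discrete Fourier transform exposes the three coefficients.

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def f(theta):
    qml.RY(1.2, wires=0)        # the "W" gate: creates a superposition of |0> and |1>
    qml.RZ(theta, wires=0)      # U(theta) = exp(-i theta Z / 2), so gamma = 1/2
    return qml.expval(qml.PauliX(0))

# Sample f on an equally spaced grid and read off its Fourier coefficients.
N = 8
thetas = 2 * np.pi * np.arange(N) / N
coeffs = np.fft.fft([float(f(t)) for t in thetas]) / N

# Only the frequencies 0 and ±1 (indices 0, 1, and N-1) can be nonzero.
print(np.round(coeffs, 6))
```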

Larger circuits

We can embed this structure, with a single occurrence of \(U(\theta),\) into a larger circuit. Instead of a linear combination of \(\vert0\rangle\) and \(\vert 1\rangle,\) the state entering the gate will be a linear combination of the form

\[\alpha_0 \vert 0\rangle \otimes \vert \psi_0\rangle + \alpha_1 \vert 1\rangle \otimes\vert \psi_1\rangle,\]

for some states \(\vert \psi_0\rangle\) and \(\vert \psi_1\rangle\) on the rest of the circuit, up to a reordering of wires. (This follows by factorizing with respect to the tensor product.)

After applying \(U(\theta),\) the overall state will become

\[e^{-i\theta\gamma}\alpha_0 \vert 0\rangle \otimes\vert \psi_0\rangle + e^{+i\theta\gamma}\alpha_1 \vert 1\rangle \otimes\vert \psi_1\rangle.\]

Whatever subsequent gates we apply, as long as there is only one occurrence of \(U(\theta),\) this expansion in frequencies \(e^ {\pm i\theta\gamma}\) is the same. We illustrate how a single copy of \(U (\theta)\) acts on the larger circuit below.

If there are multiple copies of \(U(\theta)\) in the circuit, we simply get more frequencies. Each box will multiply existing coefficients by \(e^{-i\gamma\theta}\) or \(e^{+i\gamma\theta},\) depending on the state it encounters. This will iteratively build up a state of the following form:

\[\vert \psi(\theta)\rangle = \alpha_{0\cdots 00}e^{-in\gamma\theta}\vert \psi_{0\cdots 00}\rangle + \alpha_{0\cdots 01}e^{-i(n-2)\gamma\theta}\vert \psi_{0\cdots 01}\rangle + \cdots + \alpha_{1\cdots 11}e^{+in\gamma\theta}\vert \psi_{1\cdots 11}\rangle,\]

where the first term corresponds to choosing \(e^{-i\gamma \theta}\) in each box, and the last term \(e^{+i\gamma \theta}\) in each box. Note that, for intermediate frequencies, many different choices yield the same final result! We illustrate for \(n = 2\) below, where the total state at the end of the circuit is

\[\vert \psi(\theta)\rangle = \alpha_{00}e^{-i2\theta\gamma} \vert \psi_{00}\rangle + \alpha_{01}e^{i0\gamma}\vert \psi_{01}\rangle + \alpha_{10}e^{i0\gamma}\vert \psi_{10}\rangle + \alpha_{11}e^{+i2\theta\gamma} \vert \psi_{11}\rangle.\]

Here, there are two ways to obtain a frequency of zero.

Tip

Click on the magenta boxes to choose \(e^{\pm i\gamma\theta}\) in each box. The final frequency is shown on the right. Although we are placing the boxes in parallel, the result is the same if they are placed in series.

As usual, we can form the expectation value of a measurement \(f(\theta) = \langle \psi(\theta)\vert M \vert \psi (\theta)\rangle\). This will be a linear combination of overlaps of the states above. For instance, when \(n =2,\) it will be built from overlaps of the states (including the associated phases)

\[e^{-i2\theta\gamma}\vert \psi_{00}\rangle, \quad e^{i0\gamma}\vert \psi_{01}\rangle, \quad e^{i0\gamma}\vert \psi_{10}\rangle, \quad e^{+i2\theta\gamma}\vert \psi_{11}\rangle.\]

Thus, the expectation is a Fourier series, with terms arising from all the different ways of combining these frequencies. For general \(n,\) this is

\[f(\theta) = c_{(-2n)}e^{-2in\gamma\theta} + \cdots + c_{(0)} + \cdots + c_{(2n)}e^{2in\gamma\theta}.\]

Below, we illustrate for \(n = 2,\) where the Fourier series consists of five terms:

\[f(\theta) = c_{(-4)}e^{-4i\gamma\theta} + c_{(-2)}e^{-2i\gamma\theta} + c_{(0)} + c_{(+2)}e^{+2i\gamma\theta} +c_{(+4)}e^{+4i\gamma\theta}.\]

Tip

Click on the magenta boxes to choose \(e^{\pm i\gamma\theta}\) in each box, contributing to the final frequency. This frequency is shown below the circuit.
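Here is a similar (unofficial) sketch for \(n = 2:\) the same parameter \(\theta\) enters two \(RZ\) gates, so with \(\gamma = 1/2\) the frequencies are \(0, \pm 1, \pm 2.\) The particular entangling layout, angles, and measurement are arbitrary; the point is that the discrete Fourier transform of \(f\) has no support beyond \(\vert\omega\vert = 2.\)

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def f(theta):
    qml.RY(0.9, wires=0)
    qml.RY(0.4, wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RZ(theta, wires=0)      # first occurrence of U(theta), gamma = 1/2
    qml.RZ(theta, wires=1)      # second occurrence
    qml.CNOT(wires=[1, 0])
    return qml.expval(qml.PauliX(0))

# With n = 2 and gamma = 1/2, the expectation has frequencies 0, ±1, ±2.
N = 8
thetas = 2 * np.pi * np.arange(N) / N
coeffs = np.fft.fft([float(f(t)) for t in thetas]) / N

# Entries at |frequency| = 3 and 4 (indices 3, 4, 5) vanish.
print(np.round(coeffs, 6))
```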

Coefficient vectors

So far, we’ve focused on the \(\theta\)-dependent “pure frequency” terms \(e^{i\omega\theta\gamma}\) appearing in the Fourier series. However, the coefficients \(c_\omega\) also have an important role to play. It turns out that, for a given set of frequencies in the Fourier series, the coefficients uniquely characterize \(f(\theta).\) We’ll redo the \(n = 2\) example but highlight the coefficients instead; these depend on the structure of the circuit and the choice of measurement.

Tip

Click on the magenta boxes to choose \(e^{\pm i\gamma\theta}\) in each box, contributing to the final frequency. The coefficient of this final frequency is shown below the circuit.

The set of Fourier sums of fixed degree forms a vector space, and the pure frequencies \(e^{i\omega \theta\gamma}\) are a basis. We can assemble these coefficients into a column vector, \(\vec{c}_f,\) which contains the same information as the function \(f(\theta):\)

\[\begin{split}f(\theta) = c_{(-2n)} e^{-i2n\theta\gamma} + \cdots + c_{(+2n)} e^{+i2n\theta\gamma} \quad \Leftrightarrow \quad \vec{c}_f = \begin{bmatrix} c_{(-2n)} \\ \vdots \\ c_{(2n)} \end{bmatrix}.\end{split}\]

If an operation on the function \(f(\theta)\) preserves the structure of the function, i.e., it only modifies the coefficients, then we can think of it as an operation on vectors instead! Our first example is differentiation. This simply pulls down a constant factor from the exponent of each pure frequency:

\[f'(\theta) = (-i2n\gamma)c_{-2n} e^{-i2n\theta\gamma} + \cdots + (+i2n\gamma)c_{+2n} e^{+i2n\theta\gamma}.\]

In terms of the coefficient vectors, however, it is just a diagonal matrix \(D:\)

\[\begin{split}\vec{c}_{f'} = \begin{bmatrix} (-i2n\gamma)c_{-2n} \\ \vdots \\ (+i2n\gamma)c_{2n} \end{bmatrix} = \begin{bmatrix} -i2n\gamma & & \\ & \ddots & \\ && +i2n\gamma \end{bmatrix} \begin{bmatrix} c_{-2n} \\ \vdots \\ c_{2n} \end{bmatrix} = D\vec{c}.\end{split}\]

Tip

The derivative of the expectation \(f(\theta)\) as a coefficient vector. Click on the magenta boxes to choose terms contributing to the final frequency.
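To see the differentiation matrix in action, here is a small NumPy toy (an assumed setup, not from the original demo: a random real Fourier series with integer frequencies \(\omega = -2, \ldots, 2,\) i.e. \(\gamma = 1/2\) and \(n = 2\)). Applying \(D = \mbox{diag}(i\omega)\) to the coefficient vector reproduces the derivative.

```python
import numpy as np

# A toy real Fourier series with integer frequencies w = -2, ..., 2 (gamma = 1/2, n = 2).
rng = np.random.default_rng(0)
w = np.arange(-2, 3)
c = rng.normal(size=5) + 1j * rng.normal(size=5)
c = (c + np.conj(c[::-1])) / 2                  # c_{+w} = conj(c_{-w}), so f is real

f = lambda t: np.real(np.sum(c * np.exp(1j * w * t)))

# Differentiation acts on the coefficient vector as the diagonal matrix D = diag(i * w).
D = np.diag(1j * w)
f_prime = lambda t: np.real(np.sum((D @ c) * np.exp(1j * w * t)))

t, eps = 0.37, 1e-6
print(f_prime(t), (f(t + eps) - f(t - eps)) / (2 * eps))   # the two values agree
```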

Another particularly simple operation on \(f(\theta)\) is to shift the parameter \(\theta\) by some constant amount \(s,\) giving a new function \(f_s(\theta) = f(\theta + s),\) also called a parameter shift. From index laws, this adds an exponential factor to each coefficient:

\[f(\theta + s) = c_{(-2n)} e^{-i2n(\theta + s)\gamma} + \cdots + c_{(+2n)} e^{+i2n(\theta + s)\gamma} = e^{-i2ns\gamma}c_{(-2n)} e^{-i2n\theta\gamma} + \cdots + e^{+i2ns\gamma}c_{(+2n)} e^{+i2n\theta\gamma}.\]

Once again, this can be viewed as a diagonal matrix \(T_s\) acting on the coefficient vector:

\[\begin{split}\vec{c}_{f_s} = \begin{bmatrix} e^{-i2ns\gamma}c_{-2n} \\ \vdots \\ e^{+i2ns\gamma}c_{2n} \end{bmatrix} = \begin{bmatrix} e^{-i2ns\gamma} & & \\ & \ddots & \\ && e^{+i2ns\gamma} \end{bmatrix} \begin{bmatrix} c_{-2n} \\ \vdots \\ c_{2n} \end{bmatrix} = T_s\vec{c}.\end{split}\]

Tip

The parameter shifted expectation \(f(\theta + s)\) as a coefficient vector. Click on the magenta boxes to choose terms contributing to the final frequency.
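The shift matrix \(T_s\) can be checked in the same spirit (same toy Fourier series as above; the values of \(s\) and \(\theta\) are arbitrary).

```python
import numpy as np

# Same toy Fourier series: frequencies w = -2, ..., 2 with coefficient vector c.
rng = np.random.default_rng(0)
w = np.arange(-2, 3)
c = rng.normal(size=5) + 1j * rng.normal(size=5)

f = lambda t: np.sum(c * np.exp(1j * w * t))

# Shifting theta -> theta + s acts as the diagonal matrix T_s = diag(e^{i w s}).
s, t = 0.81, 0.37
T_s = np.diag(np.exp(1j * w * s))
f_shifted = lambda t: np.sum((T_s @ c) * np.exp(1j * w * t))

print(f(t + s), f_shifted(t))   # the two values agree
```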

The two-term parameter-shift rule

Our original motivation for introducing \(\theta\) was to optimize the measurement result \(f(\theta).\) If we can differentiate \(f (\theta)\), we can use tools from classical machine learning such as gradient descent. The problem is that circuits are black boxes; all we can do is set some parameters, pull a lever, and out pops a measurement outcome. It’s a bit like a toaster. How do you differentiate a toaster?

Tip

Click the measurement button for toasty measurement outcomes.

Luckily, the magic of Fourier series and coefficient vectors comes to the rescue. The basic idea is to write the differentiation matrix \(D\) as a linear combination of shift matrices \(T_s.\) Although we can’t differentiate directly, we _can_ change parameters! Let’s illustrate with the simple example of \(n = 1.\) In this case, the matrices are

\[\begin{split}D = \begin{bmatrix}-2i\gamma && \\ & 0 & \\ & & +2i\gamma \end{bmatrix}, \quad T_s = \begin{bmatrix}e^{-2i\gamma s} && \\ &1& \\ && e^{+2i\gamma s} \end{bmatrix}.\end{split}\]

It’s easy to check that for any \(s,\)

\[D = \frac{\gamma}{\sin(2\gamma s)}(T_s - T_{-s}).\]

Translating back into statements about functions, we learn that

\[f'(\theta) = \frac{\gamma}{\sin(2\gamma s)}[f(\theta + s) - f(\theta - s)].\]

This is called the two-term parameter-shift rule.
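Here is a quick numerical check of the two-term rule, again using \(RZ(\theta)\) so that \(\gamma = 1/2\) (the gate choices, angles, and shift below are arbitrary, and the comparison is against a simple finite difference). With \(\gamma = 1/2\) and \(s = \pi/2,\) the prefactor \(\gamma/\sin(2\gamma s)\) becomes \(1/2,\) recovering the most familiar form of the rule.

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def f(theta):
    qml.RY(1.2, wires=0)
    qml.RZ(theta, wires=0)      # gamma = 1/2, so f has frequencies 0 and ±1
    return qml.expval(qml.PauliX(0))

gamma, theta, s = 0.5, 0.4, 0.9

# Two-term rule: f'(theta) = gamma / sin(2 gamma s) * [f(theta + s) - f(theta - s)].
shift_grad = gamma / np.sin(2 * gamma * s) * (f(theta + s) - f(theta - s))

# Compare against a simple central finite difference.
eps = 1e-4
fd_grad = (f(theta + eps) - f(theta - eps)) / (2 * eps)
print(float(shift_grad), float(fd_grad))   # the two values agree
```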

Tip

Click anywhere for gratuitous toast.

The two-term rule has a simple geometric interpretation. Changing \(\theta\) takes us around a circle of fixed radius \(r = \vert c_{(\pm2)}\vert\) at angular speed \(2\gamma.\)

Note

Note that parameter shifts add phases and don’t change \(\vert c_ {(\pm 2)}\vert\). The radius is given by either, since the reality of measurement outcomes implies \(c_{(+2)} = \overline{c_{(-2)}}.\)

The derivative \(f'(\theta)\) is a tangent vector of length \(2r\gamma.\) We can choose Cartesian coordinates where this tangent vector is vertical, with components

\[f'(\theta) = (0, 2r\gamma).\]

The parameter shifts \(f(\theta \pm s),\) on the other hand, have components

\[f(\theta \pm s) = (r \cos(2\gamma s), \pm r\sin(2\gamma s)).\]

It follows immediately that

\[f'(\theta) = \frac{\gamma}{\sin(2\gamma s)} [f(\theta + s) - f(\theta - s)].\]

We picture the geometry below. On the right, tangent to the circle, is \(f'\) as a coefficient vector in the \(c_{\pm2}\) plane. We display \(f(\theta + s)\) and \(-f(\theta - s)\) as light magenta lines. Their vector sum \(\Delta f\) is a dark magenta line.

Tip

Horizontal mouse position controls \(s.\)

The general parameter-shift rule

For \(n > 1,\) parameter shift rules don’t have a simple geometric interpretation. The problem is that each pair of coefficients \(c_{\pm \omega}\) is associated with a circle governing two components of the coefficient vector. To find the derivative, we need to understand each pair of components separately, but parameter shifts simultaneously wind us around all the circles, at different speeds! Geometrically speaking, putting all these circles together gives a higher-dimensional donut, which parameter shifts wind us around. This sounds complicated!

It also looks complicated, as we illustrate for \(n = 3\) below. We set \(\gamma = 1\) so that, on the circle in the \(c_{\pm \omega}\) plane, we execute \(\omega\) revolutions for a single cycle of \(s.\)

Tip

Horizontal mouse position controls \(s.\) For instance, if we place it in the middle of the interval, \(s = \pi.\) This translates into an angle \(\pm \pi\gamma\omega\) on the circle labelled \(c_{(\pm \omega)}.\)

Perhaps surprisingly, the coefficient vector perspective and a few tricks let us derive the general parameter-shift rule straightforwardly. We start with the observation that \(f'(\theta)\) has a coefficient vector with \(2n\) nonzero components, since the constant term always vanishes. Thus, \(2n\) linearly independent parameter shifts \(f(\theta + s_k)\) should be sufficient to reconstruct the derivative, with

\[f'(\theta) = \sum_{k=1}^{2n} \beta_k f(\theta + s_k) \tag{1}\]

for some coefficients \(\beta_k.\) The problem is how to find the shifts and coefficients! We can invoke linear algebra to find coefficients, but only once we choose shifts, and it’s not obvious how to get them to be independent. We can see the problem for \(n = 3,\) where we choose six random shifts. Are they independent or not? It seems hard to tell. Is there a better approach?

Tip

Horizontal mouse position controls \(s,\) which is now “quantized” with random shifts.

Thankfully, there is! We can solve two problems at once by introducing an inner product, i.e., a way to find the scalar overlap of two vectors. This will let us identify orthogonal and hence independent shifts \(s_k.\) Since they are orthogonal, we can also easily determine the coefficients \(\beta_k.\) The idea is straightforward: since the matrices of interest are diagonal,

\[D = \mbox{diag}(-2in\gamma, \ldots, +2in\gamma), \quad T_s = \mbox{diag} (e^{-2in\gamma s}, \ldots, e^{+2in\gamma s}),\]

we can just pluck out the vector of diagonal entries and define a complex inner product in the usual way. Technically, this is the Frobenius inner product for matrices:

\[\langle A, B\rangle = \mbox{Tr}[A^\dagger B].\]

Consider two shifts \(s, t \in [0, 2\pi/\gamma),\) and define \(\omega = e^{2i\gamma(t-s)}.\) The inner product of diagonal shift matrices \(T_s, T_t\) is

\[\langle T_s, T_t\rangle = \sum_{j=-n}^n \omega^j = \omega^{-n}\sum_{j=0}^ {2n} \omega^j= \frac{\omega^{-n}(1 - \omega^{2n+1})}{1 - \omega} \tag{2}\]

using the geometric series. Before moving on, let’s visualize what these inner products look like for \(n = 3.\) The expression (2) is a sum of phases, which we can add tip-to-tail on the complex plane. We’ve added a big \(\mathbb{C}\) to distinguish this from other planes we’ve been looking at.

Tip

We display the inner product \(\langle T_s, T_t\rangle\) below. Horizontal mouse position controls \(s.\) The choice of \(t\) is “quantized” to the random shifts from above; click to the left to set it. The phases summed are light magenta, and the total is dark magenta.

As expected, our random shifts are not orthogonal. But with (2) in hand, it’s easy to choose orthogonal vectors! We simply select our shifts \(s_j\) so that the numerator of (2) vanishes:

\[\omega^{2n + 1} = e^{2 i \gamma(2n+1)(s_k - s_j)} = 1.\]

This occurs if \(\gamma(2n + 1)(s_k - s_j)\) is always an integer multiple of \(\pi.\) A natural choice is

\[s_k = \frac{k\pi}{\gamma(2n+1)},\]

for \(k = 1, \ldots, 2n.\) In this case, we have the orthogonality relation

\[\langle T_{s_j}, T_{s_k}\rangle = (2n+1)\delta_{jk}.\]

Thus, spacing shifts equally around the \(s\) circle gives us an orthogonal set of shifts. We picture these equally spaced shifts, and check visually they are orthogonal, for \(n=3\) below. Select any of the equally spaced points, and you can see that its inner product with another of the equally spaced points vanishes.

Tip

Horizontal mouse position controls \(s.\)
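As a further sanity check, we can build the shift matrices in NumPy and confirm the orthogonality relation directly (the values of \(\gamma\) and \(n\) below are arbitrary).

```python
import numpy as np

gamma, n = 0.75, 3
idx = np.arange(-n, n + 1)                       # j = -n, ..., n

def T(s):
    # Diagonal shift matrix with entries e^{2 i j gamma s}.
    return np.diag(np.exp(2j * gamma * s * idx))

# Equally spaced shifts s_k = k * pi / (gamma * (2n + 1)), k = 0, ..., 2n.
shifts = [k * np.pi / (gamma * (2 * n + 1)) for k in range(2 * n + 1)]

# Frobenius inner products <T_{s_j}, T_{s_k}> = Tr[T_{s_j}^dagger T_{s_k}].
gram = np.array([[np.trace(T(sa).conj().T @ T(sb)) for sb in shifts] for sa in shifts])
print(np.round(gram.real, 6))                    # (2n + 1) times the identity
```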

Orthogonality makes finding the coefficients \(\beta_k\) easy: we simply take the inner product between \(T_{s_k}\) and the left-hand side of (1), expressed in terms of the differentiation matrix \(D.\) This gives

\[\begin{align*} \beta_k & = \frac{2i\gamma}{2n+1}\sum_{j=-n}^n j \omega_k^j \end{align*}\]

for \(\omega_k = e^{-2\pi i k/(2n+1)}.\) This looks tricky, but we can start with a geometric series, differentiate with respect to \(\omega\), and multiply by \(\omega_k,\) so we get what we want:

\[\omega_k \partial_{\omega_k} \sum_{j=-n}^n \omega_k^j = \sum_{j=-n}^n j\omega_k^j.\]

We already computed the geometric series in (2). Plugging that back in, differentiating, and using the fact that \(\omega_k^{2n+1} = 1,\) we finally get

\[\begin{align*} \beta_k & = \frac{2i\gamma}{2n+1}\cdot\frac{\omega_k^{-n}(2n+1)(\omega_k - 1)}{(\omega_k - 1)^2} = \frac{2 i\gamma \omega_k^{-n}}{\omega_k-1}. \end{align*}\]

Putting these together with our shifts, we have our general parameter-shift rule:

\[f'(\theta) = \sum_{k=1}^{2n}\frac{2 i\gamma \omega_k^{-n}}{\omega_k-1}f\left(\theta + \frac{k\pi}{\gamma(2n+1)}\right).\]
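To close the loop, here is a NumPy check of the general rule on a random real Fourier series with frequencies \(2j\gamma\) (the values of \(\gamma,\) \(n,\) and \(\theta\) below are arbitrary). The shifts and coefficients are exactly those derived above.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, n = 0.6, 3

# A random real Fourier series with frequencies 2 * j * gamma, j = -n, ..., n.
idx = np.arange(-n, n + 1)
c = rng.normal(size=2 * n + 1) + 1j * rng.normal(size=2 * n + 1)
c = (c + np.conj(c[::-1])) / 2                   # enforce c_{+j} = conj(c_{-j})

f = lambda t: np.real(np.sum(c * np.exp(2j * gamma * t * idx)))
df = lambda t: np.real(np.sum(2j * gamma * idx * c * np.exp(2j * gamma * t * idx)))

# General parameter-shift rule with shifts s_k = k * pi / (gamma * (2n + 1)).
theta, total = 0.3, 0.0
for k in range(1, 2 * n + 1):
    w_k = np.exp(-2j * np.pi * k / (2 * n + 1))
    beta_k = 2j * gamma * w_k ** (-n) / (w_k - 1)
    total += beta_k * f(theta + k * np.pi / (gamma * (2 * n + 1)))

print(np.real(total), df(theta))                 # the two values agree
```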

The approach outlined here only works when the frequencies in the problem are evenly spaced. However, there are ways to generalize further. Even without orthogonality, we can find independent shifts and solve the linear algebra problem (1) for the coefficients. Alternatively, we can use randomization to obtain shifts that are orthogonal on average, leading to the stochastic parameter-shift rule.

Conclusion

We’ve seen that, by viewing each parameterized gate as an opportunity to choose eigenvalues, the expectation value of the circuit naturally expands as a Fourier series: a sum of frequency terms corresponding to the eigenvalues chosen along the way, with coefficients that depend on the rest of the circuit. By thinking about the structure of the coefficients themselves, we derived parameter-shift rules, which let us evaluate the derivative of a circuit as a linear combination of its expectation value at shifted parameter values. This is a key part of the machinery of machine learning with quantum circuits, since we cannot differentiate the circuit directly.

If you’d like to learn more, there are many papers on the topic. Two particularly clear references:

Handily, both references have an associated PennyLane demo!

Finally, PennyLane is jam-packed with tools for analyzing circuits as Fourier series. Check out the documentation on the Fourier module to learn more!

About the author

David Wakeham

Having fun at the intersection of quantum algorithms, symmetry, statistical learning theory and large language models.