Contents

  1. Instantaneously deep quantum neural networks
    1. Recipe for sampling from an IDQNN
    2. Recipe for sampling from the deep circuit
  2. Proving hardness for sampling
  3. Adding input states
    1. Recipe for sampling from an IDQNN with inputs
  4. The learning problem
  5. Does this bring us closer to useful quantum AI?
    1. The necessity of prior knowledge
    2. Learning from simple statistics
    3. What can lead us to genuine usefulness?
  6. References
  7. About the author

Generative quantum advantage for classical and quantum problems

Joseph Bowles


Published: November 27, 2025. Last updated: November 28, 2025.

Generative machine learning is all about inferring and sampling from probability distributions, and sampling output distributions of quantum computers is known to be classically hard. So proving quantum advantages for generative quantum machine learning is easy, right? Unfortunately, things are not so simple. As H. Huang and colleagues point out in their recent preprint “Generative quantum advantage for classical and quantum problems” [1], claiming an advantage for generative machine learning should require not only the ability to sample from a hard distribution, but also the ability to learn that distribution efficiently from data, and they investigate a specific scenario in which this is possible.

In this demo we will unpack one of the main results of their paper to understand its core mechanics. We will see that the problem is constructed so that learning the hard distribution boils down to performing single-qubit tomography, and we will discuss the scope of this technique in relation to practical AI. In particular, we will focus on the first theorem (Theorem 1) of the paper, since it aligns most closely with the notion of generative machine learning in the classical literature. It is informally stated as:

Theorem 1 (Informal: Classically hard, quantumly easy generative models)

Under standard complexity-theoretic conjectures, there exist distributions \(p(y|x)\) mapping classical n-bit strings to m-bit strings that a quantum computer can efficiently learn to generate using classical data samples, but are hard to generate with classical computers.

To show the above, we need to do a couple of things:

  • Identify a classically ‘hard’ conditional distribution \(p(y|x)\) that corresponds to a family of quantum circuits. For this, we can leverage some existing results about the hardness of sampling.

  • Show that, with access to a dataset obtained by querying and sampling from \(p(y|x)\), we can infer the circuits that produced the data, and can therefore generate more data.

The paper gives a couple of circuit structures that can be used. We will focus on the simplest, which they term instantaneously deep quantum neural networks (IDQNNs).

Instantaneously deep quantum neural networks

Instantaneously deep quantum neural networks, or IDQNNs, are a particular type of shallow parameterized quantum circuit. The qubits of the circuit live on a lattice, which we’ll take to be a 2D lattice, and we index the qubits by their lattice positions \((i,j)\). To sample from the circuit one does the following.

Recipe for sampling from an IDQNN

  1. Prepare each qubit on the lattice in the \(\ket{+}\) state.

  2. Entangle the qubits by performing controlled-Z gates between some pairs of nearest-neighbour qubits on the lattice. If the qubits are horizontal neighbours on the lattice, a CZ is always applied.

  3. Perform a single-qubit Z rotation \(U_{z}(\theta_{ij})=\exp(-\frac{i}{2}\theta_{ij}Z)\) with parameter \(\theta_{ij}\) on each of the qubits.

  4. Measure all qubits in the X basis to produce outcomes \(y_{ij}\).

We can depict this graphically as follows.

Depiction of an IDQNN circuit.

Note that this is not a circuit diagram, but a graphical representation of the circuit description above: qubits are denoted by black dots, CZ gates are lines between dots, the angles specify the single-qubit rotations, and the blue \(y_{ij}\) are the X-measurement outcomes. We’ll also use the vector \(\boldsymbol{y}\) from now on to denote all the \(y_{ij}\).

The figure above depicts a circuit acting on a 2D lattice of 12 qubits, and this structure corresponds to a shallow circuit since the circuit depth does not depend on the size of the lattice. However, we can map this onto an equivalent deep 1D circuit with only 3 qubits by viewing the circuit as a measurement-based quantum computation (MBQC) recipe. When viewed from this perspective, the horizontal dimension of the lattice becomes a time axis, so that at time 0, the system consists of just three qubits (the first vertical axis) prepared in the \(\ket{+}\) state. After applying all CZ and rotation gates that involve these qubits and measuring them in the X basis, the state is teleported to the next line of three qubits (the precise state will depend on the measurement outcomes \(y_{11}\), \(y_{12}\), \(y_{13}\)). Repeating this process until we arrive at the end of the lattice therefore defines a type of stochastic quantum circuit acting on three qubits. The precise way to make this mapping is well known from MBQC theory, and is a bit tricky (see Appendix H2 of their paper [1]), so we won’t bore you with the details here.

If you apply the mapping to our example IDQNN, you find the following circuit:

Equivalent deep circuit that reproduces the IDQNN statistics

The circuit structure for layers 2 and 3 is the same as for layer 1, where the CZ structure is determined by the vertically acting CZ gates that appear in the second and third vertical axis of the lattice. The inputs to the Z gates are classical controls, i.e. the gate is applied only if \(y_{ij}=1\).

To generate samples from this deep circuit we do the following:

Recipe for sampling from the deep circuit

  1. Generate all bits \(y_{ij}\) for \(i<4\) uniformly at random using a classical random number generator.

  2. Run the above circuit, controlling the Z gates on these bits.

  3. Measure the output of the circuit to obtain the final three bits \(y_{41}\), \(y_{42}\), \(y_{43}\).

One can show that the distribution \(p(\boldsymbol{y})\) obtained with the above recipe is identical to the one we described for the IDQNN, and so the two methods are indistinguishable if just given samples \(\boldsymbol{y}\).

If these two circuits lead to the same distribution then why did we do this? The reason is that qubit counts on quantum hardware are still limited. By implementing the deep 1D circuit on a few qubits however, you can simulate the distribution of a 2D shallow circuit on many more qubits (the authors call this circuit ‘compression’). This trick is used to simulate a shallow IDQNN circuit on 816 qubits using a deep circuit with just 68 qubits. To do this, they actually work with a deep circuit on a 2D lattice, and map it to a shallow circuit on a 3D lattice. This obviously complicates things a bit (and makes drawing pictures a lot harder!) so we will stick to the 2D vs 1D example above; in the end, it will contain everything we need to understand the result for higher dimensional lattices.

Proving hardness for sampling

It turns out that—if we remove the classically controlled Z gates for now—the circuit structure of the deep circuit above is universal. That is, any \(n\)-qubit circuit with two qubit gates can be efficiently approximated by sequential layers of Hadamards, Z rotations and controlled-Z gates on an \(n\) qubit computational basis input. We can therefore use this fact to define a circuit that is hard to sample from classically: simply take your favourite pre-existing hardness results for sampling (for example, [2]) and compile the circuit to the H, RZ, CZ gateset. We can then embed this into the precise structure we had above by inserting the classically controlled-Z gates at every layer. If we happen to sample the all-zero bitstring for the \(y_{ij}\) values that control these gates, then we will sample from this hard distribution. In this sense the distribution \(p(\boldsymbol{y})\) is ‘hard’ since any classical algorithm will fail to reproduce the full statistics in this case. Moreover, since the distribution of the IDQNN is identical, it follows that the corresponding IDQNN is also hard to sample from.
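As a small sanity check of this gateset's expressiveness, the CNOT gate (the standard two-qubit workhorse) decomposes exactly into it: conjugating the target qubit of a CZ by Hadamards gives a CNOT, which we can verify numerically.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CZ = np.diag([1.0, 1, 1, -1])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# (I ⊗ H) CZ (I ⊗ H) = CNOT, so {H, RZ, CZ} can build any circuit of
# single-qubit rotations and CNOTs
assert np.allclose(np.kron(I, H) @ CZ @ np.kron(I, H), CNOT)
```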

Adding input states

At this point, we have a shallow circuit called an IDQNN, a way to map it to a deep circuit structure, and an argument that the distributions \(p(\boldsymbol{y})\) resulting from these circuits are hard to sample from classically. However, we don’t yet have everything in order to be able to learn. The last ingredient we need comes in the form of an input \(x\). This will mean that rather than working with the probability distribution \(p(\boldsymbol{y})\), we will work with a conditional probability distribution \(p(\boldsymbol{y}|x)\).

For each \(x\), the probability distribution \(p(\boldsymbol{y}|x)\) corresponds to an IDQNN where—rather than all qubits being in the \(\vert + \rangle\) state—each input qubit can be prepared in either the \(\vert + \rangle\) or \(\vert 0 \rangle\) state, which is determined by the input \(x\). We therefore adapt the first step of our recipe for the IDQNN:

Recipe for sampling from an IDQNN with inputs

  1. Prepare each qubit in either the \(\vert + \rangle\) or \(\vert 0 \rangle\) state, depending on \(x\).

  2. Perform steps 2-4 as before.

In order to be able to prove the result, the choice and distribution of possible input states must satisfy a particular property called ‘local decoupling’ (see Appendix C2 of their paper [1]). One particularly simple choice that will work for our 2D IDQNN is the following choice of three inputs, \(x=0,1,2\) (in the paper a different choice is used, but the result will be the same).

If \(x=0\), all input qubits are prepared in the \(\vert + \rangle\) state. If \(x=1\), all qubits on the ‘even diagonals’ of the lattice are prepared in \(\vert + \rangle\), the remaining are prepared in \(\vert 0 \rangle\). If \(x=2\), all qubits on the ‘odd diagonals’ of the lattice are prepared in \(\vert + \rangle\), the remaining are prepared in \(\vert 0 \rangle\).

Pictorially, the choice looks like this.

Input choices for our example
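In code, the input rule can be written as a small helper. Here we use the convention that site \((i,j)\) lies on an even diagonal when \(i+j\) is even; this labelling is our assumption for the sketch (the paper uses a different input set):

```python
def prepare_in_plus(x, i, j):
    """Return True if lattice qubit (i, j) is prepared in |+>, else |0>.

    Assumes diagonals are indexed by the parity of i + j.
    """
    if x == 0:
        return True                       # all qubits in |+>
    even_diagonal = (i + j) % 2 == 0
    return even_diagonal if x == 1 else not even_diagonal

# x = 0: every site is |+>; x = 1 and x = 2 select complementary diagonals
assert prepare_in_plus(0, 2, 3)
assert prepare_in_plus(1, 1, 1) and not prepare_in_plus(2, 1, 1)
```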

If the input is 0, we already know what happens; this is just the IDQNN described in the previous sections. If the input is 1 or 2, things get very simple. Note that the CZ gate is symmetric, and can be written as

\[CZ = \vert 0 \rangle\langle 0 \vert \otimes \mathbb{I} + \vert 1 \rangle\langle 1 \vert \otimes Z = \mathbb{I} \otimes \vert 0 \rangle\langle 0 \vert + Z \otimes \vert 1 \rangle\langle 1 \vert.\]

Thus, if a CZ gate acts on the state \(\vert 0 \rangle\) on either side, the effect is that it becomes the identity. Since every CZ gate hits a qubit in the \(\vert 0 \rangle\) state for these inputs, we can actually remove them all. For example, the input \(x=1\) actually just corresponds to an unentangled product state:

Resulting IDQNN when the input is x=1
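A quick numerical check of the CZ identity above: whenever one side of the gate is \(\vert 0 \rangle\), the gate acts as the identity.

```python
import numpy as np

CZ = np.diag([1.0, 1, 1, -1])
ket0 = np.array([1.0, 0.0])

rng = np.random.default_rng(3)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)               # arbitrary single-qubit state

# CZ (|psi> ⊗ |0>) = |psi> ⊗ |0>: the gate acts trivially
state = np.kron(psi, ket0)
assert np.allclose(CZ @ state, state)
# by the symmetry of CZ, the same holds with |0> on the first qubit
assert np.allclose(CZ @ np.kron(ket0, psi), np.kron(ket0, psi))
```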

By performing mid-circuit measurements and resetting qubits, we can easily reproduce the statistics for the inputs \(x=1,2\). For example, for \(x=1\), the circuit looks as follows (we have removed the rotation gates for the qubits prepared in \(\vert 0 \rangle\) since this results in a global phase only).

The deep circuit with mid-circuit measurements that reproduces the IDQNN for x=1

The authors argue that the conditional distribution \(p(\boldsymbol{y}|x)\) should also be considered hard to sample from classically, since the input \(x=0\) corresponds to the case of the ‘no input’ IDQNN of the previous sections, for which we have already argued hardness. For the inputs \(x=1\) and \(x=2\), however, the resulting distribution has an efficient classical simulation since it corresponds to measurements made on unentangled single qubits.

The learning problem

We now have a classically hard conditional distribution \(p(\boldsymbol{y}|x)\), where each input \(x\) corresponds to an IDQNN with inputs that we know how to simulate with a deeper circuit on fewer qubits. At this point we are ready to learn.

We first need a dataset, which we create by repeating the following \(N\) times:

  • Randomly sample an input \(x=0,1,2\).

  • Implement the deep circuit that simulates the IDQNN for this input to generate a set of outcomes \(\boldsymbol{y}\). Add the pair \((x,\boldsymbol{y})\) to the dataset.
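Schematically, dataset creation looks like this, with `sample_idqnn` standing in for the deep-circuit simulation of the IDQNN (replaced here by a random-bit placeholder so the snippet runs on its own):

```python
import numpy as np

rng = np.random.default_rng(5)
n_outcome_bits = 12

def sample_idqnn(x):
    """Placeholder for the deep circuit that simulates the IDQNN with input x."""
    return rng.integers(0, 2, size=n_outcome_bits)

N = 1000
dataset = []
for _ in range(N):
    x = rng.integers(0, 3)          # randomly sample an input x in {0, 1, 2}
    y = sample_idqnn(x)             # run the circuit, collect outcomes y
    dataset.append((x, y))          # add the pair (x, y) to the dataset
```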

The precise definition of learning is given by definition 13 in the Appendix of the paper:

Definition 13 (The task of learning to generate classical bitstrings)

We are given a dataset of input-output bitstring pairs \((x,\boldsymbol{y})\). Each output bitstring \(\boldsymbol{y}\) is sampled according to an unknown conditional distribution \(p(\boldsymbol{y}|x)\). The goal is to learn a model from the dataset that can generate new output bitstrings \(\boldsymbol{y}\) according to the unknown distribution \(p(\boldsymbol{y}|x)\) for any given new input bitstring \(x\).

Although the above definition suggests the conditional distribution is unknown, we actually know a lot about it. In particular, we need to work under the assumption that we know the exact structure of the quantum circuits that produce the data, except for the rotation angles \(\theta_{ij}\) (i.e. this is included in the ‘prior knowledge’ of the problem). To learn, we therefore just need to infer the parameters \(\theta_{ij}\) from the data, which will allow us to generate new data by simply implementing the resulting circuits. This is very different from real world machine learning problems, where such a precise parametric form of the ground truth distribution is rarely known.

So how do we infer the parameters \(\theta_{ij}\) from data? Consider for example the data for input \(x=1\), and the outcome \(y_{12}\). From the above circuit we see that in this case the outcome is produced by this single qubit circuit:

The single-qubit circuit that is used to infer the parameter \theta_{12}

This is a measurement on a rotated single-qubit state, for which the expectation value for \(y_{12}\) is

\[\langle y_{12} \rangle = (1-\cos(\theta_{12}))/2.\]

Rearranging this equation we have

\[\theta_{12} = \arccos(1 - 2\langle y_{12} \rangle).\]

All we have to do to infer \(\theta_{12}\) is to look at the data for \(x=1\), estimate the expectation \(\langle y_{12} \rangle\) from the corresponding \(y_{12}\) values, and use the above formula; no gradients or training required (so clearly no barren plateaus either)! Note that all we are doing here is a form of single-qubit tomography, which you might encounter in a first course on quantum information.

The remaining parameters can be estimated in a similar way depending on whether they live on the even diagonal (which requires \(x=1\)) or the odd diagonal (which requires \(x=2\)). From the Hoeffding inequality, we can be sure that the estimates are close to the true values with high probability, and we have thus learned the parameters to low error. With this knowledge we can now sample data for the \(x=0\) input, which is known to be classically hard, and in this sense we have learned the distribution.

Does this bring us closer to useful quantum AI?

Now that the technicalities are out of the way, we can ask the really important question: does this bring us closer to genuinely useful QML algorithms? This question is speculated on briefly in the conclusion of the paper, where it is mentioned that “the precise remaining steps towards useful generative quantum advantage remain an open question”. But why is usefulness so enigmatic? As we explain below, a large part of the reason is that the setup we considered is significantly different from that of real-world generative machine learning problems.

The necessity of prior knowledge

The first of these differences concerns the ‘prior knowledge’ that needs to be assumed in order to prove the result. Notice that, if we were given the data \({(x, \boldsymbol{y})}\) but told nothing else, we would not have known how to infer the parameters, nor how to produce new samples once these parameters were known, since these both required knowledge of the circuit. That is, precise knowledge of the circuit structures that produced the data was necessary in order to learn. In reality, such a precise form of the ground truth distribution is not known and models have to be learnt with vastly less prior knowledge. In their current form, these techniques therefore only appear useful for tasks related to quantum circuit tomography, where such knowledge is part of the problem description.

Learning from simple statistics

The second difference concerns how the parameters were learned. The parameters of modern classical generative models must be learned by an iterative process (often gradient descent) that extracts their values from complex, multidimensional correlations that are present in the statistics of the training data. For example, the parameters of convolutional filters in image models are adjusted to capture highly non-linear correlations, such as how edges, textures, and object parts co-occur across different spatial locations. In the quantum setup, although the distribution for the input \(x=0\) is undoubtedly complex, the data that is used for learning comes from the much simpler product distributions in which there are no correlations between bits.

As an illustrative comparison, imagine a large classical generative model (such as a diffusion model or a transformer) with parameters \(\theta\) and corresponding output distribution \(p(\boldsymbol{y})\). Suppose we want to learn the parameters \(\theta\) from data. To do this we construct a conditional distribution \(p(\boldsymbol{y}|x)\) which does the following:

  • For \(x=0\), the model samples the generative model distribution \(p(\boldsymbol{y})\)

  • For \(x=1\), \(\boldsymbol{y}\) just returns the parameters \(\theta\)

Obviously, we can learn the parameters of the model from \(p(\boldsymbol{y}|x)\): we just look at the data for \(x=1\) and read them off directly from \(\boldsymbol{y}\). Our quantum example is not dramatically different from this, since for the inputs \(x=1,2\) we have a simple method to read off the parameters from the statistics (single-qubit tomography), and this method is known beforehand rather than being learned from the data. In effect, we have set up the problem so that inferring parameters is straightforward for some inputs, whilst sampling is hard for others, and this process of learning is very different from the complex process that occurs in modern neural networks. We note that the specific example in the paper is more involved than this, and uses a higher dimensional lattice and a different set of inputs, but the strategy is the same: for each parameter, there is a reasonable fraction of the inputs that leaves the relevant qubit unentangled from the rest, and single-qubit statistics reveals the desired value.

What can lead us to genuine usefulness?

In order to uncover genuine usefulness in quantum machine learning we therefore need to move to scenarios that mirror the assumptions of realistic learning problems. If the flipside of this is that proving complexity theoretic separations becomes seemingly impossible, then perhaps they are not the right goals to be pursuing [3]? A more fruitful direction may be to instead look for other tools to evaluate the performance of quantum models. Although this means waving goodbye to computational complexity theory, this freedom just might be the fresh perspective we need to uncover genuinely useful quantum machine learning algorithms.

References

[1]

H. Huang, M. Broughton, N. Eassa, H. Neven, R. Babbush, J. R. McClean “Generative quantum advantage for classical and quantum problems.” arXiv:2509.09033, 2025.

[2]

T. Bergamaschi, C. Chen, Y. Liu “Quantum Computational Advantage with Constant-Temperature Gibbs Sampling.” 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), 2024.

[3]

M. Schuld, N. Killoran “Is Quantum Advantage the Right Goal for Quantum Machine Learning?” PRX Quantum 3, 030101, 2022.

[4]

D. Wakeham, M. Schuld “Inference, interference and invariance: How the Quantum Fourier Transform can help to learn from data.” arXiv:2409.00172, 2024.

[5]

E. Recio-Armengol, S. Ahmed, J. Bowles “Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits.” arXiv:2503.02934, 2025.

About the author

Joseph Bowles

Quantum Machine Learning researcher at Xanadu
