
A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery

Published: June 2, 2025. Last updated: June 2, 2025.

In surface-code-based fault-tolerant quantum computing architectures, T gates are typically implemented via injected magic states. The layout and design of the architecture play a crucial role in how fast a magic state can be reliably produced and consumed for computation. The game of surface codes [1] allows us to reason about such space-time trade-offs in architecture designs without having to get into the nitty-gritty details of surface code physics. In this demo, we will see how different designs can lead to faster computations at the cost of involving more qubits, and vice versa.

/_images/Hero_Game_of_Surface_Codes.png

Introduction

The game of surface codes [1] is a high-level framework for designing surface code quantum computing architectures. The game helps us understand space-time trade-offs, where designs with a higher qubit overhead allow for faster computations and vice versa. For example, a space-efficient design might allow a computation with $10^8$ T gates to run in $4$ hours using $55k$ physical qubits, whereas an intermediate design may run the same computation in $22$ minutes using $120k$ physical qubits, or a time-optimized design in $1$ second using $1500$ interconnected quantum computers, each with $220k$ physical qubits.

One can draw a rough comparison to microchip design in classical computing, where the equivalent game would be about how to arrange the transistors of a chip to perform fast and efficient computations.

The game can be understood entirely from the rules described in the next section. However, it still helps to understand the correspondences in physical fault-tolerant quantum computing (FTQC) architectures. First of all, it is important to note that we consider surface codes that implement (Clifford + T) circuits. In particular, these circuits can be compiled to circuits that only perform Pauli product measurements. This is because all Clifford operations can be moved to the end of the circuit and merged with measurements. The remaining non-Clifford gates are realized by magic state injection and further Clifford operations, which can again be merged with measurements. Hence, we mainly care about performing measurements on qubits in arbitrary bases and efficiently distilling and injecting magic states.

We also note that the patches that represent qubits correspond to surface code qubits. Appendix A of [1] gives a detailed explanation of the surface code realizations of all operations that we are going to see. These are useful to know in order to grasp the full depth of the game, but are not essential to understanding its rules and the design principles that we cover in this demo. For further reading on these subjects, we recommend the blog posts on the surface code and quantum error correction by Arthur Pesah, our demo on the toric code, as well as the three-part series on the toric code by James Wootton.

Rules of the game

The game is played on a board of tiles, where patches correspond to logical qubits. Underlying these tiles are physical qubits that are statically arranged ($2d^2$ physical qubits per tile for code distance $d$). But we should view logical qubit patches as dynamic entities that can appear, move around, deform and disappear again. The goal of this demo will be to understand the design principles and space-time trade-offs for surface code architectures.

Data qubits are realized by patches that occupy at least one tile, but potentially multiple. They always have four distinct boundaries corresponding to X (dotted) and Z (solid) edges. This is shown in the figure below.

/_images/qubit_definition_cropped.png

Qubits are defined as patches of tiles on the board. A single qubit can occupy one tile (a) or multiple tiles (b), where dotted lines correspond to X and solid lines to Z operators. Attribution see **

Every operation in the game has an associated time cost that we measure in units of code cycles 🕒. There are some discrepancies to actual surface code cycles, but the correspondence is close enough to weigh space-time trade-offs in architecture designs. We are not going to give an exhaustive overview of all possible operations, but focus on a few important ones and fill the remaining gaps necessary for the architecture designs in the respective sections below.

Arbitrary Pauli product measurements

At the cost of 0🕒 we can measure patches in the X and Z basis. If two patches share a border, one can measure the product of their shared edges, as highlighted by the blue region in the figure below, at the cost of 1🕒.

/_images/ZZ_measurement.png

Simultaneously measuring two adjacent patches corresponds to measuring the product of their neighboring edges. Here, we measure $ZZ$. Attribution see **

In particular, if the shared edge contains both Z and X edges, we can measure in the Y basis. In the following example, the upper qubit A has both operator edges $Z_A$ and $X_A$ exposed. Measuring it together with the auxiliary qubit B below, which is initialized in the $|0\rangle$ state, we measure $(Z_A X_A) \otimes Z_B \propto Y_A \otimes Z_B$ altogether.

/_images/Y_measurement.png

Y operators can be measured by having both X and Z edges exposed to an adjacent auxiliary qubit. The measurement corresponds to the product of all involved operators, involving $Z_A X_A \propto Y_A$. Attribution see **
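The proportionality $Z_A X_A \propto Y_A$ can be checked directly with explicit Pauli matrices. The following is a minimal NumPy sketch; the variable names are illustrative, and the proportionality factor is read off from the matrices rather than assumed.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Product of the two exposed edge operators of qubit A.
zx = Z @ X

# Read the proportionality factor off one non-zero matrix entry.
phase = zx[1, 0] / Y[1, 0]
single_qubit_ok = np.allclose(zx, phase * Y)

# Joint operator measured on patches A and B: (Z_A X_A) ⊗ Z_B.
joint = np.kron(zx, Z)
joint_ok = np.allclose(joint, phase * np.kron(Y, Z))

print(single_qubit_ok, joint_ok, abs(phase))
```

The factor has unit modulus, so it only contributes a phase and does not change which observable is measured.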

If we want to measure a single qubit patch in the Y basis in practice, we start by deforming it at the cost of 1🕒, initialize an auxiliary qubit at no cost, and perform the joint measurement as shown above (1🕒). The entire protocol costs 2🕒 and is shown below:

/_images/Y_measurement_protocol.png

The protocol for measuring a single qubit in the Y basis involves deforming the patch (Step 2, 1🕒), initializing an auxiliary qubit in $|0\rangle$ (0🕒), simultaneously measuring both patches (1🕒) and deforming the qubit back again (0🕒). Attribution see **

Auxiliary qubits play an important role as they allow measuring products of Pauli operators on different qubits, which is the most crucial operation in this framework, since everything is mapped to Pauli product measurements.

/_images/PPM.png

Measuring $Y_1 X_3 Z_4 X_5$ via a joint auxiliary qubit in 1🕒. In principle, multi-qubit measurements with many qubits come at the same cost as those with fewer qubits. However, the requirement of having an auxiliary region connecting all qubits may demand extra deformations. Attribution see **

Non-Clifford Pauli rotations

Non-Clifford Pauli rotations $e^{-i \frac{\pi}{8} P}$ for some Pauli word $P$ are realized via magic state distillation and injection. Magic state distillation blocks are a crucial part of the architecture design that we are going to cover later. For the moment we assume that we have means to prepare magic states $|m\rangle = |0\rangle + e^{-i \frac{\pi}{4}} |1\rangle$ on special qubit tiles (distillation blocks). Magic state injection in this case then refers to the following protocol:

/_images/magic_state_injection.png

Performing a non-Clifford $\pi/8$ rotation corresponds to performing the joint measurement of the Pauli word and $Z$ on the magic state qubit. The measurement of $P \otimes Z_m$ costs 1🕒, the subsequent $X$ measurement is free. The additional classically controlled Clifford rotations can be merged again with the measurements at the end of the circuit. Attribution see **
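As a sanity check of this protocol, the following NumPy sketch simulates the injection for the simplest case of a single data qubit with $P = Z$. The helper names (`rotation`, `project`, `inject`) are illustrative and not part of any library. The sketch only verifies that every measurement branch leaves the data qubit in $e^{\pm i \frac{\pi}{8} Z}|\psi\rangle$ up to a Pauli $Z$ and a global phase. Which sign appears depends on the measurement outcomes, which is why classically controlled corrections are needed.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rotation(theta, pauli):
    """Return exp(i * theta * pauli) for an operator with pauli @ pauli = I."""
    return np.cos(theta) * np.eye(len(pauli)) + 1j * np.sin(theta) * pauli


def project(state, observable, outcome):
    """Project onto the +1/-1 eigenspace of `observable` and renormalize."""
    projector = (np.eye(len(state)) + outcome * observable) / 2
    projected = projector @ state
    return projected / np.linalg.norm(projected)


# Random data-qubit state and the (normalized) magic state |0> + exp(-i pi/4)|1>.
rng = np.random.default_rng(seed=0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
magic = np.array([1, np.exp(-1j * np.pi / 4)]) / np.sqrt(2)


def inject(zz_outcome, x_outcome):
    """Measure Z ⊗ Z_m, then X_m, and return the data-qubit state of that branch."""
    state = np.kron(psi, magic)
    state = project(state, np.kron(Z, Z), zz_outcome)
    state = project(state, np.kron(I2, X), x_outcome)
    # The magic qubit is now in an X eigenstate, so contract it out.
    x_eigenstate = np.array([1, x_outcome]) / np.sqrt(2)
    data = state.reshape(2, 2) @ x_eigenstate
    return data / np.linalg.norm(data)


def overlap(a, b):
    return abs(np.vdot(a, b))


# For every branch, compare with exp(± i pi/8 Z)|psi>, allowing a Pauli Z fix-up.
branch_overlaps = {}
for zz_outcome in (+1, -1):
    for x_outcome in (+1, -1):
        data = inject(zz_outcome, x_outcome)
        branch_overlaps[(zz_outcome, x_outcome)] = max(
            overlap(data, rotation(sign * np.pi / 8, Z) @ fix @ psi)
            for sign in (+1, -1)
            for fix in (I2, Z)
        )

print(branch_overlaps)
```

Each entry of `branch_overlaps` should be numerically equal to one, meaning that all four branches realize a $\pi/8$ rotation up to Clifford corrections.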

Take for example the Pauli word $P = Z_1 Y_2 X_4$ on the architecture layout below. This design allows one to directly perform $e^{-i \frac{\pi}{8} P}$ as we have access to all of $X, Y, Z$ on each qubit, as well as the $Z$ edge for the magic state qubit.

/_images/non_clifford_rotation.png

Performing $e^{-i \frac{\pi}{8} Z_1 Y_2 X_4}$ by measuring $Z_1 Y_2 X_4 Z_m$. The additional measurement $X$ on the magic state qubit is not shown and has no additional cost. The remaining Clifford Pauli rotations are merged with the terminal measurements at the end of the circuit via compilation. Attribution see **

We are going to see in the next section that one of the biggest problems is performing Y rotations and measurements (same thing, really, in this framework).

Data blocks design

Computation happens on logical data qubits that are arranged on a so-called data block. We now have all the necessary tools to understand different designs and their space-time tradeoffs. In particular, the speed of the quantum computer is determined by how fast a magic state can be distilled and consumed by a data block. In this section we focus on how the design affects how fast a magic state can be consumed by a block and do not focus on the distillation itself (this will be handled in the next section).

Compact data blocks

The compact data block has the following form. The middle aisle is going to be used as an auxiliary qubit region.

/_images/compact_block.png

The compact data block design is efficient in space. However, only one edge is exposed to the auxiliary qubit region in the middle. Attribution see **

This design only uses $\frac{3}{2}n + 3$ tiles for $n$ qubits. The biggest drawback is rather obvious: we can only access $Z$ measurements in the auxiliary qubit region. In order to perform joint $X$ measurements, we can perform a patch rotation at a cost of 3🕒:

/_images/patch_rotation.png

A patch rotation can be used to expose the $X$ edge to the auxiliary qubit region. Attribution see **

The worst thing that can happen is to have two opposite qubits require an X measurement, e.g. qubits (3 and 4) or (5 and 6). If either or both occur, it takes a total of 6🕒 to rotate the patches.

An additional problem of this design is the fact that there are no tiles for qubits to expand to in order to perform Y measurements. This can be remedied by making use of the identity

$$ e^{i \frac{\pi}{8} Y} = e^{-i \frac{\pi}{4} Z} e^{i \frac{\pi}{8} X} e^{i \frac{\pi}{4} Z}. $$

The Clifford rotation on the right, $e^{i \frac{\pi}{4} Z}$, which is applied first, needs to be explicitly performed in this case. The second Clifford rotation ($e^{-i \frac{\pi}{4} Z}$) can be merged with the terminal measurements of the circuit. Such a rotation $e^{i \frac{\pi}{4} P}$ can be performed with a joint measurement of $P \otimes Y$, similar to the magic state injection circuit:

/_images/clifford_rotation.png

A Clifford rotation $e^{i \frac{\pi}{4} P}$ is performed by measuring $P \otimes Y$. Attribution see **
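The single-qubit identity $e^{i \frac{\pi}{8} Y} = e^{-i \frac{\pi}{4} Z} e^{i \frac{\pi}{8} X} e^{i \frac{\pi}{4} Z}$ used here can be confirmed numerically. The sketch below builds the rotations from explicit Pauli matrices; `pauli_rotation` is an illustrative helper, not a library function.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_rotation(theta, pauli):
    """Return exp(i * theta * pauli), using pauli @ pauli = identity."""
    return np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * pauli


lhs = pauli_rotation(np.pi / 8, Y)
rhs = (
    pauli_rotation(-np.pi / 4, Z)
    @ pauli_rotation(np.pi / 8, X)
    @ pauli_rotation(np.pi / 4, Z)
)

identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

The check works because conjugating $X$ with the $\frac{\pi}{4}$ rotation about $Z$ maps it to $Y$, so the $X$ rotation in the middle becomes a $Y$ rotation.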

In particular, we still need to be able to perform a $Y$ measurement somewhere. In this case we just outsourced it to another resource qubit, which we can use for all others and for which we left space in the bottom left corner of the compact data block. For example, we can perform the rotation $e^{i \frac{\pi}{4} Z_3 Z_5 Z_6}$ at a cost of 1🕒 in the following way:

/_images/clifford_rotation_356.png

A Clifford rotation $e^{i \frac{\pi}{4} Z_3 Z_5 Z_6}$ is performed by measuring $Z_3 Z_5 Z_6 \otimes Y_\text{resource}$ with the additional resource qubit in the bottom left corner of the compact block. Attribution see **

The worst case here is having an even number of $Y$ operators in the Pauli word, as it requires two distinct $\frac{\pi}{4}$ rotations, each costing 1🕒, for a total of 2🕒.

Overall, in the worst case scenario an operation can cost 9🕒. This consists of the base cost of 1🕒 for performing the Pauli measurement, 2🕒 for having an even number of $Y$ operators, and 6🕒 when opposite qubit patches require $X$ measurements. The following protocol shows such a scenario by performing $e^{i \frac{\pi}{8} Y_1 Y_3 Z_4 Y_5 Y_6}$, which is realized by $e^{i \frac{\pi}{8} X_1 X_3 Z_4 X_5 X_6} e^{i \frac{\pi}{4} Z_3 Z_5 Z_6} e^{i \frac{\pi}{4} Z_1}$ (ignoring again the additional two $\frac{\pi}{4}$ rotations that are merged with measurements).

/_images/compact_block_worst_case.png

Worst case scenario in the compact block when performing $e^{i \frac{\pi}{8} Y_1 Y_3 Z_4 Y_5 Y_6}$. Step 2 measures $Z_1$ together with $Y$ on the resource qubit in order to perform the $e^{i \frac{\pi}{4} Z_1}$ rotation at 1🕒. Step 3 performs the additional $X$ measurement on the resource qubit at 0🕒. The same holds for steps 4 and 5, which perform $e^{i \frac{\pi}{4} Z_3 Z_5 Z_6}$ at 1🕒 overall. Steps 6 and 7 perform the patch rotations at 3🕒 each. The final measurement of $X_1 X_3 Z_4 X_5 X_6 Z_m$ at another 1🕒 in step 8 completes the computation. Attribution see **
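The bookkeeping behind this worst case can be sketched in a few lines of Python. The helpers below use hypothetical names and are only meant as an illustration. They replace each $Y$ by $X$ and group the $Y$ qubits into odd-sized sets, since a $Z$ string only anticommutes with the $X$-type word (and hence turns $X$ into $Y$) if both overlap on an odd number of qubits. The patch rotation cost is passed in by hand because it depends on which qubits face each other in the layout.

```python
def split_y_support(pauli_word):
    """Split {qubit: 'X'|'Y'|'Z'} into the measured X/Z word and Z-rotation supports."""
    y_qubits = sorted(q for q, p in pauli_word.items() if p == "Y")
    measured_word = {q: ("X" if p == "Y" else p) for q, p in pauli_word.items()}
    if not y_qubits:
        rotation_supports = []  # no explicit pi/4 rotation needed
    elif len(y_qubits) % 2 == 1:
        rotation_supports = [y_qubits]  # one pi/4 rotation suffices
    else:
        rotation_supports = [y_qubits[:1], y_qubits[1:]]  # two odd-sized groups
    return measured_word, rotation_supports


def compact_block_cost(pauli_word, patch_rotation_cycles):
    """Time steps: pi/4 rotations (1 each) + patch rotations + final measurement (1)."""
    _, rotation_supports = split_y_support(pauli_word)
    return len(rotation_supports) + patch_rotation_cycles + 1


# Worst-case example from the text: Y_1 Y_3 Z_4 Y_5 Y_6 with 6 time steps of patch rotations.
worst_case = {1: "Y", 3: "Y", 4: "Z", 5: "Y", 6: "Y"}
measured_word, rotation_supports = split_y_support(worst_case)
total = compact_block_cost(worst_case, patch_rotation_cycles=6)

print(measured_word, rotation_supports, total)
```

For the example word, the two rotation supports are qubit 1 and qubits 3, 5, 6, matching the rotations $e^{i \frac{\pi}{4} Z_1}$ and $e^{i \frac{\pi}{4} Z_3 Z_5 Z_6}$ in the protocol above.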

Intermediate data blocks

The intermediate data block design gets rid of the problem of potentially having blocking $X$ measurements on opposite qubit patches by simply removing the second row and laying out all qubits in a linear fashion.

/_images/intermediate_block.png

Intermediate data block design. Attribution see **

As such, this architecture occupies $2n + 4$ tiles. One can get additional savings by having the auxiliary qubit region be flexibly the lower or upper row. This way, one can save on the extra cost of rotating patches back to their original position.

/_images/intermediate_worst_case.png

Performing a $ZXZZX$ measurement by performing patch rotations for the appropriate $X$ measurements and moving all qubits down into the auxiliary region to save time. Attribution see **

Overall we get a maximum of 2🕒 for the rotations. Adding the base cost of 1🕒 for the measurement and the maximum 2🕒 for the additional Clifford $\pi/4$ Z rotations as in the compact block design, we obtain a maximum cost of 5🕒.
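To compare the two designs seen so far, the following sketch collects the tile formulas and worst-case costs quoted in this section. The function names are illustrative, and rounding $\frac{3}{2}n$ up for odd $n$ is an assumption made here for concreteness.

```python
import math


def compact_tiles(n_qubits):
    """Tiles of the compact block, 3n/2 + 3 (rounded up for odd n, an assumption)."""
    return math.ceil(1.5 * n_qubits) + 3


def intermediate_tiles(n_qubits):
    """Tiles of the intermediate block, 2n + 4."""
    return 2 * n_qubits + 4


# Worst-case time steps per non-Clifford rotation:
# final measurement + explicit pi/4 rotations + patch rotations.
worst_case_steps = {
    "compact": 1 + 2 + 6,
    "intermediate": 1 + 2 + 2,
}

n = 100
summary = {
    "compact": (compact_tiles(n), worst_case_steps["compact"]),
    "intermediate": (intermediate_tiles(n), worst_case_steps["intermediate"]),
}
print(summary)
```

For $n = 100$ qubits, the intermediate block trades 51 extra tiles for a worst case that is 4 time steps shorter.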

Fast data blocks

In order to be able to access Y operations directly, we need both Z and X edges exposed to the auxiliary qubit region, demanding 2 tiles for 1 qubit. We omitted this in the rule description before as it is only relevant for the fast data block, but we can also realize 2 qubits on a single patch using 2 tiles:

/_images/2q_patch.png

Two qubits can be realized by a patch on two tiles. The patch now has 6 distinct edges, corresponding to the operators as indicated in the figure. Attribution see **

With this extra trick up our sleeve, we can construct the fast data block consisting of two-qubit patches with an all-encompassing auxiliary qubit region.

/_images/fast_block.png

Fast data block design. Attribution see **

Here, all 15 distinct Pauli operators are readily available. This is because we have $X_1$, $X_1 \otimes X_2$, $Z_2$, $Z_1 \otimes Z_2$ and all products thereof available. For example, we can realize $X_2$ via $X_1 (X_1 \otimes X_2)$ and we have $Y_1 \propto (X_1) (Z_1) = (X_1) (Z_1 \otimes Z_2) (Z_2)$. With the same logic we can obtain $Y_2$ and $Z_1$. Further, we have operators like $X_1 Y_2 \propto (X_1 \otimes X_2) Z_2$, $Z_1 \otimes X_2 = X_1 (X_1 \otimes X_2) Z_2 (Z_1 \otimes Z_2)$ and $Y_1 X_2 \propto (X_1 \otimes X_2) (Z_2) (Z_1 \otimes Z_2)$.

The maximum time cost for performing a non-Clifford Pauli rotation therefore is just 1🕒 on the fast data block.
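The claim that the four exposed operators generate all 15 non-trivial two-qubit Pauli operators can be checked by brute force. The sketch below uses the binary symplectic representation, where each Pauli word (ignoring phases) is a bit vector $(x_1, x_2, z_1, z_2)$ and multiplication becomes bitwise XOR. All names are illustrative.

```python
from itertools import product

# Exposed edge operators as bit vectors (x1, x2, z1, z2).
GENERATORS = {
    "X1": (1, 0, 0, 0),
    "X1X2": (1, 1, 0, 0),
    "Z2": (0, 0, 0, 1),
    "Z1Z2": (0, 0, 1, 1),
}

SINGLE_QUBIT = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}


def label(vec):
    """Convert (x1, x2, z1, z2) into a two-letter Pauli word such as 'XY'."""
    x1, x2, z1, z2 = vec
    return SINGLE_QUBIT[(x1, z1)] + SINGLE_QUBIT[(x2, z2)]


# Enumerate all products of generators (each generator used zero or one times).
reachable = {}
for coefficients in product((0, 1), repeat=len(GENERATORS)):
    vec = (0, 0, 0, 0)
    used = []
    for use, (name, generator) in zip(coefficients, GENERATORS.items()):
        if use:
            vec = tuple(a ^ b for a, b in zip(vec, generator))
            used.append(name)
    reachable[label(vec)] = used

nontrivial = sorted(word for word in reachable if word != "II")
print(len(nontrivial), nontrivial)
print(reachable["XY"], reachable["YX"], reachable["ZX"])
```

Since the four generators are linearly independent over the binary field, the 16 products are all distinct, which leaves exactly 15 non-trivial operators.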

Distillation blocks design

So far we have only been concerned with data blocks that perform Pauli product measurements and assumed magic states to be available for consumption. These magic states need to be distilled in separate blocks, which can in principle be of the same design as data blocks. But since the blocks are used for a fixed protocol, this knowledge can be used for simplifications.

There are different approaches to perform magic state distillation. We consider the case where we can prepare a magic state with infidelity $p$. The distillation protocol is then such that this infidelity is decreased to an acceptable level. All other operations of the protocol are Clifford, so we can measure whether an error has occurred. This then determines the success probability of the protocol, which in the case below is roughly $(1-p)^n$ for an $n$-qubit protocol. We are going to go through the simplest protocol in a 15-to-1 distillation block.

15-to-1 distillation

This protocol uses 15 imperfect magic states with infidelity $p$ and outputs a single magic state with infidelity of $35p^3$. The distillation circuit is shown below, with the details described in section 3.1 of [1]:

/_images/15-to-1.png

15-to-1 distillation protocol. Each $\frac{\pi}{8}$ rotation involves a magic state injection with an error-prone magic state. In total, we have $4+11$ magic states, each with infidelity $p$, and output a magic state $|m\rangle$ on the fifth qubit with infidelity $35p^3$. Attribution see **

Because all operations in the protocol are Z measurements, we can use the compact data block design to perform the distillation. Another trick the author of [1] proposes is to use the auto-corrected magic state injection protocol below, which avoids the additional Clifford $\frac{\pi}{4}$ Pauli rotation. Note that the $\frac{\pi}{2}$ Pauli rotation is just a sign flip that can be tracked classically.

/_images/auto-corrected-non-clifford.png

The auto-corrected magic state injection protocol avoids the additional Clifford $\frac{\pi}{4}$ Pauli rotation from above at the cost of having an additional qubit that is measured. However, note that the first two measurements commute and can be performed simultaneously. Attribution see **

Using this injection protocol to perform the non-Clifford $\frac{\pi}{8}$ rotations using the error-prone magic states, the 15-to-1 protocol on a compact data block is performed in the following way:

/_images/15-to-1-protocol.png

The 15-to-1 protocol executed on a compact data block using the auto-corrected magic state injection subroutine in each of the repeating steps. Note that both $P \otimes Z_m$ and $Z_m \otimes Y_{|0\rangle}$ measurements are performed simultaneously. If all $X$ measurements on qubits 1-4 in step 23 yield a $+1$ result, a magic state is successfully prepared on qubit 5. The probability of success is roughly $(1-p)^n$. Attribution see **

The 15-to-1 distillation protocol produces a magic state in 11🕒 on 11 tiles.

Quantum computer designs

The 15-to-1 distillation protocol is the simplest of a variety of protocols each with different characteristics. The best choice of distillation protocol heavily depends on the error probabilities of the quantum computer in use, as well as the overall tolerance for errors we allow to still occur. For example, assume we tolerate a T infidelity of $10^{-10}$ and have $p=10^{-4}$, then the 15-to-1 protocol would suffice as it yields an infidelity of $35p^3 = 3.5 \times 10^{-11} < 10^{-10}$.
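The numbers in this example are easy to reproduce. The sketch below evaluates the output infidelity $35p^3$ and the approximate success probability $(1-p)^n$ with $n = 15$ from the distillation section; the function names are illustrative.

```python
def distilled_infidelity(p):
    """Output infidelity of the 15-to-1 protocol for input infidelity p."""
    return 35 * p**3


def success_probability(p, n_states=15):
    """Approximate probability that no error is detected, (1 - p)^n."""
    return (1 - p) ** n_states


p = 1e-4
target = 1e-10

output = distilled_infidelity(p)
sufficient = output < target
p_success = success_probability(p)

print(f"output infidelity: {output:.2e}")
print(f"meets target {target:.0e}: {sufficient}")
print(f"success probability: {p_success:.4f}")
```

With these inputs the protocol meets the target, and a run is discarded only in roughly 15 out of 10,000 attempts.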

Another consideration is to combine data and distillation blocks that match in their maximum time requirements. Since the 15-to-1 distillation above takes 11🕒 to produce a magic state, there is no point in using the fast or intermediate data blocks, and we can just resort to the compact one.

A minimal setup can be seen below. It consists of 100 logical qubits on 153 tiles in a compact block, as well as a 15-to-1 distillation block using another 11 tiles.

/_images/minimal-setup.png

Minimal setup with 100 logical qubits on 153 tiles and 11 extra tiles for a compact distillation block. Attribution see **

For a code distance of $d=13$ we would require $164 \cdot 2 \cdot d^2 \approx 55k$ physical qubits. An example computation with $10^8$ T gates at a code cycle of $1\mu s$ would finish in $d \cdot 11🕒 \cdot 10^8 \approx 4h$.

In this setup, a magic state is produced every 11🕒 and takes at most 9🕒 for consumption. The bottleneck is in the magic state distillation, and overall this setup takes 11🕒 per non-Clifford gate. The most straightforward way to speed this up is by adding magic state distillation blocks. Adding just one other distillation block halves the T-gate production time to 5.5🕒. Now it makes sense to use the intermediate data block design, which takes at most 5🕒 for T-gate consumption:

/_images/intermediate_setup.png

Intermediate setup consisting of the intermediate data block and two 15-to-1 distillation blocks on each end. Attribution see **

In this case we require 222 tiles, so $222 \cdot 2 \cdot d^2 \approx 75k$ physical qubits, and the same computation mentioned before would finish in half the time after about $2h$.
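The resource estimates for both setups follow from the same two formulas, which can be sketched as below. Function names and the dictionary layout are illustrative. Following the estimate above, each time step 🕒 is taken to last $d$ code cycles of $1\mu s$, and the time per T gate is set by the magic state production time (11🕒 and 5.5🕒, respectively).

```python
def physical_qubits(tiles, distance):
    """Physical qubits for a board of `tiles` tiles at 2 * d^2 qubits per tile."""
    return tiles * 2 * distance**2


def runtime_hours(t_count, steps_per_t, distance, cycle_time_s=1e-6):
    """Runtime in hours, assuming each time step lasts `distance` code cycles."""
    code_cycles = t_count * steps_per_t * distance
    return code_cycles * cycle_time_s / 3600


setups = {
    "minimal": {"tiles": 164, "steps_per_t": 11},
    "intermediate": {"tiles": 222, "steps_per_t": 5.5},
}

d = 13
t_count = 10**8
report = {
    name: (
        physical_qubits(cfg["tiles"], d),
        runtime_hours(t_count, cfg["steps_per_t"], d),
    )
    for name, cfg in setups.items()
}

for name, (qubits, hours) in report.items():
    print(f"{name}: {qubits} physical qubits, {hours:.2f} h")
```

Changing `d`, `t_count` or the cycle time in this sketch gives a quick feeling for how the estimates scale with hardware assumptions.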

Conclusion

We've been introduced to a high-level description of quantum computing that allows us to reason about space-time trade-offs in FTQC architecture designs. We have seen some basic prototypes that allow computations involving $10^8$ T gates on the order of hours using $55k$ or $75k$ physical qubits. With this knowledge, we should be able to follow the more involved tricks discussed in sections 4 and 5 of [1], which we have not covered in this demo.

References

[1] Daniel Litinski, "A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery", arXiv:1808.02892, 2018.

Attributions

**: Images from Game of Surface Codes by Daniel Litinski, CC BY 4.0

About the author

Total running time of the script: (0 minutes 0.000 seconds)

Korbinian Kottmann


Quantum simulation & open source software
