Efficient block encoding for QSVT applied to matrix inversion for CFD problems
Published: January 15, 2026. Last updated: January 15, 2026.
Systems of linear equations are remarkably ubiquitous, appearing in almost every industry, from healthcare and transportation to finance, chemistry, and even quantum computing. Solving systems of linear equations is a fundamental pillar of modern science and industry, and it has been shown that QSVT can be used for matrix inversion on a quantum computer [1]. For more information on how to use PennyLane’s qsvt() functionality to run matrix inversion on a quantum computer, see our demo on QSVT in Practice.
Do you know what the logical resource requirements are to invert a matrix on a quantum computer?
It turns out that the quantum compute cost can get quite large due to data loading. Simply put, if we want to invert a large matrix, we are constrained by the cost of encoding that matrix onto the quantum computer. While this cost is significant for any general matrix, in real life our problems often have patterns and structure!
By exploiting the structure of a problem, we can significantly reduce the quantum resource cost of the algorithm, making QSVT-based matrix inversion more accessible to implement on nearer-term fault-tolerant quantum hardware.
This demo, based on our recent paper Quantum compilation framework for data loading [2], will showcase an optimized block encoding strategy that uses the sparsity of the matrix to significantly reduce the cost of QSVT. We will focus on a particular matrix inversion problem that arises in computational fluid dynamics (CFD) [3].
Problem Setup
The two-dimensional lid-driven cavity flow (2D-LDC) is a classical benchmark in computational fluid dynamics for validating numerical schemes that solve the incompressible Navier–Stokes equations. Determining the fluid flow within the cavity requires solving the pressure correction equations by inverting the associated pressure correction matrix \(A\). This matrix has a very sparse, structured diagonal form; the figure below highlights the non-zero entries of a \((64, 64)\) instance of this matrix [3].
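To build some intuition for this structure, here is a small classical sketch that assembles a banded matrix with the same d-diagonal profile using scipy.sparse. The stencil values below are illustrative placeholders, not the actual entries of the pressure correction matrix from [3].

import numpy as np
import scipy.sparse as sp

# Illustrative tri-diagonal matrix with a banded, d-diagonal structure
# (the entries are placeholders, not the true pressure correction values):
N = 64
diagonals = [-np.ones(N - 1), 2 * np.ones(N), -np.ones(N - 1)]
A_sketch = sp.diags(diagonals, offsets=[-1, 0, 1], format="csr")

print(f"Shape: {A_sketch.shape}, non-zero entries: {A_sketch.nnz} of {N * N}")

Even though the matrix has \(N^{2}\) entries, only \(O(N)\) of them are non-zero, and they all sit on a few fixed diagonals.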
In order to invert this matrix, we will need to load it into the quantum computer using a block encoding. The standard technique for block encoding any (square) matrix \(A\) is the method of linear combination of unitaries (LCU). However, it suffers from a few fatal flaws (try saying that five times fast).
The number of terms in the LCU scales as \(O(4^{n})\) with the number of qubits \(n\), and thus the cost of the block encoding also scales exponentially. Even computing the LCU decomposition itself becomes a classical bottleneck. Furthermore, there is no way of knowing a priori how many terms the LCU will contain. For these reasons, a general “one size fits all” block encoding scheme is usually too expensive for our systems of interest. Instead, we leverage the inherent patterns and structure of our matrix in order to implement an efficient block encoding operator.
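We can see this scaling concretely with a quick empirical check (separate from the resource estimation workflow): decompose small dense matrices into the Pauli basis with PennyLane’s qml.pauli_decompose and count the resulting LCU terms. For a generic matrix, essentially all \(4^{n}\) terms show up.

import numpy as np
import pennylane as qml

# Count the LCU (Pauli) terms for dense random symmetric matrices:
for n in range(1, 5):
    dim = 2**n
    M = np.random.rand(dim, dim)
    H = (M + M.T) / 2  # symmetrize so the decomposition has real coefficients
    coeffs, _ = qml.pauli_decompose(H).terms()
    print(f"{n} qubits: {len(coeffs)} of {4**n} possible Pauli terms")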
Exploiting structure in the block encoding
This matrix (\(A\)) can be block encoded using a d-diagonal encoding technique [2] developed by my colleagues here at Xanadu. The method works by loading each diagonal in parallel and then shifting them to their respective ranks in the matrix. The quantum circuit that implements the d-diagonal block encoding is presented below. To learn more, read our paper: “Quantum compilation framework for data loading”.
Estimating the resource cost for this circuit may seem like a daunting task, but we have
PennyLane’s quantum resource estimator to help us construct each piece!
Diagonal Matrices & the Walsh-Hadamard Transform
Let’s start with the \(D_{k}\) operators. These are a list of diagonal operators where each operator stores the normalized entries from one of the diagonals of our d-diagonal matrix \(A\). By multiplexing over the \(D_{k}\) operators, we can load all of the diagonals in parallel. Typically, each diagonal operator is implemented using a product of Controlled PhaseShift gates; in this case, however, we will leverage the Walsh-Hadamard transformation ([2], [4]).
The Walsh-Hadamard transform allows us to naturally optimize the cost of our block encoding by tuning the number of Walsh coefficients within \([1, N]\), where \(N\) is the size of the matrix. If the entries in our diagonal are sparse in the Walsh basis, as is the case for the CFD example, then we can get away with far fewer Walsh coefficients, resulting in a much more efficient encoding. The circuit below prepares a single such Walsh-Hadamard diagonal block encoding; ultimately, we need to prepare as many of these operators as there are non-zero diagonals in \(A\).
import pennylane.numpy as qnp
import pennylane.estimator as qre


def WalshHadamard_Dk(num_diags, size_diagonal, num_walsh_coeffs):
    num_diag_wires = int(qnp.ceil(qnp.log2(size_diagonal)))

    list_of_diagonal_ops = []
    for _ in range(num_diags):
        # Basis change into the Walsh-Hadamard frame:
        compute_op = qre.Prod([qre.Hadamard(), qre.S()])

        # Multi-qubit Z rotations implementing the Walsh coefficients,
        # controlled on the |0> and |1> states of an auxiliary qubit:
        zero_ctrl_wh = qre.Controlled(
            qre.Prod(((qre.MultiRZ(num_diag_wires // 2), num_walsh_coeffs),)),
            num_ctrl_wires=1,
            num_zero_ctrl=1,
        )
        one_ctrl_wh = qre.Controlled(
            qre.Prod(((qre.MultiRZ(num_diag_wires // 2), num_walsh_coeffs),)),
            num_ctrl_wires=1,
            num_zero_ctrl=0,
        )
        target_op = qre.Prod([zero_ctrl_wh, one_ctrl_wh])

        # Conjugate the rotations with the basis change:
        list_of_diagonal_ops.append(qre.ChangeOpBasis(compute_op, target_op))

    return list_of_diagonal_ops
Here, the Prod class is used to represent a product of operators, and the Controlled class represents a controlled version of an operator.
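The payoff of the Walsh basis is easy to verify classically. As a hypothetical example, a smooth (linear) diagonal of length 64 has only a handful of non-negligible Walsh coefficients, which is exactly the kind of sparsity that lets us truncate num_walsh_coeffs. The diagonal entries below are illustrative, not taken from the CFD matrix.

import numpy as np
from scipy.linalg import hadamard

# Walsh-Hadamard spectrum of a smooth (linear) diagonal:
N = 64
diag_entries = np.linspace(0.5, 1.0, N)
walsh_coeffs = hadamard(N) @ diag_entries / N  # Hadamard-ordered transform

significant = np.sum(np.abs(walsh_coeffs) > 1e-3)
print(f"{significant} of {N} Walsh coefficients are non-negligible")

For this linear ramp, only about \(\log_2(N) + 1\) coefficients survive, so truncating the Walsh series loses essentially nothing.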
Shift Operator
Next we will implement the Shift operator. This is a product of three subroutines which shift the diagonal entries into the correct rank of the d-diagonal block encoding. Each rank has an associated shift value: the main diagonal has shift value 0, each diagonal above it is labelled \((+1, +2, \ldots)\), and each diagonal below it is labelled with the negative counterpart. The Shift operator works in three steps:
1. Load the binary representation of the shift value for each non-zero diagonal in \(A\).
2. Shift each of the diagonals in parallel using an \(Adder\) subroutine.
3. Unload the shift values to restore the quantum register.
def ShiftOp(num_shifts, num_load_wires, wires):
    prep_wires, load_shift_wires, load_diag_wires = wires

    # 1. Load the binary representation of each shift value:
    Load = qre.QROM(
        num_bitstrings=num_shifts,
        size_bitstring=num_load_wires,
        restored=False,
        select_swap_depth=1,
        wires=prep_wires + load_shift_wires,
    )

    # 2. Shift the diagonals in parallel with an in-place adder:
    Adder = qre.SemiAdder(
        max_register_size=num_load_wires,
        wires=load_shift_wires + load_diag_wires,
    )

    # 3. Unload the shift values to restore the register:
    Unload = qre.Adjoint(
        qre.QROM(
            num_bitstrings=num_shifts,
            size_bitstring=num_load_wires,
            restored=False,
            select_swap_depth=1,
            wires=prep_wires + load_shift_wires,
        )
    )
    return qre.Prod([Load, Adder, Unload])
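It can be instructive to estimate the shift subroutine on its own before composing the full block encoding. The sketch below reuses the callable pattern we use for the full workflow later on, assuming the default gate set of the estimator; the register sizes and wire labels are illustrative.

def shift_only(num_shifts, num_load_wires):
    # Illustrative registers for a standalone estimate:
    num_prep_wires = int(qnp.ceil(qnp.log2(num_shifts)))
    prep_wires = [f"prep{i}" for i in range(num_prep_wires)]
    load_shift_wires = [f"load_s{i}" for i in range(num_load_wires)]
    load_diag_wires = [f"load_d{i}" for i in range(num_load_wires)]

    ShiftOp(num_shifts, num_load_wires, wires=(prep_wires, load_shift_wires, load_diag_wires))
    return


print(qre.estimate(shift_only)(3, 20))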
d-Diagonal Block Encoding
Now that we have developed all of the pieces, we’ll bring them together to implement the full d-diagonal block encoding operator [2].
def WH_Diagonal_BE(num_diagonals, matrix_size, num_walsh_coeffs):
    # Initialize qubit registers:
    num_prep_wires = int(qnp.ceil(qnp.log2(num_diagonals)))
    num_diag_wires = int(qnp.ceil(qnp.log2(matrix_size)))

    prep_wires = [f"prep{i}" for i in range(num_prep_wires)]
    load_diag_wires = [f"load_d{i}" for i in range(num_diag_wires)]
    load_shift_wires = [f"load_s{i}" for i in range(num_diag_wires)]

    # Prep:
    Prep = qre.QROMStatePreparation(num_prep_wires, wires=prep_wires)

    # Shift:
    Shift = ShiftOp(
        num_shifts=num_diagonals,
        num_load_wires=num_diag_wires,
        wires=(prep_wires, load_shift_wires, load_diag_wires),
    )

    # Multiplexed - Dk:
    diagonal_ops = WalshHadamard_Dk(num_diagonals, matrix_size, num_walsh_coeffs)
    Dk = qre.Select(diagonal_ops, wires=load_diag_wires + prep_wires + ["target"])

    # Prep^t:
    Prep_dagger = qre.Adjoint(qre.QROMStatePreparation(num_prep_wires, wires=prep_wires))

    return qre.Prod([Prep, Shift, Dk, Prep_dagger])
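Before wrapping the block encoding in QSVT, we can estimate its standalone cost, for example on a smaller \((2^{10}, 2^{10})\) instance. This is a minimal sketch; the parameters below are illustrative rather than tuned for the CFD problem.

def block_encoding_only(num_diagonals, matrix_size, num_walsh_coeffs):
    # Queue just the block encoding operator for estimation:
    WH_Diagonal_BE(num_diagonals, matrix_size, num_walsh_coeffs)
    return


print(qre.estimate(block_encoding_only)(3, 2**10, 64))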
Putting it all together: QSVT for matrix inversion
In this section we use all of the tools we have developed to estimate the resource requirements for solving a practical matrix inversion problem. Following the CFD example [2], we will estimate the resources required to invert a tri-diagonal matrix of size \((2^{20}, 2^{20})\). This system requires a degree-\(10^{8}\) polynomial approximation of the inverse function; for matrix inversion, the required degree grows with the condition number \(\kappa\) of the matrix, scaling roughly as \(O(\kappa \log(\kappa / \varepsilon))\) for target precision \(\varepsilon\).
We will also restrict the gate set to Clifford + T gates to generate results that are faithful to those presented in [2].
degree = 10**8
matrix_size = 2**20  # 2^20 x 2^20
num_diagonals = 3
num_walsh_coeffs = 512  # empirically determined


def matrix_inversion(degree, matrix_size, num_diagonals):
    num_state_wires = int(qnp.ceil(qnp.log2(matrix_size)))  # 20 qubits
    b_wires = [f"load_d{i}" for i in range(num_state_wires)]

    # Prepare |b> vector:
    qre.MPSPrep(
        num_state_wires,
        max_bond_dim=2**5,  # bond dimension = 2**5
        wires=b_wires,
    )

    # Apply A^-1:
    qre.QSVT(
        block_encoding=WH_Diagonal_BE(num_diagonals, matrix_size, num_walsh_coeffs),
        encoding_dims=(matrix_size, matrix_size),
        poly_deg=degree,
    )
    return


gate_set = {"X", "Y", "Z", "Hadamard", "T", "S", "CNOT"}

res = qre.estimate(matrix_inversion, gate_set)(degree, matrix_size, num_diagonals)
print(res)
--- Resources: ---
 Total wires: 106
    algorithmic wires: 43
    allocated wires: 63
        zero state: 63
        any state: 0
 Total gates : 4.832E+13
    'T': 2.968E+13,
    'CNOT': 1.202E+13,
    'X': 1.760E+10,
    'Z': 1.369E+12,
    'S': 1.990E+12,
    'Hadamard': 3.242E+12
The estimated T gate count of this matrix inversion workflow matches the reported \(3 \cdot 10^{13}\) T count from [2].
Conclusion
In this demo we used knowledge about the structure of our matrices to build a more efficient block encoding. This work brings us one step closer to implementing quantum linear solvers on near-term quantum hardware. If you are interested in other subroutines for data loading and how to optimize them, check out our paper on a Quantum compilation framework for data loading.
Along the way we showcased PennyLane’s estimator module to determine the resource
requirements for matrix inversion by QSVT. You learned how to build complex circuits from our library of
primitives, making resource estimation as simple as putting together Lego blocks. Of course, QSVT has many other applications, including unstructured search, phase estimation, and even Hamiltonian simulation. We challenge you to think of smart data loading techniques to reduce the quantum cost of QSVT for these applications as well!
References
About the author
Jay Soni
Jay completed his BSc in Mathematical Physics at the University of Waterloo and currently works as a Quantum Software Developer at Xanadu. Fun fact: you will often find him sipping on a Tim Horton's IceCapp while he is working.
Total running time of the script: (0 minutes 0.024 seconds)