# The Variational Quantum Thermalizer

Author: Jack Ceroni

This demonstration discusses the theory and experiments relating to a recently proposed quantum algorithm called the Variational Quantum Thermalizer (VQT): a generalization of the well-known Variational Quantum Eigensolver (VQE) to systems with non-zero temperature.

## The Idea

The goal of the VQT is to prepare the thermal state of a given Hamiltonian $$\hat{H}$$ at temperature $$T$$, which is defined as

$\rho_\text{thermal} \ = \ \frac{e^{- \hat{H} \beta}}{\text{Tr}(e^{- \hat{H} \beta})} \ = \ \frac{e^{- \hat{H} \beta}}{Z_{\beta}},$

where $$\beta \ = \ 1/T$$. The thermal state is a mixed state, which means that it can be described by an ensemble of pure states. Since we are attempting to learn a mixed state, we must deviate from the standard variational method of passing a pure state through an ansatz circuit and minimizing the energy expectation value.
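As a minimal numerical sketch (using a hypothetical single-qubit Hamiltonian for illustration, not the model treated later in this demo), the thermal state can be computed directly by exponentiating the Hamiltonian and normalizing by the partition function:

```python
import numpy as np
from scipy.linalg import expm

beta = 2.0  # inverse temperature, beta = 1/T

# A toy single-qubit Hamiltonian (Pauli-Z), chosen for illustration only
hamiltonian = np.array([[1.0, 0.0], [0.0, -1.0]])

# rho_thermal = exp(-beta H) / Tr(exp(-beta H))
exp_term = expm(-beta * hamiltonian)
rho_thermal = exp_term / np.trace(exp_term)

print(np.trace(rho_thermal))  # ~1.0, as required of a density matrix
```

Note that the lower-energy eigenstate carries more population, and increasingly so as $$\beta$$ grows.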

The VQT begins with an initial density matrix, $$\rho_{\theta}$$, described by a probability distribution, parametrized by some collection of parameters $$\theta$$, over an ensemble of pure states, $$\{|\psi_i\rangle\}$$. Let $$p_i(\theta_i)$$ be the probability corresponding to the $$i$$-th pure state. We sample from this probability distribution to get some pure state $$|\psi_k\rangle$$, which we pass through a parametrized circuit, $$U(\phi)$$. From the results of this circuit, we then calculate $$\langle \psi_k | U^{\dagger}(\phi) \hat{H}\, U(\phi) |\psi_k\rangle$$. Repeating this process multiple times and averaging these expectation values gives us the expectation value of $$\hat{H}$$ with respect to $$U \rho_{\theta} U^{\dagger}$$. In other words, the input parameters define an initial density matrix and a parametrized ansatz, which together are used to calculate the expectation value of the Hamiltonian with respect to a new mixed state.

Arguably, the most important part of a variational circuit is its cost function, which we attempt to minimize with a classical optimizer. In VQE, we generally try to minimize $$\langle \psi(\theta) | \hat{H} | \psi(\theta) \rangle$$ which, upon minimization, gives us a parametrized circuit that prepares a good approximation to the ground state of $$\hat{H}$$. In the VQT, the goal is to arrive at a parametrized probability distribution, and a parametrized ansatz, that generate a good approximation to the thermal state. This generally involves more than calculating the energy expectation value. Luckily, we know that the thermal state of $$\hat{H}$$ minimizes the following free-energy cost function

$\mathcal{L}(\theta, \ \phi) \ = \ \beta \ \text{Tr}( \hat{H} \ \hat{U}(\phi) \rho_{\theta} \hat{U}(\phi)^{\dagger} ) \ - \ S_\theta,$

where $$S_{\theta}$$ is the von Neumann entropy of $$U \rho_{\theta} U^{\dagger}$$, which is the same as the von Neumann entropy of $$\rho_{\theta}$$, since entropy is invariant under unitary transformations. This cost function is minimized when $$\hat{U}(\phi) \rho_{\theta} \hat{U}(\phi)^{\dagger} \ = \ \rho_{\text{thermal}}$$, so, as in VQE, we minimize it with a classical optimizer to obtain the target parameters, and thus the target state.

Altogether, the outlined steps give us a general protocol for generating thermal states.
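To make the free-energy cost concrete, here is a small sketch (again using a toy single-qubit Hamiltonian, not the demo's model) verifying numerically that the thermal state attains a lower free energy than another candidate state, such as the maximally mixed state, and that its free energy equals $$-\ln Z_\beta$$:

```python
import numpy as np
from scipy.linalg import expm

beta = 2.0
hamiltonian = np.array([[1.0, 0.0], [0.0, -1.0]])  # toy Pauli-Z Hamiltonian

def free_energy(rho):
    # beta * Tr(H rho) - S(rho), where S is the von Neumann entropy
    eigvals = np.linalg.eigvalsh(rho)
    entropy = -sum(p * np.log(p) for p in eigvals if p > 1e-12)
    return beta * np.trace(hamiltonian @ rho).real - entropy

exp_term = expm(-beta * hamiltonian)
rho_thermal = exp_term / np.trace(exp_term)
maximally_mixed = np.identity(2) / 2

# The thermal state minimizes the free energy, attaining -ln(Z)
print(free_energy(rho_thermal))      # approximately -ln(Z)
print(free_energy(maximally_mixed))  # strictly larger
```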

## Simulating the VQT for a 4-Qubit Heisenberg Model

In this demonstration, we simulate the 4-qubit Heisenberg model. We can begin by importing the necessary dependencies.

import pennylane as qml
from matplotlib import pyplot as plt
import numpy as np
from numpy import array
import scipy
from scipy.optimize import minimize
import networkx as nx
import seaborn
import itertools


The Heisenberg Hamiltonian is defined as

$\hat{H} \ = \ \displaystyle\sum_{(i, j) \in E} X_i X_j \ + \ Z_i Z_j \ + \ Y_i Y_j,$

where $$X_i$$, $$Y_i$$ and $$Z_i$$ are the Pauli gates acting on the $$i$$-th qubit. In addition, $$E$$ is the set of edges in the graph $$G \ = \ (V, \ E)$$ describing the interactions between the qubits. In this demonstration, we define the interaction graph to be the cycle graph:

interaction_graph = nx.cycle_graph(4)
nx.draw(interaction_graph)


With this, we can calculate the matrix representation of the Heisenberg Hamiltonian in the computational basis:

def create_hamiltonian_matrix(n, graph):

    matrix = np.zeros((2 ** n, 2 ** n))

    for i in graph.edges:
        x = y = z = 1
        for j in range(0, n):
            if j == i[0] or j == i[1]:
                x = np.kron(x, qml.PauliX.matrix)
                y = np.kron(y, qml.PauliY.matrix)
                z = np.kron(z, qml.PauliZ.matrix)
            else:
                x = np.kron(x, np.identity(2))
                y = np.kron(y, np.identity(2))
                z = np.kron(z, np.identity(2))

        # Adds the XX, YY, and ZZ interaction terms for this edge
        matrix = np.add(matrix, np.add(x, np.add(y, z)))

    return matrix

ham_matrix = create_hamiltonian_matrix(4, interaction_graph)

# Prints a visual representation of the Hamiltonian matrix
seaborn.heatmap(ham_matrix.real)
plt.show()


With this done, we construct the VQT. We begin by defining some fixed variables that are used throughout the simulation:

beta = 2  # beta = 1/T
nr_qubits = 4


The first step of the VQT is to create the initial density matrix, $$\rho_\theta$$. In this demonstration, we let $$\rho_\theta$$ be factorized, meaning that it can be written as an uncorrelated tensor product of $$4$$ one-qubit density matrices that are diagonal in the computational basis. The motivation is that in this factorized model, the number of $$\theta_i$$ parameters needed to describe $$\rho_\theta$$ scales linearly rather than exponentially with the number of qubits. For each one-qubit system described by $$\rho_\theta^i$$, we have

$\rho_{\theta}^{i} \ = \ p_i(\theta_i) |0\rangle \langle 0| \ + \ (1 \ - \ p_i(\theta_i))|1\rangle \langle1|.$

From here, all we have to do is define $$p_i(\theta_i)$$, which we choose to be the sigmoid

$p_{i}(\theta_{i}) \ = \ \frac{e^{\theta_i}}{e^{\theta_i} \ + \ 1}.$

def sigmoid(x):
    return np.exp(x) / (np.exp(x) + 1)


This is a natural choice for the probability function, as it has a range of $$(0, \ 1)$$, meaning that we don’t need to restrict the domain of $$\theta_i$$ to some subset of the real numbers. With the probability function defined, we can write a method that gives us the diagonal elements of each one-qubit density matrix, for some parameters $$\theta$$:

def prob_dist(params):
    return np.vstack([sigmoid(params), 1 - sigmoid(params)]).T
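To make the factorized structure concrete, here is a small standalone sketch (sigmoid is redefined so the snippet runs on its own, and the $$\theta$$ values are hypothetical) that assembles the full initial density matrix as a Kronecker product of the diagonal one-qubit matrices:

```python
import numpy as np
from functools import reduce

def sigmoid(x):
    return np.exp(x) / (np.exp(x) + 1)

def build_density_matrix(params):
    # Each one-qubit density matrix is diag(p_i, 1 - p_i)
    probs = np.vstack([sigmoid(params), 1 - sigmoid(params)]).T
    one_qubit_matrices = [np.diag(p) for p in probs]
    # The factorized state is the Kronecker product over all qubits
    return reduce(np.kron, one_qubit_matrices)

# Hypothetical theta parameters for 4 qubits
rho = build_density_matrix(np.array([0.5, -0.5, 1.0, -1.0]))
print(rho.shape)  # (16, 16)
```

The result is a diagonal $$16 \times 16$$ matrix with unit trace, as expected of a density matrix that is diagonal in the computational basis.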


### Creating the Ansatz Circuit

With this done, we can move on to defining the ansatz circuit, $$U(\phi)$$, composed of rotational and coupling layers. The rotation layer is simply RX, RY, and RZ gates applied to each qubit. We make use of the AngleEmbedding function, which allows us to easily pass parameters into rotational layers.

def single_rotation(phi_params, qubits):

    rotations = ["Z", "Y", "X"]
    for i in range(0, len(rotations)):
        qml.templates.AngleEmbedding(phi_params[i], wires=qubits, rotation=rotations[i])


To construct the general ansatz, we combine the method we have just defined with a collection of parametrized coupling gates placed between qubits that share an edge in the interaction graph. In addition, we define the depth of the ansatz, and the device on which the simulations are run:

depth = 4
dev = qml.device("default.qubit", wires=nr_qubits)

def quantum_circuit(rotation_params, coupling_params, sample=None):

    # Prepares the initial basis state corresponding to the sample
    qml.templates.BasisStatePreparation(sample, wires=range(nr_qubits))

    # Prepares the variational ansatz for the circuit
    for i in range(0, depth):
        single_rotation(rotation_params[i], range(nr_qubits))
        qml.broadcast(
            unitary=qml.CRX,
            pattern="ring",
            wires=range(nr_qubits),
            parameters=coupling_params[i]
        )

    # Calculates the expectation value of the Hamiltonian with respect to the prepared states
    return qml.expval(qml.Hermitian(ham_matrix, wires=range(nr_qubits)))

# Constructs the QNode
qnode = qml.QNode(quantum_circuit, dev)


We can get an idea of what this circuit looks like by printing out a test circuit:

rotation_params = [[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]] for i in range(0, depth)]
coupling_params = [[1, 1, 1, 1] for i in range(0, depth)]
results = qnode(rotation_params, coupling_params, sample=[1, 0, 1, 0])
print(qnode.draw())


Out:

 0: ──X──────RZ(1)──RY(1)──RX(1)──╭C─────────────────────────────╭RX(1)──RZ(1)──RY(1)──RX(1)──╭C─────────────────────────────╭RX(1)──RZ(1)──RY(1)──RX(1)──╭C─────────────────────────────╭RX(1)──RZ(1)──RY(1)──RX(1)──╭C──────────────────────╭RX(1)──╭┤ ⟨H0⟩
1: ──RZ(1)──RY(1)──RX(1)─────────╰RX(1)──╭C───────RZ(1)──RY(1)──│───────RX(1)────────────────╰RX(1)──╭C───────RZ(1)──RY(1)──│───────RX(1)────────────────╰RX(1)──╭C───────RZ(1)──RY(1)──│───────RX(1)────────────────╰RX(1)──╭C──────────────│───────├┤ ⟨H0⟩
2: ──X──────RZ(1)──RY(1)──RX(1)──────────╰RX(1)──╭C──────RZ(1)──│───────RY(1)──RX(1)─────────────────╰RX(1)──╭C──────RZ(1)──│───────RY(1)──RX(1)─────────────────╰RX(1)──╭C──────RZ(1)──│───────RY(1)──RX(1)─────────────────╰RX(1)──╭C──────│───────├┤ ⟨H0⟩
3: ──RZ(1)──RY(1)──RX(1)─────────────────────────╰RX(1)─────────╰C──────RZ(1)──RY(1)──RX(1)──────────────────╰RX(1)─────────╰C──────RZ(1)──RY(1)──RX(1)──────────────────╰RX(1)─────────╰C──────RZ(1)──RY(1)──RX(1)──────────────────╰RX(1)──╰C──────╰┤ ⟨H0⟩
H0 =
[[ 4.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  2.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j -4.+0.j  2.+0.j  0.+0.j  0.+0.j
2.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j]
[ 0.+0.j  2.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j
2.+0.j -4.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  2.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  2.+0.j  0.+0.j  0.+0.j]
[ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  4.+0.j]]


Recall that the final cost function depends not only on the expectation value of the Hamiltonian, but also the von Neumann entropy of the state, which is determined by the collection of $$p_i(\theta_i)$$s. Since the entropy of a collection of multiple uncorrelated subsystems is the same as the sum of the individual values of entropy for each subsystem, we can sum the entropy values of each one-qubit system in the factorized space to get the total:

def calculate_entropy(distribution):

    total_entropy = 0
    for d in distribution:
        total_entropy += -1 * d[0] * np.log(d[0]) + -1 * d[1] * np.log(d[1])

    # Returns the sum of the entropies of the one-qubit subsystems
    return total_entropy



### The Cost Function

Finally, we combine the ansatz and the entropy function to get the cost function. In this demonstration, we deviate slightly from how VQT would be performed in practice. Instead of sampling from the probability distribution describing the initial mixed state, we use the ansatz to calculate $$\langle x_i | U^{\dagger}(\phi) \hat{H} \,U(\phi) |x_i\rangle$$ for each basis state $$|x_i\rangle$$. We then multiply each of these expectation values by their corresponding $$(\rho_\theta)_{ii}$$, which is exactly the probability of sampling $$|x_i\rangle$$ from the distribution. Summing each of these terms together gives us the expected value of the Hamiltonian with respect to the transformed density matrix.

In the case of this small, simple model, exact calculations such as this reduce the number of circuit executions, and thus the total execution time.
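As a toy illustration of this weighted sum (with $$U$$ taken to be the identity and a hypothetical diagonal Hamiltonian, so the per-basis-state expectation values are just the diagonal entries):

```python
import numpy as np

# Hypothetical 2-qubit Hamiltonian, diagonal in the computational basis
hamiltonian = np.diag([4.0, -2.0, -2.0, 4.0])

# (rho_theta)_{ii}: the probability of sampling each basis state |x_i>
probabilities = np.array([0.5, 0.25, 0.125, 0.125])

# With U = I, <x_i| U^dag H U |x_i> = H_{ii}, so the expectation value
# is the probability-weighted sum of the diagonal entries of H
expectation = np.sum(probabilities * np.diag(hamiltonian))
print(expectation)  # 2.0 - 0.5 - 0.25 + 0.5 = 1.75
```

In the demo itself, each $$\langle x_i | U^{\dagger}(\phi) \hat{H}\, U(\phi) |x_i\rangle$$ comes from a circuit execution rather than a matrix diagonal, but the weighting by $$(\rho_\theta)_{ii}$$ works the same way.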

You may have noticed previously that the “structure” of the parameter list passed into the ansatz is quite complicated. We write a general function that takes a one-dimensional list and converts it into the nested list structure that can be input into the ansatz:

def convert_list(params):

    # Separates the list of parameters
    dist_params = params[0:nr_qubits]
    ansatz_params_1 = params[nr_qubits : ((depth + 1) * nr_qubits)]
    ansatz_params_2 = params[((depth + 1) * nr_qubits) :]

    coupling = np.split(ansatz_params_1, depth)

    # Partitions the parameters into multiple lists
    split = np.split(ansatz_params_2, depth)
    rotation = []
    for s in split:
        rotation.append(np.split(s, 3))

    ansatz_params = [rotation, coupling]

    return [dist_params, ansatz_params]


We then pass this function, along with the ansatz and the entropy function into the final cost function:

def exact_cost(params):

    # Transforms the parameter list
    parameters = convert_list(params)
    dist_params = parameters[0]
    ansatz_params = parameters[1]

    # Creates the probability distribution
    distribution = prob_dist(dist_params)

    # Generates a list of all computational basis states of our qubit system
    combos = itertools.product([0, 1], repeat=nr_qubits)
    s = [list(c) for c in combos]

    # Passes each basis state through the variational circuit and multiplies
    # the calculated energy EV with the associated probability from the distribution
    cost = 0
    for i in s:
        result = qnode(ansatz_params[0], ansatz_params[1], sample=i)
        for j in range(0, len(i)):
            result = result * distribution[j][i[j]]
        cost += result

    # Calculates the entropy and the final cost function
    entropy = calculate_entropy(distribution)
    final_cost = beta * cost - entropy

    return final_cost


We then create the function that is passed into the optimizer:

def cost_execution(params):

    global iterations

    cost = exact_cost(params)

    if iterations % 50 == 0:
        print("Cost at Step {}: {}".format(iterations, cost))

    iterations += 1
    return cost


The last step is to define the optimizer and execute the optimization. We use the “Constrained Optimization by Linear Approximation” (COBYLA) method, which is a gradient-free optimizer. We observe that, for this algorithm, COBYLA has a lower runtime than its gradient-based counterparts, so we utilize it in this demonstration:

iterations = 0

number = nr_qubits * (1 + depth * 4)
params = [np.random.randint(-300, 300) / 100 for i in range(0, number)]
out = minimize(cost_execution, x0=params, method="COBYLA", options={"maxiter": 1600})
out_params = out["x"]


Out:

Cost at Step 0: -3.0383370128131006
Cost at Step 50: -6.543355901672349
Cost at Step 100: -7.577195324266485
Cost at Step 150: -7.6182141575788975
Cost at Step 200: -9.238185671551406
Cost at Step 250: -10.539175424920677
Cost at Step 300: -11.133188631591441
Cost at Step 350: -11.94103674677227
Cost at Step 400: -12.755762451651176
Cost at Step 450: -13.022487958442019
Cost at Step 500: -13.162401945000605
Cost at Step 550: -13.18543103305229
Cost at Step 600: -13.519374692963437
Cost at Step 650: -13.712909107717824
Cost at Step 700: -13.814476716375879
Cost at Step 750: -14.005177426893587
Cost at Step 800: -14.00280412101905
Cost at Step 850: -14.2449191759581
Cost at Step 900: -14.351035614437961
Cost at Step 950: -14.401129730191636
Cost at Step 1000: -14.439249499636398
Cost at Step 1050: -14.5295362057969
Cost at Step 1100: -14.607975851568627
Cost at Step 1150: -14.659533426443431
Cost at Step 1200: -14.691345120842747
Cost at Step 1250: -14.728470516907826
Cost at Step 1300: -14.712401223160892
Cost at Step 1350: -14.84061658816376
Cost at Step 1400: -14.87770133620838
Cost at Step 1450: -14.9070799348634
Cost at Step 1500: -15.000590856330298
Cost at Step 1550: -15.043415471395024


We can now check to see how well our optimization method performed by writing a function that reconstructs the transformed density matrix of some initial state, with respect to lists of $$\theta$$ and $$\phi$$ parameters:

def prepare_state(params, device):

    # Initializes the density matrix
    final_density_matrix = np.zeros((2 ** nr_qubits, 2 ** nr_qubits))

    # Prepares the optimal parameters, creates the distribution and the bitstrings
    parameters = convert_list(params)
    dist_params = parameters[0]
    unitary_params = parameters[1]

    distribution = prob_dist(dist_params)

    combos = itertools.product([0, 1], repeat=nr_qubits)
    s = [list(c) for c in combos]

    # Runs the circuit in the case of the optimal parameters, for each bitstring,
    # and adds the weighted outer product of the resulting state to the density matrix
    for i in s:
        qnode(unitary_params[0], unitary_params[1], sample=i)
        state = device.state
        for j in range(0, len(i)):
            state = np.sqrt(distribution[j][i[j]]) * state
        final_density_matrix = np.add(final_density_matrix, np.outer(state, np.conj(state)))

    return final_density_matrix

# Prepares the density matrix
prep_density_matrix = prepare_state(out_params, dev)


We then display the prepared state by plotting a heatmap of the entry-wise absolute value of the density matrix:

seaborn.heatmap(abs(prep_density_matrix))
plt.show()


### Numerical Calculations

To verify that we have in fact prepared a good approximation of the thermal state, let’s calculate it numerically by taking the matrix exponential of the Heisenberg Hamiltonian, as was outlined earlier.

def create_target(qubit, beta, ham, graph):

    # Calculates the matrix form of the density matrix, by taking
    # the exponential of the Hamiltonian
    h = ham(qubit, graph)
    y = -1 * float(beta) * h
    new_matrix = scipy.linalg.expm(np.array(y))
    norm = np.trace(new_matrix)
    final_target = (1 / norm) * new_matrix

    return final_target

target_density_matrix = create_target(
    nr_qubits, beta, create_hamiltonian_matrix, interaction_graph
)


Finally, we can plot a heatmap of the target density matrix:

seaborn.heatmap(abs(target_density_matrix))
plt.show()


The two images look very similar, which suggests that we have constructed a good approximation of the thermal state! Alternatively, if you prefer a more quantitative measure of similarity, we can calculate the trace distance between the two density matrices, which is defined as

$T(\rho, \ \sigma) \ = \ \frac{1}{2} \text{Tr} \sqrt{(\rho \ - \ \sigma)^{\dagger} (\rho \ - \ \sigma)},$

and is a metric on the space of density matrices:

def trace_distance(one, two):

    # For a Hermitian matrix such as rho - sigma, the trace norm is
    # the sum of the absolute values of its eigenvalues
    return 0.5 * np.sum(np.absolute(np.linalg.eigvalsh(one - two)))

print("Trace Distance: " + str(trace_distance(target_density_matrix, prep_density_matrix)))


Out:

Trace Distance: 0.0653618026621349


The closer the trace distance is to zero, the more similar the two states are. Thus, we have found a close approximation of the thermal state of $$\hat{H}$$ with the VQT!