Running GPU-accelerated quantum circuit simulations on Covalent Cloud using PennyLane
Published: May 23, 2024. Last updated: November 05, 2024.
In this tutorial, we’ll demonstrate how to run GPU-accelerated quantum circuit simulations on Covalent Cloud using PennyLane. We’ll focus on a specific example, quantum support vector machines (QSVMs), to show how straightforward this is.
QSVMs are essentially traditional SVMs whose kernel is evaluated on a quantum computer (a "quantum embedding kernel"). These kernels provide a unique (and perhaps classically intractable) means of measuring pairwise similarity.
Using GPUs to simulate quantum computers is worthwhile when qubit capacity and/or fidelity requirements are not met by the available quantum hardware. While QSVMs are relatively tolerant to noise (an important reason for their current popularity), evaluating kernels on real quantum hardware is not always practical or necessary.
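To make the idea concrete before we get to PennyLane, here's a minimal NumPy-only sketch of an embedding kernel: each point x is mapped to a quantum state |φ(x)⟩, and the kernel is the state overlap k(x1, x2) = |⟨φ(x2)|φ(x1)⟩|². The toy embedding below (Hadamards followed by data-dependent phase rotations) is a hypothetical stand-in, not the embedding used later in the tutorial.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

def embed(x):
    """Encode a 2-feature point into a 2-qubit state: H on each qubit,
    then a phase rotation RZ(x_i) on each qubit (toy embedding)."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0  # start in |00>
    state = np.kron(H, H) @ state  # apply H on both qubits
    rz = lambda t: np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])
    state = np.kron(rz(x[0]), rz(x[1])) @ state  # data-dependent phases
    return state

def kernel(x1, x2):
    # State fidelity as a similarity measure
    return abs(np.vdot(embed(x2), embed(x1))) ** 2

a, b = np.array([0.3, 1.2]), np.array([0.7, -0.4])
print(round(kernel(a, a), 6))  # identical inputs overlap perfectly: ~1.0
print(kernel(a, b) == kernel(b, a))  # fidelity is symmetric
```

Note the two properties that make this a valid kernel for an SVM: k(x, x) = 1 and k(x1, x2) = k(x2, x1), with all values in [0, 1].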
Implementation
Let’s start by importing the required packages.
import covalent as ct
import covalent_cloud as cc
import matplotlib.pyplot as plt
import pennylane as qml
from matplotlib.colors import ListedColormap
from pennylane import numpy as np
from sklearn.datasets import make_blobs
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# cc.save_api_key("YOUR_API_KEY")
Covalent Cloud allows us to create reusable execution environments, as shown below. This environment represents a typical setup for running PennyLane on NVIDIA GPUs.
cc.create_env(
    name="pennylane-gpu",  # identifier for referring to this environment
    conda={
        "channels": ["conda-forge"],
        "dependencies": ["cudatoolkit>=11.8"],
    },
    pip=[
        "cuquantum==23.10.0",
        "matplotlib==3.8.2",
        "Pennylane==0.34.0",
        "PennyLane-Lightning[GPU]==0.34.0",
        "scikit-learn==1.3.1",
        "torch==2.1.2",
    ],
    wait=True,
)
Next, we’ll define our resource specifications by creating some executors for this workflow. Both executors will run tasks in our new environment, named "pennylane-gpu".
cpu_executor = cc.CloudExecutor(  # for lightweight non-quantum tasks
    env="pennylane-gpu",
    num_cpus=2,
    memory="2GB",
)

gpu_executor = cc.CloudExecutor(  # for GPU-powered circuit simulations
    env="pennylane-gpu",
    num_cpus=4,
    memory="12GB",
    num_gpus=1,
    gpu_type="v100",
)
On to the algorithm!
Here’s a function that returns a simple quantum kernel based on PennyLane’s IQP Embedding template. We’ll use it as-is inside our workflow.
QML_DEVICE = "lightning.gpu"

def get_kernel_circuit(n_wires):

    @qml.qnode(qml.device(QML_DEVICE, wires=n_wires, shots=None))
    def circuit(x1, x2):
        qml.IQPEmbedding(x1, wires=range(n_wires), n_repeats=4)
        qml.adjoint(qml.IQPEmbedding)(x2, wires=range(n_wires), n_repeats=4)
        return qml.probs(wires=range(n_wires))

    return lambda x1, x2: circuit(x1, x2)[0]  # |0..0> state probability
Next, each function destined for remote execution is decorated with @ct.electron, with an executor specified therein. Only tasks that evaluate the simulated quantum kernel require the gpu_executor. For example, we don’t need GPUs to generate our input data:
@ct.electron(executor=cpu_executor)  # lightweight non-quantum task
def get_split_data(n_samples=18, test_size=0.2):
    centers = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
    X, y = make_blobs(n_samples, centers=centers, cluster_std=0.25, shuffle=False)

    # rescale labels to be -1, 1
    mapping = {0: -1, 1: 1, 2: -1, 3: 1, 4: -1, 5: 1, 6: -1, 7: 1, 8: -1}
    y = np.array([mapping[i] for i in y])
    X = X.astype(np.float32)
    y = y.astype(int)

    # X_train, X_test, y_train, y_test
    return train_test_split(X, y, test_size=test_size, random_state=3)
Classifying with the SVM, on the other hand, requires \(O(n^2)\) kernel evaluations, where \(n\) is the dataset size. Accordingly, we’ll use GPUs (i.e. the gpu_executor) to speed up this process.
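To make the quadratic cost concrete, here's a back-of-envelope sketch that counts kernel evaluations when building a full Gram matrix naively. A classical RBF kernel stands in for the (much more expensive) simulated quantum kernel; the names here are illustrative, not part of the tutorial code.

```python
import numpy as np

calls = 0  # global counter of kernel evaluations

def rbf(x1, x2, gamma=1.0):
    """Classical RBF kernel, standing in for a simulated quantum kernel."""
    global calls
    calls += 1
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

rng = np.random.default_rng(0)
n = 20
X = rng.normal(size=(n, 2))

# Naive full Gram matrix: one evaluation per (i, j) pair
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])
print(calls)  # n**2 = 400 evaluations for just 20 samples
```

Since the matrix is symmetric with unit diagonal, a smarter implementation evaluates only \(n(n-1)/2\) distinct pairs, but the scaling is still quadratic, which is exactly why each evaluation needs to be fast.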
DISP_SETTINGS = {
    "grid_resolution": 50,
    "response_method": "predict",
    "alpha": 0.5,
    "cmap": plt.cm.RdBu,
}

@ct.electron(executor=gpu_executor)
def classify_with_qsvm(Xtr, Xte, ytr, yte):
    kernel = get_kernel_circuit(n_wires=Xtr.shape[1])
    kernel_matrix_fn = lambda X, Z: qml.kernels.kernel_matrix(X, Z, kernel)
    svc = SVC(kernel=kernel_matrix_fn).fit(Xtr, ytr)

    # train/test accuracy
    accuracy_tr = svc.score(Xtr, ytr)
    accuracy_te = svc.score(Xte, yte)

    # decision boundary
    cm_bright = ListedColormap(["#FF0000", "#0000FF"])
    disp = DecisionBoundaryDisplay.from_estimator(svc, Xte, **DISP_SETTINGS)
    disp.ax_.scatter(Xtr[:, 0], Xtr[:, 1], c=ytr, cmap=cm_bright)
    disp.ax_.scatter(Xte[:, 0], Xte[:, 1], c=yte, cmap=cm_bright, marker="$\u25EF$")

    return accuracy_tr, accuracy_te, disp
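The key mechanism above is that scikit-learn's SVC accepts a callable kernel: a function that takes two data matrices and returns the full kernel (Gram) matrix. Here's a minimal standalone illustration of that interface, using a plain linear kernel and a hypothetical toy dataset in place of the quantum kernel and blobs data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (hypothetical, for illustration only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# SVC's `kernel` parameter accepts a callable returning the Gram matrix
linear_kernel_matrix = lambda A, B: A @ B.T

svc = SVC(kernel=linear_kernel_matrix).fit(X, y)
print(svc.score(X, y))  # perfectly separable, so training accuracy is 1.0
```

In the tutorial, `qml.kernels.kernel_matrix` plays the role of `linear_kernel_matrix`, evaluating the simulated quantum kernel for every pair of rows.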
Putting it all together, we can define a QSVM training and testing workflow. This special function gets decorated with @ct.lattice.
@ct.lattice(workflow_executor=cpu_executor, executor=cpu_executor)
def run_qsvm(n_samples, test_size):
    Xtr, Xte, ytr, yte = get_split_data(n_samples, test_size)
    return classify_with_qsvm(Xtr, Xte, ytr, yte)
Now, to dispatch run_qsvm to Covalent Cloud, we call it after wrapping it with cc.dispatch, as usual.
dispatch_id = cc.dispatch(run_qsvm)(n_samples=64, test_size=0.2)
print("Dispatch ID:", dispatch_id)
Here’s what we get when we query and display the results.
result = cc.get_result(dispatch_id, wait=True)
result.result.load()
train_acc, test_acc, decision_boundary_figure = result.result.value
print(f"Train accuracy: {train_acc * 100:.1f}%")
print(f"Test accuracy: {test_acc * 100:.1f}%")
decision_boundary_figure

Conclusion
In this tutorial, we demonstrated how to run quantum circuit simulations on GPUs via Covalent Cloud. We used PennyLane to define a simple quantum kernel, and then trained and tested a QSVM on a 2-dimensional dataset. To make the most of this tutorial, try experimenting with different datasets and kernels, or increasing the dataset dimension to gain a greater advantage from GPU acceleration.
The cost of running this workflow is approximately $0.27. The full code is available below, and you can try it yourself with the Run on Covalent Cloud button in the side menu on the right.
Full Code
import covalent as ct
import covalent_cloud as cc
import matplotlib.pyplot as plt
import pennylane as qml
from matplotlib.colors import ListedColormap
from pennylane import numpy as np
from sklearn.datasets import make_blobs
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cc.save_api_key("API_KEY")

cc.create_env(
    name="pennylane-gpu",  # identifier for referring to this environment
    conda={
        "channels": ["conda-forge"],
        "dependencies": ["cudatoolkit>=11.8"],
    },
    pip=[
        "cuquantum==23.10.0",
        "matplotlib==3.8.2",
        "Pennylane==0.34.0",
        "PennyLane-Lightning[GPU]==0.34.0",
        "scikit-learn==1.3.1",
        "torch==2.1.2",
    ],
    wait=True,
)

cpu_executor = cc.CloudExecutor(  # for lightweight non-quantum tasks
    env="pennylane-gpu",
    num_cpus=2,
    memory="2GB",
)

gpu_executor = cc.CloudExecutor(  # for GPU-powered circuit simulations
    env="pennylane-gpu",
    num_cpus=4,
    memory="12GB",
    num_gpus=1,
    gpu_type="v100",
)

QML_DEVICE = "lightning.gpu"

def get_kernel_circuit(n_wires):

    @qml.qnode(qml.device(QML_DEVICE, wires=n_wires, shots=None))
    def circuit(x1, x2):
        qml.IQPEmbedding(x1, wires=range(n_wires), n_repeats=4)
        qml.adjoint(qml.IQPEmbedding)(x2, wires=range(n_wires), n_repeats=4)
        return qml.probs(wires=range(n_wires))

    return lambda x1, x2: circuit(x1, x2)[0]  # |0..0> state probability

@ct.electron(executor=cpu_executor)  # lightweight non-quantum task
def get_split_data(n_samples=18, test_size=0.2):
    centers = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
    X, y = make_blobs(n_samples, centers=centers, cluster_std=0.25, shuffle=False)

    # rescale labels to be -1, 1
    mapping = {0: -1, 1: 1, 2: -1, 3: 1, 4: -1, 5: 1, 6: -1, 7: 1, 8: -1}
    y = np.array([mapping[i] for i in y])
    X = X.astype(np.float32)
    y = y.astype(int)

    # X_train, X_test, y_train, y_test
    return train_test_split(X, y, test_size=test_size, random_state=3)

DISP_SETTINGS = {
    "grid_resolution": 50,
    "response_method": "predict",
    "alpha": 0.5,
    "cmap": plt.cm.RdBu,
}

@ct.electron(executor=gpu_executor)
def classify_with_qsvm(Xtr, Xte, ytr, yte):
    kernel = get_kernel_circuit(n_wires=Xtr.shape[1])
    kernel_matrix_fn = lambda X, Z: qml.kernels.kernel_matrix(X, Z, kernel)
    svc = SVC(kernel=kernel_matrix_fn).fit(Xtr, ytr)

    # train/test accuracy
    accuracy_tr = svc.score(Xtr, ytr)
    accuracy_te = svc.score(Xte, yte)

    # decision boundary
    cm_bright = ListedColormap(["#FF0000", "#0000FF"])
    disp = DecisionBoundaryDisplay.from_estimator(svc, Xte, **DISP_SETTINGS)
    disp.ax_.scatter(Xtr[:, 0], Xtr[:, 1], c=ytr, cmap=cm_bright)
    disp.ax_.scatter(Xte[:, 0], Xte[:, 1], c=yte, cmap=cm_bright, marker="$\u25EF$")

    return accuracy_tr, accuracy_te, disp

@ct.lattice(workflow_executor=cpu_executor, executor=cpu_executor)
def run_qsvm(n_samples, test_size):
    Xtr, Xte, ytr, yte = get_split_data(n_samples, test_size)
    return classify_with_qsvm(Xtr, Xte, ytr, yte)

dispatch_id = cc.dispatch(run_qsvm)(n_samples=64, test_size=0.2)
print("Dispatch ID:", dispatch_id)

result = cc.get_result(dispatch_id, wait=True)
result.result.load()

train_acc, test_acc, decision_boundary_figure = result.result.value
print(f"Train accuracy: {train_acc * 100:.1f}%")
print(f"Test accuracy: {test_acc * 100:.1f}%")
About the authors
Santosh Kumar Radha
Santosh is the head of research and development and product lead at Agnostiq.
Ara Ghukasyan
Ara is a Research Software Engineer at Agnostiq Inc. He has a B.Sc. in Math & Physics and a Ph.D. in Engineering Physics from McMaster University in Hamilton, Ontario. Ara’s interests include machine learning, physics, and quantum computing.