December 09, 2025
Expanding the Frontier: Accelerating Quantum Workflows with PennyLane and AMD
High-Performance Computing (HPC) loves AMD, and it's safe to say that PennyLane loves AMD, too.
With AMD accelerators powering some of the world's most formidable machines—from current Top500 leaders and exascale giants El Capitan and Frontier, to future systems like the recently announced Alice Recoque, Discovery, and Lux—the hardware landscape for scientific discovery is shifting.
As highlighted in AMD's recent blog, these accelerators are pushing the boundaries of what is computationally possible. At the same time, platforms such as AMD Developer Cloud are making it easier to access high-end AMD GPUs without requiring supercomputing resources.
At PennyLane, we are committed to enabling your quantum research on whatever hardware you have. Whether you are prototyping on a personal workstation, testing an instance on the Developer Cloud, or running massive simulations on the world's flagship supercomputers, PennyLane is ready to unlock the full potential of AMD hardware.
Here is how we are making it happen, and everything you can do with PennyLane and AMD 🪄
Contents
- First things first: Welcome Lightning-AMDGPU
- Validated on Frontier: A User-Friendly Experience
- The Secret Sauce: MPI for Massive Scalability
- Supercharging Compilation with Catalyst
- How to Get Started
First things first: Welcome Lightning-AMDGPU
To make it seamless to get started on AMD GPUs, we have released Lightning-AMDGPU—binaries of our existing Lightning-Kokkos simulator precompiled specifically for AMD GPUs.
We built this device for ease of use and performance. You can now access the raw power of lightning-fast AMD Instinct™ GPUs with a simple install on your Developer Cloud instance:
pip install pennylane-lightning-amdgpu
(Note: Precompiled wheels are currently available for ROCm 7.0 and MI300 series GPUs.)
What's happening under the hood? Lightning-AMDGPU utilizes our highly portable Lightning-Kokkos simulator backend, powered by the Kokkos portability framework. This allows us to write C++ code that runs seamlessly across CPUs and GPUs. Crucially, on AMD platforms, this code is directly lowered into HIP—AMD's native programming model—to enable maximum performance.
Go check out the Lightning-AMDGPU device details and try it out yourself today!
Validated on Frontier: A User-Friendly Experience
What if you have access to a larger HPC system with multiple AMD GPUs? We have you covered there, too. By building Lightning-Kokkos from source, you unlock the ability to scale up with distributed simulations across multiple GPUs.
But talking about scaling is one thing; doing it on the world's first exascale supercomputer is another.
We recently collaborated with Oak Ridge National Laboratory (ORNL) to demonstrate the power of PennyLane's Lightning simulator, specifically how easy it is to get PennyLane and Lightning up and running on Frontier.
The widespread perception is that exascale requires complex, bespoke code. However, installing and using Lightning on Frontier is straightforward. We have released easy-to-follow instructions, including specific tips to help you take full advantage of Frontier's high-bandwidth interconnect.
Below is a strong scaling plot for the Quantum Fourier Transform (QFT), a common subroutine in many quantum algorithms. As you can see, Lightning-Kokkos easily supports simulation on over 1000 AMD GPUs, and adding more AMD hardware resources translates directly into performance gains.
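For readers new to strong scaling, a plot like this encodes two quantities for a fixed problem size: speedup relative to the smallest run, and parallel efficiency relative to ideal scaling. The minimal Python sketch below computes both from wall-clock timings. The function name `strong_scaling` and the timing values are illustrative placeholders, not measurements from Frontier.

```python
from typing import Dict, List, Tuple


def strong_scaling(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (gpus, speedup, efficiency) rows for a FIXED problem size.

    `timings` maps a GPU count to the wall-clock time in seconds.
    The smallest GPU count is used as the baseline.
    """
    if not timings:
        raise ValueError("need at least one timing")
    base_gpus = min(timings)
    base_time = timings[base_gpus]
    rows = []
    for gpus in sorted(timings):
        speedup = base_time / timings[gpus]
        ideal_speedup = gpus / base_gpus
        rows.append((gpus, speedup, speedup / ideal_speedup))
    return rows


# Placeholder numbers for illustration only (hypothetical, not measured).
placeholder_timings = {64: 120.0, 128: 63.0, 256: 34.0}

for gpus, speedup, efficiency in strong_scaling(placeholder_timings):
    print(f"{gpus:5d} GPUs  speedup {speedup:5.2f}x  efficiency {efficiency:6.1%}")
```

An efficiency close to 100% means that doubling the number of GPUs roughly halves the runtime, which is what "translates directly into performance gains" looks like in numbers.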
Want to dive deeper?
- Watch: We presented a comprehensive tutorial for the Oak Ridge Leadership Computing Facility (OLCF) community. Watch the full tutorial: PennyLane on Frontier 2025 (Vimeo)
- Code: Try the exact demo examples we ran on Frontier. The code is open-source and available at the OLCF Quantum Training Series GitHub.
The Secret Sauce: MPI for Massive Scalability
How did we achieve the results above?
The core of this performance is PennyLane Lightning, our high-performance simulator suite. We have worked hard to make Lightning not just faster, but fundamentally "HPC-friendly". Since v0.42, we have added powerful functionality to run parallelized circuit simulations at virtually any scale by leveraging MPI (Message Passing Interface) support within Lightning-Kokkos.
With MPI, you are no longer limited to a single GPU. You can distribute the state vector of a single circuit across multiple GPUs and multiple nodes. This allows you to:
- Run faster: Parallelize large-scale simulations to drastically reduce runtime.
- Go bigger: Simulate a larger number of qubits than can fit into the memory of a single GPU.
This architecture is the key ingredient behind the impressive scaling plot seen on Frontier.
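As a rough illustration of the "go bigger" point, the Python sketch below estimates how many qubits fit once a state vector is split across GPUs. It assumes double-precision complex amplitudes (16 bytes each), a power-of-two number of GPUs, and an even split in which the most significant index bits select the owning rank. That layout is a common scheme for distributed state vectors and is an assumption here, not a description of Lightning-Kokkos internals. The helper names and the 64 GiB figure are hypothetical, and a real simulation also needs memory for communication buffers and workspace.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: complex128 (two 64-bit floats)


def _log2_power_of_two(value: int) -> int:
    """Return log2(value), requiring value to be a positive power of two."""
    if value < 1 or value & (value - 1):
        raise ValueError("expected a positive power of two")
    return value.bit_length() - 1


def state_vector_bytes(num_qubits: int) -> int:
    """Memory needed for a full state vector of `num_qubits` qubits."""
    return BYTES_PER_AMPLITUDE * 2**num_qubits


def max_qubits(memory_bytes_per_gpu: int, num_gpus: int = 1) -> int:
    """Upper bound on qubits when the state is split evenly across GPUs.

    Ignores buffers and workspace, so real limits are somewhat lower.
    """
    amplitudes_per_gpu = memory_bytes_per_gpu // BYTES_PER_AMPLITUDE
    if amplitudes_per_gpu < 1:
        raise ValueError("not enough memory for a single amplitude")
    local_qubits = amplitudes_per_gpu.bit_length() - 1  # floor(log2)
    return local_qubits + _log2_power_of_two(num_gpus)


def owner_rank(index: int, num_qubits: int, num_gpus: int) -> int:
    """Rank that stores amplitude `index` (top index bits pick the rank)."""
    global_bits = _log2_power_of_two(num_gpus)
    return index >> (num_qubits - global_bits)


GIB = 2**30
hypothetical_gpu_memory = 64 * GIB  # placeholder value, not a hardware spec

print(max_qubits(hypothetical_gpu_memory, num_gpus=1))
print(max_qubits(hypothetical_gpu_memory, num_gpus=1024))
```

Under these assumptions, every doubling of the GPU count adds one qubit to the largest state vector that fits, because each extra qubit doubles the number of amplitudes.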
Supercharging Compilation with Catalyst
Scaling the simulator is vital, but what about optimizing complex hybrid workflows?
Catalyst is our Quantum Just-In-Time (QJIT) compiler. It takes your hybrid quantum-classical programs and efficiently compiles them, unlocking significant speedups and enabling features like AutoGraph and optimized dynamic quantum circuits.
Does this work on AMD GPUs? Absolutely.
Catalyst seamlessly integrates with the Lightning-AMDGPU (and of course, Lightning-Kokkos) backend. This means you can combine the compilation power of Catalyst with the raw throughput of Lightning running on an AMD ROCm device. This combination allows for:
- Just-in-Time Compilation: Compile the entire workflow (not just the quantum circuit) for optimized execution.
- Hardware Acceleration: Offload the heavy lifting to AMD GPUs automatically.
Following our previous exploration of the quantitative advantage of compiling your hybrid quantum-classical program, let's look at the advantage gained by using Catalyst to optimize a simple circuit:
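import pennylane as qml

# Device name assumed here; see the Lightning-AMDGPU device details for the exact string.
dev = qml.device("lightning.amdgpu", wires=2)

@qml.qjit(autograph=True)
@qml.qnode(dev)
def circuit(x: float, y: float, N: int):
    # AutoGraph captures this Python loop as a single structured loop
    for i in range(N):
        # Two consecutive Hadamards are inverses of each other
        qml.Hadamard(wires=0)
        qml.Hadamard(wires=0)
        # Two consecutive RX rotations on the same wire
        qml.RX(x, wires=1)
        qml.RX(y, wires=1)
    return qml.probs(wires=[0, 1])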
We apply two simple optimization passes here: canceling consecutive gates that are adjoints of each other, and merging consecutive rotation gates. We executed this on an AMD Developer Cloud instance using Lightning-AMDGPU.
The Result: Whereas the time PennyLane takes to apply the optimization passes grows with the circuit's gate depth, Catalyst's compilation time remains constant regardless of depth. This is the power of preserving control structure for quantum optimization.
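To make the intuition concrete, here is a toy Python sketch of the two passes acting on a plain list of gates. It is not Catalyst's or PennyLane's implementation: the `Gate` class and `peephole` function are hypothetical, and the pass only looks at neighbouring entries in the list rather than tracking adjacency per wire. It counts how many gates the pass has to visit when the loop is unrolled versus when only the loop body is optimized once.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gate:
    name: str
    wire: int
    angle: Optional[float] = None  # set only for rotation gates


SELF_INVERSE = {"Hadamard", "PauliX", "PauliY", "PauliZ"}


def peephole(gates: List[Gate]) -> Tuple[List[Gate], int]:
    """Cancel adjacent self-inverse pairs and merge adjacent rotations.

    Returns the optimized gate list and the number of gates visited.
    """
    out: List[Gate] = []
    visited = 0
    for gate in gates:
        visited += 1
        prev = out[-1] if out else None
        same_target = (
            prev is not None and prev.name == gate.name and prev.wire == gate.wire
        )
        if same_target and gate.name in SELF_INVERSE:
            out.pop()  # G followed by G is the identity
        elif same_target and gate.angle is not None:
            # R(a) followed by R(b) about the same axis equals R(a + b)
            out[-1] = Gate(gate.name, gate.wire, prev.angle + gate.angle)
        else:
            out.append(gate)
    return out, visited


def loop_body(x: float, y: float) -> List[Gate]:
    return [
        Gate("Hadamard", 0),
        Gate("Hadamard", 0),
        Gate("RX", 1, x),
        Gate("RX", 1, y),
    ]


x, y, N = 0.1, 0.2, 1000

# Unrolled: the pass sees every gate of every iteration.
unrolled_result, unrolled_work = peephole(loop_body(x, y) * N)

# Structured: the pass sees the loop body once; the loop itself is kept.
body_result, body_work = peephole(loop_body(x, y))

print(unrolled_work, body_work)
```

In this toy model the unrolled circuit costs work proportional to `N`, while the structured version costs the same four gate visits for any `N`, which mirrors the constant compilation time described above.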
With the combined power of Catalyst and Lightning on AMD, you can take the acceleration of your research workflow to the next level. Check out the rest of the blog post for more interesting workflows and speed-ups brought to you by Catalyst, or read our latest work on constant-time compilation of Shor's algorithm and the associated demo—those same benefits apply here as well.
How to Get Started
Ready to run on AMD? Here are several easy ways to get you up and running:
Pip install on AMD Developer Cloud
The easiest way to start running on MI300X GPUs is via pip:
pip install pennylane pennylane-lightning-amdgpu
Then you'll be good to go! Check out AMD's latest blog post for a deeper dive into setting up and using PennyLane and Lightning on their Developer Cloud.
Docker images
If you want to use our pre-built images, first follow AMD's quick start guide for their Container Toolkit. Then you'll be ready to use our Docker Image:
docker run -it --rm --runtime=amd --gpus 1 pennylaneai/pennylane:latest-lightning-kokkos-rocm /bin/bash
From source
For maximum performance, MPI scaling, and custom configurations (like on Frontier), check out our build guides:
- Build Lightning-AMDGPU from source
- Build Lightning-Kokkos from source
- Guide: Lightning-Kokkos for HPC
Take Lightning for a spin wherever you have AMD GPUs—from laptops, to supercomputing centers, to the cloud—and scale your quantum simulations even further.
Interested in getting in touch with us? Head over to our GitHub repository to check out the ongoing work and join in on the development, or pop over to the PennyLane discussion forum to let us know how you are using PennyLane, Lightning, and Catalyst in your workflows.
And if you are as excited as we are, make sure to keep an eye on the PennyLane Blog and follow us on social media for the latest PennyLane features and updates!
About the authors
Joseph Lee
Joseph is a physicist and software developer trying to make PennyLane go zoom.
Lee O'Riordan
Physicist, purveyor of angular momentum, GPUs, pointy guitars, and computational things. Working on quantum stuff.
Josh Izaac
Josh is a theoretical physicist, software tinkerer, and occasional baker. At Xanadu, he contributes to the development and growth of Xanadu’s open-source quantum software products.