August 29, 2023

Introducing PennyLane Datasets

Diego Guala

Josh Izaac

Easy access to large datasets is integral to quantum computing research, especially as quantum algorithms and hardware scale. Today, we're delighted to launch PennyLane Datasets, a brand new online library that makes it easier to browse the available datasets and find exactly what you are looking for. It complements the existing datasets functionality in PennyLane.

Capybara, giraffe, lion, and kangaroo standing around a quantum circuit

We've been sharing PennyLane Demos with a mission to make impactful research accessible and to allow researchers across the spectrum of quantum computing to explore, understand, and generalize new results by simply copying and modifying existing code.

Imagine if you could do the same with data!

No longer do you need to spend hours sifting through GitHub repos or attempting to reverse-engineer data from papers to make it code-compatible — use PennyLane Datasets to dive in, build, and explore data-driven quantum algorithms.

Our datasets library is currently in preview; give it a whirl, and let us know if there are any datasets you would love to see in it. Or, read on to learn more about how PennyLane Datasets work, the data currently available, and what we have planned for the future.

What is quantum data anyway?

Quantum data is any data that goes in or comes out of a quantum system.

In our first release of the PennyLane Datasets library, you’ll find a collection of quantum datasets spanning quantum chemistry and spin systems. This includes:

  • A wide variety of common molecules with multiple bond lengths and angles, from standard bearers like H₂ and H₂O to C₂H₆ and N₂H₄.

  • Spin systems such as the Fermi–Hubbard model and the Ising model, with multiple lattices, periodicities, and layouts.

For each dataset, we also provide the data you need for benchmarking and testing your algorithms, including Hamiltonians, classical shadow samples, measurement groupings, symmetries, and even common circuit ansätze and parameters.

However, as we continue to build out our datasets library, we are taking a broader view of what "quantum data" is.

We don’t want to pigeonhole; quantum data is more than just chemistry data and spin system samples. It is any data you may be using or that you may be interested in when building quantum algorithms.

Download and access data in seconds

Using the new dataset interface, you can browse and then choose a specific dataset and its parameters:

Short GIF of a PennyLane user browsing the datasets website, choosing a molecule, and configuring its bond length and basis set before download

Change parameters like bond length and basis in the Configure tab, and explore the corresponding information for these dataset settings on the dataset page. These include a preview of the Hamiltonian that defines the system, details about the available samples, and the source code for generating the dataset. 🤖

With a dataset and parameters selected, hop over to the Download tab to choose what information you want to download. Then, the PennyLane code can be easily copied into your Python environment to quickly download the dataset and start using it in circuits.

Short GIF of a PennyLane user browsing the datasets website, choosing which dataset attributes to download, generating the required PennyLane code, then copying it into a Python environment
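The exact snippet the Download tab generates is not reproduced in this post. As a rough illustration of how a selection of parameters and attributes could map onto a `qml.data.load` call, here is a minimal sketch. The helper `render_load_call` is hypothetical and not part of PennyLane, and the attribute names used in the usage note are assumptions; check the dataset page for the names that are actually available.

```python
from typing import Any, Optional, Sequence


def render_load_call(data_name: str,
                     attributes: Optional[Sequence[str]] = None,
                     **params: Any) -> str:
    """Return the source text of a qml.data.load call for a given selection.

    Hypothetical helper: it only formats a string and does not download anything.
    """
    # The dataset family (for example 'qchem') is the first positional argument.
    args = [repr(data_name)]
    # Dataset parameters such as molname or basis are passed as keyword arguments.
    args += [f"{key}={value!r}" for key, value in params.items()]
    # Restricting the attributes keeps the download to the data that is needed.
    if attributes:
        args.append(f"attributes={list(attributes)!r}")
    return f"qml.data.load({', '.join(args)})"
```

For instance, `render_load_call("qchem", attributes=["hamiltonian", "hf_state"], molname="H2", basis="STO-3G")` produces the text of a call that, once pasted after `import pennylane as qml`, would request only those two attributes of the H₂ dataset.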

Other features include:

  • 🎲 Samples: associated output from quantum simulators or hardware. For example, the samples provided for the molecule datasets include sampling data obtained from the optimized variational circuit with the available Hamiltonian groupings.
  • 💻 Source code available for all data: explore the code that generated the data, and even download and run it yourself.
  • 📖 Detailed data attributes: an overview of all data available within the dataset, including descriptions and data types.

In addition to this new way to browse and access the datasets, we've also made significant improvements to the existing datasets functionality in PennyLane as part of the PennyLane v0.32 release, such as:

  • 🥣 Improved datasets serialization: datasets are now stored using the HDF5 data format, which is ubiquitous across scientific computation.
  • ⚙️ Access to dataset identifiers, such as the bondlength, molname, or lattice, from inside the datasets.
  • ⚛️ New datasets: a large variety of dimers, organic and inorganic molecules, mainly focused on molecules that exhibit multireference character.
  • 👌 Download datasets with default parameter values automatically. For example, you can download the H₂ molecule in the STO-3G basis at its optimal bond length by simply calling qml.data.load('qchem', molname='H2').

For more details, and many other PennyLane v0.32 features and improvements, make sure to check out the PennyLane v0.32 release.
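The default-parameter download described above can be sketched as follows. This is a minimal sketch, assuming PennyLane v0.32 or later and network access. The helpers `build_load_kwargs` and `load_molecule` are hypothetical names introduced for illustration; only `qml.data.load` and its `molname`, `basis`, and `bondlength` parameters come from PennyLane.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(molname: str,
                      basis: Optional[str] = None,
                      bondlength: Optional[float] = None) -> Dict[str, Any]:
    """Collect only the parameters the caller actually chose.

    Parameters left as None are omitted, so qml.data.load can fall back
    to its default values for them.
    """
    chosen = {"molname": molname, "basis": basis, "bondlength": bondlength}
    return {name: value for name, value in chosen.items() if value is not None}


def load_molecule(molname: str,
                  basis: Optional[str] = None,
                  bondlength: Optional[float] = None):
    """Download a qchem dataset and return the first match (hypothetical helper)."""
    # Imported here so build_load_kwargs can be used without PennyLane installed.
    import pennylane as qml

    kwargs = build_load_kwargs(molname, basis, bondlength)
    # qml.data.load returns a list of datasets; take the first one.
    datasets = qml.data.load("qchem", **kwargs)
    return datasets[0]
```

Calling `load_molecule("H2")` is then equivalent to `qml.data.load('qchem', molname='H2')[0]`. On the returned dataset, identifiers such as `molname` and `bondlength` should be readable as attributes, which makes it easy to confirm which default values were applied.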

What’s next for PennyLane Datasets?

With the PennyLane Datasets project, our goal is to both document and provide important research data so that you can build and share faster.

We currently aim to provide:

  • Standard datasets: useful for broader, more general quantum computing research. Examples include molecules and spin systems.

  • Research-specific datasets: linked to particular research, to encourage exploration and sharing of recent results. These datasets may be inputs to quantum circuits, embedding data, circuits, or output samples from quantum simulations or quantum hardware.

Over the next few months, we will be working to grow the catalog of available datasets and continuing to improve the PennyLane Datasets service.

Try it out and get involved

To get started using PennyLane Datasets, browse the available datasets and import them directly into PennyLane. For more details on the PennyLane integration, or to create your own datasets, please check out the PennyLane documentation on quantum datasets.

Have any data you would love to see? Or even data you would love to share and host on PennyLane Datasets? If so, please get in touch — we would love to help.

…and if you are as excited as we are, make sure to keep an eye on the PennyLane Blog and follow us on social media for the latest PennyLane Datasets updates.

About the authors

Diego Guala

Diego is a quantum scientist at Xanadu. His work is focused on supporting the development of the datasets service and PennyLane features.

Josh Izaac

Josh is a theoretical physicist, software tinkerer, and occasional baker. At Xanadu, he contributes to the development and growth of Xanadu’s open-source quantum software products.

Last modified: August 06, 2024
