Easy access to large datasets is integral to quantum computing research – especially as quantum algorithms and hardware scale. Today, we're delighted to launch PennyLane Datasets, a brand new online library that makes it easier to browse available datasets and find exactly what you are looking for, complementing our existing datasets functionality in PennyLane.
We've been sharing PennyLane Demos with a mission to make impactful research accessible and to allow researchers across the spectrum of quantum computing to explore, understand, and generalize new results by simply copying and modifying existing code.
Imagine if you could do the same with data!
No longer do you need to spend hours sifting through GitHub repos or attempting to reverse-engineer data from papers to make it code-compatible — use PennyLane Datasets to dive in, build, and explore data-driven quantum algorithms.
Our datasets library is currently in preview; give it a whirl, and let us know if there are any datasets you would love to see in it. Or, read on to learn more about how PennyLane Datasets work, the data currently available, and what we have planned for the future.
What is quantum data anyway?
Quantum data is any data that goes in or comes out of a quantum system.
In our first release of the PennyLane Datasets library, you’ll find a collection of quantum datasets spanning quantum chemistry and spin systems. This includes:
- A wide variety of common molecules with multiple bond lengths and angles, from standard bearers like H₂ and H₂O to C₂H₆ and N₂H₄.
- Spin systems such as the Fermi–Hubbard model and the Ising model, with multiple lattices, periodicities, and layouts.
For each dataset, we also provide the data you need for benchmarking and testing your algorithms, including Hamiltonians, classical shadow samples, measurement groupings, symmetries, and even common circuit ansätze and parameters.
However, as we continue to build out our datasets library, we are taking a broader view of what "quantum data" is.
We don’t want to pigeonhole; quantum data is more than just chemistry data and spin system samples. It is any data you may be using or that you may be interested in when building quantum algorithms.
Download and access data in seconds
Using the new dataset interface, you can browse and then choose a specific dataset and its parameters.
Change parameters like bond length and basis in the Configure tab, and explore the corresponding information for these dataset settings on the dataset page. This includes a preview of the Hamiltonian that defines the system, details about the available samples, and the source code for generating the dataset. 🤖
With a dataset and parameters selected, hop over to the Download tab to choose what information you want to download. Then, the PennyLane code can be easily copied into your Python environment to quickly download the dataset and start using it in circuits.
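As a rough sketch of what that copied snippet tends to look like, the example below collects the choices from the Configure tab into keyword arguments and passes them to `qml.data.load`. The helper names `qchem_request` and `fetch_qchem` are ours for illustration and are not part of PennyLane, and the bond length of 0.742 is just an example value; the code shown on the Download tab for your dataset is the version to trust.

```python
def qchem_request(molname, basis=None, bondlength=None):
    """Collect the parameters picked in the Configure tab as keyword arguments.

    Parameters left as None are omitted from the request.
    """
    request = {"molname": molname}
    if basis is not None:
        request["basis"] = basis
    if bondlength is not None:
        request["bondlength"] = bondlength
    return request


def fetch_qchem(request, attributes=None):
    """Download the matching datasets (needs PennyLane and a network connection)."""
    import pennylane as qml  # imported lazily: only needed when downloading

    # qml.data.load returns a list of datasets matching the request
    return qml.data.load("qchem", attributes=attributes, **request)


# Illustrative request: H2 in the STO-3G basis at one bond length
request = qchem_request("H2", basis="STO-3G", bondlength=0.742)
print(request)
```

With PennyLane installed and a network connection, `fetch_qchem(request)` returns a list of matching datasets. Passing something like `attributes=["hamiltonian"]` limits the download to the parts you selected.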
Other features include:
- 🎲 Samples: associated output from quantum simulators or hardware. For example, the samples provided for the molecule datasets include sampling data obtained from the optimized variational circuit with the available Hamiltonian groupings.
- 💻 Source code available for all data: explore the code that generated the data, and even download and run it yourself.
- 📖 Detailed data attributes: an overview of all data available within the dataset, including descriptions and data types.
In addition to this new way to browse and access the datasets, we've also made significant improvements to the existing datasets functionality in PennyLane as part of the PennyLane v0.32 release, such as:
- 🥣 Improved datasets serialization: datasets are now stored using the HDF5 data format, which is ubiquitous across scientific computation.
- ⚙️ Access to dataset identifiers from inside the datasets: values such as the `bondlength`, `molname`, or `lattice`.
- ⚛️ New datasets: a large variety of dimers, organic and inorganic molecules, mainly focused on molecules that exhibit multireference character.
- 👌 Download datasets with default parameter values automatically. For example, you can download the H₂ molecule in the STO-3G basis at its optimal bond length by simply calling `qml.data.load('qchem', molname='H2')`.
For more details, and many other PennyLane v0.32 features and improvements, make sure to check out the PennyLane v0.32 release.
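To make the identifier and default-parameter items above a little more concrete, here is a small sketch that reads the identifiers back from a dataset. `label_for` and `load_default_h2` are hypothetical helper names, and the stand-in object only mimics the identifier attributes so the helper can be tried without downloading anything.

```python
from types import SimpleNamespace


def label_for(dataset):
    """Build a short label from the identifiers stored on a dataset."""
    return f"{dataset.molname} at bond length {dataset.bondlength}"


def load_default_h2():
    """Download H2 with default parameters (needs PennyLane and a network connection)."""
    import pennylane as qml  # imported lazily: only needed when downloading

    # qml.data.load returns a list of datasets; take the first match
    return qml.data.load("qchem", molname="H2")[0]


# Stand-in with the same identifier names as a molecule dataset (values are illustrative)
stand_in = SimpleNamespace(molname="H2", bondlength=0.742)
print(label_for(stand_in))
```

Once a real dataset is downloaded, `label_for(load_default_h2())` should work the same way, since the helper only relies on the `molname` and `bondlength` identifiers.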
What’s next for PennyLane Datasets?
With the PennyLane Datasets project, our goal is to both document and provide important research data so that you can build and share faster.
We currently aim to provide:
- Standard datasets: useful for broader, more general quantum computing research. Examples include molecules and spin systems.
- Research-specific datasets: linked to particular research, to encourage exploration and sharing of recent results. These datasets may be input to quantum circuits, embedding data, circuits, or output samples from quantum simulations or quantum hardware.
Over the next few months, we will be working to grow the catalog of available datasets, and continuing to improve the PennyLane Datasets service.
Try it out and get involved
To get started using PennyLane Datasets, browse the available datasets and import them directly into PennyLane. For more details on the PennyLane integration, or to create your own datasets, please check out the PennyLane documentation on quantum datasets.
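If you would like a feel for what creating your own dataset involves, the sketch below builds a tiny dataset around a two-qubit Hamiltonian, writes it to an HDF5 file, and reopens it. It assumes PennyLane v0.32 or later with its dataset dependencies installed. The attribute names `hamiltonian` and `energies`, the helper name `save_and_reopen`, and the file path are illustrative choices, so treat the documentation as the reference.

```python
from pathlib import Path

# Illustrative location for the file; any writable path ending in .h5 would do
EXAMPLE_PATH = Path("datasets") / "example.h5"


def save_and_reopen(path=EXAMPLE_PATH):
    """Create a small dataset, write it to HDF5, and read it back."""
    import numpy as np
    import pennylane as qml  # imported lazily: only needed when this runs

    # A toy two-qubit Hamiltonian and its spectrum as the data to store
    hamiltonian = qml.Hamiltonian([1.0, 0.5], [qml.PauliZ(0), qml.PauliX(1)])
    energies = np.linalg.eigvalsh(qml.matrix(hamiltonian))

    dataset = qml.data.Dataset(hamiltonian=hamiltonian, energies=energies)

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        path.unlink()  # start from a clean file so the write cannot clash
    dataset.write(path)

    return qml.data.Dataset.open(path)
```

Calling `save_and_reopen()` should return a dataset whose `hamiltonian` and `energies` attributes match the ones that were stored.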
Have any data you would love to see? Or even data you would love to share and host on PennyLane Datasets? If so, please get in touch — we would love to help.
…and if you are as excited as we are, make sure to keep an eye on the PennyLane Blog and follow us on social media for the latest PennyLane Datasets updates.
About the authors
Diego Guala
Diego is a quantum scientist at Xanadu. His work is focused on supporting the development of the datasets service and PennyLane features.
Josh Izaac
Josh is a theoretical physicist, software tinkerer, and occasional baker. At Xanadu, he contributes to the development and growth of Xanadu’s open-source quantum software products.