If machine learning is interesting, quantum machine learning (QML) is twice as interesting. It’s incredible that you can combine these two fields into one, and it is an area that has seen huge interest and growth in the past few years. In this blog post we will highlight some of the topics that will be helpful on your learning journey. By the end of this article you will be familiar with some of the most important concepts in optimization, machine learning, and quantum computing, and you will be ready to write your very own quantum machine learning program 😎.

So how can you get started learning quantum machine learning?
The first thing to note is that **you don’t have to be an expert on either quantum computing or machine learning**. Quantum computing is a field that emerged from research in physics, but that has seen an incredible transformation to the software realm in recent years. Nowadays even some high-school students are learning quantum computing! This should help emphasize the argument that you only need a few basics in order to start your journey into QML.

# The basics

There are some concepts in **math** and **linear algebra** that are important to understand quantum machine learning. Knowing some basic **Python** is also very useful if you want to program an algorithm using some of the most popular frameworks available.

## Linear algebra and other math concepts

There are several topics in math that will be useful when learning about both quantum computing and (quantum) machine learning. You can learn linear algebra and math basics for free here and here. Some of the important concepts are:

- Trigonometry
- Vectors
- Linear combinations
- Matrices
- Polar and cartesian coordinate systems
- Complex numbers
- Functions and gradients
- Eigenvalues and eigenvectors

It’s not necessary to be an expert on these topics; a basic understanding of them should be enough.

## Python

Many of the most widely-used libraries and frameworks for classical and quantum machine learning are based on Python. Some examples are PennyLane, scikit-learn and PyTorch. For this reason it’s useful to learn Python if you want to program a quantum machine learning algorithm. Note that there are other frameworks which are not based on Python so feel free to explore other options too.

One way to learn Python is to take a course. There are many free options online (here are some examples). If you already know how to code in other languages, a good way to learn is to watch a YouTube video to get an idea of the syntax, download a cheat sheet to have the keywords handy, and try different small projects. NumPy is a Python package which is widely used in scientific computing. It will be very helpful to have some NumPy tools handy, so be sure to check this cheat sheet if you’re new to this package.
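As a quick taste of the NumPy tools that come up later, here is a minimal sketch that touches several of the math concepts listed earlier (vectors, linear combinations, matrices, complex numbers, and eigenvalues). The variable names and values are only illustrative.

```python
import numpy as np

# Two basis vectors and a linear combination of them
e0 = np.array([1.0, 0.0])
e1 = np.array([0.0, 1.0])
v = 0.6 * e0 + 0.8 * e1
norm_v = np.linalg.norm(v)  # length of the vector

# A matrix acting on a vector (this one swaps the two entries)
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
swapped = M @ v

# A complex number in cartesian form, converted to polar form
z = 1.0 + 1.0j
radius, angle = np.abs(z), np.angle(z)

# Eigenvalues and eigenvectors of a symmetric matrix
eigenvalues, eigenvectors = np.linalg.eigh(M)

print("norm:", norm_v)
print("swapped:", swapped)
print("polar form:", radius, angle)
print("eigenvalues:", eigenvalues)
```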

# The building blocks

Once you feel confident with the math concepts and Python basics, you will be ready to build upon that knowledge to learn about optimization, machine learning, and quantum computing 💪.

## Optimization

Many concepts in quantum machine learning are related to optimization. Optimization problems are present across many different fields of study, and involve finding the inputs that will give you the best possible output for a given problem. Generally, finding the best possible output corresponds to minimizing a **cost**.
One example is finding the minimum amount of raw material to manufacture an object. In this case the “cost” is the amount of the raw materials. The problem could be framed differently to minimize the money invested in the raw material, or any other function related to the cost and/or amount of material. This cost will be a function, and so we will refer to it as a **cost function**.
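To make the raw-material example concrete, here is a minimal sketch under an assumed setup: the object is a closed cylindrical can that must hold a fixed volume, and the cost is its surface area. The volume, the grid of candidate radii, and the function name `cost` are all illustrative choices.

```python
import math

VOLUME = 330.0  # cm^3, an assumed target volume

def cost(radius):
    """Surface area (cm^2) of a closed cylinder that holds VOLUME."""
    height = VOLUME / (math.pi * radius**2)
    return 2 * math.pi * radius**2 + 2 * math.pi * radius * height

# Evaluate the cost on a grid of candidate radii and keep the cheapest one
candidates = [0.5 + 0.01 * k for k in range(1000)]  # roughly 0.5 cm to 10.5 cm
best_radius = min(candidates, key=cost)

print(f"best radius ~ {best_radius:.2f} cm, cost ~ {cost(best_radius):.1f} cm^2")
```

Trying every candidate works for one input, but it quickly becomes impractical as the number of inputs grows, which is why a smarter optimization strategy is needed.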

Once you have a cost function you must define an **optimization strategy**. In general you minimize your cost function by taking a number of **steps** that eventually lead you to the lowest cost possible. The bigger your steps are, the more you will move in a certain direction in the **cost landscape**. It’s important to find the right step size for your problem (or to vary your step size) so that you converge to a solution within a reasonable amount of time. If the step size is too small it might take too long to converge. Conversely, if the step size is too large you might jump past parts of the landscape where the cost is low. Having an idea of what your function looks like and trying different step sizes is helpful in finding a good value.

Many optimization techniques are based on finding the **gradient**, which points in the direction in which your function increases most steeply. Since you want to find the lowest point of your function, a smart way to choose the direction for your steps is to move opposite to the gradient, which is the direction in which the function decreases most steeply. If you’re not sure what gradients are all about, check out this video.
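Here is a minimal sketch of this idea, assuming the toy cost function f(x) = x² (whose gradient is 2x). It steps against the gradient 50 times with three different step sizes. The function names and the chosen values are illustrative.

```python
def cost(x):
    return x**2

def gradient(x):
    return 2 * x  # derivative of x**2

def gradient_descent(start, step_size, n_steps):
    """Repeatedly step against the gradient, i.e. downhill."""
    x = start
    for _ in range(n_steps):
        x = x - step_size * gradient(x)
    return x

for step_size in (0.001, 0.1, 1.1):
    x_final = gradient_descent(start=1.0, step_size=step_size, n_steps=50)
    print(f"step size {step_size}: x = {x_final:.4g}, cost = {cost(x_final):.4g}")
```

With the smallest step size the optimizer has barely moved after 50 steps, with the middle one it ends up very close to the minimum at x = 0, and with the largest one every step overshoots the minimum and lands further away than before.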

## Machine learning

Machine learning focuses on using computers to find patterns and trends in data, and to generalize those to data they have never seen before. It involves a computer algorithm that can improve its performance on a task (learn) based on large amounts of data, without being explicitly programmed.

Let’s imagine the task is looking at an image and deciding whether it’s a picture of a cat or a dog. Just like a human, a machine has to be taught or “trained” to perform the task. You would use a set of many images of dogs and cats to train your model, and then a different set of images to test whether your model is working well. While training your model you will use a cost function, which you try to reduce with every training iteration. As your cost decreases your model becomes better at representing the **training** data. Only once you feel that your model is good enough do you show it the **test** data. This can help you see whether the model really understands general patterns in the data or not. After successful testing, you will have a machine learning model that you can use on new data!
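The train-then-test workflow can be sketched with something much simpler than images. In this minimal example the data and the one-parameter model `w * x` are assumptions chosen for illustration: the model is trained on one set of points and then evaluated on points it has never seen.

```python
# Toy data following y = 2x; the model has to discover the slope w
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 2.0, 4.0, 6.0]
test_x = [0.5, 1.5, 2.5]  # held back, never used for training
test_y = [1.0, 3.0, 5.0]

def cost(w, xs, ys):
    """Mean squared distance between the model outputs w*x and the labels."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cost_gradient(w, xs, ys):
    """Derivative of the cost with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0  # initial guess
for _ in range(100):
    # Each training iteration lowers the cost on the training data
    w -= 0.05 * cost_gradient(w, train_x, train_y)

print(f"learned w  = {w:.3f}")
print(f"train cost = {cost(w, train_x, train_y):.2e}")
print(f"test cost  = {cost(w, test_x, test_y):.2e}")
```

A low cost on the test points suggests that the model picked up the general pattern rather than memorizing the training points.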

We have talked a lot about how a cost function lets us frame training as an optimization, where the cost is reduced step by step. We have also mentioned that we often use gradients to perform such optimizations. Now it’s important to notice that we cannot find the gradients for certain kinds of functions. For instance, if the function is not continuous we won’t be able to calculate its gradient. So what do we do in those cases? We can create a modified version of it, or a completely new cost function, such as the squared distance between your model output and the label. In our variational classifier demo you can see how a squared-distance metric is used in the cost function and then a different accuracy metric is used to see how well the model predicted the answer.

The problem of classifying an image as cat or dog is called a classification problem. In this case the measure (or metric) of success can be the ratio of correct predictions to all predictions, the ratio of true positives to all positives, or the squared distance between your model output and the label. The advantage of the latter is that it is a continuous function, and therefore easier to optimize with gradient-based methods. If you want to learn more about the metrics of success for classification problems, this article has a good explanation.
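As a minimal sketch, here is how these three metrics could be computed for a handful of made-up model outputs. The data, the 0.5 threshold, and the reading of “all positives” as “all positive predictions” (often called precision) are assumptions made for this example.

```python
labels = [1, 0, 1, 1, 0, 1]               # 1 = "cat", 0 = "dog"
outputs = [0.9, 0.2, 0.4, 0.8, 0.6, 0.7]  # made-up raw model outputs
predictions = [1 if out >= 0.5 else 0 for out in outputs]

# Ratio of correct predictions to all predictions
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Ratio of true positives to all positive predictions
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
precision = true_positives / sum(predictions)

# Squared distance between the raw outputs and the labels
squared_distance = sum((out - y) ** 2 for out, y in zip(outputs, labels))

print(f"accuracy = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"squared distance = {squared_distance:.3f}")
```

Nudging an output from 0.4 to 0.45 changes the squared distance slightly but leaves the accuracy untouched, which is why the squared distance gives a gradient-based optimizer something to work with.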

One very important concept in machine learning is **neural networks**. While many algorithms in machine learning are not related to neural networks, it’s highly likely that you will hear about them when exploring quantum machine learning problems. Neural networks consist of a set of nodes connected by edges to which weights are assigned. Information propagates through the network, and gives rise to a function of the data that can map inputs to outputs. Coming back to our cat/dog example, the input could be the pixels of the image and the output is a label “cat” or “dog”. Given a cost function, we can train a neural network by finding the best weights that influence the flow of information, and hence the input-output map. The most popular training strategy relies on **backpropagation**, which consists of finding the gradient of the cost function with respect to the weights, so that the weights can be modified accordingly. Once the weights have been modified, the process can be repeated many times to improve the quality of the output. You can learn more about backpropagation here.
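To show what this looks like in practice, here is a minimal NumPy sketch of a tiny network with one hidden layer, trained with hand-written backpropagation. The dataset (a logical OR), the layer sizes, the tanh activation, and the step size are all assumptions chosen to keep the example short; real projects would normally rely on a library such as PyTorch for this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 4 inputs with 2 features each, and one label per input
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [1.0]])  # logical OR

# Weights on the edges: input -> hidden (2x3) and hidden -> output (3x1)
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))

def forward(X, W1, W2):
    hidden = np.tanh(X @ W1)  # information flows through the hidden nodes
    output = hidden @ W2
    return hidden, output

def cost(output, Y):
    return np.mean((output - Y) ** 2)

costs = []
for _ in range(500):
    hidden, output = forward(X, W1, W2)
    costs.append(cost(output, Y))
    # Backpropagation: apply the chain rule from the output back to the input
    d_output = 2 * (output - Y) / len(X)
    grad_W2 = hidden.T @ d_output
    d_hidden = (d_output @ W2.T) * (1 - hidden**2)  # derivative of tanh
    grad_W1 = X.T @ d_hidden
    # Modify the weights by stepping against the gradient
    W1 -= 0.1 * grad_W1
    W2 -= 0.1 * grad_W2

print(f"cost before training: {costs[0]:.4f}, after: {costs[-1]:.4f}")
```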

We have only given the example of image classification, but there are many kinds of tasks which a computer can perform with machine learning such as regression, clustering, and more. These tasks are generally classified into supervised, unsupervised, or reinforcement learning. It’s important to note that not all of these require the use of neural networks. If you’re curious to learn more about this you can read the articles here and here.

## Quantum computing

Quantum computing refers to the use of physical quantum systems to solve problems. Quantum computers are fundamentally different from classical computers: they are not made of transistors but instead work with **qubits**, which physically may be photons, superconducting qubits, trapped ions, or one of many other technologies. What these technologies have in common is that they use quantum mechanical principles such as superposition, entanglement, and interference in order to perform computations.

*Photonic quantum chip*

The qubit is the basic unit of information in a quantum computer. Qubits are represented mathematically as complex-valued unit vectors, or linear combinations of them. This is why at the beginning of this article we mentioned that it was important to know about these concepts 😁. Imagine that a bit behaves like a coin on a table: it can be either heads or tails. Instead the qubit is like having a coin that you can rotate in the air in a controlled manner. If the qubit is in states 0 or 1 it’s in heads or tails. If you make the coin rotate in the air it’s not really in 0 or 1 anymore but it’s in a **superposition** of 0 and 1.

Dirac notation (also known as bra-ket notation) is often used to represent qubit states. You can learn more about this notation here. The vector

\[\begin{pmatrix} 1 \\ 0 \end{pmatrix}\]

represents the \(\vert 0\rangle\) state and the vector

\[\begin{pmatrix} 0 \\ 1 \end{pmatrix}\]

represents the \(\vert 1\rangle\) state. All other states are a linear combination (superposition) of these two.
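In NumPy these states are just small complex arrays. The following minimal sketch builds the two basis states using the standard column-vector convention, forms one possible superposition (the amplitudes are an arbitrary illustrative choice), and checks that it is a unit vector.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)  # the |0> state
ket1 = np.array([0, 1], dtype=complex)  # the |1> state

# An equal superposition of |0> and |1> with a relative phase of i
alpha, beta = 1 / np.sqrt(2), 1j / np.sqrt(2)
psi = alpha * ket0 + beta * ket1

norm = np.linalg.norm(psi)        # valid qubit states have length 1
probabilities = np.abs(psi) ** 2  # chance of measuring 0 or 1

print("norm:", norm)
print("probabilities:", probabilities)
```

The squared magnitudes of the two amplitudes give the probabilities of finding the qubit in 0 or 1 when it is measured.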

Now that we know about qubits, how do we use them to perform computations?
We manipulate qubits by applying **operations**, such as **gates** or **measurements**, to them. The gates we use are similar to those found in classical computation such as “AND”, “OR”, and “NOT”. Some of the quantum gates we use are the “NOT” (or “X”) gate, rotations around the *x*, *y* and *z* axes “RX”, “RY”, and “RZ”, and the “Hadamard” gate. Quantum gates are used to entangle qubits, put them into superposition, change the outcome probabilities of measurements, and more. When a measurement happens it’s like grabbing the rotating coin and putting it on a table. Any superposition that might have existed is said to **collapse** into a classical state.
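Mathematically, single-qubit gates are 2×2 matrices that multiply the state vector. Here is a minimal NumPy sketch using the standard matrices for the X, Hadamard, and RX gates. The helper names and the random seed are illustrative, and the simulated measurement is a simple stand-in for what happens on real hardware.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)

X = np.array([[0, 1], [1, 0]], dtype=complex)                # NOT gate
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate

def RX(theta):
    """Rotation by the angle theta around the x axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def measurement_probabilities(state):
    return np.abs(state) ** 2

flipped = X @ ket0               # |0> becomes |1>
superposed = H @ ket0            # equal superposition of |0> and |1>
rotated = RX(np.pi / 2) @ ket0   # a quarter turn also gives 50/50 odds

# Simulate one measurement: the superposition collapses to 0 or 1
rng = np.random.default_rng(seed=42)
outcome = rng.choice([0, 1], p=measurement_probabilities(superposed))

print("flipped:", flipped)
print("probabilities after H:", measurement_probabilities(superposed))
print("probabilities after RX:", measurement_probabilities(rotated))
print("measured:", outcome)
```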

We often represent these computations graphically as a **quantum circuit**. Circuits are drawn as horizontal wires, over which you can apply the gates. Each wire corresponds to a qubit, and at the end of each wire there is a measurement, which will allow us to determine the output of our circuit.
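As a minimal sketch of what a two-wire circuit computes, the example below applies a Hadamard gate to the first wire and then a CNOT gate across both wires, before reading off the measurement probabilities. The CNOT gate is not introduced in this article; it is a standard two-qubit gate used here as an assumption to show how gates can entangle qubits.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
I = np.eye(2)                                 # do nothing on a wire
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])               # flips wire 1 if wire 0 is 1

# Both wires start in 0; the entries are ordered as 00, 01, 10, 11
state = np.zeros(4)
state[0] = 1.0

state = np.kron(H, I) @ state  # Hadamard on wire 0, nothing on wire 1
state = CNOT @ state           # entangle the two wires

probabilities = np.abs(state) ** 2
for label, p in zip(["00", "01", "10", "11"], probabilities):
    print(f"outcome {label}: probability {p:.2f}")
```

Only the outcomes 00 and 11 have a nonzero probability, so measuring one wire tells you what the other wire will give.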

# Putting it all together: Quantum Machine Learning

Once you’re familiar with all of these building blocks, you’ll be ready to start putting things together and explore **quantum machine learning**. The first thing you will need is a problem that you want to solve. What kind of problem is it? Is it a problem that is intrinsically quantum (such as a chemical problem) where you want to use machine learning tools to get a solution? Or is it a problem that you can find in classical machine learning where you want to use quantum computing to solve it? Knowing your problem can give you ideas on how to encode it into circuits, and how to combine classical and quantum techniques to find a solution.

This article about quantum machine learning will give you insight into how both classical and quantum machines are combined using differentiable programming. This technique is based on the fact that you can find the gradient of a quantum circuit with respect to the parameters controlling the operations in it. With the ability to calculate the gradients of quantum circuits, you can use a quantum circuit as a node (a QNode) in a classical neural network. You can learn more about differentiable programming here. Note that there are many different approaches to quantum machine learning. For a few years now, the dominant idea in research has been the one we introduce here (and the one PennyLane is made for), but be aware that there are other approaches too.
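One well-known way to obtain such gradients is the parameter-shift rule, which evaluates the same circuit at two shifted parameter values. The article does not name a specific method, so treat the following as a minimal NumPy sketch of that one technique, assuming a one-qubit circuit that applies RX(θ) to the 0 state and returns the expectation value of Z (which equals cos θ).

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def RX(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def circuit(theta):
    """Apply RX(theta) to |0> and return the expectation value of Z."""
    state = RX(theta) @ ket0
    return np.real(np.conj(state) @ Z @ state)

def parameter_shift_gradient(theta):
    """Gradient of the circuit output from two shifted circuit evaluations."""
    shift = np.pi / 2
    return (circuit(theta + shift) - circuit(theta - shift)) / 2

theta = 0.7
print("circuit output:", circuit(theta), "expected:", np.cos(theta))
print("gradient:", parameter_shift_gradient(theta), "expected:", -np.sin(theta))
```

Because the gradient comes from running the circuit itself, the same recipe can in principle be used on a simulator or on quantum hardware.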

After reading this you might be wondering how to actually program and solve a quantum machine learning problem. Do you need to have a quantum computer at hand, or a supercomputer in order to run such a program? Fortunately the answer is no! There are many quantum computers that are available on the cloud, and simulators that you can run on your own laptop. There are also many software libraries that are designed to help you write and run such programs.

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. This means that you can use PennyLane to access quantum computers from different providers, while making it easy to write and run quantum computing programs.

When writing any quantum computing program you must have in mind where it’s going to run. The place where you will run your program is called a “device”. For most problems the best you can do (given that current quantum computers can be noisy) is to begin by using a simulator. Since circuits can have different numbers of qubits (or wires), it’s important to define this when you define your device.

After defining your device you can define your QNode, which binds together the device with a Python function that implements a quantum circuit and returns a measurement. This syntax makes it easy to include parameters in your circuits. In the end the quantum computation is just a function that can depend on inputs and produces an output.

You can then define a cost function (which is also just a normal Python function) based on the output of your QNode. Hybrid models often include preprocessing or postprocessing, which can be arbitrarily complex. This means that we can add additional functions as simple as adding a constant or as complex as adding a full neural network. If you do want to use a neural network you can define it as a separate function, and then use it within your cost function.

The final step is to perform the optimization over the cost function. PennyLane has a lot of optimizers which you can choose from. The optimization part involves choosing an optimizer and a step size, making an initial guess for the value of your parameters, and then iterating over a number of defined steps. Finally you can print or graph the results of your optimization.

In summary, to create a quantum machine learning program in PennyLane you need to:

1. Define a device, including the device type and the number of wires.
2. Define your quantum circuit (QNode).
3. Define pre-/postprocessing (such as a neural network). (Optional)
4. Define a cost function which takes in your quantum circuit and your neural network (if you have one).
5. Perform the optimization:
   1. Choose an optimizer.
   2. Choose a step size.
   3. Make an initial guess for the value of your parameters.
   4. Iterate over a number of defined steps.
6. Enjoy your results by printing or graphing them!

This is how this algorithm would look on PennyLane v0.18.0:

```
# In this program we will train a circuit to model a sine function

# We import the necessary libraries
import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt

# We create the training data
X = np.linspace(0, 2*np.pi, 5)  # 5 input datapoints from 0 to 2pi
# We tell the optimizer that this is an input datapoint,
# and not a parameter to optimize over.
X.requires_grad = False
Y = np.sin(X)  # The outputs for the input datapoints

# We create the test data
# 5 test datapoints, shifted from the training data by 0.2
# Since we're not optimizing over the test data we
# don't need to specify requires_grad = False
X_test = np.linspace(0.2, 2*np.pi+0.2, 5)
Y_test = np.sin(X_test)  # The outputs for the test datapoints

# Step 1 - Create the device
# Here we use the 'default.qubit' simulator and 1 qubit (wires=1)
dev = qml.device('default.qubit', wires=1)

# Step 2 - Create the quantum circuit
@qml.qnode(dev)
def quantum_circuit(datapoint, params):
    # Encode the input data as an RX rotation
    qml.RX(datapoint, wires=0)
    # Create a rotation based on the angles in "params"
    qml.Rot(params[0], params[1], params[2], wires=0)
    # We return the expected value of a measurement along the Z axis
    return qml.expval(qml.PauliZ(wires=0))

# Step 3 - Classical pre/postprocessing
def loss_func(predictions):
    # This is a postprocessing step. Here we use a least squares metric
    # based on the predictions of the quantum circuit and the outputs
    # of the training data points.
    total_losses = 0
    for i in range(len(Y)):
        output = Y[i]
        prediction = predictions[i]
        loss = (prediction - output)**2
        total_losses += loss
    return total_losses

# Step 4 - Define your cost function, including any classical pre/postprocessing
def cost_fn(params):
    # We get the predictions of the quantum circuit for a specific
    # set of parameters along the entire input dataset
    predictions = [quantum_circuit(x, params) for x in X]
    # We calculate the cost including any classical postprocessing
    cost = loss_func(predictions)
    return cost

# Steps 5.1 and 5.2 - We define the optimizer and its step size
opt = qml.GradientDescentOptimizer(stepsize=0.3)

# Step 5.3 - We make an initial guess for the parameters
# We mark them as trainable so the optimizer knows to update them
params = np.array([0.1, 0.1, 0.1], requires_grad=True)

# Step 5.4 - We iterate over a number of defined steps (100)
for i in range(100):
    # Over each step the parameters change to give a better cost
    params, prev_cost = opt.step_and_cost(cost_fn, params)
    if i % 10 == 0:
        # We print the result after every 10 steps
        print(f'Step = {i} Cost = {cost_fn(params)}')

# Step 6 - Test and graph your results!
test_predictions = []
for x_test in X_test:
    prediction = quantum_circuit(x_test, params)
    test_predictions.append(prediction)

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(X, Y, s=30, c='b', marker="s", label='Train outputs')
ax1.scatter(X_test, Y_test, s=60, c='r', marker="o", label='Test outputs')
ax1.scatter(X_test, test_predictions, s=30, c='k', marker="x", label='Test predictions')
plt.xlabel("Inputs")
plt.ylabel("Outputs")
plt.title("QML results")
plt.legend(loc='upper right')
plt.show()
```

As you can see in the plot that this program produces, we have the clear outline of a sine function. This shows that our model effectively learned to recognize the underlying pattern in our data.

# Concluding remarks

Now it’s time to try this out yourself! Download and install PennyLane using `pip install pennylane` and code up the program shown above, or follow one of our Getting Started demos to create your first quantum program. You can also install any of our plugins or interfaces by following the instructions on our install page. If you feel comfortable with the basic demos, try out any of the more advanced demos.
There’s always more to learn so explore different resources such as articles and videos.

If you want to challenge yourself we encourage you to participate in our events, implement an algorithm from a research paper, or create your own demo! The PennyLane community is a growing group of people from all around the world, who are passionate about open-source quantum computing. Get engaged with the community by attending our Thursday community calls, participating in our discussion forum, and contributing to PennyLane. Be sure to follow us on Twitter to always stay informed of the latest news.

If you liked this guide, share it with others who might like it too. Enjoy doing quantum machine learning and using PennyLane!