Ang pahinang ito ay hindi pa naisalin. Nakikita mo ang orihinal na bersyon sa Ingles.
Review of relevant machine learning methods
In this section, we will review some key terms and methods from classical machine learning that will help us to better understand the workflows in quantum machine learning. We will first introduce some general terms, before doing a deeper dive on two types of machine learning: kernel methods (especially in the context of a support vector machine) and neural networks. There are certainly connections between these methods, but we will treat them as distinct due to the differences in quantum workflows discussed here and in later lessons. This is only a cursory overview, and we will skip a great deal of nuance. For a more complete overview of machine learning, we recommend resources like [1-3].
Types of machine learning
As a simple definition, machine learning is a collection of algorithms that analyze and draw inferences from patterns and relationships in data. Broadly speaking, machine learning algorithms can be grouped into three main categories depending on the type of data involved and how algorithms learn without being explicitly programmed:
- Supervised learning: In supervised learning, the data that is used to train the model is labeled. The goal of these algorithms is to learn the relationship between data and their corresponding labels or outputs and to generalize this to unseen data. Common tasks in this class are classification and regression.
- Unsupervised learning: In contrast to supervised learning, unsupervised learning uses unlabeled data to train the machine learning model. The goal of such algorithms is to discover hidden patterns and structure in data. Some algorithms in this class are clustering and dimensionality reduction algorithms. Some generative models such as generative adversarial networks and variational autoencoders can also be considered in this category.
- Reinforcement learning: Algorithms in this machine learning category are defined by an agent which interacts with an environment. The agent takes actions and receives feedback from its environment in the form of rewards and punishments. Eventually through this feedback mechanism, the agent learns to take the correct set of actions to perform a specific task.

The left image shows two categories of labeled data as in supervised learning. In this case, the categories are linearly separable. The right image shows clusters of data. In an unsupervised learning task, these data would not initially be labeled and the algorithm would study the distribution, perhaps looking for clusters. For the purposes of visualizing example clusters the algorithm might identify, the data points have now been labeled. A key difference between the two is that the supervised learning process starts with the data already labeled and the unsupervised process starts with unlabeled data, even if the data are labeled at the end.
Introducing “quantum” to machine learning
We can now start exploring how “quantum” is introduced into machine learning. In this broader categorization, we consider the type of model/algorithm on the processing device, as well as the type of data provided to it. The picture above summarizes these possible combinations.

For instance, CC means that we have a classical dataset – such as images, sound, or text that we can store on classical computers - and that we also use a classical computer to run a machine learning algorithm. This is precisely the classical machine learning setting. On the other hand, QQ means that we are using a quantum computer to process quantum data. Here, “quantum data” could mean several things, and could be context-dependent. Quantum data could be thought of as a set of measurement outcomes obtained from a quantum device, or it could refer to states that have been prepared on a quantum computer by another algorithm. In the future, it could even refer to data stored in QRAM (Quantum Random Access Memory), which does not currently exist. When researchers talk about quantum machine learning, they usually refer to the CQ regime, where the dataset at hand is classical and the processing device executing the machine learning algorithm is a quantum computer. In the following parts of the course, we will focus on such algorithms.
Support vector machines
We now recap a class of algorithms called support vector machines from a classical machine learning point of view. Later we will show how to bring quantum computing into this algorithm.

Let’s suppose a task of binary classification on a dataset with two-dimensional feature space as shown in the plot. One thing we can do to perform classification for this dataset is to find a line, or in general a hyperplane that separates the two classes. In practice we can find infinitely many separating hyperplanes, so the question is: How do we define the optimal one? The idea here is that a particularly good decision boundary should maximize the margin, which is defined as the distance to the nearest points in each class. In this setting, the data points with the smallest distance to the decision boundary are called support vectors.
A linear decision boundary could be described in a number of ways; in some ways the most straightforward way is that shown in below. Here, is the set of parameters defining the hyperplane, is your dataset, and is a constant shift. is a mapping from the space of input data points often (but not necessarily) to a higher dimensional space. We will return to this mapping below.
In the model is the vector of tunable parameters that the model would learn. This is what we call the “primal formulation”. With some mathematical manipulation we can show that there is a second way we can formulate the same problem. We call this the "dual formulation", depicted by the equation below. For this formulation, we need to optimize over the alpha parameters. The main difference is that in the primal formulation the equation has an inner product between the feature vector and the learnable parameters, whereas in the dual formulation the inner product is between feature vectors. Even though the dual form includes both the training data features and the corresponding labels, we will see in the next section how it proves to be more useful than the primal form.
Kernel methods and how quantum can play a role
The video below motivates how quantum can play a role in linear classifiers. This is described in greater detail in the text.
Moving to higher dimensional spaces
In this and the following subsection, the discussion focuses on mappings to higher dimensions. The point here is to explain the "kernel trick" in the context of mappings between spaces, and thus set the stage for what a quantum kernel is. The point is not that higher dimensions in quantum wave functions solve all of our problems. As mentioned in the introduction, classical Gaussian feature maps are already infinite-dimensional. The dimensionality of data features is important, but high-dimensional quantum states are not sufficient for improvement over classical methods.
Graphically, one can easily see how we can generalize the SVM approach to cases where the original data are not linearly separable, given the right mapping to higher dimensions. Looking at the two-dimensional data on the left, we can see that there is no linear decision boundary that can separate the two classes. However, we can consider adding a third feature to our feature space. If this new feature is - for instance - the distance to the origin of the previous two features and , then the data becomes linearly separable. This also means that we can run the support vector machine algorithm successfully now on this higher dimensional feature space.

We also denote this “feature mapping” . The feature map often maps from the space of input data to a higher dimension, as is shown here, but there are models and algorithms that make use of mappings to lower dimensions. The mapping to higher dimensions is simply an easy case to visualize and understand.
Some feature maps may map into very high dimensional spaces. In such cases, the high-dimensionality makes inner products more computationally expensive. We will return to that point below.
Why is the dual form useful?
Recall the primal and dual formulations of our linear boundary model:
Now that we know that using a feature map to get to a higher dimensional space can allow us to successfully find a separating hyperplane, we can replace the original feature vector in the equations with the feature-mapped vectors. However, if we do this in the primal formulation, we encounter the problem of having to compute the inner products between the parameters and a potentially very high dimensional feature map. However, in the dual formulation, we see that these are replaced with inner products between feature mapped vectors of different inputs.
For some feature maps, it may be possible to write the inner product of feature-mapped vectors as a simple function of the original (lower-dimensional) variables and . For some choices of we may even be able to write as a simple function of the lower-dimensional inner product This is computationally very beneficial because we can access the space in which data are linearly separable, but without the cost of manipulations in higher dimensions. In fact, since the feature-mapped vectors only appear in the in inner products, we may not even need to explicitly carry out the feature mapping to compute the inner products. We call the function that calculates the inner products the “kernel function”, and this way of avoiding the feature map computation is called the “kernel trick”. In fact, the feature mapped vectors could even be infinite dimensional, but the kernel might still be very efficiently computable.
The kernel function itself is a function of two input data vectors. Inserting each pair of data vectors in the dataset as arguments of the kernel function results in a symmetric, positive semi-definite matrix, called the kernel matrix:
Once we compute the kernel matrix, then we can find the optimal parameters () using methods such as quadratic programming software or an algorithm called “sequential minimal optimization”. Of course, this assumes that there exists an efficiently-calculable kernel corresponding to a feature map that makes your data classes linearly separable. A related but novel approach is quantum kernel estimation.
Quantum kernels
Quantum computers, or quantum states in general, allow for a very natural definition of “quantum kernel”. We can interpret the encoding of an input into a quantum state as a feature map. This process may indeed map the data into a very high-dimensional space as is common in classical feature maps, but the dimensionality will depend on the encoding method (see the Data Encoding lesson). Recall that the inner product of two quantum states is related to the probability of measuring the state