AIML Unit 4

1. What is Unsupervised Learning?

Unsupervised learning is a type of machine learning in which models are trained on an
unlabeled dataset and are allowed to act on that data without any supervision.

The goal of unsupervised learning is to find the underlying structure of the dataset, group
the data according to similarities, and represent the dataset in a compressed
format.

Example: Suppose an unsupervised learning algorithm is given an input dataset
containing images of different types of cats and dogs. The algorithm is never trained
on the given dataset, which means it has no prior idea about the features of
the dataset. The task of the unsupervised learning algorithm is to identify the image
features on its own. It performs this task by
clustering the image dataset into groups according to the similarities between images.

Working of Unsupervised Learning

The working of unsupervised learning can be understood from the diagram below:

Here, we have taken unlabeled input data, meaning it is not categorized and no
corresponding outputs are given. This unlabeled input data is fed to the
machine learning model in order to train it. First, the model interprets the raw data to find
hidden patterns in the data and then applies a suitable algorithm such as k-means
clustering, hierarchical clustering, etc.

Once a suitable algorithm is applied, it divides the data objects into groups
according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types of
problems:

o Clustering: Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no similarities
with the objects of another group. Cluster analysis finds the commonalities
between data objects and categorizes them according to the presence or
absence of those commonalities.
o Association: An association rule is an unsupervised learning method which is
used for finding relationships between variables in a large database. It
determines the set of items that occur together in the dataset. Association rules
make marketing strategies more effective: for example, people who buy item X
(say, bread) also tend to purchase item Y (butter or jam). A typical
example of association rule mining is Market Basket Analysis; a minimal sketch of the
underlying support/confidence calculation follows this list.
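As a concrete illustration, here is a minimal Python sketch of the support and confidence calculations behind a rule such as {bread} → {butter}. The toy transactions and the helper function `support` are hypothetical, invented for this example:

```python
# Toy transactions (illustrative data, not from the original text).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Confidence of the rule X -> Y is support(X U Y) / support(X).
x, y = {"bread"}, {"butter"}
conf = support(x | y, transactions) / support(x, transactions)
print(f"support    = {support(x | y, transactions):.2f}")  # 0.50
print(f"confidence = {conf:.2f}")                          # 0.67
```

In Market Basket Analysis, rules whose support and confidence clear chosen thresholds are kept as actionable patterns.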

Unsupervised learning algorithms include:

o K-means clustering
o KNN (k-nearest neighbors) - strictly a supervised method (see Section 3), but covered alongside these algorithms in this unit
o Gaussian mixture models
o Expectation maximization

2. What is the K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that
need to be created in the process: if K=2, there will be two clusters, for K=3,
there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in
such a way that each data point belongs to only one group, and points within a group
share similar properties.

It allows us to cluster the data into different groups and offers a convenient way to discover
the categories of groups in an unlabeled dataset on its own, without the need for any
training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The
main aim of this algorithm is to minimize the sum of distances between the data points
and their corresponding cluster centroids.
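Concretely, the objective that k-means minimizes is the within-cluster sum of squared distances (a standard formulation, added here for reference):

$$ J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$

where $C_i$ is the set of points assigned to cluster $i$ and $\mu_i$ is that cluster's centroid.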

The algorithm takes the unlabeled dataset as input, divides the dataset into k
clusters, and repeats the process until no better clusters can be found. The value of k
must be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best positions for the K center points, or centroids, through an iterative
process.
o Assigns each data point to its closest k-center. The data points that are near
a particular k-center form a cluster.

Hence each cluster contains data points with some commonalities and is kept away from other
clusters.

The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They need not be from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined
K clusters.

Step-4: Calculate the variance and place a new centroid for each cluster.

Step-5: Repeat the third step, i.e., reassign each data point to the new
closest centroid of each cluster.

Step-6: If any reassignment occurred, go to step-4; otherwise, go to FINISH.

Step-7: The model is ready. (A minimal code sketch of these steps follows.)
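Before walking through the visual plots, here is a minimal NumPy sketch of Steps 1-6. The function name `kmeans` and the toy data are illustrative choices, not part of the original text, and empty-cluster handling is omitted for brevity:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: pick k random centroids, assign points to the
    nearest centroid, recompute centroids as cluster means, and stop
    when assignments no longer change (Steps 2-6 above)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 2
    labels = None
    for _ in range(n_iters):
        # Step 3: assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                              # Step 6: no reassignment
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Toy usage: two obvious blobs, around (0, 0) and (5, 5).
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
```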

Let's understand the above steps by considering the visual plots:

Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:

o Let's take the number of clusters, i.e., K=2, to identify the dataset and put the points
into different clusters. It means here we will try to group these data points into two
different clusters.
o We need to choose some random k points or centroids to form the clusters. These
points can be either points from the dataset or any other points. So, here we
are selecting the below two points as k points, which are not part of our
dataset. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute this by applying the mathematics we have studied
for calculating the distance between two points. So, we will draw a median line between
both centroids. Consider the below image:

From the above image, it is clear that the points on the left side of the line are nearer to the K1 or blue
centroid, and the points to the right of the line are closer to the yellow centroid. Let's color
them blue and yellow for clear visualization.
o As we need to find the closest clusters, we will repeat the process by
choosing new centroids. To choose the new centroids, we will compute the
center of gravity of the points in each cluster, and place the new centroids there, as below:

o Next, we will reassign each data point to its new closest centroid. For this, we will repeat
the same process of finding a median line. The median will be as in the below image:

From the above image, we can see that one yellow point is on the left side of the line, and
two blue points are to the right of the line. So, these three points will be assigned to new
centroids.

As reassignment has taken place, we will again go to step-4, which is finding new
centroids or K-points.

o We will repeat the process by finding the center of gravity of the points in each cluster, so the
new centroids will be as shown in the below image:
o As we have the new centroids, we will again draw the median line and reassign the
data points. So, the image will be:

o We can see in the above image that no data points changed sides
of the line, which means our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final
clusters will be as shown in the below image:
3. K-Nearest Neighbor (KNN) Algorithm for Machine
Learning:

o K-Nearest Neighbour is one of the simplest machine learning algorithms, based
on the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available
cases, and puts the new case into the category that is most similar to the available
categories.
o The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can easily be
classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but
mostly it is used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an action
on it at classification time.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new
data, it classifies that data into the category most similar to the new
data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and
a dog, and we want to know whether it is a cat or a dog. For this identification, we
can use the KNN algorithm, as it works on a similarity measure. Our KNN model
will find features of the new image similar to the cat and dog images and,
based on the most similar features, will put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new
data point x1. In which of these categories will this data point lie? To solve this type of
problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the
category or class of a particular data point. Consider the below diagram:

How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new point to each training point.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
o Step-4: Among these k neighbors, count the number of data points in each
category.
o Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
o Step-6: Our model is ready. (A minimal sketch of these steps follows.)
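Here is a minimal Python sketch of these steps. The function `knn_predict` and the toy data are hypothetical, for illustration only:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training
    points under Euclidean distance (Steps 1-5 above)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Step 2
    nearest = np.argsort(dists)[:k]                   # Step 3
    votes = Counter(y_train[i] for i in nearest)      # Step 4
    return votes.most_common(1)[0][0]                 # Step 5

# Toy usage: category "A" near the origin, category "B" near (5, 5).
X_train = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
y_train = np.array(["A"] * 20 + ["B"] * 20)
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=5))  # likely "A"
```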

Suppose we have a new data point and we need to put it in the required category.
Consider the below image:

o Firstly, we will choose the number of neighbors; here we will choose k=5.
o Next, we will calculate the Euclidean distance between the data points. The
Euclidean distance is the distance between two points, which we have already
studied in geometry. Between points (x1, y1) and (x2, y2) it is
d = √((x2 − x1)² + (y2 − y1)²).
o By calculating the Euclidean distances we get the nearest neighbors: three
nearest neighbors in category A and two nearest neighbors in category B.
Consider the below image:

o Since three of the five nearest neighbors are from category A, this new data
point must belong to category A.

How to select the value of K in the K-NN Algorithm?

Below are some points to remember while selecting the value of K in the K-NN
algorithm:

o There is no particular way to determine the best value for "K", so we need to try
several values to find the best among them. The most commonly preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and make the model
sensitive to outliers.
o Large values for K smooth out noise, but too large a value can blur the boundary
between categories and add computational cost.

4. Gaussian Mixture Models (GMMs):

The Gaussian Mixture Model (GMM) is a probabilistic model used for clustering and
density estimation. It assumes that the data is generated from a mixture of several
Gaussian components, each representing a distinct cluster. GMM assigns
probabilities to data points, allowing them to belong to multiple clusters
simultaneously. The model is widely used in machine learning and pattern
recognition applications.

Gaussian Mixture Models (GMMs) assume that there are a certain number of
components, where each component is a Gaussian distribution. Hence, a Gaussian
Mixture Model tends to group together the data points belonging to a single Gaussian
component. The parameters of the mixture components, such as the
means and covariances, are typically estimated using the Expectation-Maximization
(EM) algorithm or maximum likelihood estimation techniques.

Let’s say we have three Gaussian components (more on that in the next section) –
GD1, GD2, and GD3, with means (μ1, μ2, μ3) and variances (σ1², σ2²,
σ3²) respectively. For a given set of data points, our GMM would identify the
probability of each data point belonging to each of these mixture components. The
EM algorithm iteratively updates these parameters to maximize the likelihood of the
data, without requiring the derivative to be calculated explicitly.
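In standard notation (added here for reference, consistent with the description above), the density a GMM fits is a weighted sum of Gaussians:

$$ p(x) = \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i), \qquad \sum_{i=1}^{k} \pi_i = 1 $$

where the mixing weights $\pi_i$ give the prior probability that a point was generated by component $i$.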

Wait, probability?

You read that right! Gaussian Mixture Models are probabilistic models and use
the soft clustering approach for distributing the points in different clusters. I’ll
take another example that will make it easier to understand.

Here, we have three clusters that are denoted by three colors – Blue, Green, and
Cyan. Let’s take the data point highlighted in red. The probability of this point being
a part of the blue cluster is 1, while the probability of it being a part of the green or
cyan clusters is 0.
These probabilities are computed using Bayes’ theorem, which relates the prior and
posterior probabilities of the cluster assignments given the data. An important
decision in GMMs is choosing the appropriate number of components, which can be
done using techniques like the Bayesian Information Criterion (BIC) or
cross-validation.

Now, consider another point – somewhere in between the blue and cyan
(highlighted in the below figure). The probability that this point is a part of cluster
green is 0, right? The probability that this belongs to blue and cyan is 0.2 and 0.8
respectively. These coefficients represent the responsibilities or soft assignments of
the data point to the different Gaussian components in the mixture.

Gaussian Mixture Models use the soft clustering technique for assigning data points
to Gaussian distributions, leveraging Bayes’ theorem to compute the posterior
probabilities. I’m sure you’re wondering what these distributions are so let me
explain that in the next section.
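To make the soft assignments concrete, here is a short sketch using scikit-learn's GaussianMixture (scikit-learn is not mentioned in the original, and the toy blobs stand in for the blue/green/cyan clusters). The `predict_proba` output is exactly the vector of responsibilities described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three loose blobs, standing in for the blue, green, and cyan clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.7, size=(100, 2)),
    rng.normal(loc=(5.0, 0.0), scale=0.7, size=(100, 2)),
    rng.normal(loc=(2.5, 4.0), scale=0.7, size=(100, 2)),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Each row of predict_proba sums to 1 and gives the probability that the
# point belongs to each Gaussian component (its responsibilities).
point = np.array([[3.5, 0.5]])  # somewhere between two clusters
print(gmm.predict_proba(point))
```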
The Gaussian Distribution

I’m sure you’re familiar with Gaussian Distributions (or the Normal Distribution). It
has a bell-shaped curve, with the data points symmetrically distributed around the
mean value.

The below image shows a few Gaussian distributions with different mean (μ) and
variance (σ²) values. Remember that the higher the σ value, the greater the spread:

(Figure: Gaussian distributions with different μ and σ² values. Source: Wikipedia)

In a one-dimensional space, the probability density function of a Gaussian
distribution is given by:

$$ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

where μ is the mean and σ² is the variance.

But this would only be true for a single variable. In the case of two variables, instead
of a 2D bell-shaped curve, we will have a 3D bell curve as shown below:
The probability density function would be given by:

$$ f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma\rvert^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right) \quad \text{(with } d = 2 \text{ in this case)} $$

where x is the input vector, μ is the 2D mean vector, and Σ is the 2×2 covariance
matrix. The covariance would now define the shape of this curve. We can
generalize the same for d-dimensions.

Thus, this multivariate Gaussian model would have x and μ as vectors of length d,
and Σ would be a d x d covariance matrix.
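As a quick check of the formula above, this sketch evaluates the density at a point using SciPy (an illustrative mean vector and covariance matrix, not values from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])            # 2D mean vector
Sigma = np.array([[1.0, 0.3],        # 2x2 covariance matrix; the
                  [0.3, 2.0]])       # off-diagonals tilt the bell shape
print(multivariate_normal(mean=mu, cov=Sigma).pdf([0.5, -0.5]))
```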

Hence, for a dataset with d features, we would have a mixture of k Gaussian
distributions (where k is equivalent to the number of clusters), each having a certain
mean vector and covariance matrix. But wait – how are the mean and variance values for
each Gaussian assigned?

These values are determined using a technique called Expectation-Maximization
(EM). We need to understand this technique before we dive deeper into the working
of Gaussian Mixture Models.

5. EM Algorithm in Machine Learning:

What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is an iterative method, underlying various
unsupervised machine learning algorithms, which is used to determine the local
maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for
the parameters of statistical models with unobservable variables. In other words, it is a
technique for finding maximum likelihood estimates when latent variables are present.
The models it is applied to are referred to as latent variable models.

A latent variable model consists of both observable and unobservable variables, where
observable variables can be predicted, while unobserved variables are inferred from the
observed variables. These unobservable variables are known as latent variables.

Key Points:

o It is applied to latent variable models to determine MLE and MAP estimates
for the parameters.
o It is used to estimate the values of parameters in instances where data is missing or
unobservable for learning, and this is repeated until the values
converge.

EM Algorithm

The EM algorithm underlies various unsupervised ML algorithms, such as
the k-means clustering algorithm. Being an iterative approach, it alternates between two
modes. In the first mode, we estimate the missing or latent variables; hence it is
referred to as the expectation/estimation step (E-step). The other mode is
used to optimize the parameters of the model so that it can explain the data more
clearly; it is known as the maximization step (M-step).
o Expectation step (E-step): It involves estimating (guessing) all the missing
values in the dataset, so that after completing this step there are no
missing values.
o Maximization step (M-step): This step involves using the data estimated in
the E-step to update the parameters.
o Repeat the E-step and M-step until the values converge.

The primary goal of the EM algorithm is to use the available observed data of the
dataset to estimate the missing data of the latent variables and then use that data to
update the values of the parameters in the M-step.

What is Convergence in the EM algorithm?

Convergence here has the intuitive probabilistic meaning: if two successive estimates
of the random variables differ by only a very small amount of probability, they are said
to have converged. In other words, whenever the values of the given variables stop
changing and match each other from one iteration to the next, we call it convergence.

Steps in EM Algorithm

The EM algorithm is completed mainly in 4 steps: the Initialization step,
Expectation step, Maximization step, and Convergence step. These steps are
explained as follows:
o 1st step: The very first step is to initialize the parameter values. The
system is provided with incomplete observed data, with the assumption that the data
is obtained from a specific model.

o 2nd step: This step is known as the Expectation or E-step, which is used to estimate
or guess the values of the missing or incomplete data using the observed data.
The E-step primarily updates the variables.
o 3rd step: This step is known as the Maximization or M-step, where we use the complete
data obtained from the 2nd step to update the parameter values. The M-step
primarily updates the hypothesis.
o 4th step: The last step is to check whether the values of the latent variables are
converging. If they are, stop the process; otherwise, repeat from step 2
until convergence occurs. (A minimal numeric sketch of these four steps follows.)
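Here is a minimal numeric sketch of these four steps, fitting a two-component 1D Gaussian mixture with EM. The function name and initialization scheme are illustrative choices, not prescribed by the text:

```python
import numpy as np

def em_1d_gmm(x, n_iters=50):
    """EM for a two-component 1D Gaussian mixture, mirroring the four
    steps above: initialize, E-step, M-step, check convergence."""
    # Step 1: initialize means, variances, and mixing weights.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # Step 2 (E-step): responsibility of each component for each point.
        dens = (pi / np.sqrt(2 * np.pi * var)) * \
               np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M-step): re-estimate the parameters from responsibilities.
        nk = resp.sum(axis=0)
        new_mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
        # Step 4: stop once the means have (nearly) stopped changing.
        if np.allclose(new_mu, mu, atol=1e-6):
            mu = new_mu
            break
        mu = new_mu
    return mu, var, pi

# Toy usage: samples drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
print(em_1d_gmm(x))  # means should land near -3 and +3
```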
