AIML Unit 4

1. What is Unsupervised Learning?

Unsupervised learning is a type of machine learning in which models are trained on an
unlabeled dataset and are allowed to act on that data without any supervision.

The goal of unsupervised learning is to find the underlying structure of the dataset, group
the data according to similarities, and represent the dataset in a compressed
format.

Example: Suppose an unsupervised learning algorithm is given an input dataset
containing images of different types of cats and dogs. The algorithm is never trained
on the given dataset, which means it has no prior idea about the features of
the dataset. The task of the unsupervised learning algorithm is to identify the image
features on its own. It performs this task by
clustering the image dataset into groups according to the similarities between images.

Working of Unsupervised Learning

The working of unsupervised learning can be understood from the diagram below:

Here, we have taken unlabeled input data, meaning it is not categorized and no
corresponding outputs are given. This unlabeled input data is fed to the
machine learning model in order to train it. First, the model interprets the raw data to find
hidden patterns in the data and then applies a suitable algorithm such as k-means
clustering, hierarchical clustering, etc.

Once a suitable algorithm is applied, it divides the data objects into groups
according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types of
problems:

o Clustering: Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no similarities
with the objects of another group. Cluster analysis finds the commonalities
between data objects and categorizes them according to the presence or
absence of those commonalities.
o Association: An association rule is an unsupervised learning method which is
used for finding relationships between variables in a large database. It
determines the set of items that occur together in the dataset. Association rules
make marketing strategies more effective: for example, people who buy item X
(say, bread) also tend to purchase item Y (butter or jam). A typical
example of association rule mining is Market Basket Analysis; a minimal sketch of the
underlying support/confidence calculation follows this list.
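As a concrete illustration, here is a minimal Python sketch of the support and confidence calculations behind a rule such as {bread} → {butter}. The toy transactions and the helper function `support` are hypothetical, invented for this example:

```python
# Toy transactions (illustrative data, not from the original text).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Confidence of the rule X -> Y is support(X U Y) / support(X).
x, y = {"bread"}, {"butter"}
conf = support(x | y, transactions) / support(x, transactions)
print(f"support    = {support(x | y, transactions):.2f}")  # 0.50
print(f"confidence = {conf:.2f}")                          # 0.67
```

In Market Basket Analysis, rules whose support and confidence clear chosen thresholds are kept as actionable patterns.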

Unsupervised learning algorithms include:

o K-means clustering
o KNN (k-nearest neighbors) - strictly a supervised method (see Section 3), but covered alongside these algorithms in this unit
o Gaussian mixture models
o Expectation maximization

2. What is the K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that
need to be created in the process: if K=2, there will be two clusters, for K=3,
there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in
such a way that each data point belongs to only one group, and points within a group
share similar properties.

It allows us to cluster the data into different groups and offers a convenient way to discover
the categories of groups in an unlabeled dataset on its own, without the need for any
training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The
main aim of this algorithm is to minimize the sum of distances between the data points
and their corresponding cluster centroids.
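Concretely, the objective that k-means minimizes is the within-cluster sum of squared distances (a standard formulation, added here for reference):

$$ J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$

where $C_i$ is the set of points assigned to cluster $i$ and $\mu_i$ is that cluster's centroid.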

The algorithm takes the unlabeled dataset as input, divides the dataset into k
clusters, and repeats the process until no better clusters can be found. The value of k
must be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best positions for the K center points, or centroids, through an iterative
process.
o Assigns each data point to its closest k-center. The data points that are near
a particular k-center form a cluster.

Hence each cluster contains data points with some commonalities and is kept away from other
clusters.

The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They need not be from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined
K clusters.

Step-4: Calculate the variance and place a new centroid for each cluster.

Step-5: Repeat the third step, i.e., reassign each data point to the new
closest centroid of each cluster.

Step-6: If any reassignment occurred, go to step-4; otherwise, go to FINISH.

Step-7: The model is ready. (A minimal code sketch of these steps follows.)
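Before walking through the visual plots, here is a minimal NumPy sketch of Steps 1-6. The function name `kmeans` and the toy data are illustrative choices, not part of the original text, and empty-cluster handling is omitted for brevity:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: pick k random centroids, assign points to the
    nearest centroid, recompute centroids as cluster means, and stop
    when assignments no longer change (Steps 2-6 above)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 2
    labels = None
    for _ in range(n_iters):
        # Step 3: assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                              # Step 6: no reassignment
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Toy usage: two obvious blobs, around (0, 0) and (5, 5).
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
```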

Let's understand the above steps by considering the visual plots:

Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:

o Let's take the number of clusters, i.e., K=2, to identify the dataset and put the points
into different clusters. It means here we will try to group these data points into two
different clusters.
o We need to choose some random k points or centroids to form the clusters. These
points can be either points from the dataset or any other points. So, here we
are selecting the below two points as k points, which are not part of our
dataset. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute this by applying the mathematics we have studied
for calculating the distance between two points. So, we will draw a median line between
both centroids. Consider the below image:

From the above image, it is clear that the points on the left side of the line are nearer to the K1 or blue
centroid, and the points to the right of the line are closer to the yellow centroid. Let's color
them blue and yellow for clear visualization.
o As we need to find the closest clusters, we will repeat the process by
choosing new centroids. To choose the new centroids, we will compute the
center of gravity of the points in each cluster, and place the new centroids there, as below:

o Next, we will reassign each data point to its new closest centroid. For this, we will repeat
the same process of finding a median line. The median will be as in the below image:

From the above image, we can see that one yellow point is on the left side of the line, and
two blue points are to the right of the line. So, these three points will be assigned to new
centroids.

As reassignment has taken place, we will again go to step-4, which is finding new
centroids or K-points.

o We will repeat the process by finding the center of gravity of the points in each cluster, so the
new centroids will be as shown in the below image:
o As we have the new centroids, we will again draw the median line and reassign the
data points. So, the image will be:

o We can see in the above image that no data points changed sides
of the line, which means our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final
clusters will be as shown in the below image:
3. K-Nearest Neighbor (KNN) Algorithm for Machine
Learning:

o K-Nearest Neighbour is one of the simplest machine learning algorithms, based
on the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available
cases, and puts the new case into the category that is most similar to the available
categories.
o The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can easily be
classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but
mostly it is used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an action
on it at classification time.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new
data, it classifies that data into the category most similar to the new
data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and
a dog, and we want to know whether it is a cat or a dog. For this identification, we
can use the KNN algorithm, as it works on a similarity measure. Our KNN model
will find features of the new image similar to the cat and dog images and,
based on the most similar features, will put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new
data point x1. In which of these categories will this data point lie? To solve this type of
problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the
category or class of a particular data point. Consider the below diagram:

How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new point to each training point.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
o Step-4: Among these k neighbors, count the number of data points in each
category.
o Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
o Step-6: Our model is ready. (A minimal sketch of these steps follows.)
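Here is a minimal Python sketch of these steps. The function `knn_predict` and the toy data are hypothetical, for illustration only:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training
    points under Euclidean distance (Steps 1-5 above)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Step 2
    nearest = np.argsort(dists)[:k]                   # Step 3
    votes = Counter(y_train[i] for i in nearest)      # Step 4
    return votes.most_common(1)[0][0]                 # Step 5

# Toy usage: category "A" near the origin, category "B" near (5, 5).
X_train = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
y_train = np.array(["A"] * 20 + ["B"] * 20)
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=5))  # likely "A"
```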

Suppose we have a new data point and we need to put it in the required category.
Consider the below image:

o Firstly, we will choose the number of neighbors; here we will choose k=5.
o Next, we will calculate the Euclidean distance between the data points. The
Euclidean distance is the distance between two points, which we have already
studied in geometry. Between points (x1, y1) and (x2, y2) it is
d = √((x2 − x1)² + (y2 − y1)²).
o By calculating the Euclidean distances we get the nearest neighbors: three
nearest neighbors in category A and two nearest neighbors in category B.
Consider the below image:

o Since three of the five nearest neighbors are from category A, this new data
point must belong to category A.

How to select the value of K in the K-NN Algorithm?

Below are some points to remember while selecting the value of K in the K-NN
algorithm:

o There is no particular way to determine the best value for "K", so we need to try
several values to find the best among them. The most commonly preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and make the model
sensitive to outliers.
o Large values for K smooth out noise, but too large a value can blur the boundary
between categories and add computational cost.

4. Gaussian Mixture Models (GMMs):

The Gaussian Mixture Model (GMM) is a probabilistic model used for clustering and
density estimation. It assumes that the data is generated from a mixture of several
Gaussian components, each representing a distinct cluster. GMM assigns
probabilities to data points, allowing them to belong to multiple clusters
simultaneously. The model is widely used in machine learning and pattern
recognition applications.

Gaussian Mixture Models (GMMs) assume that there are a certain number of
components, where each component is a Gaussian distribution. Hence, a Gaussian
Mixture Model tends to group together the data points belonging to a single Gaussian
component. The parameters of the mixture components, such as the
means and covariances, are typically estimated using the Expectation-Maximization
(EM) algorithm or maximum likelihood estimation techniques.

Let’s say we have three Gaussian components (more on that in the next section) –
GD1, GD2, and GD3, with means (μ1, μ2, μ3) and variances (σ1², σ2²,
σ3²) respectively. For a given set of data points, our GMM would identify the
probability of each data point belonging to each of these mixture components. The
EM algorithm iteratively updates these parameters to maximize the likelihood of the
data, without requiring the derivative to be calculated explicitly.
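In standard notation (added here for reference, consistent with the description above), the density a GMM fits is a weighted sum of Gaussians:

$$ p(x) = \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i), \qquad \sum_{i=1}^{k} \pi_i = 1 $$

where the mixing weights $\pi_i$ give the prior probability that a point was generated by component $i$.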

Wait, probability?

You read that right! Gaussian Mixture Models are probabilistic models and use
the soft clustering approach for distributing the points in different clusters. I’ll
take another example that will make it easier to understand.

Here, we have three clusters that are denoted by three colors – Blue, Green, and
Cyan. Let’s take the data point highlighted in red. The probability of this point being
a part of the blue cluster is 1, while the probability of it being a part of the green or
cyan clusters is 0.
These probabilities are computed using Bayes’ theorem, which relates the prior and
posterior probabilities of the cluster assignments given the data. An important
decision in GMMs is choosing the appropriate number of components, which can be
done using techniques like the Bayesian Information Criterion (BIC) or
cross-validation.

Now, consider another point – somewhere in between the blue and cyan
(highlighted in the below figure). The probability that this point is a part of cluster
green is 0, right? The probability that this belongs to blue and cyan is 0.2 and 0.8
respectively. These coefficients represent the responsibilities or soft assignments of
the data point to the different Gaussian components in the mixture.

Gaussian Mixture Models use the soft clustering technique for assigning data points
to Gaussian distributions, leveraging Bayes’ theorem to compute the posterior
probabilities. I’m sure you’re wondering what these distributions are so let me
explain that in the next section.
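To make the soft assignments concrete, here is a short sketch using scikit-learn's GaussianMixture (scikit-learn is not mentioned in the original, and the toy blobs stand in for the blue/green/cyan clusters). The `predict_proba` output is exactly the vector of responsibilities described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three loose blobs, standing in for the blue, green, and cyan clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.7, size=(100, 2)),
    rng.normal(loc=(5.0, 0.0), scale=0.7, size=(100, 2)),
    rng.normal(loc=(2.5, 4.0), scale=0.7, size=(100, 2)),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Each row of predict_proba sums to 1 and gives the probability that the
# point belongs to each Gaussian component (its responsibilities).
point = np.array([[3.5, 0.5]])  # somewhere between two clusters
print(gmm.predict_proba(point))
```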
The Gaussian Distribution

I’m sure you’re familiar with Gaussian Distributions (or the Normal Distribution). It
has a bell-shaped curve, with the data points symmetrically distributed around the
mean value.

The below image shows a few Gaussian distributions with different mean (μ) and
variance (σ²) values. Remember that the higher the σ value, the greater the spread:

(Figure: Gaussian distributions with different μ and σ² values. Source: Wikipedia)

In a one-dimensional space, the probability density function of a Gaussian
distribution is given by:

$$ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

where μ is the mean and σ² is the variance.

But this would only be true for a single variable. In the case of two variables, instead
of a 2D bell-shaped curve, we will have a 3D bell curve as shown below:
The probability density function would be given by:

$$ f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma\rvert^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right) \quad \text{(with } d = 2 \text{ in this case)} $$

where x is the input vector, μ is the 2D mean vector, and Σ is the 2×2 covariance
matrix. The covariance would now define the shape of this curve. We can
generalize the same for d-dimensions.

Thus, this multivariate Gaussian model would have x and μ as vectors of length d,
and Σ would be a d x d covariance matrix.
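As a quick check of the formula above, this sketch evaluates the density at a point using SciPy (an illustrative mean vector and covariance matrix, not values from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])            # 2D mean vector
Sigma = np.array([[1.0, 0.3],        # 2x2 covariance matrix; the
                  [0.3, 2.0]])       # off-diagonals tilt the bell shape
print(multivariate_normal(mean=mu, cov=Sigma).pdf([0.5, -0.5]))
```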

Hence, for a dataset with d features, we would have a mixture of k Gaussian
distributions (where k is equivalent to the number of clusters), each having a certain
mean vector and covariance matrix. But wait – how are the mean and variance values for
each Gaussian assigned?

These values are determined using a technique called Expectation-Maximization
(EM). We need to understand this technique before we dive deeper into the working
of Gaussian Mixture Models.

5. EM Algorithm in Machine Learning:

What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is an iterative method, underlying various
unsupervised machine learning algorithms, which is used to determine the local
maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for
the parameters of statistical models with unobservable variables. In other words, it is a
technique for finding maximum likelihood estimates when latent variables are present.
The models it is applied to are referred to as latent variable models.

A latent variable model consists of both observable and unobservable variables, where
observable variables can be predicted, while unobserved variables are inferred from the
observed variables. These unobservable variables are known as latent variables.

Key Points:

o It is applied to latent variable models to determine MLE and MAP estimates
for the parameters.
o It is used to estimate the values of parameters in instances where data is missing or
unobservable for learning, and this is repeated until the values
converge.

EM Algorithm

The EM algorithm underlies various unsupervised ML algorithms, such as
the k-means clustering algorithm. Being an iterative approach, it alternates between two
modes. In the first mode, we estimate the missing or latent variables; hence it is
referred to as the expectation/estimation step (E-step). The other mode is
used to optimize the parameters of the model so that it can explain the data more
clearly; it is known as the maximization step (M-step).
o Expectation step (E-step): It involves estimating (guessing) all the missing
values in the dataset, so that after completing this step there are no
missing values.
o Maximization step (M-step): This step involves using the data estimated in
the E-step to update the parameters.
o Repeat the E-step and M-step until the values converge.

The primary goal of the EM algorithm is to use the available observed data of the
dataset to estimate the missing data of the latent variables and then use that data to
update the values of the parameters in the M-step.

What is Convergence in the EM algorithm?

Convergence here has the intuitive probabilistic meaning: if two successive estimates
of the random variables differ by only a very small amount of probability, they are said
to have converged. In other words, whenever the values of the given variables stop
changing and match each other from one iteration to the next, we call it convergence.

Steps in EM Algorithm

The EM algorithm is completed mainly in 4 steps: the Initialization step,
Expectation step, Maximization step, and Convergence step. These steps are
explained as follows:
o 1st step: The very first step is to initialize the parameter values. The
system is provided with incomplete observed data, with the assumption that the data
is obtained from a specific model.

o 2nd step: This step is known as the Expectation or E-step, which is used to estimate
or guess the values of the missing or incomplete data using the observed data.
The E-step primarily updates the variables.
o 3rd step: This step is known as the Maximization or M-step, where we use the complete
data obtained from the 2nd step to update the parameter values. The M-step
primarily updates the hypothesis.
o 4th step: The last step is to check whether the values of the latent variables are
converging. If they are, stop the process; otherwise, repeat from step 2
until convergence occurs. (A minimal numeric sketch of these four steps follows.)
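Here is a minimal numeric sketch of these four steps, fitting a two-component 1D Gaussian mixture with EM. The function name and initialization scheme are illustrative choices, not prescribed by the text:

```python
import numpy as np

def em_1d_gmm(x, n_iters=50):
    """EM for a two-component 1D Gaussian mixture, mirroring the four
    steps above: initialize, E-step, M-step, check convergence."""
    # Step 1: initialize means, variances, and mixing weights.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # Step 2 (E-step): responsibility of each component for each point.
        dens = (pi / np.sqrt(2 * np.pi * var)) * \
               np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M-step): re-estimate the parameters from responsibilities.
        nk = resp.sum(axis=0)
        new_mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
        # Step 4: stop once the means have (nearly) stopped changing.
        if np.allclose(new_mu, mu, atol=1e-6):
            mu = new_mu
            break
        mu = new_mu
    return mu, var, pi

# Toy usage: samples drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
print(em_1d_gmm(x))  # means should land near -3 and +3
```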
