How to Perform Clustering Algorithms in Machine Learning
What are the learning techniques in machine learning?
There are three types of learning techniques in machine learning, based on the dataset and the problem: supervised, unsupervised, and semi-supervised. In the real world, most problems and datasets revolve around supervised and semi-supervised machine learning, because those approaches rely on labelled data. Supervised machine learning involves a fully labelled dataset, which means there are accurate dependent and independent variables for the prediction. Semi-supervised machine learning involves a partially labelled dataset, which means the labels are available only for part of the data. The third kind is unsupervised machine learning, which involves discovering patterns in a given dataset. For example, in sentiment analysis, grouping positive, negative, and neutral reviews based on data from the first 24-48 hours of a movie's release. Unsupervised machine learning is about grouping data based on a condition.
Clustering-based analysis comes under unsupervised machine learning. In the real world, clustering-based analysis is used less often than supervised and semi-supervised learning. In this article, we will look at clustering analysis and how to perform a clustering algorithm in machine learning.
There are two main types of unsupervised learning:
● Clustering
● Association
Clustering:
Clustering is often depicted as areas of density in feature space, where examples from the domain are closer to their own cluster than to neighbouring clusters. Each cluster has a centroid, which is a point in feature space, and it may or may not have a boundary. Clustering helps in problem domains like pattern recognition, pattern discovery, or knowledge discovery (e.g. sentiment analysis). Clustering also has several types, as follows:
● K means clustering
● KNN- K Nearest Neighbor
● Hierarchical clustering
● Principal component analysis
● Singular value decomposition
● Independent component analysis
Exclusive - also known as partitioning, in which data points are grouped in such a way that each point belongs to exactly one cluster. Example: k-means clustering.
Agglomerative - every data point starts as its own cluster, and the iterative union of the two nearest neighbouring clusters reduces the number of clusters. Example: hierarchical clustering.
Overlapping - in this technique, a fuzzy set is used to cluster the data. A fuzzy set is also known as an uncertain set, in which each element has a degree of membership. Based on the fuzzy-set principle, each point can belong to two or more clusters, each with a separate degree of membership.
Probabilistic - in this technique, a probability distribution is used to create the clusters of data points. For example,
● Nvidia RTX GPU
● AMD GPU
● Nvidia GTX GPU
● AMD Fidelity RTX GPU
Here, we can group the items by keywords such as 'Nvidia' vs. 'AMD', or 'RTX' vs. 'GTX'.
Association:
Association rules are used for establishing associations among data objects in a large database. Association is a type of unsupervised machine learning used for discovering interesting relationships between variables in a database.
For example, people who buy new phones tend to also purchase power banks, scratch-proof back cases, and so on.
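As a minimal sketch of this idea (not a full association-rule miner), the support and confidence of a rule like "phone → power bank" can be computed directly from transactions. The item names and purchases below are made up purely for illustration:

```python
# Hypothetical purchase transactions (items are illustrative only)
transactions = [
    {'phone', 'power bank', 'back case'},
    {'phone', 'power bank'},
    {'phone', 'back case'},
    {'headphones'},
    {'phone', 'power bank', 'headphones'},
]

n = len(transactions)
# support(X) = fraction of transactions that contain X
support_phone = sum('phone' in t for t in transactions) / n
support_both = sum({'phone', 'power bank'} <= t for t in transactions) / n
# confidence(phone -> power bank) = support(both) / support(phone)
confidence = support_both / support_phone

print(support_phone, support_both, confidence)  # 0.8 0.6 0.75
```

A confidence of 0.75 means that 75% of customers who bought a phone also bought a power bank, which is the kind of relationship association mining surfaces.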
Types of clustering:
There are six types of clustering algorithms in machine learning, as listed above.
K means clustering:
K-means is an iterative clustering algorithm that refines the clusters on every iteration. The first step is to choose the desired number of clusters, k. The objective is to group the data points into k clusters. A large k value means more clusters with finer granularity, while a low k value means fewer, larger clusters with minimal granularity.
In the k-means algorithm, each group is formed around a centroid, and this centroid pulls the nearest data points together to form the cluster. The centroid acts as the nucleus, surrounded by the nearest data points.
K-means groups points based on feature similarity (note that KNN, despite the similar name, is a separate, supervised algorithm). The process of finding the value of k is called parameter tuning, and it influences the accuracy. The value of k determines the number of clusters. The elbow method, covered later in this article, is a common way to find the value of k.
Hierarchical clustering:
Hierarchical clustering is another unsupervised learning algorithm that is used to group together
the unlabeled data points having similar characteristics. Hierarchical clustering algorithms fall
into the following two categories.
Agglomerative hierarchical algorithms − in agglomerative algorithms, each data point starts as its own cluster, and the nearest pairs of clusters are merged step by step (bottom-up approach) until one big cluster, or the desired number of clusters, remains.
Divisive hierarchical algorithms − on the other hand, in divisive hierarchical algorithms, all the data points are treated as one big cluster, and the clustering process involves dividing (top-down approach) the one big cluster into various smaller clusters.
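As a minimal sketch of the bottom-up (agglomerative) variant, scikit-learn's AgglomerativeClustering can separate two obvious groups of 2-D points. The data here is made up for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of 2-D points (illustrative data)
X = np.array([[0, 0], [0, 1], [1, 0],
              [9, 9], [9, 10], [10, 9]])

# Bottom-up: every point starts as its own cluster, and the nearest
# clusters are merged until only n_clusters remain
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)
```

The first three points end up in one cluster and the last three in the other, mirroring the merge process described above.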
Singular value decomposition:
Singular value decomposition (SVD) factorizes an m×n matrix A as A = U Σ V^T, where U is an m×m orthogonal matrix, Σ is an m×n diagonal matrix of singular values, and V is an n×n orthogonal matrix. SVD is widely used for dimensionality reduction and noise filtering before clustering.
Independent component analysis:
Independent Component Analysis (ICA) is a technique in statistics used to detect hidden factors
that exist in datasets of random variables, signals, or measurements. An alternative to principal
component analysis (PCA), Independent Component Analysis helps to divide multivariate
signals into subcomponents that are assumed to be non-Gaussian in nature, and independent
of each other.
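As a small sketch of this idea, scikit-learn's FastICA can recover two synthetic, non-Gaussian source signals from their mixture. The signals and mixing matrix below are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic, non-Gaussian source signals (illustrative)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)              # smooth sinusoid
s2 = np.sign(np.sin(3 * t))     # square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],       # mixing matrix: observed = sources @ A.T
              [0.5, 1.0]])
X = S @ A.T                     # the mixed "measurements" we observe

# FastICA estimates the hidden independent components from the mixture
S_est = FastICA(n_components=2, random_state=0).fit_transform(X)
print(S_est.shape)
```

Each column of S_est approximates one of the original sources (up to sign and scale), which is exactly the "hidden factor" separation ICA is used for.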
Performing K-means clustering:
Before we perform the operation using k-means, it is better to understand one important topic: the elbow method. An important aspect of unsupervised learning is finding the optimal number of clusters. The elbow method is a popular technique for determining the number of clusters by finding the value of 'k'. Now, let us start the analysis.
NOTE: It is always important to scale the data before applying the algorithm
Importing the K-means model with the help of sklearn. The python code is shown below:
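The import might look like this:

```python
# KMeans lives in scikit-learn's cluster module
from sklearn.cluster import KMeans
```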
Matplotlib helps us visualize the elbow method mentioned previously, as well as the final presentation of the clusters.
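The corresponding import:

```python
# pyplot provides the plotting interface used for the graphs below
import matplotlib.pyplot as plt
```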
For explanation purposes, the IRIS dataset is used as an example. The python code for importing the dataset from the local system is shown below.
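The article loads the file from the local system; since that exact file path is unknown, the sketch below falls back to the copy of IRIS bundled with scikit-learn (a local CSV would be read with pandas.read_csv instead):

```python
from sklearn.datasets import load_iris

# A local file would be loaded with pd.read_csv('<path to iris csv>');
# here we use the copy bundled with scikit-learn instead.
df = load_iris(as_frame=True).frame
print(df.shape)  # 150 rows: 4 feature columns plus a 'target' column
```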
This step mostly involves data cleaning. For this example, one of the independent variables is dropped due to its insignificant effect on the dataset. The python code for preprocessing is shown below.
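The article does not say which independent variable is dropped, so removing the sepal-width column below is purely an illustrative assumption:

```python
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame
# Which feature to drop is an assumption made for illustration only;
# 'target' is removed because clustering uses no labels
X = df.drop(columns=['sepal width (cm)', 'target'])
print(list(X.columns))
```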
As mentioned, it is better to normalize the data, since it makes the data easier to process. In this example, the reason for normalizing is that the petal length increases from the head to the tail of the dataset. Scaling keeps all the data points within a common range for better visualization.
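One way to do this scaling is with scikit-learn's MinMaxScaler, which maps every feature into the 0-1 range:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = load_iris(as_frame=True).frame.drop(columns=['target'])
# MinMaxScaler rescales each feature column into the 0-1 range
X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled.min(), X_scaled.max())  # 0.0 1.0
```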
Step: 5 Performing the elbow method to find the 'k' value:
The main step in the clustering algorithm is to find the optimal number of clusters. For this example, as mentioned, the elbow method is used to find the number of clusters. The python code is shown below.
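A typical elbow-method loop fits k-means for a range of k values and plots the inertia (within-cluster sum of squares); the range 1-10 and the output filename are assumptions for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed for script use
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(load_iris().data)

# Inertia (within-cluster sum of squares) for k = 1..10
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in ks]

plt.plot(list(ks), inertias, marker='o')
plt.xlabel("number of clusters k")
plt.ylabel("inertia (WCSS)")
plt.title("Elbow method")
plt.savefig("elbow.png")
```

The inertia always decreases as k grows; the elbow is the k after which the decrease flattens out.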
Here, as mentioned, the number of clusters is found to be 2. But how do we find the k value here? It is simple: the graph shows where the line "bends", and the k value is read off at that elbow, the point where the curve's decrease slows sharply. In this example, the elbow occurs at 2, so the k value is 2, which means two clusters is the optimal number of clusters for this dataset.
Step: 6 Applying the k-means algorithm:
Now that the number of clusters has been found, the final step is to apply the k-means algorithm and visualize the clustered data. The python code is below.
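A sketch of this final fit-and-plot step, with the feature pair plotted and the output filename chosen for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed for script use
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(load_iris().data)

km = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = km.fit_predict(X)

# Scatter of the first two scaled features, coloured by cluster,
# with the two centroids marked as red crosses
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            marker='x', s=100, c='red')
plt.savefig("clusters_k2.png")
```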
The scaled data shows that the data points are within the range 0-1, which makes them easy to process and to visualize clearly and neatly.
Step: 7 Output:
Here, one question arises: why k = 2 instead of k = 3 or 4, if we treat the end points as bends? Let us explore that option as well.
Here, the number of clusters is taken as 3, and k-means is applied; the result shows the partition.
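The rerun with three clusters is the same pipeline with only n_clusters changed (a sketch, mirroring the earlier setup):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(load_iris().data)
# Identical pipeline to the k=2 run; only n_clusters is changed to 3
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(sorted(set(labels)))  # [0, 1, 2]
```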
Now, comparing the k = 2 and k = 3 outputs, it is noticeable that cluster one (yellow) has been divided into two (yellow and blue), and this is not the optimal clustering. So, it is best not to treat the end points as bends in the elbow method.
FAQs:
1. What are the clustering methods used in machine learning?
● K means clustering
● KNN- K Nearest Neighbor
● Hierarchical clustering
● Principal component analysis
● Singular value decomposition
● Independent component analysis
2. What is unsupervised machine learning?
Unsupervised machine learning deals with the natural discovery of patterns in the dataset.
Conclusion:
Clustering-based analysis is very helpful in increasing the performance of a model and procuring faster results. Other advantages are greater scalability and simplified management. In this article, unsupervised machine learning, the types of clustering, and a live example of one of those types were discussed. The important points are: there is no single best clustering algorithm in machine learning, as every algorithm has its own purpose and advantages; it is important to scale the data before applying the algorithm; and do not treat the end points in the elbow-method graph as bends, as that leads to improper clustering of the data points.