
Machine Learning

Clustering

Source:
https://www.kdnuggets.com/2023/05/clustering-scikitlearn-tutorial-unsupervised-learning.html
https://www.projectpro.io/article/clustering-algorithms-in-machine-learning/842
What is Clustering
> Organizing data into clusters such that there is
● high intra-cluster similarity
● low inter-cluster similarity

> Informally, finding natural groupings among objects based on
similarity (pattern matching)
> A form of unsupervised learning: it works with unlabelled data
> Used as
● a standalone tool for identifying patterns within datasets
● a pre-processing step for various machine learning algorithms
Source: https://www.advancinganalytics.co.uk/blog/2022/6/13/get-started-with-clustering-the-easy-way
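As a concrete sketch of the unsupervised setting (illustrative, not from the slides): scikit-learn's KMeans fits on the feature matrix alone, never on labels. The dataset and parameters below are assumptions for demonstration.

```python
# Minimal sketch of clustering unlabelled data with scikit-learn's KMeans.
# make_blobs returns ground-truth labels, but they are discarded here:
# the clusterer is fitted on the features alone.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)   # one cluster id (0, 1 or 2) per point
```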
Clustering Applications
Customer segmentation: Clustering is often used to group customers based on their
demographics or purchasing behaviour. In the retail industry, for example, customers
are segmented so that their behaviours are well understood; once the segmentation is
performed, a particular segment can be targeted with group-specific actions such as
promotions.
Image segmentation: This is a process of partitioning an image into multiple distinct
regions containing sets of pixels with similar attributes. Often, this technique is used
in medical research to identify underlying patterns and in the automotive industry, in
particular in autonomous vehicles to identify objects.
Market segmentation: This helps businesses increase the chances of people engaging
with advertisements or content, resulting in more efficient campaigns and improved
return on investment. Similar to customer segmentation, clustering is widely used in
many businesses to perform market segmentation.
Anomaly detection: Widely used in businesses, particularly in social media, finance,
healthcare and manufacturing. Identification of fake news, fraudulent transactions, or
defective mechanical components are popular areas where clustering is often used to
detect anomalies. For example, in the finance sector, clustering is often used to
identify fraudulent transactions using historical fraudulent transaction data.
Document sorting: Clustering can be used to organise and categorise documents based
on certain key features, for example, category, keywords, word frequency or content.
This is popular within many businesses, where managing risk and compliance is
essential.

Pricing: In the retail industry, clustering new products based on a set of features
is often used to price them accurately.
Customer services: With more emphasis being put on customer care, clustering is
widely used to group customer complaints into tiers of importance. This then provides
the ability to understand, prioritise and focus on the most important issues to make
the most significant impact.

Genome analysis: Clustering is often used to determine similarities between genomes.
In fact, clustering algorithms were utilised during the Covid-19 pandemic to detect
and analyse distinct strains of Coronavirus to help establish similarities to origin
hosts.
Data compression: By reducing the number of data points that must be examined,
various clustering techniques can be used to compress huge datasets. Data analysis
may become quicker and more effective as a result.

Recommendation systems: Clustering can be used in recommendation systems to group
people or items that share characteristics. This can enhance the user experience
and help to increase the accuracy of recommendations.
Types of Clustering Algorithms
Distribution models – Clusters in this model belong to a
distribution: data points are grouped based on the probability of
belonging to a particular distribution, typically a normal
(Gaussian) distribution. The expectation-maximisation algorithm,
which uses multivariate normal distributions, is a popular
example of this approach.

Centroid models – This is an iterative approach in which data is
organised into clusters based on how close data points are to the
centre of a cluster, also known as its centroid.
An example of a centroid model is the K-means algorithm.
Source: https://www.advancinganalytics.co.uk/blog/2022/6/13/10-incredibly-useful-clustering-algorithms-you-need-to-know
Connectivity models – Similar to centroid models, but these seek
to build a hierarchy of clusters based on the connectivity
(distance) between data points. An example of a connectivity
model is the hierarchical clustering algorithm.

Density models – Clusters are defined by areas of concentrated
density. The algorithm searches for areas of densely packed data
points and assigns each such area to the same cluster. DBSCAN and
OPTICS are two popular density-based clustering models.

Clustering Algorithms
Affinity Propagation: It takes as input measures of similarity between pairs of
data points and simultaneously considers all data points as potential exemplars.
Real-valued messages are exchanged between data points until a high-quality set of
exemplars and corresponding clusters gradually emerges.

Agglomerative Hierarchical Clustering: This clustering technique uses a hierarchical
“bottom-up” approach: the algorithm begins with each data point as its own cluster
and merges clusters depending on the distance between them, continuing until one
large cluster remains.
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): This technique
is very useful when clustering large datasets as it begins by first generating a more
compact summary that retains as much distribution information as possible and then
clustering the data summary instead of the original large dataset.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN is a
well-known density-based clustering algorithm. It determines clusters based on how
dense regions are, and it can find irregularly shaped clusters and outliers very
well.

OPTICS (Ordering Points To Identify the Clustering Structure): This is also a
density-based clustering algorithm. It is very similar to DBSCAN, but it overcomes
one of DBSCAN’s limitations: the problem of detecting meaningful clusters in data
of varying density.
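A hedged sketch of that difference: two blobs of very different density, the case where a single DBSCAN `eps` tends to fail. The data and the `min_samples` value are illustrative assumptions.

```python
# Sketch: OPTICS on two blobs of very different spread. OPTICS orders
# points by reachability, so it can extract clusters of varying density
# without a single global eps threshold.
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=[200, 200], centers=[[0, 0], [10, 10]],
                  cluster_std=[0.3, 2.0], random_state=0)

opt = OPTICS(min_samples=10).fit(X)
# -1 marks noise; the remaining labels are the detected clusters.
n_clusters = len(set(opt.labels_)) - (1 if -1 in opt.labels_ else 0)
```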
K-Means: This algorithm is one of the most popular and commonly used clustering
techniques. It works by assigning data points to clusters based on the shortest
distance to the centroid, or centre, of the cluster. The algorithm’s main goal is to
minimise the sum of distances between data points and their respective cluster
centroids.

Mini-Batch K-Means: This is a k-means version in which cluster centroids are updated
in small batches rather than the entire dataset. When working with a large dataset, the
mini-batch k-means technique can be used to minimise computing time.

Mean Shift Clustering: The mean shift clustering algorithm is a centroid-based
algorithm that works by iteratively shifting candidate centroids towards the mean
of the points within their neighbourhood in the feature space.
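The mini-batch trade-off above can be sketched as follows; the dataset, batch size and cluster count are illustrative assumptions.

```python
# Sketch: MiniBatchKMeans updates centroids from small random batches
# instead of the full dataset, trading a little accuracy for speed.
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=1)

full = KMeans(n_clusters=5, n_init=10, random_state=1).fit(X)
mini = MiniBatchKMeans(n_clusters=5, batch_size=256, n_init=10,
                       random_state=1).fit(X)

# inertia_ is the within-cluster sum of squares; the mini-batch result
# is usually only slightly worse than full K-means on the same data.
```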
Spectral Clustering: Spectral clustering is a graph-based algorithm where the
approach is used to identify communities of nodes based on the edges. Because of its
ease of implementation and promising performance, spectral clustering has grown in
popularity.

Gaussian Mixture Models (GMM): Gaussian mixture models can be seen as an extension
of the k-means clustering algorithm, based on the idea that each cluster may be
described by its own Gaussian distribution. In contrast with K-means’ hard
assignment of data points to clusters, GMM uses soft (probabilistic) assignment.
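The soft-assignment contrast can be sketched with scikit-learn's GaussianMixture (illustrative data): `predict_proba` gives a probability per component, whereas `predict` collapses that to a hard label.

```python
# Sketch of GMM soft assignment: each row of predict_proba is a
# probability distribution over the mixture components.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

gmm = GaussianMixture(n_components=3, random_state=7).fit(X)
proba = gmm.predict_proba(X)   # shape (300, 3); each row sums to 1
hard = gmm.predict(X)          # the argmax of the soft assignments
```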
Example datasets for comparing clustering algorithms:
● Circles – two circles, one circumscribed by the other.
● Moons – two interleaving half circles.
● Varied-variance blobs – blobs that have different variances.
● Anisotropically distributed blobs – blobs with unequal widths and lengths.
● Regular blobs – just three regular blobs.
● Homogeneous data – a ‘null’ situation for clustering.
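These shapes can be generated with scikit-learn's dataset helpers; a sketch in which sample sizes, noise levels and the stretching matrix are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_blobs, make_circles, make_moons

n = 500
circles, _ = make_circles(n_samples=n, factor=0.5, noise=0.05, random_state=0)
moons, _ = make_moons(n_samples=n, noise=0.05, random_state=0)
varied, _ = make_blobs(n_samples=n, cluster_std=[1.0, 2.5, 0.5],
                       random_state=170)
blobs, _ = make_blobs(n_samples=n, random_state=170)   # regular blobs

# Anisotropic blobs: stretch the regular blobs with a linear transform.
aniso = blobs @ np.array([[0.6, -0.6], [-0.4, 0.8]])

# Homogeneous data: uniform noise, the 'null' case with no clusters.
no_structure = np.random.RandomState(170).rand(n, 2)
```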
Distance Measuring Technique – Euclidean vs Manhattan
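A sketch of the two measures on a pair of 2-D points (illustrative values): Euclidean is the straight-line distance, Manhattan the sum of absolute coordinate differences.

```python
import numpy as np

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])

euclidean = np.sqrt(((a - b) ** 2).sum())   # sqrt(9 + 16) = 5.0
manhattan = np.abs(a - b).sum()             # 3 + 4 = 7.0
```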
Distance Measuring Technique – Correlation-based
Pearson correlation
measures the degree of a linear relationship between two profiles.

Spearman correlation
computes the correlation between the rank of x and the rank of y variables.
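Correlation-based distance is commonly taken as 1 minus the correlation, so strongly correlated profiles end up close together; a sketch with SciPy on illustrative profiles:

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                 # perfectly linear in x

pearson_d = 1 - pearsonr(x, y)[0]    # ~0: perfect linear relationship
spearman_d = 1 - spearmanr(x, y)[0]  # ~0: ranks of x and y match exactly
```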
Cluster Distance Measuring Techniques
Ward’s method: In this method all possible pairs of clusters are combined and the sum of the
squared distances within each cluster is calculated. This is then summed over all clusters. The
combination that gives the lowest sum of squares is chosen.
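Ward's criterion is available through SciPy's `linkage`; a hedged sketch on four illustrative points, where each merge picks the pair of clusters giving the smallest increase in the total within-cluster sum of squares:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

Z = linkage(X, method="ward")                     # (n-1) x 4 merge history
labels = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 clusters
```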
Agglomerative Hierarchical Clustering – Dendrogram
Agglomerative Hierarchical Clustering
Ward’s Hierarchical Clustering
Hierarchical Clustering Implementation
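A minimal implementation sketch with scikit-learn (dataset and parameters are assumptions): agglomerative clustering with Ward linkage, cut at three clusters. The merge history for a dendrogram can be computed separately with `scipy.cluster.hierarchy.linkage`.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=3)

agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)   # one cluster id per point
```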
Partitional Clustering – K-Means Algorithm

K-Means++
K-Means Clustering – Steps 1–5: assign each object to its nearest centre;
after moving centers, re-assign the objects; repeat.
K-Means Clustering
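The iterate-until-stable loop above can be sketched from scratch (a hedged illustration, not a production implementation; data and initialisation are assumptions):

```python
# From-scratch K-means sketch: assign each point to its nearest centroid,
# move each centroid to the mean of its members, and repeat until the
# centroids stop moving.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, k=3)
```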
K-Means Clustering – Right # of clusters
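One common heuristic for choosing k (the elbow method, named here as an assumption) is to fit K-means for a range of k and inspect how the inertia falls; the bend where the improvement flattens suggests the number of clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=10)

inertias = {k: KMeans(n_clusters=k, n_init=10,
                      random_state=10).fit(X).inertia_
            for k in range(1, 8)}
# Inertia always decreases as k grows; look for the elbow, not the minimum.
```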
K-Means Clustering – Right # of clusters - Silhouette Coefficient
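A sketch of silhouette-based selection: the mean silhouette coefficient lies in [-1, 1] (higher means points sit well inside their own cluster), and the k that maximises it is chosen. The cluster centres and spreads below are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400,
                  centers=[[0, 0], [8, 0], [0, 8], [8, 8]],
                  cluster_std=0.8, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # mean coefficient in [-1, 1]

best_k = max(scores, key=scores.get)   # 4 for these well-separated blobs
```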
DBSCAN – Density-Based Spatial Clustering of Applications with Noise
Source: www.sthda.com
Why DBSCAN
Parameters
Terms
DBSCAN Algorithm
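A hedged sketch of DBSCAN on the two-moons shape, where K-means fails; the `eps` and `min_samples` values are assumptions tuned for this synthetic data.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
# -1 marks noise points; other labels are density-connected clusters.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
```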
Other Clustering Algorithms
Optics:
https://www.madrasresearch.org/post/optics-clustering
https://github.com/christianversloot/machine-learning-articles/blob/main/performing
-optics-clustering-with-python-and-scikit-learn.md

Mean Shift:
https://ml-explained.com/blog/mean-shift-explained
https://aitechtrend.com/simplifying-data-clustering-with-mean-shift-algorithm-in-pyt
hon/
