Clustering
Source:
https://www.kdnuggets.com/2023/05/clustering-scikitlearn-tutorial-unsupervised-learning.html
https://www.projectpro.io/article/clustering-algorithms-in-machine-learning/842
What is Clustering
> Organizing data into clusters such that there is
● high intra-cluster similarity
● low inter-cluster similarity
Source: https://www.advancinganalytics.co.uk/blog/2022/6/13/10-incredibly-useful-clustering-algorithms-you-need-to-know
Clustering Algorithms
Affinity Propagation: It takes as input measures of similarity between pairs of data
points and simultaneously considers all data points as potential exemplars.
Real-valued messages are exchanged between data points until a high-quality set of
exemplars and corresponding clusters gradually emerges.
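A minimal sketch of the idea using scikit-learn's AffinityPropagation. The toy data, the preference value and the random seed are assumptions chosen for illustration; they are not from the slides.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Toy data (assumption): three compact blobs in 2-D.
blob_centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=blob_centers, cluster_std=0.5, random_state=0)

# `preference` controls how readily a point becomes an exemplar;
# lower values tend to give fewer clusters. -50 is an illustrative choice.
ap = AffinityPropagation(preference=-50, random_state=0).fit(X)

exemplar_indices = ap.cluster_centers_indices_  # indices of the chosen exemplars
labels = ap.labels_                             # cluster label of every point

print("number of exemplars:", len(exemplar_indices))
# Exemplars are actual data points, not averaged centroids.
print(np.allclose(ap.cluster_centers_, X[exemplar_indices]))
```

Note that the number of clusters is not passed in; it emerges from the message passing and the preference value.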
Mini-Batch K-Means: A variant of k-means in which cluster centroids are updated using
small random batches of the data rather than the entire dataset. On large datasets,
mini-batch k-means can be used to reduce computing time.
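A short sketch with scikit-learn's MiniBatchKMeans. The dataset size, batch size and number of clusters are assumptions for illustration.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Toy data (assumption): 10,000 points around 4 centers.
X, _ = make_blobs(n_samples=10_000, centers=4, random_state=42)

# Each update step uses only `batch_size` points instead of the full dataset.
mbk = MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=3, random_state=42)
labels = mbk.fit_predict(X)

print(mbk.cluster_centers_.shape)  # one centroid per cluster
```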
Gaussian Mixture Models (GMM): A Gaussian mixture model can be seen as a generalisation
of k-means clustering. It is based on the idea that each cluster is generated by its
own Gaussian distribution. GMM makes a soft assignment of data points to clusters: each
point receives a probability of belonging to every cluster, in contrast to the hard
assignment of k-means, where each point belongs to exactly one cluster. Soft assignment
is more informative when clusters overlap, but it is not automatically better.
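The difference between soft and hard assignment can be sketched with scikit-learn. The data and the number of components are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy data (assumption): three overlapping blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)

# Soft assignment: one membership probability per point and component.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
soft = gmm.predict_proba(X)          # shape (300, 3), each row sums to 1

# Hard assignment: exactly one cluster label per point.
hard = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(np.round(soft[:3], 3))
print(hard[:3])
```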
● Circles - two circles, one circumscribed by the other.
● Moons - two interleaving half circles.
● Varied variance blobs - blobs that have different variances.
● Anisotropically distributed blobs - unequal widths and lengths.
● Homogeneous data - a ‘null’ situation for clustering.
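Datasets with these shapes can be generated with scikit-learn. This is a sketch only: the sample size, noise levels, standard deviations and the stretching matrix are assumptions, not values taken from the slides.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_circles, make_moons

n = 500
rng = np.random.RandomState(0)

# Circles: a small circle inside a larger one.
circles, _ = make_circles(n_samples=n, factor=0.5, noise=0.05, random_state=0)

# Moons: two interleaving half circles.
moons, _ = make_moons(n_samples=n, noise=0.05, random_state=0)

# Varied variance blobs: one standard deviation per blob.
varied, _ = make_blobs(n_samples=n, cluster_std=[1.0, 2.5, 0.5], random_state=0)

# Anisotropic blobs: ordinary blobs stretched by a linear transformation.
blobs, _ = make_blobs(n_samples=n, random_state=0)
aniso = blobs @ np.array([[0.6, -0.6], [-0.4, 0.8]])

# Homogeneous data: uniform noise with no cluster structure.
no_structure = rng.rand(n, 2)
```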
Distance Measuring Technique – Euclidean vs Manhattan
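As a reminder of the two definitions: Euclidean distance is the straight-line distance, sqrt(sum((a_i - b_i)^2)), and Manhattan distance is the sum of absolute coordinate differences, sum(|a_i - b_i|). A small sketch with SciPy; the two points are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cityblock, euclidean

# Two illustrative points (assumption).
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

d_euclid = euclidean(a, b)     # sqrt(3^2 + 4^2) = 5.0
d_manhattan = cityblock(a, b)  # |3| + |4| = 7.0

print(d_euclid, d_manhattan)
```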
Distance Measuring Technique – Correlation-based
● Pearson correlation: measures the degree of a linear relationship between two profiles.
● Spearman correlation: computes the correlation between the rank of x and the rank of y variables.
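A sketch of both correlations with SciPy. Turning a correlation into a distance as 1 - correlation is a common convention assumed here; the slides do not state which form they use.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Two illustrative profiles (assumption): y grows monotonically but not linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

r_pearson, _ = pearsonr(x, y)    # below 1: the relationship is not linear
r_spearman, _ = spearmanr(x, y)  # 1: the ranks of x and y agree perfectly

# Assumed convention: correlation-based distance = 1 - correlation.
d_pearson = 1 - r_pearson
d_spearman = 1 - r_spearman

print(round(r_pearson, 3), round(r_spearman, 3))
```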
Cluster Distance Measuring Techniques
Ward’s method: At each step, every possible pair of clusters is tentatively merged and the sum of
the squared distances within each cluster (to the cluster mean) is calculated. This is then summed
over all clusters. The merge that gives the lowest total sum of squares is chosen.
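One merge step of this procedure can be sketched directly in NumPy. The helper names `within_ss` and `ward_step` and the five data points are hypothetical, introduced only for illustration; library implementations use a faster update formula rather than this brute-force search.

```python
from itertools import combinations

import numpy as np

def within_ss(points):
    """Sum of squared distances from each point to the cluster mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_step(X, clusters):
    """Try every pair of clusters and keep the merge with the lowest total sum of squares."""
    best = None
    for i, j in combinations(range(len(clusters)), 2):
        merged = clusters[i] + clusters[j]
        others = [c for k, c in enumerate(clusters) if k not in (i, j)]
        total = within_ss(X[merged]) + sum(within_ss(X[c]) for c in others)
        if best is None or total < best[0]:
            best = (total, others + [merged])
    return best

# Illustrative points (assumption); every point starts as its own cluster.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.5], [10.0, 0.0]])
clusters = [[i] for i in range(len(X))]

total, clusters = ward_step(X, clusters)
print(total, clusters)
```

Here the first merge joins points 0 and 1, the closest pair, because that merge adds the least to the total sum of squares.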
Agglomerative Hierarchical Clustering – Dendrogram
Agglomerative Hierarchical Clustering
Ward’s Hierarchical Clustering
Hierarchical Clustering
Hierarchical Clustering Implementation
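The implementation shown on the original slide is not part of this text. A minimal sketch with scikit-learn and SciPy, assuming Ward linkage and a toy dataset, could look like this:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Toy data (assumption): 60 points around 3 centers.
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.6, random_state=1)

# Bottom-up clustering, cut at 3 clusters.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Full merge history: one row per merge, n - 1 rows in total.
Z = linkage(X, method="ward")

# no_plot=True returns the dendrogram layout without drawing it;
# drop the argument (with matplotlib installed) to draw the tree.
tree = dendrogram(Z, no_plot=True)

print(labels[:10])
print(Z.shape)
```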
Partitional Clustering – K-Means Algorithm
K-Means++
K-Means Clustering-1
K-Means Clustering-2
K-Means Clustering-3
After moving centers, re-assign the objects…
K-Means Clustering-4
K-Means Clustering-5
K-Means Clustering
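The two alternating k-means steps (assign each object to its nearest center, then move each center to the mean of its objects) can be sketched in NumPy. The function `kmeans`, its random initialisation from the data and the toy data are assumptions for illustration, not the code behind the slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate assignment and center update until the centers stop moving."""
    rng = np.random.default_rng(seed)
    # Initial centers: k distinct data points chosen at random (assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each object goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its objects
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy data (assumption): two well-separated groups of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

centers, labels = kmeans(X, k=2)
print(centers)
```

In practice, scikit-learn's KMeans with init="k-means++" (its default) picks better-spread starting centers than the purely random choice used in this sketch.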
K-Means Clustering – Right # of clusters
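The figures for this topic are not part of this text. One common heuristic, assumed here, is the elbow method: fit k-means for a range of k and look for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The data and the range of k are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (assumption): 400 points around 4 centers.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.7, random_state=3)

# Within-cluster sum of squares for each candidate k.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=3).fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting inertia against k and picking the bend in the curve is a visual judgment, so it is often combined with a score such as the silhouette coefficient.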
K-Means Clustering – Right # of clusters - Silhouette Coefficient
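For a single point, the silhouette coefficient is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster. Values lie between -1 and 1, and higher is better. A sketch with scikit-learn's silhouette_score, on assumed toy data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data (assumption): 400 points around 4 centers.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.7, random_state=3)

# Mean silhouette coefficient for each candidate k (defined for k >= 2 only).
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest mean silhouette:", best_k)
```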
DBSCAN
Density-Based Spatial Clustering of Applications with Noise
Source: www.sthda.com
Why DBSCAN
Parameters
Terms
DBSCAN Algorithm
DBSCAN
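A minimal DBSCAN sketch with scikit-learn. The two parameters are eps (the neighbourhood radius) and min_samples (the number of neighbours needed for a core point); the values below and the moons dataset are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Toy data (assumption): two interleaving half circles.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # the label -1 marks noise points

# Mark core points: points with at least min_samples neighbours within eps.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())

print("clusters:", n_clusters, "noise points:", n_noise, "core points:", int(core_mask.sum()))
```

Unlike k-means, the number of clusters is not specified up front, and non-convex shapes such as the moons can be recovered when eps and min_samples suit the data density.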
Other Clustering Algorithms
OPTICS:
https://www.madrasresearch.org/post/optics-clustering
https://github.com/christianversloot/machine-learning-articles/blob/main/performing-optics-clustering-with-python-and-scikit-learn.md
Mean Shift:
https://ml-explained.com/blog/mean-shift-explained
https://aitechtrend.com/simplifying-data-clustering-with-mean-shift-algorithm-in-python/