Unit 4-L2
Figure: 1
Clustering
• Clustering is an unsupervised learning method: no supervision is provided to the algorithm,
and it works on an unlabeled dataset.
• After clustering is applied, each cluster (group) is assigned a cluster ID. An ML system can
use this ID to simplify the processing of large and complex datasets.
Figure: 2
Types of Clustering Methods
• Partitioning Clustering: In this type, the dataset is divided into a set of K groups, where K
is the number of groups, chosen in advance.
• The cluster centers are placed so that each data point is closer to the centroid of its own
cluster than to the centroid of any other cluster.
• The most common example of partitioning clustering is the K-Means Clustering algorithm.
• Density-Based Clustering: The density-based clustering method connects highly dense areas
into clusters, so arbitrarily shaped clusters can form as long as the dense regions can be
connected.
• The algorithm identifies the dense regions in the dataset and joins neighboring areas of
high density into a single cluster.
• The dense areas in the data space are separated from each other by sparser areas.
• Density-based algorithms can have difficulty clustering the data points if the dataset has
varying densities and high dimensionality.
Figure: 3
• Distribution Model-Based Clustering: In the distribution model-based clustering
method, the data is divided based on the probability that each data point belongs to a
particular distribution.
Figure: 4
• The grouping is done by assuming some distribution, most commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization Clustering algorithm, which uses
Gaussian Mixture Models (GMM).
• Hierarchical Clustering: Hierarchical clustering can be used as an alternative to
partitioning clustering, as there is no need to pre-specify the number of clusters
to be created. In this technique, the dataset is divided into clusters that form a tree-like
structure, which is called a dendrogram.
Figure: 5
• Any number of clusters can be obtained by cutting the tree at the appropriate level.
• The most common example of this method is the Agglomerative Hierarchical algorithm.
• Fuzzy Clustering: Fuzzy clustering is a soft clustering method in which a data object may
belong to more than one group or cluster.
• Each data point has a set of membership coefficients, one per cluster, which express its
degree of membership in that cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also
known as the Fuzzy k-means algorithm.
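The membership coefficients described for fuzzy clustering can be sketched in Python for a single data point, assuming the cluster centers are already known. The function name `fuzzy_memberships`, the Euclidean distance, and the fuzzifier value m = 2 are illustrative assumptions, not taken from the slides; the formula is the usual Fuzzy C-means membership rule.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Return one membership coefficient per cluster center; they sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    # If the point sits exactly on one or more centers, share full membership among them.
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        share = 1.0 / sum(on_center)
        return [share if hit else 0.0 for hit in on_center]
    # Fuzzy C-means rule: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), assuming m > 1.
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical example: a point at distance 1 from the first center and 3 from the second.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

Here the point belongs to both clusters at once, with memberships of roughly 0.9 and 0.1, which is the "soft" assignment that distinguishes fuzzy clustering from hard methods.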
Clustering Algorithms
• Clustering Algorithms: Clustering algorithms can be grouped according to the models
explained above.
• The choice of clustering algorithm depends on the kind of data being used.
• For example, some algorithms need the number of clusters in the dataset to be guessed in
advance, whereas others work by finding the minimum distance between the observations of
the dataset.
• Some popular Clustering algorithms:
• K-Means algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
• Expectation-Maximization Clustering using GMM
• Agglomerative Hierarchical algorithm
• Affinity Propagation
• K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms.
• It partitions the dataset by dividing the samples into clusters of equal
variance.
• The number of clusters must be specified in advance. It is fast, requiring relatively few
computations: each iteration is linear in the number of samples, O(n), for a fixed number of
clusters and dimensions.
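The k-means procedure can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the names `kmeans` and `squared_distance`, the random choice of starting centroids, and the sample points are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=100, seed=0):
    """Alternate assignment and centroid-update steps until labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid (linear in n).
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = kmeans(points, k=2)
```

The first three points end up sharing one cluster ID and the last three the other; which group receives ID 0 depends on the random starting centroids.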
• Mean-shift algorithm: The mean-shift algorithm tries to find the dense areas in a smooth
density of data points.
• It is an example of a centroid-based model: it works by repeatedly updating each candidate
centroid to be the mean of the points within a given region.
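The centroid-update idea behind mean-shift can be sketched with a flat (uniform) kernel. The name `mean_shift`, the `bandwidth` parameter for the size of the region, and the rule for merging nearby candidates are assumptions made for this illustration.

```python
import math

def mean_shift(points, bandwidth, max_iterations=50, tolerance=1e-9):
    """Shift every candidate centroid to the mean of the points within `bandwidth` of it."""
    candidates = list(points)  # assumption: one candidate starts on every data point
    for _ in range(max_iterations):
        shifted = []
        for c in candidates:
            window = [p for p in points if math.dist(p, c) <= bandwidth]
            if not window:  # defensive guard; keep the candidate where it is
                shifted.append(c)
                continue
            shifted.append(tuple(sum(axis) / len(window) for axis in zip(*window)))
        largest_move = max(math.dist(a, b) for a, b in zip(candidates, shifted))
        candidates = shifted
        if largest_move < tolerance:
            break
    # Candidates that converged to (nearly) the same place form one cluster.
    modes, labels = [], []
    for c in candidates:
        for index, mode in enumerate(modes):
            if math.dist(c, mode) < bandwidth / 2:
                labels.append(index)
                break
        else:
            modes.append(c)
            labels.append(len(modes) - 1)
    return labels, modes

# Hypothetical data: two dense groups far apart.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, modes = mean_shift(points, bandwidth=1.0)
```

Note that the number of clusters is not passed in; it falls out of how many distinct dense areas the candidates settle on.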
• DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with
Noise.
• It is an example of a density-based model, similar to mean-shift but with some notable
advantages.
• In this algorithm, areas of high density are separated by areas of low density.
Because of this, clusters of any arbitrary shape can be found.
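A compact sketch of DBSCAN in plain Python is shown below. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point) follow common usage but are not given in the slides, and the sample data is hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster ID, or NOISE if it lies in a low-density area."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense itself
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # dense point: keep growing the cluster
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these settings the two squares become clusters 0 and 1, and the isolated point is labeled as noise (-1).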
• Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or in cases where K-means may fail.
• In GMM, it is assumed that the data points are drawn from a mixture of Gaussian distributions.
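To illustrate Expectation-Maximization with a Gaussian mixture, here is a minimal one-dimensional sketch. The function names, the starting variances of 1.0, the small variance floor, and the initial means are all assumptions chosen for the example; practical implementations handle initialization and numerical stability more carefully.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def em_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture by alternating E- and M-steps."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k        # assumption: unit starting variance
    weights = [1.0 / k] * k      # assumption: equal starting weights
    responsibilities = []
    for _ in range(iterations):
        # E-step: probability that each point came from each component.
        responsibilities = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            responsibilities.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for c in range(k):
            n_c = sum(r[c] for r in responsibilities)
            means[c] = sum(r[c] * x for r, x in zip(responsibilities, data)) / n_c
            spread = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(responsibilities, data)) / n_c
            variances[c] = max(spread, 1e-6)  # floor avoids a collapsed component
            weights[c] = n_c / len(data)
    return weights, means, variances, responsibilities

# Hypothetical data: two groups centered near 1 and 5.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, responsibilities = em_gmm_1d(data, initial_means=[0.0, 6.0])
```

Unlike k-means, each point receives a probability for every component rather than a single hard label, and each component has its own variance.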
• Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm
performs bottom-up hierarchical clustering.
• Each data point is treated as a single cluster at the outset, and the closest clusters are
then successively merged. The cluster hierarchy can be represented as a tree structure.
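The bottom-up merging can be sketched as follows. The slides do not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between the closest pair of members); the name `agglomerative` and the stopping parameter `n_clusters` are also illustrative.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster
    merges = []  # merge history: (left members, right members, distance)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (assumption): distance of the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical 1-D data, written as 1-tuples.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, merges = agglomerative(points, n_clusters=2)
```

The `merges` list records the order and distance of each merge, which is the information a dendrogram draws; stopping at `n_clusters` corresponds to cutting the tree at one level.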
• Affinity Propagation: Unlike algorithms such as k-means, it does not
require the number of clusters to be specified.
• In this algorithm, messages are exchanged between pairs of data points until
convergence.
• It has O(N²T) time complexity, where N is the number of samples and T is the number of
iterations; this is the main drawback of the algorithm.
Applications of Clustering
• In Identification of Cancer Cells: Clustering algorithms are widely used for the
identification of cancerous cells.
• They divide the cancerous and non-cancerous data into different groups.
• In Search Engines: Search engines also use clustering techniques. Search
results appear based on the objects closest to the search query.
• This is done by grouping similar data objects together, away from
dissimilar objects.
• The accuracy of a query's results depends on the quality of the clustering algorithm
used.
• Customer Segmentation: Clustering is used in market research to segment customers based
on their choices and preferences.
• In Biology: Clustering is used in biology to classify different species of plants and
animals using image recognition techniques.
• In Land Use: Clustering is used to identify areas of similar land use
in a GIS database.
• This can be very useful for determining the purpose for which a particular piece of land is
most suitable.