Clustering

Uploaded by

Priyam Ranjan

Clustering

• Clustering, or cluster analysis, is a machine learning technique that groups the data points
of an unlabelled dataset.

• It can be defined as “a way of grouping the data points into different clusters, each
consisting of similar data points. Objects that are similar to one another remain in one
group, which has few or no similarities with any other group.”

• It works by finding similar patterns in the unlabelled dataset, such as shape, size,
color, or behavior, and divides the data points according to the presence or absence of
those patterns.

• It is an unsupervised learning method: no supervision is provided to the algorithm, and
it works with an unlabelled dataset.

• After clustering is applied, each cluster or group is assigned a cluster ID. An ML system
can use this ID to simplify the processing of large and complex datasets.

• The clustering technique is commonly used for statistical data analysis.

Example: Let's understand the clustering technique with the real-world example of a shopping mall:

• When we visit any shopping mall, we can observe that items with similar uses are
grouped together.

• For example, t-shirts are grouped in one section and trousers in another. Similarly, in
the fruit section, apples, bananas, mangoes, etc. are each grouped separately so that
we can easily find things.

• The clustering technique works in the same way. Another example of clustering is
grouping documents according to their topic.

• Clustering is widely used across many tasks. Some of the most common uses of this
technique are:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.

• Apart from these general uses, it is used by Amazon in its recommendation system to
recommend products based on past product searches.

• Netflix also uses this technique to recommend movies and web series to its users based
on their watch history.
Types of Clustering Methods

• Clustering methods are broadly divided into Hard Clustering (each data point belongs to
exactly one group) and Soft Clustering (a data point can belong to more than one group).

• Various other approaches to clustering also exist. Below are the main clustering methods
used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering

• It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method.

• The most common example of partitioning clustering is the K-Means Clustering algorithm.

• In this type, the dataset is divided into a set of k groups, where k is the number of
groups and is specified in advance.

• The cluster centers are chosen so that each data point is closer to the centroid of its
own cluster than to the centroid of any other cluster.
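
The centroid-based idea can be sketched in a few lines of plain Python. This is a minimal illustration of K-Means (Lloyd's iteration), not a production implementation. The function name kmeans, the random choice of initial centroids, and the toy data are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points as centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # the cluster ID assigned to each point
print(centroids)  # one centroid per cluster
```

Which numeric ID each group receives depends on the random initial centroids, so only the grouping itself is meaningful.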
Density-Based Clustering

• The density-based clustering method connects highly dense areas into clusters, so
arbitrarily shaped clusters can form as long as the dense regions can be connected.

• These algorithms work by identifying regions of high density in the dataset and
connecting them into clusters.

• The dense areas in the data space are separated from each other by sparser areas.

• These algorithms can have difficulty clustering the data points if the dataset has
varying densities or high dimensionality.
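
One way to make the idea of connecting dense areas concrete is a small DBSCAN-style sketch. DBSCAN is not named in the slides; it is used here as a well-known example of a density-based method. The function name dbscan and the parameters eps (neighbourhood radius) and min_pts (minimum number of neighbours for a dense point) are assumptions for this illustration.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Return one cluster ID per point; NOISE (-1) marks points in sparse areas."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # dense point: keep growing the cluster
        cluster_id += 1
    return labels


# Two dense squares and one isolated point in the sparse area between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike the centroid-based approach, the number of clusters is not given in advance; it follows from the density parameters.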
Distribution Model-Based Clustering

• In the distribution model-based clustering method, the data is divided based on the
probability that a data point belongs to a particular distribution.

• The grouping is done by assuming some distribution, most commonly the Gaussian
distribution.

• An example of this type is the Expectation-Maximization clustering algorithm, which
uses Gaussian Mixture Models (GMM).
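
The Expectation-Maximization idea can be sketched for the simplest case: one-dimensional data and a mixture of Gaussians. This is a minimal illustration under stated assumptions, not a full GMM implementation. The function names, the caller-supplied initial means, the starting variance of 1.0, and the small variance floor are all choices made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, initial_means, iterations=50):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances, resp)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k    # assumption: unit variance to start
    weights = [1.0 / k] * k  # assumption: equal mixing weights to start
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, resp


# Toy data drawn around two values, roughly 1 and 5.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
weights, means, variances, resp = fit_gmm_1d(data, initial_means=[0.0, 6.0])
print(means)    # estimated component means
print(resp[0])  # membership probabilities of the first data point
```

Each row of resp is a soft assignment. Taking the component with the highest probability in a row turns it into a hard cluster ID.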
Hierarchical Clustering

• Hierarchical clustering can be used as an alternative to partitioning clustering, as
there is no need to pre-specify the number of clusters to be created.

• In this technique, the dataset is organised into nested clusters that form a tree-like
structure, which is called a dendrogram.

• Any desired number of clusters can be obtained by cutting the tree at the appropriate
level.

• The most common example of this method is the Agglomerative Hierarchical algorithm.
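
A bottom-up (agglomerative) build of the tree can be sketched as follows. This is a minimal illustration that assumes single linkage, where the distance between two clusters is the distance between their closest members. The slides do not specify a linkage rule. The function name agglomerative and the merges list used to record the levels of the dendrogram are hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges): clusters holds lists of point indices, and
    merges records each merge as (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []

    def linkage(a, b):
        # Single linkage (assumption): distance between the closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(points, 3)[0])  # [[0, 1], [2, 3], [4]]
print(agglomerative(points, 2)[0])  # [[0, 1, 2, 3], [4]]
```

Stopping at a different n_clusters corresponds to cutting the dendrogram at a different level. The same sequence of merges underlies both results.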
Fuzzy Clustering

• Fuzzy clustering is a type of soft clustering method in which a data object may belong to
more than one group or cluster.

• Each data point has a set of membership coefficients, which express its degree of
membership in each cluster.

• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also
known as the Fuzzy k-means algorithm.
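
The membership coefficients can be illustrated with a short sketch of Fuzzy C-means. This is a minimal version under stated assumptions, not a reference implementation. The function name fuzzy_c_means, the fuzzifier m = 2, the random initial memberships, and the toy data are choices made for this example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, seed=0):
    """Return (centers, u), where u[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    # Assumption: random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    dim = len(points[0])
    centers = []
    for _ in range(iterations):
        # Centers: mean of all points, weighted by membership raised to the power m.
        centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                for d in range(dim)
            ))
        # Memberships: the closer a center, the larger the coefficient.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            u[i] = [
                1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1)) for k in range(c))
                for j in range(c)
            ]
    return centers, u


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, u = fuzzy_c_means(points, c=2)
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])  # the coefficients in each row sum to 1
```

Every point gets a coefficient for every cluster. A hard assignment, if one is needed, is the cluster with the largest coefficient.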
