CS 476 Introduction to Machine Learning – Module 6
MODULE 6 – SYLLABUS
Unsupervised Learning – Clustering Methods – K-means, Expectation-Maximization
Algorithm, Hierarchical Clustering Methods, Density-based Clustering
➢ Explain Clustering with an example/application. Why is it said to be Unsupervised
Learning? (can also refer to Module 1)
➢ Explain K-Means procedure/algorithm with example
➢ When do we say the K-means algorithm has converged, or when do we stop
cluster reorganisation in K-means?
➢ Explain the Reconstruction error to be minimized in Clustering
➢ How can we choose the initial clusters in K-means? How do we determine the optimal
number of clusters to choose in clustering?
➢ What are the drawbacks of K-means?
CLUSTERING
Clustering or cluster analysis is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more similar (in some sense) to each other
than to those in other groups (clusters).
Example for Clustering – Color Quantization - Let us say we have an image that is stored
with 24 bits/pixel and can have up to 16 million colors. Assume we have a color screen with
8 bits/pixel that can display only 256 colors. We want to find the best 256 colors among all
16 million colors such that the image using only the 256 colors in the palette looks as close as
possible to the original image. This is color quantization, where we map colors from a higher
to a lower resolution.
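As a rough illustration, the sketch below (assuming scikit-learn and NumPy are available, and that img is an RGB image array of shape height × width × 3) finds a 256-colour palette with k-means and maps every pixel to its nearest palette colour. The function name quantize_colors and its parameters are made up for this example.

```python
# Hypothetical sketch: colour quantization with k-means
# (assumes scikit-learn, NumPy, and an RGB image array `img` of shape (H, W, 3)).
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(img, n_colors=256):
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)        # one row per pixel (R, G, B)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    palette = km.cluster_centers_                     # the n_colors "best" colours
    quantized = palette[km.labels_]                   # map each pixel to its nearest palette colour
    return quantized.reshape(h, w, 3).astype(img.dtype)
```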
Other Examples – Digit Classification, Categorizing News articles, Categorizing users in
Social Media
k-means Clustering
The k-means clustering algorithm is one of the simplest unsupervised learning
algorithms for solving the clustering problem.
Let it be required to classify a given data set into a certain number of clusters, say, k
clusters. We start by choosing k points arbitrarily as the “centres” of the clusters, one
for each cluster. We then associate each of the given data points with the nearest
centre. We now take the averages of the data points associated with a centre and
replace the centre with the average, and this is done for each of the centres. We repeat
the process until the centres converge to some fixed points. The data points nearest to
the centres form the various clusters in the dataset. Each cluster is represented by the
associated centre.
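A minimal from-scratch sketch of this procedure, assuming NumPy and a data matrix X with one observation per row (the function name and parameters are illustrative, not from the notes):

```python
import numpy as np

def k_means(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # choose k data points arbitrarily as the initial cluster centres
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # associate each data point with the nearest centre
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # replace each centre with the average of the points assigned to it
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # stop when the centres have (almost) stopped moving, i.e. converged
        if np.linalg.norm(new_centres - centres) < tol:
            centres = new_centres
            break
        centres = new_centres
    # reconstruction error: sum of squared distances to the assigned centres
    error = ((X - centres[labels]) ** 2).sum()
    return centres, labels, error
```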
That is, the reconstruction error minimized by k-means is E = Σᵢ₌₁ᵏ Σ_{x∈Cᵢ} ‖x − vᵢ‖²: we take the
intra-cluster error by measuring the distance of each data point from its cluster centre
(inner summation), and we add this error over all the clusters (outer summation, where k is the
number of clusters); we aim to minimize this sum. The vᵢ's are the cluster centres.
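A tiny worked example of this sum, with made-up numbers (two clusters, squared Euclidean distance):

```python
import numpy as np

# two clusters with centres v1 = (1, 1) and v2 = (5, 4)
clusters = {
    (1.0, 1.0): np.array([[1, 0], [1, 2], [2, 1]]),   # points assigned to v1
    (5.0, 4.0): np.array([[5, 5], [6, 4]]),           # points assigned to v2
}
error = sum(((pts - np.array(v)) ** 2).sum()          # inner sum: squared distances to the centre
            for v, pts in clusters.items())           # outer sum: over the clusters
print(error)   # 1+1+1 + 1+1 = 5
```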
Hierarchical Clustering
➢ Explain types of hierarchical clustering
➢ Compare Agglomerative and divisive clustering methods
➢ Explain Dendrograms with an example
➢ Explain the various methods to find the distance between groups of data points (maximum
distance: complete linkage, minimum distance: single linkage, average distance: average linkage)
➢ Explain Agglomerative clustering algorithm
➢ Explain Divisive clustering (DIANA) with example
Hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method
of cluster analysis which seeks to build a hierarchy of clusters (or groups) in a given
dataset. The hierarchical clustering produces clusters in which the clusters at each
level of the hierarchy are created by merging clusters at the next lower level. At the
lowest level, each cluster contains a single observation. At the highest level there is
only one cluster containing all of the data.
The decision regarding whether two clusters are to be merged or not is taken based on
the measure of dissimilarity between the clusters. The distance between two clusters
is usually taken as the measure of dissimilarity between the clusters.
Dendrograms
Hierarchical clustering can be represented by a rooted binary tree. The nodes of the
trees represent groups or clusters. The root node represents the entire data set. The
terminal nodes each represent one of the individual observations (singleton clusters).
Each nonterminal node has two daughter nodes.
The distance between merged clusters is monotone increasing with the level of the
merger. The height of each node above the level of the terminal nodes in the tree is
proportional to the value of the distance between its two daughters.
A dendrogram is a tree diagram used to illustrate the arrangement of the clusters
produced by hierarchical clustering. The dendrogram may be drawn with the root
node at the top and the branches growing vertically downwards.
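As an illustrative sketch (assuming SciPy and Matplotlib are installed), the following builds the hierarchy for a small made-up 2-D dataset and draws its dendrogram; the node heights correspond to the merge distances described above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5], [9, 9]])
Z = linkage(X, method="single")   # agglomerative merges using single linkage
dendrogram(Z)                     # node heights = distances between merged clusters
plt.xlabel("observation index")
plt.ylabel("distance between merged clusters")
plt.show()
```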
In the agglomerative method we start at the bottom and at each level recursively merge a
selected pair of clusters into a single cluster. This produces a grouping at the next
higher level with one less cluster. If there are N observations in the dataset, there will
be N − 1 levels in the hierarchy. The pair chosen for merging consists of the two
groups with the smallest “intergroup dissimilarity”, as in the sketch below.
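A naive from-scratch sketch of this bottom-up procedure, assuming NumPy; here linkage stands for any group-distance function (example single/complete/average linkage functions are given further below), and the function name is illustrative.

```python
import numpy as np

def agglomerative(X, linkage, n_clusters=1):
    clusters = [[i] for i in range(len(X))]           # start: one singleton cluster per point
    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest intergroup dissimilarity
        a, b = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: linkage(X[clusters[p[0]]], X[clusters[p[1]]]))
        merged = clusters[a] + clusters[b]            # merge the chosen pair
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters                                   # lists of point indices, one per cluster
```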
Divisive method
In step 3 the cluster distance is calculated using complete-linkage clustering, single-linkage
clustering, or average-linkage clustering, as sketched below.
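The three group-distance measures can be written, for instance, as the following NumPy functions (usable as the linkage argument of the agglomerative sketch given earlier):

```python
import numpy as np

def pairwise(A, B):
    # all point-to-point distances between the two groups A and B
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_linkage(A, B):     # minimum distance between the two groups
    return pairwise(A, B).min()

def complete_linkage(A, B):   # maximum distance between the two groups
    return pairwise(A, B).max()

def average_linkage(A, B):    # average of all pairwise distances
    return pairwise(A, B).mean()
```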
Complete-linkage clustering uses the “maximum formula”, that is, the following
formula to compute the distance between two clusters A and B:
d(A, B) = max { d(x, y) : x ∈ A, y ∈ B }
K-means clustering will fail to cluster based on density, because it clusters based on
distance to the nearest centroid. It would therefore obtain different clusters from a density-based
method, and it fails to capture the complex density patterns in the data sets.
The figure shows examples of cases where density-based clustering can be applied to capture
such complex patterns.
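A small illustration of this point, assuming scikit-learn is available: on the two-interleaving-half-moons dataset, k-means splits the points by centroid distance, while DBSCAN recovers the two density-connected shapes (the parameter values are plausible guesses, not tuned values from the notes).

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
# kmeans_labels cuts the moons roughly in half by distance to the two centroids;
# dbscan_labels follows each density-connected moon instead.
```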
DBSCAN ALGORITHM
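A minimal sketch of the standard DBSCAN procedure, assuming NumPy: eps is the neighbourhood radius, min_pts is the minimum number of neighbours required for a core point, and the label −1 marks noise points. The function name and variable names are illustrative.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    n = len(X)
    labels = np.full(n, -1)                     # -1 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    neighbours = lambda i: np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = list(neighbours(i))
        if len(seeds) < min_pts:                # not a core point: leave as noise for now
            continue
        labels[i] = cluster                     # start a new cluster from this core point
        while seeds:                            # expand the cluster through density-reachable points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
            if not visited[j]:
                visited[j] = True
                nb = neighbours(j)
                if len(nb) >= min_pts:          # j is also a core point: keep expanding
                    seeds.extend(nb)
        cluster += 1
    return labels
```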
In the case of Gaussian mixture problems, because of the nature of the function,
finding a maximum likelihood estimate by taking the derivatives of the log-likelihood
function with respect to all the parameters and simultaneously solving the resulting
equations is nearly impossible. So we apply the EM algorithm to solve the problem.
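A compact sketch of how EM proceeds for a one-dimensional Gaussian mixture, assuming NumPy (the E-step computes the responsibilities, the M-step re-estimates the mixing proportions, means, and variances in closed form); the function names and initialization are illustrative.

```python
import numpy as np

def gaussian(x, mu, var):
    # univariate Gaussian density
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm(x, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                    # mixing proportions
    mu = rng.choice(x, size=k, replace=False)   # initial component means
    var = np.full(k, x.var())                   # initial component variances
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        r = np.stack([pi[j] * gaussian(x, mu[j], var[j]) for j in range(k)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates of the parameters
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```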
Prepared by Abin Philip, Asst. Prof., Toc H.
References: Introduction to Machine Learning, 2nd edition, Ethem Alpaydin; Lecture Notes in Machine Learning by Dr. V. N. Krishnachandra.