
CS 476 Introduction to Machine Learning, Module 6

MODULE 6 – SYLLABUS
Unsupervised Learning - Clustering Methods - K-means, Expectation-Maximization
Algorithm, Hierarchical Clustering Methods, Density based clustering
➢ Explain Clustering with an example/application. Why is it said to be Unsupervised
Learning? (can refer to Module 1 too)
➢ Explain the K-Means procedure/algorithm with an example
➢ When do we say the K-means algorithm has converged, or when do we stop
cluster reorganisation in K-means?
➢ Explain the reconstruction error to be minimized in clustering
➢ How can we choose the initial clusters in K-means? How do we determine the optimal
number of clusters to choose in clustering?
➢ What are the drawbacks of K-means?
CLUSTERING
Clustering or cluster analysis is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more similar (in some sense) to each other
than to those in other groups (clusters).
Example for Clustering – Color Quantization - Let us say we have an image that is stored
with 24 bits/pixel and can have up to 16 million colors. Assume we have a color screen with
8 bits/pixel that can display only 256 colors. We want to find the best 256 colors among all
16 million colors such that the image using only the 256 colors in the palette looks as close as
possible to the original image. This is color quantization where we map from high to lower
resolution.
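
In practice this mapping can be computed with a clustering algorithm such as k-means (described in the next section). The following is a minimal sketch added for illustration; it assumes scikit-learn, NumPy and Pillow are installed, and 'image.png' / 'quantized.png' are placeholder file names.

# Colour quantization with k-means: reduce an image to a 256-colour palette.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("image.png").convert("RGB"), dtype=np.float64)  # placeholder file
h, w, _ = img.shape
pixels = img.reshape(-1, 3)                 # one row per pixel: (R, G, B)

kmeans = KMeans(n_clusters=256, n_init=4, random_state=0).fit(pixels)
palette = kmeans.cluster_centers_           # the 256 "best" colours found
labels = kmeans.labels_                     # palette index assigned to every pixel

quantized = palette[labels].reshape(h, w, 3).astype(np.uint8)
Image.fromarray(quantized).save("quantized.png")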
Other Examples – Digit Classification, Categorizing News articles, Categorizing users in
Social Media
k-means Clustering
The k-means clustering algorithm is one of the simplest unsupervised learning
algorithms for solving the clustering problem.
Let it be required to classify a given data set into a certain number of clusters, say, k
clusters. We start by choosing k points arbitrarily as the “centres” of the clusters, one
for each cluster. We then associate each of the given data points with the nearest
centre. We now take the averages of the data points associated with a centre and
replace the centre with the average, and this is done for each of the centres. We repeat
the process until the centres converge to some fixed points. The data points nearest to
the centres form the various clusters in the dataset. Each cluster is represented by the
associated centre.


The aim is to minimize the reconstruction error, given as

E = Σ_{i=1}^{k} Σ_{x ∈ C_i} ‖x − v_i‖²

i.e., we take the intra-cluster error by taking the distance of each data point from its cluster
centre (inner summation), and we add this error over all the clusters (outer summation, where k is
the number of clusters); we aim to minimize this sum. The v_i's are the cluster centres and C_i is
the set of data points assigned to the i-th cluster.

The steps are repeated until convergence; we stop if any of the following holds (see the sketch
below):

● the reconstruction error is within a (small) threshold
● no data points are reassigned to a different cluster
● the cluster centres do not change
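
The whole procedure fits in a few lines of code. The following is a minimal NumPy sketch added for illustration (not the notation of the notes): it picks k data points as the initial centres, alternates assignment and centre updates, and stops when the centres stop moving.

import numpy as np

def k_means(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal k-means sketch: X is an (n, d) array of points, k the number of clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]     # arbitrary initial centres
    for _ in range(max_iter):
        # assign every point to its nearest centre (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # replace each centre by the average of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.linalg.norm(new_centers - centers) < tol:         # centres no longer change
            break
        centers = new_centers
    error = ((X - centers[labels]) ** 2).sum()                  # reconstruction error
    return centers, labels, error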

Some methods to choose the initial cluster centres
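
Two common choices, sketched below for illustration (these are standard methods and not necessarily the exact ones listed in the original notes): picking k distinct data points at random, and a k-means++ style rule that chooses each new centre with probability proportional to its squared distance from the centres already chosen, so the initial centres are spread out.

import numpy as np

def init_random(X, k, rng):
    """Pick k distinct data points at random as the initial centres."""
    return X[rng.choice(len(X), size=k, replace=False)]

def init_kmeans_pp(X, k, rng):
    """k-means++ style initialization: spread the initial centres out."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                 # far-away points are more likely to be picked
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

# usage: rng = np.random.default_rng(0); centers = init_kmeans_pp(X, 3, rng)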


Disadvantages/Drawbacks of K-means Clustering

Even though the k-means algorithm is fast, robust and easy to understand, it has
several disadvantages:

● The learning algorithm requires a priori specification of the number of cluster centres.
● The final cluster centres depend on the initial choice of the vi's.
● With different representations of the data we get different results.
● Euclidean distance measures can unequally weight underlying factors.
● The learning algorithm finds only a local optimum of the squared error function.
● Randomly choosing the initial cluster centres may not lead to a fruitful result.
● The algorithm cannot be applied directly to categorical data.

The optimum number of clusters (k) can be identified by plotting the number of clusters against
the error: as k increases the error keeps falling, but beyond a certain value of k the decrease
becomes marginal. That value of k, where the curve bends, is taken as the optimal number of
clusters – this is the Elbow Method.
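
A minimal sketch of the elbow method, added for illustration (assumes scikit-learn and matplotlib; X is whatever array of data points is being clustered): run k-means for several values of k, record the total within-cluster squared error (inertia_), and look for the value of k after which the error stops dropping sharply.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_values=range(1, 11)):
    errors = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        errors.append(km.inertia_)          # sum of squared distances to the nearest centre
    plt.plot(list(k_values), errors, marker="o")
    plt.xlabel("number of clusters k")
    plt.ylabel("reconstruction error (inertia)")
    plt.title("Elbow method")
    plt.show()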


In the problem, the required number of clusters is 2, so we take k = 2. We choose two points
arbitrarily as the initial cluster centres. We then compute the distances of the given data
points from the cluster centres.


Calculating distances of the data points w.r.t. the new centres


Hierarchical Clustering
➢ Explain types of hierarchical clustering
➢ Compare Agglomerative and divisive clustering methods
➢ Explain Dendrograms with an example
➢ Explain the various methods to find the distance between groups of data points (maximum
distance - complete linkage, minimum distance - single linkage, average distance)
➢ Explain Agglomerative clustering algorithm
➢ Explain Divisive clustering (DIANA) with example
Hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method
of cluster analysis which seeks to build a hierarchy of clusters (or groups) in a given
dataset. The hierarchical clustering produces clusters in which the clusters at each
level of the hierarchy are created by merging clusters at the next lower level. At the
lowest level, each cluster contains a single observation. At the highest level there is
only one cluster containing all of the data.

The decision regarding whether two clusters are to be merged or not is taken based on
the measure of dissimilarity between the clusters. The distance between two clusters
is usually taken as the measure of dissimilarity between the clusters.

Dendrograms
Hierarchical clustering can be represented by a rooted binary tree. The nodes of the
tree represent groups or clusters. The root node represents the entire data set. The
terminal nodes each represent one of the individual observations (singleton clusters).
Each nonterminal node has two daughter nodes.

The distance between merged clusters is monotone increasing with the level of the
merger. The height of each node above the level of the terminal nodes in the tree is
proportional to the value of the distance between its two daughters.

A dendrogram is a tree diagram used to illustrate the arrangement of the clusters produced by
hierarchical clustering. The dendrogram may be drawn with the root node at the top and the
branches growing vertically downwards.

Figure 13.7 is a dendrogram of the dataset {a, b, c, d, e}. Note that the root node represents
the entire dataset and the terminal nodes represent the individual observations.
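
Such a dendrogram can be produced directly with SciPy. A minimal sketch added for illustration (the five two-dimensional points are made-up stand-ins for the observations a, b, c, d, e; they are not the data of Figure 13.7):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# five made-up observations standing in for a, b, c, d, e
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])

Z = linkage(X, method="single")               # agglomerative clustering, single linkage
dendrogram(Z, labels=["a", "b", "c", "d", "e"])
plt.ylabel("distance between merged clusters")
plt.show()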

Methods for hierarchical clustering

There are two methods for the hierarchical clustering of a dataset. These are known as the

1. agglomerative method (or the bottom-up method), and
2. divisive method (or the top-down method).
Agglomerative Hierarchical Clustering

In the agglomerative method we start at the bottom and at each level recursively merge a
selected pair of clusters into a single cluster. This produces a grouping at the next
higher level with one less cluster. If there are N observations in the dataset, there will
be N − 1 levels in the hierarchy. The pair chosen for merging consists of the two
groups with the smallest “intergroup dissimilarity”.

Divisive method

The divisive method starts at the top and at each level recursively splits one of the
existing clusters at that level into two new clusters. If there are N observations in the
dataset, the divisive method will also produce N − 1 levels in the hierarchy. The
split is chosen to produce the two new groups with the largest “between-group
dissimilarity”.

Measure of distance between Two data points
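
Typical examples are the Euclidean (straight-line) distance and the Manhattan (city-block) distance. A small sketch added for illustration, assuming the data points are given as numeric vectors:

import numpy as np

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)))

# e.g. euclidean((1, 2), (4, 6)) -> 5.0, manhattan((1, 2), (4, 6)) -> 7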


➢ What does a measure of dissimilarity measure? Give examples of measures of dissimilarity.
➢ Explain different types of linkages in clustering
Measures of distance between groups of data points


Algorithm for agglomerative hierarchical clustering

In step 3 the cluster distance is calculated using complete linkage clustering, single linkage
clustering or average linkage clustering.
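
The three linkage rules differ only in how the pairwise point-to-point distances between the two clusters are combined. A minimal sketch added for illustration (dist can be any point-to-point distance; Euclidean is used here):

import numpy as np

def euclidean(x, y):
    return np.linalg.norm(np.asarray(x) - np.asarray(y))

def single_linkage(A, B, dist=euclidean):
    """Cluster distance = minimum distance over all pairs (a in A, b in B)."""
    return min(dist(a, b) for a in A for b in B)

def complete_linkage(A, B, dist=euclidean):
    """Cluster distance = maximum distance over all pairs (a in A, b in B)."""
    return max(dist(a, b) for a in A for b in B)

def average_linkage(A, B, dist=euclidean):
    """Cluster distance = average distance over all pairs (a in A, b in B)."""
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))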

The complete-linkage clustering uses the “maximum formula”, that is, the following
formula to compute the distance between two clusters A and B:

d(A, B) = max { d(x, y) : x ∈ A, y ∈ B }


Dendrogram for the hierarchical clustering

Algorithm for divisive hierarchical clustering


Divisive clustering algorithms begin with the entire data set as a single cluster, and
recursively divide one of the existing clusters into two daughter clusters at each
iteration in a top-down fashion. To apply this procedure, we need a separate algorithm
to divide a given dataset into two clusters.

DIANA (DIvisive ANAlysis)


Average dissimilarity of a from the other objects = ¼ (d(a,b) + d(a,c) + d(a,d) + d(a,e)) = ¼ (9 + 3 + 6 + 11) = 7.25
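
The same computation written as a small helper, added for illustration (the pairwise dissimilarities below are the hypothetical values of this worked example): DIANA uses the average dissimilarity of an object from the other objects of its cluster to decide which object starts the splinter group.

# Hypothetical pairwise dissimilarities from the worked example (d is symmetric).
d = {("a", "b"): 9, ("a", "c"): 3, ("a", "d"): 6, ("a", "e"): 11}

def dissim(x, y):
    return 0 if x == y else d.get((x, y), d.get((y, x)))

def avg_dissimilarity(obj, cluster):
    """Average dissimilarity of obj from the other objects in its cluster."""
    others = [o for o in cluster if o != obj]
    return sum(dissim(obj, o) for o in others) / len(others)

print(avg_dissimilarity("a", ["a", "b", "c", "d", "e"]))   # (9 + 3 + 6 + 11) / 4 = 7.25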

DENSITY BASED CLUSTERING


➢ Explain Density based Clustering (DBSCAN) with example and illustrations.
(look up the terms in the procedure – core point, neighbourhood, outlier, border point,
density reachable)
➢ When is Density based clustering preferred over K means clustering
In density-based clustering, clusters are defined as areas of higher density than the
remainder of the data set. Objects in these sparse areas - that are required to separate
clusters - are usually considered to be noise and border points. The most popular
density based clustering method is DBSCAN (Density-Based Spatial Clustering of
Applications with Noise).

K-means clustering will fail to cluster based on density, as it assigns points based on distance
to the nearest centroid. It therefore obtains different clusters than a density-based method
and fails to capture complex density patterns in the data.

The figure shows examples of cases where density-based clustering can be applied to capture
such complex patterns.
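
A minimal sketch added for illustration (assumes scikit-learn is available) contrasting the two on a synthetic non-spherical “two moons” dataset: k-means splits the moons by distance to its two centroids, while DBSCAN recovers them as two density-connected clusters and labels sparse points -1 (noise).

from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=400, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)   # eps = neighbourhood radius

# DBSCAN labels are 0 and 1 for the two moons, with -1 marking noise/outlier points.
print(set(km_labels), set(db_labels))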


DBSCAN ALGORITHM


Read the following for an explanation of the Expectation-Maximization algorithm.


Examples of probability distributions

A bimodal distribution is a continuous probability distribution with two different modes. The
modes appear as distinct peaks in the graph of the probability density function.


Consider a mixture of k normal distributions. Let us define a k-dimensional random variable
z = (z1, ..., zk), where zi = 1 if the observation comes from the i-th component of the mixture
and zi = 0 otherwise.
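
For illustration (a sketch with made-up parameters, not data from the notes): sampling from a two-component mixture of normal distributions gives a bimodal dataset, and the hidden component label z drawn for each sample is exactly the latent variable described above.

import numpy as np

rng = np.random.default_rng(0)

# Made-up mixture parameters: weights, means and standard deviations of two components.
weights, means, stds = [0.4, 0.6], [-2.0, 3.0], [0.7, 1.0]

n = 1000
z = rng.choice(2, size=n, p=weights)                      # latent variable: which component
x = rng.normal(loc=np.take(means, z), scale=np.take(stds, z))

# A histogram of x shows two distinct peaks, i.e. a bimodal distribution.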


➢ Write the Expectation Maximization algorithm


Expectation-maximisation algorithm
The maximum likelihood estimation method (MLE) is a method for estimating the
parameters of a statistical model, given observations (see Section 6.5 for details). The
method attempts to find the parameter values that maximize the likelihood function, or
equivalently the log-likelihood function, given the observations.

The expectation-maximisation algorithm (sometimes abbreviated as the EM algorithm) is used to
find maximum likelihood estimates of the parameters of a
statistical model in cases where the equations cannot be solved directly. These models
generally involve latent or unobserved variables in addition to unknown parameters
and known data observations. For example, a Gaussian mixture model can be
described by assuming that each observed data point has a corresponding unobserved
data point, or latent variable, specifying the mixture component to which each data
point belongs.


In the case of Gaussian mixture problems, because of the nature of the function,
finding a maximum likelihood estimate by taking the derivatives of the log-likelihood
function with respect to all the parameters and simultaneously solving the resulting
equations is nearly impossible. So we apply the EM algorithm to solve the problem.
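
As a concrete illustration (a minimal sketch, not the derivation in the notes): for a one-dimensional two-component Gaussian mixture, each EM iteration performs an E-step, computing the posterior probability (responsibility) that each point came from each component, and an M-step, re-estimating the weights, means and variances from those responsibilities.

import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # crude initial guesses for the weights, means and variances
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        dens = np.stack([w[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibilities
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var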

As already indicated, the EM algorithm is a general procedure for estimating the parameters in a
statistical model.


Further tutorials to understand the EM algorithm:

https://www.kdnuggets.com/2016/08/tutorial-expectation-maximization-algorithm.html
https://www.cmi.ac.in/~madhavan/courses/dmml2018/literature/EM_algorithm_2coin_example.pdf

Prepared By Abin Philip, Asst Prof, Toc H.
Reference: Introduction to Machine Learning, II edition, Ethem Alpaydin; Lecture Notes in Machine
Learning by Dr V N Krishnachandra