K Means Clustering Problem Solved

The k-means clustering algorithm involves the following steps:
1. Randomly select k objects as the initial cluster centroids.
2. Assign each object to the closest centroid to form k clusters.
3. Recalculate the centroid of each cluster.
4. Repeat steps 2-3 until the centroids stop changing or the maximum number of iterations is reached.


 Assumes Euclidean space/distance
 Start by picking k, the number of clusters
 Initialize clusters by picking one point per cluster
 Example: pick one point at random, then pick k-1 other points, each as far away as possible from the previously chosen points (a sketch of this initialization is given below)

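The farthest-point initialization described in the example above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides; the helper names (euclidean, farthest_point_init) are placeholders:

```python
import math
import random

def euclidean(p, q):
    # L2 distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def farthest_point_init(points, k, seed=0):
    """Pick one point at random, then k-1 more points,
    each as far as possible from the points already chosen."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # for each candidate, look at its distance to the nearest chosen centroid
        next_pt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(next_pt)
    return centroids

# e.g. farthest_point_init(list_of_points, k) returns k well-spread starting centroids
```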
 1) For each point, place it in the cluster whose current centroid is nearest
 2) After all points are assigned, update the locations of the centroids of the k clusters
 3) Reassign all points to their closest centroid
 Sometimes this moves points between clusters
 Repeat steps 2 and 3 until convergence
 Convergence: points no longer move between clusters and the centroids stabilize
Algorithm 16.1: k-Means clustering
Input: D is a dataset containing n objects, k is the number of clusters
Output: A set of k clusters
Steps:
1. Randomly choose k objects from D as the initial cluster centroids.
2. For each of the objects in D do
• Compute the distance between the current object and the k cluster centroids
• Assign the current object to the cluster whose centroid is closest.
3. Compute the “cluster centers” of each cluster. These become the new cluster centroids.
4. Repeat steps 2-3 until the convergence criterion is satisfied
5. Stop

Note:
1) Objects are defined in terms of a set of attributes, where each attribute is of a continuous data type.
2) Distance computation: any distance measure, such as Euclidean distance, or cosine similarity.
3) Minimum distance is the measure of closeness between an object and a centroid.
4) Mean calculation: the new centroid is the mean of each attribute value over all objects in the cluster.
5) Convergence criteria: any one of the following can serve as the termination condition of the algorithm.
• The maximum permissible number of iterations is reached.
• No change of centroid values in any cluster.
• Zero (or no significant) movement of objects from one cluster to another.
• Cluster quality reaches a certain acceptable level.

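A minimal Python sketch of Algorithm 16.1, under the assumptions listed in the note above (continuous attributes, Euclidean distance, mean-based centroids, and "no change of centroid values" as the convergence criterion). The helper names are illustrative, not from any particular library:

```python
import math
import random

def euclidean(p, q):
    # L2 distance between two objects represented as tuples of attribute values
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(cluster):
    # attribute-wise mean of all objects in the cluster
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def k_means(D, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly choose k objects from D as the initial centroids
    centroids = rng.sample(D, k)
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for obj in D:
            nearest = min(range(k), key=lambda i: euclidean(obj, centroids[i]))
            clusters[nearest].append(obj)
        # Step 3: recompute the centroid of each cluster (empty clusters keep the old one)
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 4: stop when the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids
```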
[Figure: example clusters after round 1 (x … data point, the remaining marker … centroid)]

[Figure: clusters after round 2]

[Figure: clusters at the end]
Fig 16.1: Plotting data of Table 16.1 (A1 on the horizontal axis, A2 on the vertical axis)

Table 16.1: 16 objects with two attributes A1 and A2

A1     A2
6.8    12.6
0.8    9.8
1.2    11.6
2.8    9.6
3.8    9.9
4.4    6.5
4.8    1.1
6.0    19.9
6.2    18.5
7.6    17.4
7.8    12.2
6.6    7.7
8.2    4.5
8.4    6.9
9.0    3.4
9.6    11.1
• Suppose k = 3. Three objects are chosen at random, shown circled in Fig 16.1. These three centroids are listed below.

Initial centroids chosen randomly

Centroid   A1    A2
c1         3.8   9.9
c2         7.8   12.2
c3         6.2   18.5

• Let us use the Euclidean distance measure (L2 norm) as the distance measurement in our illustration.
• Let d1, d2 and d3 denote the distance from an object to c1, c2 and c3 respectively. The distance calculations are shown in Table 16.2.
• The assignment of each object to its nearest centroid is shown in the right-most column, and the clustering so obtained is shown in Fig 16.2.

Table 16.2: Distance calculation
Fig 16.2: Initial clusters with respect to Table 16.2

A1    A2     d1     d2     d3     cluster
6.8   12.6   4.0    1.1    5.9    2
0.8   9.8    3.0    7.4    10.2   1
1.2   11.6   3.1    6.6    8.5    1
2.8   9.6    1.0    5.6    9.5    1
3.8   9.9    0.0    4.6    8.9    1
4.4   6.5    3.5    6.6    12.1   1
4.8   1.1    8.9    11.5   17.5   1
6.0   19.9   10.2   7.9    1.4    3
6.2   18.5   8.9    6.5    0.0    3
7.6   17.4   8.4    5.2    1.8    3
7.8   12.2   4.6    0.0    6.5    2
6.6   7.7    3.6    4.7    10.8   1
8.2   4.5    7.0    7.7    14.1   1
8.4   6.9    5.5    5.3    11.8   2
9.0   3.4    8.3    8.9    15.4   1
9.6   11.1   5.9    2.1    8.1    2
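The distances and assignments of Table 16.2 can be reproduced with a short script. This is only a verification sketch, assuming the 16 objects of Table 16.1 as (A1, A2) tuples and the three initial centroids chosen above:

```python
import math

data = [(6.8, 12.6), (0.8, 9.8), (1.2, 11.6), (2.8, 9.6), (3.8, 9.9),
        (4.4, 6.5), (4.8, 1.1), (6.0, 19.9), (6.2, 18.5), (7.6, 17.4),
        (7.8, 12.2), (6.6, 7.7), (8.2, 4.5), (8.4, 6.9), (9.0, 3.4),
        (9.6, 11.1)]
centroids = [(3.8, 9.9), (7.8, 12.2), (6.2, 18.5)]   # c1, c2, c3

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# print one row of Table 16.2 per object: d1, d2, d3 and the assigned cluster
for a1, a2 in data:
    d = [euclidean((a1, a2), c) for c in centroids]
    cluster = d.index(min(d)) + 1
    print(f"{a1:4.1f} {a2:5.1f}  " + "  ".join(f"{x:4.1f}" for x in d) + f"  {cluster}")
```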
The calculation of the new centroids of the three clusters, using the mean of the attribute values A1 and A2 of their member objects, is shown in the table below. The clusters with the new centroids are shown in Fig 16.3.

Calculation of new centroids

New Centroid   A1    A2
c1             4.6   7.1
c2             8.2   10.7
c3             6.6   18.6

Fig 16.3: Initial clusters with the new centroids

We next reassign the 16 objects to three clusters by determining which centroid is
closest to each one. This gives the revised set of clusters shown in Fig 16.4.
Note that point p moves from cluster C2 to cluster C1.

Fig 16.4: Cluster after first iteration

• The newly obtained centroids after the second iteration are given in the table below. Note that the centroid c3 remains unchanged, while c1 and c2 change a little.
• With respect to the newly obtained cluster centres, the 16 points are reassigned again. These are the same clusters as before; hence, their centroids also remain unchanged.
• Taking this as the termination criterion, the k-means algorithm stops here. Hence, the final clusters in Fig 16.5 are the same as in Fig 16.4.
Fig 16.5: Clusters after the second iteration

Cluster centres after the second iteration

Centroid   A1    A2
c1         5.0   7.1
c2         8.1   12.0
c3         6.6   18.6

How to select k?
 Try different values of k, looking at the change in the average distance to centroid as k increases
 The average falls rapidly until the right k is reached, then changes little (a code sketch of this "elbow" check follows the figure below)

[Figure: average distance to centroid plotted against k; the elbow of the curve marks the best value of k]

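The average-distance curve sketched above can be produced by running k-means for several values of k and recording the mean distance of each point to its assigned centroid. The rough sketch below is not standalone: it reuses the illustrative k_means and euclidean helpers from the sketch after Algorithm 16.1 and the data list from the Table 16.2 check.

```python
def average_distance_to_centroid(D, clusters, centroids):
    # mean distance of every point to the centroid of its own cluster
    total = sum(euclidean(p, centroids[i])
                for i, cluster in enumerate(clusters) for p in cluster)
    return total / len(D)

# Try increasing k and look for the "elbow" where the average stops falling quickly
for k in range(1, 10):
    clusters, centroids = k_means(data, k)
    print(k, round(average_distance_to_centroid(data, clusters, centroids), 2))
```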
[Figure: too few clusters, so many long distances to centroid]

[Figure: just the right number of clusters, distances rather short]

[Figure: too many clusters, little improvement in average distance]
 Group the medicines below into two groups based on two features, weight and pH

Medicine   Weight (X)   pH (Y)
A          1            1
B          2            1
C          4            3
D          5            4

 Given k = 2, take the initial centroids as A and B

Medicine   Weight (X)   pH (Y)   Assignment
A          1            1        Cluster 1
B          2            1        Cluster 2
C          4            3        ?
D          5            4        ?

 The cluster points can also be represented as A(1,1), B(2,1), C(4,3), D(5,4)
 Euclidean distance formula:
 d[(x,y),(a,b)] = √((x−a)² + (y−b)²)

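Plugging the two unassigned medicines into this formula gives the numbers used on the next slide. A quick check in Python (the point names follow the table above):

```python
import math

def d(p, q):
    # Euclidean distance: sqrt((x-a)^2 + (y-b)^2)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

A, B, C, D = (1, 1), (2, 1), (4, 3), (5, 4)

print(round(d(C, A), 2), round(d(C, B), 2))   # 3.61 2.83 -> C joins cluster 2
print(round(d(D, A), 2), round(d(D, B), 2))   # 5.0 4.24  -> D joins cluster 2
```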
 d[(4,3),(1,1)] = √(3² + 2²) = √13 = 3.61
 d[(5,4),(1,1)] = √(4² + 3²) = √25 = 5
 d[(4,3),(2,1)] = √(2² + 2²) = √8 = 2.83
 d[(5,4),(2,1)] = √(3² + 3²) = √18 = 4.24

Point    d(1,1)   d(2,1)   Cluster
A(1,1)   0        x        1
B(2,1)   x        0        2
C(4,3)   ?        ?        ?
D(5,4)   ?        ?        ?

 d[(4,3),(1,1)] = √(3² + 2²) = √13 = 3.61
 d[(5,4),(1,1)] = √(4² + 3²) = √25 = 5
 d[(4,3),(2,1)] = √(2² + 2²) = √8 = 2.83
 d[(5,4),(2,1)] = √(3² + 3²) = √18 = 4.24

Point    d(1,1)   d(2,1)   Cluster
A(1,1)   0        x        1
B(2,1)   x        0        2
C(4,3)   3.61     2.83     2
D(5,4)   5        4.24     2

 Centroid Calculation

Cluster   Points                    Centroid
1         A(1,1)                    (1,1)
2         B(2,1), C(4,3), D(5,4)    ((2+4+5)/3, (1+3+4)/3) = (3.67,2.67)

 d[(2,1),(1,1)] = √(1² + 0²) = 1
 d[(4,3),(1,1)] = √(3² + 2²) = √13 = 3.61
 d[(5,4),(1,1)] = √(4² + 3²) = √25 = 5
 d[(2,1),(3.67,2.67)] = √(1.67² + 1.67²) = 2.36
 d[(4,3),(3.67,2.67)] = √(0.33² + 0.33²) = 0.47
 d[(5,4),(3.67,2.67)] = √(1.33² + 1.33²) ≈ 1.89

Point    d(1,1)   d(3.67,2.67)   Cluster
A(1,1)   0        x              1
B(2,1)   ?        ?              ?
C(4,3)   ?        ?              ?
D(5,4)   ?        ?              ?

 d[(2,1),(1,1)] = √(1² + 0²) = 1
 d[(4,3),(1,1)] = √(3² + 2²) = √13 = 3.61
 d[(5,4),(1,1)] = √(4² + 3²) = √25 = 5
 d[(2,1),(3.67,2.67)] = √(1.67² + 1.67²) = 2.36
 d[(4,3),(3.67,2.67)] = √(0.33² + 0.33²) = 0.47
 d[(5,4),(3.67,2.67)] = √(1.33² + 1.33²) ≈ 1.89

Point    d(1,1)   d(3.67,2.67)   Cluster
A(1,1)   0        x              1
B(2,1)   1        2.36           1
C(4,3)   3.61     0.47           2
D(5,4)   5        1.89           2

 Centroid Calculation

Cluster   Points              Centroid
1         A(1,1), B(2,1)      ((1+2)/2, (1+1)/2) = (1.5,1)
2         C(4,3), D(5,4)      ((4+5)/2, (3+4)/2) = (4.5,3.5)

 d[(1,1),(1.5,1)] = √(0.5² + 0²) = 0.5
 d[(2,1),(1.5,1)] = √(0.5² + 0²) = 0.5
 d[(4,3),(1.5,1)] = √(2.5² + 2²) = 3.20
 d[(5,4),(1.5,1)] = √(3.5² + 3²) = 4.61
 d[(1,1),(4.5,3.5)] = √(3.5² + 2.5²) = 4.30
 d[(2,1),(4.5,3.5)] = √(2.5² + 2.5²) = 3.54
 d[(4,3),(4.5,3.5)] = √(0.5² + 0.5²) = 0.71
 d[(5,4),(4.5,3.5)] = √(0.5² + 0.5²) = 0.71

Point    d(1.5,1)   d(4.5,3.5)   Cluster
A(1,1)   ?          ?            ?
B(2,1)   ?          ?            ?
C(4,3)   ?          ?            ?
D(5,4)   ?          ?            ?

 d[(1,1),(1.5,1)] = √(0.5² + 0²) = 0.5
 d[(2,1),(1.5,1)] = √(0.5² + 0²) = 0.5
 d[(4,3),(1.5,1)] = √(2.5² + 2²) = 3.20
 d[(5,4),(1.5,1)] = √(3.5² + 3²) = 4.61
 d[(1,1),(4.5,3.5)] = √(3.5² + 2.5²) = 4.30
 d[(2,1),(4.5,3.5)] = √(2.5² + 2.5²) = 3.54
 d[(4,3),(4.5,3.5)] = √(0.5² + 0.5²) = 0.71
 d[(5,4),(4.5,3.5)] = √(0.5² + 0.5²) = 0.71

Point    d(1.5,1)   d(4.5,3.5)   Cluster
A(1,1)   0.5        4.30         1
B(2,1)   0.5        3.54         1
C(4,3)   3.20       0.71         2
D(5,4)   4.61       0.71         2

 Centroid Calculation: no point has changed its cluster, so the centroids remain (1.5,1) and (4.5,3.5)

 As the clusters have not changed, the clustering has reached stability.

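The whole worked example can be replayed end to end with a small standalone script. This is a sketch for this specific dataset (it assumes no cluster ever becomes empty), forcing A and B as the starting centroids as in the example rather than choosing them at random:

```python
import math

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(cluster):
    # attribute-wise mean of the points in the cluster
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

# start from A and B as the two initial centroids, as in the example
centroids = [points["A"], points["B"]]
assignment = None
while True:
    new_assignment = {name: min((0, 1), key=lambda i: euclidean(p, centroids[i]))
                      for name, p in points.items()}
    if new_assignment == assignment:      # no point changed its cluster: stable
        break
    assignment = new_assignment
    centroids = [mean_point([points[n] for n, c in assignment.items() if c == i])
                 for i in (0, 1)]

print(assignment)   # expected: A and B in cluster 0, C and D in cluster 1
print(centroids)    # expected: (1.5, 1.0) and (4.5, 3.5)
```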
