K Means Clustering Problem Solved
BDA 23
1) For each point, place it in the cluster whose
current centroid it is nearest
Note:
1) Objects are defined in terms of a set of attributes, each of a continuous data type.
2) Distance computation: any distance measure can be used, such as the Euclidean distance (L2 norm) or cosine similarity.
3) Minimum distance is the measure of closeness between an object and a centroid.
4) Mean calculation: the centroid is the mean value of each attribute over all objects in the cluster.
5) Convergence criteria: any one of the following can serve as the termination condition of
the algorithm.
• The maximum permissible number of iterations is reached.
• No centroid value changes in any cluster.
• Zero (or no significant) movement of objects from one cluster to another.
• Cluster quality reaches an acceptable level.
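The first three termination conditions above can be checked mechanically after each iteration. The following is a minimal sketch, not a definitive implementation: the function name `stop_reason`, its parameters, and the defaults `max_iter=100` and `tol=1e-9` are assumptions introduced here for illustration. The cluster-quality criterion is left out because the notes do not define a quality measure.

```python
import math

def stop_reason(iteration, old_centroids, new_centroids,
                old_labels, new_labels, max_iter=100, tol=1e-9):
    """Return the name of the first termination condition met, or None.

    All names and defaults here are illustrative assumptions.
    """
    # Criterion 1: the maximum permissible number of iterations is reached.
    if iteration >= max_iter:
        return "max iterations reached"
    # Criterion 2: no centroid moved by more than tol (Euclidean distance).
    shift = max(math.dist(o, n) for o, n in zip(old_centroids, new_centroids))
    if shift <= tol:
        return "centroids unchanged"
    # Criterion 3: no object moved from one cluster to another.
    if old_labels == new_labels:
        return "no reassignment"
    return None

# Usage: centroid moved and one object changed cluster, so keep iterating.
print(stop_reason(1, [(0, 0)], [(1, 0)], [0, 1], [0, 0]))
```

A caller would invoke this once per iteration and stop as soon as the result is not `None`.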
[Figure: data points (x) and centroids. Clusters after round 1]
[Figure: data points (x) and centroids. Clusters after round 2]
[Figure: data points (x) and centroids. Clusters at the end]
Table 16.1: 16 objects with two attributes A1 and A2.
A1    A2
6.8   12.6
0.8   9.8
1.2   11.6
2.8   9.6
3.8   9.9
4.4   6.5
4.8   1.1
6.0   19.9
6.2   18.5
7.6   17.4
7.8   12.2
6.6   7.7
8.2   4.5
8.4   6.9
9.0   3.4
9.6   11.1
Fig 16.1: Plotting data of Table 16.1 (scatter plot; horizontal axis A1 from 0 to 12, vertical axis A2 from 0 to 25)
• Suppose k = 3. Three objects are chosen at random as the initial centroids (shown circled in
Fig 16.1). These three centroids are listed below.
Initial centroids chosen randomly
Centroid  A1   A2
c1        3.8  9.9
c2        7.8  12.2
c3        6.2  18.5
• Let us consider the Euclidean distance measure (L2 Norm) as the distance
measurement in our illustration.
• Let d1, d2 and d3 denote the distance from an object to c1, c2 and c3
respectively. The distance calculations are shown in Table 16.2.
• Assignment of each object to the respective centroid is shown in the right-
most column and the clustering so obtained is shown in Fig 16.2.
Fig 16.2: Initial cluster with respect to Table 16.2
Table 16.2: Distance calculation
A1 A2 d1 d2 d3 cluster
6.8 12.6 4.0 1.1 5.9 2
0.8 9.8 3.0 7.4 10.2 1
1.2 11.6 3.1 6.6 8.5 1
2.8 9.6 1.0 5.6 9.5 1
3.8 9.9 0.0 4.6 8.9 1
4.4 6.5 3.5 6.6 12.1 1
4.8 1.1 8.9 11.5 17.5 1
6.0 19.9 10.2 7.9 1.4 3
6.2 18.5 8.9 6.5 0.0 3
7.6 17.4 8.4 5.2 1.8 3
7.8 12.2 4.6 0.0 6.5 2
6.6 7.7 3.6 4.7 10.8 1
8.2 4.5 7.0 7.7 14.1 1
8.4 6.9 5.5 5.3 11.8 2
9.0 3.4 8.3 8.9 15.4 1
9.6 11.1 5.9 2.1 8.1 2
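As a rough check of Table 16.2, the sketch below recomputes the Euclidean distances d1, d2 and d3 from each object to the initial centroids and assigns each object to the nearest one. The data come from Table 16.1 and the initial centroid table; the names `points`, `centroids` and `assign` are illustrative choices, not part of the original material.

```python
import math

# The 16 objects of Table 16.1 as (A1, A2) pairs, in table order.
points = [
    (6.8, 12.6), (0.8, 9.8), (1.2, 11.6), (2.8, 9.6),
    (3.8, 9.9), (4.4, 6.5), (4.8, 1.1), (6.0, 19.9),
    (6.2, 18.5), (7.6, 17.4), (7.8, 12.2), (6.6, 7.7),
    (8.2, 4.5), (8.4, 6.9), (9.0, 3.4), (9.6, 11.1),
]
# Initial centroids c1, c2, c3.
centroids = [(3.8, 9.9), (7.8, 12.2), (6.2, 18.5)]

def assign(points, centroids):
    """For each point return (point, distances to all centroids, 1-based cluster)."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]  # Euclidean (L2) distance
        cluster = dists.index(min(dists)) + 1         # nearest centroid wins
        rows.append((p, dists, cluster))
    return rows

rows = assign(points, centroids)
clusters = [cluster for _, _, cluster in rows]

print(" A1    A2    d1    d2    d3  cluster")
for (a1, a2), (d1, d2, d3), cluster in rows:
    print(f"{a1:4.1f} {a2:5.1f} {d1:5.1f} {d2:5.1f} {d3:5.1f}  {cluster}")
```

Ties are broken in favour of the lower-numbered centroid here; the slides do not state a tie-breaking rule, so that choice is an assumption.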
CS 40003: Data 33
The new centroids of the three clusters, calculated as the mean of the attribute
values A1 and A2 of the objects in each cluster, are shown in the table below. The clusters with the new centroids
are shown in Fig 16.3.
New centroids
Centroid  A1   A2
c1 4.6 7.1
c2 8.2 10.7
c3 6.6 18.6
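The centroid update can be sketched as a per-cluster mean of each attribute. The snippet below takes the (A1, A2, cluster) rows of Table 16.2 and averages them; the names `assigned` and `update_centroids` are illustrative assumptions. The printed means should agree with the table above up to rounding.

```python
from collections import defaultdict

# (A1, A2, cluster) rows copied from Table 16.2.
assigned = [
    (6.8, 12.6, 2), (0.8, 9.8, 1), (1.2, 11.6, 1), (2.8, 9.6, 1),
    (3.8, 9.9, 1), (4.4, 6.5, 1), (4.8, 1.1, 1), (6.0, 19.9, 3),
    (6.2, 18.5, 3), (7.6, 17.4, 3), (7.8, 12.2, 2), (6.6, 7.7, 1),
    (8.2, 4.5, 1), (8.4, 6.9, 2), (9.0, 3.4, 1), (9.6, 11.1, 2),
]

def update_centroids(assigned):
    """Return {cluster: (mean A1, mean A2)} over the members of each cluster."""
    groups = defaultdict(list)
    for a1, a2, cluster in assigned:
        groups[cluster].append((a1, a2))
    return {
        cluster: (sum(p[0] for p in members) / len(members),
                  sum(p[1] for p in members) / len(members))
        for cluster, members in sorted(groups.items())
    }

new_centroids = update_centroids(assigned)
for cluster, (a1, a2) in new_centroids.items():
    print(f"c{cluster}: A1 = {a1:.2f}, A2 = {a2:.2f}")
```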
We next reassign the 16 objects to three clusters by determining which centroid is
closest to each one. This gives the revised set of clusters shown in Fig 16.4.
Note that point p moves from cluster C2 to cluster C1.
• The centroids obtained after the second iteration are given in the table below.
Note that the centroid c3 remains unchanged, whereas c2 and c1 change a little.
• With respect to the newly obtained cluster centres, the 16 points are reassigned again.
These are the same clusters as before. Hence, their centroids also remain
unchanged.
• Taking this as the termination criterion, the k-means algorithm stops here.
Hence, the final clustering in Fig 16.5 is the same as in Fig 16.4.
Fig 16.5: Cluster after Second iteration
How to select k?
Try different values of k, looking at the change in the
average distance to centroid as k increases.
The average falls rapidly until the right k is reached, then
changes little.
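[Figure: average distance to centroid (vertical axis) plotted against k (horizontal axis); the best value of k is marked where the curve stops falling rapidly]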
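The procedure for selecting k can be sketched by running k-means for several values of k and recording the average distance to the assigned centroid. Everything below is an assumption made for illustration: the toy data (three tight groups of four points), the deterministic farthest-point seeding in `farthest_point_init`, and the helper names `kmeans` and `average_distance`. The slides pick initial centroids at random; a deterministic seeding is used here only so the run is repeatable.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not from the slides):
    start at the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain k-means; stops when no point changes cluster."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

def average_distance(points, k):
    """Average Euclidean distance from each point to its assigned centroid."""
    centroids, labels = kmeans(points, k)
    return sum(math.dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)

# Hypothetical toy data: three tight groups around (0,0), (10,0) and (0,10).
corners = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in corners]

avg = {k: average_distance(points, k) for k in range(1, 5)}
for k, value in avg.items():
    print(f"k = {k}: average distance to centroid = {value:.3f}")
```

On data like this the average should drop sharply up to k = 3 and only slightly afterwards, which is the bend the plot is meant to reveal.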
[Figure: scatter of data points (x)] Too few; many long distances to centroid.
[Figure: scatter of data points (x)] Just right; distances rather short.
[Figure: scatter of data points (x)] Too many; little improvement in average distance.
Group the medicines below into two groups (k = 2)
based on two features, weight and pH.
Medicine  Weight (X)  pH (Y)  Assignment
A 1 1 Cluster 1
B 2 1 Cluster 2
C 4 3 ?
D 5 4 ?
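Iteration 1: distances to the initial centroids c1 = A(1,1) and c2 = B(2,1)
$d[(4,3),(1,1)] = \sqrt{(4-1)^2 + (3-1)^2} = \sqrt{13} \approx 3.61$
$d[(5,4),(1,1)] = \sqrt{(5-1)^2 + (4-1)^2} = \sqrt{25} = 5$
$d[(4,3),(2,1)] = \sqrt{(4-2)^2 + (3-1)^2} = \sqrt{8} \approx 2.83$
$d[(5,4),(2,1)] = \sqrt{(5-2)^2 + (4-1)^2} = \sqrt{18} \approx 4.24$
Point    d to (1,1)  d to (2,1)  Cluster
A(1,1)   0           1           1
B(2,1)   1           0           2
C(4,3)   3.61        2.83        2
D(5,4)   5           4.24        2
Cluster  Members                  Centroid
1        A(1,1)                   (1,1)
2        B(2,1), C(4,3), D(5,4)   ((2+4+5)/3, (1+3+4)/3) = (3.67,2.67)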
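Iteration 2: distances to the centroids c1 = (1,1) and c2 = (3.67,2.67)
$d[(2,1),(1,1)] = \sqrt{(2-1)^2 + (1-1)^2} = 1$
$d[(4,3),(1,1)] = \sqrt{13} \approx 3.61$
$d[(5,4),(1,1)] = \sqrt{25} = 5$
$d[(2,1),(3.67,2.67)] = \sqrt{(2-3.67)^2 + (1-2.67)^2} \approx 2.36$
$d[(4,3),(3.67,2.67)] = \sqrt{(4-3.67)^2 + (3-2.67)^2} \approx 0.47$
$d[(5,4),(3.67,2.67)] = \sqrt{(5-3.67)^2 + (4-2.67)^2} \approx 1.89$
Point    d to (1,1)  d to (3.67,2.67)  Cluster
A(1,1)   0           3.14              1
B(2,1)   1           2.36              1
C(4,3)   3.61        0.47              2
D(5,4)   5           1.89              2
B moves from cluster 2 to cluster 1. The new centroids are c1 = ((1+2)/2, (1+1)/2) = (1.5,1) and c2 = ((4+5)/2, (3+4)/2) = (4.5,3.5).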
Iteration 3: distances to the centroids c1 = (1.5,1) and c2 = (4.5,3.5)
$d[(1,1),(1.5,1)] = \sqrt{(1-1.5)^2 + (1-1)^2} = 0.5$
$d[(2,1),(1.5,1)] = \sqrt{(2-1.5)^2 + (1-1)^2} = 0.5$
$d[(4,3),(1.5,1)] = \sqrt{(4-1.5)^2 + (3-1)^2} = \sqrt{10.25} \approx 3.20$
$d[(5,4),(1.5,1)] = \sqrt{(5-1.5)^2 + (4-1)^2} = \sqrt{21.25} \approx 4.61$
$d[(1,1),(4.5,3.5)] = \sqrt{(1-4.5)^2 + (1-3.5)^2} = \sqrt{18.5} \approx 4.30$
$d[(2,1),(4.5,3.5)] = \sqrt{(2-4.5)^2 + (1-3.5)^2} = \sqrt{12.5} \approx 3.54$
$d[(4,3),(4.5,3.5)] = \sqrt{(4-4.5)^2 + (3-3.5)^2} = \sqrt{0.5} \approx 0.71$
$d[(5,4),(4.5,3.5)] = \sqrt{(5-4.5)^2 + (4-3.5)^2} = \sqrt{0.5} \approx 0.71$
Point    d to (1.5,1)  d to (4.5,3.5)  Cluster
A(1,1)   0.5           4.30            1
B(2,1)   0.5           3.54            1
C(4,3)   3.20          0.71            2
D(5,4)   4.61          0.71            2
Centroid calculation: the assignments are the same as in the previous iteration, so the centroids stay at (1.5,1) and (4.5,3.5) and the algorithm stops.
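The medicine example can be traced end to end with a short loop. This is a minimal sketch under stated assumptions: clusters are indexed from 0 internally (printed as 1 and 2), the helper names `nearest` and `mean` are illustrative, and the loop stops when no medicine changes cluster. The initial centroids are A and B, as in the assignment table of the example. Note that the mean of A(1,1) and B(2,1) is (1.5, 1).

```python
import math

# Medicines as (weight, pH) pairs.
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [points["A"], points["B"]]  # initial centroids: A and B

def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def mean(members):
    """Attribute-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))

labels = None
history = []  # one (labels, centroids) snapshot per iteration that changed something
while True:
    new_labels = {name: nearest(p, centroids) for name, p in points.items()}
    if new_labels == labels:  # no medicine changed cluster: stop
        break
    labels = new_labels
    # Recompute each centroid from its members (no cluster is empty in this example).
    centroids = [mean([points[n] for n, l in labels.items() if l == j])
                 for j in range(len(centroids))]
    history.append((dict(labels), list(centroids)))

for step, (lab, cen) in enumerate(history, start=1):
    groups = {j + 1: [n for n, l in lab.items() if l == j] for j in range(len(cen))}
    rounded = [tuple(round(v, 2) for v in c) for c in cen]
    print(f"iteration {step}: clusters {groups}, centroids {rounded}")
```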