K-Medoids and Algorithms
Finding groups of objects such that the objects in a group will be similar (or
related) to one another and different from (or unrelated to) the objects in
other groups
– Intra-cluster distances are minimized
– Inter-cluster distances are maximized
Quality: What Is Good Clustering?
Dissimilarity matrix (one mode):
$$\begin{bmatrix} 0 & & & & \\ d(2,1) & 0 & & & \\ d(3,1) & d(3,2) & 0 & & \\ \vdots & \vdots & \vdots & \ddots & \\ d(n,1) & d(n,2) & \cdots & \cdots & 0 \end{bmatrix}$$
Type of data in clustering analysis
Interval-scaled variables
Binary variables
Standardize data
– Calculate the mean absolute deviation:
$s_f = \frac{1}{n}\big(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\big)$,
where $m_f = \frac{1}{n}(x_{1f} + x_{2f} + \cdots + x_{nf})$
– Calculate the standardized measurement (z-score): $z_{if} = \dfrac{x_{if} - m_f}{s_f}$
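To make the standardization above concrete, here is a minimal Python sketch (the function name and sample values are illustrative, not from the slides):

```python
import numpy as np

def standardize_mad(x):
    """Standardize one variable f using the mean absolute deviation.

    x : 1-D array of measurements x_1f ... x_nf.
    Returns z_if = (x_if - m_f) / s_f for every object i.
    """
    m_f = x.mean()                    # m_f = (x_1f + ... + x_nf) / n
    s_f = np.abs(x - m_f).mean()      # mean absolute deviation s_f
    return (x - m_f) / s_f

# Example: standardize a small column of values
print(standardize_mad(np.array([2.0, 4.0, 4.0, 6.0, 9.0])))
```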
If q = 2, d is the Euclidean distance (q is the order of the Minkowski distance; q = 1 gives the Manhattan/L1 distance used later in the PAM example; a small code sketch follows after the properties below):
$d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$
– Properties
d(i, j) ≥ 0
d(i, i) = 0
d(i, j) = d(j, i)
d(i, j) ≤ d(i, k) + d(k, j)
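As a small illustration of the Minkowski family (names are illustrative): q = 1 gives the Manhattan (L1) distance and q = 2 the Euclidean distance defined above.

```python
import numpy as np

def minkowski(xi, xj, q=2):
    """Minkowski distance of order q between two p-dimensional objects."""
    return float(np.sum(np.abs(np.asarray(xi) - np.asarray(xj)) ** q) ** (1.0 / q))

a, b = [2, 6], [3, 4]
print(minkowski(a, b, q=1))   # Manhattan: |2-3| + |6-4| = 3
print(minkowski(a, b, q=2))   # Euclidean: sqrt(1 + 4) ~ 2.236
```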
Example
Name Gender Fever Cough Test-1 Test-2 Test-3 Test-4
Jack M Y N P N N N
Mary F Y N P N P N
Jim M Y P N N N N
$d(i, j) = \dfrac{p - m}{p}$, where p is the total number of variables and m is the number of variables on which objects i and j match
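A small sketch applying the formula above to the example table; it treats every attribute, including Gender, as a plain nominal variable, which is a simplification for illustration only.

```python
def simple_matching_dissimilarity(a, b):
    """d(i, j) = (p - m) / p, with m = number of matching attribute values."""
    p = len(a)
    m = sum(1 for u, v in zip(a, b) if u == v)
    return (p - m) / p

jack = ["M", "Y", "N", "P", "N", "N", "N"]
mary = ["F", "Y", "N", "P", "N", "P", "N"]
jim  = ["M", "Y", "P", "N", "N", "N", "N"]

print(simple_matching_dissimilarity(jack, mary))  # 2/7, differ on Gender and Test-3
print(simple_matching_dissimilarity(jack, jim))   # 2/7, differ on Cough and Test-1
print(simple_matching_dissimilarity(mary, jim))   # 4/7
```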
– Map the range of each variable onto [0, 1] by replacing the i-th object in the f-th variable by
$z_{if} = \dfrac{r_{if} - 1}{M_f - 1}$
– For ratio-scaled variables, apply a logarithmic transformation: $y_{if} = \log(x_{if})$
– If f is ordinal or ratio-scaled: compute the ranks $r_{if}$, set $z_{if} = \dfrac{r_{if} - 1}{M_f - 1}$, and treat $z_{if}$ as interval-scaled
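A minimal sketch of the rank-based mapping for one ordinal variable (the function name and the example states are illustrative):

```python
def ordinal_to_interval(values, levels):
    """Map ordinal values onto [0, 1] via z_if = (r_if - 1) / (M_f - 1).

    values : observed values of the ordinal variable f
    levels : the ordered list of its M_f possible states
    """
    M_f = len(levels)
    rank = {state: r + 1 for r, state in enumerate(levels)}   # r_if in 1..M_f
    return [(rank[v] - 1) / (M_f - 1) for v in values]

print(ordinal_to_interval(["fair", "excellent", "good"],
                          levels=["fair", "good", "excellent"]))   # [0.0, 1.0, 0.5]
```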
Vector Objects
TYPES OF CLUSTERS
Types of Clusters
Well-separated clusters
Center-based clusters
Contiguous clusters
Density-based clusters
Property or Conceptual
Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster
3 well-separated clusters
Types of Clusters: Center-Based
Center-based
– A cluster is a set of objects such that an object in a cluster is closer
(more similar) to the “center” of a cluster, than to the center of any
other cluster
– The center of a cluster is often a centroid, the average of all the points
in the cluster, or a medoid, the most “representative” point of a cluster
4 center-based clusters
Types of Clusters: Contiguity-Based
Contiguous clusters
– A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster
8 contiguous clusters
Types of Clusters: Density-Based
Density-based
– A cluster is a dense region of points, separated from other regions of high density by regions of low density
6 density-based clusters
Types of Clusters: Conceptual Clusters
2 Overlapping Circles
Types of Clusters: Objective Function
Map the clustering problem to a different domain and solve a related problem in that domain
– Proximity matrix defines a weighted graph, where the nodes are the points being clustered, and the
weighted edges represent the proximities between points
– Clustering is equivalent to breaking the graph into connected components, one for each cluster
– Want to minimize the edge weight between clusters and maximize the edge weight within clusters
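A hedged sketch of this graph view: build a graph from a proximity matrix, keep only strong edges, and read clusters off as connected components. The 5x5 similarity matrix and the 0.5 threshold below are made up for illustration; the SciPy routine connected_components does the graph splitting.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Illustrative similarity (proximity) matrix: points 0-2 resemble each other, as do 3-4
similarity = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.1, 0.0, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
])

# Keep only edges whose weight exceeds a threshold, then split into components
adjacency = csr_matrix(similarity > 0.5)
n_clusters, labels = connected_components(adjacency, directed=False)
print(n_clusters, labels)   # 2 clusters, labels like [0 0 0 1 1]
```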
CLUSTERING TECHNIQUES
Types of Clustering
Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
[Figure: a traditional hierarchical clustering of points p1–p4 with its traditional dendrogram, and a non-traditional hierarchical clustering of the same points with its non-traditional dendrogram.]
In addition to hierarchical clustering, density-based clustering and partitional clustering are considered.
Partitional Clustering
Given
– A data set of n objects
– k, the number of partitions to construct
Organize the objects into k partitions (k ≤ n), where each partition represents a cluster
The clusters are formed to optimize an objective partitioning criterion
– Objects within a cluster are similar, whereas objects of different clusters are dissimilar
Complexity is O( n * K * I * d )
– n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
Evaluating K-means Clusters
The most common measure is the Sum of Squared Error (SSE): for each point, the error is its distance to the nearest cluster centroid; these errors are squared and summed:
$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist^2(m_i, x)$, where $m_i$ is the centroid of cluster $C_i$
– Given two clusterings, we can choose the one with the smallest error
One easy way to reduce SSE is to increase K, the number of clusters
– But a good clustering with smaller K can have a lower SSE than a poor clustering with higher K
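A minimal sketch of the SSE computation used to compare clusterings (array names and the tiny data set are illustrative):

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared errors: squared distances of points to their cluster centroid."""
    total = 0.0
    for k, c in enumerate(centroids):
        members = points[labels == k]
        total += np.sum((members - c) ** 2)
    return total

# Tiny illustration: two clusters of two points each
pts = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
lab = np.array([0, 0, 1, 1])
cent = np.array([pts[lab == 0].mean(axis=0), pts[lab == 1].mean(axis=0)])
print(sse(pts, lab, cent))   # 0.5 + 0.5 = 1.0
```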
Two different K-means Clusterings
[Figure: the original points and two different K-means clusterings of the same data, one optimal and one sub-optimal.]
Importance of Choosing Initial Centroids (Case i)
[Figure: centroid positions over Iterations 1–6, starting from a good choice of initial centroids.]
Importance of Choosing Initial Centroids (Case ii)
[Figure: centroid positions over Iterations 1–5 for this second choice of initial centroids.]
Problems with Selecting Initial Points
If there are K 'real' clusters, the chance of selecting one centroid from each cluster is small, and it shrinks quickly as K grows.
– If the clusters are all of size n, the probability is $\frac{K!\,n^K}{(Kn)^K} = \frac{K!}{K^K}$; for K = 10 this is only about 0.00036
– Sometimes the initial centroids will readjust themselves in the 'right' way, and sometimes they don't (see the simulation sketch below)
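A quick simulation of this claim (the cluster sizes, K, and trial count are made up for illustration): estimate the probability that K randomly chosen seed points fall one in each of the K true clusters.

```python
import random

def prob_one_seed_per_cluster(K, points_per_cluster=100, trials=200_000):
    """Estimate P(K random seeds land in K different true clusters)."""
    n = K * points_per_cluster
    hits = 0
    for _ in range(trials):
        seeds = random.sample(range(n), K)                     # K distinct points
        clusters = {s // points_per_cluster for s in seeds}    # which true clusters
        hits += (len(clusters) == K)
    return hits / trials

print(prob_one_seed_per_cluster(K=10))   # close to 10!/10**10, roughly 0.00036
```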
[Figure: a data set of 10 clusters (5 pairs); K-means is started with two initial centroids in one cluster of each pair of clusters.]
10 Clusters Example (Good Clusters)
[Figure: Iterations 1–4 of K-means on the 10-cluster data, starting with two initial centroids in one cluster of each pair of clusters.]
10 Clusters Example (Bad Clusters)
[Figure: centroid positions over Iterations 1–4, starting with some pairs of clusters having three initial centroids while others have only one.]
10 Clusters Example (Bad Clusters)
[Figure: Iterations 1–4 shown separately, starting with some pairs of clusters having three initial centroids while others have only one.]
Solutions to Initial Centroids Problem
Multiple runs
– Helps, but probability is not on your side (a sketch of this idea appears after this list)
Post-processing
Bisecting K-means
– Not as susceptible to initialization issues
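A sketch of the multiple-runs idea using scikit-learn (the data and parameter values are illustrative): KMeans repeats the whole algorithm n_init times from different random initial centroids and keeps the run with the lowest SSE (inertia_).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Illustrative data: three blobs in 2-D
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (3, 0), (0, 3)]])

# n_init=10 runs K-means from 10 random initializations and keeps the best result
km = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
print(km.inertia_)   # SSE of the best of the 10 runs
```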
Pre-processing and Post-processing
Pre-processing
– Normalize the data
– Eliminate outliers
Post-processing
– Eliminate small clusters that may represent outliers
– Merge clusters that are ‘close’ and that have relatively low SSE
Limitations of K-means
– K-means has problems when clusters have differing densities or non-globular shapes
– K-means is sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of the data
Solution: instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster.
[Figure: two scatter plots on a 0–10 grid contrasting the mean and the medoid as a cluster's reference point.]
The K-Medoids Clustering Method
PAM (Partitioning Around Medoids)
– starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
– PAM works effectively for small data sets, but does not scale well for large data sets
1. Arbitrarily choose k objects as the initial medoids
2. For each pair of selected object (i) and non-selected object (h), calculate the total swapping cost TCih; if TCih < 0, i is replaced by h
[Figure: two clusters, C1 = {x1, x2, x3} and C2 = {y1, y2, y3}.]
Case (i): Computation of Cjih
Suppose j currently belongs to the cluster represented by medoid i, and after i is swapped with h, j is closest to h. Therefore, in future, j belongs to the cluster represented by h.
Cjih = d(j, h) − d(j, i)
Cjih can be positive or negative, depending on whether j is more similar to h or to i.
The total cost (absolute error) of a clustering is $E = \sum_{i=1}^{k} \sum_{j \in C_i} d(j, i)$.
[Figure: objects i, t, h and j on a 0–10 grid illustrating this case.]
Case (ii): Computation of Cjih
Suppose j currently belongs to the cluster represented by medoid i, but after i is swapped with h, j is closest to another medoid t. Therefore, in future, j belongs to the cluster represented by t.
Cjih = d(j, t) − d(j, i)
Case (iii): Computation of Cjih
Suppose j currently belongs to the cluster represented by a medoid t other than i, and after i is swapped with h, j is still closest to t. Therefore, in future, j belongs to the cluster represented by t itself.
Cjih = 0
Case (iv): Computation of Cjih
Suppose j currently belongs to the cluster represented by a medoid t other than i, but after i is swapped with h, j is closest to h. Therefore, in future, j belongs to the cluster represented by h.
Cjih = d(j, h) − d(j, t)
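The four cases can be folded into one computation: for each non-selected object j, Cjih is the change in j's distance to its nearest medoid when i is replaced by h. A minimal sketch, assuming a user-supplied distance function dist (names are illustrative):

```python
def total_swap_cost(objects, medoids, i, h, dist):
    """TC_ih = sum over non-selected objects j of C_jih, where C_jih is the change
    in j's distance to its nearest medoid when medoid i is replaced by object h."""
    new_medoids = [m for m in medoids if m != i] + [h]
    tc = 0.0
    for j in objects:
        if j in medoids or j == h:
            continue
        d_before = min(dist(j, m) for m in medoids)        # covers cases (i)-(iv)
        d_after = min(dist(j, m) for m in new_medoids)
        tc += d_after - d_before                           # = C_jih
    return tc
```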
PAM or K-Medoids: Example
Data Objects
      A1  A2
O1     2   6
O2     3   4
O3     3   8
O4     4   7
O5     6   2
O6     6   4
O7     7   3
O8     7   4
O9     8   5
O10    7   6
[Figure: scatter plot of the ten objects.]
Goal: create two clusters
Choose randomly two medoids: O8 = (7, 4) and O2 = (3, 4)
[Figure: the ten objects partitioned into cluster 1 and cluster 2.]
Assign each object to the closest representative object.
Using the L1 metric (Manhattan distance), we form the following clusters:
Cluster1 = {O1, O2, O3, O4}
Cluster2 = {O5, O6, O7, O8, O9, O10}
Compute the absolute-error criterion for the set of medoids (O2, O8):
$E = \sum_{i=1}^{k}\sum_{p \in C_i} |p - O_i| = (|O1 - O2| + |O3 - O2| + |O4 - O2|) + (|O5 - O8| + |O6 - O8| + |O7 - O8| + |O9 - O8| + |O10 - O8|)$
The absolute-error criterion for the set of medoids (O2, O8):
E = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
PAM or K-Medoids: Example
• Choose a random object O7
• Swap O8 and O7
• Compute the absolute-error criterion for the set of medoids (O2, O7):
E = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
PAM or K-Medoids: Example
Compute the cost function:
S = Absolute error [O2, O7] − Absolute error [O2, O8] = 22 − 20 = 2
S > 0 ⇒ it is a bad idea to replace O8 by O7
PAM or K-Medoids: Example
The same decision can be reached from the swapping costs Cjih (i = O8, h = O7):
C687 = d(O7, O6) − d(O8, O6) = 2 − 1 = 1
C587 = d(O7, O5) − d(O8, O5) = 2 − 3 = −1
C187 = 0, C387 = 0, C487 = 0
C987 = d(O7, O9) − d(O8, O9) = 3 − 2 = 1
C1087 = d(O7, O10) − d(O8, O10) = 3 − 2 = 1
TCih = Σj Cjih = 1 − 1 + 1 + 1 = 2 = S
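A short Python sketch that reproduces the numbers above on the ten objects, using the Manhattan distance; the helper names are made up for illustration.

```python
# Coordinates of O1 ... O10 from the example
objects = {
    "O1": (2, 6), "O2": (3, 4), "O3": (3, 8), "O4": (4, 7), "O5": (6, 2),
    "O6": (6, 4), "O7": (7, 3), "O8": (7, 4), "O9": (8, 5), "O10": (7, 6),
}

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def absolute_error(medoids):
    """E = sum over all objects of the distance to their nearest medoid."""
    return sum(min(manhattan(p, objects[m]) for m in medoids)
               for p in objects.values())

print(absolute_error(["O2", "O8"]))   # 20
print(absolute_error(["O2", "O7"]))   # 22 -> swapping O8 for O7 raises E, so keep O8
```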
PAM or K-Medoids: Different View
Dataset = …
Initial medoids {O3, O2}: E = 20
Candidate medoid sets: {O1, O4}: E = 26; {O1, O5}: E = 17; {O4, O5}: E = 9
What Is the Problem with PAM?
PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean.
PAM works efficiently for small data sets but does not scale well for large data sets.
CLARA (Clustering Large Applications)
CLARA draws a sample of the dataset and applies PAM on the sample in order to find the medoids.
If the sample is representative of the entire dataset, then the medoids of the sample should approximate the medoids of the entire dataset.
To improve the approximation, multiple samples are drawn, PAM is applied to each sample, and the best clustering is returned as the output. The clustering accuracy is measured by the average dissimilarity of all objects in the entire dataset.
[Diagram: sample1, sample2, …, samplem → PAM → clusters; choose the best clustering.]
Weakness:
– Note that the algorithm cannot find the best solution if one of the best k medoids is not among the objects in the selected sample.
CLARA algorithm (the steps below are repeated for a fixed number of iterations):
– Draw a sample of 40 + 2k objects randomly from the entire data set, and call Algorithm PAM to find k medoids of the sample.
– For each object in the entire data set, determine which of the k medoids is the most similar to it.
– Calculate the average dissimilarity ON THE ENTIRE DATASET of the clustering obtained in the previous step. If this value is less than the current minimum, use this value as the current minimum, and retain the k medoids found in Step (1) as the best set of medoids obtained so far.
PAM finds the best k medoids among a given data set, and CLARA finds the best k medoids among the selected samples.
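A hedged sketch of the CLARA loop just described: sample 40 + 2k objects, run PAM on the sample, and score the resulting medoids on the entire data set. The pam argument stands for any PAM implementation (for instance one built on the swap-cost function sketched earlier) and is assumed rather than shown; the number of samples is illustrative.

```python
import random

def clara(data, k, pam, dist, n_samples=5, sample_size=None):
    """CLARA: run PAM on several random samples; keep the medoids with the lowest
    average dissimilarity measured over the ENTIRE data set.

    pam  : callable(sample, k, dist) -> list of k medoids
    dist : dissimilarity function between two objects
    """
    if sample_size is None:
        sample_size = 40 + 2 * k                    # sample size from the slides
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = random.sample(data, min(sample_size, len(data)))
        medoids = pam(sample, k, dist)              # cluster only the sample
        # average dissimilarity of every object to its nearest medoid
        cost = sum(min(dist(x, m) for m in medoids) for x in data) / len(data)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```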
Problems
– The best k medoids may not be selected during the sampling process; in this case, CLARA will never find the best clustering.
– If the sampling is biased, we cannot have a good clustering.
PAM or K-Medoids: Different View
Dataset = …
No. of patterns (n) = 5, No. of clusters (k) = 2
Initial medoids {O2, O3}: E = 20
Candidate medoid sets: {O1, O4}: E = 26; {O1, O5}: E = 17; {O4, O5}: E = 21
CLARA: Different View
Dataset = …, Sample = …
No. of patterns (n) = 5, No. of clusters (k) = 2
Initial medoids {O2, O3}: E = 20
Candidate medoid sets: {O3, O1}: E = 30; {O3, O4}: E = 10; {O3, O5}: E = 12; {O1, O2}: E = 19; {O4, O2}: E = 33; {O5, O2}: E = 14; {O1, O4}: E = 26; {O1, O5}: E = 17; {O4, O5}: E = 21
CLARANS (“Randomized” CLARA)
CLARANS draws a sample of neighbours dynamically: the search moves through a graph in which every node is a potential solution (a set of k medoids); if a randomly examined neighbour is better, the search moves to it, otherwise the current node is taken as a local optimum and the search restarts from a new randomly selected node.
[Diagram: the same search-space view, with E values {O1, O2}: 19; {O4, O2}: 33; {O5, O2}: 14; {O1, O4}: 26; {O1, O5}: 17; {O4, O5}: 9.]
CLARA: Different View
Dataset = …, Sample = …
No. of patterns (n) = 5, No. of clusters (k) = 2
Initial medoids {O2, O3}: E = 20
Candidate medoid sets: {O3, O1}: E = 30; {O3, O4}: E = 10; {O3, O5}: E = 12; {O1, O2}: E = 19; {O4, O2}: E = 33; {O5, O2}: E = 14; {O1, O4}: E = 26; {O1, O5}: E = 17; {O4, O5}: E = 9
CLARANS
Advantages
– Experiments show that CLARANS is more effective than both PAM and CLARA
– Handles outliers
Disadvantages
– The computational complexity of CLARANS is O(n²), where n is the number of objects
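For comparison, a rough sketch of the CLARANS search, not the authors' exact procedure: starting from a random set of k medoids, examine up to maxneighbor random single-medoid swaps, move whenever a swap lowers the cost, and restart numlocal times from new random nodes (both parameter defaults are illustrative).

```python
import random

def clarans(data, k, dist, numlocal=2, maxneighbor=50):
    """Randomized search over sets of k medoids (simplified CLARANS sketch)."""
    def cost(medoids):
        return sum(min(dist(x, m) for m in medoids) for x in data)

    best, best_cost = None, float("inf")
    for _ in range(numlocal):                        # numlocal random restarts
        current = random.sample(data, k)
        current_cost = cost(current)
        examined = 0
        while examined < maxneighbor:
            # random neighbour: replace one medoid by one non-medoid object
            out = random.choice(current)
            cand = random.choice([x for x in data if x not in current])
            neighbour = [cand if m == out else m for m in current]
            neighbour_cost = cost(neighbour)
            if neighbour_cost < current_cost:        # move and reset the counter
                current, current_cost, examined = neighbour, neighbour_cost, 0
            else:
                examined += 1
        if current_cost < best_cost:                 # keep the best local optimum
            best, best_cost = current, current_cost
    return best, best_cost
```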