K-Means Clustering Guide

K-means clustering is an algorithm to classify objects into K number of groups by minimizing the sum of squares of distances between data and the corresponding cluster centroid. The algorithm works by initializing centroids, assigning objects to the closest centroid, recomputing centroids, and repeating until convergence is reached. An example shows how k-means clustering with K=2 partitions a sample dataset into two clusters.

Uploaded by Muneeba Hussain
Copyright © All Rights Reserved

K-MEANS CLUSTERING
INTRODUCTION
What is clustering?

• Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
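Common distance measures:

• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
They include the following, written for two elements $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$:
1. The Euclidean distance (also called 2-norm distance) is given by:
   $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
2. The Manhattan distance (also called taxicab norm or 1-norm) is given by:
   $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
3. The maximum norm is given by:
   $d(x, y) = \max_{1 \le i \le n} |x_i - y_i|$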

4. The Mahalanobis distance corrects data for different scales and correlations in the variables.
5. Inner product space: the angle between two vectors can be used as a distance measure when clustering high-dimensional data.
6. The Hamming distance (sometimes called edit distance) measures the minimum number of substitutions required to change one member into another.
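The distance measures listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the sample points p and q are assumptions made for the example, and the Mahalanobis distance is left out because it needs a covariance matrix estimated from the data.

```python
import math

def euclidean(x, y):
    # 2-norm: square root of the summed squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # 1-norm (taxicab): sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def maximum_norm(x, y):
    # maximum norm: largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(x, y))

def angle(x, y):
    # angle (in radians) between two non-zero vectors, via the inner product
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def hamming(s, t):
    # number of positions at which two equal-length sequences differ
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(s, t))

# Hypothetical sample points, chosen only for illustration
p, q = (1.0, 1.0), (4.0, 5.0)
print(euclidean(p, q))                 # 5.0
print(manhattan(p, q))                 # 7.0
print(maximum_norm(p, q))              # 4.0
print(hamming("karolin", "kathrin"))   # 3
```

Note how the same pair of points is 5, 7, or 4 units apart depending on the measure, which is why the choice of measure influences the shape of the clusters.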
K-MEANS CLUSTERING
• The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
• It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.
• Simply put, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroids.
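The quantity being minimized can be written out explicitly. The notation here is an assumption introduced for illustration: $S_j$ is the set of objects in cluster $j$, $x_i$ is an object's attribute vector, and $\mu_j$ is the centroid of cluster $j$.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i
```

With this reading, each centroid is simply the mean of the objects currently assigned to its cluster.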
How does the K-Means clustering algorithm work?
• Step 1: Decide on the value of k, the number of clusters.
• Step 2: Choose an initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
1. Take the first k training samples as single-element clusters.
2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
• Step 3: Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
• Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
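The four steps above could be sketched in Python roughly as follows. This is one possible reading of the procedure, not a definitive implementation: the function names, the max_passes safety limit, and the rule that a sample is never moved out of a single-element cluster are assumptions added for the example, and the sample data is hypothetical.

```python
import math

def dist(p, q):
    # Euclidean distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    # coordinate-wise mean of a non-empty list of points
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(p, centroids):
    # index of the centroid closest to p
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))

def members(samples, labels, j):
    return [s for s, lab in zip(samples, labels) if lab == j]

def sequential_kmeans(samples, k, max_passes=100):
    # Step 2.1: the first k samples become single-element clusters
    labels = list(range(k)) + [None] * (len(samples) - k)
    centroids = [tuple(s) for s in samples[:k]]
    # Step 2.2: assign each remaining sample, updating the gaining centroid
    for i in range(k, len(samples)):
        j = nearest(samples[i], centroids)
        labels[i] = j
        centroids[j] = centroid(members(samples[: i + 1], labels[: i + 1], j))
    # Steps 3-4: repeat passes until a full pass causes no reassignment
    for _ in range(max_passes):
        changed = False
        for i, p in enumerate(samples):
            old, new = labels[i], nearest(p, centroids)
            # Assumption: keep the last member so no cluster becomes empty
            if new == old or labels.count(old) == 1:
                continue
            labels[i] = new
            changed = True
            for j in (old, new):  # update both the losing and gaining cluster
                centroids[j] = centroid(members(samples, labels, j))
        if not changed:
            break
    return labels, centroids

# Hypothetical data: two well-separated groups
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = sequential_kmeans(data, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Updating centroids immediately after each move, rather than once per pass, is what distinguishes this sequential description from the batch form of k-means that many libraries implement.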
A simple example showing the implementation of the k-means algorithm (using K=2)
Step 1:
Initialization: We randomly choose the following two centroids (k=2) for the two clusters.
In this case the 2 centroids are: m1=(1.0,1.0) and m2=(5.0,7.0).
Step 2:
• Thus, we obtain two clusters containing: {1,2,3} and {4,5,6,7}.
• Their new centroids are:
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object, as shown in the table.
• Therefore, the new clusters are: {1,2} and {3,4,5,6,7}.
• The next centroids are: m1=(1.25,1.5) and m2=(3.9,5.1).
Step 4:
The clusters obtained are: {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of 2 clusters: {1,2} and {3,4,5,6,7}.
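Steps 3 and 4 of this example can be replayed in code. The coordinates of the seven objects are not given in the text, so the values below are an assumption: they were chosen because they reproduce the stated initial centroids, clusters, and final centroids m1=(1.25,1.5) and m2=(3.9,5.1), and they should not be read as the original data.

```python
import math

# ASSUMED coordinates (not in the text); picked to match the stated results
points = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}

def dist(p, q):
    # Euclidean distance between two 2-D points
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def centroid(ids):
    # mean position of the objects whose ids are listed
    return tuple(sum(points[i][d] for i in ids) / len(ids) for d in range(2))

def assign(centroids):
    # put every object into the cluster of its nearest centroid
    clusters = [[] for _ in centroids]
    for i in sorted(points):
        d = [dist(points[i], c) for c in centroids]
        clusters[d.index(min(d))].append(i)
    return clusters

# Start from the partition reached at the end of Step 2
clusters = [[1, 2, 3], [4, 5, 6, 7]]
while True:
    centroids = [centroid(c) for c in clusters]
    new_clusters = assign(centroids)
    if new_clusters == clusters:  # Step 4: no change, so stop
        break
    clusters = new_clusters

print(clusters)  # [[1, 2], [3, 4, 5, 6, 7]]
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.25, 1.5), (3.9, 5.1)]
```

Under these assumed coordinates, object 3 moves from the first cluster to the second on the first pass, and the second pass changes nothing, which matches the halt described in Step 4.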
PLOT (with K=3): Step 1, Step 2
Real-Life Numerical Example of K-Means Clustering
We have 4 medicines as our training data points, and each medicine has 2 attributes. Each attribute represents a coordinate of the object. We have to determine which medicines belong to cluster 1 and which medicines belong to the other cluster.

Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4
Step 1:
• Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids.
• Let c1 and c2 denote the coordinates of the centroids; then c1=(1,1) and c2=(2,1).
The final grouping is:

Object        Feature 1 (X): weight index    Feature 2 (Y): pH    Group (result)
Medicine A    1                              1                    1
Medicine B    2                              1                    1
Medicine C    4                              3                    2
Medicine D    5                              4                    2
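The medicine example can be reproduced with a short script. This sketch uses the four data points and the initial centroids c1=(1,1) and c2=(2,1) from the text; the function name kmeans, the max_iter limit, and the batch update (all samples assigned first, then all centroids recomputed) are assumptions made for the illustration.

```python
import math

medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

def dist(p, q):
    # Euclidean distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    groups = None
    for _ in range(max_iter):
        # assign every point to its nearest centroid
        new_groups = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_groups == groups:  # converged: no assignment changed
            break
        groups = new_groups
        # recompute each centroid as the mean of its members
        for j in range(len(centroids)):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return groups, centroids

names = list(medicines)
points = [medicines[n] for n in names]
groups, centroids = kmeans(points, [medicines["A"], medicines["B"]])

for name, g in zip(names, groups):
    print(f"Medicine {name} -> group {g + 1}")
# Medicine A -> group 1
# Medicine B -> group 1
# Medicine C -> group 2
# Medicine D -> group 2
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

In this run, medicine B starts as its own centroid, is grouped with C and D on the first pass, and moves to group 1 on the second pass once the centroid of group 2 has shifted toward C and D.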
