
K-Means Clustering: Mini Project Report

Table of Contents
1. Introduction
2. K-Means Clustering Algorithm
o 2.1. Key Concepts
o 2.2. How K-Means Works
3. Applications of K-Means Clustering
4. Implementation of K-Means with Elbow Method
5. Conclusion
6. References
1. Introduction
K-Means Clustering is one of the simplest and most widely used unsupervised machine learning
algorithms for partitioning a dataset into a fixed number of clusters. It groups data points
into a predefined number (K) of clusters, assigning each data point to the cluster with the
nearest mean. Clustering techniques such as K-Means help organize data into meaningful
groups without requiring any pre-labeled data, making them especially useful in situations
where data labels are not available.

This report will explore the key concepts behind K-Means Clustering, its algorithm, applications
in real-world use cases, and a brief implementation overview.

2. K-Means Clustering Algorithm

2.1. Key Concepts

K-Means Clustering is an iterative algorithm that minimizes the distance between data
points and the centroid of their respective clusters. The key concepts involved are:

1. Centroids: The center of a cluster, calculated as the mean of all the points in the cluster.

2. Euclidean Distance: The most commonly used metric to measure the distance between a
data point and a centroid.

3. Cluster: A group of data points with similar properties, based on the minimum distance to
a centroid.

2.2. How K-Means Works

The K-Means algorithm works through the following steps:

1. Initialization: Choose K initial centroids randomly from the data points.

2. Assignment Step: Each data point is assigned to the nearest centroid, forming K clusters.

3. Update Step: Recalculate the centroids of the K clusters as the mean of all the points in
each cluster.

4. Repeat: Steps 2 and 3 are repeated until the centroids no longer change (i.e., until
convergence).

The algorithm aims to minimize the following objective function:

J = \sum_{i=1}^{K} \sum_{x \in C_i} \| x - \mu_i \|^2

Where:

• J is the sum of squared distances between data points (x) and their cluster's centroid
(\mu_i).

• K is the number of clusters.

• C_i is the set of data points assigned to the i-th centroid.


This process is guaranteed to converge, although the final result may depend on the initial
placement of centroids.
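The four steps above can be sketched as a short, self-contained NumPy implementation. This is an illustrative sketch only, not the report's official code; the function name `kmeans` and all variable names are our own.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick K distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # 2. Assignment: each point joins the cluster of its nearest
        #    centroid (Euclidean distance), giving K clusters
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of its assigned
        #    points (keep the old centroid if a cluster goes empty)
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(K)
        ])
        # 4. Repeat until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so labels are consistent with the returned centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return labels, centroids
```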

3. Applications of K-Means Clustering


K-Means has many applications across different fields, such as:

1. Image Segmentation: Grouping pixels with similar colors or textures to identify regions
in an image.

2. Market Segmentation: Clustering customer data to identify segments for targeted
marketing.

3. Document Clustering: Organizing large sets of documents into clusters based on content
similarity.

4. Anomaly Detection: Identifying outliers by detecting points that do not belong to any well-
defined cluster.

5. Healthcare: Classifying patient data based on similarities in symptoms or genetic profiles.

4. Implementation of K-Means with Elbow Method


The Elbow Method is used to determine the optimal number of clusters (K) by calculating the sum
of squared distances (inertia) between data points and their nearest centroid. The plot of inertia vs.
the number of clusters reveals a "bend" or "elbow," indicating the optimal K.
PYTHON

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate synthetic data with 5 well-separated clusters
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.6, random_state=0)

# Compute inertia (sum of squared distances to the nearest centroid) for K = 1..10
inertia = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

# Plot inertia against K; the "elbow" marks the optimal K
plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method for Optimal K')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Inertia')
plt.show()
Outcome:

The graph shows a sharp reduction in inertia at the optimal number of clusters, after which
further reductions become minimal, forming an "elbow."
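Once the elbow suggests a value of K (here 5, matching the synthetic data above), the final model can be fit and its cluster assignments and centroids read off. A brief sketch; the variable names `final_model`, `labels`, and `centers` are our own:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the elbow experiment
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.6, random_state=0)

# Fit K-Means with the K suggested by the elbow plot
final_model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = final_model.labels_            # cluster index for each sample
centers = final_model.cluster_centers_  # coordinates of the 5 centroids
```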

5. Conclusion
K-Means Clustering is an effective and efficient algorithm for organizing data into meaningful
clusters based on similarities. Although it is simple, K-Means has several limitations, including
sensitivity to the initial placement of centroids and difficulty handling non-spherical clusters.
Despite these challenges, K-Means continues to be a widely used algorithm for unsupervised
learning due to its speed and scalability.

Further improvements to the basic K-Means algorithm include techniques like K-Means++,
which improves the initial placement of centroids, and the use of different distance metrics
for non-Euclidean spaces.

6. References

1. "An Introduction to K-Means Clustering." INTRAINZ.
