
K-Means Clustering: Mini Project Report

Table of Contents
1. Introduction
2. K-Means Clustering Algorithm
o 2.1. Key Concepts
o 2.2. How K-Means Works
3. Applications of K-Means Clustering
4. Implementation of K-Means with Elbow Method
5. Conclusion
6. References
1. Introduction
K-Means Clustering is one of the simplest and most widely used unsupervised machine learning
algorithms for partitioning a dataset into a fixed number of clusters. It groups data points
into a predefined number (K) of clusters, assigning each data point to the cluster with the
nearest mean. Clustering techniques such as K-Means help organize data into meaningful
groups without requiring any pre-labeled data, making them especially useful in situations
where data labels are not available.

This report will explore the key concepts behind K-Means Clustering, its algorithm, applications
in real-world use cases, and a brief implementation overview.

2. K-Means Clustering Algorithm

2.1. Key Concepts

K-Means Clustering is an iterative algorithm that minimizes the distance between data
points and the centroid of their respective clusters. The key concepts involved are:

1. Centroids: The center of a cluster, calculated as the mean of all the points in the cluster.

2. Euclidean Distance: The most commonly used metric to measure the distance between a
data point and a centroid.

3. Cluster: A group of data points with similar properties, based on the minimum distance to
a centroid.

2.2. How K-Means Works

The K-Means algorithm works through the following steps:

1. Initialization: Choose K initial centroids randomly from the data points.

2. Assignment Step: Each data point is assigned to the nearest centroid, forming K clusters.

3. Update Step: Recalculate the centroids of the K clusters as the mean of all the points in
each cluster.

4. Repeat: Steps 2 and 3 are repeated until the centroids no longer change (i.e., until
convergence).

The algorithm aims to minimize the following objective function:

J = \sum_{i=1}^{K} \sum_{x \in C_i} \| x - \mu_i \|^2

Where:

• J is the sum of squared distances between data points (x) and their cluster's centroid
(\mu_i).

• K is the number of clusters.

• C_i is the set of data points assigned to the i-th centroid.


This process is guaranteed to converge, although the final result may depend on the initial
placement of centroids.
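The four steps above can be sketched as a short, self-contained NumPy implementation. This is an illustrative sketch only, not the report's official code; the function name `kmeans` and all variable names are our own.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick K distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # 2. Assignment: each point joins the cluster of its nearest
        #    centroid (Euclidean distance), giving K clusters
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of its assigned
        #    points (keep the old centroid if a cluster goes empty)
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(K)
        ])
        # 4. Repeat until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so labels are consistent with the returned centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return labels, centroids
```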

3. Applications of K-Means Clustering


K-Means has many applications across different fields, such as:

1. Image Segmentation: Grouping pixels with similar colors or textures to identify regions
in an image.

2. Market Segmentation: Clustering customer data to identify segments for targeted
marketing.

3. Document Clustering: Organizing large sets of documents into clusters based on content
similarity.

4. Anomaly Detection: Identifying outliers by detecting points that do not belong to any well-
defined cluster.

5. Healthcare: Classifying patient data based on similarities in symptoms or genetic profiles.

4. Implementation of K-Means with Elbow Method


The Elbow Method is used to determine the optimal number of clusters (K) by calculating the sum
of squared distances (inertia) between data points and their nearest centroid. The plot of inertia vs.
the number of clusters reveals a "bend" or "elbow," indicating the optimal K.
PYTHON

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate synthetic data with 5 well-separated clusters
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.6, random_state=0)

# Compute inertia (sum of squared distances to the nearest centroid) for K = 1..10
inertia = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

# Plot inertia against K; the "elbow" marks the optimal K
plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method for Optimal K')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Inertia')
plt.show()
Outcome:

The graph shows a sharp reduction in inertia at the optimal number of clusters, after which
further reductions become minimal, forming an "elbow."
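Once the elbow suggests a value of K (here 5, matching the synthetic data above), the final model can be fit and its cluster assignments and centroids read off. A brief sketch; the variable names `final_model`, `labels`, and `centers` are our own:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the elbow experiment
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.6, random_state=0)

# Fit K-Means with the K suggested by the elbow plot
final_model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = final_model.labels_            # cluster index for each sample
centers = final_model.cluster_centers_  # coordinates of the 5 centroids
```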

5. Conclusion
K-Means Clustering is an effective and efficient algorithm for organizing data into meaningful
clusters based on similarities. Although it is simple, K-Means has several limitations, including
sensitivity to the initial placement of centroids and difficulty handling non-spherical clusters.
Despite these challenges, K-Means continues to be a widely used algorithm for unsupervised
learning due to its speed and scalability.

Further improvements to the basic K-Means algorithm include techniques like K-Means++,
which improves the initial placement of centroids, and the use of different distance metrics
for non-Euclidean spaces.

6. References

1. "An Introduction to K-Means Clustering." INTRAINZ.
