Unt32pptx 2025 01 14 14 47 03

Uploaded by

iitachi1401
Copyright
© All Rights Reserved
Chandigarh School of Business, Jhanjeri

Department of Computer Application


Program Name: BCA
Course Code: UGCA1950
Course Name: Machine Learning

Prepared by: Dr. Gurwinder Singh



Outline

• PTU Syllabus of Unit-I
• CO Introduction
• Topic Overview
• Brief description of what the presentation will cover
• Importance or relevance of the topic
• Key objectives or learning outcomes
• Summary
• References



PTU Syllabus of Unit-I

Clustering: What is Clustering & its Use Cases, K-means Clustering, How does the K-means algorithm work, C-means Clustering, Hierarchical Clustering, How Hierarchical Clustering works.



CO Introduction

CO NUMBER | TOPICS | LEVEL
CO3 | Design solutions for basic problems using machine learning algorithms | PO(1,2,3,4,9) & PSO(1)



Topic Overview

K-means Clustering
K-means Clustering is a popular unsupervised machine learning algorithm used for partitioning data into distinct groups or
clusters.
It aims to group data points in such a way that points within the same cluster are more similar to each other than to those in
other clusters.
How It Works:
1. Define the Number of Clusters (K):
The user specifies the desired number of clusters (K).
2. Random Initialization:
K initial "centroids" (cluster centers) are randomly placed in the data space.
3. Assignment Step:
Each data point is assigned to the nearest centroid based on a distance metric (e.g., Euclidean distance).
4. Update Step:
The centroid of each cluster is recalculated as the mean of all points assigned to it.
5. Iterative Process:
Steps 3 and 4 are repeated until the centroids stabilize or a predefined stopping condition is met.
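
The assignment and update steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_points` and `update_centroids`, the four sample points, and the two starting centroids are all hypothetical, and an empty cluster simply keeps its old centroid.

```python
import math

def assign_points(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]  # Euclidean distance
        labels.append(distances.index(min(distances)))
    return labels

def update_centroids(points, labels, centroids):
    """Update step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centroids.append(old)  # assumption: empty cluster keeps its centroid
            continue
        new_centroids.append(
            tuple(sum(col) / len(members) for col in zip(*members))
        )
    return new_centroids

# Hypothetical data: two obvious groups, K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_points(points, centroids)                # one pass of the assignment step
centroids = update_centroids(points, labels, centroids)  # one pass of the update step
print(labels, centroids)
```

Repeating the two calls until `labels` stops changing gives the iterative process described in the last step.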



Brief description of what the presentation will cover

• What is Clustering & Its Use Cases
– Introduction to clustering and its applications.
• K-means Clustering
– Explanation of the K-means algorithm and its working process.
• C-means Clustering
– Overview of C-means clustering.
• Hierarchical Clustering
– Description of hierarchical clustering and its working mechanism.



Clustering Algorithms

• Flat algorithms
– Usually start with a random (partial) partitioning
– Refine it iteratively
• K means clustering
• (Model based clustering)
• Hierarchical algorithms
– Bottom-up, agglomerative
– (Top-down, divisive)
Sec. 16.4

K-Means

• Assumes documents are real-valued vectors.
• Clusters based on centroids (aka the center of gravity or mean) of points in a cluster, c:

    \mu(c) = \frac{1}{|c|} \sum_{x \in c} x

• Reassignment of instances to clusters is based on distance to the current cluster centroids.
– (Or one can equivalently phrase it in terms of similarities)

K-Means Algorithm

Select K random docs {s1, s2, …, sK} as seeds.
Until clustering converges (or other stopping criterion):
    For each doc di:
        Assign di to the cluster cj such that dist(di, sj) is minimal.
    (Next, update the seeds to the centroid of each cluster)
    For each cluster cj:
        sj = μ(cj)

K Means Example (K=2)
[Figure: successive panels labeled "Pick seeds", "Reassign clusters", "Compute centroids", "Reassign clusters", "Compute centroids", "Reassign clusters"]

Termination conditions

• Several possibilities, e.g.,


– A fixed number of iterations.
– Doc partition unchanged.
– Centroid positions don’t change.

Does this mean that the docs in a cluster are unchanged?
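
The three termination conditions listed above could be checked with a helper along these lines. This is a sketch only: the function name `should_stop`, its parameters, and the tolerance `tol` are hypothetical and not part of the original slides.

```python
import math

def should_stop(iteration, max_iters, old_labels, new_labels,
                old_centroids, new_centroids, tol=0.0):
    """Return the name of the first stopping criterion that fires, else None."""
    # 1. A fixed number of iterations.
    if iteration >= max_iters:
        return "fixed number of iterations"
    # 2. Doc partition unchanged.
    if old_labels is not None and old_labels == new_labels:
        return "doc partition unchanged"
    # 3. Centroid positions don't change (largest shift within tolerance).
    shift = max(math.dist(a, b) for a, b in zip(old_centroids, new_centroids))
    if shift <= tol:
        return "centroid positions unchanged"
    return None

# Hypothetical call: the partition changed and the centroid moved, so continue.
print(should_stop(1, 10, [0, 1], [1, 1], [(0.0, 0.0)], [(1.0, 1.0)]))
```

A small positive `tol` is a common practical choice, since floating-point centroids may keep moving by tiny amounts.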



Convergence

• Why should the K-means algorithm ever reach a fixed point?
– A state in which clusters don’t change.
• K-means is a special case of a general procedure known as the Expectation Maximization (EM) algorithm.
– EM is known to converge.
– Number of iterations could be large.
– But in practice it usually isn’t.

Convergence of K-Means

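• Define the goodness measure of cluster k as the sum of squared distances from its members $d_i$ to the cluster centroid $c_k$: $G_k = \sum_i (d_i - c_k)^2$.
• Recomputation monotonically decreases each $G_k$, since ($m_k$ is the number of members in cluster k):
– $\sum_i (d_i - a)^2$ reaches its minimum for:
– $\sum_i -2(d_i - a) = 0$
– $\sum_i d_i = \sum_i a$
– $m_k a = \sum_i d_i$
– $a = \frac{1}{m_k} \sum_i d_i = c_k$
• K-means typically converges quickly.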
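
The monotone behaviour can be checked numerically. The sketch below records G, the sum of the per-cluster values Gk (squared distances to the centroids), after every reassignment and every recomputation. The function names `objective` and `kmeans_trace`, the sample points, and the starting centroids are assumptions chosen for illustration.

```python
import math

def objective(points, labels, centroids):
    """G: total squared distance from each point to its cluster's centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def kmeans_trace(points, centroids, iters=10):
    """Run k-means from the given centroids, recording G after each half-step."""
    centroids = list(centroids)
    history = []
    for _ in range(iters):
        # Reassignment: each point moves to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        history.append(objective(points, labels, centroids))
        # Recomputation: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        history.append(objective(points, labels, centroids))
    return history

# Hypothetical data and deliberately poor starting centroids.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
history = kmeans_trace(points, centroids=[(0.0, 0.0), (1.0, 0.0)])

# G never increases from one half-step to the next (up to rounding).
assert all(b <= a + 1e-9 for a, b in zip(history, history[1:]))
print(history[:6])
```

The check passes because neither half-step can raise G: reassignment picks the closest centroid for every point, and recomputation picks the mean, which minimizes each cluster's sum of squares.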

Time Complexity

• Computing distance between two docs is O(M) where M is the dimensionality of the
vectors.
• Reassigning clusters: O(KN) distance computations, or O(KNM).
• Computing centroids: Each doc gets added once to some centroid: O(NM).
• Assume these two steps are each done once for I iterations: O(IKNM).
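
To get a feel for the O(IKNM) estimate, the operation count can be worked out for concrete sizes. The function name `kmeans_cost` and the numbers below are hypothetical. The count is a rough tally of coordinate-level operations, not a measured running time.

```python
def kmeans_cost(iterations, k, n, m):
    """Rough operation count for k-means, following the O(IKNM) estimate."""
    reassign = k * n * m   # K*N distance computations, each O(M)
    recompute = n * m      # each doc is added once to some centroid
    return iterations * (reassign + recompute)

# Hypothetical sizes: I = 10 iterations, K = 5, N = 1000 docs, M = 50 dimensions.
cost = kmeans_cost(iterations=10, k=5, n=1000, m=50)
print(cost)  # 10 * (250000 + 50000) = 3000000
```

The reassignment term dominates, which is why the overall bound is stated as O(IKNM).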

Seed Choice

• Results can vary based on random seed selection.
• Some seeds can result in poor convergence rate, or convergence to sub-optimal clusterings.
– Select good seeds using a heuristic (e.g., doc least similar to any existing mean)
– Try out multiple starting points
– Initialize with the results of another method.

[Figure: example showing sensitivity to seeds]
In the above, if you start with B and E as centroids you converge to {A,B,C} and {D,E,F}. If you start with D and F you converge to {A,B,D,E} {C,F}.
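
Seed sensitivity is easy to reproduce on a small data set. The sketch below uses hypothetical data, the four corners of a 4-by-1 rectangle (not the A to F example from the slide), and a hypothetical helper `kmeans_from_seeds`. It then applies the "try out multiple starting points" remedy by keeping the run with the lowest sum of squared distances.

```python
import math

def kmeans_from_seeds(points, seeds, max_iters=100):
    """Run k-means from explicit seeds; return (labels, sum of squared distances)."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # partition unchanged: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse

# Hypothetical data: corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Seeds on the same short side split the rectangle lengthwise (sub-optimal).
bad_labels, bad_sse = kmeans_from_seeds(points, seeds=[(0.0, 0.0), (0.0, 1.0)])
# Seeds on opposite short sides recover the two tight pairs.
good_labels, good_sse = kmeans_from_seeds(points, seeds=[(0.0, 0.0), (4.0, 0.0)])

# Try out multiple starting points: every pair of points as seeds, keep the best.
candidates = [[points[i], points[j]] for i in range(4) for j in range(i + 1, 4)]
best_labels, best_sse = min(
    (kmeans_from_seeds(points, s) for s in candidates),
    key=lambda result: result[1],
)
print(bad_sse, good_sse, best_sse)
```

Both runs reach a fixed point, but the first one stops at a much higher sum of squared distances, which is what convergence to a sub-optimal clustering means here.
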
Key objectives or learning outcomes

• Understand the Concept of K-means Clustering
– Learn the definition and purpose of K-means clustering as a data segmentation tool.
• Explore the Applications of K-means Clustering
– Identify real-world scenarios where K-means clustering is commonly applied, such as customer segmentation, market research, and image segmentation.
• Learn How the K-means Algorithm Works
– Gain a step-by-step understanding of the K-means clustering process, including initialization, assignment, and updating steps.
• Understand the Role of Distance Metrics
– Understand how distance measures (e.g., Euclidean distance) are used to assign data points to clusters.
• Appreciate the Strengths and Limitations of K-means
– Recognize the advantages of K-means, such as simplicity and efficiency, and its limitations, such as sensitivity to outliers and reliance on the predefined number of clusters.
• Apply K-means in Practice



THANK YOU

