Machine Learning – COMS3007
Clustering
Benjamin Rosman
Based heavily on course notes by Chris
Williams, Victor Lavrenko, Charles Sutton,
David Blei, David Sontag, Shimon Ullman,
Tomaso Poggio, Danny Harari, Daniel
Zysman, Darren Seibart, and Clint van Alten
Previously on ML…
• So far: focused exclusively on supervised learning
• Data 𝑋 = {𝑥^(0), …, 𝑥^(𝑛)}, where 𝑥^(𝑖) ∈ ℝ^𝑑
• Labels 𝐲 = {𝑦^(0), …, 𝑦^(𝑛)}
• Want to learn function 𝑦 = 𝑓(𝑥, 𝜃) to predict y for a
new x
• Two main types:
• Classification: 𝑦 ∈ {0,1} (or more classes)
• Regression: 𝑦 ∈ ℝ
• Conveniently: similar models work for both
Unsupervised learning
• In supervised learning, we know what we are looking for
• We have appropriately labelled data
• This isn’t always the case!
• Unsupervised learning:
• Find patterns in the data (without labels)
• Understanding the hidden structure of the data
• Useful when you don’t know what you’re looking for
• Data:
• Given 𝐷 = {𝒙1 , … , 𝒙𝑁 }, where each 𝒙 ∈ ℝ𝑑
• No labels!
Clustering
• Clustering: one of the most common unsupervised
learning problems
• Involves automatically segmenting data into groups of
similar points
• Why?
• Automatically organising data
• Understanding structure of
the data
• Finding sub-populations
• Representing high dimensional
data in a low dimensional
space
Examples
• Make groupings from data, such as:
• Customers based on their purchase histories
• Genes according to expression profile
• Search results according to topic
• Facebook users according to interests
• Artifacts in a museum according to visual similarity
Note: this is different to
classifying. We don’t even
know what the classes are!
This gives us a way to
discover them.
Properties
• What makes a good clustering?
• Intra-cluster cohesion (compactness)
• Points in the same cluster are close together
• Inter-cluster separation (isolation)
• Points in different clusters are far apart
Distance metrics
• Notions of “closeness” require a distance metric
• Euclidean distance:
  • 𝑑(𝑥_𝑖, 𝑥_𝑗) = √( Σ_{𝑘=1}^{𝑑} (𝑥_𝑖^(𝑘) − 𝑥_𝑗^(𝑘))² )
• Manhattan (city block) distance:
  • 𝑑(𝑥_𝑖, 𝑥_𝑗) = Σ_{𝑘=1}^{𝑑} |𝑥_𝑖^(𝑘) − 𝑥_𝑗^(𝑘)|
  • Approximation to Euclidean distance
• Both are special cases of the Minkowski distance:
  • 𝑑(𝑥_𝑖, 𝑥_𝑗) = ( Σ_{𝑘=1}^{𝑑} |𝑥_𝑖^(𝑘) − 𝑥_𝑗^(𝑘)|^𝑝 )^(1/𝑝), where 𝑝 is a positive integer
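To make the metrics concrete, here is a minimal NumPy sketch (not from the slides; the function name and example vectors are illustrative). Setting p = 2 gives the Euclidean distance and p = 1 the Manhattan distance.

```python
import numpy as np

def minkowski(x_i, x_j, p):
    """Minkowski distance: (sum_k |x_i^(k) - x_j^(k)|^p)^(1/p)."""
    return np.sum(np.abs(x_i - x_j) ** p) ** (1.0 / p)

x_i = np.array([1.0, 2.0, 3.0])
x_j = np.array([4.0, 0.0, 3.0])

print(minkowski(x_i, x_j, p=2))   # Euclidean distance: sqrt(13) ≈ 3.606
print(minkowski(x_i, x_j, p=1))   # Manhattan distance: 5.0
```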
K-means
• K-means is one of the most commonly used clustering
algorithms
• Partitional clustering algorithm (maintains partitions over
the space)
• Data points 𝐷 = {𝒙1 , … , 𝒙𝑁 }, where each 𝒙 ∈ ℝ𝑑
• K-means partitions the data into k clusters
• Each cluster has a cluster centre, called the centroid
• k is user specified
K-means algorithm
• Input: 𝐷 = {𝒙1 , … , 𝒙𝑁 }, where each 𝒙 ∈ ℝ𝑑
• Place centroids 𝑐1 , 𝑐2 , … , 𝑐𝑘 at random locations in ℝ𝑑
• Repeat until convergence (cluster assignments don’t change):
• For each point 𝑥_𝑖:
  • Find the closest centroid 𝑐_𝑗, i.e. 𝑗 = argmin_𝑗 𝑑(𝑥_𝑖, 𝑐_𝑗)
  • Assign 𝑥_𝑖 to cluster 𝑗
  (Choose the distance metric 𝑑(⋅,⋅) appropriately)
• For each cluster 𝑗:
  • Move the cluster centre 𝑐_𝑗 to the average of the assigned points:
    𝑐_𝑗 = (1/𝑛_𝑗) Σ_{𝑖:𝑥_𝑖→𝑗} 𝑥_𝑖
  (Can compute the median, etc., instead of the mean)
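A minimal Python sketch of the algorithm above, assuming Euclidean distance and initialising the centroids at randomly chosen data points rather than at random locations in ℝ^𝑑 (all names are illustrative):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Initialise centroids at k randomly chosen data points
    centroids = X[rng.choice(N, size=k, replace=False)].astype(float)
    assignments = np.full(N, -1)
    for _ in range(max_iters):
        # Assignment step: each point goes to its closest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # converged: cluster assignments no longer change
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(assignments == j):
                centroids[j] = X[assignments == j].mean(axis=0)
    return centroids, assignments
```

Later sketches in these notes reuse this kmeans function.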
Performance
• Need an objective function to measure performance of the
algorithm
• K-means objective function is the sum of squared distances of
each point to its assigned mean.
• Let 𝑥_𝑖 be assigned to cluster 𝑧_𝑖
• Then
  𝐽(𝑥_{1:𝑁}, 𝑐_{1:𝐾}) = ½ Σ_{𝑖=1}^{𝑁} ‖𝑥_𝑖 − 𝑐_{𝑧_𝑖}‖²
  (the squared distance between each point and its cluster centre, summed over all points)
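A short sketch of computing this objective, assuming X is an N×d array, centroids is K×d, and z holds each point's cluster index (names are illustrative):

```python
import numpy as np

def kmeans_objective(X, centroids, z):
    """J = 1/2 * sum_i ||x_i - c_{z_i}||^2."""
    diffs = X - centroids[z]          # subtract each point's assigned centre
    return 0.5 * np.sum(diffs ** 2)
```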
Example
[Figure: example clustering, with a panel showing the objective value]
• Cluster boundaries: points equidistant from 2 cluster centres; this defines a partitioning.
Example (further iterations)
Converged!
Convergence
• Note the decreasing objective
from the example
• K-means takes an alternating
optimization approach:
• Optimising cluster
assignments
• Optimising cluster positions
• Each step is guaranteed not to increase the objective
• So k-means is guaranteed to converge!
• But: only to a local optimum
Properties of k-means
• Strengths:
  • Simple to understand and implement
  • Efficient: complexity O(NKT)
    • N = number of data points
    • K = number of clusters
    • T = number of iterations
    (Can you convince yourself this is right?)
• Weaknesses:
  • Converges to a local optimum
  • Only applicable if a mean can be defined (may need to use something like a median instead)
  • K must be specified
  • Sensitive to outliers
Limitations
• K-means finds a local
optimum
• Thus very reliant on
good initialisation
• May need to restart
several times
Limitations
• Very sensitive to outliers
  • Points very far away from other points
  • [Figures: the clustering we want vs. what we may get instead when an outlier pulls a centroid away]
• Strategies:
  • Remove outliers manually (monitor them over a few iterations first)
  • Random sampling: choose a subset of the data, which is less likely to contain outliers
  • Use the median instead of the mean?
Limitations
• Not suitable for clusters that are not hyper-ellipsoids
• Nonlinear features may be useful here
Choosing k
• Not always clear what the correct number of clusters is
• Often heuristics are used
• Although there are algorithms that do this more
automatically
Changing values of k
• A common heuristic is to
look at the changing value
of the objective with
changing k
• The “kink” or “elbow” is
often taken to indicate the
best k
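One way to apply the elbow heuristic, assuming the kmeans and kmeans_objective sketches from earlier (the data here is random and purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(300, 2))   # illustrative data

ks = range(1, 10)
objectives = []
for k in ks:
    centroids, z = kmeans(X, k)
    objectives.append(kmeans_objective(X, centroids, z))

plt.plot(list(ks), objectives, marker="o")
plt.xlabel("k")
plt.ylabel("objective J")
plt.show()   # look for the 'elbow' where J stops dropping sharply
```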
Running k-means online
• Online/sequential k-means
• Why? Clustering online articles as they’re written
• Algorithm:
• Place centroids 𝑐1 , 𝑐2 , … , 𝑐𝑘 at random locations in ℝ𝑑
• Set initial counts 𝑛1 , 𝑛2 , … , 𝑛𝑘 = 0
• Repeat until bored:
• Acquire new point 𝑥𝑖
• Find the closest centroid 𝑐_𝑗, i.e. 𝑗 = argmin_𝑗 𝑑(𝑥_𝑖, 𝑐_𝑗)
• Assign 𝑥_𝑖 to cluster 𝑗
• 𝑛_𝑗 ← 𝑛_𝑗 + 1
• 𝑐_𝑗 ← 𝑐_𝑗 + (1/𝑛_𝑗)(𝑥_𝑖 − 𝑐_𝑗)
  (This updates the appropriate cluster centre by moving it closer to 𝑥_𝑖; 1/𝑛_𝑗 acts as an adaptive learning rate.)
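A minimal sketch of the online version, assuming Euclidean distance; stream can be any iterable of d-dimensional NumPy vectors (names are illustrative):

```python
import numpy as np

def online_kmeans(stream, k, d, seed=0):
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(k, d))   # random initial locations in R^d
    counts = np.zeros(k, dtype=int)       # n_1, ..., n_k = 0
    for x in stream:
        # Closest centroid to the newly acquired point
        j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        counts[j] += 1
        # Move c_j towards x with adaptive learning rate 1/n_j
        centroids[j] += (x - centroids[j]) / counts[j]
    return centroids, counts
```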
Applications: supervised learning
• Use clustering to discretise continuous values for
supervised learning
• Instead of using a set discretisation interval
• Cluster training data and use cluster ID
• This can also lower the dimension of high
dimensional data
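A small sketch of this idea, assuming the kmeans function from earlier; the training data and the choice of k = 8 are illustrative:

```python
import numpy as np

X_train = np.random.default_rng(1).normal(size=(500, 10))  # illustrative data

centroids, _ = kmeans(X_train, k=8)

def discretise(x, centroids):
    """Replace a continuous point by the ID of its nearest cluster centre."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# Each 10-dimensional point is now summarised by a single discrete feature,
# which can be fed to, e.g., naive Bayes or a decision tree.
```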
Applications: visual words
• Use a similar idea for images
• What if you wanted to use a naïve Bayes or decision
tree model to classify images?
• Pixels as attributes?
• Huge space, and not useful for learning
• Bag-of-words would be nice: {“water”, “grass”, “tiger”}
• Needs human annotation
Applications: visual words
• Idea:
• Break image into set of patches
• Compute appearance features of each patch
• Relative position, distribution of colours, texture,
edge orientations
• Convert to a “word” (code) to reflect patch appearance
• Similar feature vectors → same “word”
[Figure: a patch's feature vector (𝑥_1, 𝑥_2, …, 𝑥_𝑑) is mapped to a visual word such as "grass" or the code "C27"]
Applications: visual words
• Use k-means to:
• Group all feature vectors from all images into K clusters
• Provide a cluster ID for every patch in every image
• Similar-looking patches have the same ID
• Represent patch with cluster ID
• Image = bag of cluster IDs
• K-dimensional representation:
• {4 x “C14”, 7 x “C27”, 24 x “C79”, 0 x else}
• Similar to bag-of-words
• Cluster IDs called vis-terms or “visual words”
• Plug these into a classifier
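A sketch of building the K-dimensional bag-of-visual-words representation, assuming the kmeans function from earlier and that each image's patches have already been converted to feature vectors (names are illustrative):

```python
import numpy as np

def visual_word_histogram(patch_features, centroids):
    """Count how many of an image's patches fall in each of the K clusters."""
    K = len(centroids)
    ids = [int(np.argmin(np.linalg.norm(centroids - p, axis=1)))
           for p in patch_features]
    return np.bincount(ids, minlength=K)   # K-dimensional image representation

# centroids, _ = kmeans(all_patch_features, k=K)   # the visual-word vocabulary
# histogram = visual_word_histogram(patches_of_one_image, centroids)
```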
Applications: image compression
• Every pixel in an image has a red, green, blue value
• How many bits per pixel?
• We can use k-means to compress the image!
Applications: image compression
• Clustering in the colour space
• Replace each pixel 𝑥_𝑖 by its cluster centre 𝑐_{𝑥_𝑖}
• The k means are called the codebook
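A sketch of this compression, assuming the kmeans function from earlier; image is an H x W x 3 array of RGB values (names are illustrative):

```python
import numpy as np

def compress(image, k=16):
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)   # cluster in colour space
    codebook, ids = kmeans(pixels, k)             # the k means = the codebook
    # Store only the codebook plus one cluster ID per pixel
    # (log2(k) bits per pixel instead of 24); reconstruct by replacing
    # each pixel with its cluster centre.
    return codebook[ids].reshape(H, W, 3)
```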
Applications: image compression
• Sometimes known as vector quantisation
• Also an easy way to do image segmentation
Recap
• Clustering and applications
• Distance metrics
• The k-means algorithm
• Limitations of k-means
• How to choose K
• Online k-means
• Representations for supervised learning
• Visual words
• Image compression and segmentation