

UNIT V
CLUSTER ANALYSIS
Cluster Analysis: Basic Concepts and Algorithms: Overview, What Is Cluster Analysis?
Different Types of Clustering, Different Types of Clusters;
K-means: The Basic K-means Algorithm, K-means Additional Issues, Bisecting K-means, Strengths and Weaknesses;
Agglomerative Hierarchical Clustering: Basic Agglomerative Hierarchical Clustering Algorithm;
DBSCAN: Traditional Density, Center-Based Approach, DBSCAN Algorithm, Strengths and Weaknesses. (Tan & Vipin)
CLUSTERING
What is a Clustering?
• In general, a clustering is a grouping of objects such that the objects in a group (cluster) are similar (or related) to one another and different from (or unrelated to) the objects in other groups

(figure: a good clustering keeps intra-cluster distances small and inter-cluster distances large)
Applications of Cluster Analysis
• Understanding
• Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
• Summarization
• Reduce the size of large data sets (e.g., clustering precipitation in Australia)

Discovered Clusters and Industry Group (stock example):
1. Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN → Technology1-DOWN
2. Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN → Technology2-DOWN
3. Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN → Financial-DOWN
4. Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP → Oil-UP
Early applications of cluster
analysis
• John Snow, London 1854 (mapping the Broad Street cholera outbreak)
Notion of a Cluster can be Ambiguous
• How many clusters? The same set of points can reasonably be grouped into two, four, or six clusters (figure).

Types of Clusterings
• A clustering is a set of clusters
• An important distinction is between hierarchical and partitional clusterings
• Partitional Clustering
• A division of the data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
• Hierarchical Clustering
• A set of nested clusters organized as a hierarchical tree
Partitional Clustering
(figure: the original points and one partitional clustering of them)

Hierarchical Clustering
(figure: a traditional nested-cluster view of points p1–p4 and the corresponding dendrogram)
Other types of clusterings
• Exclusive (non-overlapping) versus non-exclusive (overlapping)
• In non-exclusive clusterings, points may belong to multiple clusters, e.g., ‘border’ points that lie between classes
• Fuzzy (soft) versus non-fuzzy (hard)
• In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1
• The weights usually must sum to 1 (and are often interpreted as probabilities)
• Partial versus complete
• In some cases, we only want to cluster some of the data
Types of Clusters: Well-Separated
• Well-Separated Clusters:
• A cluster is a set of points such that any point in a
cluster is closer (or more similar) to every other point
in the cluster than to any point not in the cluster.

well-separated clusters
Types of Clusters: Center-Based
• Center-based
• A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of its own cluster than to the center of any other cluster
• The center of a cluster is often a centroid – the point (such as the mean) that minimizes the distances to all the points in the cluster – or a medoid, the most “representative” actual point of the cluster
center-based clusters
Types of Clusters: Contiguity-Based
• Contiguous Cluster (Nearest neighbor or Transitive)
• A cluster is a set of points such that a point in a
cluster is closer (or more similar) to one or more
other points in the cluster than to any point not in
the cluster.
contiguous clusters
Types of Clusters: Density-Based
• Density-based
• A cluster is a dense region of points that is separated from other regions of high density by regions of low density.
• Used when the clusters are irregular or intertwined,
and when noise and outliers are present.

density-based clusters
Types of Clusters: Conceptual Clusters
• Shared Property or Conceptual Clusters
• Finds clusters that share some common property or represent a particular concept.
(figure: overlapping circles of points)
Types of Clusters: Objective Function
• Clustering as an optimization problem
• Finds clusters that minimize or maximize an objective function.
• In principle, enumerate all possible ways of dividing the points into clusters and evaluate the ‘goodness’ of each potential set of clusters using the given objective function (NP-hard).
• Objectives can be global or local.
• Hierarchical clustering algorithms typically have local objectives
• Partitional algorithms typically have global objectives
• A variation of the global-objective approach is to fit the data to a parameterized model.
• The parameters of the model are determined from the data, and they in turn determine the clustering
• E.g., mixture models assume that the data is a ‘mixture’ of a number of statistical distributions.
Typical workflow for cluster analysis
(figure)
Clustering Algorithms
•K-means and its variants

•Hierarchical clustering

•DBSCAN
K-MEANS
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• The number of clusters, K, must be specified
• The objective is to minimize the sum of distances of the points to their respective centroids
K-means Clustering
• Problem: Given a set X of n points in a d-dimensional space and an integer K, group the points into K clusters C = {C_1, C_2, ..., C_K} such that

  Cost(C) = \sum_{i=1}^{K} \sum_{x \in C_i} dist(x, c_i)

is minimized, where c_i is the centroid of the points in cluster C_i.
• The most common definition uses Euclidean distance, minimizing the Sum of Squared Errors (SSE); K-means is sometimes defined directly this way: group the points into K clusters such that

  SSE(C) = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - c_i \rVert^2

is minimized, where c_i is the mean of the points in cluster C_i. (A short sketch of computing this cost follows below.)
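As a hedged illustration (not part of the original slides), the SSE objective can be computed with NumPy as below; the array names X, labels, and centroids are assumptions made for the example.

```python
import numpy as np

def kmeans_sse(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    # X: (n, d) data matrix; labels: (n,) cluster index per point; centroids: (K, d)
    diffs = X - centroids[labels]      # vector from each point to the centroid of its cluster
    return float(np.sum(diffs ** 2))   # SSE = sum_i sum_{x in C_i} ||x - c_i||^2
```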
Complexity of the K-means Problem
• NP-hard if the dimensionality of the data is at least 2 (d >= 2)
• Finding the optimal solution in polynomial time is infeasible
• For d = 1 the problem is solvable in polynomial time (by dynamic programming over the sorted points)
• A simple iterative algorithm nevertheless works quite well in practice
K-means Algorithm
• Also known as Lloyd’s algorithm
• “K-means” is sometimes used as a synonym for this particular algorithm
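A minimal sketch of Lloyd’s algorithm in Python follows (an illustration, not the textbook’s code); it assumes Euclidean distance and initializes the centroids by sampling K of the data points at random.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal Lloyd's algorithm: assign points to nearest centroid, recompute means, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]   # random initial centroids
    for _ in range(max_iters):
        # assignment step: each point goes to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid becomes the mean of the points assigned to it
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):               # stop when centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids
```

Running this sketch on a small 2-D data set with different seeds, and comparing the resulting kmeans_sse values, reproduces the sensitivity to initialization discussed on the next slides.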
K-means Algorithm –
Initialization
• Initial centroids are often chosen
randomly.
• Clusters produced vary from one run to
another.
Two different K-means Clusterings
(figure: the same set of original points, plotted in the x–y plane, clustered twice by K-means; one run produces the optimal clustering and the other a sub-optimal clustering)
Importance of Choosing Initial Centroids
(figures: iterations 1–6 of K-means on the example data, plotted in the x–y plane, showing how the centroids and cluster assignments change from one iteration to the next)
Importance of Choosing Initial Centroids …
(figures: iterations 1–5 of K-means on the same data but starting from a different set of initial centroids; the poorer starting positions lead to a different, sub-optimal final clustering)
Dealing with Initialization
• Do multiple runs and select the clustering with the smallest error (a small sketch of this follows below)
• Select the initial set of points by a method other than uniform random sampling, e.g., pick points that are far apart from each other as the initial centers (the idea behind the K-means++ algorithm)
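A minimal sketch of the multiple-runs idea, reusing the kmeans and kmeans_sse sketches above (the run count of 10 is an arbitrary illustrative choice):

```python
def best_of_n_runs(X, K, n_runs=10):
    """Run K-means several times with different seeds and keep the lowest-SSE clustering."""
    best = None
    for seed in range(n_runs):
        labels, centroids = kmeans(X, K, seed=seed)   # sketch defined earlier
        sse = kmeans_sse(X, labels, centroids)        # error of this particular run
        if best is None or sse < best[0]:
            best = (sse, labels, centroids)
    return best                                       # (smallest SSE, its labels, its centroids)
```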
K-means Algorithm – Centroids
• The centroid depends on the distance function
• It is the minimizer of the chosen distance function over the cluster
• ‘Closeness’ can be measured by Euclidean distance (SSE), cosine similarity, correlation, etc.
• Centroid:
• The mean of the points in the cluster for SSE and for cosine similarity
• The median of the points for Manhattan distance
• Finding the centroid is not always easy
• It can be an NP-hard problem for some distance functions, e.g., the median over multiple dimensions
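A small worked example (with illustrative values) of how the notion of “centroid” changes with the distance function: the coordinate-wise mean minimizes the SSE, while the coordinate-wise median minimizes the sum of Manhattan distances.

```python
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])    # three points on a line
mean_centroid = pts.mean(axis=0)        # [3.67, 0.0] -> minimizes sum of squared Euclidean distances
median_centroid = np.median(pts, axis=0)  # [1.0, 0.0] -> minimizes sum of Manhattan distances
```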
K-means Algorithm – Convergence
•K-means will converge for common
similarity measures mentioned above.
• Most of the convergence happens in the first few
iterations.
• Often the stopping condition is changed
to ‘Until relatively few points change
clusters’
•Complexity is O( n * K * I * d )
• n = number of points, K = number of
clusters, I = number of iterations, d =
dimensionality
•In general a fast and efficient algorithm
Limitations of K-means
•K-means has problems when clusters
are of different
• Sizes
• Densities
• Non-globular shapes

•K-means has problems when the data


contains outliers.
Limitations of K-means: Differing Sizes

Original Points K-means (3 Clusters)


Limitations of K-means: Differing Density

Original Points K-means (3 Clusters)


Limitations of K-means: Non-globular Shapes

Original Points K-means (2 Clusters)


Overcoming K-means Limitations

Original Points K-means Clusters

One solution is to use a larger number of clusters: K-means then finds parts of the natural clusters, which have to be put back together afterwards.
Overcoming K-means Limitations

Original Points K-means Clusters


Overcoming K-means Limitations

Original Points K-means Clusters


Variations
• K-medoids: the same problem definition as in K-means, but the center of each cluster is defined to be one of the points in the cluster (the medoid).
• K-centers: the same problem definition as in K-means, but the goal now is to minimize the maximum diameter of the clusters (the diameter of a cluster is the maximum distance between any two points in the cluster).
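As a hedged illustration of the K-medoids idea (not a full K-medoids algorithm), the medoid of a cluster can be computed as the member point with the smallest total distance to the other members:

```python
import numpy as np

def medoid(points):
    """Return the cluster member with the smallest total Euclidean distance to the other members."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)  # pairwise distances
    return points[dists.sum(axis=1).argmin()]
```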
Bisecting K-means
● Combines K-means and hierarchical clustering
● Clusters are iteratively split in two via regular K-means with K = 2
● Stops when the desired number of clusters is reached (a minimal sketch follows below)
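A minimal sketch of bisecting K-means, reusing the kmeans sketch above; choosing the largest-SSE cluster to split is one common selection rule and is an assumption here, not something stated on the slide.

```python
import numpy as np

def bisecting_kmeans(X, K):
    """Repeatedly split one cluster in two with K-means (K=2) until K clusters exist."""
    clusters = [np.arange(len(X))]                     # start with every point in a single cluster
    while len(clusters) < K:
        # pick the cluster with the largest SSE to split (one common criterion)
        sses = [np.sum((X[idx] - X[idx].mean(axis=0)) ** 2) for idx in clusters]
        idx = clusters.pop(int(np.argmax(sses)))
        labels, _ = kmeans(X[idx], 2)                  # ordinary K-means with K = 2 (sketch above)
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters                                    # list of index arrays, one per cluster
```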

Hierarchical clustering
Produces nested clusters; can be visualized as a dendrogram.
Can be either:
- Agglomerative (bottom up): initially, each point is a cluster; repeatedly combine the two “nearest” clusters into one
- Divisive (top down): start with one cluster containing all the points and recursively split
Advantages of Hierarchical Clustering
● Do not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by cutting the dendrogram at the proper level
● No random component (the clusters are the same from run to run)
● Clusters may correspond to meaningful taxonomies
– Especially in the biological sciences (e.g., phylogeny reconstruction)

Agglomerative Clustering Algorithm
● Most popular hierarchical clustering technique
● Basic algorithm:
1) Compute the proximity matrix
2) Let each data point be a cluster
3) Repeat
4) Merge the two closest clusters
5) Update the proximity matrix
6) Until only a single cluster remains
● The key operation is the computation of the proximity between two clusters
– Different approaches to defining this distance distinguish the different algorithms (a minimal sketch of the procedure follows below)
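A minimal sketch of the basic agglomerative procedure with single-link proximity (illustrative and deliberately unoptimized; it stops once the requested number of clusters remains rather than merging all the way down to one cluster):

```python
import numpy as np

def agglomerative_single_link(X, num_clusters):
    """Basic agglomerative clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(X))]                    # each data point starts as a cluster
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # proximity (distance) matrix
    while len(clusters) > num_clusters:
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: distance between the closest pair of points across the two clusters
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] = clusters[a] + clusters[b]                # merge the two closest clusters
        del clusters[b]
    return clusters
```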
Divisive Clustering Algorithm
● Minimum spanning tree (MST)
– Start with the tree consisting of a single point
– In successive steps, look for the closest pair of points (p, q) such that p is in the tree but q is not
– Add q to the tree (add the edge between p and q)
– Once the MST is built, the divisive hierarchy is obtained by repeatedly breaking the largest remaining edge, splitting one cluster into two at each step

Linkages
● Linkage: measure of dissimilarity
between clusters
● Many methods:
– Single linkage
– Complete linkage
– Average linkage
– Centroids
– Ward’s method
Single linkage (aka nearest neighbor)
● Proximity of two clusters is based on the two closest points in the different clusters
● Proximity is determined by a single pair of points (i.e., one link)
● Can handle non-elliptical shapes
● Sensitive to noise and outliers
Complete linkage
● Proximity of two clusters is based on the two
most distant points in the different clusters
● Less susceptible to noise and outliers
● May break large clusters
● Biased toward globular clusters
Average linkage

● Proximity of two clusters is the


average of pairwise proximity
between points in the clusters
● Less susceptible to noise and outliers
● Biased towards globular clusters
Ward’s method
● Similarity of two clusters is based on the increase in squared error (SSE) when the two clusters are merged
● Similar to group average if the distance between points is the squared distance
● Less susceptible to noise and outliers
● Biased towards globular clusters
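As a hedged illustration (assuming SciPy is available), the different linkage methods can be compared on the same data with scipy.cluster.hierarchy; the random 2-D data and the cut into 4 clusters are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).random((30, 2))          # illustrative 2-D points

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                     # full merge history (dendrogram) for this linkage
    labels = fcluster(Z, t=4, criterion="maxclust")   # cut the dendrogram into 4 flat clusters
    print(method, labels)                             # cluster memberships differ between linkages
```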
Agglomerative clustering exercise
● How do the clusters change with different linkage methods?
(figures: the same six labelled points clustered with single, complete, and average linkage; the order in which clusters are merged differs between methods)

Linkage Comparison
(figure: side-by-side results of single linkage, complete linkage, average linkage, and Ward’s method on the six example points)
DBSCAN ALGORITHM:
DBSCAN stands for density-based spatial clustering of applications with noise. It is able to find arbitrarily shaped clusters, and it can do so in data that contains noise (i.e., outliers).

Density-based clustering
● Assumes clusters are areas of high density separated by areas of low density
● Core points lie in areas of sufficient density (at least n points within radius r of the point)
● Border points are not core points, but lie within r of a core point
● Noise points are all remaining points
(figure: core, border, and noise points illustrated for n = 7 and radius r)
DBSCAN Algorithm
● Label each point as a core, border, or noise point
● Eliminate the noise points
● Connect all pairs of core points that lie within distance r of each other
● Make each connected group of core points into a separate cluster
● Assign each border point to one of the clusters of its associated core points
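As a hedged illustration (assuming scikit-learn is available), DBSCAN can be run as below; eps plays the role of the radius r and min_samples the role of n from the slides, and the particular values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).random((200, 2))             # illustrative 2-D data

labels = DBSCAN(eps=0.1, min_samples=7).fit_predict(X)    # eps ~ r, min_samples ~ n
# points labelled -1 are noise; every other label is a cluster id found by the algorithm
print("clusters:", len(set(labels) - {-1}), "noise points:", int(np.sum(labels == -1)))
```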
DBSCAN Advantages & Limitations
● Advantages:
● Resistant to noise
● Can handle clusters of different shapes and sizes
● The number of clusters is determined by the algorithm itself
(figure: original data and the resulting DBSCAN clustering)
● Limitations:
● Struggles to identify clusters of varying densities – the clustering is often incomplete because points in low-density regions are treated as noise and ignored
● Density can be difficult/expensive to compute in high-dimensional datasets
