
Chapter 4:

Cluster Analysis in Data Mining


Hierarchical Clustering Methods
Objective
● Understand the concept of hierarchical clustering.
● Explore agglomerative and divisive clustering algorithms.
● Discover extensions to hierarchical clustering.
● Learn about BIRCH, a micro-clustering-based approach.
● Recognize the benefits and applications of hierarchical clustering.
● Let's embark on a journey through the fascinating world of hierarchical
clustering and its implications for data analysis.

Hierarchical Clustering Methods
● Hierarchical clustering is a technique that organizes data points into a tree-like
structure of nested clusters. It offers insights into both macro and micro-level
relationships within data.
● Hierarchical clustering
○ Generates a clustering hierarchy (drawn as a dendrogram)
○ Does not require specifying K, the number of clusters
○ More deterministic than partitioning methods such as k-means (no random initialization)
○ No iterative refinement of cluster assignments

Hierarchical Clustering Methods
● Two categories of algorithms:
○ Agglomerative: Start with singleton clusters, continuously merge two
clusters at a time to build a bottom-up hierarchy of clusters
○ Divisive: Start with a huge macro-cluster, split it continuously into two
groups, generating a top-down hierarchy of clusters
● Key Concepts:
○ Agglomerative and divisive algorithms build hierarchies from bottom-up
and top-down, respectively.
○ Hierarchical clustering produces dendrogram visualizations.
○ Provides flexibility in exploring clusters at different granularity levels.
○ Suitable for various data types and proximity measures.
Dendrogram: Shows How Clusters are Merged
● Dendrogram: Decompose a set of data objects into a tree of clusters by multi-
level nested partitioning
● A clustering of the data objects is obtained by cutting the dendrogram at the
desired level; each connected component then forms a cluster (see the sketch below)
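As a sketch of such a cut in code, assuming SciPy is available (the data X below is random toy data):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

  rng = np.random.default_rng(0)
  X = rng.normal(size=(30, 2))          # toy data: 30 points in 2-D

  Z = linkage(X, method='average')      # build the hierarchy of nested merges
  dendrogram(Z)                         # draw the dendrogram (needs matplotlib)
  # Cut the dendrogram at height 2.5: each connected component below the cut
  # becomes one cluster.
  labels = fcluster(Z, t=2.5, criterion='distance')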

Agglomerative Clustering Algorithms
● Agglomerative hierarchical clustering starts with individual data points as
separate clusters and progressively merges them.
● AGNES (AGglomerative NESting) (Kaufmann and Rousseeuw, 1990)
○ Use the single-link method and the dissimilarity matrix
○ Continuously merge nodes that have the least dissimilarity
○ Eventually all nodes belong to the same cluster

Agglomerative Clustering Algorithms
● Agglomerative clustering varies on different similarity measures among
clusters
○ Single link (nearest neighbor)
○ Complete link (diameter)
○ Average link (group average)
○ Centroid link (centroid similarity), as sketched below
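A minimal sketch of how these linkage choices map onto common libraries (assuming scikit-learn and SciPy; the toy data comes from make_blobs):

  from scipy.cluster.hierarchy import linkage
  from sklearn.cluster import AgglomerativeClustering
  from sklearn.datasets import make_blobs

  X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

  # Same data, different inter-cluster similarity measures.
  single   = AgglomerativeClustering(n_clusters=3, linkage='single').fit_predict(X)
  complete = AgglomerativeClustering(n_clusters=3, linkage='complete').fit_predict(X)
  average  = AgglomerativeClustering(n_clusters=3, linkage='average').fit_predict(X)
  # Centroid link is not offered by scikit-learn, but SciPy provides it:
  Z_centroid = linkage(X, method='centroid')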

Single Link vs. Complete Link in Hierarchical Clustering
● Single link (nearest neighbor)
○ The similarity between two clusters is the similarity between their most
similar (nearest neighbor) members
○ Local similarity-based: Emphasizing more on close regions, ignoring the
overall structure of the cluster
○ Capable of clustering non-elliptically shaped groups of objects
○ Sensitive to noise and outliers

Single Link vs. Complete Link in Hierarchical Clustering
● Complete link (diameter)
○ The similarity between two clusters is the similarity between their most
dissimilar members
○ Merge two clusters to form one with the smallest diameter
○ Nonlocal in behavior, obtaining compact shaped clusters
○ Sensitive to outliers

Agglomerative Clustering: Average vs. Centroid Links
● Agglomerative clustering with average link
○ Average link: The average distance between an
element in one cluster and an element in the other
(i.e., all pairs in two clusters)
■ Expensive to compute
● Agglomerative clustering with centroid link
○ Centroid link: The distance between the centroids of
two clusters

Agglomerative Clustering: Average vs. Centroid Links
● Group Averaged Agglomerative Clustering (GAAC)
○ Let two clusters Ca and Cb be merged into CaUb.
The new centroid is:

■ Na is the cardinality of cluster Ca, and ca is the centroid of Ca


○ The similarity measure for GAAC is the average of their distances
● Agglomerative clustering with Ward’s criterion
○ Ward’s criterion: The increase in the value of the SSE criterion for the
clustering obtained by merging them into Ca U Cb:
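For concreteness, the merged centroid and Ward's increase in SSE can be written out in the notation above (Na, Nb are the cluster sizes; ca, cb the centroids); these are the standard forms:

  c_{a \cup b} = \frac{N_a c_a + N_b c_b}{N_a + N_b}

  \Delta SSE(C_a, C_b) = W(C_{a \cup b}) - W(C_a) - W(C_b) = \frac{N_a N_b}{N_a + N_b}\,\lVert c_a - c_b \rVert^2

where W(C) denotes the sum of squared distances of the points of C to its centroid.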
Agglomerative Clustering Algorithms
● Dendrogram Visualization:
○ Depicts the merging process as a tree diagram.
○ Height of the fusion indicates the dissimilarity between clusters.
○ Cutting the dendrogram yields clusters at desired granularity.
○ Agglomerative clustering is intuitive and versatile, but it can be
computationally expensive for large datasets.

Divisive Clustering Algorithms
● Divisive hierarchical clustering begins with all data points in a single cluster
and recursively divides them.
● DIANA (Divisive Analysis) (Kaufmann and Rousseeuw,1990)
○ Implemented in some statistical analysis packages, e.g., Splus
● Inverse order of AGNES: Eventually each node forms a cluster on its own

Divisive Clustering Algorithms
● Divisive clustering is a top-down approach
○ The process starts at the root with all the points as one cluster
○ It recursively splits higher-level clusters to build the dendrogram
○ Can be considered a global approach, since each split considers a cluster as a whole
○ More efficient than agglomerative clustering when each split uses an efficient partitioning method (see the sketch below)
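As a sketch of the top-down idea (this uses bisecting 2-means splits for illustration, not the DIANA splitting rule; scikit-learn and NumPy assumed):

  import numpy as np
  from sklearn.cluster import KMeans

  def divisive_clustering(X, n_clusters, random_state=0):
      # Start with all points in one cluster; repeatedly bisect the largest
      # cluster until the desired number of clusters is reached.
      labels = np.zeros(len(X), dtype=int)
      while len(np.unique(labels)) < n_clusters:
          largest = np.argmax(np.bincount(labels))        # cluster to split next
          idx = np.where(labels == largest)[0]
          halves = KMeans(n_clusters=2, n_init=10,
                          random_state=random_state).fit_predict(X[idx])
          labels[idx[halves == 1]] = labels.max() + 1      # assign a new label
      return labels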

Divisive Clustering Algorithms
● Dendrogram Visualization:
○ Starts with a single cluster that splits into subclusters.
○ Divisions occur at varying levels of granularity.
○ Also yields a dendrogram, highlighting the hierarchical structure.
○ Divisive clustering allows exploration of finer details within clusters but
can be challenging to implement effectively.

Extensions to Hierarchical Clustering
● Hierarchical clustering can be extended to handle specific challenges and data
characteristics.
● Single Linkage (Minimum Linkage):
○ Measures distance between two clusters as the minimum distance between
any pair of points.
○ Vulnerable to the "chaining" effect, where clusters get strung together through chains of nearby points.

Extensions to Hierarchical Clustering
● Complete Linkage (Maximum Linkage):
○ Measures distance between two clusters as the maximum distance
between any pair of points.
○ Can lead to a "crowding" problem, in which compact clusters are merged.
● Average Linkage:
○ Measures distance between two clusters as the average distance between
all pairs of points.
○ Balanced approach between single and complete linkage.

Extensions to Hierarchical Clustering
● Major weaknesses of hierarchical clustering methods
○ Can never undo what was done previously
○ Do not scale well
■ Time complexity of at least O(n²), where n is the total number of objects
● Other hierarchical clustering algorithms
○ BIRCH (1996): Use CF-tree and incrementally adjust the quality of sub-
clusters
○ CURE (1998): Represent a cluster using a set of well-scattered representative
points
○ CHAMELEON (1999): Use graph partitioning methods on the K-nearest-neighbor graph
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
● BIRCH is a hierarchical clustering approach designed for large datasets by
employing a micro-clustering strategy.
● A multiphase clustering algorithm (Zhang, Ramakrishnan & Livny,
SIGMOD’96)
● Incrementally construct a CF (Clustering Feature) tree, a hierarchical data
structure for multiphase clustering
○ Phase 1: Scan DB to build an initial in-memory CF tree (a multi-level
compression of the data that tries to preserve the inherent clustering
structure of the data)
○ Phase 2: Use an arbitrary clustering algorithm to cluster the leaf nodes of the CF tree (sketched below)
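A short sketch of the two phases with scikit-learn's Birch (assuming a recent scikit-learn; passing an estimator as n_clusters plays the role of the Phase 2 global clustering over the leaf sub-clusters):

  from sklearn.cluster import AgglomerativeClustering, Birch
  from sklearn.datasets import make_blobs

  X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)

  birch = Birch(threshold=0.5,                  # max radius of a leaf sub-cluster
                branching_factor=50,            # max children per CF-tree node
                n_clusters=AgglomerativeClustering(n_clusters=5))   # Phase 2
  labels = birch.fit_predict(X)                 # Phase 1 builds the CF tree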
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
● Key idea: Multi-level clustering
○ Low-level micro-clustering: Reduce complexity and increase scalability
○ High-level macro-clustering: Leave enough flexibility for high-level
clustering
● Scales linearly: Find a good clustering with a single scan and improve the
quality with a few additional scans

Clustering Feature Vector in BIRCH
● Clustering Feature (CF): CF = (N, LS, SS)
○ N: number of data points
○ LS: linear sum of the N points (the vector Σ xi)
○ SS: square sum of the N points (the scalar Σ xi²)
● Clustering feature:
○ Summary of the statistics for a given sub-cluster: the 0th, 1st, and 2nd moments of the sub-cluster from the statistical point of view
○ Registers crucial measurements for computing clusters and uses storage efficiently (see the sketch below)
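A minimal sketch of a CF triple (N, LS, SS) in code, showing its additivity and how the centroid and radius fall out of it (NumPy assumed; the function names are illustrative):

  import numpy as np

  def cf(points):
      # Clustering Feature of a set of points: (N, LS, SS)
      pts = np.asarray(points, dtype=float)
      return len(pts), pts.sum(axis=0), float((pts ** 2).sum())

  def cf_merge(a, b):
      # CFs are additive: merging two sub-clusters just adds their CFs.
      return a[0] + b[0], a[1] + b[1], a[2] + b[2]

  def cf_centroid_radius(entry):
      n, ls, ss = entry
      c = ls / n                                # centroid = LS / N
      r = np.sqrt(max(ss / n - c @ c, 0.0))     # radius from N, LS, SS alone
      return c, r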

Measures of Cluster: Centroid, Radius and Diameter
● Centroid: x0
○ The "middle" of a cluster: the mean of its points
○ n: number of points in the cluster
○ xi is the i-th point in the cluster
● Radius: R
○ Average distance from member objects to the centroid
○ The square root of the average squared distance from any point of the cluster to its centroid

Measures of Cluster: Centroid, Radius and Diameter
● Diameter: D
○ Average pairwise distance within a cluster
○ The square root of the average squared distance between all pairs of points in the cluster (written out below)
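Written out, with xi the points, x0 the centroid, and n the number of points in the cluster, the three measures are:

  x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i

  R = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert x_i - x_0 \rVert^2}

  D = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lVert x_i - x_j \rVert^2}{n(n-1)}}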

The CF Tree Structure in BIRCH
● Incremental insertion of new points (similar to B+-tree)
● For each point in the input
○ Find closest leaf entry
○ Add point to leaf entry and update CF
○ If entry diameter > max_diameter
■ split leaf, and possibly parents
● A CF tree has two parameters
○ Branching factor: maximum number of children per non-leaf node
○ Threshold: maximum diameter of the sub-clusters stored at the leaf nodes (a simplified insertion sketch follows)
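A simplified, flat sketch of this insertion loop (leaf entries only, no tree navigation or node splits; the threshold here bounds the entry radius rather than its diameter; NumPy assumed):

  import numpy as np

  def insert_points(X, threshold):
      entries = []                                   # each entry: [N, LS, SS]
      for x in np.asarray(X, dtype=float):
          # Find the closest leaf entry by distance to its centroid.
          best, best_d = None, np.inf
          for e in entries:
              d = np.linalg.norm(x - e[1] / e[0])
              if d < best_d:
                  best, best_d = e, d
          if best is not None:
              # Tentatively absorb the point and check the updated radius.
              n, ls, ss = best[0] + 1, best[1] + x, best[2] + x @ x
              c = ls / n
              if np.sqrt(max(ss / n - c @ c, 0.0)) <= threshold:
                  best[0], best[1], best[2] = n, ls, ss
                  continue
          entries.append([1, x.copy(), float(x @ x)])   # start a new sub-cluster
      return entries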
The CF Tree Structure in BIRCH
● A CF tree: A height-balanced tree that stores the clustering features (CFs)
● The non-leaf nodes store sums of the CFs of their children

BIRCH: A Scalable and Flexible Clustering Method
● An integration of agglomerative clustering with other (flexible) clustering
methods
○ Low-level micro-clustering
■ Exploring the CF (clustering feature) and the BIRCH CF-tree structure
■ Preserving the inherent clustering structure of the data
○ Higher-level macro-clustering
■ Provide sufficient flexibility for integration with other clustering
methods

BIRCH: A Scalable and Flexible Clustering Method
● Has influenced many other clustering methods and applications
● Concerns
○ Sensitive to insertion order of data points
○ Due to the fixed size of leaf nodes, clusters may not be so natural
○ Clusters tend to be spherical given the radius and diameter measures

BIRCH: Advantages and Applications
● Applications:
○ Network traffic analysis for intrusion detection.
○ E-commerce customer segmentation.
○ Biological sequence clustering.
○ Sensor data analysis.
● BIRCH's innovative micro-clustering strategy makes it a valuable tool for
scalable and memory-efficient clustering tasks.

Case Study: Retail Store Clustering with Hierarchical Methods
● Let's apply hierarchical clustering to a retail store dataset for location-based
segmentation.
● Objective:
○ Group stores based on geographical sales patterns.
○ Identify clusters with similar market behavior.

Case Study: Retail Store Clustering with Hierarchical Methods
● Steps:
○ Data Preparation: Convert store sales and location data into a suitable format.
○ Agglomerative Clustering: Apply agglomerative hierarchical clustering.
○ Dendrogram Analysis: Analyze dendrogram to determine optimal cluster
count.
○ Cluster Interpretation: Interpret characteristics of formed clusters.
○ Strategy Development: Design marketing strategies for each cluster.
○ Hierarchical clustering helps unravel store behavior and guides targeted marketing efforts (a compact sketch of these steps follows).
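A compact sketch of these steps, using a hypothetical store feature table (SciPy and scikit-learn assumed; the numbers below are illustrative only):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
  from sklearn.preprocessing import StandardScaler

  # Hypothetical stores: latitude, longitude, avg monthly sales, avg basket size
  stores = np.array([
      [10.76, 106.70, 120.0, 18.5],
      [10.80, 106.65,  95.0, 15.2],
      [21.03, 105.85, 140.0, 20.1],
      [21.00, 105.80, 135.0, 19.4],
      [16.05, 108.20,  60.0, 12.3],
  ])

  X = StandardScaler().fit_transform(stores)          # data preparation
  Z = linkage(X, method='ward')                       # agglomerative clustering
  dendrogram(Z)                                       # dendrogram analysis
  segments = fcluster(Z, t=3, criterion='maxclust')   # e.g., cut into 3 segments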
Hierarchical Clustering vs. Partitioning-Based Clustering
● Hierarchical Clustering:
○ Produces nested clusters in a tree-like structure.
○ Offers exploration at different granularity levels.
○ Dendrogram provides visual insights into cluster relationships.
○ Suitable for diverse data types.
● Partitioning-Based Clustering:
○ Divides data into distinct, non-overlapping clusters.
○ Requires specifying the number of clusters (K).
○ Scales to large datasets; k-means in particular is efficient.
○ Offers flexibility in selecting different similarity measures.
Hierarchical Clustering vs. Partitioning-Based Clustering
● Choosing between hierarchical and partitioning-based clustering depends on
the data characteristics and analysis goals

Challenges and Considerations
● Hierarchical clustering methods come with their own set of challenges and
considerations.
● Challenges:
○ Dendrogram Interpretation: Determining the optimal number of clusters
from a dendrogram.
○ Computation Complexity: Agglomerative clustering can be resource-
intensive for large datasets.
○ Choice of Linkage: Selecting the appropriate linkage method based on
data characteristics.
○ Hierarchical Nature: Dendrograms may become complex and hard to
interpret.
Challenges and Considerations
● Considerations:
○ Experiment with different linkage methods.
○ Use silhouette scores or other metrics to evaluate cluster quality (see the example below).
○ Balance granularity and interpretability based on analysis goals.
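For example, silhouette scores can be used to compare linkage methods on the same data (scikit-learn assumed; toy data from make_blobs):

  from sklearn.cluster import AgglomerativeClustering
  from sklearn.datasets import make_blobs
  from sklearn.metrics import silhouette_score

  X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

  # Higher silhouette (range -1 to 1) indicates better-separated clusters.
  for link in ('single', 'complete', 'average', 'ward'):
      labels = AgglomerativeClustering(n_clusters=4, linkage=link).fit_predict(X)
      print(link, round(silhouette_score(X, labels), 3))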

Visualizing Hierarchical Clustering
● Visualizing hierarchical clustering results enhances our understanding of
formed clusters.
● Dendrogram Visualization:
○ Depicts merging/dividing process.
○ Provides insights into cluster relationships.
○ Helpful for selecting optimal cluster count.

Visualizing Hierarchical Clustering
● Visualizing hierarchical clustering results enhances our understanding of
formed clusters.
● Cluster Visualization:
○ Plot clusters in multidimensional space.
○ Use dimensionality reduction techniques (e.g., PCA, t-SNE), as sketched below.
○ Visualize cluster characteristics, separability, and patterns.
○ Visualizations empower us to extract meaningful insights from complex
hierarchical clustering outcomes.
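A small sketch of such a cluster plot using PCA for the 2-D projection (scikit-learn and matplotlib assumed; the Iris data stands in for real data):

  import matplotlib.pyplot as plt
  from sklearn.cluster import AgglomerativeClustering
  from sklearn.datasets import load_iris
  from sklearn.decomposition import PCA

  X = load_iris().data
  labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

  # Project the 4-D data to 2-D and color the points by cluster label.
  X2 = PCA(n_components=2).fit_transform(X)
  plt.scatter(X2[:, 0], X2[:, 1], c=labels, s=20)
  plt.xlabel('PC 1'); plt.ylabel('PC 2')
  plt.title('Agglomerative clusters in PCA space')
  plt.show()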

Hierarchical Clustering in Data Exploration
● Hierarchical clustering can be an effective tool for exploratory data analysis.
● Data Exploration Benefits:
○ Pattern Discovery: Identify inherent data patterns and relationships.
○ Anomaly Detection: Detect outliers and unusual data points.
○ Data Reduction: Group similar data points for summarization.
○ Segmentation: Uncover distinct segments or subgroups.
● By applying hierarchical clustering to data exploration, we unveil valuable
insights that guide subsequent analyses and decision-making.

Recap and Key Takeaways
● Agglomerative clustering merges data points into nested clusters.
● Divisive clustering recursively divides data points into clusters.
● Linkage methods (single, complete, average) influence clustering outcomes.
● BIRCH utilizes micro-clustering for memory-efficient clustering.
● Applications include retail store segmentation and gene expression analysis.
● As you continue your journey through hierarchical clustering, remember the
significance of dendrograms and the diverse applications of this method in
various fields.

Summary
Hierarchical Clustering
● Hierarchical clustering organizes data in a tree-like structure.
● Agglomerative starts with individual data points and merges them.
● Divisive begins with all data in one cluster and splits iteratively.
● Useful for exploring data hierarchy and relationships.

