
Image segmentation

By Prof. Jyotsna Singh


Division of ECE
What is segmentation?
• Segmentation divides an image into groups of pixels
• Pixels are grouped because they share some local property (gray level,
color, texture, motion, etc.)

[Figure: segmentation output shown as boundaries, labels, pseudocolors, and mean colors — different ways of displaying the output]


Region based segmentation
• Goal: find coherent (homogeneous) regions in the image
• Coherent regions contain pixels which share some similar property
• Advantages: better for noisy images
• Disadvantages: over-segmentation (too many regions) or under-segmentation (too few regions)
• Can’t find objects that span multiple disconnected regions


Two errors

oversegmentation
(hair should
be one group)

undersegmentation
(water should be
separated from trees)
Types of segmentations

Input Over-segmentation Under-segmentation

Multiple Segmentations
Image Segmentation
• So all we have to do is define and implement the similarity predicate.
• But what do we want to be similar in each region?
• Is there any property that will cause the regions to be meaningful objects?
Foreground / background separation

Background subtraction provides figure-ground separation, which is a type of segmentation.
Image segmentation
• Segmentation = partitioning
Carve dense data set into (disjoint) regions
– Divide image based on pixel similarity
– Divide spatiotemporal volume based on image similarity (shot
detection)
– Figure / ground separation (background subtraction)
– Regions can be overlapping (layers)

• Grouping = clustering
Gather sets of items according to some model
– If items are dense, then essentially the same problem as
above
(e.g., clustering pixels)
– If items are sparse, then problem has a slightly different
flavor:
• Collect tokens that lie on a line
• Collect pixels that share the same fundamental matrix
• Group 3D surface elements that belong to the same surface
Segmentation as partitioning
• A partition of an image I is a collection of sets S1, …, SN such that
  I = S1 ∪ S2 ∪ … ∪ SN (the sets cover the entire image)
  Si ∩ Sj = ∅ for all i ≠ j (the sets do not overlap)
• A predicate H(Si) measures region homogeneity:
  H(R) = true if the pixels in region R are similar
  H(R) = false otherwise
• We want
  1. Regions to be homogeneous: H(Si) = true for all i
  2. Adjacent regions to be different from each other: H(Si ∪ Sj) = false for all adjacent Si, Sj
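As a concrete illustration (not from the slides), the predicate H can be as simple as a bound on the gray-level spread of a region; the tolerance max_range below is an assumed parameter, and any other similarity test could be substituted.

```python
import numpy as np

def homogeneous(region_pixels, max_range=20):
    """H(R): True when the gray-level spread inside region R is small.

    region_pixels : array of gray values belonging to the region.
    max_range     : assumed tolerance; any other similarity test could be used.
    """
    region_pixels = np.asarray(region_pixels)
    return region_pixels.max() - region_pixels.min() <= max_range
```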
Region based segmentation
• An image domain X must be segmented into N different regions R1, …, RN
• The segmentation rule is a logical predicate of the form P(R)
• Image segmentation partitions the set X into the subsets Ri, i = 1, …, N, having the following properties:
  – Every pixel must be in a region.
  – Points in a region must be connected.
  – Regions must be disjoint.
  – All pixels in a region satisfy specific properties.
  – Different regions have different properties.


Region based segmentation
• Each image R is a set of regions Ri.
  – Every pixel belongs to one region.
  – One pixel can only belong to a single region.
[Figure: an image partitioned into regions R1–R7]
Thresholding
How can we divide an image into uniform regions?
• One of the simplest methods is that of histogram and
thresholding.
– If we plot the number of pixels which have a specific grey
value versus that value, we create the histogram of the
image.
– Properly normalized, the histogram is essentially the
probability density function of the grey values of the
image.
• Assume that we have an image consisting of a bright object
on a dark background and assume that we want to extract
the object.
How can we divide an image into uniform regions?
• For such an image, the histogram will have two peaks and a valley
between them.
• We can then choose as the threshold the grey value which corresponds to the valley of the histogram, indicated by t0, and label all pixels with grey values greater than t0 as object pixels and all pixels with grey values smaller than t0 as background pixels.

The histogram of an image with a bright object on a dark background.


What do we mean by “labelling” an image?

• When we say we “extract” an object in an image, we


mean that we identify the pixels that make the
object up.
• To express this information, we create an array of
the same size as the original image and we give to
each pixel a label.
• All pixels that make up the object are given the
same label and all pixels that make up the
background are given a different label.
How can we choose the minimum error threshold?
• Let us assume that the pixels which make up the object are
distributed according to the probability density function po(x)
and the pixels which make up the background are distributed
according to function pb(x).

• Their weighted sum, i.e. po(x) and pb(x) multiplied with the total
number of pixels that make up the object, No, and the
background, Nb, respectively, and added, is the histogram of the
image
How can we choose the minimum error threshold?
• Assume that we choose a threshold value t. Then the
error committed by misclassifying object pixels as
background pixels will be given by

• and the error committed by misclassifying


background pixels as object pixels is:

• In other words, the error that we commit arises from


misclassifying the two tails of the two probability
density functions on either side of threshold t.
How can we choose the minimum error threshold?
• Let us assume that the fraction of the pixels that make up the
object is θ, and,
• by inference, the fraction of the pixels that make up the
background is 1 − θ. Then, the total error is:

• We would like to choose t so that E(t) is minimum. We take


the first derivative of E(t) with respect to t and set it to zero:

• The solution of this equation gives the minimum error


threshold, for any type of probability density functions that
are used to model the two pixel populations.
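The equations referred to above appear as images in the original slides; they follow the standard minimum-error derivation and can be reconstructed as follows, assuming the object is brighter than the background and t is the threshold.

```latex
% Error from misclassifying object pixels (values below t) as background:
E_o(t) = \int_{-\infty}^{t} p_o(x)\,dx
% Error from misclassifying background pixels (values above t) as object:
E_b(t) = \int_{t}^{\infty} p_b(x)\,dx
% Total error, with \theta the fraction of object pixels:
E(t) = \theta \int_{-\infty}^{t} p_o(x)\,dx + (1-\theta) \int_{t}^{\infty} p_b(x)\,dx
% Setting dE/dt = 0 gives the minimum-error threshold condition:
\theta\, p_o(t) - (1-\theta)\, p_b(t) = 0
\quad\Rightarrow\quad \theta\, p_o(t) = (1-\theta)\, p_b(t)
```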
Example
Example
Image Segmentation
• Segmentation of an image entails the division
or separation of the image into regions of
similar attribute.

1. Region Growing
2. Clustering
3. Split and Merge
4. Watershed algorithm
1. Split-and-Merge
• Split-and-merge algorithm combines these two ideas
– Split image into quadtree, where each region satisfies homogeneity
criterion
– Merge neighboring regions if their union satisfies criterion (like
connected components)

image after split after merge


Two approaches

Splitting (Divisive clustering)
– start with a single region covering the entire image
– repeat: split inhomogeneous regions
– even better: repeat: split a cluster to yield two distant components (difficult)
– Property 2 is always true: H(Si ∪ Sj) = false for adjacent regions
– Goal is to satisfy Property 1: H(Si) = true for every region

Merging (Agglomerative clustering)
– start with each pixel as a separate region
– repeat: merge adjacent regions if their union is homogeneous
– even better: repeat: merge the two closest clusters
– Property 1 is always true: H(Si) = true for every region
– Goal is to satisfy Property 2: H(Si ∪ Sj) = false for adjacent regions
Split and Merge
• The basic idea of region splitting is to break the image into a set of disjoint regions which are coherent within themselves:
• Initially take the image as a whole to be the area of interest.
• Look at the area of interest and decide if all pixels contained in the region satisfy some similarity constraint.
• If TRUE, then the area of interest corresponds to a region in the image.
• If FALSE, split the area of interest (usually into four equal sub-areas) and consider each of the sub-areas as the area of interest in turn.
Region splitting
• Start with entire image as a single region
• Repeat:
– Split any region that does not satisfy homogeneity criterion into
subregions
• Quad-tree representation is convenient
• Then need to merge regions that have been split
[Figure: quad-tree splitting example with blocks Aa, Ab, B, C, D and sub-blocks Ada, Adc, Add]
Split and Merge

• This process continues until no further splitting occurs. In the


worst case this happens when the areas are just one pixel in
size.
• This is a divide and conquer or top down method.
• If only a splitting schedule is used then the final segmentation
would probably contain many neighbouring regions that have
identical or similar properties.
• Thus, a merging process is used after each split which
compares adjacent regions and merges them if necessary.
• Algorithms of this nature are called split and
merge algorithms.
• To illustrate the basic principle of these methods let us consider
an imaginary image.
Split and Merge
• Let I denote the whole image shown in Fig (a).
• Not all the pixels in I are similar so the region is split as in Fig
(b).
• Assume that all pixels within regions I1, I2 and I3 respectively
are similar but those in I4 are not.
• Therefore I4 is split next as in Fig (c).
• Now assume that all pixels within each region are similar with
respect to that region, and that after comparing the split
regions, regions I43 and I44 are found to be identical.
• These are thus merged together as in Fig (d).
Split and Merge
Split and Merge
We can describe the splitting of the image using a tree structure, using a modified quadtree.
Each non-terminal node in the tree has at most four descendants, although it may have fewer due to merging. See Fig.
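A minimal sketch of split-and-merge on a square, power-of-two gray-level image. The max–min gray-level spread used as the homogeneity criterion and the greedy merge order are assumptions for illustration, not the slides' exact method.

```python
import numpy as np

def split_merge(img, max_range=20, min_size=8):
    """Split-and-merge sketch: quadtree splitting followed by a greedy merge pass."""
    def homogeneous(values):
        return values.max() - values.min() <= max_range

    leaves = []                           # list of (row, col, size) leaf blocks

    def split(r, c, size):
        block = img[r:r + size, c:c + size]
        if homogeneous(block) or size <= min_size:
            leaves.append((r, c, size))
        else:
            h = size // 2                 # quarter the block (quadtree split)
            for dr, dc in [(0, 0), (0, h), (h, 0), (h, h)]:
                split(r + dr, c + dc, h)

    split(0, 0, img.shape[0])

    # Merge pass: join a leaf block to an already labelled neighbour when the
    # union of their pixels still satisfies the homogeneity predicate.
    labels = np.full(img.shape, -1, dtype=int)
    next_label = 0
    for r, c, size in leaves:
        block = img[r:r + size, c:c + size]
        merged = False
        for nbr_r, nbr_c in [(r - 1, c), (r, c - 1)]:      # look up and left
            if nbr_r >= 0 and nbr_c >= 0 and labels[nbr_r, nbr_c] != -1:
                lbl = labels[nbr_r, nbr_c]
                union = np.concatenate([img[labels == lbl].ravel(), block.ravel()])
                if homogeneous(union):
                    labels[r:r + size, c:c + size] = lbl
                    merged = True
                    break
        if not merged:
            labels[r:r + size, c:c + size] = next_label
            next_label += 1
    return labels
```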
2. Region Growing
• Region growing techniques start with one pixel of a potential region and try to grow it by adding adjacent pixels till the pixels being compared are too dissimilar.
• The first pixel selected can be just the first unlabeled pixel in the image.
• For region growing we need a rule describing a growth mechanism and a rule checking the homogeneity of the regions after each growth step.
Region Growing
• The region growing approach is the opposite of the split and merge approach:
• An initial set of small areas are iteratively merged according to similarity constraints.
• Start by choosing an arbitrary seed pixel and compare it with neighbouring pixels.
• The region is grown from the seed pixel by adding in neighbouring pixels that are similar, increasing the size of the region.
• When the growth of one region stops we simply choose another seed pixel which does not yet belong to any region and start again.
• This whole process is continued until all pixels belong to some region.
• A bottom-up method.
Region growing methods often give very good segmentations that correspond well to the observed edges.
Region Growing
• Start with a pixel, or a group of pixels, and examine the
neighboring pixels. If a neighboring pixel meets a
certain criteria, it is added to the group, and if it does
not meet the criteria, it is not added.
• This process is continued until no more neighboring
pixels can be added to the group. Thus, a region is
defined.
• The criterion depends on how you wish to segment the region.
• It can be a limit on the derivative between pixels, a change in color, or any other criterion you wish to use to differentiate between pixels.
Method One - Recursive Region Growing
The idea behind recursive region growing is as follows:
• Start with a pixel, and examine the eight pixels bordering it.
• If a pixel meets the criteria for addition to the group, you recursively call the function on that pixel.
• This process continues until all possible pixels have been examined.
• The problem with this recursive segmentation routine is processing power, because for a large region many, many pixels will be admissible to the region, and thus there will be many, many recursions before the recursion sequence terminates.
• In fact, with our implementation of the recursive region growing routine, we had the problem that MATLAB would crash with a segmentation fault.
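A sketch of region growing in Python (not the MATLAB routine mentioned above); it uses an explicit stack rather than recursion, which sidesteps the recursion-depth problem just described. The tolerance test against the running region mean is an assumed criterion.

```python
import numpy as np

def grow_region(img, seed, tol=15):
    """Grow a single region from `seed` (row, col) using an explicit stack.

    A neighbouring pixel is added when its gray value differs from the
    current region mean by at most `tol` (assumed criterion).
    """
    h, w = img.shape
    in_region = np.zeros((h, w), dtype=bool)
    in_region[seed] = True
    stack = [seed]
    total, count = float(img[seed]), 1       # running sum and size of the region

    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):                 # 8-connected neighbours
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and not in_region[nr, nc]:
                    if abs(float(img[nr, nc]) - total / count) <= tol:
                        in_region[nr, nc] = True
                        total += float(img[nr, nc])
                        count += 1
                        stack.append((nr, nc))
    return in_region
```

Repeating this from a fresh, still-unlabelled seed until every pixel belongs to some region gives the full bottom-up segmentation described above.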
Region growing
• Starting with a particular seed pixel and letting this region grow
completely before trying other seeds biases the segmentation in
favor of the regions which are segmented first.

• This can have several undesirable effects:

• Current region dominates the growth process -- ambiguities


around edges of adjacent regions may not be resolved
correctly.

• Different choices of seeds may give different segmentation


results.

• Problems can occur if the (arbitrarily chosen) seed point lies


on an edge.
Region growing
• To counter the above problems, simultaneous region
growing techniques have been developed.

• Similarities of neighbouring regions are taken into account in the


growing process.

• No single region is allowed to completely dominate the


proceedings.

• A number of regions are allowed to grow at the same time.

• similar regions will gradually coalesce into expanding regions.

• Control of these methods may be quite complicated but efficient


methods have been developed.

• Easy and efficient to implement on parallel computers.


3. Clustering
• There are K clusters C1, …, CK with means m1, …, mK.
• The least-squares error is defined as
  D = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} || x_i − m_k ||²
• Out of all possible partitions into K clusters, choose the one that minimizes D.
Why don’t we just do this?
If we could, would we get meaningful objects?
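Computing D for a given partition is direct; a small sketch (the array shapes are assumptions for illustration):

```python
import numpy as np

def squared_error(points, labels, means):
    """D = sum over clusters k of sum over x_i in C_k of ||x_i - m_k||^2.

    points : (n, d) array, labels : (n,) cluster index per point,
    means  : (K, d) array of cluster means m_1, ..., m_K.
    """
    diffs = points - means[labels]        # each point minus its own cluster mean
    return float(np.sum(diffs ** 2))
```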
Clustering
• Task of grouping a set of objects
• Objects in the same group (called a cluster) are more similar (in some sense or another) to each other
• An object in one cluster is different from an object in another cluster
• Connectivity model, centroid model, distribution model, density model, graph based model, hard clustering, soft clustering, …
An Ideal Clustering Situation
[Fig. 20.1: clusters plotted against Variable 1 and Variable 2]
More Common Clustering Situation
[Fig. 20.2: clusters plotted against Variable 1 and Variable 2, with a point marked X]
Statistics Associated with Cluster Analysis

• Agglomeration schedule. Gives information on the objects


being combined at each stage of a hierarchical clustering
process.
• Cluster centroid. Mean values of the variables for all the cases
in a particular cluster.
• Cluster centers. Initial starting points in nonhierarchical
clustering. Clusters are built around these centers, or seeds.
• Cluster membership. Indicates the cluster to which each
object or case belongs.
Statistics Associated with Cluster Analysis
• Dendrogram (A tree graph). A
graphical device for displaying
clustering results.
-Vertical lines represent clusters
that are joined together.
-The position of the line on the
scale indicates distances at which
clusters were joined.
• Distances between cluster centers.
These distances indicate how
separated the individual pairs of
clusters are. Clusters that are
widely separated are distinct, and
therefore desirable.
• Icicle diagram. Another type of
graphical display of clustering
results.
Dendrograms
Dendrogram yields a picture of output as clustering process continues

[Figure: raw data and the same clusters represented as a tree, from http://en.wikipedia.org/wiki/Dendrogram]
the icicle plot
• In their paper, Kruskal and Landwehr described a way to implement icicle plots with simple plotters.
• “In the icicle plot each vertical line topped by a label corresponds to an object.
• The label is repeated vertically with the symbol "&" used to separate successive copies,
down to the level where the object becomes a singleton cluster.
• Each horizontal line in the icicle plot shows one level of the clustering, as illustrated on
the right.
• Objects in the same cluster are joined by the symbol "=," while clusters are separated
by a blank space.
• At the left of the line are a serial number and proximity level for this stage of the
clustering.”
Conducting Cluster Analysis
Formulate the Problem

Select a Distance Measure

Select a Clustering Procedure

Decide on the Number of Clusters

Interpret and Profile Clusters

Assess the Validity of Clustering


Formulating the Problem
• Most important is selecting the variables on which
the clustering is based. Inclusion of even one or
two irrelevant variables may distort a clustering
solution.
• Variables selected should describe the similarity
between objects.
• Should be selected based on past research,
theory, or a consideration of the hypotheses being
tested.
Select a Similarity Measure
• Similarity measure can be correlations or distances

• The most commonly used measure of similarity is the


Euclidean distance. The city-block distance is also used.

• If variables measured in vastly different units, we must


standardize data. Also eliminate outliers

• Use of different similarity/distance measures may lead


to different clustering results.

• Hence, it is advisable to use different measures and


compare the results.
Classification of Clustering Procedures
Clustering Procedures
• Hierarchical
  – Agglomerative
    • Linkage Methods: Single Linkage, Complete Linkage, Average Linkage
    • Variance Methods: Ward’s Method
    • Centroid Methods
  – Divisive
• Nonhierarchical
  – Sequential Threshold
  – Parallel Threshold
  – Optimizing Partitioning
Hierarchical Clustering Methods
• Hierarchical clustering is characterized by the development of
a hierarchy or tree-like structure.
-Agglomerative clustering starts with each object in a
separate cluster. Clusters are formed by grouping objects into
bigger and bigger clusters. Agglomerative methods are
commonly used in marketing research.
-Divisive clustering starts with all the objects grouped in a
single cluster. Clusters are divided or split until each object is
in a separate cluster.
Hierarchical Agglomerative Clustering - Linkage Methods
• The single linkage method is based on minimum distance, or the nearest neighbor rule.
• The complete linkage method is based on the maximum distance or the furthest neighbor approach.
• In the average linkage method, the distance between two clusters is defined as the average of the distances between all pairs of objects.
Linkage Methods of Clustering
[Figure: Single Linkage — minimum distance between Cluster 1 and Cluster 2; Complete Linkage — maximum distance; Average Linkage — average distance]
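As a sketch of these three definitions, all of them can be read off the pairwise point-to-point distances between two clusters (NumPy assumed):

```python
import numpy as np

def linkage_distances(cluster1, cluster2):
    """Single, complete and average linkage distances between two clusters.

    cluster1 : (n1, d) array of points, cluster2 : (n2, d) array of points.
    """
    # Pairwise Euclidean distances between every point of cluster1 and cluster2.
    d = np.linalg.norm(cluster1[:, None, :] - cluster2[None, :, :], axis=2)
    return d.min(), d.max(), d.mean()   # single, complete, average linkage
```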
Hierarchical Agglomerative Clustering - Variance and Centroid Methods
• Variance methods generate clusters to minimize the within-cluster variance.
• Ward's procedure is commonly used. For each cluster, the sum of squares is calculated. The two clusters whose merger gives the smallest increase in the overall within-cluster sum of squares are combined.
• In the centroid methods, the distance between two clusters is the distance between their centroids (means for all the variables).
• Of the hierarchical methods, average linkage and Ward's methods have been shown to perform better than the other procedures.
Other Agglomerative Clustering Methods
Ward’s Procedure

Centroid Method
Select a Clustering Procedure
• The hierarchical and nonhierarchical methods should be used
in tandem.
-First, an initial clustering solution is obtained using a
hierarchical procedure (e.g. Ward's).
-The number of clusters and cluster centroids so obtained
are used as inputs to the optimizing partitioning method.
• Choice of a clustering method and choice of a distance
measure are interrelated.
• For example, squared Euclidean distances should be used
with the Ward's and centroid methods. Several
nonhierarchical procedures also use squared Euclidean
distances.
Nonhierarchical Clustering Methods
• The nonhierarchical clustering methods are frequently referred to as k-means clustering.
  – In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together.
  – In the parallel threshold method, several cluster centers are selected and objects within the threshold level are grouped with the nearest center.
  – The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as average within-cluster distance for a given number of clusters.
Centroid model
• Computational time is short
• The user has to decide the number of clusters before starting to classify the data
• The concept of the centroid
• One of the most famous methods: the K-means method


Idea Behind K-Means
Algorithm for K-means clustering
1. Partition items into K clusters.
2. Assign items to cluster with nearest centroid mean.
3. Recalculate centroids both for cluster receiving and
losing item.
4. Repeat steps 2 and 3 till no more reassignments.
k-Means Algorithm
• k-Means clustering algorithm proposed by J. Hartigan and M. A. Wong
[1979].
• Given a set of n distinct objects, the k-Means clustering algorithm partitions
the objects into k number of clusters such that intracluster similarity is high
but the intercluster similarity is low.
• In this algorithm, the user has to specify k, the number of clusters; the objects are assumed to be described by numeric attributes, so any one of the distance metrics can be used to demarcate the clusters.
k-Means Algorithm
The algorithm can be stated as follows.
• First it selects k number of objects at random from the set of n objects. These
k objects are treated as the centroids or center of gravities of k clusters.
• For each of the remaining objects, it is assigned to one of the closest centroid.
Thus, it forms a collection of objects assigned to each centroid and is called a
cluster.
• Next, the centroid of each cluster is then updated (by calculating the mean
values of attributes of each object).
• The assignment and update procedure is repeated until some stopping criterion is reached (such as a maximum number of iterations, centroids remaining unchanged, or no reassignments).



k-Means Algorithm
k-Means clustering
Algorithm
Input: D is a dataset containing n objects, k is the number of cluster
Output: A set of k clusters
Steps:
1. Randomly choose k objects from D as the initial cluster centroids.
2. For each of the objects in D do
   • Compute the distance between the current object and the k cluster centroids
   • Assign the current object to the cluster to which it is closest.

3. Compute the “cluster centers” of each cluster. These become the new cluster centroids.
4. Repeat steps 2-3 until the convergence criterion is satisfied
5. Stop
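The steps above can be written directly in code; the following is a minimal NumPy sketch of the algorithm as stated (not a library routine), assuming numeric attributes and the L2 norm.

```python
import numpy as np

def k_means(X, k, max_iter=100, rng=None):
    """Plain k-Means following the steps above.

    X : (n, d) array of objects with numeric attributes; k : number of clusters.
    Returns the final cluster labels and centroids.
    """
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(len(X), size=k, replace=False)]      # step 1

    for _ in range(max_iter):
        # Step 2: assign each object to the closest centroid (Euclidean / L2 norm).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)

        # Step 3: recompute the cluster centres (keep old centroid if a cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```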
k-Means Algorithm

Illustration of k-Means clustering algorithms
• Fig 16.1: Plotting data of Table 16.1

A1 A2
6.8 12.6
0.8 9.8
1.2 11.6
2.8 9.6
3.8 9.9
4.4 6.5
4.8 1.1
6.0 19.9
6.2 18.5
7.6 17.4
7.8 12.2
6.6 7.7
8.2 4.5
8.4 6.9
9.0 3.4
9.6 11.1
Illustration of k-Means clustering algorithms
• Suppose, k=3. Three objects are chosen at random shown as circled (see Fig
16.1). These three centroids are shown below.
Initial centroids chosen randomly

Centroid   A1    A2
c1         3.8   9.9
c2         7.8   12.2
c3         6.2   18.5
• Let us consider the Euclidean distance measure (L2 Norm) as the distance
measurement in our illustration.
• Let d1, d2 and d3 denote the distance from an object to c1, c2 and c3
respectively. The distance calculations are shown in Table 16.2.
• Assignment of each object to the respective centroid is shown in the
right-most column and the clustering so obtained is shown in Fig 16.2.



Illustration of k-Means clustering algorithms
Table 16.2: Distance calculation (Fig 16.2: initial clusters with respect to Table 16.2)

A1    A2    d1    d2    d3    cluster
6.8 12.6 4.0 1.1 5.9 2
0.8 9.8 3.0 7.4 10.2 1
1.2 11.6 3.1 6.6 8.5 1
2.8 9.6 1.0 5.6 9.5 1
3.8 9.9 0.0 4.6 8.9 1
4.4 6.5 3.5 6.6 12.1 1
4.8 1.1 8.9 11.5 17.5 1
6.0 19.9 10.2 7.9 1.4 3
6.2 18.5 8.9 6.5 0.0 3
7.6 17.4 8.4 5.2 1.8 3
7.8 12.2 4.6 0.0 6.5 2
6.6 7.7 3.6 4.7 10.8 1
8.2 4.5 7.0 7.7 14.1 1
8.4 6.9 5.5 5.3 11.8 2
9.0 3.4 8.3 8.9 15.4 1
9.6 11.1 5.9 2.1 8.1 2



Illustration of k-Means clustering algorithms
The calculation new centroids of the three cluster using the mean of attribute
values of A1 and A2 is shown in the Table below. The cluster with new centroids
are shown in Fig 16.3.

Calculation of new centroids

New Centroid   A1    A2
c1             4.6   7.1
c2             8.2   10.7
c3             6.6   18.6

Fig 16.3: Initial cluster with new centroids


Illustration of k-Means clustering algorithms
We next reassign the 16 objects to three clusters by determining which centroid is
closest to each one. This gives the revised set of clusters shown in Fig 16.4.
Note that point p moves from cluster C2 to cluster C1.

Fig 16.4: Cluster after first iteration



Illustration of k-Means clustering algorithms
• The newly obtained centroids after second iteration are given in the table below.
Note that the centroid c3 remains unchanged, where c2 and c1 changed a little.
• With respect to newly obtained cluster centres, 16 points are reassigned again.
These are the same clusters as before. Hence, their centroids also remain
unchanged.
• Considering this as the termination criteria, the k-means algorithm stops here.
Hence, the final cluster in Fig 16.5 is same as Fig 16.4.

Fig 16.5: Cluster after Second iteration

Cluster centres after second iteration

Centroid   A1 (revised)   A2 (revised)
c1         5.0            7.1
c2         8.1            12.0
c3         6.6            18.6



Comments on k-Means algorithm



Partitional Clustering
• K-means algorithm:
  – Decide the number of clusters N for the final result; here we assume N = 3.
  – Randomly choose N points as the initial centroids of the clusters (N = 3).
  – Assign every point to the cluster of its nearest centroid (notice the definition of “nearest”!).
  – Calculate the new centroid of every cluster (notice the definition of the centroid!).
  – Repeat the assignment and centroid-update steps until all the points are classified and the assignments no longer change.
• Data clustering completed (for N = 3).
Example

Image Clusters on intensity Clusters on color


Example
Input image Segmentation using K-means
K-Means Example 1
K-Means Example 2
[Figure: comparison of the original image, K-means with K = 6, and Isodata where K became 5]
Comments on k-Means algorithm
Example : k versus cluster quality
• With reference to an arbitrary experiment, suppose the following results are
obtained.

k SSE
1 62.8
2 12.3
3 9.4
4 9.3
5 9.2
6 9.1
7 9.05
8 9.0
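A common way to read such a k-versus-SSE table is the “elbow” heuristic: plot SSE against k and pick the k beyond which the curve flattens. The sketch below simply plots the figures above (matplotlib assumed).

```python
import numpy as np
import matplotlib.pyplot as plt

k_values = np.arange(1, 9)
sse = np.array([62.8, 12.3, 9.4, 9.3, 9.2, 9.1, 9.05, 9.0])

plt.plot(k_values, sse, marker='o')
plt.xlabel('k')
plt.ylabel('SSE')
plt.show()   # the "elbow" of the curve suggests a suitable k
```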
Comments on k-Means algorithm
2. Choosing initial centroids:
• Another requirement of the k-Means algorithm is to choose an initial cluster centroid for each of the k clusters.
• It is observed that the k-Means algorithm terminates whatever the initial choice of the cluster centroids.
• It is also observed that the initial choice influences the ultimate cluster quality. In other words, the result may be trapped in a local optimum if the initial centroids are not chosen properly.
• One technique that is usually followed to avoid the above problem is to run the algorithm multiple times, each with a different set of randomly chosen initial centroids, and then select the best clustering (with respect to some quality measurement criterion, e.g. SSE).
• However, this strategy suffers from a combinatorial explosion due to the number of all possible solutions.
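A minimal sketch of the multiple-runs strategy described above, reusing the k_means() and squared_error() sketches given earlier in this document (these helper names are illustrative, not a library API):

```python
def best_of_n_runs(X, k, n_runs=10):
    """Run k-Means several times with different random initial centroids
    and keep the clustering with the lowest SSE."""
    best = None
    for seed in range(n_runs):
        labels, centroids = k_means(X, k, rng=seed)
        sse = squared_error(X, labels, centroids)
        if best is None or sse < best[0]:
            best = (sse, labels, centroids)
    return best   # (best SSE, labels, centroids)
```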
Comments on k-Means algorithm
3. Distance Measurement:
• To assign a point to the closest centroid, we need a proximity measure that quantifies the notion of “closest” for the objects under clustering.
• Usually the Euclidean distance (L2 norm) is the best measure when object points are defined in n-dimensional Euclidean space.
• Another measure, namely cosine similarity, is more appropriate when the objects are of document type.
• Further, there may be other types of proximity measures that are appropriate in the context of the application, for example the Manhattan distance (L1 norm), the Jaccard measure, etc.
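For reference, minimal implementations of the three measures mentioned above (a sketch; NumPy assumed, with a and b as 1-D vectors):

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))        # L2 norm

def manhattan(a, b):
    return np.sum(np.abs(a - b))                # L1 norm

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```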
Comments on k-Means algorithm
Distance with document objects
Suppose a set of n document objects is defined as a document-term matrix (DTM); a typical layout is shown below.

Document   t1   t2   …   tn
D1
D2
…
Dn
Comments on k-Means algorithm
Note: the criteria of the objective function with different proximity measures

1. SSE (using the L2 norm): minimize the SSE (sum of squared errors).
2. SAE (using the L1 norm): minimize the SAE (sum of absolute errors).
3. TC (using cosine similarity): maximize the TC (total cohesion).
Comments on k-Means algorithm
Question: interpret the best centroid for maximizing TC (with the cosine similarity measure) of a cluster.

The above discussion is quite sufficient for the validation of the k-Means algorithm.
Different variants of the k-means algorithm
There are quite a few variants of the k-Means algorithm. These can differ in the procedure for selecting the initial k means, the calculation of proximity, and the strategy for calculating cluster means. Other variants of k-means cluster categorical data.
A few variants of the k-Means algorithm include:
• Bisecting k-Means (addressing the issue of the initial choice of cluster means).
  – M. Steinbach, G. Karypis and V. Kumar, “A comparison of document clustering techniques”, Proceedings of the KDD Workshop on Text Mining, 2000.
• Means of clusters (proposing various strategies to define means and variants of means).
  – B. Zhang, “Generalized k-Harmonic Means – Dynamic Weighting of Data in Unsupervised Learning”, Technical Report, HP Labs, 2000.
  – A. D. Chaturvedi, P. E. Green, J. D. Carroll, “k-Modes Clustering”, Journal of Classification, Vol. 18, pp. 35-36, 2001.
  – D. Pelleg, A. Moore, “x-Means: Extending k-Means with efficient estimation of the number of clusters”, 17th International Conference on Machine Learning, 2000.
Different variants of the k-means algorithm
• N. B. Karayiannis, M. M. Randolph-Gips, “Non-Euclidean c-Means clustering algorithms”, Intelligent Data Analysis, Vol. 7(5), pp. 405-425, 2003.
• J. V. de Oliveira, W. Pedrycz, “Advances in Fuzzy Clustering and its Applications”, edited book, John Wiley, 2007. (Fuzzy c-Means algorithm.)
• A. K. Jain and R. C. Dubes, “Algorithms for Clustering Data”, Prentice Hall, 1988. Online book at http://www.cse.msu.edu/~jain/clustering_Jain_Dubes.pdf
• A. K. Jain, M. N. Murty and P. J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, 31(3), 264-323, 1999. Also available online.
