0% found this document useful (0 votes)
14 views5 pages

Machine Learning Week 8 Homework

Uploaded by

newborniphone
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views5 pages

Machine Learning Week 8 Homework

Uploaded by

newborniphone
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

Homework 08

INT3401E 57

Student: Lam Hoang Hai


Student ID: 22028057

Exercise 1 [K-means clustering].


Given 8 data points in 2-D space: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 =
(5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Suppose that the initial
seeds (centers of each cluster) are A1 , A4 and A7 . Run the k-means algorithm
for 1 epoch only. At the end of this epoch show: the new clusters (i.e., the data
points belonging to each cluster) and the centers of the new clusters.

Solution
Step 1: Initial Setup
The initial cluster centers are:
Cluster 1: C1 = A1 = (2, 10)
Cluster 2: C2 = A4 = (5, 8)
Cluster 3: C3 = A7 = (1, 2)

Step 2: Assign Each Point to the Nearest Cluster


We calculate the Euclidean distance from each point to each cluster center.
For two points (x1 , y1 ) and (x2 , y2 ), the Euclidean distance is:
p
d = (x2 − x1 )2 + (y2 − y1 )2
1. Point A1 = (2, 10)
p
d(A1 , C1 ) = (2 − 2)2 + (10 − 10)2 = 0
p √ √
d(A1 , C2 ) = (5 − 2)2 + (8 − 10)2 = 9 + 4 = 13
p √ √
d(A1 , C3 ) = (1 − 2)2 + (2 − 10)2 = 1 + 64 = 65
Assign A1 to Cluster 1

1
2. Point A2 = (2, 5)
p √
d(A2 , C1 ) = (2 − 2)2 + (5 − 10)2 = 25 = 5
p √ √
d(A2 , C2 ) = (5 − 2)2 + (8 − 5)2 = 9 + 9 = 18
p √ √
d(A2 , C3 ) = (1 − 2)2 + (2 − 5)2 = 1 + 9 = 10
Assign A2 to Cluster 3

3. Point A3 = (8, 4)
p √ √
d(A3 , C1 ) = (8 − 2)2 + (4 − 10)2 = 36 + 36 = 72
p √ √
d(A3 , C2 ) = (5 − 8)2 + (8 − 4)2 = 9 + 16 = 25 = 5
p √ √
d(A3 , C3 ) = (1 − 8)2 + (2 − 4)2 = 49 + 4 = 53
Assign A3 to Cluster 2

4. Point A4 = (5, 8)
p √ √
d(A4 , C1 ) = (5 − 2)2 + (8 − 10)2 = 9 + 4 = 13

d(A4 , C2 ) = 0 (since A4 is the initial center of C2 )


p √ √
d(A4 , C3 ) = (1 − 5)2 + (2 − 8)2 = 16 + 36 = 52
Assign A4 to Cluster 2

5. Point A5 = (7, 5)
p √ √
d(A5 , C1 ) = (7 − 2)2 + (5 − 10)2 = 25 + 25 = 50
p √ √
d(A5 , C2 ) = (5 − 7)2 + (8 − 5)2 = 4 + 9 = 13
p √ √
d(A5 , C3 ) = (1 − 7)2 + (2 − 5)2 = 36 + 9 = 45
Assign A5 to Cluster 2

6. Point A6 = (6, 4)
p √ √
d(A6 , C1 ) = (6 − 2)2 + (4 − 10)2 = 16 + 36 = 52
p √ √
d(A6 , C2 ) = (5 − 6)2 + (8 − 4)2 = 1 + 16 = 17
p √ √
d(A6 , C3 ) = (1 − 6)2 + (2 − 4)2 = 25 + 4 = 29
Assign A6 to Cluster 2

2
7. Point A7 = (1, 2)
p √ √
d(A7 , C1 ) = (1 − 2)2 + (2 − 10)2 = 1 + 64 = 65
p √ √
d(A7 , C2 ) = (5 − 1)2 + (8 − 2)2 = 16 + 36 = 52
d(A7 , C3 ) = 0 (since A7 is the initial center of C3 )
Assign A7 to Cluster 3

8. Point A8 = (4, 9)
p √ √
d(A8 , C1 ) = (4 − 2)2 + (9 − 10)2 = 4 + 1 = 5
p √ √
d(A8 , C2 ) = (5 − 4)2 + (8 − 9)2 = 1 + 1 = 2
p √ √
d(A8 , C3 ) = (1 − 4)2 + (2 − 9)2 = 9 + 49 = 58
Assign A8 to Cluster 2

After this assignment, the clusters are as follows: - Cluster 1: A1 - Cluster


2: A3 , A4 , A5 , A6 , A8 - Cluster 3: A2 , A7

Step 3: Calculate New Cluster Centers


Calculate the mean (average) of the points in each cluster to find the new
centers.

• New Center of Cluster 1:

C1 = A1 = (2, 10)

• New Center of Cluster 2:


   
8+5+7+6+4 4+8+5+4+9 30 30
C2 = , = , = (6, 6)
5 5 5 5

• New Center of Cluster 3:


   
2+1 5+2 3 7
C3 = , = , = (1.5, 3.5)
2 2 2 2

Conclusion
After one epoch, the clusters and their new centers are:
• Cluster 1: Points {A1 }, New Center: (2, 10)
• Cluster 2: Points {A3 , A4 , A5 , A6 , A8 }, New Center: (6, 6)
• Cluster 3: Points {A2 , A7 }, New Center: (1.5, 3.5)

3
Exercise 2 [DBScan].
If ϵ = 2 and min sample is 2, what are the clusters that DBScan√ would discover
with the 8 data points in Exercise 1 ? What if ϵ is increased to 10?

Solution
Step 1: Determine Core Points with ϵ = 2
A point is considered a core point if it has at least min samples points (in-
cluding itself) within a distance of ϵ = 2. We calculate the distances between
each pair of points to determine the core points.

Distance Matrix
We first compute the distance between each pair of points using the Euclidean
distance formula: p
d = (x2 − x1 )2 + (y2 − y1 )2
The distance matrix for the points is as follows (rounded to two decimal places):

A1 A2 A3 A4 A5 A6 A7 A8
A1 0 5 7.21 3.61 6.71 7.21 8.25 2.24
A2 5 0 6.32 5 5.39 5.10 3.16 4.47
A3 7.21 6.32 0 4.24 1.41 2 7.07 5
A4 3.61 5 4.24 0 3.61 4.12 6.71 1.41
A5 6.71 5.39 1.41 3.61 0 1 6.08 4.47
A6 7.21 5.10 2 4.12 1 0 5.39 5
A7 8.25 3.16 7.07 6.71 6.08 5.39 0 7.28
A8 2.24 4.47 5 1.41 4.47 5 7.28 0

Step 2: Identify Core Points


With ϵ = 2 and min samples = 2, a point is a core point if it has at least 2
points within a distance of 2.
From the distance matrix, we find the core points as follows: - A3 : Has A5 and
A6 within a distance of ϵ. - A5 : Has A3 and A6 within a distance of ϵ. - A6 :
Has A3 and A5 within a distance of ϵ.
Thus, the core points are A3 , A5 , and A6 .

Step 3: Form Clusters


Starting from the core points, we form clusters by including points within ϵ
distance of each core point.
1. Start with core point A3 : - A5 and A6 are within distance ϵ = 2. - Form
a cluster with A3 , A5 , and A6 .
Since A5 and A6 are already part of the cluster, no new clusters are formed.

4
2. Remaining points A1 , A2 , A4 , A7 , and A8 do not have enough neighbors
within ϵ = 2 to form a cluster. These points are labeled as noise.

Result for ϵ = 2
With ϵ = 2 and min samples = 2: - We have one cluster: {A3 , A5 , A6 } - The
points A1 , A2 , A4 , A7 , and A8 are noise.

Step 4: Increase ϵ to 10

Now, we increase ϵ to 10 ≈ 3.16 and re-evaluate.

Identify Core Points with ϵ = 10

With ϵ = 10 and min samples = 2, we identify the core points again.
- A1 : Has A4 and A8 within ϵ. - A3 : Has A5 and A6 within ϵ. - A4 : Has A1
and A8 within ϵ. - A5 : Has A3 and A6 within ϵ. - A6 : Has A3 and A5 within
ϵ. - A8 : Has A1 and A4 within ϵ.
Thus, the core points are A1 , A3 , A4 , A5 , A6 , and A8 .

Form Clusters with ϵ = 10
Using the core points, we form clusters as follows:
1. Start with core point A1 : - A4 and A8 are within distance ϵ. - Form a
cluster: {A1 , A4 , A8 }.
2. Start with core point A3 : - A5 and A6 are within distance ϵ. - Form
another cluster: {A3 , A5 , A6 }.
3. Points A2 and A7 are not connected to any cluster and remain as noise.

Result for ϵ = 10

With ϵ = 10 and min samples = 2: - We have two clusters: - Cluster 1:
{A1 , A4 , A8 } - Cluster 2: {A3 , A5 , A6 } - The points A2 and A7 are noise.

You might also like