Assignment 2.1

Uploaded by

ARJU Zerin
Assignment 2.1 Due Date 21/10/2021

Hierarchical clustering:

1. We have 2-dimensional iris data for 20 flowers here, i.e., 20 2-dimensional points.
a. Use hierarchical clustering to form 3 clusters, so you will stop when three clusters
have been formed. (hand calculation)
b. Draw the tree with pen and paper, showing how you combine the points to create the three clusters.
(example: fig 7.6, page 262, MMDS book)
2. Write a small Python program to do Q1.
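
One possible shape for the Q2 program is sketched below. The assignment does not fix the merge rule, so this sketch assumes the Euclidean distance between cluster centroids (the centroid-based variant of hierarchical clustering). The function names and the six sample points are made up for illustration and are not the iris data.

```python
import math


def centroid(cluster):
    """Mean of a list of 2-D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def hierarchical(points, k):
    """Merge the two closest clusters until only k clusters remain.

    Assumption: closeness is the Euclidean distance between centroids.
    Returns the final clusters and the list of merges, in order.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Delete b first: b > a, so index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return clusters, merges


# Made-up points, not the assignment's iris data.
sample = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 5.2), (9.0, 1.0), (9.2, 1.1)]
final_clusters, merge_log = hierarchical(sample, 3)
for c in final_clusters:
    print(sorted(c))
```

The merge log records which clusters were combined at each step, which is the same information needed to draw the tree in Q1b.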

K-means clustering:

Download dataset from here. Use data0.txt for your data file.

Data Description:

1. Each line contains one data point.
2. The first column is the point index. The point index is not used in the calculation; it is for
identification only.
3. After the index, each line gives 50 features (50-dimensional data) as comma-separated values.
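
The format described above can be read with a few lines of Python. This is a minimal sketch, not the loader from the provided skeleton: the helper names `parse_line` and `load_points` are hypothetical, and the code assumes the index is separated from the features by a comma like the other values.

```python
def parse_line(line):
    """Split one line into (point index, list of float features).

    Assumption: every field, including the index, is comma separated.
    """
    fields = line.strip().split(",")
    index = fields[0].strip()
    features = [float(v) for v in fields[1:]]
    return index, features


def load_points(path):
    """Read all non-empty lines of a data file such as data0.txt."""
    indices, data = [], []
    with open(path) as f:
        for line in f:
            if line.strip():
                idx, feats = parse_line(line)
                indices.append(idx)
                data.append(feats)
    return indices, data


# Tiny made-up line with 3 features instead of 50, for illustration only.
idx, feats = parse_line("7,0.5,-1.25,3.0\n")
print(idx, feats)
```

Keeping the indices in a separate list makes it easy to leave them out of the distance calculations and still write them next to the cluster numbers later.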

Task: TODO

1. Implement the k-means clustering algorithm on these data.
2. A sample implementation file is provided; please download it. You can complete the TODOs in either
notebook or Python script style. The implementation file is only a skeleton: fill in your own code
blocks where asked.
3. First, the data are loaded.
4. A fixed number of points are sampled randomly from the loaded data.
a. Inside the kmeans function, use initialize_centroids_simple() to initialize your centroids.
This is the simple initialization function you need to implement: randomly select K points
from the sampled data and use them as the initial centroids.
b. Then, in the kmeans function, write your own code to count the number of
points assigned to each cluster and store the counts in the defined structure.
c. Then write your own code to terminate the process based on the termination criterion
discussed in class. We evaluate the quality of the clustering using the clustering
objective J, where N is the total number of sampled points, x_i is the i-th data point, and z_k is
the centroid of the k-th cluster. The algorithm terminates when J is nearly equal in two successive
iterations (e.g., we terminate when |J − Jprev| ≤ 10^-5 J, where Jprev is the value of J after
the previous iteration).

d. In the main function, finally save your outputs.


i. Write all final centroids to the out1 file, one line per centroid, with features
separated by commas.
ii. Write the cluster assignment for each point, one per line, in the form: point index,
cluster number
5. Now use initialize_centroids() to initialize the centroids in the following way.
a. Calculate the max and min feature value for each dimension.
b. Compute diff = max - min for each dimension.
c. For each centroid j, in each dimension i, assign centroids[j][i] = min_feature_val + diff *
random.uniform(1e-5, 1)
6. Then repeat steps 4b, 4c, and 4d.
7. Compare the outputs of 4d with those obtained using the initialization in 5.
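
The steps above could be sketched in plain Python as follows. This is not the provided skeleton, and several details are assumptions: J is taken to be the mean squared distance from each point to its nearest centroid, `max_iter` is a safety cap added for the sketch, an empty cluster simply keeps its old centroid, and `write_outputs` with its path parameters is a hypothetical helper. The demo points are made up.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initialize_centroids_simple(data, k):
    """Step 4a: pick k distinct sampled points as the starting centroids."""
    return [list(p) for p in random.sample(data, k)]


def initialize_centroids(data, k):
    """Step 5: draw each coordinate between that dimension's min and max."""
    dims = len(data[0])
    lo = [min(p[i] for p in data) for i in range(dims)]
    hi = [max(p[i] for p in data) for i in range(dims)]
    return [[lo[i] + (hi[i] - lo[i]) * random.uniform(1e-5, 1) for i in range(dims)]
            for _ in range(k)]


def kmeans(data, k, init, tol=1e-5, max_iter=100):
    """Alternate assignment and update until |J - Jprev| <= tol * J.

    Assumption: J is the mean squared distance to the nearest centroid.
    """
    centroids = init(data, k)
    j_prev = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data]
        counts = [labels.count(c) for c in range(k)]  # step 4b
        j = sum(sq_dist(p, centroids[l]) for p, l in zip(data, labels)) / len(data)
        if j_prev is not None and abs(j - j_prev) <= tol * j:  # step 4c
            break
        j_prev = j
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            if counts[c]:
                members = [p for p, l in zip(data, labels) if l == c]
                centroids[c] = [sum(col) / counts[c] for col in zip(*members)]
    return centroids, labels, counts, j


def write_outputs(centroids, indices, labels, centroid_path, assign_path):
    """Step 4d: one centroid per line, then one 'index,cluster' line per point."""
    with open(centroid_path, "w") as f:
        for c in centroids:
            f.write(",".join(str(v) for v in c) + "\n")
    with open(assign_path, "w") as f:
        for idx, label in zip(indices, labels):
            f.write(f"{idx},{label}\n")


# Made-up 2-D points; the assignment data has 50 features per point.
random.seed(0)
demo = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]]
for init in (initialize_centroids_simple, initialize_centroids):
    centroids, labels, counts, j = kmeans(demo, 2, init)
    print(init.__name__, counts, round(j, 4))
```

Passing the initializer as an argument lets the same kmeans loop serve both runs, so any difference in the final J or in the cluster counts can be attributed to the initialization alone.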
