Assignment 2.1
Hierarchical clustering:
1. We have 2-dimensional iris data for 20 flowers here, i.e. 20 two-dimensional points.
a. Use hierarchical clustering to form 3 clusters, i.e. stop merging once three clusters
remain. (hand calculation)
b. Draw the tree with pen and paper, showing how you combine the points to create the three
clusters. (example: fig 7.6, page 262, MMDS book)
2. Write a small Python program to do Q1.
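The merging procedure in Q1 can be sketched as below. This is only a minimal illustration, not a reference solution: the sheet does not state a linkage rule, so the sketch assumes clusters are compared by the Euclidean distance between their centroids. The names centroid() and agglomerate() are hypothetical.

```python
import math


def centroid(cluster):
    """Mean of the points in a cluster, one value per dimension."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Assumption: closeness is the Euclidean distance between cluster centroids.
    Returns the final clusters and the list of merges (useful for drawing the tree).
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    merges = []
    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


if __name__ == "__main__":
    # Made-up toy points, not the iris data from the assignment.
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
    final, history = agglomerate(pts, 3)
    for a, b, d in history:
        print(f"merge {a} + {b} at distance {d:.3f}")
```

The merge history gives the order in which branches join, which is the information needed to draw the tree in Q1(b).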
K-means clustering:
Download the dataset from here. Use data0.txt as your data file.
Data Description:
Task: TODO
Where N is the total number of sampled points, x_i is the i-th data point, and z_k is the
centroid of the k-th cluster. The algorithm terminates when J is nearly equal in two
successive iterations (e.g., we terminate when |J − J_prev| ≤ 10^(-5) · J, where J_prev is
the value of J after the previous iteration).
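The stopping rule above can be sketched as follows. The definition of J is not shown on this sheet, so the sketch assumes J is the mean squared Euclidean distance from each point to its assigned centroid. That choice, the function names, and the max_iter safety cap are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, tol=1e-5, max_iter=1000):
    """Plain k-means that stops when |J - J_prev| <= tol * J.

    Assumption: J is the mean squared distance of points to their centroids.
    """
    centroids = [list(c) for c in centroids]
    j_prev = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, c in zip(points, labels) if c == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        # Objective after this iteration.
        j = sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels)) / len(points)
        if j_prev is not None and abs(j - j_prev) <= tol * j:
            break
        j_prev = j
    return centroids, labels, j
```

Using a relative tolerance (tol * J) rather than an absolute one keeps the test independent of the scale of the features.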
i. Write all final centroids to the out1 file, one line per centroid, with features
separated by commas.
ii. Write the cluster assignment for each point, one point per line, in the form: point
index, cluster number
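The two output formats could be produced along these lines. This is a sketch under assumptions: the sheet does not name the file for the assignments or say whether point indices start at 0 or 1, so the sketch uses 0-based indices and a hypothetical file name out2. The helper names are also hypothetical.

```python
def centroid_lines(centroids):
    """One line per centroid, features separated by commas."""
    return [",".join(str(v) for v in c) for c in centroids]


def assignment_lines(labels):
    """One line per point: point index, cluster number.

    Assumption: point indices are 0-based.
    """
    return [f"{i},{k}" for i, k in enumerate(labels)]


def write_lines(path, lines):
    """Write the given lines to a text file, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Made-up values for illustration only.
    write_lines("out1", centroid_lines([[0.0, 1.0], [10.0, 1.0]]))
    write_lines("out2", assignment_lines([0, 0, 1, 1]))  # "out2" is an assumed name
```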
5. Now use initialize_centroids() to initialize the centroids in the following way.
a. Calculate the maximum and minimum feature value for each dimension.
b. Compute diff = max - min for each dimension.
c. For each centroid j and each dimension i, assign
centroids[j][i] = min_feature_val + diff * random.uniform(1e-5, 1)
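Steps a to c can be sketched as below. Only the function name initialize_centroids() and the formula come from the sheet. The signature (points, k, rng) is an assumption, with rng added so that a seeded generator can be passed in for reproducible runs.

```python
import random


def initialize_centroids(points, k, rng=random):
    """Random centroids inside the per-dimension range of the data.

    Assumed signature: points is a list of equal-length numeric sequences,
    k is the number of centroids, rng is any object with a uniform(a, b) method.
    """
    dims = len(points[0])
    # a. max and min feature value for each dimension
    mins = [min(p[i] for p in points) for i in range(dims)]
    maxs = [max(p[i] for p in points) for i in range(dims)]
    # b. diff = max - min for each dimension
    diffs = [mx - mn for mx, mn in zip(maxs, mins)]
    # c. centroids[j][i] = min_feature_val + diff * random.uniform(1e-5, 1)
    return [
        [mins[i] + diffs[i] * rng.uniform(1e-5, 1) for i in range(dims)]
        for _ in range(k)
    ]
```

Calling random.seed(...) first, or passing random.Random(seed) as rng, makes the initial centroids repeatable between runs.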