K-Means algorithm
Aim
To implement the K-Means clustering algorithm in Python and group a given
dataset into clusters.
Algorithm (K-Means)
1. Choose the number of clusters k.
2. Initialize k centroids randomly.
3. Repeat until convergence:
   - Assign each data point to the nearest centroid (using Euclidean
     distance).
   - Recompute centroids as the mean of all points assigned to each
     cluster.
4. Stop when centroids do not change significantly or after a fixed number
of iterations.
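The steps above can be sketched from scratch with NumPy. This is only a minimal illustration under stated assumptions, not the program used in this exercise: the function name kmeans_sketch, its parameters (max_iter, tol, seed), and the demo data are hypothetical. Initial centroids are drawn from the data points themselves, which is one common way to "initialize randomly".

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=100, tol=1e-4, seed=0):
    """Illustrative K-Means loop; name and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3a: Euclidean distance from every point to every centroid,
        # then assign each point to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3b: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids have moved less than tol.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels


# Demo on two well-separated synthetic groups (hypothetical data).
demo_rng = np.random.default_rng(1)
demo_X = np.vstack([
    demo_rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    demo_rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
])
demo_centroids, demo_labels = kmeans_sketch(demo_X, k=2)
print("Centroids:\n", demo_centroids)
```

The fixed iteration cap (max_iter) guarantees termination even if the centroids keep shifting slightly, which corresponds to the "fixed number of iterations" condition in step 4.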
Program:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# 1. Generate synthetic data
# make_blobs creates isotropic Gaussian blobs for clustering.
# n_samples: total number of points equally divided among clusters.
# centers: number of centers to generate, or the fixed center locations.
# cluster_std: standard deviation of the clusters.
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60,
                  random_state=0)
# 2. Apply K-Means algorithm
# n_clusters: the number of clusters to form, which is also the number of
#   centroids to generate.
# init: 'k-means++' spreads the initial centroids apart to speed up
#   convergence.
# n_init: number of times the algorithm is run with different initial
#   centroids; the run with the lowest inertia is kept.
# random_state: seeds the random number generation used for centroid
#   initialization.
kmeans = KMeans(n_clusters=4, init='k-means++', random_state=42,
                n_init=10)
kmeans.fit(X)
# Get cluster labels for each data point
labels = kmeans.predict(X)
# Get the coordinates of the cluster centroids
centroids = kmeans.cluster_centers_
# 3. Visualize the clusters
plt.figure(figsize=(8, 6))
# Plot data points colored by their assigned cluster
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50, alpha=0.8)
# Plot the cluster centroids
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, marker='X', c='red',
            label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.grid(True)
plt.show()
print("Cluster centroids:\n", centroids)
print("\nFirst 10 data points with their assigned cluster labels:\n", labels[:10])
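The stopping rule in step 4 of the algorithm can also be set explicitly in scikit-learn. The sketch below is an illustration rather than part of the program above: it passes max_iter (the iteration cap) and tol (the centroid-movement tolerance) and then reads back how many iterations were used. The values 300 and 1e-4 are, to the best of our knowledge, the library defaults; the names demo_X and model are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data settings as the program above.
demo_X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60,
                       random_state=0)

# max_iter caps the iterations of a single run; tol is the tolerance on
# centroid movement used to declare convergence.
model = KMeans(n_clusters=4, init='k-means++', n_init=10,
               max_iter=300, tol=1e-4, random_state=42)
model.fit(demo_X)

# n_iter_: iterations actually run; inertia_: sum of squared distances
# of the points to their closest centroid.
print("Iterations run:", model.n_iter_)
print("Inertia:", model.inertia_)
```

If n_iter_ comes out well below max_iter, the run stopped because the centroids stabilised rather than because the iteration cap was reached.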