Name: Harshad Kamble
Roll No : 23
Aim: Assignment on Clustering Techniques
Download the customer dataset from the link below:
Data Set: https://www.kaggle.com/shwetabh123/mall-customers
This dataset records the annual income and spending of customers visiting a shopping mall. It contains Customer ID, Gender, Age, Annual Income, and Spending Score. As the mall owner, you need to find the groups of customers who are most profitable for the mall. Apply at least two clustering algorithms (based on Spending Score) to find these groups.
a. Apply data pre-processing techniques (label encoding, data transformation, ...) if necessary.
b. Perform data preparation (train-test split).
c. Apply a machine learning algorithm.
d. Evaluate the model.
e. Apply cross-validation and evaluate the model.
In [1]: import pandas as pd
In [2]: import matplotlib.pyplot as plt
In [3]: from matplotlib.lines import Line2D
In [4]: from sklearn.preprocessing import StandardScaler
In [5]: from sklearn.decomposition import PCA
In [8]: from sklearn.cluster import KMeans
In [9]: df=pd.read_csv("/home/student/TE52/Mall_Customers.csv")#read the specified csv file into a dataframe
In [15]: df.head()#to show first few rows of dataframe
Out[15]:
CustomerID Gender Age Annual Income (k$) Spending Score (1-100)
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
In [56]: df.rename(columns = {'Annual Income (k$)':'Annual Income'}, inplace=True)#shorten column name
In [55]: df.rename(columns = {'Spending Score (1-100)':'Spending Score'}, inplace=True)#shorten column name
In [57]: df.describe()#summary statistics of the dataframe
Out[57]:
CustomerID Age Annual Income Spending Score
count 200.000000 200.000000 200.000000 200.000000
mean 100.500000 38.850000 60.560000 50.200000
std 57.879185 13.969007 26.264721 25.823522
min 1.000000 18.000000 15.000000 1.000000
25% 50.750000 28.750000 41.500000 34.750000
50% 100.500000 36.000000 61.500000 50.000000
75% 150.250000 49.000000 78.000000 73.000000
max 200.000000 70.000000 137.000000 99.000000
In [60]: df.isnull().sum()#check for null values
Out[60]: CustomerID 0
Gender 0
Age 0
Annual Income 0
Spending Score 0
dtype: int64
In [47]: df.shape#number of rows and columns
Out[47]: (200, 5)
In [49]: df['Gender'].value_counts()#count customers per gender
Out[49]: Gender
Female 112
Male 88
Name: count, dtype: int64
In [61]: print(df.columns.tolist())#check column names
['CustomerID', 'Gender', 'Age', 'Annual Income', 'Spending Score']
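Task (a) calls for label encoding where necessary. Gender is the only categorical column; below is a minimal sketch of encoding it with scikit-learn's LabelEncoder (the Gender_encoded column name is an illustrative choice, not part of the original notebook):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['Gender_encoded'] = le.fit_transform(df['Gender'])  # Female -> 0, Male -> 1 (alphabetical order)
print(df[['Gender', 'Gender_encoded']].head())

The clustering below uses only the numeric columns, so this step is optional here.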
In [79]: sc = StandardScaler()#create a StandardScaler instance
In [80]: numeric_features = df[['Age', 'Annual Income', 'Spending Score']]
In [81]: print(numeric_features.head())#printing first few values
Age Annual Income Spending Score
0 19 15 39
1 21 15 81
2 20 16 6
3 23 16 77
4 31 17 40
In [82]: numeric_features_scaled = sc.fit_transform(numeric_features)#scale the values
In [83]: df_scaled = pd.DataFrame(numeric_features_scaled, columns=numeric_features.columns)#rebuild a dataframe from the scaled array
In [90]: print(df_scaled)#display scaled dataframe
Age Annual Income Spending Score
0 -1.424569 -1.738999 -0.434801
1 -1.281035 -1.738999 1.195704
2 -1.352802 -1.700830 -1.715913
3 -1.137502 -1.700830 1.040418
4 -0.563369 -1.662660 -0.395980
.. ... ... ...
195 -0.276302 2.268791 1.118061
196 0.441365 2.497807 -0.861839
197 -0.491602 2.497807 0.923953
198 -0.491602 2.917671 -1.250054
199 -0.635135 2.917671 1.273347
[200 rows x 3 columns]
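Task (b) asks for a train-test split. Clustering is unsupervised, so a split is not strictly required, but here is a minimal sketch using scikit-learn's train_test_split (the 80/20 ratio and random_state=42 are assumptions):

from sklearn.model_selection import train_test_split

train_data, test_data = train_test_split(df_scaled, test_size=0.2, random_state=42)
print("train shape:", train_data.shape)  # expected (160, 3)
print("test shape:", test_data.shape)    # expected (40, 3)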
In [91]: pca = PCA(n_components = 2)#creating pca object
In [93]: df_pca = pca.fit_transform(df_scaled)#fitting and transforming data
In [95]: print("data shape after PCA :",df_pca.shape)#printing shape of transformed
data shape after PCA : (200, 2)
In [96]: print("data_pca is:",df_pca)#printing transformed data
data_pca is: [[-6.15720019e-01 -1.76348088e+00]
 [-1.66579271e+00 -1.82074695e+00]
 [ 3.37861909e-01 -1.67479894e+00]
 [-1.45657325e+00 -1.77242992e+00]
 [-3.84652078e-02 -1.66274012e+00]
 ...]
(output truncated to the first five of 200 rows)
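Before clustering in the reduced space, it is worth checking how much of the variance the two components retain; a quick sketch using the fitted pca object from above:

print("explained variance ratio:", pca.explained_variance_ratio_)
# a sum close to 1.0 means little information was lost in the reduction
print("total variance retained:", pca.explained_variance_ratio_.sum())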
In [97]: plt_font = {'family':'serif' , 'size':16}#font properties for plot labels
In [99]: wcss_list = []
         for i in range(1, 15):
             kmeans = KMeans(n_clusters=i, init='k-means++', random_state=1)
             kmeans.fit(df_pca)
             wcss_list.append(kmeans.inertia_)#within-cluster sum of squares for each k
In [5]: import matplotlib.pyplot as plt
# Example font properties (customize as needed)
plt_font = {
'fontsize': 12,
'fontweight': 'bold',
'family': 'serif'
}
# Example: Define wcss_list (replace this with your actual WCSS values)
wcss_list = [500, 300, 250, 200, 180, 175, 170, 160, 150, 145, 140, 135]
# Plotting
plt.plot(range(1, len(wcss_list) + 1), wcss_list)
plt.plot([4, 4], [0, max(wcss_list)], linestyle='--', alpha=0.7)  # vertical line marking the elbow at k=4
plt.xlabel('K', fontdict=plt_font)
plt.ylabel('WCSS', fontdict=plt_font)
plt.title('Elbow Method for Optimal k', fontdict=plt_font)
plt.show()
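The elbow plot points to k = 4. Task (d) asks for model evaluation; the silhouette score is a standard internal metric for clustering, sketched here for a range of k values (the range 2-7 is an arbitrary choice, and df_pca is the mall data transformed above):

from sklearn.metrics import silhouette_score

for k in range(2, 8):
    km = KMeans(n_clusters=k, init='k-means++', random_state=1)
    labels = km.fit_predict(df_pca)
    # silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    print(f"k={k}: silhouette={silhouette_score(df_pca, labels):.3f}")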
In [30]: import numpy as np  # needed for np.random below
# Example: Creating a sample dataset
# Replace this with your actual data
data = pd.DataFrame({
'feature1': np.random.rand(100),
'feature2': np.random.rand(100),
'feature3': np.random.rand(100),
})
# Step 1: Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# Step 2: Perform PCA
pca = PCA(n_components=2) # Adjust the number of components as needed
df_pca = pca.fit_transform(data_scaled)
# Step 3: Perform KMeans clustering
kmeans = KMeans(n_clusters=4, init='k-means++', random_state=1)
kmeans.fit(df_pca)
cluster_id = kmeans.predict(df_pca)
# Step 4: Creating a result DataFrame
result_data = pd.DataFrame()
result_data['PC1'] = df_pca[:, 0]
result_data['PC2'] = df_pca[:, 1]
result_data['Cluster'] = cluster_id # Use 'Cluster' as defined earlier
# Print the result data
print(result_data)
# Define colors for clusters
cluster_colors = {0: 'tab:red', 1: 'tab:green', 2: 'tab:blue', 3: 'tab:pink'}
cluster_dict = {
'Centroid': 'tab:orange',
'Cluster0': 'tab:red',
'Cluster1': 'tab:green',
'Cluster2': 'tab:blue',
'Cluster3': 'tab:pink'
}
# Scatter plot for the clusters
plt.scatter(x=result_data['PC1'],
y=result_data['PC2'],
c=result_data['Cluster'].map(cluster_colors))
# Create legend handles
handles = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v,
markersize=8) for k, v in cluster_dict.items()]
plt.legend(title='Color', handles=handles, bbox_to_anchor=(1.05, 1), loc='upper left')
# Scatter plot for centroids
plt.scatter(x=kmeans.cluster_centers_[:, 0],
y=kmeans.cluster_centers_[:, 1],
marker='o',
c='tab:orange',
s=150,
alpha=1)
# Plot settings
plt.title("Clustered by KMeans", fontdict={'fontsize': 14, 'fontweight'
plt.xlabel("PC1", fontdict={'fontsize': 12})
plt.ylabel("PC2", fontdict={'fontsize': 12})
plt.show()
PC1 PC2 Cluster
0 -0.708627 0.160780 0
1 -1.604748 0.467547 0
2 2.637297 0.254471 2
3 -0.265911 -1.520968 3
4 -0.256524 1.008739 0
.. ... ... ...
95 1.335878 -1.062640 1
96 -0.139539 -1.098409 3
97 -0.077331 0.132332 0
98 0.473854 -0.749248 1
99 -0.749690 1.394149 0
[100 rows x 3 columns]
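The assignment asks for at least two clustering algorithms. Below is a sketch of agglomerative (hierarchical) clustering compared against KMeans via silhouette; note that the last cell redefined df_pca on random sample data, so re-run the mall-data PCA cells first (n_clusters=4 follows the elbow result above):

from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

agg_labels = AgglomerativeClustering(n_clusters=4, linkage='ward').fit_predict(df_pca)
km_labels = KMeans(n_clusters=4, init='k-means++', random_state=1).fit_predict(df_pca)
print("Agglomerative silhouette:", silhouette_score(df_pca, agg_labels))
print("KMeans silhouette:", silhouette_score(df_pca, km_labels))

For task (e), clustering has no ground-truth labels to cross-validate against, so a common substitute is a fold-based stability check: fit on each training fold and score the held-out fold. A sketch assuming 5-fold KFold:

from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=1)
fold_scores = []
for train_idx, test_idx in kf.split(df_pca):
    km = KMeans(n_clusters=4, init='k-means++', random_state=1)
    km.fit(df_pca[train_idx])
    # assign held-out points to the clusters learned on the training fold
    test_labels = km.predict(df_pca[test_idx])
    fold_scores.append(silhouette_score(df_pca[test_idx], test_labels))
print("per-fold silhouette:", [round(s, 3) for s in fold_scores])
print("mean silhouette:", sum(fold_scores) / len(fold_scores))

A stable mean silhouette across folds suggests the cluster structure generalizes rather than fitting one particular subset of customers.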