Mall Customer

Uploaded by aboodyshehab30


You own the mall and want to understand your customers, in particular which of them can be converted most easily [Target Customers], so that these insights can be passed on to the marketing team and the strategy planned accordingly.

In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

In [2]: df = pd.read_csv("Mall_Customers.csv")

In [3]: df

Out[3]:
     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0             1    Male   19                  15                      39
1             2    Male   21                  15                      81
2             3  Female   20                  16                       6
3             4  Female   23                  16                      77
4             5  Female   31                  17                      40
...         ...     ...  ...                 ...                     ...
195         196  Female   35                 120                      79
196         197  Female   45                 126                      28
197         198    Male   32                 126                      74
198         199    Male   32                 137                      18
199         200    Male   30                 137                      83

200 rows × 5 columns

Explore Data
In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 200 non-null int64
1 Gender 200 non-null object
2 Age 200 non-null int64
3 Annual Income (k$) 200 non-null int64
4 Spending Score (1-100) 200 non-null int64
dtypes: int64(4), object(1)
memory usage: 7.9+ KB

In [5]: df.describe()

Out[5]:
       CustomerID         Age  Annual Income (k$)  Spending Score (1-100)
count  200.000000  200.000000          200.000000              200.000000
mean   100.500000   38.850000           60.560000               50.200000
std     57.879185   13.969007           26.264721               25.823522
min      1.000000   18.000000           15.000000                1.000000
25%     50.750000   28.750000           41.500000               34.750000
50%    100.500000   36.000000           61.500000               50.000000
75%    150.250000   49.000000           78.000000               73.000000
max    200.000000   70.000000          137.000000               99.000000

In [6]: df['Gender'].value_counts()

Out[6]: Gender
Female 112
Male 88
Name: count, dtype: int64

In [7]: df.groupby('Gender')['Spending Score (1-100)'].value_counts()

Out[7]: Gender Spending Score (1-100)

Female 42 7
50 5
40 4
35 3
77 3
..
Male 17 1
11 1
9 1
8 1
3 1
Name: count, Length: 120, dtype: int64

In [8]: la=LabelEncoder()

In [9]: df.Gender = la.fit_transform(df.Gender)

In [10]: df.Gender

Out[10]: 0 1
1 1
2 0
3 0
4 0
..
195 0
196 0
197 1
198 1
199 1
Name: Gender, Length: 200, dtype: int32
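LabelEncoder numbers the classes in sorted (alphabetical) order, which is why Female became 0 and Male became 1 in Out[10]. A minimal sketch, assuming the fitted encoder is kept around, of reading that mapping back from `classes_` so later plots can show readable labels. The ten values below are copied from the rows visible in the df preview and only stand in for the full column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Gender values of the ten rows visible in the df preview (a stand-in for the full column)
toy = pd.DataFrame({"Gender": ["Male", "Male", "Female", "Female", "Female",
                               "Female", "Female", "Male", "Male", "Male"]})

encoder = LabelEncoder()
codes = encoder.fit_transform(toy["Gender"])

# classes_ is sorted; a label's position in classes_ is its integer code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns codes back into the original strings
print(encoder.inverse_transform([0, 1]))
```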

In [11]: df

Out[11]:
     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0             1       1   19                  15                      39
1             2       1   21                  15                      81
2             3       0   20                  16                       6
3             4       0   23                  16                      77
4             5       0   31                  17                      40
...         ...     ...  ...                 ...                     ...
195         196       0   35                 120                      79
196         197       0   45                 126                      28
197         198       1   32                 126                      74
198         199       1   32                 137                      18
199         200       1   30                 137                      83

200 rows × 5 columns

Create Model
In [12]: clustern = []
j = []

In [13]: for i in range(1, 12):
    # df still contains CustomerID and is unscaled, so the ID column contributes to the distances
    model = KMeans(n_clusters=i)
    model.fit(df)
    clustern.append(i)
    j.append(model.inertia_)

C:\Users\Abdo\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
super()._check_params_vs_input(X, default_n_init=10)

In [14]: pd.DataFrame({'k': clustern, 'inertia': j})

Out[14]:
     k        inertia
0    1  975512.060000
1    2  387065.713771
2    3  271396.562966
3    4  195401.198560
4    5  157179.693358
5    6  122625.389195
6    7  103208.122119
7    8   86119.077481
8    9   77369.724425
9   10   68900.885948
10  11   64977.883458

In [15]: plt.plot(clustern, j, marker='o')
plt.title("Elbow method: inertia vs. number of clusters")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia (within-cluster sum of squares)")

Out[15]: Text(0, 0.5, 'Inertia (within-cluster sum of squares)')
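The loop above clusters the whole df, so CustomerID and the encoded Gender column take part in the distance computation. A minimal sketch, under the assumption that only Age, Annual Income and Spending Score should define a segment, of the same elbow computation on an explicit feature list. It passes `n_init` explicitly (which avoids the FutureWarning printed above) and a `random_state` so the curve is reproducible. The helper name `elbow_inertias` and the ten-row `toy` frame (copied from the preview rows) are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Ten rows copied from the df preview; CustomerID is deliberately left out
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 35, 45, 32, 32, 30],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 120, 126, 126, 137, 137],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 79, 28, 74, 18, 83],
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
X = toy[features].to_numpy(dtype=float)


def elbow_inertias(X, k_values, seed=0):
    """Return KMeans inertia (within-cluster sum of squared distances) for each k."""
    inertias = []
    for k in k_values:
        # explicit n_init avoids the n_init FutureWarning; random_state makes runs repeatable
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 6))  # the notebook itself uses range(1, 12) on all 200 rows
elbow = pd.DataFrame({"k": k_values, "inertia": elbow_inertias(X, k_values)})
print(elbow)
```

With k = 1 every point belongs to one cluster centred on the mean, so the first inertia is simply the total sum of squares; that makes a quick sanity check on any elbow table.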

In [16]: model = KMeans(n_clusters=4)
model.fit(df)
pre = model.predict(df)

In [17]: df['k_means'] = pre

In [18]: df

Out[18]:
     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)  k_means
0             1       1   19                  15                      39        1
1             2       1   21                  15                      81        1
2             3       0   20                  16                       6        1
3             4       0   23                  16                      77        1
4             5       0   31                  17                      40        1
...         ...     ...  ...                 ...                     ...      ...
195         196       0   35                 120                      79        3
196         197       0   45                 126                      28        0
197         198       1   32                 126                      74        3
198         199       1   32                 137                      18        0
199         200       1   30                 137                      83        3

200 rows × 6 columns
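To hand the clusters to the marketing team, each one needs a short description. A sketch of a per-cluster profile (mean age, income and spending score plus cluster size), assuming the same three features; the ten-row `toy` frame copied from the preview and the choice of k = 2 for such a small sample are illustrative, not results from the full dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Ten rows copied from the df preview (stand-in for the full 200 customers)
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 35, 45, 32, 32, 30],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 120, 126, 126, 137, 137],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 79, 28, 74, 18, 83],
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# k = 2 only because this stand-in has ten rows; the notebook uses k = 4 on the full data
model = KMeans(n_clusters=2, n_init=10, random_state=0)
toy["k_means"] = model.fit_predict(toy[features])

# One row per cluster: average profile plus the number of customers in it
profile = toy.groupby("k_means")[features].mean().round(1)
profile["size"] = toy["k_means"].value_counts().sort_index()
print(profile)
```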

In [19]: la = StandardScaler()

In [20]: xdata = la.fit_transform(df[['Age','Annual Income (k$)','Spending Score (1-100)']])
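StandardScaler rescales each column to mean 0 and standard deviation 1, so a feature with a larger spread, such as Annual Income (std about 26) compared with Age (std about 14), no longer dominates the distances. The loop in the next cells still fits `df`, so `xdata` is never used there. A minimal sketch of the elbow computation on the scaled array instead, again on the ten preview rows as a stand-in; on standardized data the k = 1 inertia equals n_samples × n_features, which is a handy sanity check.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Ten rows copied from the df preview (stand-in for the full data)
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 35, 45, 32, 32, 30],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 120, 126, 126, 137, 137],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 79, 28, 74, 18, 83],
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

scaler = StandardScaler()
xdata = scaler.fit_transform(toy[features])

# Each scaled column has mean ~0 and (population) standard deviation ~1
print(xdata.mean(axis=0).round(6), xdata.std(axis=0).round(6))

# Elbow curve fitted on the scaled array rather than on the raw DataFrame
inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(xdata)
    inertias.append(model.inertia_)
print(inertias)
```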

In [21]: clustern = []
j = []

In [22]: for i in range(1, 12):
    # note: this fits df (which now also holds the k_means column), not the scaled xdata
    model = KMeans(n_clusters=i)
    model.fit(df)
    clustern.append(i)
    j.append(model.inertia_)

In [23]: pd.DataFrame({'k': clustern, 'inertia': j})

Out[23]:
     k        inertia
0    1  975711.740000
1    2  387261.769977
2    3  271564.483308
3    4  195394.487179
4    5  157626.804813
5    6  122646.505263
6    7  105106.807522
7    8   86070.464191
8    9   76910.473799
9   10   68936.676383
10  11   64625.772938

In [24]: plt.plot(clustern, j, marker='o')
plt.title("Elbow method: inertia vs. number of clusters")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia (within-cluster sum of squares)")

Out[24]: Text(0, 0.5, 'Inertia (within-cluster sum of squares)')

In [25]: df1 = df[df['k_means'] == 0]
df2 = df[df['k_means'] == 1]
df3 = df[df['k_means'] == 2]
df4 = df[df['k_means'] == 3]

In [26]: plt.scatter(df1['Annual Income (k$)'], df1['Spending Score (1-100)'], label='group1')
plt.scatter(df2['Annual Income (k$)'], df2['Spending Score (1-100)'], label='group2')
plt.scatter(df3['Annual Income (k$)'], df3['Spending Score (1-100)'], label='group3')
plt.scatter(df4['Annual Income (k$)'], df4['Spending Score (1-100)'], label='group4')
plt.title('Customer clusters: annual income vs. spending score')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()

Out[26]: <matplotlib.legend.Legend at 0x231a66eca00>
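Cells 25 and 26 build one filtered frame per cluster and call `plt.scatter` four times. A sketch of the same plot driven by `groupby`, assuming the labels live in the `k_means` column; it adapts to any number of clusters without copy-pasting. The helper `plot_clusters` is hypothetical, and the ten rows (income, score and k_means label) are copied from Out[18].

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Income, spending score and cluster label of the ten rows visible in Out[18]
toy = pd.DataFrame({
    "Annual Income (k$)": [15, 15, 16, 16, 17, 120, 126, 126, 137, 137],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 79, 28, 74, 18, 83],
    "k_means": [1, 1, 1, 1, 1, 3, 0, 3, 0, 3],
})


def plot_clusters(data, x, y, label_col="k_means"):
    """Scatter y against x with one colour (and legend entry) per cluster label."""
    fig, ax = plt.subplots()
    for cluster_id, group in data.groupby(label_col):
        # cluster 0 is shown as group1, matching the naming used in In [26]
        ax.scatter(group[x], group[y], label=f"group{cluster_id + 1}")
    ax.set(title="Customer clusters", xlabel=x, ylabel=y)
    ax.legend()
    return fig, ax


fig, ax = plot_clusters(toy, "Annual Income (k$)", "Spending Score (1-100)")
```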

In [27]: la = StandardScaler()

In [28]: xdata = la.fit_transform(df[['Age','Annual Income (k$)','Spending Score (1-100)']])

In [29]: clustern = []
j = []

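In [30]: for i in range(1, 12):
    # a separate name keeps the scaled array in xdata from being overwritten;
    # the model is still fit on df, not on xdata
    model = KMeans(n_clusters=i)
    model.fit(df)
    clustern.append(i)
    j.append(model.inertia_)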

In [31]: pd.DataFrame({'k': clustern, 'inertia': j})

Out[31]:
     k        inertia
0    1  975711.740000
1    2  387261.769977
2    3  271553.349336
3    4  195401.198560
4    5  157145.164702
5    6  122659.604473
6    7  103267.233144
7    8   86042.965074
8    9   76910.473799
9   10   69714.586728
10  11   64629.937819

In [32]: plt.plot(clustern, j, marker='o')
plt.title("Elbow method: inertia vs. number of clusters")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia (within-cluster sum of squares)")

Out[32]: Text(0, 0.5, 'Inertia (within-cluster sum of squares)')
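Cells 23 and 31 fit the same df, yet their inertias differ at several k (for example 271564.483308 vs. 271553.349336 at k = 3), because KMeans picks its initial centroids at random. A minimal sketch, assuming reproducible reruns are wanted, of fixing `random_state` so two runs produce the same curve; the helper `inertia_curve` and the ten preview rows are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Ten rows copied from the df preview (stand-in for the full data)
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 35, 45, 32, 32, 30],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 120, 126, 126, 137, 137],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 79, 28, 74, 18, 83],
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
X = toy[features].to_numpy(dtype=float)


def inertia_curve(X, k_values, seed):
    """Inertia per k; a fixed random_state makes the centroid initialisation repeatable."""
    return [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
            for k in k_values]


run_a = inertia_curve(X, range(1, 6), seed=42)
run_b = inertia_curve(X, range(1, 6), seed=42)
print(np.allclose(run_a, run_b))
```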

In [33]: df1 = df[df['k_means'] == 0]
df2 = df[df['k_means'] == 1]
df3 = df[df['k_means'] == 2]
df4 = df[df['k_means'] == 3]

In [34]: plt.scatter(df1['Annual Income (k$)'], df1['Spending Score (1-100)'] , label = 'group1')

Out[34]: <matplotlib.legend.Legend at 0x231a658bd30>

Analysis
In [35]: sns.countplot(data=df, x='k_means', hue='k_means', palette='viridis', legend=False)

Out[35]: <Axes: xlabel='k_means', ylabel='count'>

In [36]: df.k_means.value_counts().plot.pie(autopct = '%0.2f%%')

Out[36]: <Axes: ylabel='count'>
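The pie chart shows each cluster's share of customers; the same percentages can be printed as a table with `value_counts(normalize=True)`, which is easier to paste into a report. A sketch, using the ten labels visible in Out[18] as a stand-in, so the shares it prints do not describe the full 200 customers.

```python
import pandas as pd

# Cluster labels of the ten rows visible in Out[18]; a stand-in for df["k_means"]
labels = pd.Series([1, 1, 1, 1, 1, 3, 0, 3, 0, 3], name="k_means")

# Percentage of customers per cluster, ordered by cluster id
shares = labels.value_counts(normalize=True).sort_index().mul(100).round(2)
print(shares)
```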

In [37]: sns.histplot(df.Age)

Out[37]: <Axes: xlabel='Age', ylabel='Count'>

In [38]: # Gender is label-encoded: 0 = Female, 1 = Male
sns.countplot(data=df, x='k_means', hue='Gender')

Out[38]: <Axes: xlabel='k_means', ylabel='count'>
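Because Gender was label-encoded, the legend of this plot shows 0 and 1. A sketch of the same breakdown as a table with `pd.crosstab`, mapping the codes back to names (0 = Female, 1 = Male, per the LabelEncoder output in Out[10]); the ten rows are copied from the previews and are illustrative only.

```python
import pandas as pd

# Encoded Gender and cluster label of the ten rows visible in Out[11] and Out[18]
toy = pd.DataFrame({
    "Gender": [1, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    "k_means": [1, 1, 1, 1, 1, 3, 0, 3, 0, 3],
})

# LabelEncoder sorted the classes alphabetically: 0 = Female, 1 = Male
gender_names = {0: "Female", 1: "Male"}

# Rows: cluster id, columns: gender, cells: number of customers
table = pd.crosstab(toy["k_means"], toy["Gender"].map(gender_names))
print(table)
```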

