Diwali Sales Analysis EDA 1696347982

This document analyzes sales data from a Diwali festival in India using Python. Key findings from exploratory data analysis include:
- Most buyers are female and females have higher total purchasing amounts than males.
- The largest numbers of buyers are in the 26-35 age group, primarily female.
- The states with the most total orders and sales amounts are Uttar Pradesh, Maharashtra, and Karnataka.
- Married women have the highest purchasing power and account for the most buyers.

Exploratory Data Analysis Using Python: Diwali Sales Analysis Project

# import python libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # visualizing data
%matplotlib inline
import seaborn as sns

# import csv file


df = pd.read_csv('Diwali Sales Data.csv', encoding= 'unicode_escape')

df.shape

(11251, 15)

# checking top 5 rows of data


df.head()

   User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      Occupation       Product_Category  Orders  Amount
0  1002903  Sanskriti  P00125942   F       26-35      28   0               Maharashtra     Western   Healthcare       Auto              1       23952.0
1  1000732  Kartik     P00110942   F       26-35      35   1               Andhra Pradesh  Southern  Govt             Auto              3       23934.0
2  1001990  Bindu      P00118542   F       26-35      35   1               Uttar Pradesh   Central   Automobile       Auto              3       23924.0
3  1001425  Sudevi     P00237842   M       0-17       16   0               Karnataka       Southern  Construction     Auto              2       23912.0
4  1000588  Joni       P00057942   M       26-35      28   1               Gujarat         Western   Food Processing  Auto              2       23877.0

# checking bottom 5 rows of data


df.tail()

       User_ID  Cust_name    Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      Occupation   Product_Category  Orders
11246  1000695  Manning      P00296942   M       18-25      19   1               Maharashtra     Western   Chemical     Office            4
11247  1004089  Reichenbach  P00171342   M       26-35      33   0               Haryana         Northern  Healthcare   Veterinary        3
11248  1001209  Oshin        P00201342   F       36-45      40   0               Madhya Pradesh  Central   Textile      Office            4
11249  1004023  Noonan       P00059442   M       36-45      37   0               Karnataka       Southern  Agriculture  Office            3
11250  1002744  Brumley      P00281742   F       18-25      19   0               Maharashtra     Western   Healthcare   Office            3

(The Amount column is cut off in the source.)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11251 entries, 0 to 11250
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User_ID 11251 non-null int64
1 Cust_name 11251 non-null object
2 Product_ID 11251 non-null object
3 Gender 11251 non-null object
4 Age Group 11251 non-null object
5 Age 11251 non-null int64
6 Marital_Status 11251 non-null int64
7 State 11251 non-null object
8 Zone 11251 non-null object
9 Occupation 11251 non-null object
10 Product_Category 11251 non-null object
11 Orders 11251 non-null int64
12 Amount 11239 non-null float64
13 Status 0 non-null float64
14 unnamed1 0 non-null float64
dtypes: float64(3), int64(4), object(8)
memory usage: 1.3+ MB

#drop unrelated/blank columns


df.drop(['Status', 'unnamed1'], axis=1, inplace=True)
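
Since `df.info()` reports 0 non-null values for both dropped columns, an alternative is to drop every column that is entirely empty rather than naming them. The following is a minimal sketch on a made-up two-row frame (the `toy` data is hypothetical, not the Diwali file), using `dropna` with `axis=1` and `how="all"`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two real columns and two completely empty ones
toy = pd.DataFrame({
    "Orders": [1, 3],
    "Amount": [23952.0, np.nan],
    "Status": [np.nan, np.nan],
    "unnamed1": [np.nan, np.nan],
})

# Drop only the columns in which every value is missing
trimmed = toy.dropna(axis=1, how="all")
remaining = list(trimmed.columns)
print(remaining)
```

`Amount` survives here because it still holds one value; only columns with no data at all are removed.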

#check for null values


pd.isnull(df).sum()
User_ID 0
Cust_name 0
Product_ID 0
Gender 0
Age Group 0
Age 0
Marital_Status 0
State 0
Zone 0
Occupation 0
Product_Category 0
Orders 0
Amount 12
dtype: int64

# drop null values


df.dropna(inplace=True)
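
Called without arguments, `dropna()` removes any row that has a missing value in any column. Here only `Amount` has nulls, so the result is the same, but the drop can be restricted to that column to make the intent explicit. A minimal sketch on made-up data (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the sales table
toy = pd.DataFrame({
    "Cust_name": ["A", "B", "C"],
    "Orders": [1, 2, 3],
    "Amount": [100.0, np.nan, 300.0],
})

# Count missing amounts, then drop only the rows where Amount is missing
missing_before = int(toy["Amount"].isna().sum())
cleaned = toy.dropna(subset=["Amount"])
rows_after = len(cleaned)
print(missing_before, rows_after)
```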

# change data type


df['Amount'] = df['Amount'].astype('int')

# check data type


df['Amount'].dtypes

dtype('int32')
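
The `int32` result depends on the environment: `astype('int')` uses NumPy's default integer type, which, as far as I know, is 32-bit on Windows with NumPy versions before 2.0 and 64-bit on most other setups. A small sketch, on made-up values, of asking for a fixed width so the result is the same everywhere:

```python
import pandas as pd

# Hypothetical toy amounts stored as floats, as in the raw CSV
toy = pd.DataFrame({"Amount": [23952.0, 23934.0, 188.0]})

# Request an explicit 64-bit integer instead of the platform default
toy["Amount"] = toy["Amount"].astype("int64")
amount_dtype = str(toy["Amount"].dtype)
print(amount_dtype)
```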

df.columns

Index(['User_ID', 'Cust_name', 'Product_ID', 'Gender', 'Age Group', 'Age',
       'Marital_Status', 'State', 'Zone', 'Occupation', 'Product_Category',
       'Orders', 'Amount'],
      dtype='object')

# rename column (rename() returns a new DataFrame; df itself keeps 'Marital_Status')
df.rename(columns= {'Marital_Status':'Shaadi'}).head()

   User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Shaadi  State           Zone      Occupation       Product_Category  Orders  Amount
0  1002903  Sanskriti  P00125942   F       26-35      28   0       Maharashtra     Western   Healthcare       Auto              1       23952
1  1000732  Kartik     P00110942   F       26-35      35   1       Andhra Pradesh  Southern  Govt             Auto              3       23934
2  1001990  Bindu      P00118542   F       26-35      35   1       Uttar Pradesh   Central   Automobile       Auto              3       23924
3  1001425  Sudevi     P00237842   M       0-17       16   0       Karnataka       Southern  Construction     Auto              2       23912
4  1000588  Joni       P00057942   M       26-35      28   1       Gujarat         Western   Food Processing  Auto              2       23877
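
The rename above is not stored anywhere, which is why later cells can still refer to `Marital_Status`. A minimal sketch on a made-up two-row frame (hypothetical `toy` data) showing that `rename()` leaves the original untouched unless the result is assigned:

```python
import pandas as pd

# Hypothetical toy frame with the column being renamed
toy = pd.DataFrame({"Marital_Status": [0, 1], "Amount": [23952, 23934]})

# rename() returns a new DataFrame; toy keeps its original column names
renamed = toy.rename(columns={"Marital_Status": "Shaadi"})
original_cols = list(toy.columns)
renamed_cols = list(renamed.columns)
print(original_cols, renamed_cols)
```

To keep the new name, the result would have to be assigned back, for example `toy = toy.rename(...)`.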

# describe() method returns description of the data in the DataFrame (i.e. count, mean, std, etc)
df.describe()

            User_ID           Age  Marital_Status        Orders        Amount
count  1.123900e+04  11239.000000    11239.000000  11239.000000  11239.000000
mean   1.003004e+06     35.410357        0.420055      2.489634   9453.610553
std    1.716039e+03     12.753866        0.493589      1.114967   5222.355168
min    1.000001e+06     12.000000        0.000000      1.000000    188.000000
25%    1.001492e+06     27.000000        0.000000      2.000000   5443.000000
50%    1.003064e+06     33.000000        0.000000      2.000000   8109.000000
75%    1.004426e+06     43.000000        1.000000      3.000000  12675.000000
max    1.006040e+06     92.000000        1.000000      4.000000  23952.000000

# use describe() for specific columns


df[['Age', 'Orders', 'Amount']].describe()

                Age        Orders        Amount
count  11239.000000  11239.000000  11239.000000
mean      35.410357      2.489634   9453.610553
std       12.753866      1.114967   5222.355168
min       12.000000      1.000000    188.000000
25%       27.000000      2.000000   5443.000000
50%       33.000000      2.000000   8109.000000
75%       43.000000      3.000000  12675.000000
max       92.000000      4.000000  23952.000000

Exploratory Data Analysis


Gender
# plotting a bar chart of the number of buyers per gender
sns.set(rc={'figure.figsize':(3,3)})
ax = sns.countplot(x = 'Gender',data = df)

for bars in ax.containers:
    ax.bar_label(bars)

# plotting a bar chart for gender vs total amount

sales_gen = df.groupby(['Gender'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

sns.set(rc={'figure.figsize':(3,3)})
sns.barplot(x = 'Gender',y= 'Amount' ,data = sales_gen)

<Axes: xlabel='Gender', ylabel='Amount'>

From the graphs above, we can see that most buyers are female, and that women's total purchase amount is also higher than men's.
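
A higher total can simply reflect a larger number of buyers, so it may help to look at count, total and average side by side. A minimal sketch on made-up numbers (the `toy` values are hypothetical and chosen only to show the difference), using named aggregation:

```python
import pandas as pd

# Hypothetical toy data: more female rows, but a higher male average
toy = pd.DataFrame({
    "Gender": ["F", "F", "F", "M", "M"],
    "Amount": [100, 200, 300, 400, 100],
})

# One row per gender with the number of rows, the total and the mean amount
summary = toy.groupby("Gender")["Amount"].agg(
    buyers="count", total="sum", average="mean"
)
print(summary)
```

In this toy case the female total is higher while the male average is higher, which is the kind of distinction the bar charts alone do not show.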

Age
sns.set(rc={'figure.figsize':(10,4)})
ax = sns.countplot(data = df, x = 'Age Group', hue = 'Gender')

for bars in ax.containers:
    ax.bar_label(bars)

# Total Amount vs Age Group


sales_age = df.groupby(['Age Group'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

sns.set(rc={'figure.figsize':(10,4)})
sns.barplot(x = 'Age Group',y= 'Amount' ,data = sales_age)

<Axes: xlabel='Age Group', ylabel='Amount'>

From the graphs above, we can see that most buyers are women in the 26-35 age group.
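
The counts behind the grouped bar chart can also be read as a table. A minimal sketch, on made-up rows (hypothetical `toy` data), using `pd.crosstab` to count buyers per age group and gender:

```python
import pandas as pd

# Hypothetical toy rows, one per purchase
toy = pd.DataFrame({
    "Age Group": ["26-35", "26-35", "26-35", "18-25", "18-25"],
    "Gender": ["F", "F", "M", "F", "M"],
})

# Rows are age groups, columns are genders, cells are row counts
counts = pd.crosstab(toy["Age Group"], toy["Gender"])
print(counts)
```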

State
# total number of orders from top 10 states

sales_state = df.groupby(['State'], as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)

sns.set(rc={'figure.figsize':(15,5)})
ax = sns.barplot(data = sales_state, x = 'State',y= 'Orders')

for bars in ax.containers:
    ax.bar_label(bars)

# total amount/sales from top 10 states

sales_state = df.groupby(['State'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)

sns.set(rc={'figure.figsize':(15,5)})
sns.barplot(data = sales_state, x = 'State',y= 'Amount')

<Axes: xlabel='State', ylabel='Amount'>

From the graphs above, we can see that the most orders and the highest total sales amounts come from Uttar Pradesh, Maharashtra and Karnataka, in that order.
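
The sort-then-take-the-top pattern used for the state charts can also be written with `nlargest`. A minimal sketch on made-up order counts (hypothetical `toy` data), taking the top 2 states instead of the top 10:

```python
import pandas as pd

# Hypothetical toy orders per row
toy = pd.DataFrame({
    "State": ["Uttar Pradesh", "Uttar Pradesh", "Maharashtra",
              "Karnataka", "Karnataka", "Gujarat"],
    "Orders": [3, 4, 5, 1, 2, 1],
})

# Sum orders per state, then keep the two largest totals (already sorted)
top2 = toy.groupby("State")["Orders"].sum().nlargest(2)
print(top2)
```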

Marital Status

# set the figure size before plotting so it applies to this chart
sns.set(rc={'figure.figsize':(3,3)})
ax = sns.countplot(data = df, x = 'Marital_Status')

for bars in ax.containers:
    ax.bar_label(bars)

sales_state = df.groupby(['Marital_Status', 'Gender'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

sns.set(rc={'figure.figsize':(5,4)})
sns.barplot(data = sales_state, x = 'Marital_Status',y= 'Amount', hue='Gender')
<Axes: xlabel='Marital_Status', ylabel='Amount'>

From the graphs above, we can see that most buyers are married women, and that they also account for the highest total spend.
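
The two-key grouping behind the last chart can be laid out as a small table with one column per gender. The notebook does not state which value of `Marital_Status` denotes married, so this sketch keeps the raw 0/1 codes; the `toy` values are made up:

```python
import pandas as pd

# Hypothetical toy data with the same three columns used in the chart
toy = pd.DataFrame({
    "Marital_Status": [0, 0, 1, 1, 1],
    "Gender": ["F", "M", "F", "F", "M"],
    "Amount": [100, 50, 200, 300, 150],
})

# Total amount per (Marital_Status, Gender), pivoted so genders are columns
totals = (
    toy.groupby(["Marital_Status", "Gender"])["Amount"]
    .sum()
    .unstack("Gender")
)
print(totals)
```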

Occupation

sns.set(rc={'figure.figsize':(20,5)})
ax = sns.countplot(data = df, x = 'Occupation')

for bars in ax.containers:
    ax.bar_label(bars)

sales_state = df.groupby(['Occupation'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Occupation',y= 'Amount')

<Axes: xlabel='Occupation', ylabel='Amount'>

From the graphs above, we can see that most buyers work in the IT, Healthcare and Aviation sectors.

Product Category
sns.set(rc={'figure.figsize':(20,7)})
ax = sns.countplot(data = df, x = 'Product_Category')

for bars in ax.containers:
    ax.bar_label(bars)

sales_state = df.groupby(['Product_Category'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)

sns.set(rc={'figure.figsize':(20,7)})
sns.barplot(data = sales_state, x = 'Product_Category',y= 'Amount')

<Axes: xlabel='Product_Category', ylabel='Amount'>

From the graphs above, we can see that most products sold are from the Food, Clothing and Electronics categories.

Top Products
sales_state = df.groupby(['Product_ID'], as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)

sns.set(rc={'figure.figsize':(20,7)})
ax = sns.barplot(data = sales_state, x = 'Product_ID',y= 'Orders')

for bars in ax.containers:
    ax.bar_label(bars)

# top 10 most sold products (same thing as above)

fig1, ax1 = plt.subplots(figsize=(12,4))
ax = df.groupby('Product_ID')['Orders'].sum().nlargest(10).sort_values(ascending=False).plot(kind='bar', ax=ax1)

for bars in ax.containers:
    ax.bar_label(bars)

Conclusion:
Married women aged 26-35 from Uttar Pradesh, Maharashtra and Karnataka who work in IT, Healthcare and Aviation are the most likely to buy products from the Food, Clothing and Electronics categories.
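
The conclusion combines findings that were each derived from one variable at a time. One way to look at the combined segment directly is to filter on several columns together and compare its share of the total amount. A minimal sketch on made-up rows (the `toy` data and the `target_states` list are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows with a subset of the dataset's columns
toy = pd.DataFrame({
    "Gender": ["F", "F", "M", "F"],
    "Age Group": ["26-35", "26-35", "26-35", "36-45"],
    "State": ["Uttar Pradesh", "Gujarat", "Karnataka", "Maharashtra"],
    "Amount": [500, 300, 200, 400],
})

target_states = ["Uttar Pradesh", "Maharashtra", "Karnataka"]

# Keep only rows that satisfy all three conditions at once
segment = toy[
    (toy["Gender"] == "F")
    & (toy["Age Group"] == "26-35")
    & toy["State"].isin(target_states)
]

# Share of the total amount contributed by the combined segment
segment_share = segment["Amount"].sum() / toy["Amount"].sum()
print(len(segment), round(segment_share, 3))
```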

complete project on YouTube: https://www.youtube.com/watch?v=KgCgpCIOkIs

complete project on GitHub: https://github.com/rishabhnmishra/Python_Diwali_Sales_Analysis

Thank you!
