DMV - 5 - Jupyter Notebook

Uploaded by

Anushka Jadhav

10/6/24, 8:03 PM DMV_5 - Jupyter Notebook

In [1]: import pandas as pd

In [2]: data = pd.read_csv('Retail_Sales_Data.csv')

In [3]: data.head()

Out[3]:
   Transaction ID        Date  Customer ID  Gender  Age  Product Category  Quantity  Price per Unit  Total Amount
0               1  2023-11-24      CUST001    Male   34            Beauty         3              50           150
1               2  2023-02-27      CUST002  Female   26          Clothing         2             500          1000
2               3  2023-01-13      CUST003    Male   50       Electronics         1              30            30
3               4  2023-05-21      CUST004    Male   37          Clothing         1             500           500
4               5  2023-05-06      CUST005    Male   30            Beauty         2              50           100

In [4]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Transaction ID 1000 non-null int64
1 Date 1000 non-null object
2 Customer ID 1000 non-null object
3 Gender 1000 non-null object
4 Age 1000 non-null int64
5 Product Category 1000 non-null object
6 Quantity 1000 non-null int64
7 Price per Unit 1000 non-null int64
8 Total Amount 1000 non-null int64
dtypes: int64(5), object(4)
memory usage: 70.4+ KB

In [5]: data.describe()

Out[5]:
       Transaction ID         Age     Quantity  Price per Unit  Total Amount
count     1000.000000  1000.00000  1000.000000     1000.000000   1000.000000
mean       500.500000    41.39200     2.514000      179.890000    456.000000
std        288.819436    13.68143     1.132734      189.681356    559.997632
min          1.000000    18.00000     1.000000       25.000000     25.000000
25%        250.750000    29.00000     1.000000       30.000000     60.000000
50%        500.500000    42.00000     3.000000       50.000000    135.000000
75%        750.250000    53.00000     4.000000      300.000000    900.000000
max       1000.000000    64.00000     4.000000      500.000000   2000.000000

localhost:8888/notebooks/BE_PRACTICALS/DMV_5.ipynb 1/5

In [6]: data.isnull().sum()

Out[6]: Transaction ID 0
Date 0
Customer ID 0
Gender 0
Age 0
Product Category 0
Quantity 0
Price per Unit 0
Total Amount 0
dtype: int64

In [8]: sales_by_region = data.groupby('Product Category')['Total Amount'].sum().reset_i


print(sales_by_region)

  Product Category  Total Amount
0           Beauty        143515
1         Clothing        155580
2      Electronics        156905
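
The grouping call in cell In [8] is cut off at the page edge. The sketch below shows one way such a per-category total could be computed. It assumes the truncated call ends in `reset_index()` (the printed 0..2 index is consistent with that), and it uses only the five rows shown in the `data.head()` output rather than the full 1000-row CSV. The name `category_totals` is illustrative and does not appear in the notebook.

```python
import pandas as pd

# Five sample rows copied from the data.head() output; the real file has 1000 rows.
sample = pd.DataFrame({
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing", "Beauty"],
    "Total Amount": [150, 1000, 30, 500, 100],
})

# Sum the amounts per category, then move the group keys back into a column.
# reset_index() is an assumption about how the truncated notebook line ends.
category_totals = (
    sample.groupby("Product Category")["Total Amount"].sum().reset_index()
)
print(category_totals)
```

Because `groupby` sorts its keys by default, the categories come back in alphabetical order, which matches the ordering in the notebook's printed result.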

In [9]: import matplotlib.pyplot as plt

In [11]: plt.figure(figsize=(10, 6))


plt.bar(sales_by_region['Product Category'], sales_by_region['Total Amount'], co
plt.title('Total Sales by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Total Sales Amount')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
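
The `plt.bar` call in cell In [11] is truncated after `co`, so its remaining arguments are unknown. A minimal self-contained sketch of an equivalent bar chart follows, built from the three totals printed by cell In [8]. The `color="steelblue"` value and the `Agg` backend are placeholders chosen for this example and are not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Totals copied from the printed output of cell In [8].
category_totals = pd.DataFrame({
    "Product Category": ["Beauty", "Clothing", "Electronics"],
    "Total Amount": [143515, 155580, 156905],
})

fig, ax = plt.subplots(figsize=(10, 6))
# color="steelblue" is a placeholder; the original color argument is cut off.
bars = ax.bar(
    category_totals["Product Category"],
    category_totals["Total Amount"],
    color="steelblue",
)
ax.set_title("Total Sales by Product Category")
ax.set_xlabel("Product Category")
ax.set_ylabel("Total Sales Amount")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

Inside a notebook, dropping the `matplotlib.use("Agg")` line and ending the cell with `plt.show()` would display the figure inline.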

In [12]: plt.figure(figsize=(8, 8))


plt.pie(sales_by_region['Total Amount'], labels=sales_by_region['Product Categor
plt.title('Sales Distribution by Product Category')
plt.tight_layout()
plt.show()
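
The `plt.pie` call in cell In [12] is also cut off after the `labels` argument. The sketch below shows how the same three totals could be drawn as a pie chart with percentage labels. The `autopct` format string is an assumption, since the notebook's remaining arguments are not visible. Note that the slices are product categories; the dataset shown has no region column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Totals copied from the printed output of cell In [8].
categories = ["Beauty", "Clothing", "Electronics"]
totals = [143515, 155580, 156905]

fig, ax = plt.subplots(figsize=(8, 8))
# autopct is an assumed argument; it prints each slice's share of the total.
wedges, texts, autotexts = ax.pie(totals, labels=categories, autopct="%1.1f%%")
ax.set_title("Sales Distribution by Product Category")
fig.tight_layout()

# Collect the percentage labels that matplotlib rendered on the slices.
shares = [t.get_text() for t in autotexts]
print(shares)
```

With three slices of nearly equal size, a bar chart makes the small differences easier to compare than a pie chart does.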

In [13]: top_regions = sales_by_region.sort_values(by='Total Amount', ascending=False).he


print("Top-performing product categories:\n", top_regions)

Top-performing product categories:
  Product Category  Total Amount
2      Electronics        156905
1         Clothing        155580
0           Beauty        143515
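
The ranking step in cell In [13] ends in a truncated `.he`, which is presumably a `head(...)` call whose argument is not visible. A minimal sketch of the ranking follows. The value `top_n = 3` is hypothetical; the printed output only shows that at least three rows were kept.

```python
import pandas as pd

# Totals copied from the printed output of cell In [8].
category_totals = pd.DataFrame({
    "Product Category": ["Beauty", "Clothing", "Electronics"],
    "Total Amount": [143515, 155580, 156905],
})

top_n = 3  # hypothetical; the notebook's actual head() argument is cut off

# Sort from highest to lowest total and keep the first top_n rows.
top_categories = (
    category_totals.sort_values(by="Total Amount", ascending=False).head(top_n)
)
print(top_categories)
```

`sort_values` keeps the original row labels, which is why the notebook's output lists the index in the order 2, 1, 0.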

In [15]: sales_by_region_and_category = data.groupby(['Price per Unit', 'Product Category


print(sales_by_region_and_category)

Product Category  Beauty  Clothing  Electronics
Price per Unit
25                  3925      4600         4525
30                  3990      5130         4230
50                  8500      9450         8750
300                42600     57900        54900
500                84500     78500        84500
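
The two-key grouping in cell In [15] is truncated, but its output is a table with one row per unit price and one column per product category. One way to produce a table of that shape is `groupby` followed by `unstack`, sketched below on the five rows from `data.head()`. The use of `unstack` and of `fill_value=0` are assumptions; the notebook may have used a different call such as `pivot_table`.

```python
import pandas as pd

# Five sample rows copied from the data.head() output; the real file has 1000 rows.
sample = pd.DataFrame({
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing", "Beauty"],
    "Price per Unit": [50, 500, 30, 500, 50],
    "Total Amount": [150, 1000, 30, 500, 100],
})

# Sum amounts for each (price, category) pair, then pivot the category level
# into columns. fill_value=0 replaces pairs that do not occur in the sample.
price_by_category = (
    sample.groupby(["Price per Unit", "Product Category"])["Total Amount"]
    .sum()
    .unstack(fill_value=0)
)
print(price_by_category)
```

In the resulting frame the index is `Price per Unit`, so a bar plot of it places unit prices on the x-axis and draws one bar segment per product category.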

In [16]: sales_by_region_and_category.plot(kind='bar', stacked=True, figsize=(12, 6), col


plt.title('Total Sales by Price per Unit and Product Category')
plt.xlabel('Price per Unit')
plt.ylabel('Total Sales Amount')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [18]: sales_by_region_and_category.plot(kind='bar', figsize=(12, 6), colormap='Set3')


plt.title('Total Sales by Price per Unit and Product Category')
plt.xlabel('Price per Unit')
plt.ylabel('Total Sales Amount')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
