Active Product Sales Analysis using Matplotlib in Python
Last Updated :
10 Sep, 2024
Every modern company that engages in online sales or maintains a specialized e-commerce website now aims to maximize its throughput in order to determine what precisely their clients need in order to increase their chances of sales. The huge datasets handed to us can be properly analyzed to find out what time of day has the highest user activity in terms of transactions.
In this post, We will use Python Pandas and Matplotlib to analyze the insight of the dataset. We can use the column Transaction Date, in this case, to glean useful insights on the busiest time (hour) of the day. You can access the entire dataset here.
Stepwise Implementation
Step 1:
First, We need to create a Dataframe of the dataset, and even before that certain libraries have to be imported.
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Order_Details = pd.read_csv('Order_details(masked).csv')
Output:
Step 2:
Create a new column called Time that has the DateTime format after converting the Transaction Date column into it. The DateTime format, which has the pattern YYYY-MM-DD HH:MM:SS, can be customized however you choose. Here we're more interested in obtaining hours, so we can have an Hour column by using an in-built function for the same:
Python
# here we have taken Transaction
# date column
Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date'])
# After that we extracted hour
# from Transaction date column
Order_Details['Hour'] = (Order_Details['Time']).dt.hour
Step 3:
We then require the "n" busiest hours. For that, we get the first "n" entries in a list containing the occurrence rates of the hours when the transaction took place. To further simplify the manipulation of the provided data in Python, we may utilize value counts for frequencies and tolist() to convert to list format. We are also compiling a list of the associated index values.
Python
# n =24 in this case, can be modified
# as per need to see top 'n' busiest hours
timemost1 = Order_Details['Hour'].value_counts().index.tolist()[:24]
timemost2 = Order_Details['Hour'].value_counts().values.tolist()[:24]
Step 4:
Finally, we stack the indices (hour) and frequencies together to yield the final result.
Python
tmost = np.column_stack((timemost1,timemost2))
print(" Hour Of Day" + "\t" + "Cumulative Number of Purchases \n")
print('\n'.join('\t\t'.join(map(str, row)) for row in tmost))
Step 5:
Before we can create an appropriate data visualization, we must make the list slightly more customizable. To do so, we gather the hourly frequencies and perform the following tasks:
Python
timemost = Order_Details['Hour'].value_counts()
timemost1 = []
for i in range(0,23):
timemost1.append(i)
timemost2 = timemost.sort_index()
timemost2.tolist()
timemost2 = pd.DataFrame(timemost2)
Step 6:
For data visualization, we will proceed with Matplotlib for better comprehensibility, as it is one of the most convenient and commonly used libraries. But, It is up to you to choose any of the pre-existing libraries like Matplotlib, Ggplot, Seaborn, etc., to plot the data graphically.
The commands written below are mainly to ensure that X-axis takes up the values of hours and Y-axis takes up the importance of the number of transactions affected, and also various other aspects of a line chart, including color, font, etc., to name a few.
Python
plt.figure(figsize=(20, 10))
plt.title('Sales Happening Per Hour (Spread Throughout The Week)',
fontdict={'fontname': 'monospace', 'fontsize': 30}, y=1.05)
plt.ylabel("Number Of Purchases Made", fontsize=18, labelpad=20)
plt.xlabel("Hour", fontsize=18, labelpad=20)
plt.plot(timemost1, timemost2, color='m')
plt.grid()
plt.show()
The results are indicative of how sales typically peak in late evening hours prominently, and this data can be incorporated into business decisions to promote a product during that time specifically.
Get the complete notebook link here
Colab Link : click here.
Dataset Link : click here.
Similar Reads
Python Tutorial | Learn Python Programming Language Python Tutorial â Python is one of the most popular programming languages. Itâs simple to use, packed with features and supported by a wide range of libraries and frameworks. Its clean syntax makes it beginner-friendly.Python is:A high-level language, used in web development, data science, automatio
10 min read
Support Vector Machine (SVM) Algorithm Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It tries to find the best boundary known as hyperplane that separates different classes in the data. It is useful when you want to do binary classification like spam vs. not spam or
9 min read
Logistic Regression in Machine Learning Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike linear regression which predicts continuous values it predicts the probability that an input belongs to a specific class. It is used for binary classification where the output can be one of two po
11 min read
File Handling in Python File handling refers to the process of performing operations on a file such as creating, opening, reading, writing and closing it, through a programming interface. It involves managing the data flow between the program and the file system on the storage device, ensuring that data is handled safely a
7 min read
Learn Data Science Tutorial With Python Data Science has become one of the fastest-growing fields in recent years, helping organizations to make informed decisions, solve problems and understand human behavior. As the volume of data grows so does the demand for skilled data scientists. The most common languages used for data science are P
3 min read
Supervised Machine Learning Supervised machine learning is a fundamental approach for machine learning and artificial intelligence. It involves training a model using labeled data, where each input comes with a corresponding correct output. The process is like a teacher guiding a studentâhence the term "supervised" learning. I
12 min read
Python Lambda Functions Python Lambda Functions are anonymous functions means that the function is without a name. As we already know the def keyword is used to define a normal function in Python. Similarly, the lambda keyword is used to define an anonymous function in Python. In the example, we defined a lambda function(u
6 min read
Gradient Descent Algorithm in Machine Learning Gradient descent is the backbone of the learning process for various algorithms, including linear regression, logistic regression, support vector machines, and neural networks which serves as a fundamental optimization technique to minimize the cost function of a model by iteratively adjusting the m
15+ min read
Generative Adversarial Network (GAN) Generative Adversarial Networks (GANs) help machines to create new, realistic data by learning from existing examples. It is introduced by Ian Goodfellow and his team in 2014 and they have transformed how computers generate images, videos, music and more. Unlike traditional models that only recogniz
12 min read
Printing Pyramid Patterns in Python Pyramid patterns are sequences of characters or numbers arranged in a way that resembles a pyramid, with each level having one more element than the level above. These patterns are often used for aesthetic purposes and in educational contexts to enhance programming skills.Exploring and creating pyra
9 min read