DS3.1
Enrolment No.: 210430116083
Experiment No: 3
Date:
Objective:
The objective of this lab practical is to gain hands-on experience with NumPy, Matplotlib, and
Pandas libraries to manipulate and visualize data. Through this practical, students will learn
how to use different functions of these libraries to perform various data analysis tasks.
Materials Used:
- Python programming environment
- NumPy library
- Matplotlib library
- Pandas library
- Dataset file (provided by faculty)
Example dataset file (sales_data.csv) with the following columns:
o Date: Date of sale
o Product: Name of the product sold
o Units Sold: Number of units sold
o Revenue: Total revenue generated from the sale
o Region: Geographic region where the sale took place
o Salesperson: Name of the salesperson who made the sale
Procedures:
Part 1: NumPy
1. Import the NumPy library into Python.
2. Create a NumPy array with the following specifications:
a. Dimensions: 5x5
b. Data type: integer
c. Values: random integers between 1 and 100
3. Reshape the array into a 1x25 array and calculate the mean, median, variance, and standard
deviation using NumPy functions.
4. Generate a random integer array of length 10 and find the percentile, decile, and quartile
values using NumPy functions.
Part 2: Matplotlib
1. Import the Matplotlib library into Python.
2. Create a simple bar chart using the following data:
a. X-axis values: ['A', 'B', 'C', 'D']
b. Y-axis values: [10, 20, 30, 40]
3. Customize the plot by adding a title, axis labels, and changing the color and style of the bars.
4. Create a pie chart using the following data:
a. Labels: ['Red', 'Blue', 'Green', 'Yellow']
b. Values: [20, 30, 10, 40]
5. Customize the pie chart by adding a title, changing the colors of the slices, and adding a
legend.
Part 3: Pandas
1. Import the Pandas library into Python.
2. Load the "sales_data.csv" file into a Pandas data frame.
3. Calculate the following statistics for the Units Sold and Revenue columns:
a. Mean
b. Median
c. Variance
d. Standard deviation
4. Group the data frame by Product and calculate the mean, median, variance, and standard
deviation of Units Sold and Revenue for each product using Pandas functions.
5. Create a line chart to visualize the trend of Units Sold and Revenue over time for each
product.
Interpretation/Program/code:
Part 1:
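import numpy as np
np.random.seed(0)  # fixed seed so the results are reproducible
# 5x5 array of random integers from 1 to 100 (the upper bound of randint is exclusive)
arr = np.random.randint(1, 101, size=(5, 5))
print(arr)
# Reshape to 1x25 before computing the statistics
reshaped_arr = arr.reshape(1, 25)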
mean = np.mean(reshaped_arr)
median = np.median(reshaped_arr)
variance = np.var(reshaped_arr)
std_dev = np.std(reshaped_arr)
print("Mean:", mean)
print("Median:", median)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
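# Random integer array of length 10 (value range 1 to 100 assumed, as for the 5x5 array)
rand_arr = np.random.randint(1, 101, size=10)
percentiles = np.percentile(rand_arr, [10, 50, 90])  # example percentile levels (assumed)
deciles = np.percentile(rand_arr, np.arange(10, 100, 10))  # 10th, 20th, ..., 90th percentiles
quartiles = np.percentile(rand_arr, [25, 50, 75])  # Q1, median, Q3
print("Percentiles:", percentiles)
print("Deciles:", deciles)
print("Quartiles:", quartiles)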
Part 2:
import matplotlib.pyplot as mt
x_values = ['A', 'B', 'C', 'D']
y_values = [10, 20, 30, 40]
mt.bar(x_values,y_values)
mt.xlabel('X-axis')
mt.ylabel('Y-axis')
mt.title('Simple Bar Chart')
mt.show()
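The listing above covers only the basic bar chart. Steps 3 to 5 of Part 2 (bar styling, pie chart, legend) could be sketched as follows. The colors, hatch pattern, axis labels, and the output file name part2_charts.png are illustrative assumptions, not values taken from the original submission.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as mt

x_values = ['A', 'B', 'C', 'D']
y_values = [10, 20, 30, 40]

# One figure with two panels: bar chart on the left, pie chart on the right
fig, (ax_bar, ax_pie) = mt.subplots(1, 2, figsize=(10, 4))

# Step 3: change the color and style of the bars (values chosen for illustration)
bars = ax_bar.bar(x_values, y_values, color='teal', edgecolor='black', hatch='//')
ax_bar.set_title('Customized Bar Chart')
ax_bar.set_xlabel('Category')
ax_bar.set_ylabel('Value')

# Steps 4 and 5: pie chart with custom slice colors, a title, and a legend
labels = ['Red', 'Blue', 'Green', 'Yellow']
values = [20, 30, 10, 40]
slice_colors = ['red', 'blue', 'green', 'yellow']
ax_pie.pie(values, labels=labels, colors=slice_colors, autopct='%1.0f%%')
ax_pie.set_title('Simple Pie Chart')
ax_pie.legend(title='Slices', loc='center left', bbox_to_anchor=(1.0, 0.5))

fig.tight_layout()
fig.savefig('part2_charts.png')  # hypothetical output file; use mt.show() in an interactive session
```

In an interactive session, dropping the matplotlib.use("Agg") line and calling mt.show() displays the figure instead of writing it to disk.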
Part 3:
import pandas as pd
df=pd.read_csv(r'sales_data.csv')
mean_units_sold = df['Order_Quantity'].mean()
median_units_sold = df['Order_Quantity'].median()
variance_units_sold = df['Order_Quantity'].var()
std_dev_units_sold = df['Order_Quantity'].std()
mean_revenue = df['Revenue'].mean()
median_revenue = df['Revenue'].median()
variance_revenue = df['Revenue'].var()
std_dev_revenue = df['Revenue'].std()
print("Units Sold:")
print("Mean:", mean_units_sold)
print("Median:", median_units_sold)
print("Variance:", variance_units_sold)
print("Standard Deviation:", std_dev_units_sold)
print("\nRevenue:")
print("Mean:", mean_revenue)
print("Median:", median_revenue)
print("Variance:", variance_revenue)
print("Standard Deviation:", std_dev_revenue)
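One point worth keeping in mind when comparing Part 1 with Part 3: np.var and np.std default to the population formulas (ddof=0), while the pandas .var() and .std() methods default to the sample formulas (ddof=1). The short sketch below illustrates the difference on a small made-up sample that is not taken from the lab dataset.

```python
import numpy as np
import pandas as pd

# Small made-up sample (mean is 5, sum of squared deviations is 32)
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_var = np.var(values)          # ddof=0: divides by n
sample_var_np = np.var(values, ddof=1)   # ddof=1: divides by n - 1
sample_var_pd = pd.Series(values).var()  # pandas default is ddof=1

print("NumPy default (population):", population_var)  # 4.0
print("NumPy with ddof=1 (sample):", sample_var_np)
print("Pandas default (sample):", sample_var_pd)
```

Passing ddof=1 to the NumPy functions, or ddof=0 to the pandas methods, makes the two libraries agree.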
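# Per-product statistics (assumes the CSV has a 'Product' column)
grouped_df = df.groupby('Product')[['Order_Quantity', 'Revenue']].agg(['mean', 'median', 'var', 'std'])
print(grouped_df)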
df['Date'] = pd.to_datetime(df['Date'])
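The plotting code for step 5 of Part 3 is not reproduced in this listing. A minimal sketch of one way to draw the per-product trend lines is given below. It uses a small made-up data frame in place of sales_data.csv, assumes the column names 'Date', 'Product', 'Order_Quantity', and 'Revenue', and writes to a hypothetical file part3_trends.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as mt
import pandas as pd

# Made-up stand-in for sales_data.csv; the column names are assumptions
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02',
             '2023-01-02', '2023-01-03', '2023-01-03'],
    'Product': ['Pen', 'Book', 'Pen', 'Book', 'Pen', 'Book'],
    'Order_Quantity': [5, 2, 7, 3, 6, 4],
    'Revenue': [50, 80, 70, 120, 60, 160],
})
df['Date'] = pd.to_datetime(df['Date'])

# Daily totals per product, then pivot so that rows are dates and columns are products
daily = df.groupby(['Date', 'Product'])[['Order_Quantity', 'Revenue']].sum()
units = daily['Order_Quantity'].unstack('Product')
revenue = daily['Revenue'].unstack('Product')

# Two stacked panels sharing the date axis: one line per product in each panel
fig, (ax_units, ax_rev) = mt.subplots(2, 1, sharex=True, figsize=(8, 6))
units.plot(ax=ax_units, marker='o')
ax_units.set_title('Units Sold over Time by Product')
ax_units.set_ylabel('Units Sold')
revenue.plot(ax=ax_rev, marker='o')
ax_rev.set_title('Revenue over Time by Product')
ax_rev.set_xlabel('Date')
ax_rev.set_ylabel('Revenue')

fig.tight_layout()
fig.savefig('part3_trends.png')  # hypothetical output file; use mt.show() interactively
```

Units Sold and Revenue are kept in separate panels because the two quantities are on very different scales; with the real dataset, replace the made-up data frame with the one loaded by pd.read_csv.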
Conclusion:
In conclusion, this lab practical provided hands-on experience with NumPy, Matplotlib, and
Pandas libraries in Python for data manipulation and visualization. These libraries have wide-
ranging applications in various fields, enabling researchers and analysts to gain insights from
large datasets quickly and efficiently. Through exercises such as calculating statistical
measures and visualizing data using charts, we explored the functionality and flexibility of
these powerful data analysis tools. Overall, gaining proficiency in these libraries equips
individuals to tackle complex data analysis challenges and contribute to their respective fields
of study or industries.
Quiz:
1. What is the difference between a list and a tuple in Python?
2. How can you use NumPy to generate an array of random numbers?
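As a starting point for question 2, the sketch below shows a few common ways to generate random arrays with NumPy. The seed, shapes, and value ranges are arbitrary choices for illustration.

```python
import numpy as np

# Generator-based API (recommended in current NumPy documentation)
rng = np.random.default_rng(seed=0)
int_arr = rng.integers(1, 101, size=(5, 5))            # integers from 1 to 100 (upper bound exclusive)
float_arr = rng.random(10)                             # floats in the half-open interval [0, 1)
normal_arr = rng.normal(loc=0.0, scale=1.0, size=10)   # samples from a standard normal distribution

# Legacy module-level functions, as used in Part 1
np.random.seed(0)
legacy_arr = np.random.randint(1, 101, size=10)        # integers from 1 to 100

print(int_arr)
print(float_arr)
print(normal_arr)
print(legacy_arr)
```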
Suggested References:
1. Dinesh Kumar, Business Analytics, Wiley India
2. V.K. Jain, Data Science & Analytics, Khanna Book Publishing, New Delhi
3. Lillian Pierson, Jake Porway, Data Science For Dummies
Rubrics wise marks obtained
02 02 05 01 10