
DS3.1

The document outlines a lab practical focused on Python data types and libraries such as NumPy, Matplotlib, and Pandas for data manipulation and visualization. Students will perform tasks including creating arrays, generating charts, and analyzing datasets to gain hands-on experience. The conclusion emphasizes the importance of these libraries in efficiently handling and interpreting large datasets.

Uploaded by: Armankhan Pathan
Copyright © All Rights Reserved

Enrolment No.: 210430116083

Experiment No: 3

Date:

AIM: Study of the basics of Python data types, NumPy, Matplotlib, and Pandas.

Relevant CO: CO1, CO2

Objective:
The objective of this lab practical is to gain hands-on experience with NumPy, Matplotlib, and
Pandas libraries to manipulate and visualize data. Through this practical, students will learn
how to use different functions of these libraries to perform various data analysis tasks.

Materials Used:
- Python programming environment
- NumPy library
- Matplotlib library
- Pandas library
- Dataset file (provided by faculty)
Example dataset file (sales_data.csv) with the following columns:
o Date: Date of sale
o Product: Name of the product sold
o Units Sold: Number of units sold (the Part 3 code reads this column as Order_Quantity)
o Revenue: Total revenue generated from the sale
o Region: Geographic region where the sale took place
o Salesperson: Name of the salesperson who made the sale
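The faculty-provided dataset is not included here. As a hedged stand-in, the sketch below builds a tiny DataFrame whose columns follow the list above (with Units Sold stored as `Order_Quantity`, the name the Part 3 code reads), round-trips it through CSV text with `io.StringIO` instead of a real file, and parses the dates. Every row, product name, and salesperson is a made-up placeholder, not real sales data.

```python
import io

import pandas as pd

# Made-up placeholder rows that only illustrate the expected schema.
sample = pd.DataFrame({
    "Date": ["2023-01-05", "2023-01-05", "2023-02-10", "2023-02-10"],
    "Product": ["Helmet", "Jersey", "Helmet", "Jersey"],
    "Order_Quantity": [3, 1, 5, 2],          # the "Units Sold" column
    "Revenue": [105.0, 50.0, 175.0, 100.0],
    "Region": ["North", "South", "North", "East"],
    "Salesperson": ["Asha", "Ravi", "Asha", "Meera"],
})

# Round-trip through CSV text so loading mirrors pd.read_csv('sales_data.csv').
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
df = pd.read_csv(buffer, parse_dates=["Date"])

print(df.dtypes)
print(df.head())
```

Replacing `buffer` with the real file path gives the same loading step used in Part 3.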

Procedures:

Part 1: NumPy
1. Import the NumPy library into Python.
2. Create a NumPy array with the following specifications:
a. Dimensions: 5x5
b. Data type: integer
c. Values: random integers between 1 and 100
3. Reshape the array into a 1x25 array and calculate the mean, median, variance, and standard
deviation using NumPy functions.
4. Generate a random integer array of length 10 and find the percentile, decile, and quartile
values using NumPy functions.
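Steps 3 and 4 can be sanity-checked with a short sketch. It assumes NumPy's newer `np.random.default_rng` Generator API rather than the `np.random.seed`/`randint` calls used in the Part 1 program, and it shows two facts: reshaping does not change the statistics, and deciles and quartiles are just particular percentiles.

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded Generator so runs are repeatable
arr = rng.integers(1, 101, size=(5, 5))    # integers 1..100 (upper bound is exclusive)

flat = arr.reshape(1, 25)
# Reshaping only changes the shape, not the values, so the statistics are identical.
assert np.mean(flat) == np.mean(arr)
assert np.median(flat) == np.median(arr)

data = rng.integers(1, 101, size=10)       # step 4: random integer array of length 10
deciles = np.percentile(data, np.arange(10, 100, 10))   # 10th, 20th, ..., 90th
quartiles = np.percentile(data, [25, 50, 75])

# np.quantile takes fractions in [0, 1] instead of percentages.
assert np.allclose(quartiles, np.quantile(data, [0.25, 0.5, 0.75]))
# The 50th percentile (5th decile, 2nd quartile) is the median.
assert np.isclose(deciles[4], np.median(data))

print("Deciles:", deciles)
print("Quartiles:", quartiles)
```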

Part 2: Matplotlib
1. Import the Matplotlib library into Python.
2. Create a simple bar chart using the following data:
a. X-axis values: ['A', 'B', 'C', 'D']
b. Y-axis values: [10, 20, 30, 40]
3. Customize the plot by adding a title, axis labels, and changing the color and style of the bars.
4. Create a pie chart using the following data:
a. Labels: ['Red', 'Blue', 'Green', 'Yellow']
b. Values: [20, 30, 10, 40]
5. Customize the pie chart by adding a title, changing the colors of the slices, and adding a legend.
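As a hedged alternative to the `pyplot` state-machine calls used in Part 2, the same customized bar and pie charts can be drawn on explicit `Axes` objects, which places both charts in one figure. The `Agg` backend line is an assumption so the sketch runs without a display (drop it in an interactive session and call `plt.show()`); `autopct`, which prints a percentage on each slice, is an optional extra not required by step 5.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x_values = ['A', 'B', 'C', 'D']
y_values = [10, 20, 30, 40]
labels = ['Red', 'Blue', 'Green', 'Yellow']
values = [20, 30, 10, 40]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Customized bar chart: fill color, dashed black edges, title and axis labels.
bars = ax_bar.bar(x_values, y_values, color='skyblue', edgecolor='black', linestyle='--')
ax_bar.set_title('Customized Bar Chart')
ax_bar.set_xlabel('Category')
ax_bar.set_ylabel('Value')

# Customized pie chart: slice colors match the labels, percentages shown on each slice.
wedges, texts, autotexts = ax_pie.pie(values, labels=labels,
                                      colors=['red', 'blue', 'green', 'yellow'],
                                      autopct='%1.0f%%')
ax_pie.set_title('Customized Pie Chart')
ax_pie.legend(wedges, labels, title='Colors', loc='upper right')

fig.tight_layout()
```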

Part 3: Pandas
1. Import the Pandas library into Python.
2. Load the "sales_data.csv" file into a Pandas data frame.
3. Calculate the following statistics for the Units Sold and Revenue columns:
a. Mean
b. Median
c. Variance
d. Standard deviation
4. Group the data frame by Product and calculate the mean, median, variance, and standard
deviation of Units Sold and Revenue for each product using Pandas functions.
5. Create a line chart to visualize the trend of Units Sold and Revenue over time for each
product.
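Step 5 can also be approached by pivoting instead of filtering product by product. The sketch below uses hypothetical sample rows and aggregates to monthly totals with `pd.Grouper(key='Date', freq='MS')`; monthly binning is an illustrative choice, not something the procedure prescribes. `unstack('Product')` turns each product into its own column, so `DataFrame.plot` draws one line per product.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; use plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; in the lab, df comes from pd.read_csv('sales_data.csv').
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10",
                            "2023-02-15", "2023-03-01", "2023-03-09"]),
    "Product": ["Helmet", "Jersey", "Helmet", "Jersey", "Helmet", "Jersey"],
    "Order_Quantity": [3, 1, 5, 2, 4, 6],
    "Revenue": [105.0, 50.0, 175.0, 100.0, 140.0, 300.0],
})

# Sum each product's sales per calendar month ("MS" = month-start bins).
monthly = (df.groupby(["Product", pd.Grouper(key="Date", freq="MS")])
             [["Order_Quantity", "Revenue"]].sum())

# Pivot products into columns: one line per product on each chart.
units = monthly["Order_Quantity"].unstack("Product")
revenue = monthly["Revenue"].unstack("Product")

fig, (ax_units, ax_rev) = plt.subplots(2, 1, sharex=True)
units.plot(ax=ax_units, marker="o", title="Units Sold per Month")
revenue.plot(ax=ax_rev, marker="o", title="Revenue per Month")
ax_units.set_ylabel("Units Sold")
ax_rev.set_ylabel("Revenue")
fig.tight_layout()
```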

Interpretation/Program/code:
Part 1:
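import numpy as np

# Fix the seed so the random values are reproducible
np.random.seed(0)

# Step 2: 5x5 integer array with random values from 1 to 100 (the upper bound 101 is exclusive)
arr = np.random.randint(1, 101, size=(5, 5), dtype=int)

print(arr)

# Step 3: reshape into a single row of 25 values
reshaped_arr = arr.reshape(1, 25)

mean = np.mean(reshaped_arr)
median = np.median(reshaped_arr)
variance = np.var(reshaped_arr)   # population variance (NumPy default ddof=0)
std_dev = np.std(reshaped_arr)    # population standard deviation (ddof=0)

print("Mean:", mean)
print("Median:", median)
print("Variance:", variance)
print("Standard Deviation:", std_dev)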

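# Step 4: generate a new random integer array of length 10
rand_arr = np.random.randint(1, 101, size=10)
print(rand_arr)

# Selected percentiles
percentiles = np.percentile(rand_arr, [10, 25, 50, 75, 90])

# Deciles: the 10th, 20th, ..., 90th percentiles
deciles = np.percentile(rand_arr, [10, 20, 30, 40, 50, 60, 70, 80, 90])

# Quartiles: the 25th, 50th and 75th percentiles
quartiles = np.percentile(rand_arr, [25, 50, 75])

print("Percentiles:", percentiles)
print("Deciles:", deciles)
print("Quartiles:", quartiles)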

Part 2:
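import matplotlib.pyplot as mt

# Step 2: simple bar chart
x_values = ['A', 'B', 'C', 'D']
y_values = [10, 20, 30, 40]
mt.bar(x_values, y_values)
mt.xlabel('X-axis')
mt.ylabel('Y-axis')
mt.title('Simple Bar Chart')
mt.show()

# Step 3: customized bar chart (fill color, dashed black edges, title, axis labels)
mt.bar(x_values, y_values, color='skyblue', edgecolor='black', linestyle='--')
mt.xlabel('Category')
mt.ylabel('Value')
mt.title('Customized Bar Chart')
mt.show()

# Step 4: simple pie chart
labels = ['Red', 'Blue', 'Green', 'Yellow']
values = [20, 30, 10, 40]
mt.pie(values, labels=labels)
mt.title('Pie Chart')
mt.show()

# Step 5: customized pie chart (slice colors matching the labels, title, legend)
colors = ['red', 'blue', 'green', 'yellow']
mt.pie(values, labels=labels, colors=colors)
mt.title('Customized Pie Chart')
mt.legend(loc='upper right')
mt.show()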


Part 3:
import pandas as pd

df = pd.read_csv('sales_data.csv')

# In the dataset used here, the "Units Sold" column is named Order_Quantity.
# pandas .var() and .std() use the sample formulas (ddof=1) by default.
mean_units_sold = df['Order_Quantity'].mean()
median_units_sold = df['Order_Quantity'].median()
variance_units_sold = df['Order_Quantity'].var()
std_dev_units_sold = df['Order_Quantity'].std()

mean_revenue = df['Revenue'].mean()
median_revenue = df['Revenue'].median()
variance_revenue = df['Revenue'].var()
std_dev_revenue = df['Revenue'].std()

print("Units Sold:")
print("Mean:", mean_units_sold)
print("Median:", median_units_sold)
print("Variance:", variance_units_sold)
print("Standard Deviation:", std_dev_units_sold)

print("\nRevenue:")
print("Mean:", mean_revenue)
print("Median:", median_revenue)
print("Variance:", variance_revenue)
print("Standard Deviation:", std_dev_revenue)
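One detail matters when comparing Part 1 with Part 3: `np.var`/`np.std` default to the population formulas (`ddof=0`, divide by n), while pandas `Series.var`/`Series.std` default to the sample formulas (`ddof=1`, divide by n - 1), so the two libraries report different numbers for the same data. The sketch below, using made-up values in place of the real columns, checks this and then computes all four statistics for both columns in a single `DataFrame.agg` call.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the Order_Quantity column.
units = pd.Series([3, 1, 5, 2, 4, 6], name="Order_Quantity")

# pandas default: sample variance (ddof=1).
sample_var = units.var()
# NumPy default: population variance (ddof=0).
population_var = np.var(units.to_numpy())

# Passing ddof explicitly makes the two libraries agree.
assert np.isclose(units.var(ddof=0), population_var)
assert np.isclose(np.var(units.to_numpy(), ddof=1), sample_var)

# All four statistics for several columns in one call.
df = pd.DataFrame({"Order_Quantity": units,
                   "Revenue": [105.0, 50.0, 175.0, 100.0, 140.0, 300.0]})
summary = df[["Order_Quantity", "Revenue"]].agg(["mean", "median", "var", "std"])
print(summary)
```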


grouped_df = df.groupby('Product').agg({
    'Order_Quantity': ['mean', 'median', 'var', 'std'],
    'Revenue': ['mean', 'median', 'var', 'std'],
})

print(grouped_df)
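The dictionary-of-lists form of `agg` produces a two-level column header such as `('Order_Quantity', 'mean')`. As a sketch on hypothetical rows, pandas named aggregation (`new_name=(column, function)`) yields flat column names instead; the output names such as `units_mean` are illustrative choices.

```python
import pandas as pd

# Hypothetical sample rows standing in for sales_data.csv.
df = pd.DataFrame({
    "Product": ["Helmet", "Jersey", "Helmet", "Jersey", "Helmet"],
    "Order_Quantity": [3, 1, 5, 2, 4],
    "Revenue": [105.0, 50.0, 175.0, 100.0, 140.0],
})

# Named aggregation: keyword=(column, function) gives one flat column per statistic.
summary = df.groupby("Product").agg(
    units_mean=("Order_Quantity", "mean"),
    units_median=("Order_Quantity", "median"),
    units_var=("Order_Quantity", "var"),
    units_std=("Order_Quantity", "std"),
    revenue_mean=("Revenue", "mean"),
    revenue_median=("Revenue", "median"),
    revenue_var=("Revenue", "var"),
    revenue_std=("Revenue", "std"),
)
print(summary.round(2))
```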


import matplotlib.pyplot as mt

# Step 5: trend of Units Sold and Revenue over time for selected products
df['Date'] = pd.to_datetime(df['Date'])

# Total quantity and revenue per product per date (numeric columns only)
daily_df = df.groupby(['Product', 'Date'])[['Order_Quantity', 'Revenue']].sum().reset_index()

products = ['Hitch Rack - 4-Bike', 'Sport-100 Helmet, Black', 'Long-Sleeve Logo Jersey, L']

fig, (ax_units, ax_rev) = mt.subplots(2, 1, sharex=True)
for product in products:
    subset = daily_df[daily_df['Product'] == product]
    ax_units.plot(subset['Date'], subset['Order_Quantity'], label=product)
    ax_rev.plot(subset['Date'], subset['Revenue'], label=product)
ax_units.set_ylabel('Units Sold')
ax_rev.set_ylabel('Revenue')
ax_rev.set_xlabel('Date')
ax_units.legend()
mt.show()


Conclusion:
In conclusion, this lab practical provided hands-on experience with the NumPy, Matplotlib, and
Pandas libraries in Python for data manipulation and visualization. These libraries are used in
many fields because they let researchers and analysts summarize and plot large datasets quickly
and efficiently. Through exercises such as calculating statistical measures (mean, median,
variance, standard deviation, and percentiles) and visualizing data with bar, pie, and line
charts, we explored the functionality and flexibility of these data analysis tools. Overall,
proficiency in these libraries equips individuals to tackle complex data analysis problems in
their fields of study or industries.

Quiz:
1. What is the difference between a list and a tuple in Python?
2. How can you use NumPy to generate an array of random numbers?
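A brief sketch touching both quiz topics, offered as an illustration rather than a model answer: a list is mutable while a tuple is immutable (and hashable when its items are, so it can serve as a dictionary key), and NumPy can generate random arrays through a `Generator` from `np.random.default_rng` as well as through the legacy `np.random` functions used in Part 1. The variable names are hypothetical.

```python
import numpy as np

# List vs tuple: lists can be changed in place, tuples cannot.
scores_list = [10, 20, 30]
scores_list[0] = 15                 # allowed: lists are mutable
scores_tuple = (10, 20, 30)
try:
    scores_tuple[0] = 15            # raises TypeError: tuples are immutable
except TypeError as err:
    print("Tuple error:", err)

# A tuple of hashable items can be a dictionary key; a list cannot.
grid = {(0, 0): "origin"}

# Random arrays with NumPy's Generator API (seeded for repeatability).
rng = np.random.default_rng(42)
ints = rng.integers(1, 101, size=(3, 4))          # integers in [1, 100]
floats = rng.random(5)                            # floats in [0.0, 1.0)
normal = rng.normal(loc=0.0, scale=1.0, size=5)   # standard normal samples
print(ints, floats, normal, sep="\n")
```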

Rubrics wise marks obtained

Understanding of Problem | Analysis of the Problem | Capability of writing program | Documentation | Total
02                       | 02                      | 05                            | 01            | 10
