Q.P Code: 100114
JAI SHRIRAM ENGINEERING COLLEGE
An Autonomous Institution
B.E./B.Tech. Degree Examinations Nov/Dec – 2024
Course Code: CS3352 Course Name: Foundations of Data Science
Semester: 3rd Semester Max Marks: 100
Answer key
Part – A
10 x 2 = 20
1. Question: Identify the importance of project charter.
Answer:
A project charter authorizes a project, defines objectives, scope, and stakeholder roles,
ensuring alignment and clarity. It acts as a reference document throughout the project
lifecycle.
2. Question: Define Data Warehousing.
Answer:
Data warehousing involves collecting and storing data from multiple sources in a
centralized repository. It facilitates efficient querying, reporting, and decision-making.
3. Question: Given the following data set: 5, 7, 8, 10, 12, 14, 15, 18, 20. Calculate the
interquartile range.
Answer:
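The interquartile range is IQR = Q3 - Q1. The value depends on the quartile convention, so the sketch below computes it both ways with Python's standard statistics module (the variable names are illustrative and not part of the question). Leaving the median (12) out of both halves gives Q1 = 7.5 and Q3 = 16.5, so IQR = 9. The inclusive, linear-interpolation convention gives Q1 = 8 and Q3 = 15, so IQR = 7.

```python
import statistics

data = [5, 7, 8, 10, 12, 14, 15, 18, 20]

# Exclusive convention: for this data set it matches splitting the data
# at the median and leaving the median out of both halves.
q1_exc, _, q3_exc = statistics.quantiles(data, n=4, method="exclusive")
iqr_exclusive = q3_exc - q1_exc

# Inclusive convention: treats the data as including its own min and max.
q1_inc, _, q3_inc = statistics.quantiles(data, n=4, method="inclusive")
iqr_inclusive = q3_inc - q1_inc

print("Exclusive:", q1_exc, q3_exc, iqr_exclusive)
print("Inclusive:", q1_inc, q3_inc, iqr_inclusive)
```

Either result may be acceptable, provided the method used for the quartiles is stated.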
4. Question: Apply the formula to convert a z-score to the original score.
Answer:
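A z-score is converted back to the original score with X = μ + zσ, where μ is the mean and σ is the standard deviation of the distribution. A minimal sketch follows; the helper name and the values used (mean 50, standard deviation 10, z = 1.5) are assumptions chosen for illustration, not part of the question.

```python
def z_to_original(z, mean, std_dev):
    """Return the original score X = mean + z * std_dev."""
    return mean + z * std_dev


# Hypothetical example: a z-score of 1.5 in a distribution
# with mean 50 and standard deviation 10.
original = z_to_original(1.5, mean=50, std_dev=10)
print(original)
```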
5. Question: Define z scores.
Answer:
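A z-score expresses how many standard deviations a value lies above or below the mean: z = (X - μ) / σ. The sketch below standardises a small made-up data set (the values are assumptions, chosen only so the arithmetic is easy to follow) using the population standard deviation. The resulting z-scores have mean 0 and standard deviation 1.

```python
import statistics

# Illustrative data only: mean is 5 and population standard deviation is 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)

# Standardise each value: distance from the mean in units of sigma.
z_scores = [(x - mu) / sigma for x in scores]

print("z-scores:", z_scores)
print("mean of z:", statistics.mean(z_scores))
print("std dev of z:", statistics.pstdev(z_scores))
```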
6. Question: Identify the properties of correlation coefficient.
Answer:
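Commonly cited properties of the Pearson correlation coefficient r: it always lies between -1 and +1; its sign gives the direction of the linear relationship and its magnitude gives the strength; it is symmetric, so r(x, y) = r(y, x); and it has no units, so a positive linear rescaling of either variable leaves it unchanged. The sketch below checks these properties on a made-up data set. The pearson_r helper is hypothetical, written here only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print("r =", round(r, 4))

# Property 1: r lies in [-1, 1].
print(-1 <= r <= 1)
# Property 2: symmetry.
print(math.isclose(r, pearson_r(y, x)))
# Property 3: unchanged by a positive linear rescaling (here 2x + 3).
print(math.isclose(r, pearson_r([2 * a + 3 for a in x], y)))
# Property 4: a perfect increasing linear relationship gives r = +1.
print(math.isclose(pearson_r(x, [3 * a + 1 for a in x]), 1.0))
```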
7. Question: Name some NumPy Array attributes.
Answer:
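Commonly used NumPy array attributes include ndim (number of dimensions), shape (size along each dimension), size (total number of elements), dtype (element data type), itemsize (bytes per element) and nbytes (total bytes). A short sketch on an illustrative 3 x 4 array; the dtype is fixed to int32 here only so that the byte counts are the same on every platform.

```python
import numpy as np

# Illustrative 3 x 4 array of 32-bit integers.
arr = np.arange(12, dtype=np.int32).reshape(3, 4)

print("ndim:", arr.ndim)          # number of dimensions
print("shape:", arr.shape)        # size along each dimension
print("size:", arr.size)          # total number of elements
print("dtype:", arr.dtype)        # data type of the elements
print("itemsize:", arr.itemsize)  # bytes per element
print("nbytes:", arr.nbytes)      # total bytes = size * itemsize
```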
8. Question: Write a command to create a two-dimensional array.
Answer:
# Create a 2D NumPy array using np.array()
import numpy as np
# Create a 2D array with 3 rows and 4 columns
array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8],[9, 10, 11, 12]])
print(array_2d)
9. Question: How can you set different colors for a bar plot?
Answer:
Pass a list of colors, one per bar, to the color parameter of plt.bar():
plt.bar(x, y, color=['red', 'blue', 'green'])
10. Question: State the purpose of a histogram.
Answer:
To visualize the distribution of numerical data.
Shows the frequency of data within specific intervals (bins).
Helps identify patterns, such as skewness or modality.
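The binning idea behind a histogram can be sketched with np.histogram, which returns the frequency counts that a plotted histogram would draw as bars. The marks and bin edges below are made-up values used only for illustration.

```python
import numpy as np

# Illustrative data: ten marks grouped into bins of width 10.
marks = [52, 55, 61, 64, 67, 68, 71, 73, 75, 88]
bin_edges = [50, 60, 70, 80, 90]

# counts[i] is the number of values falling in [bin_edges[i], bin_edges[i + 1])
# (the last bin also includes its right edge).
counts, edges = np.histogram(marks, bins=bin_edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")
```

Here the 60-70 bin has the highest frequency, which is the kind of pattern a histogram makes visible at a glance.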
Scheme of Evaluation
Part – B
5 X 13 = 65
15. b) .
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data for three groups
np.random.seed(42)
# Group 1
weights_group1 = np.random.uniform(56, 64, 20)
heights_group1 = np.random.uniform(120, 180, 20)
# Group 2
weights_group2 = np.random.uniform(60, 68, 20)
heights_group2 = np.random.uniform(140, 200, 20)
# Group 3
weights_group3 = np.random.uniform(66, 72, 20)
heights_group3 = np.random.uniform(160, 240, 20)
# Plotting the scatter plot
plt.figure(figsize=(8, 6))
# Group 1
plt.scatter(weights_group1, heights_group1, label='Group 1', color='blue', alpha=0.7)
# Group 2
plt.scatter(weights_group2, heights_group2, label='Group 2', color='green', alpha=0.7)
# Group 3
plt.scatter(weights_group3, heights_group3, label='Group 3', color='red', alpha=0.7)
# Adding labels, title, and legend
plt.title("Group wise Weight vs Height scatter plot")
plt.xlabel("weight")
plt.ylabel("height")
plt.legend()
plt.grid(True)
# Show plot
plt.show()
Part – C
1 X 15 = 15
16. a) You have been provided with a CSV file named "sales_data.csv" that contains
sales data for a company. The file has the following columns: "Date", "Product",
"Quantity", and "Revenue". Your task is to load the data into a pandas DataFrame
and perform the following analysis.
Each 3 marks
i. Calculate the total revenue generated by the company.
ii. Find the product that generated the highest revenue.
iii. Calculate the average quantity sold per day.
iv. Group the data by month and calculate the total revenue for each month.
v. Plot a line graph showing the monthly revenue over time.
Answer
Python program
import pandas as pd
import matplotlib.pyplot as plt
# Load the CSV file into a DataFrame
df = pd.read_csv("sales_data.csv")
# Ensure 'Date' column is in datetime format
df['Date'] = pd.to_datetime(df['Date'])
# i. Calculate the total revenue generated by the company
total_revenue = df['Revenue'].sum()
print(f"Total Revenue: {total_revenue}")
# ii. Find the product that generated the highest revenue
highest_revenue_product = df.groupby('Product')['Revenue'].sum().idxmax()
print(f"Product with highest revenue: {highest_revenue_product}")
# iii. Calculate the average quantity sold per day
avg_quantity_per_day = df.groupby('Date')['Quantity'].sum().mean()
print(f"Average quantity sold per day: {avg_quantity_per_day}")
# iv. Group the data by month and calculate the total revenue for each month
df['Month'] = df['Date'].dt.to_period('M')  # Monthly period, e.g. 2024-01
monthly_revenue = df.groupby('Month')['Revenue'].sum()
print(monthly_revenue)
# v. Plot a line graph showing the monthly revenue over time
plt.figure(figsize=(10, 6))
monthly_revenue.plot(kind='line', marker='o')
plt.title('Monthly Revenue Over Time')
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.grid(True)
plt.show()
OR
b) Develop an example of a contour plot, histogram, 3D plot and line plot using
Matplotlib.
Answer
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Prepare a grid and data for Contour Plot and 3D Plot
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))
# Random data for Histogram
data = np.random.randn(1000)
# Data for Line Plot
x_line = np.linspace(0, 10, 100)
y_line = np.sin(x_line)
# Create a figure with 4 subplots
fig = plt.figure(figsize=(14, 10))
# 1. Contour Plot
ax1 = fig.add_subplot(2, 2, 1)
contour = ax1.contour(X, Y, Z, levels=10, cmap='viridis')
fig.colorbar(contour, ax=ax1)
ax1.set_title('Contour Plot')
ax1.set_xlabel('X-axis')
ax1.set_ylabel('Y-axis')
# 2. Histogram
ax2 = fig.add_subplot(2, 2, 2)
ax2.hist(data, bins=30, color='blue', alpha=0.7, edgecolor='black')
ax2.set_title('Histogram')
ax2.set_xlabel('Data')
ax2.set_ylabel('Frequency')
# 3. 3D Plot
ax3 = fig.add_subplot(2, 2, 3, projection='3d')
ax3.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax3.set_title('3D Surface Plot')
ax3.set_xlabel('X-axis')
ax3.set_ylabel('Y-axis')
ax3.set_zlabel('Z-axis')
# 4. Line Plot
ax4 = fig.add_subplot(2, 2, 4)
ax4.plot(x_line, y_line, label='sin(x)', color='red', linewidth=2)
ax4.set_title('Line Plot')
ax4.set_xlabel('X-axis')
ax4.set_ylabel('Y-axis')
ax4.legend()
ax4.grid(True)
# Adjust layout and show the plots
plt.tight_layout()
plt.show()
Course In-Charge HoD