
Practical (Data Science)

The document outlines the Continuous Internal Evaluation (CIE) laboratory work for the course 'Introduction to Data Science' for the academic year 2023-2024. It lists the experiments with their code and outputs. These cover data manipulation with pandas, such as creating Series and DataFrames and computing statistics, and a machine learning task for weather prediction. Each practical is designed to build the student's understanding and application of data science concepts.


Continuous Internal Evaluation (CIE) Laboratory Work

Academic Year: 2023-2024
Course Name: Introduction to Data Science

Name of Student: Yuv Sharma
Class: BCA 4th Semester
Batch: 2022-23
Department: ICT
Roll No.: 03221302022

Exp. No. | Date | Title of Experiment | Attendance & Participation (10) | Performance & Understanding (10) | Timely Submission (10) | Total (Out of 30) | Date of Submission | Signature of Teacher
1 |  | Create a pandas Series from a dictionary of values and an ndarray |  |  |  |  |  |
2 |  | Create a Series and print all the elements that are above the 75th percentile |  |  |  |  |  |
3 |  | Perform sorting on Series data and DataFrames |  |  |  |  |  |
4 |  | Write a program to implement pivot() and pivot_table() on a DataFrame |  |  |  |  |  |
5 |  | Write a program to find the mean absolute deviation of a DataFrame |  |  |  |  |  |
6 |  | Create a DataFrame based on e-commerce data and calculate the mean, mode and median |  |  |  |  |  |
7 |  | Create a DataFrame based on employee data and calculate quartiles and variance |  |  |  |  |  |
8 |  | Write a program to create a DataFrame to store the weight, age and name of three people. Print the DataFrame and its transpose |  |  |  |  |  |
9 |  | Series objects Temp1, Temp2, Temp3 and Temp4 store the temperatures of the days of weeks 1, 2, 3 and 4. Write a script to: a. print the average temperature per week; b. print the average temperature of the entire month |  |  |  |  |  |
10 |  | Predict the weather with machine learning |  |  |  |  |  |
Subject In-charge Student Signature


Practical – 1
1. Create a pandas Series from a dictionary of values and an ndarray

Code:

import pandas as pd

import numpy as np

# Example 1: Creating a Pandas Series from a dictionary

dictionary = {'A': 10, 'B': 20, 'C': 30}

series_from_dict = pd.Series(dictionary)

print("Series from dictionary:")

print(series_from_dict)

# Example 2: Creating a Pandas Series from an ndarray

arr = np.array([1, 3, 4, 7, 8, 8, 9])

series_from_ndarray = pd.Series(arr)

print("\nSeries from ndarray:")

print(series_from_ndarray)

Output:
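When a dictionary is passed, its keys become the index labels. If an explicit `index` is also given, pandas selects the values in that order and fills labels that are missing from the dictionary with NaN. An ndarray, by contrast, gets a default 0..n-1 index unless labels are supplied. The following is a minimal sketch reusing the values above; the labels 'D' and 'p' to 'r' are illustrative only.

```python
import numpy as np
import pandas as pd

dictionary = {'A': 10, 'B': 20, 'C': 30}

# Keys become index labels, so values can be looked up by key
s = pd.Series(dictionary)
print(s['B'])  # 20

# An explicit index selects and reorders keys; 'D' is not in the dict, so it becomes NaN
s_reindexed = pd.Series(dictionary, index=['C', 'A', 'D'])
print(s_reindexed)

# An ndarray gets a default RangeIndex unless labels are passed explicitly
arr = np.array([1, 3, 4])
s_labeled = pd.Series(arr, index=['p', 'q', 'r'])
print(s_labeled['q'])  # 3
```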
Practical – 2
2. Create a Series and print all the elements that are above the 75th percentile.

Code:

import pandas as pd

import numpy as np

# Create a Pandas Series

arr = np.array([42, 12, 72, 85, 56, 100])

ser = pd.Series(arr)

# Calculate the 75th percentile

quantile_value = ser.quantile(q=0.75)

print("75th Percentile is:", quantile_value)

print("Values that are greater than 75th percentile are:")

for val in ser:
    if val > quantile_value:
        print(val)

Output:
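Boolean indexing can replace the loop. It filters the Series in one step and keeps the original index labels. By default, `Series.quantile` uses linear interpolation. For n = 6 values, the 75th percentile sits at position 0.75 × (6 − 1) = 3.75 in the sorted data [12, 42, 56, 72, 85, 100]. That gives 72 + 0.75 × (85 − 72) = 81.75. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.array([42, 12, 72, 85, 56, 100]))

# 75th percentile with pandas' default linear interpolation
q75 = ser.quantile(0.75)

# Boolean mask keeps only the elements strictly above the 75th percentile
above = ser[ser > q75]
print("75th Percentile is:", q75)  # 81.75
print(above)  # index 3 -> 85, index 5 -> 100
```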
Practical-3
3. Perform sorting on Series data and DataFrames.
Code:
import pandas as pd

# Create a sample numeric series


s = pd.Series([100, 200, 54.67, 300.12, 400])

# Sort the series in ascending order


sorted_series = s.sort_values(ascending=True)

print(sorted_series)

Output:
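The task also asks for sorting DataFrames, which the listing above does not cover. The sketch below uses a small hypothetical DataFrame; the column names and values are illustrative. `sort_values(by=...)` orders rows by one or more columns, and `sort_index()` restores the order of the row labels.

```python
import pandas as pd

# Hypothetical marks data used only to demonstrate sorting
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meena', 'Karan'],
    'Marks': [78, 92, 78, 65],
    'Age': [20, 21, 19, 22],
})

# Sort rows by a single column, highest marks first
by_marks = df.sort_values(by='Marks', ascending=False)
print(by_marks)

# Sort by Marks descending, breaking ties by Age ascending
by_marks_then_age = df.sort_values(by=['Marks', 'Age'], ascending=[False, True])
print(by_marks_then_age)

# Sort rows back by their index labels
print(by_marks.sort_index())

# A Series can be sorted in descending order the same way
s = pd.Series([100, 200, 54.67, 300.12, 400])
print(s.sort_values(ascending=False))
```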
Practical-4
4. Write a program to implement pivot() and pivot_table() on a DataFrame.
Code:
import pandas as pd

# Create a sample DataFrame


data = {
'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'Fruit': ['Apple', 'Banana', 'Apple', 'Orange'],
'Sales': [100, 200, 150, 300]
}

df = pd.DataFrame(data)

# Create a pivot table


pivot_table = df.pivot_table(
values='Sales', # Column to aggregate
index='Date', # Rows (index)
columns='Fruit', # Columns
aggfunc='sum', # Aggregation function (sum of sales)
fill_value=0 # Fill missing values with 0
)

print(pivot_table)

Output:
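The task also names `pivot()`, which the listing above does not call. The sketch below applies it to the same data. `pivot()` only reshapes, so each (index, column) pair must be unique, and missing combinations become NaN. `pivot_table()` instead aggregates duplicates with `aggfunc`. The extra duplicate row is hypothetical and is included only to show this difference.

```python
import pandas as pd

data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange'],
    'Sales': [100, 200, 150, 300],
}
df = pd.DataFrame(data)

# pivot() reshapes without aggregating; missing Date/Fruit pairs become NaN
pivoted = df.pivot(index='Date', columns='Fruit', values='Sales')
print(pivoted)

# Hypothetical duplicate entry: a second Apple sale on 2023-01-01
dup = pd.concat(
    [df, pd.DataFrame({'Date': ['2023-01-01'], 'Fruit': ['Apple'], 'Sales': [50]})],
    ignore_index=True,
)

pivot_failed = False
try:
    dup.pivot(index='Date', columns='Fruit', values='Sales')
except ValueError as err:
    # pivot() cannot place two values in one cell
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() resolves the duplicate by aggregating (sum: 100 + 50 = 150)
table = dup.pivot_table(values='Sales', index='Date', columns='Fruit',
                        aggfunc='sum', fill_value=0)
print(table)
```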
Practical-5
5. Write a program to find the mean absolute deviation of a DataFrame.
Code:
import pandas as pd

# Create a sample DataFrame


data = {
'A': [10, 20, 30, 40],
'B': [15, 25, 35, 45]
}
df = pd.DataFrame(data)

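# Calculate MAD for each column: the mean of absolute deviations from the column mean
# (Series.mad() was deprecated in pandas 1.5 and removed in pandas 2.0)

mad_A = (df['A'] - df['A'].mean()).abs().mean()
mad_B = (df['B'] - df['B'].mean()).abs().mean()

print(f"MAD for column A: {mad_A:.2f}")
print(f"MAD for column B: {mad_B:.2f}")

# Calculate the overall MAD as the average of the per-column MADs

overall_mad = (df - df.mean()).abs().mean().mean()

print(f"Overall MAD: {overall_mad:.2f}")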

Output:
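Mean absolute deviation is the average distance of each value from the column mean: MAD = (1/n) × Σ |x_i − mean|. For column A, the mean is 25 and the distances are 15, 5, 5 and 15, so MAD = 40 / 4 = 10. Column B is column A shifted by 5, so its MAD is also 10. The helper below, `mean_absolute_deviation`, is a hypothetical name. It spells out the formula with NumPy as a cross-check.

```python
import numpy as np
import pandas as pd

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    return np.abs(x - x.mean()).mean()

df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [15, 25, 35, 45]})

mad_A = mean_absolute_deviation(df['A'])  # 10.0
mad_B = mean_absolute_deviation(df['B'])  # 10.0

# Same result column-wise with pandas operations only
mad_per_column = (df - df.mean()).abs().mean()
print(mad_A, mad_B)
print(mad_per_column)
```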
Practical-6
6. Create a DataFrame based on e-commerce data and calculate the mean, mode and median.
Code:
import pandas as pd

# Sample e-commerce data


data = {
'Order_ID': [101, 102, 103, 104, 105],
'Product': ['Laptop', 'Phone', 'Tablet', 'Headphones', 'Camera'],
'Price': [1000, 800, 500, 150, 600]
}

df = pd.DataFrame(data)

# Display the first 5 records


print("Sample e-commerce DataFrame:")
print(df.head())

# Calculate mean, mode, and median

mean_price = df['Price'].mean()
# mode() returns every most-frequent value as a Series; take the first one
mode_price = df['Price'].mode().iloc[0]
median_price = df['Price'].median()

print(f"\nMean Price: ${mean_price:.2f}")
print(f"Mode Price: ${mode_price:.2f}")
print(f"Median Price: ${median_price:.2f}")

Output:
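`Series.mode()` returns every value tied for the highest count, in sorted order. In the sample above each price occurs only once, so all five prices are modes. In that case `.iloc[0]` simply picks the smallest of them (150), which says little about the data. The sketch below adds one hypothetical repeated order to show a unique mode. It also shows that `mode()` works on text columns.

```python
import pandas as pd

prices = pd.Series([1000, 800, 500, 150, 600])

# Every price appears once, so all five values are returned as modes
print(prices.mode().tolist())

# Hypothetical extra order priced at 800 gives a single, meaningful mode
prices_with_repeat = pd.Series([1000, 800, 500, 150, 600, 800])
print(prices_with_repeat.mode().tolist())  # [800]

# mode() also works on non-numeric columns such as product names
products = pd.Series(['Laptop', 'Phone', 'Phone', 'Camera'])
print(products.mode().iloc[0])  # Phone
```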
Practical-7
7. Create a DataFrame based on employee data and calculate quartiles and variance.
Code:
import pandas as pd

# Sample employee data


data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [30, 25, 28, 32],
'Salary': [60000, 70000, 55000, 80000],
'Department': ['HR', 'IT', 'Finance', 'Sales']
}

df = pd.DataFrame(data)

# Calculate quartiles for the 'Salary' column


quartiles = df['Salary'].quantile([0.25, 0.5, 0.75])
print("Quartiles for Salary:")
print(quartiles)
# Calculate the sample variance (pandas divides by n - 1 by default)
salary_variance = df['Salary'].var()
print(f"Variance of Salary: {salary_variance:.2f}")

Output:
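`Series.var()` returns the sample variance, dividing by n − 1 (`ddof=1`). Pass `ddof=0` to get the population variance instead. For the salaries above, the mean is 66,250 and the squared deviations sum to 368,750,000. This gives 368,750,000 / 3 ≈ 122,916,666.67 for the sample variance and 368,750,000 / 4 = 92,187,500 for the population variance. The quartiles use linear interpolation on the sorted salaries [55000, 60000, 70000, 80000], which gives 58,750, 65,000 and 72,500. `describe()` reports the quartiles together with count, mean, std, min and max. A minimal sketch:

```python
import pandas as pd

salary = pd.Series([60000, 70000, 55000, 80000], name='Salary')

sample_var = salary.var()            # divides by n - 1 (ddof=1, the pandas default)
population_var = salary.var(ddof=0)  # divides by n
print(f"Sample variance: {sample_var:.2f}")
print(f"Population variance: {population_var:.2f}")

# Quartiles plus count, mean, std, min and max in one call
summary = salary.describe()
print(summary)
```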
Practical-8
8. Write a program to create a DataFrame to store weight, age and name of three people.
Print the DataFrame and its transpose.
Code:
import pandas as pd

# Create a dictionary with sample data


data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [30, 25, 28],
'Weight': [60.5, 70.2, 65.8]
}

# Create the DataFrame


df = pd.DataFrame(data)

# Print the original DataFrame


print("Original DataFrame:")
print(df)

# Print the transpose (switch rows and columns)


print("\nTransposed DataFrame:")
print(df.transpose())

Output:
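Transposing turns each person into a column. Every new column then mixes a name (text) with numbers, so pandas stores the transposed columns with the generic `object` dtype. As a result, the numeric types of Age and Weight are not preserved in the transpose. The short check below uses `.T`, which is shorthand for `transpose()`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [30, 25, 28],
    'Weight': [60.5, 70.2, 65.8],
})

print(df.dtypes)           # Age is int64 and Weight is float64
transposed = df.T          # same as df.transpose()
print(transposed.dtypes)   # every column is object

# Rows are now the original column names; columns are the original row labels
print(transposed.loc['Age', 1])  # 25
```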
Practical-9
9. Series objects Temp1, Temp2, Temp3 and Temp4 store the temperatures of the days of weeks 1, 2, 3 and 4. Write a script to: a. print the average temperature per week; b. print the average temperature of the entire month.
Code:
import pandas as pd

# Sample temperature data in °C (replace with actual values)
Temp1 = pd.Series([25, 28, 30, 27, 26, 29, 24])  # Week 1
Temp2 = pd.Series([23, 22, 24, 25, 21, 20, 22])  # Week 2
Temp3 = pd.Series([28, 27, 26, 29, 30, 28, 27])  # Week 3
Temp4 = pd.Series([31, 32, 30, 33, 34, 31, 32])  # Week 4

# a. Calculate and print the average temperature per week
weeks = [Temp1, Temp2, Temp3, Temp4]
for week_no, temps in enumerate(weeks, start=1):
    print(f"Average temperature for Week {week_no}: {temps.mean():.2f}°C")

# b. Calculate the average temperature for the entire month
# pd.concat joins the four Series end to end ("+" would add them element-wise instead)
all_temps = pd.concat(weeks)
avg_month = all_temps.mean()
print(f"Average temperature for the entire month: {avg_month:.2f}°C")

Output:
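The four weeks have the same number of days, so the monthly average equals the average of the four weekly averages. That is, (27.00 + 22.43 + 27.86 + 31.86) / 4 ≈ 27.29, which matches 764 / 28. The sketch below stores the weeks as columns of one DataFrame and computes both parts in two calls. The column names 'Week 1' to 'Week 4' are illustrative.

```python
import pandas as pd

weeks = pd.DataFrame({
    'Week 1': [25, 28, 30, 27, 26, 29, 24],
    'Week 2': [23, 22, 24, 25, 21, 20, 22],
    'Week 3': [28, 27, 26, 29, 30, 28, 27],
    'Week 4': [31, 32, 30, 33, 34, 31, 32],
})

# a. Column means give the average temperature per week
weekly_avg = weeks.mean()
print(weekly_avg.round(2))

# b. to_numpy() exposes all 28 readings as one array before averaging
monthly_avg = weeks.to_numpy().mean()
print(f"Average temperature for the entire month: {monthly_avg:.2f}°C")

# With equal-length weeks, the mean of weekly means matches the monthly mean
print(abs(weekly_avg.mean() - monthly_avg) < 1e-9)
```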
Practical-10
10. Predict the Weather with machine learning
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

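# Load the dataset
# NOTE: the file originally referenced here,
# https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv,
# holds daily birth counts (columns 'Date' and 'Births') and has no 'Rain' column,
# so the encoding step below would raise a KeyError.
# 'weather.csv' is a hypothetical placeholder: any CSV with a 'Date' column and a
# two-valued 'Rain' column (for example 'Yes'/'No') fits the rest of this program.
data_path = 'weather.csv'
df = pd.read_csv(data_path)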

# Preprocessing: Convert date to datetime, extract month and day


df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

# Encode 'Rain' column (target variable)


le = LabelEncoder()
df['Rain'] = le.fit_transform(df['Rain'])

# Features and target


X = df[['Month', 'Day']]
y = df['Rain']
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the three classifiers
models = {
    'XGBoost': XGBClassifier(),
    'KNN': KNeighborsClassifier(),
    'AdaBoost': AdaBoostClassifier(),
}

# Train each classifier and evaluate it on the held-out test set
# (the default binary averaging assumes Rain has exactly two classes, encoded as 0 and 1)
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    print(f"{name}:")
    print(f"  Precision: {precision:.2f}")
    print(f"  Recall: {recall:.2f}")
    print(f"  F1 Score: {f1:.2f}")

Output:
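For a binary target such as the encoded Rain column, `precision_score`, `recall_score` and `f1_score` reduce to counts of true positives (TP), false positives (FP) and false negatives (FN). Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The dependency-free sketch below uses made-up labels for illustration only; they are not results from the models above. `binary_scores` is a hypothetical helper name.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: 1 = rain, 0 = no rain
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0]

precision, recall, f1 = binary_scores(y_true, y_pred)
print(f"Precision: {precision:.2f}")  # 0.67  (TP=2, FP=1)
print(f"Recall: {recall:.2f}")        # 0.50  (TP=2, FN=2)
print(f"F1 Score: {f1:.2f}")          # 0.57
```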
