12/3/24, 4:15 PM Vicky patil_Practical_9 - Colab
Practical No.9
Name:-Vicky v patil
Class:-MCA 2nd Year Semester 3rd
Subject:-Data Analytics Lab
Title of Practical:-Read data that gives a proper distribution curve using pandas. Apply preprocessing to the data and plot a histogram
for it. Label the plot properly and analyze it.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
# Load the Iris dataset from the Colab filesystem
df=pd.read_csv('/content/Iris.csv')
df
        Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
0      1            5.1           3.5            1.4           0.2     Iris-setosa
1      2            4.9           3.0            1.4           0.2     Iris-setosa
2      3            4.7           3.2            1.3           0.2     Iris-setosa
3      4            4.6           3.1            1.5           0.2     Iris-setosa
4      5            5.0           3.6            1.4           0.2     Iris-setosa
...  ...            ...           ...            ...           ...             ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica
149  150            5.9           3.0            5.1           1.8  Iris-virginica
150 rows × 6 columns
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
print(df.isnull().sum())
Id 0
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64
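The check above reports zero missing values in every column, so no imputation is needed for this file. For a dataset that does have gaps, a minimal sketch of median imputation with pandas might look like the following. The small DataFrame `demo` and its values are made up for illustration and are not taken from Iris.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (not from Iris.csv).
demo = pd.DataFrame({
    "SepalLengthCm": [5.1, np.nan, 4.7, 4.6],
    "SepalWidthCm": [3.5, 3.0, np.nan, 3.1],
})

# Count missing values per column, as in the cell above.
missing_before = demo.isnull().sum()

# Replace each gap with the median of its own column.
filled = demo.fillna(demo.median())
missing_after = filled.isnull().sum()

print(missing_before)
print(missing_after)
```

The median is used here only as one reasonable default; the mean or a constant would follow the same pattern.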
le=LabelEncoder()
df['Species']=le.fit_transform(df['Species'])
df
https://colab.research.google.com/drive/1S8CI8u92AuPe6vwKOCNQdusyjrCoArqs?usp=sharing#scrollTo=HsAEcmTrKUV0&printMode=true 1/4
        Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0      1            5.1           3.5            1.4           0.2        0
1      2            4.9           3.0            1.4           0.2        0
2      3            4.7           3.2            1.3           0.2        0
3      4            4.6           3.1            1.5           0.2        0
4      5            5.0           3.6            1.4           0.2        0
...  ...            ...           ...            ...           ...      ...
145  146            6.7           3.0            5.2           2.3        2
146  147            6.3           2.5            5.0           1.9        2
147  148            6.5           3.0            5.2           2.0        2
148  149            6.2           3.4            5.4           2.3        2
149  150            5.9           3.0            5.1           1.8        2
150 rows × 6 columns
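After encoding, the Species column holds integers, so it helps to know which integer stands for which name. The sketch below shows one way to recover that mapping from a fitted LabelEncoder. It uses a short hand-written list of the three Iris species names instead of the CSV, and the variable names `species`, `mapping`, and `decoded` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# The three species names found in the Iris dataset (hand-written sample).
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

le = LabelEncoder()
codes = le.fit_transform(species)

# classes_ holds the original labels in sorted order; position i maps to code i.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the names from the integer codes.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Because the labels are sorted before codes are assigned, Iris-setosa becomes 0 and Iris-virginica becomes 2, which agrees with the first and last rows of the table above.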
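# Features: first three columns (Id, SepalLengthCm, SepalWidthCm); Id is a row identifier, not a measurement
x=df[df.columns[0:3]]
# Target: fifth column (PetalWidthCm)
y=df[df.columns[4]]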
df.shape
(150, 6)
x.head()
   Id  SepalLengthCm  SepalWidthCm
0   1            5.1           3.5
1   2            4.9           3.0
2   3            4.7           3.2
3   4            4.6           3.1
4   5            5.0           3.6
y.head()
PetalWidthCm
0 0.2
1 0.2
2 0.2
3 0.2
4 0.2
dtype: float64
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=10)
x_train.head()
      Id  SepalLengthCm  SepalWidthCm
32    33            5.2           4.1
52    53            6.9           3.1
70    71            5.9           3.2
121  122            5.6           2.8
144  145            6.7           3.3
y_train.head()
PetalWidthCm
32 0.1
52 1.5
70 1.8
121 2.0
144 2.5
dtype: float64
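The shuffled index values (32, 52, 70, ...) show that train_test_split draws rows at random before splitting. A quick way to confirm how many rows land in each part is to check the shapes. The sketch below uses stand-in NumPy arrays with 150 rows instead of the real DataFrame; `x_demo` and `y_demo` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same number of rows as the Iris data (150).
x_demo = np.arange(150 * 3).reshape(150, 3)
y_demo = np.arange(150)

# Same settings as the cell above: 30% held out, fixed seed for repeatability.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=10
)

print(x_tr.shape, x_te.shape)
```

With 150 rows and test_size=0.3 this gives 105 training rows and 45 test rows, and no row appears in both parts.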
# prompt: how to build a regression model
from sklearn.linear_model import LinearRegression
# Initialize the model
model = LinearRegression()
# Train the model
model.fit(x_train, y_train)
# Make predictions on the test set
y_pred = model.predict(x_test)
# Evaluate the model (example: R-squared)
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
R-squared: 0.8627149401415144
md=LinearRegression()
md.fit(x_train,y_train)
LinearRegression()
y_pred=md.predict(x_test)
y_pred
array([ 1.56788019, 1.81167007, 0.26224485, 1.45759149, 0.59318562,
0.86904724, 1.38235127, 1.13323559, 0.50776235, 0.95786364,
1.01767325, 1.9396602 , 0.92852498, 0.07921787, 0.24422778,
1.92294443, 1.31036751, 0.30594397, 0.27785444, 0.45477503,
1.91808717, 2.02534696, 1.89238898, 0.2438114 , 1.06657763,
-0.00271782, 1.36371277, 1.28999511, 1.56010462, 2.18772813,
1.28599669, 1.56844946, 2.24262204, 1.62647301, 2.17802355,
0.44691494, 1.98138436, 2.01894106, 2.34394446, 2.25686195,
0.33376007, 0.50442765, 0.84188492, 0.45727696, 1.11333801])
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
R-squared: 0.8627149401415144
Mean Squared Error: 0.07482273045106973
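To make the two metrics concrete: mean squared error is the average of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the targets. The sketch below works both out by hand with NumPy on four made-up values, which are not the notebook's test data.

```python
import numpy as np

# Made-up targets and predictions, chosen only to keep the arithmetic small.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])

residuals = y_true - y_hat

# MSE: mean of the squared residuals.
mse = np.mean(residuals ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"Mean Squared Error: {mse:.3f}")
print(f"R-squared: {r2:.3f}")
```

Here the squared residuals sum to 0.10 and the total sum of squares is 5.0, so the MSE is 0.025 and R-squared is 0.98.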
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,6))
sns.histplot(df['Species'])
plt.title('Distribution of Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()
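Species is a class label, so its histogram shows how many rows fall in each class and not a distribution curve. A continuous column such as SepalWidthCm is a closer fit to the title of the practical. The sketch below standardizes a numeric Series and draws a labeled histogram of it with matplotlib. The helper `plot_standardized_histogram` is a hypothetical name, and the synthetic `sample` (with arbitrary mean and spread) stands in for a real column so the code runs without Iris.csv.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_standardized_histogram(values, name, bins=15):
    """Standardize a numeric Series and draw a labeled histogram of it."""
    # Z-score scaling: subtract the mean, divide by the standard deviation.
    z = (values - values.mean()) / values.std(ddof=0)
    fig, ax = plt.subplots(figsize=(9, 6))
    ax.hist(z, bins=bins, edgecolor="black")
    ax.set_title(f"Distribution of standardized {name}")
    ax.set_xlabel(f"{name} (z-score)")
    ax.set_ylabel("Count")
    return z, ax


# Stand-in for a continuous column such as df['SepalWidthCm'];
# synthetic values are used here so the sketch runs without Iris.csv.
rng = np.random.default_rng(0)
sample = pd.Series(rng.normal(loc=3.0, scale=0.4, size=150))

z, ax = plot_standardized_histogram(sample, "SepalWidthCm")

# Skewness near 0 suggests a roughly symmetric, bell-shaped histogram.
print(f"mean={z.mean():.3f}, std={z.std(ddof=0):.3f}, skew={z.skew():.3f}")
```

With the real data, the call would be `plot_standardized_histogram(df['SepalWidthCm'], 'SepalWidthCm')`. Outside a notebook, follow it with `plt.show()` to display the figure.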