Data Analytics: Histogram & Regression Analysis


Practical No. 9

Name:- Vicky V. Patil

Class:- MCA 2nd Year, Semester 3

Subject:- Data Analytics Lab

Title of Practical:- Read data that gives a proper distribution curve using pandas. Apply preprocessing to the data and plot a histogram
for the same. Label the plot properly. Analyze the plot.

import pandas as pd
from sklearn.preprocessing import StandardScaler     # imported but not used in this practical
from sklearn.preprocessing import LabelEncoder       # used to encode the Species labels
from sklearn.impute import SimpleImputer              # imported but not used in this practical
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression   # imported but not used in this practical
from sklearn.linear_model import LinearRegression

# Load the Iris dataset
df = pd.read_csv('/content/Iris.csv')
df

Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

from google.colab import drive

# Mount Google Drive (optional here; the CSV above was read from /content)
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

# Check for missing values
print(df.isnull().sum())

Id 0
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64

# Encode the categorical Species labels as integers
le = LabelEncoder()
df['Species'] = le.fit_transform(df['Species'])
df


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 0
1 2 4.9 3.0 1.4 0.2 0
2 3 4.7 3.2 1.3 0.2 0
3 4 4.6 3.1 1.5 0.2 0
4 5 5.0 3.6 1.4 0.2 0
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 2
146 147 6.3 2.5 5.0 1.9 2
147 148 6.5 3.0 5.2 2.0 2
148 149 6.2 3.4 5.4 2.3 2
149 150 5.9 3.0 5.1 1.8 2

150 rows × 6 columns
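For reference, the integer-to-species mapping produced by the LabelEncoder can be inspected as below (a minimal sketch; le is the fitted encoder from the cell above):

# LabelEncoder assigns integer codes in alphabetical order of the class names
for code, name in enumerate(le.classes_):
    print(code, name)   # 0 Iris-setosa, 1 Iris-versicolor, 2 Iris-virginica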

# Select features (Id, SepalLengthCm, SepalWidthCm) and target (PetalWidthCm) by column position
x = df[df.columns[0:3]]
y = df[df.columns[4]]
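Note that this positional selection keeps the Id column as a feature even though it is only a running row index. A minimal alternative sketch, selecting by column name and dropping Id (this variant is an assumption, not part of the original notebook):

# Alternative (hypothetical): select features by name and exclude the Id column
x = df[['SepalLengthCm', 'SepalWidthCm']]
y = df['PetalWidthCm']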

df.shape

(150, 6)

x.head()

Id SepalLengthCm SepalWidthCm
0 1 5.1 3.5
1 2 4.9 3.0
2 3 4.7 3.2
3 4 4.6 3.1
4 5 5.0 3.6

y.head()

PetalWidthCm
0 0.2
1 0.2
2 0.2
3 0.2
4 0.2
dtype: float64

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=10)

x_train.head()

Id SepalLengthCm SepalWidthCm
32 33 5.2 4.1
52 53 6.9 3.1
70 71 5.9 3.2
121 122 5.6 2.8
144 145 6.7 3.3

y_train.head()


PetalWidthCm
32 0.1
52 1.5
70 1.8
121 2.0
144 2.5
dtype: float64

# prompt: how to build a regression model

from sklearn.linear_model import LinearRegression

# Initialize the model
model = LinearRegression()

# Train the model
model.fit(x_train, y_train)

# Make predictions on the test set
y_pred = model.predict(x_test)

# Evaluate the model (example: R-squared)
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")

R-squared: 0.8627149401415144
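To see how this fitted model weighs each of the three features, the learned coefficients and intercept can be inspected (a minimal sketch; model is the LinearRegression instance fitted above, and the exact values depend on the train/test split):

# Coefficients correspond to Id, SepalLengthCm and SepalWidthCm, in that order
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)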

# Fit a second LinearRegression instance (equivalent to the model above)
md = LinearRegression()

md.fit(x_train, y_train)

LinearRegression()

y_pred = md.predict(x_test)
y_pred

array([ 1.56788019,  1.81167007,  0.26224485,  1.45759149,  0.59318562,
        0.86904724,  1.38235127,  1.13323559,  0.50776235,  0.95786364,
        1.01767325,  1.9396602 ,  0.92852498,  0.07921787,  0.24422778,
        1.92294443,  1.31036751,  0.30594397,  0.27785444,  0.45477503,
        1.91808717,  2.02534696,  1.89238898,  0.2438114 ,  1.06657763,
       -0.00271782,  1.36371277,  1.28999511,  1.56010462,  2.18772813,
        1.28599669,  1.56844946,  2.24262204,  1.62647301,  2.17802355,
        0.44691494,  1.98138436,  2.01894106,  2.34394446,  2.25686195,
        0.33376007,  0.50442765,  0.84188492,  0.45727696,  1.11333801])

from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

R-squared: 0.8627149401415144
Mean Squared Error: 0.07482273045106973
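The MSE above is in squared units of PetalWidthCm; taking its square root gives an error on the original centimetre scale. A minimal sketch (the numpy import is an addition, not part of the original notebook):

import numpy as np

# Root Mean Squared Error, in the same units (cm) as PetalWidthCm (≈ 0.27 for the MSE above)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse}")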

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(9,6))
sns.histplot(df['Species'])
plt.title('Distribution of Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()
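Analysis: the histogram above shows the count of each label-encoded species (0, 1, 2); since the Iris dataset is balanced, it has three equally tall bars of 50 samples each. The practical statement also mentions a proper distribution curve, which a continuous feature shows more directly. A minimal sketch using SepalLengthCm with a KDE overlay (the choice of feature is an assumption, not part of the original notebook):

plt.figure(figsize=(9, 6))
# The KDE overlay approximates the distribution curve of sepal length
sns.histplot(df['SepalLengthCm'], kde=True)
plt.title('Distribution of Sepal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Count')
plt.show()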

