EXP-4 DMusingPYTHON

EXPERIMENT-4

Aim: Build a model using linear regression algorithm on any dataset.


What is Simple Linear Regression?

In statistics, simple linear regression is a linear regression model with a single
explanatory variable. In simple linear regression, we predict scores on one
variable from scores on another. The criterion variable Y is the variable we
are predicting. The predictor variable X is the variable we base our predictions
on. The approach is called simple regression because there is only one
predictor variable.

For two-dimensional sample points with one independent variable and one
dependent variable, the method finds a linear function that predicts the values
of the dependent variable from the independent variable.

The below graph explains the relation between Salary and Years of Experience

Equation: y = mx + c

This is the simple linear regression equation, where c is the constant (the
y-intercept) and m is the slope. It describes the relationship between x
(independent variable) and y (dependent variable). The slope can be positive or
negative and is the change in the dependent variable for every 1 unit of change
in the independent variable.
In statistical notation the same line is written y = β0 + β1x, where β0
(y-intercept) plays the role of c and β1 (slope) the role of m. These are the
coefficients the model estimates from the data, and how closely the predicted
values match the actual values depends on them.
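To make the equation concrete, the slope and intercept of a least-squares line can be computed by hand. The following is a minimal sketch using a handful of made-up points (they are not the salary dataset) and the standard closed-form formulas for one predictor. The variable names are illustrative only.

```python
import numpy as np

# Made-up illustrative points (not the salary dataset); they lie near y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Ordinary least squares with one predictor:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   c = mean_y - m * mean_x
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

print(f"slope m = {m:.3f}, intercept c = {c:.3f}")

# Cross-check with NumPy's degree-1 polynomial fit, which returns [slope, intercept].
m_ref, c_ref = np.polyfit(x, y, 1)
print(np.isclose(m, m_ref), np.isclose(c, c_ref))
```

Both approaches should agree, which is a quick way to confirm that the fitted m and c are simply the values that minimize the squared vertical distances between the line and the points.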
Implement Simple Linear Regression in Python
In this example, we will use the salary data concerning the experience of
employees. In this dataset, we have two columns YearsExperience and Salary
Step 1: Import the required python packages
We need pandas for data manipulation, NumPy for mathematical calculations,
and Matplotlib and Seaborn for visualizations. scikit-learn (sklearn) is used
for the machine learning operations.
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Step 2: Load the dataset
Download the dataset from here and upload it to your notebook and read it into
the pandas dataframe.
# Get dataset
df_sal = pd.read_csv(r"C:\Users\ayyap\OneDrive\Desktop\DMDW#LAB\Salary_Data.csv")
df_sal.head()
Step 3: Data analysis
Now that we have our data ready, let's analyze and understand its trend in detail.
To do that we can first describe the data below –
# Describe data
df_sal.describe()

Here, we can see that Salary ranges from 37731 to 122391, with a median of 65237.
We can also see how the data is distributed visually using Seaborn's histplot.
# Data distribution
plt.title('Salary Distribution Plot')
sns.histplot(df_sal['Salary'])
plt.show()

A histplot (histogram plot) shows how the values in the data are distributed.
If kde=True is passed, it also overlays a smoothed density line on the histogram.
Then we check the relationship between Salary and Experience –
# Relationship between Salary and Experience
plt.scatter(df_sal['YearsExperience'], df_sal['Salary'], color = 'lightcoral')
plt.title('Salary vs Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.box(False)
plt.show()

The plot shows that the data follows a roughly linear trend: an individual's
Salary increases as they gain Experience.
Step 4: Split the dataset into dependent/independent variables
Experience (X) is the independent variable
Salary (y) is dependent on experience
# Splitting variables
X = df_sal.iloc[:, :1] # independent
y = df_sal.iloc[:, 1:] # dependent
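The slices `:1` and `1:` are used instead of single column positions because scikit-learn estimators expect X to be two-dimensional. A small sketch, using a made-up DataFrame with the same column names (the values are illustrative, not from the dataset), shows the difference in shape:

```python
import pandas as pd

# Made-up rows for illustration only; the column names match the salary dataset.
df_demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Salary": [40000, 45000, 52000, 58000, 65000],
})

X_demo = df_demo.iloc[:, :1]   # column slice -> DataFrame with one column (2-D)
y_demo = df_demo.iloc[:, 1:]   # column slice -> DataFrame with one column (2-D)
y_series = df_demo.iloc[:, 1]  # single position -> Series (1-D)

print(type(X_demo).__name__, X_demo.shape)
print(type(y_demo).__name__, y_demo.shape)
print(type(y_series).__name__, y_series.shape)
```

Either form of y is accepted by LinearRegression, but a one-dimensional X would need to be reshaped before fitting.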
Step 4 (continued): Split data into Train/Test sets
Further, split your data into training (80%) and test (20%) sets
using train_test_split

# Splitting dataset into test/train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
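As a quick illustration of what the split does, the sketch below runs train_test_split on ten made-up rows (not the salary dataset). It assumes the same test_size and random_state as above and checks two things: the 80/20 proportions, and that a fixed random_state gives the same test rows on every run.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up rows for illustration only.
years = np.arange(1, 11, dtype=float)
df_demo = pd.DataFrame({"YearsExperience": years, "Salary": 30000 + 5000 * years})
X_demo = df_demo.iloc[:, :1]
y_demo = df_demo.iloc[:, 1:]

split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))  # 8 training rows and 2 test rows out of 10

# The same seed selects the same rows for the test set.
same_rows = X_te.index.equals(split_b[1].index)
print(same_rows)
```

Without a fixed random_state the rows are shuffled differently on each run, so results such as the fitted coefficients can vary slightly between runs.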

Step 5: Train the regression model


Pass the X_train and y_train data into the regressor model by regressor.fit to
train the model with our training data.

# Regressor model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
Step 6: Predict the result
Here comes the interesting part: with the trained model, we can now predict any
value of y (Salary) from X (Experience) using regressor.predict.
# Prediction result
y_pred_test = regressor.predict(X_test) # predicted value of y_test
y_pred_train = regressor.predict(X_train) # predicted value of y_train
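The same predict call also works for a value that is not in the dataset. The sketch below is self-contained and uses made-up, exactly linear data (Salary = 30000 + 5000 * YearsExperience) so the result is easy to check by hand. The names model, df_demo, and new_X are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up, exactly linear data: Salary = 30000 + 5000 * YearsExperience
years = np.arange(1, 11, dtype=float)
df_demo = pd.DataFrame({"YearsExperience": years, "Salary": 30000 + 5000 * years})

model = LinearRegression()
model.fit(df_demo[["YearsExperience"]], df_demo[["Salary"]])

# The new input must be 2-D. Because the model was fitted on a DataFrame,
# the new input uses the same column name.
new_X = pd.DataFrame({"YearsExperience": [5.5]})
new_pred = model.predict(new_X)

print(new_pred.shape)   # one row, one target column
print(new_pred[0, 0])   # close to 30000 + 5000 * 5.5
```

Predictions for experience values far outside the range seen in training should be treated with caution, since the linear trend is only supported by the observed data.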
Step 7: Plot the training and test results
It's time to check our predicted results by plotting graphs.
Plot training set data vs predictions
First, we plot the training set (X_train, y_train) as points, together with the
line of X_train against the predicted values of y_train (regressor.predict(X_train)).
# Prediction on training set
# Labels are attached to each artist so the legend entries cannot be swapped
plt.scatter(X_train, y_train, color = 'lightcoral', label = 'X_train/y_train')
plt.plot(X_train, y_pred_train, color = 'firebrick', label = 'X_train/Pred(y_train)')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(title = 'Sal/Exp', loc = 'best', facecolor = 'white')
plt.box(False)
plt.show()
Plot test set data vs predictions
Next, we plot the test set (X_test, y_test) as points, with the same regression
line drawn from X_train and the predicted values of y_train (regressor.predict(X_train)).

# Prediction on test set
# Points are the held-out test data; the line is the model fitted on the training data
plt.scatter(X_test, y_test, color = 'lightcoral', label = 'X_test/y_test')
plt.plot(X_train, y_pred_train, color = 'firebrick', label = 'X_train/Pred(y_train)')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(title = 'Sal/Exp', loc = 'best', facecolor = 'white')
plt.box(False)
plt.show()
We can see that, in both plots, the regression line fits the train and test data.
You can also plot the line using the predicted values of y_test
(regressor.predict(X_test)), but the regression line would remain the same, as it
is generated from the unique linear regression equation fitted on the same
training data.
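The claim that the line does not depend on which X values are used to draw it can be checked numerically. The sketch below uses synthetic data (generated with a fixed seed, not the salary dataset) and fits a straight line through the predicted points for the training inputs and for the test inputs. Both should give the same slope and intercept. All names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noisy linear data for illustration only.
rng = np.random.default_rng(0)
x_all = np.linspace(1, 10, 30).reshape(-1, 1)
y_all = 25000 + 9000 * x_all[:, 0] + rng.normal(0, 3000, size=30)

x_tr, x_te, y_tr, y_te = train_test_split(x_all, y_all, test_size=0.2, random_state=0)
model = LinearRegression().fit(x_tr, y_tr)

# Fit a straight line through each set of predicted points: [slope, intercept].
line_from_train = np.polyfit(x_tr[:, 0], model.predict(x_tr), 1)
line_from_test = np.polyfit(x_te[:, 0], model.predict(x_te), 1)

same_line = np.allclose(line_from_train, line_from_test)
print(same_line)
```

The predictions for any input all lie on the one line learned from the training data, so plotting with X_test only changes which segment of that line is drawn.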
As discussed at the beginning of this article, the linear equation is
y = mx + c. We can also get c (the y-intercept) and m (the slope/coefficient)
from the regressor model.
# Regressor coefficients and intercept
print(f'Coefficient: {regressor.coef_}')
print(f'Intercept: {regressor.intercept_}')
Output:
