LinearRegression HandsOn
Data Description:
The dataset has 9 variables, including the name of the car and various attributes such as horsepower, weight, and region of origin. Missing values
in the data are marked with question marks ('?').
Import Libraries
import pandas as pd
import numpy as np
# for visualizing data
import matplotlib.pyplot as plt
import seaborn as sns
# For randomized data splitting
from sklearn.model_selection import train_test_split
# To build the linear regression model
import statsmodels.api as sm
# To check model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error
Loading data into Google Colab. (If running locally on Jupyter this is not necessary - just use 'cData = pd.read_csv("auto-mpg.csv")'.)
from google.colab import files
import io

# upload the file only once per Colab session
try:
    uploaded
except NameError:
    uploaded = files.upload()

cData = pd.read_csv(io.BytesIO(uploaded['auto-mpg.csv']))
# cData = pd.read_csv("auto-mpg.csv")
# let's check the shape of the data
cData.shape
(398, 9)
# let's check the first 5 rows of the data
cData.head()
    mpg  cylinders  displacement horsepower  weight  acceleration  model year  origin                   car name
0  18.0          8         307.0        130    3504          12.0          70       1  chevrolet chevelle malibu
...

# let's check column types and number of values
cData.info()
Most of the columns in the data are numeric ('int64' or 'float64' type).
The horsepower and car name columns are string columns ('object' type); horsepower is read as a string because its missing values are marked with '?'.
# the car name column is a free-text identifier, so drop it before modeling
cData = cData.drop(["car name"], axis=1)
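Because horsepower uses '?' for missing values, it has to be converted to a numeric column before modeling. Below is a minimal sketch of one common approach (coercing '?' to NaN and imputing the median); the exact treatment used in this hands-on may differ.
# coerce the '?' markers to NaN so the column can be cast to float
cData["horsepower"] = pd.to_numeric(cData["horsepower"], errors="coerce")
# fill the resulting missing values with the column median (an assumed, simple imputation)
cData["horsepower"] = cData["horsepower"].fillna(cData["horsepower"].median())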
Bivariate Analysis
A bivariate analysis among the different variables can be done using a scatter matrix plot. Seaborn can draw such a grid of pairwise scatter
plots, giving a useful overview of the relationships between the dimensions. The result can be stored as a .png file.
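A minimal sketch of how such a scatter matrix could be produced and saved (the exact plotting options and the filename are illustrative assumptions):
# scatter matrix of all attributes, with kernel density estimates on the diagonal
pair_grid = sns.pairplot(cData, diag_kind="kde")
# save the figure for later reference (illustrative filename)
pair_grid.savefig("pairplot.png")
plt.show()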
So we create 3 simple true/false columns, with titles equivalent to "Is this car American?", "Is this car European?" and "Is this car Asian?".
These will be used as independent variables without imposing any kind of ordering between the three regions.
We will also drop one of those three columns to ensure there is no linear dependency between them (the dropped category is implied when the other two are false).
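A minimal sketch of how these dummy columns could be created with pandas (the mapping of origin codes to regions is an assumption: 1 = america, 2 = europe, 3 = asia):
# map the numeric origin codes to region labels (assumed coding)
cData["origin"] = cData["origin"].replace({1: "america", 2: "europe", 3: "asia"})
# one-hot encode the region; drop_first=True drops one column to avoid linear dependence
cData = pd.get_dummies(cData, columns=["origin"], drop_first=True)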
Split Data
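A minimal sketch of a typical split for this workflow (the test size, random seed, and the use of mpg as the target are assumptions); note that statsmodels' OLS does not add an intercept automatically, so a constant column is added explicitly:
# separate the independent variables from the target (mpg)
X = cData.drop(["mpg"], axis=1)
y = cData["mpg"]
# add the intercept term expected by statsmodels' OLS
X = sm.add_constant(X)
# hold out 30% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)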
# build and fit the OLS (ordinary least squares) model on the training data
olsmod = sm.OLS(y_train, X_train)
olsres = olsmod.fit()
# let's print the regression summary
print(olsres.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.45e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
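The second warning points at multicollinearity among the predictors (attributes such as displacement, weight, cylinders and horsepower are typically highly correlated). One way to investigate it, not necessarily the one used in this hands-on, is to compute variance inflation factors:
from statsmodels.stats.outliers_influence import variance_inflation_factor
# VIF for each predictor; values well above 5-10 suggest strong multicollinearity
vif = pd.Series(
    [variance_inflation_factor(X_train.values.astype(float), i) for i in range(X_train.shape[1])],
    index=X_train.columns,
)
print(vif)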
Interpretation of R-squared
An R-squared of 0.814 tells us that our model can explain 81.4% of the variance in mpg on the training set.
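R-squared above is reported on the training data; the metrics imported earlier can be used to check how the model generalises to the held-out rows. A minimal sketch, assuming X_test and y_test come from the split step:
# predict mpg for the held-out rows
y_pred = olsres.predict(X_test)
# mean absolute error and root mean squared error on the test set
print("MAE :", mean_absolute_error(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))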