House Price Prediction Analysis

This document predicts house prices with machine learning models. It preprocesses the data by dropping unneeded columns, checking for missing values, scaling features, and encoding the renovation year as a binary flag. It then fits three regression models (polynomial regression, ridge regression, and random forest regression) and compares their predictions.

Uploaded by

Vinay Kumar

7/8/23, 4:15 PM house price

House Price Prediction

In [ ]: #Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import FastMarkerCluster
from sklearn import preprocessing
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge

In [ ]: # Importing the dataset

data = pd.read_csv('https://raw.githubusercontent.com/rashida048/Datasets/master
data.head()

Out[ ]:            id             date   price  bedrooms  bathrooms  sqft_living  sqft_lot  f
        0  7129300520  20141013T000000  221900         3       1.00         1180      5650
        1  6414100192  20141209T000000  538000         3       2.25         2570      7242
        2  5631500400  20150225T000000  180000         2       1.00          770     10000
        3  2487200875  20141209T000000  604000         4       3.00         1960      5000
        4  1954400510  20150218T000000  510000         3       2.00         1680      8080

        5 rows × 21 columns

In [ ]: # dropping the unnecessary columns id and date (zipcode, lat and long are kept)
data.drop(['id','date'],axis=1,inplace=True)
data.head()

file:///E:/Data Science Course/Projects/house price.html 1/13

Out[ ]:     price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condit
        0  221900         3       1.00         1180      5650     1.0           0     0
        1  538000         3       2.25         2570      7242     2.0           0     0
        2  180000         2       1.00          770     10000     1.0           0     0
        3  604000         4       3.00         1960      5000     1.0           0     0
        4  510000         3       2.00         1680      8080     1.0           0     0

In [ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 21613 non-null int64
1 bedrooms 21613 non-null int64
2 bathrooms 21613 non-null float64
3 sqft_living 21613 non-null int64
4 sqft_lot 21613 non-null int64
5 floors 21613 non-null float64
6 waterfront 21613 non-null int64
7 view 21613 non-null int64
8 condition 21613 non-null int64
9 grade 21613 non-null int64
10 sqft_above 21613 non-null int64
11 sqft_basement 21613 non-null int64
12 yr_built 21613 non-null int64
13 yr_renovated 21613 non-null int64
14 zipcode 21613 non-null int64
15 lat 21613 non-null float64
16 long 21613 non-null float64
17 sqft_living15 21613 non-null int64
18 sqft_lot15 21613 non-null int64
dtypes: float64(4), int64(15)
memory usage: 3.1 MB

In [ ]: data.describe()

Out[ ]:               price      bedrooms     bathrooms   sqft_living      sqft_lot         fl
        count  2.161300e+04  21613.000000  21613.000000  21613.000000  2.161300e+04  21613.000
        mean   5.400881e+05      3.370842      2.114757   2079.899736  1.510697e+04      1.494
        std    3.671272e+05      0.930062      0.770163    918.440897  4.142051e+04      0.539
        min    7.500000e+04      0.000000      0.000000    290.000000  5.200000e+02      1.000
        25%    3.219500e+05      3.000000      1.750000   1427.000000  5.040000e+03      1.000
        50%    4.500000e+05      3.000000      2.250000   1910.000000  7.618000e+03      1.500
        75%    6.450000e+05      4.000000      2.500000   2550.000000  1.068800e+04      2.000
        max    7.700000e+06     33.000000      8.000000  13540.000000  1.651359e+06      3.500

In [ ]: # checking for null values/missing values

data.isnull().sum()

Out[ ]: price 0
bedrooms 0
bathrooms 0
sqft_living 0
sqft_lot 0
floors 0
waterfront 0
view 0
condition 0
grade 0
sqft_above 0
sqft_basement 0
yr_built 0
yr_renovated 0
zipcode 0
lat 0
long 0
sqft_living15 0
sqft_lot15 0
dtype: int64

In [ ]: data.nunique()

Out[ ]: price 4032
bedrooms 13
bathrooms 30
sqft_living 1038
sqft_lot 9782
floors 6
waterfront 2
view 5
condition 5
grade 12
sqft_above 946
sqft_basement 306
yr_built 116
yr_renovated 70
zipcode 70
lat 5034
long 752
sqft_living15 777
sqft_lot15 8689
dtype: int64

Data Preprocessing
In [ ]: # changing float to integer
data['bathrooms'] = data['bathrooms'].astype(int)
data['floors'] = data['floors'].astype(int)
# renaming the column yr_built to age and changing the values to age
data.rename(columns={'yr_built':'age'},inplace=True)
data['age'] = 2023 - data['age']
# renaming the column yr_renovated to renovated and changing the values to 0 (never renovated) and 1 (renovated)
data.rename(columns={'yr_renovated':'renovated'},inplace=True)
data['renovated'] = data['renovated'].apply(lambda x: 0 if x == 0 else 1)

In [ ]: # using simple feature scaling

data['sqft_living'] = data['sqft_living']/data['sqft_living'].max()
data['sqft_living15'] = data['sqft_living15']/data['sqft_living15'].max()
data['sqft_lot'] = data['sqft_lot']/data['sqft_lot'].max()
data['sqft_above'] = data['sqft_above']/data['sqft_above'].max()
data['sqft_basement'] = data['sqft_basement']/data['sqft_basement'].max()
data['sqft_lot15'] = data['sqft_lot15']/data['sqft_lot15'].max()
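
The six divisions above all apply the same rule, x / max(x), so they can be written as one loop. The following is a minimal sketch on a small made-up frame; the values and the names `toy`, `area_cols` and `col_max` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for the real data; the numbers are made up.
toy = pd.DataFrame({
    "sqft_living": [1180, 2570, 770],
    "sqft_lot": [5650, 7242, 10000],
})

# Hypothetical list of columns to scale.
area_cols = ["sqft_living", "sqft_lot"]

# Keep the maxima so the same scaling can be applied to new rows later.
col_max = toy[area_cols].max()

# Simple feature scaling: divide each column by its own maximum.
toy[area_cols] = toy[area_cols] / col_max

print(toy)
```

Storing the maxima separately is a design choice of this sketch: any house scored later has to be divided by the same values that were used on the training data.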

In [ ]: data.head()

Out[ ]:     price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  cond
        0  221900         3          1     0.087149  0.003421       1           0     0
        1  538000         3          2     0.189808  0.004385       2           0     0
        2  180000         2          1     0.056869  0.006056       1           0     0
        3  604000         4          3     0.144756  0.003028       1           0     0
        4  510000         3          2     0.124077  0.004893       1           0     0

Exploratory Data Analysis

Correlation Matrix to find the relationship between the variables

In [ ]: # using correlation statistical method to find the relation between the price an
data.corr()['price'].sort_values(ascending=False)

Out[ ]: price 1.000000
sqft_living 0.702035
grade 0.667434
sqft_above 0.605567
sqft_living15 0.585379
bathrooms 0.510072
view 0.397293
sqft_basement 0.323816
bedrooms 0.308350
lat 0.307003
waterfront 0.266369
floors 0.237211
renovated 0.126092
sqft_lot 0.089661
sqft_lot15 0.082447
condition 0.036362
long 0.021626
zipcode -0.053203
age -0.054012
Name: price, dtype: float64
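
To make the ranking above easier to read, the target's correlation with itself can be dropped before sorting. This is a small sketch on made-up numbers; the `toy` frame and `corr_with_price` name are assumptions used only for illustration.

```python
import pandas as pd

# Made-up data: one column rises with price, the other falls.
toy = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "sqft_living": [10, 20, 30, 40],
    "age": [40, 30, 20, 10],
})

# Pearson correlation of every column with price, excluding price itself.
corr_with_price = toy.corr()["price"].drop("price").sort_values(ascending=False)

print(corr_with_price)
```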

In [ ]: plt.figure(figsize=(20,20))
sns.heatmap(data.corr(),annot=True)
plt.show()

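Visualizing the correlation with price

In [ ]: # drop('price') removes the target's correlation with itself; [:-1] would drop sqft_lot15 instead
data.corr()['price'].drop('price').sort_values().plot(kind='bar')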

Out[ ]: <Axes: >

Visualizing the data

In [ ]: # visualizing the relation between price and sqft_living, sqft_lot, sqft_above,

fig, ax = plt.subplots(4,4,figsize=(20,20))
sns.scatterplot( x = data['sqft_living'], y = data['price'],ax=ax[0,0])
sns.scatterplot( x = data['sqft_lot'], y = data['price'],ax=ax[0,1])
sns.scatterplot( x = data['sqft_above'], y = data['price'],ax=ax[0,2])
sns.scatterplot( x = data['sqft_basement'], y = data['price'],ax=ax[0,3])
sns.scatterplot( x = data['sqft_living15'], y = data['price'],ax=ax[1,0])
sns.scatterplot( x = data['sqft_lot15'], y = data['price'],ax=ax[1,1])
sns.lineplot( x = data['age'], y = data['price'],ax=ax[1,2])
sns.boxplot( x = data['renovated'], y = data['price'],ax=ax[1,3])
sns.scatterplot( x = data['bedrooms'], y = data['price'],ax=ax[2,0])
sns.lineplot( x = data['bathrooms'], y = data['price'],ax=ax[2,1])
sns.barplot( x = data['floors'], y = data['price'],ax=ax[2,2])
sns.boxplot( x = data['waterfront'], y = data['price'],ax=ax[2,3])
sns.barplot( x = data['view'], y = data['price'],ax=ax[3,0])
sns.barplot( x = data['condition'], y = data['price'],ax=ax[3,1])
sns.lineplot( x = data['grade'], y = data['price'],ax=ax[3,2])
sns.lineplot( x = data['age'], y = data['renovated'],ax=ax[3,3])
plt.show()

Plotting the location of the houses based on longitude and latitude on the map

In [ ]: # adding a new column price_range and categorizing the price into 4 categories
data['price_range'] = pd.cut(data['price'],bins=[0,321950,450000,645000,1295648]
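
The call above is cut off in the export, so its labels are not recoverable. As a rough sketch of how `pd.cut` assigns categories, the example below uses four hypothetical labels and an open-ended top bin (`np.inf`); both are assumptions of this sketch, not the author's settings.

```python
import numpy as np
import pandas as pd

# Made-up prices; the first three bin edges reuse the quartiles from data.describe().
prices = pd.Series([221900, 450000, 538000, 604000, 7700000])

# Hypothetical labels; the original labels are not visible in the export.
labels = ["low", "medium", "high", "very high"]

# np.inf as the last edge keeps even the most expensive house inside a bin.
price_range = pd.cut(prices, bins=[0, 321950, 450000, 645000, np.inf], labels=labels)

print(price_range.tolist())
```

Bins are right-inclusive by default, so a price equal to an edge (450000 here) falls in the lower bin. Any value above the last finite edge would become NaN if that edge were a plain number.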

In [ ]: map = folium.Map(location=[47.5480, -121.9836],zoom_start=8)

marker_cluster = FastMarkerCluster(data[['lat', 'long']].values.tolist()).add_to
map

Out[ ]: Make this Notebook Trusted to load map: File -> Trust Notebook
[Interactive folium map with clustered house markers. Leaflet (https://leafletjs.com) | Data by © OpenStreetMap (http://openstreetmap.org), under ODbL (http://www.openstreetmap.org/copyright).]

Train/Test Split
In [ ]: data.drop(['price_range'],axis=1,inplace=True)
X_train, X_test, y_train, y_test = train_test_split(data.drop('price',axis=1),da
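
The arguments of `train_test_split` are truncated in the export. The sketch below shows the same pattern on synthetic data; `test_size=0.2` and `random_state=42` are assumed values, not the ones used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Small synthetic frame; 'price' is the target, as in the notebook.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "sqft_living": rng.uniform(0, 1, 100),
    "bedrooms": rng.integers(1, 6, 100),
    "price": rng.uniform(75_000, 1_000_000, 100),
})

# test_size and random_state are assumed values, not the author's.
X_train, X_test, y_train, y_test = train_test_split(
    toy.drop("price", axis=1), toy["price"], test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, which matters when several models are compared on the same test set.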

Model Training

Using pipeline to combine the transformers and estimators and fit the model
In [ ]: input = [('scale',StandardScaler()),('polynomial', PolynomialFeatures(degree=2))
pipe = Pipeline(input)
pipe

Out[ ]: ▸ Pipeline

▸ StandardScaler

▸ PolynomialFeatures

▸ LinearRegression
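
The list of steps is truncated in the export, but the rendered pipeline shows StandardScaler, PolynomialFeatures and LinearRegression. A minimal sketch of such a pipeline on synthetic data follows; the step name `'model'` and the variable name `steps` (used instead of `input`, which shadows a Python built-in) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a quadratic relationship, so degree-2 features can fit it.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = 1.5 * X[:, 0] ** 2 - 2.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 1] + 4.0

# The 'model' step name is an assumption; the first two names follow the notebook.
steps = [
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=2)),
    ("model", LinearRegression()),
]
pipe = Pipeline(steps)

# fit() runs each transformer in order, then fits the final estimator.
pipe.fit(X, y)
print(round(pipe.score(X, y), 4))
```

Because the scaler and the polynomial expansion live inside the pipeline, `predict` and `score` apply them automatically to any new data.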

In [ ]: #training the model

pipe.fit(X_train,y_train)
pipe.score(X_test,y_test)

Out[ ]: 0.8271896429378042

In [ ]: #testing the model

pipe_pred = pipe.predict(X_test)
r2_score(y_test,pipe_pred)

Out[ ]: 0.8271896429378042

Ridge Regression
In [ ]: Ridgemodel = Ridge(alpha = 0.001)
Ridgemodel

Out[ ]: ▾ Ridge

Ridge(alpha=0.001)

In [ ]: # training the model

Ridgemodel.fit(X_train,y_train)
Ridgemodel.score(X_test,y_test)

In [ ]: #testing the model

r_pred = Ridgemodel.predict(X_test)
r2_score(y_test,r_pred)

Out[ ]: 0.7123220593275169
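
With alpha = 0.001 the L2 penalty is very small, so the ridge model behaves almost like ordinary least squares. The sketch below, on synthetic data, illustrates how larger values of `alpha` shrink the coefficients; the alpha grid and the `norms` dictionary are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic linear data with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Larger alpha means a stronger L2 penalty and smaller coefficients.
norms = {}
for alpha in [0.001, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    norms[alpha] = float(np.linalg.norm(model.coef_))
    print(alpha, round(norms[alpha], 3))
```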

Random Forest Regression

In [ ]: from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor

Out[ ]: ▾ RandomForestRegressor

RandomForestRegressor(random_state=0)

In [ ]: # training the model

regressor.fit(X_train,y_train)
regressor.score(X_test,y_test)

Out[ ]: 0.878968081057204

In [ ]: #testing the model

yhat = regressor.predict(X_test)
r2_score(y_test,yhat)

Out[ ]: 0.878968081057204
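
One way to see which inputs a fitted forest relies on is its `feature_importances_` attribute. The sketch below uses synthetic data in which only the first column carries signal; the feature names are made up and nothing here is taken from the notebook's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
# Only the first column drives the target; the other two are noise.
y = 10.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Importances sum to 1; a higher value means the feature reduced more error across splits.
for name, importance in zip(["signal", "noise_a", "noise_b"], forest.feature_importances_):
    print(name, round(importance, 3))
```

Applied to the notebook's `regressor`, the same attribute could be paired with `X_train.columns` to check how much weight lat, long and zipcode receive.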

Model Evaluation

Distribution plot from the models' predictions and the actual values
In [ ]: # density plot of the actual price and predicted price for all models
# (sns.distplot is deprecated in recent seaborn releases; kdeplot draws the density curve without the histogram)
fig, ax = plt.subplots(1,3,figsize=(20,5))
predictions = [pipe_pred, r_pred, yhat]
titles = ['Linear Regression','Ridge Regression','Random Forest Regression']
for axis, pred, title in zip(ax, predictions, titles):
    sns.kdeplot(x=y_test, ax=axis, label='Actual Price')
    sns.kdeplot(x=pred, ax=axis, label='Predicted Price')
    # legend
    axis.legend()
    # model name as title
    axis.set_title(title)
plt.show()

Error Evaluation
In [ ]: #plot the graph to compare mae, mse, rmse for all models
fig, ax = plt.subplots(1,3,figsize=(20,5))
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest'],y=[mean_a
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest'],y=[mean_s
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest'],y=[np.sqr
# label for the graph
ax[0].set_ylabel('Mean Absolute Error')
ax[1].set_ylabel('Mean Squared Error')
ax[2].set_ylabel('Root Mean Squared Error')
plt.show()
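
The three bar plots above compare MAE, MSE and RMSE, but the metric calls are truncated in the export. As a small worked example on made-up prices, the three metrics can be computed like this:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual and predicted prices.
y_true = np.array([200000.0, 350000.0, 500000.0])
y_pred = np.array([210000.0, 330000.0, 530000.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |error|
mse = mean_squared_error(y_true, y_pred)    # mean of error squared
rmse = np.sqrt(mse)                         # back in the same units as the price

print(mae, mse, rmse)
```

MSE is in squared dollars, so its bars are not directly comparable with MAE or RMSE; RMSE weights large errors more heavily than MAE does.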

Accuracy Evaluation

In [ ]: # plot accuracy of all models in the same graph

fig, ax = plt.subplots(figsize=(7,5))
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest Regression'
ax.set_title('Accuracy of all models')
plt.show()

Predicting the price of a new house

In [ ]: #input the values
bedrooms = 3
bathrooms = 2
sqft_living = 2000
sqft_lot = 10000
floors = 2
waterfront = 0
view = 0
condition = 3
grade = 8
sqft_above = 2000
sqft_basement = 0
yr_built = 1990
yr_renovated = 0
zipcode = 98001
lat = 47.5480
long = -121.9836
sqft_living15 = 2000
sqft_lot15 = 10000

In [ ]: #predicting the price using random forest regression

price = regressor.predict([[bedrooms,bathrooms,sqft_living,sqft_lot,floors,water
print('The price of the house is $',price[0])

The price of the house is $ 1078694.0533333335
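
Note that the model was trained on transformed features (age instead of yr_built, a 0/1 renovated flag, and area columns divided by their maxima), while the inputs above are raw values. A new record would need the same transformations before it is passed to `predict`. The helper below is a hypothetical sketch of that step; `prepare_row` is not part of the notebook, and only the two maxima visible in the `data.describe()` output are filled in.

```python
# Hypothetical helper: applies the notebook's preprocessing to one raw record.
def prepare_row(raw, col_max, current_year=2023):
    row = dict(raw)
    # yr_built -> age, as in the preprocessing cell
    row["age"] = current_year - row.pop("yr_built")
    # yr_renovated -> 0/1 flag
    row["renovated"] = 0 if row.pop("yr_renovated") == 0 else 1
    # divide each area column by the maximum seen in the training data
    for col, max_value in col_max.items():
        row[col] = row[col] / max_value
    return row

# Maxima read from the data.describe() output; the other four area columns
# would need their own maxima, which are not shown in the export.
col_max = {"sqft_living": 13540, "sqft_lot": 1651359}

raw = {"sqft_living": 2000, "sqft_lot": 10000, "yr_built": 1990, "yr_renovated": 0}
row = prepare_row(raw, col_max)
print(row)
```

The prepared values would then have to be arranged in the same column order as `X_train` before calling `regressor.predict`.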

Conclusion

From the analysis, we can see that the Random Forest Regression model (R² of about 0.88 on the test set) performed better than the Polynomial Regression model (about 0.83) and the Ridge Regression model (about 0.71).

During the EDA process, we found that the location of the house is a very important factor in determining its price, since houses with similar area and other features can have different prices depending on where they are located.

The location of the houses has been plotted on the map using the longitude and latitude values, which makes the role of location in determining the price clearer.


NOTE: For some reason, the map was not rendered properly when
the notebook was converted to PDF. Here is an image of the
rendered map showing the locations of the houses, color coded
according to their price range.
