Quantam - Learning - Colaboratory

1/11/23, 12:50 PM Copy of Quantam_Learning_ - Colaboratory

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

path='/content/drive/MyDrive/Copy of Bengaluru_House_Data.csv'

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import matplotlib
matplotlib.rcParams["figure.figsize"] = (20,10)

df1 = pd.read_csv(path)


df1.head()

   area_type            availability   location                  size       society  total_sqft  bath
0  Super built-up Area  19-Dec         Electronic City Phase II  2 BHK      Coomee   1056        2.0
1  Plot Area            Ready To Move  Chikka Tirupathi          4 Bedroom  Theanmp  2600        5.0
2  Built-up Area        Ready To Move  Uttarahalli               3 BHK      NaN      1440        2.0

df1.shape

(13320, 9)

df1.columns

Index(['area_type', 'availability', 'location', 'size', 'society',
       'total_sqft', 'bath', 'balcony', 'price'],
      dtype='object')

df1['area_type'].unique()

array(['Super built-up Area', 'Plot Area', 'Built-up Area',
       'Carpet Area'], dtype=object)

df1['area_type'].value_counts()

Super built-up Area    8790
Built-up Area          2418
Plot Area              2025
Carpet Area              87
Name: area_type, dtype: int64

df2 = df1.drop(['area_type','society','balcony','availability'],axis='columns')
df2.shape

(13320, 5)

df2.isnull().sum()

location 1
size 16
total_sqft 0
bath 73
price 0
dtype: int64

https://colab.research.google.com/drive/1GxpC9HQufBWj-pbJ3qRIEW3LWgPzgjTa#scrollTo=mmEvB0k2ZzqO&printMode=true 1/13

df2.shape

(13320, 5)

df3 = df2.dropna()
df3.isnull().sum()

location 0
size 0
total_sqft 0
bath 0
price 0
dtype: int64

df3.shape

(13246, 5)

df3 = df3.copy()  # work on an explicit copy so the assignment below does not raise SettingWithCopyWarning
df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))
df3.bhk.unique()

array([ 2, 4, 3, 6, 1, 8, 7, 5, 11, 9, 27, 10, 19, 16, 43, 14, 12,
       13, 18])

def is_float(x):
    # True when x can be parsed as a float, False otherwise
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True

2+3

df3[~df3['total_sqft'].apply(is_float)].head(10)

     location            size       total_sqft      bath  price    bhk
30   Yelahanka           4 BHK      2100 - 2850     4.0   186.000  4
122  Hebbal              4 BHK      3067 - 8156     4.0   477.000  4
137  8th Phase JP Nagar  2 BHK      1042 - 1105     2.0   54.005   2
165  Sarjapur            2 BHK      1145 - 1340     2.0   43.490   2
188  KR Puram            2 BHK      1015 - 1540     2.0   56.800   2
410  Kengeri             1 BHK      34.46Sq. Meter  1.0   18.500   1
549  Hennur Road         2 BHK      1195 - 1440     2.0   63.770   2
648  Arekere             9 Bedroom  4125Perch       9.0   265.000  9
661  Yelahanka           2 BHK      1120 - 1145     2.0   48.130   2
672  Bettahalsoor        4 Bedroom  3090 - 5002     4.0   445.000  4

def convert_sqft_to_num(x):
    # A range such as "2100 - 2850" is replaced by its midpoint
    tokens = x.split('-')
    if len(tokens) == 2:
        return (float(tokens[0])+float(tokens[1]))/2
    try:
        return float(x)
    except ValueError:
        # Values with units (e.g. "34.46Sq. Meter") cannot be parsed and become None
        return None

df4 = df3.copy()
df4.total_sqft = df4.total_sqft.apply(convert_sqft_to_num)
df4 = df4[df4.total_sqft.notnull()]
df4.head(2)

   location                  size       total_sqft  bath  price   bhk
0  Electronic City Phase II  2 BHK      1056.0      2.0   39.07   2
1  Chikka Tirupathi          4 Bedroom  2600.0      5.0   120.00  4

df4.loc[30]

location Yelahanka
size 4 BHK
total_sqft 2475.0
bath 4.0
price 186.0
bhk 4
Name: 30, dtype: object

(2100+2850)/2

2475.0
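
The conversion rule can also be checked in isolation on values copied from the table of non-numeric total_sqft entries. This is a minimal standalone sketch in plain Python; the name parse_sqft is hypothetical and is not part of the notebook.

```python
def parse_sqft(text):
    """Return square feet as a float, the midpoint of an 'a - b' range, or None."""
    parts = text.split('-')
    try:
        if len(parts) == 2:
            low, high = float(parts[0]), float(parts[1])
            return (low + high) / 2
        return float(text)
    except ValueError:
        # Entries with units, such as '34.46Sq. Meter', are not converted
        return None

# Sample strings taken from the total_sqft column shown earlier
samples = ['1056', '2100 - 2850', '34.46Sq. Meter', '4125Perch']
parsed = [parse_sqft(s) for s in samples]
print(parsed)
```

Entries that come back as None correspond to the rows the notebook drops with df4.total_sqft.notnull().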

df5 = df4.copy()
df5['price_per_sqft'] = df5['price']*100000/df5['total_sqft']
df5.head()

   location                  size       total_sqft  bath  price   bhk  price_per_sqft
0  Electronic City Phase II  2 BHK      1056.0      2.0   39.07   2    3699.810606
1  Chikka Tirupathi          4 Bedroom  2600.0      5.0   120.00  4    4615.384615
2  Uttarahalli               3 BHK      1440.0      2.0   62.00   3    4305.555556
3  Lingadheeranahalli        3 BHK      1521.0      3.0   95.00   3    6245.890861
4  Kothanur                  2 BHK      1200.0      2.0   51.00   2    4250.000000

df5_stats = df5['price_per_sqft'].describe()
df5_stats

count 1.320000e+04
mean 7.920759e+03
std 1.067272e+05
min 2.678298e+02
25% 4.267701e+03
50% 5.438331e+03
75% 7.317073e+03
max 1.200000e+07
Name: price_per_sqft, dtype: float64
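
The factor 100000 in price_per_sqft converts the price column from lakh rupees (the unit named on the scatter-plot axis later in the notebook) to rupees. A small worked check on the first two rows of df5, written as a standalone sketch; the helper name price_per_sqft and the constant LAKH are illustrative assumptions.

```python
LAKH = 100_000  # 1 lakh = 100,000 rupees

def price_per_sqft(price_lakh, total_sqft):
    """Convert a price in lakh rupees to rupees per square foot."""
    return price_lakh * LAKH / total_sqft

# Rows 0 and 1 of df5: (price, total_sqft)
row0 = price_per_sqft(39.07, 1056.0)
row1 = price_per_sqft(120.00, 2600.0)
print(round(row0, 6), round(row1, 6))
```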

df5.to_csv("bhp.csv",index=False)

df5.location = df5.location.apply(lambda x: x.strip())


location_stats = df5['location'].value_counts(ascending=False)
location_stats

Whitefield 533
Sarjapur Road 392
Electronic City 304
Kanakpura Road 264
Thanisandra 235
...
Rajanna Layout 1
Subramanyanagar 1
Lakshmipura Vidyaanyapura 1
Malur Hosur Road 1
Abshot Layout 1
Name: location, Length: 1287, dtype: int64

location_stats.values.sum()

13200

len(location_stats[location_stats>10])

240

len(location_stats)

1287

len(location_stats[location_stats<=10])

1047

location_stats_less_than_10 = location_stats[location_stats<=10]
location_stats_less_than_10

BTM 1st Stage 10
Gunjur Palya 10
Nagappa Reddy Layout 10
Sector 1 HSR Layout 10
Thyagaraja Nagar 10
..
Rajanna Layout 1
Subramanyanagar 1
Lakshmipura Vidyaanyapura 1
Malur Hosur Road 1
Abshot Layout 1
Name: location, Length: 1047, dtype: int64

len(df5.location.unique())

1287

df5.location = df5.location.apply(lambda x: 'other' if x in location_stats_less_than_10 else x)


len(df5.location.unique())

241
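
The count of 241 is the 240 locations with more than 10 rows plus the single 'other' bucket. The same bucketing can be sketched without pandas; this is an illustrative standalone version, and the function name bucket_rare and the toy list are assumptions made for the example.

```python
from collections import Counter

def bucket_rare(values, min_count=10, label='other'):
    """Replace values that occur min_count times or fewer with a single label."""
    counts = Counter(values)
    return [v if counts[v] > min_count else label for v in values]

# Toy data: two frequent locations, one with exactly 10 rows, one with a single row
locations = (['Whitefield'] * 12 + ['Hebbal'] * 11
             + ['BTM 1st Stage'] * 10 + ['Abshot Layout'])
bucketed = bucket_rare(locations)
print(sorted(set(bucketed)))
```

A location with exactly 10 rows falls into 'other', matching the `<=10` condition used for location_stats_less_than_10.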

df5.head(10)

   location                  size       total_sqft  bath  price   bhk  price_per_sqft
0  Electronic City Phase II  2 BHK      1056.0      2.0   39.07   2    3699.810606
1  Chikka Tirupathi          4 Bedroom  2600.0      5.0   120.00  4    4615.384615
2  Uttarahalli               3 BHK      1440.0      2.0   62.00   3    4305.555556
3  Lingadheeranahalli        3 BHK      1521.0      3.0   95.00   3    6245.890861
4  Kothanur                  2 BHK      1200.0      2.0   51.00   2    4250.000000
5  Whitefield                2 BHK      1170.0      2.0   38.00   2    3247.863248
6  Old Airport Road          4 BHK      2732.0      4.0   204.00  4    7467.057101
7  Rajaji Nagar              4 BHK      3300.0      4.0   600.00  4    18181.818182
8  Marathahalli              3 BHK      1310.0      3.0   63.25   3    4828.244275
9  other                     6 Bedroom  1020.0      6.0   370.00  6    36274.509804

df5[df5.total_sqft/df5.bhk<300].head()

    location             size       total_sqft  bath  price  bhk  price_per_sqft
9   other                6 Bedroom  1020.0      6.0   370.0  6    36274.509804
45  HSR Layout           8 Bedroom  600.0       9.0   200.0  8    33333.333333
58  Murugeshpalya        6 Bedroom  1407.0      4.0   150.0  6    10660.980810
68  Devarachikkanahalli  8 Bedroom  1350.0      7.0   85.0   8    6296.296296
70  other                3 Bedroom  500.0       3.0   100.0  3    20000.000000

df5.shape

(13200, 7)

df6 = df5[~(df5.total_sqft/df5.bhk<300)]
df6.shape

(12456, 7)

df6.price_per_sqft.describe()

count 12456.000000
mean 6308.502826
std 4168.127339
min 267.829813
25% 4210.526316
50% 5294.117647
75% 6916.666667
max 176470.588235
Name: price_per_sqft, dtype: float64

def remove_pps_outliers(df):
    # Within each location, keep rows whose price_per_sqft lies within one standard deviation of the mean
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        reduced_df = subdf[(subdf.price_per_sqft>(m-st)) & (subdf.price_per_sqft<=(m+st))]
        df_out = pd.concat([df_out,reduced_df],ignore_index=True)
    return df_out
df7 = remove_pps_outliers(df6)
df7.shape

(10242, 7)
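
This step takes the data from 12456 to 10242 rows. To see how a one-standard-deviation band behaves on a small group, here is a standalone sketch of the same per-group rule using only the standard library. The function name filter_within_one_std and the toy values are assumptions; pstdev is the population standard deviation (ddof=0), which is also NumPy's default for np.std.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def filter_within_one_std(rows):
    """Keep (group, value) pairs with mean - std < value <= mean + std per group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    kept = []
    for group, values in groups.items():
        m = fmean(values)
        st = pstdev(values)  # population standard deviation (ddof=0)
        kept.extend((group, v) for v in values if m - st < v <= m + st)
    return kept

# Toy price-per-sqft values for two hypothetical locations 'A' and 'B'
rows = [('A', 4000), ('A', 4200), ('A', 4100), ('A', 9000),
        ('B', 5000), ('B', 5100), ('B', 5200)]
kept = filter_within_one_std(rows)
print(kept)
```

In group 'A' only the extreme value 9000 is dropped, while in the evenly spread group 'B' only the middle value survives, which shows how tight a one-standard-deviation band can be on small groups.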

def plot_scatter_chart(df,location):
    bhk2 = df[(df.location==location) & (df.bhk==2)]
    bhk3 = df[(df.location==location) & (df.bhk==3)]
    matplotlib.rcParams['figure.figsize'] = (15,10)
    plt.scatter(bhk2.total_sqft,bhk2.price,color='blue',label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft,bhk3.price,marker='+', color='green',label='3 BHK', s=50)
    plt.xlabel("Total Square Feet Area")
    plt.ylabel("Price (Lakh Indian Rupees)")
    plt.title(location)
    plt.legend()

plot_scatter_chart(df7,"Rajaji Nagar")

plot_scatter_chart(df7,"Hebbal")

{
    '1' : {
        'mean': 4000,
        'std': 2000,
        'count': 34
    },
    '2' : {
        'mean': 4300,
        'std': 2300,
        'count': 22
    },
}

{'1': {'mean': 4000, 'std': 2000, 'count': 34},
 '2': {'mean': 4300, 'std': 2300, 'count': 22}}

def remove_bhk_outliers(df):
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        # Per-location statistics of price_per_sqft for each bedroom count
        bhk_stats = {}
        for bhk, bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk] = {
                'mean': np.mean(bhk_df.price_per_sqft),
                'std': np.std(bhk_df.price_per_sqft),
                'count': bhk_df.shape[0]
            }
        # Mark homes whose price_per_sqft is below the mean of homes with one fewer bedroom
        for bhk, bhk_df in location_df.groupby('bhk'):
            stats = bhk_stats.get(bhk-1)
            if stats and stats['count']>5:
                exclude_indices = np.append(exclude_indices, bhk_df[bhk_df.price_per_sqft<(stats['mean'])].index.values)
    return df.drop(exclude_indices,axis='index')
df8 = remove_bhk_outliers(df7)
# df8 = df7.copy()
df8.shape

(7317, 7)
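
The rule applied by remove_bhk_outliers can be restated on plain tuples: within one location, a home is dropped when its price per square foot is below the mean for homes with one fewer bedroom, provided that smaller group has more than 5 rows. The sketch below is a standalone illustration; the function name drop_cheaper_than_smaller and the toy rows are assumptions, not part of the notebook.

```python
from collections import defaultdict
from statistics import fmean

def drop_cheaper_than_smaller(rows, min_count=5):
    """rows: (location, bhk, price_per_sqft) tuples. Drop a row when its price
    per square foot is below the mean of homes with one fewer bedroom in the
    same location, provided that smaller group has more than min_count rows."""
    by_group = defaultdict(list)
    for location, bhk, pps in rows:
        by_group[(location, bhk)].append(pps)
    kept = []
    for location, bhk, pps in rows:
        smaller = by_group.get((location, bhk - 1))
        if smaller and len(smaller) > min_count and pps < fmean(smaller):
            continue  # cheaper per sqft than the typical smaller home
        kept.append((location, bhk, pps))
    return kept

# Six 2 BHK homes at 5000 per sqft, then one cheap and one expensive 3 BHK home
rows = [('Hebbal', 2, 5000.0)] * 6 + [('Hebbal', 3, 4500.0), ('Hebbal', 3, 6000.0)]
kept = drop_cheaper_than_smaller(rows)
print(len(rows), len(kept))
```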

plot_scatter_chart(df8,"Rajaji Nagar")

plot_scatter_chart(df8,"Hebbal")

import matplotlib
matplotlib.rcParams["figure.figsize"] = (20,10)
plt.hist(df8.price_per_sqft,rwidth=0.8)
plt.xlabel("Price Per Square Feet")
plt.ylabel("Count")

Text(0, 0.5, 'Count')

df8.bath.unique()

array([ 4., 3., 2., 5., 8., 1., 6., 7., 9., 12., 16., 13.])

plt.hist(df8.bath,rwidth=0.8)
plt.xlabel("Number of bathrooms")
plt.ylabel("Count")

Text(0, 0.5, 'Count')

df8[df8.bath>10]

      location        size    total_sqft  bath  price  bhk  price_per_sqft
5277  Neeladri Nagar  10 BHK  4000.0      12.0  160.0  10   4000.000000
8483  other           10 BHK  12000.0     12.0  525.0  10   4375.000000
8572  other           16 BHK  10000.0     16.0  550.0  16   5500.000000
9306  other           11 BHK  6000.0      12.0  150.0  11   2500.000000
9637  other           13 BHK  5425.0      13.0  275.0  13   5069.124424

df8[df8.bath>df8.bhk+2]

      location       size       total_sqft  bath  price   bhk  price_per_sqft
1626  Chikkabanavar  4 Bedroom  2460.0      7.0   80.0    4    3252.032520
5238  Nagasandra     4 Bedroom  7000.0      8.0   450.0   4    6428.571429
6711  Thanisandra    3 BHK      1806.0      6.0   116.0   3    6423.034330
8408  other          6 BHK      11338.0     9.0   1000.0  6    8819.897689

df9 = df8[df8.bath<df8.bhk+2]
df9.shape

(7239, 7)

df9.head(2)

   location             size   total_sqft  bath  price  bhk  price_per_sqft
0  1st Block Jayanagar  4 BHK  2850.0      4.0   428.0  4    15017.543860
1  1st Block Jayanagar  3 BHK  1630.0      3.0   194.0  3    11901.840491

df10 = df9.drop(['size','price_per_sqft'],axis='columns')
df10.head(3)

   location             total_sqft  bath  price  bhk
0  1st Block Jayanagar  2850.0      4.0   428.0  4
1  1st Block Jayanagar  1630.0      3.0   194.0  3
2  1st Block Jayanagar  1875.0      2.0   235.0  3

dummies = pd.get_dummies(df10.location)
dummies.head(3)

   1st Block Jayanagar  1st Phase JP Nagar  2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  5th Phase JP Nagar  6th Phase JP Nagar  7th Phase JP Nagar  8th Phase JP Nagar  9th Phase JP Nagar  ...  Vishveshwarya Layout  Vishwapriya Layout  Vittasandra
0  1                    0                   0                          0                     0                     0                   0                   0                   0                   0                   ...  0                     0                   0
1  1                    0                   0                          0                     0                     0                   0                   0                   0                   0                   ...  0                     0                   0
2  1                    0                   0                          0                     0                     0                   0                   0                   0                   0                   ...  0                     0                   0

3 rows × 241 columns

df11 = pd.concat([df10,dummies.drop('other',axis='columns')],axis='columns')
df11.head()

   location             total_sqft  bath  price  bhk  1st Block Jayanagar  1st Phase JP Nagar  2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  ...  Vijayanagar  Vishveshwarya Layout
0  1st Block Jayanagar  2850.0      4.0   428.0  4    1                    0                   0                          0                     0                     ...  0            0
1  1st Block Jayanagar  1630.0      3.0   194.0  3    1                    0                   0                          0                     0                     ...  0            0
2  1st Block Jayanagar  1875.0      2.0   235.0  3    1                    0                   0                          0                     0                     ...  0            0
3  1st Block Jayanagar  1200.0      2.0   130.0  3    1                    0                   0                          0                     0                     ...  0            0
4  1st Block Jayanagar  1235.0      2.0   148.0  2    1                    0                   0                          0                     0                     ...  0            0

5 rows × 245 columns

df12 = df11.drop('location',axis='columns')
df12.head(2)

   total_sqft  bath  price  bhk  1st Block Jayanagar  1st Phase JP Nagar  2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  5th Phase JP Nagar  ...  Vijayanagar  Vishveshwarya Layout  Vishwapriya Layout
0  2850.0      4.0   428.0  4    1                    0                   0                          0                     0                     0                   ...  0            0                     0
1  1630.0      3.0   194.0  3    1                    0                   0                          0                     0                     0                   ...  0            0                     0

2 rows × 244 columns

df12.shape

(7239, 244)

X = df12.drop(['price'],axis='columns')
X.head(3)

   total_sqft  bath  bhk  1st Block Jayanagar  1st Phase JP Nagar  2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  5th Phase JP Nagar  6th Phase JP Nagar  ...  Vijayanagar  Vishveshwarya Layout  Vishwapriya Layout
0  2850.0      4.0   4    1                    0                   0                          0                     0                     0                   0                   ...  0            0                     0
1  1630.0      3.0   3    1                    0                   0                          0                     0                     0                   0                   ...  0            0                     0
2  1875.0      2.0   3    1                    0                   0                          0                     0                     0                   0                   ...  0            0                     0

3 rows × 243 columns

X.shape

(7239, 243)

y = df12.price
y.head(3)

0 428.0
1 194.0
2 235.0
Name: price, dtype: float64

len(y)

7239

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=10)

from sklearn.linear_model import LinearRegression


lr_clf = LinearRegression()
lr_clf.fit(X_train,y_train)
lr_clf.score(X_test,y_test)

0.8629132245229447

from sklearn.model_selection import ShuffleSplit


from sklearn.model_selection import cross_val_score

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

cross_val_score(LinearRegression(), X, y, cv=cv)

array([0.82702546, 0.86027005, 0.85322178, 0.8436466 , 0.85481502])

from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import Lasso


from sklearn.tree import DecisionTreeRegressor

def find_best_model_using_gridsearchcv(X,y):
    algos = {
        'linear_regression' : {
            'model': LinearRegression(),
            'params': {
                # 'normalize' was deprecated in scikit-learn 1.0 and removed in 1.2
                'normalize': [True, False]
            }
        },
        'lasso': {
            'model': Lasso(),
            'params': {
                'alpha': [1,2],
                'selection': ['random', 'cyclic']
            }
        },
        'decision_tree': {
            'model': DecisionTreeRegressor(),
            'params': {
                # 'mse' was deprecated in scikit-learn 1.0 and removed in 1.2; 'squared_error' replaces it
                'criterion' : ['mse','friedman_mse'],
                'splitter': ['best','random']
            }
        }
    }
    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    for algo_name, config in algos.items():
        gs = GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)
        gs.fit(X,y)
        scores.append({
            'model': algo_name,
            'best_score': gs.best_score_,
            'best_params': gs.best_params_
        })

    return pd.DataFrame(scores,columns=['model','best_score','best_params'])

find_best_model_using_gridsearchcv(X,y)

/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_base.py:141: FutureWarning: 'normalize' was deprecated in version 1.0


If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), LinearRegression())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}


model.fit(X, y, **kwargs)

warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_base.py:148: FutureWarning: 'normalize' was deprecated in version 1.0
warnings.warn(
/usr/local/lib/python3.8/dist-packages/sklearn/tree/_classes.py:359: FutureWarning: Criterion 'mse' was deprecated in v1.0 and will
warnings.warn(

   model              best_score  best_params
0  linear_regression  0.847796    {'normalize': False}
1  lasso              0.726745    {'alpha': 2, 'selection': 'random'}
2  decision_tree      0.713436    {'criterion': 'friedman_mse', 'splitter': 'best'}

New Section

def predict_price(location,sqft,bath,bhk):
    # Build one feature row in the same column order as X
    x = np.zeros(len(X.columns))
    x[0] = sqft
    x[1] = bath
    x[2] = bhk
    # A location without its own column (for example 'other') leaves every dummy column at 0
    matches = np.where(X.columns==location)[0]
    if len(matches) > 0:
        x[matches[0]] = 1

    return lr_clf.predict([x])[0]

predict_price('1st Phase JP Nagar',1000, 2, 2)

/usr/local/lib/python3.8/dist-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but LinearRegression
warnings.warn(
83.86570258311595

predict_price('1st Phase JP Nagar',1000, 3, 3)

86.08062284986363

predict_price('Indira Nagar',1000, 2, 2)

193.3119773317968

predict_price('Indira Nagar',1000, 3, 3)

195.5268975985445

import pickle
with open('banglore_home_prices_model.pickle','wb') as f:
    pickle.dump(lr_clf,f)


import json
columns = {
    'data_columns' : [col.lower() for col in X.columns]
}
with open("columns.json","w") as f:
    f.write(json.dumps(columns))
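
Because the column names are lower-cased before being written to columns.json, any code that later rebuilds a feature row from that file has to lower-case the location before looking it up. The sketch below illustrates this with the standard library only; build_feature_row is a hypothetical helper, and the five-entry column list is a shortened stand-in for the real file, which holds the 243 columns of X.

```python
import json

def build_feature_row(data_columns, location, sqft, bath, bhk):
    """Build a feature list in the saved column order; unknown locations stay all-zero."""
    row = [0.0] * len(data_columns)
    row[0], row[1], row[2] = sqft, bath, bhk
    key = location.lower()  # the saved names were lower-cased
    if key in data_columns[3:]:
        row[data_columns.index(key)] = 1.0
    return row

# Shortened stand-in for the contents of columns.json (assumption for illustration)
saved = json.dumps({'data_columns': ['total_sqft', 'bath', 'bhk',
                                     '1st phase jp nagar', 'indira nagar']})
data_columns = json.loads(saved)['data_columns']
row = build_feature_row(data_columns, 'Indira Nagar', 1000, 2, 2)
print(row)
```

A row built this way has the same layout as the x array assembled in predict_price, so it could be passed to the pickled model's predict method.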
