AM19 EDA Assignment3

Assignment3 of EDA

am19-eda-assignment3

November 28, 2024

Name: Swapnil Chaudhari
PRN: 2122000238
Roll No.: AM19
Assignment No. 2
[1]: import pandas as pd
import numpy as np

[2]: df = pd.read_excel('Bengaluru_House_Data.xlsx')
df

[2]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK NaN 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK NaN 1200 2.0 1.0 51.00
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK NaN 3600 5.0 NaN 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK NaN 550 1.0 1.0 17.00

[13320 rows x 9 columns]

[3]: df.isnull().sum()

[3]: area_type 0
availability 0
location 1
size 16
society 5502
total_sqft 0
bath 73
balcony 609
price 0
dtype: int64

[4]: df.isnull().sum().sum()

[4]: 6201

[5]: df.isna().sum()

[5]: area_type 0
availability 0
location 1
size 16
society 5502
total_sqft 0
bath 73
balcony 609
price 0
dtype: int64

[6]: df.isna().sum().sum()

[6]: 6201
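`isnull()` and `isna()` are aliases in pandas, which is why cells [3] to [6] print identical counts. A compact way to see both the count and the share of missing values per column is sketched below on a small made-up frame (the column names mirror the housing data, the values do not come from it).

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (values are made up).
toy = pd.DataFrame({
    "location": ["Whitefield", None, "Kothanur", "Uttarahalli"],
    "bath": [2.0, np.nan, 3.0, np.nan],
    "price": [39.07, 120.0, 62.0, 95.0],
})

# isna() and isnull() are aliases, so both produce the same boolean mask.
assert toy.isna().equals(toy.isnull())

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": toy.isna().mean().mul(100).round(1),
})
print(missing)
print("total:", int(toy.isna().sum().sum()))
```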

[7]: df[df['society'].isnull()]

[7]: area_type availability location size \


2 Built-up Area Ready To Move Uttarahalli 3 BHK
4 Super built-up Area Ready To Move Kothanur 2 BHK
8 Super built-up Area Ready To Move Marathahalli 3 BHK
9 Plot Area Ready To Move Gandhi Bazar 6 Bedroom
10 Super built-up Area 2024-02-18 00:00:00 Whitefield 3 BHK
… … … … …
13310 Super built-up Area Ready To Move Rachenahalli 2 BHK

13311 Plot Area Ready To Move Ramamurthy Nagar 7 Bedroom
13312 Super built-up Area Ready To Move Bellandur 2 BHK
13316 Super built-up Area Ready To Move Richards Town 4 BHK
13319 Super built-up Area Ready To Move Doddathoguru 1 BHK

society total_sqft bath balcony price


2 NaN 1440 2.0 3.0 62.00
4 NaN 1200 2.0 1.0 51.00
8 NaN 1310 3.0 1.0 63.25
9 NaN 1020 6.0 NaN 370.00
10 NaN 1800 2.0 2.0 70.00
… … … … … …
13310 NaN 1050 2.0 2.0 52.71
13311 NaN 1500 9.0 2.0 250.00
13312 NaN 1262 2.0 2.0 47.00
13316 NaN 3600 5.0 NaN 400.00
13319 NaN 550 1.0 1.0 17.00

[5502 rows x 9 columns]

[8]: df.fillna(0)

[8]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK 0 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK 0 1200 2.0 1.0 51.00
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK 0 3600 5.0 0.0 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK 0 550 1.0 1.0 17.00

[13320 rows x 9 columns]

[9]: df.fillna(1)

[9]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK 1 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK 1 1200 2.0 1.0 51.00
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK 1 3600 5.0 1.0 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK 1 550 1.0 1.0 17.00

[13320 rows x 9 columns]

[10]: df.ffill()

[10]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price
0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK Theanmp 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK Soiewre 1200 2.0 1.0 51.00
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK ArsiaEx 3600 5.0 0.0 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK SollyCl 550 1.0 1.0 17.00

[13320 rows x 9 columns]

[11]: df.bfill()

[11]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK Soiewre 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK DuenaTa 1200 2.0 1.0 51.00
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK Mahla T 3600 5.0 1.0 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK NaN 550 1.0 1.0 17.00

[13320 rows x 9 columns]

[12]: df.ffill(axis=1)

[12]: area_type availability location \
0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.0
2 3 BHK 3 BHK 1440 2.0 3.0 62.0
3 3 BHK Soiewre 1521 3.0 1.0 95.0
4 2 BHK 2 BHK 1200 2.0 1.0 51.0
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.0
13316 4 BHK 4 BHK 3600 5.0 5.0 400.0
13317 2 BHK Mahla T 1141 2.0 1.0 60.0
13318 4 BHK SollyCl 4689 4.0 1.0 488.0
13319 1 BHK 1 BHK 550 1.0 1.0 17.0

[13320 rows x 9 columns]
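`ffill()` propagates the last valid value down each column (axis=0), while `ffill(axis=1)` propagates it across each row. That is why, in cell [12], `society` picks up the `size` text (e.g. `3 BHK`) and `balcony` picks up the `bath` value: row-wise filling mixes unrelated columns, so it is rarely meaningful for a table like this. A toy comparison on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column layout as the housing data.
toy = pd.DataFrame({
    "size": ["2 BHK", "3 BHK", "4 BHK"],
    "society": ["Coomee", np.nan, np.nan],
    "bath": [2.0, 2.0, 5.0],
    "balcony": [1.0, 3.0, np.nan],
})

down = toy.ffill()          # axis=0: copy the value from the row above
across = toy.ffill(axis=1)  # axis=1: copy the value from the column to the left

print(down)
print(across)
```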

[13]: df.fillna({'society':'abc','balcony':'xyz'},inplace=True)
df.head(15)

[13]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
5 Super built-up Area Ready To Move Whitefield
6 Super built-up Area 2024-05-18 00:00:00 Old Airport Road
7 Super built-up Area Ready To Move Rajaji Nagar
8 Super built-up Area Ready To Move Marathahalli
9 Plot Area Ready To Move Gandhi Bazar
10 Super built-up Area 2024-02-18 00:00:00 Whitefield
11 Plot Area Ready To Move Whitefield
12 Super built-up Area Ready To Move 7th Phase JP Nagar
13 Built-up Area Ready To Move Gottigere
14 Plot Area Ready To Move Sarjapur

size society total_sqft bath balcony price
0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
5 2 BHK DuenaTa 1170 2.0 1.0 38.00
6 4 BHK Jaades 2732 4.0 xyz 204.00
7 4 BHK Brway G 3300 4.0 xyz 600.00
8 3 BHK abc 1310 3.0 1.0 63.25
9 6 Bedroom abc 1020 6.0 xyz 370.00
10 3 BHK abc 1800 2.0 2.0 70.00
11 4 Bedroom Prrry M 2785 5.0 3.0 295.00
12 2 BHK Shncyes 1000 2.0 1.0 38.00
13 2 BHK abc 1100 2.0 2.0 40.00
14 3 Bedroom Skityer 2250 3.0 2.0 148.00
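Cell [13] fills the numeric `balcony` column with the string `'xyz'`. This turns the column into `object` dtype, so numeric methods such as `mean()` stop working until it is converted back, which is what `pd.to_numeric(..., errors='coerce')` in the next cell does (every `'xyz'` becomes NaN again). A small sketch of that round trip on made-up values:

```python
import numpy as np
import pandas as pd

balcony = pd.Series([1.0, np.nan, 3.0], name="balcony")

filled = balcony.fillna("xyz")   # a string in a float column upcasts it to object
print(filled.dtype)              # object

# errors="coerce" turns anything non-numeric (here 'xyz') back into NaN.
restored = pd.to_numeric(filled, errors="coerce")
print(restored.dtype, restored.isna().sum())
```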

[14]: df["balcony"] = pd.to_numeric(df["balcony"], errors='coerce')
df["balcony"].fillna(value=df["balcony"].mean()).round(0)

[14]: 0 1.0
1 3.0
2 3.0
3 1.0
4 1.0

13315 0.0
13316 2.0
13317 1.0
13318 1.0
13319 1.0
Name: balcony, Length: 13320, dtype: float64

[15]: df["balcony"].fillna(value=df["balcony"].max())

[15]: 0 1.0
1 3.0
2 3.0
3 1.0
4 1.0

13315 0.0
13316 3.0
13317 1.0
13318 1.0
13319 1.0

Name: balcony, Length: 13320, dtype: float64

[16]: df["balcony"].fillna(value=df["balcony"].min())

[16]: 0 1.0
1 3.0
2 3.0
3 1.0
4 1.0

13315 0.0
13316 0.0
13317 1.0
13318 1.0
13319 1.0
Name: balcony, Length: 13320, dtype: float64

[17]: mode = df['balcony'].mode()
df["balcony"].fillna(value=mode[0])

[17]: 0 1.0
1 3.0
2 3.0
3 1.0
4 1.0

13315 0.0
13316 2.0
13317 1.0
13318 1.0
13319 1.0
Name: balcony, Length: 13320, dtype: float64
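Cells [14] to [17] fill `balcony` with the mean, max, min and mode in turn, but none of the results is assigned back, so `df` itself is unchanged. A sketch that puts the four candidates side by side on a made-up series, assuming the goal is to compare strategies before committing to one:

```python
import numpy as np
import pandas as pd

balcony = pd.Series([1.0, np.nan, 3.0, 1.0, np.nan, 0.0])

fill_values = {
    "mean": np.round(balcony.mean()),  # mean of 1, 3, 1, 0 is 1.25, rounded to 1.0
    "max": balcony.max(),
    "min": balcony.min(),
    "mode": balcony.mode().iloc[0],    # mode() returns a Series; take its first value
}

# One column per strategy; the original series is left untouched.
filled = pd.DataFrame({name: balcony.fillna(v) for name, v in fill_values.items()})
print(filled)
```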

[18]: df.dropna()

[18]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13314 Super built-up Area Ready To Move Green Glen Layout
13315 Built-up Area Ready To Move Whitefield
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price
0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
… … … … … … …
13314 3 BHK SoosePr 1715 3.0 3.0 112.00
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK abc 550 1.0 1.0 17.00

[12710 rows x 9 columns]

[19]: df1=df.dropna(how='all')
df1.isnull().sum()

[19]: area_type 0
availability 0
location 1
size 16
society 0
total_sqft 0
bath 73
balcony 609
price 0
dtype: int64

[20]: df2=df.dropna(how='any')
df2

[20]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13314 Super built-up Area Ready To Move Green Glen Layout
13315 Built-up Area Ready To Move Whitefield
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price

0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
… … … … … … …
13314 3 BHK SoosePr 1715 3.0 3.0 112.00
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK abc 550 1.0 1.0 17.00

[12710 rows x 9 columns]
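`how='any'` (the default, used in cells [18] and [20]) drops a row if any column is missing, while `how='all'` drops a row only when every column is missing, which is why cell [19] keeps all rows. The `thresh` and `subset` parameters give finer control. A sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "location": ["Whitefield", np.nan, np.nan, "Kothanur"],
    "bath": [2.0, np.nan, np.nan, 3.0],
    "balcony": [1.0, 2.0, np.nan, np.nan],
})

n_any = len(toy.dropna(how="any"))              # rows with no NaN at all
n_all = len(toy.dropna(how="all"))              # drops only the all-NaN row
n_thresh = len(toy.dropna(thresh=2))            # keep rows with >= 2 non-NaN values
n_subset = len(toy.dropna(subset=["location"])) # only 'location' is checked

print(n_any, n_all, n_thresh, n_subset)
```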

[21]: df3=df.replace(to_replace=np.nan,value=200)
df3.head(10)

[21]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
5 Super built-up Area Ready To Move Whitefield
6 Super built-up Area 2024-05-18 00:00:00 Old Airport Road
7 Super built-up Area Ready To Move Rajaji Nagar
8 Super built-up Area Ready To Move Marathahalli
9 Plot Area Ready To Move Gandhi Bazar

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
5 2 BHK DuenaTa 1170 2.0 1.0 38.00
6 4 BHK Jaades 2732 4.0 200.0 204.00
7 4 BHK Brway G 3300 4.0 200.0 600.00
8 3 BHK abc 1310 3.0 1.0 63.25
9 6 Bedroom abc 1020 6.0 200.0 370.00

[22]: df4=df.replace(to_replace=1,value=100)
df4.head(10)

[22]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi

2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
5 Super built-up Area Ready To Move Whitefield
6 Super built-up Area 2024-05-18 00:00:00 Old Airport Road
7 Super built-up Area Ready To Move Rajaji Nagar
8 Super built-up Area Ready To Move Marathahalli
9 Plot Area Ready To Move Gandhi Bazar

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 100.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 100.0 95.00
4 2 BHK abc 1200 2.0 100.0 51.00
5 2 BHK DuenaTa 1170 2.0 100.0 38.00
6 4 BHK Jaades 2732 4.0 NaN 204.00
7 4 BHK Brway G 3300 4.0 NaN 600.00
8 3 BHK abc 1310 3.0 100.0 63.25
9 6 Bedroom abc 1020 6.0 NaN 370.00
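`df.replace(to_replace=1, value=100)` in cell [22] scans every column, so a `1` in `bath` (or any other numeric column) would change as well. Passing a nested dict of the form `{column: {old: new}}` restricts the replacement to one column; a sketch on made-up values:

```python
import pandas as pd

toy = pd.DataFrame({"bath": [1.0, 2.0, 1.0], "balcony": [1.0, 3.0, 0.0]})

everywhere = toy.replace(1, 100)                   # every column is affected
only_balcony = toy.replace({"balcony": {1: 100}})  # only 'balcony' is affected

print(everywhere)
print(only_balcony)
```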

[23]: df5=df.copy()
df5['balcony']=df['balcony'].interpolate(method='linear')
df5.head(15)

[23]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
5 Super built-up Area Ready To Move Whitefield
6 Super built-up Area 2024-05-18 00:00:00 Old Airport Road
7 Super built-up Area Ready To Move Rajaji Nagar
8 Super built-up Area Ready To Move Marathahalli
9 Plot Area Ready To Move Gandhi Bazar
10 Super built-up Area 2024-02-18 00:00:00 Whitefield
11 Plot Area Ready To Move Whitefield
12 Super built-up Area Ready To Move 7th Phase JP Nagar
13 Built-up Area Ready To Move Gottigere
14 Plot Area Ready To Move Sarjapur

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00

4 2 BHK abc 1200 2.0 1.0 51.00
5 2 BHK DuenaTa 1170 2.0 1.0 38.00
6 4 BHK Jaades 2732 4.0 1.0 204.00
7 4 BHK Brway G 3300 4.0 1.0 600.00
8 3 BHK abc 1310 3.0 1.0 63.25
9 6 Bedroom abc 1020 6.0 1.5 370.00
10 3 BHK abc 1800 2.0 2.0 70.00
11 4 Bedroom Prrry M 2785 5.0 3.0 295.00
12 2 BHK Shncyes 1000 2.0 1.0 38.00
13 2 BHK abc 1100 2.0 2.0 40.00
14 3 Bedroom Skityer 2250 3.0 2.0 148.00
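Linear interpolation fills each gap by drawing a straight line between the nearest valid values on either side; with `method='linear'` pandas ignores the index values and treats rows as equally spaced. In cell [23], row 9 lies between 1.0 (row 8) and 2.0 (row 10), hence 1.5, while rows 6 and 7 lie between two 1.0 values. Because the row order of a listings table usually carries no meaning, the filled value is in effect an average of neighbouring listings. A reproduction of that window:

```python
import numpy as np
import pandas as pd

# 'balcony' values for rows 5 to 10 as seen in cells [13] and [23] (NaN where missing).
balcony = pd.Series([1.0, np.nan, np.nan, 1.0, np.nan, 2.0], index=range(5, 11))

result = balcony.interpolate(method="linear")
print(result)
```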

[24]: df.duplicated().sum()

[24]: 530

[25]: df_dr_dup=df.drop_duplicates(keep='first')
df_dr_dup

[25]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13314 Super built-up Area Ready To Move Green Glen Layout
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
… … … … … … …
13314 3 BHK SoosePr 1715 3.0 3.0 112.00
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK abc 3600 5.0 NaN 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00

[12790 rows x 9 columns]

[26]: df_dr_dup=df.drop_duplicates(keep='last')
df_dr_dup

[26]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar
13319 Super built-up Area Ready To Move Doddathoguru

size society total_sqft bath balcony price


0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
… … … … … … …
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK abc 3600 5.0 NaN 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00
13319 1 BHK abc 550 1.0 1.0 17.00

[12790 rows x 9 columns]

[27]: df_dr_dup=df.drop_duplicates(keep=False)
df_dr_dup

[27]: area_type availability location \


0 Super built-up Area 2024-12-19 00:00:00 Electronic City Phase II
1 Plot Area Ready To Move Chikka Tirupathi
2 Built-up Area Ready To Move Uttarahalli
3 Super built-up Area Ready To Move Lingadheeranahalli
4 Super built-up Area Ready To Move Kothanur
… … … …
13314 Super built-up Area Ready To Move Green Glen Layout
13315 Built-up Area Ready To Move Whitefield
13316 Super built-up Area Ready To Move Richards Town
13317 Built-up Area Ready To Move Raja Rajeshwari Nagar
13318 Super built-up Area 2024-06-18 00:00:00 Padmanabhanagar

size society total_sqft bath balcony price
0 2 BHK Coomee 1056 2.0 1.0 39.07
1 4 Bedroom Theanmp 2600 5.0 3.0 120.00
2 3 BHK abc 1440 2.0 3.0 62.00
3 3 BHK Soiewre 1521 3.0 1.0 95.00
4 2 BHK abc 1200 2.0 1.0 51.00
… … … … … … …
13314 3 BHK SoosePr 1715 3.0 3.0 112.00
13315 5 Bedroom ArsiaEx 3453 4.0 0.0 231.00
13316 4 BHK abc 3600 5.0 NaN 400.00
13317 2 BHK Mahla T 1141 2.0 1.0 60.00
13318 4 BHK SollyCl 4689 4.0 1.0 488.00

[12409 rows x 9 columns]
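With 530 duplicated rows, `keep='first'` and `keep='last'` both leave 13320 − 530 = 12790 rows and differ only in which copy survives. `keep=False` removes every row that has an identical twin anywhere, originals included, which here drops 13320 − 12409 = 911 rows. A toy check of the same logic:

```python
import pandas as pd

toy = pd.DataFrame({"location": ["A", "B", "A", "C", "A", "B"],
                    "price": [10, 20, 10, 30, 10, 20]})

n_dup = toy.duplicated().sum()             # later copies only: rows 2, 4 (A) and 5 (B)
first = toy.drop_duplicates(keep="first")  # keeps rows 0, 1, 3
last = toy.drop_duplicates(keep="last")    # keeps rows 3, 4, 5
no_dups = toy.drop_duplicates(keep=False)  # keeps only row 3, which is unique

print(n_dup, len(first), len(last), len(no_dups))
```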

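[28]: def select_middle(group):
    # Return the element at the middle position of the given column
    return group.iloc[len(group) // 2]

# DataFrame.apply calls select_middle once per column, so the result is the
# single row at position len // 2 of the de-duplicated frame
df_dr_dup4=df.drop_duplicates(keep='last').apply(select_middle)
df_dr_dup4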

[28]: area_type Super built-up Area


availability Ready To Move
location Sarjapur Road
size 2 BHK
society Adeatlm
total_sqft 1320
bath 2.0
balcony 2.0
price 115.0
dtype: object
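`drop_duplicates` only offers `keep='first'`, `'last'` or `False`. If the intent is to keep the middle occurrence within each group of identical rows, one possible sketch is below. `keep_middle_duplicate` is a hypothetical helper, not a pandas function; it identifies identical rows by hashing their values with `pd.util.hash_pandas_object` (which also treats NaN consistently), then keeps the copy at position `size // 2` in each group.

```python
import pandas as pd

def keep_middle_duplicate(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each group of identical rows, keep the middle copy."""
    # One uint64 hash per row, computed from the row values only (not the index).
    key = pd.util.hash_pandas_object(frame, index=False)
    position = key.groupby(key).cumcount()  # 0, 1, 2, ... within each group
    size = key.map(key.value_counts())      # group size for every row
    return frame[(position == size // 2).to_numpy()]

toy = pd.DataFrame({"location": ["A", "B", "A", "C", "A", "B"],
                    "price": [10, 20, 10, 30, 10, 20]})
middle = keep_middle_duplicate(toy)
print(middle)
```

For a group of two copies, `size // 2` selects the second one; adjust the rule if a different tie-break is wanted.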

[29]: df1=pd.read_csv("train - train.csv.csv")
df2=pd.read_csv("test - test.csv.csv")

[30]: df1

[30]: User_ID Product_ID Gender Age Occupation City_Category \


0 1000001 P00069042 F 0-17 10 A
1 1000001 P00248942 F 0-17 10 A
2 1000001 P00087842 F 0-17 10 A
3 1000001 P00085442 F 0-17 10 A
4 1000002 P00285442 M 55+ 16 C
… … … … … … …
550063 1006033 P00372445 M 51-55 13 B
550064 1006035 P00375436 F 26-35 1 C
550065 1006036 P00375436 F 26-35 15 B
550066 1006038 P00375436 F 55+ 1 C

550067 1006039 P00371644 F 46-50 0 B

Stay_In_Current_City_Years Marital_Status Product_Category_1 \


0 2 0 3
1 2 0 1
2 2 0 12
3 2 0 12
4 4+ 0 8
… … … …
550063 1 1 20
550064 3 0 20
550065 4+ 1 20
550066 2 0 20
550067 4+ 1 20

Product_Category_2 Product_Category_3 Purchase


0 NaN NaN 8370
1 6.0 14.0 15200
2 NaN NaN 1422
3 14.0 NaN 1057
4 NaN NaN 7969
… … … …
550063 NaN NaN 368
550064 NaN NaN 371
550065 NaN NaN 137
550066 NaN NaN 365
550067 NaN NaN 490

[550068 rows x 12 columns]

[31]: df2

[31]: User_ID Product_ID Gender Age Occupation City_Category \


0 1000004 P00128942 M 46-50 7 B
1 1000009 P00113442 M 26-35 17 C
2 1000010 P00288442 F 36-45 1 B
3 1000010 P00145342 F 36-45 1 B
4 1000011 P00053842 F 26-35 1 C
… … … … … … …
233594 1006036 P00118942 F 26-35 15 B
233595 1006036 P00254642 F 26-35 15 B
233596 1006036 P00031842 F 26-35 15 B
233597 1006037 P00124742 F 46-50 1 C
233598 1006039 P00316642 F 46-50 0 B

Stay_In_Current_City_Years Marital_Status Product_Category_1 \


0 2 1 1

1 0 0 3
2 4+ 1 5
3 4+ 1 4
4 1 0 4
… … … …
233594 4+ 1 8
233595 4+ 1 5
233596 4+ 1 1
233597 4+ 0 10
233598 4+ 1 4

Product_Category_2 Product_Category_3
0 11.0 NaN
1 5.0 NaN
2 14.0 NaN
3 9.0 NaN
4 5.0 12.0
… … …
233594 NaN NaN
233595 8.0 NaN
233596 5.0 12.0
233597 16.0 NaN
233598 5.0 NaN

[233599 rows x 11 columns]

[51]: df=pd.concat([df1,df2],axis=0)
df

[51]: User_ID Product_ID Gender Age Occupation City_Category \


0 1000001 P00069042 F 0-17 10 A
1 1000001 P00248942 F 0-17 10 A
2 1000001 P00087842 F 0-17 10 A
3 1000001 P00085442 F 0-17 10 A
4 1000002 P00285442 M 55+ 16 C
… … … … … … …
233594 1006036 P00118942 F 26-35 15 B
233595 1006036 P00254642 F 26-35 15 B
233596 1006036 P00031842 F 26-35 15 B
233597 1006037 P00124742 F 46-50 1 C
233598 1006039 P00316642 F 46-50 0 B

Stay_In_Current_City_Years Marital_Status Product_Category_1 \


0 2 0 3
1 2 0 1
2 2 0 12
3 2 0 12

4 4+ 0 8
… … … …
233594 4+ 1 8
233595 4+ 1 5
233596 4+ 1 1
233597 4+ 0 10
233598 4+ 1 4

Product_Category_2 Product_Category_3 Purchase


0 NaN NaN 8370.0
1 6.0 14.0 15200.0
2 NaN NaN 1422.0
3 14.0 NaN 1057.0
4 NaN NaN 7969.0
… … … …
233594 NaN NaN NaN
233595 8.0 NaN NaN
233596 5.0 12.0 NaN
233597 16.0 NaN NaN
233598 5.0 NaN NaN

[783667 rows x 12 columns]
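`pd.concat([df1, df2], axis=0)` stacks the 233599 test rows under the 550068 training rows (550068 + 233599 = 783667) but keeps each frame's own index, so labels 0 to 233598 now appear twice. `Purchase` is NaN for every test row because the test file has no such column. Passing `keys=` keeps the origin of each row recoverable; a toy sketch:

```python
import pandas as pd

train = pd.DataFrame({"User_ID": [1, 2], "Purchase": [8370, 15200]})
test = pd.DataFrame({"User_ID": [3]})

# Without keys the index would be 0, 1, 0: duplicated labels.
plain = pd.concat([train, test])

# With keys the index becomes (source, row) pairs, which are unique.
combined = pd.concat([train, test], keys=["train", "test"])
print(combined)

test_part = combined.loc["test"]  # recover the test rows later
```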

[52]: df['Gender']=df['Gender'].map({'F':0,'M':1})
df

[52]: User_ID Product_ID Gender Age Occupation City_Category \


0 1000001 P00069042 0 0-17 10 A
1 1000001 P00248942 0 0-17 10 A
2 1000001 P00087842 0 0-17 10 A
3 1000001 P00085442 0 0-17 10 A
4 1000002 P00285442 1 55+ 16 C
… … … … … … …
233594 1006036 P00118942 0 26-35 15 B
233595 1006036 P00254642 0 26-35 15 B
233596 1006036 P00031842 0 26-35 15 B
233597 1006037 P00124742 0 46-50 1 C
233598 1006039 P00316642 0 46-50 0 B

Stay_In_Current_City_Years Marital_Status Product_Category_1 \


0 2 0 3
1 2 0 1
2 2 0 12
3 2 0 12
4 4+ 0 8
… … … …
233594 4+ 1 8

233595 4+ 1 5
233596 4+ 1 1
233597 4+ 0 10
233598 4+ 1 4

Product_Category_2 Product_Category_3 Purchase


0 NaN NaN 8370.0
1 6.0 14.0 15200.0
2 NaN NaN 1422.0
3 14.0 NaN 1057.0
4 NaN NaN 7969.0
… … … …
233594 NaN NaN NaN
233595 8.0 NaN NaN
233596 5.0 12.0 NaN
233597 16.0 NaN NaN
233598 5.0 NaN NaN

[783667 rows x 12 columns]

[53]: df['Age'].unique()

[53]: array(['0-17', '55+', '26-35', '46-50', '51-55', '36-45', '18-25'],


dtype=object)

[54]: df['Age']=df['Age'].map({'0-17':1,'18-25':2,'26-35':3,'36-45':4,
                         '46-50':5,'51-55':6,'55+':7})
df

[54]: User_ID Product_ID Gender Age Occupation City_Category \


0 1000001 P00069042 0 1 10 A
1 1000001 P00248942 0 1 10 A
2 1000001 P00087842 0 1 10 A
3 1000001 P00085442 0 1 10 A
4 1000002 P00285442 1 7 16 C
… … … … … … …
233594 1006036 P00118942 0 3 15 B
233595 1006036 P00254642 0 3 15 B
233596 1006036 P00031842 0 3 15 B
233597 1006037 P00124742 0 5 1 C
233598 1006039 P00316642 0 5 0 B

Stay_In_Current_City_Years Marital_Status Product_Category_1 \


0 2 0 3
1 2 0 1
2 2 0 12
3 2 0 12

4 4+ 0 8
… … … …
233594 4+ 1 8
233595 4+ 1 5
233596 4+ 1 1
233597 4+ 0 10
233598 4+ 1 4

Product_Category_2 Product_Category_3 Purchase


0 NaN NaN 8370.0
1 6.0 14.0 15200.0
2 NaN NaN 1422.0
3 14.0 NaN 1057.0
4 NaN NaN 7969.0
… … … …
233594 NaN NaN NaN
233595 8.0 NaN NaN
233596 5.0 12.0 NaN
233597 16.0 NaN NaN
233598 5.0 NaN NaN

[783667 rows x 12 columns]

[55]: df['Age'].unique()

[55]: array([1, 7, 3, 5, 6, 4, 2], dtype=int64)
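`Series.map` with a dict returns NaN for any value that is not a key, so a typo in one of the age bands would silently create missing values. A quick check after mapping catches this; the sketch below uses a made-up `"60+"` value that is absent from the mapping.

```python
import pandas as pd

age_order = {"0-17": 1, "18-25": 2, "26-35": 3, "36-45": 4,
             "46-50": 5, "51-55": 6, "55+": 7}

age = pd.Series(["0-17", "55+", "26-35", "60+"])  # "60+" is not in the mapping
encoded = age.map(age_order)

# Values that map() could not translate show up as NaN.
unmapped = age[encoded.isna()].unique()
print(encoded.tolist())
print("unmapped:", list(unmapped))
```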

[56]: df['City_Category']=df['City_Category'].map({'A':1,'B':2,'C':3})
df

[56]: User_ID Product_ID Gender Age Occupation City_Category \


0 1000001 P00069042 0 1 10 1
1 1000001 P00248942 0 1 10 1
2 1000001 P00087842 0 1 10 1
3 1000001 P00085442 0 1 10 1
4 1000002 P00285442 1 7 16 3
… … … … … … …
233594 1006036 P00118942 0 3 15 2
233595 1006036 P00254642 0 3 15 2
233596 1006036 P00031842 0 3 15 2
233597 1006037 P00124742 0 5 1 3
233598 1006039 P00316642 0 5 0 2

Stay_In_Current_City_Years Marital_Status Product_Category_1 \


0 2 0 3
1 2 0 1
2 2 0 12

3 2 0 12
4 4+ 0 8
… … … …
233594 4+ 1 8
233595 4+ 1 5
233596 4+ 1 1
233597 4+ 0 10
233598 4+ 1 4

Product_Category_2 Product_Category_3 Purchase


0 NaN NaN 8370.0
1 6.0 14.0 15200.0
2 NaN NaN 1422.0
3 14.0 NaN 1057.0
4 NaN NaN 7969.0
… … … …
233594 NaN NaN NaN
233595 8.0 NaN NaN
233596 5.0 12.0 NaN
233597 16.0 NaN NaN
233598 5.0 NaN NaN

[783667 rows x 12 columns]

[57]: df['City_Category'].unique()

[57]: array([1, 3, 2], dtype=int64)

[58]: df['Stay_In_Current_City_Years'].unique()

[58]: array(['2', '4+', '3', '1', '0'], dtype=object)

[59]: df['Stay_In_Current_City_Years'].value_counts()

[59]: Stay_In_Current_City_Years
1 276425
2 145427
3 135428
4+ 120671
0 105716
Name: count, dtype: int64

[60]: # regex=False treats "+" literally (as a regular expression, a lone "+" is invalid)
df['Stay_In_Current_City_Years']=df['Stay_In_Current_City_Years'].str.replace("+","",regex=False)

[61]: df['Stay_In_Current_City_Years'].value_counts()

[61]: Stay_In_Current_City_Years
1 276425
2 145427
3 135428
4 120671
0 105716
Name: count, dtype: int64
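After the `+` is stripped the column still holds strings (`'4'`, not `4`), as the `dtype=object` in cell [58] suggests, so it needs an explicit cast before numeric use. Note that the cap survives the conversion: `4` now means "four or more years". A sketch on the five distinct values:

```python
import pandas as pd

stay = pd.Series(["2", "4+", "3", "1", "0"])

# Strip the literal "+" and convert the strings to integers.
stay_num = stay.str.replace("+", "", regex=False).astype(int)
print(stay_num.dtype, stay_num.tolist())
```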

[62]: df['Product_ID']=df['Product_ID'].str[1:]
df['Product_ID']

[62]: 0 00069042
1 00248942
2 00087842
3 00085442
4 00285442

233594 00118942
233595 00254642
233596 00031842
233597 00124742
233598 00316642
Name: Product_ID, Length: 783667, dtype: object

[63]: df['Product_ID'] = df['Product_ID'].astype('int64')
df['Product_ID']

[63]: 0 69042
1 248942
2 87842
3 85442
4 285442

233594 118942
233595 254642
233596 31842
233597 124742
233598 316642
Name: Product_ID, Length: 783667, dtype: int64
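Cells [62] and [63] drop the leading `P` by position and then cast to `int64`. `Series.str.removeprefix` (available from pandas 1.4) states the intent more directly, and `pd.to_numeric` raises an error if any ID is not purely numeric. The leading zeros disappear once the IDs are integers, which is harmless for a value used only as an identifier. A sketch on three IDs from the output above:

```python
import pandas as pd

product_id = pd.Series(["P00069042", "P00248942", "P00316642"])

# Remove the "P" prefix, then parse the remaining digits as integers.
as_int = pd.to_numeric(product_id.str.removeprefix("P"))
print(as_int.tolist())
```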

[65]: df_city=pd.get_dummies(df['City_Category'],prefix='City_Category',drop_first=True)
df_city

[65]: City_Category_2 City_Category_3


0 False False
1 False False

2 False False
3 False False
4 False True
… … …
233594 True False
233595 True False
233596 True False
233597 False True
233598 True False

[783667 rows x 2 columns]

[66]: df_city=pd.get_dummies(df['City_Category'],prefix='City_Category',drop_first=False)
df_city

[66]: City_Category_1 City_Category_2 City_Category_3


0 True False False
1 True False False
2 True False False
3 True False False
4 False False True
… … … …
233594 False True False
233595 False True False
233596 False True False
233597 False False True
233598 False True False

[783667 rows x 3 columns]
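With `drop_first=True`, `City_Category_1` is omitted because it is implied whenever the other two columns are both False; this avoids perfectly collinear columns in linear models. `get_dummies` returns a separate frame, so the dummies still have to be joined back to the data. A sketch on made-up rows that also requests 0/1 integers instead of booleans via `dtype=int`:

```python
import pandas as pd

df = pd.DataFrame({"City_Category": [1, 1, 3, 2],
                   "Purchase": [8370, 15200, 1422, 1057]})

dummies = pd.get_dummies(df["City_Category"], prefix="City_Category",
                         drop_first=True, dtype=int)

# Replace the original column with its dummy columns.
df = pd.concat([df.drop(columns="City_Category"), dummies], axis=1)
print(df)
```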

0.1 Part B
[67]: df = pd.read_csv("WDICountry.csv")
df

[67]: Country Code Short Name Table Name \


0 ABW Aruba Aruba
1 AFE Africa Eastern and Southern Africa Eastern and Southern
2 AFG Afghanistan Afghanistan
3 AFW Africa Western and Central Africa Western and Central
4 AGO Angola Angola
.. … … …
262 ZAF South Africa South Africa
263 ZMB Zambia Zambia
264 ZWE Zimbabwe Zimbabwe
265 AUS Australia Australia

266 VEN Venezuela Venezuela, RB

Long Name 2-alpha code \


0 Aruba AW
1 Africa Eastern and Southern ZH
2 Islamic State of Afghanistan AF
3 Africa Western and Central ZI
4 People's Republic of Angola AO
.. … …
262 Republic of South Africa ZA
263 Republic of Zambia ZM
264 Republic of Zimbabwe ZW
265 Commonwealth of Australia AU
266 República Bolivariana de Venezuela VE

Currency Unit Region \


0 Aruban florin Latin America & Caribbean
1 NaN NaN
2 Afghan afghani South Asia
3 NaN NaN
4 Angolan kwanza Sub-Saharan Africa
.. … …
262 South African rand Sub-Saharan Africa
263 New Zambian kwacha Sub-Saharan Africa
264 Zimbabwean Dollar Sub-Saharan Africa
265 Australian dollar East Asia & Pacific
266 Venezuelan bolivar fuerte Latin America & Caribbean

Income Group WB-2 code \


0 High income AW
1 NaN ZH
2 Low income AF
3 NaN ZI
4 Lower middle income AO
.. … …
262 Upper middle income ZA
263 Lower middle income ZM
264 Lower middle income ZW
265 High income AU
266 NaN VE

National accounts base year … \


0 2013 …
1 NaN …
2 2016 …
3 NaN …
4 2002 …

.. … …
262 2015 …
263 2010 …
264 2019 …
265 Original chained constant price data are resca… …
266 1997 …

System of trade Government Accounting concept \


0 General trade system NaN
1 NaN NaN
2 General trade system Consolidated central government
3 NaN NaN
4 General trade system Budgetary central government
.. … …
262 General trade system Consolidated central government
263 General trade system Budgetary central government
264 General trade system Budgetary central government
265 General trade system Consolidated central government
266 NaN NaN

IMF data dissemination standard \


0 Enhanced General Data Dissemination System (e-…
1 NaN
2 Enhanced General Data Dissemination System (e-…
3 NaN
4 Enhanced General Data Dissemination System (e-…
.. …
262 Special Data Dissemination Standard (SDDS)
263 Enhanced General Data Dissemination System (e-…
264 Enhanced General Data Dissemination System (e-…
265 Special Data Dissemination Standard (SDDS)
266 Enhanced General Data Dissemination System (e-…

Latest population census Latest household survey \


0 2020 (expected) NaN
1 NaN NaN
2 1979 Demographic and Health Survey, 2015
3 NaN NaN
4 2014 Demographic and Health Survey, 2015/16
.. … …
262 2011 Demographic and Health Survey, 2016
263 2020 (expected) Demographic and Health Survey, 2018
264 2012 Multiple Indicator Cluster Survey, 2019
265 2016 NaN
266 2011 Multiple Indicator Cluster Survey, 2000

Source of most recent Income and expenditure data \

0 NaN
1 NaN
2 Integrated household survey (IHS), 2016/17
3 NaN
4 Integrated household survey (IHS), 2008/09
.. …
262 Expenditure survey/budget survey (ES/BS), 2014/15
263 Integrated household survey (IHS), 2015
264 Integrated household survey (IHS), 2011/12
265 Expenditure survey/budget survey (ES/BS), 2010
266 Integrated household survey (IHS), 2015

Vital registration complete Latest agricultural census \


0 Yes NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
.. … …
262 NaN 2007
263 NaN NaN
264 NaN NaN
265 Yes 2015-2016
266 NaN 2008

Latest industrial data Latest trade data


0 NaN 2018.0
1 NaN NaN
2 NaN 2018.0
3 NaN NaN
4 NaN 2018.0
.. … …
262 2010.0 2018.0
263 1994.0 2018.0
264 NaN 2018.0
265 2013.0 2018.0
266 1998.0 2013.0

[267 rows x 29 columns]

[68]: df.isnull().sum()

[68]: Country Code 0


Short Name 0
Table Name 0
Long Name 0
2-alpha code 2

Currency Unit 48
Region 48
Income Group 50
WB-2 code 1
National accounts base year 56
National accounts reference year 192
SNA price valuation 58
Lending category 122
Other groups 208
System of National Accounts 57
Alternative conversion factor 267
PPP survey year 267
Balance of Payments Manual in use 70
External debt Reporting status 146
System of trade 94
Government Accounting concept 108
IMF data dissemination standard 77
Latest population census 51
Latest household survey 112
Source of most recent Income and expenditure data 97
Vital registration complete 146
Latest agricultural census 137
Latest industrial data 118
Latest trade data 74
dtype: int64

[69]: df.isnull().sum().sum()

[69]: 2606
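The counts above show that Alternative conversion factor and PPP survey year are missing in all 267 rows, so there is nothing to impute them from. Looking at the percentage of missing values per column makes this easier to spot than raw counts. The sketch below uses a small made-up frame (only the column names come from this notebook) and drops columns that are entirely empty with dropna(axis=1, how='all'):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the country-metadata table;
# column names mirror the notebook, values are made up.
df = pd.DataFrame({
    "Country Code": ["ABW", "AFE", "AFG", "AFW"],
    "Region": ["Latin America & Caribbean", np.nan, "South Asia", np.nan],
    "PPP survey year": [np.nan, np.nan, np.nan, np.nan],
})

# Share of missing values per column, in percent, largest first
missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# Columns that are 100% missing carry no information to impute from
all_null = missing_pct[missing_pct == 100].index.tolist()
df = df.dropna(axis=1, how="all")

print(all_null)             # ['PPP survey year']
print(df.columns.tolist())  # ['Country Code', 'Region']
```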

[70]: df.dtypes

[70]: Country Code object
Short Name object
Table Name object
Long Name object
2-alpha code object
Currency Unit object
Region object
Income Group object
WB-2 code object
National accounts base year object
National accounts reference year float64
SNA price valuation object
Lending category object
Other groups object
System of National Accounts object
Alternative conversion factor float64
PPP survey year float64
Balance of Payments Manual in use object
External debt Reporting status object
System of trade object
Government Accounting concept object
IMF data dissemination standard object
Latest population census object
Latest household survey object
Source of most recent Income and expenditure data object
Vital registration complete object
Latest agricultural census object
Latest industrial data float64
Latest trade data float64
dtype: object
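Most columns are object (text), and the float64 ones are year-like columns that contain NaN. National accounts base year is object even though it mostly holds years, presumably because some entries are free text (one row later reads "Original chained constant price data are resca…"). Grouping columns by dtype makes it easier to choose a fill strategy per group. A minimal sketch on hypothetical data; exclude="number" is used so that text columns are picked up whether pandas stores them as object or as a dedicated string dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same mix of dtypes as the notebook:
# text columns hold strings, year-like columns with gaps are float64.
df = pd.DataFrame({
    "Region": ["South Asia", np.nan, "Sub-Saharan Africa"],
    "Income Group": ["Low income", "High income", np.nan],
    "Latest trade data": [2018.0, np.nan, 2013.0],
})

# Split the columns by dtype so each group can get its own fill strategy
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(categorical_cols)  # ['Region', 'Income Group']
print(numeric_cols)      # ['Latest trade data']
```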

[71]: df["Currency Unit"]

[71]: 0 Aruban florin
1 NaN
2 Afghan afghani
3 NaN
4 Angolan kwanza
…
262 South African rand
263 New Zambian kwacha
264 Zimbabwean Dollar
265 Australian dollar
266 Venezuelan bolivar fuerte
Name: Currency Unit, Length: 267, dtype: object

[72]: # fill categorical columns with their most frequent value (mode)
df['Currency Unit'] = df['Currency Unit'].fillna(df['Currency Unit'].mode().iloc[0])

df['Region'] = df['Region'].fillna(df['Region'].mode().iloc[0])
df['Income Group'] = df['Income Group'].fillna(df['Income Group'].mode().iloc[0])

[73]: # time data: fill each gap with the value from the row above (forward fill)

df['National accounts base year'] = df['National accounts base year'].ffill()

df['National accounts reference year'] = df['National accounts reference year'].ffill()

[74]: # use '-' as a placeholder for missing codes
df['WB-2 code'] = df['WB-2 code'].fillna('-')
df['2-alpha code'] = df['2-alpha code'].fillna('-')

[75]: df.isnull().sum()

[75]: Country Code 0
Short Name 0
Table Name 0
Long Name 0
2-alpha code 0
Currency Unit 0
Region 0
Income Group 0
WB-2 code 0
National accounts base year 0
National accounts reference year 5
SNA price valuation 58
Lending category 122
Other groups 208
System of National Accounts 57
Alternative conversion factor 267
PPP survey year 267
Balance of Payments Manual in use 70
External debt Reporting status 146
System of trade 94
Government Accounting concept 108
IMF data dissemination standard 77
Latest population census 51
Latest household survey 112
Source of most recent Income and expenditure data 97
Vital registration complete 146
Latest agricultural census 137
Latest industrial data 118
Latest trade data 74
dtype: int64
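Forward fill only copies a value downward from an earlier row, so NaNs at the very top of a column have nothing to copy and stay missing. That is probably why National accounts reference year still has 5 missing values here (the notebook does not show which rows they are). The rows are economies, not points in time, so copying a neighbour's year is an assumption rather than an interpolation. A small sketch with a made-up series, chaining bfill() to reach the leading gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical column whose first entries are missing
s = pd.Series([np.nan, np.nan, 2010.0, np.nan, 2015.0])

filled = s.ffill()
print(filled.isnull().sum())  # 2: nothing above the leading NaNs to copy from

# One option: back-fill whatever forward fill could not reach
filled_both = s.ffill().bfill()
print(filled_both.tolist())   # [2010.0, 2010.0, 2010.0, 2010.0, 2015.0]
```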

[76]: df.head(2)

[76]: Country Code Short Name Table Name \
0 ABW Aruba Aruba
1 AFE Africa Eastern and Southern Africa Eastern and Southern

Long Name 2-alpha code Currency Unit \
0 Aruba AW Aruban florin
1 Africa Eastern and Southern ZH Euro

Region Income Group WB-2 code \
0 Latin America & Caribbean High income AW
1 Europe & Central Asia High income ZH

National accounts base year … System of trade \
0 2013 … General trade system
1 2013 … NaN

Government Accounting concept \
0 NaN
1 NaN

IMF data dissemination standard Latest population census \
0 Enhanced General Data Dissemination System (e-… 2020 (expected)
1 NaN NaN

Latest household survey Source of most recent Income and expenditure data \
0 NaN NaN
1 NaN NaN

Vital registration complete Latest agricultural census \
0 Yes NaN
1 NaN NaN

Latest industrial data Latest trade data
0 NaN 2018.0
1 NaN NaN

[2 rows x 29 columns]
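Row 1 (AFE, Africa Eastern and Southern) is a regional aggregate, not a country, yet mode imputation in [72] gave it Currency Unit Euro, Region Europe & Central Asia and Income Group High income. The 48 rows missing Region and Currency Unit may well be aggregates of this kind (AFW in row 3 was filled the same way). One option, assuming a blank Region marks an aggregate, is to set those rows aside before imputing. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: two countries and one aggregate (AFE) whose Region
# is blank, as in the notebook before imputation.
df = pd.DataFrame({
    "Country Code": ["ABW", "AFE", "AFG"],
    "Region": ["Latin America & Caribbean", np.nan, "South Asia"],
    "Income Group": ["High income", np.nan, "Low income"],
})

# Assumption: a missing Region marks an aggregate rather than a country
is_aggregate = df["Region"].isnull()
aggregates = df[is_aggregate]
countries = df[~is_aggregate].copy()

print(aggregates["Country Code"].tolist())  # ['AFE']
print(countries["Country Code"].tolist())   # ['ABW', 'AFG']
```

Imputation can then run on countries only, so that no aggregate inherits a currency or income group from the most common country value.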

[77]: # remaining categorical features: fill with the most frequent value (mode)

df['SNA price valuation'] = df['SNA price valuation'].fillna(df['SNA price valuation'].mode().iloc[0])

df['Lending category'] = df['Lending category'].fillna(df['Lending category'].mode().iloc[0])

df['Other groups'] = df['Other groups'].fillna(df['Other groups'].mode().iloc[0])

df['System of National Accounts'] = df['System of National Accounts'].fillna(df['System of National Accounts'].mode().iloc[0])

df['Balance of Payments Manual in use'] = df['Balance of Payments Manual in use'].fillna(df['Balance of Payments Manual in use'].mode().iloc[0])

df['External debt Reporting status'] = df['External debt Reporting status'].fillna(df['External debt Reporting status'].mode().iloc[0])

df['System of trade'] = df['System of trade'].fillna(df['System of trade'].mode().iloc[0])

df['Government Accounting concept'] = df['Government Accounting concept'].fillna(df['Government Accounting concept'].mode().iloc[0])

df['IMF data dissemination standard'] = df['IMF data dissemination standard'].fillna(df['IMF data dissemination standard'].mode().iloc[0])

[78]: df['Latest population census'] = df['Latest population census'].fillna(df['Latest population census'].mode().iloc[0])

df['Latest household survey'] = df['Latest household survey'].fillna(df['Latest household survey'].mode().iloc[0])

df['Source of most recent Income and expenditure data'] = df['Source of most recent Income and expenditure data'].fillna(df['Source of most recent Income and expenditure data'].mode().iloc[0])

df['Vital registration complete'] = df['Vital registration complete'].fillna(df['Vital registration complete'].mode().iloc[0])

df['Latest agricultural census'] = df['Latest agricultural census'].fillna(df['Latest agricultural census'].mode().iloc[0])
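Cells [77] and [78] repeat the same mode-fill statement once per column. A helper that loops over a list of column names does the same work with less repetition. The sketch also guards against a column that is entirely NaN: there mode() returns an empty Series and .iloc[0] would raise an IndexError (Alternative conversion factor and PPP survey year are such columns in this data). The function name fill_with_mode and the data are hypothetical:

```python
import numpy as np
import pandas as pd

def fill_with_mode(frame, columns):
    """Return a copy of frame with NaNs in each listed column replaced by its mode."""
    out = frame.copy()
    for col in columns:
        modes = out[col].mode()  # most frequent non-null value(s)
        if not modes.empty:      # skip columns that are entirely NaN
            out[col] = out[col].fillna(modes.iloc[0])
    return out

# Hypothetical data using two of the notebook's column names
df = pd.DataFrame({
    "System of trade": ["General trade system", np.nan,
                        "General trade system", "Special trade system"],
    "Vital registration complete": ["Yes", np.nan, np.nan, "Yes"],
})
df = fill_with_mode(df, ["System of trade", "Vital registration complete"])
print(df.isnull().sum().sum())  # 0
```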

[79]: df['Latest industrial data'].head()

[79]: 0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Name: Latest industrial data, dtype: float64

[80]: # fill each gap with the next non-missing value below it (back fill)
df['Latest industrial data'] = df['Latest industrial data'].bfill()

df['Latest trade data'] = df['Latest trade data'].bfill()

[81]: df['Latest industrial data'].head()

[81]: 0 2013.0
1 2013.0
2 2013.0
3 2013.0
4 2013.0
Name: Latest industrial data, dtype: float64
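Back fill copies the next non-missing value upward, so rows 0 to 4 now show 2013.0, a year reported by some later economy in the table rather than by ABW, AFE, AFG, AFW or AGO themselves. If the filled values are kept, it may help to record which entries were imputed. A minimal sketch with made-up rows and a hypothetical flag column:

```python
import numpy as np
import pandas as pd

# Hypothetical values: rows are different economies, not time steps
df = pd.DataFrame({
    "Country Code": ["ABW", "AFE", "AFG", "AGO"],
    "Latest industrial data": [np.nan, np.nan, np.nan, 2013.0],
})

# Record which values were missing before filling, so imputed years
# can still be told apart from reported ones later
df["Latest industrial data_missing"] = df["Latest industrial data"].isnull()
df["Latest industrial data"] = df["Latest industrial data"].bfill()

print(df["Latest industrial data"].tolist())      # [2013.0, 2013.0, 2013.0, 2013.0]
print(df["Latest industrial data_missing"].sum())  # 3
```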

[82]: df

[82]: Country Code Short Name Table Name \
0 ABW Aruba Aruba
1 AFE Africa Eastern and Southern Africa Eastern and Southern
2 AFG Afghanistan Afghanistan
3 AFW Africa Western and Central Africa Western and Central
4 AGO Angola Angola
.. … … …
262 ZAF South Africa South Africa
263 ZMB Zambia Zambia
264 ZWE Zimbabwe Zimbabwe
265 AUS Australia Australia
266 VEN Venezuela Venezuela, RB

Long Name 2-alpha code \
0 Aruba AW
1 Africa Eastern and Southern ZH
2 Islamic State of Afghanistan AF
3 Africa Western and Central ZI
4 People's Republic of Angola AO
.. … …
262 Republic of South Africa ZA
263 Republic of Zambia ZM
264 Republic of Zimbabwe ZW
265 Commonwealth of Australia AU
266 República Bolivariana de Venezuela VE

Currency Unit Region \
0 Aruban florin Latin America & Caribbean
1 Euro Europe & Central Asia
2 Afghan afghani South Asia
3 Euro Europe & Central Asia
4 Angolan kwanza Sub-Saharan Africa
.. … …
262 South African rand Sub-Saharan Africa
263 New Zambian kwacha Sub-Saharan Africa
264 Zimbabwean Dollar Sub-Saharan Africa
265 Australian dollar East Asia & Pacific
266 Venezuelan bolivar fuerte Latin America & Caribbean

Income Group WB-2 code \
0 High income AW
1 High income ZH
2 Low income AF
3 High income ZI
4 Lower middle income AO
.. … …
262 Upper middle income ZA
263 Lower middle income ZM
264 Lower middle income ZW
265 High income AU
266 High income VE

National accounts base year … \
0 2013 …
1 2013 …
2 2016 …
3 2016 …
4 2002 …
.. … …
262 2015 …
263 2010 …
264 2019 …
265 Original chained constant price data are resca… …
266 1997 …

System of trade Government Accounting concept \
0 General trade system Consolidated central government
1 General trade system Consolidated central government
2 General trade system Consolidated central government
3 General trade system Consolidated central government
4 General trade system Budgetary central government
.. … …
262 General trade system Consolidated central government
263 General trade system Budgetary central government
264 General trade system Budgetary central government
265 General trade system Consolidated central government
266 General trade system Consolidated central government

IMF data dissemination standard \
0 Enhanced General Data Dissemination System (e-…
1 Enhanced General Data Dissemination System (e-…
2 Enhanced General Data Dissemination System (e-…
3 Enhanced General Data Dissemination System (e-…
4 Enhanced General Data Dissemination System (e-…
.. …
262 Special Data Dissemination Standard (SDDS)
263 Enhanced General Data Dissemination System (e-…
264 Enhanced General Data Dissemination System (e-…
265 Special Data Dissemination Standard (SDDS)
266 Enhanced General Data Dissemination System (e-…

Latest population census Latest household survey \
0 2020 (expected) Multiple Indicator Cluster Survey, 2019
1 2020 (expected) Multiple Indicator Cluster Survey, 2019
2 1979 Demographic and Health Survey, 2015
3 2020 (expected) Multiple Indicator Cluster Survey, 2019
4 2014 Demographic and Health Survey, 2015/16
.. … …
262 2011 Demographic and Health Survey, 2016
263 2020 (expected) Demographic and Health Survey, 2018
264 2012 Multiple Indicator Cluster Survey, 2019
265 2016 Multiple Indicator Cluster Survey, 2019
266 2011 Multiple Indicator Cluster Survey, 2000

Source of most recent Income and expenditure data \
0 Income survey (IS), 2015
1 Income survey (IS), 2015
2 Integrated household survey (IHS), 2016/17
3 Income survey (IS), 2015
4 Integrated household survey (IHS), 2008/09
.. …
262 Expenditure survey/budget survey (ES/BS), 2014/15
263 Integrated household survey (IHS), 2015
264 Integrated household survey (IHS), 2011/12
265 Expenditure survey/budget survey (ES/BS), 2010
266 Integrated household survey (IHS), 2015

Vital registration complete Latest agricultural census \
0 Yes 2010
1 Yes 2010
2 Yes 2010
3 Yes 2010
4 Yes 2010
.. … …
262 Yes 2007
263 Yes 2010
264 Yes 2010
265 Yes 2015-2016
266 Yes 2008

Latest industrial data Latest trade data
0 2013.0 2018.0
1 2013.0 2018.0
2 2013.0 2018.0
3 2013.0 2018.0
4 2013.0 2018.0
.. … …
262 2010.0 2018.0
263 1994.0 2018.0
264 2013.0 2018.0
265 2013.0 2018.0
266 1998.0 2013.0

[267 rows x 29 columns]
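Judging from the counts in [75], none of the later cells touches Alternative conversion factor or PPP survey year (267 missing each), and National accounts reference year still had 5 gaps, so this frame very likely still contains NaN values. Re-running the missing-value check shows which columns remain. A sketch on a hypothetical stand-in frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned frame; the all-NaN column mirrors
# 'PPP survey year', which none of the fill steps touch.
df = pd.DataFrame({
    "Country Code": ["ABW", "AFG"],
    "Region": ["Latin America & Caribbean", "South Asia"],
    "PPP survey year": [np.nan, np.nan],
})

# Keep only the columns that still contain at least one NaN
remaining = df.isnull().sum()
remaining = remaining[remaining > 0]
print(remaining.index.tolist())  # ['PPP survey year']
```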

[83]: df = df.drop_duplicates()
df

[83]: Country Code Short Name Table Name \
0 ABW Aruba Aruba
1 AFE Africa Eastern and Southern Africa Eastern and Southern
2 AFG Afghanistan Afghanistan
3 AFW Africa Western and Central Africa Western and Central
4 AGO Angola Angola
.. … … …
261 YEM Yemen Yemen, Rep.
262 ZAF South Africa South Africa
263 ZMB Zambia Zambia
264 ZWE Zimbabwe Zimbabwe
266 VEN Venezuela Venezuela, RB

Long Name 2-alpha code \
0 Aruba AW
1 Africa Eastern and Southern ZH
2 Islamic State of Afghanistan AF
3 Africa Western and Central ZI
4 People's Republic of Angola AO
.. … …
261 Republic of Yemen YE
262 Republic of South Africa ZA
263 Republic of Zambia ZM
264 Republic of Zimbabwe ZW
266 República Bolivariana de Venezuela VE

Currency Unit Region \
0 Aruban florin Latin America & Caribbean
1 Euro Europe & Central Asia
2 Afghan afghani South Asia
3 Euro Europe & Central Asia
4 Angolan kwanza Sub-Saharan Africa
.. … …
261 Yemeni rial Middle East & North Africa
262 South African rand Sub-Saharan Africa
263 New Zambian kwacha Sub-Saharan Africa
264 Zimbabwean Dollar Sub-Saharan Africa
266 Venezuelan bolivar fuerte Latin America & Caribbean

Income Group WB-2 code National accounts base year … \
0 High income AW 2013 …
1 High income ZH 2013 …
2 Low income AF 2016 …
3 High income ZI 2016 …
4 Lower middle income AO 2002 …
.. … … … …
261 Low income RY 1990 …
262 Upper middle income ZA 2015 …
263 Lower middle income ZM 2010 …
264 Lower middle income ZW 2019 …
266 High income VE 1997 …

System of trade Government Accounting concept \
0 General trade system Consolidated central government
1 General trade system Consolidated central government
2 General trade system Consolidated central government
3 General trade system Consolidated central government
4 General trade system Budgetary central government
.. … …
261 Special trade system Consolidated central government
262 General trade system Consolidated central government
263 General trade system Budgetary central government
264 General trade system Budgetary central government
266 General trade system Consolidated central government

IMF data dissemination standard \
0 Enhanced General Data Dissemination System (e-…
1 Enhanced General Data Dissemination System (e-…
2 Enhanced General Data Dissemination System (e-…
3 Enhanced General Data Dissemination System (e-…
4 Enhanced General Data Dissemination System (e-…
.. …
261 Enhanced General Data Dissemination System (e-…
262 Special Data Dissemination Standard (SDDS)
263 Enhanced General Data Dissemination System (e-…
264 Enhanced General Data Dissemination System (e-…
266 Enhanced General Data Dissemination System (e-…

Latest population census Latest household survey \
0 2020 (expected) Multiple Indicator Cluster Survey, 2019
1 2020 (expected) Multiple Indicator Cluster Survey, 2019
2 1979 Demographic and Health Survey, 2015
3 2020 (expected) Multiple Indicator Cluster Survey, 2019
4 2014 Demographic and Health Survey, 2015/16
.. … …
261 2004 Demographic and Health Survey, 2013
262 2011 Demographic and Health Survey, 2016
263 2020 (expected) Demographic and Health Survey, 2018
264 2012 Multiple Indicator Cluster Survey, 2019
266 2011 Multiple Indicator Cluster Survey, 2000

Source of most recent Income and expenditure data \
0 Income survey (IS), 2015
1 Income survey (IS), 2015
2 Integrated household survey (IHS), 2016/17
3 Income survey (IS), 2015
4 Integrated household survey (IHS), 2008/09
.. …
261 Expenditure survey/budget survey (ES/BS), 2014
262 Expenditure survey/budget survey (ES/BS), 2014/15
263 Integrated household survey (IHS), 2015
264 Integrated household survey (IHS), 2011/12
266 Integrated household survey (IHS), 2015

Vital registration complete Latest agricultural census \
0 Yes 2010
1 Yes 2010
2 Yes 2010
3 Yes 2010
4 Yes 2010
.. … …
261 Yes 2010
262 Yes 2007
263 Yes 2010
264 Yes 2010
266 Yes 2008

Latest industrial data Latest trade data
0 2013.0 2018.0
1 2013.0 2018.0
2 2013.0 2018.0
3 2013.0 2018.0
4 2013.0 2018.0
.. … …
261 2012.0 2015.0
262 2010.0 2018.0
263 1994.0 2018.0
264 2013.0 2018.0
266 1998.0 2013.0

[266 rows x 29 columns]
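The index now jumps from 264 to 266 and the row count drops from 267 to 266, so drop_duplicates removed exactly one row: row 265 (AUS), which presumably repeated an earlier Australia row in every column (drop_duplicates keeps the first occurrence by default). Before dropping, it can help to see which rows are involved and to check duplicates on the key column as well, since two rows for the same country that differ in any other column survive drop_duplicates(). A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: row 2 repeats row 0 exactly, while row 3 repeats
# row 1's Country Code but differs in Short Name.
df = pd.DataFrame({
    "Country Code": ["AUS", "AUT", "AUS", "AUT"],
    "Short Name": ["Australia", "Austria", "Australia", "Austria, Rep."],
})

n_exact = df.duplicated().sum()                         # exact repeats of earlier rows
dup_rows = df[df.duplicated(keep=False)].index.tolist()  # every copy involved
n_key = df.duplicated(subset=["Country Code"]).sum()    # repeats of the key alone

print(n_exact)   # 1
print(dup_rows)  # [0, 2]
print(n_key)     # 2

df = df.drop_duplicates()  # keeps the first occurrence by default
print(len(df))   # 3
```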

[ ]:
