Data Cleaning and Filling Missing Values
[4]: df_2.shape
[4]: (19, 9)
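[5]: df_2.info  # no parentheses: this shows the bound method's repr (the full frame below); df_2.info() would print the column summary instead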
0 1 0 863 1.1 66.30 697.23 MS 55 2000
1 2 10 4644 2.4 233.03 119.66 CA NaN 2000
2 3 15 16330 4.2 325.08 175.46 WI f 2000
3 4 0 13 1.0 66.64 0.00 OR fg 2000
4 5 13 22537 4.5 328.66 175.46 AZ sd 2000
5 6 21 40931 3.1 205.28 175.46 FL gh 2000
6 7 11 34762 0.7 49.17 145.20 LA sd 2000
7 8 5 11051 2.9 208.80 270.04 GA sd 2000
8 9 8 7003 3.4 212.06 119.66 WA sd 2000
9 10 1 11 0.7 44.43 0.00 PA sd 2000
10 11 17 24879 3.5 260.29 119.66 TX sd 2000
11 12 3 5339 3.2 236.93 440.13 LA sd 2000
12 13 14 29782 10.0 695.10 228.12 FL sd 2000
13 14 19 56111 2.0 116.00 183.31 OH sd 2000
14 15 13 21946 3.8 312.36 175.46 MA sd 2000
15 16 8 3101 3.1 220.61 119.66 VA sd 2000
16 17 15 41965 0.9 66.25 119.66 OH sd 2000
17 18 3 15365 2.0 158.94 175.46 CO sd 2000
18 19 12 44865 4.9 319.51 119.66 FL sd 2000>
[10]: df_2.describe()
year
count 19.0
mean 2000.0
std 0.0
min 2000.0
25% 2000.0
50% 2000.0
75% 2000.0
max 2000.0
[11]: df_2.head(2)
1 Clean Some Columns from the Data Frame
[13]: df_2
[15]: df_2
10 11 17 24879 260.29 119.66 TX
11 12 3 5339 236.93 440.13 LA
12 13 14 29782 695.10 228.12 FL
13 14 19 56111 116.00 183.31 OH
14 15 13 21946 312.36 175.46 MA
15 16 8 3101 220.61 119.66 VA
16 17 15 41965 66.25 119.66 OH
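The frame printed above no longer has the lh and year columns, nor the text column that sat between country and year in the df_2.info output. The cell that removed them is not shown. A minimal sketch of one way to do this with DataFrame.drop follows. It assumes the column names lh and year (taken from the later df_21 and describe() outputs) and uses a hypothetical name, note, for the text column whose real header is not visible.

```python
import pandas as pd

# Two rows copied from the df_2 output; "note" is a hypothetical name
# for the text column whose header does not appear in the source.
df = pd.DataFrame({
    "VCL": [2, 3],
    "fm": [10, 15],
    "MLG": [4644, 16330],
    "lh": [2.4, 4.2],
    "lc": [233.03, 325.08],
    "mc": [119.66, 175.46],
    "country": ["CA", "WI"],
    "note": [None, "f"],
    "year": [2000, 2000],
})

# drop returns a new frame; df itself is left unchanged.
cleaned = df.drop(columns=["lh", "note", "year"])
print(cleaned.columns.tolist())
```

Assigning the result to a new name keeps the original frame available in case a dropped column is needed again.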
[17]: df_2
1 2 10 4644 233.03 119.66 CA 5008.69
2 3 15 16330 325.08 175.46 WI 16848.54
3 4 0 13 66.64 0.00 OR 83.64
4 5 13 22537 328.66 175.46 AZ 23059.12
5 6 21 40931 205.28 175.46 FL 41338.74
6 7 11 34762 49.17 145.20 LA 34974.37
7 8 5 11051 208.80 270.04 GA 11542.84
8 9 8 7003 212.06 119.66 WA 7351.72
9 10 1 11 44.43 0.00 PA 66.43
10 11 17 24879 260.29 119.66 TX 25286.95
11 12 3 5339 236.93 440.13 LA 6031.06
12 13 14 29782 695.10 228.12 FL 30732.22
13 14 19 56111 116.00 183.31 OH 56443.31
14 15 13 21946 312.36 175.46 MA 22461.82
15 16 8 3101 220.61 119.66 VA 3465.27
16 17 15 41965 66.25 119.66 OH 42182.91
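The last column in the output above matches the row-wise sum of the numeric columns. For row 1, 2 + 10 + 4644 + 233.03 + 119.66 = 5008.69. The cell that created it is not shown, so the following is only a sketch of one way to get the same numbers, assuming the column names VCL, fm, MLG, lc, mc, country and Total that appear in the later isnull output.

```python
import pandas as pd

# Rows 1 and 3 from the output above.
df = pd.DataFrame({
    "VCL": [2, 4],
    "fm": [10, 0],
    "MLG": [4644, 13],
    "lc": [233.03, 66.64],
    "mc": [119.66, 0.00],
    "country": ["CA", "OR"],
})

# Sum across each row (axis=1), skipping the non-numeric country column.
# Rounding to 2 decimals hides floating-point noise in the sum.
df["Total"] = df.sum(axis=1, numeric_only=True).round(2)
print(df)
```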
[32]: VCL 0
fm 0
MLG 0
lc 0
mc 0
country 0
Total 0
dtype: int64
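The per-column counts above are the kind of result that isnull().sum() produces; the input of that cell is not shown. A small sketch on a toy frame, with missing values placed by hand purely for illustration:

```python
import numpy as np
import pandas as pd

# Toy frame; the NaN positions are made up for this example.
df = pd.DataFrame({
    "MLG": [863, np.nan, 16330],
    "mc": [697.23, 119.66, np.nan],
    "country": ["MS", "CA", "WI"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing = df.isnull().sum()
print(missing)
```

A result of all zeros, as in the output above, means no column has missing entries.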
[34]: 0 863
1 4644
2 16330
3 13
4 22537
5 40931
6 34762
7 11051
8 7003
9 11
10 24879
11 5339
12 29782
13 56111
14 21946
15 3101
16 41965
Name: MLG, dtype: int64
15 VA 16 3101 8 220.61 119.66 3465.27
16 OH 17 41965 15 66.25 119.66 42182.91
[40]: df_3
[49]: df_3
OH 14 56111 19 116.00 183.31 56443.31
MA 15 21946 13 312.36 175.46 22461.82
VA 16 3101 8 220.61 119.66 3465.27
OH 17 41965 15 66.25 119.66 42182.91
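In the output above the country codes have replaced the integer row labels. The cell that did this is not shown. One way to get this layout, offered as an assumption rather than the notebook's actual code, is set_index:

```python
import pandas as pd

# The last two rows of the frame, with country still an ordinary column.
df = pd.DataFrame({
    "country": ["VA", "OH"],
    "VCL": [16, 17],
    "MLG": [3101, 41965],
    "fm": [8, 15],
    "lc": [220.61, 66.25],
    "mc": [119.66, 119.66],
    "Total": [3465.27, 42182.91],
})

# Move the country column into the row index; returns a new frame.
indexed = df.set_index("country")
print(indexed)
```

Index labels do not have to be unique, which is why a state such as OH or LA can appear more than once.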
[57]: df_3.loc['LA']
[59]: lc mc Total
LA 49.17 145.20 34974.37
LA 236.93 440.13 6031.06
[60]: df_3.loc['WA','Total']
[60]: 7351.72
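The two .loc calls above behave differently because LA appears twice in the index and WA only once. A short sketch using the lc, mc and Total values printed above (the other columns are left out for brevity):

```python
import pandas as pd

# Values copied from the LA and WA rows of the output.
df = pd.DataFrame(
    {
        "lc": [49.17, 212.06, 236.93],
        "mc": [145.20, 119.66, 440.13],
        "Total": [34974.37, 7351.72, 6031.06],
    },
    index=["LA", "WA", "LA"],
)

# A repeated label returns every matching row as a DataFrame.
la_rows = df.loc["LA"]

# A unique row label plus a column label returns a single value.
wa_total = df.loc["WA", "Total"]

print(la_rows)
print(wa_total)
```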
[61]: df_3.plot(kind='line')
[61]: <AxesSubplot:>
[62]: df_3.plot(kind='box')
[62]: <AxesSubplot:>
[64]: df_3.plot(kind='bar', figsize=(15,7))
[64]: <AxesSubplot:>
[66]: df_3.plot(kind='hist')
[66]: <AxesSubplot:ylabel='Frequency'>
[67]: df_3.plot(kind='scatter', x='VCL', y='Total')
MA 15 21946 13 312.36 175.46 22461.82
VA 16 3101 8 220.61 119.66 3465.27
OH 17 41965 15 66.25 119.66 42182.91
[79]: df_31['Total'].plot(kind='pie',
                          figsize=(15, 6),
                          autopct='%1.0f%%',
                          startangle=150,
                          shadow=True,
                          labels=None,
                          pctdistance=1.12)
      plt.legend(labels=df_31.index, loc='upper left')
[82]: # 2nd method
      df_31['Total'].plot(kind='pie',
                          figsize=(15, 6),
                          autopct='%1.0f%%',
                          startangle=90,
                          shadow=True)
[82]: <AxesSubplot:ylabel='Total'>
[20]: df_21 = pd.read_excel('C:/Users/Nazakat ali/Desktop/Stat711/New folder/Book2.xlsx')
df_21
[53]: df_21.iloc[5]
Mileage 12150.375
lh 3.100
lc 205.280
mc 175.460
Name: 5, dtype: float64
[27]: Vehicle 0
fm 0
Mileage 1
lh 0
lc 0
mc 2
dtype: int64
[31]: df_21.isnull().sum()
[31]: Vehicle 0
fm 0
Mileage 0
lh 0
lc 0
mc 0
dtype: int64
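Between the two counts above, the missing entries in Mileage and mc go from 1 and 2 to 0, but the cell that filled them is not shown. A minimal sketch of one common approach, filling each numeric column with its own mean, is given below. The fill strategy and the toy values are assumptions, not the notebook's actual method or data.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap in Mileage and two in mc (values are made up).
df = pd.DataFrame({
    "Mileage": [863.0, np.nan, 16330.0],
    "mc": [np.nan, 119.66, np.nan],
})

before = df.isnull().sum()

# fillna with a Series fills each column using the value under its name,
# so every column gets its own mean.
filled = df.fillna(df.mean(numeric_only=True))

after = filled.isnull().sum()
print(before)
print(after)
```

The mean is sensitive to outliers; df.median(numeric_only=True) can be passed to fillna in the same way if that is a concern.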