Data Cleaning and Fill Missing Values

The document walks through cleaning and preparing a small vehicle dataset with pandas. It loads the data from an Excel file into a DataFrame, drops unneeded columns and rows, renames columns for clarity, and adds a row-total column. It then checks for missing values, selects and plots subsets of the data, and fills the missing values in a second workbook with column means.

Uploaded by

Nazakat ali


Data cleaning and fill missing values

January 22, 2021

[1]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.style.use('ggplot')

[3]: df_2=pd.read_excel('C:/Users/Nazakat ali/Desktop/Stat711/New folder/Book1.xlsx')

df_2

[3]: Vehicle fm Mileage lh lc mc State ggg year

0 1 0 863 1.1 66.30 697.23 MS 55 2000
1 2 10 4644 2.4 233.03 119.66 CA NaN 2000
2 3 15 16330 4.2 325.08 175.46 WI f 2000
3 4 0 13 1.0 66.64 0.00 OR fg 2000
4 5 13 22537 4.5 328.66 175.46 AZ sd 2000
5 6 21 40931 3.1 205.28 175.46 FL gh 2000
6 7 11 34762 0.7 49.17 145.20 LA sd 2000
7 8 5 11051 2.9 208.80 270.04 GA sd 2000
8 9 8 7003 3.4 212.06 119.66 WA sd 2000
9 10 1 11 0.7 44.43 0.00 PA sd 2000
10 11 17 24879 3.5 260.29 119.66 TX sd 2000
11 12 3 5339 3.2 236.93 440.13 LA sd 2000
12 13 14 29782 10.0 695.10 228.12 FL sd 2000
13 14 19 56111 2.0 116.00 183.31 OH sd 2000
14 15 13 21946 3.8 312.36 175.46 MA sd 2000
15 16 8 3101 3.1 220.61 119.66 VA sd 2000
16 17 15 41965 0.9 66.25 119.66 OH sd 2000
17 18 3 15365 2.0 158.94 175.46 CO sd 2000
18 19 12 44865 4.9 319.51 119.66 FL sd 2000

[4]: df_2.shape

[4]: (19, 9)
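
The workbook path above is specific to the author's machine. For readers without Book1.xlsx, here is a minimal sketch that rebuilds the first five rows of the table above directly in code. The name `df_demo` is hypothetical and the remaining rows are left out for brevity.

```python
import numpy as np
import pandas as pd

# First five rows of the table shown above, typed in by hand.
df_demo = pd.DataFrame({
    "Vehicle": [1, 2, 3, 4, 5],
    "fm": [0, 10, 15, 0, 13],
    "Mileage": [863, 4644, 16330, 13, 22537],
    "lh": [1.1, 2.4, 4.2, 1.0, 4.5],
    "lc": [66.30, 233.03, 325.08, 66.64, 328.66],
    "mc": [697.23, 119.66, 175.46, 0.00, 175.46],
    "State": ["MS", "CA", "WI", "OR", "AZ"],
    "ggg": [55, np.nan, "f", "fg", "sd"],
    "year": [2000] * 5,
})

print(df_demo.shape)  # (5, 9)
```

Building the frame inline keeps the later cleaning steps reproducible on any machine.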

[5]: df_2.info  # without parentheses this returns the bound method; use df_2.info() for the column summary

[5]: <bound method DataFrame.info of Vehicle fm Mileage lh lc mc State ggg year
0 1 0 863 1.1 66.30 697.23 MS 55 2000
1 2 10 4644 2.4 233.03 119.66 CA NaN 2000
2 3 15 16330 4.2 325.08 175.46 WI f 2000
3 4 0 13 1.0 66.64 0.00 OR fg 2000
4 5 13 22537 4.5 328.66 175.46 AZ sd 2000
5 6 21 40931 3.1 205.28 175.46 FL gh 2000
6 7 11 34762 0.7 49.17 145.20 LA sd 2000
7 8 5 11051 2.9 208.80 270.04 GA sd 2000
8 9 8 7003 3.4 212.06 119.66 WA sd 2000
9 10 1 11 0.7 44.43 0.00 PA sd 2000
10 11 17 24879 3.5 260.29 119.66 TX sd 2000
11 12 3 5339 3.2 236.93 440.13 LA sd 2000
12 13 14 29782 10.0 695.10 228.12 FL sd 2000
13 14 19 56111 2.0 116.00 183.31 OH sd 2000
14 15 13 21946 3.8 312.36 175.46 MA sd 2000
15 16 8 3101 3.1 220.61 119.66 VA sd 2000
16 17 15 41965 0.9 66.25 119.66 OH sd 2000
17 18 3 15365 2.0 158.94 175.46 CO sd 2000
18 19 12 44865 4.9 319.51 119.66 FL sd 2000>
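
`df_2.info` without parentheses only returns the bound method, which is why the cell above echoes the whole table. A minimal sketch of actually calling the method, on a small hypothetical frame named `demo`, might look like this:

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Vehicle": [1, 2, 3],
    "State": ["MS", "CA", "WI"],
    "ggg": [55, np.nan, "f"],
})

# info() prints to stdout by default; buf= redirects the text so it
# can be inspected or saved.
buffer = io.StringIO()
demo.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

The summary lists each column with its non-null count and dtype, which is a quick way to spot columns such as `ggg` that contain missing values.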

[10]: df_2.describe()

[10]: Vehicle fm Mileage lh lc mc \

count 19.000000 19.000000 19.000000 19.000000 19.000000 19.000000
mean 10.000000 9.894737 20078.842105 3.021053 217.128421 187.331053
std 5.627314 6.462668 17281.824384 2.139205 151.779089 155.020900
min 1.000000 0.000000 11.000000 0.700000 44.430000 0.000000
25% 5.500000 4.000000 4991.500000 1.550000 91.320000 119.660000
50% 10.000000 11.000000 16330.000000 3.100000 212.060000 175.460000
75% 14.500000 14.500000 32272.000000 3.650000 286.325000 179.385000
max 19.000000 21.000000 56111.000000 10.000000 695.100000 697.230000

year
count 19.0
mean 2000.0
std 0.0
min 2000.0
25% 2000.0
50% 2000.0
75% 2000.0
max 2000.0

[11]: df_2.head(2)

[11]: Vehicle fm Mileage lh lc mc State ggg year

0 1 0 863 1.1 66.30 697.23 MS 55 2000
1 2 10 4644 2.4 233.03 119.66 CA NaN 2000

1 Clean some columns from the data frame

2 Please remember every time: axis=0 means delete a row

3 and axis=1 means delete a column

[12]: df_2.drop(['ggg', 'year', 'lh'], axis=1, inplace=True)

[13]: df_2

[13]: Vehicle fm Mileage lc mc State

0 1 0 863 66.30 697.23 MS
1 2 10 4644 233.03 119.66 CA
2 3 15 16330 325.08 175.46 WI
3 4 0 13 66.64 0.00 OR
4 5 13 22537 328.66 175.46 AZ
5 6 21 40931 205.28 175.46 FL
6 7 11 34762 49.17 145.20 LA
7 8 5 11051 208.80 270.04 GA
8 9 8 7003 212.06 119.66 WA
9 10 1 11 44.43 0.00 PA
10 11 17 24879 260.29 119.66 TX
11 12 3 5339 236.93 440.13 LA
12 13 14 29782 695.10 228.12 FL
13 14 19 56111 116.00 183.31 OH
14 15 13 21946 312.36 175.46 MA
15 16 8 3101 220.61 119.66 VA
16 17 15 41965 66.25 119.66 OH
17 18 3 15365 158.94 175.46 CO
18 19 12 44865 319.51 119.66 FL

[14]: # Delete rows 18 and 17 using axis=0

df_2.drop([18,17], axis=0, inplace=True)

[15]: df_2

[15]: Vehicle fm Mileage lc mc State

10 11 17 24879 260.29 119.66 TX
11 12 3 5339 236.93 440.13 LA
12 13 14 29782 695.10 228.12 FL
13 14 19 56111 116.00 183.31 OH
14 15 13 21946 312.36 175.46 MA
15 16 8 3101 220.61 119.66 VA
16 17 15 41965 66.25 119.66 OH
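
As an alternative to remembering the axis numbers, `DataFrame.drop` also accepts `columns=` and `index=` keywords. The sketch below uses a small hypothetical frame named `demo` and returns new frames rather than using `inplace=True`, so the original stays available.

```python
import pandas as pd

demo = pd.DataFrame({
    "Vehicle": [1, 2, 3, 4],
    "lh": [1.1, 2.4, 4.2, 1.0],
    "State": ["MS", "CA", "WI", "OR"],
})

# Same effect as demo.drop(['lh'], axis=1): remove a column.
no_lh = demo.drop(columns=["lh"])

# Same effect as no_lh.drop([3, 2], axis=0): remove rows by index label.
fewer_rows = no_lh.drop(index=[3, 2])

print(fewer_rows)
```

Because nothing is modified in place, a mistaken drop can be undone simply by going back to `demo`.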

4 Rename columns of Data frame

[16]: df_2.rename(columns={'Vehicle':'VCL', 'Mileage':'MLG', 'State':'country'}, inplace=True)

[17]: df_2

[17]: VCL fm MLG lc mc country

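5 Add columns and rows

6 To add a column from row-wise values, use axis=1

7 To add a row from column-wise values, use axis=0

[18]: # Add a column holding the sum of the numeric columns in each row
df_2['Total']=df_2.sum(axis=1, numeric_only=True)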
df_2

[18]: VCL fm MLG lc mc country Total

0 1 0 863 66.30 697.23 MS 1627.53

1 2 10 4644 233.03 119.66 CA 5008.69
2 3 15 16330 325.08 175.46 WI 16848.54
3 4 0 13 66.64 0.00 OR 83.64
4 5 13 22537 328.66 175.46 AZ 23059.12
5 6 21 40931 205.28 175.46 FL 41338.74
6 7 11 34762 49.17 145.20 LA 34974.37
7 8 5 11051 208.80 270.04 GA 11542.84
8 9 8 7003 212.06 119.66 WA 7351.72
9 10 1 11 44.43 0.00 PA 66.43
10 11 17 24879 260.29 119.66 TX 25286.95
11 12 3 5339 236.93 440.13 LA 6031.06
12 13 14 29782 695.10 228.12 FL 30732.22
13 14 19 56111 116.00 183.31 OH 56443.31
14 15 13 21946 312.36 175.46 MA 22461.82
15 16 8 3101 220.61 119.66 VA 3465.27
16 17 15 41965 66.25 119.66 OH 42182.91

[19]: # Check missing value

df_2.isnull()

[19]: VCL fm MLG lc mc country Total

0 False False False False False False False
1 False False False False False False False
2 False False False False False False False
3 False False False False False False False
4 False False False False False False False
5 False False False False False False False
6 False False False False False False False
7 False False False False False False False
8 False False False False False False False
9 False False False False False False False
10 False False False False False False False
11 False False False False False False False
12 False False False False False False False
13 False False False False False False False
14 False False False False False False False
15 False False False False False False False
16 False False False False False False False

[32]: # Count missing values per column

df_2.isnull().sum()

[32]: VCL 0
fm 0
MLG 0
lc 0
mc 0

country 0
Total 0
dtype: int64

[34]: # Select a single column (two equivalent forms)

df_2['MLG']
df_2.MLG

[34]: 0 863
1 4644
2 16330
3 13
4 22537
5 40931
6 34762
7 11051
8 7003
9 11
10 24879
11 5339
12 29782
13 56111
14 21946
15 3101
16 41965
Name: MLG, dtype: int64

[38]: # Select and reorder columns

df_3=df_2.loc[:,['country', 'VCL', 'MLG', 'fm', 'lc', 'mc', 'Total']]
df_3

[38]: country VCL MLG fm lc mc Total

0 MS 1 863 0 66.30 697.23 1627.53
1 CA 2 4644 10 233.03 119.66 5008.69
2 WI 3 16330 15 325.08 175.46 16848.54
3 OR 4 13 0 66.64 0.00 83.64
4 AZ 5 22537 13 328.66 175.46 23059.12
5 FL 6 40931 21 205.28 175.46 41338.74
6 LA 7 34762 11 49.17 145.20 34974.37
7 GA 8 11051 5 208.80 270.04 11542.84
8 WA 9 7003 8 212.06 119.66 7351.72
9 PA 10 11 1 44.43 0.00 66.43
10 TX 11 24879 17 260.29 119.66 25286.95
11 LA 12 5339 3 236.93 440.13 6031.06
12 FL 13 29782 14 695.10 228.12 30732.22
13 OH 14 56111 19 116.00 183.31 56443.31
14 MA 15 21946 13 312.36 175.46 22461.82

15 VA 16 3101 8 220.61 119.66 3465.27
16 OH 17 41965 15 66.25 119.66 42182.91
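
Column selection picks which fields to keep; rows can be filtered in the same `.loc` call with a boolean condition. The sketch below uses a few rows from the table above, and the 20000 threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

demo = pd.DataFrame({
    "country": ["MS", "CA", "WI", "AZ", "FL"],
    "MLG": [863, 4644, 16330, 22537, 40931],
    "Total": [1627.53, 5008.69, 16848.54, 23059.12, 41338.74],
})

# Rows where mileage exceeds the threshold, keeping two columns.
high_mileage = demo.loc[demo["MLG"] > 20000, ["country", "MLG"]]
print(high_mileage)
```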

[39]: # Set the country column as the index

df_3.set_index('country', inplace=True)

[40]: df_3

[40]: VCL MLG fm lc mc Total

country
MS 1 863 0 66.30 697.23 1627.53
CA 2 4644 10 233.03 119.66 5008.69
WI 3 16330 15 325.08 175.46 16848.54
OR 4 13 0 66.64 0.00 83.64
AZ 5 22537 13 328.66 175.46 23059.12
FL 6 40931 21 205.28 175.46 41338.74
LA 7 34762 11 49.17 145.20 34974.37
GA 8 11051 5 208.80 270.04 11542.84
WA 9 7003 8 212.06 119.66 7351.72
PA 10 11 1 44.43 0.00 66.43
TX 11 24879 17 260.29 119.66 25286.95
LA 12 5339 3 236.93 440.13 6031.06
FL 13 29782 14 695.10 228.12 30732.22
OH 14 56111 19 116.00 183.31 56443.31
MA 15 21946 13 312.36 175.46 22461.82
VA 16 3101 8 220.61 119.66 3465.27
OH 17 41965 15 66.25 119.66 42182.91

[46]: # Remove the index name

df_3.index.name=None

[49]: df_3

[49]: VCL MLG fm lc mc Total

MS 1 863 0 66.30 697.23 1627.53
CA 2 4644 10 233.03 119.66 5008.69
WI 3 16330 15 325.08 175.46 16848.54
OR 4 13 0 66.64 0.00 83.64
AZ 5 22537 13 328.66 175.46 23059.12
FL 6 40931 21 205.28 175.46 41338.74
LA 7 34762 11 49.17 145.20 34974.37
GA 8 11051 5 208.80 270.04 11542.84
WA 9 7003 8 212.06 119.66 7351.72
PA 10 11 1 44.43 0.00 66.43
TX 11 24879 17 260.29 119.66 25286.95
LA 12 5339 3 236.93 440.13 6031.06
FL 13 29782 14 695.10 228.12 30732.22

OH 14 56111 19 116.00 183.31 56443.31
MA 15 21946 13 312.36 175.46 22461.82
VA 16 3101 8 220.61 119.66 3465.27
OH 17 41965 15 66.25 119.66 42182.91

[55]: # Show all rows for a given index label

df_3.loc['FL']

[55]: VCL MLG fm lc mc Total

FL 6 40931 21 205.28 175.46 41338.74
FL 13 29782 14 695.10 228.12 30732.22

[57]: df_3.loc['LA']

[57]: VCL MLG fm lc mc Total

LA 7 34762 11 49.17 145.20 34974.37
LA 12 5339 3 236.93 440.13 6031.06

[59]: # Pick specific columns for a given label

df_3.loc['LA',['lc', 'mc', 'Total']]

[59]: lc mc Total
LA 49.17 145.20 34974.37
LA 236.93 440.13 6031.06

[60]: df_3.loc['WA','Total']

[60]: 7351.72
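
The lookups above return different shapes because the index contains repeated labels: `'LA'` and `'FL'` occur twice, while `'WA'` occurs once. A minimal sketch of that behaviour, on a three-row hypothetical frame named `demo` built from values in the table above:

```python
import pandas as pd

demo = pd.DataFrame(
    {"lc": [49.17, 212.06, 236.93], "Total": [34974.37, 7351.72, 6031.06]},
    index=["LA", "WA", "LA"],
)

print(demo.index.is_unique)          # False: 'LA' appears twice

la_totals = demo.loc["LA", "Total"]  # a Series with two values
wa_total = demo.loc["WA", "Total"]   # a single number

print(la_totals)
print(wa_total)
```

Checking `index.is_unique` before relying on scalar lookups is one way to avoid surprises when a label turns out to be duplicated.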

[61]: df_3.plot(kind='line')

[61]: <AxesSubplot:>

[62]: df_3.plot(kind='box')

[62]: <AxesSubplot:>

[64]: df_3.plot(kind='bar', figsize=(15,7))

[64]: <AxesSubplot:>

[66]: df_3.plot(kind='hist')

[66]: <AxesSubplot:ylabel='Frequency'>

[67]: df_3.plot(kind='scatter', x='VCL', y='Total')

[67]: <AxesSubplot:xlabel='VCL', ylabel='Total'>

[71]: df_31=df_2.loc[:,['country', 'VCL', 'MLG', 'fm', 'lc', 'mc', 'Total']]

df_31.set_index('country', inplace=True)
df_31

[71]: VCL MLG fm lc mc Total

MA 15 21946 13 312.36 175.46 22461.82
VA 16 3101 8 220.61 119.66 3465.27
OH 17 41965 15 66.25 119.66 42182.91

[73]: df_312=df_31.groupby('country').sum()  # axis=0 is the default and is deprecated as an argument in recent pandas

df_312

[73]: VCL MLG fm lc mc Total

country
AZ 5 22537 13 328.66 175.46 23059.12
CA 2 4644 10 233.03 119.66 5008.69
FL 19 70713 35 900.38 403.58 72070.96
GA 8 11051 5 208.80 270.04 11542.84
LA 19 40101 14 286.10 585.33 41005.43
MA 15 21946 13 312.36 175.46 22461.82
MS 1 863 0 66.30 697.23 1627.53
OH 31 98076 34 182.25 302.97 98626.22
OR 4 13 0 66.64 0.00 83.64
PA 10 11 1 44.43 0.00 66.43
TX 11 24879 17 260.29 119.66 25286.95
VA 16 3101 8 220.61 119.66 3465.27
WA 9 7003 8 212.06 119.66 7351.72
WI 3 16330 15 325.08 175.46 16848.54
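
Because `country` is the index of `df_31`, the grouping key is an index level rather than a column. The sketch below makes that explicit with `level=` and computes more than one statistic per state, using a few rows from the table above. The output column names (`vehicles`, `total_mileage`, `mean_total`) are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "MLG": [40931, 34762, 5339, 29782],
        "Total": [41338.74, 34974.37, 6031.06, 30732.22],
    },
    index=pd.Index(["FL", "LA", "LA", "FL"], name="country"),
)

# Named aggregation: one output column per (input column, function) pair.
by_state = demo.groupby(level="country").agg(
    vehicles=("MLG", "size"),
    total_mileage=("MLG", "sum"),
    mean_total=("Total", "mean"),
)
print(by_state)
```

Summing identifier columns such as `VCL` is rarely meaningful, so naming the aggregations keeps the grouped table limited to quantities that make sense per state.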

[79]: df_31['Total'].plot(kind='pie',
figsize=(15,6),
autopct='%1.0f%%',
startangle=150,
shadow=True,
labels=None,
pctdistance=1.12,)
plt.legend(labels=df_31.index, loc='upper left')

[79]: <matplotlib.legend.Legend at 0x258700991c0>

[82]: # Second method
df_31['Total'].plot(kind='pie',
figsize=(15,6),
autopct='%1.0f%%',
startangle=90,
shadow=True,)

[82]: <AxesSubplot:ylabel='Total'>
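
The percentages that `autopct` prints are each slice's share of the column sum. The sketch below computes those shares directly, after combining duplicate states so that each appears once. The aggregation step is an addition here and is not part of the original cells.

```python
import pandas as pd

totals = pd.Series(
    [41338.74, 34974.37, 6031.06, 30732.22],
    index=["FL", "LA", "LA", "FL"],
    name="Total",
)

# Combine repeated labels, then express each state as a percentage.
per_state = totals.groupby(level=0).sum()
share_pct = (per_state / per_state.sum() * 100).round(1)
print(share_pct)
```

Plotting `per_state` instead of the raw column would give one slice per state rather than one per vehicle.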

[20]: df_21=pd.read_excel('C:/Users/Nazakat ali/Desktop/Stat711/New folder/Book2.xlsx')

df_21

[20]: Vehicle fm Mileage lh lc mc

0 1 0 863.0 1.1 66.30 697.23
1 2 10 4644.0 2.4 233.03 119.66
2 3 15 16330.0 4.2 325.08 175.46
3 4 0 13.0 1.0 66.64 NaN
4 5 13 22537.0 4.5 328.66 175.46
5 6 21 NaN 3.1 205.28 175.46
6 7 11 34762.0 0.7 49.17 145.20
7 8 5 11051.0 2.9 208.80 270.04
8 9 8 7003.0 3.4 212.06 NaN

[53]: df_21.iloc[5]  # run after the fill in cell [30], so Mileage already shows the column mean

[53]: Vehicle 6.000
fm 21.000
Mileage 12150.375
lh 3.100
lc 205.280
mc 175.460
Name: 5, dtype: float64

[27]: # Check missing value

df_21.isnull().sum()

[27]: Vehicle 0
fm 0
Mileage 1
lh 0
lc 0
mc 2
dtype: int64
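
The counts above say how many values are missing per column, but not which rows they are in. A minimal sketch for listing the affected rows, using a hypothetical frame `demo` with a few rows from the Book2 table:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Vehicle": [4, 5, 6, 9],
    "Mileage": [13.0, 22537.0, np.nan, 7003.0],
    "mc": [np.nan, 175.46, 175.46, np.nan],
})

# Keep only the rows where at least one column is missing.
rows_with_gaps = demo[demo.isnull().any(axis=1)]
print(rows_with_gaps)
```

Looking at the rows first helps decide whether filling, dropping, or leaving the gaps is the more sensible choice.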

8 Fill Missing Values

[30]: df_21.fillna(df_21.mean(), inplace=True)
df_21

[30]: Vehicle fm Mileage lh lc mc

0 1 0 863.000 1.1 66.30 697.230000
1 2 10 4644.000 2.4 233.03 119.660000
2 3 15 16330.000 4.2 325.08 175.460000
3 4 0 13.000 1.0 66.64 251.215714
4 5 13 22537.000 4.5 328.66 175.460000
5 6 21 12150.375 3.1 205.28 175.460000
6 7 11 34762.000 0.7 49.17 145.200000
7 8 5 11051.000 2.9 208.80 270.040000
8 9 8 7003.000 3.4 212.06 251.215714

[31]: df_21.isnull().sum()

[31]: Vehicle 0
fm 0
Mileage 0
lh 0
lc 0
mc 0
dtype: int64
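
The filled values can be checked by hand: `mean()` skips NaN, so the Mileage replacement is the mean of the eight observed values and the `mc` replacement is the mean of the seven observed values. The sketch below rebuilds the two affected columns from the table above and, as an alternative not used in the original, also fills with the median.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Mileage": [863, 4644, 16330, 13, 22537, np.nan, 34762, 11051, 7003],
    "mc": [697.23, 119.66, 175.46, np.nan, 175.46, 175.46, 145.20, 270.04, np.nan],
})

# fillna without inplace=True returns a new frame and leaves demo as is.
mean_filled = demo.fillna(demo.mean())
median_filled = demo.fillna(demo.median())  # less sensitive to outliers

print(mean_filled.loc[5, "Mileage"])    # 12150.375
print(median_filled.loc[5, "Mileage"])  # 9027.0
```

With only a handful of rows and a wide spread in mileage, the mean and median differ noticeably, so the choice of fill value is worth stating alongside the results.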
