Day 10 Pandas Data Cleaning
February 3, 2024
df = pd.DataFrame(data)
df
[39]: A B
0 10.0 None
1 20.0 Bangaluru
2 30.0 Tumkur
3 40.0 chennai
4 50.0 Mangaluru
5 NaN Badami
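The cell that defines `data` is not shown above. A minimal sketch of a dictionary that would reproduce this output, assuming the missing entries were written as `np.nan` and `None`, followed by a per-column count of missing values:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the missing cell, inferred from the output above
data = {
    'A': [10, 20, 30, 40, 50, np.nan],
    'B': [None, 'Bangaluru', 'Tumkur', 'chennai', 'Mangaluru', 'Badami'],
}
df = pd.DataFrame(data)

# Count the missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)
```

Checking the counts first helps decide whether a column should be filled or dropped.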
df['A'] = df['A'].fillna(df['A'].mean())
df
[40]: A B
0 10.0 None
1 20.0 Bangaluru
2 30.0 Tumkur
3 40.0 chennai
4 50.0 Mangaluru
5 30.0 Badami
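Row 5 is filled with 30.0 because `mean()` skips missing values by default, so the average is taken over the five known numbers only. A short sketch of that arithmetic, using just column A from the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50, np.nan]})

# NaN is ignored, so this is (10 + 20 + 30 + 40 + 50) / 5
mean_a = df['A'].mean()

# Replace the missing entry with that mean
df['A'] = df['A'].fillna(mean_a)
print(mean_a)
print(df)
```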
[44]: A B
1 20.0 Bangaluru
2 30.0 Tumkur
3 40.0 chennai
4 50.0 Mangaluru
5 30.0 Badami
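The cell that produced this output is not shown. The result is consistent with dropping every row that contains a missing value, for example with `dropna()`. The name `cleanDF` is borrowed from the next cell, and defining it this way is an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10.0, 20.0, 30.0, 40.0, 50.0, 30.0],
    'B': [None, 'Bangaluru', 'Tumkur', 'chennai', 'Mangaluru', 'Badami'],
})

# dropna() defaults to axis=0: drop each row that has at least one missing value
cleanDF = df.dropna()  # assumed definition, not shown in the notebook
print(cleanDF)
```

`dropna()` returns a new DataFrame, so `df` itself still has all six rows.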
This code creates a new DataFrame by removing columns with any missing values. Since cleanDF has no missing values left, both columns are kept.
[45]: cleandf = cleanDF.dropna(axis=1)
cleandf
[45]: A B
1 20.0 Bangaluru
2 30.0 Tumkur
3 40.0 chennai
4 50.0 Mangaluru
5 30.0 Badami
A
0 10.0
1 20.0
2 30.0
3 40.0
4 50.0
5 30.0
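The second output, with only column A, is what `dropna(axis=1)` would return when applied to `df` rather than `cleanDF`, because column B of `df` still contains `None`. This is inferred from the output, since the cell itself is not shown. A sketch comparing the two calls:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10.0, 20.0, 30.0, 40.0, 50.0, 30.0],
    'B': [None, 'Bangaluru', 'Tumkur', 'chennai', 'Mangaluru', 'Badami'],
})
cleanDF = df.dropna()  # assumed definition: rows with missing values removed

# axis=1 drops columns instead of rows
cols_from_df = df.dropna(axis=1)          # B has a missing value, so it is dropped
cols_from_clean = cleanDF.dropna(axis=1)  # nothing missing, so nothing is dropped

print(cols_from_df.columns.tolist())     # ['A']
print(cols_from_clean.columns.tolist())  # ['A', 'B']
```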
2. Removing Duplicates:
Removing duplicate rows
[47]: x = df.drop_duplicates()
x
[47]: A B
0 10.0 None
1 20.0 Bangaluru
2 30.0 Tumkur
3 40.0 chennai
4 50.0 Mangaluru
5 30.0 Badami
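Since `df` contains no repeated rows, the output above is identical to the input. A small sketch with a deliberately duplicated row (the data here is made up for illustration) shows the effect:

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical
dup_df = pd.DataFrame({
    'A': [10, 20, 10, 30],
    'B': ['Tumkur', 'Badami', 'Tumkur', 'Badami'],
})

# Keeps the first occurrence of each fully identical row; index labels are preserved
deduped = dup_df.drop_duplicates()
print(deduped)
```

Rows 1 and 3 share the same value in B but differ in A, so both are kept.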
df
[57]: A
0 10
1 20
2 30
3 40
4 50
[58]: A
0 10
1 20
2 30
3 40
4 50
4. String Cleaning:
[66]: data = {'A': [1, 2, 3, 4, 5],
'B': [' apple ', 'banana', 'cherry ', 'date', ' elderberry ']}
df = pd.DataFrame(data)
df
[66]: A B
0 1 apple
1 2 banana
2 3 cherry
3 4 date
4 5 elderberry
Use the str.strip() method to remove leading and trailing whitespace.
[68]: A B
0 1 apple
1 2 banana
2 3 cherry
3 4 date
4 5 elderberry
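The cell behind this output is not shown. A minimal sketch of how `str.strip()` could be applied to column B, assuming the result is written back to the same column:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [' apple ', 'banana', 'cherry ', 'date', ' elderberry ']})

# The .str accessor applies the string method to every element of the column
df['B'] = df['B'].str.strip()
print(df['B'].tolist())
```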
[71]: A B
0 1 APPLE
1 2 BANANA
2 3 CHERRY
3 4 DATE
4 5 ELDERBERRY
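The uppercase values above are what `str.upper()` produces. The cell is not shown, so the following is a sketch of one way to get the same result:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['apple', 'banana', 'cherry', 'date', 'elderberry']})

# Convert every string in column B to uppercase
df['B'] = df['B'].str.upper()
print(df['B'].tolist())
```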
[81]: A B
0 1 apple
1 2 banana
2 3 cherry
3 4 date
4 5 chocolate
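In the last output the values are lowercase again and `elderberry` has become `chocolate`. One way to get there, assuming the notebook used `str.lower()` followed by `str.replace()` (the actual cells are not shown):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['APPLE', 'BANANA', 'CHERRY', 'DATE', 'ELDERBERRY']})

# Back to lowercase, then swap one literal substring for another
df['B'] = df['B'].str.lower()
df['B'] = df['B'].str.replace('elderberry', 'chocolate', regex=False)
print(df['B'].tolist())
```

Passing `regex=False` makes it explicit that the pattern is treated as plain text.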
Data transformation
[88]: data = {'A': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
[89]: def double(x):
          return x + x
      df["A+A"] = df["A"].apply(double)
      df
[89]: A A+A
0 10 20
1 20 40
2 30 60
3 40 80
4 50 100
[92]: # map()
data = {'Category': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)
[92]: Category
0 A
1 B
2 A
3 C
4 B
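The cell above only builds and displays the DataFrame; the `map()` call itself is not shown. A minimal sketch of `Series.map` with a dictionary, where the `labels` lookup table and the `Label` column are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

# Hypothetical lookup table; 'C' is left out on purpose
labels = {'A': 'Alpha', 'B': 'Beta'}

# Each value is looked up in the dictionary; values without a key become NaN
df['Label'] = df['Category'].map(labels)
print(df)
```

The missing key is worth noticing: `map()` does not keep the original value when no match is found.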
[99]: # applymap()
      data = {'A': [1, 2, 3],
              'B': [4, 5, 6]}
      df = pd.DataFrame(data)

      def square(x):
          return x ** 2

      df = df.applymap(square) #we can also use map()
      df
A B
0 1 16
1 4 25
2 9 36
/tmp/ipykernel_33/2438595464.py:9: FutureWarning: DataFrame.applymap has been
deprecated. Use DataFrame.map instead.
df = df.applymap(square) #we can also use map()
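The warning recommends `DataFrame.map`, which was added in pandas 2.1 as the replacement for `applymap`. A sketch that uses it when available and falls back to `applymap` on older versions; the `squared` name is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def square(x):
    return x ** 2

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    squared = df.map(square)
else:
    squared = df.applymap(square)
print(squared)
```

Both calls apply the function to every cell and return a new DataFrame with the same shape.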