L-4 (Handling of Missing Values).Ipynb - Colab
import numpy as np
import pandas as pd
# Dictionary of lists; np.nan marks the missing values
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)
df
https://colab.research.google.com/drive/1Twqn4BjyBrFr0ev-wLIvsJ0tFOcKu_Zf#printMode=true 1/8
3/14/25, 4:37 PM L-4 (Handling of Missing Values).ipynb - Colab
s1 = pd.isnull(df["First Score"])  # Boolean Series: True where First Score is NaN
# Display only the rows where First Score is NaN
df[s1]
First Score 1
Second Score 1
Third Score 1
dtype: int64
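The per-column counts above (one NaN in each column) have the shape of `isnull()` followed by `sum()`. Because `True` counts as 1, summing a Boolean mask counts the missing values. A minimal sketch that rebuilds the same three-column frame (the names `mask`, `missing_per_column` and `present` are illustrative):

```python
import numpy as np
import pandas as pd

# Same values as the dictionary defined at the top of the notebook
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Boolean mask: True wherever a value is missing
mask = df.isnull()

# Summing a Boolean frame counts the True values in each column
missing_per_column = mask.sum()
print(missing_per_column)

# notnull() is the complement: keep the rows where 'First Score' is present
present = df[df['First Score'].notnull()]
print(present)
```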
   First Score  Second Score  Third Score
1         90.0          45.0         40.0
new_df = df.dropna(axis=1, how='all')  # Drop only the columns in which every value is NaN
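By default `dropna()` works on rows (`axis=0`) and removes any row containing at least one NaN (`how='any'`), which is why only row 1 survives in the output above. `how='all'` removes a row or column only when it is entirely NaN, and `thresh` keeps labels with at least that many non-missing values. A short sketch on the same data; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Default: drop every row that contains any NaN
rows_any = df.dropna()

# Drop a column only if all of its values are NaN (none here, so all are kept)
cols_all = df.dropna(axis=1, how='all')

# Keep rows with at least 2 non-NaN values (every row here qualifies)
rows_thresh = df.dropna(thresh=2)

print(rows_any, cols_all.shape, rows_thresh, sep='\n\n')
```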
newdf = df.fillna(0)  # fillna returns a new DataFrame; df itself is unchanged
newdf
df.fillna(0, inplace=True)  # inplace=True modifies df directly and returns None
df
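Filling every gap with one constant such as 0 can distort a column's statistics. `fillna()` also accepts a dictionary or Series mapping column names to fill values. The sketch below fills each column with its own mean; this is an illustrative alternative, not the notebook's approach:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Constant fill: returns a new frame and leaves df untouched
zero_filled = df.fillna(0)

# Per-column fill: df.mean() is a Series indexed by column name (NaNs are skipped)
mean_filled = df.fillna(df.mean())

print(zero_filled, mean_filled, sep='\n\n')
```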
df.ffill()  # Forward fill: replace each NaN with the last valid value above it in the same column
df.bfill()  # Backward fill: replace each NaN with the next valid value below it. The last value is NaN, so it is not changed
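Since pandas 2.1, `fillna(method=...)` is deprecated in favour of the dedicated `ffill()` and `bfill()` methods. The sketch below uses a small hypothetical column to show why a leading NaN survives a forward fill and a trailing NaN survives a backward fill:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a leading, a middle and a trailing NaN
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Copy the last valid value downwards; the leading NaN has nothing before it
forward = s.ffill()

# Copy the next valid value upwards; the trailing NaN has nothing after it
backward = s.bfill()

print(pd.DataFrame({'original': s, 'ffill': forward, 'bfill': backward}))
```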
df.iloc[5,1]= 0.230171
df.iloc[5,2]= 0.430171
df
df.bfill()
REINDEXING
An important method on pandas objects is reindex, which creates a new object with the data conformed to a new index. Reindexing can be used to reorder existing data to match a new set of labels and to insert missing-value (NaN) markers at labels for which no data existed.
# Create a series
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)
print(type(obj))
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64
<class 'pandas.core.series.Series'>
# Calling reindex on this Series rearranges the data according to the new index,
# introducing missing values if any index values were not already present:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64
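The reordered Series above is what `obj.reindex(...)` returns when given the labels `'a'` through `'e'`. The cell itself is missing from this printout, so the call below is a reconstruction. It also shows `fill_value`, which replaces the NaN for the new label:

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Reorder to the new labels; 'e' did not exist, so its value is NaN
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(reindexed)

# Same reindex, but use 0 instead of NaN for the new label
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(filled)
```

reindex returns a new Series; `obj` keeps its original label order.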
# For ordered data such as time series, it may be desirable to interpolate or fill values when reindexing.
# The method option allows this, using a method such as ffill, which forward-fills the values:
obj2 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 4, 9])
print(obj2)
0 blue
4 purple
9 yellow
dtype: object
0 blue
1 NaN
2 NaN
3 NaN
4 purple
5 NaN
6 NaN
7 NaN
8 NaN
9 yellow
10 NaN
11 NaN
dtype: object
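The twelve-row output above, NaN everywhere except labels 0, 4 and 9, is what reindexing `obj2` to `range(12)` without a fill method produces. The generating cell is not shown in this printout, so this is a reconstruction (`plain` is an illustrative name):

```python
import pandas as pd

obj2 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 4, 9])

# No method: labels 1-3, 5-8, 10 and 11 have no data and become NaN
plain = obj2.reindex(range(12))
print(plain)
```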
print("\nobj2\n", obj2)
obj2b = obj2.reindex(range(12), method='bfill')  # also check ffill
print("\nnew obj2\n", obj2b)
obj2
0 blue
4 purple
9 yellow
dtype: object
new obj2
0 blue
1 purple
2 purple
3 purple
4 purple
5 yellow
6 yellow
7 yellow
8 yellow
9 yellow
10 NaN
11 NaN
dtype: object
new obj2
0 blue
4 purple
9 yellow
dtype: object
new obj2c
0 blue
1 blue
2 blue
3 blue
4 purple
5 purple
6 purple
7 purple
8 purple
9 yellow
10 yellow
11 yellow
dtype: object
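The `obj2c` output matches `method='ffill'`, although the cell that created it is missing. With ffill, each new label takes the value of the nearest existing label before it. With `'bfill'`, it takes the nearest one after it, so labels 10 and 11 stay NaN because nothing follows label 9. A side-by-side check, assuming this reconstruction is correct:

```python
import pandas as pd

obj2 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 4, 9])

# Forward fill: carry the previous existing value forward
obj2c = obj2.reindex(range(12), method='ffill')

# Backward fill: pull the next existing value back; labels after 9 stay NaN
obj2b = obj2.reindex(range(12), method='bfill')

print(pd.DataFrame({'ffill': obj2c, 'bfill': obj2b}))
```

Both methods require a monotonic index, which `[0, 4, 9]` satisfies.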
# With DataFrame, reindex can alter either the (row) index, columns, or both.
# When passed only a sequence, it reindexes the rows in the result:
a 1 NaN 2
c 4 NaN 5
d 7 NaN 8
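The frame behind these outputs is not defined in this printout. The sketch below assumes it was built from `np.arange(9)` with rows `'a'`, `'c'`, `'d'` and a hypothetical first column `'Delhi'` alongside `'Mumbai'` and `'Kolkata'`. It reindexes the rows (the default when only a sequence is passed) and then the columns, where the new `'Chennai'` column comes out as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: the 'Delhi' column name is an assumption
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Delhi', 'Mumbai', 'Kolkata'])

# A bare sequence reindexes the rows; 'b' is new, so its row is all NaN
rows = frame.reindex(['a', 'b', 'c', 'd'])

# columns= reindexes the columns; 'Delhi' is dropped and 'Chennai' is NaN
cities = ['Mumbai', 'Chennai', 'Kolkata']
cols = frame.reindex(columns=cities)

print(rows, cols, sep='\n\n')
```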
# fill_value: the value to use instead of NaN for labels that were not already present
cities = ['Mumbai', 'Chennai', 'Kolkata']
frame.reindex(columns=cities, fill_value = 100)
a 1 100 2
c 4 100 5
d 7 100 8
Sorting
# Sort the DataFrame by 'age' (descending), breaking ties by 'grade' (ascending).
print(df_Citizens.sort_values(['age', 'grade'], ascending = [False, True]))
#print("\n\n", df_Citizens) #Original dataframe remains same
# In-place sorting of the DataFrame by 'grade' and 'favourite_color'. With inplace=True,
# DataFrame.sort_values() returns None and modifies the DataFrame itself.
# na_position: 'first' puts NaNs at the beginning; 'last' (the default) puts them at the end.
df_Citizens.sort_values(["grade", "favourite_color"],
axis = 0, ascending = [True, False],
inplace = True, na_position ='first')
print(df_Citizens)
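`df_Citizens` is not defined anywhere in this printout. The sketch below builds a small hypothetical frame with the same column names (`age`, `grade`, `favourite_color`, `city`). The `name` column and all values are invented for illustration. It shows the multi-key sort with a per-key `ascending` list and the effect of `na_position='first'`:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names age/grade/favourite_color/city come from the notebook
df_Citizens = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meena', 'Karan'],
    'age': [30, 25, 30, 22],
    'grade': ['B', 'A', 'A', np.nan],
    'favourite_color': ['red', 'blue', 'green', 'yellow'],
    'city': ['Delhi', 'Mumbai', 'Delhi', 'Gurgaon'],
})

# age descending, then grade ascending to break ties; returns a new frame
by_age = df_Citizens.sort_values(['age', 'grade'], ascending=[False, True])

# In-place sort returns None; the row with a NaN grade is placed first
result = df_Citizens.sort_values(['grade', 'favourite_color'],
                                 ascending=[True, False],
                                 inplace=True, na_position='first')

print(by_age, df_Citizens, sep='\n\n')
```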
Delhi 3
Mumbai 3
Gurgaon 1
Name: city, dtype: int64
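The counts above (Delhi 3, Mumbai 3, Gurgaon 1) have the shape of `Series.value_counts()` applied to a `city` column, but the cell that produced them is missing. A sketch with hypothetical city values chosen to reproduce the same tallies:

```python
import pandas as pd

# Hypothetical column chosen to reproduce the tallies shown above
city = pd.Series(['Delhi', 'Mumbai', 'Delhi', 'Gurgaon', 'Mumbai', 'Delhi', 'Mumbai'],
                 name='city')

# Count occurrences of each distinct value, most frequent first
counts = city.value_counts()
print(counts)
```

Depending on the pandas version, the printed footer may read `Name: city` (older releases) or `Name: count` (pandas 2.x).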