Pandas DataFrame Notebook
DataFrame()
Pandas DataFrame
Introduction
A DataFrame in Pandas is a 2-dimensional labeled data structure, similar to a table in a
database, an Excel spreadsheet, or a data frame in R. It is one of the primary data
structures in Pandas, and it allows for easy data analysis and manipulation.
Key Characteristics
Labeled Rows and Columns: Each row and column has labels for easier access.
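As a small illustration of label-based access, the sketch below builds a two-row frame and reads a single cell by its row label and column label. The variable name `demo` and its values are placeholders chosen for illustration, not data from this notebook's files.

```python
import pandas as pd

# Hypothetical two-row frame; names and values are illustrative only.
demo = pd.DataFrame(
    {"iq": [100, 90], "marks": [80, 70]},
    index=["Gourab", "saurabh"],
)

# A row label plus a column label identifies exactly one cell.
gourab_marks = demo.loc["Gourab", "marks"]
print(gourab_marks)
```

The same labels can be listed with `demo.index` and `demo.columns`.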
import numpy as np
import pandas as pd

# Using lists: each inner list becomes one row
students_data = [
    [100, 80, 10],
    [90, 70, 7],
    [120, 100, 14],
    [80, 50, 2]
]
pd.DataFrame(students_data, columns=['iq', 'marks', 'package'])
    iq  marks  package
0  100     80       10
1   90     70        7
2  120    100       14
3   80     50        2
Using dicts
students_dict = {
'name':['Gourab','saurabh','suman','pranav','sanoj','hero'],
'iq':[100,90,120,80,0,0],
'marks':[80,70,100,50,0,0],
'package':[10,7,14,2,0,0]
}
file:///C:/Users/goura/Downloads/14-DataFrame.html 1/143
2/8/25, 2:23 PM 14-DataFrame
students = pd.DataFrame(students_dict)
students.set_index('name',inplace = True)
students
          iq  marks  package
name
Gourab   100     80       10
saurabh   90     70        7
suman    120    100       14
pranav    80     50        2
sanoj      0      0        0
hero       0      0        0
Using read_csv
movies = pd.read_csv("movies.csv")
movies
      title_x                               imdb_id    poster_path                                                                              wiki_link
0     Uri: The Surgical Strike              tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
1     Battalion 609                         tt9472208  NaN                                                                                      https://e
2     The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
3     Why Cheat India                       tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
4     Evening Shadows                       tt6028796  NaN                                                                                      https://en.w
...
1624  Tera Mera Saath Rahen                 tt0301250  https://upload.wikimedia.org/wikipedia/en/2/2b...  https://en.wik
1625  Yeh Zindagi Ka Safar                  tt0298607  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
1626  Sabse Bada Sukh                       tt0069204  NaN                                                                                      https://en.w
          ID        City        Date   Season  MatchNumber  ...
1    1312199   Ahmedabad  2022-05-27     2022  Qualifier 2  ...
2    1312198     Kolkata  2022-05-25     2022   Eliminator  ...
4    1304116      Mumbai  2022-05-22     2022           70  ...
...
945   335986     Kolkata  2008-04-20  2007/08            4  ...
946   335985      Mumbai  2008-04-20  2007/08            5  ...
948   335983  Chandigarh  2008-04-19  2007/08            2  ...
949   335982   Bangalore  2008-04-18  2007/08            1  ...
shape
movies.shape
In [7]: ipl.shape
dtypes
movies.dtypes
In [9]: ipl.dtypes
Out[9]: ID int64
City object
Date object
Season object
MatchNumber object
Team1 object
Team2 object
Venue object
TossWinner object
TossDecision object
SuperOver object
WinningTeam object
WonBy object
Margin float64
method object
Player_of_Match object
Team1Players object
Team2Players object
Umpire1 object
Umpire2 object
dtype: object
Index
In [10]: # Index --
movies.index
In [11]: ipl.index
columns
In [12]: # columns
movies.columns
In [13]: ipl.columns
values
In [14]: # values
movies.values
In [15]: ipl.values
movies.head()
   title_x                               imdb_id    poster_path                                                                              wiki_link
0  Uri: The Surgical Strike              tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikiped
1  Battalion 609                         tt9472208  NaN                                                                                      https://en.wik
2  The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedi
3  Why Cheat India                       tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipe
4  Evening Shadows                       tt6028796  NaN                                                                                      https://en.wikiped
In [17]: movies.tail()
      title_x                imdb_id    poster_path                                                                              wiki_link
1624  Tera Mera Saath Rahen  tt0301250  https://upload.wikimedia.org/wikipedia/en/2/2b...  https://en.wiki
1625  Yeh Zindagi Ka Safar   tt0298607  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wiki
1626  Sabse Bada Sukh        tt0069204  NaN                                                                                      https://en.wi
In [18]: ipl.head()
        ID       City        Date  Season  MatchNumber  ...
0  1312200  Ahmedabad  2022-05-29    2022        Final  ...
1  1312199  Ahmedabad  2022-05-27    2022  Qualifier 2  ...
2  1312198    Kolkata  2022-05-25    2022   Eliminator  ...
3  1312197    Kolkata  2022-05-24    2022  Qualifier 1  ...
4  1304116     Mumbai  2022-05-22    2022           70  ...
In [19]: ipl.tail()
         ID        City        Date   Season  MatchNumber  ...
945  335986     Kolkata  2008-04-20  2007/08            4  ...
946  335985      Mumbai  2008-04-20  2007/08            5  ...
948  335983  Chandigarh  2008-04-19  2007/08            2  ...
949  335982   Bangalore  2008-04-18  2007/08            1  ...
In [20]: # sample
movies.sample(5)
     title_x                 imdb_id    poster_path                                                                              wiki_link
803  Chargesheet (film)      tt1368453  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
434  Hero (2015 Hindi film)  tt4467202  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
144  Helicopter Eela         tt8427036  https://upload.wikimedia.org/wikipedia/en/thum...  https://en
133  Batti Gul Meter Chalu   tt7720922  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wi
In [21]: ipl.sample(5)
         ID       City        Date  Season  MatchNumber  ...
745  501229      Kochi  2011-04-27    2011           32  ...
484  829719  Bangalore  2015-04-13    2015            8  ...
647  548362       Pune  2012-05-11    2012           57  ...
519  733995     Mumbai  2014-05-10    2014           33  ...
In [22]: # info
movies.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1629 entries, 0 to 1628
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 title_x 1629 non-null object
1 imdb_id 1629 non-null object
2 poster_path 1526 non-null object
3 wiki_link 1629 non-null object
4 title_y 1629 non-null object
5 original_title 1629 non-null object
6 is_adult 1629 non-null int64
7 year_of_release 1629 non-null int64
8 runtime 1629 non-null object
9 genres 1629 non-null object
10 imdb_rating 1629 non-null float64
11 imdb_votes 1629 non-null int64
12 story 1609 non-null object
13 summary 1629 non-null object
14 tagline 557 non-null object
15 actors 1624 non-null object
16 wins_nominations 707 non-null object
17 release_date 1522 non-null object
dtypes: float64(1), int64(3), object(14)
memory usage: 229.2+ KB
In [23]: # describe
movies.describe()
In [24]: # isnull
movies.isnull().sum()
Out[24]: title_x 0
imdb_id 0
poster_path 103
wiki_link 0
title_y 0
original_title 0
is_adult 0
year_of_release 0
runtime 0
genres 0
imdb_rating 0
imdb_votes 0
story 20
summary 0
tagline 1072
actors 5
wins_nominations 922
release_date 107
dtype: int64
In [25]: # duplicated
movies.duplicated().sum()
Out[25]: 0
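In [26]: # rename
students
          iq  marks  package
name
Gourab   100     80       10
saurabh   90     70        7
suman    120    100       14
pranav    80     50        2
sanoj      0      0        0
hero       0      0        0
In [27]: students.rename(columns={'marks': 'percent', 'package': 'lpa'}, inplace=True)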
In [28]: students
          iq  percent  lpa
name
Gourab   100       80   10
saurabh   90       70    7
suman    120      100   14
pranav    80       50    2
sanoj      0        0    0
hero       0        0    0
Maths Methods
In [29]: # sum -> axis argument
students.sum()
Out[29]: iq 390
percent 300
lpa 33
dtype: int64
students.sum(axis = 1)
Out[30]: name
Gourab 190
saurabh 167
suman 234
pranav 132
sanoj 0
hero 0
dtype: int64
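The `axis` argument decides which direction is collapsed. A minimal sketch on a two-row stand-in frame (the name `scores` is an assumption; the values match the first two students above): the default `axis=0` gives one result per column, while `axis=1` gives one result per row.

```python
import pandas as pd

# Stand-in frame with the first two students from the example above.
scores = pd.DataFrame(
    {"iq": [100, 90], "percent": [80, 70], "lpa": [10, 7]},
    index=["Gourab", "saurabh"],
)

col_totals = scores.sum()        # axis=0 (default): one total per column
row_totals = scores.sum(axis=1)  # axis=1: one total per row

print(col_totals)
print(row_totals)
```

The same convention applies to `mean`, `median`, `min`, `max`, `std` and `var` in the cells that follow.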
In [31]: # mean
students.mean()
Out[31]: iq 65.0
percent 50.0
lpa 5.5
dtype: float64
In [32]: students.mean(axis = 1)
Out[32]: name
Gourab 63.333333
saurabh 55.666667
suman 78.000000
pranav 44.000000
sanoj 0.000000
hero 0.000000
dtype: float64
In [33]: # median
students.median()
Out[33]: iq 85.0
percent 60.0
lpa 4.5
dtype: float64
In [34]: students.median(axis = 1)
Out[34]: name
Gourab 80.0
saurabh 70.0
suman 100.0
pranav 50.0
sanoj 0.0
hero 0.0
dtype: float64
In [35]: # min
students.min()
Out[35]: iq 0
percent 0
lpa 0
dtype: int64
In [36]: students.min(axis = 1)
Out[36]: name
Gourab 10
saurabh 7
suman 14
pranav 2
sanoj 0
hero 0
dtype: int64
In [37]: # max
students.max()
Out[37]: iq 120
percent 100
lpa 14
dtype: int64
In [38]: students.max(axis = 1)
Out[38]: name
Gourab 100
saurabh 90
suman 120
pranav 80
sanoj 0
hero 0
dtype: int64
students.std()
Out[39]: iq 52.057660
percent 41.952354
lpa 5.787918
dtype: float64
In [40]: students.std(axis = 1)
Out[40]: name
Gourab 47.258156
saurabh 43.316663
suman 56.320511
pranav 39.344631
sanoj 0.000000
hero 0.000000
dtype: float64
students.var()
Out[41]: iq 2710.0
percent 1760.0
lpa 33.5
dtype: float64
In [42]: students.var(axis = 1)
Out[42]: name
Gourab 2233.333333
saurabh 1876.333333
suman 3172.000000
pranav 1548.000000
sanoj 0.000000
hero 0.000000
dtype: float64
movies['title_x']
movies[['title_x','year_of_release','actors']]
   title_x          year_of_release  actors
3  Why Cheat India  2019             Emraan Hashmi|Shreya Dhanwanthary|Snighdadeep ...
movies.iloc[0]
movies.iloc[0:16:3]
    title_x                   imdb_id    poster_path                                                                              wiki_link
0   Uri: The Surgical Strike  tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikiped
3   Why Cheat India           tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipe
6   Fraud Saiyaan             tt5013008  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wiki
9   Thackeray (film)          tt7777196  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipe
12  Hum Chaar                 tt9319812  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
15  Badla (2019 film)         tt8130968  https://upload.wikimedia.org/wikipedia/en/0/0c...  https://en.wikiped
movies.iloc[[0,4,5,9]]
   title_x                   imdb_id    poster_path                                                                              wiki_link
0  Uri: The Surgical Strike  tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedi
4  Evening Shadows           tt6028796  NaN                                                                                      https://en.wikipedi
5  Soni (film)               tt6078866  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
9  Thackeray (film)          tt7777196  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikiped
In [48]: # loc
students.loc['Gourab']
Out[48]: iq 100
percent 80
lpa 10
Name: Gourab, dtype: int64
In [49]: students.loc['Gourab':'pranav':2]
         iq  percent  lpa
name
Gourab  100       80   10
suman   120      100   14
In [50]: students.loc[['suman','pranav','sanoj']]
         iq  percent  lpa
name
suman   120      100   14
pranav   80       50    2
sanoj     0        0    0
In [51]: students.iloc[0:5:2]
         iq  percent  lpa
name
Gourab  100       80   10
suman   120      100   14
sanoj     0        0    0
In [52]: students.iloc[[0,2,3,5]]
         iq  percent  lpa
name
Gourab  100       80   10
suman   120      100   14
pranav   80       50    2
hero      0        0    0
In [54]: movies.loc[0:2,'title_x':'poster_path']
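One detail worth keeping in mind when switching between the two indexers: a `loc` slice is by label and includes the stop label, while an `iloc` slice is by position and excludes the stop position. A minimal sketch on a four-row stand-in frame (the name `roster` is an assumption for illustration):

```python
import pandas as pd

# Stand-in frame using four of the student names from above.
roster = pd.DataFrame(
    {"iq": [100, 90, 120, 80]},
    index=["Gourab", "saurabh", "suman", "pranav"],
)

by_label = roster.loc["Gourab":"suman"]  # label slice: 'suman' is included
by_position = roster.iloc[0:2]           # position slice: position 2 is excluded

print(by_label)
print(by_position)
```

This is also why `movies.loc[0:2, ...]` returns three rows when the index is the default 0, 1, 2, ... range.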
Filtering a DataFrame
In [55]: # Find all the final winners
mask = ipl['MatchNumber'] == 'Final'
new_df = ipl[mask]
new_df[['Season','WinningTeam']]
ipl[ipl['MatchNumber'] == 'Final'][['Season','WinningTeam']]
ipl[ipl['SuperOver'] == 'Y'].shape[0]
Out[57]: 14
Out[58]: 5
(ipl[ipl['TossWinner'] == ipl['WinningTeam']].shape[0]/ipl.shape[0])*100
Out[59]: 51.473684210526315
In [60]: # movies with rating higher than 8 and votes > 10000
Out[60]: 43
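The cell that produced this count is missing from the export. One common way to apply two conditions at once is to combine boolean masks with `&`, wrapping each comparison in parentheses. The sketch below uses a small made-up frame named `films` (an assumption; only the column names `imdb_rating` and `imdb_votes` come from the `movies` data), so its count differs from the 43 shown above.

```python
import pandas as pd

# Made-up ratings and vote counts, for illustration only.
films = pd.DataFrame({
    "imdb_rating": [8.4, 7.1, 8.9, 8.2],
    "imdb_votes": [35000, 50000, 900, 12000],
})

# Each comparison needs its own parentheses because & binds tighter than >.
mask = (films["imdb_rating"] > 8) & (films["imdb_votes"] > 10000)
count = films[mask].shape[0]
print(count)
```

Use `|` for "or" and `~` to negate a mask; Python's `and`, `or` and `not` do not work element-wise on a Series.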
Out[61]: 33
movies['Country'] = 'India'
movies.head(2)
   title_x                   imdb_id    poster_path                                                                              wiki_link
0  Uri: The Surgical Strike  tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia
1  Battalion 609             tt9472208  NaN                                                                                      https://en.wikip
ipl.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int64
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: float64(1), int64(1), object(18)
memory usage: 148.6+ KB
In [66]: ipl.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int32
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: float64(1), int32(1), object(18)
memory usage: 144.9+ KB
In [68]: ipl.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int32
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null category
4 MatchNumber 950 non-null object
5 Team1 950 non-null category
6 Team2 950 non-null category
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: category(3), float64(1), int32(1), object(15)
memory usage: 127.4+ KB
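Across the three `ipl.info()` outputs above, memory usage falls from 148.6+ KB to 144.9+ KB and then to 127.4+ KB as `ID` becomes int32 and `Season`, `Team1` and `Team2` become category. The cells that made those changes are not in this export; a likely approach is `astype`, sketched here on a small stand-in frame (the frame `matches` and its values are assumptions).

```python
import pandas as pd

# Stand-in frame: an integer ID and a low-cardinality text column.
matches = pd.DataFrame({
    "ID": pd.Series(range(1000), dtype="int64"),
    "Season": ["2021", "2022"] * 500,
})

before = matches.memory_usage(deep=True).sum()

# Smaller integer type, and category for a column with few distinct values.
matches["ID"] = matches["ID"].astype("int32")
matches["Season"] = matches["Season"].astype("category")

after = matches.memory_usage(deep=True).sum()
print(before, after)
```

Category helps most when a column repeats a small set of values; downcasting integers is only safe when every value fits in the smaller type.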
Task
"https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"
Basic DataFrame
Consider the following Python dictionary data and Python list labels:
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
Q-1:
i. Create a DataFrame birds from the above dictionary data which has the index labels.
df1.info()
df1.describe()
<class 'pandas.core.frame.DataFrame'>
Index: 11 entries, a to k
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 birds 11 non-null object
1 age 9 non-null float64
2 visits 11 non-null int64
3 priority 10 non-null object
dtypes: float64(1), int64(1), object(2)
memory usage: 440.0+ bytes
Out[71]: age visits
        birds  age  visits  priority
c     plovers  1.5       3        no
e  spoonbills  6.0       3        no
g     plovers  5.5       2        no
i  spoonbills  8.0       3        no
Q-2:
i. Show only rows [1st, 3rd, 7th] from columns ['birds', 'age']
df1[['birds', 'age']].iloc[[0,2,6]]
     birds  age
a   Cranes  3.5
c  plovers  1.5
g  plovers  5.5
df1[df1.visits<4]
        birds  age  visits  priority
c     plovers  1.5       3        no
e  spoonbills  6.0       3        no
g     plovers  5.5       2        no
i  spoonbills  8.0       3        no
j  spoonbills  4.0       2        no
Q-3:
i. Select all rows with nan values in age and visits column.
df1[df1.age.isna() | df1.visits.isna()]
# Assign the result back; an inplace fillna on a selected column may not update df1
df1['age'] = df1['age'].fillna(df1['age'].mode()[0])
df1['visits'] = df1['visits'].fillna(df1['visits'].mode()[0])
Q-4
i. Find the total number of visits of the bird Cranes
iv. Drop duplicate rows and make this change permanent. Show the DataFrame after the
change.
df1[df1.birds == "Cranes"].visits.sum()
Out[77]: 14
df1.birds.value_counts()
Out[78]: birds
Cranes 5
spoonbills 4
plovers 2
Name: count, dtype: int64
df1.duplicated().sum()
Out[79]: 2
In [80]: #4 Drop Duplicates rows and make this changes permanent. Show dataframe after ch
df1.drop_duplicates(inplace=True)
You need to make changes accordingly. Consider the current name of each team.
In [82]: data.columns
Q-6: Write code that displays a bar chart of the top 5 teams that have played the
most matches in the IPL.
Hint: Be careful, the data is split across 2 different columns (Team1 and Team2).
Out[84]: Player_of_Match
SPD Smith 4
Name: count, dtype: int64
WinningTeam
Chennai Super Kings 17
Kolkata Knight Riders 9
Name: count, dtype: int64
Player_of_Match
RA Jadeja 3
Name: count, dtype: int64
Q-10: Find the average margin for the team Mumbai Indians in the 2011 season only.
In [87]: # code here
data[((data.Team1 == "Mumbai Indians") | (data.Team2 == "Mumbai Indians")) & (da
Out[87]: 19.25
In [88]: # value_counts
# sort_values
# rank
# sort index
# set index
# rename index -> rename
# reset index
# unique & nunique
# isnull/notnull/hasnans
# dropna
# fillna
# drop_duplicates
# drop
# apply
# isin
# corr
# nlargest -> nsmallest
# insert
# copy
marks = pd.DataFrame([
[100,80,10],
[90,70,7],
[120,100,14],
[80,70,14],
[80,70,14]
],columns = ['iq','marks','package'])
marks
    iq  marks  package
0  100     80       10
1   90     70        7
2  120    100       14
3   80     70       14
4   80     70       14
In [91]: marks.value_counts()
In [92]: a = pd.Series([1,1,1,2,2,2,3,3,4,4,5,6,7,8,8,9])
a.value_counts()
Out[92]: 1 3
2 3
3 2
4 2
8 2
5 1
6 1
7 1
9 1
Name: count, dtype: int64
        ID       City        Date  Season  MatchNumber  ...
0  1312200  Ahmedabad  2022-05-29    2022        Final  ...
1  1312199  Ahmedabad  2022-05-27    2022  Qualifier 2  ...
In [94]: # find which player has won most potm -> in finals and qualifiers
ipl[~ipl['MatchNumber'].str.isdigit()]['Player_of_Match'].value_counts()
Out[94]: Player_of_Match
KA Pollard 3
F du Plessis 3
SK Raina 3
A Kumble 2
MK Pandey 2
YK Pathan 2
M Vijay 2
JJ Bumrah 2
AB de Villiers 2
SR Watson 2
HH Pandya 1
Harbhajan Singh 1
A Nehra 1
V Sehwag 1
UT Yadav 1
MS Bisla 1
BJ Hodge 1
MEK Hussey 1
MS Dhoni 1
CH Gayle 1
MM Patel 1
DE Bollinger 1
AC Gilchrist 1
RG Sharma 1
DA Warner 1
MC Henriques 1
JC Buttler 1
RM Patidar 1
DA Miller 1
VR Iyer 1
SP Narine 1
RD Gaikwad 1
TA Boult 1
MP Stoinis 1
KS Williamson 1
RR Pant 1
SA Yadav 1
Rashid Khan 1
AD Russell 1
KH Pandya 1
KV Sharma 1
NM Coulter-Nile 1
Washington Sundar 1
BCJ Cutting 1
M Ntini 1
Name: count, dtype: int64
ipl['TossDecision'].value_counts().plot(kind = 'pie')
(ipl['Team1'].value_counts() + ipl['Team2'].value_counts()).sort_values(
ascending=False)
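One caveat with adding two `value_counts()` results: `+` aligns on the index, so a team that appears in only one of the two columns ends up as NaN. A hedged alternative is `Series.add` with `fill_value=0`, sketched here on a made-up fixture list (the frame `fixtures` and the single-letter team names are assumptions for illustration).

```python
import pandas as pd

# Made-up fixtures: team "A" only appears in Team1, "C" and "D" only in Team2.
fixtures = pd.DataFrame({
    "Team1": ["A", "A", "B"],
    "Team2": ["B", "C", "D"],
})

home = fixtures["Team1"].value_counts()
away = fixtures["Team2"].value_counts()

plain_sum = home + away                    # NaN where a team is missing on one side
total = home.add(away, fill_value=0)       # missing side is treated as 0
total = total.sort_values(ascending=False)
print(total)
```

For the top five teams, `total.head(5).plot(kind='bar')` would follow the same pattern as the pie chart above.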
In [97]: # sort_values(series and dataframe) -> ascending -> na_position -> inplace -> mu
x = pd.Series([12,14,1,56,89])
x
Out[97]: 0 12
1 14
2 1
3 56
4 89
dtype: int64
Out[98]: 4 89
3 56
1 14
0 12
2 1
dtype: int64
   title_x                   imdb_id    poster_path                                                                              wiki_link
0  Uri: The Surgical Strike  tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia
1  Battalion 609             tt9472208  NaN                                                                                      https://en.wikip
      title_x                   imdb_id    poster_path                                                                              wiki_link
939   Zor Lagaa Ke...Haiya!     tt1479857  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wi
670   Zindagi Tere Naam         tt2164702  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
778   Zindagi Na Milegi Dobara  tt1562872  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
...
1039  1971 (2007 film)          tt0983990  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
723   1920: The Evil Returns    tt2222550  https://upload.wikimedia.org/wikipedia/en/e/e7...  https://en.wi
287   1920: London              tt5638500  https://upload.wikimedia.org/wikipedia/en/thum...  https://e
1498  16 December (film)        tt0313844  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wiki
}
)
students
In [102… students.sort_values('name',na_position='first',ascending=False,inplace=True)
In [103… students
In [104… movies.sort_values(['year_of_release','title_x'],ascending=[True,False])
      title_x                               imdb_id     poster_path                                                                              wiki_link
1625  Yeh Zindagi Ka Safar                  tt0298607   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
1622  Yeh Teraa Ghar Yeh Meraa Ghar         tt0298606   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
1620  Yeh Raaste Hain Pyaar Ke              tt0292740   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
1573  Yaadein (2001 film)                   tt0248617   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wiki
...
37    Article 15 (film)                     tt10324144  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
46    Arjun Patiala                         tt7881524   https://upload.wikimedia.org/wikipedia/en/thum...  https://e
26    Albert Pinto Ko Gussa Kyun Aata Hai?  tt4355838   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
batsman = pd.read_csv("batsman_runs_ipl.csv")
batsman.head()
1 A Badoni 161
2 A Chandila 4
3 A Chopra 53
4 A Choudhary 25
marks = {
'maths':67,
'english':57,
'science':89,
'hindi':100
}
marks_series = pd.Series(marks)
marks_series
Out[107… maths 67
english 57
science 89
hindi 100
dtype: int64
In [108… marks_series.sort_index(ascending=False)
Out[108… science 89
maths 67
hindi 100
english 57
dtype: int64
In [109… movies.sort_index(ascending=False)
      title_x                               imdb_id    poster_path                                                                              wiki_link
1626  Sabse Bada Sukh                       tt0069204  NaN                                                                                      https://en.w
1625  Yeh Zindagi Ka Safar                  tt0298607  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
1624  Tera Mera Saath Rahen                 tt0301250  https://upload.wikimedia.org/wikipedia/en/2/2b...  https://en.wik
...
4     Evening Shadows                       tt6028796  NaN                                                                                      https://en.w
3     Why Cheat India                       tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.w
2     The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wik
1     Battalion 609                         tt9472208  NaN                                                                                      https://e
batsman.set_index('batter',inplace=True)
batsman.reset_index(inplace=True)
In [112… batsman
2 A Chandila 4 535.0
3 A Chopra 53 329.0
4 A Choudhary 25 402.5
batsman.reset_index().set_index('batting_rank')
batting_rank
535.0 2 A Chandila 4
329.0 3 A Chopra 53
402.5 4 A Choudhary 25
marks_series.reset_index()
Out[114… index 0
0 maths 67
1 english 57
2 science 89
3 hindi 100
movies.set_index('title_x',inplace=True)
In [118… movies.head(2)
title_x
In [119… # unique(series)
temp = pd.Series([1,1,2,2,3,3,4,4,5,5,np.nan,np.nan])
temp.unique()
In [120… ipl['Season'].unique().shape
Out[120… (15,)
In [121… # nunique(series + dataframe) -> does not count nan -> dropna parameter
# nunique will not count null or missing values but unique will.
ipl['Season'].nunique()
Out[121… 15
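The difference between the two only shows up when missing values are present, which `Season` does not have. A minimal sketch on a short Series with NaN (the name `readings` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

readings = pd.Series([1, 1, 2, np.nan, np.nan])

distinct_values = readings.unique()                   # array that includes nan
n_without_nan = readings.nunique()                    # NaN is not counted
n_with_nan = readings.nunique(dropna=False)           # NaN counted as one value

print(len(distinct_values), n_without_nan, n_with_nan)
```

So `len(s.unique())` and `s.nunique()` agree only when the Series has no missing values.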
students[~students['name'].isnull()]
In [123… students['name'][students['name'].isnull()]
Out[123… 3 NaN
5 NaN
7 NaN
9 NaN
Name: name, dtype: object
students['name'][students['name'].notnull()]
Out[124… 8 aditya
2 Suman
1 Saurabh
6 Sanoj
4 Pranav
0 Gourab
Name: name, dtype: object
In [125… # hasnans(series)
students['name'].hasnans
Out[125… True
In [126… students
In [127… students.isnull()
In [128… students.notnull()
students['name'].dropna()
Out[129… 8 aditya
2 Suman
1 Saurabh
6 Sanoj
4 Pranav
0 Gourab
Name: name, dtype: object
students['name'].fillna('unknown')
Out[133… 3 unknown
5 unknown
7 unknown
9 unknown
8 aditya
2 Suman
1 Saurabh
6 Sanoj
4 Pranav
0 Gourab
Name: name, dtype: object
In [134… students.fillna(0)
3 0 0 0 0.00 0.0
In [135… students['package'].fillna(students['package'].mean())
Out[135… 3 6.428571
5 7.000000
7 9.000000
9 6.428571
8 6.428571
2 6.000000
1 5.000000
6 8.000000
4 6.000000
0 4.000000
Name: package, dtype: float64
Out[136… 3 aditya
5 aditya
7 aditya
9 aditya
8 aditya
2 Suman
1 Saurabh
6 Sanoj
4 Pranav
0 Gourab
Name: name, dtype: object
Out[138… 0 1
3 2
5 3
7 4
8 5
10 6
12 7
14 8
17 9
19 0
dtype: int64
marks.drop_duplicates(keep = 'last')
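The `keep` argument controls which copy of a repeated row survives. A small sketch on a three-row stand-in frame (the name `frame` is an assumption; the rows mirror the duplicated rows of the `marks` frame defined earlier):

```python
import pandas as pd

# Rows 1 and 2 are identical.
frame = pd.DataFrame(
    [[100, 80, 10], [80, 70, 14], [80, 70, 14]],
    columns=["iq", "marks", "package"],
)

keep_first = frame.drop_duplicates()             # default keep='first'
keep_last = frame.drop_duplicates(keep="last")   # keeps the later copy
keep_none = frame.drop_duplicates(keep=False)    # drops every repeated row

print(keep_first.index.tolist(), keep_last.index.tolist(), keep_none.index.tolist())
```

The row contents are the same with `'first'` and `'last'`; only the surviving index label differs, which matters if the index carries meaning.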
        ID       City        Date  Season  MatchNumber  ...
0  1312200  Ahmedabad  2022-05-29    2022        Final  ...
1  1312199  Ahmedabad  2022-05-27    2022  Qualifier 2  ...
2  1312198    Kolkata  2022-05-25    2022   Eliminator  ...
3  1312197    Kolkata  2022-05-24    2022  Qualifier 1  ...
4  1304116     Mumbai  2022-05-22    2022           70  ...
5 rows × 21 columns
temp = pd.Series([10,2,3,16,45,78,10])
temp
Out[142… 0 10
1 2
2 3
3 16
4 45
5 78
6 10
dtype: int64
In [143… temp.drop(index=[0,6])
Out[143… 1 2
2 3
3 16
4 45
5 78
dtype: int64
In [144… students
In [146… students.drop(index=[0,8])
name
temp = pd.Series([10,20,30,40,50])
temp
Out[148… 0 10
1 20
2 30
3 40
4 50
dtype: int64
In [150… temp.apply(sigmoid)
Out[150… 0 1.000045
1 1.000000
2 1.000000
3 1.000000
4 1.000000
dtype: float64
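The definition of `sigmoid` is not shown in this section. The values in Out[150] are slightly above 1, which the standard logistic function never produces; that would be consistent with an operator-precedence slip such as `1/1 + exp(-x)`, though this is only a guess. A minimal sketch of the standard form, assuming that is what was intended:

```python
import numpy as np
import pandas as pd

def sigmoid(value):
    """Standard logistic function: 1 / (1 + e^(-value))."""
    return 1 / (1 + np.exp(-value))

temp = pd.Series([10, 20, 30, 40, 50])

# apply() calls sigmoid once per element; no result exceeds 1.
result = temp.apply(sigmoid)
```

Because `np.exp` is already vectorized, `sigmoid(temp)` would give the same values without `apply`.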
points_df
0 (3, 4) (-3, 4)
1 (-6, 5) (0, 0)
2 (0, 0) (2, 2)
4 (4, 5) (1, 1)
GroupBy
GroupBy is the study of groups. It is applied to a categorical column: rows that share a category value are collected into one group.
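A minimal sketch of the split, apply, combine idea. The `films` frame is a made-up stand-in for the movies data used in this notebook; only the column names mirror it.

```python
import pandas as pd

# Hypothetical stand-in for the movies data.
films = pd.DataFrame({
    'Genre': ['Drama', 'Crime', 'Drama', 'Crime', 'Horror'],
    'IMDB_Rating': [9.3, 9.2, 8.8, 9.0, 8.5],
    'Gross': [10, 20, 30, 40, 50],
})

# Split: one group per distinct value of the categorical column.
by_genre = films.groupby('Genre')

# Apply + combine: one aggregated value per group.
total_gross = by_genre['Gross'].sum()
best_rating = by_genre['IMDB_Rating'].max()
group_sizes = by_genre.size()   # number of rows in each group
```

Selecting the column before aggregating (`by_genre['Gross'].sum()`) avoids aggregating columns that are not needed.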
0  The Shawshank Redemption  1994  142  Drama  9.3  Frank Darabont        Tim Robbins
1  The Godfather             1972  175  Crime  9.2  Francis Ford Coppola  Marlon Brando
3  The Godfather: Part II    1974  202  Crime  9.0  Francis Ford Coppola  Al Pacino
Genre
InterstellarBack to the
Adventure 2014198520091981196819621959201319751963194819
FutureInglourious Bast...
Sen to Chihiro no
Animation kamikakushiThe Lion 2001199419882016201820172008199719952019200920
KingHota...
Schindler's
Biography ListGoodfellasHamiltonThe 1993199020202011200220171995198420182013201320
Intoucha...
GisaengchungLa vita è
Comedy 2019199719361931200919641940200120001973196019
bellaModern TimesCity Li...
The GodfatherThe
Crime Godfather: Part II12 Angry 1972197419571994200219991995199120192006199519
Me...
The Shawshank
Drama RedemptionFight 1994199919941975202019981946201420061998198819
ClubForrest Gump...
PsychoAlienThe ThingThe
Horror 1960197919821973196819612017197819332004200
ExorcistNight of the L...
MementoRear
Mystery WindowVertigoShutter 20001954195820102012199519721938198820121998199
IslandKahaani...
Series_Title Released_Yea
Genre
Il buono, il brutto, il
Western 196619681965197
cattivoOnce Upon a Tim...
In [157… genres.min()
Genre
Abhishek Aamir
Action 300 1924 45 7.6
Chaubey Khan
2001: A
Akira Aamir
Adventure Space 1925 88 7.6
Kurosawa Khan
Odyssey
Adam Adrian
Animation Akira 1940 71 7.6
Elliot Molina
Aamir Abhay
Drama 1917 1925 64 7.6
Khan Deol
E.T. the
Gene
Family Extra- 1971 100 7.8 Mel Stuart
Wilder
Terrestrial
Das
F.W. Max
Fantasy Cabinet des 1920 76 7.9
Murnau Schreck
Dr. Caligari
Alejandro Anthony
Horror Alien 1933 71 7.6
Amenábar Perkins
Bernard-
Alex
Mystery Dark City 1938 96 7.6 Pierre
Proyas
Donnadieu
Il buono, il
Clint Clint
Western brutto, il 1965 132 7.8
Eastwood Eastwood
cattivo
movies.groupby('Genre').sum()['Gross'].sort_values(ascending = False).head(3)
Out[160… Genre
Drama 3.540997e+10
Action 3.263226e+10
Comedy 1.566387e+10
Name: Gross, dtype: float64
movies.groupby('Genre')['Gross'].sum().sort_values(ascending = False).head(3)
Out[161… Genre
Drama 3.540997e+10
Action 3.263226e+10
Comedy 1.566387e+10
Name: Gross, dtype: float64
movies.groupby('Genre')['IMDB_Rating'].mean().sort_values(ascending = False).hea
Out[163… Genre
Western 8.35
Name: IMDB_Rating, dtype: float64
movies.groupby('Director')['No_of_Votes'].sum().sort_values(ascending = False).h
Out[164… Director
Christopher Nolan 11578345
Name: No_of_Votes, dtype: int64
movies.groupby('Genre')['IMDB_Rating'].max()
Out[165… Genre
Action 9.0
Adventure 8.6
Animation 8.6
Biography 8.9
Comedy 8.6
Crime 9.2
Drama 9.3
Family 7.8
Fantasy 8.1
Film-Noir 8.1
Horror 8.5
Mystery 8.4
Thriller 7.8
Western 8.8
Name: IMDB_Rating, dtype: float64
# movies['Star1'].value_counts()
movies.groupby('Star1')['Series_Title'].count().sort_values(ascending=False)
Out[166… Star1
Tom Hanks 12
Robert De Niro 11
Clint Eastwood 10
Al Pacino 10
Leonardo DiCaprio 9
..
Glen Hansard 1
Giuseppe Battiston 1
Giulietta Masina 1
Gerardo Taracena 1
Ömer Faruk Sorak 1
Name: Series_Title, Length: 660, dtype: int64
In [167… len(movies.groupby('Genre'))
Out[167… 14
In [168… movies['Genre'].nunique()
Out[168… 14
In [169… movies.groupby('Genre').size()
Out[169… Genre
Action 172
Adventure 72
Animation 82
Biography 88
Comedy 155
Crime 107
Drama 289
Family 2
Fantasy 2
Film-Noir 3
Horror 11
Mystery 12
Thriller 1
Western 4
dtype: int64
Lana L
14 The Matrix 1999 136 Action 8.7
Wachowski Wachow
Saving
Steven
24 Private 1998 169 Drama 8.6 Tom Han
Spielberg
Ryan
Ayla: The
54 Daughter 2017 125 Biography 8.4 Can Ulkay Erdem C
of War
Lee Adr
61 Coco 2017 105 Animation 8.4
Unkrich Mol
Dr.
Strangelove
Stanley Pe
78 or: How I 1964 95 Comedy 8.4
Kubrick Sell
Learned to
Stop Worr...
Lawrence Pe
116 1962 228 Adventure 8.3 David Lean
of Arabia O'Too
In [175… movies['Genre'].value_counts()
Out[175… Genre
Drama 289
Action 172
Comedy 155
Crime 107
Biography 88
Animation 82
Adventure 72
Mystery 12
Horror 11
Western 4
Film-Noir 3
Fantasy 2
Family 2
Thriller 1
Name: count, dtype: int64
In [176… genres.get_group('Horror')
49   Psycho                    1960  109  Horror  8.5  Alfred Hitchcock    Anthony Perkins
75   Alien                     1979  117  Horror  8.4  Ridley Scott        Sigourney Weaver
271  The Thing                 1982  109  Horror  8.1  John Carpenter      Kurt Russell
419  The Exorcist              1973  122  Horror  8.0  William Friedkin    Ellen Burstyn
544  Night of the Living Dead  1968   96  Horror  7.9  George A. Romero    Duane Jones
724  Get Out                   2017  104  Horror  7.7  Jordan Peele        Daniel Kaluuya
844  Halloween                 1978   91  Horror  7.7  John Carpenter      Donald Pleasence
876  The Invisible Man         1933   71  Horror  7.7  James Whale         Claude Rains
932  Saw                       2004  103  Horror  7.6  James Wan           Cary Elwes
948  The Others                2001  101  Horror  7.6  Alejandro Amenábar  Nicole Kidman
In [177… #genres.get_group('Fantasy')
movies[movies['Genre'] == 'Fantasy']
321  Das Cabinet des Dr. Caligari  1920  76  Fantasy  8.1  Robert Wiene  Werner Krauss
568  Nosferatu                     1922  94  Fantasy  7.9  F.W. Murnau   Max Schreck
In [178… genres.groups
Out[178… {'Action': [2, 5, 8, 10, 13, 14, 16, 29, 30, 31, 39, 42, 44, 55, 57, 59, 60, 6
3, 68, 72, 106, 109, 129, 130, 134, 140, 142, 144, 152, 155, 160, 161, 166, 16
8, 171, 172, 177, 181, 194, 201, 202, 216, 217, 223, 224, 236, 241, 262, 275, 2
94, 308, 320, 325, 326, 331, 337, 339, 340, 343, 345, 348, 351, 353, 356, 357,
362, 368, 369, 375, 376, 390, 410, 431, 436, 473, 477, 479, 482, 488, 493, 496,
502, 507, 511, 532, 535, 540, 543, 564, 569, 570, 573, 577, 582, 583, 602, 605,
608, 615, 623, ...], 'Adventure': [21, 47, 93, 110, 114, 116, 118, 137, 178, 17
9, 191, 193, 209, 226, 231, 247, 267, 273, 281, 300, 301, 304, 306, 323, 329, 3
61, 366, 377, 402, 406, 415, 426, 458, 470, 497, 498, 506, 513, 514, 537, 549,
552, 553, 566, 576, 604, 609, 618, 638, 647, 675, 681, 686, 692, 711, 713, 739,
755, 781, 797, 798, 851, 873, 884, 912, 919, 947, 957, 964, 966, 984, 991], 'An
imation': [23, 43, 46, 56, 58, 61, 66, 70, 101, 135, 146, 151, 158, 170, 197, 2
05, 211, 213, 219, 229, 230, 242, 245, 246, 270, 330, 332, 358, 367, 378, 386,
389, 394, 395, 399, 401, 405, 409, 469, 499, 510, 516, 518, 522, 578, 586, 592,
595, 596, 599, 633, 640, 643, 651, 665, 672, 694, 728, 740, 741, 744, 756, 758,
761, 771, 783, 796, 799, 822, 828, 843, 875, 891, 892, 902, 906, 920, 956, 971,
976, 986, 992], 'Biography': [7, 15, 18, 35, 38, 54, 102, 107, 131, 139, 147, 1
57, 159, 173, 176, 212, 215, 218, 228, 235, 243, 263, 276, 282, 290, 298, 317,
328, 338, 342, 346, 359, 360, 365, 372, 373, 385, 411, 416, 418, 424, 429, 484,
525, 536, 542, 545, 575, 579, 587, 600, 606, 614, 622, 632, 635, 644, 649, 650,
657, 671, 673, 684, 729, 748, 753, 757, 759, 766, 770, 779, 809, 810, 815, 820,
831, 849, 858, 877, 882, 897, 910, 915, 923, 940, 949, 952, 987], 'Comedy': [1
9, 26, 51, 52, 64, 78, 83, 95, 96, 112, 117, 120, 127, 128, 132, 153, 169, 183,
192, 204, 207, 208, 214, 221, 233, 238, 240, 250, 251, 252, 256, 261, 266, 277,
284, 311, 313, 316, 318, 322, 327, 374, 379, 381, 392, 396, 403, 413, 414, 417,
427, 435, 445, 446, 449, 455, 459, 460, 463, 464, 466, 471, 472, 475, 481, 490,
494, 500, 503, 509, 526, 528, 530, 531, 533, 538, 539, 541, 547, 557, 558, 562,
563, 565, 574, 591, 593, 594, 598, 613, 626, 630, 660, 662, 667, 679, 680, 683,
687, 701, ...], 'Crime': [1, 3, 4, 6, 22, 25, 27, 28, 33, 37, 41, 71, 77, 79, 8
6, 87, 103, 108, 111, 113, 123, 125, 133, 136, 162, 163, 164, 165, 180, 186, 18
7, 189, 198, 222, 232, 239, 255, 257, 287, 288, 299, 305, 335, 363, 364, 380, 3
84, 397, 437, 438, 441, 442, 444, 450, 451, 465, 474, 480, 485, 487, 505, 512,
519, 520, 523, 527, 546, 556, 560, 584, 597, 603, 607, 611, 621, 639, 653, 664,
669, 676, 695, 708, 723, 762, 763, 767, 775, 791, 795, 802, 811, 823, 827, 833,
885, 895, 921, 922, 926, 938, ...], 'Drama': [0, 9, 11, 17, 20, 24, 32, 34, 36,
40, 45, 50, 53, 62, 65, 67, 73, 74, 76, 80, 82, 84, 85, 88, 89, 90, 91, 92, 94,
97, 98, 99, 100, 104, 105, 121, 122, 124, 126, 138, 141, 143, 148, 149, 150, 15
4, 156, 167, 174, 175, 182, 184, 185, 188, 190, 195, 196, 199, 200, 203, 206, 2
10, 225, 227, 234, 237, 244, 248, 249, 253, 254, 258, 259, 260, 264, 265, 268,
269, 272, 274, 278, 279, 280, 283, 285, 286, 289, 291, 292, 293, 295, 296, 297,
302, 303, 307, 310, 312, 314, 315, ...], 'Family': [688, 698], 'Fantasy': [321,
568], 'Film-Noir': [309, 456, 712], 'Horror': [49, 75, 271, 419, 544, 707, 724,
844, 876, 932, 948], 'Mystery': [69, 81, 119, 145, 220, 393, 420, 714, 829, 89
9, 959, 961], 'Thriller': [700], 'Western': [12, 48, 115, 691]}
genres.agg({
    'Runtime' : 'mean',
    'IMDB_Rating' : 'mean',
    'No_of_Votes' : 'mean',
    'Gross' : 'sum',
    'Metascore' : 'min'
})
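The dictionary form above maps each column to one aggregation. A closely related option is named aggregation, which also lets you choose the output column names. The sketch below uses a made-up `films` frame, and the result names `mean_runtime` and `total_gross` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical miniature of the movies data.
films = pd.DataFrame({
    'Genre': ['Drama', 'Drama', 'Crime'],
    'Runtime': [100, 140, 175],
    'Gross': [5.0, 7.0, 11.0],
})

# Each keyword is an output column: (input column, aggregation).
summary = films.groupby('Genre').agg(
    mean_runtime=('Runtime', 'mean'),
    total_gross=('Gross', 'sum'),
)
```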
Genre
genres.apply(min)
Genre
Abhishek
Action 300 1924 45 Action 7.6
Chaubey
2001: A
Akira
Adventure Space 1925 88 Adventure 7.6
Kurosawa
Odyssey
Adam
Animation Akira 1940 71 Animation 7.6
Elliot
12 Years a Adam
Biography 1928 93 Biography 7.6
Slave McKay
12 Angry Akira
Crime 1931 80 Crime 7.6
Men Kurosawa
Aamir
Drama 1917 1925 64 Drama 7.6
Khan
E.T. the
Family Extra- 1971 100 Family 7.8 Mel Stuart
Terrestrial
Das
F.W.
Fantasy Cabinet des 1920 76 Fantasy 7.9
Murnau
Dr. Caligari
Shadow of Alfred H
Film-Noir 1941 100 Film-Noir 7.8
a Doubt Hitchcock
Alejandro
Horror Alien 1933 71 Horror 7.6
Amenábar
Alex
Mystery Dark City 1938 96 Mystery 7.6
Proyas
Do
Il buono, il
Clint
Western brutto, il 1965 132 Western 7.8
Eastwood E
cattivo
def foo(group):
    return group['Series_Title'].str.startswith('A').sum()
In [184… genres.apply(foo)
Out[184… Genre
Action 10
Adventure 2
Animation 2
Biography 9
Comedy 14
Crime 4
Drama 21
Family 0
Fantasy 0
Film-Noir 0
Horror 1
Mystery 0
Thriller 0
Western 0
dtype: int64
In [185… # find ranking of each movie in the group according to IMDB score
def rank_movie(group):
    group['genre_rank'] = group['IMDB_Rating'].rank(ascending=False)
    return group
In [186… genres.apply(rank_movie)
Genre
The Lord of
the Rings: Peter
5 2003 201 Action 8.9
The Return Jackson
of the King
Christopher
8 Inception 2010 148 Action 8.8
Nolan
The Lord of
the Rings:
Peter
10 The 2001 178 Action 8.8
Jackson
Fellowship
of the Ring
The Lord of
the Rings: Peter
13 2002 179 Action 8.7
The Two Jackson
Towers
Western 12 Il buono, il
Sergio
brutto, il 1966 161 Western 8.8
Leone
cattivo
Once Upon
Sergio
48 a Time in 1968 165 Western 8.5
Leone
the West
Per qualche
Sergio
115 dollaro in 1965 132 Western 8.3
Leone
più
def normal(group):
    group['norm_rating'] = (group['IMDB_Rating'] - group['IMDB_Rating'].min()) / (
        group['IMDB_Rating'].max() - group['IMDB_Rating'].min())
    return group
genres.apply(normal)
Genre
The Lord of
the Rings: Peter
5 2003 201 Action 8.9
The Return Jackson
of the King
Christopher
8 Inception 2010 148 Action 8.8
Nolan
The Lord of
the Rings:
Peter
10 The 2001 178 Action 8.8
Jackson
Fellowship
of the Ring
The Lord of
the Rings: Peter
13 2002 179 Action 8.7
The Two Jackson
Towers
Western 12 Il buono, il
Sergio
brutto, il 1966 161 Western 8.8
Leone
cattivo
Once Upon
Sergio
48 a Time in 1968 165 Western 8.5
Leone
the West
Per qualche
Sergio
115 dollaro in 1965 132 Western 8.3
Leone
più
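The per-group rank and the per-group min-max scaling can also be written without a custom function, using the group-wise `rank` and `transform` methods. This is an alternative sketch, not the notebook's approach, and the `films` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the movies data.
films = pd.DataFrame({
    'Genre': ['Action', 'Action', 'Action', 'Western', 'Western'],
    'IMDB_Rating': [8.9, 8.8, 8.7, 8.8, 8.3],
})

ratings = films.groupby('Genre')['IMDB_Rating']

# Rank within each genre; 1.0 is the highest-rated film of its genre.
films['genre_rank'] = ratings.rank(ascending=False)

# transform() broadcasts each group's min and max back onto the original rows,
# so the arithmetic below lines up row by row.
group_min = ratings.transform('min')
group_max = ratings.transform('max')
films['norm_rating'] = (films['IMDB_Rating'] - group_min) / (group_max - group_min)
```

A genre with a single film has equal min and max, so its normalized rating comes out as NaN (0 divided by 0) with either approach.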
duo = movies.groupby(['Director','Star1'])
duo
# Size
duo.size()
# get group
duo.get_group(('Aamir Khan','Amole Gupte'))
duo['Gross'].sum().sort_values(ascending = False).head(1)
movies.groupby(['Star1','Genre'])['Metascore'].mean().reset_index().sort_values(
'Metascore',ascending = False).head(1)
Merge
0 23 1
1 15 5
2 18 6
3 23 4
4 16 9
5 18 1
6 1 1
7 7 8
8 22 3
9 15 1
10 19 4
11 1 6
12 7 10
13 11 7
14 13 3
15 24 4
16 21 1
17 16 5
18 23 3
19 17 7
20 23 6
21 25 1
22 19 2
23 25 10
24 3 3
25 3 5
26 16 7
27 12 10
28 12 1
29 14 9
30 7 7
31 7 2
32 16 3
student_id course_id
33 17 10
34 11 8
35 14 6
36 12 5
37 12 7
38 18 8
39 1 10
40 1 9
41 2 5
42 7 6
43 22 5
44 22 6
45 23 9
46 23 5
47 14 4
48 14 1
49 11 10
50 42 9
51 50 8
52 38 1
In [197… pd.concat([nov,dec]).shape
Out[197… (53, 2)
multi = pd.concat([nov,dec],keys=['Nov','Dec'])
# iloc cannot look up the key 'Nov': it selects by integer position, not by label
multi.loc['Nov']
# loc selects by label
multi.loc['Dec']
0 3 5
1 16 7
2 12 10
3 12 1
4 14 9
5 7 7
6 7 2
7 16 3
8 17 10
9 11 8
10 14 6
11 12 5
12 12 7
13 18 8
14 1 10
15 1 9
16 2 5
17 7 6
18 22 5
19 22 6
20 23 9
21 23 5
22 14 4
23 14 1
24 11 10
25 42 9
26 50 8
27 38 1
multi.loc[('Dec',8)]
Out[200… student_id 17
course_id 10
Name: (Dec, 8), dtype: int64
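A compact sketch of the three `concat` variants used in this part, on made-up miniature versions of `nov` and `dec` (the values are illustrative only):

```python
import pandas as pd

# Hypothetical miniature registration tables.
nov = pd.DataFrame({'student_id': [23, 15], 'course_id': [1, 5]})
dec = pd.DataFrame({'student_id': [3, 16, 12], 'course_id': [5, 7, 10]})

# Plain vertical stack with a fresh 0..4 index.
stacked = pd.concat([nov, dec], ignore_index=True)

# keys= adds an outer index level, giving (key, original index) labels.
multi = pd.concat([nov, dec], keys=['Nov', 'Dec'])

dec_rows = multi.loc['Dec']        # every row under the outer label 'Dec'
one_row = multi.loc[('Dec', 2)]    # a single row, addressed by both levels
first_row = multi.iloc[0]          # iloc still works, but only by position
```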
pd.concat([nov,dec],axis = 1)
0 23.0 1.0 3 5
1 15.0 5.0 16 7
2 18.0 6.0 12 10
3 23.0 4.0 12 1
4 16.0 9.0 14 9
5 18.0 1.0 7 7
6 1.0 1.0 7 2
7 7.0 8.0 16 3
8 22.0 3.0 17 10
9 15.0 1.0 11 8
10 19.0 4.0 14 6
11 1.0 6.0 12 5
12 7.0 10.0 12 7
13 11.0 7.0 18 8
14 13.0 3.0 1 10
15 24.0 4.0 1 9
16 21.0 1.0 2 5
17 16.0 5.0 7 6
18 23.0 3.0 22 5
19 17.0 7.0 22 6
20 23.0 6.0 23 9
21 25.0 1.0 23 5
22 19.0 2.0 14 4
23 25.0 10.0 14 1
24 3.0 3.0 11 10
25 NaN NaN 42 9
26 NaN NaN 50 8
27 NaN NaN 38 1
0 1 Kailash Harjo 23 1
1 1 Kailash Harjo 23 6
2 1 Kailash Harjo 23 10
3 1 Kailash Harjo 23 9
4 2 Esha Butala 1 5
5 3 Parveen Bhalla 3 3
6 3 Parveen Bhalla 3 5
7 7 Tarun Thaker 9 8
8 7 Tarun Thaker 9 10
9 7 Tarun Thaker 9 7
10 7 Tarun Thaker 9 2
11 7 Tarun Thaker 9 6
12 11 David Mukhopadhyay 20 7
13 11 David Mukhopadhyay 20 8
14 11 David Mukhopadhyay 20 10
15 12 Radha Dutt 19 10
16 12 Radha Dutt 19 1
17 12 Radha Dutt 19 5
18 12 Radha Dutt 19 7
19 13 Munni Varghese 24 3
20 14 Pranab Natarajan 22 9
21 14 Pranab Natarajan 22 6
22 14 Pranab Natarajan 22 4
23 14 Pranab Natarajan 22 1
24 15 Preet Sha 16 5
25 15 Preet Sha 16 1
26 16 Elias Dodiya 25 9
27 16 Elias Dodiya 25 5
28 16 Elias Dodiya 25 7
29 16 Elias Dodiya 25 3
30 17 Yasmin Palan 7 7
31 17 Yasmin Palan 7 10
32 18 Fardeen Mahabir 13 6
33 18 Fardeen Mahabir 13 1
34 18 Fardeen Mahabir 13 8
35 19 Qabeel Raman 12 4
36 19 Qabeel Raman 12 2
37 21 Seema Kota 15 1
38 22 Yash Sethi 21 3
39 22 Yash Sethi 21 5
40 22 Yash Sethi 21 6
41 23 Chhavi Lachman 18 1
42 23 Chhavi Lachman 18 4
43 23 Chhavi Lachman 18 3
44 23 Chhavi Lachman 18 6
45 23 Chhavi Lachman 18 9
46 23 Chhavi Lachman 18 5
47 24 Radhika Suri 17 4
48 25 Shashank D’Alia 2 1
49 25 Shashank D’Alia 2 10
temp_df = pd.DataFrame({
    'student_id':[26,27,28],
    'name':['Gaurav','Saurav','Rohan'],
    'partner':[28,26,17]
})
students = pd.concat([students,temp_df],ignore_index=True)
50 42 NaN NaN 9
51 50 NaN NaN 8
52 38 NaN NaN 1
57 26 Gaurav 28 NaN
58 27 Saurav 26 NaN
59 28 Rohan 17 NaN
Out[208… 154247
Out[209… level_0
Dec 65072
Nov 89175
Name: price, dtype: int64
regs.merge(courses,on = 'course_id').groupby('course_name')['price'].sum().plot(
common_student_id = np.intersect1d(nov['student_id'],dec['student_id'])
common_student_id
In [214… students[students['student_id'].isin(common_student_id)]
0 1 Kailash Harjo 23
2 3 Parveen Bhalla 3
6 7 Tarun Thaker 9
10 11 David Mukhopadhyay 20
15 16 Elias Dodiya 25
16 17 Yasmin Palan 7
17 18 Fardeen Mahabir 13
21 22 Yash Sethi 21
22 23 Chhavi Lachman 18
# courses['course_id']
# regs['course_id']
course_id_list = np.setdiff1d(courses['course_id'],regs['course_id'])
courses[courses['course_id'].isin(course_id_list)]
10 11 Numpy 699
11 12 C++ 1299
In [217… # find students who did not enroll into any courses
student_id_list = np.setdiff1d(students['student_id'],regs['student_id'])
In [218… students[students['student_id'].isin(student_id_list)].shape[0]
Out[218… 10
In [219… (10/28)*100
Out[219… 35.714285714285715
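The "not enrolled" question is a set difference on the key column. A self-contained sketch with made-up `students` and `regs` tables (the values are illustrative, not the notebook's data):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature tables.
students = pd.DataFrame({'student_id': [1, 2, 3, 4], 'name': ['A', 'B', 'C', 'D']})
regs = pd.DataFrame({'student_id': [1, 1, 3, 9], 'course_id': [5, 6, 5, 7]})

# IDs present in students but absent from regs (sorted, unique).
not_enrolled_ids = np.setdiff1d(students['student_id'], regs['student_id'])

# Use the IDs as a filter, then express the count as a percentage.
not_enrolled = students[students['student_id'].isin(not_enrolled_ids)]
share = len(not_enrolled) / len(students) * 100
```

Computing the percentage from `len(...)` rather than typing the counts by hand keeps the figure correct when the data changes.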
In [220… # Print student name -> partner name for all enrolled students
# self join
25 Gaurav Rohan
26 Saurav Gaurav
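The code for the self join is not shown in this section. One way it could be written is to merge the table with itself, matching each row's `partner` to the other copy's `student_id`. The three rows and the suffix names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical closed set of three students who are each other's partners.
students = pd.DataFrame({
    'student_id': [26, 27, 28],
    'name': ['Gaurav', 'Saurav', 'Rohan'],
    'partner': [28, 26, 27],
})

# Self join: left copy is the student, right copy is the partner.
joined = students.merge(
    students,
    how='inner',
    left_on='partner',
    right_on='student_id',
    suffixes=('_student', '_partner'),
)
pairs = joined[['name_student', 'name_partner']]
```

With `how='inner'`, a student whose partner ID does not appear in the table is dropped from the result.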
regs.merge(students,on = 'student_id').groupby(['student_id','name'])['name'].co
In [223… # 10. find top 3 students who spent most amount of money on courses
pd.merge(students,regs,how='inner',on='student_id')
0 1 Kailash Harjo 23 1
1 1 Kailash Harjo 23 6
2 1 Kailash Harjo 23 10
3 1 Kailash Harjo 23 9
4 2 Esha Butala 1 5
5 3 Parveen Bhalla 3 3
6 3 Parveen Bhalla 3 5
7 7 Tarun Thaker 9 8
8 7 Tarun Thaker 9 10
9 7 Tarun Thaker 9 7
10 7 Tarun Thaker 9 2
11 7 Tarun Thaker 9 6
12 11 David Mukhopadhyay 20 7
13 11 David Mukhopadhyay 20 8
14 11 David Mukhopadhyay 20 10
15 12 Radha Dutt 19 10
16 12 Radha Dutt 19 1
17 12 Radha Dutt 19 5
18 12 Radha Dutt 19 7
19 13 Munni Varghese 24 3
20 14 Pranab Natarajan 22 9
21 14 Pranab Natarajan 22 6
22 14 Pranab Natarajan 22 4
23 14 Pranab Natarajan 22 1
24 15 Preet Sha 16 5
25 15 Preet Sha 16 1
26 16 Elias Dodiya 25 9
27 16 Elias Dodiya 25 5
28 16 Elias Dodiya 25 7
29 16 Elias Dodiya 25 3
30 17 Yasmin Palan 7 7
31 17 Yasmin Palan 7 10
32 18 Fardeen Mahabir 13 6
33 18 Fardeen Mahabir 13 1
34 18 Fardeen Mahabir 13 8
35 19 Qabeel Raman 12 4
36 19 Qabeel Raman 12 2
37 21 Seema Kota 15 1
38 22 Yash Sethi 21 3
39 22 Yash Sethi 21 5
40 22 Yash Sethi 21 6
41 23 Chhavi Lachman 18 1
42 23 Chhavi Lachman 18 4
43 23 Chhavi Lachman 18 3
44 23 Chhavi Lachman 18 6
45 23 Chhavi Lachman 18 9
46 23 Chhavi Lachman 18 5
47 24 Radhika Suri 17 4
48 25 Shashank D’Alia 2 1
49 25 Shashank D’Alia 2 10
In [228… (num_sixes/num_matches).sort_values(ascending=False).head(10)
Out[228… venue
Holkar Cricket Stadium 17.600000
M Chinnaswamy Stadium 13.227273
Sharjah Cricket Stadium 12.666667
Himachal Pradesh Cricket Association Stadium 12.000000
Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium 11.727273
Wankhede Stadium 11.526316
De Beers Diamond Oval 11.333333
Maharashtra Cricket Association Stadium 11.266667
JSCA International Stadium Complex 10.857143
Sardar Patel Stadium, Motera 10.833333
dtype: float64
In [229… temp_df.groupby(['season','batsman'])['batsman_runs'].sum().reset_index().sort_v
In [230… temp_df.groupby(['season','batsman'])['batsman_runs'].sum().reset_index().sort_v
58 2008 L Balaji 0
MultiIndex-Objects
Multi-indexing is a way of representing higher-dimensional data in a lower-dimensional
object. It does this by creating a hierarchy of index levels.
index_val = [('cse',2019),('cse',2020),('cse',2021),('cse',2022),
             ('ece',2019),('ece',2020),('ece',2021),('ece',2022)]
a = pd.Series([1,2,3,4,5,6,7,8],index=index_val)
a
In [234… a[('cse',2022)]
Out[234… 4
Out[237… 2019 1
2020 2
2021 3
2022 4
dtype: int64
When you build a MultiIndex, you represent higher-dimensional data in a lower-dimensional
object. A Series is one-dimensional, yet here a Series holds 2-D data. In the same way,
3-D, 5-D, or even 10-D data can be represented in a 2-D DataFrame.
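A sketch of the idea: a one-dimensional Series carrying branch-by-year data, and the round trip to a 2-D table and back. The level names `branch` and `year` are assumptions added for readability.

```python
import pandas as pd

# Two index levels: every (branch, year) combination.
index = pd.MultiIndex.from_product(
    [['cse', 'ece'], [2019, 2020, 2021, 2022]],
    names=['branch', 'year'],
)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index)

one_value = s[('cse', 2022)]   # both levels given: a single scalar
cse_only = s['cse']            # outer level only: a Series indexed by year

wide = s.unstack()             # inner level (year) becomes the columns
back = wide.stack()            # stack() moves the columns back into the index
```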
temp = s.unstack()
temp
cse 1 2 3 4
ece 5 6 7 8
In [239… # stack
temp.stack()
branch_df1
cse 2019 1 2
2020 3 4
2021 5 6
2022 7 8
ece 2019 9 10
2020 11 12
2021 13 14
2022 15 16
In [241… branch_df1['students']
branch_df2
2019 1 2 0 0
2020 3 4 0 0
2021 5 6 0 0
2022 7 8 0 0
In [243… branch_df2.loc[2019]
branch_df3 = pd.DataFrame(
    [
        [1,2,0,0],
        [3,4,0,0],
        [5,6,0,0],
        [7,8,0,0],
        [9,10,0,0],
        [11,12,0,0],
        [13,14,0,0],
        [15,16,0,0],
    ],
    index = multiindex,
    columns = pd.MultiIndex.from_product([['delhi','mumbai'],
                                          ['avg_package','students']])
)
branch_df3
cse 2019 1 2 0 0
2020 3 4 0 0
2021 5 6 0 0
2022 7 8 0 0
ece 2019 9 10 0 0
2020 11 12 0 0
2021 13 14 0 0
2022 15 16 0 0
branch_df1.unstack().unstack()
branch_df1.unstack().stack().stack()
In [247… branch_df2.unstack()
In [248… branch_df2.stack().stack()
branch_df3.head()
branch_df3.tail()
# Shape
branch_df3.shape
# info
branch_df3.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 8 entries, ('cse', 2019) to ('ece', 2022)
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (delhi, avg_package) 8 non-null int64
1 (delhi, students) 8 non-null int64
2 (mumbai, avg_package) 8 non-null int64
3 (mumbai, students) 8 non-null int64
dtypes: int64(4)
memory usage: 632.0+ bytes
Out[249… delhi mumbai
# When the index is labeled, we use loc
branch_df3.loc[('cse',2022)]
In [251… # multiple
branch_df3.loc[('cse',2019):('ece',2022):2]
cse 2019 1 2 0 0
2021 5 6 0 0
ece 2019 9 10 0 0
2021 13 14 0 0
branch_df3.iloc[0:5:2]
cse 2019 1 2 0 0
2021 5 6 0 0
ece 2019 9 10 0 0
branch_df3['delhi']['students']
In [254… branch_df3.iloc[:,1:3]
students avg_package
cse 2019 2 0
2020 4 0
2021 6 0
2022 8 0
ece 2019 10 0
2020 12 0
2021 14 0
2022 16 0
branch_df3.iloc[[0,4],[1,2]]
students avg_package
cse 2019 2 0
ece 2019 10 0
branch_df3.sort_index(ascending=False)
branch_df3.sort_index(ascending=[False,True])
branch_df3.sort_index(level=0,ascending=[False])
ece 2019 9 10 0 0
2020 11 12 0 0
2021 13 14 0 0
2022 15 16 0 0
cse 2019 1 2 0 0
2020 3 4 0 0
2021 5 6 0 0
2022 7 8 0 0
branch_df3.transpose()
delhi avg_package 1 3 5 7 9 11 13 15
students 2 4 6 8 10 12 14 16
mumbai avg_package 0 0 0 0 0 0 0 0
students 0 0 0 0 0 0 0 0
In [258… # swaplevel
branch_df3.swaplevel(axis=1)
cse 2019 1 2 0 0
2020 3 4 0 0
2021 5 6 0 0
2022 7 8 0 0
ece 2019 9 10 0 0
2020 11 12 0 0
2021 13 14 0 0
2022 15 16 0 0
Wide format is where we have a single row for every data point with multiple columns
to hold the values of various attributes.
Long format is where, for each data point we have as many rows as the number of
attributes and each row contains the value of a particular attribute for a given data point.
pd.DataFrame({'cse':[120]}).melt()
0 cse 120
0 cse 120
1 ece 100
2 mech 50
In [261… pd.DataFrame(
    {
        'branch':['cse','ece','mech'],
        '2020':[100,150,60],
        '2021':[120,130,80],
        '2022':[150,140,70]
    }
).melt(id_vars=['branch'],var_name='year',value_name='students')
2 mech 2020 60
5 mech 2021 80
8 mech 2022 70
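To make the wide versus long distinction concrete, this sketch melts the same small table and then pivots it back. The round trip with `pivot` is an addition for illustration and is not part of the notebook.

```python
import pandas as pd

wide = pd.DataFrame({
    'branch': ['cse', 'ece', 'mech'],
    '2020': [100, 150, 60],
    '2021': [120, 130, 80],
    '2022': [150, 140, 70],
})

# Wide -> long: 3 branches x 3 year columns become 9 rows.
long = wide.melt(id_vars=['branch'], var_name='year', value_name='students')

# Long -> wide again: one row per branch, one column per year.
restored = long.pivot(index='branch', columns='year', values='students')
```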
death = pd.read_csv("time_series_covid19_deaths_global.csv")
confirm = pd.read_csv("time_series_covid19_confirmed_global.csv")
Winter Olympics
311249 NaN 39.904200 116.407400 1/2/23 535
2022
0 Afghanistan 1/22/20 0 0
1 Albania 1/22/20 0 0
2 Algeria 1/22/20 0 0
3 Andorra 1/22/20 0 0
4 Angola 1/22/20 0 0
Pivot Table
The pivot table takes simple column-wise data as input, and groups the entries into a
two-dimensional table that provides a multidimensional summarization of the data.
In [268… df = sns.load_dataset('tips')
df.head()
In [269… df.groupby('sex')[['total_bill']].mean()
Out[269… total_bill
sex
Male 20.744076
Female 18.056897
In [270… df.groupby(['sex','smoker'])[['total_bill']].mean().unstack()
Out[270… total_bill
smoker Yes No
sex
sex
In [272… # aggfunc
sex
In [274… df
In [275… df.dtypes
sex
In [277… # Multidimensional
Out[277… size
sex smoker
5 rows × 23 columns
In [278… # margins
df.pivot_table(index='sex',columns='smoker',values='total_bill',
aggfunc='sum',margins=True)
sex
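Since most of the outputs in this part did not survive, here is a self-contained sketch showing that `pivot_table` gives the same table as `groupby` followed by `unstack`, and what `margins=True` adds. The `tips` rows are made up and only mimic the column names of the seaborn dataset.

```python
import pandas as pd

# Hypothetical rows shaped like the tips dataset.
tips = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Male'],
    'smoker': ['Yes', 'No', 'Yes', 'No', 'No'],
    'total_bill': [20.0, 10.0, 30.0, 16.0, 14.0],
})

# Two routes to the same sex-by-smoker table of mean bills.
via_groupby = tips.groupby(['sex', 'smoker'])['total_bill'].mean().unstack()
via_pivot = tips.pivot_table(index='sex', columns='smoker',
                             values='total_bill', aggfunc='mean')

# margins=True appends an 'All' row and column holding the totals.
with_totals = tips.pivot_table(index='sex', columns='smoker',
                               values='total_bill', aggfunc='sum',
                               margins=True)
```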
df = pd.read_csv("expense_data.csv")
df.head()
CUB -
3/2/2022
0 online Food NaN Brownie 50.0 Expense N
10:11
payment
CUB - To
3/2/2022
1 online Other NaN lended 300.0 Expense N
10:11
payment people
CUB -
3/1/2022
2 online Food NaN Dinner 78.0 Expense N
19:50
payment
CUB -
3/1/2022
3 online Transportation NaN Metro 30.0 Expense N
18:56
payment
CUB -
3/1/2022
4 online Food NaN Snacks 67.0 Expense N
18:22
payment
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 277 entries, 0 to 276
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 277 non-null datetime64[ns]
1 Account 277 non-null object
2 Category 277 non-null object
3 Subcategory 0 non-null float64
4 Note 273 non-null object
5 INR 277 non-null float64
6 Income/Expense 277 non-null object
7 Note.1 0 non-null float64
8 Amount 277 non-null float64
9 Currency 277 non-null object
10 Account.1 277 non-null float64
dtypes: datetime64[ns](1), float64(5), object(5)
memory usage: 23.9+ KB
In [285… df.pivot_table(index='month',columns='Account',values='INR'
,aggfunc='sum',fill_value=0).plot()
Pandas Strings
In [286… # What are vectorized operations?
a = np.array([1,2,3,4])
a * 4
s = ['cat','mat',None,'rat']
[i.startswith('c') for i in s]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[287], line 5
1 # problem in vectorized opertions in vanilla python
3 s = ['cat','mat',None,'rat']
----> 5 [i.startswith('c') for i in s]
s = pd.Series(['cat','mat',None,'rat'])
# string accessor
s.str.startswith('c')
Out[288… 0 True
1 False
2 None
3 False
dtype: object
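A result that contains a missing value cannot be used directly as a boolean filter. A minimal sketch of one way around that, using the `na` argument of `str.startswith` to decide how missing entries are treated:

```python
import pandas as pd

s = pd.Series(['cat', 'mat', None, 'rat'])

# na=False makes the result for missing entries explicitly False,
# so the mask is safe to use for boolean indexing.
mask = s.str.startswith('c', na=False)
selected = s[mask]
```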
df = pd.read_csv("titanic.csv")
df.head()
Out[289… PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket F
Braund,
A/5
0 1 0 3 Mr. Owen male 22.0 1 0 7.2
21171
Harris
Cumings,
Mrs. John
Bradley
1 2 1 1 female 38.0 1 0 PC 17599 71.2
(Florence
Briggs
Th...
Heikkinen,
STON/O2.
2 3 1 3 Miss. female 26.0 0 0 7.9
3101282
Laina
Futrelle,
Mrs.
Jacques
3 4 1 1 female 35.0 1 0 113803 53.1
Heath
(Lily May
Peel)
Allen, Mr.
4 5 0 3 William male 35.0 0 0 373450 8.0
Henry
df['Name'].str.lower()
df['Name'].str.upper()
df['Name'].str.capitalize()
df['Name'].str.title()
# len
df['Name'].str.len()
df['Name'].str.len().max()
df['Name'].str.len() == 82
df['Name'][df['Name'].str.len() == 82].values[0]
# Strip
" Gourab ".strip()
df['Name'].str.strip()
df['lastname'] = df['Name'].str.split(',').str.get(0)
df.head()
Out[291… PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket F
Braund,
A/5
0 1 0 3 Mr. Owen male 22.0 1 0 7.2
21171
Harris
Cumings,
Mrs. John
Bradley
1 2 1 1 female 38.0 1 0 PC 17599 71.2
(Florence
Briggs
Th...
Heikkinen,
STON/O2.
2 3 1 3 Miss. female 26.0 0 0 7.9
3101282
Laina
Futrelle,
Mrs.
Jacques
3 4 1 1 female 35.0 1 0 113803 53.1
Heath
(Lily May
Peel)
Allen, Mr.
4 5 0 3 William male 35.0 0 0 373450 8.0
Henry
df.head()
df['title'].value_counts()
Out[293… title
Mr. 517
Miss. 182
Mrs. 125
Master. 40
Dr. 7
Rev. 6
Mlle. 2
Major. 2
Col. 2
the 1
Capt. 1
Ms. 1
Sir. 1
Lady. 1
Mme. 1
Don. 1
Jonkheer. 1
Name: count, dtype: int64
In [294… # Replace
df['title'] = df['title'].str.replace('Ms.','Miss.')
df['title'] = df['title'].str.replace('Mlle.','Miss.')
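One detail worth making explicit: the patterns above contain a dot, which means "any character" when the pattern is read as a regular expression. Whether `str.replace` treats the pattern as a regex by default depends on the pandas version, so a cautious sketch passes `regex=False` to force a literal match. The `titles` Series is made up for illustration.

```python
import pandas as pd

titles = pd.Series(['Mr.', 'Ms.', 'Mlle.', 'Mrs.'])

# regex=False: 'Ms.' and 'Mlle.' are matched literally, dot included.
cleaned = (titles
           .str.replace('Ms.', 'Miss.', regex=False)
           .str.replace('Mlle.', 'Miss.', regex=False))
```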
In [295… df['title'].value_counts()
Out[295… title
Mr. 517
Miss. 185
Mrs. 125
Master. 40
Dr. 7
Rev. 6
Major. 2
Col. 2
Don. 1
Mme. 1
Lady. 1
Sir. 1
Capt. 1
the 1
Jonkheer. 1
Name: count, dtype: int64
In [296… # filtering
# startswith/endswith
df[df['firstname'].str.endswith('A')]
# isdigit/isalpha...
df[df['firstname'].str.isdigit()]
Out[296… PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin E
Out[297…
     PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket
0    1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599
2    3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282
5    6            0         3       Moran, Mr. James                                   male    NaN   0      0      330877
6    7            0         1       McCarthy, Mr. Timothy J                            male    54.0  0      0      17463
...  ...          ...       ...     ...                                                ...     ...   ...    ...    ...
884  885          0         3       Sutehall, Mr. Henry Jr                             male    25.0  0      0      SOTON/OQ 392076
887  888          1         1       Graham, Miss. Margaret Edith                       female  19.0  0      0      112053
888  889          0         3       Johnston, Miss. Catherine Helen "Carrie"           female  NaN   1      2      W./C. 6607
889  890          1         1       Behr, Mr. Karl Howell                              male    26.0  0      0      111369
890  891          0         3       Dooley, Mr. Patrick                                male    32.0  0      0      370376
In [298… # slicing
df['Name'].str[::-1]
Timestamp Object
A Timestamp references a particular moment in time (e.g., Oct
24th, 2022 at 7:00pm).
Creating Timestamp objects
pd.Timestamp('2023/03/05')
In [300… type(pd.Timestamp('2023/03/05'))
Out[300… pandas._libs.tslibs.timestamps.Timestamp
In [301… # Variation
pd.Timestamp('2023-1-5')
pd.Timestamp('2023, 1, 5')
import datetime as dt
dt.datetime(2023,1,5,9,21,56)
In [306… x = pd.Timestamp(dt.datetime(2023,1,5,9,21,56))
x
x.year
x.month
x.day
x.hour
x.minute
x.second
Out[307… 56
Python's datetime objects work, but performance takes a hit when working with huge
data, much like Python lists compared with NumPy arrays.
The weaknesses of Python's datetime format inspired the NumPy team to add a set of
native time series data types to NumPy.
The datetime64 dtype encodes dates as 64-bit integers, and thus allows arrays of
dates to be represented very compactly.
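To make the compact representation concrete, here is a minimal sketch using NumPy directly. The dates are illustrative values, not data from this notebook.

```python
import numpy as np

# Each element is stored as a 64-bit integer count of days since the epoch.
dates = np.array(['2023-01-05', '2023-01-06', '2023-01-09'], dtype='datetime64[D]')

# Vectorized arithmetic: add 7 days to every element at once, no Python loop.
shifted = dates + np.timedelta64(7, 'D')

# Differences between consecutive dates come back as timedelta64 values.
gaps = np.diff(dates)

print(dates.dtype)     # dtype of the array
print(dates.itemsize)  # bytes used per date
print(shifted)
print(gaps)
```

Because the whole array shares one dtype, operations run in compiled code rather than on individual Python objects.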
DatetimeIndex Object
A collection of pandas Timestamp objects.
pd.DatetimeIndex(['2023/1/1','2022/1/1','2021/1/1'])
Out[311… pandas.core.indexes.datetimes.DatetimeIndex
A Timestamp object is used to store a single date, while a DatetimeIndex is used to
store multiple dates.
pd.DatetimeIndex([dt.datetime(2023,1,1),dt.datetime(2022,1,1),
dt.datetime(2021,1,1)])
dt_index = pd.DatetimeIndex([pd.Timestamp(2023,1,1),
pd.Timestamp(2022,1,1),pd.Timestamp(2021,1,1)])
pd.Series([1,2,3],index=dt_index)
Out[314… 2023-01-01 1
2022-01-01 2
2021-01-01 3
dtype: int64
pd.date_range(start='2023/1/5',end='2023/2/28',freq='3D')  # every 3 days
pd.date_range(start='2023/1/5',end='2023/2/28',freq='B')  # business days (Mon-Fri)
pd.date_range(start='2023/1/5',end='2023/2/28',freq='W-MON')  # weekly, on Mondays
pd.date_range(start='2023/1/5',end='2023/2/28',freq='6H')  # every 6 hours (pandas 2.2+ prefers '6h')
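pd.date_range(start='2023/1/5',end='2023/2/28',freq='M')  # month end (pandas 2.2+ prefers 'ME')
pd.date_range(start='2023/1/5',end='2023/2/28',freq='MS')  # month start
pd.date_range(start='2023/1/5',end='2030/2/28',freq='A')  # year end (pandas 2.2+ prefers 'YE')
pd.date_range(start='2023/1/5',periods=25,freq='M')  # 25 month ends counted from the start date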
to_datetime function
Converts an existing object to a pandas Timestamp/DatetimeIndex object.
s = pd.Series(['2023/1/1','2022/1/1','2021/1/1'])
pd.to_datetime(s).dt.day_name() #dt.year,.month,.day,month_name...
Out[325… 0 Sunday
1 Saturday
2 Friday
dtype: object
s = pd.Series(['2023/1/1','2022/1/1','2021/130/1'])
pd.to_datetime(s,errors = 'coerce').dt.year
Out[326… 0 2023.0
1 2022.0
2 NaN
dtype: float64
In [327… df = pd.read_csv("expense_data.csv")
df.shape
dt accessor
Accessor object for datetimelike properties of the Series values.
In [329… df['Date'].dt.is_quarter_start
Out[329… 0 False
1 False
2 False
3 False
4 False
...
272 False
273 False
274 False
275 False
276 False
Name: Date, Length: 277, dtype: bool
df['day_name'] = df['Date'].dt.day_name()
In [332… df.groupby('day_name')['INR'].mean().plot(kind='bar')
In [334… df.groupby('month_name')['INR'].sum().plot(kind='bar')
In [335… df[df['Date'].dt.is_month_end]
7    2022-02-28 11:56:00  CUB - online payment  Food            NaN  Pizza             339.15  Expense
8    2022-02-28 11:45:00  CUB - online payment  Other           NaN  From kumara       200.00  Income
61   2022-01-31 08:44:00  CUB - online payment  Transportation  NaN  Vnr to apk         50.00  Expense
62   2022-01-31 08:27:00  CUB - online payment  Other           NaN  To vicky          200.00  Expense
63   2022-01-31 08:26:00  CUB - online payment  Transportation  NaN  To ksr station    153.00  Expense
242  2021-11-30 14:24:00  CUB - online payment  Gift            NaN  Bharath birthday  115.00  Expense
244  2021-11-30 10:11:00  CUB - online payment  Food            NaN  Breakfast          70.00  Expense
# date_range()
pd.date_range(start='2023-1-6',end='2023-1-31',freq='D')
# to_datetime()
s = pd.Series(['2023/1/6','2023/1/7','2023/1/7'])
pd.to_datetime(s).dt.day_name()
Out[336… 0 Friday
1 Saturday
2 Saturday
dtype: object
Timedelta Object
Represents a duration, the difference between two dates or times.
t2 - t1
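As a small illustration of the subtraction above, the sketch below defines two hypothetical timestamps (the values of `t1` and `t2` are assumptions chosen for illustration) and inspects the resulting Timedelta.

```python
import pandas as pd

# Two hypothetical moments in time (illustrative values only).
t1 = pd.Timestamp('2023-01-06 09:00:00')
t2 = pd.Timestamp('2023-01-08 19:35:00')

# Subtracting one Timestamp from another yields a pd.Timedelta.
delta = t2 - t1

print(type(delta))
print(delta)                  # duration as days and hh:mm:ss
print(delta.days)             # whole-day component
print(delta.total_seconds())  # full duration in seconds
```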
In [339… # Arithmetic
pd.Timestamp('6th jan 2023') + pd.Timedelta(days=2,hours=10,minutes=35)
In [340… pd.date_range(
start='2023-1-6',end='2023-1-31',freq='D') - pd.Timedelta(
days=2,hours=10,minutes=35)
0 5/24/98 2/5/99
1 4/22/92 3/6/98
2 2/10/91 8/26/92
3 7/21/92 11/20/97
4 9/2/93 6/10/98
In [345… df.columns
df['delivery_time_period'].mean()
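The cell that builds `delivery_time_period` is not shown here, so the following is only a sketch of one way such a column could be derived. The column names `order_date` and `delivery_date` are assumptions, as is reading the two columns of the five rows shown above in that order.

```python
import pandas as pd

# The five rows shown above, with assumed column names.
df = pd.DataFrame({
    'order_date': ['5/24/98', '4/22/92', '2/10/91', '7/21/92', '9/2/93'],
    'delivery_date': ['2/5/99', '3/6/98', '8/26/92', '11/20/97', '6/10/98'],
})

# Parse the month/day/two-digit-year strings into datetime64 columns.
df['order_date'] = pd.to_datetime(df['order_date'], format='%m/%d/%y')
df['delivery_date'] = pd.to_datetime(df['delivery_date'], format='%m/%d/%y')

# Subtracting two datetime columns gives a column of Timedelta values.
df['delivery_time_period'] = df['delivery_date'] - df['order_date']

print(df)
print(df['delivery_time_period'].mean())
```

Because the result is a timedelta column, aggregations such as `mean()` return a Timedelta rather than a plain number.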
Time series
A time series is a data set that tracks a sample over time. In particular, a time series
allows one to see what factors influence certain variables from period to period. Time
series analysis can be useful to see how a given asset, security, or economic variable
changes over time.
Examples
In [351… google.set_index('Date',inplace=True)
google.head()
Date
2004-08-19  49.813290  51.835709  47.800831  49.982655  49.982655  44871361  August
2004-08-20  50.316402  54.336334  50.062355  53.952770  53.952770  22942874  August
2004-08-23  55.168217  56.528118  54.321388  54.495735  54.495735  18342897  August
2004-08-24  55.412300  55.591629  51.591621  52.239197  52.239197  15319808  August
2004-08-25  52.284027  53.798351  51.746044  52.802086  52.802086   9232276  August
In [354… #challenge- fetch info for a particular date every year- limitation of timedelta
google.head()
google[google.index.isin(pd.date_range(
start='2005-1-6',end='2022-1-6',freq=pd.DateOffset(years=1)))]
Date
2005-01-06    97.175758    97.584229    93.509506    93.922951    93.922951  20852067
2006-01-06   227.581970   234.371521   225.773743   231.960556   231.960556  35646914
2009-01-06   165.868286   169.763687   162.585587   166.406265   166.406265  12898566
2010-01-06   311.761444   311.761444   302.047852   302.994293   302.994293   7987226
2011-01-06   304.199799   308.060303   303.885956   305.604523   305.604523   4131026
2012-01-06   328.344299   328.767700   323.681763   323.796326   323.796326   5405987
2014-01-06   554.426880   557.340942   551.154114   556.573853   556.573853   3551864
2015-01-06   513.589966   514.761719   499.678131   500.585632   500.585632   2899940
2016-01-06   730.000000   747.179993   728.919983   743.619995   743.619995   1947000
2017-01-06   795.260010   807.900024   792.203979   806.150024   806.150024   1640200
2020-01-06  1350.000000  1396.500000  1350.000000  1394.209961  1394.209961   1732300
2021-01-06  1702.630005  1748.000000  1699.000000  1735.290039  1735.290039   2602100
2022-01-06  2749.949951  2793.719971  2735.270020  2751.020020  2751.020020   1452500
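The "limitation of timedelta" mentioned in the challenge can be seen in a small sketch: a fixed 365-day step drifts after a leap year, whereas `pd.DateOffset(years=1)` is calendar-aware. The start date mirrors the one used above; the number of periods is an arbitrary choice for illustration.

```python
import pandas as pd

# Fixed-length steps: every step is exactly 365 days, regardless of leap years.
fixed = pd.date_range(start='2005-1-6', periods=5, freq='365D')

# Calendar-aware steps: every step lands on the same month and day.
calendar = pd.date_range(start='2005-1-6', periods=5, freq=pd.DateOffset(years=1))

for f, c in zip(fixed, calendar):
    print(f.date(), c.date())
```

The fixed-step range falls one day behind once it crosses the leap year 2008, which is why the lookup above uses `DateOffset`.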
In [356… google.loc['2021-12']['Close'].plot()
In [357… google.groupby('month_name')['Close'].mean().plot(kind='bar')
In [359… # frequency
google.index
In [360… # asfreq
google.asfreq('6H',method='bfill')
Date
2004-08-19 00:00:00    49.813290    51.835709    47.800831    49.982655    49.982655  44871361
2004-08-19 06:00:00    50.316402    54.336334    50.062355    53.952770    53.952770  22942874
2004-08-19 12:00:00    50.316402    54.336334    50.062355    53.952770    53.952770  22942874
2004-08-19 18:00:00    50.316402    54.336334    50.062355    53.952770    53.952770  22942874
2004-08-20 00:00:00    50.316402    54.336334    50.062355    53.952770    53.952770  22942874
...
2022-05-19 00:00:00  2236.820068  2271.750000  2209.360107  2214.909912  2214.909912   1459600
2022-05-19 06:00:00  2241.709961  2251.000000  2127.459961  2186.260010  2186.260010   1878100
2022-05-19 12:00:00  2241.709961  2251.000000  2127.459961  2186.260010  2186.260010   1878100
2022-05-19 18:00:00  2241.709961  2251.000000  2127.459961  2186.260010  2186.260010   1878100
2022-05-20 00:00:00  2241.709961  2251.000000  2127.459961  2186.260010  2186.260010   1878100
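The back-fill behaviour visible in the output above (new rows take the values of the next available observation) can be reproduced on a tiny series. The values and dates below are made up for illustration, and a daily frequency is used instead of six hours to keep the example short.

```python
import pandas as pd

# Two observations three days apart (illustrative values).
prices = pd.Series(
    [10.0, 13.0],
    index=pd.to_datetime(['2023-01-02', '2023-01-05']),
)

# Convert to a daily frequency; method='bfill' fills each new row
# with the next valid observation that follows it.
daily = prices.asfreq('D', method='bfill')

print(daily)
```

With `method='ffill'` the new rows would instead carry the previous observation forward.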
Resampling
Resampling involves changing the frequency of your time series observations.
Upsampling: Where you increase the frequency of the samples, such as from minutes to
seconds.
Downsampling: Where you decrease the frequency of the samples, such as from days to
months.
In [361… # Upsampling
google['Close'].resample('12H').interpolate(method='spline',order=2).plot()
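For the opposite direction, downsampling, a minimal sketch on synthetic data might look like the following. The series is generated here purely for illustration, and `'MS'` (month start) is used as the target frequency.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering January to March 2023 (values 0, 1, 2, ...).
idx = pd.date_range('2023-01-01', '2023-03-31', freq='D')
close = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Downsample from days to months by averaging all days within each month.
monthly_mean = close.resample('MS').mean()

print(monthly_mean)
```

Unlike upsampling, downsampling needs an aggregation (`mean`, `sum`, `max`, and so on) because many observations collapse into one.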
Rolling Window(Smoothing)
Time series data in original format can be quite volatile, especially on smaller
aggregation levels. The concept of rolling, or moving averages is a useful technique for
smoothing time series data.
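A minimal sketch of a moving average, using a short made-up series and an assumed window of 3 observations:

```python
import pandas as pd

# Short illustrative daily series with some volatility.
s = pd.Series(
    [10.0, 12.0, 9.0, 14.0, 11.0, 15.0],
    index=pd.date_range('2023-01-01', periods=6, freq='D'),
)

# Each value becomes the mean of itself and the two preceding observations.
# The first two positions are NaN because a full window is not yet available.
smooth = s.rolling(window=3).mean()

print(pd.DataFrame({'original': s, 'rolling_mean_3': smooth}))
```

A larger window gives a smoother curve but reacts more slowly to changes in the data.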
Shifting
The shift() function in Pandas shifts the entire series up or down by the
desired number of periods.
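As a small sketch with made-up values, shifting by a positive number moves values down (toward later rows) and a negative number moves them up:

```python
import pandas as pd

# Short illustrative daily series.
s = pd.Series(
    [100.0, 102.0, 101.0, 105.0],
    index=pd.date_range('2023-01-02', periods=4, freq='D'),
)

prev = s.shift(1)    # previous day's value on each row; first row becomes NaN
nxt = s.shift(-1)    # next day's value on each row; last row becomes NaN
change = s - prev    # day-over-day change

print(pd.DataFrame({'value': s, 'prev': prev, 'next': nxt, 'change': change}))
```

The index stays the same; only the values move, which is what makes row-wise comparisons with an earlier or later period straightforward.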
<class 'pandas.core.frame.DataFrame'>
Index: 208 entries, 2 to 1018
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 user_id 208 non-null int64
1 login_time 208 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(1)
memory usage: 4.9 KB
In [367… ax = df.plot(subplots=True,
layout=(3, 2),
sharex=False,
sharey=False,
linewidth=0.7,
fontsize=10,
legend=False,
figsize=(20,15))