Exercise - Pandas
Pandas DataFrame
When tabular data is loaded into pandas, pandas represents it as a DataFrame.
The whole table is called a DataFrame.
A single row or a single column is called a Series.
In [1]:
import numpy as np
import pandas as pd
Creating a dataframe
using a list
In [2]:
student_data = [
    [100,80,10],
    [90,70,7],
    [120,100,14],
    [80,50,2]
]
student_data
Out[2]:
[[100, 80, 10], [90, 70, 7], [120, 100, 14], [80, 50, 2]]
localhost:8888/notebooks/1. Python/4. Python libraries/2. Pandas/2. Pandas DateFrame Campus X/1. Pandas DataFrame.ipynb 1/52
5/30/23, 12:02 PM 1. Pandas DataFrame - Jupyter Notebook
In [3]:
pd.DataFrame(student_data)
# the index is generated automatically
Out[3]:
0 1 2
0 100 80 10
1 90 70 7
2 120 100 14
3 80 50 2
In [4]:
pd.DataFrame(student_data,columns=['iq','marks','package'])
Out[4]:
iq marks package
0 100 80 10
1 90 70 7
2 120 100 14
3 80 50 2
using a dictionary
In [5]:
student_dict = {
    'name':['nitish','ankit','rupesh','rishabh','amit','disha'],
    'iq':[100,90,120,80,0,0],
    'marks':[80,70,100,50,0,0],
    'package':[10,7,14,2,0,0]
}

students = pd.DataFrame(student_dict)
students
Out[5]:
name iq marks package
0 nitish 100 80 10
1 ankit 90 70 7
2 rupesh 120 100 14
3 rishabh 80 50 2
4 amit 0 0 0
5 disha 0 0 0
In [6]:
students.set_index('name', inplace=True)
students
Out[6]:
iq marks package
name
nitish 100 80 10
ankit 90 70 7
rupesh 120 100 14
rishabh 80 50 2
amit 0 0 0
disha 0 0 0
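The cell above changes students in place. A minimal sketch of the alternative, assuming a small made-up frame (students_demo is illustrative): without inplace=True, set_index returns a new DataFrame and leaves the original untouched, and reset_index moves the index back into a column.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
students_demo = pd.DataFrame({
    "name": ["nitish", "ankit"],
    "iq": [100, 90],
})

indexed = students_demo.set_index("name")  # returns a new DataFrame
restored = indexed.reset_index()           # 'name' becomes a column again

print(list(students_demo.columns))  # the original frame is unchanged
print(list(indexed.index))
print(list(restored.columns))
```

Assigning the result to a new name keeps the original frame available if the index choice turns out to be wrong.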
using read_csv()
In [7]:
movies = pd.read_csv('movies.csv')
movies
Out[7]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Uri:_The...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.org/wiki/Ba...
2     The Accidental Prime Minister (film)  tt6986710   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/The_Acc...
In [8]:
ipl = pd.read_csv('ipl-matches.csv')
ipl
Out[8]:
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  ...
...
shape
In [9]:
movies.shape
# returns a tuple: (rows, columns)
Out[9]:
(1629, 18)
In [10]:
ipl.shape
Out[10]:
(950, 20)
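Because shape is an ordinary tuple, it can be unpacked directly. A small sketch, assuming a made-up frame (demo, n_rows and n_cols are illustrative names):

```python
import pandas as pd

# Hypothetical 3 x 2 frame, used only for illustration.
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = demo.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(demo))             # len() gives the number of rows
```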
dtypes
In [11]:
movies.dtypes
# returns a Series holding the dtype of each column
Out[11]:
title_x object
imdb_id object
poster_path object
wiki_link object
title_y object
original_title object
is_adult int64
year_of_release int64
runtime object
genres object
imdb_rating float64
imdb_votes int64
story object
summary object
tagline object
actors object
wins_nominations object
release_date object
dtype: object
In [12]:
ipl.dtypes
Out[12]:
ID int64
City object
Date object
Season object
MatchNumber object
Team1 object
Team2 object
Venue object
TossWinner object
TossDecision object
SuperOver object
WinningTeam object
WonBy object
Margin float64
method object
Player_of_Match object
Team1Players object
Team2Players object
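Once the dtypes are known, columns can be picked by type. A minimal sketch, assuming a made-up frame whose column names only imitate the ipl data: select_dtypes keeps or drops columns according to their dtype.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
demo = pd.DataFrame({
    "City": ["Kolkata", "Mumbai"],
    "ID": [1, 2],
    "Margin": [7.0, 3.0],
})

numeric = demo.select_dtypes(include="number")      # int and float columns
non_numeric = demo.select_dtypes(exclude="number")  # everything else

print(list(numeric.columns))
print(list(non_numeric.columns))
```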
index
In [13]:
movies.index
Out[13]:
In [14]:
ipl.index
Out[14]:
columns
In [15]:
movies.columns
Out[15]:
In [16]:
ipl.columns
Out[16]:
values
In [17]:
movies.values
# returns a 2D NumPy array
Out[17]:
...
'4 wins', '11 January 2019 (USA)'],
['Battalion 609', 'tt9472208', nan, ...,
'Vicky Ahuja|Shoaib Ibrahim|Shrikant Kamat|Elena Kazan|Vishwas Kini|Major Kishore|Jashn Kohli|Rammy C. Pandey|Manish Sharma|Sparsh Sharma|Farnaz Shetty|Vikas Shrivastav|Chandraprakash Thakur|Brajesh Tiwari|',
nan, '11 January 2019 (India)'],
['The Accidental Prime Minister (film)', 'tt6986710',
'https://upload.wikimedia.org/wikipedia/en/thumb/a/a1/The_Accidental_Prime_Minister_film.jpg/220px-The_Accidental_Prime_Minister_film.jpg',
...,
'Anupam Kher|Akshaye Khanna|Aahana Kumra|Atul Sharma|Manoj Anand|Arjun Mathur|Suzanne Bernert|Abdul Quadir Amin|Bharat Mistri|Divya Seth|Anil Rastogi|Ramesh Bhatkar|Parrgash Kaur|Jess Kaur|',
nan, '11 January 2019 (USA)'],
...,
['Sabse Bada Sukh', 'tt0069204', nan, ...,
'Vijay Arora|Asrani|Rajni Bala|Kumud Damle|Utpal Dutt|Meeta Faiyyaz|Rabi Ghosh|Tarun Ghosh|Sanjeev Kumar|Keshto Mukherjee|Meena Rai|',
In [18]:
student_dict = {
    'name':['nitish','ankit','rupesh','rishabh','amit','disha'],
    'iq':[100,90,120,80,0,0],
    'marks':[80,70,100,50,0,0],
    'package':[10,7,14,2,0,0]
}

students = pd.DataFrame(student_dict)
students

students.set_index('name', inplace=True)
students
Out[18]:
iq marks package
name
nitish 100 80 10
ankit 90 70 7
rupesh 120 100 14
rishabh 80 50 2
amit 0 0 0
disha 0 0 0
In [19]:
students.values
Out[19]:
In [20]:
movies.head()
# returns the first five rows by default
Out[20]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Uri:_The_Surg...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.org/wiki/Battalion_...
2     The Accidental Prime Minister (film)  tt6986710   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/The_Accidenta...
In [21]:
movies.head(3)
Out[21]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.or...
2     The Accidental Prime Minister (film)  tt6986710   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/...
In [22]:
ipl.tail()
# returns the last five rows by default
Out[22]:
Kolkata
2008- Deccan Eden Deccan
945 335986 Kolkata 2007/08 4 Knight
04-20 Chargers Gardens Chargers
Riders
Royal
2008- Mumbai Wankhede Mumba
946 335985 Mumbai 2007/08 5 Challengers
04-20 Indians Stadium Indians
Bangalore
Punjab
In [23]:
ipl.tail(4)
Out[23]:
Royal
2008- Mumbai Wankhede Mumba
946 335985 Mumbai 2007/08 5 Challengers
04-20 Indians Stadium Indians
Bangalore
Punjab
Chennai Cricket
2008- Kings XI Chenna
948 335983 Chandigarh 2007/08 2 Super Association
04-19 Punjab Super Kings
Kings Stadium,
Mohali
sample
sample() returns randomly selected rows. It is useful when the data is ordered in some way, so that head() or tail() alone would give a biased view of it.
In [24]:
ipl.sample()
# picks a single row at random
Out[24]:
Chennai Feroz
2013- Delhi Chennai
604 598020 Delhi 2013 24 Super Shah
04-18 Daredevils Super Kings
Kings Kotla
In [27]:
ipl.sample(3)
Out[27]:
Royal M
2016- Gujarat
376 981013 Bangalore 2016 Qualifier 1 Challengers Chinnaswamy C
05-24 Lions
Bangalore Stadium
Kolkata
2015- Delhi Eden
452 829761 Kolkata 2015 42 Knight
05-07 Daredevils Gardens
Riders
In [28]:
movies.sample(5)
Out[28]:
      title_x                               imdb_id     poster_path                                        wiki_link
65    Bypass Road (film)                    tt9176260   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Bypass_Roa...
1534  Kyaa Dil Ne Kahaa                     tt0327005   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Kyaa_Dil_Ne...
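Each call to sample() returns different rows. A minimal sketch of how to make the draw repeatable, assuming a made-up ten-row frame (demo is illustrative): the random_state parameter fixes the seed, and frac draws a fraction of the rows instead of a fixed count.

```python
import pandas as pd

# Hypothetical frame with ten rows, used only for illustration.
demo = pd.DataFrame({"match": range(10)})

first = demo.sample(3, random_state=42)   # three random rows, fixed seed
second = demo.sample(3, random_state=42)  # same seed, so the same rows
half = demo.sample(frac=0.5, random_state=0)  # half of the rows

print(first.equals(second))
print(len(half))
```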
info
In [29]:
movies.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1629 entries, 0 to 1628
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 title_x 1629 non-null object
1 imdb_id 1629 non-null object
2 poster_path 1526 non-null object
3 wiki_link 1629 non-null object
4 title_y 1629 non-null object
5 original_title 1629 non-null object
6 is_adult 1629 non-null int64
7 year_of_release 1629 non-null int64
8 runtime 1629 non-null object
9 genres 1629 non-null object
10 imdb_rating 1629 non-null float64
11 imdb_votes 1629 non-null int64
12 story 1609 non-null object
13 summary 1629 non-null object
14 tagline 557 non-null object
In [30]:
ipl.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int64
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
describe
In [31]:
ipl.describe()
Out[31]:
ID Margin
In [32]:
movies.describe()
Out[32]:
isnull
In [33]:
ipl.isnull()
Out[33]:
ID City Date Season MatchNumber Team1 Team2 Venue TossWinner TossDecision Super
0 False False False False False False False False False False F
1 False False False False False False False False False False F
2 False False False False False False False False False False F
3 False False False False False False False False False False F
4 False False False False False False False False False False F
... ... ... ... ... ... ... ... ... ... ...
945 False False False False False False False False False False F
946 False False False False False False False False False False F
947 False False False False False False False False False False F
948 False False False False False False False False False False F
In [35]:
ipl.isnull().sum()
Out[35]:
ID 0
City 51
Date 0
Season 0
MatchNumber 0
Team1 0
Team2 0
Venue 0
TossWinner 0
TossDecision 0
SuperOver 4
WinningTeam 4
WonBy 0
Margin 18
method 931
Player_of_Match 4
Team1Players 0
Team2Players 0
In [36]:
movies.isnull()
Out[36]:
In [38]:
movies.isnull().sum()
Out[38]:
title_x 0
imdb_id 0
poster_path 103
wiki_link 0
title_y 0
original_title 0
is_adult 0
year_of_release 0
runtime 0
genres 0
imdb_rating 0
imdb_votes 0
story 20
summary 0
tagline 1072
actors 5
wins_nominations 922
release_date 107
dtype: int64
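Raw counts are easier to judge as a share of all rows. A minimal sketch, assuming a made-up frame (demo and its values are illustrative): the mean of the boolean isnull() result is the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing City values out of four rows.
demo = pd.DataFrame({
    "City": ["Kolkata", np.nan, "Mumbai", np.nan],
    "ID": [1, 2, 3, 4],
})

missing_counts = demo.isnull().sum()           # missing values per column
missing_percent = demo.isnull().mean() * 100   # same, as a percentage

print(missing_counts)
print(missing_percent)
```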
duplicated
In [39]:
ipl.duplicated()
Out[39]:
0 False
1 False
2 False
3 False
4 False
...
945 False
946 False
947 False
948 False
949 False
Length: 950, dtype: bool
In [40]:
ipl.duplicated().sum()
Out[40]:
In [41]:
movies.duplicated()
Out[41]:
0 False
1 False
2 False
3 False
4 False
...
1624 False
1625 False
1626 False
1627 False
1628 False
Length: 1629, dtype: bool
In [42]:
movies.duplicated().sum()
# to get the total number of duplicated rows
Out[42]:
In [43]:
student_dict = {
    'iq':[100,90,120,80,0,0],
    'marks':[80,70,100,50,0,0],
    'package':[10,7,14,2,0,0]
}

students = pd.DataFrame(student_dict)
students
Out[43]:
iq marks package
0 100 80 10
1 90 70 7
2 120 100 14
3 80 50 2
4 0 0 0
5 0 0 0
In [44]:
students.duplicated().sum()
Out[44]:
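A small sketch of how duplicated() marks rows, assuming a made-up frame with one repeated row (students_demo is illustrative): the first occurrence is kept as False, later copies are flagged True, and drop_duplicates removes the flagged rows.

```python
import pandas as pd

# Hypothetical frame whose last two rows are identical.
students_demo = pd.DataFrame({
    "iq": [100, 90, 0, 0],
    "marks": [80, 70, 0, 0],
})

flags = students_demo.duplicated()              # True for repeated rows
n_duplicates = flags.sum()                      # number of repeated rows
deduplicated = students_demo.drop_duplicates()  # keeps first occurrences

print(list(flags))
print(n_duplicates)
print(len(deduplicated))
```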
rename
In [45]:
students
# we want percentage instead of marks
# and LPA instead of package
Out[45]:
iq marks package
0 100 80 10
1 90 70 7
2 120 100 14
3 80 50 2
4 0 0 0
5 0 0 0
In [46]:
students.rename(columns={"marks":"percent","package":"LPA"})
# we pass a dictionary to the columns parameter of rename
# this change is not permanent
# for a permanent change, pass inplace=True
Out[46]:
iq percent LPA
0 100 80 10
1 90 70 7
2 120 100 14
3 80 50 2
4 0 0 0
5 0 0 0
In [47]:
students.rename(index={0:"zero",1:"first",2:"second",
                       3:"third",4:"fourth",5:"fifth"})
Out[47]:
iq marks package
zero 100 80 10
first 90 70 7
second 120 100 14
third 80 50 2
fourth 0 0 0
fifth 0 0 0
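Instead of inplace=True, the renamed result can be assigned to a name. A minimal sketch, assuming a made-up frame (students_demo and renamed are illustrative names):

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
students_demo = pd.DataFrame({"marks": [80, 70], "package": [10, 7]})

# rename returns a new DataFrame; the original keeps its column names.
renamed = students_demo.rename(columns={"marks": "percent", "package": "LPA"})

print(list(students_demo.columns))
print(list(renamed.columns))
```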
Math Methods
In [48]:
student_dict = {
    'iq':[100,90,120,80,0,0],
    'marks':[80,70,100,50,0,0],
    'package':[10,7,14,2,0,0]
}

students = pd.DataFrame(student_dict)
students
Out[48]:
iq marks package
0 100 80 10
1 90 70 7
2 120 100 14
3 80 50 2
4 0 0 0
5 0 0 0
sum
In [49]:
students.sum()
Out[49]:
iq 390
marks 300
package 33
dtype: int64
In [50]:
students.sum(axis=1)
Out[50]:
0 190
1 167
2 234
3 132
4 0
5 0
dtype: int64
In [51]:
C:\Users\gadha\AppData\Local\Temp\ipykernel_2036\3281507241.py:3: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
movies.sum()
Out[51]:
In [48]:
ipl.sum()
C:\Users\gadha\AppData\Local\Temp\ipykernel_4404\599940423.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
ipl.sum()
Out[48]:
ID 788960985
Date 2022-05-292022-05-272022-05-252022-05-242022-0...
Season 2022202220222022202220222022202220222022202220...
MatchNumber FinalQualifier 2EliminatorQualifier 1706968676...
Team1 Rajasthan RoyalsRoyal Challengers BangaloreRoy...
Team2 Gujarat TitansRajasthan RoyalsLucknow Super Gi...
Venue Narendra Modi Stadium, AhmedabadNarendra Modi ...
TossWinner Rajasthan RoyalsRajasthan RoyalsLucknow Super ...
TossDecision batfieldfieldfieldbatfieldbatbatbatfieldfieldb...
WonBy WicketsWicketsRunsWicketsWicketsWicketsWickets...
Margin 15897.0
Team1Players ['YBK Jaiswal', 'JC Buttler', 'SV Samson', 'D ...
In [49]:
ipl.sum(axis=1)
# row-wise
C:\Users\gadha\AppData\Local\Temp\ipykernel_4404\3452818570.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
ipl.sum(axis=1)
Out[49]:
0 1312207.0
1 1312206.0
2 1312212.0
3 1312204.0
4 1304121.0
...
945 335991.0
946 335990.0
947 335993.0
948 336016.0
949 336122.0
Length: 950, dtype: float64
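The FutureWarning above appears because the frame contains non-numeric columns. A minimal sketch of one way to avoid it, assuming a made-up frame whose column names only imitate the ipl data: passing numeric_only=True restricts the reduction to numeric columns.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns.
demo = pd.DataFrame({
    "Team1": ["Rajasthan Royals", "Gujarat Titans"],
    "ID": [1, 2],
    "Margin": [7.0, 3.0],
})

column_totals = demo.sum(numeric_only=True)       # one total per numeric column
row_totals = demo.sum(axis=1, numeric_only=True)  # one total per row

print(column_totals)
print(row_totals)
```

Selecting the numeric columns first and then calling sum() is an equivalent approach.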
mean
In [50]:
students.mean()
Out[50]:
iq 65.0
marks 50.0
package 5.5
dtype: float64
In [51]:
students.mean(axis=1)
# row-wise mean
Out[51]:
0 63.333333
1 55.666667
2 78.000000
3 44.000000
4 0.000000
5 0.000000
dtype: float64
median
In [52]:
students.median()
Out[52]:
iq 85.0
marks 60.0
package 4.5
dtype: float64
var
In [55]:
students.var()
Out[55]:
iq 2710.0
marks 1760.0
package 33.5
dtype: float64
In [56]:
students.var(axis=1)
# row-wise variance
Out[56]:
0 2233.333333
1 1876.333333
2 3172.000000
3 1548.000000
4 0.000000
5 0.000000
dtype: float64
std
In [57]:
students.std()
Out[57]:
iq 52.057660
marks 41.952354
package 5.787918
dtype: float64
min
In [58]:
students.min()
Out[58]:
iq 0
marks 0
package 0
dtype: int64
In [59]:
students.min(axis=1)
Out[59]:
0 10
1 7
2 14
3 2
4 0
5 0
dtype: int64
max
In [60]:
students.max()
Out[60]:
iq 120
marks 100
package 14
dtype: int64
In [61]:
students.max(axis=1)
Out[61]:
0 100
1 90
2 120
3 80
4 0
5 0
dtype: int64
single column
In [62]:
movies.head(2)
Out[62]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/U...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.org/w...
In [63]:
movies['title_x']
Out[63]:
In [64]:
type(movies['title_x'])
# a single fetched column is a Series
Out[64]:
pandas.core.series.Series
multiple columns
In [65]:
movies[['title_x','year_of_release','actors']]
# we get a DataFrame because multiple columns are selected
Out[65]:
   title_x          year_of_release  actors
3  Why Cheat India  2019             Emraan Hashmi|Shreya Dhanwanthary|Snighdadeep ...
In [66]:
Out[66]:
   year_of_release  title_x          actors
3  2019             Why Cheat India  Emraan Hashmi|Shreya Dhanwanthary|Snighdadeep ...
In [67]:
ipl.head(2)
Out[67]:
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  ...
In [70]:
ipl[['Team1','Team2','WinningTeam']]
Out[70]:
949 Royal Challengers Bangalore Kolkata Knight Riders Kolkata Knight Riders
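The type of the result depends on the brackets, not on how many columns are named. A minimal sketch, assuming a made-up frame (demo is illustrative): single brackets with one name give a Series, while a list of names gives a DataFrame, even when the list holds a single name.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({"Team1": ["A", "B"], "Team2": ["C", "D"]})

as_series = demo["Team1"]   # single brackets: Series
as_frame = demo[["Team1"]]  # list of names: DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__, as_frame.shape)
```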
1. iloc
single row
In [71]:
movies.head(3)
Out[71]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.or...
2     The Accidental Prime Minister (film)  tt6986710   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/...
In [72]:
Out[72]:
In [73]:
type(movies.iloc[0])
Out[73]:
pandas.core.series.Series
multiple rows
In [74]:
movies.iloc[0:5]
# this returns a DataFrame
Out[74]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Uri:_The_Surg...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.org/wiki/Battalion_...
2     The Accidental Prime Minister (film)  tt6986710   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/The_Accidenta...
In [75]:
Out[75]:
      title_x                               imdb_id     poster_path                                        wiki_link
9     Thackeray (film)                      tt7777196   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Thackeray_(f...
fancy indexing
In [76]:
movies.iloc[[0,4,7,8]]
# we pass a list of the row positions we want
Out[76]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Uri:_The_Su...
4     Evening Shadows                       tt6028796   NaN                                                https://en.wikipedia.org/wiki/Evening_Sh...
2. loc
When a range (slice) is given to loc to fetch multiple items, the last label of the range is included,
whereas in iloc the last position of the range is not included.
In [77]:
student_dict = {
    'name':['nitish','ankit','rupesh','rishabh','amit','himanshu'],
    'iq':[100,90,120,80,0,0],
    'marks':[80,70,100,50,0,0],
    'package':[10,7,14,2,0,0]
}

students = pd.DataFrame(student_dict)
students.set_index('name',inplace=True)
# we are setting the name column as our index
students
Out[77]:
iq marks package
name
nitish 100 80 10
ankit 90 70 7
rupesh 120 100 14
rishabh 80 50 2
amit 0 0 0
himanshu 0 0 0
single row
In [78]:
students.loc['rupesh']
Out[78]:
iq 120
marks 100
package 14
Name: rupesh, dtype: int64
multiple rows
In [79]:
students.loc['nitish':'rishabh']
Out[79]:
iq marks package
name
nitish 100 80 10
ankit 90 70 7
rishabh 80 50 2
In [80]:
students.loc['nitish':'rishabh':2]
# returns alternate rows because the step value is 2
Out[80]:
iq marks package
name
nitish 100 80 10
rupesh 120 100 14
fancy indexing
In [81]:
students.loc[['nitish','rupesh','himanshu']]
Out[81]:
iq marks package
name
nitish 100 80 10
rupesh 120 100 14
himanshu 0 0 0
Note: we can also use iloc on the students data even though the name column is the index, because iloc
selects rows by integer position regardless of the index labels, so here 0 and nitish point to the same row.
In [82]:
students.iloc[[1,3,5]]
Out[82]:
iq marks package
name
ankit 90 70 7
rishabh 80 50 2
himanshu 0 0 0
In [83]:
movies.iloc[0:3,0:3]
Out[83]:
In [84]:
movies.loc[0:2,'title_x':'poster_path']
Out[84]:
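The two cells above return the same three rows and three columns, because iloc excludes the end of a slice while loc includes it. A minimal sketch of that difference on a made-up frame with a name index (students_demo is illustrative):

```python
import pandas as pd

# Hypothetical frame indexed by name, used only for illustration.
students_demo = pd.DataFrame(
    {"iq": [100, 90, 120, 80]},
    index=["nitish", "ankit", "rupesh", "rishabh"],
)

by_position = students_demo.iloc[0:2]            # positions 0 and 1, end excluded
by_label = students_demo.loc["nitish":"rupesh"]  # end label 'rupesh' included

print(list(by_position.index))
print(list(by_label.index))
```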
Filtering a DataFrame
In [85]:
ipl.head()
Out[85]:
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  ...
...
In [86]:
# the final match of each season is labelled "Final" in the
# MatchNumber column, so we test for that first

ipl["MatchNumber"] == "Final"
# this comparison returns a boolean Series
Out[86]:
0 True
1 False
2 False
3 False
4 False
...
945 False
946 False
947 False
948 False
949 False
Name: MatchNumber, Length: 950, dtype: bool
In [87]:
ipl[ipl["MatchNumber"] == "Final"]
# returns the data of the final matches only
Out[87]:
     ID       City       Date        Season   MatchNumber  Team1                Team2                  Venue                                ...
0    1312200  Ahmedabad  2022-05-29  2022     Final        Rajasthan Royals     Gujarat Titans         Narendra Modi Stadium, Ahmedabad     ...
74   1254117  Dubai      2021-10-15  2021     Final        Chennai Super Kings  Kolkata Knight Riders  Dubai International Cricket Stadium  ...
134  1237181  NaN        2020-11-10  2020/21  Final        Delhi Capitals       Mumbai Indians         Dubai International Cricket Stadium  ...
...
In [88]:
new_df = ipl[ipl["MatchNumber"] == "Final"]
In [89]:
new_df[["Season","WinningTeam"]]
# fancy indexing: we pass a list of column names
Out[89]:
Season WinningTeam
In [90]:
Out[90]:
Season WinningTeam
In [91]:
ipl["SuperOver"] == "Y"
Out[91]:
0 False
1 False
2 False
3 False
4 False
...
945 False
946 False
947 False
948 False
949 False
Name: SuperOver, Length: 950, dtype: bool
In [92]:
ipl[ipl["SuperOver"] == "Y"]
Out[92]:
MA
Chidambaram
2021- Delhi Sunrisers De
114 1254077 Chennai 2021 20 Stadium,
04-25 Capitals Hyderabad Capita
Chepauk,
Chennai
Kolkata
2020- Sunrisers Sheikh Zayed Sunrise
158 1216512 Abu Dhabi 2020/21 35 Knight
10-18 Hyderabad Stadium Hyderab
Riders
Dubai
2020- Mumbai Kings XI International Mumb
159 1216517 NaN 2020/21 36
10-18 Indians Punjab Cricket India
Stadium
Dubai
Royal
2020- Mumbai International Mumb
184 1216547 NaN 2020/21 10 Challengers
In [93]:
ipl[ipl["SuperOver"] == "Y"].shape[0]
# so far, 14 matches have had a Super Over
Out[93]:
14
In [94]:
ipl.head()
Out[94]:
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  ...
...
In [95]:
ipl[ipl["City"] == "Kolkata"]
# these are the matches played in Kolkata
Out[95]:
Eden
2022- Rajasthan Gujarat Gujarat
3 1312197 Kolkata 2022 Qualifier 1 Gardens,
05-24 Royals Titans Titans
Kolkata
Kolkata
2019- Mumbai Eden Mumbai
207 1178422 Kolkata 2019 47 Knight
04-28 Indians Gardens Indians
Riders
Kolkata
2019- Rajasthan Eden Rajasthan
211 1178418 Kolkata 2019 43 Knight
In [100]:
Out[100]:
0 False
1 False
2 False
3 False
4 False
...
945 False
946 False
947 False
948 False
949 False
Length: 950, dtype: bool
In [101]:
Out[101]:
Kolkata Chennai
2019- Eden Chennai
224 1178404 Kolkata 2019 29 Knight Super field
04-14 Gardens Super Kings
Riders Kings
Kolkata Chennai
2012- Eden Chennai
641 548368 Kolkata 2012 63 Knight Super field
05-14 Gardens Super Kings
Riders Kings
Kolkata Chennai
In [102]:
Out[102]:
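The code of the last few cells did not survive in this export. As a general sketch only, and not necessarily what those cells contained, two conditions can be combined into one filter with & (and) or | (or), with each condition wrapped in parentheses. The frame below is made up and its column names only imitate the ipl data.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({
    "City": ["Kolkata", "Kolkata", "Mumbai"],
    "WinningTeam": [
        "Chennai Super Kings",
        "Kolkata Knight Riders",
        "Chennai Super Kings",
    ],
})

# Each comparison gives a boolean Series; & combines them element-wise.
mask = (demo["City"] == "Kolkata") & (demo["WinningTeam"] == "Chennai Super Kings")
result = demo[mask]

print(list(mask))
print(len(result))
```

The parentheses are needed because & binds more tightly than == in Python.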
In [103]:
ipl.head(1)
Out[103]:
   ID       City       Date        Season  MatchNumber  Team1             Team2           Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals  Gujarat Titans  Narendra Modi Stadium, Ahmedabad  ...
In [104]:
ipl['TossWinner'] == ipl['WinningTeam']
# we are comparing these two columns
Out[104]:
0 False
1 True
2 False
3 True
4 False
...
945 False
946 False
947 False
948 True
949 False
Length: 950, dtype: bool
In [105]:
ipl[ipl['TossWinner'] == ipl['WinningTeam']]
Out[105]:
Narendra
Royal
2022- Rajasthan Modi Rajasthan
1 1312199 Ahmedabad 2022 Qualifier 2 Challengers
05-27 Royals Stadium, Royals
Bangalore
Ahmedabad
Eden
2022- Rajasthan Gujarat Gujara
3 1312197 Kolkata 2022 Qualifier 1 Gardens,
05-24 Royals Titans Titans
Kolkata
Wankhede
2022- Delhi Mumbai Mumba
5 1304115 Mumbai 2022 69 Stadium,
05-21 Capitals Indians Indians
Mumbai
Dr DY Patil
Lucknow Kolkata Lucknow
Navi 2022- Sports
8 1304112 2022 66 Super Knight Supe
In [106]:
ipl[ipl['TossWinner'] == ipl['WinningTeam']].shape[0]
Out[106]:
489
In [107]:
ipl.shape[0]
Out[107]:
950
In [108]:
# a: matches where the toss winner also won the match
a = ipl[ipl['TossWinner'] == ipl['WinningTeam']].shape[0]
# b: total number of matches
b = ipl.shape[0]

percentage = (a/b)*100

percentage
Out[108]:
51.473684210526315
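The same percentage can be computed without counting rows by hand. A minimal sketch on a made-up frame (demo and its team labels are illustrative): the mean of a boolean Series is the fraction of True values, because True counts as 1 and False as 0.

```python
import pandas as pd

# Hypothetical frame: the toss winner also wins three of four matches.
demo = pd.DataFrame({
    "TossWinner": ["A", "B", "C", "D"],
    "WinningTeam": ["A", "C", "C", "D"],
})

same = demo["TossWinner"] == demo["WinningTeam"]  # boolean Series
percentage = same.mean() * 100                    # share of True values

print(same.sum())
print(percentage)
```

Rows with a missing WinningTeam compare as False, so they count against the percentage in both approaches.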
In [109]:
movies
Out[109]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Uri:_The...
1     Battalion 609                         tt9472208   NaN                                                https://en.wikipedia.org/wiki/Ba...
2     The Accidental Prime Minister (film)  tt6986710   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/The_Acc...
In [110]:
Out[110]:
      title_x                               imdb_id     poster_path                                        wiki_link
0     Uri: The Surgical Strike              tt8291224   https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wiki/Ur...
37    Article 15 (film)                     tt10324144  https://upload.wikimedia.org/wikipedia/en/thum...  https://en.wikipedia.org/wik...
In [111]:
Out[111]:
0 True
1 False
2 False
3 False
4 False
...
1624 False
1625 False
1626 False
1627 False
1628 False
Length: 1629, dtype: bool
In [113]:
Out[113]:
43
In [114]:
# In this data, the genres column holds a combination
# of genres for each movie.
movies.head()
Out[114]:
In [115]:
movies['genres']
Out[115]:
0 Action|Drama|War
1 War
2 Biography|Drama
3 Crime|Drama
4 Drama
...
1624 Drama
1625 Drama
1626 Comedy|Drama
1627 Action
1628 Drama|Romance
Name: genres, Length: 1629, dtype: object
In [116]:
Out[116]:
In [117]:
movies['genres'].str.split('|').apply(lambda x:'Action' in x)
# here we check, for each movie, whether the Action genre is present
Out[117]:
0 True
1 False
2 False
3 False
4 False
...
1624 False
1625 False
1626 False
1627 True
1628 False
Name: genres, Length: 1629, dtype: bool
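A boolean Series like this one can be passed back to the dataframe as a mask to keep only the matching rows. A minimal sketch, assuming a small made-up frame (the titles are placeholders, not rows from movies.csv):

```python
import pandas as pd

# Made-up sample with a pipe-separated genres column
sample = pd.DataFrame({
    'title_x': ['Movie A', 'Movie B', 'Movie C'],
    'genres':  ['Action|Drama|War', 'Drama', 'Action'],
})

# True for rows whose list of genres contains 'Action'
mask = sample['genres'].str.split('|').apply(lambda x: 'Action' in x)

# Keep only the rows where the mask is True
action_movies = sample[mask]
action_movies
```

Splitting first and then testing membership matches the whole genre name, not a substring. The lambda assumes there are no missing values in genres.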
In [119]:
In [120]:
Out[120]:
0 Uri: The Surgical Strike tt8291224 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Uri:_The_
41 Family of Thakurganj tt8897986 https://upload.wikimedia.org/wikipedia/en/9/99... https://en.wikipedia.org/wiki/Family_of_
In [121]:
Out[121]:
0 Uri: The Surgical Strike tt8291224 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Uri:_The_
41 Family of Thakurganj tt8897986 https://upload.wikimedia.org/wikipedia/en/9/99... https://en.wikipedia.org/wiki/Family_of_
Write a function that returns the track record of two teams against each other
In [ ]:
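One possible way to approach this exercise is sketched below. The function name track_record and the shape of its return value are assumptions made for illustration; only the column names Team1, Team2 and WinningTeam come from the ipl dataframe. The sample rows are made up.

```python
import pandas as pd

def track_record(df, team_a, team_b):
    """Summarise the matches that team_a and team_b played against each other."""
    # Matches where the two teams faced each other, in either order
    faced = (
        ((df['Team1'] == team_a) & (df['Team2'] == team_b)) |
        ((df['Team1'] == team_b) & (df['Team2'] == team_a))
    )
    matches = df[faced]

    # value_counts() ignores missing winners (matches with no result)
    wins = matches['WinningTeam'].value_counts()
    played = matches.shape[0]
    won_a = int(wins.get(team_a, 0))
    won_b = int(wins.get(team_b, 0))

    return pd.Series({
        'played': played,
        team_a: won_a,
        team_b: won_b,
        'no result': played - won_a - won_b,
    })

# Made-up matches: A and B meet four times, one of them without a result
sample = pd.DataFrame({
    'Team1':       ['A', 'B', 'A', 'C', 'B'],
    'Team2':       ['B', 'A', 'B', 'A', 'A'],
    'WinningTeam': ['A', 'A', 'B', 'C', None],
})

record = track_record(sample, 'A', 'B')
record
```

With the real data the call would look like track_record(ipl, 'Team X', 'Team Y'), with two team names as they appear in the Team1 and Team2 columns.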
In [122]:
movies.head()
Out[122]:
0 Uri: The Surgical Strike tt8291224 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Uri:_The_Surg
1 Battalion 609 tt9472208 NaN https://en.wikipedia.org/wiki/Battalion_
2 The Accidental Prime Minister (film) tt6986710 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/The_Accidenta
In [123]:
movies['Country'] = "India"
In [124]:
movies.head()
# a new Country column is added to the dataframe
Out[124]:
0 Uri: The Surgical Strike tt8291224 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Uri:_The_Surg
1 Battalion 609 tt9472208 NaN https://en.wikipedia.org/wiki/Battalion_
2 The Accidental Prime Minister (film) tt6986710 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/The_Accidenta
We want to make a new column, lead actor, whose values come from the actors column.
The first name in the actors column is the lead actor of that particular movie, so
that is the value we want.
In [125]:
movies['actors'].head()
Out[125]:
In [126]:
movies['actors'].str.split('|')
Out[126]:
In [127]:
movies['actors'].str.split('|').apply(lambda x:x[0])
# This code raises an error because there are
# some missing values in the actors column.
#
# pandas stores a missing value as NaN, which is a float,
# so x[0] fails on those rows with a TypeError.
#
# So we first have to deal with the missing values.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_4404\3178307672.py in <module>
----> 1 movies['actors'].str.split('|').apply(lambda x:x[0])
In [128]:
# drops every row that has a missing value in any column
movies.dropna(inplace=True)
In [129]:
movies.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 298 entries, 11 to 1623
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 title_x 298 non-null object
1 imdb_id 298 non-null object
2 poster_path 298 non-null object
3 wiki_link 298 non-null object
4 title_y 298 non-null object
5 original_title 298 non-null object
6 is_adult 298 non-null int64
7 year_of_release 298 non-null int64
8 runtime 298 non-null object
9 genres 298 non-null object
10 imdb_rating 298 non-null float64
11 imdb_votes 298 non-null int64
12 story 298 non-null object
13 summary 298 non-null object
14 tagline 298 non-null object
In [130]:
movies['actors'].str.split('|').apply(lambda x:x[0])
Out[130]:
11 Ranveer Singh
34 Gavie Chahal
37 Ayushmann Khurrana
87 Sidharth Malhotra
96 Ajay Devgn
...
1600 Divya Dutta
1601 Anant Nag
1607 Anil Kapoor
1621 Priyanshu Chatterjee
1623 Karisma Kapoor
Name: actors, Length: 298, dtype: object
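Calling dropna() without arguments removes a row if any of its columns is missing, which is why only 298 of the 1629 movies remain here. If losing that many rows is not acceptable, a gentler alternative is sketched below on made-up data; the column values are placeholders. It uses .str[0], which leaves missing values as NaN without raising an error, and dropna(subset=...), which looks only at the listed columns.

```python
import numpy as np
import pandas as pd

# Made-up sample: one row is missing actors, another is missing tagline
sample = pd.DataFrame({
    'title_x': ['Movie A', 'Movie B', 'Movie C'],
    'actors':  ['Actor One|Actor Two', np.nan, 'Actor Three'],
    'tagline': [np.nan, 'Some tagline', 'Another tagline'],
})

# .str[0] takes the first element of each list and keeps NaN as NaN
lead = sample['actors'].str.split('|').str[0]

# Drop only the rows where actors is missing
kept = sample.dropna(subset=['actors'])

# For comparison: a plain dropna() also drops the row missing tagline
all_complete = sample.dropna()
```

In this sample, kept has two rows while all_complete has only one.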
In [131]:
In [132]:
movies.head()
# the lead actor column is added at the end of the dataframe
Out[132]:
11 Gully Boy tt2395469 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Gully_Bo
34 Yeh Hai India tt5525846 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Yeh_Hai_Ind
37 Article 15 (film) tt10324144 https://upload.wikimedia.org/wikipedia/en/thum... https://en.wikipedia.org/wiki/Article_15_(film
astype
In [133]:
ipl.info()
# memory occupied by the ipl dataset here is 148.6 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int64
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
In [134]:
ipl['ID'].astype('int32')
# this returns the ID column as int32; ipl itself is unchanged until we assign the result back
Out[134]:
0 1312200
1 1312199
2 1312198
3 1312197
4 1304116
...
945 335986
946 335985
947 335984
948 335983
949 335982
Name: ID, Length: 950, dtype: int32
In [135]:
ipl['ID'] = ipl['ID'].astype('int32')
In [136]:
ipl.info()
# memory occupied by the ipl dataset is now 144.9 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int32
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
In [137]:
In [139]:
ipl.info()
# memory occupied here is 139.0 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int32
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null category
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
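The effect of astype on memory can be checked per column with memory_usage(deep=True). The sketch below uses a made-up frame with an int64 ID column and a Season column of repeated strings; the values are illustrative and are not the ipl data. It applies the same two conversions as above, int32 for ID and category for Season.

```python
import pandas as pd

# Made-up sample: 1000 rows, integer IDs and only two distinct season labels
sample = pd.DataFrame({
    'ID': pd.Series(range(1000), dtype='int64'),
    'Season': ['2021', '2022'] * 500,
})

# Per-column memory in bytes; deep=True also counts the string contents
before = sample.memory_usage(deep=True)

# Smaller integer type, and category for a column with few distinct values
sample['ID'] = sample['ID'].astype('int32')
sample['Season'] = sample['Season'].astype('category')

after = sample.memory_usage(deep=True)

# Side-by-side comparison of bytes used per column
pd.DataFrame({'before': before, 'after': after})
```

category helps most when a column has few distinct values repeated many times. int32 is safe only if every value fits in the int32 range.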