Iplprediction.ipynb - Colab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns',None)
matches = pd.read_csv('/content/IPL_Matches_2008_2022.csv')
balls = pd.read_csv('/content/IPL_Ball_by_Ball_2008_2022.csv')
matches.head()
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             TossWinner
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals
3  1312197  Kolkata    2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             Gujarat Titans
4  1304116  Mumbai     2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          Sunrisers Hyderabad
print(matches.shape)
print(" -------------------- ")
print(matches.isnull().sum())
print(" -------------------- ")
print(matches.info())
(950, 20)
--------------------
ID 0
City 51
Date 0
https://colab.research.google.com/drive/1twK0-VKoZs5qVfvDsa0XSx-MFPxex2Ut#scrollTo=ltCjUaTw-LEw&printMode=true 1/37
11/27/24, 3:31 PM iplprediction.ipynb - Colab
Season 0
MatchNumber 0
Team1 0
Team2 0
Venue 0
TossWinner 0
TossDecision 0
SuperOver 4
WinningTeam 4
WonBy 0
Margin 18
method 931
Player_of_Match 4
Team1Players 0
Team2Players 0
Umpire1 0
Umpire2 0
dtype: int64
--------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int64
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: float64(1), int64(1), object(18)
memory usage: 148.6+ KB
None
matches[matches['WinningTeam'].isna()]
     ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                   TossWinner
205  1178424  Bengaluru  2019-04-30  2019    49           Royal Challengers Bangalore  Rajasthan Royals  M.Chinnaswamy Stadium   Rajasthan Royals
437  829813   Bangalore  2015-05-17  2015    55           Royal Challengers Bangalore  Delhi Daredevils  M Chinnaswamy Stadium   Royal Challengers Bangalore
464  829763   Bangalore  2015-04-29  2015    29           Royal Challengers Bangalore  Rajasthan Royals  M Chinnaswamy Stadium   Rajasthan Royals
These matches have no result because play was stopped by rain or for other reasons, so let's remove
them from the data.
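The cell that performs this removal does not appear in this printout. One way to do it, shown here on a toy frame (the `toy_` names are illustrative, not from the notebook), is to drop every row whose `WinningTeam` is missing:

```python
import pandas as pd

# Toy stand-in for the `matches` frame; only the columns needed here.
toy_matches = pd.DataFrame({
    "ID": [1, 2, 3],
    "WinningTeam": ["Rajasthan Royals", None, "Gujarat Titans"],
})

# Keep only matches that have a recorded winner.
# The original row labels are preserved (no reset_index).
toy_clean = toy_matches.dropna(subset=["WinningTeam"])
print(toy_clean)
```

Restricting `dropna` to `subset=["WinningTeam"]` avoids discarding rows that are only missing unrelated fields such as `City` or `method`.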
matches.head()
   Date        Season  MatchNumber  Team1                        Team2             Venue                             TossWinner           TossDecision  SuperOver
0  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals     bat           N
1  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals     field         N
3  2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             Gujarat Titans       field         N
4  2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          Sunrisers Hyderabad  bat           N
matches[matches['Season']=='2020/21'].head(2)
     Date        Season   MatchNumber  Team1           Team2                Venue                                TossWinner      TossDecision
134  2020-11-10  2020/21  Final        Delhi Capitals  Mumbai Indians       Dubai International Cricket Stadium  Delhi Capitals  bat
135  2020-11-08  2020/21  Qualifier 2  Delhi Capitals  Sunrisers Hyderabad  Sheikh Zayed Stadium                 Delhi Capitals  bat
matches.drop('Season',axis=1, inplace=True)
matches.head()
   Date        MatchNumber  Team1                        Team2             Venue                             TossWinner           TossDecision  SuperOver  WinningTeam
0  2022-05-29  Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals     bat           N          G...
1  2022-05-27  Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals     field         N          Raj...
3  2022-05-24  Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             Gujarat Titans       field         N          G...
4  2022-05-22  70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          Sunrisers Hyderabad  bat           N          Punjab...
matches['WinningTeam'].unique()
matches.head()
   Date        MatchNumber  Team1                        Team2             Venue                             TossWinner           TossDecision  SuperOver  WinningTeam
0  2022-05-29  Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals     bat           N          G...
1  2022-05-27  Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals     field         N          Raj...
3  2022-05-24  Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             Gujarat Titans       field         N          G...
4  2022-05-22  70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          Sunrisers Hyderabad  bat           N          Punjab...
Okay! Now that we have a cleaned matches DataFrame, let's look at the ball-by-ball DataFrame and build a
statistics DataFrame from it, which we can then use for a dashboard or for visualisation.
balls.head()
   ID       innings  overs  ballnumber  batter       bowler          non-striker  extra_type  batsman_run
0  1312200  1        0      1           YBK Jaiswal  Mohammed Shami  JC Buttler   NaN         0
1  1312200  1        0      2           YBK Jaiswal  Mohammed Shami  JC Buttler   legbyes     0
2  1312200  1        0      3           JC Buttler   Mohammed Shami  YBK Jaiswal  NaN         1
3  1312200  1        0      4           YBK Jaiswal  Mohammed Shami  JC Buttler   NaN         0
4  1312200  1        0      5           YBK Jaiswal  Mohammed Shami  JC Buttler   NaN         0
print(balls.shape)
print(" -------------------- ")
print(balls.isnull().sum())
print(" -------------------- ")
print(balls.info())
(225954, 17)
--------------------
ID 0
innings 0
overs 0
ballnumber 0
batter 0
bowler 0
non-striker 0
extra_type 213905
batsman_run 0
extras_run 0
total_run 0
non_boundary 0
isWicketDelivery 0
player_out 214803
kind 214803
fielders_involved 217966
BattingTeam 0
dtype: int64
--------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225954 entries, 0 to 225953
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 225954 non-null int64
1 innings 225954 non-null int64
2 overs 225954 non-null int64
3 ballnumber 225954 non-null int64
4 batter 225954 non-null object
5 bowler 225954 non-null object
6 non-striker 225954 non-null object
7 extra_type 12049 non-null object
8 batsman_run 225954 non-null int64
9 extras_run 225954 non-null int64
10 total_run 225954 non-null int64
11 non_boundary 225954 non-null int64
12 isWicketDelivery 225954 non-null int64
13 player_out 11151 non-null object
14 kind 11151 non-null object
15 fielders_involved 7988 non-null object
16 BattingTeam 225954 non-null object
dtypes: int64(9), object(8)
memory usage: 29.3+ MB
None
batgroup = balls.groupby(['batter'])
batsman_Stats = pd.DataFrame(batgroup['ballnumber'].count()).rename(columns={'ballnumber':'Balls_Faced'})
batsman_Stats.head()
Balls_Faced
batter
A Badoni 139
A Chandila 7
A Chopra 75
A Choudhary 20
batsman_Stats['innings']=batgroup['innings'].nunique()
batsman_Stats.head()
Balls_Faced innings
batter
A Badoni 139 2
A Chandila 7 1
A Chopra 75 2
A Choudhary 20 2
batsman_Stats['runs']=batgroup['batsman_run'].sum()
batsman_Stats.head()
batter
A Chandila 7 1 4
A Chopra 75 2 53
A Choudhary 20 2 25
batter
A Chandila 7 1 4 3.0
A Chopra 75 2 53 45.0
A Choudhary 20 2 25 4.0
batsman_Stats.head()
batter
A Ashish Reddy 196 2 280 61.0 83.0 20.0 1.0 16.0 15.0
A Choudhary 20 2 25 4.0 13.0 1.0 0.0 1.0 1.0
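The cells that add the per-run-value columns (`0s`, `1s`, `2s`, `3s`, `4s`, `6s`) are not visible in this printout. A minimal sketch of one way to build such counts, assuming they are simply the number of deliveries on which the batter scored that many runs (the notebook may compute them differently; the `toy_` names are illustrative):

```python
import pandas as pd

# Toy stand-in for the ball-by-ball frame.
toy_balls = pd.DataFrame({
    "batter": ["A", "A", "A", "B", "B"],
    "batsman_run": [0, 4, 6, 1, 0],
})

# Rows: batters; columns: run values; cells: number of deliveries.
run_counts = pd.crosstab(toy_balls["batter"], toy_balls["batsman_run"])

# Make sure every run value of interest has a column, even if it never occurred.
run_counts = run_counts.reindex(columns=[0, 1, 2, 3, 4, 6], fill_value=0)
run_counts.columns = [f"{r}s" for r in run_counts.columns]
print(run_counts)
```

Because the result is indexed by batter, it can be joined onto a per-batter statistics frame by index.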
batsman_Stats['player_out']=batgroup['player_out'].count()
batsman_Stats.head()
batter
A Ashish Reddy 196 2 280 61.0 83.0 20.0 1.0 16.0 15.0 15
A Choudhary 20 2 25 4.0 13.0 1.0 0.0 1.0 1.0 2
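# Batting average = runs per dismissal; strike rate = runs per 100 balls faced.
# Assign the result of fillna back to the column instead of using inplace=True on a column selection.
batsman_Stats['bat_average'] = round(batsman_Stats['runs']/batsman_Stats['player_out'],2)
batsman_Stats['bat_average'] = batsman_Stats['bat_average'].fillna(0)
batsman_Stats['bat_strike'] = round(batsman_Stats['runs']/batsman_Stats['Balls_Faced']*100,2)
batsman_Stats['bat_strike'] = batsman_Stats['bat_strike'].fillna(0)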
batsman_Stats.head()
batter
A Ashish Reddy 196 2 280 61.0 83.0 20.0 1.0 16.0 15.0 15 18.67
A Badoni 139 2 161 57.0 53.0 11.0 0.0 11.0 7.0 9 17.89
A Choudhary 20 2 25 4.0 13.0 1.0 0.0 1.0 1.0 2 12.50
So here we have the batsman statistics. Now let's create the bowler statistics.
bowlgroup = balls.groupby(['bowler'])
bowler_Stats = pd.DataFrame(bowlgroup['ballnumber'].count()).rename(columns={'ballnumber':'BallsThrow'})
balls['kind'].unique()
BallsThrow wickets
bowler
A Badoni 12 2.0
A Dananjaya 25 NaN
bowler_Stats['wickets'] = bowler_Stats['wickets'].fillna(0)
bowler
A Dananjaya 25 0.0 4.0
bowler_Stats['runs_conceded'] = balls.groupby('bowler')['batsman_run'].sum()
bowler_Stats['runs_conceded'] = bowler_Stats['runs_conceded'].fillna(0)
bowler_Stats.head()
bowler
A Dananjaya 25 0.0 4.0 46
bowler_Stats['runs_conceded'] = bowler_Stats['runs_conceded'].add(balls[balls['extra_type'].isin(['wid
bowler_Stats.head()
bowler
A Dananjaya 25 0.0 4.0 47.0
bowler_Stats['bowl_econ'] = round(bowler_Stats['runs_conceded']/bowler_Stats['overs'],2)
bowler_Stats['bowl_strike_rate']=round(bowler_Stats['BallsThrow']/bowler_Stats['wickets'],2)
bowler_Stats.head()
bowler
A Dananjaya 25 0.0 4.0 47.0 11.75 inf
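The `inf` in the last column comes from dividing balls bowled by zero wickets. If that is not wanted, one option is to turn zero wickets into NaN before dividing. The sketch below uses toy data and assumes overs are simply balls / 6; the notebook's own `overs` column may be derived differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for bowler_Stats.
toy_bowlers = pd.DataFrame(
    {"BallsThrow": [25, 120], "wickets": [0, 6], "runs_conceded": [47, 150]},
    index=["Bowler X", "Bowler Y"],
)

# Assumption: overs = balls / 6.
toy_bowlers["overs"] = toy_bowlers["BallsThrow"] / 6

# Economy = runs conceded per over.
toy_bowlers["bowl_econ"] = (toy_bowlers["runs_conceded"] / toy_bowlers["overs"]).round(2)

# Replace 0 wickets with NaN before dividing so the result is NaN rather than inf.
wickets = toy_bowlers["wickets"].replace(0, np.nan)
toy_bowlers["bowl_strike_rate"] = (toy_bowlers["BallsThrow"] / wickets).round(2)
print(toy_bowlers)
```

NaN here reads as "no strike rate defined", which is easier to filter or impute later than `inf`, and it does not distort scaling or clustering steps.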
players_matches_dict = {}
# Create a dataframe with players and their number of matches
final_players_matches = pd.DataFrame({'Players': list(players_matches_dict.keys())})
final_players_matches['matches'] = final_players_matches['Players'].apply(lambda x: len(players_matche
final_players_matches = final_players_matches.set_index('Players')
final_players_matches.head()
matches
Players
YBK Jaiswal 23
JC Buttler 81
Mohammed Shami 93
Yash Dayal 9
SV Samson 134
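The loop that fills `players_matches_dict` is not shown in this printout. A minimal sketch of how such a mapping could be built, assuming `Team1Players` and `Team2Players` hold stringified Python lists (an assumption about the CSV format; the toy rows and names below are illustrative):

```python
import ast
from collections import defaultdict

# Toy rows standing in for matches[['ID', 'Team1Players', 'Team2Players']].
toy_rows = [
    (1, "['YBK Jaiswal', 'JC Buttler']", "['Mohammed Shami', 'Yash Dayal']"),
    (2, "['JC Buttler', 'SV Samson']", "['Mohammed Shami']"),
]

players_to_matches = defaultdict(set)
for match_id, team1, team2 in toy_rows:
    # literal_eval safely turns the stringified list back into a Python list.
    for player in ast.literal_eval(team1) + ast.literal_eval(team2):
        players_to_matches[player].add(match_id)

# Number of distinct matches per player.
match_counts = {player: len(ids) for player, ids in players_to_matches.items()}
print(match_counts)
```

Storing match IDs in a set means a player is counted once per match even if a name were to appear twice in a row.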
catchDf.head()
catches
A Badoni 9.0
A Chandila 7.0
A Chopra 2.0
A Flintoff 4.0
Now that we have batsman and bowler statistics, let's combine them with pd.merge into one full DataFrame
holding all the information for each player; later we will cluster it to find the best players.
final_df = pd.merge(final_df,bowler_Stats, left_index=True, right_index=True,how='outer')
# Merging Catches Stats of Each Player
final_df = pd.merge(final_df,catchDf, left_index=True, right_index=True,how='outer')
# Merging the data of players who got Man of the Match or not
final_df = pd.merge(final_df,matches['Player_of_Match'].value_counts(),left_index=True, right_index=Tr
# Making all the NAN values to 0 because they don't have the values Like a person who does not get Pla
final_df.fillna(0,inplace=True)
final_df.head()
A Ashish Reddy 28.0 196.0 2.0 280.0 61.0 83.0 20.0 1.0 16.0 15.0 15.0
A Badoni 11.0 139.0 2.0 161.0 57.0 53.0 11.0 0.0 11.0 7.0 9.0
A Chandila 12.0 7.0 1.0 4.0 3.0 4.0 0.0 0.0 0.0 0.0 1.0
A Chopra 6.0 75.0 2.0 53.0 45.0 21.0 2.0 0.0 7.0 0.0 5.0
A Choudhary 5.0 20.0 2.0 25.0 4.0 13.0 1.0 0.0 1.0 1.0 2.0
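To make the effect of `how='outer'` on index-aligned merges concrete, here is a small sketch on toy frames (player names are borrowed from the outputs above, the numbers are made up):

```python
import pandas as pd

# Toy per-player frames indexed by player name.
toy_bat = pd.DataFrame({"runs": [280, 161]}, index=["A Ashish Reddy", "A Badoni"])
toy_bowl = pd.DataFrame({"wickets": [2.0, 0.0]}, index=["A Badoni", "A Dananjaya"])

# An outer merge on the index keeps every player that appears in either frame.
toy_merged = pd.merge(toy_bat, toy_bowl, left_index=True, right_index=True, how="outer")

# Players missing from one side get NaN there; fill those gaps with 0.
toy_merged = toy_merged.fillna(0)
print(toy_merged)
```

After `fillna(0)`, a bowler who never batted and a batter who scored no runs look identical, which is worth keeping in mind when reading the plots and averages that follow.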
Great! Now let's visualise the data to see which players have been consistently good from 2008 to 2022.
import plotly.express as px
# Now you can create the plot using 'Player_of_Match' as the x-axis
fig = px.bar(final_df, x='Player_of_Match', y='runs', title='Number of runs scored by different players
fig.show()
[Figure: bar chart "Number of runs scored by different players and they get Player of Match"; x-axis: Player of Match (player names), y-axis: runs]
# create a bar plot to see the Number of Wickets taken by different players and they get Player of Match
fig = px.bar(final_df, x='Player_of_Match', y='wickets', title='Number of Wickets taken by different pla
# show the plot
fig.show()
[Figure: bar chart "Number of Wickets taken by different players and they get Player of Match"; x-axis: Player of Match (player names), y-axis: wickets]
# Create the scatter plot to see the Most Player of the Match by a Player
# The 'Player_of_Match_Count' column (created earlier) contains the numerical data
# representing the number of times a player was Player of the Match.
# We should use this for 'y' and 'size' instead of the player's name.
fig = px.scatter(final_df,
x='matches',
y='Player_of_Match_Count', # Changed to numerical column
color='Player_of_Match',
size='Player_of_Match_Count', # Changed to numerical column
hover_name=final_df.index,
title='Player of the Match')
fig.update_layout(coloraxis=dict(colorscale='reds'))
[Figure: scatter plot "Player of the Match"; x-axis: matches, y-axis: Player_of_Match_Count, legend: Player_of_Match]
# Create the scatter plot to see Which player has best strike rate
fig = px.scatter(final_df, x='matches', y='bat_strike',color='bat_strike',
size='bat_strike', hover_name=final_df.index, title='Batsman Strike Rate')
fig.update_layout(coloraxis=dict(colorscale='reds'))
[Figure: scatter plot "Batsman Strike Rate"; x-axis: matches, y-axis: bat_strike, colour scale: bat_strike]
# Create the scatter plot to see the Which Player got most Catches
fig = px.scatter(final_df, x='matches', y='catches',color='catches',
size='catches', hover_name=final_df.index, title='Most Catches by Players')
fig.update_layout(coloraxis=dict(colorscale='reds'))
[Figure: scatter plot "Most Catches by Players"; x-axis: matches, y-axis: catches, colour scale: catches]
[Figure: scatter plot "Most 6s by a Batsman"; x-axis: matches, y-axis: 6s, colour scale: 6s]
[Figure: scatter plot "Most 4s by a Batsman"; x-axis: matches, y-axis: 4s, colour scale: 4s]
Having seen the best players in the IPL so far, let's look at the players who miss opportunities: those who
score 0 off the most balls they faced, or who get out most often.
# Create the scatter plot to see the Most 0s on the balls they Faced by a Batsman
fig = px.scatter(final_df, x='matches', y='0s',color='0s',
size='0s', hover_name=final_df.index, title='Most 0s by a Batsman for each ball they fa
fig.update_layout(coloraxis=dict(colorscale='blues'))
[Figure: scatter plot of 0s per batsman; x-axis: matches, y-axis: 0s, colour scale: 0s]
# Create the scatter plot to see the Most Time Out by a Batsman
fig = px.scatter(final_df, x='player_out', y='not_out', color='runs',
size='player_out', hover_name=final_df.index,
title='Most Time Out by a Batsman vs Matches Played to Player Not Out Matches')
fig.update_layout(coloraxis=dict(colorscale='blues'))
[Figure: scatter plot "Most Time Out by a Batsman vs Matches Played to Player Not Out Matches"; x-axis: player_out, y-axis: not_out, colour scale: runs]
Now that we know the players' best and worst performances, let's visualise the top 10 batsmen and bowlers,
the best strike rates, and the best teams.
final_df = final_df[final_df['matches']>50]
final_df.head()
12.0 154.0 410.0 2.0 362.0 182.0 172.0 22.0 0.0 29.0 5.0 30.0
40.0 88.0 63.0 2.0 41.0 37.0 21.0 1.0 0.0 3.0 1.0 8.0
106.0 76.0 50.0 2.0 26.0 31.0 16.0 1.0 0.0 2.0 0.0 9.0
0.0 170.0 3487.0 4.0 5181.0 1115.0 1420.0 268.0 17.0 414.0 253.0 125.0
37.0 80.0 1555.0 2.0 2069.0 737.0 417.0 66.0 4.0 239.0 92.0 76.0
import plotly.graph_objects as go
# Sort the dataframe by batting average and select the top 10
df_top10 = final_df.sort_values('bat_average', ascending=False).head(10)
fig.update_layout(title='Top 10 Batsmen based on Batting Average', xaxis_title="Player", yaxis_title="
fig.show()
[Figure: "Top 10 Batsmen based on Batting Average"; x-axis: Player. Visible annotations: "100.0 matches, 3895.0 runs, 84.0 outs, 16.0 not outs"; "69.0 matches, 2489.0 runs, 65.0 outs, 4.0 not outs"]
import plotly.graph_objects as go
# Sort the dataframe by wickets and select the top 10
df_top10 = final_df.sort_values('wickets', ascending=False).head(10)
[Figure: top 10 bowlers by wickets; x-axis: Player, y-axis: Wickets. Visible annotations:
154.0 matches, 166.0 wickets, 3976.0 RunGive, 3317.0 BallsThrow
130.0 matches, 166.0 wickets, 3624.0 RunGive, 2940.0 BallsThrow
122.0 matches, 170.0 wickets, 3364.0 RunGive, 2974.0 BallsThrow
181.0 matches, 157.0 wickets, 4534.0 RunGive, 4024.0 BallsThrow
165.0 matches, 157.0 wickets, 4301.0 RunGive, 3309.0 BallsThrow
146.0 matches, 154.0 wickets, 3971.0 RunGive, 3384.0 BallsThrow
120.0 matches, 148.0 wickets, 3407.0 RunGive, 2857.0 BallsThrow]
matches.head(2)
   Date        MatchNumber  Team1                        Team2             Venue                             TossWinner        TossDecision  SuperOver  WinningTeam
0  2022-05-29  Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals  bat           N          G...
1  2022-05-27  Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals  field         N          Raj...
fig = go.Figure()
fig.add_trace(go.Bar(x=wins.index[:10], y=wins.values[:10], name="Win Percentage"))
[Figure: bar chart of the first 10 teams; x-axis: Team, y-axis: Win Percentage]
fig = go.Figure()
fig.add_trace(go.Bar(x=win_percentages.index[-10:], y=win_percentages.values[-10:], name="Win Percentage
fig.update_layout(title="Top 10 Least Performing Teams", xaxis_title="Team", yaxis_title="Win Percentage
fig.show()
[Figure: bar chart "Top 10 Least Performing Teams"; x-axis: Team, y-axis: Win Percentage]
matches = pd.read_csv('https://raw.githubusercontent.com/simranjeet97/IPL2023_WinningPrediction_EDA_Da
# First Drop the Unwanted Columns
matches.drop(['City','ID','method'],axis=1,inplace=True)
matches = matches.dropna()
matches['SEASON_INT'] = matches['Season'].apply(lambda x: int(x[:4]))
matches['SEASON_END_INT'] = matches['Season'].apply(lambda x: int('20'+str(x[5:])) if len(x)>5 else in
# Group the data by season and select the last row of each group
last_matches = matches.sort_values('Date').groupby('Season').tail(1)
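The `SEASON_END_INT` lambda above is cut off in this printout. As an illustration of the idea only, here is a sketch of a helper that turns a season label into start and end years, assuming labels look like `'2022'` or `'2020/21'` (the function name `season_bounds` is hypothetical, and the truncated original may differ):

```python
def season_bounds(season: str) -> tuple:
    """Return (start_year, end_year) for labels such as '2022' or '2020/21'."""
    start = int(season[:4])
    if "/" in season:
        # '2020/21' -> end year 2021: reuse the century digits of the start year.
        suffix = season.split("/")[1]
        end = int(season[:2] + suffix)
    else:
        # Single-year season: start and end coincide.
        end = start
    return start, end


print(season_bounds("2020/21"))
print(season_bounds("2022"))
```

A label that spans a century boundary would need extra handling; for IPL seasons from 2008 onward this does not arise.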
[Figure: chart by season winner; x-axis: WinningTeam (Rajasthan Royals, Deccan Chargers, Chennai Super Kings, Kolkata Knight Riders, Mumbai Indians, Sunrisers Hyderabad, Gujarat Titans), colour scale: SEASON_END_INT (2008 to 2022)]
Let's find the best players of all time using K-Means clustering.
Args:
data: The input DataFrame.
threshold: The correlation threshold.
Returns:
A list of column names with correlation above the threshold.
"""
# Select only numeric columns before calculating correlation
numeric_data = data.select_dtypes(include=np.number)
corr_matrix = numeric_data.corr()
# np.bool was removed in NumPy 1.24; the built-in bool works across versions
upper_triangle = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
col_corr = [col for col in upper_triangle.columns if any(upper_triangle[col] > threshold)]
return col_corr
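The `def` line and the opening of the docstring for this function were lost in the printout. A self-contained sketch of the same idea follows; the function name `highly_correlated_columns`, the default threshold, and the toy data are assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def highly_correlated_columns(data: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return columns whose correlation with an earlier column exceeds threshold."""
    # Only numeric columns have a Pearson correlation.
    numeric_data = data.select_dtypes(include=np.number)
    corr_matrix = numeric_data.corr()
    # Keep the strict upper triangle so each pair is examined once.
    mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
    upper_triangle = corr_matrix.where(mask)
    return [col for col in upper_triangle.columns
            if (upper_triangle[col] > threshold).any()]


toy_stats = pd.DataFrame({
    "runs": [10, 20, 30, 40],
    "balls": [12, 22, 33, 41],   # nearly proportional to runs
    "catches": [3, 1, 4, 1],
})
flagged = highly_correlated_columns(toy_stats, threshold=0.9)
print(flagged)
```

As in the notebook's version, only strongly positive correlations are flagged; applying `.abs()` to the correlation matrix first would also catch strongly negative ones.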
top_columns =final_df.drop(['runs',
'0s',
'1s',
'2s',
'3s',
'4s',
'6s',
'player_out',
'wickets',
'overs',
'runs_conceded',
'bowl_strike_rate',
'not_out'],axis=1)
# ipython-input-100-a826ed55dbe8
# **Instead of removing all rows with any non-finite values, consider imputing them**
# **For example, you can replace NaN values with the column mean:**
for col in top_columns.columns:
    top_columns[col] = top_columns[col].fillna(top_columns[col].mean())
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt
plt.xlabel('K')
plt.ylabel('Errors')
plt.plot(clusters, errors, 'bx-')
plt.show()
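The cell that fills `clusters` and `errors` for this elbow plot is not visible in the printout. A minimal sketch of how such values are commonly computed with scikit-learn, using synthetic data in place of the scaled player features (all `toy_` names and parameter choices are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic data with three well-separated groups, standing in for player features.
rng = np.random.default_rng(0)
toy_features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.5, size=(30, 2)),
])

# K-Means is distance based, so standardise the features first.
toy_scaled = StandardScaler().fit_transform(toy_features)

toy_clusters = list(range(1, 7))
toy_errors = []
for k in toy_clusters:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(toy_scaled)
    # inertia_ is the within-cluster sum of squared distances to the centroids.
    toy_errors.append(km.inertia_)

print(list(zip(toy_clusters, toy_errors)))
```

Plotting `toy_errors` against `toy_clusters` gives the elbow curve; the bend suggests a reasonable number of clusters. Fixing `random_state` makes the cluster labels reproducible between runs.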
model = KMeans(n_clusters = 4)
y_pred = model.fit_predict(scaled_data)
final_df_new['cluster'] = y_pred
final_df_new.head()
final_df_new.reset_index(inplace=True)
topPlayers = final_df_new.rename(columns = {'index':'PlayerNames'})
topPlayers_cluster = pd.DataFrame(topPlayers[['PlayerNames','cluster']])
topPlayers_cluster
PlayerNames cluster
0 12.0 2
1 40.0 2
2 106.0 2
3 0.0 3
4 37.0 0
127 NaN 1
128 6.0 3
129 49.0 2
130 56.0 1
131 218.0 2
teams1 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==0]
teams1 = teams1['PlayerNames'].tolist()
teams2 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==1]
teams2 = teams2['PlayerNames'].tolist()
teams3 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==2]
teams3 = teams3['PlayerNames'].tolist()
teams4 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==3]
teams4 = teams4['PlayerNames'].tolist()
TopPlayer_Dataset = pd.DataFrame(teams1,columns=['teams1'])
TopPlayer_Dataset['teams2']=pd.Series(teams2)
TopPlayer_Dataset['teams3']=pd.Series(teams3)
TopPlayer_Dataset['teams4']=pd.Series(teams4)
TopPlayer_Dataset = TopPlayer_Dataset.fillna('')
TopPlayer_Dataset
22 189.0 83.0
24 147.0 130.0
27 52 0 65 0 142 0
matches = pd.read_csv('/content/IPL_Matches_2008_2022.csv')
balls = pd.read_csv('/content/IPL_Ball_by_Ball_2008_2022.csv')
ID innings total_run
0 335982 1 222
2 335983 1 240
4 335984 1 129
6 335985 1 165
8 335986 1 110
10 335987 1 166
12 335988 1 142
14 335989 1 208
16 335990 1 214
18 335991 1 182
inningScores['target'] = inningScores['total_run'] + 1
inningScores.head(10)
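The cell that builds `inningScores` is not shown in this printout. A sketch of one plausible construction, assuming it is the first-innings total per match taken from the ball-by-ball data (toy data and `toy_` names are illustrative):

```python
import pandas as pd

# Toy stand-in for the ball-by-ball frame: two matches, two innings each.
toy_balls = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "innings": [1, 1, 2, 2, 1, 2],
    "total_run": [4, 1, 6, 0, 2, 3],
})

# Total runs per match and innings.
toy_scores = toy_balls.groupby(["ID", "innings"])["total_run"].sum().reset_index()

# The chasing side needs one run more than the first-innings total.
toy_first = toy_scores[toy_scores["innings"] == 1].copy()
toy_first["target"] = toy_first["total_run"] + 1
print(toy_first)
```

Filtering after `reset_index` leaves row labels 0, 2, 4, ..., which matches the index pattern in the output above.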
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')
teams2023 = [
'Rajasthan Royals',
'Royal Challengers Bangalore',
'Sunrisers Hyderabad',
'Delhi Capitals',
'Chennai Super Kings',
'Gujarat Titans',
'Lucknow Super Giants',
'Kolkata Knight Riders',
'Punjab Kings',
'Mumbai Indians'
]
matches = matches[matches['Team1'].isin(teams2023)]
matches = matches[matches['Team2'].isin(teams2023)]
matches = matches[matches['WinningTeam'].isin(teams2023)]
matches['Team1'].unique()
matches.head()
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             TossWinner
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals
3  1312197  Kolkata    2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             Gujarat Titans
4  1304116  Mumbai     2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          Sunrisers Hyderabad
matches.isnull().sum()
0
ID 0
City 51
Date 0
Season 0
MatchNumber 0
Team1 0
Team2 0
V 0