
Iplprediction.ipynb - Colab

Uploaded by prasunagummadi
11/27/24, 3:31 PM iplprediction.ipynb - Colab

IPL 2023 Winning Prediction 🏆 and Full Data Analysis

Data Loading and Summary Checking

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns',None)

matches = pd.read_csv('/content/IPL_Matches_2008_2022.csv')
balls = pd.read_csv('/content/IPL_Ball_by_Ball_2008_2022.csv')
matches.head()

   ID       City       Date        Season  MatchNumber  Team1                        Team2                 Venue                             TossWinner            ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans        Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals      Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      ...
2  1312198  Kolkata    2022-05-25  2022    Eliminator   Royal Challengers Bangalore  Lucknow Super Giants  Eden Gardens, Kolkata             Lucknow Super Giants  ...
3  1312197  Kolkata    2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans        Eden Gardens, Kolkata             Gujarat Titans        ...
4  1304116  Mumbai     2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings          Wankhede Stadium, Mumbai          Sunrisers Hyderabad   ...


print(matches.shape)
print(" -------------------- ")
print(matches.isnull().sum())
print(" -------------------- ")
print(matches.info())

(950, 20)
--------------------
ID 0
City 51
Date 0
https://colab.research.google.com/drive/1twK0-VKoZs5qVfvDsa0XSx-MFPxex2Ut#scrollTo=ltCjUaTw-LEw&printMode=true 1/37
Season 0
MatchNumber 0
Team1 0
Team2 0
Venue 0
TossWinner 0
TossDecision 0
SuperOver 4
WinningTeam 4
WonBy 0
Margin 18
method 931
Player_of_Match 4
Team1Players 0
Team2Players 0
Umpire1 0
Umpire2 0
dtype: int64
--------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int64
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: float64(1), int64(1), object(18)
memory usage: 148.6+ KB
None

matches[matches['WinningTeam'].isna()]


     ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                  TossWinner                   ...
205  1178424  Bengaluru  2019-04-30  2019    49           Royal Challengers Bangalore  Rajasthan Royals  M.Chinnaswamy Stadium  Rajasthan Royals             ...
437  829813   Bangalore  2015-05-17  2015    55           Royal Challengers Bangalore  Delhi Daredevils  M Chinnaswamy Stadium  Royal Challengers Bangalore  ...
464  829763   Bangalore  2015-04-29  2015    29           Royal Challengers Bangalore  Rajasthan Royals  M Chinnaswamy Stadium  Rajasthan Royals             ...
708  501265   Delhi      2011-05-21  2011    68           Delhi Daredevils             Pune Warriors     Feroz Shah Kotla       Delhi Daredevils             ...

These matches have no result because they were stopped due to rain or other reasons, so let's remove
them from the data.

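# First drop the unwanted columns
matches.drop(['City','ID','method'],axis=1,inplace=True)
# Note: dropna() without a subset drops every row with any missing value (e.g. a missing Margin too)
matches = matches.dropna()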

matches.head()

   Date        Season  MatchNumber  Team1                        Team2                 Venue                             TossWinner            TossDecision  SuperOver  ...
0  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans        Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      bat           N          ...
1  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals      Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      field         N          ...
2  2022-05-25  2022    Eliminator   Royal Challengers Bangalore  Lucknow Super Giants  Eden Gardens, Kolkata             Lucknow Super Giants  field         N          ...
3  2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans        Eden Gardens, Kolkata             Gujarat Titans        field         N          ...
4  2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings          Wankhede Stadium, Mumbai          Sunrisers Hyderabad   bat           N          ...
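Note that `matches.dropna()` removes every row with any missing value, not only the no-result matches. In the null counts above, `Margin` is missing in 18 rows while `WinningTeam` is missing in only 4. If only the no-result matches should go, `dropna(subset=['WinningTeam'])` is the narrower call. Below is a minimal sketch on a hypothetical three-row frame; the team names and margins are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a normal result, an abandoned match (no winner),
# and a match that has a winner but no recorded Margin.
toy = pd.DataFrame({
    "WinningTeam": ["Rajasthan Royals", np.nan, "Delhi Capitals"],
    "Margin": [7.0, np.nan, np.nan],
})

# dropna() with no arguments drops any row that has a NaN in any column.
all_dropped = toy.dropna()

# dropna(subset=...) drops a row only when one of the listed columns is missing,
# i.e. only the match without a result.
no_result_dropped = toy.dropna(subset=["WinningTeam"])

print(len(all_dropped), len(no_result_dropped))
```

Which behaviour is right depends on whether matches that have a winner but no margin should stay in the analysis.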


matches[matches['Season']=='2020/21'].head(2)

     Date        Season   MatchNumber  Team1           Team2                Venue                                TossWinner      TossDecision  ...
134  2020-11-10  2020/21  Final        Delhi Capitals  Mumbai Indians       Dubai International Cricket Stadium  Delhi Capitals  bat           ...
135  2020-11-08  2020/21  Qualifier 2  Delhi Capitals  Sunrisers Hyderabad  Sheikh Zayed Stadium                 Delhi Capitals  bat           ...

# Convert the Date column to datetime
matches['Date'] = pd.to_datetime(matches['Date'])

# Season labels look like '2022' or '2020/21': store the start year and the end year as integers
matches['SEASON_INT'] = matches['Season'].apply(lambda x: int(x[:4]))
matches['SEASON_END_INT'] = matches['Season'].apply(lambda x: int('20'+str(x[5:])) if len(x)>5 else int(x))

matches.drop('Season',axis=1, inplace=True)

matches.head()

   Date        MatchNumber  Team1                        Team2                 Venue                             TossWinner            TossDecision  SuperOver  WinningTeam                  ...
0  2022-05-29  Final        Rajasthan Royals             Gujarat Titans        Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      bat           N          Gujarat Titans               ...
1  2022-05-27  Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals      Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      field         N          Rajasthan Royals             ...
2  2022-05-25  Eliminator   Royal Challengers Bangalore  Lucknow Super Giants  Eden Gardens, Kolkata             Lucknow Super Giants  field         N          Royal Challengers Bangalore  ...
3  2022-05-24  Qualifier 1  Rajasthan Royals             Gujarat Titans        Eden Gardens, Kolkata             Gujarat Titans        field         N          Gujarat Titans               ...
4  2022-05-22  70           Sunrisers Hyderabad          Punjab Kings          Wankhede Stadium, Mumbai          Sunrisers Hyderabad   bat           N          Punjab Kings                 ...
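The `SEASON_END_INT` lambda handles two label shapes: a plain year such as '2022', and a split season such as '2020/21' (the label filtered on earlier). Pulling the logic into a named helper makes it easier to test. `parse_season` is a hypothetical name, and '2007/08' is only an example input.

```python
def parse_season(season: str) -> tuple:
    """Return (start_year, end_year) for labels like '2022' or '2020/21'."""
    start = int(season[:4])
    # Split-season labels such as '2020/21' carry a two-digit end year after the slash.
    end = int("20" + season[5:]) if len(season) > 5 else start
    return start, end


print(parse_season("2022"))
print(parse_season("2020/21"))
print(parse_season("2007/08"))
```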


matches['WinningTeam'].unique()

array(['Gujarat Titans', 'Rajasthan Royals',
       'Royal Challengers Bangalore', 'Punjab Kings', 'Mumbai Indians',
       'Lucknow Super Giants', 'Sunrisers Hyderabad', 'Delhi Capitals',
       'Kolkata Knight Riders', 'Chennai Super Kings', 'Kings XI Punjab',
       'Delhi Daredevils', 'Rising Pune Supergiant', 'Gujarat Lions',
       'Rising Pune Supergiants', 'Pune Warriors', 'Deccan Chargers',
       'Kochi Tuskers Kerala'], dtype=object)

# Merge old franchise names into the names used for the rest of the analysis
matches['Team1'] = matches['Team1'].str.replace('Delhi Daredevils', 'Delhi Capitals')
matches['Team2'] = matches['Team2'].str.replace('Delhi Daredevils', 'Delhi Capitals')
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Delhi Daredevils', 'Delhi Capitals')

matches['Team1'] = matches['Team1'].str.replace('Kings XI Punjab', 'Punjab Kings')
matches['Team2'] = matches['Team2'].str.replace('Kings XI Punjab', 'Punjab Kings')
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Kings XI Punjab', 'Punjab Kings')

matches['Team1'] = matches['Team1'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')
matches['Team2'] = matches['Team2'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Deccan Chargers', 'Sunrisers Hyderabad')

# Replace the plural 'Rising Pune Supergiants' first: str.replace matches substrings, so replacing
# 'Rising Pune Supergiant' first would turn 'Rising Pune Supergiants' into 'Pune Warriorss'
matches['Team1'] = matches['Team1'].str.replace('Rising Pune Supergiants', 'Pune Warriors')
matches['Team2'] = matches['Team2'].str.replace('Rising Pune Supergiants', 'Pune Warriors')
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Rising Pune Supergiants', 'Pune Warriors')

matches['Team1'] = matches['Team1'].str.replace('Rising Pune Supergiant', 'Pune Warriors')
matches['Team2'] = matches['Team2'].str.replace('Rising Pune Supergiant', 'Pune Warriors')
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Rising Pune Supergiant', 'Pune Warriors')

matches['Team1'] = matches['Team1'].str.replace('Gujarat Lions', 'Gujarat Titans')
matches['Team2'] = matches['Team2'].str.replace('Gujarat Lions', 'Gujarat Titans')
matches['WinningTeam'] = matches['WinningTeam'].str.replace('Gujarat Lions', 'Gujarat Titans')

matches.head()


   Date        MatchNumber  Team1                        Team2                 Venue                             TossWinner            TossDecision  SuperOver  WinningTeam                  ...
0  2022-05-29  Final        Rajasthan Royals             Gujarat Titans        Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      bat           N          Gujarat Titans               ...
1  2022-05-27  Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals      Narendra Modi Stadium, Ahmedabad  Rajasthan Royals      field         N          Rajasthan Royals             ...
2  2022-05-25  Eliminator   Royal Challengers Bangalore  Lucknow Super Giants  Eden Gardens, Kolkata             Lucknow Super Giants  field         N          Royal Challengers Bangalore  ...
3  2022-05-24  Qualifier 1  Rajasthan Royals             Gujarat Titans        Eden Gardens, Kolkata             Gujarat Titans        field         N          Gujarat Titans               ...
4  2022-05-22  70           Sunrisers Hyderabad          Punjab Kings          Wankhede Stadium, Mumbai          Sunrisers Hyderabad   bat           N          Punjab Kings                 ...
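Because `Series.str.replace` matches substrings, the order of the two Pune renames matters. Replacing 'Rising Pune Supergiant' first turns 'Rising Pune Supergiants' into 'Pune Warriorss'. `Series.replace` with a dict matches whole cell values, which removes the ordering problem and keeps all renames in one place.

`TEAM_RENAMES` and `normalize_teams` are hypothetical names. The mapping copies the replacements made in the cells above. `TossWinner`, which those cells do not rename, is one more column you might want to pass.

```python
import pandas as pd

# Old name -> name used in this analysis (copied from the renaming cells above).
TEAM_RENAMES = {
    "Delhi Daredevils": "Delhi Capitals",
    "Kings XI Punjab": "Punjab Kings",
    "Deccan Chargers": "Sunrisers Hyderabad",
    "Rising Pune Supergiant": "Pune Warriors",
    "Rising Pune Supergiants": "Pune Warriors",
    "Gujarat Lions": "Gujarat Titans",
}


def normalize_teams(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with team names in `columns` mapped through TEAM_RENAMES."""
    out = df.copy()
    for col in columns:
        # A dict passed to Series.replace matches whole values only,
        # so 'Rising Pune Supergiants' is never partially rewritten.
        out[col] = out[col].replace(TEAM_RENAMES)
    return out


toy = pd.DataFrame({
    "Team1": ["Rising Pune Supergiants", "Mumbai Indians"],
    "WinningTeam": ["Rising Pune Supergiant", "Kings XI Punjab"],
})
print(normalize_teams(toy, ["Team1", "WinningTeam"]))
```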


Okay! Now that we have a cleaned matches DataFrame, let's check the ball-by-ball DataFrame and build a
statistics DataFrame from it, which we can use to build a dashboard or perform visualisation.

balls.head()

   ID       innings  overs  ballnumber  batter       bowler          non-striker  extra_type  batsman_run  ...
0  1312200  1        0      1           YBK Jaiswal  Mohammed Shami  JC Buttler   NaN         0            ...
1  1312200  1        0      2           YBK Jaiswal  Mohammed Shami  JC Buttler   legbyes     0            ...
2  1312200  1        0      3           JC Buttler   Mohammed Shami  YBK Jaiswal  NaN         1            ...
3  1312200  1        0      4           YBK Jaiswal  Mohammed Shami  JC Buttler   NaN         0            ...
4  1312200  1        0      5           YBK Jaiswal  Mohammed Shami  JC Buttler   NaN         0            ...

print(balls.shape)
print(" -------------------- ")
print(balls.isnull().sum())

print(" -------------------- ")
print(balls.info())

(225954, 17)
--------------------
ID 0
innings 0
overs 0
ballnumber 0
batter 0
bowler 0
non-striker 0
extra_type 213905
batsman_run 0
extras_run 0
total_run 0
non_boundary 0
isWicketDelivery 0
player_out 214803
kind 214803
fielders_involved 217966
BattingTeam 0
dtype: int64
--------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225954 entries, 0 to 225953
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 225954 non-null int64
1 innings 225954 non-null int64
2 overs 225954 non-null int64
3 ballnumber 225954 non-null int64
4 batter 225954 non-null object
5 bowler 225954 non-null object
6 non-striker 225954 non-null object
7 extra_type 12049 non-null object
8 batsman_run 225954 non-null int64
9 extras_run 225954 non-null int64
10 total_run 225954 non-null int64
11 non_boundary 225954 non-null int64
12 isWicketDelivery 225954 non-null int64
13 player_out 11151 non-null object
14 kind 11151 non-null object
15 fielders_involved 7988 non-null object
16 BattingTeam 225954 non-null object
dtypes: int64(9), object(8)
memory usage: 29.3+ MB
None

batgroup = balls.groupby(['batter'])
# Balls faced = number of deliveries on which the player was the striker
batsman_Stats = pd.DataFrame(batgroup['ballnumber'].count()).rename(columns={'ballnumber':'Balls_Faced'})
batsman_Stats.head()


                Balls_Faced
batter
A Ashish Reddy          196
A Badoni                139
A Chandila                7
A Chopra                 75
A Choudhary              20


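# Note: this counts distinct values of the innings column (1, 2 or a super-over innings),
# not the number of innings a player batted
batsman_Stats['innings']=batgroup['innings'].nunique()
batsman_Stats.head()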

                Balls_Faced  innings
batter
A Ashish Reddy          196        2
A Badoni                139        2
A Chandila                7        1
A Chopra                 75        2
A Choudhary              20        2
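`batgroup['innings'].nunique()` counts how many distinct values of the `innings` column a batter appears under (1, 2, or a super-over innings). That is why every row above shows 1 or 2. Counting distinct (match `ID`, `innings`) pairs gives the number of innings actually batted. Below is a sketch on hypothetical deliveries (`toy_balls` is a made-up frame, so the notebook's `balls` is not overwritten).

```python
import pandas as pd

# Hypothetical ball-by-ball rows: match ID, innings number and striker.
toy_balls = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 3],
    "innings": [1, 1, 2, 2, 2, 1],
    "batter":  ["A", "A", "B", "A", "A", "A"],
})

# Distinct innings *numbers* per batter (what the cell above computes).
by_number = toy_balls.groupby("batter")["innings"].nunique()

# Distinct (match, innings) pairs per batter = innings actually batted.
innings_batted = (toy_balls[["batter", "ID", "innings"]]
                  .drop_duplicates()
                  .groupby("batter")
                  .size())

print(by_number.to_dict(), innings_batted.to_dict())
```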


batsman_Stats['runs']=batgroup['batsman_run'].sum()
batsman_Stats.head()

                Balls_Faced  innings  runs
batter
A Ashish Reddy          196        2   280
A Badoni                139        2   161
A Chandila                7        1     4
A Chopra                 75        2    53
A Choudhary              20        2    25


# Count balls on which the batter scored 0 runs; batters with no such balls get 0
batsman_Stats['0s'] = balls[balls['batsman_run'] == 0].groupby('batter')['batsman_run'].count()
batsman_Stats['0s'] = batsman_Stats['0s'].fillna(0)
batsman_Stats.head()


                Balls_Faced  innings  runs    0s
batter
A Ashish Reddy          196        2   280  61.0
A Badoni                139        2   161  57.0
A Chandila                7        1     4   3.0
A Chopra                 75        2    53  45.0
A Choudhary              20        2    25   4.0


# Count balls on which the batter scored exactly 1, 2, 3, 4 or 6 runs; missing counts become 0
for r in [1, 2, 3, 4, 6]:
    col = f'{r}s'
    batsman_Stats[col] = balls[balls['batsman_run'] == r].groupby('batter')['batsman_run'].count()
    batsman_Stats[col] = batsman_Stats[col].fillna(0)

batsman_Stats.head()

                Balls_Faced  innings  runs    0s    1s    2s   3s    4s    6s
batter
A Ashish Reddy          196        2   280  61.0  83.0  20.0  1.0  16.0  15.0
A Badoni                139        2   161  57.0  53.0  11.0  0.0  11.0   7.0
A Chandila                7        1     4   3.0   4.0   0.0  0.0   0.0   0.0
A Chopra                 75        2    53  45.0  21.0   2.0  0.0   7.0   0.0
A Choudhary              20        2    25   4.0  13.0   1.0  0.0   1.0   1.0
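The per-run-value counts above (0s through 6s) can also be built in one step with `pd.crosstab`. It counts every (batter, `batsman_run`) combination and reports 0 for combinations that never occur. Below is a sketch on hypothetical deliveries. As in the cells above, 5-run balls are left out, and unlike them the counts stay integers.

```python
import pandas as pd

# Hypothetical deliveries: striker and runs off the bat.
toy_balls = pd.DataFrame({
    "batter": ["A", "A", "A", "B", "B", "A"],
    "batsman_run": [0, 4, 1, 6, 0, 4],
})

# Rows are batters, columns are run values, cells are ball counts (0 where absent).
run_counts = pd.crosstab(toy_balls["batter"], toy_balls["batsman_run"])

# Keep the same run values as the cells above, adding any that never occurred as 0.
run_counts = run_counts.reindex(columns=[0, 1, 2, 3, 4, 6], fill_value=0)
run_counts.columns = [f"{r}s" for r in run_counts.columns]
print(run_counts)
```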


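# Note: counts dismissals on deliveries this batter faced, so a run-out of the non-striker is counted here too
batsman_Stats['player_out']=batgroup['player_out'].count()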
batsman_Stats.head()


                Balls_Faced  innings  runs    0s    1s    2s   3s    4s    6s  player_out
batter
A Ashish Reddy          196        2   280  61.0  83.0  20.0  1.0  16.0  15.0          15
A Badoni                139        2   161  57.0  53.0  11.0  0.0  11.0   7.0           9
A Chandila                7        1     4   3.0   4.0   0.0  0.0   0.0   0.0           1
A Chopra                 75        2    53  45.0  21.0   2.0  0.0   7.0   0.0           5
A Choudhary              20        2    25   4.0  13.0   1.0  0.0   1.0   1.0           2
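`batgroup['player_out'].count()` counts the deliveries faced by a batter on which *someone* was out. A run-out of the non-striker is therefore charged to the striker. Grouping by `player_out`, the dismissed player, credits each dismissal to the right person.

Below is a sketch on hypothetical deliveries. Excluding 'retired hurt' is an assumption about what should count as an out.

```python
import pandas as pd

# Hypothetical deliveries: on the 2nd ball the non-striker B is run out
# while A is on strike; on the 3rd ball A is bowled.
toy_balls = pd.DataFrame({
    "batter":     ["A", "A", "A", "B"],
    "player_out": [None, "B", "A", None],
    "kind":       [None, "run out", "bowled", None],
})

# Counting non-null player_out per striker credits B's run-out to A.
per_striker = toy_balls.groupby("batter")["player_out"].count()

# Grouping by the dismissed player credits each dismissal correctly
# (rows with no dismissal have a missing key and are dropped by groupby).
outs = toy_balls[toy_balls["kind"] != "retired hurt"]
dismissals = outs.groupby("player_out").size()

print(per_striker.to_dict(), dismissals.to_dict())
```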


# Batting average = runs per dismissal; strike rate = runs per 100 balls faced
batsman_Stats['bat_average'] = round(batsman_Stats['runs']/batsman_Stats['player_out'],2)
batsman_Stats['bat_average'] = batsman_Stats['bat_average'].fillna(0)
batsman_Stats['bat_strike'] = round(batsman_Stats['runs']/batsman_Stats['Balls_Faced']*100,2)
batsman_Stats['bat_strike'] = batsman_Stats['bat_strike'].fillna(0)
batsman_Stats.head()

                Balls_Faced  innings  runs    0s    1s    2s   3s    4s    6s  player_out  bat_average  ...
batter
A Ashish Reddy          196        2   280  61.0  83.0  20.0  1.0  16.0  15.0          15        18.67  ...
A Badoni                139        2   161  57.0  53.0  11.0  0.0  11.0   7.0           9        17.89  ...
A Chandila                7        1     4   3.0   4.0   0.0  0.0   0.0   0.0           1         4.00  ...
A Chopra                 75        2    53  45.0  21.0   2.0  0.0   7.0   0.0           5        10.60  ...
A Choudhary              20        2    25   4.0  13.0   1.0  0.0   1.0   1.0           2        12.50  ...
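`fillna(0)` after the division catches 0/0, which is NaN, but not runs/0. For a batter with runs and no dismissals that ratio is `inf`, and `fillna` leaves it in place. Below is a sketch on hypothetical numbers that masks zero denominators before dividing. Leaving those players as NaN rather than 0 is a design choice, so they are not mistaken for genuinely low averages.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"runs": [280, 40, 0], "player_out": [15, 0, 0], "Balls_Faced": [196, 30, 0]},
    index=["P1", "P2", "P3"],
)

# Plain division: 40 / 0 -> inf, 0 / 0 -> NaN; fillna(0) would only fix the NaN.
raw_avg = toy["runs"] / toy["player_out"]

# Replace zero denominators with NaN so undefined ratios come out as NaN, never inf.
bat_average = (toy["runs"] / toy["player_out"].replace(0, np.nan)).round(2)
bat_strike = (toy["runs"] / toy["Balls_Faced"].replace(0, np.nan) * 100).round(2)

print(raw_avg.tolist(), bat_average.tolist(), bat_strike.tolist())
```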

So here we have the batsman statistics; now let's create the bowler statistics.

bowlgroup = balls.groupby(['bowler'])
# Count every delivery (including wides and no-balls) bowled by each bowler
bowler_Stats = pd.DataFrame(bowlgroup['ballnumber'].count()).rename(columns={'ballnumber':'BallsThrow'})

balls['kind'].unique()

array([nan, 'caught', 'caught and bowled', 'run out', 'bowled', 'stumped',
       'lbw', 'hit wicket', 'retired hurt', 'retired out',
       'obstructing the field'], dtype=object)

# Keep only the dismissal kinds credited to the bowler
wickets_out = balls[balls['kind'].isin(['caught','bowled', 'lbw','stumped', 'caught and bowled', 'hit wicket'])]
bowler_Stats['wickets'] = wickets_out.groupby(['bowler'])['ballnumber'].count()
bowler_Stats.head()


                BallsThrow  wickets
bowler
A Ashish Reddy         270     18.0
A Badoni                12      2.0
A Chandila             234     11.0
A Choudhary            108      5.0
A Dananjaya             25      NaN


bowler_Stats['wickets'] = bowler_Stats['wickets'].fillna(0)

# Approximate overs bowled: all deliveries / 6, rounded to a whole number
bowler_Stats['overs'] = round(bowler_Stats['BallsThrow']/6)
bowler_Stats.head()

                BallsThrow  wickets  overs
bowler
A Ashish Reddy         270     18.0   45.0
A Badoni                12      2.0    2.0
A Chandila             234     11.0   39.0
A Choudhary            108      5.0   18.0
A Dananjaya             25      0.0    4.0
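`round(BallsThrow / 6)` counts every delivery row, including wides and no-balls, and rounds to whole overs: A Dananjaya's 25 deliveries become 4.0. Below is a sketch that counts legal balls only and keeps the fractional part. Treating 'wides' and 'noballs' as the non-legal `extra_type` values is an assumption about this dataset's labels, and the deliveries are hypothetical.

```python
import pandas as pd

# Hypothetical deliveries for one bowler: 25 legal balls plus one wide and one no-ball.
toy_balls = pd.DataFrame({
    "bowler": ["X"] * 27,
    "extra_type": [None] * 25 + ["wides", "noballs"],
})

# Wides and no-balls do not count towards the over.
legal = toy_balls[~toy_balls["extra_type"].isin(["wides", "noballs"])]
legal_balls = legal.groupby("bowler").size()

# Cricket notation: completed overs '.' leftover balls, e.g. 25 balls -> "4.1".
overs_notation = (legal_balls // 6).astype(str) + "." + (legal_balls % 6).astype(str)

# Decimal overs for rate calculations: 25 balls -> 25 / 6 (about 4.1667), not 4.
overs_decimal = legal_balls / 6

print(legal_balls["X"], overs_notation["X"], round(overs_decimal["X"], 4))
```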


bowler_Stats['runs_conceded'] = balls.groupby('bowler')['batsman_run'].sum()
bowler_Stats['runs_conceded'] = bowler_Stats['runs_conceded'].fillna(0)
bowler_Stats.head()

                BallsThrow  wickets  overs  runs_conceded
bowler
A Ashish Reddy         270     18.0   45.0            386
A Badoni                12      2.0    2.0             11
A Chandila             234     11.0   39.0            242
A Choudhary            108      5.0   18.0            137
A Dananjaya             25      0.0    4.0             46

bowler_Stats['runs_conceded'] = bowler_Stats['runs_conceded'].add(balls[balls['extra_type'].isin(['wid

bowler_Stats.head()

                BallsThrow  wickets  overs  runs_conceded
bowler
A Ashish Reddy         270     18.0   45.0          396.0
A Badoni                12      2.0    2.0           11.0
A Chandila             234     11.0   39.0          242.0
A Choudhary            108      5.0   18.0          144.0
A Dananjaya             25      0.0    4.0           47.0
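The `runs_conceded` update above adds extras of certain types to each bowler's total. Below is one way to do that, sketched on hypothetical deliveries. It assumes that wide and no-ball extras (labelled 'wides' and 'noballs', an assumption about this dataset's `extra_type` values) are charged to the bowler, that byes and leg-byes are not, and that `extras_run` holds the extra runs on each ball.

```python
import pandas as pd

# Hypothetical deliveries for two bowlers.
toy_balls = pd.DataFrame({
    "bowler":      ["X", "X", "X", "X", "Y"],
    "batsman_run": [4, 0, 0, 1, 2],
    "extras_run":  [0, 1, 2, 0, 0],
    "extra_type":  [None, "wides", "legbyes", None, None],
})

# Runs off the bat are always charged to the bowler.
off_bat = toy_balls.groupby("bowler")["batsman_run"].sum()

# Assumption: only wide and no-ball extras count against the bowler.
bowler_extras = (toy_balls[toy_balls["extra_type"].isin(["wides", "noballs"])]
                 .groupby("bowler")["extras_run"].sum())

# fill_value=0 keeps bowlers with no wides/no-balls instead of turning them into NaN.
runs_conceded = off_bat.add(bowler_extras, fill_value=0)
print(runs_conceded.to_dict())
```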


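# Economy = runs conceded per over; strike rate = balls bowled per wicket (inf when a bowler has no wickets)
bowler_Stats['bowl_econ'] = round(bowler_Stats['runs_conceded']/bowler_Stats['overs'],2)
bowler_Stats['bowl_strike_rate']=round(bowler_Stats['BallsThrow']/bowler_Stats['wickets'],2)
bowler_Stats.head()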

                BallsThrow  wickets  overs  runs_conceded  bowl_econ  bowl_strike_rate
bowler
A Ashish Reddy         270     18.0   45.0          396.0       8.80             15.00
A Badoni                12      2.0    2.0           11.0       5.50              6.00
A Chandila             234     11.0   39.0          242.0       6.21             21.27
A Choudhary            108      5.0   18.0          144.0       8.00             21.60
A Dananjaya             25      0.0    4.0           47.0      11.75               inf
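Because `overs` is rounded to whole numbers and `wickets` can be 0, the economy and strike-rate columns above are approximate and can contain `inf`, as for A Dananjaya. Below is a sketch that divides by fractional overs and leaves wicketless bowlers as NaN.

The two rows reuse A Ashish Reddy's and A Dananjaya's delivery, run and wicket counts from the table above. Every delivery is treated as a legal ball purely for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"legal_balls": [270, 25], "runs_conceded": [396.0, 47.0], "wickets": [18, 0]},
    index=["P1", "P2"],
)

# Fractional overs: 25 balls -> 25 / 6 overs rather than a rounded 4.
overs = toy["legal_balls"] / 6

# Economy: runs conceded per over.
bowl_econ = (toy["runs_conceded"] / overs).round(2)

# Strike rate: balls per wicket; NaN instead of inf when no wickets were taken.
bowl_strike_rate = (toy["legal_balls"] / toy["wickets"].replace(0, np.nan)).round(2)

print(bowl_econ.tolist(), bowl_strike_rate.tolist())
```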

players_matches_dict = {}

# Iterate over each row in the dataframe
for i, row in balls.iterrows():
    # Check if the batter is already in the dictionary
    if row['batter'] in players_matches_dict:
        players_matches_dict[row['batter']].add(row['ID'])
    else:
        players_matches_dict[row['batter']] = {row['ID']}

    # Check if the non-striker is already in the dictionary
    if row['non-striker'] in players_matches_dict:
        players_matches_dict[row['non-striker']].add(row['ID'])
    else:
        players_matches_dict[row['non-striker']] = {row['ID']}

    # Check if the bowler is already in the dictionary
    if row['bowler'] in players_matches_dict:
        players_matches_dict[row['bowler']].add(row['ID'])
    else:
        players_matches_dict[row['bowler']] = {row['ID']}

# Create a dataframe with players and their number of matches (size of each player's set of match IDs)
final_players_matches = pd.DataFrame({'Players': list(players_matches_dict.keys())})
final_players_matches['matches'] = final_players_matches['Players'].apply(lambda x: len(players_matches_dict[x]))
final_players_matches = final_players_matches.set_index('Players')

final_players_matches.head()

                matches
Players
YBK Jaiswal          23
JC Buttler           81
Mohammed Shami       93
Yash Dayal            9
SV Samson           134
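The `iterrows()` loop above visits all 225,954 deliveries one at a time, which is slow in pandas. The same per-player count of distinct match IDs can be computed with `melt` and `groupby`. Below is a sketch on hypothetical deliveries. Like the loop, it only counts matches in which a player batted, was the non-striker, or bowled, so fielding-only appearances are missed.

```python
import pandas as pd

# Hypothetical deliveries from three matches.
toy_balls = pd.DataFrame({
    "ID":          [10, 10, 11, 12],
    "batter":      ["A", "B", "A", "C"],
    "non-striker": ["B", "A", "C", "A"],
    "bowler":      ["X", "X", "Y", "X"],
})

# Stack the three player columns into one long column of (match ID, player) pairs...
long_df = toy_balls.melt(id_vars="ID", value_vars=["batter", "non-striker", "bowler"],
                         value_name="Players")

# ...then count distinct matches per player, as the set-based loop does.
matches_played = long_df.groupby("Players")["ID"].nunique().rename("matches").to_frame()
print(matches_played["matches"].to_dict())
```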


# Catches taken by players: caught-and-bowled is credited to the bowler, plain catches to the fielder
outbyCatch = balls[balls['kind'].isin(['caught and bowled'])].groupby('bowler')['ballnumber'].count().rename('bowler_catches')
justCatch = balls[balls['kind'].isin(['caught'])].groupby('fielders_involved')['ballnumber'].count().rename('fielder_catches')

catchDf = pd.merge(outbyCatch,justCatch, left_index=True, right_index=True,how='outer')
catchDf.fillna(0,inplace=True)
catchDf['catches'] = catchDf['bowler_catches']+catchDf['fielder_catches']
catchDf.drop(['bowler_catches','fielder_catches'],axis=1,inplace=True)

catchDf.head()

                catches
A Ashish Reddy      9.0
A Badoni            9.0
A Chandila          7.0
A Chopra            2.0
A Flintoff          4.0
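Both catch counts are Series indexed by player name, so `Series.add` with `fill_value=0` can combine them without an explicit merge. Below is a sketch on hypothetical deliveries, following the same rule as the cell above: caught-and-bowled is credited to the bowler, ordinary catches to `fielders_involved`.

```python
import pandas as pd

# Hypothetical dismissals.
toy_balls = pd.DataFrame({
    "bowler":            ["X", "X", "Y", "Y"],
    "kind":              ["caught", "caught and bowled", "caught", "bowled"],
    "fielders_involved": ["A", None, "X", None],
})

bowler_catches = (toy_balls[toy_balls["kind"] == "caught and bowled"]
                  .groupby("bowler").size())
fielder_catches = (toy_balls[toy_balls["kind"] == "caught"]
                   .groupby("fielders_involved").size())

# Aligns on player name; fill_value=0 covers players present in only one Series.
catches = bowler_catches.add(fielder_catches, fill_value=0).rename("catches")
print(catches.to_dict())
```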


Now that we have batsman and bowler statistics, let's merge them with pd.merge into one full-fledged
DataFrame holding all player information, which we can then use to find clusters of the best players.

# Merging Batsman Stats
final_df = pd.merge(final_players_matches,batsman_Stats, left_index=True, right_index=True,how='outer')
# Merging Bowler Stats

final_df = pd.merge(final_df,bowler_Stats, left_index=True, right_index=True,how='outer')
# Merging Catches Stats of Each Player
final_df = pd.merge(final_df,catchDf, left_index=True, right_index=True,how='outer')
# Merging the number of Player of the Match awards per player
final_df = pd.merge(final_df,matches['Player_of_Match'].value_counts(),left_index=True, right_index=Tr
# Replace NaN values with 0, e.g. a player who never won Player of the Match has no award count
final_df.fillna(0,inplace=True)

final_df.head()

                matches  Balls_Faced  innings   runs    0s    1s    2s   3s    4s    6s  player_out  ...
A Ashish Reddy     28.0        196.0      2.0  280.0  61.0  83.0  20.0  1.0  16.0  15.0        15.0  ...
A Badoni           11.0        139.0      2.0  161.0  57.0  53.0  11.0  0.0  11.0   7.0         9.0  ...
A Chandila         12.0          7.0      1.0    4.0   3.0   4.0   0.0  0.0   0.0   0.0         1.0  ...
A Chopra            6.0         75.0      2.0   53.0  45.0  21.0   2.0  0.0   7.0   0.0         5.0  ...
A Choudhary         5.0         20.0      2.0   25.0   4.0  13.0   1.0  0.0   1.0   1.0         2.0  ...

Great! Now let's do some data visualisation to see which players have been consistently good from 2008 to 2022.

import plotly.express as px

# Merging Batsman Stats


final_df = pd.merge(final_players_matches,batsman_Stats, left_index=True, right_index=True,how='outer')
# Merging Bowler Stats
final_df = pd.merge(final_df,bowler_Stats, left_index=True, right_index=True,how='outer')
# Merging Catches Stats of Each Player
final_df = pd.merge(final_df,catchDf, left_index=True, right_index=True,how='outer')
# Merging the number of Player of the Match awards per player
# Reset the index of the value_counts Series to make 'Player_of_Match' a column
player_of_match_counts = matches['Player_of_Match'].value_counts().reset_index()
# Rename the columns to avoid conflicts
player_of_match_counts.columns = ['Player_of_Match', 'Player_of_Match_Count']
# Now merge with final_df
final_df = pd.merge(final_df, player_of_match_counts, left_index=True, right_on='Player_of_Match', how='
# Replace NaN values with 0, e.g. a player who never won Player of the Match has no award count
final_df.fillna(0,inplace=True)

# Now you can create the plot using 'Player_of_Match' as the x-axis
fig = px.bar(final_df, x='Player_of_Match', y='runs', title='Number of runs scored by different players
fig.show()


[Figure: bar chart "Number of runs scored by different players and they get Player of Match"; x-axis: Player of Match, y-axis: runs]
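In the output further down, the filtered `final_df` rows are labelled 12.0, 40.0, 106.0, 0.0 and 37.0 instead of player names. This suggests the `left_index=True, right_on='Player_of_Match'` merge above replaced the player-name index.

Joining the award counts on the index keeps the names as the index. Below is a sketch with hypothetical players; a plot that needs names on an axis could then use `final_df.index`.

```python
import pandas as pd

# Hypothetical player table indexed by name, and a Player_of_Match column from matches.
toy_final = pd.DataFrame({"runs": [5181, 280]}, index=["Player A", "Player B"])
toy_pom = pd.Series(["Player A", "Player A", "Player C"], name="Player_of_Match")

# value_counts() is indexed by player name, so join() aligns it with toy_final's
# index (a left join by default) and leaves that index untouched.
pom_counts = toy_pom.value_counts().rename("Player_of_Match_Count")
toy_final = toy_final.join(pom_counts)
toy_final["Player_of_Match_Count"] = toy_final["Player_of_Match_Count"].fillna(0)
print(toy_final)
```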

# Create a bar plot to see the number of wickets taken by different players who won Player of the Match
fig = px.bar(final_df, x='Player_of_Match', y='wickets', title='Number of Wickets taken by different pla
# show the plot
fig.show()


[Figure: bar chart "Number of Wickets taken by different players..." (title cut off in the export); x-axis: Player of Match, y-axis: wickets]

# Create the scatter plot to see which players won Player of the Match most often
# The 'Player_of_Match_Count' column (created earlier) contains the numerical data
# representing the number of times a player was Player of the Match.
# We should use this for 'y' and 'size' instead of the player's name.

fig = px.scatter(final_df,
x='matches',
y='Player_of_Match_Count', # Changed to numerical column
color='Player_of_Match',
size='Player_of_Match_Count', # Changed to numerical column
hover_name=final_df.index,
title='Player of the Match')
fig.update_layout(coloraxis=dict(colorscale='reds'))

# Show the plot


fig.show()


[Figure: scatter plot "Player of the Match"; x-axis: matches, y-axis: Player_of_Match_Count, colour legend: Player_of_Match]

# Create the scatter plot to see which players have the best strike rate
fig = px.scatter(final_df, x='matches', y='bat_strike',color='bat_strike',
size='bat_strike', hover_name=final_df.index, title='Batsman Strike Rate')
fig.update_layout(coloraxis=dict(colorscale='reds'))

# Show the plot


fig.show()


[Figure: scatter plot "Batsman Strike Rate"; x-axis: matches, y-axis: bat_strike]

# Create the scatter plot to see which players took the most catches
fig = px.scatter(final_df, x='matches', y='catches',color='catches',
size='catches', hover_name=final_df.index, title='Most Catches by Players')
fig.update_layout(coloraxis=dict(colorscale='reds'))

# Show the plot


fig.show()


[Figure: scatter plot "Most Catches by Players"; x-axis: matches, y-axis: catches]

# Create the scatter plot to see the Most 6s by a Batsman


fig = px.scatter(final_df, x='matches', y='6s',color='6s',
size='6s', hover_name=final_df.index, title='Most 6s by a Batsman')
fig.update_layout(coloraxis=dict(colorscale='greens'))

# Show the plot


fig.show()


[Figure: scatter plot "Most 6s by a Batsman"; x-axis: matches, y-axis: 6s]

# Create the scatter plot to see the Most 4s by a Batsman


fig = px.scatter(final_df, x='matches', y='4s',color='4s',
size='4s', hover_name=final_df.index, title='Most 4s by a Batsman')
fig.update_layout(coloraxis=dict(colorscale='blues'))

# Show the plot


fig.show()


[Figure: scatter plot "Most 4s by a Batsman"; x-axis: matches, y-axis: 4s]

Having seen the best players in the IPL so far, let's look at the players who missed opportunities: those who
scored 0 off the most balls they faced, and those who were dismissed most often.

# Create the scatter plot to see which batsmen scored 0 off the most balls they faced
fig = px.scatter(final_df, x='matches', y='0s',color='0s',
                 size='0s', hover_name=final_df.index, title='Most 0s by a Batsman for each ball they face')
fig.update_layout(coloraxis=dict(colorscale='blues'))

# Show the plot


fig.show()


[Figure: scatter plot "Most 0s by a Batsman for each ball they face"; x-axis: matches, y-axis: 0s]

# Approximate not-out matches as matches played minus dismissals
# (matches also counts games in which the player only bowled)
final_df['not_out'] = final_df['matches'] - final_df['player_out']

# Create the scatter plot to see the Most Time Out by a Batsman
fig = px.scatter(final_df, x='player_out', y='not_out', color='runs',
size='player_out', hover_name=final_df.index,
title='Most Time Out by a Batsman vs Matches Played to Player Not Out Matches')
fig.update_layout(coloraxis=dict(colorscale='blues'))

# Show the plot


fig.show()


[Figure: scatter plot "Most Time Out by a Batsman vs Matches Played to Player Not Out Matches"; x-axis: player_out, y-axis: not_out, colour: runs]

Now that we know the players' best and worst performances, let's visualise the top 10 batsmen, bowlers,
strike rates and the best team.

final_df = final_df[final_df['matches']>50]
final_df.head()

       matches  Balls_Faced  innings    runs      0s      1s     2s    3s     4s     6s  player_out  ...
12.0     154.0        410.0      2.0   362.0   182.0   172.0   22.0   0.0   29.0    5.0        30.0  ...
40.0      88.0         63.0      2.0    41.0    37.0    21.0    1.0   0.0    3.0    1.0         8.0  ...
106.0     76.0         50.0      2.0    26.0    31.0    16.0    1.0   0.0    2.0    0.0         9.0  ...
0.0      170.0       3487.0      4.0  5181.0  1115.0  1420.0  268.0  17.0  414.0  253.0       125.0  ...
37.0      80.0       1555.0      2.0  2069.0   737.0   417.0   66.0   4.0  239.0   92.0        76.0  ...

import plotly.graph_objects as go

# Sort the dataframe by batting average and select the top 10
df_top10 = final_df.sort_values('bat_average', ascending=False).head(10)

# Create the bar chart; each bar's label shows matches, runs, outs and not-outs
fig = go.Figure(data=[go.Bar(
    x=df_top10.index, y=df_top10['bat_average'],
    text=df_top10['matches'].astype(str) + ' matches, ' + df_top10['runs'].astype(str) + ' runs, '
         + df_top10['player_out'].astype(str) + ' outs, ' + df_top10['not_out'].astype(str) + ' not outs',
    textposition='auto',
    marker=dict(color=df_top10['bat_average'], coloraxis="coloraxis")
)])
fig.update_layout(title='Top 10 Batsmen based on Batting Average', xaxis_title="Player", yaxis_title="Batting Average")
fig.show()

[Bar chart: Top 10 Batsmen based on Batting Average (x-axis: Player, y-axis: Batting Average; bar labels give matches, runs, outs and not outs)]
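Batting average is runs scored divided by the number of times out. A player who was never dismissed produces a division by zero, which pandas turns into `inf`. This is why the clustering section later has to deal with infinite values. A minimal sketch with hypothetical players; the first row reuses the 5181 runs and 125 outs shown in the chart, and mapping `inf` to NaN is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical per-player totals
stats = pd.DataFrame({'runs': [5181, 120, 40], 'player_out': [125, 4, 0]},
                     index=['P1', 'P2', 'P3'])

# Batting average = runs / dismissals; 40 / 0 becomes inf in pandas
stats['bat_average'] = stats['runs'] / stats['player_out']

# Mark averages with no dismissals as missing instead of infinite, then round
stats['bat_average'] = stats['bat_average'].replace([np.inf, -np.inf], np.nan).round(2)
print(stats)
```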

import plotly.graph_objects as go

# Sort the dataframe by wickets taken and select the top 10
df_top10 = final_df.sort_values('wickets', ascending=False).head(10)

# Create the bar chart; each bar's label shows matches, wickets, runs conceded and balls bowled
fig = go.Figure(data=[go.Bar(
    x=df_top10.index, y=df_top10['wickets'],
    text=df_top10['matches'].astype(str) + ' matches, ' + df_top10['wickets'].astype(str) + ' wickets, '
         + df_top10['runs_conceded'].astype(str) + ' RunGive, ' + df_top10['BallsThrow'].astype(str) + ' BallsThrow',
    textposition='auto',
    marker=dict(color=df_top10['wickets'], coloraxis="coloraxis")
)])
fig.update_layout(title='Top 10 Bowlers based on Wickets', xaxis_title="Player", yaxis_title="Wickets")
fig.show()


[Bar chart: Top 10 Bowlers based on Wickets (x-axis: Player, y-axis: Wickets; bar labels give matches, wickets, RunGive and BallsThrow)]
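Wickets alone do not show how expensive a bowler was. Two standard cricket measures can be derived from the same totals: economy rate (runs conceded per six-ball over) and bowling strike rate (balls bowled per wicket). A sketch on hypothetical rows that reuse figures from the chart, assuming `BallsThrow` counts legal deliveries; how the notebook itself defines `bowl_econ` and `bowl_strike_rate` is not shown here.

```python
import pandas as pd

# Hypothetical bowler totals, reusing two sets of figures from the chart above
bowlers = pd.DataFrame({'runs_conceded': [4360, 3976],
                        'BallsThrow':    [3296, 3317],
                        'wickets':       [183, 166]},
                       index=['B1', 'B2'])

# Economy rate: runs conceded per over (6 legal balls per over)
bowlers['economy'] = bowlers['runs_conceded'] / (bowlers['BallsThrow'] / 6)

# Bowling strike rate: balls bowled per wicket taken
bowlers['strike_rate'] = bowlers['BallsThrow'] / bowlers['wickets']
print(bowlers.round(2))
```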

matches.head(2)

   Date        MatchNumber  Team1                        Team2             Venue                             TossWinner        TossDecision  SuperOver  Winnin...
0  2022-05-29  Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  Rajasthan Royals  bat           N          G...
1  2022-05-27  Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  Rajasthan Royals  field         N          Raj...


import plotly.graph_objs as go

# Count the matches each team has played (a team can appear as Team1 or Team2)
played = matches["Team1"].value_counts().add(matches["Team2"].value_counts(), fill_value=0)

# Count each team's wins from the recorded winner (no-result matches are NaN and are skipped)
wins = matches["WinningTeam"].value_counts().reindex(played.index, fill_value=0)

# Convert to win percentage and sort the teams from best to worst
win_percentages = (wins / played * 100).sort_values(ascending=False)

# Plot the results
fig = go.Figure()
fig.add_trace(go.Bar(x=win_percentages.index[:10], y=win_percentages.values[:10], name="Win Percentage"))
fig.update_layout(title="Top 10 Best Performing Teams", xaxis_title="Team", yaxis_title="Win Percentage")
fig.show()

[Bar chart: Top 10 Best Performing Teams (x-axis: Team, y-axis: Win Percentage)]
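Raw win counts favour teams that have simply played more seasons, so a fair ranking divides wins by matches played. A sketch of that calculation on a hypothetical four-match table (X, Y and Z are placeholder team names; `None` marks a no-result match):

```python
import pandas as pd

# Hypothetical results; None marks a no-result match
results = pd.DataFrame({
    'Team1':       ['X', 'X', 'Y', 'Z'],
    'Team2':       ['Y', 'Z', 'Z', 'X'],
    'WinningTeam': ['X', 'Z', 'Y', None],
})

# Matches played: appearances as either Team1 or Team2
played = results['Team1'].value_counts().add(results['Team2'].value_counts(), fill_value=0)

# Wins per team (None is ignored); teams without a win get 0
wins = results['WinningTeam'].value_counts().reindex(played.index, fill_value=0)

win_pct = (wins / played * 100).sort_values(ascending=False)
print(win_pct)
```

Here every team has exactly one win, yet Y ranks first because it played only two matches, which a count-based chart would hide.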

# The last 10 entries of the descending ranking are the weakest teams
fig = go.Figure()
fig.add_trace(go.Bar(x=win_percentages.index[-10:], y=win_percentages.values[-10:], name="Win Percentage"))
fig.update_layout(title="Top 10 Least Performing Teams", xaxis_title="Team", yaxis_title="Win Percentage")
fig.show()


[Bar chart: Top 10 Least Performing Teams (x-axis: Team, y-axis: Win Percentage)]

matches = pd.read_csv('https://raw.githubusercontent.com/simranjeet97/IPL2023_WinningPrediction_EDA_Da
# First drop the unwanted columns, then any rows with missing values
matches.drop(['City','ID','method'],axis=1,inplace=True)
matches = matches.dropna()
matches['SEASON_INT'] = matches['Season'].apply(lambda x: int(x[:4]))
# A season label such as '2007/08' ends in 2008; single-year labels such as '2022' are used as-is
matches['SEASON_END_INT'] = matches['Season'].apply(lambda x: int('20'+str(x[5:])) if len(x)>5 else int(x))

# Convert the Date column to a datetime data type
matches['Date'] = pd.to_datetime(matches['Date'])

# Sort by date and keep the last match (the final) of each season
last_matches = matches.sort_values('Date').groupby('Season').tail(1)

# Select the season end year and the winning team
winning_teams = last_matches[['SEASON_END_INT', 'WinningTeam']]

# Plot the title-winning team of each season
import plotly.express as px
fig = px.bar(winning_teams, x='WinningTeam', y='SEASON_END_INT', hover_name='WinningTeam', color='SEASON_END_INT')
fig.show()


[Stacked bar chart: season-winning teams (x-axis: WinningTeam, y-axis: SEASON_END_INT, colour: SEASON_END_INT)]
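Because the chart stacks season years, each bar's height is a sum of years rather than a number of titles. A sketch that parses the season label, keeps the last match of each season (assumed to be the final, as in the cell above) and counts titles per team instead; the rows and team names A, B and C are hypothetical.

```python
import pandas as pd

# Hypothetical match rows; 'Season' mixes 'YYYY/YY' and 'YYYY' labels
demo = pd.DataFrame({
    'Season':      ['2007/08', '2007/08', '2009', '2009', '2010'],
    'Date':        pd.to_datetime(['2008-05-01', '2008-06-01', '2009-05-10',
                                   '2009-05-24', '2010-04-25']),
    'WinningTeam': ['A', 'B', 'B', 'C', 'B'],
})

def season_end_year(season: str) -> int:
    """'2007/08' -> 2008, '2009' -> 2009."""
    return int('20' + season[5:]) if len(season) > 5 else int(season)

demo['SEASON_END_INT'] = demo['Season'].apply(season_end_year)

# Last match of each season by date
finals = demo.sort_values('Date').groupby('Season').tail(1)

# Count titles per team instead of summing years
titles = finals['WinningTeam'].value_counts()
print(titles)
```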

Let's find the best players of all time using K-Means clustering

# Plot the correlation matrix to find the most correlated columns, which are dropped below
plt.figure(figsize=(20,10))
# Compute the correlation matrix on the numeric columns only
numeric_df = final_df.select_dtypes(include=np.number)
sns.heatmap(numeric_df.corr(),annot=True)
plt.show()


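def correlation(data, threshold):
    corr_matrix = data.corr()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    # np.bool was removed in NumPy 1.24, so the built-in bool is used instead.
    upper_triangle = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    col_corr = [col for col in upper_triangle.columns if any(upper_triangle[col] > threshold)]
    return col_corr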

def correlation(data, threshold):
    """
    Finds numeric columns that are highly correlated with an earlier column.

    Args:
        data: The input DataFrame.
        threshold: The correlation threshold.

    Returns:
        A list of column names whose correlation with some earlier column exceeds the threshold.
    """
    # Select only numeric columns before calculating correlation
    numeric_data = data.select_dtypes(include=np.number)

    corr_matrix = numeric_data.corr()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once
    upper_triangle = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    col_corr = [col for col in upper_triangle.columns if any(upper_triangle[col] > threshold)]
    return col_corr
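The cells shown here never call `correlation`; the columns are dropped by hand below. A sketch of how such a helper could be used on a hypothetical four-column frame. This variant, named `correlated_columns` to avoid clashing with the notebook's function, compares the absolute correlation so that strongly negative pairs are caught too; that is an addition, not part of the original function.

```python
import numpy as np
import pandas as pd

def correlated_columns(data: pd.DataFrame, threshold: float) -> list:
    """Return columns whose |correlation| with an earlier column exceeds threshold."""
    corr = data.select_dtypes(include=np.number).corr().abs()
    # Upper triangle without the diagonal, so every pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    return [col for col in upper.columns if (upper[col] > threshold).any()]

demo = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],      # exactly 2 * a
    'c': [4, 1, 3, 2],      # weakly (negatively) related to a
    'd': [-1, -2, -3, -4],  # exactly -a
})
to_drop = correlated_columns(demo, threshold=0.9)
print(to_drop)
```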

# Drop the count columns judged to be highly correlated with the remaining features
top_columns = final_df.drop(['runs',
                             '0s',
                             '1s',
                             '2s',
                             '3s',
                             '4s',
                             '6s',
                             'player_out',
                             'wickets',
                             'overs',
                             'runs_conceded',
                             'bowl_strike_rate',
                             'not_out'], axis=1)

# Convert any object-typed columns to numeric, replacing non-numeric values with NaN
for col in top_columns.columns:
    if top_columns[col].dtype == object:  # Check if column is of object type
        try:
            top_columns[col] = pd.to_numeric(top_columns[col], errors='coerce')
        except (ValueError, TypeError):
            print(f"Could not convert column '{col}' to numeric.")

# Keep only the rows in which every value is finite (no NaN or +/-inf)
final_df_new = top_columns[np.isfinite(top_columns).all(axis=1)]
final_df_new.head()

matches Balls_Faced innings bat_average bat_strike BallsThrow bowl_econ catches Player_of

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Convert any object-typed columns to numeric, replacing non-numeric values with NaN
for col in top_columns.columns:
    if top_columns[col].dtype == object:  # Check if column is of object type
        try:
            top_columns[col] = pd.to_numeric(top_columns[col], errors='coerce')
        except (ValueError, TypeError):
            print(f"Could not convert column '{col}' to numeric.")

# Instead of removing every row with a non-finite value, impute the missing values:
# here each NaN is replaced with its column mean
for col in top_columns.columns:
    top_columns[col] = top_columns[col].fillna(top_columns[col].mean())

# The DataFrame now has data for scaling
final_df_new = top_columns
final_df_new.head()


       matches  Balls_Faced  innings  bat_average  bat_strike  BallsThrow  bowl_econ  catches  Play...
12.0     154.0        410.0      2.0        12.07       88.29      3317.0       7.19     21.0
40.0      88.0         63.0      2.0         5.12       65.08      1974.0       7.58     19.0
106.0     76.0         50.0      2.0         2.89       52.00      1589.0       7.82      7.0
0.0      170.0       3487.0      4.0        41.45      148.58         0.0       0.00    120.0
37.0      80.0       1555.0      2.0        27.22      133.05         1.0       0.00     51.0

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt

# Assuming final_df_new is your preprocessed DataFrame

# 1. Replace infinite values with NaN (SimpleImputer only treats NaN as missing)
final_df_new = final_df_new.replace([np.inf, -np.inf], np.nan)

# 2. Impute NaN values with the column mean
imputer = SimpleImputer(strategy='mean')
final_df_new_imputed_array = imputer.fit_transform(final_df_new)

# 3. Wrap the imputed array in a DataFrame. SimpleImputer drops columns that are
#    entirely NaN, so take the surviving column names from the imputer itself
final_df_new_imputed = pd.DataFrame(final_df_new_imputed_array,
                                    columns=imputer.get_feature_names_out(),
                                    index=final_df_new.index)

# 4. Drop columns with zero variance, since they carry no information for clustering
for col in final_df_new_imputed.columns:
    if final_df_new_imputed[col].std() == 0:
        print(f"Dropping column '{col}' due to zero variance.")
        final_df_new_imputed = final_df_new_imputed.drop(columns=[col])

# 5. Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaled_data = scaler.fit_transform(final_df_new_imputed)

# 6. Fit KMeans for k = 1..11 and plot the elbow curve of inertia against k
clusters = range(1, 12)
errors = []
for k in clusters:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(scaled_data)
    errors.append(model.inertia_)

plt.xlabel('K')
plt.ylabel('Inertia (within-cluster sum of squares)')
plt.plot(clusters, errors, 'bx-')
plt.show()
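The elbow curve has to be read by eye. As a complementary check (a different technique from the one used in this notebook), the silhouette score can be computed for each candidate k, where higher means better-separated clusters. A sketch on synthetic data from scikit-learn's `make_blobs`, with four generated centres chosen purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data with 4 generated groups, standardized as in the cell above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 8):  # the silhouette score is undefined for k = 1
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print('inertia:', {k: round(v, 1) for k, v in inertias.items()})
print('silhouette:', {k: round(v, 3) for k, v in silhouettes.items()})
print('best k by silhouette:', best_k)
```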


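# Fit the final model with k = 4 clusters and attach each player's cluster label
model = KMeans(n_clusters=4, n_init=10, random_state=42)
y_pred = model.fit_predict(scaled_data)
final_df_new['cluster'] = y_pred

final_df_new.head()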

       matches  Balls_Faced  innings  bat_average  bat_strike  BallsThrow  bowl_econ  catches  Play...
12.0     154.0        410.0      2.0        12.07       88.29      3317.0       7.19     21.0
40.0      88.0         63.0      2.0         5.12       65.08      1974.0       7.58     19.0
106.0     76.0         50.0      2.0         2.89       52.00      1589.0       7.82      7.0
0.0      170.0       3487.0      4.0        41.45      148.58         0.0       0.00    120.0
37.0      80.0       1555.0      2.0        27.22      133.05         1.0       0.00     51.0
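The inf-to-NaN replacement, mean imputation, scaling and clustering steps can also be chained in a scikit-learn `Pipeline`, so the same preprocessing is applied every time the model is refit. A minimal sketch on a hypothetical frame; the column names are borrowed from the notebook, the values are invented, and k = 2 is chosen only to suit the toy data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical player features, including an infinite and a missing value
players = pd.DataFrame({
    'bat_average': [41.45, 27.22, np.inf, 5.12, 2.89, 30.0],
    'bat_strike':  [148.58, 133.05, 120.0, 65.08, 52.0, np.nan],
    'bowl_econ':   [0.0, 0.0, 0.0, 7.58, 7.82, 7.19],
})

# SimpleImputer rejects infinite values, so convert +/-inf to NaN first
features = players.replace([np.inf, -np.inf], np.nan)

pipeline = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=42),
)
players['cluster'] = pipeline.fit_predict(features)
print(players)
```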


# Move the player index into a column named PlayerNames and keep it with the cluster label
final_df_new.reset_index(inplace=True)
topPlayers = final_df_new.rename(columns = {'index':'PlayerNames'})
topPlayers_cluster = pd.DataFrame(topPlayers[['PlayerNames','cluster']])

topPlayers_cluster


      PlayerNames  cluster
0            12.0        2
1            40.0        2
2           106.0        2
3             0.0        3
4            37.0        0
...           ...      ...
127           NaN        1
128           6.0        3
129          49.0        2
130          56.0        1
131         218.0        2

132 rows × 2 columns


# Collect the player names of each cluster into a list
teams1 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==0, 'PlayerNames'].tolist()
teams2 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==1, 'PlayerNames'].tolist()
teams3 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==2, 'PlayerNames'].tolist()
teams4 = topPlayers_cluster.loc[topPlayers_cluster['cluster']==3, 'PlayerNames'].tolist()

# Build the table from a dict of Series so that shorter columns are padded with NaN;
# assigning pd.Series(...) to an existing frame would silently drop names whenever a
# later cluster is larger than cluster 0
TopPlayer_Dataset = pd.DataFrame({'teams1': pd.Series(teams1),
                                  'teams2': pd.Series(teams2),
                                  'teams3': pd.Series(teams3),
                                  'teams4': pd.Series(teams4)})
TopPlayer_Dataset = TopPlayer_Dataset.fillna('')

TopPlayer_Dataset


teams1 teams2 teams3 teams4

0 37.0 13.0 12.0 0.0

1 23.0 64.0 40.0 11.0

2 67.0 106.0 1.0

3 68.0 48.0 33.0 2.0

4 66.0 86.0 53.0 32.0

5 136.0 211.0 78.0 7.0

6 10.0 112.0 135.0 36.0

7 95.0 16.0 4.0

8 21.0 35.0 70.0 20.0

9 63.0 27.0 80.0 3.0

10 94.0 124.0 30.0 38.0

11 18.0 110.0 114.0 19.0

12 108.0 24.0 213.0 8.0

13 81.0 73.0 45.0 5.0

14 14.0 74.0 34.0 9.0

15 62.0 79.0 144.0 6.0

16 202.0 47.0 230.0

17 59.0 87.0 69.0

18 137.0 42.0 90.0

19 41.0 99.0 160.0

20 118.0 85.0 127.0

21 39.0 98.0 103.0

22 189.0 83.0

23 28.0 54.0 109.0

24 147.0 130.0

25 25.0 75.0 125.0

26 55.0 46.0 152.0

27 52.0 65.0 142.0
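The cluster numbers are arbitrary labels, so the tables above do not yet say which cluster holds the strongest players. One way to interpret them (a suggestion, not part of the original analysis) is to compare per-cluster feature means. A sketch on hypothetical clustered players:

```python
import pandas as pd

# Hypothetical clustered players
clustered = pd.DataFrame({
    'bat_average': [41.5, 38.0, 12.1, 9.8, 27.2],
    'bowl_econ':   [0.0, 0.0, 7.2, 7.6, 0.0],
    'cluster':     [0, 0, 1, 1, 0],
})

# Mean of each feature per cluster, plus the cluster sizes
profile = clustered.groupby('cluster').mean()
profile['size'] = clustered['cluster'].value_counts()

# Rank clusters by mean batting average to see which one holds the top batsmen
print(profile.sort_values('bat_average', ascending=False))
```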


Let's Build the Winning Prediction Model Now

matches = pd.read_csv('/content/IPL_Matches_2008_2022.csv')
balls = pd.read_csv('/content/IPL_Ball_by_Ball_2008_2022.csv')

# Total runs scored in each innings of each match (summing only the total_run column)
inningScores = balls.groupby(['ID', 'innings'])['total_run'].sum().reset_index()

# Keep only first-innings totals
inningScores = inningScores[inningScores['innings']==1].copy()
inningScores.head(10)

        ID  innings  total_run
0   335982        1        222
2   335983        1        240
4   335984        1        129
6   335985        1        165
8   335986        1        110
10  335987        1        166
12  335988        1        142
14  335989        1        208
16  335990        1        214
18  335991        1        182


# The chasing team needs one run more than the first-innings total to win
inningScores['target'] = inningScores['total_run'] + 1
inningScores.head(10)

        ID  innings  total_run  target
0   335982        1        222     223
2   335983        1        240     241
4   335984        1        129     130
6   335985        1        165     166
8   335986        1        110     111
10  335987        1        166     167
12  335988        1        142     143
14  335989        1        208     209
16  335990        1        214     215
18  335991        1        182     183
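The two cells above reduce the ball-by-ball table to one first-innings total per match and set the chase target one run higher. The same steps on a hypothetical handful of deliveries, with match IDs 10 and 11 invented for illustration:

```python
import pandas as pd

# Hypothetical deliveries: match ID, innings number and runs off each ball (extras included)
balls_demo = pd.DataFrame({
    'ID':        [10, 10, 10, 10, 11, 11, 11],
    'innings':   [1, 1, 2, 2, 1, 1, 2],
    'total_run': [4, 6, 1, 0, 2, 1, 4],
})

# Sum runs per (match, innings), then keep the first innings only
scores = balls_demo.groupby(['ID', 'innings'])['total_run'].sum().reset_index()
first = scores[scores['innings'] == 1].copy()

# The chasing side must score one run more than the first-innings total
first['target'] = first['total_run'] + 1
print(first)
```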


matches = matches.merge(inningScores[['ID','target']], on='ID')

# Map former franchise names onto the names used in this analysis, for Team1, Team2
# and WinningTeam. Replacing whole values (rather than substrings) maps each old name exactly once.
team_name_map = {
    'Delhi Daredevils': 'Delhi Capitals',
    'Kings XI Punjab': 'Punjab Kings',
    'Deccan Chargers': 'Sunrisers Hyderabad',
    'Rising Pune Supergiant': 'Pune Warriors',
    'Rising Pune Supergiants': 'Pune Warriors',
    'Gujarat Lions': 'Gujarat Titans',
}
for col in ['Team1', 'Team2', 'WinningTeam']:
    matches[col] = matches[col].replace(team_name_map)
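`str.replace` matches substrings, so mapping 'Rising Pune Supergiant' to 'Pune Warriors' also rewrites the start of 'Rising Pune Supergiants' and leaves 'Pune Warriorss'. Replacing whole values with `Series.replace` and a dict avoids the problem. A small demonstration on a hypothetical Series:

```python
import pandas as pd

names = pd.Series(['Rising Pune Supergiant', 'Rising Pune Supergiants', 'Delhi Daredevils'])

# Substring replacement: the plural name keeps its trailing 's'
substring_result = names.str.replace('Rising Pune Supergiant', 'Pune Warriors', regex=False)

# Whole-value replacement via a mapping dict
mapping = {
    'Rising Pune Supergiant': 'Pune Warriors',
    'Rising Pune Supergiants': 'Pune Warriors',
    'Delhi Daredevils': 'Delhi Capitals',
}
exact_result = names.replace(mapping)

print(substring_result.tolist())
print(exact_result.tolist())
```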

teams2023 = [
'Rajasthan Royals',
'Royal Challengers Bangalore',
'Sunrisers Hyderabad',
'Delhi Capitals',
'Chennai Super Kings',
'Gujarat Titans',
'Lucknow Super Giants',
'Kolkata Knight Riders',
'Punjab Kings',
'Mumbai Indians'
]

matches = matches[matches['Team1'].isin(teams2023)]
matches = matches[matches['Team2'].isin(teams2023)]
matches = matches[matches['WinningTeam'].isin(teams2023)]
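The three successive filters keep only matches in which both sides and the winner are current franchises. The same filter can be written as a single boolean mask. A sketch on hypothetical rows, where T1 to T3 and 'Old' are placeholder team names; note that `isin` returns False for a missing winner, so no-result matches are dropped as well.

```python
import pandas as pd

current_teams = ['T1', 'T2', 'T3']

games = pd.DataFrame({
    'Team1':       ['T1', 'T2', 'Old', 'T3'],
    'Team2':       ['T2', 'T3', 'T1', 'T1'],
    'WinningTeam': ['T1', None, 'Old', 'T1'],
})

# Keep a row only if both teams and the winner are current franchises.
# isin() is False for a missing winner, so the no-result row is removed too.
mask = (games['Team1'].isin(current_teams)
        & games['Team2'].isin(current_teams)
        & games['WinningTeam'].isin(current_teams))
filtered = games[mask]
print(filtered)
```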

matches['Team1'].unique()

array(['Rajasthan Royals', 'Royal Challengers Bangalore',
       'Sunrisers Hyderabad', 'Delhi Capitals', 'Chennai Super Kings',
       'Gujarat Titans', 'Lucknow Super Giants', 'Kolkata Knight Riders',
       'Punjab Kings', 'Mumbai Indians'], dtype=object)

matches.head()


   ID       City       Date        Season  MatchNumber  Team1                        Team2                 Venue                             TossWinner            To...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans        Narendra Modi Stadium, Ahmedabad  Rajasthan Royals
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals      Narendra Modi Stadium, Ahmedabad  Rajasthan Royals
2  1312198  Kolkata    2022-05-25  2022    Eliminator   Royal Challengers Bangalore  Lucknow Super Giants  Eden Gardens, Kolkata             Lucknow Super Giants
3  1312197  Kolkata    2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans        Eden Gardens, Kolkata             Gujarat Titans
4  1304116  Mumbai     2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings          Wankhede Stadium, Mumbai          Sunrisers Hyderabad

matches.isnull().sum()

                0
ID              0
City           51
Date            0
Season          0
MatchNumber     0
Team1           0
Team2           0
Venue           0
