
Shetty 2020

This paper proposes a machine learning model to select the best playing 11 for the Indian cricket team. The model uses past player performance data and factors like pitch type, opposition, and whether a player is an all-rounder to classify player performance and predict the best team. Random forest classification is used to predict player performance classes with 76% accuracy for batsmen, 67% for bowlers, and 95% for all-rounders.


Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020)

IEEE Conference Record # 48766; IEEE Xplore ISBN: 978-1-7281-5371-1

Machine Learning-Based Selection of Optimal Sports Team Based on the Players' Performance
Prof. Monali Shetty, Computer Engineering, Fr. CRCE, Mumbai, [email protected]
Sankalp Rane, Computer Engineering, Fr. CRCE, Mumbai, sankalprane1998@gmail.com
Chaitanya Pandita, Computer Engineering, Fr. CRCE, Mumbai, chaitanyapandita97@gmail.com
Suyash Salvi, Computer Engineering, Fr. CRCE, Mumbai, suyash.salvi1998@gmail.com

Abstract: This paper presents a model that selects the best playing 11 for the Indian cricket team. The performance of each player depends on several factors, such as the pitch type, the opposition team, and the ground. The proposed model uses data from the One Day Internationals played by team India over the past several years. The dataset was created using data from trusted sites like espn.com. The method is distinct in that it gives a 360-degree view of the player's skill set, be it batting, bowling, or fielding. A vital part of the model is finding the best all-rounders. Player performance is classified into several classes, and a random forest classifier is used to predict each player's performance class. The model gives 76% accuracy for batsmen, around 67% accuracy for bowlers, and 95% for all-rounders. The model also includes extra features, such as weather and matches played, that have not been considered in any existing model. Using this model, the best team can be selected to play in given conditions.

Keywords: All-rounder, random forest, player selection, team selection, cricket, machine learning, SVM, logistic regression

1. INTRODUCTION

Cricket is the second most-watched program on television. The popularity of this sport is soaring in South Asian countries like India, Pakistan, Bangladesh, and Sri Lanka. One of the major issues now is which playing 11 should be selected. People of different ages and backgrounds are zealous fans of cricket because the Indian team has done well in the recent past. However, much confusion arises before a match about team combinations, i.e., which player to select or drop for the next game, which batsman should play in which position, or which bowler should be picked for the upcoming game. Machine learning is used to predict match results in many different sports [2]. From here, the motivation emerged to evaluate the performance of a player in a specific match and to select the best playing 11 accordingly using machine learning [3], as this removes the proclivity towards one particular player and benefits the team. What makes this model unique is that it takes into consideration whether a player is an all-rounder. All-rounders, when compared based on only one of their attributes, may not get a place in the team.

2. LITERATURE REVIEW

To understand the topic in depth, previous research in this field was surveyed; the studies dedicated to it are discussed here.

[1] Aminul Islam Anik et al. proposed using the balls faced, ground, pitch, opposition, and position to predict the performance of a player using SVM. The model analyzes various factors and their effect on the runs scored by batsmen. The primary factor is the balls faced by the player, but a drawback is that the number of balls a player will face cannot be known before the match. The paper does substantial work on batsmen and bowlers, but leaves out the analysis of all-rounders.

[2] Amal Kaluarachchi used Bayesian classifiers to predict how factors like home-game advantage affect the outcome of the match. Following this idea, home/away is used as one of the parameters that affect a player's performance.

[3] Pranavan Somaskandhan et al. analyzed the set of attributes that have a high impact on the outcome of a game using machine learning. The highest accuracy was obtained with the attribute combination of high individual wickets, number of bowled deliveries, number of thirties, total wickets, wickets in the power play, runs in death overs, dots in middle overs, and number of fours and singles in middle overs. This attribute combination gave an accuracy of 81% using SVM.

[4] Md. Muhaimenur Rahman analyzed Bangladesh One Day International cricket data. The study was divided into three sections, i.e., at the start of the game, after one innings, and after the fall of wickets. They used a decision tree and got an accuracy of 63.63% at the beginning of the game,

978-1-7281-5371-1/20/$31.00 ©2020 IEEE 1267

Authorized licensed use limited to: University of New South Wales. Downloaded on July 26,2020 at 15:43:35 UTC from IEEE Xplore. Restrictions apply.

72.72% and 81.81% in the first and second innings, and 80% and 70% for the fall-of-wicket analysis.

[5] Riju Chaudhari et al. used DEA (Data Envelopment Analysis) to measure the efficiency of players. The paper uses records of players' performance in Test matches, since factors such as match-fixing can affect T20 data. Because every player might have to bat in a Test series, a bowler with a higher batting strike rate is given preference. This paper is very different from the others since it does not apply machine learning techniques directly and instead uses a unique approach.

[6] Md. Jakir Hossain used a genetic algorithm on 30 players in the Bangladesh cricket team to select top players. The paper combines statistical analysis with a genetic algorithm to choose top-tier players. Every possible solution out of the $^{30}C_{14}$ total solutions is taken as a chromosome. The ratings of players are computed using a statistical method. The final fitness value is calculated using factors like the sum of the ratings of players, the number of bowlers, batsmen, all-rounders, and wicket keepers, the number of spin and fast bowlers, and the number of right- and left-handed players.

[7] Vipul Punjabi and team used a naive Bayes classifier to predict the runs scored by batsmen and the wickets taken by bowlers. The runs scored and wickets taken are classified into different categories. The dataset is taken from records of IPL matches. The paper uses remarkably few input features, which limits the potential of the model.

3. PROPOSED METHODOLOGY

Data from past ODI matches is used to create the dataset, which is described in detail in Section 5 of this paper. The performance of batsmen is divided into classes according to the runs scored; the same is done for bowlers according to the number of wickets taken.

The dataset was split into two parts: 80% was used for training the model and 20% was used to evaluate the results (refer Fig. 3.1). Various algorithms like logistic regression, SVM, and random forest were used to obtain the results; they are discussed in detail in the next section of this paper.

Fig.3.1: Proposed Model

4. ALGORITHMS AND TECHNIQUES

LOGISTIC REGRESSION: Logistic regression is usually used for binary classification tasks. For multi-class classification, the softmax function is used in place of the sigmoid function [8]. The hypothesis function for logistic regression is given as $g(z) = 1/(1 + e^{-z})$.

Fig.4.1 Graph of g(z) vs z

The plot of g(z) (refer Fig. 4.1) tends towards 1 as z tends to infinity and towards 0 as z tends to negative infinity.

Fig.4.2. Cost function vs h(x) when y=0
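The limiting behaviour of g(z) and the shape of the per-example logistic cost plotted in the cost-function figures can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `sigmoid` and `cost` and the clamping constant `eps` are assumptions made for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost(h: float, y: int, eps: float = 1e-12) -> float:
    """Per-example logistic cost: -y*log(h) - (1 - y)*log(1 - h).

    h is the predicted probability; it is clamped away from 0 and 1
    (an assumption of this sketch) so that log() is always defined.
    """
    h = min(max(h, eps), 1.0 - eps)
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

# g(z) approaches 0 for large negative z and 1 for large positive z.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"g({z:+.1f}) = {sigmoid(z):.5f}")

# The cost grows as the prediction moves away from the true label.
for h in (0.1, 0.5, 0.9):
    print(f"h={h}: cost(y=1)={cost(h, 1):.4f}, cost(y=0)={cost(h, 0):.4f}")
```

A confident wrong prediction (for example h = 0.1 when y = 1) is penalised far more heavily than a confident correct one, which is the behaviour the cost curves illustrate.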


Fig.4.3 Cost function vs h(x) when y=0

The cost function can be written in one line as [8]:

$\mathrm{cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$

SUPPORT VECTOR: In a support vector classifier, a hyperplane separates the different classes. Different kernels can separate non-linear data by mapping it to higher dimensions [9]. Many hyperplanes might classify the data successfully. One reasonable choice for the best hyperplane is the one with the largest separation, or margin, between the two classes. So the hyperplane is chosen such that the distance from it to the nearest point is maximized.

Fig.4.4 SVM Hyperplane

The solid line (refer Fig. 4.4) represents one of the possible hyperplanes.

RANDOM FOREST: Random forest is an ensemble learning method for classification, regression, and other tasks that combines the results of several decision trees. Random forests can also be used to rank the importance of variables in a regression or classification problem in a natural way.

Most of the batsmen score a meager number of runs, as shown by the graph of the dataset below (Fig. 4.5):

Fig.4.5. Distplot of Runs

There is a strong relationship between strike rate and the performance of players. The higher the result, the better the performance (refer Fig. 4.6).

Fig.4.6 Joint plot of Result vs Strike Rate

5. DATA DESCRIPTION

Data Collection: The dataset is built from sites like espncricinfo.com, a reliable source. A CSV file was created using the data from previous matches played by the Indian cricket team. For other conditions, the match summary was used.
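As an illustration of how such a CSV of match records might be read and its categorical columns one-hot encoded, here is a minimal sketch using only the Python standard library. The column names are borrowed from the attribute lists in this paper, but the rows, the player and ground values, and the helper names `load_rows` and `one_hot` are invented for the example; the authors report using pandas for pre-processing.

```python
import csv
import io

# Invented sample rows; only the column names echo the paper's attributes.
SAMPLE_CSV = """Player,Opponent,Ground,Runs,Strike Rate
Player A,Australia,Mumbai,54,92.3
Player B,England,Pune,12,71.0
Player A,England,Mumbai,101,110.5
"""

def load_rows(text):
    """Parse CSV text into dictionaries, converting numeric columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["Runs"] = int(record["Runs"])
        record["Strike Rate"] = float(record["Strike Rate"])
        rows.append(record)
    return rows

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        out = {key: value for key, value in row.items() if key != column}
        for category in categories:
            out[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(out)
    return encoded

rows = one_hot(load_rows(SAMPLE_CSV), "Opponent")
print(rows[0])
```

Each opposition team becomes its own 0/1 column, so a classifier does not read an artificial ordering into team names.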


Fig. 5.1 A part of the Batsman dataset

The datasets: Figure 5.1 is a small sample of the batsmen dataset. To process this data, one-hot encoding is used. The player names and their stats are hardcoded in the backend. After the stats are entered in the API, each player's performance is recorded in a dictionary using a for loop. The dictionary is then sorted in reverse order to give the names of the top players. Eight opposition teams and 15 grounds are taken into consideration. Below is a part of the dataset after pre-processing with the pandas library.

Fig.5.2 Part of Batsmen dataset

Feature Selection: Previously created models for optimal team selection used features like Opponents, Runs Scored, Strike Rate, and Overall Average. The following attributes are considered to measure a player's performance.

Batting Attributes: Position, Matches Played, Runs, Strike Rate, Ground, Home Away, 50s, 100s, Overall Average, Pitch, Opponent, and Weather.

Bowling Attributes: Matches Played, Wickets, Average, Economy, Strike Rate, Ground, Pitch, Opponent, Weather, and Home Away.

All-rounder Attributes: Matches Played, Wickets, Runs Scored, Strike Rate, Average, Ground, Home Away, Opponent, Weather, and Pitch.

Since the earlier models considered very few features, their accuracy was not good enough. Hence, a model is developed with extra features like matches played, pitch, weather, and so on.

The earlier models also considered only batsmen and bowlers when creating a team of 11 players. But if bowlers and batsmen are compared with all-rounders, all-rounders will always get lower ratings. Thus, all-rounders are considered while creating the model to generate an official team.

6. IMPLEMENTATION

For implementation, a Flask API is used. A Jupyter notebook is used to train the model with the algorithms available in the scikit-learn library in Python. Below is a screenshot of our implementation and a sample result. The input parameters, which are the same for batsmen, bowlers, and all-rounders, are taken into consideration.

Fig.6.1: Input display

The sample output on clicking the predict button is given below in Fig.6.2.
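The ranking step described in Section 5, where each player's predicted performance is recorded in a dictionary and the dictionary is sorted in reverse order, could look roughly like the sketch below. The player names, the predicted classes, and the function name `rank_players` are hypothetical, and the sketch assumes that a higher class number means a better predicted performance, which the paper does not state explicitly.

```python
def rank_players(predicted_class, top_n):
    """Return the names of the top_n players, highest predicted class first."""
    # sorted() is stable, so players with equal classes keep insertion order.
    ranked = sorted(predicted_class.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]

# Hypothetical model outputs: (player name, predicted performance class).
model_outputs = [("Player A", 3), ("Player B", 5), ("Player C", 1), ("Player D", 4)]

# Record every player's prediction in a dictionary using a for loop.
predictions = {}
for name, performance_class in model_outputs:
    predictions[name] = performance_class

print(rank_players(predictions, 2))
```

In a full selection, the same ranking would presumably be run separately for batsmen, bowlers, and all-rounders, with the desired number of players taken from each list.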


Fig.6.2: Output display for all-rounders

Using the scikit-learn library in Python, various algorithms were used to predict the results. Logistic regression, an SVM classifier, a decision tree, and a random forest were used to predict the classes, out of which random forest gave the best results.

When the random forest classifier is used with a test size of 20%, the following results are obtained for the batsmen dataset.

Fig.6.3 Random Forest Classification Report

The support vector classifier gives the classification report below.

Fig.6.4 Support Vector Classification Report

The logistic regression algorithm gave the worst results.

Fig.6.5 Logistic Regression Classification Report

So the process proceeds with the random forest algorithm.

7. CONCLUSION AND FUTURE SCOPE

The proposed work addresses the issue of selecting the optimal team in cricket without prejudice and gives equal importance to all-rounders. The method can be successfully implemented in a web application by using Flask to run the project. The model provides 76% accuracy for batsmen (refer Fig. 7.1), around 67% accuracy for bowlers (Fig. 7.3), and 95% for all-rounders (Fig. 7.2). These results were obtained on the 20% of the dataset held out for evaluation.

Fig.7.1: Classification report for batsmen

For batsmen (refer Fig. 7.1), the model had the highest accuracy for class 5 and the lowest for class 3.

Fig 7.2: Classification report for all-rounders

For all-rounders (refer Fig. 7.2), both precision and recall were very high, leading to a good F1-score and a high accuracy of around 96%.
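For readers unfamiliar with the quantities in these classification reports, the sketch below shows how per-class precision, recall, and F1-score, together with overall accuracy, are computed from true and predicted class labels. It is a plain-Python illustration with made-up labels; the function name `per_class_report` is an assumption, and the authors' reports come from scikit-learn rather than from this code.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return ({class: (precision, recall, f1)}, accuracy)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class was wrongly assigned
            fn[truth] += 1   # true class was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return report, accuracy

# Made-up labels for six test examples across three performance classes.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
report, accuracy = per_class_report(y_true, y_pred)
for label, (precision, recall, f1) in report.items():
    print(f"class {label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which is why the all-rounder report shows a good F1-score.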


Fig.7.3: Classification report for bowlers

This analysis could be extended to in-game prediction, where the number of dots, remaining overs, number of wickets left, and strike rate are known, which could help players decide the position they should play, giving even better results. As these factors determine the game's outcome within split seconds, considerable work could be done on these dynamic factors, leading to a beneficial model.

8. REFERENCES

[1] Aminul Anik, "Player's Performance Prediction in ODI Cricket Using Machine Learning Algorithms," BRAC University, Dhaka, Bangladesh, 2018 4th International Conference on Electrical Engineering and Information and Communication Technology.
[2] Amal Kaluarachchi, Aparna S. Varde, "CricAI: A Classification Based Tool to Predict the Outcome in ODI Cricket," thesis, Montclair State University, Montclair, NJ, USA, 2010 Fifth International Conference on Information and Automation for Sustainability.
[3] Pranavan Somaskandhan, Gihan Wijesinghe, Leshan Bashitha Wijegunawardana, Asitha Bandaranayake, and Sampath Deegalla, "Identifying the Optimal Set of Attributes that Impose High Impact on the End Results of a Cricket Match Using Machine Learning," 2017 IEEE International Conference on Industrial and Information Systems (ICIIS).
[4] Md. Muhaimenur Rahman, Md. Omar Faruque Shamim, Sabir Ismail, "An Analysis of Bangladesh One Day International Cricket Data: A Machine Learning Approach," Computer Science & Engineering, Sylhet Engineering College, Sylhet, Bangladesh, 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET).
[5] Riju Chaudhari, Sahil Bhardwaj, Sakshi Lakra, "A DEA model for Selection of Indian Cricket team players," 2019 Amity International Conference on Artificial Intelligence.
[6] Md. Jakir Hossain, "Bangladesh cricket squad prediction using statistical data and genetic algorithm," 2018 4th International Conference on Electrical Engineering and Information and Communication Technology.
[7] Vipul Punjabi, Rohit Chaudhari, Devendra Pal, Kunal Nhavi, Nikhil Shimpi, Harshal Joshi, "A survey on team selection in game of cricket using machine learning," International Research Journal of Engineering and Technology, vol. 6, issue 11, Nov. 2019.
[8] Park, Hyeoun-Ae, "An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain," J Korean Acad Nurs, vol. 43, no. 2, April 2013.
[9] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 1-27, Jan. 2011.
[10] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to linear regression analysis, vol. 821. John Wiley & Sons, 2012.
[11] Raj, J.S., & Ananthi, J.V., "Recurrent Neural Networks and Nonlinear Prediction in Support Vector Machine," Journal of Soft Computing Paradigm (JSCP), 2019, 1(01), 33-40.
[12] H. H. Lemmer, "A measure for the batting performance of cricket players: research article," South African Journal for Research in Sport, Physical Education, and Recreation, vol. 26, no. 1, 2004.
[13] Chao-Ying Joanne Peng, Kuk Lida Lee, Gary M. Ingersoll, "An Introduction to Logistic Regression Analysis and Reporting," The Journal of Educational Research, 96(1):3-14, (2012) September.
