Shetty 2020
Abstract: This paper presents a model that selects the best playing 11 for the Indian cricket team. The performance of each player depends on several factors, such as the pitch type, the opposition team, the ground, and several others. The proposed model uses One Day International data for team India from the past several years. The dataset for this model was created using data from trusted sites like espn.com. The method is distinct in that it gives a 360-degree view of a player's skill set, be it batting, bowling, or fielding. A vital part of the model is finding the best all-rounder. A random forest algorithm is used for predicting performance: player performance is divided into several classes, and a random forest classifier predicts the class. The model gives 76% accuracy for batsmen, around 67% accuracy for bowlers, and 95% for all-rounders. The model includes extra features, such as weather and matches played, that have not been considered in any existing model. Using this model, the best team can be selected to play in given conditions.

Keywords: All-rounder, random forest, player selection, team selection, cricket, machine learning, SVM, logistic regression

1. INTRODUCTION

Cricket is the second most-watched program on television. The popularity of this sport is soaring in South Asian countries like India, Pakistan, Bangladesh, and Sri Lanka. One of the major issues now is which playing 11 should be selected. People of different ages and backgrounds are zealous fans of cricket because the Indian team has done well in the recent past. However, much confusion arises before a match about team combinations, i.e., which player to select or drop for the next game, which batsman should play in which position, or which bowler should be picked for the upcoming game. Machine learning is used to predict match results in many different sports [2]. From here, the motivation emerged to evaluate the performance of a player in a specific match and to select the best playing 11 accordingly using machine learning [3], as this removes the proclivity towards one particular player and benefits our cricket team. What makes this model unique is that it takes into consideration whether a player is an all-rounder. All-rounders, when compared based on only one of their attributes, may not get a place in the team.

2. LITERATURE REVIEW

To understand this topic in depth, previous research in this field was reviewed, and the relevant studies are discussed in detail here.

[1] Aminul Islam Anik et al. proposed using the balls faced, ground, pitch, opposition, and position to predict the performance of a player using SVM. This model analyzes various factors and their effect on the runs scored by batsmen. The primary factor is the balls faced by the player, but a drawback is that the number of balls a player will face cannot be known before the match. The paper covers batsmen and bowlers well, but the analysis of all-rounders is left out.

[2] Amal Kaluarachchi used Bayesian classifiers to predict how factors like home game advantage affect the outcome of the match. Following this idea, home or away is used here as one of the parameters that affect a player's performance.

[3] Pranavan Somaskandhan et al. analyzed the set of attributes that have a high impact on the outcome of a game using machine learning. The highest accuracy was obtained with the attribute combination of high individual wickets, number of bowled deliveries, number of thirties, total wickets, wickets in the power play, runs in death overs, dots in middle overs, and number of fours and singles in middle overs. This combination gave an accuracy of 81% using SVM.

[4] Md. Muhaimenur Rahman analyzed Bangladesh One Day International cricket data. They divided the study into three sections, i.e., at the start of the game, after one innings, and after the fall of wickets. They used a Decision Tree and got an accuracy of 63.63% at the beginning of the game,
Authorized licensed use limited to: University of New South Wales. Downloaded on July 26,2020 at 15:43:35 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020)
IEEE Conference Record # 48766; IEEE Xplore ISBN: 978-1-7281-5371-1
72.72% and 81.81% in the first and second innings, and 80% and 70% for the fall-of-wickets analysis.

[5] Riju Chaudhari et al. used DEA (Data Envelopment Analysis) to measure the efficiency of players. The paper uses records of players' performance in Test matches, since there can be factors such as match-fixing in T20. Because every player might have to bat in a Test series, a bowler with a higher batting strike rate is given preference. This paper differs from the others in that it does not apply machine learning techniques directly and instead uses a unique approach.

[6] Md. Jakir Hossain used a genetic algorithm on 30 players in the Bangladesh cricket team to select top players. The paper combines statistical analysis with a genetic algorithm to choose top-tier players. Every possible solution out of the $^{30}C_{14}$ total solutions is taken as a chromosome. The ratings of players are computed using a statistical method. The final fitness value is calculated using factors like the sum of the players' ratings; the number of bowlers, batsmen, all-rounders, and wicket-keepers; the number of spin and fast bowlers; and the number of right- and left-handed players.

The dataset that was created was split into two parts: 80% of the dataset was used for training the model and 20% was used to evaluate the results (refer Fig. 3.1). Various algorithms, such as logistic regression, SVM, and random forest, were used to get the results, which are discussed in detail in the next section of this paper.

4. ALGORITHMS AND TECHNIQUES

LOGISTIC REGRESSION: Logistic regression is usually used for binary classification tasks. In the case of multi-class classification, the softmax function is used in place of the sigmoid function [8]. The hypothesis function for logistic regression is given as $g(z) = 1/(1 + e^{-z})$.
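To make the logistic regression hypothesis from Section 4 concrete, the following minimal sketch evaluates the sigmoid g(z) = 1/(1 + e^(-z)) and its multi-class counterpart, the softmax. It is an illustration only, not the authors' implementation; the function names and the example scores are assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic hypothesis g(z) = 1 / (1 + e^(-z))."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn one raw score per class into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical raw scores for three performance classes.
    print(sigmoid(0.0))
    print(softmax([2.0, 1.0, 0.1]))
```

With two classes and scores [0, z], the softmax probability of the second class equals sigmoid(z), which is why softmax is the natural replacement for the sigmoid in the multi-class case.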
SUPPORT VECTOR: In a support vector classifier, a hyperplane separates the different classes. Different kernels can separate non-linear data by mapping it to higher dimensions [9]. Many hyperplanes might classify the data successfully. One reasonable choice for the best hyperplane is the one that gives the largest separation, or margin, between the two classes. So, the hyperplane is chosen such that the distance from it to the nearest point is maximized. The solid line (refer Fig. 4.4) represents one of the possible hyperplanes.

RANDOM FOREST: Random forest is an ensemble learning method for classification, regression, and other tasks that combines the results of several decision trees. Random forests can also be used to rank the importance of variables in a regression or classification problem in a natural way.

Most of the batsmen score a meager number of runs, as shown by the graph of the dataset (Fig. 4.5).

There is a strong relationship between strike rate and the performance of players. The higher the strike rate, the better the performance (refer Fig. 4.6).

5. DATA DESCRIPTION

Data Collection: The dataset was built from sites like espncricinfo.com, a reliable source. A CSV file was created using data from previous matches played by the Indian cricket team. For other conditions, the summary was used.
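As a rough illustration of the data handling described in this paper (a CSV file of past matches, split 80% for training and 20% for evaluation), the sketch below loads CSV rows and performs a reproducible 80/20 split. It is not the authors' code. The column spellings, the `performance_class` target column, the helper names, and the dummy cell values are all assumptions made for the example; no real match data is shown.

```python
import csv
import io
import random
from typing import Dict, List, Tuple

# Column names follow the batting attributes listed in the paper; the
# snake_case spellings and the "performance_class" target are assumptions.
COLUMNS = ["position", "matches_played", "runs", "strike_rate", "ground",
           "home_away", "fifties", "hundreds", "overall_average",
           "pitch", "opponent", "weather", "performance_class"]

Row = Dict[str, str]


def placeholder_csv(n_rows: int) -> str:
    """Build CSV text with dummy cell values; stands in for the real dataset."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for i in range(n_rows):
        writer.writerow([f"{name}_{i}" for name in COLUMNS])
    return buf.getvalue()


def load_rows(csv_text: str) -> List[Row]:
    """Parse CSV text with a header row into one dict per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def split_80_20(rows: List[Row], seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then keep 80% for training and 20% for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    rows = load_rows(placeholder_csv(10))
    train, test = split_80_20(rows)
    print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the split identical between runs, so accuracy figures from different algorithms are measured on the same held-out records.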
Batting Attributes: Position, Matches Played, Runs, Strike Rate, Ground, Home Away, 50s, 100s, overall average, pitch, opponent, and weather.

Fig. 6.1: Input display

The sample output on clicking the predict button is given below in Fig. 6.2.
Fig. 6.4: Support Vector classification report

For all-rounders (refer Fig. 7.2), both precision and recall were very high, leading to a good F1-score and a high accuracy of around 96%.
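The precision, recall, F1-score, and accuracy that appear in the classification reports can be computed from true and predicted class labels as sketched below. This is a minimal illustration rather than the authors' code; the function names and the toy labels "high" and "low" are assumptions, and the toy predictions are not results from the paper.

```python
from typing import Dict, Sequence


def class_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                  label: str) -> Dict[str, float]:
    """Precision, recall, and F1 for one class, treating it as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["high", "high", "low", "low", "high"]
    preds = ["high", "low", "low", "high", "high"]
    print(class_metrics(truth, preds, "high"))
    print(accuracy(truth, preds))
```

Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which is the pattern the text reports for all-rounders.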
Fig. 7.3: Classification report for bowlers

This analysis could be done during the game, when the number of dots, remaining overs, number of wickets left, and strike rate are known, which could help the players decide the position they should play, giving even better results. As these factors determine the game's outcome within split seconds, a lot of work could be done on these dynamic factors, leading to a beneficial model.

7. REFERENCES

[1] Aminul Anik, "Player's Performance Prediction in ODI Cricket Using Machine Learning Algorithms," BRAC University, Dhaka, Bangladesh, 4th International Conference on Electrical Engineering and Information and Communication Technology, 2018.
[2] Amal Kaluarachchi, Aparna S. Varde, "CricAI: A Classification Based Tool to Predict the Outcome in ODI Cricket," thesis, Montclair State University, Montclair, NJ, USA, 2010 Fifth International Conference on Information and Automation for Sustainability.
[3] Pranavan Somaskandhan, Gihan Wijesinghe, Leshan Bashitha Wijegunawardana, Asitha Bandaranayake, and Sampath Deegalla, "Identifying the Optimal Set of Attributes that Impose High Impact on the End Results of a Cricket Match Using Machine Learning," 2017 IEEE International Conference on Industrial and Information Systems (ICIIS).
[4] Md. Muhaimenur Rahman, Md. Omar Faruque Shamim, Sabir Ismail, "An Analysis of Bangladesh One Day International Cricket Data: A Machine Learning Approach," Computer Science & Engineering, Sylhet Engineering College, Sylhet, Bangladesh, 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET).
[5] Riju Chaudhari, Sahil Bhardwaj, Sakshi Lakra, "A DEA model for Selection of Indian Cricket team players," 2019 Amity International Conference on Artificial Intelligence.
[6] Md. Jakir Hossain, "Bangladesh cricket squad prediction using statistical data and genetic algorithm," 2018 4th International Conference on Electrical Engineering and Information and Communication Technology.
[7] Vipul Pujbai, Rohit Chaudhari, Devendra Pal, Kunal Nhavi, Nikhil Shimpi, Harshal Joshi, "A survey on team selection in game of cricket using machine learning," International Research Journal of Engineering and Technology, vol. 6, no. 11, Nov. 2019.
[8] Park, Hyeoun-Ae, "An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain," J Korean Acad Nurs, vol. 43, no. 2, April 2013.
[9] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 1–27, Jan. 2011.
[10] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to linear regression analysis, vol. 821. John Wiley & Sons, 2012.
[11] Raj, J. S., & Ananthi, J. V., "Recurrent Neural Networks and Nonlinear Prediction in Support Vector Machine," Journal of Soft Computing Paradigm (JSCP), 2019, 1(01), 33-40.
[12] H. H. Lemmer, "A measure for the batting performance of cricket players: research article," South African Journal for Research in Sport, Physical Education, and Recreation, vol. 26, no. 1, 2004.
[13] Chao-Ying Joanne Peng, Kuk Lida Lee, Gary M. Ingersoll, "An Introduction to Logistic Regression Analysis and Reporting," The Journal of Educational Research 96(1):3-14, September (2012).