Aisyah Ariana Hamdan - Interim Report
By
JANUARY 2023
by
19000476
(INFORMATION TECHNOLOGY)
Approved by,
_____________________
TRONOH, PERAK
January 2023
CERTIFICATION OF ORIGINALITY
This is to certify that I am responsible for the work submitted in this project, that the
original work is my own except as specified in the references and acknowledgements,
and that the original work contained herein has not been undertaken or done by
unspecified sources or persons.
Aisyah Ariana H
______________________________
ABSTRACT
The purpose of this paper is to use machine learning to make predictions for
Formula One races. The predictions and their results are then displayed in an
interactive dashboard using data visualisation tools. Formula One is a cutting-edge,
high-tech racing competition that generates massive volumes of data, making it an
ideal testing ground for new machine learning algorithms and approaches. Because the
sport is unpredictable, many variables affect a driver's likelihood of winning. A
structured and integrated machine learning model would therefore help team builders
review the team's performance and use this information to produce accurate
predictions and better strategies. Furthermore, this paper offers a comprehensive
analysis and assessment of the literature on Machine Learning (ML) and sport outcome
prediction, emphasising crucial elements such as the ML algorithms used to achieve
the maximum prediction accuracy. The project follows CRISP-DM, the Cross-Industry
Standard Process for Data Mining, which consists of six key steps. The models to be
compared on accuracy are all supervised learning algorithms: Gradient Boosting Tree,
Random Forest, Support Vector Machine, and Neural Network. Finally, this report aims
to inform future studies in the field and to suggest future expansions of the
project's applications, since the framework may also be used to predict outcomes in
other sports.
ACKNOWLEDGEMENT
I would like to express my sincere appreciation to all the individuals who have
supported me throughout the course of this project.
First and foremost, it is with great pleasure that I wish to express my sincere
thanks to Universiti Teknologi PETRONAS (UTP) for providing me with this
opportunity to apply the knowledge gained and interest developed during my
undergraduate studies in a professional setting. Additionally, I am beyond grateful to
Dr Kamaluddeen Usman Danyaro for the invaluable guidance, encouragement, and
inspiration throughout the project. The insights, knowledge, and constructive comments
provided have been essential in shaping my ideas and improving the quality of my
work.
Last but not least, I want to thank myself. A big pat on the back for never
quitting, and being able to express the interest I have in this field in a structured
manner. I am so proud to see how far I have come and grown into a better person
throughout this project journey.
TABLE OF CONTENTS
CERTIFICATIONS
ABSTRACT
ACKNOWLEDGEMENT
CHAPTER 1: INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Research Questions
1.4 Objectives
1.5 Scope of Study
1.6 Project Relevancy and Significance
CHAPTER 2: LITERATURE REVIEW AND THEORY
2.1 Intro to Sports Analytics and Machine Learning
2.2 Factors Influencing the Machine Learning Model
2.3 Machine Learning Algorithms
CHAPTER 3: METHODOLOGY AND PROJECT WORK
3.1 Research Methodology
3.1.1 Business Understanding
3.1.2 Data Understanding
3.1.3 Data Preparation
3.1.4 Modelling
3.1.5 Evaluation
3.1.6 Deployment
3.2 Research Algorithm
3.3 Tools
3.3.1 Kaggle
3.3.2 Google Colab
3.3.3 Microsoft Power BI
3.4 Project Milestones
CHAPTER 4: CONCLUSION AND FUTURE WORK
4.1 Conclusion and Future Work
REFERENCES
APPENDICES
CHAPTER 1
INTRODUCTION
1.2 Problem Statement
Sports are intricate and dynamic activities with many variables that
can affect the result, including the weather, faulty equipment, and
individual abilities. If these factors are not properly considered, a team's
standing may be in jeopardy. Predicting the outcome from an educated
guess, without careful evaluation of the data, is therefore likely to be
unreliable. Because any of these factors can shift the outcome of a
single event, a prediction model built with machine learning is needed.
This model is intended for the Formula One team principal and engineers,
and it uses algorithms trained on historical data to create a strategic,
well-analyzed framework for predicting each race's outcome.
The research questions for this study are outlined below:
1.4 Objectives
1.6 Project Relevancy and Significance
CHAPTER 2
Sport prediction is usually treated as a classification problem in which
one class (win, lose, or draw) is predicted, as noted by Prasitio and Harlili
(2016). Researchers therefore employ a variety of features, such as
historical team performance, historical match results, player data, and
other data that has been gathered. Numerous sports prediction models
use various algorithms to accomplish this objective. Bunker and Thabtah
(2019) constructed a theoretical framework for general sports winner
prediction using unlabelled data. Gu, Foster, Shang, and Wei (2019)
highlighted multiple ML solutions, combined as ensemble learning
methods, for match winner prediction in the NHL (National Hockey
League). Ofoghi et al. (2010) applied a machine learning approach to
predicting winning patterns in the track cycling omnium, using
unsupervised, supervised, and statistical analysis methods. Finally,
Sicoie (2022) presented a machine learning framework for predicting the
Formula 1 race winner and championship standings using three different
machine learning algorithms.
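To illustrate the classification framing described above, the following minimal Python sketch turns historical results into feature vectors and class labels. The field names (grid, prior_wins, position) and the sample rows are hypothetical and chosen only for illustration. They are not taken from this project's dataset, and the binary win/no-win label is one possible encoding rather than the one this project will necessarily adopt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RaceResult:
    """One driver's result in one race (hypothetical fields for illustration)."""
    driver: str
    grid: int        # starting grid position
    prior_wins: int  # wins earlier in the season
    position: int    # finishing position (1 = race winner)


def to_features_and_label(result: RaceResult) -> Tuple[List[int], int]:
    """Map a result to (feature vector, class label); label 1 = win, 0 = no win."""
    features = [result.grid, result.prior_wins]
    label = 1 if result.position == 1 else 0
    return features, label


# Made-up sample rows, not real race data.
history = [
    RaceResult("driver_a", grid=1, prior_wins=3, position=1),
    RaceResult("driver_b", grid=2, prior_wins=1, position=3),
    RaceResult("driver_c", grid=5, prior_wins=0, position=2),
]

X, y = zip(*(to_features_and_label(r) for r in history))
print(list(X))  # feature vectors, one per driver
print(list(y))  # class labels, one per driver
```

A supervised classifier would then be trained on pairs such as these, with the label as the class to be predicted.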
attack and defence ratings, and even team motivation/psychology in the
form of expert knowledge could lead to improved results.
work on increasing prediction accuracy in the game of cricket using
machine learning found that Random Forest turned out to be the most
accurate classifier for both the datasets with an accuracy of 90.74%.
Not all machine learning models rely on Random Forest to achieve the
highest prediction accuracy. For instance, Gu et al. (2019), in their
game-predicting expert system using Big Data and machine learning,
reported that Support Vector Machine (SVM) outperformed other ML
algorithms. The high prediction accuracy (i.e. >90%) confirms that SVM
and ensemble machine learning algorithms are valuable tools that can
accurately predict game outcomes. Furthermore, Lotfi and Rebbouj (2021),
in their study on machine learning for sports results prediction,
claimed that Neural Networks (NN) are effective in deriving highly
accurate models. Lastly, according to Bunker and Susnjak (2022), a wide
set of candidate algorithms and ensembles should be used in
experimentation in sports result prediction. One of the algorithms used
was Gradient Boosting, even though research in sports domains shows a
higher propensity to use Artificial NN.
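The comparison of candidate algorithms and ensembles by prediction accuracy, as discussed above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels and per-model predictions are made up, the model names are placeholders, and the majority-vote ensemble is one simple way of combining classifiers, not the method used in any of the cited studies.

```python
from collections import Counter
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Simple ensemble: for each sample, pick the label most models agree on."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 0, 1, 0, 1]
model_predictions: Dict[str, List[int]] = {
    "random_forest": [1, 0, 0, 1, 1, 1],
    "svm": [1, 0, 1, 1, 0, 0],
    "neural_network": [0, 0, 0, 1, 0, 1],
}
# An odd number of binary classifiers avoids tied votes.
model_predictions["ensemble"] = majority_vote(list(model_predictions.values()))

scores = {name: accuracy(y_true, pred) for name, pred in model_predictions.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best:", best)
```

In this toy data each individual model makes a different mistake, so the vote corrects all of them. Real models often share errors, which is why the candidates should be compared on held-out data.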
CHAPTER 3
Based on Figure 1, Table 1 below shows a brief explanation of the 6
CRISP-DM phases.
CRISP-DM Phase: Explanation
Business Understanding: Understand the business problem that you are trying to
solve and how a predictive model can help. Identify the objectives, success
criteria, constraints, and risks associated with the project.
Data Understanding: Collect and explore the relevant data that will be used to
build the predictive model. Identify any missing or erroneous data, and consider
how to address the issues.
Data Preparation: Clean, transform, and prepare the data for modelling. This
process may involve tasks such as selecting, cleaning, constructing, integrating,
and formatting data.
Modelling: Select a suitable machine learning algorithm and train the model on the
prepared data. Experimenting with different algorithms and hyperparameters helps
to find the best-performing model.
Evaluation: Whereas modelling focuses on technical model assessment, this phase
looks more broadly at which model best meets the business needs and what to do
next. This includes evaluating results, reviewing the process, and determining
the next steps.
Deployment: Deploy the model into production, integrating it with appropriate
systems and processes. Ensure that it is well-documented and accessible to
relevant stakeholders.
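The Data Understanding and Data Preparation phases described in the table can be sketched with the Python standard library alone. The sketch counts missing entries per column and then drops incomplete rows. The column names, the sample rows, and the `\N` missing-value marker are assumptions made for illustration and should be checked against the actual dataset. Dropping rows is only one possible way to address missing data.

```python
import csv
import io
from collections import Counter

# Assumption: missing values are marked with the token "\N"; adjust this to
# whatever marker the real dataset uses.
MISSING = r"\N"

# Made-up sample standing in for a real results file.
SAMPLE_CSV = r"""driver,grid,position,fastest_lap_time
driver_a,1,1,1:31.447
driver_b,2,\N,\N
driver_c,\N,2,1:32.010
"""


def count_missing(rows, missing=MISSING):
    """Data Understanding: count missing entries per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value == missing or value == "":
                counts[column] += 1
    return counts


def drop_incomplete(rows, required, missing=MISSING):
    """Data Preparation: keep rows where every required column has a value."""
    return [r for r in rows if all(r[c] not in (missing, "") for c in required)]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_counts = count_missing(rows)
clean_rows = drop_incomplete(rows, required=["grid", "position"])

print(dict(missing_counts))               # missing entries per column
print([r["driver"] for r in clean_rows])  # drivers with complete rows
```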
3.1.2 Data Understanding
3.1.4 Modelling
i. Neural Network
ii. Random Forest
iii. Kernel SVM
iv. Gradient Boosting Tree
3.1.5 Evaluation
3.1.6 Deployment
3.2 Machine Learning Algorithms
i. Neural Network
ii. Random Forest
iii. Gradient Boosting Tree
iv. Kernel SVM
3.3 Tools
3.3.1 Kaggle
3.3.3 Microsoft Power BI
The Gantt charts below show the significant events to be covered in
FYP I and FYP II.
FYP I:
FYP II:
CHAPTER 4
4.1 Conclusion
Integrating machine learning in Formula One can not only predict race
winners but also support other areas of the sport, which would benefit
the team. For example, sports analysts and engineers can use machine
learning to detect damage to the systems and cars, analyze drivers'
performances and the areas where they need to improve, and predict the
constructor's overall standings.
Overall, I have high hopes for this project and expect the model to
offer useful information for the Formula One racing and sports analytics
industries.
The future work of this project consists of executing the remaining steps
of the CRISP-DM framework, which will be presented in FYP II. These steps
include data preparation and analysis, data modelling, testing and
evaluation, and visualizing the findings and outcomes in an interactive
manner.
REFERENCES
[1] Bunker, R. P., & Thabtah, F. (2019). A machine learning framework for sport
result prediction. Applied Computing and Informatics, 15(1), 27–33.
https://doi.org/10.1016/j.aci.2017.09.005
[2] Bunker, R., & Susnjak, T. (2022). The application of machine learning
techniques for predicting match results in team sport: A review. Journal of
Artificial Intelligence Research, 73, 1285–1322.
https://doi.org/10.1613/jair.1.13509
[3] Demir, İ., & Barman, İ. (2021). Modelling sport events with supervised machine
learning. Fundamental Journal of Mathematics and Applications, 232–244.
https://doi.org/10.33401/fujma.951665
[4] Gu, W., Foster, K., Shang, J., & Wei, L. (2019). A game-predicting expert
system using Big Data and machine learning. Expert Systems with
Applications, 130, 293–305. https://doi.org/10.1016/j.eswa.2019.04.025
[5] Gómez, M. A., Ibáñez, S. J., Parejo, I., & Furley, P. (2017). The use of
classification and regression tree when classifying winning and losing basketball
teams. Kinesiology, 49(1), 47. https://doi.org/10.26582/k.49.1.9
[6] Haghighat, M., Rastegari, H., & Nourafza, N. (2013). A Review of Data Mining
Techniques for Result Prediction in Sports, 2(5), 7–12.
[7] Lotfi, S., & Rebbouj, M. (2021). Machine learning for sport results prediction
using algorithms. International Journal of Information Technology and Applied
Sciences (IJITAS), 3(3), 148–155. https://doi.org/10.52502/ijitas.v3i3.114
[9] Ofoghi, B., Zeleznikow, J., MacMahon, C., & Dwyer, D. (2010). A machine
learning approach to predicting winning patterns in track cycling
omnium. Artificial Intelligence in Theory and Practice III, 67–76.
https://doi.org/10.1007/978-3-642-15286-3_7
[10] Passi, K., & Pandey, N. (2018). Increased prediction accuracy in the game of
cricket using Machine Learning. International Journal of Data Mining &
Knowledge Management Process, 8(2), 19–36.
https://doi.org/10.5121/ijdkp.2018.8203
[11] Sicoie, H. (2022). Machine Learning Framework for formula 1 race winner and
championship standings predictor (thesis). Tilburg University. Cognitive
Science and Artificial Intelligence.
[12] Vopani, V. (2023, March 7). Formula 1 World Championship (1950 - 2023).
Kaggle. Retrieved March 30, 2023, from
https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020/code?resource=download
APPENDICES