BTP Project Report

The document discusses developing a CNN model to automate penalty decisions in soccer by classifying tackles as fouls or clean based on images of initial contact. It covers collecting initial contact images from videos, implementing a CNN architecture in TensorFlow to classify the images, and addressing issues like subjective human refereeing and reducing delays from video reviews.

Uploaded by

Harsh Priye
Copyright © All Rights Reserved

“Player contact and foul detection in sports using AI-based video analysis”


A project report submitted in partial fulfillment of the requirements for
the award of the degree of
B.Tech in “Mathematics and Computing”

By

Saibal Patra (2020UCM2348)


Harsh Priye (2020UCM2355)
Mohammad Sahil (2020UCM2320)
Under the Supervision of

Dr. Savita Yadav


Index

1. Abstract
2. Introduction
3. Motivation
4. Literature Survey
5. Problem Statement
6. Implementation
7. Predictions
8. Work Done Till Date and Future Work
9. References

1. Abstract

Football has developed considerably over the past century, and so has the technology
involved in the game. The Video Assistant Referee (VAR) is one such technology and has
had a large impact on the game.

The role of VAR is simple yet complex: to intervene during play when the referees make a
wrong decision or cannot make one. A specific scenario arises when they must decide
whether a sliding tackle inside the box was a clean tackle or a foul warranting a penalty
for the opposition team. The technology allows the moment of the tackle to be watched
repeatedly, but the decisions are still made by humans and can therefore be biased. We
tried to develop a CNN-based foul detector grounded in the principle of the initial point
of contact.

2. Introduction

Football in its modern form originated in England in 1863, over 160 years ago. Since then,
it has become one of the major entertainment sports in history. In the 2012–13 season, the
Video Assistant Referee (VAR) was introduced in the Dutch league. Like any other machine
or technology that humans use to make their lives easier, it was introduced to make the
referees' lives easier in the game.

According to FIFA, the objectives of VAR are:


● Goals: The role of the VAR is to assist the referee in determining whether there was
an infringement that means a goal should not be awarded. As the ball has crossed
the line, play is interrupted, so there is no direct impact on the game.
● Penalty/Foul Decisions: The role of the VAR is to ensure that no clearly wrong
decisions are made in conjunction with the award or non-award of a penalty kick.
● Direct Red Card Incidents: The role of the VAR is to ensure that no clearly wrong
decisions are made in conjunction with sending off or not sending off a player.
● Mistaken Identity: The referee cautions or sends off the wrong player, or is unsure
which player should be sanctioned. The VAR will inform the referee so that the
correct player can be disciplined.

In this project, we concentrate on the second use case: penalty decisions. Although these
decisions are made after a thorough check of repeated video recordings of the tackle from
different angles, the task still depends on humans and may be biased. To automate this
process, we propose a Convolutional Neural Network (CNN) that takes an image of the
initial point of contact as input and predicts whether or not a foul has been committed.
Penalty decisions could then be automated rather than relying on human investigation.

3. Motivation

With the advent of technology, we have witnessed remarkable advancements in the game,
and one of the most significant innovations is the introduction of the Video Assistant
Referee (VAR) system. While VAR has enhanced the accuracy of refereeing decisions,
there remains room for improvement, particularly in the crucial area of penalty and foul
decisions.

The motivation for this project stems from the recognition that despite the availability of
advanced technology to review plays, the final decision still rests with human referees.
These referees, like all of us, are susceptible to human error, biases, and the pressures of
making split-second judgments that can profoundly impact the outcome of a match. This
inherent subjectivity in penalty decisions calls for a more objective and technologically
advanced solution.

The aim of this project is to harness the power of modern technology, specifically
Convolutional Neural Networks (CNNs), to revolutionize the way we approach penalty
decisions in football. By focusing on the initial point of contact during a tackle inside the
penalty box, we seek to automate the process of foul detection. This approach offers
several compelling motivations:
● Enhanced Objectivity: By employing a CNN-based system, we aim to eliminate the
potential for human bias in penalty decisions. The technology operates solely on
data and algorithms, supporting a fair and impartial assessment of each
situation.
● Consistency: The consistency of decisions is crucial in football. This system
promises to deliver consistent results, reducing disputes and controversies
surrounding penalty calls. Players, coaches, and fans can have greater confidence
in the fairness of the game.

● Efficiency: The automation of penalty decisions through AI technology will
expedite the decision-making process. It will reduce the need for lengthy video
reviews, allowing the game to flow more smoothly and eliminating unnecessary
disruptions.
● Reduced Errors: Human referees are prone to errors, especially in high-pressure
situations. This project aims to minimize errors in penalty decisions, leading to
fairer outcomes and less frustration among stakeholders.
● Advancement of the Sport: Football has always evolved with the times, embracing
technological innovations to improve the game. Implementing AI for penalty
decisions is a natural progression that can enhance the sport's appeal and
competitiveness.

4. Literature Survey

The VAR-CNN
The model we are working with is a Convolutional Neural Network-based model that takes
images of the initial contact and classifies them. In this and the following sections, we
discuss the data, the model architecture used, and the results and inferences from our
model. The model is a small-scale stand-in for the actual Video Assistant Referee; hence
we named it VAR-CNN.

Data Collection:

Collecting the data has been a laborious task, as there are no open-source datasets of this
kind for any league. The only available sources are video clips of European matches and
YouTube compilations of tackles and fouls. A small portion of the data was also obtained
from the paper “Soccer Event Detection Using Deep Learning.”


Data sources: https://arxiv.org/abs/2102.04331, https://github.com/aamir09/VarCnn

The variations in the data can be observed above. A total of 1200+ images were scraped for
two classes, namely Clean Tackles and Fouls. A clean tackle, as the name suggests, is one
where the defender gets to the ball first, so the initial contact is on the ball. Conversely, a
foul is one where the defender makes contact with the player first. This principle was the
basis of this study and of the data collection. The dataset records the initial contact and
the moment just after it.
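The initial-contact principle that drives the labeling can be written as a small rule. The following Python sketch is illustrative only: the `TackleLabel` encoding (0 for a clean tackle, 1 for a foul) and the `label_from_contacts` helper are hypothetical names, and the sketch assumes each clip has been annotated with the ordered list of what the defender touched.

```python
from enum import Enum
from typing import Sequence


class TackleLabel(Enum):
    """Hypothetical integer encoding of the two VAR-CNN classes."""
    CLEAN_TACKLE = 0
    FOUL = 1


def label_from_contacts(contacts: Sequence[str]) -> TackleLabel:
    """Label a tackle from the chronological list of things the defender touched.

    `contacts` is an assumed annotation format such as ["ball", "player"];
    only the first entry (the initial point of contact) decides the label.
    """
    if not contacts:
        raise ValueError("a tackle needs at least one contact event")
    first = contacts[0]
    if first == "ball":
        return TackleLabel.CLEAN_TACKLE  # ball reached first: clean tackle
    if first == "player":
        return TackleLabel.FOUL          # player reached first: foul
    raise ValueError(f"unknown contact target: {first!r}")


print(label_from_contacts(["ball", "player"]))  # TackleLabel.CLEAN_TACKLE
print(label_from_contacts(["player", "ball"]))  # TackleLabel.FOUL
```

Everything after the first contact is deliberately ignored, which is exactly what makes the rule simple enough to label images of the contact moment.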

5. Problem Statement

Despite the introduction of the Video Assistant Referee (VAR) system, which aims to
improve the accuracy of refereeing decisions, there are still significant challenges in
determining whether a sliding tackle inside the penalty box is a clean tackle or should
result in a penalty for the opposition team. This project addresses the following key
problems:

● Subjective Decision-Making: Penalty decisions often hinge on the subjective
judgment of on-field referees. Human referees can make inconsistent judgments
due to factors like viewing angles, split-second timing, and personal bias.
● Potential for Bias: Human referees, like all individuals, may introduce
unintentional biases based on team affiliation, player identity, or crowd influence.
These biases can lead to controversial and disputed penalty decisions.
● Delay and Disruption: The current method of reviewing penalty decisions through
VAR often results in delays and interruptions to the flow of the game. Lengthy
video reviews can frustrate players, coaches, and fans.
● Inefficiency in Decision-Making: While VAR provides access to video replays
from various angles, the final decision still relies on human interpretation. This
process is time-consuming and can lead to inconsistent outcomes.

6. Implementation

Model & Architecture


The model used in this study is based on convolutional neural networks implemented in
Python using TensorFlow. CNNs are based on the principle that the input images are
convolved with the kernels of each filter, which in turn generates a feature map. The
convolution is an element-wise multiplication of the kernel weights with the pixels they
cover, followed by a sum of the products. In a filter, there is a separate kernel for each
channel of the input, and the sum of the kernel outputs over all channels gives the
corresponding pixel value of the feature map.

The model was quite simple, with 650k+ parameters and dropout as the only form of
regularization. Other regularization combinations, such as batch normalization and L1 and
L2 norms, were also tried, but dropout was the most successful in terms of generalization,
even though the loss clearly converged much faster with batch normalization for the same
hyperparameter settings. The dropout rate was kept at 0.5 in the dense layers. Dropout
prevents overfitting by making neurons learn more independently rather than
co-dependently: for each training batch/example it randomly drops 50% of the neurons
(dropout rate 0.5), which effectively gives a new sub-network for each batch, and the
trained model approximates the average prediction of all those possible combinations.

The initial convolution layers had a kernel size of 5 and 64 filters each, followed by
max-pooling, while the later layers had a decreasing number of filters, a kernel size of 3
and a dilation rate of 2, also followed by max-pooling. The dilation rate was kept at this
standard value, and the advantage of dilated convolutions is evident: they provide a larger
receptive field without adding parameters, since a 3 × 3 kernel with dilation rate 2 covers
a 5 × 5 region of its input. An example of dilated convolutions is shown below:
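To make the convolution mechanics described above concrete, here is a minimal NumPy sketch of one filter applied to a multi-channel image, with an optional dilation rate. It is an illustration, not the TensorFlow code used in the project. `conv2d_single_filter` is a hypothetical helper, and like TensorFlow's `Conv2D` it computes cross-correlation (the kernel is not flipped), here with no padding.

```python
import numpy as np


def conv2d_single_filter(image: np.ndarray, kernels: np.ndarray,
                         bias: float = 0.0, dilation: int = 1) -> np.ndarray:
    """'Valid' (no padding) 2-D convolution of ONE filter over an image.

    image:   array of shape (H, W, C)
    kernels: array of shape (k, k, C), i.e. one k x k kernel per input channel
    """
    h, w, c = image.shape
    k = kernels.shape[0]
    if kernels.shape != (k, k, c):
        raise ValueError("need one square kernel per input channel")
    # A dilated kernel samples every `dilation`-th pixel, so its effective
    # span is k + (k - 1) * (dilation - 1) pixels wide.
    span = k + (k - 1) * (dilation - 1)
    out = np.empty((h - span + 1, w - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + span:dilation, j:j + span:dilation, :]
            # Element-wise product with each channel's kernel, then one sum
            # over rows, columns and channels gives one feature-map pixel.
            out[i, j] = np.sum(patch * kernels) + bias
    return out


img = np.arange(7 * 7 * 2, dtype=float).reshape(7, 7, 2)
ker = np.ones((3, 3, 2))
print(conv2d_single_filter(img, ker).shape)              # (5, 5)
print(conv2d_single_filter(img, ker, dilation=2).shape)  # (3, 3)
```

With dilation 2, the same nine weights per channel span a 5 × 5 window, so the feature map shrinks faster and each output pixel "sees" more of the input.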

Model structure is:

The last three CNN blocks use dilation, the dense layers use ReLU activations, and the
output layer uses a sigmoid activation, since this is a binary classification model. The
input images were resized to 256 × 256 using nearest-neighbor interpolation, and data
augmentation included techniques such as rotation, horizontal flips, vertical flips and
brightness variation. During training, an early stopping callback was used with a patience
of 10 epochs, monitoring validation accuracy and restoring the best weights. The training
accuracy achieved was ~68% and the validation accuracy ~74%; validation accuracy
exceeding training accuracy is consistent with dropout and augmentation being active only
during training. These accuracies are low but acceptable given the size and complexity of
the dataset.
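The architecture and training setup described above can be sketched in Keras as follows. This is a hypothetical reconstruction, not the project's code: the number of initial blocks (two), the later filter counts (48, 32, 16), the dense widths (128, 64), the augmentation strengths, the 80/20 split, the seed and the folder layout expected by `load_split` are all assumptions, so the parameter count will not match the reported 650k+. It assumes TensorFlow 2.9 or later, where `RandomBrightness` is available.

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (256, 256)


def build_var_cnn(input_shape=(*IMG_SIZE, 3)) -> tf.keras.Model:
    """Hypothetical sketch of the VAR-CNN layout (layer counts are assumed)."""
    inputs = tf.keras.Input(shape=input_shape)
    # Augmentation layers only change the data when training=True.
    x = layers.RandomFlip("horizontal_and_vertical")(inputs)
    x = layers.RandomRotation(0.1)(x)       # up to +/-10% of a full turn
    x = layers.RandomBrightness(0.2)(x)     # expects pixel values in [0, 255]
    x = layers.Rescaling(1.0 / 255)(x)

    # Initial blocks: 5x5 kernels, 64 filters, each followed by max-pooling.
    for _ in range(2):
        x = layers.Conv2D(64, 5, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)

    # Last three blocks: decreasing filters, 3x3 kernels, dilation rate 2.
    for filters in (48, 32, 16):
        x = layers.Conv2D(filters, 3, dilation_rate=2, padding="same",
                          activation="relu")(x)
        x = layers.MaxPooling2D()(x)

    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)              # dropout as the only regularizer
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # P(class 1)
    return tf.keras.Model(inputs, outputs, name="var_cnn_sketch")


def load_split(data_dir: str, subset: str) -> tf.data.Dataset:
    """Load 'training' or 'validation' from an assumed data_dir/<class>/ layout."""
    return tf.keras.utils.image_dataset_from_directory(
        data_dir,
        labels="inferred",
        label_mode="binary",
        image_size=IMG_SIZE,
        interpolation="nearest",   # nearest-neighbor resize to 256x256
        validation_split=0.2,
        subset=subset,
        seed=42,
        batch_size=32,
    )


# Stop after 10 epochs without a better val_accuracy; keep the best weights.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=10, restore_best_weights=True)
```

With class folders named, for example, `clean_tackles/` and `fouls/`, Keras assigns labels in alphanumeric order, so fouls map to 1 and the sigmoid output reads as the probability of a foul. Training would then be `model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])` followed by `model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stopping])`; the optimizer and epoch budget are again assumptions.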

Training Logs:

Overfitting was observed in every model across the different regularization combinations,
but with dropout it appeared latest, and since early stopping was used, the best weights
were restored.
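What "the best weights were restored" means can be seen in a short pure-Python sketch of the early-stopping rule, with a patience of 10 epochs on validation accuracy. `early_stopping_epoch` is a hypothetical helper written to mirror the behavior described in the report, not Keras's internal code, and the accuracy history in the example is made up for illustration.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_acc: Sequence[float],
                         patience: int = 10) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` epochs have passed without a strictly
    higher validation accuracy; the weights from `best_epoch` are restored.
    """
    if not val_acc:
        raise ValueError("need at least one epoch of validation accuracy")
    best_epoch, best = 0, float("-inf")
    for epoch, acc in enumerate(val_acc):
        if acc > best:                          # new best: reset the clock
            best_epoch, best = epoch, acc
        elif epoch - best_epoch >= patience:    # patience exhausted: stop
            return epoch, best_epoch
    return len(val_acc) - 1, best_epoch         # ran out of epochs first


# Made-up history: peaks at epoch 2, then stalls.
history = [0.55, 0.63, 0.74] + [0.70] * 10
print(early_stopping_epoch(history))  # (12, 2)
```

Training runs ten more epochs after the peak, and the model that is kept is the one from epoch 2, before any later overfitting.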

7. Predictions

The predictions are made by converting videos into frames using OpenCV and generating
predictions for each frame.
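The frame-by-frame prediction step can be sketched with OpenCV and NumPy. The helper names `iter_video_frames` and `classify_frames`, the 0.5 decision threshold, and the mapping of the sigmoid output to the foul class are assumptions. `predict` can be any callable that returns one foul probability per frame, such as the `predict` method of a trained Keras model.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

import numpy as np


def iter_video_frames(path: str,
                      size: Tuple[int, int] = (256, 256)) -> Iterator[np.ndarray]:
    """Yield the frames of a video as RGB arrays resized to `size` (w, h)."""
    import cv2  # imported lazily so classify_frames works without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"cannot open video: {path}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream (or a read error)
                break
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes BGR
            # Nearest-neighbor resize, matching the training preprocessing.
            yield cv2.resize(frame, size, interpolation=cv2.INTER_NEAREST)
    finally:
        cap.release()


def classify_frames(frames: Iterable[np.ndarray],
                    predict: Callable[[np.ndarray], np.ndarray],
                    threshold: float = 0.5) -> List[str]:
    """Label every frame 'foul' or 'clean' from a per-frame foul probability."""
    frames = list(frames)
    if not frames:
        return []
    batch = np.stack(frames).astype("float32")       # (N, H, W, 3)
    probs = np.asarray(predict(batch)).reshape(-1)   # one probability per frame
    return ["foul" if p >= threshold else "clean" for p in probs]
```

For example, `classify_frames(iter_video_frames("tackle.mp4"), model.predict)` (with a hypothetical clip `tackle.mp4`) returns one label per frame. Stacking every frame into one array is a simplification; long videos would be better processed in chunks.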

8. Work done till now and Future work

To date, we have successfully implemented the model in a Jupyter Notebook. The next
steps are model fine-tuning and deployment.

Future work will focus on improving the model by:
● Increasing the volume of the data
● Increasing the variety of fouls covered

In this project, we have studied only sliding tackles.

Once a model with better accuracy is achieved, it may become the next advancement in
football’s decision-making.

9. References
● Gerke, Sebastian, Antje Linnemann, and Karsten Müller. "Soccer player recognition using spatial
constellation features and jersey number recognition." Computer Vision and Image
Understanding 159 (2017): 105-115.
● Baysal, Sermetcan, and Pınar Duygulu. "Sentioscope: a soccer player tracking system using
model field particles." IEEE Transactions on Circuits and Systems for Video Technology 26, no.
7 (2015): 1350-1362.
● Kamble, P. R., A. G. Keskar, and K. M. Bhurchandi. "A deep learning ball tracking system in
soccer videos." Opto-Electronics Review 27, no. 1 (2019): 58-69.
● Choi, Kyuhyoung, and Yongduek Seo. "Automatic initialization for 3D soccer player tracking."
Pattern Recognition Letters 32, no. 9 (2011): 1274-1282.
● Kim, Wonjun. "Multiple object tracking in soccer videos using topographic surface analysis."
Journal of Visual Communication and Image Representation 65 (2019): 102683.
● Liu, Jia, Xiaofeng Tong, Wenlong Li, Tao Wang, Yimin Zhang, and Hongqi Wang. "Automatic
player detection, labeling and tracking in broadcast soccer video." Pattern Recognition Letters 30,
no. 2 (2009): 103-113.
● Komorowski, Jacek, Grzegorz Kurzejamski, and Grzegorz Sarwas. "BallTrack: Football ball
tracking for real-time CCTV systems." In 2019 16th International Conference on Machine Vision
Applications (MVA), pp. 1-5. IEEE, 2019.

● Hurault, Samuel, Coloma Ballester, and Gloria Haro. "Self-Supervised Small Soccer Player
Detection and Tracking." In Proceedings of the 3rd International Workshop on Multimedia
Content Analysis in Sports, pp. 9-18. 2020.
● Kamble, Paresh R., Avinash G. Keskar, and Kishor M. Bhurchandi. "A convolutional neural
network-based 3D ball tracking by detection in soccer videos." In Eleventh International
Conference on Machine Vision (ICMV 2018), vol. 11041, p. 110412O. International Society for
Optics and Photonics, 2019.
● Naidoo, Wayne Chelliah, and Jules Raymond Tapamo. "Soccer video analysis by ball, player and
referee tracking." In Proceedings of the 2006 annual research conference of the South African
institute of computer scientists and information technologists on IT research in developing
countries, pp. 51-60. 2006.
● Liang, Dawei, Yang Liu, Qingming Huang, and Wen Gao. "A scheme for ball detection and
tracking in broadcast soccer video." In Pacific-Rim Conference on Multimedia, pp. 864-875.
Springer, Berlin, Heidelberg, 2005.
● Mazzeo, Pier Luigi, Marco Leo, Paolo Spagnolo, and Massimiliano Nitti. "Soccer ball detection
by comparing different feature extraction methodologies." Advances in Artificial Intelligence
2012 (2012).
● Garnier, Paul, and Théophane Gregoir. "Evaluating Soccer Player: from Live Camera to Deep
Reinforcement Learning." arXiv preprint arXiv:2101.05388 (2021).
