Paper
Abstract
Over the past decade, the technology used by referees in football has improved substantially, enhancing the fairness and accuracy of decisions. This progress has culminated in the implementation of the Video Assistant Referee (VAR), an innovation that enables backstage referees to review incidents on the pitch from multiple points of view. However, the VAR is currently limited to professional leagues due to its expensive infrastructure and the worldwide shortage of referees. In this paper, we present the Video Assistant Referee System (VARS), which leverages the latest findings in multi-view video analysis. Our VARS achieves a new state of the art on the SoccerNet-MVFoul dataset, recognizing the type of foul in 50% of instances and the appropriate sanction in 46% of cases. Finally, we conducted a comparative study to investigate human performance in classifying fouls and their corresponding severity, and compared these findings to our VARS. The results of our study highlight the potential of our VARS to reach human performance and to support football refereeing across all levels of professional and amateur federations.
Keywords: Football, Soccer, Artificial Intelligence, Computer Vision, Video Recognition, Automated Decision, Video Assistant Referee, Referee Success Rate, Foul Evaluation
stands among the largest and most comprehensive sports datasets, with extensive annotations for video understanding in football.

In refereeing, the biggest revolution was introduced by the Video Assistant Referee (VAR) in 2016 [40]. The system involves a team of referees located in a video operation room outside the stadium. These referees have access to all available camera views and check all decisions taken by the on-field referee. If the VAR indicates a probable “clear and obvious error” (e.g., when the referee misses a penalty or a red card, or gives a yellow card to the wrong player), it is communicated to the on-field referee, who can then review his decision in the referee review area before taking a final decision. The VAR helps to ensure greater fairness in the game by reducing the impact of incorrect decisions on the outcome of games. Notably, in 8% of matches, the VAR has a decisive impact on the result of the game [41], and it slightly reduces the unconscious bias of referees towards home teams [42]. On average, away teams now score more goals and receive fewer yellow cards [43]. Controversial referee mistakes, like the famous “hand of God” goal by Diego Maradona during the 1986 FIFA World Cup quarter-final between Argentina and England, Josip Šimunić receiving three yellow cards in a single game at the 2006 FIFA World Cup, or Thierry Henry’s handball preventing the Republic of Ireland from qualifying for the World Cup, could have been avoided with the VAR and would have changed football history.

Despite its potential benefits, the use of VAR technology remains limited to professional leagues. The infrastructure of the VAR is expensive, including multiple cameras to analyze incidents from different angles, video operation rooms in various locations, and VAR officials hired to analyze the footage. Leagues with financial limitations cannot afford the necessary infrastructure to operate the VAR. In addition to the upfront costs of the infrastructure, there is also an ongoing expense associated with using the VAR: the officials who serve as Video Assistant Referees require specialized training [44] and monetary compensation after each game. Given the implementation and operational costs of the VAR, its use is currently restricted to professional leagues. A further obstacle is the shortage of referees worldwide. In Germany, there were only 50,241 active referees during the 2020/2021 season, whereas around 90,000 games were played each weekend [45, 46]. The introduction of the VAR requires an additional team of referees per game, which is not feasible for semi-professional or amateur leagues. Finally, each referee interprets the Laws of the Game [47] slightly differently, resulting in different decisions for similar actions. Since the video assistant referee changes from one game to another, inconsistencies may arise, with different decisions being made for similar actions across matches.

In this paper, we present the “Video Assistant Referee System” (VARS), which could support or extend the current VAR. Our VARS fulfills the same objectives and tasks as the VAR. By analyzing fouls from a single or multi-camera video feed, it indicates a probable “clear and obvious error” and can communicate this information to the referee, who then decides whether to initiate a “review”. The proposed VARS automatically analyzes potential incidents, which can then be shown to the referee in the referee review area. Just like the regular VAR, our VARS serves as a support system for the referee and only alerts him in the case of potential game-changing mistakes; the final decision remains in the hands of the main referee. The main benefit of our VARS is that it no longer requires additional referees, making it a suitable tool for leagues that do not have enough financial or human resources.

Contributions. We summarize our contributions and novelties as follows: (i) We propose an upgraded version of the VARS presented by Held et al. [35]: we introduce an attention mechanism over the different views and compute an importance score that allocates more attention to the more informative views before aggregating them. (ii) We present a thorough study of the influence of using multiple views and different types of camera views on the performance of our VARS. (iii) We present a comprehensive human study in which we compare the performance of human referees, football players, and our VARS on the tasks of type of foul classification and offense severity classification. Our human study also illustrates the subjectivity of refereeing decisions by examining the inter-rater agreement among referees.
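The view-attention step in contribution (i) can be illustrated with a small sketch. This is not the authors' implementation: the feature values, the fixed scoring vector, and the feature dimension below are hypothetical stand-ins, and in practice a learned attention layer would produce the per-view scores.

```python
import math

def attention_aggregate(view_features, scoring_vector):
    """Fuse per-view feature vectors into a single representation.

    view_features: one embedding (list of floats) per camera view.
    scoring_vector: hypothetical stand-in for a learned attention layer
    that rates how informative each view is.
    """
    # One importance score per view (dot product with the scoring vector).
    scores = [sum(f * w for f, w in zip(view, scoring_vector))
              for view in view_features]
    # Softmax over views, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Weighted sum: more informative views contribute more to the fused vector.
    dim = len(view_features[0])
    fused = [sum(weights[v] * view_features[v][i]
                 for v in range(len(view_features))) for i in range(dim)]
    return fused, weights

# Three hypothetical camera views with 3-dimensional embeddings.
views = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [2.0, 2.0, 2.0]]
fused, weights = attention_aggregate(views, scoring_vector=[1.0, 1.0, 1.0])
print(weights)  # the third view receives the largest weight
```

The softmax guarantees that the view weights are positive and sum to one, so the fused representation stays in the same range as the individual view embeddings regardless of how many camera angles are available.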
are more likely to provide crucial details, to accelerate the decision-making process during the VAR review.

              Type of Foul       Offence Severity     Time
              Acc.    Conf.      Acc.    Conf.
   Players    75%     3.6        58%     3.3          41.53
   Referees   70%     3.7        60%     3.6          38.01
   VARS       60%     -          51%     -            0.12

Table 2: Accuracy comparison between referees, players, and our VARS. The survey was performed on a subset of the test set of size 77. The time is given in seconds and represents the average time needed to make a decision. Acc. stands for accuracy and Conf. for confidence. A rating of 5 indicates high confidence, while a rating of 1 indicates low confidence.

4 Human study

In contrast to classical classification tasks that involve well-defined and easily separable classes, determining whether an action in football constitutes a foul may be subjective. Despite the definitions and regulations provided by the Laws of the Game [47], the rule book published by the IFAB that specifies when an action in football is considered a foul and its corresponding severity, these guidelines are still open to interpretation, leading to differing opinions about the same action. In practice, many actions fall into a gray area where both interpretations, foul or no foul, could be considered correct. In this study, we first analyze whether and how the performance of our VARS aligns with human performance (i.e., referees and football players) by comparing the accuracy of the type of foul and offense severity classifications between VARS and our human participants. Secondly, we conduct an inter-rater agreement analysis of human decisions to quantify the extent of agreement among our human participants.

Experimental setup. The study involves two distinct groups of participants with different expertise in football: “Players” and “Referees”. The first group consisted of 15 male individuals aged 18 or older (mean M = 23.06 and standard deviation SD = 3.49 years), who had been playing football for a minimum of three years (M = 8.71 and SD = 3.32 years). The second group consisted of 15 male individuals aged 18 or older (M = 25.33 and SD = 4.51 years), who are certified football referees and have officiated in at least 200 official games (from 223 to 1,150 games). Both groups analyzed 77 actions, each presented with three different camera perspectives simultaneously. The participants could review the clips several times and watch the actions in slow motion or frame by frame, without any time restriction. To reduce bias, the actions were shown in a different random order to each participant. For each action, we measured the time taken by the participants to make their decision, from the moment the participants started the video until they clicked on the ‘Next video’ button. For each action, the participants had the same classification task as presented in Section 3.1. Specifically, they had to determine the type of foul, whether the action was a foul or not, and the corresponding severity. For each action, we use the annotations from the SoccerNet-MVFoul dataset as ground truth to determine the accuracy of each participant. An important note is that the participants have a clear advantage over our VARS, as they view clips lasting 5 seconds at a frame rate of 25 fps, while our model gets a 1-second clip at 16 fps as input. Finally, let us note that our study was approved by the local university’s ethics committee (2223-080/5624). All analyses were performed using the JASP software.

4.1 Comparison to human performance

Table 2 shows the average accuracy, compared to the ground truth, of players, referees, and our VARS. These results align with similar studies [52–54], where referees had an overall decision accuracy ranging from 45% to 80%.

In terms of type of foul categorization, players (M = 0.752, SD = 0.055) were numerically more accurate than referees (M = 0.704, SD = 0.120), but this difference was not statistically significant, as shown by an independent samples Student t-test, t(28) = 1.421, p = 0.166, d = 0.519, 95% CI = [−0.214, 1.243]. Mean confidence levels in these categorizations were comparable between players (M = 3.64, SD = 0.28) and referees (M = 3.71, SD = 0.32), t(28) < 1.
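For reference, the independent-samples Student t-test used here can be reproduced from the reported summary statistics alone. The sketch below is illustrative, not the analysis pipeline (which used JASP); the group sizes of 15 come from the experimental setup, and the small discrepancy with the reported t(28) = 1.421 stems from the rounding of the published means and standard deviations.

```python
import math

def students_t_from_stats(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples Student t-test (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df

# Type-of-foul accuracy: players (M = 0.752, SD = 0.055) vs.
# referees (M = 0.704, SD = 0.120), 15 participants per group.
t, df = students_t_from_stats(0.752, 0.055, 15, 0.704, 0.120, 15)
print(f"t({df}) = {t:.2f}")  # close to the reported t(28) = 1.421
```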
   Nb. of different decisions    1     2     3     4
   High-level referees           16%   56%   28%   0%
   Referee talents               2%    60%   38%   0%

Table 3: Similarity analysis of the results for Offense Severity classification. Among high-level referees, 28% of cases result in three different decisions being made for the same action. For referee talents, this percentage even increases to 38%. These results show the significant challenge involved in determining whether an action should be classified as a foul and assessing its corresponding severity.

As for determining whether an action corresponds to a foul and its corresponding severity, referees were slightly more accurate (M = 0.594, SD = 0.091) than players (M = 0.582, SD = 0.061). However, this difference was not statistically significant, t(28) = −0.401, p = 0.691, d = −0.147, 95% CI = [−0.862, 0.571]. Although the accuracy of players and referees was comparable, referees were more confident in their severity judgments (M = 3.67, SD = 0.36) than players (M = 3.33, SD = 0.39), t(28) = −2.3, p = 0.029, d = −0.839, 95% CI = [−1.581, −0.084]. Referees’ higher confidence might be due to their specific experience in assessing fouls and their severity on the field.

Overall, our results suggest that the accuracy of players and referees is comparable. The Bayesian version of the Student t-test provides support for this null hypothesis, with Bayes factors BF10 of 0.732 and 0.366 for the type of foul and offense severity tasks, respectively. There is a possibility that this lack of difference between groups is due to power issues, i.e., the sample size being too small. Replication studies conducted on larger groups would be valuable in revealing potential differences between the two human groups.

As we do not have a standard deviation for the VARS, we conducted two one-sample t-tests to compare its performance against humans (players and referees were grouped, as their accuracy was comparable). For action categorization, humans (M = 0.728, SD = 0.095) were significantly more accurate than our VARS (M = 0.597), t(29) = 7.556, p < .001, d = 1.379, 95% CI = [0.870, 1.876]. Humans were also more accurate (M = 0.588, SD = 0.081) than our VARS (M = 0.508) for offense severity judgments, t(29) = 5.492, p < .001, d = 1.003, 95% CI = [0.556, 1.437]. This difference in performance might be due to differences in training between our VARS and humans. Players and referees have accumulated an extensive amount of experience in football through officiating, playing, and watching the game for countless hours. In contrast, our VARS has only been trained on an unbalanced training set of 2,916 actions, where some types of labels occur only a few times. For example, there are only 27 fouls with a red card in the training set, making it difficult for the model to precisely learn the difference between a foul with a yellow card and one with a red card. Considering the difficulty of the task and the significant experience disadvantage of our VARS compared to humans, the current results are promising. Further, it is notable that our VARS only requires 120 ms to reach a decision, which is more than 300 times faster than humans. Both referees and players require a similar amount of time to make a decision: on average, players take around 41.53 seconds and referees 38.01 seconds, which is similar to the average time of 46 seconds taken by the VAR to make a decision, as reported by López [55].

4.2 Inter-rater agreement

In this subsection, we investigate the reliability and consistency of humans in determining whether an action constitutes a foul and its severity. To assess the level of consensus among humans, we calculated inter-rater agreement in each group for the severity classification task. Since determining if an action is a foul and assessing its severity is the most important task, we focus on evaluating inter-rater agreement for this aspect only. To quantify the inter-rater agreement, we calculated the unweighted average Cohen’s kappa, which measures the agreement between multiple individuals. The referees achieved an unweighted average Cohen’s kappa of 0.213, indicating weak agreement. Similarly, players’ agreement was weak, with a score of 0.223. This suggests limited consistency among both groups in their assessments. Among our 15 referees, 7 officiate at a high level (in the highest league of their country); these referees are called “high-level referees” in the following, while all other referees are called “referee talents”. Table 3 shows the consensus in each subgroup for the offense severity classification task.
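The unweighted average Cohen's kappa used for this analysis can be obtained by averaging the pairwise kappas over all rater pairs. The sketch below is a minimal illustration with hypothetical ratings, not the JASP analysis used in the study.

```python
from itertools import combinations

def cohens_kappa(r1, r2):
    """Cohen's kappa between two raters who labeled the same actions."""
    n = len(r1)
    labels = set(r1) | set(r2)
    observed = sum(a == b for a, b in zip(r1, r2)) / n          # raw agreement
    chance = sum((r1.count(c) / n) * (r2.count(c) / n) for c in labels)
    return (observed - chance) / (1 - chance) if chance != 1 else 1.0

def unweighted_average_kappa(ratings):
    """Average the pairwise Cohen's kappas over all rater pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)

# Hypothetical severity decisions by three raters over six actions
# (0 = no offense, 1 = offense + no card, 2 = yellow card, 3 = red card).
raters = [
    [0, 1, 1, 2, 0, 1],
    [0, 1, 2, 2, 0, 0],
    [1, 1, 1, 2, 0, 1],
]
print(round(unweighted_average_kappa(raters), 3))
```

Because kappa corrects the raw agreement for the agreement expected by chance, two raters who pick the same majority class most of the time can still score low, which is why the reported values around 0.21–0.22 indicate weak agreement despite non-trivial raw overlap.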
Fig. 5: Example of the subjectivity of human choices. Decisions taken by our participants: “No
offense”, “Offense + No card”, and “Offense + Yellow card”.
As can be seen, high-level referees and referee talents reached a consensus among themselves for only 16% and 2% of the actions, respectively. In the majority of cases, multiple decisions were made for the same action, indicating the difficulty of determining whether an action should be classified as a foul and assessing its severity. Particularly among referee talents, 38% of actions resulted in three different decisions (out of four possible decisions) for the same action. Figure 5 shows an example of an action for which all three decisions, “No offense”, “Offense + No card”, and “Offense + Yellow card”, were taken among the referees. For certain referees, the fact that the defender plays the ball is considered enough not to award a free kick in this situation. However, other referees believe that even if the defender plays the ball, he disregards the danger to, or consequences for, an opponent, and they award a yellow card. These findings underscore the complexity and subjectivity inherent in refereeing decisions, highlighting the potential for further research to improve consistency and fairness in officiating.

5 Conclusion

Distinguishing between a foul and no foul and determining its severity is a complex and subjective task that relies entirely on the interpretation of the Laws of the Game [47] by each individual. Despite the challenges posed by this complex task and an unbalanced training dataset, our solution demonstrates promising results. While we have not reached human-level performance yet, we believe that VARS holds the potential to assist and support referees across all levels of professionalism in the future.

Acknowledgement. This work was partly supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding and the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI). J. Held and A. Cioppa are funded by the F.R.S.-FNRS. The present research benefited from computational resources made available on Lucia, the Tier-1 supercomputer of the Walloon Region, infrastructure funded by the Walloon Region under the grant agreement n°1910247.

6 Declarations

Availability of data and code. The data and code are available at https://github.com/SoccerNet/sn-mvfoul.

Conflict of interest. The authors declare no conflict of interest.

Open access.

References

[1] Cioppa, A., Deliège, A., Ul Huda, N., Gade, R., Van Droogenbroeck, M., Moeslund, T.B.: Multimodal and multiview distillation for real-time player detection on a football field. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), CVsports, Seattle, WA, USA, pp. 3846–3855 (2020). https://doi.org/10.1109/CVPRW50498.2020.00448
[2] Maglo, A., Orcesi, A., Pham, Q.-C.: Efficient tracking of team sport players with few game-specific annotations. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pp. 3460–3470. Inst. Electr. Electron. Eng. (IEEE), New Orleans, LA, USA (2022). https://doi.org/10.1109/CVPRW56347.2022.00390

[3] Vandeghen, R., Cioppa, A., Van Droogenbroeck, M.: Semi-supervised training to improve player and ball detection in soccer. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), CVsports, pp. 3480–3489. Inst. Electr. Electron. Eng. (IEEE), New Orleans, LA, USA (2022). https://doi.org/10.1109/CVPRW56347.2022.00392

[4] Somers, V., Joos, V., Giancola, S., Cioppa, A., Ghasemzadeh, S.A., Magera, F., Standaert, B., Mansourian, A.M., Zhou, X., Kasaei, S., Ghanem, B., Alahi, A., Van Droogenbroeck, M., De Vleeschouwer, C.: SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), CVsports, Seattle, WA, USA (2024)

[5] Cioppa, A., Deliège, A., Giancola, S., Ghanem, B., Van Droogenbroeck, M., Gade, R., Moeslund, T.B.: A context-aware loss function for action spotting in soccer videos. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 13123–13133. Inst. Electr. Electron. Eng. (IEEE), Seattle, WA, USA (2020). https://doi.org/10.1109/CVPR42600.2020.01314

[6] Giancola, S., Ghanem, B.: Temporally-aware feature pooling for action spotting in soccer broadcasts. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Nashville, TN, USA, pp. 4490–4499 (2021). https://doi.org/10.1109/CVPRW53098.2021.00506

[7] Hong, J., Zhang, H., Gharbi, M., Fisher, M., Fatahalian, K.: Spotting temporally precise, fine-grained events in video. In: Eur. Conf. Comput. Vis. (ECCV). Lect. Notes Comput. Sci., vol. 13695, pp. 33–51. Springer, Tel Aviv, Israel (2022). https://doi.org/10.1007/978-3-031-19833-5_3

[8] Soares, J.V.B., Shah, A., Biswas, T.: Temporally precise action spotting in soccer videos using dense detection anchors. In: IEEE Int. Conf. Image Process. (ICIP), pp. 2796–2800. Inst. Electr. Electron. Eng. (IEEE), Bordeaux, France (2022). https://doi.org/10.1109/ICIP46576.2022.9897256

[9] Soares, J.V.B., Shah, A.: Action spotting using dense detection anchors revisited: Submission to the SoccerNet challenge 2022. arXiv abs/2206.07846 (2022). https://doi.org/10.48550/arXiv.2206.07846

[10] Giancola, S., Cioppa, A., Georgieva, J., Billingham, J., Serner, A., Peek, K., Ghanem, B., Van Droogenbroeck, M.: Towards active learning for action spotting in association football videos. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pp. 5098–5108. Inst. Electr. Electron. Eng. (IEEE), Vancouver, Can. (2023). https://doi.org/10.1109/CVPRW59228.2023.00538

[11] Cabado, B., Cioppa, A., Giancola, S., Villa, A., Guijarro-Berdiñas, B., Padrón, E., Ghanem, B., Van Droogenbroeck, M.: Beyond the Premier: Assessing action spotting transfer capability across diverse domains. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), CVsports, Seattle, WA, USA (2024)

[12] Kassab, E.J., Solberg, H.M., Gautam, S., Sabet, S.S., Torjusen, T., Riegler, M., Halvorsen, P., Midoglu, C.: TACDEC. In: Proceedings of the ACM Multimedia Systems Conference 2024 (MMSys). ACM (2024). https://doi.org/10.1145/3625468.3652166
[16] Midoglu, C., Sabet, S.S., Sarkhoosh, M.H., Majidi, M., Gautam, S., Solberg, H.M., Kupka, T., Halvorsen, P.: AI-based sports highlight generation for social media. In: Proceedings of the 3rd Mile-High Video Conference. ACM (2024). https://doi.org/10.1145/3638036.3640799

[17] Gautam, S., Midoglu, C., Sabet, S.S., Kshatri, D.B., Halvorsen, P.: Assisting soccer game summarization via audio intensity analysis of game highlights. Unpublished (2022). https://doi.org/10.13140/RG.2.2.34457.70240/1

[18] Midoglu, C., Hicks, S., Thambawita, V., Kupka, T., Halvorsen, P.: MMSys’22 grand challenge on AI-based video production for soccer. In: ACM Multimedia Systems Conference (MMSys), Athlone, Ireland, pp. 1–6 (2022)

[22] Andrews, P., Nordberg, O.E., Zubicueta Portales, S., Borch, N., Guribye, F., Fujita, K., Fjeld, M.: AiCommentator: A multimodal conversational agent for embedded visualization in football viewing. In: Int. Conf. Intell. User Interfaces, pp. 14–34. ACM, Greenville, SC, USA (2024). https://doi.org/10.1145/3640543.3645197

[23] Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. In: IEEE Int. Conf. Comput. Vis. (ICCV), pp. 945–953. Inst. Electr. Electron. Eng. (IEEE), Santiago, Chile (2015). https://doi.org/10.1109/ICCV.2015.114

[24] Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv abs/1409.0473 (2014). https://doi.org/10.48550/arXiv.1409.0473

[25] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. arXiv abs/1706.03762 (2017). https://doi.org/10.48550/arXiv.1706.03762

[26] Pappalardo, L., Cintia, P., Rossi, A., Massucco, E., Ferragina, P., Pedreschi, D., Giannotti, F.: A public data set of spatio-temporal match events in soccer competitions. Sci. Data 6(1), 1–15 (2019). https://doi.org/10.1038/s41597-019-0247-7

[27] Yu, J., Lei, A., Song, Z., Wang, T., Cai, H., Feng, N.: Comprehensive dataset of broadcast soccer videos. In: IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), pp. 418–423. Inst. Electr. Electron. Eng. (IEEE), Miami, FL, USA (2018). https://doi.org/10.1109/MIPR.2018.00090

[28] Scott, A., Uchida, I., Onishi, M., Kameda, Y., Fukui, K., Fujii, K.: SoccerTrack: A dataset and tracking algorithm for soccer with fish-eye and drone videos. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pp. 3568–3578. Inst. Electr. Electron. Eng. (IEEE), New Orleans, LA, USA (2022). https://doi.org/10.1109/CVPRW56347.2022.00401

[29] Jiang, Y., Cui, K., Chen, L., Wang, C., Xu, C.: SoccerDB: A large-scale database for comprehensive video understanding. In: Int. ACM Work. Multimedia Content Anal. Sports (MMSports), pp. 1–8. ACM, Seattle, WA, USA (2020). https://doi.org/10.1145/3422844.3423051

[30] Van Zandycke, G., Somers, V., Istasse, M., Don, C.D., Zambrano, D.: DeepSportradar-v1: Computer vision dataset for sports understanding with high quality annotations. In: Int. ACM Work. Multimedia Content Anal. Sports (MMSports), pp. 1–8. ACM, Lisbon, Port. (2022). https://doi.org/10.1145/3552437.3555699

[31] Giancola, S., Amine, M., Dghaily, T., Ghanem, B.: SoccerNet: A scalable dataset for action spotting in soccer videos. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pp. 1792–179210. Inst. Electr. Electron. Eng. (IEEE), Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPRW.2018.00223

[32] Deliège, A., Cioppa, A., Giancola, S., Seikavandi, M.J., Dueholm, J.V., Nasrollahi, K., Ghanem, B., Moeslund, T.B., Van Droogenbroeck, M.: SoccerNet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), CVsports, Nashville, TN, USA, pp. 4508–4519 (2021). https://doi.org/10.1109/CVPRW53098.2021.00508

[33] Cioppa, A., Deliège, A., Giancola, S., Ghanem, B., Van Droogenbroeck, M.: Scaling up SoccerNet with multi-view spatial localization and re-identification. Sci. Data 9(1), 1–9 (2022). https://doi.org/10.1038/s41597-022-01469-1

[34] Cioppa, A., Giancola, S., Deliege, A., Kang, L., Zhou, X., Cheng, Z., Ghanem, B., Van Droogenbroeck, M.: SoccerNet-tracking: Multiple object tracking dataset and benchmark in soccer videos. In: IEEE Int. Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), CVsports, pp. 3490–3501. Inst. Electr. Electron. Eng. (IEEE), New Orleans, LA, USA (2022). https://doi.org/10.1109/CVPRW56347.2022.00393

[35] Held, J., Cioppa, A., Giancola, S., Hamdi, A., Ghanem, B., Van Droogenbroeck, M.: VARS: Video assistant referee system for automated soccer decision making from multiple views. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), pp. 5086–5097. Inst. Electr. Electron. Eng. (IEEE), Vancouver, Can. (2023). https://doi.org/10.1109/CVPRW59228.2023.00537