Machine Learning and Data Mining For Sports Analytics: Ulf Brefeld Jesse Davis Jan Van Haaren Albrecht Zimmermann
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2024
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
The Machine Learning and Data Mining for Sports Analytics workshop aims to bring
together a diverse set of researchers working on Sports Analytics in a broad sense. In
particular, it aims to attract interest from researchers working on sports from outside
of machine learning and data mining. The 10th edition of the workshop was co-located
with the European Conference on Machine Learning and Principles and Practice of
Knowledge Discovery 2023.
Sports Analytics has been a steadily growing and rapidly evolving area over the last
decade, both in US professional sports leagues and in European football leagues. The
recent implementation of strict financial fair-play regulations in European football will
definitely increase the importance of Sports Analytics in the coming years. In addition,
there is the popularity of sports betting. The developed techniques are being used for
decision support in all aspects of professional sports, including but not limited to:
– Match strategy, tactics, and analysis
– Player acquisition, player valuation, and team spending
– Training regimens and focus
– Injury prediction and prevention
– Performance management and prediction
– Match outcome and league table prediction
– Tournament design and scheduling
– Betting odds calculation
The interest in the topic has grown so much that there is now an annual confer-
ence on Sports Analytics at the MIT Sloan School of Management, which has been
attended by representatives from over 70 professional sports teams in eminent leagues
such as Major League Baseball, National Basketball Association, National Football
League, National Hockey League, Major League Soccer, English Premier League, and
the German Bundesliga. Furthermore, sports data providers such as Statsbomb, Stats Per-
form, and Wyscout have started making performance data publicly available to stimulate
researchers who have the skills and vision to make a difference in the sports analytics
community. Moreover, the National Football League has been sponsoring a Big Data
Bowl where they release data and a concrete question to try to engage the analytics
community.
There has been growing interest in the Machine Learning and Data Mining commu-
nity about this topic, and the 2023 edition of MLSA built on the success of prior editions
at ECML PKDD 2013 and at ECML PKDD 2015 through ECML PKDD 2022.
Both the community’s on-going interest in submitting to and participating in the
MLSA workshop series, and the fact that you are reading the fifth volume of MLSA
proceedings show that our workshop has become a vital venue for publishing and pre-
senting sports analytics results. In fact, as participants, many of whom had submitted
before, expressed during the workshop itself, venues for presenting machine learning
and data mining-based sports analytics work remain rare, motivating us to continue to
organize the workshop in future years.
In 2023, the workshop received a record 31 submissions of which 18 were selected
after a single-blind reviewing process involving at least three program committee mem-
bers per paper. One of the accepted papers was withdrawn from publication in the pro-
ceedings; two more were extended abstracts based on journal publications. In terms of
the sports represented, soccer was less dominant than in past years, and, notably, a paper
on sailing analysis was presented for the first time. Topics included tactical analysis,
outcome predictions, data acquisition, performance optimization, and player evaluation,
both physical and mental.
The workshop featured an invited presentation by Martin Rumo of OYM AG, Uni-
versity of Applied Sciences of Lucerne, and the Swiss Association of Computer Science
in Sport on “Sports Data Analytics: A Science and an Art”.
Further information about the workshop can be found on the workshop’s website at
https://dtai.cs.kuleuven.be/events/MLSA23/.
Keynote
Sports Data Analytics: An Art and a Science
Martin Rumo
Abstract. This short paper highlights the role of sports data analytics
in providing credible metrics for complex skills, aiding managers and
coaches in decision-making processes. While sport science has been
successful in contributing to exercise physiology and theories of training
adaptations, it is less successful in explaining success in complex sports
such as invasion sports. We propose a framework to deal with this
complexity and to provide meaningful metrics for individual and team
performances. We demonstrate its utility with one example of individual
performance and one example of team performance.
What distinguishes sports from games is that success in sport should by
definition not rely on any “luck” element (e.g., dice, shuffling of cards) specifically
designed into the sport [1]. When we attribute success to chance, it might be
because the skills involved are not quite understood. Complex skills are difficult
to predict, and this difficulty often makes the situation seem more random to us.
Therefore we propose that the luck part of success is rather due to unknown
or not yet understood skills. Managers and coaches try to understand skills in
order to develop them and use them as selection criteria. Sports data analytics
provides them with credible metrics for particular skills in order to help them
in their decision process.
Apart from basic physical requirements for football players, sport science still has
little impact on practice in the day-to-day activity of football organisations [2].
We argue that this is mainly because the data do not adequately represent the
technical and tactical concepts used in the reasoning of coaches and managers.
In order to fill this semantic gap, we propose a process of reducing complexity
along the concepts around skill and performance in specific sport related tasks
[4]. In a sport science approach skills are usually broken down into basic skills
that can be tested in a more standardized way. As an example, agility is seen as
a combination of Change of Direction (COD) speed and the capability to quickly
react to a stimulus. Both can be tested separately. But for the coach, agility is
the capacity to solve a task, which is creating space in on-ball actions through
quick and short accelerations. We propose that for the coach or manager it is
more valuable to be able to evaluate to what extent a player is able to solve a
task relevant for in-game performance.
3 Conclusion
Recognizing the existing semantic gap between traditional sport science concepts
and the practical reasoning of coaches and managers, we proposed a framework
aimed at aligning data with the shared concepts of decision makers in the sports
industry. Our suggested framework, illustrated in Fig. 1, offers a structured
approach to extracting meaningful information about in-game performance. In order
to demonstrate its utility, we applied this framework to the evaluation of
offensive agility at the individual level and extended its applicability to the modeling of
team coordination as a systemic structure, allowing for the reduction of complex
coordination patterns into measurable metrics. In essence, this paper advocates
for a paradigm shift in sports data analytics, urging a focus on the concepts that
matter most to coaches and managers. By bridging the gap between raw data
and the technical and tactical reasoning of decision makers, our proposed frame-
work aims to enhance the effectiveness of sports analytics in providing valuable
insights for performance improvement and strategic decision-making in team
sports.
References
1. Definition of sport - services - sportaccord - international sports federation.
https://web.archive.org/web/20111028112912/http://www.sportaccord.com/en/members/index.php?idIndex=32&idContent=14881. Accessed 12 Sept 2023
2. Drust, B., Green, M.: Science and football: evaluating the influence of science
on performance. J. Sports Sci. 31(13), 1377–1382 (2013). https://doi.org/10.1080/02640414.2013.828544, PMID: 23978109
3. Lames, M., McGarry, T.: On the search for reliable performance indicators in game
sports. Int. J. Perform. Anal. Sport 7(1), 62–79 (2007)
4. Seidenschwarz, P., Rumo, M., Probst, L., Schuldt, H.: A flexible approach to football
analytics: assessment, modeling and implementation. In: Lames, M., Danilov, A.,
Timme, E., Vassilevski, Y. (eds.) IACSS 2019. AISC, vol. 1028, pp. 19–27. Springer,
Cham (2020). https://doi.org/10.1007/978-3-030-35048-2_3
5. Seidenschwarz, P., Rumo, M., Probst, L., Schuldt, H.: High-level tactical perfor-
mance analysis with sportsense. In: Proceedings of the 3rd International Workshop
on Multimedia Content Analysis in Sports, pp. 45–52 (2020)
Football/Soccer
ETSY: A Rule-Based Approach to Event
and Tracking Data SYnchronization
1 Introduction
Over the past years, there has been a dramatic increase in the amount of in-game
data being collected about soccer matches. Until a few years ago, clubs
that used data mostly had access only to event data, which describes all
on-the-ball actions but includes no information about what is happening
off the ball. Hence, the configurations and movements of players, both during
actions and in between, are missing. For example, it is impossible to distinguish
between a pass from midfield with 5 defenders vs. 1 defender in front of the ball.
Nonetheless, event data on its own can help address crucial tasks such as valuing
on-the-ball actions [1,6–10,13] and assessing in-game decisions [4,5,11,12].
More recently, top-level clubs have installed in-stadium optical tracking sys-
tems that are able to record the locations of all players and the ball multiple
times per second. Thus, tracking data provides the necessary context that is
missing in event data. However, it is not straightforward to perform tactical and
technical analyses solely based on tracking data as it does not contain informa-
tion about events such as passes, carries, and shots which are crucial to make
sense of a match since they contain semantic information.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 11–23, 2024.
https://doi.org/10.1007/978-3-031-53833-9_2
M. Van Roy et al.
Consequently, the richest analyses require integrating both data streams.
Unfortunately, these streams are often not time synchronized. That is, the event
at a specific timestamp does not correspond to the tracking frame with the same
timestamp. Two main factors can create such misalignment. First, the clocks
timestamp. Two main factors can create such misalignment. First, the clocks
used in event and tracking data collection might start at slightly different times,
introducing a constant bias. Second, because event data are manually annotated,
the timestamp of an event can occasionally be inaccurate due to human reac-
tion time or mistakes. Unfortunately, there are few approaches described in the
literature for synchronizing these data types and the relative merits of existing
approaches are unclear. In this paper, we propose a rule-based approach for
accomplishing this. The synchronization task can be stated as follows.
Given:
1. Event data of a game, which contains for each on-the-ball action its location
on the pitch (i.e., the x and y coordinate), the type (e.g., pass, shot, intercep-
tion), the time of the action, the result, the body part used, the player that
performed the action and the team he plays for.
2. Tracking data of the same game, which typically contains 10 or 25 frames per
second (FPS), and includes in each frame all x,y-coordinates of all players
and the x, y, z-coordinates of the ball at that moment in time.
Do: Assign each event a matching tracking frame such that it corresponds to
the match situation at the moment of the event (i.e., the ball is at the same
location in the event and tracking data, the player that performs the action in
the event data is the same player that is in possession of the ball in the frame,
and this player performs the same action as recorded in the event data).
1 https://github.com/ML-KULeuven/ETSY.
There are few publicly described synchronization approaches. The approach
by Anzer & Bauer [2,3] uses two steps to synchronize the data streams. First, it
matches the kickoff event with its analogous tracking frame and computes the
offset in time between them. It then uses this offset to shift all timestamps in
the tracking data to remove the constant bias. To match the kickoff event with
its analogous tracking frame, they use the movement of the ball to identify the
kickoff frame within the first frames of the tracking data when the game has
started. Second, for each event, it determines windows of possession where the
player is within two meters of the ball and uses a weighted sum of features (e.g.,
time difference between the event and the tracking frame, distance between ball
and player, distance between ball coordinates in the event and tracking data) to
determine the best frame. Grid search on a manually labeled test set is used to
optimize the weights. Currently, their approach has only been evaluated on the
most relevant actions, passes and shots, and yields satisfactory results for both.
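The weighted-sum frame selection described above can be sketched as follows. This is a minimal illustration, not Anzer & Bauer's implementation: the `Candidate` type, the three feature fields, the default weights, and the convention that a lower score is better are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One tracking frame inside a possession window (hypothetical layout)."""
    frame_id: int
    time_diff: float         # |event time - frame time| in seconds
    ball_player_dist: float  # ball to acting player, in meters
    ball_event_dist: float   # tracked ball to event location, in meters


def best_frame(candidates, weights=(1.0, 1.0, 1.0)):
    """Return the candidate with the lowest weighted sum of its features.

    The weights are placeholders; in the approach described above they
    are tuned by grid search on manually labeled data.
    """
    w_t, w_p, w_e = weights

    def score(c):
        return (w_t * c.time_diff
                + w_p * c.ball_player_dist
                + w_e * c.ball_event_dist)

    return min(candidates, key=score)


candidates = [
    Candidate(100, time_diff=0.4, ball_player_dist=1.8, ball_event_dist=3.0),
    Candidate(101, time_diff=0.2, ball_player_dist=0.5, ball_event_dist=1.0),
    Candidate(102, time_diff=0.0, ball_player_dist=1.9, ball_event_dist=4.0),
]
print(best_frame(candidates).frame_id)
```

Changing the weights changes which frame wins, which is why the weights have to be tuned against labeled examples.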
Alternatively, sync.soccer is a bio-informatics-inspired approach by Clark
& Kwiatkowski.2 Their method is based upon the Needleman-Wunsch algorithm
(similar to Dynamic Time Warping) that can synchronize any two sequences of
data. In a nutshell, this approach compares every event with every tracking
frame and uses a self-defined scoring function (i.e., a weighted combination of
time difference between the event and the tracking frame, distance between ball
and player, distance between ball coordinates in the event and tracking data, and
whether the ball is in play) to measure the fit between them. Next, it constructs
the synchronized sequence with the best overall fit. This approach is more general
than that of Anzer & Bauer as it can synchronize any event type instead
of only passes and shots, and it does not need manually labeled examples.
However, the weights of the scoring function need to be tuned separately for
each game, which is not straightforward and is time-intensive.
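To illustrate the idea of a global alignment between events and frames, here is a simplified dynamic-programming sketch in the spirit of Needleman-Wunsch. It is not the sync.soccer implementation: the scoring function, the restriction that every event is matched to exactly one frame in strictly increasing frame order, and all names are assumptions made for illustration.

```python
def align(events, frames, score):
    """Match every event to one frame, in order, maximizing the total score.

    score(event, frame) returns a float, higher meaning a better fit.
    Returns the matched frame index for each event.
    Assumes 1 <= len(events) <= len(frames).
    """
    n, m = len(events), len(frames)
    neg_inf = float("-inf")
    # best[i][j]: best total score when event i is matched to frame j
    best = [[neg_inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    for i in range(n):
        # Running maximum of the previous row over frames before j
        run_max, run_arg = neg_inf, -1
        for j in range(m):
            if i == 0:
                best[i][j] = score(events[i], frames[j])
            elif run_arg >= 0:
                best[i][j] = run_max + score(events[i], frames[j])
                back[i][j] = run_arg
            if i > 0 and best[i - 1][j] > run_max:
                run_max, run_arg = best[i - 1][j], j
    # Backtrack from the best cell of the last row
    j = max(range(m), key=lambda c: best[n - 1][c])
    matches = [0] * n
    for i in range(n - 1, -1, -1):
        matches[i] = j
        j = back[i][j]
    return matches


# Toy example: events and frames are plain timestamps,
# and the score is the negative time difference.
event_times = [10, 20, 30]
frame_times = [0, 8, 11, 22, 29, 35]
print(align(event_times, frame_times, lambda e, f: -abs(e - f)))
```

Because the alignment is global, a poor score for one event can shift the frames chosen for its neighbors, which is one reason the scoring weights matter.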
3 ETSY
We now outline our rule-based synchronization approach ETSY. To represent
the event data, we use SPADL [6] which is a unified format to represent all
on-the-ball actions in soccer matches. Therefore, any event stream that can be
transformed to SPADL can be used. Next, we outline the necessary preprocessing
steps to prepare the data and describe our algorithm.
3.1 Preprocessing
Before aligning the two data streams, four preprocessing steps are performed.
1. Event and tracking data often use different coordinate systems to represent
the pitch, with event data often using the intervals [0, 100] × [0, 100] and
tracking data the IFAB coordinate system (i.e., [0, 105] × [0, 68]). Therefore,
the event coordinates are transformed to match the coordinate system of the
tracking data.3
2. The event data of each team is transformed to match the playing direction
of the tracking data.
3. Only the tracking frames in which the game is not officially paused are kept.
This removes e.g., frames before the start, VAR checks, and pauses due to
injury. This ensures that events are matched with open-play frames only.
4. The velocity and acceleration of the ball are computed and added to each
tracking frame.
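The coordinate transformation, the direction flip, and the ball kinematics from the preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ETSY code: the function names are hypothetical, the flip is assumed to be a point reflection through the pitch center, and speed and acceleration are assumed to be simple finite differences between consecutive frames.

```python
def rescale(x, y, length=105.0, width=68.0):
    """Map event coordinates from [0, 100] x [0, 100] to a 105 x 68 pitch."""
    return x / 100.0 * length, y / 100.0 * width


def flip(x, y, length=105.0, width=68.0):
    """Mirror a position through the pitch center (reverses playing direction)."""
    return length - x, width - y


def ball_kinematics(positions, fps):
    """Finite-difference speed and acceleration of the ball.

    positions: list of (x, y) ball positions in meters, one per frame.
    The first frame gets speed 0.0 and acceleration 0.0 (assumed convention);
    every later frame uses the difference to the previous frame.
    """
    dt = 1.0 / fps
    speed = [0.0]
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        speed.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt)
    accel = [0.0] + [(s1 - s0) / dt for s0, s1 in zip(speed, speed[1:])]
    return speed, accel


print(rescale(50, 50))
print(flip(0.0, 0.0))
print(ball_kinematics([(0, 0), (1, 0), (3, 0)], fps=1))
```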
2 https://github.com/huffyhenry/sync.soccer.
3 The following package provides such a transformation: https://mplsoccer.readthedocs.io/en/latest/gallery/pitch_plots/plot_standardize.html.
3.2 Synchronization
The event and tracking data of each game are divided into the first and second
period. The synchronization is run for each period separately. Our rule-based
algorithm consists of the following two steps:
Step 1. Synchronize kickoff. Similar to Anzer & Bauer, we remove the constant
bias between the timestamps in the event and tracking data by aligning the
kickoffs. We determine which frame best matches the kickoff event by identifying,
in the first five seconds of the game, the frame in which the ball is within two
meters from the acting player and where the acceleration of the ball is the
highest. The tracking data timestamps are then corrected for this offset.
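The kickoff rule of Step 1 can be sketched as follows. This is an illustrative sketch rather than the ETSY implementation: the frame layout (dictionaries with `ball`, `players`, and `ball_accel` keys), the assumption that frame `i` has timestamp `i / fps`, and the sign convention of the returned offset are all hypothetical.

```python
def kickoff_offset(frames, kickoff_time, player_id, fps,
                   window_s=5.0, max_dist=2.0):
    """Find the kickoff frame and the time offset to the kickoff event.

    Among the frames in the first window_s seconds where the ball is within
    max_dist meters of the acting player, pick the one with the highest
    ball acceleration. Returns (frame_index, offset_seconds), or None if
    no frame qualifies.
    """
    best_idx, best_acc = None, float("-inf")
    for idx, frame in enumerate(frames[: int(window_s * fps)]):
        bx, by = frame["ball"]
        px, py = frame["players"][player_id]
        dist = ((bx - px) ** 2 + (by - py) ** 2) ** 0.5
        if dist <= max_dist and frame["ball_accel"] > best_acc:
            best_idx, best_acc = idx, frame["ball_accel"]
    if best_idx is None:
        return None
    # Offset = tracking time of the kickoff frame minus the event time
    return best_idx, best_idx / fps - kickoff_time


frames = [
    {"ball": (52.5, 34.0), "players": {7: (52.5, 34.0)}, "ball_accel": 1.0},
    {"ball": (52.5, 34.0), "players": {7: (53.5, 34.0)}, "ball_accel": 9.0},
    {"ball": (60.0, 34.0), "players": {7: (53.5, 34.0)}, "ball_accel": 20.0},
]
print(kickoff_offset(frames, kickoff_time=0.0, player_id=7, fps=10))
```

The third frame has the highest acceleration but is rejected because the ball is already more than two meters from the player.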
Next, we define for each category the time window parameter $t_a$. For all
categories but set-pieces, we choose $t_a$ equal to five seconds. Set-pieces typically
require some set-up time, and their annotated timestamps may be further off than
those of other actions. Hence, we extend $t_a$ to 10 s to increase the chances that
the matching frame is contained in the window.
Finally, we define the action-specific filters. We want the ball to be sufficiently
close to the acting player. Thus, we enforce a maximum threshold on the distance
between the two, allowing for some measurement error in the data. The distance
for bad touches is set a bit larger as those typically already move the ball further
away from the player. Similarly, fault-like actions do not need to be in close
proximity to the ball. Filters on ball height aim at excluding frames where the
ball height is not coherent with the body part used to perform the action. When
the ball is up in the air, players typically do not use their feet to kick it; vice
versa, a low ball is rarely hit with the head. Actions performed with hands (e.g.,
throw-ins, keeper saves) can happen at higher heights, as the players can jump
and use their arms. In some cases, we add an extra limitation on whether the ball
should be accelerating or decelerating. The ball should clearly be accelerating
when a set-piece or pass-like action is performed with a player’s feet. Similarly,
the ball should be decelerating when the acting player is receiving the ball.
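The action-specific filters can be sketched as a lookup table plus a predicate. This is a hedged illustration, not the ETSY code: the thresholds are the ones readable from the two filter-table rows that survive in this text (set-pieces taken with the foot, and incoming actions), the key names and frame layout are hypothetical, and the time window is assumed to be symmetric around the event timestamp.

```python
FILTERS = {
    # window (s), max ball-player distance (m), max ball height (m), ball trend
    "set_piece_foot": {"window": 10.0, "max_dist": 2.5,
                       "max_height": 1.5, "trend": "accelerate"},
    "incoming": {"window": 5.0, "max_dist": 2.0,
                 "max_height": 3.0, "trend": "decelerate"},
}


def passes_filters(frame, event_time, category):
    """Check one frame against the filters of one action category.

    frame: dict with "time" (s), "dist" (m, ball to acting player),
           "ball_height" (m) and "ball_accel" (m/s^2).
    """
    f = FILTERS[category]
    in_window = abs(frame["time"] - event_time) <= f["window"]
    close = frame["dist"] <= f["max_dist"]
    low = frame["ball_height"] <= f["max_height"]
    if f["trend"] == "accelerate":
        trend_ok = frame["ball_accel"] > 0
    else:
        trend_ok = frame["ball_accel"] < 0
    return in_window and close and low and trend_ok


frame = {"time": 12.0, "dist": 1.0, "ball_height": 0.3, "ball_accel": 4.0}
print(passes_filters(frame, event_time=10.0, category="set_piece_foot"))
print(passes_filters(frame, event_time=10.0, category="incoming"))
```

The same frame passes as a set-piece candidate but fails as an incoming action, because the ball is accelerating rather than decelerating.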
4 Evaluation
Category  | Actions                                                | $t_a$ | Distance | Ball height                      | Extra
Set-piece | throw_in, freekick, corner, goalkick, penalty          | 10s   | ≤ 2.5m   | ≤ 1.5m (with foot), ≤ 3m (other) | accelerate (with foot)
Incoming  | interception, keeper_save, keeper_claim, keeper_pickup | 5s    | ≤ 2m     | ≤ 3m                             | decelerate
percent of events for which a matching frame can be found), and the agreement
with ETSY within s seconds (i.e., the percent of events for which the found frame
is within a range of s seconds from the one identified by ETSY). Additionally, we
compare the runtime of the different approaches.
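The two measures defined above, the percent of events with a matching frame and the agreement with ETSY within s seconds, can be computed as in the following sketch. The function names, the use of `None` for unmatched events, and the choice to compare only events matched by both approaches are assumptions made for this example.

```python
def coverage(matched_frames):
    """Percent of events for which a matching frame was found.

    matched_frames: one frame index per event, or None if unmatched.
    """
    found = sum(f is not None for f in matched_frames)
    return 100.0 * found / len(matched_frames)


def agreement(frames_other, frames_etsy, fps, s):
    """Percent of events whose frame lies within s seconds of the ETSY frame.

    Only events matched by both approaches are compared (an assumption).
    """
    pairs = [(a, b) for a, b in zip(frames_other, frames_etsy)
             if a is not None and b is not None]
    close = sum(abs(a - b) / fps <= s for a, b in pairs)
    return 100.0 * close / len(pairs)


print(coverage([1, None, 3, 4]))
print(agreement([10, 40, 300, 7], [12, 20, 290, 7], fps=10, s=1))
```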
Agreement. Figure 1 shows the agreement with ETSY for both sync.soccer
and timestamp-based approaches. Without time-bias adjustment, most of the
found frames lie more than five seconds apart and hence no longer correspond
to the same match context. In contrast, the time-bias adjusted versions perform
much better. Most of the frames lie within one second of the frame found by ETSY.
This indicates the need for a time-bias adjustment in the existing approaches.
Additionally, we inspect the agreement for each of the defined action cate-
gories in Fig. 2. This gives us an indication of which actions are easier to match
than others. We only compare with the time-bias adjusted versions as the kickoff
alignment step was shown to be necessary. For bad touches, open-play, incoming,
and fault-like actions, the distributions are quite similar. The majority of the
found frames are within one second from the frame identified by ETSY, indicat-
ing a rather strong agreement. However, for set-pieces, we see an increase in the
number of events with a frame that is further off from the one identified by ETSY.
This may indicate that the fixed time window used for set-pieces is
not ideal and might need adjusting in the future.
Fig. 1. Agreement with ETSY for (a) the time-bias adjusted and (b) the non-adjusted
versions of sync.soccer and the timestamp-based approach. All action types are
included and the agreement is computed over five disjoint windows.
Fig. 2. Agreement with ETSY for the time-bias adjusted versions of sync.soccer and
the timestamp-based approach. The agreement is calculated for each action category.
Table 3. ETSY missed events analysis. The percentages indicate how many of the
unmatched events would have a matching frame if each filter was dropped.
In most cases, removing the distance filter would yield a matching frame.
However, the associated frame would probably be wrong, as the ball would be
far away from the acting player. It is possible that these are cases where either
a slight error exists in the tracked locations, thus stretching distances between
player and ball, or in the annotated timestamps, meaning that the window of
selected frames does not include the correct frame. While filters on ball height
and acceleration show a minor influence, the filter on timestamps also has a con-
sistent effect. Except for set-pieces, roughly half of the misses would be avoided
if frames whose timestamp is earlier than the last matched frame were
retained. This happens when an earlier event is matched to a slightly incorrect
frame that is further in the future. This error propagates and prevents synchro-
nizing the next couple of actions. For example, this can occur when a number of
actions happen very quickly after one another.
Fig. 3. Illustration of two random events synchronized by three approaches. The red
cross indicates where the event takes place in the event data, the black encircled player
is the acting player according to the event data. All player positions are shown according
to the matched frame and the ball is shown in black. (Color figure online)
can be mitigated by adding a time-bias adjustment step, after which the approach
produces an automated and satisfactory synchronization. Naturally, the
timestamp-based approach is automated and can be applied to synchronize all
actions. However, its performance is unacceptable.
5 Conclusion
This paper addresses the task of synchronizing soccer event data with tracking
data of the same game, which is a problem that is not often explicitly mentioned
in the literature. This paper provides an overview of the current state-of-the-
art approaches, introduces a simple rule-based approach ETSY, and compares
Acknowledgements. This work has received funding from the Flemish Government
under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program,
the Research Foundation - Flanders under EOS No. 30992574, and the KU Leuven
Research Fund (iBOF/21/075). We thank Jan Van Haaren for the useful feedback
during the development of this work.
References
1. AmericanSoccerAnalysis: What are Goals Added (2020). https://www.americansocceranalysis.com/what-are-goals-added
2. Anzer, G., Bauer, P.: A goal scoring probability model for shots based on synchro-
nized positional and event data in football (soccer). Front. Sports Active Living 3
(2021)
3. Anzer, G., Bauer, P.: Expected passes - determining the difficulty of a pass in
football (soccer) using spatio-temporal data. Data Min. Knowl. Disc. 36, 295–317
(2022)
4. Beal, R., Chalkiadakis, G., Norman, T.J., Ramchurn, S.D.: Optimising game
tactics for football. In: Proceedings of the 19th International Conference on
Autonomous Agents and MultiAgent Systems, AAMAS 2020, pp. 141–149 (2020)
5. Beal, R., Changder, N., Norman, T., Ramchurn, S.: Learning the value of team-
work to form efficient teams. In: Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 34, pp. 7063–7070 (2020)
6. Decroos, T., Bransen, L., Van Haaren, J., Davis, J.: Actions speak louder than
goals: valuing player actions in soccer. In: Proceedings of the 25th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp.
1851–1861 (2019)
7. Merhej, C., Beal, R.J., Matthews, T., Ramchurn, S.: What happened next? Using
deep learning to value defensive actions in football event-data. In: Proceedings
of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining,
KDD 2021, pp. 3394–3403 (2021)
8. Rudd, S.: A framework for tactical analysis and individual offensive production
assessment in soccer using Markov chains. In: New England Symposium on Statis-
tics in Sports (2011)
9. Singh, K.: Introducing Expected Threat (2019). https://karun.in/blog/expected-threat.html
10. StatsBomb: Introducing On-Ball Value (OBV) (2021). https://statsbomb.com/articles/soccer/introducing-on-ball-value-obv/
11. Van Roy, M., Robberechts, P., Yang, W.C., De Raedt, L., Davis, J.: Leaving goals
on the pitch: evaluating decision making in soccer. In: Proceedings of the 15th MIT
Sloan Sports Analytics Conference, pp. 1–25 (2021)
12. Van Roy, M., Robberechts, P., Yang, W.C., De Raedt, L., Davis, J.: A Markov
framework for learning and reasoning about strategies in professional soccer. J.
Artif. Intell. Res. 77, 517–562 (2023)
13. Yam, D.: Attacking Contributions: Markov Models for Football (2019). https://statsbomb.com/2019/02/attacking-contributions-markov-models-for-football/
Masked Autoencoder Pretraining
for Event Classification in Elite Soccer
1 Introduction
Automatically labeling instances of multiagent tracking data from team sports
regarding the occurrence of selected in-game events (such as on-ball actions) is
an important topic for sports analytics. While supervised learning for this task is
conceptually straightforward, contemporary deep learning models may require a
substantial amount of manually annotated labels, even when exploiting inherent
data symmetries.
To date, supervised models for multiagent trajectories involve training on
either handcrafted or static features [1,5,11] or make use of self-supervised
and semi-supervised methods incorporating autoregressive reconstruction tasks
[7,13]. Although autoregressive models have demonstrated impressive perfor-
mance in certain generative tasks [3,14], recent research in language and vision
domains suggests that data denoising techniques might be more suitable for
pretraining models for downstream tasks [6,9]. Moreover, some self-supervised
approaches for multiagent trajectory data fail to account for apparent symme-
tries in the data. Specifically, models and objectives may not exhibit permutation
equivariance or permutation invariance concerning the ordering of agents.
In this paper, we introduce a novel self-supervised pretraining method
designed for the classification of multiagent trajectory instances. We refer to
this method as trajectory masked autoencoder (T-MAE). In our approach, we
Y. Rudolph—Work in large part performed while at SAP SE, Machine Learning R&D,
Berlin.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 24–35, 2024.
https://doi.org/10.1007/978-3-031-53833-9_3
Fig. 1. Sketch of the proposed trajectory masked autoencoder (T-MAE). Notation and
details are provided in the text. In the encoder section at the top, segments that will
be subjected to masking in subsequent steps are highlighted using dotted lines. This
highlighting is intended for illustrative purposes only.
encoder operates on a multiagent trajectory instance, $x = x_1^{1:T}, \dots, x_K^{1:T}$. We
first split each individual trajectory $x_k^{1:T}$ into $S$ segments $\bar{x}_k^s$. These segments
are of equal length and encompass a fixed number of timesteps, referred to as $l_S$.
Consequently, each segment comprises a specific set of timesteps. It is noteworthy
that when $l_S = 1$, each timestep corresponds to one segment, resulting in a
total of $S = T$ segments. This segmentation operation is formally represented as
$\mathrm{SEG}_{l_S}(x_k^{1:T}) = \bar{x}_k^{1:S}$, where the segmentation process is applied individually to
each trajectory.
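The segmentation operation can be sketched in a few lines. This is a minimal illustration, assuming that a trajectory is a plain list of timesteps and that $T$ is divisible by $l_S$; the function name is hypothetical.

```python
def segment(trajectory, seg_len):
    """Split one trajectory of T timesteps into S = T / seg_len segments."""
    if len(trajectory) % seg_len != 0:
        raise ValueError("T must be divisible by the segment length")
    return [trajectory[i:i + seg_len]
            for i in range(0, len(trajectory), seg_len)]


trajectory = list(range(6))    # T = 6 timesteps of one agent
print(segment(trajectory, 2))  # S = 3 segments of length 2
print(segment(trajectory, 1))  # segment length 1 gives S = T segments
```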
Each of the created segments is separately embedded with a shared fully-
connected feed-forward network (FFN), which projects the segments into a com-
mon model dimension denoted as d. Subsequently, we enhance the representation
of each segment by incorporating positional encodings (PE) and, optionally,
trajectory embeddings (TE). This augmentation operation is denoted as $\mathrm{ADD}_{\mathrm{emb}}$,
where the term “emb” signifies the embeddings added to each trajectory segment.
To encode the temporal position of each segment, we employ sinusoidal
positional encodings (PE), following [15]. This technique has also been utilized
in previous work by [8] for multiagent trajectories. Alternatively, it is possible
to employ learnable embeddings for positional encoding. The positional encod-
ings have a dimensionality of d and are added element-wise to each segment.
Mathematically, this operation can be expressed as $\mathrm{ADD}_{\mathrm{PE}}(\bar{x}_k^s) = \mathrm{PE}^s \oplus \bar{x}_k^s$.
Furthermore, we introduce trajectory embeddings (TE), which are learnable
embeddings also with a dimensionality of d. These embeddings encode specific
properties of individual trajectories and are also added to segments element-
wise. In this case, the same trajectory embedding is added to each segment of
any given individual trajectory. Mathematically, this process can be represented
as $\mathrm{ADD}_{\mathrm{TE}}(\bar{x}_k^s) = \mathrm{TE}_k \oplus \bar{x}_k^s$.
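The two additive embeddings can be sketched in plain Python. This is an illustration under stated assumptions: segments are already embedded as lists of `d` floats, `d` is even, segment positions start at 0, and the trajectory embedding is passed in as a fixed vector although it is learnable in the method described above. The function names are hypothetical.

```python
import math


def sinusoidal_pe(position, d):
    """Sinusoidal positional encoding of dimension d (d assumed even)."""
    pe = []
    for i in range(0, d, 2):
        angle = position / (10000 ** (i / d))
        pe += [math.sin(angle), math.cos(angle)]
    return pe


def add_embeddings(segments, te):
    """Add PE (per segment position) and TE (per trajectory) element-wise.

    segments: the S embedded segments of one trajectory, each d floats.
    te: the trajectory embedding, shared by all segments of the trajectory.
    """
    d = len(te)
    return [
        [x + p + t for x, p, t in zip(seg, sinusoidal_pe(s, d), te)]
        for s, seg in enumerate(segments)
    ]


segments = [[0.0] * 4, [0.0] * 4]
print(add_embeddings(segments, te=[1.0] * 4))
```

The positional term differs between the two segments, while the trajectory term is identical for both.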
For each segmented, embedded and enriched trajectory, we uniformly choose
random segments to be masked out, following a predetermined masking ratio
denoted as r. Masking out refers to the process of removing segments from the
current representation of the trajectory. For example, a trajectory representation
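The masking step with ratio $r$ can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the return values, and the rounding of the number of masked segments to `int(r * S)` are assumptions.

```python
import random


def mask_segments(segments, r, rng=random):
    """Remove a uniformly random subset of segments from one trajectory.

    Returns (kept_segments, kept_indices, masked_indices).
    The number of masked segments is int(r * S), an assumed rounding choice.
    """
    num_segments = len(segments)
    n_mask = int(r * num_segments)
    masked = sorted(rng.sample(range(num_segments), n_mask))
    masked_set = set(masked)
    kept = [i for i in range(num_segments) if i not in masked_set]
    return [segments[i] for i in kept], kept, masked


segs = list("abcdefgh")  # S = 8 placeholder segments
kept_segs, kept_idx, masked_idx = mask_segments(segs, r=0.75)
print(len(kept_segs), len(masked_idx))
```

Keeping the indices of the removed segments makes it possible to restore the original order later, after the placeholders for the masked positions have been added back.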
28 Y. Rudolph and U. Brefeld
$\theta_{\mathrm{dec}}(\phi_{\mathrm{enc}}(x)) = \mathrm{FFN} \circ \mathrm{FTE} \circ \mathrm{ADD}_{\mathrm{PE}} \circ \mathrm{UNSHUFFLE} \circ \mathrm{CAT}_{Q}(\phi_{\mathrm{enc}}(x)).$
we employ a factorized transformer architecture. We will refer to this model as
factorized transformer encoder (FTE), for reasons that will soon become clear.
The FTE, which we use within the T-MAE, was initially introduced as part of
the encoder in a variational autoencoder for trajectory generation in [8], where
it is referred to as a multi-head attention block (MAB). While the FTE and
the MAB are architecturally identical, they operate on different representations.
Most importantly, the FTE that is employed in the encoder of the T-MAE
operates on shuffled segments per trajectory. This results in segments not being
temporally aligned for updates along the agent dimension. Consequently, the
model relies on the positional encoding to determine which other segments to
attend to. For a visualization of one layer of the FTE applied to shuffled segment
representations, please refer to Fig. 2.
Fig. 2. Sketch of one layer within the factorized transformer encoder (FTE) applied to
input which is shuffled along the temporal dimension (1–4, agent dimension is
color-coded). $\mathrm{TL}_{\mathrm{enc}}$ represents a standard transformer encoder layer.
For simplicity, we will use the term x̃ to refer to both segment and query
token representations throughout the remainder of this section. The architecture
of the FTE consists of two distinct types of standard transformer encoder layers,
as proposed in [15]. These layers include multi-head self-attention layers, resid-
ual connections, dropout [12], layer normalization [2], and feed-forward neural
networks. For more detailed information on these layers, we refer to either [15]
or [8]. In the context of the FTE, one of these layers operates over segments in
time and is separately applied to each trajectory. The other layer operates over
segments per agent and is applied independently to all temporal positions. To
clarify, in the first operation, we effectively extend the batch dimension to
encompass individual trajectories, while in the second operation, we extend the batch
dimension to encompass temporal positions: Temporal self-attention is applied
to $\tilde{x}_k^{1:\tilde{S}}$ for all $k \in [1, \dots, K]$, where $\tilde{S} = S$ in the
T-MAE decoder and T-MAE encoder during testing, and $\tilde{S} = S - S \cdot r$ in the
T-MAE encoder during self-supervised pretraining. Self-attention with respect to
trajectory interactions is applied to $\tilde{x}_{1:K}^{s}$ for all
$s \in [1, \dots, \tilde{S}]$. Combining these two operations, the encoder layer
with temporal self-attention and the encoder layer with self-attention concerning
trajectory interactions constitute a single FTE layer. This FTE layer can be
applied repeatedly to the input. In our experiment in Sect. 4 we use three FTE
layers both in the encoder and decoder of T-MAEs, as well as for the baseline
transformer models.
The application of FTE on x̃ exhibits permutation equivariance concerning
the arrangement of trajectories. This property of FTE follows directly from
the fact that the standard transformer encoder is permutation equivariant with
respect to its input, and that the temporal self-attention (which is provided
with positional information through encodings) is separately applied to each
trajectory. For a formal proof of FTE’s permutation equivariance, we refer to
[8].
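The factorization and the resulting permutation equivariance can be checked numerically with a toy layer. The sketch below is a strong simplification and not the authors' implementation: it uses a parameter-free single-head self-attention (queries, keys and values all equal the input) and omits residual connections, dropout, layer normalization and feed-forward networks.

```python
import math

def self_attention(tokens):
    """Parameter-free single-head self-attention over a list of vectors."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z
                    for j in range(d)])
    return out

def factorized_layer(x):
    """x[k][s] is the vector of trajectory k at temporal position s."""
    K, S = len(x), len(x[0])
    # Temporal attention: every trajectory is processed as its own sequence.
    x = [self_attention(x[k]) for k in range(K)]
    # Agent attention: every temporal position is processed as its own set.
    cols = [self_attention([x[k][s] for k in range(K)]) for s in range(S)]
    return [[cols[s][k] for s in range(S)] for k in range(K)]
```

Permuting the trajectories before the layer yields the same result as permuting its output, up to floating-point error, which is the equivariance property stated above.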
Masked Autoencoder Pretraining for Event Classification in Elite Soccer 31
Fig. 3. Visualization of T-MAE masking (middle) and reconstruction (right) for a vali-
dation multiagent trajectory instance (ground truth left). The reconstruction combines
ground truth timesteps provided to the model (i.e. dots in the middle column with
$l_S = 5$, $r = 0.8$) and masked timesteps predicted by the model.
1 All data is from the German Bundesliga (2017/18) and can be acquired from DFL.
Fig. 4. Comparing T-MAE (with segment length lS = 5 and mask ratio r = 0.8) to
un-pretrained transformer models with similar architecture and MLP baseline. Metric
is mean average precision (mAP, higher is better), SE = standard error.
We transform the data so that the home team always plays from left to right
and provide type embeddings which encode whether a trajectory belongs to (i)
the ball, (ii) the home keeper, (iii) a home field player, (iv) the guest keeper or
(v) a guest field player. We work with minimal data augmentation, mirroring the
instances along horizontal and vertical lines going through the middle point of
the pitch. Instances are labeled with an event, if it occurs within the 75 central
frames. We consider ten labels, of which the label pass occurs most frequently and
corner kick least frequently. Sparsity in positive labels results in an mAP of only
0.067 for random guessing. In Fig. 3, we provide visualizations of a ground truth,
masked and reconstructed instance (from the validation data). Even though large
parts of the input data are masked, the model apparently learned to account for
turns, twists and possibly interactions in the players’ movements.
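The mirroring augmentation mentioned above can be sketched in a few lines. The sketch assumes that coordinates are expressed relative to the middle point of the pitch, so that mirroring reduces to a sign flip; the function names are hypothetical.

```python
def mirror_instance(trajectories, flip_x=False, flip_y=False):
    """trajectories[k][t] = (x, y), with the pitch center as origin (assumed)."""
    sx = -1.0 if flip_x else 1.0
    sy = -1.0 if flip_y else 1.0
    return [[(sx * x, sy * y) for (x, y) in traj] for traj in trajectories]

def augment(trajectories):
    """Return the original instance and its three mirrored variants."""
    return [mirror_instance(trajectories, fx, fy)
            for fx in (False, True) for fy in (False, True)]
```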
The main results of our experiment are presented in Fig. 4, where we compare
the classification performance of a factorized transformer model pretrained with
T-MAE to models of similar architecture without pretraining. We conducted
experiments involving both pretrained and un-pretrained models using three
distinct segment lengths $l_S \in \{1, 5, 25\}$ and a linear classifier on top of the
factorized transformer architecture. The feature representation, obtained from the
encoder of T-MAE, remains unshuffled and is aggregated across agents. This
aggregated representation is then concatenated along the segment dimension
within the classifier. While the factorized transformer architecture is maintained
for both pretrained and un-pretrained models, we opted not to shuffle segments
within trajectories for the latter, as it is not necessary. Optimization is per-
formed with Adam [10] and early stopping on validation performance. During
classification training, we finetune the weights of the pretrained model. While
we use a standard learning rate of 0.001 and no dropout in the FTEs for the
The results clearly demonstrate that the model pretrained with the trajectory
masked autoencoder consistently outperforms the models without pretraining
across all four data regimes. This indicates that the pretrained model is more
data-efficient, achieving the same level of performance with less labeled data.
Furthermore, it is worth highlighting that the pretraining approach yields the
best overall performance even when 100% of the labeled data is available. This
suggests that the trajectory masked autoencoder is not only useful in scenar-
ios with limited labeled data but also beneficial when abundant labeled data is
accessible. Additionally, we observe that pretrained models are robust to variations
in the hyperparameters $l_S$ and $r$, as indicated in Table 1. The relatively poor
performance of the MLP baseline underscores the significance of permutation
invariance concerning agents and emphasizes the need for increased modeling
capacity.
5 Conclusion
We proposed a novel self-supervised approach for multiagent trajectories. We
introduced a masking scheme that rendered masking of individual trajectories
independent of one another and made novel use of a transformer architecture that
is factorized over time and agents. This makes the encoder of our pretraining
model permutation equivariant with respect to the order of trajectories and
naturally lends itself to downstream tasks that require permutation invariance
with respect to agent order. Empirically, pretraining with our approach improved
the performance of classifying multiagent trajectory instances with respect to in-
game events for tracking data from professional soccer matches.
References
1. Anzer, G., Bauer, P.: Expected passes: determining the difficulty of a pass in foot-
ball (soccer) using spatio-temporal data. Data Min. Knowl. Discov. 36, 295–317
(2022)
2. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint
arXiv:1607.06450 (2016)
3. Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural
Information Processing Systems, vol. 33, pp. 1877–1901 (2020)
4. Casas, S., Gulino, C., Suo, S., Luo, K., Liao, R., Urtasun, R.: Implicit latent
variable model for scene-consistent motion forecasting. In: Vedaldi, A., Bischof,
H., Brox, T., Frahm, J.M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 624–641.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_37
5. Chawla, S., Estephan, J., Gudmundsson, J., Horton, M.: Classification of passes in
football matches using spatiotemporal data. ACM Trans. Spat. Algorithms Syst.
3(2), 1–30 (2017)
6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of
deep bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805 (2018)
7. Fassmeyer, D., Anzer, G., Bauer, P., Brefeld, U.: Toward automatically labeling
situations in soccer. Front. Sports Active Living 3 (2021)
8. Girgis, R., et al.: Latent variable sequential set transformers for joint multi-
agent motion prediction. In: International Conference on Learning Representations
(2022)
9. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders
are scalable vision learners. In: IEEE Conference on Computer Vision and Pattern
Recognition (2022)
10. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. arXiv
preprint arXiv:1412.6980 (2014)
11. Power, P., Ruiz, H., Wei, X., Lucey, P.: Not all passes are created equal: objectively
measuring the risk and reward of passes in soccer from tracking data. In: SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 1605–1613
(2017)
12. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.:
Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn.
Res. 15(1), 1929–1958 (2014)
13. Sun, J.J., Kennedy, A., Zhan, E., Anderson, D.J., Yue, Y., Perona, P.: Task pro-
gramming: learning data efficient behavior representations. In: IEEE Conference
on Computer Vision and Pattern Recognition (2021)
14. van den Oord, A., et al.: WaveNet: a generative model for raw audio. arXiv preprint
arXiv:1609.03499 (2016)
15. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information
Processing Systems, pp. 5998–6008 (2017)
16. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and compos-
ing robust features with denoising autoencoders. In: International Conference on
Machine Learning, pp. 1096–1103. ACM (2008)
17. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denois-
ing autoencoders: learning useful representations in a deep network with a local
denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
18. Yeh, R.A., Schwing, A.G., Huang, J., Murphy, K.: Diverse generation for multi-
agent sports games. In: IEEE Conference on Computer Vision and Pattern Recog-
nition (2019)
19. Zaheer, M., Kottur, S., Ravanbakhsh, S., Póczos, B., Salakhutdinov, R.R., Smola,
A.J.: Deep sets. In: Advances in Neural Information Processing Systems, pp. 3391–
3401 (2017)
Quantification of Turnover Danger
with xCounter
1 Introduction
There are various strategies for scoring goals in soccer, such as set pieces or
build-up play. One strategy that attracts particular attention is the so-called
counterattack, i.e., a quick transition attack executed immediately after winning the
ball [2,11,16]. Previous studies have examined the defensive team’s tactical
behavior at the moment of a turnover [2], while others have focused on the
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 36–51, 2024.
https://doi.org/10.1007/978-3-031-53833-9_4
2 Dataset
We conduct our analysis on StatsPerform position and event data for 289
matches of elite men’s soccer. All matches took place within the 20/21 season
of a top-five European league. The position data is automatically collected and
comprises all players and the ball while the event data is manually annotated
and contains information about the event type, player identity, and a timestamp.
To ensure consistency, we exclude situations with fewer than 22 players due to
reasons such as red cards or injuries. To address missing frames, we employ linear
interpolation between the last known and the next available player positions.
Additionally, we normalize the direction of play, ensuring that the team in possession
always plays from left to right.
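The two preprocessing steps can be illustrated with a short sketch. It assumes that missing frames are marked as `None`, that gaps at the very start or end of a sequence are left untouched, and that coordinates are centered on the pitch so that the direction of play can be normalized by a point reflection; none of these details are specified in the text above.

```python
def interpolate_missing(positions):
    """Linearly interpolate None entries between the neighboring known positions."""
    out = list(positions)
    known = [i for i, p in enumerate(out) if p is not None]
    for a, b in zip(known, known[1:]):
        (xa, ya), (xb, yb) = out[a], out[b]
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (xa + w * (xb - xa), ya + w * (yb - ya))
    return out

def normalize_direction(positions, attacks_left_to_right):
    """Reflect through the pitch center if the team in possession attacks right-to-left."""
    if attacks_left_to_right:
        return list(positions)
    return [(-x, -y) for (x, y) in positions]
```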
Fig. 1. Heatmap of open play turnover frequencies (left) and average continuous
labels per location (right). Both heatmaps are smoothed with Gaussian smoothing [9].
Design choices for the involved parameters are listed in Sect. 3.6.
predictive capability, they often lack the desired degree of interpretability sought
by domain experts [20]. In the light of this, we have previously proposed a frame-
work for understanding complex sequences in soccer.
component of the definition, we identify the event closest to the goal of the
ball-losing team. Subsequently, we apply a function
$f_{\mathrm{success}} : x \mapsto y^i \in [0, 1]$ which maps a pitch location x onto a
continuous label y representing the danger of the counterattack following the
turnover. The resulting dataset, consisting of open play turnovers and their
continuous labels, is plotted in Fig. 1.
Fig. 2. Prediction capability assessment of feature Infront ball Player-Count for the ball
losing team. Panels A and D represent the turnover frequencies and average continuous
labels for the “above” group. Panels B and E represent the corresponding information
for the “below” group. Panel C illustrates the reliable pitch regions where the minimum
threshold $N_{\mathrm{sample,min}}$ is met by both groups. Panel F shows the difference of label
values between the “above” and “below” groups. For example, at the darkest pitch grid
in midfield (left half space of ball-losing team), the “above” subgroup has a 0.04 higher
xCounter value than the “below” group. The resulting aggregation of pitch differences
is provided in Table 2, bottom row, second column.
play turnovers we aim for a relatively long duration to ensure that the remaining
turnovers have minimal residual set piece originated positioning. After discussions
and careful inspections of individual situations we choose $t_{\mathrm{setpiece}} = 30$ s.
$$t^i_{\mathrm{counter}}(x^i) = 5\,\mathrm{s} + \frac{\lVert x^i \rVert}{5\,\mathrm{m\,s^{-1}}}, \quad (1)$$
where $x^i$ is the turnover location and $\lVert \cdot \rVert$ denotes the Euclidean distance to
the losing team’s goal. We incorporate the fixed time offset of five seconds to
account for synchronization errors and counterattacks initiated after a few initial
backward or sideways actions. As a result, our analysis time varies between a
maximum of 32 s at the corner flag and 10 s for ball losses close to the losing
team’s goal.
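Equation (1) translates directly into code. The following sketch assumes positions in meters and takes the goal position of the losing team as an explicit argument; the function and parameter names are hypothetical.

```python
import math

def counter_time_window(turnover_xy, losing_goal_xy, offset_s=5.0, speed_mps=5.0):
    """Analysis time in seconds: fixed offset plus goal distance divided by 5 m/s."""
    dx = turnover_xy[0] - losing_goal_xy[0]
    dy = turnover_xy[1] - losing_goal_xy[1]
    return offset_s + math.hypot(dx, dy) / speed_mps
```

For example, a turnover 50 m away from the losing team's goal gives a window of 5 s + 50 m / (5 m/s) = 15 s.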
For the spatial criterion we evaluate pitch locations reached after a coun-
terattack. While various concepts have been proposed for this purpose, such as
expected possession value [7], expected goals [3], or expected threat [14], they all
share the idea that pitch value generally increases with decreasing goal distance.
To provide a flexible and simple solution that is not dependent on a specific
concept, we choose a linear relation between goal distance and pitch value. We
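define the label function $f_{\mathrm{success}}(x^i_{\max})$ as follows:
$$y^i = f_{\mathrm{success}}(x^i_{\max}) = \begin{cases} \dfrac{\lVert x_{\mathrm{center}} \rVert - \lVert x^i_{\max} \rVert}{\lVert x_{\mathrm{center}} \rVert} & \text{if } \lVert x^i_{\max} \rVert \le \lVert x_{\mathrm{center}} \rVert \\ 0 & \text{else} \end{cases}, \quad (2)$$
where $x^i_{\max}$ denotes the position of the ball-winning team’s closest event to the
ball-losing goal within $t^i_{\mathrm{counter}}$ after the turnover, $x_{\mathrm{center}}$ denotes the position
of the center mark, and $\lVert \cdot \rVert$ again denotes the Euclidean distance to the
losing team’s goal.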
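A minimal sketch of the label function in Eq. (2) follows. It assumes that the norm measures the Euclidean distance to the losing team's goal, as in Eq. (1), and uses hypothetical argument names.

```python
import math

def f_success(x_max, losing_goal, center_mark):
    """Linear label: 1 at the losing team's goal, 0 at or beyond center-mark distance."""
    d_center = math.hypot(center_mark[0] - losing_goal[0],
                          center_mark[1] - losing_goal[1])
    d_event = math.hypot(x_max[0] - losing_goal[0], x_max[1] - losing_goal[1])
    if d_event <= d_center:
        return (d_center - d_event) / d_center
    return 0.0
```

With the goal at (-52.5, 0) and the center mark at (0, 0), an event halfway between the two receives the label 0.5.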
Table 2. The four most predictive features with critical feature values, most relevant
pitch zones, and aggregated prediction capability, separately listed for both teams.
experimentation, we find that a grid size of four by four meters satisfies this
criterion in most cases. Thus, each grid cell contains $N_{\mathrm{grid}} = 132 \pm 66$
turnovers on average. We further enhance the quality of the heatmaps and reduce the
influence of grid outliers by applying Gaussian smoothing [9] with a smoothing
factor of s = 2. This smoothing helps to create more visually interpretable and
robust heatmaps.
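The gridding and smoothing steps can be sketched without external libraries. Several details are assumptions made for illustration only: a 105 m by 68 m pitch with the origin in one corner, the smoothing factor s interpreted as the standard deviation of the Gaussian in units of grid cells, a kernel truncated at three standard deviations, and edge values replicated at the pitch boundary.

```python
import math

def bin_turnovers(points, length=105.0, width=68.0, cell=4.0):
    """Count turnovers per grid cell; points are (x, y) in meters."""
    nx, ny = math.ceil(length / cell), math.ceil(width / cell)
    grid = [[0.0] * ny for _ in range(nx)]
    for x, y in points:
        i = min(int(x // cell), nx - 1)
        j = min(int(y // cell), ny - 1)
        grid[i][j] += 1.0
    return grid

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    k = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth_1d(values, kernel):
    """Convolve a 1D sequence with the kernel, replicating edge values."""
    r, n = len(kernel) // 2, len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for d, w in enumerate(kernel):
            j = min(max(i + d - r, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_smooth(grid, sigma=2.0):
    """Separable 2D smoothing: first along y, then along x."""
    k = gaussian_kernel(sigma)
    rows = [smooth_1d(row, k) for row in grid]
    cols = [smooth_1d(list(c), k) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```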
Table 3. Evaluation results of baselines and XGBoost with varying top Ntop features
being used. Results are presented in terms of mean absolute error (MAE), mean squared
error (MSE), mean squared logarithmic error (MLE), and $R^2$. The superscripts for
XGBoost describe different hyperparameter configurations, listed in the Appendix.
Model             Ntop  MAE   MSE   MLE   R^2
Location BL       -     .129  .186  .020  .448
Feat.- & Loc. BL  1     .138  .196  .022  .379
                  5     .133  .199  .023  .373
                  10    .131  .201  .024  .362
                  30    .128  .207  .025  .327
XGBoost^A         1     .133  .185  .020  .463
XGBoost^B         1     .122  .186  .020  .454
XGBoost^C         6     .101  .204  .025  .249
XGBoost^D         9     .133  .184  .020  .391
by the features beyond the already strong predictive capability of the location.
Thus, a more refined algorithm is needed to effectively leverage the predictive
potential of the features in conjunction with the location data.
5 Related Work
Various studies in the literature have dealt with the subject of counterattacks. Bauer
and Anzer [2] have used XGBoost to classify defensive team behavior after a
turnover into passive and aggressive ball regain attempts. They found that
counterpressing (aggressive) attempts are more effective at the sidelines and
demonstrated that successful teams generate more shots when counterpressing.
Raudonius and Allmendinger [16] isolated players’ contributions to a counterattack
using four types of performance indicators and identified outstanding performers.
Sahasrabudhe and Bekkers [18] developed a graph neural network that predicts the
outcome of individual frames during a counterattack. They found that player speed
as well as ball and goal angles are most predictive for the success of a counterattack.
This work, to the best of our knowledge, is the first to predict counterattack
danger at the moment of the turnover. It complements existing literature and can
be combined with other approaches, e.g. to find the best counterpressing strategy
in high xCounter situations [2] or to analyze systematic over/underperformance
at different stages of the counterattack [16,18].
6 Application
Table 4. Counter effectiveness of both teams in our example match. High values of
over-performance were expected, due to the high number of goals scored by both teams.
Table 5. Individual performance of players for ball wins and ball losses. First listed
are the starters of the home team, subsequently the starters of the away team.
Position  Ball wins  Ball losses  Danger Created  Danger Allowed  Danger +/-
Home team (starters)
ST        31         36           10.56           6.22            4.34
LCM       31         18           9.51            6.64            2.86
RCM       23         21           7.13            4.73            2.39
LM        14         21           6.35            4.09            2.26
LCB       11         8            3.09            0.94            2.15
RCB       13         8            3.80            2.45            1.35
CAM       14         16           2.32            2.67            −0.35
RB        19         14           4.94            7.34            −2.40
LB        21         31           4.44            9.60            −5.16
RM        17         35           4.18            9.77            −5.59
GK        28         16           0.12            10.72           −10.6
Away team (starters)
RCM       32         32           12.08           4.29            7.79
LB        31         22           9.84            5.43            4.41
LCM       21         22           8.47            4.42            4.05
ST        17         40           6.84            3.81            3.04
RM        18         26           6.78            4.15            2.63
CAM       18         19           7.80            6.62            1.18
LM        14         15           4.97            4.16            0.81
RB        34         26           8.59            8.11            0.48
LCB       22         17           4.18            6.99            −2.82
GK        25         10           0.59            4.43            −3.85
RCB       14         16           0.91            7.11            −6.20
can be obtained. Lastly, our approach is subject to a set of design choices that
need to be carefully inspected. Overall, we see large potential in combining and
evaluating xCounter in the light of existing sport-specific concepts.
A Appendix
For the predictive XGBoost models, a range of different hyperparameters was
examined. Training was done on the training set (70% of turnovers),
and the performance of different hyperparameter configurations was compared
on the test set (30% of turnovers). Because the focus of this work does
not lie on generating an automatic model, we avoid using a more sophisticated
approach, e.g., using cross-validation or a separate validation set. A list of the
evaluated hyperparameters is provided in Table 6. Admittedly, our search space
is limited, yet we plan to expand it in the future. An explanation of the different
hyperparameter configurations of the best XGBoost models, encoded as superscripts
in Table 3, is given in Table 7.
Detailed lists of the best ranked features for both teams are presented in
Tables 8 and 9.
Table 6. Hyperparameter search space used for optimizing the XGBoost models.
The choice of search space was inspired by the optimal results documented by Bauer
and Anzer [2] as experimentation with significantly different parameters did not offer
promising results.
Table 8. Most predictive features for the ball-winning team. The Lastplayer subgroup
was added to the feature set after inspection of the initial results.
Table 9. Most predictive features for the ball-losing team. The Lastplayer subgroup
was added to the feature set after inspection of the initial results.
50 H. Biermann et al.
References
1. Balague, N., Torrents, C., Hristovski, R., Davids, K., Araújo, D.: Overview of
complex systems in sport. J. Syst. Sci. Complex. 26(1), 4–13 (2013). https://doi.
org/10.1007/s11424-013-2285-0
2. Bauer, P., Anzer, G.: Data-driven detection of counterpressing in professional foot-
ball: a supervised machine learning task based on synchronized positional and event
data with expert-based feature extraction. Data Min. Knowl. Disc. 35(5), 2009–
2049 (2021). https://doi.org/10.1007/s10618-021-00763-7
3. Bauer, P., Anzer, G.: A goal scoring probability model for shots based on synchro-
nized positional and event data in football (soccer). Front. Sports Active Living 3,
53 (2021). https://doi.org/10.3389/fspor.2021.624475
4. Biermann, H., Theiner, J., Bassek, M., Raabe, D., Memmert, D., Ewerth, R.: A
unified taxonomy and multimodal dataset for events in invasion games. In: Proceed-
ings of the 4th International Workshop on Multimedia Content Analysis in Sports,
pp. 1–10. ACM, Virtual Event China (2021). https://doi.org/10.1145/3475722.
3482792
5. Biermann, H., Wieland, F.G., Timmer, J., Memmert, D., Phatak, A.: Towards
expected counter - using comprehensible features to predict counterattacks. In:
Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA 2022. CCIS,
vol. 1783, pp. 3–13. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-
27527-2_1
6. Delaunay, B., et al.: Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Matem-
aticheskii i Estestvennyka Nauk 7(793–800), 1–2 (1934)
7. Fernández, J., Bornn, L., Cervone, D.: A framework for the fine-grained evaluation
of the instantaneous expected value of soccer possessions. Mach. Learn. 110(6),
1389–1427 (2021). https://doi.org/10.1007/s10994-021-05989-6
8. Frencken, W., Lemmink, K., Delleman, N., Visscher, C.: Oscillations of centroid
position and surface area of soccer teams in small-sided games. Eur. J. Sport Sci.
11(4), 215–223 (2011). https://doi.org/10.1080/17461391.2010.499967
9. Hockeyviz: Smoothing: How to (2023)
10. Lago-Ballesteros, J., Lago-Peñas, C., Rey, E.: The effect of playing tactics and
situational variables on achieving score-box possessions in a professional soccer
team. J. Sports Sci. 30(14), 1455–1461 (2012)
11. Lepschy, H., Wäsche, H., Woll, A.: Success factors in football: an analysis of the
German Bundesliga. Int. J. Perform. Anal. Sport 20(2), 150–164 (2020). https://
doi.org/10.1080/24748668.2020.1726157
12. Liu, G., Luo, Y., Schulte, O., Kharrat, T.: Deep soccer analytics: learning an
action-value function for evaluating soccer players. Data Min. Knowl. Disc. 34(5),
1531–1559 (2020)
13. LLC, S: Playing Styles Definition by StatsPerform (2023)
14. Merhej, C., Beal, R.J., Matthews, T., Ramchurn, S.: What happened next? Using
deep learning to value defensive actions in football event-data. In: Proceedings
of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining,
pp. 3394–3403. ACM, Virtual Event Singapore (2021). https://doi.org/10.1145/
3447548.3467090
15. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn.
Res. 12, 2825–2830 (2011)
16. Raudonius, L., Allmendinger, R.: Evaluating football player actions during coun-
terattacks. In: Yin, H., et al. (eds.) IDEAL 2021. LNCS, vol. 13113, pp. 367–377.
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-91608-4_36
17. Robberechts, P., Davis, J.: How data availability affects the ability to learn good
xG models. In: Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.)
MLSA 2020. CCIS, vol. 1324, pp. 17–27. Springer, Cham (2020). https://doi.org/
10.1007/978-3-030-64912-8_2
18. Sahasrabudhe, A., Bekkers, J.: A graph neural network deep-dive into successful
counterattacks. In: MIT Sloan Sports Analytics Conference, vol. 17 (2023)
19. Tenga, A., Kanstad, D., Ronglan, L.T., Bahr, R.: Developing a new method for
team match performance analysis in professional soccer and testing its reliability.
Int. J. Perform. Anal. Sport 9(1), 8–25 (2009). https://doi.org/10.1080/24748668.
2009.11868461
20. Van Haaren, J.: “Why would I trust your numbers?” On the explainability of
expected values in soccer. arXiv preprint arXiv:2105.13778 (2021)
Pass Receiver and Outcome Prediction
in Soccer Using Temporal Graph
Networks
1 Introduction
Passes are the most frequent event in soccer, so analyzing them is essential to
evaluate players’ performance or match situations [1,11]. Particularly, focusing
on individual passing options in a given passing situation enables domain partic-
ipants to characterize the general tendency of players’ decision-making or assess
their decisions. There are two main aspects of analyzing passing options: either
in terms of player (i.e., selecting a player to receive the pass) [1,11] or space (i.e.,
selecting a specific location on the pitch to send the ball to) [6,16,18].
In reality, it is difficult for players to pass the ball to a specific point on
purpose, so we focus on players rather than the space to concretize the passer’s
intention more intuitively. We employ a Temporal Graph Network (TGN) [17]
to predict the intended and actual receivers in a given passing situation. By
leveraging the TGN’s ability to capture temporal dependencies, we estimate for
a given moment the receiver selection probability (RSP) that the passer intends
to send the ball to each of the teammates and the receiver prediction probability
(RPP) that each player becomes the actual receiver of the pass.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 52–63, 2024.
https://doi.org/10.1007/978-3-031-53833-9_5
Based on these RSP and RPP, we compute the success probability (named
CPSP in our paper) of each passing option, i.e., the probability that the pass is
successfully sent to the intended receiver, as well as the overall pass success
probability (OPSP) of a given situation. In particular, we mathematically prove that
dividing the RPP of a teammate by the corresponding RSP is equal to the CPSP of the
passing option to the teammate. We analyze 358,790 passes from the 330 Belgian Pro
League matches to estimate their average success probabilities in 18 zones of the
pitch for both the start and end locations of the passes.
The proposed framework provides deeper insights into the context around
passes in soccer by quantifying the tendency of passers’ choice of passing options,
the difficulties of the options, and the overall difficulty of a given passing situation
at once. Another contribution is that this study suggests the potential of applying
the TGN model to team sports data for handling the interaction between players.
Also, we have made the source code available online for reproducibility1 .
2 Related Work
Several studies have tried to quantify the risk of a pass in a given passing sit-
uation. Spearman et al. [18] proposed a physics-based framework named Pitch
Control to estimate the probability of a pass being successful given that the pass
is sent to each location on the pitch. Power et al. [11] employed logistic regres-
sion to estimate the risk and reward of a pass based on handcrafted features.
Fernández et al. [6] performed a similar task to that of Pitch Control, but by
implementing a CNN-based deep learning architecture instead of physics-based
modeling. Anzer and Bauer [1] predicted the intended receiver leveraging the
approach of Pitch Control, and trained an XGBoost [5] model to estimate the
success probability of each passing option. Most recently, Robberechts et al. [16]
proposed a framework named un-xPass that measures a passer’s creativity.
Meanwhile, analyzing players’ movements in soccer is a cumbersome task
due to its spatiotemporal and permutation-invariant nature, so several methods
have been proposed to deal with this nature. Some studies [6,14–16] treated each
moment of the data as an image and applied a convolutional neural network (CNN)
to encode it, and others [10,12,13] sorted players by a rule-based ordering scheme
starting from the ball possessor. A better approach to model interaction between
players and the ball is to employ graph-based [3] or Transformer-based [19] neural
networks. To name a few, Anzer et al. [2] and Bauer et al. [4] constructed graph
neural networks (GNN) to detect overlapping patterns and to divide a match into
multiple phases of play, respectively. Kim et al. [8] deployed Set Transformers [9]
to predict the ball locations from player trajectories.
$$P(R = i \mid S = s, E = i) = \frac{P(E = i, R = i \mid S = s)}{P(E = i \mid S = s)} = \frac{P(R = i \mid S = s)}{P(E = i \mid S = s)}. \quad (5)$$
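Equation (5) can be applied directly to model outputs. In the sketch below, `rsp[i]` stands for P(E = i | S = s) and `rpp[i]` for P(R = i | S = s) of teammate i. The overall success probability is computed as the selection-weighted average of the per-option success probabilities, which follows from the law of total probability but is an assumption here, since the exact OPSP definition is given elsewhere in the paper. The function names are hypothetical.

```python
def cpsp(rsp, rpp):
    """Per-option success probability: RPP divided by RSP for each teammate."""
    return {i: (rpp[i] / rsp[i] if rsp[i] > 0 else 0.0) for i in rsp}

def opsp(rsp, rpp):
    """Overall success probability as the RSP-weighted average of the CPSPs."""
    c = cpsp(rsp, rpp)
    return sum(rsp[i] * c[i] for i in rsp)
```

For two options with RSP 0.5 each and RPP 0.4 and 0.25, the per-option success probabilities are 0.8 and 0.5, and the overall success probability is 0.65.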
In this section, we explain the tasks of estimating RSPs and RPPs introduced
in Sect. 3 using separate TGN models. The common goal of our RSP and RPP
models is to find the most likely receiver (either expected or actual) in a given
passing situation. What differentiates them is that the candidate receivers of the
former are the teammates (10 in general) of the passer and those of the latter
are all the players (21 in general) other than the passer. In Sect. 4.1 and 4.2,
we elaborate on the common fundamentals of our TGN models. In Sect. 4.3, we
describe how to train the TGN for each type of probability.
The TGN model for each task takes a sequence of time-stamped events that
occurred during each “possession” in soccer matches and produces the proba-
bility of the pass being received by each of the players (or the teammates) on
the pitch in a given game state. Here a possession is defined as a time interval
that a team continues to touch the ball except for fewer than three consecutive
actions by the opponents. Namely, we assume that a possession ends when the
next three actions are performed by the opposite team.
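The possession definition above can be turned into a simple segmentation routine. This is a sketch under the assumption that events are given as chronologically ordered (team, action) pairs; the function name and the event format are hypothetical.

```python
def split_possessions(events, n_break=3):
    """Split events into possessions.

    A possession of a team ends as soon as the next n_break actions
    are all performed by the opposing team.
    """
    possessions = []
    i = 0
    while i < len(events):
        team = events[i][0]
        current = []
        j = i
        while j < len(events):
            window = events[j:j + n_break]
            if len(window) == n_break and all(t != team for t, _ in window):
                break
            current.append(events[j])
            j += 1
        possessions.append((team, current))
        i = j
    return possessions
```

Fewer than three consecutive opponent actions, for example a single deflection, therefore stay inside the current possession.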
First, we make a graph with nodes corresponding to players, and interac-
tions (i.e., temporal edges) indicating pass attempts between players. We label
an interaction as a successful pass if it connects the two nodes of the same team,
and an unsuccessful pass otherwise. Also, we extract temporal features for each
of the nodes and edges on top of event and tracking data collected from the given
56 P. Rahimian et al.
match. More specifically, node features include a player’s (x, y) location, veloc-
ity, distance, and angle from the ball carrier, and a flag indicating whether the
player is the ball carrier for each time-step. Meanwhile, edge features include
the distance and relationship (i.e., teammates or opponents) between the two
interacting nodes.
Then, we model a TGN as a sequence of events $G = \{x(t_1), x(t_2), \ldots\}$ at
times $0 \le t_1 \le t_2 \le \cdots$, where $x(t)$ is either (1) a node-wise event $v_i(t)$ of a
player $i$, such as a change in his location, or (2) an interaction event $e_{ij}(t)$
represented by a temporal edge between two nodes $i$ and $j$, such as a pass or a
change in the distance between the two players.
Likewise, an interaction event $e_{ij}(t)$ induces the computation of messages for
the passer $i$ and the receiver $j$ as follows:
where $s_i(t^-)$ is the memory of $i$ at the time of the last event before $t$ in which
the player is involved and $\mathrm{msg}_n$, $\mathrm{msg}_s$, $\mathrm{msg}_d$ are learnable message functions.
– Message Aggregator: Since each player $i$ can be involved in multiple events
until time $t$, we aggregate all the messages $m_i(t_1), \ldots, m_i(t_b)$ of $i$ generated
before $t$ by averaging them, i.e.,
$$\bar{m}_i(t) = \frac{1}{b} \sum_{k=1}^{b} m_i(t_k).$$
– Memory Updater: For each event $x(t)$, a learnable memory update function
updates the memory $s_i$ of each player $i$ involved in the event:
where $P_{-i}$ is the set of all 21 players other than $i$ and $h$ is Temporal Graph
Attention (TGA) proposed in Rossi et al. [17].
– Link Prediction: To predict the most likely receiver (either expected or
actual) of a pass attempt, we feed the temporal embeddings $z_i(t)$ into a fully
connected layer that outputs link values between nodes. Applying softmax to
these values yields probabilities that sum to 1, each indicating how likely
the corresponding candidate is to be the receiver of the pass. Since no passer
intends to send the ball to an opponent, we restrict the candidates to
the passer’s teammates for the RSP model. Meanwhile, all the players, including
opponents, are candidates for the RPP model. Fig. 2 depicts the
resulting probabilities at an example passing moment.
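Two of the steps above, mean aggregation of messages and a softmax restricted to the candidate receivers, can be illustrated with plain Python. This is a rough sketch under stated assumptions: the function names and the boolean candidate mask are hypothetical, and the learnable parts of the TGN (message functions, memory updater, attention) are not modelled.

```python
import math


def mean_aggregate(messages):
    """Average a list of equal-length message vectors element-wise."""
    n = len(messages)
    return [sum(column) / n for column in zip(*messages)]


def masked_softmax(scores, candidates):
    """Softmax over `scores`, restricted to entries flagged in `candidates`.

    Non-candidates (e.g. opponents in the RSP model) get probability 0.
    """
    valid = [s for s, ok in zip(scores, candidates) if ok]
    top = max(valid)  # subtract the maximum for numerical stability
    exps = [
        math.exp(s - top) if ok else 0.0
        for s, ok in zip(scores, candidates)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical link values for four players; the third is an opponent.
link_values = [1.0, 2.0, 3.0, 0.5]
teammates_only = [True, True, False, True]
print(masked_softmax(link_values, teammates_only))
print(mean_aggregate([[1.0, 2.0], [3.0, 4.0]]))
```

For the RSP model the mask would flag only the passer's teammates, while for the RPP model it would flag every player except the passer.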
Fig. 2. Visualizations of RSP and RPP for an example match situation. Every passing
option from the ball carrier to a player with a probability larger than 0.01 is expressed
as an arrow whose width indicates the probability value.
Fig. 3. Combined visualization of RSP and CPSP for the same match situation as
Fig. 2. The width of an arrow indicates the selection probability (RSP) of the corre-
sponding passing option (same as Fig. 2a) and the color of it stands for the success
probability (CPSP) of such option.
only a teammate can be a candidate receiver for the RSP model, all the players
including teammates and opponents are candidates for the RPP model.
For the RSP model, we aim to estimate $\hat{y}^E_{s,i} = P(E = i \mid S = s)$ for each of
the passer’s teammates $i$, the probability that the passer intends to send the
ball to i in a given state s. While we cannot know the expected receiver of an
unsuccessful pass, we have assumed that for a successful pass, the actual receiver
is the expected receiver in Sect. 3. Hence, we take successful passes D+ in the
training dataset and use the actual receivers of them as the true labels indicating
Pass Receiver and Outcome Prediction 59
the expected receivers for training. Namely, the model is trained by minimizing
the cross-entropy loss
$$L^E = -\frac{1}{|D^+|} \sum_{s \in D^+} \sum_{i \in T^+} y^E_{s,i} \log \hat{y}^E_{s,i}$$
between the output $\hat{y}^E_{s,i}$ and the true label $y^E_{s,i}$. Here $y^E_{s,i} = 1$ if $i$ receives the
pass, and $y^E_{s,i} = 0$ otherwise.
For the RPP model, we want to estimate $\hat{y}^R_{s,i} = P(R = i \mid S = s)$ for each of
the players $i$ (either a teammate or an opponent), the probability that $i$ actually
receives the pass. Unlike for the RSP model, we do know the true receiver for
every pass (successful or not) in the dataset. Thus, we train the model
with the entire training dataset $D$ by minimizing the cross-entropy loss
$$L^R = -\frac{1}{|D|} \sum_{s \in D} \sum_{i \in T^+ \cup T^-} y^R_{s,i} \log \hat{y}^R_{s,i}$$
where $y^R_{s,i} = 1$ if $i$ receives the pass, and $y^R_{s,i} = 0$ otherwise.
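Because the labels are one-hot, each inner sum of the cross-entropy reduces to the log-probability assigned to the true receiver. The sketch below illustrates this and the difference between the two training sets; the dictionary layout and the `success` flag are hypothetical, and no actual model is trained.

```python
import math


def cross_entropy(predictions, receivers):
    """Mean negative log-probability assigned to the true receiver.

    `predictions` is a list of {player: probability} dicts, one per pass;
    `receivers` lists the true receiver of each pass.
    """
    total = 0.0
    for probs, receiver in zip(predictions, receivers):
        total -= math.log(probs[receiver])
    return total / len(predictions)


# Hypothetical passes: the RSP model would train on successful ones only,
# the RPP model on all of them.
passes = [
    {"receiver": "t1", "success": True},
    {"receiver": "o1", "success": False},
]
rsp_training = [p for p in passes if p["success"]]
rpp_training = passes

rpp_outputs = [
    {"t1": 0.6, "t2": 0.1, "o1": 0.3},
    {"t1": 0.5, "t2": 0.1, "o1": 0.4},
]
loss = cross_entropy(rpp_outputs, [p["receiver"] for p in rpp_training])
print(len(rsp_training), len(rpp_training), loss)
```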
While the two models are trained on different datasets, they can be applied
to any passing situation regardless of its outcome. They produce $\hat{y}^E_{s,i}$ for each
teammate $i$ and $\hat{y}^R_{s,j}$ for each player $j$ (either a teammate or an opponent) for
the situation. Then, as mentioned in Sect. 3, we can compute the OPSP from
Eq. 2 and the CPSP per teammate from Eq. 5, i.e.,
$$P(O = o^+ \mid S = s) = \sum_{i \in T^+} P(R = i \mid S = s) = \sum_{i \in T^+} \hat{y}^R_{s,i} \tag{6}$$
$$P(R = i \mid S = s, E = i) = \frac{P(R = i \mid S = s)}{P(E = i \mid S = s)} = \frac{\hat{y}^R_{s,i}}{\hat{y}^E_{s,i}}. \tag{7}$$
For example, the OPSP of Fig. 2 can be obtained by summing the widths of red
arrows in Fig. 2b. Also, the CPSP per the passer’s teammate is calculated by
dividing the width of the corresponding arrow in Fig. 2b by that of its counterpart
in Fig. 2a. The results are illustrated as the arrows’ colors in Fig. 3.
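As a small worked example of Eqs. 6 and 7, the sketch below combines hypothetical RSP and RPP outputs for one passing situation. The player labels, the probability values, and the `eps` guard against division by zero are assumptions made for illustration only.

```python
def opsp(rpp, teammates):
    """Overall pass success probability: total RPP mass on teammates (Eq. 6)."""
    return sum(rpp[i] for i in teammates)


def cpsp(rsp, rpp, teammates, eps=1e-9):
    """Conditional pass success probability per teammate (Eq. 7)."""
    return {i: rpp[i] / max(rsp[i], eps) for i in teammates}


# Hypothetical outputs for one situation: two teammates, one opponent.
teammates = ["t1", "t2"]
rsp = {"t1": 0.5, "t2": 0.5}              # P(E = i | S = s)
rpp = {"t1": 0.4, "t2": 0.25, "o1": 0.35}  # P(R = i | S = s)

print(opsp(rpp, teammates))
print(cpsp(rsp, rpp, teammates))
```

Since the two models are trained separately, the ratio in Eq. 7 is not guaranteed to stay below 1 in practice, which is worth checking before interpreting it as a probability.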
5 Experiments
5.1 Dataset
The dataset consists of high-resolution spatiotemporal tracking and event data
covering all 330 games of the 2020–21 season of the Belgian Pro League, collected
by Stats Perform. The tracking data include the (x, y) coordinates of all 22
players and the ball on the pitch at 25 observations per second. The event
data include on-ball action types such as passes, shots, and dribbles, annotated
with additional features such as the period ID, the ball carrier’s ID, and the start
and end locations of the ball. We then merged the tracking data with the event data.
Each record of our merged dataset includes the coordinates of all players and the
ball with their corresponding features for each snapshot, i.e., every 0.04 s.
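The text does not describe how the merge was done. One simple possibility, shown here purely as an assumption, is to snap each event to the tracking frame nearest to its timestamp, using the stated rate of 25 frames per second. The field names `time_s` and `positions` are hypothetical.

```python
FRAME_RATE = 25  # observations per second, i.e. one frame every 0.04 s


def nearest_frame(event_time_s, n_frames):
    """Index of the tracking frame closest to an event timestamp (seconds)."""
    index = round(event_time_s * FRAME_RATE)
    return max(0, min(n_frames - 1, index))  # clamp to the available frames


def merge_events_with_tracking(events, frames):
    """Attach the nearest tracking frame to every event record."""
    merged = []
    for event in events:
        frame = frames[nearest_frame(event["time_s"], len(frames))]
        merged.append({**event, "positions": frame})
    return merged


# Hypothetical data: 100 frames (4 s of play) and one pass at t = 1.0 s.
frames = [{"frame": k} for k in range(100)]
events = [{"time_s": 1.0, "type": "pass"}]
print(merge_events_with_tracking(events, frames))
```

A real pipeline would also need to account for the period ID and for clock offsets between the two data sources, which this sketch ignores.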
6 Conclusion
In this study, we propose the application of a Temporal Graph Network (TGN)
to pass receiver and outcome prediction in soccer. By leveraging the TGN’s
predictive capabilities, our framework can analyze a given passing situation by
breaking it into separate components. Specifically, it simultaneously quantifies
passers’ tendencies in choosing among passing options, the difficulty of each
option, and the overall difficulty of a given passing situation. A direction for
future work is to assess the offensive and defensive performance of players
using our metrics.
References
1. Anzer, G., Bauer, P.: Expected passes: determining the difficulty of a pass in foot-
ball (soccer) using spatio-temporal data. Data Min. Knowl. Disc. 36, 295–317
(2022)
2. Anzer, G., Bauer, P., Brefeld, U., Fassmeyer, D.: Detection of tactical patterns
using semi-supervised graph neural networks. In: 16th MIT Sloan Sports Analytics
Conference (2022)
3. Battaglia, P.W., et al.: Relational inductive biases, deep learning, and graph net-
works. CoRR abs/1806.01261 (2018). https://arxiv.org/abs/1806.01261
4. Bauer, P., Anzer, G., Shaw, L.: Putting team formations in association football
into context. J. Sports Anal. 9(6), 39–59 (2023)
5. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (2016)
6. Fernández, J., Bornn, L.: SoccerMap: a deep learning architecture for visually-
interpretable analysis in soccer. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders,
C., Van Hoecke, S. (eds.) ECML PKDD 2020. LNCS (LNAI), vol. 12461, pp. 491–
506. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-67670-4_30
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
8. Kim, H., Choi, H.J., Kim, C.J., Yoon, J., Ko, S.K.: Ball trajectory inference from
multi-agent sports contexts using set transformer and hierarchical bi-LSTM. In:
Proceedings of the 29th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (2023)
9. Lee, J., Lee, Y., Kim, J., Kosiorek, A.R., Choi, S., Teh, Y.W.: Set transformer:
a framework for attention-based permutation-invariant neural networks. In: Pro-
ceedings of the 36th International Conference on Machine Learning (2019)
10. Mehrasa, N., Zhong, Y., Tung, F., Bornn, L., Mori, G.: Deep learning of player
trajectory representations for team activity analysis. In: 12th MIT Sloan Sports
Analytics Conference (2018)
11. Power, P., Ruiz, H., Wei, X., Lucey, P.: Not all passes are created equal: objec-
tively measuring the risk and reward of passes in soccer from tracking data. In:
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (2017)
12. Rahimian, P., Oroojlooy, A., Toka, L.: Towards optimized actions in critical situa-
tions of soccer games with deep reinforcement learning. In: Proceedings of the 8th
IEEE International Conference on Data Science and Advanced Analytics (2021)
13. Rahimian, P., da Silva Guerra Gomes, D.G., Berkovics, F., Toka, L.: Let’s penetrate
the defense: a machine learning model for prediction and valuation of penetrative
passes. In: Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA
2022. CCIS, vol. 1783, pp. 41–52. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-27527-2_4
14. Rahimian, P., Van Haaren, J., Abzhanova, T., Toka, L.: Beyond action valuation:
a deep reinforcement learning framework for optimizing player decisions in soccer.
In: 16th MIT Sloan Sports Analytics Conference (2022)
15. Rahimian, P., Van Haaren, J., Toka, L.: Towards maximizing expected possession
outcome in soccer. Int. J. Sports Sci. Coach. (2023)
16. Robberechts, P., Roy, M.V., Davis, J.: un-xPass: measuring soccer player’s cre-
ativity. In: Proceedings of the 29th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (2023)
17. Rossi, E., Chamberlain, B., Frasca, F., Eynard, D., Monti, F., Bronstein,
M.M.: Temporal graph networks for deep learning on dynamic graphs. CoRR
abs/2006.10637 (2020). https://arxiv.org/abs/2006.10637
18. Spearman, W., Basye, A., Dick, G., Hotovy, R., Pop, P.: Physics-based modeling
of pass probabilities in soccer. In: 11th MIT Sloan Sports Analytics Conference
(2017)
19. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information
Processing Systems (2017)
Field Depth Matters: Comparing
the Valuation of Passes in Football
Abstract. This study delves into the influence of missing spatial context
information on the valuation of football actions through event data based
metrics. Using actions from an entire Premier League season, we analyze
successful passes originating from different field depths, considering the
subsequent occurrence of goals. By comparing the value assignments by
Valuing Actions by Estimating Probabilities (VAEP), we provide insights
into the metric’s ability to recognize the quality of passes in the early
stages of attacks.
1 Introduction
2 Methodology
This section presents an overview of the methodology utilized in the study, high-
lighting the fundamental approaches. Initially, the description of football event
data will be provided. Following that, we will introduce the VAEP framework,
which utilizes a machine learning algorithm to establish the probabilities of scor-
ing or conceding a goal in each gamestate. Next, the process of assigning a value
to each action will be explained. Finally, we will elucidate the grouping of exam-
ined passes and list the conducted analyses and tests.
2.2 VAEP
Event data allows us to represent a football match as a finite sequence of actions
$(a_1, a_2, \ldots, a_n)$, where $n \in \mathbb{N}$. The VAEP framework is based on gamestates
consisting of three consecutive actions. A gamestate $S_i$ is defined as $(a_{i-2}, a_{i-1}, a_i)$.
1 An alternative data representation is tracking data, which continuously captures
the coordinates of all players and the ball. Although it details players’ positioning,
it is more expensive, more challenging to acquire, and requires a significantly larger
volume of instances to represent full matches.
66 L. M. de Sá-Freire and P. O. S. Vaz-de-Melo
Attribute    Description
StartTime    The action’s start time
EndTime      The action’s end time
StartLoc     The (x, y) location where the action started
EndLoc       The (x, y) location where the action ended
Player       The player who performed the action
Team         The player’s team
ActionType   The type of the action
BodyPart     The player’s body part used for the action
Result       The result of the action
Then, each gamestate $S_i$ is associated with the probabilities of the team $t$ in
possession of the ball scoring ($P_{scores}(S_i, t)$) or conceding ($P_{concedes}(S_i, t)$) a goal
within the subsequent ten actions. These probabilities are estimated separately
by a learning algorithm that uses features derived from the SPADL
attributes of the three gamestate actions, as detailed in [1]. A binary label
indicates whether or not a goal occurs shortly after the action sequence. Finally,
a value is assigned to each action $a_i$, denoted as $V(a_i)$. The calculation is
performed as follows:
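A minimal sketch of the valuation step, assuming the standard VAEP definition from [1]: the value of an action is the change it causes in the scoring probability minus the change it causes in the conceding probability, both taken from the point of view of the team performing the action. The probability values below are made up, full three-action windows are used only from the third action onwards, and treating the state before the first action as having zero probabilities is an assumption.

```python
def gamestates(actions):
    """Return the windows (a_{i-2}, a_{i-1}, a_i) for every i with a full history."""
    return [tuple(actions[i - 2:i + 1]) for i in range(2, len(actions))]


def vaep_values(p_scores, p_concedes, teams):
    """Value each action as (change in P_scores) - (change in P_concedes).

    p_scores[i] and p_concedes[i] refer to the team performing action i.
    When possession changed, the previous probabilities are swapped so that
    both states are seen from the acting team's point of view.
    """
    values = []
    for i, team in enumerate(teams):
        if i == 0:
            prev_scores = prev_concedes = 0.0  # assumption: no prior state
        elif teams[i - 1] == team:
            prev_scores, prev_concedes = p_scores[i - 1], p_concedes[i - 1]
        else:
            prev_scores, prev_concedes = p_concedes[i - 1], p_scores[i - 1]
        offensive = p_scores[i] - prev_scores
        defensive = p_concedes[i] - prev_concedes
        values.append(offensive - defensive)
    return values


teams = ["A", "A", "B"]
p_scores = [0.02, 0.05, 0.01]
p_concedes = [0.01, 0.01, 0.04]
print(gamestates(["a1", "a2", "a3", "a4"]))
print(vaep_values(p_scores, p_concedes, teams))
```

The `offensive` term corresponds to the offensive value that the analysis in this study compares across pass groups.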
3 Results
With only event data attributes, it is not easy to accurately determine which
passes in the first third of the field actually significantly increased the real prob-
ability of a subsequent goal. Still, it is reasonable to assume that there is a higher
concentration of such passes among those that actually resulted in a goal.
This justifies why Group 1-G, which contains initial passes that resulted in
a goal in the near future, will be the focus of our analysis. It is not possible to
affirm that all the passes in this group actually made a significant contribution
to the subsequent goal, but certainly, some of them were decisive.
To understand if the VAEP valuation based on event data can identify them,
we will divide this section into three subsections. The first subsection will com-
pare the offensive valuation differences of passes within the same third of the
field. In the subsequent subsection, we will compare groups 1-G and 3-NG. Lastly,
the final subsection will showcase specific pass examples and highlight notable
cases.
Fig. 1. Comparison of Cumulative Distribution Functions (CDFs) for groups 1-G and
1-NG (left), 3-G and 3-NG (right). The larger discrepancy between 3-G and 3-NG
CDFs compared to 1-G and 1-NG CDFs indicates that the metric better differentiates
passes that result in goals in the final third than in the first third.
These results show that there is a significant difference between the two
comparisons. The performed test and the visualizations indicate that, although
both comparisons point out differences between the groups, the discrepancy of
values between 3-G and 3-NG is significantly larger than between 1-G and 1-NG.
This indicates that the metric can better differentiate passes that result in goals
in the final third than in the first third.
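The statistical test used is not named in this passage. As one possible way to quantify the gap between two CDFs, the sketch below computes empirical CDFs and their maximum vertical distance (the two-sample Kolmogorov–Smirnov statistic). The choice of this statistic and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def max_cdf_gap(sample_a, sample_b):
    """Largest vertical distance between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # Both CDFs are step functions, so checking the observed values suffices.
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Made-up offensive values for two pass groups.
group_g = [0.010, 0.020, 0.030, 0.050]
group_ng = [0.005, 0.010, 0.012, 0.015]
print(max_cdf_gap(group_g, group_ng))
```

Comparing this distance for the pairs (3-G, 3-NG) and (1-G, 1-NG) gives a single number per comparison that mirrors the visual discrepancy in Fig. 1.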
This result can be interpreted in different ways. One possible interpretation is
that the factors that led to the subsequent goals in group 1-G occurred after the
passes themselves. Thus, the variation in the label between groups 1-G and 1-NG
is more associated with chance than with the differences between the passes in
each group, which results in a similar valuation by VAEP in both groups. On the
other hand, it is in the final third where the goal condition is really generated.
Therefore, the metric appropriately differentiates good passes in the final third,
which are abundant in group 3-G and receive better overall ratings than group
3-NG.
However, this interpretation does not align with what we can observe in con-
temporary football. In recent years, defenders have been increasingly involved
in the offensive phase of the game, starting from their own field, breaking lines,
finding long balls, and structuring counter-attacks. This is a cornerstone of vari-
ous prominent tactical approaches today. Therefore, we believe in an alternative
interpretation of the results.
The advantage generated by good passes in the early stages of the field is
predominantly positional. Whether it is a pressure release, a first-line break, or
a good long pass, the gain for the executing team lies in the relation between
players’ positions and the ball rather than in the attributes stored in event data.
Conversely, in the final third of the field, the features involved in the model
can more easily capture dangerous passes. Although positional relations are still
relevant, there is also substantial importance on elements such as proximity to
the goal and the speed of the sequence of plays. Because of this, VAEP can make
a better distinction in the final third.
This second perspective on the results seems more plausible, but analyzing
just one season is not sufficient to claim that the metric indeed underestimates
good passes in the first third of the field. There is a significant difference in the
number of passes in each group, which makes it challenging to employ certain
approaches in this comparison. Nonetheless, even though they are not conclusive,
the presented results serve as evidence to believe that underestimation does
occur.
Fig. 2. Comparing CDFs of groups 1-G and 3-NG. Group 3-NG shows more frequent
high values than group 1-G. This observation is intriguing since all passes in group 1-G
resulted in goals, while none of the passes in group 3-NG did.
in the final third of the field, the probabilities of scoring are higher and, therefore,
can be decreased by some passes. However, when we look at the other end of the
graph, we can see that the frequency of higher values is also greater in group
3-NG. This observation is intriguing since all passes in group 1-G resulted in
goals, while none of the passes in group 3-NG did.
Once again, this analysis cannot be taken as conclusive. In addition to the
inherent contextual difference between these two groups, there is a significant
disparity in the number of passes in each. Even so, it is a surprising fact that
corroborates the proposed investigation.
To analyze some examples more closely, we selected the top 10 passes from
group 1-G with the highest offensive values assigned by VAEP. These passes are
represented in Fig. 3, where the bottom part of each plot represents the defending
goal, and the top part represents the attacking goal.
The most prominent characteristic in this set is the depth of the passes.
Except for the first one, all of them are long balls that cover a significant portion
of the field. This observation aligns with expectations because ball advance is
a feature that event data can represent and is intuitively correlated with the
occurrence of a subsequent goal. At first glance, it is plausible that these passes
were influential in creating the resulting goal.
Still, when we look at group 3-NG, we observe that 434 passes are better
valuated than the one occupying the first position in group 1-G. In Fig. 4, we
can see a random sample of 10 of these 434 passes.
Fig. 3. Top 10 valued passes in group 1-G. There is a prominent characteristic of depth,
with all but the first pass being long balls covering a substantial portion of the field.
Fig. 4. Random sample of the top 434 valued passes in group 3-NG. In general the
passes are short and there is a slight trend of progressiveness.
We can observe a notable trend in these passes: they are generally progressive,
aiming for areas of the field closer to the opponent’s goal, despite being shorter
than the selected passes from group 1-G. While none resulted in a goal, they
give the initial impression of increasing the likelihood of scoring. However, it is
4.1 Conclusion
Event data has considerable advantages, such as more accessible collection,
greater availability, and lower memory demand. Despite its inherent limitations,
efficient models based on this structure have already been shown to be feasible
and are of great scientific and practical importance. This reinforces the need to
explore characteristics, identify flaws, and try to improve upon them.
Throughout the scope of this work, we conducted an analysis aiming to under-
stand if VAEP, with its original features, can distinguish good passes in the initial
third of the field. Although the scope and depth of the study do not allow for a
categorical conclusion, some relevant evidence has emerged.
Initially, it was possible to observe a significant discrepancy in the offensive
valuation of passes in the final third of the field when comparing actions with
positive and negative labels. The same was not observed in the initial third,
where the value distributions are more similar. Additionally, we observed that,
in general, there are more well-valuated passes among the negative-labeled passes
in the final third (group 3-NG) than among the positive-labeled passes in the
initial third (group 1-G). Finally, we presented some passes from these two groups
on the field and found that the best-valuated pass by the metric in group 1-G
has a lower score than 434 passes from group 3-NG.
These three stages of analysis support the hypothesis that the event-based
VAEP is limited in distinguishing good passes in the initial third of the field. As
mentioned before, this limitation may be due to the spatial context limitation
of the data format, which has a greater impact on initial passes than on final
passes. Thus, this study encourages further investigation in this direction.
tracking data. By re-valuating the actions using this second form of the metric,
it would be possible to identify which passes had a significant change in value.
This approach could guide a search for patterns within the original features to
enhance and balance the functioning of the metric using event data.
References
1. Decroos, T., Bransen, L., Van Haaren, J., Davis, J.: Actions speak louder than
goals: valuing player actions in soccer. In: Proceedings of the 25th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining, pp. 1851–1861.
Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3292500.3330758
2. Van Roy, M., Robberechts, P., Decroos, T., Davis, J.: Valuing on-the-ball actions
in soccer: a critical comparison of XT and VAEP. In: Proceedings of the AAAI-20
Workshop on Artificial Intelligence in Team Sports. AI in Team Sports Organizing
Committee (2020)
3. Robberechts, P., Van Roy, M., Davis, J.: un-xPass: measuring soccer player’s
creativity. In: StatsBomb Conference (2022). https://statsbomb.com/wp-content/uploads/2022/09/Pieter
4. Singh, K.: Introducing Expected Threat (xT) (2023). https://karun.in/blog/expected-threat.html
5. Crow, B.: How the creation of different superiorities generates opportunities to
progress the ball (2023). https://soccerdetail.com/2020/10/26/how-the-creation-of-different-superiorities-generates-opportunities-to-progress-the-ball/
6. Hodson, T.: Positional play: football tactics explained (2023). https://www.coachesvoice.com/cv/positional-play-football-tactics-explained-guardiola-cruyff-manchester-city/
7. Fernández, J., Bornn, L., Cervone, D.: Decomposing the immeasurable sport: A
deep learning expected possession value framework for soccer. In: 13th MIT Sloan
Sports Analytics Conference (2019)
8. Decroos, T., Bransen, L., Van Haaren, J., Davis, J.: VAEP: an objective approach
to valuing on-the-ball actions in soccer. In: IJCAI, pp. 4696–4700 (2020)
9. Davis, J., et al.: Evaluating sports analytics models: challenges, approaches, and
lessons learned. In: AI Evaluation Beyond Metrics Workshop at IJCAI 2022, vol.
3169, pp. 1–11. CEUR Workshop Proceedings (2022)
10. Pappalardo, L., Cintia, P., Ferragina, P., Massucco, E., Pedreschi, D., Giannotti,
F.: PlayeRank: data-driven performance evaluation and player ranking in soccer
via a machine learning approach. ACM Trans. Intell. Syst. Technol. (TIST) 10(5),
1–27 (2019). https://doi.org/10.1007/1234567890
11. Fernández, J., Bornn, L., Cervone, D.: A framework for the fine-grained evaluation
of the instantaneous expected value of soccer possessions. Mach. Learn. 110(6),
1389–1427 (2021). https://doi.org/10.1007/1234567890
12. Nascimento, R.F.M., Rios-Neto, H.: Generalized action-based ball recovery model
using 360 data. In: StatsBomb Conference (2022). https://statsbomb.com/wp-content/uploads/2022/09/Ricardo. Accessed 6 June 2023
13. Decroos, T., Davis, J.: Interpretable prediction of goals in soccer. In: Proceedings
of the AAAI-20 Workshop on Artificial Intelligence in Team Sports (2019)
Basketball
Momentum Matters: Investigating
High-Pressure Situations in the NBA
Through Scoring Probability
1 Introduction
In the world of sports, there is an adjective for athletes who perform well in big
moments: clutch. This expression is very well known in the NBA. NBA.com even
has a tab, where they list traditional, advanced, and other statistics for players
in the clutch. By their definition, the clutch is the final five minutes of a game,
where the point differential is five or less.
Although earlier studies on the subject show that points scored near the end
of a game, when the score is close, have a significant impact on the final outcome
([6]), games can be decided earlier than that. Giving up a big run can break a
team even if it happens in the third quarter or early on in the fourth. The authors
of [3] were the first ones to introduce pre-game elements to their definition of
high-pressure scenarios. They used features describing game importance, team
ambition, recent form, and game context to train a model on rankings acquired
from a set of football experts.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 77–90, 2024.
https://doi.org/10.1007/978-3-031-53833-9_7
78 B. Mihalyi et al.
2 Pre-game Pressure
When constructing our pre-game pressure metric, we use a feature set similar to
the one in [3]; however, our final pressure score is game importance itself.
To determine team ambition, we assigned all 30 NBA teams in the 6 seasons
under consideration to four clusters. During the clustering we used the follow-
ing features: average experience and age of the roster, average salary of the 5
highest earning players on the team, average experience of the 5 highest earning
players, number of all-stars on the team roster the year before, expected wins
and championship odds before the season began, number of wins the team had
1, 2, 3 and 4 seasons ago, and playoff appearances and championships by the
franchise. All these data were gathered from basketball-reference.com [1]. For
the pre-season expected wins and championship odds, we used the Wayback
Machine [2] to obtain sufficient data.
The results of the clustering can be seen in Table 1. There are two superior
clusters, 0 and 2. Both of these have more experience on their roster than the
other two, as well as more all-stars. The difference between the two clusters is
their success in recent seasons. Members of cluster 0 have been winning teams
for years, while teams in cluster 2 were losing teams a few years ago but
are winning now. Members of cluster 3 are the opposite of teams in cluster
2. They have been getting worse and worse over the last few seasons, but their
highest-paid players are quite experienced. These teams are likely just entering
a rebuild, but have not yet gotten rid of their aging veterans. Teams in cluster
1 are in complete demise. They are young and bad, and seemingly not going
anywhere. For some examples of teams in these clusters, see Fig. 7 in the
Appendix.
– SHOTMAKING L2M, the shot-making streak of the shooter in the last 2 min.
The value is +m if the shooter made the last m shots they attempted and
−m if they missed their last m shots;
– PERF DELTA, the difference in the shooting percentages of the shooter in
previous games and the current game;
– SCOREMARGIN L5M, the score margin in the last 5 min before the shot
was taken from the shooter’s team’s point of view; and
– pregame pressure cluster, the pre-game pressure the shooter’s team was under
entering the game.
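The SHOTMAKING L2M definition above can be sketched as a signed streak over the shooter's recent attempts. The function name, the use of elapsed game seconds as the time unit, and the value 0 for a shooter with no attempts inside the window are assumptions, not details given in the text.

```python
def shotmaking_streak(prior_shots, shot_time_s, window_s=120):
    """Signed streak of the shooter's most recent attempts.

    `prior_shots` is a chronological list of (time_s, made) pairs for the
    same shooter, with times in elapsed game seconds. Returns +m after m
    consecutive makes, -m after m consecutive misses, and 0 if the shooter
    attempted nothing within the last `window_s` seconds.
    """
    recent = [
        made for time_s, made in prior_shots
        if 0 < shot_time_s - time_s <= window_s
    ]
    if not recent:
        return 0
    last_outcome = recent[-1]
    streak = 0
    for made in reversed(recent):
        if made != last_outcome:
            break
        streak += 1
    return streak if last_outcome else -streak


history = [(10, True), (50, False), (100, True), (110, True)]
print(shotmaking_streak(history, shot_time_s=120))
```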
The other features were PERIOD, the quarter in which the shot was taken;
SECLEFT, the number of seconds left in the quarter when the shot was taken;
LAST SHOT, 1 if the shooter made their previous shot of the same type, 0
otherwise; SCOREMARGIN, the score difference between the two teams from
the shooter’s team’s perspective; and SHOTTYPE, 1 if the shot taken was a
2-pointer, 2 if it was a 3-pointer, and 3 if it was a free throw.
In our earlier attempts, we used features created from tracking data acquired
from [11]. These features were shot-clock (the number of seconds remaining on
the shot-clock), closest defender (the distance between the shooter and the closest
defender), and average defender distance (the average distance between
the shooter and the defenders). Although these features added extra context to
each individual shot, our previous analysis showed them to be irrelevant to
shooting success.
the opposing team so far in the quarter”. In other words, NBA players shot the
basketball much better when they gained momentum over their opponent in the
first few minutes of a quarter. Players shot both 2- and 3-pointers much worse
when these conditions were not true. Players made their 2-pointers 49.2% of the
time, but when under high pressure, they only made 31.5%. For 3-pointers, these
numbers are 35.6% and 15.2%, respectively. A potential way to stabilize things
when a team allows their opponent to go on a run early into a quarter could be
to get to the free throw line. Although free throw shooting also gets worse when
a team did not win the last 5 min, the 78.3% to 71.9% drop is more favorable
compared to the other two shot types.
It would make sense that momentum in the first few minutes of the first
quarter stems from pre-game pressure. To test this hypothesis we built a deci-
sion tree without the feature SCOREMARGIN L5M, exclusively on shots in the
first quarter. Figure 6 shows that in the first 11 min of first quarters, players
shoot 2-pointers better (52% vs. 49.5%) when the pre-game pressure is low (see
Appendix). Although there are far fewer shots in such situations (32,482 vs.
112,453), the tree is still robust in these leaves.
Fig. 2. Players’ 2-pointer (x-axis), 3-pointer (y-axis), and free throw (color) shooting
performance between the 2012–13 and 2017–18 NBA seasons (Color figure online)
where $S_x(T)$ denotes the set of type $T$ shots player $x$ took, and $s_x$ is one of these
shots. SHOT MADE FLAG$_{s_x}$ is the value of the SHOT MADE FLAG variable
for shot $s_x$, and $\pi_{s_x}$ is the scoring probability of $s_x$ according to our decision
tree.
The results can be seen in Fig. 2. This figure, as well as Figs. 3, and 4, are
simplified since some of the players seen on them changed teams during the
investigated time period. Stephen Curry’s 3-point dominance is very apparent,
but the whole upper right section of the figure is interesting. Apart from Stephen
Curry, we see Kevin Durant, LeBron James, and James Harden (barely) there
as well. These players won the Most Valuable Player award 5 times out of the
possible 6 during the investigated time period. The only other MVP not there in
the best quarter of the plot is Russell Westbrook, who was better than average
only at free throw shooting while under pressure.
Fig. 3. 2-point (x-axis), 3-point (y-axis), and free throw (colors) shooting trends
between the 2012–13 and 2017–18 NBA seasons (Color figure online)
$$\left[ \frac{\cdots}{\sum_{s_x \in S_x} s_x \Big/ \sum_{s_t \in S_t} s_t} \right] \cdot 100 \tag{4}$$
Essentially, we take the ratio between the fraction of team shots player $x$ has
taken under pressure and the same fraction overall. $S_{x,p}$ denotes the set of shots
Fig. 4. Assist (x-axis), turnover (y-axis), and assist to turnover ratio (colors) trends
between the 2012–13 and 2017–18 NBA seasons (Color figure online)
player $x$ has taken under pressure, and $S_{t,p}$ is the same for his team.
$S_x$ and $S_t$ are the sets of shots player $x$ and his team took,
respectively. This is a slightly modified version of the formula presented in [12].
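As a rough illustration of the ratio described above, the following Python sketch computes it for one player from a list of shot records. The record fields (`player`, `team`, `pressure`) and the helper name `pressure_shot_trend` are hypothetical, and counting shots is assumed to stand in for the sums in the formula; this is a sketch under those assumptions, not the authors' implementation.

```python
from typing import Iterable, Mapping


def pressure_shot_trend(shots: Iterable[Mapping], player: str, team: str) -> float:
    """Share of team shots taken by `player` under pressure, divided by the
    same share over all shots, scaled by 100 (hypothetical helper)."""
    team_all = [s for s in shots if s["team"] == team]
    team_pressure = [s for s in team_all if s["pressure"]]
    player_all = [s for s in team_all if s["player"] == player]
    player_pressure = [s for s in player_all if s["pressure"]]
    if not team_pressure or not player_all:
        raise ValueError("not enough shots to compute the ratio")
    share_pressure = len(player_pressure) / len(team_pressure)
    share_overall = len(player_all) / len(team_all)
    return share_pressure / share_overall * 100


# Toy data (made up): player "A" takes 2 of 4 team shots under pressure
# and 3 of 8 team shots overall.
toy_shots = (
    [{"player": "A", "team": "T", "pressure": True}] * 2
    + [{"player": "B", "team": "T", "pressure": True}] * 2
    + [{"player": "A", "team": "T", "pressure": False}] * 1
    + [{"player": "B", "team": "T", "pressure": False}] * 3
)
trend = pressure_shot_trend(toy_shots, "A", "T")
```

With this convention, a value above 100 means the player takes a larger share of the team's shots under pressure than overall, and a value below 100 means a smaller share.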
Players’ shooting trends under pressure can be seen in Fig. 3. Almost every
player seems to take more 2-pointers and get to the free-throw line less under
pressure. The latter trend usually means that a player is less aggressive than
usual; however, in this case, it could also be explained by the fact that our high-
pressure criterion can only be true in the first few minutes of a quarter, where
there are usually fewer free throw attempts than in the later stages.
Apart from shooting, a player can also impact a basketball game with his
passing ability, by setting up his teammates to score. On the other hand, he
can also hurt his team by turning the ball over. To see how the players who
met the 1500-shot criterion did in this aspect of the game under pressure, we
Appendix
Fig. 6. The decision tree built on SHOT MADE FLAG, without the feature
SCOREMARGIN L5M, solely on shots in the first quarter
References
1. Basketball Statistics & History of every Team & NBA and WNBA players.
https://www.basketball-reference.com/
2. Internet archive: Wayback machine. https://archive.org/web/
3. Bransen, L., Robberechts, P., Van Haaren, J., Davis, J.: Choke or shine?
Quantifying soccer players’ abilities to perform under mental pressure. In: MIT
Sloan Sports Analytics Conference (2019)
4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
5. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and
Regression Trees. Routledge, Abingdon-on-Thames (2017)
6. Goldman, M., Rao, J.M.: Effort vs. concentration: the asymmetric impact of
pressure on NBA performance. In: MIT Sloan Sports Analytics Conference (2012)
7. Iso-Ahola, S.E., Dotson, C.O.: Psychological momentum: why success breeds
success. Rev. Gen. Psychol. 18(1), 19–33 (2014)
8. Patel, S.: nba_api, September 2018. https://github.com/swar/nba_api
9. Rossotti, P.: NBA Enhanced Box Score and Standings (2012–2018), March 2018.
https://www.kaggle.com/datasets/pablote/nba-enhanced-stats
10. schadam26: Br scrape, January 2021. https://github.com/schadam26/BR_Scrape/
11. Seward, N.: Sportvu movement tracking data, February 2018.
https://github.com/sealneaward/nba-movement-data
12. Zuccolotto, P., Manisera, M., Sandri, M.: Big data analytics for modeling
scoring probability in basketball: the effect of shooting under high-pressure
conditions. Int. J. Sports Sci. Coach. 13(4), 569–589 (2018)
Are Sports Awards About Sports? Using
AI to Find the Answer
1 Introduction
The general belief is that superior performance dictates the outcome. For
instance, an MVP typically achieves the highest averages in points, rebounds,
and assists; however, history shows many outcomes being driven by what are
apparently other factors.
Interestingly, for an event that brings debate, salary increases, trades, etc.,
awards have received limited formal research. In this work, AI is leveraged to
predict the four major NBA awards. Using a novel machine learning (ML)
approach, in which over ninety models are examined, we demonstrate a compelling
way of predicting awards that also gives insight into what might be
intangibles. In fact, the question of whether statistics are sufficient to
predict awards is directly answered.
2 Background
Fig. 1. The green boxes illustrate model training and the blue box shows how the
trained model is used. 0: Previous years are used as the training data. 1, 2:
Individual player statistics and team win statistics are used as the dataset.
3: The data is preprocessed to remove unneeded features and entries. 4: Award
vote percentage is used as the class label. 5: Regression architectures are
used to build models. 6: Player data from the current year is fed into the
model to get results. (Color figure online)
Notably, in the year 2022, the top three performers align with the top three
candidates for the MVP award, suggesting that the offensive rating metric can
serve as a fair indicator of performance. Figure 3 (top) plots points + assists
per game for each player. The MVP consistently outperforms the majority of
players, typically ranking above the 95th percentile. Figure 3 (bottom) shows
that many players remain candidates in subsequent years. The rationale behind
this analysis was to ascertain the presence of players with prior NBA
experience, which could affect their likelihood of receiving MVP votes and
being considered contenders for the award. The overlap between successive
seasons ranges from approximately 75% to 85% across all years, indicating that
a substantial number of players who could win carry over from one season to the
next. Our analysis revealed more than 20 players who received votes for the MVP
award in at least five different years but have never won. These results do not
encompass the current 2022–23 season, which explains why Joel Embiid appears in
the visualization despite winning the award this season.
3 Experimental Overview
Four experiments were done to predict the chances of a player winning an award
in a specific category (greatest percentage vote), using over 90 AI models
currently available in R [14]. The selected models vary greatly, e.g.,
tree-based regressors, support vector machine-based regressors, ordinary linear
regression models
94 A. Shankar et al.
Fig. 2. MVP winners (red) consistently outperform other players (light blue) in
offensive rating. They consistently achieve scores above the 95th percentile
(black arrow & line) and the mean performance (green arrow & line) in each year.
(Color figure online)
with regularization, random forest, and deep learning models. To compete as a
best predictor, a model had to (1) complete computation within 15 min and (2)
run without error. After applying both conditions, 30 models remained for MVP,
MIP, and DPY (the same for all three). For RoY the number of models was 76.
Model training is done on each data set individually, and we report the top
models and their summary statistics. Performance is evaluated via two metrics:
Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
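For readers less familiar with the two metrics, the following minimal Python sketch shows how RMSE and MAE are computed from paired true and predicted values. The function names and the toy vote shares are illustrative assumptions; the study itself used models available in R.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: mean of the absolute residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy vote shares (made up): most players receive no votes at all.
actual = [0.0, 0.0, 0.5, 1.0]
predicted = [0.0, 0.1, 0.4, 0.8]
rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
```

RMSE penalizes large individual errors more heavily than MAE, so RMSE is never smaller than MAE on the same data.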
Irrelevant features were removed, and all values were converted to the same
scale using maximum absolute scaling. The data is split in the ratio 85:15 into
training and test sets, and the training data is further split into validation
sets as described below. The target column is the number of votes, and the
remaining columns are used as the predictor variables.
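The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming each feature column is scaled by its own maximum absolute value and that the 85:15 split is a random shuffle; the helper names, the seed, and the toy values are hypothetical, and the original study may have handled these details differently.

```python
import random
from typing import List, Sequence, Tuple


def max_abs_scale(column: Sequence[float]) -> List[float]:
    """Divide every value by the largest absolute value in the column,
    mapping the column into the range [-1, 1]."""
    peak = max(abs(v) for v in column)
    if peak == 0:
        return [0.0 for _ in column]
    return [v / peak for v in column]


def train_test_split(rows: Sequence, train_fraction: float = 0.85,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy points-per-game column (made up) and 100 placeholder row indices.
points = [30.1, 12.4, 25.0, 8.3]
scaled_points = max_abs_scale(points)
train_rows, test_rows = train_test_split(range(100))
```

Maximum absolute scaling preserves zeros and the sign of each value, which is convenient when many entries are zero.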
Fig. 3. (top) MVP winners consistently outperform other players in points +
assists, achieving scores above both the 95th percentile (black arrow & line)
and the mean performance (green arrow & line) in each year. (bottom) Overlaps of
75% to 85% of players in successive years in the NBA indicate that many of the
same players remain candidates. (Color figure online)
Results (Fig. 4) show that across validation and test sets, top models include
cubist, pcaNNet, qrnn, xgbTree, M5, knn, kknn and avNNet, indicating that
averaged neural network, tree-based, and neighbor-based approaches perform
better than their counterparts. The RMSE of these models on the validation data
is less than 0.02, and the MAE is less than 0.002. Although the results on test
data seem promising, they are in part biased by the sparsity in the target
column. Therefore, as noted, for MVP data, performance on validation data is a
reliable performance indicator.
Interestingly, the top models can be broadly classified into models that use
ensemble or committee-based predictions versus stand-alone models. Moreover,
the models that use an ensemble approach (tree-based regression models and
their variants) generally demonstrate the best performance across the
evaluation metrics. We think this performance landscape is explained by
accurate feature selection, either 1) in the form of rules identified by
tree-based models, which associate different weights with specific features for
regression and use a committee of models to smooth final predictions (as
opposed to conventional linear regression), or 2) by using PCA to identify the
features with high variance, removing noisy or outlier features and thus
resulting in better training and predictions.
Out of the 7 models that have the lowest average error (RMSE < 0.02), 3 models
(cubist, xgbTree and M5) are tree-based, avNNet is a bagging model that
averages the predictions from various single-hidden-layer neural networks, and
knn is the nearest neighbor method that predicts based on the closest
neighbors. Although all the top 7 models have RMSE < 0.02, cubist attained the
best performance by using a relatively sophisticated approach, i.e., correcting
the predictions made by a committee of tree-based regressors with a weighted
nearest neighbor approach [15]. xgbTree, which implements gradient tree
boosting [7], and M5, which uses linear regressors in the terminal leaves of
the tree (without nearest neighbor smoothing), have slightly higher RMSE than
cubist. An illustrative example of a rule extracted by M5 (a tree-based model)
is shown below:
IF
PTS <= 0.492
AST <= 0.369
THEN
Fig. 4. Comparative performance of top models while predicting the Most Valuable
Player (MVP) of the year.
This was used to evaluate the predictions by comparing the top 10 predicted
players and the top 10 actual players. The similarity algorithm was run on
each dataset for every year from 2000 to 2023 to assess performance over
multiple years. The average Jaccard similarity score was then calculated. The
MVP dataset achieved a score of 0.74 ± 0.12, the highest of the four awards.
The RoY dataset had a score of 0.62 ± 0.13, DPY had a score of 0.47 ± 0.15,
and MIP had a score of 0.5 ± 0.14. These scores suggest that, while the MVP is
mostly data-driven, the other awards apparently include other criteria (perhaps
including human bias). Making this analysis available might provide voters with
incentives to become more transparent.
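The top-k comparison described above can be sketched as follows. This is a minimal Python illustration of Jaccard similarity (size of the intersection divided by size of the union) applied to two ranked lists; the helper names, the player labels, and the vote shares are hypothetical.

```python
from typing import Iterable, List, Mapping


def top_k(scores: Mapping[str, float], k: int = 10) -> List[str]:
    """Return the k player names with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


# Toy vote shares (made up), compared on the top 2 for brevity.
predicted = {"P1": 0.9, "P2": 0.7, "P3": 0.4, "P4": 0.1}
actual = {"P1": 0.8, "P2": 0.2, "P3": 0.1, "P4": 0.6}
score = jaccard_similarity(top_k(predicted, 2), top_k(actual, 2))
```

For two lists of exactly 10 players that share m names, the score is m / (20 - m): 8 shared players give 8/12 ≈ 0.67 and 9 shared players give 9/11 ≈ 0.82.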
Table 1. Comparison of actual and predicted award winners for the 2020 season.
The average rank assessment and Jaccard similarity score, respectively, are
shown (n = 24). Observe that MVP is clearly the most data-driven.
to the Jaccard calculations, this score function was run on all years from 2000 to
2022 and the average was calculated. The MVP model achieved a score of 0.40
± 0.18, MIP a score of 0.54 ± 0.09, DPY a score of 0.51 ± 0.15, and
RoY a score of 0.35 ± 0.18.
For the defensive player and most improved player awards, the model does not
perform quite as well. For these two awards, it was found that the qualities
which make a player a good choice do not necessarily show up in the stats. Many
variables, such as guarding match-ups, positioning, and general work ethic, are
critical parts of what makes a good defensive or improving player, but which
variables voters use is not evident. One might ask whether the models failed
because the data was insufficient; however, we accessed all available data that
is ostensibly used to determine the award.
In Table 2, the top three predictions made by the model for this year’s awards
are presented. The model accurately forecasted the winners of the MIP and RoY
awards. While the model anticipated Claxton to receive the DPY award, it was
ultimately awarded to Jackson Jr., whom the model had predicted as the
runner-up. Additionally, for the MVP award, the model correctly placed Jokic
and Antetokounmpo in the top three predictions. However, the award was
ultimately bestowed upon Joel Embiid, whom the model predicted to finish fourth
in the voting.
Judging from the results of the awards, the models perform fairly well. They
predicted two awards correctly. For the two other awards, the mistakes are
understandable considering the tight competition for those awards this year.
From a data science perspective, this means other factors, likely intangible,
are playing roles.
References
1. Albert, A., de Mingo López, L., Allbright, K., Gómez Blas, N.: A hybrid
machine learning model for predicting USA NBA all-stars. Electronics 11(1), 97
(2022). https://doi.org/10.3390/electronics11010097
2. Aoki, R.Y., Assuncao, R.M., Vaz de Melo, P.O.: Luck is hard to beat: the
difficulty of sports prediction. In: Proceedings of the 23rd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining (KDD ’17), pp.
1367–1376. Association for Computing Machinery, New York, NY, USA (2017).
https://doi.org/10.1145/3097983.3098045
3. Basketball-Reference.com: Awards - 2023 (2023).
https://www.basketball-reference.com/awards/awards2023.html
4. Chapman, A.: The Application of Machine Learning to Predict the NBA Regular
Season MVP. PhD thesis, Utica University (2023)
5. Chen, M.: Predict NBA regular season MVP winner. In: International Conference
on Industrial Engineering and Operations Management. Bogota, Colombia, October
2017
6. Chen, M., Chen, C.: Data mining computing of predicting NBA 2019–2020 regular
season MVP winner. In: 2020 International Conference on Advances in Computing
and Communication Engineering (ICACCE), pp. 1–5. Las Vegas, NV, USA (2020).
https://doi.org/10.1109/ICACCE49060.2020.9155038
7. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 785–794 (2016)
8. Coleman, B.J., DuMond, J.M., Lynch, A.K.: An examination of NBA MVP voting
behavior: does race matter? J. Sports Econ. 9(6), 606–627 (2008).
https://doi.org/10.1177/1527002508320653
9. Etocco, E.: NBA Player Stats (2023). https://data.world/etocco/nba-player-stats
10. Forese, J., Gelman, J., Reed, D., Lorenc, M., Shields, B.: Modern NBA
coaching: balancing team and talent. In: 2016 MIT Sloan Sports Analytics
Conference (2016)
11. Gilermo, D.R.: NBA Players Stats (2023).
https://www.kaggle.com/datasets/drgilermo/nba-players-stats?resource=download&select=player+data.csv
12. Gmoney: NBA Team Records by Year (2023).
https://data.world/gmoney/nba-team-records-by-year
13. Johnson, J., Khoshgoftaar, T.: Survey on deep learning with class imbalance.
J. Big Data 6(27) (2019). https://doi.org/10.1186/s40537-019-0192-5
14. Kuhn, M.: Building predictive models in R using the caret package. J. Stat.
Softw. 28(5), 1–26 (2008). https://doi.org/10.18637/jss.v028.i05,
https://www.jstatsoft.org/index.php/jss/article/view/v028i05
15. Kuhn, M., Weston, S., Keefer, C., Coulter, N.: Cubist models for regression.
R Package Vignette R Package Version 0.0 18, 480 (2012)
16. Lewis, R.J.: An introduction to classification and regression tree (CART)
analysis. In: Annual Meeting of the Society for Academic Emergency Medicine in
San Francisco, California, vol. 14. Citeseer (2000)
17. Maszczyk, A., Golás, A., Pietraszewski, P., Roczniok, R., Zajac, A.,
Stanula, A.: Application of neural and regression models in sports results
prediction. Procedia. Soc. Behav. Sci. 117, 482–487 (2014).
https://doi.org/10.1016/j.sbspro.2014.02.249
18. Maymin, A., Maymin, P., Shen, E.: NBA chemistry: positive and negative
synergies in basketball. In: 2012 MIT Sloan Sports Analytics Conference (2012)
19. McCabe, A., Trevathan, J.: Artificial intelligence in sports prediction. In:
Fifth International Conference on Information Technology: New Generations (ITNG
2008), pp. 1194–1197. Las Vegas, NV, USA (2008).
https://doi.org/10.1109/ITNG.2008.203
20. McIntyre, A., Brooks, J., Guttag, J., Wiens, J.: Recognizing and analyzing
ball screen defense in NBA. In: 2016 MIT Sloan Sports Analytics Conference
(2016)
21. Miljković, D., Gajić, L., Kovačević, A., Konjović, Z.: The use of data
mining for basketball matches outcomes prediction. In: IEEE 8th International
Symposium on Intelligent Systems and Informatics, pp. 309–312. Subotica, Serbia
(2010). https://doi.org/10.1109/SISY.2010.5647440
22. Nagarajan, R., Zhao, Y., Li, L.: Effective NBA player signing strategies
based on salary cap and statistics analysis. In: 2018 IEEE 3rd International
Conference on Big Data Analysis (ICBDA), pp. 138–143. Shanghai, China (2018).
https://doi.org/10.1109/ICBDA.2018.8367665
23. nbn23.com: Basketball Statistics (Year).
https://www.nbn23.com/improve-efficiency-basketball-statistics/
24. Oh, M., Keshri, S., Iyengar, G.: Graphical model for basketball match
simulation. In: 2015 MIT Sloan Sports Analytics Conference (2015)
25. Papageorgiou, G.: Data mining in sports: daily NBA player performance
prediction (2020). https://hdl.handle.net/11544/29991. Accessed 30 May 2023
26. Ripley, B., Venables, W., Ripley, M.B.: Package ‘nnet’. R Package Version
7(3–12), 700 (2016)
27. Song, L., Langfelder, P., Horvath, S.: Random generalized linear model: a
highly accurate and interpretable ensemble predictor. BMC Bioinform. 14(1), 1–22
(2013)
28. Wang, N., Chen, M.: Simple poker game design, simulation, and probability.
In: IEOM Bogota Proceedings, pp. 1297–1301 (2017)
The Big Three: A Practical Framework
for Designing Decision Support Systems
in Sports and an Application
for Basketball
1 Introduction
In sports, the strategic decisions made by coaches and analysts play a crucial role
in the success of a team. With the advent of modern data science technologies,
there is a growing opportunity to leverage these advancements to aid in the
design and optimization of team strategies.
Nevertheless, the adoption of new Decision Support Systems (DSS) in high-
performance sports organizations can be challenging. In fact, it is hard to incor-
porate data-based decision-making in the daily operations of the team. A con-
siderable number of decision-making processes in sports are intuitive and the
Contributions
Past work in sports has explored model explanation and interactivity with data
separately. There is a lack of work that uses both tools together in DSS. In this
paper, we present a preliminary framework that closes that gap and improves
the understanding of DSS. The framework consists of three components: model,
explanation and interactivity. It has the potential to be used for any task in any
statistical sport. As an application, we focus on team scouting in basketball to
propose a novel methodology to understand why teams win.
2 Literature Overview
Past literature in sports analytics builds upon the fact that transitioning from
observational analysis to automatic statistical analysis based on data provides
huge opportunities [10,39]. The integration of data-driven approaches in physical
conditioning has been a long-standing reality [24,27]. Nevertheless, the advent
of modern data collection techniques has enabled comprehensive game analysis.
From tactical investigations [4,13] to the exploration of mental aspects [12],
nearly all facets of the game can now be studied with data-driven methodologies.
The growing prevalence of data-driven decision-making in diverse fields,
including sports, underscores the need for research addressing the successful
implementation and optimization of ML-based DSS for non-technical
decision-makers. Particularly relevant is the work of Kayande et al. [22],
which proposes a framework to understand potential problems of DSS when used by
human decision-makers. In the context of sports, past work has highlighted the
challenges of adding these kinds of systems in professional organisations
[41,48]. There have also been some attempts to propose frameworks to facilitate
the incorporation of such systems into sports organisations [2,44]. However,
these frameworks mainly focus on organisations and not on how DSS should be
designed in order to overcome those challenges. Other papers focus on the
development of DSS for specific tasks [35,43]. Our work closes this gap by
proposing a general framework to design DSS based on ML in sports by adding two
components (model explanation and interactivity) to the traditional ML pipeline
(composed only of a model). With these three components (ML model, explanation,
interactivity), we aim to close the 3 gaps pointed out by Kayande et al. [22].
data, summarizing basic game statistics. Play-by-play data, which describes the
time series of events in a game, is used by Grassetti et al. [15] and Vracar et
al. [49]. Lastly, tracking data, which provides player locations at every
moment, has been used by Sampaio et al. [42], Reina et al. [40], and Abdelkrim
et al. [1].
A key aspect of designing a game plan is trying to modify some aspects of it
to maximize the probability of winning. Traditionally, coaches have relied on
intuition. We have developed a data-driven DSS to help coaches in scouting
teams and formulating game plans, showcasing our three-component framework.
The workflow, depicted in Fig. 2, illustrates how coaches can utilize
BasketXplainer to create game plans based on data. The system’s reception among
domain experts has been promising, validating its potential in enhancing
decision-making processes in basketball analytics.
rebounds (DREB), 3-point attempts (FG3A), field goal attempts (FGA), free
throw attempts (FTA), offensive rebounds (OREB), steals (STL), and turnovers
(TO). Statistics directly impacting the score, such as field goals made (FGM), 3-
pointers made (FG3M), and free throws made (FTM), are deliberately excluded.
Explainability. Understanding which factors are the most important to win
a game can help coaches design a game plan. As professional teams do not
have a lot of time between games, preparing just the most important aspects
of the next game is crucial. From an ML perspective, this can be done using
SHAP (SHapley Additive exPlanations) [31], which assigns to each input feature
an importance value for a particular prediction based on how much it
contributes to the model output. For its implementation, we used TreeExplainer
[30] for the LightGBM model and a force plot [32] for visualisation.
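To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny scoring function by enumerating all feature subsets. It is not TreeExplainer and not the LightGBM model used in this work: the function `toy_model`, its weights, and the baseline values are assumptions chosen purely for illustration, and "absent" features are replaced by baseline values, which is one common convention.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Mapping


def shapley_values(model: Callable[[Mapping[str, float]], float],
                   instance: Mapping[str, float],
                   baseline: Mapping[str, float]) -> Dict[str, float]:
    """Exact Shapley values by brute force (exponential in the feature count)."""
    features = list(instance)
    n = len(features)
    values = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of every coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {f: instance[f] if f in subset else baseline[f]
                           for f in features}
                with_feature = dict(without)
                with_feature[feature] = instance[feature]
                total += weight * (model(with_feature) - model(without))
        values[feature] = total
    return values


def toy_model(x: Mapping[str, float]) -> float:
    # Hypothetical score: an additive part plus one interaction term.
    return 0.5 * x["OREB"] - 0.8 * x["TO"] + 0.1 * x["OREB"] * x["STL"]


instance = {"OREB": 12.0, "TO": 10.0, "STL": 8.0}   # game being explained
baseline = {"OREB": 10.0, "TO": 14.0, "STL": 7.0}   # reference box score
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the model output for the instance and for the baseline, which is the additivity property that a force plot visualises. Brute-force enumeration is only feasible for a handful of features; tree-specific algorithms exist precisely to avoid it.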
Interactivity: Dashboard. Our primary focus has been conducting what-
if analyses. Coaches can greatly benefit from exploring multiple scenarios to
assess how potential game plans translate into tangible outcomes. To enhance
the practicality of these analyses, we have incorporated additional components
that allow coaches to understand if their envisioned scenarios are realistic.
Interactive Box Score Data. The core piece of the dashboard is an interactive
parallel coordinates plot of the box score data. When an existing team is
selected for analysis, the mean box score data for that team is displayed in
the parallel coordinates. Using the sliders on each parallel axis, users can
change the box score data and thereby simulate what-if scenarios to see how
changes in the box score will impact the predicted outcome of the matchup.
Additionally, the mean box score data of the 5 most similar teams is shown in
the background along with the number of possessions of each team. This allows
direct comparison to check whether the simulation is run with realistic data.
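One simple way to retrieve the most similar teams is a nearest-neighbour search over mean box scores. The Python sketch below assumes Euclidean distance over the shared statistics; the text does not state which similarity measure the dashboard uses, so the measure, the helper name, and the toy teams are all hypothetical.

```python
import math
from typing import Dict, List, Mapping


def most_similar_teams(target: Mapping[str, float],
                       teams: Mapping[str, Mapping[str, float]],
                       k: int = 5) -> List[str]:
    """Names of the k teams whose mean box score is closest to `target`."""
    def distance(a: Mapping[str, float], b: Mapping[str, float]) -> float:
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))

    ranked = sorted(teams, key=lambda name: distance(target, teams[name]))
    return ranked[:k]


# Toy mean box scores (made up) restricted to two statistics.
scenario = {"FGA": 60.0, "OREB": 10.0}
league: Dict[str, Dict[str, float]] = {
    "T1": {"FGA": 61.0, "OREB": 10.0},
    "T2": {"FGA": 70.0, "OREB": 12.0},
    "T3": {"FGA": 59.0, "OREB": 11.0},
}
neighbours = most_similar_teams(scenario, league, k=2)
```

In practice the statistics would likely be scaled to comparable ranges first, so that high-volume statistics such as field goal attempts do not dominate the distance.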
Winning Probability. The trained ML model is used to infer the winning chances
of the two teams based on the provided box score data. Every change to the box
scores displays new winning probabilities. Users can use this to understand how
changes in the box score will influence the predicted outcome.
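The what-if loop can be illustrated with a deliberately simple stand-in model. The sketch below uses a logistic function of box score differences with made-up weights instead of the trained model described in this paper; `WEIGHTS`, `win_probability`, and `what_if` are hypothetical names, and the numbers carry no empirical meaning.

```python
import math
from typing import Mapping

# Hypothetical weights on (home - away) box score differences; not fitted to data.
WEIGHTS = {"OREB": 0.08, "TO": -0.10, "FG3A": 0.03}


def win_probability(home: Mapping[str, float], away: Mapping[str, float]) -> float:
    """Stand-in model: logistic function of weighted box score differences."""
    z = sum(w * (home[name] - away[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(home: Mapping[str, float], away: Mapping[str, float],
            **changes: float) -> float:
    """Re-evaluate the prediction after overriding some home statistics."""
    scenario = {**home, **changes}
    return win_probability(scenario, away)


home_box = {"OREB": 10, "TO": 14, "FG3A": 25}
away_box = {"OREB": 10, "TO": 14, "FG3A": 25}
base = win_probability(home_box, away_box)        # identical box scores
fewer_turnovers = what_if(home_box, away_box, TO=11)
```

Each slider movement in a dashboard of this kind corresponds to one call like `what_if(...)`, with the returned probability redrawn immediately.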
As seen in Table 1, our model improves on the accuracy of methods using simple
box score data. However, it falls short when compared to methods with engineered
features specifically tailored for game outcome prediction, such as Oliver’s
factors. Despite this, it is crucial to consider that such engineered features
are not suitable for our application due to their lack of actionability.
The domain experts are five people from the coaching staff from teams in
Spanish men’s and women’s leagues. Among them, four belong to the top-ranked
league, while the fifth individual is associated with the third league. Three fulfill
the role of data analysts, and two are assistant coaches.
Gap2: Explainability
  Q1: How coherent is the explanation of the model compared to your explanation?        1.2 / 2
  Q2: How satisfied with this kind of explanation?                                      1.2 / 2
Gap3: Interactivity
  Q3: How useful has been the interactivity (what-if analysis) for the use case proposed?   1.8 / 2
General Questions
  Q4: Is the information obtained from the DSS transferable to court decisions?         1.6 / 2
  Q5: Would it be easy to include this DSS in your daily routine?                       1.8 / 2
  Q6: Does it add new perspectives? Is it innovative?                                   1.4 / 2
  Q7: How effective (information/time) is the solution compared to what you use now?    1.6 / 2
Table 2 displays the survey results. Although we asked the experts for the
predicted outcome of the game without showing them the actual model output, we
do not include such a question because the model’s performance has been
evaluated in the previous section. Q1 evaluates the similarity of their
reasoning for why the game was won, before and after presenting the SHAP
values, with the actual SHAP explanation. Q2 measures the comprehensibility of
the SHAP explanation after it was shown to them. Notably, all interviewees
admit a lack of confidence in their answers for both Q1 and Q2 (and,
previously, for guessing the game outcome), as it is difficult to predict
outcomes based only on the metrics used. However, they demonstrated relative
confidence in the rest of the questions.
Other conclusions from the open-ended questions were:
They were Expecting a Tool with Information that Could be Quickly
Transferred to the Court. During the expectation check, before the interface
was shown to them, a recurring theme was the desire for a tool with informa-
tion that could be readily applied on the court. The interviewees emphasized
the importance of actionable metrics in facilitating data-driven decision-making
for devising effective game plans. Specifically, the two assistant coaches men-
tioned using data systems as warnings, while the two data analysts looked for
key headlines that could be communicated to the coaching staff. After the use
case, they highlighted that our solution fulfilled this. Our system allowed rapid
identification of key factors to win a game, which was one of the most important
factors in deciding whether to use a DSS or not.
112 F. J. Sanguino Bautiste et al.
In this paper, we have defined a general framework for designing Decision Sup-
port Systems (DSS) based on Machine Learning models in the domain of sports.
The framework consists of 3 components that close all the potential gaps that
produce user mistrust when using a DSS. To illustrate our approach, we have
developed a tailored DSS for basketball game planning. Our solution allows
coaches to scout the rival team using an ML model for predicting the outcome,
SHAP values to explain the model’s prediction and an interactive dashboard to
perform what-if analysis while checking if the potential game plan is realistic.
The model has a comparable accuracy to past literature and the preliminary
feedback from domain experts regarding explainability and interactivity was
excellent.
Regarding the Big Three: Our Framework. To further explore the benefit
of our framework, a good research opportunity would be what kind of models
Improving the Model to Make it Specific. One domain expert expressed a major
concern regarding the relevance of data used to train the predictive model, par-
ticularly regarding distribution shifts. For instance, using LEB ORO (Spanish
second men’s league) data to predict outcomes in the NBA would not be logical
due to differences in leagues. However, there might be insufficient data to train
an ML model for a specific league or team, prompting the exploration of ideas
like fine-tuning, weighted training, or ensemble training to address this challenge
and leverage useful representations from basketball games in other leagues.
Adding Other Metrics. A recurring desire expressed in all interviews was the pos-
sibility of incorporating personalized metrics into the dashboard, allowing for a
deductive workflow that integrates their domain knowledge. However, gathering
this specific data poses a significant challenge, as it often requires manual collec-
tion through game video analysis. Despite this obstacle, coaches recognized the
potential benefits of including personalized data in our application. This could
open up new avenues of research to explore the effectiveness of personalized met-
rics in basketball’s winning dynamics. Some coaches even requested the inclusion
of additional metrics relevant to their teams, such as differentiating assists for
three-pointers and two-pointers, further enhancing their ability to leverage their
mental models within the system.
Adding Other Player’s Statistics. Along similar lines, our interviewees praised
our solution for providing valuable insights into the team’s overall
performance, aiding in game plan preparation. However, they emphasized the need
to consider individual player contributions, leading to a new research
direction on effectively integrating individual performance in team sports
while preserving quality interactions.
References
1. Abdelkrim, N.B., El Fazaa, S., El Ati, J.: Time-motion analysis and physiological
data of elite under-19-year-old basketball players during competition. Br. J. Sports
Med. 41(2), 69–75 (2007)
2. Browne, P., Sweeting, A.J., Woods, C.T., Robertson, S.: Methodological consid-
erations for furthering the understanding of constraints in applied sports. Sports
Med.-Open 7(1), 1–12 (2021)
3. Bunker, R., Susnjak, T.: The application of machine learning techniques for pre-
dicting match results in team sport: a review. J. Artif. Intell. Res. 73, 1285–1322
(2022)
4. Bunker, R.P., Thabtah, F.: A machine learning framework for sport result predic-
tion. Appl. Comput. Inform. 15(1), 27–33 (2019)
5. Chen, W., et al.: Gameflow: narrative visualization of NBA basketball games. IEEE
Trans. Multimed. 18(11), 2247–2256 (2016)
6. Kahneman, D.: Thinking, Fast and Slow (2017)
7. Doshi-Velez, F., Kim, B.: Considerations for evaluation and generalization in
interpretable machine learning. In: Escalante, H., et al. (eds.) Explainable and
Interpretable Models in Computer Vision and Machine Learning. SSCML, pp. 3–17.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98131-4_1
8. Du, M., Yuan, X.: A survey of competitive sports data visualization and visual
analysis. J. Vis. 24, 47–67 (2021)
9. El-Assady, M., et al.: Towards XAI: structuring the processes of explanations.
In: Proceedings of the ACM Workshop on Human-Centered Machine Learning,
Glasgow, UK, vol. 4 (2019)
10. Fister, I., Jr., Ljubič, K., Suganthan, P.N., Perc, M., Fister, I.: Computational
intelligence in sports: challenges and opportunities within a new research domain.
Appl. Math. Comput. 262, 178–186 (2015)
11. Fu, Y., Stasko, J.: Supporting data-driven basketball journalism through interac-
tive visualization. In: Proceedings of the 2022 CHI Conference on Human Factors
in Computing Systems, pp. 1–17 (2022)
12. Gao, X., Uehara, M., Aoki, K., Kato, C.: Prototyping sports mental cloud. In:
2017 5th International Conference on Applied Computing and Information Tech-
nology/4th International Conference on Computational Science/Intelligence and
Applied Informatics/2nd International Conference on Big Data, Cloud Comput-
ing, Data Science (ACIT-CSII-BCD), pp. 141–146. IEEE (2017)
13. Goes, F., et al.: Unlocking the potential of big data to support tactical performance
analysis in professional soccer: a systematic review. Eur. J. Sport Sci. 21(4), 481–
496 (2021)
14. Goldsberry, K.: How mapping shots in the NBA changed it forever. FiveThir-
tyEight. FiveThirtyEight, 2 May 2019
15. Grassetti, L., Bellio, R., Di Gaspero, L., Fonseca, G., Vidoni, P.: An extended
regularized adjusted plus-minus analysis for lineup management in basketball using
play-by-play data. IMA J. Manag. Math. 32(4), 385–409 (2021)
16. Gudmundsson, J., Horton, M.: Spatio-temporal analysis of team sports. ACM
Comput. Surv. (CSUR) 50(2), 1–34 (2017)
17. Herold, M., Goes, F., Nopp, S., Bauer, P., Thompson, C., Meyer, T.: Machine
learning in men’s professional football: current applications and future directions
for improving attacking play. Int. J. Sports Sci. Coach. 14(6), 798–817 (2019)
18. Horvat, T., Job, J.: The use of machine learning in sport outcome prediction: a
review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 10(5), e1380 (2020)
19. Horvat, T., Job, J., Logozar, R., Livada, Č: A data-driven machine learning algo-
rithm for predicting the outcomes of NBA games. Symmetry 15(4), 798 (2023)
20. Hubáček, O., Šourek, G., Železnỳ, F.: Exploiting sports-betting market using
machine learning. Int. J. Forecast. 35(2), 783–796 (2019)
The Big Three 115
21. Joash Fernandes, C., Yakubov, R., Li, Y., Prasad, A.K., Chan, T.C.: Predicting
plays in the national football league. J. Sports Anal. 6(1), 35–43 (2020)
22. Kayande, U., De Bruyn, A., Lilien, G.L., Rangaswamy, A., Van Bruggen, G.H.:
How incorporating feedback mechanisms in a DSS affects DSS evaluations. Inf.
Syst. Res. 20(4), 527–546 (2009)
23. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. Adv.
Neural Inf. Process. Syst. 30 (2017)
24. Kellmann, M.: Preventing overtraining in athletes in high-intensity sports and
stress/recovery monitoring. Scand. J. Med. Sci. sports 20, 95–102 (2010)
25. Kubatko, J., Oliver, D., Pelton, K., Rosenbaum, D.T.: A starting point for ana-
lyzing basketball statistics. J. Quant. Anal. Sports 3(3) (2007)
26. Lalwani, A., Saraiya, A., Singh, A., Jain, A., Dash, T.: Machine learning in sports:
a case study on using explainable models for predicting outcomes of volleyball
matches. arXiv preprint arXiv:2206.09258 (2022)
27. Lapham, A., Bartlett, R.: The use of artificial intelligence in the analysis of sports
performance: a review of applications in human gait analysis and future directions
for sports biomechanics. J. Sports Sci. 13(3), 229–237 (1995)
28. Loeffelholz, B., Bednar, E., Bauer, K.W.: Predicting NBA games using neural
networks. J. Quant. Anal. Sports 5(1) (2009)
29. Losada, A.G., Therón, R., Benito, A.: BKViz: a basketball visual analysis tool.
IEEE Comput. Graph. Appl. 36(6), 58–68 (2016)
30. Lundberg, S.M., et al.: From local explanations to global understanding with
explainable AI for trees. Nat. Mach. Intell. 2(1), 56–67 (2020)
31. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions.
Adv. Neural Inf. Process. Syst. 30 (2017)
32. Lundberg, S.M., et al.: Explainable machine-learning predictions for the prevention
of hypoxaemia during surgery. Nat. Biomed. Eng. 2(10), 749–760 (2018)
33. Maddox, J.T., Sides, R., Harvill, J.L.: Bayesian estimation of in-game home
team win probability for national basketball association games. arXiv preprint
arXiv:2207.05114 (2022)
34. Mandić, R., Jakovljević, S., Erčulj, F., Štrumbelj, E.: Trends in NBA and
Euroleague basketball: analysis and comparison of statistical data from 2000 to
2017. PLoS ONE 14(10), e0223524 (2019)
35. Märtins, J., Westmattelmann, D., Schewe, G.: Affected but not involved: two-
scenario based investigation of individuals’ attitude towards decision support sys-
tems based on the example of the video assistant referee. J. Decis. Syst. 1–25
(2022)
36. Oliver, D.: Basketball on Paper: Rules and Tools for Performance Analysis.
Potomac Books, Inc., Dulles (2004)
37. Page, G.L., Fellingham, G.W., Reese, C.S.: Using box-scores to determine a posi-
tion’s contribution to winning basketball games. J. Quant. Anal. Sports 3(4) (2007)
38. Perin, C., Vuillemot, R., Stolper, C.D., Stasko, J.T., Wood, J., Carpendale, S.:
State of the art of sports data visualization. In: Computer Graphics Forum, vol.
37, pp. 663–686. Wiley Online Library (2018)
39. Rein, R., Memmert, D.: Big data and tactical analysis in elite soccer: future chal-
lenges and opportunities for sports science. Springerplus 5(1), 1–13 (2016)
40. Reina Román, M., Garcı́a-Rubio, J., Feu, S., Ibáñez, S.J.: Training and competition
load monitoring and analysis of women’s amateur basketball by playing position:
approach study. Front. Psychol. 9, 2689 (2019)
116 F. J. Sanguino Bautiste et al.
41. Robertson, S., Bartlett, J.D., Gastin, P.B.: Red, amber, or green? Athlete moni-
toring in team sport: the need for decision-support systems. Int. J. Sports Physiol.
Perform. 12(s2), S2-73 (2017)
42. Sampaio, J., McGarry, T., Calleja-González, J., Jiménez Sáiz, S., Schelling i del
Alcázar, X., Balciunas, M.: Exploring game performance in the national basketball
association using player tracking data. PloS one 10(7), e0132894 (2015)
43. Schelling, X., Fernández, J., Ward, P., Fernández, J., Robertson, S.: Decision sup-
port system applications for scheduling in professional team sport. the team’s per-
spective. Front. Sports Active Living 3, 678489 (2021)
44. Schelling, X., Robertson, S.: A development framework for decision support sys-
tems in high-performance sport. Int. J. Comput. Sci. Sport 19(1), 1–23 (2020)
45. Silver, J., Huffman, T.: Baseball predictions and strategies using explainable AI.
In: The 15th Annual MIT Sloan Sports Analytics Conference (2021)
46. Song, H., et al.: Explainable defense coverage classification in NFL games using
deep neural networks (2023)
47. Thabtah, F., Zhang, L., Abdelhamid, N.: NBA game result prediction using feature
analysis and machine learning. Ann. Data Sci. 6(1), 103–116 (2019)
48. Torres-Ronda, L., Schelling, X.: Critical process for the implementation of tech-
nology in sport organizations. Strength Cond. J. 39(6), 54–59 (2017)
49. Vračar, P., Štrumbelj, E., Kononenko, I.: Modeling basketball play-by-play data.
Expert Syst. Appl. 44, 58–66 (2016)
50. Wang, Y., Liu, W., Liu, X.: Explainable AI techniques with application to NBA
gameplay prediction. Neurocomputing 483, 59–71 (2022)
51. Watson, N., Hendricks, S., Stewart, T., Durbach, I.: Integrating machine learning
and decision support in tactical decision-making in rugby union. J. Oper. Res. Soc.
72(10), 2274–2285 (2021)
52. Whitehead, T.: Explaining synergy’s offensive roles (2023). https://synergysports.
com/synergy-offensive-roles/. Accessed 3 June 2023
53. Zdravevski, E., Kulakov, A.: System for Prediction of the Winner in a Sports
Game. In: Davcev, D., Gomez, J.M. (eds.) ICT Innovations 2009. ICT Innovations
2009, pp. 55–63. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-
3-642-10781-8 7
Other Team Sports
What Data Should Be Collected for a
Good Handball Expected Goal Model?
1 Introduction
Expected goal (xG)1 models are important as accurate predictors of future team
and player performance. At team level, these models predict performance better
than the current goal difference or shot-based measures such as the Total
Shot Ratio (TSR)2.
Models of xG are commonly used for soccer and hockey [34]. Here we consider
their use for handball, for which no xG model appears in the literature. Our
study also aims to clarify what types of data are required to design a robust
xG model for handball. For example, is it enough to analyze player positions
alone, or do match events (passes, fouls, shots) need to be taken into account?
Each type of information comes at a cost that needs to be considered: budgets
and resources vary from one sport to another, and even from one team to another
within the same sport. Depending on the available budget, data acquisition
technologies will provide either rich, comprehensive data or sparse, summary
data.
We are therefore interested here in whether a dataset must necessarily be rich
to obtain relevant xG values, and how this affects the cost of learning. Indeed,
data availability affects the ability to learn good xG models [29]. When
scouting young talent, a minor league team has very little data, so it is useful
to define an approach that makes it possible to judge the quality of xG
estimates from summary data, so as not to miss out on the next recruit.
1 https://www.statsperform.com/opta-analytics/.
2 https://www.scisports.com/total-shots-ratio/.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 119–130, 2024.
https://doi.org/10.1007/978-3-031-53833-9_10
120 A. Mortelier et al.
To determine which attributes are relevant for calculating xG, we use machine
learning techniques in this article. These techniques make it possible to learn
an xG model and to measure the conditions under which it is learned. Learning
performance (accuracy, convergence speed) indicates the relevance of the
attributes used: if an attribute improves learning quality, then it is relevant
to consider it. This is known as the wrapper approach [20].
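The wrapper idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: the helper `evaluate_families`, the nearest-centroid classifier (a deliberately simple stand-in for the learners used in the experiments) and the synthetic shots are all hypothetical and do not come from the paper.

```python
import random
from statistics import mean

def nearest_centroid_fit(X, y):
    """Return one centroid per class label (stand-in for a real learner)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def evaluate_families(rows, labels, families, train_fraction=0.7):
    """Wrapper evaluation: score each attribute family by held-out accuracy."""
    split = int(len(rows) * train_fraction)
    scores = {}
    for name, columns in families.items():
        X = [[row[c] for c in columns] for row in rows]
        model = nearest_centroid_fit(X[:split], labels[:split])
        hits = sum(nearest_centroid_predict(model, x) == t
                   for x, t in zip(X[split:], labels[split:]))
        scores[name] = hits / (len(rows) - split)
    return scores

# Synthetic shots (illustrative only): goals become less likely with distance.
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    dist = rng.uniform(6.0, 12.0)
    rows.append({"distance": dist, "noise": rng.random()})
    labels.append(1 if rng.random() < 1.2 - dist / 10.0 else 0)

families = {"distance": ["distance"], "noise": ["noise"],
            "both": ["distance", "noise"]}
scores = evaluate_families(rows, labels, families)
```

An attribute family is judged relevant when adding it raises the held-out score; the learner itself is treated as a black box, which is what distinguishes a wrapper from a filter method.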
In collective ball sports, goals are the most important events. Depending on the
sport, these goal events may occur rarely, as in soccer, where goal events have a
low probability of occurrence during a match [37]. Analysts therefore prefer to
focus their attention on shots, which are more numerous than goals. This has
led to the creation of measures such as the Total Shot Ratio or TSR [21], which
evaluates the dominance of teams based on their share of shots in matches. The
limitation of TSR, however, is that it does not take into account the quality of
a shot. For example, a shot on goal without a goalkeeper has the same value as
a shot from midfield defended by an entire team.
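As a point of reference, TSR is commonly computed as a team’s share of all shots in a match. The sketch below uses a hypothetical function name and made-up shot counts.

```python
def total_shot_ratio(shots_for, shots_against):
    """Share of all shots in a match that were taken by the team."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total

# A team taking 15 of the 20 shots in a match has a TSR of 0.75,
# regardless of how dangerous each of those shots was.
tsr = total_shot_ratio(15, 5)
```

Every shot contributes exactly one unit to the ratio, which is the limitation described above.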
With the democratization of data analysis, several derivatives of this measure
were created. Shot differential, Fenwick3 differential (shot differential between
teams) and Corsi4 differential (which considers shots on target and blocked shots
differently) have become popular for analyzing the performance of hockey teams
and players [23]. These statistics were chosen because each has proven to be a
very good indicator of performance at team level.
As not all shots are equal, a method is needed to measure the quality of a
given shot or series of shots, and this is why xG models were developed. xG
is a statistical metric that uses the shooter’s position to calculate the
probability of a shot becoming a goal. However, in sports where goals occur
more frequently, such as handball, with around 40 goals per match from about
two and a half times as many shots, xG models are not used.
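To make the contrast with shot counting concrete, the xG of a team over a match is usually obtained by summing the per-shot scoring probabilities. The probabilities below are invented for illustration and the function name is hypothetical.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities into an expected number of goals."""
    return sum(shot_probabilities)

# An uncontested close-range shot and a heavily defended long-range shot
# count as two shots, but contribute very differently to xG.
match_xg = expected_goals([0.9, 0.05])
```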
Historically, xG models were introduced to soccer in 2012 by Sam Green
for the statistics site Opta5. In soccer, the company StatsBomb6 provides one of
the most recognized xG7 calculation services. StatsBomb uses this metric8 in a
3 https://en.wikipedia.org/wiki/Fenwick_(statistic).
4 https://en.wikipedia.org/wiki/Corsi_(statistic).
5 https://www.statsperform.com/opta/.
6 https://statsbomb.com/.
7 https://statsbomb.com/soccer-metrics/expected-goals-xg-explained/.
8 https://statsbomb.com/what-we-do/soccer-data/.
2.2 Use of xG
3 Handball Data
3.1 Data Acquisition
There are two types of datasets for team sports: spatio-temporal data and event
data.
The spatio-temporal data are the trajectories of the players or the ball; they
provide location, direction and speed information for each of the entities [12].
Event data provide logs of player-related events (pass, shot, catch, goalkeeper
save, etc.) or technical events (red card, start of time, stop of time, foul,
etc.) for a match. Each of these events can be associated with a distance,
duration or direction, and is time-stamped.
These two kinds of data require different technologies and technical or
financial resources. For example, Indoor Positioning System (IPS) sensors [33]
are inexpensive but not very accurate, while video [27,31,32] is accurate but
requires expensive processing.
In the experiments in Sect. 5, we are not looking for the optimal attribute
configuration. We simply compare several emblematic families (see Table 1).
The attributes chosen come from the literature on different xG models for
football [6,9,15,16,25,39]. All these models use a base consisting of the
shooter’s position, the opening angle of the goal and the distance to the goal.
However, distance and angle are calculated from the shooter’s x and y positions.
They are therefore a priori redundant with the shooter’s position. This is
why the model is trained, on the one hand, with the positions alone and, on
the other, with distance and angle added. This distinction enables us to
decide whether the position alone is sufficient to learn a good model.
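The redundancy can be seen by writing out how the two derived attributes follow from the position. The sketch below assumes a coordinate system that is not specified in the paper: the goal line lies on x = 0, the goal centre is at the origin, and the shooter stands at (x, y) with x > 0. The function name is hypothetical; the 3 m goal width is the standard handball goal width.

```python
import math

GOAL_WIDTH_M = 3.0  # standard width of a handball goal

def distance_and_angle(x, y, goal_width=GOAL_WIDTH_M):
    """Distance to the goal centre and opening angle (radians) between posts.

    Assumes the goal centre is at (0, 0) and the posts at (0, +/- width / 2).
    """
    half = goal_width / 2.0
    distance = math.hypot(x, y)
    # Angle between the vectors from the shooter to each post,
    # computed from their cross and dot products.
    cross = goal_width * x
    dot = x * x + y * y - half * half
    angle = math.atan2(cross, dot)
    return distance, angle

# A shooter 1.5 m straight in front of the goal sees it under a right angle.
d, a = distance_and_angle(1.5, 0.0)
```

Both outputs are deterministic functions of (x, y), so a sufficiently flexible learner could in principle recover them from the position alone; providing them explicitly mainly saves the model from having to learn the geometry.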
For event data, the attributes are directly derived or calculated from the raw
data. The characterization of the game situation (placed attack or transition
game) is determined from the positions of the team’s players: after a change in
the team in possession of the ball, a transition situation is characterized by a
surplus of players, indicating that one or more defenders are late getting back.
For spatial data, the area of the shooter’s dominant Voronoi region and the
number of neighbors in the Delaunay graph are considered. This information is
thought to be indicative of the pressure exerted on the shooter.
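A rough sketch of these two spatial attributes is given below. It is not the authors’ computation: instead of an exact Voronoi diagram and Delaunay triangulation (for which a computational-geometry library would normally be used), it assigns each cell of a grid laid over a 40 m × 20 m court to its nearest player. The function name, the grid step and the player positions are assumptions made for illustration.

```python
def dominant_region_features(players, shooter, length=40.0, width=20.0, step=0.5):
    """Approximate the shooter's Voronoi cell area and count adjacent cells.

    players: list of (x, y) positions in metres; shooter: index into that list.
    Adjacent cells correspond to Delaunay neighbours whose shared Voronoi
    edge lies inside the court (an approximation of the true neighbour count).
    """
    nx, ny = int(length / step), int(width / step)

    def owner(i, j):
        # Index of the player closest to the centre of grid cell (i, j).
        cx, cy = (i + 0.5) * step, (j + 0.5) * step
        return min(range(len(players)),
                   key=lambda p: (players[p][0] - cx) ** 2 + (players[p][1] - cy) ** 2)

    grid = [[owner(i, j) for j in range(ny)] for i in range(nx)]
    area = sum(row.count(shooter) for row in grid) * step * step

    neighbours = set()
    for i in range(nx):
        for j in range(ny):
            if grid[i][j] != shooter:
                continue
            # Any differently owned cell touching the shooter's region
            # belongs to a neighbouring player.
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < nx and 0 <= b < ny and grid[a][b] != shooter:
                    neighbours.add(grid[a][b])
    return area, len(neighbours)

# Two players splitting the court in half: each dominates 400 square metres.
area, n_neighbours = dominant_region_features([(10.0, 10.0), (30.0, 10.0)], shooter=0)
```

A small area and many neighbours both suggest a crowded shooter, which is the intuition behind using these attributes as a proxy for pressure.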
5 Results
5.1 Sample Calculation
On the Qatar 2015 data, shot density, goal density and xG are represented by
heat maps in Fig. 2. Although this article asks which attribute families should
be considered for good xG learning, it seems relevant to first visualize and
analyze the ground truth.
Fig. 2. Shot density (left), goal density (middle) and xG (right) in handball.
Most shots are taken from the 6 m and 9 m lines. Their success, however, seems
to depend on the distance from the goal: the further away the shooter is, the
more squarely his shot seems to face the goal. The density suggests a shooting
cone, with its wide base closest to the goal and its apex closer to midfield.
We can therefore assume a correlation between the chance of scoring and the
shooter’s angle and distance. Although the particular layout of a handball
court prohibits players from standing in the 6 m zone, we observe a higher
success rate for shots in this area. Indeed, an attacker can enter the zone for
a brief moment while in the air, and thus be in a duel with the goalkeeper,
with no defender in the way.
Table 2 shows performance (accuracy, precision, recall, etc.) for each attribute
family and each model. Learning times are given in seconds for the reference
attribute family (pos only), and as a percentage relative to the reference for
the other families.
In our experiments, the RandomForest algorithm performed best. For
RandomForest, the F1-score is between 0.75 and 0.78 (recall and precision are
almost identical) and the AUC varies between 0.71 and 0.75. This indicates a
model that predicts reasonably well whether a given situation leads to a goal.
The other models have an F1-score between 0.66 and 0.72, and an AUC between
0.63 and 0.67: RandomForest performs clearly better than the others.
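For readers less familiar with these scores, the sketch below shows how precision, recall, F1 and AUC can be computed from labels and predictions. The function names and the toy labels are illustrative and are not taken from the experiments.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = goal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def auc(y_true, scores):
    """Probability that a random goal is scored higher than a random miss."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 true positives, 1 false positive, 1 false negative.
precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0, 1],
                                            [1, 1, 0, 1, 0, 1])
ranking_quality = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

F1 is the harmonic mean of precision and recall, so when the two are almost identical, as reported for RandomForest, F1 takes nearly the same value.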
As attribute richness increases, the improvement in learning quality is small
(F1-score of 0.75 for positions alone, 0.78 for all attributes), whereas
learning time increases between the simplest configuration (pos) and the
richest configuration (combined data).
We can also see that there is very little difference between event and spatial
attribute configurations. The extra cost involved in calculating spatial attributes,
such as those associated with the Delaunay graph, which requires the position
of all players, is not always justified. This is characteristic of handball, whereas
in soccer, the richness of attributes is important and the consideration of spatial
attributes is necessary.
6 Conclusion
and team handball performance, which could have practical applications in the
field of training and strategy.
Acknowledgements. This work was partially funded by the ANR and the Normandy
Region as part of the HAISCoDe project.
References
1. Anzer, G., Bauer, P.: A goal scoring probability model for shots based on synchro-
nized positional and event data in football (soccer). Front. Sports Active Living 3,
624475 (2021)
2. Aurenhammer, F.: Voronoi diagrams: a survey of a fundamental geometric data
structure. ACM Comput. Surv. 23(3), 345–405 (1991)
3. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
4. Buchheit, M., Allen, A., Poon, T.K., Modonutti, M., Gregson, W., Di Salvo, V.:
Integrating different tracking systems in football: multiple camera semi-automatic
system, local position measurement and GPS technologies. J. Sports Sci. 32(20),
1844–1857 (2014)
5. Cardinale, M., Whiteley, R., Hosny, A.A., Popovic, N.: Activity profiles and posi-
tional differences of handball players during the world championships in Qatar
2015. Int. J. Sports Physiol. Perform. 12(7), 908–915 (2017)
6. Cavus, M., Biecek, P.: Explainable expected goal models for performance
analysis in football analytics (2022). https://doi.org/10.48550/ARXIV.2206.07212,
https://arxiv.org/abs/2206.07212
7. Chen, T., et al.: XGBoost: extreme gradient boosting. R Package Version 0.4-2 1(4),
1–4 (2015)
8. Delaunay, B.: Sur la sphère vide. Proceedings du Congrès international des
mathématiciens de 1924, pp. 695–700 (1924)
9. Fairchild, A., Pelechrinis, K., Kokkodis, M.: Spatial analysis of shots in MLS: a
model for expected goals and fractal dimensionality. J. Sports Anal. 4(3), 165–174
(2018)
10. Fawcett, T.: ROC graphs: notes and practical considerations for researchers.
Technical report HPL-2003-4, HP Laboratories (2003)
11. Germano, M.: Turbulence: the filtering approach. J. Fluid Mech. 238, 325–336
(1992)
12. Gudmundsson, J., Horton, M.: Spatio-temporal analysis of team sports. ACM
Comput. Surv. (CSUR) 50(2), 1–34 (2017)
13. Hansen, C., Sanz-Lopez, F., Whiteley, R., Popovic, N., Ahmed, H.A., Cardinale,
M.: Performance analysis of male handball goalkeepers at the world handball cham-
pionship 2015. Biol. Sport 34(4), 393 (2017)
14. Hansen, C., Whiteley, R., Wilhelm, A., Popovic, N., Ahmed, H., Cardinale, M.,
et al.: A video-based analysis to classify shoulder injuries during the handball
world championships 2015. Sportverletzung Sportschaden: Organ der Gesellschaft
fur Orthopadisch-traumatologische Sportmedizin 33(1), 30–35 (2019)
15. Herold, M., Goes, F., Nopp, S., Bauer, P., Thompson, C., Meyer, T.: Machine
learning in men’s professional football: current applications and future directions
for improving attacking play. Int. J. Sports Sci. Coach. 14(6), 798–817 (2019).
https://doi.org/10.1177/1747954119879350
16. Hewitt, J.H., Karakuş, O.: A machine learning approach for player and
position adjusted expected goals in football (soccer) (2023).
https://doi.org/10.48550/ARXIV.2301.13052, https://arxiv.org/abs/2301.13052
17. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. Adv.
Neural Inf. Process. Syst. 30 (2017)
18. Kim, S.: Voronoi analysis of a soccer game. Nonlinear Anal. Model. Control
9(3), 233–240 (2004). https://doi.org/10.15388/NA.2004.9.3.15154,
https://www.journals.vu.lt/nonlinear-analysis/article/view/15154
19. Kingsford, C., Salzberg, S.L.: What are decision trees? Nat. Biotechnol. 26(9),
1011–1013 (2008)
20. Kohavi, R., John, G.H.: The wrapper approach. In: Liu, H., Motoda, H. (eds.)
Feature Extraction, Construction and Selection. The Springer International Series
in Engineering and Computer Science, vol. 453, pp. 33–50. Springer, Boston, MA
(1998). https://doi.org/10.1007/978-1-4615-5725-8_3
21. Lago-Ballesteros, J., Lago-Peñas, C.: Performance in team sports: identifying the
keys to success in soccer. J. Hum. Kinet. 25(2010), 85–91 (2010)
22. LaValley, M.P.: Logistic regression. Circulation 117(18), 2395–2399 (2008)
23. Macdonald, B.: Adjusted plus-minus for NHL players using ridge regression with
goals, shots, Fenwick, and Corsi. J. Quant. Anal. Sports 8(3) (2012).
https://doi.org/10.1515/1559-0410.1447
24. Macdonald, B.: An expected goals model for evaluating NHL teams and players.
In: Proceedings of the 2012 MIT Sloan Sports Analytics Conference (2012)
25. Madrero Pardo, P.: Creating a model for expected goals in football using
qualitative player information. Ph.D. thesis, UPC, Facultat d’Informàtica de
Barcelona, Departament de Ciències de la Computació, June 2020.
http://hdl.handle.net/2117/328922
26. Madrero Pardo, P.: Creating a model for expected goals in football using qualitative
player information. Master’s thesis, Universitat Politècnica de Catalunya (2020)
27. Pettersen, S.A., et al.: Soccer video and player position dataset. In: Proceedings
of the 5th ACM Multimedia Systems Conference, pp. 18–23 (2014)
28. Rathke, A.: An examination of expected goals and shot efficiency in soccer. J.
Hum. Sport Exerc. 12(2), 514–529 (2017)
29. Robberechts, P., Davis, J.: How data availability affects the ability to learn good
xG models. In: Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.)
Machine Learning and Data Mining for Sports Analytics. MLSA 2020. CCIS,
vol. 1324, pp. 17–27. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-64912-8_2
30. Ruiz, H., Lisboa, P., Neilson, P., Gregson, W.: Measuring scoring efficiency through
goal expectancy estimation. In: ESANN 2015 proceedings of the European Sym-
posium on Artificial Neural Networks, Computational Intelligence and Machine
Learning, pp. 149–154 (2015)
31. Sanford, R., Gorji, S., Hafemann, L.G., Pourbabaee, B., Javan, M.: Group activ-
ity detection from trajectory and video data in soccer. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops,
pp. 898–899 (2020)
32. Scott, A., Uchida, I., Onishi, M., Kameda, Y., Fukui, K., Fujii, K.: Soccertrack: A
dataset and tracking algorithm for soccer with fish-eye and drone videos. In: Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni-
tion, pp. 3569–3579 (2022)
33. Serpiello, F., et al.: Validity of an ultra-wideband local positioning system to mea-
sure locomotion in indoor sports. J. Sports Sci. 36(15), 1727–1733 (2018)
34. Spearman, W.: Beyond expected goals. In: Proceedings of the 12th MIT Sloan
Sports Analytics Conference, pp. 1–17 (2018)
35. Taki, T., Hasegawa, J.: Visualization of dominant region in team games and its
application to teamwork analysis. In: Proceedings Computer Graphics
International 2000, pp. 227–235 (2000). https://doi.org/10.1109/CGI.2000.852338
36. Taki, T., Hasegawa, J.I., Fukumura, T.: Development of motion analysis system
for quantitative evaluation of teamwork in soccer games. In: Proceedings of 3rd
IEEE International Conference on Image Processing, vol. 3, pp. 815–818. IEEE
(1996)
37. Tenga, A., Ronglan, L.T., Bahr, R.: Measuring the effectiveness of offensive match-
play in professional soccer. Eur. J. Sport Sci. 10(4), 269–277 (2010)
38. Tiippana, T., et al.: How accurately does the expected goals model reflect goalscor-
ing and success in football? (2020)
39. Umami, I., Gautama, D., Hatta, H.: Implementing the expected goal (xG)
model to predict scores in soccer matches. Int. J. Inform. Inf. Syst. 4(1),
38–54 (2021). https://doi.org/10.47738/ijiis.v4i1.76,
http://ijiis.org/index.php/IJIIS/article/view/76
40. Umami, I., Gautama, D.H., Hatta, H.R.: Implementing the expected goal (xG)
model to predict scores in soccer matches. Int. J. Inform. Inf. Syst. 4(1), 38–54
(2021)
41. Van Haaren, J.: Why would I trust your numbers? On the explainability of expected
values in soccer. arXiv preprint arXiv:2105.13778 (2021)
Identifying Player Roles in Ice Hockey
1 Introduction
Ice hockey is a fast-paced team sport that emphasizes both physical prowess and
technical ability [10]. However, the expectations and responsibilities of players
vary, not only based on playing position, but also on the role of the player. The
three traditional groups of positions in ice hockey are goaltenders, defenders, and
forwards [21], where the latter two positions are referred to as skaters. However,
the roles of the players are not always that clear cut. For instance, while defenders
are typically given the highest responsibility for preventing the opposition from
scoring, there are defenders who specialize in offensive contribution [21].
The benefits of categorizing players into roles are manifold. For team staff,
it allows rosters and line-ups to be chosen and designed so that they are more
effective in-game. Additionally, the construction of team rosters is constrained
from an economic standpoint. In the National Hockey League (NHL), the salary cap
prevents a team from having salary expenditure above a fixed amount [6]. On
an individual level, if a player and a team disagree in their expectations of
the player’s role, the development of the player may be hampered and the
likelihood of attaining success is lowered for both parties [14].
Work on player roles has been performed in different sports (e.g. [1,19]).
Prior work regarding player roles in ice hockey has typically utilized methods
that assign each player to a distinct cluster, e.g., using k-means, and used a
limited set of performance metrics, e.g., points, plus-minus, and penalty
minutes, which may leave some roles or role nuances undiscovered [6,21]. In
comparison, the aim of this paper is to identify different player roles for
skaters in ice hockey using performance metrics that span more aspects of the
game than previous work, providing a wider basis for discovering different
player roles. Further, players can be assigned to different roles to different
degrees.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 131–143, 2024.
https://doi.org/10.1007/978-3-031-53833-9_11
132 R. Säfvenberg et al.
The contributions of this paper are as follows. First, we identify player roles
by using a larger set of performance metrics than previous work, allowing us to
discover new roles and/or key components in understanding a role, and by using
fuzzy clustering, allowing each player to belong to a role to some degree, rather
than assigning each player to a distinct role. Further, we show applications to
team construction in terms of player contract comparison and team composition
for successful and less successful teams. Our findings have value to players,
scouts, and managers.
The remainder of the paper is organized as follows. Section 2 describes the
data used for the analysis while Sect. 3 introduces the method including prepro-
cessing, principal component analysis and fuzzy clustering. Section 4 presents
and contextualizes the results. Further, we show two applications. In Sect. 5 we
compare players to players with similar roles with respect to their salaries and in
Sect. 6 we investigate the relationship between team composition based on roles
and reaching the playoffs. Limitations of the study are addressed and concluding
remarks are drawn in Sect. 7.
2 Data
We use data from the 2021–2022 NHL regular season obtained from the official
website of the NHL1 and their public API, as well as salary data from
CapFriendly2. The data combines play-by-play data with shift data. From this
data, a set of variables was derived (Table 1). Variables regarding goals,
assists, and expected goals are used to evaluate offensive quality and frequency
among players. Plus-minus (+/−), xGF, xGF%, and xGF% Relative serve as proxies
for team performance while the player is on the ice. Giveaways give some measure
of puck control, while takeaways and blocks represent defensive contributions.
Hits, net hits, penalties, net penalties, penalty minutes, and number of
penalties per group all portray player aggression and physical play. The number
of penalties per group variables are also split into the penalties the player
is given and the penalties that are drawn, to distinguish between players who
are the instigator and the receiver. Offensive zone starts can indicate whether
a player is more offensively or defensively oriented. The time on ice variables
capture how much ice time the player has, while the coordinate variables
describe where the player typically is when performing each event. Finally,
weight characterizes player physique, which is used to gauge the physical
dimension of players and its impact on player role. Furthermore, the variables
xGF% and xGF% Relative serve as a proxy for puck possession and, as [22]
explains, can negate the weaknesses of the traditional plus-minus metric.
1 https://www.NHL.com.
2 https://www.capfriendly.com.
Variable                    Description
Goals                       Number of goals scored by the player
Assists                     Number of passes by the player leading to goals
xG                          How many goals a player is expected to score
xG difference               Goals - xG
S%                          Percentage of shots on goal that become goals
+/−                         +/− while the player is on the ice
xGF%                        xGF on ice/(xGF on ice + xGF off ice) during 5 on 5
xGF% Relative               xGF% on ice - xGF% off ice during 5 on 5
Giveaway                    Unforced loss of control of puck by the player
Takeaway                    Retrieval of the puck from opponent
Blocks                      Number of blocked shots by the player
Hits                        Number of hits by the player on opposing players
Net hits                    Hits given - Hits received
Fights                      Number of fights the player was involved in
Penalties                   Number of penalties on the player
Net penalties               Penalties on - Penalties drawn
Penalty minutes             Total penalty minutes for the player
Penalties per group         Groups are physical, restraining, stick, (other not used)
Penalties drawn per group   Groups are physical, restraining, stick, (other not used)
OZS%                        % of starts in the offensive zone for the player
PP%                         % of time played in powerplay
SH%                         % of time played while shorthanded
OT%                         % of time played in overtime
Median X coordinate         Median X coordinate for each event
Median Y coordinate         Median absolute Y coordinate for each event
TOI*                        Total time on ice for the player
5 on 5 TOI*                 Total time on ice during full strength for the player
Height                      Player’s height in inches
Weight                      Player’s weight in pounds
* Not included as a variable for modeling.
3 Method
3.1 Preprocessing
In the preprocessing step, a threshold of 200 min of playing time during the
season was used to exclude players who had insufficient playing time. The
number of defenders satisfying this requirement was 263 (out of 345), and the
number of forwards was 485 (out of 659). We then split the data into
two subsets, one for defenders and one for forwards, to take positional variations
into consideration. Further, all³ performance-related variables with counts, i.e.,
3
Except xGF, which used 5 on 5 TOI.
134 R. Säfvenberg et al.
not variables with percentages, were then standardized by dividing by total time
on the ice (TOI) and multiplying by 60 (the number of minutes in a regulation
game). The variables were also normalized by subtracting the variable's mean and
dividing by its corresponding standard deviation.
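The two preprocessing steps described above (rate per 60 minutes, then z-score normalization) can be sketched as follows. This is a minimal illustration, not the authors' code: the player data are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def per_60(count, toi_minutes):
    """Convert a raw count into a rate per 60 minutes of ice time."""
    return count / toi_minutes * 60.0


def z_scores(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD (not stated in the text)
    return [(v - mu) / sigma for v in values]


# Hypothetical players: (hits, total time on ice in minutes).
players = [(120, 1200.0), (45, 900.0), (10, 250.0)]
hit_rates = [per_60(hits, toi) for hits, toi in players]
hit_z = z_scores(hit_rates)
```

After this step every count variable has mean zero and unit variance, so no single variable dominates the later PCA purely because of its scale.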
3.3 Clustering
As we wanted to model that players can take on different roles to certain degrees
and that roles tend to have overlapping elements, we opted to use a fuzzy cluster-
ing algorithm rather than the crisp clustering algorithms used in previous work.
In fuzzy clustering, the objects are assigned a probability of belonging to a given
cluster, where the probabilities of cluster membership of an object sum to one.
In this paper, we used the fuzzy c-means algorithm [3,9]. The objective of the
fuzzy c-means algorithm is to create k fuzzy partitions among a set of n objects
from a data vector x by solving (1) until convergence.
$$
\min_{U,C} J_m = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \, d^{2}(x_i, c_j) \quad \text{s.t.} \quad u_{ij} \in [0, 1], \quad \sum_{j=1}^{k} u_{ij} = 1 \tag{1}
$$
In (1), $d$ denotes the distance between object $i$ and the $j$-th cluster centroid
$c_j$. Moreover, $u_{ij}$ is the degree of membership of object $i$ in cluster $j$. The
hyperparameter $m$ controls the degree of fuzziness, where a higher $m$ leads to
a fuzzier solution [4]. It can also be shown that the fuzzy solution converges to
the crisp solution as $m \to 1$ [15], and that $u_{ij} \to 1/k$ as $m \to \infty$.
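For illustration, the alternating updates commonly used to minimise the objective in (1) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the Euclidean distance, the random initialisation, the stopping rule, and the toy data are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, k, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate centroid and membership updates until memberships stabilise."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to one.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = []
    for _ in range(n_iter):
        # Centroid update: mean of the points weighted by u_ij ** m.
        centroids = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centroids.append(
                [sum(w[i] * points[i][a] for i in range(n)) / w_sum for a in range(dim)]
            )
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        new_u = []
        for x in points:
            dist = [max(math.dist(x, c), 1e-12) for c in centroids]
            new_u.append(
                [
                    1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(k))
                    for j in range(k)
                ]
            )
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(k))
        u = new_u
        if shift < tol:
            break
    return u, centroids


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
memberships, centroids = fuzzy_c_means(points, k=2, m=2.0)
```

Each row of `memberships` sums to one, matching the constraint in (1); raising `m` pushes the rows toward the uniform value 1/k.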
There is no optimal m that suits all cases [4]. However, values of m ∈ [1.5, 3.0]
tend to give satisfactory results in general [4] and are typical [24]. Setting m = 2
results in compact and well-separated clusters [9], but can also negatively affect
the clustering [8,20]. The formula that we use for deciding m was proposed in [20].
$$
f(n, p) = 1 + \left(\frac{1418}{n} + 22.05\right) d^{-2} + \left(\frac{12.33}{n} + 0.243\right) d^{-0.0406 \log(n) - 0.1134}, \tag{2}
$$
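Equation (2) is straightforward to evaluate numerically. The sketch below assumes that n is the number of objects, that d (as written on the right-hand side) is the number of dimensions, and that log denotes the natural logarithm; the function name and the input values are hypothetical.

```python
import math


def fuzzifier(n, d):
    """Evaluate Eq. (2) for n objects and d dimensions (natural log assumed)."""
    first = (1418.0 / n + 22.05) * d ** -2
    second = (12.33 / n + 0.243) * d ** (-0.0406 * math.log(n) - 0.1134)
    return 1.0 + first + second


# Hypothetical inputs, not the values used in the paper.
m_small = fuzzifier(n=100, d=5)
m_more_dims = fuzzifier(n=100, d=10)
m_more_objects = fuzzifier(n=1000, d=5)
```

Under these assumptions the result is always above 1 and shrinks as either the number of objects or the number of dimensions grows, so larger data sets receive a crisper clustering.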
A popular method for deciding how many clusters should be formed is to compare
a set of candidate values of k by considering one or more cluster validity indices
[16]. In [23], a large set of different fuzzy cluster validity indices are compared.
Although no single validity index is optimal for all data, the modified partition
coefficient (MPC) was one of the indices that partitioned many of the investigated
data sets into the best number of clusters. The MPC is an extension of the
partition coefficient (PC) [3]:
$$
V_{PC} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{2} \in \left[\frac{1}{k}, 1\right], \tag{3}
$$
where uig and uig represent the two largest elements from Ui [5] while α ≥ 0 is
a weighting coefficient that is commonly set to 1 [11]. One distinction between
the crisp and fuzzy silhouette is that the latter computes the weight for each
term, based on the two fuzzy clusters that are found to be the best match. The
optimal number of k is obtained by maximizing the index [5].
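The partition coefficient in (3) can be computed directly from a membership matrix. The sketch below also includes the usual normalisation that maps the index to [0, 1], which is the commonly cited form of the MPC attributed to [7]; treating that form as the one used here is an assumption, and the membership matrices are hypothetical.

```python
def partition_coefficient(u):
    """Eq. (3): mean over objects of the sum of squared memberships."""
    n = len(u)
    return sum(v * v for row in u for v in row) / n


def modified_partition_coefficient(u):
    """Rescale PC from [1/k, 1] to [0, 1] (assumed form: 1 - k/(k-1) * (1 - PC))."""
    k = len(u[0])
    return 1.0 - k / (k - 1.0) * (1.0 - partition_coefficient(u))


# Hypothetical membership matrices for k = 4 clusters.
crisp = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
uniform = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
```

A crisp partition scores 1 on both indices, while a completely uniform partition scores 1/k on the PC and 0 on the rescaled index, which makes values comparable across different k.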
4 Results
4.1 Principal Component Analysis
Figure 1 shows the proportion of variance explained by each component generated
by the PCA for defenders and forwards, respectively. The first six components are
responsible for the majority of the variance in the data, explaining at least 50%
of the variance. However, the proportion of explained variance decreases rapidly,
and in order to explain, e.g., 90% of the variance, at least 26 and 27 components
are required for defenders and forwards, respectively. We used parallel analysis
to determine the set of components, which resulted in the selection of the first
eight components for defenders and the first nine for forwards. These choices
explain approximately 55–57% of the variance for each respective position group.
[Fig. 1: cumulative proportion of variance explained (y-axis) against principal component (x-axis), for defenders and forwards.]
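The relationship between eigenvalues, cumulative explained variance, and the number of components needed to reach a target share can be sketched as below. The eigenvalues are hypothetical and chosen only to make the arithmetic easy to follow; the parallel analysis used in the paper is not reproduced here.

```python
def cumulative_explained_variance(eigenvalues):
    """Cumulative share of total variance, with components sorted by eigenvalue."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share meets the threshold."""
    shares = cumulative_explained_variance(eigenvalues)
    for idx, share in enumerate(shares, start=1):
        if share >= threshold - 1e-12:
            return idx
    return len(eigenvalues)


# Hypothetical eigenvalues of a correlation matrix with five variables.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
shares = cumulative_explained_variance(eigenvalues)
```

With these toy values one component already explains half of the variance, but four are needed to pass 90%, which mirrors the slow tail described in the text.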
To obtain values for m in fuzzy c-means, we used Eq. 2, which resulted in 2.407
for defenders and 2.179 for forwards. For k, we used Eqs. 4 and 6 to evaluate
cluster cohesion. The most distinct clusters for both defenders and forwards
were produced by k = 2, and less cohesive clusters were observed for values
larger than k = 3 (defenders) and k = 4 (forwards). In addition to these metrics,
the cluster assignment of players was also compared to domain knowledge to guide
the final choice. More specifically, the players who belong to the same cluster
should share the same style of play, regardless of whether other possible roles,
i.e., clusters, have some overlap. As a result, k = 3 and k = 4 were chosen for
defenders and forwards, respectively, as they provide more cohesive clusters while
also allowing the number of roles to be as descriptive as possible.
Figure 2 shows the distribution of the probabilities representing cluster
membership. We note that the densities of probabilities for defenders are more
similar to one another than those for forwards, where cluster F4 has a high peak
close to zero. Furthermore, clusters D1 and D3 among defenders have similar
distributions, while cluster D2 is more centered. For forwards, both clusters F1
and F3 appear to span the entire range of possible values between zero and one,
while clusters F2 and F4 have a somewhat smaller range. Except for cluster D2
among defenders, the densities reach a peak between 0 and 0.25.
Fig. 2. Cluster membership degrees for defenders (left) and forwards (right).
To explore the variables characterizing each cluster, we retrieve the cluster
centroids, which are expressed in terms of principal components. We then obtain
approximate centroids corresponding to the original variables by computing an
inverse transform of the centroids and the selected principal components.
Moreover, since the data was standardized to have unit variance prior to conducting
PCA, we also invert this procedure by multiplying by the standard deviation and
adding the mean for each variable. Using this method, we obtain approximate
centroids on the original variable scales, expressed per 60 min of ice time
(Table 2).
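The two inversion steps described above can be sketched as follows: project the centroid back through the retained principal components, then undo the standardization. The function name, loadings, means, and standard deviations are all hypothetical; the centroid is only approximate on the original scale because the discarded components are ignored.

```python
def centroid_to_original_scale(centroid_pc, components, means, sds):
    """Map a centroid from principal-component space back to the original variables.

    centroid_pc: centroid scores on the q retained components.
    components:  q rows of loadings, one value per original variable.
    means, sds:  per-variable mean and standard deviation used for standardization.
    """
    p = len(means)
    # Inverse PCA transform: weighted sum of the retained loading vectors.
    standardized = [
        sum(score * comp[a] for score, comp in zip(centroid_pc, components))
        for a in range(p)
    ]
    # Undo the z-score: multiply by the standard deviation and add the mean.
    return [z * sd + mu for z, sd, mu in zip(standardized, sds, means)]


# Hypothetical example: 3 original variables, 2 retained components.
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
centroid = centroid_to_original_scale(
    centroid_pc=[2.0, -1.0],
    components=components,
    means=[10.0, 20.0, 30.0],
    sds=[1.0, 2.0, 3.0],
)
```

In this toy case the third variable has no loading on the retained components, so its reconstructed centroid value falls back to the variable's mean.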
Among defenders, a higher probability of belonging to cluster D1 is connected to
the most offensively skilled defenders, who assist their team's attacking presence
by contributing more goals, assists, xG, and takeaways while also playing closer
to the opposition's net. They also start in the offensive zone, and play in
powerplay situations, more often than defenders in the other clusters. In cluster
D3 we find the most physical defenders, as the number of fights, hits, and
penalties are the highest, alongside the largest average weight. Their offensive
contribution, with respect to xG, assists, and OZS%, ranks lowest. They also garner the highest
share of time played while shorthanded, but lowest in overtime and while short-
handed. Finally, the third cluster, D2, contains the defensive specialists, where
goals, penalty minutes, and hits are at their lowest, in combination with playing
closer to their own net. They are also the shortest and lightest.
Regarding forwards, the players in cluster F4 can be described as physi-
cal players, with the highest weight in combination with most fights, hits, and
penalty minutes. The offensive production is second lowest, and they tend to
be preferred in defensive situations. The offensive specialists can be found in
cluster F3, where goals, assists, and xG are at their highest, which also can be
seen in their play closer to the opposition’s goal. Moreover, players in F3 also
draw the most non-physical penalties. These players also block the most shots. A
lower, but still second highest, offensive proficiency characterizes the players in
F2, alongside positive xGF% values. Finally, the two-way forwards reside in F1,
with skating that covers the entirety of the rink. Additionally, the lowest xG and
goals are created by these players, alongside the lowest +/− and xGF On. How-
ever, they rank the highest for time played while shorthanded and percentage of
starts in the defensive zone.
4
https://www.capfriendly.com/transactions.
Table 3. Most underpaid and overpaid players, relative to players with similar cluster
membership probabilities for each position.
6 Team Composition
Team composition can have a substantial impact on team performance [6].
Therefore, we also investigate whether there are any patterns between player roles
and team success for the given season. We first compute the minutes played for
all players per team and retain the 18 players with the highest playing time. The
choice of 18 players is based on the roster size in the NHL, where 20 players, of
whom 18 are skaters and 2 are goaltenders, are allowed to be used in any given
game. Thus, these 18 players can represent a possible composition of players for
any given team and game. Except for the San Jose Sharks (8D/10F) and the Florida
Panthers (5D/13F), the team compositions consisted of either 7 defenders and
11 forwards or 6 defenders and 12 forwards. Next, for a given team, we sum
the cluster probabilities among all players in each position group (defenders and
forwards) to obtain an estimate of how many players the team has in each role;
this is then divided by the total number of players for the given position to
find the proportion of roles each team has. Thus, the forward role proportions
sum to one for each team, and likewise for defenders. This is illustrated in
Fig. 3, where a hierarchical clustering using Ward's linkage method and Euclidean
distance groups the playoff and non-playoff teams by team composition.
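The role-proportion computation for one team and one position group can be sketched as below. The membership values are hypothetical and the function name is illustrative; the hierarchical clustering step is not shown.

```python
def role_proportions(memberships):
    """Sum membership probabilities per cluster and divide by the number of players.

    memberships: one membership vector per player in a single position group.
    """
    n_players = len(memberships)
    k = len(memberships[0])
    expected_counts = [sum(row[j] for row in memberships) for j in range(k)]
    return [count / n_players for count in expected_counts]


# Hypothetical team with two defenders and three defender roles (D1, D2, D3).
defenders = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2]]
defender_mix = role_proportions(defenders)
```

Because every player's memberships sum to one, the resulting proportions also sum to one, so teams with different numbers of defenders or forwards remain comparable.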
7 Conclusion
In this paper we have proposed a novel method for quantifying player roles in ice
hockey from a large set of performance indicators and player data using fuzzy
c-means. We also investigated the application of comparative contract evaluation
for the comparison of salary and player role, which can be used as a component in
References
1. Aalbers, B., Van Haaren, J.: Distinguishing between roles of football players in
play-by-play match event data. In: Brefeld, U., Davis, J., Van Haaren, J., Zimmer-
mann, A. (eds.) MLSA 2018. LNCS, vol. 11330, pp. 31–41. Springer, Cham (2018).
https://doi.org/10.1007/978-3-030-17274-9_3
2. Assent, I.: Clustering high dimensional data. Wiley Interdisc. Rev. Data Min.
Knowl. Discov. 2(4), 340–350 (2012). https://doi.org/10.1002/widm.1062
3. Bezdek, J.C.: Pattern Recognition With Fuzzy Objective Function Algorithms.
Springer, New York (1981). https://doi.org/10.1007/978-1-4757-0450-1
4. Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algo-
rithm. Comput. Geosci. 10(2–3), 191–203 (1984). https://doi.org/10.1016/0098-
3004(84)90020-7
5. Campello, R.J., Hruschka, E.R.: A fuzzy extension of the silhouette width criterion
for cluster analysis. Fuzzy Sets Syst. 157(21), 2858–2875 (2006). https://doi.org/
10.1016/j.fss.2006.07.006
6. Chan, T.C., Cho, J.A., Novati, D.C.: Quantifying the contribution of NHL player
types to team performance. Interfaces 42(2), 131–145 (2012). https://doi.org/10.
1287/inte.1110.0612
7. Dave, R.N.: Validating fuzzy partitions obtained through c-shells clustering.
Pattern Recogn. Lett. 17(6), 613–623 (1996). https://doi.org/10.1016/0167-
8655(96)00026-8
8. Dembele, D., Kastner, P.: Fuzzy C-means method for clustering microarray data.
Bioinformatics 19(8), 973–980 (2003). https://doi.org/10.1093/bioinformatics/
btg119
9. Dunn, J.: A fuzzy relative of the isodata process and its use in detecting compact
well-separated clusters. J. Cybern. 3(3), 32–57 (1973). https://doi.org/10.1080/
01969727308546046
10. Felmet, G.: Ice hockey. In: Krutsch, W., Mayr, H.O., Musahl, V., Della Villa, F.,
Tscholl, P.M., Jones, H. (eds.) Injury and Health Risk Management in Sports: A
Guide to Decision Making, pp. 485–489. Springer, Heidelberg (2020). https://doi.
org/10.1007/978-3-662-60752-7_74
11. Ferraro, M.B., Giordani, P.: A toolbox for fuzzy clustering using the R program-
ming language. Fuzzy Sets Syst. 279, 1–16 (2015). https://doi.org/10.1016/j.fss.
2015.05.001
12. Glorfeld, L.W.: An improvement on Horn’s parallel analysis methodology for select-
ing the correct number of factors to retain. Educ. Psychol. Measur. 55(3), 377–393
(1995). https://doi.org/10.1177/0013164495055003002
13. Horn, J.L.: A rationale and test for the number of factors in factor analysis. Psy-
chometrika 30(2), 179–185 (1965). https://doi.org/10.1007/BF02289447
14. Lefebvre, J.S., Martin, L.J., Côté, J., Cowburn, I.: Investigating the process
through which National Hockey League Player Development Coaches ‘develop’
athletes: an exploratory qualitative analysis. J. Appl. Sport Psychol. 34(1), 47–66
(2022). https://doi.org/10.1080/10413200.2019.1688893
15. Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-78737-2
16. Pakhira, M.K., Bandyopadhyay, S., Maulik, U.: Validity index for crisp and fuzzy
clusters. Pattern Recogn. 37(3), 487–501 (2004). https://doi.org/10.1016/j.patcog.
2003.06.005
17. Peres-Neto, P.R., Jackson, D.A., Somers, K.M.: How many principal components?
Stopping rules for determining the number of non-trivial axes revisited. Comput.
Stat. Data Anal. 49(4), 974–997 (2005). https://doi.org/10.1016/j.csda.2004.06.
015
18. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation
of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987). https://doi.org/10.
1016/0377-0427(87)90125-7
19. Sattari, A., Johansson, U., Wilderoth, E., Jakupovic, J., Larsson-Green, P.: The
interpretable representation of football player roles based on passing/receiving pat-
terns. In: Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA
2021. CCIS, vol. 1571, pp. 62–76. Springer, Cham (2022). https://doi.org/10.1007/
978-3-031-02044-5_6
20. Schwämmle, V., Jensen, O.N.: A simple and fast method to determine the param-
eters for fuzzy c-means cluster analysis. Bioinformatics 26(22), 2841–2848 (2010).
https://doi.org/10.1093/bioinformatics/btq534
21. Vincent, C.B., Eastman, B.: Defining the style of play in the NHL: an application
of cluster analysis. J. Quant. Anal. Sports 5(1) (2009). https://doi.org/10.2202/
1559-0410.1133
22. Vollman, R.: Hockey Abstract Presents... Stat Shot: The Ultimate Guide to Hockey
Analytics. ECW Press (2016)
23. Wang, W., Zhang, Y.: On fuzzy cluster validity indices. Fuzzy Sets Syst. 158(19),
2095–2117 (2007). https://doi.org/10.1016/j.fss.2007.03.004
24. Wierzchoń, S.T., Kłopotek, M.A.: Modern Algorithms of Cluster Analysis.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-69308-8
Elite Rugby League Players’ Signature
Movement Patterns and Position
Prediction
The authors would like to acknowledge The Rugby Football League (RFL) for the
access to GPS data.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 144–154, 2024.
https://doi.org/10.1007/978-3-031-53833-9_12
RFL Positions’ Signature Movement Patterns and Prediction 145
1 Introduction
Sporting events [9] happen sequentially, and players' completed activities (i.e.,
external load) during matches or training are collected in a time-series format.
However, external loads are widely quantified and analysed without regard to
the sequential nature of sporting events [4], through the use of either
technical-tactical indicators (expertly coded by video-notational analysts from
match videos) or physical indicators (collected via a micro-technology unit, worn
by players, containing an accelerometer, magnetometer and/or gyroscope). A few
studies [11,12] have now quantified external loads with respect to the sequential
nature of their occurrence, an approach known as player movement profiling.
Player movement profiling [11] involves quantifying players' external loads
through the extraction of sequential movement patterns (i.e., exact sequential
on-field activities) to understand match demands and to uncover how external
loads were accumulated, so that match characteristics can be replicated in
training.
Movement patterns are identified subsets of discrete movement activities
performed by players. A discrete movement sequence (for a player) is a
time-ordered concatenation of discretized velocity, acceleration and turning angle
values, excluding periods of the athlete's inactivity. The study [12] is the first
to identify players' movement patterns from discrete movement sequences formulated
from Global Positioning System (GPS) data, by developing a better and more
stable framework called Sequential Movement Pattern-mining (SMP). An example
of an extracted movement pattern is “NNNNN” (denoted as [Run-Neutral-Straight]
× 5 based on study [11], or as [Sprint-Deceleration-Backwards] × 5 based on
study [12]).
In study [5], the SMP framework [12] was utilised to quantify the difference
between three competition levels of the Rugby Football League, based on
movement units obtained from decomposed movement patterns, through the use
of a linear discriminant analysis algorithm. However, unique movement patterns
were not identified for each competition level. This may be due to a limitation
of the SMP framework [12]: it identifies only a few movement patterns, which must
be the longest common movement patterns within each of twenty-five clusters of
similar discrete movement sequences. This type of movement pattern omits adjacent
movement activities and may not represent the exact match characteristics
required for training. Additionally, other movement patterns that are not the
longest common subsequences, but could be useful and of interest for training,
are discarded by the SMP framework.
The study [2] formulated and developed a sequential pattern mining algo-
rithm as a solution to the SMP framework limitations. The developed algorithm,
l -length closed contiguous sequential pattern mining (LCCspm), was used to
extract user-defined lengths of frequent closed contiguous movement patterns
146 V. E. Adeyemo et al.
from sets of discrete movement sequences of elite rugby league players, as well as
frequent closed contiguous match-event patterns performed by national team
players who participated in the men's 2018 FIFA World Cup. The algorithm was
reported to extract closed contiguous patterns of a user-defined length faster,
to scale better and to use less memory than other existing related algorithms.
Moreover, a large number of frequent player movement patterns and match-event
patterns were reportedly discovered by LCCspm.
The study [3] quantified elite rugby league players' external loads via three
distinct types of obtainable movement patterns, used the sets of extracted
movement patterns as independent variables to classify rugby league players into
two tactically distinct playing positions (i.e., hookers and wingers), and
reported the closed contiguous movement patterns of the LCCspm algorithm [2] as
the best type of patterns for profiling players into playing positions. However,
the specific behavioural (i.e., signature) movement patterns of rugby league
players belonging to each position remain unknown, and the prediction of rugby
league players into all nine playing positions based on their movement patterns
is largely unexplored. Identifying the signature movement patterns and
classifying players into playing positions is important for training optimisation,
for overall team and player performance improvement, and for talent
identification and recruitment, among others. Therefore, this study aims to
uncover the signature movement patterns of elite rugby league players for each of
the nine playing positions, and to use movement patterns to predict the playing
positions of elite rugby league players while investigating the contribution of
movement patterns towards the prediction.
2 Method
The method for processing GPS data into discrete movement sequences, as published
by [12], was followed. Each movement unit and its encoded character are listed
in [12], page 165. Two seasons (2019 and 2020) of GPS data from 217 professional
Rugby League players who participated in 338 fixtures were collected using
micro-sensor units that include global positioning systems [12] (Optimeye S5,
Catapult Sports, Melbourne, Australia) sampling at 10 Hz, and were processed into
sets of discrete movement sequences. This study only considered players who
played at one fixed position throughout the seasons. The distribution of the
players across playing positions was 31 Centres, 8 Five-Eighths, 22 Full-Backs,
26 Half-Backs, 22 Hookers, 8 Loose-Forwards, 47 Prop-Forwards, 25 Second-Rows
and 28 Wingers. The collected GPS data were processed into a total of 4,640 sets
of discrete movement sequences at the player-per-fixture level. This study
received approval from the University Ethics Committee and obtained written
informed consent from the organisation representing the participants.
This study used LCCspm [2] to extract frequent closed contiguous patterns of a
user-defined length from sets of discrete movement sequences. To extract movement
patterns, the pattern-length parameter of LCCspm was set to 20 (i.e., a 2 s time
frame at the 10 Hz sampling rate) and the support was set to 5% (to include both
low- and high-frequency movement patterns). For each playing position, signature
movement patterns were derived by analysing the extracted movement patterns to
identify those performed only by players within that playing position. Afterwards,
the signature movement patterns of each position were visualised based on their
frequency across all fixtures.
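The idea of deriving signature patterns can be illustrated with a naive sketch: collect the contiguous patterns that meet a support threshold for each position, then keep only those found in no other position. This is not LCCspm [2]; it skips the closedness check and any efficiency considerations, and the minimum pattern length of 2, the function names, and the toy sequences are assumptions made for the example.

```python
from collections import Counter


def frequent_contiguous_patterns(sequences, max_len, min_support):
    """Contiguous patterns (length 2..max_len) present in at least min_support of sequences."""
    containing = Counter()
    for seq in sequences:
        seen = set()
        for length in range(2, max_len + 1):
            for start in range(len(seq) - length + 1):
                seen.add(seq[start:start + length])
        containing.update(seen)  # each sequence counts once per pattern
    threshold = min_support * len(sequences)
    return {pattern for pattern, count in containing.items() if count >= threshold}


def signature_patterns(patterns_by_position):
    """Keep, for each position, the patterns that no other position performs."""
    signatures = {}
    for position, patterns in patterns_by_position.items():
        others = set().union(
            *(p for q, p in patterns_by_position.items() if q != position)
        )
        signatures[position] = patterns - others
    return signatures


# Hypothetical discrete movement sequences for two positions.
toy_sequences = {"Wingers": ["GGSS", "GSSS"], "Hookers": ["vwvv", "GGvv"]}
toy_patterns = {
    position: frequent_contiguous_patterns(seqs, max_len=3, min_support=0.5)
    for position, seqs in toy_sequences.items()
}
toy_signatures = signature_patterns(toy_patterns)
```

In this toy data the pattern "GG" is performed by both positions and is therefore excluded from both signature sets, while "SSS" remains a signature pattern for the wingers.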
Centres: 3,307 movement patterns were performed by centres, of which only 1,241
(approximately 37.5%) were uniquely performed. The number of times each of
the 1,241 movement patterns was performed by all centres varied from 1 to
365,692. The signature movement patterns performed by elite rugby league
centres are visualised in Fig. 1a. An example is “eeeeeea”, denoted as
[Walk-Neutral-Straight] × 6 and Walk-Deceleration-Straight.
Five Eighths: 1,199 movement patterns were performed by five-eighths, where
only 13 (approximately 1.08%) were uniquely performed. The number of times
each of the 13 movement patterns was performed by all five-eighths varied from
6 to 56,405. The signature movement patterns performed by elite rugby league
five-eighths are visualised in Fig. 1b. An example is “rqqqqqqq” denoted as Jog-
Neutral-Acute-Change and [Jog-Neutral-Straight] × 7.
Full Backs: 1,561 movement patterns were performed by full-backs, of which
122 (approximately 7.82%) were uniquely performed. The number of times
each of the 122 movement patterns was performed by all full-backs varied
from 1 to 96,574. The signature movement patterns performed by elite rugby
league full-backs are visualised in Fig. 1c. An example is “yyyyzyyy”, denoted
as [Run-Deceleration-Straight] × 4, Run-Deceleration-Acute-Change and
[Run-Deceleration-Straight] × 3.
Half Backs: 2,881 movement patterns were performed by half-backs, of which
only 612 (approximately 21.24%) were uniquely performed. The number of times
each of the 612 movement patterns was performed by all half-backs varied from
1 to 169,958. The signature movement patterns performed by elite rugby league
half-backs are visualised in Fig. 1d. An example is “eeeefeeeeeeeeee”, denoted as
[Walk-Neutral-Straight] × 4, Walk-Neutral-Acute-Change and
[Walk-Neutral-Straight] × 9.
Hookers: 1,282 movement patterns were performed by hookers, where only 300
(approximately 23.4%) were uniquely performed. The number of times each of
the 300 movement patterns was performed by all hookers varied from 1 to 12,604.
The signature movement patterns performed by elite rugby league hookers are
visualised in Fig. 1e. An example is “vwvv” denoted as Jog-Acceleration-Acute-
Change, Jog-Acceleration-Large-Change and [Jog-Acceleration-Acute-Change]
× 2.
Loose Forwards: 1,177 movement patterns were performed by loose forwards,
of which only 13 (approximately 1.11%) were uniquely performed. The number of
times each of the 13 movement patterns was performed by all loose forwards
varied from 4 to 374,428. The signature movement patterns performed by elite
rugby league loose forwards are visualised in Fig. 1f. An example is “zznmm”,
denoted as [Run-Deceleration-Acute-Change] × 2, Jog-Deceleration-Acute-Change
and [Jog-Deceleration-Straight] × 2.
Prop Forwards: 2,712 movement patterns were performed by prop forwards,
where only 771 (approximately 28.43%) were uniquely performed. The number
of times each of the 771 movement patterns was performed by all prop-forwards
varied from 1 to 145,249. The signature movement patterns performed by elite
rugby league prop forwards are visualised in Fig. 1g. An example is “mmmmma”
denoted as [Jog-Deceleration-Straight] × 5 and Walk-Deceleration-Straight.
Second Rows: 1,967 movement patterns were performed by second rows, of which
only 194 (approximately 9.86%) were uniquely performed. The number of times
each of the 194 movement patterns was performed by all second rows varied from
1 to 115,168. The signature movement patterns performed by elite rugby league
second rows are visualised in Fig. 1h. An example is “KKKKKLKK”, denoted
as [Sprint-Deceleration-Straight] × 5, Sprint-Deceleration-Acute-Change and
[Sprint-Deceleration-Straight] × 2.
Wingers: 3,174 movement patterns were performed by wingers, of which only
1,066 (approximately 33.59%) were uniquely performed. The number of times
each of the 1,066 movement patterns was performed by all wingers varied from
1 to 28,830. The signature movement patterns performed by elite rugby league
wingers are visualised in Fig. 1i. An example is “GGSSSSSSS”, denoted as
[Run-Acceleration-Straight] × 2 and [Sprint-Acceleration-Straight] × 7.
This study was able to identify signature movement patterns of elite rugby
league players for each playing position, using closed contiguous movement
patterns, as opposed to the study [5], which did not identify signature movement
patterns of elite rugby league players for each competition level when using
longest common subsequence movement patterns.
Fig. 1. Signature Movement Patterns of Elite RFL players per Playing Positions
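| Classifier | Precision | Recall | F1 Score | Sampling Method | Variables Value | Accuracy (%) |
|---|---|---|---|---|---|---|
| Decision Tree | 0.32 | 0.31 | 0.31 | N/A | Relative Frequency | 33.99 |
| Gaussian Naïve Bayes | 0.24 | 0.16 | 0.1 | N/A | Relative Frequency | 10.41 |
| Random Forest | 0.53 | 0.43 | 0.44 | N/A | Relative Frequency | 51.9 |
| Logistic Regression | 0.53 | 0.45 | 0.46 | N/A | Relative Frequency | 53.53 |
| MLP | 0.51 | 0.49 | 0.49 | N/A | Relative Frequency | 53.79 |
| 5-NN | 0.5 | 0.46 | 0.46 | N/A | Relative Frequency | 48.36 |
| Decision Tree | 0.09 | 0.09 | 0.09 | Random Undersampler | Relative Frequency | 9.1 |
| Gaussian Naïve Bayes | 0.11 | 0.12 | 0.07 | Random Undersampler | Relative Frequency | 11.58 |
| Random Forest | 0.1 | 0.1 | 0.1 | Random Undersampler | Relative Frequency | 10.17 |
| Logistic Regression | 0.12 | 0.13 | 0.12 | Random Undersampler | Relative Frequency | 12.41 |
| MLP | 0.1 | 0.12 | 0.07 | Random Undersampler | Relative Frequency | 11.64 |
| 5-NN | 0.12 | 0.12 | 0.11 | Random Undersampler | Relative Frequency | 12.0 |
| Decision Tree | 0.46 | 0.46 | 0.46 | SMOTE | Relative Frequency | 45.51 |
| Gaussian Naïve Bayes | 0.38 | 0.23 | 0.18 | SMOTE | Relative Frequency | 22.09 |
| Random Forest | 0.73 | 0.74 | 0.73 | SMOTE | Relative Frequency | 73.41 |
| Logistic Regression | 0.6 | 0.61 | 0.6 | SMOTE | Relative Frequency | 60.37 |
| MLP | 0.73 | 0.73 | 0.73 | SMOTE | Relative Frequency | 72.85 |
| 5-NN | 0.66 | 0.65 | 0.61 | SMOTE | Relative Frequency | 63.6 |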
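| Ranking | Centres | Five Eighths | Full Backs | Half Backs | Hookers | Loose Forwards | Prop Forwards | Second Rows | Wingers |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ij | HH | SS | ij | iij | ijj | ijj | HH | ijj |
| 2 | ijj | GH | jii | ii | ef | ij | HH | ijj | iij |
| 3 | jji | ijj | ij | iij | ij | iij | ij | ij | ij |
| 4 | iij | jii | HH | GH | iii | HH | jii | iij | ji |
| 5 | HH | ii | iij | ijj | ijj | iji | SS | GH | iji |
| 6 | ii | jjj | GH | jii | ii | GH | iij | iji | jjj |
| 7 | fb | HG | iii | HH | fb | jjj | GH | jii | HH |
| 8 | GH | ff | jji | jji | mq | ii | jji | ji | jii |
| 9 | ff | iij | ej | HG | iji | jii | iii | jji | SS |
| 10 | jii | iji | iji | jj | jjj | iii | iiij | mq | ab |
| 11 | ji | iii | ijj | ji | HH | jji | iji | jj | GH |
| 12 | jjj | ji | mq | jjj | jji | ji | ji | ii | ff |
| 13 | jj | jji | ji | iii | GG | mq | jj | uv | vu |
| 14 | HG | ab | jj | jij | HG | no | uvv | fb | uv |
| 15 | fa | uuuu | ii | vuv | ji | ba | jjj | ej | ii |
| 16 | iiij | ij | uuu | iji | vuvv | uuu | ff | ef | jj |
| 17 | iii | vuuv | uuvu | iiij | uuvu | iiij | vuvvu | GG | fb |
| 18 | uv | bb | jjj | GG | ff | HG | bbb | iii | jji |
| 19 | jij | uv | vvuv | vvu | jii | vuuv | HG | no | uuuv |
| 20 | uuu | uuv | HGG | mq | SS | ff | eff | vvu | iii |
| 21 | iji | jj | vuv | GGG | ur | vuvv | ii | ff | uuuu |
| 22 | mq | qu | ff | uuu | qu | fb | HGG | iiii | uq |
| 23 | iu | jij | GGG | uu | jj | jj | mq | uvv | mq |
| 24 | vuuvu | GG | jf | bbb | ej | qq | ab | ab | vvu |
| 25 | ef | mq | efe | SS | vuv | vu | uvuuv | jij | no |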
extracted movement patterns across playing positions and it was used as inde-
pendent variables for classification modelling.
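One way to turn extracted movement patterns into relative-frequency variables is sketched below. The exact definition used in the study is not shown in this excerpt, so the choices here (counting overlapping occurrences and normalising by the total count over the selected patterns) are assumptions, as are the function names and the toy sequence.

```python
def count_occurrences(sequence, pattern):
    """Count occurrences of pattern in sequence, allowing overlaps."""
    width = len(pattern)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == pattern
    )


def relative_frequencies(sequence, patterns):
    """Share of each pattern among all counted pattern occurrences in one sequence."""
    counts = [count_occurrences(sequence, p) for p in patterns]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical player-per-fixture sequence and pattern vocabulary.
features = relative_frequencies("ijijj", ["ij", "jj", "HH"])
```

Each player-per-fixture sequence then yields one feature vector whose entries sum to one (or are all zero when none of the patterns occur), which can be passed to any of the classifiers listed in Table 1.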
The Logistic Regression classifier, fitted on the original dataset, had an
accuracy of 53.53%, with 0.53 precision, 0.45 recall and a 0.46 F1-score, for
predicting players into the nine RFL playing positions. However, the MLP
classification model achieved the maximum accuracy of 53.79%, with a recall and
F1-score of 0.49 each and a precision of 0.51. Five of the six classification
models achieved an accuracy higher than the default prediction probability, the
exception being Gaussian Naive Bayes. The low performance of the Naive Bayes
algorithm can be associated with its naive assumption that the independent
variables are independent of one another, whereas here the independent variables
are high-dimensional and some may be correlated.
The prediction of RFL players into playing positions using the undersampled
dataset had a minimum accuracy of 9.1%, with precision, recall and F1-score of
0.09 each, achieved by the Decision Tree classifier. The maximum accuracy of
12.41%, with precision and F1-score of 0.12 and recall of 0.13, was achieved by
the Logistic Regression classifier. On the undersampled dataset, the prediction
accuracies were lower than the default prediction probability in two cases (the
Decision Tree and Random Forest models).
The prediction of RFL players into playing positions, using the oversampled
dataset, had a minimum accuracy of 22.09%, with 0.38 precision, 0.23 recall and
0.18 F1-score, as achieved by the Gaussian Naive Bayes classifier. The
classifier with the maximum performance is Random Forest, with an accuracy of
73.41%, 0.73 F1-score, 0.73 precision and 0.74 recall (Table 1). Four of the six
classifiers had accuracies above 50%; the exceptions were the Decision tree and
Gaussian Naive Bayes classifiers. The Decision tree's low performance (i.e.
underfitting) can also be attributed to the large number of independent
variables, as the classifier's decision splits are based on numerous variables.
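The difference between the undersampled and oversampled datasets can be
sketched as follows. This is a minimal example of plain random resampling,
which is an assumption: the resampling strategy actually used (for instance via
the imbalanced-learn toolbox [6]) may differ, and the function name and toy
data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def resample(rows, labels, strategy, seed=0):
    """Balance classes by random under- or oversampling (illustrative only)."""
    if strategy not in ("under", "over"):
        raise ValueError("strategy must be 'under' or 'over'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    sizes = [len(members) for members in by_class.values()]
    # Undersampling shrinks every class to the smallest class size;
    # oversampling grows every class to the largest class size.
    target = min(sizes) if strategy == "under" else max(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if strategy == "under":
            chosen = rng.sample(members, target)
        else:
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Hypothetical imbalanced toy data: 7 rows of class "a", 3 rows of class "b".
rows = list(range(10))
labels = ["a"] * 7 + ["b"] * 3
_, under_labels = resample(rows, labels, "under")
_, over_labels = resample(rows, labels, "over")
print(Counter(under_labels), Counter(over_labels))
```

Undersampling discards rows from the larger classes, which leaves less training
data, whereas oversampling repeats rows from the smaller classes.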
A comparative analysis of the prediction models revealed that the individual
prediction accuracies of the classifiers were better on the original dataset
than on the undersampled dataset, and best on the oversampled dataset. For
example, the Random Forest model had an accuracy of 51.9% on the original
dataset, 10.17% on the undersampled dataset and 73.41% on the oversampled
dataset. Additionally, this study's Random Forest accuracy of 73.41% for
predicting elite rugby league players into nine playing positions is higher than
the classification accuracy of 70.1% reported by [13] for classifying elite
junior Australian football players into midfield, defence, forwards or rucks
positional groups using technical performance indicators.
SHAP values revealed the rank and contribution of each unique movement
pattern. From Table 2, the top-25 most contributing movement patterns for the
prediction of RFL players' positions varied across playing positions. This
suggests that each unique movement pattern contributed more or less information
depending on the rugby league players' playing positions. For example, the
movement pattern "mq" is the twenty-second most contributing movement pattern
for classifying players into the centres. However, it is the twenty-fifth and
twentieth most contributing movement pattern for classifying players into the
five-eighths and
RFL Positions’ Signature Movement Patterns and Prediction 153
References
1. Adeyemo, V.E., Balogun, A.O., Mojeed, H.A., Akande, N.O., Adewole, K.S.:
Ensemble-based logistic model trees for website phishing detection. In: Anbar,
M., Abdullah, N., Manickam, S. (eds.) ACeS 2020. CCIS, vol. 1347, pp. 627–641.
Springer, Singapore (2020). https://doi.org/10.1007/978-981-33-6835-4_41
2. Adeyemo, V.E., Palczewska, A., Jones, B.: LCCspm: l-length closed contiguous
sequential patterns mining algorithm to find frequent athlete movement patterns
from GPS. In: 2021 20th IEEE International Conference on Machine Learning and
Applications (ICMLA), pp. 455–460. IEEE (2021)
3. Adeyemo, V.E., Palczewska, A., Jones, B., Weaving, D.: Identification of pattern
mining algorithm for rugby league players positional groups separation based on
movement patterns. arXiv preprint arXiv:2302.14058 (2023)
4. Chambers, R., Gabbett, T.J., Cole, M.H., Beard, A.: The use of wearable microsen-
sors to quantify sport-specific movements. Sports Med. 45(7), 1065–1081 (2015).
https://doi.org/10.1007/s40279-015-0332-9
5. Collins, N., White, R., Palczewska, A., Weaving, D., Dalton-Barron, N., Jones, B.:
Moving beyond velocity derivatives; using global positioning system data to extract
sequential movement patterns at different levels of rugby league match-play. Eur.
J. Sport Sci. 23(2), 201–209 (2023)
6. Lemaître, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: a python toolbox to
tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res.
18(1), 559–563 (2017)
7. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions.
In: Advances in Neural Information Processing Systems, vol. 30 (2017)
8. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In:
Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol.
30, pp. 4765–4774. Curran Associates, Inc. (2017).
http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
9. O’Donoghue, P.: An Introduction to Performance Analysis of Sport. Routledge,
London (2014)
10. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn.
Res. 12, 2825–2830 (2011)
11. Sweeting, A.J., Aughey, R.J., Cormack, S.J., Morgan, S.: Discovering frequently
recurring movement sequences in team-sport athlete spatiotemporal data. J. Sports
Sci. 35(24), 2439–2445 (2017). https://doi.org/10.1080/02640414.2016.1273536.
PMID: 28282752
12. White, R., Palczewska, A., Weaving, D., Collins, N., Jones, B.: Sequential move-
ment pattern-mining (SMP) in field-based team-sport: a framework for quantifying
spatiotemporal data and improve training specificity? J. Sports Sci. 40(2), 164–174
(2022)
13. Woods, C.T., Veale, J., Fransen, J., Robertson, S., Collier, N.F.: Classification of
playing position in elite junior Australian football using technical skill indicators.
J. Sports Sci. 3(1), 97–103 (2018)
Boat Speed Prediction in SailGP
1 Introduction
SailGP [3] has emerged as a thrilling and revolutionary sailing championship that
showcases the pinnacle of high-performance sailing. One of the most captivating
aspects of SailGP is the sheer speed and adrenaline experienced by the boats as
they harness the power of the wind and glide above the water’s surface using
foiling technology. Foiling, a groundbreaking innovation, allows the boats to
achieve remarkable speeds and push the boundaries of what was once thought
possible in sailing. By shedding light on the intricate relationship between boat
speed and the various influencing factors, this project aims to contribute to the
broader knowledge and understanding of high-performance sailing. The findings
and predictive models derived from this project hold the potential to aid sailors,
teams, and race strategists in making informed decisions and optimizing their
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 155–164, 2024.
https://doi.org/10.1007/978-3-031-53833-9_13
156 B. Zentai and L. Toka
2 Background
In the context of SailGP, several key terminologies play a vital role in under-
standing the dynamics of the sport. Foiling refers to the technique of raising the
catamaran’s hulls out of the water using hydrofoils, which are positioned beneath
each hull. These hydrofoils, commonly known as daggerboards, provide lift and
stability to the boat. Additionally, foils can also be present on the rudders, which
are horizontal fins located at the stern of each hull. These rudder foils signifi-
cantly contribute to the overall lift and control of the catamaran, amplifying its
maneuverability and maximizing its speed on the water.
Regarding the sails, the boats are equipped with innovative wingsails, which
are designed to resemble the wings of an aircraft. These wingsails are crucial to
the foiling technique and provide stability and aerodynamic efficiency to harness
the wind’s power effectively. The innovative design of SailGP boats enables them
to achieve remarkable speeds, soaring above the water at exhilarating velocities.
This technology, combined with the skillful navigation of the sailors, unlocks a
new level of performance and competitiveness in the world of sailing.
3 Data Set
Through the use of advanced sensors and onboard data collection systems, teams
gather vast amounts of data during training sessions and races. The SailGP
dataset we used in this project provides a comprehensive and detailed record of
a single race, spanning approximately 12 min. What makes this dataset particu-
larly valuable is the fact that it captures data at a granularity of one record per
second, offering a high-resolution view of the race dynamics and performance.
With nearly 60 features available, encompassing various aspects of boat speed,
wind speed, rudder, daggerboard, jib (headsail), and other critical components,
the dataset offers a wealth of numerical information for analysis and modeling.
It also contains non-numeric features, including boat names and maneuver labels.
However, obtaining a dataset of this complexity presented significant chal-
lenges due to the strict confidentiality surrounding SailGP data. As the infor-
mation is considered highly sensitive and confidential, gaining access to such
a dataset required extensive effort and collaboration. To overcome this obsta-
cle, several email communications were initiated with the top teams involved in
SailGP, seeking their assistance and cooperation in acquiring the necessary data.
The process involved establishing trust, explaining the project objectives, and
assuring the teams of the utmost confidentiality and data protection. Through
persistent communication and the cooperation of the teams, access to this intri-
cate and valuable dataset was ultimately secured. The complexity and sensitivity
of the data within the SailGP dataset make it a rarity in the field of sailing ana-
lytics.
The inclusion of various units of measurement, such as degrees and knots,
further adds to the intricacy of the dataset. The diverse nature of the features
and the high temporal resolution provide a unique opportunity to gain deep
insights into the race dynamics, team performance, and the impact of different
components on the overall outcome. By successfully navigating the challenges
associated with data confidentiality and secrecy, we have been able to leverage
the richness of the SailGP dataset, allowing for detailed analysis, modeling, and
valuable insights into the intricacies of high-performance sailing.
This dataset includes data from the Bermuda event in 2021, specifically Race
4 which took place on the second day of the event [9]. During this race, all the
boats were competing to secure a spot in the final, where the top three teams
from that weekend’s event would face off.
Fig. 1. This figure depicts the boat speed of each team recorded throughout the dura-
tion of the race. The data collection begins at T-30 s, and it is evident from the chart
that every team endeavors to maximize their speed leading up to the start. Notably,
during this phase and the initial leg, the teams exhibit their highest speeds, with
team GBR reaching a remarkable 50.0 knots (92.6 km/h). The chart also reveals slight
drops in speed during maneuvers and discernible variations in speed during upwind
and downwind sections. It is worth noting that in this race, both team JPN and team
USA encountered a collision during the first upwind leg, resulting in their withdrawal
from the race.
Additionally, the mean squared error (MSE) for the random forest regression
model was calculated to be 4.5. A lower MSE value indicates better model
performance, as it represents reduced prediction errors. Taking the square root
of the MSE gives a root mean squared error of approximately 2.1 knots, which
indicates the typical deviation of the predicted boat speed from the true
values (Fig. 2).
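The step from an MSE of 4.5 to a deviation of roughly 2.1 knots is a square
root. The short sketch below works through that arithmetic and converts the
result to km/h; the helper name mean_squared_error and the two-point toy
example are illustrative assumptions, not part of the original analysis.

```python
import math

KNOT_IN_KMH = 1.852  # one knot is exactly 1.852 km/h


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


reported_mse = 4.5                    # value reported in the text, in knots squared
rmse_knots = math.sqrt(reported_mse)  # about 2.12 knots
rmse_kmh = rmse_knots * KNOT_IN_KMH   # about 3.93 km/h

print(f"RMSE: {rmse_knots:.2f} knots ({rmse_kmh:.2f} km/h)")

# Toy check of the MSE helper on two hypothetical speed readings (knots).
print(mean_squared_error([30.0, 40.0], [31.0, 37.0]))
```

Because the errors are squared before averaging, the root mean squared error
weights large misses more heavily than a mean absolute error would.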
Fig. 2. This scatter plot provides a visual comparison between the actual boat speeds
and the predicted boat speeds from the meta-model. The points on the plot are col-
ored differently, indicating overestimation (red points) or correct/underestimation (blue
points). The plot of the random forest regression reveals a concentration of predicted
boat speeds around 30 knots (55.6 km/h). Notably, there is a tendency for overestima-
tion within the range of 25 knots (46.3 km/h) to 30 knots (55.6 km/h). This observation
suggests that the model tends to predict slightly higher boat speeds within this par-
ticular range. (Color figure online)
Fig. 3. The model exhibits a lower mean squared error (MSE), indicating a reduced
deviation between the predicted and actual boat speeds. Notably, the plot demonstrates
improved accuracy in the higher speed ranges, particularly for boat speeds exceeding 40
knots (74 km/h). This observation suggests that the model’s predictions align closely
with the actual boat speeds in these higher ranges, indicating a higher level of precision
and accuracy in those predictions.
4.4 Benchmark
The decision to create a stacked model by combining the random forest regres-
sion model and the gradient boosted regression model was driven by the desire
to leverage the strengths of both models and improve overall predictive per-
formance. Each model may have its own unique strengths and weaknesses in
capturing the complex relationships within the data. By combining their pre-
dictions, the stacked model potentially captured a broader range of patterns
and produced more accurate predictions. The idea behind stacking is to learn
from the individual models’ outputs and build a meta-model that combines their
strengths, ultimately aiming for improved predictive power. The performance of
the meta-model is evaluated by calculating the Mean Squared Error (MSE) and
R² score, which provide insights into the accuracy and goodness-of-fit of the
model (Fig. 4).
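The stacking idea described above can be sketched in a few lines. To keep the
example self-contained, two deliberately simple base learners (a least-squares
line and a k-nearest-neighbour average) stand in for the random forest and
gradient boosted models, and the meta-model is a two-weight linear combination
fitted on out-of-fold predictions. All function names, the synthetic wind and
boat speed data, and the fold count are assumptions for illustration only.

```python
import random
from statistics import mean


def fit_linear(xs, ys):
    """Least-squares line; stand-in for the first base model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour average; stand-in for the second base model."""
    pairs = list(zip(xs, ys))
    return lambda x: mean(y for _, y in sorted(pairs, key=lambda p: abs(p[0] - x))[:k])


def fit_meta(p1, p2, ys):
    """Least-squares weights for y ~ w1 * p1 + w2 * p2 (2x2 normal equations)."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, ys))
    t2 = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # base predictions are collinear: fall back to averaging
        return 0.5, 0.5
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


def fit_stack(xs, ys, n_folds=5):
    """Fit base models, then a meta-model on their out-of-fold predictions."""
    n = len(xs)
    oof1, oof2 = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        m1, m2 = fit_linear(tx, ty), fit_knn(tx, ty)
        for i in range(fold, n, n_folds):  # held-out rows of this fold
            oof1[i], oof2[i] = m1(xs[i]), m2(xs[i])
    w1, w2 = fit_meta(oof1, oof2, ys)
    m1, m2 = fit_linear(xs, ys), fit_knn(xs, ys)  # refit base models on all rows
    return lambda x: w1 * m1(x) + w2 * m2(x)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data: hypothetical wind speed (knots) -> boat speed (knots).
rng = random.Random(0)
wind = [rng.uniform(8, 20) for _ in range(60)]
speed = [2.0 * w + 0.05 * w * w + rng.gauss(0, 1.0) for w in wind]
model = fit_stack(wind[:45], speed[:45])
pred = [model(w) for w in wind[45:]]
print(f"MSE={mse(speed[45:], pred):.2f}  R2={r2_score(speed[45:], pred):.2f}")
```

Fitting the meta-model on out-of-fold predictions, rather than on predictions
for rows the base models were trained on, keeps the combination weights from
simply rewarding whichever base model overfits the most.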
Fig. 4. The plot reveals the effectiveness of predicting boat speed, particularly in the
higher speed ranges of 30+ knots (55.6 km/h). Moreover, the stacked model exhibits a
lower deviation, indicating its ability to make precise predictions with reduced errors.
However, it’s worth noting that there is a tendency for overestimation around 30 knots
(55.6 km/h) across all of the models, suggesting a potential area for further refinement.
5 Conclusion
SailGP stands at the forefront of a rapidly evolving and highly competitive
sailing landscape. This dynamic sport is characterized by its cutting-edge tech-
nology, intense competition, and a strong reliance on data-driven strategies. As
SailGP continues to develop and push the boundaries of what is possible in
sailing, the role of analytics and data mining becomes increasingly vital. It is
worth noting that, within the research community, our work represents a pio-
neering effort in exploring the realm of data mining in the context of SailGP.
By delving into this uncharted territory, we aim to shed light on the untapped
potential of leveraging data to gain insights and unlock performance gains in
this exhilarating sport.
Overall, the results indicate that the stacked regression model, incorporating
the predictions from RF and GBM models, along with the meta-model, yields
promising outcomes in predicting boat speeds. The relatively low MSE and high
R² score suggest that the model captures and explains a significant portion of
the boat speed variations. These findings indicate the potential practical utility
of the model in estimating boat speeds accurately and its value in optimizing
performance in the context of sailboat racing.
The teams’ integration of data analytics to optimize their boats’ performance
showcases the increasing significance of data-driven decision-making within the
realm of competitive sailing. In SailGP, where the margins between victory and
defeat are often razor-thin, harnessing the power of data becomes crucial for
gaining a competitive edge.
Looking ahead, an exciting opportunity lies in the collaboration with SailGP,
which will grant access to further datasets and potentially real-time racing data.
By working closely with SailGP, we aim to expand our predictive models to
encompass a wider range of scenarios and to obtain more accurate predictions
during live racing events.
References
1. Oracle cloud infrastructure. https://www.oracle.com/customers/sailgp/. Accessed
01 June 2023
2. SailGP. https://sailgp.com/news/nobu-katori-joins-japan/. Accessed 01 June 2023
3. SailGP homepage. https://sailgp.com/. Accessed 01 June 2023
4. Sklearn gradient boosting regression. https://bit.ly/3YoA6ch. Accessed 01 June
2023
1 Introduction
A new generation of detailed sports data is emerging for sports analysis in gen-
eral, including racket sports that require more precise analysis. A flagship exam-
ple is the TTNet [13] video tracking system, which enables real-time identifi-
cation of players and ball positions. This level of detail represents a paradigm
shift, as tracking data [8] of this kind is typically under-explored in such sports.
Meanwhile, a plethora of advanced tools are beginning to leverage this data,
such as iTTvis [13], which is aimed at experts to explore game sequences and
discover tactics. Other approaches also focus on sequence analysis to visually
explore frequent patterns [3], using an acyclic graph to represent all points in
a match, and extract tactics. TIVEE [2] leverages shot types, player positions,
and shuttle trajectory and speed to find correlations between strokes, aiding in
the discovery of tactics. TacticFlow [12] utilizes multivariate events in racket
sports to mine frequent patterns and detect how these patterns change over
time. Tac-Miner [11] allows users to analyze, explore, and compare tactics of
multiple matches based on three consecutive strokes. All of these works share
the commonality of being driven by the availability of detailed tracking data.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 167–178, 2024.
https://doi.org/10.1007/978-3-031-53833-9_14
168 G. Calmet et al.
Fig. 1. Example of an extended table tennis detailed data model (from [3]). It includes
additional metadata (e. g., players’ names, score and winner) with advanced strokes
types, players and ball rebound zones. It also takes into account continuous player
positions and ball position (that we haven’t yet collected).
We hypothesize that such detailed data enable deeper game analysis, and
thus need to be anticipated. Figure 1 illustrates a detailed table tennis data model
that captures all data currently available (mostly meta and event-based data). In
this work, we use a combination of event-based data and video tracking data, in
a 2D space. Such data can be collected with regular technical skills using a blend
of computer vision, deep learning and manual annotation tools. It extends the
format previously used in [3] with finer-grained player position, orientation
and ball rebound position. Several studies have sought to collect such data
automatically. [6] uses twin convolutional neural networks with 3D convolutions
on RGB data and optical flow to classify strokes in table tennis. [5] shows the
importance of optical flow and human detection algorithms for improving action
detection. [9] uses a CNN layer inspired by optical flow algorithms for action
recognition without having to compute optical flow. TTNet [13] is a multi-task
convolution-based neural network that collects position and stroke event data
simultaneously. Additional data could be collected, such as the 3D position of
players as well as ball effects, but this requires more work on video detection.
We discuss this part in the last section of this paper.
Exploring Table Tennis Analytics 169
To illustrate our analysis in this paper, we use the following scenario from an
international table tennis game: Lebrun, the French champion, against
Zhendong, the world champion and number one player in the world, in the
quarter-final match of the WTT Championship in Macao, 2023. In this match,
Lebrun won 3 sets to 2, with the two players taking turns winning the sets. It
was a really close game, and Lebrun won the decider 11–9 by touching the edge
of the table. During this match, our experts noticed that Lebrun was very
strong when attacking from the left side of the table. Usually, he would win the
point just after his attack, often down the line with his backhand. If we focus
on the first set, we can see that Lebrun's domination decreased after the
second point, during a phase in which he did not execute these shots. However,
after his domination increased again, these shots became more and more common.
Moreover, we noticed that during an important moment (7–4 for Zhendong), he
managed to score twice using these shots, and this helped him take the lead in
the set. We may suppose that this is an important feature of his game plan. In
the first set, we found 4 points won by Lebrun when he makes these strokes
(indicated by the red vertical lines in Fig. 2):
We used various data from Fig. 1 (scores, positions of both players, zone of
rebound, type of stroke, laterality) to define the domination function D(t),
normalized between −1 and 1, to indicate which team dominates. At the beginning
of the match, no team dominates; in other words, D(0) = 0. As domination usually
relies on many factors (e.g., endurance, precision, self-confidence, power,
speed, trajectory prediction, agility, decision-making, strategy, to name a few),
we consider multiple types of domination: score, physical and mental. We are
aware that three functions will not be enough to analyze every aspect of a table
tennis match; this definition is an initial approach that inevitably has many
limitations.
2 https://github.com/centralelyon/table-tennis-analytics.
Fig. 2. Detailed metrics during the first set of a match between Lebrun and
Zhendong at the WTT Championships in Macao, China in 2023. Red vertical lines
mark the 4 points of the first set that we focused on. (Color figure online)
Fig. 3. Theoretical structure of the Playing Patterns Trees (PPT) that enumerates
all combinations of shot attributes.
1. The children of a zone node or of the root are laterality nodes: backhand
and forehand.
2. The children of a laterality node are type nodes: right side and left side
for services, and offensive, push and defensive for the other strokes.
3. The children of a type node are zone nodes according to the zone of rebound
of the ball (d1, d2, d3, m1, m2, m3, g1, g2, g3). It also has a child named
fault if the rally ends there.
Each node stores the probability that the sequence results in a win.
Theoretically, the PPT up to the third stroke contains 62,651 nodes, but in
reality, many of them are never explored because they represent unlikely
sequences. For instance, after an offensive stroke, it is unlikely to find a
short zone of rebound like d1, m1, or g1. In practice, the trees built from
several real match analyses have no more than 2,000 nodes. We have built our PPT
from the analysis of 9 simulated matches, augmented from 3 different sets
annotated manually.
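The tree described above can be sketched as a prefix tree in which every node
counts how often the sequence leading to it ended in a win. The class and
function names below are hypothetical, the two rallies are made-up examples, and
"win" is assumed to be counted from the point of view of one player of
interest; the sketch only illustrates the data structure, not the actual
implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """One node of a playing-pattern tree (a laterality, type or zone node)."""
    wins: int = 0
    visits: int = 0
    children: dict = field(default_factory=dict)

    def win_probability(self):
        """Share of rallies through this node that ended in a win, if any."""
        return self.wins / self.visits if self.visits else None


def add_rally(root, strokes, won):
    """Insert one rally; strokes is a list of (laterality, type, zone) tuples."""
    root.visits += 1
    root.wins += int(won)
    node = root
    for stroke in strokes:
        for attribute in stroke:  # descend laterality -> type -> zone
            node = node.children.setdefault(attribute, PatternNode())
            node.visits += 1
            node.wins += int(won)


def find(root, path):
    """Return the node reached by following path, or None if it was never seen."""
    node = root
    for key in path:
        node = node.children.get(key)
        if node is None:
            return None
    return node


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Two hypothetical rallies that share the same service.
root = PatternNode()
add_rally(root, [("forehand", "right side", "m1"), ("backhand", "push", "d2")], won=True)
add_rally(root, [("forehand", "right side", "m1"), ("backhand", "offensive", "g3")], won=False)

service = find(root, ["forehand", "right side", "m1"])
print(service.win_probability(), count_nodes(root))
```

Only sequences that actually occur create nodes, which is consistent with the
observation that real trees stay far smaller than the theoretical maximum.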
This metric is particularly interesting because it allows us to introduce the
concept of chance (or unlikely success), and its analysis can explain certain
subtleties of mental domination. As Fig. 2 suggests, the expected score respects
the global match outcome, 3–2 for Lebrun. However, the set winners are not
always the same as expected. The third set is particularly interesting because
Lebrun wins by a wide margin and dominates during the whole set. But the
expected score is totally different: he is expected to lose by a wide margin.
This can be explained by the fact that he has just lost the previous set and now
needs to be careful. Moreover, Zhendong has just come back to a draw and may be
less focused: he still plays aggressively, which means he has opportunities but
makes mistakes. The fourth set is similar: both players are very close in terms
of expected score, but Lebrun loses by a wide margin, as Zhendong did in the
previous set. He has just won the previous set, he is less focused, and he makes
mistakes. This is an important feature that could be useful for the
understanding of mental domination.
An important remark is that this metric isn’t used to point the finger at
players who are lucky; it is used to show how luck can sometimes work in a
player’s favor to gain a mental advantage. Moreover, what we call ’luck’ is only
those sequences that are statistically losing and still result in a win. It is quite
possible that a precise refinement of the quality of the stroke will be undetectable
in our analysis and will allow a losing sequence to become a win. For instance,
this metric doesn’t quite work with players that are extremely creative, like
Lebrun. This leads us to our last metric.
Being able to vary playing patterns during a match is one of the keys to victory
in table tennis. A player who always responds in the same way to a sequence is
bound to lose in the long term, even if their technique is perfect. However, it is
well known that humans are particularly bad at creating randomness, especially
when things are going fast and when the mind is in automatic mode. Therefore,
analyzing the variation of playing patterns during a set should be an interesting
way to look at the mental domination.
Fig. 4. Distance matrix between openings of the match between Lebrun and
Zhendong at the WTT Championships in Macao, 2023. At the beginning, Lebrun
doesn't vary much, probably to start with his strengths and take the lead. Only
then does he start to change, to keep surprising his opponent with new openings.
During the first set, Zhendong started to lose when Lebrun started to vary his
openings. The most interesting analysis is from the last set. We can see that
Zhendong did not vary his openings much during this set (white square). We can
suppose that he noticed that these tactics were efficient, and he wanted to take
the lead at the beginning of the set. But Lebrun adapted to this and managed to
come back. After that, Zhendong never tried to change his pattern and lost the
match. This may reflect mental fatigue on Zhendong's part (maybe under stress he
wanted to stay with something familiar to him, or maybe he wasn't lucid enough
to take the decision to change his opening).
In a previous paper [3], we saw that some players tend to keep serving in the
same way as long as they do not lose a point after such a serve. Here, we go
further in the sense that we explore more strokes into the rally, and because we
create a metric representing the distance between two openings. By collecting
the first three strokes of every rally of a match, we can calculate similarities
between sequences.
$$D(U, V) = \sum_{i=1}^{n} (n - i) \cdot d(U_i, V_i) \qquad (1)$$
with
– $d(U_i, V_i) = 0$ if $U_i = V_i$,
– $d(U_i, V_i) = 1$ if $U_i \neq V_i$ and if $U_i$ and $V_i$ are laterality nodes or type nodes,
– $d(U_i, V_i) = M_{j,k}$ if $U_i$ and $V_i$ are zone nodes, where $M$ is the zones' adjacency
matrix and where $j$ and $k$ are respectively the indices for the zones $U_i$ and
$V_i$ in $M$.
For a given list of openings $M = (M_i)_{i \in [0,m]}$, we can build the distance matrix
defined as $\mathrm{Dist}(M) = (D(M_i, M_j))_{(i,j) \in [0,m]^2}$. A feature worth attention in Fig. 4
is the similarity of consecutive sequences, which appears as white squares on the
diagonals of both matrices. Because of the temporal aspect of this figure, we can
see that Zhendong tends to vary less in his openings at the end of the match, and
this can be a sign of mental fatigue.
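Eq. (1) and the distance matrix can be sketched as follows. Each opening is
assumed to be a flat sequence in which every stroke contributes a laterality,
a type and a zone node, in that order. The values of the zones' adjacency
matrix are not given here, so a Manhattan distance on a 3x3 grid of zones is
used as a hypothetical stand-in, and the fault zone is not handled. All names
and the two openings are illustrative.

```python
COLUMNS = {"d": 0, "m": 1, "g": 2}  # assumed lateral ordering of the zone letters


def zone_distance(zone_a, zone_b):
    """Hypothetical stand-in for M[j][k]: Manhattan distance on the zone grid."""
    col_a, depth_a = COLUMNS[zone_a[0]], int(zone_a[1])
    col_b, depth_b = COLUMNS[zone_b[0]], int(zone_b[1])
    return abs(col_a - col_b) + abs(depth_a - depth_b)


def node_distance(a, b, is_zone):
    """d(U_i, V_i): 0 if equal, the zone distance for zones, otherwise 1."""
    if a == b:
        return 0
    return zone_distance(a, b) if is_zone else 1


def opening_distance(u, v):
    """D(U, V) = sum over i = 1..n of (n - i) * d(U_i, V_i)."""
    if len(u) != len(v):
        raise ValueError("openings must have the same length")
    n = len(u)
    return sum(
        (n - i) * node_distance(a, b, is_zone=(i % 3 == 0))  # every third node is a zone
        for i, (a, b) in enumerate(zip(u, v), start=1)
    )


def distance_matrix(openings):
    """Dist(M): pairwise distances between all openings in the list."""
    return [[opening_distance(a, b) for b in openings] for a in openings]


# Two hypothetical openings of three strokes each (n = 9 nodes).
u = ("forehand", "right side", "m1", "backhand", "push", "d2",
     "forehand", "offensive", "g3")
v = ("forehand", "right side", "m1", "backhand", "offensive", "m2",
     "forehand", "offensive", "d3")
print(opening_distance(u, v))
print(distance_matrix([u, v]))
```

The weight (n − i) makes early differences count more than late ones, and, read
literally, gives the last node of the sequence a weight of zero, which is why
the differing final zones in this example do not contribute.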
new metrics. We also plan to update these datasets with even more detailed
data, including better 3D pose estimation, ball spin effects, and trajectories.
A Appendix
Considering the scores being $a$ for player A, and $b$ for player B, we define the
probability for A to win the next point by $p = \frac{a}{a+b}$.
Then we can calculate the winning probability of A knowing the scores (noted
$P_{a,b}$) by using the following recursive formula,
$$P_{a,b} = p\,P_{a+1,b} + (1 - p)\,P_{a,b+1} = \frac{1}{a+b}\left(a\,P_{a+1,b} + b\,P_{a,b+1}\right)$$
and by applying those limit conditions:
Because of the quite extreme winning probabilities that we encounter for low
scores, we added another condition to complete the model:
Fig. 5. Winning probability $P_{a,b}$ (vertical axis) as a function of the scores $a$ and $b$
(horizontal axes)
For the winning probability of a match, the same process is applied, taking
into account the probability to win the current set.
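The recursion for the winning probability of a set can be sketched with
memoization. The boundary conditions used below are assumptions made for this
example and may differ from the conditions of the model: a set is won at 11
points with a two-point lead, the point probability is taken as 0.5 at 0–0
where a/(a+b) is undefined, and the recursion is cut off after a fixed number
of points by returning p itself. The function name and constants are
hypothetical.

```python
from functools import lru_cache

TARGET = 11  # points needed to win a set
LEAD = 2     # required lead to close the set
CAP = 40     # assumption: stop recursing once this many points have been played


@lru_cache(maxsize=None)
def win_probability(a, b):
    """P(a, b): probability that player A wins the set from the score a-b."""
    if a >= TARGET and a - b >= LEAD:
        return 1.0
    if b >= TARGET and b - a >= LEAD:
        return 0.0
    # Probability that A wins the next point; 0.5 at 0-0 is an assumption.
    p = a / (a + b) if a + b else 0.5
    if a + b >= CAP:
        return p  # assumed truncation for very long sets
    return p * win_probability(a + 1, b) + (1 - p) * win_probability(a, b + 1)


for score in [(0, 0), (1, 0), (5, 3), (9, 9), (10, 8)]:
    print(score, round(win_probability(*score), 3))
```

With these assumptions the score 1–0 already gives a winning probability of 1,
since p = 1 whenever b = 0, which illustrates the extreme probabilities at low
scores mentioned above.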
References
1. Andrienko, G., et al.: Visual analysis of pressure in football. Data Min. Knowl.
Disc. 31(6), 1793–1839 (2017). https://doi.org/10.1007/s10618-017-0513-2
2. Chu, X., et al.: TIVEE: visual exploration and explanation of badminton tactics in
immersive visualizations. IEEE Trans. Vis. Comput. Graph. 28(1), 118–128 (2021)
3. Duluard, P., Li, X., Plantevit, M., Robardet, C., Vuillemot, R.: Discovering and
visualizing tactics in a table tennis game based on subgroup discovery. In: Brefeld,
U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA 2022. CCIS, vol. 1783,
pp. 101–112. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27527-2_8
4. Green, S.: Assessing the performance of premier leauge goalscorers. OptaPro
Blog (2012).
https://www.statsperform.com/resource/assessing-the-performance-of-premier-league-goalscorers/
5. Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J.: Towards understand-
ing action recognition. In: 2013 IEEE International Conference on Computer
Vision, pp. 3192–3199. IEEE, Sydney (2013). https://doi.org/10.1109/ICCV.2013.396.
http://ieeexplore.ieee.org/document/6751508/
6. Martin, P.E., Benois-Pineau, J., Peteri, R., Morlier, J.: 3D attention mechanism
for fine-grained classification of table tennis strokes using a Twin Spatio-Temporal
Convolutional Neural Networks. In: 2020 25th International Conference on Pattern
Recognition (ICPR), pp. 6019–6026. IEEE, Milan (2021).
https://doi.org/10.1109/ICPR48806.2021.9412742. https://ieeexplore.ieee.org/document/9412742/
7. Mead, J., O’Hare, A., McMenemy, P.: Expected goals in football: improving model
performance and demonstrating value. PLoS ONE 18(4), e0282295 (2023)
8. Perin, C., Vuillemot, R., Stolper, C.D., Stasko, J.T., Wood, J., Carpendale, S.:
State of the art of sports data visualization. In: Computer Graphics Forum (Euro-
Vis 2018), vol. 37, no. 3, pp. 663–686 (2018). https://doi.org/10.1111/cgf.13447.
https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13447
9. Piergiovanni, A., Ryoo, M.S.: Representation flow for action recognition. In: 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.
9937–9945. IEEE, Long Beach (2019). https://doi.org/10.1109/CVPR.2019.01018.
https://ieeexplore.ieee.org/document/8953712/
10. Rolland, G., Vuillemot, R., Bos, W.J., Rivière, N.: Characterization of space and
time-dependence of 3-point shots in basketball. In: MIT Sloan Sports Analytics
Conference (2020)
11. Wang, J., Wu, J., Cao, A., Zhou, Z., Zhang, H., Wu, Y.: Tac-miner: visual tac-
tic mining for multiple table tennis matches. IEEE Trans. Vis. Comput. Graph.
27(6), 2770–2782 (2021). https://doi.org/10.1109/TVCG.2021.3074576.
https://ieeexplore.ieee.org/document/9411869
12. Wu, J., Liu, D., Guo, Z., Xu, Q., Wu, Y.: TacticFlow: visual analytics of ever-
changing tactics in racket sports. IEEE Trans. Vis. Comput. Graph. 28, 835–845
(2022). https://doi.org/10.1109/TVCG.2021.3114832.
https://ieeexplore.ieee.org/document/9552436
13. Wu, Y., et al.: iTTVis: interactive visualization of table tennis data. IEEE Trans.
Vis. Comput. Graph. 24(1), 709–718 (2017)
14. Zhu, Y., Naikar, R.: Predicting tennis serve directions with machine learning. In:
Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA 2022. CCIS,
vol. 1783, pp. 89–100. Springer, Cham (2023).
https://doi.org/10.1007/978-3-031-27527-2_7
Performance Measurement 2.0: Towards
a Data-Driven Cyclist Specialization Evaluation
Abstract. Current cycling analytics solutions account for neither the raced course
profile nor the level of the competition. Therefore, this paper suggests a two-stage
approach which first groups races into coherent clusters based upon both elevation
and road surface type. Subsequently, underlying skill levels are determined per
cluster through the observed race results. Our results indicate that the methodology
yields clusters which match the commonly known specializations in road cycling.
The ranking methodology yields skill ratings which enable the identification of
specialization and can be used in other downstream tasks.
1 Introduction
The field of professional cycling has seen major changes in recent years. In the last couple
of decades, the sport has shifted to appeal to a wider range of regions and nationalities
(Van Reeth, 2016). This shift has also resulted in many new innovations being introduced
in the sport, including the use of data analytics techniques, colloquially known as cycling
analytics. These innovations have created large value in the sport, where racing is now
more data-driven, with riders racing based upon their known strengths and weaknesses
as measured through their power profiles.
While descriptive techniques, such as measuring threshold power, are deeply embedded
in the field, a different story is observed for predictive techniques. This type of
approach was only recently introduced in the field (Hilmkil et al., 2018; Kataoka & Gray,
2018), which could allow for even deeper tactical advantages as team managers and riders
could now start anticipating future behavior. Initial approaches were mainly based
upon riders from within one team, as teams typically have much richer information on
their own riders (e.g., detailed power outputs, and training regimes) compared to riders
who compete for rival teams. However, researchers quickly realized the potential in esti-
mating future performances of athletes across teams, which allowed for race outcome
estimation (Kholkine et al., 2020; Kholkine et al., 2021), and future talent identifica-
tion (Van Bulck, Vande Weghe & Goossens, 2021; Janssens, Bogaert & Maton, 2022).
Unfortunately, these applications need to work with publicly available data as perfor-
mances need to be compared across athletes and teams. This entails that current solutions
primarily focus on eventual race results, as they give detailed information on the even-
tual outcome of a highly competitive race. These individual race results are then simply
aggregated (Van Bulck et al., 2021), grouped manually (Janssens et al., 2022), or linked
in a race-by-race method (Kholkine et al., 2021).
These approaches disregard several aspects of the competitiveness typically observed
in professional cycling. Race profile type heavily influences which riders are the a priori
race favorites, with heavier athletes typically thriving more on flatter terrain compared
to their lightweight colleagues who generally thrive in mountainous terrain. In contrast
to other racing sports, cycling is heavily influenced by in-race dynamics and tactics. This
means that one cannot directly measure performance through absolute values such as race
time. Therefore, current solutions look at riders' finishing places. However, some races
see very different starting fields across the years, which means that a third place in a
certain edition could be a stronger performance than a win in another year. Notably, this was
recently discussed by cycling observers, who debated whether Geraint Thomas performed
better during the 2022 Tour (where he finished third) than during his victorious
2018 edition (Ozols, 2022).
Ideally, performance measurement, which is currently used as the dependent variable in
several studies, should account for race profile and level of competition. Therefore, this
study sets out to formulate a methodology that does so. Specifically, we
propose a two-stage process. In the first step, the races are clustered using information on
both course elevation and ridden surface types. Semi-supervised constrained clustering
is shown to outperform traditional unsupervised clustering techniques by leveraging
community expertise on race similarity. After races are clustered into coherent groups,
the second step ranks the performance of each rider per cluster. The Bayesian TrueSkill
algorithm is used for this purpose, a novelty in the field of cycling analytics and
multi-entrant sports competitions as a whole. This ranking algorithm has several interesting
properties, such as the capability to account for multiple participants (i.e., more than
two) and the fact that it values performance relative to the level of the competition, as
opposed to traditional points systems. Our results show that trustworthy rankings are
produced when performance is quantified on an individual level rather than on the team
level.
2 Literature Review
Wearable sensors (e.g., heart rate monitors, power sensors) have revolutionized the field
of professional cycling. The quintessential role these detailed data points play in the
current professional cycling industry inspired many data mining researchers to develop
automated tools which enable value estimates without the need for all possible sensors
(e.g., Hilmkil et al., 2018; Kataoka & Gray, 2018; de Leeuw et al., 2022). Recently, these
systems have also reached subfields of the sport such as track cycling (Steyaert, De
Bock & Verstockt, 2022) and cyclocross (De Bock & Verstockt, 2021).
These methods are extremely valuable for stakeholders from within teams. However,
they do not allow for insights across teams. To know which athlete is most likely to
succeed in winning a certain race we would need to have fitness level estimates across
all participating teams, which teams of course are not willing to share. Moreover, teams
often do not even want to share ‘less sensitive’ unhandled data such as power outputs
and heart rate responses, with athletes often hiding this type of information on platforms
such as Strava. Therefore, we can observe a trend where researchers who are interested
in comparing athletes across teams use less fine-grained data sources due to availability
reasons.
This is typically observed in race outcome prediction research. In this field,
researchers try to accurately estimate the most likely outcome of individual races. Initial
developments were made by Kholkine et al. (2020) who develop a model to predict the
outcome of the Tour of Flanders, where the authors handpick several races as being the
most informative races to look at previous performances. The fact that these races need
to be selected indicates a grouping of race specializations. Generally speaking, grand
tour performances have limited information towards predicting the outcome of several
classics races such as the Tour of Flanders. Rather, information needs to be searched
in more closely linked races as the ones selected by the authors. However, handpicking
similar races seems a tedious task when dealing with the thousands of races which are
being raced on the professional calendar across seasons. This is even further complicated
by the fact that not all riders compete in all races, as also indicated by a high number
of missing values in the study. Moreover, the study also highlights how performance
is currently hard to measure in professional cycling, with the ordered predicted relative
time used as dependent feature. Time differences in cycling are typically relatively small
compared to the total raced time, and heavily influenced by race tactics, and the level of
the other participants. These issues are alleviated to a certain degree in Kholkine et al.
(2021). Their method is generalized to include six classics and the authors acknowledge
the importance of ‘relative’ performances by adopting a learn-to-rank approach. However,
relevant races are still handpicked, and overall historic performance is calculated
by points scored, which does not account for the level of competition.
Another major field of research is talent identification, where researchers use machine
learning methods to automatically detect which prospects show the greatest potential.
Interestingly, limitations similar to those of the race outcome prediction studies can be observed.
Both Van Bulck et al. (2021) and Janssens et al. (2022) use future points scored as the
dependent variable in their predictive set-up. This feature, however, is inevitably flawed
as it does not account for the level of the competition. Moreover, the issue of race-to-race
relevance remains. To limit this issue, Van Bulck et al. (2021) group races based upon
CQ ranking labels: sprints, mountain stages, time trials, general classification, and hilly,
with the latter being a rest category of races which are unlabeled on CQ. The large hill
category, as well as the lack of a typical category such as cobbled races, already indicates
a limitation in this data source. Janssens et al. (2022) follow a manual approach, which
groups several races into eight homogeneous categories. Once again,
this method is hard to extrapolate to a system which tries to evaluate performance across
all races.
182 B. Janssens and M. Bogaert
3 Methodology
3.1 Data
Several aspects of the racecourse influence which riders are the a priori favorites of
a race. Perhaps the most well-known aspect is elevation, with the physical demands
required for flat sprint stages (Menaspà et al., 2015) differing heavily from those required
in mountain climbs (Lucia, Joyos & Chicharro, 2000). This translates into a range of
rider specialties which excel based upon the elevation changes which are encountered
during the race. Detailed information on racecourse elevation is, however, not directly
available on popular data sources such as ProCyclingStats (Kholkine et al., 2020) or CQ
Ranking (Van Bulck et al., 2021). Rather, we use a community-based website1 which
discusses races in detail, and also shares detailed information about the racecourse,
which fans can then use to spot riders during the race or to ride the course themselves
during recreational rides. The website offers the potential to download the course as a
GPX (GPS eXchange Format) file. A web scraper was built to retrieve all race profiles
which were available on the website during Spring 2022. The GPS data in the GPX file
are stored in the form of the sequence of the GPS points forming the GPS track as used
for navigation. Geographical coordinates in the GPX file are supplemented with the data
concerning elevation above sea level, which allows for calculating metrics related to
course elevation.
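As a minimal sketch of how such elevation metrics could be derived, the Python snippet below parses the elevation values of a GPX track and computes total ascent, total descent, and maximum elevation. The embedded three-point track, the function names, and the choice of metrics are illustrative assumptions and not necessarily the features used in this study.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-point track; real course files contain thousands of points.
GPX_SAMPLE = """<?xml version="1.0"?>
<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="50.00" lon="3.00"><ele>20.0</ele></trkpt>
    <trkpt lat="50.01" lon="3.01"><ele>55.0</ele></trkpt>
    <trkpt lat="50.02" lon="3.02"><ele>40.0</ele></trkpt>
  </trkseg></trk>
</gpx>"""

# XML namespace of GPX 1.1 documents.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def elevation_profile(gpx_text):
    """Return the elevation (in meters) of every track point, in track order."""
    root = ET.fromstring(gpx_text)
    return [float(pt.find("gpx:ele", NS).text)
            for pt in root.iterfind(".//gpx:trkpt", NS)]


def elevation_metrics(elevations):
    """Compute simple elevation summaries from consecutive track points."""
    steps = list(zip(elevations, elevations[1:]))
    return {
        "total_ascent_m": sum(max(b - a, 0.0) for a, b in steps),
        "total_descent_m": sum(max(a - b, 0.0) for a, b in steps),
        "max_elevation_m": max(elevations),
    }


metrics = elevation_metrics(elevation_profile(GPX_SAMPLE))
print(metrics)
```

In practice, raw GPS elevation is noisy, so smoothing the profile before summing the ascent would likely be needed; that step is omitted here.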
However, elevation is not the sole specialization determinant in professional cycling.
Road cycling is raced on public roads, which may vary heavily. When roads get smaller,
positioning becomes more important as the narrow roads hinder moving through the
peloton. Bad positioning has several disadvantages, such as the inability to respond to
attacks at the front, or sudden splits due to further narrowing of the road, which
may result in time differences of tens of seconds between the first and last
riders of the same group. This can be extremely detrimental in a sport which is often
decided by differences of seconds. This is even further complicated by road surfaces
such as unpaved sections or cobbled sections. It is clear that these aspects should be
accounted for. However, obtaining structured information on these aspects can be hard,
which may explain why they are currently unaccounted for in academic research. Therefore,
we suggest using the Komoot application2, which is an application directed at people
who participate in outdoor sports such as running, hiking, and cycling. The application
allows for detailed route creation which users can use to explore the outdoors. One of the
features is that it calculates statistics about the surface and road type. This information
was retrieved by building a web scraper which automatically uploaded all GPX files
from the previous step into the application, which then calculated and stored this for
every course in our data set.
1 https://www.la-flamme-rouge.eu/.
2 https://www.komoot.com/discover.
Besides racecourse information, it is also essential to have information on actual
results. Results were scraped from ProCyclingStats, a popular website which stores
information on professional cycling results, which has been popular in previous research
(Kholkine et al., 2020; Kholkine et al., 2021; Van Bulck et al., 2021; Janssens & Bogaert,
2022; Janssens et al., 2022; Baron et al., 2023). Each race which was present on La
Flamme Rouge was searched on the website, and results were stored.
3.2 Clustering
Much domain knowledge is present within the field of professional cycling. This means
that fans and professionals are capable of grouping very similar races together. This could
indicate that grouping of similar races could be done manually. However, the scalability
to do so for thousands of races is limited. A natural solution lies in the deployment of
clustering algorithms. However, such an approach would completely disregard all the
domain knowledge in the field. Wagstaff et al. (2001) propose the use of a Constrained
K-means algorithm which imposes constraints on which races should be clustered together
and which races may not be clustered together. Observations are assigned to the cluster
centers subject to these constraints, which results in different eventual cluster centers.
The algorithm takes a dataset D (i.e., the races in this case), a list of must-link constraints
$Con_{=} \subseteq D \times D$, and a list of cannot-link constraints $Con_{\neq} \subseteq D \times D$. Accordingly, we
have created a list of constraints $Con = Con_{=} \cup Con_{\neq}$.
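The constrained assignment step can be sketched as follows. This is a simplified illustration in the spirit of Wagstaff et al. (2001), not the implementation used in this study: points are referred to by index, the function names are hypothetical, initial centers are sampled at random, and the transitive closure of the must-link constraints is not computed.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def violates(i, cluster, assignment, must_link, cannot_link):
    """True if placing point i in `cluster` breaks a constraint with an
    already-assigned point."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment.get(other, cluster) != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link, cannot_link, n_iter=20, seed=0):
    """Assign each point to the nearest center that violates no constraint,
    then recompute the centers; repeat until the assignment is stable."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = {}
    for _ in range(n_iter):
        new_assignment = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest center.
            order = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_assignment, must_link, cannot_link):
                    new_assignment[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [points[i] for i, a in assignment.items() if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical 2-D race features: points 0 and 1 must share a cluster,
# points 0 and 3 may not.
races = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
clusters = cop_kmeans(races, k=2, must_link=[(0, 1)], cannot_link=[(0, 3)])
print(clusters)
```

Because the assignment is greedy, the order in which points are visited can make an otherwise feasible problem fail; the sketch raises an error in that case rather than backtracking.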
Besides Constrained K-means, we also benchmark several clustering algorithms
which are fitted without constraints to check and validate whether the constrained app-
roach improves the clustering performance: K-Means (Sculley, 2010), Affinity Propaga-
tion (Frey & Dueck, 2007), Mean Shift (Comaniciu & Meer, 2002), Spectral Clustering
(Shi & Malik, 2000), Hierarchical Clustering with Ward’s Linkage (Ward, 1963), Hier-
archical Clustering: Average Linkage (Sneath & Sokal, 1973), DBSCAN (Ester et al.,
1996), OPTICS (Schubert & Gertz, 2018), BIRCH (Zhang, Ramakrishnan & Livny,
1996), and Gaussian Mixture Model-Based (Raftery & Dean, 2006). If the number of
clusters was required, this was evaluated between 3 and 9, as a limited number of race
types reflects the limited number of specialties observed in earlier research
(e.g., Menaspà et al., 2012; Janssens et al., 2022) and is more convenient in practical
use afterwards (i.e., too many categories could hinder decision making). Each clustering
outcome was externally validated against a test set, which was labeled simultaneously
with the list of constraints Con. The test set contains 100 unique race combinations, which
have been labeled as must-link or cannot-link. The test set has no overlap with the list
of constraints Con to ensure no possible data leakage.
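The external validation described above amounts to checking, for each labeled pair, whether the clustering agrees with the label. A minimal sketch, assuming cluster labels stored in a dictionary keyed by race and test pairs given as (race, race, relation) tuples; the race names and the function name are hypothetical:

```python
def pairwise_accuracy(labels, test_pairs):
    """Share of labeled pairs on which the clustering agrees with the label.

    A must-link pair is correct when both races share a cluster;
    a cannot-link pair is correct when they do not.
    """
    correct = 0
    for race_a, race_b, relation in test_pairs:
        same_cluster = labels[race_a] == labels[race_b]
        if (relation == "must-link") == same_cluster:
            correct += 1
    return correct / len(test_pairs)


# Hypothetical clustering outcome and labeled test pairs.
labels = {"race_a": 0, "race_b": 0, "race_c": 1}
test_pairs = [
    ("race_a", "race_b", "must-link"),    # same cluster: correct
    ("race_a", "race_c", "cannot-link"),  # different clusters: correct
    ("race_b", "race_c", "must-link"),    # different clusters: incorrect
]
print(pairwise_accuracy(labels, test_pairs))
```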
approaches: (1) the original individual rankings, disregarding team formation, (2) first
rider per team rankings, and (3) average ranking per team. As it is hard to establish a
ground truth about which rider’s skill should be ranked above another rider’s, we will
compare the outcomes of the three methods based upon some example cases.
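The two team-based variants can be illustrated by how a single race result is turned into a team ordering before it is passed to the rating algorithm. The sketch below is an assumption about that transformation (the exact procedure is not spelled out here): under the first-rider variant a team is ranked by its best-placed rider, and under the average variant by the mean finishing position of its riders. The rider and team names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def team_orderings(result):
    """Derive two team orderings from one race result.

    `result` is a list of (finishing_position, rider, team) tuples.
    Returns (teams ordered by best-placed rider,
             teams ordered by mean finishing position).
    """
    positions = defaultdict(list)
    for position, _rider, team in result:
        positions[team].append(position)
    by_first_rider = sorted(positions, key=lambda t: min(positions[t]))
    by_average = sorted(positions, key=lambda t: mean(positions[t]))
    return by_first_rider, by_average


# Hypothetical race: team_x wins but its second rider finishes far back.
race = [
    (1, "rider_a", "team_x"),
    (2, "rider_b", "team_y"),
    (3, "rider_c", "team_y"),
    (10, "rider_d", "team_x"),
]
first_rider_order, average_order = team_orderings(race)
print(first_rider_order, average_order)
```

The example shows why the variants can disagree: the team of the race winner leads the first-rider ordering but drops behind once the positions of its helpers are averaged in.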
4 Results
Table 1. Percentage of observations assigned to the largest cluster. Each clustering algorithm is
represented on a row. If number of clusters was set, columns represent setting.
Algorithm                   3         4         5         6         7         8         9
K-Means                     45.35%    39.90%    38.33%    36.97%    28.53%    24.73%    17.62%
Affinity Propagation        5.75%
Mean Shift                  95.61%
Spectral Clustering         76.73%    54.86%    52.54%    39.87%    39.88%    39.76%    40.46%
Ward                        55.25%    41.66%    36.44%    36.41%    36.41%    36.41%    36.36%
Agglomerative Clustering    99.94%    99.90%    99.86%    99.86%    99.84%    99.84%    99.79%
DBSCAN                      100.00%
OPTICS                      99.39%
BIRCH                       55.26%    55.26%    55.23%    55.18%    36.63%    36.63%    36.63%
Gaussian Mixture            71.49%    30.31%    42.56%    32.39%    30.55%    28.70%    26.88%
Constrained K-Means         97.97%    38.68%    39.63%    40.81%    32.05%    37.96%    31.81%
Table 2. External Validation Results: Percentage of elements correctly classified on test set. Each
clustering algorithm is represented on a row. If number of clusters was set, columns represent
setting. Best performance underlined and in bold.
Algorithm                   3     4     5     6     7     8     9
K-Means                     81    76    75    75    77    76    77
Affinity Propagation        54
Spectral Clustering         77    77    77    79    80    81    81
Ward                        72    74    76    76    76    76    76
BIRCH                       81    80    80    80    77    77    77
Gaussian Mixture            66    68    60    62    64    66    65
Constrained K-Means         81    54    79    77    72    82    84
Table 1 depicts the percentage of observations which are assigned to the largest
cluster. It is clear that several algorithms (i.e., Mean Shift, Agglomerative Clustering,
DBSCAN, and OPTICS) have the tendency to put almost all observations into the same
cluster. Such outcomes contradict prior knowledge (i.e., specialization across
races) and are not informative. Accordingly, they are not included in Table 2, which
depicts the number of correctly clustered or separated observation pairs of the clustering
algorithms on the labeled test set (N = 100). Constrained K-means with K = 9 has the best
performance on the test set (i.e., 84/100 correctly linked/separated). Only a few clustering
algorithms achieve accuracies competitive with this algorithm, most notably K-Means
and Spectral Clustering, each reaching at most 81/100. Constrained K-Means also
results in an insightful grouping of races, with the largest cluster including only
around 30% of races. This is also confirmed when inspecting the resulting
clusters and assigned races. When going through the various clusters, they seem to
form coherent specialization clusters: time trials, cobbled races, short races, sprint races,
mountain stages, hilly races, hilly races & cobbled races, races with off-road sections,
and short sprint stages. The inclusion of cobbled races and races with off-road sections
clearly shows the added value of including surface type in the clustering analysis.
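The degenerate-clustering check behind Table 1 reduces to the share of observations that fall into the largest cluster. A minimal sketch, with a hypothetical function name and toy labels:

```python
from collections import Counter


def largest_cluster_share(labels):
    """Fraction of observations assigned to the most populated cluster."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Toy example: one dominant cluster versus a balanced clustering.
degenerate = [0] * 98 + [1, 2]
balanced = [0, 0, 1, 1, 2, 2]
print(largest_cluster_share(degenerate), largest_cluster_share(balanced))
```

A share close to 1 signals that the algorithm placed nearly all races into one group, which is the uninformative outcome discussed above.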
Remarkably, maximal performance is obtained with the maximal number of clusters
allowed within our methodological set-up. This could suggest that the true
number of clusters is even higher. To check this, we performed a small robustness check
where we allowed the number of clusters to be higher (i.e., up to 20) for the best-performing
algorithm (i.e., constrained clustering). Figure 1 depicts the results of this
analysis. The results clearly show a peak in performance at K = 9, which is only matched
at a very high number of clusters (i.e., K = 19). The latter clusters proved less
interpretable than those of our suggested number: both solutions agreed on most clustered pairs,
but the approach with the higher number of clusters tended to form small niche groups
of races, which would also be uninformative to our ranking approach.
The results of the three TrueSkill ranking methodologies are reported in Table 3.
We interpret the results on the mountain stage cluster, as teammates can still offer large
advantages during these types of stages while stochastic effects (i.e., luck or race tactics)
have limited effects, resulting in more coherent rankings across races (i.e., ranking
being more directly linked to underlying skill level). This way, we should be able to
interpret the methods’ performances despite the absence of an objective ground truth.
The top-5 ranked riders for each method are reported. The mean team method is heavily
outperformed by the other two methods, with Emerson Santos being the suggested top
rider, which evidently is a worse suggestion than two-time Tour de France winner Tadej
Pogačar. The best rider team method yields some plausible top-ranked riders. However,
unlike the individual method, its top-5 contains riders whom very few observers would
put in their top-5 for the period 2017 to early 2022 (e.g., Alejandro Osorio, Óscar
Sevilla).
Table 3. Top-5 Rankings Using All Three TrueSkill Methods on Mountain Stage Cluster
Individual Method    Best Rider Team Method    Mean Result Team Method
POGAČAR Tadej        POGAČAR Tadej             SANTOS Emerson
BARDET Romain        LÓPEZ Miguel Ángel        POELS Wout
ROGLIČ Primož        OSORIO Alejandro          SEVILLA Óscar
QUINTANA Nairo       QUINTANA Nairo            MUÑOZ Daniel
BERNAL Egan          SEVILLA Óscar             MUGISHA Samuel
Fig. 2. Scores of Tim Merlier (green), Mathieu van der Poel (purple), and Wout van Aert (red)
across the various clusters. (Color figure online)
The individual method results in fair rider evaluations across the various clusters.
Consider Fig. 2. The figure compares the scores of the riders Tim Merlier (green),
Mathieu van der Poel (purple), and Wout van Aert (red) across the 9 detected clusters.
Note how all riders have the lowest possible score on the short sprint race cluster, as
they have not competed in this race category (typically lower-level races). The figure
suggests that Tim Merlier is the best sprinter, but that van Aert and van der Poel perform
better at the other specialization clusters. Interestingly, the results also suggest that van
Aert outperforms van der Poel on all other clusters besides the cobbled races, and that
the difference between the two is the largest for mountain stages and time trials. All these
observations match the expectations fans and followers may have beforehand. Moreover,
these specialization scores could also be used as features in other downstream tasks such
as race outcome prediction or talent identification.
5 Conclusion
This study assesses the feasibility of a combined cluster-ranking method to come up
with reliable rankings of rider performance. Semi-supervised constrained clustering is
shown to outperform traditional unsupervised clustering techniques while only using a
limited number of human-labeled observations. The study also introduces the TrueSkill
algorithm to the field of cycling. Despite it being an extension to popular one-on-one
ranking methods often used in sports analytics, such as the Elo rating system, its
application in sports settings is uncommon. We demonstrate that using individual race
results translates to much better rankings compared to team-based performances.
Future research might focus on the relationship between label set size and constrained
clustering performance: while the approach already outperforms other methods, its
performance should theoretically only increase with a larger labeled set. Unfortunately,
the creation of such a dataset is a cumbersome process which is heavily influenced
by the labeler’s own prejudices. Moreover, some fan accounts have also shown early
methodologies which focus on the grouping of related races. Unfortunately, limited
information is provided on how these groupings are created, which makes comparative
analysis difficult. Future research might validate our clustering approach against
these approaches once more transparency about these methods is provided. Another
interesting avenue for future research might be the validation of this methodology as a
step in other analytics applications.
References
Baron, E., Janssens, B., Bogaert, M.: Bike2Vec: vector embedding representations of road cycling
riders and races. arXiv preprint arXiv:2305.10471 (2023)
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans.
Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
De Bock, J., Verstockt, S.: Video-based analysis and reporting of riding behavior in cyclocross
segments. Sensors 21(22), 7619 (2021)
de Leeuw, A.W., Heijboer, M., Verdonck, T., Knobbe, A., Latré, S.: Exploiting sensor data in
professional road cycling: personalized data-driven approach for frequent fitness monitoring.
Data Min. Knowl. Discov. 1–29 (2022)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in
large spatial databases with noise. In: KDD, vol. 96, no. 34, pp. 226–231 (1996)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814),
972–976 (2007)
Herbrich, R., Minka, T., Graepel, T.: TrueSkill™: a Bayesian skill rating system. In: Advances in
Neural Information Processing Systems, vol. 19 (2006)
Hilmkil, A., Ivarsson, O., Johansson, M., Kuylenstierna, D., van Erp, T.: Towards machine learning
on data from professional cyclists. CoRR abs/1808.00198 (2018)
Hood, A.: Former UCI president questions WorldTour relegation/promotion: ‘why change if it’s
working well?’ (2022). https://www.velonews.com/news/former-uci-president-questions-worldtour-relegation-promotion/.
Accessed 8 Dec 2022
Janssens, B., Bogaert, M.: Imputation of non-participated race results. In: Brefeld, U., Davis, J.,
Van Haaren, J., Zimmermann, A. (eds.) MLSA 2021. CCIS, vol. 1571, pp. 155–166. Springer,
Cham (2022). https://doi.org/10.1007/978-3-031-02044-5_13
Janssens, B., Bogaert, M., Maton, M.: Predicting the next Pogačar: a data analytical approach to
detect young professional cycling talents. Ann. Oper. Res. 1–32 (2022)
Kataoka, Y., Gray, P.: Real-time power performance prediction in tour de France. In: Brefeld, U.,
Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA 2018. LNCS, vol. 11330, pp. 121–130.
Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17274-9_10
Kholkine, L., De Schepper, T., Verdonck, T., Latré, S.: A machine learning approach for road
cycling race performance prediction. In: Brefeld, U., Davis, J., Van Haaren, J., Zimmermann,
A. (eds.) MLSA 2020. CCIS, vol. 1324, pp. 103–112. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-64912-8_9
Kholkine, L., et al.: A learn-to-rank approach for predicting road cycling race outcomes. Front.
Sports Active Living 3 (2021)
Lucia, A., Joyos, H., Chicharro, J.L.: Physiological response to professional road cycling: climbers
vs. time trialists. Int. J. Sports Med. 21(07), 505–512 (2000)
Menaspà, P., Rampinini, E., Bosio, A., Carlomagno, D., Riggio, M., Sassi, A.: Physiological and
anthropometric characteristics of junior cyclists of different specialties and performance levels.
Scand. J. Med. Sci. Sports 22(3), 392–398 (2012)
Menaspà, P., Quod, M., Martin, D.T., Peiffer, J.J., Abbiss, C.R.: Physical demands of sprinting in
professional road cycling. Int. J. Sports Med. 36(13), 1058–1062 (2015)
Ozols, K.: Geraint Thomas was Stronger in the Tour de France 2022 compared to his 2018
Victory (2022). https://lanternerouge.com.au/2022/11/07/geraint-thomas-was-stronger-in-the-tour-de-france-2022-compared-to-his-2018-victory/.
Accessed 2 Dec 2022
Raftery, A.E., Dean, N.: Variable selection for model-based clustering. J. Am. Stat. Assoc.
101(473), 168–178 (2006)
Schubert, E., Gertz, M.: Improving the cluster structure extracted from optics plots. In: LWDA
(2018)
Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference
on World Wide Web, pp. 1177–1178 (2010)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 22(8), 888–905 (2000)
Sneath, P.H., Sokal, R.R.: Numerical taxonomy. The principles and practice of numerical
classification (1973)
Steyaert, M., De Bock, J., Verstockt, S.: Sensor-based performance monitoring in track cycling.
In: Brefeld, U., Davis, J., Van Haaren, J., Zimmermann, A. (eds.) MLSA 2021. CCIS, vol.
1571, pp. 167–177. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-02044-5_14
Suzuki, A.K., Salasar, L.E.B., Leite, J.G., Louzada-Neto, F.: A Bayesian approach for predicting
match outcomes: the 2006 (association) football world cup. J. Oper. Res. Soc. 61(10), 1530–
1539 (2010)
Van Bulck, D., Vande Weghe, A., Goossens, D.: Result-based talent identification in road cycling:
discovering the next Eddy Merckx. Ann. Oper. Res. 1–18 (2021)
Van Reeth, D.: Globalization in professional road cycling. In: Van Reeth, D., Larson, D.J. (eds.) The
economics of professional road cycling. SEMP, vol. 11, pp. 165–205. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-22312-4_9
Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S.: Constrained k-means clustering with background
knowledge. In: ICML, vol. 1, pp. 577–584 (2001)
Ward Jr., J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301),
236–244 (1963)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very
large databases. ACM SIGMOD Rec. 25(2), 103–114 (1996)
Exploiting Clustering for Sports Data
Analysis: A Study of Public
and Real-World Datasets
1 Introduction
Data mining methods are increasingly being used to analyze data. Thus, these
methods are also becoming more important for analyzing sports data to provide
training recommendations to athletes based on their data [5]. The goal of this paper
is to find prototypical features in sports data using clustering techniques and the
associated methods for preprocessing the data. To achieve this
goal, two sports data sets are used: the public Body Performance data set
[4] is used for development, whereas the real-world data set of the in:prove
project is used as a validation use case.
In this paper we hence make the following contributions: The mentioned data
sets are preprocessed accordingly before clustering, where different variants and
combinations are tested. Furthermore, the following well-known clustering meth-
ods are used: K-Means, Hierarchical Clustering, Density-Based Spatial Cluster-
ing of Applications with Noise (DBSCAN), Affinity propagation, Mean-shift
and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH).
By choosing different clustering methods and preprocessing methods, differ-
ences between methods and the resulting influence on the clustering results are
clarified.
2 Related Work
One of the basic methods of data mining is clustering. Clustering belongs to the
unsupervised methods [9]. Several approaches apply clustering to sports-related
data (as surveyed in the following), without, however, providing a comprehensive
comparison as we do in this paper.
For mixed data, the authors of [2] present a robust fuzzy clustering model.
In addition, noise clusters are used, and a weighting system is applied to the mixed
attributes to obtain feature sets relevant for clustering. The authors present a
simulation study and an empirical application. For this purpose, data from football
players are considered, which are grouped based on their performance and
position. The clustering algorithm the authors present proves effective for finding
clusters that remained hidden without the multi-attribute approach [2].
In [6], a clustering algorithm is developed based on the Delaunay method. The
authors group heat maps of football games into average formations of players and
use hierarchical clustering to subdivide the average formations. In the resulting
clusters, the players’ configurations are different. Thus, according to the authors,
typical transition patterns of formations of a team can be extracted.
To make better decisions for training, data collected with the help of wearable
devices from athletes of an NCAA Division 1 American football team are grouped
using K-Means clustering in [11]. The average playing demands of the athletes
were determined to form appropriate training groups. According to the authors,
the results are similar to traditional groupings for American Football training.
In [12], motivational profiles of young college athletes are found with the
help of clustering. To do this, the athletes filled out questionnaires that were
used to assess their motivation indices. According to the authors, four clusters
were found to be meaningful. This could serve as a support for coaches to develop
intervention programs regarding the motivational needs of their athletes.
3 Clustering Comparison
In this section, we first describe the public Body Performance data set. The
Body Performance data set contains twelve features and 13393 data points. The
features include a binary attribute called gender, as well as numeric attributes
such as age, weight kg, height cm, body fat %, blood pressure values diastolic
and systolic, as well as measures of athletic activities such as sit-ups counts,
gripForce, sit and bend forward cm, and broad jump cm. Furthermore, there is a
categorical attribute called class, which is not considered further as it is a
target variable. Because class provides a ground-truth label, future studies
could use it to compute external cluster validation indices.
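As a rough illustration of the external validation mentioned above, the following sketch compares cluster labels with a ground-truth label. It runs on synthetic data from `make_blobs`, not on the Body Performance data; `y_true` merely stands in for the class attribute, and the adjusted Rand index and normalized mutual information are our choice of external indices, not ones named in this paper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Synthetic stand-in: y_true plays the role of a ground-truth label such as
# the class attribute (assumption; this is not the Body Performance data).
X, y_true = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=0)

# The number of clusters is illustrative only.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# External indices compare the found partition with the known labels.
ari = adjusted_rand_score(y_true, labels)
nmi = normalized_mutual_info_score(y_true, labels)
print(f"ARI={ari:.3f}, NMI={nmi:.3f}")
```

Both indices are invariant to a permutation of the cluster labels, so no matching between cluster numbers and class values is needed.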
We now discuss our process; Fig. 1 shows the sequence of methods applied,
as well as associated Python libraries. After preprocessing, clustering divides
the data into groups. Subsequently, the resulting clusters are evaluated using
internal cluster indices. In the following, the cluster results of the different clus-
tering methods are described. Clustering was applied to three different variants of
the Body Performance data set resulting from the three preprocessing pipelines
sketched in Table 1. A comparison of the clusters formed by applying k-means,
hierarchical clustering and BIRCH to the first variant of the Body Performance
data is visualized with Principal Component Analysis (PCA) in Fig. 2.
Fig. 2. Visualization with PCA of the resulting k-means (left), hierarchical (middle)
and BIRCH (right) clusters of the first data set variant.
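The process described above (preprocessing, clustering, internal validation, PCA projection for plotting) could be sketched as follows with scikit-learn. The data are synthetic, and the choice of two clusters, standard scaling and all parameter values are assumptions made for illustration; they are not the settings of the three preprocessing pipelines in Table 1.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one preprocessed data set variant (assumption).
X_raw, _ = make_blobs(n_samples=300, n_features=6, centers=2, random_state=0)
X = StandardScaler().fit_transform(X_raw)

# Hypothetical model settings; n_clusters=2 is chosen only for the example.
models = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=2),
    "BIRCH": Birch(n_clusters=2),
}

# PCA is used only to obtain 2-D coordinates for a scatter plot;
# the clustering itself runs on all features.
X_2d = PCA(n_components=2).fit_transform(X)

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = {
        "sizes": np.bincount(labels),                # points per cluster
        "silhouette": silhouette_score(X, labels),   # internal index
    }
    print(name, results[name])
```

Plotting `X_2d` colored by each label vector would give a side-by-side view comparable in spirit to Fig. 2.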
Table 2. Overview of the sizes of resulting k-means clusters for each data set variant
Table 3. Overview of the sizes of resulting hierarchical clusters for each data set variant
DBSCAN: DBSCAN forms clusters from regions of high density, so clusters of
arbitrary shape can occur. As parameters, DBSCAN requires the minimum number
of neighboring points and the maximum distance between a neighboring point and
a core sample [10]. For the Body Performance data set, DBSCAN formed two
clusters for each variant. For Variant 2, some data points were marked as
noise; these are also listed in Table 4. For Variants 1 and 3, on the other
hand, the clusters have the same sizes as in hierarchical clustering.
Table 4. Overview of the sizes of resulting DBSCAN clusters for each data set variant
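The two DBSCAN parameters described above correspond to `min_samples` and `eps` in scikit-learn. The sketch below shows how cluster sizes and noise points can be counted; it uses a synthetic two-moons data set, and the parameter values are illustrative assumptions, not the values used for the Body Performance variants.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaved half-moons: a typical example of non-convex clusters.
X, _ = make_moons(n_samples=400, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# eps: neighborhood radius; min_samples: points required within eps for a
# core sample. Both values are illustrative only.
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_

# scikit-learn marks noise points with the label -1.
n_noise = int(np.sum(labels == -1))
cluster_ids = sorted(set(labels) - {-1})
sizes = {int(c): int(np.sum(labels == c)) for c in cluster_ids}
print("cluster sizes:", sizes, "noise points:", n_noise)
```

Reporting the noise count next to the cluster sizes keeps the totals comparable across algorithms that assign every point to a cluster.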
Table 6. Overview of the resulting Mean-Shift clusters for each data set variant
BIRCH: The BIRCH algorithm builds a Cluster Feature Tree (CFT). The data
points are packed into so-called Cluster Feature (CF) nodes [10]. The resulting
clusters also differ between the Body Performance variants (see Table 7). The
two-dimensional visualizations with the PCA method also show that clusters
formed with BIRCH overlap more than clusters formed by other algorithms.
Table 7. Overview of the resulting BIRCH clusters for each data set variant
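To make the role of the CF tree more concrete, the following sketch fits scikit-learn's `Birch` on synthetic data and inspects the leaf subclusters before they are merged into the final clusters. The values of `threshold`, `branching_factor` and `n_clusters` are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data (assumption; not the Body Performance data).
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=0)

# threshold bounds the radius of a leaf subcluster; branching_factor caps the
# number of CF entries per node. Values are illustrative only.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=3).fit(X)

# Leaf CF entries summarize the data; a global clustering step then merges
# them into n_clusters final clusters.
n_subclusters = birch.subcluster_centers_.shape[0]
labels = birch.predict(X)
print("leaf subclusters:", n_subclusters)
print("final cluster sizes:", np.bincount(labels))
```

A smaller `threshold` yields more, finer subclusters, which is one way the same data can lead to different final groupings.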
Cluster Validation Indices: Table 8 shows the cluster validation scores of the
respective data set variants and clustering algorithms, which are used to
determine the goodness of the clusters. The Silhouette Coefficient, the
Calinski-Harabasz Score and the Davies-Bouldin Score were calculated.
To provide a comprehensive understanding of the results, we will present the
cluster validation indices in more detail.
Cluster validation indices are often used to evaluate cluster results. Internal
cluster validation indices like the above-mentioned scores evaluate cluster results
based on information found in the data itself. In general, these metrics are used
to evaluate clusters for compactness, i.e., the density of data points within a
cluster, and separability, i.e., the distance between two clusters [7,8].
For the silhouette coefficient, a higher score indicates better defined clusters.
The silhouette coefficient is calculated for each data point from two
quantities: the average distance a between the data point and all other data
points in the same cluster, and the average distance b between the data point
and all data points in the closest other cluster. The coefficient of the data
point is then (b − a)/max(a, b) [10]. For multiple data points, the silhouette
coefficient is the average of the individual silhouette coefficients. Its
values range from −1 to 1. A silhouette coefficient close to zero indicates
overlapping clusters, while higher values represent denser and better separated
clusters. It is important to note that the silhouette coefficient has a
drawback: it tends to be higher for convex clusters. Non-convex clusters, such
as those that DBSCAN can produce, may therefore receive a lower silhouette
coefficient [10].
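The calculation can be written out in a few lines. The sketch below computes, for each point, the mean distance a to the other points of its own cluster and the mean distance b to the points of the closest other cluster, averages (b − a)/max(a, b) over all points, and compares the result with scikit-learn's `silhouette_score`. The function name `silhouette_manual` and the synthetic data are our own illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_score


def silhouette_manual(X, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points, Euclidean distance."""
    D = pairwise_distances(X)
    cluster_ids = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) is set to 0 by convention
        # a: mean distance to the other points of the same cluster
        a = D[i, own].sum() / (n_own - 1)
        # b: mean distance to the points of the closest other cluster
        b = min(D[i, labels == c].mean() for c in cluster_ids if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


# Synthetic data and an illustrative k-means partition (assumptions).
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

sil_manual = silhouette_manual(X, labels)
sil_reference = silhouette_score(X, labels)
print(sil_manual, sil_reference)
```

The explicit loop needs the full pairwise distance matrix, which is why the silhouette coefficient is comparatively expensive on large data sets.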
The Calinski-Harabasz index is another evaluation method. Similar to the
silhouette coefficient, a higher value indicates better defined clusters. The
index is the ratio of the between-cluster dispersion to the within-cluster
dispersion, each summed over all clusters. An advantage of the
Calinski-Harabasz index is its fast calculation. However, similar to the
silhouette coefficient, it also has the disadvantage that the values are higher
for convex clusters. This may mean that non-convex clusters, such as those that
may occur in DBSCAN, have lower Calinski-Harabasz index values [10].
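As a minimal sketch of this ratio, the function below sums the squared distances of the cluster centers to the overall mean (weighted by cluster size) and the squared distances of the points to their cluster center, normalizes both by their degrees of freedom, and compares the result with scikit-learn's `calinski_harabasz_score`. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score


def calinski_harabasz_manual(X, labels):
    """Between-cluster dispersion / (k - 1) over within-cluster dispersion / (n - k)."""
    n = len(X)
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    overall_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in cluster_ids:
        Xc = X[labels == c]
        center = Xc.mean(axis=0)
        # size-weighted squared distance of the cluster center to the overall mean
        between += len(Xc) * np.sum((center - overall_mean) ** 2)
        # squared distances of the cluster's points to their own center
        within += np.sum((Xc - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))


# Synthetic data and an illustrative k-means partition (assumptions).
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

ch_manual = calinski_harabasz_manual(X, labels)
ch_reference = calinski_harabasz_score(X, labels)
print(ch_manual, ch_reference)
```

Only cluster means and squared deviations are needed, so no pairwise distance matrix has to be built, which explains the fast calculation.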
In contrast to the silhouette coefficient and the Calinski-Harabasz index, a
lower Davies-Bouldin index indicates that the clusters of a model are better
separated from each other. The index is the average similarity between each
cluster and its most similar cluster, where similarity compares the size of the
clusters with the distance between them. Values closer to zero stand for a
better partitioning, with zero being the best value. An advantage of the
Davies-Bouldin index is that it is easier to calculate than the silhouette
coefficient. In addition, only pointwise distances are used in the calculation,
so the index is based solely on quantities and features inherent to the data
set. However, the Davies-Bouldin index also has the disadvantage that the
values for convex clusters are higher than for non-convex clusters [10].
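The Davies-Bouldin index can be sketched as follows: the size of a cluster is taken as the mean distance of its points to the centroid, the similarity of two clusters is the sum of their sizes divided by the distance between their centroids, and the index averages each cluster's largest similarity. The function name and the synthetic data are illustrative assumptions; the result is compared with scikit-learn's `davies_bouldin_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score


def davies_bouldin_manual(X, labels):
    """Average over clusters of the worst (largest) similarity to another cluster."""
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    centroids = np.array([X[labels == c].mean(axis=0) for c in cluster_ids])
    # cluster size: mean Euclidean distance of the points to their centroid
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(cluster_ids)
    ])
    worst = np.zeros(k)
    for i in range(k):
        worst[i] = max(
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        )
    return worst.mean()


# Synthetic data and an illustrative k-means partition (assumptions).
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

db_manual = davies_bouldin_manual(X, labels)
db_reference = davies_bouldin_score(X, labels)
print(db_manual, db_reference)
```

Because the index rests on centroids, it implicitly assumes roughly compact clusters, which is worth keeping in mind when comparing it across density-based results.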
According to the indices for the Body Performance data set, DBSCAN and
Hierarchical Clustering with data set variant 1 provide the best cluster results.
The clusters formed in this process consist of two well-separated groups, which
can also be seen in the two-dimensional visualizations. If we examine the Body
Performance data, we observe that the data was clustered by gender. This
highlights that clustering methods may not provide meaningful results when
categorical or binary attributes such as gender are not excluded. One
possibility is to remove the gender feature from the data set. However, in addi-
tion to gender, age and the performance features, the data set contains only a
few physiological features (height, weight, body fat, diastolic and systolic) which
can be dependent on gender and age. Therefore, this data set is useful for test-
ing purposes, but a data set that provides more physiological features would be
desirable for cluster analysis.
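The effect described above can be reproduced on synthetic data. In the sketch below, a binary attribute and two numeric features that depend on it are clustered with k-means, once with and once without the binary column, and the agreement of the clusters with the binary attribute is measured with the adjusted Rand index. All values and feature names are arbitrary assumptions and do not come from the Body Performance data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, size=n)  # synthetic binary attribute

# Two numeric features that shift with the binary attribute and one that
# does not (arbitrary illustrative values).
f1 = 12.0 * gender + rng.normal(0.0, 6.0, size=n)
f2 = 17.0 * gender + rng.normal(0.0, 6.0, size=n)
f3 = rng.normal(0.0, 10.0, size=n)


def cluster_vs_gender(columns):
    """Cluster the standardized columns and compare the result with gender."""
    X = StandardScaler().fit_transform(np.column_stack(columns))
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    return adjusted_rand_score(gender, labels)


ari_with = cluster_vs_gender([gender, f1, f2, f3])
ari_without = cluster_vs_gender([f1, f2, f3])
print(f"ARI vs. gender with binary column: {ari_with:.3f}")
print(f"ARI vs. gender without binary column: {ari_without:.3f}")
```

If the remaining features depend on the binary attribute, removing the column alone may not change the grouping much, which is consistent with the caveat about gender-dependent physiological features.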
Table 8. Cluster validation indices for each cluster algorithm and data set variant
To state more context about the above two measurements, we briefly mention
the tests that they belong to:
point of view is that sports experts need to consider grouping the given tests in
cognition or performance alone for better correlation with the sports results in
real competitions.
Table 10. Clustering indexes of cognition and competition features with number of
clusters
5 Conclusion
We presented an implementation and comparative evaluation of clustering algo-
rithms on sports data sets. This paper provides a comprehensive evaluation based
on several cluster validation indexes (CVIs).
Based on the tables described, it can already be seen that the groupings
can differ greatly depending on the clustering algorithm and data set variant
Acknowledgements. This project was funded with research funds from the
Bundesinstitut für Sport-wissenschaften based on a decision of Deutscher
Bundestag (Project Number: ZMI4-081901/21-25).
References
1. Brickenkamp, R., Schmidt-Atzert, L., Liepmann, D.: Test d2-Revision:
Aufmerksamkeits-und Konzentrationstest. Hogrefe, Göttingen (2010)
2. D’Urso, P., De Giovanni, L., Vitale, V.: A robust method for clustering football
players with mixed attributes. Ann. Oper. Res. 1–28 (2022)
3. Figueroa, I., Youmans, R., Shaw, T.: Cognitive flexibility and sustained attention.
In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting,
vol. 58, pp. 954–958 (2014). https://doi.org/10.1177/1541931214581200
4. Kaggle: Body Performance Dataset.
https://www.kaggle.com/datasets/kukuroo3/body-performance-data?resource=download.
Accessed 05 Mar 2023
5. Li, X., Chen, X., Guo, L., Rochester, C.A.: Application of big data analysis tech-
niques in sports training and physical fitness analysis. Hindawi Wirel. Commun.
Mob. Comput. 2022, Article ID 3741087 (2022)
6. Narizuka, T., Yamazaki, Y.: Clustering algorithm for formations in football games.
Sci. Rep. 9(1), 1–8 (2019)
7. Rendón, E., Abundez, I., Arizmendi, A., Quiroz, E.M.: Internal versus external
cluster validation indexes. Int. J. Comput. Commun. 5(1), 27–34 (2011)
8. Rendón, E., et al.: A comparison of internal and external cluster validation indexes.
In: Proceedings of the 2011 American Conference, San Francisco, CA, USA, vol.
29, pp. 1–10 (2011)
9. Rokach, L., Maimon, O.: Data Mining and Knowledge Discovery Handbook.
Springer, New York (2005). https://doi.org/10.1007/b107408
10. Scikit Learn: Clustering. https://scikit-learn.org/stable/modules/clustering.html#.
Accessed 05 Mar 2023
11. Shelly, Z., Burch, R.F., Tian, W., Strawderman, L., Piroli, A., Bichey, C.: Using
K-means clustering to create training groups for elite American football student-
athletes based on game demands. Int. J. Kinesiol. Sports Sci. 8(2), 47–63 (2020)
12. Zason Chian, L.K., John Wang, C.K.: Motivational profiles of junior college ath-
letes: a cluster analysis. J. Appl. Sport Psychol. 20(2), 137–156 (2008)
Author Index
B
Biczók, Gergely 77
Biermann, Henrik 36
Bogaert, Matthias 179
Brefeld, Ulf 24
Brunner, Dustin 103
C
Calmet, Gabin 167
Carlsson, Niklas 131
Cascioli, Lorenzo 11
D
Dalkilic, Mehmet M. 91
Davis, Jesse 11
de Sá-Freire, Leo Martins 64
E
El-Assady, Mennatallah 103
Eradès, Aymeric 167
H
Hendricks, Jacob 91
J
Janssens, Bram 179
Jones, Ben 144
K
K. R., Madhavan 91
Kim, Hyunsung 52
L
Laborie, Timothé 103
Lambrix, Patrick 131
M
Memmert, Daniel 36
Meyer, Vanessa 191
Mihalyi, Balazs 77
Mortelier, Alexis 119
P
Palczewska, Anna 144
R
Rahimian, Pegah 52
Rajasekaran, Gowtham Veerabadran 91
Rioult, François 119
Rudolph, Yannick 24
Rumo, Martin 3
S
Säfvenberg, Rasmus 131
Sanguino Bautiste, Francisco Javier 103
Schlak, Jared Andrew 91
Schmid, Marc 52
Shankar, Anshumaan 91
Sharma, Parichit 91
T
Timmer, Jens 36
Toka, László 52, 77, 155
V
Van Roy, Maaike 11
Vaz-de-Melo, Pedro O. S. 64
Vuillemot, Romain 167
W
Weaving, Dan 144
Wieland, Franz-Georg 36
Wiese, Lena 191
Y
Yang, Liule 103
Yang, Weiran 36
Z
Zentai, Benedek 155
U. Brefeld et al. (Eds.): MLSA 2023, CCIS 2035, pp. 203–204, 2024.
https://doi.org/10.1007/978-3-031-53833-9