Journal of Air Transport Management 82 (2020) 101737

Contents lists available at ScienceDirect

Journal of Air Transport Management


journal homepage: http://www.elsevier.com/locate/jairtraman

Assessing strategic flight schedules at an airport using machine learning-based flight delay and cancellation predictions
Miguel Lambelho a, Mihaela Mitici a, *, Simon Pickup b, Alan Marsden c
a Faculty of Aerospace Engineering, Delft University of Technology, HS 2926, Delft, the Netherlands
b London Heathrow Airport, Nelson Road, TW6 2GW, United Kingdom
c Eurocontrol, Airport Research, Brussels, Belgium

ARTICLE INFO

Keywords: Strategic flight schedule; Delay prediction; Cancellation prediction; Machine learning; Schedule ranking

ABSTRACT

To mitigate air traffic demand-capacity imbalances, large European airports implement strategic flight schedules, where flights are assigned arrival/departure slots several months prior to execution. We propose a generic assessment of such strategic schedules using predictions about arrival/departure flight delays and cancellations. We demonstrate our approach for strategic flight schedules in the period 2013–2018 at London Heathrow Airport. Together with the development of dedicated strategic flight schedule optimization models, our proposed approach supports an integrated strategic flight schedule assessment, where schedules are evaluated with respect to flight delays and cancellations.

* Corresponding author.
E-mail address: [email protected] (M. Mitici).

https://doi.org/10.1016/j.jairtraman.2019.101737
Received 14 May 2019; Received in revised form 24 September 2019; Accepted 23 October 2019
Available online 6 November 2019
0969-6997/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

The continuous growth of air traffic, together with limited airport expansion possibilities, has resulted in air traffic demand-capacity imbalances and arrival/departure flight delays at the largest airports in Europe. As an example, in 2017 the number of flights in Europe increased by 4.3% relative to 2016 (EUROCONTROL, 2017). This corresponds to an additional 1191 flights per day on average. At the same time, 20.4% of the flights in 2017 experienced an arrival delay of 15 min or more (EUROCONTROL, 2017).

To manage the demand-capacity imbalances, the busiest European airports make use of administrative demand management strategies to limit the number of flights scheduled to arrive/depart during busy hours. The main administrative demand management strategy currently in use is the airport slot allocation process, which follows the International Air Transport Association (IATA) Worldwide Slot Guidelines (International Air Transport Association, 2017). The IATA slot allocation process takes place twice per year (for the Winter and the Summer season) and gives airlines permission to use the full range of an airport's infrastructure to arrive or depart at a specific date and time at the airport. Zografos et al. (2017), Pellegrini et al. (2017) and Ribeiro et al. (2018) provide a detailed overview of the slot allocation process. This process is managed by airport coordinators. The main inputs of the slot allocation process at an airport are 1) the requests of the airlines to access an airport's infrastructure for arrival or departure at a specific date and time, and 2) a deterministic, pre-defined airside and terminal capacity of the airport in terms of, for instance, the maximum number of movements per day, per hour and per 15 min. Following the IATA guidelines, the output of the slot allocation process is a strategic flight schedule containing the scheduled arrival and departure flight dates and times up to 6 months prior to the day of the flight execution. The strategic schedule takes the form of a series of scheduled arrival and departure times. These series are commonly recurrent over a period of time, e.g., flight 123 is scheduled to arrive at 10 AM every day from Monday to Thursday in the months April and May.

In the past decade, several optimization models to allocate slots to arriving/departing flights have been proposed. Zografos et al. (2012) and Ribeiro et al. (2018) have developed optimization models for slot allocation at a single European airport. Network-wide slot allocation optimization models have been developed by Castelli et al. (2012), Corolli et al. (2014) and Pellegrini et al. (2017). The main objective of these models is to minimize the difference between the airlines' slot requests and the slots granted at the airport, following the IATA guidelines and taking into account the declared capacity limits of the airport. However, these models assume ideal conditions: the flights are assumed to arrive and depart exactly within their scheduled slots, and the capacity of the airport is considered to be fixed and deterministic. On the day of execution, however, flights often experience arrival/departure delays or cancellations. The strategic flight schedules, currently obtained following optimization of the slot requests and the IATA guidelines up to 6 months prior to the day of the flight execution, do not give an indication of the potential flight delays and cancellations associated with them. In turn, the actual impact of the strategic schedules on the airport on-time performance is unknown at the moment of schedule generation. To address this, a methodology is needed to assess strategic flight schedules with respect to potential flight delays and cancellations and to provide airports with insights into potential performance bottlenecks. Such insights are particularly important to support the airport coordinators in developing strategic schedules that not only meet the IATA guidelines, but also enable smooth and robust air traffic that benefits airlines, airports and passengers.

In this paper we propose a machine learning-based approach to assess the impact of strategic, IATA guidelines-compliant flight schedules on the on-time performance at an airport. In particular, we propose classification algorithms to predict whether flights scheduled in the strategic phase (6 months prior to the day of the execution) are subject to arrival/departure delays and cancellations during execution. Using the obtained flight delay and cancellation results, we propose a generic methodology to rank the strategic schedules by comparing and contrasting the associated flight delay and cancellation predictions. This analysis provides a means to assess strategic schedules based on their predisposition to have flight delays and cancellations. We demonstrate our assessment methodology using 10 strategic flight schedules from 2013 to 2018 at London Heathrow Airport (LHR), which is one of the busiest airports in Europe. To the best of our knowledge, this paper addresses for the first time the assessment of strategic flight schedules with respect to potential flight delays and cancellations.

The main contribution of this paper is that it provides a generic assessment of strategic flight schedules, at the moment when they are generated, using KPIs derived from predictions on flight delays and cancellations. The generality of our proposed assessment relies on the fact that we use a relative performance comparison between the assessed strategic schedules, rather than assigning user-defined weights to the target KPIs. Thus, we assess, using a generic methodology, the robustness of strategic, IATA-compliant flight schedules with respect to delays and cancellations. Together with the development of dedicated optimization models that aim to satisfy airlines' requests for slots in the presence of airport capacity constraints, our approach provides the airport coordinators with an integrated assessment of the performance of the IATA-compliant airport slot allocation process.

The remainder of this paper is organized as follows. Section 2 discusses existing machine learning approaches for flight delay and cancellation predictions and their performance. Section 3 describes the flight schedules and the flight delay and cancellation data from LHR in the period 2013–2018. Section 4 presents our proposed machine learning approach for flight delay and cancellation classification. Section 5 describes a generic approach to assess strategic flight schedules based on KPIs that are derived in Section 4 using machine learning-based predictions for flight delay and cancellation. Section 6 discusses the implications of our results. Section 7 provides conclusions and outlines future research directions.

2. Related work

The analysis of flight delays has been extensively addressed in the literature. Mueller and Chatterji (2002), Wu (2014) and Tu et al. (2008) propose data-driven models to estimate flight delay distributions at non-European airports. Mueller and Chatterji (2002) determine flight delay statistics for 10 major US airports by analyzing historical flight data. Based on these statistics, departure and arrival delays have been modeled as a Poisson process and a normal distribution, respectively. Wu (2014) estimates the probability density function of departure and arrival delays at Beijing Capital International Airport using historical flight delay data and an optimal generalized extreme value model. Tu et al. (2008) propose a statistical model to estimate flight departure delay distributions and seasonal trends at Denver International Airport. The authors consider in their model a seasonal trend, daily propagation patterns and random residuals. Abdel-Aty et al. (2007) develop a frequency analysis to detect flight delay patterns at Orlando International Airport. To analyze the propagation of delays in a network of airports, Pyrgiotis et al. (2013) propose a network of queues, while Xu et al. (2005) make use of Bayesian networks.

In recent years, an increasing number of studies have analyzed flight delays using machine learning approaches (Sternberg et al., 2017). Several studies (Kim et al., 2016; Choi et al., 2017; Alonso and Loureiro, 2015; Klein et al., 2010; Rebollo and Balakrishnan, 2014; Chen and Li, 2019) consider a short prediction horizon of up to 1 day before the flight execution. Kim et al. (2016) classify delays at several US airports using recurrent neural networks and several weather-related features. The models achieve a classification accuracy of 0.874. Choi et al. (2017) employ random forests and weather-related features to classify flight delay with an accuracy of 0.828. Alonso and Loureiro (2015) develop a multi-class classification algorithm to predict flight departure delay at Porto airport, achieving an accuracy of 0.57. One of the most important features used for classification is the amount of delay experienced by the previous flight arrival. In Klein et al. (2010), airport delay predictions are determined using the weather conditions as explanatory feature. In Rebollo and Balakrishnan (2014), the authors develop prediction models for airport and network delays for a 2 h prediction horizon and achieve an average regression test error of 19%. The propagation of flight delays is analyzed in Chen and Li (2019) using a multi-label random forest classification algorithm. The authors achieve an accuracy of 0.8 or more for their prediction algorithms and show that departure and arrival delay have the main explanatory power. In contrast, we cannot consider such features in our case, as we assume strategic schedules, where flights are scheduled to arrive and depart on time.

Choi et al. (2016), Belcastro et al. (2016) and Horiguchi et al. (2017) propose machine learning approaches to classify flight delays with a prediction horizon of several days prior to the day of the flight execution. Choi et al. (2016) achieve an accuracy of 0.268 using weather forecasts available 5 days prior to the day of the flight execution. The authors employ random forest classifiers that are exclusively trained with weather-related features. Belcastro et al. (2016) propose a model to classify flights as being delayed exclusively as a result of unfavorable weather conditions. The authors use a balanced flight dataset, where a random under-sampling algorithm is used to decrease the number of delayed samples. The features considered are the scheduled departure/arrival time, the origin/destination airport and the weather conditions. The proposed random forests classifier obtains an accuracy of 0.858 and a recall of 0.869 with a 60 min delay threshold (a flight is considered to be delayed if it has a delay of 60 min or more relative to the scheduled arrival time). Horiguchi et al. (2017) consider a flight delay prediction horizon of 5 months before the day of the flight execution. An XGBoost classifier achieves an area under the ROC curve (AUC score) of 0.534 when predicting flight delay for 20 airports in Asia for a low-cost airline.

Several studies analyze flight cancellations at an airport. Sridhar et al. (2009) propose a neural network approach that aims at predicting the total aggregate number of flight cancellations. The accuracy of the predictions obtained is 0.79. Many studies also propose logit models to explain the influence of several variables on a flight being cancelled (Rupp and Holmes, 2006; Xiong and Hansen, 2009).

This paper expands this previous work on flight delay and cancellation prediction by developing machine learning classifiers to predict flight delays and cancellations with a 6-month prediction horizon at a large European airport. These flight delay and cancellation predictions are further used to assess strategic flight schedules on their impact on the airport's on-time performance.


3. Description of the case study data

The case studies presented in this paper are based on the strategic flight schedules, i.e., scheduled arrival and scheduled departure flight times, at London Heathrow Airport (LHR) in the period 30 March 2013–30 March 2018. These scheduled arrival and departure times correspond to 10 strategic slot allocation schedules, i.e., for each year in the period 2013–2018 there is a 6-month Summer Season schedule (end March to end September) and a 6-month Winter Season schedule. These schedules are the result of the IATA slot allocation process at LHR: airlines submit requests for arrival/departure slots; the airport coordinator grants slots while taking into account the available capacity at LHR and the IATA guidelines for slot allocation (International Air Transport Association, 2017).

Table 1 gives an example of scheduled flights at LHR, following the IATA slot allocation process. Each arrival/departure flight is assigned an arrival/departure time between 6:30 and 24:00, an interval of dates when the flight is scheduled for arrival/departure, the frequency of the arrival/departure over the indicated interval of dates, an LHR terminal (at LHR there are 4 passenger terminals and a cargo terminal), and the destination (origin) airport for a flight departing from (arriving at) LHR. The type of aircraft assigned to a scheduled flight is also known from the slot request submitted by the airline. For example, flight KL1031 in Table 1 is scheduled for arrival at 17:55 every day from Monday to Saturday in the period 01.04.2013–01.07.2013.

We consider 2.3 million flights scheduled to arrive at and depart from LHR in the period March 2013–September 2018. The datasets considered in this paper have been provided by LHR. There are 177 types of aircraft assigned to these flights and a total of 542 distinct origin/destination airports for the flights arriving at/departing from LHR. Moreover, 25% of these flights are short-haul, with a flown distance of at most 700 km, 25% of the flights have a flown distance of 700–1400 km, 30% of the flights 1400–6000 km, and 20% of the flights more than 6000 km.

Table 2 shows the flight imbalance ratio (IR), i.e., the ratio of non-delayed to delayed flights and of non-cancelled to cancelled flights, for the flights scheduled in the period 2013–2018 at LHR. As expected, the number of cancelled flights is much smaller than the number of non-cancelled flights. Table 2 also shows the standard deviation (STD) of the flight arrival/departure delays at LHR in 2013–2018.

Table 2
Actual delays and cancellations of flights that have been scheduled in the 2013–2018 strategic schedules.

Flight          Imbalance ratio   Delay STD (min)
Arrivals        3.15              44.68
Departures      3.70              32.87
Cancellations   58.88             –

We say that an arrival (departure) flight has an arrival (departure) delay if, during execution, this flight arrives (departs) 16 min or more after the scheduled time of arrival (departure) (EUROCONTROL, 2017). We say that an arrival (departure) flight is cancelled if this flight is not executed on the day when it was scheduled to arrive (depart).

4. Machine learning algorithms for flight delay and cancellation prediction

In this Section, we present a machine learning approach to predict whether strategic, scheduled arrival/departure flights are delayed or cancelled. These predictions are based on strategic flight schedules from LHR and assume a 6-month prediction horizon, i.e., we predict whether flights are delayed or cancelled 6 months prior to the day of the flight execution. We make use of classification algorithms to predict: arrival flight delay, departure flight delay and flight cancellation. To train and test our algorithms, we consider a set of 10 strategic schedules in the period 2013–2018 (2 schedules per year). In Section 4.1 we discuss the selection of features used as input for the flight classification algorithms. Section 4.2 presents the performance of three flight classification algorithms: LightGBM, multilayer perceptron (MLP) and random forests (RF). In Section 4.3 we make use of model-agnostic interpretability methods to explain the predictions yielded by the classification algorithms.

Fig. 1 shows the methodology used for our proposed machine learning approach. For each classifier, the training set consists of 9 of the 10 strategic schedules. The testing set consists of the remaining strategic schedule. We repeat this training process for each of the 10 schedules considered.

4.1. Feature selection

In this section we discuss the selection of the features used to classify strategic, scheduled flights as being delayed or cancelled.

We select the features used for the flight delay and cancellation classification algorithms using the recursive feature elimination (RFE) method (Guyon et al., 2002; Granitto et al., 2006), which recursively eliminates weak features using cross-validation and scoring of feature subsets. Table 3 shows the RFE-based features selected to classify an arrival/departure flight with a 6-month prediction horizon. Table 4 provides a detailed description of these features.

The features Airline, Terminal, Aircraft, Airport, Year, Month, Hour, Day of year, Day of month and Day of week are obtained directly from the strategic flight schedules, which are compliant with the IATA slot allocation guidelines. The features Distance, Country and Seats are derived after processing the 6-month strategic schedules. The feature Seats has been derived from the type of aircraft assigned to a flight and ranges from 4 seats (for regional jets) to 800 seats (for A380-800). The feature Arrival ATFM delay is a daily average feature, i.e., it assumes the same value for all flights on a specific day of operations, and corresponds to the duration between the last Estimated Take-Off Time (ETOT) and the Calculated Take-Off Time (CTOT) allocated by the European ATM Network Manager. A positive value of this parameter indicates traffic congestion due to, for instance, weather conditions. We have also considered several other features, such as the number of aircraft present at the airport at the moment of/1 h before/1 h after the flight arrival time/departure time, the arrival stack, and the standard instrument departure (SID) route. However, these features have been eliminated by the RFE algorithm, i.e., these features did not further improve the performance of the classification algorithms.

The categorical features Airline, Terminal, Aircraft and Country are encoded using the target encoding method (Micci-Barreca, 2001). The categorical feature Airport has been encoded using both the geographic coordinates of the airport and target encoding. Binary encoding and one-hot encoding methods have not been employed due to the high cardinality of the categorical features.

Table 1
Example of strategic flight schedule with flights scheduled to arrive at/depart from LHR.

FlightID  Arrival/Departure  Time  Start Date      End Date        Days of the week  Terminal  Origin/Destination  Aircraft
KL1031    Arr.               1755  April 01, 2013  July 01, 2013   123456⋅           4         AMS                 73W
BA830     Dep.               0930  April 01, 2013  June 20, 2013   12⋅⋅567           1         DUB                 320
DL100     Arr.               0800  April 01, 2013  May 13, 2013    1⋅⋅45⋅7           3         JFK                 764
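
As an illustration of how one row of Table 1 unfolds into individual scheduled flights, the following minimal sketch expands a recurrent series into its operating dates. It assumes that digit 1 in the "Days of the week" column denotes Monday (consistent with KL1031 operating Monday to Saturday) and that any non-digit character marks a non-operating day; the function name `expand_series` is hypothetical and not part of the paper.

```python
from datetime import date, timedelta


def expand_series(start, end, days_of_week):
    """Return every date in [start, end] whose ISO weekday (1 = Monday,
    7 = Sunday) appears as a digit in the 7-character pattern.
    Assumption: non-digit placeholders mark days without a flight."""
    active = {int(ch) for ch in days_of_week if ch.isdigit()}
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=k) for k in range(n_days))
    return [d for d in all_days if d.isoweekday() in active]


# KL1031 from Table 1: arrivals Monday to Saturday, 1 April to 1 July 2013.
kl1031 = expand_series(date(2013, 4, 1), date(2013, 7, 1), "123456.")
print(len(kl1031), kl1031[0], kl1031[-1])
```

Each returned date, combined with the scheduled time of the row, would correspond to one scheduled flight to be labelled later as delayed, cancelled or on time.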


Ordinal encoding has not been used either, since it assumes an unnecessary ordering within a feature. For example, an ordinal encoding of the airlines such as 1, 2, 3, … would mean that an airline encoded as 1 is more similar to an airline encoded as 2 than to an airline encoded as 8. Table 5 gives an example for each of the mentioned encoding methods, where one-hot encoding uses strings of bits with only one high bit (1) and the rest low bits (0) for each airline type, ordinal encoding uses ordered integers for each airline type, and binary encoding uses binary strings of bits for each airline type. Lastly, the target encoding method (Micci-Barreca, 2001), taking the case of departure delay classifiers, encodes an airline type based on the probability that a flight from this airline is delayed (the target variable). As an example, in Table 5 there are a total of 6 flights and the airline BA is encoded as $2/3 \approx 0.67$, since 2 of the 3 BA flights are delayed.

The features Hour, Day of year and Month have been transformed by trigonometric functions to account for periodicity (Horiguchi et al., 2017). For example, for a specific hour of the day $t$, we use the trigonometric functions $\sin(2\pi t/24)$ and $\cos(2\pi t/24)$ to ensure a 24 h periodicity. As a consequence, $t = 24{:}00$ and $t = 1{:}00$ become sequential hours. Similarly, we ensure a periodicity of 365 days for the feature Day of year and a periodicity of 12 for the feature Month.

4.2. Flight delay and cancellation classification algorithms

In this Section we present three machine learning classification algorithms to classify flight delays and cancellations 6 months in advance of the day of the flight execution: LightGBM, multilayer perceptron (MLP) and random forests (RF). These three algorithms belong to different types of machine learning algorithms: gradient boosting decision trees, neural networks and random decision forests, respectively. We make use of three different classification algorithms to cross-check our results.

LightGBM (Ke et al., 2017) is a tree-based machine learning algorithm where ensembles of decision trees are trained in sequence by fitting negative gradients of the loss. LightGBM uses Gradient-based One-Side Sampling, which excludes data instances with small gradients, and Exclusive Feature Bundling, which bundles mutually exclusive variables, thus reducing the number of features. To estimate the hyperparameters that yield the best performance, we use the Python library hyperopt (Bergstra et al., 2013) to optimize the f1-score metric (Duda et al., 2012), i.e., the harmonic mean of precision and recall, by performing Bayesian optimization. Table 6 shows the hyperparameters of the LightGBM classifiers. The best performance is achieved with a high learning rate and with a relatively small number of decision trees.

Multilayer perceptron (MLP) (Hinton, 1990) is a feed-forward neural network that has consecutive layers with adaptive weights. The vector of inputs of the MLP is normalized to $N(0, 1)$. The initialization of the weights follows a normal distribution $N(0, 0.01)$. To increase the stability of the neural network, all the hidden layers have batch normalization. Table 7 shows the hyperparameters of the MLP classifiers. All classifiers produced superior results when trained with two hidden layers and with the Adam optimizer (Kingma and Ba, 2014).

Fig. 1. Methodology to evaluate the flight delay and cancellation classifiers.

Table 3
Feature selection for flight delay and cancellation classifiers with a 6-month prediction horizon.

Classifier            Features
Departure Delay       Airline (a), Terminal (a), Aircraft (a), Distance, Airport (c), Country (a), Seats, Year,
                      Month (b), Hour (b), Day of year (b), Day of month (b), Day of week (b)
Arrival Delay         Airline (a), Terminal (a), Aircraft (a), Distance, Airport (c), Country (a), Seats, Year,
                      Month (b), Hour (b), Day of year (b), Day of month (b), Day of week (b)
Flight Cancellation   Airline (a), Terminal (a), Aircraft (a), Distance, Airport (a), Country (a), Year,
                      Hour (b), Day of year (b), Day of month (b), Day of week (b)

(a) Feature preprocessed with the target encoding method.
(b) Feature transformed by trigonometric functions.
(c) Categorical feature encoded using geographic coordinates.

Table 4
Description of features used for flight delay and cancellation classification algorithms, 6-month prediction horizon. C = categorical, N = numerical, T = trigonometric transform function.

Feature              Type  Description
Airline              C     airline operating the flight
Terminal             C     arrival/departure airport terminal assigned to a flight
Aircraft             C     aircraft type
Distance             N     distance between origin and destination airport (km)
Airport              C     origin/destination airport of the flight
Country              C     country of origin/destination airport
Seats                N     number of seats of the aircraft assigned to a flight
Year                 N     scheduled year of flight arrival/departure
Month                T     scheduled month of flight arrival/departure
Hour                 T     scheduled hour of the day of flight arrival/departure
Day of year          T     scheduled day of the year of flight arrival/departure
Day of month         T     scheduled day of the month of flight arrival/departure
Day of week          T     scheduled day of the week of flight arrival/departure
Arrival ATFM delay   N     daily average ATFM arrival delay (min)

Table 5
Example of encoding methods for feature Airline and classifier departure delay. Here we consider 3 airlines, i.e., TAP, KLM, BA, and a total of 6 flights.

Airline  Delayed  One-hot encoding  Ordinal encoding  Binary encoding  Target encoding
TAP      Yes      100               1                 00               0.5
KLM      No       010               2                 01               0
BA       Yes      001               3                 10               0.67
TAP      No       100               1                 00               0.5
BA       Yes      001               3                 10               0.67
BA       No       001               3                 10               0.67
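
The target-encoding column of Table 5 and the sine/cosine transform of Section 4.1 can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes the plain per-category mean of the target, computed on the full sample, without any smoothing toward the overall delay rate and without separating training and test data. The function names `target_encode` and `cyclic_encode` are hypothetical.

```python
import math
from collections import defaultdict

# The six flights of Table 5: (airline, departure delayed?)
flights = [("TAP", True), ("KLM", False), ("BA", True),
           ("TAP", False), ("BA", True), ("BA", False)]


def target_encode(rows):
    """Map each category to the observed fraction of positive targets."""
    totals, positives = defaultdict(int), defaultdict(int)
    for category, target in rows:
        totals[category] += 1
        positives[category] += int(target)
    return {c: positives[c] / totals[c] for c in totals}


def cyclic_encode(value, period):
    """Return (sin, cos) of a periodic feature, e.g. Hour with period 24."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


airline_code = target_encode(flights)
print(airline_code)             # fraction of delayed flights per airline
print(cyclic_encode(23, 24))    # hour 23 lies next to hour 0 on the circle
```

With the two-dimensional representation, the distance between 23:00 and 00:00 equals the distance between 00:00 and 01:00, which is the property the periodic transform is meant to provide.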


Table 6
Hyperparameters of LightGBM flight delay and cancellation classifiers with a 6-month prediction horizon.

Classifier     Number of input features  Learning rate  Max depth of tree  Trees  Subsample  Weight positive class
Dep. Delay     18                        0.1            15                 300    0.578      2.265
Arr. Delay     18                        0.1            39                 200    0.573      2.100
Cancellation   14                        0.1            30                 100    0.781      7.300

Additionally, dropout layers were included to reduce overfitting. The small learning rate used increased the computational time of the MLP classifiers when compared with the LightGBM models, as shown in Table 10. We note that, since we aim to compare the performance of the LightGBM algorithm with other prediction algorithms, for the MLP classifier we consider a depth limited to 3 hidden layers.

Table 7
Hyperparameters of MLP flight delay and cancellation classifiers with a 6-month prediction horizon, Adam optimizer.

Classifier     Number of input features  Number of neurons for each layer  Batch size  Dropout rate  Learning rate
Dep. Delay     18                        100→100                           1000        0.15          1.0 × 10^-4
Arr. Delay     18                        110→110                           1000        0.05          1.0 × 10^-3
Cancellation   14                        150→150                           1000        0.05          1.0 × 10^-6

Random Forests (RF) (Breiman, 2001) is an ensemble learning method that aggregates dissimilar decision trees. Only a random part of the training set is used to build each tree. To increase the ensemble diversity, further randomness is introduced when building each tree by selecting a fraction of the total number of features. Once all trees are created, the classification of each sample in the test set is obtained by combining the predictions of each tree through majority voting. Table 8 shows the hyperparameters of the RF classifiers.

Before presenting the results of the classification algorithms, we introduce the following metrics. We consider the True Negatives (TN), the False Positives (FP), the False Negatives (FN) and the True Positives (TP) (Marsland, 2011; Duda et al., 2012; Hossin and Sulaiman, 2015), where TN is the number of actual non-delayed flights that are predicted to be non-delayed, FP is the number of actual non-delayed flights that are predicted to be delayed, FN is the number of actual delayed flights that are predicted to be non-delayed, and TP is the number of actual delayed flights that are predicted to be delayed. Then, we determine

$$\text{Accuracy} = \frac{TP + TN}{TN + FP + FN + TP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$

the f1-score, i.e., the harmonic mean of Precision and Recall, and the AUC, i.e., the area under the curve determined by the rate of TP and FP (Marsland, 2011).

Table 9 shows the performance of LightGBM, MLP and RF to classify arrival/departure flights as being delayed and cancelled. We note that the prediction horizon is 6 months prior to the day of the flight execution. A 5-fold cross validation is performed using the data on the flights arriving and departing in the period 2013–2018 from LHR. Among the 3 classification algorithms, LightGBM performs the best with respect to accuracy, precision, recall and area under the ROC curve (AUC) (Duda et al., 2012).

All classifiers achieve an accuracy of 0.75 or higher for flight delay classification and 0.98 or higher for cancellation classification. We note that the prediction horizon is 6 months prior to the day of the flight execution. Given that the training and test data consist of a larger number of negatives, i.e., not-delayed/not-cancelled flights, than positives, i.e., delayed/cancelled flights (see also the imbalance ratios in Table 2), it is necessary to analyze closely both the recall, i.e., how many of the actual positives are predicted as positive, and the precision, i.e., how many of the predicted positives are correctly predicted. As such, when estimating the hyperparameters of the classification algorithms, the aim has been to obtain the highest value of the f1-score and similar values for precision and recall, i.e., a similar number of false negatives and false positives.

Figs. 2–4 show the ROC curves achieved using LightGBM, MLP and RF for the three classifiers: arrival flight delay, departure flight delay and flight cancellation classifiers. Again, the results obtained using LightGBM show the largest AUC, i.e., the ability to identify actual delayed (not delayed) flights as delayed (not delayed) (Duda et al., 2012).

The computational times required for the LightGBM, MLP and RF classifiers are given in Table 10. All three classifiers have been trained and tested on the same dataset (see Section 3). The RF classifiers require the most computational time, despite the small depth of the trees generated and the low percentage of features considered for each split (see Table 8).

4.3. Model-agnostic interpretability - LightGBM flight classification algorithms

In this Section we interpret the results yielded by the LightGBM flight classifiers, which have the best performance with respect to accuracy, precision and recall (see Table 9).

To determine the impact of a feature on the output of the LightGBM classifiers, we determine the Shapley additive explanations (SHAP) value (Lundberg and Lee, 2017) of a feature $i$, which we denote as $\varphi_i$, for every flight classified, as follows (Lundberg and Lee, 2017):

$$\varphi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right],$$

where $F$ is the set of all features considered for the classification algorithm, $S \subseteq F \setminus \{i\}$ is a subset of the features in $F$ that excludes feature $i$, and $f(S)$ is the expected classification output given by the set $S$ of features.
negative impact on the delay/cancellation flight classification and what
is the magnitude of the impact, i.e., how much a specific feature value
Table 8
drives the classification of a flight as delayed/cancelled. For a specific
Hyperparameters of RF flight delay and cancellation classifiers with a 6-month
flight, a large positive (large negative) SHAP value of a feature indicates
prediction horizon.
that this feature has a large contribution for the flight to be classified as
Classifier Number Number trees Max Percentage
delayed/cancelled (not delayed/cancelled). In this paper, the SHAP
input generated depth of features for each
features tree split
values are expressed in log odds, where the log odd of a variable A is
defined as:
Dep. Delay 18 500 11 0.60
Arr. Delay 18 1000 12 0.55
Cancellation 14 500 10 0.60

5
M. Lambelho et al. Journal of Air Transport Management 82 (2020) 101737

Table 9
5-fold cross validation results for machine learning models with 6-month pre­
diction horizon.
Classifier LightGBM MLP RF

Metric Mean STD Mean STD Mean STD

Departure Accuracy 0.794 5:8 � 0.772 3:8 � 0.771 9:3 �


Delay 10 3 10 3 10 4
Precision 0.516 2:1 � 0.467 7:6 � 0.460 2:4 �
10 3 10 3 10 3
Recall 0.516 2:2 � 0.488 1:0 � 0.455 2:7 �
10 3 10 2 10 3
f1-score 0.516 1:4 � 0.478 2:8 � 0.458 1:9 �
10 3 10 3 10 3
AUC 0.786 1:0 � 0.754 2:4 � 0.744 8:3 �
10 3 10 3 10 4
Arrival Accuracy 0.791 5:0 � 0.771 3:9 � 0.759 8:0 �
Delay 10 4 10 3 10 4
Precision 0.567 2:2 � 0.525 9:3 � 0.500 3:0 �
Fig. 3. ROC curves of flight departure delay classifiers.
10 3 10 3 10 3
Recall 0.553 2:2 � 0.527 1:1 � 0.515 1:3 �
10 3 10 2 10 3
f1-score 0.560 1:6 � 0.526 3:6 � 0.507 1:8 �
10 3 10 3 10 3
AUC 0.803 1:2 � 0.774 2:1 � 0.758 1:4 �
10 3 10 3 10 3
Cancellation Accuracy 0.987 1:8 � 0.984 2:0 � 0.984 8:5 �
10 4 10 4 10 5
Precision 0.608 3:8 � 0.532 1:5 � 0.529 7:0 �
10 3 10 2 10 3
Recall 0.592 2:0 � 0.530 1:0 � 0.529 3:0 �
10 3 10 2 10 3
f1-score 0.600 1:8 � 0.531 8:3 � 0.529 4:1 �
10 3 10 3 10 3
AUC 0.929 4:3 � 0.840 7:2 � 0.862 3:3 �
10 3 10 3 10 3

� �
PðAÞ
log ; ​ with ​ PðAÞ < 1: Fig. 4. ROC curves of flight cancellation classifiers.
1 PðAÞ
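As an illustration of the two definitions above, the following minimal Python sketch evaluates the Shapley sum by brute force and converts between probabilities and log odds. The value function `toy_value`, the feature names and all numbers are hypothetical and chosen only for illustration; they are not taken from the LightGBM classifiers of the paper, which would typically be explained with a dedicated SHAP implementation rather than this exhaustive enumeration.

```python
from itertools import combinations
from math import exp, factorial, log


def shapley_values(features, value):
    """Exact Shapley values; `value` maps a frozenset of features S to f(S)."""
    n = len(features)
    phi = {}
    for i in features:
        others = [g for g in features if g != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from the SHAP formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def log_odds(p):
    """Log odds of a probability p with 0 < p < 1."""
    return log(p / (1.0 - p))


def probability(z):
    """Inverse of log_odds."""
    return 1.0 / (1.0 + exp(-z))


# Hypothetical value function expressed in log odds (assumption, not paper data).
BASE = log_odds(0.2)
EFFECT = {"atfm_delay": 1.0, "hour": 0.4, "airline": -0.3}


def toy_value(s):
    z = BASE + sum(EFFECT[g] for g in s)
    if "atfm_delay" in s and "hour" in s:
        z += 0.2  # hypothetical interaction between two features
    return z


features = list(EFFECT)
phi = shapley_values(features, toy_value)
for name in features:
    print(name, round(phi[name], 3))
print("predicted probability:", round(probability(BASE + sum(phi.values())), 3))
```

In this toy case the interaction term is shared equally between the two interacting features, and the SHAP values sum to the difference between the output with all features and the output with none.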
We note that, given a feature, a SHAP value in log odds close to zero indicates that this feature does not help in classifying a flight as being delayed/cancelled or not.

Figs. 5–7 are summary plots that show the SHAP values for all features for all flights considered for classification, i.e., these figures show an aggregation of dots, where each dot corresponds to a flight. For a given feature, each dot corresponds to a flight and an associated SHAP value. For a given feature, the color blue of a dot (flight) indicates that, for this flight, the value of the feature is small, while the color red indicates that the value of the feature is large. For example, in Fig. 5, for the feature Arrival ATFM Delay, the dots (flights) colored red have large Arrival ATFM delays, while the dots (flights) colored blue have small Arrival ATFM delays. For a given feature, an accumulation of dots indicates that there is a large number of flights that have similar SHAP values. As an example, in Fig. 5, for the feature Arrival ATFM Delay, there is a significant number of flights where this feature has a SHAP value between −1 and 0, i.e., an accumulation of blue dots corresponding to a SHAP value between −1 and 0. Again, the color blue indicates that all these flights have a low Arrival ATFM delay. The blue dots (flights) that have a negative SHAP value are those flights with a low Arrival ATFM delay (blue color) and that are classified as not delayed (negative SHAP). In particular, for the blue dots (flights) where the SHAP value is close to zero, the Arrival ATFM delay is low (blue color), but it does not significantly impact the classification of these flights (SHAP value in log odds close to zero).

Table 10
Computational time for LightGBM, MLP and RF classifiers.

| Classifier | LightGBM (sec) | MLP (sec) | RF (sec) |
|---|---|---|---|
| Departure Delay | 55.91 | 715.43 | 943.97 |
| Arrival Delay | 33.66 | 1554.75 | 1874.01 |
| Cancellations | 27.31 | 2131.48 | 1849.15 |

Fig. 2. ROC curves of flight arrival delay classifiers.

Fig. 5. SHAP values (log odds) of the features used for delayed departure flight classification using LightGBM - 6 months prediction horizon.

Fig. 6. SHAP summary plot (log odds) of the features used for delayed arrival flight classification using LightGBM - 6 months prediction horizon.

Fig. 7. SHAP values (log odds) of the features used for flight cancellation classification - LightGBM, 6 months prediction horizon.

In Figs. 5–7, the features are sorted by the sum of the SHAP value magnitudes over all samples, such that the feature at the top of the graph has the highest impact on the flight classification, whereas the feature at the bottom of the graph has the lowest impact. For example, in Fig. 5, the feature Arrival ATFM delay is at the top of the graph since it has the highest impact on the flight delay classification. The features Hour and Airline have the second and third largest impact on the flight delay classification. Similarly, Fig. 6 shows that the features Arrival ATFM Delay, Airline and Hour have the highest impact on the arrival flight delay classification. Both Figs. 5 and 6 show that the feature Seats also has a high importance for both arrival and departure delay classifiers. The feature Terminal has the lowest feature importance for departure flight delay classification. For the arrival flight delay classification, however, the feature Terminal has a larger importance, whereas the features Month and Day of the month have the lowest feature importance. Fig. 7 shows that the features origin/destination Airport and Airline have the highest feature importance for the flight cancellation classification algorithm. The feature Aircraft also has a high importance in the cancellation classifier when compared with the flight delay classifiers. The feature Day of the month shows the lowest feature importance for the cancellation classifier.

Figs. 5–7 also allow for a detailed analysis of the impact of each feature. For a given feature, the color red of a dot, i.e., flight, and a corresponding large SHAP value show that large values (red) of this feature are very significant (large SHAP value) for the classification. For example, Fig. 5 shows that for the feature Arrival ATFM Delay, there are several dots, i.e., flights, that are red and that have a positive and large SHAP value. This means that a large (red) value for the Arrival ATFM Delay is very significant (large SHAP value) and drives the classification of a departure flight as being delayed (positive SHAP value). In Fig. 6 there is a larger accumulation of blue dots (flights), with SHAP values between −1 and zero. The color blue indicates that these departure flights have low Arrival ATFM Delays. Here, the blue dots (flights) with negative SHAP values away from zero indicate that, for these flights, small (blue) Arrival ATFM Delays drive the classification of these departure flights as being not delayed (negative SHAP value). Also, the blue dots (flights) with negative SHAP values close to zero indicate that, for these departure flights, the small (blue) Arrival ATFM Delays do not significantly impact the classification of these flights. In Fig. 5, for feature Arrival ATFM Delay, we also note that for the dots (flights) with SHAP values around zero, i.e., the feature does not significantly drive the classification of a flight as being delayed or not delayed, the values of the Arrival ATFM Delay are low (blue color). Thus, low Arrival ATFM Delays have a low classification importance.

A similar analysis can be made for the other features in Figs. 5–7. We note that for the categorical features Airline, Country, Aircraft and Terminal, which are encoded using the target encoding method, high feature values mean high probabilities of delay. Here, it can be seen that, for these features, high values, i.e., high probabilities of delay, correspond to high SHAP values. For the features encoded with trigonometric functions (see also Section 4.1), we note that a detailed analysis of the summary plots is not straightforward as we apply sin and cos transformations. As such, for these features, we make use of the summary plots to determine their feature importance relative to the other features, as discussed above.
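The two feature encodings mentioned in this Section can be sketched as follows. This is a minimal illustration under stated assumptions: `target_encode` uses a plain, unsmoothed mean of the binary delay label per category, and `cyclical_encode` maps a periodic value to a (sin, cos) pair. The exact encoding variant used in the paper is described in its Section 4.1, not here, and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def target_encode(categories, labels):
    """Map each category to the mean of its binary labels (share of delayed flights)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, label in zip(categories, labels):
        sums[category] += label
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in counts}


def cyclical_encode(value, period):
    """Encode a periodic feature (e.g. hour of day) as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Hypothetical sample: airline code per flight and whether the flight was delayed.
airlines = ["A", "A", "B", "B", "B"]
delayed = [1, 0, 1, 1, 0]
encoding = target_encode(airlines, delayed)
print(encoding)

# Hours 23 and 0 end up close to each other, unlike with a raw integer encoding.
print(cyclical_encode(23, 24), cyclical_encode(0, 24))
```

With the target encoding, a higher encoded value corresponds to a higher observed share of delayed flights for that category, which is why high feature values can be read directly as high delay probabilities in the summary plots.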

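The feature ordering used in the summary plots, i.e., sorting features by the sum of the SHAP value magnitudes over all samples, can be sketched in a few lines. The SHAP matrix below is hypothetical and serves only to show the computation; the function name `rank_features_by_shap` is likewise an illustrative choice.

```python
def rank_features_by_shap(shap_rows, feature_names):
    """Sort features by the sum of |SHAP value| over all samples, largest first."""
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for k, value in enumerate(row):
            totals[k] += abs(value)
    order = sorted(range(len(feature_names)), key=lambda k: totals[k], reverse=True)
    return [(feature_names[k], totals[k]) for k in order]


# Hypothetical SHAP values (log odds): one row per flight, one column per feature.
names = ["Terminal", "Hour", "Arrival ATFM Delay", "Airline"]
shap_rows = [
    [0.00, -0.3, 1.2, 0.2],
    [0.05, 0.4, -0.8, -0.1],
    [-0.02, -0.5, -0.6, 0.3],
]
ranking = rank_features_by_shap(shap_rows, names)
for name, total in ranking:
    print(f"{name}: {total:.2f}")
```

Because magnitudes are summed, a feature with many moderately negative SHAP values can rank as high as one with a few large positive values.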

5. Assessing strategic flight schedules using machine-learning flight delay and cancellation predictions

In this Section we present a generic approach to assess strategic flight schedules in terms of associated flight delays and cancellations. In Section 5.1 we present a generic method to assess flight schedules based on a set of general KPIs. Rather than assigning weights to the considered KPIs, which are user-specific and may be difficult to define, we propose a generic ranking of the schedules based on the relative improvement in KPIs of some schedules against others. However, in the case of strategic flight schedules, i.e., 6 months prior to the flight execution, the values of the flight delay and cancellation-related KPIs are not known. In fact, a strategic flight schedule does not give any indication of the associated number of delayed/cancelled flights. To address this, in our proposed schedule assessment we make use of KPIs derived from the flight delay and cancellation predictions (see Section 4). In Section 5.2 we apply this assessment method to rank 10 strategic schedules using KPIs derived from machine learning-based flight delay and cancellation predictions. The numerical examples in this section are based on strategic flight schedules from LHR in the period 2013–2018.

5.1. Generic schedule assessment methodology

In this Section we apply an assessment methodology for strategic flight schedules using a set of general KPIs. This assessment can be done both before and after the execution of the flights, as long as the values of the KPIs are known. When assessing the flight schedules, we make use of the notion of schedule domination, which we define below, rather than assuming user-defined weights for the KPIs considered. Thus, we propose a generic assessment methodology that does not depend on the weights of the KPIs, which are user-specific.

We characterize a strategic flight schedule $i$ by a set of $n$ KPIs, i.e. $S_i : (KPI^i_1, \ldots, KPI^i_n)$. We are interested in those schedules where the values of all $n$ KPIs are minimal. To this end, we define the concept of schedule domination as follows. We say that schedule $i$, $S_i : (KPI^i_1, KPI^i_2, \ldots, KPI^i_n)$, dominates schedule $j$, $S_j : (KPI^j_1, KPI^j_2, \ldots, KPI^j_n)$, if $\forall u \in \{1, 2, \ldots, n\}$, $KPI^i_u \leq KPI^j_u$, and there exists at least one KPI $l \in \{1, 2, \ldots, n\}$ such that $KPI^i_l < KPI^j_l$ (Boyd and Vandenberghe, 2004).

We consider the set $\mathcal{S} = \{S_1, \ldots, S_N\}$ of $N$ schedules. The Pareto front of the schedules $i \in \mathcal{S}$, $1 \leq i \leq N$, with respect to the KPIs $KPI_1, \ldots, KPI_n$, is the subset $\mathcal{S}_1$ of schedules that are not dominated by any other schedule (Boyd and Vandenberghe, 2004), where $\mathcal{S}_1 \subset \mathcal{S}$. We say that layer 1, which we denote by $L_1$, consists of all the schedules in $\mathcal{S}_1$. We next partition the set of remaining schedules $\mathcal{S} \setminus \mathcal{S}_1$ into additional layers. We define layer 2, i.e., $L_2$, of the schedules $i \in \mathcal{S}$, $1 \leq i \leq N$, as the set of schedules that are in the Pareto front of the schedules $\mathcal{S} \setminus \mathcal{S}_1$, with $\mathcal{S} \setminus \mathcal{S}_1 \neq \emptyset$. In general, we define layer $m$, denoted by $L_m$, of the schedules $i \in \mathcal{S}$, $1 \leq i \leq N$, as the set of schedules in the Pareto front of the schedules $\mathcal{S} \setminus (\mathcal{S}_1 \cup \mathcal{S}_2 \cup \ldots \cup \mathcal{S}_{m-1})$, with $\mathcal{S} \setminus (\mathcal{S}_1 \cup \mathcal{S}_2 \cup \ldots \cup \mathcal{S}_{m-1}) \neq \emptyset$.

Fig. 8 shows an example of dominance relationships between 6 schedules $S_1, S_2, \ldots, S_6$. Schedules $S_1, S_2, S_3$ form layer 1. Schedules $S_4, S_5$ form layer 2. Schedule $S_6$ forms layer 3. Fig. 8 also shows the dominance boundaries for each schedule, i.e., the bounds of the set of points that a schedule dominates. All the schedules $i$, $1 \leq i \leq 6$, located within the dominance boundaries of a given schedule $j \neq i$, $1 \leq j \leq 6$, are dominated by schedule $j$. Here, schedules $S_1$, $S_6$ and $S_3$ do not dominate any other schedule, schedules $S_4$ and $S_5$ dominate schedule $S_6$, and schedule $S_2$ dominates schedules $S_4$, $S_5$ and $S_6$.

Fig. 8. Example of schedule dominance for 2 KPIs.

We next define the dominance power of a schedule $i \in \mathcal{S}$, $1 \leq i \leq N$, as introduced in Valkanas et al. (2014). We consider a total of $m > 0$ layers. We say that the dominance power of schedule $i$, $i \in \mathcal{S}$, which we denote by $D(S_i)$, is as follows:

$$D(S_i) = \sum_{\substack{k=1 \\ k \neq i}}^{N} \frac{1}{j}\, \mathbf{1}\big((S_i \text{ dominates } S_k) \cap (S_k \in L_j)\big), \tag{1}$$

where $1 \leq j \leq m$, and $\mathbf{1}(A)$ is an indicator function that takes value 1 if $A$ is true and zero otherwise.

As an example, in Fig. 8, $D(S_2) = \frac{1}{2} + \frac{1}{2} + \frac{1}{3}$ since $S_2$ dominates $S_4$ from layer 2, $S_5$ from layer 2 and $S_6$ from layer 3; $D(S_4) = D(S_5) = \frac{1}{3}$ because both $S_4$ and $S_5$ dominate $S_6$ from layer 3; and $D(S_1) = D(S_3) = D(S_6) = 0$ since $S_1, S_3, S_6$ do not dominate other schedules.

Lastly, we rank the strategic flight schedules $i \in \mathcal{S}$, $1 \leq i \leq N$, based on their dominance power. We are interested in those schedules with the highest dominance power. We assign a ranking position of 1 to the schedule(s) with the highest dominance power, a ranking position of 2 to the schedule(s) with the second highest dominance power, and so on.

5.2. Assessing strategic flight schedules - results

Fig. 9. Layers of the strategic flight schedules when considering the percentage of predicted flights cancelled and the percentage of predicted delayed arrival flights.

In this Section we assess 10 strategic flight schedules using the methodology introduced in Section 5.1. In doing so, we consider 5 KPIs, which are based on flight delay and cancellation predictions with a horizon of 6 months prior to the flight execution day (see Section 4): 1) the predicted percentage of flights cancelled, 2) the predicted percentage of departure delayed flights, 3) the predicted percentage of arrival flights delayed, 4) the predicted percentage of departure yellow days, 5)


the predicted percentage of arrival yellow days. These strategic flight schedules correspond to 5 years of operations 2013–2018, when 2 strategic schedules are generated every year (a winter season and a summer season schedule).

Fig. 9 shows the percentage of predicted cancelled flights and the percentage of predicted delayed arrival flights for each of the 10 strategic schedules. Layer 1 consists of schedules $\{S_1, S_3, S_4, S_9\}$. Layer 2 consists of schedules $\{S_2, S_5, S_6, S_7, S_8\}$. Layer 3 consists of schedule $\{S_{10}\}$. Using Equation (1), we determine the dominance power of these 10 strategic schedules when taking into account the percentage of cancelled flights and the percentage of delayed arrival flights. Table 11 shows the dominance power of the schedules, where schedules $S_3, S_1, S_4$ have the best performance with respect to flight cancellations and delayed arrival flights. We note that all these three schedules are part of the Pareto front (layer 1).

Table 11
Dominance power of the 10 strategic schedules $S_1, \ldots, S_{10}$, when considering the percentage of cancelled flights and the percentage of delayed arrival flights.

| | S3 | S1 | S4 | S2 | S6 | S8 | S10 | S5 | S7 | S9 |
|---|---|---|---|---|---|---|---|---|---|---|
| D(Si) | 2.33 | 1.83 | 1.33 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 | 0 |

Table 12 shows the ranking of the 10 strategic schedules based on the dominance power of the schedules (see Equation (1)). When the dominance power of two or more schedules is the same, we further discriminate between these schedules by ranking them based on the predicted percentage of cancelled flights, where the schedule with the lowest percentage of flight cancellations is preferred. Table 12 shows the schedule ranking obtained using i) the predicted KPI values, as a result of the machine learning algorithms, and ii) the actual values of the KPIs from the actual flight data. We note that the actual values of the KPIs are known only after the flights have been executed, while the values for the predicted KPIs are known 6 months prior to the flight execution day. The results show that the 4 best-ranked schedules when considering the actual flight data are also captured by the schedule ranking when using the predicted values of the KPIs.

Table 12
Schedule ranking with respect to percentage of flights cancelled and percentage of arrival delays under schedule $S_i$.

| Ranking position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| KPIs prediction | S3 | S1 | S4 | S2 | S6 | S8 | S10 | S5 | S7 | S9 |
| KPIs real-data | S1 | S3 | S2 | S4 | S7 | S5 | S6 | S8 | S10 | S9 |

Fig. 10. Layers of the strategic flight schedules when considering the percentage of predicted flights cancelled and the percentage of predicted delayed departure flights.

Fig. 10 shows the percentage of predicted cancelled flights and the percentage of predicted delayed departure flights for each of the 10 strategic schedules. Layer 1 consists of schedules $\{S_1, S_3, S_4\}$. Layer 2 consists of schedules $\{S_2, S_5, S_6, S_7, S_8\}$. Layer 3 consists of schedules $\{S_9, S_{10}\}$.

Table 13 shows the dominance power of the schedules, where schedules $S_3, S_1, S_4$ have the best performance with respect to flight cancellations and delayed departure flights.

Table 13
Dominance power of the 10 strategic schedules $S_1, \ldots, S_{10}$, when considering the percentage of cancelled flights and the percentage of delayed departure flights.

| | S3 | S1 | S4 | S6 | S8 | S7 | S2 | S10 | S5 | S9 |
|---|---|---|---|---|---|---|---|---|---|---|
| D(Si) | 2.67 | 1.83 | 0.83 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 | 0 |

Table 14
Schedule ranking with respect to percentage of flights cancelled and percentage of departure delays under schedule $S_i$.

| Ranking position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
|---|---|---|---|---|---|---|---|---|---|---|
| KPIs prediction | S3 | S1 | S4 | S6 | S8 | S7 | S2 | S10 | S5 | S9 |
| KPIs real-data | S3 | S1 | S2 | S4 | S10 | S7 | S5 | S6 | S8 | S9 |

Fig. 11. Layers of the strategic flight schedules when considering the percentage of predicted delayed departure and arrival flights.

Table 14 shows the ranking of the 10 strategic schedules based on the dominance power of the schedules. When the dominance power of two or more schedules is the same, we further discriminate between these schedules by ranking them based on the predicted percentage of cancelled flights, where the schedule with the lowest percentage of flight cancellations is preferred. Table 14 shows the schedule ranking obtained using i) the predicted values of the KPIs, as a result of the machine learning algorithms, and ii) the actual values of the KPIs from the actual flight data. Again, we note that the actual values of the KPIs are known only


after the flights have been executed, while the values for the predicted KPIs are known 6 months prior to the flight execution day. The results show that the 2 best-ranked schedules when considering the actual flight data are also captured when considering predicted KPIs.

Fig. 11 shows the predicted percentage of delayed departure and arrival flights for each of the 10 strategic schedules. Layer 1 consists of schedules $\{S_3, S_9\}$. Layer 2 consists of schedule $\{S_7\}$. Layer 3 consists of schedule $\{S_1\}$. Layer 4 consists of schedule $\{S_5\}$. Layer 5 consists of schedule $\{S_8\}$. Layer 6 consists of schedules $\{S_4, S_6\}$. Layer 7 consists of schedules $\{S_2, S_{10}\}$.

Table 15 shows the dominance power of the schedules, where schedules $S_3, S_7, S_9$ have the best performance with respect to delayed departure and arrival flights.

Table 15
Dominance power of the 10 strategic schedules $S_1, \ldots, S_{10}$ when considering the percentage of delayed departure and arrival flights.

| | S3 | S7 | S9 | S1 | S5 | S8 | S6 | S4 | S10 | S2 |
|---|---|---|---|---|---|---|---|---|---|---|
| D(Si) | 1.90 | 1.40 | 1.40 | 1.07 | 0.82 | 0.62 | 0.29 | 0.29 | 0 | 0 |

Table 16 shows the ranking of the 10 strategic schedules based on the dominance power of the schedules. When the dominance power of two or more schedules is the same, we further discriminate between these schedules by ranking them based on the percentage of predicted delayed departure flights, where the schedule with the lowest percentage of delayed departure flights is preferred. Table 16 shows the schedule ranking obtained using i) the predicted values of the KPIs, as a result of the machine learning algorithms, and ii) the values of the KPIs from the actual flight data. The results show that 3 of the 4 best-ranked schedules when considering the actual flight data are also captured when considering predicted KPIs.

Table 16
Schedule ranking with respect to percentage of delayed departure and arrival flights under schedule $S_i$.

| Ranking position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
|---|---|---|---|---|---|---|---|---|---|---|
| KPIs prediction | S3 | S7 | S9 | S1 | S5 | S8 | S6 | S4 | S10 | S2 |
| KPIs real data | S5 | S7 | S1 | S3 | S10 | S9 | S2 | S8 | S4 | S6 |

Fig. 12 shows the predicted performance of the 10 strategic schedules when considering 3 KPIs: the percentage of cancelled flights, the percentage of yellow departure days and the percentage of yellow arrival days. Layer 1 consists of schedules $\{S_1, S_3, S_4\}$. Layer 2 consists of schedules $\{S_2, S_6, S_8\}$. Layer 3 consists of schedules $\{S_5, S_7, S_9, S_{10}\}$.

Fig. 12. Pareto Front ($L_1$) obtained using 3 predicted KPIs: the percentage of cancelled flights, the percentage of departure yellow days and the percentage of arrival yellow days. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 17 shows the dominance power of the schedules, where schedules $S_3, S_1, S_4$ have the best performance with respect to cancelled flights, yellow departure days and yellow arrival days.

Table 17
Dominance power of the 10 strategic schedules $S_1, \ldots, S_{10}$ when considering the percentage of cancelled flights, the percentage of yellow departure days and the percentage of yellow arrival days.

| | S3 | S1 | S4 | S2 | S6 | S8 | S10 | S5 | S7 | S9 |
|---|---|---|---|---|---|---|---|---|---|---|
| D(Si) | 2.83 | 1.33 | 0.83 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 | 0 |

Table 18 shows the ranking of the 10 strategic schedules based on the dominance power of the schedules. When the dominance power of two or more schedules is the same, we further discriminate between these schedules by ranking them based on the percentage of predicted cancelled flights, where the schedule with the lowest percentage of cancelled flights is preferred. Table 18 shows the schedule ranking obtained using i) the predicted values of the KPIs, as a result of the machine learning algorithms, and ii) the values of the KPIs from the actual flight data. The results show that the 4 best-ranked schedules when considering the actual flight data are also captured when considering predicted KPIs.

Table 18
Schedule ranking with respect to the percentage of cancelled flights, the percentage of yellow departure days and the percentage of yellow arrival days under schedule $S_i$.

| Ranking position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
|---|---|---|---|---|---|---|---|---|---|---|
| KPIs prediction | S3 | S1 | S4 | S2 | S6 | S8 | S10 | S5 | S7 | S9 |
| KPIs real data | S3 | S1 | S2 | S4 | S10 | S7 | S5 | S6 | S8 | S9 |

6. Discussion

In practice, the implications of being able to assess strategic airport slot schedules with respect to potential flight delays and cancellations are manifold. One implication is that airport slot coordinators are able to identify at an early stage potential airport on-time performance bottlenecks associated with these strategic schedules. Such bottlenecks can be in the form of congested days, i.e., days where the scheduled flights are expected to experience many delays and cancellations, as well as in the form of more detailed indicators such as the type of airline or the type of terminal associated with potential large delays and


cancellations. This insight can support the airport management in identifying mitigation actions for these performance bottlenecks such as, for instance, assigning more resources in a specific part of the airport.

A second implication is that, in the case when airport performance bottlenecks are expected, airport coordinators are provided with support to propose, in the limits of the IATA slot allocation guidelines and following negotiations with the airlines that operate the flights associated with the performance bottlenecks, changes to the schedule such as alternative arrival/departure time slots or alternative types of aircraft used for the flight execution. Thus, the results of this assessment provide a quantified motivation for potential schedule alternatives during the negotiation for slots between airport coordinators and airlines.

Last, but not least, we note that when evaluating the strategic flight schedules using flight delay and cancellation-based KPIs, we do not assume user-specific weights. As such, our strategic schedule assessment is generic and can be applied to any large European airport where the slot allocation process is in place.

7. Conclusion

To support the slot allocation process at airports, in this paper we have developed a machine learning approach to evaluate the resulting flight schedules in terms of predicted flight delays and cancellations. Based on these predictions, we have developed a generic ranking of the strategic schedules. We have implemented our proposed approach for strategic flight schedules at London Heathrow Airport in the period 2013–2018.

In practice, our methodology can provide airport coordinators with insight into potential delays and cancellations associated with the strategic schedules. In the case when on-time performance bottlenecks are identified, our approach provides the airport coordinators with a quantitative support to motivate potential actions to mitigate flight delays, such as changes to the flight schedule. Together with the development of dedicated flight schedule optimization models, our approach supports an integrated strategic flight schedule assessment, where strategic flight schedules are evaluated with respect to on-time airport performance.

As future work, we consider extending the set of features for the prediction algorithms to improve the accuracy of the predictions. In addition, we will evaluate the impact of considering flight delay and cancellation predictions in the flights scheduling optimization models, at the strategic phase.

References

Abdel-Aty, M., Lee, C., Bai, Y., Li, X., Michalak, M., 2007. Detecting periodic patterns of arrival delay. J. Air Transp. Manag. 13 (6), 355–361.
Alonso, H., Loureiro, A., 2015. Predicting flight departure delay at porto airport: a preliminary study. In: Proceedings of the 7th International Joint Conference on Computational Intelligence (IJCCI), vol. 3. IEEE, pp. 93–98.
Belcastro, L., Marozzo, F., Talia, D., Trunfio, P., 2016. Using scalable data mining for predicting flight delays. ACM Trans. Intell. Syst. Technol. (TIST) 8 (1), 5.
Bergstra, J., Yamins, D., Cox, D.D., 2013. Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings of the 12th Python in Science Conference. Citeseer, pp. 13–20.
Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Castelli, L., Pellegrini, P., Pesenti, R., 2012. Airport slot allocation in Europe: economic efficiency and fairness. Int. J. Revenue Manag. 6 (1–2), 28–44.
Chen, J., Li, M., 2019. Chained predictions of flight delay using machine learning. In: AIAA Scitech 2019 Forum, p. 1661.
Choi, S., Kim, Y.J., Briceno, S., Mavris, D., 2016. Prediction of weather-induced airline delays based on machine learning algorithms. In: Digital Avionics Systems Conference (DASC), 2016 IEEE/AIAA 35th. IEEE, pp. 1–6.
Choi, S., Kim, Y.J., Briceno, S., Mavris, D., 2017. Cost-sensitive prediction of airline delays using machine learning. In: IEEE/AIAA 36th Digital Avionics Systems Conference (DASC), pp. 1–8.
Corolli, L., Lulli, G., Ntaimo, L., 2014. The time slot allocation problem under uncertain capacity. Transp. Res. C Emerg. Technol. 46, 16–29.
Duda, R.O., Hart, P.E., Stork, D.G., 2012. Pattern Classification. John Wiley & Sons.
EUROCONTROL, 2017. Performance Review Report - an Assessment of Air Traffic Management in Europe during the Calendar Year 2017. Performance Review Commission.
Granitto, P.M., Furlanello, C., Biasioli, F., Gasperi, F., 2006. Recursive feature elimination with random forest for ptr-ms analysis of agroindustrial products. Chemometr. Intell. Lab. Syst. 83 (2), 83–90.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Mach. Learn. 46 (1–3), 389–422.
Hinton, G.E., 1990. Connectionist learning procedures. In: Machine Learning, vol. III. Elsevier, pp. 555–610.
Horiguchi, Y., Baba, Y., Kashima, H., Suzuki, M., Kayahara, H., Maeno, J., 2017. Predicting fuel consumption and flight delays for low-cost airlines. In: Innovative Applications of Artificial Intelligence (IAAI) Conference, pp. 4686–4693.
Hossin, M., Sulaiman, M., 2015. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 5 (2), 1.
International Air Transport Association, 2017. Worldwide Slot Guidelines, eighth ed. Montreal.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y., 2017. LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp. 3146–3154.
Kim, Y.J., Choi, S., Briceno, S., Mavris, D., 2016. A deep learning approach to flight delay prediction. In: Digital Avionics Systems Conference (DASC), 2016 IEEE/AIAA 35th. IEEE, pp. 1–6.
Kingma, D.P., Ba, J., 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Klein, A., Craun, C., Lee, R.S., 2010. Airport delay prediction using weather-impacted traffic index (witi) model. In: 29th Digital Avionics Systems Conference. B. IEEE, 2.
Lundberg, S.M., Lee, S.-I., 2017. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, pp. 4765–4774.
Marsland, S., 2011. Machine Learning: an Algorithmic Perspective. Chapman and Hall/CRC.
Micci-Barreca, D., 2001. A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explor. Newsl. 3 (1), 27–32.
Mueller, E., Chatterji, G., 2002. Analysis of aircraft arrival and departure delay characteristics. In: AIAA's Aircraft Technology, Integration, and Operations (ATIO) 2002 Technical Forum, vol. 5866.
Pellegrini, P., Bolić, T., Castelli, L., Pesenti Sosta, R., 2017. An effective model for the simultaneous optimisation of airport slot allocation. Transp. Res. E Logist. Transp. Rev. 99, 34–53.
Pyrgiotis, N., Malone, K.M., Odoni, A., 2013. Modelling delay propagation within an airport network. Transp. Res. C Emerg. Technol. 27, 60–75.
Rebollo, J.J., Balakrishnan, H., 2014. Characterization and prediction of air traffic delays. Transp. Res. C Emerg. Technol. 44, 231–241.
Ribeiro, N.A., Jacquillat, A., Antunes, A.P., Odoni, A.R., Pita, J.P., 2018. An optimization approach for airport slot allocation under IATA guidelines. Transp. Res. Part B Methodol. 112, 132–156.
Rupp, N.G., Holmes, G.M., 2006. An investigation into the determinants of flight cancellations. Economica 73 (292), 749–783.
Sridhar, B., Wang, Y., Klein, A., Jehlen, R., 2009. Modeling flight delays and cancellations at the national, regional and airport levels in the United States. In: 8th USA/Europe ATM R&D Seminar, Napa, California (USA).
Sternberg, A., Soares, J., Carvalho, D., Ogasawara, E., 2017. A Review on Flight Delay Prediction. arXiv preprint arXiv:1703.06118.
Tu, Y., Ball, M.O., Jank, W.S., 2008. Estimating flight departure delay distributions a statistical approach with long-term trend and short-term pattern. J. Am. Stat. Assoc. 103 (481), 112–125.
Valkanas, G., Papadopoulos, A.N., Gunopulos, D., 2014. Skyline ranking a la IR. In: EDBT/ICDT Workshops, pp. 182–187.
Wu, Q., 2014. A stochastic characterization based data mining implementation for airport arrival and departure delay data. In: Applied Mechanics and Materials, vol. 668. Trans Tech Publ, pp. 1037–1040.
Xiong, J., Hansen, M., 2009. Value of flight cancellation and cancellation decision modeling: ground delay program postoperation study. Transp. Res. Rec.: J. Transp. Res. Board 2106, 83–89.
Xu, N., Donohue, G., Laskey, K.B., Chen, C.-H., 2005. Estimation of delay propagation in the national aviation system using bayesian networks. In: 6th USA/Europe Air Traffic Management Research and Development Seminar. FAA and Eurocontrol, Baltimore, MD.
Zografos, K.G., Salouras, Y., Madas, M.A., 2012. Dealing with the efficient allocation of scarce resources at congested airports. Transp. Res. C Emerg. Technol. 21 (1), 244–256.
Zografos, K.G., Madas, M.A., Androutsopoulos, K.N., 2017. Increasing airport capacity utilisation through optimum slot scheduling: review of current developments and identification of future needs. J. Sched. 20 (1), 3–24.
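As a supplementary illustration of the generic schedule assessment methodology (schedule domination, Pareto layers, dominance power and ranking), the following minimal Python sketch may help. It is a sketch under stated assumptions, not the authors' implementation: the KPI coordinates in `example` are hypothetical values chosen so that the dominance relationships match those described for the 6-schedule example with 2 KPIs, and the tie-break on the first KPI mirrors the tie-break on the percentage of cancelled flights only if that KPI is stored first.

```python
def dominates(a, b):
    """True if KPI vector a dominates b: no KPI worse, at least one strictly better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_layers(kpis):
    """Peel successive Pareto fronts; returns {schedule: layer index, starting at 1}."""
    remaining = dict(kpis)
    layer_of = {}
    layer = 1
    while remaining:
        front = [
            s for s in remaining
            if not any(dominates(remaining[t], remaining[s]) for t in remaining if t != s)
        ]
        for s in front:
            layer_of[s] = layer
            del remaining[s]
        layer += 1
    return layer_of


def dominance_power(kpis):
    """D(S_i): sum of 1/j over dominated schedules S_k, where j is the layer of S_k."""
    layer_of = pareto_layers(kpis)
    return {
        i: sum(1.0 / layer_of[k] for k in kpis if k != i and dominates(kpis[i], kpis[k]))
        for i in kpis
    }


def rank_schedules(kpis, tie_break_index=0):
    """Order schedules by decreasing dominance power, ties broken by one KPI (lower first)."""
    power = dominance_power(kpis)
    return sorted(kpis, key=lambda s: (-power[s], kpis[s][tie_break_index]))


# Hypothetical KPI values (to be minimized) reproducing the 6-schedule example.
example = {
    "S1": (1, 6), "S2": (2, 2), "S3": (6, 1),
    "S4": (3, 4), "S5": (4, 3), "S6": (5, 5),
}
layers = pareto_layers(example)
power = dominance_power(example)
ranking = rank_schedules(example)
print(layers)
print({s: round(d, 2) for s, d in power.items()})
print(ranking)
```

With these coordinates, S2 obtains the dominance power 1/2 + 1/2 + 1/3, S4 and S5 obtain 1/3 each, and the remaining schedules obtain 0, in agreement with the worked example in the text.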
