Forrest 2005
www.elsevier.com/locate/ijforecast
Abstract
Sets of odds issued by bookmakers may be interpreted as incorporating implicit probabilistic forecasts of sporting events.
Employing a sample of nearly 10 000 English football (soccer) games, we compare the effectiveness of forecasts based on
published odds and forecasts made using a benchmark statistical model incorporating a large number of quantifiable variables
relevant to match outcomes. The experts’ views, represented by the published odds, are shown to be increasingly effective over
a 5-year period. Bootstraps performed on the statistical model fail to outperform the expert judges. The trend towards odds-
setters displaying greater expertise as forecasters coincided with a period during which intensifying competition is likely to have
increased the financial penalties for bookmakers of imprecise odds-setting. In the context of a financially pressured
environment, the main findings of this paper challenge the consensus that subjective forecasting by experts will normally be
inferior to forecasts from statistical models.
© 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
It is possible, however, that the consensus view on the role of judgmental forecasting should be qualified. A reasonable hypothesis to test might propose that accuracy in judgmental forecasting will improve as the payoff to accuracy increases. Perhaps, with minds concentrated by heavy financial consequences from wrong forecasting decisions, the gap between the performance of experts and statistical models would narrow or even disappear.
Professional sport, and football in particular, again provides a convenient forum for testing this hypothesis. The newspaper columnists included in the tipsters study are not the only professionals engaged in assessing prospects in advance of British games. Bookmakers publish odds for each possible result of a match, which may be represented as being derived from subjective probabilistic forecasts of match outcomes. In contrast to newspapers, for which the weekly football column is only a small part of the total product, bookmakers have an obvious and very strong interest in accurate probabilistic forecasting.
Betting volumes in British football are large and growing. Turnover with British bookmakers was claimed by Global Betting and Gaming Consultants (2001) to be close to £2bn in 1998. Mintel Intelligence Report (2001) saw it as the fastest growing sector in British gambling. But the importance to bookmakers of expertise in football odds-setting does not rest only on the high volume of business. Also relevant is the unusual way in which the particular betting market is organised. In conventional bookmaker betting (e.g. horse racing worldwide or team sports betting in Las Vegas casino operations), odds shift during the betting period in response to the weight of bettors’ money. If a particular price is overly generous, for example, heavy wagers are likely to be made by the public, and bookmakers will defend themselves by shortening the odds offered. The consequences of the initial error in odds-setting are, then, mitigated to the extent that the ‘good value bet’ is effectively withdrawn from sale at some point in the betting period.
This is not possible in the case of football betting in Britain (and several European countries). In football, the dominant form of betting is ‘fixed odds’. Odds for weekend matches, for example, are determined by odds-setters’ meetings at each bookmaker on the previous Monday (Sharpe, 1997), and printed on entry forms available by the Wednesday. Odds are therefore announced simultaneously at the various bookmakers. Although the bookmakers retain the right to change odds up to matchday kick-off time, they rarely make adjustments. These odds then remain available for several days, up to the start of the relevant match, regardless of betting volumes and indeed regardless of new information (e.g. on players’ injuries or midweek match results). If bets are mispriced, the financial consequences for bookmakers may be serious because they are committed to continue to sell the mispriced bet even though betting volumes may have alerted them to the fact that, at least in the view of the public, odds-setters have made an inappropriate assessment of outcome probabilities.
Good forecasting therefore matters to bookmakers, certainly more than to newspapers that employ tipsters. Further, the pressure on bookmakers to produce good forecasts to be used in odds-setting has increased over time. From about 1999, the year when UK bookmakers set up their own offshore subsidiaries to serve the domestic and overseas betting markets, international competition intensified to the extent that bettors began to have access to a wide range of betting firms, located around the world but quoting odds on English football.
Up to 1999, British bookmakers usually refused to accept bets on single matches (‘singles betting’) and would only accept bets on combinations of matches, normally three. One of the consequences of the Internet revolution in the gambling industry was that British bookmakers, one by one, abandoned their traditional restriction against betting on single football matches until, in 2003, all such restrictions were removed. During this period the UK Government was induced to remove the 6.75% betting duty that had applied before October 2001. The ending of the restriction preventing betting on single matches is likely to have increased the competitive pressure on bookmakers. Previously, an error that made one particular price look attractive might not necessarily, or even usually, present an opportunity for a positive expected return at the bookmaker’s expense, because a combination bet across three or more matches had to be placed. The removal of betting tax was likewise a significant development because, by raising transactions costs substantially, the tax had deterred well-informed professional bettors (capable
D. Forrest et al. / International Journal of Forecasting 21 (2005) 551–564 553
of identifying mispriced bets) from participating in the market.
Given the substantial and growing importance to bookmakers of accurate assessment of the prospects in each football game played, it is unsurprising that expertise attracts high rewards in the relevant labour market, with odds-setters’ positions regularly advertised in the betting press at salaries approximately double those of full professors of statistics in British universities. Bookmakers willing to pay such high salaries might anticipate that the level of expertise procured should be capable of producing subjective probabilistic forecasts at least as good as those generated by available statistical models. Comparing forecasts implicit in match odds with forecasts from an information-rich statistical model therefore offers a more plausible route for drawing general conclusions about the potential of subjective forecasting than exploring the performance of newspaper tipsters, where the level of accountability is likely to be low and the incentive to invest in high-level talent small.
Therefore, to summarise, the English football betting market is a particularly interesting and useful vehicle to use to compare expert and statistical forecasting systems. This market has been the subject of considerable structural change caused by the growth of Internet betting alternatives to traditional bookmakers. As a result, competitive pressure on odds-setters has increased. This market differs from many other sports betting markets in that odds are fixed, with consequent incentives for odds-makers to generate accurate forecasts of match outcomes. A further difference from American sports betting markets is the existence of three outcomes (home win, away win and draw) rather than two outcomes as in Major League Baseball. Therefore, we need to apply an ordered probit model rather than a standard logit or probit model.
The remainder of the paper proceeds as follows. In Section 2 we describe a data-rich benchmark statistical forecasting model, against which odds-setters’ performance will be assessed. The benchmark model uses as inputs very detailed information on the teams contesting each individual football match. A different estimated version of the benchmark model is used to produce probabilistic forecasts for matches played during each of the five seasons for which comparisons between the model’s and the odds-setters’ forecasting performance are drawn. Each version of the benchmark model is estimated using data from all English Premier League and Football League matches played during the preceding 15 seasons.
In Section 3, we report the results of a series of likelihood-ratio tests which compare the accuracy of the forecasts obtained from the benchmark model and the corresponding forecasts implicit in bookmakers’ odds in anticipating the results of nearly 10 000 matches played between 1998 and 2003. We ask whether odds-setters’ decisions capture accurately all the information in the benchmark model and whether they appear to benefit from extra information not utilised by the benchmark model. In this way, we test the efficacy of bookmaker odds as a forecasting tool separately for each of five betting firms covered by our electronic archive of betting odds (www.mabels-tables.com). Of the five firms, Coral, Hill, Ladbroke and Stanley are major high-street bookmakers, while SuperSoccer is a specialist agency that supplies odds to most small independent bookmakers in the UK.
In Section 4, we go on to explore an additional though related issue. Boulier and Stekler (2003) compared the performance of the betting market and that of a well-known commentator (from The New York Times) in forecasting the outcomes of American football (NFL) matches. They found that spreads in the betting market offered superior guidance to those of the expert. Furthermore, when they estimated a model to ‘explain’ the forecasts of the expert, the fitted values from that model offered superior forecasts to those of the expert himself. This suggests that, when experts depart from the forecasts suggested by their ‘normal’ or ‘average’ method of processing quantitative information, they become less rather than more reliable. By implication, adjustment of forecasts to take account of subjective information may be a bad idea. We test this general proposition in our context, English football (soccer), by comparing the forecasting performance of a model including only information about bookmaker odds with that of a model embodying instead fitted values of bookmaker odds.
Finally, Section 5 summarises our findings concerning the level of expertise demonstrated by odds-setters according to our various criteria, and how this may have changed over time.
Throughout the paper, we assess odds-setters’ expertise by the results from models that include odds as an input. We do not draw direct comparisons between the probabilistic forecasts of the benchmark statistical model, published odds and match results because published odds are not intended as forecasts. They are prices that are set with commercial objectives in view, and may therefore contain biases that cater to bettor preferences. In many betting markets, positive or negative longshot bias has been identified. For example, Woodland and Woodland (1994, 2001, 2003) have found that odds in baseball and (ice) hockey betting markets are more ‘generous’ in terms of expected returns in respect of teams with lower win-probabilities. The opposite result is well documented for racetrack (horse and greyhound) betting markets (Vaughan Williams, 1999). Although, in preliminary analysis for this paper, we found scant evidence for longshot bias in the football betting market, we nevertheless allowed for any bias there may be by using (as the basis for our evaluation of their performance) an ordered probit model that generated ‘bookmaker forecasts’ from information on raw odds. This avoids falsely attributing to bad forecasting any tendency of odds to systematically over- or under-state the chances of the favoured team in a match.
A number of authors have attempted to model the process by which the results of football matches are
determined, notwithstanding that predictions for single matches are notoriously unreliable (Norman, 1998). A
strand in the applied statistics literature, derived from Maher (1982), proposes that the scores of the two teams
contesting any match may be modelled using independent Poisson distributions, with means reflecting both
the goal-scoring record of the team and the goal-conceding record of its opponent. The model was
operationalised by Dixon and Coles (1997) and Rue and Salvesen (2000) in attempts to identify profitable
betting strategies for football. Dixon and Pope (2004) compare probabilistic forecasts obtained from the
Dixon–Coles model with probabilities inferred from UK bookmakers’ prices for fixed-odds betting. Other
contributors, including Dobson and Goddard (2001), Forrest and Simmons (2000), Goddard and
Asimakopoulos (2004) and Kuypers (1999), have adopted the less computationally demanding methodology
of ordered probit or logit. These techniques model home win–draw–away win match results directly, rather
than indirectly according to the probability distributions of the scores by each team. Regressors typically
include measures of team strength and form, such as the difference in league position and the outcomes of
recent matches for each team.
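As an illustration of the goals-based approach, the sketch below turns two independent Poisson goal means into home win, draw and away win probabilities by summing over a grid of scorelines. It is a minimal sketch of the independence idea only, not the estimation procedure of any of the cited papers; the function names, the truncation at `max_goals` and the example means of 1.5 and 1.1 are assumptions made for illustration.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def result_probabilities(home_mean, away_mean, max_goals=15):
    """Sum independent Poisson scoreline probabilities into (home win, draw, away win).

    max_goals truncates the scoreline grid (an illustrative choice); for
    football-sized means the probability mass beyond 15 goals is negligible.
    """
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away


# Hypothetical goal means, chosen only for illustration.
p_home, p_draw, p_away = result_probabilities(1.5, 1.1)
print(f"home {p_home:.3f}  draw {p_draw:.3f}  away {p_away:.3f}")
```

The ordered probit and logit approaches described above skip this scoreline step and model the three match results directly.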
Goddard (2005) has compared the performance in forecasting home–draw–away match results of models in
which the dependent variable is the number of goals scored and conceded by each team; and models with a
discrete (home–draw–away) match result dependent variable. The difference between the forecasting
performance of goals- and results-based models is found to be relatively small. Here, we use as our
benchmark forecasting model the most parsimonious specification considered by Goddard (2005): an ordered
probit model including a match results dependent variable and covariates based on lagged match results and
other relevant information. The model produces forecasts based solely on historical information that is publicly
available before the start of the match in question. Given the large number of matches (typically sixty) for
which odds-setters must make decisions at a single sitting, it is reasonable to suppose that their published odds
will be influenced, either explicitly or implicitly, by publicly-available indicators. A fair test of their expertise
is therefore whether they process such information more or less efficiently than the benchmark statistical
model.
The notation for the benchmark forecasting model is as follows. $y_{i,j}$ is the result of the match between home team i and away team j, coded as shown below. The latent variable $y^*_{i,j}$ is a linear function of a set of covariates relevant for forecasting match results. The covariates are described briefly below, and are defined in full in Appendix A. $\varepsilon_{i,j}$ is a random disturbance term, assumed to follow the standard Normal distribution. $\mu_{1s}$ and $\mu_{2s}$ are the cut-off parameters, which control the overall proportions of home wins, draws and away wins in season s.
The structure of the benchmark model is ordered probit:
\[
\begin{aligned}
\text{Home win} &\Rightarrow y_{i,j} = 1 &&\text{if } \mu_{2s} < y^*_{i,j} + \varepsilon_{i,j} \\
\text{Draw} &\Rightarrow y_{i,j} = 0.5 &&\text{if } \mu_{1s} < y^*_{i,j} + \varepsilon_{i,j} < \mu_{2s} \\
\text{Away win} &\Rightarrow y_{i,j} = 0 &&\text{if } y^*_{i,j} + \varepsilon_{i,j} < \mu_{1s}
\end{aligned}
\tag{1}
\]
Having obtained an estimated version of Eq. (1) over some specific sample period, out-of-sample fitted match result probabilities for the three possible match result outcomes are obtained by rearranging Eq. (1) as follows:
\[
\begin{aligned}
\text{Home win probability} &= p^H_{i,j} = \operatorname{prob}\left(\varepsilon_{i,j} > \hat{\mu}_{2s} - \hat{y}^*_{i,j}\right) = 1 - \Phi\left(\hat{\mu}_{2s} - \hat{y}^*_{i,j}\right) \\
\text{Draw probability} &= p^D_{i,j} = \operatorname{prob}\left(\hat{\mu}_{1s} - \hat{y}^*_{i,j} < \varepsilon_{i,j} < \hat{\mu}_{2s} - \hat{y}^*_{i,j}\right) = \Phi\left(\hat{\mu}_{2s} - \hat{y}^*_{i,j}\right) - \Phi\left(\hat{\mu}_{1s} - \hat{y}^*_{i,j}\right) \\
\text{Away win probability} &= p^A_{i,j} = \operatorname{prob}\left(\varepsilon_{i,j} < \hat{\mu}_{1s} - \hat{y}^*_{i,j}\right) = \Phi\left(\hat{\mu}_{1s} - \hat{y}^*_{i,j}\right)
\end{aligned}
\tag{2}
\]
In Eq. (2), $\hat{y}^*_{i,j}$ is the fitted value of the latent variable for the match in question, and $\hat{\mu}_{1s}$ and $\hat{\mu}_{2s}$ are the estimated values of these parameters for the final (most recent) season in the estimation period.
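The step from a fitted latent value and two cut-offs to the three match result probabilities in Eq. (2) can be sketched as follows. This is a minimal illustration using only the standard Normal distribution function; the function names and the numerical inputs (a fitted latent value of 0.3 and cut-offs of −0.2 and 0.5) are hypothetical and are not estimates from the paper.

```python
import math


def std_normal_cdf(x):
    """Standard Normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def match_probabilities(y_hat, mu1, mu2):
    """Home win, draw and away win probabilities from an ordered probit fit.

    y_hat is the fitted latent value; mu1 < mu2 are the estimated cut-offs.
    """
    p_away = std_normal_cdf(mu1 - y_hat)
    p_draw = std_normal_cdf(mu2 - y_hat) - std_normal_cdf(mu1 - y_hat)
    p_home = 1.0 - std_normal_cdf(mu2 - y_hat)
    return p_home, p_draw, p_away


# Hypothetical inputs, for illustration only.
p_home, p_draw, p_away = match_probabilities(y_hat=0.3, mu1=-0.2, mu2=0.5)
print(f"home {p_home:.3f}  draw {p_draw:.3f}  away {p_away:.3f}")
```

By construction the three probabilities sum to one, and raising the fitted latent value shifts probability from the away win towards the home win.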
In order to generate predictions (in the form of match result probabilities) for each of the five seasons 1998–9 to
2002–3 inclusive, the benchmark model is estimated using data for the preceding 15 seasons in each case.
Accordingly, forecasts for season 1998–9 are generated from a version of the model estimated using data for
seasons 1983–4 to 1997–8 inclusive; forecasts for 1999–2000 are generated from the model estimated over 1984–5
to 1998–9; and so on. Preliminary experimentation indicated that extending the estimation period up to about 15
seasons produced tangible benefits in terms of improved forecasting accuracy, but there was little or no further gain
beyond 15 seasons.
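The rolling estimation scheme described above can be written down compactly. The sketch below identifies each season by the calendar year in which it starts (so 1998 stands for season 1998–9); that labelling and the function name are conventions assumed here for illustration.

```python
def estimation_window(forecast_start_year, n_seasons=15):
    """Start years of the seasons used to estimate the model for one forecast season.

    A season is identified by the year in which it begins, e.g. 1998 for 1998-9.
    """
    first = forecast_start_year - n_seasons
    return list(range(first, forecast_start_year))


# One estimation window per forecast season, 1998-9 to 2002-3 inclusive.
for forecast in range(1998, 2003):
    window = estimation_window(forecast)
    print(f"forecast season starting {forecast}: "
          f"estimated on seasons starting {window[0]} to {window[-1]}")
```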
The estimation results for the final version of the benchmark model, estimated over seasons 1987–8 to 2001–2,
are reported in Table 1. Overall, the results of a Wald test indicate that the 59 covariates are jointly significant (chi-
square statistic = 1628.6, critical value at 5% level = 77.9). Below, we comment briefly on the interpretation of the
estimation results for each set of covariates; see also Goddard (2005) and Goddard and Asimakopoulos (2004).
If a match is important for championship, promotion or relegation issues for one team but unimportant
for the other, the match result is likely to be influenced by the difference between the incentives for the two
Table 1
Ordered probit estimation results: estimation period 1987–1988 to 2001–2002 inclusive
1. Win ratios over previous 24 months ($P^d_{i,y,s}$, $P^d_{j,y,s}$)
Home team (i) Away team ( j)
0–12 months 12–24 months 0–12 months 12–24 months
( y = 0) ( y = 1) ( y = 0) ( y = 1)
Matches played Current Last Last Two Current Last Last Two
season season season seasons season season season seasons
(s = 0) (s = 1) (s = 1) ago (s = 2) (s = 0) (s = 1) (s = 1) ago (s = 2)
Two divisions higher (d = 2) 0.362 0.110
(0.610) (0.626)
One division higher (d = 1) 1.915*** 0.828*** 0.493** 1.470*** 0.776*** 0.625***
(0.256) (0.213) (0.206) (0.254) (0.212) (0.205)
Current division (d = 0) 1.769*** 1.207*** 0.659*** 0.464*** 1.330*** 0.935*** 0.590*** 0.403***
(0.153) (0.139) (0.131) (0.131) (0.150) (0.137) (0.130) (0.130)
One division lower (d = 1) 0.888*** 0.470*** 0.421*** 0.590*** 0.322*** 0.200*
(0.124) (0.117) (0.109) (0.125) (0.118) (0.109)
Two divisions lower (d = 2) 0.051 0.443**
(0.191) (0.197)
2. Most recent match results ($R^H_{i,m}$, $R^A_{i,n}$, $R^H_{j,n}$, $R^A_{j,m}$)
Number of matches 1 2 3 4 5 6 7 8 9
ago (m,n)
Home team (i) 0.012 0.005 0.027*** 0.001 0.003 0.009 0.007 0.000 0.009
Home matches (0.008) (0.008) (0.008) (0.008) (0.008) (0.008) (0.008) (0.008) (0.008)
Away matches 0.006 0.020** 0.019** 0.009
(0.009) (0.008) (0.008) (0.008)
Away team ( j) 0.021** 0.022** 0.006 0.015*
Home matches (0.009) (0.009) (0.009) (0.008)
Away matches 0.015* 0.011 0.017** 0.019** 0.009 0.002 0.007 0.009 0.029***
(0.008) (0.008) (0.008) (0.008) (0.008) (0.008) (0.008) (0.008) (0.008)
teams. For the purposes of the estimation, a match is deemed to be important if it is possible (before the
match is played) for the team in question to win the championship or be promoted or relegated, if all other
teams currently in contention for the same outcome take one point on average from each of their remaining
fixtures.
Early elimination from the FA Cup may have implications for a team’s results in subsequent league matches,
although the effect could operate in either direction. A team eliminated from the cup may be able to concentrate
efforts on the league, suggesting an improvement in league results; or cup elimination may cause a loss of
confidence, suggesting a deterioration. The estimated coefficients on $CUP_i$ and $CUP_j$ suggest that the second of
these two effects dominates.
The covariate $DIST_{i,j}$, the geographical distance between the home towns of the teams contesting the match, controls for a tendency for home advantage to be less pronounced in matches between teams located close together, and more pronounced in matches between teams from distant cities or towns.
The covariates $AP_{i,k}$ and $AP_{j,k}$, based on average attendance data relative to league position for k = 1, 2 seasons prior to the current season, allow for a ‘big team’ effect on match results: for given values of other controls, large-market teams are more likely (and small-market teams less likely) to win. This effect might reflect the direct influence of the crowd on the match result, or the ability of teams with larger attendances to spend more heavily on acquiring and retaining playing talent.
3. Odds-setters’ forecasting performance
In the English football betting market, bookmaker odds are quoted in the form: a-to-b home win; c-to-d draw; and e-to-f away win. If b is staked on a home win, the overall payoffs to the bettor are +a (the bookmaker pays the winnings and returns the stake) if the bet wins, and −b (the bookmaker keeps the stake) if the bet loses. These quoted prices can be converted to the home win, draw and away win ‘probabilities’: $\theta^H_{i,j} = b/(a+b)$; $\theta^D_{i,j} = d/(c+d)$; $\theta^A_{i,j} = f/(e+f)$. However, the sum $\theta^H_{i,j} + \theta^D_{i,j} + \theta^A_{i,j}$ invariably exceeds one, because the prices contain a margin to cover the bookmaker’s costs and profits. Implicit home win, draw and away win probabilities which sum to one are $\phi^H_{i,j} = \theta^H_{i,j}/(\theta^H_{i,j} + \theta^D_{i,j} + \theta^A_{i,j})$, and likewise for $\phi^D_{i,j}$ and $\phi^A_{i,j}$. The bookmaker’s ‘over-round’ is $\lambda_{i,j} = \theta^H_{i,j} + \theta^D_{i,j} + \theta^A_{i,j} - 1$, and $\lambda_{i,j}/(1+\lambda_{i,j})$ is the take-out rate if the bookmaker holds equal liabilities in respect of each of the three possible match outcomes.
In Tables 2 and 3, comparisons are drawn between the forecasting performance of the match result probabilities derived from the benchmark model, and the implicit probabilities derived from the bookmakers’ odds. These comparisons are based on all matches for which a set of odds and a set of benchmark model probabilities are available. (The model does not provide forecasts for matches involving teams that were admitted to the league within their first two seasons of league membership, because for such teams insufficient lagged match results data are available to define values for all of the benchmark model’s covariates. In addition, in the 2002–3 season there was one match that was rescheduled at very short notice, for which bookmakers’ odds were not published.)
As a descriptive measure of the accuracy of probability forecasts, Table 2 reports the Brier Score (Boulier & Stekler, 2003; Brier, 1950). For any set of odds-setter’s home win probabilities, the Brier Score is
Table 2
Brier Scores for forecasting performance, odds-setters’ implicit probabilities and the benchmark model’s forecast probabilities
                   1998–1999  1999–2000  2000–2001  2001–2002  2002–2003
Home win
Coral              0.236      0.231      0.233      0.235      0.235
Hill               0.236      0.230      0.232      0.234      0.236
Ladbroke           0.235      0.230      0.232      0.234      0.236
Stanley            0.236      0.230      0.233      0.235      0.235
SuperSoccer        0.236      0.231      0.233      0.235      0.236
Benchmark model    0.234      0.231      0.234      0.234      0.238
Draw
Coral              0.201      0.199      0.197      0.195      0.196
Hill               0.201      0.198      0.198      0.195      0.196
Ladbroke           0.201      0.199      0.197      0.195      0.196
Stanley            0.201      0.199      0.198      0.195      0.196
SuperSoccer        0.201      0.198      0.198      0.195      0.197
Benchmark model    0.200      0.198      0.197      0.195      0.197
Away win
Coral              0.189      0.186      0.185      0.185      0.197
Hill               0.189      0.186      0.185      0.184      0.196
Ladbroke           0.189      0.185      0.185      0.184      0.196
Stanley            0.190      0.186      0.185      0.185      0.196
SuperSoccer        0.190      0.186      0.185      0.185      0.197
Benchmark model    0.189      0.185      0.186      0.185      0.198
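The conversion from quoted odds to implicit probabilities described at the start of this section can be sketched as follows. The function name and the three example prices (5-to-4 home win, 9-to-4 draw, 2-to-1 away win) are hypothetical and chosen only for illustration; the take-out formula applies under the equal-liabilities condition stated in the text.

```python
def implied_probabilities(home, draw, away):
    """Convert fractional odds into raw and normalised probabilities.

    Each argument is a pair (x, y) meaning a price of 'x-to-y': a stake of y
    wins x. Returns (raw, normalised, over_round, take_out).
    """
    raw = [y / (x + y) for x, y in (home, draw, away)]  # sum exceeds one
    total = sum(raw)
    normalised = [r / total for r in raw]               # rescaled to sum to one
    over_round = total - 1.0                            # bookmaker's margin
    take_out = over_round / (1.0 + over_round)          # assumes equal liabilities
    return raw, normalised, over_round, take_out


# Hypothetical prices: 5-to-4 home win, 9-to-4 draw, 2-to-1 away win.
raw, implicit, over_round, take_out = implied_probabilities((5, 4), (9, 4), (2, 1))
print("raw:", [round(r, 3) for r in raw])
print("implicit:", [round(p, 3) for p in implicit])
print(f"over-round {over_round:.3f}, take-out {take_out:.3f}")
```

Dividing by the sum is the normalisation given in the text; it spreads the bookmaker's margin proportionally across the three outcomes.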
Table 3
Maximised log-likelihood values and LR tests: various ordered probit regressions
Season                 1998–1999  1999–2000  2000–2001  2001–2002  2002–2003
No. of matches         1944       1946       1946       1946       1945
1. Log-likelihood: ordered probit regression of match results on $\phi^y_{i,j}$
Coral                  −2022.2    −1994.5    −1992.4    −1990.2    −2029.4
Hill                   −2021.4    −1990.4    −1989.6    −1984.3    −2029.2
Ladbroke               −2021.4    −1986.8    −1991.1    −1985.1    −2030.6
Stanley                −2023.0    −1990.8    −1992.8    −1987.9    −2029.1
SuperSoccer            −2023.6    −1993.1    −1991.0    −1990.5    −2031.0
2. Log-likelihood: ordered probit regression of match results on $p^y_{i,j}$
Benchmark model        −2015.8    −1988.9    −2000.1    −1988.3    −2038.8
3. Log-likelihood: ordered probit regression of match results on $\phi^y_{i,j}$ and $p^y_{i,j}$
Coral                  −2014.4    −1986.0    −1990.8    −1984.3    −2028.7
Hill                   −2014.2    −1984.1    −1988.6    −1981.2    −2028.7
Ladbroke               −2013.9    −1982.1    −1989.6    −1981.8    −2029.9
Stanley                −2014.6    −1984.4    −1991.2    −1983.3    −2028.6
SuperSoccer            −2014.8    −1985.3    −1989.6    −1984.5    −2029.9
4. LR test for significance of $\phi^y_{i,j}$ in regression of match results on $\phi^y_{i,j}$ and $p^y_{i,j}$
Coral                  2.80*      5.80**     18.68***   7.88***    20.16***
Hill                   3.30*      9.50***    22.94***   14.20***   20.26***
Ladbroke               3.90**     13.50***   21.06***   12.94***   17.88***
Stanley                2.48       8.94***    17.90***   9.98***    20.44***
SuperSoccer            2.14       7.14***    20.96***   7.62***    17.78***
5. LR test for significance of $p^y_{i,j}$ in regression of match results on $\phi^y_{i,j}$ and $p^y_{i,j}$
Coral                  15.56***   16.96***   3.32*      11.78***   1.36
Hill                   14.50***   12.54***   1.96       6.30**     1.06
Ladbroke               15.08***   9.36***    3.10*      6.50**     1.44
Stanley                16.80***   12.88***   3.34*      9.18***    0.96
SuperSoccer            17.66***   15.60***   2.70       12.06***   2.24
6. Log-likelihood: ordered probit regression of match results on $\hat{\phi}^y_{i,j}$
Coral (fitted)         −2021.2    −1991.1    −2003.4    −1990.8    −2036.0
Hill (fitted)          −2020.8    −1989.9    −2002.9    −1990.7    −2035.1
Ladbroke (fitted)      −2021.9    −1990.5    −2002.6    −1991.6    −2035.2
Stanley (fitted)       −2021.2    −1990.5    −2002.7    −1991.4    −2034.8
SuperSoccer (fitted)   −2022.0    −1991.2    −2003.0    −1991.1    −2035.2
*** = significant at 1% level; ** = 5% level; * = 10% level.
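The LR statistics in Panels 4 and 5 of Table 3 follow from the log-likelihoods in Panels 1 to 3: each is twice the gain in log-likelihood from adding one covariate, compared with chi-square critical values on one degree of freedom. The sketch below reproduces the Coral entries for 1998–1999. The log-likelihoods are entered as negative numbers, as they must be for a discrete-outcome model; the function names are illustrative, and small discrepancies from the table arise because the inputs are rounded to one decimal place.

```python
def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood-ratio statistic for dropping one covariate from the larger model."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def stars(lr):
    """Significance marker using chi-square(1) critical values at 1%, 5% and 10%."""
    if lr > 6.635:
        return "***"
    if lr > 3.841:
        return "**"
    if lr > 2.706:
        return "*"
    return ""


# Coral, 1998-1999, from Panels 1-3 of Table 3.
ll_odds_only = -2022.2    # regression on the bookmaker measure alone (Panel 1)
ll_model_only = -2015.8   # regression on the benchmark measure alone (Panel 2)
ll_both = -2014.4         # regression on both measures (Panel 3)

lr_odds = lr_statistic(ll_both, ll_model_only)   # Panel 4: do the odds add information?
lr_model = lr_statistic(ll_both, ll_odds_only)   # Panel 5: does the model add information?
print(f"Panel 4: {lr_odds:.2f}{stars(lr_odds)}   Panel 5: {lr_model:.2f}{stars(lr_model)}")
```

For this season the odds add little once the benchmark measure is included, while the benchmark measure adds a great deal once the odds are included, which matches the pattern reported for 1998–9.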
\[
QR = \sum \left(H_{i,j} - \phi^H_{i,j}\right)^2 / N,
\]
where $H_{i,j} = 1$ if the match between teams i and j resulted in a home win and 0 otherwise, N is the number of matches, and there are equivalent definitions for draws and away wins. QR can also be evaluated for the probabilities obtained from the benchmark model. QR is analogous to the mean square error of a set of probability forecasts. QR always lies within the scale 0 to 1; the smaller QR is within this scale, the more accurate the probability forecasts are.
Panel 1 of Table 3 shows the maximised values of the log-likelihood functions obtained by fitting a set of five ordered probit regressions using $y_{i,j}$ as the dependent variable and $\phi^y_{i,j} = \phi^H_{i,j} + 0.5\phi^D_{i,j}$ as the sole covariate, where $y_{i,j}$ is the match result and $\phi^H_{i,j}$ and $\phi^D_{i,j}$ are the implicit home win and draw odds-setters’ probabilities, as defined above. The same model is estimated separately for each of the five sets of probabilities. These maximised log-likelihood values provide an alternative basis for drawing comparisons of forecasting accuracy over a specific set of matches, combining the forecast probabilities for all three match result outcomes in a single measure (rather than three separate measures in the case of QR).
The Brier Scores and the maximised log-likelihood values provide a simple basis for comparing the forecasting performance of the odds-setters’ probabilities. According to the maximised log-likelihood values, William Hill was the best performer overall (1st out of 5 in two of the five seasons, and 2nd in the
other three seasons), and SuperSoccer was the worst performer (2nd once, 4th once and 5th in the other three seasons). However, it is apparent that the variation in the predictability of match results from season to season is much larger than the variation in forecasting performance between the odds-setters. According to the maximised log-likelihood values, all five odds-setters were noticeably less successful in forecasting match results in 1998–9 and 2002–3 than in the other three seasons.
Panel 2 of Table 3 shows the maximised values of the log-likelihood functions obtained by fitting an ordered probit regression using $y_{i,j}$ as the dependent variable and $p^y_{i,j} = p^H_{i,j} + 0.5p^D_{i,j}$ as the sole covariate, where $y_{i,j}$ is the match result and $p^H_{i,j}$ and $p^D_{i,j}$ are the benchmark model’s home win and draw probabilities, generated in accordance with the procedure described in Section 2. Comparisons between the benchmark model’s and the odds-setters’ Brier Scores can be obtained from Table 2, and comparisons between the maximised log-likelihood values can be obtained from Panels 1 and 2 of Table 3.
Over the five seasons, these comparisons appear to produce a clear trend: at the start of the period the benchmark model tended to outperform the odds-setters, but by the end of the period the opposite was true. As was also the case for the odds-setters, the benchmark model’s forecasting performance was noticeably worse in 1998–9 and 2002–3 than in the other three seasons when, presumably, there was less ‘noise’ in the pattern of match results.
The Brier Scores permit comparisons between the odds-setters’ and benchmark model’s probabilities for the match result outcomes (home win, draw, away win) individually. However, on this basis there is little evidence of systematic differences between the forecasting performance of the odds-setters and the model. For home wins, the model outperformed all 5 odds-setters in 2 seasons, and was outperformed by all 5 in 2 seasons. For draws, the model outperformed all 5 odds-setters in 3 seasons, and was outperformed by all 5 in 1 season. For away wins, the model outperformed all 5 odds-setters in 1 season, and was outperformed by all 5 in 3 seasons.
The comparison between the forecasting performance of the odds-setters and the benchmark model varies over time, but it is possible that at any particular time each summary measure (bookmaker probability, $\phi^y_{i,j}$, and model probability, $p^y_{i,j}$) contains relevant information that the other does not contain. If so, the forecasting performance of $\phi^y_{i,j}$ and $p^y_{i,j}$ combined should be superior to that of either individually. In order to investigate whether this is the case, Panel 3 of Table 3 shows the maximised values of the log-likelihood functions obtained by fitting a set of five ordered probit regressions using match outcome $y_{i,j}$ as the dependent variable, and both $\phi^y_{i,j}$ and $p^y_{i,j}$ as covariates. As before, this model is estimated separately for each of the five sets of odds-setters’ probabilities. Panels 4 and 5 show the results of likelihood ratio (LR) tests for the individual significance of $\phi^y_{i,j}$ and $p^y_{i,j}$, respectively, in the regressions summarised in Panel 3.
Panel 4 assesses the extent to which the odds-setters’ probabilities contain useful information that is not incorporated in the benchmark model. For 1998–9, the value added by the odds-setters seems to have been quite marginal, with bookmaker probability, $\phi^y_{i,j}$, significant at 5% only in the case of one of the odds-setters according to the LR tests. For the other four seasons, however, $\phi^y_{i,j}$ is significant in all cases, suggesting that the odds-setters’ probabilities do contain useful information that is not captured by the benchmark model.
Conversely, Panel 5 of Table 3 assesses the extent to which the benchmark model contains useful information that is not also captured by the odds-setters. For 1998–9, 1999–2000 and 2001–2, model probability, $p^y_{i,j}$, is significant in all cases. However, for 2000–1 $p^y_{i,j}$ is only borderline significant, and for 2002–3 $p^y_{i,j}$ is insignificant. Overall these results appear consistent with the notion that relative to the benchmark model, the odds-setters’ performance has improved over time. By 2002–3, the odds-setters appear to have been incorporating most of the information that is contained in the benchmark model into their prices.
Within each football season and for each match outcome, in absolute terms the Brier Scores reported in Table 2 for the five bookmakers and for the benchmark model are very similar. This is also true of the maximised log-likelihood values reported in Table 3. It is therefore relevant to assess whether the small absolute differences that do exist are economically important. For example, is the significance of model probability, $p^y_{i,j}$, in the regression of match results on $p^y_{i,j}$ and bookmaker probability, $\phi^y_{i,j}$, in three of the
560 D. Forrest et al. / International Journal of Forecasting 21 (2005) 551–564
five seasons (Table 3, Panel 5) of any practical importance for a bettor seeking to use the benchmark
model to devise a profitable betting strategy? Or are the match result probabilities produced by the
benchmark model so similar to those implicit in the bookmakers’ odds that we must infer that the model
contains little or no useful additional information?
Panel 1 of Table 4 reports the percentage return that would be earned if bets (with identical stakes) were
placed indiscriminately on each of the three possible outcomes for every match. These returns are calculated
for each of the five bookmakers individually, and using the best available odds (across the five bookmakers)
for each match outcome. For each of the bookmakers, negative returns predominantly in the range 10% to
12% merely reflect the magnitude of the bookmakers’ over-round. Using the best available odds, this
negative percentage return is substantially reduced (but not eliminated): the average return over all five
seasons is 6.6%. This reduction in the bookmakers’ take-out, achievable solely through arbitrage, appears
non-negligible, suggesting that relatively small differences between the Brier Scores and maximised
log-likelihood values reflect differences between the bookmakers’ prices that are important economically.
Panel 2 of Table 4 reports the percentage return that would be earned if a bettor placed one bet on
every match, on the match outcome with the highest expected return according to the benchmark model.
The expected returns are calculated by comparing the benchmark model’s probabilities with each of the five
sets of bookmakers’ odds individually, and with the best available odds. In all cases, there are substantial
reductions in the bookmakers’ take-out. Using the best available odds, the take-out is virtually
eliminated: the average return over all five seasons is 0.2%. Again, the reductions in the bookmakers’
take-out achievable through arbitrage and by exploiting the benchmark model to select profitable bets,
reflect differences between the odds-setters’ probabilities and the benchmark model’s probabilities that
appear to be important economically.

Table 4
Percentage returns from indiscriminate and selective betting strategies
Season 1998–1999 1999–2000 2000–2001 2001–2002 2002–2003
No. of matches 1944 1946 1946 1946 1945
1. Percentage return from all available bets
Coral 10.02 10.79 11.41 11.62 10.38
Hill 10.01 10.99 11.34 11.96 11.18
Ladbroke 10.16 11.24 11.55 12.16 11.04
Stanley 9.25 10.78 11.07 11.57 10.39
SuperSoccer 9.93 10.84 11.07 11.44 10.13
Best available odds 5.31 6.37 7.09 7.65 6.48
2. Percentage return if bets are placed on the match outcome with the highest expected return, according to the probabilities obtained from the benchmark model
Coral 0.94 0.60 6.35 4.15 5.20
Hill 1.09 5.62 8.21 9.79 6.27
Ladbroke 1.52 4.58 8.35 12.14 8.13
Stanley 1.86 2.84 8.76 4.53 3.84
SuperSoccer 1.82 4.44 6.15 8.16 1.12
Best available odds 2.94 0.22 1.77 2.81 0.82

One possible limitation of the analysis in this section arises from the fact that the same benchmark
model is used to generate probabilities for all matches within a single season whereas odds-setters’
decisions could in principle be based on versions of their (implicit) model updated much more frequently.
For example, benchmark model forecasts for all matches played in the 2002–3 season are generated from the
model that is reported in Table 1 (estimated using data for seasons 1987–8 to 2001–2 inclusive). Although
2002–3 data prior to the current match are used to calculate the covariate values that generate the
probability forecasts for the current match, these data are not used to update the estimates of the model
itself as the 2002–3 season progresses. In principle, however, the information required to do this would be
available to bettors during the course of each season. It is therefore relevant to examine whether updating
the model in this way might deliver improved forecasting performance.
In order to investigate this issue, Table 5 reports Brier Scores and maximised log-likelihood values
from ordered probit regressions of match results on benchmark model forecast probabilities, calculated on
a similar basis to those reported in Table 2 and Panel 2 of Table 3, for the following:

(i) Benchmark model probabilities for matches played in August to December of season s,
obtained from a version estimated using data for seasons s-15 to s-1 (as before);
(ii) Benchmark model probabilities for matches played in January to May of season s, obtained
Table 5
Brier Scores and maximised log-likelihood values for the benchmark model’s forecast probabilities: August–December and January–May
1998–1999 1999–2000 2000–2001 2001–2002 2002–2003
No. matches
(i) 1025 1002 1011 1057 1044
(ii)/(iii) 919 944 935 888 902
Brier Score, home win
(i) 0.2389 0.2320 0.2322 0.2371 0.2386
(ii) 0.2288 0.2290 0.2362 0.2301 0.2376
(iii) 0.2286 0.2301 0.2369 0.2300 0.2368
Brier Score, draw
(i) 0.1991 0.1976 0.1920 0.1957 0.1934
(ii) 0.2015 0.1989 0.2025 0.1943 0.2002
(iii) 0.2016 0.1991 0.2030 0.1942 0.2001
Brier Score, away win
(i) 0.1851 0.1816 0.1877 0.1905 0.2012
(ii) 0.1939 0.1885 0.1834 0.1790 0.1938
(iii) 0.1936 0.1893 0.1834 0.1791 0.1934
Log-likelihood: ordered probit regression of match results on p di,j
(i) 1063.4 1018.2 1032.5 1095.5 1096.7
(ii) 950.7 968.4 966.1 892.6 941.7
(iii) 951.1 968.7 966.2 892.8 941.7
(i) Denotes matches played in August to December of season s, forecast probabilities obtained from model estimated using data for seasons s-15
to s-1.
(ii) Denotes matches played in January to May of season s, forecast probabilities obtained from model estimated using data for seasons s-15 to
s-1.
(iii) Denotes matches played in January to May of season s, forecast probabilities obtained from model estimated using data for January to May
of season s-15, seasons s-14 to s-1, and August to December of season s.
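As a reading aid for the Brier Score rows of Table 5, the calculation can be sketched as follows, assuming the standard per-outcome definition: the mean squared difference between the forecast probability of an outcome and a 0/1 indicator of whether that outcome occurred. The function name and the four-match toy data are illustrative assumptions and are not drawn from the paper's data set.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    probs    -- forecast probabilities for one outcome (e.g. home win), one per match
    outcomes -- 1 if that outcome occurred in the match, 0 otherwise
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical home-win forecasts for four matches, and whether a home win occurred.
home_win_probs = [0.5, 0.4, 0.6, 0.3]
home_win_results = [1, 0, 1, 0]
score = brier_score(home_win_probs, home_win_results)
print(round(score, 4))
```

Lower values indicate better forecasts. Under this definition, an uninformative forecast of 0.5 for every match scores exactly 0.25 whatever the results.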
from a version estimated using data for seasons s-15 to s-1 (as before);
(iii) Benchmark model probabilities for matches played in January to May of season s, obtained
from a version estimated using data for January to May of season s-15, all of seasons s-14 to s-1,
and August to December of season s.

To obtain (iii), the benchmark model is updated as if it were re-estimated on 31 December (when roughly
half of the season’s matches had been completed). The comparison between (ii) and (iii) is the most important
feature of Table 5. This comparison suggests, however, that contrary to the hypothesis articulated above,
updating the benchmark model at the season’s mid-point has little or no effect on its forecasting
performance. According to the Brier Scores, the updated model performs sometimes marginally better, and
sometimes marginally worse, than the model estimated with data to the end of the previous season. The
maximised log-likelihood values suggest that overall, the performance of the updated model is marginally
worse. From inspection of the coefficients of the benchmark model estimated over different sample
periods, it is apparent that any systematic variation in the coefficients of this model occurs very slowly,
over periods of several years’ rather than just a few months’ duration. Therefore the benefit gained by
updating the model at the season’s mid-point is small. Furthermore, updating imposes a small cost, because
(iii) requires the estimation of two additional coefficients, with $\mu_{1s}$ and $\mu_{2s}$ for seasons
s-15, and s estimated using data on only half of the matches played in these two seasons. Table 5 suggests
that this cost may marginally outweigh the benefit gained by updating the estimated model in this way.
So far, we have found that the performance of the odds-setters improved over our sample period relative
to the statistical model. We find evidence of statistically significant differences between the forecast
probabilities of the odds-setters and those of the statistical model in three out of five seasons. Our
simulations of the returns to indiscriminate and selective betting strategies shown in Table 4 suggest
that these differences in forecasting performance do translate into differences in the financial returns
from betting that are important economically. Updating the forecasts from the statistical model within
seasons does not appear to make any substantial difference to our results.

4. Odds-setters’ use of subjective information

In the final stage of the empirical analysis, an attempt is made to distinguish between the contribution
to the odds-setters’ forecasting performance of the publicly available historical information that is
contained in the covariates of the forecasting model, and the contribution of other more subjective
information that is not contained in the forecasting model covariates, but which may nevertheless be
relevant to an assessment of the match result probabilities.
Webby and O’Connor (1996) use the term ‘broken leg cue’ to refer to unusual information, extra to the
normal flow, that may influence a forecaster. This might include information on individual player
injuries (literally broken legs in some cases) or impending suspensions, or inferences concerning
future match results from subjective analysis of recent team performances (rather than objective analysis
of match results). It seems highly likely that odds-setters’ prices are influenced by information of this
kind. If the subjective information is used effectively, it may contribute positively to the odds-setters’
forecasting performance. However, it is also conceivable that the odds-setters’ use of ‘broken leg cues’
adds ‘noise’ that might have a deleterious effect on their forecasting performance.
Panel 6 of Table 3 shows the maximised values of the log-likelihood functions obtained by fitting a set
of six ordered probit regressions using match outcome, $y_{i,j}$, as the dependent variable and
predicted bookmaker probability $\hat{\phi}^{y}_{i,j}$ as the sole covariate. The predicted bookmaker
probabilities, $\hat{\phi}^{y}_{i,j}$, are the fitted values for $\phi^{y}_{i,j}$ obtained from five
OLS regressions in which the dependent variable is log odds, $\ln\{\phi^{y}_{i,j}/(1-\phi^{y}_{i,j})\}$,
for each of the five sets of odds-setters’ implicit probabilities and the independent variables are the
full set of forecasting model covariates defined in Appendix A. (For the OLS regressions, the logit
transformation is applied because $0 \le \phi^{y}_{i,j} \le 1$, while the logit is unconstrained).
$\hat{\phi}^{y}_{i,j}$ captures the relationship between the publicly available historical information
and the odds-setters’ implicit probabilities. Each OLS regression can be interpreted as the implicit
model employed by the odds-setter to convert historical information into prices. Therefore the comparison
between the ordered probit models for match results in which actual and fitted bookmaker probabilities,
$\phi^{y}_{i,j}$ and $\hat{\phi}^{y}_{i,j}$ respectively, are used as the sole covariates provides a
measure of the contribution made by the odds-setters’ use of subjective information to their forecasting
performance.
The comparison between the maximised log-likelihood values in Panels 1 and 6 of Table 3 suggests
that at the start of the period, the forecasting performance of the probabilities obtained from the
odds-setters’ implicit models was superior to that of the probabilities derived from the odds-setters’
actual prices. Therefore the odds-setters’ use of subjective information appears to have been detrimental
to their forecasting performance. This finding is consistent with the results for spread betting on match
outcomes in the NFL reported by Boulier and Stekler (2003). By the end of the period, however, the opposite
was true: the forecasting performance of the probabilities derived from the odds-setters’ actual prices was
superior to that of the probabilities derived from their implicit models. Therefore the odds-setters’ use of
subjective information appears to have had a positive effect on their forecasting performance. Again, these
results appear consistent with the notion that the odds-setters’ use of information (both objective and
subjective) has improved over time.

5. Conclusions

We have assessed the use of information by football odds-setters in British bookmaking firms
during five seasons, from 1998–9 to 2002–3. Our key findings are as follows.

a) Early in the period, a data-rich benchmark statistical forecasting model tended to produce better
forecasts than probabilities based on published bookmakers’ odds; but by the end of the period, the
opposite was true.
b) According to a series of likelihood-ratio tests, adding probabilities based on bookmakers’ odds
improved the forecasting performance of the benchmark model, indicating that odds-setters are
privy to, and make effective use of, information not included in the benchmark model. This was a
test failed (Forrest and Simmons, 2000) by newspaper tipsters even though their performance was
evaluated relative to a much cruder statistical model.
c) In another series of likelihood-ratio tests, adding the probability estimates from the benchmark
model to a forecasting framework utilising only bookmaker odds was demonstrated to improve
forecasting performance, but only for the first three seasons of the study period. By 2001–2 and
2002–3, odds-setters’ decisions appeared already to capture fully and accurately the extensive data
represented in the benchmark model.
d) We estimated an equation designed to account for the determination of the odds themselves. Early in
the period, fitted values from this equation provided more effective guidance on the outcomes
of matches than actual odds, indicating that subjective adjustment to results from their implicit
models lowered the odds-setters’ performance. However, by the end of the study period, the
opposite was true and incorporation of subjective judgement appeared to improve performance:
odds-setters appear latterly to have learned how properly to assess and use ‘broken leg cues’.

The paper on newspaper football tipsters by Forrest and Simmons (2000) claimed to validate the
consensual view in forecasting that statistical models outperform experts. In the present paper, a much
more detailed benchmark statistical model proves to be far from dominant over the views of a group of
experts, whose assessments may be inferred from the odds they set for football matches. Furthermore, the
performance of these experts has improved in a number of dimensions through a period when an
intensification of competitive pressure in bookmaking has made the consequences of poor forecasting
performance increasingly costly. Structural changes in the UK betting market have resulted in improved
expert forecasts relative to bootstrapped predictions from a sophisticated statistical model. Our news from
the laboratory of sport is that subjective probabilistic forecasting may outperform forecasting based on
quantification when the financial stakes become sufficiently high. When money is at risk, forecasters
employed by British bookmakers appear to make good use of available information.

Acknowledgement

We acknowledge the helpful comments of an anonymous referee in facilitating improvements on
an earlier draft.

Appendix A

$P^{d}_{i,y,s} = p^{d}_{i,y,s}/n_{i,y}$, where $p^{d}_{i,y,s}$ = home team i’s total ‘points’ score, on a
scale of 1 = win, 0.5 = draw, 0 = loss in matches played 0–12 months (y = 0) or 12–24 months (y = 1)
before current match; within the current season (s = 0), the previous season (s = 1) or two seasons ago
(s = 2); in the team’s current division (d = 0) or one (d = ±1) or two (d = ±2) divisions above or below
the current division; and $n_{i,y}$ = i’s total matches played 0–12 months (y = 0) or 12–24 months
(y = 1) before current match.
$R^{H}_{i,m}$: Result (1 = win, 0.5 = draw, 0 = loss) of i’s mth most recent home match.
$R^{A}_{i,n}$: Result of i’s nth most recent away match.
$SIGH_{i,j}$: 1 if match has championship, promotion or relegation significance for i but not for away
team j; 0 otherwise.
$SIGA_{i,j}$: 1 if match has significance for j but not for i; 0 otherwise.
$CUP_{i}$: 1 if i is eliminated from the FA Cup; 0 otherwise.
$DIST_{i,j}$: Natural logarithm of the geographical distance between the grounds of i and j.
$AP_{i,h}$: Residual for i from a cross-sectional regression of the log of average home attendance
on final league position, defined on a scale of 92 for the PL winner to 1 for the bottom team
in Division 3 of the Football League (FLD3), h seasons before the present season, for h = 1,2.
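To make the first Appendix A definition concrete, the ratio of ‘points’ to matches played can be sketched as below. The record layout (`months_ago`, `season_lag`, `division_offset`, `points`), the function names, the treatment of a match played exactly 12 months ago, and the value returned when no matches fall in the window are all assumptions made for illustration; the paper does not specify them.

```python
from collections import namedtuple

# Hypothetical record for one past match of team i.
PastMatch = namedtuple(
    "PastMatch", ["months_ago", "season_lag", "division_offset", "points"]
)


def window(match):
    """Return y = 0 for matches 0-12 months ago, y = 1 for 12-24 months ago, else None."""
    if 0 <= match.months_ago < 12:   # assumption: exactly 12 months falls in y = 1
        return 0
    if 12 <= match.months_ago < 24:
        return 1
    return None


def points_ratio(history, d, y, s):
    """Points won in cell (d, y, s) divided by all matches played in window y."""
    n = sum(1 for m in history if window(m) == y)
    if n == 0:
        return 0.0  # assumption: no matches in the window gives a ratio of zero
    p = sum(
        m.points
        for m in history
        if window(m) == y and m.season_lag == s and m.division_offset == d
    )
    return p / n


# Toy history: points are 1 = win, 0.5 = draw, 0 = loss.
history = [
    PastMatch(1, 0, 0, 1.0),
    PastMatch(3, 0, 0, 0.5),
    PastMatch(8, 1, 0, 0.0),
    PastMatch(10, 1, 1, 1.0),
    PastMatch(14, 1, 0, 1.0),
]
current_form = points_ratio(history, d=0, y=0, s=0)
print(current_form)
```

Because the denominator counts every match in the window regardless of season or division, the ratios for a given y sum across (d, s) cells to the team's overall points-per-match rate for that window.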