Economic predictions with big data
The recent availability of large datasets, combined with advances in the fields of statistics, machine learning, and econometrics, has generated interest in predictive models with many possible predictors. In 2018, a researcher who wants to forecast the future growth rate
of US GDP, for example, can use hundreds of potentially useful predictive variables, such as aggregate and sectoral employment,
prices, interest rates, and many others.
In this type of 'big data' situation, standard estimation techniques – such as ordinary least squares (OLS) or maximum likelihood –
perform poorly. To understand why, consider the extreme case of an OLS regression with as many regressors as observations. The in-
sample fit of this model will be perfect, but its out-of-sample performance will be embarrassingly bad. More formally, the proliferation
of regressors magnifies estimation uncertainty, producing inaccurate out-of-sample predictions. As a consequence, inference methods
aimed at dealing with this curse of dimensionality have become increasingly popular.
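To make the overfitting problem concrete, here is a minimal simulation sketch (ours, not from the studies discussed in this column): with as many regressors as observations, OLS fits the estimation sample perfectly but forecasts new data poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                   # as many regressors as observations
X = rng.standard_normal((n, n))
beta = np.zeros(n)
beta[0] = 1.0                            # only one regressor truly matters
y = X @ beta + rng.standard_normal(n)

# OLS: with n regressors and n observations the in-sample fit is (numerically) perfect ...
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("in-sample MSE:", np.mean((y - X @ b_ols) ** 2))       # essentially zero

# ... but out-of-sample predictions are far worse than the noise variance of 1
X_new = rng.standard_normal((10_000, n))
y_new = X_new @ beta + rng.standard_normal(10_000)
print("out-of-sample MSE:", np.mean((y_new - X_new @ b_ols) ** 2))
```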
Ng (2013) and Chernozhukov et al. (2017) suggested that these methodologies can be divided into two broad classes:
Sparse modelling techniques. These focus on selecting a small set of explanatory variables with the highest predictive
power from a much larger pool of possible regressors. For instance, the popular LASSO and its variants belong to this class of
estimators that yield sparse representations of predictive models (Tibshirani 1996, see Belloni et al. 2011 for a recent survey and
examples of big data applications of these methodologies in economics).
Dense modelling techniques. At the opposite end of the spectrum, these techniques recognise that all possible explanatory
variables might be important for prediction, although the impact of some of them may be small. This insight justifies the use of
shrinkage or regularisation techniques that prevent overfitting by forcing parameter estimates to be small when sample information
is weak. Factor analysis or Ridge regressions are standard examples of dense statistical modelling (Pearson 1901, Tikhonov
1963, see Stock and Watson 2002 or De Mol et al. 2008 for big data applications of these techniques in economics).
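As a concrete illustration of the sparse-versus-dense distinction, the following is a small sketch on simulated data using scikit-learn (ours, not the estimators of the papers cited above): the LASSO sets many coefficients exactly to zero, while Ridge shrinks all coefficients but keeps every predictor.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n, k = 100, 80
X = rng.standard_normal((n, k))
beta = np.zeros(k)
beta[:5] = 1.0                                   # only five predictors truly matter
y = X @ beta + rng.standard_normal(n)

# Sparse modelling: the LASSO performs variable selection,
# setting many coefficients identically to zero
lasso = Lasso(alpha=0.1).fit(X, y)
print("LASSO: coefficients set to zero:", int(np.sum(lasso.coef_ == 0.0)))

# Dense modelling: Ridge regularisation shrinks every coefficient
# towards zero but (almost) never excludes a predictor outright
ridge = Ridge(alpha=10.0).fit(X, y)
print("Ridge: coefficients set to zero:", int(np.sum(ridge.coef_ == 0.0)))
```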
While similar in spirit, these two approaches might differ in their predictive accuracy. In addition, there is a fundamental distinction
between a dense model with shrinkage, which pushes some coefficients to be small, and a sparse model with variable selection, which
sets some coefficients identically to zero. Low-dimensional, sparse models may also appear easier to interpret economically, which is an
attractive property for researchers.
Before even starting to discuss whether these structural interpretations are warranted – in most cases they are not, given the predictive
nature of the models – it is important to address whether the data are informative enough to favour sparse models and rule out dense
ones.
Sparse or dense modelling?
We proposed to shed light on these issues by estimating a model that encompasses both sparse and dense approaches (Giannone et
al. 2017). Our main result was that sparse predictive models are rarely preferred in economics. A clearer pattern of sparsity only
emerges when a researcher strongly favours low-dimensional models a priori.
We developed a variant of the 'spike-and-slab' model, originally proposed by Mitchell and Beauchamp (1988). The objective was to
predict a variable of interest – say, GDP growth – using many predictors, for example a large number of macroeconomic indicators.
The model postulates that only some of the predictors are relevant. The unknown fraction that are relevant – denote it by q – is a key
object of interest since it represents model size. Note, however, that if we tried to conduct inference on model size in this simple
framework, we would never estimate it to be very large. This is because high-dimensional models without regularisation suffer from the
curse of dimensionality, as discussed above. Therefore, to make our sparse–dense bake-off fairer, we also allowed for shrinkage: whenever a predictor was deemed relevant, its impact on the response variable was prevented from becoming too large, in order to avoid overfitting. We then
conducted Bayesian inference on model size and the degree of shrinkage.
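As a rough illustration of this structure (a stylised sketch in our own notation, not the paper's exact hierarchical priors or sampler), a spike-and-slab draw can be simulated as follows: each predictor is relevant with probability q, and relevant coefficients are drawn from a shrinkage prior with a hypothetical variance parameter gamma2.

```python
import numpy as np

def draw_spike_and_slab(k, q, gamma2, rng):
    """Stylised spike-and-slab prior draw:
    - each of the k predictors is relevant ('slab') with probability q;
    - irrelevant predictors ('spike') get a coefficient of exactly zero;
    - relevant coefficients are shrunk through a N(0, gamma2) prior."""
    relevant = rng.random(k) < q
    coefficients = np.where(relevant, rng.normal(0.0, np.sqrt(gamma2), size=k), 0.0)
    return relevant, coefficients

rng = np.random.default_rng(2)
relevant, beta = draw_spike_and_slab(k=100, q=0.25, gamma2=0.5, rng=rng)
print("share of relevant predictors in this draw:", relevant.mean())
```

In the actual estimation, both the fraction of relevant predictors and the degree of shrinkage are treated as unknown, so their posterior distributions can be examined, as in the figures below.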
Applications in macro, finance, and micro
We estimated our model on six popular 'big' datasets that have been used for predictive analyses with large information in the fields of
macroeconomics, finance, and microeconomics.
In our macroeconomic applications, we investigated the predictability of aggregate economic activity in the US (Stock and Watson 2002)
and the determinants of economic growth in a cross-section of countries (Barro and Lee 1994, Belloni et al. 2011).
In finance, we studied the predictability of the US equity premium (Welch and Goyal 2008), and the factors that explain the cross-
sectional variation of US stock returns (Freyberger et al. 2017).
In our microeconomic analyses, we investigated the factors behind the decline in the crime rate in a cross-section of US states
(Donohue and Levitt 2001, Belloni et al. 2014), and the determinants of rulings in the matter of government takings of private property in
US judicial circuits (Chen and Yeh 2012, Belloni et al. 2012).
Table 1 reports some details of our six applications. They covered a broad range of configurations, in terms of types of data – time-
series, cross-section and panel data – and sample sizes relative to the number of predictors.
Table 1 Summary details of the empirical applications
Source: Giannone et al. (2017).
Result 1: No clear pattern of sparsity
The first key result delivered by our Bayesian inferential procedure was that, in all applications but one, the data do not support sparse
model representations. To illustrate this point, Figure 1 plots the posterior distribution of the fraction of relevant predictors (q) in our six
empirical applications. Only in the case of Micro 1 is this posterior concentrated around very low values. In all other applications, larger values of q are more likely, suggesting that including more than a handful of predictors is preferable in order to improve forecasting accuracy. For example, in the cases of Macro 2 and Finance 1, the preferred specification is the dense model with all
predictors (q=1).
Figure 1 Posterior density of the fraction of relevant predictors (q)
Source: Giannone et al. (2017).
Even more surprisingly, our posterior results were inconsistent with the existence of clear sparsity patterns even when the posterior
density of q is concentrated around values smaller than 1, as in the Macro 1, Finance 2, and Micro 2 cases. To show this point, Figure 2
plots the posterior probabilities of inclusion of each predictor in the six empirical applications.
In the 'heat maps' of this figure, each vertical stripe corresponds to a possible predictor, and darker shades denote higher probabilities of
inclusion. The most straightforward subplot to interpret is from Micro 1. This is a truly sparse model, in which the 39th regressor is
selected 65% of the time, and all other predictors are rarely included.
Figure 2 Heat maps of the probabilities of inclusion of each predictor
Source: Giannone et al. (2017).
The remaining five applications, however, do not exhibit a distinct pattern of sparsity, because all predictors seem to be relevant with
non-negligible probability. For example, consider the case of Macro 1, in which the best-fitting models are those with q around 0.25,
according to Figure 1. But Figure 2 suggests that there is a lot of uncertainty about which specific group of predictors should be
selected, because there are many different models using about 25% of the predictors with a very similar predictive accuracy. As a
consequence, it is difficult to characterise any representation of the predictive model as sparse.
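Readers who want to build this kind of heat map from their own posterior simulations can do so along these lines (a minimal sketch with placeholder output, not the authors' code): the posterior probability of inclusion of each predictor is simply the average of its 0/1 inclusion indicator across MCMC draws.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for real sampler output: one row per posterior draw,
# one column per predictor, entry 1 if that predictor is included in the draw.
rng = np.random.default_rng(3)
inclusion_draws = (rng.random((5000, 60)) < 0.2).astype(int)

# Posterior probability of inclusion = column-wise average of the indicators
inclusion_prob = inclusion_draws.mean(axis=0)

# One-row heat map: each vertical stripe is a predictor,
# darker shades indicate higher posterior inclusion probabilities
plt.imshow(inclusion_prob[np.newaxis, :], cmap="Greys", aspect="auto", vmin=0.0, vmax=1.0)
plt.yticks([])
plt.xlabel("predictor")
plt.colorbar(label="posterior probability of inclusion")
plt.show()
```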
Result 2: More sparsity only with an a priori bias in favour of small models
Our second important result was that clearer sparsity patterns only emerge when the researcher has a strong a priori bias in favour of
predictive models with a small number of regressors. To demonstrate this point, we re-estimated our model forcing q to be very small
(more formally, we used an extremely tight prior centred on very low values of q).
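For instance (our own illustration of what such a prior could look like, with hypothetical Beta hyperparameters rather than the paper's exact calibration), a Beta prior on q can be made extremely tight around small values:

```python
from scipy.stats import beta

# A loose prior on the fraction of relevant predictors q versus
# an extremely tight prior centred on very low values of q
loose_prior = beta(a=1, b=1)      # uniform: agnostic about model size
tight_prior = beta(a=1, b=200)    # pushes q towards zero, i.e. towards sparse models

print("loose prior: mean of q =", loose_prior.mean())                  # 0.5
print("tight prior: mean of q =", round(tight_prior.mean(), 4))        # about 0.005
print("tight prior: P(q < 0.05) =", round(tight_prior.cdf(0.05), 3))   # close to 1
```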
Figure 3 shows the posterior probabilities of inclusion obtained with this alternative estimation. Relative to our baseline, these heat maps
have much larger light-coloured areas, indicating that many more coefficients are systematically excluded, and revealing clearer
patterns of sparsity in all six applications.
Put differently, when the model is forced to be low-dimensional, the data are better at identifying a few powerful predictors. When model
size is not fixed a priori, model uncertainty is pervasive.
Figure 3 Heat maps of the probabilities of inclusion of each predictor when models are forced to be low dimensional
Source: Giannone et al. (2017).
Summing up, strong prior beliefs favouring low-dimensional models appear to be necessary to support sparse representations. In most
cases, the idea that the data are informative enough to identify sparse predictive models might be an illusion.
Authors’ note: The views expressed in this paper are those of the authors and do not necessarily reflect the views of the ECB, the
Eurosystem, the Federal Reserve Bank of New York, or the Federal Reserve System.