intro to factor model_good
Massimiliano Marcellino
Bocconi University
12 May 2017
Why factor models?
Factor models decompose the behaviour of an economic
variable ($x_{it}$) into a component driven by a few unobservable
factors ($f_t$), common to all the variables but with specific
effects on them ($\lambda_i$), and a variable-specific idiosyncratic
component ($\xi_{it}$):
$x_{it} = \lambda_i f_t + \xi_{it}$,
$t = 1, \dots, T$; $i = 1, \dots, N$
The idea of a few common forces driving all economic variables
is appealing from an economic point of view: e.g., in the Real
Business Cycle (RBC) and Dynamic Stochastic General
Equilibrium (DSGE) literature there are just a few key
economic shocks affecting all variables (productivity, demand,
supply, etc.), with additional variable-specific shocks
Moreover, factor models can handle large datasets (N large),
reflecting the use of large information sets by policy makers
and economic agents when taking their decisions
Why factor models?
From an econometric point of view, factor models:
Alleviate the curse of dimensionality of standard VARs
(number of parameters growing with the square of the number
of variables)
Prevent omitted variable bias and issues of
non-fundamentalness of shocks (shocks depending on future
rather than past information that cannot be properly
recovered from VARs)
Provide some robustness in the presence of structural breaks
Require minimal conditions on the errors (can be correlated
over time, heteroskedastic etc)
Are relatively easy to implement (though the underlying
model is nonlinear and has unobservable variables)
What can be done with factor models?
Use the estimated factors to summarize the information in a
large set of indicators. For example, construct coincident and
leading indicators as the common factors extracted from a set
of coincident and leading variables, or in the same way
construct financial condition indexes or measures of global
inflation or growth.
Use the estimated factors for nowcasting and forecasting,
possibly in combination with autoregressive (AR) terms
and/or other selected variables, or for estimation of missing or
outlying observations (getting a balanced dataset from an
unbalanced one). Typically, they work rather well.
Identify the structural shocks driving the factors and their
dynamic impact on a large set of economic and financial
indicators (impulse response functions and forecast error
variance decompositions, as in structural VARs)
An introduction to factor models
In this seminar we will consider:
Small scale factor models: representation, estimation and
issues
Large scale factor models
Representation (exact/approximate, static/dynamic,
parametric / non parametric)
Estimation: principal components, dynamic principal
components, maximum likelihood via Kalman filter, subspace
algorithms
Selection of the number of factors (informal methods and
information criteria)
Forecasting (direct / iterated)
Structural analysis (FAVAR based)
Useful references (surveys): Bai and Ng (2008), Stock and
Watson (2006, 2011, 2015), Lütkepohl (2014)
Some extensions
$X_t = \Lambda f_t + \xi_t$,
$f_t = A f_{t-1} + u_t$,
$X = \Lambda F + \xi$, and
$X = \Lambda P^{-1} P F + \xi = \Theta G + \xi$,
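The last equality says that loadings and factors are only identified up to an invertible transformation. A minimal numerical sketch of this point, assuming illustrative dimensions and a hand-picked invertible matrix `P` (none of these values come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 6, 50, 2                       # illustrative sizes (assumption)

Lam = rng.normal(size=(N, r))            # loadings Lambda (N x r)
F = rng.normal(size=(r, T))              # factors F (r x T)
P = np.array([[2.0, 1.0],
              [0.0, 1.0]])               # any invertible r x r matrix (assumption)

Theta = Lam @ np.linalg.inv(P)           # Theta = Lambda P^{-1}
G = P @ F                                # G = P F

# Both parameterizations imply the same common component.
same_common_component = np.allclose(Lam @ F, Theta @ G)
print(same_common_component)
```

Since the data cannot distinguish $(\Lambda, F)$ from $(\Theta, G)$, estimation methods have to impose a normalization.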
An interesting question:
Is there a VAR that is equivalent to a factor model (in the
sense of having the same likelihood)?
Unfortunately, in general no, at least not a finite order VAR.
However, it is possible to impose restrictions on a VAR to make it
"similar" to a factor model.
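Let us consider the VAR(1) model
$X_t = B X_{t-1} + \xi_t$,
and assume that $B$ has reduced rank, $B = CD$, with $C$ of
dimension $N \times r$ and $D$ of dimension $r \times N$.
Defining $g_t = D X_t$, we get
$X_t = C g_{t-1} + \xi_t$,
$g_t = Q g_{t-1} + v_t$,
where $Q = DC$ and $v_t = D \xi_t$.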
This is called a Multivariate Autoregressive Index (MAI) model,
and $g_t$ plays a role similar to that of $f_t$ in the factor model, but it is
observable (a linear combination of the variables in $X_t$) and can
only affect $X_t$ with a lag. Moreover, estimation of the MAI is
complex, as the model is nonlinear (see Carriero, Kapetanios and
Marcellino (2011, 2015)). Hence, let us return to the factor model.
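A small simulation can make the index structure concrete. The sketch below assumes the reduced-rank factorization $B = CD$ with $g_t = D X_t$, which is what the dimensions of the equations above require (so that $Q = DC$ is $r \times r$). The sizes and scaling constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, r, T = 5, 2, 200                      # illustrative sizes (assumption)

C = 0.3 * rng.normal(size=(N, r))        # N x r
D = 0.3 * rng.normal(size=(r, N))        # r x N
B = C @ D                                # reduced-rank VAR(1) matrix
Q = D @ C                                # r x r dynamics of the index

xi = rng.normal(size=(T, N))
X = np.zeros((T, N))
for t in range(1, T):
    X[t] = B @ X[t - 1] + xi[t]          # X_t = B X_{t-1} + xi_t

g = X @ D.T                              # g_t = D X_t (observable index)
v = xi @ D.T                             # v_t = D xi_t

# Check the two MAI equations on the simulated paths.
obs_eq_holds = np.allclose(X[1:], g[:-1] @ C.T + xi[1:])
index_eq_holds = np.allclose(g[1:], g[:-1] @ Q.T + v[1:])
print(obs_eq_holds, index_eq_holds)
```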
Estimation by the Kalman filter
$X_t = \Lambda f_t + \xi_t$,
$f_t = A f_{t-1} + u_t$.
In this formulation:
the factors are unobservable states,
$X_t = \Lambda f_t + \xi_t$ are the observation equations (linking the
unobservable states to the observable variables),
$f_t = A f_{t-1} + u_t$ are the transition equations (governing the
evolution of the states).
Hence, the model:
$X_t = \Lambda f_t + \xi_t$,
$f_t = A f_{t-1} + u_t$.
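To illustrate how the state space form is used, here is a minimal sketch of the Kalman filter recursions for this model. It assumes the parameters ($\Lambda$, $A$ and the two error covariance matrices, called `R` and `Qu` here) are known, and it starts from a zero state mean with identity covariance. Those names and initial conditions are assumptions for illustration. In practice the parameters are estimated, e.g. by maximizing the likelihood that the filter delivers.

```python
import numpy as np

def kalman_filter(X, Lam, A, R, Qu):
    """Filtered factors E[f_t | X_1, ..., X_t] for known parameters.

    X: (T, N) data; Lam: (N, r) loadings; A: (r, r) transition matrix;
    R: (N, N) idiosyncratic covariance; Qu: (r, r) factor-shock covariance.
    """
    T = X.shape[0]
    r = A.shape[0]
    f = np.zeros(r)                      # initial state mean (assumption)
    P = np.eye(r)                        # initial state covariance (assumption)
    out = np.zeros((T, r))
    for t in range(T):
        # Prediction step: transition equation
        f_pred = A @ f
        P_pred = A @ P @ A.T + Qu
        # Update step: observation equation
        S = Lam @ P_pred @ Lam.T + R             # innovation covariance
        K = P_pred @ Lam.T @ np.linalg.inv(S)    # Kalman gain
        f = f_pred + K @ (X[t] - Lam @ f_pred)
        P = (np.eye(r) - K @ Lam) @ P_pred
        out[t] = f
    return out

# Simulated one-factor example (all values are illustrative)
rng = np.random.default_rng(0)
T, N = 200, 10
Lam = rng.normal(size=(N, 1))
A = np.array([[0.8]])
f_true = np.zeros(T)
for t in range(1, T):
    f_true[t] = 0.8 * f_true[t - 1] + rng.normal()
X = f_true[:, None] @ Lam.T + rng.normal(size=(T, N))

f_filt = kalman_filter(X, Lam, A, np.eye(N), np.eye(1))
corr = np.corrcoef(f_true, f_filt[:, 0])[0, 1]
print(corr)
```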
$X_t = \Lambda f_t + \xi_t$,
where:
$X_t$ is the $N \times 1$ vector of stationary variables
$f_t$ is the $r \times 1$ vector of common factors, which can be
correlated over time
$\Lambda$ is the $N \times r$ matrix of loadings
$\xi_t$ is the $N \times 1$ vector of idiosyncratic disturbances, which can be
mildly cross-sectionally and temporally correlated
conditions on $\Lambda$ and $\xi_t$ guarantee that the factors are pervasive
(affect most variables) while idiosyncratic errors are not
The SW approach - PCA
$y_{t+1} = f_t \beta + v_t$,
$X_t = \Lambda f_t + \xi_t$,
$\hat{y}_{t+1} = \hat{f}_t \hat{\beta}$,
where $\hat{f}_t$ are the PCA factor estimators and $\hat{\beta}$ the OLS
estimator of $\beta$, obtained by regressing $y_{t+1}$ on $\hat{f}_t$.
The asymptotic distribution of factor based forecasts is also
Normal, under general conditions, and its variance depends on
the variance of the loadings and on that of the factors, so you
need both N and T large to get a precise forecast (Bai and
Ng (2006)). This result can be used to derive interval and
density forecasts based on factors.
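The two steps of the SW approach (extract factors by principal components, then regress $y_{t+1}$ on the estimated factors) can be sketched as follows. The standardization, the normalization $\hat{\Lambda}'\hat{\Lambda}/N = I$, the intercept in the forecasting regression, and all simulated values are assumptions made for illustration. They are not taken from the slides.

```python
import numpy as np

def pca_factors(X, r):
    """Principal-component factor estimates from a (T, N) panel."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each series
    T, N = Z.shape
    eigval, eigvec = np.linalg.eigh(Z.T @ Z / T)  # N x N sample covariance
    order = np.argsort(eigval)[::-1][:r]          # indices of the r largest
    Lam_hat = np.sqrt(N) * eigvec[:, order]       # Lam_hat' Lam_hat / N = I
    F_hat = Z @ Lam_hat / N                       # (T, r) estimated factors
    return F_hat, Lam_hat

def factor_forecast(y, F_hat):
    """Regress y_{t+1} on (1, f_hat_t) by OLS and forecast from the last f_hat."""
    W = np.column_stack([np.ones(len(F_hat) - 1), F_hat[:-1]])
    beta_hat, *_ = np.linalg.lstsq(W, y[1:], rcond=None)
    return np.concatenate([[1.0], F_hat[-1]]) @ beta_hat

# Simulated one-factor example (illustrative values)
rng = np.random.default_rng(0)
T, N = 200, 50
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.8 * f[t - 1] + rng.normal()
Lam = rng.normal(size=(N, 1))
X = f[:, None] @ Lam.T + rng.normal(size=(T, N))
y = np.zeros(T)
y[1:] = 2.0 * f[:-1] + 0.1 * rng.normal(size=T - 1)   # y_{t+1} = f_t * beta + v_t

F_hat, _ = pca_factors(X, r=1)
y_next = factor_forecast(y, F_hat)
print(y_next)
```

The estimated factor is identified only up to sign and scale, which does not matter for the forecast because the OLS coefficient adjusts accordingly.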
The FHLR approach - DPCA
The FHLR factor model is
$X_t = B(L) u_t + \xi_t = \chi_t + \xi_t$,
where:
$X_t$ is the $N \times 1$ vector of stationary variables
$u_t$ is the $q \times 1$ vector of i.i.d. orthonormal common shocks.
These are the drivers of the common factors in the SW
formulation, but in FHLR the focus is on the common shocks
rather than the common factors
$B(L) = I + B_1 L + B_2 L^2 + \dots + B_p L^p$
$\chi_t = B(L) u_t$ is the $N \times 1$ vector of common components. It is
estimated by Dynamic Principal Components (DPCA), details
in Appendix A.
$\xi_t$ is the $N \times 1$ vector of idiosyncratic shocks, which can be mildly
correlated across units and over time
Conditions on $B(L)$ and $\xi_t$ guarantee that the factors are
pervasive (affect most variables) while idiosyncratic errors are
not
The FHLR approach - static and dynamic factors
$q$ can be different from $r$: the former is usually referred to as
the number of dynamic factors while $r$ is the number of static
factors, with $q \le r$.
Let us assume for simplicity that there is a single factor $f_t$, but
it has both a contemporaneous and lagged effect on $X_t$:
$X_t = \Lambda_1 f_t + \Lambda_2 f_{t-1} + \xi_t$,
$f_t = a f_{t-1} + u_t$.
We can define $g_t = (f_t', f_{t-1}')'$, $\Lambda = (\Lambda_1, \Lambda_2)$, and write the
model in static form as
$X_t = \Lambda g_t + \xi_t$.
In this case we have r = 2 static factors (those in gt ), which
are all driven by q = 1 common shock (ut ). Typically, FHLR
focus on q (and the common shocks ut ), while SW on r (and
the common factors gt ). The distinction matters more for
structural analysis than for forecasting.
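The single-factor example can be checked numerically. The sketch below simulates the dynamic form, stacks $f_t$ and $f_{t-1}$ into $g_t$, and looks at the eigenvalues of the sample covariance of $X_t$: two of them should stand out, although only one shock drives the system. Sample sizes, the value of $a$ and the distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, a = 500, 40, 0.5                   # illustrative values (assumption)

u = rng.normal(size=T + 1)               # q = 1 common shock
f = np.zeros(T + 1)
for t in range(1, T + 1):
    f[t] = a * f[t - 1] + u[t]

Lam1 = rng.normal(size=(N, 1))
Lam2 = rng.normal(size=(N, 1))
xi = rng.normal(size=(T, N))

f_now, f_lag = f[1:], f[:-1]
X = f_now[:, None] @ Lam1.T + f_lag[:, None] @ Lam2.T + xi   # dynamic form

g = np.column_stack([f_now, f_lag])      # g_t = (f_t, f_{t-1})'
Lam = np.hstack([Lam1, Lam2])            # Lambda = (Lambda_1, Lambda_2)
X_static = g @ Lam.T + xi                # static form with r = 2

eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(X, X_static), eigvals[:4])
```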
The FHLR approach - Choice of q
Informal methods:
- Estimate recursively the spectral density matrix of a subset of
$X_t$, increasing the number of variables at each step; calculate
the dynamic eigenvalues for a grid of frequencies, $\lambda^{x}_{\theta}$; choose $q$
so that when the number of variables increases the average
over frequencies of the first $q$ dynamic eigenvalues diverges,
while the average of the $(q+1)$th does not.
- For the whole $X_t$ there should be a big gap between the
variance of $X_t$ explained by the first $q$ dynamic principal
components and that explained by the $(q+1)$th component.
Formal methods:
- Information criteria: Hallin and Liška (2007); Amengual and
Watson (2007)
The FHLR approach - Forecasting
$X_{t+h} = B(L) u_t + \xi_{t+h} = \chi_t + \xi_{t+h}$.
In this context, an optimal linear forecast for $X_{t+h}$ is $\hat{\chi}_t$, which
can be obtained, as said, by DPCA.
A problem with using this method for forecasting is the use of
future information in the computation of the DPCA. To
overcome this issue, which prevents a real time
implementation of the procedure, Forni, Hallin, Lippi and
Reichlin (2005) propose a modified one-sided estimator
(which is however too complex for implementation in EViews).
Parametric estimation - quasi MLE
$X_t = \Lambda f_t + \xi_t$, (2)
$\Psi(L) f_t = B \eta_t$, (3)
where $X_t^f = (X_t', X_{t+1}', X_{t+2}', \dots)'$, $X_t^p = (X_{t-1}', X_{t-2}', \dots)'$,
$E_t^f = (u_t', u_{t+1}', \dots)'$.
Note that (i) $X_t^f = \mathcal{O} f_t + \mathcal{E} E_t^f$ and (ii) $f_t = K X_t^p$. Hence, the
best linear predictor of future $X$ is $\mathcal{O} K X_t^p$, and we need an
estimator for $K$ (and for the loadings $\mathcal{O}$).
Parametric estimation - SSS
KM show how to obtain the SSS factor estimates, $\hat{f}_t = \hat{K} X_t^p$.
See Appendix A for details.
Once estimates of the factors are available, estimates of the
other parameters (including the factor loadings, $\hat{\mathcal{O}}$) can be
obtained by OLS.
The number of factors can be chosen by information
criteria, similar to those of Bai and Ng (2002) for PCA but
with a different penalty function, see KM.
Parametric estimation - SSS forecasts
The SSS forecasts of $X_t^f$ are $\hat{\mathcal{O}} \hat{K} X_t^p$, where $\hat{\mathcal{O}}$ is obtained by
OLS regression on the estimated factors, as in PCA.
With MLE, forecasts are obtained by the iterated method (the VAR for
the factors is iterated forward to produce forecasts for the factors,
which are then inserted into the static model for $X_t$).
Forecasts obtained by PCA, DPCA and SSS use the direct
method (the variable of interest is regressed on the estimated
factors lagged h periods, and the parameter estimates are
combined with the current value of the estimated factors to
produce the h-step ahead forecast of the variable(s) of interest).
If the model is correctly specified, MLE plus the iterated method
produces better (more efficient) forecasts. If there is
mis-specification, as is often the case, the ranking is not
clear-cut: other factor estimation approaches plus direct
estimation can be better. See, e.g., Marcellino, Stock and
Watson (2006) for a comparison of direct and iterated
forecasting with AR and VAR models.
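The difference between the two methods is easiest to see in a stripped-down case. The sketch below uses a single scalar series as both predictor and target and fits it without an intercept. This is a simplification for illustration, not the full factor-based procedure. The iterated forecast raises the estimated one-step coefficient to the power h, while the direct forecast estimates the h-step coefficient in one regression.

```python
import numpy as np

def iterated_forecast(f, h):
    """Fit an AR(1) by OLS (no intercept) and iterate it forward h steps."""
    a_hat = (f[:-1] @ f[1:]) / (f[:-1] @ f[:-1])
    return a_hat ** h * f[-1]

def direct_forecast(f, h):
    """Regress f_{t+h} on f_t by OLS (no intercept), apply to the last value."""
    b_hat = (f[:-h] @ f[h:]) / (f[:-h] @ f[:-h])
    return b_hat * f[-1]

# Simulated AR(1) series (illustrative values)
rng = np.random.default_rng(0)
f = np.zeros(500)
for t in range(1, 500):
    f[t] = 0.7 * f[t - 1] + rng.normal()

print(iterated_forecast(f, h=3), direct_forecast(f, h=3))
```

When the AR(1) is the true model the two forecasts are close and the iterated one uses the data more efficiently. When the one-step model is wrong, the direct regression does not compound the error over h steps.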
Factor estimation methods - Monte Carlo Comparison
Comparison of PCA, DPCA, MLE and SSS (based on
Kapetanios and Marcellino (2009, KM)).
The DGP is:
$x_t = C f_t + e_t$, $t = 1, \dots, T$
$A(L) f_t = B(L) u_t$ (6)
where $A(L) = I - A_1(L) - \dots - A_p(L)$,
$B(L) = I + B_1(L) + \dots + B_q(L)$, with $(N, T)$ = (50,50),
(50,100), (100,50), (100,100), (50,500), (100,500) and
(200,50). MLE for (50,50) only, due to computational burden.
Experiments differ for number of factors (one or several), A
and B matrices, choice of s (s = m or s = 1), factor loadings
(static or dynamic), choice of number of factors (true number
or misspecified), properties of idiosyncratic errors
(uncorrelated or serially correlated), and the way the C matrix is
generated (standard normal or uniform with non-zero mean).
Five groups of experiments, each replicated 500 times.
Factor estimation methods - MC Comparison, summary
In line with BBE, ELM estimate the first G PCs from the set
of slow-moving variables, denoted by $\hat{F}_t^{slow}$.
Then, they carry out a multiple regression of $F_t$ on $\hat{F}_t^{slow}$ and
on $i_t$, i.e.
$F_t = a \hat{F}_t^{slow} + b i_t + \nu_t$.
An estimate of $F_t$ is then given by $\hat{a} \hat{F}_t^{slow}$.
Structural FAVAR - Monetary policy shock identification
In the joint factor vector $F_t \equiv [\hat{F}_t, i_t]$, the Federal Funds rate
$i_t$ is ordered last. Given this ordering, the VAR representation
with lower-triangular contemporaneous-relation matrix $P$ in
(8) directly identifies the monetary policy shock as the last
element of the innovation vector $u_t$, say $u_{int,t}$. Hence, the
shock identification works via a Cholesky decomposition,
which is here readily given by the lower triangular $P^{-1}$.
Naturally, the methodology also allows for other identification
approaches, such as short/long run or sign restrictions. These
can be just applied to the VAR for $F_t \equiv [\hat{F}_t, i_t]$.
Impulse responses of the factors to the monetary policy shock,
$\partial F_{t+h} / \partial u_{int,t}$, are then computed in the usual fashion from
the estimated VAR, and used in conjunction with the
estimated loading equations, $x_{i,t} = \hat{\Lambda}_i' \hat{F}_t + \hat{e}_{i,t}$, to get
$\partial x_{i,t+h} / \partial u_{int,t}$. Proper confidence bands for the impulse
response functions can be computed by using the bootstrap
method.
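The mechanics of the recursive identification can be sketched in a few lines. The example assumes a VAR(1) for the joint factor vector with coefficient matrix `Phi` and reduced-form innovation covariance `Sigma`. These names and all numerical values are hypothetical, and the Cholesky factor of `Sigma` plays the role of the impact matrix. The policy variable is ordered last, so its shock is the last column of that factor.

```python
import numpy as np

def cholesky_irf(Phi, Sigma, Lam, horizons):
    """Responses of factors and observables to the last-ordered shock.

    Phi: (k, k) VAR(1) matrix; Sigma: (k, k) innovation covariance;
    Lam: (N, k) loadings. Returns arrays of shape (H+1, k) and (H+1, N).
    """
    impact = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = impact @ impact.T
    resp = impact[:, -1].copy()          # impact effect of the last (policy) shock
    irf_F = np.zeros((horizons + 1, Phi.shape[0]))
    for h in range(horizons + 1):
        irf_F[h] = resp                  # dF_{t+h} / du_{int,t}
        resp = Phi @ resp                # propagate through the VAR
    irf_x = irf_F @ Lam.T                # dx_{i,t+h} / du_{int,t} via the loadings
    return irf_F, irf_x

# Hypothetical two-variable system: one factor plus the policy rate (ordered last)
Phi = np.array([[0.5, 0.1],
                [0.0, 0.8]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
Lam = np.array([[1.0, 0.0],
                [0.5, 0.5]])

irf_F, irf_x = cholesky_irf(Phi, Sigma, Lam, horizons=8)
print(irf_F[:2])
```

With this ordering the first variable does not respond to the policy shock on impact, which is the restriction the lower-triangular structure imposes.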
Structural FAVAR - Monetary policy (FFR) shock
Impulse responses from constant parameter FAVAR (solid) and
time varying FAVAR (averages over all periods, dotted) for key
variables, taken from ELM (who developed the TV-FAVAR)
Structural FAVAR: Summary
$p_j^T(L) = \sum_{k=-M}^{M} p_{j,k}^T L^k$,
$p_{j,k}^T = \frac{1}{2M+1} \sum_{h=0}^{2M} p_j^T(\theta_h) e^{ik\theta_h}$, $k = -M, \dots, M$.
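The second formula is an inverse discrete Fourier transform: it turns values of a filter at $2M+1$ frequencies into the coefficients of a two-sided lag polynomial. A minimal sketch, assuming the equally spaced grid $\theta_h = 2\pi h/(2M+1)$ and the convention $p(\theta) = \sum_k c_k e^{-ik\theta}$ for the frequency-domain values (both are assumptions, as the slide does not state them):

```python
import numpy as np

def filter_coefficients(p_theta, M):
    """Inverse discrete Fourier transform of frequency-domain filter values.

    p_theta: length 2M+1 array with p(theta_h), theta_h = 2*pi*h/(2M+1).
    Returns (k, p_k) with k = -M, ..., M.
    """
    H = 2 * M + 1
    theta = 2 * np.pi * np.arange(H) / H
    k = np.arange(-M, M + 1)
    # p_k = (1 / (2M+1)) * sum_h p(theta_h) * exp(i * k * theta_h)
    p_k = (np.exp(1j * np.outer(k, theta)) @ p_theta) / H
    return k, p_k

# Round-trip check with a known two-sided filter (illustrative coefficients)
M = 2
c = np.array([0.1, 0.2, 0.4, 0.2, 0.1])          # c_k for k = -2, ..., 2
lags = np.arange(-M, M + 1)
grid = 2 * np.pi * np.arange(2 * M + 1) / (2 * M + 1)
p_theta = np.exp(-1j * np.outer(grid, lags)) @ c  # p(theta_h) = sum_k c_k e^{-ik theta_h}

k, p_k = filter_coefficients(p_theta, M)
print(np.round(p_k.real, 6))
```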