
An Introduction to Factor Modelling

Massimiliano Marcellino
Bocconi University

12 May 2017
Why factor models?
Factor models decompose the behaviour of an economic
variable (xit) into a component driven by a few unobservable
factors (ft), common to all the variables but with specific
effects on them (λi), and a variable-specific idiosyncratic
component (ξit):
xit = λi ft + ξit,
t = 1, . . . , T; i = 1, . . . , N
The idea of a few common forces driving all economic variables is
appealing from an economic point of view; e.g., in the Real
Business Cycle (RBC) and Dynamic Stochastic General
Equilibrium (DSGE) literature there are just a few key
economic shocks affecting all variables (productivity, demand,
supply, etc.), with additional variable-specific shocks
Moreover, factor models can handle large datasets (N large),
reflecting the use of large information sets by policy makers
and economic agents when taking their decisions
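The decomposition above is easy to illustrate with a small simulation (a sketch only: the dimensions, persistence values and seed below are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, r = 200, 50, 2                     # time periods, variables, factors

# Factors follow a VAR(1): f_t = A f_{t-1} + u_t
A = np.array([[0.7, 0.0], [0.0, 0.5]])
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = A @ f[t - 1] + rng.standard_normal(r)

Lam = rng.standard_normal((N, r))        # loadings lambda_i
xi = rng.standard_normal((T, N))         # idiosyncratic components
X = f @ Lam.T + xi                       # x_it = lambda_i' f_t + xi_it

# With pervasive factors, the common component accounts for a
# large share of the total variance of X
common = f @ Lam.T
share = common.var() / X.var()
```

With only two factors driving fifty series, the common component typically explains well over half of the panel's variance, which is the sense in which "a few common forces" can summarize a large dataset.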
Why factor models?
From an econometric point of view, factor models:
Alleviate the curse of dimensionality of standard VARs (the
number of parameters grows with the square of the number
of variables)
Prevent omitted variable bias and issues of
non-fundamentalness of shocks (shocks depending on future
rather than past information, which cannot be properly
recovered from VARs)
Provide some robustness in the presence of structural breaks
Require minimal conditions on the errors (they can be correlated
over time, heteroskedastic, etc.)
Are relatively easy to implement (though the underlying
model is nonlinear and with unobservable variables)
What can be done with factor models?
Use the estimated factors to summarize the information in a
large set of indicators. For example, construct coincident and
leading indicators as the common factors extracted from a set
of coincident and leading variables, or in the same way
construct financial condition indexes or measures of global
inflation or growth.
Use the estimated factors for nowcasting and forecasting,
possibly in combination with autoregressive (AR) terms
and/or other selected variables, or for estimation of missing or
outlying observations (getting a balanced dataset from an
unbalanced one). Typically, they work rather well.
Identify the structural shocks driving the factors and their
dynamic impact on a large set of economic and financial
indicators (impulse response functions and forecast error
variance decompositions, as in structural VARs)
An introduction to factor models
In this seminar we will consider:
Small scale factor models: representation, estimation and
issues
Large scale factor models
Representation (exact/approximate, static/dynamic,
parametric/non-parametric)
Estimation: principal components, dynamic principal
components, maximum likelihood via the Kalman filter, subspace
algorithms
Selection of the number of factors (informal methods and
information criteria)
Forecasting (direct/iterated)
Structural analysis (FAVAR based)
Useful references (surveys): Bai and Ng (2008), Stock and
Watson (2006, 2011, 2015), Lutkepohl (2014)
Some extensions

How to handle parameter time variation: an efficient filter for
TV-FAVAR, see e.g. Eickmeier, Lemke and Marcellino (2015)
and their references
How to handle I(1) variables: Factor-augmented Error
Correction Models, see e.g. Banerjee, Marcellino and Masten
(2014) and their references
How to handle unbalanced datasets: missing observations,
mixed frequencies and ragged edges, see e.g. Marcellino and
Schumacher (2010) and their references
Some extensions

How to handle hierarchical structures (e.g.,
countries/regions/sectors), see e.g. Beck, Hubrich and
Marcellino (2016) and their references
How to construct targeted factors for forecasting specific
variables, see e.g. Hepenstrick and Marcellino (2016) and
their references
How to use factors in IV regressions, see e.g. Kapetanios,
Khalaf and Marcellino (2016) and their references
Representation
Let us consider the factor model:

$$
\begin{pmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{Nt} \end{pmatrix}
=
\begin{pmatrix}
\lambda_{11} & \lambda_{12} & \dots & \lambda_{1r} \\
\lambda_{21} & \lambda_{22} & \dots & \lambda_{2r} \\
\vdots & & & \vdots \\
\lambda_{N1} & \lambda_{N2} & \dots & \lambda_{Nr}
\end{pmatrix}
\begin{pmatrix} f_{1t} \\ f_{2t} \\ \vdots \\ f_{rt} \end{pmatrix}
+
\begin{pmatrix} \xi_{1t} \\ \xi_{2t} \\ \vdots \\ \xi_{Nt} \end{pmatrix},
$$

$$
\begin{pmatrix} f_{1t} \\ f_{2t} \\ \vdots \\ f_{rt} \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1r} \\
a_{21} & a_{22} & \dots & a_{2r} \\
\vdots & & & \vdots \\
a_{r1} & a_{r2} & \dots & a_{rr}
\end{pmatrix}
\begin{pmatrix} f_{1,t-1} \\ f_{2,t-1} \\ \vdots \\ f_{r,t-1} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{rt} \end{pmatrix},
$$

where each (weakly stationary and standardized) variable xit,
i = 1, ..., N, depends on r unobservable factors fjt via the loadings
λij, j = 1, ..., r, and on its own idiosyncratic error, ξit. In turn, the
factors are generated from a VAR(1) model, so that each factor fjt
depends on the first lag of all the factors, plus an error term, ujt.
Representation

For example, xit, i = 1, ..., N, t = 1, ..., T, can be:
A set of macroeconomic and/or financial indicators for a
country -> the factors represent their common drivers
GDP growth or inflation for a large set of countries -> the
factors capture global movements in these two variables
All the subcomponents of a price index -> the factors capture
the extent of commonality among them and can be compared
with the aggregate index
A set of interest rates of different maturities -> commonality
is driven by level, slope and curvature factors
In general, we are assuming that all the variables are driven by a
(small) set of common unobservable factors, plus variable-specific
errors.
Let us write the factor model more compactly as:
Xt = Λft + ξt,
ft = Aft−1 + ut,
where:
- Xt = (x1t, ..., xNt)′ is the N × 1 vector of stationary variables
under analysis
- ft = (f1t, ..., frt)′ is the r × 1 vector of unobservable factors
- Λ = (λ1′, ..., λN′)′ is the N × r matrix of loadings, with
λi = (λi1, ..., λir) (these measure the effects of the factors on the
variables)
- ξt = (ξ1t, ..., ξNt)′ is the N × 1 vector of idiosyncratic shocks
- ut = (u1t, ..., urt)′ is the r × 1 vector of shocks to the factors
- ξt and ut are multivariate, mutually uncorrelated, standard
orthogonal white noise sequences (hence, uncorrelated over
time and with constant variance-covariance matrix)
- |λmax(A)| < 1, |λmin(A)| > 0 (the factors are stationary and
dynamic (A ≠ 0))
In the factor model:

Xt = Λft + ξt,
ft = Aft−1 + ut,

Λft is called the common component, and λi ft is the common
component for each variable i.
ξt is called the idiosyncratic component, and ξit is the
idiosyncratic component for each variable i.
As ft has only a contemporaneous effect on Xt, this is a static
factor model.
Additional lags of ft in the Xt equations can be easily allowed,
and we obtain a dynamic factor model. Additional lags in the
ft equations can also be easily allowed, as well as deterministic
components.
If the variance-covariance matrix of ξt is diagonal (no
correlation at all among the idiosyncratic components), we
have a strict factor model. Otherwise, an approximate factor
model.
As we have specified a model for the factors (a VAR(1)), and
made specific assumptions on the error structure (multivariate
white noise), we have a parametric factor model.
Let us consider an even more compact formulation of the factor
model:
X = ΛF + ξ
where:
- X = (X1, ..., XT) is the N × T matrix of stationary variables
under analysis
- F = (f1, ..., fT) is the r × T matrix of unobservable factors
- Λ = (λ1′, ..., λN′)′ is the N × r matrix of loadings, as before
- ξ = (ξ1, ..., ξT) is the N × T matrix of idiosyncratic shocks
Identification

Let us now consider two factor models:

X = ΛF + ξ, and
X = ΛP⁻¹PF + ξ = ΘG + ξ,

where P is an r × r invertible matrix, Θ = ΛP⁻¹ and
G = PF.
The two models for X are observationally equivalent (same
likelihood); hence, to uniquely identify the factors and the
loadings we need to impose a priori restrictions on Λ and/or
F.
This is similar to the error correction model, where the
cointegrating vectors and/or their loadings are properly
restricted to achieve identification.
Typical restrictions are either Λ = (Ir : Λ̃′)′, where Ir is the
r-dimensional identity matrix and Λ̃ is the (N − r) × r matrix
of unrestricted loadings, or FF′ = Ir. The latter condition
imposes that the factors are orthogonal and with unit
variance, as
$$
FF' =
\begin{pmatrix}
\sum_{t=1}^{T} f_{1t}^2 & \sum_{t=1}^{T} f_{1t}f_{2t} & \dots & \sum_{t=1}^{T} f_{1t}f_{rt} \\
\sum_{t=1}^{T} f_{2t}f_{1t} & \sum_{t=1}^{T} f_{2t}^2 & \dots & \sum_{t=1}^{T} f_{2t}f_{rt} \\
\vdots & & & \vdots \\
\sum_{t=1}^{T} f_{rt}f_{1t} & \sum_{t=1}^{T} f_{rt}f_{2t} & \dots & \sum_{t=1}^{T} f_{rt}^2
\end{pmatrix}
$$

The condition FF′ = Ir is sufficient to get unique estimators
for the factors, but not to fully identify the model. For that,
additional conditions are needed, such as Λ′Λ being diagonal with
distinct, decreasing diagonal elements. See, e.g., Lutkepohl
(2014) for details.
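The observational equivalence of (Λ, F) and (Θ, G) = (ΛP⁻¹, PF) is easy to verify numerically (a sketch; the dimensions and the invertible matrix P below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r = 20, 100, 2
Lam = rng.standard_normal((N, r))          # loadings
F = rng.standard_normal((r, T))            # factors
X = Lam @ F + 0.1 * rng.standard_normal((N, T))

# Any invertible r x r matrix P yields an observationally
# equivalent pair (Theta, G) = (Lam P^{-1}, P F)
P = np.array([[2.0, 1.0], [0.0, 3.0]])
Theta = Lam @ np.linalg.inv(P)
G = P @ F

# The implied common components are identical
same = np.allclose(Lam @ F, Theta @ G)
```

This is why normalizations such as FF′ = Ir are needed: without them, any invertible rotation of the factors fits the data equally well.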
Factor models and VARs

An interesting question:
Is there a VAR that is equivalent to a factor model (in the
sense of having the same likelihood)?
Unfortunately, in general no, at least not a finite order VAR.
However, it is possible to impose restrictions on a VAR to make it
"similar" to a factor model.
Let us consider the VAR(1) model

Xt = BXt−1 + ξt,

assume that the N × N matrix B can be factored into B = CD,
where C and D are N × r and r × N matrices respectively, and
define gt = DXt. We get:

Xt = Cgt−1 + ξt,
gt = Qgt−1 + vt,

where Q = DC and vt = Dξt.
This is called a Multivariate Autoregressive Index (MAI) model,
and gt plays a similar role as ft in the factor model, but it is
observable (a linear combination of the variables in Xt) and can
only affect Xt with a lag. Moreover, estimation of the MAI is
complex, as the model is nonlinear (see Carriero, Kapetanios and
Marcellino (2011, 2015)). Hence, let us return to the factor model.
Estimation by the Kalman filter

Let us consider again the factor model written as:

Xt = Λft + ξt,
ft = Aft−1 + ut.

In this formulation:
the factors are unobservable states,
Xt = Λft + ξt are the observation equations (linking the
unobservable states to the observable variables),
ft = Aft−1 + ut are the transition equations (governing the
evolution of the states).
Hence, the model is already in state space form, and therefore we
can use the Kalman filter to obtain maximum likelihood estimators
for the factors, the loadings, the dynamics of the factors, and the
variance-covariance matrices of the errors (e.g., Stock and
Watson (1989)).
However, there are a few problems:
First, the method is computationally demanding, so it is
traditionally considered applicable only when the number of
variables, N, is small.
Second, with N finite, we cannot get consistent estimators for
the factors (as the latter are random variables, not
parameters).
Finally, the approach requires specifying a model for the
factors, which can be difficult as the latter are not observable.
Hence, let us consider alternative estimation approaches.
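The filtering recursion for this state space form can be sketched in a few lines, here with known parameters and a simulated one-factor panel (all numerical values are illustrative assumptions, not from the text):

```python
import numpy as np

def kalman_filter(X, Lam, A, R, Q):
    """Filtered factor estimates for X_t = Lam f_t + xi_t,
    f_t = A f_{t-1} + u_t, with Var(xi_t) = R, Var(u_t) = Q."""
    T, N = X.shape
    r = A.shape[0]
    f = np.zeros(r)                  # filtered state E[f_t | X_1..t]
    P = np.eye(r)                    # state covariance
    out = np.zeros((T, r))
    for t in range(T):
        # prediction step
        f_pred = A @ f
        P_pred = A @ P @ A.T + Q
        # update step
        S = Lam @ P_pred @ Lam.T + R
        K = P_pred @ Lam.T @ np.linalg.inv(S)     # Kalman gain
        f = f_pred + K @ (X[t] - Lam @ f_pred)
        P = P_pred - K @ Lam @ P_pred
        out[t] = f
    return out

# small simulated example with one factor and N = 10 variables
rng = np.random.default_rng(2)
T, N = 300, 10
A = np.array([[0.8]])
Lam = rng.standard_normal((N, 1))
f_true = np.zeros((T, 1))
for t in range(1, T):
    f_true[t] = 0.8 * f_true[t - 1] + rng.standard_normal()
X = f_true @ Lam.T + rng.standard_normal((T, N))

f_hat = kalman_filter(X, Lam, A, np.eye(N), np.eye(1))
corr = np.corrcoef(f_hat[:, 0], f_true[:, 0])[0, 1]
```

In practice the parameters (Λ, A, R, Q) are unknown and must be estimated by maximum likelihood, which is exactly where the computational burden for large N arises.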
Non-parametric, large N, factor models

There are two competing approaches in the factor literature
that are non-parametric, allow for very large N (in theory
N → ∞) and produce consistent estimators for the factors
and/or the common components. They were introduced by
Stock and Watson (2002a, 2002b, SW) and Forni, Hallin,
Lippi and Reichlin (2000, FHLR), and later refined and
extended in many other contributions; see e.g. Bai and Ng
(2008) for an overview.
We will now review their main features and results, starting
with SW (which is simpler) and then moving to FHLR.
The SW approach - PCA

The Stock and Watson (2002a, 2002b) factor model is

Xt = Λft + ξt,

where:
Xt is the N × 1 vector of stationary variables
ft is the r × 1 vector of common factors, which can be correlated
over time
Λ is the N × r matrix of loadings
ξt is the N × 1 vector of idiosyncratic disturbances, which can be
mildly cross-sectionally and temporally correlated
conditions on Λ and ξt guarantee that the factors are pervasive
(affect most variables) while the idiosyncratic errors are not
The SW approach - PCA

Estimation of Λ and ft in the model Xt = Λft + ξt is complex
because of the nonlinearity (Λft) and the fact that ft is a random
variable rather than a parameter.
The minimization problem we want to solve is

$$
\min_{\Lambda, f_1, f_2, \dots, f_T} \; \sum_{t=1}^{T} (X_t - \Lambda f_t)'(X_t - \Lambda f_t)
$$

Under mild regularity conditions, it can be shown that the
(space spanned by the) factors can be consistently estimated
by the first r static principal components of X (PCA).
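The PCA estimator can be sketched as follows (simulated data with arbitrary dimensions; since the factors are identified only up to an invertible rotation, the fit is checked by regressing the true factors on the estimated ones):

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, r = 200, 100, 2
F = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + rng.standard_normal((T, N))
X = (X - X.mean(0)) / X.std(0)          # standardize the variables

# First r principal components: leading eigenvectors of X'X.
# np.linalg.eigh returns eigenvalues in ascending order.
eigval, eigvec = np.linalg.eigh(X.T @ X)
W = eigvec[:, ::-1][:, :r]              # leading r eigenvectors
F_hat = X @ W                           # estimated factors (T x r)

# R^2 of regressing each true factor on the estimated factor space
B, *_ = np.linalg.lstsq(F_hat, F, rcond=None)
resid = F - F_hat @ B
r2 = 1 - resid.var(0) / F.var(0)
```

The R² values are close to one: the estimated factors span (almost) the same space as the true ones, which is exactly the sense in which PCA is consistent.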
The SW approach - Choice of r

Choice of the number of factors, r:
Fraction of explained variance of Xt: it should be large (though
decreasing) for the first r principal components, and very small for
the remaining ones
Information criteria (Bai and Ng (2002)): r should minimize
properly defined information criteria (standard criteria cannot
be used, as now not only T but also N can diverge)
Testing: Kapetanios (2010) provides some statistics and
related distributions; not easy
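As an illustration of the information-criterion route, here is a sketch of the Bai and Ng (2002) ICp2 criterion on simulated data (the DGP, dimensions and kmax below are arbitrary choices):

```python
import numpy as np

def bai_ng_icp2(X, kmax):
    """Bai and Ng (2002) ICp2: pick k minimizing
    log V(k) + k * ((N+T)/(NT)) * log(min(N,T)),
    where V(k) is the mean squared residual after removing
    the first k principal components."""
    T, N = X.shape
    eigval, eigvec = np.linalg.eigh(X.T @ X)
    eigvec = eigvec[:, ::-1]                   # descending order
    ic = []
    for k in range(1, kmax + 1):
        W = eigvec[:, :k]
        resid = X - X @ W @ W.T                # residual after k PCs
        V = (resid ** 2).mean()
        penalty = k * (N + T) / (N * T) * np.log(min(N, T))
        ic.append(np.log(V) + penalty)
    return int(np.argmin(ic)) + 1

rng = np.random.default_rng(4)
T, N, r = 200, 100, 3                          # true number of factors: 3
X = (rng.standard_normal((T, r)) @ rng.standard_normal((r, N))
     + rng.standard_normal((T, N)))
X = (X - X.mean(0)) / X.std(0)
r_hat = bai_ng_icp2(X, kmax=8)
```

With pervasive factors the criterion recovers the true r: the explained variance drops sharply after the r-th component, while the penalty makes additional components too costly.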
The SW approach - Properties of PCA
Need both N and T to grow large, and not too much
cross-correlation among the idiosyncratic errors.
As a basic example, consider the case with one factor and
uncorrelated idiosyncratic errors (an exact factor model):

xit = λi ft + eit. (1)

Then, use the simple cross-sectional average as a factor estimator:

$$
\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x_{it}
= \left(\frac{1}{N}\sum_{i=1}^{N}\lambda_i\right) f_t + \frac{1}{N}\sum_{i=1}^{N} e_{it},
\qquad
\lim_{N\to\infty} \bar{x}_t = \bar{\lambda} f_t,
$$

and x̄t is consistent for ft (up to a scalar). We can also get the
factor loadings by an OLS regression of xit on x̄t, with

$$
\lim_{T\to\infty} \hat{\lambda}_i = \frac{\lambda_i}{\bar{\lambda}}.
$$

So, if both N and T diverge, λ̂i x̄t → λi ft.
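This consistency argument is easy to check by simulation (a sketch; the loadings with mean one and the grid of panel sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100
f = rng.standard_normal(T)                      # true factor

corrs = {}
for N in (10, 100, 1000):
    lam = 1 + 0.5 * rng.standard_normal(N)      # loadings, mean one
    e = rng.standard_normal((T, N))             # idiosyncratic errors
    x = f[:, None] * lam + e                    # x_it = lam_i f_t + e_it
    xbar = x.mean(axis=1)                       # cross-sectional average
    corrs[N] = np.corrcoef(xbar, f)[0, 1]

# corrs[N] approaches 1 as N grows: averaging kills the
# idiosyncratic noise but preserves the common factor
```

The correlation between the cross-sectional average and the true factor rises towards one as N grows, which is the intuition behind the large-N consistency of factor estimators.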
The SW approach - Properties of PCA

PCA factors are weighted rather than simple averages of the
variables, where the weights depend on λi and var(eit).
Under general conditions and with proper standardization,
the PCA factors and estimated loadings have asymptotic Normal
distributions (Bai and Ng (2006)).
If N grows faster than T (such that T^{1/2}/N goes to zero),
the estimated factors can be treated as true factors when used
in second-step regressions (e.g. for forecasting, factor
augmented VARs, etc.). Namely, there are no generated-regressor
problems.
If the factor structure is weak (the first factor explains a small
percentage of the overall variance), PCA is no longer consistent
(Onatski (2006)).
The SW approach - Properties of PCA based forecasts
Suppose the model is

y_{t+1} = ft β + v_{t+1},
Xt = Λft + ξt,

then we can construct a forecast as

ŷ_{t+1} = f̂t β̂,

where f̂t are the PCA factor estimators and β̂ is the OLS
estimator of β, obtained by regressing y_{t+1} on f̂t.
The asymptotic distribution of factor-based forecasts is also
Normal, under general conditions, and its variance depends on
the variance of the loadings and on that of the factors, so you
need both N and T large to get a precise forecast (Bai and
Ng (2006)). This result can be used to derive interval and
density factor-based forecasts.
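The two-step "diffusion index" forecast can be sketched as follows (simulated one-factor data; all parameter values are arbitrary, and the sign of the principal component is aligned with the true factor purely for readability of the simulation):

```python
import numpy as np

rng = np.random.default_rng(6)
T, N = 201, 100
beta = 0.5

# simulate factor, panel X, and target y_{t+1} = beta f_t + v_{t+1}
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.7 * f[t - 1] + rng.standard_normal()
Lam = rng.standard_normal(N)
X = f[:, None] * Lam + rng.standard_normal((T, N))
y = np.empty(T)
y[1:] = beta * f[:-1] + 0.3 * rng.standard_normal(T - 1)

# Step 1: factor estimate = first principal component of X
eigval, eigvec = np.linalg.eigh(X.T @ X)
f_hat = X @ eigvec[:, -1]
f_hat *= np.sign(np.corrcoef(f_hat, f)[0, 1])   # fix sign (simulation only)

# Step 2: OLS of y_{t+1} on f_hat_t, then forecast y_{T+1}
b_hat = (f_hat[:-1] @ y[1:]) / (f_hat[:-1] @ f_hat[:-1])
y_fore = b_hat * f_hat[-1]

# in-sample fit of the factor-based predictor
fit = b_hat * f_hat[:-1]
r2 = 1 - np.var(y[1:] - fit) / np.var(y[1:])
```

With N large, the estimated factor can be used in the second-step regression as if it were the true factor, as discussed above.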
The FHLR approach - DPCA
The FHLR factor model is

Xt = B(L)ut + ξt = χt + ξt,

where:
Xt is the N × 1 vector of stationary variables
ut is the q × 1 vector of i.i.d. orthonormal common shocks.
These are the drivers of the common factors in the SW
formulation, but in FHLR the focus is on the common shocks
rather than the common factors
B(L) = I + B1 L + B2 L² + ... + Bp L^p
χt = B(L)ut is the N × 1 vector of common components. It is
estimated by Dynamic Principal Components (DPCA), details
in Appendix A.
ξt is the N × 1 vector of idiosyncratic shocks, which can be mildly
correlated across units and over time
Conditions on B(L) and ξt guarantee that the factors are
pervasive (affect most variables) while the idiosyncratic errors are
not
The FHLR approach - static and dynamic factors
q can be different from r: the former is usually referred to as
the number of dynamic factors, while r is the number of static
factors, with q ≤ r.
Let us assume for simplicity that there is a single factor ft, but
it has both a contemporaneous and a lagged effect on Xt:

Xt = Λ1 ft + Λ2 ft−1 + ξt,
ft = aft−1 + ut.

We can define gt = (ft, ft−1)′, Λ = (Λ1, Λ2), and write the
model in static form as

Xt = Λgt + ξt.

In this case we have r = 2 static factors (those in gt), which
are all driven by q = 1 common shock (ut). Typically, FHLR
focus on q (and the common shocks ut), while SW focus on r (and
the common factors gt). The distinction matters more for
structural analysis than for forecasting.
The FHLR approach - Choice of q

Informal methods:
- Estimate recursively the spectral density matrix of a subset of
Xt, increasing the number of variables at each step; calculate
the dynamic eigenvalues for a grid of frequencies, λθ^x; choose q
so that, when the number of variables increases, the average
over frequencies of the first q dynamic eigenvalues diverges,
while the average of the (q+1)-th does not.
- For the whole Xt there should be a big gap between the
variance of Xt explained by the first q dynamic principal
components and that explained by the (q+1)-th component.
Formal methods:
- Information criteria: Hallin and Liška (2007); Amengual and
Watson (2007)
The FHLR approach - Forecasting

Consider now the model (direct estimation; the common
shocks have an h-period delay in affecting Xt):

X_{t+h} = B(L)ut + ξ_{t+h} = χt + ξ_{t+h}.

In this context, an optimal linear forecast for X_{t+h} is χ̂t, which
can be obtained, as said, by DPCA.
A problem with using this method for forecasting is the use of
future information in the computation of the DPCA. To
overcome this issue, which prevents a real-time
implementation of the procedure, Forni, Hallin, Lippi and
Reichlin (2005) propose a modified one-sided estimator
(which is however too complex for implementation in EViews).
Parametric estimation - quasi MLE

The Kalman filter produces (quasi-) ML estimators of the factors,
but was considered not feasible for large N. This is no longer true:
Doz, Giannone, Reichlin (2011, 2012).
The model has the form

Xt = Λft + ξt, (2)
Ψ(L)ft = Bηt, (3)

where the q-dimensional vector ηt contains the orthogonal
dynamic shocks driving the r factors ft, and the matrix B is
(r × q)-dimensional, with q ≤ r.
For given r and q, estimation proceeds in the following steps:
Parametric estimation - quasi MLE
1. Estimate f̂t by PCA and Λ̂ by regressing Xt on f̂t. The
covariance of ξ̂t = Xt − Λ̂f̂t, denoted Σ̂ξ, is also
estimated.
2. Estimate a VAR(p) on the factors f̂t, yielding Ψ̂(L) and the
covariance of the residuals ς̂t = Ψ̂(L)f̂t, denoted Σ̂ς.
3. To estimate B, given the number of dynamic shocks q, apply
an eigenvalue decomposition of Σ̂ς. Let M be the (r × q)
matrix of the eigenvectors corresponding to the q largest
eigenvalues, and let the (q × q)-dimensional matrix P contain
the largest eigenvalues on the main diagonal and zeros
otherwise. Then, B̂ = M P^{1/2}.
4. The Kalman filter (or smoother) then yields new estimates of
the factors, and the procedure can be iterated.
5. Forecasts can be obtained either from the Kalman filter or as

X̂_{T+h} = Λ̂ f̂_{T+h},

where f̂_{T+h} are obtained from the VAR in (3).
Parametric estimation - Subspace algorithms (SSS)
Let us now consider again the factor model:

Xt = Cft + Dut, t = 1, . . . , T (4)
ft = Aft−1 + But−1

Kapetanios and Marcellino (2009, KM) show that (4) can be
written as a regression of the future on the past, with particular
reduced rank restrictions on the coefficients (similar to the reduced
rank VAR seen above):

X_t^f = OKX_t^p + E E_t^f (5)

where X_t^f = (Xt′, X_{t+1}′, X_{t+2}′, . . .)′, X_t^p = (X_{t−1}′, X_{t−2}′, . . .)′,
E_t^f = (ut′, u_{t+1}′, . . .)′.
Note that (i) X_t^f = O ft + E E_t^f and (ii) ft = KX_t^p. Hence, the
best linear predictor of future X is OKX_t^p, and we need an
estimator for K (and for the loadings O).
Parametric estimation - SSS

KM show how to obtain the SSS factor estimates, f̂t = K̂X_t^p.
See Appendix A for details.
Once estimates of the factors are available, estimates of the
other parameters (including the factor loadings, Ô) can be
obtained by OLS.
The choice of the number of factors can be done by information
criteria, similar to those by Bai and Ng (2002) for PCA but
with a different penalty function; see KM.
Parametric estimation - SSS forecasts
The SSS forecasts are X̂_t^f = ÔK̂X_t^p, where Ô is obtained by
OLS regression on the estimated factors, as in PCA.
With MLE, forecasts are obtained by the iterated method (the VAR
for the factors is iterated forward to produce forecasts for the factors,
which are then inserted into the static model for Xt).
Forecasts obtained by PCA, DPCA and SSS use the direct
method (the variable of interest is regressed on the estimated
factors lagged h periods, and the parameter estimates are
combined with the current value of the estimated factors to
produce the h-step ahead forecast of the variable(s) of interest).
If the model is correctly specified, MLE plus the iterated method
produces better (more efficient) forecasts. If there is
mis-specification, as is often the case, the ranking is not
clear-cut, and other factor estimation approaches plus direct
estimation can be better. See, e.g., Marcellino, Stock and
Watson (2006) for a comparison of direct and iterated
forecasting with AR and VAR models.
Factor estimation methods - Monte Carlo Comparison
Comparison of PCA, DPCA, MLE and SSS (based on
Kapetanios and Marcellino (2009, KM)).
The DGP is:

xt = Cft + et, t = 1, . . . , T
A(L)ft = B(L)ut (6)

where A(L) = I − A1 L − . . . − Ap L^p,
B(L) = I + B1 L + . . . + Bq L^q, with (N, T) = (50,50),
(50,100), (100,50), (100,100), (50,500), (100,500) and
(200,50). MLE for (50,50) only, due to the computational burden.
The experiments differ in the number of factors (one or several),
the A and B matrices, the choice of s (s = m or s = 1), the factor
loadings (static or dynamic), the choice of the number of factors
(true number or misspecified), the properties of the idiosyncratic
errors (uncorrelated or serially correlated), and the way the C
matrix is generated (standard normal or uniform with non-zero
mean).
Five groups of experiments, each replicated 500 times.
Factor estimation methods - MC Comparison, summary

Appendix B provides more details on the DGP and detailed
results. The main findings are the following:
DPCA shows consistently lower correlation between the true and
estimated common components than SSS and PCA. It shows,
in general, more evidence of serial correlation of the idiosyncratic
components, although not to any significant extent.
SSS beats PCA, but the gains are rather small, in the range
5-10%, and require a careful choice of s.
SSS beats MLE, which is only slightly better than PCA.
All methods perform very well in recovering the common
components. As PCA is simpler, it seems reasonable to use it.
Factor models - Forecasting performance

There have been very many papers on forecasting with factor
models in the past 15 years, starting with Stock and Watson
(2002b) for the USA and Marcellino, Stock and Watson (2003)
for the euro area. Banerjee, Marcellino and Masten (2006)
provide results for ten Eastern European countries. Eickmeier
and Ziegler (2008) provide a nice summary (meta-analysis);
see also Stock and Watson (2006) for a survey of the earlier
results.
Factor models have recently been used also for nowcasting, i.e.,
predicting current economic conditions (before official data is
released); see e.g. Kuzin, Marcellino and Schumacher (2013)
and their references.
Factor models - Forecasting performance

Eickmeier and Ziegler (2008):

"Our results suggest that factor models tend to outperform
small models, whereas factor forecasts are slightly worse than
pooled forecasts. Factor models deliver better predictions for
US variables than for UK variables, for US output than for
euro-area output and for euro-area inflation than for US
inflation. The size of the dataset from which factors are
extracted positively affects the relative factor forecast
performance, whereas pre-selecting the variables included in
the dataset did not improve factor forecasts in the past.
Finally, the factor estimation technique may matter as well."
Structural Factor Augmented VAR (FAVAR)
To illustrate the use of the FAVAR for structural analysis, we
take as starting point the FAVAR model as proposed by
Bernanke, Boivin and Eliasz (2005, BBE); see also Eickmeier,
Lemke and Marcellino (2015, ELM) for extensions and
Lutkepohl (2014), Stock and Watson (2015) for surveys.
The model for a large set of stationary macroeconomic and
financial variables is:

x_{i,t} = Λi′Ft + e_{i,t}, i = 1, . . . , N, (7)

where the factors are orthonormal (F′F = I) and uncorrelated
with the idiosyncratic errors, and E(et) = 0, E(et et′) = R,
where R is a diagonal matrix. As we have seen, these
assumptions identify the model and are common in the
FAVAR literature.
The dynamics of the factors are then modeled as a VAR(p),

Ft = B1 F_{t−1} + . . . + Bp F_{t−p} + wt, E(wt) = 0, E(wt wt′) = W. (8)
Structural FAVAR
The VAR equations in (8) can be interpreted as a
reduced-form representation of a system of the form

PFt = K1 F_{t−1} + . . . + Kp F_{t−p} + ut, E(ut) = 0, E(ut ut′) = S, (9)

where P is lower-triangular with ones on the main diagonal,
and S is a diagonal matrix.
The relation to the reduced-form parameters in (8) is
Bi = P⁻¹Ki and W = P⁻¹S P⁻¹′. This system of equations
is often referred to as a 'structural VAR' (SVAR)
representation, obtained with Cholesky identification.
For the structural analysis, BBE assume that Xt is driven by
G latent factors Ft and the Federal Funds rate (it) as a
(G+1)-th observable factor, as they are interested in
measuring the effects of monetary policy shocks in the
economy. ELM use G = 5 factors, which provide a proper
summary of the information in Xt.
Structural FAVAR - Monetary policy shock identi…cation

The space spanned by the factors can be estimated by PCA


using, as we have seen, the …rst G + 1 PCs of the data Xt
(BBE also consider other factor estimation methods).
To remove the observable factor it from the space spanned by
all G + 1 factors, dataset is split into slow-moving variables
(expected to move with delay after an interest rate shock),
and fast-moving variables (can move instantaneously).
Slow-moving variables comprise, e.g., real activity measures,
consumer and producer prices, de‡ators of GDP and its
components and wages, whereas fast-moving variables are
…nancial variables such as asset prices, interest rates or
commodity prices.
Structural FAVAR - Monetary policy shock identi…cation

In line with BBE, ELM estimate the first G PCs from the set
of slow-moving variables, denoted by F̂t^slow.
Then, they carry out a multiple regression of Ft on F̂t^slow and
on it, i.e.

Ft = aF̂t^slow + b it + νt.

An estimate of Ft is then given by âF̂t^slow.
Structural FAVAR - Monetary policy shock identi…cation
In the joint factor vector Ft ≡ [F̂t′, it]′, the Federal Funds rate
it is ordered last. Given this ordering, the VAR representation
with lower-triangular contemporaneous-relation matrix P in
(9) directly identifies the monetary policy shock as the last
element of the innovation vector ut, say u_{int,t}. Hence, the
shock identification works via a Cholesky decomposition,
which is here readily given by the lower-triangular P⁻¹.
Naturally, the methodology also allows for other identification
approaches, such as short/long run or sign restrictions. These
can be applied directly to the VAR for Ft ≡ [F̂t′, it]′.
Impulse responses of the factors to the monetary policy shock,
∂F_{t+h}/∂u_{int,t}, are then computed in the usual fashion from
the estimated VAR, and used in conjunction with the
estimated loading equations, x_{i,t} = Λ̂i′F̂t + ê_{i,t}, to get
∂x_{i,t+h}/∂u_{int,t}. Proper confidence bands for the impulse
response functions can be computed using the bootstrap
method.
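The mechanics of mapping a Cholesky-identified policy shock into variable-level impulse responses can be sketched as follows (the VAR(1) coefficients, innovation covariance and loadings below are all hypothetical numbers, not estimates):

```python
import numpy as np

r = 3           # two latent factors plus the policy rate, ordered last
H = 12          # impulse-response horizon

# Assumed reduced-form VAR(1) coefficients and innovation covariance
B1 = np.array([[0.6, 0.1, 0.0],
               [0.0, 0.5, 0.1],
               [0.2, 0.1, 0.7]])
W = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, 0.1],
              [0.1, 0.1, 0.5]])

# Cholesky identification: the policy shock is the last orthogonal shock
C = np.linalg.cholesky(W)           # lower triangular
impact = C[:, -1]                   # impact response to the policy shock
irf_F = np.zeros((H, r))
for h in range(H):
    irf_F[h] = np.linalg.matrix_power(B1, h) @ impact

# Map factor responses into a variable x_i via its (hypothetical) loadings
lam_i = np.array([0.8, -0.3, 0.5])
irf_x = irf_F @ lam_i               # response of x_i at horizons 0..H-1
```

Because the policy rate is ordered last, its own impact response is nonzero while the slow-moving factors react only with a lag, which is exactly the recursive identification described above.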
Structural FAVAR - Monetary policy (FFR) shock
Impulse responses from constant parameter FAVAR (solid) and
time varying FAVAR (averages over all periods, dotted) for key
variables, taken from ELM (who developed the TV-FAVAR)
Structural FAVAR: Summary

Structural factor augmented VARs are a promising tool, as
they address several issues with smaller scale VARs, such as
omitted variable bias, the curse of dimensionality, the possibility of
non-fundamental shocks, etc.
FAVAR estimation and the computation of the responses to
structural shocks are rather simple, though managing a large
dataset is not so simple.
Some problems in VAR analysis remain also in FAVARs, in
particular robustness to alternative identification schemes,
parameter instability, nonlinearities, effects of unit roots and
cointegration, etc. They can be handled, see the extensions
mentioned in the Introduction, at the cost of additional model
complexity.
References
Amengual, D. and Watson, M.W. (2007), "Consistent
estimation of the number of dynamic factors in a large N and
T panel", Journal of Business and Economic Statistics, 25(1),
91-96
Bai, J. and S. Ng (2002). "Determining the number of factors
in approximate factor models". Econometrica, 70, 191-221.
Bai, J. and Ng, S. (2006). "Confidence Intervals for Diffusion
Index Forecasts and Inference for Factor-Augmented
Regressions," Econometrica, 74(4), 1133-1150.
Bai, J., and S. Ng (2008), “Large Dimensional Factor
Analysis,” Foundations and Trends in Econometrics, 3(2):
89-163.
Bauer, D. (1998), Some Asymptotic Theory for the Estimation
of Linear Systems Using Maximum Likelihood Methods or
Subspace Algorithms, Ph.D. Thesis.
Banerjee, A., Marcellino, M. and I. Masten (2006).
“Forecasting macroeconomic variables for the accession
countries”, in Artis, M., Banerjee, A. and Marcellino, M.
(eds.), The European Enlargement: Prospects and Challenges,
Cambridge: Cambridge University Press.
Banerjee, A., M. Marcellino and I. Masten (2014).
"Forecasting with Factor-augmented Error Correction
Models". International Journal of Forecasting, 30(3), 589-612.
Beck, G., Hubrich, K, and M. Marcellino (2016) “On the
importance of sectoral and regional shocks for price-setting”,
Journal of Applied Econometrics, 31(7), 1234–1253.
Bernanke, B.S., Boivin, J. and P. Eliasz (2005). "Measuring
the effects of monetary policy: a factor-augmented vector
autoregressive (FAVAR) approach", The Quarterly Journal of
Economics, 120(1), 387-422.
Carriero, A., Kapetanios, G. and Marcellino, M. (2011),
“Forecasting Large Datasets with Bayesian Reduced Rank
Multivariate Models”, Journal of Applied Econometrics, 26,
736-761.
Carriero, A., Kapetanios, G. and Marcellino, M. (2016),
"Structural Analysis with Classical and Bayesian Large
Reduced Rank VARs", Journal of Econometrics, forthcoming.
Doz, C., Giannone, D. and L. Reichlin (2011). "A two-step
estimator for large approximate dynamic factor models based
on Kalman filtering," Journal of Econometrics, 164(1),
188-205.
Doz, C., Giannone, D. and L. Reichlin (2012). "A
Quasi–Maximum Likelihood Approach for Large, Approximate
Dynamic Factor Models," The Review of Economics and
Statistics, 94(4), 1014-1024.
Eickmeier, S., W. Lemke, M. Marcellino, (2015). "Classical
time-varying FAVAR models - estimation, forecasting and
structural analysis", Journal of the Royal Statistical Society,
Series A, 178, 493–533.
Eickmeier, S. and Ziegler, C. (2008). "How successful are
dynamic factor models at forecasting output and inflation? A
meta-analytic approach", Journal of Forecasting, (27),
237-265.
Forni, M., Hallin, M., Lippi, M. and L. Reichlin (2000), “The
generalised factor model: identification and estimation”, The
Review of Economic and Statistics, 82, 540-554.
Forni, M., M. Hallin, M. Lippi, L. Reichlin (2005), "The
Generalized Dynamic Factor Model: One-sided estimation and
forecasting", Journal of the American Statistical Association,
100, 830-840.
Hallin, M., and Liška, R., (2007), “The Generalized Dynamic
Factor Model: Determining the Number of Factors,” Journal
of the American Statistical Association, 102, 603-617
Hepenstrick, C. and M. Marcellino (2016), "Forecasting with
Large Unbalanced Datasets: The Mixed Frequency Three-Pass
Regression Filter", SNB WP 2016/04.
Kapetanios, G. (2010), "A Testing Procedure for Determining
the Number of Factors in Approximate Factor Models With
Large Datasets". Journal of Business and Economic Statistics,
28(3), 397-409.
Kapetanios, G., Marcellino, M. (2009). "A parametric
estimation method for dynamic factor models of large
dimensions". Journal of Time Series Analysis 30, 208-238.
Kapetanios, G., Khalaf, L. and M. Marcellino (2016) “Factor
based identification-robust inference in IV regressions”,
Journal of Applied Econometrics, 31(5), 821–842.
Kuzin, V., Marcellino, M. and C. Schumacher (2013) “Pooling
versus model selection for nowcasting GDP with many
predictors: Empirical evidence for six industrialized countries”,
Journal of Applied Econometrics, 28(3), 392-411.
Marcellino, M. and C. Schumacher (2010) “Factor-MIDAS for
now- and forecasting with ragged-edge data: A model
comparison for German GDP”, Oxford Bulletin of Economics
and Statistics, 72, 518-550.
Marcellino, M., J.H. Stock and M.W. Watson (2003),
“Macroeconomic forecasting in the Euro area: country specific
versus euro wide information”, European Economic Review,
47, 1-18.
Marcellino, M., J. Stock and M.W. Watson, (2006), “A
Comparison of Direct and Iterated AR Methods for Forecasting
Macroeconomic Series h-Steps Ahead”, Journal of
Econometrics, 135, 499-526.
Onatski, A. (2006). "Asymptotic Distribution of the Principal
Components Estimator of Large Factor Models when Factors
are Relatively Weak". Mimeo.
Stock, J.H and M.W. Watson (1989), “New indexes of
coincident and leading economic indicators.” In NBER
Macroeconomics Annual, 351–393, Blanchard, O. and S.
Fischer (eds). MIT Press, Cambridge, MA.
Stock, J.H and M.W. Watson (2002a), “Forecasting using
Principal Components from a Large Number of Predictors”,
Journal of the American Statistical Association, 97, 1167-1179.
Stock, J. H. and Watson, M. W. (2002b), “Macroeconomic
Forecasting Using Diffusion Indexes,” Journal of Business and
Economic Statistics, 20(2), 147-162.
Stock, J. H. and Watson, M. W. (2006), “Forecasting with
Many Predictors," Handbook of Forecasting, North Holland.
Stock, J. H. and Watson, M. W. (2011), Dynamic Factor
Models, in Clements, M.P. and Hendry, D.F. (eds), Oxford
Handbook of Forecasting, Oxford: Oxford University Press.
Stock, J.H. and Watson, M. W. (2015), “Factor Models for
Macroeconomics," in J. B. Taylor and H. Uhlig (eds),
Handbook of Macroeconomics, Vol. 2, North Holland.
Appendix A: Details on estimation of factor models
The FHLR approach - DPCA
The FHLR estimation procedure (assuming q known) is based
on the so-called Dynamic Principal Components (DPC) and
can be summarized as follows:
- Estimate the spectral density matrix of X_t by
periodogram-smoothing:

Σ^T(θ_h) = ∑_{k=−M}^{M} Γ^T_k ω_k e^{−ikθ_h},  θ_h = 2πh/(2M + 1), h = 0, ..., 2M,

where M is the window width, ω_k are kernel weights and Γ^T_k
is an estimator of E[(X_t − X̄)(X_{t−k} − X̄)′].
- Calculate the first q eigenvectors p_j^T(θ_h) of Σ^T(θ_h),
j = 1, ..., q, for h = 0, ..., 2M.
The FHLR approach - DPCA
- Define p_j^T(L) as

p_j^T(L) = ∑_{k=−M}^{M} p_{j,k}^T L^k,
p_{j,k}^T = (1/(2M + 1)) ∑_{h=0}^{2M} p_j^T(θ_h) e^{ikθ_h},  k = −M, ..., M.

- p_j^T(L) x_t, j = 1, ..., q, are the first q dynamic principal
components of x_t.
- Regress x_t on present, past, and future values of the dynamic
principal components. The fitted value is the estimated common
component of x_t, χ̂_t.
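The steps above can be sketched in a few lines of numpy. This is a minimal illustration rather than FHLR's actual implementation: the Bartlett kernel, the alignment of the filtered sample, and the function name `dynamic_pc` are my own choices.

```python
import numpy as np

def dynamic_pc(x, q, M):
    """Minimal sketch of FHLR dynamic principal components.

    x: (T, N) array, assumed mean zero; q: number of dynamic
    factors; M: window width. Returns the (T - 2M, q) dynamic
    principal components p_j(L)'x_t for t = M, ..., T - M - 1.
    """
    T, N = x.shape
    H = 2 * M + 1                               # frequencies theta_h
    theta = 2 * np.pi * np.arange(H) / H

    # Sample autocovariances Gamma_k, k = 0, ..., M, Bartlett weights
    gamma = [(x[k:].T @ x[:T - k]) / T for k in range(M + 1)]
    w = 1.0 - np.arange(M + 1) / (M + 1)

    # Smoothed spectral density Sigma(theta_h) and its leading q
    # eigenvectors; imposing p(2pi - theta) = conj(p(theta)) keeps
    # the inverse-transformed filter real (up to eigenvector phase).
    p = np.empty((H, N, q), dtype=complex)
    for h in range(M + 1):
        S = w[0] * gamma[0].astype(complex)
        for k in range(1, M + 1):
            S += w[k] * (gamma[k] * np.exp(-1j * k * theta[h]) +
                         gamma[k].T * np.exp(1j * k * theta[h]))
        _, vecs = np.linalg.eigh(S)             # ascending eigenvalues
        p[h] = vecs[:, :-q - 1:-1]              # leading q eigenvectors
        if h > 0:
            p[H - h] = np.conj(p[h])

    # Filter coefficients p_{j,k} = (1/H) sum_h p_j(theta_h) e^{ik theta_h}
    ks = np.arange(-M, M + 1)
    filt = np.real(np.array(
        [np.mean(p * np.exp(1j * k * theta)[:, None, None], axis=0)
         for k in ks]))                         # (2M+1, N, q)

    # Two-sided filtering: dpc_t = sum_k p_k' x_{t-k}
    dpc = np.zeros((T - 2 * M, q))
    for i, k in enumerate(ks):
        dpc += x[M - k: T - M - k] @ filt[i]
    return dpc
```

The final regression of x_t on present, past and future values of the returned components (plain OLS) then yields the estimated common component.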
Parametric estimation - Subspace algorithms (SSS)
The model X_t^f = OK X_t^p + E E_t^f involves infinite-dimensional
vectors. In practice, use truncated versions,
X_{s,t}^f = (X_t′, X_{t+1}′, ..., X_{t+s−1}′)′ and
X_{p,t}^p = (X_{t−1}′, X_{t−2}′, ..., X_{t−p}′)′. Then, regress X_{s,t}^f
on X_{p,t}^p to obtain F̂, an estimate of F = OK, and apply a singular
value decomposition to Γ̂_f^{−1/2} F̂ Γ̂_p^{1/2}, where Γ̂_f and Γ̂_p are
the sample covariances of X_{s,t}^f and X_{p,t}^p respectively. The
singular values measure the importance of the corresponding
directions in F̂. The estimate of K is then given by

K̂ = Ŝ_m^{1/2} V̂_m′ Γ̂_p^{−1/2},

where Û Ŝ V̂′ represents the singular value decomposition of
Γ̂_f^{−1/2} F̂ Γ̂_p^{1/2}, Ŝ contains the singular values in decreasing
order, Ŝ_m denotes the heading m × m submatrix of Ŝ, and V̂_m
denotes the matrix containing the first m columns of V̂.
Therefore, the SSS factor estimates are f̂_t = K̂ X_t^p.
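A compact numerical sketch of these steps follows. The Γ̂_f^{−1/2} F̂ Γ̂_p^{1/2} weighting is the standard canonical-correlation-type choice, and the function names and default truncation depths are illustrative assumptions, not part of the original algorithm.

```python
import numpy as np

def _mat_power(a, power):
    # Power of a symmetric positive definite matrix via eigendecomposition
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals ** power) @ vecs.T

def sss_factors(x, m, s=1, p=2):
    """Sketch of the SSS subspace factor estimator.

    x: (T, N) array; m: number of factors; s, p: future/past
    stacking depths. Returns the (T - s - p + 1, m) factor
    estimates f_t = K X_t^p for t = p, ..., T - s.
    """
    T, N = x.shape
    # Truncated future and past stacks, aligned at time t
    Xf = np.hstack([x[p + i: T - s + 1 + i] for i in range(s)])
    Xp = np.hstack([x[p - 1 - i: T - s - i] for i in range(p)])
    n = Xf.shape[0]

    # Regress the future on the past: Fhat estimates F = OK
    Gf = Xf.T @ Xf / n
    Gp = Xp.T @ Xp / n
    Fhat = (Xf.T @ Xp / n) @ np.linalg.inv(Gp)

    # SVD of the weighted matrix, then K = S_m^{1/2} V_m' Gp^{-1/2}
    W = _mat_power(Gf, -0.5) @ Fhat @ _mat_power(Gp, 0.5)
    _, svals, Vt = np.linalg.svd(W)
    Khat = np.diag(svals[:m] ** 0.5) @ Vt[:m] @ _mat_power(Gp, -0.5)
    return Xp @ Khat.T
```

Because f̂_t is built only from past observations, it tracks the predictable part of the factor, consistent with the remark below that consistency requires f_t to depend on u_{t−1}.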
Parametric estimation - SSS, T asymptotics
p must increase at a rate greater than ln(T)^α, for some
α > 1, but Np at a rate lower than T^{1/3}. N is fixed for the
moment. A range of α between 1.05 and 1.5 provides a
satisfactory performance.
s is required to satisfy sN > m. As N is large this restriction
is not binding; s = 1 is enough.
If we define f̂_t = K̂ X_t^p, then f̂_t converges to (the space
spanned by) f_t. The speed of convergence is between T^{1/2}
and T^{1/3} because p grows. Note that consistency is possible
because f_t depends on u_{t−1}. If f_t depends on u_t, f̂_t
converges to A f_{t−1}.
The asymptotic distribution of √T (vec(f̂) − vec(H_m f)),
with f = (f_1, ..., f_T)′, is N(0, V_f).
Once estimates of the factors are available, estimates of the
other parameters (including the factor loadings) can be
obtained by OLS. Bauer (1998) proves √T consistency and
asymptotic normality.
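The second-step OLS in the last point is just a regression of each series on the estimated factors; a minimal sketch (the function name is mine):

```python
import numpy as np

def loadings_ols(x, fhat):
    """OLS estimate of the factor loadings given (estimated) factors:
    regress each column of x on fhat.
    x: (T, N), fhat: (T, m); returns the (N, m) loading matrix."""
    return np.linalg.solve(fhat.T @ fhat, fhat.T @ x).T
```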
Parametric estimation - SSS, T and N asymptotics
If Np is o(T^{1/3}) and p is O(T^{1/z}), z > 3, then when N and T
diverge, f̂_t = K̂ X_t^p converges to (the space spanned by) f_t.
The speed of convergence is (T/Np)^{1/2}. The intuition is that
the estimator of F = OK in X_{s,t}^f = F X_{p,t}^p + E E_t^f
remains consistent if Np = o(T^{1/3}).
With a proper standardization, f̂_t remains asymptotically
normal.
The choice of the number of factors can be done by information
criteria, similar to those by Bai and Ng (2002) for PCA but
with a different penalty function.
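As an illustration of such criteria in the PCA case, the following sketches Bai and Ng's (2002) ICp2 (recall that the SSS criteria use a different penalty); the function name and factor normalization are my choices.

```python
import numpy as np

def ic_p2(x, kmax):
    """Sketch of Bai-Ng (2002) ICp2 selection of the number of
    static factors, with x a (T, N) standardized panel.
    Returns (khat, PCA factor estimates)."""
    T, N = x.shape
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    # V(k): mean squared residual of the k-factor approximation,
    # i.e. the sum of the discarded squared singular values / (NT)
    tot = np.sum(s ** 2)
    ssr = np.concatenate(([tot], tot - np.cumsum(s ** 2)))
    V = ssr[:kmax + 1] / (N * T)
    penalty = np.arange(kmax + 1) * (N + T) / (N * T) * np.log(min(N, T))
    khat = int(np.argmin(np.log(V) + penalty))
    factors = np.sqrt(T) * U[:, :khat]          # PCA factor estimates
    return khat, factors
```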
Appendix B: Details on Monte Carlo comparison
Factor estimation methods - MC Comparison
First set of experiments: a single VARMA factor with different
specifications:
1 a1 = 0.2, b1 = 0.4;
2 a1 = 0.7, b1 = 0.2;
3 a1 = 0.3, a2 = 0.1, b1 = 0.15, b2 = 0.15;
4 a1 = 0.5, a2 = 0.3, b1 = 0.2, b2 = 0.2;
5 a1 = 0.2, b1 = 0.4;
6 a1 = 0.7, b1 = 0.2;
7 a1 = 0.3, a2 = 0.1, b1 = 0.15, b2 = 0.15;
8 a1 = 0.5, a2 = 0.3, b1 = 0.2, b2 = 0.2.
9 As 1 but C = C0 + C1 L.
10 As 1 but one factor assumed instead of p + q.
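For concreteness, experiment 1 can be reproduced roughly as follows; the unit-variance shocks, iid N(0,1) idiosyncratic errors, and the one-factor PCA evaluation step are my assumptions where the slides do not spell out the details.

```python
import numpy as np

def exp1_panel(T=50, N=50, a1=0.2, b1=0.4, seed=0):
    """Sketch of MC experiment 1: a single ARMA(1,1) factor
    f_t = a1 f_{t-1} + u_t + b1 u_{t-1}, loadings C ~ N(0,1) and
    iid N(0,1) idiosyncratic errors (assumed).
    Returns the panel x and the true common component chi."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T + 1)
    f = np.zeros(T)
    f[0] = u[1] + b1 * u[0]
    for t in range(1, T):
        f[t] = a1 * f[t - 1] + u[t + 1] + b1 * u[t]
    C = rng.standard_normal(N)
    chi = np.outer(f, C)
    x = chi + rng.standard_normal((T, N))
    return x, chi

# Evaluation in the KM style: correlation between the true and the
# PCA-estimated (rank-one) common component, averaged over series
x, chi = exp1_panel()
U, s, Vt = np.linalg.svd(x, full_matrices=False)
chihat = s[0] * np.outer(U[:, 0], Vt[0])
avg_corr = np.mean([np.corrcoef(chi[:, i], chihat[:, i])[0, 1]
                    for i in range(x.shape[1])])
```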
Factor estimation methods - MC Comparison
Second group of experiments: as in 1-10 but with each
idiosyncratic error being an AR(1) process with coefficient 0.2
(exp. 11-20). Experiments with cross correlation yield a similar
ranking of methods.
Third group of experiments: 3-dimensional VAR(1) for the
factors, with a diagonal matrix with elements equal to 0.5 (exp.
21).
Fourth group of experiments: as 1-21 but the C matrix is
U(0,1) rather than N(0,1).
Fifth group of experiments: as 1-21 but using s = 1 instead of
s = m.
Factor estimation methods - MC Comparison
KM compute the correlation between the true and estimated
common component, and the spectral coherency at selected
frequencies. They also report the rejection probabilities of an
LM(4) test for no correlation in the idiosyncratic component.
The values are averages over all series and over all replications.
Detailed results are in the paper: for exp. 1-21, groups 1-3, see
Tables 1-7; for exp. 1-21, group 4, see Table 8 for (N=50,
T=50); for exp. 1-21, group 5, see Tables 9-11.
Factor estimation methods - MC Comparison, N=T=50
Single ARMA factor (exp. 1-8): looking at correlations, SSS
clearly outperforms PCA and DPCA. Gains wrt PCA are rather
limited, 5-10%, but systematic. Gains wrt DPCA are larger,
about 20%. There is little evidence of correlation in the
idiosyncratic component, but the rejection probabilities of the
LM(4) test are systematically larger for DPCA.
Serially correlated idiosyncratic errors (exp. 11-18): no major
changes. The low rejection rate of the LM(4) test is due to its
low power for T = 50.
Dynamic effect of the factor (exp. 9 and 19): serious
deterioration of SSS, a drop of about 25% in the correlation
values. DPCA improves but is still beaten by PCA. The choice
of s matters: for s = 1, SSS becomes comparable with PCA
(Table 9).
Factor estimation methods - MC Comparison, N=T=50
Misspecified number of factors (exp. 10 and 20): no major
changes, actually a slight increase in correlation, due to
reduced estimation uncertainty.
Three autoregressive factors (exp. 21): the PCA-DPCA gap
shrinks, with higher correlation values than for a single factor.
SSS deteriorates substantially, but improves and becomes
comparable to PCA when s = 1 (Table 11).
Full MLE gives very similar and only very slightly better
results than PCA, and is clearly dominated by SSS.
Factor estimation methods - MC Comparison, other results
Larger temporal dimension (N=50, T=100, 500). The correlation
between the true and estimated common component increases
monotonically for all methods; the ranking of methods across
experiments is not affected. The performance of the LM tests for
serial correlation gets closer and closer to the theoretical one
(Tab 2, 3).
Larger cross-sectional dimension (N=100, 200, T=50). SSS is
not affected (important, as N > T); PCA and DPCA improve
systematically, but SSS still yields the highest correlation in all
cases, except exp. 9, 19, 21 (Tab 4, 7).
Larger temporal and cross-sectional dimension (N=100, T=100
or N=100, T=500). The performance of all methods improves,
more so for PCA and DPCA, which benefit more from the larger
value of N. SSS is in general the best in terms of correlation
(Tab 5, 6).
Uniform loading matrix: no major changes (Tab 8).
Choice of s: PCA and SSS perform very similarly (Tab 9-11).