Lag selection, forecasting and ARDL models
Chikumbe, SE
21st June, 2023
Econometrics II
Department of Economics, Kwame Nkrumah University
1 / 35
Recap of last week
• OLS assumptions
• Zero conditional mean
• No serial correlation
• Static models → yt = β0 + β1xt + ut
• Distributed Lag (DL) models → yt = β0 + β1xt + β2xt−1 + ut
• Autoregressive (AR) models → yt = β0 + β1yt−1 + ut
2 / 35
Roadmap for this week
• AR and DL model lag selection
• Autoregressive Distributed Lag (ARDL) model
• One period ahead forecasting and forecast errors
3 / 35
Lag selection
4 / 35
Putting the p in AR(p)
• We’ve looked at AR(1), AR(2), AR(3) etc.
• How do we know the correct order of an AR model?
• Why is it important to choose the correct order?
• The order, p, directly influences the zero conditional mean assumption
5 / 35
AR order and the zero conditional mean assumption
• We implement an AR(2) model: yt = β0 + β1yt−1 + β2yt−2 + ut
• We find β̂1 and β̂2 to be highly statistically significant
• Zero conditional mean: E(ut | yt−1, yt−2) = 0
• Now we implement an AR(3) model: yt = β0 + β1yt−1 + β2yt−2 + β3yt−3 + ut
• We find β̂3 to also be highly statistically significant → in the AR(2) model, E(ut | yt−1, yt−2) ̸= 0
• Estimating the AR(3) instead restores the assumption: E(ut | yt−1, yt−2, yt−3) = 0
6 / 35
Choosing the p in AR(p)
• How do we know the correct order of an AR model?
• Three approaches
1. F-statistic approach
2. Bayes Information Criterion (BIC)
3. Akaike Information Criterion (AIC)
7 / 35
F-statistic approach
• Start with a model with as many lags as possible
• Test the significance of the last lag
• Do this until the last lag is statistically significant at the 5% level
• Advantage: intuitive
• Drawback: some of the time, it will suggest too many lags
• Suppose the true AR order is 5, so that the sixth coefficient is zero
• A t-test on the sixth lag will incorrectly reject the null 5% of the time
• → when the true value of p is five, this method will estimate p to be six about 5% of the time
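As an illustration, the sequential t-test procedure can be sketched on simulated data with plain numpy OLS. This is a sketch, not the lecture's own code: the function names and the AR(1) data-generating process are assumptions made for the example.

```python
import numpy as np

def fit_ar(y, p):
    """OLS fit of an AR(p); returns coefficient estimates and t-statistics."""
    T = len(y) - p
    X = np.column_stack([np.ones(T)] + [y[p - j:len(y) - j] for j in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (T - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

def select_p(y, p_max):
    """General-to-specific: drop the last lag while it is insignificant at the 5% level."""
    for p in range(p_max, 0, -1):
        _, tstats = fit_ar(y, p)
        if abs(tstats[-1]) > 1.96:  # last lag significant -> stop here
            return p
    return 0

rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()  # simulated AR(1), coefficient 0.5

p_hat = select_p(y, p_max=6)
```

With a true AR(1), the procedure usually stops at p = 1, but as noted above it lands on a larger p roughly 5% of the time.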
8 / 35
Autoregressive Model of %∆ ZAR/USD

Dependent Variable: %∆ ZAR/USD

                   AR(1)      AR(2)      AR(3)
%∆ ZAR/USD t−1     0.307***   0.343***   0.344***
                   (0.051)    (0.054)    (0.054)
%∆ ZAR/USD t−2                −0.109**   −0.116**
                              (0.054)    (0.057)
%∆ ZAR/USD t−3                           0.023
                                         (0.055)
Intercept          0.004**    0.005**    0.004**
                   (0.002)    (0.002)    (0.002)
R²                 0.094      0.106      0.105
9 / 35
Bayes Information Criterion (BIC)
BIC(p) = ln(SSR(p)/T) + (p + 1)·ln(T)/T

• SSR(p) → the sum of squared residuals of the AR(p) model
• SSR = ∑(ûi)² → how much variation is not explained by our regression
• (p + 1) → number of variables used, plus 1 for the intercept
• ln(T)/T → the log of the number of observations, divided by the number of observations
10 / 35
Bayes Information Criterion (BIC)
BIC(p) = ln(SSR(p)/T) + (p + 1)·ln(T)/T

• SSR decreases as we add more AR terms
• (p + 1)·ln(T)/T increases as we add more AR terms
• BIC trades off these two forces
• We are interested in finding p̂, the value of p which minimizes BIC (p ) among possible
choices of p
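This minimization can be sketched in a few lines of numpy. The sketch below is illustrative only: the AR(2) data-generating process and the common-sample convention (dropping the same number of initial observations for every candidate p, so that the SSRs are comparable) are assumptions of the example.

```python
import numpy as np

def bic_ar(y, p, p_max):
    """BIC(p) = ln(SSR(p)/T) + (p + 1)*ln(T)/T, on a common sample of size T."""
    T = len(y) - p_max  # drop p_max initial obs so all candidate models share a sample
    target = y[p_max:]
    X = np.column_stack([np.ones(T)] + [y[p_max - j:len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    ssr = np.sum((target - X @ beta) ** 2)
    return np.log(ssr / T) + (p + 1) * np.log(T) / T

rng = np.random.default_rng(1)
y = np.zeros(600)
for t in range(2, 600):
    y[t] = 0.4 * y[t - 1] + 0.25 * y[t - 2] + rng.standard_normal()  # true order is 2

# p_hat minimizes BIC(p) over the candidate orders
p_hat = min(range(1, 7), key=lambda p: bic_ar(y, p, p_max=6))
```

For this simulated AR(2) process, the penalty stops the criterion from rewarding the small SSR reductions that extra, irrelevant lags deliver.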
11 / 35
Bayes Information Criterion (BIC)
12 / 35
Akaike Information Criterion (AIC)
AIC(p) = ln(SSR(p)/T) + (p + 1)·2/T

• ln(T) is replaced by 2 in the second term → the penalty term is smaller in the AIC than in the BIC
• In a dataset with 1000 observations
• ln(1000) = 6.9 which is more than 3 times larger than 2
• → a smaller decrease in the SSR is needed to justify including another AR term
• Even in large samples, the AIC will overestimate p with nonzero probability
• Still useful if you are concerned that the BIC suggests too few lags
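The size of the two penalties can be checked directly; a quick numerical illustration of the T = 1000 case discussed above:

```python
import math

T = 1000
bic_penalty_per_lag = math.log(T) / T  # ln(1000)/1000, approx 0.0069
aic_penalty_per_lag = 2 / T            # 2/1000 = 0.002

# ratio of the per-lag penalties: ln(1000)/2, which exceeds 3
ratio = bic_penalty_per_lag / aic_penalty_per_lag
```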
13 / 35
Comparing the 3 approaches
Dependent Variable: %∆ ZAR/USD

                   AR(1)      AR(2)      AR(3)
%∆ ZAR/USD t−1     0.307***   0.343***   0.344***
                   (0.051)    (0.054)    (0.054)
%∆ ZAR/USD t−2                −0.109**   −0.116**
                              (0.054)    (0.057)
%∆ ZAR/USD t−3                           0.023
                                         (0.055)
Intercept          0.004**    0.005**    0.004**
                   (0.002)    (0.002)    (0.002)
R²                 0.094      0.106      0.105
AIC                −6.774     −6.782     −6.744
BIC                −6.752     −6.749     −6.730

• F-statistic approach → AR(2)
• AIC → AR(2)
• BIC → AR(1)
14 / 35
Autoregressive Distributed Lag (ARDL)
models
15 / 35
Types of time series models
• Static models
• yt = β0 + β1xt + ut
• Distributed lag models (DL)
• yt = β0 + β1xt + β2xt−1 + ut
• Autoregressive models (AR)
• yt = β0 + β1yt−1 + ut
• Autoregressive distributed lag models (ARDL)
• yt = β0 + β1yt−1 + β2xt−1 + ut
16 / 35
ARDL models
yt = β0 + β1yt−1 + β2yt−2 + σ1xt−1 + σ2xt−2 + ut
• ARDL(p, q) where p = order of AR terms and q = order of DL terms
• The example above is therefore an ARDL(2,2) model
• Combines the benefits of AR and DL models
• Zero conditional mean assumption
• → at time t, the error term is independent of every explanatory variable (y & x), in every
period
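An ARDL(2,2) can be estimated by ordinary OLS once the lagged regressors are stacked into a design matrix. The sketch below is an assumption-laden illustration, not the lecture's code: the data are simulated, and the coefficient values (b1 = 0.3, b2 = 0.1, s1 = 0.5, s2 = −0.2) are chosen for the example.

```python
import numpy as np

def fit_ardl(y, x, p=2, q=2):
    """OLS estimates of y_t = b0 + sum_i b_i y_{t-i} + sum_j s_j x_{t-j} + u_t."""
    m = max(p, q)
    T = len(y) - m
    cols = [np.ones(T)]
    cols += [y[m - i:len(y) - i] for i in range(1, p + 1)]  # AR terms
    cols += [x[m - j:len(x) - j] for j in range(1, q + 1)]  # DL terms
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return beta  # ordered as [b0, b1..bp, s1..sq]

rng = np.random.default_rng(2)
n = 800
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):  # simulated ARDL(2,2)
    y[t] = 0.3 * y[t - 1] + 0.1 * y[t - 2] + 0.5 * x[t - 1] - 0.2 * x[t - 2] \
           + rng.standard_normal()

beta = fit_ardl(y, x)
```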
17 / 35
ARDL models
yt = β0 + β1yt−1 + β2yt−2 + σ1xt−1 + σ2xt−2 + ut

• A note on ARDL convention
• When using ARDL models, the convention is typically to exclude contemporaneous DL terms (no xt)
• → this does not allow for contemporaneous effects
• Why? In time series the focus is often on forecasting, and values at time t are not yet known when the forecast is made
• Including xt is, however, perfectly fine
18 / 35
ARDL example: ARDL Model of the %∆ ZAR/USD
Dependent Variable: %∆ ZAR/USD

                   ARDL(1,1)   ARDL(2,2)
%∆ ZAR/USD t−1     0.211***    0.221***
                   (0.074)     (0.081)
%∆ ZAR/USD t−2                 −0.132*
                               (0.079)
%∆ ALSI t−1        −0.253***   −0.261***
                   (0.064)     (0.065)
%∆ ALSI t−2                    −0.072
                               (0.070)
Intercept          0.007**     0.008***
                   (0.003)     (0.003)
R²                 0.155       0.175
19 / 35
ARDL models and lag selection
BIC(k) = ln(SSR(k)/T) + (k + 1)·ln(T)/T        AIC(k) = ln(SSR(k)/T) + (k + 1)·2/T

• k = p + q
• This can result in many different models needing to be tested
• k = 4, with p = 1 and q = 3
• k = 4, with p = 2 and q = 2
• k = 4, with p = 3 and q = 1 etc.
• Convention (but not a requirement!): set p = q
• If you do this however, think about implications for zero conditional mean
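The "many different models" point can be made concrete with a small grid search over (p, q) pairs that minimizes BIC(k). This sketch rests on assumptions for illustration: the data are simulated from an ARDL(1,1), and a common sample (dropping the same number of initial observations for every candidate pair) is used so the SSRs are comparable.

```python
import numpy as np

def ardl_bic(y, x, p, q, m_max):
    """BIC(k) with k = p + q, on a common sample that drops m_max initial obs."""
    T = len(y) - m_max
    cols = [np.ones(T)]
    cols += [y[m_max - i:len(y) - i] for i in range(1, p + 1)]  # AR terms
    cols += [x[m_max - j:len(x) - j] for j in range(1, q + 1)]  # DL terms
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y[m_max:], rcond=None)
    ssr = np.sum((y[m_max:] - X @ beta) ** 2)
    k = p + q
    return np.log(ssr / T) + (k + 1) * np.log(T) / T

rng = np.random.default_rng(3)
n = 800
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):  # true process is ARDL(1,1)
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

pairs = [(p, q) for p in range(1, 4) for q in range(1, 4)]  # 9 candidate models
p_hat, q_hat = min(pairs, key=lambda pq: ardl_bic(y, x, *pq, m_max=3))
```

Note that even this small grid requires nine estimations; setting p = q shrinks the search but, as the slide warns, risks leaving relevant lags out.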
20 / 35
ARDL lag selection example: ARDL Model of the %∆ ZAR/USD
                   ARDL(1,1)   ARDL(2,2)
%∆ ZAR/USD t−1     0.211***    0.221***
                   (0.074)     (0.081)
%∆ ZAR/USD t−2                 −0.132*
                               (0.079)
%∆ ALSI t−1        −0.253***   −0.261***
                   (0.064)     (0.065)
%∆ ALSI t−2                    −0.072
                               (0.070)
Intercept          0.007**     0.008***
                   (0.003)     (0.003)
R²                 0.155       0.175
AIC                −6.681      −6.676
BIC                −6.624      −6.582
21 / 35
Forecasting
22 / 35
Forecasting is a core part of time series econometrics
• “What is your best forecast of next month’s unemployment rate?”
• Critical for policy making, planning, investment decisions etc.
• Given dependence in time series data, AR models are the workhorse model for
forecasting
23 / 35
Forecasting is a core part of time series econometrics
Figure: National Treasury forecasts of active vs passive debt stabilization policies
24 / 35
Forecasting is a core part of time series econometrics
• Contrast this with the cross sectional domain
• Example: predicting test scores in a cross section vs predicting test scores in a time series
• Time series models can be used to forecast, even if none of the coefficients have a
causal interpretation
25 / 35
Forecast Error
• We want to forecast yt+1 based on an AR(1) model
• yt+1 = β0 + β1yt + ut+1
• The true values of β0 and β1 are unknown
• → we use the OLS estimates β̂0 and β̂1 from historical data in their place
• We then forecast yt+1 conditional on information available at time t
• ŷt+1|t = β̂0 + β̂1yt
• Forecast error: yt +1 − ŷt +1|t
• The difference between the realized value of yt +1 and the forecasted value
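The two-step recipe above (estimate on history, then plug in the last observation) can be sketched with simulated data; everything here, including the AR(1) parameters, is an assumption of the example.

```python
import numpy as np

def ar1_forecast(y):
    """Fit y_t = b0 + b1*y_{t-1} by OLS, then forecast the next, unobserved value."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    b0, b1 = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return b0 + b1 * y[-1]  # y-hat_{T+1|T}, conditional on the last observation

rng = np.random.default_rng(4)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 1.0 + 0.5 * y[t - 1] + rng.standard_normal()  # unconditional mean = 2

y_hat = ar1_forecast(y)
```

Once the true yT+1 is eventually observed, the forecast error is simply that realized value minus y_hat.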
26 / 35
A forecast is not a predicted value
• From lecture 2: the predicted value of y is ŷ
• Formally, ŷi = β̂0 + β̂1xi
• Predicted values are calculated for observations in the sample used to estimate the
regression
• Forecasts are made for a date that exists outside of the sample
27 / 35
The forecast error is not an OLS residual
• From lecture 2: the residual of a regression is the difference between the actual value of y and
the predicted value ŷ, for observations in the sample
• Formally, ûi = yi − ŷi = yi − β̂0 − β̂1xi
• The forecast error is the difference between the future value of y, which is not
contained in the sample, and the forecast of that future value
• “Out-of-sample” versus “In-sample”
28 / 35
Forecasting with an AR(1) model
%∆ERt = 0.004 + 0.311 %∆ERt−1

• Our forecast equation becomes

%∆ERt+1|t = 0.004 + 0.311 %∆ERt

• %∆ERt = −0.023
• → %∆ERt+1|t = 0.004 + 0.311 × (−0.023) ≈ −0.003
29 / 35
Forecasting with an AR(1) model
→ %∆ERt+1|t = 0.004 + 0.311 × (−0.023) ≈ −0.003

• You are now told that %∆ERt+1 = −0.056
• Forecast error: %∆ERt+1 − %∆ERt+1|t = −0.056 − (−0.003) = −0.053
• Our forecast overpredicts the realized value by 0.053
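The slide's arithmetic can be reproduced directly (a quick check of the numbers above):

```python
b0, b1 = 0.004, 0.311          # estimated AR(1) coefficients from the slides
er_t = -0.023                  # last observed %-change in ZAR/USD
forecast = b0 + b1 * er_t      # one-period-ahead forecast, approx -0.003

realized = -0.056              # the value that is subsequently observed
forecast_error = realized - forecast  # approx -0.053
```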
30 / 35
Forecast uncertainty
• Forecast error can be divided into two parts
1. Uncertainty about the regression coefficients
2. Uncertainty about the future value of ut
• If there are few coefficients, (2) > (1)
• We’ll introduce the Root Mean Squared Forecast Error (RMSFE) that incorporates
both (1) and (2)
31 / 35
Root Mean Squared Forecast Error (RMSFE)
• Start with an ARDL(1,1): yt = β0 + β1yt−1 + β2xt−1 + ut
• Forecast: ŷt+1|t = β̂0 + β̂1yt + β̂2xt
• Forecast error: yt+1 − ŷt+1|t = ut+1 − [(β̂0 − β0) + (β̂1 − β1)yt + (β̂2 − β2)xt]

MSFE = E[(yt+1 − ŷt+1|t)²] = σu² + var[(β̂0 − β0) + (β̂1 − β1)yt + (β̂2 − β2)xt]

• The RMSFE is then simply √MSFE
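In practice the RMSFE is often estimated by pseudo-out-of-sample forecasting: repeatedly re-estimate the model on an expanding window, forecast one step ahead, and average the squared forecast errors. The sketch below assumes this approach (it is not the lecture's own method) and uses a simulated AR(1) for illustration.

```python
import numpy as np

def rmsfe_pseudo_oos(y, n_test):
    """Refit an AR(1) on an expanding window; RMSFE = sqrt of mean squared forecast error."""
    errors = []
    for s in range(len(y) - n_test, len(y)):
        train = y[:s]
        X = np.column_stack([np.ones(len(train) - 1), train[:-1]])
        b0, b1 = np.linalg.lstsq(X, train[1:], rcond=None)[0]
        errors.append(y[s] - (b0 + b1 * train[-1]))  # realized minus forecast
    return np.sqrt(np.mean(np.square(errors)))

rng = np.random.default_rng(6)
y = np.zeros(600)
for t in range(1, 600):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()  # var(u) = 1 by construction

rmsfe = rmsfe_pseudo_oos(y, n_test=100)
```

Because coefficient uncertainty is small here relative to var(ut), the estimate sits close to √var(ut) = 1, matching the approximation on the next slide.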
32 / 35
Root Mean Squared Forecast Error (RMSFE)
• Forecast error can be divided into two parts
1. Uncertainty about the regression coefficients
2. Uncertainty about the future value of ut
• If there are few coefficients, (2) > (1)
• We’ll introduce the Root Mean Squared Forecast Error (RMSFE) that incorporates
both (1) and (2)
• If (2) > (1), then the RMSFE ≈ √var(ut)
33 / 35
Forecast interval
• Similar in spirit to a confidence interval
• One major difference¹
• Confidence interval: coefficient ± 1.96 * standard error
• Confidence interval is justified by CLT and therefore holds for a wide range of
distributions of ut
• The forecast error contains the future value of the error, ut+1, which is not averaged away → the CLT does not apply, so a forecast interval of forecast ± 1.96 × RMSFE is only valid if ut is normally distributed
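Mechanically, a 95% forecast interval mirrors a confidence interval. The numbers below are assumptions for illustration: the forecast is taken from the earlier slides and the RMSFE value is hypothetical.

```python
forecast = -0.003  # one-period-ahead forecast from the earlier slides
rmsfe = 0.025      # hypothetical RMSFE estimate

# 95% forecast interval: forecast +/- 1.96 * RMSFE
# (valid only if u_t is approximately normally distributed)
lo = forecast - 1.96 * rmsfe
hi = forecast + 1.96 * rmsfe
```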
¹ If this sounds a bit tricky, revisit your Chapter 4 lectures with Safia to refresh your grasp of confidence intervals, hypothesis tests and the central limit theorem
34 / 35
Forecasting interest rates
Figure: Fan chart of interest rate forecast from the SARB
35 / 35