Statistics 730
Applied Time Series Analysis
Fall 2011
Professor Peter Bloomfield
email: [email protected]
http://www.stat.ncsu.edu/people/bloomfield/courses/st730/
Characteristics of Time Series
A time series is a collection of observations made at different
times on a given system.
For example:
Earnings per share of Johnson and Johnson stock (quarterly);
Global temperature anomalies from 1856 to 1997 (annual);
Investment returns on the New York Stock Exchange (daily).
Digression: Retrieving the Data Using R
jj = scan("http://www.stat.pitt.edu/stoffer/tsa2/data/jj.dat");
jj = ts(jj, frequency = 4, start = c(1960, 1));
plot(jj);
globtemp = scan("http://www.stat.pitt.edu/stoffer/tsa2/data/globtemp.dat");
globtemp = ts(globtemp, start = 1856);
plot(globtemp);
nyse = scan("http://www.stat.pitt.edu/stoffer/tsa2/data/nyse.dat");
nyse = ts(nyse);
plot(nyse);
Correlation
Time series data are almost always correlated with each other: they are autocorrelated.
We may want to exploit that correlation, or merely to cope
with it.
Exploiting Correlation: Forecasting
Suppose Yt is the t-th observation, and we observe Y0, Y1, . . . , Yn−1.
What can we say about Yn?
If we know the correlation structure, or more precisely the joint distribution, of Y0, Y1, . . . , Yn−1, Yn, then we can calculate the conditional distribution of Yn | Y0, Y1, . . . , Yn−1.
The conditional mean is the best forecast of Yn, and the conditional standard deviation is the root-mean-square forecast error. If the conditional distribution is normal, we can use them to make probability statements about Yn.
Coping with Correlation: Regression
Suppose instead that Yt is related to a covariate xt, and we
are interested in the regression of Yt on xt.
Because the Ys are correlated, we should not use Ordinary Least Squares to fit the regression.
If we knew the correlation structure, we would use Generalized Least Squares.
Usually we don't know it, so we must estimate it, typically using a parsimonious parametric model.
Time Domain and Frequency Domain
Methods that focus on how a time series evolves from one
time to the next are called time domain methods.
Some graphs (e.g. residuals of global temperatures from a
quadratic trend) suggest the possibility of waves in the data:
l = lm(globtemp ~ time(globtemp) + I(time(globtemp)^2));
plot(globtemp - fitted(l));
Since a wave is described in terms of its period, or alternatively its frequency, methods that measure the waves in a
time series are called frequency domain methods.
Statistical Models
The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions for sample data. . .
We model a time series as a collection of random variables:
x1, x2, x3, . . . , or more generally {xt, t ∈ T}.
Often the phenomenon being observed evolves in continuous
time, but our observations are always discrete samples.
If the sampling times t1, t2, . . . are equally spaced, their separation Δt = tn − tn−1 is the sampling interval and 1/Δt is the sampling rate (samples per unit time).
Choice of sampling rate affects all aspects of data collection,
analysis, and interpretation.
Example: White Noise
Uncorrelated random variables wt with mean 0 and variance σw², written wt ∼ wn(0, σw²).
Why white noise?
By analogy with white light: in the frequency domain, all
frequencies are present with the same strength.
If in addition the ws are independent and identically distributed, we write wt ∼ iid(0, σw²).
Iid White Noise
# t-distributed with 3 degrees of freedom:
w = ts(rt(500, df = 3));
plot(w);
If in addition the ws are normally distributed, we write wt ∼ iid N(0, σw²).
Iid Normal White Noise
w = ts(rnorm(500));
plot(w);
Example: Moving Average
Many observed series are smoother than white noise.
Possible model:
vt = (1/3)(wt−1 + wt + wt+1)
Moving Average
w = ts(rnorm(500));
v = filter(w, sides = 2, rep(1, 3) / 3);
plot(v);
Averaging attenuates the faster oscillations, leaving the slower
oscillations more apparent.
More generally, a weighted average of 2, 3, or more noise
terms.
Example: Autoregression
Recursive model:
xt = xt−1 − 0.9xt−2 + wt,   t = 1, 2, . . . , 500
Like a regression equation, but the RHS contains past (lagged)
LHS variables, hence autoregression.
Shows many different types of behavior for different choices
of coefficients.
Autoregression
w = ts(rnorm(500));
v = filter(w, filter = c(1, -0.9), method = "recursive");
plot(v);
Example: Random Walk
One model for trend; recursive definition:
xt = δ + xt−1 + wt
Explicitly:
xt = δt + Σ_{j=1}^{t} wj
δ is the drift (per unit time).
Random Walk
# drift delta = 0.2 per sample:
x = ts(cumsum(rnorm(500) + 0.2));
plot(x);
The white noise we build it from could be non-normal.
Non-Normal Random Walk
# t-distributed increments, 1 degree of freedom, no drift:
x = ts(cumsum(rt(500, df = 1)));
plot(x);
Example: Signal in Noise
Sine-wave signal:
xt = 2 cos(2πt/50 + 0.6π) + wt,   t = 1, 2, . . . , 500
More generally, the wave term could be
A cos(2πωt + φ),
where:
A is amplitude;
ω is frequency (in cycles per unit time);
φ is phase (in this case, in radians).
Cosine wave signal plus noise
w = ts(rnorm(500));
x = 2 * cos(2 * pi * time(w) / 50 + 0.6 * pi) + w;
plot(x);
Means
Recall: We model a time series as a collection of random variables: x1, x2, x3, . . . , or more generally {xt, t ∈ T}.
The mean function is
μx,t = E(xt) = ∫ x ft(x) dx
where the expectation is for the given t, across all the possible values of xt. Here ft(·) is the pdf of xt.
Example: Moving Average
wt is white noise, with E (wt) = 0 for all t
the moving average is
vt = (1/3)(wt−1 + wt + wt+1)
so
μv,t = E(vt) = (1/3)[E(wt−1) + E(wt) + E(wt+1)] = 0.
Moving Average Model with Mean Function
Example: Random Walk with Drift
The random walk with drift is
xt = δt + Σ_{j=1}^{t} wj
so
μx,t = E(xt) = δt + Σ_{j=1}^{t} E(wj) = δt,
a straight line with slope δ.
Random Walk Model with Mean Function
Example: Signal Plus Noise
The signal plus noise model is
xt = 2 cos(2πt/50 + 0.6π) + wt
so
μx,t = E(xt) = 2 cos(2πt/50 + 0.6π) + E(wt) = 2 cos(2πt/50 + 0.6π),
the (cosine wave) signal.
Signal-Plus-Noise Model with Mean Function
Covariances
The autocovariance function is, for all s and t,
γx(s, t) = E[(xs − μx,s)(xt − μx,t)]
Symmetry: γx(s, t) = γx(t, s).
Smoothness:
if a series is smooth, nearby values will be very similar,
hence the autocovariance will be large;
conversely, for a choppy series, even nearby values may
be nearly uncorrelated.
Example: White Noise
If wt is white noise wn(0, σw²), then
γw(s, t) = E(ws wt) = σw² if s = t, and 0 if s ≠ t:
definitely choppy!
Autocovariances of White Noise
Example: Moving Average
The moving average is
vt = (1/3)(wt−1 + wt + wt+1)
and E(vt) = 0, so
γv(s, t) = E(vs vt) = (1/9) E[(ws−1 + ws + ws+1)(wt−1 + wt + wt+1)]
= (3/9)σw² if s = t,
  (2/9)σw² if |s − t| = 1,
  (1/9)σw² if |s − t| = 2,
  0 otherwise.
Autocovariances of Moving Average
Example: Random Walk
The random walk with zero drift is
xt = Σ_{j=1}^{t} wj
and E(xt) = 0, so
γx(s, t) = E(xs xt) = E[(Σ_{j=1}^{s} wj)(Σ_{k=1}^{t} wk)] = min{s, t} σw².
Autocovariances of Random Walk
Notes:
For the first two models, γx(s, t) depends on s and t only through |s − t|, but for the random walk γx(s, t) depends on s and t separately.
For the first two models, the variance γx(t, t) is constant, but for the random walk γx(t, t) = tσw² increases indefinitely as t increases.
Correlations
The autocorrelation function (ACF) is
ρ(s, t) = γ(s, t) / √(γ(s, s)γ(t, t))
It measures the linear predictability of xt given only xs.
Like any correlation, −1 ≤ ρ(s, t) ≤ 1.
Across Series
For a pair of time series xt and yt, the cross-covariance function is
γx,y(s, t) = E[(xs − μx,s)(yt − μy,t)]
The cross-correlation function (CCF) is
ρx,y(s, t) = γx,y(s, t) / √(γx(s, s)γy(t, t))
Stationary Time Series
Basic idea: the statistical properties of the observations do
not change over time.
Two specific forms: strong (or strict) stationarity and weak
stationarity.
A time series xt is strongly stationary if the joint distribution of every collection of values {xt1, xt2, . . . , xtk} is the same as that of the time-shifted values {xt1+h, xt2+h, . . . , xtk+h}, for every dimension k and shift h.
Strong stationarity is hard to verify.
If {xt} is strongly stationary, then for instance:
k = 1: the distribution of xt is the same as that of xt+h, for
any h;
in particular, if we take h = −t, the distribution of xt is the same as that of x0;
that is, every xt has the same distribution;
k = 2: the joint (bivariate) distribution of (xs, xt) is the same
as that of (xs+h, xt+h), for any h;
in particular, if we take h = −t, the joint distribution of (xs, xt) is the same as that of (xs−t, x0);
that is, the joint distribution of (xs, xt) depends on s and t only through s − t;
and so on...
A time series xt is weakly stationary if:
the mean function μt is constant; that is, every xt has the same mean;
the autocovariance function γ(s, t) depends on s and t only through their difference |s − t|.
Weak stationarity depends only on the first and second moment functions, so is also called second-order stationarity.
Strongly stationary (plus finite variance) implies weakly stationary.
Weakly stationary does not imply strongly stationary (unless some other property implies it, like normality of all joint distributions).
Simplifications
If xt is weakly stationary, cov(xt+h, xt) depends on h but not on t, so we write the autocovariances as
γ(h) = cov(xt+h, xt)
Similarly corr(xt+h, xt) depends only on h, and can be written
ρ(h) = γ(t + h, t) / √(γ(t + h, t + h)γ(t, t)) = γ(h)/γ(0).
Examples
White noise is weakly stationary.
A moving average is weakly stationary.
A random walk is not weakly stationary.
Estimating Means and Covariances
In other statistical applications, means, variances, and covariances are estimated by averaging across samples.
In time series, we often have only one realization.
Stationarity allows us to estimate moments anyway.
Mean
If xt is stationary, μt = E(xt) ≡ μ, so we can estimate μ by the sample mean
x̄ = (1/n) Σ_{t=1}^{n} xt.
We could also use a weighted mean
Σ_{t=1}^{n} wt xt, where Σ_{t=1}^{n} wt = 1.
Both are unbiased; usually some weighted mean has smaller variance than x̄, but not much smaller.
Autocovariance
Similarly, if xt is stationary, γ(t + h, t) = cov(xt+h, xt) ≡ γ(h), so we can estimate γ(h) by
γ̂(h) = (1/n) Σ_{t=1}^{n−h} (xt+h − x̄)(xt − x̄)
for h = 0, 1, . . . , n − 1, with γ̂(−h) = γ̂(h).
We estimate the autocorrelation function (ACF) by
ρ̂(h) = γ̂(h)/γ̂(0).
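As a quick numerical check of these definitions, here is a minimal R sketch (names are illustrative, not from the notes) that computes γ̂(h) and ρ̂(h) directly and compares them with acf(), which uses the same denominator n and centering at the sample mean:
# compute gamma-hat and rho-hat by hand and compare with acf()
set.seed(1)
x <- arima.sim(model = list(ar = 0.6), n = 200)
n <- length(x); xbar <- mean(x)
gamma.hat <- sapply(0:20, function(h)
  sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n)
rho.hat <- gamma.hat / gamma.hat[1]
max(abs(rho.hat - acf(x, lag.max = 20, plot = FALSE)$acf))  # essentially zero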
Sampling Properties
x̄ is unbiased for μ.
γ̂(h) is not unbiased for γ(h), but
(1/(n − h)) Σ_{t=1}^{n−h} (xt+h − μ)(xt − μ)
would be. Note:
(n − h) denominator instead of n;
centering at μ instead of x̄.
Non-negative Definiteness
The covariance matrix of (x1, x2, . . . , xk)' is
Γk, the k × k matrix with (i, j) entry γ(i − j): γ(0) on the diagonal, γ(1) on the first off-diagonals, and so on out to γ(k − 1) in the corners.
As a covariance matrix, Γk is non-negative definite:
a'Γk a = var(a1x1 + a2x2 + · · · + akxk) ≥ 0
for any vector of constants a = (a1, a2, . . . , ak)'.
With the above definition of γ̂(h), Γ̂k is also non-negative definite; that would not be true if we divided by (n − h).
Another Sampling Property
If xt is white noise, n is large, and some mild conditions hold, ρ̂(h) is approximately normal with zero mean and standard deviation
σρ̂(h) = 1/√n.
So we can look for sample autocorrelations outside ±2/√n as evidence of autocorrelation.
R Examples
White noise:
acf(ts(rnorm(100)));
Southern Oscillation Index and fish recruitment:
soi = scan("http://www.stat.pitt.edu/stoffer/tsa2/data/soi.dat");
soi = ts(soi, start = 1950, frequency = 12);
recruit = scan("http://www.stat.pitt.edu/stoffer/tsa2/data/recruit.dat");
recruit = ts(recruit, start = 1950, frequency = 12);
acf(soi, 50);
acf(recruit, 50);
ccf(soi, recruit, 50);
# Negative lags indicate SOI leads recruitment.
Interpreting the Cross-Correlation
help(ccf) states: The lag k value returned by ccf(x,y)
estimates the correlation between x[t+k] and y[t].
So the graph shows negative correlation between SOI at times t − 5 to t − 9 months and recruitment at time t.
That is, current recruitment is (negatively) correlated with SOI from 5 to 9 months ago.
SAS Example
Southern Oscillation Index and fish recruitment:
options pagesize = 80;
data soi;
infile 'soi.dat';
input soi;
run;
data recruit;
infile 'recruit.dat';
input recruit;
run;
data both;
time +1;
merge soi recruit;
run;
proc gplot data = both;
symbol i = join;
plot (soi recruit) * time;
run;
proc arima data = both;
title 'SOI and recruitment';
identify var = soi nlag = 50;
identify var = recruit crosscorr = soi nlag = 50;
/* Positive lags indicate SOI leads recruitment. */
run;
SAS program and output.
Seasonality in the SOI
The ACF of the SOI suggests that xt has a correlation of
around 0.4 with xt+12, xt+24, and so on.
This correlation is caused by the fact that those values
all fall in the same month of the year, and different months
have different means.
That is, this series has a non-constant mean function μt.
Since it is non-stationary in the mean, the sample ACF does
not estimate the population ACF, and the graph has no
meaning.
We can estimate t and subtract it, to give a series with zero
mean.
The simplest way is to subtract the mean for a given month
of the year from all data for that month.
In R (in SAS, use corresponding proc glm):
soiSA = residuals(lm(soi ~ factor(cycle(soi))));
# transfer the time series structure of soi to soiSA:
soiSA = ts(soiSA, start = start(soi), frequency = frequency(soi));
acf(soiSA, lag = 50);
The ACF graph now shows correlation dropping progressively
from around 0.5 at a one month lag to zero at one year.
The CCF of soiSA and recruit shows correspondingly simpler
structure.
Frequency-domain methods will show that the recruitment
series also has some seasonality, but with much weaker effects.
Replacing recruit with a corresponding recruitSA makes negligible changes to the ACF and CCF.
Frequency-domain methods will also show that the seasonal
effects in SOI consist largely of an annual sine wave.
Instead of estimating 12 separate monthly means, we can fit, and remove, a three-parameter model
μt = β0 + β1 cos(2πt/12) + β2 sin(2πt/12).
In R:
soiCS = residuals(lm(soi ~ cos(2 * pi * time(soi)) +
sin(2 * pi * time(soi))));
soiCS = ts(soiCS, start = start(soi), frequency = frequency(soi));
acf(soiCS, lag = 50);
Vector-Valued Series: Notation
Studies of time series data often involve p > 1 series.
E.g. Southern Oscillation Index and recruitment in a fish
population (p = 2).
Treated as a p × 1 column vector:
xt = (xt,1, xt,2, . . . , xt,p)'
Mean Vector
Assume jointly weakly stationary.
mean vector:
μ = E(xt) = (E(xt,1), E(xt,2), . . . , E(xt,p))' = (μ1, μ2, . . . , μp)'
Autocovariance Matrix
Autocovariance matrix contains individual autocovariances
on the diagonal and cross-covariances off the diagonal:
Γ(h) = E[(xt+h − μ)(xt − μ)'],
the p × p matrix with (i, j) entry γi,j(h).
Sample mean and autocovariances
sample mean:
x̄ = (1/n) Σ_{t=1}^{n} xt
sample autocovariance:
Γ̂(h) = (1/n) Σ_{t=1}^{n−h} (xt+h − x̄)(xt − x̄)'
for h ≥ 0, and Γ̂(−h) = Γ̂(h)'.
Multidimensional Series (Spatial Statistics)
Some studies involve data indexed by more than one variable.
E.g. soil surface temperatures in a field
Notation: xs is the observed value at location s (s for spatial).
Soil temperatures
Autocovariance and Variogram
Stationary: E(xs) and cov(xs+h, xs) do not depend on s.
For a stationary process, the autocovariance function is
γ(h) = cov(xs+h, xs) = E[(xs+h − μ)(xs − μ)]
Intrinsic: E(xs+h − xs) and var(xs+h − xs) do not depend on s.
For an intrinsic process, the (semi-)variogram is
Vx(h) = (1/2) var(xs+h − xs)
A stationary process is intrinsic (see Problem 1.26), but an
intrinsic process is not necessarily stationary.
In one dimension, the random walk is intrinsic but not stationary.
When stationary, Vx(h) = γ(0) − γ(h).
Isotropic: an intrinsic process is isotropic if the variogram is
a function only of |h|, the Euclidean distance between s + h
and s.
Time Series Regression
A regression model relates a response xt to inputs zt,1, zt,2, . . . , zt,q:
xt = β1 zt,1 + β2 zt,2 + · · · + βq zt,q + error.
Time domain modeling: the inputs often include lagged values of the same series, xt−1, xt−2, . . . , xt−p.
Frequency domain modeling: the inputs include sine and cosine functions.
Fitting a Trend
> g1900 = window(globtemp, start = 1900)
> plot(g1900)
possible model:
xt = β1 + β2 t + wt,
where the error (noise) wt is white noise (unlikely!).
fit using ordinary least squares (OLS):
> lmg1900 = lm(g1900 ~ time(g1900)); summary(lmg1900)
Call:
lm(formula = g1900 ~ time(g1900))
Residuals:
     Min       1Q   Median       3Q      Max
-0.30352 -0.09671  0.01132  0.08289  0.33519
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.219e+01  9.032e-01  -13.49   <2e-16 ***
time(g1900)  6.209e-03  4.635e-04   13.40   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1298 on 96 degrees of freedom
Multiple R-Squared: 0.6515,     Adjusted R-squared: 0.6479
F-statistic: 179.5 on 1 and 96 DF, p-value: < 2.2e-16
> plot(g1900)
> abline(reg = lmg1900)
Using PROC ARIMA
Program
data globtemp;
infile 'globtemp.dat';
n + 1;
input globtemp;
year = 1855 + n;
run;
proc arima data = globtemp;
where year >= 1900;
identify var = globtemp crosscorr = year;
/* The ESTIMATE statement fits a model to the
   variable in the most recent IDENTIFY statement */
estimate input = year;
run;
and output.
Regression Review
the regression model:
xt = β1 zt,1 + β2 zt,2 + · · · + βq zt,q + wt = zt'β + wt.
fit by minimizing the residual sum of squares
RSS(β) = Σ_{t=1}^{n} (xt − zt'β)²
find the minimum by solving the normal equations
(Σ_{t=1}^{n} zt zt') β̂ = Σ_{t=1}^{n} zt xt.
Matrix Formulation
factor matrix Znq = (z1, z2, . . . , zn) , response vector xn1 =
(x1, x2, . . . , xn)
= Z x with solution
= (Z Z)1Z x
normal equations (Z Z)
minimized RSS
= x Z
RSS
x Z
Zx
=xx
= x x x Z(Z Z)1Z x
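A small R sketch of the same algebra (illustrative data, not from the notes): the normal-equation solution reproduces lm()'s coefficients.
# OLS via the normal equations, compared with lm()
set.seed(2)
n <- 100
Z <- cbind(1, 1:n)                   # intercept and linear trend
x <- Z %*% c(0.5, 0.02) + rnorm(n)   # response with white-noise errors
beta.hat <- solve(t(Z) %*% Z, t(Z) %*% x)
RSS <- sum((x - Z %*% beta.hat)^2)
cbind(beta.hat, coef(lm(x ~ Z - 1))) # same estimates, two ways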
Distributions
If the (white noise) errors are normally distributed (wt ∼ iid N(0, σw²)), then β̂ is multivariate normal, and the usual t- and F-statistics have the corresponding distributions.
If the errors are not normally distributed, but still iid, the
same is approximately true.
If the errors are not white noise, none of that is true.
Choosing a Regression Model
We want a model that fits well without using too many parameters.
Two estimates of the noise variance:
unbiased: sw² = RSS/(n − q);
maximum likelihood: σ̂w² = RSS/n.
We want small σ̂w² but also small q.
Information Criteria (smaller is better)
Akaike's Information Criterion (with k variables in the model):
AIC = ln σ̂k² + (n + 2k)/n
bias-corrected Akaike's Information Criterion:
AICc = ln σ̂k² + (n + k)/(n − k − 2)
Schwarz's (Bayesian) Information Criterion:
SIC = ln σ̂k² + k ln(n)/n
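A hedged R sketch computing the three criteria in the per-observation form above for a small trend regression (variable names and data are illustrative):
# information criteria in the per-observation form, for a trend regression
set.seed(3)
n <- 120
t.idx <- 1:n
x <- 0.01 * t.idx + rnorm(n)
fit <- lm(x ~ t.idx)
k <- length(coef(fit))                 # number of regression parameters
sigma2 <- sum(residuals(fit)^2) / n    # maximum likelihood variance estimate
AIC.k  <- log(sigma2) + (n + 2 * k) / n
AICc.k <- log(sigma2) + (n + k) / (n - k - 2)
SIC.k  <- log(sigma2) + k * log(n) / n
c(AIC = AIC.k, AICc = AICc.k, SIC = SIC.k)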
Notes
More commonly (e.g. in SAS output and in R's AIC function),
these are all multiplied by n.
AIC, AICc, and SIC (also known as SBC and BIC) can be
generalized to other problems where likelihood methods are
used.
If n is large and the true k is small, minimizing BIC picks k
well, but minimizing AIC tends to over-estimate it.
If the true k is large (or infinite), minimizing AIC picks a value
that gives good predictions by trading off bias vs variance.
Exploratory Data Analysis (or Searching for Stationarity)
When an observed time series appears stationary, we can
calculate its sample autocorrelations, and use them to decide
on a model.
Many time series do not appear stationary; e.g., Johnson and
Johnson earnings, global temperature.
Often we can find a way to relate one series to a different
series, for which stationarity is more plausible.
Trends and Detrending
Some series can be modeled as
xt = μt + yt,
where yt is stationary.
If μt is a parametric form, we can estimate it and subtract it. That is, we use the residuals from a fitted trend.
The form of trend might be linear, or higher degree polynomial, or some other function suggested by theory.
Example: 20th Century Global Temperature
lmg1900 = lm(g1900 ~ time(g1900));
plot(ts(residuals(lmg1900), start = 1900));
Differencing
Some series still appear nonstationary after detrending.
E.g. the trend μt is a random walk with drift:
μt = δt + Σ_{j=1}^{t} wj
Here E(xt) = δt, but
xt − E(xt) = Σ_{j=1}^{t} wj + yt
with a variance that grows with time.
But now the first differences
∇xt = xt − xt−1 = δ + wt + yt − yt−1
are stationary.
Define the backshift operator B by Bxt = xt−1.
Then ∇xt = (1 − B)xt.
Also second differences
∇²xt = (1 − B)²xt = xt − 2xt−1 + xt−2,
etc. Easy for any positive integer d; possible for fractional d.
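In R, diff() implements these differences; a quick sketch (illustrative series) checking the identities:
# first and second differences via diff(), checked against the explicit forms
set.seed(4)
x <- cumsum(rnorm(50))            # a random walk
d1 <- diff(x)                     # (1 - B) x
d2 <- diff(x, differences = 2)    # (1 - B)^2 x
all.equal(d1, x[-1] - x[-length(x)])
all.equal(d2, x[3:50] - 2 * x[2:49] + x[1:48])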
Example: 20th Century Global Temperature
plot(diff(g1900));
Both detrending and differencing give apparently stationary
results.
acf(diff(g1900));
Differencing has removed almost all auto-correlation.
acf(residuals(lmg1900))
Removing the trend without differencing leaves more autocorrelation.
Transformation (Re-expression)
Some series need to be re-expressed.
Most commonly logarithms, sometimes square roots (especially with counted data).
Often re-expression improves stationarity, and other desirable
features such as symmetry of distribution.
E.g. Glacial varve thicknesses, Johnson and Johnson earnings.
Periodic Signals
If a series is plausibly modeled as a cosine wave plus noise, we can fit
xt = A cos(2πωt + φ) + wt = (A cos φ) cos(2πωt) − (A sin φ) sin(2πωt) + wt
by least squares.
If ω is known (e.g., ω = 1/12 for an annual cycle in monthly data), this is a linear regression:
xt = β1 cos(2πωt) + β2 sin(2πωt) + wt
If ω is of the form j/n for integer j (n = series length), then
β̂1 = (2/n) Σ_{t=1}^{n} xt cos(2πtj/n),
β̂2 = (2/n) Σ_{t=1}^{n} xt sin(2πtj/n).
For other ω, use standard linear least squares regression.
If ω is unknown, either:
try all ωs of the form j/n, plotting β̂1(j/n)² + β̂2(j/n)² against j/n (the periodogram);
or use non-linear least squares for other ω.
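A short R check of the closed-form coefficients at a Fourier frequency ω = j/n (a sketch; the series and j are made up). At such a frequency the cosine and sine regressors are orthogonal, so the regression and the sums agree exactly:
# at a Fourier frequency j/n, the regression coefficients have closed forms
set.seed(5)
n <- 144; j <- 12; t <- 1:n
x <- 2 * cos(2 * pi * t * j / n + 0.6 * pi) + rnorm(n)
c1 <- cos(2 * pi * t * j / n); s1 <- sin(2 * pi * t * j / n)
coef(lm(x ~ 0 + c1 + s1))                     # least squares fit
c(2 * sum(x * c1) / n, 2 * sum(x * s1) / n)   # (2/n) sums: same values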
# detrend global temperature using a quadratic fit
gtres = residuals(lm(globtemp ~ time(globtemp) + I(time(globtemp)^2)));
gtres = ts(gtres, start = start(globtemp));
par(mfcol = c(2, 1));
plot(gtres);
# use spectrum() to plot the periodogram of detrended global temperature
spectrum(gtres, log = "no");
Smoothing a Time Series
Smoothing a time series makes long-term behavior (low frequencies) more apparent. E.g. global temperature, Johnson
and Johnson earnings.
Many types of smoother:
moving averages;
kernel smoothers;
lowess, supsmu, etc.;
smoothing splines.
# Trailing yearly average J&J earnings
plot(jj)
lines(filter(jj, rep(1, 4)/4, sides = 1), col = "red")
title("Trailing 4-quarter averages")
# smooth global temperatures over a 30 year window
# (note half weight on end values)
plot(globtemp)
lines(filter(globtemp, c(.5, rep(1, 29), .5)/30), col = "red")
title("Centered 30 year averages")
Smoothing a Scatter Plot
Smoothing a scatter plot can also reveal behavior.
E.g. daily NYSE returns plotted against previous day.
# scatter plot of NYSE return against previous day,
# with lowess smooth
plot(nyse[-length(nyse)], nyse[-1], xlim = c(-0.02, 0.02),
ylim = c(-0.02, 0.02))
lines(lowess(nyse[-length(nyse)], nyse[-1], f = 1/5), col = "red")
title("NYSE daily return against previous day")
Time Domain Models
Box & Jenkins popularized an approach to time series analysis
based on
Auto-Regressive
Integrated
Moving Average
(ARIMA) models.
Autoregressive Models
Autoregressive model of order p (AR(p)):
xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt,
where:
xt is stationary with mean 0;
φ1, φ2, . . . , φp are constants with φp ≠ 0;
wt is uncorrelated with xt−j, j = 1, 2, . . .
To model a series with non-zero mean μ:
(xt − μ) = φ1(xt−1 − μ) + φ2(xt−2 − μ) + · · · + φp(xt−p − μ) + wt,
or
xt = α + φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt,
where
α = μ(1 − φ1 − φ2 − · · · − φp).
Note that the intercept α is not μ.
Note also that
wt = xt − (φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p)
and is therefore also stationary.
Furthermore, for k > 0,
wt−k = xt−k − (φ1 xt−k−1 + φ2 xt−k−2 + · · · + φp xt−k−p)
and wt is uncorrelated with all terms on the right hand side.
So wt is uncorrelated with wt−k.
That is, {wt} is white noise.
The Autoregressive Operator
Use the backshift operator:
xt = φ1 B xt + φ2 B² xt + · · · + φp B^p xt + wt,
or
(1 − φ1 B − φ2 B² − · · · − φp B^p) xt = wt.
The autoregressive operator is
φ(B) = 1 − φ1 B − φ2 B² − · · · − φp B^p.
In operator form, the model equation is φ(B)xt = wt.
Example: AR(1)
For the first-order model:
xt = φ xt−1 + wt.
Also
xt−1 = φ xt−2 + wt−1
so
xt = φ(φ xt−2 + wt−1) + wt = φ² xt−2 + φ wt−1 + wt.
Now use
xt−2 = φ xt−3 + wt−2
so
xt = φ²(φ xt−3 + wt−2) + φ wt−1 + wt = φ³ xt−3 + φ² wt−2 + φ wt−1 + wt.
Continuing:
xt = φ^k xt−k + Σ_{j=0}^{k−1} φ^j wt−j.
We have shown:
xt = φ^k xt−k + Σ_{j=0}^{k−1} φ^j wt−j.
Since xt is stationary, if |φ| < 1 then φ^k xt−k → 0 as k → ∞, so
xt = Σ_{j=0}^{∞} φ^j wt−j,
an infinite moving average, or linear process.
Moments
Mean: E(xt) = 0.
Autocovariances: for h ≥ 0,
γ(h) = cov(xt+h, xt) = E[(Σ_j φ^j wt+h−j)(Σ_k φ^k wt−k)] = σw² φ^h / (1 − φ²).
Autocorrelations: for h ≥ 0,
ρ(h) = γ(h)/γ(0) = φ^h.
Note that
ρ(h) = φ ρ(h − 1),   h = 1, 2, . . .
Compare with the original equation
xt = φ xt−1 + wt.
Simulations
plot(arima.sim(model = list(ar = .9), 100))
Causality
What if |φ| > 1? Rewrite
xt = φ xt−1 + wt
as
xt = φ⁻¹ xt+1 − φ⁻¹ wt+1.
Now
xt = −Σ_{j=1}^{∞} φ^{−j} wt+j,
a sum of future noise terms. This process is said to be not causal. If |φ| < 1 the process is causal.
The Autoregressive Operator Again
Compare the original equation:
xt = φ xt−1 + wt, i.e. (1 − φB)xt = wt, i.e. xt = (1 − φB)⁻¹ wt,
with the (infinite) moving average representation:
xt = Σ_{j=0}^{∞} φ^j wt−j = (Σ_{j=0}^{∞} φ^j B^j) wt.
So
(1 − φB)⁻¹ = Σ_{j=0}^{∞} φ^j B^j.
Compare with
(1 − φz)⁻¹ = 1/(1 − φz) = Σ_{j=0}^{∞} φ^j z^j,
valid for |z| < 1 (because |φ| < 1).
We can manipulate expressions in B as if it were a complex number z with |z| < 1.
Stationary versus Transient
E.g. AR(1):
Stationary version, when |φ| < 1:
xt = Σ_{j=0}^{∞} φ^j wt−j
But suppose we want to simulate, using
xt = φ xt−1 + wt,   t = 1, 2, . . .
What about x0?
One possibility: let x0 = 0.
Then x1 = w1, x2 = w2 + φw1, and generally
xt = Σ_{j=0}^{t−1} φ^j wt−j.
This means that
var(xt) = σw² (1 + φ² + φ⁴ + · · · + φ^{2(t−1)}).
var(xt) depends on t, so this version is not stationary.
But, if |φ| < 1, then for large t,
var(xt) ≈ σw² (1 + φ² + φ⁴ + · · ·) = σw² / (1 − φ²).
Also, under the same conditions (more work!),
cov(xt+h, xt) ≈ σw² φ^{|h|} / (1 − φ²).
This version is called asymptotically stationary.
The non-stationarity is only for small t, and is called transient. Simulations use a burn-in or spin-up period: discard the first few simulated values.
But note: in the stationary version, x0 ∼ N(0, σw²/(1 − φ²)).
If we simulate x0 from this distribution, and for t > 0 use
xt = φ xt−1 + wt,   t = 1, 2, . . . ,
then the result is exactly stationary.
That is, we can use a simulation with no spin-up.
This is harder for AR(p) when p > 1, so most simulators use a spin-up period.
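A sketch in R of the two strategies for AR(1) (parameter values and names are illustrative):
# AR(1) simulation: exact stationary start versus a burn-in period
set.seed(6)
phi <- 0.9; sigw <- 1; n <- 500
# exact: draw x0 from the stationary distribution N(0, sigw^2 / (1 - phi^2))
x <- numeric(n)
x0 <- rnorm(1, 0, sigw / sqrt(1 - phi^2))
for (t in 1:n) {
  x[t] <- phi * (if (t == 1) x0 else x[t - 1]) + rnorm(1, 0, sigw)
}
x.exact <- ts(x)
# burn-in: start at 0, discard the first 100 values
y <- filter(rnorm(n + 100, 0, sigw), filter = phi, method = "recursive")
x.burn <- ts(y[-(1:100)])
# arima.sim() uses a burn-in (its n.start argument) by default
x.sim <- arima.sim(model = list(ar = phi), n = n)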
Moving Average Model
Moving average model of order q (MA(q)):
xt = wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q
where:
θ1, θ2, . . . , θq are constants with θq ≠ 0;
wt is Gaussian white noise wn(0, σw²).
Note that wt is uncorrelated with xt−j, j = 1, 2, . . .
In operator form:
xt = θ(B)wt,
where the moving average operator θ(B) is
θ(B) = 1 + θ1 B + θ2 B² + · · · + θq B^q.
Compare with the autoregressive model φ(B)xt = wt.
The moving average process is stationary for any values of θ1, θ2, . . . , θq.
Moments
Mean: E(xt) = 0.
Autocovariances:
γ(h) = cov(xt+h, xt) = E[(Σ_j θj wt+h−j)(Σ_k θk wt−k)] = σw² Σ_k θk θk+h
= 0 if h > q.
The MA(q) model is characterized by
γ(q) = σw² θq ≠ 0 and γ(h) = 0 for h > q.
The contrast between the ACF of
a moving average model, which is zero except for a finite number of lags h, and
an autoregressive model, which goes to zero geometrically,
makes the sample ACF an important tool in deciding what model to fit.
Inversion
Example: MA(1)
xt = wt + θ wt−1 = (1 + θB)wt,
so if |θ| < 1,
wt = (1 + θB)⁻¹ xt = π(B)xt,
where
π(B) = Σ_{j=0}^{∞} (−θ)^j B^j.
So xt satisfies an infinite autoregression:
xt = −Σ_{j=1}^{∞} (−θ)^j xt−j + wt.
Autoregressive Moving Average Models
Combine! ARMA(p, q):
xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p
   + wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q.
In operator form:
φ(B)xt = θ(B)wt.
Issues in ARMA Models
Parameter redundancy: if φ(z) and θ(z) have any common factors, they can be canceled out, so the model is the same as one with lower orders. We assume no redundancy.
Causality: if φ(z) ≠ 0 for |z| ≤ 1, xt can be written in terms of present and past ws. We assume causality.
Invertibility: if θ(z) ≠ 0 for |z| ≤ 1, wt can be written in terms of present and past xs, and xt can be written as an infinite autoregression. We assume invertibility.
Using proc arima
Example: fit an MA(1) model to the differences of the log
varve thicknesses.
options linesize = 80;
ods html file = '../varve1.html';
data varve;
infile '../data/varve.dat';
input varve;
lv = log(varve);
dlv = dif(lv);
run;
proc arima data = varve;
title 'Fit an MA(1) model to differences of log varve';
identify var = dlv;
estimate q = 1;
run;
proc arima output
Using some proc arima options
Example: fit an IMA(1, 1) model to the log varve thicknesses.
options linesize = 80;
ods html file = 'varve2.html';
data varve;
infile 'varve.dat';
input varve;
lv = log(varve);
run;
proc arima data = varve;
title 'Fit an IMA(1, 1) model to log varve, using ML';
title2 'Use minic option to identify a good model';
identify var = lv(1) minic;
estimate q = 1 method = ml;
estimate q = 2 method = ml;
estimate p = 1 q = 1 method = ml;
run;
proc arima output
Notes on the proc arima output
For the MA(1) model, the Autocorrelation Check of Residuals rejects the null hypothesis that the residuals are white
noise.
If the series really had MA(1) structure, the residuals
would be white noise.
So the MA(1) model is not a good fit for this series.
For both the MA(2) and the ARMA(1, 1) models, the Chi-Square statistics are not significant, so these models both
seem satisfactory. ARMA(1, 1) has the better AIC and SBC.
Using R
Fit a given model and test the residuals as white noise:
varve.ma1 = arima(diff(log(varve)),
order = c(p = 0, d = 0, q = 1));
varve.ma1;
Box.test(residuals(varve.ma1), lag = 6,
type = "Ljung", fitdf = 1);
Note: the fitdf argument indicates that these are residuals
from a fit with a single parameter.
As in proc arima, differencing can be carried out within arima():
varve.ima1 = arima(log(varve), order = c(0, 1, 1));
varve.ima1;
Box.test(residuals(varve.ima1), 6, "Ljung", 1);
But note that you cannot include the intercept, so the results
are not identical.
Rerun the original analysis with no intercept:
arima(diff(log(varve)), order = c(0, 0, 1),
include.mean = FALSE);
Make a table of AICs:
AICtable = matrix(NA, 5, 5);
dimnames(AICtable) =
list(paste("p =", 0:4), paste("q =", 0:4));
for (p in 0:4) {
for (q in 0:4) {
varve.arma = arima(diff(log(varve)), order = c(p, 0, q));
AICtable[p+1, q+1] = AIC(varve.arma);
}
}
AICtable;
Note: proc arima's MINIC option tabulates (an approximation
to) BIC, not AIC.
Make a table of BICs:
BICtable = matrix(NA, 5, 5);
dimnames(BICtable) =
list(paste("p =", 0:4), paste("q =", 0:4));
for (p in 0:4) {
for (q in 0:4) {
varve.arma = arima(diff(log(varve)), order = c(p, 0, q));
BICtable[p+1, q+1] =
AIC(varve.arma, k = log(length(varve) - 1));
}
}
BICtable;
Both tables suggest ARMA(1, 1).
ARMA Autocorrelation Functions
For a moving average process, MA(q):
xt = wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q.
So (with θ0 = 1)
γ(h) = cov(xt+h, xt) = E[(Σ_{j=0}^{q} θj wt+h−j)(Σ_{k=0}^{q} θk wt−k)]
= σw² Σ_{j=0}^{q−h} θj θj+h for 0 ≤ h ≤ q, and 0 for h > q.
So the ACF is
ρ(h) = (Σ_{j=0}^{q−h} θj θj+h) / (Σ_{j=0}^{q} θj²) for 0 ≤ h ≤ q, and 0 for h > q.
Notes:
In these expressions, θ0 = 1 for convenience.
ρ(q) ≠ 0 but ρ(h) = 0 for h > q. This characterizes MA(q).
For an autoregressive process, AR(p):
xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt.
So
γ(h) = cov(xt+h, xt) = E[(Σ_{j=1}^{p} φj xt+h−j + wt+h) xt] = Σ_{j=1}^{p} φj γ(h − j) + cov(wt+h, xt).
Because xt is causal, xt is wt plus a linear combination of wt−1, wt−2, . . .
So
cov(wt+h, xt) = σw² if h = 0, and 0 if h > 0.
Hence
γ(h) = Σ_{j=1}^{p} φj γ(h − j),   h > 0,
and
γ(0) = Σ_{j=1}^{p} φj γ(j) + σw².
If we know the parameters φ1, φ2, . . . , φp and σw², these equations for h = 0 and h = 1, 2, . . . , p form p + 1 linear equations in the p + 1 unknowns γ(0), γ(1), . . . , γ(p).
The other autocovariances can then be found recursively from the equation for h > p.
Alternatively, if we know (or have estimated) γ(0), γ(1), . . . , γ(p), they form p + 1 linear equations in the p + 1 parameters φ1, φ2, . . . , φp and σw².
These are the Yule-Walker equations.
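A minimal R sketch of solving the Yule-Walker equations for an AR(2), followed by ar.yw(), which does the same from the sample ACF (parameter values are illustrative):
# Yule-Walker: recover AR coefficients from the autocorrelations
phi <- c(1.5, -0.75)
rho <- ARMAacf(ar = phi, lag.max = 2)   # rho(0), rho(1), rho(2)
R <- toeplitz(rho[1:2])                 # matrix of rho(0), rho(1)
solve(R, rho[2:3])                      # returns phi1, phi2
# from data: ar.yw() solves the same equations with sample autocovariances
set.seed(7)
x <- arima.sim(model = list(ar = phi), n = 2000)
ar.yw(x, order.max = 2, aic = FALSE)$ar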
For the ARMA(p, q) model with p > 0 and q > 0:
xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q,
a generalized set of Yule-Walker equations must be used.
The moving average models ARMA(0, q) = MA(q) are the only ones with a closed form expression for ρ(h).
For AR(p) and ARMA(p, q) with p > 0, the recursive equation means that for h > max(p, q + 1), ρ(h) is a sum of geometrically decaying terms, possibly damped oscillations.
The recursive equation is
γ(h) = Σ_{j=1}^{p} φj γ(h − j),   h > q.
What kinds of sequences satisfy an equation like this?
Try γ(h) = z^{−h} for some constant z.
The equation becomes
0 = z^{−h} − Σ_{j=1}^{p} φj z^{−(h−j)} = z^{−h} (1 − Σ_{j=1}^{p} φj z^j) = z^{−h} φ(z).
So if φ(z) = 0, then γ(h) = z^{−h} satisfies the equation.
Since φ(z) is a polynomial of degree p, there are p solutions, say z1, z2, . . . , zp.
So a more general solution is
γ(h) = Σ_{l=1}^{p} cl zl^{−h},
for any constants c1, c2, . . . , cp.
If z1, z2, . . . , zp are distinct, this is the most general solution; if some roots are repeated, the general form is a little more complicated.
If all z1, z2, . . . , zp are real, this is a sum of geometrically
decaying terms.
If any root is complex, its complex conjugate must also be a
root, and these two terms may be combined into geometrically decaying sine-cosine terms.
The constants c1, c2, . . . , cp are determined by initial conditions; in the ARMA case, these are the Yule-Walker equations.
Note that the various rates of decay are the zeros of φ(z), the autoregressive operator, and do not depend on θ(z), the moving average operator.
Example: ARMA(1, 1)
xt = φ xt−1 + θ wt−1 + wt.
The recursion is
ρ(h) = φ ρ(h − 1),   h = 2, 3, . . .
So ρ(h) = c φ^h for h = 1, 2, . . . , but c ≠ 1.
Graphically, the ACF decays geometrically, but with a different value at h = 0.
plot(ARMAacf(ar = 0.9, ma = 0.5, 24));
The Partial Autocorrelation Function
An MA(q) can be identified from its ACF: non-zero to lag q,
and zero afterwards.
We need a similar tool for AR(p).
The partial autocorrelation function (PACF) fills that role.
Recall: for multivariate random variables X, Y, Z, the partial
correlations of X and Y given Z are the correlations of:
the residuals of X from its regression on Z; and
the residuals of Y from its regression on Z.
Here regression means conditional expectation, or best linear prediction, based on population distributions, not a sample calculation.
In a time series, the partial autocorrelations are defined as
φh,h = partial correlation of xt+h and xt, given xt+h−1, xt+h−2, . . . , xt+1.
For an autoregressive process, AR(p):
xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt.
If h > p, the regression of xt+h on xt+h−1, xt+h−2, . . . , xt+1 is
φ1 xt+h−1 + φ2 xt+h−2 + · · · + φp xt+h−p.
So the residual is just wt+h, which is uncorrelated with xt+h−1, xt+h−2, . . . , xt+1 and xt.
So the partial autocorrelation is zero for h > p:
φh,h = 0,   h > p.
We can also show that φp,p = φp, which is non-zero by assumption.
So φp,p ≠ 0 but φh,h = 0 for h > p. This characterizes AR(p).
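The cut-off is easy to see numerically; a short sketch using R's ARMAacf() (parameter values are illustrative):
# PACF of an AR(2): non-zero at lags 1-2, (numerically) zero beyond
round(ARMAacf(ar = c(1.5, -0.75), lag.max = 6, pacf = TRUE), 4)
# PACF of an MA(1): tails off instead of cutting off
round(ARMAacf(ma = 0.8, lag.max = 6, pacf = TRUE), 4)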
The Inverse Autocorrelation Function
SAS's proc arima also shows the Inverse Autocorrelation Function (IACF).
The IACF of the ARMA(p, q) model
φ(B)xt = θ(B)wt
is defined to be the ACF of the inverse (or dual) process x̃t satisfying
θ(B)x̃t = φ(B)wt.
The IACF has the same property as the PACF: AR(p) is
characterized by an IACF that is nonzero at lag p but zero
for larger lags.
Summary: Identification of ARMA processes
AR(p) is characterized by a PACF or IACF that is:
nonzero at lag p;
zero for lags larger than p.
MA(q) is characterized by an ACF that is:
nonzero at lag q;
zero for lags larger than q.
For anything else, try ARMA(p, q) with p > 0 and q > 0.
For p > 0 and q > 0:

        AR(p)                   MA(q)                   ARMA(p, q)
ACF     Tails off               Cuts off after lag q    Tails off
PACF    Cuts off after lag p    Tails off               Tails off
IACF    Cuts off after lag p    Tails off               Tails off
Note: these characteristics are used to guide the initial choice
of a model; estimation and model-checking will often lead to
a different model.
Other ARMA Identification Techniques
SAS's proc arima offers the MINIC option on the identify
statement, which produces a table of SBC criteria for various
values of p and q.
The identify statement has two other options: ESACF and
SCAN.
Both produce tables in which the pattern of zero and non-zero values characterizes p and q.
See Section 3.4.10 in Brocklebank and Dickey.
options linesize = 80;
ods html file = 'varve3.html';
data varve;
infile '../data/varve.dat';
input varve;
lv = log(varve);
run;
proc arima data = varve;
title 'Use identify options to identify a good model';
identify var = lv(1) minic esacf scan;
estimate q = 1 method = ml;
estimate q = 2 method = ml;
estimate p = 1 q = 1 method = ml;
run;
proc arima output
Forecasting
General problem: predict xn+m given xn, xn−1, . . . , x1.
General solution: the (conditional) distribution of xn+m given xn, xn−1, . . . , x1.
In particular, the conditional mean is the best predictor (i.e. minimum mean squared error).
Special case: if {xt} is Gaussian, the conditional distribution is also Gaussian, with a conditional mean that is a linear function of xn, xn−1, . . . , x1 and a conditional variance that does not depend on xn, xn−1, . . . , x1.
Linear Forecasting
What if xt is not Gaussian?
Use the best linear predictor x^n_{n+m}.
Not the best possible predictor, but computable.
One-step Prediction
The hard way: suppose
x^n_{n+1} = φn,1 xn + φn,2 xn−1 + · · · + φn,n x1.
Choose φn,1, φn,2, . . . , φn,n to minimize the mean squared prediction error E[(xn+1 − x^n_{n+1})²].
Differentiate and equate to zero: n linear equations in the n unknowns.
Solve recursively (in n) using the Durbin-Levinson algorithm.
Incidentally, the PACF is φn,n.
One-step Prediction for an ARMA Model
The easy way: suppose we can write
xn+1 = some linear combination of xn, xn−1, . . . , x1
     + something uncorrelated with xn, xn−1, . . . , x1.
Then the first part is the best linear predictor, and the second part is the prediction error.
E.g. AR(p), p ≤ n:
xn+1 = φ1 xn + φ2 xn−1 + · · · + φp xn+1−p (first part) + wn+1 (second part).
General ARMA case
Now
xn+1 = φ1 xn + φ2 xn−1 + · · · + φp xn+1−p
     + θ1 wn + θ2 wn−1 + · · · + θq wn+1−q
     + wn+1.
The first part on the right hand side is a linear combination of xn, xn−1, . . . , x1.
The last part, wn+1, is uncorrelated with xn, xn−1, . . . , x1.
Middle part? If the model is invertible, wt is a linear combination of xt, xt−1, . . . , so if n is large, we can truncate the sum at x1, and wn, wn−1, . . . , wn+1−q are all (approximately) linear combinations of xn, xn−1, . . . , x1.
So the middle part is also approximately a linear combination of xn, xn−1, . . . , x1, whence
x^n_{n+1} = φ1 xn + φ2 xn−1 + · · · + φp xn+1−p + θ1 wn + θ2 wn−1 + · · · + θq wn+1−q
and wn+1 is the prediction error, xn+1 − x^n_{n+1}.
Multi-step Prediction
The easy way: build on one-step prediction. E.g. two-step:
xn+2 = φ1 xn+1 + φ2 xn + · · · + φp xn+2−p
     + θ1 wn+1 + θ2 wn + · · · + θq wn+2−q
     + wn+2.
Replace xn+1 by x^n_{n+1} + wn+1:
xn+2 = φ1 x^n_{n+1} + φ2 xn + · · · + φp xn+2−p
     + θ2 wn + · · · + θq wn+2−q
     + wn+2 + (φ1 + θ1) wn+1.
The first two parts are again (approximately) linear combinations of xn, xn−1, . . . , x1, and the last is uncorrelated with xn, xn−1, . . . , x1. So
x^n_{n+2} = φ1 x^n_{n+1} + φ2 xn + · · · + φp xn+2−p + θ2 wn + · · · + θq wn+2−q
and the prediction error is
xn+2 − x^n_{n+2} = wn+2 + (φ1 + θ1) wn+1.
Note that the mean squared prediction error is
σw² [1 + (φ1 + θ1)²] ≥ σw².
Mean squared prediction error increases as we predict further into the future.
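In R, predict() on an arima fit returns these forecasts and their standard errors; a hedged sketch on simulated data:
# multi-step forecasts and prediction standard errors from a fitted ARMA model
set.seed(9)
x <- arima.sim(model = list(ar = 0.8, ma = 0.4), n = 200)
fit <- arima(x, order = c(1, 0, 1))
fc <- predict(fit, n.ahead = 12)
fc$pred   # forecasts: approach the series mean as the lead increases
fc$se     # standard errors: increase toward the series standard deviation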
Forecasting with proc arima
E.g. the fishery recruitment data.
proc arima program and output.
Note that predictions approach the series mean, and std
errors approach the series standard deviation.
The autocorrelation test for residuals is borderline, largely
because of residual autocorrelations at lags 12, 24, . . . .
Spectrum analysis shows that these are caused by seasonal
means, which can be removed: proc arima program and
output.
Comments on Choice of ARMA model
Keep it simple! Use small p and q.
Some systems have autoregressive-like structure.
E.g. first order dynamics:
dx(t)/dt = −αx(t),
or in stochastic form,
dx(t) = −αx(t)dt + dW(t),
where W(t) is a Wiener process, the continuous time limit of the random walk.
Discrete time approximation:
Δx(t) = x(t + Δt) − x(t) = −αx(t)Δt + ΔW(t)
or
x(t + Δt) = x(t) − αx(t)Δt + ΔW(t) = (1 − αΔt)x(t) + ΔW(t),
an AR(1) (causal if α > 0 and Δt is small).
Similarly a second order system leads to AR(2).
Since many real-world systems can be approximated by first or second order dynamics, this suggests using p = 1 or 2, and q = 0.
Some systems have more dimensions. E.g. first order vector autoregression, VARp(1):
xt (p × 1) = Φ (p × p) xt−1 + wt (p × 1).
Here each component time series is typically ARMA(p, p − 1).
This suggests using q < p, especially q = p − 1.
Added noise: if yt is ARMA(p, q) with q < p, but we observe xt = yt + wt where wt is white noise, uncorrelated with yt, then xt is ARMA(p, p).
This suggests using q = p.
Summary: you'll often find that you can use small p and q ≤ p, perhaps q = 0 or q = p − 1 or q = p, depending on the background of the series.
Estimation
Current methods are likelihood-based:
f1,2,...,n(x1, x2, . . . , xn) = f1(x1) f2|1(x2|x1) · · · fn|n−1,...,1(xn|xn−1, xn−2, . . . , x1).
If xt is AR(p) and n > p, then
fn|n−1,...,1(xn|xn−1, xn−2, . . . , x1) = fn|n−1,...,n−p(xn|xn−1, xn−2, . . . , xn−p).
Assume xt is Gaussian. E.g. AR(1):
ft|t−1(xt|xt−1) is N[μ(1 − φ) + φxt−1, σw²] for t > 1,
and
f1(x1) is N[μ, σw²/(1 − φ²)].
So the likelihood, still for AR(1), is
L(μ, φ, σw²) = (2πσw²)^{−n/2} (1 − φ²)^{1/2} exp[−S(μ, φ)/(2σw²)],
where
S(μ, φ) = (1 − φ²)(x1 − μ)² + Σ_{t=2}^{n} [(xt − μ) − φ(xt−1 − μ)]².
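R's arima() offers analogous choices through its method argument; a hedged sketch comparing exact maximum likelihood with conditional sum of squares for an AR(1) (simulated data, values illustrative):
# exact maximum likelihood versus conditional sum of squares for an AR(1)
set.seed(10)
x <- arima.sim(model = list(ar = 0.7), n = 200) + 5
fit.ml  <- arima(x, order = c(1, 0, 0), method = "ML")
fit.css <- arima(x, order = c(1, 0, 0), method = "CSS")
rbind(ML = coef(fit.ml), CSS = coef(fit.css))  # similar phi; "intercept" is mu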
Methods in proc arima
method = ml: maximize the likelihood.
method = uls: minimize the unconditional sum of squares S(μ, φ).
method = cls: minimize the conditional sum of squares Sc(μ, φ):
Sc(μ, φ) = S(μ, φ) − (1 − φ²)(x1 − μ)² = Σ_{t=2}^{n} [(xt − μ) − φ(xt−1 − μ)]².
This is essentially least squares regression of xt on xt−1.
AR(p), p > 1, can be handled similarly.
ARMA(p, q) with q > 0 is more complicated; state space
methods can be used to calculate the exact likelihood.
proc arima implements the same three methods in all cases.
All three methods give estimators with the same large-sample
normal distribution; all are asymptotically optimal.
Brute Force
Above methods fail (or need serious modification) if any data
are missing.
Can always fall back to brute force:
x1, x2, . . . , xn Nn(1, ),
where
nn
(0)
(1)
(2)
(1)
(0)
(1)
(2)
(1)
(0)
...
...
...
(n 1) (n 2) (n 3)
. . . (n 1)
. . . (n 2)
. . . (n 3)
...
...
...
(0)
9
Write γ(h) = σw² ρ(h), and use e.g. R's ARMAacf(...) to compute ρ(h).
The likelihood is
det(2πΓ)^{−1/2} exp[−(1/2)(x − μ1)'Γ⁻¹(x − μ1)]
= det(2πσw²P)^{−1/2} exp[−(x − μ1)'P⁻¹(x − μ1)/(2σw²)],
where P is the n × n matrix of autocorrelations ρ(i − j).
Can maximize analytically with respect to μ and σw², then numerically with respect to φ and θ.
Missing data? Just leave out the corresponding rows and columns of Γ.
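A sketch of this brute-force Gaussian log-likelihood for an AR(1) in R, built from ARMAacf() and a full covariance matrix (illustrative, and practical only for small n):
# brute-force Gaussian log-likelihood for an AR(1)
loglik.ar1 <- function(phi, mu, sigw2, x) {
  n <- length(x)
  rho   <- ARMAacf(ar = phi, lag.max = n - 1)      # rho(0), ..., rho(n-1)
  Gamma <- (sigw2 / (1 - phi^2)) * toeplitz(rho)   # gamma(h) = sigw2 phi^h / (1 - phi^2)
  e <- as.numeric(x) - mu
  -0.5 * (n * log(2 * pi) + as.numeric(determinant(Gamma)$modulus) +
          drop(t(e) %*% solve(Gamma, e)))
}
# missing values would be handled by dropping the corresponding rows/columns of Gamma
set.seed(11)
x <- arima.sim(model = list(ar = 0.7), n = 100)
loglik.ar1(0.7, 0, 1, x)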
The Integrated ARMA model: ARIMA(p, d, q)
Some series are nonstationary, but their differences are stationary; e.g. the random walk.
Recall: the first differences of xt are
xt − xt−1 = (1 − B)xt = ∇xt.
The second differences are
∇xt − ∇xt−1 = (1 − B)∇xt = ∇²xt.
If ∇^d xt is ARMA(p, q), we say that xt is ARIMA(p, d, q).
Under-differencing
Suppose that xt is ARIMA(p, d, q), but we analyze yt = ∇^d' xt for some d' < d.
In this case, yt satisfies
φ(B)∇^{d−d'} yt = φ*(B)yt = θ(B)wt
where φ*(z) = (1 − z)^{d−d'} φ(z) has d − d' roots at z = 1.
This looks like an ARMA(p + d − d', q) model, but it is not causal.
Over-differencing
Suppose that xt is ARIMA(p, d, q), but we analyze yt = ∇^d' xt for some d' > d.
In this case, yt satisfies
φ(B)yt = θ(B)∇^{d'−d} wt = θ*(B)wt
where θ*(z) = (1 − z)^{d'−d} θ(z) has d' − d roots at z = 1.
This looks like an ARMA(p, q + d' − d) model, but it is not invertible.
Simplest model with d > 0: ARIMA(0, 1, 1)
Many nonstationary series are found to be fitted quite well
as ARIMA(0, 1, 1).
This model is connected with the exponentially weighted
moving average (EWMA) method of forecasting.
If the model is written xt − xt−1 = wt − λwt−1, the one-step forecast is
x̃n+1 = (1 − λ) Σ_{j=0}^{∞} λ^j xn−j,
the exponentially weighted moving average.
We can calculate the forecast recursively:
xn+1 = xn − λwn + wn+1.
We can find wn from xn, xn−1, . . . , so the one-step forecast is the first part:
x̃n+1 = xn − λwn.
But wn is the previous forecast error, xn − x̃n, so
x̃n+1 = xn − λ(xn − x̃n) = (1 − λ)xn + λx̃n.
In words, the new forecast is a weighted average of the current forecast and the current value.
Also
x̃n+1 = x̃n + (1 − λ)(xn − x̃n),
so the new forecast is the current forecast plus a correction based on the current forecast error.
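A small R sketch of this EWMA update (λ and the starting value are illustrative choices):
# exponentially weighted moving average forecasts via the update formula
ewma.forecast <- function(x, lambda, start = x[1]) {
  n <- length(x)
  xt <- numeric(n + 1)              # xt[t] is the forecast of x[t]
  xt[1] <- start
  for (t in 1:n) {
    xt[t + 1] <- xt[t] + (1 - lambda) * (x[t] - xt[t])   # forecast + correction
  }
  xt[-1]                            # one-step forecasts of x[2], ..., x[n+1]
}
set.seed(12)
x <- cumsum(rnorm(100))
f <- ewma.forecast(x, lambda = 0.6)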
Strategy for Building ARIMA Models
1. First choose d:
ACF of an integrated series tends to die away slowly, so
difference until it dies away quickly;
the IACF of a non-invertible series tends to die away
slowly, which indicates over-differencing.
You may want to try more than one value of d.
2. Next choose p and q, e.g. using MINIC.
7
3. Next estimate the model.
4. Finally check the model diagnostics:
Significance of the highest order coefficients, φp (if p > 0) and θq (if q > 0);
Non-significance in autocorrelation check of residuals;
Low value of AIC or SBC.
5. Repeat from step 2 until satisfactory.
Note: You may not find a completely satisfactory model,
especially for a long data series.
Unit Root Tests
Choice of d can be formulated as a hypothesis test.
E.g. in the AR(1) model xt = φxt−1 + wt, set:
H0: φ = 1, xt is ARIMA(0, 1, 0) (nonstationary, d = 1);
HA: |φ| < 1, xt is ARIMA(1, 0, 0) (stationary, d = 0).
Test using proc arima's stationarity keyword on the identify statement.
E.g. the global temperature data: proc arima program and
output.
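In R, a comparable augmented Dickey-Fuller test is available in the tseries package (an assumption: that package is installed); a minimal sketch on simulated series:
# augmented Dickey-Fuller test: the null hypothesis is a unit root (d = 1)
library(tseries)
set.seed(13)
rw <- cumsum(rnorm(200))                       # random walk: null typically not rejected
st <- arima.sim(model = list(ar = 0.5), 200)   # stationary AR(1): null typically rejected
adf.test(rw)
adf.test(st)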
The statistics on the 'Lags 0' rows in the panel 'Augmented Dickey-Fuller Unit Root Tests' refer to the three models
Zero Mean: xt = φxt−1 + wt;
Single Mean: xt − μ = φ(xt−1 − μ) + wt;
Trend: xt − μ − δt = φ(xt−1 − μ − δ(t − 1)) + wt.
Note that under H0, these models reduce to
xt = xt−1 + wt,
xt = xt−1 + wt,
xt = xt−1 + δ + wt,
the first two being random walks with no drift, the latter being a random walk with drift.
The statistics on the 'Lags 1' rows refer to corresponding AR(2) models, which reduce to integrated AR(1) models under the null hypothesis.
The Tau tests are generally preferred to the Rho tests.
E.g. Case-Shiller housing data: proc arima program and output.
Seasonal ARIMA Models
Many time series collected on a monthly or quarterly basis
have seasonal behavior.
Similarly hourly data and daily behavior.
E.g. Johnson & Johnson quarterly earnings; discussion typically focuses on comparison with:
previous quarter;
same quarter, previous year.
1
That is, we compare xt with xt−1 and xt−4.
More generally, we compare xt with xt−1 and xt−s, where
s = 4 for quarterly data,
s = 12 for monthly data,
s = 24 for daily effects in hourly data,
s = 168 for weekly effects in hourly data,
etc.
This suggests modeling xt in terms of xt1 and xts.
Pure Seasonal ARMA
The pure seasonal ARMA model has the form
xt = Φ1 xt−s + Φ2 xt−2s + · · · + ΦP xt−Ps
   + wt + Θ1 wt−s + Θ2 wt−2s + · · · + ΘQ wt−Qs.
Notation: ARMA(P, Q)s.
In operator form:
ΦP(B^s)xt = ΘQ(B^s)wt.
ΦP(B^s) and ΘQ(B^s) are seasonal autoregressive and moving average operators.
Multiplicative Seasonal ARMA
The ACF of a pure seasonal ARMA is nonzero only at lags s, 2s, . . . ; most seasonal time series have other nonzero values.
For such series, wt(s) = ΘQ(B^s)⁻¹ ΦP(B^s) xt is not white noise for any choice of P and Q.
But suppose that for some P and Q, wt(s) is ARMA(p, q):
φp(B) wt(s) = θq(B) wt,
where {wt} is white noise.
Then xt satisfies
P (B s)p(B)xt = Q(B s)q (B)wt.
This is the Multiplicative
ARMA(p, q) (P, Q)s.
Seasonal
ARMA
model
The non-seasonal parts p and q control short-term correlations (up to half a season, lag s/2), while the seasonal
parts P and Q control the decay of the correlations over
multiple seasons.
Example: Johnson & Johnson earnings; R analysis
par(mfrow = c(2, 1))
plot(log(jj))
jjl = lm(log(jj) ~ time(jj) + factor(cycle(jj)))
summary(aov(jjl))
jjf = ts(fitted(jjl), start = start(jj),
frequency = frequency(jj))
lines(jjf, col = 2, lty = 2)
jjr = ts(residuals(jjl), start = start(jj),
frequency = frequency(jj))
plot(jjr)
acf(jjr)
pacf(jjr)
PACF is simpler than ACF:
ACF spikes at lags 4, 8, perhaps 12; of these, PACF spikes
only at lag 4;
apart from lags 4, 8, . . . , PACF drops off faster.
(P)ACF indicates neither a simple ARMA nor a simple pure seasonal ARMA4.
PACF suggests ARMA(2, 0) × (1, 0)4:
jja = arima(jjr, order = c(2, 0, 0),
seasonal = list(order = c(1, 0, 0), period = 4))
print(jja)
tsdiag(jja)
Note: the original fit of the straight line and seasonal dummies was by OLS;
possibly inefficient;
invalid inferences (standard errors, etc.).
Solution: refit as part of the time series model.
x = model.matrix( ~ time(jj) + factor(cycle(jj)))
jja = arima(log(jj), order = c(2, 0, 0),
seasonal = list(order = c(1, 0, 0), period = 4),
xreg = x, include.mean = FALSE)
print(jja)
tsdiag(jja)
Notes:
The time series being fitted is the original unadjusted log(jj).
The regressors are specified as the matrix argument xreg.
arima does not check for linear dependence, so we must either
omit one dummy variable from xreg or use include.mean =
FALSE in arima.
Regression parameter estimates are similar to OLS, but standard errors are roughly doubled.
Using SAS: proc arima program and output.
Multiplicative Seasonal ARIMA
The seasonal difference operator is ∇s = 1 − B^s.
Some series show slow decay of ACF only at lags s, 2s, . . . ,
which suggests seasonal differencing.
But note: seasonal means also give slow decay of ACF at
those lags.
The Multiplicative Seasonal ARIMA model, ARIMA(p, d, q) × (P, D, Q)s, is
ΦP(B^s) φp(B) ∇s^D ∇^d xt = ΘQ(B^s) θq(B) wt.
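In R, arima() fits such models through its seasonal argument; a hedged sketch using the quarterly J&J earnings loaded earlier (the orders here are chosen only for illustration):
# a multiplicative seasonal ARIMA fit: ARIMA(0,1,1) x (0,1,1)_4 for log(jj)
jj.fit <- arima(log(jj), order = c(0, 1, 1),
                seasonal = list(order = c(0, 1, 1), period = 4))
jj.fit
tsdiag(jj.fit)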
The Frequency Domain
Time domain methods:
regress present on past;
capture dynamics in terms of velocity (first order), acceleration (second order), inertia, etc.
Frequency domain methods:
regress present on periodic sines and cosines;
capture dynamics in terms of resonant frequencies, etc.
E.g. AR(2):
plot(ts(arima.sim(list(order = c(2,0,0), ar = c(1.5,-.95)), n = 144)))
Strong periodicity, around 16 peaks, i.e. a period of around 9 samples.
Fitting an AR model doesn't describe this directly:
xt = 1.50xt−1 − 0.95xt−2 + wt.
Cyclical Behavior
Simplest case is the periodic process
xt = A cos(2πωt + φ) = U1 cos(2πωt) + U2 sin(2πωt),
where:
A is amplitude;
ω is frequency, in cycles per sample;
φ is phase, in radians;
and U1 = A cos(φ), U2 = −A sin(φ).
Folding Frequency; Aliasing
If ω = 0, xt = A cos(φ), constant.
If ω = 1? At t = 0, 1, 2, . . . , same thing!
ω = 0 is an alias of ω = 1.
All frequencies higher than ω = 1/2 have an alias in 0 ≤ ω ≤ 1/2:
cos[2π(k − ω)t + φ] = cos(2πωt − φ),   t = 0, 1, 2, . . . (k an integer).
ω = 1/2 is the folding frequency.
For example, ω = 0.8:
omega = 0.8;
phi = pi / 6;
plot(function(x) cos(2 * pi * omega * x + phi),
     from = 0, to = 10);
plot(function(x) cos(2 * pi * (1 - omega) * x - phi),
     from = 0, to = 10, add = TRUE, col = "red");
abline(v = 0:10, lty = 2, col = "blue");
Note:
ω = 0.8 = 0.5 + 0.3, and 1 − ω = 0.2 = 0.5 − 0.3;
1 − ω is ω folded around 0.5.
Stationarity
If
xt = A cos(2πωt + φ) = U1 cos(2πωt) + U2 sin(2πωt)
and φ is random, uniformly distributed on [0, 2π), then:
E(xt) = 0,
E(xt+h xt) = (1/2) A² cos(2πωh).
So xt is weakly stationary.
Also
E(U1) = E(U2) = 0,
E(U1²) = E(U2²) = (1/2) A²,
and
E(U1U2) = 0.
Alternatively, if the Us have these properties, xt is stationary with the same mean and autocovariances:
E(xt) = 0,
E(xt+h xt) = (1/2) A² cos(2πωh).
More generally, if
xt = Σ_{k=1}^{q} [Uk,1 cos(2πωk t) + Uk,2 sin(2πωk t)],
where:
the Us are uncorrelated with zero mean;
var(Uk,1) = var(Uk,2) = σk²;
then xt is stationary with zero mean and autocovariances
γ(h) = Σ_{k=1}^{q} σk² cos(2πωk h).
Harmonic Analysis
Any time series sample x1, x2, . . . , xn can be written
xt = a0 + Σ_{j=1}^{(n−1)/2} [aj cos(2πjt/n) + bj sin(2πjt/n)]
if n is odd; if n is even, an extra term is needed.
The periodogram is
P(j/n) = aj² + bj².
The R function spectrum can calculate and plot the periodogram.
R examples:
par(mfcol = c(2, 1));
# one frequency:
x = cos(2*pi*(0.123)*(1:144))
plot.ts(x); spectrum(x, log = "no")
# and a second frequency:
x = x + 2 * cos(2*pi*(0.234)*(1:144))
plot.ts(x); spectrum(x, log = "no")
# and added noise:
x = x + rnorm(144)
plot.ts(x); spectrum(x, log = "no")
# the AR(2) series:
x = ts(arima.sim(list(order = c(2,0,0), ar = c(1.5,-.95)), n = 144))
plot(x); spectrum(x, log = "no")
Using SAS: proc spectra program and output.
tr st
rr ss rqs r str
t s
t t t r t s s sttr
t srs
str st s t rrs r
srt rr rsr
t t srt rr trsr s
t t
t
rqs r t rr r t
rqs
tr rr trsr t s rs trsr
t
t t
Prr
rr s
s rr s rr s P
trs s trs r
tt s r s rt s
r
r t rr s
t str st t
rst t str st s t rt
t t rr
t s r t str st ss t s
t t rr t
rs t rr s s sttr
t str st
t t rr s t rs r
s t r stt
tts t tr
r sttr t srs t t trs
s rs str r str strt
t r
s st ts t s str st
t
r rs ts s s
rtt s t s
Prrts t str st
t s stt
t t str t t s
t t s
r t t t t t
s s
r s rsts t r tr s
s tt t str st t q rss
t t
t tt ts s t s rt
t tr str t srs ts t
tt t rq
t rs r rss rrs t
rs
rs
t
t
ts str st s
rs
s t rs rss
t
rr
ttsrsstr
r
s
t t t t
t ts t
sqr t t
t t
s r r
t rs rt ss
t rs t
s r st ts t t r
tt t r t t rs
ss
sst ts t t r r s
ss tt t r
t t s t
r
t r
ss
rtr tr stts
rt
r rt t
Pr
s rt s sttr
t t t rs r t s
t P
rq s
s r t rr rqs r s ttr
s tr r rq t sr
t
s st t st r s
tr
s tr s sr rr s st
t rt t s qtt rt
t
r Prr
s t t rr
rqs
t s rt s r
t t r Prr
r
s strs st tr tr
r str t s s
r s
ts
r
Pts ts
ts r rt
tstrs t sqr
ts r r
rt tt s
r rt t
tstrs t r
t ts r ts r rrq
ts t t t t r t tr str
ts trs s strt
t r r
t t Prr
s t r
t t s r stt t s
qtt
t s t t s t tr ss rt
trt t r
t t t Prr
t r t s rs ts r
trr ts
r
t
r ts rs
r
s strs st tr tr
t ts sttt s s t rt
rtrr r
r str t s s
r s
ts
r
str
rs t ts t s t
t ts sttt s s t rt s
str rs
ts r
r str
rr tt
s st
tr st t r
tr s t ss st s t
sstt s t rr t t rq
t srs s sttr s tr t
r t r str st t
r ss ss ss t ss st srs
s rsss strs
s tss rq rqs strt strts
r
s strs st tr tr
tstrs t sqr
tt r
tstrs t r
tt r
Tapering
The periodogram works well with data containing only Fourier
frequencies:
w = rnorm(128, sd = 0.01);
x5 = cos(2*pi*(5/128)*(1:128)) + w;
x6 = cos(2*pi*(6/128)*(1:128)) + w;
par(mfcol = c(3, 1), mar = c(2, 2, 1, 1));
spectrum(x5, taper = 0, ylim = c(1e-7, 1e2));
spectrum(x6, taper = 0, ylim = c(1e-7, 1e2));
It doesn't work so well with other frequencies:
x5h = cos(2*pi*(5.5/128)*(1:128)) + w;
spectrum(x5h, taper = 0, ylim = c(1e-7, 1e2));
One solution is to taper the data:
spectrum(x5h, taper = 0.5, ylim = c(1e-7, 1e2))
This works by multiplying the data by a data window:
par(mfcol = c(3, 1), mar = c(2, 2, 1, 1));
plot(tapr(rep(1, 128), 0.25));
plot(x5h);
plot(tapr(x5h, 0.25));
The data window modifies a fraction of the data at each end
of the series, to make the data more nearly continuous when
it is wrapped.
Tapering makes the main peak wider, but much reduces side
lobes.
To see the side lobes, make the periodogram graphs on a
finer grid of frequencies:
par(mfcol = c(2, 1), mar = c(2, 2, 1, 1));
spectrum(x5h, taper = 0.0, ylim = c(1e-7, 1e2), pad = 896)
spectrum(x5h, taper = 0.5, ylim = c(1e-7, 1e2), pad = 896)
The default in R's spectrum (or spec.pgram, which does the
work) is to taper 10% at each end of the data.
t rs
t sttr srs t t rss rs
t t
rss str st s
rss str st s
q
r q r t str q str
rst
q q
qr r
s t r t s s
ts t rrstt
t
st s st
r q r t rs t
s s ts t rrstts t
t
rrts r s sr t trrt t rs
t sqr r
srs t strt t rts t t t
t rq
s s t srs t rt
r t t rq t
t tt r
Ps tr
qr r s s t t sqr rr
t t t rs
st rrt t t ts t
t ts s str rrt rt tr
st r t rts
s t rrt s tr r
t t st
t sr
s s r s t
t t t r t t r t
t rtt s
r s t s str
s t s str s sr rs t
s t rt t s r s
s str srs t t r tts
t t rq t sst rt t t
ts t t t s rq
tr tr
r r rt t srs
t t
str tr s
t t
t
t str tr s st
rtr stt
r t rt s
s t r trs t s
r r r
stt r
s t s r t t stt
sqr r s
t tr r s r
s st sr rst s rt
t r
r t tss
t rqs r s ts rt
trr t s tt t r s st
rqs t s ts s s
t r
t s t sqr r t t tr
st t srs rrtt srs
r
s strs r r
tr st
ts tt t
q s
s r
q s
s r t
ss ts s t s t rt
s
s
r
r
s
rsss trs
tss strt strts rq rqs
rssr trr
tsr strt strtr rq rqr
strs r r
tr st
ts tt t
q s
s r
q s
s r t
stt s
ts tt s t
r trs
r tr ts t
t
t tt t
r tr
r ts r r
t t s t s t t
t
t
t tt s t t s t ts r t s
rss t t tr
t
r t
trs
rr
tsr strt
t
t t
r
tr
t t
r
t
ss
t
s r r r r t tr s rt r s
r r
t t s s
t t
t tt s
t
t
r
r tr
r r
r r
s t rq rss t t tr
tr s t r r t
tr t
r tr
tr
t
tr ss
st rtr
tr rtt
t
r tr
r r t
r r
rq rss t s
Ps
rq rss t r rts
s t t r t
s t s t
t t
stt
s r t s
s tr
r tr
t tr s t s ss tr
t s t s ss tr
trs r rrs
t t
r tr
t s t s t tr s t s
t s t s t tr s rrs t s
r s t tt tr t
t s t tr rrs tr
tt tr
t t s sttr rss t trs
t rs t tt r
t t
r tr
sts
r s tr ts
r s r s
s s t t s t s sttr t
r s r s
t r str st t s
r r
r s r s
ss
r srs
rs
tt str t str sqr
s t t rss r
t t
sts
s t ts
s s
s t s t t s t t r t
sttr t
s s
t rss str st s
s s
s
ss
r r rs
rss str t strrq rss t
t tt t sqr r s
r r rq
t s srs s tr rs tr tr
sqr r s t rqs
rs s s tr
rst r
r
trs
t s
rq rss t s
s
r t t s s
rr
tt s r t
t
tt s r t
t
r t s t s
tr r
t
r tr
rq rss t s
r
s
s
t
rr
tt
r t
tt
r t
tt
r t
s s
t
ss s
t
s s
t
Ps t trts t
tr
t t t
s tr rss t t t s
Prt
t str t tt tr s
s t tr s tt
t s stt
t s t tt t tr s t s t s
rt tr r t t t
t rt tr s t t
t t t t s t qt t t
s s tr
rt rt s
r stt
r
s sr t stt t
rst stt s t
Lagged regression
The fisheries recruitment series (yt) and the Southern Oscillation Index (xt) are cross-correlated with lags of several
months.
Perhaps we can model them as
    y_t = Σ_{r=-∞}^{∞} β_r x_{t-r} + v_t,
where v_t is uncorrelated with x_{t-r} at all lags r. That is,
the coherence between v_t and x_t is zero at all frequencies;
in words: v_t and x_t are incoherent.
1
In terms of filters:
    z_t = Σ_{r=-∞}^{∞} β_r x_{t-r}
is the output of a filter whose input is x_t, and y_t is z_t plus
noise that is incoherent with the input.
If the frequency response function of the filter is
    B(ω) = Σ_{r=-∞}^{∞} β_r e^{-2πiωr},
the spectrum of z_t is
    f_zz(ω) = |B(ω)|² f_xx(ω).
2
Also the cross spectrum is
    f_zx(ω) = B(ω) f_xx(ω).
Now y_t = z_t + v_t, and v_t is incoherent with x_t, and therefore
also with z_t.
So the spectrum of y_t is
    f_yy(ω) = f_zz(ω) + f_vv(ω) = |B(ω)|² f_xx(ω) + f_vv(ω),
and the cross spectrum of y_t and x_t is
    f_yx(ω) = f_zx(ω) = B(ω) f_xx(ω).
3
So B(ω) must satisfy
    B(ω) = f_yx(ω) / f_xx(ω).
Can we find a filter with frequency response function B(ω)?
Typically, yes. If x_t and y_t are such that
    ∫_{-1/2}^{1/2} |B(ω)| dω = ∫_{-1/2}^{1/2} |f_yx(ω) / f_xx(ω)| dω < ∞,
the coefficients are
    β_r = ∫_{-1/2}^{1/2} e^{2πiωr} B(ω) dω ≈ (1/n) Σ_{k=0}^{n-1} e^{2πiω_k r} B(ω_k).
SOI and recruitment
We need B(ω_k) for k = 0, 1, . . . , n − 1, but both R's spectrum()
and SAS's proc spectra omit ω_k for k = 0 and k > n/2.
In R, we can use fft() directly (filter.complex() is presumably a course-supplied helper for smoothing complex-valued series):
dy = fft(rec - mean(rec)) / sqrt(length(rec))
dx = fft(soi - mean(soi)) / sqrt(length(soi))
fyx = filter.complex(dy * Conj(dx), rep(1, 15), sides = 2, circular = TRUE)
fxx = filter.complex(dx * Conj(dx), rep(1, 15), sides = 2, circular = TRUE)
B = fyx/fxx
beta = Re(fft(B, inv = TRUE)) / length(B)
plot(-15:15, c(beta[-1], beta)[length(B) + -15:15], type = "h")
abline(h = 0, lty = 3)
Using the seasonally adjusted series gives a very similar result:
dy = fft(recSA) / sqrt(length(recSA))
dx = fft(soiSA) / sqrt(length(soiSA))
fyx = filter.complex(dy * Conj(dx), rep(1, 15), sides = 2, circular = TRUE)
fxx = filter.complex(dx * Conj(dx), rep(1, 15), sides = 2, circular = TRUE)
B = fyx/fxx
betaSA = Re(fft(B, inv = TRUE)) / length(B)
plot(-15:15, c(betaSA[-1], betaSA)[length(B) + -15:15], type = "h")
abline(h = 0, lty = 3)
In this case, the response is clearly recruitment, and the input
is SOI.
We could reverse the roles: SOI versus recruitment.
fxy = Conj(fyx)
fyy = filter.complex(dy * Conj(dy), rep(1, 15), sides = 2, circular = TRUE)
B = fxy/fyy
beta = Re(fft(B, inv = TRUE)) / length(B)
plot(-15:15, c(beta[-1], beta)[length(B) + -15:15], type = "h")
abline(h = 0, lty = 3)
In the first version, β_r ≈ 0 for r < 0, so the filter is physically
realizable. In other cases, this method may give unrealizable
filters; we can fit the best realizable filter using time domain
methods.
8
Interpreting Coherence
Recall that
    f_yy(ω) = |B(ω)|² f_xx(ω) + f_vv(ω)
            = |f_yx(ω)|² / f_xx(ω) + f_vv(ω)
            = ρ²_yx(ω) f_yy(ω) + f_vv(ω).
So
    f_vv(ω) = [1 − ρ²_yx(ω)] f_yy(ω).
The squared coherence ρ²_yx(ω) is the proportion of the spectrum of
y_t that is explained by the lagged regression on x_t.
9
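A hedged R sketch of this decomposition, assuming the soi and rec series are loaded as in the lagged-regression code above; spec.pgram() applied to the two series together returns both the smoothed spectra and the squared coherence:
sr = spec.pgram(cbind(rec, soi), spans = c(7, 7), taper = 0.1, plot = FALSE);
fyy = sr$spec[, 1];               # smoothed spectrum estimate for rec
coh2 = sr$coh[, 1];               # squared coherence between rec and soi
fvv = (1 - coh2) * fyy;           # estimated noise spectrum in the lagged regression
plot(sr$freq, coh2, type = "l");  # fraction of f_yy explained, frequency by frequency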
Forecasting
The forecasting problem is also a type of lagged regression:
of x_t on its own lags;
and on only the past.
We have seen that the solution is
    x̂_t = Σ_{r=1}^{∞} φ_r x_{t−r},
where the φ's must satisfy
    cov(x_t − x̂_t, x_{t−r}) = 0   for r = 1, 2, . . .
10
That is, w_t = x_t − x̂_t is uncorrelated with all past x's, and
hence with all past w's, and hence is white noise.
So the filter
    w_t = x_t − Σ_{r=1}^{∞} φ_r x_{t−r} = Σ_{r=0}^{∞} π_r x_{t−r}
(with π_0 = 1 and π_r = −φ_r for r ≥ 1) turns x_t into white noise w_t.
So the spectrum f_xx(ω) satisfies
    σ²_w = f_xx(ω) |Σ_{r=0}^{∞} π_r e^{-2πiωr}|²
         = f_xx(ω) ( Σ_{r=0}^{∞} π_r e^{-2πiωr} ) ( Σ_{r=0}^{∞} π_r e^{2πiωr} ).
11
So, taking logarithms:
    log[f_xx(ω)] = log σ²_w − log Σ_{r=0}^{∞} π_r e^{-2πiωr} − log Σ_{r=0}^{∞} π_r e^{2πiωr}.
Now, provided log[f_xx(ω)] is integrable:
    ∫_{-1/2}^{1/2} |log[f_xx(ω)]| dω < ∞,
we can write
    log[f_xx(ω)] = l_0 + 2 Σ_{r=1}^{∞} l_r cos(2πωr)
                 = l_0 + Σ_{r=1}^{∞} l_r e^{-2πiωr} + Σ_{r=1}^{∞} l_r e^{2πiωr}.
12
Some standard complex variable theory implies that we can
match terms:
    log σ²_w = l_0,
    −log Σ_{r=0}^{∞} π_r e^{-2πiωr} = Σ_{r=1}^{∞} l_r e^{-2πiωr},
    −log Σ_{r=0}^{∞} π_r e^{2πiωr} = Σ_{r=1}^{∞} l_r e^{2πiωr}.
13
That is,
    σ²_w = exp(l_0) = exp( ∫_{-1/2}^{1/2} log[f_xx(ω)] dω ),
and
    Σ_{r=0}^{∞} π_r e^{-2πiωr} = exp( −Σ_{r=1}^{∞} l_r e^{-2πiωr} ),
whence for r = 1, 2, . . .
    φ_r = −π_r = −∫_{-1/2}^{1/2} exp( −Σ_{s=1}^{∞} l_s e^{-2πiωs} ) e^{2πiωr} dω.
This is the essence of Kolmogorov's (1941) solution to the
forecasting problem.
14
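A small numerical check of the innovation-variance formula, for an AR(1) spectrum with coefficient and variance chosen purely for illustration (not from the course data):
phi = 0.6; sig2 = 2;                                                 # illustrative values
f = function(omega) sig2 / Mod(1 - phi * exp(-2i * pi * omega))^2;   # AR(1) spectral density
exp(integrate(function(om) log(f(om)), -0.5, 0.5)$value);            # recovers sig2 = 2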
Long Memory Time Series
A time series has short memory if
    Σ_{h=-∞}^{∞} |γ(h)| < ∞.
So a time series for which
    Σ_{h=-∞}^{∞} |γ(h)| = ∞
is said to have long memory.
Why do we care?
Write the mean of x_1, x_2, . . . , x_n as
    x̄_n = (x_1 + x_2 + · · · + x_n) / n.
Then
    var(x̄_n) = (1/n) Σ_{h=-(n-1)}^{n-1} (1 − |h|/n) γ(h)
             = (1/n) Σ_{h=-∞}^{∞} (1 − |h|/n)_+ γ(h),
where (a)_+ = max(a, 0) is a if a ≥ 0 and 0 if a < 0.
2
If Σ_h |γ(h)| < ∞, then
    Σ_{h=-∞}^{∞} (1 − |h|/n)_+ γ(h) → Σ_{h=-∞}^{∞} γ(h)
as n → ∞.
So
    n var(x̄_n) → Σ_{h=-∞}^{∞} γ(h),
or
    var(x̄_n) = (1/n) Σ_{h=-∞}^{∞} γ(h) + o(1/n).
3
That is, for a short memory time series, var(x̄_n) goes to zero
as the sample size increases at the usual 1/n rate, as σ²/n would,
but with a different multiplier.
Note that
    Σ_{h=-∞}^{∞} γ(h) = f(0),
the spectral density f(ω) evaluated at ω = 0.
So we can also write
    var(x̄_n) = f(0)/n + o(1/n):
σ² is replaced by f(0).
4
But if Σ_h |γ(h)| = ∞, this doesn't work.
In practice, many series show var(x̄_n) decaying more slowly.
Plot log[var(x̄_n)] against log(n), and look for a slope of −1.
vartime = function(x, nmax = round(length(x) / 10)) {
  # variance of moving averages of length n, for n = 1, ..., nmax
  v = rep(NA, nmax);
  for (n in 1:nmax) {
    y = filter(x, rep(1/n, n), sides = 1);   # means of n consecutive observations
    v[n] = var(y, na.rm = TRUE);
  }
  plot(log(1:nmax), log(v));
  lmv = lm(log(v) ~ log(1:nmax));            # slope should be near -1 for short memory
  abline(lmv);
  title(paste(deparse(substitute(x)), "; nmax = ", nmax));
  print(summary(lmv));
}
vartime(log(varve))
vartime(globtemp)
vartime(residuals(lm(globtemp ~ time(globtemp))))
5
Fractional Integration
How can we model such series?
Fractionally integrated white noise:
    (1 − B)^d x_t = w_t,   0 < d < 0.5.
The ACF is
    ρ(h) = Γ(h + d) Γ(1 − d) / [Γ(h − d + 1) Γ(d)] ∼ h^{2d−1}.
So for 0 < d < 0.5,
    Σ_{h=-∞}^{∞} |ρ(h)| = ∞.
6
Notes:
var(x̄_n) decays like n^{2d−1}, so
    d = (1 + slope of the variance-time graph) / 2
gives a rough empirical estimate of d.
The spectral density is
    f(ω) = σ²_w [4 sin²(πω)]^{-d},
so for d > 0, f(ω) → ∞ as ω → 0.
7
Also f(ω) ∝ |ω|^{-2d} as ω → 0, so a graph of log[f(ω)] against
log(|ω|) gives another estimate of d.
If d ≥ 0.5, f(ω) is not integrable, so the series is not stationary.
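A hedged sketch of that second estimate, a log-periodogram regression at low frequencies, assuming the varve series is available as in the fracdiff() examples below:
x = log(varve);
n = length(x);
I = Mod(fft(x - mean(x)))^2 / n;        # periodogram at Fourier frequencies j/n
j = 1:floor(sqrt(n));                   # keep only the lowest frequencies
fit = lm(log(I[j + 1]) ~ log(j / n));   # slope is about -2d near zero frequency
-coef(fit)[2] / 2                       # rough estimate of d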
ARFIMA Model
In some long-memory series, autocorrelations at small lags
do not match those of fractionally integrated noise.
We can add ARMA components to allow for such differences;
the ARIMA(p, d, q) model with fractional d, or ARFIMA.
Use the R function fracdiff():
library(fracdiff)
summary(fracdiff(log(varve)))
summary(fracdiff(log(varve), nar = 1, nma = 1))
summary(fracdiff(residuals(lm(globtemp ~ time(globtemp)))))
9
Trend Estimation with ARFIMA errors
The R function fracdiff() does not allow explanatory variables, but we can use it to calculate a profile likelihood function.
E.g. global temperature versus cumulative CO2 emissions:
source("http://www.stat.ncsu.edu/people/bloomfield/courses/st730/co2w.R");
plot(cbind(globtemp, co2w));
slopes = seq(from = 0, to = 1.5, length = 151);
ll2 = rep(NA, length(slopes));
for (i in 1:length(slopes))
ll2[i] = -2 * fracdiff(globtemp - slopes[i] * co2w)$log.likelihood;
plot(slopes, ll2, type = "l");
abline(h = min(ll2) + qchisq(.95, 1));
10
The point estimate is
slopeEst = slopes[which.min(ll2)];
abline(v = slopeEst, col = "red"); # [1] 0.68
and the 95% confidence interval is roughly:
slopeCI = range(slopes[ll2 <= min(ll2) + qchisq(.95, 1)]);
abline(v = slopeCI, col = "red", lty = 2); # [1] 0.41 1.03
The CO2 series was scaled by its change from 1900 to 2000,
so we estimate the 20th century warming as 0.68°C, with a
confidence interval of (0.41°C, 1.03°C) (note the asymmetry:
0.68 (−0.27, +0.35)°C).
Compare with IPCC: 1906–2005 warming is 0.74°C ± 0.18°C.
11
Conditional Heteroscedasticity (CH)
So far, our models are for the conditional mean.
For instance, the Gaussian AR(1) model
    y_t = φ y_{t−1} + w_t
may be written:
Conditionally on y_{t−1}, y_{t−2}, . . . ,
    y_t ∼ N(φ y_{t−1}, σ²_w).
The conditional mean depends on the past, the conditional
variance does not.
1
Three key features:
The conditional distribution is normal;
The conditional mean is a linear function of y_{t−1}, y_{t−2}, . . . ;
The conditional variance is constant: conditional homoscedasticity.
All three features could be changed.
Non-normal noise: typically longer tails; for fitting, provided
the variance is finite, this changes the likelihood function but
not much else.
Nonlinear mean function: Modeling a nonlinear mean is quite
difficult; for instance, ensuring stationarity is restrictive. Threshold models are perhaps most feasible.
Non-constant variance. Two approaches:
ARCH (AutoRegressive CH), GARCH (Generalized ARCH),
...
Stochastic volatility.
3
ARCH Models
Simplest is ARCH(1):
    y_t = σ_t ε_t
    σ²_t = α_0 + α_1 y²_{t−1},
where ε_t is Gaussian white noise with variance 1.
Alternatively:
Conditionally on y_{t−1}, y_{t−2}, . . . ,
    y_t ∼ N(0, α_0 + α_1 y²_{t−1}).
If |y_{t−1}| happens to be large, σ_t is increased, so |y_t| also tends
to be large.
Conversely, if |y_{t−1}| happens to be small, σ_t is decreased, so
|y_t| also tends to be small.
The result: volatility clusters and long tails.
n = 1000; alpha1 = 0.9; alpha0 = 1 - alpha1;
y = epsilon = ts(rnorm(n));
par(mfcol = c(2, 1));
plot(epsilon);
for (i in 2:n) y[i] = epsilon[i] * sqrt(alpha0 + alpha1 * y[i - 1]^2);
plot(y);
5
ARCH as AR
The ARCH(1) model for y_t implies:
    y²_t = σ²_t ε²_t
         = σ²_t + σ²_t (ε²_t − 1)
         = α_0 + α_1 y²_{t−1} + σ²_t (ε²_t − 1),
or
    y²_t = α_0 + α_1 y²_{t−1} + v_t,
where
    v_t = σ²_t (ε²_t − 1).
6
Note that
    E(v_t | y_{t−1}, y_{t−2}, . . . ) = 0,
and hence that for h > 0,
    E(v_t v_{t−h}) = E[ E(v_t v_{t−h} | y_{t−1}, y_{t−2}, . . . ) ]
                  = E[ v_{t−h} E(v_t | y_{t−1}, y_{t−2}, . . . ) ]
                  = 0,
so v_t is (highly non-normal) white noise, and y²_t is AR(1).
For positivity and stationarity, α_0 > 0 and 0 ≤ α_1 < 1, and
unconditionally,
    E(y²_t) = var(y_t) = α_0 / (1 − α_1).
7
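A quick empirical check, a sketch reusing the ARCH(1) series y simulated above (with alpha1 = 0.9): the squared series should behave like an AR(1).
acf(y^2);                                  # decays roughly geometrically, like an AR(1)
ar(y^2, order.max = 1, aic = FALSE)$ar;    # roughly alpha1, though noisy: v_t has very long tails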
Extensions and Generalizations
Extend to ARCH(m):
    y_t = σ_t ε_t
    σ²_t = α_0 + α_1 y²_{t−1} + α_2 y²_{t−2} + · · · + α_m y²_{t−m}.
Now y²_t is AR(m), with the usual restrictions on the α's.
Generalize to GARCH(m, r):
    y_t = σ_t ε_t
    σ²_t = α_0 + Σ_{j=1}^{m} α_j y²_{t−j} + Σ_{j=1}^{r} β_j σ²_{t−j}.
Now y²_t is ARMA[max(m, r), r], with corresponding restrictions
on the α's and β's.
8
Simplest GARCH model: GARCH(1, 1)
The GARCH(1, 1) model is widely used:
    σ²_t = α_0 + α_1 y²_{t−1} + β_1 σ²_{t−1},
with
    α_1 + β_1 < 1
for stationarity.
The unconditional variance is now
    E(y²_t) = var(y_t) = α_0 / (1 − α_1 − β_1).
9
n = 1000; alpha1 = 0.5; beta1 = 0.4; alpha0 = 1 - alpha1 - beta1;
y = epsilon = ts(rnorm(n));
par(mfcol = c(2, 1));
plot(epsilon);
sigmatsq = 1;
for (i in 2:n) {
sigmatsq = alpha0 + alpha1 * y[i - 1]^2 + beta1 * sigmatsq;
y[i] = epsilon[i] * sqrt(sigmatsq);
}
plot(y);
Volatility clusters are more sustained.
10
In SAS, use proc autoreg and the garch option on the model
statement.
In R, explore and describe volatility:
nyse = ts(scan("nyse.dat"));
par(mfcol = c(2, 1));
plot(nyse);
plot(abs(nyse));
lines(lowess(time(nyse), abs(nyse), f = .005), col = "red");
par(mfcol = c(2, 2));
acf(nyse);
acf(abs(nyse));
acf(nyse^2);
11
In R, fit GARCH (default is 1,1):
library(tseries);
nyse.g = garch(nyse);
summary(nyse.g);
plot(nyse.g);
par(mfcol = c(1, 1));
plot(nyse);
matlines(predict(nyse.g), col = "red", lty = 1);
12
GARCH with a unit root: IGARCH
A special case: α_1 + β_1 = 1 gives IGARCH(1, 1), that is, GARCH(1, 1) with
    y_t = σ_t ε_t
    σ²_t = α_0 + (1 − β_1) y²_{t−1} + β_1 σ²_{t−1}.
Solving recursively with α_0 = 0:
    σ²_t = (1 − β_1) Σ_{j=1}^{∞} β_1^{j−1} y²_{t−j},
an exponentially weighted moving average of y²_t.
13
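A minimal sketch of that recursion used as a volatility estimate; the function name and the choice lambda = 0.94 are illustrative, not from the course, and lambda plays the role of beta1:
ewma_vol = function(y, lambda = 0.94, s0 = var(y)) {
  s = numeric(length(y)); s[1] = s0;
  for (t in 2:length(y)) s[t] = (1 - lambda) * y[t - 1]^2 + lambda * s[t - 1];
  sqrt(s);
}
plot(ewma_vol(nyse), type = "l");   # nyse as loaded earlier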
Tail Length
All xARCH models give y_t with fat tails:
    y_t = σ_t ε_t where ε_t ∼ N(0, 1),
so
    f_y(y) = ∫ (1/σ) f_ε(y/σ) f_σ(σ) dσ:
f_y(·) is a mixture of Gaussian densities with the same mean and
different variances (here f_σ denotes the density of σ_t).
In practice, residuals in xARCH models may not be normal,
but are usually closer to normal than the original data.
14
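A small check of the fat-tail claim, reusing the GARCH(1, 1) series y simulated a few slides back; the kurtosis function is an ad hoc helper, not from a package:
kurt = function(x) mean((x - mean(x))^4) / mean((x - mean(x))^2)^2;   # sample kurtosis
kurt(rnorm(100000));   # about 3 for Gaussian data
kurt(y);               # typically well above 3 for the GARCH series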
R Update, Fall 2011
Shumway and Stoffer's code for Example 5.3 does not work
with the R garch function.
The fGarch package provides another method, garchFit, which
allows simultaneous fitting of ARMA and GARCH models.
15
gnp96 = read.table("http://www.stat.pitt.edu/stoffer/tsa2/data/gnp96.dat");
gnpr = ts(diff(log(gnp96[, 2])), frequency = 4, start = c(1947, 1));
library(fGarch);
gnpr.mod = garchFit(gnpr ~ arma(1, 0) + garch(1, 0), data.frame(gnpr = gnpr));
summary(gnpr.mod);
Title:
GARCH Modelling
Call:
garchFit(formula = gnpr ~ arma(1, 0) + garch(1, 0),
data = data.frame(gnpr = gnpr))
Mean and Variance Equation:
data ~ arma(1, 0) + garch(1, 0)
[data = data.frame(gnpr = gnpr)]
Conditional Distribution:
norm
16
Coefficient(s):
         mu         ar1       omega      alpha1
 0.00527795  0.36656255  0.00007331  0.19447134

Std. Errors:
based on Hessian

Error Analysis:
         Estimate  Std. Error  t value  Pr(>|t|)
mu      5.278e-03   8.996e-04    5.867  4.44e-09 ***
ar1     3.666e-01   7.514e-02    4.878  1.07e-06 ***
omega   7.331e-05   9.011e-06    8.135  4.44e-16 ***
alpha1  1.945e-01   9.554e-02    2.035    0.0418 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
722.2849    normalized:  3.253536
17
Standardised Residuals Tests:
                                  Statistic    p-Value
 Jarque-Bera Test    R    Chi^2   9.118036     0.01047234
 Shapiro-Wilk Test   R    W       0.9842405    0.01433578
 Ljung-Box Test      R    Q(10)   9.874326     0.4515875
 Ljung-Box Test      R    Q(15)   17.55855     0.2865844
 Ljung-Box Test      R    Q(20)   23.41363     0.2689437
 Ljung-Box Test      R^2  Q(10)   19.2821      0.03682245
 Ljung-Box Test      R^2  Q(15)   33.23648     0.004352734
 Ljung-Box Test      R^2  Q(20)   37.74259     0.009518987
 LM Arch Test        R    TR^2    25.41625     0.01296901

Information Criterion Statistics:
      AIC       BIC       SIC      HQIC
-6.471035 -6.409726 -6.471669 -6.446282
18
garchFit also provides many diagnostic plots:
plot(gnpr.mod);
19
Threshold Models
A simple form of nonlinear model, basically a switching AR(p):
    x_t = φ_0^(j) + φ_1^(j) x_{t−1} + · · · + φ_p^(j) x_{t−p} + σ^(j) w_t   if x_{t−1} ∈ R_j,
where x_{t−1} = (x_{t−1}, . . . , x_{t−p})′, R_1, R_2, . . . , R_r is
a partition of R^p, and w_t is white noise with variance 1.
That is, the AR(p) parameters in the equation for x_t change,
depending on the values of the previous p observations
x_{t−1}, . . . , x_{t−p}.
1
Assuming equal variances, we can estimate using regression.
E.g. for monthly pneumonia and influenza deaths:
flu = ts(scan("flu.dat"));
dflu = diff(flu);
a = dflu;
for (l in 1:6)
a = cbind(a, lag(dflu, -l));
a = cbind(a, lag(dflu, -1) > 0.05);
a = data.frame(a);
names(a) = c("x", paste("x", 1:6, sep = ""), "delta");
summary(lm(x ~ delta + x1*delta + x2*delta + x3*delta + x4*delta +
x5*delta + x6*delta, data = a));
flu.l = lm(x ~ -1 + delta + x1*delta + x2*delta + x3*delta + x4*delta,
data = a);
summary(flu.l);
flu.r = residuals(flu.l);
delta = a$delta[4 + 1:length(flu.r)];
lapply(split(flu.r, delta), sd);
acf(flu.r);
2
This is inefficient, and standard errors are invalid, if variances
are unequal; here F = 1.93, df = (17, 110), P = .022.
We can also fit the model using two separate regressions:
flu.lF = lm(x ~ x1 + x2 + x3 + x4, data = a, subset = (delta == 0));
summary(flu.lF);
flu.lT = lm(x ~ x1 + x2 + x3 + x4, data = a, subset = (delta == 1));
summary(flu.lT);
Setting up a residual series:
flu.r01 = flu.r;
flu.r01[!delta] = residuals(flu.lF) / sd(residuals(flu.lF));
flu.r01[delta] = residuals(flu.lT) / sd(residuals(flu.lT));
acf(flu.r01);
3
Regression with Autocorrelated Errors
Regression model
    y_t = β′z_t + x_t,
where the error series x_t has covariance matrix Γ.
Generalized least squares (GLS) estimate for known Γ:
    β̂ = (Z′Γ⁻¹Z)⁻¹ Z′Γ⁻¹y.
For unknown Γ, plug in an estimate.
4
If x_t is a stationary time series, we can fit using OLS, then either:
get estimated autocovariances γ̂(h) from the residuals, and plug in the resulting Γ̂;
or use the Cochrane-Orcutt method.
More generally, we can use mixed model methods (SAS proc mixed); an R sketch follows.
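One R option along these lines is nlme::gls(), sketched here for the globtemp/co2w regression used elsewhere in these notes, with AR(1) errors assumed purely for illustration:
library(nlme);
d = data.frame(y = as.numeric(globtemp), x = as.numeric(co2w));
summary(gls(y ~ x, data = d, correlation = corAR1()));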
Cochrane and Orcutt suggested:
fit using OLS to get an initial estimate of β;
fit an AR(p) to the OLS residuals (w_t is white noise):
    φ(B) x_t = w_t;
transform the regression to
    φ(B) y_t = β′φ(B) z_t + φ(B) x_t = β′φ(B) z_t + w_t,
or
    u_t = β′v_t + w_t.
The residuals are now white, so fit using OLS; a one-pass R sketch follows.
6
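A hedged sketch of one Cochrane-Orcutt pass for the same globtemp/co2w regression, assuming AR(1) errors:
ols = lm(globtemp ~ co2w);                                 # initial OLS fit
phi = ar(residuals(ols), order.max = 1, aic = FALSE)$ar;   # AR(1) fitted to the residuals
n = length(globtemp);
yt = globtemp[-1] - phi * globtemp[-n];                    # phi(B) applied to y
zt = co2w[-1] - phi * co2w[-n];                            # phi(B) applied to z
summary(lm(yt ~ zt));                                      # refit; errors now roughly white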
SAS proc arima offers a better solution:
Mortality and air pollution (Example 5.6): program and
output.
Global temperature and cumulative CO2 emissions: program and output.
In R, temperature (slightly different) and CO2:
arima(globtemp, order = c(1, 0, 0), xreg = co2w);
arima(globtemp, order = c(4, 0, 0), xreg = co2w);
arima(globtemp, order = c(0, 0, 4), xreg = co2w);
Lagged Regression again: Transfer Functions
To forecast an output series y_t given its own past and the
present and past of an input series x_t, we might use
    y_t = Σ_{j=0}^{∞} α_j x_{t−j} + η_t = α(B) x_t + η_t,
where the noise η_t is uncorrelated with the inputs.
This generalizes regression with correlated errors by including lags, and specializes the frequency domain lagged regression by excluding future inputs.
Preliminary estimation of α_0, α_1, . . . often suggests a parsimonious model
    α(B) = B^d ω(B) / δ(B),
where:
d is the pure delay: α_0 = α_1 = · · · = α_{d−1} = 0 and α_d ≠ 0;
ω(B) and δ(B) are low-order polynomials: the denominator δ(B) is needed
if the α's decay exponentially, and the numerator ω(B) is needed if the
first few nonzero α's do not follow that decay.
Preliminary estimates from frequency domain method, or a
similar time domain method.
2
Time Domain Preliminary Estimates
If the input series x_t were white noise, the cross covariance would be
    γ_{y,x}(h) = E(y_{t+h} x_t)
              = E[ ( Σ_{j=0}^{∞} α_j x_{t+h−j} + η_{t+h} ) x_t ]
              = α_h var(x_t),
so γ̂_{y,x}(h) divided by the sample variance of x_t provides an estimate of α_h.
Usually, x_t is not white noise, but if it is a stationary time
series, we know how to make it white: fit an ARMA model.
3
Prewhitening
Suppose that x_t is ARMA:
    φ(B) x_t = θ(B) w_t,
where w_t is white noise.
Apply the prewhitening filter φ(B)θ(B)⁻¹ to the lagged regression equation:
    ỹ_t = Σ_{j=0}^{∞} α_j w_{t−j} + η̃_t,
where ỹ_t = [φ(B)θ(B)⁻¹] y_t and η̃_t = [φ(B)θ(B)⁻¹] η_t.
4
Now the cross correlation of ỹ_t with w_t provides an estimate of α_h (up to a constant of proportionality).
You can use SAS's proc arima to do this:
first identify and estimate a model for xt;
then identify yt with xt as a crosscorr variable.
At the second step, SAS uses the prewhitening filter from
the first step to filter both xt and yt before calculating cross
correlations.
Note: SAS announces that both series have been prewhitened,
but the filter is designed to prewhiten only xt; yt is filtered,
but typically not prewhitened.
5
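The same prewhitening idea can be sketched directly in R for the SOI and recruitment series (assumed loaded as before); an AR(1) model for the input is used here only for illustration, so the prewhitening filter is just phi(B):
fit = arima(soi, order = c(1, 0, 0));                       # simple AR model for the input
phi = coef(fit)["ar1"];
w = residuals(fit);                                         # approximately white input
ytilde = filter(rec - mean(rec), c(1, -phi), sides = 1);    # same filter applied to the output
ccf(ytilde[-1], w[-1]);                                     # cross correlations trace out the alpha_h pattern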
Finally estimate the model for yt, specifying the input series,
in the form:
input = (d$(L1,1, L1,2, . . . ) . . . (Lk,1, . . . )
/(Lk+1,1, . . . ) . . . (. . . )variable)
E.g. for Southern Oscillation and the fisheries recruitment
series: program and output.
E.g. for global temperature and an estimated historical forcing series: program and output.
Interpreting a Transfer Function
For the global temperature case, we have
    y_t = 0.087917 (x_t + 0.79513 x_{t−1} + 0.79513² x_{t−2} + · · · ) + η_t.
So the effect of an impulse in the forcing x_t, say a dip due
to a volcanic eruption, is felt in the current year and several
subsequent years, with a mean delay of 1/(1 − 0.79513) ≈ 4.9
years.
Also, the effect of a sustained change of +4.4 W/m² would
be
    0.087917 × 4.4 × (1 + 0.79513 + 0.79513² + · · · )
      = 0.087917 × 4.4 / (1 − 0.79513)
      ≈ 1.9°C.
This is the expected forcing for a doubling of CO2 over preindustrial levels, and the temperature response is called the
climate sensitivity. The IPCC states:
Analysis of models together with constraints from
observations suggest that the equilibrium climate sensitivity is likely to be in the range 2°C to 4.5°C, with a
best estimate value of about 3°C. It is very unlikely to
be less than 1.5°C.
8
Our estimate is at the low end of that range, but quantifying
its uncertainty is difficult using proc arima.
The profile likelihood for climate sensitivity, constructed using a grid search in R (with p = 4), gives an estimated value
of 1.85°C and 95% confidence limits of 1.44°C to 2.27°C.
[Figure: -2 log-likelihood contours for climate sensitivity (y-axis, 1.5 to 2.5) and decay factor (x-axis, 0.4 to 0.9).]
10
[Figure: -2 log-likelihood (ll2) profile for climate sensitivity (x-axis: 4.4 * theta, 1.5 to 2.5).]
11
[Figure: -2 log-likelihood (ll2) profile for the decay factor (x-axis: lambda, 0.4 to 0.9).]
12
ARMAX Models
Vector (multivariate) regression:
output vector
    y_t = (y_{t,1}, y_{t,2}, . . . , y_{t,k})′;
input vector
    z_t = (z_{t,1}, z_{t,2}, . . . , z_{t,r})′.
1
Regression equation:
    y_{t,i} = β_{i,1} z_{t,1} + β_{i,2} z_{t,2} + · · · + β_{i,r} z_{t,r} + w_{t,i},
or in vector form
    y_t = B z_t + w_t.
Here {w_t} is multivariate white noise:
    E(w_t) = 0,   cov(w_{t+h}, w_t) = Σ_w if h = 0, and 0 if h ≠ 0.
Given observations for t = 1, 2, . . . , n, the least squares estimator of B, also the maximum likelihood estimator when
{w_t} is Gaussian white noise, is
    B̂ = Y′Z (Z′Z)⁻¹,
where Y is the n × k matrix with rows y_1′, y_2′, . . . , y_n′ and Z is the n × r matrix with rows z_1′, z_2′, . . . , z_n′.
ML estimate of Σ_w (replace n with n − r for unbiased):
    Σ̂_w = (1/n) Σ_{t=1}^{n} (y_t − B̂ z_t)(y_t − B̂ z_t)′.
3
Information criteria:
Akaike:
    AIC = ln|Σ̂_w| + (2/n) [kr + k(k + 1)/2];
Schwarz:
    SIC = ln|Σ̂_w| + (ln n / n) [kr + k(k + 1)/2];
Bias-corrected AIC (incorrect in Shumway & Stoffer):
    AICc = ln|Σ̂_w| + 2 [kr + k(k + 1)/2] / (n − kr − 1).
Vector Autoregression
E.g., VAR(1):
    x_t = α + Φ x_{t−1} + w_t.
Here Φ is a k × k coefficient matrix, and {w_t} is Gaussian
multivariate white noise.
This resembles the vector regression equation, with:
    y_t = x_t,   B = [α, Φ],   z_t = (1, x_{t−1}′)′.
5
Observe x_0, x_1, . . . , x_n, and condition on x_0.
Maximum conditional likelihood estimators of B and Σ_w are the same as for ordinary vector regression.
VAR(p) is similar, but we must condition on the first p observations.
Full likelihood = conditional likelihood × likelihood derived
from the marginal distribution of the first p observations, and is difficult to use.
Example: 1-year, 5-year, and 10-year weekly interest rates
Data from http://research.stlouisfed.org/fred2/series/WGS1YR/,
etc.
a = read.csv("WGS1YR.csv");
WGS1YR = ts(a[,2]);
a = read.csv("WGS5YR.csv");
WGS5YR = ts(a[,2]);
a = read.csv("WGS10YR.csv");
WGS10YR = ts(a[,2]);
a = cbind(WGS1YR, WGS5YR, WGS10YR);
plot(a);
plot(diff(a));
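Before turning to dse, note that the conditional-likelihood VAR(1) fit is just a multivariate regression, so lm() with a matrix response gives the same kind of estimate (a sketch; the dse fit below should broadly agree):
da = diff(a);
Y = da[-1, ];                    # y_t = x_t, t = 2, ..., n
Z = cbind(1, da[-nrow(da), ]);   # z_t = (1, x_{t-1}')'
fit = lm(Y ~ Z - 1);             # multivariate least squares
t(coef(fit));                    # each row: intercept and lag-1 coefficients for one output series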
Use the dse package to fit VAR(1) and VAR(2) models to
differences:
library(dse);
b = TSdata(output = diff(a));
b1 = estVARXls(b, max.lag = 1);
cat("VAR(1)\n print method:\n");
print(b1);
cat("\n summary method:\n");
print(summary(b1));
b2 = estVARXls(b, max.lag = 2);
cat("\nVAR(2)\n print method:\n");
print(b2);
cat("\n summary method:\n");
print(summary(b2));
VAR(1)
print method:
neg. log likelihood = -7188.785
A(L) =
    1-1.014698L1       0-0.02482398L1     0-0.0144053L1
    0+0.05794167L1     1-0.9224325L1      0+0.03872528L1
    0-0.04292339L1     0-0.05304638L1     1-1.024605L1
B(L) =
    1  0  0
    0  1  0
    0  0  1
summary method:
neg. log likelihood = -7188.785
sample length = 2448
          WGS1YR   y.WGS5YR    WGS10YR
RMSE   0.2005654  0.1713752  0.1563661
ARMA: model estimated by estVARXls
inputs :
outputs: WGS1YR y.WGS5YR WGS10YR
9
input dimension = 0
output dimension = 3
order A = 1
order B = 0
order C =
9 actual parameters
6 non-zero constants
trend not estimated.
VAR(2)
print method:
neg. log likelihood = -7414.944
A(L) =
    1-1.329215L1+0.3221239L2        0+0.1030711L1-0.05850615L2      0-0.1539836L1+0.1172694L
    0-0.07336772L1+0.05027099L2     1-1.117284L1+0.1974304L2        0-0.1148573L1+0.0577710
    0+0.0002002881L1-0.01317073L2   0-0.02287398L1+0.06233586L2     1-1.252808L1+0.226
B(L) =
    1  0  0
    0  1  0
    0  0  1
summary method:
neg. log likelihood = -7414.944
sample length = 2448
          WGS1YR   y.WGS5YR    WGS10YR
RMSE   0.1910442  0.1666275  0.1534016
ARMA: model estimated by estVARXls
inputs :
outputs: WGS1YR y.WGS5YR WGS10YR
input dimension = 0
output dimension = 3
order A = 2
order B = 0
order C =
18 actual parameters
6 non-zero constants
trend not estimated.
AIC is smaller (more negative) for VAR(2), but SIC is smaller
for VAR(1).
For VAR(1),
    Φ̂1 =
        0.3288773    0.1534516    0.136938
        0.08581201   0.004959931  0.08875425
        0.06575108   0.04152504   0.2406055
Largest off-diagonal elements are (1,3) and (2,3), suggesting
that changes in the 10-year rate are followed, one week later,
by changes in the same direction in the 1-year and 5-year
rates.
10