Time Series StudyBook
Written by
Dr Peter Dunn
Department of Mathematics & Computing
Faculty of Sciences
The University of Southern Queensland
Published by
http://www.usq.edu.au
Copyrighted materials reproduced herein are used under the provisions of the
Copyright Act 1968 as amended, or as a result of application to the copyright
owner.
Produced using LaTeX in the USQ style by the Department of Mathematics and
Computing.
Table of Contents

1 Introduction
4 arma Models
5 Finding a Model
10 Introduction
Module 1. Introduction
Module contents
1.1 Introduction
1.2 Time-series
1.2.1 Definitions
1.2.2 Purpose
1.2.3 Notation
1.3 Signal and noise
1.4 Simple methods
1.5 Software
1.5.1 The r package
1.5.2 Getting help in r
1.6 Exercises
1.6.1 Answers to selected Exercises
Module objectives
know the particular kinds of time series being discussed in this course;
recognise the reasons for finding statistical models for time series;
understand that the signal of a time series can be modelled and that
the noise is random;
1.1 Introduction
This Module introduces time series and associated terminology. Some sim-
ple methods are discussed for analysing time series, and the software used
in the course is also introduced.
1.2 Time-series
1.2.1 Definitions
Example 1.1: The monthly Southern Oscillation Index (the SOI) is avail-
able for approximately the last 130 years. A plot of the monthly av-
erage SOI (Fig. 1.1) has time on the horizontal axis, and the SOI on
the vertical axis. Generally, the observations are joined with a line
to indicate that the points are given in a particular order. (Note the
horizontal line at zero was added by me, and is not part of the default
plot.)
Example 1.2: The seasonal SOI can also be examined. This series cer-
tainly does not consist of independent observations. The seasonal SOI
can be plotted against the SOI for the previous season, the season
before that, and so on (Fig. 1.2).
There is a reasonably strong relationship between the seasonal SOI
and the previous season. The relationship between the SOI and the
season before that is still obvious; it is less obvious (but still present)
with three seasons previous. There is basically no relationship between
the seasonal SOI and the SOI four seasons previous.
Example 1.3:
Consider the annual rainfall near Wendover, Utah, USA. (These data
are considered in more detail in Example 7.1.) A plot of the data
(Fig. 1.3, top panel) suggests a non-stationary mean (the mean goes
up and down a little). To check this, a smoothing filter was applied
Figure 1.1: A time-plot of the monthly average SOI. Top: the SOI from 1876 to 2001; Bottom: the SOI since 1980, showing more detail. (In this example, the SOI has been plotted using las=1; this just makes the labels on the vertical axis easier to read in my opinion, but is not necessary.)
Figure 1.2: The seasonal SOI plotted against previous values of the SOI.
that computed the mean of each set of six observations at a time. This
smooth gave the thick, dark line in the bottom panel of Fig. 1.3, and
suggests that the mean is perhaps non-stationary as this line is not
(approximately) constant. However, it is not too bad. The middle
panel of Fig. 1.3 shows a series that is definitely non-stationary. This
series—the average monthly sea-level at Darwin—is not stationary as
the mean obviously fluctuates. However, the SOI from 1876 to 2001,
plotted in the top panel of Fig. 1.3 (and seen in Example 1.1), is
approximately stationary.
All the time series considered in this part of the course will be equally spaced (or regular). These are time series recorded at regular intervals: every day, year, month, second, and so on. Until Module 8, the time series considered all contain continuous data. In addition, only stationary time series will be considered initially (until Module 7).
1.2.2 Purpose
Figure 1.3: Stationary and non-stationary time series. Bottom: the annual rainfall (in mm) near Wendover, Utah, USA. The data are plotted with a thin line and the smoothed data with a thick line, indicating that the mean is perhaps non-stationary. Middle: the monthly average sea level (in metres) in Darwin, Australia. The data are definitely not stationary, as the mean fluctuates. Top: the average monthly SOI from 1876 to 2001. This series looks approximately stationary.
Example 1.4: Consider the average monthly sea level (in metres) in Dar-
win, Australia (Fig. 1.3, middle panel).
Any useful model for this time series would need to capture the im-
portant features of this time series. What are the important features?
One obvious feature is that the series has a cyclic pattern: the average
sea level rises and falls on a regular basis. Is there also an indication
that the average sea level has been rising since about 1994? Any good
model should capture these important features of the data. As noted
in the previous Example, the series is not stationary.
Methods for modelling and forecasting time series are well established and rigorous, and are sometimes quite accurate, but they have limitations that should be kept in mind.
1.2.3 Notation
The notation Xt (or Xn , or similar) is used to indicate the value of the time
series X at a particular point in time t. For different values of t, values of
the time series at different points in time are indicated. That is, Xt+1 refers
to the next term in the series following Xt .
The entire series is usually written {Xn}n≥1, indicating the variable X is a time sequence of numbers. Sometimes, the upper and lower limits are specified explicitly, as in {Xn}n=1,...,1000. Quite often, the notation is abbreviated so that

{Xn} ≡ {Xn}n≥1.
The observed and recorded time series, say {Xn}, consists of two components:

1. The signal, say {Sn}: the component of the data that contains information. This is the component of the time series that can be forecast.

2. The noise: the random component of the data, which cannot be forecast.
The task of the scientist is to extract the signal (or information) from the
time series in the presence of noise. There is no way of knowing exactly what
the signal is; instead, statistical methods are used to separate the random
noise from the forecastable signal. There are many methods for doing this; in
this course, one of those methods will be studied in detail: the Box–Jenkins
method. Some other simple models are discussed in Sect. 1.4; more complex
methods are discussed in Module 9.
Figure 1.4: The monthly Pacific Decadal Oscillation (PDO) from Jan 1980
to Dec 2000 in the top plot. Middle: a lowess smooth is shown superimposed
over the PDO. Bottom: the noise is shown (observations minus signal).
A lowess smoother can be applied to the data1. (The details are not important; it is simply one type of smoother.) For one set of parameters, the smooth is shown in Fig. 1.4 (middle panel). The smoother captures the important features of the time series and ignores the random noise. The noise is shown in Fig. 1.4 (bottom panel) and, if the smooth is good, should be random. (In this example, the noise does not appear random, and so the model is probably not very good.)
One difficulty with using smoothers is that they have limited use for forecasting into the future, as the fitted smoother applies only to the given data. Consequently, other methods are considered here.
1.4 Simple methods

Many methods exist for modelling time series. These notes concentrate on the Box–Jenkins method, though some other methods will be discussed briefly at the end of the course.

In this section, a variety of simple methods for forecasting is first discussed. Importantly, in some situations these simple methods are also the best available. Even when that is the case, it may not be obvious; it might require some careful statistical analysis to show that a simple model is the best model.
Slope estimation If the time series appears to have a linear trend, it may
be appropriate to estimate this trend by fitting a straight line by linear
regression. Future values can then be forecast by extrapolating this
line.
1
Many statisticians would probably not identify a smoother as a statistical model (in
fact, I am one of them). But the use of a smoother here demonstrates a point.
Random walk model In some cases, the best estimate of a future value
is the most recent observation. This model is called a random walk
model. For example, the best forecast of the future price of a share is
usually quite close to the present price.
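As a sketch of these two simple methods in r (the data values and object names here are hypothetical, chosen only for illustration):

```r
# Hypothetical short annual series for illustration
x <- ts(c(300, 320, 310, 350, 340, 370, 360, 390), start = 1990)

# Slope estimation: fit a straight line by linear regression,
# then extrapolate the line to forecast the next year
t <- as.numeric(time(x))
fit <- lm(as.numeric(x) ~ t)
predict(fit, newdata = data.frame(t = 1998))

# Random walk model: the forecast is simply the most recent observation
tail(as.numeric(x), 1)
```
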
1.5 Software
This course uses the free software package r. r is a free, open source soft-
ware project which is “not unlike” S-Plus, an expensive commercial software
package. r is available for many operating systems from http://cran.
r-project.org/, or http://mirror.aarnet.edu.au/pub/CRAN/ for resi-
dents of Australia and New Zealand. More information about r, including
documentation, is found at http://www.r-project.org/. r is command
line driven like Matlab, but has a statistical rather than mathematical
focus.
r is object oriented. This means that to get the most benefit from r, objects should be correctly defined. For example, time series data should be declared as time series data. When r knows that a particular data set is a time series, it has default mechanisms for working with the data. For example, plotting data in r generally produces a dot-plot; if the data are declared as time series data, the observations are joined by lines, which is the standard way of plotting time series data. The following example explains some of these details.
Example 1.6: In Example 1.1, the monthly average SOI was plotted. Assuming the current folder (or directory) is set as described above, the following code reproduces this plot.
> summary(soidata)
> soidata[1:5, ]
> names(soidata)
This shows the dataset (or object) soidata consists of four different
variables. The one of interest now is soi, and this variable is referred
to (and accessed) as soidata$soi. To use this variable first declare it
as a time series object:
The first argument is the name of the variable. The input start indi-
cates the time when the data starts. For the SOI data, the data starts
at January 1876, which is input to r as c(1876, 1) (the one means
January, the first month). The command c means ‘concatenate’, or
join together. The data set ends at February 2002; if an end is not
defined, r should be able to deduce it anyway from the rest of the
given information. But make sure you check your time series to ensure
r has interpreted the input correctly. The argument frequency indi-
cates that the data have a cycle of twelve (that is, each twelve points
make one larger grouping—here twelve months make one year).
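Based on the description above, the declaration would look something like this (a sketch; the object name soi is my choice, not necessarily the original):

```r
# Declare the SOI variable as a monthly time series starting January 1876
soi <- ts(soidata$soi, start = c(1876, 1), frequency = 12)
```
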
Now plot the data:
The plot (Fig. 1.5, top panel) is formatted correctly for time series
data. (The command abline(h=0) adds a horizontal line at y = 0.)
In contrast, if the data is not declared as time series data, the default
plot appears as the bottom panel in Fig. 1.5.
When the data are declared as a time series, the observations are
plotted and joined by lines and the horizontal axis is labelled Time by
default (the axis label is easily changed using the command:
title(ylab="New y-axis label")).
Other methods also have a standard default if the data have been
declared as a time series object.
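The plotting commands referred to above might be the following (assuming the declared time series object is called soi; the name is hypothetical):

```r
# Plot the time series; las = 1 makes the vertical axis labels horizontal
plot(soi, las = 1, ylab = "SOI")
abline(h = 0)   # add the horizontal line at zero
```
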
Figure 1.5: Plots of the monthly average SOI from 1876 to 2001. Top: the data have been declared as a time series; Bottom: the data have not been declared as a time series.
In the above example, the data was available in a file. If it is not available, a data file can be created, or the data can be entered into r. The following commands show the general approach to entering data in r. The command c is very useful: it is used to create a list of numbers, and stands for 'concatenate'.
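A sketch of these commands (the observation values are hypothetical):

```r
# First line: put the observations into a list called data.values
data.values <- c(4.3, 5.1, 4.8, 5.6, 5.2)
# Second line: designate the data as an annual time series starting in 1980
data.series <- ts(data.values, start = 1980)
```
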
The first line puts the observation into a list called data.values. The second
line designates the data as a time series starting in 1980 (and so r assumes
the values are annual measurements).
You can also use scan(); see ?scan. Data stops being read when a blank line is entered if you use scan.
1.5.2 Getting help in r

Two commands of particular interest are help and help.search.
The help command gives help on a particular topic. For example, try typing
help("names") or help("plot") at the r command prompt. (The quotes
are necessary.) A short-cut is also available: typing ?names is equivalent to
typing help("names"). Using the short-cut is generally more convenient.
The command help.search searches the help database for particular words.
For example, try typing help.search("eigen") to find how to evaluate
eigenvalues in r. (The quotes are necessary.) This function requires a rea-
sonably specific search phrase. The command help.start starts the r help
in a web browser (if everything is configured correctly).
1.6 Exercises
Ex. 1.7: Start r and load in the data file qbo.dat. This data file is a time
series of the monthly quasi-biennial oscillation (QBO) from January
1948 to December 2001.
Ex. 1.8: Start r and load in the data file easterslp.dat. This data file is
a time series of sea-level air pressure anomalies at Easter Island from
Jan 1951 to Dec 1995.
Ex. 1.9: Obtain the maximum temperature from your town or residence for
as far back as possible up to, say, thirty days. This may be obtained
from a newspaper or website.
Ex. 1.10: The data in Table 1.1 shows the mean annual levels at Lake
Victoria Nyanza from 1902 to 1921, relative to a fixed reference point
(units are not given). The data are from Shaw [41], as quoted in
Hand [19].
Table 1.1: The mean annual level of Lake Victoria Nyanza from 1902 to
1921 relative to some fixed level (units are unknown).
(b) Plot the data. Make sure you give appropriate labels.
(c) List important features in the data (if any) that should be mod-
elled.
Ex. 1.11: Many people believe that sunspots affect the climate on the
earth. The mean number of sunspots from 1770 to 1869 for each year
are given in the data file sunspots.dat and are shown in Table 1.2.
(The data are from Izenman [23] and Box & Jenkins [9, p 530], as
quoted in Hand [19]).
(a) Enter the data into r as a time series by loading the data file
sunspots.dat.
(b) Plot the data. Make sure you give appropriate labels.
(c) List important features in the data (if any) that should be mod-
elled.
Here the square brackets [ . . . ] have been used; they are used by r to indicate elements of an array or matrix2. (Note that start must have numeric inputs, so qbo$Month[1] will not work, as it returns Jan, which is a text string.)

It is worth printing out qbo to ensure that r has interpreted your statements correctly. Type qbo at the prompt, and in particular check that the series ends in December 2001.
(c) The following code plots the graph:
> plot(qbo, las = 1, xlab = "Time", ylab = "Quasi-biennial oscillation",
+ main = "QBO from 1948 to 2001")
The final plot is shown in Fig. 1.6.
1.10 Here is one way of doing the problem. (Note: The data can be entered
using scan or by typing the data into a data file and loading the usual
way. Here, we assume the data is available as the object llevel.)
Figure 1.7: The mean annual level of Lake Victoria Nyanza from 1902 to
December 1921. The figures are relative to some fixed level and units are
unknown.
The final plot is shown in Fig. 1.7. There is too little data to be sure of any patterns or features to be modelled, but the series suggests there may be some regular up-and-down pattern.
Module 2. Autoregressive (AR) models
Module objectives
2.1 Introduction
2.2 Definition
The letter p denotes the order of the autoregressive model, defining how many previous values the current value is related to. The model is called autoregressive because the series is regressed on past values of itself.
The error term {en } in Equation (2.1) refers to the noise in the time series.
Above, the errors were said to be iid. Commonly, they are also assumed to
have a normal distribution with mean zero and variance σe².
For the model in Equation (2.1) to be of use in practice, the scientist must
be able to estimate the value of p (that is, how many terms are needed in
the ar model), and then estimate the values of φk and m0 . Each of these
issues will be addressed in later sections.
Notice the subscripts are defined so that the first value of the series to appear
on the left of the equation is always one. Now consider the ar(3) model in
Example 2.2: When n = 1 (for the first observation in the time series), the
equation reads
T1 = 0.9T0 − 0.4T−1 + 0.1T−2 + e1 .
But the series {T} only exists for positive indices. This means that the model does not apply for the first three terms in the series, because the data T0, T−1 and T−2 are unavailable.
Figure 2.1: One realization of the ar(1) model Wn+1 = 3.12 + 0.63Wn + en+1.
The final plot is shown in Fig. 2.1. The data created in r are called a
realization of the model. Every realization will be different, since each
will be based on a different set of random {e}. The first few values
are not typical, as the model cannot be used for the first observation
(when n = 0 in the ar(1) model in Example 2.1, W0 does not exist);
it takes a few terms before the effect of this is out of the system.
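The r code that generated the realization is not shown above; a minimal sketch (the seed and the use of arima.sim are my choices, not necessarily the author's) is:

```r
set.seed(100)   # hypothetical seed, for reproducibility only
# arima.sim simulates a zero-mean AR(1) series; the constant 3.12
# shifts it to the model's mean, 3.12 / (1 - 0.63)
W <- arima.sim(model = list(ar = 0.63), n = 100) + 3.12 / (1 - 0.63)
plot(W)
```
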
Example 2.4: Chu & Katz [13] studied the seasonal SOI time series {Xt }
from January 1935 to August 1983 (that is, the average SOI for (north-
ern hemisphere) Summer, Spring, etc), and concluded the data was
well modelled using the ar(3) model
One purpose of having models for time series data is to make forecasts. In this section, forecasting with ar models is discussed. First, some notation is established.
2.3.1 Notation
Consider a time series {Xn}. Suppose the values of {Xn} are known from n = 1 to n = 100. Then the forecast of {Xn} at n = 101 is written X̂101|100. The 'hat' indicates the quantity is a forecast, not an observed value of the series. The subscript implies the value of {Xn} is known up to n = 100, and the forecast is for the value at n = 101. This is called a one-step ahead forecast, since the forecast is one step ahead of the available data.
In general, the notation X̂n+k|n indicates the value of the time series {Xn} is to be forecast for time n + k, assuming that the series is known up to time n. This forecast is a k-step ahead forecast. Note a k-step ahead forecast can be written in many ways: X̂n+k|n, X̂n|n−k and X̂n−2|n−k−2 are all k-step ahead forecasts.
Example 2.5: Consider the forecast Ŷt+3|t+1. This is a forecast of the time series {Yt} at time t + 3 if the time series is known to time t + 1. This is a two-step ahead forecast, since the forecast at t + 3 is two steps ahead of the available information, known up to time t + 1.
2.3.2 Forecasting
The value of Fn+1, if we knew exactly what it was, is found from Equation (2.2) as

Fn+1 = 23 + 0.4Fn − 0.2Fn−1 + en+1    (2.3)

by adjusting the subscripts. Then, conditioning on what we actually 'know' and adding 'hats' to all the terms, the forecast is

F̂n+1|n = 23 + 0.4F̂n|n − 0.2F̂n−1|n + ên+1|n.    (2.4)

The values Fn and Fn−1 are already known, so F̂n|n = Fn and F̂n−1|n = Fn−1; the best forecast of the future error is ên+1|n = 0. The difference between Fn+1 and F̂n+1|n determined from Equations (2.3) and (2.4) is

Fn+1 − F̂n+1|n = en+1.

Hence, the error in making the forecast is en+1, and so the terms {en} are actually the one-step ahead forecasting errors.
The same approach can be used for k-step ahead forecasts also, as shown in
the next example.
Hence

F̂n+2|n = 23 + 0.4F̂n+1|n − 0.2F̂n|n + ên+2|n,

where Equation (2.4) can be substituted for F̂n+1|n, but it is not necessary.
This section introduces the backshift operator, a tool that enables complicated time series models to be written in a simple form, and also allows the models to be manipulated. A full appreciation of the value of the backshift operator will not become apparent until later, when the models considered become very complicated and cannot be written down in any other (practical) way (see, for example, Example 7.22).
2.4.1 Definition
The backshift operator B shifts a time series back one step in time: BXt = Xt−1. Note the backshift operator can be used more than once, so that B(BXt) = BXt−1 = Xt−2; this is written B²Xt = Xt−2. In general,

BʳXt = Xt−r.
The backshift operator allows ar models to be written in a different form,
which will later prove very useful.
Note the backshift operator only operates on time series (otherwise it makes
no sense to “shift backward” in time). This implies that Bk = k if k is a
constant.
An ar(p) model can then be written compactly as φ(B)Xt = et, where φ(B) = 1 − φ1B − φ2B² − · · · − φpBᵖ (with the constant term omitted for simplicity).
2.5 Statistics
Taking expectations of each term of the ar(p) model gives

E[Xn] = m0 + φ1E[Xn−1] + φ2E[Xn−2] + · · · + φpE[Xn−p],

since E[en] = 0 (the average error is zero). Now, assuming the time series {Xn} is stationary, the mean of the series will be approximately constant at any time (that is, for any subscript). Let this constant mean be µ. (It only makes sense to talk about the 'mean of a series' if the series is stationary.) Then

µ = m0 + φ1µ + φ2µ + · · · + φpµ,

and so, on solving for µ,

µ = m0 / (1 − φ1 − φ2 − · · · − φp).

This enables the mean of the series to be computed from the ar model.
Example 2.9: In Equation (2.2), let the mean of the series be µ = E[F]. Taking expected values of each term,

µ = 23 + 0.4µ − 0.2µ + 0,

so that 0.8µ = 23, and hence µ = 23/0.8 = 28.75.
Consider the model

Yt = 12 + 0.5Yt−1 + et.

Taking the variance of each term gives var[Yt] = 0.5² var[Yt−1] + var[et], since the errors {en} are assumed to be independent of the time series {Yn}. Since the series is assumed stationary, the variance is constant at all time steps; hence define σY² = var[Yn]. Then

σY² = 0.25σY² + σe², and so σY² = σe²/0.75.
The covariance is a measure of how two variables change together. For two random variables X (with mean µX and variance σX²) and Y (with mean µY and variance σY²), the covariance is defined as

Covar[X, Y] = E[(X − µX)(Y − µY)],

and the corresponding correlation is

Corr[X, Y] = Covar[X, Y] / (σX σY).
In the case of a time series, the autocovariance is defined between two points in the time series {Xn} (with mean µ), say Xi and Xj, as

Covar[Xi, Xj] = E[(Xi − µ)(Xj − µ)].
Since the time series is stationary, the autocovariance is the same if the time
series is shifted in time. For example, consider Example 1.2 which includes
a plot of the SOI. If we were to split the SOI series into (say) five equal
period of time, and produce a plot like Fig. 1.2 (top panel) (p 7) for each
time period, the correlation would be similar for each time period.
This all means the important information about Xi and Xj is the time
between the two observations (that is, |i − j|). Arbitrarily, Xi can be set to
X0 then, and hence the autocovariance can be written as
γk = Covar[X0, Xk]

for integer k. As with correlation, the autocorrelation is then defined as

ρk = γk / γ0

for integer k, where γ0 = Covar[X0, X0] is simply the variance of the time series.
The series {ρk} is known as the autocorrelation function, or acf, at lag k. For any given ar model, it is possible to determine the acf, which will be unique to that ar model. For this reason, the acf is one of the most important pieces of information to know about a time series. Later, the acf is used to determine which ar model is appropriate for our data.
The term lag indicates the time difference in the acf. Thus, “the acf at
lag 2” means the term in the acf for k = 2, which is the correlation of any
term in the series with the term two time steps before (or after, as the series
is assumed stationary).
Note that since the autocorrelation is a series, the backshift operator can be
used with the autocorrelation. It can be shown that the autocovariance for
an ar(p) model is
γ(B) = σe² / (φ(B)φ(B⁻¹)).    (2.5)
Example 2.11: In Example 1.2 (p 5), the seasonal SOI was plotted against
the seasonal SOI for one, two, three and four seasons ago. In r, the
correlation coefficients were computed as
The correlations between the SOI and lagged values of the SOI can be
written as the series of autocorrelations:
{ρ} = {1, 0.632, 0.41, 0.2, 0.0076}.
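The r code that computed these correlations is not reproduced above; correlations of a series with its lagged values can be computed along these lines (the short data vector is hypothetical, standing in for the seasonal SOI):

```r
# Hypothetical stand-in for the seasonal SOI series
s <- c(2.1, 3.4, 2.8, 1.9, 2.5, 3.1, 2.2, 1.8, 2.9, 3.3)
n <- length(s)
cor(s[2:n], s[1:(n - 1)])   # series vs one step previous
cor(s[3:n], s[1:(n - 2)])   # series vs two steps previous
```
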
Example 2.12:
The ar(2) model
Ut+1 = 0.3Ut − 0.2Ut−1 + et+1 (2.6)
is written using the backshift operator as
φ(B)Ut+1 = et+1
where φ(B) = 1 − 0.3B + 0.2B². Suppose for the sake of example that σe² = 10. Then, since φ(B⁻¹) = 1 − 0.3B⁻¹ + 0.2B⁻², the autocovariance is

γ(B) = 10 / [(1 − 0.3B + 0.2B²)(1 − 0.3B⁻¹ + 0.2B⁻²)]
     = 10 / (0.2B⁻² − 0.36B⁻¹ + 1.13 − 0.36B + 0.2B²).
By some detailed mathematics (Sect. 3.6.3), this equals
γ(B) = · · · + 11.11 + 2.78B − 1.39B² − 0.97B³ − 0.0139B⁴ + 0.190B⁵ + · · · ,

only quoting the terms for the non-negative lags (recall that the autocovariance is symmetric). The terms in the autocovariance are therefore (quoting terms from the non-negative lags again):

{γ} = {γ0, γ1, γ2, . . . } = {11.11, 2.78, −1.39, −0.97, −0.0139, 0.190, 0.0598, . . . }.
The first term, at lag zero, always has an acf value of one (that is, each term is perfectly correlated with itself). It is usual to plot the acf (Fig. 2.2).
Figure 2.2: The acf for the ar(2) model in Equation (2.6).
The plot is typical of an ar(2) model: the terms in the acf decay slowly towards zero. Indeed, any low-order ar model (such as ar(1), ar(2), ar(3), or similar) shows similar behaviour: a slow decay of the terms towards zero.
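The theoretical acf of an ar model can be computed in r with ARMAacf; a sketch for the ar(2) model in Equation (2.6), Ut = 0.3Ut−1 − 0.2Ut−2 + et (the plot styling is my choice):

```r
# Theoretical acf at lags 0 to 20 for the AR(2) model with
# coefficients 0.3 and -0.2
rho <- ARMAacf(ar = c(0.3, -0.2), lag.max = 20)
plot(0:20, rho, type = "h", xlab = "lag", ylab = "ACF")
```
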
For an ar(2) process to be stationary, the parameters must satisfy:

φ1 + φ2 < 1
φ2 − φ1 < 1
−1 < φ2 < 1
2.7 Summary
2.8 Exercises
Ex. 2.13: Classify the following ar models (that is, state if they are ar(1),
ar(4), etc.)
Ex. 2.14: Classify the following ar models (that is, state if they are ar(1),
ar(4), etc.)
(a) Xn = en + 0.223Xn−1 .
(b) At = 26.7 + 0.2At−1 − 0.2At−2 + et .
(c) Qt + 0.21Qt−1 + 0.034Qt−2 − 0.13Qt−3 = et .
Ex. 2.17: Write each of the models in Exercise 2.13 using the backshift
operator.
Ex. 2.18: Write each of the models in Exercise 2.14 using the backshift
operator.
Ex. 2.19: The time series {An } has a mean of 47.4. The following ar(2)
model was fitted to the series:
An = m0 + 0.25An−1 + 0.17An−2 + en .
Ex. 2.20: The time series {Yn } has a mean of 12.26. The following ar(3)
model was fitted to the series:
Ex. 2.21: Yao [52] fits numerous ar models to model the total June rainfall
(in mm) at Shanghai, {Yt }, from 1932 to 1950. One of the fitted models
is
Yt = 309.70 − 0.44Yt−1 − 0.29Yt−2 + et .
Ex. 2.22: In Guiot & Tessier [18], ar(3) models are fitted to the widths
of tree rings. This is of interest as there is evidence that pollution
may be affecting tree growth. Each observation in the series {Ct } is
the average of 30 tree-ring widths from 1900 to 1941 of a species of
conifer. Write down the general form of the model used to forecast
tree-ring width.
Ex. 2.23: Woodward and Gray [51] use a number of models, including ar models, to study change in global temperature. One such ar model, given in the paper (their Table 2) for modelling the International Panel for Climate Change (IPCC) data series from 1968 to 1990, has the factor

(1 + 0.22B + 0.59B²)

when the model is written using backshift operators. Write out the model without using the backshift operator.
Ex. 2.26: The notes indicate that for an ar(2) process to be stationary,
the following conditions must be satisfied:
φ1 + φ2 < 1
φ2 − φ1 < 1
−1 < φ2 < 1
Ex. 2.27: Consider the time series {G}, for which the last three observa-
tions are: G67 = 40.3, G68 = 39.6, G69 = 50.1. A statistician has
developed the ar(2) model
Gn = en − 0.3Gn−1 − 0.1Gn−2 + 63
Ex. 2.28: Use r to generate a time series from the ar(1) model
(b) Compute the mean of your R-generated time series, ignoring the
first 50 observations. (It usually takes a little while for the sim-
ulations to stabilize; see Fig. 2.1.) Compare to your previous
answer, and comment.
(c) Develop a forecasting formula for forecasting {F } one-, two- and
three-steps ahead.
(d) Using your generated data set, compute numerical forecasts for
the next three observations.
2.15 (a) Let µ = E[X] and take expectations of each term. This gives:
µ = 0 + 78.03 − 0.56µ − 0.23µ + 0.19µ. Solving for µ shows that
µ = E[X] ≈ 48.77.
(b) In a similar manner, E[Y ] = 10.49.
(c) E[D] = 0.
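The arithmetic in part (a) can be reproduced with a short r sketch (a check of the calculation, not part of the original solution):

```r
# Mean of an ar model: mu = m0 / (1 - sum(phi)).
# For the model behind 2.15(a), m0 = 78.03 and the ar coefficients are
# -0.56, -0.23 and 0.19, so mu = 78.03 / (1 + 0.56 + 0.23 - 0.19).
phi <- c(-0.56, -0.23, 0.19)
mu <- 78.03 / (1 - sum(phi))
round(mu, 2)  # 48.77
```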
Module contents
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 The backshift operator . . . . . . . . . . . . . . . . . . . 43
3.4 Forecasting ma models . . . . . . . . . . . . . . . . . . . 44
3.4.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4.2 Confidence intervals . . . . . . . . . . . . . . . . . . . . 45
3.4.3 Forecasting difficulties with ma models . . . . . . . . . . 47
3.5 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5.1 The mean . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5.2 The variance . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.3 Autocovariance and autocorrelation . . . . . . . . . . . 49
3.6 Why have different types of models? . . . . . . . . . . . 50
3.6.1 Two reasons . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6.2 Conversion of models . . . . . . . . . . . . . . . . . . . . 51
3.6.3 The acf for ar models . . . . . . . . . . . . . . . . . . 53
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.8.1 Answers to selected Exercises . . . . . . . . . . . . . . . 57
Module 3. Moving Average (MA) models
Module objectives
understand that the acf for an ma(q) model will have q non-zero terms
(apart from the term at lag zero, which is always one).
3.1 Introduction
This Module introduces a second type of time series model: moving aver-
age models. Together with autoregressive models, they form the two basic
models in the Box–Jenkins methodology.
3.2 Definition
For the model in Equation (3.2) to be of use in practice, the scientist must
be able to estimate the value of q (that is, how many terms are needed in
the ma model), and then estimate the values of θk and m. Each of these
issues will be addressed in later sections.
The backshift operator can be used to write ma models in the same way as
ar models. Consider the model in Example 3.1. Using backshift operators,
this is written
3.4.1 Forecasting
The principles of forecasting were developed in Sect. 2.3.1 (it may be worth
reading this section again) in the context of ar models. The same principles
apply for ma models. Consider the following ma(2) model:
Rn = 12 + en − 0.3en−1 − 0.12en−2 , (3.3)
where en has a normal distribution with a mean of zero and variance of
σe² = 3; that is, en ∼ N (0, 3). Suppose a one-step ahead forecast is required if the information about the time series {Rn} is known up to time n; that is, R̂n+1|n is required.
Example 3.3: A two-step ahead forecast for the ma(2) model in Equation (3.3) is found by first adjusting the subscripts:

Rn+2 = 12 + en+2 − 0.3en+1 − 0.12en,

and then writing

R̂n+2|n = 12 + ên+2|n − 0.3ên+1|n − 0.12ên|n.

Of the terms on the right, only en|n is known; the rest must be replaced by the mean value of zero. So the two-step ahead forecast is

R̂n+2|n = 12 − 0.12en.

The forecast for three-steps ahead is

R̂n+3|n = 12,
For this equation, one- and two-step ahead forecasts were developed. The
one-step ahead forecast is
R̂n+1|n = 12 − 0.3en − 0.12en−1.
The forecasting error is the difference between the value forecast and the value actually observed. It is found as follows:

Rn+1 − R̂n+1|n,

where R̂n+1|n is known from Equation (3.5). (The reason Rn+1 is not known exactly is that Rn+1 depends on the unknown random value of en+1; this is the error we make when we make our forecast, which is of course unknown.) This means that the forecasting error is

Rn+1 − R̂n+1|n = [12 + en+1 − 0.3en − 0.12en−1] − [12 − 0.3en − 0.12en−1]
              = en+1.
This tells us that the series {en } is actually just the one-step ahead forecast-
ing errors. A confidence interval for the forecast of Rn+1 can also be formed.
The actual error about to be made, en+1 is, of course, unknown. But this
information can be used to develop confidence intervals for the forecast.
The variance of {en } can generally be estimated by computing all the previ-
ous forecasting errors (r computes these) and then computing the variance.
Suppose for the sake of example the variance of the errors is 5.8. Then the
variance of the forecast is
var[Rn+1 − R̂n+1|n] = var[en+1] = 5.8.
for the appropriate value of z ∗ . Generally, this is taken as 2 for a 95% con-
fidence interval. (1.96 is more precise; t-values with an appropriate number
of degrees of freedom even more precise. In practice, however, the value of
2 is often used.) So the confidence interval for the forecast is approximately
R̂n+1|n ± 2 × √5.8,

or R̂n+1|n ± 4.82.
Example 3.4: In Example 3.3, the following two-step ahead forecast was obtained for Equation (3.5):

R̂n+2|n = 12 − 0.12en.

The forecasting error is therefore

Rn+2 − R̂n+2|n = [12 + en+2 − 0.3en+1 − 0.12en] − [12 − 0.12en]
              = en+2 − 0.3en+1.

The same principle is used for three-, four- and further steps ahead, when the confidence interval is

R̂n+k|n ± 2√6.40552 = R̂n+k|n ± 5.06
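The confidence-interval half-widths quoted above can be reproduced with a short r sketch (the value 5.8 is the estimated error variance from the text):

```r
# Variance of the k-step ahead forecast error for the ma(2) model (3.3).
# The k-step error is e[n+k] - 0.3 e[n+k-1] - 0.12 e[n+k-2], so only the
# theta terms up to lag k-1 contribute: var = sigma2 * (1 + 0.3^2 + 0.12^2 ...)
sigma2 <- 5.8
theta <- c(-0.3, -0.12)

var.k <- sigma2 * cumsum(c(1, theta^2))  # one-, two-, three-step variances
halfwidth <- 2 * sqrt(var.k)
round(halfwidth, 2)  # 4.82 5.03 5.06
```

The one- and three-step half-widths match the values 4.82 and 5.06 in the text.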
T̂10|9 = −0.3e9.

And so we need the one-step ahead forecasting error for n = 9, which requires knowledge of T̂9|8. From the forecasting formula, we find this using

T̂9|8 = −0.3e8.
And so the cycle continues, right back to the start of the series.
In practice, we need to compute all the one-step ahead forecasting errors. r
can compute these errors and produce predictions without having to worry
about these difficulties in a real (data-driven) situation; see Sect. 5.4.
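As a sketch of how r recovers these one-step ahead errors: residuals() on a fitted arima object returns exactly the one-step ahead forecasting errors. The data below are simulated for illustration only.

```r
set.seed(1)
# Simulate from the ma(2) model of Equation (3.3) (constant omitted for simplicity)
x <- arima.sim(model = list(ma = c(-0.3, -0.12)), n = 200, sd = sqrt(3))

fit <- arima(x, order = c(0, 0, 2))  # fit an ma(2) model to the data
e <- residuals(fit)                  # the one-step ahead forecasting errors
var(e)                               # estimates the error variance sigma_e^2
```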
3.5 Statistics
since the average error is zero. Hence, for an ma model, the constant term
m is actually the mean of the series {Xn }.
Example 3.5: In Equation (3.3), let the mean of the series be µ = E[R].
Then taking expected values of each term gives
µ = 12,
so that the mean of the series is µ = E[R] = 12. This should not be
unexpected given the forecasts in Example 3.3.
The variance of a time series written in ma(1) form is found by taking the
variance of each term. Consider again Equation (3.2); taking the variance
of each term gives
where {en } ∼ N (0, 3), since the errors {en } are independent of the time
series {Rn } and independent of each other. This gives
> set.seed(100)
> ma.sim <- arima.sim(model = list(ma = c(-0.3,
+ -0.12)), n = 10000, sd = sqrt(3))
> var(ma.sim)
[1] 3.321068
[1] 3.309557
γk = Covar[X0, Xk].

Note that since the autocovariance is a series, it can be written using the backshift operator. It can be shown that the autocovariance for an ma(q) model is

γ(B) = θ(B)θ(B⁻¹)σe².
Example 3.7: The ma(2) model Vn+1 = en+1 − 0.39en − 0.22en−1 can be written

Vn+1 = θ(B)en+1,

where θ(B) = 1 − 0.39B¹ − 0.22B². Suppose for the sake of example that σe² = 2. Then, since θ(B⁻¹) = 1 − 0.39B⁻¹ − 0.22B⁻², the autocovariance is
The terms in the autocovariance are therefore (quoting only the terms for the non-negative lags, as the autocovariance is symmetric):
Figure 3.1: The acf for the ma(2) model in Example 3.7.
The plot is typical of an ma(2) model: there are two terms in the acf
that are non-zero (apart from the term at a lag of zero, which is always
one).
In general, the acf of an ma(q) model has q non-zero terms excluding
the term at lag zero which is always one.
Why are both ar and ma models necessary? ar models are far more popular
in the literature than ma models, so why not just have ar models? There
are two important reasons why both ma and ar models are necessary.
The first reason is that only ma models can be used to create confidence intervals on forecasts (Sect. 3.4.2). If an ar model is developed, it must be written as an ma model to produce confidence intervals for the forecasts.
Xn = θ(B)en ,
which looks like an ma model. This is exactly the way models are converted
from ar to ma.
Consider writing the ar(1) model Xn = 0.6Xn−1 +en as an ma model. There
are three ways of proceeding. The first can only be used for ar(1) models
as it uses a mathematical result relevant only then. The second approach
is more difficult, but is used for any ar model. The third approach uses r,
and so is the easiest but of no use in the examination.
Using the first approach, write the model using the backshift operator as φ(B)Xn = en, where φ(B) = 1 − 0.6B. Then divide by φ(B) to obtain Xn = θ(B)en, where θ(B) = 1/φ(B). So

θ(B) = 1/(1 − 0.6B).    (3.6)

The mathematical result for the sum of a geometric series (1 + r + r² + r³ + · · · = 1/(1 − r) if |r| < 1) is then used to obtain

θ(B) = 1/(1 − 0.6B) = 1 + 0.6B + (0.6)²B² + (0.6)³B³ + · · · .
This shows an ar(1) model has an equivalent ma(∞) form. Since both are
equivalent, the simpler ar(1) form would be preferred, but the ma form is
necessary for computing confidence intervals of forecasts.
In the second approach, start with Equation (3.6), and equate it to an unknown infinite sequence of θ's:

1/(1 − 0.6B) = 1 + θ1B + θ2B² + · · · .

Then multiply both sides by 1 − 0.6B to get

1 = (1 − 0.6B)(1 + θ1B + θ2B² + · · · )
  = 1 + B(θ1 − 0.6) + B²(θ2 − 0.6θ1) + · · · ,

and then equate the powers of B on both sides of the equation. For example, looking at constants, there is one on both sides. Looking at powers of B, there are zero on the left, and θ1 − 0.6 on the right after multiplying out. Equating, we find that θ1 = 0.6 (as before). Then equating powers of B², the left hand side has zero, and the right hand side has θ2 − 0.6θ1. Substituting θ1 = 0.6 and solving gives θ2 = (0.6)² (as before). A general pattern emerges, giving the same result as before.
The third approach uses r. This is useful, but you will need to know other
methods for the examination. Naturally, the answers are the same as using
the other two methods.
Note the one is not needed in the list of ar components as it is always one!
Confusingly, the sign is different for the φ.
Time Series:
Start = 1
End = 20
Frequency = 1
[1] 1.000000e+00 6.000000e-01 3.600000e-01
[4] 2.160000e-01 1.296000e-01 7.776000e-02
[7] 4.665600e-02 2.799360e-02 1.679616e-02
[10] 1.007770e-02 6.046618e-03 3.627971e-03
[13] 2.176782e-03 1.306069e-03 7.836416e-04
[16] 4.701850e-04 2.821110e-04 1.692666e-04
[19] 1.015600e-04 6.093597e-05
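The same expansion can be reproduced with r's ARMAtoMA function, which returns the ma(∞) weights (excluding the leading coefficient of one, which is always assumed):

```r
# ma(infinity) weights for the ar(1) model X[n] = 0.6 X[n-1] + e[n].
# ARMAtoMA() drops the leading coefficient of 1, so the weights are 0.6^k.
theta <- ARMAtoMA(ar = 0.6, lag.max = 5)
theta  # 0.6 0.36 0.216 0.1296 0.07776
```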
Briefly, we digress to again consider the acf for ar models, seen previously in Sect. 2.5.4, Equation 2.5, and Example 2.12 (p 34) in particular. In this example, the following is stated:

. . . the autocovariance is

γ(B) = 10 / [(1 − 0.3B¹ + 0.2B²)(1 − 0.3B⁻¹ + 0.2B⁻²)]
     = 10 / (0.2B⁻² − 0.36B⁻¹ + 1.13 − 0.36B + 0.2B²).
On the left, the constant term is 10; on the right, a constant can be found
from:
So we have
That’s the AR model found. Note that the first component is 1 and is
assumed; it should not be included.
> theta[1:4]
3.7 Summary
3.8 Exercises
Ex. 3.8: Classify the following ma models (that is, state if they are ma(3),
ma(2), etc.)
Ex. 3.9: Classify the following ma models (that is, state if they are ma(3),
ma(2), etc.)
(a) Bt = 0.1et−1 + et .
(b) Yn = 0.036en−2 − 0.36en−1 + en .
(c) Wt + 0.39et−1 + 0.25et−2 − 0.21et−3 − et = 8.00.
Ex. 3.12: Write each of the models in Exercise 3.8 using the backshift op-
erator.
Ex. 3.13: Write each of the models in Exercise 3.9 using the backshift op-
erator.
into the equivalent ma model using each of the three methods outlined
in Sect. 3.6, and confirm that they give the same answer.
Yn = en + 0.3en−1 − 0.1en−2
into the equivalent ar model using one of the three methods outlined
in Sect. 3.6.
Yn = 0.25Yn−1 − 0.13Yn−2 + en
into the equivalent ma model using one of the three methods outlined
in Sect. 3.6.
Ex. 3.17: Compute forecasting formulae for each of the ma models in Exercise 3.8 for one-, two- and three-steps ahead, and compute confidence intervals for each forecast in terms of the error variance σe².
Ex. 3.18: Compute forecasting formulae for each of the ma models in Exercise 3.9 for one-, two- and three-steps ahead, and compute confidence intervals for each forecast. In each case, assume σe² = 2.
Xn = 0.4en−1 + en
Zt = 0.2et−1 − 0.1et−2 + et ,
Wn = 0.3Wn−1 + en ,
Yt = 0.45Yt−1 − 0.2Yt−2 + et ,
3.10 The means are: E[A] = 8.39; E[X] = 0; and E[Y ] = 12.40.
or
Xt+1 = et+1 + 0.4et + 0.16et−1 + 0.064et−2 + · · · .
ARMA Models
4
Module contents
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 The backshift operator for arma models . . . . . . . . . 62
4.4 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4.1 The mean . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4.2 The autocovariance and autocorrelation . . . . . . . . . 63
4.5 Conversion of arma models to ar and ma models . . . 64
4.6 Forecasting arma models . . . . . . . . . . . . . . . . . . 65
4.6.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6.2 Confidence intervals . . . . . . . . . . . . . . . . . . . . 66
4.6.3 Forecasting difficulties with arma models . . . . . . . . 67
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.8.1 Answers to selected Exercises . . . . . . . . . . . . . . . 70
Module objectives
4.1 Introduction
This Module examines models with both autoregressive and moving average
components.
4.2 Definition
The principle of parsimony—that the best model is the simplest model that
captures the important features of the data—has been mentioned before,
where it was noted that a complex ar model can often be replaced by a
simpler ma model.
where {Xn }n≥1 , m0 is some constant, and the φk and θj are defined as for
ar and ma models respectively.
Example 4.3: Chu & Katz [13] studied the monthly SOI time series from
January 1935 to August 1983, and concluded the data could be mod-
elled by an arma(1, 1) model.
Example 4.4: Davis & Rappoport [15] use an arma(2, 2) model for the
Palmer Drought Index, {Yt }. The final fitted model is
Katz & Skaggs [26] claim the equivalent ar(2) model is almost as good
as the model given by Davis & Rappoport, yet has half the number of
parameters. For this reason, they prefer the ar(2) model.
arma models have both ar and ma components; the model is easily writ-
ten using the backshift operator by following the guidelines for ar and ma
models.
4.4 Statistics
Example 4.6: The mean of {Zt }, say µ, in the arma(1, 2) model in Equa-
tion (4.2) is found by taking expectations of each term:
γ(B) = σe²θ(B)θ(B⁻¹) / [φ(B)φ(B⁻¹)].

φ(B) = (1 + 0.29B¹)
θ(B) = (1 − 0.66B¹ + 0.72B²).

Suppose for the sake of example that σe² = 4. Then, the autocovariance is

γ(B) = 4(1 − 0.66B¹ + 0.72B²)(1 − 0.66B⁻¹ + 0.72B⁻²) / [(1 + 0.29B)(1 + 0.29B⁻¹)].
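The theoretical acf of this arma model can be computed in r with the standard function ARMAacf. Note r's sign convention: φ(B) = 1 + 0.29B corresponds to ar = −0.29, and θ(B) = 1 − 0.66B + 0.72B² corresponds to ma = c(−0.66, 0.72).

```r
# Theoretical acf of the arma(1,2) model with phi(B) = 1 + 0.29B and
# theta(B) = 1 - 0.66B + 0.72B^2, in r's sign convention.
rho <- ARMAacf(ar = -0.29, ma = c(-0.66, 0.72), lag.max = 5)
rho[1]  # the lag-zero term is always 1
```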
Using similar approaches as used before in Sect. 3.6, arma models can be
converted to pure ar or pure ma models.
θ(B) = θ′(B)φ(B)

1 + 0.4B − 0.1B² = (θ′0 + θ′1B + θ′2B² + θ′3B³ + · · · )(1 − 0.3B)
                 = θ′0 + B(−0.3θ′0 + θ′1)
                 + B²(−0.3θ′1 + θ′2)
                 + B³(−0.3θ′2 + θ′3) + · · ·

Now, equate powers of B so that both sides of the equation are equal. Equating constant terms: 1 = θ′0 as expected. Equating terms in B:

0 = −0.3θ′2 + θ′3,
Taking expectations of Equation (4.3) shows that the mean of the series
is E[X] = 10/0.7 ≈ 14.2857. Taking expectations of Equation (4.5)
shows that m0 = E[X] ≈ 14.2857. So the arma(1, 2) model has the
equivalent ma model
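This conversion can be checked with r's ARMAtoMA function; the first weight agrees with the coefficient 0.7 that appears in the forecasting error et+2 + 0.7et+1 later in this Module:

```r
# Convert the arma(1,2) model (1 - 0.3B) X[n] = (1 + 0.4B - 0.1B^2) e[n]
# (ignoring the constant) to its ma form. Equating powers of B gives
# theta'_1 = 0.7, theta'_2 = 0.11, theta'_3 = 0.033, ...
theta <- ARMAtoMA(ar = 0.3, ma = c(0.4, -0.1), lag.max = 3)
theta  # 0.700 0.110 0.033
```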
4.6.1 Forecasting
Forecasting arma models uses the same principles as for forecasting ma and ar models. This procedure is called the hat principle, summarized below:
The forecasting equation for an arma model is obtained from the model
equation by “placing hats” on all the terms of the equation, and adjusting
subscripts accordingly. The “hat” designates the best linear estimate of the
quantity underneath the hat. This equation is then adjusted by noting:
1. A term êk|j for which k is in the future (i.e. k > j) just equals zero (the mean of {ek}), while one for which k is in the present or past (k ≤ j) just equals ek. In other words, hats change future ek's to zeros, and they fall off present and past ek's.
Ŵn+1|n = 0.72 + 0.44Ŵn|n + 0.17Ŵn−1|n + ên+1|n − 0.26ên|n.

Ŵn+2|n = 0.72 + 0.44Ŵn+1|n + 0.17Wn.

Again, Ŵn+1|n can be replaced by Equation (4.7) (though this is not necessary) to get

Ŵn+2|n = 0.72 + 0.44 {0.72 + 0.44Wn + 0.17Wn−1 − 0.26ên} + 0.17Wn,
Xt+2 − X̂t+2|t = et+2 + 0.7et+1,
4.7 Summary
4.8 Exercises
Ex. 4.11: Classify the following models as ar, ma or arma, and state the
orders of the models (for example, an answer may be arma(1, 3)):
Ex. 4.12: Classify the following models as ar, ma or arma, and state the
orders of the models (for example, an answer may be arma(1, 3)):
Ex. 4.15: Write each model in Exercise 4.11 using the backshift operator.
Ex. 4.16: Write each model in Exercise 4.12 using the backshift operator.
Xn = 0.2Xn−1 + en − 0.1en−1
Yn = 0.3Yn−1 + en + 0.2en−1
Dt − exp{−1/K3 }Dt−1 =
(1 − c3 exp{−1/K3 })It − exp{−1/K3 }(1 − c3 )It−1 ,
Table 4.1: Parameters estimates and standard errors for the arma(1, 1)
model fitted by Sales, Pereira & Vieira [40].
Parameter    Estimate    Standard Error
φ1            0.8421      0.0237
θ1           −0.2398      0.0426
σe2           0.4343
Ex. 4.23: Consider the arma(2, 2) model for the Palmer Drought Index
seen in Example 4.4. Write this model using the backshift operator.
Then create forecasting formulae for forecasting one-, two-, three- and
four-steps ahead.
4.11 The models are arma(1, 1); ar(2) (or arma(2, 0)); arma(1, 1); ma(1)
(or arma(0, 1)); arma(1, 3).
4.13 The means are: E[A] = 8.75; E[X] ≈ 13.0; E[Y ] = 0; E[R] = 0;
E[P ] ≈ 6.44.
(d) The variance of the forecasting error for the one-step ahead forecast is σe² = 9.3. For the two-step ahead forecast, the variance of the forecast error is σe² + (0.1)²σe² = 9.393. The 95% confidence intervals therefore are X̂t+1|t ± 2√9.3 for the one-step ahead forecast; and X̂t+2|t ± 2√9.393 for the two-step ahead forecast.
4.21 Hint: First write φ = exp(−1/K3 ), and the right-hand side looks like
the ar(1) part. Then, use the given relationship between It and et to
find It−1 and hence show that θ = φ(1 − c3 )/(1 − c3 φ) for the ma(1)
part.
Module contents
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2 Identifying a Model . . . . . . . . . . . . . . . . . . . . . 75
5.2.1 The Autocorrelation Function . . . . . . . . . . . . . . . 75
5.2.2 Sample acf . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2.3 Sample pacf . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2.4 Tips for using the sample acf and pacf . . . . . . . . . 83
5.2.5 Model selection using aic . . . . . . . . . . . . . . . . . 83
5.2.6 Selecting arma models . . . . . . . . . . . . . . . . . . 84
5.3 Parameter estimation . . . . . . . . . . . . . . . . . . . . 85
5.3.1 Preliminary estimation for ar models: The Yule–Walker
equations . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.2 Parameter estimation in R . . . . . . . . . . . . . . . . 86
5.4 Forecasting using R . . . . . . . . . . . . . . . . . . . . . 88
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.6.1 Answers to selected Exercises . . . . . . . . . . . . . . . 95
Module 5. Finding a Model
Module objectives
use the sample acf and sample pacf to select ar and ma models for
time series data;
use r to plot the sample acf and pacf for time series data;
5.1 Introduction
In this Module, methods are discussed for finding the best model for a par-
ticular time series. This consists of two stages: first, determining which type
of model is appropriate for the given data (for example, ar(1) or ma(2));
then secondly estimating the parameters in the chosen model.
The choice of ar, ma and arma models is discussed, as well as the number of parameters necessary for the chosen type of model.
The two most important tools in making these decisions are the sample
autocorrelation function (acf) and sample partial autocorrelation function
(pacf).
In practice, the scientist doesn’t start with a known model, but instead starts
with data for which a model is sought. Using software, the acf is estimated
from the data (using a sample acf), and the characteristics of the sample
acf used to select the best model.
The autocorrelation function is estimated from the data using the formulae

γ̂k = (1/N) Σ_{i=1}^{N−k} (Xi − µ̂)(Xi+k − µ̂),   k ≥ 0

ρ̂k = γ̂k / γ̂0,   k ≥ 0,
where N is the number of terms in the time series and µ̂ is the sample mean
of the time series. Of course, the actual computations are performed by
computer, using a package such as r. Since the quantities γ̂k (and hence ρ̂k) are estimated, there will be some sampling error. Formulae exist for estimation of the sampling error but will not be given here. However, r uses these formulae to produce approximate 95% confidence intervals for ρ̂k.
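The formulae above can be sketched directly in r, using a small illustrative data vector (not from the course data); r's acf function uses the same 1/N normalization:

```r
# Sample autocovariance and autocorrelation from the formulae above,
# compared against r's built-in acf().
x <- c(5, 3, 8, 6, 4, 7, 9, 2, 6, 5)   # illustrative data only
N <- length(x)
mu.hat <- mean(x)

# gamma.hat[k+1] = (1/N) sum_{i=1}^{N-k} (x_i - mu.hat)(x_{i+k} - mu.hat)
gamma.hat <- sapply(0:3, function(k)
    sum((x[1:(N - k)] - mu.hat) * (x[(1 + k):N] - mu.hat)) / N)
rho.hat <- gamma.hat / gamma.hat[1]

# agrees with the built-in estimate
all.equal(rho.hat, as.vector(acf(x, lag.max = 3, plot = FALSE)$acf))
```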
Consider the ma(2) model as used in Example 3.7 (p 49): Vn+1 = en+1 −
0.39en − 0.22en−1 , where σe2 = 2. In that example, the theoretical acf was
computed as
{ρ} = {1, −0.253, −0.183}.
The series {Vn } is simulated in r as follows:
Figure 5.1: The sample acf for the ma(2) model in Example 2.2 (p 35).
> acf(sim.ma2[10:1000])
Note the first few terms have been ignored; this allows the simulation to
recover from the initial (arbitrary) choice of errors needed to begin the sim-
ulation.
First, note the dotted horizontal lines on the plot. These indicate the approx-
imate 95% confidence intervals for ρbk . In other words, if the autocorrelation
value lies within the dotted lines, the value can be considered as zero; the
reason it is not exactly zero is due to sampling error only.
We would expect that the sample acf would demonstrate the features of the
acf for the model. Compare Figures 3.1 (p 50) and 5.1; the sample acf and
acf do look similar—they both show two components in the plot that are
larger than the rest when we ignore the term at a lag of zero which will always
be one. (Recall that only two acf values are outside the dotted confidence
bands, so the rest can be considered as zero, and that the first term will
always be one so is of no importance.) Notice there are two components in
the acf that are non-zero for a two-parameter ma model (that is, ma(2)).
In fact, this is typical. Here is one of the most important rules for identifying
time series models:
Xn = 0.4Xn−1 − 0.3Xn−2 + en .
The theoretical acf can be computed and plotted in r (by first con-
verting to an ma model):
This sample acf is shown in the bottom panel of Fig. 5.2. They are
very similar as expected.
Example 5.2:
Parzen [36] studied a time series of yearly snowfall in Buffalo from
1910 to 1972 (recorded to the nearest tenth of an inch):
Figure 5.2: Top: the theoretical acf for the ar(2) model in Example 5.1;
Bottom: the sample acf for data simulated from the ar(2) model in Ex-
ample 5.1.
Figure 5.3: Yearly Buffalo snowfall from 1910 to 1972. Top: the plot of the
data; Bottom: the sample acf.
The data are plotted in the top panel of Fig. 5.3. The time series
is small, but the series appears to be approximately stationary. The
sample acf for the data has been computed in r (Fig. 5.3, bottom panel).
The acf has two non-zero terms (ignoring the term at lag zero, which
is always one), suggesting an ma(2) model is appropriate for modelling
the data. Note the confidence bands are approximate only. Here is
some of the code used to produce the plots:
In the previous section, the acf was introduced to indicate the order of the
ma model appropriate for a dataset. How do we choose the appropriate
order of the ar model? To identify ar models, a partial acf is used, which
is explained below.
Example 5.3: Consider the ar(2) model from Example 5.1. As this is an
ar(2) model, the sample pacf from the simulated data is expected
to have two significant terms. The sample pacf (Fig. 5.4) has two
significant terms as expected.
As explained, note there is no term at a lag of zero for the sample
pacf.
Example 5.4: In Example 5.2 (p 77), the annual Buffalo snowfall data
was examined using the acf and an ma(2) model was found to be
suitable.
Figure 5.4: The sample pacf of data simulated from an ar(2) model.
Figure 5.5: The sample pacf of yearly Buffalo snowfall from 1910 to 1972.
Figure 5.6: Simulated ar(2) data. Two models have been used to make pre-
dictions; the simple model is better for prediction. Note the more complex
model predicts snowfall will increase linearly over time!
The sample pacf for the data has been computed in r (Fig. 5.5, bottom panel); there is no term at a lag of zero for the sample pacf.
The pacf has only one non-zero term, suggesting an ar(1) model is
appropriate for modelling the data. Recall the acf suggested an ma(2)
model. Which model do we choose? Since the one-parameter ar model
is simpler than the two-parameter ma model, the ar(1) model would
be chosen as the best model. We almost certainly do not need both an ma(2) term and an ar(1) term in the model. (Later, we will learn about other criteria that help make this decision also.)
Now that an ar(1) model is chosen, it remains to estimate the param-
eters of the model. This will be discussed in Sect. 5.3.
Example 5.5: Consider some simulated ar(2) data. An ar(2) model and
a more complicated model (an arima(9, 2, 9); we learn about arima
models in Module 7.4) are fitted to the data. Predictions can be made
using both models; these predictions are compared in Fig. 5.6.
The simple model is far better for making predictions!
Table 5.1: Typical features of a sample acf and sample pacf for ar and
ma models. The ‘slow decay’ may not always be observed.
                  acf                 pacf
ar(k) model       slow decay          k non-zero terms
ma(k) model       k non-zero terms    slow decay
When using the sample acf and pacf it is important to realize they are
obtained from sample information. This means they have sampling error. To
allow for this, the dotted lines produced by r represent confidence intervals
(95% by default). This implies a small number of terms (about 1 in 20) will
lie outside the dotted lines even if they are truly zero. In addition, these
confidence intervals are approximate only. Since 5% (or 1 in 20) of components are expected to be outside these approximate limits anyway, it is important not to place too much emphasis on terms in the sample acf and pacf that are
marginal. For example, if the sample acf has two significant terms, but
one is just over the confidence bands, perhaps an ma(1) model will be just
as good as an ma(2). Tools for assisting in making this decision will be
considered in Module 6.
than produced using the aic. In each case, the model with the minimum
aic is selected.
In r, the function ar uses the aic to select the order of the ‘best’ ar model;
unfortunately, ma and arma models are not considered.
The advantage of this method is that it is automatic, and any two people using
the same data and software will select the same model. The disadvantage is
the computer is very strict in its decision making and does not allow for a
human’s expert knowledge or interpretation of the information.
Example 5.6: Using the snowfall data from Example 5.4 (p 80), the func-
tion ar can be used to select the order of the ar model.
> sf.armodel <- ar(sf)
> sf.armodel
Call:
ar(x = sf)
Coefficients:
1 2
0.2379 0.2229
(We will consider writing down the actual model in Sect. 5.3.2).
Thus the ar function recommends an ar(2) model (from the output line Order selected 2). There are therefore three models to consider: an ma(2) from the sample acf, an ar(1) from the sample pacf, and now an ar(2) from r using the aic. Which do we choose?
This predicament happens often in time series analysis: there are often
many good models from which to choose. In Module 6, some methods
will be discussed for evaluating various models. If one of the model
appears better than the others using these methods, that model should
be chosen. But what if they all appear to be equally good? In that
case, the simplest model would be chosen—the ar(1) model in this
case.
Selecting arma models is not easy from the acf and the pacf. To select
arma models, it is first necessary to study some diagnostics of ar and ma
models in the next Module. The issue of selecting arma models will be
reconsidered in Sect. 6.3.
Previous sections have given the basis for selecting an ar or ma model for a
given data set, and to determine the order of the model. This section now
discusses how to estimate the unknown parameters in the model using r.
The actual mathematics is not discussed and indeed, it is not easy.
γk = φ1γk−1 + · · · + φpγk−p.

This matrix equation can be solved for the coefficients {φk}, k = 1, . . . , p, via the formula

  [φ1]   [  1     ρ1    ρ2   · · ·  ρp−1 ]⁻¹ [ρ1]
  [φ2] = [  ρ1    1     ρ1   · · ·  ρp−2 ]   [ρ2]    (5.3)
  [ ⋮ ]   [  ⋮     ⋮     ⋮           ⋮   ]   [ ⋮ ]
  [φp]   [ ρp−1  ρp−2  ρp−3  · · ·   1  ]   [ρp]
Example 5.7: Suppose we have a set of time series data. A plot of the sample acf reveals that the first few non-zero terms (and hence the ρk values) are 0.36, −0.14, 0.01 and −0.03. We could use the Yule–Walker equations to determine approximate values for φk:

  [φ1]   [  1     0.36  −0.14   0.01 ]⁻¹ [ 0.36]
  [φ2] = [  0.36  1      0.36  −0.14 ]   [−0.14]
  [φ3]   [ −0.14  0.36   1      0.36 ]   [ 0.01]
  [φ4]   [  0.01 −0.14   0.36   1    ]   [−0.03]
The Yule–Walker equations are used to find an initial estimate of the pa-
rameters. Note also that they are based on finding parameters for an ar
model only.
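Equation (5.3) can be solved directly in r; a sketch using the sample acf values from Example 5.7 (toeplitz builds the symmetric matrix of autocorrelations):

```r
# Solve the Yule-Walker equations for the acf values in Example 5.7.
rho <- c(0.36, -0.14, 0.01, -0.03)
R <- toeplitz(c(1, rho[1:3]))   # matrix of autocorrelations in (5.3)
phi <- solve(R, rho)            # preliminary estimates of phi_1, ..., phi_4
round(phi, 3)
```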
Call:
arima(x = sf, order = c(1, 0, 0))
Coefficients:
ar1 intercept
0.3302 80.8809
s.e. 0.1236 4.1722
Bt = 54.17 + 0.3302Bt−1 + et .
The parameter estimates are also given in the output. Either form is
acceptable as the final model.
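The conversion from r's reported intercept (which is actually the series mean) to the constant term in the fitted model can be sketched as:

```r
# r's arima() reports the series mean as "intercept". The constant m0 in
# B[t] = m0 + 0.3302 B[t-1] + e[t] is recovered as mu * (1 - phi1).
mu <- 80.8809
phi1 <- 0.3302
m0 <- mu * (1 - phi1)
round(m0, 2)  # 54.17
```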
Call:
arima(x = sf, order = c(2, 0, 0))
Coefficients:
ar1 ar2 intercept
0.2542 0.2373 81.5422
s.e. 0.1262 0.1262 5.2973
Rearranging produces
Comparing the aic for both the ar models show that the ar(2) model
is only slightly better using this criterion than the ar(1) model.
The output from using the function ar can also be used to write down
the fitted model but it doesn’t estimate the intercept; see Example 5.6.
The estimates are also slightly different as a different algorithm is used
for estimating the parameters.
Call:
arima(x = sf, order = c(0, 0, 1))
Coefficients:
ma1 intercept
0.2104 80.5421
s.e. 0.0982 3.4616
or
Bt = 80.54 + et + 0.2104et−1 .
In general, the model is fitted using arima using the order option. The first
component in order is the order of the ar component, and the third is the
order of the ma component. What is the second term?
The second term is only necessary if the series is non-stationary. The next
Module discusses this issue, where the meaning of the second term in the
order parameter will be discussed.
Once a model has been found, r can be used to make forecasts. The function
to use is predict. The following example shows how to use this function.
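The call that produces output of the kind shown below might look as follows (a sketch; sf is assumed to hold the Buffalo snowfall series used earlier):

```r
# Fit the ar(1) model, then forecast ten steps ahead.
ar1 <- arima(sf, order = c(1, 0, 0))
predict(ar1, n.ahead = 10)   # returns $pred (forecasts) and $se (standard errors)
```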
$pred
Time Series:
Start = 1973
End = 1982
Frequency = 1
[1] 90.49534 84.05536 81.92903 81.22696 80.99516
[6] 80.91862 80.89335 80.88500 80.88225 80.88134
$se
Time Series:
Start = 1973
End = 1982
Frequency = 1
[1] 22.28815 23.47162 23.59705 23.61068 23.61217
[6] 23.61233 23.61235 23.61235 23.61235 23.61235
r has made predictions for the next ten years based on the ar(1)
model, and has included the standard errors of the forecasts as well.
(This makes it easy to compute confidence intervals.) Notice that the
forecasts from about six years ahead onwards are almost identical.
This implies that the model has little skill at forecasting that far ahead
(which is not surprising). Forecasts a long way into the future tend
towards the mean of the series, which is reasonable.
The data and the forecasts can be plotted together (Fig. 5.7) as follows:
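One way of producing such a plot (a sketch; the object names are assumptions suggested by the axis label in Fig. 5.7):

```r
# Append the ten forecasts to the observed series and plot both,
# showing the forecasts with a dashed line.
preds <- predict(ar1, n.ahead = 10)
snow.and.preds <- ts(c(sf, preds$pred), start = start(sf), frequency = 1)
plot(snow.and.preds, lty = 1)
lines(preds$pred, lty = 2)   # forecasts as a dashed line
```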
Similar forecasts and plots can be constructed from the other types of
models (that is, ma or arma models) in a similar way. The forecasts
are shown for each of these models in Table 5.2.
5.5 Summary
Figure 5.7: Forecasting the Buffalo snowfall data ten years ahead. There is
little skill in the forecast after a few years. The forecasts are shown using a
dashed line.
Note that most time series (including climatological time series) are not
stationary, but the methods developed so far apply only to stationary data.
In Module 7, non-stationary time series will be examined.
5.6 Exercises
Ex. 5.12: Consider a time series {L}. The fitted model is an arma(1, 0)
model.
Ex. 5.14: The mean annual streamflow in Cache River at Forman, Illinois,
from 1925 to 1988 is given in the file cacheriver.dat. (The data are
not reported by calendar year, but by ‘water year’. A water year starts
in October of the calendar year one year less than the water year and
ends in September of the calendar year the same as the water year. For
example, water year 1980 covers the period October 1, 1979 through
September 30, 1980. However, this does not affect the model or your
analysis.) There are two variables of interest: Mean reports the mean
annual flow, and Max reports the maximum flow each water year, each
measured in cubic feet per second. (The data have been obtained from
USGS [4].)
(a) Use r to find a suitable model for the mean annual stream flow
using the acf and pacf.
(b) Use r to find a suitable model for the maximum annual stream
flow using the function ar and the sample acf and sample pacf.
(c) Using your chosen model, produce forecasts up to three-steps
ahead.
where {e} ∼ N (0, 4). Compute the sample acf and sample pacf from
this simulated data. Do they show the features you expect?
Xt = −0.3et−1 − 0.2et−2 + et
where {e} ∼ N (0, 8). Compute the sample acf and sample pacf from
this simulated data. Do they show the features you expect?
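A sketch of how the simulation in the exercise above might be carried out (the series length and seed are assumptions):

```r
# Simulate the ma(2) process above and examine its acf and pacf.
set.seed(1)
x <- arima.sim(model = list(ma = c(-0.3, -0.2)), n = 200, sd = sqrt(8))
acf(x)    # for an ma(2), expect significant terms at lags 1 and 2 only
pacf(x)   # expect terms that decay away rather than cut off
```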
Ex. 5.17: The data in Table 5.3 are thirty consecutive values of March
precipitation in inches for Minneapolis, St. Paul obtained from Hand
et al. [19]. The years are not given. (The data are available in the
data file minn.txt.)
(a) Load the data into r and find a suitable model (ma or ar) for
the data.
(b) Produce forecasts up to three-steps ahead with your chosen model.
Ex. 5.18: The data in the file lake.dat give the mean annual levels at
Lake Victoria Nyanza from 1902 to 1921, relative to a fixed reference
point (units are not given). The data are from Shaw [41] as quoted in
Hand et al [19]. Explain why an ar, ma or arma cannot be fitted to
this data set.
Ex. 5.19: The Easter Island sea level air pressure anomalies from 1951 to
1995 are given in the data file easterslp.dat, which were obtained
from the IRI/LDEO Climate Data Library (http://ingrid.ldgo.
columbia.edu/). Find a suitable ar or ma model for the series using
the sample acf and pacf. Use this model to forecast up to three
months ahead.
Ex. 5.20: The Western Pacific Index (WPI) measures the mode of low-
frequency variability over the North Pacific. The time series in the data
file wpi.txt is from the Climate Prediction Center [3] and the Climate
Diagnostic Centre [2], and gives the monthly WPI from January 1950
to December 2001.
Ex. 5.21: The seasonal average SOI from (southern hemisphere) summer
1876 to (southern hemisphere) summer 2001 is given in the file soiseason.dat
Ex. 5.22: The monthly average solar flux from January 1948 to December
2002 is given in the file solarflux.txt.
Ex. 5.23: The acf in Fig. 5.8 was produced for a time series {P }. In this
question, the Yule–Walker equations are used to form initial estimates
for the values of φ.
(a) Use the first three terms in the acf to set up the Yule–Walker
equations, and solve for the ar parameters. (Any terms within
the confidence limits can be assumed to be zero.)
(b) Repeat, but use four terms of the acf. Compare your answers to
those in part (a).
[Figure 5.8: The sample acf for the time series {P}.]
Ex. 5.24: The acf in Fig. 5.9 was produced for a time series {Q}. In this
question, the Yule–Walker equations are used to form initial estimates
for the values of φ.
(a) Use the first three terms in the acf to set up the Yule–Walker
equations, and solve for the ar parameters. (Any terms within
the confidence limits can be assumed to be zero.)
(b) Repeat, but use four terms of the acf. Compare your answers to
those in part (a).
(c) Repeat, but use five terms of the acf. Compare your answers to
those in parts (a) and (b).
Ex. 5.25: The acf in Fig. 5.10 was produced for a time series {R}. In this
question, the Yule–Walker equations are used to form initial estimates
for the values of φ.
(a) Use the first three terms in the acf to set up the Yule–Walker
equations, and solve for the ar parameters. (Any terms within
the confidence limits can be assumed to be zero.)
(b) Repeat, but use four terms of the acf. Compare your answers to
those in part (a).
[Figure 5.9: The sample acf for the time series {Q}.]
(c) Repeat, but use five terms of the acf. Compare your answers to
those in parts (a) and (b).
5.14 (a) The time series is plotted in Fig. 5.12. The data appears to be
approximately stationary. The sample acf and pacf are shown
in Fig. 5.13.
The sample acf has no significant terms, suggesting no particular
ma model will be useful. The sample pacf has only one term
marginally significant, at a lag of 14. This suggests that there is
no real structure to model: the data appear to be approximately random.
[Figure 5.10: The sample acf for the time series {R}.]
Figure 5.11: A possible sample acf and pacf for an ar(1) model. The acf
is shown in the top plot; the pacf in the bottom plot.
Coefficients:
1 2 3 4 5
-0.2096 -0.1298 -0.3270 -0.2821 -0.2111
5.19 The time series is plotted in Fig. 5.14. The data appears to be approx-
imately stationary. The sample acf and pacf are shown in Fig. 5.15.
The sample acf has seven significant terms, suggesting an ma(7)
model; the sample pacf suggests a much simpler ar model (an ar(3)
model is used in Exercise 6.10).
Figure 5.12: A plot of the mean annual streamflow in cubic feet per second
at Cache River, Illinois, from 1925 to 1988.
Figure 5.13: The sample acf and pacf of the mean annual streamflow in
cubic feet per second at Cache River, Illinois, from 1925 to 1988. Top: the
sample acf; Bottom: the sample pacf.
Figure 5.14: A plot of the Easter Island sea level air pressure anomaly from
1951 to 1995.
Figure 5.15: The sample acf and pacf of the Easter Island sea level air
pressure anomaly. Top: the sample acf; Bottom: the sample pacf.
> length(eislp)
[1] 540
> eislp[535:540]
5.23 From the acf, ρ1 ≈ 0.3, ρ2 ≈ −0.2 and ρ3 ≈ 0.2 (and the rest are
essentially zero). So the matrix equation is
\[
\begin{bmatrix}
1    & 0.3 & -0.2 \\
0.3  & 1   & 0.3  \\
-0.2 & 0.3 & 1
\end{bmatrix}
\begin{bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \end{bmatrix}
=
\begin{bmatrix} 0.3 \\ -0.2 \\ 0.2 \end{bmatrix}
\]
with solution (0.5416667, −0.5, 0.4583333).
In r:
[,1]
[1,] 0.7870968
[2,] -0.7677419
[3,] 0.7483871
[4,] -0.5354839
The solutions are very different. In practice, all the available informa-
tion is used (and hence very large matrices result).
Diagnostic Tests
6
Module contents
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2 Residual acf and pacf . . . . . . . . . . . . . . . . . . . . 107
6.3 Identification of arma models . . . . . . . . . . . . . . . 110
6.4 The Box–Pierce test (Q-statistic) . . . . . . . . . . . . . 116
6.5 The cumulative periodogram . . . . . . . . . . . . . . . 117
6.6 Significance of parameters . . . . . . . . . . . . . . . . . 118
6.7 Normality of residuals . . . . . . . . . . . . . . . . . . . . 119
6.8 Alternative models . . . . . . . . . . . . . . . . . . . . . 120
6.9 Evaluating the performance of a model . . . . . . . . . 121
6.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.11.1 Answers to selected Exercises . . . . . . . . . . . . . . . 124
Module objectives
use r to create Q–Q plot and understand what it implies about the
fitted model;
fit competing models to a time series, and use the appropriate tests to
compare the possible models;
6.1 Introduction
Since the residuals should be white noise (that is, independent and
containing no predictable elements), the acf and pacf of the residuals
should contain no hint of being forecastable. In other words, the terms of the
residual acf and residual pacf should all lie between the (approximate) 95%
confidence limits. If not, there are elements in the residuals that are forecastable,
and these forecastable aspects should be included in the signal of the model.
Example 6.1: In Sect. 5.3 (p 85), numerous models were fitted to the
yearly Buffalo snowfall data first introduced in Example 5.2 (p 77).
Two of those models were ar models. Here, consider the ar(1) model.
The model was fitted in Example 5.8 (p 86).
There are two ways to do diagnostic tests in r. The first way is to
use the tsdiag function; this function plots the standardized residuals
in order and plots the acf of the residuals. (It also produces another
plot studied in Sect. 6.4). Here is how the function can be used:
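For example (a sketch, assuming the fitted model object is called ar1 as earlier in the text):

```r
# Fit the ar(1) model of Example 5.8, then produce the three
# diagnostic panels: standardized residuals, residual acf, p-values.
ar1 <- arima(sf, order = c(1, 0, 0))
tsdiag(ar1)
```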
The result is shown in Fig. 6.1. The middle panel in Fig. 6.1 indicates
the residual acf is fine and no model could be fitted to the residuals.
The second method involves using the output object from the arima
command, as shown below.
> summary(resid(ar1))
Figure 6.1: Diagnostic plots after fitting an ar(1) model to the yearly Buffalo
snowfall data. This is the output of using the tsdiag command in r.
Figure 6.2: Diagnostic plots after fitting an ar(1) model to the yearly Buffalo
snowfall data. Top: the residual acf; Bottom: the residual pacf.
Using the residual acf and pacf is often how arma models are fitted. A
researcher may look at the sample acf and sample pacf and conclude an
ar(2) model is appropriate. After fitting such a model, an examination of
the residual acf and residual pacf may indicate that an ma(1) component
now seems appropriate. The best model for the data would then be an
arma(2, 1) model. The researcher would hope the residuals from this
arma(2, 1) model would be white noise. As was alluded to in Sect. 6.2,
using the residual acf and pacf in this way allows arma models to be identified.
Example 6.2: In Example 4.3, Chu & Katz [13] were said to fit an arma(1, 1)
model to the monthly SOI time series from January 1935 to August
1983. In this example we see how that model may have been chosen.
Keep in mind that selecting arma models is very much an art and
requires experience to do well.
As with any time series, the data must first be checked for stationarity
(Fig. 1.3, bottom panel, p 8); the series appears to be stationary. The
next step is to look at the acf and pacf (Fig. 6.3). The acf suggests a
very large order ma model; the pacf suggests possibly an ar(2) model
or an ar(4) model. To begin, select an ar(2) model as it is simpler and
the terms at lags 3 and 4 are only just significant; if an ar(4) model is
necessary, it will become apparent in the diagnostic analysis. The code
so far:
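A sketch of those steps (ms is assumed to be a data frame holding the monthly SOI in the column soi, matching the object names used later in this example):

```r
# Examine the sample acf and pacf, then fit the ar(2) model.
acf(ms$soi)
pacf(ms$soi)
ms.ar2 <- arima(ms$soi, order = c(2, 0, 0))
```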
The residuals can be examined now to see if the fitted ar(2) model
is adequate using the residual acf and pacf from the ar(2) model
(Fig. 6.4).
The residual acf suggests the model is reasonable, but the residual
pacf suggests at least one ma term at lag 2 may be necessary.
Figure 6.3: The acf and pacf of monthly SOI. Top: the acf; Bottom: the
pacf.
Figure 6.4: The acf and pacf of residuals for the ar(2) model fitted to the
monthly SOI. In (a), the acf; in (b), the pacf.
> acf(ms.ar2$residuals)
> pacf(ms.ar2$residuals)
> ms.arma22 <- arima(ms$soi, order = c(2, 0,
+ 2))
> acf(ms.arma22$residuals)
> pacf(ms.arma22$residuals)
Again the residual acf looks fine; the residual pacf looks better, but
is still not ideal (Fig. 6.5). However, the significant terms at lags 2, 5
and 6 have gone; this matters more than the remaining significant
terms at lags 14 and higher (terms 14 time steps away are less likely
to be of practical importance). So perhaps the arma(2, 2) model will
suffice. Here's the model:
> ms.arma22
Call:
arima(x = ms$soi, order = c(2, 0, 2))
Coefficients:
ar1 ar2 ma1 ma2 intercept
0.9192 -0.0473 -0.4273 -0.0131 -0.0903
s.e. 0.3801 0.3250 0.3792 0.1451 0.8158
Note the second ar term and the second ma term are both unnecessary
(each estimate divided by its standard error is much less than one in
absolute value). This suggests the second ar term and the second ma
term should be excluded from the model. In other words, try fitting an
arma(1, 1) model.
Figure 6.5: The acf and pacf of residuals for the arma(2, 2) model fitted
to the monthly SOI. Top: the acf; Bottom: the pacf.
Figure 6.6: The acf and pacf of residuals for the arma(1, 1) model fitted
to the monthly SOI. Top: the acf; Bottom: the pacf.
The residual acf and pacf from this model (Fig. 6.6) look very similar
to those in Fig. 6.5, suggesting the arma(1, 1) model fits about as well
as the arma(2, 2) model while being simpler, and so is preferred.
Here’s the arma(1, 1) model:
> ms.arma11
Call:
arima(x = ms$soi, order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
0.8514 -0.3698 -0.1183
s.e. 0.0196 0.0355 0.7927
The aic implies this is a better model than the arma(2, 2) model, and
so the arma(1, 1) is appropriate for the data.
6.4 The Box–Pierce test (Q-statistic)
A common test of whether the residuals are independent is the Box–Pierce
test (also known as the Q-statistic); a refinement with better small-sample
properties is called the Ljung–Box test. Both tests, however, may lack
statistical power.
In r, the function Box.test is used for both tests.
Example 6.3: In Example 6.1, the yearly Buffalo snowfall data were con-
sidered. In that Example, the residual acf and pacf showed the
residuals were not forecastable using an ar(1) model. To test if the
residuals appear to be independent, use the Box.test function in r.
The input variables are the residuals from the fitted model, and the
number of terms in the acf to be used to compute the statistic. The
default value is one, which is far too few. Typically, a value such as 15
is used (it is often more if the series is longer or is seasonal, and shorter
if the time series is short).
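The calls producing the output below take this form (a sketch):

```r
# Box-Pierce test, then its Ljung-Box refinement, each using 15 lags.
Box.test(resid(ar1), lag = 15)
Box.test(resid(ar1), lag = 15, type = "Ljung-Box")
```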
Box-Pierce test
data: resid(ar1)
X-squared = 7.3209, df = 15, p-value = 0.9481
Box-Ljung test
data: resid(ar1)
X-squared = 8.1009, df = 15, p-value = 0.9197
The P -value indicates there is no evidence that the residuals are de-
pendent. The conclusion from the Ljung–Box test is similar. This
further confirms that the ar(1) model is adequate. If the P -value were
below about 0.05, there would be some cause for concern: it would
imply that the terms in the residual acf are too large for white noise.
Note the r function tsdiag produces a plot of the P -values of the Box–Pierce
statistic for various values of the lag; see the third (bottom) panel in Fig. 6.1.
The dotted line in the plot corresponds to a P -value of 0.05.
6.5 The cumulative periodogram
Another test applied to the residuals is to calculate the cumulative (or
integrated) periodogram and apply a Kolmogorov–Smirnov test to check the
assumption that the residuals form a white noise process. The r function
cpgram performs this test.
The cumulative periodogram from a white noise process will lie close to the
central diagonal line. Thus, if the residuals do form a white noise process as
they should do approximately if the model is correct, the cumulative peri-
odogram of the residuals will lie within the indicated bounds with probability
95%.
Example 6.4: In Example 6.1, the yearly Buffalo snowfall data were con-
sidered and an ar(1) model fitted. The cumulative periodogram is
found as follows:
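A sketch of the call:

```r
# Cumulative periodogram of the residuals of the fitted ar(1) model.
cpgram(resid(ar1))
```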
Figure 6.7: The cumulative periodogram after fitting an ar(1) model to the
yearly Buffalo snowfall data.
The result (Fig. 6.7) indicates that the model is adequate as it remains
between the confidence bands.
Example 6.5: In Example 6.1, the yearly Buffalo snowfall data were con-
sidered. An ar(1) model was fitted to the data. There were two
estimated parameters: the constant term in the model, m0 , and the
ar term. The ar term can be tested for significance. (Recall that the
intercept is of no interest to the structure of the model.) The param-
eter estimates and the standard errors are shown in Example 5.8 (p 86).
Dividing an estimate by its standard error produces an approximate
t-score. The parameter estimate for the ar term has a t-score greater
than two in absolute value, indicating that the term is necessary in the model.
The actual t-scores can be computed using the output from the fitting
of the model, as shown below.
> coef(ar1)
ar1 intercept
0.3301765 80.8808921
> ar1$coef
ar1 intercept
0.3301765 80.8808921
> ar1$var.coef
ar1 intercept
ar1 0.01528329 0.03975151
intercept 0.03975151 17.40728000
> coef(ar1)/sqrt(diag(ar1$var.coef))
ar1 intercept
2.670778 19.385655
Figure 6.8: The Q–Q plot of the residuals after fitting an ar(1) model to the
yearly Buffalo snowfall data.
Example 6.6: Continuing Example 6.1 (the yearly Buffalo snowfall), con-
sider again the fitted ar(1) model. The Q–Q plot of the residu-
als (Fig. 6.8) indicates the residuals are approximately normally dis-
tributed.
> qqnorm(resid(ar1))
> qqline(resid(ar1))
(Note: qqnorm plots the points; qqline draws the diagonal line.)
6.8 Alternative models
The last type of test is to check whether an alternative model might be better.
This is open-ended, because there is an endless variety of alternative models
from which to choose. But, as seen before, there are often a small number
of suggested models from which the researcher has to choose. If one
model proves to be better using the diagnostic tests, that model should be
used. If all perform similarly, choose the simplest model. But what if more
than one model performs similarly, and each is as simple as the other?
If you can't decide between them, then it probably doesn't matter!
6.9 Evaluating the performance of a model
Finally, consider an evaluation tool that is slightly different from those pre-
viously discussed. The idea is that the model is fitted to the first portion of
the data (perhaps half the data), called the training set, and forecasts are
made on the basis of the model fitted to this portion. One-step ahead
forecasts are then made for each of the remaining data points (called the
testing set) to see how well the model can forecast (which, after all, is
one of the main reasons for developing time series models).
This approach generally requires a time series with a large number of ob-
servations to work well, since splitting the data into two parts halves the
amount of information available for model selection. Obviously, smaller por-
tions can be withheld from the model selection stage if necessary, as shown
in the next example. The approach discussed here is called cross-validation.
The ‘best’ model is the model whose predictions in the testing set are clos-
est to the actual observed values; this can be summarised by noting the
mean and variance of the forecast errors. More sophisticated cross-validation
techniques are possible, but not discussed here.
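The scheme can be sketched as follows (object names are assumptions; sf is the series, the ar(1) model is used for illustration, and this variant refits the model at each forecast origin, which may differ in detail from the computation used in the text):

```r
# Withhold the last ten observations; at each forecast origin,
# refit the model and make a one-step ahead forecast.
n <- length(sf)
preds <- numeric(10)
for (i in 1:10) {
  train <- sf[1:(n - 11 + i)]                 # data up to the forecast origin
  fit <- arima(train, order = c(1, 0, 0))
  preds[i] <- predict(fit, n.ahead = 1)$pred
}
errors <- sf[(n - 9):n] - preds               # one-step ahead forecast errors
c(mean = mean(errors), variance = var(errors))
```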
Example 6.7: Because the Buffalo snowfall data is a short series, we with-
hold only the last ten observations and retain those for model evalu-
ation. The one-step ahead forecasts of the remaining ten observations
for each model are shown in Table 6.1.
These one-step ahead predictions are plotted in Fig. 6.9. Table 6.1
suggests little difference between the models; the ar(2) model has
smaller errors on average (compare the means), but the ar(1) model
is more consistent (compare the variances).
6.10 Summary
Before accepting a time series model, it must be tested. The main tests are
based on analysing the “residuals”, the one-step ahead forecast errors of the
model. Table 6.2 summarises the diagnostic tests discussed.
Table 6.1: The one-step ahead forecasts for the ar(1), ar(2) and ma(2)
model after withholding the last ten observations and using the remainder
as a training set.
Prediction from Model:
Actual ar(1) ar(2) ma(2)
1 89.80 87.08 91.67 86.91
2 71.50 83.21 88.52 84.59
3 70.90 77.10 80.89 77.14
4 98.30 76.89 75.88 73.97
5 55.50 86.05 82.53 85.08
6 66.10 71.75 79.17 79.26
7 78.40 75.29 70.43 66.64
8 120.50 79.40 76.31 79.19
9 97.00 93.46 90.05 95.84
10 110.00 85.61 95.39 93.69
Errors: Mean: 4.215 2.717 3.570
Var: 414.9 444.7 427.0
Figure 6.9: The cross-validation one-step ahead predictions for the ar(1),
ar(2) and ma(2) models applied to the Buffalo snowfall data.
Table 6.2: A summary of the diagnostic tests to use on given time series
models.
6.11 Exercises
Ex. 6.8: In Exercise 4.22, an arma(1, 1) model was discussed that was
fitted by Sales, Pereira & Vieira [40] to the natural monthly average
flow rate (in cubic metres per second) of the reservoir of Furnas on the
Grande River in Brazil. Table 4.1 (p 70) gave the parameter estimates
and their standard errors. Determine if each parameter is significant
at the 95% level.
Ex. 6.9: In Exercise 5.14 (p 91), data concerning the mean annual stream-
flow from 1925 to 1988 in Cache River at Forman, Illinois, are given in the
file cacheriver.dat. There are two variables of interest: Mean reports
the mean annual flow, and Max reports the maximum flow each water
year, each measured in cubic feet per second. Perform the diagnostic
checks to see if the model found for the variable Mean in that exercise
is adequate.
Ex. 6.10: In Exercise 5.19 (p 92), the Easter Island sea level air pressure
anomalies from 1951 to 1995, given in the data file easterslp.txt,
were analysed. An ar(3) model was considered a suitable model. Per-
form the appropriate diagnostic checks on this model, and determine
if the model is adequate.
Ex. 6.11: In Exercise 4.4, Davis & Rappoport [15] were reported to use
an arma(2, 2) model for modelling the Palmer Drought Index, {Yt }.
Katz & Skaggs [26] claim the equivalent ar(2) model is almost as good
as the model given by Davis & Rappoport, yet has half the number of
parameters. For this reason, they prefer the ar(2) model.
Load the data into r and decide on the best model. Give reasons for
your solution, and include diagnostics analyses.
Ex. 6.12: In Exercise 5.20, a model was fitted to the Western Pacific Index
(WPI). The time series in the data file wpi.txt gives the monthly
WPI from January 1950 to December 2001. Perform some diagnostic
analyses and select the ‘best’ model for the data, justifying your choice
and illustrating your answer with appropriate diagrams.
Ex. 6.13: In Exercise 5.21, the seasonal average SOI from (southern hemi-
sphere) summer 1876 to (southern hemisphere) summer 2001 was stud-
ied. The data is given in the file soiseason.dat. Fit an appropriate
model to the data justifying your choice and illustrating your answer
with appropriate diagrams.
Ex. 6.14: In Exercise 5.22, the monthly average solar flux from December
1950 to December 2001 was studied. The data is given in the file
solarflux.txt. Fit an appropriate model to the data justifying your
choice and illustrating your answer with appropriate diagrams.
Ex. 6.15: The data file rionegro.dat contains the average monthly heights
of the Rio Negro river at Manaus from 1903–1992 in metres (relative
to an arbitrary reference point). Find a suitable model for the times
series, including a diagnostic analysis of possible models.
6.9 The model chosen for the variable Mean was simply that the data were
random. Hence the residual acf and residual pacf are just the sam-
ple acf and sample pacf as shown in Fig. 5.13. The cumulative
periodogram shows no problems with this model; see Fig. 6.10. The
Box–Pierce test likewise indicates no problems. The Q–Q plot is not
ideal though (and looks better if an ar(3) model is fitted). Here is
some of the code:
> Box.test(rflow)
Box-Pierce test
data: rflow
X-squared = 0.865, df = 1, p-value = 0.3523
[Figure 6.10: The cumulative periodogram and Q–Q plot for the mean
annual streamflow data rflow.]
Box-Pierce test
data: resid(rn.ar3)
X-squared = 0.1847, df = 1, p-value = 0.6674
> plot(ht)
Figure 6.12: The acf and pacf of the Rio Negro river data.
Figure 6.13: The residual acf and pacf of the Rio Negro river data after
fitting the ar(3) model
Figure 6.14: Further diagnostic plots of the Rio Negro river data after fitting
the ar(3) model.
> coef(rn.ar3)/sqrt(diag(rn.ar3$var.coef))
The Box–Pierce test shows no problems, and all the parameters seem
necessary. This model seems fine (if not perfect).
> rn.ar3
Call:
arima(x = ht, order = c(3, 0, 0))
Coefficients:
ar1 ar2 ar3 intercept
1.1587 -0.4985 0.1837 -0.0020
s.e. 0.0299 0.0437 0.0299 0.1462
Non-Stationary Models
7
Module contents
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.2 Non-stationarity in the mean . . . . . . . . . . . . . . . 131
7.3 Non-stationarity in the variance . . . . . . . . . . . . . 134
7.4 arima models . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.4.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.4.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.4.3 Backshift operator . . . . . . . . . . . . . . . . . . . . . 138
7.5 Seasonal models . . . . . . . . . . . . . . . . . . . . . . . 138
7.5.1 Identifying the season length . . . . . . . . . . . . . . . 141
7.5.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.5.3 The backshift operator . . . . . . . . . . . . . . . . . . . 147
7.5.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.6 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.7 Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.8 A summary of model fitting . . . . . . . . . . . . . . . . 154
7.9 A complete example . . . . . . . . . . . . . . . . . . . . . 156
7.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.11.1 Answers to selected Exercises . . . . . . . . . . . . . . . 164
Module objectives
7.1 Introduction
Up to now, all the time series considered have been assumed stationary. This
assumption was crucial to the definitions of the autocorrelation and partial
autocorrelation. In practice, however, many time series are not stationary.
In this Module, methods for identifying non-stationary series are considered,
and then models for modelling these series are examined.
Many series may exhibit more than one of these types of non-stationarity.
Note that each time a set of differences is calculated, the new series has one
less observation than the original. In r, differences are created using diff.
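For example (a sketch with made-up numbers):

```r
# diff() returns the lagged differences of a series.
x <- ts(c(5, 7, 6, 9, 12))
diff(x)                    # first differences: 2 -1 3 3 (one fewer observation)
diff(x, differences = 2)   # differences of the differences: -3 4 0
```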
Example 7.1:
Consider the annual rainfall near Wendover, Utah, USA. The data
appear to have a non-stationary mean (Fig. 7.1) as the mean drifts up
and down, though not too severely. To check this, a smoothing
filter was applied, computing the mean of each set of six observations
at a time. This smooth (Fig. 7.1, top panel) suggests the mean is
probably non-stationary, as the smooth is not (approximately) constant.
The following code fragment shows how the differenced series was found
in r.
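A sketch of that fragment (ann.rain is assumed to hold the rainfall series, matching the name used in Example 7.5):

```r
# Difference the annual rainfall series and plot the result.
d.rain <- diff(ann.rain)
plot(d.rain)
```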
Figure 7.1: The annual rainfall near Wendover, Utah, USA in mm. Top:
the original data is plotted with a thin line, and a smooth in a thick line,
indicating that the mean is non-stationary. Bottom: the differenced data is
plotted with a thin line, and a smooth in a thick line. Since the smooth is
relatively flat, the differenced data has a stationary mean.
Figure 7.2: The Atlantic Multidecadal Oscillation from 1948 to 1994. Top:
the plot shows the data is not stationary. Middle: the first differences are
also not stationary. Bottom: taking two sets of differences has produced a
stationary series.
Example 7.2: Enfield et al. [16] used the Kaplan SST to compute a ten-
year running mean of detrended Atlantic SST anomalies north of the
equator. This data series is called the Atlantic Multidecadal Oscilla-
tion (AMO). The data, obtained from the NOAA Climatic Diagnostic
Center [2], are stored as amo.dat. A plot of the data shows the series is
non-stationary in the mean; (Fig. 7.2, top panel). The first differences
are also non-stationary; (Fig. 7.2, middle panel). Taking one more
set of differences produces approximately stationary data; (Fig. 7.2,
bottom panel).
Here is the code used.
> amo <- ts(scan("amo.dat"), start = 1948,
+     frequency = 1)
> par(mfrow = c(3, 1))
> plot(amo, main = "AMO", las = 1)
> damo <- diff(amo)
> plot(damo, main = "One difference of AMO",
+ las = 1)
> ddamo <- diff(damo)
> plot(ddamo, main = "Two differences of AMO",
+ las = 1)
7.4.1 Notation
ar, ma or arma models in which differences have been taken are collectively
called autoregressive integrated moving average models, or arima models.
Consider an arima model in which the original time series has been differenced d times (d is mostly 1, sometimes 2, and almost never greater than 2). If this now-stationary time series can be well modelled by an arma(p, q) model, the resulting model for the original series is called an arima(p, d, q) model.
Figure 7.3: The differences of the annual rainfall near Wendover, Utah, USA
in mm. Top: the sample acf. Bottom: the sample pacf.
Example 7.3: In Example 7.1, the annual rainfall near Wendover, Utah,
say {Xn }, was considered. The time series was non-stationary, and
differences were taken. The differenced time series, say {Yn }, is now
stationary. The sample acf and pacf of the stationary series {Yn } is
shown in Fig. 7.3.
The sample acf suggests an ma(1) model is appropriate (again re-
calling that the term at lag 0 is always one), while the sample pacf
suggests an ar(2) model is appropriate. The AIC recommends an
ar(1) model. If the ar(2) model is chosen, the model would be an
arima(2, 1, 0). If the ma(1) model is chosen, the model would be an
arima(0, 1, 1). If the ar(1) model is chosen, the model would be an
arima(1, 1, 0).
7.4.2 Estimation
The r function arima can be used to fit arima models, with only a simple
change to what was seen for stationary models.
Example 7.5: In Example 7.3, three models are considered. To fit the
arima(0, 1, 1) model, use the code
> arima(ann.rain, order = c(0, 1, 1))
Call:
arima(x = ann.rain, order = c(0, 1, 1))
Coefficients:
ma1
-0.7036
s.e. 0.1208
We have now seen what the second element of order is for: it indicates
the order of the differencing necessary to make the series stationary.
The fitted model for the first differences of the annual rainfall series
is therefore Wt = −0.7036et−1 + et where Wt = Yt − Yt−1 , and {Y } is
the original time series of annual rainfall (since first differences were
taken). This can be written as
Yt − Yt−1 = −0.7036et−1 + et
and further unravelled to
Yt = Yt−1 − 0.7036et−1 + et .
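The correspondence between fitting arima(0, 1, 1) to the original series and fitting an ma(1) to its first differences can be checked on simulated data (a sketch only, not the rainfall data itself):

```r
set.seed(1)
# Integrate a simulated ma(1) series, so its first differences are ma(1)
x <- cumsum(arima.sim(model = list(ma = -0.7), n = 300))
fit1 <- arima(x, order = c(0, 1, 1))                           # difference inside arima
fit2 <- arima(diff(x), order = c(0, 0, 1), include.mean = FALSE)  # difference by hand
# Both fits estimate essentially the same ma1 coefficient
c(coef(fit1)["ma1"], coef(fit2)["ma1"])
```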
To fit the arima(1, 1, 0) model, proceed as follows:
> arima(ann.rain, order = c(1, 1, 0))
Call:
arima(x = ann.rain, order = c(1, 1, 0))
Coefficients:
ar1
-0.4494
s.e. 0.0933
When differences are taken of a time series {Xt }, this is written using the
backshift operator as Yt = (1 − B)Xt .
Example 7.7: In Example 7.2, the AMO from 1948 to 1994 was examined.
Two sets of differences were required to make the data stationary.
Looking at the sample acf and sample pacf of the twice-differenced
data shows that no further model is necessary. The fitted model is
therefore an arima(0, 2, 0) model. Using the backshift operator, the model is
(1 − B)2 At = et , where {A} is the AMO series.
The most common type of non-stationarity is when the time series exhibits
a ‘seasonal’ pattern. ‘Seasonal’ does not necessarily have anything to do
with the seasons of Winter, Spring, and so on. It means that there is some
kind of regular pattern in the data. This type of non-stationarity is very
common in climatological and meteorological applications, where there is
often an annual pattern evident in the data. Seasonal data is time series
data that shows regular fluctuation aligned usually with some natural time
period (not just the actual seasons of Winter, Spring, etc). The length of
a season is the time period over which the pattern repeats. For example,
monthly data might show an annual pattern with a season of length 12, as
the data may have a pattern that repeats each year (that is, each twelve
months). These patterns usually appear in the sample acf and pacf.
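To see how a seasonal pattern shows up in the sample acf, consider a simulated monthly series with an annual cycle (a sketch only):

```r
set.seed(2)
month <- 1:240
# Monthly data with a cycle that repeats every 12 observations
x <- ts(10 + 3 * sin(2 * pi * month / 12) + rnorm(240), frequency = 12)
r <- acf(x, lag.max = 24, plot = FALSE)
r$acf[13]   # the autocorrelation at lag 12 (one full season) is strongly positive
```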
Example 7.8:
The average monthly sea level at Darwin, Australia (in millimetres),
obtained from the Joint Archive for Sea Level [1], is plotted in the top
panel of Fig. 7.4. The sample acf and sample pacf are also shown.
The code used to produce these figures is given below:
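The code itself is not reproduced in this copy; a sketch of the sort of commands used (with a stand-in monthly series in place of the real data, which would be stored as a ts object sl) is:

```r
# A stand-in monthly series with an annual cycle (the real data are not shown here)
sl <- ts(4 + 0.1 * sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 0.02),
         frequency = 12)
par(mfrow = c(3, 1))
plot(sl, las = 1)   # the data: a regular rise and fall each year
acf(sl)             # the seasonality also appears in the sample acf
pacf(sl)
```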
The data show a seasonal pattern—the sea level has a regular rise and
fall according to the months of the year (as expected). The length of
the season is therefore twelve, since the pattern is of length twelve,
when the pattern then repeats. This seasonality also appears in the
sample acf.
For example, a model such as
Xt = et − 0.23Xt−12
might be used to model monthly data (where the season length is twelve, as
the data might be expected to repeated each year). This model explicitly
models the seasonal pattern by incorporating an autoregressive term at a
lag of twelve.
Figure 7.4: The monthly average sea level at Darwin, Australia in metres.
Top: the data are plotted. Centre: the sample acf and Bottom: the sample
pacf.
For example, a model with two seasonal ar terms for monthly data would involve Xt−12 and Xt−24 , since the first ar term is one ‘season’ (12 time steps) behind, and the second ar term is two ‘seasons’ (2 × 12 = 24 time steps) behind.
To remove a seasonal pattern of length s, the seasonal difference
Yt = Xt − Xt−s
is used to create a more stationary time series. Again, the r function diff
is used, with an optional parameter given to indicate the season length.
Example 7.9: The average monthly sea level data at Darwin (Example 7.8, p 139) has a strong seasonal pattern. Taking seasonal differences seems appropriate:
The plot of the seasonally differenced data (Fig. 7.5, top panel) sug-
gests the series is still possibly non-stationary in the mean, so taking
ordinary (non-seasonal) differences also seems appropriate:
The plot of the twice-differenced data (Fig. 7.5, bottom panel) is now
approximately stationary.
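The differencing steps can be sketched in r as follows (using a stand-in series, since the real data are not reproduced here; dsl holds the seasonal differences and ddsl the doubly differenced series):

```r
# Stand-in monthly series with an annual cycle (the real data would be in sl)
sl <- ts(rnorm(120) + rep(sin(2 * pi * (1:12) / 12), 10), frequency = 12)
dsl  <- diff(sl, lag = 12)   # seasonal differences: Yt = Xt - Xt-12
ddsl <- diff(dsl)            # then ordinary (non-seasonal) first differences
c(length(sl), length(dsl), length(ddsl))
```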
Example 7.10: Kärner & Rannik [25] used the seasonal ma model
xt − xt−12 = et − Θ1 et−12
Figure 7.5: The differences in monthly average sea level at Darwin, Australia
in metres (see also Fig. 7.4). Top: the seasonal differences are plotted, while
in the bottom plot, both seasonal and non-seasonal differences have been
taken.
Figure 7.6: The Quasi-Biennial Oscillation (QBO) from 1955 to 2001. Top:
the QBO is plotted, showing cyclic behaviour. Middle: the spectrum
is shown. Bottom: the spectrum is shown again, but has been smoothed.
The result is a much smoother spectrum (Fig. 7.6, bottom panel). The
season length is identified as the frequency where the spectrum is at
its greatest. This can also be done in r:
> max.freq <- sp$freq[which.max(sp$spec)]
> max.freq
[1] 0.4375
> 1/max.freq
[1] 2.285714
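The same recipe can be checked on a simulated series with a known cycle length (a sketch only; max.freq is the frequency at which the smoothed spectrum peaks):

```r
set.seed(3)
n <- 480
x <- sin(2 * pi * (1:n) / 28) + rnorm(n, sd = 0.3)  # a cycle of length 28
sp <- spectrum(x, spans = c(9, 9), plot = FALSE)    # smoothed periodogram
max.freq <- sp$freq[which.max(sp$spec)]
1 / max.freq    # close to 28, the known cycle length
```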
Figure 7.7: Four replications of a spectrum from 1000 Normal random numbers. There is no evidence of any one frequency dominating.
7.5.2 Notation
These models are very difficult to write down. There are a number of parameters that must be included: the orders p and q of the non-seasonal ar and ma components; the order d of non-seasonal differencing; the orders P and Q of the seasonal ar and ma components; the order D of seasonal differencing; and the season length s.
Note that any one model should only have a few parameters, and so some of p, q, P and Q are expected to be zero. In addition, d + D is most often one, sometimes two, and rarely greater than two. These parameters are summarized by writing the model down as follows: a model with all of the above parameters would be written as an arima(p, d, q) (P, D, Q)s model.
Example 7.17: In Example 7.8 (p 139), the average monthly sea level at
Darwin was analysed. In Example 7.9 (p 141), seasonal differences
were taken to make the data stationary.
The seasonally differenced data (Fig. 7.5, top panel) was non-stationary.
The seasonally differenced and non-seasonally differenced data (Fig. 7.5,
bottom panel) looks approximately stationary. The sample acf and
pacf of this series is shown in Fig. 7.8.
Figure 7.8: The sample acf and pacf for the twice-differenced monthly
average sea level at Darwin, Australia in metres. Top: the sample acf;
Bottom: the sample pacf of the twice-differenced data are shown.
For the non-seasonal components of the model, the sample acf sug-
gests no model is necessary (the one component above the dotted confi-
dence interval can probably be ignored—it is just over the approximate
lines and is at a lag of two). The sample pacf suggests no model is
needed either—though there is again a marginal component at a lag of
two. (It may be necessary to include these terms later, as will become
evident in the diagnostic analysis, but it is unlikely.)
For the seasonal model, the sample pacf decays very slowly (there
is one significant term at seasonal lags 1, 2 and 3), suggesting a large
number of seasonal ar terms would be necessary. In contrast, the sample acf
suggests one seasonal ma term is needed. In summary, two differences
have been taken (so d = 1 and D = 1). No non-seasonal model seems
necessary (so p = q = 0), but a seasonal ma(1) term is suggested (so
P = 0 and Q = 1). So the model is arima(0, 1, 0) (0, 1, 1)12 , and there
is only one parameter to estimate (the seasonal ma(1) parameter).
The general form of an arima(p, d, q) (P, D, Q)s model is written using the backshift operator as
φ(B)Φ(B^s)(1 − B)^d(1 − B^s)^D Xt = θ(B)Θ(B^s)et ,
where φ and θ are polynomials of degrees p and q in B, and Φ and Θ are polynomials of degrees P and Q in B^s.
Example 7.18: In Example 7.17, one model suggested for the average
monthly sea level at Darwin was arima(0, 1, 0) (0, 1, 1)12 . Using the
backshift operator, this model is
(1 − B)(1 − B^12)Xt = (1 − Θ1B^12)et .
Example 7.19: Maier & Dandy [31] use arima models to model the daily
salinity at Murray Bridge, South Australia from Jan 1, 1987 to 31 Dec
1991. They examined numerous models, including some models not
in the Box–Jenkins methodology. The best Box–Jenkins models were
those based on one set of non-seasonal differences, and one or two sets
of seasonal differences, with a season of length s = 365. One of their
final models was the arima(1, 1, 1) (1, 2, 0)365 model
7.5.4 Estimation
Estimation of seasonal arima models is quite tricky, as there are many pa-
rameters that could be specified: the ar and ma components both seasonally
and non-seasonally. This is part of the help from the arima function
The input order has been used previously; to also specify seasonal components, the input seasonal must be used.
> arima(sealevel$Sealevel, order = c(0, 1, 0),
+     seasonal = list(order = c(0, 1, 1), period = 12))
Call:
arima(x = sealevel$Sealevel, order = c(0, 1, 0), seasonal = list(order = c(0,
    1, 1), period = 12))
Coefficients:
sma1
-0.9996
s.e. 0.2305
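The seasonal input takes a list giving its own order and the period; the syntax can be sketched on simulated data (a rough sketch with a made-up series, not the sea level data):

```r
set.seed(4)
e <- rnorm(252)
w <- e[13:252] - 0.5 * e[1:240]                # a seasonal ma(1) term at lag 12
x <- ts(diffinv(w, lag = 12), frequency = 12)  # seasonally integrated series
fit <- arima(x, order = c(0, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 12))
coef(fit)["sma1"]    # estimate near the true value of -0.5
```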
7.6 Forecasting
The principles of forecasting used earlier apply to arima and seasonal arima
models without significant differences. However, it is necessary to write the
model without using the backshift operator first, which can be quite tedious.
In Example 7.19, a model was given for the daily salinity at Murray Bridge, South Australia, say {Xt }. After expanding the terms on the left-hand side, there will be terms involving B, B^2, B^365, B^366, B^367, B^730, B^731, B^732, B^1095, B^1096 and B^1097. This makes the model very difficult to write down. Indeed, without using the backshift operator as above, it would be very tedious to write down the model at all, even though only three parameters have been estimated. Note this is an unusual case of model fitting in that three sets of differences were taken.
Figure 7.9: Some residual plots for the arima(0, 1, 0) (0, 1, 1)12 fitted to the
monthly sea level at Darwin. Top left: the residual acf; top right: the
residual pacf; bottom left: the cumulative periodogram; bottom right: the
Q–Q plot.
7.7 Diagnostics
The usual diagnostics apply equally for non-stationary models; see Module 6.
Example 7.23: In Example 7.17, the model arima(0, 1, 0) (0, 1, 1)12 was
suggested for the monthly sea level at Darwin. The residual acf,
residual pacf and the cumulative periodogram can be produced in
r (Fig. 7.9).
> sma1 <- arima(sl, order = c(0, 1, 0), seasonal = list(order = c(0,
+ 1, 1), period = 12))
> acf(resid(sma1))
> pacf(resid(sma1))
> cpgram(resid(sma1))
Both the residual acf and pacf look OK, but both have a significant
term at lag 2; the periodogram looks a little suspect, but isn’t too bad.
The Box–Pierce Q statistic can be computed, and the standard error
of the estimated parameter found also:
> Box.test(resid(sma1))
Box-Pierce test
data: resid(sma1)
X-squared = 2.9538, df = 1, p-value = 0.08567
> coef(sma1)/sqrt(diag(sma1$var.coef))
sma1
-4.335704
> oth.mod <- arima(sl, order = c(2, 1, 0), seasonal = list(order = c(0,
+ 1, 1), period = 12))
> acf(resid(oth.mod))
> pacf(resid(oth.mod))
> cpgram(resid(oth.mod))
> qqnorm(resid(oth.mod))
> qqline(resid(oth.mod))
> Box.test(resid(oth.mod))
Box-Pierce test
data: resid(oth.mod)
X-squared = 0.0116, df = 1, p-value = 0.9144
> coef(oth.mod)/sqrt(diag(oth.mod$var.coef))
The residual acf and pacf appear better, as does the periodogram.
The Box–Pierce statistic is now certainly not significant, but one of the
parameters is unnecessary. (This was expected; we only really wanted
Figure 7.10: Some residual plots for the arima(2, 1, 0) (0, 1, 1)12 fitted to
the monthly sea level at Darwin. Top left: the residual acf; top right: the
residual pacf; bottom left: the cumulative periodogram; bottom right: the
Q–Q plot.
the second lag, but were forced to take the first, insignificant one.)
The Q–Q plot looks marginally improved also.
Fitting an arima(0, 1, 2) (0, 1, 1)12 produces similar results. Which is
the better model? It is not entirely clear; either is probably OK.
To summarise, these are the steps that need to be taken to fit a good model:
Plot the data. Check that the data is stationary. If the data is not
stationary, deal with it appropriately (by taking logarithms or differ-
ences (seasonal and/or non-seasonal), or perhaps both). Remember
that it is rare to require many levels of differencing.
Examine the sample acf, sample pacf and/or the AIC to determine
possible models for the data. Models may include ma, ar, arma or
arima models, with non-seasonal and/or seasonal aspects. (Remember
that it is rare to have models with a large number of parameters to
be estimated.) You may have to use a periodogram to identify the season
length.
Use r’s arima function to fit the models and determine the parameter
estimates.
Perform the following diagnostic checks for each of the possible models.
Choose the best model from the available information, and write down
the model (probably using backshift operators). Remember that the
simplest, most adequate model is the best model; more parameters do
not necessarily make a better model.
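The whole procedure can be sketched on a simulated series (here a pure random walk, for which differencing alone turns out to be a complete model):

```r
set.seed(5)
x <- cumsum(rnorm(200))       # a random walk: non-stationary in the mean
dx <- diff(x)                 # plot(x) wanders, so take first differences
# acf(dx) and pacf(dx) would show no significant terms, suggesting
# an arima(0, 1, 0) model: differences only, no ar or ma parameters
fit <- arima(x, order = c(0, 1, 0))
Box.test(resid(fit))$p.value  # a large value suggests the residuals are random
```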
Is this model an adequate model? The residual acf and pacf (Fig. 7.14)
suggest the model is adequate.
The cumulative periodogram and Q–Q plots (Fig. 7.15) indicate the model
is adequate. The two estimated parameters are also significant:
[Figure: the co series (top), its first differences dco (middle) and its second differences ddco (bottom), plotted against time.]
[Figure: the sample acf (top) and pacf (bottom) of the twice-differenced co series.]
[Figure: the acf (top) and pacf (bottom) of resid(co.model), the residuals of the fitted model.]
[Figure: the cumulative periodogram (left) and Q–Q plot (right) of resid(co.model).]
> co.model$coef/sqrt(diag(co.model$var.coef))
ma1 sma1
-6.62813 -27.24159
> co.model
Call:
arima(x = co, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
Coefficients:
ma1 sma1
-0.3634 -0.8581
s.e. 0.0548 0.0315
7.10 Summary
7.11 Exercises
Ex. 7.24: Consider an arima(1, 0, 0) (0, 1, 1)7 model fitted to a time series
{Pn }. Write this model using the backshift operator notation (make
up some reasonable parameter estimates).
Ex. 7.25: Consider an arima(1, 1, 0) (1, 1, 0)12 model fitted to a time series
{Yn }. Write this model using the backshift operator notation (make
up some reasonable parameter estimates).
(a) Write down the model fitted to the series {W } using the backshift
notation;
(b) Write down the model fitted to the series {W } using arima no-
tation.
(c) Write the model out in terms of Wt , et and previous terms (that
is, don’t use the backshift operator).
Ex. 7.27: Consider some non-stationary data {Z}. After taking seasonal
differences, the series seems stationary. Let this differenced data be
{Y }. A non-seasonal ma(2) model and a seasonal arma(1, 1) model are
fitted to the stationary data (the season is of length 24).
(a) Write down the model fitted to the series {Z} using the backshift
notation;
(b) Write down the model fitted to the series {Z} using arima no-
tation.
(c) Write the model out in terms of Zt , et and previous terms (that
is, don’t use the backshift operator).
Ex. 7.28: For each of the following cases, write down the final model using
the backshift operator and using arima notation.
Ex. 7.29: For each of the following cases, write down the final model using
the backshift operator and using arima notation.
Ex. 7.30: For the following models written using backshift operators, ex-
pand the model and write down the model in standard form. In addi-
tion, write down the model using arima notation.
Ex. 7.31: For the following models written using backshift operators, ex-
pand the model and write down the model in standard form. In addi-
tion, write down the model using arima notation.
Ex. 7.32: Consider some non-stationary monthly data {G}. After taking
seasonal differences, the series seems stationary. Let this differenced
data be {H}. A non-seasonal ma(2) model, a seasonal ma(1) and a
seasonal ar(1) model are fitted to {H}. Write down the model fitted
to the series {G} using
Ex. 7.33: Trenberth & Stepaniak [43] defined an index of El Niño evolution
they called the Trans-Niño Index (TNI). This monthly time series is
given in the data file tni.txt, and contains values of the TNI from
January 1958 to December 1999. (The data have been obtained from
the Climate Diagnostic Center [2].)
Ex. 7.34: The sunspot numbers from 1770 to 1869 were given in Table 1.2
(p 20). The data are given in the data file sunspots.dat.
(a) Plot the data and decide if a seasonal component appears to exist.
(b) Use spectrum (and a smoother) to find any seasonal components.
(c) Suggest a possible model for the data (make sure to do a diag-
nostic analysis).
(a) Plot the data and decide if a seasonal component appears to exist.
(b) Use spectrum (and a smoother) to find any seasonal components.
(c) Suggest a possible model for the data (make sure to do a diag-
nostic analysis).
Ex. 7.37: Kärner & Rannik [25] fit an arima(0, 0, 0) (0, 1, 1)12 to the Inter-
national Satellite Cloud Climatology Project (ISCCP) cloud detection
time series {Cn }. They fit different models for different latitudes. At
−90◦ latitude, the unknown model parameter is about 0.7 (taken from
their Figure 5).
Ex. 7.39: The data file wateruse.dat contains the annual water usage in
Baltimore city in litres per capita per day from 1885 to 1963. The
data are from Hipel & McLeod [21] and Hyndman [5]. Plot the data
and confirm that the data is non-stationary.
Ex. 7.40: The file firring.txt contains the tree ring indices for the Douglas fir at the Navajo National Monument in Arizona, USA from 1107 to 1968. Find a suitable model for the data.
Ex. 7.41: The data file venicesealevel.dat contains the maximum sea
levels recorded at Venice from 1887–1981. Find a suitable model for
the times series, including a diagnostic analysis of possible models.
This is equivalent to
7.39 The series is plotted in the top plot in Fig. 7.16. The data are clearly
non-stationary in the mean. Taking differences produces an approximately
stationary series; see the bottom plot in Fig. 7.16.
Using the stationary differenced series, the sample acf and pacf are
shown in Fig. 7.17. These plots suggest that no model can be fitted
to the differenced series. That is, the first differences are random.
The model for the water usage {Wt } is therefore
(1 − B)Wt = et
Figure 7.16: The annual water usage in Baltimore city in litres per capita
per day from 1885 to 1968. Top: the data is clearly non-stationary in the
mean; Bottom: the first differences are approximately stationary.
Figure 7.17: The annual water usage in Baltimore city in litres per capita
per day from 1885 to 1968. Top: the sample acf of the differenced series;
Bottom: the sample pacf of the differenced series.
Figure 7.18: A plot of the Venice sea level data. Left: original data; right:
after taking first differences
This model appears fine, if not perfect, so let’s examine more diagnos-
tics (Fig. 7.22); these look OK too.
So, for some final diagnostics:
Figure 7.19: The acf and pacf of the Venice sea level data
Figure 7.20: The residual acf and pacf of the Venice sea level data after
fitting the ar(1) model
Figure 7.21: The residual acf and pacf of the Venice sea level data after
fitting the ma(1) model
Figure 7.22: Further diagnostic plots of the Venice sea level data after fitting the ma(1) model.
> Box.test(resid(vs.ma1))
Box-Pierce test
data: resid(vs.ma1)
X-squared = 0.8388, df = 1, p-value = 0.3597
> coef(vs.ma1)/sqrt(diag(vs.ma1$var.coef))
ma1
-16.48790
> vs.ma1
Call:
arima(x = vs, order = c(0, 1, 1))
Coefficients:
ma1
-0.8677
s.e. 0.0526
Module contents
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.3 The transition matrix . . . . . . . . . . . . . . . . . . . . 177
8.4 Forecast the future with powers of the transition matrix181
8.5 Classification of finite Markov chains . . . . . . . . . . 184
8.6 Limiting state (steady state) probabilities . . . . . . . . 187
8.6.1 Share of the market model . . . . . . . . . . . . . . . . 190
8.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.7.1 Answers to selected Exercises . . . . . . . . . . . . . . . 200
Module 8. Markov chains
8.1 Introduction
Up to now, only continuous time series have been considered; that is, the
quantity being measured over time is continuous. In this Module1 , a simple
method is considered for time series that take on discrete values. A simple
example is the state of the weather: whether it is fine or raining, for example.
8.2 Terminology
Let T (n) be the time between the nth and (n + 1)th pulses registered on
a Geiger counter. The indexing parameter n ∈ {0, 1, 2, . . .} is discrete
and the state space continuous. A realisation of this process would be
a discrete set of real numbers with values in the range (0, ∞).
1 Most of the material in this Module has been drawn from previous work by Dr Ashley Plank and Professor Tony Roberts.
In this section we consider stochastic models with both discrete state space
and discrete parameter space. Some examples include annual surveys of biological populations; monthly assessments of the water levels in a reservoir;
weekly inventories of stock; daily inspections of a vending machine; microsecond sampling of a buffer state. These models are used occasionally in
climate modelling.
[State transition diagram: states 1 = sunny and 2 = cloudy, with the transition probabilities marked on the arrows.]
It follows that
Pr {Xt+1 = j | Xt = it , Xt−1 = it−1 , . . . , X0 = i0 } = Pr {Xt+1 = j | Xt = it } .    (8.1)
This expression says that the probability distribution of the state at time
t + 1 depends only on the state at time t (namely it ) and does not depend
on the states the chain passed through on the way to it at time t. Usu-
ally we make a further assumption that for all states i and j and all t,
Pr {Xt+1 = j | Xt = i} is independent of t. This assumption applies when-
ever the system under study behaves consistently over time. Any stochastic
process with this behaviour is called stationary. Based on this assumption
we write
Pr {Xt+1 = j | Xt = i} = Pij , (8.2)
so that Pij is the probability that given the system is in state i at time t, the
system will be in state j at time t + 1. Pij ’s are referred to as the transition
probabilities.
Note that it is crucial that you clearly define the states and the discrete
times.
Example 8.2: Preisendorfer and Mobley [37] and Wilks [48] use a three-
state Markov chain to model the transitions between below-normal,
normal and above-normal months for temperature and precipitation.
For a system with s states the transition probabilities are conveniently repre-
sented as an s × s matrix P . Such a matrix P is called the transition matrix
and each Pij is called a one-step transition probability. For example, P12
represents the probability that the process makes a transition from state 1
to state 2 in one period, whereas P22 is the probability that the system stays
in state 2. Each row represents the one-step transition probability distri-
bution over all states. If we observe the system in state i at the beginning
of any period, then the ith row of the transition matrix P represents the
probability distribution over the states at the beginning of the next period.
or it can be sunny tomorrow then sunny the day after, with probability
(as the transitions are assumed independent)
Since these are two mutually exclusive routes, we add their probabilities
to determine
Using independence of transitions from step to step, and the mutual exclusiveness of different possible paths, we establish the general key result:
p(t + 1) = p(t)P .
Proof: Consider the following schematic general but partial state transition diagram:
[Diagram: each state i = 1, 2, . . . , s at time t, occupied with probability pi (t), leads to state j at time t + 1 along an arrow labelled Pij .]
Summing over these mutually exclusive paths gives pj (t + 1) = p1 (t)P1j + p2 (t)P2j + · · · + ps (t)Psj , which is precisely the jth element of the matrix product p(t)P .
Note that the future behaviour of the system (for example, the states of
the weather) only depends on the current state and not on how it entered
this state. Given the transition matrix P , knowledge of the current state
occupied by the process is sufficient to completely describe the future proba-
bilistic behaviour of the process. This lack of memory of earlier history may
be viewed as an extreme limitation. However, this is not so. As the next
example shows, we can build into the current state such a memory. The
trick is widely applicable and creates a powerful modelling mechanism.
if the last two days have been sunny, then 95% of the time tomorrow will be sunny;
if yesterday was cloudy and today is sunny, then 70% of the time tomorrow will be sunny;
if yesterday was sunny and today is cloudy, then 60% of the time tomorrow will be cloudy;
if the last two days have been cloudy, then 80% of the time tomorrow will be cloudy.
[State transition diagram: the four states SS, SC, CS and CC, with the transition probabilities below marked on the arrows.]
         SS    SC    CS    CC
     SS  0.95  0.05  0.00  0.00
P =  SC  0.00  0.00  0.40  0.60
     CS  0.70  0.30  0.00  0.00
     CC  0.00  0.00  0.20  0.80
See that you may write down the states of a Markov chain in any order that
you please. But once you have decided on an ordering, you must stick to
that ordering throughout the analysis. In the above example, the labels for
both the rows and the columns of the transition matrix must be, and are,
in the same order, namely SS, SC, CS, and CC. In applying Markov chains,
there need not be a natural order for the states, and so you will have to
decide and fix upon one.
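The two-day weather matrix above can be entered in r directly; a quick check that each row is a probability distribution (that is, sums to one) is a good habit:

```r
states <- c("SS", "SC", "CS", "CC")
P <- matrix(c(0.95, 0.05, 0.00, 0.00,
              0.00, 0.00, 0.40, 0.60,
              0.70, 0.30, 0.00, 0.00,
              0.00, 0.00, 0.20, 0.80),
            nrow = 4, byrow = TRUE, dimnames = list(states, states))
rowSums(P)   # each row of a transition matrix must sum to 1
```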
Using independence of transitions from step to step, and the mutual exclu-
siveness of different possible paths:
Example 8.5: In the weather Example 8.3 we saw that p(1) = p(0)P and
p(2) = p(1)P , so that p(2) = p(0)P^2 .
Corollary 8.4 The (i, j)th element of P n gives the probability of starting
from state i and being in state j precisely n steps later.
Proof: Being in state i at time t corresponds to p(t) being zero except for
the ith element which is one, then the right-hand side of p(t + n) = p(t)P n
shows p(t + n) must be just the ith row of P n . Thus the corollary follows.
♠
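Corollary 8.4 can be illustrated numerically; a sketch in r, assuming the two-state sunny/cloudy chain with a 0.8 chance of staying sunny and a 0.4 chance of moving from cloudy to sunny (these numbers are assumptions for illustration):

```r
P <- matrix(c(0.8, 0.2,
              0.4, 0.6), nrow = 2, byrow = TRUE,
            dimnames = list(c("sunny", "cloudy"), c("sunny", "cloudy")))
P2 <- P %*% P            # two-step transition probabilities
P2["sunny", "sunny"]     # 0.8*0.8 + 0.2*0.4 = 0.72
```

The (sunny, sunny) entry of P squared is exactly the probability of being sunny two days after a sunny day, summed over both intermediate states.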
Hence [P^2]12 = 0.17. This means that the probability that a city
person will live in the country after 2 transitions (years) is 17%.
To find the population distribution after 1, 2, 3 and 10 years, given
that the initial distribution is p(0) = (0.75  0.25), we perform the
following calculations.
Notice that after many transitions the population distribution tends to settle
down to a steady state distribution.
The above calculations can also be performed as follows:
p(n) = p(0)P^n .
Hence to calculate p(10) we multiply the initial population distribution by
         ( 0.6761  0.3239 )
P^10 =   ( 0.6478  0.3522 ) ,
so that
p(10) = (0.75  0.25) P^10 = (0.6690  0.3310) ,
which is the same result as before.
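These calculations are easy to reproduce in r. (The one-step matrix is not shown in this copy; the matrix below is an assumption, chosen to be consistent with the numbers quoted here — it reproduces [P^2]12 = 0.17, the P^10 shown, and the (2/3, 1/3) steady state.)

```r
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)  # assumed city/country matrix
Pn <- diag(2)
for (i in 1:10) Pn <- Pn %*% P   # P^10 by repeated multiplication
p10 <- c(0.75, 0.25) %*% Pn      # p(10) = p(0) P^10
round(p10, 4)                    # 0.6690 0.3310, as found above
```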
For large n notice that P^n also approaches a steady state with identical
rows. For example,
                ( 0.6667  0.3333 )
lim n→∞ P^n =   ( 0.6667  0.3333 ) .
The probabilities in each row represent the population distribution in the
steady state. This distribution is independent of the initial conditions. For
example if a fraction x (0 ≤ x ≤ 1) of the population initially lived in the
city and a fraction (1−x) in the country, in the steady state situation we will
find 66.67 percent living in the city and 33.33 percent living in the country
regardless of the value of x. This is verified by computing
                     ( 0.6667  0.3333 )
p(∞) = (x  1 − x)    ( 0.6667  0.3333 ) = (0.6667  0.3333) .
The long term behaviour of a Markov chain depends on the general structure of the transition matrix. For some transition matrices the chain will
settle down to a steady state condition which is independent of the initial
state. In this subsection we identify the characteristics of a Markov chain
that will ensure a steady state exists. In order to do this we must classify a
Markov chain according to the structure of its transition diagram and ma-
trix. The critical property we need for a steady state is that the Markov
chain is “ergodic”—you may meet this term in completely different contexts,
such as in fluid turbulence, but the meaning is essentially the same: here
it means that the probabilities get “mixed up” enough to ensure there are
no long time correlations in behaviour, and hence a steady state will appear.
(There are biological situations with intriguing non-ergodic effects.)
Further, an ergodic system is one in which time averages, such as might be
obtained from an experiment, are identical to ensemble averages (averages
over many realisations), which is what we often want to discuss and report
in applications.
Consider the following transition matrix
         1    2    3    4    5
     1  0.3  0.7  0    0    0
     2  0.9  0.1  0    0    0
P =  3  0    0    0.2  0.8  0
     4  0    0    0.5  0.3  0.2
     5  0    0    0    0.4  0.6
This matrix is depicted by the following state transition diagram. Each
node represents a state and the labels on the arrows represent the transition
probability Pij . (Always draw such a state transition diagram for your
Markov chains.)
[State transition diagram for the five states, with arrows labelled by the probabilities Pij above.]
Example 8.7: Determine which of the chains with the following transition matrices is ergodic.

         [ 0    0    0.5  0.5 ]          [ 0.2  0.4  0.4 ]
    P1 = [ 0    0    0.4  0.6 ] ,   P2 = [ 0.1  0.2  0.7 ] .
         [ 0.1  0.9  0    0   ]          [ 0.3  0.3  0.4 ]
         [ 0.4  0.6  0    0   ]

Solution: Draw a state transition diagram for each, then the following observations easily follow. The states in P1 communicate with each other. However, if the process is in state 1 or 2 it will always move to either state 3 or 4 in the next transition. Similarly, if the process is in state 3 or 4 it will always move back to state 1 or 2, so the chain cycles with period 2 and is not ergodic. In P2 all states communicate and no such periodic cycling occurs, so that chain is ergodic.
The common row vector π represents the limiting state probability distribu-
tion or the steady state probability distribution that the process approaches
regardless of the initial state. When the above limit occurs, then following
any initial condition p(0) the probability vector after a large number n of
transitions is
p(n) = p(0)P n → π .
To show this last step, consider just the first element, p1(n), of the probability vector p(n). It is computed as p(0) times the first column of P^n; but P^n tends to the matrix with every row equal to π, so its first column tends to (π1, ..., π1)^T, hence

    p1(n) → p(0) (π1, ..., π1)^T = π1 p(0) (1, ..., 1)^T = π1 ,

as the sum of the elements in p(0) has to be 1. Similarly for all the other elements in p(n).
How do we find these limiting state probabilities π? For a given chain with transition matrix P we have observed that as the number of transitions n increases,

    p(n) → π .

Since p(n+1) = p(n)P, taking the limit of both sides gives

    π = πP . (8.4)

The limiting steady state probabilities are therefore the solution of this system of linear equations, together with the condition that the elements of π sum to 1:

    Σ_{j=1}^{s} πj = 1 . (8.5)
Example 8.8: To illustrate how to solve for the steady state probabilities, consider the transition matrix

        [ 0.7  0.2  0.1  0   ]
    P = [ 0.3  0.4  0.2  0.1 ] .
        [ 0    0.3  0.4  0.3 ]
        [ 0    0    0.3  0.7 ]

Solving π = πP we have

                                            [ 0.7  0.2  0.1  0   ]
    [ π1  π2  π3  π4 ] = [ π1  π2  π3  π4 ] [ 0.3  0.4  0.2  0.1 ] ,
                                            [ 0    0.3  0.4  0.3 ]
                                            [ 0    0    0.3  0.7 ]
or
π1 = 0.7π1 + 0.3π2 + 0π3 + 0π4 ,
π2 = 0.2π1 + 0.4π2 + 0.3π3 + 0π4 ,
π3 = 0.1π1 + 0.2π2 + 0.4π3 + 0.3π4 ,
π4 = 0π1 + 0.1π2 + 0.3π3 + 0.7π4 ,
together with
π 1 + π2 + π 3 + π 4 = 1 . (8.6)
Discarding any one of the first four equations and solving the remaining equations, we find the steady state probabilities:

    π = ( 3/15  3/15  4/15  5/15 ) .

Equivalently, solve (I − P)^T π^T = 0 together with the normalisation constraint (Equation (8.6)). This can all be done in r, albeit with some effort:
[,1]
[1,] 0.2000000
[2,] 0.2000000
[3,] 0.2666667
[4,] 0.3333333
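The same computation can be sketched in a few lines of linear algebra. The sketch below is in Python rather than r: it stacks the equations (I − P)^T π^T = 0 with the normalisation constraint and solves the consistent over-determined system by least squares.

```python
import numpy as np

# Transition matrix from Example 8.8.
P = np.array([[0.7, 0.2, 0.1, 0.0],
              [0.3, 0.4, 0.2, 0.1],
              [0.0, 0.3, 0.4, 0.3],
              [0.0, 0.0, 0.3, 0.7]])

n = P.shape[0]
# Stack the n equations (I - P)^T pi = 0 with the constraint sum(pi) = 1.
A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # approximately (3/15, 3/15, 4/15, 5/15)
```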
80% of the passengers who currently fly with kra will fly with kra next time, 15% will switch to emu and the remaining 5% will switch to nfa;

90% of the passengers who currently fly with emu will fly with emu next time, 6% will switch to kra and the remaining 4% will switch to nfa;

90% of the passengers who currently fly with nfa will fly with nfa next time, 4% will switch to kra and the remaining 6% will switch to emu.
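The three statements translate row by row into a transition matrix. A sketch of the long-run market shares follows; the entries are taken from the statements above, and the ordering of the states as (kra, emu, nfa) is an assumption made for illustration.

```python
import numpy as np

# Rows: current airline; columns: next airline. State order: (kra, emu, nfa).
P = np.array([[0.80, 0.15, 0.05],
              [0.06, 0.90, 0.04],
              [0.04, 0.06, 0.90]])

# Long-run market shares: every row of a high power of P converges to pi.
pi = np.linalg.matrix_power(P, 200)[0]
print(np.round(pi, 3))
```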
8.7 Exercises
Ex. 8.9: The SOI is a well known climatological indicator for eastern Australia. Stone and Auliciems [42] developed SOI phases in which the average monthly SOI is allocated to one of five phases corresponding to the SOI falling rapidly (phase 1), staying consistently negative (phase 2), staying consistently near zero (phase 3), staying consistently positive (phase 4), and rising rapidly (phase 5).

The transition matrix, based on data collected from July 1877 to February 2002, is

        [ 0.668  0.000  0.081  0.154  0.101 ]
        [ 0.000  0.683  0.125  0.062  0.130 ]
    P = [ 0.354  0.000  0.063  0.370  0.212 ] .
        [ 0.000  0.387  0.204  0.102  0.303 ]
        [ 0.036  0.026  0.132  0.276  0.529 ]

Determine whether this Markov chain is ergodic and, if so, find the steady state probabilities.
Ex. 8.10: Draw the state transition diagram for the Markov chain given by

    P = [ 1/3  2/3 ]
        [ 1/4  3/4 ] .
Ex. 8.11: Draw the state transition diagram and hence determine if the following Markov chain is ergodic. Also determine the recurrent, transient and absorbing states of the chain.

        [ 0    0    1    0    0    0   ]
        [ 0    0    0    0    0    1   ]
    P = [ 0    0    0    0    1    0   ]
        [ 1/4  1/4  0    1/2  0    0   ]
        [ 1    0    0    0    0    0   ]
        [ 0    1/3  0    0    0    2/3 ]
Ex. 8.12: The daily rainfall in Melbourne has been recorded from 1981 to
1990. The data is contained in the file melbrain.dat, and is from
Hyndman [5] (and originally from the Australian Bureau of Meteorol-
ogy).
A large number of days recorded no rainfall at all. The following transition matrix is for the two states 'Rain' and 'No rain':

    P = [ 0.721  0.279 ]
        [ 0.440  0.560 ] .
Ex. 8.13: The daily rainfall in Melbourne has been recorded from 1981 to
1990, and was used in the previous exercise. In that exercise, two states
(‘Rain’ (R) or ‘No rain’ (N)) were used. Then, the state yesterday was
used to deduce probabilities of the two states today. In this exercise,
four states are used, taking into account the weather for the previous
two days.
There are four states RR, RN, NR, NN; the left-most state occurs earlier. (That is, RN means a rain-day followed by a day with no rain.) The transition matrix for the four states is:

        [ 0.564  0.436  0      0     ]
    P = [ 0      0      0.315  0.685 ] .
        [ 0.555  0.445  0      0     ]
        [ 0      0      0.265  0.735 ]
(b) Explain why eight entries in the transition matrix must be exactly
zero.
(c) Use r to determine the steady state probabilities of the four states
for the data.
(d) Determine the probability that two wet days will be followed by
two dry days.
1110010011111110011110111
1111001111111110001101101
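Sequences like the one above are turned into a transition matrix by counting one-step transitions and normalising each row. A sketch (states 0 and 1; the counts can be compared with the matrix in the answer to this exercise):

```python
import numpy as np

bits = ("1110010011111110011110111"
        "1111001111111110001101101")
x = [int(c) for c in bits]

counts = np.zeros((2, 2))
for a, b in zip(x[:-1], x[1:]):
    counts[a, b] += 1          # one observed transition a -> b

# Each row divided by its total gives the estimated transition matrix.
P = counts / counts.sum(axis=1, keepdims=True)
print(counts)  # row totals are 14 (from state 0) and 35 (from state 1)
print(P)
```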
Ex. 8.15: Suppose that if it has rained for the past three days, then it will
rain today with probability 0.8; if it did not rain for any of the past
three days, then it will rain today with probability 0.2; and in any
other case the weather today will, with probability 0.6, be the same
as the weather yesterday. Determine the transition matrix for this
Markov chain.
Ex. 8.16: Let {Xn | n = 0, 1, 2, ...} be a Markov chain with state space {1, 2, 3} and transition matrix

        [ 1/2  1/4  1/4 ]
    P = [ 2/3  0    1/3 ] .
        [ 3/5  2/5  0   ]
Ex. 8.17: Determine the limiting state probabilities for Markov chains with the following transition matrices.

         [ 0.5  0.5 ]        [ 0    1  0   ]        [ 0.2  0.4  0.4 ]
    P1 = [ 0.7  0.3 ]   P2 = [ 0    0  1   ]   P3 = [ 0.5  0.2  0.3 ]
                             [ 0.4  0  0.6 ]        [ 0.3  0.4  0.3 ]
Ex. 8.18: Two white and two black balls are distributed in two urns in
such a way that each contains two balls. We say that the system is in
state i, i = 0, 1, 2, if the first urn contains i white balls. At each step,
we randomly select one ball from each urn and place the ball drawn
from the first urn into the second, and conversely with the ball from
the second urn. Let Xn denote the state of the system after the nth step.
Assuming that the process can be modelled as a Markov chain, draw
the state transition diagram and determine the transition matrix.
Ex. 8.19: A company has two machines. During any day each machine that
is working at the beginning of the day has a 1/3 chance of breaking
down. If a machine breaks down during the day, it is sent to a repair facility and will be working 3 days after it breaks down (i.e. if a
machine breaks down during day 3, it will be working at the beginning
of day 6). Letting the state of the system be the number of machines
working at the beginning of the day, draw a state transition diagram
and formulate a transition probability matrix for this situation.
Ex. 8.20: The State Water Authority plans to build a reservoir for flood
mitigation and irrigation purposes on the Macintyre river. The pro-
posed maximum capacity of the reservoir is 4 million cubic metres.
The weekly flow of the river can be approximated by the following
discrete probability distribution:
(a) Model the system as a Markov chain and determine the steady
state probabilities. State any assumptions you make.
(b) Explain the steady state probabilities in the context of this ques-
tion.
Ex. 8.21: Past records indicate that the survival function for light bulbs of
traffic lights has the following pattern:
(a) If each light bulb is replaced after failure, draw a state transition
diagram and find the transition matrix associated with this pro-
cess. Assume that a replacement during the month is equivalent
to a replacement at the end of the month.
(b) Determine the steady state probabilities.
(c) If an intersection has 20 bulbs, how many bulbs fail on average
per month?
(d) If an individual replacement has a cost of $15, what is the long-run average cost per month?
(a) The weather can be classified as Sunny (S) or Cloudy (C). Con-
sider the previous two days classified in this manner; then there
are four states: SS, SC, CS and CC. The transition matrix, en-
tered in R, is
pp <- matrix( nrow=4, ncol=4, byrow=TRUE,
data=c(.9, .1, 0, 0,
0, 0, .4, .6,
.7, .3, 0, 0,
0, 0, .2, .8) )
(You can read ?matrix for assistance.) Verify this could be a valid transition matrix by checking that the row sums, computed with rowSums(pp), are all one. (See ?rowSums.)
Suppose today is the second of two sunny days in a row, state SS, that is p(0) = (1, 0, 0, 0). Enter this state into r by typing
pie <- c(1,0,0,0), then compute the probabilities of being in
various states tomorrow as pie <- pie %*% pp. (See ?"%*%"
for help here (quotes necessary). Note that the operator * does an
element-by-element multiplication; the command %*% is used for
matrix multiplication in r.) Why is Pr {cloudy tomorrow} = 0.1?
Evaluate pie <- pie %*% pp again to compute the probabilities
for two days time. Why is Pr {cloudy in 2 days} = 0.15?
What is the probability of being sunny in 3 days time?
(b) Keep applying pie <- pie %*% pp iteratively and see that the predicted probabilities recognisably converge in about 10–20 days to π = (0.58, 0.08, 0.08, 0.25). These are the long-term probabilities of the various states.
Compute P^10, P^20 and P^30 and see that the rows of powers of the transition matrix also converge to the same probabilities.
(c) So far we have only addressed patterns of probabilities. Some-
times we run simulations to see how the Markov chain may ac-
tually evolve. That is, we need to generate a sequence of states
according to the probabilities of transitions. For this weather ex-
ample, if we start in zeroth state SS we need to generate for the
first state either SS with probability 0.9 or SC with probability
0.1. Suppose it was SC; then for the second state we need to generate either CS with probability 0.4 or CC with probability 0.6. How is this done?
Sampling from general probability distributions is done using the cumulative probability distribution (cdf) and a uniform random number. For example, if we are in state SS, i = 1, then the cdf for the choice of next state is (0.9, 1, 1, 1), obtained in r by cumsum( pp[1,] ). Thus in general the next state j is found from the current state i by comparing a uniform random number against these cumulative sums.
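The sampling step just described can be sketched as follows (in Python rather than r; `searchsorted` plays the role of comparing a uniform random number against the cumulative sums of a row).

```python
import numpy as np

rng = np.random.default_rng(2002)
P = np.array([[0.9, 0.1, 0.0, 0.0],   # SS
              [0.0, 0.0, 0.4, 0.6],   # SC
              [0.7, 0.3, 0.0, 0.0],   # CS
              [0.0, 0.0, 0.2, 0.8]])  # CC

def next_state(i):
    # Compare a uniform random number with the cumulative row probabilities.
    u = rng.random()
    j = int(np.searchsorted(np.cumsum(P[i]), u, side="right"))
    return min(j, P.shape[0] - 1)   # guard against floating-point round-off

states = [0]                 # start in state SS
for _ in range(30):
    states.append(next_state(states[-1]))
print(states)                # one simulated 30-day weather sequence
```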
Ex. 8.24: Packets of information sent via modems down a noisy telephone
line often fail. For example, suppose in 31 attempts we find that
packets are sent successfully or fail with the following pattern, where
1 denotes success and 0 failure:
1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0,
0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1 .
Given this model, what is the long-term probability of success for each
packet?
Suppose the probability of success of packet transmission depends
upon chance and the success or failure of the previous two attempts.
Write down and interpret the four states of this Markov chain model.
Use the data to estimate the transition probabilities, then form them into a 4 × 4 transition matrix P. Compute (in r) a high power of P to see that the long-term distribution of states is approximately π = (0.4, 0.2, 0.2, 0.2), and hence deduce this model would predict a slightly higher overall success rate.
Ex. 8.25: Let a Markov chain with the state space S = {0, 1, 2} be such
that:
Draw the transition diagram and write down the transition matrix.
Draw the transition diagram and find the probability that the particle will be in state 1 after three jumps, given it started in state 1.
Ex. 8.27: (sampling problem) Let X be a Markov Chain. Show that the
sequence Yn = X2n , n ≥ 0 is a Markov chain (such chains are called
imbedded in X).
Ex. 8.28: (lumping states together) Let X be a Markov chain. Show that
Yn = |Xn | , n ≥ 0 is not necessarily a Markov chain.
Ex. 8.29: Classify the states of the following Markov chains and determine whether they are absorbing, transient or recurrent:

         [ 0    1/2  1/2 ]
    P1 = [ 1/2  0    1/2 ] ;
         [ 1/2  1/2  0   ]

         [ 0    0    0  1 ]
    P2 = [ 0    0    0  1 ] ;
         [ 1/2  1/2  0  0 ]
         [ 0    0    1  0 ]
Ex. 8.34: Suppose that coin 1 has probability 0.7 of coming up heads, and
coin 2 has probability 0.6 of coming up heads. If the coin flipped
today comes up heads, then we select coin 1 to flip tomorrow, and if
it comes up tails, then we select coin 2 to flip tomorrow. If the coin
initially flipped is equally likely to be coin 1 or coin 2, then what is
the probability that the coin flipped on the third day after the initial
flip is coin 1?
Ex. 8.35: For a series of dependent trials the probability of success on any
trial is (k + 1)/(k + 2) where k is equal to the number of successes on
the previous two trials. Compute
8.9 The chain is ergodic, and the steady state probabilities are (to three
decimal places)
[0.165, 0.247, 0.126, 0.183, 0.278].
8.14 With rows and columns labelled by the states 0 and 1,

    P = [ 6/14   8/14  ]
        [ 8/35  27/35 ] .
8.15 The process may be modelled as an 8 state Markov chain with states
{[111], [112], [121], [122], [211], [212], [221], [222]} where 1 indicates no
rain, 2 indicates rain and a triple [abc] indicates the weather was a the
day before yesterday, b yesterday and c today.
        [111] [ 0.8  0.2  0    0    0    0    0    0   ]
        [112] [ 0    0    0.4  0.6  0    0    0    0   ]
        [121] [ 0    0    0    0    0.6  0.4  0    0   ]
    P = [122] [ 0    0    0    0    0    0    0.4  0.6 ]
        [211] [ 0.6  0.4  0    0    0    0    0    0   ]
        [212] [ 0    0    0.4  0.6  0    0    0    0   ]
        [221] [ 0    0    0    0    0.6  0.4  0    0   ]
        [222] [ 0    0    0    0    0    0    0.2  0.8 ]
8.16

          [ 17/30  9/40  5/24  ]
    P^2 = [ 16/30  9/30  1/6   ] .
          [ 17/30  3/20  17/60 ]

(c) π = (1/3, 1/3, 1/3)
8.18

        [ 0    1    0   ]
    P = [ 1/4  1/2  1/4 ]
        [ 0    1    0   ]
8.19 The process may be modelled as a 6 state Markov chain with the
following states ∈ {[200], [101], [110], [020], [011], [002]}.
The three numbers in the label for each state describe the number of machines currently working, in repair for 1 day, and in repair for 2
days. For example, the state [020] means no machines are currently
working and both machines were broken down yesterday and would
be available again the day after tomorrow. If we are currently at
state [020] then after one transition (day) the process will move to
state [002]. Following this process we find the transition matrix as
        [200] [ 4/9  0    4/9  1/9  0    0 ]
        [101] [ 2/3  0    1/3  0    0    0 ]
    P = [110] [ 0    2/3  0    0    1/3  0 ]
        [020] [ 0    0    0    0    0    1 ]
        [011] [ 0    1    0    0    0    0 ]
        [002] [ 1    0    0    0    0    0 ]
8.20 (a) The states are the volume of water in the reservoir, which al-
though continuous are assumed to take discrete values ∈ {1, 2, 3, 4}.
Hence the transition matrix is

    P = [ 0.7  0.2  0.1  0.0 ]
        [ 0.3  0.4  0.2  0.1 ]
        [ 0.0  0.3  0.4  0.3 ]
(b) The steady state probabilities represent the long term average probability of finding the reservoir in each state. For example, in the long run we expect the reservoir to start (or end) with a volume of 1 million m3 about 20.5% of the time, and a volume of 4 million m3 about 32.9% of the time.
8.21 (a) The states are the age of lights in months ∈ {0, 1, 2}; the transition matrix associated with this process is

    P = [ 0.15  0.85  0.0  ]
        [ 0.29  0.0   0.71 ] .
        [ 1.0   0.0   0.0  ]
8.22 (a) Let Xn = elapsed time in hours (at time n) since adjustment
∈ {0, 1, 2, 3}. Assume that adjustments occur on the hour only
and that the time taken to service the machine is negligible. (Alternative sets of assumptions are possible.) The transition probabilities can be found by converting the given table as follows. If the machine is adjusted 100 times, the number of these adjustments which we expect to "survive" is given by
Time since adjustment (hours) Number surviving
0 100
1 100
2 80
3 50
4 0
We then have

    P01 = Pr{Xn+1 = 1 | Xn = 0} = 100/100 = 1 ,
    P12 = 80/100 = 0.8 ,
    P23 = 50/80 = 0.625 .
Since none survive to "age" 4, a state 4 is not needed. Hence the required transition matrix is

        [ 0      1  0    0     ]
    P = [ 0.2    0  0.8  0     ] .
        [ 0.375  0  0    0.625 ]
        [ 1      0  0    0     ]

(b)

    π = ( 10/33  10/33  8/33  5/33 )
8.34 Model this as a Markov chain with two states: C1 means that coin 1 is to be tossed; C2 means that coin 2 is to be tossed. From the question, the state transition diagram corresponds to the transition matrix

    P = [ 0.7  0.3 ]
        [ 0.6  0.4 ]

(heads selects coin 1 next; tails selects coin 2 next). Then

    π3 = π0 P^3 = ( 0.6665  0.3335 ) .
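The computation π3 = π0 P^3 can be checked directly; a short sketch, with π0 = (0.5, 0.5) since each coin is equally likely to be flipped first:

```python
import numpy as np

# States: C1 (coin 1 flipped next), C2 (coin 2 flipped next).
P = np.array([[0.7, 0.3],   # coin 1: heads 0.7 -> C1, tails 0.3 -> C2
              [0.6, 0.4]])  # coin 2: heads 0.6 -> C1, tails 0.4 -> C2
pi0 = np.array([0.5, 0.5])

pi3 = pi0 @ np.linalg.matrix_power(P, 3)
print(pi3)  # approximately (0.6665, 0.3335)
```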
Other Models
9
Module contents
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 206
9.2 Using other models . . . . . . . . . . . . . . . . . . . . . 206
9.3 Seasonally adjusted models . . . . . . . . . . . . . . . . 206
9.4 Regime-dependent autoregressive models . . . . . . . . 207
9.5 Neural networks . . . . . . . . . . . . . . . . . . . . . . . 207
9.6 Trend-regression models . . . . . . . . . . . . . . . . . . 208
9.7 Multivariate time series models . . . . . . . . . . . . . . 208
9.8 Forecasting by means of a model . . . . . . . . . . . . . 211
9.9 Finding similar past patterns . . . . . . . . . . . . . . . 211
9.10 Singular spectrum analysis . . . . . . . . . . . . . . . . . 212
Module objectives
9.1 Introduction
In this part of the course, one particular type of time series methodology
has been discussed: the Box–Jenkins models, or arma type models. There
are a large number of other possible models for time series however. In this
Module, some of these models are briefly discussed. You are required to
know the details of only one of these models in particular, but should at
least know the names and ideas behind the others.
You don’t need to understand all the details in this Module; but see Assign-
ment 3.
In particular, the method that unofficially won the competition was the
theta-method (see Assimakopoulos and Nikolopoulos [7]) which was shown
later (see Hyndman and Billah [22]) to be simply exponential smoothing
with a drift (or trend) component. Exponential smoothing was listed in
Section 1.4 as a simple method.
The lesson is clear: just because a method appears clever, complicated or technical does not mean it will forecast well; simple methods are often the best. However, all methods have situations in which they perform well, and there are other methods worthy of consideration. Some of those are considered here.
Activity 9.A: Read Chu & Katz [13] in the selected read-
ings.
Table 9.1: The parameter estimates for an ar(3) model with seasonally
varying parameters for modelling the seasonal SOI. Note the seasons refer
to northern hemisphere seasons.
Chu & Katz [13] discuss fitting arma type models to the seasonal and
monthly SOI using an arma model whose coefficients change according to
the season. They fit a seasonally varying ar(3) model to the seasonal SOI,
{Xt }, of the form
Activity 9.B: Read Zwiers & von Storch [53] in the selected
readings.
Zwiers & von Storch [53] fit a regime-dependent ar model (ram) to the SOI
described by a stochastic differential equation. (These models are also called
Threshold Autoregressive Models by other authors, such as Tong [44].) In
essence, the SOI is modelled using one of two indicators of the SOI (ei-
ther the South Pacific Convergence Zone hypothesis, or the Indian Monsoon
hypothesis, as explained in the article), and a seasonal indicator.
(and can be passed to other processing elements). Neural networks are said
to be loosely based on the operation of the human brain (!).
Maier & Dandy [31] fit neural networks to daily salinity data at Murray
Bridge, South Australia, as well as numerous Box–Jenkins models. They
conclude the Box–Jenkins models produce better one-day ahead forecasts,
while the neural networks produce better long term forecasts.
Guiot & Tessier [18] use neural networks and ar(3) models to detect the effects of pollution on the widths of tree rings, and hence on tree growth, from 1900 to 1983.
Visser & Molenaar [47] discuss a trend-regression model for modelling a time
series {Yt } in the presence of k other variables {Xi,t } for i = 1, . . . , k. These
models are of the form
One particular model they fit is for modelling annual mean surface air tem-
peratures in the northern hemisphere from 1851 to 1990, {Tn }. They fit
a TR(0, 2, 0, 2) model using the Southern Oscillation Index (SOI) and the
index of volcanic dust (VDI) on the northern hemisphere as covariates. The
fitted model is
Tn = µt − 0.050SOIt − 0.086VDIt + et
In this course, only univariate time series have been discussed. It is possible,
however, for two time series to be related to each other. In this case, there
is a multivariate time series.
Figure 9.1: Two time series that might be expected to vary together: Top:
the SOI; Bottom: the sea level air pressure anomaly at Easter Island.
Example 9.1: The SOI and the sea level air pressure anomaly at Easter
Island might be expected to vary together, since the SOI is related to
pressure anomalies at Darwin and Tahiti. The two are plotted together
in Figure 9.1.
In a similar way as the autocorrelation was measured, the cross correlation can be defined as

    r_XY(k) = Σ_t (X_t − µX)(Y_{t+k} − µY) / sqrt( Σ_t (X_t − µX)²  Σ_t (Y_t − µY)² ) ,

where µX is the mean of the time series {Xt}, µY is the mean of the time series {Yt}, and k is again the lag. The cross correlation can be computed for various k. For this example, the plot of the cross correlation is shown in Figure 9.2.
Figure 9.2: The cross correlation between the SOI and the sea level air pressure anomaly at Easter Island.

The cross correlation indicates there is a significant correlation between
the two series near a lag of zero. That is, when the SOI goes up, there
is a strong chance the sea level air pressure anomaly at Easter Island
will also go up at the same time.
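The cross correlation at a given lag can be computed directly from its definition. A sketch with hypothetical series (numpy; in r one would call `ccf`); the series names and the lagged relationship are illustrative only:

```python
import numpy as np

def cross_corr(x, y, k):
    """Sample cross correlation between x[t] and y[t+k]."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = len(x)
    if k >= 0:
        num = np.sum(x[: n - k] * y[k:])
    else:
        num = np.sum(x[-k:] * y[: n + k])
    return num / (n * np.std(x) * np.std(y))

# Hypothetical related series: y lags x by one step, plus a little noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 1) + 0.1 * rng.standard_normal(500)
print(round(cross_corr(x, y, 1), 2))  # large positive value near lag 1
```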
Suppose we have a time series {Xt}, t ≥ 0, and we wish to be able to forecast future values. We wish to identify an estimator of the next value of the time series, say X̂(t+1|t). One way of doing this is to search through the history of the time series and find a time when a pattern like the most recent m values of the series has approximately occurred before.

How do we determine which m past values are "like" the pattern we are currently observing?

Call the current m values the vector x. For any past series of m values, say the vector y, one measure of the distance between these two vectors is

    d(x, y) = sqrt( Σ_{k=1}^{m} (x_k − y_k)² ) ,

where x_k is the kth element of the vector x. (This is the most common way to define distance between vectors.)
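The analogue-forecasting idea above can be sketched directly: slide an m-length window over the history, find the closest past match under d(x, y), and use the value that followed it as the forecast. The function name and test series are illustrative only.

```python
import numpy as np

def analogue_forecast(series, m=5):
    """Forecast the next value by matching the last m observations
    against every earlier window of length m."""
    series = np.asarray(series, float)
    current = series[-m:]
    best_d, best_next = np.inf, None
    # Windows series[i:i+m] with a known following value series[i+m];
    # the range excludes the current window itself.
    for i in range(len(series) - m):
        d = np.sqrt(np.sum((series[i : i + m] - current) ** 2))
        if d < best_d:
            best_d, best_next = d, series[i + m]
    return best_next

# A strictly periodic series: the forecast should continue the cycle.
x = list(range(1, 13)) * 4        # 1..12 repeated; ends ...9 10 11 12
print(analogue_forecast(x, m=5))  # 1.0, the value that followed 8..12 before
```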
Introduction
10
Module contents
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 216
10.2 Multivariate data . . . . . . . . . . . . . . . . . . . . . . 216
10.3 Preview of methods . . . . . . . . . . . . . . . . . . . . . 217
10.4 Review of mathematical concepts . . . . . . . . . . . . . 217
10.5 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.6 Displaying multivariate data . . . . . . . . . . . . . . . . 217
10.7 Some hypothesis tests . . . . . . . . . . . . . . . . . . . . 221
10.8 Further comments . . . . . . . . . . . . . . . . . . . . . . 221
10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
10.9.1 Answers to selected Exercises . . . . . . . . . . . . . . . 223
Module objectives
10.1 Introduction
There are numerous books available about multivariate statistics, and many are available from the USQ library. You may find other books useful to refer to during your study of this multivariate analysis component of the course.

One of the most common sources of multivariate data is Sea Surface Temperatures (SSTs). SSTs are measurements of the temperature of the oceans, measured at locations all around the world.

In addition, multivariate data can be created from any univariate series, since climatological variables are often time-dependent. The original data, with say n observations, can be designated X1. The series can then be shifted back t time steps to create a new variable X2. Both variables can be adjusted to have a length of n − t, after which the variables can be identified as X1' and X2'. The two variables (X1', X2') can be considered multivariate data.
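The lagging construction just described can be sketched as follows (hypothetical data; t = 1 gives the usual pairing of the series with itself one step back):

```python
import numpy as np

def lag_pair(x, t=1):
    """Return (X1', X2'): the series and the series shifted back t steps,
    both trimmed to the common length n - t."""
    x = np.asarray(x, float)
    return x[t:], x[:-t]

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
x1, x2 = lag_pair(x, t=1)
print(x1)  # [1. 4. 1. 5. 9.]
print(x2)  # [3. 1. 4. 1. 5.]
```

Each pair (x1[i], x2[i]) matches an observation with the value t steps earlier, so the bivariate scatter of (X1', X2') displays the serial dependence of the original series.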
This Chapter contains material that should be revision for the most part.
You may find it useful to refer back to Chapter 2 throughout this course. Pay
particular attention to sections 2.5 to 2.7 as many multivariate techniques
use these concepts.
10.5 Software
The software package r will be used for this Part, as with the time series
component. See Sect. 1.5.1 for more details. Most statistical programs will
have multivariate analysis capabilities.
For this part of the course, the r multivariate analysis library is needed; this
should be part of the package that you install by default. To enable this
package to be available to r, type library(mva) at the r prompt when r
is started. For an idea of what functions are available in this library, type
library(help=mva) at the r prompt.
Many of the plots discussed are available in the package S-Plus, a commercial
package not unlike r. In the free software, r, however, some of these plots
are not available (in particular, Chernoff faces). The general consensus is
that it would be a lot of work for a graphic that isn’t that useful. One
particular problem with Chernoff faces is that the faces (and interpretations)
can change dramatically depending on what variables are allocated to which
dimensions of the face.
However, the star plot is available using the function stars. The “Drafts-
man’s display” is available just by plotting a multivariate dataset; see the
following example.
The profile plots are useful, but only when there are not too many variables
or too many groups, otherwise the plots become too cluttered to be of any
use.
Example 10.1: Hand et al. [19, dataset 26] gives a number of measurements of air pollution from 41 cities in the USA. The data consist of seven variables (plus the names of the cities), generally means from 1969 to 1971.
> library(mva)
> us <- read.table("usair.dat", header = TRUE)
> plot(us[, 1:4])
> pairs(us[, 1:4])
[Figure 10.1: a scatterplot matrix (Draftsman's display) of SO2, temp, manufac and population.]
The plot is shown in Fig. 10.1. Star plots can also be produced:
The input key.loc changes the location of the 'key' that shows which variable is displayed where on the star; it was determined through trial and error. Alter the value of the input flip.labels to true (that is, set flip.labels=TRUE) to see what effect this has.
The star plot discussed in the text is in Fig. 10.2. Only the stars for
the first eleven cities are shown so that the detail can be seen here.
A variation of the star plot is given in Fig. 10.3, and is particularly
instructive when seen in colour.
From the star plots, can you find any cities that look very similar?
That look very different?
[Figures 10.2 and 10.3: star plots of the city data, with keys showing SO2, wind.speed, days.precip, annual.precip and population.]
One difficulty with multivariate data has already been discussed: it may be
hard to display the data in a useful way. Because of this, it is often difficult
to find any outliers in multivariate data. Note that an observation may not
appear as an outlier with regard to any particular variables, but it may have
a strange combination of variables.
The main multivariate techniques can be broadly divided into the following
categories:
Consider the data in Example 10.1. We may wish to reduce the number
of variables from eight to two or three. If we could reduce the number of
variables to just one, this might be called a ‘pollution index’. This would be
an example of data reduction. Data reduction works with the variables.
10.9 Exercises
Ex. 10.2: The data set twdecade.dat contains (among other things) the
average rainfall, maximum temperature and minimum temperature at
Toowoomba for the decades 1890s to the 1990s.
Produce a multivariate plot of the three variables by decade. Which
decades appear similar?
Ex. 10.3: The data set twdecade.dat contains the average rainfall, maximum temperature and minimum temperature at Toowoomba for each month. It should be possible to see the seasonal pattern in temperatures and rainfall. Produce a multivariate plot that shows these features by month.
Ex. 10.4: The data set emdecade.dat contains the average rainfall, max-
imum temperature and minimum temperature at Emerald for the
decades 1890s to the 1990s.
Produce a multivariate plot of the three variables by decade. Which
decades appear similar? How similar are the patterns to those observed
for Toowoomba?
Ex. 10.5: The data set emdecade.dat contains the average rainfall, maximum temperature and minimum temperature at Emerald for each month. It should be possible to see the seasonal pattern in temperatures and rainfall. Produce a multivariate plot that shows these features by month. How similar are the patterns to those observed for Toowoomba?
Ex. 10.6: The data in the file countries.dat contains numerous variables
from a number of countries, and the countries have been classified by
region. Create a plot to see which countries appear similar.
Ex. 10.7: This question concerns a data set that is not climatological, but
you may find interesting. The data file chocolates.dat, available
from http://www.sci.usq.edu.au/staff/dunn/Datasets/applications/
popular/chocolates.html, contains measurements of the price, weight
and nutritional information for 17 chocolates commonly available in
Queensland stores. The data was gathered in April 2002 in Brisbane. Create a plot to see which chocolates appear similar. Are there any surprises?
Ex. 10.8: The data file soitni.txt contains the SOI and TNI from 1958 to
1999. The TNI is related to sea surface temperatures (SSTs), and SOI
is also known to be related to SSTs. It may be expected, therefore,
that there may be a relationship between the two indices. Create a
plot to examine if such a relationship exists.
The plot (Fig. 10.4) shows a trend of increasing rainfall from the 1900s
to the 1950s, a big drop in the 1960s, then a big jump in the 1970s. The
1990s were very dry again. The 1990s were also a very warm decade
(relatively speaking), and the 1960s very cold (relatively speaking).
Principal Components Analysis
11
Module contents
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 226
11.2 The procedure . . . . . . . . . . . . . . . . . . . . . . . . 228
11.2.1 When should the correlation matrix be used? . . . . . . 232
11.2.2 Selecting the number of pcs . . . . . . . . . . . . . . . . 233
11.2.3 Interpretation of pcs . . . . . . . . . . . . . . . . . . . . 234
11.3 pca and other statistical techniques . . . . . . . . . . . 235
11.4 Using r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.5 Spatial pca . . . . . . . . . . . . . . . . . . . . . . . . . . 242
11.5.1 A small example . . . . . . . . . . . . . . . . . . . . . . 242
11.5.2 A larger example . . . . . . . . . . . . . . . . . . . . . . 245
11.6 Rotation of pcs . . . . . . . . . . . . . . . . . . . . . . . . 247
11.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
11.7.1 Answers to selected Exercises . . . . . . . . . . . . . . . 252
Module objectives
11.1 Introduction
[Figures: scatterplots of x1 against x2, a histogram of the first principal component scores ("Histogram of PCA 1"), and a screeplot of the component variances.]
component only. That is, using just the first principal component reduces
the dimension of the problem from two to one, with only a small loss of
information.
Note the pcs are simply linear combinations of the variables, and that they
are orthogonal. Also note that my computer struggles to perform the re-
quired computations (it seems to manage despite complaining).
The four steps outlined at the bottom of p 80 of Manly show the general
procedure. Software is used to do the computations.
Example 11.1: Consider the following data matrix X with two variables
    X1 and X2 , with three observations (so n = 3) for each variable:
    \[ X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 4 & 2 \end{pmatrix}. \]
    The data are plotted in Fig. 11.2 (a). The mean for each variable is
    $\bar{X}_1 = 2$ and $\bar{X}_2 = 1$, so the centred matrix is
    \[ X_c = \begin{pmatrix} -1 & -1 \\ -1 & 0 \\ 2 & 1 \end{pmatrix}. \]
The centred data are plotted in Fig. 11.2 (b). It is usual to find the pcs
from the correlation matrix. First, find the covariance matrix.
(R − Iλ)e = 0. (11.1)
¹ Notice that we have divided by n rather than n − 1. This is simply to follow what r
does; more commonly, the divisor is n − 1 when sample variances (and covariances) are
computed. I do not know why r divides by n instead of n − 1.
² This is a quick review of work already studied in MAT2100. Eigenvalues and eigen-
vectors are covered in most introductory linear algebra texts.
|R − Iλ| = 0
³ The signs may change from one version of r to another, or even differ between copies
of r on different operating systems. This is true for almost any computer package gener-
ating eigenvectors. A change in sign simply means the eigenvectors point in the opposite
direction and makes no effective difference to the analysis.
Figure 11.2: The data from Example 11.1. Top left: the original data; Top
right: the data have been centred; Bottom left: the data have been centred
and then scaled; Bottom right: the directions of the principal components
have been added.
    A scree plot can be drawn from this if you wish. In any case, one pc
    would be taken (otherwise, no simplification has been made for all this
    work!).
    It is then possible to determine what ‘score’ each of the original points
    has on the new variables (or principal components). These new
    scores, say Y , can be found from the ‘original’ variables, X, using
    Y = XC.
    In our case in this example, the matrix X will refer to the centred,
    scaled variables since the pcs were computed using these. Hence,
    \[
    Y = \begin{pmatrix} -1/\sqrt{2} & -\sqrt{3}/\sqrt{2} \\ -1/\sqrt{2} & 0 \\ \sqrt{2} & \sqrt{3}/\sqrt{2} \end{pmatrix}
        \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}
      = \begin{pmatrix} (-1-\sqrt{3})/2 & (-1+\sqrt{3})/2 \\ -1/2 & -1/2 \\ 1+\sqrt{3}/2 & 1-\sqrt{3}/2 \end{pmatrix}.
    \]
    Thus, the point (1, 0) is now mapped to ((−1 − √3)/2, (−1 + √3)/2),
    the point (1, 1) is now mapped to (−1/2, −1/2), and the point (4, 2)
    is now mapped to (1 + √3/2, 1 − √3/2) in the new system. In
    Fig. 11.2 (d), the point (1, 1) can be seen to be mapped to a negative
    value for the first pc, and the same (negative) value for the
    second pc. Thus, (−1/2, −1/2) seems a sensible value to which the
    second point could be mapped.
    Since we only take one pc, the new variable takes the values
    (−1 − √3)/2, −1/2, 1 + √3/2,
    which accounts for about 93% of the variation in the original data.
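These hand calculations can be checked in r; here is a minimal sketch, using divisor-n standard deviations to match the example:

```r
# Reproduce Example 11.1 by hand (divisor n, as in the text)
X   <- matrix(c(1, 1, 4, 0, 1, 2), ncol = 2)       # the data matrix
Xc  <- sweep(X, 2, colMeans(X))                    # centre each column
sds <- sqrt(colMeans(Xc^2))                        # divisor-n standard deviations
Xs  <- sweep(Xc, 2, sds, "/")                      # centred and scaled data
C   <- matrix(c(1, 1, 1, -1), ncol = 2) / sqrt(2)  # eigenvectors of R
Y   <- Xs %*% C                                    # scores on the pcs
Y[, 1]  # (-1 - sqrt(3))/2, -1/2, 1 + sqrt(3)/2
```

The first column of Y reproduces the scores on the first pc found above.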
11.2.1 When should the correlation matrix be used?
For example, Example 10.1 involves variables that are measured on different
scales: SO2 was measured in micrograms per cubic metre, whereas man-
ufac is simply the number of manufacturing enterprises with more than 20
employees. These variables are measured in very different units. For this
reason, the pca should be based on the correlation matrix.
11.2.2 Selecting the number of pcs
One of the difficult decisions to make in pca is how many principal com-
ponents (pcs) to keep. The analysis will always produce as many pcs as
there are variables, so keeping all the pcs means that no information is lost,
but also that no reduction is achieved: the data are simply reproduced.
This defeats the purpose of a data reduction technique such as pca, and
simply complicates matters!
There are many criteria for making this decision, but no formal procedure
(involving tests, etc.). There are only guidelines; some are given below.
Using any of the methods without thought is dangerous and prone to error.
Always examine the information and make a sensible decision that you can
justify. Sometimes, there is not one clear decision. Remember the purpose
of pca is to reduce the dimension of the data, so a small number of pcs is
preferred.
Scree plots
One way to help make the decision is to use a scree plot. The scree plot is
used to help separate the important pcs (with large eigenvalues) from the
less important pcs (with small eigenvalues). Some authors claim this
method generally includes too many pcs. A scree plot is most useful when
some pcs are clearly more important than others; this is not always the
case, however.
Another guideline recommends keeping only those pcs whose eigenvalues are
greater than the average eigenvalue. (Note that if the correlation matrix has
been used to compute the pcs, this means that pcs are retained if their
eigenvalues are greater than one.) For a small number of variables (say,
fewer than 20), this method is reported to include too few pcs.
Example 11.3: Katz & Glantz [27] use a principal components analysis
on rainfall data to show that no single rainfall index (or principal
component) can adequately explain rainfall variation.
11.2.3 Interpretation of pcs
It is often useful to find an interpretation for the pcs, recalling that the pcs
are simply linear combinations of the variables. It is not uncommon for the
first pc to be a measure of ‘size’. Finding interpretations is often quite an
art, and sometimes any interpretation is difficult to find.
Example 11.4: Mantua et al. [33] define the Pacific Decadal Oscilla-
    tion (PDO) as the leading pc of monthly SST anomalies in the North
    Pacific Ocean.
Example 11.5: Wolff, Morrisey & Kelly [50] use principal components
analysis followed by a regression to identify source areas of the fine
particles and sulphates which are the primary components of summer
haze in the Blue Ridge Mountains of Virginia, USA.
Example 11.6: Fritts [17] describes two techniques for examining the rela-
tionship between ring-width of conifers in western North America and
climatic variables. The first technique is a multiple regression on the
principal components of climate.
pca is sometimes used with cluster analysis (see Module 13) to classify
climatological variables.
Example 11.7: Stone & Auliciems [42] use a combination of cluster analy-
sis and pca to define phases of the Southern Oscillation Index (SOI).
11.4 Using R
The next example continues on from Example 11.1 and uses a very small
data matrix to show how the calculations done by hand can be compared to
those performed in r.
Example 11.9:
    Refer to Example 11.1, where the data are plotted in Fig. 11.2 (a). How
    can this analysis be done in r?
Of course, tasks such as multiplying matrices and computing the eigen-
values can be done in r (using the commands %*% and eigen respec-
tively). First, define the data matrix (and then centre it also):
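A sketch of the likely commands (the matrix is the X of Example 11.1; the signs of the eigenvectors may differ between versions of r, as noted earlier):

```r
X <- matrix(c(1, 1, 4, 0, 1, 2), ncol = 2)  # the data of Example 11.1
R <- cor(X)                                 # the correlation matrix
eigen(R)                                    # eigenvalues and eigenvectors
```

The output of eigen(R) should match the $values and $vectors shown below, possibly up to the signs of the eigenvectors.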
$values
[1] 1.8660254 0.1339746
$vectors
[,1] [,2]
[1,] 0.7071068 0.7071068
[2,] 0.7071068 -0.7071068
These results agree with those in Example 11.1. But of course, r can
compute principal components without us having to resort to matrix
multiplication and finding eigenvalues.
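The object p used below must first be created; a minimal sketch of the call (scale. = TRUE bases the pca on the correlation matrix):

```r
X <- matrix(c(1, 1, 4, 0, 1, 2), ncol = 2)  # the data of Example 11.1
p <- prcomp(X, scale. = TRUE)               # pca via the correlation matrix
```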
> p$sdev
> p$sdev^2
> p$rotation
PC1 PC2
[1,] 0.7071068 0.7071068
[2,] 0.7071068 -0.7071068
> screeplot(p)
or just
> plot(p)
> summary(p)
Importance of components:
PC1 PC2
Standard deviation 1.366 0.366
Proportion of Variance 0.933 0.067
Cumulative Proportion 0.933 1.000
The new scores, called the principal components or pcs (and called Y
earlier), can be found using
> predict(p)
PC1 PC2
[1,] -1.1153551 0.2988585
[2,] -0.4082483 -0.4082483
[3,] 1.5236034 0.1093898
This example was to show you how to perform a pca by hand, and how to
find those bits-and-pieces in the r output. Notice that once the correlation
matrix has been found, the analysis proceeds without knowledge of anything
else. Hence, given only a correlation matrix, pca can be performed. (Note
r requires a data matrix for use in prcomp; to use only a correlation matrix,
you must use eigen and so on.)
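For instance, a sketch of a pca computed from a correlation matrix alone:

```r
R <- matrix(c(1, sqrt(3)/2, sqrt(3)/2, 1), ncol = 2)  # a correlation matrix
e <- eigen(R)
sdev     <- sqrt(e$values)  # standard deviations of the pcs
rotation <- e$vectors       # the loadings (one eigenvector per column)
```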
Commonly, a small number of the pcs are chosen for further analysis; these
can be extracted as follows (where the first two pcs here are extracted as an
example):
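One possible form is the following sketch (here an illustrative random data matrix stands in for real data; in practice p would be the prcomp object already computed):

```r
p <- prcomp(matrix(rnorm(50), ncol = 5), scale. = TRUE)  # illustrative pca
first2 <- predict(p)[, 1:2]  # keep the scores on the first two pcs only
```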
> cor(sp)
> sp.prcomp$rotation
> sp.prcomp$sdev^2
> screeplot(sp.prcomp)
> screeplot(sp.prcomp, type = "lines")
Figure 11.3: Two different ways of presenting the screeplot for the sparrow
data. In (a), the default screeplot; in (b), the more standard screeplot
produced with the option type="lines".
    The final plot is shown in Fig. 11.3. The first pc is obviously much
    larger than the rest, and easily accounts for most of the variation in
    the data.
    Using the screeplot, you may decide to keep only one pc. Using
    the total variance rule, you may decide that three or four pcs are
    necessary:
> summary(sp.vars)
Using the above average pc rule would select only one pc:
> mean(sp.vars)
[1] 0.2
    The values of the pcs for each bird are found using (for the first 10 birds
    only)
    > predict(sp.prcomp)[1:10, ]
Note that the first bird has a score of 0.07837 on the first pc, whereas
the score is 0.064 in Manly. The scores on the second pc are very
similar: 0.6166 (above) compared to 0.602 (Manly).
The first three pcs are extracted for further analysis using
11.5 Spatial pca
As noted by Wilks, this is a very common use of pca. The idea is this: data,
such as rainfall, may be available for a large number of locations (usually
called ‘stations’), usually over a long time period. pca can be used to find
patterns over those locations.
Table 11.2: Monthly rainfall figures for ten stations in Australia. There are
15 observations for each station, given in order of time (the actual recording
months are unknown; the source did not state).
Station number
1 2 3 4 5 6 7 8 9 10
1 111.70 30.80 78.70 58.60 30.60 63.60 53.40 15.90 27.60 72.60
2 25.50 2.80 19.20 4.00 8.10 7.80 10.30 1.00 4.10 27.30
3 82.90 47.50 98.90 65.20 73.50 117.00 95.60 37.50 93.40 139.90
4 174.30 81.50 106.80 80.90 73.90 123.50 155.80 51.20 81.50 177.10
5 77.70 22.00 48.90 56.20 67.10 113.00 256.40 38.30 65.60 253.30
6 117.10 35.90 118.10 86.90 81.90 98.60 84.00 42.40 67.30 154.30
7 111.20 52.70 69.10 56.80 27.20 51.60 76.00 16.30 50.40 191.50
8 147.40 109.70 150.70 101.20 102.80 112.40 32.60 42.60 52.50 47.30
9 66.50 29.00 41.70 22.60 50.60 73.10 92.80 26.40 36.00 80.10
10 107.70 37.70 77.00 52.80 27.60 34.80 16.20 7.60 5.50 12.20
11 26.70 6.10 16.20 11.90 14.20 34.80 32.60 18.00 28.70 118.30
12 92.40 25.70 45.50 58.00 22.20 32.30 35.70 8.80 13.80 37.80
13 157.00 63.00 79.20 70.10 45.70 66.80 76.00 14.40 16.30 71.50
14 20.80 4.10 12.50 7.90 7.40 11.70 9.30 14.80 6.60 19.40
15 137.20 38.10 82.40 59.70 27.60 58.00 45.30 5.00 34.30 108.40
The scree plot is shown in Fig. 11.4; it is not clear how many pcs should
be retained. We shall select three for the purpose of this example; three
is not unreasonable as they account for over 90% of the variation in
the data (see line 11 of the output).
    There are a few important points to note:
    (a) In practice, there are often hundreds of stations with available
        data, and over a hundred years' worth of rainfall data for most
        stations. This creates huge data files that, in practice, take large
        amounts of computing power to analyse.
(b) If latitudes and longitudes of the stations are known, contour
maps can be drawn of the principal components over a map of
Australia (see the next example).
(c) Each pc is a vector of length 15 and is also a time series. These
can be plotted as time series (see Fig. 11.5) and even analysed
as a time series using the techniques previously studied. This
analysis can detect time trends in the pcs.
    In this small example, the time trends of ten stations have been reduced
    to time trends of three new variables that capture the important in-
    formation carried by all ten.
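A sketch of how such an analysis might run in r; here simulated rainfall stands in for the data of Table 11.2 (in practice, rain would hold the table, with stations as columns):

```r
set.seed(1)
rain <- matrix(rexp(150, rate = 1/60), ncol = 10)  # stand-in: 15 months x 10 stations
rain.pca <- prcomp(rain, scale. = TRUE)            # pca on the correlation matrix
screeplot(rain.pca)                                # cf. Fig. 11.4
scores <- predict(rain.pca)[, 1:3]                 # scores on the first three pcs
matplot(scores, type = "l", xlab = "Time")         # plotted as time series, cf. Fig. 11.5
```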
Figure 11.4: The scree plot for the pca of the small rainfall example.
Figure 11.5: The pcs plotted over time for the small rainfall example.
Figure 11.6: The scree plot for the full rainfall example.
Example 11.12: Using the larger data file from which the data in the
previous example came, a more thorough pca can be performed. This
analysis was over 1188 time points for 52 stations. The data matrix has
1188 × 52 = 61 776 entries; this needs a lot of storage in the computer,
and a lot of memory for performing operations such as matrix multi-
plication and matrix inversions. The scree plot is shown in Fig. 11.6.
Plotting the first pc over a map of Australia gives Fig. 11.7 (a). The
second pc has been plotted over a map of Australia Fig. 11.7 (b).
This time, the first three pcs account for about 57% of the total varia-
tion. Notice that even with 52 stations, the contours are jagged; they
could, of course, be smoothed.
    It requires special methods to handle data files of this size. The code
    used to generate these pictures is given below. Be aware that you
    probably cannot run this code as it requires installing r libraries that
    you probably do not have by default (but can perhaps be installed;
    see Appendix A). The huge data files necessary are in a format called
    netCDF, and a special library is required to read these files.
Figure 11.7: The first two pcs plotted over a map of Australia.
> library(oz)
> library(ncdf)
> set.datadir()
> d <- open.ncdf("./pca/oz-rain.nc")
> rawrain <- get.var.ncdf(d, "RAIN")
> missing <- attr(rawrain, "missing_value")
> rawrain[rawrain == missing] <- NA
> set.docdir()
> longs <- get.var.ncdf(d, "LONGITUDE79_90")
> nx <- length(longs)
> lats <- get.var.ncdf(d, "LATITUDE19_33")
> ny <- length(lats)
> times <- get.var.ncdf(d, "TIME")
> ntime <- length(times)
> rain <- matrix(0, ntime, nx * ny)
> for (ix in (1:nx)) {
+ for (iy in (1:ny)) {
+ idx <- (iy - 1) * nx + ix
+ t <- rawrain[ix, iy, 1:ntime]
+ if (length(na.omit(t)) == ntime) {
+ rain[, idx] <- t
+ }
+ }
+ }
> pc.rain <- rain[, colSums(rain) > 0]
> p1 <- prcomp(pc.rain, center = TRUE, scale = TRUE)
> plot(p1$sdev^2, type = "b", main = "Full rainfall example",
+ ylab = "Eigenvalues")
> par(mfrow = c(2, 1))
> oz(add = TRUE, lwd = 2)
> oz(add = TRUE, lwd = 2)
    The gaps in the plots are because there is so little data in those
    remote parts of Australia, and rainfall is scarce there anyway. Note the
    pcs are deduced from the correlations, so the contours are for small
    and sometimes negative numbers, not rainfall amounts.
11.6 Rotation of pcs
One constraint on the pcs is that they must be orthogonal, which some authors
argue limits how well they can be interpreted. If the physical interpretation
of the pcs is more important than data reduction, some authors argue that
the orthogonality constraint should be relaxed to allow better interpretation
(see, for example, Richman [38]). This is called rotation of the pcs. Many
methods exist for rotation of the pcs.
However, there are many arguments against rotation of pcs (see, for ex-
ample, Basilevsky [8]). Accordingly, r does not explicitly allow for pcs to
be rotated, but it can be accomplished using functions designed to be used
in factor analysis (where rotations are probably the norm rather than the
exception). We will not discuss this topic any further, except to note two
issues:
11.7 Exercises
(a) Perform a pca ‘by hand’ using the correlation matrix (follow
Example 11.1 or Example 11.9). (Don’t use prcomp or similar
functions; you may use r to do the matrix multiplication and so
on for you.)
(b) Perform a pca ‘by hand’, but using the covariance matrix.
(c) Compare and comment on the two strategies.
(a) Perform a pca ‘by hand’ using the correlation matrix (follow
Example 11.1 or Example 11.9). (Don’t use prcomp or similar
functions; you may use r to do the matrix multiplication and so
on for you.)
(b) Perform a pca ‘by hand’, but using the covariance matrix.
(c) Compare and comment on the two strategies.
Perform a pca using the correlation matrix. Define the new variables,
and explain how many new pcs are necessary.
(a) Perform a pca using the correlation matrix and show it always
produces new axes at 45◦ to the original axes.
(b) Explain what happens in the pca for r = 0, r = 0.25, r = 0.5
and r = 1.
Ex. 11.17: The data file toowoomba.dat contains (among other things) the
daily rainfall, maximum and minimum temperatures at Toowoomba
from 1 January 1889 to 21 July 2002 (a total of 41474 observations
on three variables). Perform a pca. How many pcs are necessary to
summarize the data?
Ex. 11.18: Consider again the air quality data from 41 cities in the USA,
as seen in Example 10.1. For each city, seven variables have been mea-
sured (see p 218). The first is the concentration of SO2 in microgram
per cubic metre; the other six are potential identifiers of pollution
problems. The original source treats the concentration of SO2 as a
response variable, and the other six as covariates.
    (a) Examine the correlation matrix; which variables are highly corre-
        lated?
(b) Produce a star plot of the data, and comment.
(c) Is it possible to reduce these six covariates to a smaller number,
without losing much information? Use a pca to perform a data
reduction.
Ex. 11.19: Consider the example in 11.5.2. If you can load the appropri-
ate libraries, try the same steps in that example but for the data in
oz-slp.nc.
Ex. 11.20: The data file emerald.dat contains the daily rainfall, maximum
    and minimum temperatures, radiation, pan evaporation and maximum
    vapour pressure deficit (in hPa) at Emerald from 1 January 1889 to
    15 September 2002 (a total of 41530 observations on six variables).
    Perform a pca. How many pcs are necessary to summarize the data?
Ex. 11.21: The data file gatton.dat contains the daily rainfall, maximum
    and minimum temperatures, radiation, pan evaporation and maximum
    vapour pressure deficit (in hPa) at Gatton from 1 January 1889 to 15
    September 2002 (a total of 41530 observations on six variables).
Ex. 11.22: The data file strainfall.dat contains the average monthly and
    annual rainfall (in tenths of mm) for 363 Australian rainfall stations.
    (a) Perform a pca on the monthly averages (and not the annual
        average) using the correlation matrix. How many pcs seem nec-
        essary?
    (b) Perform a pca on the monthly averages (and not the annual
        average) using the covariance matrix. How many pcs seem nec-
        essary?
(c) Which pca would you prefer? Why?
(d) Select the first two pcs. Confirm that they are uncorrelated.
Ex. 11.23: The data file jondaryn.dat contains the daily rainfall, max-
    imum and minimum temperatures, radiation, pan evaporation and
    maximum vapour pressure deficit (in hPa) at Jondaryn from 1 Jan-
    uary 1889 to 15 September 2002 (a total of 41474 observations on six
    variables). Perform a pca. How many pcs are necessary to summarize
    the data?
Ex. 11.24: The data file wind_ca.dat contains numerous weather and wind
measurements from Canberra during 1989.
    (a) Explain why it is best to use the correlation matrix for these data.
(b) Perform a pca using the correlation matrix.
(c) How many pcs are necessary to summarize the data? Explain.
(d) If possible, interpret the pcs.
(e) Perform a time series analysis on the first pc.
Ex. 11.25: The data file wind_wp.dat contains numerous weather and wind
measurements from Wilson’s Promontory, Victoria (the most southerly
point of mainland Australia) during 1989.
    (a) Explain why it is best to use the correlation matrix for these data.
(b) Perform a pca using the correlation matrix.
(c) How many pcs are necessary to summarize the data? Explain.
(d) If possible, interpret the pcs.
(e) Explain why a time series analysis on, say, the first pc cannot be
done here. (Hint: Read the help about the data.)
Ex. 11.26: The data file qldweather.dat contains six weather-related vari-
ables for 20 Queensland cities.
    (a) Perform a pca using the correlation matrix. How many pcs seem
        necessary?
    (b) Perform a pca using the covariance matrix. How many pcs seem
        necessary?
(c) Which pca would you prefer? Why?
(d) Select the first three pcs. Confirm that they are uncorrelated.
Ex. 11.27: This question concerns a data set that is not climatological,
but you may find interesting. The data file chocolates.dat, available
from http://www.sci.usq.edu.au/staff/dunn/Datasets/applications/
popular/chocolates.html, contains measurements of the price, weight
and nutritional information for 17 chocolates commonly available in
    Queensland stores. The data were gathered in April 2002 in Brisbane.
11.7.1 Answers to selected Exercises
$values
[1] 1.5 0.5
$vectors
[,1] [,2]
[1,] 0.7071068 0.7071068
[2,] -0.7071068 0.7071068
$values
[1] 2.4120227 0.9213107
$vectors
[,1] [,2]
[1,] 0.5257311 0.8506508
[2,] -0.8506508 0.5257311
    How many pcs should be selected? The screeplot is shown in Fig. 11.8,
    from which three or four might be selected. The variances of the
    pcs (the eigenvalues) are
> summary(us.pca)
Importance of components:
PC1 PC2 PC3 PC4
Standard deviation 1.482 1.225 1.181 0.872
Proportion of Variance 0.366 0.250 0.232 0.127
Cumulative Proportion 0.366 0.616 0.848 0.975
PC5 PC6
Standard deviation 0.3385 0.18560
Proportion of Variance 0.0191 0.00574
Cumulative Proportion 0.9943 1.00000
    Perhaps three pcs are appropriate. The first three account for almost
    85% of the total variance. It would also be possible to choose four pcs,
    but with six original variables, this isn't a large reduction.
[Figure 11.8: the screeplot of the variances of the pcs.]
    Is there a sensible interpretation for these pcs? The first pc has a high
    positive loading for temperature, but a high negative loading for the
    other variables (apart from annual precipitation). This could be seen
    as a contrast between temperature and the other variables: when
    temperature is high, the other variables tend to be low. It is hard
    to see any useful meaning in such a pc. Likewise, interpretations
    for the next two pcs are difficult to determine.
Module 12
Factor Analysis
Module contents
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.2 The Procedure . . . . . . . . . . . . . . . . . . . . . . . . 257
12.2.1 Path model . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.2.2 Steps in a fa . . . . . . . . . . . . . . . . . . . . . . . . 260
12.3 Factor rotation . . . . . . . . . . . . . . . . . . . . . . . . 262
12.3.1 Methods of factor rotation . . . . . . . . . . . . . . . . . 262
12.4 Interpretation of factors . . . . . . . . . . . . . . . . . . 263
12.5 The differences between pca and fa . . . . . . . . . . . 266
12.6 Principal components factor analysis . . . . . . . . . . . 267
12.7 How many factors to choose? . . . . . . . . . . . . . . . 268
12.8 Using r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
12.9 Concluding comments . . . . . . . . . . . . . . . . . . . . 274
12.10Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
12.10.1 Answers to selected Exercises . . . . . . . . . . . . . . . 277
Module objectives
12.1 Introduction
The idea of having underlying, but unobservable, factors may sound odd.
But consider an example: annual taxable income, number of cars owned,
value of home, and occupation may all measure observable aspects of so-
cioeconomic status. Likewise, heart rate, muscle strength, blood
pressure and hours of exercise per week may all be measurements of fitness.
The observable measurements are all aspects of the underlying factor called
‘fitness’. In both cases, the true, underlying variable of interest (‘socioeco-
nomic status’ and ‘fitness’) is hard to measure directly, but can be measured
indirectly using the observed variables given.
Factor analysis (fa), like pca, is a data reduction technique. fa and pca are
very similar, and indeed some computer programs and texts barely distinguish
between them. However, there are certainly differences. As with pca, the
analysis starts with n observations on p variables. These p variables are
assumed to have a common set of m factors underlying them; the role of fa
is to identify these factors.
Each observed variable is written as a linear combination of the common
factors plus a part unique to that variable:
\[ X_i = a_{i1} F_1 + a_{i2} F_2 + \cdots + a_{im} F_m + e_i , \tag{12.1} \]
where Fj are the underlying factors common to all the variables Xi , aij are
called factor loadings, and the ei are the parts of each variable unique to
that variable. In matrix notation,
x = Λf + e, (12.2)
where the factor loadings are in the matrix Λ. In general, the Xi are stan-
dardized to have mean zero and variance one. Likewise, the factors Fj are
assumed to have mean zero and variance one, and are independent of ei .
The factor loadings aij are assumed constant. Under these assumptions,
the variance of each Xi splits into two parts:
1. The effect of the common factors Fj , through the constants aij . Hence,
the quantity $a_{i1}^2 + a_{i2}^2 + \cdots + a_{im}^2$ is called the communality for Xi .
2. The variance of ei , which is the part unique to Xi (its uniqueness, or
specificity).
12.2.1 Path model
[Path diagram: the factors F1 and F2 each point to the variables X1 , X2 and X3 , with loadings a11 , a21 , a31 (from F1 ) and a12 , a22 , a32 (from F2 ); each Xi also receives its unique part ei .]
[The same path diagram with numerical loadings: X1 = 0.6F1 + 0.5F2 (unique variance 0.38); X2 = 0.9F1 + 0.2F2 (0.15); X3 = 0.1F1 + 0.9F2 (0.18).]
Covar[X1 , X2 ]
= Covar[0.6F1 + 0.5F2 , 0.9F1 + 0.2F2 ]
= Covar[0.6F1 + 0.5F2 , 0.9F1 ] + Covar[0.6F1 + 0.5F2 , 0.2F2 ]
= Covar[0.6F1 , 0.9F1 ] + Covar[0.5F2 , 0.9F1 ] +
Covar[0.6F1 , 0.2F2 ] + Covar[0.5F2 , 0.2F2 ]
= 0.54Covar[F1 , F1 ] + 0 + 0 + 0.1Covar[F2 , F2 ]
= 0.64,
F1 F2
X1 0.6 0.5
X2 0.9 0.2
X3 0.1 0.9
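This calculation can be checked in r: with uncorrelated, unit-variance factors, the loadings matrix reproduces the covariances through the product of the loadings matrix with its transpose. A small sketch:

```r
L <- matrix(c(0.6, 0.9, 0.1,   # loadings on F1
              0.5, 0.2, 0.9),  # loadings on F2
            ncol = 2)
LLt <- L %*% t(L)
LLt[1, 2]        # Covar[X1, X2] = 0.64
rowSums(L^2)     # the communalities: 0.61, 0.85, 0.82
```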
12.2.2 Steps in a fa
are also valid factors. The original factor loadings Λ are effectively
replaced by ΛT for some rotation matrix T .
In general, with two or more common factors, the initial factor solution
may be converted to another equally valid solution with the same number of
factors by an orthogonal rotation. Such a rotation preserves the correlations
and communalities amongst variables, but of course changes the loadings or
correlations between the original variables and the factors. Recalling that
the initial factor solution may result in loadings which do not allow easy
interpretation of the factors, rotation can be used to “simplify” the loadings
in the sense of enabling easier interpretation. The rotational process of
factor analysis allows the researcher a degree of flexibility by presenting a
multiplicity of views of the same data set. Obtain a parsimonious or simple
structure following these guidelines:
1. Any column of the factor loadings matrix should have mostly small
values, as close to zero as possible.
2. Any row of the matrix should have only a few entries far from zero.
The most popular oblique factor rotation methods are promax (which is in
r), oblimax , quartimin, covarimin, biquartimin, and oblimin. Similar to or-
thogonal rotation methods, oblique methods are designed to satisfy various
definitions of simple structure, and no algorithm is clearly superior to an-
other. Oblique methods present complexities that don’t exist for orthogonal
methods. They include:
1. The factors are no longer uncorrelated and hence the pattern and
structure matrices will not in general be identical.
For more information on some popular rotation techniques, see Kim and
Mueller [30].
Example 12.2: Buell & Bundgaard [10] use factor analysis to represent
wind soundings over Battery MacKenzie.
Example 12.3: Kalnicky [24] used factor analysis to classify the atmo-
spheric circulation over the midlatitudes of the northern hemisphere
from 1899–1969.
Example 12.4: Hannes [20] used rotated factors to explore the relationship
between water temperatures measured at Blunt’s Reef Light Ship and
the air pressure at Eureka, California. The factor loadings indicated
that the water temperatures measured at Trinidad Head and Blunt’s
Reef were quite different.
Example 12.5: Rogers [39] used factor analysis to find areal patterns of
anomalous sea surface temperature (SST) over the eastern North Pa-
cific based on monthly SSTs, surface pressure and 1000–500mb layer
thickness over North America during 1960–1973.
Figure 12.1: The effect on the Cartesian plane of applying the orthogonal
transform in matrix T in Eq. (12.3).
[Path diagram: the rotated solution, with loadings X1 = 0.77F1 + 0.13F2 (unique variance 0.38); X2 = 0.88F1 − 0.28F2 (0.15); X3 = 0.54F1 + 0.73F2 (0.18).]
Note that still
Covar[X1 , X2 ] = Covar[0.77F1 + 0.13F2 , 0.88F1 − 0.28F2 ]
= (0.77 × 0.88) + (0.13 × −0.28)
≈ 0.64.
Figure 12.2: The effect on the Cartesian plane of applying the oblique trans-
form in matrix S in Eq. (12.7).
[Path diagram: an oblique solution, in which the factors F1 and F2 are themselves correlated (shown by the arrow labelled r between them); the loadings shown are 0.58, 0.0023, 0.94, 0.35, −0.052 and 0.90, with unique variances 0.38, 0.15 and 0.18.]
12.5 The differences between pca and fa
pca and factor analysis are similar methods, which is often a source of
confusion for students. This section lists some of the differences (also see
Mardia, Kent & Bibby [34, §9.8]).
12.6 Principal components factor analysis
In the previous section, differences between fa and pca were pointed out.
However, pca can actually be used to assist in performing a fa. This is
called principal components factor analysis, and uses a pca to perform the
first step in the fa (note that this is not the only option), from which the
next two steps can be done. This idea is presented in this section.
Begin with p original variables Xi for i = 1 . . . p. Performing a pca will
produce p pcs, Zi for i = 1 . . . p. The pcs are defined as
Z1 = b11 X1 + b12 X2 + · · · + b1p Xp
.. .. ..
. . .
Zp = bp1 X1 + bp2 X2 + · · · + bpp Xp
where the bij are given by the eigenvectors of the correlation matrix. In
matrix form, write Z = BX. Since B is a matrix of eigenvectors, B −1 = B T ,
so also X = B T Z, or
X1 = b11 Z1 + b21 Z2 + · · · + bp1 Zp
.. .. ..
. . .
Xp = b1p Z1 + b2p Z2 + · · · + bpp Zp
Now in a factor analysis, we only keep m of the p factors; hence
X1 = b11 Z1 + b21 Z2 + · · · + bm1 Zm + e1
.. .. ..
. . .
Xp = b1p Z1 + b2p Z2 + · · · + bmp Zm + ep
where the ei are unexplained components after omitting the last p − m pcs.
In this equation, the bij are like factor loadings. But true factors have a
variance of one; here, var[Zi ] = λi since the Zi is a pc. This means the Zi
are not ‘true’ factors. Of course, the Zi can be rescaled to have a variance
of one:
\[ X_1 = (\sqrt{\lambda_1}\, b_{11}) \frac{Z_1}{\sqrt{\lambda_1}} + (\sqrt{\lambda_2}\, b_{21}) \frac{Z_2}{\sqrt{\lambda_2}} + \cdots + (\sqrt{\lambda_m}\, b_{m1}) \frac{Z_m}{\sqrt{\lambda_m}} + e_1 \]
\[ \vdots \]
\[ X_p = (\sqrt{\lambda_1}\, b_{1p}) \frac{Z_1}{\sqrt{\lambda_1}} + (\sqrt{\lambda_2}\, b_{2p}) \frac{Z_2}{\sqrt{\lambda_2}} + \cdots + (\sqrt{\lambda_m}\, b_{mp}) \frac{Z_m}{\sqrt{\lambda_m}} + e_p . \]
The rescaled pcs $Z_j/\sqrt{\lambda_j}$ have unit variance and so can act as factors, with
factor loadings $\sqrt{\lambda_j}\, b_{ji}$; this is of the form of Eq. (12.2),
x = Λf + e.
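In r, this rescaling amounts to multiplying each eigenvector by the square root of its eigenvalue. A sketch on an arbitrary correlation matrix:

```r
set.seed(1)
R <- cor(matrix(rnorm(60), ncol = 3))   # an illustrative correlation matrix
e <- eigen(R)
m <- 2                                  # number of factors retained
Lambda <- e$vectors[, 1:m] %*% diag(sqrt(e$values[1:m]))  # factor loadings
rowSums(Lambda^2)   # communalities (at most 1 for each variable)
```

Keeping all p components instead of m reproduces R exactly, since then the loadings matrix times its transpose equals the correlation matrix.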
12.7 How many factors to choose?
In pca, there were some guidelines for selecting the number of pcs. Similar
guidelines also exist for factor analysis. r will not let you have too many
factors; for example, if you try to extract three factors from four variables,
you will be told this is too many.
As usual, there are two competing criteria: To have the simplest model
possible, and to explain as much of the variation as possible.
There is no easy answer to how many factors should be chosen; this is one
of the major criticisms of fa. Try to find a number of factors that explains as
much variation as possible (using the communalities and uniquenesses), but
is not too complicated, and preferably leads to a useful interpretation. The
best method is probably to perform a pca, note the ‘best’ number of
pcs, and then use this many factors in the fa.
Note also that choosing the number of factors is a separate issue to the
rotation. The rotation will not alter the communalities or uniquenesses.
The first step is therefore to decide on the number of factors using commu-
nalities and uniquenesses, and then try various rotations to find the best
interpretation.
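The claim that rotation leaves the communalities unchanged can be checked directly: any orthogonal rotation of the loading matrix preserves the row sums of squares. Below is a sketch of Kaiser's varimax criterion in Python (not the exact algorithm r uses inside factanal; the loading matrix here is invented for illustration):

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Rotate a p x m loading matrix L by an orthogonal matrix chosen
    to (approximately) maximise the varimax criterion, via repeated SVDs."""
    p, m = L.shape
    R = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < d * (1 + tol):   # criterion no longer improving
            break
        d = s.sum()
    return L @ R

# A made-up 4 x 2 loading matrix:
L = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
Lrot = varimax(L)
```

Since the rotation matrix R is orthogonal, `(Lrot ** 2).sum(axis=1)` equals `(L ** 2).sum(axis=1)`: the communalities (and hence the uniquenesses) are untouched, while the individual loadings change.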
12.8 Using R
Actually implementing factor analysis is beyond the scope of this course; we
will just use r, trusting that the code gives sensible answers.
Loadings:
Factor1 Factor2 Factor3 Factor4
AGR -0.961 0.178 -0.178 0.094
MIN 0.143 0.625 -0.410 -0.078
MAN 0.744 0.416 -0.102 -0.508
PS 0.582 0.576 -0.017 0.569
CON 0.449 0.034 0.376 -0.375
SER 0.601 -0.327 0.600 0.089
FIN 0.103 -0.121 0.631 0.228
SPS 0.697 -0.672 -0.138 0.196
TC 0.615 -0.121 -0.233 0.146
Notice that the values are not identical to those shown in Manly; there
are numerous different algorithms for factor analysis, so this is of no
concern. The help for the function factanal in r states
There are so many variations on factor analysis that it is hard to
compare output from different programs. Further, the optimiza-
tion in maximum likelihood factor analysis is hard, and many
other examples we compared had less good fits than produced by
this function.
The values are, however, similar. The signs are different, but this is of
no consequence.
The results using the varimax rotation are obtained as follows:
Loadings:
Factor1 Factor2 Factor3 Factor4
AGR -0.695 -0.633 -0.278 -0.185
MIN -0.142 0.194 -0.546 0.479
MAN 0.199 0.882 -0.293 0.302
PS 0.205 0.086 0.084 0.969
CON 0.081 0.644 0.250 -0.033
SER 0.427 0.368 0.720 0.023
FIN -0.022 0.041 0.686 0.055
SPS 0.972 0.051 0.197 -0.091
TC 0.614 0.160 -0.061 0.249
> 1 - ee.fa4$uniqueness
Again, while they are somewhat similar to those shown in Manly, they
are not identical.
We now show how to extract the ‘scores’ from the factor analysis.
In this example, the ‘scores’ represent how each country scores on
each factor. First, we need to adjust the call to factanal by adding
scores="regression":
> ee.fa4scores$scores
Factor1 Factor2
Belgium 0.735864909 0.39347899
Denmark 1.660414922 -0.66961020
France 0.196749209 0.34219028
W.Germany 0.367203694 1.17655067
Ireland 0.109519146 -1.15479327
Italy -0.243011308 0.72606452
Luxemborg -0.387309499 1.19266940
Netherlands 0.998482010 -0.43751930
UK 1.296604056 -0.10590831
Austria -0.554916391 0.47914938
Finland 0.674141842 -0.53711442
Greece -1.463580232 -0.79224633
Norway 0.910020084 -0.36175539
Portugal -0.582448069 -0.01995971
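For factanal, scores="regression" requests Thomson's regression scores. One common form of these, sketched below in Python with invented data and loadings (factanal's internal details may differ slightly), is $F = Z R^{-1} \Lambda$, where $Z$ holds the standardized data, $R$ is the correlation matrix and $\Lambda$ the loadings:

```python
import numpy as np

# Hypothetical standardized data Z (n = 26 observations, p = 4 variables)
# and a hypothetical p x m loading matrix L.
rng = np.random.default_rng(2)
Z = rng.normal(size=(26, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
L = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]])

# Thomson's regression scores: F = Z R^{-1} L
F = Z @ np.linalg.solve(R, L)
```

Each row of F is one observation's score on each factor, exactly as each row of the table above is one country's score on each factor; since Z has column means of zero, so does F.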
> summary(ee.lm)
> names(ee.lm)
Loadings:
Factor1 Factor2
Length 0.370 0.926
Extent 0.659 0.530
Head 0.638 0.459
Humerus 0.901 0.317
Sternum 0.475 0.463
Factor1 Factor2
SS loadings 2.017 1.665
Proportion Var 0.403 0.333
Cumulative Var 0.403 0.736
> 1 - sp.fa.vm$uniqueness
Loadings:
Factor1 Factor2
Factor1 Factor2
SS loadings 2.180 1.601
Proportion Var 0.436 0.320
Cumulative Var 0.436 0.756
> 1 - sp.fa.pm$uniqueness
12.10 Exercises
Ex. 12.10: The data file toowoomba.dat contains the daily rainfall, max-
imum and minimum temperatures, radiation, pan evaporation and
maximum vapour pressure deficit (in hPa) at Toowoomba from 1 Jan-
uary 1889 to 21 July 2002 (a total of 41474 observations on six
variables). Perform a fa to find two underlying factors, and compare
the factors using no rotation, promax rotation and varimax rotation.
Ex. 12.11: In a certain factor analysis, the factor loadings were computed
as shown in the following table.
F1 F2
X1 0.3 0.5
X2 0.8 0.1
X3 0.1 0.8
X4 0.6 0.7
Ex. 12.12: Consider again the air quality data from 41 cities in the USA,
as seen in Example 10.1. For each city, seven variables have been measured
(see p 218). The first is the concentration of SO2 in micrograms
per cubic metre; the other six are potential identifiers of pollution
problems. The original source treats the concentration of SO2 as a
response variable, and the other six as covariates.
Ex. 12.13: The data file gatton.dat contains the daily rainfall, maximum
and minimum temperatures, radiation, pan evaporation and maximum
vapour pressure deficit (in hPa) at Gatton from 1 January 1889 to 21
July 2002 (a total of 41474 observations on six variables). Perform
a fa to find two underlying factors, and compare the factors using no
rotation, promax rotation and varimax rotation.
Ex. 12.14: The data file strainfall.dat contains the average monthly and
annual rainfall (in tenths of mm) for 363 Australian rainfall stations.
Ex. 12.15: The data file jondaryn.dat contains the daily rainfall, max-
imum and minimum temperatures, radiation, pan evaporation and
maximum vapour pressure deficit (in hPa) at Jondaryn from 1 January
1889 to 21 July 2002 (a total of 41474 observations on six variables).
Perform a fa to find two underlying factors, and compare the factors
using no rotation, promax rotation and varimax rotation.
Ex. 12.16: The data file emerald.dat contains the daily rainfall, maximum
and minimum temperatures, radiation, pan evaporation and maximum
vapour pressure deficit (in hPa) at Emerald from 1 January 1889 to
21 July 2002 (a total of 41474 observations on six variables). Perform
a fa to find two underlying factors, and compare the factors using no
rotation, promax rotation and varimax rotation.
Ex. 12.17: The data file wind_ca.dat contains numerous weather and wind
measurements from Canberra during 1989.
Ex. 12.18: The data file wind_wp.dat contains numerous weather and wind
measurements from Wilson’s Promontory, Victoria (the most southerly
point of mainland Australia) during 1989.
Ex. 12.19: The data file qldweather.dat contains six weather-related vari-
ables for 20 Queensland cities.
Ex. 12.20: This question concerns a data set that is not climatological,
but you may find interesting. The data file chocolates.dat, available
from http://www.sci.usq.edu.au/staff/dunn/Datasets/applications/
popular/chocolates.html, contains measurements of the price, weight
and nutritional information for 17 chocolates commonly available in
Queensland stores. The data was gathered in April 2002 in Brisbane.
Cluster Analysis
13
Module contents
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 280
13.2 Types of cluster analysis . . . . . . . . . . . . . . . . . . 280
13.2.1 Hierarchical methods . . . . . . . . . . . . . . . . . . . . 280
13.3 Problems with cluster analysis . . . . . . . . . . . . . . 281
13.4 Measures of distance . . . . . . . . . . . . . . . . . . . . 281
13.5 Using PCA and cluster analysis . . . . . . . . . . . . . . 281
13.6 Using r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
13.7 Some final comments . . . . . . . . . . . . . . . . . . . . 287
13.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
13.8.1 Answers to selected Exercises . . . . . . . . . . . . . . . 290
Module objectives
279
280 Module 13. Cluster Analysis
13.1 Introduction
Example 13.1: Kavvas and Delleur [28] use a cluster analysis for modelling
sequences of daily rainfall in Indiana.
Example 13.2: Fritts [17] describes two techniques for examining the rela-
tionship between ring-width of conifers in western North America and
climatic variables. The second technique is a cluster analysis which
he uses to identify similarities and differences in the response function
and then to classify the tree sites.
The simple idea of cluster analysis is explained in Manly, section 9.1. The
actual mechanics, however, can be performed in numerous ways; Manly
discusses two of these: hierarchical clustering (using hclust), the first
method mentioned by Manly, and k-means clustering (using kmeans), the
second. The hierarchical methods are discussed in more detail in both
Manly and these notes.
The hierarchical methods discussed in this section are well explained by the
text. The third method, using group averages, can be performed in r using
the option method="average" in the call to hclust. A similar approach to
the first method is found using the option method="single". r also provides
other hierarchical clustering methods; see ?hclust.
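The nearest-neighbour (single linkage) idea can also be sketched outside r. The following is a deliberately naive Python version, with made-up two-dimensional data; hclust uses a far more efficient algorithm, so this is only to make the mechanics concrete:

```python
import numpy as np

def single_linkage_clusters(X, k):
    """Agglomerative clustering with single (nearest-neighbour) linkage.
    Repeatedly merge the two closest clusters until only k remain."""
    n = X.shape[0]
    clusters = [[i] for i in range(n)]          # everyone starts alone
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > k:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

# Two well-separated artificial groups of three points each:
X = np.array([[0.0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = single_linkage_clusters(X, 2)
```

Asking for k clusters here plays the same role as cutree(..., k = k) does after hclust.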
Activity 13.E: Read Manly, sections 9.5, 5.1, 5.2 and 5.3.
Example 13.3: Stone & Auliciems [42] use a combination of cluster analy-
sis and PCA to define phases of the Southern Oscillation Index (SOI).
13.6 Using R
After hclust is used, the resultant object can be plotted; the default plot
is the dendrogram (Manly, Figure 9.1).
> attach(ec)
The example in Manly uses standardized data (see the top of page 135).
Here is one way to do this in r:
Now that the data is prepared, the clustering can commence. The clus-
tering method used in the Example is the nearest neighbour method;
the most similar of the methods available in r is called method="single".
The distance measure used is the default Euclidean distance.
The final plot, shown in Fig. 13.1, looks very similar to that shown in
Manly Figure 9.3.
You can try other methods if you want to experiment. To then determine
which countries are in which cluster, the function cutree is used; here is an
example of extracting four clusters:
> cutree(es.hc, k = 4)
[Figure 13.1: cluster dendrogram of the standardized European employment
data, produced from dist(ec.std) and hclust (*, "single"); the countries
appear along the bottom, with Height on the vertical axis.]
   Hungary     Poland    Romania
         1          1          1
      USSR Yugoslavia
         1          4
Later (Example 13.6), we will see that using Ward's method is common
in the climatological literature. This produces four different clusters
(Fig. 13.2).
[Figure 13.2: cluster dendrogram of the standardized European employment
data using Ward's method, produced from dist(ec.std) and hclust (*, "ward").]
        Spain Switzerland         Turkey
            2           2              3
   Yugoslavia    Bulgaria Czechoslovakia
            3           4              4
    E.Germany     Hungary         Poland
            4           4              4
      Romania        USSR
            4           4
> row.names(ec)[ec.km2$cluster == 2]
These are different groups than those given in Manly (since a different algo-
rithm is used). Six groups can also be specified:
> row.names(ec)[ec.km6$cluster == 2]
> row.names(ec)[ec.km6$cluster == 3]
> row.names(ec)[ec.km6$cluster == 4]
> row.names(ec)[ec.km6$cluster == 5]
[1] "Turkey"
> row.names(ec)[ec.km6$cluster == 6]
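The mechanics of k-means can be sketched as follows. This is Lloyd's algorithm in Python with artificial data; note that r's kmeans uses the Hartigan-Wong algorithm by default, so the groupings it produces can differ slightly:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternately assign points to the nearest
    centre and move each centre to the mean of its points."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):   # converged
            break
        centres = new
    return labels, centres

# Two well-separated artificial groups:
X = np.array([[0.0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centres = kmeans(X, 2)
```

The `labels` vector plays the role of ec.km6$cluster above: indexing the row names by `labels == j` lists the members of cluster j.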
Example 13.6:
Unal, Kindap and Karaca [45] use cluster analysis to analyse Turkey’s
climate. The abstract states:
The clusters produced using different methods can be quite different (the
default method in hclust is complete linkage).
13.8 Exercises
Ex. 13.7: Try to reproduce Manly’s Figure 9.4 by standardizing and then
using hclust.
Ex. 13.8: The data file tempppt.dat contains the average July temperature
(in °F) and the average July precipitation for 28 stations in the USA.
Each station has also been classified as belonging to southeastern,
central or northeastern USA.
Ex. 13.9: The data file strainfall.dat contains the average month and
annual rainfall (in tenths of mm) for 363 Australian rainfall stations.
(a) Perform a cluster analysis using the monthly averages. Use vari-
ous clustering methods and compare.
(b) Using a dendrogram, how many classifications seems useful?
(a) Perform a PCA on the data. Show that two PCs is reasonable.
(b) Plot the first PC against the second PC. What does this indicate?
Ex. 13.11: This question concerns a data set that is not climatological,
but you may find interesting. The data file chocolates.dat, available
from http://www.sci.usq.edu.au/staff/dunn/Datasets/applications/
popular/chocolates.html, contains measurements of the price, weight
and nutritional information for 17 chocolates commonly available in
Queensland stores. The data was gathered in April 2002 in Brisbane.
Ex. 13.12: The data file ustemps.dat contains the normal average January
minimum temperature in degrees Fahrenheit with the latitude and
longitude of 56 U.S. cities. (See the help file for full details.) Perform
a cluster analysis. How many clusters seem appropriate? Explain.
Ex. 13.13: In Exercise 11.18, the US pollution data was examined, and a
PCA performed.
(a) Perform a cluster analysis of the first two PCs. Produce a den-
drogram. Does it appear the cities can be clustered into a small
number of groups, based on the first two PCs?
(b) Repeat the above exercise, but use the first three PCs. Compare
the two cluster analyses.
Ex. 13.14: The data file qldweather.dat contains six weather-related
variables for 20 Queensland cities (covering temperatures, rainfall, number
of raindays, humidity) plus elevation.
[Figure: labelled plot of the Queensland cities in the qldweather.dat data,
from Weipa and Cairns in the north to Warwick, Mt.Tamborine and
Stanthorpe in the south.]
Ex. 13.15: The data in the file countries.dat contains numerous variables
from a number of countries, and the countries have been classified by
region.
(a) Perform a cluster analysis on the original data. Given the re-
gions of the countries, is there a sensible clustering that emerges?
Explain.
(b) Perform a PCA on the data. How many PCs seem necessary?
Let this number of PCs be p.
(c) Cluster the first p PCs. Given the regions of the countries, is
there a sensible clustering that emerges? Explain.
(d) How do these clusters compare to the clusters identified using all
the data?
Appendix A. Installing other packages in R
6. Then, from the r menu, select Install package from local zip file, and
point to where you saved the file.
Then you should have the package installed ready for use. At the r prompt,
you can then type library(oz), for example, and the library is loaded.
Sample space: For any experiment, the sample space S of the experiment
consists of all possible outcomes for the experiment.
For example, the sample space of a coin toss is simply S = {tail, head},
usually abbreviated to {T, H}, whereas for a queuing system the sam-
ple space is the huge set of all possible realisations over time of people
arriving and being served in the queue.
Appendix B. Review of statistical rules
For example, we used these last two properties to determine the steady
state probabilities in a queue. Let event Ej denote that the queue is
in state j (that is, with j people in the queue). These are clearly
mutually exclusive events as the queue cannot be in two states at
once. Further, the sample space is the union of all possible states:
$S = E_0 \cup E_1 \cup E_2 \cup \cdots$, and hence
\begin{align*}
1 = \Pr\{S\} = \Pr\{E_0 \cup E_1 \cup E_2 \cup \cdots\}
&= \Pr\{E_0\} + \Pr\{E_1\} + \Pr\{E_2\} + \cdots \\
&= \pi_0 + \pi_1 + \pi_2 + \cdots .
\end{align*}
\[
\Pr\{E_2 \mid E_1\} = \frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_1\}} .
\]
Solution: The events of being fined and losing licence are not mutu-
ally exclusive, therefore apply the general addition rule:
Pr {F ∪ L} = Pr {F } + Pr {L} − Pr {F ∩ L}
= 0.87 + 0.52 − 0.41
= 0.98 .
Example B.2: A researcher knows that 60% of the goats in a certain dis-
trict are male and that 30% of female goats have a certain disease.
Find the probability that a goat picked at random from the district is
a female and has the disease.
Pr {F ∩ D} = Pr {F } Pr {D | F }
= 0.4 × 0.3
= 0.12 .
E(cX1 ) = cE(X1 )
E(X1 + c) = E(X1 ) + c
Var(cX1 ) = c2 Var(X1 )
Var(X1 + c) = Var(X1 )
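These four rules, and the alternative variance formula derived below, can be checked numerically with any small discrete distribution. A quick Python sketch (the distribution here is invented purely for the check):

```python
import numpy as np

# A small made-up discrete distribution: values x with probabilities p.
x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([0.1, 0.2, 0.3, 0.4])

E = (p * x).sum()                    # E(X)   = 3.0
Var = (p * (x - E) ** 2).sum()       # Var(X) = 1.0
c = 3.0

E_cX = (p * (c * x)).sum()                      # should equal c E(X)
E_Xc = (p * (x + c)).sum()                      # should equal E(X) + c
Var_cX = (p * (c * x - E_cX) ** 2).sum()        # should equal c^2 Var(X)
Var_Xc = (p * ((x + c) - E_Xc) ** 2).sum()      # should equal Var(X)
alt = (p * x ** 2).sum() - E ** 2               # E(X^2) - E(X)^2
```

Each derived quantity agrees with the corresponding rule, and `alt` reproduces `Var`, which is exactly the identity Example B.4 derives.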
Example B.4: The alternative formula for the variance is derived as follows:
\begin{align*}
\operatorname{Var}(X) &= E\left[(X - \mu_X)^2\right] \\
&= E\left[X^2 - 2\mu_X X + \mu_X^2\right] \\
&= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\
&= E[X^2] - E[X]^2 \qquad \text{as } \mu_X = E(X) .
\end{align*}
Solution:
Appendix C. Some time series tricks in R
Function Description
acf              Autocovariance and Autocorrelation Function Estimation
ar Fit Autoregressive Models to Time Series
ar.burg Fit Autoregressive Models to Time Series
ar.mle Fit Autoregressive Models to Time Series
ar.ols Fit Autoregressive Models to Time Series by OLS
ar.yw Fit Autoregressive Models to Time Series
arima ARIMA Modelling of Time Series
austres Quarterly Time Series: Number of Australian Residents
bandwidth.kernel Smoothing Kernel Objects
beaver1 Body Temperature Series of Two Beavers
beaver2 Body Temperature Series of Two Beavers
beavers Body Temperature Series of Two Beavers
BJsales Sales Data with Leading Indicator.
Box.test Box–Pierce and Ljung–Box Tests
ccf              Cross-Correlation Function Estimation
Appendix D. Time series functions in R
Appendix E. Multivariate analysis functions in R
Function Description
ability.cov Ability and Intelligence Tests
as.dendrogram General Tree Structures
as.dist Distance Matrix Computation
as.hclust Convert Objects to Class hclust
as.matrix.dist Distance Matrix Computation
biplot Biplot of Multivariate Data
biplot.princomp Biplot for Principal Components
cancor Canonical Correlations
cmdscale Classical (Metric) Multidimensional Scaling
cut.dendrogram General Tree Structures
cutree Cut a tree into groups of data
dist Distance Matrix Computation
factanal Factor Analysis
factanal.fit.mle Factor Analysis
format.dist Distance Matrix Computation
Harman23.cor Harman Example 2.3
Harman74.cor Harman Example 7.4
Bibliography
[12] Chin, Roland T., Jau, Jack Y. C. and Weinman, James A. (1987). ‘The
application of time series models to cloud field morphology analysis’ in
Journal of Climate and Applied Meteorology, 26, 363–373.
[13] Chu, Pao-Shin and Katz, Richard W. (1985). ‘Modeling and forecast-
ing the southern oscillation: A Time-Domain Approach’ in Monthly
Weather Review, 113, 1876–1888.
[15] Davis, J. M. and Rapoport, P. N. (1974). ‘The use of time series analysis
techniques in forecasting meteorological drought’, in Monthly Weather
Review 102, 176–180.
[16] Enfield, D. B., Mestas-Nunez, A. M. and Tribble, P. J. (2001). ‘The At-
lantic multidecadal oscillation and its relation to rainfall and river flows
in the continental U.S.’ in Geophysical Research Letters, 28, 2077–2080.
[19] Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J. and Ostrowski, E.
(1994). A Handbook of Small Data Sets, London: Chapman and Hall.
[20] Hannes, Gerald (1976). ‘Factor analysis of coastal air pressure and wa-
ter temperature’ in Journal of Applied Meteorology, 15(2), 120–126.
[21] Hipel and McLeod (1984). Time Series Modelling of Water Resources
and Environmental Systems, Elsevier.
[22] Hyndman, Rob J. and Billah, Baki. ‘Unmasking the Theta method’ to
appear in International Journal of Forecasting.
[25] Kärner, Olavi and Rannik, Üllar (1996). ‘Stochastic models to repre-
sent the temporal variability of zonal average cloudiness’ in Journal of
Climate, 9, 2718–2726.
[26] Katz, Richard W. and Skaggs, Richard H. (1981). ‘On the use of
autoregressive-moving average processes to model meteorological time
series’ in Monthly Weather Review, 109, 479–484.
[30] Kim, Jae-On and Mueller, Charles W. (1990). Factor Analysis: Sta-
tistical Methods and Practical Issues, Sage University Paper series on
Quantitative Applications in the Social Sciences, series no. 14. Beverley
Hills and London: Sage Publications.
[33] Mantua, Nathan J., Hare, Steven R., Zhang, Yuan, Wallace, John M.,
and Francis, Robert C. (1997). ‘A Pacific interdecadal climate oscilla-
tion with impacts on salmon production’ in Bulletin of the American
Meteorological Society, 78, 1069–1079.
[42] Stone RC and Auliciems A. (1992). ‘SOI phase relationships with rain-
fall in eastern Australia’ in International Journal of Climatology, 12,
625–636.
[44] Tong, Howell (1983). Threshold Models in Nonlinear Time Series Anal-
ysis, Springer-Verlag.
[50] Wolff, George T., Morrisey, Mark L. and Kelly, Nelson A. (1984). ‘An
investigation of the sources of summertime haze in the Blue Ridge
Mountains using multivariate statistical methods’ in Journal of Applied
Meteorology, 23(9), 1333–1341.
[53] Zwiers, Francis and von Storch, Hans (1990). ‘Regime-dependent au-
toregressive time series models of the southern oscillation’ in Journal
of Climate, 3, 1347–1363.