UNIVARIATE TIME SERIES ANALYSIS
GENERAL
Time series analysis is a purely statistical method applicable to non-repeatable experiments (Box and Jenkins, 1970; Robeson and Steyn, 1990; Schlink et al., 1997). Air quality data constitute a good example of a time series (Chock et al., 1975), and the Box-Jenkins approach has been applied extensively to the analysis of such data (Khare and Sharma, 2002; Robeson and Steyn, 1990; Schlink et al., 1997; Chock et al., 1975). The approach extracts the trends and serial correlations in the data until only a sequence of white noise (shocks) remains. The extraction is accomplished via the difference, autoregressive and moving average operators, the difference operator being used to remove trends, or non-stationarity, in the time series (Chock, 1975). The definitions of various statistical terms used in univariate time series analysis are presented in the following section.
DEFINITION OF TERMS IN UNIVARIATE TIME SERIES ANALYSIS
Stochastic process: A stochastic process is defined as a statistical phenomenon that evolves
over time according to probability laws.
Mean: It is one of the measures used to represent the central tendency of the air quality data. In time series analysis, the mean ($\bar{z}$) is calculated for each segment of data to check the stationarity of the series:

$\bar{z} = \frac{1}{n} \sum_{t=1}^{n} z_t$    (F.1)

where $z_t$ = air quality observation at time $t$, $t = 1, \ldots, n$; $n$ = number of observations in a segment.
Variance: It is another quantity used to check the stationarity of the data in time series analysis. The variance ($\sigma^2$) of the series expresses the degree of variation around the assumed constant mean level and, as such, gives a measure of uncertainty around this mean. Mathematically it is expressed as:

$\sigma^2 = \frac{1}{n} \sum_{t=1}^{n} (z_t - \bar{z})^2$    (F.2)
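As an illustration, a minimal sketch in Python (assuming a one-dimensional NumPy array `series` of air quality observations) of computing the segment means (F.1) and variances (F.2) used to check stationarity:

    import numpy as np

    def segment_stats(series, n_segments=4):
        """Mean (F.1) and variance (F.2) for each roughly equal segment."""
        segments = np.array_split(np.asarray(series, dtype=float), n_segments)
        # np.var uses ddof=0 by default, matching the 1/n definition in F.2
        return [(seg.mean(), seg.var()) for seg in segments]

    # Roughly constant means and variances across segments suggest stationarity.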
Stationarity: A series that contains no systematic change in mean (no trend) or variance, and no periodic variations, is called a stationary series. The analysis of time series requires a stationary series. Several data transformation techniques are available to convert a non-stationary series into a stationary one (a short sketch follows this list):
(a) Logarithmic transformation: this can be employed when the standard deviation of the series is proportional to its mean level.
(b) Square root transformation: this can be employed when the variance of the series is proportional to the mean level of the series.
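A minimal sketch of the two transformations (the positive-valued series shown is hypothetical):

    import numpy as np

    z = np.array([12.0, 15.5, 9.8, 22.1, 30.4, 18.7])  # hypothetical concentrations

    z_log = np.log(z)    # logarithmic transformation
    z_sqrt = np.sqrt(z)  # square root transformation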
Trend: Any systematic change in the level of a time series. Box and Jenkins (1970) advocated a method called 'differencing' for the removal of trends in a series. Differencing a time series consists of subtracting the values of the observations from one another in some defined time-dependent order; e.g., a first (order) difference transformation is defined as the difference between the values of two adjacent observations; second (order) differencing consists of differencing the differenced series; and so on.
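For example, first and second differences can be obtained with NumPy's `diff` (a sketch; the data are hypothetical):

    import numpy as np

    z = np.array([10.0, 12.0, 15.0, 19.0, 24.0])  # hypothetical trending series

    d1 = np.diff(z, n=1)  # first differences: z_t - z_{t-1}
    d2 = np.diff(z, n=2)  # second differences: differences of the first differences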
Autocorrelation function (ACF): This is an important tool for time series model identification. It measures the correlation between observations at different distances (lags) apart. The autocorrelation function of a time series process at lag k is defined as:

$\rho_k = \frac{\mathrm{cov}(z_t, z_{t+k})}{\mathrm{var}(z_t)}$    (F.3)

where cov = covariance and var = variance. The set of values $\rho_k$, and the plot of $\rho_k$ against $k = 1, 2, \ldots$, are known as the autocorrelation function (ACF) or correlogram.
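The sample ACF can be computed, for instance, with statsmodels (a sketch; the series shown is a placeholder):

    import numpy as np
    from statsmodels.tsa.stattools import acf

    series = np.random.default_rng(0).normal(size=200)  # placeholder data
    r = acf(series, nlags=24)  # r[k] is the sample autocorrelation at lag k
    # Plotting r against k = 1, 2, ... gives the correlogram.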
Partial autocorrelation function (PACF): The partial autocorrelation function at lag k is defined as the correlation between time series terms k lags apart, after the correlation due to the intermediate terms has been removed (Milionis and Davies, 1994a), i.e., the partial autocorrelations constitute a device for summarizing all the information contained in the ACF of an autoregressive process in a small number of non-zero statistics (Vandaele, 1983). The lag k partial autocorrelation is the partial regression coefficient $\phi_{kk}$ in the kth order autoregression (equation F.4):

$z_t = \phi_{k1} z_{t-1} + \phi_{k2} z_{t-2} + \cdots + \phi_{kk} z_{t-k} + a_t$    (F.4)

It also measures the additional correlation between $z_t$ and $z_{t-k}$ after adjustments have been made for the intermediate variables $z_{t-1}, z_{t-2}, \ldots, z_{t-k+1}$. The $\phi_{kk}$ are obtained from the Yule-Walker equations (Mills, 1991). In general, it is difficult to know the population values of the autocorrelations and partial autocorrelations of the underlying stochastic process. Consequently, the sample autocorrelation and partial autocorrelation functions are used for identification of a tentative model. Since the sample autocorrelations and partial autocorrelations are only estimates, they are subject to sampling errors and will never match the underlying true autocorrelations and partial autocorrelations exactly. Table F.1 shows the properties of the ACF and PACF for different B-J seasonal and non-seasonal models that help in the selection of a tentative model.
Table F.1a Properties of the ACF and PACF for non-seasonal B-J models.

AR(p):
    ACF: Tails off; exponential and/or sine wave decay; may contain damped oscillations.
    PACF: Cuts off after lag p (p spikes).

MA(q):
    ACF: Cuts off after lag q (q spikes).
    PACF: Tails off; dominated by a linear combination of damped exponentials and/or sine waves; may contain damped oscillations.

ARMA(p, q):
    ACF: Tails off after q - p lags; exponential and/or sine wave decay after q - p lags.
    PACF: Tails off after p - q lags; dominated by exponentials and/or sine waves after p - q lags.
Table F.1b Properties of the ACF and PACF for seasonal B-J models.

AR(p), Seasonal AR(P):
    ACF: Tails off.
    PACF: Cuts off after lag p + sP.

MA(q), Seasonal MA(Q):
    ACF: Cuts off after lag q + sQ.
    PACF: Tails off.

Mixed models:
    ACF: Tails off after (q + sQ) - (p + sP) lags; exponential and/or sine wave decay after (q + sQ) - (p + sP) lags.
    PACF: Tails off after (p + sP) - (q + sQ) lags; exponential and/or sine wave decay after (p + sP) - (q + sQ) lags.
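In practice, the sample ACF and PACF can be computed and compared against the patterns in Tables F.1a and F.1b. A minimal sketch using statsmodels, whose `pacf` estimates $\phi_{kk}$ via the Yule-Walker equations by default (the series is a placeholder):

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    series = np.random.default_rng(1).normal(size=300)  # placeholder data
    r = acf(series, nlags=24)     # sample ACF
    phi = pacf(series, nlags=24)  # sample PACF (phi_kk estimates)

    # e.g., a PACF cutting off after lag p while the ACF tails off
    # points to a tentative AR(p) model (Table F.1a).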
Stationarity and invertibility conditions: Stationarity and invertibility conditions impose restrictions on the parameters of the autoregressive and moving average processes, respectively. These two conditions provide a diagnostic tool to check the stationarity of the fitted model: for a stationary and invertible series, the autoregressive and moving average parameters of the fitted model should be less than one in absolute value. If the model fails to fulfill these two conditions, it implies that the series is non-stationary, and additional differencing is required in order to induce stationarity (Khare and Sharma, 2002).
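As a sketch, statsmodels exposes the roots of the fitted AR and MA polynomials, which must lie outside the unit circle for stationarity and invertibility (the series and model order below are placeholders):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    series = np.random.default_rng(2).normal(size=300)  # placeholder data
    res = ARIMA(series, order=(1, 0, 1)).fit()

    stationary = np.all(np.abs(res.arroots) > 1)  # AR roots outside unit circle
    invertible = np.all(np.abs(res.maroots) > 1)  # MA roots outside unit circle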
White noise: White noise is a sequence of random shocks drawn from a fixed distribution,
with zero mean and constant variance.
Residual analysis: The residuals of the ARIMA model are defined as the differences between the observed and fitted values. If the model adequately depicts the ARIMA process governing the sample data series, then the residuals are white noise. The whiteness of the residuals is examined by two approaches: first, by inspecting the ACF plots; second, by the Ljung-Box Q statistic test (also called the Portmanteau lack-of-fit test). If the residuals are truly white noise, then their ACF should have no spikes and the autocorrelations should be small. Thus, autocorrelations $r_k$ which lie, say, outside the range $\pm 2/\sqrt{n}$ (i.e., outside the approximate 95% large-sample confidence limits) are significantly different from zero. The second approach for analyzing the residual autocorrelations relies on the Ljung-Box Q statistic, defined in equation F.5:

$Q_K = n(n+2) \sum_{k=1}^{K} \frac{r_k^2(\hat{a})}{n-k}$    (F.5)

where n = the length of the series after any differencing, K = the number of residual autocorrelations used to calculate Q, and k = the lag period. If the fitted model is appropriate (i.e., if the residuals are white noise), Q is approximately distributed as a Chi-square variable with (K - p - q - P - Q) degrees of freedom, where p, q, P and Q are the numbers of parameters in the ARIMA model, representing the autoregressive, moving average, seasonal autoregressive and seasonal moving average parameters, respectively. The Q statistic is sensitive to the value of K. However, Davies et al. (1977) and Chatfield (1996) suggested that just "looking" at a few values of $r_k$, particularly at lags 1, 2 and the first seasonal lag (if any), and examining whether any are significantly different from zero using the crude limits $\pm 2/\sqrt{n}$, is sufficient to test the whiteness of the residuals (Khare and Sharma, 2002).
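Both whiteness checks can be sketched with statsmodels as follows (placeholder data and model order; `acorr_ljungbox` implements the Q statistic of equation F.5):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.stattools import acf
    from statsmodels.stats.diagnostic import acorr_ljungbox

    series = np.random.default_rng(3).normal(size=300)  # placeholder data
    resid = ARIMA(series, order=(1, 0, 0)).fit().resid
    n = len(resid)

    # 1. ACF check: spikes outside +/- 2/sqrt(n) are significant at ~95%
    r = acf(resid, nlags=24)
    spikes = np.abs(r[1:]) > 2 / np.sqrt(n)

    # 2. Ljung-Box Q (F.5); small p-values indicate non-white residuals
    lb = acorr_ljungbox(resid, lags=[24])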
Metadiagnosis: This includes omitting parameters or fitting extra parameters where the model is overspecified or underspecified, as described below.
Overspecified model (omitting parameters): Overfitting is one of the procedures for diagnostic checking of model adequacy advocated by Box and Jenkins (1970). It checks for the presence of redundant parameters in the fitted model. Redundant parameters can be spotted by calculating the t-ratio, which is the ratio of the parameter estimate to its standard error. A parameter is significantly different from zero if the t-ratio is equal to or greater than 2 in absolute value. An insignificant parameter is an indication that the model is overspecified and that simplification of the model is possible (Khare and Sharma, 2002).
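A sketch of the t-ratio check (the data and fitted model are placeholders; statsmodels reports parameter estimates in `params` and their standard errors in `bse`):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    series = np.random.default_rng(4).normal(size=300)  # placeholder data
    res = ARIMA(series, order=(2, 0, 0)).fit()

    t_ratios = res.params / res.bse   # estimate / standard error
    redundant = np.abs(t_ratios) < 2  # |t| < 2: candidates for removal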
Underspecified model (fitting extra parameters): This procedure verifies that the tentative model contains the appropriate number of parameters to represent the data and checks whether additional parameters yield an improvement over the original model. For example, if the fitted model is ARIMA(p, d, q), more elaborate models ARIMA(p+1, d, q) and ARIMA(p, d, q+1) are fitted to the data. The model is then tested to see whether the additional parameters improve the fit significantly. This is seen by examining the residual variances; if the white noise variance is reduced by 10% by fitting an overfit model, then the overfit model is appropriate (Khare and Sharma, 2002).
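A sketch of the overfitting comparison (placeholder data; the residual white noise variances of the base and overfit models are compared directly):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    series = np.random.default_rng(5).normal(size=300)  # placeholder data

    base = ARIMA(series, order=(1, 0, 1)).fit()     # ARIMA(p, d, q)
    over_ar = ARIMA(series, order=(2, 0, 1)).fit()  # ARIMA(p+1, d, q)
    over_ma = ARIMA(series, order=(1, 0, 2)).fit()  # ARIMA(p, d, q+1)

    # If an overfit model cuts the white noise variance by ~10%, prefer it.
    for name, res in [("base", base), ("extra AR", over_ar), ("extra MA", over_ma)]:
        print(name, np.var(res.resid))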