Unit-13 Correlation Analysis in Time Series
CORRELATION ANALYSIS IN
TIME SERIES
Structure
13.1 Introduction
     Expected Learning Outcomes
13.2 Autocovariance and Autocorrelation Functions
13.3 Estimation of Autocovariance and Autocorrelation Functions
13.4 Partial Autocorrelation Function
13.5 Correlogram
13.6 Interpretation of Correlogram
13.7 Summary
13.8 Terminal Questions
13.9 Solution/Answers
13.1 INTRODUCTION
With the help of time series data, we try to fit a time series model so that
we can forecast future observations. One of the essential requirements of time
series modelling is stationarity. In the previous unit, you studied what
stationary and nonstationary time series are and how to detect a nonstationary
time series and transform it into a stationary one. As you know, a time series
is a collection of observations recorded over time. Therefore, a value at the
present time may be related to, or depend on, past values. We observe such
relationships in most time series. To study the degree of relationship between
past values and the current value, we have to study the covariance and
correlation between them before modelling the time series. Therefore, in this
unit, you will study correlation analysis in time series.
We begin with a simple introduction of autocovariance and autocorrelation
functions in time series in Sec. 13.2. In Sec. 13.3, we discuss how to estimate
the autocovariance and autocorrelation functions using time series data.
The autocorrelation between two observations is computed in the presence of
the intermediate observations, so it does not give the true picture of the
relationship between them. Therefore, to remove the effect of the intermediate
observations, we use partial autocorrelation, which is discussed in Sec. 13.4.
To present the autocorrelation/partial autocorrelation in the form of
graphs/diagrams, we use a correlogram. In Sec. 13.5, we describe what a
correlogram is and how to plot it. The interpretation of the correlogram is
explained in Sec. 13.6. In the next unit, you will study different models for
time series.
* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
Block 3 Time Series Analysis
the value of another variable and vice versa. A zero value indicates no
relationship between the variables.
The main problem with the covariance is that it is hard to interpret because
of its wide range (−∞ to +∞). For example, our data set could return a value
of, say, 5 or 500. The covariance may take a large value simply because the
values of the variables X and Y are large. Therefore, a large value of
covariance does not by itself indicate a strong relationship between the
variables. A value of 500 tells us that the variables are positively related,
but unlike the correlation coefficient, that number does not tell us how strong
the relationship is. Only the sign of the covariance is directly interpretable;
its numerical value depends on the units of measurement. To overcome this
problem, the covariance is divided by the product of the standard deviations of
the two variables to get the correlation coefficient.
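The scale dependence described above can be checked numerically. The following is a minimal sketch using hypothetical height and weight values (they are not from this unit): changing the unit of one variable changes the covariance a hundredfold, while the correlation coefficient stays the same.

```python
from math import sqrt

def covariance(x, y):
    """Covariance with divisor n: mean of the products of deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data (illustration only): heights in metres, weights in kilograms
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [52.0, 58.0, 66.0, 75.0, 80.0]

# The same heights expressed in centimetres
height_cm = [100 * h for h in height_m]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)

print(cov_m, cov_cm)    # the covariance changes with the unit of measurement
print(corr_m, corr_cm)  # the correlation does not
```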
Correlation
Lag
The number of intervals between the two observations is the lag. For example,
the lag between the current and previous observations is one. If you go back
one more interval, the lag is two, and so on. In mathematical terms, the
observations Yt and Yt+k are separated by k time units, then the lag is k. This
lag can be days, quarters, or years depending on the nature of the data. When
k = 1, you are assessing adjacent observations.
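As a small illustration of the idea of lag, the sketch below pairs each observation with the observation k time units later. The helper name `lagged_pairs` and the short series are assumptions made only for this illustration.

```python
def lagged_pairs(series, k):
    """Return the pairs (y_t, y_{t+k}), i.e. observations k time units apart."""
    # zip stops at the shorter sequence, so exactly len(series) - k pairs remain
    return list(zip(series, series[k:]))

# Hypothetical daily values (illustration only)
y = [22, 23, 23, 24, 23]

pairs_lag1 = lagged_pairs(y, 1)  # adjacent observations
pairs_lag2 = lagged_pairs(y, 2)  # observations two days apart

print(pairs_lag1)
print(pairs_lag2)
```

Note that a series of length n gives only n − k pairs at lag k, which is why fewer products are available at higher lags.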
We now come to our main topic, autocovariance and autocorrelation, and define
them formally.
Autocovariance
The autocovariance is the same as the covariance. The only difference is that
the autocovariance is applied to the same time series, i.e., you compute the
covariance of the data, say temperature Y, with the same temperature data Y
from a previous period.
Autocorrelation
In time series analysis, the autocorrelation is the fundamental technique for
calculating the degree of correlation between a series and its lags. This
method is fairly similar to the Pearson correlation coefficient but
autocorrelation uses the same time series twice: one in its original form and
the second lagged one or more time periods as in autocovariance. We now
define autocorrelation as
Autocorrelation is a measure of the degree of relationship between a
given time series and a lagged version of itself over successive time intervals.
If Yt and Yt+k denote the values of a stationary time series at times t and
t+k, respectively, then the autocorrelation function/coefficient between the
time series Yt and its lagged value Yt+k is defined as

$$\rho_k = \frac{\mathrm{Cov}(Y_t, Y_{t+k})}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_{t+k})}}$$

Since the variance of a stationary time series remains constant,

$$\mathrm{Var}(Y_t) = \mathrm{Var}(Y_{t+k})$$

and therefore

$$\rho_k = \frac{\mathrm{Cov}(Y_t, Y_{t+k})}{\mathrm{Var}(Y_t)} = \frac{\gamma_k}{\gamma_0} = \frac{\sum_{t=1}^{N-k} (Y_t - \mu)(Y_{t+k} - \mu)}{\sum_{t=1}^{N} (Y_t - \mu)^2}$$

where γk = Cov(Yt, Yt+k) is the autocovariance at lag k. For k = 0,

$$\rho_0 = \frac{\gamma_0}{\gamma_0} = \frac{\sum_{t=1}^{N} (Y_t - \mu)(Y_t - \mu)}{\sum_{t=1}^{N} (Y_t - \mu)^2} = 1$$
The degree of correlation between a series and its lags indicates the
pattern/characteristics of the series. For example, if a time series has a
seasonality component say monthly then we will observe a strong correlation
with its seasonal lags, say, 12, 24, and 36 months.
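The seasonal pattern described above can be sketched with a synthetic series. The code below assumes a hypothetical monthly series made of a pure 12-month cycle (not data from this unit) and uses the sample estimate rk = ck/c0 with divisor n; the function name `sample_acf` is illustrative.

```python
from math import sin, pi

def sample_acf(y, k):
    """Sample autocorrelation r_k = c_k / c_0, with divisor n in both c_k and c_0."""
    n = len(y)
    ybar = sum(y) / n
    c0 = sum((v - ybar) ** 2 for v in y) / n
    ck = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k)) / n
    return ck / c0

# Hypothetical monthly series: 10 years of a pure 12-month cycle around level 10
y = [10 + 5 * sin(2 * pi * t / 12) for t in range(120)]

r6 = sample_acf(y, 6)    # half a season apart: strongly negative
r12 = sample_acf(y, 12)  # one season apart: strongly positive
r24 = sample_acf(y, 24)  # two seasons apart: still strongly positive

print(round(r6, 3), round(r12, 3), round(r24, 3))
```

The correlation at the seasonal lags 12 and 24 is high, and it shrinks slowly only because fewer pairs of observations are available at larger lags.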
Some important properties of time series can be studied with the help of
autocovariance and autocorrelation functions. They measure the linear
relationship between observations separated by different time lags. They
provide useful descriptive properties of the time series under study. They are
also important tools for identifying a suitable model for the time series data.
After understanding the concept of autocovariance and autocorrelation
functions, we now study how to estimate them using sample data.
$$\hat{\rho}_k = r_k = \frac{c_k}{c_0} = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}; \quad k = 1, 2, \ldots, n-1$$
Calculate mean, variance and autocorrelation functions for the given data.
Solution: As you know, the autocovariance/autocorrelation function is
calculated between two series of the same length. Therefore, to compute the
sample autocorrelation, we first make two series of the same length. If yt
denotes the value of the temperature/series at any particular time t, then yt+1
denotes the value of the temperature/series one time unit after time t. That
is, yt+1 is the lag 1 value of yt, as shown in the following table:
Day   Temperature (yt)   yt+1
 1          22            --
 2          23            22
 3          23            23
 4          24            23
 5          23            24
 6          25            23
 7          26            25
 8          28            26
 9          28            28
10          30            28
11          31            30
12          30            31
13          31            30
14          31            31
15          30            31
Day     yt    yt+1   yt+2   yt+3   yt+4
 1      22     --     --     --     --
 2      23     22     --     --     --
 3      23     23     22     --     --
 4      24     23     23     22     --
 5      23     24     23     23     22
 6      25     23     24     23     23
 7      26     25     23     24     23
 8      28     26     25     23     24
 9      28     28     26     25     23
10      30     28     28     26     25
11      31     30     28     28     26
12      30     31     30     28     28
13      31     30     31     30     28
14      31     31     30     31     30
15      30     31     31     30     31
Total  405
Since for the calculation of the autocorrelation function, we assume that the
time series is stationary, therefore, mean and variance of the series will be
constant. Thus, we calculate the sample mean and variance of the given
original time series and make the necessary calculations for calculating the
autocovariance and autocorrelation function in the following table:
yt − ȳ   (yt − ȳ)²   yt+1 − ȳ   yt+2 − ȳ   yt+3 − ȳ   yt+4 − ȳ   (yt − ȳ)(yt+1 − ȳ)   (yt − ȳ)(yt+2 − ȳ)   (yt − ȳ)(yt+3 − ȳ)   (yt − ȳ)(yt+4 − ȳ)
  −5        25          --         --         --         --              --                   --                   --                   --
  −4        16          −5         --         --         --              20                   --                   --                   --
  −4        16          −4         −5         --         --              16                   20                   --                   --
  −3         9          −4         −4         −5         --              12                   12                   15                   --
  −4        16          −3         −4         −4         −5              12                   16                   16                   20
  −2         4          −4         −3         −4         −4               8                    6                    8                    8
  −1         1          −2         −4         −3         −4               2                    4                    3                    4
   1         1          −1         −2         −4         −3              −1                   −2                   −4                   −3
   1         1           1         −1         −2         −4               1                   −1                   −2                   −4
   3         9           1          1         −1         −2               3                    3                   −3                   −6
   4        16           3          1          1         −1              12                    4                    4                   −4
   3         9           4          3          1          1              12                    9                    3                    3
   4        16           3          4          3          1              12                   16                   12                    4
   4        16           4          3          4          3              16                   12                   16                   12
   3         9           4          4          3          4              12                   12                    9                   12
Total      164          −3         −7        −11        −14             137                  111                   77                   46
Therefore,

$$\text{Mean} = \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t = \frac{405}{15} = 27$$

$$\text{Variance} = c_0 = \frac{1}{n}\sum_{t=1}^{n} (y_t - \bar{y})^2 = \frac{164}{15} = 10.933$$

Autocovariance function

$$c_1 = \frac{1}{n}\sum_{t=1}^{n-1} (y_t - \bar{y})(y_{t+1} - \bar{y}) = \frac{1}{15} \times 137 = 9.133$$

$$c_2 = \frac{1}{n}\sum_{t=1}^{n-2} (y_t - \bar{y})(y_{t+2} - \bar{y}) = \frac{1}{15} \times 111 = 7.4$$

$$c_3 = \frac{1}{n}\sum_{t=1}^{n-3} (y_t - \bar{y})(y_{t+3} - \bar{y}) = \frac{1}{15} \times 77 = 5.133$$

$$c_4 = \frac{1}{n}\sum_{t=1}^{n-4} (y_t - \bar{y})(y_{t+4} - \bar{y}) = \frac{1}{15} \times 46 = 3.067$$
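The hand calculation above can be cross-checked with a short script. This is a sketch, not a library routine: the function name `autocov` is illustrative, and the temperature values are those consistent with the total of 405 and the deviations tabulated above. It uses the divisor n for every lag, as in the formulas for ck.

```python
# Daily temperatures for the 15 days (sum = 405, mean = 27)
temps = [22, 23, 23, 24, 23, 25, 26, 28, 28, 30, 31, 30, 31, 31, 30]

def autocov(y, k):
    """Sample autocovariance c_k = (1/n) * sum_{t=1}^{n-k} (y_t - ybar)(y_{t+k} - ybar)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k)) / n

mean = sum(temps) / len(temps)
c = [autocov(temps, k) for k in range(5)]  # c[0] is the variance
r = [ck / c[0] for ck in c]                # sample autocorrelations r_k = c_k / c_0

print("mean =", mean)
for k in range(5):
    print(f"c{k} = {c[k]:.3f}, r{k} = {r[k]:.3f}")
```

The ratios rk = ck/c0 obtained this way are the sample autocorrelation coefficients at lags 1 to 4.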
SAQ 1
A researcher wants to study the pattern of the unemployment rate in his
country. He collected quarterly unemployment rate data, which are given in the
following table:
Quarter   Unemployment rate
   1             91
   2             45
   3             89
   4             36
   5             72
   6             51
   7             64
   8             99
   9             64
  10             89
  11             68
  12            108
Compute:
(i) mean and variance, and
(ii) Autocovariance and autocorrelation functions.
This is the correlation between values two time periods apart, conditional on
knowledge of the value in between. (By the way, the two variances in the
denominator will equal each other in a stationary series.)
The formula for calculating the partial autocorrelation function looks scary.
Therefore, we calculate it using the autocorrelation function instead.
The 1st order partial autocorrelation function equals the 1st order
autocorrelation function, that is,

$$\phi_{11} = \rho_1$$

Similarly, we can define the 2nd order (lag) partial autocorrelation function in
terms of the autocorrelation function as

$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$
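where

$$\phi_k = \begin{pmatrix} \phi_{1k} \\ \phi_{2k} \\ \phi_{3k} \\ \vdots \\ \phi_{kk} \end{pmatrix}, \quad
P_k = \begin{pmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\
\rho_2 & \rho_1 & 1 & \cdots & \rho_{k-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1
\end{pmatrix} \quad \text{and} \quad
\Psi_k = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \vdots \\ \rho_k \end{pmatrix}$$

If |Pk| ≠ 0, then according to Cramer's rule the system Pk φk = Ψk has a unique
solution. In the above expression, the last coefficient, φkk, is the partial
autocorrelation function of order k and is given by

$$\phi_{kk} = \frac{|P_k^*|}{|P_k|}$$

where Pk* is the matrix Pk with its last column replaced by Ψk.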
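As a quick check of the determinant form, the case k = 2 can be worked out by hand. Here P2* is taken to be P2 with its last column replaced by (ρ1, ρ2)ᵀ, which matches the layout of the sample version of this formula given later in the unit:

```latex
\phi_{22}
  = \frac{|P_2^*|}{|P_2|}
  = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}
         {\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}}
  = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}
```

This agrees with the expression for the 2nd order partial autocorrelation function stated above.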
As you saw, the autocorrelation function helps assess the properties of a time
series. In contrast, the partial autocorrelation function (PACF) is more useful
for finding the order of an autoregressive, autoregressive integrated moving
average (ARIMA) model. You will study these models in the next unit.
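$$\hat{\phi}_{11} = r_1$$

$$\hat{\phi}_{22} = \frac{r_2 - r_1^2}{1 - r_1^2}$$

The general form for calculating the sample partial autocorrelation function of
order k is given in the matrix form as shown below:

$$\hat{\phi}_{kk} = \frac{|\hat{P}_k^*|}{|\hat{P}_k|} =
\frac{\begin{vmatrix}
1 & r_1 & r_2 & \cdots & r_1 \\
r_1 & 1 & r_1 & \cdots & r_2 \\
r_2 & r_1 & 1 & \cdots & r_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{k-1} & r_{k-2} & r_{k-3} & \cdots & r_k
\end{vmatrix}}
{\begin{vmatrix}
1 & r_1 & r_2 & \cdots & r_{k-1} \\
r_1 & 1 & r_1 & \cdots & r_{k-2} \\
r_2 & r_1 & 1 & \cdots & r_{k-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{k-1} & r_{k-2} & r_{k-3} & \cdots & 1
\end{vmatrix}}$$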
Let's consider an example which helps you to understand how to calculate the
sample partial autocorrelation function.
Example 2: For the data given in Example 2 of Unit 12, calculate the sample
partial autocorrelation up to order 3.
$$\hat{\phi}_{11} = r_1 = 0.835$$

We can calculate the 2nd order (lag) sample partial autocorrelation function as

$$\hat{\phi}_{22} = \frac{r_2 - r_1^2}{1 - r_1^2} = \frac{0.677 - (0.835)^2}{1 - (0.835)^2} = \frac{-0.020}{0.303} = -0.067$$

$$\begin{vmatrix} 1 & r_1 & r_1 \\ r_1 & 1 & r_2 \\ r_2 & r_1 & r_3 \end{vmatrix} =
\begin{vmatrix} 1 & 0.835 & 0.835 \\ 0.835 & 1 & 0.677 \\ 0.677 & 0.835 & 0.470 \end{vmatrix}$$
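The determinant ratio can also be evaluated programmatically. The sketch below is one possible implementation under stated assumptions: the helper names `det` and `pacf` are illustrative, the determinant is computed by simple cofactor expansion (adequate only for small orders), and the numerator matrix is the autocorrelation matrix with its last column replaced by r1, ..., rk.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def pacf(r, k):
    """Sample partial autocorrelation of order k; r[j] is the lag-j autocorrelation, r[0] = 1."""
    # Denominator matrix: entry (i, j) is the autocorrelation at lag |i - j|
    P = [[r[abs(i - j)] for j in range(k)] for i in range(k)]
    # Numerator matrix: same matrix with the last column replaced by r_1, ..., r_k
    P_star = [row[:-1] + [r[i + 1]] for i, row in enumerate(P)]
    return det(P_star) / det(P)

# Sample autocorrelations used in this example (r0 = 1 by definition)
r = [1.0, 0.835, 0.677, 0.470]

phi = [pacf(r, k) for k in (1, 2, 3)]
for k, value in zip((1, 2, 3), phi):
    print(f"phi_{k}{k} = {value:.3f}")
```

For orders 1 and 2 this reproduces the values 0.835 and −0.067 obtained above from the closed-form expressions.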
SAQ 2
For the data given in SAQ 1, calculate the sample partial autocorrelation
function up to order 2.
13.5 CORRELOGRAM
In the previous sections, you learnt about autocovariance, autocorrelation, and
partial autocorrelation functions, which are used to understand the properties
of a time series, fit appropriate models, and forecast future values of the
series. With the help of the autocorrelation/partial autocorrelation function,
we can also diagnose whether the time series is stationary or not. However, a
long list of autocorrelation values is hard to read and can easily be
misinterpreted. If we present the autocorrelation/partial autocorrelation
function in the form of graphs/diagrams, it is easier to read and understand.
A plot in which we take the autocorrelation function on the vertical axis and
different lags on the horizontal axis is known as a correlogram. The technique
of drawing a correlogram is the same as that of a simple bar diagram. The
only difference is that we just take a line instead of a bar of the same width.
Each bar in the correlogram represents the level of correlation between the
series and its lags in chronological order. A correlogram is also known as an
autocorrelation function (ACF) plot or autocorrelation plot. It gives
a summary of autocorrelation at different lags. With the help of a
correlogram, we can easily examine the nature of the time series and
diagnose a suitable model for the time series data.
The correlogram suggests that observations with smaller lag are positively
correlated and autocorrelation decreases as lag k increases. In most time
series, the absolute value of rk, i.e., |rk|, decreases as k increases. This is
because observations that are far apart in time are not much related to each
other, whereas observations that are close together may be positively or
negatively correlated.
Let us understand how we plot a correlogram with the help of an example.
Example 3: For the data given in Example 2 of Unit 12, plot the correlogram.
Solution: A correlogram is a plot of the autocorrelation function with respect to
its lag, therefore, first of all, we have to compute the sample autocorrelation
coefficients. In Example 2, we have already calculated these. Therefore, to
save time, we just write them here:
r1 = 0.835, r2 = 0.676, r3 = 0.469, r4 = 0.280
For the correlogram, we take lags on the X-axis and sample autocorrelation
function on the Y-axis. At each lag, we draw a line, which represents the level
of correlation between the series and a lagged version of itself, as shown in
the following Fig. 13.2.
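When plotting software is not at hand, a rough text version of the correlogram can be produced. The sketch below is only an illustration: the function name `text_correlogram` and the scale of 20 characters per unit of correlation are arbitrary choices, and each line of `#` characters stands in for the vertical line drawn at that lag.

```python
def text_correlogram(r, width=20):
    """Return one text line per lag; the bar length is proportional to |r_k|."""
    lines = []
    for k, rk in enumerate(r, start=1):
        bar = "#" * round(abs(rk) * width)
        sign = "+" if rk >= 0 else "-"
        lines.append(f"lag {k}: {sign}{bar} ({rk:+.3f})")
    return lines

# Sample autocorrelation coefficients at lags 1 to 4 from this example
r = [0.835, 0.676, 0.469, 0.280]

lines = text_correlogram(r)
print("\n".join(lines))
```

The bars shorten steadily from lag 1 to lag 4, which is the same decreasing pattern the plotted correlogram displays.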
After learning what a correlogram is and how to plot it, we now look at how
the correlogram helps us to recognise the nature of a time series.
helpful for visual inspection to recognise the nature of time series, though it is
not always easy. We now describe certain types of time series and the nature
of their correlograms.
Random Series
A time series is completely random if it contains only independent
observations. Therefore, the values of the autocorrelation function for such a
series are approximately zero, that is, rk ≈ 0 for k ≥ 1, and the correlogram
of such a random time series fluctuates around the zero line. The typical
correlogram is shown in Fig. 13.3.
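This behaviour can be seen in a small simulation. The sketch below assumes a hypothetical series of 2000 independent standard normal values generated with Python's `random` module (the seed, length and distribution are arbitrary choices) and computes the sample autocorrelations at the first five lags.

```python
import random

def sample_acf(y, k):
    """Sample autocorrelation r_k = c_k / c_0, with divisor n in both c_k and c_0."""
    n = len(y)
    ybar = sum(y) / n
    c0 = sum((v - ybar) ** 2 for v in y) / n
    ck = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k)) / n
    return ck / c0

random.seed(0)  # fixed seed so that the run is repeatable
y = [random.gauss(0, 1) for _ in range(2000)]  # independent observations

r = [sample_acf(y, k) for k in range(1, 6)]
print([round(v, 3) for v in r])  # all values lie close to zero
```

The values are not exactly zero because they are estimates from a finite sample, but they stay close to the zero line.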
Alternating Series
If a time series behaves in a very rough and zig-zag manner, alternating
between values above and below the mean, then this is indicated by a negative
rk followed by a positive rk+1, and vice versa. The correlogram of an
alternating time series is shown in Fig. 13.4.
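A deterministic toy series makes the sign pattern easy to see. The sketch below assumes a hypothetical series that switches between 30 and 20 at every step (values chosen only for illustration) and computes its first three sample autocorrelations.

```python
def sample_acf(y, k):
    """Sample autocorrelation r_k = c_k / c_0, with divisor n in both c_k and c_0."""
    n = len(y)
    ybar = sum(y) / n
    c0 = sum((v - ybar) ** 2 for v in y) / n
    ck = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k)) / n
    return ck / c0

# Hypothetical alternating series: above the mean, below the mean, and so on
y = [30, 20] * 10  # 20 observations with mean 25

r = [sample_acf(y, k) for k in (1, 2, 3)]
print([round(v, 2) for v in r])  # signs alternate: negative, positive, negative
```

Odd lags pair a value above the mean with one below it, giving negative autocorrelation, while even lags pair values on the same side of the mean, giving positive autocorrelation.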
A time series is said to be stationary if its mean, variance and covariance are
almost constant and it is free from trend and seasonal effects.
Fig. 13.6: The correlogram of time series having trend effect.
SAQ 3
A share market expert wants to study the pattern of a particular share price.
For that, he calculates the autocorrelation for different lags which are given as
follows:
r0 = 1, r1 = 0.482 , r2 = 0.050 , r3 = −0.159 , r4 = 0.253 , r5 = −0.024 , r6 = 0.053,
r13 = 0.407 , r14 = 0.010 , r15 = −0.181, r16 = −0.257 , r7 = −0.057 , r18 = 0.016
r19 = −0.051
13.7 SUMMARY
In this unit, we have discussed:
• Role of correlation analysis in time series.
• The covariance between a given time series and a lagged version of itself
over successive time intervals is called autocovariance. The formula for
calculating the autocovariance function is given as
$$\gamma_k = \gamma_{-k} = \mathrm{Cov}(Y_t, Y_{t+k}) = \frac{1}{N}\sum_{t=1}^{N-k} (Y_t - \mu)(Y_{t+k} - \mu)$$
and its estimate using sample data is as follows:
$$\hat{\gamma}_k = c_k = \frac{1}{n}\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y}); \quad k = 1, 2, \ldots, n-1$$
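$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\sum_{t=1}^{N-k} (Y_t - \mu)(Y_{t+k} - \mu)}{\sum_{t=1}^{N} (Y_t - \mu)^2}$$

$$\hat{\rho}_k = r_k = \frac{c_k}{c_0} = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}; \quad k = 1, 2, \ldots, n-1$$

$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$

$$\phi_{kk} = \frac{|P_k^*|}{|P_k|}$$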
13.9 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Since there are 12 observations, we prepare the data up to
n/4 = 12/4 = 3 lags as follows:
Quarter   Unemployment (yt)   yt+1   yt+2   yt+3
   1             91            --     --     --
   2             45            91     --     --
   3             89            45     91     --
   4             36            89     45     91
   5             72            36     89     45
   6             51            72     36     89
   7             64            51     72     36
   8             99            64     51     72
   9             64            99     64     51
  10             89            64     99     64
  11             68            89     64     99
  12            108            68     89     64
Total           876
yt − ȳ   (yt − ȳ)²   yt+1 − ȳ   yt+2 − ȳ   yt+3 − ȳ   (yt − ȳ)(yt+1 − ȳ)   (yt − ȳ)(yt+2 − ȳ)   (yt − ȳ)(yt+3 − ȳ)
   18       324         --         --         --              --                   --                   --
  −28       784         18         --         --            −504                   --                   --
   16       256        −28         18         --            −448                  288                   --
  −37      1369         16        −28         18            −592                 1036                 −666
   −1         1        −37         16        −28              37                  −16                   28
  −22       484         −1        −37         16              22                  814                 −352
   −9        81        −22         −1        −37             198                    9                  333
   26       676         −9        −22         −1            −234                 −572                  −26
   −9        81         26         −9        −22            −234                   81                  198
   16       256         −9         26         −9            −144                  416                 −144
   −5        25         16         −9         26             −80                   45                 −130
   35      1225         −5         16         −9            −175                  560                 −315
Total 0    5562                                            −2154                 2661                −1074
Therefore,

$$\text{Mean} = \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t = \frac{876}{12} = 73$$

$$\text{Variance} = c_0 = \frac{1}{n}\sum_{t=1}^{n} (y_t - \bar{y})^2 = \frac{5562}{12} = 463.5$$
Autocovariance

$$c_1 = \frac{1}{n}\sum_{t=1}^{n-1} (y_t - \bar{y})(y_{t+1} - \bar{y}) = \frac{1}{12} \times (-2154) = -179.5$$

$$c_2 = \frac{1}{n}\sum_{t=1}^{n-2} (y_t - \bar{y})(y_{t+2} - \bar{y}) = \frac{1}{12} \times 2661 = 221.75$$

$$c_3 = \frac{1}{n}\sum_{t=1}^{n-3} (y_t - \bar{y})(y_{t+3} - \bar{y}) = \frac{1}{12} \times (-1074) = -89.5$$

After calculating the autocovariance function, we now calculate the
sample autocorrelation function as
r1 = −0.387, r2 = 0.478, r3 = −0.193
2. In SAQ 1, we have already calculated the sample autocorrelation
coefficients, which are as follows:

$$r_1 = \frac{c_1}{c_0} = \frac{-179.5}{463.5} = -0.387, \quad r_2 = \frac{c_2}{c_0} = \frac{221.75}{463.5} = 0.478, \quad r_3 = \frac{c_3}{c_0} = \frac{-89.5}{463.5} = -0.193$$

Therefore,

$$\hat{\phi}_{11} = r_1 = -0.387$$

$$\hat{\phi}_{22} = \frac{r_2 - r_1^2}{1 - r_1^2} = \frac{0.478 - (-0.387)^2}{1 - (-0.387)^2} = 0.386$$
3. For plotting the correlogram, we take lags on the X-axis and sample
autocorrelation coefficients on the Y-axis. At each lag, we draw a line,
which represents the level of correlation between the series and its lags,
as shown in the following Fig. 13.8.
yt − ȳ   (yt − ȳ)²   yt+1 − ȳ   yt+2 − ȳ   yt+3 − ȳ   yt+4 − ȳ   (yt − ȳ)(yt+1 − ȳ)   (yt − ȳ)(yt+2 − ȳ)   (yt − ȳ)(yt+3 − ȳ)   (yt − ȳ)(yt+4 − ȳ)
   17       289         --         --         --         --              --                   --                   --                   --
    0         0         17         --         --         --               0                   --                   --                   --
   11       121          0         17         --         --               0                  187                   --                   --
   −3         9         11          0         17         --             −33                    0                  −51                   --
  −16       256         −3         11          0         17              48                 −176                    0                 −272
   −6        36        −16         −3         11          0              96                   18                  −66                    0
   −1         1          4         −6        −16         −3              −4                    6                   16                    3
   −6        36        −11         −1          4         −6              66                    6                  −24                   36
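$$\text{Variance} = c_0 = \frac{1}{n}\sum_{t=1}^{n} (y_t - \bar{y})^2 = \frac{1272}{14} = 90.86$$

Autocovariance

$$c_1 = \frac{1}{n}\sum_{t=1}^{n-1} (y_t - \bar{y})(y_{t+1} - \bar{y}) = \frac{1}{14} \times (-110) = -7.86$$

$$c_2 = \frac{1}{n}\sum_{t=1}^{n-2} (y_t - \bar{y})(y_{t+2} - \bar{y}) = \frac{1}{14} \times (-41) = -2.93$$

$$c_3 = \frac{1}{n}\sum_{t=1}^{n-3} (y_t - \bar{y})(y_{t+3} - \bar{y}) = \frac{1}{14} \times 33 = 2.36$$

$$c_4 = \frac{1}{n}\sum_{t=1}^{n-4} (y_t - \bar{y})(y_{t+4} - \bar{y}) = \frac{1}{14} \times (-109) = -7.79$$

$$r_3 = \frac{c_3}{c_0} = \frac{2.36}{90.86} = 0.026, \quad r_4 = \frac{c_4}{c_0} = \frac{-7.79}{90.86} = -0.086$$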
Fig. 13.9: The correlogram of time series data of sales of new single houses.