Efficient MLE for Multivariate ARFIMA
W.-J. Tsay
Journal of Statistical Computation and Simulation
This article considers the maximum likelihood estimation (MLE) of a class of stationary and invert-
ible vector autoregressive fractionally integrated moving-average (VARFIMA) processes considered in
Equation (26) of Luceño [A fast likelihood approximation for vector general linear processes with long
series: Application to fractional differencing, Biometrika 83 (1996), pp. 603–614] or Model A of Lobato
[Consistency of the averaged cross-periodogram in long memory series, J. Time Ser. Anal. 18 (1997),
pp. 137–155] where each component yi,t is a fractionally integrated process of order di , i = 1, . . . , r.
Under the conditions outlined in Assumption 1 of this article, the conditional likelihood function of
this class of VARFIMA models can be efficiently and exactly calculated with a conditional likelihood
Durbin–Levinson (CLDL) algorithm proposed herein. This CLDL algorithm is based on the multivariate
Durbin–Levinson algorithm of Whittle [On the fitting of multivariate autoregressions and the approximate
canonical factorization of a spectral density matrix, Biometrika 50 (1963), pp. 129–134] and the condi-
tional likelihood principle of Box and Jenkins [Time Series Analysis, Forecasting, and Control, 2nd ed.,
Holden-Day, San Francisco, CA]. Furthermore, the conditions in the aforementioned Assumption 1 are
general enough to include the model considered in Andersen et al. [Modeling and forecasting realized
volatility, Econometrica 71 (2003), 579–625] for describing the behaviour of realized volatility and the
model studied in Haslett and Raftery [Space–time modelling with long-memory dependence: Assessing Ire-
land’s wind power resource, Appl. Statist. 38 (1989), pp. 1–50] for spatial data as special cases. As the
computational cost of implementing the CLDL algorithm is much lower than that of using the algorithms
proposed in Sowell [Maximum likelihood estimation of fractionally integrated time series models, Working
paper, Carnegie-Mellon University], we are thus able to conduct a Monte Carlo experiment to investigate
the finite sample performance of the CLDL algorithm for the 3-dimensional VARFIMA processes with the
sample size of 400. The simulation results are very satisfactory and reveal the great potential of using the
CLDL method for empirical applications.
Keywords: Durbin–Levinson algorithm; long memory; maximum likelihood estimation; multivariate time
series
*Email: [email protected]
1. Introduction
Consider the maximum likelihood estimation (MLE) of a class of stationary and invertible vector
autoregressive fractionally integrated moving-average (VARFIMA) processes:
Φ(B) diag(∇^d)Y_t = Θ(B)Z_t,  (1)

where B is the backshift operator, Φ(B) and Θ(B) are (r × r) AR and MA matrix polynomials of orders p and q, diag(∇^d) = diag((1 − B)^{d_1}, …, (1 − B)^{d_r}), and Z_t is an i.i.d. innovation vector with covariance matrix Σ. The process admits the moving-average representation

Y_t = Σ_{j=0}^{∞} Ψ_j Z_{t−j},  (2)

where Ψ_0 = I_r, and the (r × r) coefficient matrices Ψ_j are often referred to as impulse response
functions. Each component y_{i,t} of Y_t is a fractionally integrated process of order d_i in the sense that

∇^{d_i} y_{i,t} = u_{i,t},  (3)

where u_{i,t} is a stationary and invertible short-memory process. Note that Ψ_j decays at a slow hyperbolic rate when the differencing parameters are not
equal to 0. When r = 1, this process was first introduced by Granger [1], Granger and Joyeux [2],
and Hosking [3]. They show that Y_t is stationary if d < 1/2. When d > 0, Y_t is said to have long
memory, because the autocovariance functions of the univariate process Y_t are not absolutely
summable. See Beran [4] for an overview of long memory processes.
The model in Equation (1) has been considered in Equation (26) of Luceño [5] and is exactly
the Model A investigated in Lobato [6]. Each component yi,t in (1) is a fractionally integrated
process of order d_i, i = 1, …, r, as defined in Equation (3). Sowell [7] pioneered the calculation of
the likelihood function of the VARFIMA model in Equation (1). Nevertheless, the presence of AR
parameters greatly complicates the computation of the corresponding autocovariance functions,
because the algorithms of Sowell [7,8] involve hypergeometric functions that must be evaluated
with a truncated infinite sum, no matter whether the data are univariate or multivariate. On the other
hand, Luceño [5] suggests an approximate log-likelihood function based on the autocovariance
function of the ‘inverse-transpose’ model. As a consequence, a rounding error is inevitable from
using the approximate procedure of Luceño [5] and those of Sowell [7,8]. The impacts of rounding
errors generated from these algorithms on the estimation results might not be trivial, especially
when the dimensionality of the data series is relatively large.
The contribution of this article is to show that the conditional likelihood function of the
VARFIMA process in Equation (1) can be evaluated exactly and efficiently if the model in
Equation (1) can be represented as:

diag(∇^d)Φ(B)Y_t = Θ(B)Z_t.  (5)

The distinction between the models in Equations (1) and (5) is irrelevant if the
data series is univariate, i.e., r = 1. In the multivariate case, the models
in Equations (1) and (5) are identical if either Φ(B) is diagonal or the values of the differencing
parameters are identical across i = 1, …, r, as imposed in Assumption 1 in the next section. Thus,
each component y_{i,t} in Equation (5) is still a fractionally integrated process of order d_i as defined in
Equation (3), given that either of these two conditions holds. In fact, these conditions
have been employed in the multivariate long memory literature. For example, Haslett and Raftery [9]
impose a homogeneous structure on the fractional differencing and ARMA parameters across
meteorological stations to describe the wind speeds recorded at 12 synoptic meteorological stations
in Ireland when using the VARFIMA model for their spatial data. Moreover, Andersen et al. [10]
employ the long-memory Gaussian trivariate VAR model in their Equation (15) to describe the
realized logarithmic volatilities of exchange rates, where the values of the differencing parameters are
also identical across i = 1, …, r. The reason why they impose such a homogeneous structure on
the fractional differencing parameters is to reduce the computational burden of estimating
a VARFIMA model whose dimensionality and span of data are both large. However, the
methodology developed in this article allows us to deal with the VARFIMA model in Equation (1)
where the order of integration of each component y_{i,t} differs across i = 1, …, r.
The model in Equation (5) has been discussed in Equation (27) of Luceño [5]. It is also equiv-
alent to Model B considered in Lobato [6] and can be estimated with Luceño’s [5] approximate
log-likelihood function. As argued previously, a rounding error is inevitable when using the
approximate procedure of Luceño [5]. On the contrary, given that the model in Equation (1) can
be transformed as the one in Equation (5), we show that the conditional likelihood function of
the VARFIMA process in Equation (1) can be evaluated exactly and efficiently by combining the
multivariate Durbin–Levinson algorithm of Whittle [11] with the conditional likelihood principle
of Box and Jenkins [12] even though the dimensionality and span of data series are relatively
large. We name this procedure the conditional likelihood Durbin–Levinson (CLDL) algorithm. In
the literature, Deriche and Tewfik [13] and Tsay and Härdle [14] estimate the univariate ARFIMA
process with the univariate Durbin–Levinson algorithm, while Doornik and Ooms [15] suggest
using Whittle’s [11] method for VARFIMA models but with no implementation.
Because the VARFIMA processes considered in Andersen et al. [10] belong to the subcases of
models in Equation (1), the proposed CLDL algorithm is readily applied to the data of Andersen
et al. [10]. Most importantly, the CLDL algorithm will provide a much more efficient estimate
than the equation-by-equation ordinary least squares (OLS) method of Andersen et al. [10], who
fix the value of d in Equation (1) with a common estimate 0.401 across three data series.
Another valuable feature of the CLDL algorithm is its computational gain over the original
algorithm of Sowell [7]. It is well known that the computational burdens of applying Sowell’s [7]
algorithm to the VARFIMA data are tremendous. Particularly, when studying the joint behaviour
of US and Canadian bond rates, Dueker and Startz [16] demonstrate that it takes about 35 min
on a 200-MHz PC for each iteration of the MLE of a bivariate VARFIMA process with 121
observations and 18 parameters when implementing Sowell’s [7] algorithm. By contrast, it takes
less than 40 s on a 1066-MHz PC for each iteration of the conditional MLE of a bivariate
VARFIMA(1, d, 1) process with 400 observations and 11 parameters when implementing the
proposed CLDL algorithm. This clearly demonstrates the power of the CLDL algorithm for many
potential empirical applications. It also explains why we can conduct a Monte Carlo experiment
to investigate the finite sample performance of the CLDL algorithm for 3-dimensional VARFIMA
processes under the sample size up to 400.
The remaining parts of this article are arranged as follows: Section 2 presents the autocovariance
functions of a VARFIMA(0, d, q) process and the implementation of the multivariate Durbin–
Levinson algorithm. Section 3 considers a class of VARFIMA(p, d, q) processes with which
we can compute the corresponding conditional likelihood function exactly with the CLDL
algorithm. The finite sample performance of the CLDL algorithm is investigated through a Monte
Carlo experiment in Section 4. Section 5 provides a conclusion.
2. The model and the multivariate Durbin–Levinson algorithm

The model in Equation (5) differs from the one in Equation (1) essentially in the ordering
of Φ(B) and diag(∇^d). The major idea of this article is to show that the conditional likelihood
function of the VARFIMA process in Equation (1) can be evaluated exactly and efficiently if the
model in Equation (1) can be transformed into the one in Equation (5). Assumption 1 summarizes
these conditions.
Assumption 1 Given that the data are generated by Equation (1), we assume that (i) Φ(B) is diagonal,
or (ii) the values of the differencing parameters d_i are identical across i = 1, …, r.
Provided that Assumption 1 holds, the difference in choosing between the models in Equations (1) and (5) no longer exists. Consequently, we impose the conditions in Assumption 1
throughout this article, and each component y_{i,t} in model (1) or (5) is a fractionally integrated
process of order d_i, i = 1, …, r, as defined in Equation (3). Item (ii) of Assumption 1 allows us
to estimate the data of Andersen et al. [10] and those of Haslett and Raftery [9] with the proposed
CLDL algorithm, as explained more clearly later. The advantages of using item (i) of Assumption 1
are more evident when the dimensionality of the data series is relatively large. For example, when
r = 7, we may require that Φ(B) is diagonal, and the associated number of parameters for such a
7-dimensional VARFIMA(1, d, 1) process is only 91, compared with 133 parameters for a
VARFIMA(1, d, 1) process without any restriction imposed. The resulting
simplified model is still quite flexible, because we do not impose any constraint on the structure
of the MA parameters; however, the computational burden of estimating such a simplified model is
much mitigated with the CLDL algorithm, as documented in Section 1.
Before implementing the MLE for the model in Equation (1), we need to compute the autocovariance
functions of the VARFIMA models. We first consider the simplest case where Y_t is generated
as a VARFIMA(0, d, 0) model:

Y_t = diag(∇^{−d})Z_t,  (6)

or

y_{i,t} = Σ_{j=0}^{∞} ψ_{i,j} z_{i,t−j},  ψ_{i,j} = Γ(j + d_i) / [Γ(j + 1)Γ(d_i)],  i = 1, 2, …, r.  (7)
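The ratio of gamma functions in Equation (7) is best computed recursively, since Γ(j + d_i) overflows double precision long before the weights ψ_{i,j} themselves become small. A minimal Python sketch (the paper's own programs are written in GAUSS; the function name is ours):

```python
import math

def psi_coeffs(d, n):
    """First n MA(inf) weights psi_j = Gamma(j + d) / (Gamma(j + 1) Gamma(d))
    of Equation (7), via the stable recursion
    psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    psi = [1.0]
    for j in range(1, n):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return psi

# agreement with the direct gamma-function formula while the latter is
# still representable in double precision
d = 0.3
rec = psi_coeffs(d, 50)
direct = [math.gamma(j + d) / (math.gamma(j + 1) * math.gamma(d)) for j in range(50)]
assert max(abs(a - b) for a, b in zip(rec, direct)) < 1e-12
```

The recursion follows from Γ(j + d)/Γ(j − 1 + d) = j − 1 + d and Γ(j + 1)/Γ(j) = j.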
From Abramowitz and Stegun [17, p. 556, Sections 15.1.1 and 15.1.20], we note that the
hypergeometric function is defined as

F(a, b; c; z) = [Γ(c) / (Γ(a)Γ(b))] Σ_{n=0}^{∞} [Γ(a + n)Γ(b + n) / (Γ(c + n)Γ(n + 1))] z^n,

and it fulfills the following relationship:

F(a, b; c; 1) = Γ(c)Γ(c − a − b) / [Γ(c − a)Γ(c − b)],  c ≠ 0, −1, −2, …,  c − a − b > 0.
Using the above results, we derive the (m, n)th element of Γ(h) in Equation (8) as

γ_{m,n}(h) ≡ σ_{mn} Σ_{j=0}^{∞} ψ_{m,j} ψ_{n,j+h} = σ_{mn} [Γ(1 − d_m − d_n) / (Γ(d_n)Γ(1 − d_n))] · [Γ(h + d_n) / Γ(h + 1 − d_m)].  (9)

Luceño [5, p. 611] independently presents the formula for γ_{m,n}(h) in his Equation (8), but
indirectly.
For computational efficiency, it is well known in the literature that γ_{m,n}(h) in Equation (9) can
be calculated as follows:

σ_{mn} Σ_{j=0}^{∞} ψ_{m,j} ψ_{n,j+h} = σ_{mn} [Γ(1 − d_m − d_n) / (Γ(1 − d_m)Γ(1 − d_n))] Π_{0<k≤h} (k − 1 + d_n)/(k − d_m),  (10)

where m, n = 1, 2, …, r, and h = 1, 2, ….
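Equation (10) turns the infinite sum in Equation (9) into a finite product, which is both cheap and numerically stable. A Python sketch of both sides (the helper names are ours), with the truncated ψ-weight sum used only as a cross-check:

```python
import math

def gamma_mn(h, dm, dn, sigma_mn=1.0):
    """Closed form (10) for the lag-h autocovariance gamma_{m,n}(h) of a
    VARFIMA(0, d, 0) process, h >= 0 (the product over 0 < k <= h is
    empty at h = 0, which recovers Equation (9) at lag zero)."""
    out = sigma_mn * math.gamma(1 - dm - dn) / (math.gamma(1 - dm) * math.gamma(1 - dn))
    for k in range(1, h + 1):
        out *= (k - 1 + dn) / (k - dm)
    return out

def psi(d, n):
    # psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j, as in Equation (7)
    out = [1.0]
    for j in range(1, n):
        out.append(out[-1] * (j - 1 + d) / j)
    return out

# cross-check against the truncated infinite sum of Equation (9)
dm, dn, h, N = 0.2, 0.1, 3, 20000
pm, pn = psi(dm, N + h), psi(dn, N + h)
approx = sum(pm[j] * pn[j + h] for j in range(N))
assert abs(gamma_mn(h, dm, dn) - approx) < 1e-3
```

The truncated sum converges only at a hyperbolic rate, which is precisely why the closed form matters in practice.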
Combining the results in Equations (9) and (10) with Whittle’s [11] multivariate
Durbin–Levinson algorithm, we evaluate the exact unconditional likelihood function of a
VARFIMA(0, d, 0) model as

(2π)^{−rT/2} {Π_{j=1}^{T} det(V_{j−1})}^{−1/2} exp{ −(1/2) Σ_{j=1}^{T} (Y_j − Ŷ_j)′ V_{j−1}^{−1} (Y_j − Ŷ_j) },  (11)

where Ŷ_j denotes the one-step-ahead predictor of Y_j and V_{j−1} is the associated prediction error
covariance matrix. As j = 1, Ŷ_1 = 0, and V_0 = Γ(0). For the definition and computation of Ŷ_j and of
V_{j−1}, see Whittle [11] or Brockwell and Davis [18, Proposition 11.4.1] for details.
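The likelihood in Equation (11) can be illustrated by building the one-step predictors Ŷ_j and error covariances V_{j−1} directly from the autocovariances. The brute-force O(T³) Python sketch below (function name and MA(1) example are ours, not the paper's) is numerically identical to what the O(T²) multivariate Durbin–Levinson recursion of Whittle [11] delivers, and is meant only as a reference implementation:

```python
import numpy as np

def innovations_loglik(Y, Gamma):
    """Gaussian log-likelihood (11) of a zero-mean stationary vector series.

    Y is a (T, r) array; Gamma(h) returns the (r, r) autocovariance
    E[Y_{t+h} Y_t'] for h >= 0.  The one-step predictor Yhat_j and its
    error covariance V_{j-1} are formed by direct linear algebra; the
    multivariate Durbin-Levinson recursion produces the same quantities
    in O(T^2) operations."""
    T, r = Y.shape
    ll = 0.0
    for j in range(T):
        if j == 0:
            yhat, V = np.zeros(r), Gamma(0)
        else:
            # covariance of the past (Y_0, ..., Y_{j-1}) and its
            # cross-covariance with Y_j, both assembled from Gamma(h)
            S = np.block([[Gamma(s - t) if s >= t else Gamma(t - s).T
                           for t in range(j)] for s in range(j)])
            c = np.hstack([Gamma(j - t) for t in range(j)])
            A = c @ np.linalg.inv(S)
            yhat = A @ Y[:j].reshape(-1)
            V = Gamma(0) - A @ c.T
        e = Y[j] - yhat
        ll -= 0.5 * (r * np.log(2 * np.pi) + np.log(np.linalg.det(V))
                     + e @ np.linalg.solve(V, e))
    return ll
```

For a vector MA(1) with unit innovation covariance, Γ(0) = I + ΘΘ′, Γ(1) = Θ, and Γ(h) = 0 for h ≥ 2, so the value above can be verified against the dense multivariate normal log-density.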
When Y_t is a VARFIMA(0, d, q) process, i.e.

diag(∇^d)Y_t = Θ(B)Z_t,  (12)
the (m, n)th element of its corresponding autocovariance function, Γ(h), is

γ_{m,n}(h) ≡ Γ* σ_{mn} θ_{mm,0} θ_{nn,0}
  + Γ* Σ_{f=1}^{q} Σ_{g=1}^{q} Σ_{u=1}^{r} Σ_{v=1}^{r} σ_{uv} θ_{mu,f} θ_{nv,g} [Γ(h + d_n + g − f)Γ(h + 1 − d_m)] / [Γ(h + d_n)Γ(h + 1 − d_m + g − f)]
  + Γ* Σ_{f=1}^{q} Σ_{u=1}^{r} σ_{nu} θ_{mu,f} [Γ(h + d_n − f)Γ(h + 1 − d_m)] / [Γ(h + d_n)Γ(h + 1 − d_m − f)]
  + Γ* Σ_{f=1}^{q} Σ_{u=1}^{r} σ_{mu} θ_{nu,f} [Γ(h + d_n + f)Γ(h + 1 − d_m)] / [Γ(h + d_n)Γ(h + 1 − d_m + f)],  (13)

where

Γ* = [Γ(1 − d_m − d_n) / (Γ(d_n)Γ(1 − d_n))] · [Γ(h + d_n) / Γ(h + 1 − d_m)],  (14)

and θ_{mn,k} denotes the (m, n)th element of Θ_k in Equation (1). Note that there are (rq + 1)²
terms at the right-hand side of Equation (13). With the autocovariance functions in Equation (13),
we apply the multivariate Durbin–Levinson algorithm of Whittle [11] to the VARFIMA(0, d, q)
processes as we have suggested for the VARFIMA(0, d, 0) processes.
Sowell [7] provides a formula to compute the autocovariance functions of the
VARFIMA(0, d, q) process on pages 12 and 14 of his manuscript. In this article, we present
the formula in Equation (13) more explicitly, thus providing easy access for empirical applications. Furthermore, similar to the results in Equation (10), the ratios of gamma functions in
Equation (13) can be computed with the following relationship:
Γ(h + d_n + l)Γ(h + 1 − d_m) / [Γ(h + d_n)Γ(h + 1 − d_m + l)] =
  Π_{k=1}^{l} (h + d_n + k − 1)/(h − d_m + k)    if l > 0,
  1                                              if l = 0,   (15)
  Π_{k=1}^{|l|} (h − d_m − k + 1)/(h + d_n − k)   if l < 0,
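Relationship (15) is what makes Equation (13) cheap to evaluate: each gamma-function ratio reduces to at most |l| multiplications. A Python sketch with a log-gamma cross-check (function name ours):

```python
import math

def gamma_ratio(h, dm, dn, l):
    """Product form (15) of
    Gamma(h+dn+l) Gamma(h+1-dm) / (Gamma(h+dn) Gamma(h+1-dm+l))."""
    if l == 0:
        return 1.0
    out = 1.0
    if l > 0:
        for k in range(1, l + 1):
            out *= (h + dn + k - 1) / (h - dm + k)
    else:
        for k in range(1, -l + 1):
            out *= (h - dm - k + 1) / (h + dn - k)
    return out

# cross-check against a direct log-gamma evaluation
dm, dn = 0.4, 0.2
for h in (5, 20):
    for l in (-3, -1, 0, 2, 4):
        direct = math.exp(math.lgamma(h + dn + l) + math.lgamma(h + 1 - dm)
                          - math.lgamma(h + dn) - math.lgamma(h + 1 - dm + l))
        assert abs(gamma_ratio(h, dm, dn, l) - direct) < 1e-9 * direct
```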
Given that Θ(B) is diagonal, Equation (13) collapses to

γ_{m,n}(h) = σ_{mn} [Γ(1 − d_m − d_n) / (Γ(d_n)Γ(1 − d_n))] Σ_{l=−q}^{q} θ(l) Γ(h + d_n + l) / Γ(h + 1 − d_m + l),  (16)

where

θ(l) = Σ_{k=0}^{q−|l|} θ_{mm,k} θ_{nn,k+|l|}   if l > 0,
     = Σ_{k=0}^{q} θ_{mm,k} θ_{nn,k}           if l = 0,   (17)
     = Σ_{k=0}^{q−|l|} θ_{nn,k} θ_{mm,k+|l|}   if l < 0.

Now there are only (q + 1)² terms on the right-hand side of Equation (16). As r = 1, the
autocovariance function in Equation (16) is equivalent to Sowell’s [8, p. 173] formula.
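Under our reading of Equations (16) and (17), the diagonal-Θ(B) autocovariance can be coded in a few lines. The Python sketch below (names ours) cross-checks the closed form against a truncated MA(∞) sum for a univariate ARFIMA(0, d, 1):

```python
import math

def theta_weights(theta_m, theta_n, l):
    """theta(l) of Equation (17); theta_m, theta_n list the scalar MA
    coefficients (theta_0 = 1, ..., theta_q) on the diagonals of Theta(B)."""
    q = len(theta_m) - 1
    if l >= 0:
        return sum(theta_m[k] * theta_n[k + l] for k in range(q - l + 1))
    return sum(theta_n[k] * theta_m[k - l] for k in range(q + l + 1))

def acvf_eq16(h, dm, dn, theta_m, theta_n, sigma_mn=1.0):
    """Equation (16): (m, n)th autocovariance at lag h of a VARFIMA(0, d, q)
    process with diagonal Theta(B); tested here with h > q."""
    q = len(theta_m) - 1
    const = sigma_mn * math.gamma(1 - dm - dn) / (math.gamma(dn) * math.gamma(1 - dn))
    return const * sum(theta_weights(theta_m, theta_n, l)
                       * math.gamma(h + dn + l) / math.gamma(h + 1 - dm + l)
                       for l in range(-q, q + 1))

# numerical cross-check for a univariate ARFIMA(0, d, 1): the MA(inf)
# weights of the process are c_j = psi_j + theta * psi_{j-1}
d, theta, h, N = 0.1, 0.4, 2, 20000
psi = [1.0]
for j in range(1, N + h + 1):
    psi.append(psi[-1] * (j - 1 + d) / j)
c = [psi[0]] + [psi[j] + theta * psi[j - 1] for j in range(1, N + h)]
approx = sum(c[j] * c[j + h] for j in range(N))
assert abs(acvf_eq16(h, d, d, [1.0, theta], [1.0, theta]) - approx) < 1e-3
```

With q = 0 and θ(0) = 1, the expression reduces to Equation (9), as it should.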
3. The conditional likelihood Durbin–Levinson algorithm

We now show that the conditional likelihood function of the model in Equation (1) under Assumption 1
can be evaluated exactly. From Equation (1) or (5), we have Φ(B)Y_t = diag(∇^{−d})Θ(B)Z_t, and
the term diag(∇^{−d})Θ(B)Z_t on the right-hand side of the equality sign is a
VARFIMA(0, d, q) process whose autocovariances are displayed in Equation (13). Adopting the
idea of the conditional likelihood function in Box and Jenkins [12, Chapter 7], we transform
Y_t in Equation (1) into diag(∇^{−d})Θ(B)Z_t for a given choice of parameters in Φ(B) and of
suitable starting values. Particularly, if p = 1, then conditional on Φ_1 and Y_1, the filtered series Y_t − Φ_1 Y_{t−1}, t = 2, 3, …, T, is a VARFIMA(0, d, q) process, and we denote its associated conditional likelihood
function as Equation (18), which takes the same form as Equation (11) evaluated at the filtered series. Under Assumption 1 we also have

Φ(B)Y_t = diag(∇^{−d})Θ(B)Z_t = Θ(B) diag(∇^{−d})Z_t = Θ(B)Ỹ_t,  (19)

where Ỹ_t ≡ diag(∇^{−d})Z_t.
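The conditional-likelihood transformation for p = 1 is a one-line filter; a minimal Python sketch (function name ours), after which the exact VARFIMA(0, d, q) likelihood of Section 2 applies to the filtered series:

```python
import numpy as np

def conditional_filter(Y, Phi1):
    """Conditional-likelihood transformation for p = 1: conditional on
    Phi_1 and Y_1, the filtered series W_t = Y_t - Phi_1 Y_{t-1},
    t = 2, ..., T, is VARFIMA(0, d, q) under Assumption 1, so the exact
    Durbin-Levinson likelihood of Section 2 can be applied to W.

    Y is a (T, r) array with rows Y_t; the result has T - 1 rows."""
    return Y[1:] - Y[:-1] @ Phi1.T
```

The T − 1 filtered observations, rather than all T, enter the likelihood, which is the source of the "conditional" in CLDL.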
4. Monte Carlo experiments

This section considers the finite sample performance of the MLE of VARFIMA processes with
the proposed CLDL algorithm. To our knowledge, this is the first simulation study pertaining
to the full time-domain MLE of VARFIMA processes. Without loss of generality, we focus on
processes where all the fractional differencing parameters are positive, in that the long memory
process is the major concern of the fractionally integrated literature. Moreover, we divide our
studies into two parts. The first part deals with the three-dimensional VARFIMA(0, d, 1) process,
while the second part considers the two-dimensional VARFIMA(1, d, 1) one.
4.1. The three-dimensional VARFIMA(0, d, 1) process

We first consider the following three-dimensional VARFIMA(0, d, 1) process for the Monte Carlo
experiment:

diag(∇^{d_1}, ∇^{d_2}, ∇^{d_3}) (y_{1,t}, y_{2,t}, y_{3,t})′ = diag(1 + θ_{11,1}B, 1 + θ_{22,1}B, 1 + θ_{33,1}B) (z_{1,t}, z_{2,t}, z_{3,t})′.  (20)
Note that AR parameters are not included in Equation (20). Moreover, the specification in
Equation (20) reveals that yi,t is a fractionally integrated process of order di .
The parameters considered for the simulations are

(d_1, d_2, d_3)′ = (0.4, 0.3, 0.2)′,  (θ_{11,1}, θ_{22,1}, θ_{33,1})′ = (−0.7, −0.5, −0.3)′,  (21)

(z_{1,t}, z_{2,t}, z_{3,t})′ ∼ N(0, Σ),  Σ = [1 ρ ρ; ρ 1 ρ; ρ ρ 1],  ρ = 0.2, 0.5.  (22)
Given the specification in Equation (20), we adopt the Cholesky decomposition algorithm sug-
gested by McLeod and Hipel [21] and Hosking [22] to simulate three (T × 1) ARFIMA processes
zi,t , i = 1, 2, 3.
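A minimal version of this exact simulation scheme, in the spirit of the Cholesky approach of McLeod and Hipel [21] and Hosking [22], can be sketched in Python (function name ours; a dense O(T³) factorization is used here for clarity, whereas dedicated algorithms exploit the Toeplitz structure):

```python
import math
import numpy as np

def simulate_fi(d, T, seed=0):
    """Draw one exact Gaussian ARFIMA(0, d, 0) path of length T:
    build the Toeplitz autocovariance matrix from Equations (9)-(10),
    Cholesky-factor it, and multiply by i.i.d. N(0, 1) draws."""
    # gamma(0) = Gamma(1 - 2d) / Gamma(1 - d)^2, then the ratio recursion
    # gamma(h) / gamma(h - 1) = (h - 1 + d) / (h - d) from Equation (10)
    g = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for h in range(1, T):
        g.append(g[-1] * (h - 1 + d) / (h - d))
    Sigma = np.array([[g[abs(s - t)] for t in range(T)] for s in range(T)])
    rng = np.random.default_rng(seed)
    return np.linalg.cholesky(Sigma) @ rng.standard_normal(T)
```

Correlated components, as in Equation (22), can be obtained by applying the same idea with cross-correlated innovation draws.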
We estimate the parameter

ξ = κ(ξ̂),  (24)

where

κ(ξ̂) = ( d̂_1/(1 + 2|d̂_1|), d̂_2/(1 + 2|d̂_2|), d̂_3/(1 + 2|d̂_3|), (e^{θ̂_{11,1}} − 1)/(e^{θ̂_{11,1}} + 1), (e^{θ̂_{22,1}} − 1)/(e^{θ̂_{22,1}} + 1), (e^{θ̂_{33,1}} − 1)/(e^{θ̂_{33,1}} + 1) )′,  (25)

and

ξ̂ = (d̂_1, d̂_2, d̂_3, θ̂_{11,1}, θ̂_{22,1}, θ̂_{33,1})′  (26)

are the parameters really estimated with the proposed algorithm. To create a realistic simulation
scheme, we do not use the inverse of the preceding transformation function calculated
at the true parameter value of ξ as the initial value for the estimation procedure. Instead, we employ
the true value of ξ as the initial value for ξ̂. In other words, in the simulation we use

ξ̂_0 = (d_1, d_2, d_3, θ_{11,1}, θ_{22,1}, θ_{33,1})′.  (27)

The covariance matrix Σ is parameterized through the factorization

Σ = U′U.  (28)
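The transformation in Equations (24)–(25) keeps each d_i inside the stationary range (−1/2, 1/2) and each MA parameter inside (−1, 1) during unconstrained optimization. A Python sketch (function name ours):

```python
import math

def kappa(xi_hat):
    """Transformation (24)-(25): the three unconstrained d-hats are mapped
    into (-1/2, 1/2) by x / (1 + 2|x|), and the three unconstrained MA
    parameters into (-1, 1) by (e^x - 1) / (e^x + 1)."""
    d_part = [x / (1 + 2 * abs(x)) for x in xi_hat[:3]]
    ma_part = [(math.exp(x) - 1) / (math.exp(x) + 1) for x in xi_hat[3:]]
    return d_part + ma_part
```

Both maps are strictly increasing and bounded, so the optimizer can roam over all of R^6 while the implied model parameters stay in the admissible region.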
The parameters contained in U are the parameters really estimated with the proposed algorithm.
However, the starting values for each element of U are all set to be 1. Thus, the initial value of
U for starting the procedure is not set to be the true value of U . This is the second mechanism to
create a realistic simulation environment.
All the programs are written in GAUSS and evaluated at three sample sizes, T = 200, 300, 400.
The choice of these sample sizes strongly suggests that the computational burden of implementing
the CLDL algorithm is mild. Moreover, these sample sizes are frequently encountered in
usual macroeconomic time series.
Two hundred additional values are generated to obtain random starting values. The optimization
algorithm used to implement the CLDL algorithm is the quasi-Newton algorithm of Broyden,
Fletcher, Goldfarb, and Shanno (BFGS) contained in the GAUSS MAXLIK library. The maximum
number of iterations for each replication is 100. The first 250 replications of normal convergence
are recorded for the subsequent data analysis.
Define bias as the true parameter values minus the corresponding average estimated values.
The simulation results in Table 1 for ρ = 0.5 reveal that the bias performance from using the
unconditional maximum likelihood estimator is satisfactory and the associated root-mean-squared
error (RMSE) decreases with the increase of sample size for all configurations considered in the
experiment.
To give a clearer picture of the finite sample performance of the CLDL algorithm, we also
report the simulation results with box-plots in Figures 1–3 for the case ρ = 0.5. Figure 1 shows
the box-plots of the estimated d, while Figure 2 illustrates the box-plots of the estimated Θ. The
box-plots of the estimated Σ are displayed in Figure 3. All these figures demonstrate that the
CLDL algorithm performs well at larger sample sizes, as shown in Table 1,
Table 1. Bias and RMSE of the estimates for the three-dimensional VARFIMA(0, d, 1) process, ρ = 0.5.

                        d_1 = 0.4   d_2 = 0.3   d_3 = 0.2   θ_{11,1} = −0.7   θ_{22,1} = −0.5   θ_{33,1} = −0.3
CLDL
T = 200   Bias          0.1027      0.0157      −0.0537     −0.0915           −0.0069           0.0782
          RMSE          0.1447      0.0837      0.1066      0.1449            0.1024            0.1349
T = 300   Bias          0.0925      0.0072      −0.0614     −0.0865           −0.0011           0.0853
          RMSE          0.1264      0.0709      0.0961      0.1275            0.0819            0.1240
T = 400   Bias          0.0888      0.0058      −0.0659     −0.0859           −0.0004           0.0889
          RMSE          0.1171      0.0651      0.0913      0.1186            0.0737            0.1181
QMLE, m = [T^0.50]
T = 200   Bias          0.1947      0.0805      0.0466      NA                NA                NA
          RMSE          0.2630      0.1909      0.1862      NA                NA                NA
T = 300   Bias          0.1383      0.0592      0.0344      NA                NA                NA
          RMSE          0.2048      0.1672      0.1650      NA                NA                NA
T = 400   Bias          0.1120      0.0488      0.0194      NA                NA                NA
          RMSE          0.1765      0.1497      0.1357      NA                NA                NA
QMLE, m = [T^0.65]
T = 200   Bias          0.3541      0.1722      0.0826      NA                NA                NA
          RMSE          0.3700      0.2010      0.1379      NA                NA                NA
T = 300   Bias          0.2964      0.1407      0.0652      NA                NA                NA
          RMSE          0.3106      0.1663      0.1104      NA                NA                NA
T = 400   Bias          0.2676      0.1217      0.0492      NA                NA                NA
          RMSE          0.2803      0.1440      0.0893      NA                NA                NA

Notes: The results are all based on 250 replications. Details of the experimental designs are given in Equations (20)–(22). CLDL denotes the
conditional likelihood Durbin–Levinson algorithm proposed in this article. QMLE denotes the QMLE method considered in Lobato [23],
and m is the maximum number of Fourier frequencies λ_j = 2πj/T with j = 1, …, m used for implementing the QMLE. NA denotes not
available.
Figure 1. Box-plots of the estimated d from the three-dimensional VARFIMA(0, d, 1) model defined in Equa-
tions (20)–(22), and ρ = 0.5 based on the CLDL algorithm with 250 replications. The value f (g) denotes the model
specification where f = d, g denotes the sample size, such that g = A = 200, g = B = 300, and g = C = 400.
Figure 2. Box-plots of the estimated MA parameter from the three-dimensional VARFIMA(0, d, 1) model defined in
Equations (20)–(22), and ρ = 0.5 based on the CLDL algorithm with 250 replications. The value f (g) denotes the model
specification where f represents the value of MA parameter, and g denotes the sample size, such that g = A = 200,
g = B = 300, and g = C = 400.
Figure 3. Box-plots of the estimated from the three-dimensional VARFIMA(0, d, 1) model defined in Equa-
tions (20)–(22), and ρ = 0.5 based on the CLDL algorithm with 250 replications.
supporting the usefulness of the CLDL algorithm in estimating the VARFIMA processes displayed
in Equation (20).
The simulation results for the case ρ = 0.2 are qualitatively identical to those found in Table 1
and in Figures 1– 3, where ρ = 0.5. To save space, we do not report the results with ρ = 0.2.
However, these results are available upon request from the author.
We also compare the performance of the CLDL algorithm in estimating the model in
Equation (20) with that of the semiparametric QMLE in Equation (8) of Lobato [23]. The distinguishing feature of the QMLE is that it is designed to be robust to the presence of AR and
MA parameters. Thus, it can generate robust estimates of the fractional differencing parameters
in Equation (1) without worrying about the order of the AR and MA polynomials. However,
the implementation of Lobato’s [23] QMLE requires selecting a number of Fourier frequencies
λ_j = 2πj/T with j = 1, …, m for estimation. The choice of m is crucial to the performance of
the semiparametric QMLE. In this section we select the two values m = [T^{0.5}] and m = [T^{0.65}],
where [X] is the largest integer less than or equal to X.
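The bandwidth choice is a one-liner; for the sample sizes used here, m = [T^{0.50}] gives 14, 17, and 20 frequencies, while m = [T^{0.65}] gives 31, 40, and 49. A Python sketch (helper name ours):

```python
import math

def bandwidth(T, power):
    """m = [T^power], with [x] the integer part, giving the number of
    Fourier frequencies lambda_j = 2*pi*j/T, j = 1, ..., m, used by the
    semiparametric QMLE."""
    return math.floor(T ** power)

# the two bandwidth rules of this section at the three sample sizes
for T in (200, 300, 400):
    freqs = [2 * math.pi * j / T for j in range(1, bandwidth(T, 0.5) + 1)]
    assert freqs[-1] <= math.pi  # the frequencies stay near the origin
```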
We employ a sequential quadratic programming algorithm ‘sqpSolve’ contained in the GAUSS
library to implement the QMLE of Lobato [23]. Similar to the experimental design for the CLDL
algorithm, 200 additional values are generated in order to obtain random starting values. Again,
the first 250 replications of normal convergence are recorded for the subsequent data analysis as
we have done for the CLDL algorithm.
In contrast with the CLDL algorithm, where the maximum number of iterations for each
replication is 100, we do not impose a restriction on the maximum number of iterations when
conducting the QMLE. This is the first design choice that favours the implementation of the QMLE
relative to the CLDL algorithm. Furthermore, the true parameter values of d_i in Equation (20)
are set as the initial values for the estimation procedure. This is the second design choice that favours the
implementation of the QMLE relative to the CLDL algorithm. If the simulation results show that
the performance of the CLDL algorithm in estimating the fractional differencing parameters is
still better than that of the QMLE of Lobato [23] under the aforementioned experimental design,
then we can attribute this finding to the outstanding performance of the MLE over its semiparametric
counterpart under a correct model specification.
For ease of comparison, the simulation results of the QMLE are also displayed in Table 1 for
ρ = 0.5. Table 1 clearly shows that the performance of the QMLE improves when m = [T^{0.5}] is used
as compared with the other choice m = [T^{0.65}]. This finding is reasonable, because the design
of the QMLE is motivated by examining the spectral density function close to the zero frequency
of a long memory process. This table also reveals that, for all three differencing parameters
considered in the experiment, the performance of the CLDL algorithm is much better than that of
the QMLE even when m = [T^{0.5}] is chosen, because the RMSE of the CLDL algorithm is smaller than
that of the QMLE for each parameter. This is as expected in that the CLDL algorithm is a conditional
Figure 4. Box-plots of the estimated d from the three-dimensional VARFIMA(0, d, 1) model defined in Equa-
tions (20)–(22) with 250 replications and ρ = 0.5. The value f (g, QMLE) denotes the estimation results when the
estimator is the QMLE of Lobato [23] and m = [T 0.50 ], f denotes the true value of d = f , and g denotes the sample
size, such that g = A = 200, g = B = 300, g = C = 400. Likewise, the value f (g, CLDL) denotes the estimation results
when the estimator is the proposed CLDL algorithm, f denotes the true value of d = f , and g denotes the sample size,
such that g = A = 200, g = B = 300, g = C = 400.
Figure 5. Box-plots of the estimated d from the 2-dimensional VARFIMA(1, d, 1) model defined in Equations (29) and
(30), and ρ = 0.5 based on the CLDL algorithm with 250 replications. The value f (g) denotes the model specification
where f = d, and g denotes the sample size, such that g = A = 200, g = B = 300, and g = C = 400.
Figure 6. Box-plots of the estimated MA parameter from the 2-dimensional VARFIMA(1, d, 1) model defined in
Equations (29) and (30), and ρ = 0.5 based on the CLDL algorithm with 250 replications. The value f (g) denotes the
model specification where f represents the value of MA parameter, and g denotes the sample size, such that g = A = 200,
g = B = 300, and g = C = 400.
maximum likelihood estimator that should be more efficient than its semiparametric counterpart
under correct model specification.
The simulation results from the CLDL and QMLE are also graphed with box-plots for clarity
of exposition. Figure 4 displays the results for the case ρ = 0.5. To save space, we only present
the simulations concerning the parameter d1 = 0.4. Note that the results from QMLE are based
on the choice m = [T 0.5 ] for the reason outlined previously. It is clear from this figure that
the performance of the CLDL algorithm is much better than that of the QMLE. Moreover, the
preceding findings remain intact when ρ = 0.2. Again, the results with ρ = 0.2 are available upon
request from the author.
4.2. The two-dimensional VARFIMA(1, d, 1) process

This subsection considers the impacts of AR parameters on the performance of the CLDL algorithm when the off-diagonal elements of Φ(B) in Equation (5) are not equal to zero. To reduce the
estimation burden, we focus on the following two-dimensional VARFIMA(1, d, 1) model:
diag(∇^{d_1}, ∇^{d_2}) Φ(B) (y_{1,t}, y_{2,t})′ = Z̃_t,  Φ(B) = [1 − φ_{11,1}B, −φ_{12,1}B; −φ_{21,1}B, 1 − φ_{22,1}B],  (29)
Figure 7. Box-plots of the estimated AR parameters from the two-dimensional VARFIMA(1, d, 1) model defined in
Equations (29) and (30), and ρ = 0.5 based on the CLDL algorithm with 250 replications. The top-left panel depicts the
estimates for φ_{11,1}, the top-right panel depicts the estimates for φ_{12,1}, the lower-left panel depicts the estimates for φ_{21,1},
and the lower-right panel depicts the estimates for φ_{22,1}.
where

Z̃_t = diag(1 + θ_{11,1}B, 1 + θ_{22,1}B) (z_{1,t}, z_{2,t})′.  (30)
Note that AR parameters are included in Equation (29), in contrast with the model in
Equation (20). The presence of non-zero off-diagonal elements in Φ(B) indicates that the conditions imposed in Assumption 1 are no longer satisfied for the model defined in Equations (29)
and (30), which is a special case of the model in Equation (5). Thus, the order of integration of
y_{i,t} is no longer equal to d_i. This implies that the fractional differencing parameters d_1 and d_2
in Equation (29) cannot be consistently estimated with the QMLE of Lobato [23], because it is
the linear combination of y_{i,t}, y_{i,t−1}, and y_{j,t−1} with j ≠ i that is a fractionally integrated process of
order d_i. However, the CLDL algorithm can still deal with the model defined in Equations (29)
and (30) easily.
The parameters considered for the simulations are
The principle of setting initial values for the CLDL algorithm and for the QMLE of Lobato [23]
is identical to that outlined in Section 4.1. In addition, the ways of choosing initial
Figure 8. Box-plots of the estimated d from the 2-dimensional VARFIMA(1, d, 1) model defined in Equations (29)
and (30) with 250 replications and ρ = 0.5. The value f (g, QMLE) denotes the estimation results when the estimator
is the QMLE of Lobato [23] and m = [T 0.50 ], f denotes the true value of d = f , and g denotes the sample size, such
that g = A = 200, g = B = 300, g = C = 400. Likewise, the value f (g, CLDL) denotes the estimation results when the
estimator is the proposed CLDL algorithm, f denotes the true value of d = f , and g denotes the sample size, such that
g = A = 200, g = B = 300, g = C = 400.
values for the AR parameters are identical to those of choosing initial values for the MA parameters
outlined in Section 4.1 for the CLDL algorithm.
Because the simulation results for the case ρ = 0.2 are qualitatively identical to those from the
case ρ = 0.5 in this subsection, we only discuss the results with ρ = 0.5 to shorten the length of the
article. The simulations are graphed with box-plots in Figures 5–7. Figure 5 exhibits the box-plots
of the estimated d, Figure 6 demonstrates the box-plots of the estimated Θ, and Figure 7 presents
the box-plots of the estimated Φ. To save space, we do not present the box-plots of the estimated
Σ in this article, but point out that they are similar to those shown in Figure 3. All these figures
demonstrate that the performance of the CLDL algorithm improves with increasing sample
size and that the bias from using the CLDL algorithm is very small, as we find in the studies concerning the
three-dimensional VARFIMA(0, d, 1) process in Section 4.1. These simulation results explicitly
support the usefulness of the CLDL algorithm in estimating the VARFIMA(1, d, 1) processes
displayed in Equations (29) and (30).
We also present a comparison between the QMLE and the CLDL algorithm in estimating the
fractional differencing parameter under the model in Equations (29) and (30). Before discussing
the findings, we emphasize here that the order of integration of y_{1,t} in Equation (29) is not 0.4
because of the presence of a non-zero off-diagonal element in Φ(B) of Equation (29).
Thus, the estimates for d_1 from the QMLE of course will not be close to the true parameter
value 0.4. Indeed, in Figure 8 we find that the estimates of d_1 based on Lobato’s [23] QMLE are far
away from the true parameter value d_1 = 0.4. On the contrary, the conditional MLE based on
the CLDL algorithm provides a promising estimate of d_1, indicating the excellent ability of the
proposed CLDL algorithm in estimating all the AR, MA, and fractional differencing parameters
under the model in Equation (5).
5. Conclusion
using the unconstrained VARFIMA model. Essentially, the development of the CLDL algorithm
greatly relaxes the restrictions imposed on the contemporaneous ARFIMA model of Haslett and
Raftery [9] and sheds more light on the modelling of long memory space–time data.
Acknowledgements
I would like to thank an anonymous referee and an Associate Editor for their valuable comments and suggestions. I
also thank Jia-Ci Lin and Peng-Hsuan Ke for their excellent research assistance.

*Corresponding author. The Institute of Economics, Academia Sinica, Taipei, Taiwan, R.O.C. Tel.: (886-2) 2782-2791 ext. 296. Fax: (886-2) 2785-3946. E-mail: [email protected]
References
[1] C.W.J. Granger, Long memory relationships and the aggregation of dynamic models, J. Econom. 14 (1980),
pp. 227–238.
[2] C.W.J. Granger and R. Joyeux, An introduction to long-memory time series models and fractional differencing,
J. Time Ser. Anal. 1 (1980), pp. 15–29.
[3] J.R.M. Hosking, Fractional differencing, Biometrika 68 (1981), pp. 165–176.
[4] J. Beran, Statistics for Long-Memory Processes, Chapman and Hall, New York, 1994.
[5] A. Luceño, A fast likelihood approximation for vector general linear processes with long series: Application to
fractional differencing, Biometrika 83 (1996), pp. 603–614.
[6] I.N. Lobato, Consistency of the averaged cross-periodogram in long memory series, J. Time Ser. Anal. 18 (1997),
pp. 137–155.
[7] F. Sowell, Maximum likelihood estimation of fractionally integrated time series models, Working paper, Carnegie-
Mellon University, 1989.
[8] F. Sowell, Maximum likelihood estimation of stationary univariate fractionally integrated time series models, J.
Econom. 53 (1992), pp. 165–188.
[9] J. Haslett and A.E. Raftery, Space–time modelling with long-memory dependence: Assessing Ireland’s wind power
resource, Appl. Statist. 38 (1989), pp. 1–50.
[10] T.G. Andersen, T. Bollerslev, F.X. Diebold, and P. Labys, Modeling and forecasting realized volatility, Econometrica
71 (2003), pp. 579–625.
[11] P. Whittle, On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral
density matrix, Biometrika 50 (1963), pp. 129–134.
[12] G.E.P. Box and G.M. Jenkins, Time Series Analysis, Forecasting, and Control, 2nd ed., Holden-Day, San Francisco,
1976.
[13] J.A. Deriche and A.H. Tewfik, Maximum likelihood estimation of the parameters of discrete fractionally differenced
Gaussian noise process, IEEE Trans. Signal Process. 41 (1993), pp. 2977–2989.
[14] W.J. Tsay and W.K. Härdle, A generalized ARFIMA process with Markov-switching fractional differencing
parameter, J. Statist. Comput. Simulation (2008), iFirst doi: 10.1080/00949650801910289.
[15] J.A. Doornik and M. Ooms, Computational aspects of maximum likelihood estimation of autoregressive fractionally
integrated moving average models, Comput. Statist. Data Anal. 42 (2003), pp. 333–348.
[16] M. Dueker and R. Startz, Maximum-likelihood estimation of fractional cointegration with an application to U.S.
and Canadian bond rates, Rev. Econom. Statist. 80 (1998), pp. 420–426.
[17] M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1970.
[18] P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, 2nd ed., Springer-Verlag, New York, 1991.
[19] W.K. Li and A.I. McLeod, Fractional time series modelling, Biometrika 73 (1986), pp. 217–221.
[20] M. Dueker and A. Serletis, Do real exchange rates have autoregressive unit roots? A test under the alternative of
long memory and breaks, Working Paper 2000-016A, Federal Reserve Bank of St. Louis, 2000.
[21] A.I. McLeod and K.W. Hipel, Preservation of the rescaled adjusted range 1: A reassessment of the Hurst phenomenon,
Water Resources Res. 14 (1978), pp. 491–508.
[22] J.R.M. Hosking, Modeling persistence in hydrological time series using fractional differencing, Water Resources
Res. 20 (1984), pp. 1898–1908.
[23] I.N. Lobato, A semiparametric two-step estimator in a multivariate long memory model, J. Econom. 90 (1999),
pp. 129–153.