Markov Chains 1

1. The authors develop a new test for the Markov property in time series using the conditional characteristic function in a frequency domain approach. This allows them to check implications of the Markov property across conditional moments and lags. 2. Simulation studies show the test has reasonable size and power against non-Markov alternatives in finite samples when using a smoothed nonparametric bootstrap. 3. The authors apply the test to several financial time series and find some evidence against the Markov property.

Econometric Theory, 28, 2012, 130–178.

doi:10.1017/S0266466611000065

TESTING FOR THE MARKOV PROPERTY IN TIME SERIES

BIN CHEN
University of Rochester
YONGMIAO HONG
Cornell University and Xiamen University

The Markov property is a fundamental property in time series analysis and is of-
ten assumed in economic and financial modeling. We develop a new test for the
Markov property using the conditional characteristic function embedded in a fre-
quency domain approach, which checks the implication of the Markov property in
every conditional moment (if it exists) and over many lags. The proposed test is ap-
plicable to both univariate and multivariate time series with discrete or continuous
distributions. Simulation studies show that with the use of a smoothed nonparametric
transition density-based bootstrap procedure, the proposed test has reasonable sizes
and all-around power against several popular non-Markov alternatives in finite sam-
ples. We apply the test to a number of financial time series and find some evidence
against the Markov property.

1. INTRODUCTION
The Markov property is a fundamental property in time series analysis and is often
a maintained assumption in economic and financial modeling. Testing for the va-
lidity of the Markov property has important implications in economics, finance,
as well as time series analysis. In economics, for example, Markov decision pro-
cesses (MDP), which are based on the Markov assumption, provide a general
framework for modeling sequential decision making under uncertainty (see Rust,
1994, and Ljungqvist and Sargent, 2000, for excellent surveys) and have been
extensively used in economics, finance, and marketing. Applications of MDP in-
clude investment under uncertainty (Lucas and Prescott, 1971; Sargent, 1987),
asset pricing (Lucas, 1978; Hall, 1978; Mehra and Prescott, 1985), economic
growth (Uzawa, 1965; Romer, 1986, 1990; Lucas, 1988), optimal taxation (Lucas
and Stokey, 1983; Zhu, 1992), industrial organization (Ericson and Pakes, 1995;
Weintraub, Benkard, and Van Roy, 2008), and equilibrium business cycles
(Kydland and Prescott, 1982). In the MDP framework, an optimal decision rule
can be found within the subclass of nonrandomized Markovian strategies, where a
strategy depends on the past history of the process only via the current state.
Obviously, a decision rule derived in this way may be suboptimal if the foundational
assumption of the Markov property is violated. Recently non-Markov decision processes
(NMDP) have attracted increasing attention in the literature (e.g., Mizutani and
Dreyfus, 2004; Aviv and Pazgal, 2005). The non-Markov nature can arise in many
ways. The most direct extension of MDP to NMDP is to deprive the decision
maker of perfect information on the state of the environment.

We thank Pentti Saikkonen (the co-editor), three referees, Frank Diebold, Oliver Linton, James MacKinnon, Katsumi
Shimotsu, Kyungchul Song, Liangjun Su, George Tauchen, and seminar participants at Peking University, Queen’s
University, University of Pennsylvania, the 2008 Xiamen University-Humboldt University Joint Workshop, the 2008
International Symposium on Recent Developments of Time Series Econometrics in Xiamen, the 2008 Symposium
on Econometric Theory and Applications (SETA) in Seoul, and the 2008 Far Eastern Econometric Society Meeting
in Singapore for their constructive comments on the previous versions of this paper. Any remaining errors are solely
ours. Bin Chen thanks the Department of Economics, University of Rochester, for financial support. Yongmiao
Hong thanks the outstanding overseas youth fund of the National Science Foundation of China for its support.
Address correspondence to Bin Chen, Department of Economics, University of Rochester, Rochester, NY 14620,
USA; e-mail: [email protected].

© Cambridge University Press 2011

In finance the Markov property is one of the most popular assumptions in
most continuous-time modeling. It is well known that stochastic integrals yield
Markov processes. In modeling interest rate term structure, such popular models
as Vasicek (1977), Cox, Ingersoll, and Ross (1985), affine term structure models
(Duffie and Kan, 1996; Dai and Singleton, 2000), quadratic term structure mod-
els (Ahn, Dittmar, and Gallant, 2002), and affine jump diffusion models (Duffie,
Pan, and Singleton, 2000) are all Markov processes. They are widely used in
pricing and hedging fixed-income or equity derivatives, managing financial risk,
and evaluating monetary policy and debt policy. If interest rate processes are not
Markov, alternative non-Markov models, such as Heath, Jarrow, and Morton’s
(1992) model may provide a better characterization of interest rate dynamics.
In a discrete-time framework, Duan and Jacobs (2008) find that deviations from
the Markovian structure significantly improve the empirical performance of the
model for the short-term interest rate. In general, if a process is obtained by dis-
cretely sampling a subset of the state variables of a continuous-time process that
evolves according to a system of nonlinear stochastic differential equations, it is
non-Markov. A leading example is the class of stochastic volatility models (e.g.,
Andersen and Lund, 1997; Gallant, Hsieh, and Tauchen, 1997).
In the market microstructure literature, one important issue is the price for-
mation mechanism, which determines whether security prices follow a Markov
process. Easley and O’Hara (1987) develop a structural model of the effect of
asymmetric information on the price-trade size relationship. They show that trade
size introduces an adverse selection problem to security trading because informed
traders, given their wish to trade, prefer to trade larger amounts at any given
price. Hence market makers’ pricing strategies will also depend on trade size,
and the entire sequence of past trades is informative of the likelihood of an in-
formation event and thus price evolution. Consequently, prices typically will not
follow a Markov process. Easley and O’Hara (1992) further consider a variant
of Easley and O’Hara’s (1987) model and delineate the link between the exis-
tence of information, the timing of trades, and the stochastic process of security
prices. They show that while trade signals the direction of any new information,
the lack of trade signals the existence of any new information. The latter effect
can be viewed as event uncertainty and suggests that the interval between trades
may be informative and hence time per se is not exogenous to the price process.
One implication of this model is that either quotes or prices combined with inven-
tory, volume, and clock time are Markov processes. Therefore, rather than using
the non-Markov price series alone, it would be preferable to estimate the price
process consisting of no trade outcomes, buys, and sells. On the other hand, other
models also explain market behavior but reach opposite conclusions on the prop-
erty of prices. For example, Platen and Rebolledo (1996) and Amaro de Matos
and Rosario (2000) propose equilibrium models, which assume that market mak-
ers can take advantage of their superior information on trade orders and set differ-
ent prices. The presence of market makers prevents the direct interaction between
demand and supply sides. By specifying the supply and demand processes, these
market makers obtain the equilibrium prices, which may be Markov. By testing
the Markov property, one can check which models reflect reality more closely.
Our interest in testing the Markov property is also motivated by its wide ap-
plications among practitioners. For example, technical analysis has been used
widely in financial markets for decades (see, e.g., Edwards and Magee, 1966;
Blume, Easley, and O’Hara, 1994; LeBaron, 1999). One important category is
price-based technical strategies, which refer to forecasts based on past prices,
often via moving-average rules. However, if the history of prices does not provide
additional information, in the sense that the current prices already impound all
information, then price-based technical strategies would not be effective. In other
words, if prices adjust immediately to information, past prices would be redun-
dant and current prices are the sufficient statistics for forecasting future prices.
This actually corresponds to a fundamental issue: namely, whether prices follow
a Markov process.
Finally, in risk management, financial institutions are required to rate assets
by their default probability and by their expected loss severity given a default.
For this purpose, historical information on the transition of credit exposures is
used to estimate various models that describe the probabilistic evolution of credit
quality. The simple time-homogeneous Markov model is one of the most popular
models (e.g., Jarrow and Turnbull, 1995; Jarrow, Lando, and Turnbull, 1997),
specifying the stochastic processes completely by transition probabilities. Under
this model, a detailed history of individual assets is not needed. However, whether
the Markov specification adequately describes credit rating transitions over time
has substantial impact on the effectiveness of credit risk management. In empirical
studies, Kavvathas (2001) and Lando and Skødeberg (2002) document strong
non-Markov behaviors such as dependence on previous rating and waiting-time
effects in rating transitions. In contrast, Bangia, Diebold, Kronimus, Schagen, and
Schuermann (2002) and Kiefer and Larson (2004) find that first-order Markov
ratings dynamics provide a reasonable practical approximation.
Despite innumerable studies rooted in Markov processes, there are few existing
tests for the Markov property in the literature. Ait-Sahalia (1997) first proposes
a test for whether the interest rate process is Markov by checking the validity
of the Chapman-Kolmogorov equation, where the transition density is estimated
nonparametrically. The Chapman-Kolmogorov equation is an important
characterization of Markov processes and can detect many non-Markov processes with
practical importance, but it is only a necessary condition of the Markov property.
Feller (1959), Rosenblatt (1960), and Rosenblatt and Slepian (1962) provide
examples of stochastic processes that are not Markov but whose first-order tran-
sition probabilities nevertheless satisfy the Chapman-Kolmogorov equation. Ait-
Sahalia’s (1997) test will miss these non-Markov processes.
Amaro de Matos and Fernandes (2007) test whether discretely recorded obser-
vations of a continuous-time process are consistent with the Markov property via
a smoothed nonparametric density approach. They test the conditional independence
of the underlying data generating process (DGP).¹ Because only a fixed
lag order in the past information set is checked, the test may easily overlook
the violation of conditional independence from higher-order lags. Moreover, the
test involves a relatively high-dimensional smoothed nonparametric joint density
estimation (see more discussion below).
In this paper we provide a conditional characteristic function (CCF) charac-
terization for the Markov property and use it to construct a nonparametric test
for the Markov property. The characteristic function has been widely used in
time series analysis and econometrics (e.g., Feuerverger and McDunnough, 1981;
Epps and Pulley, 1983; Hong, 1999; Singleton, 2001; Jiang and Knight, 2002;
Chacko and Viceira, 2003; and Su and White, 2007). The basic idea of the CCF-
characterization for the Markov property is that when and only when a stochas-
tic process is Markov, a generalized residual process associated with the CCF is
a martingale difference sequence (MDS). This characterization has never been
used in testing the Markov property. We use a nonparametric regression method
to estimate the CCF and use a spectral approach to check whether the generalized
residuals are explainable by the entire history of the underlying processes. Our
approach has several attractive features.
First, we use a novel generalized cross-spectral approach, which embeds the
CCF in a spectral framework, thus enjoying the appealing features of spectral
analysis. In particular, our approach can examine a growing number of lags as the
sample size increases without suffering from the notorious “curse of dimensional-
ity” problem. This improves upon the existing tests, which can only check a fixed
number of lags.
Second, as the Fourier transform of the transition density, the CCF can also
capture the full dynamics of the underlying process, but it involves a lower di-
mensional smoothed nonparametric regression than the nonparametric density
approaches in the literature.
Third, because we impose regularity conditions directly on the CCF of a dis-
cretely observed random sample, our test is applicable to discrete-time processes
and continuous-time processes with discretely observed data. It is also applicable
to both univariate and multivariate time series processes. Due to the nonparametric
nature of the test, it does not need any parametric specification of the underlying
process and thus avoids the misspecification problems.
In Section 2 we describe the hypotheses of interest and propose a novel
approach to testing the Markov property. We derive the asymptotic distribution
of the proposed test statistic in Section 3, and we discuss its asymptotic power
property in Section 4. In Section 5 we use Horowitz’s (2003) smoothed nonpara-
metric transition-based bootstrap procedure to obtain the finite sample critical
values of the test and examine the finite sample performance of the test in com-
parison with some existing popular tests. We also apply our test to stock prices,
interest rates, and foreign exchange rates and document some evidence against
the Markov property with all three financial time series. Section 6 concludes. All
mathematical proofs are collected in the Appendix. A Gauss code to implement
our test is available from the authors upon request. Throughout the paper we use
$C$ to denote a generic bounded constant, $\|\cdot\|$ for the Euclidean norm, and $A^*$ for
the complex conjugate of $A$.

2. HYPOTHESES OF INTEREST AND TEST STATISTICS


Suppose $\{X_t\}$ is a strictly stationary $d$-dimensional time series process, where $d$
is a positive integer. It follows a Markov process if the conditional probability
distribution of $X_{t+1}$ given the information set $I_t = \{X_t, X_{t-1}, \ldots\}$ is the same
as the conditional probability distribution of $X_{t+1}$ given $X_t$ only. This can be
formally expressed as

$$H_0 : P(X_{t+1} \le x \mid I_t) = P(X_{t+1} \le x \mid X_t) \ \text{almost surely (a.s.) for all } x \in \mathbb{R}^d \text{ and all } t \ge 1. \tag{2.1}$$

Under $H_0$, the past information set $I_{t-1}$ is redundant in the sense that the current
state variable or vector $X_t$ contains all information about the future behavior
of the process that is contained in the current information set $I_t$. Alternatively,
when

$$H_A : P(X_{t+1} \le x \mid I_t) \ne P(X_{t+1} \le x \mid X_t) \ \text{for some } t \ge 1, \tag{2.2}$$

then $X_t$ is not a Markov process.²


Ait-Sahalia (1997) proposes a nonparametric kernel-based test for $H_0$ by checking
the Chapman-Kolmogorov equation,

$$g(X_{t+1} \mid X_{t-1}) = \int_{\mathbb{R}^d} g(X_{t+1} \mid X_t = x)\, g(X_t = x \mid X_{t-1})\, dx \quad \text{for all } t \ge 1,$$

where $g(\cdot \mid \cdot)$ is the conditional probability density function estimated by the
smoothed nonparametric kernel method. The Chapman-Kolmogorov equation is
an important characterization of the Markov property and can detect many
non-Markov processes with practical importance. However, there exist non-Markov
processes whose first-order transition probabilities satisfy the
Chapman-Kolmogorov equation (Feller, 1959; Rosenblatt, 1960; Rosenblatt and Slepian,
1962). Ait-Sahalia’s (1997) test is expected to miss these processes.
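
To make the Chapman-Kolmogorov restriction concrete, the following minimal sketch checks its discrete-state analogue, in which the integral over the intermediate state becomes a matrix product. It is an illustration only and not Ait-Sahalia's kernel-based test: the three-state transition matrix `P`, the sample size, and the helper names `simulate_chain` and `transition_matrix` are all assumptions introduced here.

```python
import numpy as np

def simulate_chain(P, n_steps, rng):
    """Simulate a finite-state Markov chain with transition matrix P."""
    n_states = P.shape[0]
    cdf = np.cumsum(P, axis=1)
    states = np.empty(n_steps, dtype=int)
    states[0] = 0
    uniforms = rng.random(n_steps)
    for t in range(1, n_steps):
        # Inverse-CDF draw of the next state given the current one;
        # the clamp guards against rounding in the cumulative sums.
        nxt = np.searchsorted(cdf[states[t - 1]], uniforms[t])
        states[t] = min(int(nxt), n_states - 1)
    return states

def transition_matrix(states, lag, n_states):
    """Empirical lag-step transition frequencies P(X_{t+lag} = b | X_t = a)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-lag], states[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])   # hypothetical transition matrix
x = simulate_chain(P, 100_000, rng)
P1 = transition_matrix(x, 1, 3)   # one-step transitions
P2 = transition_matrix(x, 2, 3)   # two-step transitions
# Chapman-Kolmogorov: two-step transitions equal the product of one-step ones.
ck_gap = np.max(np.abs(P2 - P1 @ P1))
print(ck_gap)
```

For a chain that is truly Markov the gap should be of the order of sampling error. A small gap does not establish the Markov property, since the equation is only a necessary condition.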
Amaro de Matos and Fernandes (2007) propose a nonparametric kernel-based
test for $H_0$ by checking the conditional independence between $X_{t+1}$ and $X_{t-j}$
given $X_t$, namely,

$$g(X_{t+1} \mid X_t) = g(X_{t+1} \mid X_t, X_{t-j}) \quad \text{for all } t, j \ge 1,$$

which is implied by $H_0$. By choosing $j = 1$, Amaro de Matos and Fernandes
check

$$g(X_{t+1}, X_t, X_{t-1}) = g(X_{t+1} \mid X_t)\, g(X_t, X_{t-1}) \quad \text{for all } t \ge 1,$$

in their simulation and empirical studies. This approach requires a $3d$-dimensional
smoothed nonparametric joint density estimation for $g(X_{t+1}, X_t, X_{t-1})$.
Both of the existing tests essentially check the conditional independence restriction

$$g(X_{t+1} \mid X_t, X_{t-1}) = g(X_{t+1} \mid X_t) \quad \text{for all } t \ge 1,$$

which is implied by $H_0$ in (2.1), but the converse is not true. The most important
feature of $H_0$ is the necessity of checking the entire currently available
information $I_t$. Inevitably there will be information loss if only one lag order is
considered. For example, the existing tests may overlook departures from the
Markov property at higher-order lags, say, $X_{t-2}$. Moreover, their tests may
suffer from the curse of dimensionality problem when the dimension $d$ is relatively
large, because the nonparametric density estimators $\hat{g}(X_{t+1} \mid X_t, X_{t-1})$ and
$\hat{g}(X_{t+1} \mid X_t)$ involve $3d$- and $2d$-dimensional smoothing, respectively.
We now develop a new test for $H_0$ using the CCF. As the Fourier transform of
the conditional probability density, the CCF can also capture the full dynamics
of $X_{t+1}$. Let $\varphi(u \mid X_t)$ be the CCF of $X_{t+1}$ conditioning on its current state $X_t$;
that is,

$$\varphi(u \mid X_t) = \int_{\mathbb{R}^d} e^{iu'x} g(x \mid X_t)\, dx, \quad u \in \mathbb{R}^d, \ i = \sqrt{-1}. \tag{2.3}$$

Let $\varphi(u \mid I_t)$ be the CCF of $X_{t+1}$ conditioning on the currently available
information $I_t$, that is,

$$\varphi(u \mid I_t) = \int_{\mathbb{R}^d} e^{iu'x} g(x \mid I_t)\, dx, \quad u \in \mathbb{R}^d, \ i = \sqrt{-1}.$$

Given the equivalence between the conditional probability density and the CCF,
the hypotheses of interest $H_0$ in (2.1) versus $H_A$ in (2.2) can be written as

$$H_0 : \varphi(u \mid X_t) = \varphi(u \mid I_t) \ \text{a.s. for all } u \in \mathbb{R}^d \text{ and all } t \ge 1, \tag{2.4}$$

versus the alternative hypothesis

$$H_A : \varphi(u \mid X_t) \ne \varphi(u \mid I_t) \ \text{for some } t \ge 1. \tag{2.5}$$


There exist other characterizations of the Markov property. For example,
Darsow, Nguyen, and Olsen (1992) and Ibragimov (2007) provide copula-based
characterizations of Markov processes. The CCF-based characterization is
intuitively appealing and offers much flexibility. To gain insight into this approach,
we define a complex-valued process

$$Z_{t+1}(u) = e^{iu'X_{t+1}} - \varphi(u \mid X_t), \quad u \in \mathbb{R}^d.$$

Then the Markov property is equivalent to the MDS characterization

$$E[Z_{t+1}(u) \mid I_t] = 0 \quad \text{for all } u \in \mathbb{R}^d \text{ and } t \ge 1. \tag{2.6}$$

The process $\{Z_t(u)\}$ may be viewed as a residual of the nonparametric regression

$$e^{iu'X_{t+1}} = E(e^{iu'X_{t+1}} \mid X_t) + Z_{t+1}(u) = \varphi(u \mid X_t) + Z_{t+1}(u).$$

The MDS characterization in (2.6) has implications on all conditional moments
of $X_t$ when the latter exist. To see this, we consider a Taylor series expansion of
(2.6), for the case of $d = 1$,³ around the origin of $u$:

$$E[Z_{t+1}(u) \mid I_t] = \sum_{m=0}^{\infty} \frac{(iu)^m}{m!} \left\{ E(X_{t+1}^m \mid I_t) - E(X_{t+1}^m \mid X_t) \right\} = 0 \quad \text{for } t \ge 1 \text{ and all } u \text{ near } 0. \tag{2.7}$$

Thus, checking (2.6) is equivalent to checking whether all conditional moments
of $X_{t+1}$ (if they exist) are Markov. Nevertheless, the use of (2.6) itself does not
require any moment conditions on $X_{t+1}$.
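
As a worked illustration (our own example, not taken from the paper), suppose $d = 1$ and the data follow a Gaussian AR(1) process, $X_{t+1} = aX_t + \varepsilon_{t+1}$ with $\varepsilon_{t+1}$ i.i.d. $N(0, \sigma^2)$ and independent of the past; the symbols $a$ and $\sigma^2$ are assumptions introduced here. The process is Markov, and its CCF has the closed form

$$\varphi(u \mid I_t) = \varphi(u \mid X_t) = \exp\!\left(iuaX_t - \tfrac{1}{2}\sigma^2 u^2\right) = 1 + (iu)\, aX_t + \frac{(iu)^2}{2!}\left(a^2X_t^2 + \sigma^2\right) + O(|u|^3).$$

The coefficients on $(iu)^m/m!$ are the conditional moments: $E(X_{t+1} \mid I_t) = aX_t$ and $E(X_{t+1}^2 \mid I_t) = a^2X_t^2 + \sigma^2$. Each depends on $I_t$ only through $X_t$, so every term of the expansion of $E[Z_{t+1}(u) \mid I_t]$ vanishes, as the MDS characterization requires.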
It is not a trivial task to check (2.6). First, the MDS property in (2.6) must hold
for all $u \in \mathbb{R}^d$, not just a finite number of grid points of $u$. This is an example
of the so-called nuisance parameter problem encountered in the literature (e.g.,
Davies, 1977, 1987; Hansen, 1996). Second, the generalized residual process
$Z_{t+1}(u)$ is unknown because the CCF $\varphi(u \mid X_t)$ is unknown, and it has to be
estimated nonparametrically to be free of any potential model misspecification. Third,
the conditioning information set $I_t$ in (2.6) has an infinite dimension as $t \to \infty$,
so there is a curse of dimensionality difficulty associated with testing the Markov
property. Finally, $\{Z_t(u)\}$ may display serial dependence in its higher-order
conditional moments. Any test for (2.6) should be robust to time-varying conditional
heteroskedasticity and higher-order moments of unknown form in $\{Z_t(u)\}$.
To check the MDS property of $\{Z_t(u)\}$, we extend Hong’s (1999) univariate
generalized spectrum to a multivariate generalized cross-spectrum.⁴ Just as
the conventional spectral density is a basic analytic tool for linear time series,
the generalized spectrum, which embeds the characteristic function in a spectral
framework, is an analytic tool for nonlinear time series. It can capture nonlinear
dynamics while maintaining the nice features of spectral analysis, particularly its
ability to accommodate information from all lags. In the present context
it can check departures from the Markov property over many lags in a pairwise
manner, avoiding the curse of dimensionality difficulty. This is not achievable
by the existing tests in the literature, which only check a fixed lag order.
Define the generalized covariance function

$$\Gamma_j(u, v) = \operatorname{cov}\!\left[Z_t(u),\, e^{iv'X_{t-|j|}}\right], \quad u, v \in \mathbb{R}^d. \tag{2.8}$$

Given that the conventional spectral density is defined as the Fourier transform
of the autocovariance function, we can define a generalized cross-spectrum

$$F(\omega, u, v) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \Gamma_j(u, v)\, e^{-ij\omega}, \quad \omega \in [-\pi, \pi], \ u, v \in \mathbb{R}^d, \tag{2.9}$$

which is the Fourier transform of the generalized covariance function $\Gamma_j(u, v)$,
where $\omega$ is a frequency. This function contains the same information as $\Gamma_j(u, v)$.
No moment conditions on $\{X_t\}$ are required. This is particularly appealing for
economic and financial time series. It has been argued that higher moments of
financial time series may not exist (e.g., Pagan and Schwert, 1990; Loretan and
Phillips, 1994). Moreover, the generalized cross-spectrum can capture cyclical
patterns caused by linear and nonlinear cross-dependence, such as volatility
clustering and tail clustering of the distribution.
Under $H_0$ we have $\Gamma_j(u, v) = 0$ for all $u, v \in \mathbb{R}^d$ and all $j \ne 0$. Consequently,
the generalized cross-spectrum $F(\omega, u, v)$ becomes a “flat” spectrum as a
function of frequency $\omega$:

$$F(\omega, u, v) = F_0(\omega, u, v) \equiv \frac{1}{2\pi} \Gamma_0(u, v), \quad \omega \in [-\pi, \pi], \ u, v \in \mathbb{R}^d. \tag{2.10}$$

Thus, we can test $H_0$ by checking whether a consistent estimator for $F(\omega, u, v)$ is
flat with respect to frequency $\omega$. Any significant deviation from a flat generalized
cross-spectrum is evidence of the violation of the Markov property.
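
A small simulation can illustrate why the generalized covariance vanishes under the Markov property but not in general. The sketch below is our own illustration under strong simplifying assumptions: $d = 1$, and the CCF is taken from the known Gaussian transition rather than estimated nonparametrically as in the paper. It compares a Gaussian AR(1) process (Markov) with a Gaussian MA(1) process (not Markov) at lag $j = 2$. The function names `residuals` and `gen_cov` and all parameter values are hypothetical.

```python
import numpy as np

def residuals(x, u, slope, cond_var):
    """Z_t(u) = exp(i u X_t) - phi(u | X_{t-1}) for a Gaussian transition
    X_t | X_{t-1} ~ N(slope * X_{t-1}, cond_var). Entry k pairs with x[k + 1]."""
    ccf = np.exp(1j * u * slope * x[:-1] - 0.5 * cond_var * u**2)
    return np.exp(1j * u * x[1:]) - ccf

def gen_cov(z, x, j, v):
    """Sample generalized covariance between Z_t(u) and exp(i v X_{t-j}), j >= 1."""
    w = np.exp(1j * v * x)
    psi = w - w.mean()                      # centered exp(i v X_t)
    # z[j-1:] holds Z_t for t = j, ..., T-1; psi[:T-j] holds the matching X_{t-j}.
    return np.mean(z[j - 1:] * psi[: len(x) - j])

rng = np.random.default_rng(2)
T, u, v, j = 50_000, 1.0, 1.0, 2

# Markov case: Gaussian AR(1) with coefficient a and unit innovation variance.
a = 0.5
e = rng.standard_normal(T)
x_ar = np.zeros(T)
for t in range(1, T):
    x_ar[t] = a * x_ar[t - 1] + e[t]
gamma_ar = gen_cov(residuals(x_ar, u, a, 1.0), x_ar, j, v)

# Non-Markov case: Gaussian MA(1). X_t given X_{t-1} alone is still Gaussian,
# with slope rho and variance g0 * (1 - rho^2), so its CCF is known in closed form.
theta = 0.8
e = rng.standard_normal(T + 1)
x_ma = e[1:] + theta * e[:-1]
g0 = 1.0 + theta**2
rho = theta / g0
gamma_ma = gen_cov(residuals(x_ma, u, rho, g0 * (1.0 - rho**2)), x_ma, j, v)

print(abs(gamma_ar), abs(gamma_ma))
```

In the AR(1) case the lag-2 covariance should differ from zero only by sampling noise, whereas in the MA(1) case the residual remains correlated with a function of $X_{t-2}$, which is the kind of departure the flat-spectrum comparison is designed to detect.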
The hypothesis of $E[Z_t(u) \mid I_{t-1}] = 0$ for all $u \in \mathbb{R}^d$ is different from the
hypothesis of $\Gamma_j(u, v) = 0$ for all $u, v \in \mathbb{R}^d$ and all $j \ne 0$. The former implies the
latter but not vice versa. This gap is the price we have to pay for dealing with
the difficulty of the curse of dimensionality. From a theoretical point of view, the
pairwise approach will miss dependent processes that are pairwise independent.
However, such processes apparently do not appear in most empirical applications
in economics and finance.
It is rather difficult to formally characterize the gap between $E[Z_t(u) \mid I_{t-1}] = 0$
for all $u \in \mathbb{R}^d$ and $\Gamma_j(u, v) = 0$ for all $u, v \in \mathbb{R}^d$ and all $j \ne 0$. However, these
two hypotheses coincide under some special cases. One example is when $\{X_t\}$
follows an additive process: $X_t = \alpha_0 + \sum_{j=1}^{\infty} g(X_{t-j}) + \varepsilon_t$, where $g(\cdot)$ is not a
zero function at least for some lag $j > 0$. Additive time series processes have
attracted considerable interest in the nonparametric literature (e.g., Masry and
Tjøstheim, 1997; Kim and Linton, 2003).
To reduce the gap between $E[Z_t(u) \mid I_{t-1}] = 0$ for all $u \in \mathbb{R}^d$ and $\Gamma_j(u, v) = 0$
for all $u, v \in \mathbb{R}^d$ and all $j \ne 0$, we can extend $F(\omega, u, v)$ to a generalized
bispectrum

$$B(\omega_1, \omega_2, u, v, \tau) = \frac{1}{(2\pi)^2} \sum_{j=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} C_{j,l}(u, v, \tau)\, e^{-ij\omega_1 - il\omega_2}, \quad \omega_1, \omega_2 \in [-\pi, \pi], \ u, v, \tau \in \mathbb{R}^d,$$

where

$$C_{j,l}(u, v, \tau) = E\left\{ Z_t(u) \left[ e^{iv'X_{t-|j|}} - \hat{\varphi}(v) \right] \left[ e^{i\tau'X_{t-|l|}} - \hat{\varphi}(\tau) \right] \right\}, \quad u, v, \tau \in \mathbb{R}^d,$$

is a generalized third-order central cumulant function. This is equivalent to the
use of $E[Z_t(u) \mid X_{t-j}, X_{t-l}]$. With $C_{j,l}(u, v, \tau)$, we can detect a larger class of
alternatives to $E[Z_t(u) \mid I_{t-1}] = 0$. Note that the nonparametric generalized
bispectrum approach can check many pairs of lags $(j, l)$, while still avoiding the
curse of dimensionality. Nevertheless, in this paper, we focus on $\Gamma_j(u, v)$ for
simplicity.
Suppose now we have a discretely observed sample $\{X_t\}_{t=1}^{T}$ of size $T$, and we
consider consistent estimation of $F(\omega, u, v)$ and $F_0(\omega, u, v)$. Because $Z_t(u)$ is
not observable, we have to estimate it first. Then we can estimate the generalized
covariance $\Gamma_j(u, v)$ by its sample analogue

$$\hat{\Gamma}_j(u, v) = \frac{1}{T - |j|} \sum_{t=|j|+1}^{T} \hat{Z}_t(u) \left[ e^{iv'X_{t-|j|}} - \hat{\varphi}(v) \right], \quad u, v \in \mathbb{R}^d, \tag{2.11}$$

where the estimated generalized residual

$$\hat{Z}_t(u) = e^{iu'X_t} - \hat{\varphi}(u \mid X_{t-1}),$$

$\hat{\varphi}(u \mid X_{t-1})$ is a consistent estimator for $\varphi(u \mid X_{t-1})$, and $\hat{\varphi}(v) = T^{-1} \sum_{t=1}^{T} e^{iv'X_t}$
is the empirical characteristic function of $X_t$. We do not parameterize $\varphi(u \mid X_{t-1})$,
because a parametric form would be subject to potential model misspecification.
Instead, we use nonparametric regression to estimate $\varphi(u \mid X_{t-1})$. Various
nonparametric regression methods could be used here. For concreteness, we use
local polynomial regression.
Local polynomial smoothing was originally introduced by Stone (1977) and
subsequently studied by Cleveland (1979), Fan (1992, 1993), Ruppert and Wand
(1994), Masry (1996a, 1996b), and Masry and Fan (1997), among many others.
Local polynomial smoothing has some advantages over the conventional
Nadaraya–Watson (NW) kernel estimator: e.g., local polynomial fits adapt
automatically to the boundary regions when the order of polynomial $r$ is odd (Ruppert
and Wand, 1994; Fan and Yao, 2003), and they are superior to the NW estimator in
the context of estimating the derivatives of the regression function (Ruppert and
Wand, 1994; Fan and Yao, 2003).
Following Masry (1996a, 1996b), we introduce the notations

$$j = (j_1, \ldots, j_d), \quad j! = j_1! \times \cdots \times j_d!, \quad |j| = \sum_{l=1}^{d} j_l, \quad x^{j} = x_1^{j_1} \times \cdots \times x_d^{j_d},$$

$$\sum_{0 \le |j| \le r} = \sum_{l=0}^{r} \ \sum_{\substack{j_1 = 0 \\ j_1 + \cdots + j_d = l}}^{l} \cdots \sum_{j_d = 0}^{l}.$$

We consider the multivariate local weighted least squares problem

$$\min_{\beta \in \mathbb{R}^N} \sum_{t=2}^{T} \Bigg| e^{iu'X_t} - \sum_{0 \le |j| \le r} \beta_j (X_{t-1} - x)^{j} \Bigg|^2 K_h(x - X_{t-1}), \quad x \in \mathbb{R}^d, \tag{2.12}$$

where $\beta = (\beta_0, \beta_1', \ldots, \beta_r')'$ is an $N \times 1$ parameter vector, $N = \sum_{l=0}^{r} N_l$,
$N_l = \frac{(l+d-1)!}{(d-1)!\, l!}$, $K_h(x) = h^{-d} K(x/h)$, $K : \mathbb{R}^d \to \mathbb{R}$ is a kernel function, $h$ is a
bandwidth, and $r$ is an odd integer. When $r = 1$, (2.12) reduces to a local linear
regression. An example of $K(\cdot)$ is a prespecified symmetric probability density
function. We obtain the following solution to (2.12):
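$$\hat{\beta} \equiv \hat{\beta}(x, u) = \begin{bmatrix} \hat{\beta}_0(x, u) \\ \vdots \\ \hat{\beta}_r(x, u) \end{bmatrix} = S_T^{-1}(x)\, \boldsymbol{\tau}_T(x, u), \quad x \in \mathbb{R}^d, \tag{2.13}$$

where $S_T(x)$ is an $N \times N$ matrix

$$S_T(x) = \begin{bmatrix} S_{0,0} & S_{0,1} & \cdots & S_{0,r} \\ S_{1,0} & S_{1,1} & \cdots & S_{1,r} \\ \vdots & & & \vdots \\ S_{r,0} & S_{r,1} & \cdots & S_{r,r} \end{bmatrix},$$

$S_{|j|,|l|}$ is an $N_{|j|} \times N_{|l|}$ matrix with its $(m, n)$th element $[S_{|j|,|l|}]_{m,n} = s_{g_{|j|}(m) + g_{|l|}(n)}$,

$$s_j(x) = \frac{1}{T-1} \sum_{t=2}^{T} \left( \frac{X_{t-1} - x}{h} \right)^{j} K_h(X_{t-1} - x),$$

and $g_l^{-1}$ denotes the one-to-one map that arranges those $N_l$ $d$-tuples as a sequence
in a lexicographical order.⁵ And $\boldsymbol{\tau}_T(x, u)$ is an $N \times 1$ vector

$$\boldsymbol{\tau}_T(x, u) = \begin{bmatrix} \boldsymbol{\tau}_{T,0} \\ \boldsymbol{\tau}_{T,1} \\ \vdots \\ \boldsymbol{\tau}_{T,r} \end{bmatrix},$$

$\boldsymbol{\tau}_{T,|j|}$ is of dimension $N_{|j|} \times 1$, with its $l$th element $[\boldsymbol{\tau}_{T,|j|}]_l = \tau_{g_{|j|}(l)}$, and

$$\tau_j(x) = \frac{1}{T-1} \sum_{t=2}^{T} e^{iu'X_t} \left( \frac{X_{t-1} - x}{h} \right)^{j} K_h(X_{t-1} - x).$$

Note that $\hat{\beta}$ depends on the location $x$ and parameter $u$, but for notational
simplicity, we have suppressed its dependence on $x$ and $u$.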
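Under suitable regularity conditions, $\varphi(u \mid x)$ can be consistently estimated by
the local intercept estimator $\hat{\beta}_0(x, u)$. Specifically, we have

$$\hat{\varphi}(u \mid x) = \sum_{t=2}^{T} \hat{W}\!\left( \frac{X_{t-1} - x}{h} \right) e^{iu'X_t},$$

where $\hat{W}(\cdot) : \mathbb{R}^d \to \mathbb{R}$ is an effective kernel, defined as

$$\hat{W}(z) \equiv (T-1)^{-1} e_1' S_T^{-1} \boldsymbol{\zeta}(z) K(z) / h^d,$$

$e_1 = (1, 0, \ldots, 0)'$ is an $N \times 1$ unit vector, $\boldsymbol{\zeta}(z)$ is an $N \times 1$ vector

$$\boldsymbol{\zeta}(z) = \begin{bmatrix} \boldsymbol{\zeta}_0(z) \\ \boldsymbol{\zeta}_1(z) \\ \vdots \\ \boldsymbol{\zeta}_r(z) \end{bmatrix},$$

$\boldsymbol{\zeta}_{|j|}(z)$ is of dimension $N_{|j|} \times 1$, with its $l$th element $[\boldsymbol{\zeta}_{|j|}(z)]_l = z^{g_{|j|}(l)}$,
and $z$ is a $d \times 1$ vector. The regression estimator $\hat{\varphi}(u \mid X_{t-1})$ only involves
$d$-dimensional smoothing, thus enjoying some advantages over the existing
nonparametric density approaches, which involve $2d$- or $3d$-dimensional smoothing.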
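
For intuition, the local intercept estimator can be sketched in its simplest form, a local linear fit ($r = 1$) for a scalar series ($d = 1$). The code below is a minimal sketch under assumptions of our own: a Gaussian kernel, a fixed bandwidth `h = 0.3`, and a simulated Gaussian AR(1) series whose true CCF is known, so that the estimate can be compared with it. The function name `local_linear_ccf` is hypothetical, and the regressors are not rescaled by $h$, which changes the slope coefficient but not the intercept.

```python
import numpy as np

def local_linear_ccf(x, u, x0, h):
    """Local linear (r = 1, d = 1) estimate of phi(u | X_{t-1} = x0)."""
    lagged, lead = x[:-1], x[1:]
    y = np.exp(1j * u * lead)                        # complex response exp(i u X_t)
    dev = lagged - x0
    # Gaussian kernel weights K_h(X_{t-1} - x0); the kernel choice is an assumption.
    w = np.exp(-0.5 * (dev / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    design = np.column_stack([np.ones_like(dev), dev])
    lhs = design.T @ (w[:, None] * design)           # 2 x 2 weighted moment matrix
    rhs = design.T @ (w * y)                         # weighted cross moments
    beta = np.linalg.solve(lhs, rhs)
    return beta[0]                                   # local intercept estimates the CCF

rng = np.random.default_rng(1)
T, a = 20_000, 0.5
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.standard_normal()

u, x0 = 1.0, 0.5
estimate = local_linear_ccf(x, u, x0, h=0.3)
truth = np.exp(1j * u * a * x0 - 0.5 * u**2)         # CCF of the Gaussian AR(1)
print(abs(estimate - truth))
```

Because the response is complex-valued, the same weighted least squares system is solved for the real and imaginary parts at once. Only the lagged value enters the smoothing, which is the one-dimensional counterpart of the $d$-dimensional smoothing described above.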
With the sample generalized covariance function $\hat{\Gamma}_j(u, v)$, we can construct a
consistent estimator for the flat generalized spectrum $F_0(\omega, u, v)$,

$$\hat{F}_0(\omega, u, v) = \frac{1}{2\pi} \hat{\Gamma}_0(u, v), \quad \omega \in [-\pi, \pi], \ u, v \in \mathbb{R}^d.$$

Consistent estimation for $F(\omega, u, v)$ is more challenging. We use a nonparametric
smoothed kernel estimator for $F(\omega, u, v)$:

$$\hat{F}(\omega, u, v) = \frac{1}{2\pi} \sum_{j=1-T}^{T-1} (1 - |j|/T)^{1/2}\, k(j/p)\, \hat{\Gamma}_j(u, v)\, e^{-ij\omega}, \quad \omega \in [-\pi, \pi], \ u, v \in \mathbb{R}^d, \tag{2.14}$$

where $p = p(T) \to \infty$ is a lag order, and $k : \mathbb{R} \to [-1, 1]$ is a kernel function
that assigns weights to various lag orders. Note that $k(\cdot)$ here is different from the
kernel $K(\cdot)$ in (2.12). Most commonly used kernels discount higher-order lags.
Examples of commonly used $k(\cdot)$ include the Bartlett kernel

$$k(z) = \begin{cases} 1 - |z|, & |z| \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{2.15}$$

the Parzen kernel

$$k(z) = \begin{cases} 1 - 6z^2 + 6|z|^3, & |z| \le 0.5, \\ 2(1 - |z|)^3, & 0.5 < |z| \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{2.16}$$

and the quadratic-spectral kernel

$$k(z) = \frac{3}{(\pi z)^2} \left[ \frac{\sin(\pi z)}{\pi z} - \cos(\pi z) \right], \quad z \in \mathbb{R}. \tag{2.17}$$
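
The three lag kernels translate directly into code. The following sketch is an illustration with hypothetical function names; the only detail added beyond the formulas is the value $k(0) = 1$ for the quadratic-spectral kernel, which is its limit as $z \to 0$.

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel (2.15): 1 - |z| on [-1, 1], zero elsewhere."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.where(z <= 1.0, 1.0 - z, 0.0)

def parzen(z):
    """Parzen kernel (2.16), piecewise cubic with support [-1, 1]."""
    z = np.abs(np.asarray(z, dtype=float))
    inner = 1.0 - 6.0 * z**2 + 6.0 * z**3
    outer = 2.0 * (1.0 - z) ** 3
    return np.where(z <= 0.5, inner, np.where(z <= 1.0, outer, 0.0))

def quadratic_spectral(z):
    """Quadratic-spectral kernel (2.17), with unbounded support."""
    a = np.pi * np.asarray(z, dtype=float)
    safe = np.where(a == 0.0, 1.0, a)          # placeholder to avoid 0/0 at z = 0
    val = 3.0 / safe**2 * (np.sin(safe) / safe - np.cos(safe))
    return np.where(a == 0.0, 1.0, val)        # k(0) = 1 by continuity

lags = np.arange(0, 6)
p = 4                                           # hypothetical lag order
print(bartlett(lags / p))
print(parzen(lags / p))
print(quadratic_spectral(lags / p))
```

The Bartlett and Parzen kernels give zero weight to lags $j \ge p$, whereas the quadratic-spectral kernel assigns some weight to every lag.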
In (2.14) the factor $(1 - |j|/T)^{1/2}$ is a finite-sample correction. It could be
replaced by unity. Under certain regularity conditions, $\hat{F}(\omega, u, v)$ and $\hat{F}_0(\omega, u, v)$
are consistent for $F(\omega, u, v)$ and $F_0(\omega, u, v)$, respectively. The estimators
$\hat{F}(\omega, u, v)$ and $\hat{F}_0(\omega, u, v)$ converge to the same limit under $H_0$ and generally
converge to different limits under $H_A$. Thus any significant divergence between
them is evidence of the violation of the Markov property.
We can measure the distance between F̂(ω, u, v) and F̂0 (ω, u, v) by the
quadratic form
   π 2
πT
L 2 ( F̂, F̂0 ) = F̂(ω, u, v) − F̂0 (ω, u, v) dωdW (u) dW (v)
2 −π
T −1   2
ˆ
= ∑ k 2 ( j/ p)(T − j)  j (u, v) dW (u) dW (v) , (2.18)
j=1

where the second equality follows by Parseval’s identity, and W : Rd → R+ is


a nondecreasing weighting function that weighs sets symmetric about the origin
equally.6 An example of W (·) is the multivariate independent N (0, I) cumulative
distribution function (CDF), where I is a d × d identity matrix. Throughout, un-
specified integrals are all taken over the support of W (·) . We can compute the
integrals over (u, v) by numerical integration. Alternatively, we can generate ran-
dom draws of u and v from the prespecified distribution W (·), and then use the
Monte Carlo simulation to approximate the integrals over (u, v). This is compu-
tationally simple and is applicable even when the dimension d is large. Note that
W (·) need not be continuous; it can be a nondecreasing step function. This
will lead to a convenient implementation of our test but it may adversely affect
the power. See more discussion below.
Our test statistic for H0 against H A is an appropriately standardized version
of (2.18), namely,

$$\hat M=\left[\sum_{j=1}^{T-1}k^2(j/p)(T-j)\iint\left|\hat\Gamma_j(u,v)\right|^2 dW(u)\,dW(v)-\hat C\right]\Big/\sqrt{\hat D}, \tag{2.19}$$

where the centering factor

$$\hat C=\sum_{j=1}^{T-1}k^2(j/p)(T-j)^{-1}\sum_{t=|j|+1}^{T}\iint\left|\hat Z_t(u)\right|^2\left|\hat\psi_{t-j}(v)\right|^2 dW(u)\,dW(v),$$

and the scaling factor

$$\hat D=2\sum_{j=1}^{T-2}\sum_{l=1}^{T-2}k^2(j/p)\,k^2(l/p)\iiiint dW(u_1)\,dW(u_2)\,dW(v_1)\,dW(v_2)
\times\left|\frac{1}{T-\max(j,l)}\sum_{t=\max(j,l)+1}^{T}\hat Z_t(u_1)\hat Z_t(u_2)\hat\psi_{t-j}(v_1)\hat\psi_{t-l}(v_2)\right|^2,$$

where $\hat\psi_t(v)=e^{iv'X_t}-\hat\varphi(v)$, and $\hat\varphi(v)=T^{-1}\sum_{t=1}^{T}e^{iv'X_t}$ is the empirical
characteristic function of {Xt }. The factors Ĉ and D̂ are approximately the mean
and variance of the quadratic form in (2.18) , respectively. They have taken into
account the impact of higher-order serial dependence in the generalized residual
{Z t (u)} . As a result, the M̂ test is robust to conditional heteroskedasticity and
time-varying higher-order conditional moments of unknown form in {Z t (u)}.
In practice, M̂ has to be calculated using numerical integration or approximated
by simulation methods. This can be computationally costly when the dimension
of Xt is large. Alternatively, one can use only a finite number of grid
points for u and v. For example, we can generate finitely many values of u and
v from a multivariate standard normal distribution. This will dramatically reduce
the computational cost but it may lead to some power loss. We will examine this
issue via simulation.
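The grid-point version of the statistic can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the generalized residuals Ẑ_t(u) are taken as given on a finite grid of u (the first-stage estimation is not shown), W(·) is treated as putting equal mass on each grid point, the sample cross-covariance is assumed to be the lag-j average of Ẑ_t(u)ψ̂_{t−j}(v), and the names `m_statistic` and `bartlett` are hypothetical. The four-fold integral in D̂ is evaluated through Gram matrices of Ẑ_t and ψ̂_t rather than by looping over (u₁, u₂, v₁, v₂).

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel: 1 - |z| on [-1, 1], zero elsewhere."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.where(z <= 1.0, 1.0 - z, 0.0)

def m_statistic(Z, X, v_grid, p, kernel=bartlett):
    """Standardized statistic (Q - C) / sqrt(D) on finite grids.

    Z      : (T, m) complex array, Z[t, a] = residual at time t and grid point u_a
    X      : (T,) or (T, d) array of observations aligned with the rows of Z
    v_grid : (n,) or (n, d) array of grid points for v
    p      : lag order
    """
    Z = np.asarray(Z, dtype=complex)
    T, m = Z.shape
    X = np.asarray(X, dtype=float).reshape(T, -1)
    V = np.asarray(v_grid, dtype=float).reshape(-1, X.shape[1])
    E = np.exp(1j * (X @ V.T))                 # e^{i v'X_t}, shape (T, n)
    psi = E - E.mean(axis=0)                   # psi_hat_t(v)
    k2 = kernel(np.arange(T) / p) ** 2         # k2[j] = k^2(j / p)
    lags = [j for j in range(1, T) if k2[j] > 0.0]
    zz = np.mean(np.abs(Z) ** 2, axis=1)       # integral of |Z_t(u)|^2 dW(u)
    pp = np.mean(np.abs(psi) ** 2, axis=1)     # integral of |psi_t(v)|^2 dW(v)
    Q = C = 0.0
    for j in lags:
        gamma = Z[j:].T @ psi[:T - j] / (T - j)            # assumed form of Gamma_hat_j
        Q += k2[j] * (T - j) * np.mean(np.abs(gamma) ** 2)
        C += k2[j] / (T - j) * np.sum(zz[j:] * pp[:T - j])
    # Gram matrices: integrals of Z_t conj(Z_s) and psi_t conj(psi_s) over the grids.
    Gz2 = (Z @ Z.conj().T / m) ** 2
    Gp = psi @ psi.conj().T / V.shape[0]
    lags_d = [j for j in lags if j <= T - 2]
    D = 0.0
    for j in lags_d:
        for l in lags_d:
            a = max(j, l)
            inner = np.sum(Gz2[a:, a:]
                           * Gp[a - j:T - j, a - j:T - j]
                           * Gp[a - l:T - l, a - l:T - l]).real
            D += 2.0 * k2[j] * k2[l] * inner / (T - a) ** 2
    return float((Q - C) / np.sqrt(D))
```

With a bounded-support kernel such as Bartlett only lags j < p enter, so the double loop for D̂ costs on the order of p²T² operations; with an unbounded-support kernel every lag contributes and the cost grows accordingly.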
We emphasize that although the CCF and the transition density are Fourier
transforms of each other, our nonparametric regression-based CCF approach has
an advantage over the nonparametric conditional density-based approach, in the
sense that our nonparametric regression estimator of CCF only involves
d-dimensional smoothing, but the nonparametric joint density estimators used
in the existing tests involve 2d- and 3d-dimensional smoothing. We expect that
such dimension reduction will give better size and power performance in finite
samples.

3. ASYMPTOTIC DISTRIBUTION
To derive the null asymptotic distribution of the test M̂, we impose the following
regularity conditions.

Assumption 1. (i) Assume {Xt } is a strictly stationary β-mixing process with
mixing coefficient β(j) = O(j^{−ν}) for some constant ν > 12; (ii) the marginal
density g (x) of Xt is bounded and Lipschitz, and the joint density g j (x, y) of Xt
and Xt− j is bounded.
Assumption 2. For each sufficiently large integer q, there exists a q-dependent
stationary process {Xqt }, such that $E\|X_t - X_{qt}\|^2 \le Cq^{-\eta}$ for some constant
η ≥ 12 and all large q. The random vector Xqt is measurable with respect to some
sigma field, which may be different from the sigma field generated by {Xt } .
Assumption 3. Let ϕ(u|x) be the CCF of Xt given Xt−1 . For each u ∈ Rd ,
ϕ(u|x) is (r + 1) times differentiable with respect to x ∈ Rd , and $\frac{\partial^{(r+1)}}{\partial x^{(r+1)}}\varphi(u|x)$ is
Lipschitz of order α: $\left\|\frac{\partial^{(r+1)}}{\partial x^{(r+1)}}\varphi(u|x_1)-\frac{\partial^{(r+1)}}{\partial x^{(r+1)}}\varphi(u|x_2)\right\|\le l(u)\,\|x_1-x_2\|^{\alpha}$,
where 0 < α ≤ 1 and $\int l^2(u)\,dW(u)<\infty$.
Assumption 4. The function $\mathbf{K}$ is a product kernel of some univariate kernel
$K$, i.e., $\mathbf{K}(u)=\prod_{j=1}^{d}K(u_j)$, where $K: G\to\mathbb{R}^+$ is a symmetric and bounded
function and G is a compact set. The function $H_j(u)\equiv u^{j}\mathbf{K}(u)$ is Lipschitz for
all j with 0 ≤ |j| ≤ 2r + 1.
Assumption 5. (i) k : R → [−1, 1] is a symmetric function that is continuous
at zero and all points in R except for a finite number of points; (ii) k (0) = 1; (iii)
$|k(z)|\le c\,|z|^{-b}$ for some b > 3/4 as |z| → ∞.

Assumption 6. W : Rd → R+ is a nondecreasing weighting function that
weighs sets symmetric about the origin equally, with $\int\|u\|^4\,dW(u)<\infty$.
Assumptions 1–3 are regularity conditions on the DGP of {Xt }. Assumption
1(i) restricts the degree of temporal dependence of {Xt }. We say that {Xt } is
β-mixing (absolutely regular) if

$$\beta(j)=\sup_{s\ge 1}E\left[\sup_{A\in\mathcal{F}_{s+j}^{\infty}}\left|P\left(A\mid\mathcal{F}_{1}^{s}\right)-P(A)\right|\right]\to 0,$$

as j → ∞, where $\mathcal{F}_j^s$ is the σ-field generated by {Xτ : τ = j, ..., s}, with j ≤ s.


Assumption 1(i) holds for many well-known processes such as stationary au-
toregressive moving average (ARMA) processes and a large class of processes
implied by numerous nonlinear models, including bilinear, nonlinear autoregres-
sive (AR), and autoregressive conditional heteroskedasticity (ARCH) models (Fan
and Li, 1999). Ait-Sahalia, Fan, and Peng (2009), Amaro de Matos and Fernandes
(2007), and Su and White (2007, 2008) also impose β-mixing conditions. Our
mixing condition is weaker than those imposed in Amaro de Matos and Fernandes
and in Su and White (2008). They assume a β-mixing condition with a geometric
decay rate.
The proposed test is applicable to both univariate and multivariate time se-
ries with discrete or continuous distributions, or a mix of continuous and discrete
data.7 For simplicity, we just focus on the continuous case. Cases with discrete
data or mixed data will be left for future research.
Assumption 2 is required only under H0 . It assumes that a Markov process
{Xt } can be approximated by a q-dependent process {Xqt } arbitrarily well if q
is sufficiently large.8 In fact, a Markov process can be q-dependent. Lévy (1949),
Rosenblatt and Slepian (1962), Aaronson, Gilat, and Keane (1992), and Matús

(1996, 1998) provide examples of a q-dependent Markov process. Ibragimov


(2007) provides the conditions that a Markov process is a q-dependent process. In
this case, Assumption 2 holds trivially. Assumption 2 is not restrictive even when
Xt is not a q-dependent process. To appreciate this, we first consider a simple
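AR(1) process {Xt }:

$$X_t=\alpha X_{t-1}+\varepsilon_t,\qquad \{\varepsilon_t\}\sim \text{i.i.d.}\ (0,1).$$

Define $X_{qt}=\sum_{j=0}^{q}\alpha^{j}\varepsilon_{t-j}$, a q-dependent process. Then we have

$$E\left(X_t-X_{qt}\right)^2=E\left(\sum_{j=q+1}^{\infty}\alpha^{j}\varepsilon_{t-j}\right)^2=\sum_{j=q+1}^{\infty}\alpha^{2j}=\frac{\alpha^{2(q+1)}}{1-\alpha^{2}}.$$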

Hence Assumption 2 holds if |α| < 1.
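The geometric decay of the approximation error in the AR(1) example can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the innovations are assumed standard normal, and the infinite moving-average sum is truncated at `n_terms` terms. It compares the geometric-series value α^{2(q+1)}/(1 − α²) with a Monte Carlo average of the squared tail Σ_{j>q} α^j ε_{t−j}.

```python
import numpy as np

def ar1_truncation_mse(alpha, q):
    """Geometric-series value of E(X_t - X_qt)^2 for unit-variance innovations."""
    return alpha ** (2 * (q + 1)) / (1.0 - alpha ** 2)

def simulate_truncation_mse(alpha, q, n_terms=60, n_rep=20_000, seed=0):
    """Monte Carlo estimate of E(X_t - X_qt)^2 using the MA(infinity) form of the AR(1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, n_terms))   # column j holds eps_{t-j}
    weights = alpha ** np.arange(n_terms)         # alpha^j, j = 0, ..., n_terms - 1
    tail = eps[:, q + 1:] @ weights[q + 1:]       # X_t - X_qt, truncated at n_terms
    return float(np.mean(tail ** 2))
```

For |α| < 1 the error shrinks geometrically in q, which is faster than any polynomial rate q^{−η} required by Assumption 2.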


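Another example is an ARCH(1) process {Xt }:

$$X_t=h_t^{1/2}\varepsilon_t,\qquad h_t=\alpha+\beta X_{t-1}^{2},\qquad \varepsilon_t\sim \text{i.i.d.}\ N(0,1).$$

This is a Markov process. By recursive substitution, we have
$h_t=\alpha+\alpha\sum_{j=1}^{\infty}\prod_{i=1}^{j}\beta\varepsilon_{t-i}^{2}$. Define $X_{qt}\equiv h_{qt}^{1/2}\varepsilon_t$, where
$h_{qt}\equiv\alpha+\alpha\sum_{j=1}^{q}\prod_{i=1}^{j}\beta\varepsilon_{t-i}^{2}$. Then Xqt is a q-dependent process, and

$$E\left(X_t-X_{qt}\right)^2=E\left(h_t^{1/2}-h_{qt}^{1/2}\right)^2\le E\left(h_t-h_{qt}\right)
=\alpha\sum_{j=q+1}^{\infty}\prod_{i=1}^{j}E\left(\beta\varepsilon_{t-i}^{2}\right)=\frac{\alpha\beta^{q+1}}{1-\beta}.$$

Thus Assumption 2 holds if β < 1.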
For the third example we consider a mean-reverting Ornstein-Uhlenbeck process Xt :

$$dX_t=\kappa(\theta-X_t)\,dt+\sigma\,dW_t,$$

where Wt is the standard Brownian motion. This is known as Vasicek’s (1977)
model in the interest rate term structure literature. From the stationarity condition,
we have $X_t\sim N\left(\theta,\frac{\sigma^2}{2\kappa}\right)$. Define $X_{qt}=\theta+\int_{t-q}^{t}\sigma e^{-\kappa(t-s)}\,dW_s$, which is a
q-dependent process. Then Assumption 2 holds because

$$E\left(X_t-X_{qt}\right)^2=E\left[e^{-\kappa t}(X_0-\theta)+\sigma\int_{0}^{t-q}e^{-\kappa(t-s)}\,dW_s\right]^2
=\frac{\sigma^2}{2\kappa}e^{-2\kappa t}+\sigma^2\int_{0}^{t-q}e^{-2\kappa(t-s)}\,ds
=\frac{\sigma^2 e^{-2\kappa q}}{2\kappa}=o\left(q^{-\eta}\right),\quad\text{for any }\eta>0.$$


Assumption 3 provides conditions on the CCF of Xt . As the CCF is the Fourier


transform of the transition density, we can easily translate the conditions on
the CCF into the conditions on the transition density p (y|x). In particular,
Assumption 3 holds if for each y ∈ Rd , p (y|x) is (r + 1) times differentiable with
respect to x ∈ Rd , and $\frac{\partial^{(r+1)}}{\partial x^{(r+1)}}p(y|x)$ satisfies the Lipschitz condition of order
α: $\left\|\frac{\partial^{(r+1)}}{\partial x^{(r+1)}}p(y|x_1)-\frac{\partial^{(r+1)}}{\partial x^{(r+1)}}p(y|x_2)\right\|\le l(y)\,\|x_1-x_2\|^{\alpha}$, where 0 < α ≤ 1 and
$\iint e^{2iu'y}\,l^2(y)\,dy\,dW(u)<\infty$. Assumption 4 imposes regularity conditions on
the kernel function used in local polynomial regression estimation. The same as-
sumption has been imposed by Masry (1996a) and Ait-Sahalia et al. (2009). The
condition on the boundedness and the compact support of K (·) is imposed for the
brevity of proofs and could be removed at the cost of a more tedious proof.9
Assumption 5 imposes regularity conditions on the kernel function k(·) used for
generalized cross-spectral estimation. This kernel is different from the kernel K (·)
used in the first-stage nonparametric regression estimation of ϕ(u|Xt−1 ). Here,
k(·) provides weighting for various lags, and it is used to estimate the generalized
cross-spectrum F(ω, u, v). Among other things, the continuity of k (·) at zero
and k (0) = 1 ensures that the bias of the generalized cross-spectral estimator
F̂ (ω, u, v) vanishes to zero asymptotically as T → ∞. The condition on the
tail behavior of k (·) ensures that higher order lags will have little impact on the
statistical properties of F̂ (ω, u, v) . Assumption 5 covers most commonly used
kernels. For kernels with bounded support, such as the Bartlett and Parzen kernels,
b = ∞. For kernels with unbounded support, b is a finite positive real number. For
example, b = 1 for the Daniell kernel k (z) = sin (π z) / (π z) , and b = 2 for the
quadratic-spectral kernel k (z) = 3/ (π z)² [sin (π z) /(π z) − cos (π z)] .
Assumption 6 imposes mild conditions on the prespecified weighting function
W (·) . Any CDF with finite fourth moments satisfies Assumption 6. Note that
W (·) need not be continuous. This provides a convenient way to implement our
tests, because we can avoid relatively high dimensional numerical integrations by
using finitely many grid points for u and v.
We now state the main result of this paper.
THEOREM 1. Suppose Assumptions 1−6 hold, and p = cT λ for
 0 < λ < (3 +
−1 −δ λν 1−λ
4b−2 ) and 0 < c < ∞, h = cT , δ ∈ 4(r +1) , min( 2d , d ) . Then under
1 2−λ

H0 , M̂ →d N (0, 1) as T → ∞.
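To see what the rate condition on the lag order means for specific kernels, the upper bound on λ can be evaluated as a function of the tail index b of Assumption 5. The sketch below reads the bound as (3 + 1/(4b − 2))^{−1}; the function name is hypothetical.

```python
import math

def lag_exponent_upper_bound(b):
    """Upper bound (3 + 1/(4b - 2))^{-1} on lambda in p = c * T**lambda."""
    if math.isinf(b):
        return 1.0 / 3.0          # bounded-support kernels (Bartlett, Parzen)
    if b <= 0.5:
        raise ValueError("the bound requires b > 1/2")
    return 1.0 / (3.0 + 1.0 / (4.0 * b - 2.0))
```

Under this reading, bounded-support kernels (b = ∞) allow any λ below 1/3, the quadratic-spectral kernel (b = 2) allows λ below 6/19, and the Daniell kernel (b = 1) allows λ below 2/7, so slower tail decay of k(·) narrows the admissible growth rate of p.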
As an important feature of M̂, the use of the nonparametrically estimated gen-
eralized residual Ẑ t (u) in place of the true unobservable residual Z t (u) has
no impact on the limit distribution of M̂. One can proceed as if the true CCF
ϕ(u|Xt−1 ) were known and equal to the nonparametric estimator ϕ̂(u|Xt−1 ).
The reason is that by choosing suitable bandwidth h and lag order p, the conver-
gence rate of the nonparametric CCF estimator ϕ̂(u|Xt−1 ) is faster than that of the
nonparametric estimator F̂ (ω, u, v) to F (ω, u, v) . Consequently, the limiting
distribution of M̂ is solely determined by F̂ (ω, u, v) , and replacing ϕ(u|Xt−1 )

by ϕ̂(u|Xt−1 ) has no impact on the asymptotic distribution of M̂ under H0 . The


impact of the first-stage estimation comes from two sources—bias and variance—
and we have to balance them. The dimension d affects the variance but not the
bias. For given T and h, the variance increases with the dimension and, conse-
quently, a smaller dimension allows for a bigger feasible range of δ. The dimen-
sion d has no direct impact on λ, as the frequency domain estimation is used for
the one-dimensional generalized spectrum F(ω, u, v), no matter how big the di-
mension of Xt is. However, since we need to balance the convergence speeds of
h and p, the dimension d has an indirect impact on p. The smaller the dimension
is, the bigger the feasible range of λ would be.
Although the use of ϕ̂(u|Xt−1 ) has no impact on the limit distribution of the
M̂ test, it may have substantial impact on its finite sample size performance.
To overcome such adverse impact, we will use Horowitz’s (2003) nonparamet-
ric smoothed transition density-based bootstrap procedure to obtain the critical
values of the test in finite samples. See more discussion in Section 5 below.

4. ASYMPTOTIC POWER
Our test is derived without assuming a specific alternative to H0 . To get insights
into the nature of the alternatives that our test is able to detect, we now examine
the asymptotic power behavior of M̂ under H A in (2.2).
THEOREM 2. Suppose Assumptions 1 and 3–6 hold, and p = cT^λ for 0 < λ <
−1 −δ λν 1
(3 + 4b−2 ) and 0 < c < ∞, h = cT , δ ∈ 4(r +1) , min( 2d , d ) . Then under
1 2−λ

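H A , and as T → ∞,

$$\frac{p^{1/2}}{T}\hat M\ \to^{p}\ \frac{1}{\sqrt{D}}\sum_{j=1}^{\infty}\iint\left|\Gamma_j(u,v)\right|^2 dW(u)\,dW(v)
=\frac{\pi}{2\sqrt{D}}\iiint_{-\pi}^{\pi}\left|F(\omega,u,v)-F_0(\omega,u,v)\right|^2 d\omega\,dW(u)\,dW(v),$$

where

$$D=2\int_{0}^{\infty}k^4(z)\,dz\iint\left|\Sigma_0(u_1,u_2)\right|^2 dW(u_1)\,dW(u_2)\sum_{j=-\infty}^{\infty}\iint\left|\Omega_j(v_1,v_2)\right|^2 dW(v_1)\,dW(v_2),$$

$\Omega_j(u,v)=\mathrm{cov}\left(e^{iu'X_t},e^{iv'X_{t-|j|}}\right)$ and $\Sigma_0(u,v)=\mathrm{cov}\left[Z_t(u),Z_t(v)\right]$.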

The restriction on h in Theorem 2 is weaker than that in Theorem 1, as we allow


for a slower convergence rate of the first-stage nonparametric estimation. The
function L (ω, u, v) is the generalized spectral density of the process {Xt } , which
is first introduced in Hong (1999) in a univariate context. It captures temporal
dependence in {Xt }. The dependence of the constant D on L (ω, u, v) is due to

the fact that the conditioning variable $\{e^{iv'X_{t-|j|}}\}$ is a time series process. This
suggests that if the time series {Xt } is highly persistent, it may be more difficult
to detect violation of the Markov property because the constant D will be larger.
Following reasoning analogous to Bierens (1982) and Stinchcombe and White
(1998), we have that for j > 0, $\Gamma_j(u,v)=0$ for all u, v ∈ Rd if and only if
$E[Z_t(u)\mid X_{t-j}]=0$ a.s. for all u ∈ Rd . Thus, the generalized covariance function
$\Gamma_j(u,v)$ can capture various departures from the Markov property in every
conditional moment of Xt in view of the Taylor series expansion in (2.7). Suppose
$E[Z_t(u)\mid X_{t-j}]\neq 0$ at some lag j > 0. Then we have $\iint|\Gamma_j(u,v)|^2\,dW(u)\,dW(v)>0$
for any weighting function W (·) that is positive, monotonically
increasing, and continuous, with unbounded support on Rd . Consequently,
$P[\hat M>C(T)]\to 1$ for any sequence of constants $\{C(T)=o(T/p^{1/2})\}$. Thus M̂ has
asymptotic unit power at any given significance level, whenever $E[Z_t(u)\mid X_{t-j}]\neq 0$
at some lag j > 0.
Thus, to ensure the consistency property of M̂, it is important to integrate u
and v over the entire domain of Rd . When numerical integration is difficult, as
is the case where the dimension d is large, one can use Monte Carlo simulation
to approximate the integrals over u and v. This can be obtained by using a large
number of random draws from the distribution W (·) and then computing the sam-
ple average as an approximation to the related integral. Such an approximation
will be arbitrarily accurate provided the number of random draws is sufficiently
large. Alternatively, we can use a nondecreasing step function W (·). This avoids
numerical integration or Monte Carlo simulation, but the power of the test may be
affected. In theory, the consistency property will not be preserved if only a finite
number of grid points of u and v are used, and the power of the test may depend
on the choice of grid points for u and v.
On the other hand, Theorem 2 implies that the M̂ test can check departure
from the Markov property at any lag order j > 0, as long as the sample size T is
sufficiently large. This is achieved because M̂ includes an increasing number of
lags as the sample size T → ∞. Usually the use of a large number of lags would
lead to the loss of a large number of degrees of freedom. Fortunately this is not the
case with the M̂ test, thanks to the downward weighting of k 2 (·) for higher-order
lags.
As revealed by the Taylor series expansion in (2.7), our test, which is based on
the MDS characterization in (2.6), essentially checks departures from the Markov
property in every conditional moment. When M̂ rejects the Markov property, one
may be further interested in what causes the rejection. To gauge possible sources
of the violation of the Markov property, we can construct a sequence of tests based
on the derivatives of the nonparametric regression residual Z t (u) at the origin 0:

$$\frac{\partial^{|m|}}{\partial u_1^{m_1}\cdots\partial u_d^{m_d}}E\left[Z_t(u)\mid I_{t-1}\right]\Big|_{u=0}
=E\left(X_{1t}^{m_1}\cdots X_{dt}^{m_d}\mid I_{t-1}\right)-E\left(X_{1t}^{m_1}\cdots X_{dt}^{m_d}\mid X_{t-1}\right)=0,$$

where the order of derivatives $|m|=\sum_{a=1}^{d}m_a$, and $m=(m_1,\ldots,m_d)'$, and $m_a\ge 0$
for all a = 1, ..., d. For the univariate time series (i.e., d = 1), the choices of
m = 1, 2, 3, 4 correspond to tests for departures of the Markov property in the first
four conditional moments, respectively. For each m, the resulting test statistic is
given by

$$\hat M(m)=\left[\sum_{j=1}^{T-1}k^2(j/p)(T-j)\int\left|\hat\Gamma_j^{(m,0)}(0,v)\right|^2 dW(v)-\hat C(m)\right]\Big/\sqrt{\hat D(m)}, \tag{4.1}$$

where $\hat\Gamma_j^{(m,0)}(0,v)$ is the sample analogue of the derivative of the generalized
cross-covariance function

$$\Gamma_j^{(m,0)}(0,v)=\mathrm{cov}\left[\prod_{a=1}^{d}(iX_{at})^{m_a}-E\left(\prod_{a=1}^{d}(iX_{at})^{m_a}\,\Big|\,X_{t-1}\right),\ e^{iv'X_{t-|j|}}\right],$$

the centering and scaling factors

$$\hat C(m)=\sum_{j=1}^{T-1}k^2(j/p)\frac{1}{T-j}\sum_{t=|j|+1}^{T}\int\left|\hat Z_t^{(m)}(0)\right|^2\left|\hat\psi_{t-j}(v)\right|^2 dW(v),$$

$$\hat D(m)=2\sum_{j=1}^{T-2}\sum_{l=1}^{T-2}k^2(j/p)\,k^2(l/p)\iint\left|\frac{1}{T-\max(j,l)}\sum_{t=\max(j,l)+1}^{T}\left[\hat Z_t^{(m)}(0)\right]^2\hat\psi_{t-j}(v_1)\hat\psi_{t-l}(v_2)\right|^2 dW(v_1)\,dW(v_2),$$

and

$$\hat Z_t^{(m)}(0)=\prod_{a=1}^{d}(iX_{at})^{m_a}-E\left(\prod_{a=1}^{d}(iX_{at})^{m_a}\,\Big|\,X_{t-1}\right).$$

These derivative tests may provide additional useful information on the possi-
ble sources of the violation of the Markov property. On the other hand, some
economic theories only have implications for the Markov property in certain
moments, and our derivative tests are suitable to test these implications. For
example, Hall (1978) shows that a rational expectations model of consumption can be
characterized by the Euler equation $E[u'(C_{t+1})\mid I_t]=u'(C_t)$, where $u'(C_t)$
is the marginal utility of consumption Ct . This can be viewed as the Markov
property in mean for the marginal utility process of consumption. The derivative
test M̂ (1) can be used to test this implication.

5. NUMERICAL RESULTS
5.1. Monte Carlo Simulations
Theorem 1 provides the null asymptotic N (0, 1) distribution of M̂. Thus, one can
implement our test for H0 by comparing M̂ with a N (0, 1) critical value. How-
ever, like many other nonparametric tests in the literature, the size of M̂ in finite
samples may differ significantly from the prespecified asymptotic significance
level. Our analysis suggests that the asymptotic theory may not work well even
for relatively large sample sizes, because the asymptotically negligible higher-
order terms in M̂ are close in order of magnitude to the dominant U -statistic
that determines the limit distribution of M̂. In particular, the first-stage smoothed
nonparametric regression estimation for ϕ(u|Xt−1 ) may have substantial adverse
effect on the size of M̂ in finite samples. Indeed, our simulation study shows that
M̂ displays severe underrejection under H0 . We examine the finite sample per-
formance of an infeasible M̂ test by replacing the estimated generalized residual
Ẑ t (u) with the true generalized residual Z t (u). We find that the size of the in-
feasible test is reasonable. This experiment suggests that the underrejection of M̂
is mainly due to the impact of the first-stage nonparametric estimation of CCF,
which has a rather slow convergence rate. Similar problems are also documented
by Skaug and Tjøstheim (1993, 1996), Hong and White (2005), and Fan, Li, and
Min (2006) in other contexts.
To overcome this problem, we use Horowitz’s (2003) smoothed nonparametric
conditional density bootstrap procedure to approximate the finite-sample
null distribution of M̂ more accurately. The basic idea is to use a smoothed
nonparametric transition density estimator (under H0 ) to generate bootstrap samples.
Specifically, it involves the following steps:
Step (i). To obtain a bootstrap sample $X^b\equiv\{X_t^b\}_{t=1}^{T}$, draw $X_1^b$ from the
smoothed unconditional kernel density

$$\hat g(x)=\frac{1}{T}\sum_{s=2}^{T}K_h(x-X_{s-1})$$

and $\{X_t^b\}_{t=2}^{T}$ recursively from the smoothed conditional kernel density

$$\hat g(x\mid X_{t-1}^b)=\frac{\frac{1}{T}\sum_{s=2}^{T}K_h(x-X_s)\,K_h\left(X_{t-1}^b-X_{s-1}\right)}{\frac{1}{T}\sum_{s=2}^{T}K_h\left(X_{t-1}^b-X_{s-1}\right)}, \tag{5.1}$$

where K(·) and h are the same as those used in M̂.

Step (ii). Compute a bootstrap statistic $\hat M^b$ in the same way as M̂, with $X^b$
replacing $X=\{X_t\}_{t=1}^{T}$. The same K(·) and h are used in M̂ and $\hat M^b$.

Step (iii). Repeat steps (i) and (ii) B times to obtain B bootstrap test statistics
$\{\hat M_l^b\}_{l=1}^{B}$.

Step (iv). Compute the bootstrap p-value $p^b\equiv B^{-1}\sum_{l=1}^{B}1(\hat M_l^b>\hat M)$ for a
sufficiently large B.
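Steps (i) and (iv) can be sketched in Python for a univariate series. The sketch assumes a Gaussian kernel, in which case the conditional density estimate in (5.1) is a mixture of N(X_s, h²) components with weights proportional to K_h(X^b_{t−1} − X_{s−1}); drawing from it then amounts to picking an index s with those weights and adding N(0, h²) noise to X_s. The function names are hypothetical, and computing the test statistic on each bootstrap sample (step (ii)) is left to the caller.

```python
import numpy as np

def smoothed_markov_bootstrap(x, h, rng):
    """Draw one bootstrap path from Gaussian-kernel estimates of g(x) and g(x | x_prev)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    prev, nxt = x[:-1], x[1:]                  # pairs (X_{s-1}, X_s), s = 2, ..., T
    xb = np.empty(T)
    # Step (i), first draw: X_1^b from the smoothed unconditional density.
    xb[0] = rng.choice(prev) + h * rng.standard_normal()
    for t in range(1, T):
        # Mixture weights proportional to K_h(X_{t-1}^b - X_{s-1}), computed stably.
        logw = -0.5 * ((xb[t - 1] - prev) / h) ** 2
        w = np.exp(logw - logw.max())
        s = rng.choice(T - 1, p=w / w.sum())
        xb[t] = nxt[s] + h * rng.standard_normal()
    return xb

def bootstrap_p_value(stat, boot_stats):
    """Step (iv): share of bootstrap statistics exceeding the sample statistic."""
    return float(np.mean(np.asarray(boot_stats, dtype=float) > stat))
```

A typical use would generate B paths with `smoothed_markov_bootstrap`, evaluate the test statistic on each, and pass the resulting list to `bootstrap_p_value`.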
We suggest using the same kernel K(·) and the same bandwidth h in comput-
ing ĝ(x|Xt−1 ), M̂ and M̂ b . This is not necessary, but it delivers a simpler test
procedure.10 Smoothed nonparametric bootstraps have been used to improve fi-
nite sample performance in hypothesis testing. For example, Su and White (2007,
2008) apply Paparoditis and Politis’s (2000) procedure in testing for conditional
independence, and Amaro de Matos and Fernandes (2007) use Horowitz’s (2003)
Markov conditional bootstrap procedure in testing for the Markov property. Papar-
oditis and Politis’s (2000) procedure is similar to Horowitz’s, except that Paparo-
ditis and Politis (2000) generate bootstrap samples from ĝ(x|Xt−1 ) and Horowitz
generates bootstrap samples from ĝ(x|Xbt−1 ). Both methods can be applied to our
test, although Horowitz’s procedure is more computationally expensive.11 When
Paparoditis and Politis’s (2000) method is used, the bootstrap sample $\{X_t^b\}_{t=1}^{T}$ is

an i.i.d. sequence conditional on the original sample X and hence it is Markov


conditional on X . Following an analogous proof of Theorem 4.1 of Su and White
(2008), we can show that conditional on X , M̂ b →d N (0, 1) as T → ∞. The
proof is similar to but simpler than that of Theorem 1 in Section 3 due to the fact
that $\{X_t^b\}_{t=1}^{T}$ is i.i.d. conditional on X . More specifically, we can first show that
the estimation uncertainty in the first-stage nonparametric estimation has no im-
pact asymptotically. Then, by applying Brown’s (1971) central limit theorem, we
can derive the asymptotic normality of M̂ b conditional on X . On the other hand,
the proof of consistency with the Horowitz approach is much more involved. We
conjecture that following an analogous proof of Theorem 3.4 of Paparoditis and
Politis (2002), we can show that conditional on X , Xbt is a so-called ρ-mixing
process with a geometric decay rate. Then by applying a suitable central limit
theorem of the degenerate U -statistics (e.g., Gao and Hong, 2008, Thm 2.1), the
asymptotic normality of $\hat M^b$ conditional on $\{X_t\}_{t=1}^{T}$ may be obtained.

The consistency of the smoothed bootstrap does not indicate the degree of im-
provement of the smoothed bootstrap upon the asymptotic distribution. Since M̂
is asymptotically pivotal, it is possible that M̂ b can achieve reasonable accuracy
in finite samples. We shall examine the performance of the smoothed bootstrap in
our simulation study.
We shall compare the finite sample performance of our M̂ test with Su and
White’s (SW ) (2007, 2008) CCF-based test and Hellinger metric test for condi-
tional independence.12 To examine the size of the tests under H0 , we consider
two Markov DGPs:

DGP S1 [AR(1)]: $X_t=0.5X_{t-1}+\varepsilon_t$,

DGP S2 [ARCH(1)]: $X_t=h_t^{1/2}\varepsilon_t$, $h_t=0.1+0.1X_{t-1}^{2}$,

where εt ∼ i.i.d. N (0, 1).



To examine the power of the tests using the smoothed bootstrap, we consider
the following non-Markovian DGPs:

DGP P1 [MA(1)]: $X_t=\varepsilon_t+0.5\varepsilon_{t-1}$,

DGP P2 [GARCH(1,1)]: $X_t=h_t^{1/2}\varepsilon_t$, $h_t=0.1+0.2X_{t-1}^{2}+0.7h_{t-1}$,

DGP P3 [GARCH-in-Mean]: $X_t=0.3+0.5h_t+z_t$, $z_t=h_t^{1/2}\varepsilon_t$, $h_t=0.1+0.2X_{t-1}^{2}+0.7h_{t-1}$,

DGP P4 [Markov Chain Regime-Switching]: $X_t=0.7X_{t-1}+\varepsilon_t$ if $S_t=0$, and $X_t=-0.3X_{t-1}+\varepsilon_t$ if $S_t=1$,

DGP P5 [Markov Chain Regime-Switching ARCH]: $X_t=\sqrt{h_t}\,\varepsilon_t$ if $S_t=0$, and $X_t=3\sqrt{h_t}\,\varepsilon_t$ if $S_t=1$, with $h_t=0.1+0.3X_{t-1}^{2}$,

where εt ∼ i.i.d.N (0, 1) , and in DGPs P4 and P5, St is a latent state variable that
follows a two-state Markov chain with transition probabilities P(St = 1|St−1 = 0)
= P (St = 0|St−1 = 1) = 0.9. DGPs P4 and P5 are the Markov Chain Regime-
Switching model and Markov Chain Regime-Switching ARCH model proposed
by Hamilton (1989) and Hamilton and Susmel (1994), respectively. They can cap-
ture the state-dependent behaviors in time series. The introduction of St changes
the Markov property of AR(1) and ARCH(1) processes. The knowledge of X t−1
is not sufficient to summarize all relevant information in It−1 that is useful to
predict the future behavior of X t . The departure from the Markov property comes
from the conditional mean in DGPs P1 and P4, from the conditional variance in
DGPs P2 and P5, and from both the conditional mean and conditional variance in
DGP P3.
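Several of the simulation designs above can be generated with a few lines of Python. The sketch below is an illustration under stated assumptions rather than the authors' GAUSS code: the function name, the starting values (zero for X, 0.1 for h, regime 0 for S), and the use of NumPy's default generator are all hypothetical choices, and DGP P3 is not covered. A burn-in of 100 observations is discarded, as described for the simulation design.

```python
import numpy as np

def simulate_dgp(name, T, rng, burn=100):
    """Simulate DGP S1, S2, P1, P2, P4 or P5 and return the last T observations."""
    n = T + burn
    eps = rng.standard_normal(n + 1)
    x = np.zeros(n + 1)
    h = np.full(n + 1, 0.1)
    s = np.zeros(n + 1, dtype=int)
    switch = rng.random(n + 1) < 0.9           # regime switches with probability 0.9
    for t in range(1, n + 1):
        s[t] = 1 - s[t - 1] if switch[t] else s[t - 1]
        if name == "S1":                       # AR(1)
            x[t] = 0.5 * x[t - 1] + eps[t]
        elif name == "S2":                     # ARCH(1)
            h[t] = 0.1 + 0.1 * x[t - 1] ** 2
            x[t] = np.sqrt(h[t]) * eps[t]
        elif name == "P1":                     # MA(1)
            x[t] = eps[t] + 0.5 * eps[t - 1]
        elif name == "P2":                     # GARCH(1,1)
            h[t] = 0.1 + 0.2 * x[t - 1] ** 2 + 0.7 * h[t - 1]
            x[t] = np.sqrt(h[t]) * eps[t]
        elif name == "P4":                     # regime-switching AR(1)
            coef = 0.7 if s[t] == 0 else -0.3
            x[t] = coef * x[t - 1] + eps[t]
        elif name == "P5":                     # regime-switching ARCH(1)
            h[t] = 0.1 + 0.3 * x[t - 1] ** 2
            scale = 1.0 if s[t] == 0 else 3.0
            x[t] = scale * np.sqrt(h[t]) * eps[t]
        else:
            raise ValueError("unknown DGP: " + str(name))
    return x[-T:]
```

For example, `simulate_dgp("P4", 250, np.random.default_rng(0))` returns one sample path of length 250 from the regime-switching AR(1) design.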
Throughout, we consider three sample sizes: T = 100, 250, 500. For each DGP
we first generate T + 100 observations and then discard the first 100 to mitigate
the impact of the initial values. To examine the bootstrap sizes and powers of
the tests, we generate 500 realizations of the random sample $\{X_t\}_{t=1}^{T}$, using the
Gauss Windows version random number generator. We use B = 100 bootstrap


iterations for each simulation iteration. To reduce computational costs of our M̂
test, we generate u and v from an N (0, 1) distribution, with each u and v hav-
ing 30 symmetric grid points in R, respectively.13 We use the Bartlett kernel in
(2.15), which has bounded support and is computationally efficient. Our simula-
tion experience suggests that the choices of W (·) and k (·) have little impact on
both the size and power of the tests.14 Like Hong (1999), we use a data-driven
p̂ via a plug-in method that minimizes the asymptotic integrated mean squared
error of the generalized spectral density estimator F̂ (ω, x, y), with the Bartlett
kernel k (·) used in some preliminary generalized spectral estimators. To examine
the sensitivity of the choice of a preliminary bandwidth p̄ on the size and power
of the M̂ test, we consider p̄ in the range of 5 to 20. We use the Gaussian kernel

for K (·). For simplicity, we choose $h=\hat S_X T^{-1/4.5}$, where $\hat S_X$ is the sample
standard deviation of $\{X_t\}_{t=1}^{T}$.15 We compare the proposed test with Su and White’s
(2007, 2008) tests, applied to the present context to check whether X t is
independent of X t−2 conditional on X t−1 . Following Su and White (2007, 2008), we
choose a fourth-order kernel $K(u)=(3-u^2)\varphi(u)/2$, where ϕ(·) is the N (0, 1)
density function, $h=T^{-1/8.5}$ for the nonparametric estimation of their Hellinger
metric test SWa , $h_1=h_1^{*}T^{1/10}T^{-1/6}$ and $h_2=h_2^{*}T^{1/9}T^{-1/5}$ for their CCF-based
test SWb , where $h_1^{*}$ and $h_2^{*}$ are the least-squares cross-validated bandwidths for
estimating the conditional expectations of X t given (X t−1 , X t−2 ) and X t−1 ,
respectively, and $b=T^{-1/5}$ for the bootstrap.
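Two of the tuning choices described above are easy to make concrete: the rule-of-thumb bandwidth h = Ŝ_X T^{−1/4.5} and a symmetric grid of points for u and v drawn from N(0, 1). The sketch below is only one possible reading. In particular, building the symmetric grid by mirroring half of the draws around zero is an assumption, since the construction is not spelled out, and both function names are hypothetical.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """Bandwidth h = S_X * T**(-1/4.5), with S_X the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return float(x.std(ddof=1) * x.size ** (-1.0 / 4.5))

def symmetric_normal_grid(n_points, rng):
    """n_points grid values, symmetric about zero, built from N(0, 1) draws (n_points even)."""
    half = np.abs(rng.standard_normal(n_points // 2))
    return np.sort(np.concatenate([-half, half]))
```

With `n_points = 30` this yields 15 mirrored pairs, so the grid weighs sets symmetric about the origin equally, in line with the requirement on W(·) in Assumption 6.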
Table 1 reports the bootstrap sizes and powers of M̂, SWa , and SWb at the
10% and 5% levels under DGPs S1–S2 and P1–P5. The M̂ test has reasonable
sizes under the DGPs S1 and S2 at both 10% and 5% levels. Under DGP S1
(AR(1)) the empirical levels of M̂ are very close to the nominal levels, especially
at the 5% level. When T = 100, M̂ tends to overreject a little under DGP S2
(ARCH(1)), but the overrejection is not excessive, and it improves as T increases.
The sizes of M̂ are not very sensitive to the choice of the preliminary lag order
p̄. The smoothed bootstrap procedure has reasonable sizes in small samples. We
note that the rejection rate of SWa decreases monotonically under DGP S1 and
reaches 2.8% at the 5% level when T = 500, but SWb has good sizes under both
DGPs.
Under DGPs P1–P5, X t is not Markov, and our test has reasonable power.
Under DGPs P1 and P4 (MA(1) and Markov Chain Regime-Switching), our test
dominates SWa and SWb for all sample sizes considered. Interestingly, SWa and
SWb have nonmonotonic power against DGP P4, and their rejection rates only
reach 10.4% and 7.2%, respectively, at the 5% level when T = 500. In contrast,
the power of M̂ is around 50% at the 5% level when T = 500. Under DGPs P2, P3,
and P5 (GARCH(1,1), GARCH-in-Mean, and Markov Chain Regime-Switching
ARCH), SWa and SWb perform slightly better in small samples, but the power
of our M̂ test increases more quickly with T , and our test outperforms SWa and
SWb when T = 500, which demonstrates the nice feature of our frequency domain
approach. The relative ranking between SWa and SWb does not display a very
clear pattern, but SWb is more powerful under DGPs P1–P3.
In summary, the new M̂ test with the smoothed bootstrap procedure delivers
reasonable size and omnibus power against various non-Markov alternatives in
small samples. It performs well relative to two existing tests SWa and SWb in
many cases.

TABLE 1. Size and power of the test

                 T = 100                         T = 250                         T = 500
          M̂     M̂     M̂   SWa   SWb |    M̂     M̂     M̂   SWa   SWb |    M̂     M̂     M̂   SWa   SWb
lag       10    15    20             |    10    15    20             |    10    15    20

Size
DGP S1: AR(1)
10%     .066  .088  .090  .080  .112 |  .094  .098  .096  .072  .090 |  .088  .086  .092  .058  .088
5%      .042  .042  .048  .040  .064 |  .036  .044  .048  .036  .050 |  .044  .048  .044  .028  .048
DGP S2: ARCH(1)
10%     .116  .122  .126  .164  .102 |  .082  .094  .098  .138  .100 |  .094  .092  .092  .086  .100
5%      .070  .064  .066  .102  .040 |  .046  .040  .040  .078  .058 |  .048  .048  .050  .050  .050

Power
DGP P1: MA(1)
10%     .278  .262  .236  .128  .138 |  .444  .424  .390  .156  .260 |  .718  .674  .616  .252  .360
5%      .156  .144  .136  .076  .072 |  .328  .300  .256  .098  .166 |  .622  .552  .508  .158  .264
DGP P2: GARCH(1,1)
10%     .172  .158  .150  .218  .234 |  .224  .242  .258  .210  .284 |  .440  .452  .446  .310  .372
5%      .084  .086  .078  .150  .160 |  .154  .166  .162  .136  .206 |  .274  .300  .296  .216  .234
DGP P3: GARCH-in-Mean
10%     .164  .168  .174  .188  .206 |  .348  .360  .366  .246  .340 |  .628  .648  .668  .362  .508
5%      .090  .102  .088  .114  .120 |  .224  .234  .246  .162  .246 |  .490  .540  .536  .254  .362
DGP P4: Markov Regime-Switching
10%     .244  .214  .202  .190  .134 |  .442  .384  .348  .180  .150 |  .666  .612  .578  .164  .120
5%      .156  .148  .140  .114  .078 |  .302  .270  .252  .094  .070 |  .550  .494  .458  .104  .072
DGP P5: Markov Chain Regime-Switching ARCH
10%     .174  .152  .154  .100  .188 |  .328  .320  .298  .364  .288 |  .626  .594  .590  .560  .388
5%      .098  .086  .082  .042  .112 |  .204  .202  .198  .262  .162 |  .496  .478  .456  .448  .240

Notes: (i) M̂ is our proposed omnibus test, given in (2.19); SWa and SWb are Su and White’s (2008) Hellinger metric test and Su and White’s (2007) characteristic function-based test,
respectively; (ii) 500 iterations and 100 bootstrap iterations for each simulation iteration.

5.2. Application to Financial Data

As documented by Hong and Li (2005), such popular spot interest rate continuous-
time models as Vasicek (1977), Cox et al. (1985), Chan, Karolyi, Longstaff, and
Sanders (1992), Ait-Sahalia (1996), and Ahn and Gao (1999) are all strongly
rejected with real interest rate data. They cannot capture the full dynamics of
the spot interest rates. Although works are still going on to add the richness of
model specification in terms of jumps and functional forms, the models proposed
continue to be Markov processes. In fact, the firm rejection of a continuous-time
model could be due to the violation of the Markov property, as speculated by Hong
and Li. If this is indeed the case, one should not attempt to look for flexible func-
tional forms within the class of Markov models. On the other hand, as discussed
earlier, an important conclusion of the asymmetric information microstructure
models (e.g., Easley and O’Hara, 1987, 1992) is that asset price sequences do not
follow a Markov process. It is interesting to check whether real stock prices are
consistent with this conjecture.
We apply our test to three important financial time series—stock prices, interest
rates, and foreign exchange rates—and compare it with SWa and SWb . We use
the Standard & Poor’s 500 price index, 7-day Eurodollar rate, and Japanese yen,
obtained from Datastream. The data are weekly series from 1 January 1988 to 31
December 2006. The weekly series are constructed from Wednesday observations
(if a Wednesday is a holiday, the preceding Tuesday is used); each series has
991 observations. The use of weekly data avoids the so-called weekend effect, as
well as other biases associated with nontrading, asynchronous rates, and so on,
which are often present in higher-frequency data. To examine the sensitivity of
our conclusion to the possible structural changes, we consider two subsamples:
1 January 1988 to 31 December 1997, for a total of 521 observations, and 1 Jan-
uary 1998 to 31 December 2006, for a total of 470 observations. Figure 1 provides
the time series plots, and Table 2 reports some descriptive statistics. The augmented
Dickey-Fuller test fails to reject a unit root in any of the three level
series but rejects one in their first-differenced series. Therefore, as is standard practice,
we use S&P 500 log returns, 7-day Eurodollar rate changes, and Japanese yen log
returns. To check possible structural changes, we use Inoue’s (2001) Kolmogorov-
Smirnov (KS) test for the stability of stationary distributions.16 Table 2 shows that
we are unable to reject the distribution stability hypothesis for all series in both
sample periods at the 5% level, and we are only able to reject the distribution sta-
bility hypothesis for 7-day Eurodollar rate changes for the full sample at the 10%
level. In addition, to check the robustness of our test to possible structural
breaks, we apply it to an AR(1) model with a structural break in a simulation
study (results are available upon request). This DGP is Markov but contains
a structural break, and our test does not overreject the null Markov hypothesis.
This suggests that our test may be robust to some forms of structural breaks
in practice.
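The weekly sampling rule and the transformations to log returns and rate changes can be sketched in Python as below. The function names and the dictionary-of-daily-levels input are hypothetical, and weeks with neither a Wednesday nor a Tuesday observation are simply skipped, which is an assumption not stated in the text.

```python
import math
from datetime import date, timedelta


def wednesdays(start, end):
    """Return every Wednesday between start and end, inclusive."""
    # date.weekday(): Monday is 0, so Wednesday is 2.
    first = start + timedelta(days=(2 - start.weekday()) % 7)
    n_weeks = (end - first).days // 7
    return [first + timedelta(weeks=k) for k in range(n_weeks + 1)]


def weekly_levels(daily, start, end):
    """Pick the Wednesday level; fall back to the preceding Tuesday.

    `daily` maps datetime.date -> level (hypothetical input format).
    Weeks with neither observation are skipped (an assumption).
    """
    out = []
    for wed in wednesdays(start, end):
        for day in (wed, wed - timedelta(days=1)):
            if day in daily:
                out.append(daily[day])
                break
    return out


def log_returns(levels):
    """Log returns, as used for the stock index and the exchange rate."""
    return [math.log(b / a) for a, b in zip(levels, levels[1:])]


def first_differences(levels):
    """Simple changes, as used for the interest rate."""
    return [b - a for a, b in zip(levels, levels[1:])]
```

Counting the dates returned by `wednesdays(date(1988, 1, 1), date(2006, 12, 31))` gives 991, which matches the number of weekly observations reported for the full sample.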

FIGURE 1. Financial time series plots.

Table 3 reports the test statistics and bootstrap p-values of our test, SWa, and
SWb. The bootstrap p-values, based on B = 500 bootstrap iterations, are computed
as described in Section 5.1. For all sample periods considered, the bootstrap
p-values of our test statistics are quite robust to the choice of the preliminary lag
order p̄. For the whole sample and the subsample of 1998 to 2006, we find strong
evidence against the Markov property for S&P 500 returns, 7-day Eurodollar rate
changes, and Japanese yen returns: all bootstrap p-values of our test are smaller
than 5%. For the subsample of 1988 to 1997, we reject the Markov property at the
5% level only for 7-day Eurodollar rate changes. The results of SWa and SWb
are mixed, and there seems to be no clear pattern across these two tests. For
example, at the 10% level, SWa rejects the Markov property only for S&P 500
returns and 7-day Eurodollar rate changes from 1998 to 2006, and SWb rejects it
only for S&P 500 returns from 1988 to 2006 and 7-day Eurodollar rate changes
from 1988 to 2006 and 1988 to 1997.
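The p-values in Table 3 are all multiples of 1/500, which is what a simple proportion over B = 500 bootstrap draws would produce. A minimal Python sketch of such a calculation follows. Section 5.1 is not reproduced here, so the upper-tail convention and the function name `bootstrap_p_value` are assumptions rather than the authors' exact procedure.

```python
def bootstrap_p_value(stat, boot_stats):
    """Share of bootstrap statistics at least as large as the observed one.

    Assumes an upper-tailed test: large values of the statistic are
    evidence against the null hypothesis.
    """
    if not boot_stats:
        raise ValueError("need at least one bootstrap statistic")
    exceed = sum(1 for s in boot_stats if s >= stat)
    return exceed / len(boot_stats)
```

Under this convention, a reported p-value of 0.0160 with B = 500 would correspond to 8 bootstrap statistics at or above the observed one.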
To gauge possible sources of the violation of the Markov property, we also
implement derivative tests M̂(m), m = 1, 2, 3, 4, as described in Section 4. Tests
and their results are reported in Table 4. We first consider S&P 500 returns. A
TABLE 2. Descriptive statistics for S&P 500, interest rate, and exchange rate
             01/01/1988 − 31/12/2006         01/01/1988 − 31/12/1997         01/01/1998 − 31/12/2006
             S&P       Eurodollar  JY        S&P       Eurodollar  JY        S&P       Eurodollar  JY
Sample size  991       991         991       521       521         521       470       470         470
Mean         0.0017    −0.0017     −0.0001   0.0025    −0.0025     0.0000    0.0008    −0.0004     −0.0002
Std          0.0209    0.3272      0.0145    0.0179    0.4087      0.0146    0.0238    0.2019      0.0145
ADF          −0.58     −1.19       −2.07     2.10      −1.00       −1.19     −1.78     −0.72       −2.42
             (0.8728)  (0.6808)    (0.2550)  (0.9999)  (0.7532)    (0.6786)  (0.3896)  (0.8395)    (0.1362)
KS           0.2828    0.0707      0.3939    0.6364    0.1010      0.1818    0.1818    0.1414      0.2828
Notes: ADF denotes the augmented Dickey-Fuller test; KS denotes the Kolmogorov-Smirnov test for the stability of stationary distributions proposed by Inoue (2001).

TABLE 3. Markov test for S&P 500, interest rate, and exchange rate
S&P 500 7-day Eurodollar rate Japanese yen
lag Statistics p-values Statistics p-values Statistics p-values
M̂ 01/01/1988 − 31/12/2006
10 0.86 0.0160 0.75 0.0080 1.34 0.0000
11 0.86 0.0160 0.98 0.0040 1.39 0.0000
12 0.89 0.0160 1.16 0.0040 1.52 0.0000
13 0.95 0.0160 1.35 0.0040 1.65 0.0000
14 1.01 0.0160 1.58 0.0020 1.76 0.0000
15 1.05 0.0180 1.79 0.0020 1.85 0.0000
16 1.07 0.0180 1.99 0.0020 1.94 0.0000
17 1.09 0.0200 2.22 0.0020 2.01 0.0000
18 1.11 0.0180 2.48 0.0020 2.08 0.0000
19 1.12 0.0180 2.74 0.0020 2.15 0.0000
20 1.13 0.0180 2.97 0.0000 2.21 0.0000
SWa 0.79 0.1680 −4.61 0.9920 0.09 0.4600
SWb 0.36 0.0940 0.21 0.0520 −0.85 0.5700
M̂ 01/01/1988 − 31/12/1997
10 −1.39 0.5940 0.25 0.0100 −0.35 0.0980
11 −1.39 0.6120 0.30 0.0100 −0.35 0.1060
12 −1.35 0.6080 0.34 0.0080 −0.27 0.1020
13 −1.30 0.5900 0.38 0.0060 −0.21 0.0980
14 −1.25 0.5840 0.41 0.0080 −0.16 0.0980
15 −1.20 0.5780 0.45 0.0080 −0.12 0.1040
16 −1.15 0.5600 0.49 0.0080 −0.08 0.1040
17 −1.08 0.5260 0.51 0.0080 −0.04 0.1060
18 −1.02 0.5080 0.54 0.0100 −0.01 0.1100
19 −0.96 0.4860 0.57 0.0100 0.03 0.1100
20 −0.91 0.4640 0.62 0.0080 0.06 0.1100
SWa −0.36 0.6540 −4.85 0.9900 0.16 0.3640
SWb −0.14 0.1680 0.07 0.0700 0.03 0.1100
M̂ 01/01/1998 − 31/12/2006
10 1.68 0.0080 0.34 0.0100 0.71 0.0100
11 1.88 0.0060 0.74 0.0040 0.76 0.0120
12 2.06 0.0040 1.08 0.0000 0.82 0.0140
13 2.22 0.0020 1.36 0.0000 0.88 0.0140
14 2.36 0.0020 1.62 0.0000 0.94 0.0140
15 2.48 0.0020 1.87 0.0000 0.98 0.0140
16 2.58 0.0020 2.09 0.0000 1.02 0.0100
17 2.66 0.0000 2.27 0.0000 1.06 0.0100
18 2.74 0.0000 2.44 0.0000 1.09 0.0100
19 2.81 0.0000 2.60 0.0000 1.11 0.0120
20 2.88 0.0000 2.75 0.0000 1.14 0.0120
SWa 1.12 0.0960 1.50 0.0180 0.63 0.2160
SWb −0.07 0.1520 −0.18 0.1740 −1.28 0.8600
Notes: (i) M̂ is our proposed omnibus test, given in (2.19); SWa and SWb are Su and White’s (2008) Hellinger metric
test and Su and White’s (2007) characteristic function-based test, respectively; (ii) 500 bootstrap iterations.

TABLE 4. Derivative tests for S&P 500, interest rate, and exchange rate
M̂ (1) M̂ (2) M̂ (3) M̂ (4)
S&P 500 1988–2006 0.4220 0.2380 0.2160 0.2220
S&P 500 1988–1997 0.6320 0.0140 0.2680 0.0280
S&P 500 1998–2006 0.7300 0.0000 0.7140 0.0040
7-day Eurodollar 1988–2006 0.0020 0.0000 0.0000 0.0000
7-day Eurodollar 1988–1997 0.0060 0.0060 0.0100 0.0140
7-day Eurodollar 1998–2006 0.0400 0.0440 0.1040 0.0520
Japanese yen 1988–2006 0.0780 0.0000 0.0200 0.0120
Japanese yen 1988–1997 0.0360 0.0760 0.1100 0.1440
Japanese yen 1998–2006 0.6048 0.1060 0.0360 0.1520

Notes: (i) M̂(m), m = 1, 2, 3, 4, are our proposed derivative tests, given in (4.1); (ii) the bootstrap p-values are
calculated by the smoothed nonparametric transition density-based bootstrap procedure described in Section 5 with
500 bootstrap iterations.

bit surprisingly, for the whole sample the four derivative tests M̂(m), m = 1, 2, 3, 4, all fail to
reject the Markov hypothesis. However, for the two subsamples, M̂(2) and M̂(4) reject the null
at the 5% level, while M̂(1) and M̂(3) do not reject the null hypothesis. These
results suggest that the violation of the Markov property may come from the con-
ditional variance and kurtosis dynamics of S&P 500 returns. For 7-day Eurodollar
rate changes, for both the whole sample and the first subsample, all four derivative
tests firmly reject the null hypothesis at the 5% level. For the second subsample,
M̂(1) and M̂(2) reject the null at the 5% level, but M̂(3) and M̂(4) do not. It
seems that the violation of the Markov property for the 7-day Eurodollar rate
comes from both mean and variance dynamics, and also possibly from higher-
order moment dynamics. For Japanese yen returns, the M̂(2), M̂(3), and M̂(4) tests
strongly reject the null hypothesis at the 5% level for the whole sample. However,
the results from both subsamples are less clear. For the first subsample, only
M̂(1) rejects the null hypothesis at the 5% level, and for the second subsample,
only M̂(3) rejects the null hypothesis at the 5% level. To sum up, for all three
financial series, we find strong evidence of violation of the Markov property in
the conditional variance, among other things. This is consistent with the popu-
lar use of such non-Markovian models as generalized autoregressive conditional
heteroskedasticity (GARCH) and stochastic volatility models in capturing the dy-
namics of price sequences in the literature.
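To see why a derivative-based test can isolate a single moment, recall that the m-th derivative of a characteristic function at the origin equals i^m times the m-th moment. The Python sketch below illustrates this for an unconditional empirical characteristic function. It is not the statistic in (4.1), and the function names are hypothetical.

```python
import cmath


def ecf(u, xs):
    """Empirical characteristic function: mean of exp(i * u * x)."""
    return sum(cmath.exp(1j * u * x) for x in xs) / len(xs)


def ecf_derivative_at_zero(m, xs):
    """m-th derivative of the ECF at u = 0, i.e. the mean of (i * x)**m."""
    return sum((1j * x) ** m for x in xs) / len(xs)


def sample_moment(m, xs):
    """m-th raw sample moment."""
    return sum(x ** m for x in xs) / len(xs)
```

For any sample `xs`, `ecf_derivative_at_zero(m, xs)` coincides with `(1j ** m) * sample_moment(m, xs)` up to rounding, so m = 1, 2, 3, 4 pick out the mean and the second, third, and fourth moments, respectively.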
As many financial time series have been documented to have a long memory
property, which is non-Markov, we also apply Lobato and Robinson’s (1998) test
for the long memory property. Results (not reported here) show that there is no
evidence of long memory for S&P 500 returns and Japanese yen returns in the
whole sample and two subsamples, while there is some evidence of long memory
for 7-day Eurodollar rate changes. Thus, we cannot rule out the possibility that
the rejection of the Markov property for the 7-day Eurodollar rate may be due to
long memory. Indeed, among the three time series, the evidence against the
Markov property is strongest for 7-day Eurodollar rate changes.
The documented evidence against the Markov property calls for some rethinking of
financial modeling. Although most popular stochastic differential equation
models exhibit mathematical elegance and tractability, they may not adequately
represent the dynamics of the underlying process because of the Markov
assumption. Other modeling schemes that allow for non-Markov behavior
may be needed to better capture the dynamics of financial time series.

6. CONCLUSION
The Markov property is one of the most fundamental properties of stochastic
processes. It has been taken for granted, without justification, in many
economic and financial models, especially in continuous-time finance models.
We propose a conditional characteristic function-based test for the Markov
property in a spectral framework. The conditional characteristic function,
which is consistently estimated nonparametrically, allows us to check departures
from the Markov property in all conditional moments, and the frequency domain
approach, which checks many lags in a pairwise manner, mitigates the curse of
dimensionality associated with testing for the Markov property. To overcome the
adverse impact of the first-stage nonparametric estimation of the conditional
characteristic function, we use a smoothed nonparametric transition
density-based bootstrap procedure, which provides reasonable size and power
for the proposed test in finite samples. We apply our test
to three important financial time series and find some evidence that the Markov
assumption may not be suitable for many financial time series.

NOTES
1. There are other existing tests for conditional independence of continuous variables in the literature.
Linton and Gozalo (1997) propose two nonparametric tests for conditional independence based on
a generalization of the empirical distribution function. Su and White (2007, 2008) check conditional
independence using the empirical characteristic function and the Hellinger distance, respectively. These
tests can be used to test the Markov property. However, they are expected to encounter the “curse of
dimensionality” problem because the Markov property implies that conditional independence must hold for
an infinite number of lags.
2. Here we focus on the Markov property of order 1, which is the main interest of economic and
financial modeling. However, our approach can be generalized to test the Markov property of order
p: $P(X_{t+1} \le x \mid I_t) = P(X_{t+1} \le x \mid X_t, X_{t-1}, \ldots, X_{t-p+1})$ for p fixed.
3. A multivariate Taylor series expansion can be obtained when d > 1. Since the expression is
tedious, we do not present it here.
4. The extension is substantial since we use nonparametric estimation in the first stage and {Z t (u)}
is not independent and identically distributed (i.i.d.) under H0 .
5. See Masry (1996a, 1996b) for detailed explanations of these notations.
6. If W (u) is differentiable, then this implies that its derivative (∂/∂u a )W (u) is an even function
of u a for a = 1, ..., d.
7. If $X_t$ takes on discrete values, we can estimate $\phi(u|X_t)$ via a frequency approach, namely
replacing $K_h(x - X)$ with $1(x - X)$, where $1(\cdot)$ is the indicator function. If $X_t$ is a mix of discrete
and continuous variables, e.g., $X_t = (X_t^d, X_t^c)$, where $X_t^d$ and $X_t^c$ denote discrete and continuous
components, respectively, following Li and Racine (2007), we can replace $K_h(\cdot)$ with

$$W_\gamma(x, X_s) = K_h(x^c - X_s^c)\, L_\lambda(x^d, X_s^d),$$

where $\gamma = (h, \lambda)$ and $L_\lambda(\cdot)$ is the kernel function for the discrete components, defined as

$$L_\lambda(x^d, X_s^d) = \prod_{a=1}^{d} \lambda_a^{1(X_{as}^d \neq x_a^d)},$$

where $0 \le \lambda_a \le 1$ is the smoothing parameter for $X_s^d$. Once we get a consistent estimator for $\phi(u|X_t)$,
we can calculate the generalized residual and construct the test statistic.
8. The proof strategy depends on Assumption 2. It seems plausible that one may relax Assumption 2
and rely on a more general central limit theorem for degenerate U-statistics (e.g., Theorem 2.1
of Gao and Hong, 2008), but we may have to impose a more restrictive mixing condition
as the cost. Due to its complexity, this is left for future research. On the other hand,
Assumptions 1 and 2 do not imply each other. For example, consider a long memory process
$X_t = \sum_{j=0}^{\infty} \phi_j \varepsilon_{t-j}$, where $\{\varepsilon_t\} \sim \text{i.i.d.}(0, 1)$ and
$\phi_j = \Gamma(j + d)/[\Gamma(d)\,\Gamma(j + 1)] \approx \Gamma^{-1}(d)\, j^{d-1}$ as $j \to \infty$,
where $\Gamma(\cdot)$ is the Gamma function. Define $X_{qt} = \sum_{j=0}^{q} \phi_j \varepsilon_{t-j}$, a q-dependent process. Then we
have $E(X_t - X_{qt})^2 \approx \Gamma^{-2}(d)\, q^{-1+2d} \cdot q^{-1} \sum_{j=q+1}^{\infty} (j/q)^{-2(1-d)} = O(q^{-1+2d})$. Hence,
Assumption 2 holds if $0 < d \le \tfrac{1}{4}$, but Assumption 1 is violated since $\{X_t\}$ is not a strictly stationary
β-mixing process.
9. Alternatively, we could impose Hansen’s (2008) Assumption 3 on kernel functions, namely, for
some $\Lambda < \infty$ and $L < \infty$, either $K(u) = 0$ for $\|u\| > L$ and, for all $u, u' \in \mathbb{R}^d$,
$|K(u) - K(u')| \le \Lambda \|u - u'\|$; or $K(u)$ is differentiable, $|(\partial/\partial u) K(u)| \le \Lambda$, and for some
$\nu > 1$, $|(\partial/\partial u) K(u)| \le \Lambda \|u\|^{-\nu}$ for $\|u\| > L$, where $\|u\| \equiv \max(|u_1|, \ldots, |u_d|)$.
Here the kernel function is required either to have truncated support and be Lipschitz, or to have
a bounded derivative with an integrable tail.
Our proof could go through with this assumption, but the trade-off is a stronger requirement on
the bandwidth. Since the choice of the bandwidth is more important than the choice of the kernel,
and many commonly used kernels have compact support, we only consider the case of the compact
support of the kernel K(u) in our formal analysis. Nevertheless, we examine the effect of allowing
kernels with support on $\mathbb{R}^d$ in our simulation study.
10. It is different from Paparoditis and Politis (2000), which requires different bandwidths. The
reason why the same bandwidth works in our paper is that we use undersmoothing in the first stage,
and the bias of the first-stage nonparametric estimation vanishes to 0 asymptotically. Therefore, we
need not balance two bandwidths to obtain a good approximation of the asymptotic bias. This idea is
shown in Theorem 2.1 (i) of Paparoditis and Politis (2000) in a different context.
11. Our simulation experiments show that results based on these two smoothed bootstrap procedures
are very similar.
12. We thank Liangjun Su for providing the Matlab code for computing Su and White’s (2007, 2008)
tests of conditional independence.
13. We first generate 15 grid points $u_0$, $v_0$ from $N(0, 1)$ and obtain $u = [u_0, -u_0]$ and $v = [v_0, -v_0]$
to ensure symmetry. Preliminary experiments with different numbers of grid points show that
simulation results are not very sensitive to the number of grid points. Given the computational
cost of the simulation study, we settle for the current results with 30 grid points.
14. We have tried the Parzen kernel for k (·) , obtaining similar results (not reported here).
15. Following Ait-Sahalia (1997) and Amaro de Matos and Fernandes (2007), we use undersmoothing
to ensure that the squared bias vanishes to zero faster than the variance. On the other hand, we have
used the smoothed nonparametric conditional density bootstrap procedure, and hence the simulation
results are expected not to be very sensitive to the choice of the bandwidth.
16. Ideally, the conditional distribution of Xt given Xt−1 should be tested. Unfortunately, to our
knowledge, no such test is available in the literature. Compared with some existing tests in the
literature, Inoue’s tests are model-free, allow for dependence in the data, and are robust against the
heavy-tailed distributions observed in financial markets. Hence, they are most suitable here for pre-
liminary testing.
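
The mixed kernel described in Note 7 can be sketched in Python as follows. The Epanechnikov kernel, the per-coordinate scaling K(z/h)/h, the common bandwidth h, and all function names are assumptions made for illustration. Only the product structure and the rule that a mismatch in discrete component a contributes a factor λ_a are taken from the note and from Li and Racine (2007).

```python
def epanechnikov(z):
    """Compactly supported kernel on [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def continuous_kernel(xc, Xc, h):
    """Product kernel over continuous components with common bandwidth h."""
    w = 1.0
    for a, b in zip(xc, Xc):
        w *= epanechnikov((a - b) / h) / h
    return w


def discrete_kernel(xd, Xd, lam):
    """Product over discrete components of lam[a] ** 1(Xd[a] != xd[a])."""
    w = 1.0
    for a, b, lam_a in zip(xd, Xd, lam):
        if a != b:
            w *= lam_a  # a mismatch is down-weighted by lam_a in [0, 1]
    return w


def mixed_kernel(xc, Xc, xd, Xd, h, lam):
    """Weight for mixed data: continuous part times discrete part."""
    return continuous_kernel(xc, Xc, h) * discrete_kernel(xd, Xd, lam)
```

Setting every λ_a to 0 makes the discrete part an exact-match indicator, which corresponds to the frequency approach, while λ_a = 1 ignores component a altogether.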

REFERENCES

Aaronson, J., D. Gilat, & M. Keane (1992) On the structure of 1-dependent Markov chains. Journal
of Theoretical Probability 5, 545–561.
Ahn, D., R. Dittmar, & A.R. Gallant (2002) Quadratic term structure models: Theory and evidence.
Review of Financial Studies 15, 243–288.
Ahn, D. & B. Gao (1999) A parametric nonlinear model of term structure dynamics. Review of Finan-
cial Studies 12, 721–762.
Ait-Sahalia, Y. (1996) Testing continuous-time models of the spot interest rate. Review of Financial
Studies 9, 385–426.
Ait-Sahalia, Y. (1997) Do Interest Rates Really Follow Continuous-Time Markov Diffusions? Work-
ing paper, Princeton University.
Ait-Sahalia, Y., J. Fan, & H. Peng (2009) Nonparametric transition-based tests for diffusions. Journal
of the American Statistical Association 104, 1102–1116.
Amaro de Matos, J. & M. Fernandes (2007) Testing the Markov property with high frequency data.
Journal of Econometrics 141, 44–64.
Amaro de Matos, J. & J. Rosario (2000) The Equilibrium Dynamics for an Endogenous Bid-Ask
Spread in Competitive Financial Markets. Working paper, European University Institute and
Universidade Nova de Lisboa.
Anderson, T. & J. Lund (1997) Estimating continuous time stochastic volatility models of the short
term interest rate. Journal of Econometrics 77, 343–377.
Aviv, Y. & A. Pazgal (2005) A partially observed Markov decision process for dynamic pricing.
Management Science 51, 1400–1416.
Bangia, A., F. Diebold, A. Kronimus, C. Schagen, & T. Schuermann (2002) Ratings migration and the
business cycle, with application to credit portfolio stress testing. Journal of Banking and Finance
26, 445–474.
Bierens, H. (1982) Consistent model specification tests. Journal of Econometrics 20, 105–134.
Blume, L., D. Easley, & M. O’Hara (1994) Market statistics and technical analysis: The role of
volume. Journal of Finance 49, 153–181.
Brown, B.M. (1971) Martingale limit theorems. Annals of Mathematical Statistics 42, 59–66.
Chacko, G. & L. Viceira (2003) Spectral GMM estimation of continuous-time processes. Journal of
Econometrics 116, 259–292.
Chan, K.C., G.A. Karolyi, F.A. Longstaff, & A.B. Sanders (1992) An empirical comparison of alter-
native models of the short-term interest rate. Journal of Finance 47, 1209–1227.
Chen, B. & Y. Hong (2009) Diagnosing Multivariate Continuous-Time Models with Application to
Affine Term Structure Models. Working paper, Cornell University and University of Rochester.
Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. Journal of the
American Statistical Association 74, 829–836.
Cox, J.C., J.E. Ingersoll, & S.A. Ross (1985) A theory of the term structure of interest rates. Econo-
metrica 53, 385–407.
Dai, Q. & K. Singleton (2000) Specification analysis of affine term structure models. Journal of
Finance 55, 1943–1978.
Darsow, W.F., B. Nguyen, & E.T. Olsen (1992) Copulas and Markov processes. Illinois Journal of
Mathematics 36, 600–642.
Davies, R.B. (1977) Hypothesis testing when a nuisance parameter is present only under the alterna-
tive. Biometrika 64, 247–254.
Davies, R.B. (1987) Hypothesis testing when a nuisance parameter is present only under the alterna-
tive. Biometrika 74, 33–43.
Duan, J.C. & K. Jacobs (2008) Is long memory necessary? An empirical investigation of nonnegative
interest rate processes. Journal of Empirical Finance 15, 567–581.
Duffie, D. & R. Kan (1996) A yield-factor model of interest rates. Mathematical Finance 6,
379–406.
Duffie, D., J. Pan, & K. Singleton (2000) Transform analysis and asset pricing for affine jump-
diffusions. Econometrica 68, 1343–1376.
Easley, D. & M. O’Hara (1987) Price, trade size, and information in securities markets. Journal of
Financial Economics 19, 69–90.
Easley, D. & M. O’Hara (1992) Time and the process of security price adjustment. Journal of Finance
47, 577–605.
Edwards, R. & J. Magee (1966) Technical Analysis of Stock Trends. John Magee.
Epps, T.W. & L.B. Pulley (1983) A test for normality based on the empirical characteristic function.
Biometrika 70, 723–726.
Ericson, R. & A. Pakes (1995) Markov-perfect industry dynamics: A framework for empirical work.
Review of Economic Studies 62(1), 53–82.
Fan, J. (1992) Design-adaptive nonparametric regression. Journal of the American Statistical Associ-
ation 87, 998–1004.
Fan, J. (1993) Local linear regression smoothers and their minimax efficiency. Annals of Statistics 21,
196–216.
Fan, J. & Q. Yao (2003) Nonlinear Time Series: Nonparametric and Parametric Methods. Springer
Verlag.
Fan, Y. & Q. Li (1999) Root-N -consistent estimation of partially linear time series models. Journal of
Nonparametric Statistics 11, 251–269.
Fan, Y., Q. Li, & I. Min (2006) A nonparametric bootstrap test of conditional distributions. Economet-
ric Theory 22, 587–613.
Feller, W. (1959) Non-Markovian processes with the semi-group property. Annals of Mathematical
Statistics 30, 1252–1253.
Feuerverger, A. & P. McDunnough (1981) On the efficiency of empirical characteristic function pro-
cedures. Journal of the Royal Statistical Society, Series B 43, 20–27.
Gallant, A.R., D. Hsieh, & G. Tauchen (1997) Estimation of stochastic volatility models with diag-
nostics. Journal of Econometrics 81, 159–192.
Gao, J. & Y. Hong (2008) Central limit theorems for generalized U-statistics with applications in
nonparametric specification. Journal of Nonparametric Statistics 20, 61–76.
Hall, R. (1978) Stochastic implications of the life cycle permanent income hypothesis: Theory and
practice. Journal of Political Economy 86, 971–987.
Hamilton, J.D. (1989) A new approach to the economic analysis of nonstationary time series and the
business cycle. Econometrica 57, 357–384.
Hamilton, J.D. & R. Susmel (1994) Autoregressive conditional heteroskedasticity and changes in
regime. Journal of Econometrics 64, 307–333.
Hansen, B.E. (1996) Inference when a nuisance parameter is not identified under the null hypothesis.
Econometrica 64, 413–430.
Hansen, B.E. (2008) Uniform convergence rates for kernel estimation with dependent data. Econo-
metric Theory 24, 726–748.
Heath, D., R. Jarrow, & A. Morton (1992) Bond pricing and the term structure of interest rates: A new
methodology for contingent claims valuation. Econometrica 60, 77–105.
Hong, Y. (1999) Hypothesis testing in time series via the empirical characteristic function: A gener-
alized spectral density approach. Journal of the American Statistical Association 94, 1201–1220.
Hong, Y. & H. Li (2005) Nonparametric specification testing for continuous-time models with appli-
cations to term structure of interest rates. Review of Financial Studies 18, 37–84.
Hong, Y. & H. White (2005) Asymptotic distribution theory for an entropy-based measure of serial
dependence. Econometrica 73, 837–902.
Horowitz, J.L. (2003) Bootstrap methods for Markov processes. Econometrica 71, 1049–1082.
Ibragimov, R. (2007) Copula-Based Characterizations for Higher-Order Markov Processes. Working
paper, Harvard University.
Inoue, A. (2001) Testing for distributional change in time series. Econometric Theory 17, 156–187.
Jarrow, R., D. Lando, & S. Turnbull (1997) A Markov model for the term structure of credit risk
spreads. Review of Financial Studies 10, 481–523.
Jarrow, R. & S. Turnbull (1995) Pricing derivatives on financial securities subject to credit risk. Journal
of Finance 50, 53–86.
Jiang, G. & J. Knight (1997) A nonparametric approach to the estimation of diffusion processes with
an application to a short-term interest rate model. Econometric Theory 13, 615–645.
Kavvathas, D. (2001) Estimating Credit Rating Transition Probabilities for Corporate Bonds. Working
paper, University of Chicago.
Kiefer, N.M. & C.E. Larson (2004) Testing Simple Markov Structures for Credit Rating Transitions.
Working paper, Cornell University.
Kim, W. & O. Linton (2003) A Local Instrumental Variable Estimation Method for Generalized
Additive Volatility Models. Working paper, Humboldt University of Berlin, London School of
Economics.
Kydland, F.E. & E. Prescott (1982) Time to build and aggregate fluctuations. Econometrica 50,
1345–70.
Lando, D. & T. Skødeberg (2002) Analyzing rating transitions and rating drift with continuous
observations. Journal of Banking & Finance 26, 423–444.
LeBaron, B. (1999) Technical trading rule profitability and foreign exchange intervention. Journal of
International Economics 49, 125–143.
Lee, A.J. (1990) U-Statistics: Theory and Practice. Marcel Dekker.
Lévy, P. (1949) Exemple de processus pseudo-markoviens. Comptes Rendus de l’Académie des Sci-
ences 228, 2004–2006.
Li, Q. & J.S. Racine (2007) Nonparametric Econometrics: Theory and Practice. Princeton University
Press.
Linton, O. & P. Gozalo (1997) Conditional Independence Restrictions: Testing and Estimation. Work-
ing paper, Cowles Foundation for Research in Economics, Yale University.
Ljungqvist, L. & T.J. Sargent (2000) Recursive Macroeconomic Theory. MIT Press.
Lobato, I.N. & P.M. Robinson (1998) A nonparametric test for I(0). Review of Economic Studies 65,
475–495.
Loretan, M. & P.C.B. Phillips (1994) Testing the covariance stationarity of heavy-tailed time series: An
overview of the theory with applications to several financial datasets. Journal of Empirical Finance
1, 211–248.
Lucas, R. (1978) Asset prices in an exchange economy. Econometrica 46, 1429–45.
Lucas, R. (1988) On the mechanics of economic development. Journal of Monetary Economics 22,
3–42.
Lucas, R. & E. Prescott (1971) Investment under uncertainty. Econometrica 39, 659–81.
Lucas, R. & N.L. Stokey (1983) Optimal fiscal and monetary policy in an economy without capital.
Journal of Monetary Economics 12, 55–94.
Masry, E. (1996a) Multivariate local polynomial regression for time series: Uniform strong consis-
tency and rates. Journal of Time Series Analysis 6, 571–599.
Masry, E. (1996b) Multivariate regression estimation local polynomial fitting for time series. Stochas-
tic Processes and Their Applications 65, 81–101.
Masry, E. & J. Fan (1997) Local polynomial estimation of regression functions for mixing processes.
Scandinavian Journal of Statistics 24, 165–179.
Masry, E. & D. Tjøstheim (1997) Additive nonlinear ARX time series and projection estimates.
Econometric Theory 13, 214–252.
Matús, F. (1996) On two-block-factor sequences and one-dependence. Proceedings of the American
Mathematical Society 124, 1237–1242.
Matús, F. (1998) Combining m-dependence with Markovness. Annales de l’Institut Henri Poincaré.
Probabilités et Statistiques 34, 407–423.
Mehra, R. & E. Prescott (1985) The equity premium: A puzzle. Journal of Monetary Economics 15,
145–61.
Mizutani, E. & S. Dreyfus (2004) Two stochastic dynamic programming problems by model-free
actor-critic recurrent network learning in non-Markovian settings. Proceedings of the IEEE-INNS
International Joint Conference on Neural Networks.
Pagan, A.R. & G.W. Schwert (1990) Testing for covariance stationarity in stock market data. Eco-
nomics Letters 33, 165–70.
Paparoditis, E. & D.N. Politis (2000) The local bootstrap for kernel estimators under general depen-
dence conditions. Annals of the Institute of Statistical Mathematics 52, 139–159.
Paparoditis, E. & D.N. Politis (2002) The local bootstrap for Markov processes. Journal of Statistical
Planning and Inference 108, 301–328.
Platen, E. & R. Rebolledo (1996) Principles for modelling financial markets. Journal of Applied Prob-
ability 31, 601–613.
Romer, P. (1986) Increasing returns and long-run growth. Journal of Political Economy 5,
1002–1037.
Romer, P. (1990) Endogenous technological change. Journal of Political Economy 5, 71–102.
Rosenblatt, M. (1960) An aggregation problem for Markov chains. In R.E. Machol (ed.), Information
and Decision Processes, pp. 87–92. McGraw-Hill.
Rosenblatt, M. & D. Slepian (1962) N th order Markov chains with every N variables independent.
Journal of the Society for Industrial and Applied Mathematics 10, 537–549.
Ruppert, D. & M.P. Wand (1994) Multivariate weighted least squares regression. Annals of Statistics
22, 1346–1370.
Rust, J. (1994) Structural estimation of Markov decision processes. Handbook of Econometrics 4,
3081–3143.
Sargent, T. (1987) Dynamic Macroeconomic Theory. Harvard University Press.
Singleton, K. (2001) Estimation of affine asset pricing models using the empirical characteristic func-
tion. Journal of Econometrics 102, 111–141.
Skaug, H.J. & D. Tjøstheim (1993) Nonparametric tests of serial independence. In T. Subba Rao (ed.),
Developments in Time Series Analysis: The Priestley Birthday Volume, pp. 207–229. Chapman &
Hall.
Skaug, H.J. & D. Tjøstheim (1996) Measures of distance between densities with application to testing
for serial independence. In P. Robinson & M. Rosenblatt (eds.), Time Series Analysis in Memory of
E. J. Hannan, pp. 363–377. Springer.
Stinchcombe, M.B. & H. White (1998) Consistent specification testing with nuisance parameters
present only under the alternative. Econometric Theory 14, 295–325.
Stone, C.J. (1977) Consistent nonparametric regression. Annals of Statistics 5, 595–645.
Su, L. & H. White (2007) A consistent characteristic-function-based test for conditional independence.
Journal of Econometrics 141, 807–834.
Su, L. & H. White (2008) Nonparametric Hellinger metric test for conditional independence. Econo-
metric Theory 24, 829–864.
Uzawa, H. (1965) Optimum technical change in an aggregative model of economic growth. Interna-
tional Economic Review 6, 18–31.
Vasicek, O. (1977) An equilibrium characterization of the term structure. Journal of Financial Eco-
nomics 5, 177–188.
Weintraub, G.Y., L.C. Benkard, & B. Van Roy (2008) Markov perfect industry dynamics with many
firms. Econometrica 76, 1375–1411.
Yoshihara, K. (1976) Limiting behavior of U-statistics for stationary, absolutely regular processes.
Z. Wahrsch. Verw. Gebiete 35, 237–252.
Zhu, X. (1992) Optimal fiscal policy in a stochastic growth model. Journal of Economic Theory 2,
250–289.

APPENDIX
Throughout the Appendix, we let M̃ be defined in the same way as M̂ in (2.19) with
Ẑ t (u) replaced by Z t (u). Also, C ∈ (1, ∞) denotes a generic bounded constant.

Proof of Theorem 1. The proof of Theorem 1 consists of the proofs of Theorems A.1–A.3 below. ∎

THEOREM A.1. Under the conditions of Theorem 1, $\hat M - \tilde M \overset{p}{\to} 0$.

THEOREM A.2. Let $\tilde M_q$ be defined as $\tilde M$, with $\{X_{q,t}\}_{t=1}^{T}$ replacing $\{X_t\}_{t=1}^{T}$ and
$\{\phi_{qt}(u)\}_{t=1}^{T}$ replacing $\{\phi(u|X_{t-1})\}_{t=1}^{T}$, where $\{X_{q,t}\}$ and $\phi_{qt}(u)$ are as in Assumption 2.
Then under the conditions of Theorem 1 and $q = p^{1+1/(4b-2)} (\ln^2 T)^{1/(2b-1)}$,
$\tilde M_q - \tilde M \overset{p}{\to} 0$.

THEOREM A.3. Under the conditions of Theorem 1 and $q = p^{1+1/(4b-2)} (\ln^2 T)^{1/(2b-1)}$,
$\tilde M_q \overset{d}{\to} N(0, 1)$.

Proof of Theorem A.1. Put $T_j \equiv T - |j|$, and let $\tilde\Gamma_j(u, v)$ be defined in the same way
as $\hat\Gamma_j(u, v)$ in (2.11), with $\hat Z_t(u)$ replaced by $Z_t(u)$. To show $\hat M - \tilde M \overset{p}{\to} 0$, it suffices to
show

$$\hat D^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j \left[ |\hat\Gamma_j(u, v)|^2 - |\tilde\Gamma_j(u, v)|^2 \right] dW(u)\, dW(v) \overset{p}{\to} 0, \tag{A.1}$$

$p^{-1}(\hat C - \tilde C) = O_P(T^{-1/2})$, and $p^{-1}(\hat D - \tilde D) = o_P(1)$, where $\tilde C$ and $\tilde D$ are defined in the
same way as $\hat C$ and $\hat D$ in (2.19), respectively, with $\hat Z_t(u)$ replaced by $Z_t(u)$. For space, we
focus on the proof of (A.1); the proofs for $p^{-1}(\hat C - \tilde C) = O_P(T^{-1/2})$ and $p^{-1}(\hat D - \tilde D) = o_P(1)$
are straightforward. We note that it is necessary to obtain the convergence rate
$O_P(pT^{-1/2})$ for $\hat C - \tilde C$ to ensure that replacing $\hat C$ with $\tilde C$ has asymptotically negligible
impact given $p/T \to 0$.
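To show (A.1), we first decompose

$$\iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j \left[ |\hat{\Gamma}_j(u,v)|^2 - |\tilde{\Gamma}_j(u,v)|^2 \right] dW(u)\, dW(v) = \hat{A}_1 + 2\,\mathrm{Re}(\hat{A}_2), \tag{A.2}$$

where

$$\hat{A}_1 = \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j \left| \hat{\Gamma}_j(u,v) - \tilde{\Gamma}_j(u,v) \right|^2 dW(u)\, dW(v),$$

$$\hat{A}_2 = \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j \left[ \hat{\Gamma}_j(u,v) - \tilde{\Gamma}_j(u,v) \right] \tilde{\Gamma}_j(u,v)^{*}\, dW(u)\, dW(v),$$

where $\mathrm{Re}(\hat{A}_2)$ is the real part of $\hat{A}_2$, and $\tilde{\Gamma}_j(u,v)^{*}$ is the complex conjugate of $\tilde{\Gamma}_j(u,v)$. Then (A.1) follows from Propositions A.1 and A.2 below, and $p \to \infty$ as $T \to \infty$. ∎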
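The decomposition in (A.2) rests on the pointwise identity |a|^2 − |b|^2 = |a − b|^2 + 2 Re[(a − b) b*] for complex numbers a and b. The following is a minimal numerical sketch of that identity only; the function name `split_terms` and the random draws standing in for the two generalized covariances are illustrative assumptions, not part of the paper.

```python
import random

def split_terms(a, b):
    """Return |a-b|^2 and (a-b)*conj(b), the integrands behind A1-hat and A2-hat."""
    d = a - b
    return abs(d) ** 2, d * b.conjugate()

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    # a and b are arbitrary complex numbers (hypothetical stand-ins for the
    # estimated and infeasible generalized covariances at one (j, u, v)).
    a = complex(random.gauss(0, 1), random.gauss(0, 1))
    b = complex(random.gauss(0, 1), random.gauss(0, 1))
    squared, cross = split_terms(a, b)
    lhs = abs(a) ** 2 - abs(b) ** 2
    rhs = squared + 2 * cross.real
    max_gap = max(max_gap, abs(lhs - rhs))

print(max_gap < 1e-12)
```

Because the identity holds pointwise in (j, u, v), weighting by k^2(j/p) T_j, summing over j, and integrating against W preserves it, which is all that (A.2) uses.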
PROPOSITION A.1. Under the conditions of Theorem 1, $p^{-1/2} \hat{A}_1 \xrightarrow{p} 0$.

PROPOSITION A.2. Under the conditions of Theorem 1, $p^{-1/2} \hat{A}_2 \xrightarrow{p} 0$.

Proof of Proposition A.1. Put $\psi_t(v) \equiv e^{iv'X_t} - \phi(v)$ and $\phi(v) \equiv E(e^{iv'X_t})$. Then straightforward algebra yields that for $j > 0$,

$$\begin{aligned}
\hat{\Gamma}_j(u,v) - \tilde{\Gamma}_j(u,v)
&= T_j^{-1} \sum_{t=j+1}^{T} \left[ \phi(u|X_{t-1}) - \hat{\phi}(u|X_{t-1}) \right] \psi_{t-j}(v)
 + \left[ \phi(v) - \hat{\phi}(v) \right] T_j^{-1} \sum_{t=j+1}^{T} \left[ \phi(u|X_{t-1}) - \hat{\phi}(u|X_{t-1}) \right] \\
&= \hat{B}_{1j}(u,v) + \hat{B}_{2j}(u,v), \text{ say.}
\end{aligned} \tag{A.3}$$

It follows that $\hat{A}_1 \le 2 \sum_{a=1}^{2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, |\hat{B}_{aj}(u,v)|^2\, dW(u)\, dW(v)$. Proposition A.1 follows from Lemmas A.1 and A.2 below. ∎

Lemma A.1. $p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, |\hat{B}_{1j}(u,v)|^2\, dW(u)\, dW(v) = o_P(1)$.

Lemma A.2. $p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, |\hat{B}_{2j}(u,v)|^2\, dW(u)\, dW(v) = o_P(1)$.

We now show these lemmas. Throughout, we put $a_T(j) \equiv k^2(j/p)\, T_j^{-1}$.
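The split in (A.3) is pure algebra, so it can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the series is a scalar Gaussian AR(1), the "estimated" conditional characteristic function `phi_cond_hat` is a hypothetical perturbation of the true one rather than a nonparametric estimate, and the generalized covariance is taken to be the sample average of Z_t(u) times the centred exponential at lag j, a form inferred from (A.3) since (2.11) is not reproduced in this Appendix.

```python
import cmath
import math
import random

random.seed(1)
rho, T, j, u, v = 0.5, 400, 3, 0.7, -1.2

# Simulate X_0, ..., X_T from a Gaussian AR(1) (illustrative choice only).
x = [random.gauss(0, 1 / math.sqrt(1 - rho ** 2))]
for _ in range(T):
    x.append(rho * x[-1] + random.gauss(0, 1))

def phi_cond(u, x_prev):
    """True conditional characteristic function of X_t given X_{t-1}."""
    return cmath.exp(1j * u * rho * x_prev - 0.5 * u ** 2)

def phi_cond_hat(u, x_prev):
    """Hypothetical stand-in for an estimate (deliberately uses a wrong rho)."""
    return cmath.exp(1j * u * (rho + 0.05) * x_prev - 0.5 * u ** 2)

phi_v = math.exp(-0.5 * v ** 2 / (1 - rho ** 2))   # marginal cf of X_t at v
phi_v_hat = sum(cmath.exp(1j * v * x[t]) for t in range(1, T + 1)) / T

Tj = T - j
ts = list(range(j + 1, T + 1))

def gamma(cond_cf):
    """Sample generalized covariance at lag j built from a given conditional cf."""
    return sum((cmath.exp(1j * u * x[t]) - cond_cf(u, x[t - 1]))
               * (cmath.exp(1j * v * x[t - j]) - phi_v_hat) for t in ts) / Tj

# Estimation error of the conditional cf at each t, as it appears in (A.3).
err = [phi_cond(u, x[t - 1]) - phi_cond_hat(u, x[t - 1]) for t in ts]
b1 = sum(e * (cmath.exp(1j * v * x[t - j]) - phi_v) for e, t in zip(err, ts)) / Tj
b2 = (phi_v - phi_v_hat) * sum(err) / Tj

gap = abs((gamma(phi_cond_hat) - gamma(phi_cond)) - (b1 + b2))
bound_ok = abs(b1 + b2) ** 2 <= 2 * (abs(b1) ** 2 + abs(b2) ** 2)
print(gap < 1e-12, bound_ok)
```

The second check is the elementary inequality |b1 + b2|^2 ≤ 2(|b1|^2 + |b2|^2), which is what turns (A.3) into the bound on the first term of (A.2) and lets the two lemmas be treated separately.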

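Proof of Lemma A.1. We write

$$\begin{aligned}
\hat{B}_{1j}(u,v)
&= T_j^{-1} \sum_{t=j+1}^{T} \left[ \phi(u|X_{t-1}) - \sum_{s=2}^{T} \hat{W}\!\left( \frac{X_{s-1} - X_{t-1}}{h} \right) e^{iu'X_s} \right] \psi_{t-j}(v) \\
&= -T_j^{-1} \sum_{t=j+1}^{T} \sum_{s=2}^{T} \hat{W}\!\left( \frac{X_{s-1} - X_{t-1}}{h} \right) \left[ \phi(u|X_{s-1}) - \phi(u|X_{t-1}) \right] \psi_{t-j}(v) \\
&\quad - T_j^{-1} \sum_{t=j+1}^{T} \sum_{s=2}^{T} \hat{W}\!\left( \frac{X_{s-1} - X_{t-1}}{h} \right) Z_s(u)\, \psi_{t-j}(v) \\
&= -\hat{B}_{11j}(u,v) - \hat{B}_{12j}(u,v), \text{ say.}
\end{aligned} \tag{A.4}$$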
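The step from the first to the second line of (A.4) uses only the fact that the smoothing weights sum to one over s. A minimal sketch of that step follows, under loud assumptions: it uses Nadaraya-Watson (local constant) weights with a Gaussian kernel in place of the local polynomial weights of the paper, a scalar Gaussian AR(1) series, and hypothetical helper names (`nw_weights`, `phi_cond`).

```python
import cmath
import math
import random

random.seed(2)
rho, T, h, u = 0.5, 200, 0.4, 0.9

# Simulate X_0, ..., X_T from a Gaussian AR(1) (illustrative choice only).
x = [0.0]
for _ in range(T):
    x.append(rho * x[-1] + random.gauss(0, 1))

def phi_cond(u, x_prev):
    """True conditional characteristic function of X_t given X_{t-1}."""
    return cmath.exp(1j * u * rho * x_prev - 0.5 * u ** 2)

s_range = list(range(2, T + 1))

def nw_weights(x0):
    """Local constant weights over s = 2..T centred at x0; they sum to one."""
    k = [math.exp(-0.5 * ((x[s - 1] - x0) / h) ** 2) for s in s_range]
    total = sum(k)
    return [ki / total for ki in k]

t = 50
w = nw_weights(x[t - 1])

# Left side: true conditional cf minus its kernel-weighted estimate at X_{t-1}.
lhs = phi_cond(u, x[t - 1]) - sum(wi * cmath.exp(1j * u * x[s])
                                  for wi, s in zip(w, s_range))
# Smoothing-bias part: weighted differences of the true conditional cf.
bias = sum(wi * (phi_cond(u, x[s - 1]) - phi_cond(u, x[t - 1]))
           for wi, s in zip(w, s_range))
# Noise part: weighted innovations Z_s(u) = exp(iuX_s) - phi(u | X_{s-1}).
noise = sum(wi * (cmath.exp(1j * u * x[s]) - phi_cond(u, x[s - 1]))
            for wi, s in zip(w, s_range))

gap = abs(lhs - (-bias - noise))
print(abs(sum(w) - 1.0) < 1e-12, gap < 1e-12)
```

The bias part corresponds to the term handled through the Taylor expansion in (A.5), and the noise part to the term that is later rewritten as a U-statistic.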

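For the first term, we further decompose

$$\begin{aligned}
\hat{B}_{11j}(u,v)
&= T_j^{-1} \sum_{t=j+1}^{T} e_1' h^{r+1} S_T^{-1}(X_{t-1})\, B_T(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) \\
&\quad + T_j^{-1} \sum_{t=j+1}^{T} e_1' S_T^{-1}(X_{t-1})\, R_T(u, X_{t-1})\, \psi_{t-j}(v) \\
&= \hat{B}_{111j}(u,v) + \hat{B}_{112j}(u,v), \text{ say,}
\end{aligned} \tag{A.5}$$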

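where $B_T(x)$ is an $N \times N_{r+1}$ matrix

$$B_T(x) = \begin{bmatrix} S_{0,r+1} \\ \vdots \\ S_{r,r+1} \end{bmatrix},$$

$S_{j,r+1}$ is of dimension $N_j \times N_{r+1}$, $D_{r+1}(u,x)$ is obtained by arranging the $N_{r+1}$ elements of the derivatives $(1/\mathbf{j}!)\, \phi^{(\mathbf{j})}(u|X_{t-1} = x)$ for $|\mathbf{j}| = r+1$ as a column using the lexicographical order, $R_T(u,x)$ is an $N \times 1$ vector

$$R_T(u,x) = \begin{bmatrix} R_0 \\ \vdots \\ R_r \end{bmatrix},$$

$R_{|\mathbf{j}|}$ is of dimension $N_{|\mathbf{j}|} \times 1$, with its $l$th element $(R_{|\mathbf{j}|})_l = \gamma_{g_{|\mathbf{j}|}(l)}$, and

$$\begin{aligned}
\gamma_{\mathbf{j}} &= (r+1) \sum_{|\mathbf{l}| = r+1} \frac{h^{r+1}}{\mathbf{l}!\,(T-1)} \sum_{s=2}^{T} \left( \frac{X_{s-1} - x}{h} \right)^{\mathbf{l}+\mathbf{j}} K_h(X_{s-1} - x) \\
&\quad \times \int_0^1 \left[ \phi^{(\mathbf{l})}\big(u|X_{t-1} = x + w(X_{s-1} - x)\big) - \phi^{(\mathbf{l})}(u|X_{t-1} = x) \right] (1-w)^r\, dw.
\end{aligned} \tag{A.6}$$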
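For the first term in (A.5), we have

$$\begin{aligned}
\hat{B}_{111j}(u,v)
&= \Big\{ E\big[ e_1' h^{r+1} \tilde{S}^{-1}(X_{t-1})\, \tilde{B}(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) \big] \\
&\quad + T_j^{-1} \sum_{t=j+1}^{T} \Big( e_1' h^{r+1} \tilde{S}^{-1}(X_{t-1})\, \tilde{B}(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) \\
&\qquad - E\big[ e_1' h^{r+1} \tilde{S}^{-1}(X_{t-1})\, \tilde{B}(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) \big] \Big) \Big\} [1 + o_P(1)] \\
&= \big\{ \hat{B}_{1111j}(u,v) + \hat{B}_{1112j}(u,v) \big\} [1 + o_P(1)], \text{ say,}
\end{aligned} \tag{A.7}$$

where $\tilde{S}(x) \equiv E[S_T(x)]$ and $\tilde{B}(x) \equiv E[B_T(x)]$.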


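For the first term in (A.7), we have

$$\begin{aligned}
\iint \big| \hat{B}_{1111j}(u,v) \big|^2 dW(u)\, dW(v)
&\le C h^{2(r+1)} \iint \Big[ \beta(j)\, \big\| e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{B}(X_{t-1})\, D_{r+1}(u, X_{t-1}) \big\|_{\infty}\, \big\| \psi_{t-j}(v) \big\|_{\infty} \Big]^2 dW(u)\, dW(v) \\
&\le C \beta^2(j)\, h^{2(r+1)},
\end{aligned}$$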

where we have used the mixing inequality and Assumption 1. It follows that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1111j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1), \tag{A.8}$$

where we have used the fact that

$$\sum_{j=1}^{T-1} a_T(j) = \sum_{j=1}^{T-1} k^2(j/p)\, T_j^{-1} = O(p/T). \tag{A.9}$$
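The order O(p/T) in (A.9) can be illustrated numerically. The sketch below assumes the Bartlett kernel k(z) = max(0, 1 − |z|), which is one admissible choice and not necessarily the kernel used in the paper; the function names and the (T, p) pairs are illustrative.

```python
def bartlett(z):
    """Bartlett kernel, used here only as an illustrative choice of k."""
    return max(0.0, 1.0 - abs(z))

def sum_a_T(T, p, kernel=bartlett):
    """Compute sum_{j=1}^{T-1} k^2(j/p) / (T - j), the left side of (A.9)."""
    return sum(kernel(j / p) ** 2 / (T - j) for j in range(1, T))

ratios = []
for T, p in [(1000, 10), (10000, 50), (100000, 200)]:
    # Ratio of the sum to p/T; it should stay bounded as T and p grow.
    ratios.append(sum_a_T(T, p) / (p / T))

print(ratios)
```

For this kernel the ratio settles near the integral of k^2 over [0, 1], which is 1/3, so the sum is of exact order p/T when p grows more slowly than T.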

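For the second term in (A.7), we have

$$\begin{aligned}
E \iint \big| \hat{B}_{1112j}(u,v) \big|^2 dW(u)\, dW(v)
&= 2 T_j^{-2} \sum_{\tau=j+1}^{T} \sum_{t=\tau+1}^{T} \iint E \Big\{ \big[ e_1' h^{r+1} \tilde{S}^{-1}(X_{t-1})\, \tilde{B}(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) - \hat{B}_{1111j}(u,v) \big] \\
&\qquad \times \big[ e_1' h^{r+1} \tilde{S}^{-1}(X_{\tau-1})\, \tilde{B}(X_{\tau-1})\, D_{r+1}(u, X_{\tau-1})\, \psi_{\tau-j}(v) - \hat{B}_{1111j}(u,v) \big] \Big\}\, dW(u)\, dW(v) \\
&\quad + T_j^{-1} \iint E \big| e_1' h^{r+1} \tilde{S}^{-1}(X_{t-1})\, \tilde{B}(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) \big|^2 dW(u)\, dW(v) \\
&\le C T_j^{-1} \sum_{l=1}^{T-j} \left( 1 - \frac{l}{T_j} \right) \beta(l)\, h^{2(r+1)} + C T_j^{-1} h^{2(r+1)} \le C T_j^{-1} h^{2(r+1)},
\end{aligned}$$

where we have used Assumption 1 and the mixing inequality. It follows from (A.9) and Chebychev's inequality that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1112j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1). \tag{A.10}$$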

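By the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
& p^{-1/2} \left| \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \hat{B}_{1111j}(u,v)\, \hat{B}_{1112j}^{*}(u,v)\, dW(u)\, dW(v) \right| \\
&\quad \le p^{-1/2} \left[ \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1111j}(u,v) \big|^2 dW(u)\, dW(v) \right]^{1/2}
\left[ \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1112j}(u,v) \big|^2 dW(u)\, dW(v) \right]^{1/2} \\
&\quad = o_P(1).
\end{aligned} \tag{A.11}$$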

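Combining (A.8), (A.10), and (A.11), we obtain

$$\begin{aligned}
& p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{111j}(u,v) \big|^2 dW(u)\, dW(v) \\
&\quad = \Big\{ p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1111j}(u,v) \big|^2 dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1112j}(u,v) \big|^2 dW(u)\, dW(v) \\
&\qquad + 2 p^{-1/2}\, \mathrm{Re} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \hat{B}_{1111j}(u,v)\, \hat{B}_{1112j}^{*}(u,v)\, dW(u)\, dW(v) \Big\} [1 + o_P(1)] \\
&\quad = o_P(1).
\end{aligned} \tag{A.12}$$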
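For the second term in (A.5), we have

$$\begin{aligned}
\hat{B}_{112j}(u,v)
&= \Big\{ E\big[ e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{R}(u, X_{t-1})\, \psi_{t-j}(v) \big] \\
&\quad + T_j^{-1} \sum_{t=j+1}^{T} \Big( e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{R}(u, X_{t-1})\, \psi_{t-j}(v)
 - E\big[ e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{R}(u, X_{t-1})\, \psi_{t-j}(v) \big] \Big) \Big\} [1 + o_P(1)] \\
&= \big\{ \hat{B}_{1121j}(u,v) + \hat{B}_{1122j}(u,v) \big\} [1 + o_P(1)], \text{ say,}
\end{aligned} \tag{A.13}$$

where $\tilde{R}(u,x)$ is an $N \times 1$ vector

$$\tilde{R}(u,x) = \begin{bmatrix} \tilde{R}_0 \\ \vdots \\ \tilde{R}_r \end{bmatrix},$$

$\tilde{R}_{|\mathbf{j}|}$ is of dimension $N_{|\mathbf{j}|} \times 1$, with its $l$th element $(\tilde{R}_{|\mathbf{j}|})_l = \tilde{\gamma}_{g_{|\mathbf{j}|}(l)} \equiv E\big[ \gamma_{g_{|\mathbf{j}|}(l)} \big]$.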
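For the first term in (A.13), we have

$$\begin{aligned}
\iint \big| \hat{B}_{1121j}(u,v) \big|^2 dW(u)\, dW(v)
&\le C \beta^2(j) \iint \Big[ \big\| e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{R}(u, X_{t-1}) \big\|_{\infty}\, \big\| \psi_{t-j}(v) \big\|_{\infty} \Big]^2 dW(u)\, dW(v) \\
&\le C \beta^2(j)\, h^{2(r+1)},
\end{aligned}$$

where we have used the mixing inequality, Assumption 1, and the fact that

$$\sup_{x \in G} \big| \tilde{\gamma}_{\mathbf{j}} \big|^2 = O_P\big( h^{2(r+1)} \big). \tag{A.14}$$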

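It follows that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1121j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1), \tag{A.15}$$

where we have used (A.9).

For the second term in (A.13), we have

$$\begin{aligned}
E \iint \big| \hat{B}_{1122j}(u,v) \big|^2 dW(u)\, dW(v)
&= 2 T_j^{-2} \sum_{\tau=j+1}^{T} \sum_{t=\tau+1}^{T} \iint E \Big\{ \big[ e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{R}(u, X_{t-1})\, \psi_{t-j}(v) - \hat{B}_{1121j}(u,v) \big] \\
&\qquad \times \big[ e_1' \tilde{S}^{-1}(X_{\tau-1})\, \tilde{R}(u, X_{\tau-1})\, \psi_{\tau-j}(v) - \hat{B}_{1121j}(u,v) \big] \Big\}\, dW(u)\, dW(v) \\
&\quad + T_j^{-1} \iint E \big| e_1' \tilde{S}^{-1}(X_{t-1})\, \tilde{R}(u, X_{t-1})\, \psi_{t-j}(v) \big|^2 dW(u)\, dW(v) \\
&\le C T_j^{-1} \sum_{l=1}^{T-j} \left( 1 - \frac{l}{T_j} \right) \beta(l)\, h^{2(r+1)} + C T_j^{-1} h^{2(r+1)} \le C T_j^{-1} h^{2(r+1)},
\end{aligned}$$

where we have used (A.14), Assumption 1, and the mixing inequality. It follows from (A.9) and Chebychev's inequality that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1122j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1). \tag{A.16}$$

Combining (A.15) and (A.16), we obtain

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{112j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1). \tag{A.17}$$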

For the second term in (A.4), we have


T T  −1
B̂12 j (u, v) = Tj−1 (T − 1)−1 h −d ∑ ∑ e1 S̃ Xt−1 
t= j+1 s=2
 
Xs−1 − Xt−1
× Z s (u) ψt− j (v) [1 + o P (1)]
h
 T T  −1
= Tj−1 (T − 1)−1 h −d ∑ ∑ e 1 S̃ Xt−1 
t=2 s=2
 
Xs−1 − Xt−1
× Z s (u) ψt− j (v)
h
j T  −1
− Tj−1 (T − 1)−1 h −d ∑ ∑ e 1 S̃ Xt−1 
t=2 s=2
  )
Xs−1 − Xt−1
× Z s (u) ψt− j (v) [1 + o P (1)]
h

= B̂121 j (u, v) − B̂122 j (u, v) [1 + o P (1)] , (A.18)

where  (z) ≡ (z) K (z) . Now, introducing


 
   −1 Xs−1 − Xt−1
 j Y jt , Y js = e 1 S̃ Xt−1  Z s (u) ψt− j (v)
h
 
 −1 Xt−1 − Xs−1
+ e 1 S̃ Xs−1  Z t (u) ψs− j (v) ,
h
 
where Y jt = Xt , Xt−1 , Xt− j , we can write B̂121 j (u, v) as a U -statistic,

1 −1  
B̂121 j (u, v) = Tj (T − 1)−1 h −d ∑∑ j Y jt , Y js
2 t=s

1 T  
+ Tj−1 (T − 1)−1 h −d ∑  j Y jt , Y jt .
2 t=2

For notational simplicity, we have suppressed the dependence on u and v of φ j (Y jt , Y js ).


For the second term, it is easy to see that


T −1  T  
2
1 −1
∑ k 2 ( j/ p) T j Tj (T − 1)−1 h −d ∑  j Y jt , Y jt dW (u) dW (v) = o P (1) .
j=1
2 t=2

For the first term, we have

1 −1  
T (T − 1)−1 h −d ∑∑ j Y jt , Y js
2 j t=s

T   1  
= Tj−1 h −d ∑ 1 j Xt−1 + Tj−1 (T − 1)−1 h −d ∑∑ ˜ j Y jt , Y js
t=2 2 t>s

T  
= Tj−1 h −d ∑ 1 j ˜ j,
Xt−1 +  say, (A.19)
t=2

       −1  Xt−1 −Xs−1 


where 1 j (y) =  j y, Y js d Fj Y js = e 1 S̃ Xs−1  Z t (u)
        h  
˜
ψs− j (v) d Fj Y js and  j Y jt , Y js =  j Y jt , Y js − 1 j Y jt − 1 j Y js .
Note we have made use of the fact that
    
0 j = 1 j Y jt d F Y jt
  
 −1 Xt−1 − Xs−1    
= e1 S̃ Xs−1  Z t (u) ψs− j (v) d Fj Y js d Fj Y jt = 0.
h
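The split in (A.19) is a projection (Hoeffding-type) decomposition: the kernel is written as its one-argument projections plus a remainder whose own projection vanishes. The sketch below illustrates only that algebra, under simplifying assumptions that differ from the paper's setting: a hypothetical three-point distribution with equal mass, independent arguments instead of a β-mixing process, and a made-up symmetric kernel centred so that its mean is zero.

```python
# Hypothetical three-point support with equal probabilities (illustration only).
support = [-1.0, 0.0, 2.0]
prob = 1.0 / len(support)

def raw_kernel(a, b):
    """A made-up symmetric kernel; it plays the role of the U-statistic kernel."""
    return a * b + (a - b) ** 2

# Centre the kernel so its overall mean is zero, mirroring the zero-mean
# property noted after (A.19).
theta = sum(raw_kernel(a, b) for a in support for b in support) * prob ** 2

def kernel(a, b):
    return raw_kernel(a, b) - theta

def projection(a):
    """One-argument projection: integrate the kernel over its second argument."""
    return sum(kernel(a, b) for b in support) * prob

def degenerate(a, b):
    """Remainder after removing both projections."""
    return kernel(a, b) - projection(a) - projection(b)

# The remainder integrates to zero in each argument, which is what makes the
# second term of the decomposition of smaller order than the projection term.
worst = max(abs(sum(degenerate(a, b) for b in support) * prob) for a in support)
print(worst < 1e-12)
```

Under dependence the same algebra applies, but the moment bounds for the remainder require mixing inequalities such as those of Yoshihara (1976), as used in the proof.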

For the first term in (A.19), we have


 
−1 −d
T  
var Tj h ∑ 1 j Y jt
t=2
   

= 2Tj−2 h −2d ∑ ∑cov 1 j Y j,t−1 , ∗1 j Y j,τ −1


t>τ
  
+ Tj−2 (T − 1) h −2d var 1 j Xt−1

T −1  l
   2γ
1/γ
≤ C Tj−2 (T − 1) h −2d ∑ 1− β (l)1−(1/γ ) E 1 j Xt−1
l=1 T
  
+ Tj−2 (T − 1) h −2d var 1 j Xt−1
  2γ
1/γ
≤ C Tj−2 (T − 1) h −2d E 1 j Xt−1
  
+ Tj−2 (T − 1) h −2d var 1 j Xt−1 ,

where $\gamma = \nu/(\nu-1) + \varepsilon$ and $\varepsilon > 0$. Note that we have used Assumption 1 and the mixing inequality. Put $D = h^{-d/\nu}$, where $\nu$ is defined in Assumption 1. It follows that


2
D  T  
−1/2 −1 −d
p ∑ k ( j/ p) Tj E Tj h ∑ 1 j Xt−1 dW (u) dW (v)
2
j=1 t=2

≤ C Dp −1/2 = o (1) , (A.20)

where we have used the fact that h −d 1 j (z) is bounded in probability. At the same time,
we have
2
T −1  T  
−1/2 −1 −d
p ∑ k ( j/ p) Tj E Tj h ∑ 1 j Xt−1 dW (u) dW (v)
2
j=D+1 t=2
T −1  
≤ C p −1/2 ∑ k 2 ( j/ p) β 2 ( j − 1) h −2d = O h −d/ν p −1/2 = o (1) , (A.21)
j=D+1

where we have used Assumptions 1, and 4, and the fact 1 j (x) ≤ β ( j − 1) given the
mixing inequality.
It follows from (A.20), (A.21), and Chebychev’s inequality that
2
T −1  T  
−1
p −1/2 ∑ k 2 ( j/ p) T j Tj ∑ 1 j Xt−1 dW (u) dW (v) = o P (1) . (A.22)
j=1
t=2

For the second term in (A.19), we have


  ∗ 

˜ 2 ) = T −2 (T − 1)−2 h −2d ∑ ∑ E 
E( ˜ Y jt , Y js .
˜ j Y jt , Y js 
j j j
t=s t =s

Following Yoshihara (1976) and Lee (1990), we split it into two types: (a) those for which $t, s, t', s'$ are all distinct; and (b) those remaining.
For terms of type (a), we have

  ∗ 


∑ E  j Y jt , Y js  j Y jt , Y js
˜ ˜
t,s,t ,s

T T −1 T −1 T −1 

˜   ∗
˜ Y jt+s+t , Y jt+s+t +s
≤ 4! ∑ ∑ ∑ ∑ E  j Y jt , Y jt+s  j

t=2 s=2 t =2 s =2

T   ∗ 

˜ ˜ Y jt+s+t , Y jt+s+t +s
≤ 4! ∑ ∑ E  j Y jt , Y jt+s  j
t=2 2≤t ,s ≤s

T   ∗ 

˜ ˜ Y jt+s+t , Y jt+s+t +s
+ 4! ∑ ∑ E  j Y jt , Y jt+s  j
t=2 2≤s,s ≤t

T   ∗ 

˜ ˜ Y jt+s+t , Y jt+s+t +s .
+ 4! ∑ ∑ E  j Y jt , Y jt+s  j (A.23)
t=2 2≤s,t ≤s

For the first term in (A.23), we have

T   ∗ 

˜ ˜ Y jt+s+t , Y jt+s+t +s
∑ ∑

E  j Y jt , Y jt+s  j
t=2 2≤t ,s ≤s

T T
≤ ∑ ∑ (s + 1)2 β α/α+1 (s) h 2d/(α+1) ≤ C (T − 1) h 2d/(α+1) ,
t=2 s=2

where $1 > \alpha > 3/(\nu-3)$ and we have used Lemma 2 of Yoshihara (1976) and Assumption 1.
For the second term in (A.23), we have

T   ∗ 

˜ ˜ Y jt+s+t , Y jt+s+t +s
∑ ∑ E  j Y jt , Y jt+s  j
t=2 2≤s,s ≤t

T T  2  
≤ ∑ ∑ t + 1 β α/α+1 t h 2d/(α+1)

t=2 t =2

T  
+ ∑ ∑ β α/α+1 (s) h d/(α+1) β α/α+1 s h d/(α+1)
t=2 2≤s,s ≤t

≤ C (T − 1)2 h 2d/(α+1) ,

where $1 > \alpha > 3/(\nu-3)$ and we have used Lemma 2 of Yoshihara (1976) and Assumption 1. The third term is similar to the first term.

For terms of type (b), we also consider one case:


 
T   ∗ 
T
α/α+1
∑ ∑ E ˜ j Y jt , Y js ˜ j Y jt , Y js ≤ (T − 1) h 1 + ∑ β
2 d ( j)
2≤t<t ≤T s=2 j=1

≤ C (T − 1)2 h d ,
where $1 > \alpha > 1/(\nu-1)$ and we have used Lemma 2 of Yoshihara (1976) and Assumption 1. For other cases, similar arguments apply.
Hence we have
T −1 
1 ˜ 2
p− 2 ∑ k 2 ( j/ p) Tj  j dW (u) dW (v) = o P (1) , (A.24)
j=1

where we have used Chebychev’s inequality and (A.9).


It follows from (A.19), (A.22), and (A.24) that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{121j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1). \tag{A.25}$$

For the second term in (A.18), we have


2

E B̂122 j (u, v)
 
j j T T  −1 Xs−1 − Xt−1
= 2Tj−2 (T − 1)−2 h −2d ∑ ∑ ∑ ∑ Eg Xt−1 
t=2 t =t+1 s=2 s =s+1 h
 
 −1 Xs −1 − Xt −1
× Z s (u) ψt− j (v) e 1 S̃ Xt −1  Z s (u) ψt − j (v)
h
j T
+ Tj−2 (T − 1)−2 h −2d ∑∑
t=2 s=2
  2
 −1 Xs−1 − Xt−1
×E e 1 S̃ Xt−1  Z s (u) ψt− j (v)
h

≤ C Tj−2 j 2 + C Tj−2 (T − 1)−1 j h −d ,


where we have used Assumptions 1 and 3.
It follows from (A.9) and Chebychev's inequality that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{122j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1). \tag{A.26}$$

Then it follows from (A.4), (A.5), (A.12), (A.17), (A.25), and (A.26) that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1j}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1). \tag{A.27}$$

The desired result of Lemma A.1 follows. ∎



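Proof of Lemma A.2. We write

$$\begin{aligned}
\hat{B}_{2j}(u,v)
&= \big[ \phi(v) - \hat{\phi}(v) \big]\, T_j^{-1} \sum_{t=j+1}^{T} \left[ \phi(u|X_{t-1}) - \sum_{s=2}^{T} \hat{W}\!\left( \frac{X_{s-1} - X_{t-1}}{h} \right) e^{iu'X_s} \right] \\
&= -\big[ \phi(v) - \hat{\phi}(v) \big]\, T_j^{-1} \sum_{t=j+1}^{T} \sum_{s=2}^{T} \hat{W}\!\left( \frac{X_{s-1} - X_{t-1}}{h} \right) \big[ \phi(u|X_{s-1}) - \phi(u|X_{t-1}) \big] \\
&\quad - \big[ \phi(v) - \hat{\phi}(v) \big]\, T_j^{-1} \sum_{t=j+1}^{T} \sum_{s=2}^{T} \hat{W}\!\left( \frac{X_{s-1} - X_{t-1}}{h} \right) Z_s(u) \\
&= -\hat{B}_{21j}(u,v) - \hat{B}_{22j}(u,v), \text{ say.}
\end{aligned} \tag{A.28}$$

We further decompose

$$\begin{aligned}
\hat{B}_{21j}(u,v)
&= \big[ \phi(v) - \hat{\phi}(v) \big]\, T_j^{-1} \sum_{t=j+1}^{T} e_1' h^{r+1} S_T^{-1}(X_{t-1})\, B_T(X_{t-1})\, D_{r+1}(u, X_{t-1}) \\
&\quad + \big[ \phi(v) - \hat{\phi}(v) \big]\, T_j^{-1} \sum_{t=j+1}^{T} e_1' S_T^{-1}(X_{t-1})\, R_T(u, X_{t-1}) \\
&= \hat{B}_{211j}(u,v) + \hat{B}_{212j}(u,v), \text{ say,}
\end{aligned} \tag{A.29}$$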

and
%
  T T  −1
B̂22 j (u, v) = Tj−1 (T − 1)−1 h −d ϕ (v) − ϕ̂ (v) ∑ ∑ e 1 S̃ Xt−1 
t=2 s=2
 
Xs−1 − Xt−1  
× Z s (u) − Tj−1 (T − 1)−1 h −d ϕ (v) − ϕ̂ (v)
h
  &
j T  −1 Xs−1 − Xt−1
× ∑∑ e 1 S̃ Xt−1  Z s (u) ψt− j (v) [1 + o P (1)]
t=2 s=2 h

= B̂221 j (u, v) + B̂222 j (u, v) [1 + o P (1)] , say. (A.30)

To show Lemma A.2, it suffices to show that

$$p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{2abj}(u,v) \big|^2 dW(u)\, dW(v) = o_P(1), \quad \text{for } a, b = 1, 2. \tag{A.31}$$

The proof of (A.31) is similar to that of (A.12), (A.17), (A.25), and (A.26) in Lemma A.1, with the fact that $E\big| \phi(v) - \hat{\phi}(v) \big|^4 \le C T_j^{-2}$ given Assumption 1. ∎

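Proof of Proposition A.2. Given the decomposition in (A.3), we have

$$\Big| \big[ \hat{\Gamma}_j(u,v) - \tilde{\Gamma}_j(u,v) \big]\, \tilde{\Gamma}_j(u,v)^{*} \Big| \le \sum_{a=1}^{2} \big| \hat{B}_{aj}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|, \tag{A.32}$$

where the $\hat{B}_{aj}(u,v)$ are defined in (A.3).

For $a = 1$, by (A.5) and the triangle inequality, we have

$$\begin{aligned}
& p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad \le p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{111j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{112j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{121j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{122j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v).
\end{aligned} \tag{A.33}$$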

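For the first term in (A.33), we have

$$\begin{aligned}
& p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{111j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad \le p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1111j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1112j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad = O_P\big( p^{-1/2} T^{1/2} h^{r+1} \big) + O_P\big( p^{1/2} h^{r+1} \big) = o_P(1),
\end{aligned} \tag{A.34}$$

where we have used Assumptions 1 and 4, (A.9), and the fact that $E|\tilde{\Gamma}_j(u,v)|^2 \le C T_j^{-1}$ under $H_0$.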
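For the second term in (A.33), we have

$$\begin{aligned}
& p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{112j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad \le p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1121j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1122j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad = O_P\big( p^{-1/2} T^{1/2} h^{r+1} \big) + O_P\big( p^{1/2} h^{r+1} \big) = o_P(1),
\end{aligned} \tag{A.35}$$

where we have used Assumptions 1 and 4 and (A.9).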


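For the third term in (A.33), we have

$$\begin{aligned}
& p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{12j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad \le p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{121j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{122j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad = O_P\big( p^{-1/2} T^{(1-\nu)/(2\nu)} h^{-d/\nu} \big) + O_P\big( p^{-1/2} T^{-1} h^{-d/2} \big) = o_P(1),
\end{aligned} \tag{A.36}$$

where we have used Assumptions 1 and 4 and (A.9).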


For the fourth term in (A.33), we have

$$\begin{aligned}
& p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{114j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad \le p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1141j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\qquad + p^{-1/2} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, \big| \hat{B}_{1142j}(u,v) \big|\, \big| \tilde{\Gamma}_j(u,v) \big|\, dW(u)\, dW(v) \\
&\quad = O_P\big( p^{-1/2} h^{-d/\nu} \big) + O_P\big( p^{1/2} T^{-1} h^{-3d/\nu} \big) + O_P\big( p^{3/2} T^{-1/2} \big) + O_P\big( p T^{-1} h^{-d/2} \big) = o_P(1),
\end{aligned} \tag{A.37}$$

where we have used Assumptions 1 and 4, (A.9), and the fact that $E|\tilde{\Gamma}_j(u,v)|^2 \le C T_j^{-1}$ given Assumption 1.
For $a = 2$, similar arguments apply. ∎
Proof of Theorem A.2. The proof is similar to that of Theorem A.2 of Chen and Hong (2009). ∎
Proof of Theorem A.3. The proof is similar to that of Theorem A.3 of Chen and Hong (2009). ∎
Proof of Theorem 2. The proof of Theorem 2 consists of the proofs of Theorems A.4 and A.5 below. ∎

THEOREM A.4. Under the conditions of Theorem 2, $(p^{1/2}/T)(\hat{M} - \tilde{M}) \xrightarrow{p} 0$.
THEOREM A.5. Under the conditions of Theorem 2,

$$(p^{1/2}/T)\, \tilde{M} \xrightarrow{p} D^{-1/2} \iint \int_{-\pi}^{\pi} \big| F(\omega, u, v) - F_0(\omega, u, v) \big|^2\, d\omega\, dW(u)\, dW(v).$$

Proof of Theorem A.4. It suffices to show that

$$T^{-1} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j \left[ |\hat{\Gamma}_j(u,v)|^2 - |\tilde{\Gamma}_j(u,v)|^2 \right] dW(u)\, dW(v) \xrightarrow{p} 0, \tag{A.38}$$

$p^{-1}(\hat{C} - \tilde{C}) = O_P(1)$, and $p^{-1}(\hat{D} - \tilde{D}) \xrightarrow{p} 0$, where $\tilde{C}$ and $\tilde{D}$ are defined in the same way as $\hat{C}$ and $\hat{D}$ in (2.19), with $\hat{\phi}(u|X_{t-1})$ replaced by $\phi(u|X_{t-1})$. Since the proofs for $p^{-1}(\hat{C} - \tilde{C}) = O_P(1)$ and $p^{-1}(\hat{D} - \tilde{D}) \xrightarrow{p} 0$ are straightforward, we focus on the proof of (A.38). From (A.9), the Cauchy-Schwarz inequality, and the fact that $T^{-1} \iint \sum_{j=1}^{T-1} k^2(j/p)\, T_j\, |\tilde{\Gamma}_j(u,v)|^2\, dW(u)\, dW(v) = O_P(1)$ as is implied by Theorem A.5 (the proof of Theorem A.5 does not depend on Theorem A.4), it suffices to show that $T^{-1} \hat{A}_1 \xrightarrow{p} 0$, where $\hat{A}_1$ is defined as in (A.2). This is very similar to the proof of Proposition A.1, which completes the proof of Theorem A.4. ∎
Proof of Theorem A.5. The proof is similar to that of Theorem A.5 of Chen and Hong (2009). ∎
