Markov Chains 1
doi:10.1017/S0266466611000065
BIN CHEN
University of Rochester
YONGMIAO HONG
Cornell University and Xiamen University
The Markov property is a fundamental property in time series analysis and is often
assumed in economic and financial modeling. We develop a new test for the
Markov property using the conditional characteristic function embedded in a
frequency domain approach, which checks the implication of the Markov property in
every conditional moment (if it exists) and over many lags. The proposed test is
applicable to both univariate and multivariate time series with discrete or continuous
distributions. Simulation studies show that with the use of a smoothed nonparametric
transition density-based bootstrap procedure, the proposed test has reasonable sizes
and all-around power against several popular non-Markov alternatives in finite
samples. We apply the test to a number of financial time series and find some evidence
against the Markov property.
1. INTRODUCTION
The Markov property is a fundamental property in time series analysis and is often
a maintained assumption in economic and financial modeling. Testing for the va-
lidity of the Markov property has important implications in economics, finance,
as well as time series analysis. In economics, for example, Markov decision pro-
cesses (MDP), which are based on the Markov assumption, provide a general
framework for modeling sequential decision making under uncertainty (see Rust,
1994, and Ljungqvist and Sargent, 2000, for excellent surveys) and have been
extensively used in economics, finance, and marketing. Applications of MDP in-
clude investment under uncertainty (Lucas and Prescott, 1971; Sargent, 1987),
asset pricing (Lucas, 1978; Hall, 1978; Mehra and Prescott, 1985), economic
growth (Uzawa, 1965; Romer, 1986, 1990; Lucas, 1988), optimal taxation (Lucas
We thank Pentti Saikkonen (the co-editor), three referees, Frank Diebold, Oliver Linton, James MacKinnon, Katsumi
Shimotsu, Kyungchul Song, Liangjun Su, George Tauchen, and seminar participants at Peking University, Queen’s
University, University of Pennsylvania, the 2008 Xiamen University-Humboldt University Joint Workshop, the 2008
International Symposium on Recent Developments of Time Series Econometrics in Xiamen, the 2008 Symposium
on Econometric Theory and Applications (SETA) in Seoul, and the 2008 Far Eastern Econometric Society Meeting
in Singapore for their constructive comments on the previous versions of this paper. Any remaining errors are solely
ours. Bin Chen thanks the Department of Economics, University of Rochester, for financial support. Yongmiao
Hong thanks the outstanding overseas youth fund of the National Science Foundation of China for its support.
Address correspondence to Bin Chen, Department of Economics, University of Rochester, Rochester, NY 14620,
USA; e-mail: [email protected].
© Cambridge University Press 2011
TESTING FOR THE MARKOV PROPERTY IN TIME SERIES 131
and Stokey, 1983; Zhu, 1992), industrial organization (Ericson and Pakes, 1995;
Weintraub, Benkard, and Van Roy, 2008), and equilibrium business cycles
(Kydland and Prescott, 1982). In the MDP framework, an optimal decision rule
can be found within the subclass of nonrandomized Markovian strategies, where a
strategy depends on the past history of the process only via the current state.
Obviously, a decision rule derived under this framework may be suboptimal if the
foundational assumption of the Markov property is violated. Recently, non-Markov decision processes
(NMDP) have attracted increasing attention in the literature (e.g., Mizutani and
Dreyfus, 2004; Aviv and Pazgal, 2005). The non-Markov nature can arise in many
ways. The most direct extension of MDP to NMDP is to deprive the decision
maker of perfect information on the state of the environment.
In finance, the Markov property is one of the most popular assumptions in
continuous-time modeling. It is well known that stochastic integrals yield
Markov processes. In modeling interest rate term structure, such popular models
as Vasicek (1977), Cox, Ingersoll, and Ross (1985), affine term structure models
(Duffie and Kan, 1996; Dai and Singleton, 2000), quadratic term structure mod-
els (Ahn, Dittmar, and Gallant, 2002), and affine jump diffusion models (Duffie,
Pan, and Singleton, 2000) are all Markov processes. They are widely used in
pricing and hedging fixed-income or equity derivatives, managing financial risk,
and evaluating monetary policy and debt policy. If interest rate processes are not
Markov, alternative non-Markov models, such as Heath, Jarrow, and Morton’s
(1992) model may provide a better characterization of interest rate dynamics.
In a discrete-time framework, Duan and Jacobs (2008) find that deviations from
the Markovian structure significantly improve the empirical performance of the
model for the short-term interest rate. In general, if a process is obtained by dis-
cretely sampling a subset of the state variables of a continuous-time process that
evolves according to a system of nonlinear stochastic differential equations, it is
non-Markov. A leading example is the class of stochastic volatility models (e.g.,
Andersen and Lund, 1997; Gallant, Hsieh, and Tauchen, 1997).
In the market microstructure literature, one important issue is the price for-
mation mechanism, which determines whether security prices follow a Markov
process. Easley and O’Hara (1987) develop a structural model of the effect of
asymmetric information on the price-trade size relationship. They show that trade
size introduces an adverse selection problem to security trading because informed
traders, given their wish to trade, prefer to trade larger amounts at any given
price. Hence market makers’ pricing strategies will also depend on trade size,
and the entire sequence of past trades is informative of the likelihood of an in-
formation event and thus price evolution. Consequently, prices typically will not
follow a Markov process. Easley and O’Hara (1992) further consider a variant
of Easley and O’Hara’s (1987) model and delineate the link between the exis-
tence of information, the timing of trades, and the stochastic process of security
prices. They show that while trade signals the direction of any new information,
the lack of trade signals the existence of any new information. The latter effect
can be viewed as event uncertainty and suggests that the interval between trades
132 BIN CHEN AND YONGMIAO HONG
may be informative and hence time per se is not exogenous to the price process.
One implication of this model is that either quotes or prices combined with inven-
tory, volume, and clock time are Markov processes. Therefore, rather than using
the non-Markov price series alone, it would be preferable to estimate the price
process consisting of no trade outcomes, buys, and sells. On the other hand, other
models also explain market behavior but reach opposite conclusions on the prop-
erty of prices. For example, Platen and Rebolledo (1996) and Amaro de Matos
and Rosario (2000) propose equilibrium models, which assume that market mak-
ers can take advantage of their superior information on trade orders and set differ-
ent prices. The presence of market makers prevents the direct interaction between
demand and supply sides. By specifying the supply and demand processes, these
market makers obtain the equilibrium prices, which may be Markov. By testing
the Markov property, one can check which models reflect reality more closely.
Our interest in testing the Markov property is also motivated by its wide ap-
plications among practitioners. For example, technical analysis has been used
widely in financial markets for decades (see, e.g., Edwards and Magee, 1966;
Blume, Easley, and O’Hara, 1994; LeBaron, 1999). One important category is
price-based technical strategies, which refer to forecasts based on past prices,
often via moving-average rules. However, if the history of prices does not provide
additional information, in the sense that the current prices already impound all
information, then price-based technical strategies would not be effective. In other
words, if prices adjust immediately to information, past prices would be redun-
dant and current prices are the sufficient statistics for forecasting future prices.
This actually corresponds to a fundamental issue: namely, whether prices follow
a Markov process.
Finally, in risk management, financial institutions are required to rate assets
by their default probability and by their expected loss severity given a default.
For this purpose, historical information on the transition of credit exposures is
used to estimate various models that describe the probabilistic evolution of credit
quality. The simple time-homogeneous Markov model is one of the most popular
models (e.g., Jarrow and Turnbull, 1995; Jarrow, Lando, and Turnbull, 1997),
specifying the stochastic processes completely by transition probabilities. Under
this model, a detailed history of individual assets is not needed. However, whether
the Markov specification adequately describes credit rating transitions over time
has substantial impact on the effectiveness of credit risk management. In empirical
studies, Kavvathas (2001) and Lando and Skødeberg (2002) document strong
non-Markov behaviors such as dependence on previous rating and waiting-time
effects in rating transitions. In contrast, Bangia, Diebold, Kronimus, Schagen, and
Schuermann (2002) and Kiefer and Larson (2004) find that first-order Markov
ratings dynamics provide a reasonable practical approximation.
Despite innumerable studies rooted in Markov processes, there are few existing
tests for the Markov property in the literature. Aït-Sahalia (1997) first proposes
a test for whether the interest rate process is Markov by checking the validity
of the Chapman-Kolmogorov equation, where the transition density is estimated
Under H0 , the past information set It−1 is redundant in the sense that the current
state variable or vector Xt will contain all information about the future behavior
of the process that is contained in the current information set It . Alternatively,
when
which is implied by $H_0$ in (2.1), but the converse is not true. The most important
feature of $H_0$ is the necessity of checking the entire currently available
information set $I_t$. Inevitably there will be information loss if only one lag order is
considered. For example, the existing tests may overlook departures from the
Markov property at higher-order lags, say, $X_{t-2}$. Moreover, these tests may
suffer from the curse of dimensionality when the dimension $d$ is relatively
large, because the nonparametric density estimators $\hat g(X_{t+1}|X_t, X_{t-1})$ and
$\hat g(X_{t+1}|X_t)$ involve $3d$- and $2d$-dimensional smoothing, respectively.
We now develop a new test for $H_0$ using the CCF. As the Fourier transform of
the conditional probability density, the CCF can also capture the full dynamics
of $X_{t+1}$. Let $\varphi(u|X_t)$ be the CCF of $X_{t+1}$ conditioning on its current state $X_t$;
that is,
$$\varphi(u|X_t) = \int_{\mathbb{R}^d} e^{iu'x} g(x|X_t)\,dx, \quad u \in \mathbb{R}^d,\ i = \sqrt{-1}. \tag{2.3}$$
Let $\varphi(u|I_t)$ be the CCF of $X_{t+1}$ conditioning on the currently available
information $I_t$, that is,
$$\varphi(u|I_t) = \int_{\mathbb{R}^d} e^{iu'x} g(x|I_t)\,dx, \quad u \in \mathbb{R}^d,\ i = \sqrt{-1}.$$
Given the equivalence between the conditional probability density and the CCF,
the hypotheses of interest H0 in (2.1) versus H A in (2.2) can be written as
Given that the conventional spectral density is defined as the Fourier transform
of the autocovariance function, we can define a generalized cross-spectrum
$$F(\omega, u, v) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \Gamma_j(u, v)\, e^{-ij\omega}, \quad \omega \in [-\pi, \pi],\ u, v \in \mathbb{R}^d, \tag{2.9}$$
which is the Fourier transform of the generalized covariance function $\Gamma_j(u, v)$,
where $\omega$ is a frequency. This function contains the same information as $\Gamma_j(u, v)$.
No moment conditions on {Xt } are required. This is particularly appealing for
economic and financial time series. It has been argued that higher moments of
financial time series may not exist (e.g., Pagan and Schwert, 1990; Loretan and
Phillips, 1994). Moreover, the generalized cross-spectrum can capture cyclical
patterns caused by linear and nonlinear cross-dependence, such as volatility clus-
tering and tail clustering of the distribution.
Under $H_0$ we have $\Gamma_j(u, v) = 0$ for all $u, v \in \mathbb{R}^d$ and all $j \neq 0$. Consequently,
the generalized cross-spectrum $F(\omega, u, v)$ becomes a "flat" spectrum as a
function of frequency $\omega$:
$$F(\omega, u, v) = F_0(\omega, u, v) \equiv \frac{1}{2\pi} \Gamma_0(u, v), \quad \omega \in [-\pi, \pi],\ u, v \in \mathbb{R}^d. \tag{2.10}$$
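To reduce the gap between $E[Z_t(u)|I_{t-1}] = 0$ for all $u \in \mathbb{R}^d$ and $\Gamma_j(u, v) = 0$
for all $u, v \in \mathbb{R}^d$ and all $j \neq 0$, we can extend $F(\omega, u, v)$ to a generalized
bispectrum
$$B(\omega_1, \omega_2, u, v, \tau) = \frac{1}{(2\pi)^2} \sum_{j=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} C_{j,l}(u, v, \tau)\, e^{-ij\omega_1 - il\omega_2}, \quad \omega_1, \omega_2 \in [-\pi, \pi],\ u, v, \tau \in \mathbb{R}^d,$$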
where
1 T
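and $g_l^{-1}$ denotes the one-to-one map that arranges those $N_l$ $d$-tuples as a sequence
in a lexicographical order.5 And $\boldsymbol{\tau}(x, u)$ is an $N \times 1$ vector
$$\boldsymbol{\tau}(x, u) = \begin{bmatrix} \boldsymbol{\tau}_0 \\ \boldsymbol{\tau}_1 \\ \vdots \\ \boldsymbol{\tau}_r \end{bmatrix},$$
where $\boldsymbol{\tau}_{|j|}$ is of dimension $N_{|j|} \times 1$, with its $l$th element $(\boldsymbol{\tau}_{|j|})_l = \tau_{g_{|j|}(l)}$, and
$$\tau_j(x) = \sum_{t=2}^{T} e^{iu'X_t} \left( \frac{X_{t-1} - x}{h} \right)^{j} K_h(X_{t-1} - x).$$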
Note that β̂ depends on the location x and parameter u, but for notational sim-
plicity, we have suppressed its dependence on x and u.
Under suitable regularity conditions, $\varphi(u|x)$ can be consistently estimated by
the local intercept estimator $\hat\beta_0(x, u)$. Specifically, we have
$$\hat\varphi(u|x) = \sum_{t=2}^{T} \hat W\!\left( \frac{X_{t-1} - x}{h} \right) e^{iu'X_t},$$
|j| (z) is of dimension N|j| × 1, with its lth element |j| (z) = (z)g|j| (l) ,
l
and z is a d × 1 vector. The regression estimator ϕ̂(u|Xt−1 ) only involves a
d-dimensional smoothing, thus enjoying some advantages over the existing non-
parametric density approaches, which involve a 2d or 3d dimensional smoothing.
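As a rough illustration of the regression idea, the sketch below estimates $\varphi(u|x)$ by a local constant (Nadaraya-Watson) fit, which is only the simplest special case ($r = 0$) of a local polynomial estimator and is not the paper's implementation. The Gaussian kernel, the AR(1) data with coefficient 0.5, and the evaluation point are assumptions made for the example; the bandwidth rule $h = \hat S_X T^{-1/4.5}$ is the one quoted in Section 5.1.

```python
import numpy as np

def ccf_local_constant(x_lag, x_next, x0, u, h):
    """Kernel-weighted estimate of phi(u | X_{t-1} = x0) = E[exp(i u X_t) | X_{t-1} = x0].

    Local constant fit (r = 0); the Gaussian kernel is an assumption of this sketch.
    """
    w = np.exp(-0.5 * ((x_lag - x0) / h) ** 2)   # kernel weights around x0
    w = w / w.sum()                              # normalize weights to sum to one
    return np.sum(w * np.exp(1j * u * x_next))   # weighted average of exp(i u X_t)

# Hypothetical Gaussian AR(1) data: X_t = a X_{t-1} + eps_t, with a burn-in of 100.
rng = np.random.default_rng(0)
T, a = 5000, 0.5
eps = rng.standard_normal(T + 100)
x = np.zeros(T + 100)
for t in range(1, T + 100):
    x[t] = a * x[t - 1] + eps[t]
x = x[100:]

h = x.std(ddof=1) * T ** (-1 / 4.5)              # rule-of-thumb bandwidth
x0, u = 0.3, 1.0                                 # illustrative evaluation point
phi_hat = ccf_local_constant(x[:-1], x[1:], x0, u, h)
phi_true = np.exp(1j * u * a * x0 - 0.5 * u ** 2)  # exact CCF of this AR(1)
err = abs(phi_hat - phi_true)
```

Only one-dimensional smoothing in $X_{t-1}$ is needed here, in line with the dimensionality argument above.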
With the sample generalized covariance function $\hat\Gamma_j(u, v)$, we can construct a
consistent estimator for the flat generalized spectrum $F_0(\omega, u, v)$,
$$\hat F_0(\omega, u, v) = \frac{1}{2\pi} \hat\Gamma_0(u, v), \quad \omega \in [-\pi, \pi],\ u, v \in \mathbb{R}^d.$$
Consistent estimation for $F(\omega, u, v)$ is more challenging. We use a nonparametric
smoothed kernel estimator for $F(\omega, u, v)$:
$$\hat F(\omega, u, v) = \frac{1}{2\pi} \sum_{j=1-T}^{T-1} (1 - |j|/T)^{1/2}\, k(j/p)\, \hat\Gamma_j(u, v)\, e^{-ij\omega}, \quad \omega \in [-\pi, \pi],\ u, v \in \mathbb{R}^d, \tag{2.14}$$
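The kernel-smoothed estimator can be sketched numerically as follows. The excerpt does not show the definition of the sample generalized covariance, so the code assumes the form $\hat\Gamma_j(u,v) = (T-|j|)^{-1}\sum_t Z_t(u)\,\psi_{t-|j|}(v)$ with $\psi_t(v)$ the centered $e^{ivX_t}$. It also uses the true CCF of a hypothetical Gaussian AR(1) in place of the nonparametric estimate (an infeasible version), and a Bartlett kernel for $k(\cdot)$. All of these are assumptions of the example.

```python
import numpy as np

def gen_cov(z, psi, j):
    """Assumed sample generalized covariance at lag |j|: mean of Z_t(u) * psi_{t-|j|}(v)."""
    j = abs(j)
    n = len(z)
    return np.sum(z[j:] * psi[: n - j]) / (n - j)

def bartlett(s):
    """Bartlett kernel k(s) = max(1 - |s|, 0); one admissible choice, assumed here."""
    return max(1.0 - abs(s), 0.0)

def gen_spectrum(z, psi, omega, p):
    """Kernel-smoothed generalized spectrum in the form of equation (2.14)."""
    n = len(z)
    total = 0.0 + 0.0j
    for j in range(1 - n, n):
        kw = bartlett(j / p)
        if kw == 0.0:
            continue                              # lags with zero kernel weight drop out
        total += np.sqrt(1 - abs(j) / n) * kw * gen_cov(z, psi, j) * np.exp(-1j * j * omega)
    return total / (2 * np.pi)

# Hypothetical Markov data: Gaussian AR(1) with coefficient a, burn-in of 100.
rng = np.random.default_rng(0)
T, a = 3000, 0.5
x = np.zeros(T + 100)
for t in range(1, T + 100):
    x[t] = a * x[t - 1] + rng.standard_normal()
x = x[100:]

u, v = 1.0, 1.0
# Generalized residual Z_t(u) = exp(i u X_t) - phi(u | X_{t-1}), using the true AR(1) CCF.
z = np.exp(1j * u * x[1:]) - np.exp(1j * u * a * x[:-1] - 0.5 * u ** 2)
psi = np.exp(1j * v * x[1:])
psi = psi - psi.mean()

g1 = gen_cov(z, psi, 1)                           # should be near zero for Markov data
f_hat = gen_spectrum(z, psi, omega=0.0, p=5)
```

For Markov data the lag-1 generalized covariance `g1` should be close to zero, so the estimated spectrum is dominated by the lag-0 term, which is the "flat" shape described above.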
$$\hat C = \sum_{j=1}^{T-1} k^2(j/p)\,(T - j)^{-1} \sum_{t=|j|+1}^{T} \iint \left| \hat Z_t(u) \right|^2 \left| \hat\psi_{t-j}(v) \right|^2 dW(u)\, dW(v),$$
3. ASYMPTOTIC DISTRIBUTION
To derive the null asymptotic distribution of the test M̂, we impose the following
regularity conditions.
Assumption 1. (i) Assume $\{X_t\}$ is a strictly stationary β-mixing process with
mixing coefficient $\beta(j) = O(j^{-\nu})$ for some constant ν > 12; (ii) the marginal
density $g(x)$ of $X_t$ is bounded and Lipschitz, and the joint density $g_j(x, y)$ of $X_t$
and $X_{t-j}$ is bounded.
Assumption 2. For each sufficiently large integer $q$, there exists a $q$-dependent
stationary process $\{X_{qt}\}$, such that $E\|X_t - X_{qt}\|^2 \le Cq^{-\eta}$ for some constant
η ≥ 12 and all large $q$. The random vector $X_{qt}$ is measurable with respect to some
sigma field, which may be different from the sigma field generated by $\{X_t\}$.
Assumption 3. Let $\varphi(u|x)$ be the CCF of $X_t$ given $X_{t-1}$. For each $u \in \mathbb{R}^d$,
$\varphi(u|x)$ is $(r+1)$ times differentiable with respect to $x \in \mathbb{R}^d$ and
$\frac{\partial^{(r+1)}}{\partial x^{(r+1)}} \varphi(u|x)$ is Lipschitz of order $\alpha$:
$$\left\| \frac{\partial^{(r+1)}}{\partial x^{(r+1)}} \varphi(u|x_1) - \frac{\partial^{(r+1)}}{\partial x^{(r+1)}} \varphi(u|x_2) \right\| \le l(u)\, \|x_1 - x_2\|^{\alpha},$$
where $0 < \alpha \le 1$ and $\int l^2(u)\, dW(u) < \infty$.
Assumption 4. The function $\mathbf{K}$ is a product kernel of some univariate kernel
$K$, i.e., $\mathbf{K}(u) = \prod_{j=1}^{d} K(u_j)$, where $K : G \to \mathbb{R}^+$ is a symmetric and bounded
function and $G$ is a compact set. The function $H_j(u) \equiv u^j \mathbf{K}(u)$ is Lipschitz for
all $j$ with $0 \le |j| \le 2r + 1$.
Assumption 5. (i) $k : \mathbb{R} \to [-1, 1]$ is a symmetric function that is continuous
at zero and all points in $\mathbb{R}$ except for a finite number of points; (ii) $k(0) = 1$; (iii)
$k(z) \le c|z|^{-b}$ for some b > 34 as $|z| \to \infty$.
Assumption 6. $W : \mathbb{R}^d \to \mathbb{R}^+$ is a nondecreasing weighting function that
weighs sets symmetric about the origin equally, with $\int \|u\|^4\, dW(u) < \infty$.
Assumptions 1–3 are regularity conditions on the DGP of $\{X_t\}$. Assumption
1(i) restricts the degree of temporal dependence of $\{X_t\}$. We say that $\{X_t\}$ is
β-mixing (absolutely regular) if
$$\beta(j) = \sup_{s \ge 1} E\left[ \sup_{A \in \mathcal{F}_{s+j}^{\infty}} \left| P(A|\mathcal{F}_1^{s}) - P(A) \right| \right] \to 0,$$
$$= \alpha \sum_{j=q+1}^{\infty} \prod_{i=1}^{j} E\left( \beta \varepsilon_{t-i}^2 \right) = \frac{\alpha \beta^{q+1}}{1 - \beta}.$$
Thus Assumption 2 holds if $\beta < 1$.
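For the third example we consider a mean-reverting Ornstein-Uhlenbeck process $X_t$:
$$dX_t = \kappa(\theta - X_t)\,dt + \sigma\, dW_t,$$
where $W_t$ is the standard Brownian motion. This is known as Vasicek's (1977)
model in the interest rate term structure literature. From the stationarity
condition, we have $X_t \sim N\!\left(\theta, \frac{\sigma^2}{2\kappa}\right)$. Define
$X_{qt} = \theta + \int_{t-q}^{t} \sigma e^{-\kappa(t-s)}\, dW_s$, which is a
$q$-dependent process. Then Assumption 2 holds because
$$\begin{aligned}
E\left( X_t - X_{qt} \right)^2 &= E\left[ e^{-\kappa t}(X_0 - \theta) + \sigma \int_0^{t-q} e^{-\kappa(t-s)}\, dW_s \right]^2 \\
&= \frac{\sigma^2}{2\kappa} e^{-2\kappa t} + \sigma^2 \int_0^{t-q} e^{-2\kappa(t-s)}\, ds \\
&= \frac{\sigma^2 e^{-2\kappa q}}{2\kappa} = o\!\left( q^{-\eta} \right), \quad \text{for any } \eta > 0.
\end{aligned}$$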
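The last two equalities can be checked numerically. The sketch below compares the closed form $\sigma^2 e^{-2\kappa q}/(2\kappa)$ with the two-term expression in the middle line, evaluating the integral by a midpoint rule. The parameter values are hypothetical and chosen only for illustration.

```python
import math

def ou_approx_mse(kappa, sigma, q):
    """Closed form E(X_t - X_qt)^2 = sigma^2 * exp(-2*kappa*q) / (2*kappa)."""
    return sigma ** 2 * math.exp(-2 * kappa * q) / (2 * kappa)

def ou_approx_mse_numeric(kappa, sigma, q, t, n=200_000):
    """Two-term expression, with the integral over [0, t - q] done by the midpoint rule."""
    first = sigma ** 2 / (2 * kappa) * math.exp(-2 * kappa * t)
    step = (t - q) / n
    integral = step * sum(
        math.exp(-2 * kappa * (t - (k + 0.5) * step)) for k in range(n)
    )
    return first + sigma ** 2 * integral

# Hypothetical parameter values (not taken from the paper).
kappa, sigma, q, t = 0.5, 0.2, 3.0, 10.0
closed = ou_approx_mse(kappa, sigma, q)
numeric = ou_approx_mse_numeric(kappa, sigma, q, t)

# Exponential decay in q beats any polynomial rate q**(-eta); eta = 4 as an example.
scaled = [qq ** 4 * ou_approx_mse(kappa, sigma, qq) for qq in (10.0, 20.0, 40.0)]
```

The two values agree up to quadrature error, and the scaled sequence shrinks as $q$ grows, consistent with the $o(q^{-\eta})$ claim.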
H0 , M̂ →d N (0, 1) as T → ∞.
As an important feature of M̂, the use of the nonparametrically estimated gen-
eralized residual Ẑ t (u) in place of the true unobservable residual Z t (u) has
no impact on the limit distribution of M̂. One can proceed as if the true CCF
ϕ(u|Xt−1 ) were known and equal to the nonparametric estimator ϕ̂(u|Xt−1 ).
The reason is that by choosing suitable bandwidth h and lag order p, the conver-
gence rate of the nonparametric CCF estimator ϕ̂(u|Xt−1 ) is faster than that of the
nonparametric estimator F̂ (ω, u, v) to F (ω, u, v) . Consequently, the limiting
distribution of M̂ is solely determined by F̂ (ω, u, v) , and replacing ϕ(u|Xt−1 )
4. ASYMPTOTIC POWER
Our test is derived without assuming a specific alternative to H0 . To get insights
into the nature of the alternatives that our test is able to detect, we now examine
the asymptotic power behavior of M̂ under H A in (2.2).
THEOREM 2. Suppose Assumptions 1 and 3–6 hold, and p = cT^λ for 0 < λ <
−1 −δ λν 1
(3 + 4b−2 ) and 0 < c < ∞, h = cT , δ ∈ 4(r +1) , min( 2d , d ) . Then under
1 2−λ
H A , and as T → ∞,
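$$\frac{p^{1/2}}{T} \hat M \to^{p} \frac{1}{\sqrt{D}} \sum_{j=1}^{\infty} \iint \left| \Gamma_j(u, v) \right|^2 dW(u)\, dW(v)
= \frac{\pi}{2\sqrt{D}} \iint \int_{-\pi}^{\pi} \left| F(\omega, u, v) - F_0(\omega, u, v) \right|^2 d\omega\, dW(u)\, dW(v),$$
where
$$D = 2 \int_0^{\infty} k^4(z)\, dz \iint \left| \Sigma_0(u_1, u_2) \right|^2 dW(u_1)\, dW(u_2) \sum_{j=-\infty}^{\infty} \iint \left| \Omega_j(v_1, v_2) \right|^2 dW(v_1)\, dW(v_2),$$
$\Omega_j(u, v) = \mathrm{cov}\left( e^{iu'X_t}, e^{iv'X_{t-|j|}} \right)$ and $\Sigma_0(u, v) = \mathrm{cov}\left[ Z_t(u), Z_t(v) \right]$.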
$\Gamma_j(u, v)$ can capture various departures from the Markov property in every
conditional moment of $X_t$ in view of the Taylor series expansion in (2.7). Suppose
$E[Z_t(u)|X_{t-j}] \neq 0$ at some lag $j > 0$. Then we have $\iint |\Gamma_j(u, v)|^2\, dW(u)\, dW(v) > 0$
for any weighting function $W(\cdot)$ that is positive, monotonically
increasing, and continuous, with unbounded support on $\mathbb{R}^d$. Consequently,
$P[\hat M > C(T)] \to 1$ for any sequence of constants $\{C(T) = o(T/p^{1/2})\}$. Thus $\hat M$ has
asymptotic unit power at any given significance level, whenever $E[Z_t(u)|X_{t-j}] \neq 0$
at some lag $j > 0$.
Thus, to ensure the consistency property of M̂, it is important to integrate u
and v over the entire domain of Rd . When numerical integration is difficult, as
is the case where the dimension d is large, one can use Monte Carlo simulation
to approximate the integrals over u and v. This can be obtained by using a large
number of random draws from the distribution W (·) and then computing the sam-
ple average as an approximation to the related integral. Such an approximation
will be arbitrarily accurate provided the number of random draws is sufficiently
large. Alternatively, we can use a nondecreasing step function W (·). This avoids
numerical integration or Monte Carlo simulation, but the power of the test may be
affected. In theory, the consistency property will not be preserved if only a finite
number of grid points of u and v are used, and the power of the test may depend
on the choice of grid points for u and v.
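The Monte Carlo approximation described above can be sketched in a few lines. The example takes $W(\cdot)$ to be the N(0,1) distribution function and uses a toy integrand with a known answer. Both choices are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def mc_integral(f, n_draws, rng):
    """Approximate the integral of f(u) dW(u) by a sample average over draws from W.

    W is taken to be the standard normal distribution (an assumption of this sketch).
    """
    u = rng.standard_normal(n_draws)
    return float(np.mean(f(u)))

rng = np.random.default_rng(1)

# Toy integrand with a known answer: E[exp(-U^2)] = 1/sqrt(3) for U ~ N(0, 1).
approx = mc_integral(lambda u: np.exp(-u ** 2), 200_000, rng)
exact = 1.0 / np.sqrt(3.0)
abs_error = abs(approx - exact)
```

The approximation error shrinks at rate $n^{-1/2}$ in the number of draws, which is why a large number of draws is needed for an accurate integral.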
On the other hand, Theorem 2 implies that the M̂ test can check departure
from the Markov property at any lag order j > 0, as long as the sample size T is
sufficiently large. This is achieved because M̂ includes an increasing number of
lags as the sample size T → ∞. Usually the use of a large number of lags would
lead to the loss of a large number of degrees of freedom. Fortunately this is not the
case with the M̂ test, thanks to the downward weighting of k 2 (·) for higher-order
lags.
As revealed by the Taylor series expansion in (2.7), our test, which is based on
the MDS characterization in (2.6), essentially checks departures from the Markov
property in every conditional moment. When M̂ rejects the Markov property, one
may be further interested in what causes the rejection. To gauge possible sources
of the violation of the Markov property, we can construct a sequence of tests based
on the derivatives of the nonparametric regression residual Z t (u) at the origin 0:
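$$\left. \frac{\partial^{|m|}}{\partial u_1^{m_1} \cdots \partial u_d^{m_d}} E\left[ Z_t(u)|I_{t-1} \right] \right|_{u=0}
= E\left( X_{1t}^{m_1} \cdots X_{dt}^{m_d} | I_{t-1} \right) - E\left( X_{1t}^{m_1} \cdots X_{dt}^{m_d} | X_{t-1} \right) = 0,$$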
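$$\hat C(m) = \sum_{j=1}^{T-1} k^2(j/p)\, \frac{1}{T - j} \sum_{t=|j|+1}^{T} \int \left| \hat Z_t^{(m)}(0) \right|^2 \left| \hat\psi_{t-j}(v) \right|^2 dW(v),$$
$$\hat D(m) = 2 \sum_{j=1}^{T-2} \sum_{l=1}^{T-2} k^2(j/p)\, k^2(l/p) \iint \left| \frac{1}{T - \max(j, l)} \sum_{t=\max(j,l)+1}^{T} \left| \hat Z_t^{(m)}(0) \right|^2 \hat\psi_{t-j}(v_1)\, \hat\psi_{t-j}(v_2) \right|^2 dW(v_1)\, dW(v_2),$$
and
$$\hat Z_t^{(m)}(0) = \prod_{a=1}^{d} (iX_{at})^{m_a} - E\left[ \prod_{a=1}^{d} (iX_{at})^{m_a} \,\Big|\, X_{t-1} \right].$$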
These derivative tests may provide additional useful information on the possi-
ble sources of the violation of the Markov property. On the other hand, some
economic theories only have implications for the Markov property in certain
moments, and our derivative tests are suitable to test these implications. For
example, Hall (1978) shows that a rational expectation model of consumption can be
characterized by the Euler equation $E[u'(C_{t+1})|I_t] = u'(C_t)$, where $u'(C_t)$
is the marginal utility of consumption $C_t$. This can be viewed as the Markov
property in mean for the marginal utility process of consumption. The derivative
test $\hat M(1)$ can be used to test this implication.
5. NUMERICAL RESULTS
5.1. Monte Carlo Simulations
Theorem 1 provides the null asymptotic N (0, 1) distribution of M̂. Thus, one can
implement our test for H0 by comparing M̂ with a N (0, 1) critical value. How-
ever, like many other nonparametric tests in the literature, the size of M̂ in finite
samples may differ significantly from the prespecified asymptotic significance
level. Our analysis suggests that the asymptotic theory may not work well even
for relatively large sample sizes, because the asymptotically negligible higher-
order terms in M̂ are close in order of magnitude to the dominant U -statistic
that determines the limit distribution of M̂. In particular, the first-stage smoothed
nonparametric regression estimation for ϕ(u|Xt−1 ) may have substantial adverse
effect on the size of M̂ in finite samples. Indeed, our simulation study shows that
M̂ displays severe underrejection under H0 . We examine the finite sample per-
formance of an infeasible M̂ test by replacing the estimated generalized residual
Ẑ t (u) with the true generalized residual Z t (u). We find that the size of the in-
feasible test is reasonable. This experiment suggests that the underrejection of M̂
is mainly due to the impact of the first-stage nonparametric estimation of CCF,
which has a rather slow convergence rate. Similar problems are also documented
by Skaug and Tjøstheim (1993, 1996), Hong and White (2005), and Fan, Li, and
Min (2006) in other contexts.
To overcome this problem, we use Horowitz's (2003) smoothed nonparametric
conditional density bootstrap procedure to approximate the finite-sample
null distribution of $\hat M$ more accurately. The basic idea is to use a smoothed
nonparametric transition density estimator (under $H_0$) to generate bootstrap samples.
Specifically, it involves the following steps:
Step (i). To obtain a bootstrap sample $X^b \equiv \{X_t^b\}_{t=1}^{T}$, draw $X_1^b$ from the
smoothed unconditional kernel density
$$\hat g(x) = \frac{1}{T} \sum_{s=2}^{T} K_h(x - X_{s-1})$$
$\hat M^b$.
Step (iii). Repeat steps (i) and (ii) B times to obtain B bootstrap test statistics
$\{\hat M_l^b\}_{l=1}^{B}$.
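The bootstrap loop can be sketched as follows for a univariate series. Several pieces are assumptions of this illustration rather than the paper's procedure: the Gaussian kernel, the way later observations are drawn (a kernel-weighted resampling of observed transitions, standing in for the smoothed transition density), the upper-tailed p-value, and the placeholder `toy_statistic`, which is not the $\hat M$ statistic.

```python
import numpy as np

def smoothed_bootstrap_sample(x, h, rng):
    """Draw one bootstrap path from kernel-smoothed marginal and transition densities."""
    T = len(x)
    xb = np.empty(T)
    # Step (i): draw the first value from the smoothed unconditional density,
    # i.e. pick one of X_1, ..., X_{T-1} at random and add kernel noise.
    xb[0] = x[rng.integers(0, T - 1)] + h * rng.standard_normal()
    for t in range(1, T):
        # Weight each observed transition (X_{s-1}, X_s) by how close X_{s-1}
        # is to the previous bootstrap value (assumed smoothed transition density).
        w = np.exp(-0.5 * ((x[:-1] - xb[t - 1]) / h) ** 2)
        if w.sum() == 0.0:                        # guard against numerical underflow
            w[np.argmin(np.abs(x[:-1] - xb[t - 1]))] = 1.0
        idx = rng.choice(T - 1, p=w / w.sum())
        xb[t] = x[idx + 1] + h * rng.standard_normal()
    return xb

def bootstrap_pvalue(x, statistic, h, B, rng):
    """Share of bootstrap statistics at least as large as the observed one."""
    m_hat = statistic(x)
    m_boot = np.array(
        [statistic(smoothed_bootstrap_sample(x, h, rng)) for _ in range(B)]
    )
    return float(np.mean(m_boot >= m_hat))

def toy_statistic(x):
    """Placeholder statistic (absolute lag-2 autocorrelation), NOT the paper's test."""
    xc = x - x.mean()
    return abs(np.sum(xc[2:] * xc[:-2]) / np.sum(xc ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(80)                       # hypothetical data
h = x.std(ddof=1) * len(x) ** (-1 / 4.5)          # rule-of-thumb bandwidth
xb = smoothed_bootstrap_sample(x, h, rng)
p_value = bootstrap_pvalue(x, toy_statistic, h, B=20, rng=rng)
```

Because each bootstrap path is generated from a first-order transition mechanism, it satisfies the Markov property by construction, which is what makes it suitable for approximating the null distribution.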
The consistency of the smoothed bootstrap does not indicate the degree of im-
provement of the smoothed bootstrap upon the asymptotic distribution. Since M̂
is asymptotically pivotal, it is possible that M̂ b can achieve reasonable accuracy
in finite samples. We shall examine the performance of the smoothed bootstrap in
our simulation study.
We shall compare the finite sample performance of our M̂ test with Su and
White’s (SW ) (2007, 2008) CCF-based test and Hellinger metric test for condi-
tional independence.12 To examine the size of the tests under H0 , we consider
two Markov DGPs:
To examine the power of the tests using the smoothed bootstrap, we consider
the following non-Markovian DGPs:
where εt ∼ i.i.d.N (0, 1) , and in DGPs P4 and P5, St is a latent state variable that
follows a two-state Markov chain with transition probabilities P(St = 1|St−1 = 0)
= P (St = 0|St−1 = 1) = 0.9. DGPs P4 and P5 are the Markov Chain Regime-
Switching model and Markov Chain Regime-Switching ARCH model proposed
by Hamilton (1989) and Hamilton and Susmel (1994), respectively. They can cap-
ture the state-dependent behaviors in time series. The introduction of St changes
the Markov property of AR(1) and ARCH(1) processes. The knowledge of X t−1
is not sufficient to summarize all relevant information in It−1 that is useful to
predict the future behavior of X t . The departure from the Markov property comes
from the conditional mean in DGPs P1 and P4, from the conditional variance in
DGPs P2 and P5, and from both the conditional mean and conditional variance in
DGP P3.
Throughout, we consider three sample sizes: T = 100, 250, 500. For each DGP
we first generate T + 100 observations and then discard the first 100 to mitigate
the impact of the initial values. To examine the bootstrap sizes and powers of
the tests, we generate 500 realizations of the random sample $\{X_t\}_{t=1}^{T}$, using the
for $K(\cdot)$. For simplicity, we choose $h = \hat S_X T^{-1/4.5}$, where $\hat S_X$ is the sample
standard deviation of $\{X_t\}_{t=1}^{T}$.15 We compare the proposed test with Su and White's
(2007, 2008) tests, applied to the present context to check whether $X_t$ is
independent of $X_{t-2}$ conditional on $X_{t-1}$. Following Su and White (2007, 2008), we
choose a fourth-order kernel $K(u) = (3 - u^2)\varphi(u)/2$, where $\varphi(\cdot)$ is the N(0, 1)
density function, $h = T^{-1/8.5}$ for the nonparametric estimation of their Hellinger
metric test $SW_a$, $h_1 = h_1^* T^{1/10} T^{-1/6}$ and $h_2 = h_2^* T^{1/9} T^{-1/5}$ for their CCF-based
test $SW_b$, where $h_1^*$ and $h_2^*$ are the least-squares cross-validated bandwidths for
estimating the conditional expectations of $X_t$ given $(X_{t-1}, X_{t-2})$ and $X_{t-1}$,
respectively, and $b = T^{-1/5}$ for the bootstrap.
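As a small check on these tuning choices, the sketch below computes the rule-of-thumb bandwidth $h = \hat S_X T^{-1/4.5}$ and verifies numerically that $K(u) = (3 - u^2)\varphi(u)/2$ integrates to one and has a zero second moment, which is what makes it a fourth-order kernel. The integration grid and the use of the `ddof=1` sample standard deviation are assumptions of the example.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """h = S_X * T**(-1/4.5), with S_X the sample standard deviation (ddof=1 assumed)."""
    return np.std(x, ddof=1) * len(x) ** (-1 / 4.5)

def fourth_order_kernel(u):
    """K(u) = (3 - u^2) * phi(u) / 2, with phi the N(0,1) density."""
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return (3 - u ** 2) * phi / 2

# Riemann sums on a fine grid; the tails beyond |u| = 10 are negligible.
grid = np.linspace(-10, 10, 200_001)
du = grid[1] - grid[0]
k_vals = fourth_order_kernel(grid)
mass = np.sum(k_vals) * du                        # should be close to 1
second_moment = np.sum(grid ** 2 * k_vals) * du   # should be close to 0

rng = np.random.default_rng(0)
x = rng.standard_normal(500)                      # hypothetical sample, T = 500
h = rule_of_thumb_bandwidth(x)
```

Note that this kernel takes negative values for $|u| > \sqrt{3}$, which is typical of higher-order kernels.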
Table 1 reports the bootstrap sizes and powers of M̂, SWa , and SWb at the
10% and 5% levels under DGPs S1–S2 and P1–P5. The M̂ test has reasonable
sizes under the DGPs S1 and S2 at both 10% and 5% levels. Under DGP S1
(AR(1)) the empirical levels of M̂ are very close to the nominal levels, especially
at the 5% level. When T = 100, M̂ tends to overreject a little under DGP S2
(ARCH(1)), but the overrejection is not excessive, and it improves as T increases.
The sizes of M̂ are not very sensitive to the choice of the preliminary lag order
p̄. The smoothed bootstrap procedure has reasonable sizes in small samples. We
note that the rejection rate of SWa decreases monotonically under DGP S1 and
reaches 2.8% at the 5% level when T = 500, but SWb has good sizes under both
DGPs.
Under DGPs P1–P5, X t is not Markov, and our test has reasonable power.
Under DGPs P1 and P4 (MA(1) and Markov Chain Regime-Switching), our test
dominates SWa and SWb for all sample sizes considered. Interestingly, SWa and
SWb have nonmonotonic power against DGP P4, and their rejection rates only
reach 10.4% and 7.2%, respectively, at the 5% level when T = 500. In contrast,
the power of M̂ is around 50% at the 5% level when T = 500. Under DGPs P2, P3,
and P5 (GARCH(1,1), GARCH-in-Mean, and Markov Chain Regime-Switching
ARCH), SWa and SWb perform slightly better in small samples, but the power
of our M̂ test increases more quickly with T , and our test outperforms SWa and
SWb when T = 500, which demonstrates the nice feature of our frequency domain
approach. The relative ranking between SWa and SWb does not display a very
clear pattern, but SWb is more powerful under DGPs P1–P3.
In summary, the new M̂ test with the smoothed bootstrap procedure delivers
reasonable size and omnibus power against various non-Markov alternatives in
small samples. It performs well relative to two existing tests SWa and SWb in
many cases.
Notes: (i) M̂ is our proposed omnibus test, given in (2.19); SWa and SWb are Su and White's (2008) Hellinger metric test and Su and White's (2007) characteristic function-based test, respectively; (ii) 500 iterations and 100 bootstrap iterations for each simulation iteration.
the spot interest rates. Although work is ongoing to enrich
model specification in terms of jumps and functional forms, the models proposed
continue to be Markov processes. In fact, the firm rejection of a continuous-time
model could be due to the violation of the Markov property, as speculated by Hong
and Li. If this is indeed the case, one should not attempt to look for flexible func-
tional forms within the class of Markov models. On the other hand, as discussed
earlier, an important conclusion of the asymmetric information microstructure
models (e.g., Easley and O’Hara, 1987, 1992) is that asset price sequences do not
follow a Markov process. It is interesting to check whether real stock prices are
consistent with this conjecture.
We apply our test to three important financial time series—stock prices, interest
rates, and foreign exchange rates—and compare it with SWa and SWb . We use
the Standard & Poor’s 500 price index, 7-day Eurodollar rate, and Japanese yen,
obtained from Datastream. The data are weekly series from 1 January 1988 to 31
December 2006. The weekly series are generated by selecting Wednesday observations
(if a Wednesday is a holiday then the preceding Tuesday is used), which all have
991 observations. The use of weekly data avoids the so-called weekend effect, as
well as other biases associated with nontrading, asynchronous rates, and so on,
which are often present in higher-frequency data. To examine the sensitivity of
our conclusion to the possible structural changes, we consider two subsamples:
1 January 1988 to 31 December 1997, for a total of 521 observations, and 1 Jan-
uary 1998 to 31 December 2006, for a total of 470 observations. Figure 1 provides
the time series plots and Table 2 reports some descriptive statistics. The aug-
mented Dickey-Fuller test indicates that there exists a unit root in all three level
series but not in their first differenced series. Therefore, as is a standard practice,
we use S&P 500 log returns, 7-day Eurodollar rate changes, and Japanese yen log
returns. To check possible structural changes, we use Inoue’s (2001) Kolmogorov-
Smirnov (KS) test for the stability of stationary distributions.16 Table 2 shows that
we are unable to reject the distribution stability hypothesis for all series in both
sample periods at the 5% level, and we are only able to reject the distribution sta-
bility hypothesis for 7-day Eurodollar rate changes for the full sample at the 10%
level. On the other hand, to check the robustness of our test to possible structural
breaks, we apply our test to an AR(1) model with a structural break in a simulation
study (results are available upon request). This DGP is Markov, but there
exists a structural break. Our test does not overreject the null Markov hypothe-
sis. This suggests that our test may be robust to some forms of structural breaks
in practice.
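The data construction described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `weekly_wednesday_series` and `log_returns` are hypothetical, the input is assumed to be a mapping from trading dates to price levels, and a week in which neither Wednesday nor the preceding Tuesday is available is simply skipped (the text does not say how such weeks are treated).

```python
from datetime import date, timedelta
import math


def weekly_wednesday_series(daily):
    """Pick one observation per week from `daily` (dict: date -> level).

    Wednesday is used when available; otherwise the preceding Tuesday.
    Weeks with neither day present are skipped (an assumption of this sketch).
    """
    if not daily:
        return []
    first, last = min(daily), max(daily)
    # First Wednesday on or after the first date (Monday == 0, Wednesday == 2).
    day = first + timedelta(days=(2 - first.weekday()) % 7)
    picked = []
    while day <= last:
        tuesday = day - timedelta(days=1)
        if day in daily:
            picked.append((day, daily[day]))
        elif tuesday in daily:
            picked.append((tuesday, daily[tuesday]))
        day += timedelta(days=7)
    return picked


def log_returns(levels):
    """Log returns log(P_t / P_{t-1}) of a sequence of positive levels."""
    return [math.log(b / a) for a, b in zip(levels, levels[1:])]
```

For the interest rate series the text uses simple changes rather than log returns, so only the first function would apply there.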
Table 3 reports the test statistics and bootstrap p-values of our test, SWa , and
SWb . The bootstrap p-values, based on B = 500 bootstrap iterations, are com-
puted as described in Section 5.1. For all sample periods considered, the bootstrap
p-values of our test statistics are quite robust to the choice of the preliminary lag
order p̄. For the whole sample and the subsample of 1998 to 2006, we find strong
evidence against the Markov property for S&P 500 returns, 7-day Eurodollar rate
changes, and Japanese yen returns: All bootstrap p-values of our test are smaller
than 5%. For the subsample of 1988 to 1997, we reject the Markov property only for
7-day Eurodollar rate changes at the 5% level. The results of SWa and SWb are mixed
and show no clear pattern. For example, at the 10% level, SWa rejects the Markov
property only for S&P 500 returns and 7-day Eurodollar rate changes from 1998 to
2006, and SWb rejects it only for S&P 500 returns from 1988 to 2006 and for 7-day
Eurodollar rate changes from 1988 to 2006 and from 1988 to 1997.
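As a rough illustration of how a bootstrap p-value of this kind is typically formed, the sketch below uses the conventional upper-tailed rule: the share of bootstrap statistics at least as large as the sample statistic. The procedure of Section 5.1 is not reproduced here, so this rule and the name `bootstrap_p_value` are assumptions. With B = 500 the resulting p-values are multiples of 0.002, which is consistent with the entries in Table 3.

```python
def bootstrap_p_value(stat, boot_stats):
    """Upper-tailed bootstrap p-value: fraction of bootstrap statistics >= stat.

    `stat` is the statistic computed on the observed sample; `boot_stats` holds
    the B statistics computed on bootstrap resamples (assumed supplied by the caller).
    """
    if not boot_stats:
        raise ValueError("need at least one bootstrap statistic")
    exceed = sum(1 for s in boot_stats if s >= stat)
    return exceed / len(boot_stats)
```

For example, if 8 of 500 bootstrap statistics are at least as large as the observed one, the p-value is 8/500 = 0.016.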
To gauge possible sources of the violation of the Markov property, we also
implement derivative tests M̂(m), m = 1, 2, 3, 4, as described in Section 4. Tests
and their results are reported in Table 4. We first consider S&P 500 returns. A
TABLE 2. Descriptive statistics for S&P 500, interest rate, and exchange rate

              01/01/1988 − 31/12/2006          01/01/1988 − 31/12/1997          01/01/1998 − 31/12/2006
              S&P       Eurodollar  JY         S&P       Eurodollar  JY         S&P       Eurodollar  JY
Sample size   991       991         991        521       521         521        470       470         470
Mean          0.0017    −0.0017     −0.0001    0.0025    −0.0025     0.0000     0.0008    −0.0004     −0.0002
Std           0.0209    0.3272      0.0145     0.0179    0.4087      0.0146     0.0238    0.2019      0.0145
ADF           −0.58     −1.19       −2.07      2.10      −1.00       −1.19      −1.78     −0.72       −2.42
              (0.8728)  (0.6808)    (0.2550)   (0.9999)  (0.7532)    (0.6786)   (0.3896)  (0.8395)    (0.1362)
KS            0.2828    0.0707      0.3939     0.6364    0.1010      0.1818     0.1818    0.1414      0.2828

Notes: ADF denotes the augmented Dickey-Fuller test; KS denotes the Kolmogorov-Smirnov test for the stability of stationary distributions proposed by Inoue (2001).
TABLE 3. Markov test for S&P 500, interest rate, and exchange rate
S&P 500 7-day Eurodollar rate Japanese yen
lag Statistics p-values Statistics p-values Statistics p-values
M̂ 01/01/1988 − 31/12/2006
10 0.86 0.0160 0.75 0.0080 1.34 0.0000
11 0.86 0.0160 0.98 0.0040 1.39 0.0000
12 0.89 0.0160 1.16 0.0040 1.52 0.0000
13 0.95 0.0160 1.35 0.0040 1.65 0.0000
14 1.01 0.0160 1.58 0.0020 1.76 0.0000
15 1.05 0.0180 1.79 0.0020 1.85 0.0000
16 1.07 0.0180 1.99 0.0020 1.94 0.0000
17 1.09 0.0200 2.22 0.0020 2.01 0.0000
18 1.11 0.0180 2.48 0.0020 2.08 0.0000
19 1.12 0.0180 2.74 0.0020 2.15 0.0000
20 1.13 0.0180 2.97 0.0000 2.21 0.0000
SWa 0.79 0.1680 −4.61 0.9920 0.09 0.4600
SWb 0.36 0.0940 0.21 0.0520 −0.85 0.5700
M̂ 01/01/1988 − 31/12/1997
10 −1.39 0.5940 0.25 0.0100 −0.35 0.0980
11 −1.39 0.6120 0.30 0.0100 −0.35 0.1060
12 −1.35 0.6080 0.34 0.0080 −0.27 0.1020
13 −1.30 0.5900 0.38 0.0060 −0.21 0.0980
14 −1.25 0.5840 0.41 0.0080 −0.16 0.0980
15 −1.20 0.5780 0.45 0.0080 −0.12 0.1040
16 −1.15 0.5600 0.49 0.0080 −0.08 0.1040
17 −1.08 0.5260 0.51 0.0080 −0.04 0.1060
18 −1.02 0.5080 0.54 0.0100 −0.01 0.1100
19 −0.96 0.4860 0.57 0.0100 0.03 0.1100
20 −0.91 0.4640 0.62 0.0080 0.06 0.1100
SWa −0.36 0.6540 −4.85 0.9900 0.16 0.3640
SWb −0.14 0.1680 0.07 0.0700 0.03 0.1100
M̂ 01/01/1998 − 31/12/2006
10 1.68 0.0080 0.34 0.0100 0.71 0.0100
11 1.88 0.0060 0.74 0.0040 0.76 0.0120
12 2.06 0.0040 1.08 0.0000 0.82 0.0140
13 2.22 0.0020 1.36 0.0000 0.88 0.0140
14 2.36 0.0020 1.62 0.0000 0.94 0.0140
15 2.48 0.0020 1.87 0.0000 0.98 0.0140
16 2.58 0.0020 2.09 0.0000 1.02 0.0100
17 2.66 0.0000 2.27 0.0000 1.06 0.0100
18 2.74 0.0000 2.44 0.0000 1.09 0.0100
19 2.81 0.0000 2.60 0.0000 1.11 0.0120
20 2.88 0.0000 2.75 0.0000 1.14 0.0120
SWa 1.12 0.0960 1.50 0.0180 0.63 0.2160
SWb −0.07 0.1520 −0.18 0.1740 −1.28 0.8600
Notes: (i) M̂ is our proposed omnibus test, given in (2.19); SWa and SWb are Su and White’s (2008) Hellinger metric
test and Su and White’s (2007) characteristic function-based test, respectively; (ii) 500 bootstrap iterations.
TABLE 4. Derivative tests for S&P 500, interest rate, and exchange rate

                               M̂(1)     M̂(2)     M̂(3)     M̂(4)
S&P 500 1988–2006              0.4220   0.2380   0.2160   0.2220
S&P 500 1988–1997              0.6320   0.0140   0.2680   0.0280
S&P 500 1998–2006              0.7300   0.0000   0.7140   0.0040
7-day Eurodollar 1988–2006     0.0020   0.0000   0.0000   0.0000
7-day Eurodollar 1988–1997     0.0060   0.0060   0.0100   0.0140
7-day Eurodollar 1998–2006     0.0400   0.0440   0.1040   0.0520
Japanese yen 1988–2006         0.0780   0.0000   0.0200   0.0120
Japanese yen 1988–1997         0.0360   0.0760   0.1100   0.1440
Japanese yen 1998–2006         0.6048   0.1060   0.0360   0.1520

Notes: (i) M̂(m), m = 1, 2, 3, 4, are our proposed derivative tests, given in (4.1); (ii) the bootstrap p-values are
calculated by the smoothed nonparametric transition density-based bootstrap procedure described in Section 5 with
500 bootstrap iterations.
bit surprisingly, for the whole sample the four derivative tests M̂(m), m = 1, 2, 3, 4,
all fail to reject the Markov hypothesis. For the two subsamples, however, M̂(2) and
M̂(4) reject the null at the 5% level, while M̂(1) and M̂(3) do not. These results
suggest that the violation of the Markov property may come from the conditional
variance and kurtosis dynamics of S&P 500 returns. For 7-day Eurodollar rate changes,
all four derivative tests firmly reject the null hypothesis at the 5% level in both
the whole sample and the first subsample. For the second subsample, M̂(1) and M̂(2)
reject the null at the 5% level, but M̂(3) and M̂(4) do not. It seems that the
violation of the Markov property for the 7-day Eurodollar rate comes from both mean
and variance dynamics, and possibly also from higher-order moment dynamics. For
Japanese yen returns, M̂(2), M̂(3), and M̂(4) strongly reject the null hypothesis at
the 5% level for the whole sample. However, the results from the two subsamples are
less clear: for the first subsample only M̂(1) rejects the null hypothesis at the 5%
level, and for the second subsample only M̂(3) does. To sum up, for all three
financial series, we find strong evidence of violation of the Markov property in
the conditional variance, among other things. This is consistent with the popular
use of such non-Markovian models as generalized autoregressive conditional
heteroskedasticity (GARCH) and stochastic volatility models in capturing the
dynamics of price sequences in the literature.
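The reading of Table 4 given above can be checked mechanically. The sketch below simply transcribes the bootstrap p-values from Table 4 and lists, for each series and period, which derivative orders m reject at a chosen level; the names `TABLE4` and `rejected_orders` are hypothetical and introduced only for illustration.

```python
# Bootstrap p-values of the derivative tests, transcribed from Table 4.
# Each tuple holds the p-values for m = 1, 2, 3, 4.
TABLE4 = {
    ("S&P 500", "1988-2006"): (0.4220, 0.2380, 0.2160, 0.2220),
    ("S&P 500", "1988-1997"): (0.6320, 0.0140, 0.2680, 0.0280),
    ("S&P 500", "1998-2006"): (0.7300, 0.0000, 0.7140, 0.0040),
    ("7-day Eurodollar", "1988-2006"): (0.0020, 0.0000, 0.0000, 0.0000),
    ("7-day Eurodollar", "1988-1997"): (0.0060, 0.0060, 0.0100, 0.0140),
    ("7-day Eurodollar", "1998-2006"): (0.0400, 0.0440, 0.1040, 0.0520),
    ("Japanese yen", "1988-2006"): (0.0780, 0.0000, 0.0200, 0.0120),
    ("Japanese yen", "1988-1997"): (0.0360, 0.0760, 0.1100, 0.1440),
    ("Japanese yen", "1998-2006"): (0.6048, 0.1060, 0.0360, 0.1520),
}


def rejected_orders(pvals, level=0.05):
    """Return the derivative orders m whose p-value is below `level`."""
    return [m for m, p in enumerate(pvals, start=1) if p < level]


if __name__ == "__main__":
    for (series, period), pvals in TABLE4.items():
        print(series, period, rejected_orders(pvals))
```

The second-order test rejects in at least one sample period for every series, which matches the emphasis on conditional variance dynamics.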
As many financial time series have been documented to have a long memory
property, which is non-Markov, we also apply Lobato and Robinson’s (1998) test
for long memory. Results (not reported here) show no evidence of long memory for
S&P 500 returns and Japanese yen returns in the whole sample or in either
subsample, while there is some evidence of long memory for 7-day Eurodollar rate
changes. Thus, we cannot rule out the possibility that the rejection of the Markov
property for the 7-day Eurodollar rate is due to long memory. Indeed, the evidence
against the Markov property is strongest for 7-day Eurodollar rate changes among
the three time series.
The documented evidence against the Markov property has implications for
financial modeling. Although most popular stochastic differential equation
models exhibit mathematical elegance and tractability, they may not adequately
represent the dynamics of the underlying process because of the Markov
assumption. Other modeling schemes, which allow for the non-Markov property,
may be needed to better capture the dynamics of financial time series.
6. CONCLUSION
The Markov property is one of the most fundamental properties of stochastic
processes. It has often been taken for granted, without justification, in
economic and financial models, especially in continuous-time finance models.
We propose a conditional characteristic function-based test for the Markov
property in a spectral framework. The use of the conditional characteristic function,
which is consistently estimated nonparametrically, allows us to check departures
from the Markov property in all conditional moments, and the frequency domain
approach, which checks many lags in a pairwise manner, alleviates the curse of
dimensionality associated with testing for the Markov property. To overcome the
adverse impact of the first-stage nonparametric estimation of the conditional
characteristic function, we use the smoothed nonparametric transition density-based
bootstrap procedure, which provides reasonable size and power for the proposed
test in finite samples. We apply our test to three important financial time series
and find some evidence that the Markov assumption may not be suitable for many
financial time series.
NOTES
1. There are other existing tests for conditional independence of continuous variables in the
literature. Linton and Gozalo (1997) propose two nonparametric tests for conditional independence
based on a generalization of the empirical distribution function. Su and White (2007, 2008) check
conditional independence by the empirical characteristic function and the Hellinger distance,
respectively. These tests can be used to test the Markov property. However, they are expected to
encounter the “curse of dimensionality” problem because the Markov property implies that
conditional independence must hold for an infinite number of lags.
2. Here we focus on the Markov property of order 1, which is the main interest of economic and
financial modeling. However, our approach can be generalized to test the Markov property of order
$p$: $P(X_{t+1} \le x \mid I_t) = P(X_{t+1} \le x \mid X_t, X_{t-1}, \ldots, X_{t-p+1})$ for $p$ fixed.
3. A multivariate Taylor series expansion can be obtained when d > 1. Since the expression is
tedious, we do not present it here.
4. The extension is substantial since we use nonparametric estimation in the first stage and {Z t (u)}
is not independent and identically distributed (i.i.d.) under H0 .
5. See Masry (1996a, 1996b) for detailed explanations of these notations.
6. If W (u) is differentiable, then this implies that its derivative (∂/∂u a )W (u) is an even function
of u a for a = 1, ..., d.
7. If Xt takes on discrete values, we can estimate ϕ(u|Xt ) via a frequency approach, namely
replacing Kh (x − X) with 1(x − X), where 1(·) is the indicator function. If Xt is a mix of discrete
and continuous variables, e.g., Xt = (Xdt , Xct ), where Xdt and Xct denote discrete and continuous
components, respectively, following Li and Racine (2007), we can replace Kh (·) with
$$W_\gamma(x, X_s) = K_h(x^c - X_s^c)\, L_\lambda(x^d, X_s^d),$$
where $\gamma = (h, \lambda)$. And $L_\lambda(\cdot)$ is the kernel function for the discrete components defined as
$$L_\lambda(x^d, X_s^d) = \prod_{a=1}^{d} \lambda_a^{\mathbf{1}(X_{as}^d \ne x_a^d)},$$
where 0 ≤ λa ≤ 1 is the smoothing parameter for Xds . Once we get a consistent estimator for ϕ(u|Xt ),
we can calculate the generalized residual and construct the test statistic.
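A minimal sketch of the mixed kernel weight in this note may help fix ideas. The discrete part follows the Li and Racine (2007) form, in which each mismatching discrete component contributes a factor λa. The continuous part uses a product Epanechnikov kernel with a common bandwidth h, which is purely an illustrative choice and not taken from the paper; all function names are hypothetical.

```python
def discrete_kernel(x_d, X_d, lam):
    """Product over discrete components of lam[a] ** 1(X_d[a] != x_d[a])."""
    w = 1.0
    for xa, Xa, la in zip(x_d, X_d, lam):
        if Xa != xa:
            w *= la  # a mismatch is down-weighted by the smoothing parameter
    return w


def epanechnikov_product(z, h):
    """Illustrative K_h(z): product of h^{-1} * 0.75 * (1 - (z_a/h)^2) on |z_a/h| <= 1."""
    w = 1.0
    for za in z:
        s = za / h
        w *= (0.75 * (1.0 - s * s) / h) if abs(s) <= 1.0 else 0.0
    return w


def mixed_kernel(x_c, x_d, X_c, X_d, h, lam):
    """W_gamma(x, X_s) = K_h(x^c - X_s^c) * L_lambda(x^d, X_s^d)."""
    z = [a - b for a, b in zip(x_c, X_c)]
    return epanechnikov_product(z, h) * discrete_kernel(x_d, X_d, lam)
```

Setting every λa to 0 reduces the discrete part to an exact-match indicator (the frequency approach mentioned at the start of the note), while λa = 1 ignores that component entirely.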
8. The proof strategy depends on Assumption 2. It seems plausible that one may relax Assumption 2
and rely on a more general central limit theorem for degenerate U-statistics (e.g., Theorem 2.1
of Gao and Hong, 2008), but we may have to impose a more restrictive mixing condition as the
cost. Due to its complexity, this is left for future research. On the other hand, Assumptions 1
and 2 do not imply each other. For example, consider a long memory process
$X_t = \sum_{j=0}^{\infty} \phi_j \varepsilon_{t-j}$, where $\{\varepsilon_t\} \sim$ i.i.d.$(0, 1)$,
$\phi_j = \Gamma(j + d) / [\Gamma(d)\Gamma(j + 1)] \approx \Gamma^{-1}(d)\, j^{d-1}$ as $j \to \infty$,
where $\Gamma(\cdot)$ is the Gamma function. Define $X_{qt} = \sum_{j=0}^{q} \phi_j \varepsilon_{t-j}$, a
$q$-dependent process. Then we have
$$E(X_t - X_{qt})^2 \approx \Gamma^{-1}(d)\, q^{-1+2d} \cdot q^{-1} \sum_{j=q+1}^{\infty} (j/q)^{-2(1-d)} = O(q^{-1+2d}).$$
Hence, Assumption 2 holds if $0 < d \le 1/4$, but Assumption 1 is violated since $\{X_t\}$ is not a
strictly stationary β-mixing process.
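The hyperbolic decay of the coefficients in this example is easy to verify numerically. The sketch below evaluates the exact coefficient Γ(j + d)/[Γ(d)Γ(j + 1)] through log-gamma functions (to avoid overflow for large j) and compares it with the approximation j^(d−1)/Γ(d). The function names are hypothetical, and d is assumed to lie in (0, 1/2).

```python
import math


def phi_exact(j, d):
    """Gamma(j + d) / (Gamma(d) * Gamma(j + 1)), computed on the log scale."""
    return math.exp(math.lgamma(j + d) - math.lgamma(d) - math.lgamma(j + 1))


def phi_approx(j, d):
    """Large-j approximation j^(d - 1) / Gamma(d); requires j >= 1."""
    return j ** (d - 1.0) / math.gamma(d)


if __name__ == "__main__":
    d = 0.2
    for j in (1, 10, 100, 1000, 10000):
        print(j, phi_exact(j, d), phi_approx(j, d))
```

The ratio of the two expressions approaches 1 as j grows, which is the sense of the approximation sign in the note.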
9. Alternatively, we could impose Hansen’s (2008) Assumption 3 on kernel functions, namely, for
some $\Lambda < \infty$ and $L < \infty$, either $K(u) = 0$ for $\|u\| > L$ and, for all $u, u' \in \mathbb{R}^d$,
$|K(u) - K(u')| \le \Lambda \|u - u'\|$; or $K(u)$ is differentiable, $|(\partial/\partial u) K(u)| \le \Lambda$, and
for some $\nu > 1$, $|(\partial/\partial u) K(u)| \le \Lambda \|u\|^{-\nu}$ for $\|u\| > L$, where
$\|u\| \equiv \max(|u_1|, \ldots, |u_d|)$. Here the kernel function is required either to have truncated
support and be Lipschitz or to have a bounded derivative with an integrable tail.
Our proof could go through with this assumption, but the trade-off is a stronger requirement on
the bandwidth. Since the choice of the bandwidth is more important than the choice of the kernel,
and many commonly used kernels have compact support, we only consider the case of a compactly
supported kernel K(u) in our formal analysis. Nevertheless, we examine the effect of allowing
kernels with support on $\mathbb{R}^d$ in our simulation study.
10. It is different from Paparoditis and Politis (2000), which requires different bandwidths. The
reason why the same bandwidth works in our paper is that we use undersmoothing in the first stage,
and the bias of the first-stage nonparametric estimation vanishes to 0 asymptotically. Therefore, we
need not balance two bandwidths to obtain a good approximation of the asymptotic bias. This idea is
shown in Theorem 2.1 (i) of Paparoditis and Politis (2000) in a different context.
11. Our simulation experiments show that results based on these two smoothed bootstrap procedures
are very similar.
12. We thank Liangjun Su for providing the Matlab codes on computing Su and White’s tests of
conditional independence (2007, 2008).
13. We first generate 15 grid points u0, v0 from N(0, 1) and obtain u = [u0, −u0] and v = [v0, −v0]
to ensure symmetry. Preliminary experiments with different numbers of grid points show that
simulation results are not very sensitive to this choice. Given the computational cost of the
simulation study, we use 30 grid points throughout.
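The grid construction in this note can be sketched as follows, assuming scalar grid points for simplicity. The function name `symmetric_grid`, the seed argument, and the use of Python's standard random module are illustrative assumptions; the authors' implementation is not described beyond the text above.

```python
import random


def symmetric_grid(n_half=15, seed=0):
    """Draw n_half points from N(0, 1) and append their negatives.

    Returns a list of 2 * n_half points that is symmetric about zero,
    mirroring u = [u0, -u0] in the note.
    """
    rng = random.Random(seed)
    u0 = [rng.gauss(0.0, 1.0) for _ in range(n_half)]
    return u0 + [-x for x in u0]


if __name__ == "__main__":
    u = symmetric_grid()   # 30 points for the u argument
    v = symmetric_grid(seed=1)  # an independent symmetric grid for v
    print(len(u), len(v))
```

Symmetry of the grid mirrors the symmetry about zero imposed on the weighting function W(·).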
14. We have tried the Parzen kernel for k (·) , obtaining similar results (not reported here).
15. Following Ait-Sahalia (1997) and Amaro de Matos and Fernandes (2007), we use undersmoothing
to ensure that the squared bias vanishes to zero faster than the variance. On the other hand, we have
used the smoothed nonparametric conditional density bootstrap procedure, and hence the simulation
results are expected not to be very sensitive to the choice of the bandwidth.
16. Ideally, the conditional distribution of Xt given Xt−1 should be tested. Unfortunately, to our
knowledge, no such test is available in the literature. Compared with some existing tests in the
literature, Inoue’s tests are model-free, allow for dependence in the data, and are robust against the
heavy-tailed distributions observed in financial markets. Hence, they are most suitable here for pre-
liminary testing.
REFERENCES
Aaronson, J., D. Gilat, & M. Keane (1992) On the structure of 1-dependent Markov chains. Journal
of Theoretical Probability 5, 545–561.
Ahn, D., R. Dittmar, & A.R. Gallant (2002) Quadratic term structure models: Theory and evidence.
Review of Financial Studies 15, 243–288.
Ahn, D. & B. Gao (1999) A parametric nonlinear model of term structure dynamics. Review of Finan-
cial Studies 12, 721–762.
Ait-Sahalia, Y. (1996) Testing continuous-time models of the spot interest rate. Review of Financial
Studies 9, 385–426.
Ait-Sahalia, Y. (1997) Do Interest Rates Really Follow Continuous-Time Markov Diffusions? Work-
ing paper, Princeton University.
Ait-Sahalia, Y., J. Fan, & H. Peng (2009) Nonparametric transition-based tests for diffusions.
Journal of the American Statistical Association 104, 1102–1116.
Amaro de Matos, J. & M. Fernandes (2007) Testing the Markov property with high frequency data.
Journal of Econometrics 141, 44–64.
Amaro de Matos, J. & J. Rosario (2000) The Equilibrium Dynamics for an Endogenous Bid-Ask
Spread in Competitive Financial Markets. Working paper, European University Institute and
Universidade Nova de Lisboa.
Andersen, T. & J. Lund (1997) Estimating continuous time stochastic volatility models of the short
term interest rate. Journal of Econometrics 77, 343–377.
Aviv, Y. & A. Pazgal (2005) A partially observed Markov decision process for dynamic pricing.
Management Science 51, 1400–1416.
Bangia, A., F. Diebold, A. Kronimus, C. Schagen, & T. Schuermann (2002) Ratings migration and the
business cycle, with application to credit portfolio stress testing. Journal of Banking and Finance
26, 445–474.
Bierens, H. (1982) Consistent model specification tests. Journal of Econometrics 20, 105–134.
Blume, L., D. Easley, & M. O’Hara (1994) Market statistics and technical analysis: The role of
volume. Journal of Finance 49, 153–181.
Brown, B.M. (1971) Martingale limit theorems. Annals of Mathematical Statistics 42, 59–66.
Chacko, G., & L. Viceira (2003) Spectral GMM estimation of continuous-time processes. Journal of
Econometrics 116, 259–292.
Chan, K.C., G.A. Karolyi, F.A. Longstaff, & A.B. Sanders (1992) An empirical comparison of alter-
native models of the short-term interest rate. Journal of Finance 47, 1209–1227.
Chen, B. & Y. Hong (2009) Diagnosing Multivariate Continuous-Time Models with Application to
Affine Term Structure Models. Working paper, Cornell University and University of Rochester.
Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. Journal of the
American Statistical Association 74, 829–836.
Cox, J.C., J.E. Ingersoll, & S.A. Ross (1985) A theory of the term structure of interest rates. Econo-
metrica 53, 385–407.
Dai, Q., & K. Singleton (2000) Specification analysis of affine term structure models. Journal of
Finance 55, 1943–1978.
Darsow, W.F., B. Nguyen, & E.T. Olsen (1992) Copulas and Markov processes. Illinois Journal of
Mathematics 36, 600–642.
Davies, R.B. (1977) Hypothesis testing when a nuisance parameter is present only under the alterna-
tive. Biometrika 64, 247–254.
Davies, R.B. (1987) Hypothesis testing when a nuisance parameter is present only under the alterna-
tive. Biometrika 74, 33–43.
Duan, J.C. & K. Jacobs (2008) Is long memory necessary? An empirical investigation of nonnegative
interest rate processes. Journal of Empirical Finance 15, 567–581.
Duffie, D. & R. Kan (1996) A yield-factor model of interest rates. Mathematical Finance 6,
379–406.
Duffie, D., J. Pan, & K. Singleton (2000) Transform analysis and asset pricing for affine jump-
diffusions. Econometrica 68, 1343–1376.
Easley, D. & M. O’Hara (1987) Price, trade size, and information in securities markets. Journal of
Financial Economics 19, 69–90.
Easley, D. & M. O’Hara (1992) Time and the process of security price adjustment. Journal of Finance
47, 577–605.
Edwards, R. & J. Magee (1966) Technical Analysis of Stock Trends. John Magee.
Epps, T.W. & L.B. Pulley (1983) A test for normality based on the empirical characteristic function.
Biometrika 70, 723–726.
Ericson, R. & A. Pakes (1995) Markov-perfect industry dynamics: A framework for empirical work.
Review of Economic Studies 62(1), 53–82.
Fan, J. (1992) Design-adaptive nonparametric regression. Journal of the American Statistical Associ-
ation 87, 998–1004.
Fan, J. (1993) Local linear regression smoothers and their minimax efficiency. Annals of Statistics 21,
196–216.
Fan, J. & Q. Yao (2003) Nonlinear Time Series: Nonparametric and Parametric Methods. Springer
Verlag.
Fan, Y. & Q. Li (1999) Root-N -consistent estimation of partially linear time series models. Journal of
Nonparametric Statistics 11, 251–269.
Fan, Y., Q. Li, & I. Min (2006) A nonparametric bootstrap test of conditional distributions. Economet-
ric Theory 22, 587–613.
Feller, W. (1959) Non-Markovian processes with the semi-group property. Annals of Mathematical
Statistics 30, 1252–1253.
Feuerverger, A. & P. McDunnough (1981) On the efficiency of empirical characteristic function pro-
cedures. Journal of the Royal Statistical Society, Series B 43, 20–27.
Gallant, A.R., D. Hsieh, & G. Tauchen (1997) Estimation of stochastic volatility models with diag-
nostics. Journal of Econometrics 81, 159–192.
Gao, J. & Y. Hong (2008) Central limit theorems for generalized U-statistics with applications in
nonparametric specification. Journal of Nonparametric Statistics 20, 61–76.
Hall, R. (1978) Stochastic implications of the life cycle permanent income hypothesis: Theory and
practice. Journal of Political Economy 86, 971–987.
Hamilton, J.D. (1989) A new approach to the economic analysis of nonstationary time series and the
business cycle. Econometrica 57, 357–384.
Hamilton, J.D. & R. Susmel (1994) Autoregressive conditional heteroskedasticity and changes in
regime. Journal of Econometrics 64, 307–333.
Hansen, B.E. (1996) Inference when a nuisance parameter is not identified under the null hypothesis.
Econometrica 64, 413–430.
Hansen, B.E. (2008) Uniform convergence rates for kernel estimation with dependent data. Econo-
metric Theory 24, 726–748.
Heath, D., R. Jarrow, & A. Morton (1992) Bond pricing and the term structure of interest rates: A new
methodology for contingent claims valuation. Econometrica 60, 77–105.
Hong, Y. (1999) Hypothesis testing in time series via the empirical characteristic function: A gener-
alized spectral density approach. Journal of the American Statistical Association 94, 1201–1220.
Hong, Y. & H. Li (2005) Nonparametric specification testing for continuous-time models with appli-
cations to term structure of interest rates. Review of Financial Studies 18, 37–84.
Hong, Y. & H. White (2005) Asymptotic distribution theory for an entropy-based measure of serial
dependence. Econometrica 73, 837–902.
Horowitz, J.L. (2003) Bootstrap methods for Markov processes. Econometrica 71, 1049–1082.
Ibragimov, R. (2007) Copula-Based Characterizations for Higher-Order Markov Processes. Working
paper, Harvard University.
Inoue, A. (2001) Testing for distributional change in time series. Econometric Theory 17, 156–187.
Jarrow, R., D. Lando, & S. Turnbull (1997) A Markov model for the term structure of credit risk
spreads. Review of Financial Studies 10, 481–523.
Jarrow, R. & S. Turnbull (1995) Pricing derivatives on financial securities subject to credit risk. Journal
of Finance 50, 53–86.
Jiang, G. & J. Knight (1997) A nonparametric approach to the estimation of diffusion processes with
an application to a short-term interest rate model. Econometric Theory 13, 615–645.
Kavvathas, D. (2001) Estimating Credit Rating Transition Probabilities for Corporate Bonds. Working
paper, University of Chicago.
Kiefer, N.M. & C.E. Larson (2004) Testing Simple Markov Structures for Credit Rating Transitions.
Working paper, Cornell University.
Kim, W. & O. Linton (2003) A Local Instrumental Variable Estimation Method for Generalized
Additive Volatility Models. Working paper, Humboldt University of Berlin, London School of
Economics.
Kydland, F.E. & E. Prescott (1982) Time to build and aggregate fluctuations. Econometrica 50,
1345–70.
Lando, D. & T. Skødeberg (2002) Analyzing rating transitions and rating drift with continuous
observations. Journal of Banking & Finance 26, 423–444.
LeBaron, B. (1999) Technical trading rule profitability and foreign exchange intervention. Journal of
International Economics 49, 125–143.
Lee, A.J. (1990) U-Statistics: Theory and Practice. Marcel Dekker.
Lévy, P. (1949) Exemple de processus pseudo-markoviens. Comptes Rendus de l’Académie des Sci-
ences 228, 2004–2006.
Li, Q. & J.S. Racine (2007) Nonparametric Econometrics: Theory and Practice. Princeton University
Press.
Linton, O. & P. Gozalo (1997) Conditional Independence Restrictions: Testing and Estimation. Work-
ing paper, Cowles Foundation for Research in Economics, Yale University.
Ljungqvist, L. & T.J. Sargent (2000) Recursive Macroeconomic Theory. MIT Press.
Lobato, I.N. & P.M. Robinson (1998) A nonparametric test for I(0). Review of Economic Studies 65,
475–495.
Loretan, M. & P.C.B. Phillips (1994) Testing the covariance stationarity of heavy-tailed time series: An
overview of the theory with applications to several financial datasets. Journal of Empirical Finance
1, 211–248.
Lucas, R. (1978) Asset prices in an exchange economy. Econometrica 46, 1429–45.
Lucas, R. (1988) On the mechanics of economic development. Journal of Monetary Economics 22,
3–42.
Lucas, R. & E. Prescott (1971) Investment under uncertainty. Econometrica 39, 659–81.
Lucas, R. & N.L. Stokey (1983) Optimal fiscal and monetary policy in an economy without capital.
Journal of Monetary Economics 12, 55–94.
Masry, E. (1996a) Multivariate local polynomial regression for time series: Uniform strong consis-
tency and rates. Journal of Time Series Analysis 6, 571–599.
Masry, E. (1996b) Multivariate regression estimation local polynomial fitting for time series. Stochas-
tic Processes and Their Applications 65, 81–101.
Masry, E. & J. Fan (1997) Local polynomial estimation of regression functions for mixing processes.
Scandinavian Journal of Statistics 24, 165–179.
Masry, E. & D. Tjøstheim (1997) Additive nonlinear ARX time series and projection estimates.
Econometric Theory 13, 214–252.
Matús, F. (1996) On two-block-factor sequences and one-dependence. Proceedings of the American
Mathematical Society 124, 1237–1242.
Matús, F. (1998) Combining m-dependence with Markovness. Annales de l’Institut Henri Poincaré.
Probabilités et Statistiques 34, 407–423.
Mehra, R. & E. Prescott (1985) The equity premium: A puzzle. Journal of Monetary Economics 15,
145–61.
Mizutani, E. & S. Dreyfus (2004) Two stochastic dynamic programming problems by model-free
actor-critic recurrent network learning in non-Markovian settings. Proceedings of the IEEE-INNS
International Joint Conference on Neural Networks.
Pagan, A.R. & G.W. Schwert (1990) Testing for covariance stationarity in stock market data. Eco-
nomics Letters 33, 165–70.
Paparoditis, E. & D.N. Politis (2000) The local bootstrap for kernel estimators under general depen-
dence conditions. Annals of the Institute of Statistical Mathematics 52, 139–159.
Paparoditis, E. & D.N. Politis (2002) The local bootstrap for Markov processes. Journal of Statistical
Planning and Inference 108, 301–328.
Platen, E. & R. Rebolledo (1996) Principles for modelling financial markets. Journal of Applied Prob-
ability 31, 601–613.
Romer, P. (1986) Increasing returns and long-run growth. Journal of Political Economy 5,
1002–1037.
Romer, P. (1990) Endogenous technological change. Journal of Political Economy 5, 71–102.
Rosenblatt, M. (1960) An aggregation problem for Markov chains. In R.E. Machol (ed.), Information
and Decision Processes, pp. 87–92. McGraw-Hill.
Rosenblatt, M. & D. Slepian (1962) N th order Markov chains with every N variables independent.
Journal of the Society for Industrial and Applied Mathematics 10, 537–549.
Ruppert, D. & M.P. Wand (1994) Multivariate weighted least squares regression. Annals of Statistics
22, 1346–1370.
Rust, J. (1994) Structural estimation of Markov decision processes. Handbook of Econometrics 4,
3081–3143.
Sargent, T. (1987) Dynamic Macroeconomic Theory. Harvard University Press.
Singleton, K. (2001) Estimation of affine asset pricing models using the empirical characteristic func-
tion. Journal of Econometrics 102, 111–141.
Skaug, H.J. & D. Tjøstheim (1993) Nonparametric tests of serial independence. In T. Subba Rao (ed.),
Developments in Time Series Analysis: The Priestley Birthday Volume, pp. 207–229. Chapman &
Hall.
Skaug, H.J. & D. Tjøstheim (1996) Measures of distance between densities with application to testing
for serial independence. In P. Robinson & M. Rosenblatt (eds.), Time Series Analysis in Memory of
E. J. Hannan, pp. 363–377. Springer.
Stinchcombe, M.B. & H. White (1998) Consistent specification testing with nuisance parameters
present only under the alternative. Econometric Theory 14, 295–325.
Stone, C.J. (1977) Consistent nonparametric regression. Annals of Statistics 5, 595–645.
Su, L. & H. White (2007) A consistent characteristic-function-based test for conditional independence.
Journal of Econometrics 141, 807–834.
Su, L. & H. White (2008) Nonparametric Hellinger metric test for conditional independence. Econo-
metric Theory 24, 829–864.
Uzawa, H. (1965) Optimum technical change in an aggregative model of economic growth. Interna-
tional Economic Review 6, 18–31.
Vasicek, O. (1977) An equilibrium characterization of the term structure. Journal of Financial Eco-
nomics 5, 177–188.
Weintraub, G.Y., L.C. Benkard, & B. Van Roy (2008) Markov perfect industry dynamics with many
firms. Econometrica 76, 1375–1411.
Yoshihara, K. (1976) Limiting behavior of U-statistics for stationary, absolutely regular processes.
Z. Wahrsch. Verw. Gebiete 35, 237–252.
Zhu, X. (1992) Optimal fiscal policy in a stochastic growth model. Journal of Economic Theory 2,
250–289.
APPENDIX
Throughout the Appendix, we let M̃ be defined in the same way as M̂ in (2.19) with
Ẑ t (u) replaced by Z t (u). Also, C ∈ (1, ∞) denotes a generic bounded constant.
Proof of Theorem 1. The proof of Theorem 1 consists of the proofs of Theorems A.1–A.3 below. ∎

THEOREM A.1. Under the conditions of Theorem 1, $\hat M - \tilde M \xrightarrow{p} 0$.

Proof of Theorem A.1. Put $T_j \equiv T - |j|$, and let $\tilde\Gamma_j(u, v)$ be defined in the same way
as $\hat\Gamma_j(u, v)$ in (2.11), with $\hat Z_t(u)$ replaced by $Z_t(u)$. To show $\hat M - \tilde M \xrightarrow{p} 0$, it
suffices to show

$$\hat D^{-1/2} \sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int \left[ |\hat\Gamma_j(u, v)|^2 - |\tilde\Gamma_j(u, v)|^2 \right] dW(u)\, dW(v) \xrightarrow{p} 0, \tag{A.1}$$

$p^{-1}(\hat C - \tilde C) = O_P(T^{-1/2})$, and $p^{-1}(\hat D - \tilde D) = o_P(1)$, where $\tilde C$ and $\tilde D$ are defined in the
same way as $\hat C$ and $\hat D$ in (2.19), respectively, with $\hat Z_t(u)$ replaced by $Z_t(u)$. To save space, we
focus on the proof of (A.1); the proofs of $p^{-1}(\hat C - \tilde C) = O_P(T^{-1/2})$ and $p^{-1}(\hat D - \tilde D) = o_P(1)$
are straightforward. We note that it is necessary to obtain the convergence rate
$O_P(pT^{-1/2})$ for $\hat C - \tilde C$ to ensure that replacing $\hat C$ with $\tilde C$ has asymptotically negligible
impact given $p/T \to 0$.

To show (A.1), we first decompose

$$\sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int \left[ |\hat\Gamma_j(u, v)|^2 - |\tilde\Gamma_j(u, v)|^2 \right] dW(u)\, dW(v) = \hat A_1 + 2\,\mathrm{Re}(\hat A_2), \tag{A.2}$$

where

$$\hat A_1 = \sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int \left| \hat\Gamma_j(u, v) - \tilde\Gamma_j(u, v) \right|^2 dW(u)\, dW(v),$$

$$\hat A_2 = \sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int \left[ \hat\Gamma_j(u, v) - \tilde\Gamma_j(u, v) \right] \tilde\Gamma_j(u, v)^{*}\, dW(u)\, dW(v),$$

where $\mathrm{Re}(\hat A_2)$ is the real part of $\hat A_2$, and $\tilde\Gamma_j(u, v)^{*}$ is the complex conjugate of $\tilde\Gamma_j(u, v)$.
Then (A.1) follows from Propositions A.1 and A.2 below, and $p \to \infty$ as $T \to \infty$. ∎

PROPOSITION A.1. Under the conditions of Theorem 1, $p^{-1/2} \hat A_1 \xrightarrow{p} 0$.

PROPOSITION A.2. Under the conditions of Theorem 1, $p^{-1/2} \hat A_2 \xrightarrow{p} 0$.
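The decomposition (A.2) rests on an elementary identity for complex numbers, spelled out here for convenience. In the display below, $a$ and $b$ are generic placeholders (not notation from the paper) standing for the feasible and infeasible quantities evaluated at a given $(u, v)$ and lag $j$.

```latex
\begin{align*}
|a|^2 &= |(a - b) + b|^2
       = |a - b|^2 + |b|^2 + (a - b)\,b^{*} + (a - b)^{*}\,b \\
      &= |a - b|^2 + |b|^2 + 2\,\mathrm{Re}\!\left[(a - b)\,b^{*}\right],
\end{align*}
so that
\[
|a|^2 - |b|^2 = |a - b|^2 + 2\,\mathrm{Re}\!\left[(a - b)\,b^{*}\right].
\]
```

Weighting by $k^2(j/p)\,T_j$, integrating with respect to $dW(u)\,dW(v)$, and summing over $j$ gives the two terms on the right-hand side of (A.2).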
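Proof of Proposition A.1. Put $\psi_t(v) \equiv e^{\mathrm{i} v' X_t} - \varphi(v)$ and $\varphi(v) \equiv E(e^{\mathrm{i} v' X_t})$. Then
straightforward algebra yields that for $j > 0$,

$$\hat\Gamma_j(u, v) - \tilde\Gamma_j(u, v) = T_j^{-1} \sum_{t=j+1}^{T} \left[ \varphi(u|X_{t-1}) - \hat\varphi(u|X_{t-1}) \right] \psi_{t-j}(v) + \left[ \varphi(v) - \hat\varphi(v) \right] T_j^{-1} \sum_{t=j+1}^{T} \left[ \varphi(u|X_{t-1}) - \hat\varphi(u|X_{t-1}) \right] \equiv \hat B_{1j}(u, v) + \hat B_{2j}(u, v), \text{ say.}$$

It follows that $\hat A_1 \le 2 \sum_{a=1}^{2} \sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int |\hat B_{aj}(u, v)|^2\, dW(u)\, dW(v)$. Proposition
A.1 follows from Lemmas A.1 and A.2 below. ∎

Lemma A.1. $p^{-1/2} \sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int |\hat B_{1j}(u, v)|^2\, dW(u)\, dW(v) = o_P(1)$.

Lemma A.2. $p^{-1/2} \sum_{j=1}^{T-1} k^2(j/p)\, T_j \int\!\!\int |\hat B_{2j}(u, v)|^2\, dW(u)\, dW(v) = o_P(1)$.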
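$$\hat B_{11j}(u, v) = T_j^{-1} \sum_{t=j+1}^{T} e_1' h^{r+1} S_T^{-1}(X_{t-1})\, B_T(X_{t-1})\, D_{r+1}(u, X_{t-1})\, \psi_{t-j}(v) + T_j^{-1} \sum_{t=j+1}^{T} e_1' S_T^{-1}(X_{t-1})\, R_T(u, X_{t-1})\, \psi_{t-j}(v) = \hat B_{111j}(u, v) + \hat B_{112j}(u, v), \text{ say,} \tag{A.5}$$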
h r +1 T X l+j
s−1 − x
γj = (r + 1) ∑ ∑ Kh (Xs−1 − x)
|l|=r +1 l! (T − 1) s=2 h
1
T ' −1
+Tj−1 ∑ e 1 h r +1 S̃ Xt−1 B̃ Xt−1
t= j+1
×Dr +1 u, Xt−1 ψt− j (v)
−1
−E e 1 h r +1 S̃ Xt−1 B̃ Xt−1
((
×Dr +1 u, Xt−1 ψt− j (v) [1 + o P (1)]
≤ Cβ 2 ( j) h 2(r +1) ,
where we have used the mixing inequality and Assumption 1. It follows that
$$p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1111j}(u,v)\right|^2 dW(u)\,dW(v) = o_P(1), \tag{A.8}$$
T T −1
= 2Tj−2 ∑ ∑ E e 1 h r +1 S̃ Xt−1 B̃ Xt−1
τ = j+1 t=τ +1
−1
×Dr +1 u, Xt−1 ψt− j (v) − B̂1111 j (u, v) e 1 h r +1 S̃ Xt−1 B̃ Xt−1
∗
×Dr +1 u, Xτ −1 ψτ − j (v) − B̂1111 j (u, v) dW (u) dW (v)
−1
+Tj−1 E e 1 h r +1 S̃ Xt−1 B̃ Xt−1
2
× Dr +1 u, Xt−1 ψt− j (v) dW (u)dW (v)
$$\le C\,T_j^{-1}\sum_{l=1}^{T-j}\left(1 - \frac{l}{T_j}\right)\beta(l)\,h^{2(r+1)} + C\,T_j^{-1}h^{2(r+1)} \le C\,T_j^{-1}h^{2(r+1)},$$
where we have used Assumption 1 and the mixing inequality. It follows from (A.9) and
Chebychev’s inequality that
$$p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1112j}(u,v)\right|^2 dW(u)\,dW(v) = o_P(1). \tag{A.10}$$
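The step from a moment bound to (A.10) can be illustrated numerically. If the expected integrated squared modulus of the term is bounded by $C\,T_j^{-1}h^{2(r+1)}$, the factor $T_j$ cancels and what remains is $p^{-1/2}h^{2(r+1)}\sum_j k^2(j/p)$. For a kernel with bounded support the sum is of order $p$, so the expectation is $O(p^{1/2}h^{2(r+1)})$, and Chebyshev's inequality gives the $o_P(1)$ conclusion whenever that rate vanishes. The sketch below checks only the order-$p$ claim, and does so for the Bartlett kernel, which is an assumption chosen for illustration; the function names are hypothetical and do not come from the paper.

```python
def bartlett(z: float) -> float:
    """Bartlett kernel k(z) = 1 - |z| on [-1, 1], zero elsewhere (illustrative choice)."""
    return max(0.0, 1.0 - abs(z))


def kernel_sq_sum(p: int, T: int) -> float:
    """Return sum_{j=1}^{T-1} k^2(j/p), the total weight multiplying the moment bound."""
    return sum(bartlett(j / p) ** 2 for j in range(1, T))


if __name__ == "__main__":
    T = 2000
    for p in (10, 50, 200):
        s = kernel_sq_sum(p, T)
        # As p grows, s / p approaches the integral of k^2 over [0, 1], which is 1/3.
        print(p, round(s / p, 4))
```

Because the ratio of the sum to $p$ settles near a constant, dividing by $p^{1/2}$ leaves a factor of order $p^{1/2}$, which the bandwidth term must dominate.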
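$$\begin{aligned}
&= p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1111j}(u,v)\right|^2 dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1112j}(u,v)\right|^2 dW(u)\,dW(v) \\
&\quad - 2p^{-1/2}\,\mathrm{Re}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \hat B_{1111j}(u,v)\,\hat B_{1112j}(u,v)^{*}\, dW(u)\,dW(v) \\
&= o_P(1).
\end{aligned} \tag{A.12}$$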
For the second term in (A.5), we have
' −1
T ' −1
+Tj−1 ∑ e 1 S̃ Xt−1 R̃ u, Xt−1 ψt− j (v)
t= j+1
−1
((
−E e 1 S̃ Xt−1 R̃ u, Xt−1 ψt− j (v) [1 + o P (1)]
R̃|j| is of dimension N|j| × 1, with its lth element R̃|j| = γ̃g|j| (l) ≡ E γg|j| (l) .
l
For the first term in (A.13), we have
2
B̂1121 j (u, v) dW (u) dW (v)
−1 2
ψt− j (v) dW (u) dW (v)
≤C β 2 ( j) e 1 S̃ Xt−1 R̃ u, Xt−1 ∞
∞
≤ Cβ 2 ( j) h 2(r +1) ,
where we have used the mixing inequality, Assumption 1, and the fact that
$$\sup_{x \in G}\left|\tilde\gamma_j\right|^2 = O_P\!\left(h^{2(r+1)}\right). \tag{A.14}$$
It follows that
$$p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1121j}(u,v)\right|^2 dW(u)\,dW(v) = o_P(1), \tag{A.15}$$
T T −1
$$\le C\,T_j^{-1}\sum_{l=1}^{T-j}\left(1 - \frac{l}{T_j}\right)\beta(l)\,h^{2(r+1)} + C\,T_j^{-1}h^{2(r+1)} \le C\,T_j^{-1}h^{2(r+1)},$$
where we have used (A.14), Assumption 1, and the mixing inequality. It follows from (A.9)
and Chebychev’s inequality that
$$p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1122j}(u,v)\right|^2 dW(u)\,dW(v) = o_P(1). \tag{A.16}$$
1 −1
B̂121 j (u, v) = Tj (T − 1)−1 h −d ∑∑ j Y jt , Y js
2 t=s
1 T
+ Tj−1 (T − 1)−1 h −d ∑ j Y jt , Y jt .
2 t=2
T −1 T
2
1 −1
∑ k 2 ( j/ p) T j Tj (T − 1)−1 h −d ∑ j Y jt , Y jt dW (u) dW (v) = o P (1) .
j=1
2 t=2
1 −1
T (T − 1)−1 h −d ∑∑ j Y jt , Y js
2 j t=s
T 1
= Tj−1 h −d ∑ 1 j Xt−1 + Tj−1 (T − 1)−1 h −d ∑∑ ˜ j Y jt , Y js
t=2 2 t>s
T
= Tj−1 h −d ∑ 1 j ˜ j,
Xt−1 + say, (A.19)
t=2
T −1 l
2γ
1/γ
≤ C Tj−2 (T − 1) h −2d ∑ 1− β (l)1−(1/γ ) E 1 j Xt−1
l=1 T
+ Tj−2 (T − 1) h −2d var 1 j Xt−1
2γ
1/γ
≤ C Tj−2 (T − 1) h −2d E 1 j Xt−1
+ Tj−2 (T − 1) h −2d var 1 j Xt−1 ,
where γ = ν−1 ν + ε and ε > 0. Note that we have used Assumption 1 and the mixing
where we have used the fact that h −d 1 j (z) is bounded in probability. At the same time,
we have
2
T −1 T
−1/2 −1 −d
p ∑ k ( j/ p) Tj E Tj h ∑ 1 j Xt−1 dW (u) dW (v)
2
j=D+1 t=2
T −1
≤ C p −1/2 ∑ k 2 ( j/ p) β 2 ( j − 1) h −2d = O h −d/ν p −1/2 = o (1) , (A.21)
j=D+1
where we have used Assumptions 1, and 4, and the fact 1 j (x) ≤ β ( j − 1) given the
mixing inequality.
It follows from (A.20), (A.21), and Chebychev’s inequality that
2
T −1 T
−1
p −1/2 ∑ k 2 ( j/ p) T j Tj ∑ 1 j Xt−1 dW (u) dW (v) = o P (1) . (A.22)
j=1
t=2
˜ 2 ) = T −2 (T − 1)−2 h −2d ∑ ∑ E
E( ˜ Y jt , Y js .
˜ j Y jt , Y js
j j j
t=s t =s
Following Yoshihara (1976) and Lee (1990), we split the terms into two types: (a) those for which
$t, s, t', s'$ are all distinct; and (b) those remaining.
For terms of type (a), we have
∗
∑ E j Y jt , Y js j Y jt , Y js
˜ ˜
t,s,t ,s
T T −1 T −1 T −1
˜ ∗
˜ Y jt+s+t , Y jt+s+t +s
≤ 4! ∑ ∑ ∑ ∑ E j Y jt , Y jt+s j
t=2 s=2 t =2 s =2
T ∗
˜ ˜ Y jt+s+t , Y jt+s+t +s
≤ 4! ∑ ∑ E j Y jt , Y jt+s j
t=2 2≤t ,s ≤s
T ∗
˜ ˜ Y jt+s+t , Y jt+s+t +s
+ 4! ∑ ∑ E j Y jt , Y jt+s j
t=2 2≤s,s ≤t
T ∗
˜ ˜ Y jt+s+t , Y jt+s+t +s .
+ 4! ∑ ∑ E j Y jt , Y jt+s j (A.23)
t=2 2≤s,t ≤s
T ∗
˜ ˜ Y jt+s+t , Y jt+s+t +s
∑ ∑
E j Y jt , Y jt+s j
t=2 2≤t ,s ≤s
$$\le \sum_{t=2}^{T}\sum_{s=2}^{T}(s+1)^2\,\beta^{\alpha/(\alpha+1)}(s)\,h^{2d/(\alpha+1)} \le C\,(T-1)\,h^{2d/(\alpha+1)},$$
where $1 > \alpha > 3/(\nu-3)$ and we have used Lemma 2 of Yoshihara (1976) and Assumption 1.
For the second term in (A.23), we have
T ∗
˜ ˜ Y jt+s+t , Y jt+s+t +s
∑ ∑ E j Y jt , Y jt+s j
t=2 2≤s,s ≤t
T T 2
≤ ∑ ∑ t + 1 β α/α+1 t h 2d/(α+1)
t=2 t =2
T
+ ∑ ∑ β α/α+1 (s) h d/(α+1) β α/α+1 s h d/(α+1)
t=2 2≤s,s ≤t
≤ C (T − 1)2 h 2d/(α+1) ,
where $1 > \alpha > 3/(\nu-3)$ and we have used Lemma 2 of Yoshihara (1976) and Assumption 1. The third
term is similar to the first term.
$$\le C\,(T-1)^2\,h^{d},$$
where $1 > \alpha > 1/(\nu-1)$ and we have used Lemma 2 of Yoshihara (1976) and Assumption 1.
For other cases, similar arguments apply.
Hence we have
T −1
1 ˜ 2
p− 2 ∑ k 2 ( j/ p) Tj j dW (u) dW (v) = o P (1) , (A.24)
j=1
Then it follows from (A.4), (A.5), (A.12), (A.17), (A.25), and (A.26) that
$$p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{1j}(u,v)\right|^2 dW(u)\,dW(v) = o_P(1). \tag{A.27}$$
B̂2 j (u, v)
T T Xs−1 − Xt−1
= ϕ (v) − ϕ̂ (v) Tj−1 ∑ ϕ u|Xt−1 − ∑ Ŵ ei u Xs
t= j+1 s=2 h
Xs−1 − Xt−1
T T
= − ϕ (v) − ϕ̂ (v) Tj−1 ∑ ∑ Ŵ ϕ u|Xs−1 − ϕ u|Xt−1
t= j+1 s=2 h
−1 T T Xs−1 − Xt−1
− ϕ (v) − ϕ̂ (v) Tj ∑ ∑ Ŵ h
Z s (u)
t= j+1 s=2
We further decompose
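$$\begin{aligned}
\hat B_{21j}(u,v)
&= \left[\varphi(v) - \hat\varphi(v)\right] T_j^{-1}\sum_{t=j+1}^{T} e_1'\, h^{r+1} S_T^{-1}(X_{t-1})\, B_T(X_{t-1})\, D_{r+1}(X_{t-1}) \\
&\quad + \left[\varphi(v) - \hat\varphi(v)\right] T_j^{-1}\sum_{t=j+1}^{T} e_1'\, S_T^{-1}(X_{t-1})\, R_T(X_{t-1})
\end{aligned}$$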
and
%
T T −1
B̂22 j (u, v) = Tj−1 (T − 1)−1 h −d ϕ (v) − ϕ̂ (v) ∑ ∑ e 1 S̃ Xt−1
t=2 s=2
Xs−1 − Xt−1
× Z s (u) − Tj−1 (T − 1)−1 h −d ϕ (v) − ϕ̂ (v)
h
&
j T −1 Xs−1 − Xt−1
× ∑∑ e 1 S̃ Xt−1 Z s (u) ψt− j (v) [1 + o P (1)]
t=2 s=2 h
$$p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{2abj}(u,v)\right|^2 dW(u)\,dW(v) = o_P(1), \quad \text{for } a, b = 1, 2. \tag{A.31}$$
The proof of (A.31) is similar to that of (A.12), (A.17), (A.25), and (A.26) in Lemma
A.1, with the fact that $E\left|\varphi(v) - \hat\varphi(v)\right|^4 \le C\,T_j^{-2}$ given Assumption 1. ∎
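$$\begin{aligned}
&\le p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{111j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{112j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{121j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{122j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v).
\end{aligned} \tag{A.33}$$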
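$$\begin{aligned}
&\le p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{1111j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{1112j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&= O_P\!\left(p^{-1/2}T^{1/2}h^{r+1}\right) + O_P\!\left(p^{1/2}h^{r+1}\right) = o_P(1),
\end{aligned} \tag{A.34}$$
where we have used Assumptions 1 and 4, (A.9), and the fact that $E|\tilde\Gamma_j(u,v)|^2 \le C\,T_j^{-1}$
under $H_0$.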
For the second term in (A.33), we have
$$\begin{aligned}
&p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint \left|\hat B_{112j}(u,v)\,\tilde\Gamma_j(u,v)\right| dW(u)\,dW(v) \\
&\le p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{1121j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{1122j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&= O_P\!\left(p^{-1/2}T^{1/2}h^{r+1}\right) + O_P\!\left(p^{1/2}h^{r+1}\right) = o_P(1),
\end{aligned} \tag{A.35}$$
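$$\begin{aligned}
&\le p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{121j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{122j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&= O_P\!\left(p^{-1/2}T^{(1-\nu)/2\nu}h^{-d/\nu}\right) + O_P\!\left(p^{-1/2}T^{-1}h^{-d/2}\right) = o_P(1),
\end{aligned} \tag{A.36}$$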
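$$\begin{aligned}
&\le p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{1141j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&\quad + p^{-1/2}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\hat B_{1142j}(u,v)|\,|\tilde\Gamma_j(u,v)|\, dW(u)\,dW(v) \\
&= O_P\!\left(p^{-1/2}h^{-d/\nu}\right) + O_P\!\left(p^{1/2}T^{-1}h^{-3d/\nu}\right) \\
&\quad + O_P\!\left(p^{3/2}T^{-1/2}\right) + O_P\!\left(pT^{-1}h^{-d/2}\right) = o_P(1),
\end{aligned} \tag{A.37}$$
where we have used Assumptions 1 and 4, (A.9), and the fact that $E|\tilde\Gamma_j(u,v)|^2 \le C\,T_j^{-1}$
given Assumption 1.
For $a = 2$, similar arguments apply. ∎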
Proof of Theorem A.2. The proof is similar to that of Theorem A.2 of Chen and Hong
(2009). ∎
Proof of Theorem A.3. The proof is similar to that of Theorem A.3 of Chen and Hong
(2009). ∎
Proof of Theorem 2. The proof of Theorem 2 consists of the proofs of Theorems A.4
and A.5 below. ∎
THEOREM A.4. Under the conditions of Theorem 2, $(p^{1/2}/T)(\hat M - \tilde M) \xrightarrow{p} 0$.
THEOREM A.5. Under the conditions of Theorem 2,
$$(p^{1/2}/T)\,\tilde M \xrightarrow{p} D^{-1/2}\iint\!\int_{-\pi}^{\pi}\left|F(\omega,u,v) - F_0(\omega,u,v)\right|^2 d\omega\, dW(u)\, dW(v).$$
$p^{-1}(\hat C - \tilde C) = O_P(1)$, and $p^{-1}(\hat D - \tilde D) \xrightarrow{p} 0$, where $\tilde C$ and $\tilde D$ are
defined in the same
way as $\hat C$ and $\hat D$ in (2.19), with $\hat\varphi(u|X_{t-1})$ replaced by $\varphi(u|X_{t-1})$. Since the proofs for
$p^{-1}(\hat C - \tilde C) = O_P(1)$ and $p^{-1}(\hat D - \tilde D) \xrightarrow{p} 0$ are straightforward, we focus on the proof of
(A.38). From (A.9), the Cauchy-Schwarz inequality, and the fact that
$T^{-1}\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |\tilde\Gamma_j(u,v)|^2\, dW(u)\,dW(v) = O_P(1)$, as is implied by Theorem A.5 (the proof of Theorem
A.5 does not depend on Theorem A.4), it suffices to show that $T^{-1}\hat A_1 \xrightarrow{p} 0$, where $\hat A_1$ is
defined as in (A.2). The argument is very similar to the proof of Proposition A.1, which
completes the proof of Theorem A.4. ∎
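The reduction to $T^{-1}\hat A_1$ can be made explicit with the Cauchy-Schwarz inequality. The following is a sketch under the assumption that $\hat A_2$ is the cross term paired with $\hat A_1$ in (A.2); $a_j$ and $b_j$ are generic placeholders (not the paper's notation) for the estimation-error term and the infeasible term at lag $j$, and $\tilde Q$ is a hypothetical shorthand introduced here.

```latex
\begin{align*}
|\hat A_2|
  &= \Bigl|\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint a_j(u,v)\, b_j(u,v)^{*}\, dW(u)\,dW(v)\Bigr| \\
  &\le \Bigl(\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |a_j(u,v)|^2\, dW(u)\,dW(v)\Bigr)^{1/2}
       \Bigl(\sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |b_j(u,v)|^2\, dW(u)\,dW(v)\Bigr)^{1/2} \\
  &= \hat A_1^{1/2}\,\tilde Q^{1/2},
  \qquad \tilde Q \equiv \sum_{j=1}^{T-1} k^2(j/p)\,T_j \iint |b_j(u,v)|^2\, dW(u)\,dW(v).
\end{align*}
```

Dividing by $T$ gives $T^{-1}|\hat A_2| \le (T^{-1}\hat A_1)^{1/2}(T^{-1}\tilde Q)^{1/2}$, so once $T^{-1}\tilde Q = O_P(1)$ is available, both $T^{-1}\hat A_1$ and $T^{-1}\hat A_2$ vanish in probability as soon as $T^{-1}\hat A_1$ does.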
Proof of Theorem A.5. The proof is similar to that of Theorem A.5 of Chen and Hong
(2009). ∎