Threshold Regression in Heterogeneous Panel Data with Interactive Fixed Effects⋆
Marco Barassi, Yiannis Karavias∗, Chongxian Zhu
Department of Economics, University of Birmingham
arXiv:2308.04057v1 [econ.EM] 8 Aug 2023

Abstract

This paper introduces unit-specific heterogeneity in panel data threshold regression. Both slope
coefficients and threshold parameters are allowed to vary by unit. The heterogeneous threshold
parameters manifest via a unit-specific empirical quantile transformation of a common underlying
threshold parameter which is estimated efficiently from the whole panel. In the errors, the un-
observed heterogeneity of the panel takes the general form of interactive fixed effects. The newly
introduced parameter heterogeneity has implications for model identification, estimation, interpre-
tation, and asymptotic inference. The assumption of a shrinking threshold magnitude now implies
shrinking heterogeneity and leads to faster estimator rates of convergence than previously encoun-
tered. The asymptotic theory for the proposed estimators is derived and Monte Carlo simulations
demonstrate its usefulness in small samples. The new model is employed to examine the Feldstein-
Horioka puzzle and it is found that the trade liberalization policies of the 80’s significantly impacted
cross-country capital mobility.
Keywords: Panel Data; Threshold Regression; Heterogeneity; Interactive Fixed Effects; Regime
Switching; Feldstein-Horioka Puzzle.
JEL classification: C23; C24; F32; F41.

⋆ Acknowledgements: The authors would like to thank Otilia Boldea, Chihwa Kao, Oliver Linton, Esfandiar
Maasoumi, Hashem Pesaran, Liangjun Su, Lorenzo Trapani, and seminar participants at the University of
Birmingham, at the 2023 Italian Congress of Econometrics and Empirical Economics, and at the 28th
International Panel Data Conference for their valuable comments and suggestions.
∗ Corresponding Author. Email: [email protected], University of Birmingham, Birmingham, B15 2TT, UK.

1. Introduction

Threshold regression was introduced in the seminal paper of Tong (1978) and has since become
one of the few key non-linear models widely employed by econometricians. Tsay (1989, 1998)
characterizes threshold models as “practical non-linear models” which stand out in the vast expanse of
the non-linear world. Indeed, threshold regression can capture complex properties of the data like
limit cycles, which in economics correspond to the existence of multiple equilibria often predicted
by economic theory, and yet still maintain intuitive interpretation and tractability. Threshold
regression was introduced in panel data in Hansen (1999) and has since been an active field of
theoretical and empirical research. This is because the pooled information across units helps with the
identification of the different regimes, as well as the precise estimation of the threshold parameter.
In this paper, we provide a comprehensive asymptotic theory for estimation and testing in a panel
threshold regression model with two key features: i) heterogeneous parameters across units and ii)
interactive fixed effects. We show how to estimate the heterogeneous slopes and thresholds and
prove estimator consistency. We derive novel estimator rates of convergence and offer asymptotically
valid confidence intervals. Finally, we propose a test for the null hypothesis of linearity against the
alternative of a threshold regression.
Parameter heterogeneity arises naturally in panel data which consists of many cross-sectional
units. The greater the number of units, the more likely it is that the model parameters will
vary across them. This is a fact that is well recognized in the panel data literature, see e.g. the
contributions in Swamy (1970), Pesaran and Smith (1995), Pesaran et al. (1999), Pesaran (2006),
Fernández-Val and Lee (2013), Bonhomme and Manresa (2015), Gao et al. (2020), Norkutė et al.
(2021), Trapani (2021), and Chen et al. (2022), inter alia. In the heterogeneous panel model, the
parameters of interest are the individual unit slope coefficients and also their average. Therefore,
only Mean Group (MG hereafter) estimators can be used here (Pesaran and Smith, 1995). If the
model is static and linear, pooled estimators assuming homogeneity across units can consistently
estimate the average slope across units (see e.g. Pesaran (2006)) but this property does not extend
to non-linear models like the panel threshold regression.
The second key feature of the threshold model considered here is interactive fixed effects in the
error term. Interactive fixed effects represent a generalization of the standard two-way fixed effects
model. As their name suggests, interactive fixed effects are the inner product of a vector of individual
effects with a vector of time effects, which is a more flexible and empirically relevant way of
capturing unobserved heterogeneity. Interactive fixed effects have been widely used in microeconomics,
macroeconomics and finance. In microeconomics, they are able to capture the time-varying effect of
unobserved time-invariant characteristics like ability and soft skills, see e.g. Kejriwal et al. (2020).
In macroeconomics, they can be interpreted as a set of unobserved multiple common factors useful
in capturing cross-section dependence (Pesaran, 2006) and in finance as unobserved factors for asset
returns (Giglio and Xiu, 2021).
Both parameter heterogeneity and the existence of interactive fixed effects have important
implications for model estimation. Parameter heterogeneity reduces the estimator rate of conver-
gence and interactive fixed effects cannot be removed by the standard fixed effects transformations.
There are several types of estimators that can deal with interactive fixed effects, such as the quasi-
difference approach (Holtz-Eakin et al., 1988), the generalized method of moments estimator (Ahn
et al., 2001, 2013; Juodis and Sarafidis, 2022), the Common Correlated Effects (CCE) estimator
(Pesaran, 2006), the Principal Components (PC) estimator (Bai, 2009), the two-stage instrumental
variables estimator (Norkutė et al., 2021), the Post-Nuclear Norm Regularized estimator of Moon
and Weidner (2018), and the Lasso type shrinkage methods on fixed effects (Lu and Su, 2016; Su
et al., 2016). Yet, not all of the above estimators can deal with heterogeneous coefficients. This
paper employs the CCE method for estimation because it is general enough to allow for heterogeneous
coefficients, is analytically tractable, has excellent small sample properties, and is computationally
fast which is important in threshold regression estimation.
The model estimation by CCE is not straightforward and presents several technical difficulties.
First, the identification of individual slope coefficients requires that the threshold variable has a
common support across units, something which is unlikely in practice. This is a serious problem
that can only be avoided if the threshold parameters are also allowed to be heterogeneous across
units. Threshold parameter heterogeneity, however, removes the appeal of panel data, as the
problem becomes equivalent to running separate time series regressions. To break this impasse, we
assume that there is a homogeneous threshold parameter across units and propose transforming
the threshold variable by the monotonic percentile function. The transformation results in all panel
units having a common support for the transformed threshold variable and a transformed common
threshold parameter. The common threshold parameter (now a threshold in terms of percentiles)
can be estimated precisely using information from both dimensions of the panel, as is desirable.
At the same time, the threshold percentile, which splits the model coefficient regimes, translates
to heterogeneous thresholds in the original threshold variable. Therefore, we estimate a common
parameter across units, whose projected interpretation in the original threshold variable is that of a
heterogeneous threshold parameter. This modelling choice gives us the best of both worlds; efficient
panel data estimation of the common threshold percentile parameter, no identification issues for
the heterogeneous coefficient slopes, and heterogeneous interpretation of each unit’s threshold. The
percentile function transformation and the impact of parameter heterogeneity on marginal effects
are presented in Section 2 below.
Another key obstacle is that the asymptotic distribution of the threshold parameter depends on
nuisance parameters related to the distribution of the error term, see e.g. Chan (1993). To deal with
this problem, Hansen (2000) derives a nuisance parameter-free asymptotic distribution, under the
assumption of a “shrinking threshold”. The shrinking threshold assumption is maintained here, as in
almost all other papers in the literature. What is new here is that it interacts with the slope
coefficient heterogeneity. The shrinking threshold assumption in this model contemporaneously
implies shrinking heterogeneity. As the threshold magnitude shrinks, the relevant slope coefficients
become less heterogeneous and the estimator rate of convergence becomes faster, reaching in the
limit that of a pooled estimator in a homogeneous panel. To the best of our knowledge, this is the
first time this effect is observed.
In terms of threshold regression in heterogeneous panels, this paper is closest to that of Chudik
et al. (2017), which considers almost the same framework. However, that paper is mostly focused on
a specific empirical application and does not provide any asymptotic theory supporting the
estimation methodology. We fill this gap with the current contribution. Another paper that is very
close is that of Miao et al. (2020b) which assumes that units belong to a small number of groups
and parameters vary across groups. That paper allows only for fixed effects, and not for the full
interactive fixed effects model considered in the current contribution. Furthermore, the empirical
application is focused on estimating the number of underlying groups and group membership, which
is different from the form of parameter heterogeneity considered here. The two papers are therefore
clearly distinct. Miao et al. (2020a) consider panel threshold regression with interactive fixed effects
but restrict the parameters to be homogeneous across units. Hacıoğlu Hoke and Kapetanios (2021)
consider a smooth transition model with heterogeneous coefficients and interactive fixed effects.
Massacci (2017) and Massacci et al. (2021) consider threshold regression in the heterogeneous
loadings of pure factor models, without additional regressors. Other contributions in the area are
those of Yu et al. (2023) which considers threshold regressions with endogenous regressors, Chen
et al. (2012) which considers panels with two threshold variables, and Seo and Shin (2016) which
considers threshold regression in dynamic panel data models with a short time dimension. All these
contributions assume homogeneous coefficients and standard fixed effects.
We conclude this paper by applying the new methodology to examine one of the most impor-
tant problems of macroeconomics, that of the Feldstein-Horioka puzzle (Feldstein and Horioka,
1980). The puzzling fact is that domestic savings in a country are highly correlated with domes-
tic investment, meaning that savers disregard potential opportunities for higher returns in other
countries. The new threshold model studied in this paper allows for heterogeneous coefficients
and cross-section dependence which have both been documented previously in country-level data
(Chudik et al., 2017). Our results confirm the existence of the puzzle but show that higher trade
openness results in greater international capital mobility. We find that most countries moved into
this “high capital mobility” regime in the 80’s, which is in line with the trade liberalization policies
introduced at that time (Faini, 2004).
The remainder of the paper is organised as follows. Section 2 describes the model we study.
Section 3 develops the estimation strategy. Section 4 provides the asymptotic theory. Section 5
introduces a test for the presence of the threshold effects. Section 6 applies the methodology to study
the Feldstein-Horioka puzzle. Section 7 concludes. The supplementary
appendix contains the bootstrap algorithm for the test of no nonlinearity, extensive Monte Carlo
simulations, estimators for group-specific parameter heterogeneity with known group membership,
and all relevant mathematical proofs.
We will use the following notation. The letter C stands for a universal finite positive constant,
while $I_m$ denotes the m × m identity matrix. Also, for a real m × n matrix A, the element (i, j) is
denoted by $A_{ij}$, while ||A|| denotes the Frobenius norm. $1_T = (1, 1, \ldots, 1)'$ is a T × 1 vector of ones.
$\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of A respectively. We define the
projection matrices $P_A = A(A'A)^{-1}A'$ and $M_A = I_m - P_A$. I(·) is the indicator function, and
$\bar{Y}_t = N^{-1}\sum_{i=1}^{N} Y_{i,t}$ indicates the cross-sectional average of any variable $Y_{i,t}$. diag(A) denotes a
diagonal matrix consisting of the main diagonal elements of the matrix A. The symbol $\xrightarrow{p}$ denotes
convergence in probability, $\xrightarrow{d}$ convergence in distribution, ⇒ denotes weak convergence with
respect to the uniform metric, and plim the probability limit. (N, T) → ∞ denotes that both N and
T tend to infinity jointly, where N is the number of units in the panel and T is the number of time
series observations.

2. Model

Consider the following model, in which the response variable of the ith unit observed at time
t, yi,t, is given by:

yi,t = βi′ xi,t + δi′ wi,t I{qi,t ≤ γ} + ei,t i = 1, ..., N , t = 1, ..., T (1)

where xi,t is a K × 1 vector of observable regressors, and βi is a K × 1 vector of heterogeneous slope
coefficients, which can be different for each unit i. Additionally, let:

wi,t = R′ xi,t , (2)

be an r × 1 subset of the regressors in xi,t . The matrix R is a K × r selection matrix of zeros
and ones with full column rank r that picks elements of xi,t whose coefficients are subject to the
threshold effect. R is known to the researcher. If R = IK , then all K regressors in xi,t have the
threshold effect and the model is called a pure threshold model, while if R = (Ir , 0r×(K−r) )′ , then
only the first r regressors in xi,t are affected by the threshold, and the model is called a partial
threshold model.
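
To make the selection matrix concrete, consider a hypothetical case (not taken from the paper) with K = 3 regressors, of which the first r = 2 are subject to the threshold effect. The element notation $x_{k,i,t}$ for the kth regressor is an assumption made only for this illustration. Under the partial threshold form of R given above:

```latex
R = \begin{pmatrix} I_2 \\ 0_{1\times 2} \end{pmatrix}
  = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix},
\qquad
w_{i,t} = R' x_{i,t}
  = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} x_{1,i,t} \\ x_{2,i,t} \\ x_{3,i,t} \end{pmatrix}
  = \begin{pmatrix} x_{1,i,t} \\ x_{2,i,t} \end{pmatrix}.
```

In this sketch only the first two regressors enter the threshold term $\delta_i' w_{i,t}$, so the coefficient on the third regressor is the same in both regimes.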
A key characteristic of (1) is that the effect of the regressors on the dependent variable is
allowed to vary across two regimes which are identified by I, an indicator function which takes
value 1 when {qi,t ≤ γ} and 0 otherwise. The variable qi,t is a scalar that may belong in xi,t , and γ
is the threshold parameter that defines the two different regimes in the model. When qi,t > γ the
effect of xi,t on yi,t is βi . This is frequently called the “high regime”. There is also a “low regime”
with qi,t ≤ γ, where the coefficient of the variables is βi + δi . The model is suitable to identify
the different equilibria which can arise when the response of the dependent variable is different
across periods. Examples of the “low regime” include periods of economic and financial turmoil
and distress, such as poor stock market performance or economic crises, or periods of unfavorable
economic outlook and low sentiment when compared to normal times, namely the “high regime”.1
The model in (1) is sufficiently general to render existing panel models as special cases; (i) the
simple linear heterogeneous panel model of Swamy (1970) corresponds to the case where δi = 0 for
all i and (ii) the homogeneous threshold panel models of Hansen (1999) and Miao et al. (2020a)
correspond to the case where βi = β, and δi = δ.
The threshold parameter γ is assumed to be common across units, in line with all the
pre-existing literature on panel data threshold regression. The γ parameter can be seen as the average
of heterogeneous γi which vary across units, in the same way that a common break can be seen as the
average of heterogeneous breaks across units; see Bai (2010) and Karavias et al. (2022). Assuming
homogeneity in γ allows estimation by pooling the information across N . Observations on many
units allow observing many limit cycles and this pooling results in more precise estimation of the
threshold parameter. Contrary to the previous literature, in model (1) the threshold homogeneity
assumption creates complications given that the βi and δi are now allowed to vary across units.

1 Similarly, it is possible to define I{qi,t > γ} in (1) to capture regimes where inflation or interest rates are above
a certain threshold in monetary policy models (high inflation regimes). These cases are formally equivalent and the
only difference between them is the interpretation of the δi coefficients.

When γ is homogeneous and the slopes are heterogeneous across units, the model in (1) is hard
to estimate in most empirical applications due to its stringent identification requirements. The
identification of δi necessitates that the threshold γ is such that there are threshold variable observations
in both regimes defined by γ; explicitly $\sum_{t=1}^{T} I\{q_{i,t} \le \gamma\} > 0$ and $\sum_{t=1}^{T} I\{q_{i,t} > \gamma\} > 0$, for every
unit i. If however the supports of qi,t and qj,t , for i ≠ j, are disjoint, then the aforementioned
condition will not hold and either δi or δj will not be identified. To see this, consider the impact
of government expenditure on economic growth. The UK’s General government final consumption
expenditure (% of GDP) varies between 16% and 22% from 1973 to 2021, while that of Mexico in the
same period varies between 8% and 12%. Therefore, there is no common γ that creates two regimes
in both countries.
To bypass this challenge, we introduce a variation in the functional form of the threshold
variable:
yi,t = βi′ xi,t + δi′ wi,t I{pi (qi,t ) ≤ γ} + ei,t . (3)

The function pi (·) transforms the threshold variable into an empirical quantile. The benefit of the
transformation is that it ensures that pi (qi,t ) has common support across all units and that all the
δi s are identified.
The introduction of the empirical quantile transformation has a significant implication for the
interpretation of the model; the threshold parameter γ is now interpreted as a threshold percentile
value, and not in the original threshold variable units. To translate the threshold parameter in the
original units, the inverse transformation of pi (·) must be employed. Because pi (·) is unit-specific,
the inverse transformation is also unit-specific and therefore, the retrieved threshold in the original
units will also be heterogeneous and unit-specific: $\gamma_i = p_i^{-1}(\gamma)$.
Threshold heterogeneity is a desirable generalization when it does not come at the expense of
estimation accuracy. Efficient estimation is achieved by maintaining a common threshold parameter
γ. The key additional assumption in model (3) is the specific form of the empirical quantile
function. Let for example pi (qi,t ) = Ranki (qi,t )/T , where Ranki (qi,t ) is the rank of qi,t across the
time series dimension {qi,t , t = 1, ..., T }. This formula transforms qi,t into percentiles. Notice that
the transformed variable pi (qi,t ) is in percentiles and thus has T unique observations which coincide
across all units.
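
The rank-based transformation and its unit-specific inverse can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `empirical_quantile` and `unit_threshold` are hypothetical, the two short series are made-up numbers chosen to mimic disjoint supports, ties are assumed away, and the inverse is taken to be the largest observed value whose empirical quantile does not exceed γ.

```python
def empirical_quantile(q):
    """Map one unit's series q_{i,1..T} to p_i(q_{i,t}) = Rank_i(q_{i,t}) / T.

    Assumes no ties, so the ranks are exactly 1, ..., T.
    """
    T = len(q)
    order = sorted(range(T), key=lambda t: q[t])  # time indices, smallest value first
    p = [0.0] * T
    for rank, t in enumerate(order, start=1):
        p[t] = rank / T
    return p


def unit_threshold(q, gamma):
    """Translate a percentile threshold gamma back into the original units of q.

    Hypothetical stand-in for the inverse transformation: the largest observed
    q whose empirical quantile is <= gamma. Returns None if the low regime is empty.
    """
    p = empirical_quantile(q)
    low = [x for x, px in zip(q, p) if px <= gamma]
    return max(low) if low else None


# Made-up series with disjoint supports (think of expenditure shares in % of GDP).
uk = [16.0, 18.5, 22.0, 19.0, 17.2]
mexico = [8.0, 9.5, 12.0, 10.1, 11.3]

gamma = 0.4  # common threshold percentile (illustrative value)

# After the transformation both units share the support {1/T, ..., 1}.
print(sorted(empirical_quantile(uk)) == sorted(empirical_quantile(mexico)))
# The same gamma maps to a different threshold in each unit's original scale.
print(unit_threshold(uk, gamma), unit_threshold(mexico, gamma))
```

With these numbers no single threshold in the original scale splits both series, yet the common percentile γ = 0.4 places two of the five observations of each unit in the low regime.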
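Using the ranks of qi,t does not change the interpretation of the slope coefficients in the baseline
model (1). Let $q^\star_{i,t} = p_i(q_{i,t})$. The marginal effect of xi,t is still:

$$\frac{\partial E(y_{i,t} \mid x_{i,t}, q^\star_{i,t})}{\partial x_{i,t}} = \begin{cases} \beta_i, & \text{if } q^\star_{i,t} > \gamma, \\ \beta_i + \delta_i, & \text{if } q^\star_{i,t} \le \gamma. \end{cases} \qquad (4)$$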

The main focus of this paper is to estimate the marginal effects in (4). To this end, we provide
consistent estimators for βi , δi and also γ. Additionally, the average marginal effects are of interest:

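$$\frac{1}{N}\sum_{i=1}^{N} \frac{\partial E(y_{i,t} \mid x_{i,t}, q^\star_{i,t})}{\partial x_{i,t}} = \begin{cases} \bar{\beta}, & \text{if } q^\star_{i,t} > \gamma, \\ \bar{\beta} + \bar{\delta}, & \text{if } q^\star_{i,t} \le \gamma, \end{cases} \qquad (5)$$

where $\bar{\beta} = N^{-1}\sum_{i=1}^{N} \beta_i$, and $\bar{\delta} = N^{-1}\sum_{i=1}^{N} \delta_i$. Following Pesaran and Smith (1995) the average
marginal effects are $\bar{\beta}$ and $\bar{\delta}$, or E(βi ) and E(δi ) as N → ∞.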
The errors ei,t contain the unobserved heterogeneity, which has the general form of interactive
fixed effects or a multi-factor error structure as it is sometimes called in the literature (Pesaran,
2006):
ei,t = λ′i ft + εi,t (6)

where ft is an m × 1 vector containing m unobserved factors which are common to all units, λi
are the heterogeneous common factor loadings, and εi,t are the remaining idiosyncratic errors. The
unobserved heterogeneity formulation in (6) encompasses all standard panel data models. If ft = 1
then we have the standard fixed effect (FE) model, while for ft = (1, τt )′ and λi = (αi , 1)′ we have
the two-way fixed effect (TWFE). The unobserved common factors ft can capture the price of an
unobserved skill which changes in time or the price of an input for production, or even aggregate
shocks to a particular market or economy. Because all units are impacted by these factors, the
factor structure is the source of the cross-sectional dependence across the units.
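
As a quick check of the two special cases just mentioned, the stated choices of ft and λi can be substituted into (6). Writing the scalar loading as λi = αi in the one-factor case is an assumption about notation made for this illustration:

```latex
% Standard fixed effects: m = 1, f_t = 1, \lambda_i = \alpha_i
e_{i,t} = \lambda_i' f_t + \varepsilon_{i,t} = \alpha_i + \varepsilon_{i,t}.

% Two-way fixed effects: m = 2, f_t = (1, \tau_t)', \lambda_i = (\alpha_i, 1)'
e_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}
        = \alpha_i \cdot 1 + 1 \cdot \tau_t + \varepsilon_{i,t}
        = \alpha_i + \tau_t + \varepsilon_{i,t}.
```

In both cases the loadings or the factors are restricted to known constants, which is what the general interactive form relaxes.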
The interactive fixed effects are allowed to be correlated with the regressors. We follow Pesaran
(2006) and assume that the correlation is linear and can be modelled as:

xi,t = Π′i ft + ξi,t (7)

where Πi is an m × K fixed factor loading matrix and ξi,t is the K × 1 idiosyncratic part. This is
a Mundlak-style assumption and is well accepted in the literature.
We further assume that the threshold variable qi,t is one of the regressors in xi,t and therefore
satisfies (7). This is the most empirically relevant scenario. Alternatively, it is also possible that
qi,t is not included in xi,t , in which case, the theory below still holds but collapses to that of a
structural break model that can be estimated by arranged regression (Tsay, 1998) as in Karavias
et al. (2022). The assumption that qi,t satisfies (7) does not interfere with the empirical quantile
transformation $q^\star_{i,t}$ because the latter appears only in the indicator function $I\{q^\star_{i,t} \le \gamma\}$, and there
are no assumptions in the threshold literature on the distribution of $I\{q^\star_{i,t} \le \gamma\}$.

3. Estimation

To present the estimators we stack the model in (3) across the time dimension. Letting
$w_{i,t}(\gamma) = w_{i,t} I\{q^\star_{i,t} \le \gamma\}$, the stacked model becomes:

yi = Xi βi + Wi (γ)δi + ei , (8)

where yi = (yi,1 , yi,2 , . . . , yi,T )′ is a T × 1 vector, Xi = (xi,1 , xi,2 , . . . , xi,T )′ is a T × K matrix,
Wi (γ) = (wi,1 (γ), wi,2 (γ), . . . , wi,T (γ))′ is a T × r matrix, and ei = (ei,1 , ei,2 , . . . , ei,T )′ is a T × 1
vector. The interactive effects from (6) in matrix form become:

ei = F λi + εi , (9)

where F = (f1 , f2 , . . . , fT )′ is a T × m matrix, and εi = (εi,1 , εi,2 , . . . , εi,T )′ is a T × 1 vector. Finally,
we express (7) in matrix form as:
Xi = F Πi + ξi , (10)

where ξi = (ξi,1 , ξi,2 , . . . , ξi,T )′ is a T × K matrix.


We are interested in three sets of parameters: the threshold parameter γ, the average effect
of the slope coefficients β and δ, and the heterogeneous slope coefficients βi and δi , which may
be different across units i ∈ {1, 2, ..., N }. The model in (8) is linear in the parameters βi and δi
and non-linear in the parameter γ. For now, let γ be known. In this case the model is linear in
the parameters and can be estimated by a variant of the Mean Group CCE estimator in Pesaran
(2006), adapted as in Karavias et al. (2022). The key idea is that cross-section averages of Xi can
be used to consistently estimate the space spanned by the unknown factors in (10). This is an
alternative to using principal components, see e.g. Westerlund and Urbain (2015). The first step
to estimation then involves pre-multiplying (8) by MX̄ = IT − X̄(X̄ ′ X̄)−1 X̄ ′ , where IT is a T -order
identity matrix. The transformed model becomes:

ỹi = X̃i βi + W̃i (γ)δi + ẽi , (11)

where ỹi = MX̄ yi , X̃i = MX̄ Xi , W̃i (γ) = MX̄ Wi (γ) and ẽi = MX̄ ei . We will show later that
MX̄ ei = MX̄ F λi + MX̄ εi = MX̄ εi + op (1), asymptotically removing the m common factors.2 The
model in (11) can be rewritten in a more compact form as:

ỹi = Z̃i (γ)θi + ẽi , (12)


 
where $\tilde{Z}_i(\gamma) = \big(\tilde{X}_i, \tilde{W}_i(\gamma)\big)$ and $\theta_i = (\beta_i', \delta_i')'$.

2
Here we only use X̄ to remove the interactive fixed effects, which is different to using both Ȳ and X̄ as in the
original CCE estimator of Pesaran (2006). Karavias et al. (2022) show that Ȳ is not rotationally consistent for ft in
the model with structural breaks and the same applies here.
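
The role of the projection can be illustrated numerically. The sketch below simulates regressors from (10) and checks how much of the factor matrix F survives pre-multiplication by the annihilator built from the cross-section averages of Xi. All sizes, the seed, the loading values and the helper name `annihilator` are assumptions made for this illustration; the loadings are given a non-zero, full-rank mean so that the rank condition on the average loadings is plausible in the simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K, m = 200, 100, 3, 2            # illustrative sizes with m <= K

F = rng.normal(size=(T, m))            # unobserved common factors, T x m
Pi_mean = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, -1.0]]) # assumed average loadings, rank m
Pi = Pi_mean + 0.5 * rng.normal(size=(N, m, K))   # unit-specific loadings Pi_i
xi = rng.normal(size=(N, T, K))        # idiosyncratic part of the regressors
X = F @ Pi + xi                        # X_i = F Pi_i + xi_i, stacked as N x T x K

X_bar = X.mean(axis=0)                 # T x K cross-section averages


def annihilator(A):
    """Return I - A (A'A)^{-1} A', computed via the pseudo-inverse of A."""
    return np.eye(A.shape[0]) - A @ np.linalg.pinv(A)


M = annihilator(X_bar)

# Share of the factor variation that is left after the projection.
share = np.linalg.norm(M @ F) ** 2 / np.linalg.norm(F) ** 2
print(f"share of F left after applying the projection: {share:.4f}")
```

In this design the remaining share is driven by the averaged idiosyncratic noise in the cross-section averages, so it should shrink as N grows.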

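Assuming γ is known, the CCE estimators are:

$$\hat{\theta}_i(\gamma) = \big(\tilde{Z}_i(\gamma)' \tilde{Z}_i(\gamma)\big)^{-1} \tilde{Z}_i(\gamma)' \tilde{y}_i, \qquad \hat{\theta}(\gamma) = \frac{1}{N}\sum_{i=1}^{N} \hat{\theta}_i(\gamma), \qquad (13)$$

and more explicitly:

$$\hat{\beta}_i(\gamma) = \big(\tilde{X}_i' M_{\tilde{W}_i(\gamma)} \tilde{X}_i\big)^{-1} \tilde{X}_i' M_{\tilde{W}_i(\gamma)} \tilde{y}_i, \qquad \hat{\beta}(\gamma) = \frac{1}{N}\sum_{i=1}^{N} \hat{\beta}_i(\gamma), \qquad (14)$$

$$\hat{\delta}_i(\gamma) = \big(\tilde{W}_i(\gamma)' M_{\tilde{X}_i} \tilde{W}_i(\gamma)\big)^{-1} \tilde{W}_i(\gamma)' M_{\tilde{X}_i} \tilde{y}_i, \qquad \hat{\delta}(\gamma) = \frac{1}{N}\sum_{i=1}^{N} \hat{\delta}_i(\gamma). \qquad (15)$$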

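If γ is unknown, as is usually the case in practice, we follow Hansen (1999) and estimate it
with the γ which minimizes the CCE sum of squared residuals:

$$\hat{\gamma} = \underset{\gamma \in \Gamma}{\operatorname{argmin}} \sum_{i=1}^{N} \big[\tilde{y}_i - \tilde{Z}_i(\gamma)\hat{\theta}_i(\gamma)\big]' \big[\tilde{y}_i - \tilde{Z}_i(\gamma)\hat{\theta}_i(\gamma)\big] \qquad (16)$$

$$= \underset{\gamma \in \Gamma}{\operatorname{argmin}} \sum_{i=1}^{N} \big[\tilde{y}_i - \tilde{X}_i\hat{\beta}_i(\gamma) - \tilde{W}_i(\gamma)\hat{\delta}_i(\gamma)\big]' \big[\tilde{y}_i - \tilde{X}_i\hat{\beta}_i(\gamma) - \tilde{W}_i(\gamma)\hat{\delta}_i(\gamma)\big]. \qquad (17)$$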

The above sum of squared residuals is a step function for γ having only O(T ) distinct values.
When T is large, Hansen (1999) suggests approximating Γ by a grid search method to save
computational time, searching in $\Gamma \cap \{q^\star_{1,t}, 1 \le t \le T\}$.3 First, sort the distinct values of the observations
on the threshold variable $q^\star_{1,t}$, and next, trim the top and bottom 1%, 5%, 10%, or any other
specific percentiles of $q^\star_{1,t}$. Finally, search for $\hat{\gamma}$ over the remaining values of $q^\star_{1,t}$.4 Once $\hat{\gamma}$ has been
obtained, the estimators $\hat{\beta}_i$, $\hat{\delta}_i$, $\hat{\beta}$ and $\hat{\delta}$ can be obtained by substituting $\hat{\gamma}$ for γ in (14) and
(15).
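The estimate $\hat{\gamma}$ will be a quantile, and to interpret it in the original variable qi,t for each unit
i the inverse transformation $\hat{\gamma}_i = p_i^{-1}(\hat{\gamma})$ must be applied. We will henceforth denote $\tilde{\beta}_i = \hat{\beta}_i(\hat{\gamma})$,
$\tilde{\delta}_i = \hat{\delta}_i(\hat{\gamma})$, $\tilde{\beta} = \hat{\beta}(\hat{\gamma})$, and $\tilde{\delta} = \hat{\delta}(\hat{\gamma})$. Similarly we define $\tilde{\theta}_i = (\tilde{\beta}_i, \tilde{\delta}_i)$ and $\tilde{\theta} = (\tilde{\beta}, \tilde{\delta})$.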
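
The estimation steps of this section can be put together in a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data are simulated from (3), (6) and (7) with made-up parameter values, the model is a pure threshold model (R equal to the identity), the threshold variable is the first regressor, 10% is trimmed from each end of the grid, and the function name `fit` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K, m = 50, 200, 2, 1                      # illustrative sizes, m <= K
gamma0 = 0.5                                    # assumed true threshold percentile

# Simulated panel; every numerical value below is made up for the illustration.
F = rng.normal(size=(T, m))
Pi = np.array([[1.0, 0.5]]) + 0.2 * rng.normal(size=(N, m, K))
X = F @ Pi + rng.normal(size=(N, T, K))         # regressors with a factor structure
lam = 1.0 + 0.2 * rng.normal(size=(N, m))       # factor loadings in the errors
beta = np.array([1.0, -1.0]) + 0.2 * rng.normal(size=(N, K))
delta = np.array([2.0, 2.0]) + 0.2 * rng.normal(size=(N, K))

q = X[:, :, 0]                                  # threshold variable: first regressor
q_star = (q.argsort(axis=1).argsort(axis=1) + 1) / T   # Rank_i(q_it) / T per unit
W0 = X * (q_star <= gamma0)[:, :, None]
y = (np.einsum("ntk,nk->nt", X, beta) + np.einsum("ntk,nk->nt", W0, delta)
     + lam @ F.T + 0.5 * rng.normal(size=(N, T)))

# CCE step: project out the cross-section averages of X.
X_bar = X.mean(axis=0)
M = np.eye(T) - X_bar @ np.linalg.pinv(X_bar)
X_t, y_t = M @ X, y @ M                         # M is symmetric, so row i is (M y_i)'


def fit(gamma):
    """Unit-by-unit least squares on the transformed data for a given gamma.

    Returns the N x 2K array of (beta_i, delta_i) estimates and the pooled SSR.
    """
    W_t = M @ (X * (q_star <= gamma)[:, :, None])
    Z = np.concatenate([X_t, W_t], axis=2)
    theta = np.empty((N, 2 * K))
    ssr = 0.0
    for i in range(N):
        theta[i] = np.linalg.lstsq(Z[i], y_t[i], rcond=None)[0]
        resid = y_t[i] - Z[i] @ theta[i]
        ssr += resid @ resid
    return theta, ssr


# Grid search over the common support {1/T, ..., 1}, trimming 10% at each end.
grid = np.arange(T // 10, T - T // 10 + 1) / T
ssr_path = np.array([fit(g)[1] for g in grid])
gamma_hat = grid[ssr_path.argmin()]

theta_i, _ = fit(gamma_hat)
theta_mg = theta_i.mean(axis=0)                 # mean group estimates (beta, delta)

# Unit-specific thresholds in the original scale of q.
gamma_i = np.array([q[i][q_star[i] <= gamma_hat].max() for i in range(N)])

print("gamma_hat:", gamma_hat)
print("mean group (beta, delta):", np.round(theta_mg, 3))
```

Because the grid contains only the O(T) distinct percentile values shared by all units, the search cost does not grow with N beyond the unit-by-unit regressions at each grid point.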

3 Notice that for each unit i, the unique values in $\{q^\star_{i,t}, 1 \le t \le T\}$ are the same because they are empirical
quantiles based on the same number of time series observations.
4 This problem is computationally faster to solve because there are only O(T ) unique values, instead of O(N T ) in
the case without the quantile transformation. The transformation to quantiles is necessary only when the support
of the threshold variable is not common across units. If this is not the case, then the quantile transformation is
not necessary and the estimation proceeds in the same way except that now there will be O(N T ) distinct threshold
values.

4. Assumptions

This section presents the main assumptions under which we develop the asymptotic theory of
the estimators. Since the empirical quantile transformation is a one-to-one and onto map, we only
study asymptotic properties with qi,t to simplify notation. Similar results can be obtained by
replacing qi,t by $q^\star_{i,t}$. Let D = σ(F ) be the minimal sigma-field generated from the factor structure
F . Let PD (A) = P(A|D) and ED (A) = E(A|D).
We use the superscript 0 to denote the true parameter values. In particular, the true regression
coefficients are denoted by $\theta_i^0 = (\beta_i^{0\prime}, \delta_i^{0\prime})'$ for i = 1, ..., N, and the true threshold parameter by $\gamma^0$.5

Assumption A.1 (Common Factors). i) ft is strictly stationary and ergodic, distributed
independently of εi,t′ and ξi,t′ for all i, t, t′ ; ii) $E\|f_t\|^{4+\epsilon} < \infty$ for some ϵ > 0; iii) $T^{-1}\sum_{t=1}^{T} f_t f_t' \xrightarrow{p} \Sigma_f > 0$
for some m × m matrix Σf as T → ∞.

Assumption A.2 (Fixed Factor Loadings). i) Rank(Π̄) = m ≤ K for all N , including N → ∞;
ii) ||Π̄|| < ∞; iii) ||λi || < ∞ for i = 1, 2, ..., N .

Assumption A.3 (Heterogeneity). The slopes θi follow the random coefficient model:

$$\theta_i^0 = \theta^0 + v_i, \qquad v_i \sim iid(0, \Omega_v), \qquad i = 1, 2, ..., N, \qquad (18)$$

where $\|\theta^0\| < C$, Ωv is a (K + r) × (K + r) symmetric non-negative definite matrix such that
||Ωv || < C, and the vi are distributed independently of λj , Πj , ξj,t , εj,t and ft ∀ i, j and t.

Assumption A.4 (Regressors). i) ξi,t are independent and identically distributed (i.i.d.) across
i; ii) For each i, ξi,t is strictly stationary and ergodic with ρ-mixing coefficients over t, such that
$\sum_{m=1}^{\infty} \rho_m^{1/2} < \infty$; iii) $E(\xi_{i,t} \mid \mathcal{F}^{(\xi)}_{NT,t-1}) = 0$, where $\mathcal{F}^{(\xi)}_{NT,t-1}$ is the sigma field such that
$\mathcal{F}^{(\xi)}_{NT,t-1} = \sigma(\{\xi_{i,t-1}\}_{i=1}^{N})$; iv) $E|\xi_{i,t}|^{4+\epsilon} < \infty$ for some ϵ > 0.

Assumption A.5 (Errors). i) εi,t are independent and identically distributed (i.i.d.) across i;
ii) For each i, εi,t is strictly stationary and ergodic with ρ-mixing coefficients over t, such that
$\sum_{m=1}^{\infty} \rho_m^{1/2} < \infty$; iii) $E(\varepsilon_{i,t} \mid \mathcal{F}^{(\varepsilon)}_{NT,t-1}) = 0$, where $\mathcal{F}^{(\varepsilon)}_{NT,t-1}$ is the sigma field such that
$\mathcal{F}^{(\varepsilon)}_{NT,t-1} = \sigma(\{\varepsilon_{i,t-1}\}_{i=1}^{N})$; iv) $E|\varepsilon_{i,t}|^{4+\epsilon} < \infty$ for some ϵ > 0; v) εi,t and ξj,t′ are distributed
independently for all i, j, t, t′ ; vi) $E(\xi_{i,t}\xi_{i,t}') = \Sigma_{\xi}$, and $E(\varepsilon_i \varepsilon_i') = \Sigma_{\varepsilon_i}$.

Assumption A.6 (Threshold Parameter Space). The threshold parameter $\gamma^0 \in \Gamma = [\underline{\gamma}, \overline{\gamma}]$, where
Γ is a compact set.

Assumption A.7 (Continuous Threshold Variable). i) qi,t is a continuous threshold variable.
Conditionally on D, qi,t has a conditional probability density function, fi,t,D (γ). ii) fi,t,D (γ) is
continuous at $\gamma = \gamma^0$ and it is uniformly bounded, and thus $\sup_{i,t} \sup_{\gamma \in \Gamma} f_{i,t,D}(\gamma) \le C < \infty$.

Assumption A.8 (Full Rank Conditions). Conditional on D, we require that

i) $M^\star_{T,i} > M^\star_{T,i}(\gamma^1, \gamma^2) > 0$ for all i, and $\gamma^1, \gamma^2 \in \Gamma$, where $M^\star_{T,i} = E_D\big[T^{-1} X_i' M_F X_i\big]$ and
$M^\star_{T,i}(\gamma^1, \gamma^2) = E_D\big[T^{-1} X_i(\gamma^1)' M_F X_i(\gamma^2)\big]$. In particular, if $\gamma^1 = \gamma^2 = \gamma$, then
$M^\star_{T,i}(\gamma) = E_D\big[T^{-1} X_i(\gamma)' M_F X_i(\gamma)\big]$;
ii) $M^\star_{T,i} > M^{\star\star}_{T,i}(\gamma) > 0$ for all i and γ ∈ Γ, where $M^{\star\star}_{T,i}(\gamma) = E_D\big[T^{-1} X_i' M_F X_i(\gamma)\big]$;
iii) $M^\star_{NT} > M^\star_{NT}(\gamma^1, \gamma^2) > 0$ for all $\gamma^1, \gamma^2 \in \Gamma$, where $M^\star_{NT} = E_D\big[(NT)^{-1} \sum_{i=1}^{N} X_i' M_F X_i\big]$
and $M^\star_{NT}(\gamma^1, \gamma^2) = E_D\big[(NT)^{-1} \sum_{i=1}^{N} X_i(\gamma^1)' M_F X_i(\gamma^2)\big]$. If $\gamma^1 = \gamma^2 = \gamma$, then
$M^\star_{NT}(\gamma) = E_D\big[(NT)^{-1} \sum_{i=1}^{N} X_i(\gamma)' M_F X_i(\gamma)\big]$;
iv) $M^\star_{NT} > M^{\star\star}_{NT}(\gamma) > 0$ for all γ ∈ Γ, where $M^{\star\star}_{NT}(\gamma) = E_D\big[(NT)^{-1} \sum_{i=1}^{N} X_i' M_F X_i(\gamma)\big]$.

5 If qi,t has common support across units then the quantile transformation may not be necessary. The asymptotic
theory presented below applies to this section as well.

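Assumption A.9 (Identification Condition). Consider the cross-product terms $S_{1i}(\gamma) = \tilde{Z}_i(\gamma)' \tilde{Z}_i(\gamma)$
with $S_{1i}(\gamma) > 0$, $S_{2i}(\gamma) = \tilde{Z}_i(\gamma)' \tilde{W}_i(\gamma^0, \gamma)$, $S_{3i}(\gamma) = \tilde{W}_i(\gamma^0, \gamma)' \tilde{W}_i(\gamma^0, \gamma)$, and
$\tilde{W}_i(\gamma^1, \gamma^2) = \tilde{W}_i(\gamma^1) - \tilde{W}_i(\gamma^2)$ for any $\gamma^1, \gamma^2 \in \Gamma$. Then, i) $N^{-1} \sum_{i=1}^{N} ||\delta_i^0||^2 > 0$, including $N \to \infty$;
ii) There exists some constant $\tau > 0$, as $T \to \infty$ for all $i$ uniformly, such that:

$$P\left( \min_{\gamma \in \Gamma} \lambda_{\min}\left[ \frac{1}{T}\left( S_{3i}(\gamma) - S_{2i}(\gamma)' S_{1i}^{-1}(\gamma) S_{2i}(\gamma) \right) \right] \ge \tau \min\left(1, |\gamma - \gamma^0|\right) \right) \xrightarrow{p} 1; \qquad (19)$$

iii) $\limsup_{N,T} (NT)^{-1} \sum_{i=1}^{N} E||(\tilde{Z}_i(\gamma)' e_i)' S_{1i}^{-1}(\gamma) \tilde{Z}_i(\gamma)' e_i|| < \infty$, uniformly for $\gamma \in \Gamma$; and
iv) $\limsup_{N,T} (NT)^{-1} \sum_{i=1}^{N} E||\delta_i' (S_{3i}(\gamma) - S_{2i}(\gamma)' S_{1i}^{-1}(\gamma) S_{2i}(\gamma)) \delta_i|| < \infty$, uniformly for $\gamma \in \Gamma$.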

Assumption A.1 is standard in the literature and it is similar to Assumption 1 in Pesaran (2006)
which excludes non-stationary factors and trends. Assumption A.2 is the so-called rank condition
and states that the number of factors must be smaller or at most equal to the number of regressors.
This assumption is standard in the CCE literature and appears in Pesaran (2006) and in Karavias
et al. (2022). Applied research typically finds a small number of factors in the error term, see e.g.
Juodis and Sarafidis (2022). To further relieve the strain of the rank condition on the number of
regressors, notice that some factors in ft may be observable. Observed factors such as the intercept,
time effects, seasonal dummies, and other common variables like index stock returns, central bank
interest rates and oil prices should be included in Xi . Any factor included in Xi does not count
towards the dimension m of F . When it comes to the factor loadings, we follow Westerlund and
Kaddoura (2022) and assume that λi and Πi are fixed in our setting, unlike Pesaran (2006), which
assumes that they are random and independent of each other, which is much stronger.
Assumption A.3 states that the heterogeneous coefficients are randomly distributed across units
and are independent of any other random elements in the model. This is the prevalent assumption in
the heterogeneous coefficients literature, see e.g. Pesaran (2006) and Chudik and Pesaran (2013).
Assumption A.4 is similar to Assumption 2 of Pesaran (2006) and states that the series must
be stationary and that the cross-sectional dependence across units is fully captured by the factor
structure. The ρ mixing assumption controls the degree of time series dependence and is weaker than
uniform mixing, yet stronger than strong mixing. This assumption also implies that $\rho_m = O(\rho^m)$
with $|\rho| < 1$.
Assumption A.5 is the typical zero-mean assumption made for errors, similar to Pesaran (2006).
We require the errors to be a martingale difference sequence to avoid bias in θ̂i (γ). The series
(xi,t , qi,t ) are treated as strictly exogenous variables conditional on D. Furthermore, ξi,t is mean
independent with respect to both future and past xi,t , qi,t conditional on D. Finally, εi,t and ξj,t
are not correlated. All these ensure the correct specification of the conditional mean. The higher
order moments are as in Miao et al. (2020a).
Assumptions A.6 and A.7 are also standard in the literature. Assumption A.6 imposes that the
threshold parameter belongs to a compact set, and Assumption A.7 that the threshold variable has a
conditional probability density function that is uniformly bounded (Hansen, 2000).
Assumption A.8 contains full rank assumptions that ensure matrix invertibility wherever necessary,
as in Hansen (2000), and Assumption A.9 ensures the identification of the parameter γ 0 .⁶
Assumption A.9 is not restrictive, and the case where some units do not have threshold effects is
permitted if the order of the number of threshold-affected units is N .

5. Asymptotic Theory

In this section, we derive the asymptotic properties of the proposed estimators and related tests
by letting N and T tend to infinity. Specifically, we begin with Theorems 1 and 2, which
prove the consistency of γ̂ and derive its rate of convergence. Theorems 3 and 4 derive the asymptotic
distributions of the individual θ̃i and the MG θ̃ estimators. Theorem 5 derives the asymptotic
distribution of γ̂, while Theorem 6 provides the likelihood ratio statistic for testing hypotheses
about γ. Finally, Theorem 7 derives a hypothesis test for the null of no threshold. All proofs are
relegated to Appendix A.
Theorem 1. Under Assumptions A.1-A.9, and as $(N, T) \to \infty$, $\hat{\gamma} \xrightarrow{p} \gamma^0$.

Theorem 1 establishes the consistency for γ̂ under weaker conditions than Miao et al. (2020a),
in that it does not require the shrinking threshold assumption, or any restriction on the relative
magnitude of N and T .
While the shrinking threshold assumption is not necessary to prove consistency of the estimator
for γ, it is needed to establish the convergence rate of the estimators.

Assumption A.10 (Shrinking Threshold Assumption). Let $C_{0i} = C_0 + C_{vi}$, where $C_0$ is fixed with
$C_0 \in \mathbb{R}$, and $C_{vi}$ is such that $C_{vi} \sim iid(0, \Omega_{vi})$, where $\Omega_{vi}$ is an $r \times r$ symmetric non-negative definite
matrix with $||\Omega_{vi}|| < C$. Additionally, $\sup_{i=1,2,...,N} C_{0i} < \infty$. For $0 < \alpha < 1/2$, the threshold effect
$\delta_i^0$ satisfies $\delta_i^0 = C_{0i} T^{-\alpha}$ for $i = 1, 2, ..., N$, and therefore, by Assumption A.3, $\delta^0 = C_0 T^{-\alpha}$.

Assumption A.10 is called the “shrinking threshold” assumption for the threshold parameter,
introduced in the threshold literature in Hansen (1999) and in the structural breaks literature in
Bai (1997). This assumption is similar to the idea of local to zero approximations in hypothesis
testing, and it is used to derive a pivotal asymptotic distribution of γ̂ in our model, such that the
critical values can be taken from a table. Without this assumption, the asymptotic distribution is
ridden with nuisance parameters arising from the underlying error distribution (see Chan (1993)).
However, the assumption implies that the derived asymptotic distribution is a better approximation
of the sampling distribution when δi0 is small. As will be shown below however, when δi0 is large,
it is also easier to estimate.

6
Hansen (2000) suggests an identification condition such as Z̃i (γ)′ Z̃i (γ 0 ) = Z̃i (γ 0 )′ Z̃i (γ 0 ) for γ > γ 0 ; yet this
equality does not hold in the presence of interactive fixed effects. Instead, we rely on the more complex identification
condition proposed by Miao et al. (2020a).
Hansen (1999) and Miao et al. (2020a) assume that $\delta_i^0 = O[(NT)^{-\alpha}]$ as opposed to $\delta_i^0 = O(T^{-\alpha})$
as in part (ii) of our Assumption A.10. We use this alternative because of the heterogeneous
coefficients in (1). For a given γ, the estimator for the heterogeneous slope δi only depends on each
unit’s information.
Additionally, we require the usual condition on higher conditional moments. Let $\gamma^1, \gamma^2 \in \Gamma$,
the parameter space, and define $W_i(\gamma^1, \gamma^2) = W_i(\gamma^1) - W_i(\gamma^2)$. Let
$$M_D(\gamma) = (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} E_D\left(C_{0i}' x_{i,t} x_{i,t}' C_{0i} \,|\, q_{i,t} = \gamma\right) f_{i,t,D}(\gamma).$$
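Assumption A.11 (Higher Order Moments). i) There exist $D$-dependent variables $C^D_{i,t}$ such that
$\sup_{\gamma \in \Gamma} E_D(||x_{i,t}||^4 \,|\, q_{i,t} = \gamma) \le C^D_{i,t}$, $\sup_{\gamma \in \Gamma} E_D(||x_{i,t} e_{i,t}||^4 \,|\, q_{i,t} = \gamma) \le C^D_{i,t}$, and
$P\left((NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} C^D_{i,t} \le C\right) = 1$ as $(N, T) \to \infty$.
ii) $M_D(\gamma)$ is continuous at $\gamma = \gamma^0$.
iii) For all $\epsilon > 0$, there exist constants $B > 0$ and $C_1 > 0$ such that, for large enough $N, T$, we have:
$$P\left( \inf_{|\gamma - \gamma^0| < B} \frac{M_D(\gamma)}{|\gamma - \gamma^0|} > C_1 \right) > 1 - \epsilon. \qquad (20)$$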

Assumption A.11 i) assumes that the fourth-order conditional moments of xi,t and xi,t ei,t exist
and are bounded. Assumption A.11 ii) and iii) are similar to Assumption A.5 of Miao et al. (2020a),
and are made to ensure that the square matrix, MD (γ), is well-behaved in the neighbourhood of
γ0.

Theorem 2. Under Assumptions A.1-A.11, if $T^{\alpha}/\sqrt{N} \to 0$ as $(N, T) \to \infty$, then
$N T^{1-2\alpha}(\hat{\gamma} - \gamma^0) = O_p(1)$.

Theorem 2 shows that the rate of convergence of γ̂ is $N T^{1-2\alpha}$, thus depending on the magnitude
of threshold effects. A smaller α implies that the threshold effect is larger and in turn the convergence
rate is faster. On the other hand, if α is close to 1/2, the rate of convergence becomes slower since
the magnitude of the threshold is smaller. A fixed-magnitude threshold effect is equivalent to
α → 0 and generates $\hat{\gamma} - \gamma^0 = O_p[(NT)^{-1}]$. However, under this rate of convergence, Chan (1993)
finds that the threshold estimator converges in distribution to a functional of a compound Poisson
process with unknown nuisance parameters. This distribution can alternatively be simulated by
a numerical approach as in Li and Ling (2012). The CCE literature typically requires $T/N \to 0$
for pooled estimators (Pesaran, 2006). If this holds, then we have $T^{\alpha}/\sqrt{N} \to 0$ immediately,
because α < 1/2 implies $T^{2\alpha}/N \le T/N$. Therefore, the relative rate of divergence between T and N
in Theorem 2 above is weaker than what is necessary elsewhere.
Below, we derive the asymptotic distributions of θ̃i , the MG estimator θ̃M G , and γ̂.


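Theorem 3. Under Assumptions A.1-A.11, $T^{\alpha}/\sqrt{N} \to 0$, as $(N, T) \to \infty$, we have:

i) $\sqrt{T}(\tilde{\theta}_i - \hat{\theta}_i(\gamma^0)) \xrightarrow{p} 0$.

ii) If we further require that $T/N \to 0$, and conditional on $D$, then the limiting distribution
is $\sqrt{T}(\tilde{\theta}_i - \theta_i^0) \xrightarrow{d} N(0, V_{\theta_i})$, where $V_{\theta_i}$ is the standard asymptotic covariance matrix defined
as $V_{\theta_i} = \Sigma_i^{-1} S_{ie} \Sigma_i^{-1}$, $\Sigma_i = \mathrm{plim}_{T \to \infty} T^{-1} E_D(Z_i(\gamma^0)' M_F Z_i(\gamma^0))$, $\Sigma_{\varepsilon_i} = T^{-1} E(\varepsilon_i \varepsilon_i')$, and
$S_{ie} = \mathrm{plim}_{T \to \infty} T^{-1} E_D(Z_i(\gamma^0)' M_F \Sigma_{\varepsilon_i} M_F Z_i(\gamma^0))$.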

Theorem 3 shows that the difference between $\tilde{\theta}_i$ and $\hat{\theta}_i(\gamma^0)$ is $o_p(1/\sqrt{T})$, which implies that the
distribution of $\tilde{\theta}_i$ can be approximated by the conventional normal distribution as if γ is known
with certainty.
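Inference on $\theta_i^0$ requires a consistent estimator of $V_{\theta_i}$. Such an estimator is given by
$\hat{V}_{\theta_i} = \hat{\Sigma}_i^{-1} \hat{S}_{i\varepsilon} \hat{\Sigma}_i^{-1}$, where:

$$\hat{\Sigma}_i = \frac{1}{T} \tilde{Z}_i(\hat{\gamma})' \tilde{Z}_i(\hat{\gamma}), \quad \hat{S}_{i\varepsilon} = \frac{1}{T} \tilde{Z}_i(\hat{\gamma})' \mathrm{diag}(\hat{\varepsilon}_i \hat{\varepsilon}_i') \tilde{Z}_i(\hat{\gamma}), \quad \hat{\varepsilon}_i = \hat{\tilde{e}}_i = \tilde{y}_i - \tilde{Z}_i(\hat{\gamma}) \hat{\theta}_i(\hat{\gamma}). \qquad (21)$$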

Theorem 3 is useful for testing hypotheses on individual units’ parameters. As an example, consider
testing the statistical significance of the regression coefficients: $H_{0i}: \theta_i^{0k} = 0$ vs. $H_{1i}: \theta_i^{0k} \neq 0$,
for some $i = 1, ..., N$, where $\theta_i^{0k}$ denotes the kth element of $\theta_i^0$.
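The sandwich form of (21) and the resulting t-statistic can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `unit_sandwich_variance` and `t_statistic` are hypothetical, and the inputs `Z_tilde` and `y_tilde` are assumed to be the already defactored data for one unit evaluated at the estimated threshold.

```python
import numpy as np

def unit_sandwich_variance(Z_tilde, y_tilde):
    """Least squares fit and sandwich variance for one unit, in the form of (21).

    Z_tilde: (T, p) defactored regressors at the estimated threshold (assumed given).
    y_tilde: (T,) defactored dependent variable (assumed given).
    """
    T = Z_tilde.shape[0]
    theta_hat, *_ = np.linalg.lstsq(Z_tilde, y_tilde, rcond=None)
    resid = y_tilde - Z_tilde @ theta_hat
    Sigma_hat = Z_tilde.T @ Z_tilde / T
    # diag(resid resid') only keeps the squared residuals on the diagonal
    S_hat = (Z_tilde * resid[:, None] ** 2).T @ Z_tilde / T
    Sigma_inv = np.linalg.inv(Sigma_hat)
    return theta_hat, Sigma_inv @ S_hat @ Sigma_inv

def t_statistic(theta_hat, V_hat, k, T):
    """t-ratio for H0: k-th coefficient equals zero, using the sqrt(T) normalisation."""
    return theta_hat[k] / np.sqrt(V_hat[k, k] / T)
```

Under the null the t-ratio would be compared with standard normal critical values, in line with Theorem 3 ii).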
We now turn to the asymptotic distribution of the MG estimator (13).

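Theorem 4. Under Assumptions A.1-A.11, as $(N, T) \to \infty$ and $T^{\alpha}/\sqrt{N} \to 0$, we have:

$$\begin{bmatrix} \sqrt{N} I_K & 0_{K \times r} \\ 0_{r \times K} & \sqrt{N} T^{\alpha} I_r \end{bmatrix} (\tilde{\theta} - \theta^0) \xrightarrow{d} N(0, \Sigma_{MG}), \qquad (22)$$

where $\Sigma_{MG} = \Omega_v$. An estimator for the variance is:

$$\hat{\Sigma}_{MG} = \begin{bmatrix} I_K & 0_{K \times r} \\ 0_{r \times K} & T^{2\alpha} I_r \end{bmatrix} \frac{1}{N-1} \sum_{i=1}^{N} (\tilde{\theta}_i - \tilde{\theta})(\tilde{\theta}_i - \tilde{\theta})'.$$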

Theorem 4 derives the rate of convergence and the asymptotic distribution of the MG estimator.
The theorem demonstrates that slope coefficient heterogeneity has unique implications for the
asymptotic theory, as δ̃ has a rate of convergence that is much faster than that of the MG estimator
in linear regression, see e.g., Juodis et al. (2021). This arises due to the shrinking threshold
assumption A.10:
$$\delta_i^0 = \frac{C_{0i}}{T^{\alpha}} = \frac{C_0}{T^{\alpha}} + \frac{C_{vi}}{T^{\alpha}}, \qquad (23)$$
which shows that the error term which drives unit heterogeneity, $C_{vi}/T^{\alpha}$, is $O(T^{-\alpha})$. This implies
that in the limit, heterogeneity vanishes and we have a homogeneous model. The closer the $\delta_i^0$ are
to zero, the closer they are to each other. In other words, they become more homogeneous. This is
reflected in the estimator rate of convergence, which increases to reach the standard $\sqrt{NT}$ found
in pooled estimators of homogeneous panel models, see e.g. Pesaran (2006). If the threshold is
large, this can be captured by α → 0 so that $\sqrt{N}(\tilde{\theta} - \theta^0) \xrightarrow{d} N(0, \Sigma_{MG})$, which is the same result
as Pesaran (2006). If the threshold is small, however, then we are closer to the homogeneous case
where α → 1/2 so that $\sqrt{NT}(\tilde{\theta} - \theta^0) \xrightarrow{d} N(0, \Sigma_{MG})$.
The result of Theorem 4 cannot be used directly because it contains the unknown parameter α. Dividing
(22) by $T^{\alpha}$ gives the following usable corollary.

Corollary 1. Under the assumptions of Theorem 4, $\sqrt{N}(\tilde{\theta} - \theta^0) \xrightarrow{d} N(0, \Sigma_{MG})$, and a consistent
estimator for $\Sigma_{MG}$ is given by $\hat{\Sigma}_{MG} = (N-1)^{-1} \sum_{i=1}^{N} (\tilde{\theta}_i - \tilde{\theta})(\tilde{\theta}_i - \tilde{\theta})'$.
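As an illustration of Corollary 1, the following Python sketch computes the MG estimate and the nonparametric variance estimator from a stack of unit-level estimates. The function name `mean_group` and the array layout (one row per unit) are assumptions made for this example only.

```python
import numpy as np

def mean_group(theta_units):
    """Mean group estimate, its variance estimator, and standard errors.

    theta_units: (N, p) array whose i-th row is the unit-level estimate for unit i.
    """
    theta_units = np.asarray(theta_units, dtype=float)
    N = theta_units.shape[0]
    theta_mg = theta_units.mean(axis=0)
    dev = theta_units - theta_mg
    Sigma_mg = dev.T @ dev / (N - 1)          # (N-1)^{-1} sum of outer products
    se = np.sqrt(np.diag(Sigma_mg) / N)       # from the sqrt(N) normalisation
    return theta_mg, Sigma_mg, se
```

The standard errors divide by N because Corollary 1 is stated for the $\sqrt{N}$-scaled estimator.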

Below, we derive the asymptotic distribution of the threshold parameter estimator. We will
need the following two assumptions. Let fi,t (.) denote the pdf of qi,t .

Assumption A.12 (No jumps in threshold variable). For all i, if s > t, fi,s|t (γ 0 |γ 0 ) < ∞.

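Assumption A.13 (Well-behaved limits). Define:

$$D_{NT}(\gamma) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} E\left[(C_{0i}' x_{i,t})^2 \,|\, q_{i,t} = \gamma\right] f_{i,t}(\gamma), \qquad (24)$$

$$V_{NT}(\gamma) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} E\left[(C_{0i}' x_{i,t})^2 \varepsilon_{i,t}^2 \,|\, q_{i,t} = \gamma\right] f_{i,t}(\gamma), \qquad (25)$$

and let $D(\gamma) = \mathrm{plim}_{(N,T) \to \infty} D_{NT}(\gamma)$ and $V(\gamma) = \mathrm{plim}_{(N,T) \to \infty} V_{NT}(\gamma)$.
i) The limits $D(\gamma)$ and $V(\gamma)$ exist and are continuous at $\gamma = \gamma^0$.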
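ii) There is a constant $C^{\star}$ such that:

$$Var\left[ \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T} E_D\left(||g^{(1)}_{i,t}(\gamma^1, \gamma^2)||^2\right) \right] \le C^{\star} \sup_{i,t} E\left(||g^{(1)}_{i,t}(\gamma^1, \gamma^2)||^4\right), \qquad (26)$$

$$Var\left[ \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T} E_D\left(||g^{(2)}_{i,t}(\gamma^1, \gamma^2)||^2\right) \right] \le C^{\star} \sup_{i,t} E\left(||g^{(2)}_{i,t}(\gamma^1, \gamma^2)||^4\right), \qquad (27)$$

where $g^{(1)}_{i,t}(\gamma^1, \gamma^2) = x_{i,t} |d_{i,t}(\gamma^1) - d_{i,t}(\gamma^2)|$, $g^{(2)}_{i,t}(\gamma^1, \gamma^2) = x_{i,t} \varepsilon_{i,t} |d_{i,t}(\gamma^1) - d_{i,t}(\gamma^2)|$ and
$d_{i,t}(\gamma) = I\{q^{\star}_{i,t} \le \gamma\}$.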

Assumption A.12 excludes the possibility that, for some unit i, qi,t = γ 0 holds for all t = 1, 2, ..., T ,
and is similar to Assumption 8 of Hansen (1999). Assumption A.13 provides conditional moment
boundedness for all γ ∈ Γ.
Theorem 5. Under Assumptions A.1-A.13, if $T^{\alpha}/\sqrt{N} \to 0$ and $\sqrt{T}/N \to 0$ as $(N, T) \to \infty$,
then:
$$N T^{1-2\alpha}(\hat{\gamma} - \gamma^0) \xrightarrow{d} \phi \zeta, \qquad (28)$$

where $\phi = V(\gamma^0)/D(\gamma^0)^2$, $\zeta = \arg\max_{-\infty < r < \infty} \{(-1/2)|r| + W(r)\}$, and $W(r)$ is a two-sided
standard Brownian motion on the real line.
Theorem 5 gives the asymptotic distribution of the threshold estimate γ̂. The threshold pa-
rameter’s asymptotic distribution is the same as in Hansen (2000). In the presence of conditional
homoskedasticity in εi,t , ϕ can be further simplified as σ 2 [D(γ 0 )]−1 . Hansen (2000) suggests that
the asymptotic distribution of Theorem 5 should not be used for the construction of confidence
intervals on γ, because the nuisance parameter ϕ is hard to estimate accurately.
We now propose a likelihood ratio test for the null hypothesis H0 : γ = γ 0 . This test can be
used to get confidence intervals for γ 0 , instead of the result in Theorem 5. Let

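$$LR(\gamma) = \frac{RSS(\hat{\theta}_1(\gamma), ..., \hat{\theta}_N(\gamma), \gamma) - RSS(\hat{\theta}_1(\hat{\gamma}), ..., \hat{\theta}_N(\hat{\gamma}), \hat{\gamma})}{\hat{\sigma}^2},$$

where $RSS(\hat{\theta}_1(\gamma), ..., \hat{\theta}_N(\gamma), \gamma) = \sum_{i=1}^{N} (\tilde{y}_i - \tilde{Z}_i(\gamma)\hat{\theta}_i(\gamma))' (\tilde{y}_i - \tilde{Z}_i(\gamma)\hat{\theta}_i(\gamma))$, and the error variance
estimator is given by $\hat{\sigma}^2 = (NT)^{-1} RSS(\tilde{\theta}_1, ..., \tilde{\theta}_N, \hat{\gamma})$.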
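Theorem 6. Under $H_0: \gamma = \gamma^0$, Assumptions A.1-A.13, $T^{\alpha}/\sqrt{N} \to 0$, and $\sqrt{T}/N \to 0$, as
$(N, T) \to \infty$,
$$LR(\gamma^0) \xrightarrow{d} \eta^2 \Xi,$$
where $\eta^2 = V(\gamma^0)/(\sigma^2 D(\gamma^0))$ and $\sigma^2 = \mathrm{plim}_{(N,T) \to \infty} (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \varepsilon_{i,t}^2$. The random variable
$\Xi = \sup_{s \in \mathbb{R}} [2W(s) - |s|]$ has a distribution function given by $P(\Xi \le x) = (1 - \exp(-x/2))^2$.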

In the special case where the error εi,t is homoskedastic along both the cross-section and time
dimensions, η 2 = 1 and inference can be made based on readily available critical values. The
distribution function of Ξ has the inverse:

$$c(a) = -2\log\left(1 - \sqrt{1 - a}\right), \qquad (29)$$

where c(a) is the critical value and a is the significance level. A test of H0 rejects at the asymptotic
level a if LR(γ 0 ) exceeds c(a). The critical values appear in Table 1 of Hansen (2000) and are:
for a = 0.1 it is 5.94, for a = 0.05 it is 7.35 and for a = 0.01 it is 10.59. In terms of forming an
asymptotic confidence interval for γ, the non-rejection region of confidence level 1 − a is the set
of values of γ such that LR(γ) ≤ c(a). The confidence interval can be found by plotting LR(γ)
against γ and drawing a flat line at c(a).
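The critical values and the inversion of the LR statistic described above can be reproduced with a short Python sketch. The function names and the grid-based representation of LR(γ) are assumptions for illustration; the formula for the critical value follows from solving $(1 - \exp(-x/2))^2 = 1 - a$ for x.

```python
import numpy as np

def lr_critical_value(a):
    """Critical value c(a) solving (1 - exp(-c/2))^2 = 1 - a."""
    return -2.0 * np.log(1.0 - np.sqrt(1.0 - a))

def lr_confidence_interval(gamma_grid, lr_values, a=0.05):
    """Smallest and largest grid points gamma with LR(gamma) <= c(a).

    gamma_grid, lr_values: arrays of equal length (assumed precomputed on a grid).
    The accepted set is non-empty whenever the grid contains the estimate,
    because LR equals zero there.
    """
    gamma_grid = np.asarray(gamma_grid, dtype=float)
    lr_values = np.asarray(lr_values, dtype=float)
    accepted = gamma_grid[lr_values <= lr_critical_value(a)]
    return accepted.min(), accepted.max()
```

Reporting the minimum and maximum accepted grid points gives the convex hull of the non-rejection region, which corresponds to reading the interval off the plot of LR(γ) against the flat line at c(a).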
For homoskedastic errors the above Theorem 6 is asymptotically correct under the shrinking
threshold assumption δi0 → 0 for all i = 1, 2, ..., N . However, if the errors are additionally normal,
i.i.d. across both dimensions and independent of the regressors and the threshold variable, we
conjecture that the result of Theorem 3 of Hansen (2000) holds, which says that inference based
on the LR test is asymptotically valid even if δi0 does not shrink towards 0.
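If the errors are not homoskedastic, however, $\eta^2$ needs to be estimated consistently. This can
be done by noticing that

$$\eta^2 = \frac{\mathrm{plim}_{(N,T) \to \infty} \sum_{i=1}^{N} \sum_{t=1}^{T} E\left[(\delta_i^{0\prime} x_{i,t} \varepsilon_{i,t})^2 \,|\, q_{i,t} = \gamma^0\right] f_t(\gamma^0)}{\sigma^2 \, \mathrm{plim}_{(N,T) \to \infty} \sum_{i=1}^{N} \sum_{t=1}^{T} E\left[(\delta_i^{0\prime} x_{i,t})^2 \,|\, q_{i,t} = \gamma^0\right] f_t(\gamma^0)}, \qquad (30)$$

which can be estimated nonparametrically by

$$\hat{\eta}^2 = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} K_h(\hat{\gamma} - q_{i,t}) (\tilde{\delta}_i' x_{i,t} \hat{\varepsilon}_{i,t})^2}{\hat{\sigma}^2 \sum_{i=1}^{N} \sum_{t=1}^{T} K_h(\hat{\gamma} - q_{i,t}) (\tilde{\delta}_i' x_{i,t})^2}, \qquad (31)$$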

where σ̂ 2 = (N T )−1 RSS(θ̃1 , ..., θ̃N , γ̂), Kh (u) = h−1 K(u/h), K(.) is a kernel function, and h → 0
is the bandwidth parameter. It is straightforward to show consistency of the above estimator as
in Hansen (2000) and Miao et al. (2020a). Given the estimator η̂ 2 , a normalized LR statistic is
defined as N LR(γ 0 ) = LR(γ 0 )/η̂ 2 and its critical values are the same as in the homoskedastic case
studied above.
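A minimal Python sketch of the kernel estimator (31) is given below. The Gaussian kernel, the function names, and the array layout are assumptions for this example; the paper leaves the kernel K(.) and the bandwidth h unspecified.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as one possible choice of K(.)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def eta2_hat(q, dx, resid, gamma_hat, h, sigma2_hat):
    """Kernel estimate of eta^2 in the form of (31).

    q:      (N, T) threshold variable.
    dx:     (N, T) array with entries delta_i' x_{i,t} (assumed precomputed).
    resid:  (N, T) residuals.
    h:      bandwidth (user-chosen).
    """
    w = gaussian_kernel((gamma_hat - q) / h) / h   # K_h(gamma_hat - q_it)
    num = np.sum(w * (dx * resid) ** 2)
    den = sigma2_hat * np.sum(w * dx ** 2)
    return num / den

def normalised_lr(lr_value, eta2):
    """Normalised LR statistic: LR divided by the estimate of eta^2."""
    return lr_value / eta2
```

When the squared residuals are constant and equal to the variance estimate, the ratio collapses to one, which matches the homoskedastic case discussed in the text.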

6. Testing the threshold effect

Testing the null hypothesis of no threshold regression is challenging for two reasons. First,
under H0 : δi0 = 0, for i = 1, ..., N , the parameter γ 0 disappears and thus can only be identified
under the alternative. This is a nonstandard hypothesis testing problem which has been studied by
Davies (1977) and further developed by Hansen (1996), among others. To deal with this problem
we will follow the previous literature and test the null hypothesis based on a supremum-type Wald
statistic whose limiting distribution is non-standard but can be estimated by the bootstrap.
The second challenge, which is new to the threshold literature, is that the null hypothesis above
is equivalent to testing N individual hypotheses, where N goes to infinity. This is a multiple testing
problem that can lead to low “power”, where power here has the more loose interpretation of “not
the null”. To avoid the multiple testing issue altogether, we employ an approach recently put
forward in Juodis et al. (2021), which notices that under the null hypothesis, the model becomes
homogeneous in δi0 because δi0 = δ 0 = 0 for all i. For homogeneous threshold coefficients, the data
generating process becomes

yi = Xi βi + Wi (γ)δ + ei . (32)

For a given γ, $\delta^0$ can be estimated by the pooled estimator

$$\hat{\delta}_p(\gamma) = \left( \sum_{i=1}^{N} W_i(\gamma)' M_{Z_i^{\star}(\gamma)} W_i(\gamma) \right)^{-1} \left( \sum_{i=1}^{N} W_i(\gamma)' M_{Z_i^{\star}(\gamma)} y_i \right), \qquad (33)$$

where $M_{Z_i^{\star}(\gamma)} = I - Z_i^{\star}(\gamma)(Z_i^{\star}(\gamma)' Z_i^{\star}(\gamma))^{-1} Z_i^{\star}(\gamma)'$, $Z_i^{\star}(\gamma) = (\bar{X}, \bar{W}(\gamma), X_i)$, and $\bar{W}(\gamma) = N^{-1} \sum_{i=1}^{N} W_i(\gamma)$.
The estimator $\hat{\delta}_p(\gamma)$ is a pooled estimator with a faster $\sqrt{NT}$ rate of convergence. It also differs
from $\hat{\delta}(\gamma)$ in terms of the annihilator matrix. In $M_{Z_i^{\star}(\gamma)}$, $\bar{W}(\gamma)$ is included to remove asymptotic
bias from the interactive effects following Karavias et al. (2022), and $X_i$ is included to project out
the variables with the heterogeneous coefficients as in Juodis et al. (2021).
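The pooled estimator (33) can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names and the array layout are assumptions, `W` is taken to be already evaluated at a given γ, and a pseudo-inverse is used in the annihilator to guard against collinear columns in $Z_i^{\star}(\gamma)$.

```python
import numpy as np

def annihilator(Z):
    """M_Z = I - Z (Z'Z)^+ Z'; the pseudo-inverse tolerates collinear columns."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.pinv(Z.T @ Z) @ Z.T

def pooled_delta(y, X, W):
    """Pooled estimate of the threshold coefficient in the form of (33).

    y: (N, T) dependent variable.
    X: (N, T, K) regressors with heterogeneous slopes.
    W: (N, T, r) threshold regressors evaluated at a fixed gamma (assumed given).
    """
    N, r = y.shape[0], W.shape[2]
    X_bar, W_bar = X.mean(axis=0), W.mean(axis=0)   # cross-sectional averages
    A, b = np.zeros((r, r)), np.zeros(r)
    for i in range(N):
        Z_star = np.hstack([X_bar, W_bar, X[i]])    # Z_i*(gamma)
        M = annihilator(Z_star)
        A += W[i].T @ M @ W[i]
        b += W[i].T @ M @ y[i]
    return np.linalg.solve(A, b)
```

Because $X_i$ enters $Z_i^{\star}(\gamma)$, the unit-specific term $X_i \beta_i$ is projected out exactly, so in noise-free data with a common δ the function recovers δ.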

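The supremum Wald statistic for the null hypothesis is:

$$\sup W = \sup_{\gamma \in \Gamma} W_{NT}(\gamma), \qquad (34)$$

$$W_{NT}(\gamma) = NT \, \hat{\delta}_p(\gamma)' \hat{V}_{\delta_p}(\gamma)^{-1} \hat{\delta}_p(\gamma), \qquad (35)$$

where $\hat{V}_{\delta_p}(\gamma) = \hat{\Sigma}^{-1}(\gamma, \gamma) \hat{K}(\gamma, \gamma) \hat{\Sigma}^{-1}(\gamma, \gamma)$, $\hat{\Sigma}(\gamma, \gamma) = \frac{1}{NT} \sum_{i=1}^{N} \tilde{W}_i(\gamma)' \tilde{W}_i(\gamma)$, and where
$\hat{K}(\gamma, \gamma) = \frac{1}{NT} \sum_{i=1}^{N} \tilde{W}_i(\gamma)' \mathrm{diag}(\hat{\varepsilon}_i(\gamma) \hat{\varepsilon}_i(\gamma)') \tilde{W}_i(\gamma)$. In the above, $\tilde{y}_i(\gamma) = M_{Z_i^{\star}(\gamma)} y_i$,
$\tilde{W}_i(\gamma) = M_{Z_i^{\star}(\gamma)} W_i(\gamma)$, and $\hat{\varepsilon}_i(\gamma) = \hat{\tilde{e}}_i(\gamma) = \tilde{y}_i(\gamma) - \tilde{W}_i(\gamma) \hat{\delta}_p(\gamma)$. The variance estimator enters (35) through
its inverse, consistent with the limiting form $\bar{S}(\gamma)' \bar{K}(\gamma, \gamma)^{-1} \bar{S}(\gamma)$ in Theorem 7.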
The implementation of supW requires an approximation of Γ like the one used in estimating γ̂
above. To derive the limiting distribution of supW we will need an additional assumption. Define:

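$$S_{NT}(\gamma) = \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} W_i(\gamma)' M_F \varepsilon_i, \quad \text{and} \quad \Sigma(\gamma^1, \gamma^2) = \mathrm{plim}_{(N,T) \to \infty} \frac{1}{NT} \sum_{i=1}^{N} E(W_i(\gamma^1)' M_F W_i(\gamma^2) | D).$$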

Assumption A.14 is a high-level assumption. It is straightforward to prove its claims from more
primitive conditions following Hansen (1996) or Lemma A.9 in the appendix of Miao et al. (2020a).

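Assumption A.14. $S_{NT}(\gamma) \Rightarrow S(\gamma)$, where $S(\gamma)$ is a mean-zero Gaussian process with covariance
kernel $K(\gamma^1, \gamma^2) = \mathrm{plim}_{(N,T) \to \infty} (NT)^{-1} \sum_{i=1}^{N} E(W_i(\gamma^1)' M_F \Sigma_{\varepsilon_i} M_F W_i(\gamma^2) | D)$, where
$\Sigma_{\varepsilon_i} = \mathrm{plim}_{(N,T) \to \infty} E(\varepsilon_i \varepsilon_i')$.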

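Theorem 7. Suppose that Assumptions A.1-A.8 and A.14 hold, as $(N, T) \to \infty$ and $T/N \to 0$,
under $H_0: \delta_i^0 = 0$, for $i = 1, ..., N$, we have:

$$\sup W \xrightarrow{d} \sup_{\gamma \in \Gamma} W^c_{NT}(\gamma), \qquad (36)$$

where $W^c_{NT}(\gamma) = \bar{S}(\gamma)' \bar{K}(\gamma, \gamma)^{-1} \bar{S}(\gamma)$, and $\bar{S}(\gamma) = \Sigma(\gamma, \gamma)^{-1} S(\gamma)$ is a mean-zero Gaussian process
with covariance kernel $\bar{K}(\gamma^1, \gamma^2) = \Sigma(\gamma^1, \gamma^1)^{-1} K(\gamma^1, \gamma^2) \Sigma(\gamma^2, \gamma^2)^{-1}$.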

The limiting distribution under the null depends on nuisance parameters. Following Hansen
(1996), Chudik et al. (2017) and Giannerini et al. (2023) we find the critical values using the
bootstrap. The steps for the bootstrap and Monte Carlo simulations evaluating its performance
can be found in the Appendix.

7. Empirical Application

As an empirical application of the above theory, we revisit one of the key puzzles in international
economics, the Feldstein-Horioka puzzle (Feldstein and Horioka, 1980). In theory, perfect capital
mobility should allow savings from one country to be invested in other countries, where better
investment opportunities with higher returns are available. Feldstein and Horioka (1980), however,
find that this is not the case, and that domestic investments are highly correlated with domestic
savings. Ever since, this has become one of the six major puzzles of international macroeconomics
(Obstfeld and Rogoff, 2000).
Feldstein and Horioka (1980) used cross-sectional data to estimate the relationship:

$$\frac{I}{Y} = a + \beta \frac{S}{Y}, \qquad (37)$$

where Y is national income, I is domestic investment, and S is domestic savings. They find that β
is closer to 1 than to 0, meaning that there is a strong relationship between domestic saving rates
and domestic investment.
The closest contribution to ours is that of Hacıoğlu Hoke and Kapetanios (2021), who examine
this problem for a panel of OECD countries by introducing a nonlinear relationship between
investment and savings, based on the trade openness of the country. The idea is that higher trade
openness would imply fewer frictions and therefore higher capital mobility. This would in turn
weaken the relationship between domestic savings and investment, given that more investment op-
portunities are made available worldwide. The nonlinearity is modelled via a smooth transition
model based on the logistic function.
Our analysis is different from that in Hacıoğlu Hoke and Kapetanios (2021) in that we employ
the discontinuous and heterogeneous threshold model developed previously. The discontinuous
threshold model has several advantages over smooth transition models. First, it has fewer parame-
ters to estimate and there is no need to select the continuous transition function. Second, it is more
intuitive to interpret as it implies that there are only two regimes, below and above the threshold,
while smooth transition models can be viewed as models with a wide continuum of states between
two extreme regimes. Finally, the threshold parameter estimator in the discontinuous model is
super-consistent and thus has a faster rate of convergence when compared to the smooth transition
threshold parameter. The model we consider is the following:

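$$\text{Investment}_{i,t} = \beta_{1i} \text{Savings}_{i,t} + \beta_{2i} \text{Trade Openness}_{i,t} + \delta_i \text{Savings}_{i,t} I\{p_i(\text{Trade Openness}_{i,t}) \ge \gamma\} + \alpha_i + \lambda_i' f_t + \varepsilon_{i,t}, \qquad (38)$$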

where Investmenti,t is the investment share of real GDP per capita for country i at year t, Savingsi,t
is the percentage share of current savings to GDP per capita for country i at year t, Trade Opennessi,t
denotes the trade openness for country i at year t, pi (·) is the percentile function of the distribution
of Trade Opennessi,t across t = 1, ..., T for country i, αi are the fixed effects, λ′i ft are the interactive
fixed effects, and εi,t are the innovations. The data are taken from the Penn World Tables version
7.1 as in Hacıoğlu Hoke and Kapetanios (2021) and cover the period 1951-2000, resulting in a
balanced panel with N = 45 and T = 50.
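To make the role of the unit-specific percentile function in (38) concrete, the following Python sketch builds the threshold regressor from a panel of trade openness values. It is an illustration under stated assumptions: the function names are hypothetical, and the percentile is taken to be the within-country rank divided by T, which may differ in detail from the empirical quantile transformation defined earlier in the paper.

```python
import numpy as np

def empirical_percentile(q):
    """Within-unit empirical percentile of each observation.

    q: (N, T) array. Returns rank/T in (0, 1], computed row by row
    (ties are broken by position, an arbitrary choice for this sketch).
    """
    ranks = q.argsort(axis=1).argsort(axis=1) + 1
    return ranks / q.shape[1]

def threshold_regressor(savings, openness, gamma):
    """Savings interacted with the high-regime indicator I{p_i(openness) >= gamma}."""
    high_regime = empirical_percentile(openness) >= gamma
    return savings * high_regime
```

Because the percentile is computed separately for each country, a common γ on the percentile scale corresponds to a different level of trade openness in each country.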
The main results are displayed in Table 1. The top panel of the table presents the results of the
test for non-linearity, which rejects the null of no threshold regression. The middle
panel reports the threshold parameter estimate, which is 0.685. This means that for country i, if
trade openness is below the 68.5th percentile of that country’s historical trade openness distribution,
then that country is in the low regime. Beyond that threshold, country i is in the high regime.
The confidence interval is asymmetric because it is created by inverting the normalised LR statistic
as suggested in Hansen (2000). The bottom panel of Table 1 reports the estimates for β̃ and δ̃
which are the MG estimates based on the estimated threshold values. The coefficient of savings
in the low regime is 0.73, which is in line with previous literature. Hacıoğlu Hoke and Kapetanios
(2021) report estimates of β̃ of 0.69 (pooled) and 0.61 (MG) for OECD countries when the cross-
sectional dependence is accounted for. On the other hand, Feldstein and Horioka (1980) reported
the savings retention parameter to be 0.89, while Obstfeld and Rogoff (2000) found a smaller
estimate of 0.60. Notice that the forms of non-linearity are modelled in different ways in the above
mentioned papers. In the high regime, the savings retention parameter is reduced by 0.085 to
0.73 − 0.085 = 0.645, showing that as trade openness increases international capital mobility also
increases. The standalone trade openness regressor is included to avoid potential omitted variable
bias. However, removing it does not alter the results obtained.

Table 1: Main Results

Test for the presence of threshold effects:
  SupW                                          96.194
  P-value                                       0.000
Threshold estimate:
  Threshold (γ̂)                                0.685
  95% CI                                        [0.685, 0.700]
Coefficient estimates (dependent variable: Investment):
                                                EST         SE      HC
  Savings                                       0.730⋆⋆⋆    0.062   0.867
  Saving(Trade Openness >= Threshold Level)     −0.085⋆⋆⋆   0.024   0.578
Control variables:
  Trade Openness                                0.081⋆      0.049   0.689
Notes: EST denotes the coefficient estimates β̃ and δ̃, SE are the corresponding standard errors, and HC indicates
the proportion of units that have statistically significant βi and δi , as defined in Section 5.3. ⋆, ⋆⋆ and ⋆ ⋆ ⋆
denote statistical significance at the 10%, 5% and 1% levels, respectively.
Table 2 translates the 68.5th percentile threshold into actual trade openness threshold values
for the countries in our sample. Each country has a different distribution of the trade openness
variable and thus the 68.5th percentile translates to a different trade openness threshold. Table 2
presents these thresholds in increasing order. The country with the lowest threshold is the
US. Generally, from the left column of the table, larger economies seem to have lower thresholds,
implying that capital leaves the domestic market more easily. On the contrary, smaller
economies require significant levels of trade openness in order to divert domestic savings toward
investment opportunities abroad.
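The mapping from the common percentile threshold to country-specific levels can be sketched in Python as below. The function name is hypothetical, and `np.quantile` with its default linear interpolation is only one possible convention; the exact empirical quantile rule behind Table 2 is not stated here, so small numerical differences would be expected.

```python
import numpy as np

def unit_threshold_levels(openness, gamma_hat):
    """Country-specific trade openness level at the common percentile gamma_hat.

    openness: (N, T) array, one row per country.
    Returns an (N,) array of thresholds on the original scale of the variable.
    """
    return np.quantile(openness, gamma_hat, axis=1)

# Example usage with two hypothetical countries observed over five years:
# unit_threshold_levels(np.array([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]), 0.5)
```

Sorting the returned values in increasing order would reproduce the layout used in Table 2.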
Figure 1 displays the proportion of countries that are in the high regime. This proportion can
be seen to increase rapidly in the 1980s, and can be explained by the extensive trade liberalisation
policies implemented internationally at the time, see Faini (2004). The results show that higher
levels of trade liberalisation increase international capital mobility.

Table 2: Heterogeneous Threshold Levels

United States       13.005    United Kingdom   32.635    Israel                 58.489
India               13.121    Portugal         33.131    Norway                 58.580
Brazil              13.248    Pakistan         33.492    Philippines            59.025
Japan               15.252    Peru             35.001    Thailand               60.204
Mexico              15.440    New Zealand      35.463    Ireland                64.784
Turkey              18.903    Finland          40.968    Netherlands            69.857
Argentina           19.025    Canada           43.879    Trinidad and Tobago    70.627
Colombia            20.603    Iceland          45.518    Cyprus                 85.200
Congo, Dem. Rep.    20.841    Sweden           48.747    Egypt, Arab Rep.       88.981
Spain               22.683    Bolivia          49.311    Belgium                96.196
Australia           23.453    Denmark          51.214    Sri Lanka             116.871
Uruguay             28.454    Venezuela        51.767    Honduras              126.058
Uganda              29.121    South Africa     55.277    Puerto Rico           129.323
France              29.136    Switzerland      56.129    Panama                174.804
Italy               30.279    Austria          56.526    Luxembourg            200.054
Mean                54.237
Notes: Mean denotes the average of heterogeneous threshold levels in the sample.

8. Conclusion

In this paper, we studied a new panel data threshold regression model with heterogeneous
coefficients and interactive fixed effects. We employed a quantile transformation approach to achieve
pooled estimation of the threshold parameter and threshold heterogeneity across units. We provided
a complete and novel inferential theory for the estimated heterogeneous slope coefficients and the
threshold parameter. We also proposed a test for the presence of the threshold effect. The Monte
Carlo simulations showed that the new theory works well in practice. The new model was used to
examine the Feldstein-Horioka puzzle, where we found that once trade openness is greater than a
threshold, capital mobility increases.
There are still several interesting topics for future research. Interactive fixed effects represent
a great novelty in panel threshold regression and existing methods could be extended along this
direction. Such examples include panel threshold models with endogenous threshold variables as
in Seo and Shin (2016) and Yu et al. (2023), multiple-regime threshold models (Yau et al., 2015),
binary response models (Gao et al., 2023) and quantile models as in Chen (2019), Harding et al.
(2020) and Zhang et al. (2021).

References

Ahn, S.C., Lee, Y.H., Schmidt, P., 2001. GMM estimation of linear panel data models with time-
varying individual effects. Journal of Econometrics 101, 219–255.

Ahn, S.C., Lee, Y.H., Schmidt, P., 2013. Panel data models with multiple time-varying individual
effects. Journal of Econometrics 174, 1–14.

Bai, J., 1997. Estimation of a change point in multiple regression models. Review of Economics
and Statistics 79, 551–563.

Figure 1: Proportion of countries experiencing the threshold effects

Bai, J., 2009. Panel data models with interactive fixed effects. Econometrica 77, 1229–1279.

Bai, J., 2010. Common breaks in means and variances for panel data. Journal of Econometrics
157, 78–92.

Bonhomme, S., Manresa, E., 2015. Grouped patterns of heterogeneity in panel data. Econometrica
83, 1147–1184.

Chan, K.S., 1993. Consistency and limiting distribution of the least squares estimator of a threshold
autoregressive model. The Annals of Statistics 21, 520–533.

Chen, H., Chong, T.T.L., Bai, J., 2012. Theory and applications of TAR model with two threshold
variables. Econometric Reviews 31, 142–170.

Chen, J., Shin, Y., Zheng, C., 2022. Estimation and inference in heterogeneous spatial panels with
a multifactor error structure. Journal of Econometrics 229, 55–79.

Chen, L., 2019. Two-step estimation of quantile panel data models with interactive fixed effects.
Econometric Theory , 1–28.

Chudik, A., Mohaddes, K., Pesaran, M.H., Raissi, M., 2017. Is there a debt-threshold effect on
output growth? Review of Economics and Statistics 99, 135–150.

Chudik, A., Pesaran, M.H., 2013. Large panel data models with cross-sectional dependence: a
survey. CAFE Research Paper .

Davies, R.B., 1977. Hypothesis testing when a nuisance parameter is present only under the
alternative. Biometrika 64, 247–254.

Faini, R., 2004. Trade liberalization in a globalizing world. IZA Discussion Papers No. 1406.

Feldstein, M., Horioka, C., 1980. Domestic saving and international capital flows. The Economic
Journal 90, 314–329.

Fernández-Val, I., Lee, J., 2013. Panel data models with nonadditive unobserved heterogeneity:
Estimation and inference. Quantitative Economics 4, 453–481.

Gao, J., Liu, F., Peng, B., Yan, Y., 2023. Binary response models for heterogeneous panel data
with interactive fixed effects. Journal of Econometrics .

Gao, J., Xia, K., Zhu, H., 2020. Heterogeneous panel data models with cross-sectional dependence.
Journal of Econometrics 219, 329–353.

Giannerini, S., Goracci, G., Rahbek, A., 2023. The validity of bootstrap testing for threshold
autoregression. Journal of Econometrics .

Giglio, S., Xiu, D., 2021. Asset pricing with omitted factors. Journal of Political Economy 129,
1947–1990.

Hacıoğlu Hoke, S., Kapetanios, G., 2021. Common correlated effect cross-sectional dependence
corrections for nonlinear conditional mean panel models. Journal of Applied Econometrics 36,
125–150.

Hansen, B.E., 1996. Inference when a nuisance parameter is not identified under the null hypothesis.
Econometrica 64, 413–430.

Hansen, B.E., 1999. Threshold effects in non-dynamic panels: Estimation, testing, and inference.
Journal of Econometrics 93, 345–368.

Hansen, B.E., 2000. Sample splitting and threshold estimation. Econometrica 68, 575–603.

Harding, M., Lamarche, C., Pesaran, M.H., 2020. Common correlated effects estimation of hetero-
geneous dynamic panel quantile regression models. Journal of Applied Econometrics 35, 294–314.

Holtz-Eakin, D., Newey, W., Rosen, H.S., 1988. Estimating vector autoregressions with panel data.
Econometrica 56, 1371–1395.

Juodis, A., Karavias, Y., Sarafidis, V., 2021. A homogeneous approach to testing for Granger
non-causality in heterogeneous panels. Empirical Economics 60, 93–112.

Juodis, A., Sarafidis, V., 2022. A linear estimator for factor-augmented fixed-T panels with endogenous
regressors. Journal of Business & Economic Statistics 40, 1–15.

Karavias, Y., Narayan, P.K., Westerlund, J., 2022. Structural breaks in interactive effects panels
and the stock market reaction to COVID-19. Journal of Business & Economic Statistics, 1–14.

Kejriwal, M., Li, X., Totty, E., 2020. Multidimensional skills and the returns to schooling: Evidence
from an interactive fixed-effects approach and a linked survey-administrative data set. Journal
of Applied Econometrics 35, 548–566.

Li, D., Ling, S., 2012. On the least squares estimation of multiple-regime threshold autoregressive
models. Journal of Econometrics 167, 240–253.

Lu, X., Su, L., 2016. Shrinkage estimation of dynamic panel data models with interactive fixed
effects. Journal of Econometrics 190, 148–175.

Massacci, D., 2017. Least squares estimation of large dimensional threshold factor models. Journal
of Econometrics 197, 101–129.

Massacci, D., Sarno, L., Trapani, L., 2021. Factor models with downside risk. Available at SSRN
3937321.

Miao, K., Li, K., Su, L., 2020a. Panel threshold models with interactive fixed effects. Journal of
Econometrics 219, 137–170.

Miao, K., Su, L., Wang, W., 2020b. Panel threshold regressions with latent group structures.
Journal of Econometrics 214, 451–481.

Moon, H.R., Weidner, M., 2018. Nuclear norm regularized estimation of panel regression models.
arXiv preprint arXiv:1810.10987.

Norkutė, M., Sarafidis, V., Yamagata, T., Cui, G., 2021. Instrumental variable estimation of
dynamic linear panel data models with defactored regressors and a multifactor error structure.
Journal of Econometrics 220, 416–446.

Obstfeld, M., Rogoff, K., 2000. The six major puzzles in international macroeconomics: is there a
common cause? NBER Macroeconomics Annual 15, 339–390.

Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with a multifactor
error structure. Econometrica 74, 967–1012.

Pesaran, M.H., Shin, Y., Smith, R.P., 1999. Pooled mean group estimation of dynamic heteroge-
neous panels. Journal of the American Statistical Association 94, 621–634.

Pesaran, M.H., Smith, R., 1995. Estimating long-run relationships from dynamic heterogeneous
panels. Journal of Econometrics 68, 79–113.

Seo, M.H., Shin, Y., 2016. Dynamic panels with threshold effect and endogeneity. Journal of
Econometrics 195, 169–186.

Su, L., Shi, Z., Phillips, P.C., 2016. Identifying latent structures in panel data. Econometrica 84,
2215–2264.

Swamy, P.A., 1970. Efficient inference in a random coefficient regression model. Econometrica 38,
311–323.

Tong, H., 1978. On a threshold model, in: Chen, C.H. (Ed.), Pattern Recognition and Signal
Processing. Sijthoff & Noordhoff, Amsterdam.

Trapani, L., 2021. Inferential theory for heterogeneity and cointegration in large panels. Journal
of Econometrics 220, 474–503.

Tsay, R.S., 1989. Testing and modeling threshold autoregressive processes. Journal of the American
Statistical Association 84, 231–240.

Tsay, R.S., 1998. Testing and modeling multivariate threshold models. Journal of the American
Statistical Association 93, 1188–1202.

Westerlund, J., Kaddoura, Y., 2022. CCE in heterogenous fixed-T panels. The Econometrics Journal
25, 715–738.

Westerlund, J., Urbain, J.P., 2015. Cross-sectional averages versus principal components. Journal
of Econometrics 185, 372–377.

Yau, C.Y., Tang, C.M., Lee, T.C., 2015. Estimation of multiple-regime threshold autoregressive
models with structural breaks. Journal of the American Statistical Association 110, 1175–1186.

Yu, P., Liao, Q., Phillips, P.C.B., 2023. New control function approaches in threshold regression
with endogeneity. Econometric Theory, 1–55.

Zhang, Y., Wang, H.J., Zhu, Z., 2021. Single-index thresholding in quantile regression. Journal of
the American Statistical Association, 1–16.
