Exact Small Sample Theory in The Simultaneous Equations Model
Yale University
Contents
1. Introduction
2. Simple mechanics of distribution theory
2.1. Primitive exact relations and useful inversion formulae
2.2. Approach via sample moments of the data
2.3. Asymptotic expansions and approximations
2.4. The Wishart distribution and related issues
3. Exact theory in the simultaneous equations model
3.1. The model and notation
3.2. Generic statistical forms of common single equation estimators
3.3. The standardizing transformations
3.4. The analysis of leading cases
3.5. The exact distribution of the IV estimator in the general single equation case
3.6. The case of two endogenous variables
3.7. Structural variance estimators
3.8. Test statistics
3.9. Systems estimators and reduced-form coefficients
3.10. Improved estimation of structural coefficients
3.11. Supplementary results on moments
3.12. Misspecification
*The present chapter is an abridgement of a longer work that contains inter alia a fuller exposition
and detailed proofs of results that are surveyed herein. Readers who may benefit from this greater
degree of detail may wish to consult the longer work itself in Phillips (1982e).
My warmest thanks go to Deborah Blood, Jerry Hausmann, Esfandiar Maasoumi, and Peter Reiss
for their comments on a preliminary draft, to Glena Ames and Lydia Zimmerman for skill and effort
in preparing the typescript under a tight schedule, and to the National Science Foundation for
research support under grant number SES 8007571.
Little experience is sufficient to show that the traditional machinery of statistical processes is wholly
unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it
misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not
accurate enough for simple laboratory data. Only by systematically tackling small sample problems on
their merits does it seem possible to apply accurate tests to practical data. Such at least has been the
aim of this book. [From the Preface to the First Edition of R. A. Fisher (1925).]
1. Introduction
1The nature of local alternative hypotheses is discussed in Chapter 13 of this Handbook by Engle.
2See, for example, Fisher (1921, 1922, 1924, 1928a, 1928b, 1935) and the treatment of exact
sampling distributions by Cramér (1946).
452 P. C. B. Phillips
The mission of these early researchers is not significantly different from our
own today: ultimately to relieve the empirical worker from the reliance he has
otherwise to place on asymptotic theory in estimation and inference. Ideally, we
would like to know and be able to compute the exact sampling distributions
relevant to our statistical procedures under a variety of stochastic environments.
Such knowledge would enable us to make a better assessment of the relative
merits of competing estimators and to appropriately correct (from their asymp-
totic values) the size or critical region of statistical tests. We would also be able to
measure the effect on these sampling distributions of certain departures in the
underlying stochastic environment from normally distributed errors. The early
researchers clearly recognized these goals, although the specialized nature of their
results created an impression3 that there would be no substantial payoff to their
research in terms of applied econometric practice. However, their findings have
recently given way to general theories and a powerful technical machinery which
will make it easier to transmit results and methods to the applied econometrician
in the precise setting of the model and the data set with which he is working.
Moreover, improvements in computing now make it feasible to incorporate into
existing regression software subroutines which will provide the essential vehicle
for this transmission. Two parallel current developments in the subject are an
integral part of this process. The first of these is concerned with the derivation of
direct approximations to the sampling distributions of interest in an applied
study. These approximations can then be utilized in the decisions that have to be
made by an investigator concerning, for instance, the choice of an estimator or
the specification of a critical region in a statistical test. The second relevant
development involves advancements in the mathematical task of extracting the
form of exact sampling distributions in econometrics. In the context of simulta-
neous equations, the literature published during the 1960s and 1970s concentrated
heavily on the sampling distributions of estimators and test statistics in single
structural equations involving only two or at most three endogenous variables.
Recent theoretical work has now extended this to the general single equation case.
The aim of the present chapter is to acquaint the reader with the main strands
of thought in the literature leading up to these recent advancements. Our
discussion will attempt to foster an awareness of the methods that have been used
or that are currently being developed to solve problems in distribution theory,
and we will consider their suitability and scope in transmitting results to empirical
researchers. In the exposition we will endeavor to make the material accessible to
readers with a working knowledge of econometrics at the level of the leading
textbooks. A cursory look through the journal literature in this area may give the
impression that the range of mathematical techniques employed is quite diverse,
with the method and final form of the solution to one problem being very
different from the next. This diversity is often more apparent than real and it is
3The discussions of the review article by Basmann (1974) in Intriligator and Kendrick (1974)
illustrate this impression in a striking way. The achievements in the field are applauded, but the reader
Ch. 8: Exact Small Sample Theory 453
hoped that the approach we take to the subject in the present review will make the
methods more coherent and the form of the solutions easier to relate.
Our review will not be fully comprehensive in coverage but will report the
principal findings of the various research schools in the area. Additionally, our
focus will be directed explicitly towards the SEM and we will emphasize exact
distribution theory in this context. Corresponding results from asymptotic theory
are surveyed in Chapter 7 of this Handbook by Hausman; and the refinements of
asymptotic theory that are provided by Edgeworth expansions together with their
application to the statistical analysis of second-order efficiency are reviewed in
Chapter 15 of this Handbook by Rothenberg. In addition, and largely in parallel
to the analytical research that we will review, are the experimental investigations
involving Monte Carlo methods. These latter investigations have continued
traditions established in the 1950s and 1960s with an attempt to improve certain
features of the design and efficiency of the experiments, together with the means
by which the results of the experiments are characterized. These methods are
described in Chapter 16 of this Handbook by Hendry. An alternative approach to
the utilization of soft quantitative information of the Monte Carlo variety is
based on constructive functional approximants of the relevant sampling distribu-
tions themselves and will be discussed in Section 4 of this chapter.
The plan of the chapter is as follows. Section 2 provides a general framework
for the distribution problem and details formulae that are frequently useful in the
derivation of sampling distributions and moments. This section also provides a
brief account of the genesis of the Edgeworth, Nagar, and saddlepoint approxi-
mations, all of which have recently attracted substantial attention in the litera-
ture. In addition, we discuss the Wishart distribution and some related issues
which are central to modern multivariate analysis and on which much of the
current development of exact small sample theory depends. Section 3 deals with
the exact theory of single equation estimators, commencing with a general
discussion of the standardizing transformations, which provide research economy
in the derivation of exact distribution theory in this context and which simplify
the presentation of final results without loss of generality. This section then
provides an analysis of known distributional results for the most common
estimators, starting with certain leading cases and working up to the most general
cases for which results are available. We also cover what is presently known about
the exact small sample behavior of structural variance estimators, test statistics,
systems methods, reduced-form coefficient estimators, and estimation under
misspecification. Section 4 outlines the essential features of a new approach to
small sample theory that seems promising for future research. The concluding
remarks are given in Section 5 and include some reflections on the limitations of
traditional asymptotic methods in econometric modeling.
Finally, we should remark that our treatment of the material in this chapter is
necessarily of a summary nature, as dictated by practical requirements of space. A
more complete exposition of the research in this area and its attendant algebraic
detail is given in Phillips (1982e). This longer work will be referenced for a fuller
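\[
\mathrm{df}(r) = P(\theta_T \le r) = \int_{y \in \Theta(r)} \mathrm{pdf}(y)\, dy,
\qquad \Theta(r) = \{\, y : \theta_T(y, x) \le r \,\}. \tag{2.1}
\]
This is an $nT$-dimensional integral over the domain of values $\Theta(r)$ for which
$\theta_T \le r$.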
The distribution of $\theta_T$ is also uniquely determined by its characteristic function
(c.f.), which we write as
and this inversion formula is valid provided cf(s) is absolutely integrable in the
Lebesgue sense [see, for example, Feller (1971, p. 509)]. The following two
inversion formulae give the d.f. of $\theta_T$ directly from (2.2):
and
\[
\mathrm{df}(r) = \frac{1}{2} + \frac{1}{2\pi} \int_{0}^{\infty}
\frac{e^{isr}\,\mathrm{cf}(-s) - e^{-isr}\,\mathrm{cf}(s)}{is}\, ds. \tag{2.5}
\]
The first of these formulae is valid whenever the integrand on the right-hand side
of (2.4) is integrable [otherwise a symmetric limit is taken in defining the
improper integral; see, for example, Cramér (1946, pp. 93-94)]. It is useful in
computing first differences in df(r) or the proportion of the distribution that lies
in an interval (a, b) because, by subtraction, we have
The second formula (2.5) gives the d.f. directly and was established by Gil-Pelaez
(1951).
When the above inversion formulae based on the characteristic function cannot
be completed analytically, the integrals may be evaluated by numerical integra-
tion. For this purpose, the Gil-Pelaez formula (2.5) or variants thereof have most
frequently been used. A general discussion of the problem, which provides
bounds on the integration and truncation errors, is given by Davies (1973).
Methods which are directly applicable in the case of ratios of quadratic forms are
given by Imhof (1961) and Pan Jie Jian (1968). The methods provided in the
latter two articles have often been used in econometric studies to compute exact
probabilities in cases such as the serial correlation coefficient [see, for example,
Phillips (1977a)] and the Durbin-Watson statistic [see Durbin and Watson
(1971)].
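The numerical inversion described above can be sketched in a few lines. The following is a minimal illustration only, not one of the published algorithms of Davies or Imhof: it evaluates the Gil-Pelaez formula (2.5) with a plain midpoint rule. The function names, the truncation point `upper`, and the number of panels `steps` are assumptions made for this example, and the standard normal c.f. is used simply because its d.f. is known.

```python
import cmath
import math


def gil_pelaez_cdf(cf, r, upper=12.0, steps=6000):
    """Approximate df(r) by midpoint-rule quadrature of the Gil-Pelaez integral.

    cf    : callable returning the characteristic function value cf(s)
    upper : truncation point for the infinite integral (an assumption)
    steps : number of midpoint panels (an assumption)
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h  # midpoints avoid the removable singularity at s = 0
        # [e^{isr} cf(-s) - e^{-isr} cf(s)] / (is) equals -2 Im[e^{-isr} cf(s)] / s
        # because cf(-s) is the complex conjugate of cf(s).
        total += -2.0 * (cmath.exp(-1j * s * r) * cf(s)).imag / s
    return 0.5 + total * h / (2.0 * math.pi)


def normal_cf(s):
    """Characteristic function of a standard normal variate."""
    return cmath.exp(-0.5 * s * s)


if __name__ == "__main__":
    for r in (-1.0, 0.0, 1.96):
        exact = 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))
        print(r, gil_pelaez_cdf(normal_cf, r), exact)
```

For a c.f. that decays slowly, the truncation point and step size would need the kind of error control discussed by Davies (1973); the fixed values used here are adequate only for rapidly decaying cases such as the normal.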
Most econometric estimators and test statistics we work with are relatively simple
functions of the sample moments of the data (y, x). Frequently, these functions
are rational functions of the first and second sample moments of the data. More
specifically, these moments are usually well-defined linear combinations and
matrix quadratic forms in the observations of the endogenous variables and with
456 P. C. B. Phillips
the weights being determined by the exogenous series. Inspection of the relevant
formulae makes this clear: for example, the usual two-step estimators in the linear
model and the instrumental variable (IV) family in the SEM. In the case of
limited information and full information maximum likelihood (LIML, FIML),
these estimators are determined as implicit functions of the sample moments of
the data through a system of implicit equations. In all of these cases, we can
proceed to write $\theta_T = \theta_T(y, x)$ in the alternative form $\theta_T = \theta_T^*(m)$, where $m$ is a
vector of the relevant sample moments.
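As a simple illustration of writing an estimator as a function of the moment vector $m$, the sketch below computes the OLS slope and intercept of a two-variable regression purely from first and second sample moments. The function names and the dictionary layout of $m$ are hypothetical choices for this example; the simple regression is used only because the rational form is easy to see.

```python
def sample_moments(y, x):
    """Return the first and second sample moments used below."""
    T = len(y)
    return {
        "y": sum(y) / T,
        "x": sum(x) / T,
        "xx": sum(xi * xi for xi in x) / T,
        "xy": sum(xi * yi for xi, yi in zip(x, y)) / T,
    }


def ols_from_moments(m):
    """Simple-regression OLS slope and intercept as rational functions of m."""
    slope = (m["xy"] - m["x"] * m["y"]) / (m["xx"] - m["x"] ** 2)
    intercept = m["y"] - slope * m["x"]
    return slope, intercept


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [3.1, 4.9, 7.2, 8.8]
    print(ols_from_moments(sample_moments(y, x)))
```

Once the data enter only through `sample_moments`, the distribution problem can be posed in terms of the (much lower dimensional) moments rather than the raw observations.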
In many econometric problems we can write down directly the p.d.f. of the
sample moments, i.e. pdf(m), using established results from multivariate
distribution theory. This permits a convenient resolution of the distribution of $\theta_T$. In
particular, we achieve a useful reduction in the dimension of the integration
involved in the primitive forms (2.1) and (2.2). Thus, the analytic integration
required in the representation
(2.7)
has already been reduced. In (2.7) $a$ is a vector of auxiliary variates defined over
the space $\mathcal{A}$ and is such that the transformation $y \to (m, a)$ is 1:1.
The next step in reducing the distribution to the density of $\theta_T$ is to select a
suitable additional set of auxiliary variates $b$ for which the transformation
$m \to (\theta_T, b)$ is 1:1. Upon changing variates, the density of $\theta_T$ is given by the
integral
4See, for example, Sargan (1976a, Appendix B) and Phillips (1980a). These issues will be taken up
further in Section 3.5.
(1976), and the review by Phillips (1980b) for further discussion, references, and
historical background.
The concept of using a polynomial approximation of $\theta_T$ in terms of the
elements of $m$ to produce an approximate distribution for $\theta_T$ can also be used to
approximate the moments of $\theta_T$, where these exist, or to produce
pseudo-moments (of an approximating distribution) where they do not.5 The idea
underlies the work by Nagar (1959) in which such approximate moments and
pseudo-moments were developed for k-class estimators in the SEM. In popular
parlance these moment approximations are called Nagar approximations to the
moments. The constructive process by which they are derived in the general case
is given in Phillips (1982e).
An alternative approach to the development of asymptotic series approxima-
tions for probability densities is the saddlepoint (SP) method. This is a powerful
technique for approximating integrals in asymptotic analysis and has long been
used in applied mathematics. A highly readable account of the technique and a
geometric interpretation of it are given in De Bruijn (1958). The method was first
used systematically in mathematical statistics in two pathbreaking papers by
Daniels (1954, 1956) and has recently been the subject of considerable renewed
interest.6
The conventional approach to the SP method has its starting point in inversion
formulae for the probability density like those discussed in Section 2.1. The
inversion formula can commonly be rewritten as a complex integral and yields the
p.d.f. of $\theta_T$ from knowledge of the Laplace transform (or moment-generating
function). Cauchy’s theorem in complex function theory [see, for example, Miller
(1960)] tells us that we may well be able to deform the path of integration to a
large extent without changing the value of the integral. The general idea behind
the SP method is to employ an allowable deformation of the given contour, which
is along the imaginary axis, in such a way that the major contribution to the value
of the integral comes from the neighborhood of a point at which the contour
actually crosses a saddlepoint of the modulus of the integrand (or at least its
dominant factor). In crude terms, this is rather akin to a mountaineer attempting
to cross a mountain range by means of a pass, in order to control the maximum
5This process involves a stochastic approximation to the statistic $\theta_T$ by means of polynomials in the
elements of $m$ which are grouped into terms of like powers of $T^{-1/2}$. The approximating statistic then
yields the “moment” approximations for $\theta_T$. Similar “moment” approximations are obtained by
developing alternative stochastic approximations in terms of another parameter. Kadane (1971)
derived such alternative approximations by using an expansion of $\theta_T$ (in the case of the k-class
estimator) in terms of increasing powers of $\sigma$, where $\sigma^2$ is a scalar multiple of the covariance matrix of
the errors in the model and the asymptotics apply as $\sigma \to 0$. Anderson (1977) has recently discussed
the relationship between these alternative parameter sequences in the context of the SEM.
6See, for example, Phillips (1978), Holly and Phillips (1979), Daniels (1980), Durbin (1980a, 1980b),
and Barndorff-Nielsen and Cox (1979).
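To give a concrete feel for the saddlepoint method described above, the following sketch applies the familiar first-order saddlepoint density formula, $\hat f(x) = \{2\pi K''(\hat t)\}^{-1/2} \exp\{K(\hat t) - \hat t x\}$ with $K'(\hat t) = x$, to a sum of $n$ independent unit exponential variates. This textbook case is an assumption chosen for illustration (it is not an estimator from the SEM); it is convenient because the exact density is gamma and the saddlepoint equation solves in closed form. All function names are hypothetical.

```python
import math


def saddlepoint_density_sum_exp(x, n):
    """Saddlepoint approximation to the density of a sum of n unit exponentials.

    Cumulant generating function: K(t) = -n * log(1 - t), for t < 1.
    """
    t_hat = 1.0 - n / x                 # solves K'(t) = n / (1 - t) = x
    K = -n * math.log(1.0 - t_hat)      # K evaluated at the saddlepoint
    K2 = n / (1.0 - t_hat) ** 2         # K''(t_hat)
    return math.exp(K - t_hat * x) / math.sqrt(2.0 * math.pi * K2)


def exact_density_sum_exp(x, n):
    """Exact gamma(n, 1) density of the same sum."""
    return math.exp((n - 1) * math.log(x) - x - math.lgamma(n))


if __name__ == "__main__":
    n = 5
    for x in (1.0, 5.0, 12.0):
        approx = saddlepoint_density_sum_exp(x, n)
        exact = exact_density_sum_exp(x, n)
        print(x, approx, exact, approx / exact)
```

In this example the ratio of approximate to exact density does not depend on $x$: the only error is the Stirling approximation to $\Gamma(n)$, so renormalizing the approximation to integrate to one would reproduce the exact density.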
This integral is a (matrix variate) Laplace transform [see, for example, Herz
(1955) and Constantine (1963)] which converges absolutely for $\mathrm{Re}(z) > \tfrac{1}{2}(n - 1)$
and the domain of integration is the set of all positive definite matrices. It can be
evaluated in terms of univariate gamma functions as
[see James (1964)]. In (2.9) we also use the abbreviated operator representation
etr(·) = exp{tr(·)}.
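The evaluation of the multivariate gamma function through univariate gamma functions can be sketched as follows. The product formula coded here, $\Gamma_n(z) = \pi^{n(n-1)/4} \prod_{i=1}^{n} \Gamma(z - \tfrac{1}{2}(i-1))$, is the standard one from the multivariate analysis literature and is stated as an assumption, since the displayed formula is not reproduced in the text above; the function name is hypothetical.

```python
import math


def multivariate_gamma(n, z):
    """Gamma_n(z) = pi^{n(n-1)/4} * prod_{i=1}^{n} Gamma(z - (i - 1)/2).

    Requires z > (n - 1)/2 so that every univariate gamma argument is positive.
    """
    if z <= 0.5 * (n - 1):
        raise ValueError("need z > (n - 1)/2")
    value = math.pi ** (n * (n - 1) / 4.0)
    for i in range(1, n + 1):
        value *= math.gamma(z - 0.5 * (i - 1))
    return value


if __name__ == "__main__":
    print(multivariate_gamma(1, 2.5), math.gamma(2.5))  # n = 1 is the ordinary gamma function
    print(multivariate_gamma(2, 1.5), math.pi / 2.0)
```

For large arguments it would be safer to work with logarithms (`math.lgamma`) to avoid overflow; the direct product is kept here for readability.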
The parameters of the Wishart distribution (2.9) are: (i) the order of the
symmetric matrix $A$, namely $n$; (ii) the degrees of freedom $T$ of the component
variates $x_t$ in the summation $A = XX' = \sum_{t=1}^{T} x_t x_t'$; and (iii) the covariance matrix,
$\Omega$, of the normally distributed columns $x_t$ in $X$. A common notation for the
Wishart distribution (2.9) is then $W_n(T, \Omega)$ [see, for example, Rao (1973, p. 534)].
This distribution is said to be central (in the same sense as the central $\chi^2$
distribution) since the component variates $x_t$ have common mean $E(x_t) = 0$. In
fact, when $n = 1$, $\Omega = 1$, and $A = a$ is a scalar, the density (2.9) reduces to
$2^{-T/2}\,\Gamma(T/2)^{-1}\,a^{T/2-1}e^{-a/2}$, the density of a central $\chi^2$ with $T$ degrees of
freedom.
If the component variates $x_t$ in the summation are not restricted to have a
common mean of zero but are instead independently distributed as $N(m_t, \Omega)$,
then the joint distribution of the matrix $A = XX' = \sum_{t=1}^{T} x_t x_t'$ is said to be
(non-central) Wishart with non-centrality matrix $\bar{M} = MM'$, where $M = [m_1, \ldots, m_T]$.
This is frequently denoted $W_n(T, \Omega, \bar{M})$, although $M$ is sometimes used in
place of $\bar{M}$ [as in Rao (1973), for example]. The latter is a more appropriate
parameter in the matrix case as a convenient generalization of the non-centrality
parameter that is used in the case of the non-central $\chi^2$ distribution, a special
case of $W_n(T, \Omega, \bar{M})$ in which $n = 1$, $\Omega = 1$, and $\bar{M} = \sum_{t=1}^{T} m_t^2$.
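The p.d.f. of the non-central Wishart matrix $A = XX' = \sum_{t=1}^{T} x_t x_t'$, where the $x_t$
are independent $N(m_t, \Omega)$, $M = [m_1, \ldots, m_T] = E(X)$, and $\bar{M} = MM'$ is given by
\[
\mathrm{pdf}(A) = \frac{\mathrm{etr}(-\tfrac{1}{2}\Omega^{-1}\bar{M})}{2^{nT/2}\,\Gamma_n(T/2)\,(\det \Omega)^{T/2}}\;
{}_0F_1\!\left(\tfrac{T}{2};\, \tfrac{1}{4}\Omega^{-1}\bar{M}\,\Omega^{-1}A\right)
\mathrm{etr}(-\tfrac{1}{2}\Omega^{-1}A)\,(\det A)^{(T-n-1)/2}. \tag{2.10}
\]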
In (2.11) $J$ indicates a partition of the integer $j$ into not more than $n$ parts, where
$S$ is an $n \times n$ matrix. A partition $J$ of weight $r$ is a set of $r$ positive integers
$\{j_1, \ldots, j_r\}$ such that $\sum_{i=1}^{r} j_i = j$. For example, $\{2, 1\}$ and $\{1, 1, 1\}$ are partitions of 3
and are conventionally written $(21)$ and $(1^3)$. The coefficients $(a)_J$ and $(b)_J$ in
(2.11) are multivariate hypergeometric coefficients defined by
and where
\[
(\lambda)_j = \lambda(\lambda + 1)\cdots(\lambda + j - 1) = \Gamma(\lambda + j)/\Gamma(\lambda).
\]
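The combinatorial ingredients just described are easy to compute directly. The sketch below enumerates the partitions of $j$ into at most $n$ parts, evaluates the scalar coefficient $(\lambda)_j$, and evaluates a multivariate hypergeometric coefficient under the usual definition $(a)_J = \prod_{i=1}^{r} (a - \tfrac{1}{2}(i-1))_{j_i}$. That definition is an assumption taken from the standard literature, because the displayed definition is not reproduced in the text above; the function names are hypothetical.

```python
def partitions(j, max_parts, largest=None):
    """Yield partitions of j into at most max_parts parts, largest part first."""
    if largest is None:
        largest = j
    if j == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(j, largest), 0, -1):
        for rest in partitions(j - first, max_parts - 1, first):
            yield (first,) + rest


def pochhammer(lam, j):
    """(lam)_j = lam (lam + 1) ... (lam + j - 1), with (lam)_0 = 1."""
    value = 1.0
    for k in range(j):
        value *= lam + k
    return value


def hypergeometric_coefficient(a, J):
    """(a)_J = prod_i (a - (i - 1)/2)_{j_i} over the parts j_1 >= j_2 >= ... of J."""
    value = 1.0
    for i, part in enumerate(J, start=1):
        value *= pochhammer(a - 0.5 * (i - 1), part)
    return value


if __name__ == "__main__":
    print(list(partitions(3, 3)))   # the partitions (3), (21) and (1^3)
    print(hypergeometric_coefficient(2.0, (2, 1)))
```

The zonal polynomials $C_J(S)$ themselves are much harder to compute and are not attempted here.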
[see, for example, Johnson and Kotz (1972, p. 171)]. Algorithms for the extraction
of the coefficients in these polynomials have been written [see James (1968) and
McLaren (1976)] and a complete computer program for their evaluation has
recently been developed and made available by Nagel (1981). This is an im-
portant development and will in due course enhance what is at present our very
limited ability to numerically compute and readily interpret multiple infinite
series such as (2.11). However, certain special cases of (2.11) are already recogniz-
able in terms of simpler functions: when n = 1 we have the classical hypergeomet-
ric functions
\[
{}_pF_q(a_1, \ldots, a_p;\, b_1, \ldots, b_q;\, s) = \sum_{j=0}^{\infty}
\frac{(a_1)_j \cdots (a_p)_j\, s^j}{(b_1)_j \cdots (b_q)_j\, j!}
\]
[see, for example, Lebedev (1965, ch. 9)]; and when p = q = 0 we have
\[
{}_0F_0(S) = \sum_{j=0}^{\infty} \sum_{J} C_J(S)/j! = \mathrm{etr}(S),
\]
which generalizes the exponential series and which is proved in James (1961); and
when $p = 1$ and $q = 0$ we have
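which generalizes the binomial series [Constantine (1963)]. The series ${}_0F_1(\,;)$ in
the non-central Wishart density (2.10) generalizes the classical Bessel function.
[The reader may recall that the non-central $\chi^2$ density can be expressed in terms
of the modified Bessel function of the first kind; see, for example, Johnson and
Kotz (1970, p. 133).] In particular, when $n = 1$, $\Omega = 1$, $\bar{M} = \lambda$, and $A = a$ is a
scalar, we have
\[
\begin{aligned}
\mathrm{pdf}(a) &= \frac{\exp\{-\tfrac{1}{2}(a + \lambda)\}\, a^{T/2-1}}{2^{T/2}\,\Gamma(T/2)}\;
{}_0F_1\!\left(\tfrac{T}{2};\, \tfrac{1}{4}\lambda a\right) \\
&= \frac{\exp\{-\tfrac{1}{2}(a + \lambda)\}}{2^{T/2}} \sum_{j=0}^{\infty}
\frac{\lambda^j a^{T/2+j-1}}{\Gamma(T/2 + j)\, j!\, 2^{2j}}.
\end{aligned} \tag{2.12}
\]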
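The scalar series (2.12) is simple enough to evaluate term by term, which gives a useful check on the formula. The sketch below is illustrative only: the function names and the fixed number of series terms are assumptions for this example. It builds each term from the previous one to avoid overflow and compares the $\lambda = 0$ case with the central $\chi^2$ density.

```python
import math


def noncentral_chi2_pdf(a, T, lam, terms=200):
    """Evaluate the series form of the non-central chi-square density at a > 0."""
    # j = 0 term of the sum: a^{T/2 - 1} / Gamma(T/2)
    term = math.exp((T / 2.0 - 1.0) * math.log(a) - math.lgamma(T / 2.0))
    total = term
    for j in range(terms):
        # ratio of successive terms: (lam * a / 4) / ((T/2 + j) (j + 1))
        term *= (lam * a / 4.0) / ((T / 2.0 + j) * (j + 1))
        total += term
    return math.exp(-0.5 * (a + lam)) * total / 2.0 ** (T / 2.0)


def central_chi2_pdf(a, T):
    """Central chi-square density, the lam = 0 special case."""
    log_pdf = (T / 2.0 - 1.0) * math.log(a) - 0.5 * a - math.lgamma(T / 2.0)
    return math.exp(log_pdf) / 2.0 ** (T / 2.0)


if __name__ == "__main__":
    print(noncentral_chi2_pdf(3.0, 4, 0.0), central_chi2_pdf(3.0, 4))
    print(noncentral_chi2_pdf(3.0, 4, 2.5))
```

The matrix-argument series (2.11) does not reduce to such a simple recursion, which is one reason its numerical evaluation is so much harder than in the scalar case.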
\[
Y = Z\Pi + V, \tag{3.2}
\]
or
(3.5)
or
As argued in Section 2.2, most econometric estimators and test statistics can be
expressed as simple functions of the sample moments of the data. In the case of
the commonly used single equation estimators applied to (3.3) we obtain rela-
tively simple generic statistical expressions for these estimators in terms of the
elements of moment matrices which have Wishart distributions of various degrees
of freedom and with various non-centrality parameter matrices. This approach
enables us to characterize the distribution problem in a simple but powerful way
for each case. It has the advantage that the characterization clarifies those cases
for which the estimator distributions will have the same mathematical forms but
for different values of certain key parameters and it provides a convenient first
base for the mathematics of extracting the exact distributions. Historically the
approach was first used by Kabe (1963, 1964) in the econometrics context and
has since been systematically employed by most authors working in this field. An
excellent recent discussion is given by Mariano (1982).
and writing
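\[
\gamma_{IV} = (Z_1' P_H Z_1)^{-1} Z_1' P_H (y_1 - Y_2 \beta_{IV}), \tag{3.9}
\]
\[
\beta_{IV} = \{ Y_2' [P_H - P_H Z_1 (Z_1' P_H Z_1)^{-1} Z_1' P_H] Y_2 \}^{-1}
\{ Y_2' [P_H - P_H Z_1 (Z_1' P_H Z_1)^{-1} Z_1' P_H] y_1 \}. \tag{3.10}
\]
\[
A(P_H) = \begin{bmatrix} a_{11}(P_H) & a_{21}'(P_H) \\ a_{21}(P_H) & A_{22}(P_H) \end{bmatrix}
= \begin{bmatrix} y_1'(P_H - P_{Z_1}) y_1 & y_1'(P_H - P_{Z_1}) Y_2 \\
Y_2'(P_H - P_{Z_1}) y_1 & Y_2'(P_H - P_{Z_1}) Y_2 \end{bmatrix}. \tag{3.13}
\]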
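The generic statistical form for the estimator $\beta_{IV}$ in (3.12) is then
\[
\beta_{IV} = A_{22}^{-1}(P_H)\, a_{21}(P_H). \tag{3.14}
\]
This specializes to the cases of OLS and 2SLS where we have, respectively,
In a similar way we find that the k-class estimator $\beta_{(k)}$ of $\beta$ has the generic form
\[
\begin{aligned}
\beta_{(k)} &= \{ Y_2' [k(P_Z - P_{Z_1}) + (1 - k) Q_{Z_1}] Y_2 \}^{-1}
\{ Y_2' [k(P_Z - P_{Z_1}) + (1 - k) Q_{Z_1}] y_1 \} \\
&= [k A_{22}(P_Z) + (1 - k) A_{22}(I)]^{-1} [k a_{21}(P_Z) + (1 - k) a_{21}(I)].
\end{aligned} \tag{3.17}
\]
\[
\{ A(I) - \lambda [A(I) - A(P_Z)] \} \beta_{\Delta} = 0, \tag{3.19}
\]
where $\lambda$ is the minimum of the variance ratio in (3.18). Thus, $\beta_{LIML}$ is given by
the generic form
\[
\beta_{LIML} = [\lambda A_{22}(P_Z) + (1 - \lambda) A_{22}(I)]^{-1} [\lambda a_{21}(P_Z) + (1 - \lambda) a_{21}(I)], \tag{3.20}
\]
that is, the k-class estimator (3.17) with $k = \lambda$.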
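The generic k-class form (3.17) can be illustrated in the simplest possible setting. The sketch below assumes one right-hand-side endogenous variable, no included exogenous variables, and a single instrument, so that every moment in (3.17) is a scalar and the projection onto the instrument can be written without matrix inversion. These simplifications, and the function names, are assumptions made for the example rather than part of the general theory.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def k_class(y1, y2, z, k):
    """k-class estimate of beta in y1 = y2 * beta + u with one instrument z.

    Illustrative assumptions: one right-hand-side endogenous variable,
    no included exogenous variables, and a single instrument.
    """
    # Moments with the projection onto z: u' P_z v = (z'u)(z'v) / (z'z)
    A22_P = dot(z, y2) ** 2 / dot(z, z)
    a21_P = dot(z, y2) * dot(z, y1) / dot(z, z)
    # Moments with the identity in place of the projection
    A22_I = dot(y2, y2)
    a21_I = dot(y2, y1)
    return (k * a21_P + (1.0 - k) * a21_I) / (k * A22_P + (1.0 - k) * A22_I)


if __name__ == "__main__":
    y1 = [2.0, 1.0, 4.0, 3.0]
    y2 = [1.0, 2.0, 3.0, 4.0]
    z = [1.0, 0.0, 1.0, 0.0]
    print(k_class(y1, y2, z, 0.0))  # k = 0 gives OLS
    print(k_class(y1, y2, z, 1.0))  # k = 1 gives 2SLS (here the simple IV ratio z'y1 / z'y2)
```

Setting $k$ equal to the minimized variance ratio $\lambda$ would give LIML in the same way; computing $\lambda$ requires the latent root problem (3.19) and is omitted from this sketch.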
The above formulae show that the main single equation estimators depend in a
very similar way on the elements of an underlying moment matrix of the basic
form (3.13) with some differences in the projection matrices relevant to the
various cases. The starting point in the derivation of the p.d.f. of these estimators
of $\beta$ is to write down the joint distribution of the matrix $A$ in (3.13). To obtain the
p.d.f. of the estimator we then transform variates so that we are working directly
with the relevant function $A_{22}^{-1}a_{21}$. The final step in the derivation is to integrate
over the space of the auxiliary variates, as prescribed in the general case of (2.8)
above, which in this case amounts essentially to $(a_{11}, A_{22})$. This leaves us with the
required density function of the estimator.
The mathematical process outlined in the previous section is simplified, without
loss of generality, by the implementation of standardizing transformations. These
transformations were first used and discussed by Basmann (1963, 1974). They
reduce the sample second moment matrix of the exogenous variables to the
identity matrix (orthonormalization) and transform the covariance matrix of the
endogenous variables to the identity matrix (canonical form). Such transforma-
tions help to reduce the parameter space to an essential set and identify the
critical parameter functions which influence the shape of the distributions.7 They
are fully discussed in Phillips (1982e) and are briefly reviewed in the following
section.
\[
\Omega = \begin{bmatrix} \omega_{11} & \omega_{21}' \\ \omega_{21} & \Omega_{22} \end{bmatrix}. \tag{3.21}
\]
Then the following result [proved in Phillips (1982e)] summarizes the effect of
the standardizing transformations on the model.
Theorem 3.3.1
There exist transformations of the variables and parameters of the model given by
(3.3) and (3.5) which transform it into one in which
Under these transformations (3.3) and (3.5) can be written in the form
and
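where $T^{-1}Z'Z = I_K$ and the rows of $[y_1^* : Y_2^*]$ are uncorrelated with covariance
matrix given by $I_{n+1}$. Explicit formulae for the new coefficients in (3.23) are
and
\[
\gamma^* = \left( \frac{Z_1'Z_1}{T} \right)^{1/2} (\omega_{11} - \omega_{21}'\Omega_{22}^{-1}\omega_{21})^{-1/2}\, \gamma. \tag{3.26}
\]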
7As argued recently by Mariano (1982) these reductions also provide important guidelines for the
design of Monte Carlo experiments (at least in the context of SEMs) by indicating the canonical
parameter space which is instrumental in influencing the shape of the relevant small sample
distributions and from which a representative sample of points can be taken to help reduce the usual
specificity of simulation findings.
It turns out that the commonly used econometric estimators of the standardized
coefficients $\beta^*$ and $\gamma^*$ in (3.23) are related to the unstandardized coefficient
estimators by the same relations which define the standard coefficients, namely
(3.25) and (3.26). Thus, we have the following results for the 2SLS estimator [see
Phillips (1982e) once again for proofs].
Theorem 3.3.2
The 2SLS estimator, $\beta_{2SLS}$, of the coefficients of the endogenous variables in (3.3)
is invariant under the transformation by which the exogenous variables are
orthonormalized. The 2SLS estimator, $\gamma_{2SLS}$, is not, in general, invariant under
this transformation. The new exogenous variable coefficients are related to the
original coefficients under the transformation $\bar{\gamma} = J_{11}\gamma$ and to the estimators by
the corresponding equation $\bar{\gamma}_{2SLS} = J_{11}\gamma_{2SLS}$, where $J_{11} = (Z_1'Z_1/T)^{1/2}$. □
Theorem 3.3.3
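The 2SLS estimators of $\beta^*$ and $\gamma^*$ in the standardized model (3.23) are related to
the corresponding estimators of $\beta$ and $\gamma$ in the unstandardized model (3.3) by the
equations:
and
\[
\gamma^*_{2SLS} = \left[ \frac{Z_1'Z_1}{T(\omega_{11} - \omega_{21}'\Omega_{22}^{-1}\omega_{21})} \right]^{1/2} \gamma_{2SLS}. \tag{3.28}
\]
□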
Results that correspond to these for 2SLS can be derived similarly for other
estimators such as IV and LIML [see Phillips (1982e) for details].
The canonical transformation induces a change in the coordinates by which the
variables are measured and therefore (deliberately) affects their covariance struc-
ture. Some further properties of the transformed structural equation (3.23) are
worth examining. Let us first write (3.23) in individual observation form as
and
These relations show that the transformed coefficient vector, $\beta^*$, in the
standardized model contains the key parameters which determine the correlation
pattern between the included variables and the errors. In particular, when the
elements of $\beta^*$ become large the included endogenous variables and the error on
the equation become more highly correlated. In these conditions, estimators of the
IV type will normally require larger samples of data to effectively purge the
included variables of their correlation with the errors. We may therefore expect
these estimators to display greater dispersion in small samples and slower
convergence to their asymptotic distributions under these conditions than other-
wise. These intuitively based conjectures have recently been substantiated by the
extensive computations of exact densities by Anderson and Sawa (1979)8 and the
graphical analyses by Phillips (1980a, 1982a) in the general case.
The vector of correlations corresponding to (3.32) in the unstandardized model
is given by
b,,
= (1 + p*;s*> 112 ’
-wo,, +P15222W2
so that for a fixed reduced-form error covariance matrix, $\Omega$, similar conditions
persist as the elements of $\beta$ grow large. Moreover, as we see from (3.33), the
transformed structural coefficient $\beta^*$ is itself determined by the correlation
pattern between regressors and error in the unstandardized model. The latter (like
$\beta^*$) can therefore be regarded as one of the critical sets of parameters that
influence the shape of the distribution of the common estimators of the
coefficient $\beta$.
There are two special categories of models in which the exact density functions of
the common SEM estimators can be extracted with relative ease. In the first
category are the just identified structural models in which the commonly used
consistent estimators all reduce to indirect least squares (ILS) and take the form
(3.34)
8See also the useful discussion and graphical plots in Anderson (1982).
of a matrix ratio of normal variates. In the two endogenous variable case (where
n = 1) this reduces to a simple ratio of normal variates whose p.d.f. was first
derived by Fieller (1932) and in the present case takes the form9
exp ( -$(1+82))
pdf( r) = (3.35)
a(1 + r2) ’
hypothesis itself no longer holds. As such the leading term provides important
information about the shape of the distribution by defining a primitive member
of the class to which the true density belongs in the more general case. In the
discussion that follows, we will illustrate the use of this technique in the case of
IV and LIML estimators.13
We set $\beta = 0$ in the structural equation (3.3) and $\Pi_{22} = 0$ in the reduced form
so that $y_1$ and $y_2$ (taken to be a vector of observations on the included endogenous
variable now that $n = 1$) are determined by the system14
The IV estimator of $\beta$ is
where $B(\tfrac{1}{2}, K_3/2)$ is the beta function. This density specializes to the case of
2SLS when $K_3 = K_2$ and OLS when $K_3 = T - K_1$. [In the latter case we may use
(3.15) and write $Q_{Z_1} = I - T^{-1}Z_1Z_1' = C_1C_1'$, where $C_1$ is a $T \times (T - K_1)$ matrix
whose columns are the orthogonal latent vectors of $Q_{Z_1}$ corresponding to unit
latent roots.] The density (3.38) shows that integral moments of the distribution
exist up to order $K_3 - 1$: that is, in the case of 2SLS, $K_2 - 1$ (or the degree of
overidentification) and, in the case of OLS, $T - K_1 - 1$.
The result corresponding to (3.38) for the case of the LIML estimator is [see
Phillips (1982e) for the derivation]
13An example of this type of analysis for structural variance estimators is given in Section 3.7.
14In what follows it will often not be essential that both $\beta = 0$ and $\Pi_{22} = 0$ for the development of
the “leading case” theory. What is essential is that $\Pi_{22} = 0$, so that the structural coefficients are, in
fact, unidentifiable. Note that the reduced-form equations take the form
Thus, the exact sampling distribution of $\beta_{LIML}$ is Cauchy in this leading case.
In fact, (3.39) provides the leading term in the series expansion of the density of
LIML derived by Mariano and Sawa (1972) in the general case where $\beta \ne 0$ and
$\Pi_{22} \ne 0$. We may also deduce from (3.39) that $\beta_{LIML}$ has no finite moments of
integral order, as was shown by Mariano and Sawa (1972) and Sargan (1970).
This analytic property of the exact distribution of $\beta_{LIML}$ is associated with the
fact that the distribution displays thicker tails than that of $\beta_{IV}$ when $K_3 > 1$. Thus,
the probability of extreme outliers is in general greater for $\beta_{LIML}$ than for $\beta_{IV}$.
considered in greater detail in Sections 3.5 and 3.6.
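The tail comparison between the two leading-case densities can be checked numerically. The sketch below assumes that the leading-case IV density has the form $\mathrm{pdf}(r) = [B(\tfrac{1}{2}, K_3/2)(1 + r^2)^{(K_3+1)/2}]^{-1}$; this form is an assumption for the example, chosen because it is consistent with the beta function normalization and the moment condition quoted above, and because it reduces to the Cauchy density when $K_3 = 1$. The function names and step count are hypothetical.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b) via gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def leading_case_pdf(r, K3):
    """Assumed leading-case density: 1 / (B(1/2, K3/2) (1 + r^2)^{(K3 + 1)/2})."""
    return 1.0 / (beta_fn(0.5, K3 / 2.0) * (1.0 + r * r) ** ((K3 + 1) / 2.0))


def tail_probability(c, K3, steps=4000):
    """P(|r| > c), integrating after the substitution r = tan(theta)."""
    lo, hi = math.atan(c), math.pi / 2.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        # pdf(tan(theta)) * sec(theta)^2 simplifies to cos(theta)^(K3 - 1) / B
        total += math.cos(theta) ** (K3 - 1)
    return 2.0 * total * h / beta_fn(0.5, K3 / 2.0)


if __name__ == "__main__":
    for K3 in (1, 3, 6):
        print(K3, tail_probability(1.0, K3), tail_probability(3.0, K3))
```

With $K_3 = 1$ the routine returns the Cauchy tail probabilities, and the tail mass falls as $K_3$ increases, in line with the statement that the Cauchy (LIML) leading case has thicker tails than the IV leading case when $K_3 > 1$.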
3.5. The exact distribution of the IV estimator in the general single equation case
where the standardizing transformations are assumed to have been carried out.
This is the case where $H = [Z_1 : Z_3]$ is a matrix of $K_1 + K_3$ instruments used in
the estimation of the equation. To find the p.d.f. of $\beta_{IV}$ we start with the density
of the matrix:
pdf( A) =
in view of the relations (3.6) given above. Writing Z&SS’ZI,, as Fz21&2, where
r,, is an n X n matrix (which is non-singular since the structural equation (3.3) is
assumed to be identified), we find that
Moreover, since the non-zero latent roots of MM’A are the latent roots of
(3.41) becomes
etrj - ~(Z+/3~‘)E2,ir,)
pdf(w, r, A,,)
etr( - 5 (I + /3Z3’)E22~22}
X,,F,(+;{ ~w~22&3’~22+if,2(Z+/W)A22(Z+rj3’)~22])
pdf( w, r, B) =
f {wfl*,,pp~F*, + TT22(I+j3r’)(l+rr~)-“2B(I+rrt)-“2(~+r~~)~22}
(3.43)
15 An alternative approach to the extraction of the exact density of $\beta_{IV}$ from (3.42) is given in
Phillips (1980a, appendix B) and directly involves the algebra of expanding the zonal polynomial of a
sum of two matrices into a sum of more basic polynomials in the constituent matrices. This algebra
was developed by Davis (1980a, 1980b) and has recently been extended by Chikuse (1981) to matrix
multinomial expansions of zonal polynomials of the sum of several matrices.
X[(58’if;jadj~)if,,8)‘(det(Z+~))’L”’2t’
L+n+l L+n
X,F, -+ j; T(Z+W)1722
2 ’ 2
x(z+~r’)(z+rr’)-yz+r/3t)n22
(3.44)
exp{ -$(1+b2))r(T)
pdf( r) =
7T”2r
pdf( r) =
X,F, i F; 5; Sn,,(Z+j3r’)(Z+rr’)‘(Z+r/Y)IT22).
(3.46)
16 This is a generic term that I am using to denote zonal polynomials and more general polynomials
of this class but which may involve several argument matrices, as in the work of Davis (1980a, 1980b)
and Chikuse (1981).
(1) For comparable parameter values the marginal distributions of $\beta_{IV}$ appear to
concentrate more slowly as $T \to \infty$ when the number of endogenous variables
($n + 1$) in the equation increases.
(2) The marginal densities are particularly sensitive to the degree of correlation
in the concentration parameter matrix $\bar{M} = T\bar{\Pi}_{22}'\bar{\Pi}_{22}$ in (3.44). Setting, for
example,
$$\bar{M} = \mu^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$$
(3) The effect of increasing the number of endogenous variables, ceteris paribus,
in a structural equation is a decrease in the precision of estimation. This accords
with well-known results for the classical regression model.
(4) The marginal distribution of $\beta_{IV}$ displays more bias in finite samples as $L$,
the number of additional instruments used for the n right-hand-side endogenous
variables, increases in value. When L becomes small the distribution is more
centrally located about the true value of the parameter but also has greater
dispersion than when L is large.
As seen in (3.45) the general form of the joint density (3.44) can be specialized to
yield results which apply in the two endogenous variable case. These results were
first established independently by Richardson (1968) and Sawa (1969) for 2SLS
and OLS [to which (3.45) applies], by Mariano and Sawa (1972) for LIML, and
by Anderson and Sawa (1973) for k-class estimators. Moreover, as demonstrated
by Richardson and Wu (1970) and by Anderson (1976) the exact p.d.f.s for 2SLS
and LIML directly apply after appropriate changes in notation to the OLS and
orthogonal regression estimators of the slope coefficient in the errors in variables
model.
Details of the argument leading to the exact density of the 2SLS (or OLS)
estimator can be outlined in a few simple steps arising from (3.42) [see Phillips
(1982e) for details]. The final result is expression (3.45), obtained above as a
specialized case of the more general result in Section 3.5. Expression (3.45) gives
the density of $\beta_{2SLS}$ when $L = K_2 - 1$ and the density of $\beta_{OLS}$ when $L = T - K_1 - 1$.
An alternative method of deriving the density of $\beta_{2SLS}$ (or $\beta_{OLS}$) is given in
Phillips (1980b, appendix A), where the Fourier inversion [of the form (2.3)] that
yields the density is performed by contour integration.
Similar methods can be used to derive the exact densities of the LIML and
k-class estimators, $\beta_{LIML}$ and $\beta_{(k)}$. In the case of LIML the analysis proceeds as
for the leading case but now the joint density of sample moments is non-central
[see Phillips (1982e) for details]. This joint density is the product of independent
Wishart densities with different degrees of freedom ($K_2$ and $T - K$, respectively)
and a non-centrality parameter matrix closely related to that which applies in the
case of the IV estimator analyzed in Section 3.5. The parameterization of the joint
density of the sample moments upon which $\beta_{LIML}$ depends clarifies the key
parameters that ultimately influence the shape of the LIML density. These are the
(two) degrees of freedom, the non-centrality matrix, and the true coefficient
vector. For an equation with two endogenous variables the relevant parameters of
the LIML density are then: $K_2$, $T - K$, $\mu^2$, and $\beta$. The mathematical form of the
density was first derived for this case by Mariano and Sawa (1972).^17 The
parameterization of the LIML density is different from that of the IV density
given above. In particular, the relevant parameters of (3.45) are $L$, $\mu^2$, and $\beta$; or
in the case of 2SLS, $K_2$, $\mu^2$, and $\beta$. We may note that the IV density depends on
the sample size $T$ only through the concentration parameter $\mu^2$, as distinct from
the LIML density which depends on the sample size through the degrees of
freedom, $T - K$, of one of the underlying Wishart matrices as well as the
concentration parameter.
Similar considerations apply with respect to the distribution of the k-class
estimator, $\beta_{(k)}$. We see from (3.17) that for $k \neq 0, 1$ the p.d.f. of $\beta_{(k)}$ depends on
the joint density of two underlying Wishart matrices. The relevant parameters of
the p.d.f. of $\beta_{(k)}$ are then: $K_2$, $T - K$, $k$, $\mu^2$, and $\beta$. The mathematical form of this
(1) The distribution of $\beta_{2SLS}$ is asymmetric about the true parameter value,
except when $\beta = 0$ [the latter special case is also evident directly from expression
(3.45) above]. The asymmetry and skewness of the distribution increase as both $\beta$
and $K_2$ increase. For example, when $\beta = 1$, $\mu^2 = 100$, and $K_2 = 30$ the median of
the distribution is -1.6 (asymptotic) standard deviations from the true parameter
value, whereas at $K_2 = 3$ the median is -0.14 standard deviations from $\beta$. As $K_2$
becomes small the distribution becomes better located about $\beta$ (as the numbers
just given illustrate) but displays greater dispersion. Thus, at $\beta = 1$, $\mu^2 = 100$, and
$K_2 = 30$ the interquartile range (measured again in terms of asymptotic standard
deviations) is 1.031, whereas at $\beta = 1$, $\mu^2 = 100$, and $K_2 = 3$ the interquartile range
is 1.321. Table 3.1 illustrates how these effects are magnified as $\beta$ increases:^18
Table 3.1
Median (MDN) and interquartile range (IQR)
of $\beta_{2SLS} - \beta$ in terms of asymptotic
standard deviations ($\mu^2 = 100$)
1 2 5
(2) The rate at which the distribution of $\beta_{2SLS}$ (appropriately centered and
standardized) approaches normality depends critically on the values of $\beta$ and $K_2$.
If either (or both) of these parameters are large, then the approach to normality is
quite slow. At $\beta = 1$ and $K_2 = 3$, for example, the value of $\mu^2$ must be at least 100
to hold the maximum error on the asymptotic normal approximation to 0.05; but
18 The numbers in Tables 3.1 and 3.2 have been selected from the extensive tabulations in Anderson
and Sawa (1977, 1979) which are recommended to the reader for careful study. My thanks to
Professors Anderson and Sawa for their permission to quote from their tables.
when $K_2 = 10$, $\mu^2$ must be at least 3000 to ensure the same maximum error on the
asymptotic distribution.
(3) Since the exact distribution of $\beta_{LIML}$ involves a triple infinite series, Anderson
and Sawa (1977, 1979) tabulated the distribution of a closely related estimator
known as LIMLK. This estimator represents what the LIML estimator would
be if the covariance matrix of the reduced-form errors were known. In terms of
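(3.18), $\beta_{LIMLK}$ minimizes the ratio $\beta_\Delta' W \beta_\Delta / \beta_\Delta' \Omega \beta_\Delta$, where $\Omega$ is the reduced-form
error covariance matrix and satisfies the system $(W - \lambda_m \Omega)\beta_\Delta = 0$, where $\lambda_m$ is
the smallest latent root of $\Omega^{-1}W$. The exact distribution of $\beta_{LIMLK}$ can be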
obtained from the non-central Wishart distribution of W. Anderson and Sawa
(1975) give this distribution in the form of a double infinite series that is more
amenable to numerical computation than the exact distribution of LIML. In a
sampling experiment Anderson et al. (1980) investigated the difference between
the LIML and LIMLK distributions and found this difference to be very small
except for large values of $K_2$. Anderson (1977) also showed that expansions of the
two distributions are equivalent up to terms of $O(\mu^{-3})$. These considerations led
Anderson and Sawa to take LIMLK as a proxy for LIML in analyzing the small
sample properties of the latter and in the comparison with 2SLS. They found the
central location of LIMLK to be superior to that of 2SLS. In fact, LIMLK is
median unbiased for all $\beta$ and $K_2$. Moreover, its distribution (appropriately
centered and standardized) approaches normality much faster than that of 2SLS.
However, LIMLK displays greater dispersion in general than 2SLS and its
distribution function approaches unity quite slowly. These latter properties result
from the fact that LIMLK, like LIML, has no integral moments regardless of the
sample size and its distribution can therefore be expected to have thicker tails
than those of 2SLS. Table 3.2 [selected computations from Anderson and Sawa
(1979)] illustrates these effects in relation to the corresponding results for 2SLS in
Table 3.1.^19
Table 3.2
Median (MDN) and interquartile range (IQR) of $\beta_{LIMLK} - \beta$
in terms of asymptotic standard deviations ($\mu^2 = 100$)

K_2         beta = 1   beta = 2   beta = 5
 3   MDN    0          0          0
     IQR    1.360      1.357      1.356
30   MDN    0          0          0
     IQR    1.450      1.394      1.363
19 We note that since $\beta_{LIMLK}$ depends only on the non-central Wishart matrix $W$ with degrees of
freedom $K_2$, the distribution of $\beta_{LIMLK}$ depends on the sample size $T$ only through the concentration
parameter $\mu^2$, unlike the distribution of $\beta_{LIML}$.
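One way to read the interquartile ranges quoted above is to set them against the interquartile range of the standard normal law, which is what the asymptotic theory would deliver in these units. The sketch below does this for the Table 3.2 entries and for the two 2SLS values quoted in the text at $\beta = 1$; the dictionary layout and variable names are illustrative assumptions.

```python
from statistics import NormalDist

# IQR of N(0, 1): the benchmark implied by the asymptotic normal theory.
normal_iqr = 2.0 * NormalDist().inv_cdf(0.75)

# IQR values keyed by (K_2, beta), all at mu^2 = 100.
limlk_iqr = {(3, 1): 1.360, (3, 2): 1.357, (3, 5): 1.356,
             (30, 1): 1.450, (30, 2): 1.394, (30, 5): 1.363}   # Table 3.2
tsls_iqr = {(3, 1): 1.321, (30, 1): 1.031}                      # quoted in the text

print(f"normal IQR = {normal_iqr:.3f}")
for (k2, beta), iqr in sorted(limlk_iqr.items()):
    print(f"LIMLK K2={k2:2d} beta={beta}: IQR / normal IQR = {iqr / normal_iqr:.3f}")
for (k2, beta), iqr in sorted(tsls_iqr.items()):
    print(f"2SLS  K2={k2:2d} beta={beta}: IQR / normal IQR = {iqr / normal_iqr:.3f}")
```

On these figures the LIMLK interquartile ranges sit a little above the normal benchmark of about 1.349, while the 2SLS value at $K_2 = 30$ is well below it, in line with the remarks on dispersion in the text.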
These features of the exact small sample distributions of 2SLS and LIMLK
give rise to the following two conclusions reported by Anderson (1982): (a) the
distribution of $\beta_{2SLS}$ may be badly located and skewed unless $\beta$ and $K_2$ are small
or $\mu^2$ is very large; and (b) the approach to the asymptotic normal distribution is
slow for 2SLS and rapid for LIMLK and, apparently, LIML. Thus, in many cases
the asymptotic normal theory may be a fairly adequate approximation to the
actual distribution of LIML but a less than adequate approximation to the
distribution of 2SLS.
These conclusions clearly suggest the use of caution in the application of
asymptotic theory and thereby agree with the results of many other studies. One
additional point is worthy of mention. The above exact results and reported
numerical experience refer to the standardized model as discussed in Section 3.3.
When we referred to the true coefficient $\beta$ above, we therefore meant the true
standardized coefficient [as given by $\beta^*$ in expression (3.25) of Theorem 3.3.1].
But we note that the correlation between the included endogenous regressor, $y_{2t}$,
and the structural error, $u_t$, in the unstandardized model is a simple function of
$\beta^*$, namely $\mathrm{corr}(y_{2t}, u_t) = -\beta^*/(1 + \beta^{*2})^{1/2}$ as given by (3.33) in the general case.
Thus, as the modulus of the standardized coefficient, $|\beta^*|$, increases, the correlation
between $y_{2t}$ and $u_t$ increases in absolute value. We therefore need, ceteris paribus, a larger
sample of data to effectively purge $y_{2t}$ of its correlation with $u_t$ in estimation by
2SLS (or more generally IV). This correlation is explicitly taken into account
when we estimate by LIMLK (or LIML), since we directly utilize the reduced-form
error covariance matrix (or an estimate of it) in this procedure. Thus, it may not
be too surprising that the finite sample distribution of $\beta_{2SLS}$ displays a far greater
sensitivity to the value (particularly large values) of $\beta^*$ than does the distribution
of LIML, as the computations in Tables 3.1 and 3.2 illustrate.
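The correlation formula quoted above is easy to tabulate. The following sketch evaluates it at the three coefficient values that head the columns of Tables 3.1 and 3.2; the function name is an illustrative assumption.

```python
import math

def corr_y2_u(beta_star: float) -> float:
    """Correlation between the endogenous regressor and the structural
    error as a function of the standardized coefficient beta*."""
    return -beta_star / math.sqrt(1.0 + beta_star ** 2)

for beta_star in (1.0, 2.0, 5.0):
    print(f"beta* = {beta_star:.0f}: corr = {corr_y2_u(beta_star):.4f}")
```

The modulus of the correlation rises from roughly 0.71 at $\beta^* = 1$ to roughly 0.98 at $\beta^* = 5$, which gives a sense of how much simultaneity 2SLS has to remove in the right-hand columns of the tables.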
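alternative classes of estimator for the structural variance, $\sigma^2$, of the errors in
(3.3):
$$G_2(\beta) = \beta_\Delta' X' Q_Z X \beta_\Delta = \beta_\Delta'[A(I) - A(P_Z)]\beta_\Delta, \qquad (3.48)$$
and
$$Q(\beta) = G_1(\beta) - G_2(\beta) = \beta_\Delta' X'(P_Z - P_{Z_1}) X \beta_\Delta = \beta_\Delta' A(P_Z)\beta_\Delta. \qquad (3.49)$$ ^20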
20 In the case of estimation by IV (with instrument matrix $H$) it will sometimes be more appropriate
to consider the following quadratic form instead of (3.49):
In particular, the exact density of $\sigma_{IV}$ in standardized form and in the leading
case is given by [see Phillips (1982e) for derivations]
Pdf( 01” z
j=O
(3.52)
21 As pointed out by Anderson and Rubin (1949, p. 61) their method was independently suggested
by Bartlett (1948).
other hand, when $H_0$ is not true, $y^*$ will [in view of the reduced form (3.5)] be a
linear function of both $Z_1$ and $Z_2$. Thus, $H_0$ may be tested by a conventional
F-test of the hypothesis that the coefficient vector of $Z_2$ is zero in the regression
of $y^*$ on $Z_1$ and $Z_2$. The statistic for this test takes the usual form of
$$F = \frac{T - K}{K_2}\,\frac{y^{*\prime}(Q_{Z_1} - Q_Z)y^*}{y^{*\prime}Q_Z y^*} \qquad (3.53)$$
and has an $F_{K_2, T-K}$ distribution under $H_0$. When $H_0$ is false, the denominator of
(3.53) is still proportional to a $\chi^2_{T-K}$ variate while the numerator becomes
non-central $\chi^2_{K_2}$ with the non-centrality dependent upon the vector of coefficient
inaccuracy under the null, $\beta - \beta_0$, and a subset of the reduced-form parameters.
Thus (3.53) is non-central $F_{K_2, T-K}$ under the alternative hypothesis, $\beta \neq \beta_0$. This
test can readily be extended to accommodate hypotheses that involve exogenous
variable coefficients and even (under suitable conditions) coefficients from several
structural equations. The common requirement in each version of the test is that
all structural coefficients pertaining to endogenous variables be specified under
the null. This requirement ensures that the model can be rewritten, as above, as a
multiple (or multivariate) regression when the null hypothesis holds. The test
based on (3.53) is consistent and its power function was considered briefly by
Revankar and Mallela (1972). Confidence regions follow from (3.53) in the usual
way as the set of all $\beta$ satisfying the inequality
are lower than those for the t-distribution (unless $\beta = 0$), so that confidence levels
will be overstated and levels of significance will be understated if the t-distribu-
tion is used as an approximation in constructing confidence intervals or in
two-sided tests.
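Returning to the F-statistic (3.53), its computation can be sketched in a few lines. The sketch assumes NumPy is available; `y_star` stands for the vector written $y^*$ in the text (taken here as already constructed under the null), and the simulated data, dimensions, and function names are hypothetical choices made only for illustration.

```python
import numpy as np

def annihilator(Z):
    """Q_Z = I - Z (Z'Z)^{-1} Z', the residual-maker for the columns of Z."""
    T = Z.shape[0]
    return np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

def anderson_rubin_f(y_star, Z1, Z2):
    """F-ratio of the form (3.53) for the hypothesis that Z2 has a zero
    coefficient vector in the regression of y_star on Z1 and Z2."""
    Z = np.hstack([Z1, Z2])
    T, K = Z.shape
    K2 = Z2.shape[1]
    Q_Z1, Q_Z = annihilator(Z1), annihilator(Z)
    numerator = y_star @ (Q_Z1 - Q_Z) @ y_star
    denominator = y_star @ Q_Z @ y_star
    return (T - K) / K2 * numerator / denominator

# Hypothetical data generated with the null true: y_star depends on Z1 only.
rng = np.random.default_rng(0)
T = 40
Z1 = np.column_stack([np.ones(T), rng.normal(size=T)])
Z2 = rng.normal(size=(T, 3))
y_star = Z1 @ np.array([1.0, 0.5]) + rng.normal(size=T)

F = anderson_rubin_f(y_star, Z1, Z2)
print(f"F = {F:.3f} with ({Z2.shape[1]}, {T - 5}) degrees of freedom")
```

The projection form is used because it mirrors (3.53) term by term; it is algebraically the same as the familiar comparison of restricted and unrestricted residual sums of squares.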
Their analysis can be illustrated by considering the IV estimator, $\beta_{IV}$, in the
two endogenous variable case and for the leading null hypothesis $\beta = 0$, $\Pi_{22} = 0$
of Section 3.4. The Richardson-Rohr structural t-statistic is given by
t= (Y;w~Y*Y2PIv/~, (3.55)
where $s^2 = Q(\beta_{IV})/(K_3 - 1)$. Simple manipulations show that this has a Student’s
t-distribution with $K_3 - 1$ degrees of freedom. In the 2SLS case that is considered
by Richardson and Rohr (1971), $K_3 - 1 = K_2 - 1$ = degree of overidentification of
the structural equation and it is assumed that $K_2 - 1 \geq 1$.
An interesting experimental investigation that bears on this test has been
reported by Maddala (1974). Maddala studied the power functions of the
Dhrymes-Richardson-Rohr (DRR) statistic, the Anderson-Rubin (AR) statistic,
and the conventional t-ratio statistic (corresponding to what would be justified if
the equation were a classical linear regression and estimation were by OLS). For
the model and parameter values used by Maddala, he found that the DRR test
had very low power in comparison with the AR and conventional test. This
outcome is partially explained by Maddala in terms of the different structural
variance estimators that are used in the various test statistics. He argues, in
particular, that the DRR statistic involves a variance estimator based on $Q(\beta)$ in
(3.49). This estimator relies on linear forms in the data such as $Z'X$ and does not
involve the sample second moments $X'X$ directly, as do the more conventional
estimators $\sigma_{IV}$ and $\sigma_{LIML}$ in (3.50)-(3.51). To this extent they neglect useful
sample information about the error variance and this is reflected in the observed
low power of the DRR test in comparison with the conventional tests.
$$H_0: \gamma_2 = 0. \qquad (3.57)$$
in the literature and are referred to under the name of identifiability test statistics.
The most common of these arise naturally in 2SLS and LIML estimation. Their
construction relies on the quadratic forms (3.47)-(3.49) studied in connection
with structural variance estimators. Explicitly we have
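$$\phi_{2SLS} = \frac{G_1(\beta_{2SLS}) - G_2(\beta_{2SLS})}{G_2(\beta_{2SLS})} = \frac{Q(\beta_{2SLS})}{G_2(\beta_{2SLS})}, \qquad (3.58)$$
$$\phi_{LIML} = \frac{G_1(\beta_{LIML}) - G_2(\beta_{LIML})}{G_2(\beta_{LIML})} = \frac{Q(\beta_{LIML})}{G_2(\beta_{LIML})}. \qquad (3.59)$$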
$$T\phi \geq \chi^2_{K_2 - n}(\alpha), \qquad (3.60)$$
where $\alpha$ is the chosen significance level and $\phi$ denotes either $\phi_{2SLS}$ or $\phi_{LIML}$.
As an approximate finite sample test, Anderson and Rubin (1949) suggested
the alternative critical region:
(3.61)
This may be justified on the argument that for fixed $\beta$ in (3.59) the ratio
$(T - K)\phi/K_2$ is indeed distributed as $F_{K_2, T-K}$ [compare (3.53) above]. Basmann
(1960) criticized this suggestion on the grounds that as $T \to \infty$
whereas
$$K_2 F_{K_2, T-K} \to \chi^2_{K_2}.$$
He also argued that these considerations suggested an adjustment to the numera-
tor degrees of freedom in the F-ratio and, as a result, the alternative critical
regions:
(3.62)
(3.63)
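The arithmetic linking the identifiability statistic to its two reference forms can be sketched as follows. The code takes the quadratic forms $G_1$ and $G_2$ as given numbers and returns $\phi$ as in (3.58)-(3.59), the product $T\phi$ used in the asymptotic region (3.60), and the ratio $(T - K)\phi/K_2$ referred to in the text. The input values are hypothetical, and critical values are left to the user's tables since none are computed here.

```python
def identifiability_statistics(G1, G2, T, K, K2):
    """Return phi = (G1 - G2) / G2 together with the two scaled forms
    discussed in the text: T * phi and (T - K) * phi / K2."""
    phi = (G1 - G2) / G2
    return {
        "phi": phi,
        "T_phi": T * phi,                 # compared with a chi-square point
        "F_form": (T - K) * phi / K2,     # compared with an F point
    }

# Hypothetical values of the quadratic forms and dimensions.
stats = identifiability_statistics(G1=12.5, G2=10.0, T=50, K=6, K2=4)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Keeping the two scalings side by side makes it easy to see how far apart the decisions implied by the alternative critical regions can be in a small sample.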
M=$PE(X’)(Pz-P&(X). (3.64)
After standardizing transformations have been carried out and when the null
hypothesis is correct, this becomes:
M=T (3.65)
Thus, under the null $\bar{M}$ has one zero latent root and generally $n$ non-zero roots.
When the null is false, the simpler form of $\bar{M}$ in (3.65) no longer holds and (3.64)
normally has rank n + 1 rather than n. Thus, the true power functions of tests
such as (3.60), (3.61), or (3.63) depend on the values of these non-zero latent
roots. Rhodes (1981) investigates the actual size and power of these tests for a
selected set of latent roots of M and finds that when the non-zero roots are small
(less than 10) the true size of each test is very poorly represented by the nominal
level of significance. To relate these results to those of Basmann (1960) reported
above, Rhodes calculated the non-zero latent roots of the relevant non-centrality
matrix for Basmann’s experiment and found the roots to be large, explaining in
part why (3.63) proved to be quite accurate in those experiments.
Since the exact distribution of $\phi_{LIML}$ is not amenable to computation, some
steps have been taken to provide improvements on the critical regions (3.60),
(3.61), and (3.63). McDonald (1974) obtained an approximate F distribution for
$\phi_{LIML}$ by selecting parameters for the former in order that the first two moments
of the distributions would be the same. Rhodes (1981) developed an alternative
critical region for the test by considering the conditional distribution of $\phi_{LIML}$
given the other roots of the LIML determinantal equation. In particular, this
conditional distribution has a simple asymptotic form as the largest $n$ latent roots
of $\bar{M}$ tend to infinity and can be used for the computation of a new critical region
for a test based on $\phi_{LIML}$ and for power function evaluations. It has the
advantage (over the conventional asymptotic and other tests we have discussed)
of incorporating more sample information, and preliminary experimental results
in Rhodes (1981) indicate that it may provide a more accurate critical region for
the identifiability test.
In comparison with the analytic results reviewed in previous sections for single
equation estimators and test statistics, much less is known about the distribution
of full systems estimators, reduced-form coefficient estimators, and their associ-
ated test statistics. Most progress has in fact been made in the application of
small sample asymptotics by the use of Edgeworth expansions. Here the theory
and constructive process detailed in Phillips (1982e) are directly applicable and
machine programmable for both structural and reduced-form coefficient estima-
tors. We will consider the analytic results for the two groups of coefficients
separately below.
Some manageable formulae for the first correction term of $O(T^{-1/2})$ in the
Edgeworth expansion have been obtained by Sargan (1976a, appendix C) for
3SLS and FIML systems estimators. But no work is presently available to shed
light on the adequacy of these approximations. What we know of their performance
in the case of single equation estimators^22 suggests that their adequacy (at
least for 3SLS estimation) will deteriorate as certain equations in the system
become heavily overidentified. It also seems clear that the size of the system will
have an important bearing in this respect, given other relevant factors such as the
sample size, reduced-form parameter values, and features of the exogenous series.
Some evidence which relates to this issue is available in Phillips (1977c), who
developed formulae for the Edgeworth expansion of two-stage Aitken estimators
of the parameters in a linear multivariable system subject to general linear
cross-equation restrictions.^23 These formulae show that to terms of $O(T^{-1})$ the
finite sample distribution is a rescaled version of the exact distribution of the
Aitken estimator. This scaling factor depends on the moments of the estimated
error covariance matrix and the sample second moments of the exogenous
variables. As the number of equations in the system increases, the scale generally
changes in such a way that the dispersion of the distribution increases. This
corresponds with exact results obtained by Kataoka (1974) for a somewhat
simpler version of this model and squares with the intuition that as the precision
of our error covariance estimator decreases (through reductions in the effective
degrees of freedom) the sampling dispersion of the resulting two-stage coefficient
estimator increases. These results for the multivariate linear model furnish inter-
esting conjectures for systems estimation in the SEM. Finally in this connection,
we may mention that Nagar-type approximating moments may be deduced from
the Edgeworth formulae [see Phillips (1982e)]. Such approximating moments, or
pseudo-moments (where this term is appropriate), were derived independently for
the 3SLS structural coefficient estimator by Mikhail (1969) in doctoral dissertation
work at the London School of Economics.
In addition to the approximate distribution theory discussed above some
progress on a leading case analysis for systems estimation along the lines of
Section 3.4 is possible. The principles may be illustrated by considering FIML
applied to a two-equation system of the form (3.1) with
$$B = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix} \qquad (3.66)$$
22 See Anderson and Sawa (1979), Holly and Phillips (1979), and Richardson and Rohr (1981). An
attempt to tackle this problem by asymptotic expansions in which the degree of overidentification
grows large is given by Morimune (1981).
23 Recent work in the same framework has been published by Maekawa (1980) for t ratio type test
statistics.
In an important article, McCarthy (1972) initiated the analytic study of the finite
sample properties of restricted reduced-form (RRF) coefficient estimators and
associated predictions. The RRF incorporates additional information that is
embodied in overidentifying restrictions on the structural equations of the system.
To the extent that RRF estimators utilize this information, they were thought for
many years to possess higher asymptotic efficiency and, as a result, smaller
Theorem 3.9.1
If B = J/(p)/+(p), where p is a random n-vector and /!t is a scalar function of p
and there exists a J+, in the domain of definition of p such that:
24 Dhrymes (1973) showed that this ranking in terms of asymptotic efficiency does not hold for RRF
estimators, such as 2SLS, which are not fully efficient.
From the criterion function (3.67) it is clear that low probabilistic weight will be
attached to events in $\theta$ space which imply large values of $\Pi$ since the latter will
normally imply large values for the criterion (3.67). This will not be the case as
the columns of $Z$ become highly collinear or more generally when the complete
data matrix $T^{-1}W'W$ is close to singularity. Thus, we might expect the FIML
reduced-form $\Pi_{FIML} = \Pi(\theta_{FIML})$ to possess finite moments provided $T$ is large in
relation to the number of variables, $n + K$, in the system. In fact, Sargan (1976b)
proves that $\Pi_{FIML}$ has finite moments of integral order up to $T - n - K$.
The fact that many reduced-form estimators possess no integral moments has
led to the suggestion of improved estimators which combine URF and RRF
estimators in such a way that the tail behavior of the combined estimator is
improved. A fundamental contribution in this area is due to Maasoumi (1978).^25
Maasoumi develops a new reduced-form estimator which combines the corre-
sponding restricted 3SLS and the unrestricted OLS estimators. The new estimator
incorporates the outcome of an asymptotic $\chi^2$ test of the model’s overidentifying
restrictions and thereby opens up a middle road of methodology that lies between
completely unrestricted and fully restricted estimation. Specifically, Maasoumi
proposes the following estimator:
(3.68)
25 More recent work by Maasoumi (1981) dealing with generic reduced forms that allow for
reduced-form estimation in the light of intrinsically uncertain structural information is also pertinent
to this discussion. Nagar pseudo-moment expansions for 3SLS reduced-form coefficients have also
been developed in Maasoumi (1977).
where
(3.69)
(3.71)
where $I_{(\cdot)}$ is an indicator function equal to unity when $\phi$ is in the indicated range
and equal to zero otherwise. This estimator differs from the traditional Stein-like
variety in that it takes the unrestricted estimate $\Pi_{OLS}$ as the point of attraction
rather than simply the origin.
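The idea of a combined estimator whose point of attraction is the unrestricted estimate can be illustrated with a minimal sketch. The exact weighting in (3.68)-(3.71) is not reproduced here: the weight used below (unity when the test statistic does not exceed the critical level, and the ratio of the critical level to the statistic otherwise) is a hypothetical stand-in chosen only to show the mechanics, and all names and numbers are illustrative.

```python
def combined_reduced_form(pi_restricted, pi_unrestricted, phi, c_alpha):
    """Blend a restricted and an unrestricted reduced-form estimate.

    Hypothetical weighting rule (an assumption, not the chapter's formula):
    lam = 1 if phi <= c_alpha, else c_alpha / phi, so that large values of
    the test statistic pull the result towards the unrestricted estimate.
    """
    lam = 1.0 if phi <= c_alpha else c_alpha / phi
    return [
        [lam * r + (1.0 - lam) * u for r, u in zip(row_r, row_u)]
        for row_r, row_u in zip(pi_restricted, pi_unrestricted)
    ]

# Hypothetical 1 x 2 coefficient matrices.
pi_3sls = [[1.0, 2.0]]
pi_ols = [[3.0, 6.0]]

print(combined_reduced_form(pi_3sls, pi_ols, phi=5.0, c_alpha=10.0))   # restrictions retained
print(combined_reduced_form(pi_3sls, pi_ols, phi=40.0, c_alpha=10.0))  # pulled towards OLS
```

Under this stand-in rule the estimate moves continuously towards the unrestricted value as the statistic grows, which is the qualitative behaviour described in the text for an estimator attracted to $\Pi_{OLS}$ rather than to the origin.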
The finite sample and asymptotic properties of the combined estimator $\Pi^*$ are
investigated in Maasoumi (1978).^29 It is shown that $\Pi^*$ has finite integral
moments to the order $T - n - K$ (as for the FIML reduced-form discussed earlier in
this section) and that the limiting distribution of $\sqrt{T}(\Pi^* - \Pi)$ is close to that of
$\sqrt{T}(\Pi_{3SLS} - \Pi)$ for conventional choices of the significance level $C_\alpha$. Thus, $\Pi^*$
has close to asymptotic equivalence with $\Pi_{3SLS}$ and has apparently superior small
sample properties in terms of outlier elimination. Practical implementation of the
method is as straightforward as 3SLS. What remains problematic is the selection
26 See, for example, James and Stein (1961), Zellner and Vandaele (1975), and Chapter 10 in this
Handbook by Judge and Bock.
27 See Goldberger (1973).
28 See Zellner (1978) and Zellner and Park (1979).
29 The finite sample properties of Stein-like improved estimators in the context of the linear
regression model have been studied by Ullah (1974, 1980).
of the critical level, $C_\alpha$. The statistic $\phi$ in (3.70) has a limiting $\chi^2_N$ distribution,
where N is the total number of overidentifying restrictions. Even in moderately
sized models, N may be quite large and strict application of the test based on
(3.70) at conventional significance levels usually leads to a rejection of the
restrictions. Thus, frequent occurrence of $\Pi^* = \Pi_{OLS}$ in practical situations might
be expected and this might raise the very genuine objection to the combined
estimator that it will frequently result in the extreme alternative of unrestricted
reduced-form estimation by OLS. This criticism should be tempered by the
knowledge that the critical value, $C_\alpha$, will often be a very poor (asymptotically
based) indicator of the correct finite sample critical value for a test with a chosen
size of $\alpha$. Monte Carlo results by Basmann (1960), Byron (1974), Maddala (1974),
Basmann, Richardson and Rohr (1974), Maasoumi (1977), Laitinen (1978),
Meisner (1979), Hausman and McFadden (1981), and Rhodes and Westbrook
(1982) all indicate that many conventional asymptotic tests of restrictions lead to
an unduly high rate of rejection (that is often severe) in small sample situations.
This evidence suggests that conventional asymptotic tests are often not suffi-
ciently reliable to justify the extreme alternative of completely unrestricted
reduced-form estimation. It would therefore seem wise in the light of this evidence
to set the size of the test at a level much lower than usual so that the implied
(asymptotic) critical value, $C_\alpha$, is larger and the probability of a test rejection
reduced. The problem of the most appropriate selection of $C_\alpha$ for a given model,
data set, and limited knowledge about the exact distribution of $\phi$ clearly warrants
substantially more attention than it has received. Mechanical correctors to the
asymptotic critical region $(C_\alpha, \infty)$ can be based on Edgeworth expansions along
the lines of Section 2.3 and this is an area of extensive current research in
mathematical statistics. However, little is known at present concerning the
adequacy of such corrections.
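The effect of lowering the nominal size on the asymptotic critical value can be sketched numerically. To stay within the Python standard library, the sketch uses the Wilson-Hilferty approximation to chi-square percentage points in place of exact values; that approximation, the choice of $N = 20$ restrictions, and the function name are all assumptions made for illustration.

```python
from statistics import NormalDist

def chi2_critical_value(alpha: float, df: int) -> float:
    """Approximate upper-alpha point of a chi-square(df) variate using the
    Wilson-Hilferty cube-root normal approximation (not an exact quantile)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3

# Hypothetical model with N = 20 overidentifying restrictions.
for alpha in (0.10, 0.05, 0.01, 0.001):
    c = chi2_critical_value(alpha, df=20)
    print(f"size {alpha:5.3f}: approximate critical value {c:6.2f}")
```

Moving from a 5 percent to a 0.1 percent nominal size raises the approximate critical value appreciably, which is the sense in which a much smaller size makes rejection of the restrictions, and hence the move to unrestricted estimation, less frequent.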
In addition to the above work on reduced forms, attention has also been given
in the literature to the partially restricted reduced-form (PRRF) suggested by
Amemiya (1966) and Kakwani and Court (1972). The PRRF coefficients can be
obtained equation by equation from relationships such as
(3.72)
[deduced from (3.6) above] which relate the reduced-form coefficients of one
(relevant) equation to those of other equations in terms of the identifying
restrictions. The PRRF estimator of the coefficients in the first reduced-form
equation [given by the left-hand side of (3.72)] is then
[I[ II I
”
30 A similar analysis of higher order terms (in Edgeworth expansions) of the distributions of
conventional test statistics can be performed. Much work has already been done on this topic in
mathematical statistics leading to some general results on the higher order efficiency of tests based on
maximum likelihood estimators. See, for example, Pfanzagl and Wefelmeyer (1978, 1979).
family in terms of the minimum expected loss (MELO) criterion whereby the
posterior expectation of a weighted quadratic loss function is minimized with
respect to the structural coefficients. Both Nagar and Zellner-Park reported
applications of their improved estimators in the context of small macroeconomet-
ric models. Zellner and Park found in their application that the (asymptotic)
standard errors of the MELO estimates were consistently smaller and often much
smaller than their 2SLS counterparts.
Alternative estimators constructed by taking linear combinations of 2SLS with
OLS and 2SLS with LIML were proposed by Sawa (1973a, 1973b) and Morimune
(1978),^31 respectively. The weights in these combined estimators were selected so
as to remove the bias (or pseudo-bias when this is appropriate) in the estimator
up to terms of $O(\sigma^2)$, where $\sigma^2$ is a scalar multiple of the covariance matrix of the
errors in the model. That is, the improvements were based on the use of small-$\sigma$
asymptotic expansions (see footnote 5 in Section 2). Sawa (1973b) numerically
computed the first two exact moments of the combined 2SLS-OLS estimator but
no clear conclusion concerning its superiority over 2SLS emerged from these
computations. Morimune (1978) examined the (asymptotic) mean squared error^32
of the 2SLS-LIML combination and demonstrated its superiority over LIML
according to this criterion. In the context of systems estimation related work has
been done by Maasoumi (1980) on a ridge-like modification to the 3SLS estima-
tor.
Fuller (1977) introduced modifications to the LIML and fixed k-class estima-
tors which ensure that the new estimators possess finite moments. The modifica-
tions add weight to the denominators in the matrix ratios that define the
unmodified estimators. Their generic form, in the notation of Section 3.2 above, is
as follows:
where
$$\det[A(I) - \lambda\{A(I) - A(P_Z)\}] = 0 \qquad (3.76)$$
Earlier work was done by Mariano (1973) who covered the 2SLS (k = 1) case
for even-order moments and by Hatanaka (1973) who gave sufficient conditions
for existence. Sawa (1972) dealt with the two endogenous variable case, established
the above result, and further demonstrated that $\beta_{(k)}$ has no integral
moments when $k > 1$. Sawa also gave exact formulae for the first two moments
when $0 \leq k \leq 1$ and developed asymptotic expansions for them in terms of the
reciprocal of the concentration parameter, namely $1/\mu^2$. Similar formulae were
derived by Takeuchi (1970) for OLS, 2SLS, and IV estimators in the two
endogenous variable case.^33 Ullah and Nagar (1974) gave analytic formulae for
the mean of the 2SLS estimator and their results were used by Sahay (1979) in
finding an expression for the mean of the 2SLS structural equation residuals.
Extending this work to the general single equation case (with n and K, arbitrary),
Hillier and Srivastava (1981) and Kinal (1982) have derived exact formulae for
the bias and mean squared error of the OLS and 2SLS estimator of a single
endogenous variable coefficient. This generalizes the work of Sawa (1972).
Unfortunately, the presence of zonal-type polynomials in the final formulae
prevents their use for numerical computations in the general single equation case,
at least with present-day tabulations and algorithmic machinery (see the discus-
sion of this point in Section 3.5 above).
Before leaving this topic it may be worth mentioning that moments are useful
to the extent that they shed light on the distribution itself. In particular, they
provide summary information about the location, dispersion, and shape of a
distribution. However, as many of the cases that are analyzed in Sections 3.5 and
3.6 attest, an important feature of many exact distributions in econometrics is
their asymmetry. Obviously, moment analyses of higher order than the second are
necessary to inform on such aspects of the shape of a distribution. In some cases,
of course, such higher order moments may not exist. When they do, the formulae
will often be as complicated as the series expressions for the p.d.f.s themselves.
Considerations of research strategy therefore indicate that it may well be wise to
direct most of our attention to the distributions, their numerical evaluation, and
their approximation rather than that of the moments.34
Finally, we remark that direct results concerning the moments of estimated
coefficients of the exogenous variables in a structural equation can be deduced
from the relevant formulae given in Section 3.3 above and the results for the
coefficients of the included endogenous variables. Thus, in the case of the IV
estimator we have
33As reported by Maasoumi and Phillips (1982a) there appear to be errors in his expression arising
out of his formulae (2-7) and (2-8) which appear to confuse even- and odd-order moments.
34The issues raised in this section have an obvious bearing on Monte Carlo experimentation, where
it is customary to work with summary measures defined in terms of low order moments. Caution in
the use of such methods has been advised by several authors, for example Basmann (1961) and
Maasoumi and Phillips (1982a). Problems of accurately estimating high order moments by Monte
Carlo replications (and the demands this may place on the experimental design) are apposite here but
seem not to have been discussed in the literature in this field.
3.12. Misspecification
Earlier results in this section have all been obtained on the presumption that the
model has been correctly specified. When this is not so, the sampling distributions
undergo modifications contingent upon the nature and extent of misspecification
and earlier conclusions about estimator and test statistic performance in small
samples no longer necessarily apply. Fisher (1961, 1966, 1967) carried out an
asymptotic analysis of estimator performance in the presence of specification
error consisting of incorrectly excluded variables in structural equations such as
(3.3). An exact small sample theory can also be developed for this problem using
the approach of Sections 3.5 and 3.6. We illustrate by considering OLS and 2SLS
estimation of the (incorrectly specified) structural equation (3.3) when the true
equation includes additional exogenous variables and is of the form
(3.79)
(3.83)
with FF' = Z_2 Z_2' in the case of 2SLS and FF' = I − T^{-1} Z_1 Z_1' for OLS, where F is
a T × f matrix of rank f and F'F = T I_f. Formulae for the exact densities of β_OLS
and β_2SLS in the general case are then obtained by arguments which closely follow
those of Section 3.5, as shown by Maasoumi and Phillips (1982b). For the two
endogenous variable case we obtain [see Phillips (1982e) for derivations]:
pdf( r) =
h=O
(3.85)
This expression gives the exact density under misspecification of β_OLS when
f = T − K_1 and of β_2SLS when f = K_2. The density reduces to (3.45) when the
structural equation is correctly specified (γ_4 = 0) as can be shown by rearrangement
of the series.
Formula (3.85) was derived by Rhodes and Westbrook (1981)35 and formed the
basis of the computational work reported in their paper. These numerical computations
provide valuable evidence concerning the practical consequences of misspecification.
Two principal results emerge from their study: misspecification can
substantially increase the concentration of the distribution of both OLS and
2SLS; and in some cases it may also reduce the bias (as well as the dispersion) of
both estimators. These results led Rhodes and Westbrook to conclude that, when
a structural equation is misspecified by incorrectly excluded variables, OLS may
indeed be a superior technique of estimation to 2SLS.
The same general conclusion was reached by Hale, Mariano and Ramage
(1980) who examined exact and approximate asymptotic expressions for the bias
and mean squared error (MSE) of k-class estimators (for k non-stochastic in the
interval 0 ≤ k ≤ 1). Their results, which also refer to the two endogenous variable
case, show that OLS is relatively insensitive to specification error and that when
errors of specification are a more serious problem than simultaneity, OLS is
preferable to 2SLS. Moreover, the entire k-class is dominated in terms of MSE
under misspecification by either OLS or 2SLS.
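The comparison between OLS and 2SLS under an incorrectly excluded exogenous variable can be explored numerically. The Python sketch below is a small Monte Carlo harness under a purely hypothetical design: one right-hand-side endogenous variable, three exogenous variables used as instruments, and a true structural equation that also contains the first instrument with coefficient `gamma4`. It is not the design of Rhodes and Westbrook or of Hale, Mariano and Ramage, and all names and parameter values are assumptions. Quantile-based summaries are used so that the comparison does not rely on the existence of moments.

```python
import numpy as np

def simulate(gamma4, beta=1.0, rho=0.5, T=100, reps=500, seed=0):
    """Return arrays of OLS and 2SLS estimates of beta over `reps` samples.

    The fitted equation is y1 = beta * y2 + error, whereas the data are
    generated with the extra term gamma4 * z4 (z4 is the first instrument).
    """
    rng = np.random.default_rng(seed)
    ols, tsls = np.empty(reps), np.empty(reps)
    for i in range(reps):
        Z = rng.standard_normal((T, 3))                  # exogenous variables
        u = rng.standard_normal(T)                       # structural error
        v = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(T)
        y2 = Z.sum(axis=1) + v                           # reduced form for y2
        y1 = beta * y2 + gamma4 * Z[:, 0] + u            # true equation
        ols[i] = (y2 @ y1) / (y2 @ y2)
        y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]   # first stage
        tsls[i] = (y2_hat @ y1) / (y2_hat @ y2)
    return ols, tsls

def summarize(estimates, beta=1.0):
    """Median bias and interquartile range of a set of estimates."""
    q25, q50, q75 = np.quantile(estimates, [0.25, 0.5, 0.75])
    return {"median_bias": q50 - beta, "iqr": q75 - q25}

results = {}
for gamma4 in (0.0, -0.5):
    ols, tsls = simulate(gamma4)
    results[gamma4] = {"OLS": summarize(ols), "2SLS": summarize(tsls)}
    for name in ("OLS", "2SLS"):
        r = results[gamma4][name]
        print(f"gamma4={gamma4:+.1f} {name:4s} "
              f"median bias={r['median_bias']:+.3f} IQR={r['iqr']:.3f}")
```

In this particular design a short large-sample calculation suggests an OLS bias of roughly (gamma4 + rho)/4 and a 2SLS bias of roughly gamma4/3, so the two sources of OLS bias offset when gamma4 = −rho. The ranking of the two estimators therefore depends on the sign and size of the omitted effect relative to the simultaneity correlation, and no general ranking should be read into the output.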
Similar analysis of the effect of misspecification upon the LIML estimator in
the two endogenous variable case has been performed by Mariano and Ramage
(1978). Some extensions of this work, involving asymptotic expansions and
moment approximations to the general single equation case, are contained in
Mariano and Ramage (1979). Exact formulae for the p.d.f.s of OLS and 2SLS
estimators in the general single equation case under misspecification are given by
Maasoumi and Phillips (1982b).36-39
35Their stated result in theorem 2.1 contains a small error in that 1T,/41 k in their formula (2.11)
should be replaced by 1T,/2 1k.
36In addition, Knight (1981) has shown how, in the two endogenous variable case, expressions for
the exact moments of k-class estimators under misspecification can be extracted from the corresponding
expressions that apply in correctly specified situations.
37Related work on the effect of multicollinearity on the shape of the distributions of OLS and 2SLS
estimators has been done by Mariano, McDonald and Tishler (1979).
38Hale (1979) has also studied the effects of misspecification on the two-stage Aitken estimator
(2SAE) and OLS estimator in a two-equation seemingly unrelated regression model. Hale’s main
conclusion is that the distribution of 2SAE appears to be more affected by misspecification than that
of OLS.
39Analysis of the effects of distributional shape is also possible. Knight (1981) has, in particular,
found expressions for the first two exact moments of the k-class estimator in the two endogenous
variable case when the reduced-form errors follow a non-normal distribution of the Edgeworth type.
Phillips (1980b) indicated generalizations of existing results for asymptotic expansions of coefficient
estimators and test statistics under non-normal errors of this type. Explicit formulae for such
asymptotic expansions have in fact been derived by Satchell (1981) for the distribution of the serial
correlation coefficient.
This section outlines the elements of a new approach to small sample theory that
is developed in Phillips (1982c). The idea that underlies the method in this article
is very simple. It is motivated by the observation that, in spite of the complex
analytic forms of many of the exact p.d.f.s presently known for econometric
statistics (such as those in Section 3), when we do turn around and obtain
numerical tabulations or graphical plots of the densities we typically end up with
well-behaved, continuous functions that tend to zero at the limits of their domain
of definition. The form of these p.d.f.s strongly suggests that we should be able to
get excellent approximations to them in the class of much simpler functions and
certainly without the use of multiple infinite series. We need to deal with
approximating functions (or approximants as they are often called) that are
capable of capturing the stylized form of a density: in particular, we want the
approximant to be able to go straight for long periods in a direction almost
parallel to the horizontal axis and yet still be able to bend, quite sharply if
necessary, to trace out the body of the distribution wherever it is located. One
class of functions that seems particularly promising in this respect, as well as
being simple in form, is the class of rational functions. Even low degree rational functions
can go straight for long periods and then bend quite sharply. In this, of course,
they are very different from low degree polynomials whose graphs typically
display a distinct roly-poly character.
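The contrast between rational functions and polynomials can be seen in a small numerical sketch. The example below takes exp(−r²/2) as a stand-in target (an assumption made for illustration, not a density from Section 3) and compares its degree-4 Taylor polynomial at the origin with the rational function 1/(1 + r²/2 + r⁴/8), whose Taylor expansion at the origin agrees with that of the target through the term in r⁴. Both use the same local information, yet only the rational function stays positive and tends to zero in the tails.

```python
import numpy as np

def target(r):
    """Stand-in target function exp(-r^2 / 2)."""
    return np.exp(-0.5 * r**2)

def taylor_poly(r):
    """Degree-4 Taylor polynomial of exp(-r^2 / 2) at r = 0."""
    return 1.0 - r**2 / 2.0 + r**4 / 8.0

def rational(r):
    """Rational function with the same Taylor terms through r^4 at r = 0."""
    return 1.0 / (1.0 + r**2 / 2.0 + r**4 / 8.0)

grid = np.linspace(-6.0, 6.0, 1201)
err_poly = np.max(np.abs(taylor_poly(grid) - target(grid)))
err_rat = np.max(np.abs(rational(grid) - target(grid)))

print(f"max error on [-6, 6], Taylor polynomial: {err_poly:.3f}")
print(f"max error on [-6, 6], rational function: {err_rat:.3f}")
```

The polynomial is accurate only near the point of expansion and diverges as |r| grows, whereas the rational function remains bounded between zero and one over the whole real line. Its error in the shoulders of the curve is still visible, which is the kind of shortfall that additional points of expansion or a coefficient function are meant to reduce.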
The possibility of finding rational functions which provide good global ap-
proximations to a general class of p.d.f.s is considered in Phillips (1982c). The
technique developed there is based on the idea of working from local Taylor series
approximations at certain points of the distribution towards a global approxima-
tion which performs well in the whole domain over which the distribution is
defined and yet retains the good performance of the Taylor series approximations
in the immediate locality of the points of expansion. This is, in part, achieved by
the use of multiple-point Padé approximants. These Padé approximants40 are
rational functions constructed so as to preserve the local Taylor series behavior of
the true p.d.f. (or d.f.) to as high an order as possible. The points selected for local
expansion will often be simply the origin (in the central body of the distribution)
and the tails. These local expansions can, in fact, be obtained from information
about the characteristic function of the distribution so that direct knowledge of
the local behavior of the true p.d.f. is not necessary for the successful application
40Padé approximants have a long tradition in mathematics and have recently been successfully
applied to a large number of problems in applied mathematics and mathematical physics. References
to this literature may be found in Phillips (1982c).
where m and n are even integers with m ≤ n and s(r) is a real continuous function
satisfying s(r) > 0 and s(r) → 0 as r → ±∞.
The coefficient function s(r) in (4.1) is a vehicle by which additional informa-
tion about the true density can be readily embodied in the approximant. This can
be soft quantitative information, for example of the type that pdf(r) > 0 and
pdf(r) → 0 as r → ±∞ [already explicit in s(r)]; or hard quantitative information,
for example of the type (i) that pdf(r) has moments up to a certain order or
(ii) that pdf(r) takes an especially simple form in an important and relevant
leading case or (iii) that pdf(r) has a known Edgeworth series expansion up to a
certain order (suitably modified to ensure that it is everywhere positive and still
tends to zero at infinity).
Practical considerations frequently suggest a specialization of (4.1) to the
family of rational fractions in which numerator and denominator polynomials
have the same degrees (i.e. m = n).41 In addition, a normalization condition is
imposed on the coefficients of the polynomials in (4.1) to eliminate the redundancy
that results from the multiplication of P_m(r) and Q_n(r) by an arbitrary
constant. In density function approximation this can be simply achieved by
setting b_0 = 1, which also ensures that the rational approximant is well behaved as
to measure the error in the approximation. Under this error norm it is shown that
best uniform approximants within the family (4.1) exist and are unique for a
general class of continuous p.d.f.s. Setting m = n and defining γ' = {a_0, ..., a_n;
b_1, ..., b_n} in (4.1) means that there exists a vector γ* and a corresponding rational
fraction R*_{n,n}(r) for which
given some continuous density pdf(r); and, moreover, the rational fraction
R*_{n,n}(r) with the property (4.3) is unique. As n → ∞, R*_{n,n}(r) converges uniformly
to pdf(r). Hence, an arbitrarily good approximation is possible within this family
of rational functions.
Practical implementation of rational approximation requires the degree of
R_{n,n}(r) to be prescribed, the coefficient function s(r) to be selected, and the
parameters of the polynomials to be specified. The problem is one of constructive
functional approximation to a given distribution within the family of approximants
(4.1). Operational guidelines for this constructive process are laid out in
Phillips (1982c) and the final solution in any particular case will rely intimately
on the information that is available about the true distribution. Typically, we will
want the approximant to embody as much analytic and reliable experimental
information about the distribution as possible. This will directly affect the choice
of s(r) and the prescribed degree of R_{n,n}(r). Leading case analyses such as those
in Section 3.4 will often lead to a suitable choice of s(r). Knowledge of the local
behavior of the distribution in the body and in the tails can be used to determine
the polynomial coefficients in R_{n,n}(r) which will then magnify or attenuate as
appropriate the leading case distribution. Local information about the distribution
may take the form of Taylor expansions at certain points or estimates of the
function values obtained from Monte Carlo simulations. In cases where numerical
or Monte Carlo integration is possible, a selected set of points within the main
body and in the tails of the distribution can be used for these evaluations, which
can then assist in determining the parameters of R_{n,n}(r). This has the advantage
The above discussion and reported application present a favorable picture of the
strengths and potential of this new approach. An important contributory factor in
this optimistic view is the flexible mathematical apparatus that underlies con-
structive functional approximation in the class defined by (4.1). As much analytic
knowledge as is available about a distribution can be embodied in R_{n,n}(r)
through the dual vehicles of the coefficient function s(r) and the rational
coefficients {a_0, ..., a_n, b_1, ..., b_n}. Thus, Edgeworth expansions and saddlepoint
approximations are just subcases of (4.1). For if these expansions are known to
yield good approximants in certain problems they themselves may be used to
construct s(r). Simple modifications to the Edgeworth expansion will ensure that
s(r) is everywhere positive, continuous, and still tends to zero as |r| → ∞.
Additional information about the distribution can then be incorporated in the
rational coefficients and in adjustments to s(r) that ensure the same tail behavior
as the true distribution, where this is known by separate analytic investigation.
Other choices of s(r) that stem directly from analytic knowledge of the true
distribution are also possible, as the example cited demonstrates. Moreover,
experimental data about the distribution can be utilized in the choice of the
rational coefficients by least squares or generalized least squares fitting to the
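The idea of choosing the rational coefficients by least squares can be sketched as follows. The example assumes that the family (4.1) has the form s(r) P_n(r)/Q_n(r) with P_n(r) = a_0 + ... + a_n r^n, Q_n(r) = 1 + b_1 r + ... + b_n r^n, which is inferred from the surrounding text and not quoted from it. It also substitutes a linearised least squares step (multiplying through by Q_n) for whatever fitting procedure Phillips (1982c) actually uses. The coefficient function (a Cauchy density), the target (a Student-t density with 3 degrees of freedom), and all function names are hypothetical choices, selected so that an exact member of the family exists and the fit can be checked mechanically.

```python
import numpy as np

def s(r):
    """Assumed coefficient function: Cauchy density (positive, -> 0 in tails)."""
    return 1.0 / (np.pi * (1.0 + r**2))

def target_pdf(r):
    """Student-t density with 3 degrees of freedom, standing in for the true pdf."""
    return 2.0 / (np.pi * np.sqrt(3.0)) * (1.0 + r**2 / 3.0) ** -2

def fit_rational(r, f, n):
    """Linearised least squares for a_0..a_n and b_1..b_n, with b_0 = 1.

    With g = f / s, the relation g * Q_n = P_n is linear in the coefficients:
    g = sum_j a_j r^j - g * sum_{j>=1} b_j r^j.
    """
    g = f / s(r)
    powers = np.vander(r, n + 1, increasing=True)     # columns 1, r, ..., r^n
    design = np.hstack([powers, -g[:, None] * powers[:, 1:]])
    coef, *_ = np.linalg.lstsq(design, g, rcond=None)
    a = coef[: n + 1]
    b = np.concatenate([[1.0], coef[n + 1:]])
    return a, b

def approximant(r, a, b):
    """Evaluate s(r) * P_n(r) / Q_n(r); polyval wants the highest power first."""
    return s(r) * np.polyval(a[::-1], r) / np.polyval(b[::-1], r)

# "Experimental data": here exact function values at a grid of points.
points = np.linspace(-5.0, 5.0, 61)
a_hat, b_hat = fit_rational(points, target_pdf(points), n=4)

check = np.linspace(-10.0, 10.0, 2001)
sup_error = np.max(np.abs(approximant(check, a_hat, b_hat) - target_pdf(check)))
print("numerator coefficients:  ", np.round(a_hat, 6))
print("denominator coefficients:", np.round(b_hat, 6))
print("sup error on [-10, 10]:  ", sup_error)
```

With noisy function values from Monte Carlo replications the same linear system could be weighted by the estimated sampling variances, in the spirit of generalized least squares. The linearisation also gives no guarantee that the fitted denominator is free of real zeros, so the sign of Q_n on the range of interest would need to be checked in practice.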
5. Concluding remarks
This review began with some remarks taken from the first edition of R. A.
Fisher’s (1925) influential manual for practising statisticians. Fisher’s keen aware-
ness of the limitations of asymptotic theory, his emphasis on statistical tools
which are appropriate in the analysis of small samples of data, and his own
research on the exact sampling distributions of variance ratios and correlation
coefficients contributed in significant ways to the growth of what is now an
extensive literature in mathematical statistics on small sample distribution theory.
The challenge of developing such a theory in models that are of interest to
econometricians has produced the corpus of knowledge that forms the subject-
matter of this review. Questions of the relevance of this research and its opera-
tional payoff in terms of empirical practice are as much a topic of debate in
econometrics as they were (and still are to a lesser degree) in mathematical
statistics.
In contrast to small sample theory, the power of asymptotic theory lies
unmistakably in the generality with which its conclusions hold, extending over a
wide domain of models and assumptions that now allow for very general forms of
dependent random processes, non-linear functional forms, and model misspecifi-
cations. However, the generality of this theory and the apparent robustness of
many of its conclusions should not necessarily be presumed to be strengths. For
the process by which asymptotic machinery works inevitably washes out sensitivi-
ties that are present and important in finite samples. Thus, generality and
robustness in asymptotic theory are achieved at the price of insensitivity with
42Some further reflections on the problems inherent in asymptotic theory are given in Phillips
(1982b).
References
Amemiya, T. (1966) “On the Use of Principal Components of Independent Variables in Two-Stage
Least-Squares Estimation”, International Economic Review, 7, 283-303.
Anderson, T. W. (1976) “Estimation of Linear Functional Relationships: Approximate Distribution
and Connections with Simultaneous Equations in Econometrics”, Journal of the Royal Statistical
Society, B38, 1-36.
Anderson, T. W. (1977) “Asymptotic Expansions of the Distributions of Estimates in Simultaneous
Equations for Alternative Parameter Sequences”, Econometrica, 45, 509-518.
Anderson, T. W. (1982) “Some Recent Developments on the Distributions of Single-Equation
Estimators”, in: W. Hildebrand (ed.), Advances in Econometrics. Amsterdam: North-Holland
Publishing Co. (forthcoming).
Anderson, T. W., N. Kunitomo and T. Sawa (1980) “Evaluation of the Distribution Function of the
Limited Information Maximum Likelihood Estimator”, Technical Report No. 319, The Economic
Series, Stanford University.
Anderson, T. W., K. Morimune and T. Sawa (1978) “The Numerical Values of Some Key Parameters
in Econometric Models”, Stanford University, IMSSS TR no. 270.
Anderson, T. W. and H. Rubin (1949), “Estimation of the Parameters of a Single Equation in a
Complete System of Stochastic Equations”, Annals of Mathematical Statistics, 20, 46-63.
Anderson, T. W. and H. Rubin (1950), “The Asymptotic Properties of Estimates of the Parameters of
a Single Equation in a Complete System of Stochastic Equations”, Annals of Mathematical Statistics,
21, 570-582.
Anderson, T. W. and T. Sawa (1973) “Distributions of Estimates of Coefficients of a Single Equation
in a Simultaneous System and Their Asymptotic Expansions”, Econometrica, 41, 683-714.
Anderson, T. W. and T. Sawa (1975) “Distribution of a Maximum Likelihood Estimate of a Slope
Coefficient: The LIML Estimate for Known Covariance Matrix”, Technical Report no. 174, IMSSS,
Stanford University.
Anderson, T. W. and T. Sawa (1977) “Numerical Evaluation of the Exact and Approximate
Distribution Functions of the Two Stage Least Squares Estimate”, Stanford Economic Series
Technical Report no. 239.
Anderson, T. W. and T. Sawa (1979) “Evaluation of the Distribution Function of the Two-Stage Least
Squares Estimate”, Econometrica, 47, 163-182.
Bartlett, M. S. (1948) “A Note on the Statistical Estimation of Demand and Supply Relations from
Time Series”, Econometrica, 16, 323-329.
Barndorff-Nielsen, O. and D. R. Cox (1979) “Edgeworth and Saddle-point Approximations with
Statistical Applications”, Journal of the Royal Statistical Society, Ser. B, 41, 279-312.
Basmann, R. L. (1960) “On Finite Sample Distributions of Generalized Classical Linear Identifiabil-
ity Test Statistics”, Journal of the American Statistical Association, 55, 650-659.
Basmann, R. L. (1961) “Note on the Exact Finite Sample Frequency Functions of Generalized
Classical Linear Estimators in Two Leading Overidentified Cases”, Journal of the American
Statistical Association, 56, 619-636.
Basmann, R. L. (1963) “A Note on the Exact Finite Sample Frequency Functions of Generalized
Classical Linear Estimators in a Leading Three Equation Case”, Journal of the American Statistical
Association, 58, 161-171.
Basmann, R. L. (1965) “On the Application of the Identifiability Test Statistic in Predictive Testing of
Explanatory Economic Models”, Econometric Annual of the Indian Economic Journal, 13, 387-423.
Basmann, R. L. (1974) “Exact Finite Sample Distribution for Some Econometric Estimators and Test
Statistics: A Survey and Appraisal”, in: M. D. Intriligator and D. A. Kendrick (eds.), Frontiers of
Quantitative Economics, vol. 2. Amsterdam: North-Holland Publishing Co., ch. 4, pp. 209-271.
Basmann, R. L. and D. H. Richardson (1969) “The Finite Sample Distribution of a Structural
Variance Estimator”, Research Papers in Theoretical and Applied Economics, 24, University of
Kansas, mimeographed.
Basmann, R. L. and D. H. Richardson (1973), “The Exact Finite Sample Distribution of a
Non-Consistent Structural Variance Estimator”, Econometrica, 41, 41-58.
Basmann, R. L., D. H. Richardson and R. J. Rohr (1974) “An Experimental Study of Structural
Estimators and Test Statistics Associated with Dynamical Econometric Models”, Econometrica, 42,
717-730.
Bergstrom, A. R. (1962) “The Exact Sampling Distributions of Least Squares and Maximum
Likelihood Estimators of the Marginal Propensity to Consume”, Econometrica, 30, 480-490.
Bhattacharya, R. N. and R. R. Rao (1976) Normal Approximation and Asymptotic Expansions. New
York: John Wiley & Sons.
Bleistein, N. and R. A. Handelsman (1976) Asymptotic Expansions of Integrals. New York: Holt,
Rinehart and Winston.
Byron, R. P. (1974) “Testing Structural Specification Using the Unrestricted Reduced Form”,
Econometrica, 42, 869-883.
Chikuse, Y. (1981) “Invariant Polynomials with Matrix Arguments and Their Applications”, in: R. P.
Gupta (ed.), Multivariate Statistical Analysis. Amsterdam: North-Holland Publishing Co.
Constantine, A. G. (1963) “Some Noncentral Distribution Problems in Multivariate Analysis”, Annals
of Mathematical Statistics, 34, 1270-1285.
Cramer, H. (1946) Mathematical Methods of Statistics. Princeton: Princeton University Press.
Cramer, H. (1972) “On the History of Certain Expansions Used in Mathematical Statistics”,
Biometrika, 59, 204-207.
Daniels, H. E. (1954) “Saddlepoint Approximations in Statistics”, Annals of Mathematical Statistics,
25, 631-650.
Daniels, H. E. (1956) “The Approximate Distribution of Serial Correlation Coefficients”, Biometrika,
43, 169-185.
Daniels, H. E. (1980) “Exact Saddlepoint Approximations”, Biometrika, 67, 59-63.
Davies, R. B. (1973) “Numerical Inversion of a Characteristic Function”, Biometrika, 60, 415-417.
Davis, A. W. (1980a) “Invariant Polynomials with Two Matrix Arguments Extending the Zonal
Polynomials”, in: P. R. Krishnaiah (ed.), Multivariate Analysis. Amsterdam: North-Holland Publishing
Co.
Davis, A. W. (1980b) “Invariant Polynomials with Two Matrix Arguments Extending the Zonal
Polynomials: Applications to Multivariate Distribution Theory”, Annals of the Institute of Statistical
Mathematics, forthcoming.
De Bruijn, N. G. (1958) Asymptotic Methods in Analysis. Amsterdam: North-Holland Publishing Co.
Dhrymes, P. J. (1969) “Alternative Asymptotic Tests of Significance and Related Aspects of 2SLS and
3SLS Estimated Parameters”, Review of Economic Studies, 36, 213-226.
Dhrymes, P. J. (1973) “Restricted and Unrestricted Reduced Forms: Asymptotic Distribution and
Relative Efficiency”, Econometrica, 41, 119-134.
Durbin, J. (1980a) “Approximations for Densities of Sufficient Estimators”, Biometrika, 67, 311-333.
Durbin, J. (1980b) “The Approximate Distribution of Partial Serial Correlation Coefficients Calculated
from Residuals from Regression on Fourier Series”, Biometrika, 67, 335-349.
Durbin, J. and G. S. Watson (1971) “Testing for Serial Correlation in Least Squares Regression, III”,
Biometrika, 58, 1-19.
Ebbeler, D. H. and J. B. McDonald (1973) “An Analysis of the Properties of the Exact Finite Sample
Distribution of a Nonconsistent GCL Structural Variance Estimator”, Econometrica, 41, 59-65.
Erdélyi, A. (1953) Higher Transcendental Functions, vol. 1. New York: McGraw-Hill.
Feller, W. (1971) An Introduction to Probability Theory and Its Applications, vol. II. New York: Wiley.
Fieller, E. C. (1932) “The Distribution of the Index in a Normal Bivariate Population”, Biometrika,
24, 428-440.
Fisher, F. M. (1961) “On the Cost of Approximate Specification in Simultaneous Equation Estimation”,
Econometrica, 29, 139-170.
Fisher, F. M. (1966) “The Relative Sensitivity to Specification Error of Different k-Class Estimators”,
Journal of the American Statistical Association, 61, 345-356.
Fisher, F. M. (1967) “Approximate Specification and the Choice of a k-Class Estimator”, Journal of
the American Statistical Association, 62, 1265-1276.
Fisher, R. A. (1921) “On the Probable Error of a Coefficient of Correlation Deduced From a Small
Sample”, Metron, 1, 1-32.
Fisher, R. A. (1922) “The Goodness of Fit of Regression Formulae and the Distribution of Regression
Coefficients”, Journal of the Royal Statistical Society, 85, 597-612.
Fisher, R. A. (1924) “The Distribution of the Partial Correlation Coefficient”, Metron, 3, 329-332.
Fisher, R. A. (1925) Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Fisher, R. A. (1928a) “On a Distribution Yielding the Error Functions of Several Well Known
Statistics”, in: Proceedings of the International Congress of Mathematics. Toronto, pp. 805-813.
Fisher, R. A. (1928b) “The General Sampling Distribution of the Multiple Correlation Coefficient”,
Proceedings of the Royal Statistical Society, 121, 654-673.
Fisher, R. A. (1935) “The Mathematical Distributions Used in the Common Tests of Significance”,
Econometrica, 3, 353-365.
Fuller, W. A. (1977) “Some Properties of a Modification of the Limited Information Estimator”,
Econometrica, 45, 939-953.
Gil-Pelaez, J. (1951) “Note on the Inversion Theorem”, Biometrika, 38, 481-482.
Goldberger, A. (1973) “Efficient Estimation in Overidentified Models: An Interpretive Analysis”, in:
A. S. Goldberger and O. D. Duncan (eds.), Structural Equation Models in the Social Sciences.
Seminar Press.
Haavelmo, T. (1947) “Methods of Measuring the Marginal Propensity to Consume”, Journal of the
American Statistical Association, 42, 105-122.
Hale, C. (1979) “Misspecification in Seemingly Unrelated Regression Equations”, Kent University,
mimeo.
Hale, C., R. S. Mariano and J. G. Ramage (1980), “Finite Sample Analysis of Misspecification in
Simultaneous Equation Models”, Journal of the American Statistical Association, 75, 418-427.
Hart, J. F. (1968) Computer Approximations. New York: John Wiley & Sons.
Hastings, C. (1955) Approximations for Digital Computers. Princeton: Princeton University Press.
Hatanaka, M. (1973) “On the Existence and the Approximation Formulae for the Moments of the
k-Class Estimators”, The Economic Studies Quarterly, 24, 1-15.
Hausman, J. A. and D. McFadden (1981) “Specification Tests for the Multinomial Logit Model”,
Discussion Paper no. 292, Department of Economics, Massachusetts Institute of Technology.
Herz, C. S. (1955) “Bessel Functions of Matrix Argument”, Annals of Mathematics, 61, 474-523.
Hillier, G. H. and V. R. Srivastava (1981) “The Exact Bias and Mean Square Error of the k-Class
Estimators for the Coefficient of an Endogenous Variable in a General Structural Equation”,
Monash University, mimeo.
Holly, A. and P. C. B. Phillips (1979) “A Saddlepoint Approximation to the Distribution of the
k-Class Estimator of a Coefficient in a Simultaneous System”, Econometrica, 47, 1527-1547.
Hood, Wm. C. and T. C. Koopmans (1953) “The Estimation of Simultaneous Linear Economic
Relationships”, in: Wm. C. Hood and T. C. Koopmans (eds.), Studies in Econometric Method,
Cowles Commission. New York: John Wiley & Sons.
Hurwicz, L. (1950) “Least Squares Bias in Time Series”, in: T. C. Koopmans (ed.), Statistical Inference
in Dynamic Economic Models. New York: John Wiley & Sons.
Imhof, J. P. (1961) “Computing the Distribution of Quadratic Forms in Normal Variables”,
Biometrika, 48, 419-426.
Intriligator, M. D. and D. A. Kendrick (eds.) (1974) Frontiers of Quantitative Economics, vol. II.
Amsterdam: North-Holland Publishing Co.
James, A. T. (1961) “Zonal Polynomials of the Real Positive Definite Symmetric Matrices”, Annals of
Mathematics, 74, 456-469.
James, A. T. (1964) “Distribution of Matrix Variates and Latent Roots Derived from Normal
Samples”, Annals of Mathematical Statistics, 35, 475.
James, A. T. (1968) “Calculation of Zonal Polynomial Coefficients by Use of the Laplace-Beltrami
Operator”, Annals of Mathematical Statistics, 39, 1711-1718.
James, W. and C. Stein (1961) “Estimation with Quadratic Loss”, in: Proceedings of the Fourth
Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley: University of
California Press, pp. 361-379.
Johnson, N. L. and S. Kotz (1970) Continuous Univariate Distributions-2. Boston: Houghton Mifflin.
Johnson, N. L. and S. Kotz (1972) Distributions in Statistics: Continuous Multivariate Distributions.
New York: John Wiley & Sons.
Kabe, D. G. (1963) “A Note on the Exact Distributions of the GCL Estimators in Two-Leading
Overidentified Cases”, Journal of the American Statistical Association, 58, 535-537.
Kabe, D. G. (1964) “On the Exact Distributions of the GCL Estimators in a Leading Three-Equation
Case”, Journal of the American Statistical Association, 58, 535-531.
Kadane, J. (1971) “Comparison of k-Class Estimators When the Disturbances are Small”,
Econometrica, 39, 723-737.
Kakwani, N. C. and R. H. Court (1972) “Reduced-Form Coefficient Estimation and Forecasting from
a Simultaneous Equation Model”, Australian Journal of Statistics, 14, 143-160.
Kataoka, Y. (1974) “The Exact Finite Sample Distributions of Joint Least Squares Estimators for
Seemingly Unrelated Regression Equations”, Economic Studies Quarterly, 25, 36-44.
Kelejian, H. H. (1974) “Random Parameters in a Simultaneous Equation Framework: Identification
and Estimation”, Econometrica, 42, 517-521.
Kinal, T. W. (1980) “The Existence of Moments of k-Class Estimators”, Econometrica, 48, 241-249.
Kinal, T. W. (1982) “On the Comparison of Ordinary and Two Stage Least Squares Estimators”,
SUNY, Albany, mimeo.
Knight, J. L. (1977) “On the Existence of Moments of the Partially Restricted Reduced-Form
Estimators from a Simultaneous-Equation Model”, Journal of Econometrics, 5, 315-321.
Knight, J. L. (1981) “Non-Normality of Disturbances and the k-Class Structural Estimator”, School
of Economics, The University of New South Wales, mimeo.
Kunitomo, N. (1981) “On a Third Order Optimum Property of the LIML Estimator When the Sample
Size is Large”, Discussion Paper no. 502, Department of Economics, Northwestern University.
Laitinen, K. (1978) “Why is Demand Homogeneity Rejected so Often?“, Economic Letters, 1,
187-191.
Lebedev, N. N. (1972) Special Functions and Their Application. New York: Dover.
Maasoumi, E. (1977) “A Study of Improved Methods of Estimating Reduced Form Coefficients Based
Upon 3SLS”. unpublished Ph.D. Thesis, London School of Economics.
Maasoumi, E. (1978) “A Modified Stein-Like Estimator for the Reduced Form Coefficients of
Simultaneous Equations”, Econometricu, 46, 695-703.
Maasoumi, E. (1980) “A Ridge-like Method for Simultaneous Estimation of Simultaneous
Equations”, Journal of Econometrics, 12, 161-176.
Maasoumi, E. (1981) “Uncertain Structural Models and Generic Reduced Form Estimation”, Iowa
University, mimeo.
Maasoumi, E. and P. Phillips (1982a) “On the Behaviour of Instrumental Variable Estimators”,
Journal of Econometrics (forthcoming).
Maasoumi, E. and P. Phillips (1982b) “Misspecification in the General Single Equation Case”, Yale
University, mimeo.
Maddala, G. S. (1974) “Some Small Sample Evidence on Tests of Significance in Simultaneous
Equations Models”, Econometrica, 42, 841-851.
Maekawa, K. (1980) “An Asymptotic Expansion of the Distribution of the Test Statistics for Linear
Restrictions in Zellner’s SUR Model”, The Hiroshima Economic Review, 4, 81-97.
Malinvaud, E. (1980) Statistical Methods of Econometrics. Amsterdam: North-Holland Publishing Co.
Mariano, R. S. (1973) “Approximations to the Distribution Functions of the Ordinary Least-Squares
and Two-stage Least Squares Estimates in the Case of Two Included Endogenous Variables”,
Econometrica, 41, 67-77.
Mariano, R. S. (1975) “Some Large-Concentration-Parameter Asymptotics for the k-Class Estimators”,
Journal of Econometrics, 3, 171-177.
Mariano, R. S. (1977) “Finite-Sample Properties of Instrumental Variable Estimators of Structural
Coefficients”, Econometrica, 45, 487-496.
Mariano, R. S. (1982) “Analytical Small-Sample Distribution Theory in Econometrics: The Simulta-
neous-Equations Case”, International Economic Review (forthcoming).
Mariano, R. S. and J. McDonald (1979) “A Note on the Distribution Functions of LIML and 2SLS
Coefficient Estimators in the Exactly Identified Case”, Journal of the American Statistical
Association, 74, 847-848.
Mariano, R. S., J. McDonald and A. Tishler (1979) “On the Effects of Multicollinearity upon the
Properties of Structural Coefficient Estimators”, mimeo.
Mariano, R. S. and J. G. Ramage (1978) “Ordinary Least Squares versus other Single Equation
Estimators: A Return Bout under Misspecification in Simultaneous Systems”, University of
Pennsylvania, Department of Economics Discussion Paper no. 400.
Mariano, R. S. and J. G. Ramage (1979) “Large Sample Asymptotic Expansions for General Linear
Simultaneous Systems under Misspecification”, University of Pennsylvania, mimeo.
Mariano, R. S. and T. Sawa (1972) “The Exact Finite-Sample Distribution of the Limited Information
Maximum Likelihood Estimator in the Case of Two Included Endogenous Variables”, Journal of the
American Statistical Association, 67, 159-163.
McCarthy, M. D. (1972) “A Note on the Forecasting Properties of 2SLS Restricted Reduced Forms”,
International Economic Review, 13, 757-761.
McDonald, J. B. (1972) “The Exact Finite Sample Distribution Function of the Limited-Information
Maximum Likelihood Identifiability Test Statistic”, Econometrica, 40, 1109-1119.
McDonald, J. B. (1974) “An Approximation of the Distribution Function of the LIML Identifiability
Test Statistic Using the Method of Moments”, Journal of Statistical Computation and Simulation, 3,
53-66.
McLaren, M. L. (1976) “Coefficients of the Zonal Polynomials”, Applied Statistics, 25, 82-87.
Meisner, J. F. (1979) “The Sad Fate of the Asymptotic Slutsky Symmetry Test for Large Systems”,
Economic Letters, 2, 231-233.
Mikhail, W. M. (1969) “A Study of the Finite Sample Properties of Some Economic Estimators”,
unpublished Ph.D. Thesis, London School of Economics.
Miller, K. S. (1960) Advanced Complex Calculus. New York: Harper.
Morimune, K. (1978) “Improving the Limited Information Maximum Likelihood Estimator When the
Disturbances are Small”, Journal of the American Statistical Association, 73, 867-871.
Morimune, K. (1981) “Approximate Distributions of the k-Class Estimators when the Degree of
Overidentifiability is Large Compared with the Sample Size”, Discussion Paper no. 159, Institute of
Economic Research, Kyoto University.
Morimune, K. and N. Kunitomo (1980) “Improving the Maximum Likelihood Estimate in Linear
Functional Relationships for Alternative Parameter Sequences”, Journal of the American Statistical
Association, 75, 230-237.
Muirhead, R. J. (1978) “Latent Roots and Matrix Variates: A Review of Some Asymptotic Results”,
The Annals of Statistics, 6, 5-33.
Nagar, A. L. (1959) “The Bias and Moment Matrix of the General k-Class Estimators of the
Parameters in Structural Equations”, Econometrica, 27, 575-595.
Nagar, A. L. and S. N. Sahay (1978) “The Bias and Mean Squared Error of Forecasts from Partially
Restricted Reduced Form”, Journal of Econometrics, 7, 227-243.
Nagel, P. J. A. (1981) “Programs for the Evaluation of Zonal Polynomials”, American Statistician, 35,
53.
Pan Jie-Jian (1968) “Distribution of the Noncircular Serial Correlation Coefficients”, American
Mathematical Society and Institute of Mathematical Statistics, Selected Translations in Probability and
Statistics, 7, 281-291.
Pfanzagl, J. and W. Wefelmeyer (1978) “A Third-Order Optimum Property of the Maximum
Likelihood Estimator”, Journal of Multivariate Analysis, 8, 1-29.
Pfanzagl, J. and W. Wefelmeyer (1979) “Addendum to a Third-Order Optimum Property of the
Maximum Likelihood Estimator”, Journal of Multivariate Analysis, 9, 179-182.
Phillips, P. C. B. (1977a) “Approximations to Some Finite Sample Distributions Associated with a
First Order Stochastic Difference Equation”, Econometrica, 45, 463-486.
Phillips, P. C. B. (1977b) “A General Theorem in the Theory of Asymptotic Expansions as
Approximations to the Finite Sample Distributions of Econometric Estimators”, Econometrica, 45,
1517-1534.
Phillips, P. C. B. (1977c) “An Approximation to the Finite Sample Distribution of Zellner’s Seemingly
Unrelated Regression Estimator”, Journal of Econometrics, 6, 147-164.
Phillips, P. C. B. (1978) “Edgeworth and Saddlepoint Approximations in the First-Order Noncircular
Autoregression”, Biometrika, 65, 91-98.
Phillips, P. C. B. (1980a) “The Exact Finite Sample Density of Instrumental Variable Estimators in an
Equation with n + 1 Endogenous Variables”, Econometrica, 48, no. 4, 861-878.
Phillips, P. C. B. (1980b) “Finite Sample Theory and the Distributions of Alternative Estimators of
the Marginal Propensity to Consume”, Review of Economic Studies, 47, no. 1, 183-224.
Phillips, P. C. B. (1982a) “Marginal Densities of Instrumental Variable Estimators in the General
Single Equation Case”, Advances in Econometrics (forthcoming).
Sargan, J. D. (1976b) “Existence of the Moments of Estimated Reduced Form Coefficients”, London
School of Economics Discussion Paper no. A6.
Sargan, J. D. (1978) “On the Existence of the Moments of 3SLS Estimators”, Econometrica, 46,
1329-1350.
Sargan, J. D. (1980) “Some Approximation to the Distribution of Econometric Criteria which are
Asymptotically Distributed as Chi-squared”, Econometrica, 48, 1107-1138.
Sargan, J. D. and S. E. Satchell (1981) “The Validity of Edgeworth Expansions for Autoregressive
Models”, in preparation.
Satchell, S. E. (1981) “Edgeworth Approximations in Linear Dynamic Models”, unpublished Ph.D.
dissertation, London School of Economics.
Sawa, T. (1969) “The Exact Finite Sampling Distribution of Ordinary Least Squares and Two Stage
Least Squares Estimator”, Journal of the American Statistical Association, 64, 923-936.
Sawa, T. (1972) “Finite-Sample Properties of the k-Class Estimators”, Econometrica, 40, 653-680.
Sawa, T. (1973a) “Almost Unbiased Estimator in Simultaneous Equations Systems”, International
Economic Review, 14, 97-106.
Sawa, T. (1973b) “The Mean Square Error of a Combined Estimator and Numerical Comparison with
the TSLS Estimator”, Journal of Econometrics, 1, 115-132.
Sclove, S. (1968) “Improved Estimators for Coefficients in Linear Regression”, Journal of the
American Statistical Association, 63, 596-606.
Sclove, S. (1971) “Improved Estimation of Parameters in Multivariate Regression”, Sankhya, Ser. A,
61-66.
Slater, L. J. (1965) “Confluent Hypergeometric Functions”, in: M. Abramowitz and I. A. Stegun
(eds.), Handbook of Mathematical Functions. New York: Dover.
Srinivasan, T. N. (1970) “Approximations to Finite Sample Moments of Estimators Whose Exact
Sampling Distributions are Unknown”, Econometrica, 38, 533-541.
Swamy, P. A. V. B. and J. S. Mehta (1980) “On the Existence of Moments of Partially Restricted
Reduced Form Coefficients”, Journal of Econometrics, 14, 183-194.
Takeuchi, K. (1970) “Exact Sampling Moments of Ordinary Least Squares, Instrumental Variable and
Two Stage Least Squares Estimators”, International Economic Review, 11, 1-12.
Ullah, A. (1974) “On the Sampling Distribution of Improved Estimators for Coefficients in Linear
Regression”, Journal of Econometrics, 2, 143-150.
Ullah, A. (1980) “The Exact, Large-Sample and Small-Disturbance Conditions of Dominance of
Biased Estimators in Linear Models”, Economic Letters, 6, 339-344.
Ullah, A. and A. L. Nagar (1974) “The Exact Mean of the Two Stage Least Squares Estimator of the
Structural Parameters in an Equation Having Three Endogenous Variables”, Econometrica, 42,
749-758.
Wallace, D. L. (1958) “Asymptotic Approximations to Distributions”, Annals of Mathematical
Statistics, 29, 635-654.
Wegge, L. L. (1971) “The Finite Sampling Distribution of Least Squares Estimators with Stochastic
Regressors”, Econometrica, 39, 241-251.
Whittaker, E. T. and G. N. Watson (1927) Modern Analysis. Cambridge.
Widder, D. V. (1941) The Laplace Transform. Princeton: Princeton University Press.
Widder, D. V. (1961) Advanced Calculus. Prentice-Hall.
Zellner, A. (1978) “Estimation of Functions of Population Means and Regression Coefficients
Including Structural Coefficients: A Minimum Expected Loss (MELO) Approach”, Journal of
Econometrics, 8, 127-158.
Zellner, A. and W. Vandaele (1975) “Bayes-Stein Estimators for k-Means, Regression and
Simultaneous Equation Models”, in: S. E. Fienberg and A. Zellner (eds.), Studies in Bayesian Econometrics
and Statistics in Honor of Leonard J. Savage. Amsterdam: North-Holland Publishing Co.
Zellner, A. and S. B. Park (1979) “Minimum Expected Loss (MELO) Estimators for Functions of
Parameters and Structural Coefficients of Econometric Models”, Journal of the American Statistical
Association, 74, 183-185.