
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment.[1] Instrumental variable methods allow consistent estimation when the explanatory variables (covariates) are correlated with the error terms of a regression relationship. Such correlation may occur when the dependent variable causes at least one of the covariates ("reverse" causation), when there are relevant explanatory variables which are omitted from the model, or when the covariates are subject to measurement error. In this situation, ordinary linear regression generally produces biased and inconsistent estimates.[2] However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation and is correlated with the endogenous explanatory variables, conditional on the other covariates. In linear models, there are two main requirements for using an IV:

The instrument must be correlated with the endogenous explanatory variables, conditional
on the other covariates.

The instrument cannot be correlated with the error term in the explanatory equation
(conditional on the other covariates), that is, the instrument cannot suffer from the same problem
as the original predicting variable.
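
In symbols, one common way to state these two conditions (notation assumed here, not taken from the text above, with W the other covariates and $\varepsilon$ the error term of the explanatory equation) is

$\operatorname{Cov}(Z, X \mid W) \neq 0, \qquad \operatorname{Cov}(Z, \varepsilon \mid W) = 0.$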
Contents
1 Definitions
2 Example
3 Applications
4 Selecting Suitable Instruments
5 Estimation
6 Interpretation as two-stage least squares
7 Identification
8 Non-parametric analysis
9 On the interpretation of IV estimates
10 Potential problems
11 Sampling properties and hypothesis testing
12 Testing instrument strength and overidentifying restrictions
13 References
14 Further reading
15 External links

Definitions
The theory of instrumental variables was first derived by Philip G. Wright, possibly in co-authorship with his son Sewall Wright, in his 1928 book The Tariff on Animal and Vegetable Oils.[3][4] Traditionally,[5] an instrumental variable is defined as a variable Z that is correlated with the independent variable X and uncorrelated with the "error term" U in the equation

$Y = \beta X + U.$

However, this definition suffers from ambiguities in concepts such as "error term" and "independent variable," and has led to confusion as to the meaning of the equation itself, which was wrongly labeled "regression."[6]
General definitions of instrumental variables, using counterfactual and graphical formalism, were given by Pearl (2000; p. 248).[7] The graphical definition requires that Z satisfy the following conditions:

$(Z \perp\!\!\!\perp Y)_{G_{\overline{X}}}, \qquad (Z \not\!\perp\!\!\!\perp X)_G,$

where $\perp\!\!\!\perp$ stands for d-separation[8] and $G_{\overline{X}}$ stands for the graph in which all arrows entering X are cut off.

The counterfactual definition requires that Z satisfies

$Z \perp\!\!\!\perp Y_x, \qquad Z \not\!\perp\!\!\!\perp X,$

where $Y_x$ stands for the value that Y would attain had X been x and $\perp\!\!\!\perp$ stands for independence.

If there are additional covariates W then the above definitions are modified so that Z qualifies as an instrument if the given criteria hold conditional on W.

The essence of Pearl's definition is:

1. The equations of interest are "structural," not "regression."
2. The error term U stands for all exogenous factors that affect Y when X is held constant.
3. The instrument Z should be independent of U.
4. The instrument Z should not affect Y when X is held constant (exclusion restriction).
5. The instrument Z should not be independent of X.

These conditions do not rely on the specific functional form of the equations and are therefore applicable to nonlinear equations, where U can be non-additive (see Non-parametric analysis). They are also applicable to a system of multiple equations, in which X (and other factors) affect Y through several intermediate variables. Note that an instrumental variable need not be a cause of X; a proxy of such a cause may also be used, if it satisfies conditions 4 and 5.[7] Note also that the exclusion restriction (condition 4) is redundant; it follows from conditions 2 and 3.

Example
Informally, in attempting to estimate the causal effect of some variable x on another y, an instrument
is a third variable z which affects y only through its effect on x. For example, suppose a researcher
wishes to estimate the causal effect of smoking on general health.[9] Correlation between health and
smoking does not imply that smoking causes poor health because other variables may affect both
health and smoking, or because health may affect smoking in addition to smoking causing health
problems. It is at best difficult and expensive to conduct controlled experiments on smoking status in
the general population. The researcher may proceed to attempt to estimate the causal effect of
smoking on health from observational data by using time series on the tax rate for tobacco products
as an instrument for smoking in a causal analysis. If tobacco taxes and state of health are correlated
then this may be viewed as evidence that smoking causes changes in health.
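
The logic can be sketched with a small simulation (a minimal sketch; all numbers are hypothetical and chosen for illustration, not taken from any smoking study). A hidden confounder drives both smoking and health, while the tax rate shifts smoking but, by construction, affects health only through smoking:

```python
# Hypothetical simulation of the smoking/health story: 'tax' shifts smoking
# but reaches health only through smoking, so it can serve as an instrument.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
confounder = rng.normal(size=n)          # unobserved traits driving both
tax = rng.normal(size=n)                 # instrument: tobacco tax rate
smoking = -tax + confounder + rng.normal(size=n)
true_effect = -2.0                       # assumed causal effect on health
health = true_effect * smoking - 3.0 * confounder + rng.normal(size=n)

# The naive regression slope is biased: smoking tracks the confounder.
naive = np.cov(smoking, health)[0, 1] / np.var(smoking, ddof=1)
# The IV ratio uses only the tax-driven variation in smoking.
iv = np.cov(tax, health)[0, 1] / np.cov(tax, smoking)[0, 1]
print(f"naive: {naive:.2f}  IV: {iv:.2f}  truth: {true_effect}")
```

The exclusion restriction holds here only because the data were built that way; as the next paragraph stresses, it cannot be verified from the observational data themselves.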
Because demonstrating that the third variable z is causally related to y exclusively via x is an experimental impossibility, and because the same limitations that prevent an experiment from determining whether there is a causal relationship between x and y will normally also prevent experiments from determining whether there is a causal relationship between z and y (assumed to be mediated through x), correlational data is the only type of evidence that analysis by instrumental variable can provide, and causal inference is not justified. The use of an instrumental variable produces additional evidence of a statistical relationship (in this case between z and y), without providing evidence of what type of relationship it is, and without providing direct evidence for the type of relationship between x and y.

Applications
IV methods are commonly used to estimate causal effects in contexts in which controlled experiments are not available. Credibility of the estimates hinges on the selection of suitable instruments. Good instruments are often created by policy changes. For example, the cancellation of a federal student-aid scholarship program may reveal the effects of aid on some students' outcomes. Other natural and quasi-natural experiments of various types are commonly exploited; for example, Miguel, Satyanath, and Sergenti (2004) use weather shocks to identify the effect of changes in economic growth (i.e., declines) on civil conflict.[10] Angrist and Krueger (2001) present a survey of the history and uses of instrumental variable techniques.[11]

Selecting Suitable Instruments


Since U is unobserved, the requirement that Z be independent of U cannot be inferred from data and
must instead be determined from the model structure, i.e., the data-generating process. Causal
graphs are a representation of this structure, and the graphical definition given above can be used to
quickly determine whether a variable Z qualifies as an instrumental variable given a set of
covariates W. To see how, consider the following example.

[Figure 1: Proximity qualifies as an instrumental variable given Library Hours]

[Figure 2: $G_{\overline{X}}$, which is used to determine whether Proximity is an instrumental variable]

[Figure 3: Proximity does not qualify as an instrumental variable given Library Hours]

[Figure 4: Proximity qualifies as an instrumental variable but does not qualify given Library Hours]

Suppose that we wish to estimate the effect of a university tutoring program on GPA at a university where the assignment of students to dormitories is random. The relationship between attending the tutoring program and GPA may be confounded by a number of factors. Students that attend the tutoring program may care more about their grades or may be struggling with their work. (This confounding is depicted in Figures 1-3 through the bidirected arc between Tutoring Program and GPA.) Given that students are assigned to dormitories at random, the proximity of the student's dorm to the tutoring program is a natural candidate for being an instrumental variable.

However, what if the tutoring program is located in the college library? Proximity may also cause students to spend more time at the library, which in turn improves their GPA (see Figure 1). Using the causal graph depicted in Figure 2, we see that Proximity does not qualify as an instrumental variable because it is d-connected to GPA through the path Proximity → Library Hours → GPA in $G_{\overline{X}}$. However, if we control for Library Hours by adding it as a covariate, then Proximity becomes an instrumental variable, since Proximity is d-separated from GPA given Library Hours in $G_{\overline{X}}$.

Now, suppose that we notice that a student's "natural ability" affects his or her number of hours in the library as well as his or her GPA, as in Figure 3. Using the causal graph, we see that Library Hours is a collider and conditioning on it opens the path Proximity → Library Hours ↔ GPA. As a result, Proximity cannot be used as an instrumental variable.

Finally, suppose that Library Hours does not actually affect GPA because students who do not study in the library simply study elsewhere, as in Figure 4. In this case, controlling for Library Hours still opens a spurious path from Proximity to GPA. However, if we do not control for Library Hours and remove it as a covariate, then Proximity can again be used as an instrumental variable.
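
These graphical checks can also be run mechanically. Below is a minimal sketch using networkx (node names mirror the figures; U stands for the unobserved confounder drawn as the bidirected arc; the d-separation helper is nx.d_separated in networkx 2.8-3.2 and was renamed nx.is_d_separator in later releases):

```python
# Sketch of the graphical instrument test: build the graph with all arrows
# entering Tutoring Program cut off, then query d-separation.
import networkx as nx

g_bar = nx.DiGraph([
    ("Proximity", "Library Hours"),
    ("Library Hours", "GPA"),
    ("Tutoring Program", "GPA"),
    ("U", "GPA"),                # the arc U -> Tutoring Program is cut off
])

# Unconditionally, Proximity -> Library Hours -> GPA is open: no instrument.
print(nx.d_separated(g_bar, {"Proximity"}, {"GPA"}, set()))              # False
# Conditioning on Library Hours blocks that chain: Proximity qualifies.
print(nx.d_separated(g_bar, {"Proximity"}, {"GPA"}, {"Library Hours"}))  # True

# Figure 3: natural ability also drives library time and GPA.
g_bar.add_edges_from([("Ability", "Library Hours"), ("Ability", "GPA")])
# Library Hours is now a collider; conditioning on it opens a spurious path.
print(nx.d_separated(g_bar, {"Proximity"}, {"GPA"}, {"Library Hours"}))  # False
```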

Estimation
Suppose the data are generated by a process of the form

$y_i = \beta x_i + \varepsilon_i,$

where i indexes observations, $y_i$ is the dependent variable, $x_i$ is an independent variable, $\varepsilon_i$ is an unobserved error term representing all causes of $y_i$ other than $x_i$, and $\beta$ is an unobserved scalar parameter. The parameter $\beta$ is the causal effect on $y_i$ of a one unit change in $x_i$, holding all other causes of $y_i$ constant. The econometric goal is to estimate $\beta$. For simplicity's sake, assume the draws of $\varepsilon_i$ are uncorrelated and that they are drawn from distributions with the same variance; that is, the errors are serially uncorrelated and homoskedastic.

Suppose also that a regression model of nominally the same form is proposed. Given a random sample of T observations from this process, the ordinary least squares estimator is

$\widehat{\beta}_{OLS} = \frac{x'y}{x'x} = \beta + \frac{x'\varepsilon}{x'x},$

where x, y and $\varepsilon$ denote column vectors of length T. When x and $\varepsilon$ are uncorrelated, under certain regularity conditions the second term has an expected value conditional on x of zero and converges to zero in the limit, so the estimator is unbiased and consistent. When x and the other unmeasured, causal variables collapsed into the $\varepsilon$ term are correlated, however, the OLS estimator is generally biased and inconsistent for $\beta$. In this case, it is valid to use the estimates to predict values of y given values of x, but the estimate does not recover the causal effect of x on y.
An instrumental variable z is one that is correlated with the independent variable but not with the error term. Using the method of moments, take expectations conditional on z to find

$\operatorname{E}[y \mid z] = \beta \operatorname{E}[x \mid z] + \operatorname{E}[\varepsilon \mid z].$

The second term on the right-hand side is zero by assumption. Solve for $\beta$ and write the resulting expression in terms of sample moments,

$\widehat{\beta}_{IV} = \frac{z'y}{z'x} = \beta + \frac{z'\varepsilon}{z'x}.$

When z and $\varepsilon$ are uncorrelated, the final term, under certain regularity conditions, approaches zero in the limit, providing a consistent estimator. Put another way, the causal effect of x on y can be consistently estimated from these data even though x is not randomly assigned through experimental methods.
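
A short synthetic-data sketch of this consistency claim (the data-generating process below is assumed for illustration): the sample-moment estimator $z'y / z'x$ drifts toward the true $\beta$ as T grows, even though x is correlated with the error term:

```python
# Sketch: the IV sample-moment estimator z'y / z'x converging to the true
# beta on synthetic data where x is correlated with the error term.
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                                  # true structural coefficient

for T in (100, 10_000, 1_000_000):
    z = rng.normal(size=T)                  # instrument
    u = rng.normal(size=T)                  # confounder folded into the error
    x = z + u + rng.normal(size=T)          # x correlated with the error via u
    y = beta * x + (u + rng.normal(size=T))
    print(f"T={T:>9,}  beta_IV={(z @ y) / (z @ x):.3f}")
```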
The approach generalizes to a model with multiple explanatory variables. Suppose X is the T × K matrix of explanatory variables resulting from T observations on K variables. Let Z be a T × K matrix of instruments. Then it can be shown that the estimator

$\widehat{\beta}_{IV} = (Z'X)^{-1} Z'y$

is consistent under a multivariate generalization of the conditions discussed above. If there are more instruments than there are covariates in the equation of interest, so that Z is a T × M matrix with M > K, the generalized method of moments (GMM) can be used and the resulting IV estimator is

$\widehat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z y,$

where $P_Z = Z(Z'Z)^{-1}Z'$ denotes the projection matrix onto the column space of Z. Note that the second expression collapses to the first when the number of instruments is equal to the number of covariates in the equation of interest (just-identified case).
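
Both matrix formulas can be sketched in numpy (synthetic data, illustrative parameter values; the helper names are ours). The overidentified form below avoids building the T × T matrix $P_Z$ explicitly by computing $\widehat{X} = P_Z X$ through a linear solve:

```python
# Sketch of the matrix IV estimators: (Z'X)^{-1} Z'y when M = K, and
# (X' P_Z X)^{-1} X' P_Z y when M > K.
import numpy as np

def iv_just_identified(Z, X, y):
    return np.linalg.solve(Z.T @ X, Z.T @ y)       # (Z'X)^{-1} Z'y

def iv_overidentified(Z, X, y):
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)  # X_hat = P_Z X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(2)
T = 20_000
z1, z2 = rng.normal(size=T), rng.normal(size=T)    # two candidate instruments
u = rng.normal(size=T)                             # unobserved confounder
x = z1 + 0.5 * z2 + u + rng.normal(size=T)         # endogenous regressor
y = 2.0 * x + u + rng.normal(size=T)               # true beta = 2.0
X = x[:, None]

print(iv_just_identified(z1[:, None], X, y))               # about [2.0]
print(iv_overidentified(np.column_stack([z1, z2]), X, y))  # about [2.0]
```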

Interpretation as two-stage least squares


One computational method which can be used to calculate IV estimates is two-stage least squares (2SLS or TSLS). In the first stage, each explanatory variable that is an endogenous covariate in the equation of interest is regressed on all of the exogenous variables in the model, including both exogenous covariates in the equation of interest and the excluded instruments. The predicted values from these regressions are obtained.

Stage 1: Regress each column of X on Z ($X = Z\delta + \text{errors}$) and save the predicted values:

$\widehat{X} = Z\widehat{\delta} = Z(Z'Z)^{-1}Z'X = P_Z X.$

In the second stage, the regression of interest is estimated as usual, except that in this stage each endogenous covariate is replaced with the predicted values from the first stage.

Stage 2: Regress Y on the predicted values from the first stage:

$Y = \widehat{X}\beta + \text{noise},$

which gives

$\widehat{\beta}_{2SLS} = (X'P_Z X)^{-1} X'P_Z Y.$


The resulting estimator of $\beta$ is numerically identical to the expression displayed above. A small correction must be made to the sum of squared residuals in the second-stage fitted model in order that the covariance matrix of $\widehat{\beta}$ is calculated correctly.
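
The two stages can be sketched on synthetic data (setup assumed for illustration) to check the numerical identity with the projection formula; the printed estimates coincide, while the naive second-stage standard errors would still need the residual correction just described:

```python
# Sketch: 2SLS as two OLS passes, checked against (X' P_Z X)^{-1} X' P_Z y.
import numpy as np

rng = np.random.default_rng(3)
T = 10_000
Z = rng.normal(size=(T, 2))                       # instruments, M = 2
u = rng.normal(size=T)                            # unobserved confounder
x = Z @ np.array([1.0, 0.5]) + u + rng.normal(size=T)
y = 2.0 * x + u + rng.normal(size=T)              # true beta = 2.0
X = x[:, None]

# Stage 1: regress X on Z and keep the fitted values X_hat = P_Z X.
delta, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ delta

# Stage 2: regress y on X_hat.
beta_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

# Direct formula, using X_hat' X = X' P_Z X (P_Z is symmetric idempotent).
beta_direct = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
print(beta_2sls, beta_direct)                     # numerically identical
```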
