Regression With One Regressor
Joonhyung Lee
University of Memphis
Econ 7810/8810
Contents
1 Relating Two Variables
2 Estimation
  2.1 Notation
  2.2 Ordinary Least Squares (OLS)
  2.3 The OLS Assumptions
3 Properties of the OLS Estimators
4 Skedasticity
  4.1 Heteroskedasticity and Homoskedasticity
  4.2 Weighted Least Squares (WLS)
  4.3 The Variance of X and the Variance of β̂1
5 Statistical Inference
7 Goodness of Fit
8 Units of Measurement
9 Estimation in Stata
  9.1 Effects of Education on Hourly Wage (WAGE1.DTA)
  9.2 Test score and student ratio
  9.3 CPS data
1 Relating Two Variables
• Econometrics is concerned with understanding relationships between variables that
we as economists care about.
– Education and wages, investment and innovation, advertising and sales, class
size and test scores...
• But given what we know so far, all we can do to study the relationship between two
(or more) variables is to use covariance and correlation.
• But does rXY > 0 mean that high values of X cause the values of Y to be high?
• Let’s start with a case where X is discrete and compare E (Y | X) for two values of
X.
• Example: suppose we have data on average hourly earnings for men and women. Is there a significant gender gap?
– The wage gap Ȳm − Ȳw is $4.11 per hour (a Stata sketch of this comparison appears at the end of this section).
– The standard error is SE(Ȳm − Ȳw) = .35, so the t-stat for H0 : µm − µw = 0 is (4.11 − 0)/.35 = 11.74, which has a p-value that's very close to 0 (2Φ(−11.74) ≈ 0).
– Indeed, a 95% CI for the wage gap is 4.11 ± 1.96 · .35 = (3.41, 4.80)
– So there is a gender gap and it’s statistically significant.
– But is this a sign of discrimination?
– Quite possibly. But why might it not be?
– Some “other factor” could be driving the relationship (experience, education).
• To establish gender bias we need to keep "everything else" constant, which means that instead of looking at E(earnings | gender), we would want to look at E(earnings | gender, education, experience, ...).
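• A minimal Stata sketch of the two-group comparison above, assuming a CPS-style dataset (as in Section 9.3) with average hourly earnings in ahe08 and a male/female indicator a_sex is loaded; ttest reports the difference in means, its standard error, t-statistic, p-value, and confidence interval:
    * difference in mean earnings between the two groups, with SE, t-stat, p-value, and CI
    ttest ahe08, by(a_sex)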
2 Estimation
• Let's keep it simple in the beginning and start with E(Y | X), for example E(TestScore | ClassSize).
• Aren’t other variables also important and perhaps driving the relationship?
• For now we'll just stick to one X (and attribute these other factors to random variation).
• Adding more X’s turns out to be pretty simple and will allow us to account for these
additional factors explicitly.
2.1 Notation
• Y is the dependent, explained, response, predicted variable or regressand.
• X is the independent, explanatory, control, predictor variable or regressor, covariate.
• We know that E (Y | X) is a function of X, but what function?
• Let’s start by assuming it’s linear.
• Suppose E (Y | X) is linear in X:
E (Y | X) = β0 + β1 X
• In words, this is saying that if we know X, the expected value of Y is a linear function
of X.
• β0 + β1 X is then called the population regression line (the relationship that holds
between Y and X on average).
• So what do β0 & β1 represent? Consider the impact on Y of a one unit change in X:
  E(Y | X = x) = β0 + β1x
  E(Y | X = x + 1) = β0 + β1(x + 1)
  E(Y | X = x + 1) − E(Y | X = x) = β0 + β1(x + 1) − β0 − β1x = β1
• So β1 is the expected change in Y associated with a one unit change in X (i.e. the slope: β1 = ΔY/ΔX).
• β0 is the intercept: the expected value of Y when X = 0.
– The intercept is simply the point at which the population regression line intersects the Y axis.
– Note that in applications where X cannot equal 0, the intercept has no real meaning. For example, if X is class size and Y is a test score, the intercept is the expected test score when class size is 0, which means nothing.
• Note E (Y | X) = β0 + β1 X doesn’t mean that the data will all lie on the same line.
• Notice that we didn’t write Yi = β0 +β1 Xi , but wrote E (Y | X) = β0 +β1 X instead.
• E (Y | X) is an expectation, the actual observations will be scattered around the
population regression line:
Yi = β0 + β1 Xi + ui
• ui represents all the other factors besides Xi that determine the value of Yi for a particular observation i (the simulation sketch below illustrates this scatter).
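• A minimal simulation sketch of this idea (the parameter values β0 = 2, β1 = 0.5 and the error spread are illustrative, not from the notes): the data are generated from an assumed population regression line, and the ui scatter the observations around it.
    clear
    set obs 100
    set seed 42
    generate x = runiform(0, 10)
    generate u = rnormal(0, 2)            // all the "other factors"
    generate y = 2 + 0.5*x + u            // Yi = b0 + b1*Xi + ui with b0 = 2, b1 = 0.5
    twoway (scatter y x) (function 2 + 0.5*x, range(0 10)), ///
        legend(order(1 "observations" 2 "population regression line"))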
2.2 Ordinary Least Squares (OLS)
• Given that we’ve assumed there’s a linear relationship between E (Y | X) and X,
how do we estimate it?
• Intuitively, we want to estimate Ê(Y | X) = β̂0 + β̂1X, where β̂0 & β̂1 are estimates of the population parameters β0 & β1 (just like X̄ is an estimate of µ).
• So how do we find β̂0 & β̂1? By minimizing the prediction error.
• Our estimates β̂0 & β̂1 will give us the predicted value of Y conditional on X (the predicted values are Ŷi = β̂0 + β̂1Xi).
• Although we expect our estimates of β0 & β1 to be correct on average, for any
particular observation i, we are likely to make a prediction error.
• The error made in predicting the ith observation is given by Yi − Ŷi = Yi − β̂0 − β̂1Xi.
• Intuitively, we would like to choose β̂0 & β̂1 to make all of these errors as small as possible. But how?
• The OLS estimator chooses the regression coefficients by minimizing the sum of the squared prediction errors:
  min over β̂0, β̂1 of Σ(Yi − Ŷi)² = Σ[Yi − (β̂0 + β̂1Xi)]²
• Another way: min Σ|Yi − Ŷi| (least absolute deviations, i.e. median/quantile regression).
• Solving the OLS minimization problem gives
  β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²   and   β̂0 = Ȳ − β̂1X̄
• So we can derive the estimating equations for β̂0 & β̂1 by minimizing the sum of squared prediction errors (recall that X̄ can be constructed in a similar way).
• Moreover, just like X̄, β̂0 & β̂1 are themselves random variables. (We'll derive their distributions in a bit; a Stata sketch that computes β̂0 & β̂1 by hand follows.)
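• A sketch computing β̂0 and β̂1 by hand from the formulas above and checking them against regress; Stata's built-in auto data are used as a stand-in for any (Y, X) pair:
    sysuse auto, clear
    quietly summarize mpg
    scalar xbar = r(mean)
    quietly summarize price
    scalar ybar = r(mean)
    generate double num = (mpg - xbar)*(price - ybar)
    generate double den = (mpg - xbar)^2
    quietly summarize num
    scalar sxy = r(sum)
    quietly summarize den
    scalar sxx = r(sum)
    scalar b1 = sxy/sxx                   // slope: sum[(Xi - Xbar)(Yi - Ybar)] / sum[(Xi - Xbar)^2]
    scalar b0 = ybar - b1*xbar            // intercept: Ybar - b1*Xbar
    display "by hand: b0 = " b0 "   b1 = " b1
    regress price mpg                     // coefficients should match the hand calculation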
2.3 The OLS Assumptions
• So why should we have faith in the OLS methodology?
• Do the OLS estimators of the model
  y = β0 + β1x + u
  have the same desirable properties that X̄ had (unbiasedness, consistency, asymptotic normality, efficiency)?
• This will allow us to construct CIs and test hypotheses just like we did for µ.
• Much of the second half of the course is focused on handling situations where these assumptions fail.
Assumption 1 & 4
• Yi = β0 + β1Xi + ui
• The conditional distribution of ui given Xi has mean 0, E(ui | Xi) = 0, called the zero conditional mean assumption.
• In a regression of wages on education, for example, people choose education levels partly based on ability (which is left in ui), so this assumption is almost certainly false.
• We will relax this to E(ui | X1, X2) = E(ui | X2), which is called conditional mean independence. In this case, we can still give the coefficient on X1 a causal interpretation, but not the coefficient on X2. The idea is that X1 is an independent (exogenous) variable as long as X2 is controlled for. We will get back to this issue in linear regression with multiple regressors.
• This means the conditional distribution is centered around the population regression
line.
Assumption 2
• (Xi, Yi), i = 1, ..., n, are i.i.d. draws from their joint distribution.
• This assumption is likely to hold in cross-sections (with random sampling), but is often violated in time series or panel data.
3 Properties of the OLS Estimators
3.1 OLS is unbiased
• We are going to show that OLS is unbiased and find its asymptotic distribution.
• Note that Ȳ = β0 + β1X̄ + ū, so Yi − Ȳ = β1(Xi − X̄) + (ui − ū).
• Using this trick and some additional algebra, we can rewrite the formula for β̂1 in a more useful way:
  β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
     = Σ(Xi − X̄)[β1(Xi − X̄) + (ui − ū)] / Σ(Xi − X̄)²
     = [β1Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū)] / Σ(Xi − X̄)²
     = [β1Σ(Xi − X̄)² + Σ(Xi − X̄)ui − ūΣ(Xi − X̄)] / Σ(Xi − X̄)²
     = [β1Σ(Xi − X̄)² + Σ(Xi − X̄)ui] / Σ(Xi − X̄)²        (since Σ(Xi − X̄) = 0)
     = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²
• So
  β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²
  which is a very useful result (we'll use it to derive the distribution of β̂1 later on).
• Now, to show E(β̂1) = β1 I just need to show that the expected value of the second term is zero.
• Since we have
  β̂1 = β1 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²
  it follows that
  E(β̂1) = β1 + E[ Σ(Xi − X̄)ui / Σ(Xi − X̄)² ]
  (& using the LIE (E(Z) = E[E(Z | X)]) on the 2nd term)
        = β1 + E[ E( Σ(Xi − X̄)ui / Σ(Xi − X̄)² | X1, ..., Xn ) ]
  and since we know E(XY | X) = X·E(Y | X) for any Y
        = β1 + E[ Σ(Xi − X̄)E(ui | X1, ..., Xn) / Σ(Xi − X̄)² ]
  which, since E(ui | X1, ..., Xn) = E(ui | Xi) = 0 by OLS Assumptions 4 and 2 respectively,
        = β1 + 0 = β1
• Starting from the conditional variance
  Var(β̂1 | X1, ..., Xn) = Σ(Xi − X̄)² Var(ui | Xi) / [Σ(Xi − X̄)²]²,
  if we further assume assumption #5, i.e. homoskedasticity (Var(ui | Xi) = σ², a constant), we can go further:
  Var(β̂1 | X1, ..., Xn) = (1/[Σ(Xi − X̄)²]²) · Σ(Xi − X̄)²σ²
    = σ² / Σ(Xi − X̄)²
    = (σ²/n) / [ (1/n)Σ(Xi − X̄)² ]
    = σ² / (n · var(Xi))
• So, SE(β̂1) = √Var(β̂1).
• These are the formulas that Stata uses to construct the standard errors (a hand computation is sketched below).
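• A sketch of that hand computation, using the homoskedasticity-only formula SE(β̂1)² = σ̂² / Σ(Xi − X̄)² with σ̂² = Σûi²/(n − 2), and comparing it with the default regress output (built-in auto data as a stand-in):
    sysuse auto, clear
    quietly regress price mpg
    predict double uhat if e(sample), resid
    generate double uhat2 = uhat^2
    quietly summarize uhat2
    scalar sig2 = r(sum)/(r(N) - 2)       // estimated error variance
    quietly summarize mpg if e(sample)
    generate double dev2 = (mpg - r(mean))^2
    quietly summarize dev2
    scalar se_b1 = sqrt(sig2/r(sum))      // sqrt( sigma^2 / sum (Xi - Xbar)^2 )
    display "hand-computed SE(b1) = " se_b1
    regress price mpg                     // compare with the reported Std. Err. on mpg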
3.4.1 Convergence in Probability
• A sequence of random variables a1, a2, ..., an, ... converges in probability to a (an →p a) if, for every ε > 0,
  P(|an − a| > ε) → 0 as n → ∞
• Examples: Ȳ →p µY, s²Y →p σ²Y
3.4.2 Convergence in Distribution
• Let F1, F2, ..., Fn, ... be a sequence of CDFs corresponding to a sequence of random variables W1, W2, ..., Wn, ...
• Wn converges in distribution to W (Wn →d W) if the CDFs {Fn} converge to F (the CDF of W):
  Wn →d W  ⟺  lim(n→∞) Fn(t) = F(t)
• Examples:
  (Ȳ − µY) / (σY/√n) →d N(0, 1),   Ȳ ∼a N(µY, σ²Y/n)
• Since (Ȳ − µY)/(σY/√n) = √n(Ȳ − µY)/σY, the CLT can also be written as
  √n(Ȳ − µY) →d σY·N(0, 1)
  √n(Ȳ − µY) →d N(0, σ²Y)
– Assume Yi ∼ iid(µY, σ²Y), with σ²Y < ∞.
– Recall that the t-statistic based on Ȳ is
  t = (Ȳ − µY) / (sY/√n)
  and let an = σY/sY and Wn = (Ȳ − µY)/(σY/√n), so that t = an·Wn.
– Since s²Y →p σ²Y (and by using the continuous mapping theorem¹), we know that an →p 1.
• Similarly, from the earlier decomposition,
  β̂1 − β1 = [ (1/n)Σ(Xi − X̄)ui ] / [ (1/n)Σ(Xi − X̄)² ]
• Let (1/n)Σ(Xi − X̄)ui ≈ (1/n)Σ(Xi − µX)ui = (1/n)Σvi = v̄, where vi = (Xi − µX)ui.
• We know (1/n)Σ(Xi − X̄)² →p var(Xi) = σ²X (so the denominator plays the role of an, with an →p σ²X).
¹ The continuous mapping theorem states that, for any continuous function g:
  ∗ if an →p a then g(an) →p g(a), and
  ∗ if Wn →d W then g(Wn) →d g(W)
• Suppose we can prove that
  √n·v̄ →d N(0, σ²v)  ⟺  Wn →d N(0, σ²v)
• Then
  √n(β̂1 − β1) →d (1/σ²X)·N(0, σ²v),   so   β̂1 ∼a N(β1, σ²v / [n(σ²X)²])
• So how do we show that √n·v̄ →d N(0, σ²v)? By the CLT!
  (v̄ − µv)/(σv/√n) →d N(0, 1)  ⟺  √n·v̄ →d N(0, σ²v)
  (note that µv = E[(Xi − µX)ui] = 0)
• Putting the pieces together,
  √n(β̂1 − β1) = √n·v̄ / [ (1/n)Σ(Xi − X̄)² ]
  where the numerator →d N(0, σ²v) and the denominator →p σ²X.
• Or,
  √n(β̂1 − β1) →d (1/σ²X)·N(0, σ²v) = N(0, σ²v/(σ²X)²)
  A small Monte Carlo sketch below illustrates this approximate normality.
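• A small Monte Carlo sketch of this result (the design, with β0 = 1, β1 = 0.5, and n = 200, is illustrative and not from the notes): across repeated samples, β̂1 is centered at β1 and approximately normally distributed.
    clear all
    program define olssim, rclass
        drop _all
        set obs 200
        generate x = rnormal(0, 1)
        generate u = rnormal(0, 2)
        generate y = 1 + 0.5*x + u        // true beta0 = 1, beta1 = 0.5
        regress y x
        return scalar b1 = _b[x]
    end
    simulate b1 = r(b1), reps(1000) seed(12345): olssim
    summarize b1                          // mean should be close to 0.5 (unbiasedness)
    histogram b1, normal                  // roughly bell-shaped around beta1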
4 Skedasticity
4.1 Heteroskedasticity and Homoskedasticity
• Note that the ui ’s determine how the data will be scattered around the regression
line.
• But so far we've made no assumptions about Var(ui) (aside from a nonzero finite fourth moment: 0 < E(ui⁴) < ∞).
• We did assume that E(ui | Xi) = 0 but we did not assume that Var(ui | Xi) = σ²u (i.e. that the variance does not depend on the regressors).
• If it's true that Var(ui | Xi) = σ²u (a constant) then we have homoskedasticity, which is a useful property to have!
• Examples: education and income, a firm's productivity and foreign investment, etc.
• If the tendency to scatter has some pattern, we may use quantile regression.
• Second, more importantly, assumptions above plus homoskedasticity mean that OLS
is BLUE.
• However,
– If the errors are instead heteroskedastic, which is true in most practical applications, OLS is no longer BLUE.
– Even if the errors are homoskedastic, OLS is only the best in the linear sense. That is, there may be a better non-linear estimator than OLS.
• Remember that the point estimate is the same either way: only the SEs change depending on what you assume about Var(ui | Xi).
• Typically
  σ̂²β̂1(HR) > σ̂²β̂1(homoskedasticity only)
– Point estimates are the same, but tests and CI’s change
• Thus, it's much safer to always use HR standard errors unless you know that Var(ui | Xi) = σ²u.
• Suppose
1. E (Yi | Xi ) = β0 + β1 Xi
2. (Yi , Xi ) ∼ iid
4. Var(ui | Xi) = λf(Xi), where f(·) is a known function and λ is an unknown constant (so we know the form of heteroskedasticity up to the proportionality factor λ), but the errors are still uncorrelated across observations (E(ui·uj) = 0 for i ≠ j).
Define
  Ỹi = Yi/√f(Xi),   X̃0i = 1/√f(Xi),   X̃1i = Xi/√f(Xi),   and   ũi = ui/√f(Xi)
Then
  Yi = β0 + β1Xi + ui   ⟹   Ỹi = β0X̃0i + β1X̃1i + ũi
but now
  Var(ũi | Xi) = Var( ui/√f(Xi) | Xi ) = Var(ui | Xi)/f(Xi) = λ,   which is a constant.
So the transformed equation
  Ỹi = β0X̃0i + β1X̃1i + ũi
is homoskedastic and can be estimated by least squares.
• It's called weighted least squares because we calculate the coefficients by minimizing the sum of the squared residuals, weighted by 1/f(Xi).
• WLS can be extended to cases where we have to estimate the function f(·), which is called feasible WLS. (A minimal Stata sketch of WLS follows.)
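• A minimal WLS sketch in Stata for the known-f(·) case above, with the illustrative (assumed) choice f(X) = X; weighting each observation by 1/f(Xi) via analytic weights reproduces the WLS coefficients:
    sysuse auto, clear
    generate double w = 1/mpg             // weights 1/f(Xi), with the assumed f(X) = X = mpg
    regress price mpg [aweight = w]       // weighted least squares
    regress price mpg                     // unweighted OLS, for comparison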
4.3 The Variance of X and the Variance of β̂1
• The (asymptotic) variance of β̂1 is
  σ²β̂1 = (1/n) · Var((Xi − µX)ui) / [Var(Xi)]²
– Which of these two scatterplots would you rather fit a line through? The one where X varies more: a high variance of X yields a low variance of β̂1.
– Second, a low variance of u yields a low variance of β̂1. This is the rationale for adding more control variables that explain y.
5 Statistical Inference
• We are now able to construct confidence intervals and conduct hypothesis tests.
• Similarly our t-ratio or t-statistic is t = (β̂1 − β1,0) / SE(β̂1)
• Remember that the p-value is the smallest significance level at which the null hy-
pothesis could be rejected.
• Regression Analysis can also be used when X is a binary or dummy variable (i.e. can
only take on the values 0 and 1).
• Although the coefficients are calculated in exactly the same way when X is binary,
the interpretation of β1 differs.
• For example, let Yi be average hourly earnings in 2008 and Di equal 1 if the worker
is male and 0 if the worker is female.
Yi = β0 + β1 Di + ui
• For this reason, we just call β1 the coefficient on Di , instead of the slope.
• So how do we interpret β1 if it's not a slope? Let's look at what we have for each value of Di:
  When Di = 0:  Yi = β0 + β1 · 0 + ui = β0 + ui
  When Di = 1:  Yi = β0 + β1 · 1 + ui = β0 + β1 + ui
  So β1 = E(Yi | Di = 1) − E(Yi | Di = 0), the difference in the population means of the two groups (see the Stata sketch below).
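• A minimal sketch showing that regression on a binary regressor reproduces the difference in group means; Stata's built-in auto data and its dummy foreign stand in for the earnings data and the male indicator:
    sysuse auto, clear
    regress price foreign                 // coefficient = mean(price|foreign=1) - mean(price|foreign=0)
    ttest price, by(foreign)              // same difference in means (and the same t with homoskedastic SEs)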
7 Goodness of Fit
• So we’ve learned how to estimate β0 & β1 and how to test hypotheses and build CI’s
using these estimates.
• Can we measure how much better the regression does at estimating Y than just using the sample mean Ȳ?
• We need to measure how close we are getting to the data.
• Define the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (or sum of squared residuals) as
  SST = TSS = Σ(yi − ȳ)²,   SSE = ESS = Σ(ŷi − ȳ)²,   SSR = RSS = Σûi²   (sums over i = 1, ..., n)
• The R² of the regression is then R² = SSE/SST = 1 − SSR/SST, the fraction of the sample variation in Y explained by the regression.
• Note that 0 ≤ R² ≤ 1
• R² = 1 is a perfect fit (all the data points are on the regression line).
• R² = 0 means you are explaining none of the variation in Y (so your best guess for any Yi is just the sample mean Ȳ).
• It turns out that there is a close link between R² in the univariate regression model and the sample correlation coefficient rXY = sXY/(sX·sY).
• The sample correlation (rXY ) is a measure of the linear relationship between two
variables.
• In fact, R² = r²XY in the univariate case (you can prove this using the definitions of R² and rXY); the sketch at the end of this section verifies it numerically.
• This is useful to know since it gives us some idea of what a high or low R2 should
“look like”.
• A high R2 means that a lot of the total variation is explained by the regression (data
is tightly concentrated around the line).
• But, R2 does not tell you about the statistical significance of the coefficients (for this
you need SEs).
– R2 also does not prove our model is right or wrong: you can have a good model
but a low R2 because V ar(ui ) is large.
– Can also have a bad model with R2 ≈ 1
– Spurious regression: X and Y move together because of something else.
– Ex: Regress the number of supermarkets on the number of cars (or video stores).
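• A quick numerical check of R² = r²XY in the univariate case, using Stata's built-in auto data as a stand-in (any single-regressor example will do):
    sysuse auto, clear
    regress price mpg
    display "R-squared from regress:      " e(r2)
    correlate price mpg
    display "squared sample correlation:  " r(rho)^2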
8 Units of Measurement
• It is very important to know how y and x are measured in order to interpret regression
functions. Consider an equation estimated from CEOSAL1.DTA, where annual CEO
salary is in thousands of dollars and the return on equity is a percent:
  \widehat{salary} = 963.191 + 18.501 roe
  n = 209, R² = .0132
• What happens if we define roedec = roe/100 (the return on equity expressed as a decimal) and regress salary on roedec?
• Nothing should happen to the intercept: roedec = 0 is the same as roe = 0. But the slope will be multiplied by 100 (18.501 becomes 1,850.1). The goodness-of-fit should not change, and it does not.
• Now a one percentage point change in roe is the same as ∆roedec = .01, and so we
get the same effect as before.
• What if we measure salary in dollars, rather than thousands of dollars?
• Both the intercept and slope get multiplied by 1,000:
  \widehat{salarydol} = 963,191 + 18,501 roe
  n = 209, R² = .0132
  (The Stata sketch below runs through the same rescaling exercise.)
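• The rescaling exercise above, as a Stata sketch (assuming CEOSAL1.DTA, with salary in thousands of dollars and roe in percent, is in the working directory):
    use CEOSAL1.DTA, clear
    regress salary roe                    // intercept 963.191, slope 18.501
    generate roedec = roe/100             // roe as a decimal
    regress salary roedec                 // slope multiplied by 100; intercept and R^2 unchanged
    generate salarydol = salary*1000      // salary in dollars
    regress salarydol roe                 // intercept and slope multiplied by 1,000; R^2 unchanged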
9 Estimation in Stata
9.1 Effects of Education on Hourly Wage (WAGE1.DTA)
• Data are from 1991 on men only. wage is reported in dollars per hour, educ is highest
grade completed.
• reg wage educ
• Negative intercept. Each additional year of schooling is estimated to be worth $0.54.
• Plugging in educ = 0 gives the silly prediction \widehat{wage} = −.90. Extrapolating outside the range of the data can produce strange predictions.
• When educ = 12, the predicted hourly wage is $5.59, which we can think of as our
estimate of the average wage in the population when educ = 12.
• margins, at(educ=12)
• We are explaining about 16% of the variation in wage with our regression. In other
words, 84% of variation in wage remains unexplained.
• predict wagehat
• predict uhat, resid
• Some residuals are positive, others are negative. None is especially close to zero. Years of schooling, by itself, need not be a very good predictor of wage. (The commands above are collected in the do-file sketch below.)
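• The commands above, collected as a short do-file sketch (assuming WAGE1.DTA is in the working directory):
    use WAGE1.DTA, clear
    regress wage educ
    margins, at(educ=12)                  // predicted wage at 12 years of schooling
    predict wagehat                       // fitted values
    predict uhat, resid                   // residuals
    summarize wage wagehat uhat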
9.2 Test score and student ratio
• So what is the expected impact on test scores of a two student increase in class size? −2.28 × 2 = −4.56
• What is the expected test score in a district with 20 students per teacher? How about
30 students? 0 students?
• Of course, we can use them to test hypotheses as we did before. For example, suppose you want to test
  H0: β1 = 0  vs.  HA: β1 ≠ 0
  t-stat = (β̂1 − 0)/SE(β̂1) = −2.28/.52 = −4.39  ⇒  p-value = 2·Φ(−4.39) ≈ 0 (so we reject the null).
  Alternatively, a 95% CI for β1 is simply β̂1 ± 1.96·SE(β̂1) = −2.28 ± 1.02 = (−3.30, −1.26) (same conclusion).
• We can see that (−.226)² = .051, which is the R² from the regression! (Here −.226 is the sample correlation between test scores and the student-teacher ratio.)
• What do you expect the test score of Orange County to be, compared to other counties? Run the regression and interpret it. (A sketch of the commands for this example follows.)
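• A sketch of the commands for this example, assuming the California school-district data are loaded with the test score in testscr and the student-teacher ratio in str (hypothetical variable names):
    regress testscr str, robust
    display _b[str]*2                     // expected change from a 2-student increase in class size
    margins, at(str=(20 30))              // predicted test scores at 20 and 30 students per teacher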
9.3 CPS data
• Stata commands: reg ahe08 i.a_sex if year==2008, robust; margins a_sex
• We can test the hypothesis H0: β1 = 0 vs. HA: β1 ≠ 0 by calculating the t-statistic
  t_act = (β̂1 − 0)/SE(β̂1) = −4.10/.353 = −11.59
• We can reject the null hypothesis at any positive level of significance (just as before).