HSP 511: Economics Lab
Lecture 1
Indian Institute of Technology Delhi
Contents
Linear Mean Model
Finite Sample Properties
Large Sample Properties
    Consistency
    Asymptotic Normality
    Confidence Interval and Testing
An econometric model is a mathematical model that embodies a set of statistical
assumptions concerning the generation of sample data (and similar data from a
larger population). It is usually specified as a mathematical relationship between
one or more random variables and other non-random variables. Suppose that we
have a population of children and an i.i.d. sample of height and age data from
this population. The height of a child is stochastically related to its age. We
could formalize that relationship in a linear regression model,

$$\text{height}_i = b_0 + b_1\,\text{age}_i + \epsilon_i,$$

where $b_0$ is the intercept, $b_1$ is the parameter that age is multiplied by to
obtain a prediction of height, and $\epsilon_i$ is the error term. This implies that
height is predicted by age, with some error. We can further impose restrictions
on the error. However, these assumptions on the error term $\epsilon_i$ should be
such that the model is consistent with all the data points.
※ Linear Mean Model
Suppose we have a random variable $X_i$, e.g., the height of individuals in some
population. This random variable has some distribution which we do not know, and
we want to learn about this distribution from an i.i.d. sample from the population.
Suppose we are only interested in the mean of this distribution, since the mean is
the best prediction of any random variable in terms of mean squared error.

Let's consider the linear mean model: let $\theta \in \mathbb{R}$ be a parameter
such that we have the statistical model

$$X_i = \theta + \epsilon_i,$$
where $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2 < \infty$. The only assumption here is that the variance
is finite; everything else holds without loss of generality, since we can always write

$$X_i = E(X_i) + (X_i - E(X_i)) = E(X_i) + \epsilon_i,$$

where $\epsilon_i$ satisfies $E(\epsilon_i) = 0$. So the quantities of interest are $\theta = E(X_i)$ and $\sigma^2 = \mathrm{Var}(\epsilon_i)$.
Let X = (X1 . . . Xn )⊤ be an i.i.d. sample from this model. We want to estimate
$(\theta, \sigma^2)$ from the sample observations. Consider the following estimators:

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} \big[X_i - \hat{\theta}\big]^2.$$
Remark. Suppose we want to predict a random variable $X$ by a constant $\alpha$
such that the mean squared error is minimized:

$$\min_{\alpha}\; E(X - \alpha)^2.$$

The $\alpha$ which minimizes this problem is exactly $E(X)$: the best prediction of
any random variable in terms of MSE is its expectation.
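To see why, expand around $E(X)$; the cross term has mean zero, so

$$E(X - \alpha)^2 = E\big[(X - E(X)) + (E(X) - \alpha)\big]^2 = \mathrm{Var}(X) + \big(E(X) - \alpha\big)^2,$$

which is minimized exactly at $\alpha = E(X)$.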
Remark (Plug-in Approach). The most common approach to developing estimators is
based on the analogy principle, or plug-in method. The analogy principle
of estimation proposes that population parameters be estimated by sample
statistics which have the same property in the sample as the parameters
do in the population. More generally, expectations are replaced by sample
averages and parameters are replaced by their estimators.
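As a minimal sketch of the plug-in idea, assuming an Exponential(1) population
(so the true mean and variance are both 1):

import numpy as np

# Plug-in estimation: replace population expectations by sample averages.
x = np.random.exponential(scale=1.0, size=10000)
mean_hat = np.mean(x)                  # sample analogue of E(X)
var_hat = np.mean((x - mean_hat)**2)   # sample analogue of E[(X - E(X))^2]
print(mean_hat, var_hat)               # both should be close to 1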
Definition 1 An estimator θ̂ of θ is called unbiased if
E[θ̂] = θ.
Exercise 1 Show that θ̂ and σ̂² are unbiased estimators of θ and σ², respectively.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Monte Carlo check of unbiasedness: 10,000 samples of size 100 from
# Uniform(0, 1), whose true mean is 0.5. The average of the sample means
# minus the true mean approximates the bias of theta-hat.
H = np.mean(np.mean(np.random.rand(100, 10000), axis=0)) - .5
print('The bias for theta is: {}'.format(H))
## The bias for theta is: 0.00025030865719077866
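A similar Monte Carlo check for σ̂² (a sketch; np.var with ddof=1 uses the n − 1
divisor, and the true variance of Uniform(0, 1) is 1/12):

# 10,000 samples of size 100 from Uniform(0, 1); the average of the
# variance estimates minus the true variance 1/12 approximates the bias.
S = np.var(np.random.rand(100, 10000), axis=0, ddof=1)
print('The bias for sigma^2 is: {}'.format(np.mean(S) - 1/12))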
Fact 1.1
(1) If $X \sim N(\theta, \Sigma)$ is an $n \times 1$ random vector and $A$ is an $m \times n$ matrix, then
$AX \sim N(A\theta, A\Sigma A^\top)$.
(2) If $X \sim N(0, 1)$, then $X^2 \sim \chi^2_1$.
(3) If $X \sim N(0, I_n)$, then $X^\top X \sim \chi^2_n$, where $I_n$ denotes the $n \times n$ identity matrix.
(4) If $X \sim N(0, I_n)$ and $M$ is an orthogonal projection ($M^2 = M$ and $M = M^\top$)
with rank $n - k$, then $X^\top M X \sim \chi^2_{n-k}$.
(5) If $z \sim N(0, 1)$, $w \sim \chi^2_m$, and $z \perp w$, then
$$\frac{z}{\sqrt{w/m}} \sim t_m.$$
※ Finite Sample Properties
We want to know the distributions of our estimators θ̂ and σ̂² in order to make
inferences or build confidence intervals. Usually, it is not easy to know the
distribution of an estimator unless we either make some assumption or rely on
large sample considerations. Here we make a normality assumption in order to find
the distributions of our estimators. Assume that $X_i \sim N(\theta, \sigma^2)$. Then

$$\hat{\theta} \sim N(\theta, \sigma^2/n)$$

(by Fact 1.1 (1) with $A = \frac{1}{n}\mathbf{1}_n^\top$), and

$$\frac{(n-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1};$$
the latter follows from Fact 1.1 (4), since

$$\frac{(n-1)\hat{\sigma}^2}{\sigma^2} = \frac{1}{\sigma^2}\, X^\top \underbrace{\begin{pmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{pmatrix}}_{M} X,$$

where $M = I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top$ is a projection matrix of rank $n - 1$. Note that $M\mathbf{1}_n = 0$,
so $X^\top M X = (X - \theta\mathbf{1}_n)^\top M (X - \theta\mathbf{1}_n)$, and $(X - \theta\mathbf{1}_n)/\sigma \sim N(0, I_n)$, as Fact 1.1 (4)
requires.
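This finite-sample distribution is easy to verify by simulation; a sketch assuming
N(0, 1) data with n = 10, so that (n − 1)σ̂²/σ² should follow χ² with 9 degrees of
freedom:

from scipy.stats import chi2

# 10,000 samples of size n = 10 from N(0, 1); with sigma^2 = 1, the statistic
# (n - 1) * sigma-hat^2 should follow the chi-squared(n - 1) distribution.
n = 10
stat = (n - 1) * np.var(np.random.randn(n, 10000), axis=0, ddof=1)
grid = np.arange(0, 30, 0.01)
plt.hist(stat, bins=50, density=True)
plt.plot(grid, chi2.pdf(grid, df=n - 1))
plt.show()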
※ Large Sample Properties
※ Consistency
Definition 2 We say an estimator θ̂ of θ is consistent if $\hat{\theta} \xrightarrow{P} \theta$.
We now show that θ̂ and σ̂² are consistent estimators. First, under $E(|X_i|) < \infty$,
the weak law of large numbers gives

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \theta.$$

Secondly, under $E(X_i^2) < \infty$,

$$\begin{aligned}
\hat{\sigma}^2 &= \frac{1}{n-1}\sum_{i=1}^{n} [X_i - \hat{\theta}]^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n} [X_i^2 + \hat{\theta}^2 - 2\hat{\theta}X_i] \\
&= \underbrace{\frac{n}{n-1}}_{\to 1}\; \underbrace{\frac{1}{n}\sum_{i=1}^{n} X_i^2}_{\xrightarrow{P} E(X_i^2)} \;-\; \underbrace{\frac{n}{n-1}}_{\to 1}\; \underbrace{\hat{\theta}}_{\xrightarrow{P} \theta}\,\underbrace{\hat{\theta}}_{\xrightarrow{P} \theta} \\
&\xrightarrow{P} E(X_i^2) - (E(X_i))^2 = \sigma^2,
\end{aligned}$$

where the last line follows from the continuous mapping theorem.
# Estimate P(|theta-hat - theta| > eps) for growing sample sizes: for each
# n = 100, 200, ..., 10000, draw 1,000 samples of size n from Uniform(0, 1)
# and record the fraction of sample means farther than eps from 0.5.
dff = np.zeros(100)
eps = .01
for i in range(100):
    # indicator of |theta-hat - 0.5| > eps, computed via a sign trick
    H = (np.sign(np.abs(np.mean(np.random.rand(100*(i+1), 1000),
                                axis=0) - .5) - eps) + 1) / 2
    dff[i] = np.mean(H)
plt.plot(dff)
plt.show()
[Figure: estimated probability P(|θ̂ − θ| > ϵ) plotted against the sample-size index; the curve falls toward zero as n grows.]
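A similar sketch for the consistency of σ̂², again assuming Uniform(0, 1) data
(true variance 1/12):

# Consistency of sigma-hat^2: one Uniform(0, 1) sample for each sample size
# n = 100, 200, ..., 10000; the estimates settle near the true variance 1/12.
sizes = 100 * np.arange(1, 101)
var_hats = [np.var(np.random.rand(n), ddof=1) for n in sizes]
plt.plot(sizes, var_hats)
plt.axhline(1/12, color='red')
plt.show()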
※ Asymptotic Normality
Now we will see that, as the sample size grows, the approximate distribution of θ̂
is normal. We will use the central limit theorem to prove this result.
Theorem 3.1: Central Limit Theorem
Suppose {X1 , . . . , Xn } is a sequence of i.i.d. random variables with E[Xi ] = µ
and $\mathrm{Var}[X_i] < \infty$. Then, as $n$ approaches infinity,

$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \xrightarrow{d} N(0, \sigma^2),$$

where

$$\sigma^2 = \lim_{n\to\infty} n\,\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right).$$
# CLT in action: 1,000 sample means of size n = 100 from Uniform(0, 1),
# standardized by sqrt(n)/sqrt(1/12) = sqrt(12 * 100), since the variance
# of Uniform(0, 1) is 1/12; compare against the standard normal density.
x_axis = np.arange(-3, 3, 0.01)
clt = np.sqrt(12*100) * (np.mean(np.random.rand(100, 1000), axis=0) - .5)
plt.hist(clt, bins=50, density=True)
plt.plot(x_axis, norm.pdf(x_axis, 0, 1))
plt.show()
[Figure: histogram of the standardized sample means with the N(0, 1) density overlaid.]
※ Confidence Interval and Testing
Now, we want to develop a test of

$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta \neq \theta_0.$$

The general strategy behind testing is to develop a test statistic $T(X)$ whose
distribution under the null is known and whose distribution under the alternative
differs from its distribution under the null. The ability of a test based on
$T(X)$ to distinguish the null from the alternative relies on this difference in
behavior. Let's consider the following test statistic:

$$T(X) = \frac{\sqrt{n}(\hat{\theta} - \theta_0)}{\hat{\sigma}}.$$
Under the null, as the sample size increases, we have

$$T(X) \xrightarrow{d} N(0, 1).$$

Under the alternative, we have

$$T(X) = \frac{\sqrt{n}(\hat{\theta} - \theta_0)}{\hat{\sigma}} = \underbrace{\frac{\sqrt{n}(\hat{\theta} - \theta_1)}{\hat{\sigma}}}_{\xrightarrow{d}\, N(0,1)} + \underbrace{\frac{\sqrt{n}(\theta_1 - \theta_0)}{\hat{\sigma}}}_{\text{large positive or negative}},$$
where $\theta_1$ is the true parameter value under the alternative. Suppose we also
want to control the type I error to be at most $\alpha$. Consider the test $\phi^*$ given by

$$\phi^*(X) = \begin{cases} 1 & \text{if } T(X) < -c \text{ or } T(X) > c, \\ 0 & \text{otherwise}, \end{cases}$$

where $c$ is chosen so that, under the null,

$$E_{\theta_0}[\phi^*(X)] \leq \alpha.$$
It is easy to see that in our hypothesis testing problem $c = z_{\alpha/2}$, where $z_{\alpha/2}$ is
the upper $\alpha/2$ quantile (i.e., the $1 - \alpha/2$ quantile) of the standard normal
distribution. So our test becomes

$$\phi^*(X) = \mathbf{1}\left\{\left|\frac{\sqrt{n}(\hat{\theta} - \theta_0)}{\hat{\sigma}}\right| > z_{\alpha/2}\right\}.$$
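As a sanity check, a minimal simulation of the test's rejection rate under the
null; a sketch assuming Uniform(0, 1) data with θ₀ = 0.5, n = 100 and α = 0.05,
so the rate should be close to 0.05:

# Rejection rate of the test under H0: data from Uniform(0, 1), whose true
# mean is theta_0 = 0.5; with alpha = 0.05 the rate should be close to 0.05.
n = 100
data = np.random.rand(n, 10000)
theta_hat = np.mean(data, axis=0)
sigma_hat = np.std(data, axis=0, ddof=1)
T = np.sqrt(n) * (theta_hat - 0.5) / sigma_hat
print('Rejection rate under the null: {}'.format(np.mean(np.abs(T) > 1.96)))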
# Visualize the rejection region: standard normal density with the two
# tails beyond +/- 1.96 (the alpha = 0.05 critical values) shaded.
x_axis = np.arange(-3, 3, 0.01)
x_axis1 = np.arange(-3, -1.96, 0.01)
x_axis2 = np.arange(1.96, 3, 0.01)
plt.plot(x_axis, norm.pdf(x_axis, 0, 1))
plt.fill_between(x_axis1, norm.pdf(x_axis1, 0, 1))
plt.fill_between(x_axis2, norm.pdf(x_axis2, 0, 1))
plt.show()
[Figure: standard normal density with the two rejection tails beyond ±1.96 shaded.]
A confidence interval is a data-dependent interval C(X) that contains the
parameter of interest with large probability. Suppose we want to find a confidence
interval for $\theta$. One possibility is to set $C(X) = (-\infty, \infty)$. Such an interval will
contain the parameter of interest with probability 1, but it is uninformative. The
problem is to construct the shortest interval that will contain $\theta$ with large
probability. The goal is to find a confidence interval such that

$$\Pr_{\theta}\{\theta \in C(X)\} \geq 1 - \alpha.$$
A confidence interval can be constructed by inverting a test. For each possible
parameter value $\theta'$, consider the problem of testing the null hypothesis $H_0: \theta = \theta'$
against the alternative $H_1: \theta \neq \theta'$. Suppose that for each such hypothesis we have
a test of level $\alpha$. Then the confidence set

$$C(X) = \{\theta' \in \Theta : \text{the null hypothesis } \theta = \theta' \text{ is accepted}\}$$

has coverage probability at least $1 - \alpha$. Indeed, suppose that the true value of the
parameter is $\theta_0$. Since the test of $\theta = \theta_0$ against $\theta \neq \theta_0$ has level $\alpha$ by construction,
the probability of excluding $\theta_0$ from $C(X)$ is at most $\alpha$, so $\Pr_{\theta_0}\{\theta_0 \in C(X)\} \geq 1 - \alpha$.
We can use the test above to find our confidence interval:

$$C(X) = \left\{\theta' : \left|\frac{\sqrt{n}(\hat{\theta} - \theta')}{\hat{\sigma}}\right| \leq z_{\alpha/2}\right\} = \left\{\theta' : \hat{\theta} - \frac{z_{\alpha/2}\,\hat{\sigma}}{\sqrt{n}} \leq \theta' \leq \hat{\theta} + \frac{z_{\alpha/2}\,\hat{\sigma}}{\sqrt{n}}\right\}.$$
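A short coverage check for this interval (a sketch, again assuming Uniform(0, 1)
data so the true θ is 0.5; the empirical coverage should be close to 0.95):

# Empirical coverage of the 95% confidence interval when the true theta is
# 0.5 (Uniform(0, 1) data); the coverage should be close to 0.95.
n = 100
data = np.random.rand(n, 10000)
theta_hat = np.mean(data, axis=0)
half_width = 1.96 * np.std(data, axis=0, ddof=1) / np.sqrt(n)
covered = np.abs(theta_hat - 0.5) <= half_width
print('Empirical coverage: {}'.format(np.mean(covered)))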