Week 4 - 418
$$L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \exp\left(-\frac{1}{2\sigma^{2}}\sum \varepsilon^{2}\right)$$
3. Define the log-likelihood function
$$l = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^{2}}\sum \varepsilon^{2}$$

$$l = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^{2}}\sum \left(Y - \hat{Y}\right)^{2}$$
$$\hat{Y} = \hat{\alpha} + \hat{\beta}X$$
$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \left(Y - \hat{\alpha} - \hat{\beta}X\right)^{2}$$
Setting the derivative of $l$ with respect to $\hat{\alpha}$ equal to zero gives the first-order condition:

$$\frac{1}{\hat{\sigma}^{2}}\sum \left(Y - \hat{\alpha} - \hat{\beta}X\right) = 0$$

$$\sum \hat{\alpha} = \sum Y - \hat{\beta}\sum X$$

$$\frac{\sum \hat{\alpha}}{n} = \frac{\sum Y}{n} - \frac{\hat{\beta}\sum X}{n}$$

$$\hat{\alpha}_{(MLE|N)} = \bar{Y} - \hat{\beta}\bar{X}$$
Substituting $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$ back into the log-likelihood:

$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \left(Y - (\bar{Y} - \hat{\beta}\bar{X}) - \hat{\beta}X\right)^{2}$$

$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \left(Y - \bar{Y} + \hat{\beta}\bar{X} - \hat{\beta}X\right)^{2}$$

$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \left((Y - \bar{Y}) + \hat{\beta}(\bar{X} - X)\right)^{2}$$

$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \left((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\right)^{2}$$
$$\frac{\partial l}{\partial \hat{\beta}} = -\frac{(-2)}{2\hat{\sigma}^{2}}\sum (X - \bar{X})\left((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\right)$$

$$\frac{1}{\hat{\sigma}^{2}}\sum (X - \bar{X})\left((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\right) = 0$$

Multiply both sides by $\hat{\sigma}^{2}$:

$$\sum (X - \bar{X})(Y - \bar{Y}) - \hat{\beta}\sum (X - \bar{X})^{2} = 0$$

$$\hat{\beta}_{(MLE|N)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}}$$
To derive $\hat{\sigma}$, return to the log-likelihood written in terms of the residuals:

$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \left(Y - \hat{\alpha} - \hat{\beta}X\right)^{2}$$

$$l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^{2}}\sum \varepsilon^{2}$$
$$\frac{\partial l}{\partial \hat{\sigma}} = -\frac{N}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^{3}}\sum \varepsilon^{2}$$

$$-\frac{N}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^{3}}\sum \varepsilon^{2} = 0$$

$$\frac{\sum \varepsilon^{2}}{\hat{\sigma}^{3}} = \frac{N}{\hat{\sigma}}$$

Multiply both sides by $\hat{\sigma}$:

$$\frac{\sum \varepsilon^{2}}{\hat{\sigma}^{2}} = N$$

$$\hat{\sigma}^{2}_{(MLE|N)} = \frac{\sum \varepsilon^{2}}{N}$$
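A minimal numerical sketch of the closed-form results above. The simulated data, seed, and true parameter values are illustrative assumptions, not part of the notes:

```python
# Check the closed-form MLE formulas on simulated data.
import numpy as np

rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=N)
Y = 1.5 + 2.0 * X + rng.normal(scale=0.8, size=N)   # illustrative: alpha=1.5, beta=2, sigma=0.8

# beta_hat from the first-order condition: sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
# alpha_hat(MLE|N) = Ybar - beta_hat * Xbar
alpha_hat = Y.mean() - beta_hat * X.mean()
# sigma^2_hat(MLE|N) = sum(eps^2) / N   (note: divide by N, not N - K - 1)
eps = Y - alpha_hat - beta_hat * X
sigma2_hat = np.sum(eps ** 2) / N

print(alpha_hat, beta_hat, sigma2_hat)
```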
Conclusion:
• The maximum of the likelihood under the normality assumption coincides with the least-squares minimum of OLS, in particular as n → ∞ (the asymptotic assumption for the sample).
• For about 90% of distributions the likelihood must be maximised by iteration using software (it cannot be solved manually)! A sketch of this iterative approach follows this list.
• Other important notes:
o The MLE might fail to exist (the likelihood may not be concave).
• The MLE has an advantage over OLS in that it provides essential tests to evaluate the parameters, based on the likelihood's concavity and the Fisher Information Matrix!
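As a sketch of the two points above, the normal log-likelihood can be maximised numerically with an iterative optimiser and compared with the closed-form OLS solution. The data, starting values, and optimiser choice below are illustrative assumptions:

```python
# Maximise the normal log-likelihood by iteration and compare with closed-form OLS.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 200
X = rng.normal(size=N)
Y = 1.5 + 2.0 * X + rng.normal(scale=0.8, size=N)   # illustrative data

def neg_loglik(theta):
    alpha, beta, log_sigma = theta            # optimise log(sigma) so that sigma stays positive
    sigma = np.exp(log_sigma)
    eps = Y - alpha - beta * X
    return N * np.log(sigma) + N * np.log(np.sqrt(2 * np.pi)) + np.sum(eps ** 2) / (2 * sigma ** 2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0, 0.0]), method="BFGS")
alpha_mle, beta_mle = res.x[0], res.x[1]

# Closed-form OLS estimates for comparison
beta_ols = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_ols = Y.mean() - beta_ols * X.mean()
print(alpha_mle - alpha_ols, beta_mle - beta_ols)   # both differences should be ~ 0
```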
Appendix:

$$\hat{\alpha}_{OLS} = \bar{Y} - \hat{\beta}\bar{X}$$

$$\hat{\sigma}^{2}_{OLS} = \frac{\sum \varepsilon^{2}}{N - K - 1}$$
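A small sketch contrasting the two variance estimators: the MLE divides the residual sum of squares by N, while OLS divides by N − K − 1. The residuals and K below are illustrative assumptions:

```python
import numpy as np

eps = np.array([0.5, -1.2, 0.3, 0.9, -0.4])     # illustrative residuals
N, K = len(eps), 1                               # K = number of slope regressors (assumed)
sigma2_mle = np.sum(eps ** 2) / N
sigma2_ols = np.sum(eps ** 2) / (N - K - 1)
print(sigma2_mle, sigma2_ols)                    # the OLS version is larger (degrees-of-freedom correction)
```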
Before we take derivatives (as in 3081):
(1) Expansion.
(2) Common factor.
(3) Look at the order of the variables, then take the derivative.
MLE offers tests that make it possible to evaluate the estimation process itself. These tools are known as the trinity tests:
(1) Likelihood Ratio Test.
(2) Wald Test.
(3) Lagrange Multiplier (Score) Test.
These three tests are built from detailed information about:
(1) Gradient Vector
The vector of first-order conditions before we solve for the parameters (before we equate to zero).
(2) Hessian Matrix
(3) Fisher Information Matrix
(A sketch of the likelihood-ratio test is given after this list.)
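As an illustration of the first trinity test, a likelihood-ratio test of the restriction β = 0 in Y = α + βX + ε can be computed directly from the restricted and unrestricted residual sums of squares (with σ² concentrated out, 2(l_u − l_r) = N ln(RSS_r / RSS_u)). The simulated data below are purely illustrative:

```python
# Hand-rolled likelihood-ratio test for beta = 0 in a simple regression.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
N = 200
X = rng.normal(size=N)
Y = 1.5 + 0.4 * X + rng.normal(size=N)          # illustrative data

# Unrestricted model: fit alpha and beta by the closed-form solution
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
rss_u = np.sum((Y - alpha_hat - beta_hat * X) ** 2)

# Restricted model (beta = 0): intercept only, residuals are deviations from the mean
rss_r = np.sum((Y - Y.mean()) ** 2)

lr_stat = N * np.log(rss_r / rss_u)             # 2*(l_unrestricted - l_restricted)
p_value = chi2.sf(lr_stat, df=1)                # one restriction
print(lr_stat, p_value)
```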
$$\frac{\partial l}{\partial \hat{\beta}} = -\frac{1}{2\sigma^{2}}\left(-2X'Y + 2X'X\hat{\beta}\right) = -\frac{1}{\sigma^{2}}\left(X'X\hat{\beta} - X'Y\right) \;\to (1)$$

$$-\frac{1}{\sigma^{2}}\left(X'X\hat{\beta} - X'Y\right) = 0$$
$$\frac{\partial l}{\partial \hat{\sigma}} = -\frac{N}{\hat{\sigma}} + \frac{\varepsilon'\varepsilon}{\hat{\sigma}^{3}} \;\to (2)$$

$$-\frac{N}{\hat{\sigma}} + \frac{\varepsilon'\varepsilon}{\hat{\sigma}^{3}} = 0$$

$$\hat{\sigma}^{2} = \frac{\varepsilon'\varepsilon}{N}$$
(1) Gradient Vector

$$\text{Gradient } (\nabla_{G}) = \begin{pmatrix} \dfrac{\partial l}{\partial \beta} \\[1ex] \dfrac{\partial l}{\partial \sigma} \end{pmatrix}$$

$$\text{Gradient } (\nabla_{G}) = \begin{pmatrix} -\dfrac{1}{\sigma^{2}}\left(X'X\hat{\beta} - X'Y\right) \\[1ex] -\dfrac{N}{\hat{\sigma}} + \dfrac{\varepsilon'\varepsilon}{\hat{\sigma}^{3}} \end{pmatrix}$$
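A minimal sketch evaluating this gradient vector at the closed-form estimates; at the MLE both blocks should be numerically zero. The simulated design matrix below is an illustrative assumption:

```python
# Evaluate the gradient vector at the estimated parameters.
import numpy as np

rng = np.random.default_rng(3)
N = 100
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])            # design matrix with an intercept column
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)    # (X'X)^-1 X'Y
eps = Y - X @ beta_hat
sigma_hat = np.sqrt(eps @ eps / N)

grad_beta = -(X.T @ X @ beta_hat - X.T @ Y) / sigma_hat ** 2
grad_sigma = -N / sigma_hat + (eps @ eps) / sigma_hat ** 3
print(grad_beta, grad_sigma)                     # both ~ 0 at the MLE
```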
$$\frac{\partial^{2} l}{\partial \beta\, \partial \beta'} = -\frac{1}{\sigma^{2}}\left(X'X\right)$$

We know that

$$\frac{\partial l}{\partial \sigma} = -\frac{N}{\hat{\sigma}} + \frac{\varepsilon'\varepsilon}{\hat{\sigma}^{3}} = -N\hat{\sigma}^{-1} + \varepsilon'\varepsilon\,\hat{\sigma}^{-3}$$

$$\frac{\partial^{2} l}{\partial \sigma\, \partial \sigma'} = -(-1)N\hat{\sigma}^{-2} + (-3)\varepsilon'\varepsilon\,\hat{\sigma}^{-4} = \frac{N}{\hat{\sigma}^{2}} - 3\frac{\varepsilon'\varepsilon}{\hat{\sigma}^{4}}$$
It can be simplified more, but we will do that next class.
$$\frac{\partial l}{\partial \hat{\beta}} = -\sigma^{-2}\left(X'X\hat{\beta} - X'Y\right)$$

$$\frac{\partial^{2} l}{\partial \hat{\beta}\, \partial \sigma} = \frac{2}{\hat{\sigma}^{3}}\left(X'X\hat{\beta} - X'Y\right) = \frac{2}{\hat{\sigma}^{3}}X'\left(X\hat{\beta} - Y\right) = -\frac{2}{\hat{\sigma}^{3}}X'\left(Y - X\hat{\beta}\right) = -\frac{2}{\hat{\sigma}^{3}}X'\varepsilon$$
$$H(\theta) = \begin{pmatrix} -\dfrac{X'X}{\sigma^{2}} & -\dfrac{2}{\hat{\sigma}^{3}}X'\varepsilon \\[1ex] -\dfrac{2}{\hat{\sigma}^{3}}X'\varepsilon & \dfrac{N}{\hat{\sigma}^{2}} - 3\dfrac{\varepsilon'\varepsilon}{\hat{\sigma}^{4}} \end{pmatrix}$$
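A minimal sketch assembling these Hessian blocks at the estimated values (simulated data, purely illustrative). Note that at the MLE X'ε = 0 and ε'ε = Nσ̂², so the off-diagonal block vanishes and the (σ, σ) entry reduces to −2N/σ̂²:

```python
# Assemble the Hessian of the log-likelihood at the estimated parameters.
import numpy as np

rng = np.random.default_rng(4)
N = 100
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
eps = Y - X @ beta_hat
sigma_hat = np.sqrt(eps @ eps / N)

H_bb = -(X.T @ X) / sigma_hat ** 2                               # d2l / dbeta dbeta'
H_bs = -(2.0 / sigma_hat ** 3) * (X.T @ eps)                     # d2l / dbeta dsigma (~ 0 at the MLE)
H_ss = N / sigma_hat ** 2 - 3.0 * (eps @ eps) / sigma_hat ** 4   # d2l / dsigma dsigma

H = np.block([[H_bb, H_bs.reshape(-1, 1)],
              [H_bs.reshape(1, -1), np.array([[H_ss]])]])
print(H)
```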