
Week 4

Review of MLE introduced last week


• P_\theta: probability density function.
• L = \prod_{i=1}^{N} P_\theta: likelihood function.
• L \to l: log-likelihood.
• Maximize with respect to the parameters, or iterate.

Assuming that \varepsilon \sim N(0, \sigma^2)


Simple Form
1. Define the probability density function

   P_\theta = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}\varepsilon^2\right)

2. Define the likelihood function and expand it

   L = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}\varepsilon^2\right)

   L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \exp\left(-\frac{1}{2\sigma^2}\sum\varepsilon^2\right)

3. Define the log-likelihood function

   l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum\varepsilon^2

   l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum(Y - \hat{Y})^2, \qquad \hat{Y} = \hat{\alpha} + \hat{\beta}X

   l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum(Y - \hat{\alpha} - \hat{\beta}X)^2
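As a concrete illustration (added here, not part of the original notes), a minimal Python sketch of this log-likelihood for a simple regression could look as follows; the array names x, y and the function name loglik are hypothetical choices:

import numpy as np

def loglik(alpha, beta, sigma, x, y):
    # Gaussian log-likelihood l(alpha, beta, sigma) for Y = alpha + beta*X + eps
    eps = y - alpha - beta * x                      # residuals Y - Y_hat
    n = len(y)
    return (-n * np.log(sigma) - n * np.log(np.sqrt(2.0 * np.pi))
            - np.sum(eps ** 2) / (2.0 * sigma ** 2))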

If the formula has a closed-form solution:

   \frac{\partial l}{\partial \theta} =
   \begin{pmatrix} \partial l / \partial \hat{\alpha} \\ \partial l / \partial \hat{\beta} \\ \partial l / \partial \hat{\sigma} \end{pmatrix}

If the formula does not have a closed-form solution (open form):

   Iterations (about 90% of cases).
(1) Through Optimization:


\frac{\partial l}{\partial \hat{\alpha}} = -\frac{(-2)}{2\hat{\sigma}^2}\sum(Y - \hat{\alpha} - \hat{\beta}X)

\frac{1}{\hat{\sigma}^2}\sum(Y - \hat{\alpha} - \hat{\beta}X) = 0

Multiply both sides by \hat{\sigma}^2:

\sum Y - \sum\hat{\alpha} - \hat{\beta}\sum X = 0

\sum\hat{\alpha} = \sum Y - \hat{\beta}\sum X

\frac{\sum\hat{\alpha}}{N} = \frac{\sum Y}{N} - \frac{\hat{\beta}\sum X}{N}

\hat{\alpha}_{(MLE\,|\,N)} = \bar{Y} - \hat{\beta}\bar{X}

Now we need to get 𝛽̂ :

l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum\bigl(Y - (\bar{Y} - \hat{\beta}\bar{X}) - \hat{\beta}X\bigr)^2

l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum\bigl(Y - \bar{Y} + \hat{\beta}\bar{X} - \hat{\beta}X\bigr)^2

l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum\bigl((Y - \bar{Y}) + \hat{\beta}(\bar{X} - X)\bigr)^2

l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum\bigl((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\bigr)^2

\frac{\partial l}{\partial \hat{\beta}} = -\frac{(-2)}{2\hat{\sigma}^2}\sum(X - \bar{X})\bigl((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\bigr)

\frac{1}{\hat{\sigma}^2}\sum(X - \bar{X})\bigl((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\bigr) = 0

Multiply both sides by \hat{\sigma}^2:

\sum(X - \bar{X})\bigl((Y - \bar{Y}) - \hat{\beta}(X - \bar{X})\bigr) = 0

\sum(X - \bar{X})(Y - \bar{Y}) - \hat{\beta}\sum(X - \bar{X})^2 = 0

\hat{\beta}_{(MLE\,|\,N)} = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{\sum(X - \bar{X})^2}

Now we derive with respect to 𝜎̂:

l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum(Y - \hat{\alpha} - \hat{\beta}X)^2

l(\hat{\alpha}, \hat{\beta}, \hat{\sigma}) = -N\ln\hat{\sigma} - N\ln\sqrt{2\pi} - \frac{1}{2\hat{\sigma}^2}\sum\varepsilon^2

\frac{\partial l}{\partial \hat{\sigma}} = -\frac{N}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^3}\sum\varepsilon^2

-\frac{N}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^3}\sum\varepsilon^2 = 0

\frac{\sum\varepsilon^2}{\hat{\sigma}^3} = \frac{N}{\hat{\sigma}}

Multiply both sides by \hat{\sigma}:

\frac{\sum\varepsilon^2}{\hat{\sigma}^2} = N

\hat{\sigma}^2_{(MLE\,|\,N)} = \frac{\sum\varepsilon^2}{N}
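To check these closed-form results numerically, a small added sketch (the simulated data and variable names are illustrative assumptions, not from the notes):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(size=100)          # illustrative data

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
eps = y - alpha_hat - beta_hat * x
sigma2_mle = np.sum(eps ** 2) / len(y)            # MLE divides by N
sigma2_ols = np.sum(eps ** 2) / (len(y) - 2)      # OLS unbiased version divides by N - K - 1 (K = 1 here)
print(alpha_hat, beta_hat, sigma2_mle, sigma2_ols)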

Conclusion:
• The maximizer of the likelihood under the normality assumption coincides with the least-squares (OLS) minimizer, and the two approaches converge fully as n → ∞ (the asymptotic assumption for the sample).
• About 90% of distributions must be solved through iterations (using software; they cannot be solved manually)!
• One other important note:
   o The MLE might fail to exist (the log-likelihood may not be concave).
• An advantage of MLE over OLS is that it provides essential tests for evaluating the parameters, based on the concavity of the log-likelihood and the Fisher Information Matrix!

Appendix:

The OLS parameters are:

\hat{\alpha}_{OLS} = \bar{Y} - \hat{\beta}\bar{X}

\hat{\beta}_{OLS} = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{\sum(X - \bar{X})^2}

\hat{\sigma}^2_{OLS} = \frac{\sum\varepsilon^2}{N - K - 1}
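The intercept and slope therefore coincide with their MLE counterparts; only the variance estimators differ in their divisor. Stated explicitly (this relation is added here for clarity, not part of the original notes):

\hat{\sigma}^2_{MLE} = \frac{\sum\varepsilon^2}{N} = \frac{N - K - 1}{N}\,\hat{\sigma}^2_{OLS} \;\longrightarrow\; \hat{\sigma}^2_{OLS} \quad \text{as } N \to \infty.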
Before we derive in 3081:
(1) Expansion.
(2) Common factor.
(3) Look at the order of the variables, then derive.

MLE offers tests that make it possible to evaluate the estimation process itself. These tools are known as the trinity tests (their standard forms are sketched after the list below):
(1) Likelihood Ratio Test.
(2) Wald Test.
(3) Lagrange Multiplier (Score) Test.
These three tests can be computed from detailed information about:
(1) The Gradient Vector: the vector of first-order conditions before we solve for the parameters (before we equate it to zero).
(2) The Hessian Matrix.
(3) The Fisher Information Matrix.
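For reference, the standard textbook forms of the three statistics (added here, not derived in the notes) can be written as follows, where \hat{\theta} is the unrestricted MLE, \tilde{\theta} the restricted estimate, \theta_0 the hypothesized value, and I(\theta) the Fisher Information Matrix; all three are asymptotically chi-squared under the null:

LR = 2\left[\, l(\hat{\theta}) - l(\tilde{\theta}) \,\right]

W = (\hat{\theta} - \theta_0)'\, I(\hat{\theta})\, (\hat{\theta} - \theta_0)

LM = \nabla l(\tilde{\theta})'\, I(\tilde{\theta})^{-1}\, \nabla l(\tilde{\theta})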

Applying these concepts under the normality assumption:


l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum\varepsilon^2

l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\,\varepsilon'\varepsilon

l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\,(Y - X\hat{\beta})'(Y - X\hat{\beta})

l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\,(Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta})

\frac{\partial l}{\partial \hat{\beta}} = -\frac{1}{2\sigma^2}\,(-2X'Y + 2X'X\hat{\beta})

= -\frac{1}{\sigma^2}\,(X'X\hat{\beta} - X'Y) \;\rightarrow\; (1)

-\frac{1}{\sigma^2}\,(X'X\hat{\beta} - X'Y) = 0

\hat{\beta}_{(MLE\,|\,N)} = (X'X)^{-1}X'Y

Now we optimize with respect to 𝜎̂:


l = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2\sigma^2}\,\varepsilon'\varepsilon

\frac{\partial l}{\partial \hat{\sigma}} = -\frac{N}{\hat{\sigma}} + \frac{\varepsilon'\varepsilon}{\hat{\sigma}^3} \;\rightarrow\; (2)

-\frac{N}{\hat{\sigma}} + \frac{\varepsilon'\varepsilon}{\hat{\sigma}^3} = 0

\hat{\sigma}^2 = \frac{\varepsilon'\varepsilon}{N}
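A minimal matrix-form sketch of these two estimators (the design matrix X is assumed to include a column of ones; the simulated data are illustrative, not from the notes):

import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept column plus one regressor
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'Y, without forming the inverse explicitly
eps = y - X @ beta_hat
sigma2_hat = eps @ eps / n                     # epsilon' epsilon / N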
(1) Gradient Vector

Gradient\;(\nabla_G) =
\begin{pmatrix} \dfrac{\partial l}{\partial \beta} \\[6pt] \dfrac{\partial l}{\partial \sigma} \end{pmatrix}

Gradient\;(\nabla_G) =
\begin{pmatrix} -\dfrac{1}{\sigma^2}\,(X'X\hat{\beta} - X'Y) \\[6pt] -\dfrac{N}{\hat{\sigma}} + \dfrac{\varepsilon'\varepsilon}{\hat{\sigma}^3} \end{pmatrix}

(2) Hessian Matrix


H(\theta) =
\begin{pmatrix} \dfrac{\partial^2 l}{\partial\beta\,\partial\beta'} & \dfrac{\partial^2 l}{\partial\beta\,\partial\sigma'} \\[8pt] \dfrac{\partial^2 l}{\partial\sigma\,\partial\beta'} & \dfrac{\partial^2 l}{\partial\sigma\,\partial\sigma'} \end{pmatrix}

\frac{\partial^2 l}{\partial\beta\,\partial\beta'} = -\frac{1}{\sigma^2}\,(X'X)
We know that
\frac{\partial l}{\partial \sigma} = -\frac{N}{\hat{\sigma}} + \frac{\varepsilon'\varepsilon}{\hat{\sigma}^3}

\frac{\partial l}{\partial \sigma} = -N\hat{\sigma}^{-1} + \varepsilon'\varepsilon\,\hat{\sigma}^{-3}

\frac{\partial^2 l}{\partial\sigma\,\partial\sigma'} = -(-1)N\hat{\sigma}^{-2} + (-3)\,\varepsilon'\varepsilon\,\hat{\sigma}^{-4}

= \frac{N}{\hat{\sigma}^2} - \frac{3\,\varepsilon'\varepsilon}{\hat{\sigma}^4}
It can be simplified more, but we will do that next class.

Now we solve the Off-Diagonal Part:


We know that
\frac{\partial l}{\partial \hat{\beta}} = -\frac{1}{\sigma^2}\,(X'X\hat{\beta} - X'Y)

\frac{\partial l}{\partial \hat{\beta}} = -\sigma^{-2}\,(X'X\hat{\beta} - X'Y)

\frac{\partial^2 l}{\partial\hat{\beta}\,\partial\sigma} = \frac{2}{\hat{\sigma}^3}\,(X'X\hat{\beta} - X'Y)

= \frac{2}{\hat{\sigma}^3}\,X'(X\hat{\beta} - Y)

= -\frac{2}{\hat{\sigma}^3}\,X'(Y - X\hat{\beta})

= -\frac{2}{\hat{\sigma}^3}\,X'\varepsilon

H(\theta) =
\begin{pmatrix} -\dfrac{X'X}{\sigma^2} & -\dfrac{2}{\hat{\sigma}^3}\,X'\varepsilon \\[8pt] -\dfrac{2}{\hat{\sigma}^3}\,X'\varepsilon & \dfrac{N}{\hat{\sigma}^2} - \dfrac{3\,\varepsilon'\varepsilon}{\hat{\sigma}^4} \end{pmatrix}
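A hedged sketch of assembling this Hessian numerically at the estimates and using it for standard errors (continuing from the matrix-form sketch above; note that X'epsilon is numerically close to zero at the MLE, so the off-diagonal block nearly vanishes):

sigma_hat = np.sqrt(sigma2_hat)

H_bb = -(X.T @ X) / sigma_hat ** 2                              # d2l / d beta d beta'
H_bs = -(2.0 / sigma_hat ** 3) * (X.T @ eps)                    # d2l / d beta d sigma (about 0 at the MLE)
H_ss = n / sigma_hat ** 2 - 3.0 * (eps @ eps) / sigma_hat ** 4  # d2l / d sigma^2

H = np.block([[H_bb, H_bs[:, None]],
              [H_bs[None, :], np.array([[H_ss]])]])

# Standard errors from the observed information matrix, i.e. the negative inverse Hessian
se = np.sqrt(np.diag(np.linalg.inv(-H)))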

Midterm 27 March 2025 (Conditional on your approval)
