Endsem Solutions
5 Mark Questions
Q1: Let $Z = X_1 + X_2 + \cdots + X_N$, where the $X_i$ are i.i.d. random variables and $N$ is a positive discrete random variable. Prove that
$$M_Z(t) = M_N\!\left(\log M_{X_1}(t)\right).$$
A:
$$M_Z(t) = E[e^{tZ}] = E_N\!\left[E\!\left[e^{tZ} \mid N\right]\right].$$
Expanding $Z$, we have
$$e^{tZ} = e^{t(X_1 + X_2 + \cdots + X_N)} = \prod_{i=1}^{N} e^{tX_i}.$$
Conditioning on $N$ and using the independence of the $X_i$,
$$M_Z(t) = E_N\!\left[\left(E\!\left[e^{tX_1}\right]\right)^N\right].$$
$$M_N(t) = E[e^{tN}]$$
If $t = \log u$,
$$M_N(\log u) = E\!\left[e^{\log(u)\times N}\right] = E\!\left[e^{\log(u^N)}\right] = E[u^N].$$
Taking $u = M_{X_1}(t)$ gives $M_Z(t) = E_N\!\left[\left(M_{X_1}(t)\right)^N\right] = M_N\!\left(\log M_{X_1}(t)\right)$. Hence proved.
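As a quick numerical sanity check of this identity (not part of the original solution), the sketch below compares a Monte Carlo estimate of $E[e^{tZ}]$ with $M_N(\log M_{X_1}(t))$ for one assumed choice of distributions, $N \sim \text{Poisson}(\lambda)$ and $X_i \sim \text{Bernoulli}(q)$; the parameter values are illustrative only.

```python
# Monte Carlo check of M_Z(t) = M_N(log M_X(t)) for a random sum Z = X_1 + ... + X_N.
# Assumed example (not from the question): N ~ Poisson(lam), X_i ~ Bernoulli(q) i.i.d.
import numpy as np

rng = np.random.default_rng(0)
lam, q, t = 3.0, 0.4, 0.5
num_trials = 200_000

# Simulate Z = X_1 + ... + X_N and estimate E[e^{tZ}] directly.
N = rng.poisson(lam, size=num_trials)
Z = rng.binomial(N, q)                       # sum of N Bernoulli(q) draws
mz_monte_carlo = np.mean(np.exp(t * Z))

# Analytic right-hand side: M_N(s) = exp(lam*(e^s - 1)) and M_X(t) = 1 - q + q*e^t,
# so M_N(log M_X(t)) = exp(lam*(M_X(t) - 1)).
mx_t = 1 - q + q * np.exp(t)
mz_formula = np.exp(lam * (mx_t - 1))

print(mz_monte_carlo, mz_formula)            # the two numbers should be close
```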
For $Y = e^{X}$ with $X \sim N(\mu, \sigma^2)$, we have $F_Y(y) = P(e^X \le y) = F_X(\log y)$ for $y > 0$, so
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\left[F_X(\log y)\right].$$
$$f_Y(y) = f_X(\log y) \cdot \frac{1}{y}.$$
$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right) \cdot \frac{1}{y}.$$
Simplifying:
$$f_Y(y) = \frac{1}{y\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right), \qquad y > 0.$$
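A minimal sketch (an addition, not part of the marked solution) that checks this density against a histogram of simulated $Y = e^X$ values; `mu` and `sigma` are arbitrary illustrative values.

```python
# Numerical check of the derived density of Y = e^X with X ~ N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.5, 0.8
y = np.exp(rng.normal(mu, sigma, size=500_000))

def f_Y(v):
    # Derived pdf: exp(-(log v - mu)^2 / (2 sigma^2)) / (v * sqrt(2 pi sigma^2))
    return np.exp(-(np.log(v) - mu) ** 2 / (2 * sigma ** 2)) / (v * np.sqrt(2 * np.pi * sigma ** 2))

bins = np.linspace(0.05, 6.0, 61)
counts, _ = np.histogram(y, bins=bins)
centres = 0.5 * (bins[:-1] + bins[1:])
emp_density = counts / (len(y) * np.diff(bins))   # empirical density estimate

print(np.max(np.abs(emp_density - f_Y(centres))))  # close to 0 (sampling/binning error only)
```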
8 Mark Questions
Q1: Let D = {x1 , x2 , . . . , xn } denote i.i.d. samples from a Poisson random variable with
unknown parameter γ.
(a) Find the Maximum Likelihood Estimate (MLE) for the unknown parameter
γ. (5 marks)
(b) Determine the Mean Squared Error (MSE) of the estimate. (3 marks)
A:
Taking the natural logarithm of the likelihood function, the log-likelihood is:
$$\ell(D; \gamma) = \sum_{i=1}^{n} \left[x_i \ln(\gamma) - \gamma - \ln(x_i!)\right].$$
To find the MLE, we differentiate $\ell(D; \gamma)$ with respect to $\gamma$ and set it equal to zero:
$$\frac{\partial \ell(D; \gamma)}{\partial \gamma} = \sum_{i=1}^{n} \frac{x_i}{\gamma} - n = 0.$$
Simplifying:
$$\sum_{i=1}^{n} x_i = n\gamma \quad\implies\quad \hat{\gamma}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
(b) Mean Squared Error (MSE) of $\hat{\gamma}$
Let the sample mean $\hat{\gamma}$ be written as:
$$\hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Since $E[\hat{\gamma}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \gamma$, the estimator is unbiased, so
$$\text{MSE}(\hat{\gamma}) = \text{Var}(\hat{\gamma}) = \frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i) = \frac{n\gamma}{n^2} = \frac{\gamma}{n}.$$
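A simulation sketch (an addition) that estimates the MSE of $\hat{\gamma}$ and compares it with $\gamma/n$; `gamma_true`, `n`, and the trial count are illustrative values.

```python
# Monte Carlo check that the Poisson MLE (the sample mean) has MSE = gamma / n.
import numpy as np

rng = np.random.default_rng(2)
gamma_true, n, num_trials = 4.0, 50, 100_000

samples = rng.poisson(gamma_true, size=(num_trials, n))
gamma_hat = samples.mean(axis=1)              # MLE for each simulated data set

mse_empirical = np.mean((gamma_hat - gamma_true) ** 2)
print(mse_empirical, gamma_true / n)          # empirical MSE vs. gamma / n
```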
Q2: Let $X \sim N(\mu, \sigma^2)$ be a Gaussian random variable with known mean $\mu$ and unknown variance $\sigma^2$. Given $k$ i.i.d. samples $D = \{x_1, x_2, \dots, x_k\}$, find the MLE of $\sigma^2$ and determine whether it is biased.
A:
The probability density function (PDF) of $X$ is given by:
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
Given the $k$ independent and identically distributed (i.i.d.) samples, the likelihood function is:
$$L(\sigma^2 \mid D) = \prod_{i=1}^{k} f(x_i \mid \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{k} \exp\!\left(-\frac{\sum_{i=1}^{k}(x_i - \mu)^2}{2\sigma^2}\right).$$
The log-likelihood is $\ell(\sigma^2 \mid D) = -\frac{k}{2}\ln(2\pi\sigma^2) - \frac{\sum_{i=1}^{k}(x_i-\mu)^2}{2\sigma^2}$. Setting $\frac{\partial \ell}{\partial \sigma^2} = -\frac{k}{2\sigma^2} + \frac{\sum_{i=1}^{k}(x_i-\mu)^2}{2\sigma^4} = 0$ gives
$$k\sigma^2 = \sum_{i=1}^{k}(x_i - \mu)^2.$$
Solving for $\sigma^2$:
$$\hat{\sigma}^2 = \frac{1}{k}\sum_{i=1}^{k}(x_i - \mu)^2.$$
To determine whether the MLE estimate is biased, we need to compare $E[\hat{\sigma}^2]$ with the true variance $\sigma^2$, where
$$\hat{\sigma}^2 = \frac{1}{k}\sum_{i=1}^{k}(x_i - \mu)^2.$$
So,
$$E[\hat{\sigma}^2] = \frac{1}{k}\sum_{i=1}^{k}\left(E[x_i^2] - 2\mu E[x_i] + \mu^2\right) = \frac{1}{k}\sum_{i=1}^{k}\left(\sigma^2 + \mu^2 - 2\mu^2 + \mu^2\right) = \frac{1}{k}\, k\sigma^2 = \sigma^2.$$
$$\text{Bias}[\hat{\sigma}^2] = E[\hat{\sigma}^2] - \sigma^2 = 0,$$
so the MLE $\hat{\sigma}^2$ is unbiased when $\mu$ is known.
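A quick simulation check of this unbiasedness (an addition); `mu`, `sigma`, and `k` are illustrative values.

```python
# Check that sigma_hat^2 = (1/k) * sum (x_i - mu)^2 is unbiased when mu is known.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, k, num_trials = 2.0, 1.5, 20, 200_000

x = rng.normal(mu, sigma, size=(num_trials, k))
sigma2_hat = np.mean((x - mu) ** 2, axis=1)   # MLE using the known mean mu

print(sigma2_hat.mean(), sigma ** 2)          # average estimate vs. true variance
```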
Q3: Let $\{X_n\}$ be a sequence of random variables with $P(X_n = n) = \frac{1}{n^2}$ and $P(X_n = 0) = 1 - \frac{1}{n^2}$.
(a) Show that $X_n \xrightarrow{p} 0$ (convergence in probability to 0). (4 marks)
(b) Show that $X_n \xrightarrow{a.s.} 0$ (almost sure convergence to 0). (4 marks)
A:
(a) Convergence in probability: To show $X_n \xrightarrow{p} 0$, we need to prove that for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|X_n| > \epsilon) = 0.$$
We have:
$$P(|X_n| > \epsilon) = P(X_n = n) = \frac{1}{n^2}.$$
Hence:
$$\lim_{n\to\infty} P(|X_n| > \epsilon) = \lim_{n\to\infty} \frac{1}{n^2} = 0.$$
Thus, $X_n \xrightarrow{p} 0$.
(b) Almost sure convergence: Consider
$$\sum_{n=1}^{\infty} P(|X_n - 0| > \epsilon) = \sum_{n=1}^{\infty} P(|X_n| > \epsilon) = \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty.$$
This series converges because it is a p-series (an infinite series of the form $\sum 1/n^p$) with $p = 2 > 1$. Therefore, by the first Borel–Cantelli lemma, $P(|X_n| > \epsilon \text{ infinitely often}) = 0$. Hence, $X_n = 0$ for all sufficiently large $n$ with probability 1. Thus, $X_n \xrightarrow{a.s.} 0$.
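To illustrate this argument numerically (an addition), the sketch below simulates truncated sample paths under the assumed model $P(X_n = n) = 1/n^2$, $P(X_n = 0) = 1 - 1/n^2$ (independence across $n$ is assumed only for the simulation) and shows that the probability of seeing any nonzero $X_n$ beyond index $m$ shrinks with $m$, alongside the union bound $\sum_{n \ge m} 1/n^2$; the truncation at `n_max` is a simulation convenience.

```python
# Empirical illustration of the Borel-Cantelli argument on truncated paths.
import numpy as np

rng = np.random.default_rng(4)
n_max, num_paths = 1000, 2000
ns = np.arange(1, n_max + 1)

# True where a path takes the value X_n = n (an event of probability 1/n^2).
hit = rng.random((num_paths, n_max)) < 1.0 / ns ** 2

for m in (1, 10, 100, 500):
    prob_any_late_hit = np.mean(hit[:, m - 1:].any(axis=1))   # P(X_n != 0 for some n >= m)
    tail_bound = np.sum(1.0 / ns[m - 1:] ** 2)                 # union bound sum_{n >= m} 1/n^2
    print(m, prob_any_late_hit, tail_bound)                    # both shrink as m grows
```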
Q4: Given a Markov coin with the following transition probability matrix $P$ and initial distribution $\mu = [0.1, 0.9]$, use the following 4 independent $\text{Uniform}[0, 1]$ samples $\{0.3, 0.7, 0.23, 0.97\}$ to obtain/generate 4 successive toss outcomes of the Markov coin. (Hint: The first toss is to be sampled from the initial distribution.)
$$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{pmatrix}$$
A:
We simulate the successive toss outcomes of the Markov coin using the given transition probability matrix $P$, initial distribution $\mu = [0.1, 0.9]$, and uniform samples $\{0.3, 0.7, 0.23, 0.97\}$: a sample gives state 1 if it is at most the cumulative probability of state 1, and state 2 otherwise.
Step 1: The first toss. The first toss is sampled from the initial distribution $\mu$. Since the first sample is 0.3 and $0.3 > 0.1$, the first toss is state 2 (tail), so the distribution for the next toss is the row of state 2, $[0.4, 0.6]$.
• Second toss: Current state = 2, transition probabilities = $[0.4, 0.6]$. Sample = 0.7. Since $0.4 < 0.7 \le 1.0$, the next state is 2; getting a tail as the second toss, the distribution for the next toss stays $[0.4, 0.6]$.
• Third toss: Current state = 2, transition probabilities = $[0.4, 0.6]$. Sample = 0.23. Since $0.23 \le 0.4$, the next state is 1; getting a head as the third toss, the distribution for the next toss changes to $[0.9, 0.1]$.
• Fourth toss: Current state = 1, transition probabilities = $[0.9, 0.1]$. Sample = 0.97. Since $0.97 > 0.9$, the next state is 2 (tail).
Final Results
The sequence of states for the 4 tosses is:
{2, 2, 1, 2}
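The same inverse-CDF sampling procedure can be written as a short script (an addition, reproducing the hand computation above); the `searchsorted` call encodes the convention "state 1 if the uniform is at most the first cumulative probability".

```python
# Reproduce the Markov-coin sampling above with inverse-CDF sampling of each toss.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu = np.array([0.1, 0.9])
uniforms = [0.3, 0.7, 0.23, 0.97]

states = []
dist = mu                                      # first toss uses the initial distribution
for u in uniforms:
    cdf = np.cumsum(dist)
    state = int(np.searchsorted(cdf, u) + 1)   # state 1 if u <= dist[0], else state 2
    states.append(state)
    dist = P[state - 1]                        # next toss uses the row of the current state

print(states)                                  # expected output: [2, 2, 1, 2]
```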
Q5: Let $X$ and $Y$ be independent random variables with common distribution function $F$ (and density $f$). Find the PDFs of $Z_1 = \max(X, Y)$ and $Z_2 = \min(X, Y)$.
1. PDF of $Z_1 = \max(X, Y)$
CDF of $Z_1$:
$$F_{Z_1}(z) = P(X \le z,\; Y \le z) = P(X \le z)\,P(Y \le z) = F(z)^2$$
PDF of $Z_1$:
$$f_{Z_1}(z) = \frac{d}{dz} F(z)^2 = 2F(z)f(z)$$
2. PDF of $Z_2 = \min(X, Y)$
CDF of $Z_2$:
$$F_{Z_2}(z) = 1 - P(X > z,\; Y > z) = 1 - (1 - F(z))^2$$
PDF of $Z_2$:
$$f_{Z_2}(z) = \frac{d}{dz}\left[1 - (1 - F(z))^2\right] = 2(1 - F(z))f(z)$$
Final PDFs:
$$f_{Z_1}(z) = 2F(z)f(z), \qquad f_{Z_2}(z) = 2(1 - F(z))f(z)$$
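A quick check of these formulas (an addition) for one assumed special case, $X, Y \sim \text{Uniform}[0,1]$, where $F(z) = z$ and $f(z) = 1$, so the derived densities are $2z$ and $2(1-z)$.

```python
# Sanity check of the max/min densities using X, Y ~ Uniform[0, 1].
import numpy as np

rng = np.random.default_rng(5)
x, y = rng.random(500_000), rng.random(500_000)
z1, z2 = np.maximum(x, y), np.minimum(x, y)

bins = np.linspace(0, 1, 41)
centres = 0.5 * (bins[:-1] + bins[1:])
h1, _ = np.histogram(z1, bins=bins, density=True)
h2, _ = np.histogram(z2, bins=bins, density=True)

print(np.max(np.abs(h1 - 2 * centres)))        # ~0 up to sampling noise
print(np.max(np.abs(h2 - 2 * (1 - centres))))  # ~0 up to sampling noise
```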
10 Mark Questions
Q1: Let $D = \{x_1, \dots, x_n\}$ denote i.i.d. samples from a uniform random variable $U[a, b]$, where $a$ and $b$ are unknown. Find the MLE estimates for the unknown parameters $a$ and $b$.
A: The pdf of a $U[a, b]$ random variable is given by
$$f_U(u) = \begin{cases} \frac{1}{b-a} & a \le u \le b \\ 0 & \text{o.w.} \end{cases}$$
so the log-likelihood is $\log L(x_1, \dots, x_n; a, b) = -n\log(b-a)$, which is valid only when $a \le \min_i(x_i)$ and $b \ge \max_i(x_i)$ (otherwise the likelihood is 0). The log-likelihood is increasing in $a$ on this region, so it is maximized by taking the largest admissible value of $a$:
$$\hat{a}_{ML} = \min_i x_i.$$
Similarly, the MLE estimate for $b$ is given by $\hat{b}_{ML} = \arg\max_b \log L(x_1, \dots, x_n; a, b)$.
To find the maximum, we take the derivative w.r.t. $b$:
$$\frac{\partial \log L}{\partial b} = \frac{-n}{b-a}, \qquad a \le \min_i(x_i),\; b \ge \max_i(x_i).$$
The derivative w.r.t. $b$ is negative, so the log-likelihood is monotonically decreasing in $b$ on the region $b \ge \max_i(x_i)$; to maximize the likelihood we take the minimum value $b$ can take in this region, which is given by
$$\hat{b}_{ML} = \max_i x_i.$$
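These MLEs are simply the sample extremes; a two-line illustration on simulated data follows (an addition; `a_true`, `b_true`, and `n` are illustrative values).

```python
# Uniform[a, b] MLE on simulated data: a_hat = min of the samples, b_hat = max.
import numpy as np

rng = np.random.default_rng(6)
a_true, b_true, n = 2.0, 7.0, 1000
x = rng.uniform(a_true, b_true, size=n)

a_hat, b_hat = x.min(), x.max()
print(a_hat, b_hat)     # close to (2.0, 7.0), always from inside the true interval
```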
Q2: Bayesian Inference / Conjugate prior problem: Suppose $D = \{x_1, \dots, x_n\}$ is a data set consisting of independent samples of a Poisson random variable with unknown parameter $\lambda^*$. Now assume a prior model $\Lambda \sim \text{Gamma}(\alpha, \beta)$ on the unknown parameter $\lambda^*$ (see hint below for the Gamma distribution). Obtain an expression for the posterior distribution of $\lambda^*$. (7 marks) What is the MAP estimate for $\lambda^*$? (3 marks)
Hint: Use the prior belief $\Lambda \sim \text{Gamma}(\alpha, \beta)$,
$$f_\Lambda(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda}; \qquad \lambda > 0,$$
and the likelihood of observing $x$ given $\Lambda = \lambda$:
$$f_{X|\Lambda}(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}.$$
A:
(a) We first obtain the expression for the likelihood. Let $X_i$ be the random variable corresponding to sample $x_i$. By independence,
$$f_{X_1,\dots,X_n|\Lambda}(x_1,\dots,x_n \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda}}{\prod_{i=1}^{n} x_i!}.$$
By Bayes' rule, the posterior is
$$f_{\Lambda|X_1,\dots,X_n}(\lambda^* \mid x_1,\dots,x_n) = \frac{\dfrac{\beta^\alpha}{\prod_{i=1}^{n} x_i!\,\Gamma(\alpha)}\,(\lambda^*)^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda^*}}{\displaystyle\int_0^\infty \frac{\beta^\alpha}{\prod_{i=1}^{n} x_i!\,\Gamma(\alpha)}\, \lambda^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda}\, d\lambda} = \frac{(\lambda^*)^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda^*}}{\displaystyle\int_0^\infty \lambda^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda}\, d\lambda}.$$
Let $(\beta + n)\lambda = t$:
$$f_{\Lambda|X_1,\dots,X_n}(\lambda^* \mid x_1,\dots,x_n) = \frac{(\lambda^*)^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda^*}}{\displaystyle\int_0^\infty \left(\frac{t}{\beta+n}\right)^{\alpha-1+\sum_{i=1}^{n} x_i} e^{-t}\, \frac{1}{\beta+n}\, dt} = \frac{(\lambda^*)^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda^*}}{\dfrac{1}{(\beta+n)^{\alpha+\sum_{i=1}^{n} x_i}} \displaystyle\int_0^\infty t^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-t}\, dt}.$$
We know that $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt$, so
$$f_{\Lambda|X_1,\dots,X_n}(\lambda^* \mid x_1,\dots,x_n) = \frac{(\beta+n)^{\alpha+\sum_{i=1}^{n} x_i}}{\Gamma\!\left(\alpha+\sum_{i=1}^{n} x_i\right)}\, (\lambda^*)^{\alpha-1+\sum_{i=1}^{n} x_i}\, e^{-(\beta+n)\lambda^*},$$
i.e. the posterior is $\text{Gamma}\!\left(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n\right)$.
(b) The MAP estimate is the mode of this Gamma posterior. Setting the derivative of the log-posterior to zero gives
$$\hat{\lambda}_{MAP} = \frac{\alpha + \sum_{i=1}^{n} x_i - 1}{\beta + n} \qquad \left(\text{assuming } \alpha + \textstyle\sum_{i=1}^{n} x_i \ge 1\right).$$
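A small sketch of the conjugate update and the MAP formula derived above (an addition); the prior hyperparameters and the true rate used to generate data are illustrative assumptions.

```python
# Gamma prior / Poisson likelihood conjugate update and MAP estimate.
import numpy as np

rng = np.random.default_rng(7)
alpha, beta = 2.0, 1.0                        # prior Gamma(alpha, beta)
lam_true, n = 3.5, 100
x = rng.poisson(lam_true, size=n)

alpha_post = alpha + x.sum()                  # posterior is Gamma(alpha + sum(x), beta + n)
beta_post = beta + n
lam_map = (alpha_post - 1) / beta_post        # mode of the Gamma posterior (alpha_post >= 1)

print(alpha_post, beta_post, lam_map)         # lam_map should be close to lam_true
```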
(ii) For a Markov chain with state space $S = \{1, 2, 3\}$ and transition matrix
$$P = \begin{pmatrix} p & 1-p & 0 \\ p & 1-2p & p \\ 0 & 0 & 1 \end{pmatrix},$$
use the above equality to find $F_{ii}$ for $i = 1, 2, 3$. From the values of $F_{ii}$, deduce which states are transient and which are recurrent.
A:
(a) Let
$$S_{ii} := \left\{\{X_j\}_{j=0}^{n} \text{ s.t. } X_0 = i \text{ and } \exists\, n \text{ s.t. } X_n = i\right\}$$
$$s_{ii}^{n} := \left\{\{X_j\}_{j=0}^{n} \text{ s.t. } X_0 = i,\ X_n = i,\ X_k \ne i\ \forall k \in \{1, \dots, n-1\}\right\}$$
$$S'_{ii} := \bigcup_{n=1}^{\infty} s_{ii}^{n}$$
We first show that the sets $s_{ii}^{n}$ are pairwise disjoint. Take $n_1 > n_2$ and suppose $y \in s_{ii}^{n_2}$:
$$\implies y = \left(\{X_j\}_{j=0}^{n_2-1},\ X_{n_2} = i\right)$$
If also $y \in s_{ii}^{n_1}$,
$$\implies y = \left(\{X_j\}_{j=0}^{n_1-1},\ X_{n_1} = i\right)$$
$$\implies y = \left(\{X_j\}_{j=0}^{n_2-1},\ X_{n_2} = i,\ \{X_j\}_{j=n_2+1}^{n_1-1},\ X_{n_1} = i\right) \quad \text{[taking an intermediate stop at } n_2\text{]}$$
But by the definition of $s_{ii}^{n_1}$ we are given that
$$X_k \ne i \quad \forall k \in \{1, \dots, n_1 - 1\},$$
$$\implies y \notin s_{ii}^{n_1}$$
$$\therefore s_{ii}^{n_1} \cap s_{ii}^{n_2} = \emptyset$$
• To understand how to prove the second claim, suppose there exists a sequence $y \in S_{ii}$ of length $n$ such that $y_k = i$ for some $k \in \{1, \dots, n-1\}$. The sequence can be decomposed as follows:
  – a segment from $X_0 = i$ to $X_k = i$,
  – a subsequent segment from $X_k = i$ to $X_n = i$.
• We can consider $X_k$ to be the starting point of another sequence, which will also satisfy the given conditions. Subsequences following the first-time return add nothing beyond the first segment.
• Note that we are able to break these sequences and consider a new starting point, forgetting the information in the past, as a consequence of the Markov property.
• Since any $y \in S_{ii}$ is just a concatenation of first-time-return sequences, the subsequences following the first-time-return subsequence contribute nothing extra to the event of ever returning, so we can reduce the sequence to its first segment.
• Since we are dealing with the probability of ever returning to $i$, it is sufficient to stop at the first-time visit and still have the same probability measure.
$$\therefore \mathbb{P}(S_{ii}) = \mathbb{P}(S'_{ii})$$
When we apply the probability axioms (countable additivity over the disjoint sets $s_{ii}^{n}$) to these two sets, we observe the following:
Note: The proof above is just for reference; any other proof which explains with proper reasoning why we can add the probabilities will also be awarded marks.
$$F_{ii} = \sum_{n=1}^{\infty} P(\text{returning to } i, \text{ starting in } i, \text{ and not before step } n)$$
$$\implies F_{ii} = \sum_{n=1}^{\infty} P(\text{returning to } i \text{ for the first time after exactly } n \text{ steps}) = \sum_{n=1}^{\infty} f_{ii}^{n}$$
[State-transition diagram of the chain: state 1 has a self-loop with probability $p$ and an edge to state 2 with probability $1-p$; state 2 has a self-loop with probability $1-2p$ and edges to states 1 and 3, each with probability $p$; state 3 has a self-loop with probability 1.]
To calculate $F_{33}$, note that $f_{33}^{n} = 0\ \forall\, n > 1$. This is true because there are no transitions that make us leave state 3: starting in 3, the chain is already back in 3 after one step, so it cannot return to state 3 for the first time after exactly $n > 1$ steps.
$$f_{33}^{1} = P(X_1 = 3 \mid X_0 = 3) = 1$$
$$\implies F_{33} = f_{33}^{1} + \sum_{n=2}^{\infty} f_{33}^{n} = 1 + 0 = 1$$
∴ State 3 is recurrent.
—
To calculate $F_{22}$, observe the transitions. We can write $f_{22}^{n}$ as follows:
$$f_{22}^{1} = P(X_1 = 2 \mid X_0 = 2) = 1 - 2p$$
For $n \ge 2$,
$$f_{22}^{n} = P(\text{leave } 2,\ \text{loop } n-2 \text{ times outside } 2,\ \text{then come back to } 2)$$
$$\implies f_{22}^{n} = \sum_{s \in S\setminus\{2\}} P(X_1 = s \mid X_0 = 2) \cdot \prod_{j=2}^{n-1} P(X_j = s \mid X_{j-1} = s) \cdot P(X_n = 2 \mid X_{n-1} = s)$$
Only the $s = 1$ term is nonzero (state 3 is absorbing, so the chain can never come back to 2 from 3):
$$\implies f_{22}^{n} = P(X_1 = 1 \mid X_0 = 2) \cdot \prod_{j=2}^{n-1} P(X_j = 1 \mid X_{j-1} = 1) \cdot P(X_n = 2 \mid X_{n-1} = 1)$$
$$\implies f_{22}^{n} = p \cdot \prod_{j=2}^{n-1} p \cdot (1-p) = (1-p)\, p^{n-1}$$
$$F_{22} = \sum_{n=1}^{\infty} f_{22}^{n} = 1 - 2p + \sum_{n=2}^{\infty} (1-p)\, p^{n-1} = 1 - 2p + \frac{1-p}{p}\sum_{n=2}^{\infty} p^{n} = 1 - 2p + \frac{1-p}{p}\cdot\frac{p^2}{1-p} = 1 - 2p + p = 1 - p < 1$$
∴ State 2 is transient.
—
To calculate $F_{11}$, we can write $f_{11}^{n}$ as follows:
$$f_{11}^{1} = P(X_1 = 1 \mid X_0 = 1) = p$$
For $n \ge 2$,
$$f_{11}^{n} = P(\text{leave } 1,\ \text{loop } n-2 \text{ times outside } 1,\ \text{then come back to } 1)$$
$$\implies f_{11}^{n} = \sum_{s \in S\setminus\{1\}} P(X_1 = s \mid X_0 = 1) \cdot \prod_{j=2}^{n-1} P(X_j = s \mid X_{j-1} = s) \cdot P(X_n = 1 \mid X_{n-1} = s)$$
Again only the $s = 2$ term is nonzero (the chain cannot reach 3 directly from 1, and can never return from 3):
$$\implies f_{11}^{n} = P(X_1 = 2 \mid X_0 = 1) \cdot \prod_{j=2}^{n-1} P(X_j = 2 \mid X_{j-1} = 2) \cdot P(X_n = 1 \mid X_{n-1} = 2)$$
$$\implies f_{11}^{n} = (1-p) \cdot \prod_{j=2}^{n-1} (1-2p) \cdot p = p(1-p)\,(1-2p)^{n-2}$$
$$F_{11} = \sum_{n=1}^{\infty} f_{11}^{n} = p + \sum_{n=2}^{\infty} p(1-p)\,(1-2p)^{n-2} = p + \frac{p(1-p)}{1 - (1-2p)} = p + \frac{1-p}{2} = \frac{p+1}{2} < 1$$
Note that $p < \frac{1}{2}$, since $P(X_i = 2 \mid X_{i-1} = 2) = 1 - 2p > 0$.
∴ State 1 is transient
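As a check (an addition, not part of the solution), the sketch below estimates each $F_{ii}$ by simulating paths of the chain for one illustrative value $p = 0.3$, for which the formulas above give $F_{11} = 0.65$, $F_{22} = 0.7$, $F_{33} = 1$.

```python
# Monte Carlo estimate of the return probabilities F_ii for the 3-state chain above.
import numpy as np

p = 0.3
P = np.array([[p, 1 - p, 0.0],
              [p, 1 - 2 * p, p],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(8)
num_paths, max_steps = 20_000, 200

def estimate_F(start):
    """Estimate F_ii: the fraction of simulated paths that ever return to `start`."""
    returns = 0
    for _ in range(num_paths):
        state = start
        for _ in range(max_steps):
            state = rng.choice(3, p=P[state])
            if state == start:
                returns += 1
                break
            if state == 2 and start != 2:
                # State 3 (index 2) is absorbing, so no return to 1 or 2 is possible.
                break
    return returns / num_paths

for i in range(3):
    print("F_%d%d approx" % (i + 1, i + 1), estimate_F(i))
```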
(b) Covariance of $X$: For $X = AZ + \mu$ with $Z$ a standard normal vector, we have $E[X] = \mu$, so $X - \mu = AZ$ and
$$C_X = E\!\left[(X-\mu)(X-\mu)^T\right] = E\!\left[(AZ)(AZ)^T\right] = A\, E\!\left[ZZ^T\right] A^T.$$
Since $Z$ is a standard normal vector, its covariance matrix is the identity matrix $I$, i.e. $E[ZZ^T] = I$. Thus:
$$C_X = A I A^T = A A^T.$$
(c) PDF of $X$:
Let's start with the PDF of the standard normal vector $Z$. For a standard normal vector where the $Z_i$'s are i.i.d. and $Z_i \sim N(0, 1)$, the PDF is given by:
$$f_Z(z) = \prod_{i=1}^{n} f_{Z_i}(z_i) = \frac{1}{(2\pi)^{n/2}} \exp\!\left(-\frac{1}{2}\sum_{i=1}^{n} z_i^2\right) = \frac{1}{(2\pi)^{n/2}} \exp\!\left(-\frac{1}{2} z^T z\right)$$
Using the change-of-variables formula with $Z = H(X) = A^{-1}(X - \mu)$,
$$f_X(x) = f_Z(H(x))\,|J|,$$
where $J$ is the Jacobian of $H$ with entries
$$\frac{\partial H_i}{\partial x_k} = a_{ik},$$
the $(i,k)$ entry of $A^{-1}$, so $|J| = |\det(A^{-1})| = \frac{1}{|\det(A)|}$.
Thus, we conclude that
$$f_X(x) = f_Z\!\left(A^{-1}(x-\mu)\right)\frac{1}{|\det(A)|}$$
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\,|\det(A)|} \exp\!\left(-\frac{1}{2}\left(A^{-1}(x-\mu)\right)^T\left(A^{-1}(x-\mu)\right)\right)$$
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\,|\det(A)|} \exp\!\left(-\frac{1}{2}(x-\mu)^T A^{-T} A^{-1}(x-\mu)\right)$$
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\,|\det(A)|} \exp\!\left(-\frac{1}{2}(x-\mu)^T \left(AA^T\right)^{-1}(x-\mu)\right)$$
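A short simulation check (an addition) that the sample covariance of $X = AZ + \mu$ is close to $AA^T$; the matrix `A` and vector `mu` are arbitrary illustrative choices.

```python
# Verify empirically that X = A Z + mu has covariance A A^T.
import numpy as np

rng = np.random.default_rng(9)
A = np.array([[2.0, 0.0],
              [1.0, 1.5]])
mu = np.array([1.0, -2.0])
n_samples = 200_000

Z = rng.standard_normal((n_samples, 2))
X = Z @ A.T + mu                               # each row is one sample of X = A Z + mu

print(np.cov(X, rowvar=False))                 # should be close to A @ A.T
print(A @ A.T)
```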
Q5: Shifted Exponential: Let $X$ be a random variable following a shifted exponential distribution with rate parameter $\lambda > 0$ and shift parameter $\mu$. The probability density function (PDF) of $X$ is given by:
$$f_X(x) = \begin{cases} \lambda e^{-\lambda(x-\mu)}, & \text{if } x \ge \mu, \\ 0, & \text{otherwise.} \end{cases}$$
(a) Find the MGF $M_X(t)$ of $X$ and state for which $t$ it exists. (b) Using the MGF, find $E[X]$ and $\text{Var}[X]$.
A:
(a)
$$M_X(t) = E[e^{tX}] = \int_\mu^\infty e^{tx}\,\lambda e^{-\lambda(x-\mu)}\,dx = \lambda e^{\lambda\mu} \int_\mu^\infty e^{-(\lambda - t)x}\,dx = \lambda e^{\lambda\mu}\left[-\frac{e^{-(\lambda-t)x}}{\lambda - t}\right]_\mu^\infty = \lambda e^{\lambda\mu}\,\frac{e^{-(\lambda-t)\mu}}{\lambda - t} = \frac{\lambda e^{\mu t}}{\lambda - t}, \qquad t < \lambda.$$
$M_X(t)$ exists for all $t < \lambda$, since $e^{-(\lambda-t)x} \to 0$ as $x \to \infty$ only if $\lambda - t$ is positive.
(b) To find $E[X]$, we differentiate $M_X(t)$, since $E[X] = M_X'(0)$.
$$M_X'(t) = \frac{d}{dt}\left[\lambda e^{\mu t}(\lambda - t)^{-1}\right] = \frac{\mu\lambda e^{\mu t}}{\lambda - t} + \frac{\lambda e^{\mu t}}{(\lambda - t)^2}$$
$$E[X] = M_X'(0) = \frac{\lambda\mu}{\lambda} + \frac{\lambda}{\lambda^2} = \mu + \frac{1}{\lambda}$$
Similarly, $E[X^2] = M_X''(0)$.
$$M_X''(t) = \frac{d}{dt}\left[\mu\lambda e^{\mu t}(\lambda - t)^{-1} + \lambda e^{\mu t}(\lambda - t)^{-2}\right] = \frac{\mu^2\lambda e^{\mu t}}{\lambda - t} + 2\,\frac{\mu\lambda e^{\mu t}}{(\lambda - t)^2} + \frac{2\lambda e^{\mu t}}{(\lambda - t)^3}$$
$$E[X^2] = M_X''(0) = \mu^2 + \frac{2\mu}{\lambda} + \frac{2}{\lambda^2} = \left(\mu + \frac{1}{\lambda}\right)^2 + \frac{1}{\lambda^2}$$
$$\text{Var}[X] = E[X^2] - E[X]^2 = \left(\mu + \frac{1}{\lambda}\right)^2 + \frac{1}{\lambda^2} - \left(\mu + \frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
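A sampling check of these moments (an addition); `mu` and `lam` are illustrative values, and the shifted exponential is generated as $\mu$ plus an ordinary exponential.

```python
# Check E[X] = mu + 1/lambda and Var[X] = 1/lambda^2 for the shifted exponential.
import numpy as np

rng = np.random.default_rng(10)
mu, lam, n = 1.5, 2.0, 1_000_000

x = mu + rng.exponential(scale=1.0 / lam, size=n)    # shifted exponential samples

print(x.mean(), mu + 1 / lam)                        # ~ 2.0
print(x.var(), 1 / lam ** 2)                         # ~ 0.25
```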