
Probability and Statistics: MA6.101
Endsem Solutions

5 Mark Questions
Q1: Let Z = X1 + X2 + · · · + XN , where Xi are i.i.d. random variables and N is a
positive discrete random variable. Prove that:

$$M_Z(t) = M_N(\log M_X(t)).$$

A: The moment generating function of Z is given by:

$$M_Z(t) = E[e^{tZ}]$$

Conditioning on N, the number of terms, we have:

$$M_Z(t) = E_N\left[E\left[e^{tZ} \mid N\right]\right].$$

Expanding Z, we have:

$$e^{tZ} = e^{t(X_1 + X_2 + \cdots + X_N)} = \prod_{i=1}^{N} e^{tX_i}$$

So the expression is:

$$M_Z(t) = E_N\left[E\left[\prod_{i=1}^{N} e^{tX_i} \,\Big|\, N\right]\right].$$

Since the Xi are independent, we can use E[XY] = E[X] E[Y]:

$$M_Z(t) = E_N\left[\prod_{i=1}^{N} E\left[e^{tX_i} \mid N\right]\right].$$

Since the Xi are i.i.d., each factor equals E[e^{tX_1}], which is independent of N:

$$M_Z(t) = E_N\left[\left(E\left[e^{tX_1}\right]\right)^N\right].$$

Now E[e^{tX_1}] = M_X(t) by the definition of the moment generating function. So:

$$M_Z(t) = E_N\left[(M_X(t))^N\right].$$

Now let u = M_X(t). Then:

$$M_Z(t) = E_N[u^N].$$

Now consider the MGF of N:

$$M_N(t) = E[e^{tN}]$$

Setting t = log u gives:

$$M_N(\log u) = E[e^{N \log u}] = E[e^{\log(u^N)}] = E[u^N]$$

Substituting this into our expression:

$$M_Z(t) = M_N(\log u) = M_N(\log(M_X(t))).$$

Hence proved.
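As a quick numerical sanity check, the identity can be verified by simulation for a case where both component MGFs have closed forms. The choice N ∼ Poisson(λ) and Xi ∼ Bernoulli(p), and the parameter values, are illustrative assumptions, not part of the question:

```python
import math
import random

random.seed(0)

lam, p, t = 3.0, 0.4, 0.5  # illustrative parameters (assumed)

# Closed-form component MGFs
def M_X(t):
    return 1 - p + p * math.exp(t)            # Bernoulli(p) MGF

def M_N(s):
    return math.exp(lam * (math.exp(s) - 1))  # Poisson(lam) MGF

def sample_poisson(lam):
    # Knuth's multiplication method for sampling Poisson(lam)
    L = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

# Monte Carlo estimate of M_Z(t) = E[e^{tZ}] with Z = X_1 + ... + X_N
trials = 200_000
acc = 0.0
for _ in range(trials):
    n = sample_poisson(lam)
    z = sum(1 for _ in range(n) if random.random() < p)  # Bernoulli sum
    acc += math.exp(t * z)
mc = acc / trials

rhs = M_N(math.log(M_X(t)))  # the right-hand side of the identity
print(round(mc, 2), round(rhs, 2))  # the two should agree closely
```

The empirical MGF and the composed MGF agree to within Monte Carlo error.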

Q2: Let Y = eX , where X ∼ N (µ, σ 2 ). Obtain the pdf of Y .


A: The cumulative distribution function (CDF) of Y is given by:

$$F_Y(y) = P(Y \le y) = P(e^X \le y).$$

Taking the natural logarithm, this is equivalent to:

$$F_Y(y) = P(X \le \log y).$$

Note: y > 0 is required for the logarithm to be defined.

$$F_Y(y) = F_X(\log y).$$

Differentiating F_Y(y) with respect to y, we obtain the PDF of Y:

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\left[F_X(\log y)\right].$$

From the chain rule:

$$f_Y(y) = f_X(\log y) \cdot \frac{d}{dy}(\log y).$$

The derivative of log y is 1/y. Substituting:

$$f_Y(y) = f_X(\log y) \cdot \frac{1}{y}.$$

The PDF of X is:

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

Substituting x = log y, we get:

$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\log y-\mu)^2}{2\sigma^2}\right) \cdot \frac{1}{y}.$$

Simplifying:

$$f_Y(y) = \frac{1}{y\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\log y-\mu)^2}{2\sigma^2}\right), \quad y > 0.$$
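A simple check on the derived density is that it integrates to 1; a numerical sketch with illustrative (assumed) parameter values:

```python
import math

mu, sigma = 0.5, 0.8  # illustrative parameters (assumed)

def f_Y(y):
    """The lognormal pdf derived above; valid for y > 0."""
    if y <= 0:
        return 0.0
    return math.exp(-(math.log(y) - mu) ** 2 / (2 * sigma ** 2)) \
        / (y * sigma * math.sqrt(2 * math.pi))

# A density must integrate to 1; approximate with the trapezoidal rule
# over a range wide enough to capture essentially all the mass.
a, b, n = 1e-9, 60.0, 200_000
h = (b - a) / n
total = 0.5 * (f_Y(a) + f_Y(b))
for i in range(1, n):
    total += f_Y(a + i * h)
total *= h
print(round(total, 4))  # ≈ 1.0
```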

8 Mark Questions
Q1: Let D = {x1 , x2 , . . . , xn } denote i.i.d. samples from a Poisson random variable with
unknown parameter γ.

(a) Find the Maximum Likelihood Estimate (MLE) for the unknown parameter
γ. (5 marks)
(b) Determine the Mean Squared Error (MSE) of the estimate. (3 marks)

A:

(a) MLE for γ


The probability mass function (PMF) of a Poisson random variable is given by:

$$P(X = x) = \frac{\gamma^x e^{-\gamma}}{x!}, \quad x = 0, 1, 2, \ldots$$

For the i.i.d. samples D = {x1, x2, ..., xn}, the likelihood function is:

$$L(D; \gamma) = \prod_{i=1}^{n} \frac{\gamma^{x_i} e^{-\gamma}}{x_i!}.$$

Taking the natural logarithm of the likelihood function, the log-likelihood is:

$$\ell(D; \gamma) = \sum_{i=1}^{n} \left[x_i \ln(\gamma) - \gamma - \ln(x_i!)\right].$$

To find the MLE, we differentiate ℓ(D; γ) with respect to γ and set it equal to zero:

$$\frac{\partial \ell(D; \gamma)}{\partial \gamma} = \sum_{i=1}^{n} \frac{x_i}{\gamma} - n = 0.$$

Simplifying:

$$\sum_{i=1}^{n} x_i = n\gamma.$$

Solving for γ, the MLE is:

$$\hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},$$

where x̄ is the sample mean. The corresponding estimator is:

$$\hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.$$

(b) Mean Squared Error (MSE) of γ̂
Let the sample mean γ̂ be written as:

$$\hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

The expectation of γ̂ is:

$$E[\hat{\gamma}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i].$$

Since E[Xi] = γ for a Poisson random variable:

$$E[\hat{\gamma}] = \frac{1}{n} \cdot n \cdot \gamma = \gamma.$$

The variance of γ̂ is:

$$\operatorname{Var}(\hat{\gamma}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i).$$

Since Var(Xi) = γ for a Poisson random variable:

$$\operatorname{Var}(\hat{\gamma}) = \frac{1}{n^2} \cdot n \cdot \gamma = \frac{\gamma}{n}.$$

The Mean Squared Error (MSE) of γ̂ is:

$$\operatorname{MSE}(\hat{\gamma}) = \operatorname{Var}(\hat{\gamma}) + \left(E[\hat{\gamma}] - \gamma\right)^2.$$

Since E[γ̂] = γ, the bias is zero:

$$\operatorname{MSE}(\hat{\gamma}) = \operatorname{Var}(\hat{\gamma}) = \frac{\gamma}{n}.$$
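The MSE formula γ/n can be checked empirically by repeating the estimation over many simulated data sets; the parameter values below are illustrative assumptions:

```python
import math
import random

random.seed(1)

gamma_true, n, reps = 4.0, 50, 10_000  # illustrative values (assumed)

def sample_poisson(lam):
    # Knuth's multiplication method for sampling Poisson(lam)
    L = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

# Empirical MSE of the MLE (the sample mean) over many repeated data sets
sq_err = 0.0
for _ in range(reps):
    est = sum(sample_poisson(gamma_true) for _ in range(n)) / n
    sq_err += (est - gamma_true) ** 2
mse_hat = sq_err / reps
print(round(mse_hat, 3), gamma_true / n)  # empirical vs. theoretical gamma/n
```

With γ = 4 and n = 50, the theoretical MSE is γ/n = 0.08, and the empirical value lands close to it.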
Q2: Consider a Gaussian random variable X with a known mean µ but an unknown
variance σ 2 . Suppose you observe k iid samples from this random variable, denoted
by D = {x1 , x2 , . . . , xk }.

(a) Find the MLE for the unknown variance σ 2 . (5 marks)


(b) Is the MLE estimate biased? Justify your answer. (3 marks)

A:
Let X ∼ N(µ, σ²) be a Gaussian random variable with known mean µ and unknown variance σ². The probability density function (PDF) of X is given by:

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

Given k independent and identically distributed (iid) samples D = {x1, x2, ..., xk}, the likelihood function is expressed as:

$$L(\sigma^2 \mid D) = \prod_{i=1}^{k} f(x_i \mid \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^k \exp\left(-\frac{\sum_{i=1}^{k}(x_i-\mu)^2}{2\sigma^2}\right).$$

To determine the maximum likelihood estimate (MLE) of σ², denoted σ̂², we maximize the likelihood function. Since the logarithm is a monotonically increasing function, we equivalently maximize the log-likelihood:

$$\ell(\sigma^2 \mid D) = \ln L(\sigma^2 \mid D) = -\frac{k}{2}\ln(2\pi) - \frac{k}{2}\ln(\sigma^2) - \frac{\sum_{i=1}^{k}(x_i-\mu)^2}{2\sigma^2}.$$

To find the MLE, we differentiate ℓ(σ² | D) with respect to σ² and set the derivative to zero:

$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{k}{2\sigma^2} + \frac{\sum_{i=1}^{k}(x_i-\mu)^2}{2\sigma^4} = 0.$$

Simplifying, we find:

$$\frac{k}{2\sigma^2} = \frac{\sum_{i=1}^{k}(x_i-\mu)^2}{2\sigma^4}.$$

Multiplying through by 2σ⁴:

$$k\sigma^2 = \sum_{i=1}^{k}(x_i-\mu)^2.$$

Solving for σ², the MLE estimate for the variance is:

$$\hat{\sigma}^2 = \frac{1}{k}\sum_{i=1}^{k}(x_i-\mu)^2.$$

To determine whether the MLE estimate is biased, we need to compare E[σ̂²] with the true variance σ².

Now, we know that E[Xi] = µ and E[Xi²] = σ² + µ². So,

$$E[\hat{\sigma}^2] = \frac{1}{k}\sum_{i=1}^{k} \left(E[x_i^2] - 2\mu E[x_i] + \mu^2\right)$$

$$= \frac{1}{k}\sum_{i=1}^{k} \left(\sigma^2 + \mu^2 - 2\mu^2 + \mu^2\right)$$

$$= \frac{1}{k} \cdot k\sigma^2 = \sigma^2$$

Hence the bias is:

$$\operatorname{Bias}[\hat{\sigma}^2] = E[\hat{\sigma}^2] - \sigma^2 = 0$$

Therefore, the estimator σ̂² is an unbiased estimator.
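The unbiasedness can be illustrated by averaging the estimator over many simulated data sets; the parameter values are illustrative assumptions:

```python
import random

random.seed(2)

mu, sigma2, k, reps = 1.0, 2.0, 10, 50_000  # illustrative values (assumed)

acc = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(k)]
    acc += sum((x - mu) ** 2 for x in xs) / k  # the MLE, using the known mean
mean_est = acc / reps
print(round(mean_est, 2))  # hovers around the true variance: no bias
```

Even with only k = 10 samples per data set, the average of the estimates sits at the true variance, consistent with zero bias. (Note this relies on µ being known; with an estimated mean the MLE of σ² is biased.)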

Q3: Consider a sequence {Xn, n = 1, 2, 3, ...} such that

$$X_n = \begin{cases} n, & \text{with probability } \frac{1}{n^2} \\[4pt] 0, & \text{with probability } 1 - \frac{1}{n^2} \end{cases}$$

(a) Show that $X_n \xrightarrow{p} 0$ (convergence in probability to 0). (4 marks)
(b) Show that $X_n \xrightarrow{a.s.} 0$ (almost sure convergence to 0). (4 marks)

A:

(a) Convergence in probability: To show $X_n \xrightarrow{p} 0$, we need to prove:

$$\forall \epsilon > 0, \quad \lim_{n \to \infty} P(|X_n - 0| > \epsilon) = 0.$$

For any fixed ε > 0 and every n large enough that n > ε, we have:

$$P(|X_n| > \epsilon) = P(X_n = n) = \frac{1}{n^2}.$$

Hence:

$$\lim_{n \to \infty} P(|X_n| > \epsilon) = \lim_{n \to \infty} \frac{1}{n^2} = 0.$$

Thus, $X_n \xrightarrow{p} 0$.

(b) Almost sure convergence:

By the Borel-Cantelli lemma, if $\sum_{n=1}^{\infty} P(|X_n - 0| > \epsilon) < \infty$ for every ε > 0, then $X_n \xrightarrow{a.s.} 0$. Here:

$$\sum_{n=1}^{\infty} P(|X_n| > \epsilon) \le \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty.$$

This series converges because it is a p-series (an infinite series of the form Σ 1/nᵖ) with p = 2 > 1. Therefore, by Borel-Cantelli, with probability 1 only finitely many of the events {|Xn| > ε} occur; that is, Xn = 0 for all sufficiently large n with probability 1. Thus, $X_n \xrightarrow{a.s.} 0$.
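The Borel-Cantelli argument can be visualized by counting, along simulated sample paths, how many of the Xn are nonzero; the truncation length and trial count below are illustrative assumptions:

```python
import random

random.seed(3)

# Across a whole sample path, count how many X_n are nonzero. The expected
# count is sum_{n=1}^{N} 1/n^2, which stays bounded as N grows (≈ pi^2/6),
# matching the Borel-Cantelli conclusion that only finitely many X_n != 0.
trials, N = 2_000, 200
total = 0
for _ in range(trials):
    total += sum(1 for n in range(1, N + 1) if random.random() < 1.0 / n ** 2)
avg = total / trials
print(round(avg, 2))  # around 1.6 nonzero terms per path, on average
```

Doubling N barely changes the average count, since the tail of Σ 1/n² is negligible: almost every path has Xn = 0 from some point on.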

Q4: Given a Markov coin with the following transition probability matrix P and initial distribution µ = [0.1, 0.9], use the following 4 independent Uniform[0, 1] samples {0.3, 0.7, 0.23, 0.97} to obtain/generate 4 successive toss outcomes of the Markov coin. (Hint: The first toss is to be sampled from the initial distribution.)

$$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{pmatrix}$$

A:
We simulate the successive toss outcomes of the Markov coin using the given transition probability matrix P, initial distribution µ = [0.1, 0.9], and uniform random samples {0.3, 0.7, 0.23, 0.97}.

Step 1: The first toss. The first toss is sampled from the initial distribution µ, whose cumulative probabilities are [0.1, 1.0]. The first sample is 0.3, and since 0.1 < 0.3 ≤ 1.0, the first state is 2 (corresponding to µ₂, i.e., tails).

Step 2: Successive tosses. For each successive toss, the state is determined using the row of P for the current state and the next sample. Since the first toss landed in state 2 (tails), the next toss is drawn from row 2 of P, namely [0.4, 0.6].

• Second toss: Current state = 2, transition probabilities = [0.4, 0.6], cumulative probabilities [0.4, 1.0]. Sample = 0.7. Since 0.4 < 0.7 ≤ 1.0, the next state is 2 (tails), so the following toss again uses row 2, [0.4, 0.6].
• Third toss: Current state = 2, transition probabilities = [0.4, 0.6], cumulative probabilities [0.4, 1.0]. Sample = 0.23. Since 0.23 ≤ 0.4, the next state is 1 (heads), so the following toss uses row 1, [0.9, 0.1].
• Fourth toss: Current state = 1, transition probabilities = [0.9, 0.1], cumulative probabilities [0.9, 1.0]. Sample = 0.97. Since 0.9 < 0.97 ≤ 1.0, the next state is 2.

Final Results
The sequence of states for the 4 tosses is:

{2, 2, 1, 2}
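The procedure above is inverse-transform sampling applied row by row; a minimal sketch (the helper name is my own):

```python
def sample_state(probs, u):
    """Return the 1-based index of the first cumulative bin containing u."""
    cum = 0.0
    for i, p in enumerate(probs, start=1):
        cum += p
        if u <= cum:
            return i
    return len(probs)

mu = [0.1, 0.9]
P = [[0.9, 0.1],
     [0.4, 0.6]]
samples = [0.3, 0.7, 0.23, 0.97]

state = sample_state(mu, samples[0])       # first toss: initial distribution
path = [state]
for u in samples[1:]:
    state = sample_state(P[state - 1], u)  # next toss: current state's row
    path.append(state)
print(path)  # [2, 2, 1, 2]
```

The printed path reproduces the hand computation above.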

Q5: Let X and Y be independent random variables with common distribution function F.

1. PDF of Z₁ = max(X, Y)
CDF of Z₁:

$$F_{Z_1}(z) = P(Z_1 \le z) = P(\max(X, Y) \le z) = P(X \le z \text{ and } Y \le z)$$
$$= P(X \le z) \cdot P(Y \le z) \quad \text{(using independence)} = F(z)^2$$

PDF of Z₁:

$$f_{Z_1}(z) = \frac{d}{dz} F(z)^2 = 2F(z)f(z)$$

2. PDF of Z₂ = min(X, Y)
CDF of Z₂:

$$F_{Z_2}(z) = P(Z_2 \le z) = 1 - P(\min(X, Y) > z) = 1 - P(X > z \text{ and } Y > z)$$
$$= 1 - P(X > z) \cdot P(Y > z) \quad \text{(using independence)} = 1 - (1 - F(z))^2$$

PDF of Z₂:

$$f_{Z_2}(z) = \frac{d}{dz}\left[1 - (1 - F(z))^2\right] = 2(1 - F(z))f(z)$$

Final PDFs:

$$f_{Z_1}(z) = 2F(z)f(z), \qquad f_{Z_2}(z) = 2(1 - F(z))f(z)$$
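A quick empirical check, using the illustrative (assumed) choice X, Y ∼ Uniform[0, 1] so that F(z) = z: the derived densities give E[max] = ∫₀¹ z · 2z dz = 2/3 and E[min] = ∫₀¹ z · 2(1−z) dz = 1/3.

```python
import random

random.seed(4)

# Monte Carlo means of max(X, Y) and min(X, Y) for X, Y ~ Uniform[0, 1]
trials = 100_000
sum_max = sum_min = 0.0
for _ in range(trials):
    x, y = random.random(), random.random()
    sum_max += max(x, y)
    sum_min += min(x, y)
e_max, e_min = sum_max / trials, sum_min / trials
print(round(e_max, 2), round(e_min, 2))  # ≈ 0.67 and ≈ 0.33
```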

10 Mark Questions
Q1: Let D = {x1, ..., xn} denote i.i.d. samples from a uniform random variable U[a, b], where a and b are unknown. Find an MLE estimate for the unknown parameters a and b.
A: The pdf of a U[a, b] random variable is given by

$$f_U(u) = \begin{cases} \frac{1}{b-a} & a \le u \le b \\ 0 & \text{o.w.} \end{cases}$$

The likelihood of D is defined as

$$L(x_1, x_2, \ldots, x_n; a, b) = f_{U_1, \ldots, U_n}(x_1, \ldots, x_n; a, b) = \prod_i f_{U_i}(x_i; a, b) \quad \text{as the samples are i.i.d.}$$

From the pdf, it is clear that L ≠ 0 iff a ≤ xi ≤ b for all i = 1 ... n, i.e., iff a ≤ minᵢ(xi) and b ≥ maxᵢ(xi):

$$L(x_1, \ldots, x_n; a, b) = \begin{cases} \frac{1}{(b-a)^n} & a \le \min_i(x_i),\ b \ge \max_i(x_i) \\ 0 & \text{o.w.} \end{cases}$$

The log-likelihood is

$$\log L(x_1, \ldots, x_n; a, b) = \begin{cases} -n \log(b-a) & a \le \min_i(x_i),\ b \ge \max_i(x_i) \\ -\infty & \text{o.w.} \end{cases}$$

The MLE estimate for a is given by $\hat{a}_{ML} = \arg\max_a \log L(x_1, \ldots, x_n; a, b)$.
To find the maximum, we take the derivative w.r.t. a:

$$\frac{\partial \log L}{\partial a} = \frac{n}{b-a}, \quad a \le \min_i(x_i),\ b \ge \max_i(x_i)$$

This derivative is positive, so log L is monotonically increasing in a over the region a ≤ minᵢ(xi). To maximize the likelihood we therefore take the largest value a can attain in the region:

$$\hat{a}_{ML} = \min_i x_i$$

Similarly, the MLE estimate for b is given by $\hat{b}_{ML} = \arg\max_b \log L(x_1, \ldots, x_n; a, b)$.
To find the maximum, we take the derivative w.r.t. b:

$$\frac{\partial \log L}{\partial b} = \frac{-n}{b-a}, \quad a \le \min_i(x_i),\ b \ge \max_i(x_i)$$

This derivative is negative, so log L is monotonically decreasing in b over the region b ≥ maxᵢ(xi). To maximize the likelihood we therefore take the smallest value b can attain in the region:

$$\hat{b}_{ML} = \max_i x_i$$
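A short simulation illustrates the estimators; the true endpoints and sample size are illustrative assumptions:

```python
import random

random.seed(5)

a_true, b_true, n = 2.0, 5.0, 1_000  # illustrative values (assumed)
xs = [random.uniform(a_true, b_true) for _ in range(n)]

# The MLEs derived above: the sample minimum and maximum
a_hat, b_hat = min(xs), max(xs)
print(round(a_hat, 2), round(b_hat, 2))  # approach (2.0, 5.0) from inside
```

Note that the estimates always lie inside [a, b] (the sample extremes cannot overshoot the true endpoints), so both estimators are biased inward, though the bias vanishes as n grows.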

Q2: Bayesian inference / conjugate prior problem: Suppose D = {x1, ..., xn} is a data set consisting of independent samples of a Poisson random variable with unknown parameter λ*. Now assume a prior model Λ ∼ Gamma(α, β) on the unknown parameter λ* (see hint below for the gamma distribution). Obtain an expression for the posterior distribution on λ*. (7 mks) What is the MAP estimate for λ*? (3 mks)
Hint: Use prior belief Λ ∼ Gamma(α, β):

$$f_\Lambda(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}; \quad \lambda > 0,$$

and likelihood of observing x given Λ = λ:

$$f_{X|\Lambda}(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$
A:

(a) We first obtain the expression for the likelihood. Let Xi be the random variable corresponding to sample xi:

$$f_{X_1,\ldots,X_n|\Lambda}(x_1,\ldots,x_n|\lambda^*) = \prod_{i=1}^n f_{X_i|\Lambda}(x_i|\lambda^*) = \prod_{i=1}^n \frac{(\lambda^*)^{x_i} e^{-\lambda^*}}{x_i!} = \frac{(\lambda^*)^{\sum_{i=1}^n x_i}\, e^{-n\lambda^*}}{\prod_{i=1}^n x_i!}$$

The posterior distribution can be found using Bayes' rule:

$$f_{\Lambda|X_1,\ldots,X_n}(\lambda^*|x_1,\ldots,x_n) = \frac{f_{X_1,\ldots,X_n|\Lambda}(x_1,\ldots,x_n|\lambda^*)\, f_\Lambda(\lambda^*)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}$$

$$= \frac{\dfrac{(\lambda^*)^{\sum_{i=1}^n x_i} e^{-n\lambda^*}}{\prod_{i=1}^n x_i!} \cdot \dfrac{\beta^\alpha}{\Gamma(\alpha)}(\lambda^*)^{\alpha-1} e^{-\beta\lambda^*}}{\displaystyle\int_0^\infty f_{X_1,\ldots,X_n|\Lambda}(x_1,\ldots,x_n|\lambda)\, f_\Lambda(\lambda)\, d\lambda}$$

$$= \frac{(\lambda^*)^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*}}{\displaystyle\int_0^\infty \lambda^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda}\, d\lambda}$$

(the common factor β^α / (∏ᵢ xi! Γ(α)) cancels between numerator and denominator). Let (β + n)λ = t:

$$f_{\Lambda|X_1,\ldots,X_n}(\lambda^*|x_1,\ldots,x_n) = \frac{(\lambda^*)^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*}}{\displaystyle\int_0^\infty \left(\frac{t}{\beta+n}\right)^{\alpha-1+\sum_{i=1}^n x_i} e^{-t}\, \frac{1}{\beta+n}\, dt} = \frac{(\lambda^*)^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*}}{\dfrac{1}{(\beta+n)^{\alpha+\sum_{i=1}^n x_i}} \displaystyle\int_0^\infty t^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-t}\, dt}$$

We know that $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt$, so:

$$f_{\Lambda|X_1,\ldots,X_n}(\lambda^*|x_1,\ldots,x_n) = \frac{(\beta+n)^{\alpha+\sum_{i=1}^n x_i}}{\Gamma\!\left(\alpha+\sum_{i=1}^n x_i\right)}\, (\lambda^*)^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*}$$

which gives us $\Lambda|X_1,\ldots,X_n \sim \text{Gamma}\!\left(\alpha+\sum_{i=1}^n x_i,\ \beta+n\right)$.


(b) To find the MAP estimate of λ*:

$$\lambda_{MAP} = \arg\max_{\lambda^*} f_{\Lambda|X_1,\ldots,X_n}(\lambda^*|x_1,\ldots,x_n)$$

Ignoring the factors independent of λ*:

$$\lambda_{MAP} = \arg\max_{\lambda^*} (\lambda^*)^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*}$$

Differentiating the expression with respect to λ* and setting it to zero to obtain the maximum point:

$$\left(\alpha-1+\sum_{i=1}^n x_i\right)(\lambda^*)^{\alpha-2+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*} - (\beta+n)(\lambda^*)^{\alpha-1+\sum_{i=1}^n x_i}\, e^{-(\beta+n)\lambda^*} = 0$$

$$\lambda^* = \frac{\alpha-1+\sum_{i=1}^n x_i}{\beta+n}$$

Thus our MAP estimate for λ* is

$$\lambda_{MAP} = \frac{\alpha-1+\sum_{i=1}^n x_i}{\beta+n}$$
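Because the posterior is again a Gamma distribution, the whole update reduces to two additions; a minimal sketch (the prior hyperparameters and the data are illustrative assumptions):

```python
# Gamma-Poisson conjugate update derived above
alpha, beta = 2.0, 1.0          # assumed prior hyperparameters
data = [3, 1, 4, 1, 5]          # assumed observed Poisson counts

alpha_post = alpha + sum(data)  # alpha + sum of x_i
beta_post = beta + len(data)    # beta + n
lam_map = (alpha_post - 1) / beta_post
print(alpha_post, beta_post, lam_map)  # 16.0 6.0 2.5
```

Note how the MAP estimate (α − 1 + Σxᵢ)/(β + n) interpolates between the prior mode (α − 1)/β and the MLE Σxᵢ/n, with the data dominating as n grows.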
Q3: (i) For a Markov chain, let F_ii denote the probability of the chain ever returning to state i having started in state i, and let f_ii^n denote the probability of visiting state i for the first time in exactly n steps, having started in state i. Show that:

$$F_{ii} = \sum_{n=1}^{\infty} f_{ii}^n.$$

(ii) For a Markov chain with state space S = {1, 2, 3} and transition matrix:

$$P = \begin{pmatrix} p & 1-p & 0 \\ p & 1-2p & p \\ 0 & 0 & 1 \end{pmatrix},$$

use the above equality to find F_ii for i = 1, 2, 3. From the values of F_ii, deduce which states are transient and recurrent.
A:

(a) Let

$$S_{ii} := \{\{X_j\}_{j=0}^n \ \text{s.t.}\ X_0 = i \ \text{and}\ \exists n \ \text{s.t.}\ X_n = i\}$$

$$s_{ii}^n := \{\{X_j\}_{j=0}^n \ \text{s.t.}\ X_0 = i,\ X_n = i,\ X_k \ne i \ \forall k \in \{1, \ldots, n-1\}\}$$

$$S_{ii}' := \bigcup_{n=1}^{\infty} s_{ii}^n$$

We make the following claims:

i. For n₁ ≠ n₂, $s_{ii}^{n_1} \cap s_{ii}^{n_2} = \phi$
ii. $P(S_{ii}) = P(S_{ii}')$

For the proof of the first claim, assume there exists $y \in s_{ii}^{n_2}$, and without loss of generality assume n₁ > n₂. Then y satisfies $X_{n_2} = i$. If also $y \in s_{ii}^{n_1}$, the definition of $s_{ii}^{n_1}$ requires $X_k \ne i$ for all k ∈ {1, ..., n₁ − 1}, and in particular $X_{n_2} \ne i$, a contradiction. Hence $y \notin s_{ii}^{n_1}$, and

$$s_{ii}^{n_1} \cap s_{ii}^{n_2} = \phi$$

• To understand how to prove the second claim, suppose there exists a sequence y ∈ S_ii of length n such that y_k = i for some k ∈ {1, ..., n − 1}. The sequence can be decomposed as follows:
  – a segment from X₀ = i to X_k = i,
  – a subsequent segment from X_k = i to X_n = i.
• We can consider X_k to be the starting point of another sequence, which will also satisfy the given conditions. Subsequences following the first-time return do not contribute to the previous subsequence.
• Note that we are able to break these sequences and consider a new starting point, forgetting the past, as a consequence of the Markov property.
• Any y ∈ S_ii is thus a concatenation of first-time-return sequences; the subsequences following the first-time-return subsequence contribute nothing extra, so we can reduce to that first subsequence.
• Since we are dealing with the probability of ever returning to i, it is sufficient to stop at the first-time visit and still have the same probability measure.

$$\therefore P(S_{ii}) = P(S_{ii}')$$

When we apply the probability axioms to these two sets, we observe the following:

• $F_{ii} = P(S_{ii}) = P(S_{ii}') = P\left(\bigcup_{n=1}^{\infty} s_{ii}^n\right)$
• $f_{ii}^n = P(s_{ii}^n)$

Now, using the third axiom (countable additivity over disjoint unions), we can write:

$$F_{ii} = P\left(\bigcup_{n=1}^{\infty} s_{ii}^n\right) = \sum_{n=1}^{\infty} P(s_{ii}^n) = \sum_{n=1}^{\infty} f_{ii}^n$$

Note: The proof above is just for reference; any other proof that explains with proper reasoning why we can add the probabilities will also be awarded marks.

A simpler version of this proof:

$$F_{ii} = P(\text{coming back to state } i, \text{ having started in state } i) = P(\text{union over all } n \text{ of paths of length } n \text{ with } X_0 = i,\ X_n = i)$$

• We can simplify the union by seeing that returning in exactly n steps and returning in exactly m steps, with n ≠ m, have no overlapping paths.
• So the event of ever returning to i is equal to a disjoint countable union, over n, of the events of returning for the first time in exactly n steps.
• Then applying the third axiom of probability we get the summation.

Also, f_ii^n = P(returning to i after exactly n steps and not before), so:

$$F_{ii} = \sum_{n=1}^{\infty} P(\text{returning to } i \text{ in exactly } n \text{ steps and not before}) = \sum_{n=1}^{\infty} f_{ii}^n$$

(b) For a state i:

• i is recurrent if F_ii = 1. This is true because the chain then returns to i with probability 1.
• i is transient if F_ii < 1. This is true because there is a non-zero probability p = 1 − F_ii of never returning to state i.

Markov chain:

[Diagram: the 3-state chain. State 1 has a self-loop with probability p and an edge to state 2 with probability 1 − p; state 2 has a self-loop with probability 1 − 2p, an edge back to state 1 with probability p, and an edge to state 3 with probability p; state 3 has a self-loop with probability 1.]

To calculate F₃₃, note that f₃₃ⁿ = 0 for all n > 1. This is true because there are no transitions that leave state 3, so starting in 3, you cannot visit state 3 for the first time after exactly n > 1 steps.

$$f_{33}^1 = P(X_1 = 3 \mid X_0 = 3) = 1$$

$$\implies F_{33} = f_{33}^1 + \sum_{n=2}^{\infty} f_{33}^n = 1 + 0 = 1$$

∴ State 3 is recurrent.

To calculate F₂₂, observe the transitions. We can write f₂₂ⁿ as follows:

$$f_{22}^1 = P(X_1 = 2 \mid X_0 = 2) = 1 - 2p$$

$$f_{22}^n = P(\text{leave } 2, \text{ loop } n-2 \text{ times outside } 2, \text{ then come back to } 2)$$

$$\implies f_{22}^n = \sum_{s \in S \setminus \{2\}} P(X_1 = s \mid X_0 = 2) \cdot \prod_{j=2}^{n-1} P(X_j = s \mid X_{j-1} = s) \cdot P(X_n = 2 \mid X_{n-1} = s)$$

Since P(Xₙ = 2 | Xₙ₋₁ = 3) = 0, only s = 1 contributes:

$$\implies f_{22}^n = P(X_1 = 1 \mid X_0 = 2) \cdot \prod_{j=2}^{n-1} P(X_j = 1 \mid X_{j-1} = 1) \cdot P(X_n = 2 \mid X_{n-1} = 1) = p \cdot p^{n-2} \cdot (1-p) = (1-p)\, p^{n-1}$$

Substituting this in the expression, we get:

$$F_{22} = f_{22}^1 + \sum_{n=2}^{\infty} f_{22}^n = 1 - 2p + \sum_{n=2}^{\infty} (1-p)\, p^{n-1} = 1 - 2p + (1-p) \cdot \frac{p}{1-p} = 1 - 2p + p = 1 - p < 1$$

∴ State 2 is transient.

To calculate F₁₁, we can write f₁₁ⁿ as follows:

$$f_{11}^1 = P(X_1 = 1 \mid X_0 = 1) = p$$

$$f_{11}^n = P(\text{leave } 1, \text{ loop } n-2 \text{ times outside } 1, \text{ then come back to } 1)$$

$$\implies f_{11}^n = \sum_{s \in S \setminus \{1\}} P(X_1 = s \mid X_0 = 1) \cdot \prod_{j=2}^{n-1} P(X_j = s \mid X_{j-1} = s) \cdot P(X_n = 1 \mid X_{n-1} = s)$$

Since P(Xₙ = 1 | Xₙ₋₁ = 3) = 0, only s = 2 contributes:

$$\implies f_{11}^n = P(X_1 = 2 \mid X_0 = 1) \cdot \prod_{j=2}^{n-1} P(X_j = 2 \mid X_{j-1} = 2) \cdot P(X_n = 1 \mid X_{n-1} = 2) = (1-p) \cdot (1-2p)^{n-2} \cdot p$$

Substituting this in the expression, and summing the geometric series with ratio 1 − 2p, we get:

$$F_{11} = f_{11}^1 + \sum_{n=2}^{\infty} f_{11}^n = p + \sum_{n=2}^{\infty} p(1-p)(1-2p)^{n-2} = p + \frac{p(1-p)}{1-(1-2p)} = p + \frac{1-p}{2} = \frac{p+1}{2} < 1$$

Note that p < ½, since P(Xᵢ = 2 | Xᵢ₋₁ = 2) = 1 − 2p > 0; the geometric series therefore converges, and (p + 1)/2 < 1.

∴ State 1 is transient.
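The three return probabilities can be checked by simulating first-return events; the value of p and the trial counts are illustrative assumptions:

```python
import random

random.seed(6)

# Empirical first-return probabilities for an illustrative p = 0.3 (assumed);
# the derivation above gives F11 = (1+p)/2 = 0.65, F22 = 1-p = 0.7, F33 = 1.
p = 0.3
P = [[p, 1 - p, 0.0],
     [p, 1 - 2 * p, p],
     [0.0, 0.0, 1.0]]

def step(s):
    """Sample the next state (0-indexed) from row s of P by inversion."""
    u, cum = random.random(), 0.0
    for j, q in enumerate(P[s]):
        cum += q
        if u <= cum:
            return j
    return len(P[s]) - 1

def return_prob(i, trials=20_000, max_steps=5_000):
    hits = 0
    for _ in range(trials):
        s = i
        for _ in range(max_steps):
            s = step(s)
            if s == i:
                hits += 1
                break
            if s == 2 and i != 2:  # absorbed in state 3: can never return
                break
    return hits / trials

results = [return_prob(i) for i in range(3)]
print([round(r, 2) for r in results])  # close to [0.65, 0.7, 1.0]
```

The simulated frequencies match the closed-form values, with states 1 and 2 transient and state 3 (absorbing) recurrent.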

Q4: Gaussian: Suppose X = AZ + µ, where A is an n × n matrix and Z is a standard


normal vector of length n. Derive the expression for the mean E[X] and covariance
matrix CX . (5 mks) Also derive the expression for the pdf of X. (5 mks)
A:

(a) Mean E[X]:

Applying the expectation operator to both sides of X = AZ + µ, we get:

$$E[X] = E[AZ + \mu] = A\, E[Z] + E[\mu].$$

Since Z is a standard normal vector, E[Z] = 0, and µ is a constant vector, so E[µ] = µ. Hence:

$$E[X] = A \cdot 0 + \mu = \mu.$$

(b) Covariance Matrix C_X:

The covariance matrix of X is given by:

$$C_X = \operatorname{Cov}(X) = E[(X - E[X])(X - E[X])^T].$$

Since E[X] = µ:

$$C_X = E[(X - \mu)(X - \mu)^T]$$

Substituting X = AZ + µ, we have:

$$C_X = E[(AZ)(AZ)^T] = A\, E[ZZ^T]\, A^T.$$

Since Z is a standard normal vector, its covariance matrix is the identity matrix I, i.e., E[ZZᵀ] = I. Therefore, the covariance matrix of X is:

$$C_X = A I A^T = A A^T.$$

(c) PDF of X:
Let's start with the PDF of the standard normal vector Z. For a standard normal vector, where the Zᵢ are i.i.d. and Zᵢ ∼ N(0, 1), the PDF is given by:

$$f_Z(z) = \prod_{i=1}^n f_{Z_i}(z_i) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n z_i^2\right) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2} z^T z\right)$$

When X = G(Z), where G : ℝⁿ → ℝⁿ is continuous, invertible, and has continuous partial derivatives, let H denote its inverse. Then

$$f_X(x) = f_Z(H(x))\, |J|$$

where J is the determinant of the Jacobian matrix of H.

Here X = AZ + µ; since A is invertible, we can write Z = A⁻¹(X − µ), so the inverse function H is given by

$$H(x) = A^{-1}(x - \mu).$$

Now, let the entries of A⁻¹ be:

$$A^{-1} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

The i-th entry of H(x) is:

$$H_i(x) = \sum_{j=1}^n a_{ij}(x_j - \mu_j).$$

Differentiating Hᵢ(x) with respect to x_k gives:

$$\frac{\partial H_i}{\partial x_k} = a_{ik}.$$

The Jacobian matrix is therefore:

$$\begin{pmatrix} \frac{\partial H_1}{\partial x_1} & \frac{\partial H_1}{\partial x_2} & \cdots & \frac{\partial H_1}{\partial x_n} \\ \frac{\partial H_2}{\partial x_1} & \frac{\partial H_2}{\partial x_2} & \cdots & \frac{\partial H_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial H_n}{\partial x_1} & \frac{\partial H_n}{\partial x_2} & \cdots & \frac{\partial H_n}{\partial x_n} \end{pmatrix} = A^{-1}.$$

Finally, the absolute value of the determinant of the Jacobian is:

$$|J| = \left|\det(A^{-1})\right| = \frac{1}{|\det(A)|}.$$

Thus, we conclude that

$$f_X(x) = \frac{1}{|\det(A)|}\, f_Z\!\left(A^{-1}(x-\mu)\right)$$

$$f_X(x) = \frac{1}{(2\pi)^{n/2}\, |\det(A)|} \exp\left(-\frac{1}{2}\left(A^{-1}(x-\mu)\right)^T \left(A^{-1}(x-\mu)\right)\right)$$

$$= \frac{1}{(2\pi)^{n/2}\, |\det(A)|} \exp\left(-\frac{1}{2}(x-\mu)^T A^{-T} A^{-1} (x-\mu)\right)$$

$$= \frac{1}{(2\pi)^{n/2}\, |\det(A)|} \exp\left(-\frac{1}{2}(x-\mu)^T \left(AA^T\right)^{-1} (x-\mu)\right)$$
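The moment results E[X] = µ and C_X = AAᵀ can be checked empirically for a small case; the 2×2 matrix A and the vector µ below are illustrative assumptions:

```python
import random

random.seed(7)

# Illustrative 2x2 example (assumed values); here A A^T = [[4, 2], [2, 2]].
A = [[2.0, 0.0],
     [1.0, 1.0]]
mu = [1.0, -1.0]

trials = 100_000
mean = [0.0, 0.0]
cov = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(trials):
    z = [random.gauss(0, 1), random.gauss(0, 1)]
    # x = A z + mu
    x = [A[i][0] * z[0] + A[i][1] * z[1] + mu[i] for i in range(2)]
    for i in range(2):
        mean[i] += x[i]
        for j in range(2):
            cov[i][j] += (x[i] - mu[i]) * (x[j] - mu[j])
mean = [m / trials for m in mean]
cov = [[c / trials for c in row] for row in cov]
print([round(m, 1) for m in mean])                  # close to mu
print([[round(c, 1) for c in row] for row in cov])  # close to A A^T
```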
Q5: Shifted Exponential: Let X be a random variable following a shifted exponential distribution with rate parameter λ > 0 and shift parameter µ. The probability density function (PDF) of X is given by:

$$f_X(x) = \begin{cases} \lambda e^{-\lambda(x-\mu)}, & \text{if } x \ge \mu, \\ 0, & \text{otherwise.} \end{cases}$$

(a) Derive the MGF of X, and state its region of convergence. (5 mks)
(b) Using the MGF, obtain the first and second moments of X. What is the variance of X? (5 mks)

A:

(a)

$$M_X(t) = E[e^{tX}] = \int_\mu^\infty e^{tx} \cdot \lambda e^{-\lambda(x-\mu)}\, dx = \lambda e^{\lambda\mu} \int_\mu^\infty e^{-(\lambda-t)x}\, dx$$

$$= \lambda e^{\lambda\mu} \left[-\frac{e^{-(\lambda-t)x}}{\lambda-t}\right]_\mu^\infty = \lambda e^{\lambda\mu} \cdot \frac{e^{-(\lambda-t)\mu}}{\lambda-t} = \frac{\lambda e^{\mu t}}{\lambda-t}, \quad \text{for } t < \lambda$$

M_X(t) exists for all t < λ, since e^{−(λ−t)x} tends to 0 at infinity only if (λ − t) is positive.
(b) To find E[X], we need the derivative of M_X(t), since E[X] = M_X′(0):

$$M_X'(t) = \frac{d}{dt}\left[\lambda e^{\mu t}(\lambda-t)^{-1}\right] = \frac{\mu\lambda e^{\mu t}}{\lambda-t} + \frac{\lambda e^{\mu t}}{(\lambda-t)^2}$$

$$E[X] = M_X'(0) = \frac{\lambda\mu}{\lambda} + \frac{\lambda}{\lambda^2} = \mu + \frac{1}{\lambda}$$

Similarly, E[X²] = M_X″(0):

$$M_X''(t) = \frac{d}{dt}\left[\mu\lambda e^{\mu t}(\lambda-t)^{-1} + \lambda e^{\mu t}(\lambda-t)^{-2}\right] = \frac{\mu^2\lambda e^{\mu t}}{\lambda-t} + \frac{2\mu\lambda e^{\mu t}}{(\lambda-t)^2} + \frac{2\lambda e^{\mu t}}{(\lambda-t)^3}$$

$$E[X^2] = M_X''(0) = \frac{\mu^2\lambda}{\lambda} + \frac{2\mu\lambda}{\lambda^2} + \frac{2\lambda}{\lambda^3} = \mu^2 + \frac{2\mu}{\lambda} + \frac{2}{\lambda^2} = \left(\mu + \frac{1}{\lambda}\right)^2 + \frac{1}{\lambda^2}$$

$$\operatorname{Var}[X] = E[X^2] - E[X]^2 = \left(\mu + \frac{1}{\lambda}\right)^2 + \frac{1}{\lambda^2} - \left(\mu + \frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
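Since a shifted exponential is just µ plus an ordinary Exponential(λ), the moments above are easy to check by simulation; the parameter values are illustrative assumptions:

```python
import random

random.seed(8)

lam, mu = 2.0, 1.5  # illustrative parameters (assumed)

# X = mu + Exp(lam); check E[X] = mu + 1/lam and Var[X] = 1/lam^2 by simulation.
trials = 200_000
s = s2 = 0.0
for _ in range(trials):
    x = mu + random.expovariate(lam)
    s += x
    s2 += x * x
mean = s / trials
var = s2 / trials - mean ** 2
print(round(mean, 2), round(var, 3))  # close to mu + 1/lam = 2.0 and 1/lam^2 = 0.25
```

Note that the shift µ moves the mean but leaves the variance untouched, exactly as the closed forms predict.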
