
Homework solutions Math525 Fall 2003

Textbook: Bickel-Doksum, 2nd edition


Assignment # 2

Section 1.2.

1. (a).

$$
\pi(\theta_i \mid x) = \frac{\pi(\theta_i)\,p(x\mid\theta_i)}{\pi(\theta_1)\,p(x\mid\theta_1) + \pi(\theta_2)\,p(x\mid\theta_2)}
= \frac{p(x\mid\theta_i)}{p(x\mid\theta_1) + p(x\mid\theta_2)}, \qquad i = 1, 2.
$$

Therefore,
$$
\pi(\theta_1\mid 0) = \frac{0.8}{0.8+0.4} = \frac{2}{3}, \qquad
\pi(\theta_1\mid 1) = \frac{0.2}{0.2+0.6} = \frac{1}{4},
$$
$$
\pi(\theta_2\mid 0) = \frac{0.4}{0.8+0.4} = \frac{1}{3}, \qquad
\pi(\theta_2\mid 1) = \frac{0.6}{0.2+0.6} = \frac{3}{4}.
$$
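A quick numerical sanity check of these posteriors (a Python sketch; the model probabilities 0.2, 0.8, 0.6, 0.4 are those given in the problem):

```python
# Posterior check for 1(a): uniform prior on {theta_1, theta_2},
# with P(X=1|theta_1)=0.2 and P(X=1|theta_2)=0.6.
p_x_given_theta = {1: {0: 0.8, 1: 0.2},   # theta_1
                   2: {0: 0.4, 1: 0.6}}   # theta_2

def posterior(x, prior={1: 0.5, 2: 0.5}):
    joint = {t: prior[t] * p_x_given_theta[t][x] for t in (1, 2)}
    total = sum(joint.values())
    return {t: joint[t] / total for t in (1, 2)}

print(posterior(0))   # {1: 0.666..., 2: 0.333...}  i.e. 2/3 and 1/3
print(posterior(1))   # {1: 0.25, 2: 0.75}          i.e. 1/4 and 3/4
```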

(b).
$$
p(x_1,\dots,x_n\mid\theta_1) = (0.2)^{\sum_{i=1}^n x_i}\,(0.8)^{\,n-\sum_{i=1}^n x_i},
\qquad
p(x_1,\dots,x_n\mid\theta_2) = (0.6)^{\sum_{i=1}^n x_i}\,(0.4)^{\,n-\sum_{i=1}^n x_i}.
$$
Hence
$$
\pi(\theta_i\mid x_1,\dots,x_n)
= \frac{\pi(\theta_i)\,p(x_1,\dots,x_n\mid\theta_i)}{\pi(\theta_1)\,p(x_1,\dots,x_n\mid\theta_1)+\pi(\theta_2)\,p(x_1,\dots,x_n\mid\theta_2)}
= \frac{p(x_1,\dots,x_n\mid\theta_i)}{p(x_1,\dots,x_n\mid\theta_1)+p(x_1,\dots,x_n\mid\theta_2)}, \qquad i = 1, 2.
$$
Thus,
$$
\pi(\theta_1\mid x_1,\dots,x_n)
= \frac{(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}}{(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}+(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}},
$$
$$
\pi(\theta_2\mid x_1,\dots,x_n)
= \frac{(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}}{(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}+(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}}.
$$

(c).
$$
\pi(\theta_1\mid x_1,\dots,x_n)
= \frac{0.25\,(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}}{0.25\,(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}+0.75\,(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}}
= \frac{(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}}{(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}+3\,(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}},
$$
$$
\pi(\theta_2\mid x_1,\dots,x_n)
= \frac{3\,(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}}{(0.2)^{\sum x_i}(0.8)^{n-\sum x_i}+3\,(0.6)^{\sum x_i}(0.4)^{n-\sum x_i}}.
$$

(d) For the prior distribution π,
$$
\pi\Big(\theta_1 \,\Big|\, \sum_{i=1}^n X_i = \tfrac n2\Big)
= \frac{(0.2)^{n/2}(0.8)^{n/2}}{(0.2)^{n/2}(0.8)^{n/2}+(0.6)^{n/2}(0.4)^{n/2}},
\qquad
\pi\Big(\theta_2 \,\Big|\, \sum_{i=1}^n X_i = \tfrac n2\Big)
= \frac{(0.6)^{n/2}(0.4)^{n/2}}{(0.2)^{n/2}(0.8)^{n/2}+(0.6)^{n/2}(0.4)^{n/2}}.
$$
For n = 2,
$$
\pi\Big(\theta_1 \,\Big|\, \sum_{i=1}^{2} X_i = 1\Big)
= \frac{(0.2)(0.8)}{(0.2)(0.8)+(0.6)(0.4)} = \frac25,
\qquad
\pi\Big(\theta_2 \,\Big|\, \sum_{i=1}^{2} X_i = 1\Big) = 1 - \frac25 = \frac35.
$$
For n = 100,
$$
\pi\Big(\theta_1 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big)
= \frac{(1.6)^{50}}{(1.6)^{50}+(2.4)^{50}},
\qquad
\pi\Big(\theta_2 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big)
= \frac{(2.4)^{50}}{(1.6)^{50}+(2.4)^{50}}.
$$

For the prior distribution π₁, for n = 2,
$$
\pi_1\Big(\theta_1 \,\Big|\, \sum_{i=1}^{2} X_i = 1\Big)
= \frac{(0.2)(0.8)}{(0.2)(0.8)+3\,(0.6)(0.4)} = \frac2{11},
\qquad
\pi_1\Big(\theta_2 \,\Big|\, \sum_{i=1}^{2} X_i = 1\Big) = 1 - \frac2{11} = \frac9{11},
$$
and for n = 100,
$$
\pi_1\Big(\theta_1 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big)
= \frac{(1.6)^{50}}{(1.6)^{50}+3\,(2.4)^{50}},
\qquad
\pi_1\Big(\theta_2 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big)
= \frac{3\,(2.4)^{50}}{(1.6)^{50}+3\,(2.4)^{50}}.
$$
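The answers in (b)–(d) can be checked numerically with a short sketch (the priors 0.5 and 0.25 and the sample sizes are the ones used above):

```python
# Check of 1(b)-(d): posterior of theta_1 given sum(x_i) = s successes in n trials.
def posterior_theta1(s, n, prior1=0.5):
    like1 = 0.2**s * 0.8**(n - s)      # p(data | theta_1)
    like2 = 0.6**s * 0.4**(n - s)      # p(data | theta_2)
    num = prior1 * like1
    return num / (num + (1 - prior1) * like2)

for prior1 in (0.5, 0.25):             # priors pi and pi_1
    for n in (2, 100):
        p1 = posterior_theta1(n // 2, n, prior1)
        print(f"prior={prior1}, n={n}: pi(theta_1|data)={p1:.6g}, pi(theta_2|data)={1 - p1:.6g}")
# prior=0.5 , n=2 -> 0.4      and 0.6       (= 2/5  and 3/5)
# prior=0.25, n=2 -> 0.181818 and 0.818182  (= 2/11 and 9/11)
```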

(e) First, I believe that the k there should be n/2. By definition,
$$
\arg\max_{\theta}\,\pi\Big(\theta \,\Big|\, \sum_{i=1}^n X_i = \tfrac n2\Big)
= \Big\{\theta:\ \pi\Big(\theta \,\Big|\, \sum_{i=1}^n X_i = \tfrac n2\Big)
= \max_{s}\,\pi\Big(s \,\Big|\, \sum_{i=1}^n X_i = \tfrac n2\Big)\Big\}.
$$
By (d), in both cases n = 2 and n = 100,
$$
\arg\max_{\theta}\,\pi\Big(\theta \,\Big|\, \sum_{i=1}^n X_i = \tfrac n2\Big) = \theta_2.
$$

3. (a)
$$
\pi(\theta\mid X=2)
= \frac{\pi(\theta)\,P\{X=2\mid\theta\}}{\pi(1/4)P\{X=2\mid\theta=1/4\}+\pi(1/2)P\{X=2\mid\theta=1/2\}+\pi(3/4)P\{X=2\mid\theta=3/4\}}
$$
$$
= \frac{(1-\theta)^2\theta}{(3/4)^2(1/4)+(1/2)^2(1/2)+(1/4)^2(3/4)}
= \frac{16}{5}\,(1-\theta)^2\theta.
$$
Hence
$$
\pi(1/4\mid X=2) = \frac9{20}, \qquad
\pi(1/2\mid X=2) = \frac25, \qquad
\pi(3/4\mid X=2) = \frac3{20}.
$$

(b) Given X = 2, the most probable value of θ is 1/4. For X = k,
$$
\pi(\theta\mid X=k) = \frac{(1-\theta)^k\theta}{(3/4)^k(1/4)+(1/2)^k(1/2)+(1/4)^k(3/4)},
$$
so we need to compare (3/4)^k (1/4), (1/2)^k (1/2) and (1/4)^k (3/4). For k = 0 the third is the largest, so 3/4 is most probable; for k = 1 the second is the largest, so 1/2 is most probable; for k ≥ 2 the first is the largest, so 1/4 is most probable.
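A short numerical check of (a) and (b) (a sketch using exact fractions):

```python
# Check of 3(a)-(b): P(X=k|theta) = (1-theta)^k * theta with a uniform prior
# on theta in {1/4, 1/2, 3/4}.
from fractions import Fraction as F

thetas = [F(1, 4), F(1, 2), F(3, 4)]

def posterior(k):
    weights = {t: (1 - t)**k * t for t in thetas}   # the prior 1/3 cancels
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

print(posterior(2))                       # {1/4: 9/20, 1/2: 2/5, 3/4: 3/20}
for k in range(5):
    post = posterior(k)
    print(k, max(post, key=post.get))     # most probable theta: 3/4, 1/2, then 1/4 for k >= 2
```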

(c). The posterior density of θ is
$$
\pi(\theta\mid X=k)
= \frac{\theta^{r-1}(1-\theta)^{s-1}}{B(r,s)}\,P\{X=k\mid\theta\}
\bigg(\int_0^1 \frac{x^{r-1}(1-x)^{s-1}}{B(r,s)}\,P\{X=k\mid\theta=x\}\,dx\bigg)^{-1}
$$
$$
= \theta^{r-1}(1-\theta)^{s-1}(1-\theta)^k\theta
\bigg(\int_0^1 x^{r-1}(1-x)^{s-1}(1-x)^k x\,dx\bigg)^{-1}
= \frac{\theta^{r}(1-\theta)^{s+k-1}}{B(r+1,\,s+k)},
$$
where 0 < θ < 1. So the posterior distribution of θ given X = k is the beta distribution β(r + 1, s + k).
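A numerical sketch of (c), comparing the grid-normalized posterior with the Beta(r + 1, s + k) density; the values r = 2, s = 3, k = 4 are illustrative choices, not from the problem:

```python
# Check of 3(c): Beta(r, s) prior and P(X=k|theta) = (1-theta)^k * theta
# should give a Beta(r+1, s+k) posterior.
import numpy as np
from scipy.stats import beta

r, s, k = 2.0, 3.0, 4                    # illustrative values only
theta = np.linspace(1e-6, 1 - 1e-6, 200001)
dtheta = theta[1] - theta[0]

unnorm = beta.pdf(theta, r, s) * (1 - theta)**k * theta   # prior * likelihood
post = unnorm / (unnorm.sum() * dtheta)                   # normalize numerically

print(np.max(np.abs(post - beta.pdf(theta, r + 1, s + k))))   # small: the densities agree
```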

4. (a) Notice that p(x₁, · · · , xₙ | j) = 0 for j < max{x₁, · · · , xₙ} = m. Hence, for j = m, m + 1, · · ·,
$$
\pi(j\mid x_1,\dots,x_n)
= \frac{\pi(j)\,p(x_1,\dots,x_n\mid j)}{\sum_{k=1}^\infty \pi(k)\,p(x_1,\dots,x_n\mid k)}
= \frac{c(a)\,j^{-a}\,j^{-n}}{c(a)\sum_{k=m}^\infty k^{-a-n}}
= \frac{c(n+a,\,m)}{j^{a+n}},
$$
where $c(n+a, m) = \big(\sum_{k=m}^{\infty} k^{-(n+a)}\big)^{-1}$.

(b).
$$
\pi(m\mid x_1,\dots,x_n) = \frac{c(n+a,m)}{m^{a+n}}
= \bigg(\sum_{j=m}^\infty \Big(\frac mj\Big)^{n+a}\bigg)^{-1}
= \bigg(1 + \sum_{j=m+1}^\infty \Big(\frac mj\Big)^{n+a}\bigg)^{-1}.
$$
Therefore the conclusion follows from the fact that
$$
\sum_{j=m+1}^\infty \Big(\frac mj\Big)^{n+a} \longrightarrow 0 \qquad (n\to\infty),
$$
which can easily be checked by either the dominated or the monotone convergence theorem.

Explanation: {X₁, · · · , Xₙ} is an i.i.d. sample from a population with the uniform distribution over {1, · · · , θ}, where θ is an unknown parameter equal to the exact upper bound of the distribution. It is intuitively clear that max{X₁, · · · , Xₙ} converges to θ in some suitable sense as n → ∞. Another way to describe this phenomenon is to say that θ, regarded as random under the posterior, takes the value m = max{x₁, · · · , xₙ} with probability close to 1 when n is large.
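As a numerical illustration of how quickly this happens (a sketch; the values a = 2 and m = 5 are arbitrary choices):

```python
# Illustration of 4(b): pi(m | x_1, ..., x_n) = 1 / sum_{j >= m} (m/j)^(n+a) -> 1 as n grows.
def posterior_at_m(m, n, a, terms=200_000):
    return 1.0 / sum((m / j)**(n + a) for j in range(m, m + terms))

a, m = 2.0, 5                     # arbitrary illustrative values
for n in (1, 5, 20, 100):
    print(n, posterior_at_m(m, n, a))
# the posterior mass at m = max(x_i) increases towards 1 as n grows
```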

Section 1.3.

1. (a), (b) We use the formula on p. 25:
$$
R(\theta, \delta) = l(\theta, a_1)P_\theta\{\delta(X)=a_1\} + l(\theta, a_2)P_\theta\{\delta(X)=a_2\} + l(\theta, a_3)P_\theta\{\delta(X)=a_3\}
$$
$$
= \begin{cases}
P_{\theta_1}\{\delta(X)=a_2\} + 2P_{\theta_1}\{\delta(X)=a_3\} & \text{if } \theta=\theta_1,\\[2pt]
2P_{\theta_2}\{\delta(X)=a_1\} + P_{\theta_2}\{\delta(X)=a_3\} & \text{if } \theta=\theta_2.
\end{cases}
$$

We now list the distributions of the decision rules δi, i = 1, 2, · · · , 9:

i = 1:
Pθ1{δ1(X) = a1} = 1, Pθ1{δ1(X) = a2} = 0, Pθ1{δ1(X) = a3} = 0,
Pθ2{δ1(X) = a1} = 1, Pθ2{δ1(X) = a2} = 0, Pθ2{δ1(X) = a3} = 0.

i = 2:
Pθ1{δ2(X) = a1} = p, Pθ1{δ2(X) = a2} = 1 − p, Pθ1{δ2(X) = a3} = 0,
Pθ2{δ2(X) = a1} = q, Pθ2{δ2(X) = a2} = 1 − q, Pθ2{δ2(X) = a3} = 0.

i = 3:
Pθ1{δ3(X) = a1} = p, Pθ1{δ3(X) = a2} = 0, Pθ1{δ3(X) = a3} = 1 − p,
Pθ2{δ3(X) = a1} = q, Pθ2{δ3(X) = a2} = 0, Pθ2{δ3(X) = a3} = 1 − q.

i = 4:
Pθ1{δ4(X) = a1} = 1 − p, Pθ1{δ4(X) = a2} = p, Pθ1{δ4(X) = a3} = 0,
Pθ2{δ4(X) = a1} = 1 − q, Pθ2{δ4(X) = a2} = q, Pθ2{δ4(X) = a3} = 0.

i = 5:
Pθ1{δ5(X) = a1} = 0, Pθ1{δ5(X) = a2} = 1, Pθ1{δ5(X) = a3} = 0,
Pθ2{δ5(X) = a1} = 0, Pθ2{δ5(X) = a2} = 1, Pθ2{δ5(X) = a3} = 0.

i = 6:
Pθ1{δ6(X) = a1} = 0, Pθ1{δ6(X) = a2} = p, Pθ1{δ6(X) = a3} = 1 − p,
Pθ2{δ6(X) = a1} = 0, Pθ2{δ6(X) = a2} = q, Pθ2{δ6(X) = a3} = 1 − q.

i = 7:
Pθ1{δ7(X) = a1} = 1 − p, Pθ1{δ7(X) = a2} = 0, Pθ1{δ7(X) = a3} = p,
Pθ2{δ7(X) = a1} = 1 − q, Pθ2{δ7(X) = a2} = 0, Pθ2{δ7(X) = a3} = q.

i = 8:
Pθ1{δ8(X) = a1} = 0, Pθ1{δ8(X) = a2} = 1 − p, Pθ1{δ8(X) = a3} = p,
Pθ2{δ8(X) = a1} = 0, Pθ2{δ8(X) = a2} = 1 − q, Pθ2{δ8(X) = a3} = q.

i = 9:
Pθ1{δ9(X) = a1} = 0, Pθ1{δ9(X) = a2} = 0, Pθ1{δ9(X) = a3} = 1,
Pθ2{δ9(X) = a1} = 0, Pθ2{δ9(X) = a2} = 0, Pθ2{δ9(X) = a3} = 1.

Plugging these in gives all the risk points R(θj, δi) (j = 1, 2; i = 1, 2, · · · , 9).

(c) In this case the decision rule δ(X) has the same distribution under both values of θ, so
$$
R(\theta,\delta) = \begin{cases}
P\{\delta(X)=a_2\} + 2P\{\delta(X)=a_3\} & \text{if } \theta=\theta_1,\\[2pt]
2P\{\delta(X)=a_1\} + P\{\delta(X)=a_3\} & \text{if } \theta=\theta_2.
\end{cases}
$$
Hence,
R(θ1, δ1) = 0, R(θ2, δ1) = 2;
R(θ1, δ2) = 0.9, R(θ2, δ2) = 0.2;
R(θ1, δ3) = 1.8, R(θ2, δ3) = 1.1;
R(θ1, δ4) = 0.1, R(θ2, δ4) = 1.8;
R(θ1, δ5) = 1, R(θ2, δ5) = 0;
R(θ1, δ6) = 1.9, R(θ2, δ6) = 0.9;
R(θ1, δ7) = 0.2, R(θ2, δ7) = 1.9;
R(θ1, δ8) = 1.1, R(θ2, δ8) = 0.1;
R(θ1, δ9) = 2, R(θ2, δ9) = 1.
The maximum risk is smallest for δ2, so the minimax rule is δ2.

(d). With the prior π(θ1) = π(θ2) = 1/2, the Bayes risks are
r(δ1) = 1, r(δ2) = 0.55, r(δ3) = 1.45, r(δ4) = 0.95, r(δ5) = 0.5,
r(δ6) = 1.4, r(δ7) = 1.05, r(δ8) = 0.6, r(δ9) = 1.5.
The Bayes rule is δ5.
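A numerical cross-check of (a)–(d) (a sketch: the losses are read off the risk formula above, and p = q = 0.1 is the value consistent with the risk points in (c)):

```python
# Cross-check of Problem 1, Section 1.3: risk points, minimax rule, Bayes risks.
# Losses from the risk formula: l(theta_1, .) = (0, 1, 2), l(theta_2, .) = (2, 0, 1).
loss1, loss2 = (0, 1, 2), (2, 0, 1)

p = q = 0.1   # assumed here; consistent with the risk points listed in (c)

# Action distributions (P{delta_i = a1}, P{delta_i = a2}, P{delta_i = a3}) from (b),
# under theta_1 (uses p) and theta_2 (uses q).
dist1 = [(1, 0, 0), (p, 1 - p, 0), (p, 0, 1 - p), (1 - p, p, 0), (0, 1, 0),
         (0, p, 1 - p), (1 - p, 0, p), (0, 1 - p, p), (0, 0, 1)]
dist2 = [(1, 0, 0), (q, 1 - q, 0), (q, 0, 1 - q), (1 - q, q, 0), (0, 1, 0),
         (0, q, 1 - q), (1 - q, 0, q), (0, 1 - q, q), (0, 0, 1)]

risks = []
for d1, d2 in zip(dist1, dist2):
    r1 = sum(l * pr for l, pr in zip(loss1, d1))
    r2 = sum(l * pr for l, pr in zip(loss2, d2))
    risks.append((r1, r2))
    i = len(risks)
    print(f"delta_{i}: R(theta_1)={r1:.2f}, R(theta_2)={r2:.2f}, Bayes risk={(r1 + r2) / 2:.3f}")

print("minimax:", 1 + min(range(9), key=lambda i: max(risks[i])))   # delta_2
print("Bayes:  ", 1 + min(range(9), key=lambda i: sum(risks[i])))   # delta_5
```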

8. (a) Let µ be the expectation of the population. Then
$$
E s = (n-1)^{-1}\, E\sum_{i=1}^n \big([X_i-\mu] - [\bar X-\mu]\big)^2
= (n-1)^{-1}\sum_{i=1}^n \Big\{\sigma^2 - 2\,E\big([X_i-\mu][\bar X-\mu]\big) + \mathrm{Var}(\bar X)\Big\}
$$
$$
= (n-1)^{-1}\big\{n\sigma^2 - 2n\,\mathrm{Var}(\bar X) + n\,\mathrm{Var}(\bar X)\big\}
= (n-1)^{-1}\big\{n\sigma^2 - n\,\mathrm{Var}(\bar X)\big\}
= (n-1)^{-1}\big\{n\sigma^2 - \sigma^2\big\} = \sigma^2,
$$
where we used $E\big([X_i-\mu][\bar X-\mu]\big) = \mathrm{Cov}(X_i, \bar X) = \sigma^2/n = \mathrm{Var}(\bar X)$.
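A quick Monte Carlo check of the unbiasedness claim (a sketch; the normal population, sample size and seed are arbitrary choices):

```python
# Monte Carlo check of 8(a): E[s] = sigma^2 for s = sum((X_i - Xbar)^2) / (n - 1).
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 10, 200_000

x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(reps, n))
s = x.var(axis=1, ddof=1)     # ddof=1 gives the 1/(n-1) divisor
print(s.mean())               # close to sigma^2 = 4.0
```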

(b) (i) Since s is unbiased,
$$
MSE(s) = \mathrm{Var}(s) = E s^2 - \sigma^4.
$$
Notice that $\sum_{i=1}^n \big(X_i - \bar X\big)^2 \sim \sigma^2\chi^2_{n-1}$, so there are i.i.d. standard normal random variables $Y_1, \dots, Y_{n-1}$ such that
$$
E s^2 = \frac1{(n-1)^2}\, E\Big(\sum_{i=1}^n \big(X_i-\bar X\big)^2\Big)^2
= \frac{\sigma^4}{(n-1)^2}\, E\Big(\sum_{k=1}^{n-1} Y_k^2\Big)^2
= \frac{\sigma^4}{(n-1)^2}\sum_{j,k=1}^{n-1} E\,Y_j^2 Y_k^2
$$
$$
= \frac{\sigma^4}{(n-1)^2}\Big[\sum_{k=1}^{n-1} E\,Y_k^4 + 2\sum_{1\le j<k\le n-1} E\,Y_j^2 Y_k^2\Big]
= \frac{\sigma^4}{(n-1)^2}\big[3(n-1) + (n-1)^2 - (n-1)\big]
= \frac{n+1}{n-1}\,\sigma^4.
$$
So
$$
MSE(s) = \frac{n+1}{n-1}\,\sigma^4 - \sigma^4 = \frac{2}{n-1}\,\sigma^4.
$$

(ii)
$$
MSE(\hat\sigma_0^2) = \mathrm{Var}(\hat\sigma_0^2) + \big(E\hat\sigma_0^2 - \sigma^2\big)^2
= E(\hat\sigma_0^2)^2 - \big(E\hat\sigma_0^2\big)^2 + \big(E\hat\sigma_0^2 - \sigma^2\big)^2
$$
$$
= c^2\sigma^4\big[(n-1)^2 + 2(n-1)\big] - c^2(n-1)^2\sigma^4 + \big(c(n-1)\sigma^2 - \sigma^2\big)^2
= \sigma^4\Big\{2c^2(n-1) + \big(c(n-1)-1\big)^2\Big\}.
$$
Setting the derivative with respect to c equal to zero gives $4c(n-1) + 2(n-1)\big(c(n-1)-1\big) = 0$, i.e. $c(n+1) = 1$, so $c = (n+1)^{-1}$ minimizes the right-hand side.
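A small numerical check that c = 1/(n + 1) is indeed the minimizer (a sketch; n = 7 is an arbitrary choice):

```python
# Check of 8(b)(ii): minimize 2*c^2*(n-1) + (c*(n-1) - 1)^2 over c (the sigma^4 factor is dropped).
import numpy as np

n = 7
c = np.linspace(0.0, 0.5, 2_000_001)
mse = 2 * c**2 * (n - 1) + (c * (n - 1) - 1)**2
print(c[np.argmin(mse)], 1 / (n + 1))    # both approximately 0.125
```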

9. E p̂ = θ and E θ̂ = 0.02 + 0.8 E p̂ = 0.02 + 0.8θ. Hence
$$
MSE(\hat p) = \mathrm{Var}(\hat p) = \frac{\sigma^2}{n}
$$
and
$$
MSE(\hat\theta) = \mathrm{Var}(\hat\theta) + \big(0.02 + 0.8\theta - \theta\big)^2
= 0.64\,\mathrm{Var}(\hat p) + 0.04\,(0.1-\theta)^2
= 0.64\,\frac{\sigma^2}{n} + 0.04\,(0.1-\theta)^2.
$$
Solving the inequality MSE(θ̂) < MSE(p̂) gives
$$
|\theta - 0.1| < \frac{3\sigma}{\sqrt n}.
$$
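A numerical confirmation of this threshold (a sketch; σ = 0.5 and n = 25 are arbitrary choices):

```python
# Check of 9: MSE(theta_hat) < MSE(p_hat)  iff  |theta - 0.1| < 3*sigma/sqrt(n).
import numpy as np

sigma, n = 0.5, 25
theta = np.linspace(-1.0, 1.0, 100000)
mse_p = np.full_like(theta, sigma**2 / n)
mse_t = 0.64 * sigma**2 / n + 0.04 * (0.1 - theta)**2

print(np.array_equal(mse_t < mse_p, np.abs(theta - 0.1) < 3 * sigma / np.sqrt(n)))   # True
```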

19. (a) Given a decision rule δ: {0, 1} → {a1, a2},
$$
R(\theta,\delta) = E\,l(\theta, \delta(X)) = \begin{cases}
2P_{\theta_1}\{\delta(X)=a_2\} & \theta=\theta_1,\\[2pt]
3P_{\theta_2}\{\delta(X)=a_1\} + P_{\theta_2}\{\delta(X)=a_2\} & \theta=\theta_2.
\end{cases}
$$
There are 4 possible non-randomized decision rules:
δ1(0) = a1, δ1(1) = a2;
δ2(0) = a2, δ2(1) = a1;
δ3(0) = a1, δ3(1) = a1;
δ4(0) = a2, δ4(1) = a2.
We have
R(θ1, δ1) = 1.6, R(θ2, δ1) = 1.8;
R(θ1, δ2) = 0.4, R(θ2, δ2) = 2.2;
R(θ1, δ3) = 0, R(θ2, δ3) = 3;
R(θ1, δ4) = 2, R(θ2, δ4) = 1.
The minimax rule is δ1.

(b) The risk function of any randomized decision rule δ can be written in the form
$$
R(\theta, \delta) = \sum_{i=1}^4 \lambda_i\, R(\theta, \delta_i),
$$
where λ1, λ2, λ3, λ4 are non-negative numbers satisfying λ1 + λ2 + λ3 + λ4 = 1. The risk set S therefore takes the form
$$
S = \Big\{\big(1.6\lambda_1 + 0.4\lambda_2 + 2\lambda_4,\ \ 1.8\lambda_1 + 2.2\lambda_2 + 3\lambda_3 + \lambda_4\big)\Big\}.
$$
By graphing, the optimal point lies on the line segment between (R(θ1, δ2), R(θ2, δ2)) = (0.4, 2.2) and (R(θ1, δ4), R(θ2, δ4)) = (2, 1), that is,
$$
R(\theta_1,\delta) = 0.4\lambda + 2(1-\lambda), \qquad R(\theta_2,\delta) = 2.2\lambda + (1-\lambda).
$$
Solving 0.4λ + 2(1 − λ) = 2.2λ + (1 − λ) we get λ = 5/14. So the minimax rule among the randomized decision rules is
$$
\delta = \begin{cases}
\delta_2 & \text{with probability } 5/14,\\
\delta_4 & \text{with probability } 9/14.
\end{cases}
$$

(c). The Bayes risks are
r(δ1) = 0.1 × 1.6 + 0.9 × 1.8 = 1.78,
r(δ2) = 0.1 × 0.4 + 0.9 × 2.2 = 2.02,
r(δ3) = 0.1 × 0 + 0.9 × 3 = 2.7,
r(δ4) = 0.1 × 2 + 0.9 × 1 = 1.1.
So the Bayes rule is δ4.
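A quick numerical check of (a)–(c) (a sketch built from the risk points derived in (a)):

```python
# Check of 19: minimax rule, randomized minimax mixture, Bayes risks.
import numpy as np

# Risk points (R(theta_1, delta_i), R(theta_2, delta_i)) from part (a).
risk = {1: (1.6, 1.8), 2: (0.4, 2.2), 3: (0.0, 3.0), 4: (2.0, 1.0)}

# Minimax over the non-randomized rules.
print(min(risk, key=lambda i: max(risk[i])))           # 1, i.e. delta_1

# Mixture lambda*delta_2 + (1-lambda)*delta_4: the maximum risk is smallest
# where the two coordinates are equal, at lambda = 5/14.
lam = np.linspace(0.0, 1.0, 1_000_001)
r1 = lam * risk[2][0] + (1 - lam) * risk[4][0]
r2 = lam * risk[2][1] + (1 - lam) * risk[4][1]
print(lam[np.argmin(np.maximum(r1, r2))], 5 / 14)      # both approximately 0.35714

# Bayes risks under the prior (0.1, 0.9); delta_4 is the Bayes rule.
bayes = {i: 0.1 * r[0] + 0.9 * r[1] for i, r in risk.items()}
print(bayes, min(bayes, key=bayes.get))
```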
