확통1 Lecture Note 06: Limit Theorems

Limit Theorems

Limit Theorems: Motivation


• X_1, ⋯, X_n are i.i.d. random variables. Let

      M_n = (X_1 + ⋯ + X_n) / n.

  What happens to M_n as n → ∞?

• A tool: Several inequalities in probability

• Convergence “in probability”

• Convergence “with probability 1”


Markov Inequality
• For a nonnegative random variable X,

      P(X ≥ a) ≤ E[X] / a   for all a > 0.

• Why?: Let Y_a = 0 if X < a, and Y_a = a if X ≥ a.
  (Figure: density f_X(x), and the two-point PMF of Y_a with mass P(Y_a = 0) at 0 and P(Y_a = a) at a.)
  Then Y_a ≤ X, so E[Y_a] ≤ E[X].
  On the other hand, E[Y_a] = a·P(Y_a = a) = a·P(X ≥ a),
  from which we get the result.
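As a quick sanity check (not part of the original slides), here is a minimal Python sketch that estimates P(X ≥ a) by Monte Carlo for an Exp(1) variable, an illustrative choice of distribution, and compares it with the Markov bound E[X]/a.

import random

# Minimal numerical check of the Markov inequality P(X >= a) <= E[X]/a,
# using X ~ Exp(1) (an illustrative choice, not from the slides).
random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]  # E[X] = 1 for Exp(1)
mean_x = sum(samples) / n

for a in [1.0, 2.0, 4.0]:
    tail = sum(x >= a for x in samples) / n   # Monte Carlo estimate of P(X >= a)
    print(f"a={a}: P(X>=a) ~ {tail:.4f}  <=  Markov bound {mean_x / a:.4f}")
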
Generalized Markov Inequality
• We now have, for a nonnegative random variable X,

      P(X ≥ a) ≤ E[X] / a   for all a > 0.

• Next, we can generalize the Markov inequality: we can substitute any positive,
  non-decreasing function f: ℝ → ℝ+ to get

      P(X ≥ a) ≤ P(f(X) ≥ f(a)) ≤ E[f(X)] / f(a).

⚫ If we pick f judiciously, we can obtain better bounds.
Chebyshev Inequality
• For a random variable X with mean E[X] and variance σ_X²,

      P(|X − E[X]| ≥ c) ≤ σ_X² / c²   for all c > 0.

• Why?: As a first application of the generalized Markov bound, apply it to the
  nonnegative random variable |X − E[X]| with f(x) = x². Then

      P(|X − E[X]| ≥ c) = P((X − E[X])² ≥ c²) ≤ E[(X − E[X])²] / c² = σ_X² / c².

⚫ For c = kσ_X,

      P(|X − E[X]| ≥ kσ_X) ≤ 1/k².
Example: Chebyshev bound is conservative
• The Chebyshev bound is more powerful than the Markov bound because it also uses
  the variance. But since the mean and variance are only a rough summary of a
  distribution, we cannot expect the bound to be a close approximation of the exact value.
• If X ~ U[0, 4], then E[X] = 2 and σ_X² = (4 − 0)²/12 = 4/3, and for c = 1,

      P(|X − 2| ≥ 1) ≤ 4/3,

  which is uninformative compared to the exact value 1/2.
• Let X ~ Exp(λ = 1), so that E[X] = 1 and σ_X² = 1. For c > 1,

      P(X ≥ c) = P(X − 1 ≥ c − 1) ≤ P(|X − 1| ≥ c − 1) ≤ 1/(c − 1)²,

  which is again conservative compared to the exact value P(X > c) = e^{−c}.
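To see how loose these bounds are numerically, the following sketch (my own tabulation) evaluates the Markov and Chebyshev bounds against the exact tail P(X > c) = e^{−c} for X ~ Exp(1).

import math

# E[X] = 1 and Var(X) = 1 for X ~ Exp(1), as on the slide.
for c in [2, 3, 4, 5]:
    exact = math.exp(-c)                # P(X > c)
    markov = 1.0 / c                    # E[X]/c
    chebyshev = 1.0 / (c - 1) ** 2      # P(|X - 1| >= c - 1) <= sigma^2/(c-1)^2
    print(f"c={c}: exact={exact:.4f}  Markov={markov:.4f}  Chebyshev={chebyshev:.4f}")
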
Example: Upper bound of Chebyshev Ineq.
• If X takes values in [a, b], we claim the conservative bound σ_X² ≤ (b − a)²/4.
  If σ_X² is unknown, we may use σ_X² = (b − a)²/4 and claim

      P(|X − E[X]| ≥ c) ≤ (b − a)² / (4c²).

• Why?: For any constant γ, we have

      E[(X − γ)²] = E[X²] − 2γE[X] + γ²,

  and this is minimized when γ = E[X]. Thus

      E[(X − γ)²] ≥ E[(X − E[X])²] = σ_X²   for all γ.

  By setting γ = (a + b)/2, we have

      σ_X² ≤ E[(X − (a + b)/2)²] = E[(X − a)(X − b)] + (b − a)²/4 ≤ (b − a)²/4,

  where the last inequality follows from (x − a)(x − b) ≤ 0 for all x in the range [a, b].
Chernoff Bound (1)
• Chernoff bounds are typically (but not always) tighter than the Markov and
  Chebyshev bounds, but they require stronger assumptions. Let X be a sum of n
  independent Bernoulli random variables {X_i}, X = Σ_i X_i, with E[X_i] = p_i.
  Let μ = E[X]. Then we have

      μ = E[X] = E[Σ_i X_i] = Σ_i E[X_i] = Σ_i p_i.

• We pick f(X) = e^{tX} with t > 0. Then

      P(X ≥ (1 + δ)μ) = P(e^{tX} ≥ e^{(1+δ)μt}) ≤ E[e^{tX}] / e^{(1+δ)μt}.   (1)

⚫ We will establish a bound on E[e^{tX}]:

      E[e^{tX}] = E[e^{t Σ X_i}] = E[Π_i e^{tX_i}] = Π_i E[e^{tX_i}]   (by independence)
                = Π_i (p_i e^t + (1 − p_i)·1) = Π_i (1 + p_i(e^t − 1)).
Chernoff Bound (2)
• We now use the inequality 1 + y ≤ e^y, which holds for all y ∈ ℝ.
  Taking y = p_i(e^t − 1),

      E[e^{tX}] = Π_i (1 + p_i(e^t − 1)) ≤ Π_i e^{p_i(e^t − 1)}
                = e^{Σ_i p_i (e^t − 1)} = e^{(e^t − 1)μ}.

  Substituting this into eq.(1), we get that for all t ≥ 0,

      P(X ≥ (1 + δ)μ) ≤ e^{(e^t − 1)μ} / e^{(1+δ)μt}.   (2)

⚫ To make the bound as tight as possible, we find the value of t that minimizes the
  upper bound of eq.(2): t = ln(1 + δ). Substituting this into eq.(2), we obtain, for
  all δ ≥ 0,

      P(X ≥ (1 + δ)μ) ≤ e^{(e^{ln(1+δ)} − 1)μ − (1+δ)ln(1+δ)μ} = e^{[δ − (1+δ)ln(1+δ)]μ}.   (3)
Chernoff Bound (3)
⚫ We will now try to obtain a simpler form of the above bound. In particular, we use
  the Taylor series expansion of ln(1 + δ),

      ln(1 + δ) = Σ_{i≥1} (−1)^{i+1} · δ^i / i.

  Therefore

      (1 + δ)ln(1 + δ) = δ + Σ_{i≥2} (−1)^i δ^i (1/(i−1) − 1/i).

  Assuming that 0 < δ < 1, and ignoring the higher-order terms,

      (1 + δ)ln(1 + δ) > δ + δ²/2 − δ³/6 > δ + δ²/3.

  Plugging this into eq.(3), we obtain

      P(X ≥ (1 + δ)μ) ≤ e^{−δ²μ/3}   (0 < δ < 1).

⚫ A very similar calculation shows that

      P(X < (1 − δ)μ) ≤ e^{−δ²μ/2}   (0 < δ < 1).
A More General Chernoff Bound
⚫ We observe that ln(1 + δ) > 2δ/(2 + δ) for δ > 0. This implies that

      δ − (1 + δ)ln(1 + δ) ≤ −δ² / (2 + δ).

  Hence, using eq.(3) we obtain the following bound, which works for all positive δ:

      P(X ≥ (1 + δ)μ) ≤ e^{−δ²μ/(2+δ)}   (δ > 0).

  Similarly, it can be shown that

      P(X < (1 − δ)μ) ≤ e^{−δ²μ/(2+δ)}   (δ > 0).

⚫ We can combine both inequalities into one, called the two-sided Chernoff bound:

      P(|X − μ| ≥ δμ) ≤ 2e^{−δ²μ/(2+δ)}   (δ > 0).
Example: Fair Coin Tossing
• Suppose you toss a fair coin 200 times. How likely is it that you see at least
  120 heads?
  First note that μ = n/2 = 100, and from 120 = (1 + δ)μ we see δ = 0.2. Then the
  Chernoff bound says

      P(X ≥ 120) ≤ e^{−0.2²×100/(2+0.2)} = e^{−20/11} ≈ 0.162.

⚫ Let us compare this with the Chebyshev bound. Note that σ² = n/4 = 50, and from
  120 = (1 + δ)μ we see μδ = 20. Then the Chebyshev bound is

      P(X ≥ 120) ≤ σ² / (μδ)² = 50/20² = 0.125.

  This result shows that the Chernoff bound is not always tighter than the
  Chebyshev bound.
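The numbers on this slide can be reproduced in a few lines; the sketch below also computes the exact binomial tail, an extra comparison that is not on the slide.

import math

# Fair coin, n = 200 tosses: exact tail P(X >= 120) vs the Chernoff and
# Chebyshev bounds computed on the slide.
n, p, k = 200, 0.5, 120
mu = n * p                       # 100
delta = k / mu - 1               # 0.2
sigma2 = n * p * (1 - p)         # 50

chernoff = math.exp(-delta**2 * mu / (2 + delta))         # e^{-20/11} ~ 0.162
chebyshev = sigma2 / (mu * delta) ** 2                     # 50/400 = 0.125
exact = sum(math.comb(n, i) for i in range(k, n + 1)) * 0.5**n

print(f"Chernoff  bound: {chernoff:.4f}")
print(f"Chebyshev bound: {chebyshev:.4f}")
print(f"Exact tail     : {exact:.6f}")   # both bounds are far from the true value
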
Convergence of a deterministic sequence

• We have a sequence of real numbers a_n and a number a.
• We say that a_n converges to a, and write lim_{n→∞} a_n = a,
  − intuitively: a_n eventually gets and stays (arbitrarily) close to a;
  − rigorously: for every ε > 0, there exists some n_0 such that for all n ≥ n_0,

      |a_n − a| < ε.
Convergence “in probability”

• We have a sequence of random variables Y_n.
• We say that Y_n converges to a number a in probability,
  − intuitively: "almost all" of the PMF/PDF of Y_n eventually gets concentrated
    (arbitrarily) close to a;
  − rigorously: for every ε > 0, we have

      lim_{n→∞} P(|Y_n − a| < ε) = 1.
Example: Convergence
• One might be tempted to believe that if a sequence Y_n converges to a number a,
  then E[Y_n] must also converge to a. The following example shows this need not
  be the case.
• Consider a sequence of random variables with the following sequence of PMFs:

      P(Y_n = y) = 1 − 1/n   if y = 0,
                   1/n       if y = n².

  (Figure: PMF of Y_n, with mass 1 − 1/n at 0 and mass 1/n at n².)
• For every ε > 0, we have

      lim_{n→∞} P(|Y_n − 0| ≥ ε) = lim_{n→∞} 1/n = 0.

  Thus, Y_n converges to 0 in probability.
• E[Y_n] = n² × (1/n) = n, which goes to ∞ as n increases.
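A tiny tabulation (my own addition) makes the contrast concrete: the tail probability P(|Y_n| ≥ ε) shrinks like 1/n while E[Y_n] = n grows without bound.

# Y_n takes the value 0 with probability 1 - 1/n and n^2 with probability 1/n.
# P(|Y_n - 0| >= eps) = 1/n -> 0, yet E[Y_n] = n -> infinity.
eps = 0.5
for n in [10, 100, 1000, 10000]:
    tail = 1.0 / n            # P(|Y_n| >= eps) for any 0 < eps <= n^2
    mean = n**2 * (1.0 / n)   # E[Y_n] = n
    print(f"n={n:6d}: P(|Y_n| >= eps) = {tail:.4f}   E[Y_n] = {mean:.0f}")
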
Convergence “with probability 1” (1)
• We have a sequence of random variables 𝑌1 , 𝑌2 , 𝑌3 , …
(not necessarily i.i.d.)
• We say that 𝑌𝑛 converges to 𝑎 with probability 1 (wp1)
(or almost surely (a.s.)) if

      P( lim_{n→∞} Y_n = a ) = 1

• Convergence with probability 1 implies convergence in probability, but the
  converse is not necessarily true.

Convergence “with probability 1” (2)
• Consider a sequence Y_1, Y_2, Y_3, … . If for all ε > 0 we have

      Σ_{n=1}^{∞} P(|Y_n − a| > ε) < ∞,

  then Y_n → a almost surely (a.s.). This provides only a sufficient condition for
  almost sure convergence.
• In the case Σ_{n=1}^{∞} P(|Y_n − a| > ε) = ∞, we can instead use the following
  necessary and sufficient condition for almost sure convergence. Define the events

      S_m = {|Y_n − a| < ε, for all n ≥ m}.

  Then Y_n → a a.s. if and only if, for every ε > 0, we have

      lim_{m→∞} P(S_m) = lim_{m→∞} P(|Y_n − a| < ε, for all n ≥ m) = 1.
Convergence “with probability 1” (3)
• Example: Let X_1, X_2, … be i.i.d. Bernoulli(1/2), and define Y_n = 2^n Π_{i=1}^{n} X_i.
  Then for any 0 < ε < 1,

      P{|Y_n − 0| < ε for all n ≥ m}
        = P{X_n = 0 for some n ≤ m}
        = 1 − P{X_n = 1 for all n ≤ m}
        = 1 − (1/2)^m,

  which converges to 1 as m → ∞. Hence, the sequence Y_n converges to 0 almost surely.
• Exercise: Let X_n be independent Bernoulli(1/n) rvs for n = 2, 3, … . The goal is to
  check whether X_n → 0 a.s.
  (a) Check that Σ_{n=2}^{∞} P(|X_n − 0| > ε) = ∞.
  (b) Show that X_n does not converge to 0 almost surely.
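The following simulation sketch (variable names and the 30-step horizon are my own choices) draws a few sample paths of Y_n = 2^n Π X_i; in essentially every path the running product hits zero at some finite time and stays there, which is the almost-sure convergence argued above.

import random

# Sample paths of Y_n = 2^n * X_1 * ... * X_n with X_i i.i.d. Bernoulli(1/2).
# Once some X_i = 0, the path is 0 forever, so Y_n -> 0 with probability 1.
random.seed(1)
for path in range(5):
    prod = 1
    for n in range(1, 31):
        prod *= random.randint(0, 1)     # X_n ~ Bernoulli(1/2)
        if prod == 0:
            print(f"path {path}: Y_n = 0 from n = {n} onward")
            break
    else:
        print(f"path {path}: still positive after 30 steps (probability 2**-30)")
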
Convergence of Sample Mean
• Let X_1, ⋯, X_n be i.i.d. rvs with mean μ and variance σ², and let the sample mean be

      M_n = (X_1 + X_2 + ⋯ + X_n) / n.

• Mean: E[M_n] = μ
• Variance: V(M_n) = σ²/n
• Chebyshev: P(|M_n − E[M_n]| ≥ ε) ≤ V(M_n)/ε², i.e.,

      P(|M_n − μ| ≥ ε) ≤ σ² / (nε²).
WLLN and SLLN
• Let X_1, ⋯, X_n be i.i.d. with finite mean μ and variance σ².
• Weak Law of Large Numbers (WLLN)
  M_n converges to μ in probability: for every ε > 0,

      P(|M_n − μ| ≥ ε) → 0   as n → ∞.

• Strong Law of Large Numbers (SLLN)
  M_n converges to μ with probability 1, in the sense that

      P( lim_{n→∞} M_n = μ ) = 1.
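A small simulation sketch, assuming Exp(1) draws purely as an example distribution, illustrates the WLLN: the fraction of runs in which |M_n − μ| ≥ ε shrinks as n grows.

import random

# WLLN illustration: estimate P(|M_n - mu| >= eps) by simulation for X_i ~ Exp(1),
# where mu = 1. The frequency should decrease toward 0 as n grows.
random.seed(2)
mu, eps, runs = 1.0, 0.1, 2000

for n in [10, 100, 1000]:
    bad = sum(
        abs(sum(random.expovariate(1.0) for _ in range(n)) / n - mu) >= eps
        for _ in range(runs)
    )
    print(f"n={n:5d}: estimated P(|M_n - mu| >= {eps}) = {bad / runs:.3f}")
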
The Pollster’s Problem (1)
• p: proportion of the population that does something
• ith person polled ~ Bernoulli(p): X_i = 1 if "Yes", 0 if "No"
• M_n = (Σ_{i=1}^{n} X_i) / n = sample proportion of "Yes", used as our estimate of p
• How many persons should be polled to satisfy

      P(|M_n − p| ≥ 0.01) ≤ 0.05 ?

• The Chebyshev bound is P(|M_n − E[M_n]| ≥ ε) ≤ V(M_n)/ε².
  We have ε = 0.01, E[M_n] = p, and V(M_n) = p(1 − p)/n ≤ 1/(4n).
  (∵ When X takes values in [a, b], σ_X² ≤ (b − a)²/4, so σ_X² = p(1 − p) ≤ 1/4.)
  Thus,

      P(|M_n − p| ≥ 0.01) ≤ 1/(4n × 0.01²) ≤ 0.05.

• If we choose n large enough to satisfy the above bound, we get the conservative
  requirement n ≥ 50,000.
Central Limit Theorem (1)
• Let X_1, ⋯, X_n be a sequence of i.i.d. rvs with finite mean μ and variance σ².
• Look at three variants of their sum:
  − S_n = X_1 + ⋯ + X_n: its variance nσ² increases to ∞
  − M_n = S_n/n: its variance σ²/n shrinks, and M_n converges "in probability" to μ by the WLLN
  − S_n/√n: its variance stays at the constant level σ²
• We define a "standardized" sum

      Z_n = (M_n − E[M_n]) / σ_{M_n} = (M_n − μ)/(σ/√n) = (nM_n − nμ)/(σ√n) = (S_n − nμ)/(σ√n),

  from which E[Z_n] = 0 and V(Z_n) = 1.
Central Limit Theorem (2)
• Then, the CDF of Z_n converges to the standard normal CDF, in the sense that

      lim_{n→∞} P(Z_n ≤ z) = Φ(z)   for every z,

  where Φ(z) is the standard normal CDF

      Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx.

• This is called the Central Limit Theorem (CLT).
What exactly does the CLT say?
• CDF of 𝑍𝑛 converges to Φ(𝑧)
− Not a statement about convergence of PDFs or PMFs.

• Normal Approximation:
− Treat 𝑍𝑛 as if normal (CLT)
− Also treat 𝑆𝑛 as if normal (NA)

• Can we use it when 𝑛 is “moderate” ?


− Yes, but no nice theorems about the value of 𝑛

Normal Approximation based on CLT
• If n is large, the probability P(S_n ≤ s) can be approximated by treating S_n as if
  it were normal, according to the following procedure:
  1. Calculate the mean nμ and the variance nσ² of S_n.
  2. Calculate the normalized value z = (s − nμ)/(σ√n).
  3. Use the approximation

         P(S_n ≤ s) = P(Z_n ≤ z) ≈ Φ(z),

     where Φ(z) is available from the standard normal CDF table.
Example: CLT (1)
• We load onto a plane 100 packages whose weights are i.i.d. rvs, uniformly
  distributed between 5 and 50 kg. What is P(S_100 > 3000 kg)?
• μ = (5 + 50)/2 = 27.5,  σ² = (50 − 5)²/12 = 168.75,

      z = (3000 − 100 × 27.5) / √(100 × 168.75) = 1.92.

  Use the standard normal table to get the approximation

      P(S_100 ≤ 3000) ≈ Φ(1.92) = 0.9726.

  Thus, the desired probability is

      P(S_100 > 3000) = 1 − P(S_100 ≤ 3000) ≈ 1 − 0.9726 = 0.0274.
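The arithmetic on this slide can be reproduced directly; in the sketch below, phi is a helper built on math.erf rather than a table lookup, and the Monte Carlo check at the end is my own addition.

import math
import random

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Normal approximation for S_100, the total weight of 100 i.i.d. U[5, 50] packages.
n, lo, hi, s = 100, 5.0, 50.0, 3000.0
mu = (lo + hi) / 2                      # 27.5
var = (hi - lo) ** 2 / 12               # 168.75
z = (s - n * mu) / math.sqrt(n * var)   # ~1.92
print(f"CLT approximation : P(S_100 > 3000) ~ {1 - phi(z):.4f}")

random.seed(3)
runs = 20_000
count = sum(sum(random.uniform(lo, hi) for _ in range(n)) > s for _ in range(runs))
print(f"Monte Carlo check : {count / runs:.4f}")
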
Example: CLT (2)
• The production times of machine parts are i.i.d. rvs, uniformly distributed in
  [1, 5] minutes. What is the probability that the number of parts produced within
  320 minutes, N_320, is at least 100?
• Let X_i be the processing time of the ith part and let S_100 be the total processing
  time of the first 100 parts. Note that the event {N_320 ≥ 100} is the same as the
  event {S_100 ≤ 320}. (In general, {N_t ≥ n} = {S_n ≤ t}.)

      μ = (1 + 5)/2 = 3,  σ² = (5 − 1)²/12 = 4/3,  z = (320 − 100 × 3)/√(100 × 4/3) = 1.73.

  Thus, the desired probability is

      P(N_320 ≥ 100) = P(S_100 ≤ 320) ≈ Φ(1.73) = 0.9582.
Continuity Correction (1)
• Let us assume that Y ~ Bin(n = 20, p = 1/2), and suppose that we are interested in
  P(8 ≤ Y ≤ 10). Then Y = X_1 + ⋯ + X_n with X_i ~ Bernoulli(p = 1/2).
• We can apply the CLT to approximate

      P(8 ≤ Y ≤ 10) = P((8 − nμ)/(σ√n) ≤ (Y − nμ)/(σ√n) ≤ (10 − nμ)/(σ√n))
                    ≈ P((8 − 10)/√5 ≤ Z ≤ (10 − 10)/√5)
                    = Φ(0) − Φ(−2/√5) = 0.3145.

• We can also find the exact value

      P(8 ≤ Y ≤ 10) = Σ_{k=8}^{10} C(20, k) (1/2)^k (1 − 1/2)^{20−k} = 0.4565.
Continuity Correction (2)
• We notice that our approximation is not good. Part of the error comes from the
  fact that Y is a discrete rv while we are using a continuous distribution. Here is a
  trick to get a better result, called the continuity correction.
• Since Y can only take integer values, we can write

      P(8 ≤ Y ≤ 10) = P(7.5 ≤ Y ≤ 10.5)
                    = P((7.5 − 10)/√5 ≤ (Y − nμ)/(σ√n) ≤ (10.5 − 10)/√5)
                    ≈ Φ(0.5/√5) − Φ(−2.5/√5) = 0.4567.

• As we can see, the approximation improves significantly. The continuity
  correction is particularly useful when we use the normal approximation to the
  binomial distribution.
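The three numbers above (0.3145, 0.4567, 0.4565) can be reproduced as follows; phi is again a small helper based on math.erf, not something from the slides.

import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Bin(20, 1/2): exact P(8 <= Y <= 10) vs the CLT approximation,
# with and without the continuity correction.
n, p = 20, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))       # 10, sqrt(5)

plain     = phi((10 - mu) / sd) - phi((8 - mu) / sd)
corrected = phi((10.5 - mu) / sd) - phi((7.5 - mu) / sd)
exact     = sum(math.comb(n, k) for k in range(8, 11)) * 0.5**n

print(f"CLT, no correction : {plain:.4f}")       # ~0.3145
print(f"CLT, corrected     : {corrected:.4f}")   # ~0.4567
print(f"Exact binomial     : {exact:.4f}")       # ~0.4565
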
Continuity Correction (3)
• "Y is at least 8"   = {Y ≥ 8}   (includes 8 and above)
• "Y is more than 8"  = {Y > 8}   (doesn't include 8)
• "Y is at most 8"    = {Y ≤ 8}   (includes 8 and below)
• "Y is fewer than 8" = {Y < 8}   (doesn't include 8)
• "Y is exactly 8"    = {Y = 8}
The Pollster’s Problem (2)
• Suppose we want P(|M_n − p| ≥ 0.01) ≤ 0.05, with E[S_n] = np, σ²_{S_n} = nσ²,
  and σ² ≤ 1/4. (∵ X_i takes values in [0, 1], so σ² = p(1 − p) ≤ (1 − 0)²/4 = 1/4.)
• Event of interest: |M_n − p| ≥ 0.01, i.e.,

      |X_1 + ⋯ + X_n − np| / n ≥ 0.01
      ⟺ |X_1 + ⋯ + X_n − np| / (σ√n) ≥ 0.01√n/σ
      ⟺ |Z_n| ≥ 0.01√n/σ,

  and by the CLT we treat Z_n as standard normal:

  ⇒ P(|M_n − p| ≥ 0.01) ≈ P(|Z| ≥ 0.01√n/σ)
• Obtain an upper bound on this probability by assuming that p has the largest
  possible variance, σ² = 1/4, which corresponds to p = 1/2:

  ⇒ P(|M_n − p| ≥ 0.01) ≈ P(|Z| ≥ 0.02√n)
The Pollster’s Problem (3)
• How large a sample size n is needed if we want P(|M_n − p| ≥ 0.01) ≤ 0.05?

  ⇒ P(|M_n − p| ≥ 0.01) ≈ P(|Z| ≥ 0.02√n)
                        = 2 − 2P(Z ≤ 0.02√n) = 2 − 2Φ(0.02√n) ≤ 0.05,

  or

      Φ(0.02√n) ≥ 0.975.

• From the standard normal table, Φ(1.96) = 0.975, so

      0.02√n ≥ 1.96,  or  n ≥ 1.96²/0.02² = 9604.

• Compare this to the n ≥ 50,000 that we derived using Chebyshev's inequality.
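The required sample size can be recomputed directly; the bisection search below is my own shortcut for the 0.975 quantile, replacing the table lookup of 1.96.

import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Find z with Phi(z) = 0.975 by bisection (should come out near 1.96).
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if phi(mid) < 0.975 else (lo, mid)
z975 = (lo + hi) / 2

# Sample size for P(|M_n - p| >= 0.01) <= 0.05: CLT value vs Chebyshev value.
n_clt = math.ceil((z975 / 0.02) ** 2)            # 0.02*sqrt(n) >= z_{0.975}
n_cheb = math.ceil(1 / (4 * 0.01**2 * 0.05))     # from the earlier Chebyshev slide
print(f"z_0.975 ~ {z975:.3f},  n (CLT) = {n_clt},  n (Chebyshev) = {n_cheb}")
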
Usefulness of the CLT
• Only means and variances matter
• Much more accurate than Chebyshev’s inequality
• Useful computational shortcut, even if we have a
formula for the distribution of 𝑆𝑛
• Justification of models involving normal rvs
− Noise in electrical components
− Motion of a particle suspended in a fluid (Brownian
motion)

CLT Summary

• X_1, ⋯, X_n are i.i.d. with finite μ and σ².
• S_n = X_1 + ⋯ + X_n, with mean nμ and variance nσ².
• Z_n = (S_n − nμ)/(σ√n) → Z, where Z is standard normal (zero mean, unit variance).
• CLT: for every c, P(Z_n ≤ c) → P(Z ≤ c) = Φ(c).
• Normal approximation: treat S_n as if normal.
Proof of the CLT
• Assume for simplicity that μ = E[X] = 0 and E[X²] = σ² = 1.
• We want to show that Z_n = (X_1 + X_2 + ⋯ + X_n)/√n converges (in distribution) to
  the standard normal; for this it suffices to show that the MGF of Z_n tends to that of
  the standard normal distribution:

      M_{Z_n}(s) = E[e^{sZ_n}] = E[e^{(s/√n)(X_1 + ⋯ + X_n)}] = (E[e^{sX/√n}])^n,

      E[e^{sX/√n}] ≈ 1 + (s/√n)·E[X] + (s²/2n)·E[X²] ≈ 1 + s²/(2n).

  Thus,

      M_{Z_n}(s) ≈ (1 + s²/(2n))^n → e^{s²/2},

  which is the MGF of the standard normal distribution.

  Note) The MGF of N(μ, σ²) is exp(μs + σ²s²/2).
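As an empirical companion to this proof sketch (entirely an addition, not part of the lecture), the code below compares the simulated CDF of Z_n at a few points with Φ, using ±1-valued variables so that μ = 0 and σ = 1 as assumed above.

import math
import random

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Empirical check of the CLT: simulate Z_n = (X_1 + ... + X_n)/sqrt(n) for
# X_i uniform on {-1, +1} (mean 0, variance 1) and compare its CDF with Phi.
random.seed(4)
n, runs = 100, 20_000
zs = [sum(random.choice((-1, 1)) for _ in range(n)) / math.sqrt(n) for _ in range(runs)]

for c in [-1.0, 0.0, 1.0, 1.96]:
    empirical = sum(z <= c for z in zs) / runs
    print(f"c={c:5.2f}:  P(Z_n <= c) ~ {empirical:.3f}   Phi(c) = {phi(c):.3f}")
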


Homework #6
Textbook “Introduction to Probability”, 2nd Edition, D. Bertsekas and J. Tsitsiklis
Chapter 5, pp. 284-294, Problems 1, 4, 5, 8, 9, 10, 11
Due date: check the assignment posting on 아주BB.
