Sampling Distribution CI HyT
Hypothesis Testing
▶ Random Sampling:
▶ Simple, systematic, stratified
▶ Reduces bias, increases representativeness
▶ Non-Random Sampling:
▶ Quota, cluster
▶ Practical for large or inaccessible populations
▶ Higher risk of bias
SIMULATING RANDOM SAMPLES
Simulating Random Samples from Given Distributions
X f
1 8
2 12
3 14
4 6
▶ Cumulative frequencies:
X Cumulative Frequency
1 8
2 20
3 34
4 40
Frequency Distribution
X Cumulative Proportion
1 0.20
2 0.50
3 0.85
4 1.00
X F (X ≤ x)
0 0.4096
1 0.8192
2 0.9728
3 0.9984
4 1.0000
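The cumulative tables above can be used to turn random numbers into simulated observations: each random number u between 0 and 1 is assigned the first value of X whose cumulative proportion is at least u. The following is a minimal sketch of that lookup; the function name `simulate_discrete` and the four random numbers are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

def simulate_discrete(values, cumulative, random_numbers):
    """Map each random number u to the first value whose cumulative
    proportion is >= u (inverse-transform lookup on a table)."""
    return [values[bisect_left(cumulative, u)] for u in random_numbers]

# Cumulative proportions from the frequency table (8, 12, 14, 6 out of 40).
values = [1, 2, 3, 4]
cumulative = [0.20, 0.50, 0.85, 1.00]

# Hypothetical random numbers; in practice they would come from a
# random-number table or a generator such as random.random().
random_numbers = [0.07, 0.35, 0.50, 0.91]

sample = simulate_discrete(values, cumulative, random_numbers)
print(sample)
```

The same function works for the binomial table by passing the values 0 to 4 together with the listed F(X ≤ x) column.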
F(x)     x
0.723    $x = \sqrt[3]{5.784} \approx 1.80$
0.850    $x = \sqrt[3]{6.80} \approx 1.89$
Normal Distribution
Φ(z) z
0.382 z = −0.3
0.824 z = 0.931
29.4, 31.9
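For continuous distributions the same idea applies: set the random number equal to F(x) and solve for x. The sketch below assumes F(x) = x³/8 on 0 ≤ x ≤ 2, which is an inference from the cube roots shown above (5.784 = 8 × 0.723), and assumes the normal values come from N(30, 2²), which is an inference from the listed values 29.4 and 31.9. Neither assumption is stated in the slides.

```python
from statistics import NormalDist

def inverse_cubic_cdf(u):
    """Solve u = x**3 / 8 for x (assumed CDF on 0 <= x <= 2)."""
    return (8 * u) ** (1 / 3)

def normal_from_uniform(u, mu, sigma):
    """Solve Phi((x - mu) / sigma) = u for x."""
    return NormalDist(mu, sigma).inv_cdf(u)

cubic_sample = [round(inverse_cubic_cdf(u), 2) for u in (0.723, 0.850)]

# mu = 30 and sigma = 2 are assumptions chosen to match the listed values.
normal_sample = [round(normal_from_uniform(u, 30, 2), 1) for u in (0.382, 0.824)]

print(cubic_sample, normal_sample)
```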
CENTRAL LIMIT THEOREM
Central Limit Theorem
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]
▶ Standard Error:
\[ SE = \frac{\sigma}{\sqrt{n}} \]
Example 1: Poisson Distribution
▶ Poisson Distribution: X ∼ Poisson(λ)
▶ Mean (µ = λ), Variance (σ² = λ)
▶ For a large sample size n:
\[ \bar{X} \sim N\left(\lambda, \frac{\lambda}{n}\right) \]
▶ Example:
▶ λ = 4.5
▶ Sample size n = 30
▶ Distribution of sample mean:
\[ \bar{X} \sim N\left(4.5, \frac{4.5}{30}\right) = N(4.5, 0.15) \]
▶ Probability X̄ > 5:
\[ P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.15}}\right) = P(Z > 1.29) \approx 0.098 \]
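The calculation above can be checked numerically. This is a minimal sketch using Python's standard library; the function name `prob_sample_mean_exceeds` is hypothetical, and the result is only as good as the normal approximation itself.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(threshold, mu, variance, n):
    """P(sample mean > threshold) under the approximation
    X-bar ~ N(mu, variance / n)."""
    standard_error = sqrt(variance / n)
    return 1 - NormalDist(mu, standard_error).cdf(threshold)

# Poisson example: mu = variance = 4.5, sample size 30.
p = prob_sample_mean_exceeds(5, mu=4.5, variance=4.5, n=30)
print(round(p, 3))
```

Because only the population mean and variance enter the formula, the same function can be reused for any distribution with known mean and variance.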
Example 2: Binomial Distribution
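▶ Binomial Distribution: X ∼ B(n, p)
▶ Mean (µ = np), Variance (σ² = npq), where q = 1 − p
▶ For a large sample of m independent observations of X:
\[ \bar{X} \sim N\left(np, \frac{npq}{m}\right) \]
▶ (A single proportion X/n has mean p and variance pq/n.)
▶ Example:
▶ n = 9, p = 0.5, so µ = 4.5 and σ² = 2.25
▶ Sample size m = 30
▶ Distribution of sample mean:
\[ \bar{X} \sim N\left(4.5, \frac{2.25}{30}\right) = N(4.5, 0.075) \]
▶ Probability X̄ > 5:
\[ P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.075}}\right) = P(Z > 1.83) \approx 0.034 \]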
Example 3: Uniform Distribution
▶ Uniform Distribution: X ∼ U(a, b)
▶ Mean (µ = (a + b)/2), Variance (σ² = (b − a)²/12)
▶ For a large sample size n:
\[ \bar{X} \sim N\left(\frac{a + b}{2}, \frac{(b - a)^2}{12n}\right) \]
▶ Example:
▶ a = 2, b = 7
▶ Sample size n = 30
▶ Distribution of sample mean:
\[ \bar{X} \sim N\left(4.5, \frac{25}{360}\right) = N(4.5, 0.0694) \]
▶ Probability X̄ > 5:
\[ P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.0694}}\right) = P(Z > 1.90) \approx 0.029 \]
Importance of the Central Limit Theorem
▶ Practical Applications:
▶ Helps in making inferences about population parameters.
▶ Basis for many statistical tests and confidence intervals.
▶ Simplifies analysis of large data sets.
▶ Assumptions and Conditions:
▶ Sample size should be sufficiently large (commonly n ≥ 30).
▶ The samples should be independent.
▶ The population should have a finite variance.
Central Limit Theorem in Action
▶ Real-world Example:
▶ Measuring the average height of adult males in a city.
▶ Population distribution might not be normal.
▶ By taking a large sample, the distribution of the sample mean
will be approximately normal.
▶ Steps:
1. Collect a large sample of heights.
2. Calculate the sample mean.
3. Use the normal distribution to make inferences about the
population mean.
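The claim that sample means from a non-normal population behave approximately normally can be explored by simulation. The sketch below is an illustration, not part of the original slides: it draws repeated samples of size 30 from U(2, 7) and compares the mean and variance of the sample means with the theoretical values 4.5 and 25/360 ≈ 0.0694. The function name, the seed, and the number of repetitions are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

def simulate_sample_means(draw, n, repetitions, seed=0):
    """Return `repetitions` sample means, each computed from n
    independent draws produced by draw(rng)."""
    rng = random.Random(seed)
    return [fmean(draw(rng) for _ in range(n)) for _ in range(repetitions)]

# Population: uniform on (2, 7), which is clearly not normal.
means = simulate_sample_means(lambda rng: rng.uniform(2, 7), n=30, repetitions=5000)

print("mean of sample means:    ", round(fmean(means), 3))
print("variance of sample means:", round(pvariance(means), 4))
print("theoretical variance:    ", round(25 / 360, 4))
```

A histogram of `means` would show the bell shape; the printed summary statistics are a quicker check that the centre and spread agree with µ and σ²/n.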
Unbiased Estimates of Population Parameters
▶ Population Proportion:
\[ \hat{p} = \frac{X}{n} \]
▶ Population Variance:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]
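\[ s^2 = \frac{1}{n-1}\left(\sum X^2 - \frac{\left(\sum X\right)^2}{n}\right) = \frac{1}{49}\left(16{,}526 - \frac{738^2}{50}\right) \]
\[ s^2 = \frac{1}{49}(16{,}526 - 10{,}892.88) = \frac{1}{49} \times 5{,}633.12 \approx 114.96 \text{ minutes}^2 \]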
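The shortcut formula based on the sums ΣX and ΣX² gives the same value as the definition with divisor n − 1. The sketch below checks this on a small hypothetical data set (the five values are invented for illustration) against `statistics.variance`, which also uses the n − 1 divisor.

```python
from statistics import variance

def unbiased_variance_from_sums(n, sum_x, sum_x_squared):
    """Unbiased variance estimate from n, sum of X and sum of X squared."""
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

data = [12, 15, 9, 20, 14]  # hypothetical observations

shortcut = unbiased_variance_from_sums(
    len(data), sum(data), sum(x * x for x in data)
)
print(shortcut, variance(data))
```

The sums form is convenient when only summary totals are reported, as in the worked example.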
Example: Unbiased Estimate of the Proportion
7
p̂ = = 0.14
50
▶ Interpretation: the unbiased estimate of the population proportion is 14%.
Unbiased Estimates of Population Variance
2.75 ≤ µ ≤ 2.85
▶ General Form:
\[ CI = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}} \]
▶ x̄ = sample mean
▶ z = z-score corresponding to the desired confidence level
▶ σ = population standard deviation
▶ n = sample size
▶ For unknown population standard deviation: use the unbiased
estimate σ̂ computed from the sample, and the t-distribution:
\[ CI = \bar{x} \pm t \cdot \frac{\hat{\sigma}}{\sqrt{n}} \]
▶ t = t-score from the t-distribution table corresponding to the
desired confidence level and degrees of freedom (df = n − 1)
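The general form with known σ can be sketched as follows. The function name and the numbers in the usage lines are hypothetical; the z-score is taken from the standard normal inverse CDF rather than a table. For the t-based interval the same arithmetic applies with a t-score read from a table in place of z.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs: sample mean 50, sigma 4, sample size 64.
low, high = z_confidence_interval(mean=50.0, sigma=4.0, n=64)
print(round(low, 3), round(high, 3))
```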
Confidence Intervals for Population Mean
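Population   Sample Size       Variance        C.I. Bounds
Normal       Any size          Known (σ²)      $\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$
Normal       Large (n ≥ 30)    Known (σ²)      $\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$
Any          Large (n ≥ 30)    Unknown (σ̂²)    $\bar{x} \pm z \frac{\hat{\sigma}}{\sqrt{n}}$
Normal       Small (n < 30)    Unknown (σ̂²)    $\bar{x} \pm t \frac{\hat{\sigma}}{\sqrt{n}}$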
Table: Confidence Intervals for Different Cases
Note
For the cases where the population variance is unknown:
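\[ \hat{\sigma}^2 = \frac{n}{n-1} s^2 \]
(here s² is the sample variance computed with divisor n)
OR
\[ \hat{\sigma}^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right) \]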
Example - Population Mean (σ Known)
\[ \bar{x} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}} \]
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
CI = x̄ ± ME = 15 ± 0.6192
CI = (14.3808, 15.6192)
▶ Therefore, the 95% confidence interval for the mean petrol
consumption is (14.3808, 15.6192) km/l.
Confidence Interval for Proportion
▶ Formula:
\[ CI = \hat{p} \pm z \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
▶ p̂ = sample proportion
▶ z = z-score corresponding to the desired confidence level
▶ n = sample size
Example: Proportion of Diesel Car Owners
Problem:
▶ Calculate a 95% confidence interval for the proportion of
diesel car owners in a survey.
▶ Given:
▶ Sample proportion (p̂) = 0.4
▶ Sample size (n) = 200
Solution:
▶ Step 1: Identify the z-score for a 95% confidence level. From
the z-table, z0.025 = 1.96.
▶ Step 2: Calculate the standard error (SE):
\[ SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = \sqrt{\frac{0.24}{200}} = \sqrt{0.0012} \approx 0.03464 \]
▶ Step 3: Calculate the margin of error (ME):
\[ ME = z \times SE = 1.96 \times 0.03464 \approx 0.0679 \]
▶ Step 4: Determine the confidence interval:
CI = p̂ ± ME = 0.4 ± 0.0679
CI = (0.3321, 0.4679)
▶ Therefore, the 95% confidence interval for the proportion of
diesel car owners is (0.3321, 0.4679).
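The steps of this example can be reproduced in a few lines. This is a minimal sketch; the function name is hypothetical, and it uses the normal approximation exactly as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Diesel car owners: sample proportion 0.4 from 200 respondents.
low, high = proportion_confidence_interval(0.4, 200)
print(round(low, 4), round(high, 4))
```

The printed bounds should agree with the interval stated in Step 4 to four decimal places.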
Confidence Interval for Variance
▶ Formula:
\[ \left( \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right) \]
▶ s² = sample variance
▶ χ² = chi-square value corresponding to the desired confidence
level and degrees of freedom (df = n − 1)
▶ n = sample size
Example: Variance of Battery Life
Problem:
▶ Calculate a 95% confidence interval for the variance of battery
life (in hours) of a new type of battery.
▶ Given:
▶ Sample variance (s²) = 10 hours²
▶ Sample size (n) = 15
Solution:
▶ Step 1: Identify the chi-square values for a 95% confidence
level with df = n − 1 = 14. From the chi-square table,
χ20.025,14 = 26.119 and χ20.975,14 = 5.629.
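▶ Step 2: Calculate the confidence interval for the variance:
\[ CI = \left( \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right) = \left( \frac{14 \cdot 10}{26.119}, \; \frac{14 \cdot 10}{5.629} \right) = \left( \frac{140}{26.119}, \; \frac{140}{5.629} \right) \]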
CI = (5.36, 24.87)
▶ Therefore, the 95% confidence interval for the variance of
battery life is (5.36, 24.87) hours².
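The interval can be computed directly once the two chi-square values are known. Python's standard library has no chi-square quantile function, so this sketch takes the table values as arguments; the function and parameter names are hypothetical.

```python
def variance_confidence_interval(sample_variance, n, chi2_upper, chi2_lower):
    """Confidence interval for a population variance.

    chi2_upper is the chi-square value with alpha/2 in the upper tail,
    chi2_lower the value with 1 - alpha/2 in the upper tail (df = n - 1).
    Both are read from a chi-square table.
    """
    scaled = (n - 1) * sample_variance
    return scaled / chi2_upper, scaled / chi2_lower

# Battery example: s^2 = 10, n = 15, table values for df = 14.
low, high = variance_confidence_interval(10, 15, chi2_upper=26.119, chi2_lower=5.629)
print(round(low, 2), round(high, 2))
```

Note that the larger chi-square value produces the lower bound, because it appears in the denominator.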
HYPOTHESIS TESTING
What is Hypothesis Testing?
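Under H0 : µ = µ0, with σ² known:
\[ \bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \]
\[ \text{Test statistic: } Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \quad \text{where } Z \sim N(0, 1) \]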
Hypothesis Test 1: z-tests and t-tests
Standardized test statistics:
Test 1: Testing an unknown population mean µ, H0 : µ = µ0
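When σ² is unknown:
▶ 1c: X is preferably normally distributed. For large n,
\[ \bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \]
\[ \text{Test statistic: } Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}, \quad \text{where } Z \sim N(0, 1) \]
▶ 1d: X is normally distributed, X ∼ N(µ, σ²). For small n,
\[ \text{Test statistic: } T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}, \quad \text{where } T \sim t(n - 1) \]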
Hypothesis Test 2: Testing a binomial proportion p, where
X ∼ B(n, p)
\[ \text{Test statistic: } Z = \frac{X - np}{\sqrt{npq}}, \quad \text{where } Z \sim N(0, 1) \]
Hypothesis Test 3: Testing µ1 − µ2 , the difference between
means of two normal distributions
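3a: σ₁², σ₂² known
\[ \bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \]
\[ \text{Test statistic: } Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}, \quad \text{where } Z \sim N(0, 1) \]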
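When the common variance is unknown, use the pooled estimate
(s₁², s₂² are the sample variances):
\[ \hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \]
or, equivalently,
\[ \hat{\sigma}^2 = \frac{\sum (x_{1i} - \bar{X}_1)^2 + \sum (x_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2} \]
When n is large:
\[ \text{Test statistic: } Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } Z \sim N(0, 1) \]
When n is small:
\[ \text{Test statistic: } T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } T \sim t(n_1 + n_2 - 2) \]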
Z-TEST
One-Tailed Z-Test
S. L.   Two-tailed test
10%     Reject H0 if z < −1.645 or z > 1.645 (written |z| > 1.645)
5%      Reject H0 if z < −1.96 or z > 1.96 (written |z| > 1.96)
1%      Reject H0 if z < −2.576 or z > 2.576 (written |z| > 2.576)
Table: Critical values for two-tailed tests (S. L. stands for Significance
Level)
Example: z-test for Mean
Problem:
▶ The mean volume of liquid in packs is 524 ml with a standard
deviation of 3 ml.
▶ After repair, the mean volume of 50 packs is 524.9 ml.
▶ Test at the 5% significance level whether the mean volume
has increased.
Solution:
▶ Step 1: State the hypotheses:
H0 : µ = 524 (null hypothesis)
H1 : µ > 524 (alternative hypothesis)
▶ Step 2: Choose the significance level: α = 0.05
▶ Step 3: Select the appropriate test: z-test (large sample,
known variance)
▶ Step 4: Calculate the test statistic:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{524.9 - 524}{3/\sqrt{50}} = \frac{0.9}{0.424} \approx 2.12 \]
▶ Step 5: Determine the critical value: For α = 0.05,
z0.05 = 1.645
▶ Step 6: Make a decision: Since z = 2.12 > 1.645, reject
H0 .
▶ Step 7: Draw a conclusion: There is sufficient evidence at
the 5% significance level to conclude that the mean volume
has increased.
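The seven steps above reduce to a short calculation. The sketch below is one possible way to code an upper-tailed z-test; the function name and return layout are hypothetical. It also reports a p-value, which the slides do not use but which leads to the same decision.

```python
from math import sqrt
from statistics import NormalDist

def upper_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu > mu0 with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)
    return z, critical, p_value, z > critical

# Liquid packs: mu0 = 524 ml, sigma = 3 ml, 50 packs with mean 524.9 ml.
z, critical, p_value, reject = upper_tailed_z_test(524.9, 524, 3, 50)
print(round(z, 2), round(critical, 3), round(p_value, 3), reject)
```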
T-TEST FOR MEAN
Example: t-test for Mean
Problem:
▶ The average mass in a sample of 10 jars of marmalade is
454.8 g.
▶ Standard deviation of the sample is 0.8 g.
▶ Test at the 5% significance level whether the mean mass
has changed from 455 g.
Solution:
▶ Step 1: State the hypotheses:
H0 : µ = 455 (null hypothesis)
H1 : µ ̸= 455 (alternative hypothesis)
▶ Step 2: Choose the significance level: α = 0.05
▶ Step 3: Select the appropriate test: t-test (small sample,
unknown variance)
▶ Step 4: Calculate the test statistic:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{454.8 - 455}{0.8/\sqrt{10}} = \frac{-0.2}{0.253} \approx -0.79 \]
▶ Step 5: Determine the critical value: For α = 0.05 and
df = 9, t0.025,9 = 2.262
▶ Step 6: Make a decision: Since |t| = 0.79 < 2.262, do not
reject H0 .
▶ Step 7: Draw a conclusion: There is not sufficient evidence
at the 5% significance level to conclude that the mean has
changed from 455 g.
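The t-test for the marmalade jars can be sketched in the same way. Python's standard library does not provide t-distribution quantiles, so the critical value 2.262 is entered by hand from the table, as in Step 5; the function name is hypothetical.

```python
from math import sqrt

def one_sample_t_statistic(sample_mean, mu0, s, n):
    """t statistic for H0: mu = mu0 with the standard deviation
    estimated from the sample."""
    return (sample_mean - mu0) / (s / sqrt(n))

t = one_sample_t_statistic(454.8, 455, 0.8, 10)

# Two-tailed 5% critical value for df = 9, read from a t-table.
t_critical = 2.262
reject = abs(t) > t_critical

print(round(t, 2), reject)
```

For a two-tailed test the comparison uses |t|, so the sign of the statistic does not affect the decision.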
Example: Hypothesis Test for Proportion
Problem:
▶ Of 180 survey respondents, 134 favor a proposal.
▶ Test at the 5% significance level whether the proportion of the
population favoring the proposal is greater than 70%.
Solution:
▶ Step 1: State the hypotheses:
H0 : p = 0.7 (null hypothesis)
H1 : p > 0.7 (alternative hypothesis)
▶ Step 2: Choose the significance level: α = 0.05
▶ Step 3: Select the appropriate test: z-test for proportion
▶ Step 4: Calculate the test statistic:
\[ \hat{p} = \frac{134}{180} \approx 0.744 \]
\[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.744 - 0.7}{\sqrt{\frac{0.7(0.3)}{180}}} \approx \frac{0.0444}{0.0342} \approx 1.30 \]
▶ Step 5: Determine the critical value: For α = 0.05,
z0.05 = 1.645
▶ Step 6: Make a decision: Since z = 1.30 < 1.645, do not
reject H0 .
▶ Step 7: Draw a conclusion: There is not sufficient evidence
at the 5% significance level to conclude that the proportion of
the population favoring the proposal is greater than 70%.
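The test statistic for a proportion uses the hypothesised value p in the standard error, not the sample proportion. The following minimal sketch makes that explicit; the function name is hypothetical and no continuity correction is applied, matching the formula in Step 4.

```python
from math import sqrt
from statistics import NormalDist

def upper_tailed_proportion_test(successes, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p > p0 using the normal approximation."""
    p_hat = successes / n
    # The standard error is computed under H0, so it uses p0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    critical = NormalDist().inv_cdf(1 - alpha)
    return p_hat, z, z > critical

# Survey: 134 of 180 respondents in favour, H0: p = 0.7.
p_hat, z, reject = upper_tailed_proportion_test(134, 180, 0.7)
print(round(p_hat, 3), round(z, 2), reject)
```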
TYPE-I & II ERRORS
Type I and Type II Errors
▶ Type I Error: Rejecting the null hypothesis when it is
actually true.
▶ Type II Error: Failing to reject the null hypothesis when it is
actually false.
▶ Significance Level (α): The probability of making a Type I
error.
▶ Power of the Test: The probability of correctly rejecting the
null hypothesis (1 - probability of Type II error).
Example: Type I and Type II Errors
Problem:
▶ A random variable has a normal distribution with mean µ and
standard deviation 3.
▶ Test H0 : µ = 20 against H1 : µ > 20 using a sample size of
25.
▶ Reject H0 if the sample mean is greater than 21.4.
Solution:
▶ Calculate the probability of Type I error:
\[ P(\text{Type I error}) = P(\bar{X} > 21.4 \mid \mu = 20) = P\left(Z > \frac{21.4 - 20}{3/\sqrt{25}}\right) = P(Z > 2.33) \approx 0.01 \]
▶ Calculate the probability of Type II error when the true mean
is µ = 21:
\[ P(\text{Type II error}) = P(\bar{X} \le 21.4 \mid \mu = 21) = P\left(Z \le \frac{21.4 - 21}{3/\sqrt{25}}\right) = P(Z \le 0.67) \approx 0.75 \]
▶ Power of the test: 1 − P(Type II error) = 1 − 0.75 = 0.25
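Both error probabilities come from the same rejection cutoff, evaluated under two different values of the mean. The sketch below computes them for this example; the function name is hypothetical, and µ = 21 is the alternative value used in the Type II calculation.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, n, cutoff):
    """Type I error, Type II error and power for the rule
    'reject H0 if the sample mean exceeds cutoff'."""
    standard_error = sigma / sqrt(n)
    # Type I: reject although the true mean is mu0.
    alpha = 1 - NormalDist(mu0, standard_error).cdf(cutoff)
    # Type II: fail to reject although the true mean is mu1.
    beta = NormalDist(mu1, standard_error).cdf(cutoff)
    return alpha, beta, 1 - beta

alpha, beta, power = error_probabilities(mu0=20, mu1=21, sigma=3, n=25, cutoff=21.4)
print(round(alpha, 2), round(beta, 2), round(power, 2))
```

Raising the cutoff lowers the Type I error but raises the Type II error, which is why the two are traded off against each other for a fixed sample size.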
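1. Estimating a population mean (light bulbs)
▶ \[ n = \frac{z_{\alpha/2}^2 \sigma^2}{M^2} = \frac{1.96^2 \times 100^2}{20^2} = \frac{3.8416 \times 10000}{400} = 96.04, \text{ rounded up to } 97 \]
▶ Number of light bulbs to be selected = 97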
2. Estimating population proportion of credit card users
▶ α = 100% − 99% = 1% ⇒ α/2 = 0.5% ⇒ z0.5% = 2.576
▶ \[ n = \frac{z_{\alpha/2}^2 P(1 - P)}{M^2} = \frac{2.576^2 \times 0.66 \times (1 - 0.66)}{0.03^2} \approx 1654.52, \text{ rounded up to } 1655 \]
▶ Number of people to be sampled = 1655
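The two sample-size formulas can be sketched as small helper functions. The names are hypothetical; M denotes the required margin of error, and the result is always rounded up because a fractional observation cannot be sampled.

```python
from math import ceil

def sample_size_for_mean(z, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(z, p, margin):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Light bulbs: z = 1.96, sigma = 100, margin = 20.
n_mean = sample_size_for_mean(1.96, 100, 20)

# Credit card users: z = 2.576, P = 0.66, margin = 0.03.
n_prop = sample_size_for_proportion(2.576, 0.66, 0.03)

print(n_mean, n_prop)
```

The proportion result is sensitive to how many decimals of the z-score are used, so the critical value should be entered with the same precision as quoted in the working.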