
Sampling Distribution CI HyT

Uploaded by

Lavin
Copyright
© All Rights Reserved

Sampling Distribution, Confidence Intervals and Hypothesis Testing

Dr. Sandun Dassanayake


Population

▶ Population: The target group about which we want to gather information.
▶ Examples of populations:
▶ Pupils in a class
▶ People in England in full-time employment
▶ Hospitals in Wales
▶ Cans of soft drink produced in a factory
▶ Ferns in a wood
▶ Rational numbers between 0 and 10
Surveys

▶ Surveys: Methods for collecting information from populations.
▶ Types of surveys:
1. Census: Surveys every member of the population.
▶ Example: Government census of the population every ten
years.
▶ Advantage: Accurate information about the entire population.
▶ Disadvantage: Time-consuming and costly, especially for large
populations.
2. Sample Survey: Surveys a subset of the population.
▶ Example: Opinion polls conducted before elections.
▶ Advantage: Cheaper and quicker than a census.
▶ Disadvantage: Must ensure the sample is representative to
avoid bias.
Bias
▶ Bias: Systematic error that leads to incorrect estimates of
population parameters.
▶ Sources of bias:
▶ Lack of a good sampling frame:
▶ Example: Using a telephone directory misses people without
phones.
▶ Wrong choice of sampling unit:
▶ Example: Surveying individuals instead of households.
▶ Non-response by some chosen units:
▶ Example: People refusing to participate in a survey.
▶ Bias introduced by the person conducting the survey:
▶ Example: Interviewer influencing responses by their
questioning style.
▶ Eliminating bias:
▶ Use random sampling methods.
▶ Ensure sampling frame is as accurate as possible.
▶ Design questionnaires to be clear, specific, and neutral.
RANDOM SAMPLING
Simple Random Sampling

▶ Simple Random Sampling: Every sample of size n has an equal chance of being selected.
▶ Methods:
▶ Drawing lots: Assign numbers to each unit, then randomly
draw numbers.
▶ Random number tables: Use random digits to select
sampling units.
▶ Example:
▶ Population: 100 people.
▶ Sample size: 10 people.
▶ Method: Assign numbers 1 to 100 to each person, then use
random number tables to select 10 numbers.
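A pseudo-random number generator can stand in for drawing lots or a random number table. The sketch below assumes Python's standard `random` module; the seed is an arbitrary choice that only makes the draw repeatable.

```python
import random

# Label the population 1..100, as in the example above
population = list(range(1, 101))

rng = random.Random(2024)  # arbitrary seed, only for reproducibility
# random.sample draws without replacement, so every subset of size 10
# is equally likely: this is simple random sampling
sample = sorted(rng.sample(population, 10))
print(sample)
```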
Systematic Sampling

▶ Systematic Sampling: Select every k-th unit from a list after a random starting point.
▶ Example:
▶ Population: 300 people.
▶ Sample size: 10 people.
▶ Method: Choose a random starting point, then select every
30th person.
▶ Advantage: Quick and easy to carry out.
▶ Disadvantage: Can introduce bias if there is a periodic pattern
in the population.
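The same idea for systematic sampling, as a minimal sketch: the interval is k = N/n = 300/10 = 30 as in the example, and `systematic_sample` is a hypothetical helper written for illustration. It assumes N is a multiple of n.

```python
import random

def systematic_sample(units, n, rng=random):
    """Every k-th unit after a random start, with k = len(units) // n.
    Assumes len(units) is a multiple of n."""
    k = len(units) // n            # sampling interval: 300 // 10 = 30
    start = rng.randrange(k)       # random starting position within the first k units
    return units[start::k][:n]

people = list(range(1, 301))       # population labelled 1..300
sample = systematic_sample(people, 10, random.Random(7))
print(sample)
```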
Stratified Sampling

▶ Stratified Sampling: Divide the population into strata and sample each stratum in proportion to its size.
▶ Example:
▶ Population: Company employees.
▶ Strata: Drivers, administrative staff, mechanics.
▶ Method: Sample each group proportionately to its size in the
population.
▶ Advantage: Ensures representation of all subgroups.
▶ Disadvantage: Requires detailed population information.
NON-RANDOM SAMPLING
Cluster Sampling

▶ Cluster Sampling: Divide the population into clusters, then randomly sample clusters.
▶ Example:
▶ Population: Primary school children.
▶ Clusters: Local education authorities.
▶ Method: Randomly select clusters, then sample all children
within chosen clusters.
▶ Advantage: Cost-effective and convenient.
▶ Disadvantage: Higher similarity within clusters can reduce
precision.
Quota Sampling

▶ Quota Sampling: Divide the population into groups and sample until the quota for each group is met.
▶ Example:
▶ Population: Shoppers in a mall.
▶ Groups: Age, gender.
▶ Method: Interview specified number of people in each group.
▶ Advantage: Quick and easy to implement.
▶ Disadvantage: Non-random, potential for interviewer bias.
Comparison of Sampling Methods

▶ Random Sampling:
▶ Simple, systematic, stratified
▶ Reduces bias, increases representativeness
▶ Non-Random Sampling:
▶ Quota, cluster
▶ Practical for large or inaccessible populations
▶ Higher risk of bias
SIMULATING RANDOM SAMPLES
Simulating Random Samples from Given Distributions

▶ Simulating Random Samples: Generating data that mimics a given probability distribution.
▶ Method: Use cumulative proportional frequencies or
probabilities to generate random samples.
Frequency Distribution
▶ Frequency Distribution: Distribution of data points
according to the number of occurrences.
▶ Example:
▶ Given frequencies for X :

X f
1 8
2 12
3 14
4 6

▶ Cumulative frequencies:

X Cumulative Frequency
1 8
2 20
3 34
4 40
▶ Cumulative proportional frequencies:

X Cumulative Proportion
1 0.20
2 0.50
3 0.85
4 1.00

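▶ Use random numbers to simulate: read each two-digit random number r as r/100 and take X as the smallest value whose cumulative proportion is at least r/100:

36, 42, 94, 58, 83 =⇒ X = 2, 2, 4, 3, 3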
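The lookup rule above (take the smallest X whose cumulative proportion is at least the random number) can be written as a short function. This is a sketch, not a prescribed method: `simulate` is a hypothetical helper, exact fractions are used to avoid rounding ties at the class boundaries, and a random number of all zeros is assumed to be read as 10^digits (the convention that reproduces the probability-distribution example).

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

def simulate(values, weights, random_numbers, digits):
    """Discrete inverse-CDF lookup.

    weights are frequencies (or probabilities scaled to integers); each random
    number r is read as u = r / 10**digits, with r = 0 read as 10**digits,
    and X is the smallest value whose cumulative proportion is >= u.
    """
    total = sum(weights)
    cumulative = [Fraction(c, total) for c in accumulate(weights)]
    scale = 10**digits
    return [values[bisect_left(cumulative, Fraction(r or scale, scale))]
            for r in random_numbers]

# Frequency table: X = 1..4 with f = 8, 12, 14, 6; two-digit random numbers
print(simulate([1, 2, 3, 4], [8, 12, 14, 6], [36, 42, 94, 58, 83], digits=2))
# → [2, 2, 4, 3, 3]
```

Calling `simulate([0, 1, 2, 3], [1, 2, 4, 3], [3, 7, 4, 7, 6, 5, 3, 3, 9, 0], digits=1)` (probabilities scaled by 10) reproduces X = 1, 2, 2, 2, 2, 2, 1, 1, 3, 3 from the probability-distribution example.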


Probability Distribution
▶ Probability Distribution: Describes how probabilities are
distributed over the values of the random variable.
▶ Example:
X P(X = x)
0 0.1
1 0.2
2 0.4
3 0.3
▶ Cumulative probabilities:
X F (X ≤ x)
0 0.1
1 0.3
2 0.7
3 1.0
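▶ Use random numbers to simulate (each digit d is read as d/10, with 0 read as 10, so digit 1 gives X = 0; digits 2 or 3 give X = 1; digits 4 to 7 give X = 2; digits 8, 9 or 0 give X = 3):
3, 7, 4, 7, 6, 5, 3, 3, 9, 0 =⇒ X = 1, 2, 2, 2, 2, 2, 1, 1, 3, 3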
Binomial Distribution
▶ Binomial Distribution: Discrete distribution representing the
number of successes in a fixed number of independent trials.
▶ Example:
X ∼ B(4, 0.2)
▶ Cumulative probabilities:

X F (X ≤ x)
0 0.4096
1 0.8192
2 0.9728
3 0.9984
4 1.0000

▶ Use random numbers to simulate:

2811, 5747, 6157, 8988 =⇒ X = 0, 1, 1, 2


Poisson Distribution
▶ Poisson Distribution: Discrete distribution representing the
number of events occurring in a fixed interval of time or space.
▶ Example:
X ∼ Poisson(3)
▶ Cumulative probabilities:
X F (X ≤ x)
0 0.0498
1 0.1991
2 0.4232
3 0.6472
4 0.8153
5 0.9161
6 0.9665
7 0.9881
8+ 1.0000
▶ Use random numbers to simulate:
8135 =⇒ X = 4
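The cumulative tables for B(4, 0.2) and Poisson(3) can be generated rather than looked up. A sketch using only Python's `math` and `bisect` modules; the helper names are illustrative, and each four-digit random number r is assumed to be read as r/10000.

```python
from bisect import bisect_left
from math import comb, exp, factorial

def binomial_cdf(n, p):
    """F(x) = P(X <= x) for X ~ B(n, p), x = 0..n."""
    out, total = [], 0.0
    for x in range(n + 1):
        total += comb(n, x) * p**x * (1 - p) ** (n - x)
        out.append(total)
    return out

def poisson_cdf(lam, x_max):
    """F(x) = P(X <= x) for X ~ Poisson(lam), x = 0..x_max."""
    out, total = [], 0.0
    for x in range(x_max + 1):
        total += exp(-lam) * lam**x / factorial(x)
        out.append(total)
    return out

def lookup(cdf, r, digits=4):
    """Smallest x with F(x) >= r / 10**digits. A result equal to len(cdf)
    means 'beyond the table' (the 8+ class in the Poisson example)."""
    return bisect_left(cdf, r / 10**digits)

binom = binomial_cdf(4, 0.2)
print([round(f, 4) for f in binom])
print([lookup(binom, r) for r in (2811, 5747, 6157, 8988)])

pois = poisson_cdf(3, 7)
print([round(f, 4) for f in pois])
print(lookup(pois, 8135))
```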
Continuous Distribution

▶ Continuous Distribution: Represents data points that can take any value within a given range.
▶ Example: $f(x) = \frac{3}{8}x^2$ for $0 < x < 2$
▶ Cumulative distribution function:
$$F(x) = \frac{1}{8}x^3 \quad\Rightarrow\quad x = \sqrt[3]{8F(x)}$$
▶ Use random numbers to simulate:

F(x)    x
0.723   $x = \sqrt[3]{5.784} = 1.80$
0.850   $x = \sqrt[3]{6.80} = 1.89$
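The inverse-transform step for this density, as a one-line sketch: solving F(x) = u for x gives x = (8u)^(1/3). The function name is illustrative.

```python
def simulate_cubic(u):
    """Inverse transform for f(x) = 3x^2/8 on (0, 2): F(x) = x^3/8, so x = (8u)^(1/3)."""
    return (8 * u) ** (1 / 3)

for u in (0.723, 0.850):
    print(u, round(simulate_cubic(u), 2))
```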
Normal Distribution

▶ Normal Distribution: Continuous distribution that is symmetric about the mean, representing the distribution of many natural phenomena.
▶ Example: X ∼ N(30, 4), so µ = 30 and σ = 2
▶ Treat each random number as Φ(z) and find z using standard normal tables:

Φ(z)    z
0.382   z = −0.3
0.824   z = 0.931

▶ Use x = µ + σz = 30 + 2z to simulate:

29.4, 31.9
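For the normal case, Python's `statistics.NormalDist.inv_cdf` does the table lookup and the rescaling in one step. A sketch, reading N(30, 4) as variance 4, so σ = 2:

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=2)   # N(30, 4): variance 4, so sigma = 2
for u in (0.382, 0.824):
    # inv_cdf(u) = mu + sigma * z, where Phi(z) = u
    print(round(X.inv_cdf(u), 1))
```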
CENTRAL LIMIT THEOREM
Central Limit Theorem

▶ Central Limit Theorem (CLT): States that the distribution of the sample mean will approximate a normal distribution as the sample size becomes large, regardless of the original distribution of the population (provided the population variance is finite).
▶ Applies to both discrete and continuous distributions.
▶ Key points:
▶ The mean of the sample means will equal the population mean.
▶ The standard deviation of the sample means (standard error)
will equal the population standard deviation divided by the
square root of the sample size.
Mathematical Formulation

▶ Population: Mean (µ), Variance (σ²)
▶ Sample: Mean (X̄), Size (n)
▶ Distribution of the sample mean:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
▶ Standard Error:
$$SE = \frac{\sigma}{\sqrt{n}}$$
Example 1: Poisson Distribution
▶ Poisson Distribution: X ∼ Poisson(λ)
▶ Mean (µ = λ), Variance (σ² = λ)
▶ For a large sample size n:
$$\bar{X} \sim N\left(\lambda, \frac{\lambda}{n}\right)$$
▶ Example:
▶ λ = 4.5
▶ Sample size n = 30
▶ Distribution of sample mean:
$$\bar{X} \sim N\left(4.5, \frac{4.5}{30}\right) = N(4.5, 0.15)$$
▶ Probability X̄ > 5:
$$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.15}}\right) = P(Z > 1.29) \approx 0.098$$
Example 2: Binomial Distribution
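▶ Binomial Distribution: X ∼ B(m, p), where m is the number of trials (written m here so that n can denote the sample size)
▶ Mean (µ = mp), Variance (σ² = mpq), where q = 1 − p
▶ For a large sample size n:
$$\bar{X} \sim N\left(mp, \frac{mpq}{n}\right)$$
▶ Example:
▶ m = 9, p = 0.5, so µ = 4.5 and σ² = 9 × 0.5 × 0.5 = 2.25
▶ Sample size n = 30
▶ Distribution of sample mean:
$$\bar{X} \sim N\left(4.5, \frac{2.25}{30}\right) = N(4.5, 0.075)$$
▶ Probability X̄ > 5:
$$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.075}}\right) = P(Z > 1.83) \approx 0.034$$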
Example 3: Uniform Distribution
▶ Uniform Distribution: X ∼ U(a, b)
▶ Mean $\left(\mu = \frac{a+b}{2}\right)$, Variance $\left(\sigma^2 = \frac{(b-a)^2}{12}\right)$
▶ For a large sample size n:
$$\bar{X} \sim N\left(\frac{a+b}{2}, \frac{(b-a)^2}{12n}\right)$$
▶ Example:
▶ a = 2, b = 7
▶ Sample size n = 30
▶ Distribution of sample mean:
$$\bar{X} \sim N\left(4.5, \frac{25}{360}\right) = N(4.5, 0.0694)$$
▶ Probability X̄ > 5:
$$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{0.0694}}\right) = P(Z > 1.90) \approx 0.029$$
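The three normal approximations above can be reproduced numerically, and a small Monte Carlo run shows the CLT at work for the uniform case. This sketch uses only Python's standard library; the seed and the 20,000 repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, pvariance

def prob_mean_exceeds(c, mu, var, n):
    """Normal approximation to P(X̄ > c), with X̄ ~ N(mu, var / n)."""
    return 1 - NormalDist(mu, sqrt(var / n)).cdf(c)

p_poisson = prob_mean_exceeds(5, 4.5, 4.5, 30)            # Poisson(4.5)
p_binomial = prob_mean_exceeds(5, 9 * 0.5, 9 * 0.25, 30)  # B(9, 0.5)
p_uniform = prob_mean_exceeds(5, 4.5, 25 / 12, 30)        # U(2, 7)
print(round(p_poisson, 3), round(p_binomial, 3), round(p_uniform, 3))

# Monte Carlo check for U(2, 7): 20,000 sample means, each from n = 30 draws
rng = random.Random(1)
means = [fmean(rng.uniform(2, 7) for _ in range(30)) for _ in range(20_000)]
print(round(fmean(means), 3), round(pvariance(means), 4))  # near 4.5 and 25/360
print(sum(m > 5 for m in means) / len(means))             # near p_uniform
```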
Importance of the Central Limit Theorem

▶ Practical Applications:
▶ Helps in making inferences about population parameters.
▶ Basis for many statistical tests and confidence intervals.
▶ Simplifies analysis of large data sets.
▶ Assumptions and Conditions:
▶ Sample size should be sufficiently large (commonly n ≥ 30).
▶ The samples should be independent.
▶ The population should have a finite variance.
Central Limit Theorem in Action

▶ Real-world Example:
▶ Measuring the average height of adult males in a city.
▶ Population distribution might not be normal.
▶ By taking a large sample, the distribution of the sample mean
will be approximately normal.
▶ Steps:
1. Collect a large sample of heights.
2. Calculate the sample mean.
3. Use the normal distribution to make inferences about the
population mean.
Unbiased Estimates of Population Parameters

▶ Unbiased Estimate: An estimate of a population parameter is unbiased if the expected value of the estimate equals the true parameter value.
▶ Point Estimates:
▶ Proportion (p̂): Best unbiased estimate of population
proportion p.
▶ Mean (X̄ ): Best unbiased estimate of population mean µ.
▶ Variance (s²): Best unbiased estimate of population variance σ².
▶ Efficient Estimate: The most efficient estimate has the
smallest variance among all unbiased estimates.
Mathematical Formulation of Unbiased Estimates
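▶ Population Mean (estimated by the sample mean):
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
▶ Population Proportion (X = number of sampled units with the attribute):
$$\hat{p} = \frac{X}{n}$$
▶ Population Variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
▶ Alternative formula for variance:
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right)$$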
Example: Unbiased Estimate of the Mean
▶ Scenario: A railway enthusiast records the lateness of trains to the nearest minute.
▶ Data: Random sample of 50 journeys with recorded lateness (in minutes).
▶ Given: $\sum X = 738$, $\sum X^2 = 16{,}526$
▶ Calculate Mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{738}{50} = 14.76 \text{ minutes}$$
▶ Calculate Variance:
$$s^2 = \frac{1}{n-1}\left(\sum X^2 - \frac{(\sum X)^2}{n}\right) = \frac{1}{49}\left(16{,}526 - \frac{738^2}{50}\right)$$
$$s^2 = \frac{1}{49}(16{,}526 - 10{,}892.88) = \frac{5{,}633.12}{49} \approx 114.96 \text{ minutes}^2$$
Example: Unbiased Estimate of the Proportion

▶ Scenario: Proportion of trains that are more than 25 minutes late.
▶ Data: Out of 50 journeys, 7 trains were more than 25 minutes late.
▶ Calculate Proportion:
$$\hat{p} = \frac{7}{50} = 0.14$$
▶ Interpretation: An estimated 14% of journeys are more than 25 minutes late.
Unbiased Estimates of Population Variance

▶ Unbiased Estimate of Variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
▶ Example: Using the previous data, calculate the unbiased estimate of the variance:
$$s^2 = \frac{1}{49}\left(16{,}526 - \frac{738^2}{50}\right) \approx 114.96 \text{ minutes}^2$$
▶ Standard Deviation:
$$s = \sqrt{s^2} = \sqrt{114.96} \approx 10.72 \text{ minutes}$$
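The summary-statistic formulas used in the railway examples translate directly into code. A sketch, assuming only n, ΣX and ΣX² are available (the raw lateness values are not given); `estimates_from_sums` is a hypothetical helper.

```python
from math import sqrt

def estimates_from_sums(n, sum_x, sum_x2):
    """Unbiased estimates from summary statistics:
    mean = Σx / n, s² = (Σx² − (Σx)² / n) / (n − 1)."""
    mean = sum_x / n
    s2 = (sum_x2 - sum_x**2 / n) / (n - 1)
    return mean, s2

mean, s2 = estimates_from_sums(50, 738, 16_526)
p_hat = 7 / 50                      # proportion more than 25 minutes late
print(round(mean, 2), round(s2, 2), round(sqrt(s2), 2), p_hat)
# → 14.76 114.96 10.72 0.14
```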
Importance of Unbiased Estimates

▶ Accuracy: Unbiased estimates are correct on average; their expected value equals the population parameter.
▶ Efficiency: Efficient estimates have the smallest variance among unbiased estimates, leading to more precise estimates.
▶ Applications:
▶ Statistical analyses
▶ Quality control
▶ Policy making
INTERVAL ESTIMATES
What are Interval Estimates?

▶ Point Estimate: A single value estimate of a population parameter (e.g., sample mean).
▶ Interval Estimate: A range of values within which the population parameter is expected to lie with a certain level of confidence.
▶ Confidence Interval (CI): An interval estimate combined with a confidence level, typically expressed as a percentage (e.g., 95%).
▶ Confidence Level: The probability that the interval-estimation method produces an interval containing the population parameter (e.g., a 95% confidence level means that, over repeated samples, 95% of the intervals constructed this way would contain the true parameter).
Confidence Interval Applications

▶ Suppose we want to estimate the mean GPA of all the students at UOM. The mean GPA for all the students is an unknown population mean µ.
▶ How? Select a sample of students and compute the sample mean x̄. Assume x̄ is 2.80.
▶ Now, taking 2.80 as a point estimate of the population mean µ, we ask how accurate it is as an estimate of µ.
▶ By taking into account the variability from sample to sample, we can construct a CI estimate for µ to answer this question.
Constructing a Confidence Interval

▶ When we construct a CI, we indicate the confidence of correctly estimating the value of the population parameter.
▶ This allows us to say that there is a specified confidence that µ is somewhere in the range of numbers defined by the interval.
▶ After studying today’s lesson, we might find that a 95% CI for the mean GPA at UOM is:

2.75 ≤ µ ≤ 2.85

▶ We interpret this interval estimate by stating that we are 95% confident that the mean GPA at UOM is between 2.75 and 2.85.
Confidence Interval

▶ A CI uses sample data to estimate an unknown population parameter with an indication of how accurate the point estimate is and of how confident we are that the result is correct.
▶ The general form of a CI is:

Point estimate ± Margin of error

▶ The confidence level (CL) is the success rate of the method that produces the interval. That is, CL is the probability that the method will give a correct answer.
Formula for Confidence Interval

▶ General Form:
$$CI = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$$
▶ x̄ = sample mean
▶ z = z-score corresponding to the desired confidence level
▶ σ = population standard deviation
▶ n = sample size
▶ For unknown population standard deviation: Use the sample standard deviation (written σ̂ below) and the t-distribution:
$$CI = \bar{x} \pm t \cdot \frac{\hat{\sigma}}{\sqrt{n}}$$
▶ t = t-score from the t-distribution table corresponding to the desired confidence level and degrees of freedom (df = n − 1)
Confidence Intervals for Population Mean
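Population | Sample Size | Variance | C.I. Bounds
Normal | Any size | Known (σ²) | $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
Any | Large (n ≥ 30) | Known (σ²) | $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
Any | Large (n ≥ 30) | Unknown (σ̂²) | $\bar{x} \pm z\frac{\hat{\sigma}}{\sqrt{n}}$
Normal | Small (n < 30) | Unknown (σ̂²) | $\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}}$
Table: Confidence Intervals for Different Cases

Note
For the cases where the population variance is unknown, estimate it by
$$\hat{\sigma}^2 = \frac{n}{n-1}s^2$$
where s² here is the sample variance computed with divisor n, OR
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)$$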
Example - Population Mean (σ Known)

▶ The sample mean contribution of college students to a savings fund is 1768 dollars, and the population standard deviation is σ = 1483 dollars. Suppose that the sample size is only 177 students.
▶ (a) At 95% confidence, what is the margin of error?
▶ (b) Construct and interpret a 95% CI for the population mean.
Solution - Population Mean (σ Known)

▶ Given: x̄ = 1768, σ = 1483, n = 177
▶ Need to find: $M = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and 95% CI for µ (σ is known)
▶ α = 100% − 95% = 5% ⇒ α/2 = 2.5%
▶ $z_{\alpha/2} = z_{2.5\%} = 1.96$
▶ (a) $M = 1.96 \times \frac{1483}{\sqrt{177}} \approx 218.48$ (in dollars)
▶ (b) 95% CI for µ = x̄ ± M = 1768 ± 218.48 = [1549.52, 1986.48]
▶ Conclusion: We conclude with 95% confidence that the population mean savings fund contribution of college students is between 1549.52 and 1986.48 dollars.
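A sketch of the known-σ interval in Python; `z_interval` is an illustrative helper, and the critical value comes from `statistics.NormalDist().inv_cdf` rather than a table, so it is 1.95996… instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """CI for µ with known σ: x̄ ± z_{α/2} σ/√n. Returns (margin, (lower, upper))."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

margin, (lo, hi) = z_interval(1768, 1483, 177)
print(round(margin, 2), round(lo, 2), round(hi, 2))
```

With `z_interval(59, 5.5, 10, 0.99)` the same helper gives a margin of about 4.48 for the massage-therapy exercise.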
Your Turn

▶ A sample of 10 typical one-hour massage therapy sessions showed an average charge of 59 dollars. The population standard deviation for a one-hour session is σ = 5.50 dollars.
▶ (a) At 99% confidence, what is the margin of error?
▶ (b) Construct and interpret a 99% CI for the population mean.
▶ Answers (using z = 2.576):
▶ (a) 4.48
▶ (b) 54.52 ≤ µ ≤ 63.48 (in dollars)
CI for Population Mean (σ Unknown)

$$\bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$

▶ Use the t-distribution with n − 1 degrees of freedom.
▶ Example: Construct a 95% CI for the population mean with unknown σ.
Example - Population Mean (σ Unknown)

▶ Sales personnel for Skillings Distributors submit weekly reports listing the customer contacts made during the week. A sample of 15 weekly reports showed a sample mean of 19.5 customer contacts per week. The sample standard deviation was 5.2.
▶ (a) With 95% confidence, what is the margin of error?
▶ (b) Construct and interpret a 95% CI for the population mean
number of weekly customer contacts for the sales personnel.
Solution - Population Mean (σ Unknown)

▶ Given: n = 15, x̄ = 19.5, s = 5.2
▶ Need to find: $M = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$ and 95% CI for µ (σ is unknown)
▶ α = 100% − 95% = 5% ⇒ α/2 = 2.5%
▶ $t_{n-1,\alpha/2} = t_{14,2.5\%} = 2.145$
▶ (a) $M = 2.145 \times \frac{5.2}{\sqrt{15}} \approx 2.88$
▶ (b) 95% CI for µ = x̄ ± M = 19.5 ± 2.88 = [16.62, 22.38]
▶ Conclusion: We are 95% confident that the population mean
number of weekly customer contacts for the sales personnel is
between 16.62 and 22.38.
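For the σ-unknown case the only change is the critical value. Python's standard library has no t-distribution quantile function, so this sketch takes t as an argument read from the table (if SciPy is available, `scipy.stats.t.ppf(0.975, 14)` returns the same 2.145 to three decimals). `t_interval` is an illustrative helper.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """CI for µ with unknown σ: x̄ ± t_{n−1, α/2} s/√n.
    t_crit is read from a t-table. Returns (margin, (lower, upper))."""
    margin = t_crit * s / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

margin, (lo, hi) = t_interval(19.5, 5.2, 15, t_crit=2.145)   # t_{14, 0.025}
print(round(margin, 2), round(lo, 2), round(hi, 2))
```

The same helper with `(15, 1.5, 25, 2.064)` reproduces the 0.6192 margin in the petrol-consumption example.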
Your Turn

▶ A stationery store wants to estimate the mean retail value of greeting cards that it has in its inventory. A random sample of 20 greeting cards indicates a mean value of 2.55 dollars and a standard deviation of 0.44 dollars.
▶ (a) At 90% confidence, what is the margin of error?
▶ (b) Construct and interpret a 90% CI for the population mean.
▶ Answers:
▶ (a) 0.17
▶ (b) 2.38 ≤ µ ≤ 2.72 (in dollars)
CI for Population Proportion

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

▶ Example: Construct a 95% CI for a population proportion.

Example - Population Proportion

▶ In a survey of 1000 adults in the United States, 520 indicated that they had at least one alcoholic drink in the past 7 days. Compute a 95% CI for the population proportion of adults in the United States who had at least one drink in the past 7 days.
Solution - Population Proportion

▶ Given: n = 1000, p̂ = 520/1000 = 0.52
▶ Need to find: 95% CI for P
▶ α = 100% − 95% = 5% ⇒ α/2 = 2.5%
▶ $z_{\alpha/2} = z_{2.5\%} = 1.96$
▶ 95% CI for P:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.52 \pm 1.96\sqrt{\frac{0.52 \times (1-0.52)}{1000}} = 0.52 \pm 0.0310 \approx [0.4890, 0.5510]$$
▶ Conclusion: We are 95% confident that the population proportion of adults in the United States who had at least one drink in the past 7 days is between 0.4890 and 0.5510.
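A sketch of this normal-approximation (Wald) interval for a proportion; `proportion_interval` is an illustrative helper, and z is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation CI for p: p̂ ± z_{α/2} √(p̂(1 − p̂)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

print([round(b, 4) for b in proportion_interval(520, 1000)])
print([round(b, 3) for b in proportion_interval(276, 700, 0.99)])
```

With `(80, 200)`, that is p̂ = 0.4 and n = 200, it gives the diesel-owner interval (0.3321, 0.4679).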
Your Turn

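▶ In a survey of 700 workers in the United States, 276 indicated that they had at least one workplace injury in the past year. Compute a 99% CI for the population proportion of workers in the United States who had at least one workplace injury in the past year.
▶ Answers:
[0.347, 0.442]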
t-DISTRIBUTION
Introduction to the t-Distribution

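▶ t-Distribution: Used when the population standard deviation is unknown and must be estimated by the sample standard deviation s, especially when the sample size is small (n < 30); the population is assumed to be normally distributed.
▶ Degrees of Freedom (df): The t-distribution has df = n − 1.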
▶ Characteristics:
▶ Symmetric and bell-shaped, like the normal distribution.
▶ More spread out than the normal distribution, especially for
small sample sizes.
▶ Approaches the normal distribution as the sample size
increases.
Example of the t-Distribution
Problem:
▶ For a sample of size 8, T follows a t-distribution with 7 degrees of freedom.
▶ Find the 95% confidence interval for the population mean.
▶ Given: Sample mean X̄ = 20, Sample standard deviation s = 2.
Solution:
▶ Step 1: Identify the t-value for a 95% confidence level with df = 7. From the t-distribution table, $t_{0.025,7} = 2.365$.
▶ Step 2: Calculate the standard error (SE):
$$SE = \frac{s}{\sqrt{n}} = \frac{2}{\sqrt{8}} = \frac{2}{2.828} = 0.707$$
▶ Step 3: Calculate the margin of error (ME):
$$ME = t \cdot SE = 2.365 \cdot 0.707 = 1.672$$
▶ Step 4: Determine the confidence interval:
$$CI = \bar{x} \pm ME = 20 \pm 1.672$$
$$CI = (18.328, 21.672)$$
Example: Mean Petrol Consumption
Problem:
▶ Calculate a 95% confidence interval for the mean petrol
consumption (in kilometers per liter) of cars of a specific type.
▶ Given:
▶ Sample mean (x̄) = 15 km/l
▶ Sample standard deviation (s) = 1.5 km/l
▶ Sample size (n) = 25
Solution:
▶ Step 1: Identify the t-score for a 95% confidence level with
df = n − 1 = 24. From the t-distribution table,
t0.025,24 ≈ 2.064.
▶ Step 2: Calculate the standard error (SE):
$$SE = \frac{s}{\sqrt{n}} = \frac{1.5}{\sqrt{25}} = \frac{1.5}{5} = 0.3$$
▶ Step 3: Calculate the margin of error (ME):

ME = t · SE = 2.064 · 0.3 = 0.6192


▶ Step 4: Determine the confidence interval:

CI = x̄ ± ME = 15 ± 0.6192

CI = (14.3808, 15.6192)
▶ Therefore, the 95% confidence interval for the mean petrol
consumption is (14.3808, 15.6192) km/l.
Confidence Interval for Proportion

▶ Formula:
$$CI = \hat{p} \pm z \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
▶ p̂ = sample proportion
▶ z = z-score corresponding to the desired confidence level
▶ n = sample size
Example: Proportion of Diesel Car Owners
Problem:
▶ Calculate a 95% confidence interval for the proportion of
diesel car owners in a survey.
▶ Given:
▶ Sample proportion (p̂) = 0.4
▶ Sample size (n) = 200
Solution:
▶ Step 1: Identify the z-score for a 95% confidence level. From
the z-table, z0.025 = 1.96.
▶ Step 2: Calculate the standard error (SE):
$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.4(1-0.4)}{200}} = \sqrt{\frac{0.24}{200}} = \sqrt{0.0012} = 0.03464$$
▶ Step 3: Calculate the margin of error (ME):

ME = z · SE = 1.96 × 0.03464 = 0.0679


▶ Step 4: Determine the confidence interval:

CI = p̂ ± ME = 0.4 ± 0.0679

CI = (0.3321, 0.4679)
▶ Therefore, the 95% confidence interval for the proportion of
diesel car owners is (0.3321, 0.4679).
Confidence Interval for Variance

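▶ Formula:
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$$
▶ s² = sample variance
▶ χ² = chi-square value corresponding to the desired confidence level and degrees of freedom (df = n − 1); $\chi^2_{\alpha/2}$ has upper-tail area α/2, so it is the larger of the two values
▶ n = sample size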
Example: Variance of Battery Life
Problem:
▶ Calculate a 95% confidence interval for the variance of battery
life (in hours) of a new type of battery.
▶ Given:
▶ Sample variance (s²) = 10 hours²
▶ Sample size (n) = 15
Solution:
▶ Step 1: Identify the chi-square values for a 95% confidence level with df = n − 1 = 14. From the chi-square table, $\chi^2_{0.025,14} = 26.119$ and $\chi^2_{0.975,14} = 5.629$.
▶ Step 2: Calculate the confidence interval for the variance:
$$CI = \left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right) = \left(\frac{14 \cdot 10}{26.119}, \frac{14 \cdot 10}{5.629}\right) = \left(\frac{140}{26.119}, \frac{140}{5.629}\right)$$
$$CI = (5.36, 24.87)$$
▶ Therefore, the 95% confidence interval for the variance of battery life is (5.36, 24.87) hours².
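The same calculation as a sketch. The chi-square critical values are passed in from the table because Python's standard library has no chi-square quantile function (SciPy's `scipy.stats.chi2.ppf` provides one, if available); `variance_interval` is an illustrative helper.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for σ²: ((n − 1)s² / χ²_{α/2}, (n − 1)s² / χ²_{1−α/2}), df = n − 1.
    chi2_upper has upper-tail area α/2 (the larger table value)."""
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower

lo, hi = variance_interval(10, 15, chi2_upper=26.119, chi2_lower=5.629)
print(round(lo, 2), round(hi, 2))
```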
HYPOTHESIS TESTING
What is Hypothesis Testing?

▶ Hypothesis Testing: A statistical method used to make decisions about population parameters based on sample data.
▶ Null Hypothesis (H0): The hypothesis that there is no effect or no difference; it is the default or starting assumption.
▶ Alternative Hypothesis (H1): The hypothesis that there is an effect or a difference; it is the claim we seek evidence for.
▶ Significance Level (α): The probability of rejecting the null hypothesis when it is actually true, typically set at 0.05 (5%).
▶ p-value: The probability of obtaining test results at least as
extreme as the observed results, under the assumption that
the null hypothesis is correct.
Types of Hypothesis Tests (based on the number of tails)
▶ One-tailed Test: Tests for an effect in one direction (e.g., H1 : µ > µ0).
▶ Two-tailed Test: Tests for an effect in either direction (e.g., H1 : µ ≠ µ0).
Types of Hypothesis Tests (based on the sample size)

▶ z-tests: Used when the population variance is known or the sample size is large (n ≥ 30).
▶ t-tests: Used when the population variance is unknown and
the sample size is small (n < 30).
Steps in Hypothesis Testing
1. State the hypotheses: Define the null and alternative
hypotheses.
2. Choose the significance level: Decide on α, the probability
of making a Type I error.
3. Select the appropriate test: Based on the sample size and
variance information, choose between z-test or t-test.
4. Calculate the test statistic: Use the sample data to
calculate the test statistic (z or t).
5. Determine the p-value or critical value (CV): Compare
the test statistic to the critical value or calculate the p-value.
6. Make a decision: Reject H0 if the test statistic is in the
critical region or if the p-value is less than α; otherwise, do
not reject H0 .
7. Draw a conclusion: Relate the statistical decision to the
context of the original problem.
Rejection Region in Hypothesis Testing
The set of all test statistic values for which we would reject the null hypothesis.
▶ Stated in terms of a critical value (CV), which depends on
assumptions about your data.
▶ H0 is rejected if and only if the observed/computed
value of the test statistic falls in the rejection region.
Hypothesis Test 1: z-tests and t-tests

Standardized test statistics:

Test 1: Testing an unknown population mean µ, H0 : µ = µ0
When σ² is known:
▶ 1a: X is normally distributed, X ∼ N(µ, σ²). Under H0,
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, where Z ∼ N(0, 1)
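▶ 1b: X is not normally distributed

For large samples of size n, by the central limit theorem, under H0,
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \text{ approximately}$$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, where Z ∼ N(0, 1) approximately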
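When σ² is unknown:
▶ 1c: X is preferably normally distributed. For large n, under H0, with σ estimated by s,
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \text{ approximately}$$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$, where Z ∼ N(0, 1) approximately
▶ 1d: X is normally distributed, X ∼ N(µ, σ²). For small n,
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$, where T ∼ t(n − 1)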
Hypothesis Test 2: Testing a binomial proportion p, where X ∼ B(n, p)

Standardized test statistics:

Testing a binomial proportion p, H0 : p = p0, where X ∼ B(n, p)

▶ X is the number of successes in n trials.
▶ If n is large such that np > 5 and nq > 5 (q = 1 − p), then approximately X ∼ N(np, npq).
▶ Under H0, p = p0 and q0 = 1 − p0:
Test statistic: $Z = \dfrac{X - np_0}{\sqrt{np_0q_0}}$, where Z ∼ N(0, 1) approximately
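Hypothesis Test 3: Testing µ1 − µ2, the difference between means of two normal distributions
3a: σ1², σ2² known
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$, where Z ∼ N(0, 1)

3b: Common population variance σ² known
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$
Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where Z ∼ N(0, 1)

3c: Common population variance σ² unknown
Use the pooled estimate
$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{\sum (x_{1i} - \bar{X}_1)^2 + \sum (x_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}$$
where s1², s2² are the sample variances computed with divisors n1 and n2.
When n is large
Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where Z ∼ N(0, 1) approximately
When n is small
Test statistic: $T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where T ∼ t(n1 + n2 − 2)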
Z-TEST
One-Tailed Z-Test

S. L.   Lower tail                  Upper tail
10%     Reject H0 if z < −1.282     Reject H0 if z > 1.282
5%      Reject H0 if z < −1.645     Reject H0 if z > 1.645
1%      Reject H0 if z < −2.326     Reject H0 if z > 2.326
Table: Critical values for one-tailed tests (S. L. stands for Significance Level)
Two-Tailed Z-Test

S. L.   Two-tailed test
10%     Reject H0 if z < −1.645 or z > 1.645 (written |z| > 1.645)
5%      Reject H0 if z < −1.96 or z > 1.96 (written |z| > 1.96)
1%      Reject H0 if z < −2.576 or z > 2.576 (written |z| > 2.576)
Table: Critical values for two-tailed tests (S. L. stands for Significance Level)
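Rather than memorising the tables, the critical values can be recovered from the standard normal inverse CDF. A sketch with `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# alpha -> (one-tailed critical value, two-tailed critical value)
critical = {
    alpha: (std_normal.inv_cdf(1 - alpha), std_normal.inv_cdf(1 - alpha / 2))
    for alpha in (0.10, 0.05, 0.01)
}

for alpha, (one_tailed, two_tailed) in critical.items():
    print(f"{alpha:.0%}: one-tailed {one_tailed:.3f}, two-tailed {two_tailed:.3f}")
```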
Example: z-test for Mean
Problem:
▶ The mean volume of liquid in packs is 524 ml with a standard
deviation of 3 ml.
▶ After repair, the mean volume of 50 packs is 524.9 ml.
▶ Test at the 5% significance level whether the mean volume
has increased.
Solution:
▶ Step 1: State the hypotheses:
H0 : µ = 524 (null hypothesis)
H1 : µ > 524 (alternative hypothesis)
▶ Step 2: Choose the significance level: α = 0.05
▶ Step 3: Select the appropriate test: z-test (large sample,
known variance)
▶ Step 4: Calculate the test statistic:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{524.9 - 524}{3/\sqrt{50}} = \frac{0.9}{0.424} \approx 2.12$$
▶ Step 5: Determine the critical value: For α = 0.05,
z0.05 = 1.645
▶ Step 6: Make a decision: Since z = 2.12 > 1.645, reject
H0 .
▶ Step 7: Draw a conclusion: There is sufficient evidence at
the 5% significance level to conclude that the mean volume
has increased.
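Steps 4 to 6 as a sketch; `z_test_mean` is a hypothetical helper that also returns the upper-tail p-value, which gives the same decision as comparing z with 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n):
    """Test statistic z = (x̄ − µ0)/(σ/√n) and upper-tail p-value P(Z > z)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)

z, p_value = z_test_mean(524.9, 524, 3, 50)
print(round(z, 2), round(p_value, 4))   # reject H0 at 5% if z > 1.645 (equivalently p < 0.05)
```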
T-TEST FOR MEAN
Example: t-test for Mean
Problem:
▶ The average volume in a sample of 10 jars of marmalade is
454.8 g.
▶ Standard deviation of the sample is 0.8 g.
▶ Test at the 5% significance level whether the mean volume
has changed from 455 g.
Solution:
▶ Step 1: State the hypotheses:
H0 : µ = 455 (null hypothesis)
H1 : µ ̸= 455 (alternative hypothesis)
▶ Step 2: Choose the significance level: α = 0.05
▶ Step 3: Select the appropriate test: t-test (small sample,
unknown variance)
▶ Step 4: Calculate the test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{454.8 - 455}{0.8/\sqrt{10}} = \frac{-0.2}{0.253} \approx -0.79$$
▶ Step 5: Determine the critical value: For α = 0.05 and
df = 9, t0.025,9 = 2.262
▶ Step 6: Make a decision: Since |t| = 0.79 < 2.262, do not
reject H0 .
▶ Step 7: Draw a conclusion: There is not sufficient evidence at the 5% significance level to conclude that the mean volume has changed from 455 g.
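The t statistic from Step 4 and the two-tailed decision as a sketch; the critical value 2.262 is taken from the t-table as in Step 5, since the standard library cannot compute it. `t_statistic` is an illustrative helper.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic (x̄ − µ0)/(s/√n), with n − 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))

t = t_statistic(454.8, 455, 0.8, 10)
t_crit = 2.262   # t_{9, 0.025} from a t-table (two-tailed test at 5%)
print(round(t, 2), "reject H0" if abs(t) > t_crit else "do not reject H0")
```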
Example: Hypothesis Test for Proportion
Problem:
▶ Of 180 survey respondents, 134 favor a proposal.
▶ Test at the 5% significance level whether the proportion of the population favoring the proposal is greater than 70%.
Solution:
▶ Step 1: State the hypotheses:
H0 : p = 0.7 (null hypothesis)
H1 : p > 0.7 (alternative hypothesis)
▶ Step 2: Choose the significance level: α = 0.05
▶ Step 3: Select the appropriate test: z-test for proportion
▶ Step 4: Calculate the test statistic:
$$\hat{p} = \frac{134}{180} \approx 0.7444$$
$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.7444 - 0.7}{\sqrt{\frac{0.7(0.3)}{180}}} \approx 1.30$$
▶ Step 5: Determine the critical value: For α = 0.05, $z_{0.05} = 1.645$
▶ Step 6: Make a decision: Since z = 1.30 < 1.645, do not reject H0.
▶ Step 7: Draw a conclusion: There is not sufficient evidence at the 5% significance level to conclude that more than 70% of the population favors the proposal.
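The proportion test statistic as a sketch; `z_test_proportion` is illustrative and uses the hypothesised p (0.7) in the standard error, as the worked example does.

```python
from math import sqrt

def z_test_proportion(successes, n, p0):
    """z = (p̂ − p0)/√(p0(1 − p0)/n), using the H0 value p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = z_test_proportion(134, 180, 0.7)
print(round(z, 2))   # compare with the one-tailed 5% critical value 1.645
```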
TYPE-I & II ERRORS
Type I and Type II Errors
▶ Type I Error: Rejecting the null hypothesis when it is
actually true.
▶ Type II Error: Failing to reject the null hypothesis when it is
actually false.
▶ Significance Level (α): The probability of making a Type I
error.
▶ Power of the Test: The probability of correctly rejecting the
null hypothesis (1 - probability of Type II error).

                          Truth about the population
Decision based on sample  H0 true             H0 false (H1 true)
Reject H0                 Type I error        Correct decision
Fail to reject H0         Correct decision    Type II error
Table: Type I and Type II Errors in Hypothesis Testing
Example: Type I and Type II Errors

Problem:
▶ A random variable has a normal distribution with mean µ and
standard deviation 3.
▶ Test H0 : µ = 20 against H1 : µ > 20 using a sample size of
25.
▶ Reject H0 if the sample mean is greater than 21.4.
Solution:
▶ Calculate the probability of Type I error:
$$P(\text{Type I error}) = P(\bar{X} > 21.4 \mid \mu = 20) = P\left(Z > \frac{21.4 - 20}{3/\sqrt{25}}\right) = P(Z > 2.33) = 0.01$$
▶ Calculate the probability of Type II error when µ = 21:
$$P(\text{Type II error} \mid \mu = 21) = P(\bar{X} \le 21.4 \mid \mu = 21) = P\left(Z \le \frac{21.4 - 21}{3/\sqrt{25}}\right) = P(Z \le 0.67) = 0.75$$
▶ Power of the test (when µ = 21):
$$\text{Power} = 1 - P(\text{Type II error}) = 1 - 0.75 = 0.25$$
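The error probabilities in this example can be checked with `statistics.NormalDist`. A sketch; the unrounded values differ slightly from the table-based 0.01 and 0.75.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, cutoff = 3, 25, 21.4
se = sigma / sqrt(n)                               # standard error = 0.6

alpha = 1 - NormalDist(20, se).cdf(cutoff)         # P(X̄ > 21.4 | µ = 20)
beta = NormalDist(21, se).cdf(cutoff)              # P(X̄ ≤ 21.4 | µ = 21)
power = 1 - beta
print(round(alpha, 4), round(beta, 4), round(power, 4))
```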


Relationship between Type I and Type II Error Probabilities

▶ For a fixed sample size, the smaller we specify the level of significance, α, the larger will be the probability, β, of not rejecting a false H0.
▶ Type I error is a “false alarm” and Type II error is a “missed opportunity” to take some corrective action.
▶ In many applications, a Type I error is treated as the more serious of the two, which is why its probability is fixed in advance.
▶ We specify the level of significance before we perform the hypothesis test, so we directly control the risk of committing a Type I error.
Complements of Type I and Type II Errors

▶ The complement of the probability of a Type I error, (1 − α), is called the confidence coefficient.
▶ The confidence coefficient is the probability that we will not reject H0 when it is true.
▶ The complement of the probability of a Type II error, (1 − β), is called the power of a statistical test.
▶ The power of a statistical test is the probability that we will reject H0 when it is false.
Possible Conclusions

▶ Suppose that a hypothesis test is conducted at a small significance level.
▶ If H0 is rejected, we conclude that the data provide sufficient
evidence to support H1 .
▶ If H0 is not rejected, we conclude that the data do not provide
sufficient evidence to support H1 .
SAMPLE SIZE
Determining Sample Size
▶ Sample size determination for µ
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{M^2}$$
▶ Sample size determination for P
$$n = \frac{z_{\alpha/2}^2\,P(1-P)}{M^2}$$
▶ M = margin of error, the maximum allowable sampling error
▶ Sampling error is the error caused by observing a sample instead of the whole population.
▶ It represents the difference between a sample statistic (e.g., sample mean) and the corresponding population parameter (e.g., population mean).
▶ M is the maximum difference between the sample statistic and the population parameter that the researcher is willing to accept.
▶ It is chosen to achieve a specified level of confidence and precision in the estimates.
Note on Sample Size Determination

▶ When determining sample size for µ, we use z instead of t because, to determine the critical value of t, we need to know the sample size, but we do not know it yet.
▶ Note that we rarely know the population parameters. So, how do we find those?
▶ In some instances, we may have past data or relevant experience that provides an estimate of parameters.
▶ If we do not have past data or relevant experience, we can choose parameter values that will not underestimate the required sample size.
▶ For example:
▶ To determine sample size for µ, we can estimate σ as the
range of the variable divided by 6.
▶ To determine sample size for P, we can use P = 0.5.
Example - Sample Size Determination

1. If a quality control manager wants to estimate, with 95% confidence, the mean life of light bulbs to within ±20 hours and also assumes that the population standard deviation is 100 hours, how many light bulbs need to be selected?
2. A survey found that 66% of USA adults used credit cards for
convenience. To conduct a follow-up study that would provide
99% confidence that the point estimate is correct to within
±0.03 of the population proportion, how many people need to
be sampled?
Solution

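1. Estimating mean life of light bulbs
▶ α = 100% − 95% = 5% ⇒ α/2 = 2.5% ⇒ z2.5% = 1.96
▶ Required sample size, rounded up to the next whole bulb:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{M^2} = \frac{1.96^2 \times 100^2}{20^2} = \frac{3.8416 \times 10000}{400} = 96.04 \;\Rightarrow\; 97$$
▶ Number of light bulbs to be selected = 97
2. Estimating population proportion of credit card users
▶ α = 100% − 99% = 1% ⇒ α/2 = 0.5% ⇒ z0.5% = 2.576
▶ Required sample size, rounded up to the next whole person:
$$n = \frac{z_{\alpha/2}^2\,P(1-P)}{M^2} = \frac{2.576^2 \times 0.66 \times (1 - 0.66)}{0.03^2} \approx 1654.52 \;\Rightarrow\; 1655$$
▶ Number of people to be sampled = 1655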
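Both sample-size formulas as a sketch, rounding up with `math.ceil` so the margin of error is not exceeded; the helper names are illustrative.

```python
from math import ceil

def n_for_mean(z, sigma, margin):
    """n = z² σ² / M², rounded up to the next whole unit."""
    return ceil(z**2 * sigma**2 / margin**2)

def n_for_proportion(z, p, margin):
    """n = z² p(1 − p) / M², rounded up; p = 0.5 gives the most conservative n."""
    return ceil(z**2 * p * (1 - p) / margin**2)

print(n_for_mean(1.96, 100, 20))              # light bulbs
print(n_for_proportion(2.576, 0.66, 0.03))    # credit-card follow-up study
```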
