Module 2 - Hypothesis Testing_afterclass
Today’s Objectives
1. Intro to Hypothesis Testing
2. Conduct a one-tailed upper and lower hypothesis test for the true
population mean µ
– when sigma is known
Motivating Example
Upon analyzing the data over the past three years, you find out that the
sample mean return of the scheme is $205 and it is known that the
population standard deviation is $65.
Suppose you only want to invest in the scheme if its population mean
return is more than $200. Should you invest or not?
Making Judgment Using Hypothesis Testing
– It is not possible to ascertain the truthfulness of a certain claim or
hypothesis with 100% certainty. No such thing as perfect judgment.
Null and Alternative Hypotheses
Start with the Alternative
Null Hypothesis, H0
• The opposite of HA
• Status quo, what is assumed about the population
Alternative Hypothesis, HA
• “Research hypothesis” or the claim you are testing
• Challenges the presumed value of the null hypothesis
• The claim to be supported
The Logic of Hypothesis Testing
Assume H0 is true.
Test if H0 is likely to be true given the data you have.
If no evidence to contradict H0, we do not reject H0.
A jury member: The defendant is assumed innocent first. Collect evidence to see if you
have “enough” evidence to reject the initial assumption.
The Logic of Hypothesis Testing
Assume H0 is true.
Test if H0 is likely to be true given the data you have.
If no evidence to contradict H0, we do not reject H0.
If evidence exists to contradict H0, we reject H0.
A researcher wants to test a new finding/hypothesis.
Examples of H0 and HA
Suppose a baker claims that his bread height is more than 15 cm, on average.
Several of his customers do not believe him. To persuade his customers that he
is right, the baker decides to do a hypothesis test.
He bakes 9 loaves of bread. The mean height of the sample loaves is 17 cm.
The baker knows from baking hundreds of loaves of bread that the standard
deviation for the height is 0.5 cm, and the distribution of heights is normal.
H0: µ ≤ 15
HA: µ > 15
Distribution needed for Hypothesis Testing
§ Perform tests of a population mean using a normal distribution or a
Student's t-distribution.
§ If you are testing a population mean, the distribution for the test is:
– We can use the normal distribution when the population standard deviation is
known.
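Assuming µ = µ0 (the boundary value of H0):
$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$
Given a specific sample mean $\bar{x}$, define z-score $= \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.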
Rare events and the Sample
§ Suppose you make an assumption about a property of the population (this
assumption is the null hypothesis). Then you gather sample data randomly.
If the sample has properties that would be very unlikely to occur if the
assumption is true, then you would conclude that your assumption about
the population is probably incorrect.
§ Use the sample data to calculate the actual probability of getting the test
result, called the p-value.
– The p-value is P(the results from another randomly selected sample will be as
extreme as or more extreme than the results obtained from the given sample |
null hypothesis is true).
– A large p-value calculated from the data indicates that we should not reject the
null hypothesis.
– Draw a graph that shows the p-value. The hypothesis test is easier to perform if
you use a graph because you see the problem more clearly.
Example of p-value
H0: µ ≤ 15
HA: µ > 15
Suppose a baker claims that his bread height is more than 15 cm, on average.
Several of his customers do not believe him. To persuade his customers that he
is right, the baker decides to do a hypothesis test. He bakes 9 loaves of bread.
The mean height of the sample loaves is 17 cm. The baker knows from baking
hundreds of loaves of bread that the standard deviation for the height is 0.5 cm,
and the distribution of heights is normal.
The p-value, then, is the probability that a sample mean is the same or greater
than 17 cm, when the population mean is, in fact, 15 cm.
$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$
z-score $= \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{17 - 15}{0.5/\sqrt{9}} = 12$
p-value $= P(\bar{X} > 17) = P(Z > 12)$
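The baker's z-score and upper-tail p-value can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative and not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the baker example
mu_0 = 15.0    # boundary value of H0 (cm)
x_bar = 17.0   # sample mean (cm)
sigma = 0.5    # known population standard deviation (cm)
n = 9          # number of loaves

standard_error = sigma / sqrt(n)            # 0.5 / 3 = 1/6
z_score = (x_bar - mu_0) / standard_error   # about 12
p_value = 1.0 - NormalDist().cdf(z_score)   # upper tail, P(Z > z-score)

print(f"z-score = {z_score:.2f}")
print(f"p-value = {p_value:.3f}")
```

A z-score this far in the tail gives a p-value that is zero to any displayed precision.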
Decision and Conclusion
§ A systematic way to decide whether to reject or not reject the
null hypothesis is to compare the p-value with a preset α
(also called the "significance level").
– Do not reject H0 if p-value > α.
– Reject H0 if p-value ≤ α.
§ The p-value does NOT tell you the chance that the null
hypothesis is wrong.
Example: Baking
The baker's test from the earlier slides (9 loaves, sample mean 17 cm, known
standard deviation 0.5 cm, normally distributed heights) is now carried out
at α = 0.05.
Baking
Hypotheses:
H0: µ ≤ 15
HA: µ > 15
Hence it’s an upper-tail test
Data
n = 9, x̅ = 17, σ = 0.5
SE = 0.5/√9 = 1/6
α = 0.05
Picture: upper-tail area P(X̅ > 17)
Test Statistic
z-score $= \frac{\bar{x} - \mu_0}{SE} = \frac{17 - 15}{1/6} = 12$
p-Value
P(Z > z-score) = 1 – P(Z ≤ z-score)
= 1 – P(Z ≤ 12)
= 1 – norm.dist(12, 0, 1, 1)
= 0.000
p-value = 0.000 is smaller than α = 0.05
Reject H0
Student's t-distribution
§ X1, …, Xn are a random sample (independent and identically
distributed) from a population with mean µ0 and a possibly unknown
standard deviation σ
$t_{n-1} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
The $t_{n-1}$ statistic has a t distribution with n-1 degrees of freedom (df).
s: sample standard deviation.
Need the population to have a normal distribution or n > 30.
Given a specific sample mean $\bar{x}$, define t-score $= \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
2. Conduct a one-tailed upper and lower hypothesis test for the true
population mean µ
– when sigma is known (Normal)
– when sigma is unknown
3. Conduct two-tailed hypothesis tests for the true population mean µ
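Statistics & p-Values For One-Sided Tests (with known σ)
Lower Tail Test
HA: µ < µ0 (µ0 is given in the problem statement)
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$; Z has a standard normal distribution
z-score $= \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
P-value: P(Z < z-score) = norm.dist(z-score, 0, 1, 1)
Upper Tail Test
HA: µ > µ0 (µ0 is given in the problem statement)
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$; Z has a standard normal distribution
z-score $= \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
P-value: P(Z > z-score) = 1-norm.dist(z-score, 0, 1, 1)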
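The two one-sided p-value formulas can be wrapped in one small helper. The sketch below assumes Python's standard library `statistics.NormalDist` as a stand-in for the spreadsheet function `norm.dist(z, 0, 1, 1)`; the function name `one_sided_z_pvalue` and the numbers in the usage line are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_pvalue(x_bar, mu_0, sigma, n, tail):
    """Return (z_score, p_value) for a one-sided z-test with known sigma.

    tail is "lower" for HA: mu < mu_0 and "upper" for HA: mu > mu_0.
    """
    z_score = (x_bar - mu_0) / (sigma / sqrt(n))
    lower_area = NormalDist().cdf(z_score)   # P(Z <= z-score)
    if tail == "lower":
        return z_score, lower_area
    if tail == "upper":
        return z_score, 1.0 - lower_area
    raise ValueError("tail must be 'lower' or 'upper'")

# Hypothetical numbers, not from the slides
z, p = one_sided_z_pvalue(x_bar=98.0, mu_0=100.0, sigma=10.0, n=25, tail="lower")
print(f"z-score = {z:.2f}, p-value = {p:.4f}")
```

Keeping the tail as an explicit argument makes the direction of HA visible at the call site, which is where most one-sided mistakes happen.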
Exercise: Manufacturing
A manufacturing process drills a hole in a metal plate. The mean dimension
of the hole is specified to be at most 8 mm. The population standard
deviation is known to be 0.001 mm. A random sample of 9 holes had a
sample mean of 8.0007 mm. Test whether the mean size of the holes has increased,
at α = 0.05. Assume that the dimensions of holes are Normally distributed.
Manufacturing
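Hypotheses:
H0: µ ≤ 8
HA: µ > 8
Hence it’s an upper-tail test
Data
n = 9, x̅ = 8.0007, σ = 0.001, α = 0.05
Picture
z-score = (8.0007 − 8)/(0.001/√9) = 2.1
P-value = P(Z>2.1) = 0.018
Since 0.018 is smaller than α = 0.05, reject H0.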
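The manufacturing exercise can be verified numerically. The following is a minimal sketch with Python's standard library; the variable names are illustrative, and the decision line simply mirrors the rule "do not reject H0 if p-value > α".

```python
from math import sqrt
from statistics import NormalDist

# Values from the exercise statement
mu_0 = 8.0       # specified maximum mean hole size (mm)
x_bar = 8.0007   # sample mean (mm)
sigma = 0.001    # known population standard deviation (mm)
n = 9
alpha = 0.05

z_score = (x_bar - mu_0) / (sigma / sqrt(n))   # about 2.1
p_value = 1.0 - NormalDist().cdf(z_score)      # upper tail, P(Z > z-score)
reject_h0 = not (p_value > alpha)

print(f"z-score = {z_score:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```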
2. Conduct a one-tailed upper and lower hypothesis test for the true
population mean µ
– when sigma is known
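Statistics & p-Values For One-Sided Tests (with unknown σ)
Type, Test Statistic (unknown σ), P-Value
Lower Tail Test
HA: µ < µ0 (µ0 is given in the problem statement)
t-score $= \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
P-value: P(t_{n-1} < t-score) = t.dist(t-score, n-1, 1)
Upper Tail Test
HA: µ > µ0 (µ0 is given in the problem statement)
t-score $= \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
P-value: P(t_{n-1} > t-score) = 1-t.dist(t-score, n-1, 1)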
Cadillac Buyers
It is presumed the average Cadillac driver is over 50 years old. But you
hypothesize that, as a result of recent successful marketing efforts, the
mean age of a Cadillac driver is actually younger than 50 years old. To
test your hypothesis, you sample 36 drivers and find a sample mean
age of 45 years and a sample standard deviation of 12 years. Conduct
a hypothesis test and determine: is there evidence to support your
hypothesis at the chosen significance level α?
H0: µ ≥ 50
HA: µ < 50
t-score = (45 − 50)/(12/√36) = −2.5, df = 35
p-value = P(t35 < −2.5)
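The lower-tail probability P(t35 < −2.5) for the Cadillac age test can be approximated without a statistics package. The sketch below integrates the Student's t density with Simpson's rule using only the standard library; `t_pdf` and `t_cdf` are hypothetical helper names, and in practice a library routine such as `scipy.stats.t.cdf` or the spreadsheet function `t.dist` would normally be used instead.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) by Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                     # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# Values from the Cadillac age example
x_bar, mu_0, s, n = 45.0, 50.0, 12.0, 36
t_score = (x_bar - mu_0) / (s / sqrt(n))     # -5 / 2 = -2.5
p_value = t_cdf(t_score, n - 1)              # lower tail, like t.dist(t, n-1, 1)

print(f"t-score = {t_score:.2f}, p-value = {p_value:.4f}")
```

The resulting p-value is below 0.01, so H0 would be rejected at either of the common levels α = 0.05 or α = 0.01.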
Cadillac Buyers Income: Upper Tail Test
Hypotheses:
H0: µ ≤ 75000
HA: µ > 75000
Hence it’s an upper-tail test
Data
n=36, x̅ = 79,000, s=15,000
SE = 15000/√36 = 2500
df = n-1 = 35
α = 0.05
Picture
tstat = (79000 − 75000)/2500 = 1.6
The p-value is larger than α = 0.05
Do Not Reject H0
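Written out in full, the income test's arithmetic looks as follows. The comparison with the critical value assumes the standard t-table entry t(0.05, 35) ≈ 1.690, which is not shown on the slide.

```latex
t_{\text{stat}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
                = \frac{79000 - 75000}{15000/\sqrt{36}}
                = \frac{4000}{2500} = 1.6,
\qquad
1.6 < t_{0.05,\,35} \approx 1.690
\;\Longrightarrow\;
P(t_{35} > 1.6) > 0.05 = \alpha .
```

Because the test statistic falls short of the critical value, the upper-tail p-value exceeds α and H0 is not rejected.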
Today’s Objectives
1. Intro to Hypothesis Testing
2. Conduct a one-tailed upper and lower hypothesis test for the true
population mean µ
– when sigma is unknown
Test Statistics & p-Values For Two-Tailed Tests
Type: Two-Tailed Test when σ is known
HA: µ ≠ µ0 (µ0 is given in the problem statement)
Test Statistic: z-score $= \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$; Z has standard normal distribution
P-Value: P(Z < -|z-score|) + P(Z > |z-score|)
= 2*P(Z < -|z-score|)
= 2*norm.dist(-|z-score|, 0, 1, 1)
Picture: the p-value is the area of the tails shaded in red
HK pop (sigma is known)
HK pop plans to introduce a new song this winter with a TikTok video. In the
past, their videos received 55,000 daily views on average (daily population
mean). The number of daily views for 25 days had a sample mean of 57,000,
and the population standard deviation is 15,000. Test whether the population
mean number of daily views is different from the past at the 0.05 level of
significance. Assume that the number of daily views is Normally
distributed.
HK pop
Hypotheses:
H0: µ = 55,000
HA: µ ≠ 55,000
“Two-Tailed Test”
Data
n = 25, x̅ = 57,000, σ = 15,000, α = 0.05
Picture
p-value = 2*P(Z < -|z-score|) = 0.5048
Z Test Statistic
zstat $= \frac{\bar{x} - 55000}{\sigma/\sqrt{n}} = \frac{57000 - 55000}{15000/\sqrt{25}} = 0.667$
p-Value
2*P(Z<-|zstat|) = 2*P(Z < -0.667)
= 2*norm.dist(-0.667, 0, 1, 1) = 0.5048
Since 0.5048 is larger than α = 0.05, do not reject H0. There is no evidence
that the new video on TikTok is any different from previous videos.
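The two-tailed calculation for HK pop can be reproduced with the standard library. This is a sketch with illustrative variable names; because it keeps the unrounded z statistic (2/3 rather than 0.667), the p-value comes out near 0.505 instead of exactly 0.5048.

```python
from math import sqrt
from statistics import NormalDist

# Values from the HK pop example
mu_0 = 55_000    # past mean daily views
x_bar = 57_000   # sample mean over 25 days
sigma = 15_000   # known population standard deviation
n = 25
alpha = 0.05

z_stat = (x_bar - mu_0) / (sigma / sqrt(n))    # 2000 / 3000
p_value = 2 * NormalDist().cdf(-abs(z_stat))   # both tails
reject_h0 = not (p_value > alpha)

print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Doubling the lower-tail area at −|z| works for either sign of the statistic, which is why the absolute value appears in the formula.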
Today’s Objectives
1. Intro to Hypothesis Testing
2. Conduct a one-tailed upper and lower hypothesis test for the true
population mean µ
– when sigma is unknown