Module 2 - Hypothesis Testing_afterclass

IIMT 2641 Introduction to Business Analytics

Module 2: Intro to Statistics


Topic 3: Hypothesis Testing

Today’s Objectives
1. Intro to Hypothesis Testing

2. Conduct one-tailed (upper and lower) hypothesis tests for the true
population mean µ
– when σ is known
– when σ is unknown

3. Conduct two-tailed hypothesis tests for the true population mean µ

Motivating Example

Your investment advisor proposes a monthly investment scheme to you.

After analyzing the data over the past three years, you find that the
sample mean return of the scheme is $205, and the population standard
deviation is known to be $65.

Suppose you only want to invest in the scheme if its population mean
return is more than $200. Should you invest or not?

Making Judgment Using Hypothesis Testing
– It is not possible to establish the truth of a claim or hypothesis with 100% certainty. There is no such thing as perfect judgment.

– But we can make a reasonable judgment while keeping the chance of a statistical error small. This is the idea of hypothesis testing.

– Our reasonable judgment comes from asking the question:

Does the sample data we have suggest that something we think about the population is actually NOT true?

Does the sample data we have on investment returns suggest that the population mean return is actually more than $200 per month?

Null and Alternative Hypotheses
Start with the Alternative

Null Hypothesis, H0
• Status quo, what is assumed about the population
• The opposite of HA

Alternative Hypothesis, HA
• “Research hypothesis” or the claim you are testing
• Challenges the presumed value of the null hypothesis
• The claim to be supported

The Logic of Hypothesis Testing

1. Assume H0 is true.
2. Test if H0 is likely to be true given the data you have.
3. If there is no evidence to contradict H0, we do not reject H0. If evidence exists to contradict H0, we reject H0.

Analogy: a jury member
– The defendant is assumed innocent first.
– Collect evidence to see if you have “enough” evidence to reject the initial assumption.

Analogy: a researcher who wants to test a new finding/hypothesis
– The old theory (currently accepted) is assumed to be true.
– Decide if you have “enough” evidence to reject the old theory.

Null and Alternative Hypotheses
Start with the Alternative

Null Hypothesis, H0
• Currently accepted value for a parameter.
• For example, µ = 15
• Always contains “=” (=, ≤, or ≥)

Alternative Hypothesis, HA
• Challenges the presumed value of the null hypothesis
• Never contains “=” (for example, µ < 15)
• Lower tail test has <
• Upper tail test has >
• Two tail test has ≠

First write the alternative, and then write the null as the opposite.

Examples of H0 and HA
Suppose a baker claims that his bread height is more than 15 cm, on average.
Several of his customers do not believe him. To persuade his customers that he
is right, the baker decides to do a hypothesis test.

He bakes 9 loaves of bread. The mean height of the sample loaves is 17 cm.
The baker knows from baking hundreds of loaves of bread that the standard
deviation for the height is 0.5 cm and the distribution of heights is normal.

H0: µ ≤ 15
HA: µ > 15

Distribution needed for Hypothesis Testing
§ Perform tests of a population mean using a normal distribution or a
Student's t-distribution.

§ If you are testing a population mean, the distribution for the test is:
– We can use the normal distribution when the population standard deviation is
known.

\[ \bar{X} \sim N\!\left(\mu_0, \frac{\sigma^2}{n}\right) \quad\Rightarrow\quad Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) \]

Given a specific sample mean \(\bar{x}\), define

\[ \text{z-score} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

Rare events and the Sample
§ Suppose you make an assumption about a property of the population (this
assumption is the null hypothesis). Then you gather sample data randomly.
If the sample has properties that would be very unlikely to occur if the
assumption were true, then you would conclude that your assumption about
the population is probably incorrect.

§ Use the sample data to calculate the probability of getting a test result
at least as extreme as the one observed, assuming the null hypothesis is true.
This probability is called the p-value.
– p-value = P(the results from another randomly selected sample will be as
extreme as or more extreme than the results obtained from the given sample |
null hypothesis is true)
– A large p-value calculated from the data indicates that we should not reject the
null hypothesis.
– Draw a graph that shows the p-value. The hypothesis test is easier to perform if
you use a graph because you see the problem more clearly.

Example of p-value
H0: µ ≤ 15
HA: µ > 15

In the baker example (9 loaves, sample mean height 17 cm, known standard
deviation 0.5 cm, normally distributed heights), the p-value is the probability
that a sample mean is 17 cm or greater when the population mean is, in fact, 15 cm.

\[ \bar{X} \sim N\!\left(\mu_0, \frac{\sigma^2}{n}\right) \]

\[ \text{z-score} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{17 - 15}{0.5/\sqrt{9}} = 12 \]

\[ \text{p-value} = P(\bar{X} > 17) = P(Z > 12) \]

Decision and Conclusion
§ A systematic way to decide whether to reject or not reject the
null hypothesis is to compare the p-value with a preset or preconceived α
(also called a "significance level").

Reject H0 if p-value < α
Do not reject H0 if p-value > α

1. The smaller the p-value, the more evidence to reject the null.
2. The p-value does NOT tell you the chance the null hypothesis is wrong.
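
The decision rule can be written as a small helper. This is only a minimal Python sketch: the function name `decide` and the default α = 0.05 are assumptions made for illustration, and a p-value exactly equal to α (a case the slides do not address) is treated here as "do not reject".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject H0 only when the p-value falls strictly below alpha.
    return "Reject H0" if p_value < alpha else "Do not reject H0"


print(decide(0.018))   # p-value below alpha
print(decide(0.0593))  # p-value above alpha
```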

Example: Baking
Return to the baker's claim that his bread height is more than 15 cm, on average.
He bakes 9 loaves of bread, and the mean height of the sample loaves is 17 cm.
The standard deviation for the height is known to be 0.5 cm, and the distribution
of heights is normal. Test the claim at α = 0.05.

Baking

Hypotheses: H0: µ ≤ 15, HA: µ > 15. Hence it’s an upper-tail test.

Data
n = 9, x̄ = 17, σ = 0.5
\( SE = 0.5/\sqrt{9} = 1/6 \)
α = 0.05

Picture: the p-value is the upper-tail area \( P(\bar{X} > 17) \)

Test Statistic
\[ \text{z-score} = \frac{\bar{x} - \mu_0}{SE} = \frac{17 - 15}{1/6} = 12 \]

p-Value
P(Z > z-score) = 1 – P(Z ≤ z-score) = 1 – P(Z ≤ 12) = 1 – norm.dist(12, 0, 1, 1) = 0.000

p-value = 0.000 is smaller than α = 0.05
Reject H0
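
The same calculation can be checked outside a spreadsheet. The sketch below is one possible Python version, not the slides' own method: the function name `upper_tail_z_test` is hypothetical, and the tail probability is computed with `math.erfc` rather than as `1 - cdf`, because at z = 12 the subtraction would round to exactly zero.

```python
import math


def upper_tail_z_test(x_bar: float, mu0: float, sigma: float, n: int):
    """Return (z-score, p-value) for HA: mu > mu0 when sigma is known."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    z = (x_bar - mu0) / se
    # P(Z > z) through the complementary error function, which stays
    # accurate in the far tail of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


z, p = upper_tail_z_test(x_bar=17, mu0=15, sigma=0.5, n=9)
print(f"z-score = {z:.1f}, p-value = {p:.3g}, reject H0: {p < 0.05}")
```

The p-value is a tiny positive number that displays as 0.000 at three decimals, which is why the slide reports it that way.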

Student's t-distribution
§ X1, …, Xn are a random sample (independent and identically
distributed) from a population with mean µ0 and a possibly unknown
standard deviation σ.

\[ T_{n-1} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \]

§ \(T_{n-1}\) has a t-distribution with n-1 degrees of freedom (df).
§ s: sample standard deviation.
§ Need the population to have a normal distribution or n > 30.

[Figure: t-distribution; the plotted parameter is the degrees of freedom.]

§ Given a specific sample mean \(\bar{x}\), define

\[ \text{t-score} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

§ The t-score has the same interpretation as the z-score. It
measures how far \(\bar{x}\) is from its mean µ0, in standard errors.
§ The degrees of freedom, n – 1, come from the calculation of the
sample standard deviation s.

Statistics & p-Values For One-Sided Tests (with known σ)

Test statistic (both tests), where µ0 is given in the problem statement:

\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \qquad \text{z-score} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

Z has a standard normal distribution.

Lower Tail Test (HA: µ < µ0)
p-value = P(Z < z-score) = norm.dist(z-score, 0, 1, 1)

Upper Tail Test (HA: µ > µ0)
p-value = P(Z > z-score) = 1 - norm.dist(z-score, 0, 1, 1)
Exercise: Manufacturing
A manufacturing process drills a hole in a metal plate. The mean dimension
of the hole is specified to be at most 8 mm. The population standard
deviation is known to be 0.001 mm. A random sample of 9 holes had a
sample mean of 8.0007 mm. Test whether the mean size of the holes has increased,
at α = 0.05. Assume that the dimensions of holes are Normally distributed.

Manufacturing

Hypotheses: H0: µ ≤ 8, HA: µ > 8. Hence it’s an upper-tail test.

Data
n = 9, x̄ = 8.0007, σ = 0.001
\( SE = 0.001/\sqrt{9} = 0.000333 \)
α = 0.05

Picture: the p-value is the upper-tail area P(Z > 2.1) = 0.018

Test Statistic
\[ \text{z-score} = \frac{\bar{x} - \mu_0}{SE} = \frac{8.0007 - 8}{0.000333} = 2.1 \]

p-Value
P(Z > z-score) = 1 – P(Z ≤ z-score) = 1 – P(Z ≤ 2.1) = 1 – norm.dist(2.1, 0, 1, 1) = 0.018

p-value = 0.018 is smaller than α = 0.05
Reject H0
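
As a cross-check on the spreadsheet formula, here is a short Python sketch of the same upper-tail test. It assumes `statistics.NormalDist().cdf(z)` as the stand-in for `norm.dist(z, 0, 1, 1)`; the variable names are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the manufacturing exercise.
x_bar, mu0, sigma, n, alpha = 8.0007, 8.0, 0.001, 9, 0.05

se = sigma / sqrt(n)               # standard error of the sample mean
z = (x_bar - mu0) / se             # z-score
p_value = 1 - NormalDist().cdf(z)  # upper tail: P(Z > z)
reject = p_value < alpha

print(f"z-score = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```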

Statistics & p-Values For One-Sided Tests (with unknown σ)

Test statistic (both tests), where µ0 is given in the problem statement:

\[ T_{n-1} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}, \qquad \text{t-score} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

\(T_{n-1}\) has a t-distribution with n-1 degrees of freedom.

Lower Tail Test (HA: µ < µ0)
p-value = \(P(T_{n-1} < \text{t-score})\) = t.dist(t-score, n-1, 1)

Upper Tail Test (HA: µ > µ0)
p-value = \(P(T_{n-1} > \text{t-score})\) = 1 - t.dist(t-score, n-1, 1)

Cadillac Buyers

It is presumed the average Cadillac driver is over 50 years old. But you
hypothesize that, as a result of recent successful marketing efforts, the
mean age of a Cadillac driver is actually below 50 years. To
test your hypothesis, you sample 36 drivers and find a sample mean
age of 45 years and a sample standard deviation of 12 years. Conduct
a hypothesis test and determine: is there evidence to support your
hypothesis at the α = 0.05 level?

Cadillac Buyers Age: Lower Tail Test

Hypotheses: H0: µ ≥ 50, HA: µ < 50. Hence it’s a lower-tail test.

Data
n = 36, x̄ = 45, and s = 12
\( SE = 12/\sqrt{36} = 2 \)
df = n-1 = 35
α = 0.05

Picture: the p-value is the lower-tail area \( P(T_{35} < -2.5) \)

Test Statistic
\[ \text{t-score} = \frac{\bar{x} - \mu_0}{SE} = \frac{45 - 50}{2} = -2.5 \]

p-Value
p-value = \(P(T_{35} < \text{t-score})\) = t.dist(t-score, df, 1) = t.dist(-2.5, 35, 1) = 0.00862

p-value is less than α = 0.05
Conclusion: Reject H0
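
For readers without a spreadsheet, the lower-tail t-test can be sketched in plain Python. The standard library has no t-distribution, so this example assumes a hand-rolled CDF that integrates the t density with Simpson's rule. The helpers `t_pdf` and `t_cdf` are hypothetical stand-ins for `t.dist(..., 1)`, not how Excel computes it.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), integrating the density on [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area


# Values taken from the Cadillac age example.
n, x_bar, s, mu0, alpha = 36, 45, 12, 50, 0.05
t_score = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t_score, n - 1)  # lower tail: P(T < t-score)

print(f"t-score = {t_score:.2f}, p-value = {p_value:.5f}, "
      f"reject H0: {p_value < alpha}")
```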

Example: Cadillac Buyers Income

It is also presumed the population mean household income of a
Cadillac driver is under $75,000. But there is a belief that the
population mean household income is actually above
$75,000. To test this hypothesis, you sample 36 drivers and find a mean
household income of $79,000 and a sample standard deviation of
$15,000. Conduct a hypothesis test and determine: is there evidence to
support your hypothesis at the α = 0.05 level?

Cadillac Buyers Income: Upper Tail Test

Hypotheses: H0: µ ≤ 75000, HA: µ > 75000. Hence it’s an upper-tail test.

Data
n = 36, x̄ = 79,000, s = 15,000
\( SE = 15000/\sqrt{36} = 2500 \)
df = n-1 = 35
α = 0.05

Picture: the p-value is the upper-tail area \( P(T_{35} > 1.6) \)

Test Statistic
\[ t_{stat} = \frac{\bar{x} - 75{,}000}{SE} = \frac{79000 - 75000}{2500} = 1.6 \]

p-Value
\(P(T_{35} > \text{t-score}) = P(T_{35} > 1.6) = 1 - P(T_{35} \le 1.6)\) = 1 - t.dist(1.6, 35, 1) = 0.0593

The p-value is larger than α = 0.05
Do Not Reject H0
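
The upper-tail case differs from the lower-tail one only in taking the complement of the CDF. The sketch below illustrates this in plain Python, under the assumption that a Simpson's-rule integral of the t density is an acceptable stand-in for `t.dist(..., 1)`. The names `t_pdf`, `t_cdf` and `upper_tail_t_test` are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t) by Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def upper_tail_t_test(x_bar: float, mu0: float, s: float, n: int):
    """Return (t-score, p-value) for HA: mu > mu0 when sigma is unknown."""
    t_score = (x_bar - mu0) / (s / math.sqrt(n))
    return t_score, 1 - t_cdf(t_score, n - 1)  # upper tail: P(T > t-score)


# Values taken from the Cadillac income example.
t_stat, p_value = upper_tail_t_test(x_bar=79_000, mu0=75_000, s=15_000, n=36)
print(f"t-score = {t_stat:.2f}, p-value = {p_value:.4f}, "
      f"reject H0: {p_value < 0.05}")
```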

Test Statistics & p-Values For Two-Tailed Tests

In both cases HA: µ ≠ µ0, and µ0 is given in the problem statement.

Two-Tailed Test when σ is known
\[ \text{z-score} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
p-value = P(Z < -|z-score|) + P(Z > |z-score|)
        = 2*P(Z < -|z-score|)
        = 2*norm.dist(-|z-score|, 0, 1, 1)
Z has a standard normal distribution.

Two-Tailed Test when σ is unknown
\[ \text{t-score} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
p-value = P(T < -|t-score|) + P(T > |t-score|)
        = 2*P(T < -|t-score|)
        = 2*t.dist(-|t-score|, n-1, 1)
T has a t-distribution with n-1 degrees of freedom.

*Suppose the t-score = -2.5 and df = 15. Then the p-value = 2*P(T < -|t-score|)
is the combined area of the two tails (shaded in red in the figure).
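
The two-tailed example with t-score = -2.5 and df = 15 can be worked through numerically. This is a hedged sketch in plain Python: `t_pdf` and `t_cdf` are hypothetical helpers that approximate `t.dist(..., 1)` by Simpson's rule, and the slides do not state a numeric answer for this case.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t) by Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


t_score, df = -2.5, 15
one_tail = t_cdf(-abs(t_score), df)  # P(T < -|t-score|)
p_value = 2 * one_tail               # both tails, by symmetry

print(f"one tail = {one_tail:.4f}, two-tailed p-value = {p_value:.4f}")
```

Because the t-distribution is symmetric about zero, doubling one tail gives the same result as adding the lower and upper tails separately.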

HK pop (sigma is known)
HK pop plans to introduce a new song this winter with a TikTok video. In the
past, their videos received 55,000 daily views on average (daily population
mean). The number of daily views over 25 days had a sample mean of 57,000,
and the population standard deviation is 15,000. Test whether the population
mean number of daily views is different from the past at the 0.05 level of
significance. Assume that the number of daily views is Normally
distributed.

Presumed population mean = 55,000
n = 25, x̄ = 57,000, and σ = 15,000
0.05 Level of Significance

HK pop

Hypotheses: H0: µ = 55,000, HA: µ ≠ 55,000 (“Two-Tailed Test”)

Data
n = 25, x̄ = 57,000, σ = 15,000
\( SE = \sigma/\sqrt{n} = 15{,}000/\sqrt{25} = 3{,}000 \)
α = 0.05

Picture: the p-value is the two-tail area 2*P(Z < -|z-score|) = 0.5048

Z Test Statistic
\[ z_{stat} = \frac{\bar{x} - 55000}{\sigma/\sqrt{n}} = \frac{57000 - 55000}{3000} = 0.667 \]

p-Value
2*P(Z < -|zstat|) = 2*P(Z < -0.667) = 2*norm.dist(-0.667, 0, 1, 1) = 0.5048

Do not reject H0 because p-value > 0.05

There is no evidence that the mean daily views of the new video on TikTok
differ from those of previous videos.
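
The two-tailed z-test for this example can be reproduced with the Python standard library. This is a sketch rather than the slides' method: `NormalDist().cdf` is assumed as the counterpart of `norm.dist(z, 0, 1, 1)`, and the z-score is left unrounded, so the p-value differs from 0.5048 in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the HK pop example.
n, x_bar, sigma, mu0, alpha = 25, 57_000, 15_000, 55_000, 0.05

se = sigma / sqrt(n)                     # standard error = 3,000
z = (x_bar - mu0) / se                   # z-score
p_value = 2 * NormalDist().cdf(-abs(z))  # two tails: 2 * P(Z < -|z|)
reject = p_value < alpha

print(f"z-score = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```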