Lecture 2 - Formulation of hypotheses, likelihood ratio tests & basic tests
LECTURE 2
(iii) two-tailed if 𝐻1 states that the parameter is different from (not equal to) the
value claimed in 𝐻0
Types of Hypotheses
A hypothesis is a statement about the value of an unknown parameter in the model.
There are two types of hypotheses, namely:
(i) Null hypothesis (𝑯𝟎 )
This can be thought of as the implied hypothesis; “null” means
“nothing.”
This hypothesis states that there is no difference between groups or no
relationship between variables.
Usually the null hypothesis represents a statement of “no effect”, “no
difference” or, put another way, “things haven’t changed”.
The null hypothesis is a presumption of the status quo, or no change.
It is the initial claim about the value of a parameter.
It always contains some form of equality (=, ≤ or ≥) as part of the relationship.
The standard approach to carrying out a statistical test involves the following steps:
(i) Specify the hypotheses to be tested, stating them in terms of
population parameters
𝐻0: Null hypothesis; usually the opposite of our research hypothesis. The
null hypothesis always includes equality.
𝐻𝑎: Alternative hypothesis; corresponds to our research hypothesis. It does
not include equality.
𝐻0 | ≥ | ≤ | =
𝐻𝑎 | < | > | ≠
Example:
𝐻0: The mean number of GVSU students enrolled in STA215 during
WINTER 2018 who speak English as a second language is 15
𝐻𝑎: The mean number of GVSU students enrolled in STA215 during
WINTER 2018 who speak English as a second language is not equal to 15
(ii) Calculate the test statistic
We use the corresponding sample statistic from a simple random sample to
challenge the statement made in 𝐻0. We convert the sample statistic to the
corresponding value of the appropriate sampling distribution (the test statistic).
(iii) Calculate the p-value
We use the sampling distribution of the test statistic and the type of test to
compute the p-value of this statistic. Under the assumption that the null
hypothesis is true, the p-value is the probability of getting a sample statistic
as extreme as or more extreme than the observed statistic from our random
sample.
(iv) State the conclusion
We conclude the test. If the p-value is very small, we have evidence to reject
𝐻0 and adopt 𝐻1. What do we mean by “very small”? We compare the p-value
to the preset level of significance 𝛼. If p-value ≤ 𝛼, then we say that
we have evidence to reject 𝐻0 and adopt 𝐻1. Otherwise, we say that the
sample evidence is insufficient to reject 𝐻0.
(v) Finally, we interpret the results in the context of the application.
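As an illustration (not part of the original lecture), steps (iii) and (iv) can be sketched in Python. The names `p_value`, `conclude` and `TAIL_FROM_SYMBOL` are hypothetical, and the sketch assumes the test statistic has a sampling distribution symmetric about 0, such as the standard normal (`statistics.NormalDist`) or Student's t, passed in as a CDF.

```python
from statistics import NormalDist

# Tail of the test implied by the symbol used in the alternative hypothesis.
TAIL_FROM_SYMBOL = {">": "right", "<": "left", "≠": "two"}


def p_value(stat, cdf, tail):
    """Return the p-value of an observed test statistic.

    cdf  -- CDF of the statistic's sampling distribution when H0 is true,
            assumed symmetric about 0 (e.g. the standard normal).
    tail -- "right", "left" or "two".
    """
    if tail == "right":
        return 1.0 - cdf(stat)                # area to the right of stat
    if tail == "left":
        return cdf(stat)                      # area to the left of stat
    if tail == "two":
        return 2.0 * (1.0 - cdf(abs(stat)))   # area beyond |stat| in both tails
    raise ValueError("tail must be 'right', 'left' or 'two'")


def conclude(p, alpha):
    """Step (iv): compare the p-value with the preset level of significance."""
    return "reject H0" if p <= alpha else "fail to reject H0"


phi = NormalDist().cdf                        # standard normal CDF
p = p_value(1.08, phi, TAIL_FROM_SYMBOL[">"])
print(round(p, 4), conclude(p, alpha=0.05))
```

For a right-tailed test with 𝑧 = 1.08 and 𝛼 = 0.05 this prints `0.1401 fail to reject H0`.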
(i) Testing claims about a population mean (𝝁) when (𝝈) is known
In most real-world situations, 𝜎 is simply not known. However, in some
cases a preliminary study or other information can be used to get a
realistic and accurate value for 𝜎.
Requirements
Let 𝑥 be a random variable appropriate to your application.
Obtain a simple random sample (of size 𝑛) of 𝑥 values from which you
compute the sample mean 𝑥̅. The value of 𝜎 is already known (perhaps
from a previous study). If you can assume that 𝑥 has a normal
distribution, then any sample size 𝑛 will work.
If you cannot assume this, then use a sample size 𝑛 ≥ 30.
In other words:
The sample is a simple random sample.
Either or both of these conditions are satisfied: the population is
normally distributed, or 𝑛 ≥ 30.
Procedure of testing
(i) In the context of the application, state the null and alternative
hypotheses and set the level of significance 𝛼.
(ii) Use the known 𝜎, the sample size 𝑛, the value of 𝑥̅ from the sample
and 𝜇 from the null hypothesis to compute the standardized sample
test statistic
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
(iii) Use the standard normal distribution and the type of test, one-tailed
or two-tailed, to find the p-value corresponding to the test statistic.
(iv) Conclude the test. If p-value ≤ 𝛼, then reject 𝐻0. If
p-value > 𝛼, then fail to reject 𝐻0.
(v) Interpret your conclusion in the context of the application
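A minimal Python sketch of this procedure, assuming the requirements above hold. `one_sample_z_test` is a hypothetical helper, and the standard normal CDF comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha, tail):
    """Sketch of the z test for a mean when sigma is known.

    x_bar -- sample mean; mu0 -- value of mu claimed in H0;
    sigma -- known population standard deviation; n -- sample size;
    alpha -- preset level of significance;
    tail  -- "right", "left" or "two", matching >, < or != in H1.
    Assumes x is normal or n >= 30. Returns (z, p_value, decision).
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))      # step (ii)
    phi = NormalDist().cdf                          # standard normal CDF
    if tail == "right":                             # step (iii)
        p = 1.0 - phi(z)
    elif tail == "left":
        p = phi(z)
    elif tail == "two":
        p = 2.0 * (1.0 - phi(abs(z)))
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    decision = "reject H0" if p <= alpha else "fail to reject H0"  # step (iv)
    return z, p, decision


print(one_sample_z_test(47.0, 41, 35, 40, 0.05, "right"))
```

Applied to the sunspot figures of Example 1, it returns 𝑧 ≈ 1.084 and a p-value of roughly 0.139. This differs slightly from a table-based hand calculation because 𝑧 is not rounded to two decimals before the lookup.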
Example 1
Sunspots have been observed for many centuries. Records of sunspots from ancient
Persian and Chinese astronomers go back thousands of years. Some archaeologists
think sunspot activity may somehow be related to prolonged periods of drought
in the southwestern United States.
Let 𝑥 be a random variable representing the average number of sunspots observed
in a 4-week period. A random sample of 40 such periods from Spanish colonial times
gave the following data (Reference: M. Waldmeir, Sunspot Activity, International
Astronomical Union Bulletin):
12.5 14.1 37.6 48.3 67.3 70.0 43.8 56.5 59.7 24.0
12.0 27.4 53.5 73.9 104.0 54.6 4.4 177.3 70.1 54.0
28.0 13.0 6.5 134.7 114.0 72.7 81.2 24.1 20.4 13.3
9.4 25.7 47.8 50.0 45.3 61.0 39.0 12.0 7.2 11.3
The sample mean is 𝑥̅ = 47.0. Previous studies of sunspot activity during this
period indicate that 𝜎 = 35. It is thought that for thousands of years, the mean
number of sunspots per 4-week period was about 𝜇 = 41.
Sunspot activity above this level may (or may not) be linked to gradual change. Do the
data indicate that the mean sunspot activity during the Spanish colonial period was
higher than 41? Use 𝛼 = 0.05.
Solution
(ii) Check requirements: What distribution do we use for the sample test
statistic? Compute the z value of the sample test statistic 𝑥̅.
Since 𝑛 ≥ 30 and we know 𝜎, we use the standard normal distribution.
Using 𝑥̅ = 47 from the sample, 𝜎 = 35, 𝜇 = 41 from 𝐻0, and 𝑛 = 40,
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{47 - 41}{35/\sqrt{40}} \approx 1.08$$
(iii) Find the p-value of the test statistic
Since we have a right-tailed test, the p-value is the area to the right of
𝑧 = 1.08
$$\text{p-value} = P(Z > 1.08) = 1 - P(Z \le 1.08) = 1 - \Phi(1.08) = 1 - 0.85993 = 0.14007 \approx 0.1401$$
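The arithmetic in steps (ii) and (iii) can be checked with a short Python sketch using the 40 observations listed above. Rounding 𝑥̅ to one decimal and 𝑧 to two decimals is an assumption made here to mirror the hand calculation. `statistics.NormalDist().cdf` plays the role of Φ.

```python
import math
from statistics import NormalDist, mean

# Sunspot counts for the 40 four-week periods listed above.
sunspots = [
    12.5, 14.1, 37.6, 48.3, 67.3, 70.0, 43.8, 56.5, 59.7, 24.0,
    12.0, 27.4, 53.5, 73.9, 104.0, 54.6, 4.4, 177.3, 70.1, 54.0,
    28.0, 13.0, 6.5, 134.7, 114.0, 72.7, 81.2, 24.1, 20.4, 13.3,
    9.4, 25.7, 47.8, 50.0, 45.3, 61.0, 39.0, 12.0, 7.2, 11.3,
]
sigma, mu0 = 35.0, 41.0                 # known sigma and the value of mu in H0
n = len(sunspots)
x_bar = round(mean(sunspots), 1)        # rounded to 47.0, as in the lecture
z = round((x_bar - mu0) / (sigma / math.sqrt(n)), 2)  # two decimals, as for a z table
p = 1 - NormalDist().cdf(z)             # right-tailed: area to the right of z
print(n, x_bar, z, round(p, 4))
```

It prints `40 47.0 1.08 0.1401`. Since 0.1401 > 𝛼 = 0.05, the p-value method does not reject 𝐻0 for these data.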
Example 2
The Environmental Protection Agency has been studying Miller Creek regarding ammonia
nitrogen concentration. For many years, the concentration has been 2.3 mg/L. However, a
new golf course and new housing development are raising concern that the concentration
may have changed because of lawn fertilizer. Any change (either an increase or a decrease)
in the ammonia nitrogen concentration can affect plant and animal life in and around the creek
(Reference: EPA Report 832-R-93-005).
Let 𝑥 be a random variable representing ammonia nitrogen concentration (in mg/L).
Based on recent studies of Miller Creek, we may assume that 𝑥 has a normal distribution
with 𝜎 = 0.30. Recently, a random sample of eight water tests from the creek gave the
following 𝑥 values:
2.1 2.5 2.2 2.8 3.0 2.2 2.4 2.9
Solution
To determine the type of test, look at the symbol used in the alternative
hypothesis (𝐻1).
The symbol > points to the right and the test is right-tailed.
The symbol < points to the left and the test is left-tailed.
The symbol ≠ is used for a two-tailed test.
Since the alternative hypothesis is 𝐻𝑎 : 𝜇 ≠ 2.3 𝑚𝑔/𝐿, this is a two-tailed test.
(iii) Check requirements: what sampling distribution should we use? Note that the
value of 𝜇 is given in the null hypothesis, 𝐻0.
Solution
Since the 𝑥 distribution is normal and 𝜎 is known, we use the standard normal distribution
with
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - 2.3}{0.3/\sqrt{8}}$$
(iv) What is the value of the sample test statistic? Convert the sample mean 𝑥̅ to
a standard 𝑍 value
Solution
(v) Draw a sketch showing the p-value area on the standard normal
distribution. Find the p-value
Solution
(vi) Compare the level of significance 𝛼 and the p-value. What is your
conclusion?
Solution
The sample data are not significant at the 𝛼 = 0.01 level. At this point in time, there is not
enough evidence to conclude that the ammonia nitrogen concentration has changed in Miller Creek.
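The worked numbers for parts (iv) to (vi) are not reproduced in this text, so the following Python sketch recomputes them from the eight water tests. It assumes 𝛼 = 0.01, the level used in the conclusion above, and 𝐻1: 𝜇 ≠ 2.3 mg/L. 𝑥̅ is left unrounded, so hand calculations that round intermediate values may differ slightly in the last digit.

```python
import math
from statistics import NormalDist, mean

# Ammonia nitrogen concentrations (mg/L) from the eight water tests above.
x = [2.1, 2.5, 2.2, 2.8, 3.0, 2.2, 2.4, 2.9]
mu0, sigma, alpha = 2.3, 0.30, 0.01   # H0 value, known sigma, level of significance

n = len(x)
x_bar = mean(x)                                   # 20.1 / 8
z = (x_bar - mu0) / (sigma / math.sqrt(n))        # standardized test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))            # two-tailed, because H1 uses ≠
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"x̄ = {x_bar:.4f}, z = {z:.2f}, p = {p:.4f}: {decision}")
```

With the unrounded mean 2.5125 this gives 𝑧 ≈ 2.00 and a two-tailed p-value of about 0.045. That exceeds 𝛼 = 0.01, so 𝐻0 is not rejected, in agreement with the conclusion above.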
“Fail to reject” the null hypothesis simply means the evidence in favor of rejection was
not strong enough.
Often, when 𝐻0 cannot be rejected, a confidence interval is used to estimate
the parameter in question.
The confidence interval gives the statistician a range of possible values for the
parameter.
Term | Meaning
Fail to reject 𝐻0 | There is not enough evidence in the data (and the test being employed) to justify a rejection of 𝐻0. This means that we retain 𝐻0 with the understanding that we have not proved it to be true beyond all doubt.
Reject 𝐻0 | There is enough evidence in the data (and the test employed) to justify rejection of 𝐻0. This means that we choose the alternative hypothesis 𝐻1 with the understanding that we have not proved 𝐻1 to be true beyond all doubt.
(ii) Testing claims about a population mean (𝝁) when (𝝈) is unknown
When 𝜎 is unknown, we usually use Student’s 𝒕 test.
Requirements
Let 𝑥 be a random variable appropriate to your application.
Obtain a simple random sample of size 𝑛 of 𝑥 values from which you can
compute the sample mean 𝑥̅ and the sample standard deviation 𝑠. If you
can assume 𝑥 has a normal distribution, or simply a mound-shaped and
symmetric distribution, then any sample size 𝑛 will work.
If you cannot assume this, use a sample size 𝑛 ≥ 30.
Testing procedure
(i) In the context of the application, state the null and alternative
hypotheses and set the level of significance 𝛼
(ii) Use 𝑥̅, 𝑠 and 𝑛 from the sample, with 𝜇 from 𝐻0, to compute the 𝑡 value
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \quad \text{with degrees of freedom } df = n - 1$$
(iii) Use Student’s 𝑡 distribution and the type of test, one-tailed or two-
tailed, to find (or estimate) the p-value corresponding to the test statistic.
(iv) Conclude the test. If p-value ≤ 𝛼, then reject 𝐻0. If p-value > 𝛼, then do
not reject 𝐻0.
(v) Interpret your conclusion in the context of the application
Note
If the test statistic 𝑡 for the sample statistic 𝑥̅ is negative, look up the p-value for
the corresponding positive value of 𝑡 (i.e., look up the p-value for |𝑡|).
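Step (ii) and the note on negative 𝑡 can be sketched with Python's `statistics` module, whose `stdev` uses the 𝑛 − 1 divisor of the sample standard deviation 𝑠. The function `one_sample_t_statistic` is a hypothetical helper, applied here to the remission-time data of Example 1. The p-value itself is still read from a 𝑡 table, as in the lecture.

```python
import math
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu0):
    """Return (t, df) for testing a claim about mu when sigma is unknown.

    Assumes a simple random sample from a roughly mound-shaped, symmetric
    population, or a sample size of at least 30.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                       # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu0) / (s / math.sqrt(n))  # step (ii)
    return t, n - 1                         # degrees of freedom df = n - 1


# Remission times (weeks) for the 21 patients of the 6-MP example, mu0 = 12.5.
remission = [10, 7, 32, 23, 22, 6, 16, 34, 32, 25, 11,
             20, 19, 6, 17, 35, 6, 13, 9, 6, 10]
t, df = one_sample_t_statistic(remission, mu0=12.5)
t_for_table = abs(t)   # a negative t would be looked up as |t|, by symmetry
print(f"t = {t:.3f}, df = {df}, look up |t| = {t_for_table:.3f}")
```

From the raw data this gives 𝑡 ≈ 2.106 with 𝑑𝑓 = 20. The worked example rounds 𝑥̅ and 𝑠 first and obtains 2.108.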
Example 1
The drug 6-MP (6-mercaptopurine) is used to treat leukemia. The following data
represent the remission times (in weeks) for a random sample of 21 patients using
6-MP (Reference: E. A. Gehan, University of Texas Cancer Center):
10 7 32 23 22 6 16 34 32 25 11
20 19 6 17 35 6 13 9 6 10
The sample mean is 𝑥̅ = 17.1 weeks, with sample standard deviation 𝑠 = 10.0. Let 𝑥
be a random variable representing the remission time (in weeks) for all patients
using 6-MP.
Assume the 𝑥 distribution is mound-shaped and symmetric. A previously used drug
treatment had a mean remission time of 𝜇 = 12.5 weeks. Do the data indicate that
the mean remission time using the drug 6-MP is different (either way) from 12.5
weeks? Use 𝛼 = 0.01.
Solution
(ii) Check requirements: what distribution do we use for the sample test
statistic 𝑥̅? Compute the sample test statistic 𝑥̅ and the corresponding 𝑡
value.
The 𝑥 distribution is assumed to be mound-shaped and symmetric. Because
we don’t know 𝜎, we use a Student’s 𝑡 distribution with 𝑑𝑓 = 20. Using
𝑥̅ = 17.1 and 𝑠 = 10.0 from the sample data, 𝜇 = 12.5 from 𝐻0, and 𝑛 = 21,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{17.1 - 12.5}{10.0/\sqrt{21}} \approx 2.108$$
(iii) Find the p-value or the interval containing the p-value
The sample statistic is 𝑡 = 2.108. Since this is a two-tailed test with
𝑑𝑓 = 𝑛 − 1 = 21 − 1 = 20 degrees of freedom, the p-value for the sample 𝑡
falls between the corresponding two-tailed areas 0.020 and 0.050:
0.020 < p-value < 0.050
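Where software is available, the table interval can be narrowed to an approximate p-value. The Python sketch below is an assumption of this note rather than the lecture's method. It integrates the Student's 𝑡 density numerically with Simpson's rule, using `math.lgamma` for the normalizing constant, instead of reading a 𝑡 table. `t_pdf` and `t_two_tailed_p` are hypothetical names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t|) via Simpson's rule on [0, |t|].

    steps must be even. The area between -|t| and |t| is twice the
    integral from 0 to |t| (by symmetry), and the p-value is 1 minus it.
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3          # P(0 <= T <= |t|)
    return 1 - 2 * central


p = t_two_tailed_p(2.108, df=20)
print(round(p, 3))
```

For 𝑡 = 2.108 and 𝑑𝑓 = 20 it gives a p-value of about 0.048, inside the interval 0.020 < p-value < 0.050. Since that whole interval lies above 𝛼 = 0.01, these data do not lead to rejecting 𝐻0 at the 1% level.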
Example 2
The sample mean is 𝑥̅ = 2.92 cm and the sample standard deviation is 𝑠 = 0.85,
where 𝑥 is a random variable that represents the lengths (in cm) of all projectile
points found at the adjacent cliff dwelling site. Do these data indicate that the mean
length of projectile points at the adjacent cliff dwelling site is longer than 2.6 cm? Use
a 1% level of significance.
(i) State 𝐻0, 𝐻1 and 𝛼.
Solution
(ii) Check requirements: what sampling distribution should you use for 𝑥̅?
What is the 𝑡 value of the sample test statistic?
Solution
(iii) Do you use one-tailed or two-tailed areas? Why? Sketch a figure showing
the p-value and find an interval containing the p-value.
Solution
Solution
Since the interval containing the p-value lies entirely below 𝛼 = 0.01, we reject 𝐻0.
Solution
The values of 𝑥̅ for which we reject 𝐻0 are called the critical region of the 𝑥̅
distribution.
Depending on the alternative hypothesis, the critical region is located on the left
side, on the right side or on both sides of the 𝑥̅ distribution.
The figure below shows the relationship of the critical region to the alternative
hypothesis and the level of significance 𝛼.
Notice that the total area in the critical region is preset to be the level of
significance 𝜶
Recall that the level of significance 𝛼 should, in theory, be a fixed, preset
number assigned before drawing any samples.
The most commonly used levels of significance are 𝛼 = 0.05 and 𝛼 = 0.01
Critical values are the boundaries of the critical region.
Critical values designated as 𝑍0 for the standard normal distribution are shown
below
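The critical values 𝑧0 for the commonly used levels of significance can also be generated rather than read from a figure. This Python sketch uses the inverse standard normal CDF, `statistics.NormalDist().inv_cdf`. The helper name `z_critical_values` is hypothetical.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail):
    """Critical value(s) z0 bounding a critical region of total area alpha.

    tail: "right" (H1 uses >), "left" (H1 uses <) or "two" (H1 uses ≠).
    """
    inv = NormalDist().inv_cdf            # inverse of the standard normal CDF
    if tail == "right":
        return (inv(1 - alpha),)          # region: z >= z0
    if tail == "left":
        return (inv(alpha),)              # region: z <= z0
    if tail == "two":
        z0 = inv(1 - alpha / 2)           # alpha split equally between the tails
        return (-z0, z0)                  # region: z <= -z0 or z >= z0
    raise ValueError("tail must be 'right', 'left' or 'two'")


for alpha in (0.05, 0.01):
    for tail in ("left", "right", "two"):
        values = ", ".join(f"{v:.3f}" for v in z_critical_values(alpha, tail))
        print(f"alpha = {alpha}, {tail}-tailed: z0 = {values}")
```

It reproduces the familiar values: −1.645 or 1.645 one-tailed and ±1.960 two-tailed for 𝛼 = 0.05, and −2.326 or 2.326 one-tailed and ±2.576 two-tailed for 𝛼 = 0.01.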
The procedure for hypothesis testing using the critical region follows the same
procedure as the one for the p-value
(ii) Use the known 𝜎, the sample size 𝑛, the value of 𝑥̅ from the sample and 𝜇
from the null hypothesis to compute the standardized sample test statistic
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
(iii) Show the critical region and the critical values on the graph of the sampling
distribution. The level of significance 𝛼 and the alternative hypothesis
determine the locations of the critical regions and the critical values.
(iv) Conclude the test. If the test statistic 𝑍 computed in step (ii) is in the critical
region, then reject the null hypothesis. If the test statistic 𝑍 is not in the
critical region, then do not reject 𝐻0.
(v) Interpret your conclusions in the context of the application
Example
Consider the example regarding sunspots. Let 𝑥 be a random variable representing the
number of sunspots observed in a 4-week period. A random sample of 40 such periods
from Spanish colonial times gave the number of sunspots per period. The raw data were given
previously. The sample mean is 𝑥̅ = 47.
Previous studies indicate that for this period, 𝜎 = 35. It is thought that for thousands of
years, the mean number of sunspots per 4-week period was about 𝜇 = 41. Do the data
indicate that the mean sunspot activity during the Spanish colonial period was higher than
41? Use 𝛼 = 0.05
Solution
We conclude the test by showing the critical region, the critical value and
the sample statistic 𝑧 = 1.08 on the standard normal curve. For a right-
tailed test with 𝛼 = 0.05, the critical value is 𝑧0 = 1.645.
The figure below shows the critical region. As we can see, the sample test
statistic does not fall in the critical region. Therefore, we fail to reject the
null hypothesis.
(vi) How do results of the critical region method compare to the results of the p-
value method for a 5% level of significance?
As expected, the results are the same. In both cases we fail to reject
𝐻0.
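The agreement between the two methods for the sunspot example can be checked in a few lines of Python. This is a sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = 1.08                                   # sample test statistic (sunspot example)
std_normal = NormalDist()                  # standard normal distribution

# Critical-region method (right-tailed): reject H0 if z >= z0.
z0 = std_normal.inv_cdf(1 - alpha)         # about 1.645
reject_by_region = z >= z0

# P-value method: reject H0 if p-value <= alpha.
p_value = 1 - std_normal.cdf(z)            # about 0.1401
reject_by_p = p_value <= alpha

print(f"z0 = {z0:.3f}, p = {p_value:.4f}")
print("Both methods agree:", reject_by_region == reject_by_p)
```

The two methods always agree. Because Φ is increasing, 𝑧 ≥ 𝑧0 holds exactly when the right-tail area 1 − Φ(𝑧) is at most 1 − Φ(𝑧0) = 𝛼.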