Hypothesis testing
Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data. Such data may come from a larger population or from a data-generating process. The test provides evidence concerning the plausibility of the hypothesis, given the data. The statistical test chosen by the analyst depends on the nature of the data used and the reason for the analysis.
Hypothesis testing proceeds through a series of steps: stating the hypotheses, formulating an analysis plan, analyzing the sample data, and interpreting the result. It involves two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis states that there is no difference between two groups or conditions, while the alternative hypothesis states that there is a difference. Researchers evaluate the statistical significance of the observed difference in order to decide between the two.
1. Formulate a null hypothesis (H0) and an alternative hypothesis (Ha): The null hypothesis represents the status quo and states that there is no significant difference between the population parameter and a specific value (usually zero or a certain reference value). The alternative hypothesis, on the other hand, represents the researcher's claim and states that there is a significant difference between the population parameter and that value.
2. Determine the level of significance: This is the probability of making a type I error (the error of rejecting a true null hypothesis). The level of significance lies between 0 and 1. Significance is typically assessed by calculating a test statistic (such as a Z score) and comparing it to a critical value from a statistical table, based on the chosen level of significance and the degrees of freedom of the data. If the calculated test statistic is greater than the critical value, the null hypothesis is rejected and the results are considered statistically significant. If the calculated test statistic is less than or equal to the critical value, the null hypothesis cannot be rejected and the results are considered not statistically significant.
3. Choose a test statistic and calculate its value: The test statistic is a measure of how different the sample mean is from the null hypothesis value. Its value is calculated from the sample data and compared against a specific distribution (such as the normal distribution). Generally, the test statistic is calculated as the pattern in your data (i.e. the correlation between variables or the difference between groups) divided by the amount of variation within the data (i.e. the standard error).
4. Compare the test statistic to the critical value: The decision to accept or reject the null hypothesis is based on the comparison of the calculated test statistic to the critical value obtained for the chosen level of significance.
5. Make a decision and interpret the results: i) Case I – if the calculated test statistic is greater than the critical value, the null hypothesis is rejected and the alternative hypothesis is accepted. This means that there is sufficient evidence to support the researcher's claim. ii) Case II – if the calculated test statistic is less than or equal to the critical value, the null hypothesis cannot be rejected, and the results are considered not statistically significant. A worked sketch of these five steps is given below.
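As a concrete illustration of the five steps, here is a minimal sketch in Python of a one-sample z-test. The sample values, the reference mean of 100, the assumed known standard deviation, and the use of scipy are all assumptions made for this example, not part of the procedure itself.

```python
# Hypothetical example: test whether a population mean differs from 100,
# assuming a known population standard deviation (one-sample z-test).
import math
from scipy.stats import norm

sample = [102.5, 101.3, 99.8, 103.1, 100.9, 104.2, 98.7, 102.0]  # made-up data
mu0 = 100.0    # Step 1: H0: mu = 100, Ha: mu != 100 (two-tailed)
alpha = 0.05   # Step 2: level of significance
sigma = 2.0    # assumed known population standard deviation

n = len(sample)
xbar = sum(sample) / n
z = (xbar - mu0) / (sigma / math.sqrt(n))  # Step 3: test statistic

z_crit = norm.ppf(1 - alpha / 2)           # Step 4: two-tailed critical value
print(f"z = {z:.3f}, critical value = ±{z_crit:.3f}")

if abs(z) > z_crit:                        # Step 5: decision
    print("Reject H0: the difference is statistically significant.")
else:
    print("Fail to reject H0: the difference is not statistically significant.")
```

With a different α, or a one-tailed alternative, only the critical value in step 4 would change; the rest of the procedure is identical.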
Level of Significance
The significance level of an event (such as a statistical test) is the probability that the event could have occurred by chance. If the level is quite low, that is, if the probability of occurring by chance is quite small, we say the event is significant. Statistical significance does not mean that the event has any clinical meaning, and it should not be confused with the everyday use of the word significant, meaning that the event has some societal importance.
The level of significance is defined as the fixed probability of wrongly rejecting the null hypothesis when it is in fact true. It is the probability of a type I error and is preset by the researcher with the consequences of that error in mind. The level of significance is the yardstick of statistical significance: it determines whether the null hypothesis is taken to be accepted or rejected, and it is used to identify whether a result is statistically significant.
The level of significance is denoted by the Greek symbol α (alpha); therefore, the level of significance is defined as Significance Level = P(type I error) = α. Values or observations become less likely the farther they lie from the mean, and results are reported as significant at a given level.
Example: significant at 5% means that the p-value is less than 0.05, or p < 0.05. Similarly, significant at 1% means that the p-value is less than 0.01.
Decision criteria
To measure the level of statistical significance of a result, the investigator first needs to calculate the p-value. This is the probability of observing an effect at least as large as the one found, given that the null hypothesis is true. When the p-value is less than the level of significance (α), the null hypothesis is rejected. If the observed p-value is not less than the significance level α, then in theory the null hypothesis is accepted; in practice, we often increase the sample size and check whether we reach the significance level. The general interpretation of the p-value is as follows:
If p > 0.1, there is no evidence against the null hypothesis.
If p > 0.05 and p ≤ 0.1, there is weak evidence against the null hypothesis.
If p > 0.01 and p ≤ 0.05, there is strong evidence against the null hypothesis.
If p ≤ 0.01, there is very strong evidence against the null hypothesis.
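As a minimal sketch of this decision rule in Python, using a one-sample t-test from scipy on made-up data (both the data and the reference mean of 5.0 are illustrative assumptions):

```python
# Hypothetical illustration of the p-value decision rule.
from scipy.stats import ttest_1samp

data = [5.1, 4.9, 5.4, 5.6, 5.0, 5.3, 4.8, 5.5]  # made-up sample
alpha = 0.05                                      # chosen significance level

stat, p = ttest_1samp(data, popmean=5.0)          # H0: population mean = 5.0
print(f"t = {stat:.3f}, p = {p:.4f}")

if p <= 0.01:
    print("Very strong evidence against H0.")
elif p <= 0.05:
    print("Strong evidence against H0.")
elif p <= 0.1:
    print("Weak evidence against H0.")
else:
    print("No evidence against H0.")
```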
One tailed
A one-tailed test results from an alternative hypothesis which specifies a direction, i.e. when the alternative hypothesis states that the parameter is in fact either bigger or smaller than the value specified in the null hypothesis. A one-tailed test may be either left-tailed or right-tailed.
A left-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is less than the null hypothesis claims.
A right-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is greater than the null hypothesis claims.
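As a small sketch of the difference, assuming a standard normal test statistic (the value 1.8 is made up for illustration):

```python
# Hypothetical z statistic; which tail is used depends on the alternative.
from scipy.stats import norm

z = 1.8                   # made-up test statistic

p_left = norm.cdf(z)      # left-tailed: Ha says the parameter is smaller
p_right = norm.sf(z)      # right-tailed: Ha says the parameter is larger
print(f"left-tailed p = {p_left:.4f}, right-tailed p = {p_right:.4f}")
```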
Two tailed
A two-tailed test results from an alternative hypothesis which does not specify a direction, i.e. when the alternative hypothesis simply states that the null hypothesis is wrong.
The main difference between one-tailed and two-tailed tests is that one-tailed tests have only one critical region, whereas two-tailed tests have two critical regions. If we require a 100(1−α)% confidence interval, we have to make some adjustments when using a two-tailed test.
The confidence interval must remain a constant size, so if we are performing a two-tailed test, as there are twice as many critical regions, these critical regions must each be half the size. This means that when we read the tables while performing a two-tailed test, we need to look up the critical value for α/2 rather than α.
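The halving of the critical regions can be seen by comparing critical values directly; a minimal sketch with scipy, assuming a standard normal statistic and α = 0.05:

```python
# Critical values for alpha = 0.05 under a standard normal distribution.
from scipy.stats import norm

alpha = 0.05
one_tailed = norm.ppf(1 - alpha)        # single critical region of size alpha
two_tailed = norm.ppf(1 - alpha / 2)    # two regions, each of size alpha/2

print(f"one-tailed critical value: {one_tailed:.3f}")    # ~1.645
print(f"two-tailed critical values: ±{two_tailed:.3f}")  # ~±1.960
```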
Decision errors
Decision errors refer to the probability of making a wrong conclusion when doing hypothesis testing. When a researcher sets out to do a study, she typically has a hypothesis, or a prediction of what she thinks the results will be. She then conducts the study to find out whether her hypothesis is supported by the data or not. Depending on the results of the study, she then makes a decision about her hypothesis. Of course, there is always the possibility of making the wrong decision.
There are two ways a researcher can make a decision error. She can either decide that her hypothesis is true when it is actually false, or decide that her hypothesis is false when it is in fact true.
Type I
A Type I error means rejecting the null hypothesis when it's actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance.
The risk of committing this error is the significance level (alpha or α) you choose. That's a value that you set at the beginning of your study to assess the statistical probability of obtaining your results (the p-value).
The significance level is usually set at 0.05 or 5%. This means that your results have only a 5% chance, or less, of occurring if the null hypothesis is true.
If the p-value of your test is lower than the significance level, it means your results are statistically significant and consistent with the alternative hypothesis. If your p-value is higher than the significance level, then your results are considered statistically non-significant.
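One way to see the significance level as the Type I error rate is by simulation. The following is a sketch, assuming normally distributed data for which the null hypothesis is true by construction; about 5% of the tests should come out significant purely by chance:

```python
# Simulate repeated t-tests where H0 is true; about alpha of them
# should (wrongly) come out significant -- these are Type I errors.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha, trials, n = 0.05, 10_000, 30
false_positives = 0

for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)  # true mean really is 0
    _, p = ttest_1samp(sample, popmean=0.0)
    if p < alpha:
        false_positives += 1

print(f"Type I error rate: {false_positives / trials:.3f}")  # close to 0.05
```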
Type II
A Type II error means not rejecting the null hypothesis when it's actually false. This is not quite the same as "accepting" the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.
Instead, a Type II error means failing to conclude there was an effect when there actually was one. In reality, your study may not have had enough statistical power to detect an effect of a certain size.
Power is the extent to which a test can correctly detect a real effect when there is one. A statistically powerful test is more likely to pick up a true effect. The risk of a Type II error is inversely related to the statistical power of a study: the higher the statistical power, the lower the probability of making a Type II error.
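A companion sketch for Type II errors, assuming a true effect of 0.5 standard deviations; the fraction of simulated tests that fail to reject H0 estimates the Type II error rate, and one minus that estimates the power:

```python
# Simulate tests where H0 is false (true mean = 0.5); the fraction of
# non-rejections estimates the Type II error rate, and 1 minus it the power.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
alpha, trials, n = 0.05, 10_000, 30
misses = 0

for _ in range(trials):
    sample = rng.normal(loc=0.5, scale=1.0, size=n)  # H0 (mean = 0) is false
    _, p = ttest_1samp(sample, popmean=0.0)
    if p >= alpha:
        misses += 1  # failed to detect the real effect: a Type II error

beta = misses / trials
print(f"Type II error rate ~ {beta:.3f}, power ~ {1 - beta:.3f}")
```

Increasing the sample size n in this sketch raises the power and correspondingly lowers the Type II error rate, illustrating the inverse relationship described above.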