Module 2 Hypothesis Testing
Decision Sciences
https://learn.upgrad.com/course/1260
Agenda
Types of Hypothesis
The null hypothesis refers to a specified value of the population parameter, not of a sample statistic.
A null hypothesis may be rejected, but it can never be accepted based on a single test.
Tails of test
Two-tailed test: we want to test that the population mean is different from 10
Left-tailed test: we want to test that the population mean is less than 10
Right-tailed test: we want to test that the population mean is greater than 10
Significance Level (⍺) Two-Tailed Test One-Tailed Test (Left) One-Tailed Test (Right)
0.01 ±2.58 -2.326 +2.326
0.05 ±1.96 -1.645 +1.645
0.10 ±1.645 -1.282 +1.282
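The critical values in the table above can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `critical_z` and its `tail` argument are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist


def critical_z(alpha, tail="two"):
    """Return the critical z value(s) for a significance level and tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split across both tails
        return (-z, z)
    if tail == "left":
        return std_normal.inv_cdf(alpha)       # all of alpha in the left tail
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)   # all of alpha in the right tail
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Print one row per significance level: two-tailed, left, right
for alpha in (0.01, 0.05, 0.10):
    _, upper = critical_z(alpha, "two")
    print(alpha, round(upper, 3),
          round(critical_z(alpha, "left"), 3),
          round(critical_z(alpha, "right"), 3))
```

The two-tailed value for ⍺ = 0.01 comes out as about 2.576, which the table rounds to 2.58.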
Steps Involved in Hypothesis Testing
Formulate H0 and Ha
Hypothesis Tests
Tests of Association
Tests of Differences
• Hypothesis Statement
Null hypothesis: The mileage is greater than or equal to 17
(as this is the default claim made by the brand )
Alternative hypothesis: The mileage is less than 17
(as this challenges the null hypothesis)
• Mathematically
H0: Mileage (mean) ≥ 17
Ha: Mileage (mean) < 17
Formulating the Hypotheses
1 At least: more than or equal to (≥)
2 More than (>)
3 Less than (<)
4 At most: less than or equal to (≤)
Ho: Decrease in wear after 3 years ≤ 9%; Ha: Decrease in wear after 3 years > 9%
Formulating the Hypotheses
• Mr. Mohan of the Civil Engineering Department wants to test whether the
load-bearing capacity of an old bridge is more than 10 tons, as required. He can
state his hypotheses as follows:
• Null hypothesis H0 : µ ≤ 10 tons
• Alternative hypothesis Ha : µ > 10 tons
Formulating the Hypotheses
• The average score in an aptitude test administered at the national level
is 80. To evaluate a state’s education system, 100 of the state’s students
were selected at random, and their average score was 75. The state
wants to know if there is a significant difference between the local
scores and the national scores.
• Null hypothesis H0 : µ = 80
• Alternative Hypothesis Ha : µ ≠ 80
Formulating the Hypotheses
• The null hypothesis for this experiment is that the average weight of the flour
packages is 10 kg (no problem). The alternative hypothesis is that the average is not
10 kg (process is out of control).
Formulating the Hypotheses: Practice sets
• A financial investment firm wants to test to determine whether the average hourly
change in the Dow Jones Average over a 10-year period is +0.25.
• A manufacturing company wants to test to determine whether the average
thickness of a plastic bottle is 2.4 millimeters.
• A retail store wants to test to determine whether the average age of its customers
is less than 40 years.
Testing of Hypothesis
• Quiz 2. Testing of Hypothesis
https://forms.gle/jLbszZKDucYstzyy5
Type of Test
Test (normality assumption): One sample (parameter of measurement: mean)
Population standard deviation is known, N > 30: Z test
Population standard deviation is not known, N < 30: One-sample t test
P value method
If p<=alpha , Reject Ho
If p> alpha, Fail to Reject Ho
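As a rough illustration of the p value method, the sketch below converts a z statistic into a p value for each tail type and applies the rule above. The names `p_value_from_z` and `decide` are hypothetical helpers, and the z values used are made up for the example.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two"):
    """p value of a z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                 # area to the left of z
    if tail == "right":
        return 1 - cdf(z)             # area to the right of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # area in both tails beyond |z|
    raise ValueError("tail must be 'two', 'left' or 'right'")


def decide(p, alpha=0.05):
    """Apply the rule: reject H0 when p <= alpha, otherwise fail to reject."""
    return "Reject H0" if p <= alpha else "Fail to reject H0"


# Illustrative z values (assumed, not taken from the slides)
print(decide(p_value_from_z(-2.0, "left")))  # small p value
print(decide(p_value_from_z(1.0, "two")))    # large p value
```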
Confidence level
• Confidence interval is a range of values, derived from sample statistics, which is likely to contain
the value of an unknown population parameter
• The confidence level is defined for the hypothesis test according to the accuracy needed.
• A higher confidence level indicates that more evidence is needed to reject the null hypothesis.
• Therefore, increasing the confidence level makes it harder to reject the null hypothesis.
• Inversely, a low confidence level indicates that the null hypothesis can be rejected easily.
✓ The confidence level or reliability is the expected percentage of times that the actual value will
fall within the stated precision limits.
✓ Thus, if we take a confidence level of 95%, then we mean that there are 95 chances in 100 (or
.95 in 1) that the sample results represent the true condition of the population within a specified
precision range against 5 chances in 100 (or .05 in 1) that it does not.
✓ Confidence level indicates the likelihood that the answer will fall within that range, and the
significance level indicates the likelihood that the answer will fall outside that range.
✓ We can always remember that if the confidence level is 95%, then the significance level will be
(100 – 95) i.e., 5%; if the confidence level is 99%, the significance level is (100 – 99) i.e., 1%.
✓ We should also remember that the area of the normal curve within the precision limits for the
specified confidence level constitutes the acceptance region, and the area of the curve outside
these limits in either direction constitutes the rejection regions.
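To make the link between confidence level and interval width concrete, here is a small sketch of a confidence interval for a mean when the population standard deviation is known. The sample mean (50), standard deviation (10) and sample size (100) are assumed values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean, population sigma known."""
    alpha = 1 - confidence                    # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical z for the two tails
    margin = z * sigma / sqrt(n)              # z times the standard error
    return sample_mean - margin, sample_mean + margin


# Assumed inputs: sample mean 50, sigma 10, n 100
lo95, hi95 = mean_confidence_interval(50, 10, 100, confidence=0.95)
lo99, hi99 = mean_confidence_interval(50, 10, 100, confidence=0.99)
print(round(lo95, 2), round(hi95, 2))
print(round(lo99, 2), round(hi99, 2))
```

The 99% interval is wider than the 95% interval, which is why a higher confidence level makes it harder to reject the null hypothesis.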
Level of Significance
• The significance level is the probability of
rejecting the null hypothesis when it is
true. For example, a significance level of
0.05 indicates a 5% risk of concluding
that a difference exists when there is no
actual difference. Lower significance
levels indicate that you require stronger
evidence before you will reject the null
hypothesis.
• Level of significance (⍺) = 1 - confidence level
Z Score
✓If the z-score of the sample lies further away from the center than the critical z-
values, the null hypothesis is rejected.
✓Otherwise, the test fails to reject the hypothesis.
✓The only two possible outcomes of a hypothesis test are ‘reject the null
hypothesis’ or ‘fail to reject the null hypothesis’. The null hypothesis can never be
‘accepted’.
Commonly used critical z scores
[Figure panels: Left tail test, Two tail test, Right tail test]
Example 2: MS EXCEL
• Imagine you’re the owner of a pizza company, and you claim that your pizzas
are at least 9 inches in diameter. But you’ve been receiving complaints
from some of your customers, who say that the pizzas are actually smaller.
Your task is now to find out whether your chefs are producing smaller pizzas.
In this case, you will conduct a ‘left-tailed test’ by checking whether your
sample mean is significantly less than 9 inches, since you’re checking
whether the complaints about smaller pizzas are true.
• Hypothesis Statement
• Null hypothesis : Pizza size is at least 9 inches (i.e. 9 or more).
• Alternative hypothesis : Pizza size is less than 9 inches
• Mathematically
• Null hypothesis : Pizza size ≥ 9.
• Alternative hypothesis : Pizza size is < 9.
Hypothesis testing –One sample t test
• The One Sample t Test examines whether the mean of a population is
statistically different from a known or hypothesized value. The One
Sample t Test is a parametric test.
• Applicable when population standard deviation is unknown
• Typically used when the sample size is less than 30
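A minimal sketch of the one-sample t statistic, computed by hand with the standard library. The sample values and the hypothesised mean of 9 are invented for illustration, and the critical value 2.262 is the standard t-table entry for 9 degrees of freedom at ⍺ = 0.05 (two-tailed).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n-1 denominator
    return (mean(sample) - mu0) / standard_error


# Assumed sample of 10 measurements and hypothesised mean
sample = [8, 9, 10, 11, 12, 9, 10, 11, 10, 10]
t_stat = one_sample_t(sample, mu0=9)

t_critical = 2.262  # t table: df = 9, alpha = 0.05, two-tailed
reject_h0 = abs(t_stat) > t_critical
print(round(t_stat, 3), reject_h0)
```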
Two Sample t Test
• When there is a need to compare the means of two samples, a two-
sample t-test is conducted. In such a case, the formula for the t-statistic
becomes
https://learn.upgrad.com/course/1260/segment/10349/64185/187901/997831
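The formula itself is not reproduced in the text above, so the following is only a sketch of two commonly used forms of the two-sample t statistic: the pooled (equal variance) form and Welch's (unequal variance) form. Which of them the original slide showed is an assumption; the sample values are made up.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample_1, sample_2, equal_variance=True):
    """t statistic for the difference between two independent sample means."""
    n1, n2 = len(sample_1), len(sample_2)
    m1, m2 = mean(sample_1), mean(sample_2)
    v1, v2 = variance(sample_1), variance(sample_2)  # n-1 denominators
    if equal_variance:
        # Pooled variance: weighted average of the two sample variances
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's form: each sample keeps its own variance
        standard_error = sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) / standard_error


# Assumed data for two independent groups
group_a = [10, 12, 14, 16, 18]
group_b = [8, 10, 12, 14, 16]
t_pooled = two_sample_t(group_a, group_b, equal_variance=True)
t_welch = two_sample_t(group_a, group_b, equal_variance=False)
print(t_pooled, t_welch)
```

When both groups have the same size and the same sample variance, as in this made-up data, the two forms give the same statistic.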
Types of two sample test
Paired t test
• Paired t test - Paired means that both samples consist of the same test subjects.
• Paired t-tests are used when the same item or group is tested twice, which is known as a
repeated measures t-test. Some examples of instances for which a paired t-test is appropriate include:
• The before and after effect of a pharmaceutical treatment on the same group of people.
• Body temperature using two different thermometers on the same group of participants.
• Standardized test results of a group of students before and after a study prep course.
Unpaired t test
• Unpaired t test - Unpaired means that both samples consist of distinct test subjects.
• An unpaired t-test is used to compare the mean between two independent groups. You
use an unpaired t-test when you are comparing two separate groups with equal variance.
• Research, such as a pharmaceutical study or other treatment plan, where ½ of the subjects
are assigned to the treatment group and ½ of the subjects are randomly assigned to the
control group.
• Comparing the average commuting distance traveled by New York City and San Francisco
residents using 1,000 randomly selected participants from each city.
https://www.technologynetworks.com/informatics/articles/paired-vs-unpaired-t-test-differences-assumptions-and-hypotheses-330826
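For the paired case, the test reduces to a one-sample t-test on the per-subject differences. The sketch below assumes a small made-up set of before and after scores for the same five students; the critical value 2.776 is the standard t-table entry for 4 degrees of freedom at ⍺ = 0.05 (two-tailed).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for paired samples, based on the per-subject differences."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))


# Assumed scores for the same five students before and after a prep course
before = [70, 65, 80, 75, 60]
after = [74, 68, 83, 80, 65]
t_paired = paired_t(before, after)

t_critical = 2.776  # t table: df = 4, alpha = 0.05, two-tailed
print(round(t_paired, 3), abs(t_paired) > t_critical)
```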
Summary
1. Define the hypothesis statements: Your test will either ‘reject’ or ‘fail to reject’ the null hypothesis.
2. Collect as many data points as possible: The data points you collect will produce one sample. The size
of this single sample will depend on how many data points you take.
3. Measure the sample mean and the sample standard deviation: The standard deviation should be
calculated using the ‘n-1’ method. The STDEV function in Excel takes care of this.
4. Identify the distribution of the sample means: If the sample size is larger than 30, the distribution of
the sample means will be approximately normal (We’re only focusing on normal distributions for now).
5. Define the confidence level: This is the level of surety that you demand from a hypothesis test. The
higher the confidence level, the harder it is to reject the null hypothesis.
6. Find the critical z-scores of the confidence level and the test statistic or the z-score of the sample: The
z-score of the sample is calculated by subtracting the hypothesised mean from the sample mean
and dividing the result by the population standard deviation divided by the square root of the sample size.
7. Compare the sample test statistic with the critical z-scores: Here, you check whether the sample
statistic is more extreme than the z-scores.
8. If the sample test statistic is more extreme than the critical z-scores, you will reject the null
hypothesis. Otherwise, you will fail to reject it.
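The steps above can be sketched as a single function. This is an illustrative outline rather than a definitive implementation: the function name and arguments are assumptions, and the example reuses the aptitude-test figures from earlier in the module (hypothesised mean 80, sample mean 75, sample size 100) together with an assumed population standard deviation of 20.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, confidence=0.95, tail="two"):
    """Return (z statistic, critical z, reject H0?) for a one-sample z-test."""
    alpha = 1 - confidence                        # step 5: confidence level
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # step 6: sample z-score
    inv = NormalDist().inv_cdf
    if tail == "two":
        critical = inv(1 - alpha / 2)
        reject = abs(z) > critical                # steps 7-8: more extreme?
    elif tail == "left":
        critical = inv(alpha)
        reject = z < critical
    elif tail == "right":
        critical = inv(1 - alpha)
        reject = z > critical
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, critical, reject


# sigma = 20 is an assumed value; it is not given in the slides
z, critical, reject = one_sample_z_test(75, 80, sigma=20, n=100)
print(z, round(critical, 3), reject)
```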
Summary
✓When the test needs to check only positive or negative deviation from the null
hypothesis, a one-tailed test is performed.
✓When the test needs to check for deviation on either side of the null hypothesis, a
two-tailed test is performed.
✓When the sample size is low, a t-test is performed.
✓A t-test is also preferred over a z-test when the population standard deviation is
unknown.
✓When two sample means need to be checked for equality, a two-sample t-test is
performed.
✓When there is a need to check whether an entire distribution is similar to another,
a goodness of fit test is performed.
✓Hypothesis testing also carries some probability of committing errors. The errors
can be of two types: Type I and Type II.
A/B testing
✓An A/B test tells you whether there is a statistical difference in the performance of
the two options.
✓Data driven decision making system
✓A/B tests are used whenever there is a need to compare two alternatives.
✓The A/B test can be considered the most basic kind of randomized controlled
experiment
✓A/B tests are used in the industry when there is a need to make a choice between two options.
A/B testing : History
• In the 1920s statistician and biologist Ronald Fisher discovered the most
important principles behind A/B testing and randomized controlled
experiments in general.
• Fisher ran agricultural experiments, asking questions such as, What happens if I
put more fertilizer on this land? The principles persisted and in the early 1950s
scientists started running clinical trials in medicine.
• In the 1960s and 1970s the concept was adapted by marketers to evaluate
direct response campaigns (e.g., would a postcard or a letter to target
customers result in more sales?).
Areas of application
• Medicine, to understand if a drug works or not
• Economics, to understand human behaviour
• Foreign aid and charitable work (the reputable ones at least), to
understand which interventions are most effective at alleviating
problems (health, poverty, etc)
• Comparing two versions of a website
• Comparing two colours, tabs, or page designs
Example: A/ B testing
• Let’s say John builds a website for a free e-book and is testing out two colour
variations — red and blue. On the red website, 45 out of 100 visitors downloaded
the e-book. But on the blue website, 47 out of 100 visitors downloaded the e-book.
Based on this, John may conclude that the blue website is performing better.
• However, John’s method can backfire, because he did not check for
statistical significance. The difference in performance observed may be due to plain
old randomness, so there is a real chance that he ends up with an
inferior website colour.
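One way to check John's result is a two-proportion z-test, sketched below with the download counts from the example. The pooled standard error and the two-tailed p value are assumptions about how the comparison is set up; other A/B testing tools may use slightly different formulas.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic and two-tailed p value for a difference in proportions."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # rate under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Red website: 45 of 100 downloads; blue website: 47 of 100 downloads
z_john, p_john = two_proportion_z(45, 100, 47, 100)
print(round(z_john, 3), round(p_john, 3))
```

The p value is far above 0.05, so with only 100 visitors per colour the data do not show a statistically significant difference.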
Variant B’s conversion rate (4.26%) was 11.22% lower than variant A’s conversion rate (4.80%). You can
be 95% confident that variant B will perform worse than variant A.
Power 0.00% p value 1.0000
Example: A/ B testing
Tanishq launched two ads on YouTube during Diwali to promote its
products. Both ads were measured in terms of how many people
watched the ad and how many clicked on it to visit the Tanishq store.
Using the following data, determine whether advertisement 2 is more
effective in directing traffic.
Advertisement 1 Advertisement 2
Impressions 343490 344200
Clicks 96720 97535
CTR 28.16% 28.34%
Variant B’s conversion rate (28.34%) was 0.63% higher than variant A’s conversion rate (28.16%). You
can be 95% confident that variant B will perform better than variant A.
Power 75.27% p value 0.0499
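The figures above can be approximately reproduced with a right-tailed two-proportion z-test. This is a sketch under the assumption of a pooled standard error and a one-tailed alternative (advertisement 2 has the higher CTR); the calculator that produced the quoted output may use a different method internally.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_right_tailed(clicks_a, n_a, clicks_b, n_b):
    """Compare click-through rates; H1 is that variant B has the higher rate."""
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    lift = (rate_b - rate_a) / rate_a                    # relative improvement
    pooled = (clicks_a + clicks_b) / (n_a + n_b)         # rate under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 1 - NormalDist().cdf(z)                    # right-tail area
    return rate_a, rate_b, lift, z, p_value


# Impressions and clicks from the table above
ctr_1, ctr_2, lift, z_ads, p_ads = ab_test_right_tailed(96720, 343490, 97535, 344200)
print(f"CTR 1 = {ctr_1:.2%}, CTR 2 = {ctr_2:.2%}, lift = {lift:.2%}")
print(f"z = {z_ads:.3f}, one-tailed p value = {p_ads:.4f}")
```

The one-tailed p value falls only just below 0.05, so the conclusion is sensitive to the choice of a one-tailed rather than a two-tailed test.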
Errors in Hypothesis test
                 Decision
                 Fail to reject H0             Reject H0
H0 (True)        Correct decision              Type I error (Alpha error)
H0 (False)       Type II error (Beta error)    Correct decision
• Type I error: the null hypothesis was true but was rejected (pizza size was ≥ 9, but H0 was rejected).
• Type II error: failing to reject H0 when H0 is false (pizza size was not ≥ 9, but H0 was not rejected).
Handling Error
There are two ways of handling errors:
1. Increasing the confidence level of the test
a. Reduces Type I error
b. Increases Type II error
2. Increasing the sample size
a. Reduces Type II error
b. Does not affect Type I error
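The trade-off can be illustrated by computing the Type II error probability (beta) of a right-tailed z-test for different confidence levels and sample sizes. The true shift in the mean (0.5) and the population standard deviation (2.0) are assumed values used only for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(confidence, n, effect=0.5, sigma=2.0):
    """Beta for a right-tailed z-test when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(confidence)  # alpha = 1 - confidence
    shift = effect * sqrt(n) / sigma             # mean of z when H0 is false
    # Probability that z stays below the critical value, so H0 is not rejected
    return std_normal.cdf(z_critical - shift)


# The Type I error rate is fixed by the confidence level: alpha = 1 - confidence
for confidence in (0.95, 0.99):
    for n in (30, 100):
        beta = type_ii_error(confidence, n)
        print(f"confidence={confidence}, n={n}, "
              f"alpha={1 - confidence:.2f}, beta={beta:.3f}")
```

Under these assumptions, raising the confidence level from 95% to 99% increases beta at a fixed sample size, while raising the sample size from 30 to 100 reduces beta at a fixed confidence level.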
• P value calculator
• http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/pnormal.html
• A/B testing
• https://www.surveymonkey.com/mp/ab-testing-significance-calculator/
• Quiz3. Practice
https://forms.gle/NmAufBEXStaRagRr8
Doubts?
All the Best!
https://www.youtube.com/watch?v=Z9Gw9dIJGiA&t=86s&ab_channel=upGrad_Gmba