Two-Sample Inference Techniques Explained

The document provides an overview of statistical tests for comparing two populations, including: - Z-tests and confidence intervals for comparing two independent sample means when population standard deviations are known - T-tests and confidence intervals for comparing two independent sample means when population standard deviations are unknown - Paired t-tests and confidence intervals for comparing two dependent/matched samples, such as comparing test scores from the same subjects before and after an intervention. Examples are provided to demonstrate how to perform and interpret each type of test.

Uploaded by

Adinaan Shaafii

Two Sample Inference

By: Girma M.

Outline
• Introduction
• Z test and CIs for the difference between two sample means
• t test and CIs for the difference between two sample means
• Paired data analysis
• Z test and CIs for the difference between two sample proportions

Introduction
• The previous chapters presented estimation and hypothesis testing for a single population parameter (the mean µ and the proportion p). This chapter extends those concepts to the case of two populations.
• Researchers often wish to compare two sample means, for example from an experimental group and a control group.
• For example, the average growing times of two different types of papaya might be compared to see whether they differ. Two different brands of fertilizer might be tested to see whether one is better than the other for growing plants.
• The objective of this chapter is to study the difference between the parameters of two populations.

Z test and CIs for the difference between two sample means
• Suppose a researcher wishes to see whether average maize production differs between an experimental group that uses fertilizer and a control group that uses compost. In this case, the researcher is not interested in the average production of each group on its own; instead, he is interested in comparing the means of the two groups. His research question is: does the mean production using fertilizer differ from the mean production using compost?
• Here, the hypotheses are:
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
• Where: $\mu_1$ = the population mean production of maize using fertilizer, and $\mu_2$ = the population mean production of maize using compost.
• If there is no difference in the population means, subtracting them gives a difference of zero. If they are different, subtracting gives a number other than zero. The hypotheses can therefore also be written as $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$.
• First, assume that both population distributions are normal and that both $\sigma_1^2$ and $\sigma_2^2$ are known.
• Because the population distributions are normal, both $\bar{x}$ and $\bar{y}$ have normal distributions. Furthermore, independence of the two samples implies that the two sample means are independent of one another.
• Thus, the difference $\bar{x} - \bar{y}$ is normally distributed, with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2/m + \sigma_2^2/n$, where $m$ and $n$ are the two sample sizes. Standardizing $\bar{x} - \bar{y}$ gives the standard normal variable:
$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$

• The two-tailed hypotheses can be stated as
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
• The decision rules are the same as in single-sample inference: for the two-tailed test, reject $H_0$ when $|Z| > z_{\alpha/2}$.
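
As a rough sketch of these decision rules, the snippet below computes the critical value with Python's standard-library `statistics.NormalDist` and reports whether a test value falls in the rejection region. The function name `reject_h0` and the `tail` labels are illustrative assumptions, not part of the slides.

```python
from statistics import NormalDist


def reject_h0(z, alpha, tail="two"):
    """Return True when the test value z lies in the rejection region.

    tail: "two" for H1: mu1 != mu2, "right" for H1: mu1 > mu2,
          "left" for H1: mu1 < mu2 (labels are illustrative).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "two":
        # Split alpha equally between the two tails.
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")
```

For instance, `reject_h0(2.0, 0.05)` compares 2.0 with the two-tailed critical value of about 1.96.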
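• A 100(1 − α)% confidence interval for the difference between the population means $(\mu_1 - \mu_2)$ is:
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
• If both samples are large and the population variances are not given, the sample variances $s_1^2$ and $s_2^2$ are used in their place, and a 100(1 − α)% confidence interval for the difference between the population means is:
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
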
Example
1. An agricultural researcher wishes to see whether production with fertilizer is higher than production without fertilizer. Samples of 100 products grown with fertilizer and 100 grown without fertilizer are selected. The results are $\bar{x} = 90$, $s_1 = 6$ and $\bar{y} = 88$, $s_2 = 7$.
a) Can the researcher conclude, at α = 0.05, that production with fertilizer is greater than production without fertilizer?
b) Find a 99% confidence interval for the difference between the true mean productions.
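
One possible way to check Example 1 numerically is sketched below. It assumes the large-sample z procedure, with the sample standard deviations standing in for the unknown population values; the function names are hypothetical and only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, s1, s2, m, n, delta0=0.0):
    """Large-sample z statistic and standard error for H0: mu1 - mu2 = delta0."""
    se = sqrt(s1**2 / m + s2**2 / n)
    return (xbar - ybar - delta0) / se, se


def z_interval(xbar, ybar, se, level):
    """Two-sided confidence interval for mu1 - mu2 at the given level."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = xbar - ybar
    return diff - z_crit * se, diff + z_crit * se


# Part (a): right-tailed test, H1: mu1 > mu2, alpha = 0.05
z, se = two_sample_z(90, 88, 6, 7, 100, 100)
z_crit = NormalDist().inv_cdf(0.95)
reject = z > z_crit

# Part (b): 99% confidence interval for mu1 - mu2
low, high = z_interval(90, 88, se, 0.99)

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
print(f"99% CI: ({low:.3f}, {high:.3f})")
```

Under these assumptions the test value is about 2.17, which exceeds the right-tailed critical value of about 1.645, so $H_0$ is rejected at α = 0.05. The 99% interval is roughly (−0.37, 4.37).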
2. Lengths of Major U.S. Rivers: A researcher wishes to see if the average length of the major rivers in the United States is the same as the average length of the major rivers in Europe. The data (in miles) of a sample of rivers are shown. At α = 0.01, is there enough evidence to reject the claim? Assume $\sigma_1 = 450$ and $\sigma_2 = 474$. Use both the rejection region method and the confidence interval method.

t test and CIs for the difference between two sample means
• The z test was used to test the difference between two means when the population standard deviations were known and the variables were normally or approximately normally distributed, or when both sample sizes were greater than or equal to 30.
• In many situations, however, these conditions cannot be met; that is, the population standard deviations are not known.
• In these cases, a t test is used to test the difference between means when the two samples are independent and are taken from two normally or approximately normally distributed populations. Samples are independent when they are not related. It will also be assumed that the variances are not equal.
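• The test value is
$$t = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
where the degrees of freedom are equal to the smaller of $m - 1$ and $n - 1$.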
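• Confidence intervals can also be found for the difference between two means with this formula:
$$(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
• Many statistical software packages use a different method to compute the degrees of freedom for this t test. They are determined by the formula
$$\text{d.f.} = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}}$$
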
Example
• The average size of a farm in Indiana County, Pennsylvania, is 191 acres. The average size of a farm in Greene County, Pennsylvania, is 199 acres. Assume the data were obtained from two samples with standard deviations of 38 and 12 acres, respectively, and sample sizes of 8 and 10, respectively. Can it be concluded at α = 0.05 that the average size of the farms in the two counties is different? Construct a 95% confidence interval for the difference between the true average farm sizes of the two counties. Assume the populations are normally distributed.

Source: Pittsburgh Tribune-Review.
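
The farm example can be worked through with a short sketch like the one below. It assumes the unequal-variance t statistic, reports both the conservative degrees of freedom (the smaller of m − 1 and n − 1) and the software-style approximation, and hard-codes the critical value 2.365 taken from a standard t table for d.f. = 7 and a two-tailed α = 0.05. The function name `welch_t` is illustrative.

```python
from math import sqrt


def welch_t(xbar, ybar, s1, s2, m, n):
    """Unequal-variance t statistic with two choices of degrees of freedom."""
    v1, v2 = s1**2 / m, s2**2 / n          # per-sample variance of the mean
    se = sqrt(v1 + v2)                      # standard error of xbar - ybar
    t = (xbar - ybar) / se                  # test value under H0: mu1 = mu2
    df_small = min(m - 1, n - 1)            # conservative hand-calculation d.f.
    df_software = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))
    return t, se, df_small, df_software


t, se, df_small, df_software = welch_t(191, 199, 38, 12, 8, 10)

T_CRIT = 2.365  # t table: two-tailed alpha = 0.05, d.f. = 7 (assumed lookup)
reject = abs(t) > T_CRIT
ci = ((191 - 199) - T_CRIT * se, (191 - 199) + T_CRIT * se)

print(f"t = {t:.3f}, d.f. = {df_small} (software d.f. about {df_software:.2f})")
print(f"reject H0: {reject}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With these inputs the test value is about −0.57, well inside ±2.365, so there is not enough evidence that the average farm sizes differ. The 95% interval is roughly (−41.0, 25.0) acres and contains zero, which agrees with the test.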

Warning!!!
• In some situations, we can reasonably assume that the unknown variances $\sigma_1^2$ and $\sigma_2^2$ are equal for two independent normal populations with means $\mu_1$ and $\mu_2$.
• It then seems reasonable to combine the two sample variances into a single estimator of the common variance $\sigma^2$. This pooled estimator is defined as follows:
$$s_p^2 = \frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2}$$
• Where $s_p^2$ is the pooled variance, an estimator of the common variance $\sigma^2$. The test statistic becomes:
$$t = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
• The test statistic t has a t distribution with $m + n - 2$ degrees of freedom.

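
A minimal sketch of the pooled procedure follows, assuming the usual pooled-variance estimator that weights each sample variance by its degrees of freedom. The function name and the summary statistics in the usage lines are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def pooled_t(xbar, ybar, s1, s2, m, n, delta0=0.0):
    """Equal-variance (pooled) t statistic for H0: mu1 - mu2 = delta0."""
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
    se = sqrt(sp2) * sqrt(1 / m + 1 / n)
    t = (xbar - ybar - delta0) / se
    return t, sp2, m + n - 2


# Hypothetical summary statistics (not taken from the slides)
t_pooled, sp2, df = pooled_t(xbar=20, ybar=18, s1=3, s2=4, m=10, n=12)
print(f"pooled variance = {sp2:.2f}, t = {t_pooled:.3f}, d.f. = {df}")
```

The resulting t value would be compared with a t-table critical value for m + n − 2 degrees of freedom, here 20.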
Analysis of Paired Data
• There are a number of experimental situations in which there is only one set of n individuals or experimental objects; making two observations on each one results in a natural pairing of values. Samples are considered to be dependent samples when the subjects are paired or matched in some way.
• For example, suppose a medical researcher wants to see whether a drug will affect the reaction time of its users.
• To test this hypothesis, the researcher must first pretest the subjects in the sample. That is, they are given a test to ascertain their normal reaction times. Then, after taking the drug, the subjects are tested again, using a posttest.
• Finally, the means of the two tests are compared to see whether there is a difference.
• Since the same subjects are used in both cases, the samples are related; subjects scoring high on the pretest will generally score high on the posttest, even after consuming the drug. The researcher therefore employs a t test, using the differences between the pretest values and the posttest values. Thus only the gain or loss in values is compared.
• When the samples are dependent, a special t test for dependent means is used. This test employs the difference in values of the matched pairs. The hypotheses are as follows:
$H_0: \mu_D = 0$ vs. $H_1: \mu_D \neq 0$ (two-tailed), with $H_1: \mu_D > 0$ or $H_1: \mu_D < 0$ for the one-tailed tests.
• Where $\mu_D$ is the symbol for the expected mean of the difference of the matched pairs.
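• The general procedure for finding the test value involves several steps: find the difference $D$ for each matched pair, then the mean $\bar{D}$ and the standard deviation $s_D$ of the differences, and finally the test value
$$t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}}, \qquad \text{d.f.} = n - 1$$
• Where the observed value $\bar{D}$ is the mean of the differences. The expected value $\mu_D$ is zero if the hypothesis is $\mu_D = 0$. The standard error of the difference is the standard deviation of the differences divided by the square root of the sample size. Both populations must be normally or approximately normally distributed.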
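
The procedure can be sketched in a few lines of Python using only the standard library. The function name `paired_t` and the pretest and posttest values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, mu_d=0.0):
    """Paired t statistic based on the differences before - after."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)        # observed mean of the differences
    s_d = stdev(diffs)         # sample standard deviation (divisor n - 1)
    se = s_d / sqrt(n)         # standard error of the mean difference
    return (d_bar - mu_d) / se, d_bar, s_d, n - 1


# Hypothetical pretest and posttest values for four subjects
pretest = [10, 12, 9, 11]
posttest = [8, 11, 9, 9]
t_paired, d_bar, s_d, df = paired_t(pretest, posttest)
print(f"mean difference = {d_bar}, s_D = {s_d:.4f}, t = {t_paired:.3f}, d.f. = {df}")
```

The direction of the subtraction (pretest minus posttest here) is a choice; it only needs to match the way the alternative hypothesis is written.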

Example
1

Example
2. A sample of nine local banks shows their deposits (in billions of dollars) 3 years ago and their deposits (in billions of dollars) today. At α = 0.05, can it be concluded that the average deposit for the banks is greater today than it was 3 years ago? Find the 99% confidence interval for $\mu_D$.

• Note: Confidence intervals can be found for the mean differences with this formula:
$$\bar{D} - t_{\alpha/2}\frac{s_D}{\sqrt{n}} < \mu_D < \bar{D} + t_{\alpha/2}\frac{s_D}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$

Z test and CIs for the difference between two sample proportions
• The z test with some modifications can be used to test the equality of two proportions.
• For example, a researcher might ask: Is the proportion of men who exercise regularly less than the proportion of women who exercise regularly? Is there a difference in the percentage of students who own a personal computer and the percentage of nonstudents who own one? Is there a difference in the proportion of infected trees in the natural forest and infected trees in the plantation?
• When you are testing the difference between two population proportions $p_1$ and $p_2$, the hypotheses can be stated as follows if no difference between the proportions is hypothesized:
$H_0: p_1 = p_2$ vs. $H_1: p_1 \neq p_2$
• Similar statements using > or < in the alternative hypothesis can be formed for one-tailed tests.
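• For two proportions, $\hat{p}_1 = X_1/n_1$ is used to estimate $p_1$ and $\hat{p}_2 = X_2/n_2$ is used to estimate $p_2$. The standard error of the difference is:
$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
• Where $\sigma_{p_1}^2$ and $\sigma_{p_2}^2$ are the variances of the proportions, $q_1 = 1 - p_1$, $q_2 = 1 - p_2$, and $n_1$ and $n_2$ are the respective sample sizes.
• Since $p_1$ and $p_2$ are unknown, a weighted estimate of p can be computed by using the formula:
$$\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$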
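• This weighted estimate is based on the hypothesis that $p_1 = p_2$. Hence $\bar{p}$ is a better estimate than either $\hat{p}_1$ or $\hat{p}_2$, since it is a combined average using both. Since $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$, $\bar{p}$ can be simplified to:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
• Finally, the standard error of the difference in terms of the weighted estimate is:
$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \bar{q} = 1 - \bar{p}$$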

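• The test value is the standardized difference between the sample proportions:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
• The formula for the confidence interval for the difference between two proportions is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$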

• Example: Survey on Inevitability of War. A sample of 200 teenagers shows that 50 believe that war is inevitable, and a sample of 300 people over age 60 shows that 93 believe war is inevitable. Is the proportion of teenagers who believe war is inevitable different from the proportion of people over age 60 who do? Use α = 0.01. Find the 99% confidence interval for the difference of the two proportions.
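
A possible numerical check of this example is sketched below. It assumes the pooled (weighted) estimate for the test statistic and the separate sample proportions for the confidence interval, which is the common textbook convention; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled estimate p_bar."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    se_pooled = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se_pooled


def proportion_diff_interval(x1, n1, x2, n2, level):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


z = two_proportion_z(50, 200, 93, 300)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)      # two-tailed, alpha = 0.01
reject = abs(z) > z_crit
low, high = proportion_diff_interval(50, 200, 93, 300, 0.99)

print(f"z = {z:.3f}, critical value = ±{z_crit:.3f}, reject H0: {reject}")
print(f"99% CI: ({low:.3f}, {high:.3f})")
```

With these counts the test value is about −1.45, inside ±2.576, so the difference is not significant at α = 0.01. The 99% interval of roughly (−0.165, 0.045) contains zero, which agrees with that decision.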
Testing the Difference Between Two Variances
• In addition to comparing two means and proportions, statisticians are interested in comparing two variances or standard deviations. For example, is the variation in the temperatures for a certain month different for two cities?
• For the comparison of two variances or standard deviations, an F test is used.
• If two independent samples are selected from two normally distributed populations in which the variances are equal ($\sigma_1^2 = \sigma_2^2$), and if the sample variances $s_1^2$ and $s_2^2$ are compared as the ratio $s_1^2/s_2^2$, the sampling distribution of this ratio is called the F distribution.

• The shapes of several curves for the F distribution are shown in the following figure:

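• When you are testing the equality of two variances, these hypotheses are used:
$H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$ (two-tailed), with $H_1: \sigma_1^2 > \sigma_2^2$ or $H_1: \sigma_1^2 < \sigma_2^2$ for the one-tailed tests.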
• There are four key points to keep in mind when you are using the F test.

Example
Heart Rates of Smokers
• A medical researcher wishes to see whether the variance of the
heart rates (in beats per minute) of smokers is different from the
variance of heart rates of people who do not smoke. Two samples
are selected, and the data are as shown. Using 𝛼 = 0.1, is there
enough evidence to support the claim?
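
The data for this example are not reproduced here, so the sketch below only illustrates how an F test value could be computed from two raw samples. It assumes the common convention of placing the larger sample variance in the numerator; the function name and the heart-rate values are hypothetical.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Ratio of sample variances with the larger variance in the numerator.

    Returns (F, numerator d.f., denominator d.f.).
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical heart rates in beats per minute (not the data from the slide)
group_a = [70, 74, 78, 82, 86]
group_b = [72, 74, 76, 78]
f_value, df_num, df_den = f_statistic(group_a, group_b)
print(f"F = {f_value:.2f}, d.f.N = {df_num}, d.f.D = {df_den}")
```

The test value would then be compared with a critical value from an F table for the chosen α and these degrees of freedom.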
