Introduction to Statistical Inference: Confidence Intervals
Two-Sided Confidence Intervals for the Population Mean
WHAT IS A CONFIDENCE INTERVAL?
CONFIDENCE INTERVAL
A range of values that is likely to contain the true population mean. It is constructed from sample data drawn from the population.
CONFIDENCE LEVEL
is the probability that the interval
will contain the true population
mean.
Common Levels of Confidence
WHAT IS THE PURPOSE OF A CONFIDENCE INTERVAL?
Uses of Confidence Intervals
Give us a sense of how much we can trust our sample-based estimates.
Provide a range of values within which we are reasonably confident the true
population parameter lies.
Help us acknowledge the uncertainty in our estimates by offering a plausible
range for the actual parameter, making our conclusions more reliable and
robust.
FORMULA FOR A TWO-SIDED CONFIDENCE
INTERVAL FOR POPULATION MEAN
Point Estimate ± (Reliability Factor)(Standard Error)
The value of the reliability factor depends on the desired
level of confidence
CONFIDENCE INTERVAL
Confidence Interval for μ
(σ² Known)
Assumptions
Population variance σ² is known
Population is normally distributed, or the sample is large enough that the Central Limit Theorem (CLT) can be used
Confidence interval estimate:
x̄ − z_{α/2}(σ/√n) < μ < x̄ + z_{α/2}(σ/√n)
(where z_{α/2} is the standard normal value that leaves a probability of α/2 in each tail)
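As a quick computational companion to this formula, here is a minimal Python sketch (assuming NumPy and SciPy are available; the function name z_confidence_interval is illustrative, not from the original slides):

```python
import numpy as np
from scipy import stats

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI for the population mean when sigma is known."""
    alpha = 1.0 - confidence
    z = stats.norm.ppf(1.0 - alpha / 2.0)   # z_{alpha/2}, e.g. 1.96 for 95%
    margin = z * sigma / np.sqrt(n)          # (reliability factor) x (standard error)
    return xbar - margin, xbar + margin
```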
EXAMPLE
A sample of 11 circuits from a large normal population has a mean resistance of
2.20 ohms. We know from past testing that the population standard deviation is
0.35 ohms.
Determine a 95% confidence interval for the true mean resistance of the population.
EXAMPLE
A sample of 11 circuits from a large normal population has a mean resistance of
2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.
Solution:
x̄ ± z_{α/2}(σ/√n) = 2.20 ± 1.96 (0.35/√11) = 2.20 ± 0.2068
→ (1.9932, 2.4068)
INTERPRETATION
We are 95% confident that the true mean resistance is between 1.9932 and
2.4068 ohms
Although the true mean may or may not be in this interval, 95% of intervals
formed in this manner will contain the true mean
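The arithmetic can be checked in a few lines of Python (standard library only; numbers taken from the example above):

```python
import math

# Circuit example: n = 11, sample mean 2.20 ohms, known sigma = 0.35 ohms.
n, xbar, sigma = 11, 2.20, 0.35
z = 1.96                                   # z_{alpha/2} for 95% confidence
margin = z * sigma / math.sqrt(n)          # ≈ 0.2068 ohms
print(xbar - margin, xbar + margin)        # ≈ 1.9932, 2.4068
```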
MARGIN OF ERROR
The margin of error is the half-width of a confidence interval: the maximum
distance we expect between the sample estimate and the true population
parameter at the stated confidence level.
Factors influencing it:
sample size (a larger sample reduces it; see the sketch after the formula below),
confidence level (a higher level widens it), and
population variability (more variability increases it).
MARGIN OF ERROR
The confidence interval,
x̄ − z_{α/2}(σ/√n) < μ < x̄ + z_{α/2}(σ/√n)
can also be written as x̄ ± ME,
where ME is called the margin of error:
ME = z_{α/2}(σ/√n)
The interval width, w, is equal to twice the margin of error
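A short sketch (reusing the circuit example's σ = 0.35 purely for illustration) makes the sample-size effect concrete: quadrupling n halves both the margin of error and the interval width.

```python
import math

def margin_of_error(sigma, n, z=1.96):
    """ME = z_{alpha/2} * sigma / sqrt(n); interval width w = 2 * ME."""
    return z * sigma / math.sqrt(n)

sigma = 0.35
for n in (11, 44, 176):                        # each step quadruples n
    me = margin_of_error(sigma, n)
    print(f"n={n:3d}  ME={me:.4f}  w={2 * me:.4f}")
```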
CONFIDENCE INTERVAL
Confidence Interval for μ
(σ² Unknown)
If the population standard deviation σ is unknown, we can substitute the
sample standard deviation, s
This introduces extra uncertainty, since s varies from sample to sample
Therefore we use the t distribution instead of the normal distribution
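A minimal sketch of this σ-unknown case (the function name t_confidence_interval is illustrative; assumes NumPy and SciPy), substituting s for σ and t_{α/2, n−1} for z_{α/2}:

```python
import numpy as np
from scipy import stats

def t_confidence_interval(sample, confidence=0.95):
    """Two-sided CI for the mean when sigma is unknown: xbar ± t * s / sqrt(n)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    xbar = sample.mean()
    s = sample.std(ddof=1)                   # sample standard deviation (n - 1)
    t = stats.t.ppf(1.0 - (1.0 - confidence) / 2.0, df=n - 1)
    margin = t * s / np.sqrt(n)
    return xbar - margin, xbar + margin
```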
Confidence Level and z-Score
The relationship between confidence level and z-score is based on the standard normal
distribution (Z-distribution) and is used in constructing confidence intervals in statistics.
A confidence level represents the probability that the confidence interval, constructed from
sample data, contains the true population parameter of interest. The confidence level is often
denoted by (1 - α), where α is the significance level, representing the probability of making a
Type I error (rejecting a true null hypothesis). Commonly used confidence levels include 90%,
95%, and 99%.
The z-score corresponds to the number of standard deviations away from the mean for a given
probability or confidence level. For example, for a standard normal distribution (mean = 0,
standard deviation = 1), a z-score of approximately 1.96 corresponds to a 95% confidence level.
The relationship between confidence level
and z-score can be summarized as follows:
1. For a given confidence level (1 - α), you can find the corresponding z-score
by looking it up in the standard normal distribution table (often denoted as
z_{α/2}, where α/2 is the tail probability for a two-tailed test).
2. Conversely, if you know the desired z-score (e.g., from the standard normal
distribution table), you can calculate the corresponding confidence level by
finding the area under the curve (probability) associated with that z-score.
For example, for a 95% confidence level:
The z-score is approximately 1.96 (or -1.96 for the lower bound).
This means that approximately 95% of the area under the standard
normal curve lies within 1.96 standard deviations from the mean,
leaving 2.5% in each tail.
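Both directions of this lookup can be done in code rather than in a printed table; a small sketch using SciPy's standard normal helpers:

```python
from scipy.stats import norm

confidence = 0.95
alpha = 1.0 - confidence

# Confidence level -> z-score: inverse CDF at 1 - alpha/2.
z = norm.ppf(1.0 - alpha / 2.0)
print(z)                                # ≈ 1.96

# z-score -> confidence level: central area between -z and +z.
print(norm.cdf(z) - norm.cdf(-z))       # ≈ 0.95, leaving alpha/2 = 0.025 per tail
```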
t-Distribution
Confidence Level and the t-Distribution
What is a one-tailed test?
A one-tailed test, also known as a one-sided test, is a
statistical hypothesis test in which the critical region of the
test is on only one side of the sampling distribution. This
means that it focuses on determining whether the sample
mean is significantly greater than or less than a
hypothesized population mean, but not both.
What is a one-tailed test?
In contrast, a two-tailed test looks for the possibility of the sample
mean being significantly different from the hypothesized population
mean in either direction.
One-tailed tests are often used when there is a specific directional
hypothesis or when researchers are interested in the effect in only one
direction. They can be more powerful than two-tailed tests in detecting
effects in that specific direction but are less suitable when the effect
could go in either direction.
What is a two-tailed test?
A two-tailed test, also known as a two-sided test, is a statistical hypothesis test
in which the critical region of the test is on both sides of the sampling
distribution. This means that it assesses whether the sample mean is
significantly different from a hypothesized population mean, without
specifying a particular direction of the difference.
In contrast to a one-tailed test, which focuses on determining whether the
sample mean is significantly greater than or less than a hypothesized
population mean, a two-tailed test looks for the possibility of the sample mean
being significantly different from the hypothesized population mean in either
direction.
t-table
A t-table, also known as a t-distribution table, is a statistical tool used in hypothesis
testing and confidence interval estimation when working with the Student's t-
distribution. The t-table provides critical values of the t-statistic for different levels
of significance (alpha) and degrees of freedom.
The t-distribution is similar to the standard normal distribution but is used when the
sample size is small or when the population standard deviation is unknown. It is
shaped like a bell curve but has thicker tails, which accounts for the increased
variability that arises from smaller sample sizes.
t-table
To use a t-table, you need to know the degrees of freedom (df), which is determined
by the sample size and is equal to the sample size minus one. You also need to
specify the desired level of significance, α, typically 0.05 for a 95% confidence level.
Once you have the degrees of freedom and the desired level of significance, you
can find the critical value of the t-statistic from the table. This critical value is then
compared to the calculated t-statistic from your sample data to make decisions in
hypothesis testing or construct confidence intervals.
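The lookups a printed t-table provides can be reproduced with SciPy; a sketch that prints two-tailed critical values at α = 0.05 for a few degrees of freedom:

```python
from scipy.stats import t

alpha = 0.05                                   # 95% confidence, two-tailed
for df in (5, 10, 30, 100):
    crit = t.ppf(1.0 - alpha / 2.0, df=df)     # upper-tail critical value
    print(f"df={df:3d}  t_crit={crit:.3f}")    # approaches z = 1.96 as df grows
```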
Hypothesis Testing
Hypothesis testing is a fundamental statistical method used to make inferences about population
parameters based on sample data. It involves formulating two competing hypotheses about the
population parameter of interest, known as the null hypothesis (H0) and the alternative
hypothesis (H1 or Ha).
The null hypothesis typically represents the status quo or the absence of an effect, stating that
there is no significant difference or relationship in the population. The alternative hypothesis, on
the other hand, represents what the researcher is trying to find evidence for, such as the presence
of an effect, a difference between groups, or a relationship.
Hypothesis testing is widely used in various fields, including science, medicine, economics, and
psychology, to evaluate theories, test hypotheses, and make informed decisions based on
empirical evidence.
The process of hypothesis testing involves the following steps:
1. Formulating hypotheses: Clearly state the null and alternative hypotheses based on the
research question.
2. Selecting a significance level: Choose a threshold for significance, denoted as α
(alpha), typically set at 0.05. This represents the maximum probability of making a
Type I error (incorrectly rejecting the null hypothesis when it is true).
3. Collecting data: Gather a random sample from the population of interest.
4. Determining the test statistic: Compute the appropriate test statistic based on the
sample data and the chosen hypothesis test (e.g., t-test, z-test, chi-square test).
5. Calculating the p-value: Determine the probability of obtaining the observed results
(or more extreme) under the assumption that the null hypothesis is true. This is the p-
value.
6. Making a decision: Compare the p-value to the significance level. If the p-value is
less than or equal to α, reject the null hypothesis in favor of the alternative hypothesis.
If the p-value is greater than α, fail to reject the null hypothesis.
7. Drawing conclusions: Based on the decision made in step 6, draw conclusions about
the population parameter of interest and interpret the results in the context of the
research question.
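As an illustration of steps 3–6 (using simulated data so the example is runnable, not results from a real study), a one-sample, two-sided t-test with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.3, scale=0.35, size=11)   # simulated measurements

# H0: mu = 2.20 vs. H1: mu != 2.20, at alpha = 0.05.
alpha = 0.05
result = stats.ttest_1samp(sample, popmean=2.20)    # two-sided by default
print(result.statistic, result.pvalue)
print("Reject H0" if result.pvalue <= alpha else "Fail to reject H0")
```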
The Central Limit Theorem
The central limit theorem (CLT) is a fundamental concept in statistics that describes the
behavior of the sampling distribution of the sample mean or the sample sum as the sample size
increases, regardless of the shape of the population distribution. It states that under certain
conditions, the sampling distribution of the sample mean or sum approaches a normal
distribution, even if the population distribution is non-normal.
The central limit theorem has profound implications in statistical inference because it allows
us to make probabilistic statements about sample statistics, such as the sample mean or sum,
even when we may not know the exact population distribution. It forms the basis for many
statistical methods, including hypothesis testing, confidence interval estimation, and
regression analysis.
Key points about the central limit theorem:
1. Sample Size: As the sample size (n) increases, the sampling distribution of the sample mean or
sum approaches a normal distribution.
2. Population Distribution: The population distribution from which the samples are drawn can be
non-normal, as long as the sample size is sufficiently large (typically n ≥ 30 for a good
approximation).
3. Independence: The individual observations within the sample should be independent of each
other.
4. Random Sampling: The samples should be selected randomly from the population of interest.
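A small simulation (illustrative only; an exponential population is chosen because it is clearly skewed) shows the theorem at work: the sample means center on the population mean with spread σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of size 30 from a skewed exponential population (mean 1, sd 1).
n, reps = 30, 10_000
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# CLT prediction: mean ≈ 1.0, standard deviation ≈ 1 / sqrt(30) ≈ 0.183.
print(sample_means.mean(), sample_means.std(ddof=1))
```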
Two-sided confidence intervals are commonly used in various real-
world scenarios, such as:
Medical Research: When estimating the effectiveness of a new
treatment, researchers may use a two-sided confidence interval to express
the range within which they are reasonably confident that the true effect
lies.
Market Research: In polling or survey analysis, two-sided confidence
intervals help convey the uncertainty around estimated percentages or
proportions, providing a range for the true population parameter.
Manufacturing Quality Control: When assessing the precision of a
manufacturing process, two-sided confidence intervals can be employed
to express the range within which the true mean of a certain parameter is
likely to fall.
Finance and Investments: Analysts may use two-sided confidence
intervals to estimate the range for the true value of financial indicators,
like stock returns or portfolio performance.
Environmental Studies: In ecological research, scientists might use two-
sided confidence intervals to estimate the range of population parameters,
such as species diversity, based on sampled data.
CONCLUSION
Two-sided confidence intervals provide a range of values within
which we are confident the true parameter lies.
They account for uncertainty in both directions and offer a more
comprehensive perspective in statistical inference. By considering
both upper and lower bounds, they provide a nuanced
understanding of the parameter's possible values, enhancing the
robustness of our conclusions.
THANK YOU!!!