Estimation by Confidence Interval
MATRICULE:15E0628C
PRESENTATION ON
Contents
Definition
Estimation
Demonstration
Calculation
Conclusion
Definition
The confidence interval was introduced to statistics by Jerzy Neyman in 1937. A confidence
interval is a range of values, computed from a sample, that is likely to contain an unknown
population parameter. It expresses the degree of uncertainty or certainty in a sampling
method. A confidence interval can be built for any confidence level, but the most common
levels are 95% and 99%.
A confidence level is the percentage of intervals that would contain the true parameter if you
drew a random sample many times and built an interval from each one.
Estimation
Here we are trying to estimate a value that we cannot calculate directly.
Take the population of the US, and suppose we want to know something about it, say the mean
heart rate. We cannot calculate it directly because there are over 200 million people there, and
we cannot measure the heart rate of every single person. The only thing we can do is take a
random sample. This random sample is much smaller than the original population, so we can
measure the heart rate of each person in it and calculate the mean.
A quantity that describes the whole population is called a parameter. Here the parameter is the
population mean, represented by μ. We cannot calculate this population parameter, but we can
estimate it. Our estimate is the mean of the sample, represented by X̄.
We could also use the sample mode or the sample median. These are all measures of central
tendency, and in some cases one is a better estimate than another. Here, we are going to use the
sample mean. This leads us to distinguish two types of estimate, namely the point estimate
and the interval estimate.
Point Estimate
With a point estimate we end up with one single number that represents the value we want to
estimate, which in this case is μ. For instance, to estimate the average heart rate of the
population of the US, we take our random sample, compute its mean, and come out with 74 beats
per minute. That is our point estimate: the single number that stands for the average heart rate
of the population (μ).
Interval Estimate
With an interval estimate, we give a range of values for the mean, and we state how confident we
are that the mean we are looking for lies within that range. For instance, we can say that the
range is between 68 and 76 and that we are 95% confident that μ lies in this interval. Hence an
interval estimate has two parts: the range of values (68↔76) and the degree of confidence (95%).
A point estimate by itself is of limited value because it does not reveal the uncertainty
associated with the estimate: it does not tell us how far this sample mean of 74 beats per minute
might be from the population mean. The degree of uncertainty in this single sample is missing.
The interval estimate provides more information than the point estimate. By establishing a 95%
confidence level using the sample mean and standard deviation, and assuming a normal
distribution, the researcher arrives at a lower and an upper bound that contain the true mean
95% of the time.
Demonstration
Here, we are going to look at some demonstrations that will help us better understand the
concept.
Our random sample has its own distribution, with a mean that we can calculate.
95% CI => X̄ ± 1.96 × δ/√n, where δ is the population standard deviation and n is the sample
size. If we want a different confidence level, we use a different number: 1.645 for 90%, 1.96 for
95% and 2.58 for 99%. These numbers are called the reliability factors and are taken from the
Z table.
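The reliability factors can also be computed rather than read from a printed table. The following is a minimal sketch, assuming Python 3.8 or later and its standard `statistics.NormalDist` class; the function name `reliability_factor` is chosen here for illustration and does not come from the presentation.

```python
from statistics import NormalDist


def reliability_factor(confidence_level: float) -> float:
    """Two-sided critical value z for a confidence level such as 0.95."""
    alpha = 1.0 - confidence_level          # total area left in the two tails
    # The right-tail cut-off leaves alpha/2 above it, so we invert the
    # standard normal CDF at 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z = {reliability_factor(level):.3f}")
# 90% -> z = 1.645
# 95% -> z = 1.960
# 99% -> z = 2.576
```

The 99% factor prints as 2.576; the value 2.58 used above is the same number rounded to two decimals.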
After looking at this, we may ask: why do we use a 95% CI? Why don't we just use a 99% CI,
since it seems more confident and more likely to capture the population mean? To see why,
let's look at this:
99% CI = 40↔120.
The range is exaggerated and very wide. I am 99% sure that the mean heart rate is in here, but
that tells us little, because most people have a heart rate between 60 and 100 anyway. So the
higher the confidence level, the wider the interval and the less useful the estimate: we gain
confidence but lose precision.
Now let's think about the samples we took. Let's consider the 1st to the 6th sample means.
95% of the time, a sample mean falls within about two standard deviations (of the sampling
distribution of the mean) of the population mean (μ). This means that whenever we calculate a
sample mean, we can put an interval of that width around it, and 95% of the time this interval
will contain μ. What we are saying is that if the sample mean is within two standard deviations
of the population mean, then the population mean is also within two standard deviations of the
sample mean.
The time it does not work is when the sample mean is more than two standard deviations away
from the population mean, as in the case of (X̄5) above: its interval does not include the
population mean. So, if we repeat the sampling a hundred times, about 95 of the intervals will
contain the population mean. That is the basis of a confidence interval: 95% of the time the
population mean is within two standard deviations of the sample mean, and 5% of the time it
is not.
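The repeated-sampling idea can be checked with a small simulation. This is only a sketch under stated assumptions: the population values (mean 72, standard deviation 10), the sample size of 40 and the number of repetitions are hypothetical numbers picked for illustration, and the population is assumed to be normal with a known standard deviation.

```python
import random
from math import sqrt


def coverage(mu, sigma, n, trials, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)               # fixed seed so runs are repeatable
    half_width = z * sigma / sqrt(n)        # margin of error, sigma known
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Does the interval around this sample mean contain mu?
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


# Hypothetical population: mean heart rate 72, standard deviation 10.
print(coverage(mu=72.0, sigma=10.0, n=40, trials=2000))
```

The printed fraction should come out close to 0.95, with a few intervals missing μ in the way (X̄5) does.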
Calculation
The confidence interval for a population mean is found by taking the sample mean, which is the
point estimate, and adding and subtracting the margin of error, abbreviated as E:
CI = X̄ ± E
If the population standard deviation δ is known, the margin of error is calculated as
E = Zα/2 × δ/√n
where n is the sample size and α is the significance level, which is related to the confidence
level (CL) by
α = 1 – CL
CL = 1 – α
For a 95% confidence level, α = 1 – 0.95 = 0.05.
Zα/2 is a single value called the critical value. It can be found in the normal table. When
constructing a 95% confidence interval, for example, the confidence level is the area in the
middle of the distribution and the remaining 0.05 (α) is divided equally between the two tails,
as 0.025 and 0.025. In the table, an area of 0.025 in the left tail corresponds to a Z value of
-1.96 and, by symmetry, +1.96 in the right tail.
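The steps from confidence level to margin of error can be sketched in a few lines. This assumes Python's standard `statistics.NormalDist` in place of the printed normal table; the function name and the example inputs (a standard deviation of 10 and a sample of 25) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence_level):
    """E = z_(alpha/2) * sigma / sqrt(n), population standard deviation known."""
    alpha = 1.0 - confidence_level                    # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)       # critical value
    return z * sigma / sqrt(n)


# Area to the left of z = -1.96 under the standard normal curve.
left_tail = NormalDist().cdf(-1.96)
print(f"left tail area at -1.96: {left_tail:.4f}")

# Hypothetical inputs: sigma = 10, n = 25, 95% confidence level.
print(f"E = {margin_of_error(10.0, 25, 0.95):.2f}")
```

With these inputs the margin of error is about 1.96 × 10/5 = 3.92, and the left tail area comes out at roughly 0.025, as described above.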
Application Exercise
Suppose scores on an exam are normally distributed with a population standard deviation of
5.6. A random sample of 40 scores on the exam had a mean of 32. That is:
δ = 5.6
n = 40
X̄ = 32
Construct the following confidence intervals for the population mean score:
1) 80% CI
2) 90% CI
3) 98% CI
In each case the margin of error is
E = Zα/2 × δ/√n = Zα/2 × 5.6/√40
Solution
1) For 80% CL
α = 1 – 0.8 = 0.2, so α/2 = 0.1
To find the Z value, we look for 0.1 in the normal table. The closest value is 0.1003, which
corresponds to a Z score of -1.28 for the left tail and, by symmetry, +1.28 for the right tail.
So, the Z critical value for the 80% confidence interval is 1.28. Hence,
E = 1.28 × 5.6/√40 = 1.13
The interval is 32 ± 1.13. To interpret this, we say we have 80% confidence that the population
mean score is between 30.87 and 33.13.
2) For 90% CL
α = 1 – 0.9 = 0.1, so α/2 = 0.05
Looking at the Z table, 0.05 lies between the areas for the Z scores -1.64 and -1.65. We take the
average of the two and get 1.645, which gives us E:
E = 1.645 × 5.6/√40 = 1.46
We say here that we have 90% confidence that the population mean score lies between
30.54 and 33.46.
3) For 98% CL
α = 1 – 0.98 = 0.02, so α/2 = 0.01
In the Z table, the area closest to 0.01 corresponds to a Z score of -2.33 and, by symmetry,
+2.33 in the right tail. So, we take 2.33. E becomes:
E = 2.33 × 5.6/√40 = 2.06
Here we have 98% confidence that the population mean lies between 29.94 and 34.06, that is,
29.94 ≤ μ ≤ 34.06.
As the confidence level increases, the critical value increases, the margin of error
increases and consequently, the confidence interval becomes wider.
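The three parts of the exercise can be reproduced with a short script. This is a sketch that assumes Python's standard `statistics.NormalDist` for the critical values, so it uses unrounded Z values (for example 1.2816 rather than the table value 1.28); the function name `z_interval` is chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence_level):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # critical value
    e = z * sigma / sqrt(n)                       # margin of error
    return sample_mean - e, sample_mean + e


# Values from the exercise: sample mean 32, sigma 5.6, n = 40.
for level in (0.80, 0.90, 0.98):
    low, high = z_interval(32.0, 5.6, 40, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

To two decimals the bounds agree with the hand calculation (30.87 to 33.13, 30.54 to 33.46, 29.94 to 34.06), and the printed widths grow as the confidence level rises.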
Lastly, there is one very important point about interpretation, which is best shown by illustration.
DIAGRAM
Here is a 95% confidence interval centered on our sample mean X̄. But this does not mean there
is a 95% probability that the population mean is in here.
μ is the population mean, and here it happens to fall inside the interval. That does not make the
probability 95% for this particular interval, because technically the population mean is either
in here or not. (The probability is a 1 or a 0. In the above case, the probability that the mean
is found in this confidence interval is 1.)
DIAGRAM
Here our population mean does not fall within the confidence interval, so the probability that
the population mean is found in this 95% confidence interval is 0.
Hence, we do not take the 95% CL and say it is the probability that a given interval contains the
population mean because, technically, it is not. What it really means is that if we built a
hundred of these confidence intervals, or a thousand, or a million, 95% of the time we would
catch the population mean within them. So, this 95% is our confidence. That is why we call it the
confidence coefficient and not a probability.
Conclusion
The confidence interval is a valuable form of statistical inference that we believe has certain
advantages over conventional hypothesis testing based on tests of significance. In addition,
confidence intervals lend themselves readily to graphical portrayal. Based on a review of
physical therapy, confidence intervals are believed to be underutilized. So, the following
recommendations were made:
Confidence intervals should be included whenever a sample statistic such as the mean is
presented as an estimate of the corresponding population parameter.
Confidence intervals should be provided in addition to the result of the hypothesis test, with
the level of confidence for the confidence interval matched to the level of statistical
significance for the hypothesis test. For example, 95% for P < 0.05 and 99% for P < 0.01.
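The matching of confidence level and significance level can be illustrated with the exam data from the Calculation section. The sketch below assumes a two-sided Z test with a known population standard deviation; the hypothesized mean of 30 is a made-up value for illustration and does not come from the presentation, and the function names are likewise chosen here.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence_level):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    e = z * sigma / sqrt(n)
    return sample_mean - e, sample_mean + e


def two_sided_p_value(sample_mean, sigma, n, mu0):
    """P value of a two-sided Z test of the hypothesis that the mean is mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


MU0 = 30.0  # hypothetical hypothesized mean, not from the presentation
p = two_sided_p_value(32.0, 5.6, 40, MU0)
print(f"P = {p:.4f}")
for level in (0.95, 0.99):
    low, high = z_interval(32.0, 5.6, 40, level)
    inside = low <= MU0 <= high
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}, contains {MU0}: {inside}")
```

For these numbers P falls between 0.01 and 0.05, so the 95% interval excludes the hypothesized mean while the 99% interval still contains it, which is the correspondence the recommendation relies on.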