Lecture 3

stat

Uploaded by

sarpercakirli0
Copyright © All Rights Reserved

Examples 3-4

3) In a factory, bulbs are produced with a standard deviation of 20 hours and an average durability of 550 hours. It is known that the durability of the bulbs is normally distributed. Quality is kept under control by randomly selecting 100 units, with a probability of 95%. In a selected sample, the average is calculated as 540 hours.
A) Find the confidence interval of the average durability of the bulbs with a probability of 95%. Are the sample means up to the standard?
B) Find the interval for the sample means to be up to the standard.

4) The following tables show the distribution of the land cultivated by farmers selected randomly among 2000 farmers, and their annual incomes.

Cultivated Land (acres) | Number of Farmers
16 | 5
20 | 8
25 | 12
28 | 10
30 | 3
45 | 2

Annual Income (1000 TL) | Number of Farmers
140-148 | 3
148-156 | 8
156-164 | 13
164-172 | 10
172-180 | 6

A) Estimate the total land cultivated in this region and
B) the average annual income of the farmers, with a probability of 99%.
3) The population is normally distributed, the s.d. of the population is known, and N is not known.

A) $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{20}{\sqrt{100}} = 2$

$\mu = \bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}} = 540 \pm 1.96(2) \;\Rightarrow\; 536 \le \mu \le 544$

With 95% confidence, the mean durability of the bulbs lies between 536 and 544 hours. Since the standard of 550 hours lies above this interval, the bulbs are produced below the standard.

B) $\bar{x} = \mu \pm 1.96\,\sigma_{\bar{x}} = 550 \pm 1.96(2) \;\Rightarrow\; 546 \le \bar{x} \le 554$

When the sample mean durability lies between 546 and 554 hours, the production is up to the standard.
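Both parts of Example 3 use the same "centre ± Z·σ/√n" computation; only the centre changes. The sketch below is a minimal illustration assuming the rounded critical value 1.96 used in the slides; the helper name `z_interval` is hypothetical. Part A centres the interval on the sample mean, part B on the standard μ = 550.

```python
import math

def z_interval(center, sigma, n, z=1.96):
    """Return center -/+ z * sigma / sqrt(n) (sigma known, normal population)."""
    se = sigma / math.sqrt(n)          # standard error of the mean: 20 / 10 = 2
    return center - z * se, center + z * se

# A) confidence interval for mu, centred on the sample mean 540
lo_a, hi_a = z_interval(540, 20, 100)
print(f"A) {lo_a:.2f} <= mu <= {hi_a:.2f}; standard 550 inside: {lo_a <= 550 <= hi_a}")

# B) range of sample means consistent with the standard mu = 550
lo_b, hi_b = z_interval(550, 20, 100)
print(f"B) {lo_b:.2f} <= x_bar <= {hi_b:.2f}")
```

Part A gives 536.08 to 543.92, which excludes 550; the slide rounds these bounds to whole hours.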
4) The s.d. of the population is not known, $n \ge 30$ and $n/N < 0.05$.
A) Estimations for Cultivated Land

Cultivated Land (acres) $x_i$ | Number of Farmers $f_i$ | $f_i x_i$ | $f_i x_i^2$
16 | 5 | 80 | 1280
20 | 8 | 160 | 3200
25 | 12 | 300 | 7500
28 | 10 | 280 | 7840
30 | 3 | 90 | 2700
45 | 2 | 90 | 4050
Total | 40 | 1000 | 26570

$\bar{x} = \dfrac{\sum f_i x_i}{n} = \dfrac{1000}{40} = 25$ acres

$s = \sqrt{\dfrac{\sum f_i x_i^2 - \dfrac{(\sum f_i x_i)^2}{n}}{n-1}} = \sqrt{\dfrac{26570 - \dfrac{1000^2}{40}}{39}} = 6.345$

$n/N = 40/2000 = 0.02 < 0.05$

$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{6.345}{\sqrt{40}} = 1.003$

$N\mu = N(\bar{x} \pm Z_{\alpha/2}\,\hat{\sigma}_{\bar{x}}) = 2000(25 \pm 2.58 \times 1.003) \;\Rightarrow\; 44824.52 \le N\mu \le 55175.48$ acres

The size of the cultivated land is estimated to be between 44824.52 and 55175.48 acres, with 99% confidence.
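The frequency-table shortcut for $\bar{x}$ and $s$, followed by scaling the interval by N, can be checked with a short sketch. The helper name `grouped_mean_sd` is hypothetical, and 2.58 is the rounded $Z_{0.005}$ from the slide.

```python
import math

def grouped_mean_sd(values, freqs):
    """Mean and sample s.d. of a frequency table, using the shortcut
    s^2 = (sum f*x^2 - (sum f*x)^2 / n) / (n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
    return n, mean, s

land = [16, 20, 25, 28, 30, 45]      # cultivated land (acres)
farmers = [5, 8, 12, 10, 3, 2]       # number of farmers
N, z = 2000, 2.58                    # population size, Z_{0.005}

n, mean, s = grouped_mean_sd(land, farmers)
se = s / math.sqrt(n)                # n/N = 0.02 < 0.05, so no finite population correction
total_lo, total_hi = N * (mean - z * se), N * (mean + z * se)
print(f"x_bar = {mean}, s = {s:.3f}, se = {se:.3f}")
print(f"total land: {total_lo:.2f} to {total_hi:.2f} acres")
```

The code carries the unrounded standard error (about 1.0032 instead of 1.003), so its bounds come out roughly one acre wider than the slide's 44824.52 and 55175.48.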
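B) Estimations for Annual Income

Annual Income (1000 TL) | Number of Farmers $f_i$ | Midpoint $m_i$ | $f_i m_i$ | $f_i m_i^2$
[140-148) | 3 | 144 | 432 | 62208
[148-156) | 8 | 152 | 1216 | 184832
[156-164) | 13 | 160 | 2080 | 332800
[164-172) | 10 | 168 | 1680 | 282240
[172-180) | 6 | 176 | 1056 | 185856
Total | 40 | | 6464 | 1047936

$\bar{x} = \dfrac{\sum f_i m_i}{n} = \dfrac{6464}{40} = 161.6$ (1000 TL), i.e. 161600 TL

$s = \sqrt{\dfrac{\sum f_i m_i^2 - \dfrac{(\sum f_i m_i)^2}{n}}{n-1}} = \sqrt{\dfrac{1047936 - \dfrac{6464^2}{40}}{39}} = 9.273$

$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{9.273}{\sqrt{40}} = 1.466$ (1000 TL), i.e. 1466 TL

$\mu = \bar{x} \pm Z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = 161600 \pm 2.58(1466) \;\Rightarrow\; 157817.7 \le \mu \le 165382.3$

With a probability of 99%, the average annual income of the farmers is between 157817.7 and 165382.3 TL.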
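For class-interval data, the only extra step is replacing each class by its midpoint. The sketch below reproduces the column totals and the 99% interval, assuming the same rounded $Z_{0.005} = 2.58$ and converting from thousands of TL at the end.

```python
import math

# Class limits in 1000 TL and number of farmers (Example 4B)
classes = [(140, 148), (148, 156), (156, 164), (164, 172), (172, 180)]
freqs = [3, 8, 13, 10, 6]
z = 2.58  # Z_{0.005} for 99% confidence

mids = [(lo + hi) / 2 for lo, hi in classes]   # 144, 152, ..., 176
n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, mids))
sum_fm2 = sum(f * m * m for f, m in zip(freqs, mids))
mean = sum_fm / n
s = math.sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))
se = s / math.sqrt(n)

# Convert from thousands of TL to TL for reporting
lo, hi = 1000 * (mean - z * se), 1000 * (mean + z * se)
print(f"sum fm = {sum_fm:.0f}, sum fm^2 = {sum_fm2:.0f}")
print(f"{lo:.1f} TL <= mu <= {hi:.1f} TL")
```

The bounds differ from the slide's by under 1 TL because the slide rounds $\hat{\sigma}_{\bar{x}}$ to 1466 TL.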
Example 5

In a region, 26 districts were randomly selected from 300 districts, and the population distribution in these districts was determined as follows. It is known that the population is normally distributed. Estimate the total population in this region with 95% probability.

Number of People (1000s) | Number of Districts
20-24 | 7
24-28 | 10
28-32 | 6
32-40 | 3

The s.d. of the population is not known, the population is normally distributed, $n < 30$ and $n/N > 0.05$.

Number of Inhabitants (1000s) | Number of Districts $f_i$ | $m_i$ | $f_i m_i$ | $f_i m_i^2$
20-24 | 7 | 22 | 154 | 3388
24-28 | 10 | 26 | 260 | 6760
28-32 | 6 | 30 | 180 | 5400
32-40 | 3 | 36 | 108 | 3888
Total | 26 | | 702 | 19436

$\bar{x} = \dfrac{\sum f_i m_i}{n} = \dfrac{702}{26} = 27$ (1000s), i.e. 27000

$s = \sqrt{\dfrac{\sum f_i m_i^2 - \dfrac{(\sum f_i m_i)^2}{n}}{n-1}} = \sqrt{\dfrac{19436 - \dfrac{702^2}{26}}{25}} = 4.391$

$n/N = 26/300 = 0.087 > 0.05$, so the finite population correction is applied:

$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} = \dfrac{4.391}{\sqrt{26}}\sqrt{\dfrac{300-26}{300-1}} = 0.824$

$N\mu = N(\bar{x} \pm t_{\alpha/2,\,n-1}\,\hat{\sigma}_{\bar{x}}) = 300[27 \pm 2.06(0.824)] = 300(27 \pm 1.698) \;\Rightarrow\; 7591 \le N\mu \le 8609$ (1000s)

With 95% confidence, the population of the region is between 7,591,000 and 8,609,000.
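Example 5 combines three decisions: the t-distribution (n < 30, σ unknown), the finite population correction (n/N > 0.05), and scaling by N. A minimal sketch, assuming the tabulated $t_{0.025,25} = 2.06$ and the 0.05 sampling-fraction rule from the slides; `grouped_mean_sd` is a hypothetical helper name.

```python
import math

def grouped_mean_sd(mids, freqs):
    """Mean and sample s.d. of grouped data via the shortcut formula."""
    n = sum(freqs)
    sum_fm = sum(f * m for m, f in zip(mids, freqs))
    sum_fm2 = sum(f * m * m for m, f in zip(mids, freqs))
    return n, sum_fm / n, math.sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))

mids = [22, 26, 30, 36]      # class midpoints, people in 1000s
freqs = [7, 10, 6, 3]        # number of districts
N = 300
t_crit = 2.06                # t_{0.025, 25}, taken from the t table

n, mean, s = grouped_mean_sd(mids, freqs)
fpc = math.sqrt((N - n) / (N - 1)) if n / N > 0.05 else 1.0   # here 26/300 = 0.087
se = s / math.sqrt(n) * fpc
lo, hi = N * (mean - t_crit * se), N * (mean + t_crit * se)   # still in 1000s of people
print(f"x_bar = {mean}, s = {s:.3f}, se = {se:.3f}")
print(f"total population: {lo * 1000:,.0f} to {hi * 1000:,.0f}")
```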
Critical Values of t
If we are sampling from a normal distribution, the t-statistic has a sampling distribution very much like that of the z-statistic: mound-shaped, symmetric, with mean 0. The primary difference between the sampling distributions of t and z is that the t-statistic is more variable than the z, which follows intuitively when you realize that t contains two random quantities ($\bar{x}$ and $s$), whereas z contains only one ($\bar{x}$).
The actual amount of variability in the sampling distribution of t depends on the sample size n. A convenient way of expressing this dependence is to say that the t-statistic has (n − 1) degrees of freedom (df). Recall that the quantity (n − 1) is the divisor that appears in the formula for $s^2$. This number plays a key role in the sampling distribution of $s^2$. In particular, the smaller the number of degrees of freedom associated with the t-statistic, the more variable its sampling distribution will be.
• Note that $t_\alpha$ values are listed for various degrees of freedom, where α refers to the tail area under the t-distribution to the right of $t_\alpha$. For example, if we want the t-value with an area of .025 to its right and 4 df, we look in the table under the column $t_{.025}$ for the entry in the row corresponding to 4 df. This entry, $t_{.025} = 2.776$, is highlighted in Figure 6.9. Recall that the corresponding standard normal z-score is $z_{.025} = 1.96$.
• The last row of Table III, where df = ∞, contains the standard normal z-values. This follows from the fact that as the sample size n grows very large, s becomes closer to σ and thus t becomes closer in distribution to z. In fact, when df = 29, there is little difference between corresponding tabulated values of z and t. Thus, researchers often choose the arbitrary cutoff of n = 30 (df = 29) to distinguish between the large-sample and small-sample inferential techniques when σ is unknown.
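The table entries described above can be reproduced numerically, which also shows $t_\alpha$ shrinking towards $z_\alpha$ as df grows. The sketch below is a stdlib-only illustration, not how the tables were built: it integrates the t density with Simpson's rule and finds $t_\alpha$ by bisection. All function names are hypothetical; `statistics.NormalDist` supplies the z-value.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: 0.5 minus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """t_alpha: the value with right-tail area alpha (0 < alpha < 0.5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha:
            lo = mid      # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

z_025 = NormalDist().inv_cdf(1 - 0.025)
for df in (4, 5, 29, 1000):
    print(df, round(t_critical(0.025, df), 3), "vs z =", round(z_025, 3))
```

For df = 4 this recovers the highlighted entry 2.776, and by df = 29 the value is already close to 1.96.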

• Example: Consider the pharmaceutical company that desires an estimate of the mean increase in blood pressure of patients who take a new drug. The blood pressure increases (points) for the n = 6 patients in the human testing phase are shown in the table below. Use this information to construct a 95% confidence interval for μ, the mean increase in blood pressure associated with the new drug for all patients in the population.
Solution:

$df = n - 1 = 5 < 30$, so the t-distribution is used, with $t_{0.025,5} = 2.571$.

$\bar{x} = \dfrac{1.7 + 3.0 + 0.8 + 3.4 + 2.7 + 2.1}{6} = 2.283$

$s = \sqrt{\dfrac{(1.7 - 2.283)^2 + (3.0 - 2.283)^2 + \cdots + (2.1 - 2.283)^2}{5}} = 0.94956$

$\bar{x} \pm t_{0.025,5}\left(\dfrac{s}{\sqrt{n}}\right) = 2.283 \pm 2.571\left(\dfrac{0.94956}{\sqrt{6}}\right) = 2.283 \pm 0.997 \;\Rightarrow\; 1.286 \le \mu \le 3.28$

We can be 95% confident that the mean increase in blood pressure associated with taking this new drug is between 1.286 and 3.28 points.
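The same small-sample interval can be computed directly from the six raw measurements. This sketch uses the standard library's `statistics.stdev` (divisor n − 1) and takes $t_{0.025,5} = 2.571$ from the t table rather than computing it.

```python
import math
import statistics

increases = [1.7, 3.0, 0.8, 3.4, 2.7, 2.1]   # blood pressure increases (points)
t_crit = 2.571                               # t_{0.025, 5} from the t table

n = len(increases)
x_bar = statistics.mean(increases)
s = statistics.stdev(increases)              # sample s.d., divisor n - 1
margin = t_crit * s / math.sqrt(n)
print(f"x_bar = {x_bar:.3f}, s = {s:.5f}")
print(f"{x_bar - margin:.3f} <= mu <= {x_bar + margin:.3f}")
```

The printed interval, about 1.287 to 3.280, matches the slide up to rounding of the last digit.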
Examples 6-7
6) In order to determine the monthly average wage of the workers operating in the same sector, 320 people were randomly selected from among 8000 workers. It was determined that the monthly average wage was 8500 TL and the standard deviation was 2600 TL. Between what values does the average worker wage vary with a probability of 99.73%?

7) To estimate the average number of children in a rural area, 350 families were selected randomly from among 5000 families. The average number of children of the selected families was calculated as 5.4 and the standard deviation as 2.1.
a) Construct a 99% confidence interval for the average number of children in the region.
b) For the average number of children, determine the minimum value with 99% confidence.

6) The s.d. of the population is not known, the population is normally distributed, $n \ge 30$, and $n/N = 320/8000 = 0.04 < 0.05$.

$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{2600}{\sqrt{320}} = 145.34$

$\mu = \bar{x} \pm Z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = 8500 \pm 3(145.34) \;\Rightarrow\; 8063.97 \le \mu \le 8936.03$

With 99.73% confidence, the average wage of the workers varies between 8063.97 and 8936.03 TL.

7) The s.d. of the population is not known, $n \ge 30$, and $n/N = 350/5000 = 0.07 > 0.05$, so the finite population correction is applied:

$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} = \dfrac{2.1}{\sqrt{350}}\sqrt{\dfrac{5000-350}{5000-1}} = 0.1083$

a) $\mu = \bar{x} \pm Z_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = 5.4 \pm 2.58(0.1083) \;\Rightarrow\; 5.12 \le \mu \le 5.68$

With 99% confidence, the average number of children is between 5.12 and 5.68, that is, roughly between 5 and 6.

b) $\mu \ge \bar{x} - Z_{\alpha}\,\hat{\sigma}_{\bar{x}} = 5.4 - 2.33(0.1083) = 5.4 - 0.2523 = 5.1477$

With 99% confidence, the average number of children is at least 5.1477, that is, greater than 5.
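Examples 6 and 7 differ only in whether the sampling fraction n/N exceeds 0.05. The sketch below wraps that rule in a hypothetical helper `std_error` and uses the slides' critical values (3 for 99.73%, 2.58 two-sided and 2.33 one-sided for 99%).

```python
import math

def std_error(s, n, N):
    """Estimated standard error of the mean, with the finite population
    correction applied when the sampling fraction n/N exceeds 0.05."""
    se = s / math.sqrt(n)
    if n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Example 6: n/N = 0.04, no correction; Z = 3 for 99.73% confidence
se6 = std_error(2600, 320, 8000)
print(f"6) {8500 - 3 * se6:.2f} <= mu <= {8500 + 3 * se6:.2f}")

# Example 7: n/N = 0.07, correction applied
se7 = std_error(2.1, 350, 5000)
print(f"7a) {5.4 - 2.58 * se7:.2f} <= mu <= {5.4 + 2.58 * se7:.2f}")   # two-sided, Z_{0.005}
print(f"7b) mu >= {5.4 - 2.33 * se7:.4f}")                             # one-sided, Z_{0.01}
```

The one-sided bound in 7b agrees with the slide's 5.1477 up to rounding of $\hat{\sigma}_{\bar{x}}$.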
Example 8
Some quality-control experiments require destructive sampling (i.e., the test to determine whether the item is
defective destroys the item) in order to measure some particular characteristic of the product. The cost of
destructive sampling often dictates small samples. For example, suppose a manufacturer of printers for personal
computers wishes to estimate the mean number of characters printed before the printhead fails. Suppose the
printer manufacturer tests n = 15 randomly selected printheads and records the number of characters printed
until failure for each. These 15 measurements (in millions of characters) are listed in the table below.

A) Form a 99% confidence interval for the mean number of characters printed before the printhead fails. Interpret the result.
B) What assumption is required for the interval, part a, to be valid? Is it reasonably satisfied?
8) a) $\alpha = 0.01$, $\alpha/2 = 0.005$, $n = 15$, $df = n - 1 = 14$, $t_{0.005,14} = 2.977$

$\bar{x} = \dfrac{1.13 + 1.55 + \cdots + 1.29}{15} = 1.239$

$s = \sqrt{\dfrac{(1.13 - 1.239)^2 + \cdots + (1.29 - 1.239)^2}{14}} = 0.193$

$\bar{x} \pm t_{0.005,14}\left(\dfrac{s}{\sqrt{n}}\right) = 1.239 \pm 2.977\left(\dfrac{0.193}{\sqrt{15}}\right) = 1.239 \pm 0.148 \;\Rightarrow\; 1.091 \le \mu \le 1.387$

The manufacturer can be 99% confident that the printhead has a mean life of between 1.091 and 1.387 million characters. If the manufacturer were to advertise that the mean life of its printheads is (at least) 1 million characters, the interval would support such a claim. Our confidence is derived from the fact that 99% of the intervals formed in repeated applications of this procedure would contain μ.

b) Because n is small, we must assume that the number of characters printed before printhead failure is a random variable from a normal distribution; that is, we assume that the population from which the sample of 15 measurements is selected is distributed normally.
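Since the full list of 15 measurements is not reproduced here, this sketch works from the summary statistics on the slide ($\bar{x} = 1.239$, $s = 0.193$) and the tabulated $t_{0.005,14} = 2.977$. The helper name `t_interval` is hypothetical.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Small-sample interval x_bar -/+ t * s / sqrt(n); assumes a normal population."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Summary statistics from Example 8 (millions of characters)
lo, hi = t_interval(x_bar=1.239, s=0.193, n=15, t_crit=2.977)   # t_{0.005, 14}
print(f"{lo:.3f} <= mu <= {hi:.3f}")
print("supports 'at least 1 million':", lo > 1.0)
```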

!!! An assumption that the population is approximately normally distributed is necessary for making small-sample inferences about μ when σ is unknown and when using the t-statistic. Although many phenomena do have approximately normal distributions, it is also true that many random phenomena have distributions that are not normal or even mound-shaped. Empirical evidence acquired over the years has shown that confidence intervals based on the t-distribution are rather insensitive to moderate departures from normality; that is, use of the t-statistic when sampling from slightly or moderately skewed mound-shaped populations generally produces credible results. However, for cases in which the distribution is distinctly nonnormal, we must either take a large sample or use a nonparametric method !!!
Confidence Interval for a Population Proportion
In some cases it is necessary to estimate the proportion or number of units in the population that have a particular characteristic. In this case, the population consists of two groups, those with and those without a certain feature, or it is transformed into this form for the purpose of the research.
Researchers may sometimes want to treat multigroup populations as two-group populations. For example, while the distribution of a class according to the grades from a course (between 0 and 100) is multigroup, it can be converted into two groups, successful and unsuccessful students, when desired.
If A is the number of units with a certain feature in a population of N units, then the proportion of those who have this feature is

$P = \dfrac{A}{N}$, and for a sample: $p = \dfrac{a}{n}$

and the proportion of those who do not have this feature is

$Q = 1 - P = \dfrac{N - A}{N}$, and for a sample: $q = 1 - p = \dfrac{n - a}{n}$
Connection between proportion and mean
If the units with the examined feature are indicated by (1) in the population and those without this feature are indicated by (0), then

$\sum_{i=1}^{N} X_i = A$

and the mean is

$\bar{X} = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{A}{N} = P$

The mean can also be obtained as an expected value: $E(X_i) = \mu_X = P(1) + (1 - P)(0) = P$

The variance of the population is

$\sigma^2 = E(X_i - \mu_X)^2 = E(X_i - P)^2 = P(1 - P)^2 + (1 - P)(0 - P)^2$
$= P(1 - 2P + P^2) + (1 - P)P^2$
$= P - 2P^2 + P^3 + P^2 - P^3 = P - P^2 = P(1 - P) = PQ$

For a sample, the corresponding variance is $pq$.
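The derivation above can be checked numerically on a small 0/1 population. The population below is hypothetical (N = 10 units, A = 3 with the feature); the standard library's `statistics.pvariance` gives the population variance with divisor N.

```python
from statistics import mean, pvariance

# Hypothetical population: N = 10 units, A = 3 of them have the feature (coded 1)
population = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
N = len(population)
A = sum(population)
P = A / N
Q = 1 - P

print(mean(population), P)             # mean of the 0/1 codes equals P
print(pvariance(population), P * Q)    # population variance equals P * Q
```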
The fact that $\hat{p}$ is a "sample mean number of successes per trial" allows us to form confidence intervals about p in a manner that is completely analogous to that used for large-sample estimation of μ.
Example
A food-products company conducted a market study by randomly sampling and interviewing 1,000 consumers to determine which brand of breakfast cereal they prefer. Suppose 313 consumers were found to prefer the company's brand. How would you estimate the true fraction of all consumers who prefer the company's cereal brand?

$\hat{p} = \dfrac{313}{1000} = 0.313, \qquad \hat{q} = 1 - \hat{p} = 1 - 0.313 = 0.687$

$n\hat{p} = 1000(0.313) = 313 \ge 15, \qquad n\hat{q} = 1000(0.687) = 687 \ge 15$

$\hat{p} \pm Z_{\alpha/2}\,\sigma_{\hat{p}} \approx \hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 0.313 \pm 1.96\sqrt{\dfrac{(0.313)(0.687)}{1000}} = 0.313 \pm 0.029 \;\Rightarrow\; 0.284 \le p \le 0.342$

The company can be 95% confident that the interval from 28.4% to 34.2% contains the true percentage of all consumers who prefer its brand; that is, in repeated construction of confidence intervals, approximately 95% of all samples would produce confidence intervals that enclose p.
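A minimal sketch of the large-sample proportion interval, including the $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$ check that the next paragraph refers to. The function name and the choice to raise `ValueError` when the check fails are illustrative assumptions.

```python
import math

def proportion_interval(successes, n, z=1.96, min_count=15):
    """Large-sample interval p_hat -/+ z * sqrt(p_hat * q_hat / n).
    Raises ValueError when n*p_hat or n*q_hat is below min_count (15 in the text)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    if n * p_hat < min_count or n * q_hat < min_count:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

lo, hi = proportion_interval(313, 1000)
print(f"{lo:.3f} <= p <= {hi:.3f}")   # about 28.4% to 34.2%
```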
Suppose you want to estimate the proportion of executives who die from a work-related injury using a sample size of n = 100. This proportion is likely to be near 0, say, $p \approx 0.001$. If so, then $np = 100(0.001) = 0.1$ is less than the recommended value of 15. Consequently, a confidence interval for p based on a sample of n = 100 will probably be misleading. To overcome this potential problem, an extremely large sample size is required. Because the value of n required to satisfy "extremely large" is difficult to determine, statisticians have proposed an alternative method, based on the Wilson (1927) point estimator of p. The procedure is outlined in the box below. Researchers have shown that this confidence interval works well for any p, even when the sample size n is very small.
Example
According to the Bureau of Labor Statistics, the probability of injury while working at a jewelry store is less than 0.01. Suppose that in a random sample of 200 jewelry store workers, 3 were injured on the job. Estimate the true proportion of jewelry store workers injured on the job using a 95% confidence interval.

Solution:
Because the number of "successes" (i.e., number of injured jewelry store workers) in the sample is x = 3, the adjusted sample proportion is

$\tilde{p} = \dfrac{x + 2}{n + 4} = \dfrac{3 + 2}{200 + 4} = \dfrac{5}{204} \approx 0.025$

$\tilde{p} \pm 1.96\sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + 4}} = 0.025 \pm 1.96\sqrt{\dfrac{(0.025)(0.975)}{204}} = 0.025 \pm 0.021 \;\Rightarrow\; 0.004 \le p \le 0.046$

Consequently, we are 95% confident that the true proportion of jewelry store workers who are injured while on the job falls between 0.004 and 0.046.
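The adjusted interval from the example can be sketched as follows, using $\tilde{p} = (x + 2)/(n + 4)$ and $n + 4$ in the standard error exactly as in the worked solution. The function name is hypothetical.

```python
import math

def adjusted_proportion_interval(x, n, z=1.96):
    """Interval based on the adjusted estimator p_tilde = (x + 2) / (n + 4):
    p_tilde -/+ z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))."""
    p_tilde = (x + 2) / (n + 4)
    margin = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, p_tilde - margin, p_tilde + margin

p_tilde, lo, hi = adjusted_proportion_interval(3, 200)
print(f"p_tilde = {p_tilde:.4f}, interval {lo:.4f} to {hi:.4f}")
```

Without rounding $\tilde{p}$ to 0.025 first, the bounds are about 0.0033 and 0.0457. The slide's 0.004 to 0.046 comes from using the rounded value.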
Determining the Sample Size for μ and p

Sample Size for Confidence Interval for μ
In order to estimate μ with a sampling error SE and with 100(1 − α)% confidence, the required sample size is found as follows:

$Z_{\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) = SE$

The solution for n is given by the equation:

$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{SE}\right)^2$

Note: The value of σ is usually unknown. It can be estimated by the standard deviation, s, from a prior sample. Alternatively, we may approximate the range R of observations in the population and (conservatively) estimate σ ≈ R/4. In any case, you should round the value of n obtained upward to ensure that the sample size will be sufficient to achieve the specified reliability.

Sample Size for Confidence Interval for p
In order to estimate a binomial probability p with sampling error SE and with 100(1 − α)% confidence, the required sample size is found by solving the following equation:

$Z_{\alpha/2}\sqrt{\dfrac{pq}{n}} = SE$

The solution for n can be written as follows:

$n = \dfrac{(Z_{\alpha/2})^2\,(pq)}{(SE)^2}$

Note: Because the value of the product pq is unknown, it can be estimated by using the sample fraction of successes, $\hat{p}$, from a prior sample. In any case, you should round the value of n obtained upward to ensure that the sample size will be sufficient to achieve the specified reliability.
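For the sample-size formula for μ, rounding upward maps naturally onto `math.ceil`. The planning values below (a range R = 40, so σ ≈ R/4 = 10, and SE = 2 at 95% confidence) are hypothetical and only illustrate the R/4 rule from the note.

```python
import math

def sample_size_mean(z, sigma, se):
    """n = (z * sigma / SE)^2, rounded up to the next whole unit."""
    return math.ceil((z * sigma / se) ** 2)

# Hypothetical planning values: no prior sample, but the population range is
# believed to be about R = 40 units, so sigma is estimated conservatively as R / 4.
R = 40
sigma_guess = R / 4
n = sample_size_mean(z=1.96, sigma=sigma_guess, se=2)   # 95% confidence, SE = 2
print(n)
```

Here $(1.96 \cdot 10 / 2)^2 = 96.04$, which rounds up to 97 rather than down to 96.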
Example

In a region where 10000 families live, a firm will conduct a sampling study to investigate whether there has been a significant change in its market share, which has been 20% in recent years. What should the sample size be to estimate the market share with a probability of 99% and a margin of 0.05?

$p = 0.2, \quad q = 0.8, \quad SE = 0.05, \quad Z_{0.005} = 2.58$

$n = \dfrac{(Z_{\alpha/2})^2\,(pq)}{(SE)^2} = \dfrac{(2.58)^2(0.2)(0.8)}{(0.05)^2} = 426.0096 \approx 427$

$n/N = 427/10000 = 0.0427 < 0.05$

The sample size should be at least 427 families.
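The market-share calculation is a direct use of the sample-size formula for p, with the previous share of 20% as the prior guess. A minimal sketch; the function name is hypothetical.

```python
import math

def sample_size_proportion(z, p, se):
    """n = z^2 * p * q / SE^2, rounded up; p is a prior guess of the proportion."""
    return math.ceil(z ** 2 * p * (1 - p) / se ** 2)

n = sample_size_proportion(z=2.58, p=0.2, se=0.05)
N = 10_000
print(n, f"n/N = {n / N:.4f}")   # n/N < 0.05, so no finite population correction
```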


Example

• A specialty manufacturer wants to purchase remnants of sheet aluminum foil. The foil, all of
which is the same thickness, is stored on 1,462 rolls, each containing a varying amount of foil.
To obtain an estimate of the total number of square feet of foil on all the rolls, the manufacturer
randomly sampled 100 rolls and measured the number of square feet on each roll. The sample
mean was 47.4, and the sample standard deviation was 12.4.

A) Find an approximate 95% confidence interval for the mean amount of foil on the 1,462 rolls.
B) Estimate the total number of square feet of foil on all the rolls by multiplying the confidence
interval, part a, by 1,462. Interpret the result.
A) $45.01 \le \mu \le 49.79$

B) The manufacturer estimates the total amount of foil to be in the interval of 65,805 square feet to 72,793 square feet, with 95% confidence.
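The slide gives only the final interval for the foil example. One computation that reproduces 45.01 to 49.79 exactly is the "approximate 95%" rule $\bar{x} \pm 2\hat{\sigma}_{\bar{x}}$, with the finite population correction because $n/N = 100/1462 \approx 0.068 > 0.05$. This is an assumption about how the slide was computed. With 1.96 and no correction, the interval would instead be about 44.97 to 49.83. Part B multiplies the rounded endpoints by N, as the slide does.

```python
import math

N, n = 1462, 100
x_bar, s = 47.4, 12.4
z = 2                         # assumed: "approximate 95%" rule of thumb, x_bar -/+ 2 se

se = s / math.sqrt(n)
if n / N > 0.05:              # 100 / 1462 = 0.068, so apply the finite population correction
    se *= math.sqrt((N - n) / (N - 1))

lo, hi = round(x_bar - z * se, 2), round(x_bar + z * se, 2)
print(f"A) {lo} <= mu <= {hi}")
print(f"B) {N * lo:,.0f} to {N * hi:,.0f} square feet")
```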
Confidence Interval for a Population Variance
Intuitively, it seems reasonable to use the sample variance, $s^2$, to estimate $\sigma^2$. However, unlike with sample means and proportions, the sampling distribution of $s^2$ does not follow a normal (z) distribution or a Student's t-distribution. Rather, when certain assumptions are satisfied (a random sample from a normal population), the statistic $(n-1)s^2/\sigma^2$ has a chi-square ($\chi^2$) distribution. The chi-square probability distribution, like the t-distribution, is characterized by a quantity called the degrees of freedom (df) associated with the distribution; here df = n − 1. Several chi-square distributions with different df values are shown in the figure below. You can see that, unlike the z- and t-distributions, the chi-square distribution is not symmetric about 0.
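For reference, this is the chi-square interval that part h of the example below applies, written as a short derivation. It assumes a random sample from a normal population. Here $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ denote the values with right-tail areas α/2 and 1 − α/2 for n − 1 degrees of freedom, matching how the tables are read.

$$P\left(\chi^2_{1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) = 1-\alpha
\;\Longrightarrow\;
\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

Inverting the inequalities swaps the two table values, so the larger value $\chi^2_{\alpha/2}$ gives the lower limit. Taking square roots of both limits gives the interval for σ.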
Critical Values of $\chi^2$ (table)
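Like the t-values, the chi-square table entries can be reproduced numerically. This stdlib-only sketch integrates the $\chi^2$ density with Simpson's rule and finds $\chi^2_\alpha$ (right-tail area α) by bisection. It is an illustration, not the method used to build the tables; the function names are hypothetical, and the density helper assumes df ≥ 3.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density; returns 0 for x <= 0 (valid for df >= 3)."""
    if x <= 0:
        return 0.0
    return math.exp((df / 2 - 1) * math.log(x) - x / 2
                    - (df / 2) * math.log(2) - math.lgamma(df / 2))

def chi2_right_tail(x, df, steps=1000):
    """P(chi^2 > x) = 1 - integral of the density from 0 to x (Simpson's rule)."""
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1 - total * h / 3

def chi2_critical(alpha, df):
    """chi^2_alpha: the value with right-tail area alpha, found by bisection."""
    lo, hi = 0.0, 10.0 * df + 100.0    # generous upper bracket
    for _ in range(50):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid                   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(0.005, 7), 3), round(chi2_critical(0.995, 7), 5))
```

For df = 7 this should agree with the table values 20.2777 and 0.98926 used in part h of the next example.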
Example
The number of supermarkets in Myanmar’s most populated cities is increasing and market competition is also high.
Kaggle has published a study regarding the growth of supermarkets. A three-month dataset has been collected based
on the historical sales of a supermarket company located at Mandalay. The product lines under consideration in the
study are electronic accessories, fashion accessories, food and beverages, health and beauty, home and lifestyle, and
sports and travel. To analyze the average unit price of all the electronic accessories, a random sample of eight
accessories’ unit prices (in Kyat) paid by cash are listed in the following table.

a) Identify the target parameter for this study.


b) Compute a point estimate of the target parameter.
c) What is the problem with using the normal (z) statistic to find a confidence interval for the target parameter?
d) Find a 95% confidence interval for the target parameter.
e) Give a practical interpretation of the interval, part d.
f) What conditions must be satisfied for the interval, part d, to be valid?
g) How many electronic accessories’ unit prices would need to be sampled in order to reduce the width of the
confidence interval to 5 Kyat?
h) The analyst wants to know the variation of the unit prices of the electronic accessories at Mandalay. Provide the researchers with an estimate of the target parameter using a 99% confidence interval.
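a) μ = average unit price for all electronic accessories paid for by cash.
b) The point estimator for μ is $\bar{x}$:
$\bar{x} = \dfrac{25.51 + 93.96 + \cdots + 14.96}{8} = 56.18$
c) The sampling distribution of $\bar{x}$ is unknown: n = 8 is small and σ is unknown, so the Central Limit Theorem cannot be relied on and the z-statistic is not appropriate.
d) $\alpha = 0.05$, $\alpha/2 = 0.025$, $n = 8$, $t_{0.025,7} = 2.365$
$s = \sqrt{\dfrac{(25.51 - 56.18)^2 + \cdots + (14.96 - 56.18)^2}{7}} = 33.80131$
$\bar{x} \pm t_{0.025,7}\left(\dfrac{s}{\sqrt{n}}\right) = 56.18 \pm 2.365\left(\dfrac{33.801}{\sqrt{8}}\right) = 56.18 \pm 28.2631 \;\Rightarrow\; 27.913 \le \mu \le 84.439$
e) We can be 95% confident that the true mean unit price is between 27.91 and 84.44 Kyat.
f) The sample must be a random sample, and the population of unit prices must be (approximately) normally distributed.
g) For a total width of 5 Kyat, SE = 5/2 = 2.5, and σ is estimated by s:
$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{SE}\right)^2 = \left(\dfrac{1.96(33.801)}{2.5}\right)^2 = 702.26 \approx 703$
h) $\alpha = 0.01$, $\alpha/2 = 0.005$, $s^2 = (33.8013)^2 = 1142.53$, $\chi^2_{0.005,7} = 20.2777$, $\chi^2_{0.995,7} = 0.98926$
$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
$\dfrac{7(1142.529)}{20.2777} \le \sigma^2 \le \dfrac{7(1142.529)}{0.98926} \;\Rightarrow\; 394.408 \le \sigma^2 \le 8084.53$
$19.86 \le \sigma \le 89.91$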
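Parts d, g and h can be recomputed from the summary statistics. The eight raw prices are not fully listed here, so the sketch starts from $\bar{x}$ and $s$, and it takes $t_{0.025,7}$ and both $\chi^2$ values from the tables. Differences in the third decimal place of part d reflect rounding of $\bar{x}$ to 56.18.

```python
import math

n, x_bar, s = 8, 56.18, 33.8013     # summary statistics from parts b and d
t_crit = 2.365                      # t_{0.025, 7}
chi2_upper, chi2_lower = 20.2777, 0.98926   # chi^2_{0.005, 7} and chi^2_{0.995, 7}

# d) 95% t-interval for mu
margin = t_crit * s / math.sqrt(n)
print(f"d) {x_bar - margin:.3f} <= mu <= {x_bar + margin:.3f}")

# g) sample size for a total width of 5 Kyat (SE = 2.5), s used in place of sigma
n_needed = math.ceil((1.96 * s / 2.5) ** 2)
print("g) n =", n_needed)

# h) 99% interval for sigma^2 and, by square roots, for sigma
var_lo = (n - 1) * s ** 2 / chi2_upper
var_hi = (n - 1) * s ** 2 / chi2_lower
print(f"h) {var_lo:.3f} <= sigma^2 <= {var_hi:.2f}; "
      f"{math.sqrt(var_lo):.2f} <= sigma <= {math.sqrt(var_hi):.2f}")
```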
