Basics of Statistics
PRAVIN KUMAR
Why study statistics?
Applications of statistical concepts in the business world
Finance – correlation and regression, index numbers, time series analysis
Marketing – hypothesis testing, chi-square tests, nonparametric statistics
Personnel – hypothesis testing, chi-square tests, nonparametric tests
Operations management – hypothesis testing, estimation, analysis of variance, time series analysis
Statistics
Types of statistics
Descriptive statistics – Methods of organizing, summarizing, and presenting data in an informative way
Inferential statistics – The methods used to determine something about a population on the basis of a sample
Population – The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest
Sample – A portion, or part, of the population of interest
Data and Statistics
Data consists of information coming from observations, counts, measurements, or responses.
Populations & Samples
Example:
In a recent survey, 250 college students at Union College were asked if they smoked cigarettes regularly. 35 of the students said yes. Identify the population and the sample.
Population: the responses of all students at Union College
Sample: the responses of the students in the survey
Parameters & Statistics
A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.
Parameter: Population
Statistic: Sample
Parameters & Statistics
Example:
Decide whether the numerical value describes a population parameter or a sample statistic.
Descriptive statistics: involves the organization, summarization, and display of data.
Inferential statistics: involves using a sample to draw conclusions about a population.
Descriptive and Inferential Statistics
Example:
In a recent study, volunteers who had less than 6 hours of sleep were four times more likely to answer incorrectly on a science test than were participants who had at least 8 hours of sleep. Decide which part is the descriptive statistic and what conclusion might be drawn using inferential statistics.
Types of Data
Data sets can consist of two types of data: qualitative data and quantitative data.
Qualitative data: consists of attributes, labels, or non-numerical entries.
Quantitative data: consists of numerical measurements or counts.
Levels of Measurement
The level of measurement determines which statistical calculations are meaningful. The four levels of measurement are: nominal, ordinal, interval, and ratio.
Levels of measurement, lowest to highest: Nominal, Ordinal, Interval, Ratio
Nominal Level of Measurement
Data at the nominal level of measurement are qualitative only.
Nominal: categorized using names, labels, or qualities. No mathematical computations can be made at this level.
Ordinal Level of Measurement
Data at the ordinal level of measurement are qualitative or quantitative.
Ordinal: arranged in order, but differences between data entries are not meaningful.
Summary of Levels of Measurement
Level of       Put data in   Arrange data   Subtract      Determine if one data value
measurement    categories    in order       data values   is a multiple of another
Nominal        Yes           No             No            No
Ordinal        Yes           Yes            No            No
Interval       Yes           Yes            Yes           No
Ratio          Yes           Yes            Yes           Yes
Population and Sample
Inferential Statistics
Estimation
e.g., Estimate the population mean weight using the sample mean weight
Hypothesis testing
e.g., Test the claim that the population mean weight is 70 kg
Inference is the process of drawing conclusions or making decisions about a population based on a sample.
Sampling
Sampling methods
Descriptive Statistics
Collect data
e.g., Survey
Present data
e.g., Tables and graphs
Summarize data
e.g., Sample mean $= \frac{\sum X_i}{n}$
Statistical data
The collection of data that are relevant to the problem being studied is commonly the most difficult, expensive, and time-consuming part of the entire research project.
Statistical data are usually obtained by counting or measuring items.
Primary data are collected specifically for the analysis desired.
Secondary data have already been compiled and are available for statistical analysis.
A variable is an item of interest that can take on many different numerical values.
A constant has a fixed numerical value.
Data
Statistical data are usually obtained by counting or measuring items. Most data can be put into the following categories:
Qualitative - data are measurements that each fall into one of several categories (hair color, ethnic groups, and other attributes of the population)
Quantitative - data are observations that are measured on a numerical scale (distance traveled to college, number of children in a family, etc.)
Qualitative data
Qualitative data are generally described by words or letters. They are not as widely used as quantitative data because many numerical techniques do not apply to qualitative data. For example, it does not make sense to find an average hair color or blood type.
Qualitative data can be separated into two subgroups:
dichotomic (if it takes the form of a word with two options, e.g., gender: male or female)
polynomic (if it takes the form of a word with more than two options, e.g., education: primary school, secondary school, and university)
Quantitative data
Types of variables
Variables: Qualitative, Quantitative
Numerical scale of measurement:
Nominal – consists of categories in each of which the number of respective observations is recorded. The categories are in no logical order and have no particular relationship. The categories are said to be mutually exclusive since an individual, object, or measurement can be included in only one of them.
Ordinal – contains more information. Consists of distinct categories in which order is implied. Values in one category are larger or smaller than values in other categories (e.g., rating: excellent, good, fair, poor).
Interval – is a set of numerical measurements in which the distance between numbers is of a known, constant size.
Ratio – consists of numerical measurements where the distance between numbers is of a known, constant size; in addition, there is a nonarbitrary zero point.
Data presentation
Numerical presentation of qualitative data
pivot table (qualitative dichotomic statistical attributes)
contingency table (qualitative statistical attributes, at least one of which is polynomic)
Frequency distributions – numerical presentation of quantitative data
Frequency distribution – shows the frequency, or number of occurrences, in each of several categories. Frequency distributions are used to summarize large volumes of data values.
When the raw data are measured on a quantitative scale, either interval or ratio, categories or classes must be designed for the data values before a frequency distribution can be formulated.
Steps for constructing a frequency distribution
1. Determine the number of classes: $m \approx \sqrt{n}$
2. Determine the size of each class: $h = \frac{\max - \min}{m}$
3. Determine the starting point for the first class
4. Tally the number of values that occur in each class
5. Prepare a table of the distribution using actual counts and/or percentages (relative frequencies)
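The steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `frequency_distribution`, the square-root rule for the number of classes, the choice to start the first class at the minimum value, and the small `sample` list are all assumptions made for the example.

```python
import math

def frequency_distribution(values, m=None):
    """Return one row per class: (lower, upper, n_i, f_i, N_i, F_i)."""
    n = len(values)
    if m is None:
        m = math.ceil(math.sqrt(n))       # step 1: number of classes (assumed rule)
    lo, hi = min(values), max(values)
    h = (hi - lo) / m                      # step 2: class width
    counts = [0] * m
    for v in values:                       # step 4: tally values into classes
        # step 3: the first class starts at the minimum (an assumption);
        # the maximum value is placed in the last class
        k = m - 1 if h == 0 else min(int((v - lo) / h), m - 1)
        counts[k] += 1
    rows, cumulative = [], 0
    for k, n_i in enumerate(counts):       # step 5: build the table
        cumulative += n_i
        rows.append((lo + k * h, lo + (k + 1) * h,
                     n_i, n_i / n, cumulative, cumulative / n))
    return rows

sample = [1, 2, 2, 3, 4, 4, 4, 5, 9]       # hypothetical data
table = frequency_distribution(sample)
for lower, upper, n_i, f_i, N_i, F_i in table:
    print(f"{lower:5.2f} to {upper:5.2f}  n_i={n_i}  f_i={f_i:.2f}  N_i={N_i}  F_i={F_i:.2f}")
```

Each row carries the absolute frequency, relative frequency, cumulative frequency, and cumulative relative frequency of one class.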
Frequency table
absolute frequency “ni” (Data Tab → Data Analysis → Histogram)
relative frequency “fi”
Cumulative frequency distribution shows the total number of occurrences that lie above or below certain key values.
cumulative frequency “Ni”
cumulative relative frequency “Fi”
Charts and graphs
Histogram
Frequently used to graphically present interval and ratio data
The adjacent bars indicate that a numerical range is being summarized by indicating the frequencies in arbitrarily chosen classes
Frequency polygon
Another common method for graphically presenting interval and ratio data
To construct a frequency polygon, mark the frequencies on the vertical axis and the values of the variable being measured on the horizontal axis, as with the histogram.
If the purpose of the presentation is comparison with other distributions, the frequency polygon provides a good summary of the data
Ogive
Pie Chart
Bar chart
Another common method for graphically presenting nominal and ordinal scaled data
One bar is used to represent the frequency for each category
The bars are usually positioned vertically with their bases located on the horizontal axis of the graph
The bars are separated, and this is why such a graph is frequently used for nominal and ordinal data – the separation emphasizes the plotting of frequencies for distinct categories
Time Series Graph
Descriptive Statistics:
Numerical Measures
Measures of Location
Measures of Variability
Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
Mean
Sample Mean $\bar{x}$
Population Mean $\mu$
Sample Mean
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Sample Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median is the preferred measure of central location.
The median is the measure of location most often reported for annual income and property value data.
A few extremely large incomes or property values can inflate the mean.
Median
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
Median = 19
Median
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
Median = (19 + 26)/2 = 22.5
Median
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
Mode
The mode of a data set is the value that occurs with
greatest frequency.
The greatest frequency can occur at two or more
different values.
If the data have exactly two modes, the data are
bimodal.
If the data have more than two modes, the data are
multimodal.
Mode
450 occurred most frequently (7 times)
Mode = 450
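The mean, median, and mode reported on these slides can be checked with Python's standard `statistics` module. This is a small sketch using the 70 data values from the slides; the variable names are illustrative only.

```python
import statistics

# The 70 data values used throughout the examples, in ascending order
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # average of the 35th and 36th values (n is even)
modes = statistics.multimode(data)  # every value tied for the greatest frequency

print(f"n = {len(data)}, sum = {sum(data)}")
print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
```

`multimode` returns a list, so bimodal or multimodal data would show up as more than one entry.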
Percentiles
A percentile provides information about how the
data are spread over the interval from the smallest
value to the largest value.
Admission test scores for colleges and universities
are frequently reported in terms of percentiles.
Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
Quartiles
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 525
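The percentile and quartile examples follow a simple index rule: compute i = (p/100)n; if i is a whole number, average the values in positions i and i + 1; otherwise round i up and take the value in that position. A minimal sketch of that rule is below. The function name `percentile` is hypothetical, and library routines such as `statistics.quantiles` use different interpolation conventions and may return slightly different values.

```python
import math

def percentile(sorted_values, p):
    """pth percentile by the index rule used in the examples (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_values)
    i = p * n / 100                      # i = (p/100) n
    if i == int(i):                      # whole number: average positions i and i + 1
        i = int(i)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[math.ceil(i) - 1]   # otherwise round up to the next position

data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

p90 = percentile(data, 90)   # i = 63, so average the 63rd and 64th values
q3 = percentile(data, 75)    # i = 52.5, so take the 53rd value
q1 = percentile(data, 25)    # i = 17.5, so take the 18th value
print(p90, q3, q1, q3 - q1)
```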
Measures of Variability
It is often desirable to consider measures of variability (dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ (for a sample)
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ (for a population)
Standard Deviation
$s = \sqrt{s^2}$ (for a sample)
$\sigma = \sqrt{\sigma^2}$ (for a population)
Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
$\frac{s}{\bar{x}} \times 100\%$ (for a sample)
$\frac{\sigma}{\mu} \times 100\%$ (for a population)
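The range, sample variance, sample standard deviation, and coefficient of variation can be computed for the 70 data values as a check on the formulas. This is a short sketch using the standard `statistics` module; the variable names are illustrative.

```python
import statistics

data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

data_range = max(data) - min(data)         # largest value - smallest value
s2 = statistics.variance(data)             # sample variance, divides by n - 1
s = statistics.stdev(data)                 # sample standard deviation
cv = s / statistics.mean(data) * 100       # coefficient of variation, in percent
sigma2 = statistics.pvariance(data)        # population formula, divides by N

print(f"range = {data_range}")
print(f"s^2 = {s2:.2f}, s = {s:.2f}, CV = {cv:.2f}%")
print(f"variance with the population formula = {sigma2:.2f}")
```

The sample formula divides by n - 1 and the population formula by N, so the sample variance is always the slightly larger of the two for the same data.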
Descriptive Statistics:
Numerical Measures
Measures of Distribution Shape, Relative Location, and Detecting Outliers
Distribution Shape
z-Scores
Detecting Outliers
Distribution Shape: Skewness
When referring to the shape of frequency or probability distributions, “skewness” refers to asymmetry of the distribution.
A distribution with an asymmetric tail extending out to the right is referred to as “positively skewed” or “skewed to the right,” while a distribution with an asymmetric tail extending out to the left is referred to as “negatively skewed” or “skewed to the left.”
Skewness can range from minus infinity to positive infinity.
Distribution Shape: Skewness
Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.
[Figure: relative frequency histogram, moderately skewed left; skewness = -.31]
Distribution Shape: Skewness
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram, moderately skewed right; skewness = .31]
Skewness
Karl Pearson (1895) first suggested measuring skewness by standardizing the difference between the mean and the mode, that is,
$sk = \frac{\mu - \text{Mode}}{\sigma}$
Population modes are not well estimated from sample modes, but one can estimate the difference between the mean and the mode as being three times the difference between the mean and the median (Stuart & Ord, 1994), leading to the following estimate of skewness:
$sk_{est} = \frac{3(M - \text{Median})}{s}$
Many statisticians use this measure but with the ‘3’ eliminated, that is,
$sk = \frac{M - \text{Median}}{s}$
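Fisher’s skewness is most often estimated by:
$g_1 = \frac{n \sum z^3}{(n-1)(n-2)} = \frac{n \sum (x_i - \bar{x})^3 / s^3}{(n-1)(n-2)}$
Approximate standard error of $g_1$ for large samples: $\sqrt{6/n}$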
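The two skewness estimates above can be sketched in a few lines of Python. This is an illustration only: the function names and the small `right_skewed` data set are hypothetical, and z is taken to be (x - mean)/s with s the sample standard deviation.

```python
import statistics

def pearson_median_skewness(values):
    """sk_est = 3 (mean - median) / s."""
    mean = statistics.mean(values)
    return 3 * (mean - statistics.median(values)) / statistics.stdev(values)

def fisher_g1(values):
    """g1 = n * sum(z^3) / ((n - 1)(n - 2)), with z = (x - mean) / s."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    sum_z3 = sum(((x - mean) / s) ** 3 for x in values)
    return n * sum_z3 / ((n - 1) * (n - 2))

right_skewed = [1, 2, 3, 4, 10]   # hypothetical data with a long right tail
symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data

print(pearson_median_skewness(right_skewed), fisher_g1(right_skewed))
print(pearson_median_skewness(symmetric), fisher_g1(symmetric))
```

Both measures come out positive for the right-skewed list and zero for the symmetric one, matching the sign convention described on the slides.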
Kurtosis
Karl Pearson introduced the term kurtosis (literally, the amount of hump) for the degree of peakedness or flatness of a unimodal frequency curve.
When the peak of a curve becomes relatively high, the curve is called leptokurtic.
For a normal distribution, kurtosis is equal to 3.
Another measure of kurtosis, known as the percentile coefficient of kurtosis, is:
$\text{Kurt} = \frac{Q.D.}{P_{90} - P_{10}}$
where
Q.D. is the semi-interquartile range, Q.D. = (Q3 - Q1)/2
P90 = 90th percentile
P10 = 10th percentile
Karl Pearson (1905) defined a distribution’s degree of kurtosis as
$\eta = \beta_2 - 3$, where $\beta_2 = \frac{\sum (Y - \mu)^4}{n \sigma^4}$
An unbiased estimator for $\gamma_2$ is
$g_2 = \frac{n(n+1) \sum z^4}{(n-1)(n-2)(n-3)} - \frac{3(n-1)^2}{(n-2)(n-3)}$
Approximate standard error of $g_2$ for large samples: $\sqrt{24/n}$
Pearson (1905) introduced kurtosis as a measure of how flat the top of a symmetric distribution is when compared to a normal distribution of the same variance. He referred to more flat-topped distributions ($\gamma_2 < 0$) as “platykurtic,” less flat-topped distributions ($\gamma_2 > 0$) as “leptokurtic,” and equally flat-topped distributions as “mesokurtic” ($\gamma_2 \approx 0$).
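The sample estimator of excess kurtosis can be sketched as follows. The function names `excess_kurtosis_g2` and `classify` and the five-value data set are hypothetical; z is taken to be (x - mean)/s with s the sample standard deviation, and the exact-zero test in `classify` is a simplification, since real data rarely give exactly zero.

```python
import statistics

def excess_kurtosis_g2(values):
    """g2 = n(n+1) sum(z^4) / ((n-1)(n-2)(n-3)) - 3(n-1)^2 / ((n-2)(n-3))."""
    n = len(values)
    if n < 4:
        raise ValueError("g2 needs at least 4 observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    sum_z4 = sum(((x - mean) / s) ** 4 for x in values)
    first = n * (n + 1) * sum_z4 / ((n - 1) * (n - 2) * (n - 3))
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second

def classify(g2):
    """Label the sign of the excess kurtosis (simplified, no tolerance band)."""
    if g2 < 0:
        return "platykurtic"
    if g2 > 0:
        return "leptokurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]            # hypothetical evenly spread data
g2_flat = excess_kurtosis_g2(flat)
print(g2_flat, classify(g2_flat))
```

Evenly spread values have a flatter top than a normal curve, so the estimate is negative and the label is platykurtic.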
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.
$z_i = \frac{x_i - \bar{x}}{s}$
z-Scores
$\bar{x} = 490.8$, $s = 54.74$
z-Score of Smallest Value (425)
$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$
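The z-score calculation is a one-line function. The sketch below uses the mean and standard deviation reported on the slide; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

x_bar, s = 490.80, 54.74           # sample mean and standard deviation from the slide

z_smallest = z_score(425, x_bar, s)   # smallest data value
z_largest = z_score(615, x_bar, s)    # largest data value
print(round(z_smallest, 2), round(z_largest, 2))
```

A negative z-score means the value lies below the mean, and a positive one means it lies above.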
Empirical Rule
For data having a bell-shaped distribution:
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
Empirical Rule
[Figure: normal curve showing 68.26%, 95.44%, and 99.72% of values within 1, 2, and 3 standard deviations of the mean]
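The empirical-rule percentages can be checked against the standard normal distribution with `statistics.NormalDist` (Python 3.8+). The sketch computes the area within k standard deviations of the mean; the results differ from the table-based figures above by about a hundredth of a percentage point because printed tables round intermediate areas.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

within = {}
for k in (1, 2, 3):
    # area between -k and +k standard deviations
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within +/- {k} standard deviation(s): {within[k] * 100:.2f}%")
```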
Normal Probability Distributions
INTRODUCTION TO NORMAL DISTRIBUTIONS AND THE STANDARD NORMAL DISTRIBUTION
Properties of Normal Distributions
A continuous random variable has an infinite number of possible values that can be represented by an interval on the number line.
[Figure: normal curve]
Properties of Normal Distributions
Properties of a Normal Distribution
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric
about the mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches
the x-axis as it extends farther and farther away
from the mean.
5. Between μ - σ and μ + σ (in the center of the curve), the graph curves downward. The graph curves upward to the left of μ - σ and to the right of μ + σ. The points at which the curve changes from curving upward to curving downward are called the inflection points.
Properties of Normal Distributions
[Figure: normal curve with total area = 1 and inflection points at μ - σ and μ + σ; x-axis marked at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ]
Means and Standard Deviations
Example:
1. Which curve has the greater mean?
2. Which curve has the greater standard
deviation?
[Figure: two normal curves, A and B, on an x-axis marked from 1 to 13]
Interpreting Graphs
Example:
The heights of fully grown magnolia bushes are
normally distributed. The curve represents the
distribution. What is the mean height of a fully grown
magnolia bush? Estimate the standard deviation.
The inflection points are one standard deviation away from the mean.
μ = 8, σ ≈ 0.7
[Figure: normal curve of height (in feet), x-axis marked from 6 to 10]
The Standard Normal Distribution
The standard normal distribution is a normal
distribution with a mean of 0 and a standard deviation
of 1.
The Standard Normal Distribution
If each data value of a normally distributed random variable x is transformed into a z-score, the result will be the standard normal distribution.
The area that falls in the interval under the nonstandard normal curve (the x-values) is the same as the area under the standard normal curve (within the corresponding z-boundaries).
[Figure: standard normal curve, z-axis marked from -3 to 3]
The Standard Normal Table
Example:
Find the cumulative area that corresponds to a z-
score of 2.71.
Appendix B: Standard Normal Table
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
...
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981

z     .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
-3.4  .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
-3.3  .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
...
-0.3  .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
-0.2  .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
-0.1  .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
-0.0  .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000

The area to the left of z = 2.71 is 0.9966 (row 2.7, column .01).
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
b. To find the area to the right of z, use the Standard Normal Table to find the area that corresponds to z. Then subtract the area from 1.
1. Use the table to find the area for the z-score.
2. The area to the left of z = 1.23 is 0.8907.
3. Subtract to find the area to the right of z = 1.23: 1 - 0.8907 = 0.1093.
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
c. To find the area between two z-scores, find the area corresponding to each z-score in the Standard Normal Table. Then subtract the smaller area from the larger area.
The area to the left of z = 1.23 is 0.8907.
The area to the left of z = -0.75 is 0.2266.
Subtract to find the area of the region between the two z-scores: 0.8907 - 0.2266 = 0.6641.
Guidelines for Finding Areas
Example:
Find the area under the standard normal curve to the left of z = -2.33.
Always draw the curve!
From the Standard Normal Table, the area to the left of z = -2.33 is 0.0099.
Guidelines for Finding Areas
Example:
Find the area under the standard normal curve to the right of z = 0.94.
Always draw the curve!
The area to the left of z = 0.94 is 0.8264, so the area to the right is 1 - 0.8264 = 0.1736.
Guidelines for Finding Areas
Example:
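Find the area under the standard normal curve between z = -1.98 and z = 1.07.
Always draw the curve!
The area to the left of z = 1.07 is 0.8577 and the area to the left of z = -1.98 is 0.0239, so the area between them is 0.8577 - 0.0239 = 0.8338.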
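The table lookups in these guidelines can be reproduced with `statistics.NormalDist` (Python 3.8+), whose `cdf` method gives the cumulative area to the left of a z-score. The helper names below are illustrative. Results can differ from table-based answers in the fourth decimal place because the table rounds each area before subtracting.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_left(z):
    """Cumulative area to the left of z."""
    return Z.cdf(z)

def area_right(z):
    """Area to the right of z: 1 minus the cumulative area."""
    return 1 - Z.cdf(z)

def area_between(z1, z2):
    """Area between two z-scores: larger cumulative area minus the smaller."""
    low, high = sorted((z1, z2))
    return Z.cdf(high) - Z.cdf(low)

print(f"left of 2.71:           {area_left(2.71):.4f}")
print(f"right of 1.23:          {area_right(1.23):.4f}")
print(f"between -0.75 and 1.23: {area_between(-0.75, 1.23):.4f}")
print(f"left of -2.33:          {area_left(-2.33):.4f}")
print(f"right of 0.94:          {area_right(0.94):.4f}")
print(f"between -1.98 and 1.07: {area_between(-1.98, 1.07):.4f}")
```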
Probability and Normal
Distributions
Normal Distribution: μ = 10, σ = 5
Standard Normal Distribution: μ = 0, σ = 1
Same area:
P(x < 15) = P(z < 1) = shaded area under the curve = 0.8413
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the probability that a student receives a test score less than 90.
μ = 78, σ = 8
$z = \frac{x - \mu}{\sigma} = \frac{90 - 78}{8} = 1.5$
P(x < 90) = P(z < 1.5) = 0.9332
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the probability that a student receives a test score greater than 85.
μ = 78, σ = 8
$z = \frac{x - \mu}{\sigma} = \frac{85 - 78}{8} = 0.875 \approx 0.88$
P(x > 85) = P(z > 0.88) = 1 - P(z < 0.88) = 1 - 0.8106 = 0.1894
The probability that a student receives a test score greater than 85 is 0.1894.
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the probability that a student receives a test score between 60 and 80.
μ = 78, σ = 8
$z_1 = \frac{x - \mu}{\sigma} = \frac{60 - 78}{8} = -2.25$
$z_2 = \frac{x - \mu}{\sigma} = \frac{80 - 78}{8} = 0.25$
P(60 < x < 80) = P(-2.25 < z < 0.25) = P(z < 0.25) - P(z < -2.25) = 0.5987 - 0.0122 = 0.5865
The probability that a student receives a test score between 60 and 80 is 0.5865.
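The three test-score probabilities can be computed directly from a normal distribution with mean 78 and standard deviation 8, without converting to z-scores by hand. A short sketch with `statistics.NormalDist` follows; the variable names are illustrative. The "greater than 85" result comes out near 0.1908 rather than 0.1894 because the slide rounds z = 0.875 up to 0.88 before using the table.

```python
from statistics import NormalDist

scores = NormalDist(mu=78, sigma=8)   # distribution of test scores

p_less_90 = scores.cdf(90)                    # P(x < 90)
p_greater_85 = 1 - scores.cdf(85)             # P(x > 85), using the unrounded z = 0.875
p_between = scores.cdf(80) - scores.cdf(60)   # P(60 < x < 80)

# The same probability via the z-score route shown on the slides
z = (90 - 78) / 8
p_less_90_via_z = NormalDist().cdf(z)

print(f"P(x < 90)      = {p_less_90:.4f}")
print(f"P(x > 85)      = {p_greater_85:.4f}")
print(f"P(60 < x < 80) = {p_between:.4f}")
```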
NORMAL
DISTRIBUTIONS:
FINDING VALUES
Finding z-Scores
Example:
Find the z-score that corresponds to a cumulative area of 0.9973.
Appendix B: Standard Normal Table
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
...
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
The area 0.9973 lies in row 2.7 and column .08, so the z-score is 2.78.

z     .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
-3.4  .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
-3.3  .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
...
-0.3  .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
-0.2  .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
-0.1  .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
-0.0  .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000
When the exact area is not in the table, use the closest area.
[Figure: standard normal curve with area = 0.75 to the left of z = 0.67]
Transforming a z-Score to an x-Score
To transform a standard z-score to a data value, x, in a given population, use the formula
x = μ + zσ.
Example:
The monthly electric bills in a city are normally distributed with a mean of $120 and a standard deviation of $16. Find the x-value corresponding to a z-score of 1.60.
x = μ + zσ = 120 + 1.60(16) = 145.6
We can conclude that an electric bill of $145.60 is 1.6 standard deviations above the mean.
Finding a Specific Data Value
Example:
The weights of bags of chips for a vending machine are
normally distributed with a mean of 1.25 ounces and a
standard deviation of 0.1 ounce. Bags that have weights
in the lower 8% are too light and will not work in the
machine. What is the least a bag of chips can weigh and
still work in the machine?
P(z < ?) = 0.08
P(z < -1.41) = 0.08
x = μ + zσ = 1.25 + (-1.41)(0.1) ≈ 1.11
The least a bag can weigh and still work in the machine is 1.11 ounces.
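Going from an area back to a z-score, and from a z-score to a data value, can be sketched with the `inv_cdf` method of `statistics.NormalDist` (Python 3.8+). The helper name `x_from_z` is illustrative. `inv_cdf` returns an unrounded z-score, so intermediate values differ slightly from the "closest area" table lookup.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def x_from_z(z, mu, sigma):
    """Transform a z-score to a data value: x = mu + z * sigma."""
    return mu + z * sigma

# z-score for a cumulative area of 0.9973
z_for_9973 = Z.inv_cdf(0.9973)

# Electric bill example: mean $120, standard deviation $16, z = 1.60
bill = x_from_z(1.60, 120, 16)

# Chip bag example: lowest 8% of weights, mean 1.25 oz, standard deviation 0.1 oz
z_lower_8 = Z.inv_cdf(0.08)
min_weight = x_from_z(z_lower_8, 1.25, 0.1)

print(f"z for area 0.9973: {z_for_9973:.2f}")
print(f"electric bill:     {bill:.2f}")
print(f"z for lowest 8%:   {z_lower_8:.2f}")
print(f"minimum weight:    {min_weight:.2f} ounces")
```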
SAMPLING DISTRIBUTIONS
AND THE CENTRAL LIMIT
THEOREM
Sampling Distributions
A sampling distribution is the probability
distribution of a sample statistic that is formed when
samples of size n are repeatedly taken from a
population.
[Figure: a population from which many samples are drawn]
Sampling Distributions
If the sample statistic is the sample mean, then the
distribution is the sampling distribution of
sample means.
[Figure: six samples drawn from the population, with sample means $\bar{x}_1$ through $\bar{x}_6$]
Properties of Sampling Distributions
Properties of Sampling Distributions of Sample Means
1. The mean of the sample means, $\mu_{\bar{x}}$, is equal to the population mean:
$\mu_{\bar{x}} = \mu$
2. The standard deviation of the sample means, $\sigma_{\bar{x}}$, is equal to the population standard deviation, σ, divided by the square root of n:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Sampling Distribution of Sample
Means
Example continued:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
b. Graph the probability histogram for the population
values.
[Figure: probability histogram of the population of x; each value has probability P(x) = 0.25, a uniform distribution]
Sampling Distribution of Sample
Means
Example continued:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
e. Graph the probability histogram for the sampling
distribution.
[Figure: probability histogram of the sampling distribution; sample means 5, 7.5, 10, 12.5, 15, 17.5, and 20 on the horizontal axis, probability on the vertical axis]
The shape of the graph is symmetric and bell shaped. It approximates a normal distribution.
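The hat example is small enough to enumerate completely. The sketch below lists all 16 equally likely samples of size 2 drawn with replacement from {5, 10, 15, 20}, builds the sampling distribution of the sample mean, and checks the two properties of sampling distributions. Exact fractions are used so the comparisons hold without rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [5, 10, 15, 20]
n = 2   # sample size, drawn with replacement

# Every ordered sample is equally likely: 4 * 4 = 16 samples
samples = list(product(population, repeat=n))
means = [Fraction(sum(sample), n) for sample in samples]

# Sampling distribution: probability of each possible sample mean
distribution = {m: Fraction(count, len(samples))
                for m, count in sorted(Counter(means).items())}

# Population mean and variance
mu = Fraction(sum(population), len(population))
var = sum((x - mu) ** 2 for x in population) / len(population)

# Mean and variance of the sample means
mu_xbar = sum(m * p for m, p in distribution.items())
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in distribution.items())

for m, p in distribution.items():
    print(f"sample mean {float(m):5.1f}: probability {float(p):.4f}")
print(mu_xbar == mu, var_xbar == var / n)
```

The variance of the sample means equals the population variance divided by n, which is the squared form of the standard deviation property.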
The Central Limit Theorem
If a sample of size n ≥ 30 is taken from a population with any type of distribution that has a mean = μ and standard deviation = σ, the sampling distribution of sample means is approximately normal.
The Central Limit Theorem
If the population itself is normally distributed, with mean = μ and standard deviation = σ, the sampling distribution of sample means is normally distributed for any sample size n.
The Central Limit Theorem
In either case, the sampling distribution of sample means has a mean equal to the population mean.
$\mu_{\bar{x}} = \mu$ (mean of the sample means)
Probability and Normal
Distributions
Example continued:
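[Figure: normal curve of sample means marked at 75, 78, and 79, with corresponding z-scores -1.88, 0, and 0.63]
$P(75 < \bar{x} < 79) = P(-1.88 < z < 0.63) = P(z < 0.63) - P(z < -1.88) = 0.7357 - 0.0301 = 0.7056$
Approximately 70.56% of samples of 25 students will have a mean score between 75 and 79.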
Probabilities of x and $\bar{x}$
Example:
The population mean salary for auto mechanics is μ = $34,000 with a standard deviation of σ = $2,500. Find the probability that the mean salary for a randomly selected sample of 50 mechanics is greater than $35,000.
$\mu_{\bar{x}} = 34000$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2500}{\sqrt{50}} = 353.55$
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{35000 - 34000}{353.55} = 2.83$
$P(\bar{x} > 35000) = P(z > 2.83) = 1 - P(z < 2.83) = 1 - 0.9977 = 0.0023$
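The salary example can be reproduced in a few lines. The sketch computes the standard deviation of the sample means, the z-score, and the tail probability with `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 34000, 2500, 50     # population mean, standard deviation, sample size

sigma_xbar = sigma / math.sqrt(n)  # standard deviation of the sample means
z = (35000 - mu) / sigma_xbar      # z-score of a sample mean of $35,000

# Sampling distribution of the sample mean: normal with mean mu and sd sigma_xbar
sampling_dist = NormalDist(mu, sigma_xbar)
p_greater = 1 - sampling_dist.cdf(35000)

print(f"sigma_xbar = {sigma_xbar:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 35000) = {p_greater:.4f}")
```

Because the sample mean varies far less than an individual salary, a sample mean $1,000 above the population mean is a rare event even though a single salary that high would not be.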
Normal Approximation
The normal distribution is used to approximate the binomial distribution when it would be impractical to use the binomial distribution to find a probability.
Normal Approximation to a Binomial Distribution
If np ≥ 5 and nq ≥ 5, then the binomial random variable x is approximately normally distributed with mean
$\mu = np$
and standard deviation
$\sigma = \sqrt{npq}$.
Normal Approximation
Example:
Decide whether the normal distribution may be used to approximate x in the following examples.
1. Thirty-six percent of people in the United States own a dog. You randomly select 25 people in the United States and ask them if they own a dog.
np = (25)(0.36) = 9
nq = (25)(0.64) = 16
Because np and nq are greater than 5, the normal distribution may be used.
Guidelines
Using the Normal Distribution to Approximate Binomial Probabilities
In Words (In Symbols)
1. Verify that the binomial distribution applies. (Specify n, p, and q.)
2. Determine if you can use the normal distribution to approximate x, the binomial variable. (Is np ≥ 5? Is nq ≥ 5?)
3. Find the mean and standard deviation for the distribution. ($\mu = np$, $\sigma = \sqrt{npq}$)
4. Apply the appropriate continuity correction. Shade the corresponding area under the normal curve. (Add or subtract 0.5 from endpoints.)
5. Find the corresponding z-value(s). ($z = \frac{x - \mu}{\sigma}$)
6. Find the probability. (Use the Standard Normal Table.)
Approximating a Binomial
Probability
Example:
Thirty-one percent of the seniors in a certain high school plan to attend college. If 50 students are randomly selected, find the probability that fewer than 14 students plan to attend college.
np = (50)(0.31) = 15.5
nq = (50)(0.69) = 34.5
The variable x is approximately normally distributed with $\mu = np = 15.5$ and $\sigma = \sqrt{npq} = \sqrt{(50)(0.31)(0.69)} \approx 3.27$.
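The guideline steps can be sketched for this example. The code checks the np ≥ 5 and nq ≥ 5 conditions, computes μ and σ, applies the continuity correction ("fewer than 14" becomes x < 13.5), and compares the approximation with the exact binomial sum. The helper name `normal_approximation_ok` is hypothetical, and the exact sum is added only as a cross-check; it is not part of the slides.

```python
import math
from statistics import NormalDist

def normal_approximation_ok(n, p):
    """Check the conditions np >= 5 and nq >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

n, p = 50, 0.31
q = 1 - p

usable = normal_approximation_ok(n, p)
mu = n * p                         # mean of the binomial variable
sigma = math.sqrt(n * p * q)       # standard deviation of the binomial variable

# "Fewer than 14" means x <= 13; the continuity correction moves the endpoint to 13.5
approx = NormalDist(mu, sigma).cdf(13.5)

# Exact binomial probability P(x <= 13), for comparison
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(14))

print(f"usable = {usable}, mu = {mu:.1f}, sigma = {sigma:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```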