Basics of Statistics

The document provides an introduction to statistics, emphasizing its importance in decision-making across various fields such as finance, marketing, and operations management. It explains key concepts including types of statistics (descriptive and inferential), data classification, levels of measurement, and sampling methods. Additionally, it covers data presentation techniques and the significance of statistical analysis in interpreting data effectively.

Uploaded by kumar3727
Copyright © All Rights Reserved

Introduction to Statistics

PRAVIN KUMAR

1
Why study statistics?

1. Data are everywhere.
2. Statistical techniques are used to make many decisions that affect our lives.

2
Applications of statistical concepts in the business world
• Finance – correlation and regression, index numbers, time series analysis
• Marketing – hypothesis testing, chi-square tests, nonparametric statistics
• Personnel – hypothesis testing, chi-square tests, nonparametric tests
• Operations management – hypothesis testing, estimation, analysis of variance, time series analysis

3
Statistics
• The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions
• Statistical analysis – used to manipulate, summarize, and investigate data so that useful decision-making information results

4
Types of statistics
• Descriptive statistics – methods of organizing, summarizing, and presenting data in an informative way
• Inferential statistics – the methods used to determine something about a population on the basis of a sample
• Population – the entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest
• Sample – a portion, or part, of the population of interest
5
Data and Statistics
• Data consist of information coming from observations, counts, measurements, or responses.
• Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
• A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
• A sample is a subset of a population.

6
Populations & Samples
• Example:
• In a recent survey, 250 college students at Union College were asked if they smoked cigarettes regularly. Thirty-five of the students said yes. Identify the population and the sample.
Population: the responses of all students at Union College.
Sample: the responses of the students in the survey.

7
Parameters & Statistics
A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.
Parameter ↔ Population
Statistic ↔ Sample

8
Parameters & Statistics
• Example:
• Decide whether the numerical value describes a population parameter or a sample statistic.
a.) A recent survey of a sample of 450 college students reported that the average weekly income for students is $325.
Because the average of $325 is based on a sample, this is a sample statistic.
b.) The average weekly income for all students is $405.
Because the average of $405 is based on a population, this is a population parameter.
9
Branches of Statistics
The study of statistics has two major branches: descriptive statistics and inferential statistics.
• Descriptive statistics: involves the organization, summarization, and display of data.
• Inferential statistics: involves using a sample to draw conclusions about a population.
10
Descriptive and Inferential Statistics
Example:
In a recent study, volunteers who had less than 6 hours of sleep were four times more likely to answer incorrectly on a science test than were participants who had at least 8 hours of sleep. Decide which part is the descriptive statistic and what conclusion might be drawn using inferential statistics.
The statement “four times more likely to answer incorrectly” is a descriptive statistic. An inference drawn from the sample is that all individuals sleeping less than 6 hours are more likely to answer science questions incorrectly than individuals who sleep at least 8 hours.
11
DATA CLASSIFICATION

12
Types of Data
Data sets can consist of two types of data: qualitative data and quantitative data.
• Qualitative data: consist of attributes, labels, or nonnumerical entries.
• Quantitative data: consist of numerical measurements or counts.
13
Levels of Measurement
The level of measurement determines which statistical calculations are meaningful. The four levels of measurement are: nominal, ordinal, interval, and ratio.
Levels of measurement, from lowest to highest: Nominal → Ordinal → Interval → Ratio

14
Nominal Level of Measurement
Data at the nominal level of measurement are qualitative only.
Nominal: categorized using names, labels, or qualities. No mathematical computations can be made at this level.
Examples: colors in the US flag; names of students in your class; textbooks you are using this semester

15
Ordinal Level of Measurement
Data at the ordinal level of measurement are qualitative or quantitative.
Ordinal: arranged in order, but differences between data entries are not meaningful.
Examples: class standings (freshman, sophomore, junior, senior); numbers on the back of each player’s shirt; top 50 songs played on the radio
16
Interval Level of Measurement
Data at the interval level of measurement are quantitative. A zero entry simply represents a position on a scale; the entry is not an inherent zero.
Interval: arranged in order, and the differences between data entries can be calculated.
Examples: temperatures; years on a timeline; Atlanta Braves World Series victories
17
Ratio Level of Measurement
Data at the ratio level of measurement are similar to the interval level, but a zero entry is meaningful.
Ratio: a ratio of two data values can be formed, so one data value can be expressed as a multiple of another.
Examples: ages; grade point averages; weights

18
Summary of Levels of Measurement
Level of measurement | Put data in categories | Arrange data in order | Subtract data values | Determine if one data value is a multiple of another
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes

19
Population and Sample

20
Inferential Statistics
Inference is the process of drawing conclusions or making decisions about a population based on a sample.
• Estimation
  e.g., estimate the population mean weight using the sample mean weight
• Hypothesis testing
  e.g., test the claim that the population mean weight is 70 kg
21
Sampling
A sample should have the same characteristics as the population it is representing.
Sampling can be:
• with replacement: a member of the population may be chosen more than once (picking the candy from the bowl)
• without replacement: a member of the population may be chosen only once (lottery ticket)

22
Sampling methods
Sampling methods can be:
• random (each member of the population has an equal chance of being selected)
• nonrandom
The actual process of sampling causes sampling errors. For example, the sample may not be large enough or representative of the population. Factors not related to the sampling process cause nonsampling errors. A defective counting device can cause a nonsampling error.
23
Random sampling methods
• simple random sample (each sample of the same size has an equal chance of being selected)
• stratified sample (divide the population into groups called strata and then take a sample from each stratum)
• cluster sample (divide the population into groups called clusters and then randomly select some of the clusters; all the members from the selected clusters are in the cluster sample)
• systematic sample (randomly select a starting point and take every n-th piece of data from a listing of the population)
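The four random sampling methods can be sketched with the standard `random` module. The function names, the population of 100 hypothetical student IDs, the two strata, and the ten clusters of ten are assumptions made for illustration only.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of each one."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

def systematic_sample(population, step, rng):
    """Random start among the first `step` items, then every step-th item."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)
students = list(range(1, 101))  # hypothetical student IDs 1..100

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample({"freshman": students[:50], "senior": students[50:]}, 5, rng)
clust = cluster_sample([students[i:i + 10] for i in range(0, 100, 10)], 2, rng)
syst = systematic_sample(students, 10, rng)

print(srs, strat, clust, syst, sep="\n")
```

Each method returns ten or twenty IDs here; the difference lies only in which subsets are possible, for example a systematic sample can never contain two adjacent IDs when the step is 10.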

24
Descriptive Statistics
• Collect data
  • e.g., survey
• Present data
  • e.g., tables and graphs
• Summarize data
  • e.g., sample mean $\bar{x} = \frac{\sum x_i}{n}$

25
Statistical data
• The collection of data that are relevant to the problem being studied is commonly the most difficult, expensive, and time-consuming part of the entire research project.
• Statistical data are usually obtained by counting or measuring items.
• Primary data are collected specifically for the analysis desired.
• Secondary data have already been compiled and are available for statistical analysis.
A variable is an item of interest that can take on many different numerical values.
A constant has a fixed numerical value.
26
Data
Statistical data are usually obtained by counting or measuring items. Most data can be put into the following categories:
• Qualitative – data are measurements that each fall into one of several categories (hair color, ethnic groups, and other attributes of the population)
• Quantitative – data are observations that are measured on a numerical scale (distance traveled to college, number of children in a family, etc.)

27
Qualitative data
Qualitative data are generally described by words or letters. They are not as widely used as quantitative data because many numerical techniques do not apply to qualitative data. For example, it does not make sense to find an average hair color or blood type.
Qualitative data can be separated into two subgroups:
• dichotomic (if it takes the form of a word with two options, e.g., gender: male or female)
• polynomic (if it takes the form of a word with more than two options, e.g., education: primary school, secondary school, and university)
28
Quantitative data
Quantitative data are always numbers and are the result of counting or measuring attributes of a population.
Quantitative data can be separated into two subgroups:
• discrete (if it is the result of counting: the number of students of a given ethnic group in a class, the number of books on a shelf, …)
• continuous (if it is the result of measuring: distance traveled, weight of luggage, …)

29
Types of variables
Variables
• Qualitative
  • Dichotomic: gender, marital status
  • Polynomic: brand of PC, hair color
• Quantitative
  • Discrete: children in family, strokes on a golf hole
  • Continuous: amount of income tax paid, weight of a student

30
Numerical scale of measurement:
• Nominal – consists of categories in each of which the number of respective observations is recorded. The categories are in no logical order and have no particular relationship. The categories are said to be mutually exclusive since an individual, object, or measurement can be included in only one of them.
• Ordinal – contains more information. Consists of distinct categories in which order is implied. Values in one category are larger or smaller than values in other categories (e.g., rating: excellent, good, fair, poor).
• Interval – a set of numerical measurements in which the distance between numbers is of a known, constant size.
• Ratio – consists of numerical measurements where the distance between numbers is of a known, constant size; in addition, there is a nonarbitrary zero point.
31
Data presentation

32
Numerical presentation of qualitative data
• pivot table (qualitative dichotomic statistical attributes)
• contingency table (qualitative statistical attributes, at least one of which is polynomic)
You should know how to convert absolute values to relative ones (%).

33
Frequency distributions – numerical presentation of quantitative data
Frequency distribution – shows the frequency, or number of occurrences, in each of several categories. Frequency distributions are used to summarize large volumes of data values.
• When the raw data are measured on a quantitative scale, either interval or ratio, categories or classes must be designed for the data values before a frequency distribution can be formulated.
34
Steps for constructing a frequency distribution
1. Determine the number of classes: $m \approx \sqrt{n}$
2. Determine the size of each class: $h = \frac{\max - \min}{m}$
3. Determine the starting point for the first class
4. Tally the number of values that occur in each class
5. Prepare a table of the distribution using actual counts and/or percentages (relative frequencies)
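A minimal Python sketch of the five steps, applied to the apartment-rent data used later in these slides. Two choices are assumptions rather than rules from the slides: m is √n rounded to the nearest integer, and the first class starts at the minimum value. Classes are half-open, except that the maximum is placed in the last class.

```python
import math

def frequency_distribution(data):
    """Classes with absolute and relative (%) frequencies, following the five steps."""
    n = len(data)
    lo, hi = min(data), max(data)
    if lo == hi:
        raise ValueError("all values are equal; there is nothing to group")
    m = round(math.sqrt(n))                 # step 1: number of classes, m ~ sqrt(n)
    h = (hi - lo) / m                       # step 2: class width h = (max - min) / m
    counts = [0] * m                        # step 3: first class starts at the minimum
    for x in data:                          # step 4: tally each value into its class
        k = min(int((x - lo) // h), m - 1)  # the maximum falls in the last class
        counts[k] += 1
    # step 5: (lower limit, upper limit, count, percent) for every class
    return [(lo + k * h, lo + (k + 1) * h, counts[k], 100 * counts[k] / n)
            for k in range(m)]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

table = frequency_distribution(rents)
for lower, upper, count, pct in table:
    print(f"{lower:7.2f} to {upper:7.2f}  {count:3d}  {pct:5.1f}%")
```

With n = 70 this gives m = 8 classes of width 190/8 = 23.75; other rules for m (for example Sturges' rule) would give a slightly different table.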

35
Frequency table
• absolute frequency “nᵢ” (Data tab > Data Analysis > Histogram)
• relative frequency “fᵢ”
Cumulative frequency distribution shows the total number of occurrences that lie above or below certain key values.
• cumulative frequency “Nᵢ”
• cumulative relative frequency “Fᵢ”
36
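Given the absolute frequencies nᵢ of each class, the other three columns follow by dividing by n and taking running totals: fᵢ = nᵢ/n, Nᵢ = n₁ + … + nᵢ, Fᵢ = Nᵢ/n. A sketch with hypothetical counts for four classes; `frequency_table` is an illustrative helper, not a library function.

```python
from itertools import accumulate

def frequency_table(counts):
    """Given absolute class frequencies n_i, return rows (n_i, f_i, N_i, F_i)."""
    n = sum(counts)
    cumulative = list(accumulate(counts))   # N_i: running total of the n_i
    return [
        (n_i, n_i / n, N_i, N_i / n)        # f_i = n_i / n,  F_i = N_i / n
        for n_i, N_i in zip(counts, cumulative)
    ]

table = frequency_table([5, 12, 8, 5])      # hypothetical counts, n = 30
for n_i, f_i, N_i, F_i in table:
    print(f"n_i={n_i:2d}  f_i={f_i:.3f}  N_i={N_i:2d}  F_i={F_i:.3f}")
```

The last row always has Nᵢ = n and Fᵢ = 1, which is a quick check that no class was missed; the Nᵢ column is also what an ogive plots.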
Charts and graphs
• Frequency distributions are good ways to present the essential aspects of data collections in concise and understandable terms
• Pictures are often more effective in displaying large data collections

37
Histogram
• Frequently used to graphically present interval and ratio data
• The adjacent bars indicate that a numerical range is being summarized by indicating the frequencies in arbitrarily chosen classes

38
Histogram

39
Frequency polygon
• Another common method for graphically presenting interval and ratio data
• To construct a frequency polygon, mark the frequencies on the vertical axis and the values of the variable being measured on the horizontal axis, as with the histogram.
• If the purpose of presenting is comparison with other distributions, the frequency polygon provides a good summary of the data

40
Frequency Polygon

41
Ogive
A graph of a cumulative frequency distribution
• An ogive is used when one wants to determine how many observations lie above or below a certain value in a distribution.
• First, a cumulative frequency distribution is constructed.
• Cumulative frequencies are plotted at the upper class limit of each category.
• An ogive can also be constructed for a relative frequency distribution.

42
Ogive

43
Pie Chart
• The pie chart is an effective way of displaying the percentage breakdown of data by category.
• Useful if the relative sizes of the data components are to be emphasized
• Pie charts also provide an effective way of presenting ratio- or interval-scaled data after they have been organized into categories
44
Pie Chart

45
Bar chart
• Another common method for graphically presenting nominal and ordinal scaled data
• One bar is used to represent the frequency for each category
• The bars are usually positioned vertically with their bases located on the horizontal axis of the graph
• The bars are separated, and this is why such a graph is frequently used for nominal and ordinal data: the separation emphasizes the plotting of frequencies for distinct categories
46
Bar Chart

47
Time Series Graph
The time series graph is a graph of data that have been measured over time.
The horizontal axis of this graph represents time periods and the vertical axis shows the numerical values corresponding to these time periods.

48
Time Series Graph

49
Descriptive Statistics: Numerical Measures
• Measures of Location
• Measures of Variability

50
Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.

51
Mean
• The mean of a data set is the average of all the data values.
• The sample mean x̄ is the point estimator of the population mean μ.

52
Sample Mean x̄
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.

53
Population Mean μ
$$\mu = \frac{\sum x_i}{N}$$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.

54
Sample Mean
• Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.

55
Sample Mean

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

56
Sample Mean
$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
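The same arithmetic in Python, as a quick check of the sum and the sample mean for the 70 rents listed above.

```python
# Monthly rents of the 70 sampled apartments (from the slide above).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
total = sum(rents)
x_bar = total / n          # sample mean: sum of the values divided by n

print(total, n, x_bar)     # 34356 70 490.8
```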

57
Median
• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is the preferred measure of central location.
• The median is the measure of location most often reported for annual income and property value data.
• A few extremely large incomes or property values can inflate the mean.

58
Median
• For an odd number of observations:
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
The median is the middle value.
Median = 19

59
Median
• For an even number of observations:
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
The median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5

60
Median
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
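A sketch of the median rule from these slides (middle value for odd n, average of the two middle values for even n), checked against both small examples and the rent data. Python's `statistics.median` applies the same rule; the hand-written `median` below only makes the steps explicit.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: average of the middle two

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(median([26, 18, 27, 12, 14, 27, 19]))      # 19
print(median([26, 18, 27, 12, 14, 27, 30, 19]))  # 22.5
print(median(rents))                             # 475.0 (35th and 36th values)
```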

61
Mode
• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.

62
Mode
450 occurred most frequently (7 times)
Mode = 450

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
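`collections.Counter` tallies how often each value occurs; returning every value tied for the highest count also covers the bimodal and multimodal cases. `modes` is an illustrative helper, and the list `[1, 1, 2, 2, 3]` is a hypothetical bimodal example.

```python
from collections import Counter

def modes(values):
    """All values tied for the greatest frequency, and that frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(modes(rents))            # ([450], 7)
print(modes([1, 1, 2, 2, 3]))  # ([1, 2], 2), a bimodal data set
```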

63
Percentiles
• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• Admission test scores for colleges and universities are frequently reported in terms of percentiles.

64
Percentiles
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

65
Percentiles

Arrange the data in ascending order.
Compute index i, the position of the pth percentile:
i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
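A sketch of the percentile rule above, assuming p is an integer with 0 < p < 100. Whether i is an integer is tested on p·n in integer arithmetic to avoid floating-point surprises. This is the slides' convention; other conventions, such as the methods offered by Python's `statistics.quantiles`, can give slightly different values for the same data.

```python
def percentile(values, p):
    """p-th percentile by the slides' rule (assumes integer p with 0 < p < 100)."""
    ordered = sorted(values)
    n = len(ordered)
    if (p * n) % 100 == 0:
        # i = (p/100)n is an integer: average the values in positions i and i + 1
        i = p * n // 100
        return (ordered[i - 1] + ordered[i]) / 2   # 1-based positions -> 0-based indices
    # i is not an integer: round up and take the value in position i
    i = -(-p * n // 100)                           # ceiling of p*n/100
    return ordered[i - 1]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(percentile(rents, 90))  # 585.0  (i = 63 is an integer: average of 63rd and 64th)
print(percentile(rents, 75))  # 525    (i = 52.5 rounds up to 53)
print(percentile(rents, 25))  # 445    (i = 17.5 rounds up to 18)
```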

66
90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

67
90th Percentile

“At least 90% of the items take on a value of 585 or less.” 63/70 = .9 or 90%
“At least 10% of the items take on a value of 585 or more.” 7/70 = .1 or 10%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

68
Quartiles
• Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile

69
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which is not an integer, so round up to i = 53
Third quartile = 525

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

70
Measures of Variability
• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

71
Measures of Variability
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

72
Range
• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

73
Range
Range = largest value - smallest value
Range = 615 - 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

74
Interquartile Range
• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

75
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
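The range and interquartile range for the rent data, using the same percentile rule as the slides (integer p with 0 < p < 100 assumed). The `percentile` helper is repeated here so the snippet runs on its own.

```python
def percentile(values, p):
    """Slides' rule: i = (p/100)n; round up if fractional, else average positions i and i + 1."""
    ordered = sorted(values)
    n = len(ordered)
    if (p * n) % 100 == 0:
        i = p * n // 100
        return (ordered[i - 1] + ordered[i]) / 2
    i = -(-p * n // 100)                  # ceiling of p*n/100
    return ordered[i - 1]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

data_range = max(rents) - min(rents)      # largest value - smallest value
q1 = percentile(rents, 25)
q3 = percentile(rents, 75)
iqr = q3 - q1                             # range of the middle 50% of the data

print(data_range, q1, q3, iqr)            # 190 445 525 80
```

Replacing the largest rent with, say, 2,000 would change the range from 190 to 1,575 but leave the IQR at 80, which is the insensitivity to extreme values described above.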

76
Variance

The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation (xᵢ) and the mean (x̄ for a sample, μ for a population).

77
Variance

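The variance is the average of the squared differences between each data value and the mean (for a sample, the sum of squared differences is divided by n - 1 rather than n).
The variance is computed as follows:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample}, \qquad \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population}$$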

78
Standard Deviation

The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily interpreted than the variance.

79
Standard Deviation

The standard deviation is computed as follows:
$$s = \sqrt{s^2} \quad \text{for a sample}, \qquad \sigma = \sqrt{\sigma^2} \quad \text{for a population}$$
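A sketch of both variance formulas and the standard deviation for the rent data. The hand-written functions mirror the formulas term by term; Python's `statistics.variance` and `statistics.pvariance` compute the same sample (n - 1) and population (N) versions and are used only as a cross-check.

```python
import math

def sample_variance(values):
    """s^2: squared deviations from the sample mean, summed and divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2: squared deviations from the population mean, summed and divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

s2 = sample_variance(rents)
s = math.sqrt(s2)                      # standard deviation = positive square root
print(round(s2, 2), round(s, 2))       # 2996.16 54.74
```

The value s = 54.74 is the one used on the z-score slides that follow.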

80
Coefficient of Variation
The coefficient of variation indicates how large the
standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:
$$\left(\frac{s}{\bar{x}} \times 100\right)\% \quad \text{for a sample}, \qquad \left(\frac{\sigma}{\mu} \times 100\right)\% \quad \text{for a population}$$
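The sample coefficient of variation for the rent data, using Python's `statistics.stdev` (divides by n - 1) and `statistics.mean`. A minimal sketch; `coefficient_of_variation` is an illustrative helper name.

```python
import statistics

def coefficient_of_variation(values):
    """Sample coefficient of variation, (s / x_bar) * 100, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

cv = coefficient_of_variation(rents)
print(f"{cv:.2f}%")   # 11.15%
```

Because the units cancel, the result can be compared across data sets measured on different scales, which the standard deviation alone cannot do.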

81
Descriptive Statistics: Numerical Measures
Measures of Distribution Shape, Relative Location, and Detecting Outliers

82
Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
• z-Scores
• Detecting Outliers

83
Distribution Shape: Skewness
• When referring to the shape of frequency or probability distributions, “skewness” refers to asymmetry of the distribution.
• A distribution with an asymmetric tail extending out to the right is referred to as “positively skewed” or “skewed to the right,” while a distribution with an asymmetric tail extending out to the left is referred to as “negatively skewed” or “skewed to the left.”
• Skewness can range from minus infinity to positive infinity.

84
Distribution Shape: Skewness
• Symmetric (not skewed)
  • Skewness is zero.
  • Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; skewness = 0]

85
Distribution Shape: Skewness
• Moderately Skewed Left
  • Skewness is negative.
  • Mean will usually be less than the median.
[Figure: relative frequency histogram with a longer left tail; skewness = -.31]

86
Distribution Shape: Skewness
• Moderately Skewed Right
  • Skewness is positive.
  • Mean will usually be more than the median.
[Figure: relative frequency histogram with a longer right tail; skewness = .31]

87
Distribution Shape: Skewness
• Highly Skewed Right
  • Skewness is positive (often above 1.0).
  • Mean will usually be more than the median.
[Figure: relative frequency histogram with a long right tail; skewness = 1.25]

88
Skewness
Karl Pearson (1895) first suggested measuring skewness by standardizing the difference between the mean and the mode, that is,
$$sk = \frac{\mu - \text{Mode}}{\sigma}$$
Population modes are not well estimated from sample modes, but one can estimate the difference between the mean and the mode as being three times the difference between the mean and the median (Stuart & Ord, 1994), leading to the following estimate of skewness, where M is the sample mean:
$$sk_{est} = \frac{3(M - \text{Median})}{s}$$
89
Many statisticians use this measure but with the ‘3’ eliminated, that is,
$$sk = \frac{M - \text{Median}}{s}$$
This statistic ranges from -1 to +1. Absolute values above 0.2 indicate great skewness.

90
Fisher’s skewness is most often estimated by:
$$g_1 = \frac{n \sum z^3}{(n-1)(n-2)} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$$
For large sample sizes (n > 150), g1 may be distributed approximately normally, with a standard error of approximately $\sqrt{6/n}$.
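A sketch of the skewness measures above: the median-based Pearson estimate (with and without the factor 3) and Fisher's g1, using Python's `statistics` module for the mean, median, and sample standard deviation. The function names and the small test data sets are illustrative assumptions.

```python
import math
import statistics

def median_skewness(values, factor=3):
    """factor * (mean - median) / s; factor=3 gives sk_est, factor=1 the variant without the 3."""
    return factor * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

def fisher_g1(values):
    """g1 = n / ((n - 1)(n - 2)) * sum(z_i ** 3), with z_i = (x_i - x_bar) / s."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

def g1_standard_error(n):
    """Approximate large-sample standard error of g1: sqrt(6 / n)."""
    return math.sqrt(6 / n)

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(round(median_skewness(rents, factor=1), 2))  # 0.29
print(round(median_skewness(rents), 2))            # 0.87
print(round(fisher_g1(rents), 2), round(g1_standard_error(len(rents)), 2))
```

With mean 490.8 above the median of 475, both median-based measures are positive, consistent with the longer right tail of the rents. Note that n = 70 is below the n > 150 guideline, so the normal approximation for g1 should be read loosely here.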
91
Kurtosis

Karl Pearson introduced the term kurtosis (literally the amount of hump) for the degree of peakedness or flatness of a unimodal frequency curve.

92
When the peak of a curve becomes relatively high, that curve is called leptokurtic.
When the curve is flat-topped, it is called platykurtic.
Since the normal curve is neither very peaked nor very flat-topped, it is taken as a basis for comparison.
The normal curve is called mesokurtic.

93
• For a normal distribution, kurtosis is equal to 3.
• When it is greater than 3, the curve is more sharply peaked and has heavier tails than the normal curve and is said to be leptokurtic.
• When it is less than 3, the curve has a flatter top and relatively lighter tails than the normal curve and is said to be platykurtic.

94
Another measure of kurtosis, known as the percentile coefficient of kurtosis, is:
$$\text{Kurt} = \frac{Q.D.}{P_{90} - P_{10}}$$
where
Q.D. is the semi-interquartile range: Q.D. = (Q3 - Q1)/2
P90 = 90th percentile
P10 = 10th percentile

95
Karl Pearson (1905) defined a distribution’s degree of kurtosis as
$$\eta = \beta_2 - 3, \qquad \text{where} \quad \beta_2 = \frac{\sum (Y - \mu)^4}{n\sigma^4}$$
β₂ is often referred to as “Pearson’s kurtosis,” and β₂ - 3 (often symbolized with γ₂) as “kurtosis excess” or “Fisher’s kurtosis,” even though it was Pearson who defined kurtosis as β₂ - 3.

96
An unbiased estimator for γ₂ is
$$g_2 = \frac{n(n+1) \sum z^4}{(n-1)(n-2)(n-3)} - \frac{3(n-1)^2}{(n-2)(n-3)}$$
For large sample sizes (n > 1000), g2 may be distributed approximately normally, with a standard error of approximately $\sqrt{24/n}$.
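A sketch of the two kurtosis formulas above: the population excess β₂ - 3 and the sample estimator g2, with zᵢ = (xᵢ - x̄)/s. For the small hypothetical data set 1, 2, 3, 4, 5 they give -1.3 and -1.2; both are negative, as expected for a flat, evenly spread (platykurtic) set of values.

```python
import math

def pearson_excess_kurtosis(values):
    """Population form: beta_2 - 3, where beta_2 = sum((x - mu)^4) / (n * sigma^4)."""
    n = len(values)
    mu = sum(values) / n
    sigma2 = sum((x - mu) ** 2 for x in values) / n          # population variance
    beta2 = sum((x - mu) ** 4 for x in values) / (n * sigma2 ** 2)
    return beta2 - 3

def fisher_g2(values):
    """Sample estimator g2, with z_i = (x_i - x_bar) / s and s using n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    sum_z4 = sum(((x - x_bar) / s) ** 4 for x in values)
    return (n * (n + 1) * sum_z4 / ((n - 1) * (n - 2) * (n - 3))
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def g2_standard_error(n):
    """Approximate large-sample standard error of g2: sqrt(24 / n)."""
    return math.sqrt(24 / n)

data = [1, 2, 3, 4, 5]          # hypothetical, evenly spread values
print(round(pearson_excess_kurtosis(data), 4))  # -1.3
print(round(fisher_g2(data), 4))                # -1.2
```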

97
Pearson (1905) introduced kurtosis as a measure of how flat the top of a symmetric distribution is when compared to a normal distribution of the same variance. He referred to more flat-topped distributions (γ₂ < 0) as “platykurtic,” less flat-topped distributions (γ₂ > 0) as “leptokurtic,” and equally flat-topped distributions as “mesokurtic” (γ₂ ≈ 0).

98
z-Scores

The z-score is often called the standardized value.
It denotes the number of standard deviations a data value xᵢ is from the mean.
$$z_i = \frac{x_i - \bar{x}}{s}$$

99
z-Scores
• An observation’s z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.

100
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

x 490.8
s 54.74
101
z-Scores
• z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
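The standardized values in the table above can be reproduced with a list comprehension over the rents, using the sample mean and the sample standard deviation (`statistics.stdev`, n - 1 in the denominator).

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)
z_scores = [(x - x_bar) / s for x in rents]   # standard deviations from the mean

print([round(z, 2) for z in z_scores[:5]])     # [-1.2, -1.11, -1.11, -1.02, -1.02]
print(round(z_scores[-1], 2))                  # 2.27
```

The z-scores of any data set have mean 0 and standard deviation 1, which is a convenient check on the computation.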

102
Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.

103
Empirical Rule
[Figure: normal curve centered at μ; 68.26% of values lie between μ - σ and μ + σ, 95.44% between μ - 2σ and μ + 2σ, and 99.72% between μ - 3σ and μ + 3σ]

104
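The empirical-rule percentages can be checked with the error function: for a normal variable, P(μ - kσ < X < μ + kσ) = erf(k/√2). A minimal sketch with Python's `math.erf`; it gives 68.27%, 95.45%, and 99.73%, which differ from the slide's figures only in the last decimal place because the slide uses values read from a rounded table.

```python
import math

def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {100 * share_within(k):.2f}%")
```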
Normal Probability Distributions

105
INTRODUCTION TO NORMAL DISTRIBUTIONS AND THE STANDARD NORMAL DISTRIBUTION

106
Properties of Normal Distributions
A continuous random variable has an infinite number of possible values that can be represented by an interval on the number line.
Example: hours spent studying in a day. The time spent studying can be any number between 0 and 24.
[Figure: number line from 0 to 24 labeled “Hours spent studying in a day”]
The probability distribution of a continuous random variable is called a continuous probability distribution.
107
Properties of Normal Distributions
The most important probability distribution in statistics is the normal distribution.
[Figure: bell-shaped normal curve]
A normal distribution is a continuous probability distribution for a random variable, x. The graph of a normal distribution is called the normal curve.

108
Properties of Normal Distributions
Properties of a Normal Distribution
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric
about the mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches
the x-axis as it extends farther and farther away
from the mean.
5. Between μ - σ and μ + σ (in the center of the curve), the graph curves downward. The graph curves upward to the left of μ - σ and to the right of μ + σ. The points at which the curve changes from curving upward to curving downward are called the inflection points.
109
Properties of Normal Distributions

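[Figure: normal curve with inflection points at μ - σ and μ + σ; total area under the curve = 1; x-axis marked at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ]
If x is a continuous random variable having a normal distribution with mean μ and standard deviation σ, you can graph a normal curve with the equation
$$y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)}, \qquad e \approx 2.718, \quad \pi \approx 3.14$$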
110
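As a sketch (assuming Python 3.8+ for `statistics.NormalDist`), the equation above can be coded directly and checked against the standard library, and a simple midpoint-rule sum illustrates property 3 (total area = 1). The names `normal_pdf` and `total_area` are hypothetical helpers, and the values μ = 3.5, σ = 1.3 are only sample inputs.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height y of the normal curve at x, written out from the equation above."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def total_area(mu: float, sigma: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the area under the curve from mu - 10*sigma
    to mu + 10*sigma (the area outside that range is negligible)."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))

mu, sigma = 3.5, 1.3
print(normal_pdf(mu, mu, sigma), NormalDist(mu, sigma).pdf(mu))  # the two heights agree
print(total_area(mu, sigma))                                      # very close to 1
```

The symmetric points μ − 1.5 and μ + 1.5 give identical heights, which reflects property 2.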
Means and Standard Deviations
A normal distribution can have any mean and any positive standard deviation. The mean gives the location of the line of symmetry.
[Figure: two normal curves with their inflection points marked. Left: x-axis from 1 to 6, mean μ = 3.5, standard deviation σ ≈ 1.3. Right: x-axis from 1 to 11, mean μ = 6, standard deviation σ ≈ 1.9.]
The standard deviation describes the spread of the data.
Means and Standard Deviations
Example:
1. Which curve has the greater mean?
2. Which curve has the greater standard deviation?
[Figure: normal curves A and B plotted on an x-axis from 1 to 13]
The line of symmetry of curve A occurs at x = 5. The line of symmetry of curve B occurs at x = 9. Curve B has the greater mean.
Curve B is more spread out than curve A, so curve B has the greater standard deviation.
Interpreting Graphs
Example:
The heights of fully grown magnolia bushes are
normally distributed. The curve represents the
distribution. What is the mean height of a fully grown
magnolia bush? Estimate the standard deviation.
[Figure: normal curve over Height (in feet), x-axis from 6 to 10, centered at μ = 8]
The inflection points are one standard deviation away from the mean, so σ ≈ 0.7.
The heights of the magnolia bushes are normally distributed with a mean height of about 8 feet and a standard deviation of about 0.7 feet.
The Standard Normal Distribution
The standard normal distribution is a normal
distribution with a mean of 0 and a standard deviation
of 1.

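The horizontal scale corresponds to z-scores.
[Figure: standard normal curve with the z-axis marked −3, −2, −1, 0, 1, 2, 3]
Any value can be transformed into a z-score by using the formula

$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}.$$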
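The z-score formula translates into a one-line function. This is a minimal sketch in Python; the name `z_score` is our own, and the sample inputs reuse the test-score numbers (μ = 78, σ = 8) that appear later in this section.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies above (positive) or below (negative) the mean."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mu) / sigma

print(z_score(90, 78, 8))  # 1.5
print(z_score(60, 78, 8))  # -2.25
```

Rejecting a non-positive σ mirrors the requirement that a normal distribution have a positive standard deviation.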
The Standard Normal Distribution
If each data value of a normally distributed random
variable x is transformed into a z-score, the result
will be the standard normal distribution.
The area that falls in the interval under the nonstandard normal curve (the x-values) is the same as the area under the standard normal curve (within the corresponding z-boundaries).
[Figure: standard normal curve with the z-axis marked −3 to 3]
After the formula is used to transform an x-value into a z-score, the Standard Normal Table in Appendix B is used to find the cumulative area under the curve.
The Standard Normal Table
Properties of the Standard Normal Distribution
1. The cumulative area is close to 0 for z-scores close to z = −3.49.
2. The cumulative area increases as the z-scores increase.
3. The cumulative area for z = 0 is 0.5000.
4. The cumulative area is close to 1 for z-scores close to z = 3.49.
[Figure: standard normal curve, z-axis from −3 to 3; the cumulative area is close to 0 at z = −3.49, is 0.5000 at z = 0, and is close to 1 at z = 3.49]
The Standard Normal Table
Example:
Find the cumulative area that corresponds to a z-score of 2.71.
Appendix B: Standard Normal Table
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981

Find the area by finding 2.7 in the left-hand column, and then moving across the row to the column under 0.01.
The area to the left of z = 2.71 is 0.9966.
The Standard Normal Table
Example:
Find the cumulative area that corresponds to a z-score of −0.25.
Appendix B: Standard Normal Table
z      .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
−3.4   .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
−3.3   .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
−0.3   .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
−0.2   .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
−0.1   .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
−0.0   .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000

Find the area by finding −0.2 in the left-hand column, and then moving across the row to the column under 0.05.
The area to the left of z = −0.25 is 0.4013.
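The table entries can also be reproduced without Appendix B. This sketch assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the cumulative area and uses Python's `math.erf`; the function name `phi` is our own. Rounded to four decimals it should agree with the table values used in the two examples above (0.9966 and 0.4013).

```python
import math

def phi(z: float) -> float:
    """Cumulative area to the left of z under the standard normal curve,
    computed from the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-3.49, -0.25, 0.0, 2.71, 3.49):
    print(f"z = {z:5.2f}   area = {phi(z):.4f}")
```

The printed values also illustrate the listed properties: the area is close to 0 at z = −3.49, exactly 0.5 at z = 0, and close to 1 at z = 3.49.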
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
1. Sketch the standard normal curve and shade the appropriate area under the curve.
2. Find the area by following the directions for each case shown.
a. To find the area to the left of z, find the area that corresponds to z in the Standard Normal Table.
[Figure: standard normal curve shaded to the left of z = 1.23]
(1) Use the table to find the area for the z-score.
(2) The area to the left of z = 1.23 is 0.8907.
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
b. To find the area to the right of z, use the Standard Normal Table to find the area that corresponds to z. Then subtract the area from 1.
[Figure: standard normal curve shaded to the right of z = 1.23]
(1) Use the table to find the area for the z-score.
(2) The area to the left of z = 1.23 is 0.8907.
(3) Subtract to find the area to the right of z = 1.23: 1 − 0.8907 = 0.1093.
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
c. To find the area between two z-scores, find the area corresponding to each z-score in the Standard Normal Table. Then subtract the smaller area from the larger area.
[Figure: standard normal curve shaded between z = −0.75 and z = 1.23]
(1) Use the table to find the area for each z-score.
(2) The area to the left of z = 1.23 is 0.8907.
(3) The area to the left of z = −0.75 is 0.2266.
(4) Subtract to find the area of the region between the two z-scores: 0.8907 − 0.2266 = 0.6641.
Guidelines for Finding Areas
Example:
Find the area under the standard normal curve to the left of z = −2.33.
Always draw the curve!
[Figure: standard normal curve shaded to the left of z = −2.33]
From the Standard Normal Table, the area is equal to 0.0099.
Guidelines for Finding Areas
Example:
Find the area under the standard normal curve to the right of z = 0.94.
Always draw the curve!
[Figure: standard normal curve shaded to the right of z = 0.94]
From the Standard Normal Table, the area to the left of z = 0.94 is 0.8264, so the area to the right is 1 − 0.8264 = 0.1736.
Guidelines for Finding Areas
Example:
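Find the area under the standard normal curve between z = −1.98 and z = 1.07.
Always draw the curve!
[Figure: standard normal curve shaded between z = −1.98 and z = 1.07]
From the Standard Normal Table, the area to the left of z = 1.07 is 0.8577 and the area to the left of z = −1.98 is 0.0239, so the area between them is 0.8577 − 0.0239 = 0.8338.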
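The three cases (left, right, between) map onto three small functions. This is a sketch assuming Python 3.8+ (`statistics.NormalDist`); the function names are hypothetical. Because the code does not round the intermediate areas, an answer can differ from the slides in the fourth decimal place (for example, about 0.6640 instead of 0.6641 for the area between −0.75 and 1.23).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def area_left(z: float) -> float:
    """Case a: area to the left of z (the value read directly from the table)."""
    return Z.cdf(z)

def area_right(z: float) -> float:
    """Case b: area to the right of z, i.e. 1 minus the table area."""
    return 1.0 - Z.cdf(z)

def area_between(z1: float, z2: float) -> float:
    """Case c: area between two z-scores, larger cumulative area minus smaller."""
    low, high = sorted((z1, z2))
    return Z.cdf(high) - Z.cdf(low)

# Guideline examples, then the three worked examples.
print(f"{area_left(1.23):.4f} {area_right(1.23):.4f} {area_between(-0.75, 1.23):.4f}")
print(f"{area_left(-2.33):.4f} {area_right(0.94):.4f} {area_between(-1.98, 1.07):.4f}")
```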
NORMAL DISTRIBUTIONS: FINDING PROBABILITIES
Probability and Normal Distributions
If a random variable, x, is normally distributed, you can find the probability that x will fall in a given interval by calculating the area under the normal curve for that interval.
[Figure: normal curve with μ = 10 and σ = 5, shaded to the left of x = 15 to show P(x < 15)]
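Probability and Normal Distributions
[Figure: left, a normal distribution with μ = 10 and σ = 5, shaded for P(x < 15); right, the standard normal distribution with μ = 0 and σ = 1, shaded for P(z < 1); the two shaded areas are the same]
Because z = (15 − 10)/5 = 1,
P(x < 15) = P(z < 1) = shaded area under the curve = 0.8413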
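The equality P(x < 15) = P(z < 1) can be checked numerically. A minimal sketch assuming Python 3.8+ (`statistics.NormalDist`): the area to the left of 15 under the curve with μ = 10 and σ = 5 should match the area to the left of z = 1 under the standard normal curve.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=10, sigma=5)  # the nonstandard normal curve
z_dist = NormalDist()                # standard normal: mu = 0, sigma = 1

z = (15 - x_dist.mean) / x_dist.stdev   # z-score of x = 15
print(z)
print(x_dist.cdf(15), z_dist.cdf(z))    # same area, about 0.8413
```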
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a
standard deviation of 8. If the test scores are
normally distributed, find the probability that a
student receives a test score less than 90.
μ = 78, σ = 8
z = (x − μ)/σ = (90 − 78)/8 = 1.5
[Figure: normal curve with μ = 78 shaded to the left of x = 90; on the z-scale, x = 90 corresponds to z = 1.5]
P(x < 90) = P(z < 1.5) = 0.9332
The probability that a student receives a test score less than 90 is 0.9332.
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a
standard deviation of 8. If the test scores are
normally distributed, find the probability that a
student receives a test score greater than 85.
μ = 78, σ = 8
z = (x − μ)/σ = (85 − 78)/8 = 0.875 ≈ 0.88
[Figure: normal curve with μ = 78 shaded to the right of x = 85; on the z-scale, x = 85 corresponds to z ≈ 0.88]
P(x > 85) = P(z > 0.88) = 1 − P(z < 0.88) = 1 − 0.8106 = 0.1894
The probability that a student receives a test score greater than 85 is 0.1894.
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a
standard deviation of 8. If the test scores are
normally distributed, find the probability that a
student receives a test score between 60 and 80.
μ = 78, σ = 8
z1 = (x − μ)/σ = (60 − 78)/8 = −2.25
z2 = (x − μ)/σ = (80 − 78)/8 = 0.25
[Figure: normal curve with μ = 78 shaded between x = 60 and x = 80; on the z-scale, between z = −2.25 and z = 0.25]
P(60 < x < 80) = P(−2.25 < z < 0.25) = P(z < 0.25) − P(z < −2.25) = 0.5987 − 0.0122 = 0.5865
The probability that a student receives a test score between 60 and 80 is 0.5865.
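The three test-score examples can be reproduced in a few lines. This sketch assumes Python 3.8+ (`statistics.NormalDist`) and computes the areas without rounding z first. That is why the "greater than 85" result comes out near 0.1908 rather than the table-based 0.1894, which used z rounded to 0.88.

```python
from statistics import NormalDist

scores = NormalDist(mu=78, sigma=8)  # test scores, assumed normal as in the example

p_less_90 = scores.cdf(90)                     # P(x < 90)
p_greater_85 = 1 - scores.cdf(85)              # P(x > 85)
p_between = scores.cdf(80) - scores.cdf(60)    # P(60 < x < 80)

print(f"{p_less_90:.4f} {p_greater_85:.4f} {p_between:.4f}")
```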
NORMAL DISTRIBUTIONS: FINDING VALUES
Finding z-Scores
Example:
Find the z-score that corresponds to a cumulative area of 0.9973.
Appendix B: Standard Normal Table
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981

Find the z-score by locating 0.9973 in the body of the Standard Normal Table. The values at the beginning of the corresponding row and at the top of the column give the z-score.
The z-score is 2.78.
Finding z-Scores
Example:
Find the z-score that corresponds to a cumulative
area of 0.4170.
Appendix B: Standard Normal Table
z      .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
−3.4   .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
−3.3   .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
−0.3   .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
−0.2   .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
−0.1   .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
−0.0   .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000

Find the z-score by locating 0.4170 in the body of the Standard Normal Table. Use the value closest to 0.4170, which is 0.4168 in the −0.2 row under the .01 column.
The z-score is −0.21.
Finding a z-Score Given a Percentile
Example:
Find the z-score that corresponds to P75.
[Figure: standard normal curve with an area of 0.75 shaded to the left of z ≈ 0.67]
The z-score that corresponds to P75 is the same z-score that corresponds to an area of 0.75.
The z-score is 0.67.
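Going from an area back to a z-score is the inverse of the cumulative area. As a sketch (assuming Python 3.8+), `statistics.NormalDist().inv_cdf` performs this lookup without the table; rounding to two decimals should match the three answers above.

```python
from statistics import NormalDist

std_normal = NormalDist()

for area in (0.9973, 0.4170, 0.75):
    z = std_normal.inv_cdf(area)  # z-score whose cumulative area equals `area`
    print(f"area {area}: z = {z:.2f}")
```

Unlike the table, `inv_cdf` does not need a "closest value" step, so it also handles areas such as 0.4170 that do not appear exactly in the table body.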
Transforming a z-Score to an x-Score
To transform a standard z-score to a data value, x, in a given population, use the formula
x = μ + zσ.
Example:
The monthly electric bills in a city are normally distributed with a mean of $120 and a standard deviation of $16. Find the x-value corresponding to a z-score of 1.60.
x = μ + zσ = 120 + 1.60(16) = 145.6
We can conclude that an electric bill of $145.60 is 1.6 standard deviations above the mean.
Finding a Specific Data Value
Example:
The weights of bags of chips for a vending machine are
normally distributed with a mean of 1.25 ounces and a
standard deviation of 0.1 ounce. Bags that have weights
in the lower 8% are too light and will not work in the
machine. What is the least a bag of chips can weigh and
still work in the machine?
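P(z < ?) = 0.08. The closest table area is P(z < −1.41) = 0.0793, so z ≈ −1.41.
[Figure: standard normal curve with the lower 8% shaded to the left of z = −1.41; on the x-scale, centered at μ = 1.25, this cutoff is x ≈ 1.11]
x = μ + zσ = 1.25 + (−1.41)(0.1) ≈ 1.11
The least a bag can weigh and still work in the machine is 1.11 ounces.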
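The formula x = μ + zσ and the bag-weight example can be sketched as follows. This assumes Python 3.8+ (`statistics.NormalDist.inv_cdf` replaces the table lookup for the 8% cutoff); the helper `to_x` is a hypothetical name. The exact cutoff z is about −1.405, which rounds to the −1.41 used in the example.

```python
from statistics import NormalDist

def to_x(z: float, mu: float, sigma: float) -> float:
    """Convert a z-score back to a data value: x = mu + z * sigma."""
    return mu + z * sigma

# Electric bills: mean $120, standard deviation $16, z = 1.60
print(to_x(1.60, 120, 16))

# Bags of chips: mean 1.25 oz, standard deviation 0.1 oz, lowest 8% rejected
z_cut = NormalDist().inv_cdf(0.08)    # z with 8% of the area to its left
print(z_cut, to_x(z_cut, 1.25, 0.1))  # about -1.41 and 1.11 ounces
```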
SAMPLING DISTRIBUTIONS AND THE CENTRAL LIMIT THEOREM
Sampling Distributions
A sampling distribution is the probability
distribution of a sample statistic that is formed when
samples of size n are repeatedly taken from a
population.

[Figure: a population from which many samples are repeatedly drawn]
Sampling Distributions
If the sample statistic is the sample mean, then the
distribution is the sampling distribution of
sample means.
[Figure: six samples, Sample 1 through Sample 6, each with its own sample mean x̄1 through x̄6]
The sampling distribution consists of the values of the sample means, x̄1, x̄2, x̄3, x̄4, x̄5, x̄6.
Properties of Sampling Distributions
Properties of Sampling Distributions of Sample Means
1. The mean of the sample means, $\mu_{\bar{x}}$, is equal to the population mean:

$$\mu_{\bar{x}} = \mu$$

2. The standard deviation of the sample means, $\sigma_{\bar{x}}$, is equal to the population standard deviation, σ, divided by the square root of n:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The standard deviation of the sampling distribution of the sample means is called the standard error of the mean.
Sampling Distribution of Sample
Means
Example:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
a. Find the mean, standard deviation, and variance
of the population.
Population: 5, 10, 15, 20
μ = 12.5, σ ≈ 5.59, σ² = 31.25
Continued.
Sampling Distribution of Sample
Means
Example continued:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
b. Graph the probability histogram for the population
values.
[Figure: probability histogram of the population of x; P(x) = 0.25 for each population value 5, 10, 15, and 20]
This uniform distribution shows that all values have the same probability of being selected.
Continued.
Sampling Distribution of Sample
Means
Example continued:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
c. List all the possible samples of size n = 2 and
calculate the mean of each.
Sample   Sample mean, x̄   Sample   Sample mean, x̄
5, 5     5                15, 5    10
5, 10    7.5              15, 10   12.5
5, 15    10               15, 15   15
5, 20    12.5             15, 20   17.5
10, 5    7.5              20, 5    12.5
10, 10   10               20, 10   15
10, 15   12.5             20, 15   17.5
10, 20   15               20, 20   20
These means form the sampling distribution of the sample means.
Continued.
Sampling Distribution of Sample
Means
Example continued:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
d. Create the probability distribution of the sample
means.
Probability Distribution of Sample Means
x̄      f   Probability
5      1   0.0625
7.5    2   0.1250
10     3   0.1875
12.5   4   0.2500
15     3   0.1875
17.5   2   0.1250
20     1   0.0625
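The sampling-distribution example (parts c and d) can be enumerated directly. This is a small sketch in Python using only the standard library; it lists all 16 ordered samples drawn with replacement, tallies the sample means, and checks the two properties μx̄ = μ and σx̄ = σ/√n. Note that `statistics.pstdev` is the population standard deviation, matching σ ≈ 5.59 above.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [5, 10, 15, 20]
n = 2  # sample size

# Every ordered sample of size 2 drawn with replacement: 4 * 4 = 16 samples.
sample_means = [sum(sample) / n for sample in product(population, repeat=n)]

# Frequency and probability of each sample mean.
counts = Counter(sample_means)
for xbar in sorted(counts):
    print(f"{xbar:5}  {counts[xbar]}  {counts[xbar] / len(sample_means):.4f}")

# Compare with the properties of sampling distributions of sample means.
print(mean(sample_means), mean(population))                # both 12.5
print(pstdev(sample_means), pstdev(population) / sqrt(n))  # both about 3.95
```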
Sampling Distribution of Sample
Means
Example continued:
The population values {5, 10, 15, 20} are written on slips
of paper and put in a hat. Two slips are randomly
selected, with replacement.
e. Graph the probability histogram for the sampling
distribution.
[Figure: probability histogram of the sampling distribution; P(x̄) from 0 to 0.25 plotted against the sample means 5, 7.5, 10, 12.5, 15, 17.5, 20]
The shape of the graph is symmetric and bell-shaped. It approximates a normal distribution.
The Central Limit Theorem
If a sample of size n ≥ 30 is taken from a population with any type of distribution that has a mean = μ and standard deviation = σ, the sample means will have an approximately normal distribution.
The Central Limit Theorem
If the population itself is normally distributed, with mean = μ and standard deviation = σ, the sample means will have a normal distribution for any sample size n.
The Central Limit Theorem
In either case, the sampling distribution of sample means has a mean equal to the population mean:

$$\mu_{\bar{x}} = \mu \qquad \text{(mean of the sample means)}$$

The sampling distribution of sample means has a standard deviation equal to the population standard deviation divided by the square root of n:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad \text{(standard deviation of the sample means)}$$

This is also called the standard error of the mean.
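The Central Limit Theorem can be illustrated by simulation. The sketch below is an illustration, not a proof: it assumes an exponential population (mean 1, standard deviation 1), which is strongly skewed, draws 20,000 samples of size n = 36 with a fixed random seed, and compares the mean and standard deviation of the sample means with μ and σ/√n. The population choice, sample size, and repetition count are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is reproducible

# Skewed (non-normal) population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 36                # sample size, n >= 30
repetitions = 20_000  # number of samples drawn

sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(repetitions)
]

print(f"mean of sample means: {mean(sample_means):.4f}  (mu = {mu})")
print(f"std of sample means:  {pstdev(sample_means):.4f}  (sigma/sqrt(n) = {sigma / sqrt(n):.4f})")
```

A histogram of `sample_means` would look close to a bell curve even though the exponential population itself is not.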
The Mean and Standard Error
Example:
The heights of fully grown magnolia bushes have a mean height of 8 feet and a standard deviation of 0.7 feet. Samples of 38 bushes are randomly selected from the population, and the mean of each sample is determined. Find the mean and standard error of the mean of the sampling distribution.
Mean:

$$\mu_{\bar{x}} = \mu = 8$$

Standard deviation (standard error):

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.7}{\sqrt{38}} \approx 0.11$$

Continued.
Interpreting the Central Limit
Theorem
Example continued:
The heights of fully grown magnolia bushes have a mean height of 8 feet and a standard deviation of 0.7 feet. Samples of 38 bushes are randomly selected from the population, and the mean of each sample is determined.
The mean of the sampling distribution is 8 feet, and the standard error of the sampling distribution is 0.11 feet.
From the Central Limit Theorem, because the sample size is greater than 30, the sampling distribution can be approximated by the normal distribution.
[Figure: normal curve for x̄ with $\mu_{\bar{x}} = 8$ and $\sigma_{\bar{x}} = 0.11$; x̄-axis marked 7.6, 8, 8.4]
Finding Probabilities
Example:
The heights of fully grown magnolia bushes have a mean height of 8 feet and a standard deviation of 0.7 feet. Samples of 38 bushes are randomly selected from the population, and the mean of each sample is determined.
The mean of the sampling distribution is 8 feet, and the standard error of the sampling distribution is 0.11 feet ($\mu_{\bar{x}} = 8$, $\sigma_{\bar{x}} = 0.11$, n = 38).
Find the probability that the mean height of the 38 bushes is less than 7.8 feet.
[Figure: normal curve for x̄, axis marked 7.6, 8, 8.4, shaded to the left of 7.8]
Continued.
Finding Probabilities
Example continued:
Find the probability that the mean height of the 38 bushes is less than 7.8 feet.
$\mu_{\bar{x}} = 8$, $\sigma_{\bar{x}} = 0.11$, n = 38

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{7.8 - 8}{0.11} \approx -1.82$$

[Figure: normal curve for x̄ shaded to the left of 7.8; on the z-scale, to the left of z = −1.82]
P(x̄ < 7.8) = P(z < −1.82) = 0.0344
The probability that the mean height of the 38 bushes is less than 7.8 feet is 0.0344.
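As a sketch (assuming Python 3.8+), the magnolia probability can be computed with and without rounding the standard error. The slide rounds σx̄ to 0.11; with the unrounded value 0.7/√38 ≈ 0.1136, the probability comes out near 0.039 instead of 0.0344, a reminder that rounding intermediate values can shift the answer.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 0.7, 38
standard_error = sigma / sqrt(n)          # about 0.1136, rounded to 0.11 on the slide

exact = NormalDist(mu, standard_error).cdf(7.8)
rounded = NormalDist(mu, 0.11).cdf(7.8)   # using the rounded standard error

print(f"standard error: {standard_error:.4f}")
print(f"P(mean < 7.8) with exact SE: {exact:.4f}, with SE = 0.11: {rounded:.4f}")
```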
Probability and Normal
Distributions
Example:
The average on a statistics test was 78 with a
standard deviation of 8. If the test scores are
normally distributed, find the probability that the
mean score of 25 randomly selected students is
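between 75 and 79.

$$\mu_{\bar{x}} = 78, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{25}} = 1.6$$

$$z_1 = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{75 - 78}{1.6} \approx -1.88, \qquad z_2 = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{79 - 78}{1.6} \approx 0.63$$

[Figure: normal curve for x̄ centered at 78, shaded between 75 and 79; on the z-scale, between z = −1.88 and z = 0.63]
Continued.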
Probability and Normal
Distributions
Example continued:

[Figure: normal curve for x̄ shaded between 75 and 79; on the z-scale, between z = −1.88 and z = 0.63]
P(75 < x̄ < 79) = P(−1.88 < z < 0.63) = P(z < 0.63) − P(z < −1.88) = 0.7357 − 0.0301 = 0.7056
Approximately 70.56% of samples of 25 students will have a mean score between 75 and 79.
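A minimal sketch of the same calculation, assuming Python 3.8+ (`statistics.NormalDist`). Computing the areas without rounding the z-scores to two decimals gives about 0.7036, slightly below the table-based 0.7056.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 78, 8, 25
means = NormalDist(mu, sigma / sqrt(n))   # sampling distribution of x-bar, SE = 1.6

p = means.cdf(79) - means.cdf(75)         # P(75 < x-bar < 79)
print(f"{p:.4f}")
```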
Probabilities of x and x̄
Example:
The population mean salary for auto mechanics is μ = $34,000 with a standard deviation of σ = $2,500. Find the probability that the mean salary for a randomly selected sample of 50 mechanics is greater than $35,000.
μx̄ = 34000, σx̄ = σ/√n = 2500/√50 ≈ 353.55
z = (x̄ − μx̄)/σx̄ = (35000 − 34000)/353.55 ≈ 2.83
P(x̄ > 35000) = P(z > 2.83) = 1 − P(z < 2.83) = 1 − 0.9977 = 0.0023
[Figure: normal curve for x̄ centered at 34000, shaded to the right of 35000; on the z-scale, to the right of z = 2.83]
The probability that the mean salary for a randomly selected sample of 50 mechanics is greater than $35,000 is 0.0023.
Probabilities of x and x̄
Example:
The population mean salary for auto mechanics is μ = $34,000 with a standard deviation of σ = $2,500. Find the probability that the salary for one randomly selected mechanic is greater than $35,000. (Notice that the Central Limit Theorem does not apply; this calculation assumes that individual salaries are normally distributed.)
μ = 34000, σ = 2500
z = (x − μ)/σ = (35000 − 34000)/2500 = 0.4
P(x > 35000) = P(z > 0.4) = 1 − P(z < 0.4) = 1 − 0.6554 = 0.3446
[Figure: normal curve for x centered at 34000, shaded to the right of 35000; on the z-scale, to the right of z = 0.4]
The probability that the salary for one mechanic is greater than $35,000 is 0.3446.
Probabilities of x and x̄
Example:
The probability that the salary for one randomly selected mechanic is greater than $35,000 is 0.3446. In a group of 50 mechanics, approximately how many would have a salary greater than $35,000?
P(x > 35000) = 0.3446. This also means that 34.46% of mechanics have a salary greater than $35,000.
34.46% of 50 = 0.3446 × 50 = 17.23
You would expect about 17 mechanics out of the group of 50 to have a salary greater than $35,000.
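The contrast between one mechanic and the mean of 50 mechanics comes down to which standard deviation is used. A sketch assuming Python 3.8+ (`statistics.NormalDist`); like the example, it assumes individual salaries are normally distributed for the single-mechanic case.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 34_000, 2_500

# One mechanic: uses the population distribution itself (assumed normal).
p_one = 1 - NormalDist(mu, sigma).cdf(35_000)

# Mean of 50 mechanics: uses the sampling distribution, SE = sigma / sqrt(50).
p_mean = 1 - NormalDist(mu, sigma / sqrt(50)).cdf(35_000)

print(f"P(x > 35000)    = {p_one:.4f}")
print(f"P(xbar > 35000) = {p_mean:.4f}")
print(f"expected count in a group of 50: {p_one * 50:.2f}")
```

The sample mean varies much less than a single salary, so the same $35,000 cutoff is far more extreme for x̄ than for x.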
NORMAL APPROXIMATIONS TO BINOMIAL DISTRIBUTIONS
Normal Approximation
The normal distribution is used to approximate
the binomial distribution when it would be
impractical to use the binomial distribution to
find a probability.
Normal Approximation to a Binomial
Distribution
If np ≥ 5 and nq ≥ 5, then the binomial random variable x is approximately normally distributed with mean

$$\mu = np$$

and standard deviation

$$\sigma = \sqrt{npq}.$$
Normal Approximation
Example:
Decide whether the normal distribution may be used to approximate x in the following examples.
1. Thirty-six percent of people in the United States own a dog. You randomly select 25 people in the United States and ask them if they own a dog.
np = (25)(0.36) = 9 and nq = (25)(0.64) = 16
Because np and nq are both at least 5, the normal distribution may be used.
2. Fourteen percent of people in the United States own a cat. You randomly select 20 people in the United States and ask them if they own a cat.
np = (20)(0.14) = 2.8 and nq = (20)(0.86) = 17.2
Because np is less than 5, the normal distribution may NOT be used.
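The rule of thumb np ≥ 5 and nq ≥ 5 and the resulting mean and standard deviation can be packaged in one function. This sketch is written in Python; the name `normal_approx_params` is hypothetical, and returning `None` when the rule fails is a design choice for illustration.

```python
from math import sqrt
from typing import Optional, Tuple

def normal_approx_params(n: int, p: float) -> Optional[Tuple[float, float]]:
    """Return (mu, sigma) for the normal approximation to a binomial(n, p),
    or None when np >= 5 and nq >= 5 do not both hold."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        return None
    return n * p, sqrt(n * p * q)

print(normal_approx_params(25, 0.36))  # dogs: approximation allowed
print(normal_approx_params(20, 0.14))  # cats: None, because np = 2.8
```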
Correction for Continuity
The binomial distribution is discrete and can be represented by a probability histogram. To calculate exact binomial probabilities, the binomial formula is used for each value of x and the results are added.
[Figure: histogram bar centered at x = c giving the exact binomial probability P(x = c), overlaid by a normal curve whose area from c − 0.5 to c + 0.5 gives the normal approximation P(c − 0.5 < x < c + 0.5)]
When using the continuous normal distribution to approximate a binomial distribution, move 0.5 unit to the left and right of the midpoint to include all possible x-values in the interval. This is called the correction for continuity.
Correction for Continuity
Example:
Use a correction for continuity to convert the binomial
intervals to a normal distribution interval.
1. The probability of getting between 125 and 145
successes, inclusive.
The discrete midpoint values are 125, 126, …, 145.
The continuous interval is 124.5 < x < 145.5.

2. The probability of getting exactly 100 successes.


The discrete midpoint value is 100.
The continuous interval is 99.5 < x < 100.5.

3. The probability of getting at least 67 successes.


The discrete midpoint values are 67, 68, ….
The continuous interval is x > 66.5.

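The three conversions above follow one pattern: move each finite endpoint outward by 0.5. A minimal sketch in Python with a hypothetical helper `continuity_interval`, where `None` marks a missing bound (as in "at least 67"):

```python
import math
from typing import Optional, Tuple

def continuity_interval(low: Optional[int], high: Optional[int]) -> Tuple[float, float]:
    """Continuous interval for the discrete values low, low + 1, ..., high.
    A bound of None means that side is unbounded."""
    lo = -math.inf if low is None else low - 0.5
    hi = math.inf if high is None else high + 0.5
    return lo, hi

print(continuity_interval(125, 145))   # between 125 and 145, inclusive
print(continuity_interval(100, 100))   # exactly 100
print(continuity_interval(67, None))   # at least 67
```

The same helper handles "less than 14", which is the discrete range x ≤ 13: `continuity_interval(None, 13)` gives an upper endpoint of 13.5.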
Guidelines
Using the Normal Distribution to Approximate Binomial Probabilities
1. Verify that the binomial distribution applies.
   In symbols: specify n, p, and q.
2. Determine if you can use the normal distribution to approximate x, the binomial variable.
   In symbols: is np ≥ 5? Is nq ≥ 5?
3. Find the mean μ and standard deviation σ for the distribution.
   In symbols: μ = np, σ = √(npq).
4. Apply the appropriate continuity correction. Shade the corresponding area under the normal curve.
   In symbols: add or subtract 0.5 from endpoints.
5. Find the corresponding z-value(s).
   In symbols: z = (x − μ)/σ.
6. Find the probability.
   In symbols: use the Standard Normal Table.
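The six guidelines can be sketched as a single function. This assumes Python 3.8+ (`statistics.NormalDist`) and uses the hypothetical name `approx_binomial_probability`; the steps are marked in the comments. Because it does not round z to two decimals, the result for the college example that follows is about 0.2704 rather than the table-based 0.2709.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

def approx_binomial_probability(n: int, p: float,
                                low: Optional[int] = None,
                                high: Optional[int] = None) -> float:
    """Approximate P(low <= x <= high) for a binomial(n, p) variable x using the
    normal distribution. low=None means no lower bound; high=None means no upper bound."""
    # Step 1: specify n, p, and q.
    q = 1 - p
    # Step 2: check that the normal approximation may be used.
    if n * p < 5 or n * q < 5:
        raise ValueError("np and nq must both be at least 5")
    # Step 3: mean and standard deviation.
    mu, sigma = n * p, sqrt(n * p * q)
    # Step 4: continuity correction, moving each endpoint outward by 0.5.
    lo = None if low is None else low - 0.5
    hi = None if high is None else high + 0.5
    # Steps 5 and 6: convert to z-scores and find the area.
    z = NormalDist()
    upper = 1.0 if hi is None else z.cdf((hi - mu) / sigma)
    lower = 0.0 if lo is None else z.cdf((lo - mu) / sigma)
    return upper - lower

# Fewer than 14 of 50 seniors (p = 0.31), i.e. x <= 13:
print(f"{approx_binomial_probability(50, 0.31, high=13):.4f}")
```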
Approximating a Binomial Probability
Example:
Thirty-one percent of the seniors in a certain high school plan to attend college. If 50 students are randomly selected, find the probability that less than 14 students plan to attend college.
np = (50)(0.31) = 15.5 and nq = (50)(0.69) = 34.5, so the variable x is approximately normally distributed with μ = np = 15.5 and σ = √(npq) = √((50)(0.31)(0.69)) ≈ 3.27.
Correction for continuity: "less than 14" means x ≤ 13, so use x < 13.5.
z = (x − μ)/σ = (13.5 − 15.5)/3.27 ≈ −0.61
[Figure: normal curve centered at μ = 15.5, x-axis marked 10, 15, 20, shaded to the left of 13.5]
P(x < 13.5) = P(z < −0.61) = 0.2709
The probability that less than 14 plan to attend college is 0.2709.
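Approximating a Binomial Probability
Example:
A survey reports that forty-eight percent of US citizens own computers. Forty-five citizens are randomly selected, and each is asked whether he or she owns a computer. What is the probability that exactly 10 say yes?
np = (45)(0.48) = 21.6 and nq = (45)(0.52) = 23.4, so the normal distribution may be used with μ = np = 21.6 and σ = √(npq) = √((45)(0.48)(0.52)) ≈ 3.35.
Correction for continuity: P(x = 10) ≈ P(9.5 < x < 10.5).
z1 = (9.5 − 21.6)/3.35 ≈ −3.61 and z2 = (10.5 − 21.6)/3.35 ≈ −3.31
[Figure: normal curve with the interval from 9.5 to 10.5 shaded]
P(9.5 < x < 10.5) = P(−3.61 < z < −3.31) = P(z < −3.31) − P(z < −3.61) ≈ 0.0005 − 0.0002 = 0.0003
The probability that exactly 10 US citizens own a computer is approximately 0.0003.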
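For comparison, the exact binomial probability can be computed with the binomial formula and set beside the normal approximation. The sketch assumes Python 3.8+ (`math.comb`, `statistics.NormalDist`) and uses n = 45 and p = 0.48, so np = 21.6. Since x = 10 lies more than three standard deviations below the mean, both probabilities are very small, and the approximation is noticeably less accurate in relative terms there (roughly 0.0003 versus an exact value near 0.0002).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 45, 0.48
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 21.6 and about 3.35

# Exact binomial probability of exactly 10 successes.
exact = comb(n, 10) * p**10 * (1 - p)**(n - 10)

# Normal approximation with the continuity correction: P(9.5 < x < 10.5).
dist = NormalDist(mu, sigma)
approx = dist.cdf(10.5) - dist.cdf(9.5)

print(f"mu = {mu:.1f}, sigma = {sigma:.2f}")
print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")
```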
