THE NORMAL DISTRIBUTION
Prepared by [Link], Nyanyama
Facilitator Prof Innocent Semali
TOPIC OUTLINE
1. Discrete and continuous probability distributions
2. Probability density function
3. Normal distribution
4. Properties of normal distribution
5. Standard normal distribution
6. Standard Z scores
7. Finding normal curve areas
8. Finding probabilities
PROBABILITY
The study of uncertainty and randomness in the world.
What is the probability you will develop diabetes in the next 30 years?
What is the likelihood that you will have knee replacement surgery?
OBJECTIVES
Probability as it pertains to statistical inference.
Probability models.
The Central Limit Theorem.
PROBABILITY BASICS
Probability reflects the likelihood that an outcome will occur.
0 ≤ Probability ≤ 1.
$\text{Probability} = \dfrac{\text{Number with outcome}}{N}$
NEW YORK CITY SELECTED DATA - CANCER REGISTRY 2008*
TYPE                    WHITE MALE  WHITE FEMALE  BLACK MALE  BLACK FEMALE  TOTAL
COLORECTAL CA                 1236          1251         449           584   3520
LIVER CA                       330           134         149            57    670
LUNG CA                       1449          1332         537           497   3815
THYROID CA                     175           537          29           135    876
NON-HODGKINS LYMPHOMA          582           523         170           159   1434
LEUKEMIA                       384           285          87            85    805
TOTAL                         4120          4026        1421          1517  11120
PROBABILITY
A CASE IS SELECTED AT RANDOM
P (W.M) = 4120/11120 = 0.37
P (B.M) = 1421/11120 = 0.13
P (THYROID CA) = 876/11120 = 0.08
P (W.F WITH LIVER CA) = 134/11120 = 0.01
P (BLACK PATIENTS WITH LUNG CA) = (537 + 497)/11120 = 1034/11120 = 0.09
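The probabilities above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `counts` and `prob` are illustrative, and only the counts quoted in the worked probabilities are used.

```python
# Counts quoted in the worked probabilities (taken from the registry table).
TOTAL_CASES = 11120

counts = {
    "white_male": 4120,          # column total
    "black_male": 1421,          # column total
    "thyroid_ca": 876,           # row total
    "white_female_liver": 134,   # single cell
    "black_lung": 537 + 497,     # black male + black female lung cancer cases
}

def prob(n_with_outcome, n_total=TOTAL_CASES):
    """Probability = number with outcome / N."""
    return n_with_outcome / n_total

# Round to two decimals, as on the slide.
probabilities = {name: round(prob(n), 2) for name, n in counts.items()}

for name, p in probabilities.items():
    print(f"P({name}) = {p:.2f}")
```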
What is the probability of selecting a patient with Pre-HTN or HTN?
= 95/150 = 0.63
PROBABILITY DISTRIBUTIONS
A discrete random variable is a variable that can assume only a countable number of values
Discrete Probability Distributions
Many possible outcomes:
• number of complaints per day
• number of TVs in a household
• number of rings before the phone is answered
Only two possible outcomes:
• gender: male or female
• defective: yes or no
• spreads peanut butter first vs. spreads jelly first
Continuous Probability Distributions
A continuous random variable is a variable that can assume any value
on a continuum (can assume an uncountable number of values)
• thickness of an item
• time required to complete a task
• temperature of a solution
• height, in inches
These can potentially take on any value, depending only on the ability
to measure
The Binomial Distribution
Characteristics of the Binomial Distribution:
A trial has only two possible outcomes –
“success” or “failure”
There is a fixed number, n, of identical
trials
The trials of the experiment are
independent of each other
The probability of a success, p, remains
constant from trial to trial
If p represents the probability of a
success, then (1-p) = q is the probability
of a failure
Binomial Distribution Settings
A manufacturing plant labels items as either defective or acceptable
A firm bidding for a contract will either get the contract or not
A marketing research firm receives survey responses of “yes I will buy” or
“no I will not”
New job applicants either accept the offer or reject it
Binomial Distribution Formula
$P(x) = \dfrac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$
P(x) = probability of x successes in n
trials, with probability of success p on
each trial
x = number of ‘successes’ in sample,
(x = 0, 1, 2, ..., n)
p = probability of “success” per trial
q = probability of “failure” = (1 – p)
n = number of trials (sample size)
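The binomial probability can be computed directly from its standard form, P(x) = C(n, x) p^x q^(n-x). The sketch below assumes that form; the function name `binomial_pmf` is illustrative, and the values n = 5 with p = 0.1 and p = 0.5 are the ones used for the shape comparison in these slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    q = 1 - p                      # probability of failure
    return comb(n, x) * p**x * q**(n - x)

n = 5
for p in (0.1, 0.5):
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    # With p = 0.1 the mass piles up near x = 0; with p = 0.5 it is symmetric.
    print(f"n={n}, p={p}:", [round(v, 4) for v in pmf])
```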
Binomial Distribution
The shape of the binomial distribution
depends on the values of p and n
[Figure: binomial distribution with n = 5 and p = 0.1 (above) and with n = 5 and p = 0.5 (below)]
Binomial Distribution
Characteristics
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard Deviation: $\sigma = \sqrt{npq}$
Where n = sample size
p = probability of success
q = (1 – p) = probability of failure
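As a quick check, the mean np and the standard results for the binomial variance (npq) and standard deviation (square root of npq) can be compared with moments computed directly from the probabilities. This is a sketch under those assumptions; n = 5 and p = 0.5 are illustrative values.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.5          # illustrative values
q = 1 - p

# Moments computed directly from the probability distribution.
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * binomial_pmf(x, n, p)
                   for x in range(n + 1))

# Closed-form characteristics.
mean_formula = n * p
var_formula = n * p * q
sd_formula = sqrt(var_formula)

print(mean_from_pmf, mean_formula)
print(var_from_pmf, var_formula, round(sd_formula, 3))
```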
The Poisson Distribution
Characteristics of the Poisson Distribution:
The outcomes of interest are rare
relative to the possible outcomes
The average number of outcomes of
interest per time or space interval is λ
The number of outcomes of interest is
random, and the occurrence of one
outcome does not influence the chances
of another outcome of interest
The probability that an outcome of
interest occurs in each segment is the
same for all segments
Poisson Distribution Formula
$P(x) = \dfrac{(\lambda t)^{x}\, e^{-\lambda t}}{x!}$
where:
t = size of the segment of interest
x = number of successes in segment
of interest
λ = expected number of successes in
a segment of unit size
e = base of the natural logarithm
system (2.71828...)
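The Poisson probability can be sketched in Python using the standard form P(x) = (λt)^x e^(-λt) / x!, with the symbols defined above. The function name `poisson_pmf` and the values λ = 2 and t = 3 are hypothetical, chosen only for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam, t=1.0):
    """Probability of exactly x successes in a segment of size t,
    where lam is the expected number of successes per unit segment."""
    expected = lam * t
    return expected**x * exp(-expected) / factorial(x)

lam, t = 2.0, 3.0   # hypothetical: 2 successes per unit, segment of size 3

p_zero = poisson_pmf(0, lam, t)                               # no successes
p_at_most_4 = sum(poisson_pmf(x, lam, t) for x in range(5))   # x = 0..4

print(p_zero, p_at_most_4)
```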
Poisson Distribution Characteristics
MEAN: $\mu = \lambda t$
VARIANCE: $\sigma^2 = \lambda t$
STANDARD DEVIATION: $\sigma = \sqrt{\lambda t}$
where λ = number of successes in a segment of
unit size
t = the size of the segment of interest
The Hypergeometric Distribution
“n” trials in a sample taken from a
finite population of size N
Sample taken without replacement
Trials are dependent
Concerned with finding the probability
of “x” successes in the sample where
there are “X” successes in the
population
Hypergeometric Distribution
Formula
(Two possible outcomes per trial)
$P(x) = \dfrac{\binom{X}{x}\binom{N-X}{n-x}}{\binom{N}{n}}$
Where
N = Population size
X = number of successes in the population
n = sample size
x = number of successes in the sample
n – x = number of failures in the sample
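Hypergeometric Distribution
EXAMPLE
3 light bulbs were selected from 10. Of the 10, 4 were defective. What is
the probability that 2 of the 3 selected are defective?
N = 10 (population size)
X = 4 (defective bulbs in the population)
n = 3 (sample size)
x = 2 (defective bulbs in the sample)
$P(2) = \dfrac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \dfrac{6 \times 6}{120} = 0.30$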
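The light bulb example can be checked numerically. The sketch below assumes the standard hypergeometric form C(X, x) C(N-X, n-x) / C(N, n) and reads the example as population size N = 10, defectives in the population X = 4, sample size n = 3 and defectives in the sample x = 2. The function name `hypergeom_pmf` is illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, X, n):
    """Probability of x successes in a sample of size n drawn without
    replacement from a population of size N containing X successes."""
    return comb(X, x) * comb(N - X, n - x) / comb(N, n)

# 3 bulbs drawn from 10, of which 4 are defective; exactly 2 defective drawn.
p_two_defective = hypergeom_pmf(x=2, N=10, X=4, n=3)
print(round(p_two_defective, 2))  # 0.3
```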
The Normal Distribution
The normal distribution, also known as the
Gaussian distribution, is a probability distribution of a
continuous random variable that is symmetrical
about the mean. Values near the mean are more
frequent than values far from it, which gives the
curve its “bell-shaped” appearance.
The Normal Distribution
‘Bell Shaped’
Symmetrical
Mean, Median and Mode are Equal
Location is determined by the mean, μ
Spread is determined by the standard deviation, σ
The random variable has an infinite theoretical
range: − ∞ to + ∞
The Normal Distribution: Definition
of Terms and Symbols Used
Normal Distribution Definition:
1) A continuous variable X having the symmetrical, bell-shaped distribution is called a
Normal Random Variable.
2) The normal probability distribution (Gaussian distribution) is a continuous distribution
which is regarded by many as the most significant probability distribution in statistics
particularly in the field of statistical inference.
Symbols Used:
“z” – the z-score or standard score. Standardizing transforms any normal distribution
into a distribution with mean 0 and standard deviation 1. This distribution is called the
standard normal distribution, and its individual values are
called standard scores or z-scores.
“µ” – the Greek letter “mu,” which is the Mean, and
“σ” – the Greek letter “sigma,” which is the Standard Deviation
CHARACTERISTICS OF NORMAL DISTRIBUTION:
1) It is “Bell-Shaped” and has a single peak at the center of the distribution,
2) The arithmetic Mean, Median and Mode are equal.
3) The total area under the curve is 1.00; half the area under the normal curve
is to the right of this center point and the other half to the left of it,
4) It is Symmetrical about the mean,
5) It is Asymptotic: the curve gets closer and closer to the X-axis but never
actually touches it. To put it another way, the tails of the curve extend
indefinitely in both directions.
6) The location of a normal distribution is determined by the Mean, µ; the
dispersion or spread of the distribution is determined by the Standard
Deviation, σ.
PARAMETERS AFFECTING NORMAL CURVES
By varying the parameters μ and σ, we obtain different
normal distributions
Changing µ (mean) shifts the whole curve to the left
or right
Increasing σ (standard deviation) makes the curve
flatter and more spread out
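The effect of the two parameters can be illustrated with Python's standard library `statistics.NormalDist`. This is a small sketch with arbitrary illustrative values (µ = 0 or 3, σ = 1 or 2): shifting µ moves the peak without changing its height, while doubling σ halves the peak height.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=3, sigma=1)    # same spread, different location
wider = NormalDist(mu=0, sigma=2)      # same location, larger spread

# Height of each curve at its own mean (the peak of the bell).
peak_standard = standard.pdf(0)
peak_shifted = shifted.pdf(3)
peak_wider = wider.pdf(0)

print(peak_standard, peak_shifted, peak_wider)
```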
Finding Normal Probabilities
Probability is measured by the area
under the curve
P(a ≤ x ≤ b) is the area under the curve between a and b
Probability as Area
Under the Curve
The total area under the
curve is 1.0, and the
curve is symmetric, so
half is above the mean,
half is below
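The idea that probability is an area can be sketched in two ways: as a difference of cumulative areas, and as a direct numerical sum of the density between a and b. The helper names `prob_between` and `prob_between_numeric` are illustrative; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) as (area to the left of b) - (area to the left of a)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

def prob_between_numeric(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Same probability, approximated by a midpoint sum of the density."""
    dist = NormalDist(mu, sigma)
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

area_exact = prob_between(-1, 1)
area_numeric = prob_between_numeric(-1, 1)
half_below_mean = NormalDist().cdf(0)   # symmetry: half the area is below the mean

print(area_exact, area_numeric, half_below_mean)
```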
Empirical Rules
What can we say about the distribution of
values around the mean? There are some
general rules:
μ ± 1σ encloses about 68% of x’s
μ ± 2σ covers about 95% of x’s
μ ± 3σ covers about 99.7% of x’s
About 68% of the area of a normal distribution is within one
standard deviation of the mean.
Approximately 95% of the area of a normal distribution
is within two standard deviations of the mean.
About 99.7% of the area of a normal distribution is
within three standard deviations of the mean.
The probability between the limits:
μ − σ and μ + σ is 0.68
μ − 1.96σ and μ + 1.96σ is 0.95
μ − 2.58σ and μ + 2.58σ is 0.99
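These areas can be verified with the standard normal distribution in Python. This is a minimal sketch; `area_within` is an illustrative helper that returns the area within k standard deviations of the mean.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3, 1.96, 2.58)}

for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
```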
Importance of the Rule
If a value is about 2 or more standard deviations away from the mean in a
normal distribution, then it is far from the mean
A value that far or farther from the mean is highly
unlikely, given that particular mean and standard deviation
Note: Limits covering the 95% area are known as the 95% reference or
spread limits, and values contained in the interval are commonly referred to
as the NORMAL values
The Standard
Normal Distribution
Also known as the “z”
distribution
Mean is defined to be 0
Standard Deviation is 1
Values above the mean
have positive z-values,
values below the mean
have negative z-values
CALCULATIONS OF PROBABILITIES USING STANDARD
NORMAL DISTRIBUTION
The normal distribution is determined by its
mean and its standard deviation.
These quantities are different for different
problems, and so it is not possible to make
tables of the Normal distribution for all the
values of µ and σ.
So, calculations are made by referring to the
Standard Normal distribution, which has µ = 0
and σ = 1.
CALCULATIONS OF PROBABILITIES USING STANDARD
NORMAL DISTRIBUTION
The Standard Normal Distribution is a Normal Distribution with a Mean of 0
and a Standard Deviation of 1.
It is also called the z distribution
A z-value is the distance between a selected value, designated X, and the
population Mean µ, divided by the Population Standard Deviation, σ.
The standard normal tables used here are cumulative “less than” tables:
each entry is the area to the left of z, with separate parts for negative
and positive z values.
The formula is:
$z = \dfrac{X - \mu}{\sigma}$
CALCULATIONS OF PROBABILITIES USING STANDARD
NORMAL DISTRIBUTION
Example
If x is distributed normally with a mean of 100 and a standard deviation of 50,
the z value for x = 250 is
$z = \dfrac{x - \mu}{\sigma} = \dfrac{250 - 100}{50} = 3.0$
This says that x = 250 is three standard deviations
(3 increments of 50 units) above the mean of 100.
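The standardization in this example, z = (x - mean) / standard deviation, can be written as a small function. The name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

z = z_score(250, mu=100, sigma=50)
x_back = 100 + z * 50      # converting the z value back to the original scale

print(z)       # 3.0
print(x_back)  # 250.0
```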
APPLICATION OF THE STANDARD NORMAL
DISTRIBUTION
Example 1:
A study of blood pressure of Jangwani school girls
gave a distribution of systolic blood pressure
(SBP) close to normal, with µ = 105.8 mmHg
and s = 13.4 mmHg.
a) What percentage of girls would be expected to
have SBP greater than 120 mmHg?
SND = (120 − 105.8)/13.4 = 1.06
From the tables, the area to the right of
SND = 1.06 is 0.14457, so about
14.5% of the girls would be expected to
have systolic blood pressure > 120 mmHg.
b) What percentage of girls would be
expected to have SBP < 120 mmHg?
Answer:
If 14.5% have SBP > 120 mmHg, then
100 − 14.5 = 85.5% will have SBP < 120 mmHg.
c) What proportion of girls would be expected to have SBP
between 85 and 120 mmHg?
Answer:
SND1 = (85 − 105.8)/13.4 = −1.55
SND2 = (120 − 105.8)/13.4 = 1.06
The area to the left of SND = −1.55 is 0.060571 (about 6.1%)
and the area to the right of SND = 1.06 is
0.14457 (about 14.5%), so the percentage with SBP between
85 mmHg and 120 mmHg is
100 − 14.5 − 6.1 = 79.4%, a proportion of about 0.79.
d) Within what limits would the central 95% of SBPs be expected?
If µ = 105.8
s = 13.4 then
µ ± 1.96s includes 95% of SBPs.
105.8 − 1.96(13.4) to 105.8 + 1.96(13.4)
i.e. 79.5 to 132.1 mmHg.
i.e. about 95% of the girls are expected to have SBPs between 79.5 mmHg and 132.1 mmHg.
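The four parts of Example 1 can be cross-checked without tables using `statistics.NormalDist`. This is a sketch, not the method used on the slides: it works with the unrounded z values, so the results differ slightly from the table-based answers (for example about 79.5% rather than 79.4% in part c).

```python
from statistics import NormalDist

# SBP modelled as normal with mean 105.8 mmHg and SD 13.4 mmHg (from Example 1).
sbp = NormalDist(mu=105.8, sigma=13.4)

pct_above_120 = 100 * (1 - sbp.cdf(120))               # part a
pct_below_120 = 100 * sbp.cdf(120)                     # part b
pct_85_to_120 = 100 * (sbp.cdf(120) - sbp.cdf(85))     # part c

# Part d: central 95% lies between the 2.5th and 97.5th percentiles.
lower_95 = sbp.inv_cdf(0.025)
upper_95 = sbp.inv_cdf(0.975)

print(round(pct_above_120, 1), round(pct_below_120, 1), round(pct_85_to_120, 1))
print(round(lower_95, 1), round(upper_95, 1))
```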
The Standard Normal Table
A compilation of areas under the standard normal curve.
The negative z score table covers z values to the left of the mean.
The positive z score table covers z values to the right of the mean.
In a cumulative (less than) table, each entry is the area to the left of z.
STANDARD NORMAL DISTRIBUTION WITH Z TABLES
Example
Scores on an exam are normally distributed with a mean of 65 and a standard
deviation of 9. Find the percentage of the scores
A. Less than 54
B. At least 80
C. Between 70 and 86
STANDARD NORMAL DISTRIBUTION WITH Z TABLES
A. x = 54
Using the formula z = (x − mean)/standard deviation:
z = (54 − 65)/9 = −1.2222..., rounded to −1.22
From the z table, the area to the left of z = −1.22 is 0.1112
P(x < 54) = P(z < −1.22) = 0.1112 = 11.12%
STANDARD NORMAL DISTRIBUTION WITH Z TABLES
B. At least 80
In a continuous distribution there is no distinction between “at least” and “greater than”,
hence
P(x ≥ 80) = P(x > 80)
x = 80
z = (80 − 65)/9 = 1.67
From the z table, the area to the left of z = 1.67 is 0.9525
P(x ≥ 80) = 1 − 0.9525 = 0.0475
= 4.75%
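The exam example can be checked in Python, including part C, which is not worked on the slides. The sketch assumes a standard deviation of 9, the value used in the z calculations above, and uses unrounded z values, so the results differ slightly from the table-based answers. For part C the same idea applies: the area to the left of 86 minus the area to the left of 70.

```python
from statistics import NormalDist

# Exam scores modelled as normal with mean 65 and standard deviation 9.
scores = NormalDist(mu=65, sigma=9)

pct_below_54 = 100 * scores.cdf(54)                          # part A
pct_at_least_80 = 100 * (1 - scores.cdf(80))                 # part B
pct_70_to_86 = 100 * (scores.cdf(86) - scores.cdf(70))       # part C

print(round(pct_below_54, 2))
print(round(pct_at_least_80, 2))
print(round(pct_70_to_86, 2))
```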