Normal Distribution
Shair Muhammad Hazara
MSPH (Health Services Academy, NIH, Islamabad)
MSBE (Dow University of Health Sciences Karachi)
BSN (PRN) The Aga Khan University, Karachi
E-mail address: [email protected]
Objectives
By the end of this session the students should be able to:
– Understand the concept of Normal distribution & standard
normal distribution.
– Differentiate between the sample mean and the population
mean.
What is the Normal Distribution (Curve)?
It’s a theoretical model, and it plays a very important role in statistical inference.
It describes a frequency polygon or histogram that is unimodal, smooth, and symmetrical (no empirical distribution has a shape that perfectly matches this ideal model).
Because the distribution is unimodal and symmetrical, it is bell-shaped.
The Nature of the Normal Distribution:
Properties of the Normal Distribution
• Unimodal
– One mode
• Symmetrical
– Left and right halves are mirror images
• Bell-shaped
– With maximum height at the mean, median and mode (which are all equal)
• Continuous
– There is a value of Y for every value of X
• Asymptotic
– The farther the curve goes from the mean, the closer it gets to the X axis, but it never touches it
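To make these properties concrete, here is a minimal Python sketch that evaluates the normal curve and checks symmetry, the peak at the mean, and the asymptotic tails numerically. It assumes the standard density formula f(x) = exp(-(x - µ)² / (2σ²)) / (σ√(2π)), which these slides do not write out; the function name `normal_pdf` is hypothetical.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


# Symmetrical: points the same distance either side of the mean have equal height.
print(math.isclose(normal_pdf(-1.5), normal_pdf(1.5)))  # True

# Bell-shaped: on a grid from -5 to 5, the maximum height occurs at the mean (0).
grid = [i / 10 for i in range(-50, 51)]
print(max(grid, key=normal_pdf))  # 0.0

# Asymptotic: the height keeps shrinking away from the mean but never reaches 0.
for x in (2, 4, 6, 8):
    print(x, normal_pdf(x))
```

Changing `mu` shifts the whole curve left or right, and changing `sigma` makes it wider and flatter or narrower and taller, which is why every pair of parameter values gives a different normal distribution.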
History of the Normal Curve
The scores of many variables are normally distributed. The curve is known as the:
– Normal Distribution
– Gaussian Distribution
Pictured: Sir Francis Galton (1822-1911) and Carl Friedrich Gauss (1777-1855).
Normal Distribution & its Properties
It was first described by De Moivre (1733). It is also called the Gaussian distribution, after another mathematician, Carl Friedrich Gauss.
Many real-life observations follow the normal distribution (or are very close to being normally distributed).
Two parameters define the normal distribution: the mean (µ) and the standard deviation (σ).
Many Normal Distributions
There are an infinite number of normal distributions
By varying the parameters µ and σ, we obtain
different normal distributions.
The Normal Distribution:
The Most Important One in Statistics
It’s important because…
– Many variables have approximate normal
distributions.
– It’s used to approximate many discrete distributions (for example, the binomial).
– Many statistical methods use the normal distribution
even when the data are not bell-shaped.
Theoretical Normal Distribution
• ±1σ = about 68%
• ±2σ = about 95%
• ±3σ = about 99.7%
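The 68-95-99.7 figures can be checked directly. This short Python sketch uses the fact that, for a standard normal variable, the area within k standard deviations of the mean equals erf(k/√2); `prob_within` is a hypothetical helper name. Figures quoted from four-decimal tables (such as 68.26%) can differ in the last decimal place because of rounding.

```python
import math


def prob_within(k: float) -> float:
    """P(-k < Z < k) for a standard normal Z: the area within k SDs of the mean."""
    # For the standard normal curve this area equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {prob_within(k):.2%}")
# within ±1 SD: 68.27%
# within ±2 SD: 95.45%
# within ±3 SD: 99.73%
```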
Finding Probabilities
Probability is the area under the curve!
$P(c \le X \le d)$ is the area under the curve $f(X)$ between $X = c$ and $X = d$.
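"Probability is area" can be taken literally: adding up thin strips under the curve between c and d gives the same number as the cumulative-probability shortcut. Below is a minimal Python sketch assuming the standard normal density and the trapezoidal rule for the strips; the function names are hypothetical, and the `erf`-based CDF is the standard closed form rather than anything from the slides.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height f(x) of the normal curve."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Total area under the curve to the left of x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def area_under_curve(c: float, d: float, mu: float = 0.0, sigma: float = 1.0,
                     steps: int = 10_000) -> float:
    """Approximate P(c <= X <= d) by adding up thin trapezoids under f(x)."""
    h = (d - c) / steps
    total = 0.5 * (normal_pdf(c, mu, sigma) + normal_pdf(d, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(c + i * h, mu, sigma)
    return total * h


c, d = -1.0, 1.0
print(f"{area_under_curve(c, d):.4f}")         # 0.6827
print(f"{normal_cdf(d) - normal_cdf(c):.4f}")  # 0.6827
```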
Standard Normal Distribution
The Z ...
1. The Z-score for an observation is the number of standard
deviations that it falls from the mean.
2. It expresses this distance in a standardized way, specifically
in standard deviation units.
3. Converting every observation to a Z-score re-scales an empirical
distribution to have a mean of 0 and a standard deviation of 1.
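Point 3 can be seen with a few numbers. This Python sketch standardizes a small data set and confirms the z-scores have mean 0 and standard deviation 1. The data values are hypothetical, invented purely for illustration, and the choice of the sample standard deviation (`statistics.stdev`) is an assumption; `statistics.pstdev` would be the population version.

```python
import statistics


def z_scores(data: list[float]) -> list[float]:
    """Convert each observation to its distance from the mean in SD units."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD; use statistics.pstdev for a full population
    return [(x - mean) / sd for x in data]


# Hypothetical observations, invented purely for illustration.
data = [52, 61, 47, 70, 58, 64, 55]
z = z_scores(data)

print([round(v, 2) for v in z])
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```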
Standard Normal Distribution
• z between -1 and +1 = about 68%
• z between -2 and +2 = about 95%
• z between -3 and +3 = about 99.7%
Z-Score
For each fixed number z, the probability within z
standard deviations of the mean is the area
under the normal curve between
-z and z.
Z-Score
For z = 1:
68% of the area (probability) of a normal
distribution falls between
z = -1 and z = 1.
Z-Score
For z = 2:
95% of the area (probability) of a normal
distribution falls between
z = -2 and z = 2.
Z-Score
For z = 3:
99.7% of the area (probability) of a normal
distribution falls between
z = -3 and z = 3.
Area between 0 and z (rows: z to one decimal place; columns: the second decimal place)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
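Each entry in this table is the area between 0 and z under the standard normal curve, which equals ½·erf(z/√2). The Python sketch below regenerates the table in the same layout; `area_0_to_z` is a hypothetical helper name, and the values come from the closed-form error function rather than from any particular printed source.

```python
import math


def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


# Rows 0.0 to 3.0; the ten columns add the second decimal place (0.00 to 0.09).
print("z    " + " ".join(f"{c / 100:.2f}  " for c in range(10)))
for r in range(31):
    row = r / 10
    cells = " ".join(f"{area_0_to_z(row + c / 100):.4f}" for c in range(10))
    print(f"{row:.1f}  {cells}")
```

For example, `area_0_to_z(1.42)` reproduces the entry in row 1.4, column 0.02.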
Find the area under the curve to the left of 1.42
The "area under the curve" is the shaded portion, and the question asks for the area to the "left" of 1.42, so everything to the left of 1.42 should be shaded.
P(Z < 1.42)
Looking up z = 1.42 in the table above (row 1.4, column 0.02) gives 0.4222, the area between 0 and 1.42. Since we want all of the area to the left, we add the 0.5 that lies to the left of 0:
P(Z < 1.42) = 0.5 + 0.4222 = 0.9222
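Find the area under the curve, P(Z > -0.42)
Here -0.42 is a single value: it is the z-score looked up in the table, and it is represented by the vertical line. Z is a variable, meaning it can take on many values, and it corresponds to the shaded area. So another way of reading this is "the shaded area is where Z is greater than -0.42", i.e. everything to the right of the line.
By symmetry, the area between -0.42 and 0 equals the area between 0 and 0.42, which the table gives as 0.1628. Adding the 0.5 to the right of 0:
P(Z > -0.42) = 0.1628 + 0.5 = 0.6628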
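Find the percent of the data between -1.75 and 2.05
Here there are two z-values, but each of them is a single number, so they are represented by vertical lines on the graph. The "data" is the shaded portion of the graph, so the shaded portion is between z = -1.75 and z = 2.05.
Area between -1.75 and 2.05: by symmetry, the area between -1.75 and 0 equals the table value for 1.75, which is 0.4599, and the area between 0 and 2.05 is 0.4798. So the area is 0.4599 + 0.4798 = 0.9397, i.e. about 93.97% of the data.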
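The three worked examples above can be cross-checked with a cumulative normal function. In this Python sketch, `phi` is a hypothetical helper for P(Z < z), built from the standard error-function formula; the right-tail case uses the complement rule P(Z > a) = 1 - P(Z < a) instead of the table's symmetry trick, and the two are equivalent.

```python
import math


def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


left_of_1_42 = phi(1.42)              # P(Z < 1.42)
right_of_neg_0_42 = 1.0 - phi(-0.42)  # P(Z > -0.42), complement rule
between = phi(2.05) - phi(-1.75)      # P(-1.75 < Z < 2.05)

print(f"{left_of_1_42:.4f}")       # 0.9222
print(f"{right_of_neg_0_42:.4f}")  # 0.6628
print(f"{between:.4f}")            # 0.9398 (table lookups give 0.9397 after rounding each entry)
```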
Z-Scores and the Standard Normal Distribution
When a random variable has a normal distribution
and its values are converted to z-scores by
subtracting the mean and dividing by the standard
deviation, the z-scores have the standard normal
distribution.
Z-Score for a Value of a Random Variable
The z-score for a value of a random variable is the number
of standard deviations that x falls from the mean µ.
It is calculated as:
$z = \frac{x - \mu}{\sigma}$
Steps for calculating probability using the Z-score
1. Sketch a bell-shaped curve, and indicate the mean and the value(s) of x of interest.
2. Shade the area (which represents the probability) you are interested in obtaining.
3. Use the Z-score formula to calculate the Z-value(s) for the value(s) of x of interest.
4. Look up the Z-values in the table to find the corresponding area(s). The table gives only the area between 0 and a positive z, so use the symmetry of the curve for negative Z-values.
5. Calculate the area.
6. Interpret the result.
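Steps 3 to 5 can be written as a few small functions. This is a minimal Python sketch, not a prescribed method: `area_0_to_z` imitates the "area between 0 and z" table with the error function, and symmetry handles negative z exactly as step 4 describes. All function names are hypothetical.

```python
import math


def area_0_to_z(z: float) -> float:
    """What the printed table gives: the area between 0 and z, for z >= 0."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


def z_score(x: float, mu: float, sigma: float) -> float:
    """Step 3: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def area_left_of(z: float) -> float:
    """Steps 4-5: area to the left of z, using symmetry when z is negative."""
    if z >= 0:
        return 0.5 + area_0_to_z(z)
    return 0.5 - area_0_to_z(-z)  # the left half mirrors the right half


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X normally distributed with mean mu and SD sigma."""
    return area_left_of(z_score(b, mu, sigma)) - area_left_of(z_score(a, mu, sigma))


# Usage: scores with mean 500 and SD 100, as in the SAT example.
print(f"{prob_between(400, 600, 500, 100):.4f}")  # 0.6827
```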
Example
Scores on the verbal or math portion of the
SAT are approximately normally distributed
with mean µ = 500 and standard deviation
σ = 100. The scores range from 200 to 800.
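Theoretical Normal Distribution
Figure: normal curve of SAT scores, with the z axis marked from -5 to 5 and the score axis marked from 200 to 800; about 68%, 95% and 99.7% of scores lie within 1, 2 and 3 standard deviations of the mean.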
Example
• If your verbal SAT (Scholastic Aptitude Test, a standardized test in the USA) score was
x = 700, how many standard deviations from
the mean was it? Find the z-score for x = 700.
Example
• If your verbal SAT score was x = 700, how many standard
deviations from the mean was it?
$z = \frac{x - \mu}{\sigma} = \frac{700 - 500}{100} = +2$
• A score of 700 is 2 standard deviations above the mean.
Now find out
• What percentage of SAT scores were
higher than yours?
z = 2
Area between 0 and 2 (from the table) = 0.4772
Area above z = 2: $0.5 - 0.4772 = 0.0228$, i.e. 2.28%
• 2.28% of SAT scores were higher than yours.
z = 2
Area between 0 and 2 (from the table) = 0.4772
Area below z = 2: $0.5 + 0.4772 = 0.9772$, i.e. 97.72%
• 97.72% of SAT scores were lower than yours.
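The SAT example in code: convert the score to z, then split the total area of 1 into the part below and the part above. This Python sketch uses only the mean (500), SD (100) and score (700) from the example; `phi` is a hypothetical helper for the cumulative normal probability.

```python
import math


def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 500, 100  # SAT mean and SD from the example
x = 700               # your verbal score

z = (x - mu) / sigma
below = phi(z)        # share of scores lower than yours
above = 1.0 - below   # share of scores higher than yours

print(z)                      # 2.0
print(f"{above:.2%} higher")  # 2.28% higher
print(f"{below:.2%} lower")   # 97.72% lower
```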
Comparing variables with very different observed units of measure
• Example of comparing an SAT score to an American
College Testing (ACT) score
Mary’s ACT score is 26.
Jason’s SAT score is 900. Who did better?
• The mean SAT score is 1000 with a standard
deviation of 100 SAT points.
• The mean ACT score is 22 with a standard deviation
of 2 ACT points.
Let’s find the z-scores
Jason: $z = \frac{900 - 1000}{100} = -1$
Mary: $z = \frac{26 - 22}{2} = +2$
• From these findings, we gather that Jason’s score is
1 standard deviation below the mean SAT score and
Mary’s score is 2 standard deviations above the
mean ACT score.
• Therefore, relative to the other test takers, Mary’s score is better.
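Because z-scores are unit-free, scores from different tests can be ranked on one scale. This Python sketch restates the Mary and Jason comparison using the means and SDs given above; the `z_score` helper and the dictionary labels are illustrative only.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance of x from its own test's mean, in that test's SD units."""
    return (x - mean) / sd


scores = {
    "Jason (SAT)": z_score(900, 1000, 100),
    "Mary (ACT)": z_score(26, 22, 2),
}
for name, z in scores.items():
    print(f"{name}: z = {z:+.1f}")

best = max(scores, key=scores.get)  # the higher z-score is relatively better
print("Relatively better:", best)
# Jason (SAT): z = -1.0
# Mary (ACT): z = +2.0
# Relatively better: Mary (ACT)
```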
Central Limit Theorem
• Even when the population is not normally
distributed, the sampling distribution of the sample mean will approximate
a normal distribution.
• The approximation becomes better as the sample
size gets larger.
Sampling Distribution of Sample Means (N=2)
This distribution is only roughly normal.
Sampling Distribution of Sample Means (N=3)
This distribution is closer to normal than the one above.
Sampling Distribution of Sample Means (N=4)
This distribution is even closer to normal.
Sampling Distribution of Sample Means (N=5)
This distribution is very close to normal.
Sampling Distribution of Sample Means (N=6)
This distribution is even closer to normal.
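The same effect can be simulated. The population behind the original figures is not recoverable here, so this Python sketch assumes a strongly right-skewed Exponential(1) population (its skewness is 2, and the skewness of the mean of n draws is 2/√n). It draws many samples of size n, records their means, and reports how symmetric the distribution of those means is. The helper names, seed and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics


def skewness(values: list[float]) -> float:
    """Sample skewness: about 0 for a symmetric (e.g. normal) shape, > 0 if right-skewed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(n: int, reps: int, rng: random.Random) -> list[float]:
    """Means of `reps` samples of size n from a right-skewed Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {}
for n in (1, 2, 5, 30):
    means = sample_means(n, reps=20_000, rng=rng)
    skew_by_n[n] = skewness(means)
    print(f"n={n:>2}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.pstdev(means):.3f}  skew={skew_by_n[n]:.2f}")
```

As n grows, the skewness of the sample means shrinks towards 0 (the normal value) and their SD shrinks roughly like 1/√n, while their mean stays near the population mean of 1.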
Another example
• Suppose hemoglobin (Hb) level in adults is approximately
normally distributed with mean 12.7 and standard
deviation 2.8.
– A) What proportion of adults would you expect to have
an Hb level between 10 and 13?
Figure: normal curve of hemoglobin level (mean 12.7, standard deviation 2.8), with the axis marked at the mean ± 1, 2 and 3 standard deviations: 4.3, 7.1, 9.9, 12.7, 15.5, 18.3, 21.1.
Example
• A) What proportion of adults would you expect to
have an Hb level between 10 and 13?
$z = \frac{x - \mu}{\sigma} = \frac{10 - 12.7}{2.8} = -0.96$
$z = \frac{x - \mu}{\sigma} = \frac{13 - 12.7}{2.8} = 0.11$
• From the table, the area between 0 and 0.96 is 0.3315 (by symmetry, this is also the area between -0.96 and 0), and the area between 0 and 0.11 is 0.0438.
• $0.3315 + 0.0438 = 0.3753$
• We would expect about 37.53% of adults to have a hemoglobin
level between 10 and 13.
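The hemoglobin calculation can be checked without the table. This Python sketch uses the mean 12.7 and SD 2.8 from the example and a hypothetical `phi` helper for the cumulative normal probability. It reports the answer with unrounded z-scores and also after rounding z to two decimal places first, as a table lookup requires, so the small effect of rounding is visible.

```python
import math


def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 12.7, 2.8
lo, hi = 10, 13

z_lo = (lo - mu) / sigma  # about -0.964
z_hi = (hi - mu) / sigma  # about 0.107

exact = phi(z_hi) - phi(z_lo)
rounded = phi(round(z_hi, 2)) - phi(round(z_lo, 2))  # z rounded as for a table lookup

print(f"z = {z_lo:.3f} and {z_hi:.3f}")           # z = -0.964 and 0.107
print(f"unrounded z:           {exact:.4f}")      # 0.3752
print(f"z rounded to 2 places: {rounded:.4f}")    # 0.3753
```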
Any questions?
Thank you!