
Notes On Normal Distribution

The document discusses the normal distribution, emphasizing that continuous random variables have probabilities defined over intervals rather than specific values. It explains the probability density function (PDF) and introduces the standard normal distribution, characterized by a mean of 0 and a standard deviation of 1. Additionally, it covers the Central Limit Theorem, properties of normal distribution, and practical applications, including examples of calculating probabilities and determining guarantees based on normal distribution characteristics.

Uploaded by sani42755024
Copyright © All Rights Reserved

Normal Distribution

Central Limit Theorem
Sampling distribution of the mean: the probability distribution of sample means, with all
samples having the same sample size n. (In general, the sampling distribution of any statistic
is the probability distribution of that statistic.)
General Idea:
Regardless of the population distribution model, as the sample size increases, the sample
mean tends to be normally distributed around the population mean, and its standard deviation
shrinks as n increases.
Certain conditions must be met to use the CLT.
● The samples must be independent
● The sample size must be “big enough”
Independent samples
● “Randomization”: Each sample should be a random sample from the population, or at least
follow the population distribution.
● “10% Rule”: When sampling without replacement, the sample size must not be bigger than
10% of the entire population.
Large enough sample size
● For a sample proportion, n should be large enough that np ≥ 10 and nq ≥ 10, where p is the
population proportion and q = 1 − p.
Given
1. The random variable x has a distribution (which may or may not be normal) with mean µ
and standard deviation σ.
2. Simple random samples all of the same size n are selected from the population. (The
samples are selected so that all possible samples of size n have the same chance of being
selected.)
Wrapping up
1. The distribution of sample means x̄ will, as the sample size increases, approach a
normal distribution.
2. The mean of the sample means will be the population mean µ.
3. The standard deviation of the sample means (the standard error) will be σ/√n.
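The three results above can be checked by simulation. The sketch below is a minimal illustration, assuming an exponential population with mean µ = 1 and standard deviation σ = 1 (a strongly skewed, non-normal model chosen only for this example); the helper name sample_means is our own. It draws 5,000 samples of size n = 40 and compares the mean and standard deviation of the sample means with µ and σ/√n.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Return the means of num_samples simple random samples of size n.

    draw(rng) must return one observation from the population.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(num_samples)]

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1 (skewed, not normal).
mu, sigma, n = 1.0, 1.0, 40
means = sample_means(lambda rng: rng.expovariate(1.0), n, num_samples=5000)

print("mean of sample means:", round(statistics.fmean(means), 3))  # should be close to mu
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # should be close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(sigma / n ** 0.5, 3))
```

Increasing n in this sketch should make the standard deviation of the means shrink in proportion to 1/√n, even though each individual observation comes from a skewed population.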
Practical rules
1. For samples of size n larger than 30, the distribution of the sample means can be
approximated reasonably well by a normal distribution. The approximation gets better as the
sample size n becomes larger.
2. If the original population is itself normally distributed, then the sample means will be
normally distributed for any sample size n (not just the values of n larger than 30).
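Real life examples of (approximately) Normal Distribution
1. Income distribution in an economy (in practice income is usually right-skewed, so the
normal model is only a rough approximation)
2. Shoe size
3. Birth weight
4. Students’ average marks
5. Stock market
6. Tossing a coin (the number of heads in a large number of tosses)
7. IQ
8. Height
9. Rolling a die (the sum or average of many rolls)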
For a discrete random variable, probabilities are associated with particular individual values
of the random variable and the sum of all probabilities is one.
For a continuous random variable X, we do not have a formula which gives the probability of
any particular value of X. The probability that a continuous random variable X assumes a
specific value x is taken to be zero.
For a continuous random variable we deal with probabilities of intervals rather than
probabilities of particular individual values.
The probability distribution of continuous random variable X is characterised by a function
f(x) known as the probability density function. This function is not the same as the
probability function in the discrete case.
Since the probability that X is equal to a specific value is zero, the probability density function
does not represent the probability that X = x. Rather, it provides the means by which the
probability of an interval can be determined.
The function f(x) whose graph is the limiting curve of the relative frequency histogram (as the
number of observations grows and the class width shrinks) is the probability density function
of the continuous random variable X, provided that the vertical scale is chosen in such a way
that the total area under the curve is one.
Definition
Let X be a continuous random variable. If a function f(x) satisfies
1. $f(x) \ge 0$ for $-\infty < x < \infty$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$ for any a and b,
then f(x) is the probability density function of X.
The third condition indicates how the function is used: the probability that X takes some
value in the interval [a, b] equals the area under the curve between x = a and x = b. This is
the shaded area as shown in the graph below.
In general, the exact evaluation of areas requires the probability density function f(x) and
integral calculus. These calculations are time-consuming and not straightforward (for the
normal density the integral has no closed-form expression). However, probability values can
be obtained from statistical tables, just as for discrete probability distributions.
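To see why tables are convenient, here is a minimal sketch that evaluates such areas numerically with the composite Simpson's rule. The density used is the standard normal p.d.f. (its formula is given in the next section); the helper names normal_pdf and simpson are our own, and the limits ±10 are an assumed stand-in for ±∞.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, ..., 4
    return total * h / 3

# Condition 2: the total area is (numerically) 1.
print(round(simpson(normal_pdf, -10, 10), 6))   # approximately 1.0
# Condition 3: P(0 <= Z <= 1.06) is the area between 0 and 1.06.
print(round(simpson(normal_pdf, 0, 1.06), 4))   # approximately 0.3554
```

The second value agrees with the z-table entry used in the worked problems, which is exactly what a printed table saves us from computing by hand.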
The Normal Distribution
The Normal Distribution is the most important and most widely used continuous probability
distribution. It is the keystone of the application of statistical inference in analysis of data
because the distributions of several important sample statistics tend towards a normal
distribution as the sample size increases. The graphical appearance of the Normal distribution
is a symmetrical bell-shaped curve that extends without bound in both positive and negative
directions:
The probability density function is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$
where μ and σ are parameters.
These turn out to be the mean and standard deviation, respectively, of the distribution.
We write $X \sim N(\mu, \sigma^2)$.
The curve never actually reaches the horizontal axis, but it gets very close to it beyond about
3 standard deviations on each side of the mean.
For any normally distributed variable:
68.3% of all values will lie between μ−σ and μ+σ (i.e. μ ±σ)

95.45% of all values will lie within μ ±2σ

99.73% of all values will lie within μ ±3σ
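These percentages do not depend on μ or σ: for any normal X, P(μ − kσ < X < μ + kσ) = P(−k < Z < k) = erf(k/√2), where erf is the error function. A short check in Python (the function name prob_within_k_sd is our own):

```python
import math

def prob_within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * prob_within_k_sd(k):.2f}%")
# within 1 sd: 68.27%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```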


The graphs below illustrate the effect of changing the values of μ and σ on the shape of the
probability density function. Low variability (σ = 0.71) with respect to the mean gives a
pointed bell-shaped curve with little spread. Variability of σ = 1.41 produces a flatter bell
shaped curve with a greater spread.
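The height of the peak, f(μ) = 1/(σ√(2π)), shows why a smaller σ gives a more pointed curve: halving the spread roughly doubles the peak. A quick check with Python's statistics.NormalDist (available from Python 3.8), using the two σ values quoted above and an assumed μ = 0:

```python
from statistics import NormalDist

for sigma in (0.71, 1.41):
    curve = NormalDist(mu=0, sigma=sigma)
    # pdf at the mean gives the height of the peak, 1 / (sigma * sqrt(2 * pi))
    print(f"sigma = {sigma}: peak height f(mu) = {curve.pdf(0):.3f}")
```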
The standard normal distribution
The random variable Z following a normal distribution with mean 0 and standard deviation 1
is said to follow the standard normal distribution, written Z ~ N(0, 1). Setting μ = 0 and
σ = 1 in the general p.d.f. gives the p.d.f. of Z:
$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty$
The z-score represents the number of standard deviations a data value is from the mean:
$Z = \frac{X - \mu}{\sigma}$
The z-score is important, because if the variable X is normally distributed, Z is as well. This
brings us to an important fact:
If X is normally distributed with mean μ and standard deviation σ, then
$Z = \frac{X - \mu}{\sigma}$
is normally distributed with a mean of 0 and a standard deviation of 1. We say that Z has
the standard normal distribution.
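A minimal sketch of standardisation, assuming simulated data: it draws values from N(70, 0.75²) (the container example used later in these notes), converts them to z-scores, and checks that the z-scores have mean close to 0 and standard deviation close to 1. The function name z_score is our own.

```python
import random
import statistics

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma

mu, sigma = 70.0, 0.75
rng = random.Random(1)  # fixed seed for reproducibility
xs = [rng.gauss(mu, sigma) for _ in range(10_000)]
zs = [z_score(x, mu, sigma) for x in xs]

print("mean of z:", round(statistics.fmean(zs), 2))          # close to 0
print("sd of z:  ", round(statistics.stdev(zs), 2))          # close to 1
print("z for x = 70.9:", round(z_score(70.9, mu, sigma), 2))  # 1.2
```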
Properties of normal distribution
(i) The shape is symmetric.
(ii) The distribution has a mound in the middle, with tails going down to the left and right.
(iii) The mean (μ) is directly in the middle of the distribution.
(iv) The mean, median and the mode are the same value because of the symmetry.
(v) The standard deviation (σ) is the horizontal distance from the centre to either inflection
point (the place where the curve changes from an “upside-down-bowl” shape to a
“right-side-up-bowl” shape).
(vi) About 68 percent of the values lie within one standard deviation of the mean, about 95
percent lie within two standard deviations, and most of the values (99.7 percent or more) lie
within three standard deviations by the empirical rule.
(vii) Normal distributions with different means and standard deviations look a little different
from one another, yet they all have the same bell shape.
(viii) The total area under the normal curve is equal to 1.
Normal Curve
A continuous random variable is normally distributed or has a normal probability
distribution if its relative frequency histogram has the shape of a normal curve.
Calculation of Normal Probabilities
Only one set of tabled values for the Normal distribution is available: it is for the
Standard Normal variable, which has mean = 0 and standard deviation = 1.
The calculation of Normal probabilities associated with a specified range of values of random
variable X involves two steps:
Step - 1
Apply the transformation Z = (X − μ)/σ to change the basis of the probability calculation
from X to Z, the standard Normal variable i.e. this expresses the probability calculation we
need to carry out in terms of an equivalent probability calculation involving the Standard
Normal variable (for which we have tabled values).
Step - 2
Use tables of probabilities for Z, together with the symmetry properties of the Normal
Distribution to determine the required probability.
Tables of the Standard Normal distribution are widely available. The version we will use is the
table of Murdoch and Barnes, in which the tabulated value is the probability P(Z > u) where u
lies between 0 and 4, i.e. the tables give the probability of Z exceeding a specific value u,
sometimes referred to as the right-hand tail probability.
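Where software is available, the two steps can also be carried out directly. The sketch below uses Python's statistics.NormalDist (Python 3.8+) in place of printed tables; the helper names right_tail and normal_prob are our own. right_tail(u) gives the kind of value a right-hand-tail table lists, and normal_prob follows Step 1 (standardise) and Step 2 (use Z probabilities).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def right_tail(u):
    """P(Z > u), the right-hand tail probability."""
    return 1 - Z.cdf(u)

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2)."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma  # Step 1: transform X to Z
    return Z.cdf(zb) - Z.cdf(za)                 # Step 2: use probabilities for Z

# A few rows of a right-tail table
for u in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"u = {u:.1f}   P(Z > u) = {right_tail(u):.4f}")
```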
Problem
1. Find the area under the standard normal curve for the following, using the z-table. Sketch
each one.
(a) Between z = 0 and z = 0.78
(b) Between z = −0.56 and z = 0
(c) Between z = −0.43 and z = 0.78
(d) Between z = 0.44 and z = 1.50
(e) To the right of z = −1.33.
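Table look-ups for these areas can be checked with a few lines of Python, a sketch assuming Python 3.8+ for statistics.NormalDist. Each area is written as a difference of cumulative probabilities Φ(z) = P(Z < z); the dictionary keys are our own labels for the parts.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # phi(z) = P(Z < z) for the standard normal

areas = {
    "(a)": phi(0.78) - phi(0),      # between z = 0 and z = 0.78
    "(b)": phi(0) - phi(-0.56),     # between z = -0.56 and z = 0
    "(c)": phi(0.78) - phi(-0.43),  # between z = -0.43 and z = 0.78
    "(d)": phi(1.50) - phi(0.44),   # between z = 0.44 and z = 1.50
    "(e)": 1 - phi(-1.33),          # to the right of z = -1.33
}
for part, area in areas.items():
    print(part, f"{area:.4f}")
```

A sketch of each area on the standard normal curve is still worth drawing, since it shows which of the differences above is needed.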
2. Find the following probabilities:

(a) P(Z > 1.06)

(b) P(Z < -2.15)

(c) P(1.06 < Z < 4.00)

(d) P(-1.06 < Z < 4.00)

Solution: From a z-table giving the area between z = 0 and a positive value of z:


(a) This is the same as asking "What is the area to the right of 1.06 under the standard normal
curve?"
We need to take the whole of the right hand side (area 0.5) and subtract the area
from z=0 to z=1.06, which we get from the z-table.
P(Z>1.06) =0.5−P(0<Z<1.06) = 0.5−0.3554 = 0.1446
(b) This is the same as asking "What is the area to the left of −2.15 under the standard normal
curve?"
This time, we take the area of the whole left side (0.5) and subtract the area from
z = −2.15 to z = 0. By symmetry, that area equals the area from z = 0 to z = 2.15, which is
what the z-table gives.
P(Z<−2.15)= 0.5−P(0<Z<2.15) = 0.5−0.4842 = 0.0158
(c) This is the same as asking "What is the area between z=1.06 and z=4.00 under the
standard normal curve?"
P(1.06<Z<4.00)
= P(0<Z<4.00) − P(0<Z<1.06)
≈ 0.5 − 0.3554   (P(0<Z<4.00) = 0.49997, which is 0.5000 to four decimal places)
= 0.1446
(d) This is the same as asking "What is the area between z=−1.06 and z=4.00 under the
standard normal curve?"
We find the area on the left side from z=−1.06 to z=0 (which is the same as the area
from z=0 to z=1.06), then add the area between z=0 to z=4.00 (on the right side):
P(−1.06<Z<4.00)
= P(0<Z<1.06) + P(0<Z<4.00)
≈ 0.3554 + 0.5
=0.8554
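The four answers can be checked the same way. This sketch (Python 3.8+, statistics.NormalDist) computes each probability directly from Φ(z) = P(Z < z); part (c) comes out slightly below 0.1446 (about 0.14454) because the table solution rounds P(0 < Z < 4.00) to 0.5.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # phi(z) = P(Z < z) for the standard normal

answers = {
    "(a)": 1 - phi(1.06),            # P(Z > 1.06)
    "(b)": phi(-2.15),               # P(Z < -2.15)
    "(c)": phi(4.00) - phi(1.06),    # P(1.06 < Z < 4.00)
    "(d)": phi(4.00) - phi(-1.06),   # P(-1.06 < Z < 4.00)
}
for part, p in answers.items():
    print(part, f"{p:.4f}")
```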
3. The average life of a certain type of motor is 10 years, with a standard deviation of 2 years.
If the manufacturer is willing to replace only 3% of the motors because of failures, how long
a guarantee should she offer? Assume that the lives of the motors follow a normal
distribution.
Solution:
X = life of motor
x = guarantee period
We need to find the value (in years) that will give us the bottom 3% of the distribution. These
are the motors that we are willing to replace under the guarantee.
P(X<x)=0.03
The area that we can find from the z-table is 0.5−0.03=0.47
The corresponding z-score is z=−1.88.
Since $Z = \frac{X - \mu}{\sigma}$, we can write $\frac{x - 10}{2} = -1.88$.
Solving this gives x = 10 − 1.88 × 2 = 6.24.
So the guarantee period should be 6.24 years.
Here's a graph of our situation. Our normal curve has μ = 10, σ = 2.
The yellow portion represents the 47% of all motors that we found in the z-table (that is,
between 0 and −1.88 standard deviations).
The light green portion on the far left is the 3% of motors that we expect to fail within the
first 6.24 years.
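Software can find the cut-off directly with the inverse of the cumulative distribution (the quantile function). A sketch using statistics.NormalDist.inv_cdf (Python 3.8+); the variable names are our own:

```python
from statistics import NormalDist

life = NormalDist(mu=10, sigma=2)          # motor life in years
guarantee = life.inv_cdf(0.03)             # x such that P(X < x) = 0.03
z = (guarantee - life.mean) / life.stdev   # the matching z-score
print(f"z = {z:.2f}, guarantee = {guarantee:.2f} years")
```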
4. The volume of water in commercially supplied fresh drinking water containers is
approximately normally distributed with mean 70 litres and standard deviation 0.75 litres.
Estimate the proportion of containers likely to contain (i) in excess of 70.9 litres, (ii) at most
68.2 litres, (iii) less than 70.5 litres.
Solution
Let X denote the volume of water in a container, in litres. Then $X \sim N(70, 0.75^2)$, i.e. μ = 70,
σ = 0.75 and Z = (X − 70)/0.75
(i) X = 70.9; Z = (70.9 − 70)/0.75 = 1.20. P(X > 70.9) = P(Z > 1.20) = 0.1151 or 11.51%
(ii) X = 68.2; Z = −2.40. P(X < 68.2) = P(Z < −2.40) = 0.0082 or 0.82%
(iii) X = 70.5; Z = (70.5 − 70)/0.75 ≈ 0.67
P(X > 70.5) = 0.2514; P(X < 70.5) = 0.7486 or 74.86%
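A sketch (Python 3.8+, statistics.NormalDist) that reproduces the three proportions. Part (iii) gives about 0.7475 rather than 0.7486 because the table solution rounds z = 0.667 to 0.67.

```python
from statistics import NormalDist

volume = NormalDist(mu=70, sigma=0.75)   # litres

p_over_70_9 = 1 - volume.cdf(70.9)       # (i)   P(X > 70.9)
p_at_most_68_2 = volume.cdf(68.2)        # (ii)  P(X <= 68.2), same as P(X < 68.2) for continuous X
p_under_70_5 = volume.cdf(70.5)          # (iii) P(X < 70.5)

for label, p in [("(i)", p_over_70_9), ("(ii)", p_at_most_68_2), ("(iii)", p_under_70_5)]:
    print(label, f"{p:.4f}")
```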
Exercise
1. The weights of bags of potatoes delivered to a supermarket are approximately normally
distributed with mean 5 kg and standard deviation 0.2 kg. The potatoes are delivered in
batches of 500 bags.
(i) Calculate the probability that a randomly selected bag will weigh more than 5.5 kg.
(ii) Calculate the probability that a randomly selected bag will weigh between 4.6 and 5.5 kg.
(iii) What is the expected number of bags in a batch weighing less than 5.5 kg?

2. A machine in a factory produces components whose lengths are approximately normally
distributed with mean 102mm and standard deviation 1mm.
(i) Find the probability that if a component is selected at random and measured, its length will
be:
(a) less than 100mm; (b) greater than 104mm.
(ii) If an output component is only accepted when its length lies in the range 100mm to
104mm, find the expected proportion of components that are accepted.
Problems and Solutions of Normal Distribution
1. A personal computer is used for office work at home, research, communication, personal
finances, education, entertainment, social networking, and a myriad of other things. Suppose
that the average number of hours a household personal computer is used for entertainment is
two hours per day. Assume the times for entertainment are normally distributed and the
standard deviation for the times is half an hour.
(i) Find the probability that a household personal computer is used for entertainment between
1.8 and 2.75 hours per day.
(ii) Find the maximum number of hours per day that the bottom quartile of households uses a
personal computer for entertainment.
Solution:
(i) Let X = the amount of time (in hours) a household personal computer is used for
entertainment. Then $X \sim N(2, 0.5^2)$, where μ = 2 and σ = 0.5. Find P(1.8 < X < 2.75).
The probability for which you are looking is the area between x = 1.8 and x = 2.75.
So, we have to find P(1.8 < X < 2.75).
Z1 = (1.8-2)/0.5 = -0.4
Z2 = (2.75-2)/0.5 = 1.5
Required probability = (Area between z=0 and z=-0.4) + (Area between z=0 and z=1.5)
= 0.1554 + 0.4332
= 0.5886
The probability that a household personal computer is used between 1.8 and 2.75 hours per
day for entertainment is 0.5886.
(ii) To find the maximum number of hours per day that the bottom quartile of households
uses a personal computer for entertainment, find the 25th percentile, k, where P(X < k) = 0.25.
From the z-table, the area between z = 0 and z = 0.675 is about 0.25, so the 25th percentile of
Z is z ≈ −0.675. Then k = μ + zσ = 2 + (−0.675)(0.5) ≈ 1.66.
The maximum number of hours per day that the bottom quartile of households uses a
personal computer for entertainment is 1.66 hours.
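Both parts can be checked with statistics.NormalDist (Python 3.8+): part (i) is a difference of two cumulative probabilities and part (ii) uses the quantile function inv_cdf. A minimal sketch:

```python
from statistics import NormalDist

hours = NormalDist(mu=2, sigma=0.5)  # daily entertainment use, in hours

p_between = hours.cdf(2.75) - hours.cdf(1.8)  # (i)  P(1.8 < X < 2.75)
k = hours.inv_cdf(0.25)                       # (ii) 25th percentile
print(f"(i) {p_between:.4f}   (ii) {k:.2f} hours")
```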
2. The time taken to assemble a car in a certain plant is a random variable having a normal
distribution with a mean of 20 hours and a standard deviation of 2 hours. What is the
probability that a car can be assembled at this plant in a period of time
a) less than 19.5 hours?
b) between 20 and 22 hours?
Solution:
a) P(X < 19.5) = P(Z < −0.25) = 0.5 − P(−0.25 < Z < 0)
= 0.5 − 0.0987 = 0.4013
b) P(20 < x < 22) = P(0 < z < 1)
= 0.3413
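A quick check with statistics.NormalDist (Python 3.8+). Note that part (a) asks for a left-tail area, P(X < 19.5), not the area between z = −0.25 and z = 0.

```python
from statistics import NormalDist

assembly = NormalDist(mu=20, sigma=2)  # assembly time in hours

p_a = assembly.cdf(19.5)                     # a) P(X < 19.5)
p_b = assembly.cdf(22) - assembly.cdf(20)    # b) P(20 < X < 22)
print(f"a) {p_a:.4f}   b) {p_b:.4f}")
```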
3. A large group of students took a test in Business Statistics and the final grades have a mean
of 70 and a standard deviation of 10. If we can approximate the distribution of these grades
by a normal distribution, what percent of the students
a) scored higher than 80?
b) should pass the test (grades≥60)?
c) should fail the test (grades<60)?
Solution:
a) For x = 80, z = (80 − 70)/10 = 1.
Area to the right of z = 1 is 0.5 − 0.3413 = 0.1587, so 15.87% scored more than 80.
b) For x = 60, z = (60 − 70)/10 = −1.
Area to the right of z = −1 is 0.3413 + 0.5 = 0.8413, so 84.13% should pass the test.
c) 100% − 84.13% = 15.87% should fail the test.
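The three percentages can be checked directly from the cumulative distribution function; the area to the right of a value is 1 minus the area to its left. A sketch using statistics.NormalDist (Python 3.8+):

```python
from statistics import NormalDist

grades = NormalDist(mu=70, sigma=10)

above_80 = 1 - grades.cdf(80)   # a) grades > 80
passing = 1 - grades.cdf(60)    # b) grades >= 60
failing = grades.cdf(60)        # c) grades < 60
print(f"a) {above_80:.2%}   b) {passing:.2%}   c) {failing:.2%}")
```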
4. The annual salaries of employees in a large company are approximately normally
distributed with a mean of Rs. 50,000 and a standard deviation of Rs. 20,000.
a) What percent of people earn less than Rs. 40,000?
b) What percent of people earn between Rs. 45,000 and Rs. 65,000?
c) What percent of people earn more than Rs. 70,000?
Solution:
a) For x = 40000, z = (40000 − 50000)/20000 = −0.5.
Area to the left of z = −0.5 is 0.5 − 0.1915 = 0.3085, so 30.85% earn less than Rs. 40,000.
b) For x = 45000, z = −0.25 and for x = 65000, z = 0.75.
Area between z = −0.25 and z = 0.75 is 0.0987 + 0.2734 = 0.3721, so 37.21% earn between
Rs. 45,000 and Rs. 65,000.
c) For x = 70000, z = 1.
Area to the right of z = 1 is 0.5 − 0.3413 = 0.1587, so 15.87% earn more than Rs. 70,000.
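A sketch with statistics.NormalDist (Python 3.8+) that computes the three percentages without rounding the z-values:

```python
from statistics import NormalDist

salary = NormalDist(mu=50_000, sigma=20_000)  # annual salary in Rs.

below_40k = salary.cdf(40_000)                       # a) P(X < 40,000)
between = salary.cdf(65_000) - salary.cdf(45_000)    # b) P(45,000 < X < 65,000)
above_70k = 1 - salary.cdf(70_000)                   # c) P(X > 70,000)
print(f"a) {below_40k:.2%}   b) {between:.2%}   c) {above_70k:.2%}")
```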
5. Entry to the Kalyani University is determined by a national test. The scores on this test are
normally distributed with a mean of 500 and a standard deviation of 100. Anuradha wants to
be admitted to this university and she knows that she must score better than at least 70% of
the students who took the test. Anuradha takes the test and scores 585. Will she be admitted
to this university?
Solution:
Let X be the random variable that represents the scores.
X is normally distributed with a mean of 500 and a standard deviation of 100.
The total area under the normal curve represents the total number of students who took the
test.
If we multiply the values of the areas under the curve by 100, we obtain percentages.
For x = 585, z = (585 - 500) / 100 = 0.85.
The proportion of students who scored below 585 is given by
P = [area to the left of z = 0.85] = 0.8023 = 80.23%
Anuradha scored better than 80.23% of the students who took the test and she will be
admitted to this University.
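The same conclusion can be reached from the other direction, by finding the score that beats exactly 70% of candidates (the 70th percentile) and comparing it with 585. A sketch with statistics.NormalDist (Python 3.8+); the variable names are our own:

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)

cutoff = scores.inv_cdf(0.70)   # score that beats 70% of test takers
share_below = scores.cdf(585)   # proportion scoring below 585
print(f"cutoff = {cutoff:.1f}, share below 585 = {share_below:.2%}, admitted = {585 > cutoff}")
```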
6. A manufacturer does not know the mean and standard deviation of the diameters of ball
bearings he is producing. However, a sieving system rejects all bearings larger than 2.4 cm
and those under 1.8 cm in diameter. Out of 1000 ball bearings 8% are rejected as too small
and 5.5% as too big. What is the mean and standard deviation of the ball bearings produced?
Solution:
Assume the diameters are normally distributed. Since 8% are rejected as too small,
P(X < 1.8) = 0.08. The area between z = 0 and z = 1.4 is about 0.92 − 0.5 = 0.42, so the
z-value with 8% of the area to its left is z ≈ −1.4.
So, 1.8 cm is 1.4 standard deviations below the mean: −1.4 = (1.8 – µ)/σ, i.e.
µ − 1.4σ = 1.8 ......(i)
Again, 5.5% are rejected as too big, so P(X < 2.4) = 1 − 0.055 = 0.945, giving z ≈ 1.6.
So, 2.4 cm is 1.6 standard deviations above the mean: 1.6 = (2.4 – µ)/σ, i.e.
µ + 1.6σ = 2.4 ......(ii)
Subtracting (i) from (ii) gives 3σ = 0.6, so
σ = 0.2 and µ = 1.8 + 1.4 × 0.2 = 2.08
So, the diameters are distributed $N(2.08, 0.2^2)$.
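The same two equations can be solved with exact z-values instead of the rounded 1.4 and 1.6. A sketch using statistics.NormalDist.inv_cdf (Python 3.8+); the result is close to the hand calculation:

```python
from statistics import NormalDist

Z = NormalDist()
z_small = Z.inv_cdf(0.08)      # 8% of bearings lie below 1.8 cm (a negative z)
z_big = Z.inv_cdf(1 - 0.055)   # 5.5% of bearings lie above 2.4 cm

# Solve mu + z_small * sigma = 1.8 and mu + z_big * sigma = 2.4 for mu and sigma.
sigma = (2.4 - 1.8) / (z_big - z_small)
mu = 1.8 - z_small * sigma
print(f"z_small = {z_small:.3f}, z_big = {z_big:.3f}")
print(f"mu = {mu:.3f} cm, sigma = {sigma:.3f} cm")
```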
