Module I: Probability Distributions
Topic 4: Normal Distribution
Dr. P. Rajendra, Professor, Department of Mathematics,
CMRIT, Bengaluru.
Introduction to Normal Distribution
▶ The Normal Distribution, also known as the Gaussian
Distribution, is a continuous probability distribution that is
symmetric about the mean, describing data that clusters
around a central value.
▶ It is characterized by two parameters:
▶ µ (mean): the central value around which the data is
distributed.
▶ σ² (variance): the measure of the spread of the distribution.
▶ The Normal Distribution is widely used in statistics, data
science, and AI for modeling natural phenomena, errors, and
noise.
Example
In AI, the Normal Distribution can model the distribution of errors
in predictions made by regression models, or the distribution of
features in a dataset for a machine learning algorithm.
Probability Density Function (PDF)
▶ The probability density function (PDF) of the Normal
Distribution is given by:
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]
▶ Here:
▶ x is the variable.
▶ µ is the mean of the distribution.
▶ σ² is the variance of the distribution.
▶ exp is the exponential function.
▶ The PDF describes the relative likelihood of a continuous
random variable taking values near a given point;
probabilities are obtained as areas under the curve.
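The formula can be evaluated directly. The following is a minimal Python sketch, assuming the standard normal case (µ = 0, σ = 1) for illustration; the function name normal_pdf is hypothetical and not part of the original slides.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a Normal(mu, sigma^2) distribution at the point x."""
    variance = sigma ** 2
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((x - mu) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)

# Illustrative parameters (assumed): the standard normal, mu = 0 and sigma = 1.
peak = normal_pdf(0.0, mu=0.0, sigma=1.0)    # density at the mean
left = normal_pdf(-1.0, mu=0.0, sigma=1.0)   # one standard deviation below
right = normal_pdf(1.0, mu=0.0, sigma=1.0)   # one standard deviation above

print(peak, left, right)
```

The density is largest at the mean and takes equal values at points equally far on either side of it.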
Example
In a machine learning context, the PDF can describe the
distribution of continuous features in a dataset. For instance, if we
assume that the heights of people in a dataset follow a Normal
Distribution, the PDF tells us how relatively likely heights near a
specific value are, and the area under the PDF over an interval
gives the probability of a height in that range.
Properties of the Normal Distribution
▶ Symmetry: The Normal Distribution is symmetric around the
mean µ.
▶ Mean, Median, and Mode: For a Normal Distribution, the
mean, median, and mode are all equal.
▶ 68-95-99.7 Rule (Empirical Rule):
▶ Approximately 68% of the data lies within one standard
deviation (σ) of the mean.
▶ Approximately 95% of the data lies within two standard
deviations of the mean.
▶ Approximately 99.7% of the data lies within three standard
deviations of the mean.
▶ Unimodal: The distribution has a single peak (mode) at the
mean µ.
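The percentages in the empirical rule can be checked numerically. The sketch below assumes the standard identity that expresses the standard normal CDF through the error function; the helper names are hypothetical.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for any normally distributed X."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The result does not depend on µ or σ, because standardizing X maps the interval µ ± kσ onto −k < Z < k.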
Example
When evaluating model performance, the errors or residuals (the
differences between observed and predicted values) are often
assumed to follow a Normal Distribution. This assumption
underlies many statistical tests and confidence intervals.
Standard Normal Distribution
▶ The Standard Normal Distribution is a special case of the
Normal Distribution with a mean of 0 and a standard
deviation of 1.
▶ Any Normal Distribution can be converted to the Standard
Normal Distribution using the z-score:
\[ z = \frac{x - \mu}{\sigma} \]
▶ The z-score represents the number of standard deviations a
data point is from the mean.
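As a small illustration of why the conversion is useful, the sketch below computes a z-score and confirms that the probability to the left of x under the original distribution equals the probability to the left of z under the Standard Normal Distribution. The values µ = 50, σ = 10 and x = 65 are assumed for illustration only.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical values, chosen only for illustration.
mu, sigma, x = 50.0, 10.0, 65.0
z = z_score(x, mu, sigma)

# Standardizing preserves probabilities: P(X < x) = P(Z < z).
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0.0, 1.0).cdf(z)

print(z, p_original, p_standard)
```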
Applications in AI and Data Science
▶ Regression Analysis: Assumes that the errors or residuals
follow a Normal Distribution, allowing for hypothesis testing
and the construction of confidence intervals.
▶ Data Preprocessing: Features are often standardized
(rescaled to zero mean and unit standard deviation using the
z-score) to ensure better model performance, especially in
algorithms sensitive to feature scaling (e.g., SVMs, Neural
Networks).
▶ Machine Learning Model Assumptions: Some machine
learning algorithms assume normality in data distribution,
impacting how models are trained and evaluated.
▶ Statistical Quality Control: Used in monitoring
manufacturing processes where the measurements of product
quality are normally distributed.
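The preprocessing step can be sketched with the z-score applied to every value of a feature. This is a minimal illustration using only the Python standard library; the data values and the function name standardize are hypothetical, and real pipelines typically rely on a dedicated library instead.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column (heights in cm), for illustration only.
heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]
scaled = standardize(heights_cm)

print([round(v, 3) for v in scaled])
```

Rescaling changes only the location and spread of the feature; it does not change the shape of its distribution.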
Problem 1
200 students appeared in an examination. The distribution of
marks is assumed to be normal with a mean of 30 and a standard
deviation of 6.25. How many students are expected to get marks:
(i) Between 20 and 40
(ii) Less than 35
Given: Z(1.6) = 0.4452 and Z(0.8) = 0.2881, where Z(z) denotes
the area under the standard normal curve between 0 and z.
Solution:
Given:
Mean (µ) = 30
Standard Deviation (σ) = 6.25
Total number of students = 200
i) Probability of marks between 20 and 40:
To find the probability of marks between 20 and 40, we standardize
the values using the Z-score formula:
\[ Z = \frac{X - \mu}{\sigma} \]
For X = 20:
\[ Z_1 = \frac{20 - 30}{6.25} = \frac{-10}{6.25} = -1.6 \]
For X = 40:
\[ Z_2 = \frac{40 - 30}{6.25} = \frac{10}{6.25} = 1.6 \]
Using the Z-table:
P(20 < X < 40) = P(−1.6 < Z < 1.6)
= 2 × Z(1.6) = 2 × 0.4452 = 0.8904
Expected number of students with marks between 20 and 40:
0.8904 × 200 = 178.08 ≈ 178 students
ii) Probability of marks less than 35:
For X = 35:
\[ Z = \frac{35 - 30}{6.25} = \frac{5}{6.25} = 0.8 \]
Using the Z-table:
P(X < 35) = P(Z < 0.8) = 0.5 + Z(0.8) = 0.5 + 0.2881 = 0.7881
Expected number of students with marks less than 35:
0.7881 × 200 = 157.62 ≈ 158 students
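The table-based answers can be cross-checked numerically. The sketch below uses NormalDist from Python's standard statistics module, which works with the cumulative probability P(X < x) directly rather than with the 0-to-z table areas; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

marks = NormalDist(mu=30, sigma=6.25)
n_students = 200

# (i) P(20 < X < 40) as a difference of cumulative probabilities.
p_between = marks.cdf(40) - marks.cdf(20)

# (ii) P(X < 35) is the cumulative probability at 35.
p_below = marks.cdf(35)

print(round(p_between, 4), round(n_students * p_between))
print(round(p_below, 4), round(n_students * p_below))
```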
Problem 2
The weekly wages of workers in a company are normally distributed
with a mean of Rs. 700 and a standard deviation of Rs. 50. Find
the probability that the weekly wage of a randomly chosen worker
is:
(i) Between Rs. 650 and Rs. 750
(ii) More than Rs. 750
Solution:
Given:
Mean (µ) = 700
Standard Deviation (σ) = 50
i) Probability of weekly wage between Rs. 650 and Rs. 750:
To find the probability, we first convert the raw wages to Z-scores
using the formula:
\[ Z = \frac{X - \mu}{\sigma} \]
For X = 650:
\[ Z_1 = \frac{650 - 700}{50} = \frac{-50}{50} = -1 \]
For X = 750:
\[ Z_2 = \frac{750 - 700}{50} = \frac{50}{50} = 1 \]
Using the Z-table:
P(650 < X < 750) = P(−1 < Z < 1)
Since the Z-table gives the area between 0 and a given Z-value and
the normal distribution is symmetric about 0:
P(−1 < Z < 1) = 2 × P(0 < Z < 1)
P(−1 < Z < 1) = 2 × 0.3413 = 0.6826
Thus, the probability that the weekly wage is between Rs. 650 and
Rs. 750 is 0.6826.
ii) Probability of weekly wage more than Rs. 750:
For X = 750:
\[ Z = \frac{750 - 700}{50} = 1 \]
Using the Z-table:
P(X > 750) = P(Z > 1) = 0.5 − P(0 < Z < 1)
P(X > 750) = 0.5 − 0.3413 = 0.1587
Therefore, the probability that the weekly wage is more than Rs.
750 is 0.1587.
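The value 0.3413 used in this solution is the area under the standard normal curve between 0 and 1, not the full area to the left of 1. The sketch below makes that convention explicit with a hypothetical helper, table_area, and rebuilds both answers from it.

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)

def table_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return STANDARD.cdf(z) - 0.5

mu, sigma = 700.0, 50.0
z = (750.0 - mu) / sigma            # z-score of Rs. 750

p_between = 2.0 * table_area(z)     # P(650 < X < 750), using symmetry
p_above = 0.5 - table_area(z)       # P(X > 750), the upper tail

print(round(table_area(z), 4), round(p_between, 4), round(p_above, 4))
```

The small difference between 0.6826 and the computed value comes from rounding the table entry to four decimals before doubling it.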
Problem 3
The lifetime of a certain type of electric bulbs of a particular brand
is normally distributed with an average life of 2000 hours and a
standard deviation of 60 hours. If a firm purchases 2500 bulbs, find
the number of bulbs that are likely to last for:
(i) More than 2100 hours
(ii) Less than 1950 hours
(iii) Between 1900 and 2100 hours
Solution:
Given:
Mean (µ) = 2000 hours
Standard Deviation (σ) = 60 hours
Total number of bulbs = 2500
i) Number of bulbs likely to last more than 2100 hours:
To find the probability, we first convert the raw lifetimes to
Z-scores using the formula:
\[ Z = \frac{X - \mu}{\sigma} \]
For X = 2100:
\[ Z = \frac{2100 - 2000}{60} = \frac{100}{60} \approx 1.67 \]
Using the Z-table:
P(X > 2100) = P(Z > 1.67)
The Z-table gives the area between 0 and 1.67 as 0.4525:
P(Z > 1.67) = 0.5 − 0.4525 = 0.0475
The expected number of bulbs lasting more than 2100 hours is:
2500 × 0.0475 = 118.75 ≈ 119 bulbs
ii) Number of bulbs likely to last less than 1950 hours:
For X = 1950:
\[ Z = \frac{1950 - 2000}{60} = \frac{-50}{60} \approx -0.83 \]
Using the Z-table:
P(X < 1950) = P(Z < −0.83) = P(Z > 0.83)
= 0.5 − P(0 < Z < 0.83) = 0.5 − 0.2967 = 0.2033
The expected number of bulbs lasting less than 1950 hours is:
2500 × 0.2033 = 508.25 ≈ 508 bulbs
iii) Number of bulbs likely to last between 1900 and 2100
hours:
For X = 1900:
\[ Z_1 = \frac{1900 - 2000}{60} = \frac{-100}{60} \approx -1.67 \]
For X = 2100:
\[ Z_2 = \frac{2100 - 2000}{60} = \frac{100}{60} \approx 1.67 \]
Using the Z-table:
P(1900 < X < 2100) = P(−1.67 < Z < 1.67)
= 2 × P(0 < Z < 1.67) = 2 × 0.4525 = 0.9050
The expected number of bulbs lasting between 1900 and 2100
hours is:
2500 × 0.9050 = 2262.5 ≈ 2263 bulbs
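A numerical cross-check of the three parts is sketched below. To mirror the table lookup, the hypothetical helper table_z rounds each z-score to two decimals before the probability is computed; using unrounded z-scores would shift the bulb counts slightly.

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)
mu, sigma, n_bulbs = 2000.0, 60.0, 2500

def table_z(x: float) -> float:
    """z-score rounded to two decimals, as it would be read from a table."""
    return round((x - mu) / sigma, 2)

p_more_2100 = 1.0 - STANDARD.cdf(table_z(2100))
p_less_1950 = STANDARD.cdf(table_z(1950))
p_between = STANDARD.cdf(table_z(2100)) - STANDARD.cdf(table_z(1900))

results = [
    ("more than 2100 h", p_more_2100),
    ("less than 1950 h", p_less_1950),
    ("between 1900 and 2100 h", p_between),
]
for label, p in results:
    print(f"{label}: p = {p:.4f}, expected bulbs = {n_bulbs * p:.1f}")
```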
Problem 4
In a normal distribution, 31% of the items are under 45, and 8% of
the items are over 64. Find the mean and standard deviation of
the distribution.
Solution:
Given:
P(X < 45) = 0.31
P(X > 64) = 0.08
Let the mean be µ and the standard deviation be σ. We need to
find the Z-scores corresponding to these probabilities.
Step 1: Finding Z-scores from the standard normal
distribution table.
Given, P(X < 45) = P(Z < z1) = 0.31. Since 0.31 < 0.5, z1 is negative:
⟹ P(−∞ < Z < 0) − P(z1 < Z < 0) = 0.31
⟹ 0.5 − P(0 < Z < |z1|) = 0.31 (by symmetry)
⟹ P(0 < Z < |z1|) = 0.5 − 0.31 = 0.19
From the Z-table, |z1| ≈ 0.50, so z1 ≈ −0.50.
Given, P(X > 64) = P(Z > z2) = 0.08
⟹ 0.5 − P(0 < Z < z2) = 0.08
⟹ P(0 < Z < z2) = 0.5 − 0.08 = 0.42
From the Z-table, z2 ≈ 1.41.
Step 2: Setting up equations for X = 45 and X = 64.
\[ z_1 = \frac{45 - \mu}{\sigma} = -0.50 \]
\[ z_2 = \frac{64 - \mu}{\sigma} = 1.41 \]
Step 3: Solving for µ and σ.
From z1 :
−0.50σ = 45 − µ ⟹ µ = 45 + 0.50σ
From z2 :
1.41σ = 64 − µ
Substitute µ = 45 + 0.50σ into the second equation:
1.41σ = 64 − (45 + 0.50σ)
Simplify:
1.41σ + 0.50σ = 64 − 45
1.91σ = 19
\[ \sigma = \frac{19}{1.91} \approx 9.95 \]
Now, find µ:
µ = 45 + 0.50 × 9.95 = 45 + 4.975 = 49.975 ≈ 50
Therefore,
Mean (µ) ≈ 50
Standard Deviation (σ) ≈ 9.95
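The same two-equation method can be sketched in code. The function below is a hypothetical helper that replaces the table lookup with the inverse CDF of the standard normal and then solves the two linear equations for µ and σ. Because the z-scores are not rounded to two decimals, the results differ slightly from the hand calculation.

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)

def fit_from_two_quantiles(x1, p1, x2, p2):
    """Return (mu, sigma) such that P(X < x1) = p1 and P(X < x2) = p2."""
    z1 = STANDARD.inv_cdf(p1)        # z-score with area p1 to its left
    z2 = STANDARD.inv_cdf(p2)        # z-score with area p2 to its left
    # Subtracting x1 = mu + z1*sigma from x2 = mu + z2*sigma eliminates mu.
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

# 31% of items are under 45; 8% over 64 means 92% are under 64.
mu, sigma = fit_from_two_quantiles(45, 0.31, 64, 0.92)
print(round(mu, 2), round(sigma, 2))
```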
Problem 5
In an examination, 7% of the students scored less than 35% of the
marks, and 89% of the students scored less than 60% of the
marks. Find the mean and standard deviation if the marks are
normally distributed.
Solution:
Given:
P(X < 35) = 0.07
P(X < 60) = 0.89
Let the mean be µ and the standard deviation be σ. We need to
find the Z-scores corresponding to these probabilities.
Step 1: Finding Z-scores from the standard normal
distribution table.
Given, P(X < 35) = P(Z < z1) = 0.07. Since 0.07 < 0.5, z1 is negative:
⟹ P(−∞ < Z < 0) − P(z1 < Z < 0) = 0.07
⟹ 0.5 − P(0 < Z < |z1|) = 0.07 (by symmetry)
⟹ P(0 < Z < |z1|) = 0.5 − 0.07 = 0.43
From the Z-table, |z1| ≈ 1.48, so z1 ≈ −1.48.
Given, P(X < 60) = P(Z < z2) = 0.89. Since 0.89 > 0.5, z2 is positive:
⟹ P(−∞ < Z < 0) + P(0 < Z < z2) = 0.89
⟹ 0.5 + P(0 < Z < z2) = 0.89
⟹ P(0 < Z < z2) = 0.89 − 0.5 = 0.39
From the Z-table, z2 ≈ 1.23.
Step 2: Setting up equations for X = 35 and X = 60.
\[ z_1 = \frac{35 - \mu}{\sigma} = -1.48 \]
\[ z_2 = \frac{60 - \mu}{\sigma} = 1.23 \]
Step 3: Solving for µ and σ.
From z1 :
−1.48σ = 35 − µ ⟹ µ = 35 + 1.48σ
From z2 :
1.23σ = 60 − µ
Substitute µ = 35 + 1.48σ into the second equation:
1.23σ = 60 − (35 + 1.48σ)
Simplify:
1.23σ + 1.48σ = 60 − 35
2.71σ = 25
\[ \sigma = \frac{25}{2.71} \approx 9.23 \]
Now, find µ:
µ = 35 + 1.48 × 9.23 = 35 + 13.66 ≈ 48.66
Therefore,
Mean (µ) ≈ 48.66
Standard Deviation (σ) ≈ 9.23
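A quick way to validate a solution of this kind is to substitute the fitted parameters back into the distribution and confirm that the stated percentages are recovered. A minimal sketch of that check, using the values obtained above:

```python
from statistics import NormalDist

# Parameters found in the solution above.
marks = NormalDist(mu=48.66, sigma=9.23)

p_below_35 = marks.cdf(35)   # should be close to 0.07
p_below_60 = marks.cdf(60)   # should be close to 0.89

print(round(p_below_35, 3), round(p_below_60, 3))
```

Agreement is only approximate, because the z-scores read from the table were rounded to two decimals.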
Assignment Problems
1. The marks of 1000 students in an examination follow a
normal distribution with mean 70 and standard deviation 5.
Find the number of students whose marks will be:
(i) Less than 65 [Ans: 159]
(ii) More than 75 [Ans: 159]
(iii) Between 65 and 75 [Ans: 683]
[Given Z(1) = 0.3413]
2. In a test on 2000 electric bulbs, it was found that the life
of a particular make was normally distributed with an
average life of 2040 hours and SD of 60 hours. Estimate the
number of bulbs likely to burn for:
(i) More than 2150 hours [Ans: 67]
(ii) Less than 1950 hours [Ans: 137]
(iii) Between 1920 and 2160 hours [Ans: 1909]
3. In a normal distribution, 7% of items are under 35 and
89% of the items are under 63. Find the mean and standard
deviation of the distribution.
Answer: Mean = 50.29, S.D = 10.33