9/28/25, 1:14 PM OneNote notes for DSMP
Session 41 – Normal Distribution | DSMP 2023
1. Recap of Previous Session
Last session: Probability Distributions (Random Variables, PMF, PDF, CDF, Parametric vs Non-parametric).
Today: Normal Distribution – one of the most important distributions in Statistics & Data Science.
2. What is Normal Distribution?
Also called Gaussian Distribution or Bell Curve.
Continuous probability distribution, symmetric about mean.
Shape: Bell-shaped curve.
Many natural & man-made phenomena approximately follow it (height, weight, exam scores, measurement errors, etc.).
3. Properties
1. Symmetry – perfectly symmetric about mean.
2. Mean = Median = Mode (for a perfect normal distribution).
3. Tails (asymptotic) – curve approaches x-axis but never touches.
4. Area under curve = 1 (probability rule).
4. Parameters
Two parameters fully define Normal Distribution:
Mean (µ): central location (center of data).
Standard Deviation (σ): spread or dispersion of data.
https://chatgpt.com/c/68c94f92-c340-832f-9f7f-da7cebb36ef5 1/4
Effect of Parameters:
Changing µ → shifts curve left/right.
Changing σ → curve becomes flatter (larger spread) or taller/narrower (smaller spread).
5. Equation
PDF of Normal Distribution:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
µ = mean, σ = standard deviation, π & e are constants.
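A minimal Python sketch of the PDF above, using only the standard library. The helper names (normal_pdf, area_under_pdf) and the trapezoidal integration settings are illustrative assumptions, not part of the session. It checks two things from the notes: the area under the curve is 1, and a larger σ gives a flatter curve (lower peak).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal(mu, sigma) distribution at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

def area_under_pdf(mu, sigma, width=8.0, steps=10_000):
    """Trapezoidal estimate of the area over mu ± width*sigma (assumed range)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

# The peak sits at x = mu; its height shrinks as sigma grows.
for mu, sigma in [(0.0, 1.0), (0.0, 2.0), (5.0, 0.5)]:
    peak = normal_pdf(mu, mu, sigma)
    area = area_under_pdf(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: peak={peak:.4f}, area={area:.4f}")
```

Changing µ only moves where the peak sits; changing σ changes the peak height, while the area stays at 1 in every case.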
6. Standard Normal Distribution (Z-Distribution)
Special case of Normal Distribution with:
µ = 0, σ = 1.
Denoted as Z ~ N(0,1).
Used to standardize any Normal Distribution using transformation:
$$ Z = \frac{X - \mu}{\sigma} $$
7. Why Standard Normal?
Allows comparison of different distributions.
Provides access to Z-Tables (pre-computed probabilities).
Makes probability calculations easy.
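A small sketch of how the transformation Z = (X − µ)/σ lets values from different distributions be compared. The exam scores, means, and standard deviations below are made-up numbers for illustration only, and z_score is a hypothetical helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical scores from two exams marked on different scales.
z_math = z_score(85, mu=70, sigma=10)     # 1.5
z_physics = z_score(60, mu=50, sigma=5)   # 2.0

print(f"math z = {z_math}, physics z = {z_physics}")
```

Although 85 is the larger raw score, the physics score is further above its own mean in σ units, which is the kind of comparison standardization makes possible.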
8. Empirical Rule (68–95–99.7 Rule)
For any Normal Distribution:
About 68% of the data lies within µ ± 1σ.
About 95% of the data lies within µ ± 2σ.
About 99.7% of the data lies within µ ± 3σ.
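The three percentages can be checked numerically. A minimal sketch, using the standard identity P(|Z| ≤ k) = erf(k/√2) for a standard normal Z; prob_within is an illustrative helper name, not something from the session.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any Normal distribution."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {prob_within(k):.4f}")
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973
```

The result does not depend on µ or σ, which is why the rule holds for any Normal Distribution.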
9. Examples
Example 1: Height Problem
Population: N(µ = 68 in, σ = 3 in).
Probability height > 72 in?
Convert to Z:
$$ Z = \frac{72 - 68}{3} \approx 1.33 $$
From Z-Table: P(Z ≤ 1.33) ≈ 0.908 → Probability > 72 = 1 - 0.908 = 0.092 (~9%).
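The same calculation can be sketched in Python without a Z-table, assuming the normal CDF is computed from the error function as Φ(z) = ½·(1 + erf(z/√2)). The function name normal_cdf is an illustrative choice.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 68.0, 3.0            # population from Example 1
z = (72.0 - mu) / sigma          # unrounded z-score
p_above = 1.0 - normal_cdf(72.0, mu, sigma)

print(f"z = {z:.4f}")
print(f"P(height > 72) = {p_above:.4f}")
```

Because z is kept unrounded here (about 1.3333 instead of 1.33), the result comes out near 0.091 rather than the table-based 0.092; both round to roughly 9%.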
Example 2: Probability between mean & 1σ
From Z-Table: P(0 ≤ Z ≤ 1) ≈ 0.3413.
Means ~34% data lies between mean & +1σ (symmetrically another 34% on the left).
10. CDF (Cumulative Distribution Function)
Represents the probability that a random variable X takes a value ≤ x: F(x) = P(X ≤ x).
Graph is S-shaped (sigmoid curve).
CDF = Area under PDF curve up to point x.
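A short sketch using NormalDist from Python's standard statistics module (Python 3.8+) to tabulate the CDF of the standard normal. The choice of evaluation points is arbitrary. The printed values rise from near 0 to near 1 and pass through 0.5 at the mean, which is the S-shape described above; the last line reproduces the Z-table value from Example 2 as a difference of two CDF values.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# CDF values at a few z-scores: small on the far left, 0.5 at the mean, near 1 on the right.
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z={z:+d}: CDF={std_normal.cdf(z):.4f}")

# Area between the mean and +1 sigma = CDF(1) - CDF(0).
p_mean_to_one_sigma = std_normal.cdf(1) - std_normal.cdf(0)
print(f"P(0 <= Z <= 1) = {p_mean_to_one_sigma:.4f}")
```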
11. Applications in Data Science
Outlier Detection → values beyond µ ± 3σ are often considered outliers.
Machine Learning Models → linear regression inference (confidence intervals, p-values) assumes residuals are normally distributed.
Hypothesis Testing → many tests rely on normality assumption.
Central Limit Theorem → for large enough samples, sample means tend toward a Normal Distribution, even if the population isn't normal (assuming finite variance).
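The Central Limit Theorem point can be illustrated with a small simulation. This is a sketch under assumed settings: the population is an exponential distribution with rate 1 (clearly right-skewed, mean 1, standard deviation 1), and the sample size, number of repeats, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with rate 1 (right-skewed)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n, repeats = 50, 2000
means = [sample_mean(n) for _ in range(repeats)]

center = statistics.fmean(means)     # should sit near the population mean, 1
spread = statistics.stdev(means)     # should sit near 1 / sqrt(n)
within_one = sum(abs(m - center) <= spread for m in means) / repeats

print(f"mean of sample means = {center:.3f}")
print(f"std of sample means  = {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
print(f"share within ±1 std  = {within_one:.3f}")
```

If the sample means are close to normal, the share within one standard deviation should land near the 68% of the Empirical Rule, even though individual draws from the population are far from normal.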
12. Skewness
Skewness measures asymmetry of distribution.
0 → symmetric (e.g., Normal; zero skewness alone does not prove normality).
> 0 → right-skewed (long tail on right).
< 0 → left-skewed (long tail on left).
Practical rule:
-0.5 ≤ skew ≤ +0.5 → approximately symmetric.
Beyond that → skewed distribution.
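A minimal sketch of the skewness check, assuming the moment-based (Fisher-Pearson) coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The session does not say which skewness formula is meant, so this choice is an assumption, and libraries may apply a bias correction that gives slightly different values. The helper names and sample data are illustrative.

```python
import statistics

def sample_skewness(data):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def describe_skew(g1, threshold=0.5):
    """Apply the practical rule: |skew| <= 0.5 counts as approximately symmetric."""
    if abs(g1) <= threshold:
        return "approximately symmetric"
    return "right-skewed" if g1 > 0 else "left-skewed"

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]            # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]      # mirror image: tail on the left

for name, data in [("symmetric", symmetric),
                   ("right_tailed", right_tailed),
                   ("left_tailed", left_tailed)]:
    g1 = sample_skewness(data)
    print(f"{name}: skew={g1:.2f} -> {describe_skew(g1)}")
```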
13. Key Takeaways
Normal Distribution = foundation of statistics & ML.
Defined by µ & σ.
Standardization (Z-scores) makes comparison & probability calculation easy.
Empirical Rule gives quick estimates without tables.
Much real-world data approximately follows a Normal Distribution → powerful tool for modeling & inference.
✅ End of Session 41 Notes