
Session 40 - Normal (Gaussian) Distributions - Notes (DSMP 2023)

Session 41 covers the Normal Distribution, a key concept in statistics and data science characterized by its bell-shaped curve and defined by mean (µ) and standard deviation (σ). It discusses properties, the standard normal distribution (Z-Distribution), the empirical rule, and applications in data science such as outlier detection and hypothesis testing. Key takeaways emphasize the importance of normal distribution in modeling and inference, along with the utility of standardization for comparison and probability calculations.

Uploaded by Sagar Patil

9/28/25, 1:14 PM OneNote notes for DSMP

Session 41 – Normal Distribution | DSMP 2023

1. Recap of Previous Session


Last session: Probability Distributions (Random Variables, PMF, PDF, CDF, Parametric vs Non-parametric).
Today: Normal Distribution – one of the most important distributions in Statistics & Data Science.

2. What is Normal Distribution?


Also called Gaussian Distribution or Bell Curve.
Continuous probability distribution, symmetric about mean.
Shape: Bell-shaped curve.
Many natural & man-made phenomena follow it (height, weight, exam scores, measurement errors, etc.).

3. Properties
1. Symmetry – perfectly symmetric about mean.
2. Mean = Median = Mode (for a perfect normal distribution).
3. Tails (asymptotic) – curve approaches x-axis but never touches.
4. Area under curve = 1 (probability rule).

4. Parameters
Two parameters fully define Normal Distribution:
Mean (µ): central location (center of data).
Standard Deviation (σ): spread or dispersion of data.

https://chatgpt.com/c/68c94f92-c340-832f-9f7f-da7cebb36ef5 1/4

Effect of Parameters:
Changing µ → shifts curve left/right.
Changing σ → curve becomes flatter (larger spread) or taller/narrower (smaller spread).
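A quick numerical check of these two effects, as a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper name `peak_height` and the sample values are illustrative assumptions, not part of the notes.

```python
from statistics import NormalDist

def peak_height(mu, sigma):
    """Height of the normal curve at its center x = mu."""
    return NormalDist(mu, sigma).pdf(mu)

# Shifting mu moves the curve left/right but leaves its peak height unchanged.
same_shape = abs(peak_height(0, 1) - peak_height(5, 1)) < 1e-12

# Doubling sigma halves the peak: the curve becomes flatter and wider.
ratio = peak_height(0, 1) / peak_height(0, 2)
```

The peak halves because the total area must stay equal to 1 while the curve spreads out.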

5. Equation
PDF of Normal Distribution:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

µ = mean, σ = standard deviation, π & e are constants.
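The density formula in this section can be evaluated directly. The sketch below is one possible implementation (the function name `normal_pdf` is an assumption for illustration) and compares its result with the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Cross-check against the standard library for an arbitrary point.
ours = normal_pdf(70, mu=68, sigma=3)
reference = NormalDist(68, 3).pdf(70)
```

Note that the value returned is a density, not a probability; probabilities come from areas under the curve.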

6. Standard Normal Distribution (Z-Distribution)


Special case of Normal Distribution with:
µ = 0, σ = 1.
Denoted as Z ~ N(0,1).
Used to standardize any Normal Distribution using transformation:

$$
Z = \frac{X - \mu}{\sigma}
$$
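As a small illustration of standardization, the sketch below converts raw values to Z-scores. The function name `z_score` and the two exam scenarios are hypothetical, chosen only to show how scores from different scales become comparable.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical exams: 85 on a test with mean 70, sd 10,
# versus 75 on a test with mean 60, sd 5.
z_exam_a = z_score(85, mu=70, sigma=10)
z_exam_b = z_score(75, mu=60, sigma=5)
```

Although 85 is the higher raw score, the second result sits further above its own mean once both are expressed in standard deviations.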

7. Why Standard Normal?


Allows comparison of different distributions.
Provides access to Z-Tables (pre-computed probabilities).
Makes probability calculations easy.


8. Empirical Rule (68–95–99.7 Rule)


For any Normal Distribution (approximate values):
About 68% of data lies within µ ± 1σ.
About 95% of data lies within µ ± 2σ.
About 99.7% of data lies within µ ± 3σ.
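The three percentages can be reproduced without a table. For a standard normal variable, the probability of falling within k standard deviations of the mean equals erf(k / √2). The sketch below uses that identity with `math.erf`; the function name `prob_within` is an illustrative assumption.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X."""
    return math.erf(k / math.sqrt(2.0))

# Coverage for 1, 2 and 3 standard deviations.
coverage = {k: prob_within(k) for k in (1, 2, 3)}
```

The results round to 68%, 95% and 99.7%, which is why the rule is quoted with those figures.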

9. Examples

Example 1: Height Problem


Population: N(µ = 68 in, σ = 3 in).
Probability height > 72 in?
Convert to Z:

$$
Z = \frac{72 - 68}{3} \approx 1.33
$$

From Z-Table: P(Z ≤ 1.33) ≈ 0.908 → P(height > 72) = 1 − 0.908 = 0.092 (~9%).
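The height calculation can be cross-checked in code. This is a minimal sketch assuming Python 3.8+, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# Population of heights from the example: mean 68 in, sd 3 in.
heights = NormalDist(mu=68, sigma=3)

# P(height > 72) is the complement of the CDF at 72.
p_taller_than_72 = 1 - heights.cdf(72)
```

The result is close to the table-based answer of about 9%; small differences arise because the table lookup rounds Z to 1.33.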

Example 2: Probability between mean & 1σ


From Z-Table: P(0 ≤ Z ≤ 1) ≈ 0.3413.
Means ~34% data lies between mean & +1σ (symmetrically another 34% on the left).
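Written out as a worked step, with Φ denoting the standard normal CDF (a notation assumed here, not used elsewhere in the notes), the table value comes from a difference of two cumulative probabilities:

$$
P(0 \le Z \le 1) = \Phi(1) - \Phi(0) \approx 0.8413 - 0.5000 = 0.3413
$$

By symmetry the same area lies between −1 and 0, so

$$
P(-1 \le Z \le 1) = 2 \, P(0 \le Z \le 1) \approx 0.68
$$

which is the first figure of the Empirical Rule.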

10. CDF (Cumulative Distribution Function)


Gives the probability that a random variable X takes a value ≤ x, i.e. F(x) = P(X ≤ x).
Graph is S-shaped (sigmoid curve).
CDF = Area under PDF curve up to point x.
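To make the "area under the PDF" idea concrete, the sketch below computes the normal CDF in two ways: through the closed form based on `math.erf`, and by summing trapezoids under the density curve. The function names, the step count, and the choice to start integrating 10σ below the mean are assumptions made for illustration.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form normal CDF using the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_by_integration(x, mu=0.0, sigma=1.0, steps=20_000):
    """Approximate the CDF as the area under the PDF up to x (trapezoid rule)."""
    def pdf(t):
        return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / (
            sigma * math.sqrt(2.0 * math.pi)
        )

    lower = mu - 10.0 * sigma  # practically no probability mass lies below this
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

exact = normal_cdf(1.0)
approx = cdf_by_integration(1.0)
```

Evaluating `normal_cdf` over a range of x values traces out the S-shaped curve described above, rising from 0 toward 1.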


11. Applications in Data Science


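Outlier Detection → values outside µ ± 3σ are often considered outliers.
Machine Learning Models → linear regression inference (confidence intervals, p-values) assumes residuals are normally distributed.
Hypothesis Testing → many tests (e.g., z-test, t-test) rely on a normality assumption.
Central Limit Theorem → as sample size grows, the distribution of sample means tends toward a Normal Distribution, even if the population isn’t normal (given finite variance).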
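Two of these applications are easy to sketch with the standard library. The first function flags values more than three sample standard deviations from the mean; the second simulates the Central Limit Theorem by averaging draws from a uniform (clearly non-normal) population. All names, the sample data, the seed, and the sample sizes are hypothetical choices for illustration.

```python
import random
from statistics import mean, stdev

def three_sigma_outliers(values):
    """Return the values lying more than 3 sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]

def sample_means(draw, sample_size, repeats, seed=0):
    """Mean of `sample_size` draws, repeated `repeats` times with a seeded RNG."""
    rng = random.Random(seed)
    return [mean(draw(rng) for _ in range(sample_size)) for _ in range(repeats)]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
            11, 9, 10, 10, 12, 9, 10, 11, 10, 50]
outliers = three_sigma_outliers(readings)

# Means of 30 uniform(0, 1) draws: centered near 0.5 with spread
# near sqrt((1/12) / 30), as the Central Limit Theorem predicts.
means = sample_means(lambda rng: rng.random(), sample_size=30, repeats=2000)
```

One caveat on the outlier rule: an extreme value inflates the very mean and standard deviation used to judge it, so with small samples a genuine outlier may fail to cross the 3σ threshold.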

12. Skewness
Skewness measures asymmetry of distribution.
0 → symmetric (as for a Normal Distribution; zero skewness alone does not prove normality).
> 0 → right-skewed (long tail on right).
< 0 → left-skewed (long tail on left).
Practical rule:
-0.5 ≤ skew ≤ +0.5 → approximately symmetric.
Beyond that → skewed distribution.
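The sign convention and the practical rule can be tried out with a moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). This is one of several definitions in use, and the function names and sample data below are illustrative assumptions.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population (biased) moments."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

def describe_skew(values):
    """Apply the practical rule: |skew| <= 0.5 counts as approximately symmetric."""
    s = skewness(values)
    if abs(s) <= 0.5:
        return "approximately symmetric"
    return "right-skewed" if s > 0 else "left-skewed"

symmetric = describe_skew([1, 2, 3, 4, 5])
right = describe_skew([1, 1, 1, 2, 10])    # long tail on the right
left = describe_skew([10, 10, 10, 9, 1])   # mirror image: long tail on the left
```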

13. Key Takeaways


Normal Distribution = foundation of statistics & ML.
Defined by µ & σ.
Standardization (Z-scores) makes comparison & probability calculation easy.
Empirical Rule gives quick estimates without tables.
Much real-world data approximately follows a Normal Distribution → powerful tool for modeling & inference.

✅ End of Session 41 Notes
