Q & A-Unit 2 - Distributions

The document covers various probability distributions, including discrete and continuous types, with a focus on Normal, Poisson, Lognormal, Weibull, and Exponential distributions. It explains key concepts such as probability measures, distribution characteristics, and the memoryless property of the Exponential distribution. Additionally, it includes a question bank for assessment and links to video resources for further learning.


Computational Statistics

Unit 2 - Distributions: (9 hours)

Probability Distributions, Characterizing a Distribution, Discrete Distributions, Normal


Distributions, Continuous Distributions Derived from the Normal Distribution, Poisson
Distribution, Other Continuous Distributions - Lognormal, Weibull, Exponential,
Uniform.

Question Bank:
1. Explain the concepts of Probability Distribution and Uniform Distribution (8M)
2. Explain the concept of Poisson distribution (7 M)
3. Explain continuous distributions and the types of continuous distributions. (9 M)
4. What is the lognormal distribution? State and explain the three parameters that
define the shape of the lognormal distribution. (9 M)
5. Define the Exponential distribution. What are the different applications of the
Exponential distribution? (9 M)
6. Explain the memoryless property of the Exponential Distribution. (7 M)
7. Differentiate between Normal & Poisson Distribution. (7 M)
8. Define Weibull distribution and explain Weibull probability density function
(PDF) in brief. (9 M)

Answers:
https://www.youtube.com/watch?v=QHCe4Fj-vJc

https://www.youtube.com/watch?v=5gaDDcty_LY

https://www.youtube.com/watch?v=JqfyW0lmd4A

https://www.youtube.com/watch?v=qNGDD_Rh8ps

Probability measures the likelihood of an outcome in a random experiment, reflecting long-term frequency despite short-term uncertainty.
• Experiment: Any repeatable process with uncertain results.
• Simple Event: A single outcome (eᵢ).
• Sample Space (S): All possible outcomes.
• Event (E): Any collection of outcomes.
• Probability P(E): Chance of event E occurring, 0 ≤ P(E) ≤ 1, with impossible events = 0 and certain events = 1. Sum of probabilities in S equals 1.
Methods to find probability:
1. Classical: Equally likely outcomes; P (E) = (number of favourable outcomes) / (total
outcomes).
2. Empirical: Based on observed frequency.
3. Subjective: Based on personal judgment or estimate.
Rules (see the worked sketch after this list):
• Addition Rule: P(E or F) = P(E) + P(F) – P(E and F).
• Conditional Probability: P(F|E) = Probability of F given E.
• Independence: E and F are independent if P(F|E) = P(F).
• Multiplication Rule: For independent events, P(E and F) = P(E) × P(F).
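A minimal Python sketch of these rules, assuming a fair six-sided die with the illustrative events E = "roll is even" and F = "roll is at least 4" (events chosen only for this example):

```python
from fractions import Fraction

# Sample space of a fair six-sided die; every outcome is equally likely (classical method).
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}     # event E: roll is even
F = {4, 5, 6}     # event F: roll is at least 4

def P(A):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(A), len(S))

# Addition rule: P(E or F) = P(E) + P(F) - P(E and F)
assert P(E | F) == P(E) + P(F) - P(E & F)     # 2/3 on both sides

# Conditional probability and independence check.
P_F_given_E = P(E & F) / P(E)                 # 2/3
print("P(E or F) =", P(E | F))                # 2/3
print("P(F|E)    =", P_F_given_E, "vs P(F) =", P(F))   # 2/3 vs 1/2 -> E and F are dependent
```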
Probability Distributions: Describe how probabilities are spread over possible outcomes.
• Discrete: Finite/countable values (e.g., Binomial).
• Continuous: Any value in a range (e.g., Normal).
• Properties: Probabilities sum to 1; each between 0 and 1.

Probability Distributions

https://www.youtube.com/watch?v=cNqb_awlB1A

https://www.youtube.com/watch?v=pgWHDeiysao

https://www.youtube.com/watch?v=Cg0W6mod9Hw

A probability distribution is a function that describes how likely it is for a random variable to
take on certain values. It helps us understand and model uncertainty in various real-world
scenarios.
A probability distribution:
• Describes how likely outcomes are.
• Helps us predict, simulate, and analyse random events.
• Is a fundamental concept in statistics, data science, engineering, AI, etc.

1. Discrete Probability Distributions
https://www.youtube.com/watch?v=RloMUUEagCI&t=105s

https://www.youtube.com/watch?v=Vydwz53i--c

A Discrete Probability Distribution describes the probabilities of outcomes for a random variable that can take on a countable number of distinct values. Each possible value has an associated probability, and the sum of all these probabilities is 1. Examples include the Binomial and Poisson distributions.
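As a brief illustration of a discrete PMF, a minimal sketch using scipy.stats (the parameter choices n = 10, p = 0.3 and λ = 4 are arbitrary examples, not values from the notes):

```python
import numpy as np
from scipy import stats

# Two discrete distributions: Binomial(n=10, p=0.3) and Poisson(lambda=4).
binom = stats.binom(n=10, p=0.3)
pois = stats.poisson(mu=4)

# Each countable outcome k has an associated probability (the PMF), and the PMF sums to 1.
k_binom = np.arange(0, 11)      # Binomial support is finite: 0..10
k_pois = np.arange(0, 100)      # Poisson support is countably infinite; 0..99 holds ~all the mass

print("P(X = 3), Binomial:", binom.pmf(3))
print("P(X = 3), Poisson :", pois.pmf(3))
print("Sum of Binomial PMF:", binom.pmf(k_binom).sum())   # exactly 1
print("Sum of Poisson PMF :", pois.pmf(k_pois).sum())     # ~1 (support truncated at 99)
```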

2. Continuous Probability Distributions:


https://www.youtube.com/watch?v=ELQ-PftBQ5E

https://www.youtube.com/watch?v=u6UmB-ruq_A

A continuous distribution covers an infinite, uncountable range of values—for example, time, which can be measured from 0 seconds to infinitely large amounts. In contrast, a discrete distribution includes only countable values.

Types:

• Normal Distribution – Symmetric bell-shaped curve used in many natural phenomena.
• Standard Normal Distribution – A normal distribution with mean 0 and standard deviation 1.
• Gamma Distribution – Models wait times and generalizes the exponential distribution.
• Exponential Distribution – Models time between events in a Poisson process.
• Chi-Square Distribution – Used in hypothesis testing and confidence interval estimation.
• Lognormal Distribution – Distribution of a variable whose logarithm is normally distributed.
• Weibull Distribution – Used in reliability and life data analysis.

Comparison between discrete and continuous random variables

Feature             | Discrete Random Variable (DRV)                      | Continuous Random Variable (CRV)
Definition          | Takes on countable values                           | Takes on uncountable (infinite) values
Examples            | Binomial, Poisson                                   | Uniform, Normal, Exponential
Mean (Expectation)  | E(X) = Σ xᵢ·P(xᵢ)                                   | E(X) = ∫ x·f(x) dx
Sample Space        | Small sample space (e.g., coin tosses)              | Large sample space (e.g., height of persons)
Spread              | Generally small                                     | Generally large
Values Type         | Integers (e.g., 0, 1, 2, ...)                       | Decimals/Fractions (e.g., 1.23, 4.56)
Countability        | Finite or countably infinite (can list all values)  | Uncountable (values in a range or interval)
Graph Type          | Probability Mass Function (PMF)                     | Probability Density Function (PDF)

Characterizing a Distribution

Characteristic                           | Description
Type                                     | Nature of distribution (Discrete or Continuous)
Support                                  | Range of possible values
Probability Function                     | PMF for discrete, PDF for continuous
Cumulative Distribution Function (CDF)   | Probability that variable is ≤ x
Mean (Expected Value)                    | Average or central value
Variance                                 | Spread or variability of distribution
Standard Deviation                       | Square root of variance
Skewness                                 | Measure of asymmetry
Kurtosis                                 | Peakedness or flatness of distribution
Moment Generating Function (MGF)         | Encodes all moments of the distribution
Characteristic Function                  | Fourier transform of the PDF or PMF
Entropy (Information Theory)             | Measure of uncertainty in distribution
Mode                                     | Value(s) at which the distribution is maximized
Median                                   | Middle value when data is ordered

PMF - Probability Mass Function

PDF - Probability Density Function

Normal (Gaussian) distribution: (continuous type)

https://www.youtube.com/watch?v=a4S4SA61saY&list=PLhSp9OSVmeyLcsT2DVslmeRFQbszeksZa

https://www.youtube.com/watch?v=2CGvLkj-V4Q

https://www.youtube.com/watch?v=XT5-scsjpTU

https://www.youtube.com/watch?v=WPj4yuwdInc&t=36s

https://www.youtube.com/watch?v=hfBeF8jdO6U

Normal (Gaussian) distribution is described by the following probability density function (PDF):

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

x = random variable
μ= mean (centre of the distribution)
σ= standard deviation (spread of the distribution)
σ2= variance
e = Euler’s number (approximately 2.71828)

A normal distribution, also called a Gaussian distribution or bell curve, is symmetric with no skew. Most values cluster around the mean, with fewer appearing toward the tails. Its key features are:
• Mean = Median = Mode
• Symmetry about the mean (50% of values lie on each side)
• Defined by just two parameters: the mean and standard deviation.
The mean is the location parameter that sets the centre of the curve's peak: increasing the mean shifts the curve to the right, while decreasing it shifts the curve to the left. The standard deviation is the scale parameter that controls the spread: a small standard deviation makes the curve narrow, while a large one makes it wider.

Empirical Rule (68-95-99.7): In a normal distribution, about 68% of values lie within 1
standard deviation (SD) of the mean, 95% within 2 SDs, and 99.7% within 3 SDs.
Example: For SAT scores with mean 1150 and SD 150:
• 68% score between 1000 and 1300
• 95% between 850 and 1450
• 99.7% between 700 and 1600

Empirical Rule: Helps to identify outliers; small samples may need other distributions, such as the t-distribution.
Probability Density Function
In a probability density function (PDF), the area under the curve represents probability, totalling 1 for the normal distribution. Although the normal PDF formula looks complex, only the mean and standard deviation are needed to calculate the probability density for any value of x.

Example: To find the probability that SAT scores exceed 1380, use the probability density
function graph. The probability is the shaded area under the curve to the right of the score
1380.
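A minimal sketch of this calculation with scipy.stats, using the SAT parameters from the example above (mean 1150, SD 150); the printed probability is produced by the code rather than stated in the notes:

```python
from scipy import stats

# SAT example: mean = 1150, standard deviation = 150.
sat = stats.norm(loc=1150, scale=150)

# P(X > 1380) is the area under the PDF to the right of 1380 (the survival function).
p_above_1380 = sat.sf(1380)
print(f"P(SAT > 1380) = {p_above_1380:.4f}")          # roughly 0.06

# Empirical rule check: about 68% of scores fall within one SD of the mean (1000..1300).
print(f"P(1000 < SAT < 1300) = {sat.cdf(1300) - sat.cdf(1000):.4f}")   # ~0.683
```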

Standard Normal Distribution
https://www.youtube.com/watch?v=p_KApjpyBHE

https://www.youtube.com/watch?v=66A2lUyyD5s

https://www.youtube.com/watch?v=2tuBREK_mgE
The standard normal distribution, or z-distribution, is a normal distribution with mean 0 and
standard deviation 1. All normal distributions can be transformed into this form by converting
values (x) into z-scores (z), which represent how many standard deviations a value is from
the mean.

To find the z-score of a value, you need to know the mean and standard deviation of the distribution: z = (x − μ) / σ.
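A short sketch, reusing the SAT example above (mean 1150, SD 150), showing that a z-score lookup on the standard normal gives the same tail probability as working on the original scale:

```python
from scipy import stats

mu, sigma = 1150, 150
x = 1380
z = (x - mu) / sigma                      # how many SDs x lies above the mean
print(f"z = {z:.3f}")                     # about 1.53

# Same tail probability on either scale.
print(stats.norm(0, 1).sf(z))             # P(Z > z) under the standard normal
print(stats.norm(mu, sigma).sf(x))        # P(X > x) under the original distribution
```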

Lognormal Distribution:
https://www.youtube.com/watch?v=xtTX69JZ92w

https://www.youtube.com/watch?v=xbLvviyFqNg

https://www.youtube.com/watch?v=1_y4dZKiz48

https://www.youtube.com/watch?v=rLv1k_PxUOE&t=87s

A lognormal distribution is a probability distribution in which the logarithm of the variable is normally distributed.
• A variable is lognormally distributed if its log follows a normal distribution.
• It often models skewed, positive-valued data with low means and large variance.
• Values must be positive since the logarithm is only defined for positive numbers.
The probability density function is defined by the mean μ and standard deviation σ of the variable's logarithm:

f(x) = (1 / (xσ√(2π))) · e^(−(ln x − μ)² / (2σ²)), for x > 0

Three parameters that define the shape of the lognormal distribution

Parameter  | Symbol | Controls               | Effect on Distribution
Location   | μ      | Log-mean               | Shifts the curve left/right
Scale      | σ      | Log-standard deviation | Controls width and skewness
Threshold  | θ      | Shift (optional)       | Sets the lower bound (start point)

The standard lognormal distribution has θ = 0 and a scale parameter equal to 1.

If θ = 0, the distribution is called a 2-parameter lognormal distribution.
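A minimal sketch using scipy.stats.lognorm, whose parameterization maps onto the table above as shape s = σ, scale = e^μ, and loc = θ; the values μ = 1.0, σ = 0.5, θ = 0 below are arbitrary illustrations:

```python
import numpy as np
from scipy import stats

mu, sigma, theta = 1.0, 0.5, 0.0          # log-mean, log-SD, threshold (illustrative values)
ln = stats.lognorm(s=sigma, loc=theta, scale=np.exp(mu))

# Defining property: the log of lognormal samples follows a normal distribution.
rng = np.random.default_rng(0)
samples = ln.rvs(size=100_000, random_state=rng)
logs = np.log(samples - theta)
print("mean of log(X):", logs.mean())     # close to mu = 1.0
print("std  of log(X):", logs.std())      # close to sigma = 0.5
print("all samples positive:", (samples > theta).all())   # lognormal values must exceed theta
```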

Applications:
The normal distribution, often recognized as the "bell curve," is the most widely used and
well-known distribution in scientific studies. It effectively represents many natural
phenomena, ranging from straightforward ones like weights and heights to more intricate
cases.
On the other hand, a lognormal distribution can be used to model various phenomena such as:
– Milk yield in cows
– Lifespan of industrial components that fail due to fatigue stress
– Rainfall amounts
– Size variation of raindrops
– Gas volume in petroleum reserves

Weibull Distribution:
https://www.youtube.com/watch?v=Bdos-UvkmNc

https://www.youtube.com/watch?v=m8CsxGr53lQ

https://www.youtube.com/watch?v=Bdos-UvkmNc&t=824s

https://www.youtube.com/watch?v=s_28i3mWOy0

https://www.youtube.com/watch?v=9BlJVOhcANo

The Weibull distribution is a versatile continuous probability distribution widely used in
reliability engineering, survival analysis, and extreme value statistics. It is particularly useful
for modelling the time until an event occurs, such as equipment failure or death.

Probability Density Function (PDF)

The PDF of the Weibull distribution is defined as:

f(x; λ, k) = (k / λ) · (x / λ)^(k−1) · e^(−(x/λ)^k), for x ≥ 0 (and 0 for x < 0)
Parameter Interpretations
Scale Parameter (λ): Stretches or compresses the distribution. Larger λ shifts the distribution
to the right.
Shape Parameter (k): Dictates the form of the distribution curve:
• k < 1: Decreasing failure rate (infant mortality)
• k = 1: Constant failure rate (reduces to the Exponential distribution)
• k > 1: Increasing failure rate (wear-out failures)
• k = 2: Special case - Rayleigh distribution (bell-shaped PDF)


Cumulative Distribution Function (CDF)

The CDF of the Weibull distribution is:

F(x; λ, k) = 1 − e^(−(x/λ)^k), for x ≥ 0
Key Properties
• Mean: μ = λ Γ(1 + 1/k)
• Variance: σ² = λ² [Γ(1 + 2/k) - (Γ(1 + 1/k))²]
where Γ is the gamma function.
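A minimal sketch checking these properties with scipy.stats.weibull_min (scipy's shape argument c corresponds to k); the values k = 1.5, λ = 2.0 are arbitrary illustrations:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

k, lam = 1.5, 2.0                              # shape and scale (illustrative)
wb = stats.weibull_min(c=k, scale=lam)

# PDF and CDF at x = 1.0; the CDF matches 1 - exp(-(x/lam)^k).
x = 1.0
print("PDF(1.0):", wb.pdf(x))
print("CDF(1.0):", wb.cdf(x), "=", 1 - np.exp(-(x / lam) ** k))

# Mean and variance agree with the closed-form gamma-function expressions above.
mean_formula = lam * gamma(1 + 1 / k)
var_formula = lam**2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k) ** 2)
print("mean:", wb.mean(), "vs", mean_formula)
print("var :", wb.var(), "vs", var_formula)
```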
The Weibull distribution is a family of distributions that can take on many shapes, depending on the values of its parameters.
Common Applications
Reliability Engineering: Modelling component lifetimes and failure rates in manufacturing
and engineering systems.
Survival Analysis: Analysing time-to-event data in medical research and biological studies.
Weather Modelling: Describing wind speed distributions and extreme weather events.
Quality Control: Modelling defect rates and process variations in manufacturing.

Exponential Distribution: (Memoryless Property)

https://www.youtube.com/watch?v=8M_VRCc9rMY

https://www.youtube.com/watch?v=3kxnPEDecIA

https://www.youtube.com/watch?v=b57FzIGdCY4

https://www.youtube.com/watch?v=ABbGOw73nuk

Derivation from the Weibull probability density function (PDF):
The Exponential distribution is a special case of the Weibull distribution where the shape
parameter k = 1. It describes a constant failure rate, typically used when the event probability
remains the same over time. This makes it suitable for modelling the time between
independent events that occur at a constant average rate.

Probability Density Function (PDF) when k = 1 (Exponential distribution):

f(x) = (1/λ) · e^(−x/λ), for x ≥ 0

where λ is the scale parameter (the mean time between events).

The Exponential distribution is the only continuous probability distribution that satisfies the memoryless property, which states that

P(X > s + t | X > s) = P(X > t), for all s, t ≥ 0.

This means that if the event has not occurred by time s, the probability that it will take more than another t units of time is exactly the same as the probability of waiting more than t units of time from the very beginning. In other words, the process has no memory of how long you have already waited.
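A quick numerical check of the memoryless property using scipy.stats; the scale value 5 and the times s = 3, t = 2 are arbitrary illustrations:

```python
from scipy import stats

expo = stats.expon(scale=5)                  # exponential with mean waiting time 5

s, t = 3.0, 2.0
conditional = expo.sf(s + t) / expo.sf(s)    # P(X > s+t | X > s) = P(X > s+t) / P(X > s)
print("P(X > s+t | X > s) =", conditional)
print("P(X > t)           =", expo.sf(t))    # identical: the exponential has no memory

# Contrast: a Weibull with k = 2 (wear-out) is not memoryless.
wb = stats.weibull_min(c=2, scale=5)
print("Weibull:", wb.sf(s + t) / wb.sf(s), "vs", wb.sf(t))   # the two values differ
```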
Comparison between the Weibull and Exponential distributions:

Feature           | Weibull (general)                   | Exponential (k = 1)
Shape flexibility | Very flexible (controlled by k)     | One fixed shape only
Failure rate      | Increasing, decreasing, or constant | Constant only
Memoryless        | No                                  | Yes
Special cases     | Includes Exponential and Rayleigh   | Itself a special case

Applications of the Exponential Distribution


• Reliability Engineering: Modelling time until failure of electronic components that fail randomly (e.g., light bulbs, resistors).
• Queueing Theory: Modelling the time between arrivals of customers in a service system.
• Survival Analysis: Modelling time until death or failure in populations where hazard is constant.
• Telecommunications: Modelling time between signal arrivals.

Uniform Distribution
https://www.youtube.com/watch?v=n_1Z-HVemP0

https://www.youtube.com/watch?v=UC-CBUSQXAo
There are two main types of uniform distributions:
Continuous Uniform Distribution
A continuous uniform distribution is one where the probability is equally spread over a continuous interval [a, b].

Mean (Expected Value) of a uniform random variable X is: E(X) = (1/2)(a + b)

Variance of a uniform random variable X is: Var(X) = (1/12)(b − a)²

PDF: f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

Discrete Uniform Distribution

A discrete uniform distribution is one where a finite number of outcomes are equally likely.

Example:
Rolling a fair die – outcomes are {1, 2, 3, 4, 5, 6}. Each has a probability of 1/6.
Probability Mass Function (PMF): P(X = x) = 1/n for each of the n equally likely outcomes (1/6 for each face of a fair die).

Mean (Expected Value): E(X) = (a + b)/2 for outcomes {a, a+1, ..., b} (3.5 for a fair die).

Variance: Var(X) = (n² − 1)/12, where n = b − a + 1 (35/12 ≈ 2.92 for a fair die).
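A minimal simulation sketch with numpy checking both cases; the interval [2, 8] for the continuous case is an arbitrary illustration, while the discrete case is the fair die from the example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Continuous uniform on [a, b] = [2, 8].
a, b = 2, 8
x = rng.uniform(a, b, size=100_000)
print("mean ~", x.mean(), "vs (a+b)/2 =", (a + b) / 2)           # ~5.0
print("var  ~", x.var(), "vs (b-a)^2/12 =", (b - a) ** 2 / 12)   # ~3.0

# Discrete uniform: rolling a fair die; each face 1..6 has probability 1/6.
rolls = rng.integers(1, 7, size=100_000)      # upper bound is exclusive
print("die mean ~", rolls.mean(), "vs 3.5")
print("die var  ~", rolls.var(), "vs", (6**2 - 1) / 12)          # 35/12 ~ 2.92
```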

Applications:
Random Number Generation
• Application: Most programming languages generate pseudo-random numbers using uniform distributions.
• Use Case: Generating random samples for simulations, games, or cryptographic keys.
Cryptography and Security
• Application: Uniform randomness ensures unpredictability in key generation.
• Use Case: Secure token generation, password creation, session identifiers.
Game Design and Simulations
• Application: Games often simulate fair events like rolling a die, spinning a wheel, or dealing cards.
• Use Case: Board games, AI simulations.
Computer Graphics
• Application: Random sampling for visual effects or distributing points in space.
• Use Case: Texture generation, Monte Carlo ray tracing.
Quality Control and Reliability Engineering
• Application: Assumes that a process could fail at any point during its operation period with equal probability.
• Use Case: Time to failure modelling under certain assumptions.
Industrial and Manufacturing Applications
• Application: Uniform distribution models tolerance ranges and measurement errors when variability is equally likely across a range.
• Use Case: Simulating uniform wear and tear, or uniform material distribution.

Poisson distribution:
https://www.youtube.com/watch?v=eKNu4Af6ess

https://www.youtube.com/watch?v=V3tZ6VLVbak

https://www.youtube.com/watch?v=Bg7OUXfn2AU

https://www.youtube.com/watch?v=cjjHhZCeoKs
The Poisson distribution is a discrete probability distribution that describes the number of times an event occurs in a fixed interval of time or space, provided the events:
• occur independently,
• occur at a constant average rate,
• cannot happen simultaneously.

PMF (Probability Mass Function):

P(X = k) = (λ^k · e^(−λ)) / k!

X = number of occurrences,
λ = average rate (mean number of occurrences per interval),
k = actual number of occurrences
Assumptions of Poisson distribution
1. Events are independent: The occurrence of one event does not affect another.
2. Constant rate: The average number of events (λ) is constant over time/space.
3. No simultaneous events: Events occur one at a time.
4. The probability of more than one event in an infinitesimally small interval is nearly zero.

Application Area    | Example
Call centres        | Number of incoming calls per hour
Healthcare          | Number of patients arriving in ER per shift
Biology             | Number of mutations per unit of DNA
Finance             | Number of claims filed with an insurance company
Manufacturing       | Number of defects in a length of wire
Astronomy           | Number of photons hitting a detector in a given time
Traffic engineering | Number of cars passing a point per minute

Plotting the Poisson Probability Mass Function (PMF) for three different values of λ (1, 4, and 10) shows how the shape changes:
• When λ = 1, the distribution is highly right-skewed.
• When λ = 4, it becomes more balanced.
• When λ = 10, the distribution appears nearly symmetric and starts resembling a normal distribution.
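A minimal sketch reproducing this comparison numerically with scipy.stats.poisson (printed as a rough text histogram rather than a plot):

```python
import numpy as np
from scipy import stats

k = np.arange(0, 21)                     # counts 0..20 cover the bulk of the mass for these lambdas
for lam in (1, 4, 10):
    pmf = stats.poisson(mu=lam).pmf(k)
    print(f"\nlambda = {lam}: mode at k = {k[pmf.argmax()]}")
    for ki, p in zip(k, pmf):
        if p > 0.005:                    # skip negligible bars
            print(f"  k={ki:2d} " + "#" * int(round(p * 100)))
# lambda = 1 is strongly right-skewed; lambda = 10 looks nearly symmetric (close to normal).
```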
Comparison (difference) between Normal and Poisson distributions:

Feature                      | Normal Distribution                                 | Poisson Distribution
Type                         | Continuous                                          | Discrete
Use Case                     | Models continuous data (e.g., heights, test scores) | Models event counts in fixed intervals
Occurrence in CLT            | Central to the Central Limit Theorem                | Approximated by Normal when λ is large (λ > 10)
Probability Density vs. Mass | Uses PDF (area under curve = probability)           | Uses PMF (sum of bars = probability)
Shape                        | Bell-shaped, symmetric                              | Skewed for small λ, symmetric for large λ
Parameters                   | Mean (μ), Standard Deviation (σ)                    | Rate parameter (λ)
Domain (X values)            | −∞ < x < ∞                                          | x = 0, 1, 2, ...
Mean                         | μ                                                   | λ
Variance                     | σ²                                                  | λ (mean = variance)
Skewness                     | 0 (symmetric)                                       | 1/√λ (right-skewed, shrinking as λ grows)
Kurtosis                     | 3 (mesokurtic)                                      | 3 + 1/λ
Underlying Process           | Aggregation of many small, independent effects      | Rare, independent events occurring over time/space
Common Approximation         | Central Limit Theorem                               | Binomial with large n, small p (n·p = λ)
Example                      | Human IQ scores, measurement errors                 | Number of calls/hour, number of decay events/min

Clarity for Kurtosis:

https://www.youtube.com/watch?v=EWuR4EGc9EY

Type        | Shape                    | Tails            | Outlier Behavior            | Kurtosis Value
Leptokurtic | Tall, narrow peak        | Fat/heavy tails  | Many outliers               | > 3
Mesokurtic  | Normal bell-shaped curve | Moderate tails   | Moderate number of outliers | = 3
Platykurtic | Wide, flat peak          | Thin/light tails | Few outliers                | < 3

Continuous probability distributions derived from the normal distribution
These derived distributions are foundational in statistical inference, especially in hypothesis testing and estimation. The most common are described below.
1. Chi-Square (χ²) Distribution
Definition: The distribution of the sum of the squares of k independent standard normal variables.
Formula: χ² = Z₁² + Z₂² + ... + Z_k², where each Zᵢ ~ N(0, 1)
Degrees of Freedom: k
Applications:
• Goodness-of-fit tests
• Test for independence in contingency tables
• Confidence intervals for variance
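A minimal simulation sketch of this definition: summing squared standard normal draws and comparing the result with scipy.stats.chi2 (k = 5 is an arbitrary choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 5                                           # degrees of freedom (illustrative)

# Sum of squares of k independent standard normal variables.
z = rng.standard_normal(size=(200_000, k))
chi2_samples = (z**2).sum(axis=1)

# Simulated moments should match the chi-square(k) distribution: mean k, variance 2k.
print("simulated mean:", chi2_samples.mean(), "vs", stats.chi2(df=k).mean())   # ~5
print("simulated var :", chi2_samples.var(), "vs", stats.chi2(df=k).var())     # ~10
```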
2. Student's t-Distribution
Definition: The distribution of the standardized mean when the population variance is unknown and estimated from the sample.
Formula: t = Z / √(χ²_k / k), where Z ~ N(0, 1) and χ²_k is an independent chi-square variable with k degrees of freedom
Degrees of Freedom: k
Applications:
• Confidence intervals and hypothesis tests for means
• Small-sample inference when population variance is unknown
3. F-Distribution
• Definition: The distribution of the ratio of two scaled chi-square variables.
• Formula: F = (χ²_{d1} / d1) / (χ²_{d2} / d2)
Degrees of Freedom: d1 and d2
Applications:
• Analysis of variance (ANOVA)
• Regression analysis
• Comparing two variances

4. Log-Normal Distribution (already detailed)

Clarity for Central Limit Theorem (CLT)

https://www.youtube.com/watch?v=a7szVlUy9dU
The Central Limit Theorem (CLT) is a fundamental concept in statistics that explains the
behavior of the sampling distribution of the sample mean.
The Central Limit Theorem states that when independent random samples of sufficiently
large size are drawn from any population (regardless of its original distribution), the
sampling distribution of the sample mean will tend to follow a normal distribution,
provided the sample size is large enough (typically n ≥ 30).
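A minimal simulation sketch of the CLT: repeatedly averaging n = 30 draws from a clearly non-normal (exponential) population and checking that the sample means behave like a normal distribution (the population choice and sample sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 30, 50_000                    # sample size n >= 30, many repeated samples

# Population: exponential with mean 1 (strongly right-skewed, far from normal).
samples = rng.exponential(scale=1.0, size=(trials, n))
sample_means = samples.mean(axis=1)

# CLT: sample means are approximately Normal(mu, sigma/sqrt(n)) = Normal(1, 1/sqrt(30)).
print("mean of sample means:", sample_means.mean())                    # ~1.0
print("SD of sample means  :", sample_means.std())                     # ~1/sqrt(30) ~ 0.18
within_1sd = np.mean(np.abs(sample_means - 1.0) < 1.0 / np.sqrt(n))
print("fraction within 1 SD:", within_1sd)                             # ~0.68, like a normal curve
```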
