Last Minute Notes (LMNs) - Probability and Statistics

Last Updated: 23 Jul, 2025

Probability refers to the likelihood of an event occurring. For example, when an event like throwing a ball or picking a card from a deck occurs, there is a certain probability associated with that event, which quantifies the chance of it happening. This "Last Minute Notes" article provides a quick and concise revision of the key concepts in Probability and Statistics.

Counting

Permutation:

Arrangement of items where order matters.

Formula:

P(n, r) = \frac{n!}{(n-r)!}

Example:

1) Arranging 2 out of 3 letters (A, B, C): P(3, 2) = 6 (AB, BA, AC, CA, BC, CB).

2) The number of ways to arrange 3 books out of 5: P(5, 3) = \frac{5!}{(5-3)!} = 60.

Combination:

Selection of items where order does not matter.

Formula: C(n, r) = \frac{n!}{r! \times (n-r)!}

Example:

1) The number of ways to select 2 items from 5: C(5,2) = \frac{5!}{2! \times (5-2)!} =10.

2) Selecting 2 out of 3 letters (A, B, C): C(3,2) = 3 (AB, AC, BC).

Differences Between Permutations and Combinations:

| Permutation | Combination |
|---|---|
| Order is important. | Order is not important. |
| Formula: P(n, r). | Formula: C(n, r). |
| Example: AB ≠ BA. | Example: AB = BA. |

Read more about Permutations and Combinations.
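Both formulas are available directly in Python's standard library (3.8+); a minimal sketch verifying the examples above:

```python
import math
from itertools import combinations, permutations

# P(3, 2): arrangements of 2 letters out of {A, B, C} -- order matters
print(math.perm(3, 2))                               # 6
print([''.join(p) for p in permutations('ABC', 2)])  # AB, AC, BA, BC, CA, CB

# C(3, 2): selections of 2 letters out of {A, B, C} -- order ignored
print(math.comb(3, 2))                               # 3
print([''.join(c) for c in combinations('ABC', 2)])  # AB, AC, BC

# Larger example from the notes: arranging 3 books out of 5
print(math.perm(5, 3))  # 60
```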

Basics of Probability

Sample Space (S):

The set of all possible outcomes of a random experiment.
For example, tossing two coins has S = {HH, HT, TH, TT}.

Events:

A subset of the sample space. For example, getting two heads is the event A = {HH}.

Compound Event:

A compound event is an event that consists of two or more outcomes.

Mutually Exclusive Events:

Events that cannot happen simultaneously.

Mathematically: P(A \cap B) = 0

Key Rules:

  • Addition Rule: P(A \cup B) = P(A) + P(B).
  • P(A∣B) = 0 (if B occurs, A cannot).

Examples:

1) Coin toss: Getting Heads (A) and Tails (B) are mutually exclusive events.

2) Rolling a die: Getting Odd (A) and Even (B) number are mutually exclusive events.

Independent Events:

Events where the occurrence of one does not affect the other.

Mathematically: P(A∩B)=P(A)⋅P(B).

Key Rules:

  • Multiplication Rule: P(A \cap B) = P(A) \cdot P(B)
  • P(A∣B)=P(A); P(B∣A)=P(B).

Examples:

1) Two coin tosses: Heads on the first toss (A) and Tails on the second (B).
2) Rolling two dice: Rolling a 6 (A) on one die and a 4 (B) on the other.

Important Rules:

  • Addition Rule:
    • For two mutually exclusive events A and B: P(A∪B) = P(A) + P(B).
    • For two non-mutually exclusive events A and B: P(A∪B) = P(A) + P(B) − P(A∩B).
  • Multiplication Rule:
    • For independent events A and B: P(A∩B) = P(A) ⋅ P(B).
    • For conditional probability: P(A∣B) = P(A∩B)/P(B)​, provided P(B) > 0.
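As a quick sanity check of these rules, here is a minimal Python sketch using a single fair die; the particular events A (odd) and B (greater than 3) are illustrative choices, not from the notes:

```python
from fractions import Fraction

# Single fair die: A = "odd number", B = "number > 3"
sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

# Addition rule for non-mutually exclusive events:
# P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A | B) = P(A ∩ B) / P(B)
print(prob(A & B) / prob(B))  # 1/3 -- only 5 is odd among {4, 5, 6}
```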

Joint, Marginal and Conditional Probability

Joint Probability:

Joint probability represents the likelihood of two or more events occurring simultaneously. It is denoted as P(A \cap B), where A and B are events.

If A and B are independent, the formula simplifies to:

P(A \cap B) = P(A) \cdot P(B)

Marginal Probability:

Marginal probability is the probability of a single event regardless of the outcomes of other events. It is obtained by summing or integrating joint probabilities over all possible values of the other event.

P(A) = \sum_B P(A, B) \quad \text{(for discrete variables)}

Conditional Probability:

Conditional probability calculates the probability of event A given that event B has already occurred. It is denoted as P(A∣B) and is computed using:

P(A|B) = \frac{P(A \cap B)}{P(B)}, \, \text{provided } P(B) > 0.
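A small sketch of all three quantities on a made-up joint distribution of two binary variables (the probabilities are illustrative assumptions, not from the article):

```python
# Joint distribution P(A, B) stored as a dict; entries sum to 1
joint = {('a1', 'b1'): 0.10, ('a1', 'b2'): 0.30,
         ('a2', 'b1'): 0.20, ('a2', 'b2'): 0.40}

# Marginal: P(A = a1), summing the joint over all values of B
p_a1 = sum(p for (a, b), p in joint.items() if a == 'a1')

# Conditional: P(A = a1 | B = b1) = P(a1, b1) / P(b1)
p_b1 = sum(p for (a, b), p in joint.items() if b == 'b1')
p_a1_given_b1 = joint[('a1', 'b1')] / p_b1

print(p_a1)                     # 0.4
print(round(p_a1_given_b1, 3))  # 0.1 / 0.3 ≈ 0.333
```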

Read more about Joint, Marginal and Conditional Probability.

Bayes’ Theorem:

Bayes’ Theorem provides a way to calculate the conditional probability of an event A, given that another event B has already occurred. It uses prior knowledge about related events to update the probability of A.

Formula:

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

Here:

  • P(A∣B) is the probability of event A occurring given that event B has occurred.
  • P(B∣A) is the probability of event B occurring given that event A has occurred.
  • P(A) and P(B) are the probabilities of events A and B occurring, respectively.
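A worked sketch with hypothetical diagnostic-test numbers (the 1% prior and the 95%/5% rates are assumptions made for illustration):

```python
# A = "has condition", B = "test is positive"
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # P(B|A), the true-positive rate
p_b_given_not_a = 0.05  # P(B|A'), the false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # ≈ 0.161 -- the updated (posterior) probability
```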

Descriptive Statistics

Descriptive statistics involves summarizing and organizing data to make it easier to understand. It includes measures like mean, median, mode, standard deviation, and variance.

Mean (Average):

The mean is the central value of a dataset, calculated by summing all data points and dividing by the number of points.

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i

Example: For {4,8,6,5,3,7} :

\mu = \frac{4 + 8 + 6 + 5 + 3 + 7}{6} = 5.5

Median:

The median is the middle value of a sorted dataset. If the dataset has an odd number of elements, it’s the middle value; if even, it’s the average of the two middle values.

Example: For {3,4,5,6,7,8} the median is:

\text{Median} = \frac{5 + 6}{2} = 5.5

Mode:

The mode is the value(s) that appear most frequently in the dataset.

Example: For {3,7,7,19,24}, the mode is: 7

Variance:

Variance is a measure of how much the values in a dataset deviate from the mean (average). It quantifies the spread or dispersion of the data.

Formula: For a dataset x_1, x_2, ..., x_n​, the variance \sigma^2 is calculated as:

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2

Example: Consider the dataset: 3,7,8,10,12.

  • Mean: μ=8
  • Variance: \sigma^2 = \frac{1}{5}[(3-8)^2 + (7-8)^2 + (8-8)^2 + (10-8)^2 + (12-8)^2] = 9.2

Standard Deviation (SD):

The standard deviation measures the spread of the data from the mean. It is the square root of the variance.

\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}

Example: For {4,8,6,5,3,7}, with μ= 5.5:

\sigma^2 = \frac{(4-5.5)^2 + (8-5.5)^2 + \dots + (7-5.5)^2}{6} = 2.92, \quad \sigma = \sqrt{2.92} \approx 1.71
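All five measures are available in Python's statistics module; a sketch reproducing the numbers above (the population variants pvariance/pstdev divide by n, matching the formulas in this section):

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # dataset used in the examples above

print(statistics.mean(data))               # 5.5
print(statistics.median(data))             # 5.5
print(statistics.mode([3, 7, 7, 19, 24]))  # 7

# Population variance/SD; statistics.variance/stdev would instead
# give the sample versions, which divide by n - 1
print(round(statistics.pvariance(data), 2))  # 2.92
print(round(statistics.pstdev(data), 2))     # 1.71
```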

Covariance:

Measures how two variables vary together.

Range: -\infty to +\infty.

Types:

  • Positive: Both variables move in the same direction.
  • Negative: Variables move in opposite directions.
  • Zero: No linear relationship.


Formula:

\text{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})


Correlation:

Standardized measure of the strength and direction of the linear relationship.

Range: −1 to +1.

  • +1: Perfect positive correlation.
  • −1: Perfect negative correlation.
  • 0: No linear relationship.

Formula:

\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}
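A minimal sketch computing both quantities by hand on a small paired dataset (the values are illustrative assumptions):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Population covariance: average product of deviations from the means
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

# Pearson correlation: covariance scaled by both standard deviations
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
rho = cov / (sx * sy)

print(cov)            # 1.2  (positive: the variables move together)
print(round(rho, 3))  # ≈ 0.775
```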

Random Variable

Random Variable:

A random variable is a function that maps outcomes of a random experiment to real numbers. It helps quantify uncertainty and calculate probabilities.

Example: If two unbiased coins are tossed, let the random variable X represent the number of heads.

The sample space is S = {HH, HT, TH, TT}, and X can take the values {0, 1, 2}.

Cumulative Distribution Function (CDF):

The Cumulative Distribution Function (CDF), F(x), represents the probability that a random variable X takes a value less than or equal to x. It provides an accumulated probability up to a certain point x.

  1. For Discrete Random Variables: F(x) = P(X \leq x) = \sum_{x_0 \leq x} P(x_0) Here, P(x_0) is the probability of X being equal to x_0​.
  2. For Continuous Random Variables: F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt Here, f is the Probability Density Function (PDF) of the random variable X.
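For the discrete case, a short sketch using the two-coin example from the previous section (X = number of heads):

```python
# PMF of X for two fair coins: P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

def cdf(x):
    """F(x) = P(X <= x): accumulate the PMF up to x."""
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(0))  # 0.25
print(cdf(1))  # 0.75
print(cdf(2))  # 1.0
```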

Joint Random Variable

Conditional Expectation:

Conditional expectation is the expected value (mean) of a random variable Y, given that another random variable X has a specific value or distribution. It provides the average value of Y, considering the information provided by X.

Conditional Variance:

Conditional variance measures the spread or variability of a random variable Y, given that another random variable X takes a specific value.

Conditional Probability Density Function:

The Conditional PDF describes the probability distribution of a random variable X, given that another random variable Y is known to take a specific value.

Mathematically:

f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}

  • f_{X,Y}(x, y): Joint PDF of X and Y.
  • f_Y(y): Marginal PDF of Y.

Probability Distributions

Discrete Probability Distribution:

Applies to discrete random variables, which can only take specific, countable values (e.g., integers). The probabilities of these outcomes are represented by the Probability Mass Function (PMF).

Continuous Probability Distribution:

Applies to continuous random variables, which can take any value within a range or interval. Probabilities are described using the Probability Density Function (PDF).

Uniform Distribution:

The Uniform Distribution, also called the Rectangular Distribution, is a type of Continuous Probability Distribution. It represents a scenario where a continuous random variable X is uniformly distributed over a finite interval [a, b]. This means that every value within [a, b] is equally likely, and the probability density function f(x) is constant over this range.

The probability density function (PDF) of a uniform distribution is defined as:

f(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}

This constant density ensures that the total probability over the interval [a, b] is 1.

Mean: μ = (a+b)/2

Variance: \sigma^2 = \frac{(b - a)^2}{12}
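A quick empirical check of these two formulas by sampling with Python's random module (the interval [2, 10] is an arbitrary choice):

```python
import random
import statistics

a, b = 2, 10
print((a + b) / 2)        # theoretical mean: 6.0
print((b - a) ** 2 / 12)  # theoretical variance: ≈ 5.33

# random.uniform draws from U[a, b]; sample moments should match
samples = [random.uniform(a, b) for _ in range(100_000)]
print(round(statistics.mean(samples), 2))       # ≈ 6.0
print(round(statistics.pvariance(samples), 2))  # ≈ 5.33
```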

Binomial Distribution:

A probability distribution that models the number of successes in n independent Bernoulli trials.

Key Parameters:

  • n: Total number of trials.
  • p: Probability of success.
  • q=1−p: Probability of failure.

Probability Mass Function:

P(X = r) = \binom{n}{r} \cdot p^r \cdot q^{n-r}, \, r = 0, 1, 2, \dots, n

Mean: np
Variance: np(1 − p) = npq

Bernoulli Trials:

  • A Bernoulli trial is an experiment with two possible outcomes: success (A) or failure (A′).
  • The probability of success is P(A)=p, and failure is P(A')=q=1−p.

Examples: Tossing a coin (Head = success, Tail = failure).

Theorem:

Probability of r successes in n trials is: P(X = r) = \,^{n}C_{r} \cdot p^r \cdot q^{n-r}
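The PMF translates directly into a few lines of Python; a sketch for the (assumed) example of exactly 3 heads in 5 fair coin tosses:

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) * p^r * (1 - p)^(n - r)"""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 5, 0.5
print(binomial_pmf(3, n, p))  # 0.3125

# Mean and variance of the distribution
print(n * p)            # 2.5
print(n * p * (1 - p))  # 1.25
```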

Exponential Distribution:

The Exponential Distribution models the time between events in a process where events occur continuously and independently at a constant average rate.

For a positive real number \lambda, the Probability Density Function (PDF) of an exponentially distributed random variable X is:

f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \in R_X = [0, \infty) \\ 0, & x \notin R_X \end{cases}

Mean: E[X] = \frac{1}{\lambda}

Variance: \text{Var}[X] = \frac{1}{\lambda^2}​

Poisson Distribution:

The Poisson distribution is a discrete probability distribution used to model the number of occurrences of an event in a fixed interval of time, space, or volume, where:

  • The events occur independently.
  • The average rate λ of occurrences is constant.

Probability Mass Function (PMF):

P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \, x = 0, 1, 2, \dots

where λ is the mean (expected number of events).

Mean: E[X] = \lambda

Variance: \text{Var}[X] = \lambda
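A minimal sketch of the PMF; the rate λ = 4 (say, calls per hour) is an illustrative assumption:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-λ) * λ^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 4
print(round(poisson_pmf(2, lam), 4))  # P(exactly 2 events) ≈ 0.1465

# The PMF sums to 1 over x = 0, 1, 2, ...
print(round(sum(poisson_pmf(x, lam) for x in range(50)), 6))  # 1.0
```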

Normal Distribution:

The Normal Distribution is a continuous probability distribution that models many natural and real-world phenomena. It is characterized by its symmetric, bell-shaped curve, where:

  • The highest point (mean) represents the most probable value.
  • Probabilities decrease as you move away from the mean.

Probability Density Function (PDF) is:

f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}

Mean (\mu): E[X] = \mu

Variance : V[X] = \sigma^2

Standard Normal Distribution:

The Standard Normal Distribution, also called the Z-distribution, is a special case of the normal distribution where:

  • Mean (μ)=0
  • Standard deviation (σ)=1

It is used to compare and analyze data by standardizing values using the z-score:

Z = \frac{X - \mu}{\sigma}

Probability Density Function (PDF)

f(Z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{Z^2}{2}}, \quad -\infty < Z < \infty
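A small sketch of standardization and the standard-normal density; the test-score numbers (x = 85, μ = 70, σ = 10) are assumptions for illustration:

```python
import math

def z_score(x, mu, sigma):
    """Z = (x - μ) / σ"""
    return (x - mu) / sigma

def standard_normal_pdf(z):
    """f(Z) = (1 / sqrt(2π)) * e^(-Z²/2)"""
    return math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)

print(z_score(85, 70, 10))               # 1.5 SDs above the mean
print(round(standard_normal_pdf(0), 4))  # peak density ≈ 0.3989
```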

t-Distribution:

The t-distribution (Student's t-distribution) is used in statistics to infer population means when:

  • Sample size is small (n ≤ 30).
  • Population standard deviation σ is unknown.

Key Formula:

The t-score: t = \frac{\bar{x} - \mu}{s / \sqrt{n}}

Where:

  • \bar{x}: Sample mean
  • μ: Population mean
  • s: Sample standard deviation

Chi-Squared Distribution:

The Chi-Squared distribution represents the sum of the squares of k independent standard normal random variables:

X^2 = Z_1^2 + Z_2^2 + \dots + Z_k^2

  • k: Degrees of freedom (df).
  • As k increases, the distribution becomes more symmetric and approaches a normal distribution.

Probability Density Function (PDF):

f(x; k) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{(k/2)-1} e^{-x/2}, \quad x \geq 0

where \Gamma(\cdot) is the gamma function.

Mean: k

Variance: 2k

Inferential Statistics

Inferential statistics makes predictions or inferences about a population based on sample data.

Sampling Distribution: A sampling distribution is the probability distribution of a statistic (such as the sample mean) obtained through repeated sampling from a population. It shows how the statistic varies across different samples.

Central limit theorem:

The Central Limit Theorem (CLT) states that for a sufficiently large sample size (n > 30), the distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution. The population must have a finite variance.

Formula: For a random variable X with:

  • Mean (μ)
  • Standard deviation (σ)

The sample mean \bar{X} follows:

\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \quad \text{i.e., with standard error } \frac{\sigma}{\sqrt{n}}

The Z-score for the sample mean is given by:

Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
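A sketch simulating the CLT with a fair die as the (clearly non-normal) population; μ = 3.5 and σ² = 35/12 are the die's known mean and variance:

```python
import random
import statistics

n = 30  # sample size
sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(n))
    for _ in range(10_000)
]

# The sample means cluster normally around μ with variance σ²/n
print(round(statistics.mean(sample_means), 2))       # ≈ 3.5
print(round(statistics.pvariance(sample_means), 3))  # ≈ 35/12/30 ≈ 0.097
```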

Confidence Interval:

A Confidence Interval (CI) is a range of values within which the true population parameter (e.g., mean) lies with a certain confidence level (e.g., 95%).

Key Formula:

\text{CI} = \text{Point Estimate} \pm \text{Critical Value} \times \text{Standard Error}

  • Point Estimate: Sample mean/proportion.
  • Critical Value: From z-table or t-table.
  • Standard Error: Depends on the statistic (e.g., \frac{s}{\sqrt{n}} for the mean).
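A sketch computing a 95% CI for a mean using the z critical value 1.96 (appropriate for large samples; a t critical value would be used for small n). The sample values are made up for illustration:

```python
import statistics

sample = [52, 48, 50, 55, 49, 51, 53, 47, 50, 54, 46, 52,
          49, 51, 50, 53, 48, 52, 50, 49, 51, 54, 47, 50,
          52, 48, 51, 49, 53, 50, 52, 49]

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n**0.5  # standard error s/√n

ci = (mean - 1.96 * se, mean + 1.96 * se)
print(round(mean, 2), tuple(round(v, 2) for v in ci))
```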

Z-Test:

A statistical test used to determine if a sample mean differs significantly from a population mean, applicable when:

  • Sample size n > 30
  • Population standard deviation (σ) is known.

Formula:

Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}

T-Test :

A t-test is a statistical method to compare the means of two groups and determine if the difference is statistically significant. It is used when:

  • Sample size is small (n < 30).
  • Population variance is unknown.

Key Types:

One-Sample T-Test:

  • Compares a sample mean to a known population mean.
  • Formula: t = \frac{\bar{x} - \mu}{s / \sqrt{n}}

Independent T-Test: Compares means of two independent groups.

Paired T-Test: Compares means from the same group at two different times.
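A sketch of the one-sample statistic computed by hand; the sample and the claimed mean of 100 are assumptions (scipy.stats.ttest_1samp would give the same t plus a p-value):

```python
import statistics

def one_sample_t(sample, mu):
    """t = (x̄ - μ) / (s / √n), with df = n - 1"""
    n = len(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    return (statistics.mean(sample) - mu) / (s / n**0.5)

sample = [102, 98, 105, 97, 101, 99, 104, 103]
print(round(one_sample_t(sample, 100), 3))  # ≈ 1.097
# Compare with the t critical value for df = 7 (±2.365 at α = 0.05):
# 1.097 < 2.365, so the difference is not significant here
```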

Chi-Square Test:

A chi-square (χ²) test assesses whether there is a significant relationship between two categorical variables. It compares observed frequencies against expected frequencies to determine whether the results are likely to have occurred by chance.

Example: When tossing a coin, the test can show if heads or tails appear disproportionately often, suggesting that the result isn't just random.

Formula:

\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}

where:

  • O_i = observed frequency
  • E_i = expected frequency
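A sketch for the coin example above, assuming 100 tosses with 60 heads observed (the counts are illustrative):

```python
# Observed: 60 heads, 40 tails; a fair coin would give 50/50
observed = [60, 40]
expected = [50, 50]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 4.0 -- exceeds 3.841 (the χ² critical value for
             # df = 1, α = 0.05), so the coin looks biased
```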
