Probability Distribution
A frequency distribution describes a specific sample or dataset. It’s the number of times each
possible value of a variable occurs in the dataset.
The number of times a value occurs in a sample is determined by its probability of occurrence.
Probability is a number between 0 and 1 that says how likely something is to occur: 0 means it’s impossible, and 1 means it’s certain.
The higher the probability of a value, the higher its frequency in a sample.
More specifically, the probability of a value is its relative frequency in an infinitely large sample.
Infinitely large samples are impossible in real life, so probability distributions are theoretical.
They’re idealized versions of frequency distributions that aim to describe the population the
sample was drawn from.
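The link between probability and relative frequency can be illustrated with a short simulation. This is only a sketch: it assumes a fair coin (probability of heads of 0.5), uses a fixed seed so the run is repeatable, and the function name `relative_frequency_of_heads` is made up for this example.

```python
import random

def relative_frequency_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    # Larger samples tend to give a relative frequency closer to 0.5.
    print(n, relative_frequency_of_heads(n, rng))
```

No finite sample reaches the probability exactly, which is why the probability distribution is described as theoretical.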
Probability distributions are used to describe the populations of real-life variables, like coin
tosses or the weight of chicken eggs. They’re also used in hypothesis testing to
determine p values.
Example: Probability distributions are idealized frequency distributions
Imagine that an egg
farmer wants to know the probability of an egg from her farm being a certain size.
The farmer weighs 100 random eggs and describes their frequency distribution using a
histogram:
She can get a rough idea of the probability of different egg sizes directly from this frequency
distribution. For example, she can see that there’s a high probability of an egg being around 1.9
oz., and there’s a low probability of an egg being bigger than 2.1 oz.
Suppose the farmer wants more precise probability estimates. One option is to improve her
estimates by weighing many more eggs.
A better option is to recognize that egg size appears to follow a common probability distribution
called a normal distribution. The farmer can make an idealized version of the egg weight
distribution by assuming the weights are normally distributed:
Since normal distributions are well understood by statisticians, the farmer can calculate precise
probability estimates, even with a relatively small sample size.
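A minimal sketch of the farmer's second option, using Python's standard library. The farmer's actual measurements are not given, so the 100 weights below are simulated; the mean of 1.9 oz. and standard deviation of 0.1 oz. used to generate them are assumptions made only for illustration.

```python
import random
from statistics import NormalDist

# Hypothetical data: 100 simulated egg weights in ounces.
# The generating mean (1.9) and standard deviation (0.1) are assumptions.
rng = random.Random(1)
weights = [rng.gauss(1.9, 0.1) for _ in range(100)]

# Idealize the sample as a normal distribution with the sample's mean and standard deviation.
fitted = NormalDist.from_samples(weights)

# Probability that an egg weighs more than 2.1 oz. under the fitted model.
p_over_2_1 = 1 - fitted.cdf(2.1)
print(f"mean={fitted.mean:.3f}, sd={fitted.stdev:.3f}, P(X > 2.1)={p_over_2_1:.3f}")
```

The estimate is only as good as the assumption that egg weights really are normally distributed.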
Variables that follow a probability distribution are called random variables. There’s special
notation you can use to say that a random variable follows a specific distribution:
For example, the notation $X \sim N(\mu, \sigma^2)$ means “the random variable X follows a normal distribution
with a mean of µ and a variance of σ².”
Discrete probability distributions only include the probabilities of values that are possible. In
other words, a discrete probability distribution doesn’t include any values with a probability of
zero. For example, a probability distribution of dice rolls doesn’t include 2.5 since it’s not a
possible outcome of dice rolls.
The probabilities of all possible values in a discrete probability distribution add up to one. It’s
certain (i.e., a probability of one) that an observation will have one of the possible values.
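The two properties above can be checked mechanically. The sketch below assumes a fair six-sided die; the helper name `is_valid_discrete_distribution` is illustrative, and exact fractions are used so the sum is not affected by floating-point rounding.

```python
from fractions import Fraction

# Probability distribution of one roll of a fair six-sided die (fairness is assumed).
die = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_discrete_distribution(dist):
    """Check that every listed value has positive probability and the total is one."""
    return all(p > 0 for p in dist.values()) and sum(dist.values()) == 1

print(is_valid_discrete_distribution(die))  # True
print(2.5 in die)                           # False: 2.5 is not a possible roll
```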
Probability tables
A probability table represents the discrete probability distribution of a categorical variable.
Probability tables can also represent a discrete variable with only a few possible values or a
continuous variable that’s been grouped into class intervals.
“Greetings, human!” .6
“Hi!” .1
“Howdy!” .1
Notice that all the probabilities are greater than zero and that they sum to one.
Example: Probability mass function
Imagine that the number of sweaters owned per person in the
United States follows a Poisson distribution.
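The probability mass function of the distribution is given by the formula:
$$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$$
Where:
• P(X = k) is the probability that a person owns exactly k sweaters
• e is Euler’s number (approximately 2.718)
• λ is the average number of sweaters per person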
The probability that a person owns zero sweaters is .05, the probability that they own one
sweater is .15, and so on. If you add together all the probabilities for every possible number of
sweaters a person can own, it will equal exactly 1.
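As a rough check, the probabilities quoted above can be reproduced with a few lines of Python. The mean of the distribution is not stated in the text; a value of 3 sweaters per person is inferred here because it gives .05 and .15 after rounding, so treat it as an assumption.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3  # assumed mean; inferred from the quoted probabilities, not given in the text
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 2))

# Summing over a long range of counts gets arbitrarily close to 1.
total = sum(poisson_pmf(k, lam) for k in range(100))
print(total)
```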
Distribution | Description | Example
Binomial | Describes variables with two possible outcomes. It’s the probability distribution of the number of successes in n trials with p probability of success. | The number of times a coin lands on heads when you toss it five times
Discrete uniform | Describes events that have equal probabilities. | The suit of a randomly drawn playing card
Poisson | Describes count data. It gives the probability of an event happening k number of times within a given interval of time or space. | The number of text messages received per day
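To make the binomial entry concrete, here is a sketch of the distribution of the number of heads in five coin tosses. It assumes a fair coin (p = 0.5), which the table does not specify, and the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.5  # five tosses of a coin assumed to be fair
heads_distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
for k, prob in heads_distribution.items():
    print(k, prob)
```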
A continuous variable can have any value between its lowest and highest values. Therefore,
continuous probability distributions include every number in the variable’s range.
The probability that a continuous variable will have any specific value is so infinitesimally small
that it’s considered to have a probability of zero. However, the probability that a value will fall
within a certain interval of values within its range is greater than zero.
In graph form, a probability density function is a curve. You can determine the probability that a
value will fall within a certain interval by calculating the area under the curve within that
interval. You can use reference tables or software to calculate the area.
The area under the whole curve is always exactly one because it’s certain (i.e., a probability of
one) that an observation will fall somewhere in the variable’s range.
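Example: Probability density function
The probability density function of the normal distribution of egg weight is given by the formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Where:
• f(x) is the probability density at an egg weight of x
• µ is the mean egg weight
• σ is the standard deviation of egg weight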
The probability of an egg being exactly 2 oz. is zero. Although an egg can weigh very close to 2
oz., it is extremely improbable that it will weigh exactly 2 oz. Even if a regular scale measured
an egg’s weight as being 2 oz., an infinitely precise scale would find a tiny difference between
the egg’s weight and 2 oz.
The probability that an egg is within a certain weight interval, such as between 1.98 and 2.04 oz., is
greater than zero and can be represented in the graph of the probability density function as a
shaded region:
The shaded region has an area of .09, meaning that there’s a probability of .09 that an egg will
weigh between 1.98 and 2.04 oz. The area was calculated using statistical software.
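The following sketch shows one way software can compute such an area. The mean and standard deviation of the egg weight distribution are not stated in the text, so µ = 1.9 and σ = 0.1 are hypothetical values and the result will not necessarily match the .09 quoted above. The helper `area_under_pdf` is an illustrative trapezoidal-rule integrator, not a library function.

```python
from statistics import NormalDist

mu, sigma = 1.9, 0.1  # hypothetical parameters; the text does not state them
eggs = NormalDist(mu, sigma)

def area_under_pdf(dist, low, high, steps=10_000):
    """Approximate the area under dist.pdf between low and high (trapezoidal rule)."""
    width = (high - low) / steps
    inner = sum(dist.pdf(low + i * width) for i in range(1, steps))
    return width * (0.5 * dist.pdf(low) + inner + 0.5 * dist.pdf(high))

p_numeric = area_under_pdf(eggs, 1.98, 2.04)
p_exact = eggs.cdf(2.04) - eggs.cdf(1.98)  # same area via the cumulative distribution function
p_point = eggs.cdf(2.0) - eggs.cdf(2.0)    # an interval of zero width has zero area
print(p_numeric, p_exact, p_point)
```

The two interval results agree closely, and the zero-width interval illustrates why the probability of any exact weight is zero.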
Distribution | Description | Example
Normal distribution | Describes data with values that become less probable the farther they are from the mean, with a bell-shaped probability density function. | SAT scores
Continuous uniform | Describes data for which equal-sized intervals have equal probability. | The amount of time cars wait at a red light
Log-normal | Describes right-skewed data. It’s the probability distribution of a random variable whose logarithm is normally distributed. | The average body weight of different mammal species
Exponential | Describes data that has higher probabilities for small values than large values. It’s the probability distribution of time between independent events. | Time between earthquakes
• If you have a formula describing the distribution, such as a probability density function,
the expected value is often given by the µ parameter, as it is for the normal distribution.
If there’s no µ parameter, the expected value can be calculated from the other parameters
using equations that are specific to each distribution.
• If you have a sample, then the mean of the sample is an estimate of the expected value of
the population’s probability distribution. The larger the sample size, the better the
estimate will be.
• If you have a probability table, you can calculate the expected value by multiplying
each possible outcome by its probability, and then summing these values.
Example: Expected value
American robins lay between two and four eggs in their nests. Imagine
that this probability table describes the probability distribution of the number of robin eggs per
nest:
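Eggs (x) | Probability (P(x))
2 | 0.2
3 | 0.5
4 | 0.3
Multiply each possible outcome by its probability:
Eggs (x) | Probability (P(x)) | x * P(x)
2 | 0.2 | 2 * 0.2 = 0.4
3 | 0.5 | 3 * 0.5 = 1.5
4 | 0.3 | 4 * 0.3 = 1.2
Sum the values to get the expected value:
E(x) = 0.4 + 1.5 + 1.2
E(x) = 3.1 eggs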
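The robin calculation can be sketched in Python as follows. The probability table comes from the example; the simulated sample of 10,000 nests is an illustrative addition showing that a sample mean estimates the expected value, and the function name `expected_value` is made up for this sketch.

```python
import random

# Probability table from the robin example: eggs per nest -> probability.
robin_eggs = {2: 0.2, 3: 0.5, 4: 0.3}

def expected_value(table):
    """Sum of each outcome multiplied by its probability."""
    return sum(x * p for x, p in table.items())

ev = expected_value(robin_eggs)

# Simulated sample of nests (illustrative only); its mean estimates the expected value.
rng = random.Random(7)
sample = rng.choices(list(robin_eggs), weights=list(robin_eggs.values()), k=10_000)
sample_mean = sum(sample) / len(sample)
print(round(ev, 2), sample_mean)
```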
• If you have a formula describing the distribution, such as a probability density function,
the standard deviation is sometimes given by the σ parameter. If there’s no σ parameter,
the standard deviation can often be calculated from other parameters using formulas that
are specific to each distribution.
• If you have a sample, the standard deviation of the sample is an estimate of the standard
deviation of the population’s probability distribution. The larger the sample size, the
better the estimate will be.
• If you have a probability table, you can calculate the standard deviation by calculating
the deviation between each value and the expected value, squaring it, multiplying it by its
probability, and then summing the values and taking the square root.
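For distributions whose formulas have no µ or σ parameter, the expected value and standard deviation follow from the other parameters. The sketch below collects the standard results for four of the distributions mentioned in this article; the function names are illustrative, and the exponential distribution is assumed to be parameterized by its rate.

```python
import math

def binomial_mean_sd(n, p):
    """Binomial(n, p): mean n*p, standard deviation sqrt(n*p*(1-p))."""
    return n * p, math.sqrt(n * p * (1 - p))

def poisson_mean_sd(lam):
    """Poisson(lam): mean lam, standard deviation sqrt(lam)."""
    return lam, math.sqrt(lam)

def exponential_mean_sd(rate):
    """Exponential with the given rate: mean and standard deviation are both 1/rate."""
    return 1 / rate, 1 / rate

def continuous_uniform_mean_sd(a, b):
    """Continuous uniform on [a, b]: mean (a+b)/2, standard deviation (b-a)/sqrt(12)."""
    return (a + b) / 2, (b - a) / math.sqrt(12)

# Five tosses of a fair coin: expected number of heads and its standard deviation.
print(binomial_mean_sd(5, 0.5))
```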
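Example: Standard deviation
Calculate the deviation between each value and the expected value:
Eggs (x) | Probability (P(x)) | x − E(x)
2 | 0.2 | 2 − 3.1 = −1.1
3 | 0.5 | 3 − 3.1 = −0.1
4 | 0.3 | 4 − 3.1 = 0.9
Square each deviation and multiply it by its probability:
Eggs (x) | Probability (P(x)) | (x − E(x))² * P(x)
2 | 0.2 | (−1.1)² * 0.2 = 0.242
3 | 0.5 | (−0.1)² * 0.5 = 0.005
4 | 0.3 | (0.9)² * 0.3 = 0.243
Sum the values and take the square root:
σ = √(0.242 + 0.005 + 0.243)
σ = √(0.49)
σ = 0.7 eggs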
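The same steps can be sketched in Python using the robin probability table from the expected value example. The function name `table_standard_deviation` is illustrative.

```python
import math

# Probability table from the robin example: eggs per nest -> probability.
robin_eggs = {2: 0.2, 3: 0.5, 4: 0.3}

def table_standard_deviation(table):
    """Square root of the probability-weighted squared deviations from the expected value."""
    mean = sum(x * p for x, p in table.items())
    variance = sum((x - mean) ** 2 * p for x, p in table.items())
    return math.sqrt(variance)

sd = table_standard_deviation(robin_eggs)
print(round(sd, 2))
```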
All hypothesis tests involve a test statistic. Some common examples are z, t, F, and chi-square. A
test statistic summarizes the sample in a single number, which you then compare to the null
distribution (the probability distribution of the test statistic when the null hypothesis is true)
to calculate a p value.
The p value is the probability of obtaining a value equal to or more extreme than the sample’s
test statistic, assuming that the null hypothesis is true. In practical terms, it’s the area under the
null distribution’s probability density function curve that’s equal to or more extreme than the
sample’s test statistic.
Example: Testing hypotheses using null distributions
A one-sample t test is a hypothesis test that
uses a test statistic called Student’s t. If a sample has a t of 1.7, we calculate the p value (for a
one-sided test) as the shaded area to the right of t = 1.7 in the null distribution of Student’s t:
The area, which can be calculated using calculus, statistical software, or reference tables, is equal
to .06. Therefore, p = .06 for this sample.
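As a sketch of how software might compute this area, the code below integrates the density of Student's t numerically using only the standard library. The example does not state the degrees of freedom, so df = 10 is an assumption; with that choice the result comes out close to .06. The function names are illustrative, and a statistics package would normally be used instead.

```python
import math

def student_t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def one_sided_p_value(t_stat, df, steps=2_000):
    """Area to the right of t_stat (for t_stat >= 0), using symmetry and Simpson's rule."""
    # The density is symmetric about 0, so the right tail is 0.5 minus the area from 0 to t_stat.
    h = t_stat / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0, df) + student_t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t

df = 10  # assumed degrees of freedom; the example does not state them
p = one_sided_p_value(1.7, df)
print(round(p, 3))
```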
Common null distributions and the statistical tests that use them
Null distribution | Statistical tests
Student’s t distribution | Two-sample t test; paired t test; linear regression; Pearson correlation
F distribution | ANOVA; comparison of nested linear models
Chi-square distribution | McNemar’s test