Probability Distributions
INTRODUCTION TO QUANTITATIVE TECHNIQUES
DIPLOMA IN TAX ADMINISTRATION
Introduction
•A probability distribution shows the possible outcomes of an experiment and the
probability of each of these outcomes.
•In a probability distribution, the values of the random variable occur according to
some definite probability function.
•Probability distributions are either discrete or continuous. Discrete distributions
give the probabilities of discrete random variables while continuous
distributions give the probabilities of continuous random variables.
•A discrete variable arises from counting and can only take one of a particular set
of values.
•A continuous variable arises from measuring and takes any value within a
specified range.
Characteristics of a probability distribution
1. The probability of a particular outcome is between 0 and 1 inclusive.
2. The outcomes are mutually exclusive events.
3. The list is exhaustive. The sum of the probabilities of the various events is
equal to 1.
Example of generating a probability distribution
Suppose we are interested in the number of heads showing face up on three
tosses of a coin. This is the experiment. The possible results are zero heads, one
head, two heads and three heads. What is the probability distribution for the
number of heads?
There are eight possible outcomes to this experiment as summarised in the
following table.
Possible result  1st coin toss  2nd coin toss  3rd coin toss  Number of heads
1 T T T 0
2 T T H 1
3 T H T 1
4 T H H 2
5 H T T 1
6 H T H 2
7 H H T 2
8 H H H 3
The probability distribution of this experiment is summarised in the table
below:-
Number of heads (x) Probability of outcome, P(x)
0 1/8 = 0.125
1 3/8 = 0.375
2 3/8 = 0.375
3 1/8 = 0.125
Total 8/8 = 1
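The table above can be reproduced by brute-force enumeration. The sketch below lists all 2³ = 8 equally likely toss sequences, counts the heads in each, and checks two of the characteristics listed earlier (each probability lies between 0 and 1, and the probabilities sum to 1). The names `outcomes` and `dist` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely sequences of three tosses, e.g. ('T', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# How many sequences give 0, 1, 2 or 3 heads
head_counts = Counter(seq.count("H") for seq in outcomes)

# P(x) = (sequences with x heads) / (total number of sequences)
dist = {x: Fraction(n, len(outcomes)) for x, n in sorted(head_counts.items())}

# Characteristics of a probability distribution
assert all(0 <= p <= 1 for p in dist.values())  # each P(x) lies between 0 and 1
assert sum(dist.values()) == 1                  # the list is exhaustive

for x, p in dist.items():
    print(x, p, float(p))
```

Exact fractions are used so that the check "sum equals 1" is not affected by floating-point rounding.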
Definitions
•Random variable – A quantity resulting from an experiment that, by chance,
can assume different values. Random variables are denoted by capital letters (X)
whereas any specific value of the random variable is denoted by a small letter (x).
•Probability function – A function that allows us to compute the probability of any event
that is defined in terms of the values of the random variable.
•Expected value (mean) – This is the long-run average value of a random variable.
•Variance – This is a measure of how widely the values of a random variable are spread about
the mean: the probability-weighted average of the squared deviations from the mean.
Discrete Probability distributions
Mean (Expected Value)
The mean or expected value is a typical value used to represent the central
location of a probability distribution.
It is also the long run average value of the random variable.
It is the weighted average where the possible values of a random variable are
weighted by their corresponding probabilities of occurrence.
The mean (expected value) of a discrete probability distribution is computed as:-
◦ μ = Σ[xP(x)]
Variance and Standard Deviation
The variance describes the amount of spread of a probability distribution.
The variance of a discrete probability distribution is computed as:-
◦ σ² = Σ[(x − μ)²P(x)]
The standard deviation σ is obtained by taking the positive square root of the
variance, that is:-
◦ σ = √σ²
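As a small sketch, the three formulas translate directly into Python. Representing a distribution as a dictionary from each value x to P(x) is an illustrative choice, not something the notes prescribe; the example reuses the number-of-heads distribution from three coin tosses.

```python
import math

def mean(dist):
    """μ = Σ[x P(x)] for a dict mapping each value x to P(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """σ² = Σ[(x − μ)² P(x)]."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def std_dev(dist):
    """σ is the positive square root of the variance."""
    return math.sqrt(variance(dist))

# Number of heads in three tosses of a coin
heads = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(mean(heads), variance(heads), round(std_dev(heads), 4))  # 1.5 0.75 0.866
```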
Example
Mrs Hadija is an employee of KRA. Part of her work entails reviewing the returns filed by
taxpayers. She has developed the following probability distribution for the number of
returns she expects to review in a particular week.
Number of returns reviewed, x Probability, P(x)
0 0.10
1 0.20
2 0.30
3 0.30
4 0.10
Total 1
a. On a typical week, how many returns does Mrs. Hadija expect to review?
b. What is the variance of the distribution?
Solution
a. The mean number (expected number) of returns to be reviewed is computed
by weighting the number of returns reviewed by the probability of reviewing that
number of returns, and then summing up the products:-
μ = Σ[xP(x)]
= 0(0.10) + 1(0.20) + 2(0.30)+ 3(0.30) + 4(0.10)
= 2.1
The expected value of 2.1 indicates that over a large number of weeks, Mrs
Hadija expects to review 2.1 taxpayer returns a week. This expected value can be
used to predict the arithmetic mean number of returns reviewed per week in the
long run. For example, if Mrs Hadija works for 50 weeks in a year, then she can
expect to review 50*2.1=105 returns.
b. The variance is calculated as shown in the table below:-
No of returns reviewed, x Probability, P(x) (x- μ) (x- μ)2 (x- μ)2P(x)
0 0.10 -2.1 4.41 0.441
1 0.20 -1.1 1.21 0.242
2 0.30 -0.1 0.01 0.003
3 0.30 0.9 0.81 0.243
4 0.10 1.9 3.61 0.361
σ² = 1.290
The standard deviation, σ, is √1.290 ≈ 1.136
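The worked solution can be checked row by row. This sketch rebuilds the variance table for Mrs Hadija's distribution, then prints the mean, variance, standard deviation and the 50-week projection; small floating-point differences (for example 0.8999999999999999 instead of 0.9) disappear in the rounded output.

```python
import math

# Mrs Hadija's distribution: number of returns reviewed per week -> probability
dist = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.30, 4: 0.10}

mu = sum(x * p for x, p in dist.items())  # expected value, μ = Σ[xP(x)]

# Rebuild the table: x, P(x), (x − μ), (x − μ)², (x − μ)²P(x)
variance = 0.0
for x, p in dist.items():
    dev = x - mu
    contribution = dev ** 2 * p
    variance += contribution
    print(f"{x}  {p:.2f}  {dev:5.1f}  {dev ** 2:.2f}  {contribution:.3f}")

sigma = math.sqrt(variance)
print(f"mu = {mu:.1f}, variance = {variance:.3f}, sigma = {sigma:.3f}")
print(f"Expected returns over 50 weeks: {50 * mu:.0f}")
```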
Counting Techniques
When computing the probability of an event, or of a combination of events, and the total number
of possible outcomes is large, it is convenient to have methods for counting these outcomes.
These counting techniques are:
1. Factorials
2. Permutations
3. Combinations
Factorials
Given the positive integer n, the product of all the whole numbers from n down through 1
is called n factorial and is written as n! Therefore n! = n*(n-1)*(n-2)…3*2*1
For example 6! = 6*5*4*3*2*1 = 720
Note that 0! = 1
Permutations
Permutations are also called arrangements. Each arrangement that can be made by taking
some or all of a number of things, in a definite order, is called a permutation.
A permutation is any ordered arrangement of r objects selected from a single group of n
possible objects.
The permutation formula is given as:
$$ {}_nP_r = \frac{n!}{(n-r)!} $$
(Note that the arrangements a b c, b c a, c a b, a c b, c b a and b a c are different
permutations.)
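A short sketch of the factorial and permutation formulas in Python. The helper `n_p_r` is a hypothetical name used for illustration; `math.factorial` and `math.perm` (Python 3.8+) are standard library functions, and `itertools.permutations` lists the arrangements themselves.

```python
import math
from itertools import permutations

def n_p_r(n, r):
    """nPr = n! / (n - r)!, computed from factorials."""
    return math.factorial(n) // math.factorial(n - r)

print(math.factorial(6), math.factorial(0))   # 720 1

# All ordered arrangements of the three letters a, b, c
arrangements = ["".join(p) for p in permutations("abc", 3)]
print(arrangements)   # six different permutations

print(n_p_r(3, 3))                  # 3! / 0! = 6
print(n_p_r(6, 2), math.perm(6, 2)) # 6! / 4! = 30
```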
Combinations
Combinations are also called selections. A combination of a number of objects is
a selection of these objects, considered without regard to their order.
The combination formula is given as:
$$ {}_nC_r = \frac{n!}{(n-r)!\,r!} $$
(Note that the arrangements a b c, b c a, c a b, a c b, c b a and b a c are all the same
combination.)
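The contrast with permutations can be sketched the same way. Here `n_c_r` is a hypothetical helper built from the factorial formula and cross-checked against `math.comb` (Python 3.8+); `itertools.combinations` shows that taking all of a, b, c yields a single selection.

```python
import math
from itertools import combinations

def n_c_r(n, r):
    """nCr = n! / ((n - r)! r!), computed from factorials."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

# Order is ignored, so choosing all three of a, b, c gives one selection
print(list(combinations("abc", 3)))   # [('a', 'b', 'c')]
print(n_c_r(3, 3))                    # 1

# Choosing 2 items out of 5: 5! / (3! 2!) = 10
print(n_c_r(5, 2), math.comb(5, 2))   # 10 10
```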
Binomial Distribution
This is a widely occurring discrete probability distribution with the following
characteristics / assumptions:-
1. The process is performed under the same conditions for a fixed and finite
number of trials (usually indicated as n).
2. Each trial is independent of other trials, i.e. the probability of an outcome for
any particular trial is not influenced by the outcomes of the other trials.
3. Each trial has two mutually exclusive possible outcomes, such as ‘success’ or
‘failure’, ‘good’ or ‘defective’, ‘yes’ or ‘no’, ‘hit’ or ‘miss’. The outcomes are
usually called success and failure for convenience.
4. The probability of success, p, remains constant from trial to trial (and so does the
probability of failure, q, where q = 1 – p).
Computing the Binomial Probability
To compute a particular binomial probability, we use:-
a) the number of trials
b) the probability of success on each trial
c) the number of successes required
A binomial probability is computed by the formula:
$$P(x) = {}_nC_x \, p^x (1-p)^{n-x}$$
Where:
C denotes a combination
n is the number of trials
x is the random variable defined as the number of successes
p is the probability of a success in each trial
Example
Jambo Jet has five daily flights from Nairobi to Mombasa. Suppose the probability that any
flight arrives late is 0.20. What is the probability that none of the flights are late on a
particular day? What is the probability that exactly one of the flights is late on a particular
day?
Solution
In the case of the first question, p = 0.2, n = 5, x = 0
$$P(0) = {}_5C_0 (0.2)^0 (0.8)^5 = 1 \times 1 \times 0.32768 = 0.32768$$
In the case of the second question, p = 0.2, n = 5, x = 1
$$P(1) = {}_5C_1 (0.2)^1 (0.8)^4 = 5 \times 0.2 \times 0.4096 = 0.4096$$
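A minimal sketch of the binomial formula applied to the Jambo Jet example, using `math.comb` (Python 3.8+) for nCx. The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.20   # five daily flights, each late with probability 0.20

print(round(binomial_pmf(0, n, p), 5))   # no flight late
print(round(binomial_pmf(1, n, p), 5))   # exactly one flight late
```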
Mean and Variance of a Binomial Distribution
The mean of a binomial distribution is given by the formula:
μ = n*p
Where n is the number of trials
p is the probability of a success in each trial
The variance of a binomial distribution is given by the formula:
σ² = n*p*(1-p)
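The shortcut formulas can be checked numerically. This sketch builds the full binomial distribution for the Jambo Jet figures (n = 5, p = 0.20), computes the mean and variance with the general discrete formulas μ = Σ[xP(x)] and σ² = Σ[(x − μ)²P(x)], and compares them with n*p and n*p*(1-p).

```python
from math import comb

n, p = 5, 0.20   # Jambo Jet example: five flights, P(late) = 0.20

# Full distribution of the number of late flights, x = 0..5
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# General discrete formulas ...
mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean) ** 2 * px for x, px in pmf.items())

# ... agree, up to floating-point rounding, with the binomial shortcuts
print(round(mean, 6), round(n * p, 6))            # 1.0 1.0
print(round(var, 6), round(n * p * (1 - p), 6))   # 0.8 0.8
```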
Poisson Probability Distribution
The Poisson probability distribution describes the number of times some event
occurs during a specified interval. The interval may be time, distance, area, or
volume.
The distribution is based on the following assumptions:
1. The probability (of the event occurring) is proportional to the length of the
interval. (The longer the interval, the larger the probability)
2. The intervals are independent. (The number of occurrences in one interval
does not affect the other intervals)
The Poisson distribution is also a limiting form of the binomial distribution
when the probability of success is very small and n is large.
It is often referred to as the ‘law of improbable events’, meaning that the
probability, p, of a particular event happening is quite small.
The Poisson distribution is a discrete probability distribution because it is
formed by counting.
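A quick numerical sketch of this limiting relationship. Holding the mean μ = n*p fixed at an illustrative value of 2 while n grows (so p shrinks), the largest gap between the binomial probabilities and the Poisson probabilities P(x) = μ^x e^(−μ)/x! gets smaller.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    # nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    # mu^x * e^(-mu) / x!
    return mu ** x * exp(-mu) / factorial(x)

mu = 2.0  # keep n * p fixed while n grows and p shrinks
for n in (10, 100, 1000, 10000):
    p = mu / n
    gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(11))
    print(f"n = {n:>5}, p = {p:.4f}, largest gap over x = 0..10: {gap:.6f}")
```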
Characteristics of the Poisson distribution
1. The random variable is the number of times some event occurs during a
defined interval.
2. The probability of the event is proportional to the size of the interval.
3. The intervals do not overlap and are independent.
Uses (applications) of Poisson distribution
This probability distribution has many applications. Some of these are:
•Number of customers arriving at a service facility in unit time (say, per hour)
•Number of telephone calls arriving at a telephone switchboard per unit time (per minute)
•Number of printing mistakes per page in a book
•Number of radioactive particles decaying in a given interval of time
•Dimensional errors in an engineering drawing
•Number of accidents on a particular road per day
•Number of hospital emergencies per day
•Number of goals in a football match
•Number of defective parts in outgoing shipments
Formula of Poisson Distribution
The Poisson distribution can be described mathematically by the formula:
$$P(x) = \frac{\mu^x e^{-\mu}}{x!}, \qquad x = 0, 1, 2, \ldots$$
Where:
μ is the mean number of occurrences in a particular interval
e is the constant (base of natural log) approximately equal to 2.71828
x is the number of occurrences
P(x) is the probability for a specified value of x
Mean and variance of Poisson distribution
When the Poisson distribution is used as the limiting form of the binomial distribution, the
mean number of occurrences is μ = n*p, where n is the total number of trials and p the
probability of success.
The variance of the Poisson distribution is equal to its mean, i.e. σ² = μ.
Example
Assume luggage is rarely lost by Kenya Airways. Most flights do not experience
any mishandled bags; some have one bag lost; a few have two bags lost; rarely
will a flight have three lost bags; and so on. Suppose a random sample of 1,000 flights
shows a total of 300 bags were lost. Thus the arithmetic mean number of lost
bags per flight is 0.3 found by 300/1000. Thus in this case μ is 0.3. We can
therefore compute the various probabilities using the Poisson distribution
formula.
For example, the probability of not losing any bags is:
$$P(0) = \frac{(0.3)^0 e^{-0.3}}{0!} = 0.7408$$
i.e. about 74% of the flights will have no lost luggage.
The probability of exactly one lost bag is:
$$P(1) = \frac{(0.3)^1 e^{-0.3}}{1!} = 0.2222$$
i.e. we would expect to find exactly one lost bag in about 22% of the flights.
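A minimal sketch of the Poisson formula applied to the lost-luggage example, printing the probabilities of 0 to 3 lost bags on a flight. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = mu**x * e**(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)

mu = 300 / 1000   # 300 lost bags over 1,000 flights gives 0.3 per flight

for x in range(4):
    print(f"P({x} lost bags) = {poisson_pmf(x, mu):.4f}")
```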
Example
Trucks arrive at the Gilgil weighbridge at the rate of two per minute. The distribution of
arrivals approximates a Poisson distribution.
a) What is the probability that no trucks arrive in a particular minute?
b) What is the probability that at least one truck arrives during a particular minute?
Solution
a) μ = 2 and x = 0, therefore
$$P(0) = \frac{2^0 e^{-2}}{0!} = 0.1353$$
b) Probability of at least 1 truck = P(1) + P(2) + P(3) +…
= 1-P(0)
= 1-0.1353
= 0.8647
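The complement rule used in part (b) can be sketched in a few lines: 1 − P(0) gives the same answer as summing P(1) + P(2) + … directly. Truncating the sum at x = 50 is an assumption of the sketch; the omitted terms are negligibly small.

```python
from math import exp, factorial

mu = 2   # mean number of truck arrivals per minute

p_none = exp(-mu)              # P(0) = 2**0 * e**(-2) / 0!, which is just e**(-2)
p_at_least_one = 1 - p_none    # complement rule: 1 - P(0)

# Same answer by summing P(1) + P(2) + ... directly (terms beyond x = 50 are negligible)
tail = sum(mu ** x * exp(-mu) / factorial(x) for x in range(1, 51))

print(round(p_none, 4), round(p_at_least_one, 4), round(tail, 4))
```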
Continuous Probability Distributions
A continuous probability distribution usually results from measuring something
such as distance, weight, amount etc.
When examining a continuous distribution we are usually interested in the probability
that observations fall within a certain range (less than, more than, or between given
values).
A continuous random variable has an infinite number of values within a
particular range. We therefore calculate the probability a variable will have a
value within a specified range rather than the probability for a specific value.
The Normal Probability Distribution
The normal distribution has the following (complex) formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Where:
◦ μ is the mean
◦ σ is the standard deviation
◦ π is the constant (pi) whose value is approximately 22/7 or 3.1416
◦ e is the constant (base of natural log) approximately equal to 2.71828
◦ x is the value of a random variable
A normal distribution is completely defined by its mean and standard deviation.
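A short sketch of the normal formula as a Python function, cross-checked against `statistics.NormalDist.pdf` (Python 3.8+). The mean of 1,000 and standard deviation of 100 are illustrative values. Note that f(x) is the height of the curve, not a probability; probabilities come from areas under the curve.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))"""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Illustrative parameters: mean 1,000 and standard deviation 100
mu, sigma = 1000, 100
for x in (800, 900, 1000, 1100, 1200):
    print(x, normal_pdf(x, mu, sigma))

# Cross-check against the standard library
assert math.isclose(normal_pdf(1100, mu, sigma), NormalDist(mu, sigma).pdf(1100))
```

The printed heights are symmetric about the mean and largest at x = μ, matching the bell shape described next.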
Characteristics of the normal distribution
•It is bell-shaped and has a single peak at the centre of the distribution. The arithmetic
mean, median and mode are all equal and located in the centre of the distribution. The
total area under the curve is 1.00. Half of the area under the normal curve is to the right
of this centre point and the other half to the left of it.
•It is symmetrical about the mean. If we cut the normal curve vertically at the centre
value, the two halves will be mirror images.
•It falls off smoothly in either direction from the central value but never touches the X-
axis. That is, the distribution is asymptotic: The curve gets closer and closer to the X-
axis but never actually touches it. The tails of the curve extend indefinitely in both
directions.
•The location of a normal distribution is determined by μ (its mean) and the dispersion or
spread is determined by σ (its standard deviation)
There is not just one normal probability distribution but rather a ‘family’ of them.
[Figure: normal distribution curves with (i) the same mean and different variances, (ii) different means and the same variance, (iii) different means and different variances]
•For continuous probability distributions, areas below the curve define probabilities.
•The total area under the normal curve is 1.0. This accounts for all possible outcomes.
•Because the normal probability distribution is symmetric, the area under the curve to the
left of the mean is 0.5, and the area under the curve to the right of the mean is 0.5.
•To determine the probability that a value falls between two values, we need to know about
the standard normal distribution.
The Standard Normal Probability Distribution
The number of normal distributions is unlimited, each having a different mean, standard
deviation or both.
In order to determine the probabilities of all these normal distributions, we use the
standard normal probability distribution.
It is unique because it has a mean of 0 and a standard deviation of 1.
Any normal probability distribution can be converted into a standard normal probability
distribution by subtracting the mean from each observation and dividing this difference by
the standard deviation. The results are called z values or z scores.
$$z = \frac{x-\mu}{\sigma}$$
A z value is the distance from the mean, measured in units of the standard deviation.
Once the normally distributed observations are standardised, the z values are normally
distributed with mean 0 and standard deviation of 1.
So the z distribution has all the characteristics of any normal probability distribution.
Example
The monthly income of shift foremen on the Nairobi expressway project follows the
normal distribution with mean $1,000 and standard deviation of $100. What is the z
value for the income of a foreman who earns:
a) $1,100 per month
b) $900 per month
Solution
$$z = \frac{x-\mu}{\sigma}$$
a) For X = 1,100:
z = (1,100 – 1,000)/100
= 1.00
b) For X = 900:
z = (900 – 1,000)/100
= -1.00
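The z transformation is a one-line function. This sketch, with the hypothetical helper name `z_score`, reproduces both answers for the foremen's incomes.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean in units of the standard deviation."""
    return (x - mu) / sigma

mu, sigma = 1000, 100   # monthly income of shift foremen, in $

for income in (1100, 900):
    print(f"x = {income}: z = {z_score(income, mu, sigma):.2f}")
```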
The Empirical Rule
1. About 68% of the area under the normal curve is within one standard deviation of the
mean, i.e. μ ± 1σ.
2. About 95% of the area under the normal curve is within two standard deviations of
the mean, i.e. μ ± 2σ.
3. About 99.7% (practically all) of the area under the normal curve is within three
standard deviations of the mean, i.e. μ ± 3σ.
Transforming measurements to standard normal deviates changes the scale. For
example, μ + 1σ is converted to a z value of 1.0. Likewise, μ - 2σ is transformed to a z
value of -2.0.
Note that the centre of the z distribution is zero, indicating no deviation from the mean μ.
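The three percentages in the Empirical Rule can be computed from the standard normal cumulative distribution function. This sketch uses `statistics.NormalDist` (Python 3.8+) to evaluate P(−k < Z < k) for k = 1, 2, 3.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean: P(-k < Z < k)
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"mu ± {k} sigma: {area:.4f}")
```

The printed areas (about 0.6827, 0.9545 and 0.9973) are the exact values behind the rounded 68%, 95% and 99.7%.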
Determining the probability using z-tables
Example
The monthly income of shift foremen on the Nairobi expressway project follows the
normal distribution with mean $1,000 and standard deviation of $100. What is the
likelihood of selecting a foreman who:
a) Earns between $1,000 and $1,100?
b) Earns less than $1,100?
c) Earns between $790 and $1,200?
d) Earns more than $1,050?
Solution
The first step in determining the probability is to get the z value corresponding to the
desired income level. z=(x – μ)/σ
The next step is to read the probability associated with the calculated z value from the z-
tables (standard normal distribution tables)
Income values Z values Reading from the table Required probability
P(1000<X<1,100) P(0<Z<1) P(z<1)-P(z<0) = 0.8413-0.5 0.3413
P(X<1,100) P(Z<1) P(z<1) = 0.8413 0.8413
P(790<X<1,200) P(-2.1<Z<2) P(z<2)-P(z<-2.1) = 0.9772-0.0179 0.9593
P(X>1,050) P(Z>0.5) 1-P(z<0.5) = 1-0.6915 0.3085
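The table can also be checked without printed z-tables. This sketch uses Python's `statistics.NormalDist` (3.8+); because the library evaluates the cumulative distribution function directly rather than subtracting rounded table entries, a result may differ from the table in the fourth decimal place (about 0.9594 rather than 0.9593 for the third row).

```python
from statistics import NormalDist

income = NormalDist(mu=1000, sigma=100)   # monthly income of shift foremen, in $

p_a = income.cdf(1100) - income.cdf(1000)   # P(1,000 < X < 1,100)
p_b = income.cdf(1100)                      # P(X < 1,100)
p_c = income.cdf(1200) - income.cdf(790)    # P(790 < X < 1,200)
p_d = 1 - income.cdf(1050)                  # P(X > 1,050)

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(label, round(p, 4))
```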
Finding the required value of the observation given the probability
Example
Yana Tire Company wishes to set a minimum mileage guarantee on its new YK130 tire.
Tests reveal the mean mileage is 135,800 km with a standard deviation of 4,100 km and
that the distribution of kilometres follows a normal probability distribution. Yana wants
to set the minimum guaranteed mileage so that no more than 4% of the tires will have to
be replaced. What minimum guaranteed kilometres should Yana announce?
Solution
We need to find the kilometres such that only 4% or 0.04 of the tires manufactured by
Yana will get damaged (and therefore require replacement) before covering these
kilometres.
The tire will be replaced if the kilometres are less than the value X, which needs to be
determined.
[Figure: normal curve of tire mileage centred at 135,800 km; the shaded left tail below the unknown value X has an area of 4% (0.04), and tires that wear out before X km are replaced]
z=(x – μ)/σ i.e. z = (x-135,800)/4,100
Our problem has 2 unknowns, Z and X. We first find z and then solve for x.
From the z-tables, we need to find the value of z for which the area under the curve to the
left of z is 0.04.
From the tables the closest area to 0.04 is 0.0401 which corresponds to a z value of -1.75
Knowing that z is -1.75, we can now solve for x.
-1.75 = (x-135,800)/4,100
x = -1.75(4,100) +135,800
x = 128,625
Yana can therefore advertise that it will replace for free any tire that wears out before it
reaches 128,625 km, and the company will know that only about 4% of the tires will be
replaced under this plan.
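The reverse lookup (probability to value) can be done with the inverse cumulative distribution function instead of searching the table. In this sketch, `NormalDist.inv_cdf` from Python's `statistics` module (3.8+) gives z ≈ −1.7507 for a left-tail area of 0.04, so the guarantee comes out at about 128,622 km; the small difference from 128,625 km comes from rounding z to −1.75 in the table lookup.

```python
from statistics import NormalDist

mileage = NormalDist(mu=135_800, sigma=4_100)   # tire life in km

# Mileage below which only 4% of tires wear out (left-tail area 0.04)
guarantee = mileage.inv_cdf(0.04)

# The corresponding z value, for comparison with -1.75 from the tables
z = NormalDist().inv_cdf(0.04)

print(round(z, 4), round(guarantee))
```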
Importance of the Normal Distribution
1. Frequency distributions of many physical characteristics such as heights, weights,
exam scores, dimensions of items from production processes etc. often have the
shape of the normal curve.
2. The normal distribution is useful as an approximation to the various other
distributions under certain limiting conditions.
3. The normal distribution is a fairly ‘robust’ distribution, i.e. reasonable results may be
obtained by using the normal distribution to approximate many non-normal
distributions, provided these distributions do not depart too much from normality.
4. By virtue of the ‘Central Limit Theorem’, the distribution of the means of samples taken
from any population, which need not be normal, tends towards the normal
distribution as the sample size becomes large.
5. It is very useful in statistical quality control where the control limits are set by using
this distribution.
6. It has extensive uses in the sampling theory. Any statistic based on a large sample
generally follows the normal distribution. Hence it helps us to estimate population
parameters from sample statistics and to find the confidence intervals of the
parameters.
7. The normal distribution is widely used in testing statistical hypotheses and in tests of
significance in which the sample has been drawn from a normally distributed
population.
Practice questions
If 8% of the products produced by an automatic machine are defective, find the
probability that, out of 16 products selected at random, there are: (i) exactly 2
defective, (ii) at most 3 defective, (iii) at least 2 defective, (iv) between 1 and 3 defective
inclusive. (v) Find the mean and variance of the number of defective products in the
sample of size 16.
A manufacturer who produces beer bottles finds that 0.1% of the bottles are defective. The
bottles are packed in boxes containing 1,000 bottles. Using Poisson distribution, find
how many boxes will contain: (i) no defective bottles. (ii) at least 2 defective bottles.
The marks obtained in a certain examination follow the normal distribution with mean
45 and standard deviation 10. If 1,000 students appeared at the examination, calculate
the number of students scoring: (i) less than 40, (ii) more than 60, (iii) between 40 & 50
According to recent college fee records, the mean annual cost to attend college in
Kenya is Ksh 53,600. Assume the distribution of annual college fees follows the normal
probability distribution and the standard deviation is Ksh 8,000. Ninety-five percent of
all students in colleges in Kenya pay less than what amount?