MGT 2148 - Business Analytics and
Decision Making
Week 4
Professor: Mohammad Raahemi Spring 2025
Week 4 – Probability
❑ Joint, Marginal, and Conditional Probability
❑ Probability Complement Rule
❑ Probability Multiplication Rule
❑ Probability Addition Rule
❑ Probability Trees
Introduction
After completing this week, you should be able to:
▪ Define and Assign Probability to Events
▪ Compute and Interpret Joint, Marginal, and Conditional Probability
▪ Develop Probability Trees
▪ Apply Bayes’ Law
▪ Identify the Correct Method of Assigning Probability
Approaches to Assigning Probabilities
There are three ways to assign a probability, P(Oi), to
an outcome, Oi, namely:
❑ Classical approach: based on equally likely
events.
❑ Relative frequency: assigning probabilities
based on experimentation or historical data.
❑ Subjective approach: Assigning probabilities
based on the assignor’s (subjective) judgment.
Classical Approach
If an experiment has n possible outcomes, this method would assign a probability of 1/n to
each outcome. It is necessary to determine the number of possible outcomes.
Experiment: Rolling a die
Outcomes: {1, 2, 3, 4, 5, 6}
Probabilities: Each sample point has a 1/6 chance of occurring.
Classical Approach
Experiment: Rolling two dice and observing the total
Outcomes: {2, 3, …, 12}
Examples:
Sums of two dice (row = first die, column = second die):

      1   2   3   4   5   6
  1   2   3   4   5   6   7
  2   3   4   5   6   7   8
  3   4   5   6   7   8   9
  4   5   6   7   8   9  10
  5   6   7   8   9  10  11
  6   7   8   9  10  11  12

P(2) = 1/36, P(6) = 5/36, P(10) = 3/36
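These classical probabilities can be checked by enumerating all 36 equally likely ordered tosses (a Python sketch; the language choice is ours, not the slides'):

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely ordered tosses give each total.
totals = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

def p_total(t):
    """Classical probability of a total: favourable outcomes / 36."""
    return Fraction(totals[t], 36)

print(p_total(2), p_total(6), p_total(10))  # 1/36 5/36 3/36
```

Using `Fraction` keeps the answers exact, matching the table entries digit for digit.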
Relative Frequency Approach
Bits & Bytes Computer Shop tracks the number of desktop computer systems it
sells over a month (30 days):

Desktops Sold   # of Days   Relative Frequency
      0             1          1/30 = 0.03
      1             2          2/30 = 0.07
      2            10         10/30 = 0.33
      3            12         12/30 = 0.40
      4             5          5/30 = 0.17
                                  ∑ = 1.00

For example, on 10 of the 30 days, 2 desktops were sold. From this we can
construct the probabilities of an event (i.e., the number of desktops sold on
a given day). There is a 40% chance Bits & Bytes will sell 3 desktops on any
given day.
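The relative-frequency calculation above can be sketched in a few lines (Python used purely for illustration; the counts come from the Bits & Bytes table):

```python
# Desktops sold per day over 30 days: value -> number of days observed.
days_observed = {0: 1, 1: 2, 2: 10, 3: 12, 4: 5}

n_days = sum(days_observed.values())  # 30
rel_freq = {x: count / n_days for x, count in days_observed.items()}

print(rel_freq[3])  # 0.4 -> a 40% chance of selling 3 desktops on a given day
```

As an error check, the relative frequencies must sum to 1, since the categories cover every observed day exactly once.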
What is a Probability Distribution?
The probability distribution of a random variable describes the values that the random variable
can take along with the probabilities of those values.
A random variable has either a discrete or a continuous probability
distribution:

❑ Discrete probability distribution: its probability mass function provides
the probability for each value of the random variable.
❑ Continuous probability distribution: its probability density function
determines the probability with which the continuous random variable lies in
a given interval.
Subjective Approach
“In the subjective approach we define probability as the degree of belief that we hold in the
occurrence of an event”
e.g., weather forecasting’s “P.O.P.”
“Probability of Precipitation” (or P.O.P.) is defined in different ways by different forecasters, but
basically, it’s a subjective probability based on past observations combined with current
weather conditions.
POP 60% – based on current conditions, there is a 60% chance of rain (say).
Interpreting Probability
❑ No matter which method is used to assign probabilities, all probabilities
are interpreted in the relative-frequency sense.
❑ For example, consider a government lottery game where 6 numbers (of 49) are
picked. The classical approach assigns each number the same probability of
being the first one drawn: 1/49 ≈ 2.04%.
❑ We interpret this to mean that in the long run each number will be drawn
first 2.04% of the time.
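The relative-frequency interpretation can be illustrated with a short simulation (a Python sketch, not part of the slides): over many repetitions, the observed frequency of an equally likely outcome settles near its classical probability.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Roll a fair die many times and track the relative frequency of a 3.
# By the relative-frequency interpretation it should settle near 1/6.
n = 100_000
hits = sum(1 for _ in range(n) if random.randint(1, 6) == 3)
print(abs(hits / n - 1 / 6) < 0.01)  # True: within 1 percentage point of 1/6
```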
Joint, Marginal, Conditional Probability
We study methods to determine probabilities of events that result from combining other events
in various ways.
There are several types of combinations and relationships between events:
❑ Complement event
❑ Intersection of events
❑ Union of events
❑ Mutually exclusive events
❑ Dependent and independent events
Complement of an Event
The complement of event A is defined to be the event consisting of all sample points that are
“not in A”.
The complement of A is denoted by A^C.
The Venn diagram illustrates the concept of a complement:
P(A) + P(A^C) = 1
For example, the rectangle stores all the possible tosses of two dice:
{(1,1), (1,2), …, (6,6)}.
Let A = tosses totaling 7: {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
P(Total = 7) + P(Total ≠ 7) = 1
Intersection of Two Events
The intersection of events A and B is the set of all sample points that are in both A and B.
The intersection is denoted: A and B
The joint probability of A and B is the probability of the intersection of A and B, i.e. P(A and B)
For example, let A = tosses where first toss is 1 {(1,1),
(1,2), (1,3), (1,4), (1,5), (1,6)} and B = tosses where the
second toss is 5 {(1,5), (2,5), (3,5), (4,5), (5,5), (6,5)}
The intersection is {(1,5)}
The joint probability of A and B is the probability of the
intersection of A and B, i.e. P(A and B) = 1/36
Union of Two Events
The union of two events A and B, is the event containing all sample points that are in A or B or
both:
Union of A and B is denoted: A or B
For example, let A = tosses where the first toss is 1: {(1,1), (1,2), (1,3),
(1,4), (1,5), (1,6)} and B = tosses where the second toss is 5: {(1,5), (2,5),
(3,5), (4,5), (5,5), (6,5)}
The union of A and B is {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,5),
(3,5), (4,5), (5,5), (6,5)}
Mutually Exclusive Events
When two events are mutually exclusive (that is, the two events cannot occur
together), their joint probability is 0, hence:
P(A and B) = 0
In the Venn diagram, mutually exclusive events have no points in common.
For example, A = tosses totaling 7 and B = tosses totaling 11.
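The complement, intersection, union, and mutual-exclusivity ideas can all be checked on the two-dice sample space (an illustrative Python sketch; the event definitions follow the slides' examples):

```python
from fractions import Fraction

# Sample space: all 36 ordered tosses of two dice.
S = {(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)}

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(S))

A = {t for t in S if sum(t) == 7}    # tosses totaling 7
B = {t for t in S if t[1] == 5}      # second toss is 5
C = {t for t in S if sum(t) == 11}   # tosses totaling 11

print(prob(S - A))      # complement of A: P(total != 7) = 5/6
print(prob(A & B))      # intersection: only (2, 5) -> 1/36
print(prob(A | B))      # union: 6 + 6 - 1 = 11 outcomes -> 11/36
print(A & C == set())   # True: totals of 7 and 11 are mutually exclusive
```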
Example 6.1
Why are some mutual fund managers more successful than others? One possible factor is
where the manager earned his or her MBA. The following table compares mutual fund
performance against the ranking of the school where the fund manager earned their MBA:
                          Mutual fund              Mutual fund doesn't
                          outperforms the market   outperform the market
Top 20 MBA program               .11                      .29
Not top 20 MBA program           .06                      .54
Example 6.1
Alternatively, we could introduce shorthand notation to represent the events:
A1 = Fund manager graduated from a top-20 MBA program
A2 = Fund manager did not graduate from a top-20 MBA program
B1 = Fund outperforms the market
B2 = Fund does not outperform the market
e.g., P(A2 and B1) = 0.06 = the probability a fund outperforms the market and the manager isn’t from a top-
20 school.
B1 B2
A1 0.11 0.29
A2 0.06 0.54
Marginal Probabilities
Marginal probabilities are computed by adding across rows and down columns;
that is, they are calculated in the margins of the table:

        B1     B2     P(Ai)
A1     0.11   0.29    0.40
A2     0.06   0.54    0.60
P(Bj)  0.17   0.83    1.00

P(A2) = .06 + .54 = .60: "what's the probability a fund manager isn't from a
top school?"
P(B1) = .11 + .06 = .17: "what's the probability a fund outperforms the
market?"
Both margins must add to 1 (a useful error check).
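Summing across rows and down columns can be sketched directly from the joint table (a Python illustration; the dictionary layout is our own):

```python
# Joint probability table from Example 6.1 (rows A1/A2, columns B1/B2).
joint = {
    ("A1", "B1"): 0.11, ("A1", "B2"): 0.29,
    ("A2", "B1"): 0.06, ("A2", "B2"): 0.54,
}

# Marginals: for each value of one variable, sum over the other.
p_A = {a: sum(p for (ai, _), p in joint.items() if ai == a) for a in ("A1", "A2")}
p_B = {b: sum(p for (_, bj), p in joint.items() if bj == b) for b in ("B1", "B2")}

print(round(p_A["A1"], 2), round(p_A["A2"], 2))  # 0.4 0.6
print(round(p_B["B1"], 2), round(p_B["B2"], 2))  # 0.17 0.83
```

Both sets of marginals sum to 1, which is the error check noted in the slide.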
Conditional Probability
Conditional probability is used to determine how two events are related; that is, we can
determine the probability of one event given the occurrence of another related event.
Conditional probabilities are written as P(A | B), read as "the probability of
A given B", and calculated as:
P(A | B) = P(A and B) / P(B)
Conditional Probability
Again, the probability of an event given that another event has occurred is
called a conditional probability:
P(A | B) = P(A and B) / P(B)    and    P(B | A) = P(A and B) / P(A)
Note how "A given B" and "B given A" are related: both share the joint
probability P(A and B) in the numerator but divide by different marginals.
Example 6.2
In Example 6.1, What’s the probability that a fund will outperform the market given that the
manager graduated from a top-20 MBA program?
Recall:
A1 = Fund manager graduated from a top-20 MBA program
A2 = Fund manager did not graduate from a top-20 MBA program
B1 = Fund outperforms the market
B2 = Fund does not outperform the market
Thus, we want to know “what is P(B1 | A1) ?”
Example 6.2
P(B1 | A1) = P(A1 and B1) / P(A1) = 0.11 / 0.40 = 0.275
Thus, there is a 27.5% chance that a fund will outperform the market given
that the manager graduated from a top-20 MBA program.
B1 B2 P(Ai)
A1 0.11 0.29 0.40
A2 0.06 0.54 0.60
P(Bj) 0.17 0.83 1.00
Independence
One of the objectives of calculating conditional probability is to determine whether two events
are related.
In particular, we would like to know whether they are independent, that is, if the probability of
one event is not affected by the occurrence of the other event.
Two events A and B are said to be independent if:
P(A|B) = P(A) or P(B|A) = P(B)
Independence
For example, we saw that P(B1 | A1) = 0.275
The marginal probability for B1 is: P(B1) = 0.17. Since P(B1|A1) ≠ P(B1), B1 and A1 are not
independent events.
Stated another way, they are dependent. That is, the probability of one event (B1) is affected by
the occurrence of the other event (A1).
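The conditional probability and the independence check can be sketched from the same joint table (a Python illustration, not part of the slides):

```python
joint = {
    ("A1", "B1"): 0.11, ("A1", "B2"): 0.29,
    ("A2", "B1"): 0.06, ("A2", "B2"): 0.54,
}

p_A1 = joint[("A1", "B1")] + joint[("A1", "B2")]   # marginal P(A1) = 0.40
p_B1 = joint[("A1", "B1")] + joint[("A2", "B1")]   # marginal P(B1) = 0.17

# Conditional probability: P(B1 | A1) = P(A1 and B1) / P(A1)
p_B1_given_A1 = joint[("A1", "B1")] / p_A1
print(round(p_B1_given_A1, 3))  # 0.275

# Independence check: B1 and A1 are independent only if P(B1 | A1) == P(B1).
print(abs(p_B1_given_A1 - p_B1) < 1e-9)  # False -> the events are dependent
```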
Union
We stated earlier that the union of two events is denoted as: A or B. We can
use this concept to answer questions like:
Determine the probability that a fund outperforms the market (B1) or the
manager graduated from a top-20 MBA program (A1).
Union
A1 or B1 occurs whenever:
A1 and B1 occurs, A1 and B2 occurs, or A2 and B1 occurs…
B1 B2 P(Ai)
A1 .11 .29 .40
A2 .06 .54 .60
P(Bj) .17 .83 1.00
P(A1 or B1) = 0.11 + 0.06 + 0.29 = 0.46
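Both routes to P(A1 or B1), summing the qualifying cells directly and applying the addition rule P(A1 or B1) = P(A1) + P(B1) - P(A1 and B1), can be checked numerically (a Python sketch):

```python
joint = {
    ("A1", "B1"): 0.11, ("A1", "B2"): 0.29,
    ("A2", "B1"): 0.06, ("A2", "B2"): 0.54,
}
p_A1 = 0.11 + 0.29  # marginal P(A1)
p_B1 = 0.11 + 0.06  # marginal P(B1)

# Directly: add every cell where A1 or B1 occurs.
direct = joint[("A1", "B1")] + joint[("A1", "B2")] + joint[("A2", "B1")]

# Addition rule: subtract the intersection so it is not counted twice.
by_rule = p_A1 + p_B1 - joint[("A1", "B1")]

print(round(direct, 2))   # 0.46
print(round(by_rule, 2))  # 0.46
```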
Probability Rules and Trees
We introduce three rules that enable us to calculate the probability of more complex events
from the probability of simpler events
▪ The Complement Rule: P(A^C) = 1 - P(A)
▪ The Multiplication Rule: P(A and B) = P(A | B) P(B) = P(B | A) P(A)
▪ The Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
Probability Trees
An effective and simpler method of applying the probability rules is the probability tree, wherein
the events in an experiment are represented by lines. The resulting figure resembles a tree,
hence the name. We will illustrate the probability tree with several examples, including two that
we addressed using the probability rules alone.
Example 6.5
In Example 6.5 we select two students, one after the other, from a graduate
class of 10 students, without replacement.
The first branch of the tree carries P(F), the probability of selecting a
female student first.
The second branch carries P(F | F), the probability of selecting a female
student second, given that a female was already chosen first.
First selection → Second selection
Example 6.5
At the ends of the “branches”, we calculate joint probabilities as the product of the individual
probabilities on the preceding branches.
Joint probabilities
First selection Second selection
P(FF)=(3/10)(2/9)
P(FM)=(3/10)(7/9)
P(MF)=(7/10)(3/9)
P(MM)=(7/10)(6/9)
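The joint probabilities at the ends of the branches can be sketched in code (a Python illustration; the 3-female / 7-male class composition is inferred from the branch probabilities 3/10 and 7/10 above):

```python
from fractions import Fraction
from itertools import product

F, M = "F", "M"
counts = {F: 3, M: 7}  # class of 10; split inferred from the branch probabilities

def joint(first, second):
    """Joint probability of a two-student sequence, sampled without replacement."""
    p_first = Fraction(counts[first], 10)
    remaining = counts[second] - (1 if second == first else 0)
    p_second = Fraction(remaining, 9)  # one student already removed
    return p_first * p_second

probs = {seq: joint(*seq) for seq in product((F, M), repeat=2)}
print(probs[(F, F)])        # 1/15  (= 3/10 * 2/9)
print(sum(probs.values()))  # 1: the joint probabilities over all branches sum to 1
```

The final check, that all four branch probabilities sum to 1, is a useful sanity test for any probability tree.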
Example 6.6
Suppose we have our grad class of 10 students again, but make the student sampling
independent, that is “with replacement” – a student could be picked first and picked again in
the second round. Our tree and joint probabilities now look like:
FF P(FF)=(3/10)(3/10)
FM P(FM)=(3/10)(7/10)
MF P(MF)=(7/10)(3/10)
MM P(MM)=(7/10)(7/10)
Example 6.8
Law school graduates must pass a bar exam. Suppose the pass rate for
first-time test takers is 72%. Those who fail can retake the exam, and 88%
pass on their second attempt. What is the probability that a randomly chosen
graduate passes the bar?
P(Pass) = 0.72
First exam
Second exam
P(Fail and Pass)=
(0.28)(0.88)=0.2464
P(Fail and Fail) =
(0.28)(0.12) = 0.0336
Example 6.8
What is the probability that a randomly chosen graduate passes the bar?
“There is almost a 97% chance they will pass the bar”
P(Pass) = P(Pass 1st) + P(Fail 1st and Pass 2nd) = 0.7200 + 0.2464 = .9664
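The total-probability calculation for Example 6.8 can be verified in a couple of lines (Python, for illustration only):

```python
p_pass_first = 0.72             # pass rate on the first attempt
p_pass_second_given_fail = 0.88 # pass rate on the retake, given a first failure

# Total probability of passing: pass the first try, or fail then pass the retake.
p_pass = p_pass_first + (1 - p_pass_first) * p_pass_second_given_fail
print(round(p_pass, 4))  # 0.9664
```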
Bayes’ Law
Bayes’ Law is named for Thomas Bayes, an eighteenth-century mathematician. In
its most basic form, if we know P(B | A), we can apply Bayes’ Law to determine
P(A | B):
P(A | B) = P(B | A) P(A) / P(B)
Example 6.9
Let A = GMAT score of 650 or more,
hence A^C = GMAT score less than 650
Our student has determined the probability of getting greater than 650 (without any prep course) as 10%, that is:
P(A) = 0.10 → It follows that P(A^C) = 1 – 0.10 = 0.90
Let B represent the event "take the prep course"
and thus, B^C is "do not take the prep course"
From our survey information, we’re told that among GMAT scorers above 650, 52% took a preparatory course, that is:
P(B | A) = 0.52
(Probability of finding a student who took the prep course given that they scored above 650…)
But our student wants to know P(A | B), that is, what is the probability of getting more than 650 given that a prep course
is taken?
If this probability is > 20%, he will spend $500 on the prep course.
Example 6.9
Among GMAT scorers of less than 650, only 23% took a preparatory course. That is: P(B | A^C) = 0.23
(Probability of finding a student who took the prep course given that he or she scored less than 650…)
The conditional probabilities are P(B | A) = 0.52 and P(B | A^C) = 0.23.
Again using the complement rule, we find the following conditional probabilities:
P(B^C | A) = 1 - 0.52 = 0.48
and
P(B^C | A^C) = 1 - 0.23 = 0.77
Example 6.9
In order to go from P(B | A) = 0.52 to P(A | B) = ??
we need to apply Bayes’ Law.
Graphically:

Score ≥ 650?      Prep course?      Joint probability
A   (0.10)        B   (0.52)        P(A and B)     = (0.10)(0.52) = 0.052
                  B^C (0.48)        P(A and B^C)   = (0.10)(0.48) = 0.048
A^C (0.90)        B   (0.23)        P(A^C and B)   = (0.90)(0.23) = 0.207
                  B^C (0.77)        P(A^C and B^C) = (0.90)(0.77) = 0.693

Marginal probability: P(B) = P(A and B) + P(A^C and B) = 0.052 + 0.207 = 0.259
Finally, P(A | B) = P(A and B) / P(B) = 0.052 / 0.259 ≈ 0.201. Since this is
just over the 20% threshold, the student should take the prep course.
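The full Bayes computation can be checked numerically (a Python sketch using the survey figures from the slides):

```python
p_A = 0.10           # P(score of 650 or more)
p_B_given_A = 0.52   # P(took prep course | score >= 650)
p_B_given_Ac = 0.23  # P(took prep course | score < 650)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^C)P(A^C)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' Law: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

print(round(p_B, 3))          # 0.259
print(round(p_A_given_B, 3))  # 0.201 -> just above the 20% threshold
```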
Thank you
Source of Content: G. Keller (2017) Statistics for Management
Source of Decorative Figures: [Link]