DE ZG535 (23-S2) - Sessions 13 (20 Apr 2024)

Session 13: Conditional probability

Topics: Joint events and conditional probability; Properties of conditional probability; Theorem of total probability; Statistically independent events; Bayes’ theorem; Markov chains

Joint events and conditional probability


Conditional probability arises in the study of random phenomena having constrained
outcomes. In other words, prior knowledge about possible outcomes gives rise to a
reduced sample space from the original sample space S.
Consider the events E and F of the sample space S, the occurrence of one affecting the
probability of the other. For instance, the experiment of tossing three fair coins
simultaneously has the sample space 𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇},
wherein each outcome has probability 1/8.

Let the event E be “at least two Heads appear”, and the event F be “the first coin throws a Tail”. i.e.; 𝐸 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻} and 𝐹 = {𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}, so that 𝑝(𝐸) = 1/2 and 𝑝(𝐹) = 1/2.

Let’s say we have prior knowledge that F occurred. Then the sample space S of E has
reduced to 𝐹 = {𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}. The only outcome from F that favors 𝐸 =
{𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻} is 𝐸 ∩ 𝐹 = {𝑇𝐻𝐻}. The event 𝐸 ∩ 𝐹 = {𝑇𝐻𝐻} is called the joint
event and the probability of occurrence of the joint event is called the joint probability
𝑝(𝐸 ∩ 𝐹) = 1/8.

The probability of the event E given that F occurred, called the conditional probability of
E subject to the occurrence of F, is denoted by 𝑝(𝐸|𝐹). Hence 𝑝(𝐸|𝐹) = 1/4.

Since the elements of the reduced sample space F that favor the occurrence of E are the
elements common between E and F, we note that 𝑝(𝐸|𝐹) = 𝑝(𝐸 ∩ 𝐹)/𝑝(𝐹), provided 𝑝(𝐹) ≠ 0 (i.e.; 𝐹 ≠ 𝜙).

For notational convenience, we denote 𝑝(𝐸 ∩ 𝐹) as 𝑝(𝐸𝐹). Hence 𝑝(𝐸𝐹) = 𝑝(𝐸|𝐹)𝑝(𝐹).
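
As a quick numerical cross-check of this formula, the short Python sketch below (an illustrative aid added to these notes, not part of the worked example itself) enumerates the three-coin sample space and recovers 𝑝(𝐸) = 𝑝(𝐹) = 1/2, 𝑝(𝐸𝐹) = 1/8 and 𝑝(𝐸|𝐹) = 1/4.

```python
from itertools import product
from fractions import Fraction

# Sample space of three fair coin tosses: 8 equally likely outcomes
S = set(product("HT", repeat=3))

E = {s for s in S if s.count("H") >= 2}   # "at least two Heads appear"
F = {s for s in S if s[0] == "T"}         # "the first coin throws a Tail"

def p(A):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(A), len(S))

p_EF = p(E & F)                # joint probability p(EF) = 1/8
p_E_given_F = p_EF / p(F)      # conditional probability p(E|F) = 1/4
print(p(E), p(F), p_EF, p_E_given_F)   # 1/2 1/2 1/8 1/4
```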

Summarizing, for an event E whose probability is controlled by the occurrence of an event F, both belonging to a certain sample space S, 𝑝(𝐸) is called the marginal probability, F
is called the reduced sample space, 𝑝(𝐸𝐹) is called the joint probability and 𝑝(𝐸|𝐹) is
called the conditional probability of E subject to the occurrence of F.

Conditional probability may be greater than, equal to, or less than the ordinary probability
– i.e.; 𝑝(𝐸|𝐹) ⋚ 𝑝(𝐸). Conditional probability obeys the basic axioms of probability
detailed earlier and its properties are discussed next.

Properties of conditional probability

The following properties of conditional probability arise from the axioms of probability:

(i) If F is an event of the sample space S with 𝑝(𝐹) ≠ 0, then 𝑝(𝑆|𝐹) = 1:

𝑝(𝑆|𝐹) = 𝑝(𝑆 ∩ 𝐹)/𝑝(𝐹) = 𝑝(𝐹)/𝑝(𝐹) = 1

(ii) If A and B are events of the sample space S and F is an event with 𝑝(𝐹) ≠ 0, then:

𝑝((𝐴 ∪ 𝐵)|𝐹) = 𝑝(𝐴|𝐹) + 𝑝(𝐵|𝐹) − 𝑝((𝐴 ∩ 𝐵)|𝐹)

(iii) If E and F are events of the sample space S, then:

𝑝(𝐸 ′ |𝐹) = 1 − 𝑝(𝐸|𝐹)
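
These three properties can be verified numerically on the three-coin sample space used earlier; in the sketch below (illustrative only) the events A and B are arbitrary choices made for the check, not events defined in the notes.

```python
from itertools import product
from fractions import Fraction

S = set(product("HT", repeat=3))                 # three fair coin tosses
p = lambda X: Fraction(len(X), len(S))
cond = lambda X, Y: p(X & Y) / p(Y)              # p(X|Y)

F = {s for s in S if s[0] == "T"}                # conditioning event (as before)
A = {s for s in S if s.count("H") >= 2}          # example event A (chosen for the check)
B = {s for s in S if s[1] == "H"}                # example event B (chosen for the check)

assert cond(S, F) == 1                                               # property (i)
assert cond(A | B, F) == cond(A, F) + cond(B, F) - cond(A & B, F)    # property (ii)
assert cond(S - A, F) == 1 - cond(A, F)                              # property (iii)
```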

Theorem of total probability

A set of events 𝐸1 , 𝐸2 , … 𝐸𝑛 is said to partition the sample space S if and only if:

(a) 𝐸𝑖 ∩ 𝐸𝑗 = 𝜙; 𝑖, 𝑗 = 1, 2, 3, … 𝑛; 𝑖 ≠ 𝑗

(b) 𝐸1 ∪ 𝐸2 ∪ … ∪ 𝐸𝑛 = 𝑆

i.e.; a partition of the sample space is a set of events that are exhaustive and pairwise disjoint. There is more than one way to partition the sample space – the partition of S is not unique.

Let the events 𝐸1 , 𝐸2 , … 𝐸𝑛 partition the sample space S, and let A be an event in S (see figure above). Then, the theorem of total probability states that 𝑝(𝐴) = 𝑝(𝐴|𝐸1 )𝑝(𝐸1 ) + 𝑝(𝐴|𝐸2 )𝑝(𝐸2 ) + ⋯ + 𝑝(𝐴|𝐸𝑛 )𝑝(𝐸𝑛 ) = ∑ⁿⱼ₌₁ 𝑝(𝐴|𝐸𝑗 )𝑝(𝐸𝑗 ).
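
As a small illustration (not taken from the notes), consider partitioning the sample space of a fair-die roll into 𝐸1 = {1, 2}, 𝐸2 = {3, 4}, 𝐸3 = {5, 6}; the sketch below computes 𝑝(𝐴) for A = “even number” via the total-probability sum and checks it against direct counting.

```python
from fractions import Fraction

S = set(range(1, 7))                      # sample space of a fair-die roll
p = lambda X: Fraction(len(X), len(S))
cond = lambda X, Y: p(X & Y) / p(Y)       # p(X|Y)

partition = [{1, 2}, {3, 4}, {5, 6}]      # one of many possible partitions of S
A = {2, 4, 6}                             # event "even number"

total = sum(cond(A, Ej) * p(Ej) for Ej in partition)
assert total == p(A) == Fraction(1, 2)    # theorem of total probability holds
```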

Statistically independent events
Events E and F that belong to a certain sample space S are said to be statistically
independent if the probability of occurrence of one of them is unaffected by the occurrence
of the other. i.e.; E and F are statistically independent events if 𝑝(𝐸|𝐹) = 𝑝(𝐸). Along with
𝑝(𝐸|𝐹) = 𝑝(𝐸𝐹)/𝑝(𝐹), it follows that 𝑝(𝐸𝐹) = 𝑝(𝐸)𝑝(𝐹). In other words, the joint probability of the

occurrence of E and F is the product of their individual probabilities of occurrence.

It can then be quickly shown that the notion of statistical independence is indeed
symmetric. If the occurrence of F does not change the probability of occurrence of E, then
it follows that the occurrence of E should not change the probability of occurrence of F as
per the following construct: 𝑝(𝐹|𝐸) = 𝑝(𝐸𝐹)/𝑝(𝐸) = 𝑝(𝐸)𝑝(𝐹)/𝑝(𝐸) = 𝑝(𝐹). i.e.; events E and F are
statistically independent of each other.

We should not overlook that the occurrence of F does affect the outcomes of the
experiment – but 𝑝(𝐸|𝐹) remains equal to 𝑝(𝐸). This merely means that 𝑝(𝐸 ∩ 𝐹) and
𝑝(𝐹) get scaled in the same ratio.

Statistical independence and mutual exclusivity are not the same. Mutually exclusive
events refer to a set of events, of which only one can occur during a trial. i.e.; if events E
and F are mutually exclusive, then 𝑝(𝐸𝐹) = 0. But if E and F are statistically independent,
𝑝(𝐸𝐹) = 𝑝(𝐸)𝑝(𝐹), which is non-zero unless one of the events is impossible.

We now consider the joint probability and statistical independence of three events E, F and G that belong to the sample space S. The conditional probability of E subject to the occurrence of F and G is 𝑝(𝐸|𝐹𝐺), which by definition is:

𝑝(𝐸|𝐹𝐺) = 𝑝(𝐸𝐹𝐺)/𝑝(𝐹𝐺) ⟹ 𝑝(𝐸𝐹𝐺) = 𝑝(𝐸|𝐹𝐺)𝑝(𝐹𝐺) = 𝑝(𝐸|𝐹𝐺)𝑝(𝐹|𝐺)𝑝(𝐺)

Likewise,

𝑝(𝐸|𝐹𝐺𝐻) = 𝑝(𝐸𝐹𝐺𝐻)/𝑝(𝐹𝐺𝐻) ⟹ 𝑝(𝐸𝐹𝐺𝐻) = 𝑝(𝐸|𝐹𝐺𝐻)𝑝(𝐹𝐺𝐻) = 𝑝(𝐸|𝐹𝐺𝐻)𝑝(𝐹|𝐺𝐻)𝑝(𝐺|𝐻)𝑝(𝐻)

A set of three or more events is said to be statistically independent if the knowledge of the occurrence of one or more events does not alter the probability of occurrence of the rest. For the three events E, F and G to be statistically independent, all of the following must be valid:

𝑝(𝐸|𝐹) = 𝑝(𝐸); 𝑝(𝐸|𝐺) = 𝑝(𝐸); 𝑝(𝐸|𝐹𝐺) = 𝑝(𝐸)

𝑝(𝐹|𝐺) = 𝑝(𝐹); 𝑝(𝐹|𝐸) = 𝑝(𝐹); 𝑝(𝐹|𝐺𝐸) = 𝑝(𝐹)

𝑝(𝐺|𝐸) = 𝑝(𝐺); 𝑝(𝐺|𝐹) = 𝑝(𝐺); 𝑝(𝐺|𝐸𝐹) = 𝑝(𝐺)

It follows from the above that:

𝑝(𝐸𝐹) = 𝑝(𝐸)𝑝(𝐹); 𝑝(𝐹𝐺) = 𝑝(𝐹)𝑝(𝐺); 𝑝(𝐺𝐸) = 𝑝(𝐺)𝑝(𝐸); 𝑝(𝐸𝐹𝐺) = 𝑝(𝐸)𝑝(𝐹)𝑝(𝐺)

If only the pairwise conditions 𝑝(𝐸𝐹) = 𝑝(𝐸)𝑝(𝐹), 𝑝(𝐹𝐺) = 𝑝(𝐹)𝑝(𝐺) and 𝑝(𝐺𝐸) = 𝑝(𝐺)𝑝(𝐸) are satisfied, E, F and G are said to be pairwise independent, which is not sufficient to ensure statistical independence.
We can extend the above to define statistical independence of more than three events.

Example 1: In the rolling of a fair die, determine if the events E=”even number”,
F=”number more than 2” and G=”number less than 5” are statistically independent.

𝐸 = {2, 4, 6}, 𝐹 = {3, 4, 5, 6}, 𝐺 = {1, 2, 3, 4}

𝐸𝐹 = {4, 6}, 𝐹𝐺 = {3, 4}, 𝐺𝐸 = {2, 4}, 𝐸𝐹𝐺 = {4}


𝑝(𝐸) = 1/2, 𝑝(𝐹) = 2/3, 𝑝(𝐺) = 2/3, 𝑝(𝐸𝐹) = 1/3, 𝑝(𝐹𝐺) = 1/3, 𝑝(𝐺𝐸) = 1/3, 𝑝(𝐸𝐹𝐺) = 1/6

𝑝(𝐸𝐹) = 𝑝(𝐸)𝑝(𝐹), 𝑝(𝐹𝐺) ≠ 𝑝(𝐹)𝑝(𝐺), 𝑝(𝐺𝐸) = 𝑝(𝐺)𝑝(𝐸), 𝑝(𝐸𝐹𝐺) ≠ 𝑝(𝐸)𝑝(𝐹)𝑝(𝐺)

The events are statistically dependent. ◼
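
The product tests above can be cross-checked by enumeration; the following sketch (illustrative only) reproduces the four comparisons.

```python
from fractions import Fraction

S = set(range(1, 7))                      # fair die
p = lambda X: Fraction(len(X), len(S))

E = {2, 4, 6}                             # even number
F = {3, 4, 5, 6}                          # number more than 2
G = {1, 2, 3, 4}                          # number less than 5

print(p(E & F) == p(E) * p(F))            # True:  1/3 = 1/2 x 2/3
print(p(F & G) == p(F) * p(G))            # False: 1/3 vs 4/9
print(p(E & G) == p(E) * p(G))            # True:  1/3 = 1/2 x 2/3
print(p(E & F & G) == p(E) * p(F) * p(G)) # False: 1/6 vs 2/9
```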

Bayes’ theorem

Context: Let there be two bags I and II. Bag I contains 2W and 3R balls. Bag II contains
4W and 5R balls. Here W is white and R is red. Consider the experiment of selecting a
bag at random and drawing a ball from it. We can determine the probability of drawing a
W or a R ball (theorem of total probability). Now, assume that a W ball has been drawn
from one of the bags. What’s the probability that it was drawn from bag II?

Let the events 𝐸1 , 𝐸2 , … 𝐸𝑛 partition the sample space S, and let A be an event in S. Then:

𝑝(𝐸𝑖 𝐴) = 𝑝(𝐴|𝐸𝑖 )𝑝(𝐸𝑖 ) = 𝑝(𝐸𝑖 |𝐴)𝑝(𝐴) ⟹


𝑝(𝐸𝑖 |𝐴) = 𝑝(𝐴|𝐸𝑖 )𝑝(𝐸𝑖 ) / 𝑝(𝐴)

is called Bayes’ theorem; where according to the theorem of total probability:

𝑝(𝐴) = 𝑝(𝐴|𝐸1 )𝑝(𝐸1 ) + 𝑝(𝐴|𝐸2 )𝑝(𝐸2 ) + ⋯ + 𝑝(𝐴|𝐸𝑛 )𝑝(𝐸𝑛 ) = ∑ⁿⱼ₌₁ 𝑝(𝐴|𝐸𝑗 )𝑝(𝐸𝑗 ).

Since the events 𝐸1 , 𝐸2 , … 𝐸𝑛 partition the sample space S, they are pairwise disjoint and exhaustive – exactly one of these events can, and must, occur. Think of the events 𝐸1 , 𝐸2 , … 𝐸𝑛 as the possible causes of the effect A. The 𝑝(𝐸𝑖 )’s are the probabilities of occurrence of each cause before the effect A occurs and are called apriori (beforehand) probabilities of the causes 𝐸𝑖 . 𝑝(𝐸𝑖 |𝐴) is the probability that the effect A was produced by the ith cause 𝐸𝑖 and is called the aposteriori (after the fact) probability. Bayes’ theorem finds the aposteriori probability of the cause 𝐸𝑖 in terms of the apriori probabilities of the causes 𝐸1 , 𝐸2 , … 𝐸𝑛 .
It is interesting to note that Bayes’ theorem, 𝑝(𝐸𝑖 |𝐴) = 𝑝(𝐴|𝐸𝑖 )𝑝(𝐸𝑖 )/𝑝(𝐴), allows the observed data (𝑝(𝐴|𝐸𝑖 )) to modify the apriori probability of the causes (𝑝(𝐸𝑖 )) to statistically infer the aposteriori probability of the causes 𝑝(𝐸𝑖 |𝐴).
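
Applying the theorem to the two-bag context posed at the beginning of this section gives 𝑝(Bag II | W) = 10/19; the sketch below (an illustrative aid, with 𝐸1 = “Bag I chosen”, 𝐸2 = “Bag II chosen” and A = “white ball drawn”) carries out the computation.

```python
from fractions import Fraction

# Apriori probabilities of the causes (a bag is selected at random)
p_E1 = Fraction(1, 2)                 # E1: Bag I chosen  (2 white, 3 red)
p_E2 = Fraction(1, 2)                 # E2: Bag II chosen (4 white, 5 red)

# Likelihood of the effect A = "a white ball is drawn" under each cause
p_A_given_E1 = Fraction(2, 5)
p_A_given_E2 = Fraction(4, 9)

# Theorem of total probability: p(A) = 19/45
p_A = p_A_given_E1 * p_E1 + p_A_given_E2 * p_E2

# Bayes' theorem: aposteriori probability that the white ball came from Bag II
p_E2_given_A = p_A_given_E2 * p_E2 / p_A
print(p_A, p_E2_given_A)              # 19/45 and 10/19
```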

Example 2: A card from a pack of 52 playing cards is lost. From the remaining cards of
the pack, three cards drawn at random without replacement are found to be all spades.
Find the probability that the lost card is a spade.
Let E denote the event “three cards drawn at random are spades”. Let D, H, S and C denote the respective events that the lost card is a diamond, a heart, a spade and a club. Then 𝑝(𝐷) = 𝑝(𝐻) = 𝑝(𝑆) = 𝑝(𝐶) = 13/52 = 1/4.
𝑝(𝐸|𝐷) = 𝑝(𝐸|𝐻) = 𝑝(𝐸|𝐶) = ¹³C₃ / ⁵¹C₃ = (13 × 12 × 11)/(51 × 50 × 49) = 286/20825

𝑝(𝐸|𝑆) = ¹²C₃ / ⁵¹C₃ = (12 × 11 × 10)/(51 × 50 × 49) = 44/4165

From the theorem of total probability,

𝑝(𝐸) = 𝑝(𝐸|𝐷)𝑝(𝐷) + 𝑝(𝐸|𝐻)𝑝(𝐻) + 𝑝(𝐸|𝐶)𝑝(𝐶) + 𝑝(𝐸|𝑆)𝑝(𝑆)


= (3 × 1/4 × 286/20825) + (1/4 × 44/4165) = 539/41650
Hence 𝑝(𝑆|𝐸) = 𝑝(𝐸|𝑆)𝑝(𝑆)/𝑝(𝐸) = (44/4165 × 1/4)/(539/41650) = 110/539 ≈ 0.2 ◼
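
The combinatorial terms above can be verified with a few lines of Python (illustrative only; math.comb is the standard-library binomial coefficient):

```python
from fractions import Fraction
from math import comb

prior = Fraction(1, 4)                                  # p(D) = p(H) = p(S) = p(C)

# Probability that three cards drawn from the remaining 51 are all spades
p_E_not_spade = Fraction(comb(13, 3), comb(51, 3))      # lost card not a spade: 286/20825
p_E_spade     = Fraction(comb(12, 3), comb(51, 3))      # lost card is a spade:  44/4165

p_E = 3 * p_E_not_spade * prior + p_E_spade * prior     # 539/41650
p_S_given_E = p_E_spade * prior / p_E                   # 110/539 = 10/49 ≈ 0.204
print(p_E, p_S_given_E, float(p_S_given_E))
```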

Markov chains

Earlier, we considered a sequence of independent Bernoulli trials – random experiments in which the trials are independent and each trial has just two outcomes whose probabilities remain unchanged during the course of the experiment. The

assumption of independence makes the calculation of joint probabilities from the marginal
probabilities relatively straightforward as explained below.

Let us consider the experiment of tossing a coin twice. The coin has probability 𝑝 of
coming up with heads. We are interested in calculating the joint probability of 𝑇1 𝐻2 (tail in
the first trial and head in the second). Simply multiply the individual probabilities of 𝑇1 for
the first trial and 𝐻2 for the second trial and arrive at (1 − 𝑝)𝑝 as the joint probability. This
is possible because the appearance of a head or a tail in the first toss doesn’t affect the
probabilities of the outcomes in the second toss. An event E of the experiment can be
considered as the joint event 𝐸1 ∩ 𝐸2 ∩ 𝐸3 ∩ … 𝐸𝑛 , where 𝐸𝑖 is the required event in the ith
trial, so that 𝑝(𝐸) = 𝑝(𝐸1 ) 𝑝(𝐸2 )𝑝(𝐸3 ) … 𝑝(𝐸𝑛 ). Notice a slight difference from the way independent events were defined earlier: there, they were defined as events on the same sample space, whereas here E and the 𝐸𝑖 ’s are defined on different sample spaces (one per trial). We will not worry about this distinction.

If we remove the assumption of independence, we can no longer write 𝑝(𝐸) = 𝑝(𝐸1 ) 𝑝(𝐸2 )𝑝(𝐸3 ) … 𝑝(𝐸𝑛 ). Instead, we fall back on the following:

𝑝(𝐸) = 𝑝(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ∩ … 𝐸𝑛 )
= 𝑝(𝐸𝑛 |𝐸1 𝐸2 … 𝐸𝑛−1 ) 𝑝(𝐸𝑛−1 |𝐸1 𝐸2 … 𝐸𝑛−2 ) … 𝑝(𝐸𝑖 |𝐸1 𝐸2 … 𝐸𝑖−1 ) … 𝑝(𝐸2 |𝐸1 ) 𝑝(𝐸1 )

Dependent Bernoulli trials are trials of a random experiment that have exactly two
outcomes (in each trial) whose probabilities depend on the outcomes of the previous
trials. A sequence of such trials is called a sequence of dependent Bernoulli trials.

A dependent Bernoulli sequence in which the probabilities of a certain trial depend only
on the outcomes of the previous trial is called a Markov chain. i.e.; 𝑝(𝐸𝑖 |𝐸1 𝐸2 … 𝐸𝑖−1 ) =
𝑝(𝐸𝑖 |𝐸𝑖−1 ), so that the probability of any joint event may be determined from:

𝑝(𝐸) = 𝑝(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ∩ … 𝐸𝑛 ) = 𝑝(𝐸𝑛 |𝐸𝑛−1 ) 𝑝(𝐸𝑛−1 |𝐸𝑛−2 ) … 𝑝(𝐸𝑖 |𝐸𝑖−1 ) … 𝑝(𝐸2 |𝐸1 ) 𝑝(𝐸1 )

The following example deals with a Markov chain.

Example 3: Consider two coins, one fair and the other weighted. The weighted coin has probability of heads 1/4. Consider the following experiment: Select a coin at random and
toss it. If it is heads that comes up, pick the fair coin and toss it. If it is tails that comes up,
pick the weighted coin and toss it. Continue this process for the succeeding trials.

The outcome of a trial depends on whether a fair coin or weighted coin is chosen for that
trial. But this depends on the outcome of the previous trial (we choose a fair coin in a trial
if the previous trial threw heads, else we choose the weighted coin).
The outcome of the previous trial is called the state of the
sequence. The probabilities in each trial, called state
transition probabilities, are actually conditional probabilities
(because they depend on the outcomes of the previous
trial). In this case, 𝑝(𝐻𝑖 |𝐻𝑖−1 ) = 1/2, 𝑝(𝐻𝑖 |𝑇𝑖−1 ) = 1/4, 𝑝(𝑇𝑖 |𝐻𝑖−1 ) = 1/2, 𝑝(𝑇𝑖 |𝑇𝑖−1 ) = 3/4. We can illustrate the chain by means of the Markov state
probability diagram shown alongside. ◼

Example 4: For the Markov chain in Example 3, determine the probability of 10 heads in
succession – i.e.; the event {(𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻)}.

From 𝑝(𝐸) = 𝑝(𝐸𝑛 |𝐸𝑛−1 ) 𝑝(𝐸𝑛−1 |𝐸𝑛−2 ) … 𝑝(𝐸𝑖 |𝐸𝑖−1 ) … 𝑝(𝐸2 |𝐸1 ) 𝑝(𝐸1 ), we can write:

𝑝(𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻, 𝐻) = 𝑝(𝐻10 |𝐻9 ) 𝑝(𝐻9 |𝐻8 ) 𝑝(𝐻8 |𝐻7 ) … 𝑝(𝐻2 |𝐻1 ) 𝑝(𝐻1 )

𝑝(𝐻10 |𝐻9 ) = 𝑝(𝐻9 |𝐻8 ) = 𝑝(𝐻8 |𝐻7 ) = ⋯ = 𝑝(𝐻2 |𝐻1 ) = 1/2

𝑝(𝐻1 ) = 𝑝(𝐻1 |𝑓𝑎𝑖𝑟 𝑐𝑜𝑖𝑛) 𝑝(𝑓𝑎𝑖𝑟 𝑐𝑜𝑖𝑛) + 𝑝(𝐻1 |𝑤𝑒𝑖𝑔ℎ𝑡𝑒𝑑 𝑐𝑜𝑖𝑛) 𝑝(𝑤𝑒𝑖𝑔ℎ𝑡𝑒𝑑 𝑐𝑜𝑖𝑛)
= (1/2 × 1/2) + (1/4 × 1/2) = 3/8

Hence probability of 10 heads in succession = (1/2)⁹ × (3/8) = 3/4096.
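
The chain-rule calculation of Example 4 can be reproduced programmatically; the sketch below (illustrative only) encodes the transition probabilities and the initial probabilities 𝑝(𝐻1) = 3/8, 𝑝(𝑇1) = 5/8 of Example 3 and evaluates the probability of any outcome sequence.

```python
from fractions import Fraction

# State transition probabilities: trans[previous][next]
trans = {"H": {"H": Fraction(1, 2), "T": Fraction(1, 2)},   # fair coin tossed after a head
         "T": {"H": Fraction(1, 4), "T": Fraction(3, 4)}}   # weighted coin tossed after a tail

# Initial probabilities for the first toss (coin picked at random): p(H1) = 3/8, p(T1) = 5/8
init = {"H": Fraction(3, 8), "T": Fraction(5, 8)}

def chain_probability(seq):
    """Joint probability of an outcome sequence under the Markov property."""
    prob = init[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        prob *= trans[prev][nxt]
    return prob

print(chain_probability("H" * 10))    # 3/4096
```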

The above Markov process can be illustrated using a trellis diagram. Starting with the initial probability, the probability of a sequence of events may be traced through the trellis by multiplying the probabilities of each branch. This is shown in the adjoining sketch.

Try solving Example 4 above using this trellis diagram.
-§§§§§§§§§§§-
Advanced Engineering Math (DE/AE ZG535) course notes
Param, Core Engineering Group, WILP Division, BITS Pilani
