Give a detailed account of why conditional probabilities may be inconsistent with the prior probabilities provided by the expert. Give an example of such an inconsistency. [15]
The framework for Bayesian reasoning requires probability values as primary inputs. The
assessment of these values usually involves human judgement. However, psychological
research shows that humans either cannot elicit probability values consistent with the
Bayesian rules or do so badly (Burns and Pearl, 1981; Tversky and Kahneman, 1982).
This suggests that the conditional probabilities may be inconsistent with the prior
probabilities given by the expert. Consider, for example, a car that does not start and makes
odd noises when you press the starter. The conditional probability of the starter being faulty
if the car makes odd noises may be expressed as:
IF the symptom is ‘odd noises’
THEN the starter is bad {with probability 0.7}
Apparently, the conditional probability that the starter is not bad if the car makes odd noises is:

p(starter is good | odd noises) = 1 - p(starter is bad | odd noises) = 1 - 0.7 = 0.3

Therefore, we can obtain a companion rule that states:
IF the symptom is ‘odd noises’
THEN the starter is good {with probability 0.3}
Domain experts do not deal easily with conditional probabilities and quite often deny the
very existence of the hidden implicit probability (0.3 in our example). In our case, we would
use available statistical information and empirical studies to derive the following two rules:
IF the starter is bad
THEN the symptom is ‘odd noises’ {with probability 0.85}
IF the starter is bad
THEN the symptom is not ‘odd noises’ {with probability 0.15}
To use the Bayesian rule, we still need the prior probability, the probability that the starter is bad if the car does not start. Here we need an expert judgement. Suppose the expert supplies us with a value of 5 per cent. If we also assume that p(odd noises | starter is good) = 0.15 (the complement of the 0.85 above), we can apply the Bayesian rule to obtain:
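p(starter is bad | odd noises) = (0.85 × 0.05) / (0.85 × 0.05 + 0.15 × 0.95) = 0.0425 / 0.185 ≈ 0.23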
The number obtained is significantly lower than the expert’s estimate of 0.7 given at the
beginning of this assessment.
The most obvious reason for the inconsistency is that the expert made different assumptions when assessing the conditional and prior probabilities. We may attempt to investigate this by working backwards from the posterior probability p(starter is bad | odd noises) to the prior probability p(starter is bad). In our case, we can again assume that p(odd noises | starter is good) = 0.15, as in the calculation above.
If we now take the value of 0.7 for p(starter is bad | odd noises), provided by the expert, as the correct one, the prior probability p(starter is bad) would have to satisfy:
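0.7 = (0.85 × p) / [0.85 × p + 0.15 × (1 - p)],   which gives   p = p(starter is bad) ≈ 0.29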
This value is almost six times larger than the figure of 5 per cent provided by the expert.
Thus the expert indeed uses quite different estimates of the prior and conditional probabilities. In fact, the prior probabilities provided by the expert are also likely to be inconsistent with the likelihood of sufficiency, LS, and the likelihood of necessity, LN. Several methods have been proposed to handle this problem (Duda et al., 1976).
The most popular technique, first applied in PROSPECTOR, is the use of a piecewise linear
interpolation model (Duda et al., 1979). However, to use the subjective Bayesian approach,
we must satisfy many assumptions, including the conditional independence of evidence
under both a hypothesis and its negation. As these assumptions are rarely satisfied in real
world problems, only a few systems have been built based on Bayesian reasoning. The best
known one is PROSPECTOR, an expert system for mineral exploration (Duda et al., 1979).
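The inconsistency check described above can be sketched in a few lines of Python. This is only an illustration, and it again assumes that p(odd noises | starter is good) = 0.15:

# Consistency check between the expert's prior and conditional probabilities.
# Assumption (not stated explicitly in the rules): p(odd noises | starter is good) = 0.15.

p_e_given_h = 0.85      # p(odd noises | starter is bad)
p_e_given_not_h = 0.15  # p(odd noises | starter is good) -- assumed
prior = 0.05            # expert's prior p(starter is bad)

# Forward: the posterior implied by the expert's prior (Bayes' rule).
posterior = (p_e_given_h * prior) / (
    p_e_given_h * prior + p_e_given_not_h * (1 - prior)
)
print(f"posterior p(bad | odd noises) = {posterior:.2f}")  # ~0.23, not 0.7

# Backward: the prior that would be needed to justify the expert's posterior of 0.7.
target_posterior = 0.7
implied_prior = (target_posterior * p_e_given_not_h) / (
    p_e_given_h * (1 - target_posterior) + target_posterior * p_e_given_not_h
)
print(f"implied prior p(bad) = {implied_prior:.2f}")  # ~0.29, not 0.05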
1. Explain in detail how a rule-based expert system propagates uncertainties using the
Bayesian approach. [15]
Bayes' theorem is a mechanism for combining new and existing evidence, usually given as subjective probabilities. It is used to revise existing prior probabilities based on new information. The Bayesian approach is based on subjective probabilities (probabilities estimated by an expert without the benefit of a formal model). A subjective probability is provided for each proposition (Sonal et al., 2014).
If E is the evidence (the sum of all information available to the system), then each
proposition, P, has associated with it a value representing the probability that P holds in
light of all the evidence, E, derived by using Bayesian inference. Bayes' theorem offers a way of computing the probability of a particular event, given some set of observations that have already been made (Sonal et al., 2014).
Rule-based systems express knowledge in an IF-THEN format:
IF X is true
THEN Y can be concluded with probability p
If we observe that X is true, then we can conclude that Y exists with the specified probability. For
example:
IF the patient has a cold
THEN the patient will sneeze (0.75)
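As an illustrative sketch (not the representation of any particular expert-system shell), such an uncertain rule might be stored as a simple Python data structure:

from dataclasses import dataclass

# A hypothetical representation of an uncertain IF-THEN rule.
@dataclass
class Rule:
    antecedent: str     # X: the condition that must be observed
    consequent: str     # Y: the conclusion that may be drawn
    probability: float  # p: P(consequent | antecedent)

cold_rule = Rule(
    antecedent="the patient has a cold",
    consequent="the patient will sneeze",
    probability=0.75,
)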
But what if we reason in the opposite direction, observing Y (i.e., the patient sneezes) while knowing nothing about X (whether the patient has a cold)? What can we conclude about X? Bayes' theorem describes how we can derive a probability for X.
Within the rule given above, Y denotes some piece of evidence (typically referred to as E) and X denotes some hypothesis (H), giving:

                P(E | H) P(H)
(1) P(H | E) = ---------------
                    P(E)

or, equivalently,

                          P(E | H) P(H)
(2) P(H | E) = -----------------------------------
                P(E | H) P(H) + P(E | H') P(H')
To make this more concrete, consider whether Rob has a cold (the hypothesis) given that he
sneezes (the evidence). Equation (2) states that the probability that Rob has a cold given
that he sneezes is the ratio of the probability that he both has a cold and sneezes, to the
probability that he sneezes.
The probability of his sneezing is the sum of the conditional probability that he sneezes when he has a cold, weighted by the prior probability of having a cold, and the conditional probability that he sneezes when he does not have a cold, weighted by the prior probability of not having a cold. In other words, it is the probability that he sneezes regardless of whether or not he has a cold. Suppose that we know in general:
P (H) = P (Rob has a cold)
= 0.2
P (E | H) = P (Rob was observed sneezing | Rob has a cold)
= 0.75
P (E | H’) = P (Rob was observed sneezing | Rob does not have a cold)
= 0.2
Then
P (E) = P (Rob was observed sneezing)
= (0.75) (0.2) + (0.2) (0.8)
= 0.31
and P(H | E) =P(Rob has a cold | Rob was observed sneezing)
(0.75)(0.2)
= ---------------
(0.31)
= 0.48387
Or Rob’s probability of having a cold given that he sneezes is about 0.5.
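A minimal Python sketch of this calculation, using the values given above (the variable names are only illustrative):

# Bayes' theorem for one hypothesis H (Rob has a cold) and one piece of
# evidence E (Rob was observed sneezing), as in Equation (2).
p_h = 0.2              # P(H): prior probability that Rob has a cold
p_e_given_h = 0.75     # P(E | H): probability of sneezing given a cold
p_e_given_not_h = 0.2  # P(E | H'): probability of sneezing given no cold

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # P(E) = 0.31
p_h_given_e = p_e_given_h * p_h / p_e                  # P(H | E)

print(f"P(E) = {p_e:.2f}")              # 0.31
print(f"P(H | E) = {p_h_given_e:.5f}")  # 0.48387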
What we have just examined is very limited, since we have only considered the case in which each piece of evidence affects a single hypothesis. This must be generalized to deal with m hypotheses H1, H2, ..., Hm and n pieces of evidence E1, ..., En, the situation normally encountered in real-world problems. When these factors are included, Equation (2) becomes:
                              P(Ej1 Ej2 ... Ejk | Hi) P(Hi)
(3) P(Hi | Ej1 Ej2 ... Ejk) = -------------------------------
                                  P(Ej1 Ej2 ... Ejk)

                              P(Ej1 | Hi) P(Ej2 | Hi) ... P(Ejk | Hi) P(Hi)
                            = -------------------------------------------------
                               m
                               Σ  P(Ej1 | Hl) P(Ej2 | Hl) ... P(Ejk | Hl) P(Hl)
                              l=1

where {j1, ..., jk} ⊆ {1, ..., n}.

This probability is called the posterior probability of hypothesis Hi from observing evidence Ej1, Ej2, ..., Ejk.
This equation is derived based on several assumptions:
1. The hypotheses H1, ..., Hm, m ≥ 1, are mutually exclusive.
2. The hypotheses H1, ..., Hm are collectively exhaustive.
3. The pieces of evidence E1, ..., En, n ≥ 1, are conditionally independent given any hypothesis Hi, 1 ≤ i ≤ m.
Conditional independence: the events E1, E2, ..., En are conditionally independent given an event H if

P (Ej1 ... Ejk | H) = P(Ej1 | H) ... P(Ejk | H)

for each subset {j1, ..., jk} ⊆ {1, ..., n}.
This last assumption often causes great difficulties for probability-based methods. For
example, two symptoms, A and B, might each independently indicate that some disease is
50 percent likely. Together, however, it might be that these symptoms reinforce (or
contradict) each other. Care must be taken to ensure that such a situation does not exist
before using the Bayesian approach.
To illustrate how belief is propagated through a system using Bayes’ rule, consider the
values shown in the Table below. These values represent (hypothetically) three mutually
exclusive and exhaustive hypotheses:
1. H1, the patient, Rob, has a cold;
2. H2, Rob has an allergy; and
3. H3, Rob has a sensitivity to light
with their prior probabilities, P(Hi), and two conditionally independent pieces of evidence:
1. E1, Rob sneezes and
2. E2, Rob coughs,
which support these hypotheses to differing degrees.
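The values in the table, as used in the calculations that follow, are:

                       H1 (cold)   H2 (allergy)   H3 (light sensitivity)
P(Hi)                    0.6          0.3             0.1
P(E1 | Hi)  (sneezes)    0.3          0.8             0.3
P(E2 | Hi)  (coughs)     0.6          0.9             0.0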
If we observe evidence E1 (e.g., the patient sneezes), we can compute posterior probabilities
for the hypotheses using Equation (3) (where k = 1) to be:
                    (0.3)(0.6)
P (H1 | E1) = --------------------------------------- = 0.4
               (0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)

                    (0.8)(0.3)
P (H2 | E1) = --------------------------------------- = 0.53
               (0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)

                    (0.3)(0.1)
P (H3 | E1) = --------------------------------------- = 0.07
               (0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)
Note that the beliefs in hypotheses H1 and H3 have both decreased while the belief in hypothesis H2 has increased after observing E1. If E2 (e.g., the patient coughs) is now observed, new posterior probabilities can be computed from Equation (3) (where k = 2):
P(H1 | E1 E2)
(0.3)(0.6)(0.6)
= ------------------------------------------------------------
(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)
= 0.33
P(H2 | E1 E2)
(0.8)(0.9)(0.3)
= ------------------------------------------------------------
(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)
= 0.67
P(H3 | E1 E2)
(0.3)(0.0)(0.1)
= ------------------------------------------------------------
(0.3)(0.6)(0.6) + (0.8) (0.9)(0.3) + (0.3)(0.0)(0.1)
= 0.0
Hypothesis H3 (e.g., sensitivity to light) has now ceased to be a viable hypothesis, and H2 (e.g., allergy) is considered much more likely than H1 (e.g., cold), even though H1 initially ranked higher.
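A minimal Python sketch of this propagation, assuming the prior and conditional probabilities tabulated above; it reproduces the posterior values given by Equation (3):

# Posterior probabilities P(Hi | Ej1 ... Ejk) for mutually exclusive and
# exhaustive hypotheses with conditionally independent evidence (Equation (3)).

priors = {"H1": 0.6, "H2": 0.3, "H3": 0.1}        # P(Hi)
likelihoods = {                                    # P(Ej | Hi)
    "E1": {"H1": 0.3, "H2": 0.8, "H3": 0.3},       # Rob sneezes
    "E2": {"H1": 0.6, "H2": 0.9, "H3": 0.0},       # Rob coughs
}

def posteriors(evidence):
    # Numerator for each Hi: P(Ej1 | Hi) ... P(Ejk | Hi) P(Hi);
    # the denominator is the sum of these numerators over all hypotheses.
    scores = {}
    for h, prior in priors.items():
        score = prior
        for e in evidence:
            score *= likelihoods[e][h]
        scores[h] = score
    total = sum(scores.values())
    return {h: round(s / total, 2) for h, s in scores.items()}

print(posteriors(["E1"]))        # {'H1': 0.4, 'H2': 0.53, 'H3': 0.07}
print(posteriors(["E1", "E2"]))  # {'H1': 0.33, 'H2': 0.67, 'H3': 0.0}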