Module 3 Notes_AI
BAI654D
Artificial Intelligence Module -III
MODULE III
In this chapter and the next, we explore and discuss techniques for solving problems with incomplete
and uncertain models.
What is Reasoning?
➢ Reasoning is the act of deriving a conclusion from certain premises using a given
methodology.
➢ Reasoning is a process of thinking, logically arguing and drawing inferences.
➢ UNCERTAINTY IN REASONING
o The world is an uncertain place; knowledge is often imperfect, which causes
uncertainty.
o Therefore, reasoning must be able to operate under uncertainty.
o Uncertainty is a major problem in knowledge elicitation, especially when the
expert's knowledge must be quantified in rules.
o Uncertainty may cause bad treatment in medicine or loss of money in business.
➢ INTRODUCTION TO NONMONOTONIC REASONING
➢ A logic is non-monotonic if the truth of a proposition may change when new information
(axioms) is added; i.e., non-monotonic logic allows a statement to be retracted (taken
back). It is also used to formalize plausible (believable) reasoning.
Example 1: Birds typically fly.
Tweety is a bird.
Tweety flies (most probably).
- The conclusion of a non-monotonic argument may not be correct.
✓ The techniques that can be used to reason effectively even when a complete, consistent, and
constant model of the world is not available are discussed here. One of the examples, which
we call the ABC Murder story, clearly illustrates many of the main issues these techniques must
deal with.
✓ Let Abbott, Babbitt, and Cabot be suspects in a murder case. Abbott has an alibi (explanation/
defense): the register of a respectable hotel in Albany. Babbitt also has an alibi, for his brother-
in-law testified that Babbitt was visiting him in Brooklyn at the time. Cabot pleads an
alibi too, claiming to have been watching a ski meet in the Catskills, but we have only his word
for that. So we believe -
1. That Abbott did not commit the crime.
2. That Babbitt did not commit the crime.
3. That Abbott or Babbitt or Cabot did.
✓ But presently Cabot documents his alibi that he had the good luck to have been caught by
television in the sidelines at the ski meet. A new belief is thus thrust upon us:
4. That Cabot did not.
✓ Our beliefs (1) through (4) are inconsistent, so we must choose one for rejection. Which has the
weakest evidence? The basis for (1) in the hotel register is good, since it is a fine old hotel. The
basis for (2) is weaker, since Babbitt’s brother-in-law might be lying. The basis for (3) is perhaps
twofold: that there is no sign of burglary (robbery) and that only Abbott, Babbitt, and Cabot
seem to have stood to gain from the murder apart from burglary. This exclusion of burglary
seems conclusive, but the other consideration does not. There could be some fourth beneficiary.
For (4), finally, the basis is conclusive: the evidence from television. Thus (2) and
(3) are the weak points. To resolve the inconsistency of (1) through (4) we should reject (2) or
(3), thus either incriminating Babbitt or widening our net for some new suspect.
✓ This story illustrates some of the problems posed by uncertain, fuzzy, and often changing
knowledge. A variety of logical frameworks and computational methods have been proposed
for handling such problems.
✓ Conventional reasoning systems, such as first-order predicate logic, are designed to
work with knowledge that has three important properties:
➢ It is complete with respect to the domain of interest. In other words, all the facts
that are necessary to solve a problem are present in the system or can be derived
from existing facts with the rules of first-order logic.
➢ It is consistent.
➢ The new facts can be added as they become available. If these new facts are
consistent with all the other facts that have already been asserted, then nothing
will ever be retracted (taken back) from the set of facts that are known to be true.
This property is called monotonicity. In other words, logic is monotonic if the
truth of a proposition does not change when new information (axioms) is added.
• Default Reasoning
✓ This is a very common form of non-monotonic reasoning. Conclusions are
drawn based on what is most likely to be true. There are two logic-based
approaches to default reasoning: Non-monotonic logic and Default logic.
1. Nonmonotonic Logic
- Provides a basis for default reasoning.
- It has already been defined. It says, "The truth of a proposition may change
when new information (axioms) is added, and the logic may be built to allow the
statement to be retracted."
- Non-monotonic logic is predicate logic with one extension, the modal
operator M, which means "consistent with everything we know". The purpose of M is
to allow consistency. i.e., FOPL is augmented with a modal operator M, which
can be read as "is consistent".
- Here the Rules are Wff’s.
- A way to define consistency with PROLOG notation is :
To show that fact P is true, we attempt to prove ¬P.
If we fail, we may say that P is consistent since ¬P is false.
Examples 1:
∀x, y : Related(x, y) ∧ M GetAlong(x, y) → WillDefend(x, y)
Should be read as "For all x and y, if x and y are related and if the fact that x
gets along with y is consistent with everything else that is believed, then
conclude that x will defend y".
2. Default Logic
- Default logic initiates a new inference rule: A : B
C
Where, A is known as the prerequisite, B as the justification, and C as the
consequent. Read the above inference rule as: " if A is provable and if it is
consistent to assume B, then conclude C ". The rule says that given the
prerequisite, the consequent can be inferred, provided it is consistent with
the rest of the data.
- Example : Rule that "birds typically fly" would be represented as
bird(x) : flies(x)
flies(x)
which says " If x is a bird and the claim that x flies is consistent with what
we know, then infer that x flies".
Since all we know about Tweety is that Tweety is a bird, we therefore
infer that Tweety flies.
- These inferences are used as basis for computing possible set of
extensions to the knowledge base.
- Here, Rules are not Wff’s
- Applying Default Rules :
While applying default rules, it is necessary to check their justifications for
consistency, not only with initial data, but also with the consequents of any
other default rules that may be applied. The application of one rule may thus
block the application of another. To solve this problem, the concept of default
theory was extended.
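The application process above can be sketched in code. The encoding is an assumption of this sketch (propositions as strings, a rule as a (prerequisite, justification, consequent) triple); re-scanning the rules after each change is what lets one rule's consequent block another's justification.

```python
# Toy forward application of default rules A : B / C. A rule fires when
# its prerequisite A is believed, its justification B is consistent
# (i.e. not-B is not believed), and its consequent C is not yet present.

def apply_defaults(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, just, cons in rules:
            if pre in facts and ("not", just) not in facts and cons not in facts:
                facts.add(cons)
                changed = True
    return facts

# "Birds typically fly": bird(tweety) : flies(tweety) / flies(tweety)
rules = [("bird(tweety)", "flies(tweety)", "flies(tweety)")]

print("flies(tweety)" in apply_defaults({"bird(tweety)"}, rules))   # True

# Learning that Tweety does not fly blocks the default:
print("flies(tweety)" in apply_defaults(
    {"bird(tweety)", ("not", "flies(tweety)")}, rules))             # False
```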
3. Abduction
- Abduction means systematic guessing: "infer" an assumption from a
conclusion.
- Definition: "Given two Wffs: A→B and B, for any expressions A and B, if it
is consistent to assume A, do so".
- Refers to deriving Conclusions, applying the implications in reverse.
- For example, the following formula:
∀x: RainedOn(x) → Wet(x)
could be used "backwards" with a specific x: if Wet(Tree) then RainedOn(Tree).
This, however, would not be logically justified. We could say:
Wet(Tree) ∧ CONSISTENT(RainedOn(Tree)) → RainedOn(Tree)
We could also attach probabilities, for example like this:
Wet(Tree) → RainedOn(Tree) 70%
Wet(Tree) → MorningDewOn(Tree) 20%
Wet(Tree) → SprinkledOn(Tree) 10%
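The probability-ranked reading of abduction can be sketched as choosing the best explanation for the observation; the percentages are the ones assumed above.

```python
# Abduction as systematic guessing: given the observation wet(Tree),
# rank the candidate causes by their attached probabilities and pick
# the most probable explanation.

candidates = {
    "rainedOn(Tree)": 0.70,
    "morningDewOn(Tree)": 0.20,
    "sprinkledOn(Tree)": 0.10,
}

def abduce(candidates):
    """Return the most probable explanation for the observation."""
    return max(candidates, key=candidates.get)

print(abduce(candidates))   # rainedOn(Tree)
```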
- Inheritance is a common form of default reasoning. The concept is: "An object
inherits attribute values from all the classes of which it is a member."
- If an attribute such as height is not allowed to have more than one value, then
once a specific value is explicitly stated we would not be able to apply the
default rule. Thus an explicitly stated value will block the inheritance of a
default value, which is exactly what we want, for example when encoding the
default rule for the height of adult males in general.
• Minimalist Reasoning
✓ The idea behind using minimal models as a basis for nonmonotonic reasoning
about the world is the following: “There are many fewer true statements than false
ones. If something is true and relevant it makes sense to assume that it has been
entered into the knowledge base. Therefore, assume that the only true statements
are those that necessarily must be true to maintain the consistency of the
knowledge base.”
1. The Closed World Assumption (CWA)
- Suppose, for example, that the knowledge base contains only the disjunction:
A(Joe) ∨ B(Joe)
The CWA allows us to conclude both ¬A(Joe) and ¬B(Joe), since neither
A nor B must necessarily be true of Joe. So, the resulting extended
knowledge base is inconsistent.
- The problem is that we have assigned a special status to positive instances
of predicates, as opposed to negative ones. Specifically, the CWA forces
completion of the knowledge base by adding the negative assertion ¬P
whenever it is consistent to do so. But the assignment of a real-world
property to some predicate P and its complement to the negation of P may
be arbitrary. For example, suppose we define a predicate Single and create
the following knowledge base:
Single (John)
Single (Mary)
Then, if we ask about Jane, the CWA will yield the answer ¬Single (Jane).
But now suppose we had chosen instead to use the predicate Married rather than
Single. Then the corresponding knowledge base would be
Married (John)
Married (Mary)
If we now ask about Jane, the CWA will yield the result
¬Married (Jane).
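The Single/Married example can be sketched directly; the point is that the CWA's answer about Jane depends entirely on which predicate we happened to encode.

```python
# Closed World Assumption sketch: a ground fact holds iff it is in the
# knowledge base; anything not provable is assumed false.

def cwa(fact, kb):
    return fact in kb            # not in KB -> conclude its negation

kb_single = {("Single", "John"), ("Single", "Mary")}
print(cwa(("Single", "John"), kb_single))    # True
print(cwa(("Single", "Jane"), kb_single))    # False -> not Single(Jane)

# The same world encoded with Married instead of Single flips the
# arbitrary conclusion about Jane:
kb_married = {("Married", "John"), ("Married", "Mary")}
print(cwa(("Married", "Jane"), kb_married))  # False -> not Married(Jane)
```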
2. Circumscription
- Circumscription is a nonmonotonic logic used to formalize the common sense
assumption that things are as expected unless otherwise specified.
➢ IMPLEMENTATION ISSUES
✓ Solutions are offered by dividing the reasoning process into two parts:
- One, a problem solver that uses whatever mechanism it happens to
have to draw conclusions as necessary, and
- Second, a truth maintenance system, whose job is to maintain consistency
in the knowledge representation of a knowledge base.
✓ The underlying search may be organized in either of two styles:
- Depth-first search
- Breadth-first search
➢ AUGMENTING A PROBLEM-SOLVER
✓ Suppose the knowledge base uses the usual PROLOG-style control structure in which
rules are matched top to bottom, left to right. Then if we ask the question
?Suspect(x), the program will first try Abbott and return Abbott as its answer.
If we had also included the facts
RegisteredHotel(Abbott, Albany)
FarAway(Albany)
then the program would have failed to conclude that Abbott was a suspect,
and it would instead have located Babbitt and then Cabot.
✓ Figure below shows how the same knowledge could be represented as
forward rules.
✓ Figure: Forward Rules Using UNLESS
• Dependency-Directed Backtracking
✓ Depth-first approach to nonmonotonic reasoning: Consider a fairly simple problem,
finding a time at which three busy people can all attend a meeting. One way to
solve such a problem is first to make an assumption that the meeting will be held
on some particular day, say Wednesday, and add it to the database.
Then proceed to find a time, checking along the way for any inconsistencies in
people's schedules. If a conflict arises, the statement representing the assumption
must be discarded and replaced by another, hopefully non-contradictory, one.
This kind of solution can be handled by a straightforward tree search with chronological
backtracking. All assumptions, as well as the inferences drawn from them, are recorded at
the search node that created them. When a node is determined to represent a contradiction,
simply backtrack to the next node from which there remain unexplored paths. The
assumptions and their inferences will disappear automatically.
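The meeting problem with chronological backtracking can be sketched as a nested search; the names and busy slots below are invented for illustration.

```python
# Chronological backtracking for the meeting problem: assume a day, then
# try times; on any conflict, retract the current assumption (and
# everything inferred from it) and try the next slot.

busy = {  # (person, day, hour) slots already taken -- hypothetical data
    ("ann", "Wed", 9), ("bob", "Wed", 10), ("cal", "Wed", 11),
}

def find_meeting(people, days, hours):
    for day in days:            # assumption: the meeting is on `day`
        for hour in hours:      # assumption: the meeting is at `hour`
            if all((p, day, hour) not in busy for p in people):
                return day, hour
            # conflict found: the assumption is discarded automatically
            # by moving on to the next search node
    return None

print(find_meeting(["ann", "bob", "cal"], ["Wed", "Thu"], [9, 10, 11]))
# ('Thu', 9)
```

Every Wednesday slot conflicts with someone's schedule, so the day assumption itself is eventually retracted and Thursday is tried.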
• Truth Maintenance Systems
✓ A truth maintenance system (TMS) works alongside the problem solver; its
purpose is to assure that inferences made by the reasoning system (RS) are valid.
✓ The RS provides the TMS with information about each inference it performs, and
in return the TMS provides the RS with information about the whole set of
inferences.
✓ The TMS maintains the consistency of a knowledge base whenever new knowledge
is added. It considers only one state at a time, so it is not possible to manipulate
environments.
✓ Several implementations of TMS have been proposed for non-monotonic reasoning.
The Inference Engine (IE) may tell the TMS that some sentences are
contradictory. The TMS may then find that all those sentences are believed
true, and report to the IE that it can eliminate the inconsistencies by
determining the assumptions used and changing them appropriately.
Example: A statement that either Abbott, or Babbitt, or Cabot is guilty
together with other statements that Abbott is not guilty, Babbitt is not
guilty, and Cabot is not guilty, form a contradiction.
- Support default reasoning
At this point, we have described the key reasoning operations that are performed by a
JTMS:
- Consistent labeling
- Contradiction resolution
Also described a set of important reasoning operations that a JTMS does not
perform,
including:
- Applying rules to derive conclusions
- Detecting contradictions
All of these operations must be performed by the problem-solving program that is using
the JTMS.
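The consistent-labeling operation can be sketched as follows. This is a toy version: a real JTMS also performs contradiction resolution and handles unsatisfiable circularities, which this sketch omits.

```python
# Toy JTMS labeling: a node is IN when some justification has every node
# on its IN-list labeled IN and every node on its OUT-list labeled OUT.
# A premise is a justification with empty IN- and OUT-lists.

def label(nodes, justs):
    """justs: node -> list of (in_list, out_list) justifications."""
    status = {n: "OUT" for n in nodes}
    changed = True
    while changed:               # relabel until a fixed point is reached
        changed = False
        for n in nodes:
            supported = any(
                all(status[i] == "IN" for i in ins)
                and all(status[o] == "OUT" for o in outs)
                for ins, outs in justs.get(n, [])
            )
            new = "IN" if supported else "OUT"
            if status[n] != new:
                status[n], changed = new, True
    return status

# Abbott is a suspect UNLESS his alibi is believed:
justs = {
    "alibi_abbott": [([], [])],                   # premise: believed
    "suspect_abbott": [([], ["alibi_abbott"])],   # IN only if alibi OUT
}
print(label(["alibi_abbott", "suspect_abbott"], justs))
# {'alibi_abbott': 'IN', 'suspect_abbott': 'OUT'}
```

Retracting the alibi premise (removing its justification) would make suspect_abbott come IN again, which is the nonmonotonic behaviour the JTMS records.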
• Logic-Based Truth Maintenance Systems
- Propagate inconsistencies, thus ruling out contexts that include subcontexts
that are known to be inconsistent.
- Assign as a context for the node corresponding to C the intersection of the
contexts corresponding to the nodes A1 through An. It is necessary to think of
the set of contexts that are defined by a set of assumptions as forming a lattice.
STATISTICAL REASONING
Here we explore several representation techniques that can be used to model belief
systems in which, at any given point, a particular fact is believed to be true,
believed to be false, or not considered one way or the other.
The first class contains problems in which there is genuine randomness in the world.
Playing card games such as bridge and blackjack is a good example of this class.
Although in these problems it is not possible to predict the world with certainty, some
knowledge about the likelihood of various outcomes is available, and we would like to be
able to exploit it.
The second class contains problems that could, in principle, be modeled using the
techniques we described earlier. In these problems, the relevant world is not random. It
behaves “normally” unless there is some kind of exception. Many common sense tasks
fall into this category, as do many expert reasoning tasks such as medical diagnosis. For
problems like this, statistical measures may serve a very useful function as summaries of
the world. We explore several techniques that can be used to augment knowledge
representation techniques with statistical measures that describe levels of evidence and
belief.
Bayes' theorem:
• Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian
reasoning, which determines the probability of an event with uncertain knowledge.
• In probability theory, it relates the conditional probability and marginal probabilities
of two random events.
• Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental
to Bayesian statistics.
• It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
• Bayes' theorem allows updating the probability prediction of an event by observing
new information of the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can
determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using product rule and conditional probability of event A
with known event B:
As from the product rule we can write:
1. P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
2. P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of the two equations, we get:
P(A|B) = P(B|A) P(A) / P(B) ........ (a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the
basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is the posterior, the probability of hypothesis A given the evidence B.
P(B|A) is called the likelihood, in which we consider that the hypothesis is true, then we
calculate the probability of the evidence.
P(A) is the prior probability of the hypothesis before observing the evidence, and
P(B) is the marginal probability of the evidence.
The general form is:
P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)
Where A1, A2, A3, ....., An is a set of mutually exclusive and exhaustive events.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful in cases where we have a good probability of three of these terms
and want to determine the fourth one. Suppose we want to perceive the effect of some
unknown cause, and want to compute that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
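Equation (a) translates directly into code; the numbers below are arbitrary sanity-check values, not taken from any example.

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).

def bayes_rule(p_b_given_a, p_a, p_b):
    """Posterior from likelihood, prior, and evidence."""
    return p_b_given_a * p_a / p_b

# If P(B|A) = 0.5, P(A) = 0.4 and P(B) = 0.5, then P(A|B) = 0.4:
print(bayes_rule(0.5, 0.4, 0.5))   # 0.4
```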
Example-1:
Question: What is the probability that a patient has the disease meningitis, given that the patient has a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs 80% of the
time. He is also aware of some more facts, which are given as follows:
Let a be the proposition that the patient has a stiff neck and b be the proposition that the
patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.00133
Hence, we can assume that 1 patient out of 750 patients has the meningitis disease, given a stiff neck.
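The same arithmetic, checked in code with the values given above:

```python
# Example-1 in code: P(meningitis | stiff neck) via Bayes' rule.

p_a_given_b = 0.8        # P(stiff neck | meningitis)
p_b = 1 / 30000          # P(meningitis)
p_a = 0.02               # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a
print(round(1 / p_b_given_a))   # 750, i.e. about 1 patient in 750
```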
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card
is a king is 4/52. Calculate the posterior probability P(King|Face), the probability that the drawn card
is a king given that it is a face card.
Solution:
P(King|Face) = P(Face|King) P(King) / P(Face)
P(Face|King) = 1, since every king is a face card.
P(King) = 4/52 and P(Face) = 12/52, since there are 12 face cards in a standard deck.
P(King|Face) = (1 × 4/52) / (12/52) = 1/3
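Example-2 checked with exact rational arithmetic:

```python
from fractions import Fraction

# P(King|Face) = P(Face|King) * P(King) / P(Face)
p_face_given_king = Fraction(1)       # every king is a face card
p_king = Fraction(4, 52)
p_face = Fraction(12, 52)             # 12 face cards in a standard deck

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)              # 1/3
```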
✓ Bayes' theorem is helpful in weather forecasting.
✓ In its full generality, Bayes' theorem quickly becomes intractable, since it requires
a large number of joint probabilities. As a result, several mechanisms for exploiting
its power while at the same time making it tractable have been developed. In the rest
of the discussion, we explore three of these:
- Attaching certainty factors to rules
- Bayesian networks
- Dempster-Shafer theory
✓ Certainty factors provide a simple way of updating probabilities given new evidence.
✓ The basic idea is to add certainty factors to rules, and use these to calculate the
measure of belief in some hypothesis. So, we might have a rule such as:
IF has-spots(X)
AND has-fever(X)
THEN has-measles(X) CF 0.5
✓ Certainty factors consist of two components: a measure of belief and a measure of
disbelief. However, here we'll assume that we only have positive evidence and equate
certainty factors with measures of belief.
✓ Certainty factors are related to conditional probabilities, but are not the same. For
one thing, we allow certainty factors of less than zero to represent cases where some
evidence tends to deny some hypothesis.
✓ Suppose we have already concluded has-spots(fred) with certainty 0.3, and has-
fever(fred) with certainty 0.8. Certainty factor algebra is built on two measures:
1. MB[h,e] → a measure (between 0 and 1) of belief in hypothesis "h" given the
evidence "e".
• MB measures the extent to which the evidence supports the
hypothesis.
• It is zero if the evidence fails to support the hypothesis.
2. MD[h,e] → a measure (between 0 and 1) of disbelief in hypothesis "h" given the
evidence "e".
• MD measures the extent to which the evidence supports the
negation of the hypothesis.
• It is zero if the evidence supports the
hypothesis.
CF[h,e] = MB[h,e] - MD[h,e]
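Putting the rule and these measures together, a MYCIN-style propagation can be sketched as follows. The 0.3/0.8 figures are the ones assumed above; the parallel-combination formula is the standard MYCIN update for two positive certainty factors.

```python
# MYCIN-style certainty factors: the CF of a conjunctive antecedent is
# the minimum of its conjuncts' CFs, scaled by the rule's own CF.

def rule_cf(antecedent_cfs, rule_strength):
    return min(antecedent_cfs) * rule_strength

# has-spots(fred) CF 0.3, has-fever(fred) CF 0.8, rule CF 0.5:
print(rule_cf([0.3, 0.8], 0.5))       # 0.15

# Two rules supporting the same hypothesis with positive CFs combine as
# cf1 + cf2 * (1 - cf1): evidence accumulates but never exceeds 1.
def combine_positive(cf1, cf2):
    return cf1 + cf2 * (1 - cf1)

print(combine_positive(0.15, 0.4))    # ≈ 0.49
```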
Example 2:
The approach that we discuss here was found in the MYCIN system, which
attempts to recommend appropriate therapies for patients with bacterial
infections. It interacts with the physician to acquire the clinical data it needs.
MYCIN is an example of an expert system, since it performs a task normally
done by a human expert. Here we concentrate on the use of probabilistic
reasoning.
✓ MYCIN represents most of its diagnostic knowledge as a set of rules. Each rule
has associated with it a certainty factor, which is a measure of the extent to which
the evidence that is described by the antecedent of the rule supports the conclusion
that is given in the rule’s consequent. A typical MYCIN rule looks like:
If: 1. the stain of the organism is gram-positive, and
2. the morphology of the organism is coccus, and
3. the growth conformation of the organism is clumps (cluster),
then there is suggestive evidence (0.7) that the identity of the organism is
staphylococcus.
This is the form in which the rules are stated to the user.
➢ BAYESIAN NETWORKS
A Bayesian network can be used for building models from data and experts' opinions,
and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of Bayesian network that represents and solve decision
problems under uncertain knowledge is known as an Influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
o Each node corresponds to the random variables, and a variable can be continuous or discrete.
o Arcs or directed arrows represent causal relationships or conditional probabilities between the random variables.
Each node in the Bayesian network has a conditional probability distribution P(Xi
|Parent(Xi)), which determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability.
So let's first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ....., xn, then the probabilities of the different
combinations of x1, x2, x3, ....., xn are known as the joint probability distribution.
P[x1, x2, x3, ....., xn] can be written in the following way in terms of the joint
probability distribution:
= P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]
= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn]
In general for each variable Xi, we can write the equation as:
P(Xi | Xi-1, ....., X1) = P(Xi | Parents(Xi))
Example 1:
✓ Suppose that there are two events which could cause grass to be wet: either the sprinkler is
on or it's raining. Also suppose that the rain has a direct effect on the use of the sprinkler
(namely that when it rains, the sprinkler is usually not turned on).
✓ Let's return to the example of the sprinkler, rain, and grass. We construct a directed acyclic
graph (DAG) that represents causality relationships among variables. The variables in such a
graph may be propositional (values TRUE or FALSE) or they may be variables that take on values
of some other type, e.g., a specific disease, a body temperature, or a reading taken by some other
diagnostic device.
✓ A Bayes net treats the above problem with the tool: a graphical model and a few tables. For the
above example, one possible representation is
✓ [Figure: a DAG with nodes Sprinkler, Rain, and Grass wet, in which Rain points to
Sprinkler and both Rain and Sprinkler point to Grass wet; the table for Rain gives
P(Rain = T) = 0.2 and P(Rain = F) = 0.8]
✓ For example, from such a table we might see that the prior probability of the rainy
season is 0.5. Then, if it is the rainy season, the probability of rain on a given
night is 0.9; if it is not, the probability is only 0.1.
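A sketch of the sprinkler network in code. Only P(Rain = T) = 0.2 comes from the table above; the sprinkler and grass-wet CPT entries are assumed for illustration.

```python
from itertools import product

# Sprinkler network: Rain -> Sprinkler, and (Sprinkler, Rain) -> GrassWet.
# By the chain rule on the DAG, P(R, S, W) = P(R) * P(S|R) * P(W|S, R).

p_rain_true = 0.2                               # from the table
p_sprinkler_true = {True: 0.01, False: 0.4}     # P(S=T | Rain) -- assumed
p_wet_true = {                                  # P(W=T | S, R) -- assumed
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.00,
}

def joint(rain, sprinkler, wet):
    pr = p_rain_true if rain else 1 - p_rain_true
    ps = p_sprinkler_true[rain] if sprinkler else 1 - p_sprinkler_true[rain]
    pw = p_wet_true[(sprinkler, rain)] if wet else 1 - p_wet_true[(sprinkler, rain)]
    return pr * ps * pw

# P(Rain=T, Sprinkler=F, GrassWet=T) = 0.2 * 0.99 * 0.8
print(joint(True, False, True))

# The eight joint entries must sum to 1:
print(sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3)))
```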
➢ DEMPSTER - SHAFER THEORY
This theory was developed for the following reasons:
✓ Bayesian theory is only concerned with single pieces of evidence.
✓ Bayesian probability cannot describe ignorance.
✓ DST is an evidence theory; it combines all possible outcomes of the problem. Hence
it is used to solve problems where there is a chance that different evidence
will lead to different results.
✓ DST is a mathematical theory of evidence based on belief functions and plausible
reasoning. It is used to combine separate pieces of information (evidence) to
calculate the probability of an event.
✓ DST offers an alternative to traditional probabilistic theory for the mathematical
representation of uncertainty.
✓ DST can be regarded as a more general approach to representing uncertainty than the
Bayesian approach.
Dempster-Shafer Model
➢ In this model, belief ≤ plausibility.
➢ Belief in a hypothesis is the sum of the masses of all
sets enclosed by it (i.e. the sum of the masses of all subsets of the hypothesis).
It is the amount of belief that directly supports a given hypothesis at least in
part, forming a lower bound.
➢ Plausibility is 1 minus the sum of the masses of all sets whose intersection with
the hypothesis is empty. It is an upper bound on the possibility that the
hypothesis could possibly happen, up to that value, because there is only so
much evidence that contradicts that hypothesis.
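These two definitions can be sketched over a small frame of discernment {A, B, C}; the mass assignment below is invented for illustration and sums to 1.

```python
# Dempster-Shafer belief and plausibility for a hypothesis H:
#   Bel(H) = sum of masses of all subsets of H         (lower bound)
#   Pl(H)  = sum of masses of all sets intersecting H  (upper bound)

mass = {
    frozenset({"A"}): 0.5,
    frozenset({"A", "B"}): 0.25,
    frozenset({"A", "B", "C"}): 0.25,
}

def belief(h):
    return sum(m for s, m in mass.items() if s <= h)

def plausibility(h):
    return sum(m for s, m in mass.items() if s & h)

print(belief(frozenset({"A"})))         # 0.5: only m({A}) supports {A} directly
print(plausibility(frozenset({"A"})))   # 1.0: no mass contradicts {A}
print(belief(frozenset({"B"})))         # 0: no subset of {B} carries mass
print(plausibility(frozenset({"B"})))   # 0.5: belief <= plausibility holds
```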
FUZZY LOGIC
✓ Here, we take a different approach and briefly consider what happens if we make fundamental
changes to our idea of set membership and corresponding changes to our definitions of logical
operations.
✓ The motivation for fuzzy sets is provided by the need to represent propositions such as:
John is very tall.
Mary is slightly ill.
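A sketch of fuzzy membership for "tall", with invented breakpoints (150 cm and 190 cm) and the usual min/max/complement operators for fuzzy logical connectives:

```python
# Fuzzy set membership: "tall" is a degree in [0, 1], not a crisp
# true/false. Heights between the breakpoints are partially tall.

def tall(height_cm):
    return min(1.0, max(0.0, (height_cm - 150) / 40))

def fuzzy_and(a, b): return min(a, b)
def fuzzy_or(a, b):  return max(a, b)
def fuzzy_not(a):    return 1 - a

print(tall(140))   # 0.0: clearly not tall
print(tall(170))   # 0.5: somewhat tall
print(tall(200))   # 1.0: definitely tall

# Unlike crisp logic, "tall and not tall" need not be 0:
print(fuzzy_and(tall(170), fuzzy_not(tall(170))))   # 0.5
```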