Introduction to Artificial Intelligence (AI)
3. Probabilistic Reasoning
Probabilistic Reasoning
• The real world is filled with uncertainty and ambiguity:
- unreliable sources of information
- equipment faults/errors
- temperature variation …
• Probabilistic reasoning is making decisions based on probabilities and likelihoods rather than absolute facts.
• Probabilistic reasoning provides a mathematical framework for dealing with uncertainty and making rational decisions.
Review of Probability
Review of Probability
• Random variables
• Joint and marginal distributions
• Conditional distributions
• Product rule, chain rule, Bayes’ rule
• Inference
• Independence
Review of Probability
• A random variable represents an event whose outcome is unknown
• A probability distribution is an assignment of weight to outcomes
• Example: traffic on freeway
Random variable: T = whether there’s traffic
Outcomes: T in {none, light, heavy}
Distribution: P(T=none) = 0.25, P(T=light) = 0.5, P(T=heavy) = 0.25
Review of Probability
• Probabilities are always non-negative
• Probabilities over all possible outcomes sum to 1
• The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
Example: how long to get to the school?
P(T): 0.25 → 20 min, 0.5 → 30 min, 0.25 → 60 min
f(T) = 20 × 0.25 + 30 × 0.5 + 60 × 0.25 = 35 min
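A minimal sketch of this expected-value computation in Python, using the traffic numbers from the slide:

```python
# Expected travel time: average of f(T), weighted by P(T).
p_T = {"none": 0.25, "light": 0.50, "heavy": 0.25}   # P(T)
time = {"none": 20, "light": 30, "heavy": 60}        # f(T), in minutes

expected_time = sum(p_T[t] * time[t] for t in p_T)
print(expected_time)  # 35.0
```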
Joint Probability
For independent events A and B:
P(A ∩ B) = P(A) × P(B)
Example: P(6 ∩ Red) = ?
Joint Distributions
• A joint distribution gives the probability of events (variables) happening together.
• From a joint distribution, we can calculate the probability of any event
E.g., probability that it’s hot AND sunny?
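A minimal Python sketch; the joint table P(T, W) below is an assumed example (the slide’s actual table was in a figure):

```python
# Assumed joint distribution P(T, W) over temperature and weather.
# All entries sum to 1.
P = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
     ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# The probability of an event is the sum of the matching entries.
p_hot_sunny = P[("hot", "sun")]
print(p_hot_sunny)  # 0.4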
Marginal Distributions
• Marginal distributions are sub-tables of a joint distribution, obtained by summing out (eliminating) variables
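Summing out a variable from the same assumed joint table, sketched in Python:

```python
from collections import defaultdict

# Assumed joint P(T, W), as in the previous sketch.
P = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
     ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# Marginal P(T): sum out W.
P_T = defaultdict(float)
for (t, w), p in P.items():
    P_T[t] += p
print(dict(P_T))  # {'hot': 0.5, 'cold': 0.5}
```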
Conditional Probabilities
• Conditional probability is the likelihood of an event occurring, given that another event has occurred.
P(B | A) = P(A ∩ B) / P(A)
Conditional Distributions
• Conditional distributions are probability distributions over some variables given fixed values of others
Normalization
• Normalization rescales a table of numbers so that all entries sum to ONE
• Procedure:
Step 1: Compute Z = sum over all entries
Step 2: Divide every entry by Z
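The two-step procedure as a small Python helper (a sketch; `normalize` is a hypothetical name, not from the slides). Selecting one row of the assumed joint table and normalizing it also yields a conditional distribution:

```python
def normalize(table):
    """Rescale non-negative weights so that all entries sum to one."""
    Z = sum(table.values())                      # Step 1: compute Z
    return {k: v / Z for k, v in table.items()}  # Step 2: divide by Z

# Example: the T=hot row of the assumed joint P(T, W), normalized,
# gives the conditional distribution P(W | T=hot).
row = {"sun": 0.4, "rain": 0.1}
print(normalize(row))  # {'sun': 0.8, 'rain': 0.2}
```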
Probabilistic Inference
• Probabilistic inference: compute a desired probability from other known probabilities
• We typically compute conditional probabilities
P(no accident | light traffic) = 0.90
• Probabilities change with new evidence:
P(no accident | light traffic, 5 am) = 0.95
P(no accident | light traffic, 5 am, raining) = 0.70
Probabilistic Inference
• P(W)?
• P(W | winter)?
• P(W | winter, hot)?
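These queries can be answered by enumeration: sum the joint entries consistent with the evidence, then normalize. A Python sketch with an assumed joint P(S, T, W) over season, temperature, and weather (the slide’s actual table was in a figure):

```python
from collections import defaultdict

# Assumed joint distribution P(S, T, W); entries sum to 1.
P = {("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
     ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
     ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
     ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20}

def query_W(evidence):
    """P(W | evidence): sum consistent entries, then normalize."""
    weights = defaultdict(float)
    for (s, t, w), p in P.items():
        if evidence <= {s, t}:          # every evidence value matches
            weights[w] += p
    Z = sum(weights.values())
    return {w: p / Z for w, p in weights.items()}

print(query_W(set()))               # P(W): {'sun': 0.65, 'rain': 0.35}
print(query_W({"winter"}))          # P(W | winter)
print(query_W({"winter", "hot"}))   # P(W | winter, hot)
```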
The Product Rule
P(x, y) = P(x | y) P(y)
The Chain Rule
P(x1, x2, …, xn) = P(x1) P(x2 | x1) … P(xn | x1, …, xn−1)
Bayes’ Rule
From the product rule: P(A ∩ B) = P(B | A) P(A) = P(A | B) P(B)
Therefore: P(A | B) = P(B | A) P(A) / P(B)
Bayes’ formula is helpful:
• Lets us build one conditional from its reverse
• Often one conditional is tricky but the other one is simple
• Foundation of many advanced systems
(Thomas Bayes, 1701 – 1761)
Inference with Bayes’ Rule
• Given: the prior P(W) and the conditional P(dry | W)
• What is P(W | dry)?
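A sketch of the computation with assumed numbers (the slide’s actual tables were in a figure): multiply the prior by the likelihood, then normalize; the normalizer is P(dry):

```python
# Assumed prior and likelihood (illustrative values only).
P_W = {"sun": 0.8, "rain": 0.2}            # P(W)
P_dry_given = {"sun": 0.9, "rain": 0.3}    # P(dry | W)

# Bayes' rule: P(W | dry) is proportional to P(dry | W) * P(W).
unnorm = {w: P_dry_given[w] * P_W[w] for w in P_W}
Z = sum(unnorm.values())                   # Z = P(dry)
P_W_given_dry = {w: p / Z for w, p in unnorm.items()}
print(P_W_given_dry)  # {'sun': 0.923..., 'rain': 0.076...}
```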
Independence
• Two variables X and Y are independent if:
P(x, y) = P(x) P(y) for all x, y
or
P(x | y) = P(x)
• Example: two successive coin flips are independent
Conditional Independence
• Unconditional (absolute) independence is very rare
• Conditional independence is our most basic and robust form of knowledge about uncertain environments
• X is conditionally independent of Y given Z if and only if:
P(x, y | z) = P(x | z) P(y | z)
or, if and only if:
P(x | y, z) = P(x | z)
Bayesian Network
Bayesian Network
• Bayes nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
• A Bayes net consists of 2 parts:
- Topology (in Directed Acyclic Graph form)
- Local conditional probabilities
Inference in Bayes Nets
• Bayes nets encode a joint distribution as the product of the conditional distributions of each variable given its parents:
P(x1, …, xn) = ∏i P(xi | parents(Xi))
• Example: P(J, M, A, B, E) = P(J | A) P(M | A) P(A | B, E) P(B) P(E)
Bayesian Network
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a burglary, but it also responds to minor earthquakes. Harry has two neighbors, John and Mary, who have taken responsibility for informing Harry at work when they hear the alarm. John always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. On the other hand, Mary likes to listen to loud music, so sometimes she fails to hear the alarm.
Problem: Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary called Harry.
Bayesian Network
P(J, M, A, ¬B, ¬E) = P(J | A) · P(M | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
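Evaluating this product with illustrative CPT values in the style of the classic textbook alarm network (assumed numbers; the slide’s actual table was in a figure):

```python
# Assumed local probabilities (illustrative, textbook-style values).
P_B = 0.001          # P(B): burglary
P_E = 0.002          # P(E): earthquake
P_A_nB_nE = 0.001    # P(A | not B, not E)
P_J_A = 0.90         # P(J | A): John calls given alarm
P_M_A = 0.70         # P(M | A): Mary calls given alarm

# Joint probability as the product of the local conditionals.
p = P_J_A * P_M_A * P_A_nB_nE * (1 - P_B) * (1 - P_E)
print(p)  # about 0.000628
```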
Bayesian Network
P(S=1 | W=1) = ?
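A query like this needs the joint summed over the unobserved variables. A Python sketch of that enumeration for the classic cloudy/sprinkler/rain/wet-grass network, with assumed Pearl-style CPTs (the slide’s real numbers were in a figure):

```python
import itertools

# Assumed CPTs: C = cloudy, S = sprinkler, R = rain, W = wet grass.
def P_C(c):    return 0.5
def P_S(s, c): return (0.1 if s else 0.9) if c else (0.5 if s else 0.5)
def P_R(r, c): return (0.8 if r else 0.2) if c else (0.2 if r else 0.8)
def P_W(w, s, r):
    p = 0.99 if (s and r) else 0.9 if (s or r) else 0.0
    return p if w else 1 - p

def joint(c, s, r, w):
    # Product of local conditionals, as in the Bayes net factorization.
    return P_C(c) * P_S(s, c) * P_R(r, c) * P_W(w, s, r)

# P(S=1 | W=1): sum over the hidden variables C and R.
num = sum(joint(c, 1, r, 1) for c in (0, 1) for r in (0, 1))
den = sum(joint(c, s, r, 1)
          for c, s, r in itertools.product((0, 1), repeat=3))
print(num / den)  # about 0.43 with these assumed CPTs
```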
Decision Networks
Node types:
- Action node
- Chance node
- Utility node
Choose the action which maximizes the expected utility given the evidence.
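A minimal sketch of that decision rule in Python; the weather probabilities and utility numbers below are made up for illustration:

```python
# Assumed chance-node distribution and utility table U(action, outcome).
P_weather = {"sun": 0.7, "rain": 0.3}
U = {("leave_umbrella", "sun"): 100, ("leave_umbrella", "rain"): 0,
     ("take_umbrella", "sun"): 20, ("take_umbrella", "rain"): 70}

# Expected utility of each action, then pick the maximizer (MEU).
actions = ("leave_umbrella", "take_umbrella")
EU = {a: sum(P_weather[w] * U[(a, w)] for w in P_weather) for a in actions}
best = max(EU, key=EU.get)
print(EU)    # {'leave_umbrella': 70.0, 'take_umbrella': 35.0}
print(best)  # 'leave_umbrella', given this evidence
```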
Naïve Bayes
• The Naïve Bayes classifier is a supervised machine learning algorithm used for classification tasks
• Procedure:
- Step 1: Convert the data into a frequency table
- Step 2: Create a likelihood table by computing the probabilities
- Step 3: Use Bayes’ equation to calculate the posterior probability of each class. The class with the highest posterior probability is the prediction
Naïve Bayes
Example: spam email detection
Input: an email
Output: spam or normal
Setup:
• Get a large collection of example emails, each labeled “spam” or “normal”
• Select features:
e.g.: words, text patterns …
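A minimal sketch of the three-step procedure on this spam task; the tiny training counts and the add-one (Laplace) smoothing are assumptions for illustration:

```python
from collections import Counter

# Step 1: frequency tables from a tiny assumed training set.
spam_words = Counter("win money now win prize".split())
normal_words = Counter("meeting notes for the project meeting".split())
n_spam, n_normal = 5, 6        # total words per class
prior_spam = prior_normal = 0.5

def likelihood(word, counts, total, vocab_size=10):
    # Step 2: word likelihoods, with add-one smoothing for unseen words.
    return (counts[word] + 1) / (total + vocab_size)

def posterior_spam(email):
    # Step 3: Bayes' rule with a naive independence assumption over
    # words; the class with the higher posterior is the prediction.
    ps, pn = prior_spam, prior_normal
    for w in email.split():
        ps *= likelihood(w, spam_words, n_spam)
        pn *= likelihood(w, normal_words, n_normal)
    return ps / (ps + pn)

print(posterior_spam("win money"))  # > 0.5, so classified as spam
```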
Naïve Bayes
Example: digit recognition
Input: an image
Output: a digit 0–9
Setup:
• Get a large collection of example images, each labeled with a digit
• Select features:
e.g.: pixel values, shape …
Decision Tree