Fuzzy Notes
SATHYABAMA UNIVERSITY
SCSX1029 SOFT COMPUTING
UNIT 1
Neural Networks
Introduction to ANS- adaline- BPN- Hopfield network- Boltzman machine- Self Organizing maps
Neural Networks
Neural networks are highly interconnected processing elements or neurons. Artificial Neural Networks
(ANNs) are inspired by biological neurons. ANNs are used to estimate and approximate functions
that depend on a large number of generally unknown inputs.
Applications of ANNs:
1. Signal Processing: In telephone networks ANNs are used to process the signals thereby
suppressing the noise
2. Pattern Recognition: Handwritten characters, Radar signal classification and analysis, Speech
recognition, Finger print recognition, character recognition, Handwriting analysis
3. Medicine: ECG signal analysis and understanding, Diagnosis of diseases, Medical Image
Processing
4. Image Processing: Image matching, pre-processing, image compression
5. Military Systems: Sea mine detection, Radar clutter classification
6. Power Systems: State estimation, Transient detection, Fault detection
The human brain has about ten billion interconnected neurons. Each neuron (nerve cell) uses biochemical
reactions to receive, process and transmit information.
Fig. Structure of a biological neuron: dendrites, nucleus, soma (cell body), axon, axon hillock
Dendrites: Receive signals from other neurons. Signals are electric impulses transmitted across synaptic
gap using chemical reactions.
Soma: Contains the nucleus and perikaryon. It sums the incoming signals; when sufficient input is
received, the cell fires.
Axon: The neuron sends spikes of electrical activity through a long thin strand called the axon, which splits
into branches.
Characteristics of an ANN
- Architecture
- Learning algorithm
- Activation function
Representation of Artificial Neuron
net = x1·w1 + x2·w2 + … + xn·wn, y = f(net)
Let PE denote a processing element. The properties of the PEs of an ANN parallel those of biological neurons.
ANN functioning
ANN functioning corresponds to the arrangement of neurons into layers and connected patterns
between each layer.
Types:
Single layer representation: The single layer representation comprises only the input layer and the
output layer. No hidden layers are included.
Multilayer network representation: The multilayer representation consists of Input layer, one or more
hidden layers and output layer.
Feedback Networks or Recurrent Networks or Counter Propagation Networks
Inputs: x1,x2
Output: y1, y2
Activation function (AF) types:
1. Step function
2. Sigmoidal function
A step function may produce:
1. Binary output (1 or 0)
2. Bipolar output (+1 or −1)
Learning or training
Classification of Learning Algorithms
1. Supervised Learning:
a. Hebb rule: ∆wᵢ = η·xᵢ·y
The Hebb rule determines the change ∆wᵢ in the ith synaptic weight of a node.
η: learning rate
xᵢ: input
y: post-synaptic response, with y = Σⱼ wⱼxⱼ for each node j.
b. Delta rule:
The delta rule is the gradient-descent learning rule for updating input weights in a single-layer ANN.
∆wᵢ = η·(d − y)·xᵢ
η: learning rate
xᵢ: input
d: desired output (target output)
y: actual output, with y = Σᵢ wᵢxᵢ
c. Back Propagation Network(BPN)
d. Perceptron Learning rule
The perceptron learning rule was developed by Frank Rosenblatt in the late 1950s. Training patterns are
presented to the network's inputs; the output is computed. Then the connection weights wj are
modified by an amount that is proportional to the product of the difference between the actual output
y, and the desired output d, and the input pattern x:
wⱼ(t+1) = wⱼ(t) + η·(d − y)·xⱼ
η: Learning rate
x : Input
d" : Desired output or target output
y" : actual output
t: iteration number
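The perceptron rule can be illustrated with a minimal Python sketch (not part of the original notes; the AND training set, learning rate and epoch count are assumptions chosen for illustration):

def train_perceptron(samples, eta=0.1, epochs=20):
    """Train a single perceptron with the rule w <- w + eta*(d - y)*x."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0                                    # bias, learned like a weight
    for _ in range(epochs):
        for x, d in samples:
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if net >= 0 else 0           # step activation
            err = d - y                        # the (d - y) error term
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

# AND function: converges because AND is linearly separable (illustrative data)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)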
Note: In supervised learning the error information is used to improve the network behaviour
Associative memory: represents the neural-network association between an input vector and an
output vector.
Auto-associative memory: if the desired output vector is the same as the input vector, the memory is
called auto-associative.
Hetero-associative memory: if the desired output (target) vector is different from the input vector.
2. Unsupervised Learning
Error information is not used to improve the network behaviour. The network is self organizing.
Neuron Modelling
McCulloch Pitts Neuron model
Components:
Set of Inputs xi
Threshold θ
Activation function f
Architecture
θ: threshold
Activation function:
f(net) = 1 if net ≥ θ
f(net) = 0 if net < θ, where net = Σᵢ wᵢxᵢ
Example 1: McCulloch pitts neuron for AND function
X1 X2 Y
1 1 1
1 0 0
0 1 0
0 0 0
Example 2: McCulloch pitts neuron for OR function
X1 X2 Y
1 1 1
1 0 1
0 1 1
0 0 0
Example 3: McCulloch pitts neuron for XOR function
X1 X2 Y
1 1 0
1 0 1
0 1 1
0 0 0
Example 4: McCulloch pitts neuron to perform XOR with the following neural model:
z₁ = 1 if z₁_in ≥ 2, 0 if z₁_in < 2
z₂ = 1 if z₂_in ≥ 2, 0 if z₂_in < 2
y_in = z₁·v₁ + z₂·v₂
y = 1 if y_in ≥ 2, 0 if y_in < 2
Here z₁ and z₂ are hidden units (detecting x₁ AND NOT x₂ and x₂ AND NOT x₁ respectively) and the
output unit computes z₁ OR z₂.
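A minimal Python sketch of such a McCulloch-Pitts XOR network (the weight values 2 and −1 are assumed choices that satisfy the threshold of 2; they are not given in the original notes):

def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts unit: fire (1) iff the weighted sum reaches threshold theta."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

def xor(x1, x2):
    # Hidden units detect x1 AND NOT x2, and x2 AND NOT x1 (assumed weights)
    z1 = mp_neuron((x1, x2), (2, -1), 2)
    z2 = mp_neuron((x1, x2), (-1, 2), 2)
    return mp_neuron((z1, z2), (2, 2), 2)   # output unit = z1 OR z2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))              # reproduces the XOR truth table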
Simple Neural network for pattern classification
Let ‘b’ be the bias, whose input is always 1. For bipolar input, the output function y is as follows:
y = 1 if y_in > 0
y = −1 if y_in < 0
where y_in = b + Σᵢ xᵢwᵢ
1.2 Adaline: Adaptive Linear Neuron
Rule: The difference between the actual output and the desired output is the basis for error correction.
Learning: the changing of weights in the ANN. The value of the correction is proportional to the signal
at the element's input.
Structure of Adaline:
Basic structure follows simple neuron with linear activation function and a feedback loop.
The output is computed with a linear activation:
y = b + Σᵢ₌₁ⁿ wᵢxᵢ
If x₀ = 1 (so that w₀ plays the role of the bias b), then
y = Σᵢ₌₀ⁿ wᵢxᵢ = wᵀx
Applications of Adaline
- Making binary decisions
- Realizations of AND, OR and NOT gates
- Only linear separable functions are recognized.
- Linear separability: the idea behind hidden layers. Two sets are linearly separable if there exists at
least one line in the plane with all of the positive values on one side of the line and all the negative
values on the other side.
Fig. Linear Separability
- But the XOR function cannot be separated using a single line; two lines are needed to segregate the
positive and negative values. Hence XOR is not supported by Adaline.
Fig. XOR is not linearly separable
w: weights
y: output values
Given the set of input–desired output pairs {(x₁,d₁), (x₂,d₂), …}, the best value w* needs to be
calculated. With the error eₖ = dₖ − yₖ and the linear output yₖ = wᵀxₖ, substituting the latter into the
former gives
eₖ = dₖ − wᵀxₖ
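A minimal Python sketch of Adaline/LMS training (the bipolar AND data set and the hyper-parameters are illustrative assumptions):

def train_adaline(samples, eta=0.05, epochs=50):
    """Adaline sketch: linear output y = w.x + b, LMS update w <- w + eta*(d - y)*x."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear activation
            err = d - y
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

# Bipolar AND: targets in {-1, +1}; classify with sign(w.x + b) after training
data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, b = train_adaline(data)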
1.3 Back Propagation Network
-Supervised Learning
- Multilayer Perceptron
BPN rule: Adjusting the weights in the previous layers to reduce the error. This leads to the delta
learning rule.
BPN algorithm
Step 1: Read the inputs x[i]
For i=1 to n
Read x[i]
Step 2: Read the desired outputs o[k]
For k=1 to m
Read o[k]
Step 3: Read the input hidden weights whij
For i=1 to n
For j=1 to h
Read wh[i][j]
Step 4: Read the hidden output weights wojk
For j=1 to h
For k=1 to m
Read wo[j][k]
Step 5: Calculate the net input to each hidden node
For j=1 to h
neth[j] = Σᵢ x[i]·wh[i][j]
Step 6: Calculate the output of each hidden node (sigmoid activation)
For j=1 to h
oh[j] = 1 / (1 + e^(−neth[j]))
Step 7: Calculate the net input to each output node
For k=1 to m
neto[k] = Σⱼ oh[j]·wo[j][k]
Step 8: Calculate the output of each output node
For k=1 to m
oo[k] = 1 / (1 + e^(−neto[k]))
Step 9: Calculate error in output layer ‘eok’ ( Desired output- Actual output)
For k=1 to m
eo[k] = o[k] − oo[k]
do[k] = eo[k]·oo[k]·(1 − oo[k])  (error term scaled by the sigmoid derivative)
Step 10: Calculate the error in the hidden layer ‘ehj’
For j=1 to h
eh[j] = oh[j]·(1 − oh[j])·Σₖ do[k]·wo[j][k]
Step 11: Calculate the new weights (nwojk) for the hidden output layer
For j=1 to h
For k=1 to m
nwo[j][k] = wo[j][k] + η·do[k]·oh[j]
Step 12: The new weights for the input hidden layer (‘nwhij’) are calculated as follows:
For i=1 to n
For j=1 to h
nwh[i][j] = wh[i][j] + η·eh[j]·x[i]
The new weights obtained are nwo[j][k] for the hidden output layer and nwh[i][j] for the input hidden layer.
Step 13: Replace old weights in hidden layer and output layer with new weights ‘nwhij’ and ‘nwojk’
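The steps above can be condensed into a short Python sketch (a minimal illustration, not the notes' reference implementation; bias terms, the XOR data set and all hyper-parameters are assumptions):

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bpn(data, n, h, m, eta=0.5, epochs=8000):
    """One-hidden-layer backpropagation following Steps 1-13 above."""
    wh = [[random.uniform(-1, 1) for _ in range(h)] for _ in range(n)]
    bh = [random.uniform(-1, 1) for _ in range(h)]
    wo = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(h)]
    bo = [random.uniform(-1, 1) for _ in range(m)]
    for _ in range(epochs):
        for x, d in data:
            # forward pass (Steps 5-8)
            oh = [sigmoid(bh[j] + sum(x[i] * wh[i][j] for i in range(n))) for j in range(h)]
            oo = [sigmoid(bo[k] + sum(oh[j] * wo[j][k] for j in range(h))) for k in range(m)]
            # error terms (Steps 9-10), scaled by the sigmoid derivative o(1-o)
            do = [(d[k] - oo[k]) * oo[k] * (1 - oo[k]) for k in range(m)]
            eh = [oh[j] * (1 - oh[j]) * sum(do[k] * wo[j][k] for k in range(m)) for j in range(h)]
            # weight updates replace the old weights (Steps 11-13)
            for j in range(h):
                for k in range(m):
                    wo[j][k] += eta * do[k] * oh[j]
            for k in range(m):
                bo[k] += eta * do[k]
            for i in range(n):
                for j in range(h):
                    wh[i][j] += eta * eh[j] * x[i]
            for j in range(h):
                bh[j] += eta * eh[j]
    return wh, bh, wo, bo

xor_data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
wh, bh, wo, bo = train_bpn(xor_data, n=2, h=3, m=1)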
1.4 Hopfield Network
- Symmetric weights (wᵢⱼ = wⱼᵢ)
- Single layer recurrent network
Unit update rule: xᵢ = 1 if Σⱼ wᵢⱼxⱼ ≥ θᵢ, and xᵢ = 0 if Σⱼ wᵢⱼxⱼ < θᵢ
Hopfield Algorithm.
20
Hopfield nets have a scalar value associated with each state of the network referred to as the "energy",
E, of the network, where:
E = −½ Σᵢ Σⱼ wᵢⱼ xᵢ xⱼ + Σᵢ θᵢ xᵢ
E: energy
θᵢ: threshold
wᵢⱼ: weight
xᵢ: state of unit i
Function of the energy landscape: storage and retrieval. Units are randomly chosen for updating. When
a unit is updated, the energy E either decreases or stays the same. Repeated updating leads to eventual
convergence to a local minimum of the energy function (a Lyapunov function). Thus E is stable when the
state is a local minimum.
Learning:
Local learning: A learning rule is local if each weight is updated using information available to neurons
on either side of the connection that is associated with that particular weight.
Incremental learning: New patterns can be learned without using information from the old patterns
that have been also used for training. That is, when a new pattern is used for training, the new values
for the weights only depend on the old values and on the new pattern
The Hebbian rule is both local and incremental. For the Hopfield Networks, it is implemented in
the following manner, when learning n binary patterns:
wᵢⱼ = Σₚ₌₁ⁿ εᵢᵖ·εⱼᵖ, where εᵢᵖ is bit i of pattern p. If bits i and j of a pattern are
equal, then the product εᵢᵖ·εⱼᵖ is positive. This in turn has a positive effect on the weight wᵢⱼ. If
the bits are different, then the product is negative.
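A minimal Python sketch of Hebbian storage and asynchronous recall in a Hopfield net (the stored patterns and the fixed number of update steps are illustrative assumptions):

import random

def hopfield_train(patterns):
    """Hebbian rule: w[i][j] = sum over patterns of e_i * e_j (bipolar +/-1), zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def hopfield_recall(w, state, steps=100):
    """Asynchronous updates: each randomly chosen unit's update never increases the energy."""
    s = list(state)
    n = len(s)
    for _ in range(steps):
        i = random.randrange(n)
        net = sum(w[i][j] * s[j] for j in range(n))
        s[i] = 1 if net >= 0 else -1
    return s

stored = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
w = hopfield_train(stored)
print(hopfield_recall(w, [1, -1, 1, -1, 1, 1]))   # a noisy probe settles near a stored pattern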
Drawback of Hopfield Network
1.5 Boltzmann Machine
Training involves two phases:
Phase 1: Incremental
Phase 2: Decremental
1.6 Self-Organizing Maps (SOM)
A self-organizing map consists of components called nodes or neurons. Associated with each node are
a weight vector of the same dimension as the input data vectors, and a position in the map space.
- Unsupervised learning
- to produce a low-dimensional (typically two-dimensional), discretized representation of the input
space of the training samples, called a map.
- they use a neighborhood function to preserve the topological properties of the input space.
- Two models available are the Kohonen model and the Willshaw model
Kohonen Model:
1. Training
2. Mapping
The self-organizing process involves competition, cooperation and adaptation.
Competition: For each input pattern, the neurons compute their respective values of a discriminant
function; the neuron with the best value (e.g. the smallest distance to the input) wins the competition.
Cooperation: The winning neuron determines the spatial location of a topological neighbourhood of
excited neurons, thereby providing the basis for cooperation among neighbouring neurons.
Adaptation: The excited neurons decrease their individual values of the discriminant function in relation
to the input pattern through suitable adjustment of the associated connection weights, such that the
response of the winning neuron to the subsequent application of a similar input pattern is enhanced.
Training Data:
Inputs X: x1, x2, …, xn
Outputs Y: y1, y2, …, ym
Network Architecture:
2 Layer of units
Input: n units
Output: m units
Algorithm:
wⱼ(t+1) = wⱼ(t) + η(t)·(xᵢ − wⱼ(t)), for every neuron j in the winner's neighbourhood
4.5 Increment t
End while
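A compact Python sketch of this training loop (illustrative only; the 1-D grid size, the decaying learning-rate schedule and the shrinking neighbourhood radius are assumed choices):

import random

def train_som(data, grid=5, epochs=100):
    """1-D Kohonen map: winner = nearest weight vector; neighbours move toward the input."""
    dim = len(data[0])
    w = [[random.random() for _ in range(dim)] for _ in range(grid)]
    for t in range(epochs):
        eta = 0.5 * (1 - t / epochs)             # decaying learning rate eta(t)
        radius = max(1, int(grid / 2 * (1 - t / epochs)))
        for x in data:
            win = min(range(grid),
                      key=lambda j: sum((x[k] - w[j][k]) ** 2 for k in range(dim)))
            for j in range(grid):
                if abs(j - win) <= radius:       # topological neighbourhood of the winner
                    for k in range(dim):
                        w[j][k] += eta * (x[k] - w[j][k])
    return w

data = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
print(train_som(data))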
Application:
1. Phonetic typewriter
2. Pattern recognition: winning neurons with minimum distance are brought together in a single cluster.
UNIT- II
Fuzzy sets – Fuzzy rules and fuzzy reasoning – Fuzzy inference system – Mamdani fuzzy model
– Sugeno fuzzy model – Tsukamoto fuzzy model.
Introduction
1. As a professional subject for building systems of high utility - for example fuzzy control.
Fuzzy Logic is a form of multi-valued logic derived from fuzzy set theory to deal with
reasoning that is approximate rather than precise. Fuzzy logic is not a vague logic system, but a
system of logic for dealing with vague concepts. As in fuzzy set theory the set membership
values can range (inclusively) between 0 and 1, in fuzzy logic the degree of truth of a statement
can range between 0 and 1 and is not constrained to the two truth values true/false as in classic
predicate logic.
Problem: A real estate owner wants to classify the houses he offers to his clients. One main
indicator of comfort of these houses is the number of bedrooms in them. Let the available
types of houses be represented by the following set.
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Each element of U represents the number of bedrooms in a house. The realtor wants to
describe a "comfortable house for a 4-person family," using a fuzzy set.
Solution: The fuzzy set "comfortable type of house for a 4-person family" may be described
using a fuzzy set in the following manner.
HouseForFour = FuzzySet[{{1, 0.2}, {2, 0.5}, {3, 0.8}, {4, 1}, {5, 0.7}, {6, 0.3}},
UniversalSet → {1, 10}];
A Fuzzy System can be contrasted with a conventional - crisp system in three main ways:
2. Fuzzy Conditional Statements are expressions of the form “If A THEN B”, where A
and B have fuzzy meaning, e.g. “If x is small THEN y is large”, where small and large
are viewed as labels of fuzzy sets.
3. A Fuzzy Algorithm is an ordered sequence of instructions which may contain fuzzy
assignment and conditional statements, e.g., “x = very small, IF x is small THEN y is
large”. The execution of such instructions is governed by the compositional rule of
inference and the rule of the preponderant alternative.
A Fuzzy set is a set whose elements have degrees of membership. Fuzzy sets are an
extension of the classical notion of set (known as a Crisp Set). More mathematically, a fuzzy set
is a pair (A, µA) where A is a set and µA : A → [0, 1]. For all x Є A, µA(x) is called the grade of
membership of x. If µA(x) = 1, we say that x is Fully Included in (A, µA), and if µA(x) = 0, we
say that x is Not Included in (A, µA). If there exists some x Є A such that µA(x) = 1, we say that
(A, µA) is Normal. Otherwise, we say that (A, µA) is Subnormal.
A = µA(x1)/x1 + · · · + µA(xn)/xn
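These definitions translate directly into a short Python sketch (a dict from element to grade; the house example values come from the problem above, the helper names are hypothetical):

# Fuzzy set as a dict mapping each element x to its membership grade mu_A(x)
house_for_four = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}

def is_normal(fuzzy_set):
    """A fuzzy set is normal if some element has grade 1.0; otherwise it is subnormal."""
    return max(fuzzy_set.values()) == 1.0

def complement(fuzzy_set):
    """Standard complement: mu_notA(x) = 1 - mu_A(x)."""
    return {x: 1.0 - mu for x, mu in fuzzy_set.items()}

print(is_normal(house_for_four))          # True: the grade of 4 bedrooms is 1.0
print(complement(house_for_four)[1])      # 0.8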
Simple Example:
Suppose a child is asked which of the numbers in X are “large” relative to the others. The child
might come up with the following:
10 — definitely large — grade 1
9 — definitely large — grade 1
7 — maybe — grade 0.5
6 — not usually — grade 0.2
Following are the definitions for two fuzzy sets (A, µA) and (B, µB), where A, B ⊆ X:
Note that A ∩ ¬A is not necessarily the empty set, as would be the case with classical set
theory. Also, if A is crisp, then Aα = A for all α. We will define the Cartesian product A × B
to be the same as A ∩ B.
Membership Functions
A membership function (MF) is a curve that defines how each point in the input space is mapped
to a membership value (or degree of membership) between 0 and 1. The input space is
sometimes referred to as the universe of discourse, a fancy name for a simple concept.
• Triangles
• Trapezoids
• Bell Curves
• Gaussian and
• Sigmoidal
Human beings make decisions based on rules. Although we may not be aware of it, all the decisions
we make are based on computer-like if-then statements. If the weather is fine,
then we may decide to go out. If the forecast says the weather will be bad today, but fine
tomorrow, then we make a decision not to go today, and postpone it till tomorrow. Rules
associate ideas and relate one event to another.
Fuzzy machines, which always tend to mimic the behavior of man, work the same way.
However, the decision and the means of choosing that decision are replaced by fuzzy sets and the
rules are replaced by fuzzy rules. Fuzzy rules also operate using a series of if-then statements.
For instance: if x is A then y is B, where A and B are fuzzy sets on X and Y. Fuzzy rules define
fuzzy patches, which is the key idea in fuzzy logic.
A machine is made smarter using a concept designed by Bart Kosko called the Fuzzy
Approximation Theorem (FAT). The FAT theorem generally states a finite number of patches
can cover a curve as seen in the figure below. If the patches are large, then the rules are sloppy.
If the patches are small then the rules are fine.
Rule: if x is A then y is B
Fact: x is A’
Conclusion: y is B’
The i-th fuzzy rule from this rule base is defined as
Ri: if x is Ai and y is Bi then z is Ci
A fuzzy inference system (FIS) is a system that uses fuzzy set theory to map inputs (features in
the case of fuzzy classification) to outputs (classes in the case of fuzzy classification).
Fuzzy inference systems have been successfully applied in fields such as automatic control, data
classification, decision analysis, expert systems, and computer vision. Because of its
multidisciplinary nature, fuzzy inference systems are associated with a number of names, such as
fuzzy-rule-based systems, fuzzy expert systems, fuzzy modeling, fuzzy associative memory,
fuzzy logic controllers, and simply (and ambiguously) fuzzy systems.
− Here x is low; y is high; z is medium are fuzzy statements; x and y are input
variables; z is an output variable, low, high, and medium are fuzzy sets.
The antecedent describes to what degree the rule applies, while the conclusion assigns a fuzzy
function to each of one or more output variables. Most tools for working with fuzzy expert
systems allow more than one conclusion per rule.
The functional operations in fuzzy expert system proceed in the following steps.
− Fuzzification
− Fuzzy inferencing
− Aggregation
− Defuzzification
Fuzzification
● Fuzzy statements in the antecedent are resolved to a degree of membership between 0 and
1.
− If there is only one part to the antecedent, then this is the degree of support for the
rule.
− If there are multiple parts to the antecedent, apply fuzzy logic operators and
resolve the antecedent to a single number between 0 and 1.
− For AND -- min
− For OR -- max
Fuzzy Inferencing
− Truth value for the premise of each rule is computed and applied to the conclusion
part of each rule.
− This results in one fuzzy set to be assigned to each output variable for each rule.
The use of degree of support for the entire rule is to shape the output fuzzy set. The consequent
of a fuzzy rule assigns an entire fuzzy set to the output. If the antecedent is only partially true,
(i.e., is assigned a value less than 1), then the output fuzzy set is truncated according to the
implication method. If the consequent of a rule has multiple parts, then all consequents are
affected equally by the result of the antecedent. The consequent specifies a fuzzy set to be
assigned to the output. The implication function then modifies that fuzzy set to the degree
specified by the antecedent.
Aggregation
It is the process where the outputs of each rule are combined into a single fuzzy set.
• The input of the aggregation process is the list of truncated output functions returned by
the implication process for each rule.
• The output of the aggregation process is one fuzzy set for each output variable.
− Here, all fuzzy sets assigned to each output variable are combined together to
form a single fuzzy set for each output variable using a fuzzy aggregation
operator.
− common aggregation operators: the max, the sum (point-wise sum over all of the fuzzy sets), and the probabilistic sum.
Defuzzification
Some commonly used techniques are the centroid and maximum methods.
− In the centroid method, the crisp value of the output variable is computed by
finding the variable value of the centre of gravity of the membership function for
the fuzzy value.
− In the maximum method, one of the variable values at which the fuzzy set has its
maximum truth value is chosen as the crisp value for the output variable.
− bisector, middle of maximum (the average of the maximum value of the output
set), largest of maximum, and smallest of maximum, etc.
There are two types of fuzzy inference systems that can be implemented in the Fuzzy Logic
Toolbox: Mamdani-type and Sugeno-type.
Mamdani's fuzzy inference method is the most commonly seen fuzzy methodology. Mamdani's
method was among the first control systems built using fuzzy set theory. It was proposed in 1975
by Ebrahim Mamdani as an attempt to control a steam engine and boiler combination by
synthesizing a set of linguistic control rules obtained from experienced human operators.
To compute the output of this FIS given the inputs, one must go through six steps:
1. Determining a set of fuzzy rules,
2. Fuzzifying the inputs using the input membership functions,
3. Combining the fuzzified inputs according to the fuzzy rules to establish a rule strength,
4. Finding the consequence of the rule by combining the rule strength and the output membership
function,
5. Combining the consequences to get an output distribution, and
6. Defuzzifying the output distribution (this step is needed only if a crisp output (class) is required).
Fuzzy rules are a collection of linguistic statements that describe how the FIS should make a
decision regarding classifying an input or controlling an output. Fuzzy rules are always written in
the following form:
if (input 1 is membership function 1) and/or (input 2 is membership function 2) and/or … then (output is output membership function).
Example
Another Example
Fuzzification
The purpose of fuzzification is to map the inputs from a set of sensors (or features of those
sensors such as amplitude or spectrum) to values from 0 to 1 using a set of input membership
functions. In the example shown in the above figure, there are two inputs, x0 and y0 shown at the
lower left corner. These inputs are mapped into fuzzy numbers by drawing a line up from the
inputs to the input membership functions above and marking the intersection point.
These input membership functions, as discussed previously, can represent fuzzy concepts such as
"large" or "small", "old" or "young", "hot" or "cold", etc. When choosing the input membership
functions, the definition of what we mean by "large" and "small" may be different for each input.
Fuzzy Combinations
In making a fuzzy rule, we use the concept of "and", "or", and sometimes "not". The sections
below describe the most common definitions of these "fuzzy combination" operators. Fuzzy
combinations are also referred to as "T-norms".
a) Fuzzy “and”
where µ A is read as "the membership in class A" and µ B is read as "the membership in class
B". There are many ways to compute "and". The two most common are:
1. Zadeh - min(µA(x), µB(x)): this technique, named after the inventor of fuzzy set theory,
simply computes the "and" by taking the minimum of the two (or more) membership values.
This is the most common definition of the fuzzy "and".
2. Product - µA(x)·µB(x): this technique computes the fuzzy "and" by multiplying the
two membership values.
Any fuzzy "and" (a T-norm T) satisfies the boundary condition T(a,1) = T(1,a) = a.
One of the nice things about both definitions is that they also can be used to compute the
Boolean "and". The table below shows the Boolean "and" operation. Notice that both fuzzy
"and" definitions also work for these numbers. The fuzzy "and" is an extension of the
Boolean "and" to numbers that are not just 0 or 1, but between 0 and 1.
A B A∧B
0 0 0
0 1 0
1 0 0
1 1 1
The Boolean "and"
b) Fuzzy “or”
Similar to the fuzzy "and", there are two techniques for computing the fuzzy "or":
1. Zadeh - max(uA(x), uB(x)) This technique computes the fuzzy "or" by taking the maximum
of the two (or more) membership values. This is the most common method of computing the
fuzzy "or".
2. Product - µA(x) + µB(x) − µA(x)·µB(x): this technique uses the difference between the sum of
the two (or more) membership values and the product of the membership values.
Any fuzzy "or" (a T-conorm S) satisfies the boundary conditions
S(a,0) = S(0,a) = a
S(a,1) = S(1,a) = 1
Similar to the fuzzy "and", both definitions of the fuzzy "or" also can be used to compute the
Boolean "or". The table below shows the Boolean "or" operation. Notice that both fuzzy "or"
definitions also work for these numbers. The fuzzy "or" is an extension of the Boolean "or"
to numbers that are not just 0 or 1, but between 0 and 1.
A B A∨B
0 0 0
0 1 1
1 0 1
1 1 1
The Boolean "or"
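Both families of operators fit in a few lines of Python (a minimal sketch; the function names and test values are illustrative):

def fuzzy_and(mu_a, mu_b, method="zadeh"):
    """Zadeh 'and' = min; product 'and' = a*b. Both reduce to Boolean AND on {0,1}."""
    return min(mu_a, mu_b) if method == "zadeh" else mu_a * mu_b

def fuzzy_or(mu_a, mu_b, method="zadeh"):
    """Zadeh 'or' = max; probabilistic 'or' = a + b - a*b. Both reduce to Boolean OR on {0,1}."""
    return max(mu_a, mu_b) if method == "zadeh" else mu_a + mu_b - mu_a * mu_b

print(fuzzy_and(0.75, 0.25))              # 0.25
print(fuzzy_or(0.75, 0.25, "product"))    # 0.8125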
c) Consequence
The consequence of a fuzzy rule is computed in two steps:
1. Computing the rule strength by combining the fuzzified inputs using the fuzzy combination operations discussed above.
2. Clipping the output membership function at the rule strength. Once again, refer to Fig 5 to see
how this is done for a two-input, two-rule Mamdani FIS.
The outputs of all of the fuzzy rules must now be combined to obtain one fuzzy output
distribution. This is usually, but not always, done by using the fuzzy "or". Figure 5 shows an
example of this. The output membership functions on the right hand side of the figure are
combined using the fuzzy "or" to obtain the output distribution shown on the lower right corner
of the figure.
In many instances, it is desired to come up with a single crisp output from a FIS. For
example, if one was trying to classify a letter drawn by hand on a drawing tablet, ultimately
the FIS would have to come up with a crisp number to tell the computer which letter was
drawn. This crisp number is obtained in a process known as defuzzification. There are two
common techniques for defuzzifying:
1. Center of mass - This technique takes the output distribution found previously and finds
its center of mass to come up with one crisp number. This is computed as follows:
z* = Σⱼ zⱼ·µC(zⱼ) / Σⱼ µC(zⱼ)
where z* is the center of mass and µC is the membership in class c at value zⱼ. An example
outcome of this computation is shown in Fig 6.
Fig 6. Defuzzification Using the Center of Mass
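A minimal Python sketch of both defuzzification formulas over a discretized output (the sample distribution values are assumptions for illustration):

def centroid(zs, mus):
    """Center of mass: z* = sum(z_j * mu(z_j)) / sum(mu(z_j))."""
    return sum(z * m for z, m in zip(zs, mus)) / sum(mus)

def mean_of_max(zs, mus):
    """Average of the points where the aggregate output reaches its maximum grade."""
    peak = max(mus)
    points = [z for z, m in zip(zs, mus) if m == peak]
    return sum(points) / len(points)

zs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mus = [0.0, 0.1, 0.3, 0.3, 0.3, 0.5, 0.75, 0.75, 0.75, 0.4, 0.2]
print(centroid(zs, mus), mean_of_max(zs, mus))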
2. Mean of maximum - This technique takes the output distribution found previously and
finds its mean of maxima to come up with one crisp number. This is computed as follows:
z* = Σⱼ₌₁ˡ zⱼ / l
where z* is the mean of maximum, zⱼ is a point at which the membership function is maximum,
and l is the number of times the output distribution reaches the maximum level. An example
outcome of this computation is shown in Figure 7.
Fig 7. Defuzzification Using the Mean of Maximum
Fuzzy Inputs
In summary, Fig 5 shows a two input Mamdani FIS with two rules. It fuzzifies the two inputs by
finding the intersection of the crisp input value with the input membership function. It uses the
minimum operator to compute the fuzzy "and" for combining the two fuzzified inputs to obtain a
rule strength. It clips the output membership function at the rule strength. Finally, it uses the
maximum operator to compute the fuzzy "or" for combining the outputs of the two rules.
Fig 8. A two Input, two rule Mamdani FIS with a fuzzy input
Fig 8 shows a modification of the Mamdani FIS where the input y0 is fuzzy, not crisp. This can
be used to model inaccuracies in the measurement. For example, we may be measuring the
output of a pressure sensor. Even with the exact same pressure applied, the sensor is measured to
have slightly different voltages. The fuzzy input membership function models this uncertainty.
The input fuzzy function is combined with the rule input membership function by using the
fuzzy "and" as shown in Fig 8.
The Sugeno FIS is quite similar to the Mamdani FIS. The primary difference is that the output
consequence is not computed by clipping an output membership function at the rule strength. In
fact, in the Sugeno FIS there is no output membership function at all. Instead the output is a crisp
number computed by multiplying each input by a constant and then adding up the results. This is
shown in Figure 9. "Rule strength" in this example is referred to as "degree of applicability" and
the output is referred to as the "action". Also notice that there is no output distribution, only a
"resulting action" which is the mathematical combination of the rule strengths (degree of
applicability) and the outputs (actions).
Fig 9. A two input, two rule Sugeno FIS (pn, qn, and rn are user-defined constants)
One of the large problems with the Sugeno FIS is that there is no good intuitive method for
determining the coefficients, p, q, and r. Also, the Sugeno has only crisp outputs which may not
be what is desired in a given HCI application. Why then would you use a Sugeno FIS rather than
a Mamdani FIS? The reason is that there are algorithms which can be used to automatically
optimize the Sugeno FIS.
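A minimal Python sketch of a two-rule first-order Sugeno FIS (the triangular membership functions and the constants p, q, r are assumed values for illustration, not from the original figure):

def sugeno(x, y, rules):
    """Each rule = (mu_x, mu_y, (p, q, r)).
    Output = weighted average of the crisp actions f_i = p*x + q*y + r,
    with rule strength (degree of applicability) from the fuzzy 'and' (min)."""
    num = den = 0.0
    for mu_x, mu_y, (p, q, r) in rules:
        w = min(mu_x(x), mu_y(y))
        num += w * (p * x + q * y + r)
        den += w
    return num / den

# Hypothetical memberships for 'low' and 'high' on [0, 10]
low = lambda v: max(0.0, 1 - v / 10)
high = lambda v: max(0.0, v / 10)
rules = [(low, low, (1, 1, 0)), (high, high, (2, -1, 5))]
print(sugeno(3.0, 4.0, rules))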
In the Tsukamoto fuzzy models, the consequent of each fuzzy if-then rule is represented by a
fuzzy set with a monotonic membership function, as shown in Figure 10. As a result, the
inferred output of each rule is defined as a crisp value induced by the rule’s firing strength. The
overall output is taken as the weighted average of each rule’s output. Figure 10 illustrates the
reasoning procedure for a two-input two-rule system.
Fig 10. The Tsukamoto fuzzy model
Since each rule infers a crisp output, the Tsukamoto fuzzy model aggregates each rule's output by
the method of weighted average and thus avoids the time-consuming process of defuzzification.
However, the Tsukamoto fuzzy model is not used often since it is not as transparent as either
the Mamdani or Sugeno fuzzy models. The following is a single-input example.
Fig 11. Single-input single output Tsukamoto fuzzy model: (a) antecedent MFs; (b)
consequent MFs; (c) each rule’s output curve; (d) overall input-output curve.
Since the reasoning mechanism of the Tsukamoto fuzzy model does not follow strictly the
compositional rule of inference, the output is always crisp even when the inputs are fuzzy.
References:
• https://www.calvin.edu/~pribeiro/othrlnks/Fuzzy/fuzzydecisions.htm
• Jang, Sun, Mizutani, "Neuro-Fuzzy and Soft Computing", Prentice Hall India, 2006
Unit III
Adaptive Neuro Fuzzy Inference System- Co active Neuro Fuzzy modelling- Classification and
Regression Trees- Data Clustering Algorithms- Rule based structure-Neuro Fuzzy control 1-
Neuro Fuzzy control 2- Fuzzy Decision Making
3.1 ANFIS
For simplicity, we assume that the fuzzy inference system under consideration has two inputs x
and y and one output z. For a first-order Takagi-Sugeno fuzzy model, a common rule set with two
fuzzy if-then rules is the following:
Rule 1: If x is A1 and y is B1, then f1=p1x+q1y+r1;
Rule 2: If x is A2 and y is B2, then f2=p2x+q2y+r2;
Figure (a) A two inputs first order Takagi-Sugeno fuzzy model with two rules; (b) The
equivalent ANFIS architecture.
Figure (a) illustrates the reasoning mechanism for this Takagi-Sugeno model; where nodes of the
same layer have similar functions. (Here we denote the output of the ith node in layer l as Ol,i )
Layer 1: Every node i in this layer is an adaptive node with a node function
O1,i = µAi(x) for i = 1, 2, or O1,i = µBi−2(y) for i = 3, 4,
where x (or y) is the input to node i and Ai (or Bi−2) is a linguistic label (such as "small" or
"large") associated with this node. In other words, O1,i is the membership grade of a fuzzy set A
(= A1, A2, B1 or B2) and it specifies the degree to which the given input x (or y) satisfies the
quantifier A. Here the membership function for A can be any appropriate parameterized
membership function, such as the generalized bell function:
µA(x) = 1 / (1 + |(x − ci)/ai|^(2bi))
where {ai, bi, ci} is the parameter set. As the values of these parameters change, the bell-shaped
function varies accordingly, thus exhibiting various forms of membership function for fuzzy
set A. Parameters in this layer are referred to as premise parameters.
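The generalized bell function is one line of Python (a minimal sketch; the parameter values in the call are arbitrary examples):

def generalized_bell(x, a, b, c):
    """Generalized bell membership: mu(x) = 1 / (1 + |(x - c)/a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

# a controls the width, b the slope, c the centre; all three are premise parameters
print(generalized_bell(2.0, a=2.0, b=2.0, c=1.0))   # about 0.941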
Layer 2: Every node in this layer is a fixed node labeled Π, whose output is the product of all the
incoming signals:
O2,i = wi = µAi(x)·µBi(y), i = 1, 2.
Each node output represents the firing strength of a rule. In general, any other T-norm operators
that perform fuzzy AND can be used as the node function in this layer.
Layer 3: Every node in this layer is a fixed node labeled N. The ith node calculates the ratio of
the ith rule's firing strength to the sum of all rules' firing strengths:
O3,i = w̄i = wi / (w1 + w2), i = 1, 2.
For convenience, outputs of this layer are called normalized firing strengths.
Layer 4: Every node i in this layer is an adaptive node with a node function
O4,i = w̄i·fi = w̄i·(pi·x + qi·y + ri),
where wi is a normalized firing strength from layer 3 and {pi, qi, ri} is the parameter set of this
node. Parameters in this layer are referred to as consequent parameters.
Layer 5: The single node in this layer is a fixed node labeled Σ, which computes the overall
output as the summation of all incoming signals:
O5,1 = Σi w̄i·fi = (Σi wi·fi) / (Σi wi)
i)The ANFIS can be trained by a hybrid learning algorithm.
ii)In the forward pass the algorithm uses least-squares method to identify the consequent
parameters on the layer 4.
iii) In the backward pass the errors are propagated backward and the premise parameters are
updated by gradient descent.
Suppose that an adaptive network has L layers and the kth layer has #(k) nodes.
We can denote the node in the ith position of the kth layer by (k, i). The node function is
denoted by Oᵢᵏ. Since the node output depends on its incoming signals and its parameter set
(a, b, c), we have
Oᵢᵏ = Oᵢᵏ(O₁ᵏ⁻¹, …, O#(k−1)ᵏ⁻¹, a, b, c)
Error Measure
Assume that a training data set has P entries. The error measure for the pth entry can be defined
as the sum of the squared error
Eₚ = Σₘ₌₁^#(L) (Tₘ,ₚ − Oᴸₘ,ₚ)²
where Tₘ,ₚ is the mth component of the pth target output vector and Oᴸₘ,ₚ is the mth component of
the actual output vector. The overall error is
E = Σₚ₌₁ᴾ Eₚ
When the number of rules is not restricted, a zero-order Sugeno model has unlimited
approximation power for matching any nonlinear function arbitrarily well on a compact set. This
can be proved using the Stone-Weierstrass theorem. Let domain D be a compact space of N
dimensions, and let F be a set of continuous real-valued functions on D satisfying the following
criteria:
Stone-Weierstrass theorem – I
Algebraic closure: If f and g are any two functions in F, then fg and af + bg are in F for any two
real numbers a and b.
Identity Function
Identity function: The constant function f(x) = 1 is in F. The first hypothesis requires that our fuzzy
inference system be able to compute the identity function f ( x) = 1. An obvious solution is to set
the consequence part of each rule equal to one.
Separability: For any two points x1 ≠ x2 in D, there is an f in F such that f(x1) ≠ f(x2). The
second hypothesis requires that our fuzzy inference system be able to compute functions that
have different values for different points. This is achievable by any fuzzy inference system with
appropriate parameters.
Algebraic closure addition: If f and g are any two functions in F, then af + bg are in F for any
two real numbers a and b. • Suppose that we have two fuzzy inference systems S and Sˆ; each of
them has two rules. • The final output of each system is specified as
z = (w1·f1 + w2·f2) / (w1 + w2)
CANFIS extends the basic ideas of its predecessor ANFIS (Adaptive Network based Fuzzy
Inference System): the ANFIS concept is extended to any number of input/output pairs.
In addition, CANFIS yields advantages from nonlinear fuzzy rules. CANFIS realizes
Sugeno-type (or TSK) fuzzy inferencing, accomplishing fuzzy if-then rules such as: If X is A1
and Y is B1, then C1 = p1·X + q1·Y + r1.
FRAMEWORK
Toward Multiple Inputs/Outputs Systems
CANFIS has extended the notion of a single-output system, ANFIS, to produce multiple
outputs. One way to get multiple outputs is to place as many ANFIS models side by side
as there are required outputs. In this MANFIS (multiple ANFIS) model, no modifiable
parameters are shared by
the juxtaposed ANFIS models. That is, each ANFIS has an independent set of fuzzy
rules, which makes it difficult to realize possible certain correlations between outputs. An
additional concern resides in the number of adjustable parameters, which drastically
increases as outputs increase.
Another way of generating multiple outputs is to maintain the same antecedents of fuzzy
rules among multiple ANFIS models.
Architectural Comparisons
3.3 CART: Classification and Regression Trees
In ANFIS, the learning rules deal only with parameter identification. There is still a need for
structure identification, which includes:
i. Input space partitioning
ii. Number of MFs for each input
iii. Antecedent(premise) part of fuzzy rules
iv. Consequent part of fuzzy rules
3. Choosing the initial parameters for MFs
CART is a quick method to solve the problem of structure identification.
The resulting ANFIS architecture based on CART is efficient in both training and application
because of weight normalization.
Decision Trees(DT)
Partitions the input space into mutually exclusive regions (each assigned a label/action/value)
-DT is a structure with internal and external nodes
* nodes connected by branches
* internal node is decision making unit
* External node (leaf/terminal node): has no child (assigned a label/value)
- Depending on the result of the decision function, the tree branches to one of the node's children
- A binary decision tree is a decision tree in which each internal node has exactly two children
- DTs used for classification problems are called classification trees (CTs)
- DTs used for regression problems are called regression trees (RTs)
Input Space partitioning for Binary DT
Tree Growing
CART grows a DT by determining the successive splits (partitions of the training data into disjoint subsets)
Classification trees:
The error measure is determined by the impurity function E(t), where t is the node:
E(t) = φ(p(1|t), …, p(J|t))
where p(j|t) is the proportion of class-j cases at node t. The best impurity functions for J-class
classification trees are the entropy function and the Gini diversity index.
pₗ, pᵣ: proportions of cases sent to the left and right child nodes by the split s.
Regression Trees
E(t) = Σᵢ (yᵢ − dₜ(xᵢ, θ))²
where {xᵢ, yᵢ} are data points, dₜ(x, θ) is the local model at node t, and θ is the modifiable parameter.
Tree pruning:
a. Cost-complexity pruning
b. Weakest-subtree shrinking
• Conceptually simple
• Computationally efficient
• Applicable to classification and regression problems
• Solid statistical foundation is available
• Suitable for high dimensional data
• Able to identify relevant inputs simultaneously.
K-Means clustering
• It is also called C-means clustering.
• Used in image and speech processing
• RBF(Radial Basis function): value based on distance from source
• Let xⱼ (j = 1…n) be the input vectors, to be clustered into C groups Gᵢ (i = 1…C) with cluster
centres cᵢ. The cost function J is
J = Σᵢ₌₁ᶜ Jᵢ = Σᵢ₌₁ᶜ ( Σ_{k, xₖ∈Gᵢ} ‖xₖ − cᵢ‖² )
Step 1: Initialize the cluster centres cᵢ (e.g. by randomly selecting C data points).
Step 2: Determine the membership matrix U:
uᵢₖ = 1 if ‖xₖ − cᵢ‖² ≤ ‖xₖ − cⱼ‖² for every j ≠ i,
uᵢₖ = 0 otherwise
Step 3: Compute the cost function J; stop if it is below a tolerance or its improvement is negligible.
Step 4: Update the cluster centres, cᵢ = (1/|Gᵢ|) Σ_{k, xₖ∈Gᵢ} xₖ, where |Gᵢ| is the size of group Gᵢ;
then return to Step 2.
For the on-line version, the centres are updated incrementally as each input x arrives:
cᵢ = cᵢ + η·(x − cᵢ)
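A minimal batch K-means sketch in Python (the two-cluster toy data and the fixed iteration count are illustrative assumptions):

import random

def kmeans(xs, c, iters=20):
    """Assign each point to the nearest centre, then recompute centres as group means."""
    centres = random.sample(xs, c)
    for _ in range(iters):
        groups = [[] for _ in range(c)]
        for x in xs:
            i = min(range(c),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centres[i])))
            groups[i].append(x)
        for i, g in enumerate(groups):
            if g:   # new centre = mean of the group's points (Step 4)
                centres[i] = [sum(col) / len(g) for col in zip(*g)]
    return centres

xs = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
print(kmeans(xs, c=2))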
∑" E$ 2
Step 2: Calculate fuzzy cluster centre
@$ =
∑ E$
"
<(N, @ . . @= ) = ∑=$ <$ = ∑=$ ∑" E$ 0$ , where 0$ = G@$ − 2 G
Step 4: Compute new N$ = , where 0$ is the distance between the ith custer
Q
ODP RST
∑UAVT
OAP
and jth data point, where m is the weighting exponent for all m∈ [1, ∞]
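The FCM alternation fits in a short Python sketch (toy data, m = 2 and the iteration count are assumed values; a small epsilon guards the division):

import random

def fcm(xs, c, m=2.0, iters=30):
    """Alternate the centre update (Step 2) and the membership update (Step 4)."""
    n, dim = len(xs), len(xs[0])
    u = [[random.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):                       # normalise memberships per data point
        s = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= s
    for _ in range(iters):
        centres = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            centres.append([sum(w[j] * xs[j][k] for j in range(n)) / sum(w)
                            for k in range(dim)])
        for j in range(n):
            d = [max(1e-9, sum((xs[j][k] - centres[i][k]) ** 2
                               for k in range(dim)) ** 0.5) for i in range(c)]
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / d[t]) ** (2 / (m - 1)) for t in range(c))
    return centres, u

centres, u = fcm([(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)], c=2)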
Mountain Clustering:
• Proposed by Yager and Filev
• The cluster centres are estimated by a density measure called the mountain function
• It is a quick, approximate clustering method
Step 1: Form a grid on the data space. The grid points constitute the candidates for cluster centres, v ∈ V.
Step 2: Compute the mountain function
m(v) = Σᵢ exp(−‖v − xᵢ‖² / (2σ²)), where xᵢ is the ith data point and σ is an application-specific
constant; the candidate with the largest mountain value becomes the first cluster centre c₁.
Step 3: Revise the mountain function by subtracting a scaled Gaussian centred at c₁:
m_new(v) = m(v) − m(c₁)·exp(−‖v − c₁‖² / (2β²))
so that the amount subtracted decays with the distance
between v and c₁; then select the next centre, and so on.
Subtractive clustering:
• Proposed by Chiu
• Data points themselves (instead of grid points) are used as the candidate cluster centres
Step 1: Compute a density measure Dᵢ for each data point xᵢ.
Step 2: After selecting the first centre x_c1, update Dᵢ:
Dᵢ = Dᵢ − D_c1 · exp(−‖xᵢ − x_c1‖² / ((r_b/2)²))
where D_c1 is the density at the chosen centre and r_b defines the neighbourhood whose density is reduced.
1. Neuro-fuzzy modelling has two phases:
i. Structure identification: simple grid partitioning
ii. Parameter identification: objective functions — a. density measure,
b. typicality measure
A binary fuzzy box tree T is a rooted tree in which each node has two children. Let R be the
node set of T; each r ∈ R is a fuzzy set with membership function µᵣ(u). If s is a child of r then
µₛ(u) ≤ µᵣ(u), ∀ u ∈ U, i.e. s ≤ r.
A focus set or focus window is a fuzzy set defined on feature space that indicates focus or the
current interest.
Linear efficiency algorithm or Algorithm to find the best rule base with respect to the current
context
Feedback control systems & Neuro Fuzzy Control- an overview
X(t)=state variables
Fig 2. Diagram of Discrete time domain feedback control system
• If the controller blocks in Fig 2 are replaced with neural networks or a FIS (Fuzzy
Inference System), then it is a neuro or fuzzy control system.
• Neuro-fuzzy control refers to design methods for fuzzy logic controllers.
• Most NFCs are non-linear, difficult to train, and require expert control, i.e. mimicking
an expert. For example, complex plants like electric trains and traffic signals need
knowledge acquisition and require human inputs.
An NFC has the unique properties of an ANFIS (Adaptive Neuro Fuzzy Inference System) controller:
1. Learning ability
2. Parallel operation
3. Structured knowledge representation
4. Better integration with other control design methods.
Two phases:
a. Learning phase
b. Application phase: generate control actions
x(k+1) = f(x(k), u(k)), where x(k) is the state at time k and u(k) is the control signal at time
k
State at k+2:
x(k+2) = f(x(k+1), u(k+1)) = f(f(x(k), u(k)), u(k+1))
Generalizing:
x(k+m) = F(x(k), U)
where U = [u(k), u(k+1), …, u(k+m−1)] is the sequence of control signals. Inverting this relation gives
U = G(x(k), x(k+m))
a. Plant block
b. Training phase
c. Application phase
Û = Ĝ(x(k), x_d(k+m))
In the application phase, the trained network Ĝ generates the control sequence that should drive the
plant from the current state x(k) to the desired state x_d(k+m).
Drawback:
To overcome the drawback and to minimize the system error specialized learning is preferred.
Specialized Learning
a. Plant block
Let the plant dynamics be x(k+1) = f(x(k), v(k))
θ = parameter vector
If v̂(k) is set as the plant's input v(k), then we have a closed-loop system specified by
Aim : To minimize the difference between closed loop systems and the desired model.
Specialized learning with model referencing
Reinforcement Learning
GARIC
Three components:
AHC models Critic module Action module
AHCON Neuro Neuro
GARIC Neuro Neuro-Fuzzy
RNN-FLCS Neuro-Fuzzy Neuro-Fuzzy
Decision making and control are two fields with distinct methods for solving problems, and yet
they are closely related. This book bridges the gap between decision making and control in the
field of fuzzy decisions and fuzzy control, and discusses various ways in which fuzzy decision
making methods can be applied to systems modeling and control.
Fuzzy decision making is a powerful paradigm for dealing with human expert knowledge when
one is designing fuzzy model-based controllers. The combination of fuzzy decision making and
fuzzy control in this book can lead to novel control schemes that improve the existing controllers
in various ways. The following applications of fuzzy decision making methods for designing
control systems are considered:
• Fuzzy decision making for enhancing fuzzy modeling. The values of important parameters
in fuzzy modeling algorithms are selected by using fuzzy decision making.
• Fuzzy decision making for designing signal-based fuzzy controllers. The controller
mappings and the defuzzification steps can be obtained by decision making methods.
• Fuzzy design and performance specifications in model-based control. Fuzzy constraints and
fuzzy goals are used.
• Design of model-based controllers combined with fuzzy decision modules. Human operator
experience is incorporated for the performance specification in model-based control.
Most decisions that people make are logical decisions: they look at the situation and make a
decision based on it. The generalized form of such a decision is called the generalized
modus ponens, which has the form:
If P, then Q.
P.
Therefore, Q.
This form of logical reasoning is fairly strict, Q can only be if P. Fuzzy logic loosens this
strictness by saying that Q can mostly be if P is mostly or:
If P, then Q.
mostly P.
Therefore, mostly Q.
Where P and Q are now fuzzy numbers. The reasoning above requires a set of rules to be
defined. These rules are linguistic rules relating different fuzzy sets and numbers. The general
form of these rules is: "if x is A then y is B," where x and y are fuzzy numbers in the fuzzy sets
A and B respectively. These fuzzy sets are defined by membership functions. There can be any
number of input and output membership functions for the same input as well, depending on the
number of rules in the system. For example, a system could have membership functions that
represent slow, medium, and fast as inputs.
The linguistic rules are used to define the relation between the input and the output, but how
exactly are the output fuzzy values determined? There are several ways to determine the answer
based on the inputs, mainly the Mamdani, Larsen, Takagi-Sugeno-Kang, and Tsukamoto
inference and aggregation methods. Firstly, we must describe the basic general set of rules; they
will be a set of rules that have one input in a fuzzy set and one output in a fuzzy set:
Let us look at a system that has two input membership functions (A1,A2) and two output
membership functions (B1, B2). These membership functions, shown below, define the fuzzy
sets A and B in the above general inference rule.
A1 and A2 are shown on the left, with A1 in blue and A2 in green. On the right B1 is blue and
B2 is green. We will be using the Mamdani inference model to combine the sets and rules. The
Mamdani inference model is:
R(x, y) = maxᵢ min(µAᵢ(x), µBᵢ(y))   (see p. 110 in Nguyen)
Using this model will give an aggregate fuzzy set, R, that uses the input values in A1 and A2
to modify and combine B1 and B2. The input membership functions, as well as the output
membership functions, are overlapping; this means that an input value can have membership in
both membership functions, or in only one. If the input value has membership in a function, then
any rule using that membership function is said to 'fire' and produce a result. These results are
then aggregated using the Mamdani model, or a different model.
Let us then pick an input value, 1.25, that has membership in both A1 and A2; this will
cause both rules to fire. The value 1.25 has a membership of 0.75 in A1 and a membership of
0.25 in A2. Using the Mamdani model and these inputs, the resulting aggregate output will be:
When all of these combinations have been made, the aggregate output membership function
(red), as well as B1 and B2 (dashed) are shown below:
This aggregate fuzzy membership function is the result of the rule-based inference decision
making process. To get a finite number as an output we need to go through the defuzzification
process. Defuzzification is a method that produces a number that best and consistently
represents the fuzzy set. There are many ways to do this, most of them being some type of
averaging method. The most common is the centroid method, which calculates the center of area of
the fuzzy set and uses the value at which this occurs as the defuzzified output. Other methods
include the bisector, largest of maximum, smallest of maximum, and middle of maximum. For
the above aggregate fuzzy set, the different defuzzification methods produce the finite values
shown below. So, if the most common method, centroid, is used, the finite result would be 7.319.
Smallest of Max: 6
Middle of Max: 7.5
Unit 4
Genetic Algorithms
4.1 Introduction
• Special Features:
– Representations
– Mutations
– Crossovers
– Selection mechanisms
• More fit individuals are stochastically selected from the current population.
• Iteration continues until a satisfactory fitness level has been reached for the
population.
Key terms
4.3 Reproduction
Population member   Fitness
1                   25.0
2                   5.0
3                   40.0
4                   10.0
5                   20.0
In outline, each generation proceeds as follows:
loop until the stopping criterion is met
  for each individual in the population
    evaluate its fitness
  end for
  do this twice: select a parent with probability proportional to fitness
  create offspring by crossover and mutation
end loop
Fitness Function
4.4 Crossover
The two strings participating in the crossover operation are known as parent
strings and the resulting strings are known as children strings
Crossover (single-point; the crossover site is marked by |):
Before crossover          After crossover
String 1# 101|11          String 1# 101|01
String 2# 100|01          String 2# 100|11
4.5 Mutation
Probᵢ = fᵢ / Σfᵢ (probability that individual i is selected)
Expected count = fᵢ / f̄, where f̄ is the average fitness
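Fitness-proportionate (roulette-wheel) selection is a few lines of Python (a minimal sketch using the population table above):

import random

def roulette_select(fitness):
    """Select individual i with probability f_i / sum(f)."""
    total = sum(fitness)
    r = random.uniform(0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r <= acc:
            return i
    return len(fitness) - 1

fitness = [25.0, 5.0, 40.0, 10.0, 20.0]    # the population above
# Expected count of member 3 (f = 40, average = 20): 40/20 = 2 copies per generation
print(roulette_select(fitness))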
Crossover
Mutation
Various softwares are available for coding Genetic Algorithms. Some of the
available softwares are listed as follows:
• JGAP
• jMetal
• Jenetics: Java Genetic Algorithm Library
• Java Graticule 3D
4.7 Fitness Scaling
Linear scaling: f′ = a·f + b
Power-law scaling: f′ = fᵏ
Sigma truncation: f′ = f − (f̄ − c·σ)
4.8 Applications of GA
i. Optimization
• numerical optimization
• combinatorial optimization
• Example problems:
• For example
– sorting networks
• host-parasite co-evolutions
• GAs have been used to study how individual learning and species evolution
affect one another.
• Evolution of communication
Unit 5
ARTIFICIAL INTELLIGENCE
Introduction – Searching techniques – First order Logic – Forward reasoning – Backward reasoning –
Semantic – Frames.
5.1 Introduction
Automated reasoning to use the stored information to answer questions and to draw new conclusions;
Machine learning to adapt to new circumstances and to detect and extrapolate patterns.
otherwise) that I could have?
(d) Acting rationally : The Rational Agent Approach
Rational behavior: doing the right thing
The right thing: that which is expected to maximize goal achievement, giventhe available
information.
Doesn't necessarily involve thinking (e.g. the blinking reflex) - but thinking should be in the service of
rational action.
An agent is something that acts. Computer agents are not mere programs, but they are expected to
have the following attributes: (a) operating under autonomous control, (b) perceiving their
environment, (c) persisting over a prolonged time period, and (d) adapting to change.
A rational agent is one that acts so as to achieve the best outcome.
Game playing:
IBM's Deep Blue became the first computer program to defeat the world champion in a chess match
when it bested Garry Kasparov by a score of 3.5 to 2.5 in an exhibition match (Goodman and Keene,
1997).
Diagnosis:
Medical diagnosis programs based on probabilistic analysis have been able to perform at the level of
an expert physician in several areas of medicine.
Robotics:
Many surgeons now use robot assistants in microsurgery. HipNav (DiGioia et al., 1996) is a system
that uses computer vision techniques to create a three-dimensional model of a patient's internal
anatomy and then uses robotic control to guide the insertion of a hip replacement prosthesis.
Language understanding and problem solving:
PROVERB (Littman et al., 1999) is a computer program that solves crossword puzzles better than
most humans, using constraints on possible word fillers, a large database of past puzzles, and a variety
of information sources including dictionaries and online databases such as a list of movies and the
actors that appear in them.
5.2 Searching Techniques
Example
The 8-puzzle
An 8-puzzle consists of a 3x3 board with eight numbered tiles and a blank space. A tile adjacent to the
blank space can slide into the space. The object is to reach the goal state, as shown in figure 5.1.
Example: The 8-puzzle
o States: a state description specifies the location of each of the eight tiles and the blank in one of
the nine squares. The goal can be reached from exactly half of the possible initial states.
o Successor function: This generates the legal states that result from trying the four actions (blank
moves Left, Right, Up or Down).
o Goal Test: This checks whether the state matches the goal configuration.
o Path cost: Each step costs 1, so the path cost is the number of steps in the path.
The 8-puzzle belongs to the family of sliding-block puzzles, which are often used as test problems for
new search algorithms in AI. This general class is known as NP-complete. The 8-puzzle has 9!/2 =
181,440 reachable states and is easily solved.
The 15-puzzle (4 x 4 board) has around 1.3 trillion states, and random instances can be solved
optimally in a few milliseconds by the best search algorithms.
The 24-puzzle (on a 5 x 5 board) has around 10^25 states, and random instances are still quite difficult
to solve optimally with current machines and algorithms.
Real-World Problems
• you want to find the solution containing the fewest arcs.
• few solutions may exist, and at least one has a short path length.
• infinite paths may exist, because it explores all of the search space, even with infinite paths.
Since it never generates a node in the tree until all the nodes at shallower levels have been
generated, breadth-first search always finds a shortest path to a goal. Since each node can be
generated in constant time, the amount of time used by breadth-first search is proportional to the
number of nodes generated, which is a function of the branching factor b and the solution depth d.
Since the number of nodes at level d is b^d, the total number of nodes generated in the worst case is
b + b^2 + b^3 + … + b^d, i.e. O(b^d), the asymptotic time complexity of breadth-first search.
Look at the above tree with nodes starting from root node, R at the first level, A and B at the second
level and C, D, E and F at the third level. If we want to search for node E then BFS will search level
by level. First it will check if E exists at the root. Then it will check nodes at the second level. Finally
it will find E at the third level.
Advantages of Breadth-First Search
1. Breadth-first search will never get trapped exploring a useless path forever.
2. If there is a solution, BFS will definitely find it out.
3. If there is more than one solution then BFS can find the minimal one that requires less
number of steps.
Disadvantages of Breadth-First Search
1. The main drawback of breadth-first search is its memory requirement. Since each level of the
tree must be saved in order to generate the next level, and the amount of memory is
proportional to the number of nodes stored, the space complexity of BFS is O(b^d).
2. If the solution is farther away from the root, breadth-first search will consume a lot of time.
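A minimal Python sketch of BFS on the example tree above (the adjacency-list representation and node names are illustrative):

from collections import deque

def bfs(graph, start, goal):
    """Level-by-level search; returns a shortest path (fewest arcs) if one exists."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                frontier.append(path + [child])
    return None

graph = {'R': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F']}
print(bfs(graph, 'R', 'E'))    # ['R', 'B', 'E']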
Depth First Search (DFS) searches deeper into the problem space. Depth-first search always
generates a successor of the deepest unexpanded node. It uses a last-in first-out stack for keeping the
unexpanded nodes. Depth-first search is implemented recursively, with the recursion stack taking the
place of an explicit node stack.
1.If the initial state is a goal state, quit and return success.
2.Otherwise, loop until success or failure is signaled.
a) Generate a state, say E, and let it be the successor of the initial state. If there is no successor, signal
failure.
b) Call Depth-First Search with E as the initial state.
c) If success is returned, signal success. Otherwise continue in this loop.
• The advantage of depth-first search is that its memory requirement is only linear with respect to the
search depth. This is in contrast with breadth-first search, which requires much more space. The reason is
that the algorithm only needs to store a stack of nodes on the path from the root to the current node.
• The time complexity of a depth-first Search to depth d is O(b^d) since it generates the same set of
nodes as breadth-first search, but simply in a different order. The depth-first search is time-limited
rather than space-limited.
• If depth-first search finds solution without exploring much in a path then the time and space it takes
will be very less.
Disadvantages of Depth-First Search
• The disadvantage of Depth-First Search is that there is a possibility that it may go down the left-most
path forever. Even a finite graph can generate an infinite tree. One solution to this problem is to
impose a cutoff depth on the search. Although the ideal cutoff is the solution depth d and this value is
rarely known in advance of actually solving the problem. If the chosen cutoff depth is less than d, the
algorithm will fail to find a solution, whereas if the cutoff depth is greater than d, a large price is paid
in execution time, and the first solution found may not be an optimal one.
• And there is no guarantee to find a minimal solution, if more than one solution exists.
c.Bidirectional Search
It searches forward from initial state and backward from goal state till both meet to identify a
common state.
The path from initial state is concatenated with the inverse path from the goal state. Each search is
done only up to half of the total path.
d. Uniform Cost Search
Uniform cost search expands the node with the lowest path cost g(n); it is optimal when all step costs are non-negative.
Disadvantage − There can be multiple long paths with cost ≤ C*. Uniform cost search must
explore them all.
It never creates a node until all lower nodes are generated. It only saves a stack of nodes. The
algorithm ends when it finds a solution at depth d. The number of nodes created at depth d is b^d and
at depth d−1 is b^(d−1).
Iterative Deepening Depth-First Search
f. Best First Search
In each iteration, a node with a minimum heuristic value is expanded, all its child nodes are created
and placed in the closed list. Then, the heuristic function is applied to the child nodes and they are
placed in the open list according to their heuristic value. The shorter paths are saved and the longer
ones are disposed.
g. A * Search
It is best-known form of Best First search. It avoids expanding paths that are already expensive, but
expands most promising paths first.
• f(n) = g(n) + h(n), the estimated total cost of the path through n to the goal, where g(n) is the cost
to reach n so far and h(n) is the estimated cost from n to the goal. It is implemented using a priority
queue ordered by increasing f(n).
Disadvantage − It can get stuck in loops. It is not optimal.
j.Hill-Climbing Search
It is an iterative algorithm that starts with an arbitrary solution to a problem and attempts to find a
better solution by changing a single element of the solution incrementally. If the change produces a
better solution, an incremental change is taken as a new solution. This process is repeated until there
are no further improvements.
Local Beam Search
The algorithm keeps track of k states; if any of them is a goal, it halts. Otherwise the (initial k states
and k successors of the states = 2k) states are placed in a pool. The pool is then sorted numerically.
The highest k states are selected as new initial states. This
process continues until a maximum value is reached.
Simulated Annealing
Annealing is the process of heating and cooling a metal to change its internal structure for modifying
its physical properties. When the metal cools, its new structure is seized, and the metal retains its
newly obtained properties. In simulated annealing process, the temperature is kept variable.
Initially set the temperature high and then allow it to ‘cool' slowly as the algorithm proceeds. When
the temperature is high, the algorithm is allowed to accept worse solutions with high frequency.
Start
• k = k + 1;
Repeat steps 1 through 4 until the criterion is met.
End
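A minimal Python sketch of simulated annealing (the toy 1-D cost function, the geometric cooling schedule, and all constants are assumptions for illustration):

import math, random

def simulated_annealing(cost, neighbour, x0, t0=10.0, cooling=0.95, iters=1000):
    """Accept worse moves with probability exp(-delta/T); T 'cools' each step."""
    x, t = x0, t0
    for _ in range(iters):
        x_new = neighbour(x)
        delta = cost(x_new) - cost(x)
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = x_new                  # accept any improvement, or a worse move while hot
        t = max(1e-6, t * cooling)     # cooling schedule
    return x

# Toy problem: minimise (x - 3)^2 starting from x = -10
best = simulated_annealing(lambda x: (x - 3) ** 2,
                           lambda x: x + random.uniform(-1, 1), -10.0)
print(best)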
Travelling Salesman Problem (brute-force search)
Start
Find out all (n − 1)! possible solutions, where n is the total number of cities.
Determine the minimum cost by finding the cost of each of these (n − 1)! solutions.
Finally, keep the one with the minimum cost.
End
A sentence in first-order logic is written in the form Px or P(x), where P is the predicate and x is the
subject, represented as a variable. Complete sentences are logically combined and manipulated
according to the same rules as those used in Boolean algebra.
In first-order logic, a sentence can be structured using the universal quantifier (symbolized ∀) or the
existential quantifier (symbolized ∃), for example:
∀x : A(x) ⇒ F(x)
Propositional logic assumes world contains facts, first-order logic (like natural language)
assumes the world contains
• Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games,
wars, centuries . . .
• Relations: red, round, bogus, prime, multistoried . . ., is the brother of, is bigger than, is
inside, is part of, has color, occurred after, owns, comes between, . . .
• Functions: father of, best friend, third inning of, one more than, end of
Variable symbols x, y, a, b, . . .
Connectives ∧ ∨ ¬ ⇒ ⇔
Equality =
Quantifiers ∀ ∃
Punctuation ( )
Atomic sentences
An atomic sentence is formed from a predicate symbol followed by a parenthesized list of terms,
e.g. Brother(KingJohn, RichardTheLionheart).
Complex sentences
Complex sentences are made from atomic sentences using connectives ¬S, S1 ∧ S2, S1 ∨ S2,
S1 ⇒ S2, S1 ⇔ S2
Universal quantification
∀<variables><sentence>
Example: ∀ x At(x, UMD) ⇒ Smart(x) — everyone at UMD is smart.
Common mistake with ∀: using ∧ as the main connective. ∀ x At(x, UMD) ∧ Smart(x) says that
everyone is at UMD and everyone is smart.
Existential quantification
∃<variables><sentence>
Common mistake with ∃: using ⇒ as the main connective. ∃ x At(x, UMD) ⇒ Smart(x) says
there's someone who either is smart or isn't at UMD; that's true if there's anyone who is not at
UMD. Probably you meant to say this instead: ∃ x At(x, UMD) ∧ Smart(x) — there's someone who is at
UMD and is smart.
Properties of quantifiers
∀ x ∀ y is the same as ∀ y ∀ x
∃ x ∃ y is the same as ∃ y ∃ x
∃ x ∀ y is not the same as ∀ y ∃ x:
∃ x ∀ y Loves(x, y): "there is a person who loves everyone in the world"
∀ y ∃ x Loves(x, y): "everyone in the world is loved by at least one person"
Quantifier duality: each quantifier can be expressed using the other. For example,
∀ x Likes(x, IceCream) is equivalent to ¬∃ x ¬Likes(x, IceCream).
Examples of sentences
"Brothers are siblings": ∀ x, y Brother(x, y) ⇒ Sibling(x, y)
"Sibling" is symmetric: ∀ x, y Sibling(x, y) ⇔ Sibling(y, x)
Forward Chaining
For example, suppose that the goal is to conclude the color of a pet named Fritz, given that he croaks
and eats flies, and that the rule base contains the following four rules:
1. If X croaks and X eats flies – Then X is a frog
2. If X chirps and X sings – Then X is a canary
3. If X is a frog – Then X is green
4. If X is a canary – Then X is yellow
Let us illustrate forward chaining by following the pattern of a computer as it evaluates the rules.
Assume the following facts:
• Fritz croaks
• Fritz eats flies
With forward reasoning, the inference engine can derive that Fritz is green in a series of steps:
1. Since the base facts indicate that "Fritz croaks" and "Fritz eats flies", the antecedent of rule #1 is
satisfied by substituting Fritz for X, and the inference engine concludes:
Fritz is a frog
2. The antecedent of rule #3 is then satisfied by substituting Fritz for X, and the inference engine
concludes:
Fritz is green
The name "forward chaining" comes from the fact that the inference engine starts with the data and
reasons its way to the answer, as opposed to backward chaining, which works the other way around.
In the derivation, the rules are used in the opposite order as compared to backward chaining. In this
example, rules #2 and #4 were not used in determining that Fritz is green.
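The following minimal Python sketch runs this forward-chaining loop over the four rules;
representing each rule as a pair (set of antecedent facts, consequent fact) is an assumption made for
illustration:

# Rules as (antecedent facts, consequent fact); Fritz already substituted for X.
rules = [
    ({"Fritz croaks", "Fritz eats flies"}, "Fritz is a frog"),   # rule 1
    ({"Fritz chirps", "Fritz sings"}, "Fritz is a canary"),      # rule 2
    ({"Fritz is a frog"}, "Fritz is green"),                     # rule 3
    ({"Fritz is a canary"}, "Fritz is yellow"),                  # rule 4
]
facts = {"Fritz croaks", "Fritz eats flies"}

# Fire any rule whose antecedent is satisfied until no new facts appear.
changed = True
while changed:
    changed = False
    for antecedent, consequent in rules:
        if antecedent <= facts and consequent not in facts:
            facts.add(consequent)
            changed = True

print("Fritz is green" in facts)  # True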
Backward Chaining
For example, suppose a new pet, Fritz, is delivered in an opaque box along with two facts about Fritz:
• Fritz croaks
• Fritz eats flies
The goal is to decide whether Fritz is green, based on a rule base containing the same four rules
listed above.
With backward reasoning, an inference engine can determine whether Fritz is green in four steps. To
start, the query is phrased as a goal assertion that is to be proved: "Fritz is green".
1. Fritz is substituted for X in rule #3 to see if its consequent matches the goal, so rule #3 becomes:
If Fritz is a frog – Then Fritz is green
Since the consequent matches the goal ("Fritz is green"), the rules engine now needs to see if the
antecedent ("Fritz is a frog") can be proved. The antecedent therefore becomes the new goal:
Fritz is a frog
2. Fritz is substituted for X in rule #1 to see if its consequent matches the new goal, so rule #1
becomes:
If Fritz croaks and Fritz eats flies – Then Fritz is a frog
Since the consequent matches the current goal ("Fritz is a frog"), the inference engine now needs to
see if the antecedent ("Fritz croaks and Fritz eats flies") can be proved. The antecedent therefore
becomes the new goal:
Fritz croaks and Fritz eats flies
3. Since this goal is a conjunction of two statements, the inference engine breaks it into two sub-goals,
both of which must be proved:
Fritz croaks
Fritz eats flies
4. The inference engine sees that both of these sub-goals were given as initial facts. Therefore, the
conjunction is true:
Fritz croaks and Fritz eats flies
Therefore the antecedent of rule #1 is true and its consequent must be true:
Fritz is a frog
Therefore the antecedent of rule #3 is true and its consequent must be true:
Fritz is green
This derivation therefore allows the inference engine to prove that Fritz is green. Rules #2 and #4
were not used.
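A minimal recursive Python sketch of the same goal-directed reasoning, again encoding rules as
(antecedents, consequent) pairs; only rules #1 and #3 are included, since rules #2 and #4 are never
used:

rules = [
    ({"Fritz croaks", "Fritz eats flies"}, "Fritz is a frog"),   # rule 1
    ({"Fritz is a frog"}, "Fritz is green"),                     # rule 3
]

def backward_chain(goal, rules, facts):
    """Prove a goal by matching rule consequents, then proving antecedents."""
    if goal in facts:                 # the sub-goal is a given base fact
        return True
    for antecedent, consequent in rules:
        # A rule whose consequent matches the goal turns its antecedent
        # into the new (conjunctive) set of sub-goals.
        if consequent == goal and all(backward_chain(a, rules, facts)
                                      for a in antecedent):
            return True
    return False

print(backward_chain("Fritz is green", rules,
                     facts={"Fritz croaks", "Fritz eats flies"}))  # True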
5.6 Semantic Networks
A semantic net (or semantic network) is a knowledge representation technique used for
propositional information, and so it is also called a propositional net. Semantic nets convey meaning.
They are two-dimensional representations of knowledge. Mathematically, a semantic net can be
defined as a labelled directed graph.
Semantic nets consist of nodes, links (edges) and link labels. In the semantic network
diagram, nodes appear as circles or ellipses or rectangles to represent objects such as physical objects,
concepts or situations. Links appear as arrows to express the relationships between objects, and link
labels specify particular relations. Relationships provide the basic structure for
organizing knowledge. The objects and relations involved need not be concrete. As nodes are
associated with other nodes, semantic nets are also referred to as associative nets.
In the figure above, all the objects are shown within ovals and connected using labelled arcs. Note
that there is a link between Jill and FemalePersons with the label MemberOf. Similarly, there is a
MemberOf link between Jack and MalePersons, and a SisterOf link between Jill and Jack. The
MemberOf link between Jill and FemalePersons indicates that Jill belongs to the category of female
persons.
Inheritance Reasoning
Unless there is specific evidence to the contrary, it is assumed that all members of a class (category)
inherit all the properties of their superclasses, so a semantic network allows us to perform
inheritance reasoning. For example, Jill inherits the property of having two legs because she belongs
to the category FemalePersons, which in turn belongs to the category Persons, which has a boxed
Legs link with value 2. Semantic nets allow multiple inheritance, so an object can belong to more
than one category, and a category can be a subset of more than one other category.
Inverse Links
Semantic networks allow a common form of inference known as inverse links. For example, we can
have a HasSister link which is the inverse of the SisterOf link. Inverse links make it much easier for
inference algorithms to answer queries such as who the sister of Jack is: on discovering that
HasSister is the inverse of SisterOf, the inference algorithm can follow the HasSister link from Jack
to Jill and answer the query.
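As a toy illustration, the network in the figure can be encoded as (node, link label, node) triples; the
triple representation and the helper functions below are assumptions, not part of the notes:

# Semantic net as (node, link label, node) triples, following the figure.
net = [
    ("Jill", "MemberOf", "FemalePersons"),
    ("Jack", "MemberOf", "MalePersons"),
    ("Jill", "SisterOf", "Jack"),
    ("FemalePersons", "SubsetOf", "Persons"),
    ("MalePersons", "SubsetOf", "Persons"),
    ("Persons", "Legs", "2"),               # boxed default value
]

def inherit(node, attribute):
    """Follow MemberOf/SubsetOf links upward until the attribute is found."""
    for s, label, o in net:
        if s == node and label == attribute:
            return o
        if s == node and label in ("MemberOf", "SubsetOf"):
            value = inherit(o, attribute)
            if value is not None:
                return value
    return None

def sister_of(person):
    """Answer 'who is the sister of X?' via the inverse of SisterOf."""
    return next((s for s, label, o in net
                 if label == "SisterOf" and o == person), None)

print(inherit("Jill", "Legs"))  # '2', inherited from Persons
print(sister_of("Jack"))        # 'Jill'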
Disadvantages of Semantic Nets
1. One of the drawbacks of semantic networks is that the links between objects represent only
binary relations.
Advantages of Semantic Nets
1. Semantic nets can represent default values for categories. In the figure above, Jack has one leg
even though he is a person and persons have two legs by default; "persons have two legs" has only
default status, which can be overridden by a specific value.
2. They convey some meaning in a transparent manner.
3. They are simple and easy to understand.
4. They are easy to translate into PROLOG.
5.7 Frames
Frames can also be regarded as an extension of semantic nets. Semantic nets were initially used to
represent labelled connections between objects; as tasks became more complex, the representation
needed to be more structured. A frame is a collection of attributes (slots) and associated values that
describe some real-world entity. Each frame represents:
• a class (set), or
• an instance (an element of a class).
Need of Frames
A frame is a type of schema used in many AI applications, including vision and natural language
processing. The situations represented may be visual scenes, the structure of complex physical
objects, etc. A frame is similar to a record structure: corresponding to its fields and values are slots
and slot fillers. Basically, it is a group of slots and fillers that defines a stereotypical object. A single
frame is not very useful by itself; frame systems usually have a collection of frames connected to
each other, and the value of an attribute of one frame may be another frame.
Example of an instance frame (a specific book):
Slot Filler
publisher Thomson
author Giarratano
edition Third
year 1998
pages 600
Frames can represent either generic entities or instances. The following is an example of a generic
frame:
Slot Filler
name computer
location (home, office, mobile)
The fillers may be single values, such as computer in the name slot, or a range of values, as in the
location slot.
Procedures attached to slots are called procedural attachments. There are mainly three types of
procedural attachments: if-needed, default and if-added. As the name implies, if-needed procedures
are executed when a filler value is needed. A default value is taken if no other value exists; defaults
are used to represent commonsense knowledge.
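A minimal Python sketch of a frame system with a default filler and an if-needed procedural
attachment; the slot names beyond those in the examples above (warranty_years, age, year_bought)
are hypothetical:

# Generic frame: slots with default fillers and an if-needed procedure.
computer_frame = {
    "name": "computer",
    "location": ("home", "office", "mobile"),       # range of allowed fillers
    "warranty_years": 1,                            # default filler (illustrative)
    "age": lambda year_bought: 2024 - year_bought,  # if-needed procedure
}                                                   # (2024 is an illustrative fixed year)

# Instance frame: inherits unfilled slots from the generic frame.
my_pc = {"is_a": computer_frame, "year_bought": 2020}

def get_slot(frame, slot):
    if slot in frame:
        return frame[slot]
    parent = frame.get("is_a")
    if parent is not None:
        filler = get_slot(parent, slot)
        # Run an if-needed procedural attachment on demand.
        if callable(filler):
            return filler(frame["year_bought"])
        return filler
    return None

print(get_slot(my_pc, "warranty_years"))  # 1 (default, inherited)
print(get_slot(my_pc, "age"))             # 4 (computed if-needed)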