4
Machine Learning
[Figure: Training Data → Learning Algorithm → Model; Test Data → Model → Accuracy]
produces the outputs that match our actual output for given input output data set. Supervised
learning is used for classification problems.
Supervised Learning Process
There are two steps in the supervised learning process:
1. Learning (training): Learn a model using the training data.
2. Testing: Test the model using unseen test data to assess the model accuracy.
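The two steps can be sketched in Python as follows. This is only a minimal illustration, not a method taken from the text: the nearest-centroid classifier, the synthetic two-class data and the 70/30 split are all assumptions chosen to keep the example self-contained.

```python
import random

def train(samples):
    """Step 1 (learning): compute one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def accuracy(model, samples):
    """Step 2 (testing): fraction of samples classified correctly."""
    correct = sum(1 for features, label in samples if predict(model, features) == label)
    return correct / len(samples)

# Hypothetical data: two well-separated classes "A" and "B".
rng = random.Random(0)
data = [([rng.gauss(0, 0.5), rng.gauss(0, 0.5)], "A") for _ in range(50)]
data += [([rng.gauss(5, 0.5), rng.gauss(5, 0.5)], "B") for _ in range(50)]
rng.shuffle(data)

train_set, test_set = data[:70], data[70:]  # the test data is never used for training
model = train(train_set)
test_accuracy = accuracy(model, test_set)
```

The important design point is that `test_set` is kept apart from `train_set`, so `test_accuracy` estimates how the model behaves on unseen data.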
4.1.2. Unsupervised Learning
In unsupervised learning, the weights and biases are modified in response to the network inputs only. No target outputs are available, so most of these algorithms perform clustering operations: they categorise the input objects into different classes. This technique is used in applications such as vector quantization. In this learning paradigm, we are given data samples without being told which classes they belong to, and the schemes aim to discover significant patterns in the input data without a teacher.
In unsupervised learning, some data x and a cost function are given, and our goal is to minimize that cost. The cost function depends on the problem we want to solve and may reflect a priori assumptions. For example, in a data compression problem it may be related to the mutual information between x and y, while in a statistical modeling problem it may be related to the posterior probability of the model given the data. Tasks that fall within this paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.
4.1.3. Reinforcement Learning
Reinforcement learning is learning how to map situations to actions so as to maximize a numerical reward signal. Its two main characteristics are trial-and-error search and delayed reward. The learner must discover, by trying them, which actions produce the most reward. Importantly, an action may affect not only the immediate reward but also the next situation and, through it, all subsequent rewards.
In reinforcement learning, data x are usually not given in advance; they are generated by the interactions of an agent with the environment. At each point in time t, the agent performs an action yt and the environment generates an observation xt and an instantaneous cost ct, according to some unknown dynamics. The aim is to find a method (a policy) for selecting actions that minimizes the expected total cost. The environment’s dynamics and the total cost of each policy are generally unknown, but can be estimated. Reinforcement learning is well suited to control problems, games and other sequential decision-making tasks. There are two types of reinforcement learning:
Passive Reinforcement Learning: In a fully observable environment, passive learning uses a fixed policy (the behavior does not change). The agent learns how good each state is. This is similar to policy evaluation, except that the transition function and reward function are unknown. It is useful for future policy revisions.
Active Reinforcement Learning: Using passive reinforcement learning, the utilities of states and the transition probabilities are learned. Those utilities and transitions can be plugged into the Bellman equations, which give optimal solutions given correct utility and transition functions. Active reinforcement learning produces approximate estimates of those functions.
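To illustrate how utilities and transition probabilities are plugged into the Bellman equations, the following Python sketch runs value iteration on a tiny Markov decision process. Everything in it is an assumption made for illustration: the three-state chain, the `TRANSITIONS` table, the discount factor `GAMMA`, and the use of rewards (a reward can be read as a negative cost). In practice the transition probabilities would be estimates learned from experience rather than known values.

```python
GAMMA = 0.9  # hypothetical discount factor
STATES = [0, 1, 2]

# TRANSITIONS[state][action] -> list of (probability, next_state, reward).
# State 2 is terminal and has no actions.
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "move": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {},
}

def q_value(utilities, state, action):
    """Expected reward plus discounted utility of the successor state."""
    return sum(p * (r + GAMMA * utilities[s2])
               for p, s2, r in TRANSITIONS[state][action])

def value_iteration(tolerance=1e-9):
    """Repeat the Bellman update until the utilities stop changing."""
    utilities = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if not TRANSITIONS[s]:
                continue  # terminal state keeps utility 0
            best = max(q_value(utilities, s, a) for a in TRANSITIONS[s])
            delta = max(delta, abs(best - utilities[s]))
            utilities[s] = best
        if delta < tolerance:
            return utilities

def greedy_policy(utilities):
    """Pick, in every non-terminal state, the action with the highest value."""
    return {s: max(TRANSITIONS[s], key=lambda a: q_value(utilities, s, a))
            for s in STATES if TRANSITIONS[s]}

utilities = value_iteration()
policy = greedy_policy(utilities)
```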
4.1.4. Adaptation
Adaptation can be defined simply as a change in the relationship between a recognized pattern and the present classes that is induced by the level of the pattern: a change by which a pattern becomes better suited to its environment or classes. A major function of adaptation is to increase the amount of sensor information available for classifying a pattern into a class. The amount of information collected depends on how the pattern is sampled and how the transducers convert it into signals. The amount of information that is actually used is further limited by internal losses during transmission and processing. Adaptation can increase the information captured and reduce internal losses by minimizing the effects of physical and biophysical constraints.
4.2. DECISION TREES
A decision tree is a graphic display of the various decision alternatives and the sequence of events, drawn as if they were branches of a tree.
Rectangle symbols are used to indicate decision points, and circle symbols are used to denote situations of uncertainty, or event branches, coming out of the tree. The branches at a decision point represent the immediate, mutually exclusive alternatives open to the decision maker.
A decision tree is highly useful to a decision maker in multistage situations which involve a series of decisions, each dependent on the preceding one.
Example 4.1. A company is running and, after paying for materials, labor, etc., makes a profit of Rs. 12000. The following alternatives are available to the company:
1. The company can start research R1, which costs Rs. 10000 and has a 90% chance of success. If R1 succeeds, the company gets a total income of Rs. 20000.
2. The company can start research R2, which costs Rs. 8000 and has a 60% chance of success. If R2 succeeds, the company gets a total income of Rs. 25000.
3. The company can pay Rs. 6000 as royalty for a new process which will bring a gross income of Rs. 20000.
4. The company continues the current process.
Because of limited resources, it is assumed that only one of the two researches can be carried out at a time. Use decision tree analysis to locate the optimal strategy for the company.
Solution. The following results are obtained from the decision tree (Fig. 4.2):
1. If the company conducts research R1, net profit = 12500.
2. If the company conducts research R2, net profit = 7000.
3. If the company pays Rs. 6000 as royalty, net profit = 14000.
4. If the company continues the current process, net profit = 12000.
Hence the final decision is option 3, i.e. the company pays the royalty.
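The expected-value calculation behind such a tree can be sketched in Python. The sketch makes a simplifying assumption that is not stated in the example: a failed research project brings no income at all. Under that assumption the values for R2, the royalty and the current process match the ones listed in the solution; the value for R1 depends on branches of Fig. 4.2 that are not reproduced here, so the number computed below for R1 need not agree with the figure. The helper name `research_emv` is hypothetical.

```python
def research_emv(cost, p_success, income_if_success, income_if_failure=0):
    """Expected monetary value of a research branch (chance node minus cost)."""
    expected_income = (p_success * income_if_success
                       + (1 - p_success) * income_if_failure)
    return expected_income - cost

options = {
    "research R1": research_emv(10000, 0.9, 20000),
    "research R2": research_emv(8000, 0.6, 25000),
    "pay royalty": 20000 - 6000,        # certain income minus royalty
    "continue current process": 12000,  # present profit
}

# The decision node picks the alternative with the largest expected value.
best_option = max(options, key=options.get)
```

With these assumptions `best_option` is the royalty alternative, the same decision as in the solution.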
Fig. 4.2
4.3. NAÏVE BAYES MODEL
Bayesian decision theory, or the Bayesian framework, has been used to deal with a wide variety of problems in many scientific and engineering areas. Whenever a quantity is to be inferred, or some conclusion is to be drawn, from observed data, the Bayesian principles and tools can be used. Bayes decision theory is based on the ever popular Bayes rule.
Bayes theorem is essentially an expression of conditional probabilities. More or less, conditional probabilities represent the probability of an event occurring given evidence. To better understand it, Bayes theorem can be derived from the joint probability of x and wi (i.e., P(x, wi)) as follows:
$$P(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{p(x)} \qquad \text{...(i)}$$
The following probabilities appear in this expression:
1. The Prior [P(wi)]: As the name implies, the prior, or a priori distribution, is a prior belief about how a particular system is modelled. For instance, the prior may be modelled using a Gaussian with some calculated mean and variance. Often, if the prior is unknown, a uniform distribution is used to model it, and iterative trials may yield a much better estimate.
2. The Likelihood [P(x | wi)]: The likelihood is the probability of observing x given that it belongs to the specific class wi. This is generally known, and it is needed to calculate the posterior probability.
3. The Posterior [P(wi | x)]: The posterior, or a posteriori probability, is the result obtained from Bayes theorem. It is the probability of an event happening given the evidence. Hence the posterior is written as P(ω | x), where ω is the particular query and x is the evidence given.
4. The Evidence [P(x)]: The evidence p(x) is the overall probability of the observed data x. It normalizes the result so that the posterior probabilities of all classes sum to one.
There are other forms of Bayes theorem as follows:
$$P(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\sum_{j} p(x \mid w_j)\, p(w_j)} \qquad \text{...(ii)}$$
$$P(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\int p(x \mid w)\, p(w)\, dw} \qquad \text{...(iii)}$$
$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} \qquad \text{...(iv)}$$
Proof of Bayes Theorem: According to the multiplication theorem of probability, if x and w are two independent events, then the probability of both events happening simultaneously is
$$P(x \cap w) = P(xw) = P(x) \times P(w) \qquad \text{...(a)}$$
If x and w are not necessarily independent, then Eq. (a) becomes
$$P(x \cap w) = P(x) \times P(w \mid x) = P(w) \times P(x \mid w) \qquad \text{...(b)}$$
Writing Eq. (b) for w = wi and dividing by p(x) gives
$$P(w_i \mid x) = \frac{p(w_i) \times p(x \mid w_i)}{p(x)} \qquad \text{...(i)}$$
If S is a sample space made up of the mutually exclusive events (w1, w2, ..., wn) and x is an arbitrary event that is a subset of S, then
$$P(x) = p(x w_1) + p(x w_2) + \dots + p(x w_n) = \sum_{j} p(x w_j) \qquad \text{...(d)}$$
According to equation (b),
$$P(x) = \sum_{j} p(w_j)\, p(x \mid w_j)$$
Putting this value of P(x) in equation (i), we get
$$P(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\sum_{j} p(x \mid w_j)\, p(w_j)} \qquad \text{...(ii)}$$
In case w is a continuous random variable,
$$P(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\int p(x \mid w)\, p(w)\, dw} \qquad \text{...(iii)}$$
A squirrel has started to search for its lost food in one of two patches. The only problem is, she can’t remember which one it is. There are two hypotheses: Hypothesis 1 is that the food is in patch 1 and Hypothesis 2 is that the food is in patch 2. The squirrel is pretty sure that she left the food in patch 1. In fact, she’s willing to say that there’s an 80% chance that the food is in patch 1. She also knows that she’s really good at hiding her food. Consequently, there’s only a 20% chance of finding the food per day when she’s looking in the right patch (and, of course, a 0% probability if she’s looking in the wrong patch).
Before she even starts searching, she has a prior probability P(patch 1) = 0.8. Fortunately, this squirrel has been trained in Bayes theorem, and can therefore calculate posterior probabilities. Suppose she looks in patch 1 and doesn’t find any food. What’s the probability that the food is in patch 1, given that she didn’t find anything? In terms of Bayes theorem, we see that:
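P(patch 1 | not found) = P(not found | patch 1) × P(patch 1) / P(not found), where P(not found) sums the same product over both patches. A minimal Python sketch of this update is given below. The function name `posterior` and the dictionary layout are illustrative choices; the numbers come from the story above (prior 0.8, a 20% chance of finding the food per day in the right patch, and the assumption that a search of patch 1 can never find food that is in patch 2).

```python
def posterior(priors, likelihoods):
    """Bayes theorem: P(h | evidence) for every hypothesis h."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

priors = {"patch 1": 0.8, "patch 2": 0.2}

# Probability of NOT finding food during one day of searching patch 1,
# under each hypothesis about where the food really is.
p_not_found = {"patch 1": 1 - 0.2, "patch 2": 1.0}

after_one_day = posterior(priors, p_not_found)
```

Here `after_one_day["patch 1"]` equals 0.64 / 0.84, roughly 0.76, so one unsuccessful day lowers her belief in patch 1 from 0.8 to about 0.76.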
• A set of facts to represent the initial working memory. This should be anything
relevant to the beginning state of the system.
• A set of rules. This should encompass any and all actions that should be taken
within the scope of a problem, but nothing irrelevant. The number of rules in the
system can affect its performance, so you do not want any that are not needed.
• A condition that determines that a solution has been found or that none exists. This
is necessary to terminate some rule-based systems that find themselves in infinite
loops otherwise.
In fact, there are three essential components to a fully functional rule-based expert system: the knowledge base, the working memory and the inference engine.
The knowledge base: The knowledge base is the store in which the knowledge of the particular domain is kept. It stores information about the subject domain. However, this goes further than a passive collection of records in a database. Rather, it contains symbolic representations of experts’ knowledge, including definitions of domain terms, interconnections of component entities, and cause-effect relationships between these components. The knowledge in the knowledge base is expressed as a collection of facts and rules. Each fact expresses a relationship between two or more objects in the problem domain and can be expressed in terms of predicates. Each rule has the form IF condition THEN conclusion, where the condition and conclusion are facts or sets of facts connected by the logical connectives NOT, AND, OR.
The working memory: The working memory is a temporary store that holds the facts produced by the inference engine during processing, possibly awaiting further processing. Note that the working memory contains only facts, and these facts are those produced during the searching process.
The inference engine: The core of any expert system is its inference engine. This is the part of the expert system that manipulates the knowledge base to produce new facts in order to solve the given problem. An inference engine consists of search and reasoning procedures that enable the system to find solutions and, if necessary, provide justifications for its answers. In this process it can use either forward or backward chaining as the direction of search, while applying a search technique such as depth-first search or breadth-first search. The roles of the inference engine are:
1. It identifies the rule to be fired. The rule selected is the one whose condition part matches the fact being considered in the case of forward chaining, or the one whose conclusion part matches the fact being considered in the case of backward chaining.
2. It resolves conflicts when more than one rule satisfies the matching. This is called conflict resolution, and it is based on certain criteria mentioned further.
3. It recognizes the goal state. When the goal state is reached, it reports the conclusion of the search.
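The three roles listed above can be sketched as a small forward-chaining loop in Python. This is an illustrative sketch only: the rule format (a set of condition facts plus one conclusion), the "first matching rule wins" conflict resolution strategy and the two medical-sounding rules are all assumptions, not taken from any particular expert system.

```python
def forward_chain(rules, facts, goal=None):
    """rules: list of (set_of_condition_facts, conclusion_fact)."""
    working_memory = set(facts)
    fired = []
    while True:
        # Role 1: find rules whose conditions all hold and that add a new fact.
        conflict_set = [(index, rule) for index, rule in enumerate(rules)
                        if rule[0] <= working_memory
                        and rule[1] not in working_memory]
        if not conflict_set:
            break  # no rule can fire: stop, so the loop cannot run forever
        # Role 2: conflict resolution (assumed strategy: first rule in order).
        index, (_conditions, conclusion) = conflict_set[0]
        working_memory.add(conclusion)
        fired.append(index)
        # Role 3: recognize the goal state.
        if goal is not None and goal in working_memory:
            break
    return working_memory, fired

# Hypothetical knowledge base and initial facts.
rules = [
    ({"has fever", "has rash"}, "suspect measles"),
    ({"suspect measles"}, "refer to physician"),
]
memory, fired = forward_chain(rules, {"has fever", "has rash"},
                              goal="refer to physician")
```

Excluding rules whose conclusion is already in the working memory is what guarantees termination here, which is the role of the stopping condition mentioned earlier.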
4.5.1.2. Limitation of Rule Based System
Knowledge acquisition is the process of extracting knowledge from experts. Given the difficulty involved in having experts articulate their intuition in terms of a systematic process of reasoning, this aspect is regarded as the main bottleneck in expert systems development. If there are too many rules, the system can become difficult to maintain and can suffer a performance hit. Rule-based systems are a relatively simple model that can be adapted to any number of problems. A rule-based system has its strengths as well as limitations that must be considered before deciding if it is the right technique to use for a given problem. Overall, rule-based systems are really only feasible for problems for which any and all knowledge in the problem area can be written in the form of if-then rules and for which this problem area is not large.
4.5.2. Case based system
In case-based reasoning (CBR) systems expertise is embodied in a library of past cases,
rather than being encoded in classical rules. Each case typically contains a description of
the problem, plus a solution and/or the outcome. The knowledge and reasoning process
used by an expert to solve the problem is not recorded, but is implicit in the solution. To
solve a current problem: the problem is matched against the cases in the case base, and
similar cases are retrieved. The retrieved cases are used to suggest a solution which is reused
and tested for success. If necessary, the solution is then revised. Finally the current problem
and the final solution are retained as part of a new case. Case-based reasoning is liked by
many people because they feel happier with examples rather than conclusions separated
from their context. A case library can also be a powerful corporate resource, allowing everyone
in an organisation to tap into the corporate case library when handling a new problem.
4.5.2.1. Case Based System Cycle
All case-based reasoning methods have in common the following process:
1. retrieve the most similar case (or cases) comparing the case to the library of past cases;
2. reuse the retrieved case to try to solve the current problem;
3. revise and adapt the proposed solution if necessary;
4. retain the final solution as part of a new case.
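The four steps of the cycle can be sketched in Python as follows. The sketch is an assumption-laden illustration: cases are stored as dictionaries, similarity is measured as the fraction of shared symptoms (Jaccard similarity), and the help-desk cases are invented for the example.

```python
def similarity(a, b):
    """Fraction of symptoms shared by two problem descriptions."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def retrieve(case_base, problem):
    """Step 1: return the stored case most similar to the new problem."""
    return max(case_base, key=lambda case: similarity(case["problem"], problem))

def solve(case_base, problem, revise=None):
    best = retrieve(case_base, problem)        # 1. retrieve
    solution = best["solution"]                # 2. reuse
    if revise is not None:
        solution = revise(problem, solution)   # 3. revise (optional adaptation)
    case_base.append({"problem": problem, "solution": solution})  # 4. retain
    return solution

# Hypothetical case library.
case_base = [
    {"problem": ["no power", "no lights"], "solution": "check power cable"},
    {"problem": ["paper jam", "grinding noise"], "solution": "clear paper path"},
]
answer = solve(case_base, ["no power", "burning smell"])
```

After the call, the case library holds three cases, so the new problem and its solution are available the next time a similar problem arrives.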
4.5.2.2. Applications of Case Based System
Case based reasoning first appeared in commercial tools in the early 1990’s and since
then has been used to create numerous applications in a wide range of domains:
Diagnosis: Case-based diagnosis systems try to retrieve past cases whose symptom
lists are similar in nature to that of the new case and suggest diagnoses based on the best
matching retrieved cases. The majority of installed systems are of this type and there are
many medical CBR diagnostic systems.
Help Desk: Case-based diagnostic systems are used in the customer service area dealing
with handling problems with a product or service.
Assessment: Case-based systems are used to determine values for variables by comparing
them to the known values of something similar. Assessment tasks are quite common in the
finance and marketing domains.
Decision Support: In decision making, when faced with a complex problem, people
often look for analogous problems for possible solutions. CBR systems have been developed
to support this problem retrieval process (often at the level of document retrieval) to find
relevant similar problems. CBR is particularly good at querying structured, modular and
non-homogeneous documents.
4.5.3. Example of Expert System
Many expert systems are available; two of them are described below:
4.5.3.1. Dendral
Dendral was a famous expert system in artificial intelligence of the 1960s. Its main purpose was to support chemists in identifying unknown organic molecules by studying their mass spectra and using knowledge of chemistry. The Dendral software is considered the first expert system because it was the first to automate the decision-making process and problem-solving behavior of organic chemists. It consists of two sub-programs, Heuristic Dendral and Meta-Dendral.
Heuristic Dendral: Heuristic Dendral is a software program that takes as input the mass spectra and other experimental data, together with a knowledge base of chemistry. It produces as output the set of possible chemical structures that are consistent with the experimental data. A mass spectrometer is used to generate the mass spectrum of a compound, which can be used to find its molecular weight and the masses of its atomic constituents.
Meta-Dendral: Meta-Dendral is a knowledge-refining system that takes the possible chemical structures and related mass spectra as input, and suggests a set of hypotheses to explain the relation between some of the suggested structures and the mass spectrum. These hypotheses can then be fed back to Heuristic Dendral to test their validity. We can say that Meta-Dendral is a learning system and Heuristic Dendral is a decision system. These systems work on two principles: the plan-generate-test paradigm and knowledge engineering.
Plan-Generate-Test Paradigm: The plan-generate-test paradigm is the problem-solving method used by both the Heuristic Dendral and Meta-Dendral systems. The generator generates possible solutions for a particular problem using the knowledge base. After that, Heuristic Dendral checks these solutions for validity.
Knowledge Engineering: The main aim of knowledge engineering is to provide a productive interaction between the available knowledge base and problem-solving techniques. Knowledge engineering must contain the following things:
Large Knowledge Base: The knowledge base contains a large amount of information related to the mass spectrometry technique and a large amount of information about chemistry, such as the chemical structures of compounds, their atomic components, and their atomic numbers and atomic masses.
General Rules: The possible rules that can be used to access knowledge related to the problem from the knowledge base.
4.5.3.2. MYCIN
MYCIN was the first large expert system to perform at the level of a human expert and to
provide users with an explanation of its reasoning. Most expert systems developed since
MYCIN have used MYCIN as a benchmark to define an expert system. Moreover, the
techniques developed for MYCIN have become widely available in the various small expert
system building tools. MYCIN was developed at Stanford University in the mid-1970s. It
was designed to aid physicians in the diagnosis and treatment of meningitis and bacterial
infections. MYCIN was strictly a research system. AI investigators wanted to advance the state of
expert system building by undertaking a hard problem with clear, practical ramifications.
MYCIN provides consultative advice about bacteremia (infections that involve
bacteria in the blood) and meningitis (infections that involve inflammation of the membranes
that envelop the brain and spinal cord). These infectious diseases can be fatal and often
show themselves during hospitalization.
Working of MYCIN
MYCIN is a computer program designed to provide attending physicians with advice comparable to that which they would otherwise get from a consulting physician specializing in bacterial and meningitis infections. To use MYCIN, the attending physician must sit in front of a computer terminal that is connected to a DEC-20 (one of Digital Equipment Corporation’s mainframe computers) where the MYCIN program is stored. When the MYCIN program is invoked, it initiates a dialogue. The physician types answers in response to various questions. Eventually MYCIN provides a diagnosis and a detailed drug therapy recommendation.
Example: MYCIN reasons about data such as laboratory results of body fluid analyses, symptoms that the patient is displaying, and general characteristics of the patient, such as age and sex. MYCIN obtains this information by interrogating the physician. A MYCIN consultation proceeds in two phases. First, a diagnosis is made to identify the most likely infectious organisms. Then one or more drugs are prescribed that should control all of the possible organisms. The antibiotics prescribed must rid the patient of the disease. They must also interact favorably with each other, and be appropriate for the specific patient.
4.6. MACHINE LEARNING ALGORITHMS
There are many machine learning algorithms; some of the most important are described below:
4.6.1. Genetic Algorithm
A genetic algorithm is a search technique used in computing to find exact or approximate
solutions to optimization and search problems. Genetic algorithms are categorized as global
search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that
use techniques inspired by evolutionary biology such as inheritance, mutation, selection,
and crossover. Genetic algorithms are implemented as a computer simulation in which a
population of abstract representations (called chromosomes or the genotype of the genome)
of candidate solutions (called individuals, creatures, or phenotypes) to an optimization
problem evolves toward better solutions. Traditionally, solutions are represented in binary
as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts
from a population of randomly generated individuals and happens in generations. In each
generation, the fitness of every individual in the population is evaluated, multiple individuals
are stochastically selected from the current population (based on their fitness), and modified
(recombined and possibly randomly mutated) to form a new population. The new population
is then used in the next iteration of the algorithm. Commonly, the algorithm terminates
when either a maximum number of generations has been produced, or a satisfactory fitness
level has been reached for the population. If the algorithm has terminated due to a maximum
number of generations, a satisfactory solution may or may not have been reached.
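The generational loop described above can be sketched in Python. The concrete choices are assumptions made for illustration: the fitness function simply counts the 1s in a binary string, selection is a size-2 tournament, crossover is single-point, and the best individual is copied unchanged into the next generation (elitism). Parameter values such as `pop_size` and `rate` are arbitrary.

```python
import random

def fitness(bits):
    """Hypothetical fitness: the number of 1s in the chromosome."""
    return sum(bits)

def select(population, rng):
    """Tournament selection: the fitter of two random individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(parent1, parent2, rng):
    """Single-point crossover producing one child."""
    point = rng.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def genetic_algorithm(length=20, pop_size=30, generations=50, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        if fitness(best) == length:
            break  # satisfactory fitness level reached
        children = [best]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, rate, rng))
        population = children
    return max(population, key=fitness), history

best, history = genetic_algorithm()
```

The loop has both termination conditions mentioned in the text: a maximum number of generations and a satisfactory fitness level. Because of elitism, the values in `history` never decrease.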
Applications of Genetic algorithm
Genetic algorithms find application in bioinformatics, phylogenetics, computational
science, engineering, economics, chemistry, manufacturing, mathematics, physics and other
fields.
4.6.2. Neural Network
An artificial neural network, commonly referred to as a neural network, is an artificial representation of the human brain that tries to simulate its learning process. Traditionally, the term neural network referred to a network of biological neurons in the nervous system that process and transmit information. Artificial systems are constructed to make use of some organizational principles similar to those of the human brain. They represent a promising new generation of information processing systems. A neural network is an interconnected group of artificial neurons that uses a mathematical model or computational model for
Fig. 4.3. Artificial Neural Network (inputs x1, x2, ..., xn with weights w1, w2, ..., wn; summation ∑ giving u; activation f(u); output y; threshold ϑ)
Fig. 4.4. Simple Neuron (inputs X1 ... Xn and a teaching input)
Complicated neuron: The simple neuron doesn’t do anything that conventional computers don’t do already. Figure 4.5 is an example of a more complicated neuron. The difference from the previous model is that the inputs are ‘weighted’; the effect that each input has on decision making depends on the weight of that particular input. The weight of an input is a number which, when multiplied with the input, gives the weighted input. If the sum of these weighted inputs exceeds a pre-set threshold, the neuron fires. Otherwise the neuron does not fire.
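The firing rule can be written in a few lines of Python. The function name `neuron_fires` and the example weights and threshold are hypothetical values chosen for illustration.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the sum of weighted inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold

weights = [0.5, 0.9, 0.3]   # hypothetical weights W1..W3
threshold = 0.7             # hypothetical pre-set threshold

fires_a = neuron_fires([1, 0, 1], weights, threshold)  # 0.5 + 0.3 = 0.8
fires_b = neuron_fires([0, 0, 1], weights, threshold)  # 0.3
```

In this example `fires_a` is true because 0.8 exceeds the threshold, while `fires_b` is false.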
[Figure 4.5: neuron with inputs X1 ... Xn, weights W1 ... Wn, a teaching input and a teach/use mode]
Fig. 4.6. Single Layer Feed-forward Neural Network (inputs x1 ... xN in the input layer, outputs y1 ... yJ in the output layer)
Multilayer Feed-forward Neural Networks (MFNN): The Multilayer Feed-forward Neural Network shown in Figure 4.7 is the most widely used type of neural network, particularly within the area of systems and control. Similar to the single-layer feed-forward neural networks, there is one input layer and one output layer, and no interconnections between the nodes in a particular layer. But unlike the single-layer feed-forward neural networks, multilayer neural networks have a number of intermediate or hidden layers between the input and output layers (any layer between the input and output layers is called a hidden layer because it is internal to the network and has no direct contact with the external environment). One, two or more hidden layers are used for most applications. The number of hidden layers is kept small because the training process becomes too long and tedious if the architecture of the neural network becomes large. In Figure 4.7 one hidden layer is present in this multilayer neural network, where J ≠ K ≠ N; J, K, N ∈ R. To get the output from the network, a set of input data is first presented to the input layer. The outputs from this layer are then fed as inputs to the first hidden layer, and subsequently the outputs from the first hidden layer are fed, as weighted inputs (the outputs from the first hidden layer are multiplied by the weights), to the second hidden layer. This process carries on until the output layer is reached. An example of a feed-forward neural network is the multilayer perceptron (MLP) (commonly called the multilayer feed-forward network).
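The layer-by-layer computation described above can be sketched in Python. The sigmoid activation, the bias terms and all weight values are assumptions made for the illustration; the text does not fix a particular activation function.

```python
import math

def sigmoid(u):
    """A common squashing activation function (assumed here)."""
    return 1.0 / (1.0 + math.exp(-u))

def layer_forward(inputs, weights, biases):
    """weights[j][i] connects input i to node j of the layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass the signal through every layer in turn; no feedback connections."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.4, 0.1], [0.3, 0.8, -0.6]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
outputs = feed_forward([1.0, 0.5, -1.0], layers)
```

Each layer only sees the outputs of the layer before it, which is what makes the network feed-forward.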
[Figure 4.7: multilayer feed-forward neural network with inputs x1 ... xN, hidden nodes 1 ... J and outputs y1 ... yK]
Fig. 4.8. Feedback Neural Network (labels: Competition, Feedback)
4.6.2.3. Learning in Neural Networks
Neural networks are able to learn from their environment and to adaptively fine-tune their parameters to improve system performance. Generally, learning is the process by which the NN adapts itself to a stimulus and eventually produces a desired response. During the process of learning, the network adjusts its parameters, such as the weights at its inputs, so that its actual output converges to the desired output response. When the actual output matches the desired one, we say that learning is complete. Learning rules are defined by mathematical expressions called learning equations. The learning process is not the same for all networks; it depends on the application. There are two general categories of learning, known as supervised learning and unsupervised learning.
Supervised Learning: In supervised learning, we know the input, the actual output and the desired response, and we calculate the difference between the actual output and the desired output. If the actual response differs from the desired output, the NN generates an error signal, and the difference between the actual and desired responses is then used to calculate the weight adjustment so that the actual output matches the desired output.
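The error-driven weight adjustment can be sketched with a single threshold neuron in Python. The sketch uses the classical perceptron error-correction rule as one possible instance of supervised learning; the learning rate, the number of epochs and the AND training set are assumptions chosen for illustration.

```python
def step(u):
    """Threshold activation: output 1 when the weighted sum is non-negative."""
    return 1 if u >= 0 else 0

def output_of(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, learning_rate=0.1, epochs=100):
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, desired in samples:
            actual = output_of(weights, bias, inputs)
            error = desired - actual  # the error signal
            if error != 0:
                mistakes += 1
                # Adjust every weight in proportion to the error and its input.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # actual output matches desired output: learning is complete
    return weights, bias

# Hypothetical training set: the logical AND function.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
```

Training stops as soon as a full pass over the samples produces no error signal.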
Unsupervised Learning: In unsupervised learning, we do not know the desired output. During training, the network receives many different input patterns and arbitrarily organizes the patterns into classes. When an input is applied later, the network provides an output response indicating the class to which the input belongs. If a class cannot be found for the stimulus, a new class is generated. This type of learning is sometimes referred to as self-organizing learning.
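The idea of assigning an input to an existing class, or generating a new class when none fits, can be sketched in Python. This is a simple leader-style clustering sketch rather than a specific network from the text; the Euclidean distance, the `vigilance` radius and the update `rate` are assumptions.

```python
def distance(a, b):
    """Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def assign(pattern, prototypes, vigilance, rate=0.5):
    """Return the class index of `pattern`, creating a new class if needed."""
    if prototypes:
        nearest = min(range(len(prototypes)),
                      key=lambda k: distance(pattern, prototypes[k]))
        if distance(pattern, prototypes[nearest]) <= vigilance:
            # Move the winning prototype part of the way toward the pattern.
            prototypes[nearest] = [p + rate * (x - p)
                                   for p, x in zip(prototypes[nearest], pattern)]
            return nearest
    prototypes.append(list(pattern))  # no class fits: generate a new one
    return len(prototypes) - 1

prototypes = []
patterns = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0], [0.1, 0.1]]
labels = [assign(p, prototypes, vigilance=1.0) for p in patterns]
```

No desired outputs are supplied anywhere; the classes in `labels` emerge from the inputs alone.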
4.6.2.4. Neural Network Learning Algorithms
A learning algorithm is a mathematical tool that outlines the methodology and the speed with which the NN reaches the steady state of its parameters, weights and thresholds. It starts with an error function (energy function), which is expressed in terms of the weights. The objective is to find the set of weights that minimizes the error. When the error function is zero or small enough, the steady state of the network and of the weights is reached. During learning, the error function decreases and the weights are updated. The decrease may be accomplished with different optimization techniques such as the Delta rule, Boltzmann’s algorithm, the back propagation learning algorithm and simulated annealing. The selection of the error function and the optimization method is important, because it may lead to stability, instability or a solution trapped in a local minimum. Generally we use the back propagation learning algorithm, which is the basic learning mechanism and is very popular. In this algorithm, the network output, on presentation of input data, is compared with the desired output and a measure of the error is obtained. This error measure is then used to incrementally modify the appropriate weights in the connection matrices in order to reduce the error.
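A minimal back propagation sketch in Python is given below. It is an illustration under stated assumptions, not the definitive algorithm: the network has one hidden layer with two nodes, sigmoid activations, a squared-error function, batch gradient descent, and the XOR function as a hypothetical training set. The learning rate, epoch count and random seed are arbitrary, and such a small network can still end up in a local minimum, as noted above.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, w_hidden, w_out):
    # The last weight in every row multiplies a constant bias input of 1.0.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)

def train(data, w_hidden, w_out, rate=0.5, epochs=5000):
    """Batch gradient descent; the weight lists are updated in place."""
    for _ in range(epochs):
        grad_out = [0.0] * len(w_out)
        grad_hidden = [[0.0] * len(row) for row in w_hidden]
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Error term at the output node (derivative of the sigmoid included).
            delta_out = (target - output) * output * (1 - output)
            for j, v in enumerate(hidden + [1.0]):
                grad_out[j] += delta_out * v
            # Propagate the error back to each hidden node.
            for j, h in enumerate(hidden):
                delta_h = delta_out * w_out[j] * h * (1 - h)
                for i, v in enumerate(x + [1.0]):
                    grad_hidden[j][i] += delta_h * v
        for j in range(len(w_out)):
            w_out[j] += rate * grad_out[j] / len(data)
        for j, row in enumerate(w_hidden):
            for i in range(len(row)):
                row[i] += rate * grad_hidden[j][i] / len(data)
    return w_hidden, w_out

xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
rng = random.Random(1)
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [rng.uniform(-1, 1) for _ in range(3)]

error_before = mean_squared_error(xor_data, w_hidden, w_out)
train(xor_data, w_hidden, w_out)
error_after = mean_squared_error(xor_data, w_hidden, w_out)
```

Comparing `error_before` with `error_after` shows the error measure being reduced by the incremental weight changes.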
4.6.3. Fuzzy Logic
A “fuzzy set” is a simple extension of the definition of a classical set in which the characteristic function is permitted to have any value between 0 and 1. A “fuzzy set” A in X can be defined as a set of ordered pairs:
A = {(x, µA(x)) : x ∈ X}
where µA(x) is called the membership function for the fuzzy set A. It maps each x to a membership grade between 0 and 1. Examples of membership functions (Triangular, Trapezoidal and Gaussian) can be seen in Figure 4.9 and are described by the following formulas:
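Triangular MFs: Triangular membership function is given by the following equation
$$\text{Triangular}(x; a, b, c) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ \dfrac{c - x}{c - b} & \text{if } b \le x \le c \\ 0 & \text{if } x \ge c \end{cases}$$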
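Trapezoidal MFs: Trapezoidal membership function is given by the following equation
$$\text{Trapezoidal}(x; a, b, c, d) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } b \le x \le c \\ \dfrac{d - x}{d - c} & \text{if } c \le x \le d \\ 0 & \text{if } x \ge d \end{cases}$$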
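The membership functions can be sketched in Python as follows. The trapezoidal function follows the piecewise definition above; the triangular function uses the standard three-parameter form, and the Gaussian function uses the common form exp(−(x − mean)² / (2σ²)), which is an assumption because its formula is not given here. All three assume a < b < c < d and σ > 0.

```python
import math

def triangular(x, a, b, c):
    """Rises from a to the peak at b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Rises from a to b, stays at 1 between b and c, falls from c to d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, mean, sigma):
    """Bell-shaped membership centred on `mean` (assumed form)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Hypothetical fuzzy set on an age scale with full membership from 20 to 30.
grade = trapezoidal(15, 10, 20, 30, 40)
```

Here `grade` is 0.5, because 15 lies halfway up the rising edge between 10 and 20.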
Fig. 4.10. Membership Functions of the Term Set Age (vertical axis: Degree of Membership; terms: Very Young, Young, Old, Very Old)
by fuzzy if-then rules. The rule base can be constructed either by human experts or by automatic
generation, that is, extraction of rules from numerical input-output data.
Fig. 4.12 (figure label: Knowledge Base)
In this model, a single neuron receives a set of inputs (x1, x2, x3, ..., xn).
This set of inputs is multiplied by a set of weights (w1, w2, ..., wn). Here, the weights are referred to
as the strengths of the synapses. These weighted values are then summed and the sum is passed
through an activation (transfer) function. The activation function is also referred to as a
squashing function in that it squashes (limits) the permissible range of the output signal to
some finite value.
Q. 7. Explain fuzzy set and fuzzy system in detail.
Ans. A “fuzzy set” is a simple extension of the definition of a classical set in which the characteristic
function is permitted to have any value between 0 and 1. A “fuzzy set” A in X can be defined
as a set of ordered pairs:
A = {(x, µA(x)) : x ∈ X}
where µA(x) is called the membership function for the fuzzy set A. It maps each x to a membership
grade between 0 and 1. Fuzzy systems are made of a knowledge base and a reasoning mechanism
called the fuzzy inference engine. The structure of the fuzzy inference engine is shown in the figure.
A fuzzy inference engine combines fuzzy if-then rules into a mapping from the inputs of the
system to its outputs, using fuzzy methods. A fuzzy system is a nonlinear mapping
accompanied by fuzzy if-then rules. The rule base can be constructed either by human experts
or by automatic generation, that is, extraction of rules from numerical input-output data.
EXERCISE
1. What is Bayes Theorem? Which kinds of problems can be solved using this theorem?
2. What are the roles of the E-step and M-step in the Expectation Maximization Algorithm?
3. What are the main parts of an Expert System?
4. Explain fuzzy inference engine.
5. Explain the following :
(a) Triangular membership function
(b) Trapezoidal membership function
(c) Gaussian membership function
6. What do you understand by linguistic variables? When can we use linguistic variables?