chp4

Chapter 4 covers the fundamentals of machine learning, introducing key concepts such as supervised, unsupervised, and reinforcement learning techniques. It explains the processes involved in each learning type, including training and testing for supervised learning, clustering for unsupervised learning, and trial-and-error strategies for reinforcement learning. Additionally, the chapter discusses decision trees, the Naïve Bayes model, and the Expectation Maximization algorithm for parameter estimation in statistical models.

Uploaded by

super Cel

CHAPTER

4
Machine Learning

4.1. INTRODUCTION TO LEARNING


Learning is the process of gathering information and knowledge from the analysis of past experience (data) and applying this information and knowledge to improve the performance of a system. The aim of learning, or training, a system is to acquire from the training samples the knowledge necessary to make it able to distinguish among the classes of interest.
“Learning represents changes in a system that make it able to do the same task more efficiently the next time.”
“Learning is the process of constructing new representations of a system, or modifying existing ones, according to experience in order to improve the efficiency of the system.”
There are three types of learning techniques, each corresponding to a particular type of
learning task. These are supervised learning, unsupervised learning and reinforcement
learning.
4.1.1. Supervised Learning
In supervised learning we provide the network with an input and its corresponding target output. When the inputs are given to the network, the network generates outputs, and we compare these outputs with the target outputs. The learning function is then used to adjust the weights and biases of the network so that the network outputs move closer to the target outputs.

Fig. 4.1. Supervised learning process (Step 1, training: training data → learning algorithm → model; Step 2, testing: test data → model → accuracy)
Supervised learning is a machine learning technique used to learn a function from a training data set. The training data consist of input data together with the corresponding desired outputs. The output of the function may be a continuous value or a classification of the input objects into classes. The main task of supervised learning is to find a function that produces outputs matching the actual outputs for the given input-output data set. Supervised learning is commonly used for classification problems.
Supervised Learning Process
There are two steps in the supervised learning process:
1. Learning (training): Learn a model using the training data.
2. Testing: Test the model using unseen test data to assess the model accuracy.
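The two steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's method: a single perceptron plays the role of the network, and the small data sets are hypothetical, labelled by the rule x1 + x2 > 3. Training compares each output with its target and adjusts the weights and bias; testing measures accuracy on examples the model has not seen.

```python
# Minimal perceptron sketch of the two-step supervised learning process.
# The data sets are hypothetical, labelled by the rule x1 + x2 > 3.

def predict(model, x):
    """Network output: 1 if the weighted sum plus bias is positive, else 0."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    """Step 1 (training): nudge weights and bias towards the target outputs."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict((weights, bias), x)   # compare with target
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def accuracy(model, examples):
    """Step 2 (testing): fraction of examples classified correctly."""
    return sum(predict(model, x) == t for x, t in examples) / len(examples)

train_data = [((1, 1), 0), ((4, 3), 1), ((0, 2), 0), ((3, 3), 1)]
test_data = [((5, 4), 1), ((0, 0), 0), ((1, 3), 1)]
model = train_perceptron(train_data)
print(model, accuracy(model, test_data))
```

Here the learned boundary 2·x1 − x1... more precisely 2·x1 − x2 − 2 > 0 fits every training example, but it misclassifies the unseen point (1, 3), so the test accuracy is 2/3. This is why the testing step uses data that were not seen during training.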
4.1.2. Unsupervised Learning
In unsupervised learning, the weights and biases are modified in response to the network inputs only. Because no target outputs are available in this type of learning, most of these algorithms perform clustering operations: they categorize the input objects into different classes. This technique is used in applications such as vector quantization. In this learning paradigm, we are given data samples without being told which classes they belong to, and the schemes used aim to discover significant patterns in the input data without a teacher.
In unsupervised learning, some data x and a cost function are given, and our goal is to minimize that cost function. The cost function depends on the problem we want to solve and may reflect a priori assumptions. For example, in a data compression problem it may be related to the mutual information between the input x and the output y, while in a statistical modelling problem it may be related to the posterior probability of the model given the data. Tasks that fall within the unsupervised learning paradigm are in general estimation problems; applications include clustering, the estimation of statistical distributions, compression and filtering.
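The text does not name a particular clustering algorithm. As an assumed example, k-means clustering fits the "minimize a cost over unlabelled data x" description: its cost is the sum of squared distances from each point to its nearest cluster centre. The Python sketch below uses a small hypothetical 2-D data set; no class labels are ever supplied.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iters=100, seed=0):
    """Group unlabelled points into k clusters by reducing the cost:
    the sum of squared distances from each point to its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # start from k random data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assignment step
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        new = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:                     # no centroid moved: converged
            break
        centroids = new
    labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
    cost = sum(dist2(p, centroids[j]) for p, j in zip(points, labels))
    return centroids, labels, cost

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels, cost = kmeans(data, k=2)
print(sorted(centroids), labels, cost)
```

The algorithm discovers the two groups on its own; the cluster numbers it assigns are arbitrary names, not classes supplied by a teacher.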
4.1.3. Reinforcement Learning
Reinforcement learning is learning how to map situations to actions so as to maximize a numerical reward signal. Its two main characteristics are trial-and-error search and delayed reward: the learner must discover, by trial and error, which actions produce the most reward. An important point is that an action may affect not only the immediate reward but also the next situation and, through it, all subsequent rewards.
In reinforcement learning, data x are usually not given in advance; they are generated by the interactions of an agent with its environment. At each time step t, the agent performs an action yt, and the environment generates an observation xt and an instantaneous cost ct according to some unknown dynamics. Our aim is to find a policy for selecting actions that minimizes the expected total cost. The environment's dynamics and the total cost of each policy are generally unknown but can be estimated. Reinforcement learning is well suited to control problems, games and other sequential decision-making tasks. There are two types of reinforcement learning:
Passive Reinforcement Learning: In a fully observable environment, the agent's policy is fixed (its behaviour does not change), and the agent learns how good each state is. This is similar to policy evaluation, except that the transition function and the reward function are unknown. The learned state utilities are useful for future policy revisions.
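Passive learning can be sketched with temporal-difference learning, TD(0), one common method (not named in the text) for learning state utilities under a fixed policy without knowing the transition or reward functions. The five-state random-walk environment and all parameter values are hypothetical; for this walk the true utilities are 1/6, 2/6, ..., 5/6, so the learned estimates can be checked.

```python
import random

def td0_random_walk(episodes=10000, alpha=0.01, gamma=1.0, seed=0):
    """Passive TD(0): estimate the utility of each state under a fixed policy.

    Hypothetical environment: states 0..4 in a chain; the fixed policy moves
    left or right with probability 0.5. Stepping off the right end gives
    reward 1, stepping off the left end gives 0, and both end the episode.
    The learner never sees these rules, only the sampled transitions."""
    rng = random.Random(seed)
    n = 5
    V = [0.5] * n                              # initial utility estimates
    for _ in range(episodes):
        s = n // 2                             # start in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))   # environment's (unknown) dynamics
            if s_next < 0:
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n:
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, V[s_next], False
            # move V[s] towards the one-step sample of the Bellman equation
            V[s] += alpha * (r + gamma * v_next - V[s])
            if done:
                break
            s = s_next
    return V

print([round(v, 2) for v in td0_random_walk()])
```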
Active Reinforcement Learning: In active learning the policy is no longer fixed, so the agent must also decide which actions to take. The utilities of states and the transition probabilities, learned as in passive reinforcement learning, can be plugged into the Bellman equations, which give the optimal solution when the utility and transition functions are correct. Since active reinforcement learning only produces approximate estimates of those functions, the resulting policy is itself approximate and is refined as the estimates improve.
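One widely used active method, tabular Q-learning (not described in the text), applies a sampled form of the Bellman optimality equation while the agent chooses its own actions by trial and error. The corridor environment, the reward of 1 at the goal and all parameter values in this Python sketch are hypothetical choices for illustration.

```python
import random

def q_learning_corridor(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor of states 0..n_states-1.

    The last state is the goal: entering it gives reward 1 and ends the
    episode; every other move gives 0, so the reward is delayed.
    Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # trial and error: explore at random, otherwise exploit estimates
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # sampled Bellman optimality target (no future value after the goal)
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    policy = ["left" if q[0] > q[1] else "right" for q in Q[:goal]]
    return Q, policy

Q, policy = q_learning_corridor()
print(policy)
```

Although only the final move is rewarded, the discounted value of reaching the goal propagates back through the Q-table, so the greedy policy learns to move right from every state.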
4.1.4. Adaptation
Adaptation can simply be defined as a change in the relationship between a recognized pattern and the present classes that has been induced by the level of the pattern, that is, a change by which a pattern becomes better suited to its environment or classes. A major function of adaptation is to increase the amount of sensor information available for classifying a pattern into a class. The amount of information collected depends on the ways in which the pattern is sampled and on the transducer signals. The amount of information actually used is further limited by internal losses during transmission and processing. Adaptation can increase the information captured and reduce internal losses by minimizing the effects of physical and biophysical constraints.
4.2. DECISION TREES
A decision tree is a graphical display of the various decision alternatives and the sequence of events, drawn as if they were the branches of a tree.
Rectangle symbols are used to indicate decision points, and circle symbols are used to denote situations of uncertainty, from which event branches come out. The branches leaving a decision point represent the immediate, mutually exclusive alternatives open to the decision maker.
A decision tree is highly useful to a decision maker in multistage situations that involve a series of decisions, each dependent on the preceding one.
Example 4.1. A company is running and, after paying for materials, labour, etc., makes a profit of Rs 12000. The following alternatives are available to the company:
1. The company can start a research R1, which costs Rs 10000 and has a 90% chance of success. If R1 succeeds, the company gets a total income of Rs 20000.
2. The company can start a research R2, which costs Rs 8000 and has a 60% chance of success. If R2 succeeds, the company gets a total income of Rs 25000.
3. The company can pay Rs 6000 as royalty for a new process, which will bring a gross income of Rs 20000.
4. The company continues the current process.
Because of limited resources, it is assumed that only one of the two researches can be carried out at a time. Use decision tree analysis to locate the optimal strategy for the company.
Solution. Evaluating the expected net profit of each alternative in the decision tree (Fig. 4.2), with a failed research bringing no income, gives:
1. If the company conducts research R1: expected net profit = 0.9 × 20000 − 10000 = Rs 8000.
2. If the company conducts research R2: expected net profit = 0.6 × 25000 − 8000 = Rs 7000.
3. If the company pays Rs 6000 as royalty: net profit = 20000 − 6000 = Rs 14000.
4. If the company continues the current process: net profit = Rs 12000.
Hence the optimal decision is option 3, i.e., the company pays the royalty.

Fig. 4.2
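The expected-value calculation behind the decision tree can be checked with a short Python sketch. It uses the figures stated in Example 4.1 and, as in the R2 calculation, treats a failed research as bringing no income; the dictionary layout and function names are illustrative, not part of the original example.

```python
# Example 4.1 as: option -> (up-front cost, [(probability, gross income), ...]).
# Assumption: a failed research brings no income, as in the R2 calculation.
OPTIONS = {
    "R1 research": (10000, [(0.9, 20000), (0.1, 0)]),
    "R2 research": (8000, [(0.6, 25000), (0.4, 0)]),
    "Pay royalty": (6000, [(1.0, 20000)]),
    "Continue current process": (0, [(1.0, 12000)]),
}

def expected_net_profit(cost, outcomes):
    """Chance node: probability-weighted income, minus the decision's cost."""
    return sum(p * income for p, income in outcomes) - cost

def best_decision(options):
    """Decision node: pick the branch with the highest expected net profit."""
    values = {name: expected_net_profit(cost, outs)
              for name, (cost, outs) in options.items()}
    return max(values, key=values.get), values

choice, values = best_decision(OPTIONS)
print(choice)                      # Pay royalty
for name, value in values.items():
    print(f"{name}: Rs {value:.0f}")
```

In a multistage tree the same two rules are applied recursively from the leaves back to the root: average at chance nodes, maximize at decision nodes.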
4.3. NAÏVE BAYES MODEL
Bayesian decision theory and Bayesian frameworks have been used to deal with a wide variety of problems in many scientific and engineering areas. Whenever a quantity is to be inferred, or some conclusion is to be drawn, from observed data, Bayesian principles and tools can be used. Bayes decision theory is based on the ever-popular Bayes rule.
Bayes theorem is essentially an expression of conditional probabilities. Conditional probabilities represent the probability of an event occurring given some evidence.
Bayes theorem, which can be derived from the joint probability of x and wi (i.e., P(x, wi)), states:

$$P(w_i \mid x) = \frac{p(x \mid w_i)\,P(w_i)}{p(x)} \tag{i}$$
The terms in this expression are the following probabilities:
1. The Prior [P(wi)]: As the name implies, the prior, or a priori, distribution expresses a prior belief about how a particular system is modelled. For instance, the prior for a system may be modelled using a Gaussian with some calculated mean and variance. If the prior is unknown, a uniform distribution is often used to model it, and iterative trials may then yield a much better estimate.
2. The Likelihood [p(x | wi)]: The likelihood is the probability (or probability density) of observing x given that it belongs to class wi. It is generally known or estimated from data, and it is combined with the prior to calculate the posterior probability.
3. The Posterior [P(wi | x)]: The posterior, or a posteriori, probability is the result obtained from Bayes theorem. It is the probability of an event given the evidence, written P(ω | x), where ω is the particular query and x is the given evidence.
4. The Evidence [p(x)]: The evidence p(x) is the overall probability of observing x, summed over all classes. It acts as a normalizing factor, so that the posterior probabilities of all classes add up to 1.
There are other forms of Bayes theorem, as follows:

$$P(w_i \mid x) = \frac{p(x \mid w_i)\,P(w_i)}{\sum_{j} p(x \mid w_j)\,P(w_j)} \tag{ii}$$

$$P(w_i \mid x) = \frac{p(x \mid w_i)\,p(w_i)}{\int p(x \mid w)\,p(w)\,dw} \tag{iii}$$

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} \tag{iv}$$
Proof of Bayes Theorem: According to the multiplication theorem of probability, if x and w are two independent events, the probability of both events happening simultaneously is

$$P(x \cap w) = P(xw) = P(x)\,P(w) \tag{a}$$

If x and w are not necessarily independent, Eq. (a) becomes

$$P(x \cap w) = P(x)\,P(w \mid x) = P(w)\,P(x \mid w) \tag{b}$$

Rearranging equation (b), we get

$$P(w \mid x) = \frac{P(w)\,P(x \mid w)}{P(x)} \tag{c}$$

Putting w = wi, where w is a random event, we get Bayes theorem (i):

$$P(w_i \mid x) = \frac{P(w_i)\,p(x \mid w_i)}{p(x)} \tag{i}$$

If the sample space S is partitioned into mutually exclusive and exhaustive events w1, w2, ..., wn and x is an arbitrary event, then the events x ∩ wi are disjoint subsets of S whose union is x. Then

$$P(x) = P(x \cap w_1) + P(x \cap w_2) + \dots + P(x \cap w_n) = \sum_i P(x \cap w_i) \tag{d}$$

According to equation (b), this gives

$$P(x) = \sum_i P(w_i)\,p(x \mid w_i)$$

and putting this value of P(x) into equation (i) we get

$$P(w_i \mid x) = \frac{p(x \mid w_i)\,P(w_i)}{\sum_{j} p(x \mid w_j)\,P(w_j)} \tag{ii}$$

In the case where w is a continuous random variable,

$$P(w_i \mid x) = \frac{p(x \mid w_i)\,p(w_i)}{\int p(x \mid w)\,p(w)\,dw} \tag{iii}$$
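The section's title refers to the naïve Bayes model. Its usual "naïve" assumption, stated here as an assumption because the text does not spell it out, is that the features of x are conditionally independent given the class, so p(x | wi) is a product of per-feature likelihoods. The Python sketch below applies Eq. (ii) with that assumption; the spam/ham classes and all probabilities are hypothetical.

```python
import math

def posteriors(priors, likelihoods):
    """Eq. (ii): P(w_i | x) = p(x | w_i) P(w_i) / sum_j p(x | w_j) P(w_j)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())              # p(x), the normalizing constant
    return {c: value / evidence for c, value in joint.items()}

def naive_bayes_posteriors(priors, feature_probs, observed):
    """Naive assumption: features are conditionally independent given the
    class, so p(x | w_i) is the product of the per-feature likelihoods."""
    likelihoods = {c: math.prod(feature_probs[c][f] for f in observed)
                   for c in priors}
    return posteriors(priors, likelihoods)

# Hypothetical two-class example: an e-mail is "spam" or "ham".
priors = {"spam": 0.4, "ham": 0.6}
feature_probs = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}
print(naive_bayes_posteriors(priors, feature_probs, ["offer"]))
print(naive_bayes_posteriors(priors, feature_probs, ["offer", "meeting"]))
```

Observing "offer" alone favours spam (0.2 against 0.06 before normalizing), while adding "meeting" tips the balance towards ham (0.02 against 0.024).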

Example: The Bayesian Squirrel



This squirrel has started searching for her lost food in one of two patches. The only problem is that she can't remember which one it is. There are two hypotheses: Hypothesis 1 is that the food is in patch 1, and Hypothesis 2 is that the food is in patch 2. The squirrel is fairly sure that she left the food in patch 1; in fact, she is willing to say that there is an 80% chance that the food is in patch 1. She also knows that she is really good at hiding her food. Consequently, there is only a 20% chance per day of finding the food when she is looking in the right patch (and, of course, a 0% probability if she is looking in the wrong patch).
Before she even starts searching, she has a prior probability P(patch 1) = 0.8. Fortunately, this squirrel has been trained in Bayes theorem and can therefore calculate posterior probabilities. Suppose she looks in patch 1 and doesn't find any food. What is the probability that the food is in patch 1, given that she didn't find anything? In terms of Bayes theorem, we see that:

$$p(\text{Food in patch 1} \mid \text{Find no food}) = \frac{p(\text{Find no food} \mid \text{Food in patch 1})\,p(\text{Food in patch 1})}{p(\text{Find no food})}$$

The probability of not finding food when searching in the right patch is 1 − 0.2 = 0.8, so:
p(Find no food | Food in patch 1) = 1 − 0.2 = 0.8
The second term is our prior probability that the food is in patch 1, which equals 0.8:
p(Food in patch 1) = 0.8
What's the denominator? There are two ways in which she could fail to find food: she could have been looking in the right patch but not come across the food, or she could have been looking in the wrong patch (i.e., the food is in patch 2). Whenever we see an "or" between mutually exclusive outcomes, we add their probabilities, so the probability of her not finding her food is P(food in patch 2) + 0.8 × P(food in patch 1). Again, P(food in patch 1) is our prior probability, and P(food in patch 2) is simply 1 minus the prior probability:
p(Find no food) = p(food in patch 2) + p(food in patch 1) × p(Find no food | Food in patch 1)
= 0.2 + 0.8 × 0.8 = 0.84
The posterior probability that the food is in patch 1, given that she fails to find food there, is therefore (0.8 × 0.8)/0.84 = 0.64/0.84 ≈ 0.76. She knows that she should keep searching in patch 1 until the probability that the food is in patch 1 drops below 0.5. At that point, she should switch and look in patch 2.
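The squirrel's reasoning can be repeated day by day, with each day's posterior becoming the next day's prior. The Python sketch below assumes, as the example implies but does not state, that each day's 20% chance of finding the food is independent of earlier days; exact fractions avoid rounding.

```python
from fractions import Fraction

P_FIND = Fraction(1, 5)   # 20% daily chance of finding food in the right patch

def update_after_failure(p_patch1, p_find=P_FIND):
    """Posterior P(food in patch 1 | searched patch 1 and found nothing)."""
    p_no_food_given_1 = 1 - p_find                             # 0.8 here
    evidence = (1 - p_patch1) + p_patch1 * p_no_food_given_1   # p(Find no food)
    return p_patch1 * p_no_food_given_1 / evidence

def days_before_switching(prior=Fraction(4, 5)):
    """Count unsuccessful days in patch 1 until the posterior drops below 1/2."""
    p, days = prior, 0
    while p >= Fraction(1, 2):
        p = update_after_failure(p)
        days += 1
    return days, p

print(float(update_after_failure(Fraction(4, 5))))   # about 0.762 (= 16/21)
days, p = days_before_switching()
print(days, float(p))
```

Each failure multiplies the odds for patch 1 by 0.8, starting from odds of 4 to 1. After six empty days the probability is still slightly above 0.5; after the seventh it falls to about 0.46, so she switches to patch 2.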
4.4. EXPECTATION MAXIMIZATION ALGORITHM
This algorithm is an iterative procedure for finding maximum likelihood estimates (MLE) of the parameters of statistical models when some of the data are missing. The Expectation Maximization model is based on two kinds of variables: hidden variables and observed variables. Observed variables are those that can be measured directly from the data, for example:
• The waveform values of a speech recording
• Is it raining today?
• Did the smoke alarm go off?
Hidden variables are those that influence the data but are not trivial to measure, for example:
• The phonemes that produce a given speech recording.
• P (rain today | rain yesterday).
• Is the smoke alarm malfunctioning?
In the Expectation Maximization model, the complete data are considered to be a combination of two parts, d_complete = {d_obs, d_mis}, in which d_obs is the observed data and d_mis is the missing (unobservable, or hidden) data. The Expectation Maximization (EM) procedure attempts to solve the following ML estimation problem:

$$\theta^* = \arg\max_{\theta} \ln P(d_{obs} \mid \theta)$$

Thus the maximum likelihood estimate of θ is the value of θ that maximizes the likelihood L(θ) = P(d_obs | θ); it is the value that makes the observed data the “most probable”. The EM procedure alternates two steps:
• Expectation step (E-Step): Using the current estimate of θ, estimate the missing part d_mis and use it to augment the observed data d_obs into the complete data set d_complete = {d_obs, d_mis}.
• Maximization step (M-Step): On the basis of the E-step, find the new value of θ that maximizes the complete-data likelihood P(d_obs, d_mis | θ).
The two steps are repeated, each time with the latest θ, until the estimate stops changing appreciably.
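A common concrete case is fitting a mixture of two Gaussians: the observed data d_obs are the sample values, and the missing data d_mis are the unknown labels saying which component produced each value. The Python sketch below is one standard EM instance under that assumption; the initialization, the variance floor and the synthetic data are illustrative choices, not from the text.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iters=200, tol=1e-9):
    """EM for a 1-D mixture of two Gaussians, theta = (weights, means, variances)."""
    weights, means, variances = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: responsibilities P(component k | x, theta) stand in for d_mis
        resp, ll = [], 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            ll += math.log(total)               # ln P(d_obs | theta), current theta
            resp.append([j / total for j in joint])
        # M-step: maximize the expected complete-data log-likelihood
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)       # floor avoids a collapsing component
        if ll - prev_ll < tol:                  # log-likelihood stopped improving
            break
        prev_ll = ll
    return weights, means, variances

rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
weights, means, variances = em_two_gaussians(data)
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

EM never decreases the observed-data log-likelihood from one iteration to the next, which is why a stalled log-likelihood is a reasonable stopping test.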
Examples: There are many problems to which the expectation maximization model is applied, for instance:
1. Parsing Problem: In the parsing problem, the complete data are a sentence together with its parse tree. The observed data is the sentence, and the unobserved data are the non-terminal categories and their relationships that form the parse tree. EM therefore provides a model that allows one to compute the probabilities of parse trees.
2. Semantic Labelling: In this problem the complete data are the context, the cluster and the word; the observed data are the context and the word only, while the cluster is unobserved. The cluster is therefore modelled from the observed data, treating the cluster as missing data, by applying the expectation maximization technique to the joint probability
P(context, cluster, word) = P(context) P(cluster | context) P(word | cluster)
4.5. EXPERT SYSTEM
An expert system is a program that attempts to perform the duties of an expert in the problem domain for which it is defined. Expert systems are computer programs constructed in such a way that they are capable of functioning at or above the standard of human experts in given fields; they embody a depth and richness of knowledge that permits them to perform at the level of an expert.
4.5.1. Rule Based Expert System
Using a set of assertions, which collectively form the ‘working memory’, and a set of
rules that specify how to act on the assertion set, a rule-based system can be created. Rule-
based systems are fairly simplistic, consisting of little more than a set of if-then statements,
but provide the basis for so-called “expert systems” which are widely used in many fields.
The concept of an expert system is this: the knowledge of an expert is encoded into the rule
set. When exposed to the same data, the expert system will perform in a similar manner as the
expert.
4.5.1.1. Elements of a Rule Based Expert System
Rule-based systems are a relatively simple model that can be adapted to any number of
problems. To create a rule-based system for a given problem, you must have (or create) the
following:
• A set of facts to represent the initial working memory. This should be anything
relevant to the beginning state of the system.
• A set of rules. This should encompass any and all actions that should be taken
within the scope of a problem, but nothing irrelevant. The number of rules in the
system can affect its performance, so you do not want any that are not needed.
• A condition that determines that a solution has been found or that none exists. This is necessary to terminate some rule-based systems that would otherwise find themselves in infinite loops.
In fact, there are three essential components of a fully functional rule-based expert system: the knowledge base, the working memory and the inference engine.
The knowledge base: The knowledge base is the store in which the knowledge of the particular domain is kept. It stores information about the subject domain. However, this goes further than a passive collection of records in a database. Rather, it contains symbolic representations of experts' knowledge, including definitions of domain terms, interconnections of component entities, and cause-effect relationships between these components. The knowledge in the knowledge base is expressed as a collection of facts and rules. Each fact expresses a relationship between two or more objects in the problem domain, and each rule can be expressed in the form IF condition THEN conclusion, where the condition or conclusion is a fact or a set of facts connected by the logical connectives NOT, AND, OR.
The working memory: The working memory is a temporary store that holds the facts produced by the inference engine during processing, possibly awaiting further processing. Note that the working memory contains only facts, namely those produced during the search process.
The inference engine: The core of any expert system is its inference engine. This is the part of the expert system that manipulates the knowledge base to produce new facts in order to solve the given problem. An inference engine consists of search and reasoning procedures that enable the system to find solutions and, if necessary, provide justifications for its answers. In this process it can use either forward or backward chaining as the direction of search, while applying a search technique such as depth-first search or breadth-first search. The roles of the inference engine are:
1. It identifies the rule to be fired. The rule selected is the one whose condition part matches the fact being considered in the case of forward chaining, or the one whose conclusion part matches the fact being considered in the case of backward chaining.
2. It resolves conflicts when more than one rule matches; this is called conflict resolution, and it is based on certain criteria mentioned further on.
3. It recognizes the goal state. When the goal state is reached, it reports the conclusion of the search.
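The three roles of the inference engine can be illustrated with a small forward-chaining loop in Python. The rules and facts are hypothetical, and the conflict-resolution strategy (prefer the rule with the most conditions, then the earlier rule) is one simple assumed criterion, not necessarily the one used elsewhere in this chapter.

```python
# Hypothetical knowledge base: each rule is (name, IF-conditions, THEN-conclusion).
RULES = [
    ("R1", {"has_feathers"}, "is_bird"),
    ("R2", {"is_bird", "can_fly"}, "can_migrate"),
    ("R3", {"is_bird", "lays_eggs"}, "is_oviparous"),
]

def forward_chain(rules, initial_facts, goal):
    """Fire rules until the goal fact is derived or no rule can fire."""
    memory = set(initial_facts)                  # working memory: facts only
    fired = []
    while goal not in memory:                    # role 3: recognize the goal state
        # role 1: rules whose IF part holds and whose THEN part is a new fact
        conflict_set = [r for r in rules if r[1] <= memory and r[2] not in memory]
        if not conflict_set:
            return False, memory, fired          # no rule applies: no solution
        # role 2: conflict resolution (assumed: most conditions, then rule order)
        name, _, conclusion = max(conflict_set, key=lambda r: len(r[1]))
        memory.add(conclusion)
        fired.append(name)
    return True, memory, fired

print(forward_chain(RULES, {"has_feathers", "can_fly"}, "can_migrate"))
```

Because each firing adds a new fact and there are only finitely many conclusions, the loop always terminates, either at the goal or when the conflict set is empty.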
4.5.1.2. Limitations of Rule Based Systems
Knowledge acquisition is the process of extracting knowledge from experts. Because it is difficult for experts to articulate their intuition as a systematic process of reasoning, this step is regarded as the main bottleneck in expert system development. If there are too many rules, the system can become difficult to maintain and can suffer a performance hit. A rule-based system has its strengths as well as limitations, and these must be considered before deciding whether it is the right technique for a given problem. Overall, rule-based systems are really only feasible for problems in which any and all knowledge in the problem area can be written in the form of if-then rules and in which the problem area is not large.
4.5.2. Case Based System
In case-based reasoning (CBR) systems expertise is embodied in a library of past cases,
rather than being encoded in classical rules. Each case typically contains a description of
the problem, plus a solution and/or the outcome. The knowledge and reasoning process
used by an expert to solve the problem is not recorded, but is implicit in the solution. To
solve a current problem: the problem is matched against the cases in the case base, and
similar cases are retrieved. The retrieved cases are used to suggest a solution which is reused
and tested for success. If necessary, the solution is then revised. Finally the current problem
and the final solution are retained as part of a new case. Many people like case-based reasoning because they are more comfortable with examples than with conclusions separated from their context. A case library can also be a powerful corporate resource, allowing everyone in an organisation to tap into it when handling a new problem.
4.5.2.1. Case Based System Cycle
All case-based reasoning methods have in common the following process:
1. retrieve the most similar case (or cases) by comparing the new case with the library of past cases;
2. reuse the retrieved case to try to solve the current problem;
3. revise and adapt the proposed solution if necessary;
4. retain the final solution as part of a new case.
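The four-step cycle maps directly onto code. In this Python sketch the case representation (a dictionary of features), the similarity measure and the help-desk cases are all hypothetical; real CBR systems use domain-specific similarity and adaptation knowledge.

```python
from dataclasses import dataclass

@dataclass
class Case:
    problem: dict        # feature -> value (hypothetical problem description)
    solution: str

def similarity(a, b):
    """Assumed metric: share of features on which two descriptions agree."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys) if keys else 0.0

def solve(library, problem, revise=None):
    best = max(library, key=lambda case: similarity(case.problem, problem))  # 1. retrieve
    solution = best.solution                                                  # 2. reuse
    if revise is not None:                                                    # 3. revise
        solution = revise(problem, solution)
    library.append(Case(dict(problem), solution))                             # 4. retain
    return solution

library = [
    Case({"power": "off", "noise": "none"}, "check the power cable"),
    Case({"power": "on", "noise": "beeping"}, "reseat the memory modules"),
]
print(solve(library, {"power": "on", "noise": "beeping", "screen": "blank"}))
print(len(library))   # 3: the new case has been retained
```

The optional `revise` callback is where a domain expert's adaptation rules, or a test of the proposed solution, would plug in.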
4.5.2.2. Applications of Case Based System
Case based reasoning first appeared in commercial tools in the early 1990’s and since
then has been used to create numerous applications in a wide range of domains:
Diagnosis: Case-based diagnosis systems try to retrieve past cases whose symptom
lists are similar in nature to that of the new case and suggest diagnoses based on the best
matching retrieved cases. The majority of installed systems are of this type and there are
many medical CBR diagnostic systems.
Help Desk: Case-based diagnostic systems are used in the customer service area dealing
with handling problems with a product or service.
Assessment: Case-based systems are used to determine values for variables by comparing them with the known values of something similar. Assessment tasks are quite common in the finance and marketing domains.
Decision Support: In decision making, when faced with a complex problem, people often look for analogous problems for possible solutions. CBR systems have been developed to support this problem-retrieval process (often at the level of document retrieval) to find relevant similar problems. CBR is particularly good at querying structured, modular and non-homogeneous documents.
4.5.3. Examples of Expert Systems
There are many expert systems available, two of which are described below:
4.5.3.1. Dendral
Dendral was a famous expert system in the artificial intelligence of the 1960s. Its main purpose was to help chemists identify unknown organic molecules by studying their mass spectra and applying knowledge of chemistry. Dendral is considered the first expert system because it was the first to automate the decision-making and problem-solving process of organic chemists. It consists of two sub-programs, Heuristic Dendral and Meta-Dendral.
Heuristic Dendral: Heuristic Dendral is a program that takes mass spectra and other experimental data as input, together with a knowledge base of chemistry, and produces as output a set of possible chemical structures consistent with the experimental data. A mass spectrum of a compound is generated by a mass spectrometer; it can be used to find the molecular weight of the compound and the masses of its atomic constituents.
Meta-Dendral: Meta-Dendral is a knowledge refining system that takes possible chemical structures and the related mass spectra as input, and suggests a set of hypotheses to explain the relation between some of the suggested structures and the mass spectrum. These hypotheses can in turn be fed back to Heuristic Dendral to test their validity. We can say that Meta-Dendral is a learning system and Heuristic Dendral is a decision system. These systems work on two principles: the plan-generate-test paradigm and knowledge engineering.
Plan-Generate-Test Paradigm: The plan-generate-test paradigm is the problem-solving method used by both the Heuristic Dendral and Meta-Dendral systems. The generator produces possible solutions to a particular problem using the knowledge base, and Heuristic Dendral then tests these solutions for validity.
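The plan-generate-test idea can be illustrated with a toy problem that is not Dendral's actual generator (Dendral worked with full molecular structures): find every molecular formula made of carbon, hydrogen and oxygen whose nominal mass matches a measured value. The planner derives bounds, the generator enumerates only formulas that satisfy a valence rule, and the tester checks each candidate against the data. Integer atomic masses (C = 12, H = 1, O = 16) are a simplifying assumption.

```python
# Nominal (integer) atomic masses: a simplifying assumption for this toy example.
MASS = {"C": 12, "H": 1, "O": 16}

def plan(target_mass):
    """Plan: derive upper bounds on atom counts so the generator stays small."""
    return target_mass // MASS["C"], target_mass // MASS["O"]

def generate(max_c, max_o):
    """Generate: enumerate C_c H_h O_o formulas allowed by a valence rule
    (for C/H/O molecules, h is even and at most 2c + 2)."""
    for c in range(1, max_c + 1):
        for o in range(max_o + 1):
            for h in range(0, 2 * c + 3, 2):
                yield c, h, o

def passes_test(candidate, target_mass):
    """Test: keep candidates whose nominal mass matches the measurement."""
    c, h, o = candidate
    return c * MASS["C"] + h * MASS["H"] + o * MASS["O"] == target_mass

def formula(c, h, o):
    """Write counts as a formula string, e.g. (2, 6, 1) -> 'C2H6O'."""
    return "".join(f"{el}{n if n > 1 else ''}"
                   for el, n in (("C", c), ("H", h), ("O", o)) if n)

def plan_generate_test(target_mass):
    max_c, max_o = plan(target_mass)
    return [formula(*cand) for cand in generate(max_c, max_o)
            if passes_test(cand, target_mass)]

print(plan_generate_test(46))   # ['CH2O2', 'C2H6O']
```

The planning constraints matter: without the valence rule the generator would also propose chemically impossible formulas such as CH18O, and the tester would have to reject them one by one.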
Knowledge Engineering: The main aim of knowledge engineering is to provide a productive interaction between the available knowledge base and problem-solving techniques. Knowledge engineering requires the following:
Large Knowledge Base: The knowledge base contains a large amount of information about the mass spectrometry technique and about chemistry, such as the chemical structures of compounds, their atomic components, atomic numbers and atomic masses.
General Rules: The rules that can be used to access the knowledge in the knowledge base that is related to the problem.
4.5.3.2. MYCIN
MYCIN was the first large expert system to perform at the level of a human expert and to
provide users with an explanation of its reasoning. Most expert systems developed since
MYCIN have used MYCIN as a benchmark to define an expert system. Moreover, the
techniques developed for MYCIN have become widely available in the various small expert
system building tools. MYCIN was developed at Stanford University in the mid-1970s. It
was designed to aid physicians in the diagnosis and treatment of meningitis and bacterial
infections. MYCIN was strictly a research system. AI investigators wanted to advance the state of
expert system building by undertaking a hard problem with clear, practical ramifications.
MYCIN provides consultative advice about bacterial infection (infections that involve
bacteria in the blood) and meningitis (infections that involve inflammation of the membranes
that envelop the brain and spinal cord). These infectious diseases can be fatal and often
show themselves during hospitalization.
Working of MYCIN
MYCIN is a computer program designed to provide attending physicians with advice comparable to that which they would otherwise get from a consulting physician specializing in bacterial and meningitis infections. To use MYCIN, the attending physician must sit in front of a computer terminal connected to a DEC-20 (one of Digital Equipment Corporation's mainframe computers) where the MYCIN program is stored. When the MYCIN program is invoked, it initiates a dialogue in which the physician types answers to various questions. Eventually MYCIN provides a diagnosis and a detailed drug therapy recommendation.
Example: The information MYCIN needs includes laboratory results of body fluid analyses, the symptoms that the patient is displaying, and general characteristics of the patient, such as age and sex. MYCIN obtains this information by interrogating the physician. A MYCIN consultation proceeds in two phases. First, a diagnosis is made to identify the most likely infectious organisms. Then one or more drugs are prescribed that should control all of the possible organisms. The antibiotics prescribed must rid the patient of the disease; they must also interact favorably with each other and be appropriate for the specific patient.
4.6. MACHINE LEARNING ALGORITHMS
There are many machine learning algorithms, of which two of the most important are described below:
4.6.1. Genetic Algorithm
A genetic algorithm is a search technique used in computing to find exact or approximate
solutions to optimization and search problems. Genetic algorithms are categorized as global
search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that
use techniques inspired by evolutionary biology such as inheritance, mutation, selection,
and crossover. Genetic algorithms are implemented as a computer simulation in which a
population of abstract representations (called chromosomes or the genotype of the genome)
of candidate solutions (called individuals, creatures, or phenotypes) to an optimization
problem evolves toward better solutions. Traditionally, solutions are represented in binary
as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts
from a population of randomly generated individuals and happens in generations. In each
generation, the fitness of every individual in the population is evaluated, multiple individuals
are stochastically selected from the current population (based on their fitness), and modified
(recombined and possibly randomly mutated) to form a new population. The new population
is then used in the next iteration of the algorithm. Commonly, the algorithm terminates
when either a maximum number of generations has been produced, or a satisfactory fitness
level has been reached for the population. If the algorithm has terminated due to a maximum
number of generations, a satisfactory solution may or may not have been reached.
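The generational loop described above can be sketched in Python. The problem (OneMax: maximize the number of 1 bits in a 20-bit string), tournament selection, one-point crossover, keeping the best individual (elitism) and all parameter values are illustrative assumptions rather than details from the text.

```python
import random

def fitness(chromosome):
    """OneMax fitness: the number of 1 bits (maximum = string length)."""
    return sum(chromosome)

def tournament(population, rng, k=3):
    """Selection: the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover of two parent bit strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chromosome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def genetic_algorithm(n_bits=20, pop_size=30, max_generations=200, rate=0.02, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == n_bits:                   # satisfactory fitness reached
            return best, generation
        children = [mutate(crossover(tournament(population, rng),
                                     tournament(population, rng), rng), rng, rate)
                    for _ in range(pop_size - 1)]
        population = [best] + children                # elitism keeps the best so far
    return max(population, key=fitness), max_generations   # generation limit hit

best, generations = genetic_algorithm()
print(fitness(best), generations)
```

The two exits mirror the two termination conditions in the text: a satisfactory fitness level, or the maximum number of generations, in which case the returned individual may not be optimal.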
Applications of Genetic algorithm
Genetic algorithms find application in bioinformatics, phylogenetics, computational
science, engineering, economics, chemistry, manufacturing, mathematics, physics and other
fields.
4.6.2. Neural Network
An artificial neural network, commonly referred to simply as a neural network, is an artificial
representation of the human brain that tries to simulate its learning process. Traditionally, the
term neural network referred to a network of biological neurons in the nervous system that
process and transmit information. Artificial neural networks are constructed to make use of some
organizational principles similar to those of the human brain. They represent a promising
new generation of information processing systems. A neural network is an interconnected
group of artificial neurons that uses a mathematical or computational model for
information processing based on a connectionist approach to computation. In this model, a
single neuron receives a set of inputs (x1, x2, x3, ..., xn). This set of inputs is multiplied
by a set of weights (w1, w2, ..., wn). Here, weights are referred to as strengths of the synapses.
These weighted values are then summed and the output is passed through an activation
(transfer) function. The activation function is also referred to as a squashing function in that
it squashes (limits) the permissible range of the output signal to some finite value.
Fig. 4.3. Artificial Neural Network (inputs x1, x2, ..., xn; weights w1, w2, ..., wn; summation ∑ giving u, with threshold ϑ; activation function f(u) giving output y)
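The single-neuron computation can be written out directly. This is a minimal sketch that assumes the logistic sigmoid as the squashing function and treats ϑ as a threshold subtracted from the weighted sum; the text does not fix either choice, and `neuron_output` is a hypothetical name.

```python
import math


def neuron_output(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs, squashed by a logistic sigmoid."""
    # u = x1*w1 + x2*w2 + ... + xn*wn - threshold
    u = sum(x * w for x, w in zip(inputs, weights)) - threshold
    # The sigmoid limits the output to the range (0, 1).
    return 1.0 / (1.0 + math.exp(-u))


# Example: u = 1.0*0.4 + 0.5*(-0.2) = 0.3, so y = sigmoid(0.3), about 0.574
y = neuron_output([1.0, 0.5], [0.4, -0.2])
```

Swapping the sigmoid for another bounded function (for example `math.tanh`) changes only the last line; the weighted-sum step stays the same.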

4.6.2.1. Types of Neurons


Neurons can be classified into two types, namely simple neurons and complicated neurons.
Both are described below:
Simple neuron: An artificial neuron is a device with many inputs and a single output.
The neuron can be used in two modes: the training mode and the application mode. In the
training mode, the neuron is trained to fire (or not) for particular input values. In the
application mode, when a taught input value is detected at the input, its corresponding output
becomes the current output. If the input value does not match any value in the taught list, the
firing rule is used to decide whether to fire or not. Figure 4.4 shows an example of a simple
neuron.
Fig. 4.4. Simple Neuron (inputs X1, X2, ..., Xn and a teaching input feed a neuron with a Teach/Use mode switch and a single output)
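The text leaves the firing rule open. One common choice, assumed here, compares an untaught input with the taught patterns by Hamming distance. The neuron fires if the nearest taught pattern was a firing one, does not fire if it was a non-firing one, and is undefined on a tie. The class and method names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


class SimpleNeuron:
    def __init__(self):
        self.taught = {}  # taught pattern -> 1 (fire) or 0 (do not fire)

    def teach(self, pattern, fire):
        """Training mode: record whether this input should fire."""
        self.taught[tuple(pattern)] = 1 if fire else 0

    def use(self, pattern):
        """Application mode: return 1, 0, or None (undefined)."""
        pattern = tuple(pattern)
        if pattern in self.taught:
            # A taught input reproduces its taught output.
            return self.taught[pattern]
        # Firing rule (assumed): follow the nearest taught pattern(s).
        d_fire = min((hamming(pattern, p) for p, out in self.taught.items()
                      if out == 1), default=None)
        d_not = min((hamming(pattern, p) for p, out in self.taught.items()
                     if out == 0), default=None)
        if d_fire is not None and (d_not is None or d_fire < d_not):
            return 1
        if d_not is not None and (d_fire is None or d_not < d_fire):
            return 0
        return None  # a tie, or nothing taught yet
```

For example, after teaching `[1, 1, 1]` to fire and `[0, 0, 0]` not to fire, the untaught input `[1, 1, 0]` is one bit away from the firing pattern and two bits away from the other, so `use` returns 1.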
Complicated neuron: The simple neuron does not do anything that conventional
computers do not do already. Figure 4.5 shows an example of a complicated neuron. The difference
from the previous model is that the inputs are ‘weighted’: the effect that each input has on
decision making depends on the weight of that input. The weight of an input is a number
which, when multiplied by the input, gives the weighted input. If the sum of these weighted
inputs exceeds a preset threshold, the neuron fires; otherwise, it does not fire.
Fig. 4.5. A Complicated Neuron (inputs X1, X2, ..., Xn with weights W1, W2, ..., Wn and a teaching input feed a neuron with a Teach/Use mode switch and a single output)


In mathematical terms, the neuron fires if and only if

$$X_1W_1 + X_2W_2 + X_3W_3 + \cdots + X_nW_n > T$$

where T is the threshold. The addition of input weights and of the threshold makes this neuron very flexible. The
complicated neuron generally has the ability to adapt to a particular situation by adjusting
its weights and/or threshold. Various algorithms are available that cause the neuron to ‘adapt’;
the most widely used are the Delta rule and error back-propagation.
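The firing inequality and the idea of adapting weights and threshold can be combined in a short sketch. It uses the perceptron learning rule, which is closely related to the Delta rule named above and is only one possible choice. The function names, the learning rate and the AND training set are illustrative assumptions.

```python
def fires(inputs, weights, threshold):
    """The neuron fires (1) iff X1W1 + X2W2 + ... + XnWn > T."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0


def train_threshold_neuron(samples, n_inputs, rate=0.1, epochs=50):
    """Adjust weights and threshold until every sample is classified correctly."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            error = target - fires(inputs, weights, threshold)  # -1, 0 or 1
            if error:
                errors += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                # Raising the threshold has the opposite effect of raising a weight.
                threshold -= rate * error
        if errors == 0:
            break
    return weights, threshold


# Learn the logical AND of two inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_threshold_neuron(AND, n_inputs=2)
```

Because AND is linearly separable, this rule converges. A single threshold neuron cannot learn XOR this way, which is one motivation for the multilayer networks described next.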
4.6.2.2. Architecture of Neural Networks
Single-Layer Feed-forward Neural Networks
A Single-Layer Feed-forward Neural Network is shown in Figure 4.6. In a single-layer
feed-forward neural network, there is only one input layer and one output layer. In this
network, several neurons (nodes) can be connected in parallel in a layer. The network is
strictly feed-forward, that is, there are no feedback connections from the outputs back to the
inputs. Usually, no connections exist between the neurons (nodes) in a particular layer. The
network shown in Figure 4.6 is fully connected, that is, all inputs are connected to all the
nodes. Partially connected networks are those where some of the connection links are missing.
Fig. 4.6. Single Layer Feed-forward Neural Network (input layer x1, x2, ..., xN; output layer nodes 1, 2, ..., J with outputs y1, y2, ..., yJ)
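A forward pass through such a layer computes one weighted sum per output node. In this illustrative sketch, `weights[j][i]` is the link from input i to output node j, and a step function stands in for the activation. An optional 0/1 `mask` models a partially connected network by dropping the missing links. All names and values are hypothetical.

```python
def step(u):
    """Threshold activation: fire when the weighted sum is positive."""
    return 1 if u > 0 else 0


def single_layer_forward(x, weights, mask=None):
    """Outputs y1..yJ of a single-layer feed-forward network."""
    outputs = []
    for j, row in enumerate(weights):
        u = 0.0
        for i, (xi, wji) in enumerate(zip(x, row)):
            # Skip links that are absent in a partially connected network.
            if mask is None or mask[j][i]:
                u += wji * xi
        # Each output depends only on the inputs (no links within the layer).
        outputs.append(step(u))
    return outputs


x = [1, 0, 1]
W = [[0.5, 0.5, 0.5],
     [-1.0, 0.0, 0.2]]
full = single_layer_forward(x, W)                                  # fully connected
partial = single_layer_forward(x, W, mask=[[1, 1, 1], [0, 1, 1]])  # one link missing
```

Here `full` is `[1, 0]`. Removing the link from x1 to the second node changes its weighted sum from -0.8 to 0.2, so `partial` is `[1, 1]`.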
Multilayer Feed-forward Neural Networks (MFNN): The Multilayer Feed-forward Neural
Network, shown in Figure 4.7, is the most widely used neural network, particularly within
the area of systems and control. Similar to the single-layer feed-forward neural network,
there is one input layer and one output layer, and no interconnections between the nodes in
a particular layer. Unlike the single-layer feed-forward neural network, however, a multilayer
neural network has one or more intermediate or hidden layers between the input and output
layers. (A layer between the input and output layers is called a hidden layer because it is
internal to the network and has no direct contact with the external environment.) One, two or
more hidden layers are used for most applications. The number of hidden layers is kept small
because the training process becomes too long and tedious if the architecture of the neural
network becomes large. In Figure 4.7, one hidden layer is present; the input, hidden and output
layers have N, J and K nodes respectively, where N, J and K are positive integers that need not
be equal. To get the output from the network, a set of input data is first presented to the input
layer. The outputs from this layer are then fed as inputs to the first hidden layer, and
subsequently the outputs from the first hidden layer are fed, as weighted inputs (the outputs
from the first hidden layer multiplied by the weights), to the second hidden layer. This process
carries on until the output layer is reached. An example of a feed-forward neural network is the
multilayer perceptron (MLP), commonly called the multilayer feed-forward network.

Fig. 4.7. Multilayer Feed-forward Neural Network (input layer x1, x2, ..., xN; hidden layer nodes 1, 2, ..., J; output layer nodes 1, 2, ..., K with outputs y1, y2, ..., yK)
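The layer-by-layer computation described above can be sketched as follows, for N = 2 inputs, J = 3 hidden nodes and K = 1 output. The sigmoid activation, the per-node bias terms (standing in for thresholds) and all weight values are assumptions made for illustration.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x, network):
    """Feed the input through each (weights, biases) layer in turn."""
    activations = x
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


network = [
    # Hidden layer: J = 3 nodes, each with N = 2 input weights.
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    # Output layer: K = 1 node with J = 3 weights.
    ([[1.0, -1.5, 0.7]], [0.2]),
]
y = mlp_forward([1.0, 2.0], network)  # a list holding K = 1 value in (0, 1)
```

Adding a second hidden layer only means appending another `(weights, biases)` pair to `network`; `mlp_forward` itself does not change.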


Feedback Networks: Feedback networks can have signals travelling in both directions
by introducing loops. Feedback networks are more powerful and can become very complicated.
They are dynamic: their state changes continuously until they reach an equilibrium state.
They remain in that equilibrium state until the input changes and a new equilibrium needs
to be found. The feedback architecture is shown in Figure 4.8.
Fig. 4.8. Feedback Neural Network (input, competition layer and output, with feedback paths from the output back into the network)
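A Hopfield network is one classic feedback network. It is chosen here as an example; the text does not name one. In this sketch, each node's output is fed back to every other node, and node updates are repeated until no node changes, that is, until the network settles into an equilibrium state. Bipolar (+1/-1) states, Hebbian weights and the function names are assumptions of this illustration.

```python
def hopfield_weights(patterns):
    """Hebbian weights w_ij = sum over patterns of p_i * p_j, with w_ii = 0."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W


def settle(W, state, max_sweeps=100):
    """Update nodes one at a time until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            # Feedback: node i sees the current outputs of all other nodes.
            u = sum(W[i][j] * state[j] for j in range(len(state)))
            new = 1 if u >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:  # equilibrium reached
            break
    return state


stored = [1, -1, 1, -1, 1, -1]
W = hopfield_weights([stored])
noisy = [1, 1, 1, -1, 1, -1]   # second element flipped
recalled = settle(W, noisy)    # settles back to the stored pattern
```

Presenting a different starting state and calling `settle` again corresponds to the input changing and the network moving to a new equilibrium.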
4.6.2.3. Learning in Neural Networks
A key property of a neural network is its ability to learn from its environment and to adaptively
fine-tune its parameters to improve system performance. Generally, learning is the process by
which the NN adapts itself to a stimulus and eventually produces a desired response.
During learning, the network adjusts its parameters, such as the weights on its inputs,
so that its actual output converges to the desired output response. When the actual output
matches the desired one, we say that learning is complete. Learning rules are defined by
mathematical expressions called learning equations. The learning process is not the same for
all networks; it depends on the application. There are two general categories of learning,
known as supervised learning and unsupervised learning.
Supervised Learning: In supervised learning, we know the input, the actual output and
the desired response, and we calculate the difference between the actual output and the desired
output. If the actual response differs from the desired output, the NN generates an error
signal, and the difference between the actual and desired responses is then used to calculate the
weight adjustment so that the actual output matches the desired output.
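As a concrete instance of error-driven weight adjustment, the sketch below trains a single linear neuron with the Delta (Widrow-Hoff, least-mean-squares) rule, w ← w + η(d − y)x, where d is the desired output and y the actual output. The learning rate η, the epoch count and the toy target d = 2x1 − x2 are assumptions made for illustration.

```python
def train_delta_rule(samples, n_inputs, rate=0.1, epochs=200):
    """Supervised learning for a linear neuron y = w . x."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, desired in samples:
            actual = sum(w * xi for w, xi in zip(weights, x))
            error = desired - actual  # the error signal
            # Move each weight in proportion to the error and its input.
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


# Samples generated from the (hypothetical) target d = 2*x1 - 1*x2.
samples = [((1, 0), 2), ((0, 1), -1), ((1, 1), 1), ((2, 1), 3)]
weights = train_delta_rule(samples, n_inputs=2)  # close to [2.0, -1.0]
```

Because the samples are exactly consistent with a linear target and the learning rate is small, the error signal shrinks towards zero and the weights approach the target coefficients.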
Unsupervised Learning: In unsupervised learning, the desired output is not known.
During training, the network receives many different input patterns at its input and arbitrarily
organizes the patterns into classes. When an input is applied later, the network provides an
output response indicating the class to which the input belongs. If a class cannot be found
for the stimulus, a new class is generated. This type of learning is sometimes referred to as
self-organizing learning.
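The behaviour described above can be sketched with a simple "leader" clustering procedure. Inputs are grouped into classes, and a new class is created when no existing class fits. This is only one illustrative self-organizing scheme, not the specific method of any named network. The Euclidean distance, the `radius` parameter and the prototype update rate are assumptions.

```python
import math


def self_organize(patterns, radius=1.0, rate=0.5):
    """Assign each pattern to a class, creating new classes as needed."""
    prototypes = []  # one representative vector per class
    labels = []
    for p in patterns:
        distances = [math.dist(p, c) for c in prototypes]
        if distances and min(distances) <= radius:
            k = distances.index(min(distances))
            # Move the winning prototype part of the way towards the input.
            prototypes[k] = [c + rate * (x - c) for c, x in zip(prototypes[k], p)]
        else:
            # No class is close enough: generate a new class.
            prototypes.append(list(p))
            k = len(prototypes) - 1
        labels.append(k)
    return labels, prototypes


data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3)]
labels, prototypes = self_organize(data)  # two classes are found
```

No desired outputs are supplied. The classes emerge from the inputs alone, and the labels are arbitrary integers rather than predefined names.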
4.6.2.4. Neural Network Learning Algorithms
A learning algorithm is a mathematical tool that outlines the methodology and the
speed with which an NN successfully reaches the steady state of its parameters, weights and
thresholds. It starts with an error function (energy function), which is expressed in terms of the
weights. The objective is to minimize the error by adjusting the set of weights. When the error
function is zero or small enough, the steady state of the network and of the weights is reached.
During learning, the error function decreases and the weights are updated. The decrease may be
accomplished with different optimization techniques such as the Delta rule, Boltzmann’s
algorithm, the back propagation learning algorithm and simulated annealing. The choice of the
error function and the optimization method is important, because it determines whether learning
is stable or unstable and whether the solution may become trapped in a local minimum. The back
propagation learning algorithm is the one most generally used; it is the basic learning mechanism
and is very popular. In this algorithm, the network output, on presentation of input data, is
compared with the desired output and a measure of the error is obtained. This error measure
is then used to incrementally modify appropriate weights in the connection matrices in
order to reduce the error.
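The sketch below applies back propagation to a network with one hidden layer and a single sigmoid output. It uses the squared error E = ½(d − y)² as the error function and plain gradient descent as the optimization method. The network size, learning rate, epoch count, random initialisation and the OR training set are illustrative assumptions.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations h and network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def total_error(data, params):
    """Error function E = sum of 0.5 * (desired - actual)^2 over the data."""
    return sum(0.5 * (d - forward(x, *params)[1]) ** 2 for x, d in data)


def train_backprop(data, n_hidden=2, rate=0.5, epochs=2000, seed=1):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, d in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Compare actual and desired output: error term at the output node.
            delta_out = (y - d) * y * (1 - y)
            # Propagate the error back to each hidden node (uses the old W2).
            delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j])
                         for j in range(n_hidden)]
            # Incrementally modify the weights to reduce the error.
            for j in range(n_hidden):
                W2[j] -= rate * delta_out * h[j]
                b1[j] -= rate * delta_hid[j]
                for i in range(n_in):
                    W1[j][i] -= rate * delta_hid[j] * x[i]
            b2 -= rate * delta_out
    return W1, b1, W2, b2


OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
untrained = train_backprop(OR, epochs=0)  # same seed, so same starting weights
trained = train_backprop(OR)
```

Comparing `total_error(OR, untrained)` with `total_error(OR, trained)` shows the error measure falling as the weights are adjusted.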
4.6.3. Fuzzy Logic
A “fuzzy set” is a simple extension of the definition of a classical set in which the
characteristic function is permitted to take any value between 0 and 1. A “fuzzy set” A in X
can be defined as a set of ordered pairs:

$$A = \{(x, \mu_A(x)) : x \in X\}$$
where µA(x) is called the membership function for the fuzzy set A. It maps each x to a
membership grade between 0 and 1. Examples of membership functions (Triangular,
Trapezoidal and Gaussian) can be seen in Figure 4.9 and are described by the following
formulas:

Triangular MFs: The triangular membership function is given by the following equation [25]:


$$\operatorname{Triangle}(x; a, b, c) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{c - x}{c - b}, & b \le x \le c \\ 0, & x \ge c \end{cases}$$
Trapezoidal MFs: The trapezoidal membership function is given by the following equation:

$$\operatorname{Trapezoidal}(x; a, b, c, d) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{d - x}{d - c}, & c \le x \le d \\ 0, & x \ge d \end{cases}$$

Gaussian MFs: The Gaussian membership function is given by the following equation:


$$\operatorname{Gaussian}(x; c, \sigma) = e^{-\frac{(x - c)^2}{2\sigma^2}}$$

Fig. 4.9. Examples of Membership Functions (panels: Triangular, Trapezoidal, Gaussian; vertical axis: Degree of Membership, from 0 to 1)
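The three membership-function formulas translate directly into code. This sketch assumes a < b < c for the triangle, a < b ≤ c < d for the trapezoid and σ > 0 for the Gaussian. The function names are hypothetical.

```python
import math


def triangle(x, a, b, c):
    """Triangular MF: rises on [a, b], falls on [b, c], zero elsewhere."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, c, sigma):
    """Gaussian MF centred at c with width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)
```

For example, `triangle(2.5, 0, 5, 10)` is 0.5, halfway up the rising edge, and `trapezoid(9, 0, 2, 8, 10)` is 0.5, halfway down the falling edge.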

4.6.3.1. Linguistic Variables


The concept of linguistic variables was introduced by Zadeh to provide a basis for
approximate reasoning. A linguistic variable is defined as a variable whose values are words
or sentences. For instance, Age can be a linguistic variable if its values are linguistic rather
than numerical, i.e., young, very young, old, very old, etc., rather than 20, 21, 53, 55, ....
Figure 4.10 illustrates the term set of Age expressed by Gaussian MFs.
Fig. 4.10. Membership Functions of the Term Set Age (terms: Very Young, Young, Middle Aged, Old, Very Old; vertical axis: Degree of Membership; horizontal axis: Age)
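Fuzzifying a crisp age against such a term set can be sketched as below. The term names follow Figure 4.10, but the centres and widths of the Gaussian MFs are assumed values, since the figure does not state them.

```python
import math


def gaussian(x, c, sigma):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


# Hypothetical (centre, width) values for each linguistic term of Age.
AGE_TERMS = {
    "very young": (0, 10),
    "young": (20, 10),
    "middle aged": (45, 10),
    "old": (70, 10),
    "very old": (90, 10),
}


def fuzzify_age(age):
    """Membership grade of a crisp age in every term of the term set."""
    return {term: gaussian(age, c, s) for term, (c, s) in AGE_TERMS.items()}


def best_term(age):
    """The linguistic value with the highest membership grade."""
    grades = fuzzify_age(age)
    return max(grades, key=grades.get)
```

With these assumed parameters, an age of 30 is partly "young" (grade about 0.61) and partly "middle aged" (about 0.32), rather than belonging to a single class.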

4.6.3.2. Fuzzy if-then Rules


A fuzzy if-then rule is expressed as follows:
If x is A then y is B
where A and B are linguistic values defined by fuzzy sets. “x is A” is called the “antecedent”
or “premise”, while “y is B” is called the “consequence” or “conclusion”. Some examples of
if-then rules are given below:
1. If age is 25, then the person is young.
2. If the speed is high AND the distance is small, then the force on the brake should be
high.
3. If weight is 200 kg, then the person is fat.
4.6.3.3. Fuzzy Reasoning
Fuzzy reasoning, also called approximate reasoning, is an inference procedure that derives
conclusions from a set of fuzzy if-then rules. The steps of fuzzy reasoning are as
follows:
1. Input variables are compared with the MFs on the premise part to obtain the
membership values of each linguistic label (fuzzification).
2. The membership values on the premise part are combined through specific fuzzy
set operations to obtain the firing strength of each rule.
3. The qualified consequent (either fuzzy or crisp) of each rule is generated depending on
its firing strength.
4. The qualified consequents are aggregated to produce a crisp output according to the
defined methods (defuzzification).
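The four steps can be traced on the braking rule from the if-then examples in 4.6.3.2. This sketch assumes Mamdani-style choices that the text does not prescribe: min for AND, max for OR, clipping (min) to form each qualified consequent, max for aggregation, and the centroid method for defuzzification. The membership functions, the second rule and the 0 to 100 universes are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical database: (a, b, c) parameters on universes 0..100.
SPEED = {"low": (-60, 0, 60), "high": (40, 100, 160)}
DISTANCE = {"small": (-50, 0, 50), "large": (30, 100, 170)}
FORCE = {"low": (-50, 0, 50), "high": (50, 100, 150)}


def brake_force(speed, distance):
    # Step 1: fuzzification of the crisp inputs.
    speed_high = tri(speed, *SPEED["high"])
    speed_low = tri(speed, *SPEED["low"])
    dist_small = tri(distance, *DISTANCE["small"])
    dist_large = tri(distance, *DISTANCE["large"])
    # Step 2: combine premise memberships into rule firing strengths.
    w_high = min(speed_high, dist_small)  # speed high AND distance small
    w_low = max(speed_low, dist_large)    # speed low OR distance large
    # Steps 3 and 4: clip each consequent, aggregate with max, defuzzify.
    numerator = denominator = 0.0
    for y in range(0, 101):
        mu = max(min(w_high, tri(y, *FORCE["high"])),
                 min(w_low, tri(y, *FORCE["low"])))
        numerator += y * mu
        denominator += mu
    return numerator / denominator if denominator else 0.0  # centroid
```

At full speed with no distance left, only the first rule fires and the centroid of the "high" force set is about 84. At zero speed with a large distance, only the second rule fires and the result is about 16.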
4.6.3.4. Fuzzy Systems
Fuzzy systems are made of a knowledge base and a reasoning mechanism called a fuzzy
inference engine. The structure of the fuzzy inference engine is shown in Figure 4.11. A fuzzy
inference engine combines fuzzy if-then rules into a mapping from the inputs of the system
to its outputs, using fuzzy methods. A fuzzy system is thus a nonlinear mapping realized by
fuzzy if-then rules. The rule base can be constructed either from human expertise or by
automatic generation, that is, by extracting rules from numerical input-output data.

Fig. 4.11. Fuzzy Inference Engine (crisp input → fuzzification → fuzzy input → fuzzy inference engine, using the knowledge base → fuzzy output → defuzzification → crisp output)


A fuzzy inference system consists of four functional blocks, as shown in Figure 4.11.
1. Fuzzification: Conversion of the crisp inputs into degrees of match with linguistic
values.
2. Knowledge base: Consists of a rule base and a database. A rule base contains a
number of fuzzy rules. A database defines the membership functions of the fuzzy
sets used in the fuzzy rules.
3. Fuzzy inference engine: Performs the inference operations on the rules.
4. Defuzzification: Conversion of the fuzzy results of the inference into a crisp output.

UNIVERSITY PREVIOUS YEAR QUESTIONS WITH ANSWERS


Q. 1. Compare and contrast between supervised and unsupervised learning techniques.
[GBTU VII th, VIII th Semester 2011-12; GBTU VIII th Semester 2012-13]
Ans. See Arts. 4.1.1 and 4.1.2.
Q. 2. Illustrate Naïve Bayes model of statistical learning. [GBTU VII th, VIIIth Semester 2011-12]
Ans. See Art. 4.3.
Q. 3. Describe the decision tree learning model by choosing a suitable example.
[GBTU VII th Semester 2011-12; GBTU VIII th Semester 2012-13]
Ans. See Art. 4.2.
Q. 4. Define the term reinforcement learning. How does passive reinforcement learning differ
from active reinforcement learning?
[GBTU VIII th Semester 2011-12; GBTU VII th Semester 2012-13]
Ans. See Art. 4.1.3.
Q. 5. Write a note on knowledge in learning. [GBTU VIII th Semester 2012-13]
Ans. See Art. 4.1.
Q. 6. Write a note on reinforcement learning. [GBTU VIII th Semester 2012-13]
Ans. See Art. 4.1.3.
Q. 7. What is machine learning?
Ans. Machine learning is the subfield of artificial intelligence that is concerned with the design and
development of algorithms that allow computers to improve their performance over time based
on data, such as from sensor data or databases. A major focus of machine learning research is
to automatically produce (induce) models, such as rules and patterns, from data. Hence,
machine learning is closely related to fields such as data mining, statistics, inductive reasoning,
pattern recognition, and theoretical computer science.
Q. 8. Explain the working of DENDRAL expert system. [GBTU 2009-10; VII th Semester 2010-11]
Ans. See Art. 4.5.3.1.
Q. 9. Write a short note on MYCIN expert system. [GBTU VII th Semester 2009-10]
Ans. See Art. 4.5.3.2.
Q. 10. Write short notes on limitations of expert systems and self-explanation systems.
[GBTU VII th Semester 2010-11]
Ans. See Art. 4.5.1.2.

SHORT QUESTIONS WITH ANSWERS


Q. 1. What is rule based expert system?
Ans. Rule-based expert system: Using a set of assertions, which collectively form the ‘working
memory’, and a set of rules that specify how to act on the assertion set, a rule-based system can
be created. Rule-based systems are fairly simplistic, consisting of little more than a set of
if-then statements, but they provide the basis for so-called “expert systems”, which are widely
used in many fields. The concept of an expert system is this: the knowledge of an expert is
encoded into the rule set. When exposed to the same data, the expert system will perform in a
similar manner to the expert.
Q. 2. What are the various components of a rule-based expert system?
Ans. Rule-based systems are a relatively simple model. In fact, there are three essential components
of a fully functional rule-based expert system: the knowledge base, the working memory and
the inference engine.
The knowledge base: The knowledge base is the store in which the knowledge of the
particular domain is kept. The knowledge base stores information about the subject domain.
However, this goes further than a passive collection of records in a database. Rather, it contains
symbolic representations of experts’ knowledge, including definitions of domain terms,
interconnections of component entities, and cause-effect relationships between these
components. The knowledge in the knowledge base is expressed as a collection of facts and
rules. Each fact expresses a relationship between two or more objects in the problem domain and
can be expressed in terms of predicates; each rule has the form IF condition THEN conclusion,
where the condition or conclusion are facts or sets of facts connected by the logical connectives
NOT, AND, OR.
The working memory: The working memory is a temporary store that holds the facts produced
during processing, possibly awaiting further processing by the inference engine during its
activities. Note that the working memory contains only facts, namely those produced during the
search process.
The inference engine: The core of any expert system is its inference engine. This is the part
of the expert system that manipulates the knowledge base to produce new facts in order to solve
the given problem. An inference engine consists of search and reasoning procedures to enable
the system to find solutions and, if necessary, provide justifications for its answers. In this
process it can use either forward or backward searching as the direction of search while
applying a search technique such as depth-first search, breadth-first search, etc.
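Forward and backward search over IF-THEN rules can be sketched as below. The rule format (a tuple of conditions joined by AND, plus one conclusion), the function names and the animal facts are hypothetical. A real inference engine would also handle OR, NOT and conflict resolution.

```python
def forward_chain(facts, rules):
    """Data-driven: fire rules until no new facts enter working memory."""
    memory = set(facts)  # the working memory
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in memory and all(c in memory for c in conditions):
                memory.add(conclusion)  # a newly produced fact
                changed = True
    return memory


def backward_chain(goal, facts, rules, visited=frozenset()):
    """Goal-driven: prove the goal from facts via rules that conclude it."""
    if goal in facts:
        return True
    if goal in visited:  # avoid looping on circular rules
        return False
    visited = visited | {goal}
    return any(all(backward_chain(c, facts, rules, visited) for c in conditions)
               for conditions, conclusion in rules if conclusion == goal)


RULES = [
    (("has_feathers",), "is_bird"),
    (("is_bird", "can_fly"), "can_migrate"),
]
FACTS = {"has_feathers", "can_fly"}
```

`forward_chain(FACTS, RULES)` derives `is_bird` and then `can_migrate`. `backward_chain("can_migrate", FACTS, RULES)` reaches the same conclusion by working back from the goal through `is_bird`.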
Q. 3. What is case based expert system?
Ans. In case-based reasoning (CBR) systems expertise is embodied in a library of past cases, rather
than being encoded in classical rules. Each case typically contains a description of the problem,
plus a solution and/or the outcome. The knowledge and reasoning process used by an expert
to solve the problem is not recorded, but is implicit in the solution. To solve a current problem:
the problem is matched against the cases in the case base, and similar cases are retrieved. The
retrieved cases are used to suggest a solution which is reused and tested for success. If
necessary, the solution is then revised. Finally the current problem and the final solution are
retained as part of a new case. Case-based reasoning is liked by many people because they feel
happier with examples rather than conclusions separated from their context. A case library can
also be a powerful corporate resource, allowing everyone in an organisation to tap into the
corporate case library when handling a new problem.
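The retrieve, reuse, revise and retain cycle described above can be sketched with a nearest-neighbour lookup. Cases here are dictionaries of problem features plus a solution. The similarity measure (fraction of matching features), the optional `revise` hook and the help-desk cases are assumptions made for illustration.

```python
def similarity(p, q):
    """Fraction of features (over both cases) on which p and q agree."""
    keys = set(p) | set(q)
    if not keys:
        return 1.0
    return sum(p.get(k) == q.get(k) for k in keys) / len(keys)


def cbr_solve(case_base, problem, revise=None):
    """One pass of the CBR cycle; assumes case_base is non-empty."""
    # Retrieve: the most similar past case.
    best = max(case_base, key=lambda case: similarity(case["problem"], problem))
    # Reuse: adopt its solution for the current problem.
    solution = best["solution"]
    # Revise: optionally adapt the solution (hypothetical hook).
    if revise is not None:
        solution = revise(problem, solution)
    # Retain: store the solved problem as a new case.
    case_base.append({"problem": dict(problem), "solution": solution})
    return solution


case_base = [
    {"problem": {"power": "off", "noise": "none"}, "solution": "check power cable"},
    {"problem": {"power": "on", "noise": "grinding"}, "solution": "replace fan"},
]
answer = cbr_solve(case_base, {"power": "on", "noise": "grinding", "light": "on"})
```

The new problem matches the second case on two of three features, so its solution is reused. The solved problem is then appended, so the case library grows with use.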
Q. 4. Explain various applications of case based expert system.
Ans. Case-based reasoning first appeared in commercial tools in the early 1990s and since then has
been used to create numerous applications in a wide range of domains:
Diagnosis: Case-based diagnosis systems try to retrieve past cases whose symptom lists are
similar in nature to that of the new case and suggest diagnoses based on the best matching
retrieved cases. The majority of installed systems are of this type and there are many medical
CBR diagnostic systems.
Help Desk: Case-based diagnostic systems are used in the customer service area dealing with
handling problems with a product or service.
Assessment: Case-based systems are used to determine values for variables by comparing them
to the known values of something similar. Assessment tasks are quite common in the finance
and marketing domains.
Decision support: In decision making, when faced with a complex problem, people often
look for analogous problems for possible solutions. CBR systems have been developed to
support this problem-retrieval process (often at the level of document retrieval) to find
relevant similar problems. CBR is particularly good at querying structured, modular and
non-homogeneous documents.
Q. 5. What is genetic algorithm ?
Ans. A genetic algorithm is a search technique used in computing to find exact or approximate
solutions to optimization and search problems. Genetic algorithms are categorized as global
search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that use
techniques inspired by evolutionary biology such as inheritance, mutation, selection, and
crossover. Genetic algorithms are implemented as a computer simulation in which a population
of abstract representations of candidate solutions to an optimization problem evolves toward
better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but
other encodings are also possible. The evolution usually starts from a population of randomly
generated individuals and happens in generations. In each generation, the fitness of every
individual in the population is evaluated, multiple individuals are stochastically selected from
the current population (based on their fitness), and modified (recombined and possibly randomly
mutated) to form a new population. The new population is then used in the next iteration of the
algorithm. Commonly, the algorithm terminates when either a maximum number of generations
has been produced, or a satisfactory fitness level has been reached for the population. If the
algorithm has terminated due to a maximum number of generations, a satisfactory solution may
or may not have been reached.
Q. 6. Explain neural network in detail.
Ans. An artificial neural network, commonly referred to simply as a neural network, is an artificial
representation of the human brain that tries to simulate its learning process. Traditionally, the
term neural network referred to a network of biological neurons in the nervous system that
process and transmit information. Artificial neural networks are constructed to make use of some
organizational principles similar to those of the human brain. They represent a promising new
generation of information processing systems. A neural network is an interconnected group of
artificial neurons that uses a mathematical or computational model for information processing
based on a connectionist approach to computation.

Fig. 4.12
In this model, a single neuron receives a set of inputs (x1, x2, x3, ..., xn).
This set of inputs is multiplied by a set of weights (w1, w2, ..., wn). Here, weights are referred to
as strengths of the synapses. These weighted values are then summed and the output is passed
through an activation (transfer) function. The activation function is also referred to as a
squashing function in that it squashes (limits) the permissible range of the output signal to
some finite value.
Q. 7. Explain fuzzy set and fuzzy system in detail.
Ans. A “fuzzy set” is a simple extension of the definition of a classical set in which the characteristic
function is permitted to take any value between 0 and 1. A “fuzzy set” A in X can be defined
as a set of ordered pairs:

$$A = \{(x, \mu_A(x)) : x \in X\}$$

where µA(x) is called the membership function for the fuzzy set A. It maps each x to a membership
grade between 0 and 1. Fuzzy systems are made of a knowledge base and a reasoning mechanism
called a fuzzy inference engine. The structure of the fuzzy inference engine is shown in Fig. 4.12.
A fuzzy inference engine combines fuzzy if-then rules into a mapping from the inputs of the
system to its outputs, using fuzzy methods. A fuzzy system is thus a nonlinear mapping
realized by fuzzy if-then rules. The rule base can be constructed either from human expertise
or by automatic generation, that is, by extracting rules from numerical input-output data.

Fig. 4.12. Fuzzy Inference Engine


A fuzzy inference system consists of four functional blocks, as shown in Fig. 4.12:
1. Fuzzification: conversion of the crisp inputs into degrees of match with linguistic values.
2. Knowledge base: consists of a rule base and a database. A rule base contains a number of
fuzzy rules. A database defines the membership functions of the fuzzy sets used in the fuzzy
rules.
3. Fuzzy inference engine: used to perform operations on the rules.
4. Defuzzification: conversion of the fuzzy results of the inference into a crisp output.

EXERCISE
1. What is Bayes’ Theorem? Which kinds of problems can be solved using this theorem?
2. What are the roles of the E-step and M-step in the Expectation Maximization Algorithm?
3. What are the main parts of an Expert System?
4. Explain fuzzy inference engine.
5. Explain the following :
(a) Triangular membership function
(b) Trapezoidal membership function
(c) Gaussian membership function
6. What do you understand by Linguistic variables? When can we use Linguistic variables?
