
Unit-4

Well-Posed Learning Problem


• A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.
A Well-Defined Learning Problem
Three features:
• the class of tasks,
• the measure of performance to be improved, and
• the source of experience.

Scientific Application of Machine Learning
 Learning to classify new astronomical structures
 Very large databases are used to learn general regularities implicit in the data
 Classify celestial objects from image data
 Decision tree algorithms are now used by NASA to classify all objects in the Sky Survey, which consists of 3 terabytes of image data


A Robot Driving Learning Problem


 Task T: driving on public four-lane highways using vision sensors
 Performance measure P: average
distance traveled before an error (as
judged by human overseer)
 Training experience E: a sequence of
images and steering commands recorded
while observing a human driver


A Handwriting Recognition Learning Problem
 Task T: recognizing and classifying
handwritten words within images
 Performance measure P: percent of
words correctly classified
 Training experience E: a database of
handwritten words with given
classifications



Text Categorization Problem


 Task T: assign a document to its content
category
 Performance measure P: Precision and
Recall
 Training experience E: Example pre-
classified documents

1. To better filter emails as spam or not

Task – Classifying emails as spam or not
Performance Measure – The fraction of emails accurately classified as spam or not spam
Experience – Observing you label emails as spam or not spam
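
As an illustrative sketch (not from the original slides), this T/P/E framing might be instantiated as a simple bag-of-words classifier; the tiny dataset and the use of scikit-learn are assumptions for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Experience E: emails you have labeled as spam (1) or not spam (0).
    emails = ["win a free prize now", "meeting at 10am tomorrow",
              "free offer click here", "project report attached"]
    labels = [1, 0, 1, 0]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(emails)   # bag-of-words features

    model = MultinomialNB()                # Task T: classify spam vs. not spam
    model.fit(X, labels)

    # Performance measure P: fraction of emails classified correctly.
    print("accuracy:", model.score(X, labels))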
2. A checkers learning problem

Task – Playing checkers game


Performance Measure – percent of games won against opponents
Experience – playing practice games against itself
3. A fruit prediction problem

Task – recognizing different types of fruits
Performance Measure – the accuracy with which different varieties of fruit are recognized
Experience – training the machine with a large dataset of fruit images
4. Automatic translation of documents

Task – translating a document from one language to another
Performance Measure – how accurately and efficiently the document is translated into the target language
Experience – training the machine with a large dataset of documents in different languages
Learning to recognize spoken words

All of the most successful speech recognition systems employ machine learning in some form. For example, the SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing the primitive sounds (phonemes) and words from the observed speech signal.
Neural network learning methods (e.g., Waibel et al. 1989) and methods for learning hidden Markov models (e.g., Lee 1989) are effective for automatically customizing to individual speakers, vocabularies, microphone characteristics, background noise, etc. Similar techniques have potential applications in many signal-interpretation problems.
Learning to drive an autonomous vehicle.

Machine learning methods have been used to train computer-controlled vehicles to steer correctly when driving on a variety of road types. For example, the ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public highways among other cars. Similar techniques have possible applications in many sensor-based control problems.
Learning to classify new astronomical
structures.
• Machine learning methods have been applied to a
variety of large databases to learn general
regularities implicit in the data. For example,
decision tree learning algorithms have been used
by NASA to learn how to classify celestial objects
from the second Palomar Observatory Sky Survey
(Fayyad et al. 1995).
• This system is now used to automatically classify
all objects in the Sky Survey, which consists of
three terabytes of image data.
Learning to play world-class backgammon.

• The most successful computer programs for playing games such as backgammon are based on machine learning algorithms. For example, the world's top computer program for backgammon, TD-GAMMON (Tesauro 1992, 1995), learned its strategy by playing over one million practice games against itself. It now plays at a level competitive with the human world champion. Similar techniques have applications in many practical problems where very large search spaces must be examined efficiently.
DESIGNING A LEARNING SYSTEM
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target
Function
4. Choosing a Function Approximation
Algorithm
5. The Final Design
DESIGNING A LEARNING SYSTEM for
Checkers Game
Choosing the Training Experience:
The type of training experience available can have a significant impact on the success or failure of the learner:

• The first design choice we face is to choose the type of training experience from which our system will learn.
• A second important attribute of the training
experience is the degree to which the learner
controls the sequence of training examples.
• A third important attribute of the training
experience is how well it represents the
distribution of examples over which the final
system performance P must be measured.
Type of training experience
• Direct information (or training examples)
consists of individual checkerboard states and
their correct moves.
• Indirect information consists of move sequences and the final outcome (win or lose).
• When using indirect information we are faced
with the credit assignment problem:
determining how much credit each move
should receive for the final outcome.
The degree to which the learner controls the training examples
• Teacher or Not:
Supervised — The training experience is labeled: every board state is labeled with the correct move, so learning takes place in the presence of a supervisor or teacher.
Unsupervised — The training experience is unlabeled: the board states come without moves, so the learner generates random games and plays against itself with no supervision or teacher involvement.
Semi-supervised — The learner generates game states and asks the teacher for help in finding the correct move when a board state is confusing.
The degree to which the learner controls the training examples
• For example, the learner might rely on the
teacher to select informative board states and to
provide the correct move for each. Alternatively,
the learner might itself propose board states that
it finds particularly confusing and ask the teacher
for the correct move. Or the learner may have
complete control over both the board states and
(indirect) training classifications, as it does when
it learns by playing against itself with no teacher
present.
2. Choosing the Target Function
• The next design choice is to determine exactly
what type of knowledge will be learned and
how this will be used by the performance
program.
• The next important step is choosing the target
function.
2. Choosing the Target Function
• When you are playing checkers, at any moment you must decide on the best move from among the different possibilities, applying the learning you have gained from experience. Here the learning is: for a specific board state, you move a checker such that the board state tends toward a winning situation. Now the same learning has to be defined in terms of the target function.
Two considerations for the target function — direct and indirect experience
• With direct experience, the checkers learning system needs only to learn how to choose the best move from a large search space. We need to find a target function that will help us choose the best move among alternatives.
• Let us call this function ChooseMove and use the notation ChooseMove : B → M to indicate that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
Two considerations for the target function — direct and indirect experience
• With indirect experience, it becomes difficult to learn such a function. Instead, we can assign a real-valued score to each board state. The function V : B → R accepts as input any board from the set of legal board states B and produces as output a real score, assigning higher scores to better board states.
Assigning a real score to the board state: V : B → R
• Let us therefore define the target value V(b) for an arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
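
A minimal sketch of these four cases in code; the helpers is_final, is_win, is_loss, and best_final_state are assumptions for illustration, not part of the original definition:

    def V(b):
        """Target value V(b) for a board state b, following cases 1-4 above."""
        if is_final(b):
            if is_win(b):
                return 100      # case 1: final board state that is won
            if is_loss(b):
                return -100     # case 2: final board state that is lost
            return 0            # case 3: final board state that is drawn
        # Case 4: V(b) = V(b'), where b' is the best final state reachable
        # from b under optimal play. Finding b' requires searching the game
        # to its end, so this definition is correct but not operational.
        return V(best_final_state(b))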
Step 3 - Choosing a Representation for the Target Function:
• Now it is time to choose a representation that the learning program will use to describe the function V̂ that it will learn. The representation of V̂ can be, for example:
• a table specifying values for each possible board state?
• a collection of rules?
• a neural network?
• a polynomial function of board features?
• …
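
Following Mitchell (1997), a common concrete choice is a linear function of six board features:

V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

where x1 and x2 are the numbers of black and red pieces on the board, x3 and x4 the numbers of black and red kings, and x5 and x6 the numbers of black and red pieces threatened by the opponent. A minimal sketch, assuming a features(b) helper that returns (x1, …, x6):

    def v_hat(board, w):
        """Linear evaluation: V_hat(b) = w0 + w1*x1 + ... + w6*x6."""
        x = features(board)   # assumed helper returning (x1, ..., x6)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))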
Step 3 - Choosing a Representation for the Target Function:
Figure: Partial design of a checkers learning program
Step 4 - Choosing a Function Approximation Algorithm:
• To learn the target function V : B → R (board value), we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
• The training algorithm learns/approximates the weights w0, w1, …, w6 with the help of these training examples, by estimating and adjusting these weights.
Step 4 - Choosing a Function Approximation Algorithm:
• An optimal move cannot be chosen from the raw training data alone.
• The learner must work through a set of training examples, use them to approximate which moves should be chosen, and then receive feedback on those approximations so that the weights can be adjusted.
Step 4 - Choosing a Function Approximation Algorithm:
• In order to learn the target function V we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
• In other words, each training example is an ordered pair of the form (b, Vtrain(b)). For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100.
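
As given in Mitchell (1997), using the six board features from Step 3 above, this example is the ordered pair:

⟨⟨x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0⟩, +100⟩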
Step 4 - Choosing a Function Approximation Algorithm:
• 4.1 ESTIMATING TRAINING VALUES
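
In Mitchell's formulation, the training value for an intermediate board state b is estimated from the learner's current evaluation of the position that follows it:

Vtrain(b) ← V̂(Successor(b))

where Successor(b) is the next board state at which it is again the program's turn to move. A minimal sketch, with successor assumed as a helper and v_hat as defined in Step 3:

    def estimate_training_value(b, w):
        """Vtrain(b) <- V_hat(Successor(b)): score b by the current
        learned evaluation of its successor state."""
        return v_hat(successor(b), w)   # successor(b): assumed helper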
Step 4 - Choosing a Function Approximation Algorithm:
• 4.2 ADJUSTING THE WEIGHTS
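
Mitchell's choice here is the LMS (least mean squares) weight update rule: for each training example (b, Vtrain(b)), every weight is adjusted in proportion to the prediction error and the corresponding feature value,

wi ← wi + η · (Vtrain(b) − V̂(b)) · xi

where η is a small learning rate (e.g., 0.1). A minimal sketch, reusing v_hat and features from Step 3:

    def lms_update(w, board, v_train, eta=0.1):
        """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V_hat(b)) * x_i."""
        error = v_train - v_hat(board, w)
        x = (1.0,) + tuple(features(board))   # x0 = 1 multiplies the bias w0
        return [wi + eta * error * xi for wi, xi in zip(w, x)]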

5. The Final Design

• The final design of our checkers learning system can be naturally described by four distinct program modules that represent the central components in many learning systems. These four modules, summarized in Figure 1.1, are as follows:
Performance System
• The Performance System is the module that must solve
the given performance task, in this case playing
checkers, by using the learned target function(s).
• It takes an instance of a new problem (new game) as
input and produces a trace of its solution (game history)
as output.
• The strategy used by the Performance System to select its next move at each step is determined by the learned V̂ evaluation function. Therefore, we expect its performance to improve as this evaluation function becomes increasingly accurate.
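
A minimal sketch of this move selection, assuming legal_moves and apply_move helpers: the Performance System simply picks the legal move whose resulting board the learned V̂ rates highest.

    def select_move(board, w):
        """Pick the move whose resulting board state V_hat rates best."""
        return max(legal_moves(board),
                   key=lambda m: v_hat(apply_move(board, m), w))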
Critic
• The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function. As shown in the diagram, each training example in this case corresponds to some game state in the trace, along with an estimate Vtrain of the target function value for this example.
Generalizer
• The Generalizer takes as input the training
examples and produces an output hypothesis
that is its estimate of the target function. It
generalizes from the specific training
examples, hypothesizing a general function
that covers these examples and other cases
beyond the training examples.
Experiment Generator
• The Experiment Generator takes as input the
current hypothesis (currently learned
function) and outputs a new problem (i.e.,
initial board state) for the Performance System
to explore. Its role is to pick new practice
problems that will maximize the learning rate
of the overall system.
