Multilayer neural
networks
Agenda
Some historical notes
Multilayer neural networks
Backpropagation
Error surfaces and feature mapping
Speed ups
Conclusions
Some historical notes
Rosenblatt’s perceptron (1958): a single neuron
with adjustable synaptic weights and bias.
Perceptron convergence theorem (Rosenblatt,
1962): if the patterns used to train the
perceptron are drawn from two linearly
separable classes, then the algorithm
converges and positions the decision surface in
the form of a hyperplane between the two classes.
The limitations of networks implementing linear
discriminants were well known in the 1950s and
1960s.
Some historical notes
A single neuron -> the class of solutions that
can be obtained is not general enough ->
multilayer perceptron
Widrow & Hoff (1960): the least-mean-square
algorithm (LMS) = delta rule: three-layer
networks designed by hand (fixed input-to-
hidden weights, trained hidden-to-output
weights).
The development of backpropagation: Kalman
filtering (Kalman 1960; Bryson, Denham, Dreyfus
1963, 1969).
Some historical notes
Werbos: Beyond Regression: New
Tools for Prediction and Analysis in
the Behavioral Sciences, Ph.D. thesis,
Harvard University, 1974
Parker 1982, 1985
Rumelhart, Hinton, Williams: Learning
representations by back-propagating
errors, Nature 323, pp. 533-536, 1986
Multilayer Neural Networks
Input layer
Hidden layer
Output layer
Bias unit = neuron
net_j = w_jᵀx
y_j = f(net_j)
f(net) = sgn(net)
z_k = f(net_k)
Multilayer Neural Networks
The XOR problem:
(x1 OR x2) AND NOT (x1 AND x2)
y1: decision boundary x1 + x2 + 0.5 = 0;
if x1 + x2 + 0.5 > 0 -> y1 = 1, otherwise −1
y2: decision boundary x1 + x2 − 1.5 = 0;
if x1 + x2 − 1.5 > 0 -> y2 = 1, otherwise −1
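These two hidden units can be wired, together with one output unit, into a network that computes XOR on bipolar (±1) inputs. A minimal sketch: the hidden-unit thresholds come from the slide, while the output weights (0.7, −0.4, bias −1.0) are one illustrative choice that realizes "y1 AND NOT y2".

```python
def sgn(v):
    return 1 if v > 0 else -1

def xor_net(x1, x2):
    y1 = sgn(x1 + x2 + 0.5)   # hidden unit 1: acts as OR on +/-1 inputs
    y2 = sgn(x1 + x2 - 1.5)   # hidden unit 2: acts as AND
    # output weights are an illustrative choice realizing y1 AND NOT y2
    return sgn(0.7 * y1 - 0.4 * y2 - 1.0)

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, '->', xor_net(a, b))
```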
Multilayer Neural Networks
Any desired
continuous function
can be implemented by
a three-layer network,
given a sufficient number
of hidden units, proper
nonlinearities, and
weights (Kolmogorov)
Feedforward
Learning
(Demo chapter 11)
Multilayer Neural Networks
Nonlinear multilayer
networks
How do we set the
weights of a three-
layer neural network
based on the training
patterns and the
desired output?
Backpropagation Algorithm
The LMS
algorithm exists
for linear
systems
Training error
Learning rule
Backpropagation Algorithm
Learning rule
Hidden-to-output
Input-to-hidden
Note that the weights wij are
initialized with
random values
Demo Chapter 11
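The two learning rules can be sketched as one stochastic backpropagation step for a d-nH-c network with tanh units and squared error J = ½‖t − z‖². The architecture size, learning rate, and epoch count below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(x, t, W1, W2, eta=0.1):
    """One stochastic backprop step; W1, W2 are updated in place."""
    y = np.tanh(W1 @ x)                            # hidden: y_j = f(net_j)
    z = np.tanh(W2 @ y)                            # output: z_k = f(net_k)
    delta_out = (t - z) * (1.0 - z**2)             # output-unit sensitivity
    delta_hid = (1.0 - y**2) * (W2.T @ delta_out)  # error backpropagated to hidden units
    W2 += eta * np.outer(delta_out, y)             # hidden-to-output rule
    W1 += eta * np.outer(delta_hid, x)             # input-to-hidden rule

# XOR with bipolar targets; the third input component is the bias unit
X = np.array([[-1, -1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, 1]], float)
T = np.array([[-1.0], [1.0], [1.0], [-1.0]])

W1 = rng.uniform(-0.5, 0.5, (3, 3))   # weights initialized with random values
W2 = rng.uniform(-0.5, 0.5, (1, 3))

def total_error(W1, W2):
    return sum(0.5 * np.sum((t - np.tanh(W2 @ np.tanh(W1 @ x)))**2)
               for x, t in zip(X, T))

err_before = total_error(W1, W2)
for epoch in range(2000):
    for x, t in zip(X, T):
        train_step(x, t, W1, W2)
err_after = total_error(W1, W2)
```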
Backpropagation Algorithm
Compare with the LMS
algorithm.
1) Method of Steepest
Descent
The direction of
steepest descent is in the
direction opposite to
the gradient vector g = ∇E(w):
w(n+1) = w(n) − ηg(n)
where η is the step size or
learning-rate parameter
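A minimal sketch of the update w(n+1) = w(n) − ηg(n) on a simple quadratic bowl; the step size is an illustrative choice (for this E it must satisfy η < 2/λmax = 0.2 to converge).

```python
import numpy as np

def grad(w):
    # g = gradient of E(w) = 0.5*(w1**2 + 10*w2**2)
    return np.array([w[0], 10.0 * w[1]])

w = np.array([4.0, 1.0])
eta = 0.1                      # learning-rate parameter (illustrative)
for _ in range(100):
    w = w - eta * grad(w)      # w(n+1) = w(n) - eta * g(n)
# w approaches the minimum at the origin
```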
Backpropagation Algorithm
Training set = a set of patterns with known
labels
Stochastic training = patterns are chosen
randomly from the training set
Batch training = all patterns are presented to
the network before learning takes place
On-line protocol = each pattern is presented
once and only once, no memory in use
Epoch = a single presentation of all patterns in
the training set. The number of epochs is an
indication of the relative amount of learning.
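The stochastic and batch protocols above differ only in when the weight update takes place; a sketch of one epoch under each, where `grad_on(pattern, w)` is a hypothetical per-pattern gradient function.

```python
import random

def stochastic_epoch(patterns, w, grad_on, eta):
    # stochastic: a pattern is chosen randomly, weights updated after each one
    for _ in range(len(patterns)):
        p = random.choice(patterns)
        g = grad_on(p, w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

def batch_epoch(patterns, w, grad_on, eta):
    # batch: all patterns are presented before learning takes place
    total = [0.0] * len(w)
    for p in patterns:
        total = [ti + gi for ti, gi in zip(total, grad_on(p, w))]
    return [wi - eta * ti for wi, ti in zip(w, total)]
```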
Backpropagation Algorithm
Stopping criterion
Backpropagation Algorithm
Learning set
Validation set
Test set
Stopping criterion
Learning curve, the
average error per
pattern
Cross-validation
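The validation-set stopping criterion can be sketched as follows: train while monitoring the average error per pattern on the validation set, and stop when it no longer improves. `train_epoch` and `val_error` are hypothetical callbacks, and the patience threshold is an illustrative choice.

```python
def train_with_early_stopping(train_epoch, val_error, max_epochs=100, patience=5):
    best, best_epoch, since_best = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        err = val_error()        # one point on the validation learning curve
        if err < best:
            best, best_epoch, since_best = err, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:   # validation error stopped improving
                break
    return best_epoch, best
```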
Error Surfaces and Feature
Mapping
Note that error backpropagation is based
on gradient descent in a criterion
function J(w) that is represented
as an error surface over the weights.
Error Surfaces and Feature
Mapping
The total training
error is minimized.
It usually
decreases
monotonically,
even though this is
not the case for the
error on each
individual pattern.
Error Surfaces and Feature
Mapping
Hidden-to-output
weights ~ a linear
discriminant
Input-to-hidden
weights ~ ”matched
filter”
Practical Techniques for
Improving Backpropagation
How to improve
convergence,
performance, and
results?
Neuron:
Sigmoid function =
centered at zero and
antisymmetric
Practical Techniques for
Improving Backpropagation
Scaling input variables
= the input patterns should be shifted so
that the average over the training set of
each feature is zero.
= the full data set should be scaled to have
the same variance in each feature
component
Note that this standardization can only be done
for stochastic and batch learning!
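A sketch of this standardization, shifting each feature to zero mean over the training set and scaling to unit variance; the same μ and σ are kept so test data can be transformed identically.

```python
import numpy as np

def standardize(X):
    mu = X.mean(axis=0)          # per-feature average over the training set
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0      # guard against constant features
    return (X - mu) / sigma, mu, sigma

X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
Xs, mu, sigma = standardize(X)   # Xs: zero mean, unit variance per feature
```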
Practical Techniques for
Improving Backpropagation
When the training set is small one can generate
surrogate training patterns.
In the absence of problem-specific information,
the surrogate patterns should be made by
adding d-dimensional Gaussian noise to true
training points, the category label should be left
unchanged.
If we know about the source of variation among
patterns we can manufacture training data.
The number of hidden units should be less than
the total number of training points n, say roughly
n/10.
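Generating surrogate patterns as described, by adding d-dimensional Gaussian noise to true training points while leaving the category labels unchanged; the noise scale and number of copies are illustrative assumptions.

```python
import numpy as np

def augment(X, labels, copies=5, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Xa, ya = [X], [labels]
    for _ in range(copies):
        Xa.append(X + rng.normal(0.0, scale, size=X.shape))  # jittered points
        ya.append(labels)                                    # labels unchanged
    return np.vstack(Xa), np.concatenate(ya)
```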
Practical Techniques for
Improving Backpropagation
We cannot initialize the weights to 0.
Initializing weights = uniform learning ->
choose weights randomly from a
single distribution:
Input-to-hidden weights: −1/√d < w_ji < +1/√d,
where d is the number of input units
Hidden-to-output weights: −1/√n_H < w_kj < +1/√n_H,
where n_H is the number of hidden
units
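A sketch of this initialization, drawing each layer's weights uniformly from the stated ranges (using the common −1/√d to +1/√d convention, where d is the fan-in of the layer).

```python
import numpy as np

def init_weights(d, n_H, c, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-1/np.sqrt(d),   1/np.sqrt(d),   size=(n_H, d))  # input-to-hidden
    W2 = rng.uniform(-1/np.sqrt(n_H), 1/np.sqrt(n_H), size=(c, n_H))  # hidden-to-output
    return W1, W2

W1, W2 = init_weights(d=4, n_H=5, c=2)
```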
Practical Techniques for
Improving Backpropagation
Learning Rates
Demo Chapter 9
The optimal rate
Practical Techniques for
Improving Backpropagation
Momentum: allows
the network to learn
more quickly when
plateaus in the
error surface exist.
Demo chapter 12.
The momentum parameter is typically about 0.9.
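A sketch of a momentum update, blending the current gradient step with the previous weight change (momentum parameter around 0.9, as above; the learning rate is an illustrative assumption).

```python
import numpy as np

def momentum_step(w, delta_prev, grad, eta=0.1, alpha=0.9):
    # Delta w(n) = -eta * g(n) + alpha * Delta w(n-1)
    delta = -eta * grad + alpha * delta_prev
    return w + delta, delta

w, d = np.array([1.0]), np.zeros(1)
w, d = momentum_step(w, d, np.array([1.0]))  # first step: plain gradient step
# on a plateau (grad ~ 0) the alpha*delta_prev term keeps the weights moving
```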
Practical Techniques for
Improving Backpropagation
Weight decay to avoid overfitting = to
start with a network with too many
weights and decay all the weights
during training:
w_new = w_old(1 − ε), where 0 < ε < 1
The weights that are not needed for
reducing the error function become
smaller and smaller -> eliminated
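The decay rule above, w_new = w_old(1 − ε), applied directly; ε here is an illustrative small constant.

```python
import numpy as np

def decay(w, eps=0.01):
    # w_new = w_old * (1 - eps), 0 < eps < 1
    return w * (1.0 - eps)

w = np.array([2.0, -1.0])
for _ in range(100):
    w = decay(w)   # weights not reinforced by the error gradient shrink toward zero
```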
Practical Techniques for
Improving Backpropagation
If we have insufficient
training data for the
desired classification
accuracy, learning
with hints can help:
add output units for
addressing an ancillary
problem, one different
from but related to the
specific classification
problem at hand.
Practical Techniques for
Improving Backpropagation
Stopped Training
Number of hidden layers: typically 2-3.
Criterion Function: cross entropy,
Minkowski Error
Practical Techniques for
Improving Backpropagation
Second-order methods
to speed up
learning:
Newton’s Method,
demo chapter 9
Quickprop
Conjugate Gradient
Descent requires batch
training, demo chapter
9
Practical Techniques for
Improving Backpropagation
2) Newton’s method
The idea is to minimize the quadratic
approximation of the cost function E(w) around
the current point w(n),
using a second-order Taylor series expansion
of the cost function around the point w(n):
ΔE(w(n)) ≈ gᵀ(n)Δw(n) + ½ Δwᵀ(n)H(n)Δw(n)
g(n) is the m-by-1 gradient vector of the cost
function E(w) evaluated at the point w(n). The
matrix H(n) is the m-by-m Hessian matrix of
E(w) (second derivatives), H = ∇²E(w)
Practical Techniques for
Improving Backpropagation
H = ∇²E(w) requires the cost function E(w) to
be twice continuously differentiable with
respect to the elements of w.
Differentiating ΔE(w(n)) ≈ gᵀ(n)Δw(n) +
½ Δwᵀ(n)H(n)Δw(n) with respect to Δw, the
change ΔE(n) is minimized when
g(n) + H(n)Δw(n) = 0 -> Δw(n) = −H⁻¹(n)g(n)
w(n+1) = w(n) + Δw(n)
w(n+1) = w(n) − H⁻¹(n)g(n)
where H⁻¹(n) is the inverse of the Hessian of
E(w).
Practical Techniques for
Improving Backpropagation
Newton’s method converges quickly
asymptotically and does not exhibit the
zigzagging behavior of steepest descent.
Newton’s method requires the
Hessian H(n) to be a positive
definite matrix for all n!
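On a quadratic cost the Newton update reaches the minimum in a single step, which is the sense in which it converges quickly; a sketch with an illustrative positive definite Hessian.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # Hessian H, positive definite
b = np.array([1.0, 1.0])
# E(w) = 0.5 * w^T A w - b^T w, so g(n) = A w(n) - b and H(n) = A

w = np.zeros(2)
g = A @ w - b
w = w - np.linalg.solve(A, g)   # w(n+1) = w(n) - H^{-1}(n) g(n)
# for a quadratic cost this single step lands on the minimizer A^{-1} b
```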
Practical Techniques for
Improving Backpropagation
3) Gauss-Newton Method
It is applicable to a cost function that is expressed
as the sum of error squares:
E(w) = ½ Σᵢ₌₁ⁿ e²(i)
Note that all the error terms are
calculated on the basis of a weight vector w that is
fixed over the entire observation interval 1 ≤ i ≤ n.
The error signal e(i) is a function of the adjustable
weight vector w. Given an operating point w(n), we
linearize the dependence of e(i) on w by writing
e′(i, w) = e(i) + [∂e(i)/∂w]ᵀ|w=w(n) (w − w(n)),  i = 1, 2, ..., n
Practical Techniques for
Improving Backpropagation
e′(n, w) = e(n) + J(n)(w − w(n))
where e(n) is the error vector e(n) =
[e(1), e(2), ..., e(n)]ᵀ and J(n) is the n-by-m
Jacobian matrix of e(n). (The Jacobian J(n) is
the transpose of the m-by-n gradient matrix
∇e(n), where ∇e(n) = [∇e(1), ∇e(2), ..., ∇e(n)].)
w(n+1) = arg min_w {½‖e′(n, w)‖²}
½‖e′(n, w)‖² = ½‖e(n)‖² + eᵀ(n)J(n)(w − w(n)) +
½(w − w(n))ᵀJᵀ(n)J(n)(w − w(n))
Differentiating the expression with respect to w and
setting the result equal to zero:
Practical Techniques for
Improving Backpropagation
Jᵀ(n)e(n) + Jᵀ(n)J(n)(w − w(n)) = 0
w(n+1) = w(n) − (Jᵀ(n)J(n))⁻¹Jᵀ(n)e(n)
The Gauss-Newton method requires only the Jacobian
matrix of the error vector e(n).
For the Gauss-Newton iteration to be
computable, the matrix product Jᵀ(n)J(n) must
be nonsingular. Jᵀ(n)J(n) is always nonnegative
definite, but to ensure that it is nonsingular, the
Jacobian J(n) must have row rank n. -> add
the diagonal matrix δI to the matrix Jᵀ(n)J(n),
where the parameter δ is a small positive constant.
Practical Techniques for
Improving Backpropagation
Jᵀ(n)J(n) + δI is positive definite for all n.
-> The Gauss-Newton method is
implemented in the following form:
w(n+1) = w(n) − (Jᵀ(n)J(n) + δI)⁻¹Jᵀ(n)e(n)
This is the solution to the modified cost
function:
E(w) = ½{δ‖w − w(0)‖² + Σᵢ₌₁ⁿ e²(i)}
where w(0) is the initial value of w.
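A sketch of the regularized Gauss-Newton step for a linear least-squares residual e(w) = Xw − t (so the Jacobian is simply X); the data and the value of δ are illustrative.

```python
import numpy as np

def gauss_newton_step(w, X, t, delta=1e-6):
    e = X @ w - t                             # error vector e(n)
    J = X                                     # Jacobian of e(w) = Xw - t
    H = J.T @ J + delta * np.eye(len(w))      # J^T J + delta*I, positive definite
    return w - np.linalg.solve(H, J.T @ e)    # regularized Gauss-Newton update

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([1.0, 2.0, 3.0])
w = gauss_newton_step(np.zeros(2), X, t)      # one step solves a linear problem
```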
Regularization, Complexity
Adjustment and Pruning
If we have too many
weights and train too
long -> overfitting
Wald Statistic: we can
estimate the
importance of a
parameter in a model
The Optimal Brain
Damage, The Optimal
Brain Surgeon:
eliminate the
parameter having the
least importance
Summary
Multilayer nonlinear neural networks trained
by gradient descent methods such as
backpropagation perform a maximum-
likelihood estimation of the weight values in
the model defined by the network topology.
The nonlinearity f(net) at the hidden units allows the
networks to form an arbitrary decision
boundary, so long as there are sufficiently
many hidden units.