
BACK PROPAGATION NETWORK
Back propagation network (BPN)
 A network associated with the back propagation learning
algorithm (BPLA).
 BPLA is one of the most important developments in neural
networks.
 BPLA is applied to multilayer feed-forward networks
consisting of processing elements with continuous,
differentiable activation functions.
 BPN is used to classify input patterns correctly.
 The gradient descent method forms the basis of the weight
update algorithm.
Back propagation network (BPN)
 The error is propagated back to the hidden units.
 Training aims to achieve a balance between the network's ability to
respond correctly to the training inputs and its ability to give
reasonable responses to inputs that are similar but not identical to
the training inputs.
 Training stages in a BPN (sketched in code below):
 Generation of the network output for the input pattern
 Calculation and back propagation of the error
 Updation of the weights.
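
A minimal sketch of one such training step for a single-hidden-layer BPN with binary sigmoid units, assuming the standard gradient-descent update rules (the function name train_step and the default learning rate are illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t, V, W, alpha=0.25):
    """One BPN training step on a single pattern (x: input, t: target).
    V: (n_in+1, n_hidden) input-to-hidden weights, last row = bias.
    W: (n_hidden+1, n_out) hidden-to-output weights, last row = bias.
    alpha: learning rate (assumed default)."""
    # Stage 1: output generation (forward pass)
    xb = np.append(x, 1.0)                  # append 1 so the bias row is used
    z = sigmoid(xb @ V)                     # hidden-unit activations
    zb = np.append(z, 1.0)
    y = sigmoid(zb @ W)                     # network output

    # Stage 2: error calculation and back propagation
    delta_k = (t - y) * y * (1.0 - y)                # output-layer error term
    delta_j = (W[:-1] @ delta_k) * z * (1.0 - z)     # hidden-layer error term

    # Stage 3: weight updation (gradient descent step)
    W += alpha * np.outer(zb, delta_k)
    V += alpha * np.outer(xb, delta_j)
    return y
```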
Architecture
 BPN is a multilayer feed-forward neural network consisting of
 Input layer
 Hidden layer(s) and
 Output layer
 During back propagation of the error, the signals are sent in the
reverse direction.
 Input and output of a BPN may be binary or bipolar.
 The activation function can be any function that is monotonically
increasing and differentiable.
Architecture (Contd..,)
 (Figure: BPN architecture with input, hidden, and output layers – not reproduced.)
Notations
 x – input training vector
 t – target output vector
 α – learning rate
 v0j – bias on the jth hidden unit
 w0k – bias on the kth output unit
 zj – hidden unit j
 zinj – net input to hidden unit zj
 yk – output unit k
Notations (Contd..,)
 δk – error correction weight adjustment for wjk
 δj – error correction weight adjustment for vij
 Commonly used activation functions (see the formulas below):
 Binary sigmoidal function
 Bipolar sigmoidal function
 Properties required of an activation function used in a BPN:
 Continuity
 Differentiability
 Nondecreasing monotonicity
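
The formulas themselves appear on the slides only as images; the standard forms, with steepness parameter λ, are:

  Binary sigmoid:   f(x) = 1 / (1 + e^(−λx)),       f′(x) = λ f(x) [1 − f(x)]
  Bipolar sigmoid:  f(x) = 2 / (1 + e^(−λx)) − 1,   f′(x) = (λ/2) [1 + f(x)] [1 − f(x)]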
Training patterns
 Incremental (pattern-by-pattern) approach to weight updation:
 Weights are changed immediately after each single training pattern
is presented.
 Batch-mode training:
 Weights are changed only after all the training patterns have been
presented.
 Requires additional storage for each connection to accumulate the
intermediate weight changes (see the sketch after this list).
 Which mode is more effective depends on the problem.
 A trained BPN approximates the optimal Bayesian discriminant function.
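
A hedged sketch contrasting the two modes for one epoch; backprop_deltas stands for any routine that returns the weight changes for a single pattern and is illustrative, not taken from the slides:

```python
import numpy as np

def incremental_epoch(patterns, V, W, backprop_deltas, alpha=0.25):
    # Incremental mode: weights change immediately after every pattern.
    for x, t in patterns:
        dV, dW = backprop_deltas(x, t, V, W)
        V += alpha * dV
        W += alpha * dW
    return V, W

def batch_epoch(patterns, V, W, backprop_deltas, alpha=0.25):
    # Batch mode: changes are accumulated and applied once per epoch.
    # The accumulators are the extra per-connection storage the slide mentions.
    acc_V, acc_W = np.zeros_like(V), np.zeros_like(W)
    for x, t in patterns:
        dV, dW = backprop_deltas(x, t, V, W)
        acc_V += dV
        acc_W += dW
    V += alpha * acc_V
    W += alpha * acc_W
    return V, W
```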
BP learning algorithm
 The BP learning algorithm will converge to proper weights, given
enough training, only if the relation between the input and output
training patterns is deterministic and the error surface is
deterministic.
 BPN training is a special case of stochastic approximation.
 The randomness of the algorithm helps it escape local optima.
Factors affecting the BPN
 The training and convergence of a BPN depend on the choice of
various parameters, such as
 Initial weights
 Learning rate
 Updation rule
 Size and nature of the training set
 Architecture (i.e., number of layers and number of neurons per layer)


Factors affecting the BPN
 Initial weights
 Initialized to random values.
 The choice of initial weights determines how fast the network converges.
 They cannot be very large, since the sigmoidal activation functions used
here may saturate and the system may get stuck in a local optimum.
 Method 1: restrict the range in which the initial weights are chosen
(see the formula below), where oi is the number of processing elements
that feed forward to processing element i (its fan-in).
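
The range itself is shown only as an image; a commonly quoted fan-in-based choice (stated here as an assumption, not taken from the slide) is:

  −3 / √oi  ≤  wij  ≤  3 / √oi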
Factors affecting the BPN
 Method 2: Nguyen–Widrow initialization (sketched below)
 This method leads to faster convergence of the network.
 The concept is based on geometric analysis.
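
A minimal sketch of the Nguyen–Widrow scheme as it is commonly described (the slide gives no formulas, so the details below are standard practice rather than taken from the text):

```python
import numpy as np

def nguyen_widrow_init(n_inputs, n_hidden, rng=np.random.default_rng()):
    """Initialize input-to-hidden weights V (n_inputs x n_hidden) and biases v0."""
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)      # scale factor
    V = rng.uniform(-0.5, 0.5, size=(n_inputs, n_hidden))
    V = beta * V / np.linalg.norm(V, axis=0)       # rescale each hidden unit's weight vector to length beta
    v0 = rng.uniform(-beta, beta, size=n_hidden)   # biases drawn from [-beta, beta]
    return V, v0
```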


Factors affecting the BPN
 Learning rate:
 Affects convergence.
 Larger values
 speed up convergence but may result in overshooting
 lead to rapid learning, but the weights may oscillate
 Smaller values have the opposite effect (slower but more stable learning).
 Typical range: 10⁻³ to 10
Factors affecting the BPN
 Momentum
 A very efficient and commonly used way to allow a larger learning
rate without oscillations is to add a momentum factor to the normal
weight updation rule (see the formula below).
 Denoted by the momentum factor (written as μ in the formula below).
 A commonly assigned value is 0.9.
 Can be used with pattern-by-pattern updating or batch-mode updating.
 With pattern-by-pattern updating, the momentum term carries over some
useful information from the previous weight change.
 Helps in faster convergence.
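
The standard momentum form of the update, writing μ for the momentum factor (a notational assumption, since the slide's own symbol is not reproduced):

  Δwjk(t+1) = α δk zj + μ Δwjk(t),   wjk(t+1) = wjk(t) + Δwjk(t+1)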


Factors affecting the BPN
 Weight updation formula (reconstructed below)
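
The formula is shown on the slide only as an image; the standard backpropagation updates it presumably corresponds to are:

  wjk(new) = wjk(old) + α δk zj,   w0k(new) = w0k(old) + α δk
  vij(new) = vij(old) + α δj xi,   v0j(new) = v0j(old) + α δj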
Factors affecting the BPN
 Generalization
 A network is said to generalize when it sensibly interpolates for new
input patterns.
 Over-fitting or over-training:
 The network learns the training set well but does not generalize well
when it has too many trainable parameters for the given amount of
training data.
 Making small changes in the input space of a pattern without changing
the output components can improve the network's ability to generalize
to a test data set.
 Smaller networks are preferred, since a network with a large number
of nodes is capable of memorizing the training set rather than
generalizing from it.
Factors affecting the BPN
 Number of training data T
 Should be sufficient and proper.

 Training data should cover the entire expected input space, and while
training, training – vector pairs should be selected randomly from the
set.

 Let us consider the input space can be linearly separable into L


disjoint regions , and T is the lower bound on the number of training
patterns .

 If proper value of T is selected such that T/L >> 1, then the network
can able to discriminate pattern classes using fine piecewise
hyperplane partitioning.
Factors affecting the BPN
 Number of hidden layer nodes
 If there is more than one hidden layer in a BPN, the calculations
performed for a single layer are repeated for the other layers and
summed up at the end.
 For a network of reasonable size, the number of hidden nodes should be
a relatively small fraction of the size of the input layer.
 Example:
 If the network does not converge to a solution, it may need more hidden
nodes.
 Conversely, if the network converges, the user may try fewer hidden
nodes and then settle on a final size based on overall system performance.
Example
 Input pattern: [0, 1]. Target output: 1. Learning rate: α.
Example (Contd..,)
 Initial weights:
 [v11 v21 v01] = [0.6  -0.1  0.3]

 [v12 v22 v02] = [-0.3  0.4  0.5]

 [w1 w2 w0] = [0.4  0.1  -0.2]

 Activation function used:


Example (Contd..,)
 Calculate the net input:
 For z1 layer:

 For z2 layer:
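
The figures are not reproduced; with the usual net-input definition zinj = v0j + Σi xi vij, the given weights and input yield:

  z_in1 = 0.3 + 0(0.6) + 1(−0.1) = 0.2
  z_in2 = 0.5 + 0(−0.3) + 1(0.4) = 0.9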
Example (Contd..,)
 Applying activation function:

 Calculate the net input to the output layer.
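
Assuming the binary sigmoid f(x) = 1/(1 + e^(−x)) (the activation shown earlier only as an image), these net inputs give:

  z1 = f(0.2) ≈ 0.5498,   z2 = f(0.9) ≈ 0.7109
  y_in = w0 + z1 w1 + z2 w2 = −0.2 + 0.5498(0.4) + 0.7109(0.1) ≈ 0.0910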


Example (Contd..,)
 Applying activation function, we get

 Compute the error using

 Now,
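
Under the same binary-sigmoid assumption, the output and the standard error term δk = (tk − yk) f′(y_ink) work out to:

  y = f(0.0910) ≈ 0.5227
  f′(y_in) = y(1 − y) ≈ 0.2495
  δk = (1 − 0.5227)(0.2495) ≈ 0.1191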
Example (Contd..,)
 Therefore,

 Change in weight between the hidden and output layer.
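
With the standard rule Δwjk = α δk zj (and Δw0k = α δk), the changes in terms of the unstated learning rate α are:

  Δw1 = α(0.1191)(0.5498) ≈ 0.0655 α
  Δw2 = α(0.1191)(0.7109) ≈ 0.0847 α
  Δw0 = α(0.1191) ≈ 0.1191 α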


Example (Contd..,)
 Calculate the error between the input and hidden layer (see the
relations below).
 Here m = 1 and j = 1 to 2.
 Therefore,
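
Using the standard relations δ_inj = Σk δk wjk and δj = δ_inj f′(z_inj), with f′(z_inj) = zj(1 − zj) for the assumed binary sigmoid:

  δ_in1 = (0.1191)(0.4) ≈ 0.0476,   δ_in2 = (0.1191)(0.1) ≈ 0.0119
  f′(z_in1) = 0.5498(1 − 0.5498) ≈ 0.2475,   f′(z_in2) = 0.7109(1 − 0.7109) ≈ 0.2055
  δ1 ≈ (0.0476)(0.2475) ≈ 0.0118,   δ2 ≈ (0.0119)(0.2055) ≈ 0.0024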
Example (Contd..,)
 Now,
Example (Contd..,)
 Now,
Example (Contd..,)
 Calculate the change in weights between the input and hidden
layer (values given below).
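
With Δvij = α δj xi and Δv0j = α δj, and x = [0, 1]:

  Δv11 = Δv12 = 0 (since x1 = 0)
  Δv21 ≈ 0.0118 α,   Δv01 ≈ 0.0118 α
  Δv22 ≈ 0.0024 α,   Δv02 ≈ 0.0024 α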
Example (Contd..,)
 The final weights are calculated as
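
The slide's final values are not reproduced. In terms of the unstated learning rate α they would be w1 = 0.4 + 0.0655α, w2 = 0.1 + 0.0847α, w0 = −0.2 + 0.1191α, v21 = −0.1 + 0.0118α, v01 = 0.3 + 0.0118α, v22 = 0.4 + 0.0024α, v02 = 0.5 + 0.0024α, with v11 and v12 unchanged. A short script reproduces the whole example under the stated assumptions (binary sigmoid; α = 0.25 chosen purely for illustration):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))     # assumed binary sigmoid activation

x, t = np.array([0.0, 1.0]), 1.0
V  = np.array([[0.6, -0.3], [-0.1, 0.4]])  # rows: x1, x2; columns: z1, z2
v0 = np.array([0.3, 0.5])                  # hidden biases v01, v02
W  = np.array([0.4, 0.1])                  # w1, w2
w0 = -0.2                                  # output bias
alpha = 0.25                               # assumed value; not given on the slide

z = f(v0 + x @ V)                          # hidden activations [0.5498, 0.7109]
y = f(w0 + z @ W)                          # network output 0.5227
delta_k = (t - y) * y * (1 - y)            # output error term 0.1191
delta_j = delta_k * W * z * (1 - z)        # hidden error terms [0.0118, 0.0024]

W_new, w0_new = W + alpha * delta_k * z, w0 + alpha * delta_k
V_new, v0_new = V + alpha * np.outer(x, delta_j), v0 + alpha * delta_j
print(W_new, w0_new)
print(V_new, v0_new)
```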
