
UNIT - II

ARTIFICIAL NEURAL NETWORKS


Syllabus:
• Artificial Neural Networks-1: Introduction, neural network representation, appropriate problems for neural network learning, perceptrons, multilayer networks and the back-propagation algorithm.
• Artificial Neural Networks-2: Remarks on the back-propagation algorithm, an illustrative example: face recognition, advanced topics in artificial neural networks.
• Evaluating Hypotheses: Motivation, estimating hypothesis accuracy, basics of sampling theory, a general approach for deriving confidence intervals, difference in error of two hypotheses, comparing learning algorithms.
TOPICS
1. Artificial Neural Networks
1.1. Neurons and Biological Motivation
1.2. Neural Network Representations
1.3. Problems for Neural Network Learning

2. Perceptrons
2.1. Representational Power of Perceptrons
2.2. The Perceptron Training Rule

3. Multilayer Networks and the Backpropagation Algorithm
3.1. A Differentiable Threshold Unit
3.2. The Backpropagation Algorithm
1. Artificial Neural Networks

• Neural network learning methods provide a robust approach to approximating real-valued, discrete-valued and vector-valued target functions.
• ANN learning is well matched to certain types of problems, such as:
  - interpreting complex real-world sensor data,
  - speech recognition,
  - interpreting visual scenes,
  - learning robot control strategies.
• The backpropagation algorithm is one of the most practically successful ANN learning algorithms, applied to problems such as:
  - learning to recognize handwritten characters,
  - learning to recognize faces,
  - learning to recognize spoken words.
• The backpropagation algorithm uses gradient descent to tune network parameters to best fit a training set of input-output pairs.
1.1. Neurons and Biological Motivation

• The study of artificial neural networks (ANNs) has been inspired by the observation that biological learning systems are built of very complex, nonlinear, parallel interconnections of neurons.
• The human body is made up of trillions of cells. Cells of the nervous system, called nerve cells or neurons, are specialized to carry "messages" through an electrochemical process.
• Neurons have specialized cell parts called dendrites and axons. Dendrites bring electrical signals to the cell body, and axons take information away from the cell body.
• In other words, dendrites carry input into the neuron cell, and axons carry output from that cell, which may in turn serve as input to another neuron.
[Figures: the structure of a neuron; communication between two neurons.]
• The human brain is estimated to contain an interconnected network of approximately 10^11 neurons, each connected, on average, to 10^4 others.
• Neuron activity is typically excited or inhibited through connections to other neurons.
• Signals can be transmitted unchanged or they can be altered by synapses. A synapse can increase or decrease the strength of the connection from neuron to neuron and cause excitation or inhibition of a subsequent neuron. This is where information is stored.
• ANNs are only loosely motivated by biological neural systems; there are many complexities of biological neural systems that are not modeled by ANNs.
• The information processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons. One motivation for ANNs is to capture this kind of highly parallel computation based on distributed representations.
• Historically, two groups of researchers have worked with artificial neural networks:
  - those using ANNs to study and model biological learning processes, and
  - those aiming to obtain highly effective machine learning algorithms.
1.2. Neural Network Representations

• A prototypical example of ANN learning is ALVINN (Autonomous Land Vehicle In a Neural Network), a perception system that uses a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways.
• Input: the input to the neural network is a 30 x 32 grid of pixel intensities obtained from a forward-pointed camera mounted on the vehicle.
• Output: the network output is the direction in which the vehicle is steered.
• The ANN is trained to mimic the observed steering commands of a human driving the vehicle for approximately 5 minutes. ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per hour and for distances of 90 miles on public highways.
• The figures below illustrate the neural network representation used in one version of the ALVINN system, which is typical of the representations used by many ANN systems.
[Figures 1-3: the ALVINN system - vehicle photo, network diagram, and learned-weight visualizations.]
• The ALVINN system uses backpropagation to learn to steer an autonomous vehicle (photo at top) driving at speeds up to 70 miles per hour.
• The diagram on the left (figure 1) shows how the image from a forward-mounted camera is mapped to 960 neural network inputs, which are fed forward to 4 hidden units, connected to 30 output units.
• Figure 3 shows the 30 x 32 weights into one hidden unit displayed in the large matrix, with white blocks indicating positive weights and black blocks indicating negative weights.
• As can be seen from the output weights, activation of this particular hidden unit encourages a turn toward the left.
• Each ANN is composed of a collection of perceptrons grouped in layers. A typical structure is shown in the figure below.
• Note the three kinds of layers: input, intermediate (called the hidden layer) and output. Several hidden layers can be placed between the input and output layers.

[Figure: a typical layered network with input, hidden and output layers.]
1.3. Problems for Neural Network Learning

• ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data. It is also applicable to problems for which more symbolic representations are used.
• The backpropagation (BP) algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics:
• Instances are represented by many attribute-value pairs.
• The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
• The training examples may contain errors.
• Long training times are acceptable.
• Fast evaluation of the learned target function may be required.
• The ability of humans to understand the learned target function is not important.
2. Perceptrons

• One type of ANN system is based on a unit called a perceptron, illustrated in the figure below.

[Figure: a perceptron.]

• An ANN consists of perceptrons. Each perceptron receives inputs, processes them, and delivers a single output.
• A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, and then outputs 1 if the result is greater than some threshold and -1 otherwise:

    o(x1, ..., xn) = 1 if w0 + w1 x1 + w2 x2 + ... + wn xn > 0, and -1 otherwise

• Each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output.
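As a concrete illustration, here is a minimal Python sketch of the perceptron's output computation. The function name and the convention of folding the threshold into a bias weight w0 (with an implicit input of 1) are our own choices, not part of the slides:

```python
# Minimal perceptron sketch: weights[0] is the bias w0, inputs are x1..xn.
def perceptron_output(weights, inputs):
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else -1   # threshold at zero: output is 1 or -1
```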
2.1. Representational Power of Perceptrons

• We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances (i.e., points).
• The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for instances lying on the other side.
• Sets of examples that can be separated by a single perceptron are called linearly separable sets of examples.
• A single perceptron can be used to represent many boolean functions.
• For example, if we assume boolean values of 1 (true) and -1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights w0 = -0.8 and w1 = w2 = 0.5.
• This perceptron can be made to represent the OR function instead by altering the threshold to w0 = -0.3.
• Unfortunately, some boolean functions cannot be represented by a single perceptron, such as the XOR function. Such training examples are called linearly nonseparable.
• AND function (linearly separable):

Training examples:
  x1  x2  output
  0   0   -1
  0   1   -1
  1   0   -1
  1   1    1

Test results (with w0 = -0.8, w1 = w2 = 0.5):
  x1  x2  Σwi·xi  output
  0   0   -0.8    -1
  0   1   -0.3    -1
  1   0   -0.3    -1
  1   1    0.2     1

Decision hyperplane: w0 + w1 x1 + w2 x2 = 0, i.e., -0.8 + 0.5 x1 + 0.5 x2 = 0.
• OR function (linearly separable): the two-input perceptron implements the OR function when we set the weights w0 = -0.3, w1 = w2 = 0.5.

Training examples:
  x1  x2  output
  0   0   -1
  0   1    1
  1   0    1
  1   1    1

Test results (with w0 = -0.3, w1 = w2 = 0.5):
  x1  x2  Σwi·xi  output
  0   0   -0.3    -1
  0   1    0.2     1
  1   0    0.2     1
  1   1    0.7     1

Decision hyperplane: w0 + w1 x1 + w2 x2 = 0, i.e., -0.3 + 0.5 x1 + 0.5 x2 = 0.
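To make the two tables above easy to check, here is a small sketch that evaluates both weight settings; it reuses the perceptron_output helper sketched earlier, and the variable names are our own:

```python
# Check the AND and OR weight settings from the tables above.
and_w = [-0.8, 0.5, 0.5]   # w0, w1, w2 for AND
or_w  = [-0.3, 0.5, 0.5]   # w0, w1, w2 for OR
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              perceptron_output(and_w, [x1, x2]),   # AND column
              perceptron_output(or_w, [x1, x2]))    # OR column
```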
• XOR function (not linearly separable): it is impossible to implement the XOR function with a single perceptron.

Training examples:
  x1  x2  output
  0   0   -1
  0   1    1
  1   0    1
  1   1   -1

• A two-layer network of perceptrons can represent the XOR function.


2.2. The Perceptron Training Rule

• Let us begin by understanding how to learn the weights for a single perceptron. These learning rules are important to ANNs because they provide the basis for learning networks of many units.
• Several algorithms are known to solve this learning problem. Here we consider two:
  - the perceptron rule, and
  - the delta rule.
• The Perceptron Rule:
  - One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example.
  - This process is repeated until the perceptron classifies all training examples correctly.
  - Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to:

    wi ← wi + Δwi, where Δwi = η (t - o) xi
In the above equation:
  t is the target output for the current training example,
  o is the output generated by the perceptron, and
  η is a positive constant called the learning rate.

• The role of the learning rate is to moderate the degree to which weights are changed at each step.
• If t = -1 and o = 1, then weights associated with positive xi will be decreased rather than increased.
• If t = 1 and o = -1, then weights associated with positive xi will be increased rather than decreased.
• Example: if xi = 0.8, η = 0.1, t = 1, and o = -1, then the weight update will be Δwi = η(t - o)xi = 0.1(1 - (-1))0.8 = 0.16.
• The above learning procedure can be proven to converge within a finite number of applications of the perceptron training rule to a weight vector that correctly classifies all training examples, provided the training examples are linearly separable and a sufficiently small η is used. If the data are not linearly separable, convergence is not assured.
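The following Python sketch implements the perceptron training rule as described above, reusing the perceptron_output helper from earlier. The loop structure and the run-until-no-mistakes stopping condition are our own choices, and the loop only terminates if the data are linearly separable, as the convergence result requires:

```python
# Perceptron training rule sketch: repeat until every example is classified
# correctly (assumes linearly separable data, per the convergence theorem).
def train_perceptron(examples, n_inputs, eta=0.1):
    weights = [0.0] * (n_inputs + 1)           # weights[0] is the bias w0
    converged = False
    while not converged:
        converged = True
        for inputs, t in examples:             # examples: (inputs, target)
            o = perceptron_output(weights, inputs)
            if o != t:                         # misclassified: update weights
                converged = False
                weights[0] += eta * (t - o)            # bias input is 1
                for i, x in enumerate(inputs, start=1):
                    weights[i] += eta * (t - o) * x    # Δwi = η(t - o)xi
    return weights
```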
• Gradient Descent and the Delta Rule:
  - The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
  - A second training rule, called the delta rule, is designed to overcome this difficulty.
  - The key idea of the delta rule is to use gradient descent to search the space of possible weight vectors to find the weights that best fit the training examples.
  - This rule is important because gradient descent provides the basis for the backpropagation algorithm, and because it can serve as the basis for learning algorithms that must search through hypothesis spaces containing many different types of continuously parameterized hypotheses.
• The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by

    o(x) = w0 + w1 x1 + ... + wn xn

• In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector), relative to the training examples. Although there are many ways to define this error, one common measure is

    E(w) = 1/2 Σd∈D (td - od)²

  where D is the set of training examples, td is the target output for training example d, and od is the output of the linear unit for training example d.
[Figure: the error E of different hypotheses. For a linear unit with two weights, the hypothesis space H is the w0, w1 plane, and the error surface is a parabola with a single global minimum.]
• Gradient descent search determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps.
• At each step, the weight vector is altered in the direction that produces the steepest descent along the error surface.
• This process continues until the global minimum error is reached.
• It will then update the weights as wi ← wi + Δwi.
• The gradient descent algorithm for training linear units is as follows: pick an initial random weight vector; apply the linear unit to all training examples, then compute Δwi for each weight according to the update equation; update each weight wi by adding Δwi; then repeat this process.
• Derivation of the Gradient-Descent Rule:
  - How can we calculate the direction of steepest descent along the error surface?
  - This can be done by computing the derivative of E with respect to each component of the weight vector w.
  - This vector derivative is called the gradient of E with respect to w, written ∇E(w).
• A linear unit's output o is given by

    o = w · x                                          -------- (1)

• In order to derive a weight learning rule for linear units, we specify a measure for the training error of a hypothesis. One common measure that will turn out to be especially convenient is

    E(w) = 1/2 Σd∈D (td - od)²                         -------- (2)

• The vector derivative called the gradient of E with respect to the vector w = <w0, ..., wn> is written

    ∇E(w) = [∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn]              -------- (3)
• Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is

    w ← w + Δw, where Δw = -η ∇E(w)                    -------- (4)

• Here η is a positive constant called the learning rate. The negative sign is present because we want to move the weight vector in the direction that decreases E.
• Equation (4) can also be written in its component form as

    wi ← wi + Δwi, where Δwi = -η ∂E/∂wi               -------- (5)
• This makes it clear that steepest descent is achieved by altering each component wi of the weight vector in proportion to ∂E/∂wi. The vector of ∂E/∂wi derivatives that form the gradient can be obtained by differentiating E from equation (2):

    ∂E/∂wi = ∂/∂wi [ 1/2 Σd (td - od)² ]
           = 1/2 Σd 2(td - od) ∂/∂wi (td - od)
           = Σd (td - od) ∂/∂wi (td - w · xd)
           = Σd (td - od)(-xid)                        -------- (6)

  where xid denotes the single input component xi for training example d.
• Substituting equation (6) into equation (5) yields the weight update rule for gradient descent:

    Δwi = η Σd∈D (td - od) xid                         -------- (7)
[Figure: the Gradient-Descent algorithm pseudocode; summarized on the next slide.]
Gradient-Descent Algorithm

• To summarize, the gradient descent algorithm for training linear units is as follows:
  - Pick an initial random weight vector.
  - Apply the linear unit to all training examples, then compute Δwi for each weight.
  - Update each weight wi by adding Δwi, then repeat this process.
• Because the error surface contains only a single global minimum, this algorithm will converge to a weight vector with minimum error, regardless of whether the training examples are linearly separable, provided a sufficiently small learning rate is used.
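A compact Python sketch of this batch procedure follows. The update implements Δwi = η Σd (td - od) xid from equation (7); the fixed iteration count and zero initialization are our own simplifications:

```python
# Batch gradient descent for a linear unit (delta rule, batch form).
def linear_output(weights, inputs):
    # weights[0] is the bias w0
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))

def train_linear_unit(examples, n_inputs, eta=0.05, iterations=1000):
    weights = [0.0] * (n_inputs + 1)
    for _ in range(iterations):                # fixed count: our simplification
        delta = [0.0] * len(weights)
        for inputs, t in examples:             # accumulate over ALL examples
            err = t - linear_output(weights, inputs)
            delta[0] += eta * err              # bias input is 1
            for i, x in enumerate(inputs, start=1):
                delta[i] += eta * err * x      # Δwi = η Σd (td - od) xid
        weights = [w + d for w, d in zip(weights, delta)]
    return weights
```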
Stochastic Gradient Descent

• There are two key practical difficulties in applying gradient descent:
  - converging to a local minimum can sometimes be quite slow, and
  - if there are multiple local minima in the error surface, there is no guarantee that the procedure will find the global minimum.
• One common variation on gradient descent intended to alleviate these difficulties is called incremental gradient descent, or alternatively stochastic gradient descent.
• The idea behind stochastic gradient descent is to approximate the gradient descent search by updating weights incrementally, following the calculation of the error for each individual example.
• The modified training rule updates each weight after each individual example d:

    Δwi = η (t - o) xi

• It is as if gradient descent is performed with respect to an error function defined for each individual training example d:

    Ed(w) = 1/2 (td - od)²
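For contrast with the batch version above, here is a stochastic (incremental) variant in Python, reusing the linear_output helper; the epoch count is again our own choice:

```python
# Stochastic gradient descent: update weights after every single example.
def train_linear_unit_sgd(examples, n_inputs, eta=0.05, epochs=100):
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for inputs, t in examples:
            err = t - linear_output(weights, inputs)
            weights[0] += eta * err                    # bias input is 1
            for i, x in enumerate(inputs, start=1):
                weights[i] += eta * err * x            # Δwi = η(t - o)xi
    return weights
```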
Remarks

• Perceptron training rule:
  1. Updates weights based on the error in the thresholded perceptron output.
  2. Converges after a finite number of iterations to a hypothesis that perfectly classifies the training data, provided the training examples are linearly separable.
• Delta rule:
  1. Updates weights based on the error in the unthresholded linear combination of inputs.
  2. Converges only asymptotically toward the minimum-error hypothesis, possibly requiring unbounded time, but converges regardless of whether the training data are linearly separable.
3. Multilayer Networks

• The kind of multilayer networks learned by the backpropagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces.
• A multilayer network can represent highly nonlinear decision surfaces that are much more expressive than linear decision surfaces.
• For example, in figure 3.0 the speech recognition task involves distinguishing among 10 possible vowels, all spoken in the context of "h-d" (i.e., "hid", "had", "head", "hood", etc.).
[Figure 3.0: decision regions of a multilayer network trained on vowel data.]

• The network input consists of two parameters, F1 and F2, obtained from a spectral analysis of the sound. The 10 network outputs correspond to the 10 possible vowel sounds. The plot on the right illustrates the highly nonlinear decision surface.
3.1. A Differentiable Threshold Unit

• Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result. In the case of the sigmoid unit, however, the threshold output is a continuous function of its input:

    o = σ(w · x), where σ(x) = 1 / (1 + e^(-x))

• The sigmoid function σ(x) is also called the logistic function.
• An interesting property: the output ranges between 0 and 1, increasing monotonically with its input, and its derivative is easily expressed in terms of its output: dσ(x)/dx = σ(x)(1 - σ(x)).
• We can derive gradient descent rules to train:
  - one sigmoid unit, and
  - multilayer networks of sigmoid units, which gives us the backpropagation algorithm.

[Figure: the sigmoid threshold unit.]
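A one-function Python sketch of the sigmoid unit, used by the backpropagation sketch later in this unit (the helper names are our own):

```python
import math

def sigmoid(x):
    """Logistic function: output in (0, 1); derivative is sigmoid(x)*(1-sigmoid(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_unit_output(weights, inputs):
    # weights[0] is the bias w0; the unit smoothly thresholds its net input.
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(net)
```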
3.2. The Backpropagation Algorithm

• The BP algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.
• Considering networks with multiple output units rather than single units, we redefine E to sum the errors over all of the network output units:

    E(w) = 1/2 Σd∈D Σk∈outputs (tkd - okd)²

  where outputs is the set of output units in the network, and tkd and okd are the target and output values associated with the kth output unit and training example d.

• The learning problem in backpropagation is to search a large hypothesis space defined by all possible weight values for all the units in the network.
• In a multilayer network the error surface can have multiple local minima.
• Gradient descent is guaranteed only to converge toward some local minimum, not necessarily to the global minimum error.
[Figure: the Backpropagation algorithm pseudocode.]

• The algorithm here applies to layered feedforward networks containing two layers of sigmoid units, with units at each layer connected to all units from the preceding layer.
• Notation:
  - An index (integer) is assigned to each node in the network, where a 'node' is either an input to the network or the output of some unit in the network.
  - xji denotes the input from node i to unit j, and wji denotes the corresponding weight.
  - δn denotes the error term associated with unit n. It plays a role similar to the quantity (t - o) in the delta training rule, where δn = -∂E/∂netn.
Working of the algorithm:

• The algorithm starts by constructing a network with the desired number of hidden and output units.
• All network weights are initialized to small random values.
• The main loop of the algorithm then repeatedly iterates over the training examples.
• For each training example, it applies the network to the example, calculates the error of the network output for this example, and then updates all the weights in the network.
• The gradient descent weight-update rule

    wji ← wji + Δwji, where Δwji = η δj xji

  updates each weight in proportion to the learning rate η, the input value xji to which the weight is applied, and the error δj in the output of the unit.

• The exact form of δj follows from the derivation of the weight-tuning rule.
• δk is computed for each network output unit k as:

    δk = ok (1 - ok)(tk - ok)

  The factor ok(1 - ok) is the derivative of the sigmoid squashing function.
• The δh value for each hidden unit h is calculated as:

    δh = oh (1 - oh) Σk∈outputs wkh δk

• Training examples provide target values tk only for network outputs; no target values are directly available to indicate the error of the hidden units' values.

• The error term for hidden unit h is therefore calculated by summing the error terms δk for each output unit influenced by h, weighting each of the δk's by wkh, the weight from hidden unit h to output unit k.
• This weight characterizes the degree to which hidden unit h is 'responsible for' the error in output unit k.
• The algorithm updates weights incrementally, following the presentation of each training example.
• To compute the true gradient of E, one would instead sum the δj xji values over all training examples before altering the weight values.
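Putting the pieces together, here is a minimal, self-contained Python sketch of one incremental weight-update pass of backpropagation for a two-layer network of sigmoid units. The list-of-lists weight layout and function names are our own assumptions, not the slides' notation:

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def _unit_output(weights, inputs):
    # weights[0] is the bias w0
    return _sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)))

# One stochastic backpropagation update for a two-layer sigmoid network.
# hidden_w[h] and output_w[k] are weight lists whose element 0 is the bias.
def backprop_update(hidden_w, output_w, inputs, targets, eta=0.05):
    # 1. Forward pass: compute hidden and output unit activations.
    hidden_o = [_unit_output(w, inputs) for w in hidden_w]
    output_o = [_unit_output(w, hidden_o) for w in output_w]

    # 2. Error term for each output unit k: δk = ok(1 - ok)(tk - ok).
    delta_k = [o * (1 - o) * (t - o) for o, t in zip(output_o, targets)]

    # 3. Error term for each hidden unit h: δh = oh(1 - oh) Σk wkh δk.
    delta_h = [oh * (1 - oh) *
               sum(output_w[k][h + 1] * delta_k[k]      # +1 skips the bias
                   for k in range(len(output_w)))
               for h, oh in enumerate(hidden_o)]

    # 4. Update every weight: wji <- wji + η δj xji.
    for k, w in enumerate(output_w):
        w[0] += eta * delta_k[k]                        # bias input is 1
        for h, x in enumerate(hidden_o, start=1):
            w[h] += eta * delta_k[k] * x
    for h, w in enumerate(hidden_w):
        w[0] += eta * delta_h[h]
        for i, x in enumerate(inputs, start=1):
            w[i] += eta * delta_h[h] * x
```

Calling backprop_update once per training example inside a loop gives the main loop the slides describe; all δ values are computed before any weight is changed.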

• The weight-update loop in backpropagation may be iterated thousands of times in a typical application. The termination condition may be:
  i) after a fixed number of iterations through the loop, or
  ii) once the error on the training examples falls below some threshold, or
  iii) once the error on a separate validation set of examples meets some criterion.
• The choice of termination condition plays an important role: too few iterations can fail to reduce the error sufficiently, and too many can lead to overfitting the training data.
Adding Momentum

• We can alter the weight-update rule Δwji = η δj xji in the algorithm by making the weight update on the nth iteration depend partially on the update that occurred during the (n-1)th iteration, as follows:

    Δwji(n) = η δj xji + α Δwji(n - 1)

  where Δwji(n) is the weight update performed during the nth iteration, and 0 ≤ α < 1 is a constant called the momentum.
• On the right-hand side of the equation, the first term is just the weight-update rule of the backpropagation algorithm; the second term is new and is called the momentum term.
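In code, momentum only requires remembering the previous update for each weight. A hypothetical sketch of the modified update for a single weight (names are illustrative):

```python
# Momentum sketch for one weight: Δw(n) = η·(δj·xji) + α·Δw(n-1).
def momentum_update(w, delta_term, prev_update, eta=0.05, alpha=0.9):
    """delta_term is δj * xji; returns (new weight, this iteration's update)."""
    update = eta * delta_term + alpha * prev_update   # second term: momentum
    return w + update, update
```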
Learning in Arbitrary Acyclic Networks

• Backpropagation easily generalizes to feedforward networks of arbitrary depth.
• The weight update rule Δwji = η δj xji is still used, with a small change in how the δ values are computed.
• The general equation for computing δr for a unit r in layer m is:

    δr = or (1 - or) Σs∈layer(m+1) wsr δs

  That is, the δr value for a unit r in layer m is computed from the δ values at the next deeper layer m + 1.

• We can generalize the algorithm to any directed acyclic graph, regardless of whether the network units are arranged in uniform layers or not.
• The rule for calculating δ for any internal unit is:

    δr = or (1 - or) Σs∈Downstream(r) wsr δs

  where Downstream(r) is the set of units immediately downstream from unit r in the network, i.e., all units whose inputs include the output of unit r.
Derivation of the Backpropagation Rule

• We will derive the weight-tuning rule of the backpropagation algorithm; in particular, the stochastic gradient descent rule it implements.
• Stochastic gradient descent involves iterating through the training examples one at a time, for each training example d descending the gradient of the error Ed with respect to this single example.
• For each training example d, every weight wji is updated by adding to it Δwji:

    Δwji = -η ∂Ed/∂wji

• Here Ed is the error on training example d, summed over all output units in the network:

    Ed(w) = 1/2 Σk∈outputs (tk - ok)²

  where outputs is the set of output units in the network, tk is the target value of unit k for training example d, and ok is the output of unit k for training example d.
Subscripts and variables used in the notation of the stochastic gradient descent rule:

  i = the ith input
  j = the jth unit of the network
  xji = the ith input to unit j
  wji = the weight associated with the ith input to unit j
  netj = Σi wji xji (the weighted sum of inputs for unit j)
  oj = the output computed by unit j
  tj = the target output for unit j
  σ = the sigmoid function
  outputs = the set of units in the final layer of the network
  Downstream(j) = the set of units whose immediate inputs include the output of unit j

• In order to implement the stochastic gradient rule, we derive an expression for ∂Ed/∂wji.
• The weight wji can influence the rest of the network only through netj. Using the chain rule:

    ∂Ed/∂wji = (∂Ed/∂netj)(∂netj/∂wji) = (∂Ed/∂netj) xji

• Now we have to derive a convenient expression for ∂Ed/∂netj.
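The slides stop the derivation here. As a sketch of the next step, following the definitions above, the case where j is an output unit works out as follows (in LaTeX):

```latex
% Case 1: unit j is an output unit, so net_j influences E_d only through o_j.
\frac{\partial E_d}{\partial net_j}
  = \frac{\partial E_d}{\partial o_j}\,\frac{\partial o_j}{\partial net_j}
  = \underbrace{-(t_j - o_j)}_{\partial E_d/\partial o_j}\;
    \underbrace{o_j(1 - o_j)}_{\sigma'(net_j)}
% Hence \delta_j = -\partial E_d/\partial net_j = o_j(1 - o_j)(t_j - o_j),
% which is exactly the output-unit error term used by the algorithm.
```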
ADVANCED TOPICS IN ARTIFICIAL NEURAL NETWORKS

• Alternative Error Functions:
  - Gradient descent can be performed for any function E that is differentiable with respect to the parameterized hypothesis space. While the basic BP algorithm defines E in terms of the sum of squared errors of the network, other definitions have been suggested in order to incorporate other constraints into the weight-tuning rule.
  - Examples of alternative definitions of E include:
    1. Adding a penalty term for weight magnitude. We can add a term to E that increases with the magnitude of the weight vector. This causes the gradient descent search to seek weight vectors with small magnitudes, thereby reducing the risk of overfitting. One way to do this is to redefine E as

    E(w) = 1/2 Σd∈D Σk∈outputs (tkd - okd)² + γ Σi,j wji²

    2. Adding a term for errors in the slope, or derivative, of the target function. In some cases, training information may be available regarding desired derivatives of the target function, as well as desired values.
    3. Minimizing the cross entropy of the network with respect to the target values. Consider learning a probabilistic function, such as predicting whether a loan applicant will pay back a loan based on attributes such as the applicant's age and bank balance. Although the training examples exhibit only boolean target values, the underlying target function might be best modeled by outputting the probability that the given applicant will repay the loan, rather than attempting to output the actual 1 and 0 value for each input instance. Probability estimates are given by the network that minimizes the cross entropy, defined as

    -Σd∈D [ td log od + (1 - td) log(1 - od) ]

    4. Altering the effective error function can also be accomplished by weight sharing, or "tying together" weights associated with different units or inputs. For example, in an application of neural networks to speech recognition, the various units that receive input from different portions of the time window are forced to share weights. Such weight sharing is typically implemented by first updating each of the shared weights separately within each unit that uses the weight, then replacing each instance of the shared weight by the mean of their values, as shown in the sketch below.
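A tiny Python sketch of that mean-tying step, under our own toy representation (a list holding the separately-updated copies of one shared weight):

```python
# Weight sharing sketch: after separate per-unit updates, tie the shared
# weight back together by replacing every copy with the mean of all copies.
def tie_shared_weight(updated_copies):
    mean = sum(updated_copies) / len(updated_copies)
    return [mean] * len(updated_copies)
```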
Alternative Error Minimization Procedures

• While gradient descent is one of the most general search methods for finding a hypothesis to minimize the error function, it is not always the most efficient. It is not uncommon for BP to require tens of thousands of iterations through the weight-update loop when training complex networks. For this reason, a number of alternative weight optimization algorithms have been proposed and explored.
• One optimization method, known as line search, involves a different approach to choosing the distance for the weight update. Once a line is chosen that specifies the direction of the update, the update distance is chosen by finding the minimum of the error function along this line.
• A second method, which builds on the idea of line search, is called the conjugate gradient method. Here, a sequence of line searches is performed to search for a minimum in the error surface. On the first step in this sequence, the direction chosen is the negative of the gradient. On each subsequent step, a new direction is chosen so that the component of the error gradient that has just been made zero remains zero.
Recurrent Networks

• Recurrent networks are artificial neural networks that apply to time series data and that use outputs of network units at time t as the input to other units at time t + 1. In this way, they support a form of directed cycles in the network.
• Consider the time series prediction task of predicting the next day's stock market average y(t + 1) based on the current day's economic indicators x(t). Given a time series of such data, one obvious approach is to train a feedforward network to predict y(t + 1) as its output, based on the input values x(t). Such a network is shown in Figure 4.11(a).
• One limitation of such a network is that the prediction of y(t + 1) depends only on x(t) and cannot capture possible dependencies of y(t + 1) on earlier values of x.

• If we wish the network to consider an arbitrary window of time in the past when predicting y(t + 1), then a different solution is required. The recurrent network shown in Figure 4.11(b) provides one such solution: the input value c(t) to the network at one time step is simply copied from the value of unit b on the previous time step. This implements a recurrence relation.
• Many other network topologies can also be used to represent recurrence relations. For example, we could have inserted several layers of units between the input and unit b, and we could have added several context units in parallel where we added the single units b and c.
• Figure 4.11(c) shows the data flow of the recurrent network "unfolded" in time. Here we have made several copies of the recurrent network, replacing the feedback loop by connections between the various copies. This large unfolded network contains no cycles; therefore, the weights in the unfolded network can be trained directly using BP.
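As a minimal illustration of the recurrence relation described above (a toy representation of our own, not the actual network of Figure 4.11), the context input at each step is simply the previous step's unit output:

```python
# Toy recurrence sketch: c(t) is copied from the value of unit b at t-1.
# Reuses the sigmoid_unit_output helper sketched earlier; b takes two inputs.
def run_recurrent(xs, b_unit_weights, c0=0.0):
    """xs is the input series x(0), x(1), ...; returns b's outputs over time."""
    c, outputs = c0, []
    for x in xs:
        b = sigmoid_unit_output(b_unit_weights, [x, c])  # inputs: x(t), c(t)
        outputs.append(b)
        c = b                                            # c(t+1) = b(t)
    return outputs
```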

Dynamically Modifying Network Structure

• A variety of methods have been proposed to dynamically grow or shrink the number of network units and interconnections in an attempt to improve generalization accuracy and training efficiency.
• One idea is to begin with a network containing no hidden units, then grow the network as needed by adding hidden units until the training error is reduced to some acceptable level. The CASCADE-CORRELATION algorithm is one such algorithm.
• A second idea for dynamically altering network structure is to take the opposite approach. Instead of beginning with the simplest possible network and adding complexity, we begin with a complex network and prune it as we find that certain connections are inessential. One way to decide whether a particular weight is inessential is to see whether its value is close to zero. A second way, which appears to be more successful in practice, is to consider the effect that a small variation in the weight has on the error E. The effect on E of varying w can be taken as a measure of the salience of the connection.