
3- ADALINE (Adaptive Linear Neuron) [Widrow & Hoff, 1960]

- Typically, Adaline uses bipolar (-1, +1) activation for its input
signals and its target outputs (although it is not restricted to such
values).
- The weights from the input units to the Adaline are adjustable.
- In general, the Adaline can be trained using the delta rule (also
known as the least mean square (LMS) or Widrow–Hoff rule).
- In Adaline there is only one output unit.

The structure of Adaline:

net = b + ∑i=1..n xi wi

After training, if the net is being used for pattern classification in which
the desired output is either +1 or -1, a threshold function is applied to the
net input to obtain the activation:

y = f(net) =  1  if net ≥ θ
             -1  if net < θ
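A minimal sketch of this bipolar threshold activation in Python/NumPy (assuming θ = 0, the usual choice; the function name is made up for this illustration):

```python
import numpy as np

def bipolar_threshold(net, theta=0.0):
    """Return +1 where net >= theta and -1 otherwise."""
    return np.where(net >= theta, 1, -1)

print(bipolar_threshold(np.array([0.55, -0.45, 0.0])))  # [ 1 -1  1]
```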

The bias and weights are updated by the delta rule:

b(new) = b(old) + α(t - net)

wi(new) = wi(old) + α(t - net)xi
- For a single neuron, a suitable value of the learning rate α satisfies:
0.1 ≤ n·α ≤ 1, where n is the total number of input units.
- The learning rule minimizes the squared error between the net input
and the target value over the m training patterns:

E = ∑p=1..m (tp − ∑i=0..n xi,p wi)²

E = ∑p=1..m (tp − netp)²
where tp is the associated target for input pattern p. As an example, if
the neural net represents a logic gate with two inputs, then the total squared
error is

E = ∑p=1..m (tp − (x1,p w1 + x2,p w2 + w0))²

For the AND function gate, if w1 = w2 = 1 and b = -1.5 (binary inputs, bipolar targets), then:

P    x1,p   x2,p    tp     netp    Ep
1    1      1        1      0.5    0.25
2    1      0       -1     -0.5    0.25
3    0      1       -1     -0.5    0.25
4    0      0       -1     -1.5    0.25

E = ∑Ep = 1

The separating line is x2 = -x1 + 1.5 (i.e., the weights that minimize
this error are w1 = w2 = 1, b = -1.5).
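These entries can be checked with a short script (a sketch under the same assumptions: binary inputs, bipolar targets, w1 = w2 = 1, b = -1.5):

```python
import numpy as np

patterns = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])   # binary inputs
targets = np.array([1, -1, -1, -1])                      # bipolar AND targets
w, b = np.array([1.0, 1.0]), -1.5

net = patterns @ w + b          # net input for each pattern
errors = (targets - net) ** 2   # squared error per pattern
print(net)                      # [ 0.5 -0.5 -0.5 -1.5]
print(errors, errors.sum())     # [0.25 0.25 0.25 0.25]  E = 1.0
```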
4- Delta learning rule:
4.1- Delta rule for a single output unit:
The delta rule changes the weights of the neural connections so as to
minimize the difference between the net input "net" and the target value
"t":
∆wi = α (t – net) xi …………(1)
net = ∑i=1..n xi wi …………(2)

The squared error for a particular training pattern is:

E = (t – net)² …………(3)
- E is a function of all of the weights wi, i = 1, 2, …, n.
- The gradient of E is the vector consisting of the partial derivatives
of E with respect to each of the weights:

∇E = (∂E/∂w1, ∂E/∂w2, …, ∂E/∂wn) …………(4)

- The gradient gives the direction of most rapid increase in E; the
opposite direction gives the most rapid decrease in the error, i.e.,
the error can be reduced by adjusting the weight wi in the direction
of -∂E/∂wi.
Since net = ∑i=1..n xi wi and E = (t – net)²,

thus,

∂E/∂wi = -2(t – net) ∂net/∂wi …………(5)
        = -2(t – net) xi …………(6)

Thus the local error will be reduced most rapidly by adjusting the weights
according to the delta rule:

∆wi = α (t – net) xi
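Equations (5)–(6) can be verified numerically by comparing the analytic gradient -2(t – net)xi against a finite-difference estimate of ∂E/∂wi. A small illustrative sketch (names invented for this example):

```python
import numpy as np

def squared_error(w, x, t):
    """E = (t - net)^2 with net = sum_i x_i * w_i."""
    return (t - x @ w) ** 2

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
t = 1.0

analytic = -2 * (t - x @ w) * x     # dE/dwi = -2 (t - net) xi

eps = 1e-6
numeric = np.array([
    (squared_error(w + eps * np.eye(3)[i], x, t)
     - squared_error(w - eps * np.eye(3)[i], x, t)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric))   # True: analytic and numeric gradients agree
```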
4.2- Delta rule for several output units:
The delta rule can be extended to more than a single output unit; then
for the output unit yJ we have:
∆wIJ = α (tJ – netJ) xI
The squared error for a particular training pattern is
E = ∑J=1..m (tJ − netJ)²
Again,
∂E/∂wIJ = ∂/∂wIJ ∑j=1..m (tj − netj)²
        = ∂/∂wIJ (tJ − netJ)²
The weight wIJ influences the error only through output unit yJ, and:

netJ = ∑i=1..n xi wiJ

E net J
E   2(t J  net J )
wiJ wi J
= -2 (tJ – netJ) xI
Adjusting the weights according to the delta rule for a given learning rate:
∆wIJ = α (tJ – netJ) xI
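For several output units the update can be carried out in one vectorized step over a weight matrix. A minimal NumPy sketch (the function and variable names are chosen here for illustration; netJ is taken without a bias term, as in the formula above):

```python
import numpy as np

def delta_rule_step(W, x, t, alpha=0.1):
    """One delta-rule update for several output units.

    W: (n, m) weight matrix, x: (n,) input pattern, t: (m,) targets.
    Each weight W[I, J] changes by alpha * (t[J] - net[J]) * x[I].
    """
    net = x @ W                              # net_J = sum_i x_i w_iJ
    return W + alpha * np.outer(x, t - net)  # delta w_IJ = alpha (t_J - net_J) x_I

# One update for a 3-input, 2-output net, starting from zero weights
W = np.zeros((3, 2))
x = np.array([1.0, -1.0, 1.0])
t = np.array([1.0, -1.0])
W = delta_rule_step(W, x, t)
print(W)
```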

5- MADALINE
Adalines can be combined so that the output from some of them becomes
the input for others; the net then becomes multilayer. Such a
multilayer net is known as a MADALINE.
In this section we will discuss a MADALINE with one hidden layer
(composed of two hidden Adaline units) and one output Adaline unit.
Generalizations to more hidden units, more output units, and more hidden
layers are straightforward.

Architecture
A simple MADALINE net is illustrated in the following figure. The use
of the hidden units Z1 and Z2 gives the net computational capabilities not
found in single-layer nets, but also complicates the training process.

Figure: MADALINE with two hidden Adalines and one output Adaline.

Algorithm
In the MRI algorithm (Madaline Rule I, the original form of
MADALINE training) [Widrow and Hoff, 1960]:
1- Only the weights for the hidden Adalines are adjusted; the weights
for the output unit are fixed. (MRII allows training of the weights in
all layers of the net.)
2- The weights v1 and v2 and the bias b3 that feed into the output unit
Y are determined so that the response of unit Y is 1 if the signal it
receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and
Z2 send a signal of -1. In other words, the unit Y performs the logic
function OR on the signals it receives from Z1 and Z2. The weights
into Y are:
v1 = 0.5, v2 = 0.5, b3 = 0.5
3- The weights on the first hidden Adaline (w11, w21, and the bias b1) and
the weights on the second hidden Adaline (w12, w22, and b2) are
adjusted according to the algorithm.
4- The activation function for units Z1, Z2, and Y is:

f(x) =  1  if x ≥ 0
       -1  if x < 0

5- Set the learning rate α as in the Adaline training algorithm (a small
value between 0.1 and 1).
6- Compute net input to each Adaline unit:
Z-in1 = b1 + x1w11 + x2w21
Z-in2 = b2 + x1w12 + x2w22
7- Determine output of each hidden unit:
Z1 = f( Z-in1)
Z2 = f(Z-in2)
8- Determine output of net:
y-in = b3 +Z1v1 + Z2v2
y = f(y-in)
9- Determine error and update weights according to the following:
9.1 If t = y, no weight updates are performed.
9.2 If t ≠ y, then:
if t = 1, then: update weights on ZJ, the unit whose net
input is closest to 0:
bJ(new) = bJ(old) + α(1- Z-inJ)
wiJ(new) = wiJ(old) + α(1- Z-inJ)xi

if t=-1, then: update weights on all units Zk that
have positive net input (Z-ink > 0):

bk(new) = bk(old) + α(-1- Z-ink)


wik(new) = wik(old) + α(-1- Z-ink)xi

10- If weight changes have stopped (or reached an acceptable level),
then stop; otherwise continue.

Step 9 is motivated by the desire to (1) update the weights only if an error
occurred and (2) update the weights in such a way that it is more likely
for the net to produce the desired response.
If t = 1 and an error has occurred, it means that all Z units had the value -1
and at least one Z unit needs to have the value +1. Therefore, we take ZJ to
be the Z unit whose net input is closest to 0 and adjust its weights (using
Adaline training with a target of +1):
bJ(new) = bJ(old) + α(1- Z-inJ)
wiJ(new) = wiJ(old) + α(1- Z-inJ)xi
If t = -1 and an error has occurred, it means that at least one Z unit had the
value +1 and all Z units must have the value -1. Therefore, we adjust the
weights on all of the Z units with positive net input (using Adaline training
with a target of -1):
bk(new) = bk(old) + α(-1- Z-ink)
wik(new) = wik(old) + α(-1- Z-ink)xi
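Putting steps 5-10 together, a minimal MRI training loop might look like the following Python/NumPy sketch (function and variable names are chosen for this illustration; the fixed output weights implement the OR combination described in step 2):

```python
import numpy as np

def f(x):
    """Bipolar threshold activation used by Z1, Z2 and Y."""
    return np.where(x >= 0, 1, -1)

def mri_train(X, T, W, B, alpha=0.5, epochs=10):
    """MRI (Madaline Rule I) training of the two hidden Adalines.

    X: (m, 2) bipolar inputs, T: (m,) bipolar targets,
    W: (2, 2) hidden weights (W[i, j] is the weight from x_i to Z_j),
    B: (2,) hidden biases.  The output unit's weights stay fixed (OR unit).
    """
    v, b3 = np.array([0.5, 0.5]), 0.5           # fixed weights into Y
    for _ in range(epochs):
        for x, t in zip(X, T):
            z_in = B + x @ W                    # net input to Z1, Z2
            z = f(z_in)
            y = f(b3 + z @ v)                   # output of the net
            if y == t:
                continue                        # no error: no update
            if t == 1:
                j = np.argmin(np.abs(z_in))     # the Z unit closest to 0
                B[j] += alpha * (1 - z_in[j])
                W[:, j] += alpha * (1 - z_in[j]) * x
            else:                               # t == -1
                for k in np.where(z_in > 0)[0]: # all Z units with positive net
                    B[k] += alpha * (-1 - z_in[k])
                    W[:, k] += alpha * (-1 - z_in[k]) * x
    return W, B
```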
Example: illustrate the use of the MRI algorithm to train a
MADALINE to solve the XOR problem, having the following initial weights:

Weights into Z1          Weights into Z2          Weights into Y
w11    w21    b1         w12    w22    b2         v1     v2     b3
0.05   0.2    0.3        0.1    0.2    0.15       0.5    0.5    0.5

Sol:
Only the computations for the first weight updates are shown. The
training patterns are:
x1 x2 t
1 1 -1
1 -1 1
-1 1 1
-1 -1 -1
1- For the first training pair x1 =1, x2 = 1, t = -1
Z-in1 = .3 + .05 + .2 = .55
Z-in2 = .15 + .1 + .2 = .45
Z1 = 1
Z2 = 1
y-in = .5 + .5 + .5 =1.5
y=1
t – y = -1 – 1 = -2 ≠ 0, so an error occurred;
since t = -1 and both Z units have positive net inputs,
update the weights on unit Z1 as follows:
b1(new) = b1(old) + α(-1- Z-in1)
= 0.3 + (0.5) (-1 - 0.55) = -0.475
w11(new) = w11(old) + α(-1- Z-in1)x1
= 0.05 + (0.5) (-1 – 0.55) = -0.725
w21(new) = w21(old) + α(-1- Z-in1)x2
= 0.2 + (0.5) (-1 -0.55) = -0.575
update the weights on unit Z2 as follows:
b2(new) = b2(old) + α(-1- Z-in2)
= 0.15 + (0.5) (-1 - 0.45) = -0.575
w12(new) = w12(old) + α(-1- Z-in2)x1
= 0.1 + (0.5) (-1 – 0.45) = -0.625
w22(new) = w22(old) + α(-1- Z-in2)x2
= 0.2 + (0.5) (-1 -0.45) = -0.525
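Under the mri_train sketch given earlier, one pass over this first pattern with α = 0.5 reproduces the updates just computed. Training on the full XOR pattern set might be invoked as follows (again an illustration, not the text's own code):

```python
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([-1, 1, 1, -1])              # XOR targets
W = np.array([[0.05, 0.1],                # w11, w12
              [0.20, 0.2]])               # w21, w22
B = np.array([0.30, 0.15])                # b1, b2
W, B = mri_train(X, T, W, B, alpha=0.5, epochs=4)
print(W, B)
```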
After four epochs of training, the final weights are found to be:

w11      w21      b1       w12      w22      b2
-0.73    1.53    -0.99     1.27    -1.33    -1.09

Geometric interpretation of weights:
The positive response region for the trained MADALINE is the union of the
regions where each of the hidden units has a positive response. The
decision boundary for each hidden unit can be calculated:
For hidden unit Z1, the boundary line (where Z-in1 = 0) is:

x2 = -(w11/w21) x1 – b1/w21 ≈ 0.48 x1 + 0.65

And for hidden unit Z2, the boundary line (where Z-in2 = 0) is:

x2 = -(w12/w22) x1 – b2/w22 ≈ 0.95 x1 – 0.82
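These two lines can be recomputed directly from the final weights (a short illustrative check; the helper name is made up here):

```python
# Decision boundary of a hidden Adaline: b + x1*w1 + x2*w2 = 0,
# rearranged as x2 = -(w1/w2)*x1 - b/w2.
def boundary(w1, w2, b):
    return -w1 / w2, -b / w2          # (slope, intercept)

print(boundary(-0.73, 1.53, -0.99))   # Z1: slope ~ 0.48, intercept ~ 0.65
print(boundary(1.27, -1.33, -1.09))   # Z2: slope ~ 0.95, intercept ~ -0.82
```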
The response diagram for the MADALINE is illustrated in the following figure.

Figure: Response diagram (positive response region) of the trained MADALINE.
