3- ADALINE (Adaptive Linear Neuron) (Widrow & Hoff, 1960)
- Typically, Adaline uses bipolar (-1, +1) activation for its input
signals and its target outputs (although it is not restricted to such
values).
- The weights from the input units to the Adaline are adjustable.
- In general the Adaline can be trained using the delta rule (also
known as the least mean square (LMS) or Widrow–Hoff rule).
- In Adaline there is only one output unit.
The net input to the Adaline is:

net = b + ∑ xiwi , i = 1, …, n
After training, if the net is being used for pattern classification in which
the desired output is either +1 or -1, a threshold function is applied to the
net input to obtain the activation.
y = f(net) =  1   if net ≥ θ
             -1   if net < θ
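As a minimal sketch of this computation in Python (assuming a bias term b and, for illustration, θ = 0; the function names are ours, not from the text):

```python
# Minimal Adaline forward pass: net input followed by a bipolar threshold.

def net_input(x, w, b):
    """net = b + sum_i x_i * w_i"""
    return b + sum(xi * wi for xi, wi in zip(x, w))

def adaline_output(x, w, b, theta=0.0):
    """y = 1 if net >= theta, else -1."""
    return 1 if net_input(x, w, b) >= theta else -1

# AND gate from the example below: w1 = w2 = 1, b = -1.5
print(adaline_output((1, 1), (1, 1), -1.5))   # 1
print(adaline_output((1, 0), (1, 1), -1.5))   # -1
```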
For the AND gate function, with w1 = w2 = 1 and b = -1.5 (binary inputs, bipolar targets):

P    x1,p   x2,p   tp    netp    Ep
1     1      1      1     0.5    0.25
2     1      0     -1    -0.5    0.25
3     0      1     -1    -0.5    0.25
4     0      0     -1    -0.5    0.25

E = ∑ Ep = 1
The separating line is: x2 = -x1 + 1.5 (i.e., the weights that minimize
this error are: w1=w2=1, b=-1.5)
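A short check of the table, reproducing netp and Ep for each pattern (a sketch; variable names are illustrative):

```python
# Recompute the AND table: binary inputs, bipolar targets,
# w1 = w2 = 1, b = -1.5, and squared error Ep = (tp - netp)^2.

patterns = [((1, 1), 1), ((1, 0), -1), ((0, 1), -1), ((0, 0), -1)]
w1, w2, b = 1.0, 1.0, -1.5

total = 0.0
for (x1, x2), t in patterns:
    net = b + w1 * x1 + w2 * x2
    E = (t - net) ** 2
    total += E
    print(f"x=({x1},{x2})  t={t:+d}  net={net:+.2f}  E={E:.2f}")
print("E = sum Ep =", total)   # 1.0, as in the table
```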
4- Delta learning rule:
4.1- Delta rule for a single output unit:
The delta rule changes the weights of the neural connections so as to
minimize the difference between the net input “net” and the target value
“t”:

∆wi = α (t – net) xi …………(1)

where

net = ∑ xiwi , i = 1, …, n …..……..(2)
The squared error for a particular training pattern is E = (t – net)². The
gradient of E with respect to the weight wi is:

∂E/∂wi = -2(t – net) ∂net/∂wi …………..(5)
        = -2(t – net) xi …………(6)

Thus the local error will be reduced most rapidly by adjusting the weights
according to the delta rule:
∆wi = α (t – net) xi
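A minimal training-loop sketch of this rule, assuming a fixed learning rate α = 0.1 and a fixed number of epochs (both illustrative choices, not from the text):

```python
# Delta-rule (LMS) training for a single Adaline output unit.
# The bias is updated like a weight whose input is always 1.

def train_delta(patterns, n, alpha=0.1, epochs=200):
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in patterns:
            net = b + sum(xi * wi for xi, wi in zip(x, w))
            step = alpha * (t - net)          # α (t - net)
            b += step
            w = [wi + step * xi for xi, wi in zip(x, w)]
    return w, b

# Bipolar AND: LMS converges toward w1 = w2 = 0.5, b = -0.5
pats = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(train_delta(pats, 2))
```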
4.2- Delta rule for several output units:
The delta rule can be extended to nets with more than one output unit;
for output unit yJ we have:

∆wIJ = α (tJ – netJ) xI
The squared error for a particular training pattern is:

E = ∑ (tJ – netJ)² , J = 1, …, m
Again,

∂E/∂wIJ = ∂/∂wIJ ∑ (tj – netj)² = ∂/∂wIJ (tJ – netJ)²

since the weight wIJ influences the error only at output unit yJ, whose
net input is:

netJ = ∑ xiwiJ , i = 1, …, n

Thus:

∂E/∂wIJ = -2(tJ – netJ) ∂netJ/∂wIJ
         = -2(tJ – netJ) xI
Adjusting the weights according to the delta rule for a given learning rate α:
∆wIJ = α (tJ – netJ) xI
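A sketch of one training step with several output units (biases bJ are included by analogy with the single-unit case; the weight layout and names are ours):

```python
# One delta-rule step for m output units: the update for w_IJ uses only
# the error at output unit y_J, since w_IJ affects net_J alone.

def delta_step_multi(x, t, W, b, alpha):
    """x: n inputs; t: m targets; W: n x m weight matrix; b: m biases."""
    n, m = len(x), len(t)
    for J in range(m):
        net_J = b[J] + sum(x[i] * W[i][J] for i in range(n))
        err = alpha * (t[J] - net_J)
        b[J] += err
        for i in range(n):
            W[i][J] += err * x[i]     # Δw_IJ = α (t_J - net_J) x_I
    return W, b
```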
5- MADALINE
Adalines can be combined so that the output from some of them becomes
the input to others; the net then becomes multilayer. Such a
multilayer net is known as a MADALINE.
In this section we will discuss a MADALINE with one hidden layer
(composed of two hidden Adaline units) and one output Adaline unit.
Generalizations to more hidden units, more output units, and more hidden
layers are straightforward.
Architecture
A simple MADALINE net is illustrated in the figure below. The use
of the hidden units Z1 and Z2 gives the net computational capabilities not
found in single-layer nets, but also complicates the training process.
Figure: MADALINE with two hidden Adalines and one output Adaline.
Algorithm
In the MRI algorithm (Madaline Rule I, the original form of
MADALINE training) [Widrow and Hoff, 1960]:
1- only the weights for the hidden Adalines are adjusted; the weights
for the output unit are fixed. (MRII allows training of the weights in
all layers of the net.)
2- the weights v1 and v2 and the bias b3 that feed into the output unit
Y are determined so that the response of unit Y is 1 if the signal it
receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and
Z2 send a signal of -1. In other words, the unit Y performs the logic
function OR on the signals it receives from Z1 and Z2 (see the sketch
after this list). The weights into Y are:
v1 = 0.5, v2 = 0.5, b3 = 0.5
3- the weights on the first hidden Adaline w11, w21, and the bias b1 and
the weights on the second hidden Adaline w12, w22, and b2 are
adjusted according to the algorithm.
4- the activation function for units Z1, Z2, and Y is:

f(x) =  1   if x ≥ 0
       -1   if x < 0
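As a quick check of points 2 and 4 above, a short sketch showing that Y with these fixed weights computes OR on bipolar signals:

```python
# Output unit Y with fixed weights v1 = v2 = 0.5, b3 = 0.5 computes
# the logic OR of the bipolar signals coming from Z1 and Z2.

def f(x):                     # bipolar step activation from point 4
    return 1 if x >= 0 else -1

v1, v2, b3 = 0.5, 0.5, 0.5
for z1 in (1, -1):
    for z2 in (1, -1):
        print(z1, z2, "->", f(b3 + v1 * z1 + v2 * z2))
# 1 1 -> 1, 1 -1 -> 1, -1 1 -> 1, -1 -1 -> -1  (i.e., OR)
```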
Step 9 of the algorithm: if an error occurred (y ≠ t), then: if t = 1,
update the weights on the unit ZJ whose net input is closest to 0; if
t = -1, update the weights on all units Zk that have positive net input
(Z-ink > 0).
Step 9 is motivated by the desire to (1) update the weights only if an error
occurred and (2) update the weights in such a way that it is more likely
for the net to produce the desired response.
If t = 1 and an error has occurred, it means that all Z units had value -1
and at least one Z unit needs to have a value of +1. Therefore, we take ZJ
to be the Z unit whose net input is closest to 0 and adjust its weights
(using Adaline training with a target of +1):

bJ(new) = bJ(old) + α(1 – Z-inJ)
wiJ(new) = wiJ(old) + α(1 – Z-inJ)xi
If t = -1 and an error has occurred, it means that at least one Z unit had
value +1 and all Z units must have value -1. Therefore, we adjust the
weights on all of the Z units with positive net input (using Adaline
training with a target of -1):

bk(new) = bk(old) + α(-1 – Z-ink)
wik(new) = wik(old) + α(-1 – Z-ink)xi
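Both branches of the update can be written as a short routine (a sketch; the weight layout W[i][k] and the names are ours, not from the text):

```python
# MRI step-9 updates for the hidden Adalines.
# z_in[k] is the net input of hidden unit Z_k; W is n x k; b has length k.

def mri_update(x, t, z_in, W, b, alpha):
    if t == 1:
        # take Z_J as the unit whose net input is closest to 0,
        # and move it toward target +1
        J = min(range(len(z_in)), key=lambda k: abs(z_in[k]))
        d = alpha * (1 - z_in[J])
        b[J] += d
        for i in range(len(x)):
            W[i][J] += d * x[i]
    else:
        # t = -1: move every unit with positive net input toward -1
        for k in range(len(z_in)):
            if z_in[k] > 0:
                d = alpha * (-1 - z_in[k])
                b[k] += d
                for i in range(len(x)):
                    W[i][k] += d * x[i]
```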
Example: illustrate the use of the MRI algorithm to train a
MADALINE to solve the XOR problem, starting from the following weights:

Weights into Z1        Weights into Z2        Weights into Y
w11    w21    b1       w12    w22    b2       v1    v2    b3
.05    .2     .3       .1     .2     .15      .5    .5    .5
Solution:
Only the computations for the first weight update are shown. The
training patterns are:
x1 x2 t
1 1 -1
1 -1 1
-1 1 1
-1 -1 -1
1- For the first training pair x1 = 1, x2 = 1, t = -1 (taking α = 0.5, as
used in the computations below):
Z-in1 = .3 + .05 + .2 = .55
Z-in2 = .15 + .1 + .2 = .45
Z1 = 1
Z2 = 1
y-in = .5 + .5 + .5 =1.5
y=1
t – y = -1 – 1 = -2 ≠ 0, so an error occurred.
Since t = -1 and both Z units have positive net inputs,
update the weights on unit Z1 as follows:
b1(new) = b1(old) + α(-1- Z-in1)
= 0.3 + (0.5) (-1 - 0.55) = -0.475
w11(new) = w11(old) + α(-1- Z-in1)x1
= 0.05 + (0.5) (-1 – 0.55) = -0.725
w21(new) = w21(old) + α(-1- Z-in1)x2
= 0.2 + (0.5) (-1 -0.55) = -0.575
update the weights on unit Z2 as follows:
b2(new) = b2(old) + α(-1- Z-in2)
= 0.15 + (0.5) (-1 - 0.45) = -0.575
w12(new) = w12(old) + α(-1- Z-in2)x1
= 0.1 + (0.5) (-1 – 0.45) = -0.625
w22(new) = w22(old) + α(-1- Z-in2)x2
= 0.2 + (0.5) (-1 -0.45) = -0.525
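The same update can be checked in code; this sketch reproduces the arithmetic above (with α = 0.5, the value implied by the computations):

```python
# First MRI update for XOR: x = (1, 1), t = -1, alpha = 0.5.

W = [[0.05, 0.1],    # w11, w12
     [0.2,  0.2]]    # w21, w22
b = [0.3, 0.15]
x, t, alpha = (1, 1), -1, 0.5

z_in = [b[k] + sum(x[i] * W[i][k] for i in range(2)) for k in range(2)]
print(z_in)          # ≈ [0.55, 0.45]

# t = -1 and both net inputs are positive, so both hidden units update:
for k in range(2):
    if z_in[k] > 0:
        d = alpha * (-1 - z_in[k])
        b[k] += d
        for i in range(2):
            W[i][k] += d * x[i]

print(b)             # ≈ [-0.475, -0.575]
print(W)             # ≈ [[-0.725, -0.625], [-0.575, -0.525]]
```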
After four epochs of training, the final weights are found to be:

w11 w21 b1 w12 w22 b2
Geometric interpretation of weights:
The positive response region for the trained MADALINE is the union of the
regions where each of the hidden units has a positive response. The
decision boundary for each hidden unit can be calculated from its weights:
For hidden unit Z1, the boundary line is obtained by setting Z-in1 = 0, i.e.,

x2 = -(w11/w21) x1 – b1/w21

and similarly for hidden unit Z2.