Overview of Artificial Neural Networks

1. Artificial neural networks (ANNs) aim to achieve human-like performance through nonlinear computational elements arranged in patterns like biological neural networks. 2. ANNs are composed of weighted nodes that are connected and can adapt their weights during use to improve performance, similar to human brain development. 3. Early attempts to model biological neurons date back to the 1940s, while perceptrons in the 1960s helped solve simple pattern recognition problems.


2.1 Introduction
Artificial neural network (ANN) models have been studied for many
years with the hope of achieving "human-like performance". Different names
were given to these models, such as:
- Parallel distributed processing models
- Biological computers or electronic brains
- Connectionist models
- Neuromorphic systems
Eventually all these names settled on artificial neural networks (ANN),
and later simply on neural networks (NN).
There are two basic differences between conventional computers and neural networks:
1- Neural network models are composed of many non-linear computational elements
operating in parallel and arranged in patterns reminiscent of biological
neural networks.
2- Computational elements (or nodes) are connected via weights that are
typically adapted during use to improve performance, just like the human
brain.
Computer: logic elements (1, 0)
Neural network: weighted performance
2.2 Development of Neural Networks
An early attempt to understand biological computations was stimulated
by McCulloch and Pitts in 1943, who modeled biological neurons as
logical decision elements. These elements were described by two-valued state
variables (on, off) and organized into logical decision networks that could
compute simple Boolean functions.
In 1961 Rosenblatt solved simple pattern recognition problems using
perceptrons. Minsky and Papert (1969) studied the capabilities and
limitations of perceptrons and concluded that many interesting problems could
never be solved by perceptron networks.
Recent work by Hopfield examined the computational power of a model
system of two-state neurons operating with organized symmetric connections
and feedback connectivity. The inclusion of feedback connectivity in these
networks distinguished them from perceptron-like networks. Moreover,
graded-response neurons were used to demonstrate the power and speed of these
networks. Recent interest in neural networks is due to the interest in building
parallel computers and, most importantly, due to the discovery of powerful network
learning algorithms.

2.3 Areas of Neural Networks


The areas in which neural networks are currently being applied are:
1- Signal processing
2- Pattern recognition
3- Control problems
4- Medicine
5- Speech production
6- Speech recognition
7- Business

3.1 Theory of Neural Networks (NN)


The human brain is the most complicated computing device known to
human beings. The brain's capability for thinking, remembering, and problem solving
has inspired many scientists to model its operations. A neural network is
an attempt to model the functionality of the brain in a simplified manner. These
models attempt to achieve "good" performance via dense interconnections of
simple computational elements. The term (ANN) and the connection of its
models are typically used to distinguish them from the biological networks of
neurons of living organisms, which can be represented schematically as shown in
the figure below.

[Figure: Artificial neural network (inputs x1, x2, ..., xn with weights w1, w2, ..., wn, activation function f, output y) shown beside a biological neural network]

The nucleus (cell body) of a neuron is a simple processing unit which receives and combines
signals from many other neurons through input paths called dendrites. If the
combined signal is strong enough, it activates the firing of the neuron, which
produces an o/p signal. The path of the o/p signal is called the axon. The synapse is
the junction between the axon of one neuron and the dendrites of other
neurons. The transmission across this junction is chemical in nature, and the
amount of signal transferred depends on the synaptic strength of the junction.
This synaptic strength is modified when the brain is learning.

Weights (ANN) correspond to synaptic strength (biological networks)



3.2 Artificial Neural Networks (ANN)


An artificial neural network is an information processing system that has
certain performance characteristics in common with biological neural networks.
Artificial neural networks have been developed as generalizations of
mathematical models of human cognition or neural biology, based on the
assumptions that:
1- Information processing occurs at many simple elements called neurons.
2- Signals are passed between neurons over connection links.
3- Each connection link has an associated weight which, in a typical neural net,
multiplies the signal transmitted.
4- Each neuron applies an activation function (usually nonlinear) to its net input
(sum of weighted input signals) to determine its output signal.
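
The assumptions above can be sketched in a few lines of Python. This is only an illustration and not part of the original notes: the names `neuron_output` and `logistic` and the example numbers are assumptions chosen for the sketch.

```python
import math

def neuron_output(inputs, weights, activation):
    """Weighted sum of the inputs followed by an activation function."""
    # Each connection weight multiplies the signal it carries.
    net = sum(x * w for x, w in zip(inputs, weights))
    # The neuron applies its activation function to the net input.
    return activation(net)

def logistic(v):
    """A common nonlinear activation (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical example values, not taken from the notes.
print(neuron_output([1.0, 0.5], [0.4, -0.2], logistic))
```

Passing the activation as a parameter keeps the weighted sum separate from the choice of nonlinearity, which mirrors how the notes list the activation function as its own characteristic of a network.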

A neural network is characterized by:

1- Architecture: its pattern of connections between the neurons.
2- Training (learning) algorithm: its method of determining the weights on
the connections.
3- Activation function.
3.2.1 Properties of ANN
1- Parallelism
2- Capacity for adaptation ("learning rather than programming")
3- Capacity for generalization
4- No problem definition
5- Abstraction and solving problems with noisy data
6- Ease of construction and learning
7- Distributed memory
8- Fault tolerance

3.3 Types of Learning

When a neural network is to be used for practical applications, a general
procedure is followed, whose steps can be described as follows:
1: A logical function to be represented is given. The input vectors e1, e2, e3, ...,
en are presented, to which the output vectors a1, a2, a3, ..., an are assigned. These
functions are to be represented by a network.
2: A topology is selected for the network.
3: The weights w1, w2, w3, ... are selected in such a way that the network
represents the given function in the selected topology. Learning procedures
are used for determining the weights.
4: After the weights have been learned and the network becomes available, it
can be used as often as desired.

The learning of weights is generally done as follows:

1- Set random numbers for all weights.
2- Select a random input vector ej.
3- Calculate the output vector Oj with the current weights.
4- Compare Oj with the destination vector aj. If Oj = aj, then continue
with (2).
Else correct the weights according to a suitable correction formula and
then continue with (2).
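
The four steps above can be sketched as a short Python loop. This is a minimal sketch under stated assumptions, not the procedure of the notes: the notes leave the "suitable correction formula" open, so a perceptron-style correction on a single threshold unit is assumed here, and the names `train`, `predict`, `OR_SAMPLES`, the learning rate, and the step limit are all hypothetical.

```python
import random

def step(v):
    """Binary threshold: 1 if the net input is non-negative, else 0."""
    return 1 if v >= 0 else 0

def predict(w, e):
    """Output of one threshold unit; the last weight acts as a bias."""
    return step(sum(wi * xi for wi, xi in zip(w, list(e) + [1])))

def train(samples, lr=0.2, max_steps=5000, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]  # 1- random weights
    for _ in range(max_steps):
        e, a = rng.choice(samples)                       # 2- random input vector
        o = predict(w, e)                                # 3- output with current weights
        if o != a:                                       # 4- compare with the destination
            x = list(e) + [1]
            # Assumed correction formula (perceptron-style).
            w = [wi + lr * (a - o) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical task: the logical OR function.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train(OR_SAMPLES)
print([predict(weights, e) for e, _ in OR_SAMPLES])
```

The loop runs for a fixed number of steps rather than forever; a real implementation would usually stop once every training vector is reproduced correctly.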
There are three types of learning in which the weights organize themselves
according to the task to be learnt. These types are:
1- Supervised learning:
In supervised learning, at every step the system is informed about the
exact output vector. The weights are changed according to a formula (e.g. the
delta rule) if the o/p is unequal to a. This method can be compared to learning
under a teacher, who knows the contents to be learned and regulates them
accordingly in the learning procedure.

2- Unsupervised learning:
Here the correct final vector is not specified; instead the weights are
changed through random numbers. With the help of an evaluation function one
can ascertain whether the output calculated with the changed weights is better
than the previous one. If so, the changed weights are stored; otherwise they are
discarded. This type of learning is also called reinforcement learning, although
many texts treat reinforcement learning as a category of its own.
3- Learning through self-organization:
The weights change themselves at every learning step. The change
depends on:
1- The neighborhood of the input pattern.
2- The probability pattern with which the permissible input patterns are
offered.
3.4 Typical Architecture of NN
Neural nets are often classified as single layer or multilayer. In
determining the number of layers, the input units are not counted as a layer,
because they perform no computation. Equivalently, the number of layers in the
net can be defined to be the number of layers of weighted interconnection links
between the slabs of neurons. This view is motivated by the fact that the
weights in a net contain extremely important information. The net shown below
has two layers of weights:
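[Figure: A very simple neural network. The input units X1, X2, X3 are connected through the weights W1, W2, W3 to the hidden unit Y, which is connected through the weights v1, v2 to the output units Z1, Z2.]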

3.4.1 Single-Layer Net

A single-layer net has one layer of connection weights. Often, the units
can be distinguished as input units, which receive signals from the outside
world, and output units, from which the response of the net can be read. In the
typical single-layer net shown in the figure below, the input units are fully
connected to output units but are not connected to other input units, and the
output units are not connected to other output units.

[Figure: A single-layer neural network. The input units X1, ..., Xi, ..., Xn are connected to the output units Y1, ..., Yj, ..., Ym by one layer of weights W11, ..., Wij, ..., Wnm.]

3.4.2 Multilayer Net

A multilayer net is a net with one or more layers (or levels) of nodes,
called hidden units, between the input units and the output units.
Typically, there is a layer of weights between two adjacent levels of units
(input, hidden, or output). Multilayer nets can solve more complicated problems
than single-layer nets can, but training may be more difficult. However, in some
cases training may be more successful, because it is possible to solve a problem
that a single-layer net cannot be trained to perform correctly at all. The figure
below shows the multilayer neural net.

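[Figure: A multilayer neural net. The input units X1, ..., Xi, ..., Xn are connected to the hidden units Z1, ..., Zj, ..., Zp by the weights V11, ..., Vij, ..., Vnp; the hidden units are connected to the output units Y1, ..., Yk, ..., Ym by the weights W11, ..., Wjk, ..., Wpm.]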


The figure shown below is an example of a three-layered neural network
with two hidden neurons.

[Figure: input units e1, e2, e3 (bottom), hidden neurons h1, h2 (middle), output units a1, a2, a3 (top)]

3.5 Basic Activation Functions


The activation function (sometimes called a transfer function) shown in
the figure below can be a linear or nonlinear function. There are many different
types of activation functions. Selection of one type over another depends on the
particular problem that the neuron (or neural network) is to solve. The most
common types of activation function are:
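$v_q = \sum_{j=0}^{n} w_{qj} x_j$

[Fig: Alternate nonlinear model of an ANN. The inputs X1, X2, ..., Xn are multiplied by the synaptic weights W1, W2, ..., Wn (including the threshold θ or bias), added in the summing junction (cell body) to give vq, and passed through the activation function f(·) along the axon to give the output yq.]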


1- The first type is the linear (or identity) function, also called the ramp function:
$y_q = f_{lin}(v_q) = v_q$
[Plot: f_lin(v_q) against v_q, a straight line through the origin]

2- The second type of activation function is a hard limiter. This is a binary (or
bipolar) function that hard-limits the input to the function to either a 0 or a 1 for
the binary type, and a -1 or 1 for the bipolar type. The binary hard limiter is
sometimes called the threshold function, and the bipolar hard limiter is referred
to as the symmetric hard limiter.
a- The o/p of the binary hard limiter:
$y_q = f_{hl}(v_q) = \begin{cases} 0 & \text{if } v_q < 0 \\ 1 & \text{if } v_q \ge 0 \end{cases}$
[Plot: f_hl(v_q) against v_q, stepping from 0 to +1 at v_q = 0]

b- The o/p for the symmetric hard limiter (shl), also called double-sided:
$y_q = f_{shl}(v_q) = \begin{cases} -1 & \text{if } v_q < 0 \\ 0 & \text{if } v_q = 0 \\ 1 & \text{if } v_q > 0 \end{cases}$
[Plot: f_shl(v_q) against v_q, stepping from -1 to +1 at v_q = 0]

3- The third type of basic activation function is the saturating linear function, or
threshold logic unit (TLU).
This type of function can have either a binary or bipolar range for the saturation
limits of the output. The bipolar saturating linear function will be referred to as
the symmetric saturating linear function.

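a- The o/p for the saturating linear function (binary o/p):
$y_q = f_{sl}(v_q) = \begin{cases} 0 & \text{if } v_q < -1/2 \\ v_q + 1/2 & \text{if } -1/2 \le v_q \le 1/2 \\ 1 & \text{if } v_q > 1/2 \end{cases}$
or, in general form, the output y follows the input x between two saturation limits and is held at the upper or lower limit outside them.
[Plot: f_sl(v_q) against v_q, rising linearly from 0 at v_q = -0.5 to 1 at v_q = 0.5 and constant outside that range]

b- The o/p for the symmetric saturating linear function:
$y_q = f_{ssl}(v_q) = \begin{cases} -1 & \text{if } v_q < -1 \\ v_q & \text{if } -1 \le v_q \le 1 \\ 1 & \text{if } v_q > 1 \end{cases}$
[Plot: f_ssl(v_q) against v_q, rising linearly from -1 at v_q = -1 to +1 at v_q = 1 and constant outside that range]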

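4- The fourth type is the sigmoid. Modern NNs use the sigmoid nonlinearity, which
is also known as the logistic, semilinear, or squashing function. Its output is
bounded between 0 and 1 and varies smoothly.
$y_q = f_{bs}(v_q) = \frac{1}{1 + e^{-v_q}}$, or written in terms of x: $y = \frac{1}{1 + e^{-x}}$
[Plot: f_bs(v_q) against v_q, rising from 0 to +1 and passing through 0.5 at v_q = 0]

5- The hyperbolic tangent function (tanh) is similar to the sigmoid in shape but symmetric
about the origin:
$y = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
[Plot: y against x, rising from -1 to +1 through the origin]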
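
The activation functions of this section can be written compactly in Python. This is an illustrative sketch, not code from the notes: the function names are assumptions, and the boundary cases (for example, the binary hard limiter returning 1 at exactly 0) follow the usual textbook convention.

```python
import math

def linear(v):
    """Linear (identity) function."""
    return v

def hard_limiter(v):
    """Binary hard limiter (threshold function)."""
    return 1 if v >= 0 else 0

def symmetric_hard_limiter(v):
    """Bipolar hard limiter: -1, 0, or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def saturating_linear(v):
    """Binary saturating linear function, linear on [-1/2, 1/2]."""
    return min(1.0, max(0.0, v + 0.5))

def symmetric_saturating_linear(v):
    """Bipolar saturating linear function, linear on [-1, 1]."""
    return min(1.0, max(-1.0, v))

def binary_sigmoid(v):
    """Logistic (squashing) function with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))

def hyperbolic_tangent(v):
    """tanh, similar to the sigmoid but symmetric about the origin."""
    return math.tanh(v)

for f in (linear, hard_limiter, symmetric_hard_limiter, saturating_linear,
          symmetric_saturating_linear, binary_sigmoid, hyperbolic_tangent):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```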

Ex.1: Find y for the following neuron if x1 = 0.5, x2 = 1, x3 = -0.7 and
w1 = 0, w2 = -0.3, w3 = 0.6.
[Figure: neuron with inputs x1, x2, x3, weights w1, w2, w3, and output y]
Sol:
$net = x_1 w_1 + x_2 w_2 + x_3 w_3 = (0.5 \times 0) + (1 \times -0.3) + (-0.7 \times 0.6) = -0.72$
1- If f is linear:
y = -0.72
2- If f is a hard limiter (on-off):
y = -1
3- If f is sigmoid:

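$y = \frac{1}{1 + e^{-(-0.72)}} \approx 0.327$
4- If f is tanh:
$y = \frac{e^{-0.72} - e^{0.72}}{e^{-0.72} + e^{0.72}} = -0.6169$
5- If f is a TLU with b = 0.6, a = 3, then y = -3, because net = -0.72 < -b:
$f(y) = \begin{cases} a & \text{if } y > b \\ ky & \text{if } -b \le y \le b \\ -a & \text{if } y < -b \end{cases}$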
Ex.2 (H.W):
Find y for the following neuron if
x1 = 0.5, x2 = 1, x3 = -0.7
w1 = 0, w2 = -0.3, w3 = 0.6
θ = 1
[Figure: neuron with inputs x1, x2, x3, weights w1, w2, w3, a constant input 1 for the threshold θ, and output y]
Sol:
$net = \sum_i w_i x_i + \theta = -0.72 + 1 = 0.28$
1- If f is linear:
y = 0.28
2- If f is a hard limiter:
y = 1
3- If f is sigmoid:
$y = \frac{1}{1 + e^{-0.28}} \approx 0.569$
4- If f is tanh:
$y = \frac{e^{0.28} - e^{-0.28}}{e^{0.28} + e^{-0.28}} \approx 0.272$

5- If f is a TLU with b = 0.6, a = 3:
y = 0.28, because -b ≤ y ≤ b (taking k = 1)
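
The two worked examples can be checked numerically. The short script below is only a verification sketch: the helper names `tlu` and `evaluate` are assumptions, x3 is taken as -0.7 (the value used in the worked calculation), k = 1 is assumed for the TLU, and the hard limiter is taken to be the bipolar one.

```python
import math

def tlu(v, a=3.0, b=0.6, k=1.0):
    """Saturating unit: a above b, -a below -b, k*v in between."""
    if v > b:
        return a
    if v < -b:
        return -a
    return k * v

def evaluate(net):
    """Neuron output for each activation function used in the examples."""
    return {
        "linear": net,
        "hard limiter": 1 if net > 0 else -1,  # bipolar; net = 0 does not occur here
        "sigmoid": 1.0 / (1.0 + math.exp(-net)),
        "tanh": math.tanh(net),
        "TLU": tlu(net),
    }

x = [0.5, 1.0, -0.7]
w = [0.0, -0.3, 0.6]
net1 = sum(xi * wi for xi, wi in zip(x, w))  # Ex.1: no threshold
net2 = net1 + 1.0                            # Ex.2: threshold of 1 added

for label, net in (("Ex.1", net1), ("Ex.2", net2)):
    print(label, round(net, 2), {k: round(v, 4) for k, v in evaluate(net).items()})
```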

Ex.3
The output of a simulated neuron using a sigmoid function is 0.5. Find the
value of the threshold when the inputs are x1 = 1, x2 = 1.5, x3 = 2.5 and all the initial
weights have the value 0.2.
[Figure: neuron with inputs x1, x2, x3, weights w1, w2, w3, and output Y]
Sol:
Output = F(net + θ), where
$F(net) = \frac{1}{1 + e^{-net}}$
$net = \sum_i w_i x_i = x_1 w_1 + x_2 w_2 + x_3 w_3 = (1 \times 0.2) + (1.5 \times 0.2) + (2.5 \times 0.2) = 0.2 + 0.30 + 0.50 = 1$
$0.5 = \frac{1}{1 + e^{-(1 + \theta)}}$
$0.5\,(1 + e^{-(1 + \theta)}) = 1$
$0.5 + 0.5\,e^{-(1 + \theta)} = 1$
$0.5\,e^{-(1 + \theta)} = 0.5$
$e^{-(1 + \theta)} = 1$
$-(1 + \theta) = \ln 1 = 0 \Rightarrow -1 - \theta = 0 \Rightarrow \theta = -1$
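
The threshold found in Ex.3 can be checked by inverting the sigmoid. This is a small verification sketch; the function name `solve_threshold` is an assumption made for the illustration.

```python
import math

def solve_threshold(target, inputs, weights):
    """Threshold theta such that sigmoid(net + theta) equals the target output."""
    net = sum(x * w for x, w in zip(inputs, weights))
    # Inverting y = 1 / (1 + e^-(net + theta)) gives net + theta = ln(y / (1 - y)).
    return math.log(target / (1.0 - target)) - net

theta = solve_threshold(0.5, [1.0, 1.5, 2.5], [0.2, 0.2, 0.2])
print(round(theta, 6))
```

The same function works for any target strictly between 0 and 1, since the sigmoid never reaches either limit.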

3.6 The bias

The bias is a constant value added to improve learning.
Some networks employ a bias unit as part of every layer except the output layer.
These units have a constant activation value of 1 or -1; their weights may be
adjusted during learning. The bias unit provides a constant term in the weighted
sum, which results in an improvement in the convergence properties of the
network.
A bias acts exactly as a weight on a connection from a unit whose
activation is always 1. Increasing the bias increases the net input to the unit. If a
bias is included, the activation function is typically taken to be:

$f(net) = \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$
where
$net = b + \sum_i x_i w_i$

Figure: single-layer NN for logic function
[Figure: a bias unit with constant activation 1 and weight b, together with the input units X1, X2 with weights W1, W2, all feeding the output unit y]

Some authors do not use a bias weight, but instead use a fixed threshold θ for
the activation function.

$f(net) = \begin{cases} 1 & \text{if } net \ge \theta \\ -1 & \text{if } net < \theta \end{cases}$
where
$net = \sum_i x_i w_i$
However, this is essentially equivalent to the use of an adjustable bias.
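
The equivalence can be made concrete: a unit with bias b = -θ gives the same output as a unit with fixed threshold θ. The sketch below is illustrative only; the function names, weights, and threshold value are assumptions.

```python
def output_with_bias(x, w, b):
    """Bipolar unit with a bias: fires when b + sum(x_i * w_i) >= 0."""
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= 0 else -1

def output_with_threshold(x, w, theta):
    """Bipolar unit with a fixed threshold: fires when sum(x_i * w_i) >= theta."""
    net = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= theta else -1

# Hypothetical weights and threshold, chosen only for the comparison.
w, theta = [1.0, 1.0], 1.5
patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
same = all(output_with_bias(x, w, -theta) == output_with_threshold(x, w, theta)
           for x in patterns)
print(same)
```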

4.1 Learning Algorithms


NNs mimic the way that a child learns to identify shapes and colors.
NN algorithms are able to adapt continuously based on current results to
improve performance. Adaptation or learning is an essential feature of NNs in
order to handle the new "environments" that are continuously encountered. In
contrast to NN algorithms, traditional statistical techniques are not adaptive
but typically process all training data simultaneously before being used with
new data. The performance of a learning procedure depends on many factors, such
as:
1- The choice of error function.
2- The net architecture.
3- Types of nodes and possible restrictions on the values of the weights.
4- The activation function.

The convergence of the net depends on:

1- The training set.
2- The initial conditions.
3- The learning algorithm.
Note:
The convergence in the case of complete information is better than in the case
of incomplete information.

Training a NN means assigning the weights in the net so as to minimize the
o/p error. The net is said to be trained when convergence is achieved, or in other
words when the weights stop changing.
The learning rules considered here are of the following types:

4.1.1 Hebbian Learning Rule

The earliest and simplest learning rule for a neural net is generally known
as the Hebb rule, suggested by Hebb in 1949. Hebb's
basic idea is that if a unit Uj receives an input from a unit Ui and both units are
highly active (positive), then the weight Wij (from unit i to unit j) should be
strengthened (increased); otherwise the weight decreases.
This idea is formulated as:
$\Delta w_{ij} = \eta\, x_i y_j$
where η is the learning rate and Δw is the weight change. With η = 1:
$w(new) = w(old) + xy$
that is, $w(new) = w(old) + \Delta w$

Algorithm (Hebbian learning rule)

Step 0: Initialize all weights:
wi = 0 (i = 1 to n)
Step 1: For each i/p training vector and target o/p
pair, s : t, do steps 2-4.
Step 2: Set activations for the i/p units:
xi = si (i = 1 to n)
Step 3: Set the activation for the o/p unit:
y = t

Step 4: Adjust the weights:
wi(new) = wi(old) + xi y (i = 1 to n)
Adjust the bias:
b(new) = b(old) + y
Note that the bias is adjusted exactly like a weight from a "unit" whose output
signal is always 1.
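
Steps 0 to 4 translate almost line by line into Python. This is a minimal sketch with a learning rate of 1; the function name `hebb_train` and the two sets of AND training pairs used to exercise it are illustrative choices.

```python
def hebb_train(pairs):
    """One pass of the Hebb rule over (input vector, target) pairs."""
    n = len(pairs[0][0])
    w = [0] * n                  # Step 0: all weights start at zero
    b = 0
    for s, t in pairs:           # Step 1: for each pair s : t
        x = list(s)              # Step 2: activations of the input units
        y = t                    # Step 3: activation of the output unit
        # Step 4: adjust the weights and the bias
        w = [wi + xi * y for wi, xi in zip(w, x)]
        b = b + y
    return w, b

# AND function with bipolar inputs and targets.
bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(hebb_train(bipolar_and))   # ([2, 2], -2)

# AND function with binary inputs and targets: only the first pair causes a change.
binary_and = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
print(hebb_train(binary_and))    # ([1, 1], 1)
```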

Ex.4:
A Hebb net for the AND function: binary inputs and targets

Input (x1 x2 b)   Target
1  1  1           1
1  0  1           0
0  1  1           0
0  0  1           0

Δw1 = x1 y, Δw2 = x2 y, Δb = y. Initial weights = 0: w1 = 0, w2 = 0, b = 0

Step   x1 x2 b   y   Δw1 Δw2 Δb   w1 w2 b
                                   0  0  0
1      1  1  1   1   1   1   1    1  1  1
2      1  0  1   0   0   0   0    1  1  1
3      0  1  1   0   0   0   0    1  1  1
4      0  0  1   0   0   0   0    1  1  1

Presenting the first input pattern shows that the response will be correct. Presenting
the second, third, and fourth training i/p shows that, because the target value is 0,
no learning occurs. Thus, using binary target values prevents the net from
learning any pattern for which the target is "off".

The AND function can be solved if we modify its representation to
express the inputs as well as the targets in bipolar form. Bipolar representation
of the inputs and targets allows modification of a weight when the input unit
and the target value are both "on" at the same time and when they are both "off"
at the same time, and all units will learn whenever there is an error in the output.
The Hebb net for the AND function with bipolar inputs and targets is:

x1  x2  b   y
 1   1  1   1
 1  -1  1  -1
-1   1  1  -1
-1  -1  1  -1

For the first input, for example:
Δw1 = x1 * y = 1 * 1 = 1
w1(new) = w1(old) + Δw1 = 0 + 1 = 1

Presenting the first input:
x1  x2  b   y   Δw1 Δw2 Δb   w1 w2 b
                              0  0  0
 1   1  1   1    1   1   1    1  1  1

Presenting the second input:
x1  x2  b   y   Δw1 Δw2 Δb   w1 w2 b
                              1  1  1
 1  -1  1  -1   -1   1  -1    0  2  0

Presenting the third input:
x1  x2  b   y   Δw1 Δw2 Δb   w1 w2 b
                              0  2  0
-1   1  1  -1    1  -1  -1    1  1 -1

Presenting the fourth input:
x1  x2  b   y   Δw1 Δw2 Δb   w1 w2 b
                              1  1 -1
-1  -1  1  -1    1   1  -1    2  2 -2

The first iteration will be:

Input        Target   Weight change   Weights
x1  x2  b    y        Δw1 Δw2 Δb      w1 w2 b
                                       0  0  0
 1   1  1    1         1   1   1       1  1  1
 1  -1  1   -1        -1   1  -1       0  2  0
-1   1  1   -1         1  -1  -1       1  1 -1
-1  -1  1   -1         1   1  -1       2  2 -2

Second method:
$W_{ij} = \sum X_i Y_j$ (summed over the training patterns), or $W = X^T Y$
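
The second method gives the same weights in one step. The sketch below applies W = X^T Y to the bipolar AND data, treating the bias as a third input that is always 1; it uses plain Python lists, and the variable names are assumptions.

```python
# Rows are training patterns; columns are x1, x2 and the constant bias input.
X = [[1, 1, 1],
     [1, -1, 1],
     [-1, 1, 1],
     [-1, -1, 1]]
Y = [1, -1, -1, -1]   # bipolar AND targets

# W = X^T Y: for each input column, sum input * target over all patterns.
W = [sum(X[p][i] * Y[p] for p in range(len(X))) for i in range(len(X[0]))]
print(W)   # [2, 2, -2], i.e. w1 = 2, w2 = 2, b = -2
```

This matches the result of presenting the four patterns one at a time, because with zero initial weights the Hebb rule simply accumulates the products of inputs and targets.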

Ex.5
What would the weights be if Hebbian learning is applied to the data shown in
the following table? Assume that the weights are all zero at the start.

p   x1  x2  y
1   0   0   1
2   0   1   1
3   1   0   0
4   1   1   1

With the weights that you have just found, what output values are produced with a
threshold of 1, using the hyperbolic tangent activation function?

p   x1  x2  y   Δw1 Δw2   w1  w2
                          0   0
1   0   0   1   0   0     0   0
2   0   1   1   0   1     0   1
3   1   0   0   0   0     0   1
4   1   1   1   1   1     1   2

p = 0: w1 = 0, w2 = 0
p = 1: w1 = 0 + x1 * y = 0
       w2 = 0 + x2 * y = 0
p = 2: w1 = 0 + 0 * 1 = 0
       w2 = 0 + 1 * 1 = 1
p = 3: w1 = 0 + 1 * 0 = 0
       w2 = 1 + 0 * 0 = 1
p = 4: w1 = 0 + 1 * 1 = 1
       w2 = 1 + 1 * 1 = 2
∴ w1 = 1, w2 = 2

$net = \sum_{i=1}^{2} x_i w_i$

p = 1: net = x1 * w1 + x2 * w2 = 0 * 1 + 0 * 2 = 0
p = 2: net = 0 * 1 + 1 * 2 = 2
p = 3: net = 1 * 1 + 0 * 2 = 1
p = 4: net = 1 * 1 + 1 * 2 = 3

$output = F(net + \theta) = \frac{e^{net} - e^{-net}}{e^{net} + e^{-net}}$

p   x1  x2  net  y
1   0   0   0    0
2   0   1   2    1
3   1   0   1    0
4   1   1   3    1

4.1.2 Widrow-Hoff Learning Rule

The idea of Hebb was modified in 1960 to produce the basic delta rule (BDR),
also known as the least mean square (LMS) rule. The BDR is formulated as:
$\Delta w_{ij} = \eta\,(d_j - y_j)\,x_i$

$\Delta w_{ij} = \eta\,\delta_j\,x_i$ (Delta rule)

Δw: the weight change
η: the learning rate
d: desired output
y: actual output
δ: error between d and y

Note:
Before training the net, a decision has to be made on the setting of the learning
rate. Theoretically, the larger η is, the faster the training process goes. But practically,
η may have to be set to a small value (e.g. 0.1) in order to prevent the training
process from being trapped at a local minimum or showing oscillatory behavior.
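
A minimal sketch of the delta rule on a single linear unit is given below. It is an illustration under stated assumptions, not the notes' own procedure: the function name `delta_rule_train`, the training samples, the learning rate of 0.1, and the number of epochs are all hypothetical.

```python
def delta_rule_train(samples, eta=0.1, epochs=500):
    """Train one linear unit with the delta (LMS) rule."""
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))   # actual output
            delta = d - y                              # error between d and y
            # Weight change: eta * delta * x_i
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical data generated by d = 2*x1 - x2.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
print([round(wi, 3) for wi in delta_rule_train(samples)])
```

With a small learning rate the weights approach the values that reproduce the desired outputs; a much larger rate can make the updates overshoot and oscillate.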

Other learning rules:

1. Perceptron learning rule
2. Delta learning rule
3. Correlation learning rule
4. Winner-Take-All learning rule
5. Outstar learning rule

H.W
Q1: Briefly discuss the following:
A- Dendrites
B- Synapses

Q2: A fully connected feed-forward network has 10 source nodes, 2 hidden
layers, one with 4 neurons and the other with 3 neurons, and a single output neuron.
Construct an architecture graph of this network.

Q3: A neuron j receives input from four other neurons whose activity levels are
10, -20, 4 and -2. The respective synaptic weights of neuron j are 0.8, 0.2, -1,
and -0.9. Calculate the output of neuron j for the following two activation
functions:
i) Hard-limiting function

ii) Logistic function $F(x) = 1 / (1 + e^{-x})$.

Q4:
1. Outline the basic structure and components of a simple biological
neuron.
2. Describe how this is related to a McCulloch-Pitts neuron.

Q5: List the features that distinguish the delta rule and Hebb's rule from each
other.

4.1.3 Back Propagation

The determination of the error is a recursive process which starts with the
o/p units, and the error is back propagated to the i/p units. Therefore the rule is
called error back propagation (EBP) or simply back propagation (BP). The
weight is changed in exactly the same form as in the standard delta rule (DR):
$\Delta w_{ij} = \eta\,\delta_j\,x_i$

$\therefore\ w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_j\,x_i$

There are two other equations that specify the error signal. If a unit is an o/p
unit, the error signal is given by:
$\delta_j = (d_j - y_j)\,f'_j(net_j)$

where $net_j = \sum_i w_{ij} x_i + \theta$

The generalized delta rule (GDR) minimizes the squares of the differences between the actual and the
desired o/p values, summed over the o/p units and all pairs of i/p and o/p vectors.
The rule minimizes the overall error $E = \sum_p E_p$ by implementing a gradient
descent in E, where $E_p = \frac{1}{2} \sum_j (d_j - y_j)^2$.

The BP consists of two phases:

1- Forward propagation (the first phase):
During the forward phase, the i/p is presented and propagated towards the
o/p.
[Figure: the pattern is propagated through the hidden layer to the o/p units Y1, Y2, ..., Yn]

2- Backward propagation:
During the backward phase, the errors are formed at the o/p and
propagated towards the i/p.
[Figure: the error signals δ1, δ2, ..., δn are formed at the o/p units and propagated backwards]

3- Compute the error in the hidden layer.

If $y = f(x) = \frac{1}{1 + e^{-x}}$, then
$f' = y(1 - y)$
and the o/p error equation can be rewritten as:
$\delta_j = y_j (1 - y_j)(d_j - y_j)$

The error signal for hidden units, for which there is no specified target
(desired o/p), is determined recursively in terms of the error signals of the units
to which it directly connects and the weights of those connections.
That is:
$\delta_j = f'(net_j) \sum_k \delta_k w_{jk}$

or
$\delta_j = y_j (1 - y_j) \sum_k \delta_k w_{jk}$

BP learning is implemented when hidden units are embedded between the input
and output units.
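
The two error-signal formulas for sigmoid units can be sketched as follows. The function names and the numbers in the usage lines are assumptions chosen for illustration; the hidden-unit formula sums over the units k that the hidden unit feeds.

```python
def output_delta(d, y):
    """Error signal of an output unit with a sigmoid activation."""
    return y * (1.0 - y) * (d - y)

def hidden_delta(y_hidden, deltas_next, weights_to_next):
    """Error signal of a hidden unit from the deltas of the units it feeds."""
    back = sum(dk * wjk for dk, wjk in zip(deltas_next, weights_to_next))
    return y_hidden * (1.0 - y_hidden) * back

# Hypothetical values: desired output 1.0, actual output 0.8,
# hidden activation 0.5, and a weight of 0.4 from the hidden unit to the output.
d_out = output_delta(1.0, 0.8)
d_hid = hidden_delta(0.5, [d_out], [0.4])
print(round(d_out, 4), round(d_hid, 4))
```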

Convergence:
A quantitative measure of the learning is the root mean square (RMS) error,
which is calculated to reflect the "degree" of learning.
Generally, an RMS below 0.1 indicates that the net has learned its training
set. Note that the net does not provide a yes/no response that is "correct" or
"incorrect", since the net gets closer to the target value incrementally with each
step. It is possible to define a cut-off point at which the net's o/p is said to match the
target values.
Local minima

- Convergence is not always easy to achieve because sometimes the net gets
stuck in a "local minimum" and stops learning.
- Convergence can be represented intuitively in terms of walking about
mountains.

Momentum term
The choice of the learning rate plays an important role in the stability of the
process. It is desirable to choose a learning rate as large as possible without
leading to oscillations. This offers the most rapid learning. One way to increase
the learning rate without leading to oscillations is to modify the GDR to include
a momentum term.
This can be achieved by the following rule:
$W_{ij}(t+1) = W_{ij}(t) + \eta\,\delta_j\,x_i + \alpha\,(W_{ij}(t) - W_{ij}(t-1))$

where α (0 < α < 1) is a constant which determines the effect of the past weight
changes on the current direction of movement in weight space.
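
One step of the momentum rule can be written as a small function. The name `momentum_step` and the numbers in the usage lines are assumptions chosen for illustration.

```python
def momentum_step(w_now, w_prev, eta, delta_j, x_i, alpha):
    """Next weight: delta-rule change plus a fraction of the previous change."""
    return w_now + eta * delta_j * x_i + alpha * (w_now - w_prev)

# Hypothetical values: the weight moved from 0.50 to 0.60 in the last step.
w_next = momentum_step(w_now=0.60, w_prev=0.50, eta=0.1, delta_j=0.2, x_i=1.0, alpha=0.9)
print(round(w_next, 4))   # 0.6 + 0.02 + 0.09
```

Because the previous change is added back in, successive steps in the same direction reinforce each other, while steps that alternate in sign partly cancel.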

The aim is to reach the "global minimum". Unfortunately, it is possible to encounter a local
minimum, a valley that is not the lowest possible in the entire terrain. The net does
not leave a local minimum under the standard BP algorithm, and special techniques
should be used to get out of a local minimum, such as:

1- Change the learning rate or the momentum term.
2- Change the number of hidden units (10%).
3- Add a small random value to the weights.
4- Start the learning again with different initial weights.

4.1.3.1 Back propagation training algorithm


Training a network by back propagation involves three stages:
1- The feed forward of the input training pattern
2- The back propagation of the associated error
3- The adjustment of the weights
Let n = number of input units in the input layer,
let p = number of hidden units in the hidden layer,
let m = number of output units in the output layer,
let Vij be the weights between the i/p layer and the hidden layer,
let Wjk be the weights between the hidden layer and the output layer.
We refer to the i/p units as Xi, i = 1, 2, ..., n, to the hidden units as
Zj, j = 1, ..., p, and to the o/p units as Yk, k = 1, ..., m.
δ1j is the error in the hidden layer,
δ2k is the error in the output layer,
η is the learning rate,
hange E hange E
XC di XC di
F- t F- t
PD

PD
or

or
!

!
W

W
O

O
N

N
Y

Y
U

U
B

B
Neural Networks
to

to
ww

ww
om

om
k

k
lic

lic
C

C
.c

.c
w

w
tr re tr re
.

.
ac ac
k e r- s o ft w a k e r- s o ft w a

 is the momentum coefficient (learning coefficient, 0.0 <  < 1.0,


yk is the o/p of the net (o/p layer),
Zj is the o/p of the hidden layer,
Xi is the o/p of the i/p layer.
 is the learning coefficient.

The algorithm is as follows:

Step 0: Initialize the weights (set to small random values).
Step 1: While the stopping condition is false, do steps 2-9.
Step 2: For each training pair, do steps 3-8.
Feed forward:
Step 3: Each i/p unit (Xi) receives the i/p signal xi and broadcasts this signal
to all units in the layer above (the hidden layer).

Step 4: Each hidden unit (Zj) sums its weighted i/p signals,
$$Z\_in_j = V_{0j} + \sum_{i=1}^{n} x_i V_{ij} \qquad (V_{0j} \text{ is a bias})$$
and applies its activation function to compute its output signal (the
activation function is the binary sigmoid function),
$$Z_j = f(Z\_in_j) = \frac{1}{1 + \exp(-Z\_in_j)}$$
and sends this signal to all units in the layer above (the o/p layer).
Step 5: Each output unit (Yk) sums its weighted i/p signals,
$$y\_in_k = W_{0k} + \sum_{j=1}^{p} Z_j W_{jk} \qquad (W_{0k} \text{ is a bias})$$
and applies its activation function to compute its output signal,
$$y_k = f(y\_in_k) = \frac{1}{1 + \exp(-y\_in_k)}$$

Back propagation of error:

Step 6: Each output unit (yk, k = 1 to m) receives a target pattern
corresponding to the input training pattern, computes its error
information term and calculates its weight correction term used
to update Wjk later,
$$\delta_{2k} = y_k\,(1 - y_k)\,(T_k - y_k),$$
where Tk is the target pattern and k = 1 to m.
Step 7: Each hidden unit (Zj, j = 1 to p) computes its error information
term and calculates its weight correction term used to update Vij
later,
$$\delta_{1j} = Z_j\,(1 - Z_j) \sum_{k=1}^{m} \delta_{2k} W_{jk}$$
Update weights and bias:
Step 8: Each output unit (yk, k = 1 to m) updates its bias and weights:
$$W_{jk}(\text{new}) = \eta\,\delta_{2k}\,Z_j + \alpha\,W_{jk}(\text{old}), \qquad j = 1 \text{ to } p$$
Each hidden unit (Zj, j = 1 to p) updates its bias and weights:
$$V_{ij}(\text{new}) = \eta\,\delta_{1j}\,X_i + \alpha\,V_{ij}(\text{old}), \qquad i = 1 \text{ to } n$$
Step 9: Test the stopping condition.
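Steps 3 to 8 for a single training pair can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the bias terms are omitted, the helper names `sigmoid` and `bp_iteration` are hypothetical, and the 2-2-1 network uses the weights, learning rate 0.45 and momentum coefficient 0.9 of example EX6. The update in step 8 is coded exactly as written in these notes, that is, the old weight is scaled by the momentum coefficient, which differs from the momentum rule given in the "Momentum term" section.

```python
import math


def sigmoid(x):
    """Binary sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def bp_iteration(x, t, V, W, eta, alpha):
    """One pass of steps 3-8 for one training pair (no bias terms).

    V[i][j] is the weight from input i to hidden unit j,
    W[j][k] is the weight from hidden unit j to output unit k.
    """
    n, p, m = len(x), len(W), len(t)
    # Steps 4-5: feed forward
    z = [sigmoid(sum(x[i] * V[i][j] for i in range(n))) for j in range(p)]
    y = [sigmoid(sum(z[j] * W[j][k] for j in range(p))) for k in range(m)]
    # Step 6: error terms of the output units
    d2 = [y[k] * (1 - y[k]) * (t[k] - y[k]) for k in range(m)]
    # Step 7: error terms of the hidden units (computed with the old W)
    d1 = [z[j] * (1 - z[j]) * sum(d2[k] * W[j][k] for k in range(m))
          for j in range(p)]
    # Step 8: update rule in the form used by these notes
    W_new = [[eta * d2[k] * z[j] + alpha * W[j][k] for k in range(m)]
             for j in range(p)]
    V_new = [[eta * d1[j] * x[i] + alpha * V[i][j] for j in range(p)]
             for i in range(n)]
    return y, V_new, W_new


V = [[0.1, -0.3], [0.75, 0.2]]
W = [[0.3], [-0.5]]
y, V_new, W_new = bp_iteration([1, 0], [1], V, W, eta=0.45, alpha=0.9)
print(y, V_new, W_new)
```

Because the code does not round intermediate values, its results can differ from a hand calculation in the third decimal place.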

EX6
Suppose you have a BP-ANN with 2 input, 2 hidden and 1 output nodes, with the
sigmoid function and the following weight matrices; trace one iteration.
$$V = \begin{bmatrix} 0.1 & -0.3 \\ 0.75 & 0.2 \end{bmatrix}, \qquad W = \begin{bmatrix} 0.3 & -0.5 \end{bmatrix}$$
where $\alpha = 0.9$, $\eta = 0.45$, $x = (1, 0)$, and $T_k = 1$.

Solution:

[Figure: the 2-2-1 network with input units X1, X2, hidden units Z1, Z2 and output unit Y1. Weights: V11 = 0.1, V12 = -0.3, V21 = 0.75, V22 = 0.2, W11 = 0.3, W21 = -0.5.]

1- Forward phase:
$$Z\_in_1 = X_1 V_{11} + X_2 V_{21} = 1 \times 0.1 + 0 \times 0.75 = 0.1$$
$$Z\_in_2 = X_1 V_{12} + X_2 V_{22} = 1 \times (-0.3) + 0 \times 0.2 = -0.3$$
$$Z_1 = f(Z\_in_1) = 1/(1 + \exp(-Z\_in_1)) \approx 0.5$$
(more precisely 0.525; the example rounds this value to 0.5)
$$Z_2 = f(Z\_in_2) = 1/(1 + \exp(-Z\_in_2)) = 0.426$$
$$y\_in_1 = Z_1 W_{11} + Z_2 W_{21} = 0.5 \times 0.3 + 0.426 \times (-0.5) = -0.063$$
$$y_1 = f(y\_in_1) = 1/(1 + \exp(-y\_in_1)) = 0.484$$

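2- Backward phase:
$$\delta_{2k} = y_k\,(1 - y_k)\,(T_k - y_k)$$
$$\delta_{21} = 0.484\,(1 - 0.484)\,(1 - 0.484) = 0.129$$
$$\delta_{1j} = Z_j\,(1 - Z_j) \sum_{k=1}^{m} \delta_{2k} W_{jk}$$
$$\delta_{11} = Z_1 (1 - Z_1)(\delta_{21} W_{11}) = 0.5\,(1 - 0.5)(0.129 \times 0.3) = 0.0097$$
$$\delta_{12} = Z_2 (1 - Z_2)(\delta_{21} W_{21}) = 0.426\,(1 - 0.426)(0.129 \times (-0.5)) = -0.0158$$

3- Update weights:
$$W_{jk}(\text{new}) = \eta\,\delta_{2k}\,Z_j + \alpha\,W_{jk}(\text{old})$$
$$W_{11} = \eta\,\delta_{21}\,Z_1 + \alpha\,W_{11}(\text{old}) = 0.45 \times 0.129 \times 0.5 + 0.9 \times 0.3 = 0.299$$
$$W_{21} = \eta\,\delta_{21}\,Z_2 + \alpha\,W_{21}(\text{old}) = 0.45 \times 0.129 \times 0.426 + 0.9 \times (-0.5) = -0.4253$$

$$V_{ij}(\text{new}) = \eta\,\delta_{1j}\,X_i + \alpha\,V_{ij}(\text{old})$$
$$V_{11} = \eta\,\delta_{11}\,X_1 + \alpha\,V_{11}(\text{old}) = 0.45 \times 0.0097 \times 1 + 0.9 \times 0.1 = 0.0944$$
$$V_{12} = \eta\,\delta_{12}\,X_1 + \alpha\,V_{12}(\text{old}) = 0.45 \times (-0.0158) \times 1 + 0.9 \times (-0.3) = -0.2771$$
$$V_{21} = \eta\,\delta_{11}\,X_2 + \alpha\,V_{21}(\text{old}) = 0.45 \times 0.0097 \times 0 + 0.9 \times 0.75 = 0.675$$
$$V_{22} = \eta\,\delta_{12}\,X_2 + \alpha\,V_{22}(\text{old}) = 0.45 \times (-0.0158) \times 0 + 0.9 \times 0.2 = 0.18$$

$$V = \begin{bmatrix} 0.0944 & -0.2771 \\ 0.675 & 0.18 \end{bmatrix}, \qquad W = \begin{bmatrix} 0.299 & -0.4253 \end{bmatrix}$$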

4.2 The Hopfield Network

John Hopfield, a Nobel prize winner in physics, developed the
discrete Hopfield net in 1982-1984. The net is a fully interconnected neural
net, in the sense that each unit is connected to every other unit. The discrete
Hopfield net has symmetric weights with no self-connections, i.e.,
$$W_{ij} = W_{ji} \quad \text{and} \quad W_{ii} = 0$$
In this NN, inputs of 0 or 1 are usually used, but the weights are initially
calculated after converting the inputs to -1 or +1 respectively.

[Figure: "The Hopfield network". Three neurons with inputs x1, x2, x3, thresholds T1, T2, T3 and outputs y1, y2, y3; each output is fed back to the other two neurons through the weights w12, w13, w21, w23, w31, w32.]

The outputs of the Hopfield net are connected to the inputs as shown in the
figure; thus feedback has been introduced into the network. The present output
pattern is no longer solely dependent on the present inputs, but is also dependent
on the previous outputs. Therefore the network can be said to have some sort of
memory. The Hopfield network also has only one layer of neurons.

The response of an individual neuron in the network is given by:
$$y_j = 1 \quad \text{if} \quad \sum_{i=1,\ i \neq j}^{n} W_{ij} X_i > T_j$$
$$y_j = 0 \quad \text{if} \quad \sum_{i=1,\ i \neq j}^{n} W_{ij} X_i < T_j$$
This means that for the jth neuron, the inputs from all other neurons are
weighted and summed.
Note $i \neq j$, which means that the output of each neuron is connected to
the input of every other neuron, but not to itself. The output is a hard-limiter
which gives an output of 1 if the weighted sum is greater than Tj and an output of 0
if the weighted sum is less than Tj. It will be assumed that the output does not
change when the weighted sum is equal to Tj.
Thresholds also need to be calculated. These could be included in the
matrix by assuming that there is an additional neuron, called neuron 0, which is
permanently stuck at 1. All other neurons have input connections to this
neuron's output with weights W01, W02, W03, … etc. This provides an offset
which is added to the weighted sum. The relationship between the offset and
the threshold Tj is therefore:
$$T_j = -W_{0j}$$
The output [Y0] is just the output of neuron 0, which is permanently stuck at 1, so
the formula becomes:
$$[W_0] = [X]^t\,[Y_0]$$
For example, if the patterns X1 = 0011 and X2 = 0101 are to be stored, first
convert them to
$$X_1 = (-1\ \ {-1}\ \ 1\ \ 1), \qquad X_2 = (-1\ \ 1\ \ {-1}\ \ 1)$$
To find the thresholds:
1- The matrix is
$$[X] = \begin{bmatrix} -1 & -1 & 1 & 1 \\ -1 & 1 & -1 & 1 \end{bmatrix}$$
2- The transpose of the matrix is
$$[X]^t = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ 1 & -1 \\ 1 & 1 \end{bmatrix}$$
3- y0 is permanently stuck at +1, so the offsets are calculated as follows:
$$[W_0] = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 2 \end{bmatrix}$$
4- These weights could be converted to thresholds, using $T_j = -W_{0j}$, to give:
T1 = 2
T2 = 0
T3 = 0
T4 = -2
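The threshold calculation above can be checked with a short Python sketch. The helper names `to_bipolar` and `thresholds` are hypothetical; the code simply forms the offsets as the product of the transposed pattern matrix with a column of ones (the output of neuron 0) and negates them.

```python
def to_bipolar(pattern):
    """Map binary digits 0/1 to -1/+1."""
    return [1 if bit == 1 else -1 for bit in pattern]


def thresholds(patterns):
    """Offsets W0 = X^t . Y0 with Y0 all ones, then T_j = -W0_j."""
    X = [to_bipolar(p) for p in patterns]
    n = len(X[0])
    # Multiplying X^t by a column of ones sums each column of X.
    w0 = [sum(row[j] for row in X) for j in range(n)]
    return [-w for w in w0]


print(thresholds([[0, 0, 1, 1], [0, 1, 0, 1]]))  # [2, 0, 0, -2]
```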
EX7:
Suppose the following samples are stored in a net:
binary            bipolar
0 1 0 0     →     -1  1 -1 -1
1 1 0 0     →      1  1 -1 -1
0 0 1 1     →     -1 -1  1  1
The binary input is (1 1 1 0). We want the net to determine which of the samples the i/p
is nearest to.

Note:
A binary Hopfield net can be used to determine whether an input vector is a
"known" vector (i.e., one that was stored in the net) or an "unknown" vector.

Solution:
1- Use the Hebb rule to find the weight matrix
$$W = \begin{bmatrix} W_{11} & W_{12} & W_{13} & W_{14} \\ W_{21} & W_{22} & W_{23} & W_{24} \\ W_{31} & W_{32} & W_{33} & W_{34} \\ W_{41} & W_{42} & W_{43} & W_{44} \end{bmatrix}$$
With Wii = 0 (diagonal) and Wij = Wji:
$$W = \begin{bmatrix} 0 & W_{12} & W_{13} & W_{14} \\ W_{21} & 0 & W_{23} & W_{24} \\ W_{31} & W_{32} & 0 & W_{34} \\ W_{41} & W_{42} & W_{43} & 0 \end{bmatrix}$$
W12 = (-1 × 1) + (1 × 1) + (-1 × -1) = 1
W13 = (-1 × -1) + (1 × -1) + (-1 × 1) = -1
W14 = (-1 × -1) + (1 × -1) + (-1 × 1) = -1
W21 = W12 = 1
W23 = (1 × -1) + (1 × -1) + (-1 × 1) = -3
W24 = (1 × -1) + (1 × -1) + (-1 × 1) = -3
W31 = W13 = -1
W32 = W23 = -3
W34 = (-1 × -1) + (-1 × -1) + (1 × 1) = 3
W41 = W14 = -1
W42 = W24 = -3
W43 = W34 = 3
$$W = \begin{bmatrix} 0 & 1 & -1 & -1 \\ 1 & 0 & -3 & -3 \\ -1 & -3 & 0 & 3 \\ -1 & -3 & 3 & 0 \end{bmatrix}$$
2- The i/p vector is x = (1 1 1 0). For this vector, y = (1 1 1 0).
Choose unit y1 to update its activation:

$$y\_in_1 = x_1 + \sum_{j} y_j w_{j1}$$
y_in1 = 1 + [(1 × 0) + (1 × 1) + (1 × -1) + (0 × -1)]
      = 1 + 0 = 1
y_in1 > 0 → y1 = 1
→ y = (1 1 1 0)

Choose unit y2 to update its activation:
$$y\_in_2 = x_2 + \sum_{j} y_j w_{j2}$$
      = 1 + [(1 × 1) + (1 × 0) + (1 × -3) + (0 × -3)]
      = 1 + (-2) = -1
y_in2 < 0 → y2 = 0
→ y = (1 0 1 0)
Choose unit y3 to update its activation:
$$y\_in_3 = x_3 + \sum_{j} y_j w_{j3}$$
      = 1 + [(1 × -1) + (1 × -3) + (1 × 0) + (0 × 3)]
      = 1 + (-4) = -3
y_in3 < 0 → y3 = 0
→ y = (1 0 0 0)
Choose unit y4 to update its activation:
$$y\_in_4 = x_4 + \sum_{j} y_j w_{j4}$$
      = 0 + [(1 × -1) + (1 × -3) + (1 × 3) + (0 × 0)]
      = 0 + (-1) = -1
y_in4 < 0 → y4 = 0
→ y = (1 0 0 0)
(Note that in this example the sums for every unit are evaluated with the activations y = (1 1 1 0) from the start of the pass.)
3- Test for convergence: false.
→ The input vector becomes x = (1 0 0 0); for this vector,
y = (1 0 0 0)

y_in1 = 1 > 0 → y1 = 1
y_in2 = 1 > 0 → y2 = 1
y_in3 = -1 < 0 → y3 = 0
y_in4 = -1 < 0 → y4 = 0
→ y = (1 1 0 0)
→ The input vector becomes x = (1 1 0 0),
y = (1 1 0 0)
y_in1 = 2 > 0 → y1 = 1
y_in2 = 2 > 0 → y2 = 1
y_in3 = -4 < 0 → y3 = 0
y_in4 = -4 < 0 → y4 = 0
→ y = (1 1 0 0)
The input is nearest to the second sample.
Test for convergence: true.
Stop.
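The trace of EX7 can be reproduced with the following Python sketch. It is a sketch under stated assumptions: the helper names are hypothetical, every net input in a pass is computed from the vector at the start of that pass (which is how the numbers in the example are obtained), and an activation is left unchanged when its net input is exactly 0.

```python
def to_bipolar(pattern):
    """Map binary digits 0/1 to -1/+1."""
    return [2 * bit - 1 for bit in pattern]


def hebb_weights(samples):
    """Symmetric Hebb weights with a zero diagonal, from bipolar samples."""
    n = len(samples[0])
    S = [to_bipolar(s) for s in samples]
    return [[0 if i == j else sum(s[i] * s[j] for s in S) for j in range(n)]
            for i in range(n)]


def recall(x, W, max_passes=10):
    """Repeat y_in_i = x_i + sum_j y_j * w_ji until the vector stops changing."""
    x = list(x)
    n = len(x)
    for _ in range(max_passes):
        y = list(x)
        net = [x[i] + sum(y[j] * W[j][i] for j in range(n)) for i in range(n)]
        new = [1 if v > 0 else 0 if v < 0 else y[i] for i, v in enumerate(net)]
        if new == x:          # convergence test
            return new
        x = new               # the output becomes the next input vector
    return x


samples = [[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
W = hebb_weights(samples)
print(W)
print(recall([1, 1, 1, 0], W))  # [1, 1, 0, 0], the second stored sample
```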

H.W
1- Find the weights and thresholds for a Hopfield network that stores the
patterns (0 0 1) and (0 1 1).

2- Special techniques should be used to get out of local minima;
explain them.

4.3 Bidirectional Associative Memory (BAM)

A bidirectional associative memory (BAM) is very similar to a Hopfield
network, but has two layers of neurons (Kosko, 1988) and is fully connected
from each layer to the other. There are feedback connections from the output
layer to the input layer.
The BAM is hetero-associative, that is, it accepts an input vector on one
set of neurons and produces a related, but different, output vector on another set.
The weights on the connections between any two given neurons from different
layers are the same in both directions.
The matrix of weights for the connections from the output layer to the
input layer is simply the transpose of the matrix of weights for the connections
from the input layer to the output layer.
Matrix for forward connection weights = W
Matrix for backward connection weights = W^T
There are 2 layers of neurons, an input layer and an output layer. There are no
lateral connections, that is, no two neurons within the same layer are connected.
Recurrent connections, which are feedback connections to a neuron from itself,
may or may not be present. Unlike the Hopfield network, the diagonal of the
connection matrix is left intact. Also, the number of bits in the input pattern need
not be the same as in the output pattern, so the connection matrix is not necessarily
square.

[Figure: "Layout of BAM Network". Each input unit xi is connected to each output unit Yj through a weight Wij.]


The BAM operates by presenting an input pattern, [A], and passing it through
the connection matrix to produce an output pattern, [B], so:
$$B(k) = f\big([A(k)]\,[W]\big)$$
where
k: indicates time
A(k), B(k): are equivalent to [x] and [y]
f: activation function
W: weight matrix between layer 1 and layer 2
The outputs of the neurons are produced by the function f( ) which, like that of the
Hopfield net, is a hard-limiter with a special case at 0.
This output function is defined as follows:
$$out_i(k+1) = 1 \quad \text{if } Net_i(k) > 0$$
$$out_i(k+1) = 0 \quad \text{if } Net_i(k) < 0$$
$$out_i(k+1) = out_i(k) \quad \text{if } Net_i(k) = 0 \text{ (unchanged)}$$
The output, [B], is then passed back through the connection matrix to produce a
new input pattern, [A]:
$$A(k+1) = f\big([B(k)]\,[W^T]\big)$$
The [A] and [B] patterns are passed back and forth through the connection matrix
in the way just described until there are no further changes to the values of [A]
and [B].
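The forward and backward passes described above can be sketched in Python. This is an illustrative sketch, not a definitive implementation: the helper names are hypothetical, the weights are built with the Hebb rule from bipolar vectors, the three pattern pairs are those of example EX8, and starting the output layer at all zeros is an assumption made only so that the "unchanged" case has a previous value to fall back on.

```python
def to_bipolar(v):
    """Map binary digits 0/1 to -1/+1."""
    return [2 * b - 1 for b in v]


def bam_weights(pairs):
    """W = sum over the pairs of A^T B, using bipolar vectors."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for a, b in pairs:
        a, b = to_bipolar(a), to_bipolar(b)
        for i in range(n):
            for j in range(m):
                W[i][j] += a[i] * b[j]
    return W


def hard_limit(net, previous):
    """1 if net > 0, 0 if net < 0, previous value if net == 0."""
    return [1 if v > 0 else 0 if v < 0 else p for v, p in zip(net, previous)]


def bam_recall(a, W, max_steps=20):
    """Pass A forward through W and B back through W^T until nothing changes."""
    n, m = len(W), len(W[0])
    a, b = list(a), [0] * m          # assumed initial output pattern
    for _ in range(max_steps):
        a_bi = to_bipolar(a)
        new_b = hard_limit([sum(a_bi[i] * W[i][j] for i in range(n))
                            for j in range(m)], b)
        b_bi = to_bipolar(new_b)
        new_a = hard_limit([sum(b_bi[j] * W[i][j] for j in range(m))
                            for i in range(n)], a)
        if new_a == a and new_b == b:
            break
        a, b = new_a, new_b
    return a, b


pairs = [([1, 0, 0], [0, 0, 1]), ([0, 1, 0], [0, 1, 0]), ([0, 0, 1], [1, 0, 0])]
W = bam_weights(pairs)
print(W)
print(bam_recall([1, 0, 0], W))
```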
Advantages of the BAM:
1- It is compatible with analog circuits and optical systems.
2- It converges quickly in both learning and recall.
3- It is robust against noisy data.
4- The weights are fixed.
Disadvantages of the BAM:
1- The storage capacity is limited.
2- It can give spurious responses.
3- It sometimes behaves unexpectedly.
4- No learning.

EX8: Let us try to train a network to remember three binary vector pairs.
Ai and Bi have the same number of components; use the Hebb rule to store:
A1 = (1 0 0)    B1 = (0 0 1)
A2 = (0 1 0)    B2 = (0 1 0)
A3 = (0 0 1)    B3 = (1 0 0)
1- Find the weight matrix.
2- Apply the input vector A1 = (1 0 0) to test whether the net remembers A1.
Sol
The vectors are first converted to bipolar form.
$$W_1 = [A_1^T][B_1] = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & 1 & -1 \end{bmatrix}$$
$$W_2 = [A_2^T][B_2] = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix}$$
$$W_3 = [A_3^T][B_3] = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 1 \\ -1 & 1 & 1 \\ 1 & -1 & -1 \end{bmatrix}$$
$$W = W_1 + W_2 + W_3 = \begin{bmatrix} -1 & -1 & 3 \\ -1 & 3 & -1 \\ 3 & -1 & -1 \end{bmatrix}$$

Test for A1

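Or, as a shortcut, stack the bipolar vectors as the rows of [A] and [B]:
W = [A^T][B]
$$W = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} -1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} -1 & -1 & 3 \\ -1 & 3 & -1 \\ 3 & -1 & -1 \end{bmatrix}$$
and then continue with the same steps.

A1 * W = B1
$$\begin{bmatrix} 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} -1 & -1 & 3 \\ -1 & 3 & -1 \\ 3 & -1 & -1 \end{bmatrix} = \begin{bmatrix} -3 & -3 & 5 \end{bmatrix}$$
$$\begin{bmatrix} -3 & -3 & 5 \end{bmatrix} \rightarrow (0\ 0\ 1) = B_1$$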

H.W
Q1: Find the weights and thresholds for a Hopfield network that stores the
patterns 001 and 011.
Q2: A BAM is trained using the following input and output patterns:
Input               Output
000010010000010     01
000010000010000     10
000100100100000     11
Find the weights that would be generated for the BAM network, and check that
the input patterns generate the corresponding output patterns.
Q3: Briefly explain the following:
1- Single-layer network, multi-layer network
2- ANN
3- Areas of neural networks
4- Supervised learning, unsupervised learning
5- Recurrent, non-recurrent
6- Advantages and disadvantages of the BAM
7- Write the complete algorithm of the BAM.

5-1 Adaline
Adaline is the short form of "Adaptive linear neuron" and was presented
in 1960 by B. Widrow and M. E. Hoff [WH1960]. The network is single
layered, and the binary values assumed for input and output are -1 and +1
respectively. The figure below shows the general topology of the network.

[Figure: "Topology of Adaline". A bias input (fixed at 1) and the inputs X1, X2, X3 are connected through the weights w to the output units Y1, Y2, Y3, Y4.]

Where
X = input vector (including bias)
Y = output vector = f(w*x)
W = weight matrix
An Adaline can be trained using the delta rule, also known as the least mean
squares (LMS) or Widrow-Hoff rule. The learning rule minimizes the mean
squared error between the activation and the target value. This allows the net to
continue learning on all training patterns, even after the correct output value is
generated (if a threshold function is applied) for some patterns.
When the Adaline is in its training or learning phase, there are three factors to be
taken into account:
1- The inputs that are applied are chosen from a training set where the desired
response of the system to these inputs is known.
2- The actual output produced when an input pattern is applied is compared with
the desired output and used to calculate an error $\delta$.

3- The weights are adjusted to reduce the error.

This kind of training is called supervised learning because the outputs are
known and the network is being forced into producing the correct outputs.
Three additional points need to be included before the learning rule can be
used:
4- The constant, $\eta$, has to be decided. The original suggestion for the Adaline
was that $\eta$ be made equal to
$$\eta = 1/(n+1)$$
where n is the number of inputs.
The effect of adjusting the weights by this amount is to reduce the error
for the current input pattern to zero. In practice, if $\eta$ is set to this value the
weights rarely settle down to a constant value, and a smaller value is generally
used.
5- The weights are initially set to small random values. This is to ensure that the
weights are all different.
6- The offset, w0, gets adjusted in the same way as the other weights, except that
the corresponding input x0 is assumed to be +1.
The steps for solving any question on the Adaline by using the delta rule are:
1- Compute the learning coefficient $\eta$:
$$\eta = 1/(n+1), \qquad n = \text{number of inputs}$$
2- Compute net:
$$net = \sum_{i=0}^{n} x_i\,w_i$$
3- Compute the error $\delta$:
$$\delta = d - net, \qquad d \text{ is the desired o/p}$$
4- Compute the value of $\eta\,\delta\,x_i$ for all weights.
5- Find the total for each weight over all patterns: $total_i = \sum \eta\,\delta\,x_i$



6- Find the mean for each weight: $mean_i = total_i / p$
where p is the number of states (training patterns).
7- Adjust the weights depending on $mean_i$: $W_i^{new} = W_i^{old} + mean_i$

EX9:
An Adaline is given the four different input and output combinations of the two-
input function $y = x_1 \wedge \bar{x}_2$ (x1 AND NOT x2) as a training set, with the initial weights
w0 = -0.12
w1 = 0.4
w2 = 0.65

Binary form:
X1   X2   Y
0    0    0
0    1    0
1    0    1
1    1    0

Bipolar form with the bias input X0 (Y is the desired o/p):
X0   X1   X2   Y
+1   -1   -1   -1
+1   -1   +1   -1
+1   +1   -1   +1
+1   +1   +1   -1

[Figure: an Adaline unit with inputs X0, X1, X2 weighted by w0, w1, w2.]

First, the input pattern: +1 -1 -1
Weights: -0.12 0.4 0.65
$$net = \sum_{i=0}^{n} x_i\,w_i = (+1 \times -0.12) + (-1 \times 0.4) + (-1 \times 0.65) = -1.17 \quad \text{(actual output)}$$
d = desired output = -1 (for the first pattern)
$$\delta = d - net = -1 - (-1.17) = 0.17 \quad \text{(error)}$$
$$\eta = 0.1$$
We must also compute:
$$\Delta w_i = \eta\,\delta\,x_i$$
For convenience, these figures have been rounded to two places after the
decimal point, so $\eta\,\delta\,x_0 = (0.1 \times 0.17 \times 1) = 0.017 \approx 0.02$
Note: we continue in the same way with the remaining inputs, and so obtain the following results:

X0   X1   X2   W0      W1     W2     net     d    ηδx0    ηδx1    ηδx2
+1   -1   -1   -0.12   0.40   0.65   -1.17   -1    0.02   -0.02   -0.02
+1   -1   +1   -0.12   0.40   0.65    0.13   -1   -0.11    0.11   -0.11
+1   +1   -1   -0.12   0.40   0.65   -0.37   +1    0.14    0.14   -0.14
+1   +1   +1   -0.12   0.40   0.65    0.93   -1   -0.19   -0.19   -0.19
total                                             -0.14    0.04   -0.46

$mean_i = total_i / p$, with p = 4
Mean0 = -0.14/4 = -0.035 ≈ -0.04
Mean1 = 0.04/4 = 0.01
Mean2 = -0.46/4 = -0.115 ≈ -0.12
$$W_i^{new} = W_i^{old} + mean_i$$
W0(new) = -0.12 + (-0.04) = -0.16
W1(new) = 0.40 + 0.01 = 0.41
W2(new) = 0.65 + (-0.12) = 0.53
Continue until the totals of $\eta\,\delta\,x_i$ are 0:

X0   X1   X2   W0      W1     W2     net     d    ηδx0    ηδx1    ηδx2
+1   -1   -1   -0.16   0.41   0.53   -1.10   -1    0.01   -0.01   -0.01
+1   -1   +1   -0.16   0.41   0.53   -0.04   -1   -0.10    0.10   -0.10
+1   +1   -1   -0.16   0.41   0.53   -0.28   +1    0.13    0.13   -0.13
+1   +1   +1   -0.16   0.41   0.53    0.78   -1   -0.18   -0.18   -0.18
total                                             -0.14    0.04   -0.42
mean                                              -0.04    0.01   -0.11

After further iterations:

X0   X1   X2   W0      W1     W2      net     d    ηδx0    ηδx1    ηδx2
+1   -1   -1   -0.50   0.50   -0.50   -0.50   -1   -0.05    0.05    0.05
+1   -1   +1   -0.50   0.50   -0.50   -1.50   -1    0.05   -0.05    0.05
+1   +1   -1   -0.50   0.50   -0.50    0.50   +1    0.05    0.05   -0.05
+1   +1   +1   -0.50   0.50   -0.50   -0.50   -1   -0.05   -0.05   -0.05
total                                               0.00    0.00    0.00

The network has successfully found a set of weights that produces the correct
outputs for all of the patterns.
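The batch procedure of EX9 can be sketched in Python as follows. This is a minimal sketch: the function name `train_adaline` and the number of epochs are assumptions, and the learning coefficient 0.1 is the one used in the example. Unlike the hand calculation, the code does not round intermediate values to two decimal places, so early iterations differ slightly from the tables, but the weights settle at the same final values.

```python
def train_adaline(patterns, targets, weights, eta=0.1, epochs=200):
    """Batch delta rule: average eta*delta*x_i over all patterns, then update."""
    w = list(weights)
    p = len(patterns)
    for _ in range(epochs):
        totals = [0.0] * len(w)
        for x, d in zip(patterns, targets):
            net = sum(xi * wi for xi, wi in zip(x, w))   # actual output
            delta = d - net                              # error
            for i, xi in enumerate(x):
                totals[i] += eta * delta * xi
        # W_new = W_old + mean, where mean = total / p
        w = [wi + total / p for wi, total in zip(w, totals)]
    return w


# Bipolar training set of EX9; the first component is the bias input x0 = +1.
patterns = [(1, -1, -1), (1, -1, 1), (1, 1, -1), (1, 1, 1)]
targets = [-1, -1, 1, -1]
w = train_adaline(patterns, targets, [-0.12, 0.4, 0.65])
print([round(wi, 2) for wi in w])
```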
H.W
Q1: A 2-input Adaline has the following set of weights: w0 = 0.3, w1 = -2.0, w2 =
1.5. The input pattern is x0 = 1, x1 = 1, x2 = -1,
and the desired output is 1.
a- What is the actual output?
b- What is the value of $\delta$?
c- Assuming that the weights are updated after each pattern and the value of $\eta$ is
1/(n+1), what are the new values for the weights?
d- Using these new values of the weights, what would the output be for the same
input pattern?
Q2: With $\eta$ set to 0.5, calculate the weights (to one decimal place) in the
following example after one iteration through the set of training patterns:
a- updating after all the patterns are presented
b- updating after each pattern is presented

X0   X1   X2   W0     W1    W2    net    d    ηδx0   ηδx1   ηδx2
+1   -1   -1   -0.2   0.1   0.3   -0.6   +1   0.8    -0.8   -0.8
+1   -1   +1                             +1
+1   +1   -1                             -1
+1   +1   +1                             +1

5-2 Kohonen network

Teuvo Kohonen presented the self-organizing feature map in 1982. It is an
unsupervised, competitive-learning, clustering network in which only one
neuron (or only one neuron in a group) is "on" at a time.
The self-organizing neural networks, also called topology-preserving
maps, assume a topological structure among the cluster units. This property is
observed in the brain, but is not found in other artificial neural networks.
There are m cluster units arranged in a one- or two-dimensional array.
Cluster: clusters are groups of information in which each group has a particular characteristic.
The weight vector for a cluster unit serves as an exemplar of the input
patterns associated with that cluster. During the self-organizing process, the
cluster unit whose weight vector matches the input pattern most closely
(typically, the minimum squared Euclidean distance) is chosen as the
winner. The winning unit and its neighboring units update their weights. The
weight vectors of neighboring units are not, in general, close to the input
pattern.
5.2.1 Architecture
A Kohonen network has two layers, an input layer to receive the input and
an output layer. Neurons in the output layer are usually arranged into a regular
two-dimensional array. The architecture of the Kohonen self-organizing map is
shown below.

[Figure (4.1): Kohonen self-organizing map. Each input unit xi is connected to each cluster unit Yj through a weight Wij.]

* * * [ * ( * [#] * ) * ] * *
(inner [ ]: R = 0; ( ): R = 1; outer [ ]: R = 2)

Figure (4.2) linear array of 10 cluster units

Neighborhoods of the unit designated by #, of radii R = 2, 1 and 0, in a one-
dimensional topology (with 10 cluster units) are shown in figure (4.2).
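For a linear array, the neighborhood of radius R is just the set of units whose index lies within R of the winner, clipped at the ends of the array. A small Python sketch follows; the function name `neighborhood` and the 0-based indexing are assumptions made for the illustration.

```python
def neighborhood(winner, radius, n_units):
    """Indices of the units within `radius` of `winner` in a linear array."""
    low = max(0, winner - radius)              # clip at the left end
    high = min(n_units - 1, winner + radius)   # clip at the right end
    return list(range(low, high + 1))


# 10 cluster units; the winner "#" is the sixth unit (index 5, counting from 0).
for r in (2, 1, 0):
    print(r, neighborhood(5, r, 10))
```

A winner near the edge of the array simply has fewer neighbors, for example `neighborhood(0, 2, 10)` covers only the first three units.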

* * * * * * *
* * * * * * *
* * * * * * *
* * * # * * *
* * * * * * *
* * * * * * *
* * * * * * *
Figure (4.3)
Neighborhoods for rectangular grid
(neighborhood boundaries in the figure: R0 dotted, R1 solid, R2 dashed)

The neighborhoods of a unit, of radii R = 2, 1 and 0, are shown in figure (4.3) for a
rectangular grid and in figure (4.4) for a hexagonal grid (each with 49 units). In
each illustration, the winning unit is indicated by the symbol "#" and the other
units are denoted by "*".
* Kohonen NNs can be used in speech recognition.

[Figure (4.4): Neighborhoods for hexagonal grid. 49 units "*" arranged in 7 rows of 7 on a hexagonal lattice, with the winning unit "#" at the centre; neighborhood boundaries: R0 dotted, R1 solid, R2 dashed.]

5.2.2 Algorithm
Step 0: Initialize weights wij.
        Set topological neighborhood parameters.
        Set learning rate parameters.
Step 1: While stopping condition is false, do steps 2-8.
Step 2: For each input vector x, do steps 3-5.
Step 3: For each j, compute the squared Euclidean distance
$$D(j) = \sum_{i} (x_i - w_{ij})^2$$
Step 4: Find index J such that D(J) is a minimum.
Step 5: For all units j within a specified neighborhood of J, and for all i:
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \eta\,[x_i - w_{ij}(\text{old})]$$

Step 6: Update the learning rate.
Step 7: Reduce the radius of the topological neighborhood at specified times.
Step 8: Test the stopping condition.

EX 10
A Kohonen self-organizing map (SOM) is to cluster four vectors:
vector 1 = (1 1 0 0)
vector 2 = (0 0 0 1)
vector 3 = (1 0 0 0)
vector 4 = (0 0 1 1)
The maximum number of clusters to be formed is m = 2, with learning rate $\eta = 0.6$.
Sol:
With only 2 clusters available, the neighborhood of node J is set so that only one
cluster updates its weights at each step.
Initial weight matrix (row i corresponds to the input component x_i, column j to cluster unit j):
$$\begin{bmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \\ 0.5 & 0.7 \\ 0.9 & 0.3 \end{bmatrix}$$
1- For the first vector (1 1 0 0):
$$D(1) = (1 - 0.2)^2 + (1 - 0.6)^2 + (0 - 0.5)^2 + (0 - 0.9)^2 = 1.86$$
$$D(2) = (1 - 0.8)^2 + (1 - 0.4)^2 + (0 - 0.7)^2 + (0 - 0.3)^2 = 0.98 \quad \text{(minimum)}$$
→ J = 2 (the input vector is closest to output node 2)
→ The weights on the winning unit are updated (here W2i denotes the weight from input i to cluster unit 2):
$$W_{21}(\text{new}) = W_{21}(\text{old}) + 0.6\,(x_1 - W_{21}(\text{old})) = 0.8 + 0.6\,(1 - 0.8) = 0.92$$
$$W_{22}(\text{new}) = 0.4 + 0.6\,(1 - 0.4) = 0.4 + 0.36 = 0.76$$

$$W_{23}(\text{new}) = 0.7 + 0.6\,(0 - 0.7) = 0.28$$
$$W_{24}(\text{new}) = 0.3 + 0.6\,(0 - 0.3) = 0.12$$
This gives the weight matrix
$$\begin{bmatrix} 0.2 & 0.92 \\ 0.6 & 0.76 \\ 0.5 & 0.28 \\ 0.9 & 0.12 \end{bmatrix}$$
2- For the second vector (0 0 0 1):
$$D(1) = (0 - 0.2)^2 + (0 - 0.6)^2 + (0 - 0.5)^2 + (1 - 0.9)^2 = 0.66 \quad \text{(minimum)}$$
$$D(2) = (0 - 0.92)^2 + (0 - 0.76)^2 + (0 - 0.28)^2 + (1 - 0.12)^2 = 2.2768$$
→ J = 1 (the i/p vector is closest to o/p node 1)
After updating the first column of the weight matrix:
$$\begin{bmatrix} 0.08 & 0.92 \\ 0.24 & 0.76 \\ 0.20 & 0.28 \\ 0.96 & 0.12 \end{bmatrix}$$
3- For the third vector (1 0 0 0):
$$D(1) = (1 - 0.08)^2 + (0 - 0.24)^2 + (0 - 0.20)^2 + (0 - 0.96)^2 = 1.8656$$
$$D(2) = (1 - 0.92)^2 + (0 - 0.76)^2 + (0 - 0.28)^2 + (0 - 0.12)^2 = 0.6768 \quad \text{(minimum)}$$
→ J = 2 (the i/p vector is closest to o/p node 2)
After updating the second column of the weight matrix:
$$\begin{bmatrix} 0.08 & 0.968 \\ 0.24 & 0.304 \\ 0.20 & 0.112 \\ 0.96 & 0.048 \end{bmatrix}$$
4- For the fourth vector (0 0 1 1):

$$D(1) = (0 - 0.08)^2 + (0 - 0.24)^2 + (1 - 0.20)^2 + (1 - 0.96)^2 = 0.7056 \quad \text{(minimum)}$$
$$D(2) = (0 - 0.968)^2 + (0 - 0.304)^2 + (1 - 0.112)^2 + (1 - 0.048)^2 = 2.724$$
→ J = 1 (the i/p vector is closest to o/p node 1)
After updating the first column of the weight matrix:
$$\begin{bmatrix} 0.032 & 0.968 \\ 0.096 & 0.304 \\ 0.680 & 0.112 \\ 0.984 & 0.048 \end{bmatrix}$$
→ Reduce the learning rate:
$$\eta(t+1) = 0.5\,\eta(t) = 0.5 \times 0.6 = 0.3$$
→ After one iteration the weight matrix will be (rounded):
$$\begin{bmatrix} 0.032 & 0.970 \\ 0.096 & 0.300 \\ 0.680 & 0.110 \\ 0.984 & 0.048 \end{bmatrix}$$
H.W
Find the output node with the minimum distance, then update its reference vector
only, with $\eta = 0.5$.

C1 C2 C3 C4 C5
0.8
0.7 0.9 0.5
0.6 0.1 0.4 0.3
0.3 0.2

X1 X2

X1=0.5 X2=0.2

5.3 Self-Organizing Networks

Self-organizing networks are systems that are trained by showing
examples of patterns that are to be classified, and the network is allowed to
produce its own output code for the classification.
In self-organizing networks the training can be supervised or
unsupervised. The advantage of unsupervised learning is that the network finds
its own energy minima and therefore tends to be more efficient in terms of the
number of patterns that it can accurately store and recall.
In self-organizing networks four properties are required:
1- The weights in the neurons should be representative of a class of patterns,
so that each neuron represents a different class.
2- Input patterns are presented to all of the neurons, and each neuron
produces an output. The value of the output of each neuron is used as a
measure of the match between the input pattern and the pattern stored in
the neuron.
3- A competitive learning strategy which selects the neuron with the largest
response.
4- A method of reinforcing the largest response.

5.4 Adaptive Resonance Theory (ART)

Adaptive resonance theory (ART) was developed by Carpenter and
Grossberg (1987). One form, ART1, is designed for clustering binary vectors;
another, ART2, also by Carpenter and Grossberg (1987), accepts continuous-valued vectors.
These nets cluster inputs by using unsupervised learning. Input patterns may
be presented in any order. Each time a pattern is presented, an appropriate
cluster unit is chosen and that cluster's weights are adjusted to let the cluster
unit learn the pattern.

5.4.1 Basic Architecture

Adaptive resonance theory nets are designed to allow the user to control the
degree of similarity of patterns placed on the same cluster. ART1 is designed to
cluster binary input vectors. The architecture of an ART1 net consists of the
following units:
1- Computational units.
2- Supplemental units.

1- Computational units:
The architecture of the computational units for ART1 consists of three
fields of units:
1- The F1 units (input and interface units)
2- The F2 units (cluster units)
3- The reset unit
This main portion of the ART1 architecture is illustrated in the figure
below:

[Figure: three stacked layers.
Top:    F2 layer (cluster units)      Y1 ... Yj ... Yr
Middle: F1(b) layer (interface units) X1 ... Xi ... Xn
Bottom: F1(a) layer (input units)     S1 ... Si ... Sn]

"Figure (4.5) Basic structure of ART1"

The F1 layer can be considered to consist of two parts:-
1- F1(a): the input units.
2- F1(b): the interface units.

Each unit in the F1(a) (input) layer is connected to the
corresponding unit in the F1(b) (interface) layer. Each unit in the F1(a)
and F1(b) layers is connected to the reset unit, which in turn is connected to
every F2 unit. Each unit in the F1(b) layer is connected to each unit in the F2
(cluster) layer by two weighted pathways:-
1- Bottom-up weights:-
The F1(b) unit Xi is connected to the F2 unit Yj by bottom-up weights bij.

2- Top-down weights:-
Unit Yj is connected to unit Xi by top-down weights tji.

The F2 layer is a competitive layer in which only the uninhibited node
with the largest net input has a nonzero activation.
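
The competition in the F2 layer can be sketched as follows. The net input to Yj is taken to be the sum of bij * xi over the interface units, and the boolean mask "inhibited" is a hypothetical way of recording which F2 units the reset unit has switched off. The weight values are made up for illustration.

```python
import numpy as np

def f2_winner(x, b, inhibited):
    """Return the index of the winning F2 unit, or None if every unit is inhibited.

    x:         (n,) binary activations of the F1(b) interface units.
    b:         (n, m) bottom-up weights, b[i, j] from Xi to Yj.
    inhibited: (m,) boolean mask of F2 units that may not compete.
    """
    inhibited = np.asarray(inhibited, dtype=bool)
    if inhibited.all():
        return None
    net = np.asarray(x, dtype=float) @ b     # net input to each Yj
    net = np.where(inhibited, -np.inf, net)  # inhibited units cannot win
    return int(np.argmax(net))

x = np.array([1, 0, 1])
b = np.array([[0.2, 0.5],
              [0.9, 0.1],
              [0.3, 0.4]])                   # illustrative weights only
first = f2_winner(x, b, inhibited=[False, False])   # net inputs are 0.5 and 0.9
second = f2_winner(x, b, inhibited=[False, True])   # Y2 inhibited, so Y1 wins
```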

2- Supplemental units:-
The supplemental units shown in Figure (4.6) are important from a
theoretical point of view. There are two supplemental units called gain
control units. These are:-
1- Gain 1 (G1)
2- Gain 2 (G2)
in addition to the reset unit R.
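[Figure: the F1(a) (input), F1(b) (interface) and F2 (cluster) layers together
with the supplemental units G1, G2 and R. F1(b) and F2 are linked by the weights
bij and tji. Excitatory connections are marked (+) and inhibitory connections (-).]

"Figure (4.6) The supplemental units for ART1"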

Excitatory signals are indicated by (+) and inhibitory signals by (-). A
signal is sent whenever any unit in the designated layer is (on).
Each unit in either the F1(b) or F2 layer of the ART1 net has three sources
from which it can receive a signal:-
1- An F1(b) unit can receive signals from:-
- F1(a) (an input signal)
- an F2 node (a top-down signal)
- the G1 unit.
2- An F2 unit can receive signals from:-
- F1(b) (an interface unit)
- the R unit (reset unit)
- the G2 unit.

An F1(b) (interface) or F2 unit must receive two excitatory signals in
order to be (on). Since there are three possible sources of signals, this
requirement is called the two-thirds rule.
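
As a small illustration, the two-thirds rule can be written as a test on three boolean signals. Treating "two excitatory signals" as "at least two of the three sources" is an assumption of this sketch.

```python
def two_thirds_on(signals):
    """True if a unit is (on) under the two-thirds rule.

    signals: exactly three booleans, one per possible source, True where
    that source is currently sending an excitatory signal.
    """
    if len(signals) != 3:
        raise ValueError("a unit has exactly three possible signal sources")
    return sum(bool(s) for s in signals) >= 2

# An F1(b) unit with an input signal from F1(a) and a signal from G1,
# but no top-down signal from F2:
interface_on = two_thirds_on([True, False, True])
# The same unit with only the input signal:
interface_off = two_thirds_on([True, False, False])
```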
The reset unit R controls the vigilance matching. The degree of similarity
required for patterns to be assigned to the same cluster unit is controlled by a
user-specified parameter, known as the vigilance parameter. When any unit in
the F1(a) layer is (on), an excitatory signal is sent to R. The strength of that
signal depends on how many F1(a) units are (on). R also receives inhibitory
signals from the F1(b) units that are (on). If enough F1(b) units are (on), unit R
is prevented from firing. If unit R does fire, it inhibits any F2 unit that is (on).
This forces the F2 layer to choose a new winning node.
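
The behaviour of the reset unit can be sketched with the usual ART1 vigilance test, in which R fires when the ratio of (on) F1(b) units to (on) F1(a) units falls below the vigilance parameter. The formula and the name rho are assumptions taken from the standard ART1 formulation, since these notes describe the mechanism only in words.

```python
import numpy as np

def reset_fires(s, x, rho):
    """Return True if the reset unit R fires.

    s:   binary F1(a) input vector (excitatory signal to R).
    x:   binary F1(b) activation vector (inhibitory signal to R).
    rho: vigilance parameter, assumed to lie in (0, 1].
    """
    on_inputs = int(np.sum(s))
    on_interface = int(np.sum(x))
    if on_inputs == 0:
        return False  # no F1(a) unit is on, so R receives no excitation
    return on_interface / on_inputs < rho

s = np.array([1, 1, 1, 0])
x = np.array([1, 1, 0, 0])          # two of the three input bits are matched
strict = reset_fires(s, x, rho=0.7) # 2/3 is below 0.7, so R fires
loose = reset_fires(s, x, rho=0.6)  # 2/3 is above 0.6, so R stays off
```

A higher vigilance therefore demands a closer match before a pattern is accepted by a cluster unit.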

Two types of learning, which differ both in their theoretical
assumptions and in their performance characteristics, can be used for ART nets:-

1- Fast learning
It is assumed that weight updates during resonance occur rapidly, so that
the weights reach equilibrium on each trial. The ART1 net is assumed to be
operated in the fast learning mode.

2- Slow learning
The weight changes occur slowly relative to the duration of a learning trial,
so the weights do not reach equilibrium on a particular trial.
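
For fast learning, the equilibrium weights can be written down directly. The sketch below uses the update commonly given for ART1, bij = L*xi / (L - 1 + ||x||) and tji = xi, where ||x|| is the number of (on) interface units. These formulas and the constant L are assumptions from the standard ART1 formulation and do not appear in these notes.

```python
import numpy as np

def fast_learning_update(x, b, t, j, L=2.0):
    """Set the weights of winning cluster unit j to their equilibrium values.

    x: (n,) binary F1(b) activations during resonance.
    b: (n, m) bottom-up weights; t: (m, n) top-down weights.
    L: constant greater than 1 (assumed value 2.0).
    """
    x = np.asarray(x, dtype=float)
    norm_x = x.sum()
    b[:, j] = L * x / (L - 1.0 + norm_x)  # bottom-up weights into Yj
    t[j, :] = x                           # top-down weights out of Yj
    return b, t

b = np.full((3, 2), 0.25)
t = np.ones((2, 3))
b, t = fast_learning_update([1, 0, 1], b, t, j=0)
```

Only the weights attached to the winning unit change; the other cluster unit keeps its previous weights.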

H.W
Q1: Write the complete algorithm for the Kohonen neural network.

Q2: There are two basic units in the ART1 architecture. List them and draw the
figure for each one of them.

Q3: There are two kinds of learning in the ART neural network. Briefly explain
each one of them. Which kind does ART1 use?

Q4: Define the following expressions:-
1- Euclidean distance
2- Vigilance matching
3- Bottom-up and top-down weights
4- Two-thirds rule

6.1 Genetic Algorithm (GA)

Adaptive algorithm
    Neural computing (NN)
    Fuzzy system (FS)
    Evolutionary computation (EC)
        Genetic Programming (GP)
        Evolutionary Programming (EP)
        Evolutionary Strategies (ES)
        Classifier System (CS)
        Genetic Algorithm (GA)

Structure of Adaptive Algorithm

A genetic algorithm is a search procedure modelled on the mechanics of
natural selection rather than a simulated reasoning process. Domain knowledge
is embedded in the abstract representation of a candidate solution, termed an
organism. Organisms are grouped into sets called populations. Successive
populations are called generations. The aim of a GA is to search for a goal
solution.
A generational GA creates an initial generation G(0) and, for each
generation G(t), generates a new one, G(t+1). An abstract view of the
algorithm is:-

Generate initial population, G(0);
Evaluate G(0);
t := 0;
Repeat
    t := t + 1;
    Generate G(t) using G(t-1);
    Evaluate G(t);
Until solution is found.
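
The abstract algorithm above can be sketched in Python as follows. The function and parameter names are hypothetical, and the cap max_generations is an added safeguard that the pseudocode does not have.

```python
def run_generational_ga(initial_population, evaluate, next_generation,
                        is_solution, max_generations=1000):
    """Mirror of the Repeat ... Until loop; returns the final G(t), its scores and t."""
    population = initial_population              # G(0)
    scores = [evaluate(c) for c in population]   # Evaluate G(0)
    t = 0
    while True:
        t += 1
        population = next_generation(population, scores)  # G(t) from G(t-1)
        scores = [evaluate(c) for c in population]        # Evaluate G(t)
        # Until a solution is found (or the assumed generation cap is hit)
        if any(is_solution(c) for c in population) or t >= max_generations:
            return population, scores, t

# Toy usage: every generation simply adds 1 to each organism.
final, final_scores, generations = run_generational_ga(
    initial_population=[1, 2, 3],
    evaluate=lambda c: c,
    next_generation=lambda pop, scores: [c + 1 for c in pop],
    is_solution=lambda c: c == 5,
)
```

In a real GA, next_generation would apply the genetic operators described in the next section instead of the toy rule used here.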

6.1.1 Genetic Operators


The process of evolving a solution to a problem involves a number of
operations that are loosely modelled on their counterparts from biological
genetics. Pairs of vectors in the population are allowed to "mate" with a
probability that is proportional to their fitness. The mating procedure
typically involves one or more genetic operators.
The most commonly applied genetic operators are:-
1- Crossover.
2- Mutation.
3- Reproduction.
1- Crossover
Crossover is the process in which information from two parents is combined to
form children. It takes two chromosomes and swaps all genes residing after a
randomly selected crossover point to produce new chromosomes.
This operator does not add new genetic information to the population
chromosomes but manipulates the genetic information already present in the
mating pool (MP).

The hope is to obtain new, fitter children. It works as follows:-
1- Select two parents from the MP (the best two chromosomes).
2- Find a position K between two genes randomly in the range (1, M-1),
where M = length of the chromosome.
3- Swap the genes after K between the two parents.
The output will be both children or the fitter one.
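
A minimal sketch of these three steps, assuming chromosomes are Python lists of genes. The function name and the optional argument k (used to fix the crossover point instead of drawing it randomly) are illustrative choices.

```python
import random

def single_point_crossover(parent1, parent2, k=None):
    """Swap the genes after position k between two equal-length parents.

    If k is None it is drawn uniformly from 1 .. M-1, where M is the
    chromosome length. Returns (child1, child2, k).
    """
    m = len(parent1)
    if len(parent2) != m or m < 2:
        raise ValueError("parents must have the same length, at least 2")
    if k is None:
        k = random.randint(1, m - 1)   # randint includes both end points
    child1 = parent1[:k] + parent2[k:]
    child2 = parent2[:k] + parent1[k:]
    return child1, child2, k

c1, c2, k = single_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], k=1)
r1, r2, rk = single_point_crossover([1, 1, 1, 1], [0, 0, 0, 0])
```

With k = 1 the first gene stays with its parent and the remaining three genes are exchanged.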

2- Mutation
The basic idea of mutation is to add new genetic information to chromosomes.
It is important when the chromosomes are similar and the GA may be stuck in a
local maximum. A way to introduce new information is by changing the value
(allele) of some genes. Mutation can be applied to:-
1- Chromosomes selected from the MP.
2- Chromosomes that have already been subjected to crossover.
Figure (5.1) below illustrates schematically the GA approach.
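
For binary chromosomes, mutation is often sketched as flipping each gene with a small probability. The per-gene rate used here is an assumed parameter; the notes do not give one.

```python
import random

def mutate(chromosome, rate=0.05):
    """Return a copy of a binary chromosome with each gene flipped with probability rate."""
    return [1 - gene if random.random() < rate else gene for gene in chromosome]

original = [0, 1, 1, 0, 1]
unchanged = mutate(original, rate=0.0)   # no gene can flip
inverted = mutate(original, rate=1.0)    # every gene flips
```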

3- Reproduction
After the genetic information already present in the MP has been manipulated,
the reproduction operator uses the fitness function to add new genetic
information to the population of chromosomes by combining strong parents with
strong children. The hope is to obtain new, fitter children. Reproduction
imitates natural selection.
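
One way to read "combining strong parents with strong children" is to pool both groups and keep the fittest members. The sketch below takes that reading as an assumption; the function name and arguments are hypothetical.

```python
def reproduce(parents, children, fitness, size):
    """Keep the `size` fittest chromosomes from the parents and children together."""
    pool = list(parents) + list(children)
    return sorted(pool, key=fitness, reverse=True)[:size]

# Chromosomes are plain integers here and the fitness is the value itself.
survivors = reproduce(parents=[5, 9], children=[7, 2],
                      fitness=lambda c: c, size=2)
```

The strong parent 9 and the strong child 7 survive, while the weaker parent and child are dropped.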
The schematic diagram of a genetic algorithm in Figure (5.1) shows the
functions that are carried out in each generation. Over a number of such
generations the initial population is evolved to the point where it can meet
some criterion with respect to the problem at hand.

[Figure: flow of one generation.
Initialize first population -> old vector population -> evaluate each vector
for fitness -> select pairs of vectors for mating on basis of fitness -> apply
crossover and mutation operators -> new vector population.
Replace old population with new population until some criterion has been
achieved.]

"Figure (5.1) Genetic Algorithm approach"



6.2 Genetic Programming (GP)

Genetic programming (GP) is a domain-independent problem-solving
approach in which computer programs are evolved to solve, or approximately
solve, problems. Thus, it addresses one of the central goals of computer science,
namely automatic programming. The goal of automatic programming is to
create, in an automated way, a computer program that enables a computer to
solve a problem.
GP is based on reproduction and survival of the fittest, together with genetic
operations such as crossover and mutation. Genetic operations are used to
create a new offspring population of individual computer programs from the
current population of programs.
GP has several properties that make it more suitable than other paradigms
(e.g. best-first search, heuristic search, hill climbing, etc.). These
properties are:-
1- GP produces a solution to a problem as a computer program. Thus GP is
automatic programming.
2- The structures that undergo adaptation in GP are general hierarchical
computer programs of dynamically varying size and shape.
3- It is a probabilistic algorithm.
4- Another important feature of GP is the role of preprocessing of inputs and
postprocessing of outputs.
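
To make property 2 concrete, a program can be held as a tree whose size and shape change when a subtree is replaced. The nested-tuple representation, the three operators and the helper names below are all assumptions chosen for illustration, not a prescribed GP encoding.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    """Run a program tree on input x. Leaves are the variable "x" or constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def size(tree):
    """Number of nodes in the program tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1

def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path (0 = left, 1 = right) replaced."""
    if not path:
        return new
    op, left, right = tree
    if path[0] == 0:
        return (op, replace_subtree(left, path[1:], new), right)
    return (op, left, replace_subtree(right, path[1:], new))

program = ("+", ("*", "x", "x"), 1)                      # x*x + 1
variant = replace_subtree(program, [1], ("-", "x", 2))   # x*x + (x - 2)
```

Subtree replacement of this kind is the building block of crossover and mutation on programs: the variant has a different size and shape from its parent but is still a valid program.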

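EX. 11:-
Using a GA step by step, find the maximum number in the range 0 to 31. Let k = 3
and population size = 4. The initial population is:-
    14    0 1 1 1 0
     3    0 0 0 1 1
    25    1 1 0 0 1
    21    1 0 1 0 1
The fitness of a chromosome is the number it encodes, so the parents are paired
by fitness as 25 & 21 and 14 & 3.

Crossover of 25 & 21 (the genes from position k = 3 onward are swapped):-
    parents     1 1 0 0 1 = 25
                1 0 1 0 1 = 21
    children    1 1 1 0 1 = 29
                1 0 0 0 1 = 17

Crossover of 14 & 3:-
    parents     0 1 1 1 0 = 14
                0 0 0 1 1 = 3
    children    0 1 0 1 1 = 11
                0 0 1 1 0 = 6

The new population is treated as one array of 20 bits, and position [16] is
chosen randomly for mutation. Position 16 is the first bit of the fourth
chromosome, which is changed from 0 to 1:-
    before mutation     after mutation
    1 1 1 0 1           1 1 1 0 1 = 29
    1 0 0 0 1           1 0 0 0 1 = 17
    0 1 0 1 1           0 1 0 1 1 = 11
    0 0 1 1 0           1 0 1 1 0 = 22

After mutation, reproduction keeps the four fittest chromosomes among the
parents and the children, so the new population will be:-
    1 1 1 0 1 = 29
    1 1 0 0 1 = 25
    1 0 1 0 1 = 21
    1 0 1 1 0 = 22
Because of the mutation, 22 replaces 17 in the new population.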
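
The arithmetic of EX. 11 can be checked with a short script. It assumes, as the numbers in the example imply, that k = 3 means the genes from position 3 onward are swapped and that positions in the 20-bit array are counted from 1. The helper names are hypothetical.

```python
def crossover_from(p1, p2, k):
    """Swap the genes from 1-based position k onward between two bit strings."""
    cut = k - 1
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def flip_bit(population, position):
    """Flip one bit of the population viewed as a single array (1-based position)."""
    n = len(population[0])
    bits = list("".join(population))
    bits[position - 1] = "1" if bits[position - 1] == "0" else "0"
    joined = "".join(bits)
    return [joined[i:i + n] for i in range(0, len(joined), n)]

def value(chromosome):
    return int(chromosome, 2)   # fitness: the number the bits encode

parents = ["01110", "00011", "11001", "10101"]        # 14, 3, 25, 21
ranked = sorted(parents, key=value, reverse=True)     # 25, 21, 14, 3
children = []
for a, b in zip(ranked[0::2], ranked[1::2]):          # pairs 25 & 21, 14 & 3
    children.extend(crossover_from(a, b, k=3))
mutated = flip_bit(children, position=16)
new_population = sorted(parents + mutated, key=value, reverse=True)[:4]
print([value(c) for c in new_population])             # [29, 25, 22, 21]
```

The script gives the same four chromosomes as the worked example, listed in order of fitness.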



EX 12
Apply a GA to the travelling salesman problem to find the shortest path. Let
k = 2. The initial population (each tour followed by its path length) is:-

    A B C D E = 12
    B C D E A = 10
    A C D B E = 11
    E C A D B = 11
    B A D C E = 10
    D E B A C = 10

First pair:-
    parents     B C D E A = 10
                B A D C E = 10
    children    B C D A E = 6
                B A D E C = 13

Second pair:-
    parents     D E B A C = 10
                A C D B E = 11
    children    D E B A C = 10
                A C D E B = 9

Third pair:-
    parents     E C A D B = 11
                A B C D E = 12
    children    E C A D B = 11
                A B C D E = 12

THE NEW POPULATION IS:-



B C D A E = 6
B A D E C =13
D E B A C = 10
A C D E B =9
E C A D B =11
A B C D E =12

Either we take the best three chromosomes from the new generation and the best
three chromosomes from the previous generation and apply crossover to them, or
we apply crossover to the new generation only.

H.W

Q1: Can the bit string 0 1 0 1 0 1 0 1 be the result of crossing over the
following pairs of parents?:-
a- 11111111 and 00000000
b-01010101 and 11111111
c-10100101 and 01011010

Q2: What is a genetic algorithm (GA)? Explain its algorithm.

Q3: What are the most commonly used operators in GA? List them only, then
draw the figure which illustrates schematically the GA approach.

Q4: The adaptive algorithm includes GA and GP in one part of it. Illustrate
schematically the main structure of the adaptive algorithm.
