NEURAL NETWORKS AND DEEP LEARNING
Subject Code:
R18 B.Tech. CSE (AIML) IV Year – I Semester
By
K.M.N.Vardhini
Assistant Professor
Department of Artificial Intelligence and Machine Learning
Sphoorthy Engineering College
Nadergul, Hyderabad – 501510.
UNIT-I
Artificial Neural Networks Introduction, Basic models of ANN, important
terminologies, Supervised Learning Networks, Perceptron Networks, Adaptive Linear
Neuron, Back-propagation Network. Associative Memory Networks. Training
Algorithms for pattern association, BAM and Hopfield Networks.
ARTIFICIAL NEURAL NETWORKS
Introduction
✔ The study of Artificial Neural Networks (ANN) began with Warren McCulloch and Walter Pitts
(1943), who created a computational model for neural networks based on algorithms
called threshold logic.
✔ An ANN is a computational model that mimics the functioning of the human brain to perform
various tasks faster than traditional systems. It is an efficient information-processing
system that resembles a biological neural network in its characteristics.
✔ The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modelled after the human brain.
✔ An artificial neural network is a computational network modelled on the biological
neural networks that make up the human brain. Just as the human brain has neurons
interconnected with each other, an artificial neural network has artificial neurons
that are linked to each other in the various layers of the network.
✔ These neurons are known as perceptrons or nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
✔ Input signals are received by the dendrites and passed on to the cell body (soma)
for processing. Incoming signals can be either excitatory, which means they tend
to make the neuron fire (generate an electrical impulse), or inhibitory, which
means they tend to keep the neuron from firing.
✔ Most neurons receive many input signals throughout their dendritic trees. A
single neuron may have more than one set of dendrites and may receive many
thousands of input signals. Whether a neuron is excited enough to fire an
impulse depends on the sum of all the excitatory and inhibitory signals it
receives. This information is processed in the soma, which is the cell body of
the neuron. If the neuron does end up firing, the nerve impulse, or action
potential, is conducted down the axon.
✔ Towards its end, the axon splits into many branches known as axon
terminals (or nerve terminals), which make connections with target cells.
✔ The junctions that allow signal transmission between the axon terminals and
the dendrites are called synapses. Transmission takes place by diffusion of
chemicals called neurotransmitters across the synaptic cleft.
✔ Relationship between biological neural network and artificial neural network:
Biological Neural Network | Artificial Neural Network
Dendrites                 | Inputs
Cell nucleus              | Nodes
Synapse                   | Weights
Axon                      | Output
✔ An artificial neural network, in the field of artificial intelligence, attempts
to mimic the network of neurons that makes up the human brain, so that
computers can understand things and make decisions in a human-like manner.
✔ The human brain contains roughly 86 billion neurons (often rounded to 100
billion). Each neuron has somewhere between 1,000 and 100,000 connection
points. In the human brain, data is stored in a distributed manner, and we can
retrieve more than one piece of this data from memory in parallel when
necessary. We can say that the human brain is made up of incredibly powerful
parallel processors.
✔ We can understand the artificial neural network with an example. Consider a
digital logic gate that takes inputs and gives an output, such as an "OR" gate,
which takes two inputs. If one or both of the inputs are "On," the output is
"On." If both inputs are "Off," the output is "Off." Here the output depends
only on the input. Our brain does not work in this way: the relationship
between outputs and inputs keeps changing because the neurons in our brain
are "learning."
The architecture of an artificial neural network:
✔ To understand the architecture of an artificial neural network, we have to
understand what a neural network consists of. A neural network consists of a large
number of artificial neurons, termed units, arranged in a sequence of layers. Let's
look at the various types of layers available in an artificial neural network.
✔ Artificial Neural Network primarily consists of three layers:
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by
the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.
✔ The artificial neuron model has n inputs, denoted as x1, x2, ..., xn. Each line
connecting these inputs to the neuron is assigned a weight, denoted as
w1, w2, ..., wn respectively.
✔ Weights in the artificial model correspond to the synaptic connections in
biological neurons.
✔ The threshold in an artificial neuron is usually represented by Θ, and the activation
corresponding to the graded potential is given by the net input:
$$y_{in} = x_1 w_1 + x_2 w_2 + \dots + x_n w_n = \sum_{i=1}^{n} x_i w_i$$
□ The output can be calculated by applying the activation function over the
net input.
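The neuron model above can be sketched in a few lines of Python. This is only an illustration: the function names, the weights and the threshold value are assumptions chosen so that the neuron behaves like the "OR" gate mentioned earlier, and a binary step is assumed as the activation function.

```python
def net_input(inputs, weights):
    """Weighted sum of the inputs: y_in = sum(x_i * w_i)."""
    return sum(x * w for x, w in zip(inputs, weights))


def threshold_output(y_in, theta):
    """Binary step activation: output 1 when the net input reaches theta."""
    return 1 if y_in >= theta else 0


# Assumed values for illustration: two inputs with unit weights, threshold 1.
weights = [1.0, 1.0]
theta = 1.0
or_outputs = [threshold_output(net_input([a, b], weights), theta)
              for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(or_outputs)  # → [0, 1, 1, 1]
```

With other weights or another threshold the same neuron would realise a different function, which is what training adjusts.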
Advantages of Artificial Neural Network (ANN)
□ Parallel processing capability:
o Artificial neural networks can perform more than one task simultaneously.
□ Storing data on the entire network:
o Unlike in traditional programming, where data is stored in a database, information
is stored across the whole network. The loss of a few pieces of data in one place
doesn't prevent the network from working.
□ Capability to work with incomplete knowledge:
o After training, an ANN may produce output even with incomplete data. The
loss of performance depends on the significance of the missing data.
□ Having a memory distribution:
o For an ANN to be able to adapt, it is important to choose suitable examples and
to train the network towards the desired output by presenting these examples
to it. The success of the network depends directly on the chosen examples;
if an event cannot be shown to the network in all its aspects, the network
can produce false output.
□ Having fault tolerance:
o Corruption of one or more cells of an ANN does not prevent it from
generating output; this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network (ANN):
□ Assurance of proper network structure:
o There is no particular guideline for determining the structure of artificial neural
networks. The appropriate network structure is accomplished through
experience, trial, and error.
□ Unrecognized behaviour of the network:
o This is the most significant issue of ANN. When an ANN produces a
solution, it does not provide insight into why and how. This decreases
trust in the network.
□ Hardware dependence:
o Artificial neural networks need processors with parallel processing power, in
accordance with their structure. Their realization therefore depends on suitable
hardware.
□ Difficulty of showing the issue to the network:
o ANNs work with numerical data. Problems must be converted into
numerical values before being introduced to the ANN. The representation
chosen here directly affects the performance of the network and relies
on the user's abilities.
□ The duration of the network is unknown:
o Training is considered complete when the error is reduced to a specified
value, and this value does not guarantee optimum results.
How do artificial neural networks work?
✔ An Artificial Neural Network can best be represented as a weighted directed graph,
where the artificial neurons form the nodes.
✔ The associations between neuron outputs and neuron inputs can be viewed as
directed edges with weights.
✔ The Artificial Neural Network receives the input signal from an external source,
such as a pattern or an image, in the form of a vector.
✔ These inputs are then denoted mathematically by x(n) for each of the n inputs.
✔ Afterward, each input is multiplied by its corresponding weight
(these weights are the details the artificial neural network uses to
solve a specific problem). In general terms, the weights represent
the strength of the interconnections between neurons inside the
artificial neural network. All the weighted inputs are summed
inside the computing unit.
✔ If the weighted sum is equal to zero, a bias is added to make the output
non-zero, or otherwise to scale up the system's response. The bias has
a fixed input of 1 and a weight of its own. The total of the weighted inputs
is not bounded in magnitude. To keep the response within the limits of
the desired value, a certain maximum value is benchmarked, and the total
of the weighted inputs is passed through the activation function.
BASIC MODEL OF ARTIFICIAL NEURAL NETWORK
✔ The models of ANN are specified by the three basic entities namely:
1. The model's synaptic interconnection.
2. The training rules or learning rules adopted for updating and adjusting the
connection weights.
3. Activation functions.
The model's synaptic interconnection.
✔ Interconnection can be defined as the way the processing elements (neurons) in an ANN
are connected to each other. Hence, the arrangement of these processing elements and
the geometry of their interconnections are essential in an ANN.
✔ These arrangements always have two layers that are common to all network architectures,
the input layer and the output layer, where the input layer buffers the input signal and
the output layer generates the output of the network.
✔ The third layer is the hidden layer, whose neurons belong neither to the input layer
nor to the output layer. These neurons are hidden from the people who interface
with the system and act as a black box to them.
✔ By adding hidden layers and neurons, the system's computational and processing
power can be increased, but training the system becomes more complex at
the same time.
✔ There are three fundamentally different classes of network architectures:
a) Single-Layer Feedforward Networks
b) Multilayer Feedforward Networks
c) Recurrent Networks
a) Single-layer feed-forward network
✔ In this type of network, we have an input layer
and an output layer, but the input layer does not
count because no computation is
performed in it.
✔ The output layer is formed when different
weights are applied to the input nodes and the
cumulative effect per node is taken.
✔ The neurons of the output layer then
collectively compute the output
signals.
b) Multilayer Feedforward Networks
✔ This network has one or more hidden layers.
The term "hidden" refers to the fact that this part
of the neural network is not seen directly from
either the input or output of the network.
✔ The function of hidden neurons is to intervene
between the external input and the network
output in some useful manner.
✔ The existence of one or more hidden layers
enables the network to be computationally
stronger.
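A forward pass through a multilayer feedforward network can be sketched as follows. The 2-3-1 layout, the weight values and the choice of a sigmoid in every unit are assumptions made purely for illustration; the point is only that the hidden layer sits between the external input and the network output.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each unit applies the sigmoid to its net input.

    weights[j][i] is the weight from input i to unit j (layout assumed here).
    """
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


# Hypothetical 2-3-1 network with hand-picked weights.
hidden_w = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.2, 0.4]]
output_b = [0.05]

x = [1.0, 0.0]
hidden = layer_forward(x, hidden_w, hidden_b)  # input layer -> hidden layer
y = layer_forward(hidden, output_w, output_b)  # hidden layer -> output layer
print(hidden, y)
```

The signal only ever moves forward, from input to hidden to output; no unit feeds its output back to an earlier layer.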
c) Recurrent Networks
✔ This neural network distinguishes itself from a feedforward
neural network in that it has at least one feedback loop.
Single Layer Recurrent Network
✔ This network is a single-layer network with a feedback
connection in which the processing element's output can
be directed back to itself or to other processing
elements or both.
✔ A recurrent neural network is a class of artificial neural
network where the connection between nodes forms a
directed graph along a sequence.
✔ This allows it to exhibit dynamic temporal behaviour
for a time sequence. Unlike feedforward neural
networks, RNNs can use their internal state (memory) to
process sequences of inputs.
Multilayer Recurrent Network
✔ In this type of network, processing element
output can be directed to the processing
element in the same layer and in the preceding
layer forming a multilayer recurrent network.
✔ They perform the same task for every element
of the sequence, with the output being
dependent on the previous computations. Inputs
are not needed at each time step.
✔ The main feature of a multilayer recurrent
network is its hidden state, which captures
information about a sequence.
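The role of the hidden state can be illustrated with a minimal recurrent step. This is a sketch under assumptions: the tanh activation, the weight values and the function name rnn_step are not taken from the notes. The input is supplied only at the first time step, yet the hidden state continues to carry information forward.

```python
import math


def rnn_step(x_t, h_prev, w_in, w_rec, bias):
    """One time step: the new hidden state mixes the input with the fed-back state."""
    return [math.tanh(bias[j]
                      + sum(w_in[j][i] * x_t[i] for i in range(len(x_t)))
                      + sum(w_rec[j][k] * h_prev[k] for k in range(len(h_prev))))
            for j in range(len(bias))]


# Hypothetical weights: 1 input unit, 2 hidden units.
w_in = [[0.5], [-0.3]]
w_rec = [[0.1, 0.2], [0.0, 0.4]]   # feedback connections between hidden units
bias = [0.0, 0.0]

h = [0.0, 0.0]                      # initial hidden state
states = []
for x_t in [[1.0], [0.0], [0.0]]:   # non-zero input only at the first step
    h = rnn_step(x_t, h, w_in, w_rec, bias)
    states.append(h)
print(states)
```

The same function and the same weights are applied at every time step, which matches the statement that the network performs the same task for every element of the sequence.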
The training rules or learning rules adopted for updating and
adjusting the connection weights.
✔ Just as there are different ways in which we ourselves learn from our
own surrounding environments, so it is with neural networks. Their learning
processes can be categorized as follows: learning with a teacher (supervised
learning) and learning without a teacher (unsupervised learning and
reinforcement learning).
Supervised Learning
o Learning with a teacher is also referred to as supervised learning. In conceptual
terms, we may think of the teacher as having knowledge of the environment, while
the environment is unknown to the neural network.
o For each training vector drawn from the environment, the teacher is able to
provide the neural network with a desired response. Indeed, the desired response
represents the "optimum" action to be performed by the neural network.
o The network parameters are adjusted under the combined influence of the
training vector and the error signal. The error signal is defined as the difference
between the desired response and the actual response of the network. This
adjustment is carried out iteratively in a step-by-step fashion with the aim of
eventually making the neural network emulate the teacher; the emulation is
presumed to be optimum in some statistical sense.
o In this way, knowledge of the environment available to the teacher is
transferred to the neural network through training and stored in the form of
"fixed" synaptic weights, representing long-term memory. When this condition
is reached, we may then dispense with the teacher and let the neural network
deal with the environment completely by itself.
Unsupervised Learning
o Unsupervised, or self-organized, learning is done without the supervision of a
teacher. The goal of unsupervised learning is to find the underlying structure of a
dataset, group the data according to similarities, and represent the dataset in a
compressed format.
o To perform unsupervised learning, we may use a competitive-learning rule. For
example, we may use a neural network that consists of two layers, an input layer
and a competitive layer. The input layer receives the available data.
o The competitive layer consists of neurons that compete with each other (in
accordance with a learning rule) for the "opportunity" to respond to features
contained in the input data. In its simplest form, the network operates in
accordance with a "winner-takes-all" strategy.
o In such a strategy, the neuron with the greatest total input "wins" the
competition and turns on; all the other neurons in the network then
switch off.
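The "winner-takes-all" strategy can be sketched as below. Picking the neuron with the greatest total input follows the description above; the update that moves only the winner's weights towards the input is one common competitive-learning rule and is an assumption here, as are the function names and numbers.

```python
def winner_takes_all(x, weights):
    """Return the index of the neuron with the greatest total input."""
    totals = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(totals)), key=lambda j: totals[j])


def competitive_update(x, weights, alpha):
    """Move only the winner's weight vector towards the input (assumed rule)."""
    j = winner_takes_all(x, weights)
    weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return j


# Two competing neurons with illustrative starting weights.
weights = [[0.9, 0.1], [0.2, 0.8]]
winner = competitive_update([1.0, 0.0], weights, alpha=0.5)
print(winner, weights)
```

Here neuron 0 has the larger total input, so it wins and its weights move closer to the input, while the weights of the losing neuron stay untouched.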
Reinforcement Learning
o Reinforcement learning is a feedback-based learning technique in which an
agent learns to behave in an environment by performing actions and observing
their results. For each good action, the agent gets positive feedback,
and for each bad action, the agent gets negative feedback or a penalty.
o Since there is no labelled data, the agent is bound to learn from its experience
only.
o The agent interacts with the environment and explores it by itself. The primary
goal of an agent in reinforcement learning is to improve the performance by
getting the maximum positive rewards.
o The goal of reinforcement learning is to minimize a cost-to-go function,
defined as the expectation of the cumulative cost of actions taken over
a sequence of steps instead of simply the immediate cost.
Activation functions
✔ The activation function is applied over the net input to calculate the output of an
ANN. An integration function (say f) is associated with the input of a
processing element. This function serves to combine activation, information or
evidence from an external source or other processing elements into a net input
to the processing element.
✔ When a signal is fed through a multilayer network with linear activation
functions, the output obtained is the same as that which could be obtained using a
single-layer network. For this reason, nonlinear functions are more widely used
than linear functions in multilayer networks.
✔ What is an activation function and why use them?
The activation function decides whether a neuron should be activated or not by
calculating the weighted sum and further adding bias to it. The purpose of the
activation function is to introduce non-linearity into the output of a neuron.
Explanation: We know, the neural network has neurons that work in
correspondence with weight, bias, and their respective activation function. In a
neural network, we would update the weights and biases of the neurons on the basis
of the error at the output. This process is known as back-propagation. Activation
functions make the back-propagation possible since the gradients are supplied along
with the error to update the weights and biases.
✔ Why do we need Non-linear activation function?
A neural network without an activation function is essentially just a
linear regression model. The activation function applies a non-linear
transformation to the input, making the network capable of learning and
performing more complex tasks.
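The claim that a multilayer network with linear activations is no stronger than a single-layer network can be checked numerically. In this sketch the weight matrices and helper names are assumptions; two linear layers are applied in sequence and compared with one layer whose weight matrix is the product of the two.

```python
def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical weights for two linear (identity-activation) layers.
w1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # 2 inputs -> 3 hidden units
w2 = [[1.0, 0.0, 2.0]]                        # 3 hidden units -> 1 output

x = [2.0, -1.0]
two_layer = matvec(w2, matvec(w1, x))         # pass through both layers
single_layer = matvec(matmul(w2, w1), x)      # one combined weight matrix
print(two_layer, single_layer)  # → [10.0] [10.0]
```

Inserting a nonlinear activation between the two layers breaks this equivalence, which is why nonlinear functions are preferred in multilayer networks.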
✔ There are many activation functions available. In this part, we'll look at a few:
Identity function:
o It is a linear function and can be defined as f(x)=x for all x
o The output here remains the same as input. The input layer uses the
identity activation function.
Binary step function:
o This function can be defined as
$$f(x) = \begin{cases} 1 & \text{if } x \ge \Theta \\ 0 & \text{if } x < \Theta \end{cases}$$
where Θ represents the threshold value. This function is most widely
used in single-layer nets to convert the net input to an output
that is binary (1 or 0).
Bipolar step function:
o This function can be defined as
$$f(x) = \begin{cases} 1 & \text{if } x \ge \Theta \\ -1 & \text{if } x < \Theta \end{cases}$$
where Θ represents the threshold value. This function is also used in
single-layer nets to convert the net input to an output that is
bipolar (+1 or –1).
Sigmoidal functions:
o The sigmoidal functions are widely used in back-propagation nets because of
the relationship between the value of the functions at a point and the value
of the derivative at that point which reduces the computational burden
during training.
o Sigmoidal functions are of two types:
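1. Binary sigmoid function:
▪ It is also termed the logistic sigmoid function or unipolar sigmoid
function. The range of the sigmoid function is from 0 to 1.
▪ It can be defined as
$$f(x) = \frac{1}{1 + e^{-\lambda x}}$$
where λ is the steepness parameter.
The derivative of this function is
$$f'(x) = \lambda f(x)\,[1 - f(x)]$$
2. Bipolar sigmoid function:
▪ The range of the bipolar sigmoid function is from -1 to 1.
▪ It can be defined as
$$f(x) = \frac{2}{1 + e^{-\lambda x}} - 1 = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$$
where λ is the steepness parameter.
The derivative of this function is
$$f'(x) = \frac{\lambda}{2}\,[1 + f(x)]\,[1 - f(x)]$$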
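The relationship between a sigmoid's value and its derivative can be verified numerically. The sketch below assumes the standard definitions f(x) = 1 / (1 + e^(-λx)) for the binary sigmoid and f(x) = 2 / (1 + e^(-λx)) - 1 for the bipolar sigmoid; the function names and the finite-difference check are illustrative additions.

```python
import math


def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))


def binary_sigmoid_deriv(x, lam=1.0):
    f = binary_sigmoid(x, lam)
    return lam * f * (1.0 - f)              # derivative written in terms of f itself


def bipolar_sigmoid(x, lam=1.0):
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0


def bipolar_sigmoid_deriv(x, lam=1.0):
    f = bipolar_sigmoid(x, lam)
    return (lam / 2.0) * (1.0 + f) * (1.0 - f)


def numeric_deriv(func, x, lam, h=1e-6):
    """Central finite difference, used only to check the closed forms."""
    return (func(x + h, lam) - func(x - h, lam)) / (2.0 * h)


x, lam = 0.7, 2.0
print(binary_sigmoid_deriv(x, lam), numeric_deriv(binary_sigmoid, x, lam))
print(bipolar_sigmoid_deriv(x, lam), numeric_deriv(bipolar_sigmoid, x, lam))
```

Because the derivative reuses the function value already computed in the forward pass, no extra exponential has to be evaluated during training.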
Tanh Activation Function
o Tanh is an activation function used for neural networks.
o Historically, the tanh function became preferred over the sigmoid function as it
gave better performance for multi-layer neural networks. But it did not solve the
vanishing gradient problem that sigmoid suffered, which was tackled more
effectively with the introduction of ReLU activations.
o The Tanh activation function can be written as:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
Rectified Linear Units (ReLU) Activation Function
o Rectified Linear Units, or ReLUs, are a type of activation function that is
linear in the positive dimension but zero in the negative dimension, i.e.,
f(x) = max(0, x). The kink in the function is the source of the non-linearity.
o Linearity in the positive dimension has the attractive property that it prevents
saturation of gradients (in contrast with sigmoid activations), although for
half of the real line the gradient is zero.
SoftMax Activation Function
o The SoftMax output function transforms a previous layer's output into a
vector of probabilities.
o It is commonly used for multiclass classification. Given an input vector x and a
weighting vector w, we have:
$$P(y = j \mid x) = \frac{e^{x^{T} w_j}}{\sum_{k=1}^{K} e^{x^{T} w_k}}$$
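The tanh, ReLU and SoftMax functions described above can be written directly in Python. This is a minimal sketch using their standard definitions; the SoftMax here takes the scores (the values x·w for each class) as given, and subtracting the maximum score is an added numerical-stability step that does not change the result.

```python
import math


def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    return max(0.0, x)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])           # illustrative scores for three classes
print(tanh(0.5), relu(-3.0), relu(2.5), probs)
```

The largest score receives the largest probability, and the outputs always sum to 1, which is what makes SoftMax suitable as the output of a multiclass classifier.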
ARTIFICIAL NEURAL NETWORK TERMINOLOGIES
✔ The ANN (Artificial Neural Network) is based on BNN (Biological Neural
Network) as its primary goal is to fully imitate the Human Brain and its functions.
✔ Similar to the brain having neurons interlinked to each other, the ANN also has
neurons that are linked to each other in various layers of the networks which are
known as nodes.
The ANN learns through various learning algorithms that are described as
supervised or unsupervised learning.
∙ In supervised learning algorithms, the target values are labelled. Its goal is to
try to reduce the error between the desired output (target) and the actual
output for optimization. Here, a supervisor is present.
∙ In unsupervised learning algorithms, the target values are not labelled and
the network learns by itself by identifying the patterns through repeated
trials and experiments.
ANN Terminology:
Weights: Each neuron is linked to the other neurons through connection links
that carry weights. The weight holds information about the input signal.
The output depends solely on the weights and the input signal. The weights can be
presented in matrix form, known as the connection matrix.
If there are “n” nodes with each node having “m” weights, then it is represented as:
$$W = \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1m} \\ w_{21} & w_{22} & \dots & w_{2m} \\ \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & \dots & w_{nm} \end{bmatrix}$$
Bias: Bias is a constant that is added to the sum of the products of inputs and
weights to calculate the net input. It is used to shift the result to the positive
or negative side. The net input is increased by a positive bias and decreased by
a negative bias.
Here, {1, x1, ..., xn} are the inputs, and the output (Y) of the neuron is computed by
the function g(x), which sums the weighted inputs and adds the bias to the result.
g(x) = ∑wi xi + b where i=1 to n
= w1 x1 + ........ + wn xn + b
The role of the activation is to provide the output depending on the result of
the summation function:
Y=1 if g(x)>=0
Y=0 else
Threshold: A threshold value is a constant value that is compared to the
net input to get the output. The activation function is defined based on the
threshold value to calculate the output.
For Example:
Y=1 if net-input>=threshold
Y=0 else
Learning Rate: The learning rate is denoted by α. It ranges from 0 to 1. It
controls the amount of weight adjustment at each step during the learning of the ANN.
Target value: Target values are Correct values of the output variable and are also
known as just targets.
Error: It is the inaccuracy of predicted output values compared to Target Values.
SUPERVISED LEARNING NETWORKS
Perceptron Networks
✔ The perceptron was first introduced by Mr. Frank Rosenblatt in 1957.
✔ A Perceptron is an algorithm for supervised learning of binary classifiers.
✔ There are two types of Perceptrons: Single layer and Multilayer.
o Single layer - Single layer perceptrons can learn only linearly separable
patterns
o Multilayer - Multilayer perceptrons, or feedforward neural networks
with two or more layers, have greater processing power
✔ The Perceptron algorithm learns the weights for the input signals in order to
draw a linear decision boundary.
✔ This enables you to distinguish between the two linearly separable classes
+1 and -1.
Perceptron algorithm (Single layer)
Step – 0: Initialize the weights and the bias (for easy calculation they can be set to zero).
Also initialize the learning rate α (0 < α ≤ 1); for simplicity α is set to 1.
Step – 1: Perform steps 2 to 6 until the final stopping condition is true.
Step – 2: Perform steps 3 to 5 for each training pair, indicated by s:t.
Step – 3: The input layer containing input units is applied with the identity activation
function: xi = si
Step – 4: Calculate the output of the network. To do so, first obtain the net input:
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
where n is the number of input neurons in the input layer. Then apply the activation
function over the net input to obtain the output:
$$y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \Theta \\ 0 & \text{if } -\Theta \le y_{in} \le \Theta \\ -1 & \text{if } y_{in} < -\Theta \end{cases}$$
Step – 5: Weight and bias adjustment: Compare the actual (calculated) output with the
desired (target) output.
If y ≠ t, then
$$w_i(new) = w_i(old) + \alpha\, t\, x_i, \qquad b(new) = b(old) + \alpha\, t$$
otherwise the weights and the bias are left unchanged.
Step – 6: Train the network until there is no weight change. This is the stopping condition
for the network. If this condition is not met, then start again from step 2.
Perceptron algorithm (Multiple layer)
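Step – 0: Initialize the weights, biases and learning rate suitably.
Step – 1: Check for the stopping condition; if it is false, perform Steps 2–6.
Step – 2: Perform Steps 3–5 for each bipolar or binary training vector pair s:t.
Step – 3: Set the activation (identity) of each input unit i = 1 to n: xi = si
Step – 4: Calculate the output response of each output unit j = 1 to m. First, the net
input is calculated as
$$y_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}$$
Then activations are applied over the net input to calculate the output response:
$$y_j = f(y_{inj}) = \begin{cases} 1 & \text{if } y_{inj} > \Theta \\ 0 & \text{if } -\Theta \le y_{inj} \le \Theta \\ -1 & \text{if } y_{inj} < -\Theta \end{cases}$$
Step – 5: Make adjustments in weights and bias for j = 1 to m and i = 1 to n.
If tj ≠ yj, then
$$w_{ij}(new) = w_{ij}(old) + \alpha\, t_j\, x_i, \qquad b_j(new) = b_j(old) + \alpha\, t_j$$
otherwise the weights and the bias are left unchanged.
Step – 6: Test for the stopping condition, i.e., if there is no change in weights then stop
the training process, else start again from Step 2.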
Example of Single layer Perceptron
We need to understand that the output of an AND gate is 1 only if both inputs (in
this case, x1 and x2) are 1.
Truth table for AND function with bipolar inputs and targets:
x1 | x2 | t
 1 |  1 |  1
 1 | -1 | -1
-1 |  1 | -1
-1 | -1 | -1
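The single-layer perceptron algorithm can be traced on the bipolar AND function with a short Python sketch. The function names are illustrative, and the settings (zero initial weights, α = 1, threshold 0) are assumptions made for simple arithmetic; the update rule used is the usual perceptron rule, w_i ← w_i + α·t·x_i and b ← b + α·t whenever the output differs from the target.

```python
def activation(y_in, theta):
    """Three-level step function used by the perceptron."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Train a single-layer perceptron; returns (weights, bias, epochs_run)."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0            # Step 0: zero initial weights
    for epoch in range(1, max_epochs + 1):    # Step 1
        changed = False
        for x, t in samples:                  # Steps 2-3: x_i = s_i
            y_in = bias + sum(xi * wi for xi, wi in zip(x, weights))  # Step 4
            if activation(y_in, theta) != t:  # Step 5: update only on error
                weights = [wi + alpha * t * xi for wi, xi in zip(weights, x)]
                bias += alpha * t
                changed = True
        if not changed:                       # Step 6: no weight change
            return weights, bias, epoch
    return weights, bias, max_epochs


# Bipolar AND: target is 1 only when both inputs are 1.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias, epochs = train_perceptron(and_samples)
print(weights, bias, epochs)  # → [1.0, 1.0] -1.0 2
```

The weights change during the first epoch and stay fixed during the second, so training stops after two epochs with the decision boundary x1 + x2 - 1 = 0.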
ADAPTIVE LINEAR NEURON (ADALINE)
✔ Adaline, which stands for Adaptive Linear Neuron, is a network having a single
linear unit. It was developed by Widrow and Hoff in 1960. Some important
points about Adaline are as follows −
o It uses a bipolar activation function.
o An Adaline neuron can be trained using the Delta rule, also known as the Least
Mean Square (LMS) rule or Widrow-Hoff rule.
o The net input is compared with the target value to compute the error
signal.
o The weights are adjusted on the basis of the adaptive training algorithm.
ADALINE BASIC ARCHITECTURE
✔ The basic structure of Adaline is similar to that of the perceptron, with an extra
feedback loop through which the actual output is compared with the
desired/target output. After the comparison, the weights and bias are updated
on the basis of the training algorithm.
Adaptive Linear Neuron Learning algorithm
Step – 0: Initialize the weights and the bias to some random values, but not to zero.
Also initialize the learning rate α.
Step – 1: Perform steps 2-6 while the stopping condition is false.
Step – 2: Perform steps 3-5 for each bipolar training pair s:t.
Step – 3: Activate each input unit as follows –
xi = si (i=1 to n)
Step – 4: Obtain the net input with the following relation –
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step – 5: Until the least mean square error is obtained, adjust the weights and bias as
follows –
$$w_i(new) = w_i(old) + \alpha\,(t - y_{in})\,x_i, \qquad b(new) = b(old) + \alpha\,(t - y_{in})$$
Now calculate the error using
$$E = (t - y_{in})^2$$
Step – 6: Test for the stopping condition: if the error generated is less than or equal to
the specified tolerance, then stop.
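The Adaline learning steps can be sketched with the delta (LMS) rule, w_i ← w_i + α(t − y_in)x_i and b ← b + α(t − y_in). Several things here are assumptions made for illustration: the bipolar OR data set, the starting value 0.1 for every weight, α = 0.05, and a fixed number of epochs standing in for the tolerance test, since the notes do not give a tolerance value.

```python
def train_adaline(samples, alpha=0.05, epochs=200, init=0.1):
    """Delta-rule (LMS) training; returns (weights, bias, last_epoch_error)."""
    n = len(samples[0][0])
    weights, bias = [init] * n, init          # Step 0: small non-zero values
    total_error = 0.0
    for _ in range(epochs):                   # Step 1 (fixed epoch budget assumed)
        total_error = 0.0
        for x, t in samples:                  # Steps 2-3: x_i = s_i
            y_in = bias + sum(xi * wi for xi, wi in zip(x, weights))  # Step 4
            err = t - y_in
            weights = [wi + alpha * err * xi for wi, xi in zip(weights, x)]  # Step 5
            bias += alpha * err
            total_error += err ** 2           # E = (t - y_in)^2 summed over the epoch
    return weights, bias, total_error


def predict(x, weights, bias):
    """Bipolar activation applied to the net input."""
    y_in = bias + sum(xi * wi for xi, wi in zip(x, weights))
    return 1 if y_in >= 0 else -1


or_samples = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
weights, bias, error = train_adaline(or_samples)
predictions = [predict(x, weights, bias) for x, _ in or_samples]
print(weights, bias, error)
print(predictions)
```

Unlike the perceptron, the error is computed from the net input rather than from the thresholded output, so the weights keep moving towards the least-squares solution even after every pattern is classified correctly.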
MULTIPLE ADAPTIVE LINEAR NEURON (MADALINE)
✔ Madaline, which stands for Multiple Adaptive Linear Neuron, is a network
that consists of many Adalines in parallel. It has a single output unit.
Some important points about Madaline are as follows −
o It is just like a multilayer perceptron, where the Adalines act as hidden units
between the input and the Madaline layer.
o The weights and the bias between the input and Adaline layers, as we see
in the Adaline architecture, are adjustable.
o The connections between the Adaline and Madaline layers have fixed weights and
a bias of 1.
o Training can be done with the help of the Delta rule.
MADALINE BASIC ARCHITECTURE
✔ It consists of “n” units of input layer and “m” units of Adaline layer and “1” unit
of the Madaline layer. Each neuron in the Adaline and Madaline layers has a bias
of excitation “1”.
✔ The Adaline layer is present between the input layer and the Madaline layer; the
Adaline layer is considered as the hidden layer.
Multiple Adaptive Linear Neuron (Madaline) Training Algorithm
By now we know that only the weights and bias between the input and the
Adaline layer are to be adjusted, and the weights and bias between the Adaline
and the Madaline layer are fixed.
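Step – 0: Initialize the weights and the bias (for easy calculation they can be set to zero).
Also initialize the learning rate α (0 < α ≤ 1); for simplicity α is set to 1.
Step – 1: Perform steps 2-7 while the stopping condition is false.
Step – 2: Perform steps 3-6 for each bipolar training pair s:t
Step – 3: Activate each input unit as follows –
xi = si (i=1 to n)
Step – 4: Obtain the net input at each hidden unit, i.e. each unit of the Adaline layer,
with the following relation
$$Q_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}, \quad j = 1 \text{ to } m$$
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step – 5: Apply the following activation function to obtain the final output at the Adaline
and the Madaline layer –
$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$
Output at the hidden (Adaline) unit:
$$Q_j = f(Q_{inj})$$
Final output of the network:
$$y = f(y_{in}), \quad \text{where } y_{in} = b_0 + \sum_{j=1}^{m} Q_j v_j$$
Step – 6: Calculate the error and adjust the weights as follows –
Case 1: if y ≠ t and t = 1, then for the unit j whose net input is closest to 0,
$$w_{ij}(new) = w_{ij}(old) + \alpha\,(1 - Q_{inj})\,x_i, \qquad b_j(new) = b_j(old) + \alpha\,(1 - Q_{inj})$$
Case 2: if y ≠ t and t = −1, then for all units k with positive net input,
$$w_{ik}(new) = w_{ik}(old) + \alpha\,(-1 - Q_{ink})\,x_i, \qquad b_k(new) = b_k(old) + \alpha\,(-1 - Q_{ink})$$
Case 3: if y = t, there is no change in the weights.
Step – 7: Test for the stopping condition, which is met when there is no change in weight or
the highest weight change that occurred during training is smaller than the specified
tolerance.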
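The way the Adaline (hidden) layer and the single Madaline output unit combine can be illustrated with a forward pass only; no training is performed here. All weights below are hand-picked, hypothetical values chosen so that the network computes the bipolar XOR function, and the output unit uses assumed fixed weights of 0.5 so that it behaves like an OR of the two hidden units.

```python
def bipolar_step(x):
    return 1 if x >= 0 else -1


def madaline_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: Adaline (hidden) layer followed by the single output unit."""
    q = [bipolar_step(b + sum(xi * wi for xi, wi in zip(x, w)))
         for w, b in zip(hidden_w, hidden_b)]
    y = bipolar_step(out_b + sum(qj * vj for qj, vj in zip(q, out_w)))
    return q, y


# Hand-picked (hypothetical) weights for two Adaline units.
hidden_w = [[1.0, -1.0], [-1.0, 1.0]]
hidden_b = [-1.0, -1.0]
out_w, out_b = [0.5, 0.5], 0.5     # fixed output unit acting as an OR

xor_inputs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
outputs = [madaline_forward(x, hidden_w, hidden_b, out_w, out_b)[1]
           for x in xor_inputs]
print(outputs)  # → [-1, 1, 1, -1]
```

A single Adaline cannot represent XOR because the function is not linearly separable; two Adalines feeding one fixed output unit can.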
BACK PROPAGATION
Back Propagation Neural Networks
✔ The Back Propagation Network (BPN) was first introduced in the 1960s. It is a
multilayer neural network consisting of an input layer, at least one hidden layer
and an output layer.
✔ It is a multi-layer feed-forward neural network trained using the gradient descent
approach, which exploits the chain rule to optimize the parameters.
✔ The error which is calculated at the output layer, by comparing the target output and
the actual output, is propagated back towards the input layer.
✔ The main features of backpropagation are the iterative, recursive and efficient
method through which it calculates the updated weights to improve the network until
it is able to perform the task for which it is being trained.
Back Propagation Neural Networks Training Algorithm
The Back Propagation Network uses the binary sigmoid activation function.
The training will have the following three phases.
∙ Phase 1 − Feed Forward Phase
∙ Phase 2 − Back Propagation of error
∙ Phase 3 − Updating of weights
Step – 0: Initialize the weights and the bias (for easy calculation and
simplicity, take some small random values but not zero). Also
initialize the learning rate α (0 < α ≤ 1).
Step – 1: Continue steps 2 – 10 while the stopping condition is not true.
Step – 2: Continue steps 3 – 9 for every training pair.
PHASE – 1
Step – 3: Each input unit receives the input signal xi and sends it to the hidden units,
for all i = 1 to n
Step – 4: Calculate the net input at the hidden unit using the following relation –
$$Q_{inj} = b_{0j} + \sum_{i=1}^{n} x_i v_{ij}, \quad j = 1 \text{ to } p$$
Now calculate the net output by applying the following sigmoidal activation
function
$$Q_j = f(Q_{inj})$$
Step – 5: Calculate the net input at the output layer unit using the following relation
$$y_{ink} = b_{0k} + \sum_{j=1}^{p} Q_j w_{jk}, \quad k = 1 \text{ to } m$$
Calculate the net output by applying the following sigmoidal activation
function
$$y_k = f(y_{ink})$$
PHASE – 2
Step – 6: Compute the error correcting term, in correspondence with the target pattern
received at each output unit, as follows
$$\delta_k = (t_k - y_k)\, f'(y_{ink})$$
if Binary sigmoid function:
$$f'(y_{ink}) = \lambda f(y_{ink})\,[1 - f(y_{ink})]$$
if Bipolar sigmoid function:
$$f'(y_{ink}) = \frac{\lambda}{2}\,[1 + f(y_{ink})]\,[1 - f(y_{ink})]$$
Then, send δk back to the hidden layer.
Step – 7: Now each hidden unit sums its delta inputs from the output units:
$$\delta_{inj} = \sum_{k=1}^{m} \delta_k w_{jk}$$
The error term can be calculated as follows –
$$\delta_j = \delta_{inj}\, f'(Q_{inj})$$
PHASE – 3
Step – 8: Each output unit yk (k = 1 to m) updates the weight and bias as follows
$$w_{jk}(new) = w_{jk}(old) + \alpha\,\delta_k\,Q_j, \qquad b_{0k}(new) = b_{0k}(old) + \alpha\,\delta_k$$
Step – 9: Each hidden unit qj (j = 1 to p) updates the weight and bias as follows
$$v_{ij}(new) = v_{ij}(old) + \alpha\,\delta_j\,x_i, \qquad b_{0j}(new) = b_{0j}(old) + \alpha\,\delta_j$$
Step – 10: Check for the stopping condition, which may be either the number of
epochs reached or the target output matching the actual output.
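The three phases can be put together in a small sketch. The assumptions are stated plainly: a 2-2-1 network, the binary sigmoid with λ = 1, hand-picked small starting weights, α = 0.5, a fixed budget of 500 epochs as the stopping condition, and the binary OR function as training data. The variable names (v, w, b_hidden, b_out) are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_pair(x, t, v, b_hidden, w, b_out, alpha):
    """One pass of Steps 3-9 for a single training pair; returns the squared error."""
    # Phase 1: feed forward
    q = [sigmoid(b_hidden[j] + sum(x[i] * v[i][j] for i in range(len(x))))
         for j in range(len(b_hidden))]
    y = [sigmoid(b_out[k] + sum(q[j] * w[j][k] for j in range(len(q))))
         for k in range(len(b_out))]
    # Phase 2: back-propagate the error (binary sigmoid: f' = f * (1 - f))
    delta_k = [(t[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    delta_j = [sum(delta_k[k] * w[j][k] for k in range(len(y))) * q[j] * (1.0 - q[j])
               for j in range(len(q))]
    # Phase 3: update weights and biases
    for j in range(len(q)):
        for k in range(len(y)):
            w[j][k] += alpha * delta_k[k] * q[j]
    for k in range(len(y)):
        b_out[k] += alpha * delta_k[k]
    for i in range(len(x)):
        for j in range(len(q)):
            v[i][j] += alpha * delta_j[j] * x[i]
    for j in range(len(q)):
        b_hidden[j] += alpha * delta_j[j]
    return sum((t[k] - y[k]) ** 2 for k in range(len(y)))


# Hypothetical 2-2-1 network with small non-zero starting weights.
v = [[0.1, -0.2], [0.3, 0.2]]      # v[i][j]: input i -> hidden unit j
b_hidden = [0.1, -0.1]
w = [[0.2], [-0.3]]                # w[j][k]: hidden unit j -> output unit k
b_out = [0.05]
or_pairs = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]

errors = []
for epoch in range(500):           # stopping condition: fixed number of epochs
    errors.append(sum(train_pair(x, t, v, b_hidden, w, b_out, alpha=0.5)
                      for x, t in or_pairs))
print(errors[0], errors[-1])
```

The hidden-layer error terms are computed before the output weights are updated, so that the error is propagated back through the same weights that produced it.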
ASSOCIATIVE MEMORY NETWORK
✔ An associative memory network can store a set of patterns as memories. When
the associative memory is being presented with a key pattern, it responds by
producing one of the stored patterns, which closely resembles or relates to the
key pattern.
✔ Thus, the recall is through association of the key pattern with the help of the
information memorized. These types of memories are also called
content-addressable memories (CAM). The CAM can also be viewed as
associating data to address, i.e., for every data item in the memory there is a
corresponding unique address. It can also be viewed as a data correlator.
✔ Here the input data is correlated with the data stored in the CAM. It should
be noted that the stored patterns must be unique, i.e., different patterns in each
location.
✔ If the same pattern exists in more than one location in the CAM, then, even
though the correlation is correct, the address is ambiguous.
Associative memory makes a parallel search within a stored data file. The
concept behind this search is to output any one or all stored items that
match the given search argument.
TRAINING ALGORITHMS FOR PATTERN ASSOCIATION
There are two algorithms developed for training of pattern association nets.
1. Hebb Rule
2. Outer Products Rule
1. Hebb Rule
✔ The Hebb rule is widely used for finding the weights of an associative
memory neural network. The training vector pairs here are denoted as s:t.
✔ The weights are updated until there is no weight change.
Hebb Rule Algorithm
Step – 0: Set all the initial weights to zero, i.e., wij = 0 (i = 1 to n, j = 1 to m)
Step – 1: For each training input-target output vector pair s:t, perform Steps 2-4.
Step – 2: Activate the input layer units to the current training input, xi = si (for i = 1 to n)
Step – 3: Activate the output layer units to the current target output, yj = tj (for j = 1 to m)
Step – 4: Start the weight adjustment
wij(new) = wij(old) + xi yj (for i = 1 to n, j = 1 to m)
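A short sketch of the Hebb rule steps, assuming the usual weight adjustment wij(new) = wij(old) + xi yj. The function name `hebb_train` and the two bipolar training pairs are made up for illustration.

```python
def hebb_train(pairs, n, m):
    """Return the n x m weight matrix for a list of (s, t) training pairs."""
    w = [[0] * m for _ in range(n)]      # Step 0: all weights start at zero
    for s, t in pairs:                   # Step 1: loop over the pairs s:t
        x = list(s)                      # Step 2: xi = si
        y = list(t)                      # Step 3: yj = tj
        for i in range(n):               # Step 4: wij += xi * yj
            for j in range(m):
                w[i][j] += x[i] * y[j]
    return w


# Illustrative bipolar pairs: two 4-component inputs, two 2-component targets.
pairs = [([1, -1, -1, -1], [1, -1]),
         ([-1, -1, -1, 1], [-1, 1])]
weights = hebb_train(pairs, n=4, m=2)
print(weights)  # [[2, -2], [0, 0], [0, 0], [-2, 2]]
```

Each pair contributes one pass of Step 4, so a single sweep over the training set is enough to build the matrix.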
2. Outer Products Rule
✔ The outer products rule is a method for finding the weights of an associative net.
Input: s = (s1, ..., si, ..., sn)
Output: t = (t1, ..., tj, ..., tm)
✔ The outer product of the two vectors is the product of the matrices S = sT and T = t,
i.e., between an [n x 1] matrix and a [1 x m] matrix.
✔ The transpose is to be taken for the input matrix given.
ST = sT t, an [n x m] matrix whose (i, j) entry is si tj.
✔ This weight matrix is the same as the weight matrix obtained by the Hebb rule to
store the pattern association s:t. For storing a set of associations, s(p):t(p),
p = 1 to P, wherein,
s(p) = (s1(p), ..., si(p), ..., sn(p))
t(p) = (t1(p), ..., tj(p), ..., tm(p))
the weight matrix W = {wij} can be
given as
wij = Σ(p = 1 to P) si(p) tj(p)
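The outer products rule can be checked numerically with a few lines of Python. This sketch assumes the weight matrix is the sum over all pairs of the outer product of s(p) and t(p). The helper names `outer`, `add` and `outer_products_weights`, and the sample vectors, are hypothetical.

```python
def outer(s, t):
    """Outer product of s and t: an n x m matrix with entries s[i] * t[j]."""
    return [[si * tj for tj in t] for si in s]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def outer_products_weights(pairs):
    """Sum the outer products of all (s, t) pairs."""
    w = outer(*pairs[0])
    for s, t in pairs[1:]:
        w = add(w, outer(s, t))
    return w


single = outer([1, -1, 1], [1, -1])
print(single)   # [[1, -1], [-1, 1], [1, -1]]

weights = outer_products_weights([([1, -1, 1], [1, -1]),
                                  ([-1, 1, 1], [-1, -1])])
print(weights)  # [[2, 0], [-2, 0], [0, -2]]
```

The (i, j) entry of each outer product is exactly the Hebbian increment si tj, which is why both rules give the same matrix.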
There are two types of associative memories
∙ Auto Associative Memory
∙ Hetero Associative Memory
Auto Associative Memory
✔ An auto-associative memory recovers a previously stored pattern that most
closely relates to the current pattern. It is also known as an auto-associative
correlator.
✔ In the auto-associative memory network, the training input vector and
training output vector are the same.
Auto Associative Memory Algorithm
Training Algorithm
For training, this network uses the Hebb or delta learning rule.
Step – 1: Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to n)
Step – 2: Perform Steps 3-5 for each input vector.
Step – 3: Activate each input unit as follows –
xi = si (i = 1 to n)
Step – 4: Activate each output unit as follows –
yj = sj (j = 1 to n)
Step – 5: Adjust the weights as follows –
wij(new) = wij(old) + xi yj
The weights can also be determined from the Hebb rule or the outer products rule.
Testing Algorithm
Step – 1: Set the weights obtained during training with Hebb's rule.
Step – 2: Perform Steps 3-5 for each input vector.
Step – 3: Set the activations of the input units equal to the input vector.
Step – 4: Calculate the net input to each output unit, j = 1 to n:
y_inj = Σ(i = 1 to n) xi wij
Step – 5: Apply the following activation function to calculate the output:
yj = f(y_inj) = +1 if y_inj > 0; −1 if y_inj ≤ 0
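The training and testing steps for the auto-associative net can be sketched as follows. The sketch assumes bipolar patterns, Hebbian training, and the activation "+1 if the net input is positive, otherwise −1". The function names and the stored pattern are illustrative only.

```python
def train_auto(patterns):
    """Training: wij += si * sj for every stored pattern s."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                w[i][j] += s[i] * s[j]
    return w


def recall_auto(w, x):
    """Testing: net input y_inj = sum_i xi * wij, then threshold at zero."""
    n = len(x)
    y_in = [sum(x[i] * w[i][j] for i in range(n)) for j in range(n)]
    return [1 if value > 0 else -1 for value in y_in]


stored = [1, 1, -1, -1]
w = train_auto([stored])

print(recall_auto(w, stored))            # [1, 1, -1, -1]  (stored pattern)
print(recall_auto(w, [0, 1, -1, -1]))    # [1, 1, -1, -1]  (one missing entry)
print(recall_auto(w, [-1, 1, -1, -1]))   # [1, 1, -1, -1]  (one wrong entry)
```

With a single stored pattern, any input that still correlates positively with it is mapped back to the stored pattern.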
Hetero Associative memory
✔ In a hetero-associative memory, the training input and the target output vectors
are different. The weights are determined in a way that the network can store a
set of pattern associations.
✔ The association here is a set of training input-target output vector pairs (s(p),
t(p)), with p = 1, 2, …, P. Each vector s(p) has n components and each vector t(p)
has m components. The determination of weights is done either by using
the Hebb rule or the delta rule.
✔ The net finds an appropriate output vector, which corresponds to an input
vector x, that may be either one of the stored patterns or a new pattern.
Hetero Associative Memory Algorithm
Training Algorithm
Step – 1: Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to m)
Step – 2: Perform Steps 3-5 for each training pair s:t.
Step – 3: Activate each input unit as follows –
xi = si (i = 1 to n)
Step – 4: Activate each output unit as follows –
yj = tj (j = 1 to m)
Step – 5: Adjust the weights as follows –
wij(new) = wij(old) + xi yj
The weights can also be determined from the Hebb rule or the outer products rule.
Testing Algorithm
Step – 1: Set the weights obtained during training with Hebb's rule.
Step – 2: Perform Steps 3-5 for each input vector.
Step – 3: Set the activations of the input units equal to the input vector.
Step – 4: Calculate the net input to each output unit, j = 1 to m:
y_inj = Σ(i = 1 to n) xi wij
Step – 5: Apply the following activation function to calculate the output:
yj = +1 if y_inj > 0; 0 if y_inj = 0; −1 if y_inj < 0
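A compact sketch of hetero-associative training and recall. It assumes Hebbian weights wij = Σ si(p) tj(p) and the three-valued activation (+1, 0, −1). The function names and pattern pairs are invented for this example.

```python
def train_hetero(pairs):
    """Training: accumulate wij += si * tj over all (s, t) pairs."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for s, t in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += s[i] * t[j]
    return w


def recall_hetero(w, x):
    """Testing: y_inj = sum_i xi * wij, then +1 / 0 / -1 by sign."""
    n, m = len(w), len(w[0])
    output = []
    for j in range(m):
        y_in = sum(x[i] * w[i][j] for i in range(n))
        output.append(1 if y_in > 0 else (-1 if y_in < 0 else 0))
    return output


pairs = [([1, -1, -1, -1], [1, -1]),
         ([-1, -1, -1, 1], [-1, 1])]
w = train_hetero(pairs)

print(recall_hetero(w, [1, -1, -1, -1]))   # [1, -1]  (first stored input)
print(recall_hetero(w, [-1, -1, -1, 1]))   # [-1, 1]  (second stored input)
print(recall_hetero(w, [1, -1, 0, 0]))     # [1, -1]  (incomplete first input)
print(recall_hetero(w, [0, -1, -1, 0]))    # [0, 0]   (no evidence either way)
```

The last call shows why the zero output is useful: when the net input is exactly zero, the net reports "undecided" instead of guessing.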
BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM)
✔ Bidirectional associative memory (BAM) was first proposed by Bart Kosko in
1988.
✔ The BAM network performs forward and backward associative searches for
stored stimulus responses.
✔ The BAM is a recurrent hetero-associative pattern-matching network that
encodes binary or bipolar patterns using the Hebbian learning rule.
✔ It associates patterns from set A with patterns from set B, and vice versa.
BAM neural nets can respond to input from either layer
(input layer and output layer).
Bidirectional Associative Memory Architecture
✔ The architecture of the BAM network consists of two layers of neurons which are
connected by directed weighted path interconnections.
✔ The network dynamics involve two layers of interaction. The BAM network iterates
by sending signals back and forth between the two layers until all the neurons
reach equilibrium. The weights associated with the network are bidirectional. Thus,
BAM can respond to inputs in either layer.
✔ The figure shows a BAM network consisting of n units in the X layer and m units in
the Y layer.
✔ The layers are connected in both directions (bidirectional), so that
the weight matrix for signals sent from the X layer to the Y layer is W and the weight
matrix for signals sent from the Y layer to the X layer is WT. Thus, the
same weight matrix is used in both directions.
Determination of Weights
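✔ Let the input vectors be denoted by s(p) and the target vectors by t(p), p = 1, ..., P. Then
the weight matrix to store a set of input and target vectors, where
s(p) = (s1(p), ..., si(p), ..., sn(p))
t(p) = (t1(p), ..., tj(p), ..., tm(p))
can be determined by the Hebb rule training algorithm. When the input vectors are
binary, the weight matrix W = {wij} is given by
wij = Σ(p = 1 to P) [2si(p) − 1] [2tj(p) − 1]
When the input vectors are bipolar, the weight matrix W = {wij} can be defined as
wij = Σ(p = 1 to P) si(p) tj(p)
The activation function for the Y-layer
1. With binary input vectors is
yj = 1 if y_inj > 0; yj (unchanged) if y_inj = 0; 0 if y_inj < 0
2. With bipolar input vectors is
yj = 1 if y_inj > θj; yj (unchanged) if y_inj = θj; −1 if y_inj < θj
The activation function for the X-layer
1. With binary input vectors is
xi = 1 if x_ini > 0; xi (unchanged) if x_ini = 0; 0 if x_ini < 0
2. With bipolar input vectors is
xi = 1 if x_ini > θi; xi (unchanged) if x_ini = θi; −1 if x_ini < θi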
Testing Algorithm for Discrete Bidirectional Associative Memory
Step – 0: Initialize the weights to store P vectors. Also initialize all the activations to
zero.
Step – 1: Perform Steps 2-6 for each testing input.
Step – 2: Set the activations of the X layer to the current input pattern, i.e., present the
input pattern x to the X layer, and similarly present the input pattern y to the Y layer.
Even though it is a bidirectional memory, at one time step signals can be
sent from only one layer. So, either of the input patterns may be the zero
vector.
Step – 3: Perform Steps 4-6 while the activations have not converged.
Step – 4: Update the activations of the units in the Y layer. Calculate the net input,
y_inj = Σ(i = 1 to n) xi wij
Applying the activations, we obtain
yj = f(y_inj)
Send this signal to the X layer.
Step – 5: Update the activations of the units in the X layer. Calculate the net input,
x_ini = Σ(j = 1 to m) yj wij
Applying the activations, we obtain
xi = f(x_ini)
Send this signal to the Y layer.
Step – 6: Test for convergence of the net. Convergence occurs if the activation
vectors x and y reach equilibrium. If this occurs, then stop;
otherwise, continue.
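The back-and-forth recall of a discrete BAM can be sketched as below. The sketch assumes bipolar patterns, Hebbian weights wij = Σ si(p) tj(p), thresholds of zero, and an activation that keeps the previous value when the net input equals the threshold. The names `bam_weights`, `bipolar_step`, `bam_recall` and the two stored pairs are hypothetical.

```python
def bam_weights(pairs):
    """Hebbian weights for bipolar pairs: wij = sum_p si(p) * tj(p)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for s, t in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += s[i] * t[j]
    return w


def bipolar_step(net, previous, theta=0):
    """+1 above the threshold, -1 below it, unchanged at the threshold."""
    if net > theta:
        return 1
    if net < theta:
        return -1
    return previous


def bam_recall(w, x, y, max_iters=20):
    """Steps 2-6: alternate Y-layer and X-layer updates until nothing changes."""
    n, m = len(w), len(w[0])
    x, y = list(x), list(y)
    for _ in range(max_iters):
        # Step 4: X -> Y uses W.
        new_y = [bipolar_step(sum(x[i] * w[i][j] for i in range(n)), y[j])
                 for j in range(m)]
        # Step 5: Y -> X uses the transpose of W.
        new_x = [bipolar_step(sum(new_y[j] * w[i][j] for j in range(m)), x[i])
                 for i in range(n)]
        if new_x == x and new_y == y:   # Step 6: equilibrium reached
            break
        x, y = new_x, new_y
    return x, y


s1, t1 = [1, 1, -1, -1], [1, -1]
s2, t2 = [1, -1, 1, -1], [1, 1]
w = bam_weights([(s1, t1), (s2, t2)])

forward = bam_recall(w, s1, [0, 0])          # present x only, y is the zero vector
backward = bam_recall(w, [0, 0, 0, 0], t2)   # present y only, x is the zero vector
print(forward)    # ([1, 1, -1, -1], [1, -1])
print(backward)   # ([1, -1, 1, -1], [1, 1])
```

The same matrix serves both directions: the Y-layer update reads its columns and the X-layer update reads its rows.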
HOPFIELD NETWORKS
Hopfield Neural Network
✔ Hopfield neural network was proposed by John J. Hopfield in 1982. It is an
auto-associative fully interconnected single layer feedback network.
✔ It is a symmetrically weighted network (i.e., Wij = Wji).
✔ The Hopfield network is commonly used for auto-association and optimization
tasks.
✔ The Hopfield network is of two types
1. Discrete Hopfield Network
2. Continuous Hopfield Network
Discrete Hopfield Network
✔ When the network is operated in discrete-time fashion, it is called a discrete Hopfield
network.
✔ The network takes two-valued inputs: binary (0, 1) or bipolar (+1, -1); the
use of bipolar inputs makes the analysis easier. The network has symmetrical
weights with no self-connections, i.e.,
Wij = Wji;
Wii = 0
Architecture of Discrete Hopfield Network
✔ The Hopfield model consists of processing elements with two outputs, one inverting and
the other non-inverting.
✔ The outputs from each processing element are fed back to the inputs of the other
processing elements, but not to its own input.
Training Algorithm of Discrete Hopfield Network
✔ During training of the discrete Hopfield network, the weights are updated. The
input vectors may be binary or bipolar.
✔ Let the input vectors be denoted by s(p), p = 1, ..., P. Then the weight matrix W
to store a set of input vectors, where
s(p) = (s1(p), ..., si(p), ..., sn(p))
is determined as follows.
✔ When the input vectors are binary, the weight matrix W = {wij} is given by
wij = Σ(p = 1 to P) [2si(p) − 1] [2sj(p) − 1] for i ≠ j
✔ When the input vectors are bipolar, the weight matrix W = {wij} can be defined as
wij = Σ(p = 1 to P) si(p) sj(p) for i ≠ j
In both cases there are no self-connections, i.e., wii = 0.
Testing Algorithm of Discrete Hopfield Network
Step – 0: Initialize the weights to store patterns, i.e., the weights obtained from the training
algorithm using the Hebb rule.
Step – 1: While the activations of the net have not converged, perform Steps 2-8.
Step – 2: Perform Steps 3-7 for each input vector X.
Step – 3: Make the initial activations of the net equal to the external input vector X:
yi = xi (i = 1 to n)
Step – 4: Perform Steps 5-7 for each unit yi. (Here, the units are updated in random order.)
Step – 5: Calculate the net input of the network:
y_ini = xi + Σ(j = 1 to n) yj wji
Step – 6: Apply the activations over the net input to calculate the output:
yi = 1 if y_ini > θi; yi (unchanged) if y_ini = θi; 0 if y_ini < θi
where θi is the threshold and is normally taken as zero.
Step – 7: Now feed back the obtained output yi to all other units. Thus, the activation
vectors are updated.
Step – 8: Finally, test the network for convergence.
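The recall procedure can be sketched for a small binary example. The sketch assumes the binary-input weight rule wij = Σ [2si(p) − 1][2sj(p) − 1] with wii = 0, a threshold of zero, and asynchronous updates. For reproducibility the units are visited in a fixed order by default, although the steps above call for a random order; a shuffled list can be passed as `order`. All names and the stored pattern are illustrative.

```python
def hopfield_weights_binary(patterns):
    """wij = sum_p (2*si - 1) * (2*sj - 1) for i != j, and wii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += (2 * s[i] - 1) * (2 * s[j] - 1)
    return w


def hopfield_recall(w, x, order=None, theta=0, max_sweeps=50):
    """Asynchronous recall: update one unit at a time until a sweep changes nothing."""
    n = len(x)
    y = list(x)                                   # Step 3: yi = xi
    order = list(range(n)) if order is None else list(order)
    for _ in range(max_sweeps):                   # Step 1: repeat until converged
        changed = False
        for i in order:                           # Step 4: visit each unit
            y_in = x[i] + sum(y[j] * w[j][i] for j in range(n))   # Step 5
            if y_in > theta:                      # Step 6
                new_value = 1
            elif y_in < theta:
                new_value = 0
            else:
                new_value = y[i]
            if new_value != y[i]:
                y[i] = new_value                  # Step 7: other units now see it
                changed = True
        if not changed:                           # Step 8: convergence test
            break
    return y


w = hopfield_weights_binary([[1, 1, 1, 0]])
print(w)
# [[0, 1, 1, -1], [1, 0, 1, -1], [1, 1, 0, -1], [-1, -1, -1, 0]]

print(hopfield_recall(w, [0, 0, 1, 0]))   # [1, 1, 1, 0]  (two entries were missing)
print(hopfield_recall(w, [1, 1, 1, 0]))   # [1, 1, 1, 0]  (stored pattern is stable)
```

Because each update is written into `y` immediately, later units in the same sweep already see the new value, which is what distinguishes asynchronous from synchronous updating.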
Energy Function Evaluation
✔ An energy function is defined as a function that is a bounded and
non-increasing function of the state of the system.
✔ The energy function Ef, also called the Lyapunov function, determines the stability
of the discrete Hopfield network, and is characterized as follows –
Ef = −(1/2) Σ(i = 1 to n) Σ(j = 1 to n) yi yj wij − Σ(i = 1 to n) xi yi + Σ(i = 1 to n) θi yi
✔ Condition − In a stable network, whenever the state of a node changes, the
above energy function will decrease.
✔ Suppose node i has changed state from yi(k) to yi(k+1); then the energy
change ΔEf is given by the following relation
ΔEf = Ef(yi(k+1)) − Ef(yi(k)) = −(Σ(j = 1 to n) wij yj(k) + xi − θi) (yi(k+1) − yi(k))
✔ Here Δyi = yi(k+1) − yi(k), so that ΔEf = −(Σ(j = 1 to n) wij yj(k) + xi − θi) Δyi
✔ The change in energy depends on the fact that only one unit can update its
activation at a time.
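The non-increasing behaviour of the energy can be observed numerically. The sketch below assumes the standard discrete Hopfield energy Ef = −(1/2) ΣΣ yi yj wij − Σ xi yi + Σ θi yi with zero thresholds, and records Ef after every single-unit update. The weight matrix, input and function names are chosen for this example only.

```python
def energy(w, x, y, theta=0):
    """Ef = -1/2 * sum_ij yi*yj*wij - sum_i xi*yi + sum_i theta*yi."""
    n = len(y)
    quadratic = sum(y[i] * y[j] * w[i][j] for i in range(n) for j in range(n))
    return (-0.5 * quadratic
            - sum(x[i] * y[i] for i in range(n))
            + sum(theta * y[i] for i in range(n)))


def energy_trace(w, x, order, theta=0):
    """Update units one at a time and record the energy after each update."""
    n = len(x)
    y = list(x)
    trace = [energy(w, x, y, theta)]
    for i in order:
        y_in = x[i] + sum(y[j] * w[j][i] for j in range(n))
        if y_in > theta:
            y[i] = 1
        elif y_in < theta:
            y[i] = 0
        # At y_in == theta the unit keeps its current value.
        trace.append(energy(w, x, y, theta))
    return y, trace


# Symmetric weights with zero diagonal (stores the binary pattern 1 1 1 0).
W = [[0, 1, 1, -1],
     [1, 0, 1, -1],
     [1, 1, 0, -1],
     [-1, -1, -1, 0]]

final_state, trace = energy_trace(W, [0, 0, 1, 0], order=[0, 1, 2, 3])
print(final_state)  # [1, 1, 1, 0]
print(trace)        # [-1.0, -2.0, -4.0, -4.0, -4.0]
```

The first two updates each flip one unit and lower the energy; the last two leave the state unchanged, so the energy stays constant.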
Continuous Hopfield Network
✔ Continuous network has time as a continuous variable, and can be used for
associative memory problems or optimization problems like traveling
salesman problem.
✔ The nodes of this network have a continuous, graded output rather than a
two-state binary output. Thus, the energy of the network decreases
continuously with time.
✔ In comparison with Discrete Hopfield network, continuous network has time
as a continuous variable. It is also used in auto association and optimization
problems such as travelling salesman problem.
✔ Model − The model or architecture can be built up by adding electrical
components such as amplifiers which can map the input voltage to the
output voltage over a sigmoid activation function.
Energy Function Evaluation
Here λ is the gain parameter and gri is the input conductance.