Unit 5

Basics of Neural Network

• Basics of Neural Network: Introduction, Understanding the Biological Neuron,
  Exploring the Artificial Neuron, Types of Activation Functions, Early
  Implementations of ANN, Architectures of Neural Network.
Introduction
• Machine learning mimics the human form of learning.

• On the other hand, human learning, or for that matter every action of a human
being, is controlled by the nervous system.

• In any human being, the nervous system coordinates the different actions by
transmitting signals to and from different parts of the body.

• The nervous system is made up of a special type of cell, called a neuron or
  nerve cell, which has special structures that allow it to receive signals from
  and send signals to other neurons.

• This structure essentially forms a network of neurons or a neural network.


Introduction
• The biological neural network is a massively large and complex parallel computing
network.

• It is because of this massive parallel computing network that the nervous system helps
human beings to perform actions or take decisions at a very high speed.

• The fascinating capability of the biological neural network has inspired the inception of
artificial neural network (ANN) which is made up of artificial neurons.

• In generic form, an ANN is a machine designed to model the functioning of the nervous
system or, more specifically, the neurons.

• The difference between a biological neuron and an artificial neuron is that the
  artificial neuron is a digital (mathematical) model of the biological neuron.

• Digital neurons or artificial neurons form the smallest processing units of the ANNs.
Understanding the biological neuron
• Neurons are basic structural units of the central nervous system (CNS).

• A neuron is able to receive, process, and transmit information in the form of


chemical and electrical signals.

• Figure below presents the structure of a neuron. It has three main parts to
carry out its primary functionality of receiving and transmitting information:
Understanding the biological neuron
• 1. Dendrites – to receive signals from neighbouring neurons.

• 2. Soma – main body of the neuron which accumulates the signals coming from
the different dendrites. It ‘fires’ when a sufficient amount of signal is
accumulated.

• 3. Axon – last part of the neuron which receives signal from soma, once the
neuron ‘fires’, and passes it on to the neighbouring neurons through the axon
terminals.

• There is a very small gap between the axon terminal of one neuron and the
adjacent dendrite of the neighbouring neuron.

• This small gap is known as synapse.


Understanding the biological neuron
• The signals transmitted through synapse may be excitatory or inhibitory.

• The adult human brain, which forms the main part of the central nervous
  system, is approximately 1.3 kg in weight and 1200 cm³ in volume.

• It is estimated to contain about 100 billion (i.e. 10^11) neurons.

• The axon varies from 10 to 12 μm in diameter.


Exploring the artificial neuron
• The biological neural network has been modelled in the form of ANN with
artificial neurons simulating the functionality of biological neurons.

• As depicted in the Figure below, each artificial neuron has three major
  components: a set of input signals xi (x1, x2, …, xn) arriving through weighted
  synapses, a summing part, and a threshold activation function.

• 1. A set of synapses, each having a weight wi. A signal xi forms the input to
  the i-th synapse having weight wi.

• The value of weight wi may be positive or negative.

• A positive weight has an excitatory effect, while a negative weight has an


inhibitory effect on the output of the summation junction, ysum.
Exploring the artificial neuron
• 2. A summation junction at which each input signal is weighted by its
  respective synaptic weight and the weighted inputs are summed up.

• The output of the summation junction, ysum, can be expressed as follows:

      ysum = x1·w1 + x2·w2 + … + xn·wn = Σ (i = 1 to n) xi·wi

• Typically, a neural network also includes a bias which adjusts the input of
the activation function.
Exploring the artificial neuron
• 3. A threshold activation function (or
simply activation function, also called
squashing function) results in an output
signal only when an input signal exceeding
a specific threshold value comes as an
input.

• Output of the activation function, yout, can be expressed as follows:

      yout = f(ysum), e.g. for a threshold activation,
      yout = 1 if ysum ≥ θ, and yout = 0 if ysum < θ
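To make the structure concrete, below is a minimal Python/NumPy sketch of a single artificial neuron, assuming a simple threshold (step) activation; the variable names and the example values are illustrative, not from the source.

```python
import numpy as np

def artificial_neuron(x, w, theta=0.0):
    """Weighted sum of the inputs followed by a threshold activation."""
    y_sum = np.dot(w, x)                  # summation junction: sum of wi * xi
    y_out = 1 if y_sum >= theta else 0    # threshold activation function
    return y_out

# Example: two inputs with excitatory (positive) weights
x = np.array([1.0, 0.5])
w = np.array([0.6, 0.4])
print(artificial_neuron(x, w, theta=0.5))   # 0.6*1 + 0.4*0.5 = 0.8 >= 0.5, so output is 1
```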
Types of activation functions
• There are different types of activation functions. The most
commonly used activation functions are :
• 1. Identity function
• 2. Threshold/step function
• 3. ReLU (Rectified Linear Unit) function
• 4. Sigmoid function
• 5. Hyperbolic tangent function
Types of activation functions
• 1. Identity function

• Identity function is used as an activation function for the input layer. It is a
  linear function having the form

      f(x) = x, for all x

• The output remains the same as the input.

• 2. Threshold/step function

• Step/threshold function is a commonly used activation function. Figure (a) below
  shows the step function, which gives 1 as output if the input is either 0 or
  positive.
Types of activation functions
• If the input is negative, the step function gives 0 as output. Expressing it
  mathematically:

      f(x) = 1 if x ≥ 0
      f(x) = 0 if x < 0

• The threshold function (depicted in Figure b) is almost like the step function,
  with the only difference being the fact that θ is used as the threshold value
  instead of 0.

• Expressing it mathematically:

      f(x) = 1 if x ≥ θ
      f(x) = 0 if x < θ
Types of activation functions
• 3. ReLU (Rectified Linear Unit) function

• ReLU is the most popularly used activation

function in the areas of convolutional neural

networks and deep learning.

• It is of the form

      f(x) = max(0, x)

• This means that f(x) is zero when x is less than zero and f(x) is equal to x
  when x is above or equal to zero.

• This function is differentiable, except at a single point x = 0. In that sense,
  the derivative of ReLU is actually a sub-derivative.
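A short sketch of ReLU and its sub-derivative in Python/NumPy (the function names relu and relu_grad are illustrative):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Sub-derivative of ReLU: 0 for x < 0, 1 for x > 0 (conventionally 0 at x = 0)."""
    return (x > 0).astype(float)

print(relu(np.array([-2.0, 0.0, 3.0])))       # [0. 0. 3.]
print(relu_grad(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]
```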
Types of activation functions
4. Sigmoid function

The sigmoid function, shown in the Figure, is the most commonly used

activation function in neural networks.

The need for the sigmoid function arises because many learning algorithms

require the activation function to be differentiable, and hence continuous.

There are two types of sigmoid function:

1. Binary sigmoid function

2. Bipolar sigmoid function


Types of activation functions
4.1 Binary sigmoid function

• A binary sigmoid function, depicted in Figure (a), takes the form

      f(x) = 1 / (1 + e^(−kx))

• where k = steepness or slope parameter of the sigmoid function.

• By varying the value of k, sigmoid functions with different slopes can be
  obtained. It has a range of (0, 1).

• The slope at origin is k/4. As the value of k becomes very large, the sigmoid
function becomes a threshold function.
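A small sketch of the binary sigmoid with steepness parameter k (illustrative names and values), showing how the slope at the origin grows with k:

```python
import numpy as np

def binary_sigmoid(x, k=1.0):
    """Binary sigmoid with steepness k; output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-k * x))

# The slope at the origin is k/4: estimate it numerically for k = 1 and k = 8
for k in (1.0, 8.0):
    h = 1e-6
    slope = (binary_sigmoid(h, k) - binary_sigmoid(-h, k)) / (2 * h)
    print(f"k = {k}: slope at origin ≈ {slope:.3f} (expected {k / 4})")
```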
Types of activation functions
4.2 Bipolar sigmoid function

• A bipolar sigmoid function, depicted in Figure (b), takes the form

      f(x) = (1 − e^(−kx)) / (1 + e^(−kx))

• The range of values of sigmoid functions can be varied depending on the


application.

• However, the range of (−1, +1) is most commonly adopted.


Types of activation functions
5. Hyperbolic tangent function

• Hyperbolic tangent function is another continuous activation function, which is


bipolar in nature.

• It is a widely adopted activation function for a special type of neural network


known as backpropagation network.

• The hyperbolic tangent function is of the form

      f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

• This function is similar to the bipolar sigmoid function.
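A brief sketch comparing the bipolar sigmoid and the hyperbolic tangent, both bipolar with range (−1, +1); the function names are illustrative:

```python
import numpy as np

def bipolar_sigmoid(x, k=1.0):
    """Bipolar sigmoid; output range (-1, +1)."""
    return (1.0 - np.exp(-k * x)) / (1.0 + np.exp(-k * x))

def hyperbolic_tangent(x):
    """Hyperbolic tangent; also bipolar with range (-1, +1)."""
    return np.tanh(x)

x = np.array([-2.0, 0.0, 2.0])
print(bipolar_sigmoid(x, k=2.0))   # equals tanh(x): the bipolar sigmoid with k = 2 is tanh
print(hyperbolic_tangent(x))
```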


Early implementations of ANN
1. McCulloch–Pitts model of neuron

• The McCulloch–Pitts neural model is the earliest ANN model. It has only two
  types of input weights – excitatory and inhibitory.

• The excitatory weights have weights of positive magnitude and the inhibitory
weights have weights of negative magnitude.

• The inputs of the McCulloch Pitts neuron could be either 0 or 1.

• It has a threshold function as its activation function. So, the output signal yout is 1
  if the input ysum is greater than or equal to a given threshold value, else 0.
Early implementations of ANN
• Simple McCulloch–Pitts neurons can be used to design logical operations.

• For that purpose, the connection weights need to be correctly decided along with
the threshold function.

• Figure below shows McCulloch–Pitts model of neuron


Early implementations of ANN
• Example :

• Consider two input signals x1 and x2, each of which can be either 0 or 1. Take the
  weights of both inputs x1 and x2 as 1 and the threshold value of the activation
  function as 1. Verify the final output using the McCulloch–Pitts model of neuron.

• Solution :

• The neural model and truth table will look as below for two input signals.

  X1   X2   ysum   Yout
  0    0    0      0
  0    1    1      1
  1    0    1      1
  1    1    2      1

• (With both weights equal to 1 and a threshold of 1, the neuron fires whenever at
  least one input is 1, i.e. it implements the logical OR function.)
Early implementations of ANN
• Formally, we can say

      yout = 1 if (x1·w1 + x2·w2) ≥ 1, else yout = 0

• The truth table built with respect to the problem is depicted in the Figure
  below:
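A minimal sketch of this McCulloch–Pitts neuron in Python, assuming the weights and threshold from the example above (both weights 1, threshold 1):

```python
def mcculloch_pitts(x1, x2, w1=1, w2=1, threshold=1):
    """McCulloch-Pitts neuron: binary inputs, weighted sum, threshold activation."""
    y_sum = x1 * w1 + x2 * w2
    return 1 if y_sum >= threshold else 0

# Truth table for the example (weights 1, threshold 1 -> logical OR)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcculloch_pitts(x1, x2))
```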
Early implementations of ANN

• The following problems are found in the MP model :

1. What about non-boolean (say, real) inputs ?

2. Do we always need to hand code the threshold ?

3. Are all inputs equal ? What if we want to assign more weight (importance)
to some inputs ?

4. What about functions which are not linearly separable ?


Early implementations of ANN
2. Rosenblatt’s perceptron

Rosenblatt’s perceptron is built around the
McCulloch–Pitts neural model. The perceptron
is depicted in the Figure below.

The linear combiner or the adder node


computes the linear combination of the
inputs x1, x2, x3…. applied to the synapses
with synaptic weights being w1, w2, w3,….

Then, the hard limiter checks whether the


resulting sum is positive or negative.
Early implementations of ANN
• If the input of the hard limiter node is positive, the output is +1, and if
the input is negative, the output is −1.

• Mathematically, the hard limiter input is:

      ysum = x1·w1 + x2·w2 + … + xn·wn = Σ (i = 1 to n) xi·wi

• However, the perceptron also includes an adjustable value or bias as an
  additional weight w0, attached to a fixed input x0 = 1.

• The output is decided by the expression

      yout = +1 if (ysum + w0) > 0, and yout = −1 if (ysum + w0) < 0


Early implementations of ANN
• The objective of perceptron is to classify a set of inputs into two classes, c1
and c2.

• This can be done using a very simple decision rule – assign the inputs x0 ,
x1 , x2 , …, xn to c1 if the output of the perceptron, i.e. yout , is +1 and c2 if
yout is −1.

• Therefore, for two input signals denoted by variables x1 and x2, the decision
  boundary is a straight line of the form

      w1·x1 + w2·x2 + w0 = 0
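A minimal sketch of the perceptron decision rule in Python/NumPy, assuming the ±1 hard limiter and bias weight w0 described above (the particular weights are chosen only for illustration):

```python
import numpy as np

def perceptron_output(x, w, w0):
    """Hard-limiter perceptron: +1 if w.x + w0 > 0, else -1."""
    y_sum = np.dot(w, x) + w0
    return 1 if y_sum > 0 else -1

# Two-input example: the decision boundary is w1*x1 + w2*x2 + w0 = 0
w, w0 = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    label = "c1" if perceptron_output(np.array(x, float), w, w0) == 1 else "c2"
    print(x, "-> class", label)   # only [1, 1] lies on the positive side of x1 + x2 - 1.5 = 0
```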
Early implementations of ANN
• Thus, we can see that for a data set with linearly
separable classes, perceptrons can always be
employed to solve classification problems using
decision lines (for two dimensional space),
decision planes (for three-dimensional space), or
decision hyperplanes (for n-dimensional space).

• If the classes are non-linearly separable as shown


in Figure, then the classification problem cannot
be solved by perceptron.
Early implementations of ANN
• Multi-layer perceptron

• A basic perceptron works very successfully for data sets which possess linearly
  separable patterns. In practical situations, however, that is an ideal condition
  which rarely holds.

• This was exactly the point driven by Minsky and Papert in their work (1969).

• They showed that a basic perceptron is not able to learn to compute even a
simple 2-bit XOR.

• Figure shows the truth table highlighting output of a 2-bit XOR function.
Early implementations of ANN

• As shown in Figure, the data is not linearly separable. Only a curved decision

boundary can separate the classes properly.


Early implementations of ANN
• To address this issue, the other option is to use two decision lines in
place of one.
• Figure shows how a decision boundary made up of two decision lines can clearly
  partition the data.
Early implementations of ANN
• This is the philosophy used to design the multi-layer perceptron model.

• The major highlights of this model are as follows:

1. The neural network contains one or more intermediate layers between


the input and the output nodes, which are hidden from both input and
output nodes.

2. Each neuron in the network includes a non-linear activation function


that is differentiable.

3. The neurons in each layer are connected with some or all the neurons in
the previous layer.
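As an illustration of how a hidden layer lets the network compute XOR, here is a tiny sketch with hand-picked weights (the specific weights and thresholds are illustrative assumptions, not from the source); the hidden neurons compute OR and AND, and the output neuron combines them:

```python
def step(x):
    """Step activation: 1 if the input is >= 0, else 0."""
    return 1 if x >= 0 else 0

def xor_mlp(x1, x2):
    """2-2-1 multi-layer perceptron computing XOR with fixed, hand-picked weights."""
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # hidden neuron 1: logical OR
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # hidden neuron 2: logical AND
    y = step(1.0 * h1 - 1.0 * h2 - 0.5)    # output neuron: fires for OR but not AND -> XOR
    return y

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))   # prints 0, 1, 1, 0
```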
Early implementations of ANN
• 3. ADALINE network model

• Adaptive Linear Neural Element (ADALINE) is an early single-layer ANN developed by


Professor Bernard Widrow of Stanford University.

• As shown in the Figure, it has only one output neuron. The output value can be +1 or −1.

• A bias input x0 (where x0 = 1) having a weight w0 is added.

• The activation function is such that if the weighted sum is positive or 0, then the output is 1,
else it is −1.
Early implementations of ANN

• The supervised learning algorithm adopted by the ADALINE network is known


as Least Mean Square (LMS) or Delta rule.

• A network combining a number of ADALINEs is termed as MADALINE (many


ADALINE). MADALINE networks can be used to solve problems related to
nonlinear separability.
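A rough sketch of the ADALINE weight update using the LMS/Delta rule; the learning rate, number of epochs, and training data are illustrative assumptions:

```python
import numpy as np

def adaline_train(X, t, lr=0.1, epochs=50):
    """LMS (Delta) rule: adjust weights in proportion to the error on the linear output."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            y_sum = np.dot(w, x_i)                 # linear (pre-activation) output
            w += lr * (t_i - y_sum) * x_i          # Delta rule update
    return w

# Example: learn the logical AND with bipolar targets (+1 / -1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)
w = adaline_train(X, t)
print(np.where(X @ w[1:] + w[0] >= 0, 1, -1))      # predictions after training
```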
Architectures of neural network
• ANN is a computational system consisting of a large number of interconnected
units called artificial neurons.

• The connection between artificial neurons can transmit signal from one
neuron to another.

• There are multiple possibilities for connecting the neurons based on which
architecture we are going to adopt for a specific solution.

• Some of the possibilities could be :

1. There may be just two layers of neuron in the network – the input and output
layer.
Architectures of neural network
2. Other than the input and output layers, there may be one or more

intermediate ‘hidden’ layers of neuron.

3. The neurons may be connected with one or more of the neurons in the next

layer.

4. The neurons may be connected with all neurons in the next layer.

5. There may be single or multiple output signals. If there are multiple output

signals, they might be connected with each other.

6. The output from one layer may become input to neurons in the same or

preceding layer.
Architectures of neural network
• 1 Single-layer feed forward network

• Single-layer feed forward is the simplest and most basic architecture of ANNs.
  It consists of only two layers, as depicted in the Figure: the input layer and
  the output layer.
Architectures of neural network
• The input layer consists of a set of ‘m’ input neurons X1, X2, …, Xm connected
  to each of the ‘n’ output neurons Y1, Y2, …, Yn. The connections carry
  weights w11, w12, …, wmn.

• The input layer of neurons does not conduct any processing – they pass the
input signals to the output neurons.

• The computations are performed only by the neurons in the output layer.

• Hence, it is known as a single-layer network in spite of having two layers of neurons.

• The signals always flow from the input layer to the output layer. So, this
network is known as feed forward.
Architectures of neural network
• The net signal input to the output neurons is given by:

      y_in_k = x1·w1k + x2·w2k + … + xm·wmk = Σ (i = 1 to m) xi·wik

• for the k-th output neuron. The signal output from each output neuron
  will depend on the activation function used.
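A compact sketch of a single-layer feed-forward pass written as a matrix product; the m × n weight matrix and the binary sigmoid output activation are illustrative assumptions:

```python
import numpy as np

def single_layer_forward(x, W):
    """Single-layer feed-forward network: the net input to the k-th output neuron is (x @ W)[k]."""
    y_in = x @ W                        # y_in[k] = sum_i x[i] * W[i, k]
    return 1.0 / (1.0 + np.exp(-y_in))  # binary sigmoid as the output activation

x = np.array([0.5, 1.0, -0.5])          # m = 3 input neurons
W = np.array([[ 0.2, -0.1],             # m x n weight matrix (n = 2 output neurons)
              [ 0.4,  0.3],
              [-0.6,  0.5]])
print(single_layer_forward(x, W))
```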
Architectures of neural network
• 2 Multi-layer feed forward ANNs

• The multi-layer feed forward network is quite similar to the single-layer feed forward
network, except for the fact that there are one or more intermediate layers of neurons
between the input and the output layers.

• The structure of this network is depicted in Figure


Architectures of neural network
• Each of the layers may have varying number of neurons.

• The net signal input to the neuron in the hidden layer is given by

      y_in_k = Σ (i = 1 to m) xi·wik

• for the k-th hidden layer neuron.

• The net signal input to the neuron in the output layer is given by

      z_in_k = Σ (j = 1 to r) yj·w'jk

• for the k-th output layer neuron, where yj is the output signal of the j-th
  hidden layer neuron.


Architectures of neural network
• 3. Competitive network

• The competitive network is almost the same in structure as the single-layer feed
forward network.

• The only difference is that the output neurons are connected with each other.

• Figure below depicts a fully connected competitive network


Architectures of neural network
• In competitive networks, for a given input, the output neurons compete
amongst themselves to represent the input.

• It represents a form of unsupervised learning in ANN that is suitable for
  finding clusters in a data set.
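A rough winner-take-all sketch of how the output neurons of a competitive network might compete for an input; the distance-based similarity measure and the weights are illustrative assumptions, not a specific algorithm from the source:

```python
import numpy as np

def competitive_forward(x, W):
    """Winner-take-all: the output neuron whose weight vector is closest to x wins."""
    distances = np.linalg.norm(W - x, axis=1)   # one weight vector per output neuron
    winner = int(np.argmin(distances))
    y = np.zeros(W.shape[0])
    y[winner] = 1.0                              # only the winning neuron fires
    return y

x = np.array([0.9, 0.1])
W = np.array([[1.0, 0.0],    # output neuron 0 represents this region of the input space
              [0.0, 1.0]])   # output neuron 1 represents this region
print(competitive_forward(x, W))   # [1. 0.] -> neuron 0 wins
```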
Architectures of neural network
• 4. Recurrent network

• In feed forward networks, signals always flow from the input layer towards the
  output layer (through the hidden layers in the case of multi-layer feed forward
  networks), i.e. in one direction only.

• In the case of recurrent neural networks, there is a small deviation: there is a
  feedback loop, as shown in the Figure.
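A toy sketch of the feedback idea: the output of a unit is fed back as an extra input at the next step. The weights and the tanh activation are illustrative assumptions:

```python
import numpy as np

def recurrent_forward(inputs, w_in=0.8, w_fb=0.3):
    """Tiny recurrent unit: the previous output re-enters the unit through a feedback weight."""
    y_prev = 0.0
    outputs = []
    for x in inputs:
        y = np.tanh(w_in * x + w_fb * y_prev)   # feedback loop: y_prev is an additional input
        outputs.append(y)
        y_prev = y
    return outputs

print(recurrent_forward([1.0, 0.0, 0.0, 0.0]))  # the effect of the first input decays over time
```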


Learning process in ANN
• What is learning in the context of ANNs?
• There are four major aspects which need to be decided:
• 1. The number of layers in the network
• 2. The direction of signal flow
• 3. The number of nodes in each layer
• 4. The value of weights attached with each interconnection between
neurons
Learning process in ANN
• 1 Number of layers

• A neural network may have a single layer or multi-layer.

• In the case of a single layer, a set of neurons in the input layer receives signal, i.e. a
single feature per neuron, from the data set.

• The value of the feature is transformed by the activation function of the input
neuron.

• The signals processed by the neurons in the input layer are then forwarded to the
neurons in the output layer.

• The neurons in the output layer use their own activation function to generate the
final prediction.
Learning process in ANN
• 2 Direction of signal flow

• Signal is always fed in one direction, i.e. from the input layer towards the output
layer through the hidden layers, if there is any.

• However, certain networks, such as the recurrent network, also allow signals to
travel from the output layer to the input layer.

• 3 Number of nodes in layers

• In the case of a multi-layer network, the number of nodes in each layer can be
varied.

• However, the number of nodes or neurons in the input layer is equal to the number
of features of the input data set.
Learning process in ANN
• Similarly, the number of output nodes will depend on possible outcomes, e.g. number
of classes in the case of supervised learning.

• The number of nodes in each of the hidden layers is to be chosen by the user.

• A larger number of nodes in the hidden layer helps in improving the performance.
  However, too many nodes may result in overfitting as well.

• 4 Weight of interconnection between neurons

• Deciding the value of weights attached with each interconnection between neurons
for a specific learning problem is quite a difficult problem.

• In the case of supervised learning, the objective to be pursued is to reduce the


number of misclassifications.
Learning process in ANN
• The iterations for making changes in weight values should be continued
till there is no misclassification.

• The learning process using ANN is thus a combination of multiple decisions –
  the number of hidden layers, the number of nodes in each of the hidden layers,
  the direction of signal flow, and, last but not the least, the connection
  weights.
Back propagation
• The learning method adopted to train a multi-layer feed forward network
is termed as backpropagation.

• In 1986, an efficient method of training an ANN was discovered.

• In this method, errors, i.e. difference in output values of the output layer
and the expected values, are propagated back from the output layer to the
preceding layers.

• The algorithm implementing this method is known as backpropagation,


i.e. propagating the errors backward to the preceding layers.
Back propagation
• The backpropagation algorithm is applicable for multi-layer feed
forward networks.

• It is a supervised learning algorithm which continues adjusting the


weights of the connected neurons with an objective to reduce the
deviation of the output signal from the target output.

• This algorithm consists of multiple iterations, also known as epochs.


Each epoch consists of two phases -
Back propagation
• A forward phase in which the signals flow from the neurons in the input layer to the
neurons in the output layer through the hidden layers.

• The weights of the interconnections and activation functions are used during the
flow.

• In the output layer, the output signals are generated.

• A backward phase in which the output signal is compared with the expected value.

• The computed errors are propagated backwards from the output to the preceding
layers.

• The errors propagated back are used to adjust the interconnection weights between
the layers.
Back propagation
• The iterations continue till a stopping criterion is reached. Figure shows a
simplified version of the backpropagation algorithm.
Back propagation
• One main part of the algorithm is adjusting the interconnection weights.

• This is done using a technique termed as gradient descent.

• In simple terms, the algorithm calculates the partial derivative of the
  cost function with respect to each interconnection weight to identify the
  ‘gradient’ or extent of change of the weight required to minimize the cost
  function.

• Backpropagation net with one hidden layer is depicted in Figure below. In


this network, X0 is the bias input to the hidden layer and Y0 is the bias
input to the output layer.
Back propagation

• The net signal input to the hidden layer neurons is given by:

      y_in_k = x0·w0k + x1·w1k + … + xm·wmk = Σ (i = 0 to m) xi·wik


Back propagation
• for the k-th neuron in the hidden layer. If fy is the activation function of the
  hidden layer, then

      yk = fy(y_in_k)

• The net signal input to the output layer neurons is given by:

      z_in_k = y0·w'0k + y1·w'1k + … + yr·w'rk = Σ (j = 0 to r) yj·w'jk

• for the k-th neuron in the output layer. The bias input signals to X0 and Y0 are
  assumed as 1. If fz is the activation function of the output layer, then

      zk = fz(z_in_k)
Back propagation
• If tk is the target output of the k-th output neuron, then the cost function
  defined as the squared error of the output layer is given by

      E = ½ · Σ (k) (tk − zk)²

• As a part of the gradient descent algorithm, the partial derivative of the cost
  function E has to be taken with respect to each of the interconnection weights
  w'01, w'02, …, w'nr.

• Mathematically, it can be represented as follows:

      ∂E / ∂w'jk = ∂/∂w'jk [ ½ · Σ (k) (tk − zk)² ]


Back propagation

• for the interconnection weight between the j-th neuron in the hidden layer
and the k-th neuron in the output layer.
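To tie the pieces together, here is a rough sketch of a few epochs of backpropagation for a network with a single hidden layer, using the sigmoid as both fy and fz and gradient descent on the squared-error cost. The layer sizes, learning rate, bias handling (omitted for brevity), and data are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_epoch(x, t, W, W_prime, lr=0.5):
    """One forward + backward pass for a 1-hidden-layer network (bias terms omitted for brevity)."""
    # Forward phase
    y_in = x @ W                 # net input to the hidden layer
    y = sigmoid(y_in)            # hidden layer output, fy = sigmoid
    z_in = y @ W_prime           # net input to the output layer
    z = sigmoid(z_in)            # output layer output, fz = sigmoid

    # Backward phase: propagate errors back and adjust weights (gradient descent)
    delta_out = (z - t) * z * (1 - z)            # dE/dz_in for E = 1/2 * sum (t - z)^2
    delta_hidden = (delta_out @ W_prime.T) * y * (1 - y)
    W_prime -= lr * np.outer(y, delta_out)       # dE/dw'jk = y_j * delta_out_k
    W -= lr * np.outer(x, delta_hidden)          # dE/dw_ik = x_i * delta_hidden_k
    return W, W_prime, 0.5 * np.sum((t - z) ** 2)

# Tiny example: 2 inputs, 2 hidden neurons, 1 output neuron
rng = np.random.default_rng(0)
W, W_prime = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
x, t = np.array([1.0, 0.0]), np.array([1.0])
for epoch in range(5):
    W, W_prime, err = backprop_epoch(x, t, W, W_prime)
    print(f"epoch {epoch}: squared error = {err:.4f}")
```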
