Ann mod1

Uploaded by

imnavinbabu

20MCA283

DEEP LEARNING
Module 1
Artificial Neural Networks
Introduction
• Two major Problem Solving Techniques are
• Hard computing
• Soft Computing
Introduction
Hard computing
• Hard computing deals with precise models, where accurate solutions are achieved quickly.
• Hard computing techniques require exact input data.
• It is strictly sequential and provides precise answers to complex problems.
Introduction
Soft computing
• Soft computing deals with approximate models.
• The term "soft computing" was introduced by Professor Lotfi Zadeh in 1994.
• It provides solutions to complex problems and deals with imprecise, uncertain, and partially true data.
• Soft computing combines neural networks, fuzzy logic, and genetic algorithms.
Deep learning
[Figure: example network; labels recovered from the figure: Area, Facilities, Price, Age, Location]
• Deep learning is a subset of machine learning, which is itself part of artificial intelligence.
• It learns functions that loosely mimic the working of the brain: it creates patterns (that is, it learns and encodes significant features from the input data) and uses them to process data.
• It uses artificial neural networks with many layers to address complex problems.

• Problems in Deep Learning

• Categorization (Classification)
• Prediction
Applications of Deep learning

• Computer Vision: Computer vision deals with algorithms that let computers understand the world from image and video data.
• Eg: Image recognition, image classification, object detection, image segmentation, image restoration, etc.
• Speech and Natural Language Processing: Natural language processing (NLP) deals with algorithms that let computers understand, interpret, and manipulate human language. NLP algorithms work with text and audio data and transform them into audio or text output.
• Eg: Sentiment analysis, speech recognition, language translation, natural language generation, etc.
Applications of Deep learning

• Autonomous Vehicles: Deep learning models for driverless cars are trained on huge amounts of data; some models specialize in identifying street signs, others in identifying pedestrians, and so on.
• Text Generation: Deep learning models trained on the language, grammar, and types of existing texts can be used to create new text with correct spelling and grammar.
• Image Filtering: Deep learning models can perform tasks such as adding color to black-and-white images, which would take much more time if done manually.
Deep learning
Biological Neuron
• The human brain consists of billions of neural cells called neurons (approximately 10^11).
• Neurons process information.
• Each cell works like a single processor.
• The interaction between cells and their parallel processing help the brain to learn, reorganize itself from experience, and adapt to the environment.
• Information is transported between neurons in the form of electrical signals.
Biological Neuron
A neuron consists of the following four parts:
• Dendrites: They receive information from the other neurons the cell is connected to.
• Soma (cell body): It processes the information received from the dendrites. The nucleus is located here.
• Axon: It acts like a cable through which the neuron sends information.
• Synapses: They are the connections between the axon and the dendrites of other neurons.
Biological Neuron

• The incoming information from the dendrites is added up in the cell body (soma) and then delivered to the synapses via the axon.
• If the incoming stimulation exceeds a certain threshold, the neuron is activated.
• If the stimulation is too low, the neuron is inhibited and the information is not transported any further.
Artificial Neuron
• An artificial neuron is a mathematical model of the biological neuron.
• It is the basic unit of an Artificial Neural Network (ANN).
• A neuron can accept any number of inputs and sends a single output signal.
Artificial Neuron – Mathematical Model

• Let the inputs be x1, x2, …, xn.
• The inputs are connected to the cell body through links with weights w1, w2, …, wn respectively.
• The weights represent the connection strength, similar to synaptic strength in a biological neuron.
Artificial Neuron – Net Input Calculation
• Net input
$$y_{in} = x_1 w_1 + x_2 w_2 + \dots + x_n w_n = \sum_{i=1}^{n} x_i w_i$$

• Output calculation
To calculate the output, an activation function is applied over the net input $y_{in}$.

Eg: Activation functions
• Hard limiter
• Sign
• Sigmoid
• Ramp function
• Hyperbolic tangent
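The net input and output calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`net_input`, `hard_limiter`, `neuron_output`) and the sample inputs and weights are assumptions made for the example, not part of the original slides.

```python
def net_input(x, w):
    """Net input y_in = sum of x_i * w_i over all inputs."""
    return sum(xi * wi for xi, wi in zip(x, w))


def hard_limiter(y_in, threshold=0.0):
    """Binary step activation: 1 if y_in >= threshold, else 0."""
    return 1 if y_in >= threshold else 0


def neuron_output(x, w, activation=hard_limiter):
    """Apply the activation function over the net input."""
    return activation(net_input(x, w))


# Hypothetical inputs and weights, chosen only for illustration.
x = [1.0, 0.5, -1.0]
w = [0.2, 0.4, 0.1]
y_in = net_input(x, w)   # 0.2 + 0.2 - 0.1
y = neuron_output(x, w)
```

Any of the other activation functions listed above could be passed in place of `hard_limiter`.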
Biological Neuron Vs. Artificial Neuron
Terminology relationship between biological and artificial neurons

Biological Neuron | Artificial Neuron
Cell | Neuron
Dendrites | Weights or interconnections
Soma | Net input
Axon | Output
Biological Neuron Vs. Artificial Neuron

Criterion | Biological Neuron | Artificial Neuron
1. Speed of execution | Milliseconds | Nanoseconds
2. Processing | Can perform massive parallel operations | Can perform massive parallel operations
3. Size and complexity | The brain has about 10^11 neurons and about 10^15 interconnections; complexity is high | Size and complexity depend on the application and the designer
4. Tolerance | Fault tolerant: can store and retrieve data even if interconnections get disconnected | Information gets corrupted if interconnections get disconnected
5. Storage or memory | Stores data in the synapses; new data can be added by adjusting synaptic strength without destroying old data | Stores data in the weight matrix; adding new data may destroy old data
6. Control mechanism | No control unit | A control unit in the CPU transfers values from one unit to another
Eg: ANN Models
• McCulloch-Pitts Model (1943)
• Hebb Network (1949)
• Perceptron (1958)
• Adaline (1960)
• Back Propagation Network (1986)
Characteristics of ANN

• An ANN is a mathematical model.
• It contains interconnected processing elements (neurons).
• Weighted links (interconnections) hold the information.
• Neurons can learn, recall, and generalize data by adjusting weights.
• No single neuron carries specific information; only a collection of neurons holds data.
ANN
• In neural networks, neurons are organized in layers:
• Input layer
• Hidden layer
• Output layer
• When data is fed to the ANN, it is processed through the layers of neurons to produce the desired output.
• Data is presented to the network via the input layer.
• The input layer communicates with the hidden layers.
• The hidden layers then link to an output layer, where the answer is output.
Basic Models of Artificial Neural Networks (ANN)

• ANN models are specified by three entities:

1. Inter Connections
2. Learning
3. Activation Function
Basic Models of ANN
1. Inter Connections

• The connection pattern formed within and between layers is called the network architecture.
• There are five basic types of neuron connection architectures:
1. single-layer feed-forward network;
2. multilayer feed-forward network;
3. single node with its own feedback;
4. single-layer recurrent network;
5. multilayer recurrent network.
1.1 Single-layer feed-forward network
• In feed-forward networks, data flows in a single direction, from the input layer to the output layer.
• A single-layer feed-forward network contains an input layer and an output layer.
• Here the inputs are given directly to the neurons of the output layer.
• Each neuron of the output layer calculates the net input (yin), and an activation function is applied over it to produce the output (y).
[Figure: single-layer feed-forward network; input layer (X1, X2, ...) connected to output layer (y1, y2) through weights w11, w12, w21, w22, w31, w32]
1.2 Multi layer Feed Forward network

• This type of network contains one or more layers (hidden layers) between the input layer and the output layer.
• The more hidden layers there are, the greater the complexity of the network.
[Figure: multilayer feed-forward network; input layer (X1, X2, X3), hidden layer (Z1, Z2) with weights v11 to v32, output layer (Y1) with weights w11, w21]
1.3 Single layer Recurrent Network (Feed back Network)

• Networks in which output links are directed back as inputs to nodes of the same or a preceding layer are called feedback networks.
• Recurrent networks are feedback networks with closed loops.
[Figure: single-layer recurrent network; inputs X1, X2, X3, outputs y1, y2, weights w11 to w32]
1.4 Multi layer Recurrent Network (Feed back Network)

• Networks in which output links are directed back as inputs to nodes of the same or a preceding layer are called feedback networks.
• Recurrent networks are feedback networks with closed loops.
[Figure: multilayer recurrent network; inputs X1, X2, X3, hidden units Z1, Z2 with weights v11 to v32, output Y1 with weights w11, w21]
1.5 Single node with its own feedback (Feed back Network)

• A single-node recurrent network has a single neuron with feedback to itself.
• Recurrent networks are feedback networks with closed loops.
2. Learning or Training
• Learning is the process that improves the ANN's performance; it is applied repeatedly over the network.
• Learning is done by updating the weights until the desired output is obtained.
• Learning is carried out by a learning algorithm.
• Data called the training data set is fed to the learning algorithm, and the algorithm draws inferences from it.
• Types of learning are:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
2. Learning or Training – Supervised learning
• The network learns with the help of a teacher (supervisor).
• In this method, both the input and the target output patterns (training pairs) are provided.
• During training, an input is given to the ANN, which produces an output (the actual output).
• This actual output is compared with the target output (the output given in the training pair).
• If there is a difference, an error signal is generated.
• This error signal is used for adjusting the weights.
• This process repeats until the actual output matches the target output or the required accuracy is attained.
• The supervisor helps to reduce the error, so this is called supervised learning.
2. Learning or Training – Unsupervised learning

• The learning process is independent: there is no teacher.
• Inputs are grouped into clusters without the use of target outputs; the network finds the hidden structure in the input.
• In the training process, the network receives the input patterns and organizes them to form clusters.
• When an input is applied to the ANN, it gives a response indicating the cluster to which that input belongs.
• If an input does not belong to any cluster, a new cluster is formed.
• There is no mechanism to check whether the outputs are correct.
2. Learning or Training – Unsupervised learning

• The ANN must itself discover patterns, regularities, and features in the input data, and the relations between the input data and the output.
• While discovering these features, the network undergoes changes in its weights. This process is called self-organizing.
[Figure: unsupervised learning; label: changes in weight values]
2. Learning or Training – Reinforcement Learning

• This is similar to supervised learning, but less information is available.
• The exact target output is not available; only critic information (an evaluation of the output) is available.
• Feedback is sent as a reinforcement signal.
• This reinforcement signal is processed in an error signal generator.
• The ANN adjusts its weights according to the error signal, and the training process repeats.
3. Activation Functions

1. Hard limiter (step function or binary step function)

$$f(y_{in}) = \begin{cases} 0 & \text{if } y_{in} < T \\ 1 & \text{if } y_{in} \ge T \end{cases}$$

where T is the threshold (the plot uses T = 0).
3. Activation Functions
2. Sign function (bipolar step function)

$$f(y_{in}) = \begin{cases} -1 & \text{if } y_{in} < T \\ +1 & \text{if } y_{in} \ge T \end{cases}$$

Here y is -1 or +1.
[Figure: plots of the sign function, labelled "Let T=0" and "Let T=1"]
3. Activation Functions

3. Binary sigmoid

$$f(y_{in}) = \frac{1}{1 + e^{-y_{in}}}$$

Here y lies between 0 and 1.

3. Activation Functions

4. Bipolar sigmoid

$$f(y_{in}) = \frac{1 - e^{-y_{in}}}{1 + e^{-y_{in}}}$$

Here y lies between -1 and 1.

3. Activation Functions

5. Ramp function

$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > 1 \\ y_{in} & \text{if } 0 \le y_{in} \le 1 \\ 0 & \text{if } y_{in} < 0 \end{cases}$$

Here y lies between 0 and 1.

3. Activation Functions

6. Tanh (hyperbolic tangent)

The mathematical formula is

$$f(y_{in}) = \tanh(y_{in}) = \frac{e^{2 y_{in}} - 1}{e^{2 y_{in}} + 1}$$
3. Activation Functions

7. ReLU (rectified linear unit)

Most modern deep learning models use ReLU.

The mathematical formula is:

R(x) = max(0, x)

i.e., if x < 0, R(x) = 0, and if x >= 0, R(x) = x.

• It is very simple and efficient.

• ReLU is normally used only within the hidden layers of a neural network model.

3. Activation Functions

8. Softmax activation function

• The softmax function exponentiates the output for each class and divides it by the sum of the exponentiated outputs, so every value lies between 0 and 1 and the values sum to 1.

• This gives the probability of the input belonging to a particular class.
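The eight activation functions above can be written compactly in Python. This is a minimal sketch using only the standard library; the function names are chosen for this example, and the softmax subtracts the maximum before exponentiating, a common numerical-stability trick that is not mentioned in the slides.

```python
import math


def hard_limiter(y_in, threshold=0.0):
    """Binary step: 0 below the threshold, 1 at or above it."""
    return 1 if y_in >= threshold else 0


def sign(y_in, threshold=0.0):
    """Bipolar step: -1 below the threshold, +1 at or above it."""
    return 1 if y_in >= threshold else -1


def binary_sigmoid(y_in):
    """Output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y_in))


def bipolar_sigmoid(y_in):
    """Output lies between -1 and 1."""
    e = math.exp(-y_in)
    return (1.0 - e) / (1.0 + e)


def ramp(y_in):
    """0 below 0, y_in between 0 and 1, 1 above 1."""
    return min(1.0, max(0.0, y_in))


def tanh(y_in):
    """(e^(2y) - 1) / (e^(2y) + 1), computed via math.tanh."""
    return math.tanh(y_in)


def relu(x):
    """R(x) = max(0, x)."""
    return max(0.0, x)


def softmax(values):
    """Exponentiate each value and divide by the sum of the exponentials."""
    shift = max(values)  # assumption: shift by the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```

Subtracting the maximum does not change the softmax result, because the common factor cancels in the division.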

Terminologies of ANN

• Weight : Each neuron is connected to another neuron via links (weights). The
weights contain information about input signal.

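• Bias: The bias is an additional input (x0 = 1) with its own weight (bj).

The bias affects the calculation of the net input (yin):
If the bias is positive, the net input increases.
If the bias is negative, the net input decreases.

Net input: $y_{in} = x_0 b_j + \sum_{i=1}^{n} x_i w_i$

[Figure: neuron Y1 with bias input X0 = 1 (weight b0) and inputs x1 to xn (weights w1 to wn)]

• Weight matrices: for a multilayer network with input-to-hidden weights $v_{ij}$ and hidden-to-output weights $w_{jk}$,

$$V = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \\ v_{31} & v_{32} \end{bmatrix}, \qquad W = \begin{bmatrix} w_{11} \\ w_{21} \end{bmatrix}$$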
Terminologies of ANN

• Threshold: The threshold is a set value on which the final output of the network is based.
The threshold value is used in the activation function.
The net input is compared with the threshold to obtain the actual output.

$$f(y_{in}) = \begin{cases} 0 & \text{if } y_{in} < \theta \\ 1 & \text{if } y_{in} \ge \theta \end{cases}$$

• Learning rate: It controls the amount of weight adjustment at each step of training.
The learning rate α ranges from 0 to 1.
It determines the rate of learning at each time step.
McCulloch-Pitts Neuron (MP Neuron Model)

• Warren McCulloch and Walter Pitts introduced the MP neuron model in 1943.
• The MP neuron model is used to implement logic functions such as AND, OR, NOT, NAND, and NOR.
• The MP neuron accepts binary inputs (0/1).
• The output of the MP neuron is also binary (0/1).
• Weights can be positive (excitatory) or negative (inhibitory).
• The weight values used are +1 or -1.
• The activation function is the hard limiter.
• The threshold value is constant for an MP neuron.
• No bias is used in the MP model.
McCulloch-Pitts Neuron (MP Neuron Model)

• When the net input yin reaches or exceeds the threshold, the neuron fires (y = 1); otherwise the neuron does not fire (y = 0).

$$y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \ge \text{Threshold} \\ 0 & \text{if } y_{in} < \text{Threshold} \end{cases}$$

[Figure: MP neuron with inputs x1, x2, weights w1, w2, a summation unit with threshold T, and output y]
• The MP neuron has no particular training algorithm.
• An analysis has to be performed to determine the values of the weights and the threshold.
• The weights of the neuron are set along with the threshold.
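As an illustration of the analysis step, the sketch below sets the weights and thresholds by hand for three logic functions. The particular weights and thresholds (for example, weights of +1 and a threshold of 2 for AND) are one possible choice made for this example; the slides do not prescribe them.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (1) if the net input reaches the threshold."""
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= threshold else 0


# Weights are restricted to +1 (excitatory) or -1 (inhibitory); no bias is used.
def AND(x1, x2):
    return mp_neuron([x1, x2], [1, 1], threshold=2)


def OR(x1, x2):
    return mp_neuron([x1, x2], [1, 1], threshold=1)


def NOT(x):
    return mp_neuron([x], [-1], threshold=0)


and_table = [AND(a, b) for a in (0, 1) for b in (0, 1)]
or_table = [OR(a, b) for a in (0, 1) for b in (0, 1)]
```

NAND and NOR can be obtained by feeding the output of `AND` or `OR` into `NOT`.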
Perceptron Network

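• Inputs are directly connected to the neurons in the output layer via adjustable weights.
• The activation function used at the output layer is a modified sign function:

$$y = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$

where θ is the threshold.
[Figure: perceptron with bias input X0 = 1 (weight b0), inputs x1 to xn (weights w1 to wn), a summation unit followed by the activation function (AF), and output y at unit Y1]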
Training Perceptron Network
• Learning is the process of updating the weight values.
• A perceptron can be trained using the perceptron learning rule.
• Weights are updated based on the ERROR between the desired output (target) and the calculated output (actual):
ERROR = Target - Actual
• If ERROR is zero, the goal has been achieved; otherwise the weight values are updated.
• Perceptron networks are used to classify an input pattern as a 'member' or 'not a member' of a particular class.
• The perceptron algorithm can be used for either binary or bipolar input vectors with bipolar targets.
Perceptron Training Algorithm

Step 1: Initialise the weights, the learning rate α, and the threshold θ.

Step 2: For each training pair, activate the inputs.
Step 3: Calculate the output of the network.
1. Obtain the net input (yin):
$y_{in} = b + \sum_{i=1}^{n} x_i w_i$
2. Apply the activation function over the net input to get the output (y):
y = 1 if yin > θ
y = 0 if -θ ≤ yin ≤ θ
y = -1 if yin < -θ
Perceptron Training Algorithm

Step 4: Compare the calculated output y with the target output t, and update the weights if necessary.
If y ≠ t (an error exists), update the weights as follows:
wi(new) = wi(old) + α xi t
b(new) = b(old) + α t
Otherwise, leave the weights unchanged:
wi(new) = wi(old)
b(new) = b(old)
Step 5: If no weight changed for any training pair, stop training; otherwise go to Step 2.
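The five steps above can be sketched in Python as follows. The function and variable names (`train_perceptron`, `and_samples`, `max_epochs`) are assumptions made for this example, and the `max_epochs` cap is an added safeguard that the algorithm as stated does not include. The usage applies it to the bipolar AND function with α = 1 and θ = 0.

```python
def activation(y_in, theta):
    """Modified sign function with threshold theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Train a single-output perceptron with the perceptron learning rule.

    samples: list of (input_vector, target) pairs with bipolar targets.
    Returns (weights, bias, number_of_epochs_run).
    """
    n = len(samples[0][0])
    w = [0.0] * n          # Step 1: initial weights and bias set to zero
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:                                   # Step 2
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))    # Step 3.1
            y = activation(y_in, theta)                        # Step 3.2
            if y != t:                                         # Step 4
                w = [wi + alpha * xi * t for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                                        # Step 5
            return w, b, epoch
    return w, b, max_epochs


# Bipolar AND function: output is +1 only when both inputs are +1.
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b, epochs = train_perceptron(and_samples)
```

Training stops in the first epoch in which no weight changes, which is the stopping condition of Step 5.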
Perceptron Testing Algorithm

Once the training process is complete, test the performance of the perceptron network.
Step 1: Initialise the weights to the final weights obtained during training.
Step 2: For each input, perform the following:
$y_{in} = b + \sum_{i=1}^{n} x_i w_i$
y = 1 if yin > θ (fire)
y = 0 if -θ ≤ yin ≤ θ (not fire)
y = -1 if yin < -θ (not fire)
AND GATE using Perceptron

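Threshold θ = 0, learning rate α = 1, initial weights w1 = 0, w2 = 0, b = 0

Epoch | x1 | x2 | x0 | target t | yin | actual output y | w1 | w2 | b
1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1
1 | 1 | -1 | 1 | -1 | 1 | 1 | 0 | 2 | 0
1 | -1 | 1 | 1 | -1 | 2 | 1 | 1 | 1 | -1
1 | -1 | -1 | 1 | -1 | -3 | -1 | 1 | 1 | -1
2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1
2 | 1 | -1 | 1 | -1 | -1 | -1 | 1 | 1 | -1
2 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | -1
2 | -1 | -1 | 1 | -1 | -3 | -1 | 1 | 1 | -1

(The columns w1, w2, b give the weights after the pattern has been processed.)
[Figure: trained perceptron with bias input X0 = 1 (b = -1), input x1 (w1 = 1), input x2 (w2 = 1), and output y at unit Y1]

In epoch 2 the weights stay constant for all input patterns, so training can stop.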
Backpropagation Networks (BPN)

• A backpropagation network is a multilayer feed-forward neural network containing an input layer, one or more hidden layers, and an output layer.
• This network is also called a multilayer perceptron.
• Neurons in the hidden layers and the output layer have bias inputs.
• The input and output can be either binary or bipolar.
• The activation function is either the binary sigmoid or the bipolar sigmoid.
Backpropagation Networks
Binary sigmoid:
$$y = f(y_{in}) = \frac{1}{1 + e^{-y_{in}}}$$
Here y lies between 0 and 1.

Bipolar sigmoid:
$$y = f(y_{in}) = \frac{1 - e^{-y_{in}}}{1 + e^{-y_{in}}}$$
Here y lies between -1 and 1.

The sigmoid activation function is used in BPN because it is

• continuous
• monotonically non-decreasing
• differentiable
Backpropagation Networks (BPN): Learning Rule

• The learning algorithm provides a procedure for changing the weights.
• The basic concept behind this weight update algorithm is the gradient descent method, as used in simple perceptron networks with differentiable units.
• To update the weights, the error must be calculated.
• In BPN the error is propagated back to the hidden units.
• The error at the output layer can be calculated easily (i.e., target – actual).
• The error at the hidden layer is difficult to calculate because the target output of the hidden layer is not known.
Backpropagation Networks (BPN): Learning Rule

• As the number of hidden layers increases, training becomes more complex.
• The backpropagation algorithm works in three phases:
• Feed-forward phase
• Backpropagation of errors
• Weight and bias update
Backpropagation Networks (BPN): Learning Rule
x: training input vector (x1, x2, …, xn)
t: target output vector (t1, t2, …, tm)
α: learning rate
Xi: input unit i
v0j: bias on the jth hidden unit
w0k: bias on the kth output unit
Zj: jth hidden unit
vij: weight of the link from the ith input unit to the jth hidden unit
wjk: weight of the link from the jth hidden unit to the kth output unit
Backpropagation Networks (BPN): Learning Rule
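1. Net input to Zj:
$$z_{inj} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$$
2. Output of Zj:
$$z_j = f(z_{inj}) = \frac{1}{1 + e^{-z_{inj}}} \quad \text{or} \quad \frac{1 - e^{-z_{inj}}}{1 + e^{-z_{inj}}}$$
3. Yk: kth output unit

4. Net input to Yk:
$$y_{ink} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$$
5. Output of Yk:
$$y_k = f(y_{ink}) = \frac{1}{1 + e^{-y_{ink}}} \quad \text{or} \quad \frac{1 - e^{-y_{ink}}}{1 + e^{-y_{ink}}}$$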
Backpropagation Networks
Training Algorithm
1. Feed-forward of the input training pattern: calculate the net input and output of each neuron.
2. Backpropagation of the error: calculate the errors used for the weight update.
3. Update of weights: update the weights of all units (including the biases).

Testing Algorithm
Only the feed-forward phase is computed, to calculate the output value.
Backpropagation Networks : Training Algorithm

1. Initialize the weights and the learning rate (to some small values).

2. Perform steps 3-10 while the stopping condition is false.
3. Perform steps 4-9 for each training pair.
Phase I: Feed-forward phase
4. Each input unit receives the input signal xi and sends it to the hidden units (i = 1 to n).
5. Each hidden unit Zj (j = 1 to p) calculates its net input:
$$z_{inj} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$$
Backpropagation Networks : Training Algorithm
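5. (continued) Each hidden unit also calculates its output:
$$z_j = f(z_{inj}) = \frac{1}{1 + e^{-z_{inj}}} \quad \text{or} \quad \frac{1 - e^{-z_{inj}}}{1 + e^{-z_{inj}}}$$

6. Each output unit Yk (k = 1 to m) calculates its net input:
$$y_{ink} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$$

and its output:
$$y_k = f(y_{ink}) = \frac{1}{1 + e^{-y_{ink}}} \quad \text{or} \quad \frac{1 - e^{-y_{ink}}}{1 + e^{-y_{ink}}}$$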
Backpropagation Networks : Training Algorithm
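Phase II: Backpropagation of error
7. Calculate the error at the output layer and send this error back to the hidden layer.
The error at output node k is denoted by δk or Errk.
For the binary sigmoid activation function:
$$\delta_k = (t_k - y_k)\, y_k (1 - y_k)$$
For the bipolar sigmoid activation function:
$$\delta_k = (t_k - y_k)\, 0.5\, (1 + y_k)(1 - y_k)$$
8. Calculate the error at the hidden layer.
The error at the jth hidden-layer neuron is denoted by δj or Errj.
For the binary sigmoid activation function:
$$\delta_j = z_j (1 - z_j)\, \delta_{inj} = z_j (1 - z_j) \sum_{k=1}^{m} \delta_k w_{jk}$$
For the bipolar sigmoid activation function:
$$\delta_j = 0.5\, (1 + z_j)(1 - z_j) \sum_{k=1}^{m} \delta_k w_{jk}$$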
Backpropagation Networks : Training Algorithm
Phase III: Weight and bias update
9. For each neuron in the hidden layer:
vij(new) = vij(old) + α δj xi
v0j(new) = v0j(old) + α δj

For each neuron in the output layer:

wjk(new) = wjk(old) + α δk zj
w0k(new) = w0k(old) + α δk
10. Check the stopping condition. The stopping condition may be reaching a certain number of epochs, or the calculated output matching the target output.
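Steps 4 to 9 for a single training pair can be sketched as one Python function. This is a minimal sketch, assuming binary sigmoid units and one hidden layer; the function name `train_step` and the list-of-lists weight layout (`v[i][j]`, `w[j][k]`) are choices made for this example.

```python
import math


def sigmoid(a):
    """Binary sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-a))


def train_step(x, t, v, v0, w, w0, alpha):
    """One BPN training step for one training pair (binary sigmoid units).

    v[i][j]: weight from input unit i to hidden unit j; v0[j]: hidden bias.
    w[j][k]: weight from hidden unit j to output unit k; w0[k]: output bias.
    Returns the updated (v, v0, w, w0) and the output y computed before the update.
    """
    n, p, m = len(x), len(v0), len(w0)

    # Phase I: feed-forward (steps 4-6)
    z = [sigmoid(v0[j] + sum(x[i] * v[i][j] for i in range(n))) for j in range(p)]
    y = [sigmoid(w0[k] + sum(z[j] * w[j][k] for j in range(p))) for k in range(m)]

    # Phase II: backpropagation of error (steps 7-8), using the old weights w
    delta_k = [(t[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(m)]
    delta_j = [z[j] * (1.0 - z[j]) * sum(delta_k[k] * w[j][k] for k in range(m))
               for j in range(p)]

    # Phase III: weight and bias update (step 9)
    new_v = [[v[i][j] + alpha * delta_j[j] * x[i] for j in range(p)] for i in range(n)]
    new_v0 = [v0[j] + alpha * delta_j[j] for j in range(p)]
    new_w = [[w[j][k] + alpha * delta_k[k] * z[j] for k in range(m)] for j in range(p)]
    new_w0 = [w0[k] + alpha * delta_k[k] for k in range(m)]
    return new_v, new_v0, new_w, new_w0, y
```

The hidden-layer errors are computed from the old output weights before any weight is changed, which matches the order of steps 7 to 9. Step 10 (the stopping condition) would wrap repeated calls to `train_step` in a loop over epochs.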
Equations (based on the following network)
[Figure: network with inputs X1, X2, hidden units Z1, Z2, and output units Y1, Y2; weights v11, v12, v21, v22 and w11, w12, w21, w22; biases v01, v02, w01, w02]

Net input to the hidden layer
zin1 = v01 + x1 v11 + x2 v21
zin2 = v02 + x1 v12 + x2 v22

Output of the hidden layer
$$z_1 = \frac{1}{1 + e^{-z_{in1}}}, \qquad z_2 = \frac{1}{1 + e^{-z_{in2}}}$$

Net input to the output layer
yin1 = w01 + z1 w11 + z2 w21
yin2 = w02 + z1 w12 + z2 w22

Output of the output layer
$$y_1 = \frac{1}{1 + e^{-y_{in1}}}, \qquad y_2 = \frac{1}{1 + e^{-y_{in2}}}$$

Error at the output layer
δ1 = (t1 – y1) y1 (1 – y1)
δ2 = (t2 – y2) y2 (1 – y2)

Error at the hidden layer
δz1 = z1 (1 – z1) [δ1 w11 + δ2 w12]
δz2 = z2 (1 – z2) [δ1 w21 + δ2 w22]
Backpropagation Networks : Training Algorithm
Phase III :Weights and Bias updation
9. For neurons in the Hidden Layer
v11 (new)= v11 (old)+αδz1x1
v12 (new)= v12 (old)+αδz2x1
v21 (new)= v21 (old)+αδz1x2
v22 (new)= v22 (old)+αδz2x2
v01(new) = v01(old) +αδz1
v02(new) = v02(old) +αδz2
For neurons in the Output Layer
w11 (new) = w11 (old)+αδ1z1
w12 (new) = w12 (old)+αδ2z1
w21 (new) = w21 (old)+αδ1z2
w22 (new) = w22 (old)+αδ2z2
w01(new) = w01(old) +αδ1
w02(new) = w02(old) +αδ2
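Use BPN to find the new weights for the following network.
The input pattern is [0, 1], the target output is 1, and the learning rate is 0.25. Use the binary sigmoid activation function.

Given
[x1 x2] = [0 1], t = 1, α = 0.25

$$\begin{bmatrix} v_{01} & v_{02} \\ v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} 0.3 & 0.5 \\ 0.6 & -0.3 \\ -0.1 & 0.4 \end{bmatrix}, \qquad \begin{bmatrix} w_{01} \\ w_{11} \\ w_{21} \end{bmatrix} = \begin{bmatrix} -0.2 \\ 0.4 \\ 0.1 \end{bmatrix}$$

[Figure: network with inputs X1, X2, hidden units Z1, Z2, and output unit Y1, labelled with the weights and biases given above]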
Phase 1. Forward phase

Net input to the hidden layer
zin1 = v01 + x1 v11 + x2 v21 = 0.3 + 0 × 0.6 + 1 × (-0.1) = 0.2
zin2 = v02 + x1 v12 + x2 v22 = 0.5 + 0 × (-0.3) + 1 × 0.4 = 0.9

Output of the hidden layer
$$z_1 = \frac{1}{1 + e^{-z_{in1}}} = \frac{1}{1 + e^{-0.2}} = 0.5498, \qquad z_2 = \frac{1}{1 + e^{-z_{in2}}} = \frac{1}{1 + e^{-0.9}} = 0.711$$

Net input to the output layer
yin1 = w01 + z1 w11 + z2 w21 = -0.2 + 0.5498 × 0.4 + 0.711 × 0.1 = 0.0910

Output of the output layer
$$y_1 = \frac{1}{1 + e^{-y_{in1}}} = \frac{1}{1 + e^{-0.0910}} = 0.5227$$
Phase 2. Error calculation
(using y1 = 0.5227, z1 = 0.5498, z2 = 0.711)

Error at the output layer
δ1 = (t1 – y1) y1 (1 – y1)
= (1 – 0.5227) × 0.5227 × (1 – 0.5227) = 0.11908

Error at the hidden layer
δz1 = z1 (1 – z1) [δ1 w11] = 0.5498 × (1 – 0.5498) × [0.11908 × 0.4] ≈ 0.01178
δz2 = z2 (1 – z2) [δ1 w21] = 0.711 × (1 – 0.711) × [0.11908 × 0.1] ≈ 0.00245
Phase 3. Weight update
(using y1 = 0.5227, z1 = 0.5498, z2 = 0.711, δ1 = 0.11908, δz1 = 0.01178, δz2 = 0.00245)

For neurons in the hidden layer
v11(new) = v11(old) + α δz1 x1 = 0.6 + 0.25 × 0.01178 × 0 = 0.6
v12(new) = v12(old) + α δz2 x1 = -0.3 + 0.25 × 0.00245 × 0 = -0.3
v21(new) = v21(old) + α δz1 x2 = -0.1 + 0.25 × 0.01178 × 1 = -0.09706
v22(new) = v22(old) + α δz2 x2 = 0.4 + 0.25 × 0.00245 × 1 = 0.4006125
v01(new) = v01(old) + α δz1 = 0.3 + 0.25 × 0.01178 = 0.302945
v02(new) = v02(old) + α δz2 = 0.5 + 0.25 × 0.00245 = 0.5006125

For neurons in the output layer
w11(new) = w11(old) + α δ1 z1 = 0.4 + 0.25 × 0.11908 × 0.5498 = 0.41637
w21(new) = w21(old) + α δ1 z2 = 0.1 + 0.25 × 0.11908 × 0.711 = 0.12117
w01(new) = w01(old) + α δ1 = -0.2 + 0.25 × 0.11908 = -0.17023
Updated BPN after epoch 1

[Figure: updated network with v01 = 0.302945, v02 = 0.5006125, v11 = 0.6, v12 = -0.3, v21 = -0.09706, v22 = 0.4006125, w01 = -0.17023, w11 = 0.41637, w21 = 0.12117]
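The worked example can be checked numerically with a short script. This is only a verification sketch: the variable names mirror the symbols used in the example, and because the script does not round intermediate values, its results can differ from the hand calculation in the fifth or sixth decimal place.

```python
import math


def sigmoid(a):
    """Binary sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-a))


# Given values from the problem statement
x1, x2, t, alpha = 0.0, 1.0, 1.0, 0.25
v01, v02 = 0.3, 0.5
v11, v12 = 0.6, -0.3
v21, v22 = -0.1, 0.4
w01, w11, w21 = -0.2, 0.4, 0.1

# Phase 1: forward phase
z1 = sigmoid(v01 + x1 * v11 + x2 * v21)
z2 = sigmoid(v02 + x1 * v12 + x2 * v22)
y1 = sigmoid(w01 + z1 * w11 + z2 * w21)

# Phase 2: error calculation (uses the old weights w11 and w21)
d1 = (t - y1) * y1 * (1.0 - y1)
dz1 = z1 * (1.0 - z1) * d1 * w11
dz2 = z2 * (1.0 - z2) * d1 * w21

# Phase 3: weight update
updated = {
    "v11": v11 + alpha * dz1 * x1,
    "v12": v12 + alpha * dz2 * x1,
    "v21": v21 + alpha * dz1 * x2,
    "v22": v22 + alpha * dz2 * x2,
    "v01": v01 + alpha * dz1,
    "v02": v02 + alpha * dz2,
    "w11": w11 + alpha * d1 * z1,
    "w21": w21 + alpha * d1 * z2,
    "w01": w01 + alpha * d1,
}

for name, value in updated.items():
    print(f"{name}(new) = {value:.5f}")
```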
Back propagation

• The goal of training is to minimize the cost function.

• The backpropagation algorithm propagates the gradients back through the network; these are then used to adjust the weights and biases so that the solution moves in the direction that reduces the cost function.
Loss function and cost function

• During training, we predict the output of the model for different inputs and compare the predicted output with the actual output in the training set.
• The difference between the actual and the predicted output is termed the loss for that input.
• The sum of the squares of the losses across all inputs is termed the cost function.

• The choice of loss and cost functions depends on the kind of output we are targeting.
Eg: For classification we use the cross-entropy cost function.
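The definitions above can be sketched in Python. The function names are assumptions made for this example, and the cross-entropy shown is the binary form averaged over the inputs, which is one common variant rather than the only one.

```python
import math


def loss(actual, predicted):
    """Loss for one input: difference between actual and predicted output."""
    return actual - predicted


def sum_squared_cost(actuals, predictions):
    """Cost: sum of the squares of the losses across all inputs."""
    return sum(loss(a, p) ** 2 for a, p in zip(actuals, predictions))


def binary_cross_entropy(actuals, predictions, eps=1e-12):
    """Average binary cross-entropy; eps keeps log() away from 0 (an added safeguard)."""
    total = 0.0
    for a, p in zip(actuals, predictions):
        p = min(max(p, eps), 1.0 - eps)
        total += -(a * math.log(p) + (1.0 - a) * math.log(1.0 - p))
    return total / len(actuals)


# Hypothetical targets and predictions, for illustration only.
cost = sum_squared_cost([1.0, 0.0], [0.5, 0.5])
entropy = binary_cross_entropy([1.0], [0.5])
```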
Gradient Descent

• The goal of all supervised machine learning algorithms is to best estimate a target function (f) that maps input data (x) onto output variables (y). (This describes all classification and regression problems.)
• Machine learning algorithms require a process of optimization to find the set of coefficients that results in the best estimate of the target function.
• The gradient descent method can be used to optimize the coefficients.
• Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
Gradient Descent

• Gradient descent is a fundamental optimization algorithm used to minimize the cost or loss function of a neural network or any other model.
• The goal of gradient descent is to find the parameters (weights and biases) of the model that minimize the error between the predicted output and the actual target values.
Gradient Descent

• Gradient descent is one of the best-known ways to find a local minimum.
• By changing the weights, we move towards the minimum value of the error function.
• The weights are changed by taking steps in the negative direction of the function's gradient (derivative).
Gradient Descent Algorithm

• GD is an optimization algorithm for finding the minimum of a function.

• Start at a random point on the function and move in the negative direction of the gradient to reach a local or global minimum.
• The gradient descent algorithm iteratively calculates the next point using the gradient at the current position, scales it (by a learning rate), and subtracts the obtained value from the current position (makes a step).
• This process can be written as:
$$w_{i+1} = w_i - \alpha \, \nabla f(w_i)$$
where f is the function being minimized and α is the learning rate.
Gradient Descent Algorithm

• An important parameter is α, which scales the gradient and thus controls the step size. α is called the learning rate.
• The smaller the learning rate, the longer GD takes to converge; it may reach the maximum number of iterations before reaching the optimum point.
• If the learning rate is too big, the algorithm may not converge to the optimal point (it jumps around) or may even diverge completely.
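The effect of the learning rate can be seen on a simple function. The sketch below minimizes f(w) = w², whose gradient is 2w, starting from w = 1; the function, the starting point, and the three learning rates are assumptions chosen for illustration.

```python
def gradient_descent(alpha, steps=50, w=1.0):
    """Run gradient descent on f(w) = w**2 (gradient 2*w) and return the final w."""
    for _ in range(steps):
        w = w - alpha * 2.0 * w
    return w


well_chosen = gradient_descent(alpha=0.1)    # shrinks towards the minimum at w = 0
too_small = gradient_descent(alpha=0.001)    # barely moves in 50 steps
too_large = gradient_descent(alpha=1.1)      # overshoots and grows in magnitude
```

For this function each step multiplies w by (1 - 2α), so the iteration converges only when that factor has magnitude below 1.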
Gradient Descent Procedure

• The goal is to keep trying different values for the coefficients, evaluate their cost, and select new coefficients that have a slightly better (lower) cost.
• Repeating this process enough times leads to the values of the coefficients that result in the minimum cost.
Gradient Descent Procedure

1. Initialization: Initially, the model's parameters (weights and biases) are set to random or small values.

2. Forward pass: For a given set of input data, the model computes an output using the current parameter values. This output is compared with the target values using a cost or loss function, which quantifies how far the prediction is from the truth.

3. Backward pass (backpropagation): The gradient (partial derivative) of the loss function with respect to each parameter is calculated. This gradient tells us how much the loss would change if we made small adjustments to each parameter. Gradient descent gets its name from the way it uses this gradient to update the model parameters.
Gradient Descent Procedure

4. Update parameters: The parameters are updated in the direction opposite to the gradient to minimize the loss function. The learning rate, which is a hyperparameter, determines the size of each step taken during this update. A smaller learning rate results in smaller steps, which can help the algorithm converge more accurately but takes longer. A larger learning rate can lead to faster convergence but risks overshooting the optimal parameter values.

5. Repeat: Steps 2-4 are repeated for a specified number of iterations (epochs) or until the loss converges to a satisfactory level.
Gradient Descent Procedure

Steps
1. Initialize the values of the coefficient or coefficients of the function. These could be 0.0 or a small random value.
coefficient = 0.0
2. Evaluate the cost of the coefficients by plugging them into the function and calculating the cost.
cost = f(coefficient)
3. Calculate the derivative of the cost.
(The derivative is the slope of the function at a given point.)
delta = derivative(cost)
(We need to know the slope so that we know the direction (sign) in which to move the coefficient values in order to get a lower cost on the next iteration.)
Gradient Descent Procedure

4. Update the coefficient values. A learning rate parameter (alpha) must be specified; it controls how much the coefficients can change on each update.
coefficient = coefficient – (alpha * delta)

5. Repeat this process until the cost of the coefficients (cost) is 0.0 or close enough to zero to be good enough.
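The five steps can be put together in a short runnable sketch. The cost function f(coefficient) = (coefficient - 3)², its derivative, the learning rate, the tolerance, and the iteration cap are all assumptions chosen for illustration; here the derivative is evaluated at the current coefficient.

```python
def f(coefficient):
    """Hypothetical cost function with its minimum (cost 0.0) at coefficient = 3."""
    return (coefficient - 3.0) ** 2


def derivative(coefficient):
    """Slope of f at the given coefficient."""
    return 2.0 * (coefficient - 3.0)


coefficient = 0.0        # Step 1: initialize
alpha = 0.1              # learning rate
tolerance = 1e-10        # "close enough to zero"

for iteration in range(1000):
    cost = f(coefficient)                            # Step 2: evaluate the cost
    if cost < tolerance:                             # Step 5: stop when good enough
        break
    delta = derivative(coefficient)                  # Step 3: slope at this point
    coefficient = coefficient - (alpha * delta)      # Step 4: update

print(f"coefficient = {coefficient:.5f} after {iteration} iterations")
```

The cap of 1000 iterations is a safeguard so that the loop ends even if the tolerance is never reached.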
Gradient Descent Algorithm

• The steps of the gradient descent method are:

1. Choose a starting point (initialization).
2. Calculate the gradient at this point.
3. Make a scaled step in the opposite (negative) direction to the gradient (objective: minimize).
4. Repeat steps 2 and 3 until one of the following criteria is met:
• the maximum number of iterations is reached
• the step size is smaller than the tolerance.
Local vs. Global Minimum
• The neural network might give different results with different starting weights.
• The algorithm may converge to a local minimum rather than the global minimum.
• There can be many local minima, which means there can be many solutions to a neural network problem.
• We need to perform validation checks before choosing the final model.
Gradient Descent Algorithm

• The gradient descent algorithm does not work for all functions. To be sure of reaching the global minimum, a function has to be:
• differentiable
• convex
• Differentiable: If a function is differentiable, it has a derivative at each point in its domain.
Gradient Descent Algorithm

The function has to be convex.

• For a univariate function, this means that the line segment connecting any two points of the function lies on or above its curve (it does not cross it).
• If the segment does cross the curve, the function may have a local minimum that is not a global one.
Gradient Descent Algorithm

• One way to check mathematically whether a univariate function is convex is to calculate its second derivative and check that its value is never negative; if it is always greater than 0, the function is strictly convex.
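As a short worked illustration of the second-derivative check, consider two example functions (both chosen here for illustration; they do not come from the slides).

$$f(x) = x^2 - x + 3, \qquad f'(x) = 2x - 1, \qquad f''(x) = 2 > 0 \ \text{for all } x$$

so f is convex.

$$g(x) = x^4 - 2x^3 + 2, \qquad g'(x) = 4x^3 - 6x^2, \qquad g''(x) = 12x^2 - 12x = 12x(x - 1)$$

Here $g''(x) < 0$ for $0 < x < 1$, so g is not convex.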
Gradient Descent
1. Batch Gradient Descent:
• Batch gradient descent, also called vanilla gradient descent, calculates the error for each example in the training dataset, but the model is updated only after all training examples have been evaluated.
• This whole process is one cycle, called a training epoch.
• If the number of training examples is large, batch gradient descent is computationally very expensive and is not preferred.
Gradient Descent
2. Stochastic Gradient Descent:
• Updates the parameters for each training example, one by one.
• The parameters are updated after every iteration, in which
only a single example has been processed.
• The frequent updates give a detailed picture of the rate of
improvement.
• Each update is much cheaper than in batch gradient descent, so
progress is often faster.
• When the number of training examples is large, the very frequent
updates cause additional overhead for the system.
Gradient Descent
2. Stochastic Gradient Descent algorithm
Initialize w_1
for k = 1 to K do
    Sample an observation i uniformly at random
    Update w_{k+1} = w_k − α ∇f_i(w_k)
end for
Return w_{K+1}
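
The pseudocode above can be made concrete for a one-parameter model. The sketch below assumes a per-example squared-error loss f_i(w) = ½(w·x_i − y_i)², whose gradient is (w·x_i − y_i)·x_i; the data, the function name `sgd` and the values of α, K and the random seed are hypothetical and chosen only for illustration.

```python
import random


def sgd(xs, ys, alpha=0.05, K=200, seed=0):
    """Stochastic gradient descent for the model y = w * x with squared error."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    w = 0.0                                # initialize w_1
    for _ in range(K):
        i = rng.randrange(len(xs))         # sample an observation i uniformly at random
        grad_i = (w * xs[i] - ys[i]) * xs[i]   # gradient of f_i at the current w
        w = w - alpha * grad_i             # w_{k+1} = w_k - alpha * grad f_i(w_k)
    return w


# Hypothetical noise-free data generated by y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w_hat = sgd(xs, ys)
print(round(w_hat, 3))
```

Each iteration touches a single example, so the cost of one update does not grow with the size of the training set.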
Gradient Descent
3. Mini-Batch Gradient Descent:
This is a type of gradient descent which, in practice, often works
faster than both batch gradient descent and stochastic gradient descent.
Mini-batch gradient descent is the go-to method since it is a
combination of the concepts of SGD and batch gradient descent.
It simply splits the training dataset into small batches and
performs an update for each of those batches.
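
The three variants differ only in how many examples feed each update, which a single sketch can show. It assumes the same one-parameter model y = w·x with squared error; the name `minibatch_gd`, the data and all hyperparameter values are hypothetical. A batch size equal to the dataset size gives batch gradient descent, and a batch size of 1 gives stochastic gradient descent.

```python
import random


def minibatch_gd(xs, ys, batch_size, alpha=0.02, epochs=200, seed=0):
    """Mini-batch gradient descent for y = w * x with mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):                       # one pass over the data = one epoch
        rng.shuffle(indices)                      # new random split into batches
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the per-example gradients over the current batch.
            grad = sum((w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= alpha * grad                     # one update per batch
    return w


# Hypothetical noise-free data generated by y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w_batch = minibatch_gd(xs, ys, batch_size=4)   # batch gradient descent
w_mini = minibatch_gd(xs, ys, batch_size=2)    # mini-batch gradient descent
w_sgd = minibatch_gd(xs, ys, batch_size=1)     # stochastic gradient descent
print(round(w_batch, 3), round(w_mini, 3), round(w_sgd, 3))
```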
Sigmoid Neuron
• A sigmoid neuron is a type of artificial neuron that was commonly used in the
early days of neural network research.

• It is named after the sigmoid function, which is an S-shaped curve, and it's also
known as the logistic sigmoid or logistic function.

• In sigmoid neurons, small changes in the weights and bias cause only a small
change in the output.

• The output function (the sigmoid activation function) is much smoother than the
step function.
Sigmoid Neuron

• The output y is not binary but a real value between 0 and 1 which can
be interpreted as a probability
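
A sigmoid neuron can be sketched as the logistic function σ(z) = 1 / (1 + e^(−z)) applied to the weighted sum of the inputs plus the bias. The function names and the input, weight and bias values below are assumptions chosen for illustration; very large negative z would overflow `math.exp`, which this sketch does not handle.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


inputs = [1.0, 2.0]                                      # hypothetical values
y = sigmoid_neuron(inputs, [0.5, -0.25], bias=0.1)
y_nudged = sigmoid_neuron(inputs, [0.51, -0.25], bias=0.1)
print(y, y_nudged)   # a small change in one weight gives a small change in y
```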
Sigmoid Neuron
• Sigmoid neurons have some limitations and drawbacks, which have led to
their decreased use in deep learning in favour of other activation functions
like ReLU (Rectified Linear Unit). These limitations include:
1. Vanishing Gradient Problem: Sigmoid neurons suffer from the
vanishing gradient problem, especially in deep networks. When gradients
become very small during backpropagation, weight updates can become
insignificant, which can slow down or even halt the learning process in
deep networks.

2. Output Centering: The output of the sigmoid function is not centered
around zero, which can lead to slower convergence when training neural
networks.
Sigmoid Neuron
3. Saturation: Sigmoid neurons saturate when the input is far from zero,
causing the gradient to be close to zero. This slows down the learning
process because weight updates become small.
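
Saturation and the vanishing gradient both follow from the derivative of the sigmoid, σ'(z) = σ(z)(1 − σ(z)), which is at most 0.25 (at z = 0) and shrinks quickly as |z| grows. The sketch below is an illustration only; the helper names and the choice of 10 layers are assumptions, and the product of per-layer factors ignores the weights, which also multiply the gradient in a real network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_grad(z))      # largest at z = 0, close to zero for large |z|

# Best case for the sigmoid factor alone across 10 stacked sigmoid layers
# (hypothetical depth): every layer contributes at most 0.25.
upper_bound = sigmoid_grad(0.0) ** 10
print(upper_bound)
```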

• However, sigmoid neurons are still used in specific cases, such as the
output layer of binary classification models, where the sigmoid function's
output range (0, 1) is desirable for modeling probabilities.
