DEEP LEARNING
Module 1
Artificial Neural Networks
Introduction
• The two major problem-solving techniques are:
• Hard computing
• Soft Computing
Introduction
Hard computing
• Hard computing deals with precise models where
accurate solutions are achieved quickly.
• Hard computing techniques require exact input
data.
• It is strictly sequential and provides precise
answers to complex problems.
Introduction
Soft computing
• Soft computing deals with approximate models.
• The term "soft computing" was introduced by
Professor Lotfi Zadeh in 1994.
• It provides solutions to complex problems and deals
with imprecise, uncertain and partial-truth data.
• Soft computing is a combination of Neural
Networks, Fuzzy Logic and Genetic Algorithms.
Deep learning
[Example figure: house-price prediction with the attributes Area, Facilities, Age, Location and Price]
• Deep learning is a subset of machine learning in artificial intelligence.
• It implements functions that mimic the functionality of the brain by creating
patterns (learning and encoding significant features from the input data) and
processing data.
• It uses artificial neural networks with many layers to address complex problems.
• Output calculation: to calculate the output, an activation function is applied
over the net input yin (a minimal sketch follows this list).
• An ANN is a mathematical model.
• It contains interconnected processing elements (neurons).
• WEIGHTED LINKS (interconnections) hold the information.
• Neurons can learn, recall and generalize data by adjusting the
weights.
• No single neuron carries specific information; only a
collection of neurons holds the data.
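As a minimal sketch of the output calculation above (Python with NumPy; the input, weight and bias values below are made up for illustration), the net input yin is the weighted sum of the inputs plus a bias, and the output is obtained by applying an activation function over yin.

import numpy as np

def step(y_in, threshold=0.0):
    # Binary step activation: fire (1) if the net input exceeds the threshold.
    return 1 if y_in > threshold else 0

# Illustrative values: three inputs, their weights, and a bias.
x = np.array([1.0, 0.0, 1.0])   # inputs x1..x3
w = np.array([0.4, -0.2, 0.6])  # weights on the links
b = 0.1                         # bias

y_in = b + np.dot(x, w)         # net input yin = b + sum(xi * wi)
y = step(y_in)                  # output after the activation function
print(y_in, y)                  # 1.1 1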
ANN
• In neural networks, neurons are
organized in layers:
Input layer
Hidden layer
Output layer
• When data is fed to the
ANN, it is processed via the layers
of neurons to produce the desired
output.
• Data is presented to the network
via the input layer.
• The input layer communicates to
the hidden layers.
• The hidden layers then link to an
output layer where the answer is
output.
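The sketch below (Python with NumPy; the layer sizes and random weights are illustrative assumptions, not values from the slides) shows how data presented at the input layer flows through a hidden layer and then the output layer.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 3 inputs, 4 hidden neurons, 1 output neuron.
layer_sizes = [3, 4, 1]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])    # data presented to the input layer
a = x
for W, b in zip(weights, biases):
    a = sigmoid(a @ W + b)        # each layer feeds the next
print(a)                          # output of the output layer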
Basic Models of Artificial Neural Networks (ANN)
• A multilayer feed-forward network has one or more intermediate (hidden)
layers between the input and output layer.
• Recurrent networks are feedback networks with a closed loop.
[Figure: multilayer network with inputs X1–X3, hidden units Z1, Z2, output Y1, and weights vij (input to hidden) and wij (hidden to output)]
1.4 Multilayer Recurrent Network (Feedback Network)
• A network in which the output is fed back as input
to the same or preceding layer nodes is called a feedback
network.
• Recurrent networks are feedback networks with a
closed loop.
[Figure: multilayer recurrent network with inputs, hidden units Z1, Z2 and output Y1, with feedback connections]
1.5 Single node with its own feedback (Feedback Network)
• The ANN must itself discover pattern regularities and features in the input data, and the
relations between the input data and the output.
• While discovering these features, the network undergoes changes in its weight
values. This process is called self-organizing.
2. Learning or Training – Reinforcement Learning
Activation Functions
1. Hard Limiter, STEP function or Binary Step function (let T = 0):
f(yin) = 1 if yin > T
       = 0 if yin ≤ T
2. Binary sigmoid function, with output in the range (0, 1):
f(yin) = 1 / (1 + e^(-yin))
3. Bipolar sigmoid function, with output in the range (-1, 1):
f(yin) = (1 - e^(-yin)) / (1 + e^(-yin))
4. Ramp function:
f(yin) = 1 if yin > 1
       = yin if 0 ≤ yin ≤ 1
       = 0 if yin < 0
5. ReLU (Rectified Linear Unit). The mathematical formula is:
R(x) = max(0, x)
6. Softmax function:
• The softmax function squashes the output for each class to a value between 0 and
1 and divides each output by the sum of all the outputs.
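A minimal sketch of these activation functions in Python with NumPy (the function names are my own; the formulas follow the list above):

import numpy as np

def binary_step(y_in, T=0.0):
    return np.where(y_in > T, 1.0, 0.0)

def binary_sigmoid(y_in):
    return 1.0 / (1.0 + np.exp(-y_in))

def bipolar_sigmoid(y_in):
    return (1.0 - np.exp(-y_in)) / (1.0 + np.exp(-y_in))

def ramp(y_in):
    return np.clip(y_in, 0.0, 1.0)      # 0 below 0, linear in [0, 1], 1 above 1

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))           # shift for numerical stability
    return e / e.sum()                  # outputs lie in (0, 1) and sum to 1

print(softmax(np.array([2.0, 1.0, 0.1])))   # e.g. [0.659 0.242 0.099]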
Terminologies of ANN
• Weight: Each neuron is connected to other neurons via links (weights). The
weights contain information about the input signal.
• The weights can be arranged in matrix form. For the network shown, the input-to-hidden
weights V and the hidden-to-output weights W are:
V = [ v11  v12
      v21  v22
      v31  v32 ]
W = [ w11
      w21 ]
[Figure: network with inputs X1–X3, hidden units Z1, Z2 and output Y1, connected by the weights vij and wij]
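A sketch (Python/NumPy, with made-up numeric values for the symbolic matrices above, since the slide only shows them symbolically) of how the weight matrices V and W would be used to propagate an input vector through the network:

import numpy as np

# Illustrative numeric values for the symbolic matrices on the slide.
V = np.array([[0.2, -0.1],    # v11 v12
              [0.5,  0.3],    # v21 v22
              [-0.4, 0.6]])   # v31 v32   (3 inputs x 2 hidden units)
W = np.array([[0.7],          # w11
              [-0.2]])        # w21       (2 hidden units x 1 output)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])     # inputs x1, x2, x3
z = sigmoid(x @ V)                # hidden layer outputs z1, z2
y = sigmoid(z @ W)                # output y1
print(z, y)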
Perceptron Networks
• Inputs are directly connected to the neurons in the output layer via adjustable weight
values.
• The activation function used at the output layer is a modified SIGN function:
output y = 1 if yin > +Threshold
         = 0 if -Threshold ≤ yin ≤ +Threshold
         = -1 if yin < -Threshold
[Figure: perceptron with bias input x0 = 1, inputs x1 … xn with weights w1 … wn, a summing unit and an activation function producing output y]
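A sketch of the modified SIGN activation and a single perceptron output computation (Python; the example input and weight values are assumptions for illustration):

def modified_sign(y_in, threshold=0.0):
    # Returns 1, 0 or -1 depending on where the net input falls
    # relative to the threshold band [-threshold, +threshold].
    if y_in > threshold:
        return 1
    if y_in < -threshold:
        return -1
    return 0

def perceptron_output(x, w, b, threshold=0.0):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return modified_sign(y_in, threshold)

# Illustrative values: two inputs with weights w1 = w2 = 1 and bias b = -1.
print(perceptron_output([1, 1], [1, 1], -1))    # 1
print(perceptron_output([1, -1], [1, 1], -1))   # -1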
Training Perceptron Network
• Learning is the process of updating the weight values.
• A perceptron can be trained by the perceptron learning rule.
• Weights are updated by calculating the ERROR between the
desired output (Target) and the calculated output (Actual):
ERROR = Target - Actual
• If the ERROR is zero, the goal has been achieved; otherwise the weight values are updated.
• Perceptron networks are used to classify an input pattern as a
'member' or 'not a member' of a particular class.
Threshold = 0, α = 1, bias input x0 = 1, initial weights w1 = w2 = b = 0

x1   x2   x0   target   yin   actual y   w1   w2   b
Epoch 1
 1    1   1      1        0       0       1    1    1
 1   -1   1     -1        1       1       0    2    0
-1    1   1     -1        2       1       1    1   -1
-1   -1   1     -1       -3      -1       1    1   -1
Epoch 2
 1    1   1      1        1       1       1    1   -1
 1   -1   1     -1       -1      -1       1    1   -1
-1    1   1     -1       -1      -1       1    1   -1
-1   -1   1     -1       -3      -1       1    1   -1

In epoch 2 the weights remain constant for all the input patterns, so we can stop the training.
[Figure: the trained perceptron with bias input x0 = 1, inputs x1, x2 and output Y1]
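A sketch in Python that reproduces this training run (the bipolar patterns, threshold 0 and learning rate α = 1 are taken from the table above; the update w = w + α·t·x on error is the standard perceptron learning rule):

# Perceptron training on the bipolar patterns from the table above.
patterns = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w = [0.0, 0.0]   # w1, w2
b = 0.0          # bias
alpha = 1.0
theta = 0.0      # threshold

def activation(y_in, theta):
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

for epoch in range(1, 11):          # cap of 10 epochs is an arbitrary safeguard
    changed = False
    for x, t in patterns:
        y_in = b + x[0] * w[0] + x[1] * w[1]
        y = activation(y_in, theta)
        if y != t:                   # update only when there is an error
            w[0] += alpha * t * x[0]
            w[1] += alpha * t * x[1]
            b += alpha * t
            changed = True
    print(epoch, w, b)               # epoch 1 ends with w = [1, 1], b = -1
    if not changed:                  # weights constant for a whole epoch: stop
        break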
Backpropagation Networks (BPN)
5. Output of Yk:
yk = f(yink) = 1 / (1 + e^(-yink))   (binary sigmoid)   or   (1 - e^(-yink)) / (1 + e^(-yink))   (bipolar sigmoid)
Backpropagation Networks
Training Algorithm
1. Feedforward of the input training pattern – calculate the net input and output of each
neuron.
2. Backpropagation of the error – calculate the error terms used for the weight
updates.
3. Updation of weights – update the weights of all units (including the bias).
Testing Algorithm
Computation of the feedforward phase only – calculation of the output value.
Backpropagation Networks : Training Algorithm
Given:
[x1 x2] = [0 1], t = 1, α = 0.25
Initial weights:
v01 = 0.3, v02 = 0.5, v11 = 0.6, v12 = -0.3, v21 = -0.1, v22 = 0.4
w01 = -0.2, w11 = 0.4, w21 = 0.1
[Figure: 2-2-1 network with bias units (= 1), inputs X1, X2, hidden units Z1, Z2 and output Y1 labelled with these weights]
Phase 1: Forward Phase
Net input to the hidden layer:
zin1 = v01 + x1 v11 + x2 v21 = 0.3 + 0 * 0.6 + 1 * -0.1 = 0.2
zin2 = v02 + x1 v12 + x2 v22 = 0.5 + 0 * -0.3 + 1 * 0.4 = 0.9
Output of the hidden layer (binary sigmoid):
z1 = 1 / (1 + e^(-zin1)) = 1 / (1 + e^(-0.2)) = 0.5498
z2 = 1 / (1 + e^(-zin2)) = 1 / (1 + e^(-0.9)) = 0.711
Net input to the output layer:
yin = w01 + z1 w11 + z2 w21 = -0.2 + 0.5498 * 0.4 + 0.711 * 0.1 = 0.091
Output:
y1 = 1 / (1 + e^(-yin)) = 1 / (1 + e^(-0.091)) = 0.5227
Phase 2: Error Calculation
(y1 = 0.5227, z1 = 0.5498, z2 = 0.711)
Error at the output layer:
δ1 = (t1 - y1) y1 (1 - y1) = (1 - 0.5227) * 0.5227 * (1 - 0.5227) = 0.11908
Error at the hidden layer:
δin1 = δ1 w11 = 0.11908 * 0.4 = 0.04763,  δz1 = δin1 z1 (1 - z1) = 0.04763 * 0.5498 * (1 - 0.5498) = 0.01178
δin2 = δ1 w21 = 0.11908 * 0.1 = 0.01191,  δz2 = δin2 z2 (1 - z2) = 0.01191 * 0.711 * (1 - 0.711) = 0.00245
Phase 3: Weight Updation
For neurons in the hidden layer (δz1 = 0.01178, δz2 = 0.00245):
v11(new) = v11(old) + αδz1x1 = 0.6 + 0.25 * 0.01178 * 0 = 0.6
v12(new) = v12(old) + αδz2x1 = -0.3 + 0.25 * 0.00245 * 0 = -0.3
v21(new) = v21(old) + αδz1x2 = -0.1 + 0.25 * 0.01178 * 1 = -0.09706
v22(new) = v22(old) + αδz2x2 = 0.4 + 0.25 * 0.00245 * 1 = 0.4006125
v01(new) = v01(old) + αδz1 = 0.3 + 0.25 * 0.01178 = 0.302945
v02(new) = v02(old) + αδz2 = 0.5 + 0.25 * 0.00245 = 0.5006125
For neurons in the output layer (δ1 = 0.11908):
w11(new) = w11(old) + αδ1z1 = 0.4 + 0.25 * 0.11908 * 0.5498 = 0.41637
w21(new) = w21(old) + αδ1z2 = 0.1 + 0.25 * 0.11908 * 0.711 = 0.12117
w01(new) = w01(old) + αδ1 = -0.2 + 0.25 * 0.11908 = -0.17023
Network after one iteration of training (updated weights):
v01 = 0.302945, v02 = 0.5006125, v11 = 0.6, v12 = -0.3, v21 = -0.09706, v22 = 0.4006125
w01 = -0.17023, w11 = 0.41637, w21 = 0.12117
[Figure: the network redrawn with these updated weights on its links]
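A sketch in Python/NumPy that reproduces the three phases of this worked example with the given initial weights, input [0 1], target 1 and α = 0.25 (the binary sigmoid is used throughout, as in the example):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Given values from the example.
x = np.array([0.0, 1.0]);  t = 1.0;  alpha = 0.25
v0 = np.array([0.3, 0.5])                    # hidden biases v01, v02
V  = np.array([[0.6, -0.3], [-0.1, 0.4]])    # rows: [v11 v12], [v21 v22]
w0 = -0.2                                    # output bias w01
w  = np.array([0.4, 0.1])                    # w11, w21

# Phase 1: forward pass
z = sigmoid(v0 + x @ V)                      # [0.5498, 0.7109]
y = sigmoid(w0 + z @ w)                      # 0.5227

# Phase 2: error terms
delta1 = (t - y) * y * (1 - y)               # 0.11908
delta_z = delta1 * w * z * (1 - z)           # [0.01178, 0.00245]

# Phase 3: weight updates
w  = w + alpha * delta1 * z                  # [0.41637, 0.12117]
w0 = w0 + alpha * delta1                     # -0.17023
V  = V + alpha * np.outer(x, delta_z)        # v21 -> -0.09706, v22 -> 0.40061
v0 = v0 + alpha * delta_z                    # [0.302945, 0.5006125]
print(y, w, w0, V, v0)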
Backpropagation
2. Forward Pass: For a given set of input data, the model computes an output
using the current parameter values. This output is compared to the target
values using a cost or loss function, which quantifies how far the
prediction is from the truth.
Steps
1. Initialize the values for the coefficient or coefficients for the function. These could be 0.0
or a small random value.
coefficient = 0.0
2. The cost of the coefficients is evaluated by plugging them into the function and
calculating the cost.
cost = f(coefficient)
3. The derivative of the cost is calculated.
(The derivative refers to the slope of the function at a given point)
delta = derivative(cost)
(We need to know the slope so that we know the direction (sign) to move the
coefficient values in order to get a lower cost on the next iteration)
Gradient Descent Procedure
4. The coefficient values are updated in the direction opposite to the derivative
(downhill), scaled by a learning rate alpha.
coefficient = coefficient - alpha * delta
5. This process is repeated until the cost of the coefficients (cost) is 0.0 or
close enough to zero to be good enough.
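A sketch of this procedure in Python for a single coefficient (the quadratic cost f(c) = c², the starting value 1.0 and the learning rate 0.1 are assumptions for illustration; the loop follows steps 1–5 above):

# Gradient descent on a simple cost function f(coefficient) = coefficient ** 2.
def cost(coefficient):
    return coefficient ** 2

def derivative(coefficient):
    # Slope of the cost function at the given point (d/dc of c^2 is 2c).
    return 2.0 * coefficient

coefficient = 1.0          # step 1: initial value (could also be 0.0 or small random)
alpha = 0.1                # learning rate

for i in range(50):
    c = cost(coefficient)                        # step 2: evaluate the cost
    if c < 1e-6:                                 # step 5: stop when close enough to zero
        break
    delta = derivative(coefficient)              # step 3: slope gives the direction to move
    coefficient = coefficient - alpha * delta    # step 4: move downhill
print(coefficient, cost(coefficient))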
Gradient Descent Algorithm
• It is named after the sigmoid function, which is an S-shaped curve; it is also
known as the logistic sigmoid or logistic function.
• In sigmoid neurons, small changes in the weights and bias cause only a small
change in the output.
• The output function (the sigmoid activation function) is much smoother than the step
function.
Sigmoid Neuron
• The output y is not binary but a real value between 0 and 1 which can
be interpreted as a probability
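A sketch of a sigmoid neuron (Python; the weights, bias and inputs are illustrative assumptions) showing that the output is a real value in (0, 1) and that a small change in a weight produces only a small change in the output:

import math

def sigmoid_neuron(x, w, b):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1.0 / (1.0 + math.exp(-y_in))    # real-valued output in (0, 1)

x = [1.0, 0.5]
w = [0.8, -0.4]
b = 0.1

y = sigmoid_neuron(x, w, b)
w_perturbed = [0.8 + 0.01, -0.4]             # small change in one weight
y_perturbed = sigmoid_neuron(x, w_perturbed, b)
print(y, y_perturbed, y_perturbed - y)       # the output changes only slightly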
Sigmoid Neuron
• Sigmoid neurons have some limitations and drawbacks, which have led to
their decreased use in deep learning in favour of other activation functions
like ReLU (Rectified Linear Unit). These limitations include:
1. Vanishing Gradient Problem: Sigmoid neurons suffer from the
vanishing gradient problem, especially in deep networks. When gradients
become very small during backpropagation, weight updates can become
insignificant, which can slow down or even halt the learning process in
deep networks.
• However, sigmoid neurons are still used in specific cases, such as the
output layer of binary classification models, where the sigmoid function's
output range (0, 1) is desirable for modeling probabilities.
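A small numeric illustration of the vanishing gradient problem described above (Python; the 10-layer chain is an assumption): the derivative of the sigmoid, σ(x)(1 - σ(x)), is at most 0.25, so the product of derivatives that backpropagation multiplies together shrinks quickly as the network gets deeper.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # maximum value is 0.25, at x = 0

# Gradient factor contributed by a chain of sigmoid layers.
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)   # each layer multiplies in at most 0.25
    print(layer + 1, gradient)
# After 10 layers the factor is 0.25**10 ≈ 9.5e-7: weight updates become insignificant.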