Unit 4 (2)
Machine Learning Techniques
KCS 055
Artificial Neural Network (ANN)
• Inspired by the information-processing model of the human brain.
• The human brain consists of billions of neurons that link with each other.
• Every neuron receives information from other neurons.
Artificial Neural Network (ANN)
• ANNs are computational algorithms.
• They simulate the biological behaviour of the nervous system of the human brain.
• They are modelled on the neuron connection pattern of the human brain.
• Used in deep learning for classification.
Artificial Neural Network (ANN)
Applications of ANN
• Classification problems, e.g. loan application approval
• Regression tasks
• Autonomous vehicle driving using ANN
Basic Terminology in ANN
• Pitts Model
• Perceptron Model
• ADALINE Model
Perceptron
• Basic unit used to build an ANN.
• Takes real-valued inputs.
• Calculates a linear combination of these inputs and generates an output.
• If the result > threshold, output = 1; otherwise, output = 0.
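As a quick illustration, a minimal Python sketch of this threshold unit (the weights, inputs and threshold below are illustrative values):

def perceptron_output(inputs, weights, threshold):
    # Output 1 if the linear combination of the inputs exceeds the threshold, else 0
    linear_combination = sum(w * x for w, x in zip(weights, inputs))
    return 1 if linear_combination > threshold else 0

print(perceptron_output([1, 0], [1.2, 0.6], threshold=1))  # 1.2 > 1, so the output is 1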
Perceptron Training Rule
Linear combination:
Σ wixi = w0 + w1x1 + w2x2 + w3x3 + w4x4 + … + wnxn
Perceptron Training Rule
o = actual output
t = target output
If actual = target, the weights remain unchanged.
Otherwise, the weights need to be changed:
wi = wi + Δwi
Δwi = n (t − o) xi
where,
n = learning rate
t = target output
o = actual output
xi = input associated with the weight wi (see the sketch below).
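A corresponding Python sketch of this weight update (illustrative, with n as the learning rate):

def update_weights(weights, inputs, target, output, n):
    # Perceptron training rule: wi = wi + n * (t - o) * xi
    return [w + n * (target - output) * x for w, x in zip(weights, inputs)]

# e.g. target 0 but output 1 for inputs (1, 0) with n = 0.5 lowers w1 by 0.5
print(update_weights([1.2, 0.6], [1, 0], target=0, output=1, n=0.5))  # [0.7, 0.6]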
Designing AND gate using Perceptron Training Rule
w1 = 1.2, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 1: A = 0, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×1.2 + 0×0.6 = 0
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron Training Rule
w1 = 1.2, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 2: A = 0, B = 1 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×1.2 + 1×0.6 = 0.6
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron Training Rule
w1 = 1.2, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 3: A = 1, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 1×1.2 + 0×0.6 = 1.2
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) ≠ Target output (t)
Designing AND gate using Perceptron Training Rule
w1 = 1.2, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 3: A = 1, B = 0 and Target = 0
• Actual output (o) ≠ Target output (t)
• wi = wi + Δwi = wi + n(t − o)xi
• w1 = 1.2 + 0.5 × (0 − 1) × 1 = 1.2 + (−0.5) = 0.7
• w2 = 0.6 + 0.5 × (0 − 1) × 0 = 0.6 + 0 = 0.6
Designing AND gate using Perceptron Training Rule
w1 = 0.7, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 1: A = 0, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×0.7 + 0×0.6 = 0
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron Training Rule
w1 = 0.7, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 2: A = 0, B = 1 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×0.7 + 1×0.6 = 0.6
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron Training Rule
w1 = 0.7, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 3: A = 1, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 1×0.7 + 0×0.6 = 0.7
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron Training Rule
w1 = 0.7, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Example 4: A = 1, B = 1 and Target = 1
• Σ wixi = w1x1 + w2x2 = 1×0.7 + 1×0.6 = 1.3
• This is greater than the threshold of 1, so the output = 1
Designing AND gate using Perceptron Training Rule
Truth table (A, B, A∧B): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
• Hence, the final weights to design the logical AND gate using the Perceptron Model are:
• w1 = 0.7
• w2 = 0.6
[Perceptron diagram: inputs x1 and x2 with weights w1 = 0.7 and w2 = 0.6 feed Σ wixi into activation f, producing output O]
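A short Python sketch (not from the slides) of the whole procedure: starting from the initial weights used above (w1 = 1.2, w2 = 0.6, threshold 1, n = 0.5) and cycling through the truth table until every example is classified correctly, it converges to the weights just shown:

def train_perceptron(samples, weights, threshold, n):
    # Repeat the perceptron training rule over all samples until no weight changes
    changed = True
    while changed:
        changed = False
        for x, t in samples:
            o = 1 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0
            if o != t:
                weights = [w + n * (t - o) * xi for w, xi in zip(weights, x)]
                changed = True
    return weights

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_samples, [1.2, 0.6], threshold=1, n=0.5))  # [0.7, 0.6]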
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 1: A = 0, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×0.6 + 0×0.6 = 0
• This is not greater than the threshold of 1, so the output = 0
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 2: A = 0, B = 1 and Target = 1
• Σ wixi = w1x1 + w2x2 = 0×0.6 + 1×0.6 = 0.6
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) ≠ Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 2: A = 0, B = 1 and Target = 1
• Actual output (o) ≠ Target output (t)
• wi = wi + Δwi = wi + n(t − o)xi
• w1 = 0.6 + 0.5 × (1 − 0) × 0 = 0.6 + 0 = 0.6
• w2 = 0.6 + 0.5 × (1 − 0) × 1 = 0.6 + 0.5 = 1.1
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 1: A = 0, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×0.6 + 0×1.1 = 0
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 2: A = 0, B = 1 and Target = 1
• Σ wixi = w1x1 + w2x2 = 0×0.6 + 1×1.1 = 1.1
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 3: A = 1, B = 0 and Target = 1
• Σ wixi = w1x1 + w2x2 = 1×0.6 + 0×1.1 = 0.6
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) ≠ Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 0.6, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 3: A = 1, B = 0 and Target = 1
• Actual output (o) ≠ Target output (t)
• wi = wi + Δwi = wi + n(t − o)xi
• w1 = 0.6 + 0.5 × (1 − 0) × 1 = 0.6 + 0.5 = 1.1
• w2 = 1.1 + 0.5 × (1 − 0) × 0 = 1.1 + 0 = 1.1
Designing OR gate using Perceptron Training Rule
w1 = 1.1, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 1: A = 0, B = 0 and Target = 0
• Σ wixi = w1x1 + w2x2 = 0×1.1 + 0×1.1 = 0
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 1.1, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 2: A = 0, B = 1 and Target = 1
• Σ wixi = w1x1 + w2x2 = 0×1.1 + 1×1.1 = 1.1
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 1.1, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 3: A = 1, B = 0 and Target = 1
• Σ wixi = w1x1 + w2x2 = 1×1.1 + 0×1.1 = 1.1
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron Training Rule
w1 = 1.1, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Example 4: A = 1, B = 1 and Target = 1
• Σ wixi = w1x1 + w2x2 = 1×1.1 + 1×1.1 = 2.2
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron Training Rule
Truth table (A, B, A∨B): (0,0,0), (0,1,1), (1,0,1), (1,1,1)
• Hence, the final weights to design the logical OR gate using the Perceptron Model are:
• w1 = 1.1
• w2 = 1.1
[Perceptron diagram: inputs x1 and x2 with weights w1 = 1.1 and w2 = 1.1 feed Σ wixi into activation f, producing output O]
Delta Rule
∇ε(w) = [∂ε/∂w0, ∂ε/∂w1, ∂ε/∂w2, ∂ε/∂w3, … , ∂ε/∂wn]
∂ε/∂wi = ∂/∂wi [ ½ Σd∈D (td − od)² ]
∂ε/∂wi = ½ × ∂/∂wi Σd∈D (td − od)²
∂ε/∂wi = ½ × Σd∈D 2(td − od) × ∂/∂wi (td − od)
Derivation of Gradient Descent Rule
∂ε/∂wi = ½ × 2 × Σd∈D (td − od) × ∂/∂wi (td − od)
∂ε/∂wi = Σd∈D (td − od) × ∂/∂wi (td − od)
∂ε/∂wi = Σd∈D (td − od) × ∂/∂wi (td − w·xd)
∂ε/∂wi = Σd∈D (td − od) (0 − xid)
Derivation of Gradient Descent Rule
∂ε/∂wi = Σd∈D (td − od) (0 − xid)
∂ε/∂wi = Σd∈D (td − od) (−xid)
Therefore, Δwi = −η ∂ε/∂wi
Δwi = −η Σd∈D (td − od) (−xid)
Δwi = η Σd∈D (td − od) xid
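A minimal Python sketch of this batch update for a single linear unit (the training data, initial weights and learning rate below are illustrative placeholders):

def gradient_descent_step(weights, data, eta):
    # One batch step of the delta rule: delta_wi = eta * sum_d (t_d - o_d) * x_id
    deltas = [0.0] * len(weights)
    for x, t in data:                                 # x = input vector, t = target
        o = sum(w * xi for w, xi in zip(weights, x))  # linear unit output o_d = w . x_d
        for i, xi in enumerate(x):
            deltas[i] += eta * (t - o) * xi
    return [w + dw for w, dw in zip(weights, deltas)]

# Illustrative use: two training examples, learning rate 0.1
print(gradient_descent_step([0.1, 0.2], [((1.0, 2.0), 1.0), ((2.0, 1.0), 0.0)], eta=0.1))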
Backpropagation Algorithm
• Backward propagation of errors.
• When an error occurs, it is propagated in the backward direction: Output Layer → Hidden Layer → Input Layer.
Example
Part 1: Forward Pass
1) Calculate h1 (in and out)
• h1(in) = w1·i1 + w2·i2 + b1
• h1(in) = 0.15 × 0.05 + 0.20 × 0.10 + 0.35
• h1(in) = 0.3775
Example (Forward Pass)
• h1(out) = 1 / (1 + e^(−h1(in)))
• h1(out) = 1 / (1 + e^(−0.3775))
• h1(out) = 0.5932
Example
(Forward Pass)
2) Calculate h2(in and out)
• h2(in) = w3i1 +w4i2 + b1
• h2(in) = 0.25*0.05 +
0.30*0.10 + 0.35
• h2(in) = 0.3925
Example (Forward Pass)
• h2(out) = 1 / (1 + e^(−h2(in)))
• h2(out) = 1 / (1 + e^(−0.3925))
• h2(out) = 0.5968
Example
(Forward Pass)
3) Calculate o1(in and out)
• o1(in) = w5h1(out) +
w6h2(out) + b2
• o1(in) = 0.40*0.593 +
0.45*0.596 + 0.60
• o1(in) = 1.105
Example (Forward Pass)
• o1(out) = 1 / (1 + e^(−o1(in)))
• o1(out) = 1 / (1 + e^(−1.105))
• o1(out) = 0.7513
Example
(Forward Pass)
4) Calculate o2(in and out)
• o2(in) = w7h1(out) +
w8h2(out) + b2
• o2(in) = 0.5932*0.5 +
0.5968*0.55 + 0.60
• o2(in) = 1.22484
Example (Forward Pass)
• o2(out) = 1 / (1 + e^(−o2(in)))
• o2(out) = 1 / (1 + e^(−1.22484))
• o2(out) = 0.7729
Example (Forward Pass)
5) Calculate Ɛtotal
• Ɛtotal = ½ Σ (t − o)²
• Ɛtotal = ½ (0.01 − 0.7513)² + ½ (0.99 − 0.7729)²
• Ɛtotal = 0.29837 (approx.)
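A short Python sketch of this forward pass, using the inputs, weights and targets that appear in the worked example (i1 = 0.05, i2 = 0.10, w1…w8 = 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55, b1 = 0.35, b2 = 0.60, targets 0.01 and 0.99); values in the comments are approximate:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10                       # inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output weights
b1, b2 = 0.35, 0.60                       # biases
t1, t2 = 0.01, 0.99                       # targets

h1 = sigmoid(w1 * i1 + w2 * i2 + b1)      # ~ 0.5933
h2 = sigmoid(w3 * i1 + w4 * i2 + b1)      # ~ 0.5969
o1 = sigmoid(w5 * h1 + w6 * h2 + b2)      # ~ 0.7514
o2 = sigmoid(w7 * h1 + w8 * h2 + b2)      # ~ 0.7729
e_total = 0.5 * (t1 - o1) ** 2 + 0.5 * (t2 - o2) ** 2   # ~ 0.2984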
Example
Part 2: Backward Pass
1) For Output Layer
w5+ = w5 + Δw5
• Δw5 = −η ∂Ɛtotal/∂w5
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂w5
Example (Backward Pass)
Output → Hidden Layer
• Ɛtotal = ½ (target_o1 − out_o1)² + ½ (target_o2 − out_o2)²
• δo1 = ∂Ɛtotal/∂out_o1 × ∂out_o1/∂net_o1
• δo1 = (out_o1 − target_o1) × out_o1 (1 − out_o1)
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂net_o1 × out_h1
• ∂Ɛtotal/∂w5 = δo1 × out_h1
Example (Backward Pass)
Output → Hidden Layer
• Δw5 = −η ∂Ɛtotal/∂w5
• Let's take n = 0.6
• Δw5 = −0.6 × 0.08216704
• w5+ = w5 + Δw5
• w5+ = 0.4 + (−0.6 × 0.08216704)
• w5+ = 0.350699776
Example (Backward Pass)
Output → Hidden Layer
Now, let's calculate w6, w7 and w8 in the same way (again with n = 0.6):
• w6+ = w6 + Δw6 ≈ 0.4004
• w7+ = w7 + Δw7 ≈ 0.5136
• w8+ = w8 + Δw8 ≈ 0.5636
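A self-contained Python sketch of these output-layer updates (assuming the learning rate n = 0.6 chosen on the earlier slide; numbers in the comments are approximate):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Forward-pass values from the example
h1, h2 = sigmoid(0.3775), sigmoid(0.3925)       # out_h1 ~ 0.5933, out_h2 ~ 0.5969
o1 = sigmoid(0.40 * h1 + 0.45 * h2 + 0.60)      # out_o1 ~ 0.7514
o2 = sigmoid(0.50 * h1 + 0.55 * h2 + 0.60)      # out_o2 ~ 0.7729
t1, t2 = 0.01, 0.99
eta = 0.6                                       # learning rate n from the slide

delta_o1 = (o1 - t1) * o1 * (1 - o1)            # ~ 0.1385
delta_o2 = (o2 - t2) * o2 * (1 - o2)            # ~ -0.0381

w5_new = 0.40 - eta * delta_o1 * h1             # ~ 0.3507
w6_new = 0.45 - eta * delta_o1 * h2             # ~ 0.4004
w7_new = 0.50 - eta * delta_o2 * h1             # ~ 0.5136
w8_new = 0.55 - eta * delta_o2 * h2             # ~ 0.5636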
Example (Backward Pass)
Hidden → Input Layer
2) Hidden Layer → Input Layer
w1+ = w1 + Δw1
• Δw1 = −η ∂Ɛtotal/∂w1
• ∂Ɛtotal/∂w1 = ∂Ɛtotal/∂out_h1 × ∂out_h1/∂net_h1 × ∂net_h1/∂w1
Example (Backward Pass)
Hidden → Input Layer
• ∂Ɛtotal/∂out_h1 = ∂Ɛo1/∂out_h1 + ∂Ɛo2/∂out_h1
• Starting with ∂Ɛo1/∂out_h1:
• ∂Ɛo1/∂out_h1 = ∂Ɛo1/∂net_o1 × ∂net_o1/∂out_h1
Example (Backward Pass)
Hidden → Input Layer
• Calculating ∂Ɛo1/∂net_o1:
• ∂Ɛo1/∂net_o1 = ∂Ɛo1/∂out_o1 × ∂out_o1/∂net_o1
• ∂Ɛo1/∂net_o1 = (out_o1 − target_o1) × out_o1 (1 − out_o1)
• ∂Ɛo1/∂net_o1 = 0.7413565 × 0.186815602 = 0.138498562
Example (Backward Pass)
Hidden → Input Layer
• Calculating ∂net_o1/∂out_h1:
• net_o1 = w5 × out_h1 + w6 × out_h2 + b2
• ∂net_o1/∂out_h1 = ∂/∂out_h1 (w5 × out_h1 + w6 × out_h2 + b2)
• ∂net_o1/∂out_h1 = w5
Example (Backward Pass)
Hidden → Input Layer
• ∂net_o1/∂out_h1 = w5 = 0.40
• Plugging them in:
• ∂Ɛo1/∂out_h1 = ∂Ɛo1/∂net_o1 × ∂net_o1/∂out_h1
• ∂Ɛo1/∂out_h1 = 0.138498562 × 0.40
• ∂Ɛo1/∂out_h1 = 0.055399425
Example (Backward Pass)
Hidden → Input Layer
• Similarly, we calculate ∂Ɛo2/∂out_h1:
• ∂Ɛo2/∂out_h1 = −0.019049119
• Therefore,
• ∂Ɛtotal/∂out_h1 = ∂Ɛo1/∂out_h1 + ∂Ɛo2/∂out_h1
• ∂Ɛtotal/∂out_h1 = 0.055399425 − 0.019049119 = 0.036350306
Example (Backward Pass)
Hidden → Input Layer
• Let's calculate ∂out_h1/∂net_h1:
• out_h1 = 1 / (1 + e^(−net_h1))
• ∂out_h1/∂net_h1 = ∂/∂net_h1 [ 1 / (1 + e^(−net_h1)) ]
• ∂out_h1/∂net_h1 = out_h1 (1 − out_h1)
Example (Backward Pass)
Hidden → Input Layer
• ∂out_h1/∂net_h1 = out_h1 (1 − out_h1)
• ∂out_h1/∂net_h1 = 0.59326999 × (1 − 0.59326999)
• ∂out_h1/∂net_h1 = 0.241300709
Example (Backward Pass)
Hidden → Input Layer
• Now let's derive ∂net_h1/∂w1:
• net_h1 = w1 × i1 + w2 × i2 + b1
• ∂net_h1/∂w1 = ∂/∂w1 (w1 × i1 + w2 × i2 + b1)
• ∂net_h1/∂w1 = i1 + 0 + 0
• ∂net_h1/∂w1 = i1
Example (Backward Pass)
Hidden → Input Layer
• ∂net_h1/∂w1 = i1
• ∂net_h1/∂w1 = 0.05
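Multiplying the three factors gives the gradient for w1; a small Python sketch of the resulting update (assuming the same learning rate n = 0.6 used for the output layer):

# Partial derivatives computed on the previous slides
dE_dout_h1 = 0.036350306        # ∂Ɛtotal/∂out_h1
dout_h1_dnet_h1 = 0.241300709   # ∂out_h1/∂net_h1
dnet_h1_dw1 = 0.05              # ∂net_h1/∂w1 = i1

dE_dw1 = dE_dout_h1 * dout_h1_dnet_h1 * dnet_h1_dw1   # ~ 0.000439
eta = 0.6                                             # assumed, as on the earlier slides
w1_new = 0.15 - eta * dE_dw1                          # ~ 0.14974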
Advantages and Disadvantages of Neural Networks
• When a part of the neural network fails, the network can continue to work without problems because of its parallel nature (fault tolerance).
• Neural networks are black boxes: we cannot know how much each independent variable influences the dependent variables.
[Figure: worked self-organising map (SOM) example with four input nodes X1–X4 connected to two output units; the initial weight matrix is
wij = [w11 w12; w21 w22; w31 w32; w41 w42] = [0.2 0.9; 0.4 0.7; 0.6 0.5; 0.8 0.3],
and the following slides show the updated weights after training.]
Advantages and Disadvantages of SOM
Advantages:
• Data mapping is easily interpreted.
• It is a heuristic algorithm.
Disadvantages:
• It is difficult to determine what input weights to use.
Deep learning is a subset of machine learning; it is essentially a neural network with three or more layers.
Convolutional Neural Network (CNN)
Input Image → Convolutional Layer → ReLU Layer → Pooling Layer → Flatten Layer → Fully Connected Layer → Output
Convolutional Layer
5×5 Input Image:
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

3×3 Kernel:
0 1 2
2 2 0
0 1 2

The kernel slides over the image one position at a time; each output value is the sum of the element-wise products between the kernel and the current 3×3 window:
• Row 1, Col 1: 3×0 + 3×1 + 2×2 + 0×2 + 0×2 + 1×0 + 3×0 + 1×1 + 2×2 = 12
• Row 1, Col 2: 3×0 + 2×1 + 1×2 + 0×2 + 1×2 + 3×0 + 1×0 + 2×1 + 2×2 = 12
• Row 1, Col 3: 2×0 + 1×1 + 0×2 + 1×2 + 3×2 + 1×0 + 2×0 + 2×1 + 3×2 = 17
• Row 2, Col 1: 0×0 + 0×1 + 1×2 + 3×2 + 1×2 + 2×0 + 2×0 + 0×1 + 0×2 = 10
• Row 2, Col 2: 0×0 + 1×1 + 3×2 + 1×2 + 2×2 + 2×0 + 0×0 + 0×1 + 2×2 = 17
• Row 2, Col 3: 1×0 + 3×1 + 1×2 + 2×2 + 2×2 + 3×0 + 0×0 + 2×1 + 2×2 = 19
• Row 3, Col 1: 3×0 + 1×1 + 2×2 + 2×2 + 0×2 + 0×0 + 2×0 + 0×1 + 0×2 = 9
• Row 3, Col 2: 1×0 + 2×1 + 2×2 + 0×2 + 0×2 + 2×0 + 0×0 + 0×1 + 0×2 = 6
• Row 3, Col 3: 2×0 + 2×1 + 3×2 + 0×2 + 2×2 + 2×0 + 0×0 + 0×1 + 1×2 = 14

3×3 Output:
12 12 17
10 17 19
9 6 14

Size of Output = [size of Input − size of Kernel] + 1
O = [z − k] + 1
O = [5 − 3] + 1 = 2 + 1 = 3
Convolutional Layer (Stride)
Stride (S) = 2
With the same 5×5 input image and 3×3 kernel, the kernel now moves 2 positions at a time:
• Row 1, Col 1: 3×0 + 3×1 + 2×2 + 0×2 + 0×2 + 1×0 + 3×0 + 1×1 + 2×2 = 12
• Row 1, Col 2: 2×0 + 1×1 + 0×2 + 1×2 + 3×2 + 1×0 + 2×0 + 2×1 + 3×2 = 17
• Row 2, Col 1: 3×0 + 1×1 + 2×2 + 2×2 + 0×2 + 0×0 + 2×0 + 0×1 + 0×2 = 9
• Row 2, Col 2: 2×0 + 2×1 + 3×2 + 0×2 + 2×2 + 2×0 + 0×0 + 0×1 + 1×2 = 14

2×2 Output:
12 17
9 14

o = [z − k]/s + 1
o = [5 − 3]/2 + 1 = 1 + 1 = 2
Convolutional Layer (Padding)
Padding (p) = 1, Stride = 1
The 5×5 input image is padded with a border of zeros (giving a 7×7 padded image) before convolving with the 3×3 kernel.
First output element: 0×0 + 0×1 + 0×2 + 0×2 + 3×2 + 3×0 + 0×0 + 0×1 + 0×2 = 6

What is the size of the output?
o = [z − k + 2p]/s + 1
o = [5 − 3 + 2×1]/1 + 1 = [2 + 2]/1 + 1 = 4 + 1 = 5

5×5 Output / Feature Map:
6 14 17 11 3
14 12 12 17 11
8 10 17 19 13
11 9 6 14 12
6 4 4 6 4
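A minimal NumPy sketch of the convolution used in these three examples, with stride and zero-padding as parameters (written here for illustration, not taken from the slides):

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Slide the kernel over the (optionally zero-padded) image and
    # sum the element-wise products at each position.
    if padding:
        image = np.pad(image, padding)
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1       # o = (z - k + 2p)/s + 1
    out = np.zeros((out_size, out_size), dtype=image.dtype)
    for r in range(out_size):
        for c in range(out_size):
            window = image[r*stride:r*stride+k, c*stride:c*stride+k]
            out[r, c] = np.sum(window * kernel)
    return out

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

print(conv2d(image, kernel))            # 3x3 output: [[12 12 17], [10 17 19], [9 6 14]]
print(conv2d(image, kernel, stride=2))  # 2x2 output: [[12 17], [9 14]]
print(conv2d(image, kernel, padding=1)) # 5x5 output, first row [6 14 17 11 3]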
ReLU Layer
The ReLU layer applies f(x) = max(0, x) element-wise to the feature map.

Pooling
5×5 Feature Map:
6 14 17 11 3
14 12 12 17 11
8 10 17 19 13
11 9 6 14 12
6 4 4 6 4

Max pooling with a 2×2 pool size:
14 17
11 19
Pooling
Average pooling with a 2×2 pool size on the same 5×5 feature map:
11.5 14.25
9.5 14.0
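A small NumPy sketch of 2×2 max and average pooling as used above (non-overlapping windows; the last row and column of the 5×5 map are dropped, matching the 2×2 results shown):

import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    # Non-overlapping pooling: take the max (or mean) of each size x size block
    rows = feature_map.shape[0] // size
    cols = feature_map.shape[1] // size
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = feature_map[r*size:(r+1)*size, c*size:(c+1)*size]
            out[r, c] = block.max() if mode == "max" else block.mean()
    return out

fmap = np.array([[ 6, 14, 17, 11,  3],
                 [14, 12, 12, 17, 11],
                 [ 8, 10, 17, 19, 13],
                 [11,  9,  6, 14, 12],
                 [ 6,  4,  4,  6,  4]])

print(pool2d(fmap, mode="max"))      # [[14. 17.], [11. 19.]]
print(pool2d(fmap, mode="average"))  # [[11.5 14.25], [9.5 14.]]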
Flatten Layer
Converts the multi-dimensional feature maps into a single flattened one-dimensional array.
Fully-Connected Layer (Dense Layer)
• Every neuron is connected to every neuron in the previous layer.
• Takes the flattened features from the feature analysis and applies weights to predict the correct label.
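A tiny sketch of such a dense layer acting on a flattened feature vector (the weights and the three output classes below are illustrative placeholders, not trained values):

import numpy as np

rng = np.random.default_rng(0)

flat = np.array([14.0, 17.0, 11.0, 19.0])   # e.g. the flattened 2x2 pooled output
W = rng.normal(size=(3, flat.size))         # placeholder weights for 3 output classes
b = np.zeros(3)

logits = W @ flat + b                       # every output neuron sees every input
scores = np.exp(logits - logits.max())      # softmax turns scores into class probabilities
probs = scores / scores.sum()
print(probs)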
Training Of CNN
Text Books
• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning, Berlin: Springer-Verlag.
• Saikat Dutt, Subramanian Chandramaouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.