
Machine Learning Techniques
KCS 055
Artificial Neural Network (ANN)
• Inspired by the information processing model of the human brain.
• The human brain consists of billions of neurons that link with each other.
• Every neuron receives information from other neurons.
Artificial Neural Network (ANN)
• ANNs are computational algorithms.
• They simulate the biological behavior of the nervous system of the human brain.
• They are modeled on the neuron connection pattern of the human brain.
• Used in Deep Learning for classification.
Artificial Neural Network (ANN)
Applications of ANN

• Stock Price Prediction
• Character Recognition
• Fingerprint Recognition
• Classification problems, e.g. Loan Application Approval
• Autonomous Vehicle Driving Using ANN
• Classification and Regression Tasks
Basic Terminology in ANN

• Artificial Neurons: Interconnected Nodes in ANN.


• Interconnections: Several processing units are interconnected to each other. In the biological brain, these interconnections are called synapses. The general model of a processing unit consists of a summing part with N inputs, an activation function, and an output.
• Processing Unit: Consists of several units which are interconnected to each other.
• Weight Update: To learn a target function, the weight on each link of the neural network is updated repeatedly until the target function is obtained.
Basic Terminology in ANN

• Activation Function: The function which decides the output after computing the weighted sum of inputs.
• Input Layer: Receives the initial data for the neural network.
• Hidden Layer: Intermediate layer between the input and output layers. All intermediate computations take place in the hidden layer.
• Output Layer: Produces the output result from the given inputs.
Model of Artificial Neurons

• McCulloch–Pitts Model
• Perceptron Model
• ADALINE Model
Perceptron
• Basic unit used to build an ANN.
• Takes real-valued inputs.
• Calculates a linear combination of these inputs and generates an output.
• If the result > threshold, output = 1; otherwise, output = 0.
Perceptron Training Rule

Linear Combination:
Σ wi·xi = w0 + w1x1 + w2x2 + w3x3 + w4x4 + … + wnxn
Perceptron Training Rule
o = actual output
t = target output
If actual = target ➔ the weights are left unchanged.
Otherwise, the weights need to be changed:
wi = wi + Δwi
Δwi = n(t − o)xi
where,
n = learning rate
t = target output
o = actual output
xi = input associated with the weight wi
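The following is a minimal Python sketch of the perceptron training rule described above; the function names and the fixed-threshold, no-bias setup are illustrative assumptions, not from the slides.

```python
# Minimal sketch of the perceptron training rule: wi = wi + n*(t - o)*xi.
def perceptron_output(weights, inputs, threshold):
    """Return 1 if the weighted sum exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

def train_perceptron(samples, weights, threshold, lr, epochs=10):
    """samples: list of (inputs, target) pairs. Updates only when o != t."""
    for _ in range(epochs):
        for inputs, target in samples:
            o = perceptron_output(weights, inputs, threshold)
            if o != target:
                weights = [w + lr * (target - o) * x
                           for w, x in zip(weights, inputs)]
    return weights
```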
Designing AND gate using Perceptron
Training Rule
w1 = 1.2, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
A B A^B
0 0 0
0 1 0
1 0 0
1 1 1

• Example 1: A = 0, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*1.2 + 0*0.6 = 0
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron
Training Rule
w1 = 1.2, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 2: A = 0, B = 1 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*1.2 + 1*0.6 = 0.6
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron
Training Rule
w1 = 1.2, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 3: A = 1, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 1*1.2 + 0*0.6 = 1.2
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) ≠ Target output (t)
Designing AND gate using Perceptron
Training Rule
w1 = 1.2, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 3: A = 1, B = 0 and Target = 0
• Actual output (o) ≠ Target output (t)
• wi = wi + Δwi = wi + n(t − o)xi
• w1 = 1.2 + 0.5 * (0 − 1) * 1 = 1.2 + (−0.5) = 0.7
• w2 = 0.6 + 0.5 * (0 − 1) * 0 = 0.6 + 0 = 0.6
Designing AND gate using Perceptron
Training Rule
w1 = 0.7, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 1: A = 0, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*0.7 + 0*0.6 = 0
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron
Training Rule
w1 = 0.7, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 2: A = 0, B = 1 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*0.7 + 1*0.6 = 0.6
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron
Training Rule
w1 = 0.7, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 3: A = 1, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 1*0.7 + 0*0.6 = 0.7
• This is not greater than the threshold of 1, so the output = 0
Designing AND gate using Perceptron
Training Rule
w1 = 0.7, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 4: A = 1, B = 1 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 1*0.7 + 1*0.6 = 1.3
• This is greater than the threshold of 1, so the output = 1
Designing AND gate using Perceptron
Training Rule
• Hence, the final weights to design the logical AND gate using the Perceptron Model are:
• w1 = 0.7
• w2 = 0.6

[Perceptron diagram: inputs x1 (w1 = 0.7) and x2 (w2 = 0.6) feed the summing unit Σ wi·xi, followed by the activation f, producing output O]
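As a quick illustrative check (not part of the slides), the final weights can be verified against the AND truth table with threshold = 1:

```python
# Verify the trained AND-gate perceptron: w1 = 0.7, w2 = 0.6, threshold = 1.
w1, w2, threshold = 0.7, 0.6, 1.0
for a, b, target in [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]:
    weighted_sum = w1 * a + w2 * b
    output = 1 if weighted_sum > threshold else 0
    print(a, b, "->", output, "target:", target)
```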
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
A B A|B
0 0 0
0 1 1
1 0 1
1 1 1

• Example 1: A = 0, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*0.6 + 0*0.6 = 0
• This is not greater than the threshold of 1, so the output = 0
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 2: A = 0, B = 1 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*0.6 + 1*0.6 = 0.6
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) ≠ Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =0.6, Threshold = 1 and Learning Rate n = 0.5
• Example 2: A = 0, B = 1 and Target = 1
• Actual output (o) ≠ Target output (t)
• wi = wi + Δwi = wi + n(t − o)xi
• w1 = 0.6 + 0.5 * (1 − 0) * 0 = 0.6 + 0 = 0.6
• w2 = 0.6 + 0.5 * (1 − 0) * 1 = 0.6 + 0.5 = 1.1
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =1.1, Threshold = 1 and Learning Rate n = 0.5
• Example 1: A = 0, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*0.6 + 0*1.1 = 0
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =1.1, Threshold = 1 and Learning Rate n = 0.5
• Example 2: A = 0, B = 1 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*0.6 + 1*1.1 = 1.1
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =1.1, Threshold = 1 and Learning Rate n = 0.5
• Example 3: A = 1, B = 0 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 1*0.6 + 0*1.1 = 0.6
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) ≠ Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 0.6, w2 =1.1, Threshold = 1 and Learning Rate n = 0.5
• Example 3: A = 1, B = 0 and Target = 1
• Actual output (o) ≠ Target output (t)
• wi = wi + Δwi = wi + n(t − o)xi
• w1 = 0.6 + 0.5 * (1 − 0) * 1 = 0.6 + 0.5 = 1.1
• w2 = 1.1 + 0.5 * (1 − 0) * 0 = 1.1 + 0 = 1.1
Designing OR gate using Perceptron
Training Rule
w1 = 1.1, w2 =1.1, Threshold = 1 and Learning Rate n = 0.5
• Example 1: A = 0, B = 0 and Target = 0
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*1.1 + 0*1.1 = 0
• This is not greater than the threshold of 1, so the output = 0
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 1.1, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5

• Example 2: A = 0, B = 1 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 0*1.1 + 1*1.1 = 1.1
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 1.1, w2 = 1.1, Threshold = 1 and Learning Rate n = 0.5

• Example 3: A = 1, B = 0 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 1*1.1 + 0*1.1 = 1.1
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron
Training Rule
w1 = 1.1, w2 =1.1, Threshold = 1 and Learning Rate n = 0.5
• Example 4: A = 1, B = 1 and Target = 1
• Σ wi·xi = w1x1 + w2x2
• Σ wi·xi = 1*1.1 + 1*1.1 = 2.2
• This is greater than the threshold of 1, so the output = 1
• Actual output (o) = Target output (t)
Designing OR gate using Perceptron
Training Rule
• Hence, the final weights to design the logical OR gate using the Perceptron Model are:
• w1 = 1.1
• w2 = 1.1

[Perceptron diagram: inputs x1 (w1 = 1.1) and x2 (w2 = 1.1) feed the summing unit Σ wi·xi, followed by the activation f, producing output O]
Delta Rule

• The perceptron rule is used when the training examples are linearly separable.
• But when the training examples are not linearly separable, the perceptron rule fails to converge to the target concept.
• The Delta Rule is used in ANNs when the training examples are not linearly separable.
Main Idea Of Delta rule

• Uses the gradient descent rule to find the best weights.
Gradient Descent Rule

• How to modify the weights?

wi = wi + Δwi
Δwi = −η ∇ε(w)
∇ε(w) → derivative of the error w.r.t. the weights
         » Also called the gradient
Derivation of Gradient Descent Rule

∇ε(w) = [ ∂ε/∂w0, ∂ε/∂w1, ∂ε/∂w2, ∂ε/∂w3, … , ∂ε/∂wn ]

∂ε/∂wi = ∂/∂wi [ (1/2) Σd∈D (td − od)² ]

∂ε/∂wi = (1/2) × ∂/∂wi Σd∈D (td − od)²

∂ε/∂wi = (1/2) × 2 Σd∈D (td − od) × ∂/∂wi (td − od)
Derivation of Gradient Descent Rule

∂ε/∂wi = (1/2) × 2 × Σd∈D (td − od) × ∂/∂wi (td − od)

∂ε/∂wi = Σd∈D (td − od) × ∂/∂wi (td − od)

∂ε/∂wi = Σd∈D (td − od) × ∂/∂wi (td − w · xd)      [since od = w · xd for a linear unit]

∂ε/∂wi = Σd∈D (td − od) (0 − xid)
Derivation of Gradient Descent Rule

∂ε/∂wi = Σd∈D (td − od) (0 − xid)

∂ε/∂wi = Σd∈D (td − od) (−xid)

Therefore, Δwi = −η ∇ε(w)
Δwi = −η Σd∈D (td − od) (−xid)
Δwi = η Σd∈D (td − od) xid
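Below is a minimal Python/NumPy sketch of the batch gradient-descent (delta rule) update just derived; the function name and the small random toy data are illustrative assumptions.

```python
# Delta rule / gradient descent for a single linear unit: Δwi = η Σd (td - od) xid
import numpy as np

def delta_rule_epoch(X, t, w, lr):
    """One batch gradient-descent step.

    X: (num_examples, num_inputs) array, t: target outputs, w: weights."""
    o = X @ w                   # linear unit outputs od = w · xd
    grad_term = (t - o) @ X     # Σd (td - od) * xid for every weight i
    return w + lr * grad_term

# Illustrative usage on random data (not from the slides).
rng = np.random.default_rng(0)
X = rng.random((4, 3))
t = rng.random(4)
w = np.zeros(3)
for _ in range(100):
    w = delta_rule_epoch(X, t, w, lr=0.1)
```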
Backpropagation Algorithm
• Backward propagation of errors.
• When an error occurs, we go in the backward direction: Output Layer → Hidden Layer → Input Layer.
Example
Part 1: Forward Pass
1) Calculate h1 (in and out)
• h1(in) = w1·i1 + w2·i2 + b1
• h1(in) = 0.15*0.05 + 0.20*0.10 + 0.35
• h1(in) = 0.3775

Example (Forward Pass)
• h1(out) = 1 / (1 + e^(−h1(in)))
• h1(out) = 1 / (1 + e^(−0.3775))
• h1(out) = 0.5932
Example
(Forward Pass)
2) Calculate h2 (in and out)
• h2(in) = w3·i1 + w4·i2 + b1
• h2(in) = 0.25*0.05 + 0.30*0.10 + 0.35
• h2(in) = 0.3925

Example (Forward Pass)
• h2(out) = 1 / (1 + e^(−h2(in)))
• h2(out) = 1 / (1 + e^(−0.3925))
• h2(out) = 0.5968
Example
(Forward Pass)
3) Calculate o1 (in and out)
• o1(in) = w5·h1(out) + w6·h2(out) + b2
• o1(in) = 0.40*0.593 + 0.45*0.596 + 0.60
• o1(in) = 1.105

Example (Forward Pass)
• o1(out) = 1 / (1 + e^(−o1(in)))
• o1(out) = 1 / (1 + e^(−1.105))
• o1(out) = 0.7513
Example
(Forward Pass)
4) Calculate o2 (in and out)
• o2(in) = w7·h1(out) + w8·h2(out) + b2
• o2(in) = 0.5932*0.5 + 0.5968*0.55 + 0.60
• o2(in) = 1.22484

Example (Forward Pass)
• o2(out) = 1 / (1 + e^(−o2(in)))
• o2(out) = 1 / (1 + e^(−1.22484))
• o2(out) = 0.7729
Example
(Forward Pass)
5) Calculate Ɛtotal
• Ɛtotal = (1/2) Σ (t − o)²
• Ɛtotal = (1/2)(0.01 − 0.7513)² + (1/2)(0.99 − 0.7729)²
• Ɛtotal = 0.29837 (approx.)
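The forward pass above can be reproduced with a short Python sketch; the inputs, weights, biases and targets (i1 = 0.05, i2 = 0.10, w1…w8, b1 = 0.35, b2 = 0.60, targets 0.01 and 0.99) are the values quoted in the slides and the linked Mazur example.

```python
# Forward pass of the 2-2-2 network from the worked example.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

h1 = sigmoid(w1 * i1 + w2 * i2 + b1)                  # ≈ 0.5933
h2 = sigmoid(w3 * i1 + w4 * i2 + b1)                  # ≈ 0.5969
o1 = sigmoid(w5 * h1 + w6 * h2 + b2)                  # ≈ 0.7514
o2 = sigmoid(w7 * h1 + w8 * h2 + b2)                  # ≈ 0.7729
error = 0.5 * (t1 - o1) ** 2 + 0.5 * (t2 - o2) ** 2   # ≈ 0.2984
print(h1, h2, o1, o2, error)
```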
Example
Part 2: Backward Pass
1) For Output Layer
w5+ = w5 + Δw5
• Δw5 = −η ∂Ɛtotal/∂w5
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂outo1 * ∂outo1/∂neto1 * ∂neto1/∂w5
Example (Backward Pass)
Output → Hidden Layer
• Ɛtotal = (1/2)(targeto1 − outo1)² + (1/2)(targeto2 − outo2)²

• ∂Ɛtotal/∂outo1 = (1/2) × ∂(targeto1 − outo1)²/∂outo1 + 0
• ∂Ɛtotal/∂outo1 = 2 × (1/2) × (targeto1 − outo1)^(2−1) × (−1) + 0
• ∂Ɛtotal/∂outo1 = −targeto1 + outo1

Example (Backward Pass)
Output → Hidden Layer
• ∂Ɛtotal/∂outo1 = outo1 − targeto1
• ∂Ɛtotal/∂outo1 = 0.751365 − 0.01
• ∂Ɛtotal/∂outo1 = 0.741365
Example (Backward Pass)
Output → Hidden Layer

• Now, we will find how much the output (outo1) changes with respect to the net input of o1.
• outo1 = 1 / (1 + e^(−neto1))
• ∂outo1/∂neto1 = ∂/∂neto1 [1 / (1 + e^(−neto1))] = outo1 (1 − outo1)

Example (Backward Pass)
Output → Hidden Layer
• ∂outo1/∂neto1 = outo1 (1 − outo1)
• ∂outo1/∂neto1 = 0.751365 * (1 − 0.751365)
• ∂outo1/∂neto1 = 0.186815602
Example (Backward Pass)
Output → Hidden Layer

• Finally, how much does the total net input of o1 change with respect to w5?
• neto1 = w5*outh1 + w6*outh2 + b2
• ∂neto1/∂w5 = ∂/∂w5 (w5*outh1 + w6*outh2 + b2)
• ∂neto1/∂w5 = 1 * outh1 + 0 + 0 = outh1

Example (Backward Pass)
Output → Hidden Layer
• ∂neto1/∂w5 = outh1
• ∂neto1/∂w5 = 0.59326992
Example (Backward Pass)
Output → Hidden Layer
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂outo1 * ∂outo1/∂neto1 * ∂neto1/∂w5
• ∂Ɛtotal/∂w5 = 0.741365 * 0.186815602 * 0.59326992
• ∂Ɛtotal/∂w5 = 0.08216704
Example (Backward Pass)
Output → Hidden Layer
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂outo1 * ∂outo1/∂neto1 * ∂neto1/∂w5
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂neto1 * outh1
• ∂Ɛtotal/∂neto1 can be represented with the Greek letter delta, δ
• δo1 = ∂Ɛtotal/∂neto1 = ∂Ɛtotal/∂outo1 * ∂outo1/∂neto1
Example (Backward Pass)
Output → Hidden Layer

• δo1 = ∂Ɛtotal/∂outo1 * ∂outo1/∂neto1
• δo1 = (outo1 − targeto1) * outo1 (1 − outo1)
• ∂Ɛtotal/∂w5 = ∂Ɛtotal/∂neto1 * outh1
• ∂Ɛtotal/∂w5 = δo1 * outh1

Example (Backward Pass)
Output → Hidden Layer
• Δw5 = −η ∂Ɛtotal/∂w5
• Let's take η = 0.6
• Δw5 = −0.6 * 0.08216704 = −0.049300224
• w5+ = w5 + Δw5
• w5+ = 0.4 + (−0.6 * 0.08216704)
• w5+ = 0.350699776

Example (Backward Pass)
Output → Hidden Layer
Now, let's calculate w6, w7 and w8.
w6+ = w6 + Δw6
• w6+ = 0.408666186
• w7+ = 0.511301270
• w8+ = 0.561370121
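A minimal sketch of the w5 update above (learning rate η = 0.6 as in the slides); the variable names are illustrative.

```python
# Output-layer update for w5 via the chain rule:
# dE/dw5 = (out_o1 - target_o1) * out_o1 * (1 - out_o1) * out_h1
out_h1 = 0.59326992
out_o1, target_o1 = 0.75136507, 0.01
w5, lr = 0.40, 0.6

delta_o1 = (out_o1 - target_o1) * out_o1 * (1 - out_o1)
dE_dw5 = delta_o1 * out_h1           # ≈ 0.08216704
w5_new = w5 - lr * dE_dw5            # ≈ 0.35069978
print(dE_dw5, w5_new)
```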
Example (Backward Pass)
Hidden → Input Layer
2) Hidden Layer → Input Layer
w1+ = w1 + Δw1
• Δw1 = −η ∂Ɛtotal/∂w1
• ∂Ɛtotal/∂w1 = ∂Ɛtotal/∂outh1 * ∂outh1/∂neth1 * ∂neth1/∂w1
Example (Backward Pass)
Hidden → Input Layer
• ∂Ɛtotal/∂outh1 = ∂Ɛo1/∂outh1 + ∂Ɛo2/∂outh1
• Starting with ∂Ɛo1/∂outh1:
• ∂Ɛo1/∂outh1 = ∂Ɛo1/∂neto1 * ∂neto1/∂outh1
Example (Backward Pass)
Hidden → Input Layer
• Calculating ∂Ɛo1/∂neto1:
• ∂Ɛo1/∂neto1 = ∂Ɛo1/∂outo1 * ∂outo1/∂neto1
• ∂Ɛo1/∂neto1 = (−targeto1 + outo1) * outo1 (1 − outo1)
• ∂Ɛo1/∂neto1 = 0.741365 * 0.186815602 = 0.138498562
Example (Backward Pass)
Hidden → Input Layer
• Calculating ∂neto1/∂outh1:
• neto1 = w5*outh1 + w6*outh2 + b2
• ∂neto1/∂outh1 = ∂/∂outh1 (w5*outh1 + w6*outh2 + b2)
• ∂neto1/∂outh1 = w5
Example (Backward Pass)
Hidden → Input Layer
• ∂neto1/∂outh1 = w5 = 0.40
• Plugging them in:
• ∂Ɛo1/∂outh1 = ∂Ɛo1/∂neto1 * ∂neto1/∂outh1
• ∂Ɛo1/∂outh1 = 0.138498562 * 0.40
• ∂Ɛo1/∂outh1 = 0.055399425
Example (Backward Pass)
Hidden → Input Layer
• Similarly, we will calculate ∂Ɛo2/∂outh1:
• ∂Ɛo2/∂outh1 = −0.019049119
• Therefore,
• ∂Ɛtotal/∂outh1 = ∂Ɛo1/∂outh1 + ∂Ɛo2/∂outh1
• ∂Ɛtotal/∂outh1 = 0.055399425 − 0.019049119 = 0.036350306
Example (Backward Pass)
Hidden → Input Layer
• Let's calculate ∂outh1/∂neth1:
• outh1 = 1 / (1 + e^(−neth1))
• ∂outh1/∂neth1 = ∂/∂neth1 [1 / (1 + e^(−neth1))]
• ∂outh1/∂neth1 = outh1 (1 − outh1)
Example (Backward Pass)
Hidden → Input Layer
• ∂outh1/∂neth1 = outh1 (1 − outh1)
• ∂outh1/∂neth1 = 0.59326992 * (1 − 0.59326992)
• ∂outh1/∂neth1 = 0.241300709
Example (Backward Pass)
Hidden → Input Layer
• Now let's derive ∂neth1/∂w1:
• neth1 = w1*i1 + w2*i2 + b1
• ∂neth1/∂w1 = ∂/∂w1 (w1*i1 + w2*i2 + b1)
• ∂neth1/∂w1 = i1 + 0 + 0
• ∂neth1/∂w1 = i1
Example (Backward Pass)
Hidden → Input Layer
• ∂neth1/∂w1 = i1
• ∂neth1/∂w1 = 0.05
Example (Backward Pass)
Hidden → Input Layer

• Putting it all together in a chain rule:

• ∂Ɛtotal/∂w1 = ∂Ɛtotal/∂outh1 * ∂outh1/∂neth1 * ∂neth1/∂w1
• ∂Ɛtotal/∂w1 = 0.036350306 * 0.241300709 * 0.05
• ∂Ɛtotal/∂w1 = 0.000438568
Example (Backward Pass)
Hidden → Input Layer
• ∂Ɛtotal/∂w1 = ∂Ɛtotal/∂outh1 * ∂outh1/∂neth1 * ∂neth1/∂w1
• ∂Ɛtotal/∂w1 = ∂Ɛtotal/∂neth1 * i1
• ∂Ɛtotal/∂neth1 can be represented with the Greek letter delta, δ
• ∂Ɛtotal/∂w1 = δh1 * i1
Example (Backward Pass)
Hidden → Input Layer
• Δw1 = −η ∂Ɛtotal/∂w1
• Let's take η = 0.6
• Δw1 = −0.6 * 0.000438568
• w1+ = w1 + Δw1
• w1+ = 0.15 + (−0.6 * 0.000438568)
• w1+ = 0.15 − 0.0002631408
• w1+ = 0.1497368592
Example (Backward Pass)
Hidden → Input Layer
• Similarly, update w2, w3,
w4.
• w+2 = 0.199…..
• w+3 = 0.249…..
• w+4 = 0.299…..
Refer the following link for this –
https://mattmazur.com/2015/03/17/a-step-by-step-
backpropagation-example/
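A minimal sketch of the w1 update above (η = 0.6); the ∂Ɛo2/∂outh1 term is taken directly from the slides, the rest recomputes the chain rule. Variable names are illustrative.

```python
# Hidden-layer update for w1 via the chain rule.
out_h1 = 0.59326992
i1 = 0.05
w5 = 0.40
lr = 0.6

delta_o1 = 0.138498562           # (out_o1 - target_o1) * out_o1 * (1 - out_o1)
dEo1_douth1 = delta_o1 * w5      # ≈ 0.055399425
dEo2_douth1 = -0.019049119       # analogous term through o2 (value from the slides)
dEtotal_douth1 = dEo1_douth1 + dEo2_douth1           # ≈ 0.036350306
douth1_dneth1 = out_h1 * (1 - out_h1)                # ≈ 0.241300709
dEtotal_dw1 = dEtotal_douth1 * douth1_dneth1 * i1    # ≈ 0.000438568
w1_new = 0.15 - lr * dEtotal_dw1                     # ≈ 0.14973686
print(dEtotal_dw1, w1_new)
```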
Advantages and Disadvantages of ANN
Advantages:
• A neural network can implement tasks that a linear program cannot.
• When an element of the neural network fails, it can continue without issue because of its parallel nature.
• A neural network learns and does not need to be reprogrammed.
• It can be executed in any application.

Disadvantages:
• The neural network requires training to operate.
• Neural networks are black boxes, meaning we cannot know how much each independent variable is influencing the dependent variables.
• Large complexity of the network structure.
• Big neural networks need high processing time.
Self Organizing Maps (SOM)
• A Self-Organizing Map (SOM) is a neural network used for unsupervised learning.
• It has only 2 layers: an Input Layer and an Output Layer.
• The Self-Organizing Map (SOM), proposed by Teuvo Kohonen, is a data visualization technique.
• It is also known as a Kohonen Map.
• It helps to understand high-dimensional data by reducing the dimensions of the data to a map.
• It showcases clustering by grouping similar data together.
Self Organizing Maps (SOM)
Kohonen Self-Organizing Maps

• Step 1: Initialize the weights wij; random values may be assumed.
• Step 2: Initialize the learning rate.
• Step 3: Calculate the square of the Euclidean distance for each j = 1 to m:
D(j) = Σi=1..n (xi − wij)²
• Step 4: Find the winning unit index J, so that D(J) is minimum.
Kohonen Self-Organizing Maps

• Step 5: For all units j within a specific neighborhood of J, and for all i, calculate the new weights:
wij(new) = wij(old) + η (xi − wij(old))
Example

Construct KSOM to cluster four vectors. The four input vectors


are: [(0,0,1,1),(1,0,0,0),(0,1,1,0),(0,0,0,1)]. Number of clusters
to be formed is 2. Assume an initial learning rate of 0.5 and
random weights associated with each input are as follows:
0.2 0.9
𝑤𝑖𝑗 = 0.4 0.7
0.6 0.5
0.8 0.3
Solution
[Network diagram: inputs X1, X2, X3, X4 fully connected to output units Y1 and Y2 through weights w11 … w42]

        w11 w12     0.2 0.9
wij  =  w21 w22  =  0.4 0.7
        w31 w32     0.6 0.5
        w41 w42     0.8 0.3

• First Input Vector: (0, 0, 1, 1)
• Calculating Euclidean Distance D(1):
D(j) = Σi (wij − xi)²
D(1) = Σi (wi1 − xi)²
D(1) = (0.2 − 0)² + (0.4 − 0)² + (0.6 − 1)² + (0.8 − 1)²
D(1) = 0.04 + 0.16 + 0.16 + 0.04 = 0.4


𝑤11 𝑤12 0.2 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.4 0.7
𝑤31 𝑤32 0.6 0.5
𝑤41 𝑤42 0.8 0.3

• First Input Vector: (0, 0, 1, 1)
• Calculating Euclidean Distance D(2):
D(2) = Σi (wi2 − xi)²
D(2) = (0.9 − 0)² + (0.7 − 0)² + (0.5 − 1)² + (0.3 − 1)²
D(2) = 0.81 + 0.49 + 0.25 + 0.49 = 2.04


𝑤11 𝑤12 0.2 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.4 0.7
𝑤31 𝑤32 0.6 0.5
𝑤41 𝑤42 0.8 0.3

• First Input Vector: (0,0,1,1)


• D(1) = 0.4 and D(2) = 2.04
• As, D(1) < D(2), so, Y1 is a winning cluster.
• Thus, First Input belongs to Y1.
𝑤11 𝑤12 0.2 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.4 0.7
𝑤31 𝑤32 0.6 0.5
𝑤41 𝑤42 0.8 0.3

• Next step: update the initial weights on the winning cluster unit J = 1.
wij(new) = wij(old) + η (xi − wij(old))
wi1(new) = wi1(old) + η (xi − wi1(old))


𝑤11 𝑤12 0.2 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.4 0.7
𝑤31 𝑤32 0.6 0.5
𝑤41 𝑤42 0.8 0.3

• First Input Vector: (0, 0, 1, 1)
• Given, Learning Rate η = 0.5
• w11(new) = w11(old) + η (x1 − w11(old)) = 0.2 + 0.5 (0 − 0.2) = 0.1
• w21(new) = w21(old) + η (x2 − w21(old)) = 0.4 + 0.5 (0 − 0.4) = 0.2
𝑤11 𝑤12 0.2 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.4 0.7
𝑤31 𝑤32 0.6 0.5
𝑤41 𝑤42 0.8 0.3

• First Input Vector: (0, 0, 1, 1)
• Given, Learning Rate η = 0.5
• w31(new) = w31(old) + η (x3 − w31(old)) = 0.6 + 0.5 (1 − 0.6) = 0.8
• w41(new) = w41(old) + η (x4 − w41(old)) = 0.8 + 0.5 (1 − 0.8) = 0.9
𝑤11 𝑤12 0.2 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.4 0.7
𝑤31 𝑤32 0.6 0.5
𝑤41 𝑤42 0.8 0.3

• Updated Weights:

𝑤11 𝑤12 0.1 0.9


• 𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3
𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Second Input Vector: (1, 0, 0, 0)
• Calculating Euclidean Distance D(1):
D(1) = Σi (wi1 − xi)²
D(1) = (0.1 − 1)² + (0.2 − 0)² + (0.8 − 0)² + (0.9 − 0)²
D(1) = 0.81 + 0.04 + 0.64 + 0.81 = 2.3


𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Second Input Vector: (1, 0, 0, 0)
• Calculating Euclidean Distance D(2):
D(2) = Σi (wi2 − xi)²
D(2) = (0.9 − 1)² + (0.7 − 0)² + (0.5 − 0)² + (0.3 − 0)²
D(2) = 0.01 + 0.49 + 0.25 + 0.09 = 0.84


𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Second Input Vector: (1,0,0,0)


• D(1) = 2.3 and D(2) = 0.84
• As, D(2) < D(1), so, Y2 is a winning cluster.
• Thus, Second Input belongs to Y2.
𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Next step: update the initial weights on the winning cluster unit J = 2.
wij(new) = wij(old) + η (xi − wij(old))
wi2(new) = wi2(old) + η (xi − wi2(old))


𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Second Input Vector: (1, 0, 0, 0)
• Given, Learning Rate η = 0.5
• w12(new) = w12(old) + η (x1 − w12(old)) = 0.9 + 0.5 (1 − 0.9) = 0.95
• w22(new) = w22(old) + η (x2 − w22(old)) = 0.7 + 0.5 (0 − 0.7) = 0.35
𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Second Input Vector: (1, 0, 0, 0)
• Given, Learning Rate η = 0.5
• w32(new) = w32(old) + η (x3 − w32(old)) = 0.5 + 0.5 (0 − 0.5) = 0.25
• w42(new) = w42(old) + η (x4 − w42(old)) = 0.3 + 0.5 (0 − 0.3) = 0.15
𝑤11 𝑤12 0.1 0.9
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.7
𝑤31 𝑤32 0.8 0.5
𝑤41 𝑤42 0.9 0.3

• Updated Weights:

𝑤11 𝑤12 0.1 0.95


• 𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.35
𝑤31 𝑤32 0.8 0.25
𝑤41 𝑤42 0.9 0.15
𝑤11 𝑤12 0.1 0.95
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.35
𝑤31 𝑤32 0.8 0.25
𝑤41 𝑤42 0.9 0.15

• Third Input Vector: (0, 1, 1, 0)
• D(1) = 1.5 and D(2) = 1.91
• As D(1) < D(2), Y1 is the winning cluster.
• Thus, the Third Input belongs to Y1.
𝑤11 𝑤12 0.1 0.95
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.35
𝑤31 𝑤32 0.8 0.25
𝑤41 𝑤42 0.9 0.15

• Third Input Vector: (0,1,1,0)


• w11 = 0.05
• w21 = 0.6
• w31 = 0.9
• w41 = 0.45
𝑤11 𝑤12 0.1 0.95
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.2 0.35
𝑤31 𝑤32 0.8 0.25
𝑤41 𝑤42 0.9 0.15

• Updated Weights:

𝑤11 𝑤12 0.05 0.95


• 𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.6 0.35
𝑤31 𝑤32 0.9 0.25
𝑤41 𝑤42 0.45 0.15
𝑤11 𝑤12 0.05 0.95
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.6 0.35
𝑤31 𝑤32 0.9 0.25
𝑤41 𝑤42 0.45 0.15

• Fourth Input Vector: (0, 0, 0, 1)
• D(1) = 1.475 and D(2) = 1.81
• As D(1) < D(2), Y1 is the winning cluster.
• Thus, the Fourth Input belongs to Y1.
𝑤11 𝑤12 0.05 0.95
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.6 0.35
𝑤31 𝑤32 0.9 0.25
𝑤41 𝑤42 0.45 0.15

• Fourth Input Vector: (0, 0, 0, 1)
• w11 = 0.05 + 0.5 (0 − 0.05) = 0.025
• w21 = 0.6 + 0.5 (0 − 0.6) = 0.3
• w31 = 0.9 + 0.5 (0 − 0.9) = 0.45
• w41 = 0.45 + 0.5 (1 − 0.45) = 0.725
𝑤11 𝑤12 0.05 0.95
𝑤𝑖𝑗 = 𝑤21 𝑤22 = 0.6 0.35
𝑤31 𝑤32 0.9 0.25
𝑤41 𝑤42 0.45 0.15

• Updated Weights:

        w11 w12     0.025 0.95
wij  =  w21 w22  =  0.3   0.35
        w31 w32     0.45  0.25
        w41 w42     0.725 0.15
Final Architecture
[Network diagram: inputs X1, X2, X3, X4 connected to output units Y1 and Y2 with the final weights — to Y1: 0.025, 0.3, 0.45, 0.725; to Y2: 0.95, 0.35, 0.25, 0.15]
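The worked example can be reproduced with a short NumPy sketch of the winner-take-all Kohonen update (one pass over the four vectors, fixed learning rate 0.5, no neighborhood shrinking); variable names are illustrative.

```python
# Kohonen SOM winner-take-all update for the 4-input, 2-cluster example.
import numpy as np

X = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # the four input vectors
W = np.array([[0.2, 0.9],
              [0.4, 0.7],
              [0.6, 0.5],
              [0.8, 0.3]])                  # initial weights (4 inputs x 2 clusters)
lr = 0.5

for x in X:
    # Squared Euclidean distance D(j) = sum_i (x_i - w_ij)^2 for each cluster j.
    D = ((W - x[:, None]) ** 2).sum(axis=0)
    j = int(np.argmin(D))                   # winning cluster
    W[:, j] += lr * (x - W[:, j])           # w_ij(new) = w_ij(old) + lr*(x_i - w_ij(old))
    print("winner:", "Y1" if j == 0 else "Y2", "distances:", D)

print(W)   # final weight matrix after one pass
```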
Advantages and Disadvantages of SOM
Advantages:
• Data mapping is easily interpreted.
• Projects high-dimensional data onto a lower-dimensional map.
• Capable of organizing large, complex data sets.
• Useful for visualization.

Disadvantages:
• Difficult to determine what input weights to use.
• The clustering result depends on the initial weight vector.
• Mapping can result in divided clusters.
• Requires that nearby points behave similarly.
• A heuristic algorithm.
Deep learning is a subset of machine learning,
which is essentially a neural network with
three or more layers.
Convolutional Neural Network (CNN)

• CNN is a type of artificial neural network, which is


widely used for image/object recognition and
classification.
• It is made up of multiple layers, including convolutional
layers, pooling layers, and fully connected layers.
Architecture Of CNN

Input Image → Convolutional Layer → ReLU Layer → Pooling → Flatten Layer → Fully Connected Layer → Output
Convolutional Layer

• A “filter” passes over the image, scanning a few


pixels at a time and creating a feature map that
predicts the class to which each feature belongs.
• 3 main components are:
– Kernel/Filter
– Stride
– Padding
Convolutional Layer

• Kernel/Filter → A kernel is a feature extractor that


extracts features in the image.
• Stride → It refers to the number of pixels by which
we move the filter across the input image.
• Padding → It is the addition of extra pixels around
the borders of the input images or feature map.
Convolutional Layer (Kernel)

5x5 Input Image:
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

3x3 Kernel:
0 1 2
2 2 0
0 1 2

Sliding the kernel over the input (stride 1), each output value is the sum of the elementwise products of the kernel and the current 3x3 patch:
• 3x0 + 3x1 + 2x2 + 0x2 + 0x2 + 1x0 + 3x0 + 1x1 + 2x2 = 12
• 3x0 + 2x1 + 1x2 + 0x2 + 1x2 + 3x0 + 1x0 + 2x1 + 2x2 = 12
• 2x0 + 1x1 + 0x2 + 1x2 + 3x2 + 1x0 + 2x0 + 2x1 + 3x2 = 17
• 0x0 + 0x1 + 1x2 + 3x2 + 1x2 + 2x0 + 2x0 + 0x1 + 0x2 = 10
• 0x0 + 1x1 + 3x2 + 1x2 + 2x2 + 2x0 + 0x0 + 0x1 + 2x2 = 17
• 1x0 + 3x1 + 1x2 + 2x2 + 2x2 + 3x0 + 0x0 + 2x1 + 2x2 = 19
• 3x0 + 1x1 + 2x2 + 2x2 + 0x2 + 0x0 + 2x0 + 0x1 + 0x2 = 9
• 1x0 + 2x1 + 2x2 + 0x2 + 0x2 + 2x0 + 0x0 + 0x1 + 0x2 = 6
• 2x0 + 2x1 + 3x2 + 0x2 + 2x2 + 2x0 + 0x0 + 0x1 + 1x2 = 14

3x3 Output:
12 12 17
10 17 19
9  6  14

Size of Output = [size of Input − size of kernel] + 1
O = [z − k] + 1
O = [5 − 3] + 1 = 2 + 1 = 3
Convolutional Layer (Stride)
Stride (S) = 2

With the same 5x5 input and 3x3 kernel, the kernel now moves 2 pixels at a time:
• 3x0 + 3x1 + 2x2 + 0x2 + 0x2 + 1x0 + 3x0 + 1x1 + 2x2 = 12
• 2x0 + 1x1 + 0x2 + 1x2 + 3x2 + 1x0 + 2x0 + 2x1 + 3x2 = 17
• 3x0 + 1x1 + 2x2 + 2x2 + 0x2 + 0x0 + 2x0 + 0x1 + 0x2 = 9
• 2x0 + 2x1 + 3x2 + 0x2 + 2x2 + 2x0 + 0x0 + 0x1 + 1x2 = 14

2x2 Output:
12 17
9  14

o = [(z − k)/s] + 1
o = [(5 − 3)/2] + 1 = 1 + 1 = 2
Convolutional Layer (Padding)
Padding (p) = 1, Stride = 1

Padded 7x7 Input Image (a border of zeros around the 5x5 input):
0 0 0 0 0 0 0
0 3 3 2 1 0 0
0 0 0 1 3 1 0
0 3 1 2 2 3 0
0 2 0 0 2 2 0
0 2 0 0 0 1 0
0 0 0 0 0 0 0

First output value: 0x0 + 0x1 + 0x2 + 0x2 + 3x2 + 3x0 + 0x0 + 0x1 + 0x2 = 6

What is the size of the output?
o = [(z − k + 2p)/s] + 1
o = [(5 − 3 + 2*1)/1] + 1
o = [2 + 2]/1 + 1
o = 4 + 1 = 5

5x5 Output / Feature Map:
6  14 17 11 3
14 12 12 17 11
8  10 17 19 13
11 9  6  14 12
6  4  4  6  4
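The kernel, stride and padding examples above can all be reproduced with one short NumPy sketch of the sliding-window operation (cross-correlation, as in the slides); the function name and signature are illustrative assumptions.

```python
# 2-D convolution/cross-correlation with configurable stride and zero padding.
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over the (zero-padded) image and sum elementwise products."""
    if padding:
        image = np.pad(image, padding)           # zero padding on all sides
    k = kernel.shape[0]
    # output size: (z - k + 2p)/s + 1 (the image here is already padded)
    out_size = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            patch = image[r * stride:r * stride + k, c * stride:c * stride + k]
            out[r, c] = (patch * kernel).sum()
    return out

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

print(conv2d(image, kernel))             # 3x3 map: [[12,12,17],[10,17,19],[9,6,14]]
print(conv2d(image, kernel, stride=2))   # 2x2 map: [[12,17],[9,14]]
print(conv2d(image, kernel, padding=1))  # 5x5 feature map shown above
```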
ReLU Layer

• The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive; otherwise, it outputs zero.
f(x) = max(x, 0)
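A one-line NumPy sketch of ReLU applied elementwise to a feature map (illustrative):

```python
# ReLU: negative values become 0, positive values pass through unchanged.
import numpy as np

def relu(feature_map):
    return np.maximum(feature_map, 0)   # f(x) = max(x, 0)

print(relu(np.array([[-2.0, 3.5], [0.0, -0.7]])))
```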
Pooling

• Pooling reduces the dimensions of the hidden layer by combining the outputs of neuron clusters at the previous layer into a single neuron in the next layer.
• There are mainly 2 types of Pooling:
– Max Pooling
– Average Pooling
Pooling
5x5 Feature Map:
6  14 17 11 3
14 12 12 17 11
8  10 17 19 13
11 9  6  14 12
6  4  4  6  4

Max pooling with a 2x2 pool size gives the 2x2 output:
14 17
11 19

Pooling
Average pooling with a 2x2 pool size gives the 2x2 output:
11.5 14.25
9.5  14.0
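A minimal NumPy sketch of 2x2 max and average pooling (stride 2) that reproduces the 2x2 outputs above; the function name is an illustrative assumption.

```python
# 2x2 pooling with stride 2; the trailing row/column of the 5x5 map that does
# not fill a full 2x2 window is dropped, matching the 2x2 outputs above.
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    rows, cols = feature_map.shape[0] // size, feature_map.shape[1] // size
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = feature_map[r * size:(r + 1) * size, c * size:(c + 1) * size]
            out[r, c] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.array([[6, 14, 17, 11, 3],
                 [14, 12, 12, 17, 11],
                 [8, 10, 17, 19, 13],
                 [11, 9, 6, 14, 12],
                 [6, 4, 4, 6, 4]], dtype=float)

print(pool2d(fmap, mode="max"))       # [[14, 17], [11, 19]]
print(pool2d(fmap, mode="average"))   # [[11.5, 14.25], [9.5, 14.0]]
```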
Flatten Layer
Converts the multi-dimensional arrays into a flattened one-dimensional (single-dimensional) array.
Fully- Connected Layer
(Dense Layer)
• Every neuron in this layer is connected to every neuron in the previous layer.
• Takes the inputs from the feature analysis and applies weights to predict the correct label.
Training Of CNN

• Initialize all filter weights with random values.
• Forward Propagation (Input → Convolutional → ReLU → Pooling → Flatten → Fully Connected → Output).
• Calculate the total error using the sum squared error: (1/2) Σ (t − o)².
• Backpropagation (using gradient descent): update the weights and parameters.
• Repeat till the error/loss is minimum.
• Repeat for all input images (training set).
Disadvantage Of CNN
• High computational requirements.
• Needs large amount of labeled data.
• Large memory footprint.
• Interpretability challenges.
• Limited effectiveness for sequential data.
• Tend to be much slower.
• Training takes a long time.
Reference Books

• Tom M. Mitchell, "Machine Learning", McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, "Introduction to Machine Learning (Adaptive Computation and Machine Learning)", The MIT Press, 2004.
• Stephen Marsland, "Machine Learning: An Algorithmic Perspective", CRC Press, 2009.
• Bishop, C., "Pattern Recognition and Machine Learning", Berlin: Springer-Verlag.
Text Books

• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, "Machine Learning", Pearson.
• Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python".
• John Paul Mueller and Luca Massaron, "Machine Learning for Dummies".
• Dr. Himanshu Sharma, "Machine Learning", S.K. Kataria & Sons, 2022.