Lec 8
ABHISHEK MUKHOPADHYAY
INSTRUCTOR: DR. PRADIPTA BISWAS
I3D LABORATORY, CPDM, IISc
Image Source: deeplearning.ai
Warren McCulloch and Walter Pitts
Linearly separable:
Linearly inseparable:
Edge Detection
Vertical edges
Horizontal edges
How do we detect these edges?
Image Source: deeplearning.ai
Neural Network?
Suppose an image is of size 64 X 64 X 3
◦ Input feature dimension then becomes 12,288
If the image size is 720 X 720 X 3
◦ Input feature dimension becomes 1,555,200
The number of parameters will swell up to a HUGE number
This results in greater computational and memory requirements
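A quick check of these numbers (note that 12,288 = 64 x 64 x 3, so the first example corresponds to a 64 x 64 x 3 image):

```python
# Flattening an H x W x C image gives H * W * C input features
def input_features(height, width, channels=3):
    return height * width * channels

print(input_features(64, 64))    # 12288
print(input_features(720, 720))  # 1555200
```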
Another Application
Digit Recognition
Classifier for the digit 5
◦ (for example: what is the probability that the image represents a 5, given its pixels?)
Normalization Constant
Why did this help? Well, we think that we might be able to specify how features are
“generated” by the class label
The Bayes Classifier
Let’s expand this for our digit recognition task:
To classify, we’ll simply compute these two probabilities and predict based on which one is greater
Model Parameters
For the Bayes classifier, we need to “learn” two functions, the likelihood and the prior
How many parameters are required to specify the prior for our digit recognition
example?
Model Parameters
How many parameters are required to specify the likelihood?
◦ (Supposing that each image is 30x30 pixels)
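The counting behind these two questions can be sketched as follows, assuming binary pixels and a full (unstructured) likelihood table — both are assumptions, since the slides do not fix the pixel encoding:

```python
# Parameter counting for the Bayes digit classifier (10 classes),
# ASSUMING binary pixels -- the slides do not state the encoding.
num_classes = 10
pixels = 30 * 30  # 900 pixels per 30x30 image

# Prior P(Y): 10 probabilities summing to 1 -> 9 free parameters
prior_params = num_classes - 1

# Full likelihood P(X | Y): one probability per possible binary image
# per class, minus 1 per class because each distribution sums to 1
full_likelihood_params = num_classes * (2 ** pixels - 1)

print(prior_params)            # 9
print(full_likelihood_params)  # astronomically large: ~10 * 2^900
```

This explosion in the likelihood table is precisely why additional structure (independence assumptions, or a model like a CNN) is needed.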
CNN
Dive into CNN
In a convolutional network (ConvNet), there are basically three types of layers:
1. Convolution layer
2. Pooling layer
3. Fully connected layer
A convolutional layer
A CNN is a neural network with some convolutional layers (and some other layers). A convolutional layer has a number of filters, each of which performs a convolution operation.
A filter can act as an edge detector.
Convolution
These are the network parameters to be learned.

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1 (3 x 3):
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2 (3 x 3):
-1  1 -1
-1  1 -1
-1  1 -1

Each filter detects a small pattern (3 x 3).
Image Source: internet
Convolution, stride = 1
Sliding Filter 1 over the 6 x 6 image: the dot product at the top-left 3 x 3 patch is 3; shifting one pixel to the right gives -1.
Image Source: internet
Convolution, if stride = 2
With stride 2, Filter 1 moves two pixels at a time: the first two outputs along the top row are 3 and -3.
Image Source: internet
Convolution, stride = 1
Convolving Filter 1 over the full 6 x 6 image with stride 1 gives a 4 x 4 feature map:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
Image Source: internet
Convolution, stride = 1
Repeat this for each filter. Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1
gives a second 4 x 4 feature map:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
Two 4 x 4 feature maps, forming a 4 x 4 x 2 matrix.
Image Source: internet
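The two 4 x 4 feature maps above can be reproduced with a short plain-Python sketch (a sliding dot product with no padding — strictly cross-correlation, as in most deep-learning usage):

```python
# 6 x 6 input image and the two 3 x 3 filters from the slides
image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
filter1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
filter2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]

def conv2d(img, filt, stride=1):
    """Slide the filter over the image and take dot products (valid, no padding)."""
    k = len(filt)
    n = (len(img) - k) // stride + 1
    return [[sum(img[i * stride + a][j * stride + b] * filt[a][b]
                 for a in range(k) for b in range(k))
             for j in range(n)] for i in range(n)]

print(conv2d(image, filter1))
# [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
print(conv2d(image, filter2))
# [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]
print(conv2d(image, filter1, stride=2))
# [[3, -3], [-3, 0]] -- the stride-2 case samples every other position
```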
Convolution over Volume
For a color image, the input has 3 channels (6 x 6 x 3). Each filter then also has 3 channels (3 x 3 x 3): in the slide's picture, Filter 1 and Filter 2 repeat the same 3 x 3 weights across the three channels. The convolution takes the dot product over all three channels at once, so each filter still produces a single 2-D feature map.
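A minimal sketch of this, assuming (as the slide's picture suggests) that each of the three channels repeats the same 6 x 6 pattern and the filter repeats the same 3 x 3 weights in every channel:

```python
# Convolution over volume: a 3 x 3 x 3 filter applied to a 6 x 6 x 3 image
# sums dot products over all three channels, producing ONE 2-D feature map.
channel = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
image = [channel, channel, channel]            # 6 x 6 x 3 (channels first)
f2d = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]  # Filter 1 from earlier
filt = [f2d, f2d, f2d]                         # 3 x 3 x 3

def conv_volume(img, f):
    k = len(f[0])
    n = len(img[0]) - k + 1
    return [[sum(img[c][i + a][j + b] * f[c][a][b]
                 for c in range(len(img))
                 for a in range(k) for b in range(k))
             for j in range(n)] for i in range(n)]

out = conv_volume(image, filt)
print(len(out), len(out[0]))  # 4 4 -- still a single 4 x 4 map
print(out[0][0])              # 9: each identical channel contributes 3
```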
Fully-connected
Flattening the 6 x 6 image gives 36 inputs x1, x2, ..., x36 to a fully-connected network.
Image Source: internet
Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1
Fewer parameters! Flattening the 6 x 6 image into inputs 1 to 36, the first output of Filter 1 (value 3) connects to only 9 inputs (pixels 1, 2, 3, 7, 8, 9, 13, 14, 15), not fully connected.
Image Source: internet
Fewer parameters: the second output of Filter 1 (value -1) connects to inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 — and it uses the same 9 weights as the first output.
Shared weights: even fewer parameters.
Image Source: internet
Suppose we have 10 filters applying
on input (6 X 6 X 3), each of shape 3 X
3 X 3. What will be the number of
parameters in that layer?
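A worked answer (assuming one bias term per filter, the usual convention; without biases it would be 270):

```python
# 10 filters, each 3 x 3 x 3, over a 6 x 6 x 3 input:
# parameters per filter = 27 weights + 1 bias = 28
num_filters = 10
weights_per_filter = 3 * 3 * 3  # 27
params = num_filters * (weights_per_filter + 1)
print(params)  # 280
```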
Max Pooling
The whole pipeline: Convolution → Max Pooling (can repeat many times) → Flattened → Fully Connected Feedforward network
Image Source: internet
Max Pooling
Filter 1:         Filter 2:
 1 -1 -1          -1  1 -1
-1  1 -1          -1  1 -1
-1 -1  1          -1  1 -1
Feature maps:
 3 -1 -3 -1       -1 -1 -1 -1
-3  1  0 -3       -1 -1 -2  1
-3 -3  0  1       -1 -1 -2  1
 3 -2 -2 -1       -1  0 -4  3
Image Source: internet
Why Pooling
• Subsampling pixels will not change the object: a subsampled image of a bird is still a bird
Subsampling gives a new image, but smaller:
6 x 6 image → Conv → 4 x 4 feature maps → Max Pooling → 2 x 2 image
After 2 x 2 max pooling, Filter 1's map becomes
3 0
3 1
and Filter 2's map becomes
-1 1
 0 3
Each filter is a channel.
Image Source: internet
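The pooled values above can be checked with a short sketch (plain Python, non-overlapping 2 x 2 windows):

```python
# 2 x 2 max pooling over the two 4 x 4 feature maps from the convolution step
map1 = [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
map2 = [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]

def max_pool(fmap, size=2):
    """Take the max over each non-overlapping size x size window."""
    n = len(fmap) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(n)] for i in range(n)]

print(max_pool(map1))  # [[3, 0], [3, 1]]
print(max_pool(map2))  # [[-1, 1], [0, 3]]
```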
The whole CNN
Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flattened → Fully Connected Feedforward network
Each Convolution + Max Pooling round produces a new image, smaller than the original; the number of filters gives the number of channels.
The 2 x 2 x 2 pooled output
3 0      -1 1
3 1       0 3
is flattened into a vector and fed into the fully connected feedforward network.
Image Source: internet
Classic Networks
1. LeNet-5
2. AlexNet
3. VGG
LeNet-5
•Parameters: 60k
•Layer flow: Conv → Pool → Conv → Pool → FC → FC → Output
•Activation functions: sigmoid/tanh (ReLU was not yet in use)
AlexNet
•Parameters: 60 million
•Activation functions: ReLU
Input: 1 x 28 x 28
Convolution (25 filters, 3 x 3) → 25 x 26 x 26
◦ How many parameters for each filter? 9
Max Pooling (2 x 2) → 25 x 13 x 13
Convolution (50 filters, 3 x 3) → 50 x 11 x 11
◦ How many parameters for each filter? 225 = 25 x 9
Max Pooling (2 x 2) → 50 x 5 x 5
Flattened → 1250
Fully connected feedforward network → Output
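The shapes and per-filter parameter counts above can be checked with a little arithmetic (valid 3 x 3 convolutions, 2 x 2 pooling, no padding):

```python
# Shape/parameter bookkeeping for the 28 x 28 pipeline above
def conv_out(size, k=3):
    return size - k + 1          # valid convolution, stride 1

def pool_out(size, p=2):
    return size // p             # non-overlapping p x p pooling

# Layer 1: 25 filters of 3 x 3 over a 1-channel 28 x 28 input
h = conv_out(28)                 # 26
params_filter1 = 1 * 3 * 3       # 9 weights per filter (1 input channel)
h = pool_out(h)                  # 13

# Layer 2: 50 filters of 3 x 3 over the 25-channel 13 x 13 maps
h = conv_out(h)                  # 11
params_filter2 = 25 * 3 * 3      # 225 weights per filter (25 input channels)
h = pool_out(h)                  # 5

flattened = 50 * h * h           # 1250 inputs to the fully connected network
print(params_filter1, params_filter2, flattened)  # 9 225 1250
```

Note how the second convolution's per-filter cost grows with the number of input channels: 225 = 25 x 9.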
Region-Based Convolutional Neural Networks (R-CNN)
• Calculate the CNN representation for the entire image only once
• At each window, it generates k anchor boxes of different shapes and sizes
Anchor box 1
Anchor box 2
• Since the shape of anchor box 1 is similar to the bounding box for the person, the person will be assigned to anchor box 1 and the car will be assigned to anchor box 2
• The output in this case, instead of 3 X 3 X 8 (using a 3 X 3 grid and 3 classes), will be 3 X 3 X 16 (since we are using 2 anchors)
Image Source: deeplearning.ai
You Only Look Once (YOLO)
• Training
  • 3 X 3 grid with two anchors per grid cell
  • 3 different object classes
  • y labels will have a shape of 3 X 3 X 16
  • Suppose we use 5 anchor boxes per grid cell and increase the number of classes to 5
  • The target will then be 3 X 3 X 10 X 5 = 3 X 3 X 50
• An input image of shape (608, 608, 3) gives an output volume of (19, 19, 425)
  • 5 is the number of anchor boxes per grid cell
  • How many classes are there? Answer: 80 classes
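The depth arithmetic in these examples follows one pattern: each anchor predicts 5 values (p_c plus box coordinates b_x, b_y, b_h, b_w) and one score per class. A small sketch:

```python
# Output depth per grid cell = anchors * (5 box/confidence values + classes)
def output_depth(num_anchors, num_classes):
    return num_anchors * (5 + num_classes)

print(output_depth(2, 3))   # 16  -> 3 x 3 x 16 labels
print(output_depth(5, 5))   # 50  -> 3 x 3 x 50 labels
print(output_depth(5, 80))  # 425 -> (19, 19, 425) output volume
```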