Deep Learning CNN 4th Unit

Deep learning notes for the 4th unit, explaining the Convolutional Neural Network

Uploaded by

refineiq

UNIT 4

Convolutional Neural Network (CNN)

• A Convolutional Neural Network (CNN) is a type of deep learning
neural network architecture commonly used in computer vision.
Computer vision is a field of artificial intelligence that enables a
computer to understand and interpret images and other visual data.
• In deep learning, a convolutional neural network (CNN or ConvNet) is a
class of deep neural networks most commonly applied to analysing visual
imagery.
• A CNN is an extension of the artificial neural network (ANN) that is
predominantly used to extract features from grid-like data, for example
visual datasets such as images or videos, where spatial patterns play an
important role.
• A CNN consists of multiple layers: the input layer, convolutional
layers, pooling layers, and fully connected layers.
• The convolutional layer is the first layer of a convolutional network.
Convolutional layers can be followed by additional convolutional layers
or pooling layers, and the fully connected layer is the final layer.

• The convolutional layer applies filters to the input image to extract
features, the pooling layer downsamples the feature maps to reduce
computation, and the fully connected layer makes the final prediction.
The network learns the filter weights through backpropagation and
gradient descent.
• A complete convolutional neural network architecture is also known as
a ConvNet. A ConvNet is a sequence of layers, and every layer transforms
one volume into another through a differentiable function.
Types of layers:
Let's take an example by running a ConvNet on an image of dimension 32x32x3.

• Input Layer: This is the layer through which we give input to our model.
In a CNN, the input is generally an image or a sequence of images. This
layer holds the raw input image with width 32, height 32, and depth 3.
• Convolutional Layers: These layers extract features from the input. A
convolutional layer applies a set of learnable filters, known as kernels,
to the input image. The filters are small matrices, usually of shape 2×2,
3×3, or 5×5. Each filter slides over the input image and computes the dot
product between the kernel weights and the corresponding image patch. The
output of this layer is referred to as feature maps. Suppose we use a
total of 12 filters for this layer, with stride 1 and padding chosen to
preserve the width and height; we then get an output volume of dimension
32 x 32 x 12.
• Activation Layer: Activation layers add nonlinearity to the network by
applying an element-wise activation function to the output of the
preceding convolution layer. Some common activation functions are ReLU:
max(0, x), Tanh, and Leaky ReLU. The volume remains unchanged, hence the
output volume has dimensions 32 x 32 x 12.
• Pooling Layer: This layer is periodically inserted in the ConvNet. Its
main function is to reduce the size of the volume, which makes
computation faster, reduces memory use, and helps to limit overfitting.
Two common types of pooling layers are max pooling and average pooling.
If we use a max pool with 2 x 2 filters and stride 2, the resultant volume
will be of dimension 16 x 16 x 12.
• Flattening: After the convolution and pooling layers, the resulting
feature maps are flattened into a one-dimensional vector so that they can
be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: These take the input from the previous layer and
compute the final classification or regression output.
• Output Layer: For classification tasks, the output from the fully
connected layers is fed into a function such as sigmoid or softmax, which
converts the output for each class into a probability score for that
class.
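
The volume sizes quoted in the list above can be traced with a short sketch. It tracks shapes only, not pixel values, and it assumes that the convolution uses stride 1 with padding that preserves width and height (the notes do not state this, but it is what a 32 x 32 x 12 output implies). The function names are illustrative, not part of any library.

```python
def conv_same(shape, num_filters):
    """Convolution with stride 1 and size-preserving padding (assumed):
    width and height are kept, depth becomes the number of filters."""
    h, w, _ = shape
    return (h, w, num_filters)


def activation(shape):
    """Element-wise activations such as ReLU leave the volume unchanged."""
    return shape


def max_pool(shape, size=2, stride=2):
    """Pooling shrinks width and height and keeps the depth."""
    h, w, d = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, d)


def flatten(shape):
    """Flattening turns the volume into a one-dimensional vector."""
    h, w, d = shape
    return (h * w * d,)


def trace_shapes(input_shape=(32, 32, 3), num_filters=12):
    conv = conv_same(input_shape, num_filters)
    act = activation(conv)
    pool = max_pool(act)
    flat = flatten(pool)
    return [("input", input_shape), ("conv", conv), ("activation", act),
            ("pool", pool), ("flatten", flat)]


for name, shape in trace_shapes():
    print(name, shape)
```

With the default arguments the pooled volume is 16 x 16 x 12, so the flattened vector passed to the fully connected layer has 16 * 16 * 12 = 3072 entries.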

Convolution Layers
• Convolution layers are the initial layers that pull out features from the image.
Convolution preserves the relationship between pixels by learning image features
from small squares of input data. It is a mathematical operation that takes two
inputs, an image matrix and a kernel or filter. The result is calculated as follows:

For an image matrix and a filter:

The image matrix has dimensions h x w x d

The filter has dimensions fh x fw x d

The output has dimensions (h - fh + 1) x (w - fw + 1) x 1

Now, let us take an example: a 5x5 image matrix whose pixel values are 0 or 1,
and a 3x3 filter matrix. The filter is placed over each 3x3 patch of the image;
the overlapping values are multiplied element-wise and summed to give one
output value. Sliding the filter over every position of the 5x5 image produces
a 3x3 output matrix, often called the feature map.

Convolving the image with different filter values can produce effects such as
blurring or sharpening. For an m x m image and an n x n filter, the size of the
output image is:

(m - n + 1) x (m - n + 1)
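
A minimal sketch of this sliding-window computation in plain Python follows. The pixel and filter values below are illustrative assumptions, since the figures with the original matrices are not available here. As in most CNN libraries, the kernel is applied without flipping (strictly, a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Each output value is the sum of the element-wise products between the
    kernel and the image patch it currently covers.
    """
    h, w = len(image), len(image[0])
    fh, fw = len(kernel), len(kernel[0])
    output = []
    for i in range(h - fh + 1):
        row = []
        for j in range(w - fw + 1):
            total = 0
            for a in range(fh):
                for b in range(fw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 binary image and 3x3 filter (values are assumptions).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The result is 3x3, matching (5 - 3 + 1) x (5 - 3 + 1).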

Strides
The stride is the number of pixels by which the filter shifts over the input
matrix. When the stride is 1, we move the filter 1 pixel at a time. Similarly,
when the stride is 2, we move the filter 2 pixels at a time, and so on. Strides
are essential because they control how the filter is convolved against the
input, i.e., they determine which positions are visited and which are skipped
while scanning the image. They denote the number of steps we move in each
convolution.

[Figure: the filter at its starting position, followed by two successive
positions with stride = 2.]

For an n x n input, an f x f filter, padding p and stride s, the size of the
output image is:

[floor((n + 2p - f)/s) + 1] x [floor((n + 2p - f)/s) + 1]
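
The effect of stride on the output size can be checked with a small helper. This is a sketch that assumes the widely used relation floor((n + 2p - f)/s) + 1 along each dimension; the function name is illustrative.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one dimension for input size n, filter size f,
    padding p and stride s, using floor((n + 2p - f) / s) + 1."""
    if s < 1:
        raise ValueError("stride must be at least 1")
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * p - f) // s + 1


print(conv_output_size(5, 3))            # 3: stride 1, no padding
print(conv_output_size(7, 3, s=2))       # 3: stride 2 skips every other position
print(conv_output_size(5, 3, p=1, s=2))  # 3: padding 1, stride 2
```

A stride of 0 is rejected because the filter would never move.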

Pooling Technique in CNN

• In convolutional neural networks (CNNs), the pooling layer is a
common type of layer that is typically added after convolutional
layers. The pooling layer is used to reduce the spatial dimensions (i.e.,
the width and height) of the feature maps while preserving the depth
(i.e., the number of channels).
• Padding also plays a vital role in building a CNN. After each
convolution operation, the image becomes smaller than the original. An
image classification network has multiple convolution layers, so the
image shrinks at every step, which we do not want.
• Secondly, when the kernel moves over the original image, it passes
over the pixels in the middle more times than the pixels at the edges, so
the information at the edges is under-represented.
• To overcome these problems, the concept of padding was introduced.
Padding adds extra border pixels (usually zeros) around the image so that
the output of the convolution can keep the size of the original picture.
So, if an n x n matrix is convolved with an f x f matrix with padding p, then
the size of the output image will be:

(n + 2p - f + 1) x (n + 2p - f + 1)
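
The sketch below illustrates zero padding and the amount of padding needed to keep the output the same size as the input. It assumes stride 1 and an odd filter size, in which case p = (f - 1) / 2 makes n + 2p - f + 1 equal to n. The function names are illustrative.

```python
def zero_pad(image, p):
    """Return a copy of a 2D list with p rows and columns of zeros added
    on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


def same_padding(f):
    """Padding that preserves the input size at stride 1 (odd f assumed)."""
    if f % 2 == 0:
        raise ValueError("this sketch only handles odd filter sizes")
    return (f - 1) // 2


small = [[1, 2],
         [3, 4]]
for row in zero_pad(small, 1):
    print(row)

n, f = 5, 3
p = same_padding(f)
print(n + 2 * p - f + 1)  # 5, the same as the input size
```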

Pooling
• The pooling layer is another building block of a CNN. It reduces the
number of parameters when the image is too large.
• When the feature map is shrunk, its resolution is reduced; the pooling
layer outputs a downscaled version of the feature map produced by the
previous layers.
• Its function is to progressively reduce the spatial size of the
representation in order to reduce the network complexity and computational
cost. Spatial pooling is also known as downsampling or subsampling; it
reduces the dimensionality of each map but retains the essential features.
• A rectified linear activation function, or ReLU, is applied to each value in
the feature map before pooling. ReLU is a simple and effective nonlinearity:
it sets negative values to zero and leaves positive values unchanged.
• Pooling is added after the nonlinearity is applied to the feature maps. The
two most common types of spatial pooling are:

1. Max Pooling

Max pooling takes the maximum value of each region, which carries forward the
most prominent features of the image. It is a sample-based discretization
process. Its primary objective is to downscale an input by reducing its
dimensionality, which allows assumptions to be made about the features
contained in the pooled sub-regions.
2. Average Pooling

Unlike max pooling, average pooling also retains information about the less
prominent features. It downscales by dividing the input matrix into rectangular
regions and calculating the average value of each region.
OR

The pooling operation involves sliding a two-dimensional filter over each
channel of the feature map and summarising the features lying within the region
covered by the filter.
For a feature map with dimensions nh x nw x nc, the dimensions of the output
obtained after a pooling layer are
(floor((nh - f) / s) + 1) x (floor((nw - f) / s) + 1) x nc
where,

-> nh - height of feature map
-> nw - width of feature map
-> nc - number of channels in the feature map
-> f - size of filter
-> s - stride length
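
Max and average pooling over a single channel can be sketched as follows. The feature-map values are made up for illustration, the function name is hypothetical, and the output size is taken as floor((n - f) / s) + 1 along each dimension.

```python
def pool2d(feature_map, f=2, s=2, mode="max"):
    """Pool one channel with an f x f window and stride s.

    mode="max" keeps the largest value in each window;
    mode="avg" keeps the mean of each window.
    """
    if mode == "max":
        reduce = max
    elif mode == "avg":
        reduce = lambda values: sum(values) / len(values)
    else:
        raise ValueError("mode must be 'max' or 'avg'")

    n_h, n_w = len(feature_map), len(feature_map[0])
    out_h = (n_h - f) // s + 1
    out_w = (n_w - f) // s + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i * s + a][j * s + b]
                      for a in range(f) for b in range(f)]
            row.append(reduce(window))
        output.append(row)
    return output


# Hypothetical 4x4 feature map (values are assumptions).
fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
print(pool2d(fmap, mode="max"))  # [[6, 4], [8, 9]]
print(pool2d(fmap, mode="avg"))  # [[3.75, 2.25], [4.5, 4.0]]
```

For a multi-channel volume, the same function would be applied to each channel independently, which is why the number of channels nc is unchanged.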

LeNet-5 Architecture
The network has 5 layers with learnable parameters, hence the name LeNet-5.
It has three convolution layers combined with two average pooling layers.
After the convolution and average pooling layers, we have two fully connected
layers, the last of which is a softmax classifier that assigns each image to its
class.

The input to this model is a 32X32 grayscale image, hence the number of
channels is one.

We then apply the first convolution operation with 6 filters of size 5X5. As a
result, we get a feature map of size 28X28X6. Here the number of channels is
equal to the number of filters applied.

After the first convolution operation, we apply average pooling and the size of
the feature map is reduced by half, to 14X14X6. Note that the number of
channels is intact.

Next, we have a convolution layer with sixteen filters of size 5X5. The feature
map changes again, to 10X10X16. The output size is calculated in a similar
manner. After this, we again apply an average pooling or subsampling layer,
which again reduces the size of the feature map by half, i.e. to 5X5X16.

Then we have a final convolution layer with 120 filters of size 5X5, leaving a
feature map of size 1X1X120. Flattening it gives 120 values.

After these convolution layers, we have a fully connected layer with eighty-four
neurons. At last, we have an output layer with ten neurons, since the data have
ten classes.

Here is the final architecture of the LeNet-5 model.

Fourth layer

Subsampling takes place in this layer, and the feature map size is reduced to 5x5x16. In this
layer, the input for the last feature map comes from all of the preceding feature maps.

Architecture Details
Let’s understand the architecture in more detail.

The first layer is the input layer with feature map size 32X32X1.

Then we have the first convolution layer with 6 filters of size 5X5 and stride 1. The
activation function used at this layer is tanh. The output feature map is 28X28X6.

Next, we have an average pooling layer with filter size 2X2 and stride 2. The resulting feature
map is 14X14X6, since the pooling layer doesn't affect the number of channels.

After this comes the second convolution layer with 16 filters of size 5X5 and stride 1. The
activation function is again tanh. Now the output size is 10X10X16.

Next comes the other average pooling layer of 2X2 with stride 2. As a result, the size of the
feature map is reduced to 5X5X16.

The final convolution layer has 120 filters of size 5X5 with stride 1 and activation function
tanh. Now the output size is 120.

The next is a fully connected layer with 84 neurons, which produces 84 output values; the
activation function used here is again tanh.

The last layer is the output layer with 10 neurons and the softmax function. Softmax gives
the probability that a data point belongs to each class. The class with the highest
probability is then predicted.

This is the entire architecture of the LeNet-5 model. The number of trainable parameters of
this architecture is around sixty thousand.
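
The feature-map sizes and the parameter count can be reproduced with a short sketch. It assumes that every convolution filter is connected to all input channels, that each filter and neuron has one bias, and that the pooling layers have no trainable parameters; these are simplifying assumptions of this sketch. The layer labels and helper names are illustrative.

```python
def conv(shape, filters, f=5, s=1):
    """Return (output shape, parameter count) for an unpadded convolution."""
    h, w, d = shape
    out = ((h - f) // s + 1, (w - f) // s + 1, filters)
    params = (f * f * d + 1) * filters  # weights plus one bias per filter
    return out, params


def avg_pool(shape, f=2, s=2):
    """Average pooling: halves width and height, no trainable parameters here."""
    h, w, d = shape
    return ((h - f) // s + 1, (w - f) // s + 1, d), 0


def dense(n_in, n_out):
    """Fully connected layer: one weight per input plus one bias per neuron."""
    return (n_out,), (n_in + 1) * n_out


def lenet5_summary():
    rows = []
    shape = (32, 32, 1)
    shape, p = conv(shape, 6)
    rows.append(("conv1", shape, p))
    shape, p = avg_pool(shape)
    rows.append(("pool1", shape, p))
    shape, p = conv(shape, 16)
    rows.append(("conv2", shape, p))
    shape, p = avg_pool(shape)
    rows.append(("pool2", shape, p))
    shape, p = conv(shape, 120)
    rows.append(("conv3", shape, p))
    shape, p = dense(120, 84)
    rows.append(("fc", shape, p))
    shape, p = dense(84, 10)
    rows.append(("output", shape, p))
    return rows


summary = lenet5_summary()
for name, shape, params in summary:
    print(name, shape, params)
total_params = sum(params for _, _, params in summary)
print("total parameters:", total_params)  # 61706
```

Under these assumptions the total is 61,706, which agrees with the figure of around sixty thousand.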

AlexNet
The structure of AlexNet is similar to that of LeNet-5, but it is much larger
and deeper. It was the first convolutional neural network to stack
convolutional layers directly on top of each other rather than placing a
pooling layer after each convolutional layer.

Before going on to the architecture of AlexNet, we will introduce some
terms that will be useful in understanding its structure.

Stride

Stride denotes how far the filter moves over the input in each step along one
direction. In other words, if the value of stride is 1, then we move the filter
1 pixel each time.

Let us understand stride using an example.

Consider an input layer of size 5×5. Padding of 1 is applied to the layer (the
border of zeros surrounding the layer is the padding). A filter of size 3×3 is
applied to the layer. Now, let S be the stride; the dimension of the next layer
after applying the filter will be (W - F + 2×P)/S + 1. Here W is the layer's
width, F is the size of the filter, P is the size of the padding, and S is the
stride.

If the value of S is 2, then the resulting layer will be of size
(5 - 3 + 2×1)/2 + 1 = 3, i.e., 3×3.

Kernels and filters

A 2D matrix consisting of weights is called a kernel. A filter can be described
as multiple kernels stacked together. In other words, a filter is a 3D structure
made of one kernel per input channel.
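
The distinction can be made concrete with a sketch in which a filter is a list of 2D kernels, one per input channel. Each kernel is slid over its own channel and the results are summed into a single 2D feature map. The values and the function name are illustrative assumptions; stride 1 and no padding are assumed.

```python
def apply_filter(volume, filt):
    """Apply one filter (a stack of kernels) to a multi-channel input.

    volume: list of channels, each an h x w list of lists.
    filt:   list of kernels, each fh x fw, one kernel per channel.
    Returns a single 2D feature map.
    """
    if len(volume) != len(filt):
        raise ValueError("the filter needs one kernel per input channel")
    h, w = len(volume[0]), len(volume[0][0])
    fh, fw = len(filt[0]), len(filt[0][0])
    output = [[0] * (w - fw + 1) for _ in range(h - fh + 1)]
    for channel, kernel in zip(volume, filt):
        for i in range(h - fh + 1):
            for j in range(w - fw + 1):
                for a in range(fh):
                    for b in range(fw):
                        output[i][j] += channel[i + a][j + b] * kernel[a][b]
    return output


# Hypothetical 2-channel 3x3 input and a filter made of two 2x2 kernels.
volume = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
filt = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
print(apply_filter(volume, filt))  # [[10, 12], [16, 18]]
```

A layer with several filters would repeat this once per filter, producing one output channel each.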

Dropout regularization

Dropout is a regularization mechanism used to improve the training of neural
networks by randomly omitting hidden units, which reduces overfitting. A
neuron that is dropped does not contribute to the forward pass or to
backpropagation for that training step.
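
A minimal sketch of dropout on a list of activations is given below. It uses the "inverted dropout" variant, in which surviving activations are scaled by 1 / (1 - p) during training so that nothing needs to change at test time; that scaling is an assumption of this sketch and is not described in these notes. The function and parameter names are illustrative.

```python
import random


def dropout(activations, p=0.5, training=True, rng=None):
    """Randomly zero each activation with probability p during training.

    Kept activations are scaled by 1 / (1 - p) (inverted dropout, assumed).
    At test time (training=False) the input is returned unchanged.
    """
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


hidden = [1.0, 2.0, 3.0, 4.0]
print(dropout(hidden, p=0.5, rng=random.Random(0)))  # some units are zeroed
print(dropout(hidden, p=0.5, training=False))        # unchanged at test time
```

A different random subset of units is dropped on each call, so the network cannot rely on any single hidden unit.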

Max Pooling

Max pooling is an operation where the maximum value is calculated for the
patches of the feature map. This method is used to make a downsampled feature
map. It is generally used after the convolutional layer.

The architecture of AlexNet

AlexNet consists of a total of 8 layers with learnable parameters (five
convolutional layers and three fully connected layers), not counting the input
and pooling layers. Let us discuss the input and output layers of AlexNet
briefly.

Input Layer

The input layer of AlexNet accepts an image of size 227×227×3. Here 227×227
is the height and width of the input image, and the factor of 3 is for the
RGB channels of the image.

Output Layer
The output layer consists of 1000 fully connected neurons. The size of the
output layer is 1000×1×1. The size of this layer is 1000 because the ImageNet
classification task has 1000 classes.

Implementation of AlexNet
Let us see the diagram for the AlexNet and then we will implement it
accordingly.
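
Since the diagram is not reproduced here, the following sketch traces the feature-map sizes through AlexNet instead of implementing the full network. The filter counts, filter sizes, strides and padding below are the commonly cited AlexNet configuration and are assumptions with respect to these notes, which only state the 227×227×3 input and the 1000-way output.

```python
# (kind, filters, filter size, stride, padding); values are the commonly
# cited AlexNet settings and are assumptions, not taken from these notes.
ALEXNET_LAYERS = [
    ("conv", 96, 11, 4, 0),
    ("pool", None, 3, 2, 0),
    ("conv", 256, 5, 1, 2),
    ("pool", None, 3, 2, 0),
    ("conv", 384, 3, 1, 1),
    ("conv", 384, 3, 1, 1),
    ("conv", 256, 3, 1, 1),
    ("pool", None, 3, 2, 0),
]
FC_UNITS = (4096, 4096, 1000)


def out_size(n, f, s, p):
    """Output length along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def alexnet_shapes(input_shape=(227, 227, 3)):
    h, w, d = input_shape
    shapes = [(h, w, d)]
    for kind, filters, f, s, p in ALEXNET_LAYERS:
        h, w = out_size(h, f, s, p), out_size(w, f, s, p)
        if kind == "conv":
            d = filters  # pooling keeps the depth unchanged
        shapes.append((h, w, d))
    flattened = h * w * d
    for units in FC_UNITS:
        shapes.append((units,))
    return shapes, flattened


shapes, flattened = alexnet_shapes()
for shape in shapes:
    print(shape)
print("flattened size:", flattened)  # 9216
```

The trace contains five convolutional and three fully connected layers, which are the 8 layers with learnable parameters, and it ends in the 1000-unit output layer.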

OR
Working of Alex-Net
