CNN Notes
An ANN can make good predictions on a simple image, but in the case of complex images an ANN will fail to give good predictions; this is where CNNs come in.
1. Input Layer: The input layer in a CNN holds the image data. Image data is represented by a three-dimensional matrix (height × width × channels). To feed such an image to a plain ANN, we have to reshape it into a single column.
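A minimal NumPy sketch of this representation (the 28×28 RGB size is just an illustrative assumption):

```python
import numpy as np

# Hypothetical 28x28 RGB image: a 3-D matrix of height x width x channels.
img = np.random.rand(28, 28, 3)

# Reshaping into a single column, as described above for ANN-style input.
column = img.reshape(-1, 1)
print(img.shape, column.shape)   # (28, 28, 3) (2352, 1)
```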
2. ReLU Layer: An image is usually highly non-linear, with sharply varying pixel values, while convolution itself is a linear operation. The ReLU activation function is applied here to introduce non-linearity into the network, which makes it possible for the algorithm to model such data.
This layer therefore helps in the detection of features: it converts negative pixel values to zero while passing positive values through, which also allows the network to detect variations of features.
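A minimal sketch of ReLU itself, assuming NumPy and a made-up 2×2 feature map:

```python
import numpy as np

def relu(x):
    # Negative responses become zero; positive values pass through unchanged.
    return np.maximum(0, x)

feature_map = np.array([[-1.0, 2.0],
                        [ 3.0, -4.0]])
print(relu(feature_map))   # [[0. 2.] [3. 0.]]
```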
3. Pooling Layer: A CNN uses pooling layers to reduce the spatial size of the feature maps, which speeds up the computation of the network.
Max pooling also adds a small degree of translation invariance: even if the picture is slightly shifted or tilted, the largest value in a given region of the feature map is still recorded, so the feature is preserved. As another benefit, reducing the size so significantly requires less computational power, and pooling helps extract the dominant features.
The main pooling settings are:
- Filter size
- Stride
- Max or average pooling

4. Flattening: The process of converting all the resulting 2-D arrays into a single vector is called flattening (sketched below, together with pooling). The flattened output is fed as input to a fully connected neural network with a varying number of hidden layers, which learns the non-linear combinations present in the feature representation.
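A toy sketch of 2×2 max pooling with stride 2, followed by flattening; the feature-map values and the helper name max_pool are made up for illustration:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Slide a size x size window over the map and keep the largest value.
    h, w = feature_map.shape
    out = np.zeros(((h - size) // stride + 1, (w - size) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 8, 1],
               [3, 4, 2, 9]], dtype=float)
pooled = max_pool(fm)           # [[6. 4.] [7. 9.]]
flattened = pooled.reshape(-1)  # flattening: [6. 4. 7. 9.]
print(pooled, flattened)
```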
5. Fully Connected Layer: The aim of the fully connected layer is to use the high-level features produced by the convolutional and pooling layers to classify the input image into the various classes of the training dataset.
Fully connected means that every neuron in the previous layer is connected to every neuron in the next layer. The output layer uses a softmax activation function, so the output probabilities of the fully connected network sum to 1.
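A minimal sketch of how softmax turns raw scores into probabilities that sum to 1 (the scores are made up):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw outputs of the last dense layer
probs = softmax(scores)
print(probs, probs.sum())            # class probabilities summing to 1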
Working
A CNN trains like an ANN: each synapse is assigned a random weight, the inputs are multiplied by these weights and passed through an activation function (the forward pass). The output is then compared with the true values, and the resulting error is back-propagated, i.e. the weights are re-calculated and the whole process is repeated. This continues until the error (cost function) is minimized.
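A toy sketch of this loop for a single sigmoid layer on made-up data; real CNN training back-propagates through all layers, but the forward / compare / update cycle is the same idea:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 4))            # toy inputs
y = rng.random((8, 1))            # toy true values
W = rng.standard_normal((4, 1))   # random initial weights ("synapses")

lr = 0.1
for step in range(100):
    pred = 1 / (1 + np.exp(-X @ W))         # forward pass through a sigmoid
    err = pred - y                          # compare with the true values
    grad = X.T @ (err * pred * (1 - pred))  # back-propagated error signal
    W -= lr * grad                          # re-calculate the weights

print(float((err ** 2).mean()))             # cost shrinks over the iterations
```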
Feature learning is the part where the algorithm learns about the dataset: the Convolution, ReLU, and Pooling components handle it, with numerous iterations between them. Once the features are known, classification happens using the Flattening and Fully Connected components, as in the end-to-end sketch below.
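A sketch of how these components might be stacked end to end, using the Keras API; the layer sizes, 28×28 grayscale input, and 10 classes are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical architecture for 28x28 grayscale images and 10 classes.
model = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu",
                  input_shape=(28, 28, 1)),  # convolution + ReLU (feature learning)
    layers.MaxPooling2D(pool_size=2),        # pooling
    layers.Flatten(),                        # flattening
    layers.Dense(64, activation="relu"),     # fully connected hidden layer
    layers.Dense(10, activation="softmax"),  # class probabilities summing to 1
])
model.summary()
```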
There are three things we can visualize in a trained convnet:
- Feature maps
- Convnet filters
- Class output
We will visualize the feature maps to see how the input is transformed as it passes through the convolution layers. Feature maps are also called intermediate activations, since the output of a layer is called its activation.
For example, one of the feature maps from the output of the very first layer (block1_conv1) looks as follows. Bright areas are the "activated" regions, meaning the filter detected the pattern it was looking for; this particular filter seems to encode an eye-and-nose detector.
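A sketch of how such a feature map could be extracted, assuming a pre-trained VGG16 (where the block1_conv1 layer name comes from); the random input is a placeholder for a real preprocessed image:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

# Assumption: pre-trained VGG16, whose first conv layer is named block1_conv1.
vgg = keras.applications.VGG16(weights="imagenet", include_top=False,
                               input_shape=(224, 224, 3))
activation_model = keras.Model(inputs=vgg.input,
                               outputs=vgg.get_layer("block1_conv1").output)

img = np.random.rand(1, 224, 224, 3).astype("float32")  # stand-in for a real image
activations = activation_model.predict(img)             # shape (1, 224, 224, 64)

plt.matshow(activations[0, :, :, 0], cmap="viridis")    # show the first feature map
plt.show()
```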