Module 3 Case Study
The input to this model (the classic LeNet-5 architecture) is a 32 X 32 grayscale image, hence the number of input channels is one. We then apply the first convolution operation with a filter size of 5X5, and we have 6 such filters. With stride 1 and no padding, the output height and width are
H1 = 32 - 5 + 1 = 28
W1 = 32 - 5 + 1 = 28
so we get a feature map of size 28X28X6. Here the number of channels is equal to the number of filters applied.
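In general, the output size of a convolution or pooling layer with input size W, filter size F, padding P, and stride S is given by the standard formula (the same arithmetic is used throughout this case study):
output size = (W - F + 2 x P) / S + 1 (rounded down)
For the first convolution here, that is (32 - 5 + 2 x 0) / 1 + 1 = 28.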
After the first convolution operation, we apply average pooling with a 2X2 filter and stride 2, and the spatial size of the feature map is reduced by half, to 14X14X6. Note that the number of channels stays intact.
Next, we have a convolution layer with sixteen filters of size 5X5. Again the feature map changes; it is now 10X10X16, and the output size is calculated in the same manner (14 - 5 + 1 = 10). After this, we again apply an average pooling or subsampling layer, which again reduces the spatial size of the feature map by half, i.e. to 5X5X16, since (10 - 2)/2 + 1 = 5.
Then we have a final convolution layer with 120 filters of size 5X5. As shown in the architecture diagram above, the resulting feature map is of size 1X1X120, since 5 - 5 + 1 = 1. Flattening it gives 120 values.
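To make this size bookkeeping concrete, here is a minimal Python sketch that walks the feature-map sizes through the layers just described. It uses only plain arithmetic; the helper name out_size is introduced here purely for illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    # Standard output-size formula: (W - F + 2P) // S + 1
    return (size - kernel + 2 * padding) // stride + 1

s = 32                         # 32 X 32 grayscale input
s = out_size(s, 5)             # conv, 6 filters of 5X5, stride 1
print(s)                       # 28  -> 28X28X6
s = out_size(s, 2, stride=2)   # average pooling 2X2, stride 2
print(s)                       # 14  -> 14X14X6
s = out_size(s, 5)             # conv, 16 filters of 5X5, stride 1
print(s)                       # 10  -> 10X10X16
s = out_size(s, 2, stride=2)   # average pooling 2X2, stride 2
print(s)                       # 5   -> 5X5X16
s = out_size(s, 5)             # conv, 120 filters of 5X5, stride 1
print(s)                       # 1   -> 1X1X120, flattened to 120 values
```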
Architecture Details
The first layer is the input layer with feature map size 32X32X1.
Then we have the first convolution layer with 6 filters of size 5X5 and stride 1. The activation function used at this layer is tanh. The output feature map is 28X28X6.
Next, we have an average pooling layer with filter size 2X2 and stride 2. The resulting feature map is 14X14X6; the pooling layer doesn't affect the number of channels.
After this comes the second convolution layer with 16 filters of size 5X5 and stride 1. The activation function is again tanh. Now the output size is 10X10X16.
Then comes another average pooling layer of 2X2 with stride 2. As a result, the size of the feature map is reduced to 5X5X16.
The final convolution layer has 120 filters of size 5X5 with stride 1 and activation function tanh, producing a 1X1X120 feature map. Flattened, the output size is 120.
Next is a fully connected layer with 84 neurons, which results in an output of 84 values; the activation function used here is again tanh.
The last layer is the output layer with 10 neurons and a Softmax activation function. The Softmax gives the probability that a data point belongs to a particular class, and the class with the highest probability is then predicted.
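Putting these details together, the architecture described above (the classic LeNet-5 layout) can be written as a short Keras sketch. This is only an illustration assuming the tf.keras Sequential API; the layer parameters follow the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                       # 32X32X1 grayscale input
    layers.Conv2D(6, kernel_size=5, activation='tanh'),    # -> 28X28X6
    layers.AveragePooling2D(pool_size=2, strides=2),       # -> 14X14X6
    layers.Conv2D(16, kernel_size=5, activation='tanh'),   # -> 10X10X16
    layers.AveragePooling2D(pool_size=2, strides=2),       # -> 5X5X16
    layers.Conv2D(120, kernel_size=5, activation='tanh'),  # -> 1X1X120
    layers.Flatten(),                                      # -> 120 values
    layers.Dense(84, activation='tanh'),                   # fully connected, 84 neurons
    layers.Dense(10, activation='softmax'),                # output layer, 10 classes
])
model.summary()
```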
AlexNet has eight layers with learnable parameters. The model consists of five convolution layers combined with max pooling, followed by 3 fully connected layers, and relu activation is used in each of these layers except the output layer.
They found that using relu as the activation function accelerated the training process by almost six times. They also used dropout layers to prevent the model from overfitting. Further, the model is trained on the ImageNet dataset, which has almost 14 million images across a thousand classes.
In case you are unaware of how to calculate the output size of a convolution layer, recall the formula given earlier: output size = (input size - filter size + 2 x padding) / stride + 1.
Also, the number of filters becomes the number of channels in the output feature map. Next, we have the first max-pooling layer, of size 3X3 and stride 2. It is applied to the 55X55X96 output of the first convolution (in the standard AlexNet, 96 filters of size 11X11 with stride 4), and the resulting feature map is of size 27X27X96, since (55 - 3)/2 + 1 = 27.
After this, we apply the second convolution operation. This time the filter size
is reduced to 5X5 and we have 256 such filters. The stride is 1 and padding 2.
The activation function used is again relu. Now the output size we get is
27X27X256.
Again we apply a max-pooling layer of size 3X3 with stride 2. The resulting feature map is of shape 13X13X256.
Now we apply the third convolution operation with 384 filters of size 3X3, stride 1 and padding 1. Again the activation function used is relu. The output feature map is of shape 13X13X384.
Then we have the fourth convolution operation with 384 filters of size 3X3. Both the stride and the padding are 1, and the activation function used is relu. The output size remains unchanged, i.e. 13X13X384.
After this, we have the final convolution layer with 256 filters of size 3X3. The stride and padding are set to one, and the activation function is relu. The resulting feature map is of shape 13X13X256.
If you look at the architecture so far, the number of filters increases as we go deeper, so the network extracts more features as we move deeper into the architecture. The filter size also decreases: the initial filter is the largest, and the filters get smaller in the later layers, while the spatial size of the feature maps shrinks as well.
Next, we apply the third max-pooling layer of size 3X3 and stride 2, resulting in a feature map of shape 6X6X256.
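As a quick check, the feature-map sizes quoted above can be verified with the same output-size formula in a few lines of Python (the helper name out_size is again just for illustration):

```python
def out_size(size, kernel, stride=1, padding=0):
    return (size - kernel + 2 * padding) // stride + 1

print(out_size(27, 5, padding=2))   # 27 -> second conv (5X5, padding 2) keeps 27X27
print(out_size(27, 3, stride=2))    # 13 -> 3X3 max pool, stride 2
print(out_size(13, 3, padding=1))   # 13 -> 3X3 convs with padding 1 keep 13X13
print(out_size(13, 3, stride=2))    # 6  -> final 3X3 max pool, stride 2
```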
Then we have the first fully connected layer with a relu activation function. The
size of the output is 4096. Next comes another dropout layer with the dropout
rate fixed at 0.5.
This is followed by a second fully connected layer with 4096 neurons and relu activation.
Finally, we have the last fully connected layer or output layer with 1000
neurons as we have 1000 classes in the data set. The activation function used
at this layer is Softmax.
This is the architecture of the Alexnet model. It has a total of 62.3 million
learnable parameters.
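As with the previous model, the whole AlexNet architecture can be sketched with the tf.keras Sequential API. This is only an illustrative sketch: the 227X227X3 input size and the first convolution (96 filters of 11X11, stride 4) are the standard AlexNet choices, consistent with the 27X27X96 feature map after the first pooling layer described above, and padding='same' stands in for the explicit padding of 2 and 1 pixels.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(227, 227, 3)),            # assumed standard AlexNet input size
    # Conv 1 (standard AlexNet): 96 filters of 11X11, stride 4 -> 55X55X96
    layers.Conv2D(96, kernel_size=11, strides=4, activation='relu'),
    layers.MaxPooling2D(pool_size=3, strides=2),  # -> 27X27X96
    # Conv 2: 256 filters of 5X5, stride 1, padding 2 -> 27X27X256
    layers.Conv2D(256, kernel_size=5, padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=3, strides=2),  # -> 13X13X256
    # Conv 3-5: 3X3 filters, stride 1, padding 1
    layers.Conv2D(384, kernel_size=3, padding='same', activation='relu'),  # -> 13X13X384
    layers.Conv2D(384, kernel_size=3, padding='same', activation='relu'),  # -> 13X13X384
    layers.Conv2D(256, kernel_size=3, padding='same', activation='relu'),  # -> 13X13X256
    layers.MaxPooling2D(pool_size=3, strides=2),  # -> 6X6X256
    layers.Flatten(),
    layers.Dense(4096, activation='relu'),        # first fully connected layer
    layers.Dropout(0.5),                          # dropout rate 0.5
    layers.Dense(4096, activation='relu'),        # second fully connected layer
    layers.Dropout(0.5),                          # dropout after FC2, as in standard AlexNet
    layers.Dense(1000, activation='softmax'),     # output layer, 1000 classes
])
model.summary()   # roughly 62.3 million learnable parameters
```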
The input to the conv1 layer of VGG16 is a fixed-size 224 x 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters have a very small receptive field: 3×3. The convolution stride is fixed to 1 pixel, and the padding is 1 pixel for the 3×3 conv. layers.
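With these settings, each 3×3 convolution preserves the spatial size of its input; for example, for the 224 x 224 input: (224 - 3 + 2 x 1) / 1 + 1 = 224.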
Start by initializing the model, specifying that it is a Sequential model. After initializing the model, add the convolution and max-pooling layers (a Keras sketch is given at the end of this section).
Add relu (Rectified Linear Unit) activation to each of these layers so that the negative values are not passed to the next layer.
After creating the entire convolutional part, pass the data to the dense layers: flatten the vector that comes out of the convolutions and add the fully connected layers.
The softmax layer will output values between 0 and 1 based on the confidence of the model about which class the image belongs to.
For example, the last max-pooling layer reduces a 14x14x512 feature map to 7x7x512, since (14-2)/2+1 = 7. Together these convolutional and fully connected layers make up the 16 layers of VGG16.
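Putting the steps above together, a minimal VGG16 sketch in the tf.keras Sequential API could look as follows. The per-block filter counts (64, 128, 256, 512, 512) and the 4096/4096/1000 sizes of the dense layers are the standard VGG16 configuration and are assumed here, since only the 3×3 filters, stride 1, 1-pixel padding and the 14x14x512 -> 7x7x512 pooling step are stated above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Input(shape=(224, 224, 3)))   # fixed-size 224 x 224 RGB input

# Five convolutional blocks: 3X3 filters, stride 1, 1-pixel padding ('same'),
# relu activation, each block ending with a 2X2 max pool of stride 2.
# Filter counts per block follow the standard VGG16 configuration (assumed).
for n_filters, n_convs in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
    for _ in range(n_convs):
        model.add(layers.Conv2D(n_filters, kernel_size=3, strides=1,
                                padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=2, strides=2))
# After the last block: 14x14x512 -> (14-2)/2+1 = 7 -> 7x7x512

# Flatten the convolutional output and add the dense layers.
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
# Softmax output, one value between 0 and 1 per class (assumed 1000 ImageNet classes).
model.add(layers.Dense(1000, activation='softmax'))

model.summary()
```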