Explaining How ResNet-50 Works and Why It Is So Popular
Introduction
Why ResNet?
Did the jargon confuse you? Let's use the analogy of a needle and a
haystack.
Let the needle be the perfect set of weights for the neural network or,
as explained before, a function. Let the haystack be the space of all
possible functions.
One starts from a single search area and tries to zero in on the needle
from there. Adding layers is equivalent to moving your search area and
making it bigger. But that comes with the risk of moving away from the
place where the needle actually is, as well as making the search more
time-consuming and difficult. The larger the haystack, the harder it is
to find the perfect needle. What is the solution, then?
On the other end of the spectrum, there are cases where gradients
reach magnitudes on the order of 10⁴ or more. As these large gradients
are multiplied together during backpropagation, the values shoot off
towards infinity. Allowing such a large range of values in the numerical
domain of the weights makes convergence difficult to achieve.
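To see how quickly this compounds, here is a tiny illustrative sketch (the per-layer gradient factor of 1.5 is a made-up number, chosen only to show the effect):

# Illustrative only: suppose every layer scales the gradient by ~1.5.
# Backpropagating through 50 such layers multiplies those factors together.
factor = 1.5
grad = 1.0
for _ in range(50):
    grad *= factor
print(grad)  # ~6.4e8 -- the gradient explodes; with factor 0.5 it would vanish instead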
Architecture
[Figure: The ResNet-50 architecture]
1. Input Pre-processing
2. Cfg[0] blocks
3. Cfg[1] blocks
4. Cfg[2] blocks
5. Cfg[3] blocks
6. Fully-connected layer
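Across all of these stages, the unifying idea is the residual (skip) connection: each block learns a residual function F(x) and adds its input back, so the output is y = F(x) + x. Below is a minimal Keras sketch of that pattern, simplified to plain 3×3 convolutions rather than the exact ResNet-50 bottleneck:

from tensorflow import keras

def residual_block(x, filters):
    # Residual branch F(x): two convolutions with batch normalization
    fx = keras.layers.Conv2D(filters, 3, padding="same")(x)
    fx = keras.layers.BatchNormalization()(fx)
    fx = keras.layers.ReLU()(fx)
    fx = keras.layers.Conv2D(filters, 3, padding="same")(fx)
    fx = keras.layers.BatchNormalization()(fx)
    # Skip connection: y = F(x) + x (assumes x already has `filters` channels)
    out = keras.layers.Add()([fx, x])
    return keras.layers.ReLU()(out)

Because the identity path is always available, a block can fall back to learning F(x) ≈ 0, so adding more blocks never has to move the search away from functions the shallower network could already represent. That is the answer to the needle-and-haystack problem above.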
The best way to understand the concept is through some code. The
implementation below is written in Keras and uses the standard
ResNet-50 architecture (ResNet has several versions, differing in the
depth of the network). We will train the model on the well-known
Stanford Dogs dataset from Stanford AI.
Import headers
!pip install -q tensorflow_datasets

# Core framework and the TFDS dataset catalogue
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds

# Filesystem and image-handling utilities
import os
import pathlib
import PIL
import PIL.Image

# Silence non-critical warnings to keep the notebook output readable
import warnings
warnings.filterwarnings("ignore")

from datetime import datetime
Along with the images and labels, we also get some metadata that tells
us more about the dataset. It is stored in ds_info and printed in a
human-readable form.
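The loading step itself is not shown above; a minimal sketch using tensorflow_datasets might look like the following (the split names come from the stanford_dogs entry in the TFDS catalogue, and carving a validation set out of the training split is an assumption for this walkthrough):

# Load Stanford Dogs plus its metadata; as_supervised yields (image, label) pairs
(train_ds, valid_ds, test_ds), ds_info = tfds.load(
    "stanford_dogs",
    split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True,
    with_info=True,
)
print(ds_info)  # human-readable summary: features, class count, split sizes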
Augmentation
# Random flips, rotations, and contrast changes expose the model to
# variations of each image, reducing overfitting
imageAug = keras.Sequential([
    keras.layers.RandomFlip("horizontal_and_vertical"),
    keras.layers.RandomRotation(0.2),
    keras.layers.RandomContrast(0.2),
])
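One way to wire this into the input pipeline, applying augmentation to the training split only, is sketched below; the image size and batch size here are assumptions (224×224 is the standard ResNet-50 input resolution):

IMG_SIZE = 224  # standard ResNet-50 input resolution
BATCH = 32      # assumed batch size

def preprocess(image, label):
    # Resize to a fixed shape and scale pixel values to [0, 1]
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return tf.cast(image, tf.float32) / 255.0, label

train_ds = (train_ds.map(preprocess)
            .batch(BATCH)
            # Augment only the training data, one batch at a time
            .map(lambda x, y: (imageAug(x, training=True), y))
            .prefetch(tf.data.AUTOTUNE))
valid_ds = valid_ds.map(preprocess).batch(BATCH).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.map(preprocess).batch(BATCH).prefetch(tf.data.AUTOTUNE)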
Cfg0 Block
This block contains 1 Conv Layer and 2 Identity Layers. To help
numerical stability, we specify a kernel constraint that keeps all
weights normalized at constant intervals. Between two subsequent
layers we also include a BatchNormalization layer. The code has been
written in a deliberately explicit way to help readers understand the
design choices made at each stage.
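Since the original listing is not reproduced here, the sketch below is an interpretation: a standard ResNet bottleneck design (1×1 reduce, 3×3, 1×1 expand) with BatchNormalization between layers, and max_norm as the assumed kernel constraint (the exact constraint used in the post is not shown):

from tensorflow import keras

CONSTRAINT = keras.constraints.max_norm(2.0)  # assumed constraint and value

def identity_block(x, filters):
    # Bottleneck residual branch: 1x1 reduce -> 3x3 -> 1x1 expand
    fx = keras.layers.Conv2D(filters, 1, kernel_constraint=CONSTRAINT)(x)
    fx = keras.layers.BatchNormalization()(fx)
    fx = keras.layers.ReLU()(fx)
    fx = keras.layers.Conv2D(filters, 3, padding="same",
                             kernel_constraint=CONSTRAINT)(fx)
    fx = keras.layers.BatchNormalization()(fx)
    fx = keras.layers.ReLU()(fx)
    fx = keras.layers.Conv2D(4 * filters, 1, kernel_constraint=CONSTRAINT)(fx)
    fx = keras.layers.BatchNormalization()(fx)
    # The input is added back unchanged, so shapes must already match
    x = keras.layers.Add()([fx, x])
    return keras.layers.ReLU()(x)

def conv_block(x, filters, strides=2):
    # Same bottleneck, but the shortcut is projected to the new shape
    fx = keras.layers.Conv2D(filters, 1, strides=strides,
                             kernel_constraint=CONSTRAINT)(x)
    fx = keras.layers.BatchNormalization()(fx)
    fx = keras.layers.ReLU()(fx)
    fx = keras.layers.Conv2D(filters, 3, padding="same",
                             kernel_constraint=CONSTRAINT)(fx)
    fx = keras.layers.BatchNormalization()(fx)
    fx = keras.layers.ReLU()(fx)
    fx = keras.layers.Conv2D(4 * filters, 1, kernel_constraint=CONSTRAINT)(fx)
    fx = keras.layers.BatchNormalization()(fx)
    shortcut = keras.layers.Conv2D(4 * filters, 1, strides=strides,
                                   kernel_constraint=CONSTRAINT)(x)
    shortcut = keras.layers.BatchNormalization()(shortcut)
    x = keras.layers.Add()([fx, shortcut])
    return keras.layers.ReLU()(x)

def cfg0_block(x):
    # 1 conv block + 2 identity blocks, 64 bottleneck filters.
    # strides=1 because max-pooling has already downsampled the input.
    x = conv_block(x, 64, strides=1)
    x = identity_block(x, 64)
    return identity_block(x, 64)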
Cfg1 Block
This block contains 1 Conv Layer and 2 Identity Layers. It is similar to
the Cfg0 block, the main difference being the larger number
of out_channels in the Conv and Identity layers.
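With the helpers above, the stage reduces to a few lines; the bottleneck width doubles to 128 filters. (Note that the canonical ResNet-50 uses three identity blocks in this stage; this sketch follows the count stated in the text.)

def cfg1_block(x):
    # 1 conv block + 2 identity blocks; strides=2 halves the spatial size
    x = conv_block(x, 128)
    x = identity_block(x, 128)
    return identity_block(x, 128)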
Cfg2 Block
This block contains 1 Conv Layer and 5 Identity Layers. This is one of
the more important blocks for ResNet, since the different versions of the
model mostly differ in the number of blocks in this stage.
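A sketch using the same helpers, with 256 bottleneck filters:

def cfg2_block(x):
    # 1 conv block + 5 identity blocks; deeper ResNets (101/152)
    # mainly add identity blocks in this stage
    x = conv_block(x, 256)
    for _ in range(5):
        x = identity_block(x, 256)
    return x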
Cfg3 Block
This block contains 1 Conv Layer and 2 Identity Layers. It is the last
set of convolutional blocks in the network.
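And the final stage, with 512 bottleneck filters:

def cfg3_block(x):
    # 1 conv block + 2 identity blocks, closing the convolutional part
    x = conv_block(x, 512)
    x = identity_block(x, 512)
    return identity_block(x, 512)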
Classifier Block
Now we take all the blocks and join them together to create the final
ResNet model. Throughout, we have used the Keras Functional API,
which is the recommended practice for building models like this in
TensorFlow.
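A sketch of that assembly is shown below. The 7×7 stem and pooling head follow the usual ResNet layout, and the class count is read from ds_info; the original post's exact layer choices may differ (its summary below reports about 17.9M parameters):

NUM_CLASSES = ds_info.features["label"].num_classes  # 120 breeds in Stanford Dogs

# Input pre-processing stem: 7x7 convolution followed by max-pooling
inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = keras.layers.Conv2D(64, 7, strides=2, padding="same")(inputs)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D(3, strides=2, padding="same")(x)

# The four residual stages defined above
x = cfg0_block(x)
x = cfg1_block(x)
x = cfg2_block(x)
x = cfg3_block(x)

# Classifier head: global average pooling + fully-connected softmax layer
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.summary())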
=================================================================
Total params: 17,924,472
Trainable params: 17,893,752
Non-trainable params: 30,720
_________________________________________________________________
None
Defining Callbacks
In model.fit(), we can define callbacks that are invoked at
pre-determined points during training. Here we define a
ModelCheckpoint callback that saves a snapshot of the model at the
end of each epoch.
callbacks_list = [
    # Save a checkpoint after each epoch, keeping only the best model so far
    keras.callbacks.ModelCheckpoint(
        filepath='resnet50_model/checkpoint_{epoch:02d}.hdf5',
        monitor='val_loss',
        verbose=0,
        save_best_only=True,
        mode='auto',
        save_freq='epoch',
    )
]

history = model.fit(
    x=train_ds,
    validation_data=valid_ds,
    callbacks=callbacks_list,
    epochs=20
)
We inspect the model's training history, which records the per-epoch
loss and accuracy.
print(history.history)
Predicting results
We take the trained model and use it to make predictions on the test
set, as well as compute metrics such as loss and accuracy.
# evaluate() returns the loss followed by the compiled metrics (here, accuracy)
results = model.evaluate(test_ds)
print(f"Results : {results}")
Conclusion