Towards Data Science
Aqeel Anwar
Jun 7, 2019
Difference between AlexNet, VGGNet, ResNet, and Inception
In this tutorial, I will quickly go through the details of four famous CNN architectures and how they differ from each other by explaining their W3H (When, Why, What, and How).
AlexNet
When?
● The Alan Turing Year
● The year of Sustainable Energy for All
● London Olympics
Why? AlexNet was born out of the need to improve the results of the ImageNet challenge. It was one of the first deep convolutional networks to achieve considerable accuracy on the ImageNet LSVRC-2012 challenge, with a top-5 accuracy of 84.7% compared to 73.8% for the second-best entry. The idea of spatial correlation in an image frame was exploited using convolutional layers and receptive fields.
What? The network consists of 5 Convolutional (CONV)
layers and 3 Fully Connected (FC) layers. The activation
used is the Rectified Linear Unit (ReLU). The structural
details of each layer in the network can be found in the table
below.
AlexNet Block Diagram (source: oreilly.com)
The network has a total of 62 million trainable parameters.
How? The input to the network is a batch of RGB images of size 227x227x3, and the output is a 1000x1 probability vector, one entry corresponding to each class.
● Data augmentation is carried out to reduce over-fitting. It includes mirroring and cropping the images to increase the variation in the training data-set.
● The network uses overlapped max-pooling layers after the first, second, and fifth CONV layers. Overlapped maxpool layers are simply maxpool layers with strides smaller than the window size. A 3x3 maxpool layer with a stride of 2 is used, creating overlapped receptive fields. This overlapping improved the top-1 and top-5 errors by 0.4% and 0.3%, respectively.
● Before AlexNet, the most commonly used activation functions were sigmoid and tanh. Because these functions saturate, they suffer from the Vanishing Gradient (VG) problem and make the network difficult to train. AlexNet instead uses the ReLU activation function, which doesn’t suffer from the VG problem. The original paper showed that the network with ReLU reached a 25% error rate about 6 times faster than the same network with tanh non-linearity.
● Although ReLU helps with the vanishing gradient problem, its unbounded nature means the learned variables can grow unnecessarily large. To prevent this, AlexNet introduced Local Response Normalization (LRN). The idea behind LRN is to carry out normalization in a neighborhood of pixels, amplifying the excited neuron while dampening the surrounding neurons at the same time.
● AlexNet also addresses over-fitting by using drop-out layers, in which each neuron’s output is dropped during training with a probability of p=0.5. Although this keeps the network from over-fitting by helping it escape bad local minima, the number of iterations required for convergence also doubles. A minimal code sketch of these four ideas follows this list.
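Below is a minimal PyTorch sketch pulling these four ideas together (ReLU, overlapped pooling, LRN, dropout). It is not the full 5-CONV/3-FC AlexNet: only the first-layer shape (11x11 kernels, stride 4, on a 227x227x3 input) follows the architecture above, and the single CONV stage and classifier size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Illustrative single-stage sketch, not the full AlexNet."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),  # 227x227x3 -> 55x55x96
            nn.ReLU(inplace=True),                       # non-saturating activation
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),       # overlapped: stride < window
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                           # drop activations during training
            nn.Linear(96 * 27 * 27, num_classes),
        )

    def forward(self, x):
        x = self.features(x)                             # -> 27x27x96
        return self.classifier(torch.flatten(x, 1))

logits = AlexNetSketch()(torch.randn(1, 3, 227, 227))
probs = torch.softmax(logits, dim=1)  # 1000x1 probability vector per image
```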
VGGNet
When?
● International Year of Family Farming and
Crystallography
● First Robotic Landing on Comet
● Year of Robin Williams’ death
Why? VGGNet was born out of the need to reduce the number of parameters in the CONV layers and to improve training time.
What? There are multiple variants of VGGNet (VGG16,
VGG19, etc.) which differ only in the total number of layers
in the network. The structural details of a VGG16 network
have been shown below.
VGG16 Block Diagram (source: neurohive.io)
VGG16 has a total of 138 million parameters. The important
point to note here is that all the conv kernels are of size 3x3
and maxpool kernels are of size 2x2 with a stride of two.
How? The idea behind having fixed-size kernels is that all the variable-size convolutional kernels used in AlexNet (11x11, 5x5, 3x3) can be replicated by using multiple 3x3 kernels as building blocks. The replication is in terms of the receptive field covered by the kernels.
Let’s consider the following example. Say we have an input layer of size 5x5x1. Implementing a conv layer with a kernel size of 5x5 and stride one results in an output feature map of size 1x1. The same output feature map can be obtained by implementing two 3x3 conv layers with a stride of 1, as shown below.
Now let’s look at the number of variables that need to be trained. A single 5x5 conv filter has 25 variables. On the other hand, two conv layers of kernel size 3x3 have a total of 2x(3x3) = 18 variables (a reduction of 28%).
Similarly, the effect of one 7x7 (11x11) conv layer can be achieved by implementing three (five) 3x3 conv layers with a stride of one. This reduces the number of trainable variables by 44.9% (62.8%). Fewer trainable variables mean faster learning and more robustness to over-fitting.
ResNet
When?
● Discovery of Gravitational Waves
● International year of soil and light-based
technologies
● The Martian movie
Why? Neural networks are notorious for not being able to find a simpler mapping when one exists:
● For example, say we have a fully connected multi-layer perceptron network and we want to train it on a data-set where the input equals the output. The simplest solution is for every hidden layer to implement the identity mapping (identity weight matrices and zero biases). But when such a network is trained using back-propagation, a rather complex mapping is learned instead, with weights and biases taking a wide range of values.
● Another example is adding more layers to an existing neural network. Say we have a network f(x) that has achieved an accuracy of n% on a data-set. Adding more layers to form g(f(x)) should yield an accuracy of at least n%, i.e. in the worst case g(.) should be an identity mapping, matching the accuracy of f(x). But unfortunately, that is not the case. Experiments have shown that accuracy can decrease as more layers are added to the network.
● The issues mentioned above happen because of the vanishing gradient problem. As we make the CNN deeper, the derivative when back-propagating to the initial layers becomes almost insignificant in value.
ResNet addresses this problem by introducing two types of ‘shortcut connections’: the identity shortcut and the projection shortcut.
What? There are multiple versions of the ResNetXX architecture, where ‘XX’ denotes the number of layers. The most commonly used ones are ResNet50 and ResNet101. Since the vanishing gradient problem was taken care of (more on this in the How part), CNNs started to get deeper and deeper. Below we present the structural details of ResNet18.
ResNet18 has around 11 million trainable parameters. It consists of CONV layers with filters of size 3x3 (just like VGGNet). Only two pooling layers are used throughout the network, one at the beginning and the other at the end. Identity connections are placed between every two CONV layers. The solid arrows show identity shortcuts, where the input and output dimensions are the same, while the dotted ones represent the projection connections, where the dimensions differ.
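As a quick sanity check of that parameter count (assuming the torchvision package is installed), the stock ResNet18 can be loaded and its parameters counted directly:

```python
import torchvision.models as models

resnet18 = models.resnet18()  # randomly initialized ResNet18
n_params = sum(p.numel() for p in resnet18.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 11.7M
```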
How? As mentioned earlier, ResNet architecture makes use
of shortcut connections to solve the vanishing gradient
problem. The basic building block of ResNet is a Residual
block that is repeated throughout the network.
Residual Block — Image is taken from the original paper
Instead of learning the mapping x → F(x), the network learns the mapping x → F(x)+G(x). When the dimensions of the input x and the output F(x) are the same, G(x) = x is an identity function and the shortcut connection is called an identity connection. With the shortcut in place, the identity mapping is learned by zeroing out the weights in the intermediate layers during training, since it is easier to push weights to zero than to make them reproduce the input.
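A minimal sketch of such an identity block in PyTorch (the channel count and two-conv structure are illustrative assumptions): the shortcut adds the input unchanged, so driving the conv weights to zero turns the whole block into an identity mapping.

```python
import torch.nn as nn

class IdentityBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # F(x), same shape as x
        return self.relu(out + x)                   # F(x) + x: identity shortcut
```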
For the case when the dimensions of F(x) differ from those of x (due to a stride length > 1 in the CONV layers in between), a projection connection is implemented rather than an identity connection. The function G(x) changes the dimensions of the input x to those of the output F(x). Two kinds of mapping were considered in the original paper.
● Non-trainable Mapping (Padding): The input x is simply padded with zeros to make its dimensions match those of F(x).
● Trainable Mapping (Conv Layer): A 1x1 conv layer is used to map x to G(x). It can be seen from the table above that across the network the spatial dimensions are either kept the same or halved, and the depth is either kept the same or doubled, so the product of width and depth after each conv layer stays the same, i.e. 3584. The 1x1 conv layers halve the spatial dimensions by using a stride of 2, and double the depth by using the corresponding number of filters: the number of 1x1 conv filters equals the depth of F(x). A sketch of such a projection block follows this list.
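A matching sketch of the projection block (again with illustrative channel counts): F(x) halves the spatial size and doubles the depth, so G(x) is a 1x1 conv with stride 2 whose number of filters equals the depth of F(x).

```python
import torch.nn as nn

class ProjectionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        out_ch = in_ch * 2                                # depth doubles
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)  # G(x)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # F(x): spatial halved, depth doubled
        return self.relu(out + self.proj(x))        # F(x) + G(x): projection shortcut
```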
Inception
When?
● International Year of Family Farming and
Crystallography
● First Robotic Landing on Comet
● Year of Robin Williams’ death
Why? In an image classification task, the size of the salient feature can vary considerably within the image frame. Hence, deciding on a fixed kernel size is rather difficult. Larger kernels are preferred for global features that are distributed over a large area of the image; smaller kernels, on the other hand, provide good results in detecting area-specific features that are distributed across the image frame. Effective recognition of such variable-sized features calls for kernels of different sizes. That is what Inception does: instead of simply going deeper in terms of the number of layers, it goes wider, implementing multiple kernels of different sizes within the same layer.
What? The Inception network architecture consists of several inception modules of the following structure:
Inception Module (source: original paper)
Each inception module consists of four operations in parallel:
● 1x1 conv layer
● 3x3 conv layer
● 5x5 conv layer
● max pooling
The 1x1 conv blocks shown in yellow are used for depth reduction. The results from the four parallel operations are then concatenated depth-wise to form the Filter Concatenation block (in green). There are multiple versions of Inception, the simplest being GoogLeNet.
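A hedged PyTorch sketch of one such module (the branch channel counts are illustrative assumptions, not the GoogLeNet values): the 1x1 convs reduce depth before the expensive 3x3 and 5x5 branches, and the four outputs are concatenated depth-wise.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)     # 1x1 branch
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),               # depth reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1),       # 3x3 branch
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),               # depth reduction
            nn.Conv2d(16, 32, kernel_size=5, padding=2),       # 5x5 branch
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # max-pooling branch
            nn.Conv2d(in_ch, 16, kernel_size=1),               # depth reduction
        )

    def forward(self, x):
        # Filter concatenation: stack the four branch outputs along the depth axis.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )
```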
How? Inception increases the space of networks from which the best one is chosen via training. Each inception module can capture salient features at different levels. Global features are captured by the 5x5 conv layer, while the 3x3 conv layer is prone to capturing distributed features. The max-pooling operation is responsible for capturing low-level features that stand out in a neighborhood. At a given level, all of these features are extracted and concatenated before being fed to the next layer. We leave it to the network/training to decide which features hold the most value and to weight them accordingly. Say the images in the data-set are rich in global features without too many low-level features; then the trained Inception network will have very small weights for the 3x3 conv kernels as compared to the 5x5 conv kernels.
Summary
In the table below, these four CNNs are sorted w.r.t. their top-5 accuracy on the ImageNet dataset. The number of trainable parameters and the Floating Point Operations (FLOPs) required for a forward pass are also shown.
Several comparisons can be drawn:
● AlexNet and ResNet-152 both have about 60M parameters, yet there is about a 10% difference in their top-5 accuracy. Training a ResNet-152, however, requires far more computation (about 10 times more than AlexNet), which means more training time and energy.
● VGGNet not only has more parameters and FLOPs than ResNet-152 but also delivers lower accuracy: it takes more time to train, for reduced accuracy.
● Training an AlexNet takes about the same time as training Inception, yet Inception’s memory requirements are about 10 times lower and its accuracy is higher (by about 9%).