CNN Architectures

The document discusses key innovations in deep learning architectures, focusing on AlexNet's improvements over LeNet-5, including its deeper architecture, ReLU activations, and GPU utilization. It also describes the Inception module in GoogLeNet, highlighting its computational efficiency through parallel paths and parameter reduction. Lastly, it compares DenseNet's dense blocks with ResNet's skip connections, emphasizing differences in connection types, feature reuse, and gradient flow.


1. Explain the architectural innovations of AlexNet and how it improved upon LeNet-5.

Answer:
AlexNet, introduced by Alex Krizhevsky et al. in 2012, revolutionized deep learning by winning the ImageNet ILSVRC 2012 challenge with a top-5 error rate of about 15.3% (compared to 26.2% for the runner-up). Its key innovations over LeNet-5 include:

Deeper Architecture:
8 layers with learned weights (5 convolutional + 3 fully connected) vs. LeNet-5's 5 layers.
Enabled learning of hierarchical features (edges → textures → object parts).

ReLU Activation:
Replaced sigmoid/tanh with Rectified Linear Units (ReLUs).
Advantage: avoided saturating activations and sped up training (roughly 6x faster convergence than tanh in the original experiments).

GPU Utilization:
Trained on dual NVIDIA GTX 580 GPUs (about five to six days instead of weeks on CPUs).

Overlapping Max-Pooling:
Used 3x3 windows with stride 2 (vs. non-overlapping pooling in LeNet-5).
Benefit: reduced spatial dimensions while retaining more information.

Local Response Normalization (LRN):
Mimicked biological lateral inhibition to encourage competition among neurons.
Later replaced by batch normalization in modern networks.

Dropout:
Randomly deactivated 50% of the neurons in the fully connected layers during training.
Reduced overfitting (critical for a network with ~60M parameters).

Comparison with LeNet-5:

Feature LeNet-5 AlexNet

Depth 5 layers 8 layers

Activatio
Sigmoid/Tanh ReLU
n

Small (MNIST Large


Scale
digits) (ImageNet)

Hardwar
CPU GPU-optimized
e
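For concreteness, here is a minimal PyTorch-style sketch of an AlexNet-like network (5 conv + 3 fully connected layers, overlapping 3x3/stride-2 pooling, ReLU, and dropout). The channel counts follow the commonly used single-stream variant and should be read as illustrative, not an exact reproduction of the original two-GPU model:

import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Minimal AlexNet-style network: 5 conv layers + 3 fully connected layers."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),   # conv1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),            # conv2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),           # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),           # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                                       # dropout in FC layers
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

# Quick shape check with a dummy 227x227 RGB batch.
out = AlexNetSketch()(torch.randn(1, 3, 227, 227))
print(out.shape)  # torch.Size([1, 1000])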

Conclusion: AlexNet’s scale and innovations (ReLU, GPU, dropout) made it the first
modern CNN, paving the way for deeper architectures.

2. Describe the Inception module in GoogLeNet. Why is it computationally efficient?
(10 marks)

Answer:
The Inception module is the core building block of GoogLeNet (2014), which reduced the top-5 error to 6.7% with only about 6M parameters (vs. AlexNet's 60M).

Structure of the Inception Module:

Parallel Paths:
1x1 convolutions: cheap channel-wise transformations.
3x3 and 5x5 convolutions: capture spatial patterns at different scales.
3x3 max-pooling: preserves spatial features.

Filter Concatenation:
Outputs of all paths are depth-concatenated (requires "SAME" padding so every path keeps the same width/height).

1x1 Bottlenecks:
Applied before the 3x3/5x5 convolutions to reduce the number of channels (e.g., 256 → 64).
Example: a 5x5 convolution mapping 256 channels to 64 costs 256×5×5×64 = 409,600 multiplications per output position, but with a 1x1 bottleneck (256×1×1×64 = 16,384 plus 64×5×5×64 = 102,400) the total drops to 118,784, as the short check below confirms.
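A few lines of Python reproduce the bottleneck arithmetic (multiplications per output position, ignoring biases; the channel counts are the illustrative ones from the example above):

# Direct 5x5 convolution, 256 -> 64 channels.
in_ch, out_ch = 256, 64
direct = in_ch * 5 * 5 * out_ch                  # 409,600 multiplications

# Same mapping with a 1x1 bottleneck down to 64 channels first.
bottleneck = 64
reduce_cost = in_ch * 1 * 1 * bottleneck         # 16,384
conv_cost = bottleneck * 5 * 5 * out_ch          # 102,400
total = reduce_cost + conv_cost                  # 118,784

print(direct, total, round(direct / total, 2))   # 409600 118784 3.45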

Computational Efficiency:

Parameter Reduction: 1x1 convolutions compress channels before the expensive operations.
Multi-Scale Processing: fine and coarse features are captured in parallel.
Sparse Connectivity: the module approximates a sparse connection pattern using dense operations that run efficiently on GPUs.

Mathematical Insight:
For an input tensor X of shape (H, W, C), the module computes:

Output = Concat[
    Conv1x1(X),
    Conv3x3(Conv1x1(X)),
    Conv5x5(Conv1x1(X)),
    Conv1x1(MaxPool3x3(X))
]

where the pooling path is followed by a 1x1 projection in the original GoogLeNet design.
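A minimal PyTorch-style sketch of this module follows; the branch channel counts are illustrative assumptions, not the exact GoogLeNet configuration:

import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """One Inception-v1-style block: four parallel paths, depth-concatenated."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)            # 1x1 path
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1),               # 1x1 bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1),       # 3x3 path
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1),               # 1x1 bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2),       # 5x5 path
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),         # 3x3 max-pool
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),               # pool projection
        )

    def forward(self, x):
        # Padding keeps H and W identical across branches, so the outputs
        # can be depth-concatenated along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

# Example: 256 input channels -> 64 + 128 + 32 + 32 = 256 output channels.
block = InceptionSketch(256, c1=64, c3_reduce=96, c3=128, c5_reduce=16, c5=32, pool_proj=32)
y = block(torch.randn(1, 256, 28, 28))
print(y.shape)  # torch.Size([1, 256, 28, 28])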

Conclusion: Inception balances depth and efficiency via bottlenecks and multi-scale aggregation.
3. What are skip connections in ResNet? Prove mathematically why
they mitigate vanishing gradients.

(15 marks)

Answer:

Skip Connections (Residual Learning):

Definition: Shortcut paths that add input x to the output of a layer block F(x).
Formula: H(x) = F(x) + x, where F(x) learns the residual (difference from identity).

Why They Work:

Gradient Flow:
During backpropagation, the gradient with respect to the input x is
∂Loss/∂x = ∂Loss/∂H(x) × (∂F(x)/∂x + 1)
The "+1" term gives the gradient a direct path, so it never vanishes completely, even if ∂F(x)/∂x ≈ 0.

Identity Mapping:
At initialization, F(x) ≈ 0 ⇒ H(x) ≈ x.
Early training stages therefore benefit from near-identity transformations.

Deep Network Training:
In a 152-layer ResNet, skip connections allow gradients to propagate directly to the early layers.

Mathematical Proof:
Consider a chain of residual blocks H_{i+1} = H_i + F_i(H_i). Unrolling the recursion from block l to block L gives:

H_L = H_l + Σ_{i=l}^{L-1} F_i(H_i)

The gradient with respect to the output of block l is therefore:

∂H_L/∂H_l = 1 + ∂(Σ_{i=l}^{L-1} F_i(H_i))/∂H_l

Even if the summed residual derivatives tend to 0, the constant 1 contributed by the identity path remains, so the gradient cannot vanish.

Visualization:

Input (x) ──→ [Weight Layer → ReLU → Weight Layer] ──→ (+) ──→ Output H(x)
     └──────────────── identity shortcut ────────────────┘
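To illustrate both the block structure and the gradient argument, here is a minimal PyTorch-style sketch; the layer sizes are arbitrary assumptions, and the scalar toy example simply confirms the "+1" identity term in the gradient:

import torch
import torch.nn as nn

class ResidualBlockSketch(nn.Module):
    """H(x) = F(x) + x, where F is a small two-layer transformation."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.f(x) + x        # skip connection: identity added to the residual

# Scalar toy version of the gradient argument: H(x) = F(x) + x with F(x) = w*x.
w = torch.tensor(0.0, requires_grad=True)   # residual branch contributes ~0 gradient
x = torch.tensor(3.0, requires_grad=True)
h = w * x + x                               # residual formulation
h.backward()
print(x.grad)                               # tensor(1.) -> dH/dx = w + 1, never 0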

Conclusion: Skip connections enable ultra-deep networks (e.g., ResNet-152) by preserving gradient flow.

4. Compare DenseNet’s dense blocks with ResNet’s skip connections.

(10 marks)

Answer:
Feature         ResNet                           DenseNet
Connection      Additive (H(x) = F(x) + x)       Concatenative ([x, F1(x), F2(x), ...])
Feature Reuse   Single path per block            All previous features reused
Parameters      Moderate (~25M for ResNet-50)    Economical (~8M for DenseNet-121)
Gradient Flow   Preserved via addition           Enhanced via concatenation
Structure       Residual blocks                  Dense blocks + transition layers

Key Differences:

DenseNet:
Growth rate (k): each layer adds k new feature maps (e.g., k = 12).
Bottlenecks: 1x1 convolutions compress channels before the 3x3 convolutions.
Transition Layers: batch norm + 1x1 conv + 2x2 average pooling between dense blocks.

ResNet:
Simpler, but requires careful initialization for very deep networks.

Example:

Dense Block:
Layer1: [x]
Layer2: [x, F1(x)]
Layer3: [x, F1(x), F2(x)]
Res Block:
Layer1: x
Layer2: x + F1(x)
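A minimal PyTorch-style dense block sketch makes the concatenation explicit; the growth rate and layer count here are illustrative assumptions:

import torch
import torch.nn as nn

class DenseBlockSketch(nn.Module):
    """Each layer receives the concatenation of all previous feature maps."""
    def __init__(self, in_ch, growth_rate=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate, kernel_size=3, padding=1),
            ))

    def forward(self, x):
        features = [x]                                 # running list of all feature maps
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))    # concatenate, then transform
            features.append(out)                       # new features reused by later layers
        return torch.cat(features, dim=1)              # block output: [x, F1, F2, ...]

# 16 input channels + 3 layers x growth rate 12 = 52 output channels.
block = DenseBlockSketch(in_ch=16)
y = block(torch.randn(1, 16, 32, 32))
print(y.shape)  # torch.Size([1, 52, 32, 32])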

Conclusion: DenseNet improves feature reuse but requires more memory; ResNet is
simpler and widely adopted.
