Deep Learning UNIT-II Part1
Syllabus
• Vanishing Gradients and Exploding Gradients problem.
• Hyperparameters – layer size, magnitudes, regularization, activation
functions, weight initialization strategies, mini-batch size,
vectorization.
• Building blocks of deep networks- Feed Forward multi-layer neural
networks
• Unsupervised Pretrained Networks – Autoencoders: Sparse
Autoencoders, Denoising Autoencoders, Deep Belief Networks
(DBNs), Generative Adversarial Networks (GANs).
• Training a multi-layer artificial neural network and training deep
models.
Example: forward computation at a hidden neuron h1 with inputs x1 and x2:
h1 = w1 * x1 + w4 * x2 + b
Output at h1 = Sigmoid(h1)
Vanishing Gradient Problem
• Definition: The gradients become very small as they are propagated
backward, especially in deep networks.
• Effect:
• Early layers (close to the input) learn very slowly or stop learning.
• This happens because the small gradients essentially "vanish" during
backpropagation.
Why it happens?
• Activation functions like sigmoid or tanh squash their outputs into a
small range (e.g., 0 to 1 for sigmoid).
• Their derivatives are also small for most input values.
• When these small derivatives are multiplied across many layers, the
gradient becomes negligible.
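A minimal NumPy sketch (values chosen only for illustration) of how multiplying one small sigmoid derivative per layer makes the gradient factor shrink with depth:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25 (reached at z = 0)

# Chain-rule product of one sigmoid derivative per layer.
# Even in the best case (derivative = 0.25) the product shrinks
# exponentially with the number of layers.
for depth in (5, 10, 20):
    grad_factor = np.prod([sigmoid_derivative(0.0)] * depth)
    print(f"depth={depth:2d}  gradient factor={grad_factor:.2e}")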
Vanishing Gradients & Exploding Gradients
• If the gradients are too small or too large, they can cause problems
for the model.
• If we use the Sigmoid activation function, each derivative term in the chain-rule product is at most 0.25.
• If we use the Tanh activation function, each derivative term is at most 1, and much smaller away from zero.
• Multiplying these derivative terms (together with the remaining weight terms of the chain rule) across many layers drives the gradient toward zero.
Solution for the Vanishing Gradient Problem: ReLU Activation (its derivative is 1 for all positive inputs, so the chain-rule product does not shrink toward zero)
Exploding Gradient Problem
• The gradients become extremely large as they are propagated
backward.
• Effect: The weights grow uncontrollably, leading to unstable training.
• Why it happens : If the weights or gradients are too large, the
multiplication during backpropagation causes them to grow
exponentially.
Solution for Exploding Gradients using Gradient Clipping
• Vanishing Gradient:
• Use activation functions with non-vanishing gradients (e.g., ReLU, Leaky
ReLU).
• Initialize weights properly (e.g., Xavier Initialization, He Initialization).
• Use techniques like Batch Normalization to stabilize training.
• Exploding Gradient:
• Apply gradient clipping to cap large gradients (see the sketch after this list).
• Use regularization techniques like L2-norm to control weight growth.
• Ensure proper weight initialization.
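A minimal sketch of gradient clipping by global norm, assuming NumPy and an illustrative max_norm of 1.0; deep learning frameworks provide this built in (e.g., torch.nn.utils.clip_grad_norm_ in PyTorch):

import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients down so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Example: an "exploded" gradient is capped before the weight update.
grads = [np.array([30.0, -40.0])]              # norm = 50, far too large
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(clipped[0])                              # [ 0.6 -0.8]  -> norm = 1.0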
Weight Initialization Techniques
• Xavier/Glorot Initialization: designed for tanh (and sigmoid) activations; scales weights based on the number of input and output neurons.
• He Initialization: designed for ReLU activation; scales weights based on the number of input neurons.
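A minimal NumPy sketch of the two schemes, using the common uniform Xavier and normal He variants (layer sizes are illustrative):

import numpy as np

def xavier_init(n_in, n_out):
    """Xavier/Glorot: variance scaled by both fan-in and fan-out (suits tanh/sigmoid)."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))

def he_init(n_in, n_out):
    """He: variance scaled by fan-in only (suits ReLU)."""
    std = np.sqrt(2.0 / n_in)
    return np.random.normal(0.0, std, size=(n_in, n_out))

W1 = xavier_init(784, 128)   # e.g. a tanh hidden layer
W2 = he_init(128, 64)        # e.g. a ReLU hidden layer
print(W1.std(), W2.std())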
Example for Model Parameters
• The model's parameters are the weights, one per input feature, where each weight reflects the importance of its input feature.
• This holds for a single input feature as well as for multiple input features.
Hyperparameters
• Hyperparameters are key settings or configurations that influence how a neural
network model is trained and how well it performs.
• Hyperparameters are set before training begins.
Different hyperparameters to be considered
• Number of Epochs
• Learning rate
• Layer Size (Number of Neurons per Layer)
• Magnitudes (Weight Magnitudes)
• Regularization
• Activation Functions
• Weight Initialization Strategies
• Mini-Batch Size
• Vectorization
Number of Epochs
• An epoch refers to one complete pass through the entire training dataset during
the training process.
• The number of epochs determines how many times the model will iterate over
the entire dataset.
• If the number of epochs is too small, the model might not have enough
opportunity to learn and converge to a good solution (underfitting).
• If it is too large, the model might learn the training data too well, leading to
overfitting (memorizing the data and not generalizing well to new data).
• We can use early stopping (stopping training if performance stops improving) to
avoid overfitting.
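A minimal sketch of early stopping; train_one_epoch and evaluate are hypothetical stand-ins for a real training pass and validation check, and patience = 3 is an illustrative choice:

import random

def train_one_epoch(model):           # stand-in for one full pass over the training data
    pass

def evaluate(model):                  # stand-in: returns a (noisy) validation loss
    return random.random()

model = None                          # placeholder for a real model object
best_val_loss = float("inf")
patience, wait = 3, 0                 # stop after 3 epochs with no improvement

for epoch in range(100):              # upper bound on the number of epochs
    train_one_epoch(model)
    val_loss = evaluate(model)        # measure performance on held-out data
    if val_loss < best_val_loss:      # improvement: reset the patience counter
        best_val_loss, wait = val_loss, 0
    else:
        wait += 1                     # no improvement this epoch
        if wait >= patience:
            print(f"early stopping at epoch {epoch}")
            break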
Learning Rate
• The learning rate controls how much the model's weights are
updated with respect to the error or loss after each training step.
new weight = old weight - (learning rate * gradient)
• A high learning rate makes large weight updates that can overshoot or skip
over the optimal solution, making training unstable.
• A low learning rate ensures smaller updates to weights, which might
make the training process more stable but also slow, possibly getting
stuck in suboptimal solutions (local minima).
• Adaptive learning rate methods (like Adam) can be used to fine-tune the
learning rate during training.
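A minimal sketch of the update rule above on a toy loss, comparing a low, a reasonable, and a too-high learning rate (the loss and the rates are illustrative):

# Minimise loss(w) = (w - 3)**2, whose gradient is 2*(w - 3); the optimum is w = 3.
def gradient(w):
    return 2.0 * (w - 3.0)

for lr in (0.01, 0.5, 1.1):           # low, reasonable, and too-high learning rates
    w = 0.0
    for _ in range(20):
        w = w - lr * gradient(w)      # new weight = old weight - (learning rate * gradient)
    print(f"lr={lr}: w after 20 steps = {w:.3f}")   # low lr: still far from 3; high lr: diverges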
Layer Size (Number of Neurons per Layer)
• The layer size determines the capacity of the network to learn from
the data.
• Too few neurons may not allow the model to capture complex
patterns, while too many can lead to overfitting.
• A common practice is to start with a moderate number of neurons
and adjust based on performance.
Magnitudes (Weight Magnitudes)
• Refers to the size of the values assigned to the weights (connections
between neurons).
• If the weights are too large, they can lead to unstable gradients and
difficulty in training.
• Very small weights can result in the network being unable to learn
effectively.
• Proper weight initialization techniques (discussed below) help keep the magnitudes in a suitable range.
Regularization
• Techniques used to prevent the model from overfitting the training
data, i.e., the model becoming too specific to the training set and not
generalizing well to new data.
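A minimal NumPy sketch of one common regularization technique, an L2 (weight decay) penalty added to the loss; the value lam = 0.01 is illustrative:

import numpy as np

def l2_penalty(weights, lam=0.01):
    """Adds lam * sum(w^2) to the loss, discouraging large weights."""
    return lam * sum(np.sum(w ** 2) for w in weights)

def l2_gradient(w, lam=0.01):
    """Extra gradient term contributed by the penalty: 2 * lam * w."""
    return 2.0 * lam * w

weights = [np.array([0.5, -1.2]), np.array([2.0])]
data_loss = 0.8                                   # illustrative data-fit loss
total_loss = data_loss + l2_penalty(weights)      # regularized objective
print(total_loss, l2_gradient(weights[0]))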
Activation Functions
• Mathematical functions applied to the output of each neuron to introduce non-
linearity into the model.
• These functions allow the neural network to learn more complex patterns.
• Commonly used Activation functions:
• ReLU (Rectified Linear Unit): Most commonly used, it returns 0 for negative
inputs and the input itself for positive ones.
• Sigmoid: Compresses values between 0 and 1, often used for binary classification
problems.
• Tanh: Similar to sigmoid but compresses values between -1 and 1.
• Choosing the right one helps the network capture complex relationships in the
data.
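A minimal NumPy sketch of the three activation functions listed above:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)            # 0 for negative inputs, z itself otherwise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                    # squashes values into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), sigmoid(z), tanh(z))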
Weight Initialization Strategies
• Refers to how the initial weights are set before training begins.
• Proper initialization can help the model converge faster and avoid
problems like vanishing or exploding gradients.
• Common strategies:
• Random Initialization: Weights are initialized randomly, often using
small values from a normal distribution.
• Xavier/Glorot Initialization
• He Initialization
Mini-Batch Size
• Refers to the number of training examples used in one
forward/backward pass (one "iteration") of the model.
• The mini-batch size affects how often the model updates its weights.
• Smaller mini-batches lead to noisier gradient updates and use hardware
parallelism less efficiently, which can make each epoch slower.
• Larger mini-batches give smoother updates, but each update takes longer to
compute and needs more memory.
• A typical mini-batch size is between 32 and 128, but it can vary based
on the dataset and model.
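A minimal NumPy sketch of one epoch of mini-batch iteration; the dataset size (1000) and batch size (32) are illustrative:

import numpy as np

X = np.random.randn(1000, 20)        # 1000 examples, 20 features each
y = np.random.randint(0, 2, 1000)    # binary labels
batch_size = 32

indices = np.random.permutation(len(X))      # shuffle once per epoch
for start in range(0, len(X), batch_size):
    batch_idx = indices[start:start + batch_size]
    X_batch, y_batch = X[batch_idx], y[batch_idx]
    # one forward/backward pass and weight update would happen here
print(f"{int(np.ceil(len(X) / batch_size))} weight updates per epoch")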
Vectorization
• The process of converting data into a vector or matrix (tensor) format that
allows for efficient parallel computation.
• In deep learning, this often means representing inputs (like images or
text) as matrices or tensors and performing operations on these
structures efficiently.
• Deep learning models often involve millions of parameters, and
performing operations in a vectorized way (using matrix
multiplications, for instance) allows the model to leverage hardware
like GPUs for faster computations.
• Modern deep learning frameworks (like TensorFlow and PyTorch)
automatically take care of vectorization, but it’s important to
understand the concept since it significantly speeds up training.
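A minimal NumPy sketch contrasting an explicit Python loop with the equivalent vectorized matrix multiplication (array sizes are illustrative):

import numpy as np
import time

X = np.random.randn(256, 512)     # a mini-batch of 256 inputs, 512 features each
W = np.random.randn(512, 128)     # weights of a layer with 128 neurons

# Loop version: compute each output element one at a time.
start = time.time()
out_loop = np.zeros((256, 128))
for i in range(256):
    for j in range(128):
        out_loop[i, j] = np.dot(X[i], W[:, j])
loop_time = time.time() - start

# Vectorized version: a single matrix multiplication.
start = time.time()
out_vec = X @ W
vec_time = time.time() - start

print(np.allclose(out_loop, out_vec), loop_time, vec_time)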
Building blocks of deep networks
These components work together to form a deep neural network.
• Input Layer
• Purpose: Accepts the input data in a structured format.
• Details:
• The number of neurons in this layer matches the number of features in the
dataset.
• Example: For an image of size 28×28, the input layer would have 784 neurons
if the image is flattened (see the sketch after this list).
• In an image recognition task, each input neuron could represent one pixel of the
image.
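A minimal NumPy sketch of flattening a 28×28 image into the 784-value input vector mentioned above:

import numpy as np

image = np.random.rand(28, 28)        # e.g. one grayscale 28x28 image
input_vector = image.reshape(-1)      # flatten to shape (784,)
print(input_vector.shape)             # (784,) -> one value per input neuron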
Hidden Layers
• Purpose: Extract features and perform computations.
• Details:
• These layers are placed between the input and output layers.
• They consist of neurons that apply transformations to the data.
• The transformations are controlled by weights (learned during training) and
biases.
• Example: Dense (fully connected) layers, Convolutional layers, Recurrent
layers.
Depth of Network = Number of Hidden Layers + 2 (the + 2 counts the input and output layers)
Neurons
• Purpose: Perform computations.
• Details:
• Each neuron computes a weighted sum of its inputs, adds a bias, and applies
an activation function.
• Output of a neuron:
z = ∑ ( w ⋅ x ) + b,   output a = activation(z)
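A minimal NumPy sketch of this neuron computation with an illustrative set of inputs, weights, and bias:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])     # inputs to the neuron
w = np.array([0.2, 0.4, -0.1])     # one weight per input
b = 0.3                            # bias

z = np.dot(w, x) + b               # weighted sum of inputs plus bias
a = sigmoid(z)                     # activation applied to z
print(z, a)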
Activation Functions
• Purpose: Introduce non-linearity to the model, enabling it to learn
complex patterns.
• Examples: ReLU, Sigmoid, Tanh (described under Activation Functions in the hyperparameters discussion above).
• Backward Propagation
• Purpose: Updates weights and biases using gradients computed from
the loss function via the chain rule.
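A minimal NumPy sketch of one forward and backward pass for a single sigmoid neuron with a squared-error loss (all values illustrative), showing the chain rule producing the gradients used in the weight update:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])       # inputs
w = np.array([0.1, -0.3])      # weights
b = 0.05                       # bias
target = 1.0                   # desired output
lr = 0.1                       # learning rate

# Forward pass
z = np.dot(w, x) + b
a = sigmoid(z)
loss = 0.5 * (a - target) ** 2

# Backward pass (chain rule): dL/dw = dL/da * da/dz * dz/dw
dL_da = a - target
da_dz = a * (1.0 - a)          # derivative of the sigmoid
dz_dw = x
grad_w = dL_da * da_dz * dz_dw
grad_b = dL_da * da_dz

# Weight update: new weight = old weight - (learning rate * gradient)
w = w - lr * grad_w
b = b - lr * grad_b
print(loss, w, b)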