Day 2 - Loss & Activation Functions
Feb 19, 2025

Class #2:
📌 Keywords
✒️ Neural Network - A computational model inspired by the structure of the human brain, built from layers of interconnected neurons.
✒️ Vanishing Gradient Problem - The Vanishing Gradient Problem occurs when
gradients (partial derivatives) in deep neural networks become extremely small,
making it difficult for the earlier layers to learn.

✒️ Exploding Gradient Problem - The Exploding Gradient Problem occurs when gradients become excessively large, leading to unstable weight updates and causing the model to fail to converge.

✒️ Overfitting - The model learns too much from the training data, including noise
and irrelevant patterns. It performs well on training data but poorly on new (test)
data.

✒️ Underfitting - The model is too simple to learn from the data and fails to
capture key patterns. It performs poorly on both training and test data.

✒️ Training Parameters - The learnable weights and biases of a model. For scale: GPT-3 has about 175 billion parameters, while GPT-4 is reported (unofficially) to have on the order of 1 trillion.


✒️ Hyperparameter - A hyperparameter is a parameter set before training a
machine learning model, rather than learned from the data. It controls the training
process and affects model performance.

●​ Manually set.
●​ Not learnt from data.
●​ Settings of the model.
✒️ Hyperparameters & Their Importance

● Learning Rate (α) - Controls how much the model updates weights at each step. Typical values: 0.01, 0.001, 0.0001. Too high -> model overshoots; too low -> model learns too slowly.
● Batch Size - Number of training samples processed before updating weights. Typical values: 16, 32, 64, 128. Small batch -> better generalization, more noise; large batch -> faster but may overfit.
● Number of Epochs - Number of times the model sees the entire dataset. Typical values: 10, 50, 100. Too many -> overfitting; too few -> underfitting.
● Number of Hidden Layers - Determines the depth of the neural network. Typical values: 1, 2, 3, 10. More layers -> better feature extraction but more risk of overfitting.
● Number of Neurons per Layer - Defines the complexity of each layer. Typical values: 32, 64, 128, 512. More neurons -> more complexity but higher computation.
● Dropout Rate - Percentage of neurons randomly dropped during training to prevent overfitting. Typical values: 0.1, 0.2, 0.5. Higher values prevent overfitting but may slow learning.
● Optimizer - Algorithm that adjusts weights to minimize loss. Examples: SGD, Adam, RMSprop. Different optimizers converge at different speeds and with different stability.
● Activation Function - Defines how neurons activate and pass values forward. Examples: ReLU, Sigmoid, Tanh. Affects how well a model captures non-linear relationships.
● Weight Initialization - Sets initial weight values before training. Examples: Xavier, He, Random. Poor initialization leads to slow or stuck learning.
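The hyperparameters listed above all map onto specific places in tf.keras code. Below is a minimal sketch of where each one is set; the particular values, layer sizes, and dataset names (x_train, y_train) are illustrative assumptions, not recommendations.

import tensorflow as tf

# Hyperparameters (illustrative values)
learning_rate = 0.001
batch_size = 32
epochs = 50

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",            # neurons per layer + activation function
                          kernel_initializer="he_normal",   # weight initialization
                          input_shape=(20,)),
    tf.keras.layers.Dropout(0.2),                            # dropout rate
    tf.keras.layers.Dense(64, activation="relu"),            # a second hidden layer (network depth)
    tf.keras.layers.Dense(1, activation="sigmoid"),          # output layer
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),  # optimizer + learning rate
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Batch size and number of epochs are supplied when training:
# model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)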

✒️ Question - How do we choose the number of epochs, the number of hidden layers, the number of neurons per layer, and how do we initialise the weights and biases?
=> All of these are derived from trial & error.
✒️ Loss/Cost/Error Function - A function that measures how far our predicted value is from the true value.

ȳ = σ(z)
σ(z) = 1 / (1 + e^(-z))
z = wx + b
c = (y - ȳ)^2
∂c/∂w = ∂c/∂ȳ · ∂ȳ/∂z · ∂z/∂w = -2(y - ȳ) · σ(z)(1 - σ(z)) · x
∂c/∂b = ∂c/∂ȳ · ∂ȳ/∂z · ∂z/∂b = -2(y - ȳ) · σ(z)(1 - σ(z))
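These chain-rule gradients can be verified with a few lines of plain Python. This is a minimal sketch; the values of x, y, w, and b below are arbitrary assumptions for illustration.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 2.0, 1.0      # one training example (arbitrary values)
w, b = 0.5, -0.1     # current weight and bias (arbitrary values)

z = w * x + b
y_hat = sigmoid(z)            # ȳ = σ(z)
c = (y - y_hat) ** 2          # c = (y - ȳ)^2

# Analytical gradients from the chain rule above
dc_dw = -2 * (y - y_hat) * y_hat * (1 - y_hat) * x
dc_db = -2 * (y - y_hat) * y_hat * (1 - y_hat)

# Numerical check with finite differences
eps = 1e-6
dc_dw_num = ((y - sigmoid((w + eps) * x + b)) ** 2 - c) / eps
dc_db_num = ((y - sigmoid(w * x + (b + eps))) ** 2 - c) / eps

print(dc_dw, dc_dw_num)   # the two values should agree closely
print(dc_db, dc_db_num)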

✒️ Loss functions
● Mean Squared Error (MSE) - tf.keras.losses.MeanSquaredError - Regression problems where errors need to be minimized.
● Mean Absolute Error (MAE) - tf.keras.losses.MeanAbsoluteError - Regression problems where robustness to outliers matters.
● Mean Squared Logarithmic Error (MSLE) - tf.keras.losses.MeanSquaredLogarithmicError - Regression tasks where small differences matter more than large ones.
● Huber Loss - tf.keras.losses.Huber - Regression with outliers; combines MSE and MAE behaviour.
● Binary Cross-Entropy - tf.keras.losses.BinaryCrossentropy - Binary classification tasks (e.g. spam detection, medical diagnosis).
● Categorical Cross-Entropy - tf.keras.losses.CategoricalCrossentropy - Multi-class classification when labels are one-hot encoded.
● Sparse Categorical Cross-Entropy - tf.keras.losses.SparseCategoricalCrossentropy - Same as Categorical Cross-Entropy but with integer labels; used when labels are integer-encoded instead of one-hot.
● Kullback-Leibler Divergence (KL Divergence) - tf.keras.losses.KLDivergence - Probability distributions in variational autoencoders and reinforcement learning.
● Cosine Similarity Loss - tf.keras.losses.CosineSimilarity - Tasks that compare the direction of vectors, such as embedding similarity.
● Hinge Loss - tf.keras.losses.Hinge - Used for Support Vector Machines (SVMs) and max-margin classification tasks.
● Squared Hinge Loss - tf.keras.losses.SquaredHinge - Similar to hinge loss but penalizes large margin violations more.
● Poisson Loss - tf.keras.losses.Poisson - Used when modelling count-based data (e.g. predicting the number of events occurring).
● Log Cosh Loss - tf.keras.losses.LogCosh - Regression tasks; similar to Huber loss but smoother.
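As a rough sketch, each of these tf.keras loss classes can either be called directly on (y_true, y_pred) tensors or passed to model.compile(); the tensors below are made-up examples.

import tensorflow as tf

y_true = tf.constant([[1.0], [0.0], [1.0]])
y_pred = tf.constant([[0.9], [0.2], [0.6]])

mse = tf.keras.losses.MeanSquaredError()
bce = tf.keras.losses.BinaryCrossentropy()
print("MSE:", mse(y_true, y_pred).numpy())
print("BCE:", bce(y_true, y_pred).numpy())

# The same objects (or their string names) can be passed to compile(), e.g.:
# model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy())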

📌 Regression
●​ MSE - Mean Squared Error
●​ MAE - Mean Absolute Error
●​ Huber Loss
✒️ MSE - Mean Squared Error (MSE) is a commonly used loss function for
regression models. It measures the average squared difference between actual
and predicted values.

✒️ MAE - Mean Absolute Error (MAE) is a commonly used loss function for
regression models. It measures the average absolute difference between actual
and predicted values.

✒️ Huber Loss - Huber Loss is a robust loss function that effectively handles
outliers by combining Mean Squared Error (MSE) and Mean Absolute Error (MAE).
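A quick numeric sketch of how these three regression losses react to an outlier; the sample values below are made up, with the last target acting as the outlier.

import tensorflow as tf

y_true = tf.constant([3.0, 5.0, 2.0, 100.0])   # the last target is an outlier
y_pred = tf.constant([2.8, 5.1, 2.2, 8.0])

print("MSE  :", tf.keras.losses.MeanSquaredError()(y_true, y_pred).numpy())   # dominated by the squared outlier error
print("MAE  :", tf.keras.losses.MeanAbsoluteError()(y_true, y_pred).numpy())  # grows only linearly with the outlier
print("Huber:", tf.keras.losses.Huber(delta=1.0)(y_true, y_pred).numpy())     # quadratic for small errors, linear for large ones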

📌 Classification
●​ Binary cross-entropy
●​ Categorical cross-entropy
● Sparse categorical cross-entropy

✒️ Binary cross-entropy - Binary Cross-Entropy (BCE) is a loss function commonly used in binary classification problems in machine learning and deep learning. It measures the difference between the true labels and the predicted probabilities.

✒️ Categorical cross-entropy - Categorical Cross-Entropy (CCE) is a commonly used loss function in machine learning and deep learning for multi-class classification problems where each input belongs to one of several categories.

✒️ Sparse categorical cross-entropy - Sparse Categorical Cross-Entropy (SCCE) is a loss function used for multi-class classification when labels are integers. It is an optimized version of Categorical Cross-Entropy (CCE) for sparse (integer) labels.
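A minimal sketch contrasting the label formats the three classification losses expect; the probabilities and labels below are illustrative values.

import tensorflow as tf

# Binary classification: one probability per sample, labels are 0/1
bce = tf.keras.losses.BinaryCrossentropy()
print(bce(tf.constant([[1.0], [0.0]]), tf.constant([[0.9], [0.1]])).numpy())

# Multi-class with one-hot labels -> Categorical Cross-Entropy
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(tf.constant([[0.0, 1.0, 0.0]]), tf.constant([[0.1, 0.8, 0.1]])).numpy())

# Same problem with integer labels -> Sparse Categorical Cross-Entropy
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(tf.constant([1]), tf.constant([[0.1, 0.8, 0.1]])).numpy())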

📌 Training Steps
●​ Forward pass
●​ Gradient computation / Backward propagation
●​ Optimization / Update weights & biases

✒️ Forward pass - The forward pass is the process in a neural network where input data flows through the layers, being transformed by weights, biases, and activation functions, to generate an output (prediction).
✒️ Backward propagation - Backpropagation is an optimization algorithm used to
train neural networks by adjusting weights based on the error from the forward
pass. It works by propagating the error backwards through the network and
updating weights using gradient descent.

✒️ Optimization algorithm - An optimization algorithm in AI is used to adjust the model's parameters (weights and biases) to minimize the loss function and improve performance. A very common algorithm is "Adam".
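The three training steps appear together in a custom training step written with tf.keras. This is a rough sketch: the model architecture, batch size, and random data below are assumptions for illustration only.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

x = tf.random.normal((8, 4))                              # dummy batch of 8 samples
y = tf.cast(tf.random.uniform((8, 1)) > 0.5, tf.float32)  # dummy binary labels

with tf.GradientTape() as tape:
    y_pred = model(x, training=True)                      # 1. forward pass
    loss = loss_fn(y, y_pred)                             #    compute the loss

grads = tape.gradient(loss, model.trainable_variables)           # 2. backward propagation (gradients)
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # 3. update weights & biases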

✒️ Optimizers
● Gradient Descent (GD) - Best for: large datasets with offline batch updates. Pros: converges to the optimal solution. Cons: slow; requires the entire dataset for each update.
● Stochastic Gradient Descent (SGD) - Best for: large datasets, real-time updates. Pros: fast; updates weights per sample. Cons: noisy updates may not converge.
● Mini-Batch Gradient Descent - Best for: a balance between GD & SGD. Pros: faster than GD, more stable than SGD. Cons: still has some variance.
● Momentum - Best for: training deep networks with oscillations. Pros: faster convergence; smooths updates. Cons: requires tuning β.
● Nesterov Accelerated Gradient (NAG) - Best for: cases of slow convergence. Pros: better than Momentum on convex loss surfaces. Cons: requires an extra gradient computation.
● Adagrad - Best for: sparse data (NLP, embeddings). Pros: adapts the learning rate per parameter. Cons: the learning rate decreases too much over time.
● RMSprop - Best for: RNNs, NLP, and non-stationary loss functions. Pros: reduces the learning-rate issues of Adagrad. Cons: requires tuning β.
● Adam (Adaptive Moment Estimation) - Best for: the default optimizer for deep learning. Pros: fast, adaptive, and works well for most cases. Cons: uses more memory.
● AdamW (Adam with Weight Decay) - Similar to Adam but includes weight decay. Best for: regularized deep networks. Pros: prevents overfitting. Cons: requires tuning the weight-decay factor.
● AdaDelta - Best for: avoiding manual learning-rate tuning. Pros: no need for a fixed learning rate. Cons: computationally expensive.
● Nadam (Nesterov-Adam) - Adam + Nesterov momentum. Best for: cases of slow convergence. Pros: combines the benefits of NAG & Adam. Cons: may not always be better than Adam.
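Most of these optimizers ship with tf.keras.optimizers and can be swapped in at compile time. The sketch below is illustrative: the learning rates and other settings are assumptions, and AdamW is only available as tf.keras.optimizers.AdamW in recent TensorFlow releases.

import tensorflow as tf

# Each optimizer is a drop-in replacement at compile time (settings are illustrative).
sgd      = tf.keras.optimizers.SGD(learning_rate=0.01)                               # plain SGD
momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)                 # Momentum
nag      = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)  # NAG
adagrad  = tf.keras.optimizers.Adagrad(learning_rate=0.01)
rmsprop  = tf.keras.optimizers.RMSprop(learning_rate=0.001)
adam     = tf.keras.optimizers.Adam(learning_rate=0.001)
adamw    = tf.keras.optimizers.AdamW(learning_rate=0.001, weight_decay=1e-4)         # recent TF versions
adadelta = tf.keras.optimizers.Adadelta()
nadam    = tf.keras.optimizers.Nadam(learning_rate=0.001)

# model.compile(optimizer=adam, loss="sparse_categorical_crossentropy", metrics=["accuracy"])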

✒️ Activations - In AI, the activation of a neuron refers to the output value of a neuron after applying an activation function to the weighted sum of its inputs. This determines whether the neuron should "fire" and pass information to the next layer.

✒️ Activation Functions
●​ Sigmoid
●​ ReLU (Rectified Linear Unit)
●​ Leaky ReLU
●​ Tanh
●​ Softmax
✒️ Activation functions
● Sigmoid - σ(x) = 1 / (1 + e^(-x)) - Binary classification, output layer.
● Tanh - tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) - Hidden layers; better than Sigmoid for centering data.
● ReLU - f(x) = max(0, x) - Most common in hidden layers of deep networks.
● Leaky ReLU - f(x) = x if x > 0, else αx (small α, e.g. 0.01) - Avoids the "dying ReLU" problem; better for negative inputs.
● Softmax - softmax(x_i) = e^(x_i) / Σ_j e^(x_j) - Multi-class classification, output layer.
● ELU - f(x) = x if x > 0, else α(e^x - 1) - Deep networks; faster convergence than ReLU.
● Swish - f(x) = x · σ(x) - Advanced deep networks; often better than ReLU.
● Softplus - f(x) = ln(1 + e^x) - Smooth approximation of ReLU; avoids zero gradients.
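Most of these activations are exposed in tf.keras.activations. A short sketch: the input values below are arbitrary, and Leaky ReLU is shown via tf.nn.leaky_relu so the slope can be set explicitly.

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

print("sigmoid :", tf.keras.activations.sigmoid(x).numpy())
print("tanh    :", tf.keras.activations.tanh(x).numpy())
print("relu    :", tf.keras.activations.relu(x).numpy())
print("leaky   :", tf.nn.leaky_relu(x, alpha=0.01).numpy())
print("elu     :", tf.keras.activations.elu(x).numpy())
print("swish   :", tf.keras.activations.swish(x).numpy())
print("softplus:", tf.keras.activations.softplus(x).numpy())
print("softmax :", tf.keras.activations.softmax(tf.reshape(x, (1, -1))).numpy())  # softmax expects a batch dimension

# In a model they are usually passed by name, e.g. tf.keras.layers.Dense(64, activation="relu")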
