Understanding Loss & Regularization in Deep Learning
Presented by:
Dr. Pandiyaraju V
Abishek Karthik
Sreya Mynampatti
What Are Underfitting & Overfitting?
Models can either underfit (learn too little) or overfit (learn too
much and memorize). Both are issues in building reliable
models.
• Underfitting: Model is too simple to capture patterns.
• Overfitting: Model learns noise and performs poorly on
unseen data.
• Caused by improper architecture, too little or too much
training, or lack of regularization.
• Goal: Find the sweet spot — just right model complexity.
What is Underfitting?
Underfitting happens when the model cannot learn the
underlying trend of the data.
• Model is too shallow or linear for complex data.
• High bias – makes strong assumptions, ignores important
signals.
• Training and validation loss both remain high.
• Often fixed by increasing model complexity or training time.
📌 Think: "Model didn’t even try hard enough."
Why Does Overfitting Happen?
Overfitting makes a model memorize training data rather than
learn general patterns.
• Too many parameters (deep/wide model) on small data.
• Trained too long without checks.
• Noisy or unbalanced datasets.
• Lack of regularization techniques.
📌 Think: Memorization vs Understanding.
What is a Loss Function?
• The loss function tells us how wrong the model's prediction is.
It’s the core metric that we minimize during training.
• Measures the difference between predicted and actual
values.
• Helps update weights via backpropagation.
• Lower loss → better model performance.
📌 Example:
• Mean Squared Error (MSE): MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
• Binary Cross-Entropy: BCE = −(1/n) Σᵢ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]
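For instance, a minimal NumPy sketch of these two formulas (the toy arrays y_true and y_pred are made up purely for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # actual labels
y_pred = np.array([0.9, 0.2, 0.8, 0.4])   # model's predicted probabilities
print("MSE:", mse(y_true, y_pred))
print("BCE:", binary_cross_entropy(y_true, y_pred))
```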
Types of Loss Functions
Different tasks use different loss functions depending on output
type.
• MSE – For regression problems.
• MAE (Mean Absolute Error) – Less sensitive to outliers.
• Binary Cross-Entropy – For binary classification.
• Categorical Cross-Entropy – For multi-class classification.
📌 Choose loss based on the task (regression or classification).
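As a rough illustration of this choice, a hedged Keras sketch (assuming TensorFlow is installed; the layer sizes, 10-feature input shape, and 5 output classes are placeholders, not recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Regression: single linear output, MSE loss (MAE is an option when outliers dominate)
reg_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),
])
reg_model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Binary classification: sigmoid output, binary cross-entropy
bin_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
bin_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Multi-class classification: softmax output, categorical cross-entropy
multi_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(5, activation="softmax"),
])
multi_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```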
How Loss Drives Learning (Backprop Recap)
Loss is used to calculate gradients and update model weights.
• Forward pass: model makes predictions.
• Compute loss between predicted and actual.
• Backward pass: gradients of loss w.r.t. weights are calculated.
• Optimizer adjusts weights to reduce loss.
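A minimal sketch of a single training step, assuming TensorFlow/Keras; the architecture, batch shapes, and learning rate are arbitrary placeholders:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)   # forward pass: predictions
        loss = loss_fn(y, y_pred)          # compute loss between predicted and actual
    grads = tape.gradient(loss, model.trainable_variables)            # backward pass
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # weight update
    return loss

# Toy batch just to exercise the step
x = tf.random.normal((16, 4))
y = tf.random.normal((16, 1))
print("loss after one step:", float(train_step(x, y)))
```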
What is Regularization?
Regularization is a technique to prevent overfitting and improve
generalization.
• Adds constraints or penalties to the model.
• Helps avoid learning too complex patterns or noise.
• Encourages simpler models that perform better on unseen
data.
📌 Key idea: add “discipline” to the learning process.
L1 and L2 Regularization
Both add penalties to the loss function but in different ways.
• L1 Regularization (Lasso): adds absolute value of weights
→ Encourages sparsity (some weights become zero).
• L2 Regularization (Ridge): adds squared weights
→ Shrinks weights smoothly, avoids large weights.
📌 Use L1 for feature selection, L2 for smooth generalization.
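One possible way to attach these penalties in Keras (the penalty strengths 0.01 and 0.001 and the layer sizes are illustrative assumptions, not tuned values):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    # L2 (Ridge): adds 0.01 * sum of squared weights to the loss -> smooth shrinkage
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    # L1 (Lasso): adds 0.001 * sum of absolute weights -> pushes some weights to zero
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l1(0.001)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```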
Dropout Regularization
Dropout is a simple yet powerful technique used during training.
It randomly disables neurons during training to prevent them from co-adapting.
• Forces redundancy in learning.
• Reduces risk of overfitting.
• Dropout rate = probability a neuron is turned off.
• Common in dense layers of neural networks.
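A small Keras sketch; the dropout rates of 0.5 and 0.3 are common starting points, used here only for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),   # each neuron of the previous layer is dropped with probability 0.5
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),   # a lighter dropout rate deeper in the network
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# Dropout is active only during training; at inference time all neurons are used.
```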
Early Stopping
Sometimes, more training does more harm than good.
Early stopping halts training when performance on validation
data starts declining.
• Monitors validation loss.
• Stops training before overfitting kicks in.
• Saves compute time and avoids degrading the model.
• Often paired with checkpoints (best model saving).
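A possible wiring in Keras, assuming a recent TensorFlow version; the toy data, patience value, and checkpoint filename are assumptions for illustration:

```python
import numpy as np
import tensorflow as tf

# Toy regression data just to demonstrate the callbacks
x = np.random.rand(500, 8)
y = x.sum(axis=1, keepdims=True) + 0.1 * np.random.randn(500, 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

callbacks = [
    # Stop once validation loss has not improved for 5 epochs; keep the best weights
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Save the best model seen so far as a checkpoint
    tf.keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_loss",
                                       save_best_only=True),
]

model.fit(x, y, validation_split=0.2, epochs=100, callbacks=callbacks, verbose=0)
```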
Batch Normalization
BatchNorm improves training speed and model stability.
It normalizes layer outputs to reduce internal covariate shift.
• Normalizes inputs across each batch.
• Speeds up convergence.
• Slight regularization effect.
• Often placed after fully connected or conv layers.
📌 Helps with vanishing/exploding gradients.
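A sketch of where BatchNorm can sit in a Keras model; placing it before the activation is one common choice, placing it after is another:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(128),
    layers.BatchNormalization(),   # normalize this layer's outputs across the batch
    layers.Activation("relu"),
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```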
Data Augmentation
Data augmentation generates more training data from existing
samples.
This helps generalize better to unseen inputs.
• Apply transformations: rotate, zoom, flip, shift, crop.
• Improves robustness to real-world variations.
• Common in computer vision tasks.
• Simulates unseen inputs without collecting new data.
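A sketch using Keras preprocessing layers, assuming a recent TensorFlow version; the transform ranges and the CIFAR-10-sized input are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation applied on the fly inside the model; active only during training
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),         # rotate by up to ±10% of a full turn
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),  # e.g. CIFAR-10-sized images
    data_augmentation,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```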
Summary – Tackling Overfitting
Let’s recap what we’ve learned so far about regularization
techniques.
These methods help build reliable models.
• Reduce model complexity (fewer neurons/layers).
• Add dropout in training.
• Use L1/L2 to control weights.
• Apply early stopping when val loss increases.
• Normalize inputs with BatchNorm.
• Expand data using augmentation.
📌 Combine methods for stronger generalization.
When to Use What?
There’s no one-size-fits-all — choose techniques based on your
task and data.
Here’s a rough guide:
• Small dataset → data augmentation + L2 regularization.
• Large model → dropout + L1 regularization.
• Noisy data → early stopping + robust loss (like MAE).
📌 Always watch validation metrics to avoid overfitting.
Your Takeaway
Training a deep model is not just about reducing error — it’s
about generalizing well.
A well-regularized model is both accurate and resilient.
• Don’t just memorize – learn patterns.
• Regularization is key to real-world deployment.
• Always monitor both training and validation curves.
🧠 Good models make good guesses on new data.
Code Time – Try it Yourself!
Let’s experiment with training and regularization in action!
📎 [Link] O92O5PQLqcqmc8?usp=sharing
• Try training without regularization.
• Add L2 or dropout – observe changes in loss/accuracy.
• Use early stopping or BatchNorm and compare results.
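Since the notebook link above is only a fragment, here is a self-contained starting sketch for the same experiment, using MNIST as a stand-in dataset; the architecture and all hyperparameters are arbitrary assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model(regularized):
    # Baseline vs. L2 + dropout; same architecture otherwise
    reg = regularizers.l2(1e-4) if regularized else None
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        layers.Flatten(),
        layers.Dense(256, activation="relu", kernel_regularizer=reg),
        layers.Dropout(0.3 if regularized else 0.0),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)

for name, regularized in [("baseline", False), ("regularized", True)]:
    model = build_model(regularized)
    model.fit(x_train, y_train, validation_split=0.2, epochs=20,
              callbacks=[early_stop] if regularized else None, verbose=0)
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name}: test loss = {test_loss:.4f}, test accuracy = {test_acc:.4f}")
```

Compare the training and validation curves of the two runs to see the effect of regularization.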
Challenging Task
• Task 1: Image Classification with the CIFAR-10 Dataset
Train two artificial neural networks (ANNs) on the CIFAR-10 dataset using mini-batch gradient descent. Apply hyperparameter tuning to both models, using different regularization techniques for each. Evaluate the models' performance with visualizations of loss and accuracy, and explain with reasoning which model performed better.
• Task 2: Predicting House Prices with the Boston Housing Dataset
Implement an artificial neural network (ANN) for regression on the Boston Housing dataset, applying mini-batch gradient descent, hyperparameter tuning, and various regularization techniques. Assess the model using Mean Squared Error and visualize training progress.
• Dataset link: [Link]