Gen Ai Mynotes
Activation Functions

Definition:
An activation function is a mathematical function in a neural network that decides whether a
neuron should be activated. It adds non-linearity so the network can learn complex patterns.
Why Needed?
• Without it, the network acts like a simple linear model.
• Helps learn non-linear decision boundaries.
• Makes deep learning powerful for images, speech, and text.

Choosing Functions:
• Hidden layers → ReLU / Leaky ReLU / ELU
• Binary classification output → Sigmoid
• Multiclass classification output → Softmax
• RNN hidden layers → tanh or ReLU
Common Problems:
• Vanishing Gradient: Sigmoid, tanh cause very small gradients.
• Dead Neurons: ReLU units can get stuck outputting 0 permanently ("dying ReLU").
• High computation: Softmax, ELU are slower.
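The functions above can be sketched in plain Python (the 0.01 slope for Leaky ReLU is a common default, not from these notes):

```python
import math

def sigmoid(x):
    # Squashes input to (0, 1); used for binary classification outputs.
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Zero for negative inputs; a common default for hidden layers.
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small slope for negative inputs helps avoid "dead" neurons.
    return x if x > 0 else alpha * x

def softmax(scores):
    # Converts raw scores into probabilities that sum to 1.
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```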
Introduction to Deep Learning
Definition:
Deep Learning is a branch of Machine Learning that uses artificial neural networks with many
layers to automatically learn patterns and features from data.

Scope:
• Handles large & complex datasets (images, speech, text).
• Learns features automatically (no manual feature extraction).
• Used in AI systems that require high accuracy and adaptability.

Applications:
• Image recognition (face detection, medical imaging)
• Speech recognition & virtual assistants (Siri, Alexa)
• Natural Language Processing (chatbots, translation)
• Autonomous vehicles
• Recommendation systems (Netflix, YouTube)

Historical Context & Evolution:


• 1950s: Early neural network concepts (Perceptron).
• 1980s: Backpropagation popularized for training networks.
• 2000s: Growth due to big data and powerful GPUs.
• Present: Advanced architectures like CNNs, RNNs, Transformers.

Deep Learning vs Machine Learning:


Machine Learning                          | Deep Learning
Learns from manually extracted features   | Learns features automatically
Works well with small datasets            | Needs large datasets
Algorithms: Decision Trees, SVM           | Algorithms: CNN, RNN, Transformers
Low computational cost                    | High computational cost (needs GPUs/TPUs)
Feature engineering is crucial            | Minimal feature engineering needed
Easier to interpret                       | Often a "black box" (harder to explain decisions)

Popular Open-Source Libraries:


• TensorFlow – Google
• PyTorch – Meta
• Keras – High-level API for TensorFlow
• MXNet – Amazon
• Caffe – Image-focused
Artificial Neurons – Basics with Structure
An Artificial Neuron is the basic unit of an Artificial Neural Network (ANN), inspired by the
working of biological brain neurons. It processes inputs to produce an output.
Structure:
1. Inputs (x₁, x₂, …, xn): Values or features given to the neuron.
2. Weights (w₁, w₂, …, wn): Each input has a weight that decides its importance.
3. Summation Function: Adds all weighted inputs (Σ wᵢxᵢ).
4. Bias (b): A constant value added to help the neuron make better decisions.
5. Activation Function: Decides whether the neuron should activate and controls the
output range (e.g., Sigmoid, ReLU).
6. Output (y): Final value sent to the next neuron or layer.
Working:
• Multiply each input by its weight.
• Add all results and bias.
• Pass the sum through the activation function to get output.
Formula:
y = Activation(Σ(wᵢxᵢ) + b)
Key Points:
• Mimics biological neuron behavior.
• Helps the network learn patterns.
• Multiple neurons together form layers of ANN.
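The formula above can be coded directly; the input values, weights, and bias below are illustrative, and sigmoid is used as the activation:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias: z = Σ(wᵢxᵢ) + b
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation (sigmoid here) squashes z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs and weights
y = neuron([1.0, 2.0], [0.5, -0.25], bias=0.1)
```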

Multiclass Classification (FFNN)


Definition: Predicts exactly one label from 3+ classes using a neural network.
1. Applications – Used for digit recognition (MNIST), classifying images (dog/cat/bird),
sorting text topics, or diagnosing multiple diseases.
2. Architecture – Input layer for features → hidden layers with ReLU/tanh → output layer with
one neuron per class using Softmax.
3. Challenges – Faces issues like imbalanced classes, overfitting, high computation, and
confusion between similar classes.
4. Techniques to Improve – Apply data augmentation, dropout/L2 regularization, better
optimizers (Adam/RMSprop), early stopping, and balanced data.
5. Training Process – Do a forward pass, calculate categorical cross-entropy loss,
backpropagate errors, update weights, and repeat till epochs finish.
6. Example – Classify pictures as cat, dog, or horse using a Softmax output layer.
7. Output Activation – Softmax converts raw scores to probabilities summing to 1 for all
classes.
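A quick sketch of point 7: Softmax turning raw class scores into probabilities, then picking the most likely class (the scores and the cat/dog/horse ordering are made up):

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for cat / dog / horse
probs = softmax([2.0, 1.0, 0.1])

# Predicted class = index of the highest probability
pred = max(range(len(probs)), key=lambda i: probs[i])
```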
Backpropagation
Definition:
Algorithm to train neural networks by finding how much each weight contributed to error and
updating it to reduce loss.
Why Needed:
• Too many weights for random adjustment.
• Tells exactly which weights to change and by how much.
Steps:
1. Forward Pass: Input → layers → prediction.
2. Loss Calculation: Compare prediction with target (Cross-Entropy, MSE).
3. Backward Pass: Calculate gradients via chain rule from output layer backwards.
4. Weight Update: w ← w − η · ∂L/∂w,
using Gradient Descent / Adam / RMSProp.


5. Repeat: Train for many epochs.
Math Insight:
Chain rule: ∂L/∂w = (∂L/∂a) · (∂a/∂z) · (∂z/∂w), where z is the weighted input and a the activation.
Error flows backward layer-by-layer.


Types of Backpropagation
1. Static – For feed-forward nets, fixed connections (e.g., image classification).
2. Recurrent – For RNNs, unfolded through time.
3. Online – Update after each sample (fast but noisy).
4. Batch – Update after full batch (stable but slower).
Advantages:
• Works for deep networks.
• Efficient & scalable.
• Trains millions of parameters.
Challenges:
• Vanishing / exploding gradients.
• Large data need.
• Local minima.
Improvements:
• ReLU/Leaky ReLU, Batch Norm.
• Momentum optimizers (Adam, RMSProp).
• Dropout, L2 regularization.
Example:
MNIST digit recognition — updates weights to fix misclassifications.
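The forward pass, loss, backward pass, and update steps can be sketched for a single sigmoid neuron with squared-error loss (all values below are illustrative):

```python
import math

# Single-input sigmoid neuron, loss L = (y - t)^2 / 2
w, b, lr = 0.5, 0.0, 0.1
x, t = 1.0, 1.0                      # input and target

# 1. Forward pass
z = w * x + b
y = 1.0 / (1.0 + math.exp(-z))
loss_before = 0.5 * (y - t) ** 2

# 2-3. Backward pass via the chain rule: dL/dw = (y - t) * y(1 - y) * x
grad_w = (y - t) * y * (1 - y) * x
grad_b = (y - t) * y * (1 - y)

# 4. Gradient-descent update
w -= lr * grad_w
b -= lr * grad_b

# Loss after one update is lower
z = w * x + b
y = 1.0 / (1.0 + math.exp(-z))
loss_after = 0.5 * (y - t) ** 2
```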
Hyperparameters in a Fully Connected Neural Network (FCNN)
These are settings you choose before training:
1. Learning Rate – Controls how big each weight update is.
2. Number of Layers (Depth) – How many hidden layers the network has.
3. Neurons per Layer (Width) – Units in each hidden layer.
4. Batch Size – Number of samples processed before updating weights.
5. Epochs – How many times the model sees the entire dataset.
6. Activation Functions – e.g., ReLU, Sigmoid, Tanh.
7. Optimizer – e.g., SGD, Adam, RMSProp (affects training speed & quality).
8. Dropout Rate – % of neurons randomly disabled to prevent overfitting.
9. Weight Initialization Method – e.g., Xavier, He initialization.
10. Regularization Parameters – e.g., L1/L2 penalty to reduce overfitting.

Memory Requirements in FCNN


Memory is mainly used for storing:
1. Weights & Biases –
o Total grows with (neurons_in × neurons_out) per layer.
o Deeper/wider networks → more parameters → more memory.
2. Activations –
o Outputs of every neuron in each layer are stored during the forward pass (needed in backprop).
3. Gradients –
o Each weight and bias has a gradient stored for updating during backprop.
4. Batch Size Effect –
o Larger batch size = more activations & gradients stored in memory.
o Smaller batch size reduces memory but may slow training.
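A quick sketch of counting parameters and rough weight memory for a small FCNN; the layer widths and the 4-byte (float32) assumption are illustrative:

```python
# Hypothetical layer widths: 784 inputs -> 128 -> 64 -> 10 outputs
layers = [784, 128, 64, 10]

params = 0
for n_in, n_out in zip(layers, layers[1:]):
    # Each layer stores n_in * n_out weights plus n_out biases
    params += n_in * n_out + n_out

# Rough memory for weights & biases alone, assuming 32-bit (4-byte) floats
weight_bytes = params * 4
```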


Gradient Descent
Definition:
Optimization algorithm to minimize a loss function by iteratively
updating parameters in the opposite direction of the gradient until
convergence.

Steps
1. Initialize weights & biases.
2. Forward pass – compute predictions.
3. Compute loss (MSE, Cross-Entropy).
4. Backward pass – find gradients via backpropagation.
5. Update: θ ← θ − η · ∇L(θ).
6. Repeat until convergence/epochs complete.
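The steps above can be sketched on a made-up 1-D loss L(θ) = (θ − 3)², whose gradient is 2(θ − 3):

```python
theta = 0.0          # initial parameter
lr = 0.1             # learning rate (η)

for _ in range(100):
    grad = 2 * (theta - 3)      # gradient of L(θ) = (θ - 3)^2
    theta -= lr * grad          # move opposite the gradient
# theta converges toward the minimum at θ = 3
```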

Key Components
• Loss Function – guides optimization.
• Learning Rate (η) – step size (too high → overshoot, too low → slow).
• Gradient – steepest ascent direction; move opposite.
• Convergence Criteria – stop if loss change is minimal or epochs end.

Variants (Optimizers)
Momentum, NAG, AdaGrad, RMSProp, Adam.

Advantages
Simple, works on many problems, supports learning rate scheduling.
Limitations
Learning rate sensitive, may get stuck in local minima, expensive for
large data.
Types
1. Batch GD – full dataset per step (stable but slow, high memory).
2. Stochastic Gradient Descent (SGD)
• One sample per update → updates happen very frequently.
• Fast per iteration (only one sample's gradient to compute).
• Escapes local minima/saddle points due to noise in updates.
• High variance in gradient → path is zig-zag.
• Requires careful learning rate tuning to avoid divergence.
• Often used with shuffling to avoid bias from data order.

3. Mini-Batch Gradient Descent
• Uses small batches (e.g., 32–512 samples) per update.
• Faster than Batch GD due to parallel computation on batches.
• Less noisy than SGD → smoother convergence.
• Can leverage GPU acceleration for matrix operations.
• Better generalization than pure batch (due to mild noise).
• Batch size affects performance – too small = noisy, too big = slow.
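The three variants differ only in how many samples feed each update. A sketch of shuffling and batching (the toy data and batch size are made up):

```python
import random

data = list(range(10))          # 10 toy samples
batch_size = 4

random.shuffle(data)            # shuffle to avoid bias from data order
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# Batch GD: one update over all 10 samples per epoch.
# SGD: batch_size = 1 -> 10 noisy updates per epoch.
# Mini-batch: here 3 batches (sizes 4, 4, 2) -> 3 updates per epoch.
```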

Challenges with Gradient Descent (simple points)


1. Local Minima / Saddle Points – Can get stuck in bad points in the
loss surface.
2. Slow Convergence – May take many iterations, especially with poor
learning rate.
3. Learning Rate Sensitivity – Too high overshoots, too low is very
slow.
4. Vanishing / Exploding Gradients – Gradients become too small or
too large, hurting learning.
5. Non-convex Loss – Multiple minima make optimization harder.
6. Overfitting Risk – Model may learn noise if not regularized.
Overfitting in Neural Networks
• Definition: Model learns training data too well, including noise, and fails to
generalize to new/unseen data.
• Symptoms:
o High training accuracy but low validation/test accuracy.
o Gap between training and validation loss grows after a point.
• Causes:
o Too many parameters relative to data size.
o Training too long without regularization.
o Lack of sufficient/varied training data.
• Prevention Techniques:
o Regularization (L1, L2).
o Dropout layers.
o Early stopping during training.
o Data augmentation (for images, text, etc.).
o Reducing model complexity.
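Early stopping can be sketched as watching validation loss with a patience counter; the loss sequence and patience value below are made up:

```python
# Hypothetical validation losses per epoch: improves, then overfits
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.61]

patience = 2            # stop after 2 epochs without improvement
best = float("inf")
wait = 0
stopped_at = len(val_losses)

for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, wait = loss, 0     # improvement: reset patience counter
    else:
        wait += 1
        if wait >= patience:
            stopped_at = epoch   # stop training here
            break
```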

Dropout in Neural Networks


• Definition: Regularization technique where randomly selected neurons are
ignored (“dropped”) during training.
• Mechanism:
o At each training step, a fraction p of neurons is temporarily removed along with their connections.
o During inference (testing), no neurons are dropped; instead, activations are scaled by the keep probability (1 − p) to match the training distribution.
• Purpose:
o Prevents co-adaptation of neurons.
o Forces the network to learn redundant representations → improves generalization.
• Advantages:
o Simple and effective against overfitting.
o Works well for large and deep networks.
• Hyperparameter:
o Dropout rate (p) → typical values are 0.2–0.5.
• Limitations:
o Increases training time.
o May hurt performance if the dataset is very small or the dropout rate is too high.
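A sketch of "inverted" dropout, the variant most frameworks use in practice: surviving activations are scaled up by 1/(1 − p) during training, so inference needs no change. The drop rate and activation values are illustrative:

```python
import random

def dropout(activations, p, training):
    # p is the drop probability; keep probability is 1 - p
    if not training or p == 0.0:
        return activations          # inference: pass through unchanged
    keep = 1.0 - p
    # Zero each unit with probability p, scale survivors by 1/keep
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=True)
# Each output is either 0 or the input scaled by 1/(1 - p) = 2
```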
Delta Rule
• Also called Widrow-Hoff rule or Least Mean Squares (LMS) rule.
• Used in supervised learning for updating weights in a perceptron or simple
neural network.

• Goal: Minimize the mean squared error between predicted and target
output.
• Works well when activation is linear and problem is continuous-valued.
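The Widrow-Hoff update Δwᵢ = η (t − y) xᵢ can be sketched for a linear unit; the inputs, target, and learning rate below are illustrative:

```python
def delta_rule_step(weights, x, target, lr):
    # Linear unit: y = Σ wᵢxᵢ
    y = sum(w * xi for w, xi in zip(weights, x))
    error = target - y
    # Delta rule: Δwᵢ = η (t - y) xᵢ
    return [w + lr * error * xi for w, xi in zip(weights, x)]

# Repeated updates drive the output toward the target
w = [0.0, 0.0]
for _ in range(50):
    w = delta_rule_step(w, [1.0, 1.0], target=2.0, lr=0.1)
```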

Learning Rate (η)


1. Definition – Controls how much the weights change during each update
step.
2. Small η – More stable convergence but slower learning.
3. Large η – Faster learning but risk of overshooting and divergence.
4. Adaptive Strategies – Can change over time (learning rate decay) or adapt
per parameter (Adam, RMSProp).
5. Balance Needed – Too small → very slow progress and possible stalling; too large → oscillations
or instability.
6. Experimentation Required – Often tuned using trial-and-error or grid search.
7. Influence on Accuracy – Right value ensures both fast and accurate
convergence.
8. Relation to Delta Rule – η in delta rule directly affects the magnitude of
weight change.
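Learning rate decay (point 4) can be sketched as a simple exponential schedule; the initial rate and decay factor are made up:

```python
lr0 = 0.1           # initial learning rate
decay = 0.9         # multiplicative decay per epoch

lrs = [lr0 * decay ** epoch for epoch in range(5)]
# The rate shrinks each epoch: large steps early, finer steps near convergence.
```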
Local Minima
• In training, the loss function is like a hilly surface. A local minimum is a
point where the loss is lower than nearby points but not the lowest
possible.
• Getting stuck here means the model may not reach the best performance.
• Some local minima can still work fine, especially if they are wide and flat.
• Can happen more often in small models; large neural networks often have
many good minima.
• Can cause overfitting if the minimum fits training data too tightly.
• We can reduce the risk by using methods like random restarts, momentum,
or Adam optimizer.

Flat Regions (Plateaus)


• These are areas in the loss surface where the value is almost the same in all
directions.
• Here, the gradient is near zero, so learning slows down or pauses.
• Often caused by dead neurons in ReLU layers or saturated activation
functions (like sigmoid/tanh).
• Some flat regions are actually saddle points that confuse gradient descent.
• Can make training take much longer if the learning rate is too small.
• Using momentum, adaptive optimizers (Adam, RMSProp), or batch
normalization helps escape faster.
