Ch4 and Ch5 Notes
✅ Section A – Short Answer Questions
1. Define deep learning and explain how it differs from traditional machine
learning.
Answer:
Deep Learning is a subset of machine learning involving neural networks with many
layers (deep networks).
o Traditional ML often relies on feature engineering.
o DL automatically extracts features through layers.
o DL performs better with large datasets and complex problems like vision and
speech.
2. What is the role of activation functions in neural networks?
Answer:
Activation functions introduce non-linearity into the network, enabling it to learn
complex patterns. Examples include ReLU, Sigmoid, and Tanh.
3. Explain the concept of overfitting. How can it be prevented?
Answer:
Overfitting occurs when a model performs well on training data but poorly on unseen
data.
Prevention Techniques:
o Regularization (L1/L2)
o Dropout
o Data Augmentation
o Early Stopping
o Cross-Validation
4. What are the main components of a convolutional neural network (CNN)?
Answer:
o Convolutional Layers
o Activation Functions (e.g., ReLU)
o Pooling Layers (Max/Avg Pooling)
o Fully Connected Layers
o Output Layer (Softmax for classification)
5. What is the vanishing gradient problem in deep networks?
Answer:
It's when gradients become very small during backpropagation, making training slow
or ineffective.
Solution: Use ReLU instead of sigmoid/tanh, or architectures like LSTM or ResNet.
6. Compare supervised learning, unsupervised learning, and reinforcement
learning.
Answer:
o Supervised: Uses labeled data (e.g., classification).
o Unsupervised: Finds patterns in unlabeled data (e.g., clustering).
o Reinforcement: Learns via rewards and penalties (e.g., game playing,
robotics).
7. What is a loss function? Give an example.
Answer:
A loss function measures how well a model's prediction matches the target.
Example: Cross-Entropy Loss for classification.
✅ Section B – Medium Answer Questions
1. Design and explain a deep learning model for image classification using CNN.
Answer:
o Input: Image (e.g., 224x224x3)
o Conv Layer: Extracts features using filters
o ReLU: Adds non-linearity
o Pooling Layer: Reduces dimensionality
o Repeat Conv + Pool layers
o Flatten Layer: Converts 2D to 1D
o Fully Connected Layer: Learns high-level features
o Softmax Output: Predicts class
o Training: Use cross-entropy loss, Adam optimizer, validation accuracy for
evaluation
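A minimal Keras sketch of this pipeline, assuming a hypothetical 10-class problem and 224x224 RGB inputs (layer sizes are illustrative, not prescriptive):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative CNN classifier following the steps above (conv -> pool, repeated, then dense).
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),  # Conv + ReLU
    layers.MaxPooling2D((2, 2)),                                              # Pooling
    layers.Conv2D(64, (3, 3), activation="relu"),                             # repeat Conv + Pool
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                                         # 2D feature maps -> 1D vector
    layers.Dense(128, activation="relu"),                                     # fully connected layer
    layers.Dense(10, activation="softmax"),                                   # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```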
2. Modern AI applications in healthcare, finance, and autonomous systems.
Answer:
o Healthcare: Diagnosis from scans (CNN), drug discovery, patient monitoring.
o Finance: Fraud detection, algorithmic trading, credit scoring.
o Autonomous Systems: Self-driving cars (computer vision + RL), drones,
robotics.
3. Ethical concerns in AI and explainable AI (XAI).
Answer:
o Concerns: Bias, lack of transparency, job displacement, data privacy.
o XAI Goals: Make AI decisions understandable and trustworthy.
o Techniques: SHAP, LIME, attention maps, rule extraction.
4. Transformer models in NLP.
Answer:
o Transformer: Attention-based architecture (no recurrence).
o Key Parts: Self-attention, multi-head attention, positional encoding.
o Used in: BERT, GPT, T5
o Applications: Translation, summarization, question answering.
Long Questions and Answers from Ch-4 and Ch-5
1. Define Deep Learning and Explain How It Differs from Traditional Machine
Learning
Deep Learning is a subset of Machine Learning (ML) that focuses on algorithms inspired by
the structure and function of the human brain, known as artificial neural networks. Deep
learning models consist of multiple layers of interconnected nodes (neurons) that process data
in a hierarchical manner. These models are capable of automatically learning representations
from raw data without the need for manual feature engineering.
Deep Learning excels at identifying patterns in large, complex datasets such as images, audio,
and natural language.
It involves neural networks with many hidden layers (hence the term "deep").
Each layer learns to detect different features. For example, in image recognition:
o Early layers learn edges and textures.
o Deeper layers learn shapes or entire objects.
These layers are trained using techniques like backpropagation and gradient
descent to minimize a loss function.
Example (classifying images of cats vs. dogs):
o Traditional ML: We'd need to manually define features like fur color, edge shapes, whiskers, etc., and feed them into a classifier (e.g., SVM).
o Deep Learning: A CNN (Convolutional Neural Network) can learn these features automatically from raw pixels.
✅ Conclusion:
Deep Learning represents a significant leap forward in the field of artificial intelligence.
While traditional machine learning still has its place, especially in low-data or interpretable
scenarios, deep learning is the go-to for tasks that involve complex patterns and large-scale
data, such as computer vision and natural language processing.
2. What is the Role of Activation Functions in Neural Networks?
✅ Introduction:
In neural networks, activation functions play a critical role in introducing non-linearity into
the model. Without them, even a deep neural network would behave like a simple linear
model, severely limiting its capacity to learn complex patterns in data.
An activation function determines the output of a neuron, given its input — essentially
deciding whether a neuron should be "activated" or not based on the input it receives.
✅ Why Activation Functions Are Necessary:
Neural networks are composed of layers of neurons. Each neuron performs a weighted sum
of its inputs and passes the result through an activation function.
Without activation functions, no matter how many layers are stacked, the output of
the network would still be a linear combination of the input — making it incapable
of solving non-linear problems.
Activation functions allow networks to model complex functions like decision
boundaries, image features, language semantics, etc.
✅ Types of Activation Functions:
o Sigmoid: Squashes values into the range (0, 1); useful for probabilities but can saturate and slow learning.
o Tanh: Squashes values into the range (−1, 1); zero-centered but can also saturate.
o ReLU: Outputs max(0, x); simple, fast, and the default choice for hidden layers.
o Softmax: Converts a vector of scores into a probability distribution; used in output layers for multi-class classification.
✅ Example:
The hidden layers might use ReLU for fast and effective training.
The output layer would use softmax to give the probability of each digit class (0–9).
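A small NumPy sketch of the two functions mentioned above, with made-up scores, to make the non-linearity concrete:

```python
import numpy as np

def relu(x):
    # ReLU: keep positive values, zero out negatives
    return np.maximum(0, x)

def softmax(x):
    # Softmax: turn raw scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])   # illustrative pre-activation values
print(relu(scores))                   # [2.  0.  0.5]
print(softmax(scores))                # probabilities over three classes, summing to 1
```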
Impact on Training:
The choice of activation function affects gradient flow and convergence speed. Sigmoid and tanh can saturate and cause vanishing gradients, while ReLU keeps gradients flowing for positive inputs and usually trains faster.
✅ Conclusion:
Activation functions are the heartbeat of neural networks. They enable deep learning
models to understand and learn from complex data by introducing non-linearities. The right
activation function, placed at the right location in the network, significantly affects model
performance, convergence speed, and accuracy.
3.What is Overfitting?
In deep learning, overfitting occurs when a neural network learns the training data too well,
including its noise and minor details, instead of learning the general patterns. As a result,
the model performs very well on training data but poorly on unseen data, which means it
has low generalization ability.
This is common in deep networks due to their high capacity — they have many layers and
parameters, making them capable of fitting even random patterns.
✅ How Overfitting Can Be Prevented:
1. Regularization (L1/L2)
o Adds a penalty to large weights to discourage complexity.
2. Dropout
o Randomly disables neurons during training to prevent the model from relying
too heavily on specific paths.
3. Early Stopping
o Monitor validation loss and stop training when it stops improving.
4. Data Augmentation
o For image/text tasks, create variations (rotations, flips, synonyms, etc.) to
increase training diversity.
5. Simpler Model Architecture
o Use fewer layers or neurons to reduce capacity.
6. Batch Normalization
o Helps stabilize training and adds slight regularization.
✅ Conclusion:
Overfitting is a key challenge in deep learning, but it can be effectively managed using
techniques like regularization, dropout, and data augmentation. The goal is to build a model
that performs well not just on training data, but also on new, unseen examples.
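A hedged Keras sketch showing three of these techniques together (L2 regularization, dropout, and early stopping); the dataset names x_train and y_train are placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 penalty on large weights
    layers.Dropout(0.5),                                              # randomly disable neurons during training
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Stop training when the validation loss stops improving (early stopping).
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.2, epochs=50, callbacks=[early_stop])
```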
4. What are the Main Components of a Convolutional Neural Network (CNN)?
A Convolutional Neural Network (CNN) is a deep learning model primarily used for image
recognition, classification, and processing tasks. CNNs are designed to automatically and
adaptively learn spatial hierarchies of features from input images. They are made up of
several key components:
✅ 1. Convolutional Layer
Applies learnable filters (kernels) that slide over the input to extract local features such as edges, textures, and shapes, producing feature maps.
✅ 2. Activation Function (e.g., ReLU)
Introduces non-linearity after each convolution so the network can learn complex patterns.
✅ 3. Pooling Layer
Reduces the spatial dimensions of the feature maps (e.g., width and height).
Common methods: Max Pooling and Average Pooling.
Helps with dimensionality reduction, computational efficiency, and controls overfitting.
✅ 4. Fully Connected Layers
After several convolutional and pooling layers, the output is flattened and fed into one or more dense (fully connected) layers.
These layers act as a classifier and produce the final prediction (e.g., image class).
✅ 5. Output Layer
The final fully connected layer typically uses an activation function like Softmax (for
multi-class classification) or Sigmoid (for binary classification).
Converts the scores into probabilities.
✅ Conclusion:
A CNN consists mainly of convolutional layers, activation functions, pooling layers, and
fully connected layers, with optional components like dropout for regularization. Together,
they enable CNNs to learn and recognize complex patterns in visual data efficiently.
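As a quick sanity check on how convolution and pooling change tensor shapes, the spatial output size of such a layer can be computed as (W - F + 2P) / S + 1, where W is the input width, F the filter size, P the padding, and S the stride. A small illustrative calculation:

```python
def output_size(w, f, p, s):
    # (input width - filter size + 2 * padding) // stride + 1
    return (w - f + 2 * p) // s + 1

# 224x224 input, 3x3 filter, no padding, stride 1 -> 222x222 feature map
print(output_size(224, 3, 0, 1))  # 222
# 2x2 max pooling with stride 2 roughly halves the size: 222 -> 111
print(output_size(222, 2, 0, 2))  # 111
```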
In the field of deep learning, different types of learning paradigms are used based on the
nature of the problem and the availability of labeled data. The three primary types are
supervised learning, unsupervised learning, and reinforcement learning. Each of them has
a different approach to how models learn from data.
✅ 1. Supervised Learning
➤ Definition:
Supervised learning is a type of learning where the model is trained on a labeled dataset,
which means each input has a corresponding correct output (label). The goal is for the
model to learn a mapping function from input to output.
➤ How it Works:
The model is trained on input-output pairs and adjusts its parameters to minimize the difference between its predictions and the true labels (e.g., image classification, spam detection, price prediction).
➤ Advantages:
High accuracy when enough labeled data is available; performance is easy to measure against the known labels.
➤ Disadvantages:
Requires large amounts of labeled data, which can be expensive and time-consuming to collect.
✅ 2. Unsupervised Learning
➤ Definition:
Unsupervised learning deals with unlabeled data. The goal is to discover hidden patterns or
structures in the data without explicit guidance.
➤ How it Works:
The model looks for structure in the data itself, for example by grouping similar samples (clustering, e.g., k-means) or compressing the data (dimensionality reduction, autoencoders).
➤ Advantages:
Works with unlabeled data, which is abundant and cheap to obtain.
➤ Disadvantages:
Results are harder to evaluate and interpret, since there is no ground truth to compare against.
✅ 3. Reinforcement Learning
➤ Definition:
Reinforcement learning is a type of learning in which an agent interacts with an environment, receives rewards or penalties for its actions, and learns a policy that maximizes the cumulative reward.
➤ How it Works:
The agent observes the current state, chooses an action, receives a reward, and updates its behaviour accordingly (e.g., Q-learning, policy gradients). Examples include game playing and robotics.
➤ Advantages:
Can learn complex behaviours from interaction alone, without labeled data.
➤ Disadvantages:
Training can require a very large number of interactions, may be unstable, and designing a good reward function is difficult.
✅ What is an RNN?
A Recurrent Neural Network (RNN) is a type of neural network designed for sequential data (text, speech, time series). It processes the input one step at a time and maintains a hidden state that carries information from previous steps, giving the network a form of memory.
✅ Architecture of RNN:
1. Input Layer:
Takes in a sequence of data (e.g., words in a sentence or stock prices over time).
2. Hidden Layer(s) (with recurrence):
o The key feature of RNNs.
o Each hidden state receives input from the current time step and from the
previous hidden state.
o This creates a feedback loop or "memory".
3. Output Layer:
Produces the prediction at each time step or at the final step, depending on the task
(e.g., next word prediction, sequence classification).
This loop allows the network to remember previous inputs, which is crucial for
understanding context in sequences.
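A minimal NumPy sketch of this recurrence, h_t = tanh(W_x * x_t + W_h * h_(t-1) + b), with made-up dimensions and random inputs standing in for a real sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5          # illustrative sizes
W_x = rng.normal(size=(hidden_dim, input_dim))    # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                          # initial hidden state (the "memory")
for t in range(seq_len):
    x_t = rng.normal(size=input_dim)              # stand-in for the t-th input in the sequence
    h = np.tanh(W_x @ x_t + W_h @ h + b)          # new state depends on the input AND the previous state
print(h.shape)                                    # (8,)
```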
✅ Applications:
Language modeling
Machine translation
Sentiment analysis
Speech recognition
Time series forecasting
Limitations:
Standard RNNs struggle to capture long-range dependencies because gradients tend to vanish (or explode) when backpropagated through many time steps.
To solve this, advanced variants like LSTM (Long Short-Term Memory) and GRU (Gated
Recurrent Unit) are commonly used.
✅ Conclusion:
RNNs are powerful tools in deep learning for sequence-based tasks. Their recurrent
architecture enables them to remember previous inputs, making them ideal for language,
time, and speech data. While they face some limitations with long sequences, these are
often addressed by using LSTM or GRU networks.
In deep learning, training a deep neural network (DNN) involves adjusting the network’s
weights and biases so that its predictions become more accurate. The two main techniques
used in this process are backpropagation and gradient descent.
✅ 1. Forward Pass
The input data is passed through the network layer by layer; each neuron computes a weighted sum of its inputs, applies an activation function, and passes the result forward. The final output is compared with the target using a loss function.
✅ 2. Backpropagation
The gradient of the loss with respect to every weight is computed by applying the chain rule backwards through the network, layer by layer.
✅ 3. Gradient Descent
Each weight is updated in the opposite direction of its gradient (new weight = old weight minus learning rate times gradient), gradually reducing the loss.
✅ Conclusion
The training process of a deep neural network relies on forward propagation to calculate
predictions, backpropagation to compute gradients, and gradient descent to update the
weights. This cycle, repeated over many iterations, allows the model to learn patterns in the
data and make accurate predictions.
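A tiny worked example of this cycle for a single linear neuron with a squared-error loss; all numbers are made up, and the point is only the forward pass -> gradient -> update loop:

```python
import numpy as np

x, y_true = np.array([1.0, 2.0]), 3.0       # one illustrative training example
w, b, lr = np.zeros(2), 0.0, 0.1            # weights, bias, learning rate

for step in range(100):
    y_pred = w @ x + b                      # forward pass
    loss = (y_pred - y_true) ** 2           # squared-error loss
    grad_w = 2 * (y_pred - y_true) * x      # backpropagation (chain rule)
    grad_b = 2 * (y_pred - y_true)
    w -= lr * grad_w                        # gradient descent update
    b -= lr * grad_b

print(round(float(w @ x + b), 3))           # prediction approaches 3.0
```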
Transfer Learning is a technique in machine learning and especially deep learning where a
model developed for one task is reused as the starting point for a model on a second task.
Instead of training a model from scratch, we transfer the knowledge gained from solving
one problem and apply it to a different but related problem.
This approach is especially useful when we do not have enough labeled data for the second
task, but there is a pre-trained model available that has been trained on a large dataset.
It is very effective in domains like computer vision, natural language processing, and
speech recognition where models are data and compute intensive.
1. Computer Vision
One common example is using Convolutional Neural Networks (CNNs) trained on the ImageNet dataset. ImageNet has over 14 million images labeled across 1000 classes.
Example: A CNN pre-trained on ImageNet can be reused for a new task such as classifying medical images: the early convolutional layers (which detect generic edges and textures) are frozen, and only the later layers are retrained on the new, smaller dataset.
This approach has been shown to outperform models trained from scratch on small datasets.
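A hedged Keras sketch of this kind of fine-tuning, using MobileNetV2 as an example backbone and an assumed 5-class target problem (both choices are illustrative):

```python
import tensorflow as tf

# Load convolutional features pre-trained on ImageNet, without the original classifier head.
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # new head for the (hypothetical) 5 target classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) would now train only the new head on the small target dataset.
```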
2. Natural Language Processing
Modern NLP heavily relies on transfer learning using models like BERT, GPT, and RoBERTa.
Example: A pre-trained BERT model can be fine-tuned on a small labeled dataset for a downstream task such as sentiment analysis or question answering.
3. Self-driving Cars
Perception models pre-trained on large general-purpose image datasets can be adapted to driving-specific tasks such as detecting lanes, pedestrians, and traffic signs.
Limitations of Transfer Learning:
If the source and target tasks/domains are too different, transfer may not help.
Negative transfer: Performance might degrade if irrelevant features are transferred.
Requires understanding of the source model architecture and data.
Conclusion
Transfer learning is a powerful and practical technique in deep learning. It allows leveraging
existing knowledge to solve new, complex tasks with limited data and computational power.
As models grow larger and data becomes more expensive, transfer learning continues to be
a key strategy in developing efficient and intelligent AI systems.
A Generative Adversarial Network (GAN) is a deep learning framework in which two neural networks are trained in competition with each other to generate realistic synthetic data. GANs are especially powerful for tasks involving image generation, video synthesis, voice generation, and more.
GANs work on the idea of two neural networks competing with each other in a game-
theoretic setup:
1. Generator (G): Learns to generate fake data that looks like the real data.
2. Discriminator (D): Learns to distinguish between real and fake data.
GAN Architecture
The generator takes random noise as input and produces synthetic samples, while the discriminator receives real or generated samples and outputs the probability that a sample is real.
The training involves two loss functions and is done in alternating steps: first the discriminator is updated to better separate real samples from generated ones, then the generator is updated to produce samples that fool the discriminator.
Variants of GANs
Common variants include DCGAN (convolutional GANs), Conditional GANs (cGANs), CycleGAN (unpaired image-to-image translation), and StyleGAN (high-quality image synthesis).
Conclusion
Generative Adversarial Networks are one of the most exciting developments in deep
learning. They introduce a new way of training models using adversarial loss and open doors
to highly creative and realistic data generation. Despite challenges in training, their potential
in AI applications is immense.
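A compact PyTorch sketch of the alternating training described above, using toy one-hidden-layer networks and made-up 2-D "real" data; this is an illustration of the two-player update, not a production GAN:

```python
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))        # generator
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) + 3.0                 # stand-in for real data
    fake = G(torch.randn(64, noise_dim))                   # generator output from random noise

    # 1) Update the discriminator: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # 2) Update the generator: try to make the discriminator label fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```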
11.Compare and contrast LSTM and GRU units.
Both LSTM and GRU units are gated variants of the RNN designed to combat the vanishing gradient problem and capture long-range dependencies.
o LSTM (Long Short-Term Memory): Maintains a separate cell state and uses three gates (input, forget, output) to control what is stored, updated, and exposed.
o GRU (Gated Recurrent Unit): Merges the cell state and hidden state and uses only two gates (update and reset), so it has fewer parameters and trains faster.
o In practice, GRUs often perform comparably to LSTMs while being cheaper to train; LSTMs can have an edge on tasks requiring very long-term memory.
12.Explain the Transformer architecture.
Transformers are a type of deep learning model that utilizes self-attention mechanisms to
process and generate sequences of data efficiently. They capture long-range dependencies
and contextual relationships making them highly effective for tasks like language modeling,
machine translation and text generation.
Below, we explore the architecture and working of transformers through their key components, mathematical formulations, and how they function during training and inference.
The transformer model is built on an encoder-decoder architecture where both the encoder
and decoder are composed of a series of layers that utilize self-attention mechanisms and
feed-forward neural networks. This architecture enables the model to process input data in
parallel making it highly efficient and effective for tasks involving sequential data.
The encoder and decoder work together to transform the input into the desired output such as
translating a sentence from one language to another or generating a response to a query.
Key Components of the Transformer
1. Encoder
The encoder consists of multiple identical layers, each containing two sub-layers: a multi-head self-attention mechanism and a position-wise feed-forward neural network. Layer normalization and residual connections are used around each of these sub-layers to ensure stability and improve convergence during training.
2. Decoder
The decoder in the transformer also consists of multiple identical layers. Its primary function is to generate the output sequence based on the representations provided by the encoder and the previously generated tokens of the output. Each decoder layer has three sub-layers:
1. Masked Self-Attention: Attends only to previously generated tokens so that the decoder cannot look ahead at future positions.
2. Encoder-Decoder Attention: Attends to the encoder's output to incorporate information from the input sequence.
3. Feed-Forward Neural Network: This sub-layer processes the combined output of the masked self-attention and encoder-decoder attention mechanisms.
1. Input Representation
The first step in processing input data involves converting raw text into a format that the transformer model can understand. This involves tokenization and embedding.
o Tokenization: The input text is split into smaller units called tokens, which can be words, subwords, or characters. Tokenization ensures that the text is broken down into manageable pieces.
o Embedding: Each token is mapped to a dense vector, and positional encodings are added so the model knows the order of the tokens.
2. Encoder Process
1. Input Embedding: The input sequence is tokenized and converted into embeddings with positional encodings added.
2. Self-Attention Mechanism: Each token in the input sequence attends to every other token to capture dependencies and contextual information.
3. Feed-Forward Network: The output of the self-attention step for each position is passed through a position-wise feed-forward network.
3. Decoder Process
1. Input Embedding and Positional Encoding: The partially generated output sequence is tokenized and embedded with positional encodings added.
2. Masked Self-Attention: Each position attends only to earlier positions in the partially generated output.
3. Encoder-Decoder Attention: The decoder attends to the encoder's output to use information from the input sequence.
4. Feed-Forward Network: Similar to the encoder, the output from the attention mechanisms is passed through a position-wise feed-forward network.
Transformers are trained using supervised learning where the model learns to predict the next
token in a sequence given the previous tokens.
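The central operation inside both the encoder and the decoder is scaled dot-product self-attention. A minimal single-head NumPy sketch (all dimensions are made up for illustration):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Scaled dot-product self-attention for a single head (no masking).
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # how much each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                               # illustrative sizes
X = rng.normal(size=(seq_len, d_model))               # token embeddings + positional encodings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)         # (4, 8): one context-aware vector per token
```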
13.Explain LLM.
This answer explores the evolution, architecture, applications, and challenges of LLMs, focusing on their impact in the field of Natural Language Processing (NLP).
A large language model is a type of artificial intelligence algorithm that applies neural
network techniques with lots of parameters to process and understand human languages or
text using self-supervised learning techniques. Tasks like text generation, machine
translation, summary writing, image generation from texts, machine coding, chat-bots, or
Conversational AI are applications of the Large Language Model.
Many techniques have been tried for natural language tasks, but LLMs are based purely on deep learning methodologies. LLMs are highly effective at capturing the complex relationships between entities in text and can generate text that respects the semantics and syntax of the language in question.
GPT-1, released in 2018, contains 117 million parameters and was trained on about 985 million words.
GPT-3, released in 2020, contains 175 billion parameters; ChatGPT is also based on this model family.
GPT-4 was released in early 2023 and is likely to contain trillions of parameters, though the exact figure has not been disclosed.
GPT-4 Turbo was introduced in late 2023, optimized for speed and cost-efficiency, but its parameter count remains unspecified.
Large Language Models (LLMs) operate on the principles of deep learning, leveraging neural
network architectures to process and understand human languages.
These models are trained on vast datasets using self-supervised learning techniques. The core
of their functionality lies in the intricate patterns and relationships they learn from diverse
language data during training. LLMs consist of multiple layers, including feedforward layers,
embedding layers, and attention layers. They employ attention mechanisms, like self-
attention, to weigh the importance of different tokens in a sequence, allowing the model to
capture dependencies and relationships.
Architecture of LLM
Large Language Model’s (LLM) architecture is determined by a number of factors, like the
objective of the specific model design, the available computational resources, and the kind of
language processing tasks that are to be carried out by the LLM. The general architecture of
LLM consists of many layers, such as feed-forward layers, embedding layers, and attention layers. The embedded input text is processed through these layers together to generate predictions. Important design considerations include:
o Input representations
o Self-attention mechanisms
o Training objectives
o Computational efficiency
Encoder: Based on a neural network technique, the encoder analyses the input text and creates a number of hidden states that preserve the context and meaning of the text data. Multiple encoder layers make up the core of the transformer architecture. The self-attention mechanism and the feed-forward neural network are the two fundamental sub-components of each encoder layer.
1. Self-Attention Mechanism: Self-attention enables the model to weigh the
importance of different tokens in the input sequence by computing attention
scores. It allows the model to consider the dependencies and relationships
between different tokens in a context-aware manner.
2. Feed-Forward Neural Network: After the self-attention step, a feed-forward
neural network is applied to each token independently. This network includes
fully connected layers with non-linear activation functions, allowing the model
to capture complex interactions between tokens.
Output Layers: The output layers of the transformer model can vary depending on the specific task. For example, in language modeling, a linear projection followed by Softmax activation is commonly used to generate the probability distribution over the next token.
It’s important to keep in mind that the actual architecture of transformer-based models can
change and be enhanced based on particular research and model creations. To fulfill different
tasks and objectives, several models like GPT, BERT, and T5 may integrate more
components or modifications.
Some of the famous LLMs that have been developed and are available for inference include GPT, BERT, and T5.
For implementation details, these models are available on open-source platforms like
Hugging Face and OpenAI for Python-based applications.
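For instance, the Hugging Face transformers library exposes pre-trained models through a simple pipeline API; a minimal sketch (the model choice and prompt are illustrative):

```python
from transformers import pipeline

# Downloads a small pre-trained model the first time it runs.
generator = pipeline("text-generation", model="gpt2")
result = generator("Deep learning is", max_new_tokens=20)
print(result[0]["generated_text"])
```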
Code Generation: LLMs can generate accurate code based on user instructions for
specific tasks.
Language Translation and Correction: LLMs can translate text between over 50
languages and correct grammatical errors.
Use cases of LLMs are not limited to those mentioned above. With well-crafted prompts, these models can perform a wide variety of tasks, since they are trained to work with one-shot and zero-shot learning as well. Because of this, prompt engineering has become a new and very active topic for people who want to use ChatGPT-style models extensively.
LLMs, such as GPT-3, have a wide range of applications across various domains. Few of
them are:
Content Generation:
o Creating human-like text for various purposes, including content creation,
creative writing, and storytelling.
o Writing code snippets based on natural language descriptions or commands.
Language Translation: Large language models can aid in translating text between
different languages with improved accuracy and fluency.
NLP (Natural Language Processing) is a field of artificial intelligence concerned with developing algorithms and techniques for understanding and processing human language. NLP is a broader field than LLMs; it combines two approaches, machine learning and the analysis of language data. Applications of NLP include improving search, among others. An LLM (Large Language Model), on the other hand, is more specific: it focuses on generating human-like text, content generation, and personalized recommendations.
Large Language Models (LLMs) come with several advantages that contribute to their
widespread adoption and success in various applications:
LLMs can perform zero-shot learning, meaning they can generalize to tasks for
which they were not explicitly trained. This capability allows for adaptability to new
applications and scenarios without additional training.
LLMs efficiently handle vast amounts of data, making them suitable for tasks that
require a deep understanding of extensive text corpora, such as language translation
and document summarization.
High Costs: Training LLMs requires significant financial investment, with millions
of dollars needed for large-scale computational power.
Time-Intensive: Training takes months, often involving human intervention for fine-
tuning to achieve optimal performance.
Data Challenges: Obtaining large text datasets is difficult, and concerns about the
legality of data scraping for commercial purposes have arisen.
Environmental Impact: Training a single LLM from scratch can produce carbon
emissions equivalent to the lifetime emissions of five cars, raising serious
environmental concerns.
Conclusion
Because of the challenges involved in training LLMs from scratch, transfer learning is promoted heavily as a way to avoid the difficulties discussed above. LLMs have the capability to revolutionize AI-powered applications, but further advancement is difficult: simply increasing model size improves performance only up to a point, after which performance saturates and the difficulty of handling such models outweighs the performance gained by making them even larger.
14.Explain FNN.
A Feedforward Neural Network (FNN) is the simplest type of artificial neural network: information flows in one direction, from the input layer through the hidden layers to the output layer, with no cycles or feedback connections. Its main components are:
1. Input Layer: The input layer consists of neurons that receive the input data. Each
neuron in the input layer represents a feature of the input data.
2. Hidden Layers: One or more hidden layers are placed between the input and output
layers. These layers are responsible for learning the complex patterns in the data.
Each neuron in a hidden layer applies a weighted sum of inputs followed by a non-
linear activation function.
3. Output Layer: The output layer provides the final output of the network. The number
of neurons in this layer corresponds to the number of classes in a classification
problem or the number of outputs in a regression problem.
Each connection between neurons in these layers has an associated weight that is adjusted
during the training process to minimize the error in predictions.
Activation Functions
Activation functions introduce non-linearity into the network enabling it to learn and model
complex data patterns.
Common activation functions include Sigmoid, Tanh, ReLU, and Softmax.
Training Process
1. Forward Propagation: During forward propagation the input data passes through the network and the output is calculated.
2. Loss Calculation: The loss (or error) is calculated using a loss function such as Mean Squared Error (MSE) for regression tasks or Cross-Entropy Loss for classification tasks.
3. Backpropagation: The gradients of the loss with respect to the weights are computed by applying the chain rule backwards through the network.
4. Weight Update (Gradient Descent): The weights are adjusted in the direction that reduces the loss. Gradient descent comes in several variants, depending on how much data is used for each update:
Batch Gradient Descent: Updates weights after computing the gradient over the
entire dataset.
Stochastic Gradient Descent (SGD): Updates weights for each training example
individually.
Mini-batch Gradient Descent: It Updates weights after computing the gradient over
a small batch of training examples.
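A minimal NumPy sketch of forward propagation through one hidden layer; the sizes and input are made up, and training would add the loss, backpropagation, and one of the gradient-descent variants listed above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 16, 3            # illustrative sizes
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_classes)), np.zeros(n_classes)

x = rng.normal(size=(1, n_features))                  # one input example
hidden = np.maximum(0, x @ W1 + b1)                   # hidden layer: weighted sum + ReLU
logits = hidden @ W2 + b2                             # output layer scores
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()  # softmax
print(probs)                                          # class probabilities summing to 1
```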
Evaluation Metrics
The performance of a trained classifier is evaluated using metrics such as:
Accuracy: The proportion of correctly classified instances out of the total instances.
Precision: The ratio of true positive predictions to the total predicted positives.
Recall: The ratio of true positive predictions to the actual positives.
F1 Score: The harmonic mean of precision and recall, providing a balance between
the two.
Confusion Matrix: A table used to describe the performance of a classification
model, showing the true positives, true negatives, false positives, and false negatives.
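These metrics can be computed directly with scikit-learn; a small example with made-up labels and predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1]   # made-up true labels
y_pred = [1, 0, 0, 1, 0, 1]   # made-up model predictions

print(accuracy_score(y_true, y_pred))    # 0.833...
print(precision_score(y_true, y_pred))   # 1.0  (no false positives)
print(recall_score(y_true, y_pred))      # 0.75 (one positive was missed)
print(f1_score(y_true, y_pred))          # about 0.857
print(confusion_matrix(y_true, y_pred))  # [[2 0]
                                         #  [1 3]]
```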
15. Discuss the key challenges and future directions of deep learning and AI.
✅ Key Challenges:
Bias and Fairness: AI systems often inherit biases from training data (racial, gender, socioeconomic). This can lead to unfair decisions in hiring, policing, loans, etc. Fairness in AI is still a developing area with no universal solution.
Computational Cost: Deep learning models like GPT, BERT, and DALL·E require huge computing power. Expensive hardware (GPUs/TPUs) and energy usage lead to environmental concerns (carbon footprint) and limit accessibility for small organizations or researchers.
Overfitting and Generalization: Deep learning models may overfit on training data and fail to generalize well to new, unseen data. This is especially problematic with small or noisy datasets.
Adversarial Attacks: Deep learning models can be fooled by tiny, imperceptible changes (adversarial examples). This poses a threat in security-sensitive applications (e.g., facial recognition, autonomous vehicles).
Lack of Reasoning: Deep learning models still struggle with commonsense knowledge, logic, and reasoning. They lack understanding of context and can't easily explain "why" they made a decision.
✅ Robust AI
AI models must be made more robust to handle real-world uncertainties and adversarial scenarios. This is especially important for autonomous cars, drones, and medical AI systems.
✅ Sustainable AI
The need for green AI — designing models that are both effective and energy-efficient — is
a growing concern.
Research is focusing on model compression, efficient architectures, and low-power AI
chips.
✅ Human-AI Collaboration
Future systems are expected to augment human decision-making rather than replace it, with people and AI working together on complex tasks.
✅ AI in Low-Resource Settings
Many AI advancements are centered in developed nations with access to data and compute.
Future AI systems should also benefit developing countries, low-resource languages, and
underserved populations.