Deep Learning: CSE-AIML 7th Sem (2022-2026)
UNIT-1
Introduction to Deep Learning
Deep learning is a subset of machine learning that focuses on training artificial neural networks
to perform complex tasks by learning from large amounts of data. It has gained immense
popularity and success in various fields, including computer vision, natural language
processing, speech recognition, and more. The term "deep" in deep learning refers to the use
of multiple layers in neural networks to capture intricate patterns and representations within
the data.
At its core, deep learning is loosely inspired by the way the human brain processes information. It involves
constructing artificial neural networks composed of interconnected nodes, or neurons,
organized into layers. Each neuron performs a simple computation and passes its output to the
next layer. The layers are typically organized into an input layer, one or more hidden layers,
and an output layer.
The key components of deep learning include:
1. Neural Networks: These are the fundamental building blocks of deep learning.
Neurons in a neural network receive inputs, apply weights to those inputs, and pass the
weighted sum through an activation function to produce an output. The architecture of
neural networks can be simple, like feedforward networks, or more complex, like
recurrent and convolutional networks.
2. Activation Functions: These functions introduce non-linearity to the neural network,
enabling it to capture complex relationships in data. Common activation functions
include ReLU (Rectified Linear Unit), sigmoid, and tanh (see the sketch after this list).
3. Layers: Neural networks consist of multiple layers, each responsible for extracting
different features from the input data. Hidden layers between the input and output layers
enable the network to learn hierarchical representations of the data.
4. Training Data: Deep learning models require a significant amount of labeled training
data to learn from. The model adjusts its internal parameters (weights and biases) during
training to minimize the difference between its predictions and the actual target values.
5. Loss Function: Also known as the cost function, it quantifies the difference between
the predicted values and the actual values. The goal of training is to minimize this loss
function, which guides the model towards making more accurate predictions.
6. Backpropagation: This is the core learning algorithm in deep learning. It involves
calculating the gradient of the loss function with respect to the model's parameters and
then adjusting those parameters using optimization techniques, such as gradient
descent, to minimize the loss.
7. Optimization Algorithms: Various optimization algorithms, like stochastic gradient
descent (SGD), Adam, and RMSProp, are used to iteratively adjust the model's
parameters in the direction that reduces the loss function.
8. Overfitting and Regularization: Deep learning models can become too complex and
perform well on training data but poorly on new, unseen data. Regularization
techniques, such as dropout and L2 regularization, help prevent overfitting by
controlling the model's capacity.
9. Hyperparameters: These are parameters set before training, such as learning rate,
batch size, and number of hidden layers. Tuning hyperparameters is crucial for
achieving optimal performance.
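To make components 1 and 2 concrete, here is a minimal sketch of a single neuron and the common activation functions, written in Python with NumPy. The variable names and values are illustrative, not taken from any particular library.

```python
import numpy as np

# Common activation functions (component 2)
def relu(z):
    return np.maximum(0.0, z)        # outputs z if positive, else 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes z into (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes z into (-1, 1)

# A single neuron (component 1): weighted sum of inputs plus a bias,
# passed through an activation function
def neuron(x, w, b, activation=relu):
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # three input features
w = np.array([0.4, 0.1, -0.2])   # one weight per input
b = 0.1                          # bias term
print(neuron(x, w, b))           # weighted sum is -0.42, so ReLU outputs 0.0
```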
Applications of Deep Learning:
1. Computer Vision:
• Image Classification: Deep learning models excel at classifying images into
various categories, enabling applications like identifying objects in photos.
• Object Detection: Deep learning can detect and localize multiple objects within
an image or video stream.
• Facial Recognition: Deep learning powers facial recognition systems used for
authentication and surveillance.
• Image Generation: Generative Adversarial Networks (GANs) can create
realistic images, with applications in art, design, and entertainment.
2. Natural Language Processing (NLP):
• Sentiment Analysis: Deep learning models analyze text to determine the
sentiment expressed, aiding in understanding public opinion.
• Machine Translation: NLP models like Transformers have revolutionized
machine translation by capturing contextual meanings.
• Chatbots and Virtual Assistants: Deep learning helps create interactive and
natural-sounding conversational agents.
• Text Generation: Models like GPT-3 can generate coherent and contextually
relevant text based on prompts.
3. Speech Recognition:
• Voice Assistants: Deep learning is integral to the accuracy of voice-controlled
devices like Siri, Alexa, and Google Assistant.
• Transcription: Deep learning helps transcribe spoken language into text,
benefiting applications like transcription services and closed captions.
4. Healthcare:
• Medical Imaging Analysis: Deep learning aids in diagnosing diseases by
analyzing medical images, such as detecting tumors in radiology images.
• Drug Discovery: Models predict molecular structures and interactions,
accelerating drug discovery processes.
• Personalized Medicine: Deep learning analyzes patient data to tailor treatments
based on individual characteristics.
5. Autonomous Vehicles:
• Perception and Control: Deep learning enables vehicles to perceive and react to
their surroundings, making autonomous driving possible.
• Object Tracking: Deep learning helps vehicles track pedestrians, other vehicles,
and road signs.
6. Finance:
• Algorithmic Trading: Deep learning models analyze financial data to make
trading decisions.
• Credit Scoring: Models assess creditworthiness by analyzing various data
points.
• Fraud Detection: Deep learning aids in detecting fraudulent transactions by
identifying unusual patterns.
Advantages of Deep Learning:
1. Feature Learning: Deep learning automatically learns relevant features from data,
reducing the need for manual feature engineering.
2. Hierarchy of Features: Deep architectures capture hierarchical representations,
allowing the model to learn complex patterns.
3. High Accuracy: Deep learning models often achieve state-of-the-art performance on
various tasks, outperforming traditional algorithms.
4. Flexibility: Deep learning models can be adapted to different domains by retraining or
fine-tuning on new data.
5. Generalization: Deep learning can generalize well to unseen data when properly
regularized, leading to robust models.
Disadvantages of Deep Learning:
1. Data Dependency: Deep learning models require large amounts of labeled data, which
can be challenging to acquire in certain domains.
2. Computational Demands: Training deep models can be computationally intensive and
may require specialized hardware like GPUs or TPUs.
3. Overfitting: Deep models can easily overfit to training data if not properly regularized,
leading to poor generalization.
4. Black Box Nature: Understanding the decision-making process of deep models can be
difficult due to their complex architecture.
5. Hyperparameter Tuning: Choosing the right hyperparameters can significantly
impact model performance, requiring experimentation.
6. Interpretability: Deep learning models lack transparency, making it challenging to
explain their decisions, especially in critical applications.
1. A Feedforward Neural Network (FNN):
A Feedforward Neural Network (FNN), also known as a multilayer perceptron (MLP), is one
of the simplest and most fundamental architectures in deep learning. It consists of multiple
layers of interconnected neurons, organized in a sequential manner. The key feature of a
feedforward neural network is that information flows in one direction—from the input layer
through the hidden layers to the output layer—without any feedback loops or recurrent
connections.
Here's a breakdown of the components and workings of a feedforward neural network:
Input Layer:
The input layer consists of nodes (neurons), each corresponding to one feature or attribute of
the input data.
Hidden Layers:
Hidden layers are layers that come after the input layer and before the output layer.
These layers contain nodes that perform computations on the input data to extract complex
features and representations.
Each node in a hidden layer receives inputs from all nodes in the previous layer and passes its
output to all nodes in the subsequent layer.
Output Layer:
The output layer produces the final predictions or classifications based on the computations
performed in the hidden layers.
The number of nodes in the output layer depends on the specific task, such as binary
classification, multi-class classification, regression, etc.
Activation Functions:
Each node in the hidden layers and the output layer applies an activation function to its input
to introduce non-linearity.
Common activation functions include ReLU (Rectified Linear Unit), sigmoid, tanh, and
softmax (for multi-class classification).
Weights and Biases:
Each connection between nodes in adjacent layers has associated weights and biases.
Weights determine the strength of the connection between nodes, and biases adjust the input
before the activation function is applied.
Forward Propagation:
In feedforward neural networks, information flows forward through the layers in a series of
mathematical computations.
The input data is multiplied by the weights, biases are added, and the result is passed through
the activation function to produce the output of each node.
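As a minimal sketch of forward propagation through one hidden layer (NumPy assumed; the layer sizes and random weights below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 5 hidden units, 3 output classes
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # output-layer weights and biases

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

def forward(x):
    h = relu(W1 @ x + b1)        # hidden layer: weights times input, plus bias, then activation
    return softmax(W2 @ h + b2)  # output layer: probabilities over the 3 classes

x = rng.normal(size=4)           # one input example
print(forward(x))                # three probabilities that sum to 1
```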
Loss Function:
The loss function measures the difference between the network's predictions and the actual
target values.
During training, the goal is to minimize this loss by adjusting the weights and biases.
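Two common choices are sketched below: mean squared error for regression and cross-entropy for classification. This is an illustrative NumPy sketch, not any framework's API.

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean squared error: average squared difference between predictions and targets
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(p_pred, y_onehot, eps=1e-12):
    # Cross-entropy: penalizes assigning low probability to the true class
    return -np.sum(y_onehot * np.log(p_pred + eps))

print(mse(np.array([2.5, 0.0]), np.array([3.0, -0.5])))               # 0.25
print(cross_entropy(np.array([0.7, 0.2, 0.1]), np.array([1, 0, 0])))  # ~0.357
```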
Backpropagation and Optimization:
Backpropagation is used to calculate the gradient of the loss function with respect to the model's
parameters (weights and biases).
Optimization algorithms like gradient descent or its variants are then employed to adjust the
parameters in a way that reduces the loss function.
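A hand-worked sketch for a tiny one-hidden-layer network with a squared-error loss is shown below; the gradients are derived with the chain rule, layer by layer, and then applied with a plain gradient-descent step (NumPy assumed; the sizes, seed, and learning rate are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)) * 0.5, np.zeros(3)   # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(1, 3)) * 0.5, np.zeros(1)   # 3 hidden units -> 1 output
x, y = np.array([1.0, -0.5]), np.array([0.8])         # one training example
lr = 0.1                                              # learning rate

for step in range(100):
    # Forward pass
    z1 = W1 @ x + b1
    h = np.tanh(z1)                    # hidden activation
    y_hat = W2 @ h + b2                # linear output
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass (chain rule, output layer first)
    d_yhat = y_hat - y                 # dL/dy_hat
    dW2 = np.outer(d_yhat, h)          # dL/dW2
    db2 = d_yhat                       # dL/db2
    dh = W2.T @ d_yhat                 # gradient flowing back into the hidden layer
    dz1 = dh * (1.0 - np.tanh(z1)**2)  # through the tanh non-linearity
    dW1 = np.outer(dz1, x)             # dL/dW1
    db1 = dz1                          # dL/db1

    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)  # close to 0 once the network fits the example
```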
Gradient Descent in Deep Learning
Gradient descent is a fundamental optimization algorithm used in deep learning to update the
parameters (weights and biases) of a neural network in order to minimize a given loss function.
The goal of gradient descent is to find the set of parameter values that results in the lowest
possible value of the loss function, effectively improving the model's performance on the task
at hand.
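In symbols, each parameter θ is repeatedly updated as θ ← θ − η·∂L/∂θ, where η is the learning rate. A minimal sketch on a one-dimensional loss (the function and learning rate are illustrative):

```python
# Gradient descent on L(w) = (w - 3)^2, whose gradient is dL/dw = 2*(w - 3)
w = 0.0      # initial parameter value
lr = 0.1     # learning rate (step size)

for step in range(50):
    grad = 2 * (w - 3)   # gradient of the loss at the current w
    w = w - lr * grad    # step in the direction that decreases the loss

print(w)  # converges toward the minimum at w = 3
```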
Historical Context of Deep Learning
1. Biological Inspiration and Early Models (1940s–1960s)
The roots of deep learning start not in computer science, but in neuroscience and
mathematical logic.
• 1943 — McCulloch & Pitts neuron: A simple mathematical model of a neuron that takes
binary inputs, applies weights, and outputs 1 or 0. Showed neurons could compute
logical functions.
• 1949 — Hebb’s Rule (The Organization of Behavior): Early learning rule: “Cells that fire
together wire together.” Inspired weight update rules in neural nets.
• 1958 — Rosenblatt’s Perceptron: First hardware-implemented trainable classifier. Could
learn to classify simple patterns. Media hyped it as the dawn of “thinking machines.”
Limitation: Perceptrons were single-layer → could not solve non-linearly separable
problems like XOR. This problem would haunt neural nets for decades.
2. The First AI Winter (1969–1980s)
The bubble popped fast.
• 1969 — “Perceptrons” book (Minsky & Papert): Proved mathematically that a
single-layer perceptron cannot compute XOR or other non-linearly separable
functions without multiple layers.
• Computing hardware was primitive — training even small networks was painfully
slow.
• Government & academic funding dried up, leading to the first AI winter (a period of
disillusionment and reduced research investment).
3. Backpropagation and the 1980s Revival
The next leap came from multi-layer networks and a way to train them.
• 1974 — Paul Werbos described backpropagation (gradient descent through layers)
in his PhD thesis.
• 1986 — Rumelhart, Hinton, and Williams popularized backpropagation in a landmark
paper.
• 1989 — Yann LeCun applied backprop to Convolutional Neural Networks (CNNs)
for digit recognition.
o Early success on ZIP code reading for the US Postal Service.
• Hopfield networks (John Hopfield) and Boltzmann machines (Hinton & Sejnowski)
expanded the scope of neural networks.
Limitation:
• Still small datasets (e.g., MNIST was only 60k images).
• No GPUs — training was extremely slow.
• Networks were shallow by today’s standards.
4. The Second AI Winter (Late 1990s–2005)
By the mid-90s, neural nets were losing the popularity contest.
• Support Vector Machines (SVMs), boosted by the kernel trick, along with decision
trees and other statistical learning methods, often outperformed NNs.
• Researchers felt NNs were:
o Computationally expensive
o Prone to overfitting
o Sensitive to hyperparameters
• Neural net research retreated to small niches like speech recognition.
5. Deep Learning Breakthrough (2006–2012)
The “deep” era began when several barriers fell at once.
2006 — Geoffrey Hinton et al. introduced Deep Belief Networks with layer-wise
pretraining:
• Train each layer unsupervised first → then fine-tune with backprop.
• Helped work around the vanishing-gradient problem that plagued deep networks.
2009–2012 — Key Enablers Appeared:
• Big data from the internet (ImageNet: 1.2M labeled images).
• GPU acceleration (driven by gaming industry).
• Better activation functions (ReLU) reduced vanishing gradients.
• New regularization techniques (Dropout).
2012 — AlexNet Moment:
• Alex Krizhevsky, Ilya Sutskever, and Hinton trained an 8-layer CNN on GPUs.
• ImageNet competition: top-5 error rate dropped from ~26% → ~15%.
• This was the spark for the modern deep learning boom.
6. The Modern Era (2013–present)
From 2013 onwards, deep learning exploded across domains.
• CNN dominance in vision tasks.
• RNNs & LSTMs became state-of-the-art in speech recognition and machine
translation.
• 2017 — Transformers (“Attention Is All You Need”): revolutionized NLP, and later
vision and multi-modal AI.
• Generative AI: GANs (2014), VAEs, Diffusion models, large language models (e.g.,
GPT series).
• Scaling laws showed: bigger datasets + bigger models + more compute → better
performance.
2. Motivation behind Deep Learning
The motivation for studying deep learning comes from a mix of practical power, scientific
curiosity, and career opportunity.
It’s one of those fields where the technology has both deep theoretical roots and massive real-
world impact.
1. It Solves Problems That Were Once “Impossible”
• Computer vision: Object recognition, medical image diagnosis, self-driving
perception.
• Natural language processing: Translation, summarization, chatbots, sentiment
analysis.
• Speech & audio: Voice assistants, transcription, emotion detection.
• Generative tasks: Creating realistic images, music, videos, and even new protein
structures.
Deep learning often outperforms traditional machine learning in complex, high-
dimensional, unstructured data problems.
2. It Scales with Data and Compute
• Traditional algorithms hit a performance ceiling even with more data.
• Deep learning models often get better as they grow in size and training data (scaling
laws).
• This means it’s not just a fixed algorithm — it’s an evolving paradigm that benefits
from advances in hardware and data availability.
3. It Bridges Disciplines
Studying deep learning gives you tools relevant to:
• AI research (theory, algorithms)
• Engineering (software optimization, systems design)
• Neuroscience (brain-inspired computation)
• Mathematics (linear algebra, probability, optimization)
• Ethics & policy (AI bias, transparency, regulation)
4. It’s at the Center of the AI Revolution
• The most impactful AI breakthroughs in the last decade (AlphaGo, ChatGPT,
DALL·E, AlphaFold) are deep learning–driven.
• Understanding deep learning means understanding the driving force behind modern
AI.
5. Career & Economic Motivation
• Demand for AI engineers, data scientists, and ML researchers is high and still
growing.
• Companies are embedding deep learning into almost every product domain (finance,
healthcare, robotics, education).
• Skills in deep learning open paths to research, startups, or high-impact industry roles.
6. Intellectual Challenge & Creativity
• Designing architectures and training strategies is as much an art as a science.
• You learn to think in terms of representation learning: how can a model learn
features automatically, rather than manually coding them?
• There’s joy in seeing a model “learn” patterns you didn’t explicitly program.
7. Preparing for the Future
• AI is becoming a foundational technology like electricity or the internet.
• Deep learning is the backbone of AI today — understanding it helps you adapt to
future shifts, whether in AGI research, ethics, or AI-driven science.
Motivation for Deep Learning:
DNNs, specifically CNNs, play a central role in data-driven applications such as noise
reduction, cancer detection, and weather forecasting, and have driven breakthroughs in image
recognition, computer vision, and other fields. However, the architectures developed for these
applications are often ad hoc, tailored to a specific set of input signals, and the theoretical
justification for important design choices is lacking. Some works attempt to explain a CNN’s
internal operations, but understanding these explanations requires a high level of expertise.
Even though existing DL models perform well on specific tasks, we still lack an in-depth
understanding of how their individual building blocks work. Without such understanding,
choosing the right configuration of a model remains guesswork, and many architectures are
built empirically, without examining whether the network’s depth (and parameter count) and
each building block are suitable for the task at hand. Studying these building blocks from a
mathematical point of view addresses these issues in a theoretically sound and explainable
way. The following are the main points of motivation:
1. Limitations of Traditional Machine Learning
• Traditional ML algorithms rely heavily on manual feature engineering.
• For complex data like images, speech, or medical scans, designing effective features
is difficult, time-consuming, and often suboptimal.
• Deep learning automatically learns hierarchical features directly from raw data.
2. Ability to Handle Complex and High-Dimensional Data
• Modern datasets (e.g., medical images, natural language, videos, genomics) are large,
unstructured, and high-dimensional.
• DL models, especially neural networks with multiple layers, are capable of capturing
nonlinear and intricate patterns in such data, outperforming shallow models.
3. Representation Learning
• Deep learning models can transform raw data into progressively abstract and
meaningful representations.
• Example: In image classification, lower layers detect edges, middle layers detect
shapes, and higher layers detect objects.
• This hierarchical feature extraction is a key motivation.
4. Success in Real-World Applications
• DL has achieved state-of-the-art results in many domains:
o Computer Vision (object detection, medical image analysis)
o Natural Language Processing (translation, chatbots)
o Speech Recognition
o Healthcare & Bioinformatics (disease prediction, drug discovery)
• Its ability to surpass human-level performance in certain tasks has fueled global
interest.
5. Scalability with Big Data and Hardware Advances
• Availability of massive datasets (e.g., ImageNet, EHR data, CT scans).
• Development of GPUs, TPUs, and cloud computing enables fast training of large
neural networks.
• This synergy of data + computing power motivates the adoption of DL.
6. End-to-End Learning
• DL allows end-to-end mapping from input to output without requiring handcrafted
intermediate steps.
• Example: In speech-to-text systems, the raw audio waveform can be directly mapped
to text without explicit phoneme extraction.
7. Continuous Improvement and Adaptability
• With more data, deeper architectures, and better optimization techniques, DL models
continuously improve performance.
• They also adapt well across tasks through transfer learning, making them versatile
for various domains.
Importance of Deep Learning
1. Automatic Feature Extraction
• Traditional ML requires manual feature engineering, which is domain-dependent
and time-consuming.
• Deep Learning automatically extracts hierarchical and abstract features directly
from raw data (images, text, speech), reducing reliance on expert-driven
preprocessing.
2. Superior Performance on Complex Data
• DL excels in analyzing large-scale, high-dimensional, and unstructured data.
• It effectively captures non-linear relationships and intricate patterns that shallow
models fail to represent.
3. Wide Range of Applications
• DL underpins breakthroughs in multiple domains:
o Healthcare: Disease detection, medical image segmentation.
o Computer Vision: Face recognition, self-driving cars.
o NLP: Chatbots, translation, sentiment analysis.
o Finance & Business: Fraud detection, stock prediction.
• This broad applicability highlights its transformative potential.
4. Scalability with Data Growth
• As data grows in size and complexity, DL models improve in accuracy and
robustness with more training data.
• This makes DL especially important in the era of big data.
5. End-to-End Learning
• DL allows direct mapping from raw input to desired output without handcrafted
intermediate steps.
• Example: Speech-to-text systems convert raw audio signals directly into text.
• This simplifies workflows and improves efficiency.
6. Support for Transfer Learning
• Pre-trained deep networks can be fine-tuned for new tasks with less data.
• This is crucial in domains like medical imaging, where labeled data is scarce.
7. Driving AI Advancements
• DL is the backbone of modern Artificial Intelligence.
• It has enabled systems to achieve or even surpass human-level performance in image
classification, speech recognition, and strategic gameplay (e.g., AlphaGo).