UNIT-I: Introduction to Deep Learning and Neural Network Foundations
Learning Objectives
• Define Deep Learning and explain how it differs from traditional
Machine Learning.
• Describe the evolution of AI → ML → DL.
• Identify key applications of Deep Learning in real-world domains.
• Understand the basic building blocks of Deep Learning models
(neurons, layers, activation functions).
• Relate Deep Learning techniques to mathematical foundations in
linear algebra, probability, and calculus.
What is Deep Learning?
• Deep Learning is a subset of Machine Learning
that focuses on artificial neural networks with
multiple layers. Unlike traditional ML, which relies
heavily on manual feature extraction, Deep
Learning automatically learns features from raw
data through hierarchical representations.
• It mimics the working of the human brain’s neural
networks to recognize complex patterns in images,
text, speech, and other data.
• Key distinction:
• ML → Requires feature engineering.
• DL → Learns features automatically through multiple
layers of neurons.
Applications of Deep Learning
• Deep Learning has achieved remarkable success in many domains
because of its ability to automatically learn patterns from raw data
without explicit feature engineering. Unlike traditional ML, DL systems
can handle high-dimensional, complex data like images, speech, and
natural language.
Some major application areas are:
• Computer Vision
• Natural Language Processing (NLP)
• Speech Recognition & Audio Processing
• Healthcare
• Reinforcement Learning Applications
Historical Evolution of Deep Learning
• 1943: McCulloch–Pitts neuron (binary model of brain cells)
• 1950s: Perceptron (Rosenblatt)
• 1969: Minsky & Papert – XOR limitation exposed
• 1980s: Backpropagation revived interest
• 2006+: Hinton & others → Deep Learning boom
McCulloch–Pitts Neuron (1943)
• Inputs: binary values (0 or 1)
• Weights: each input has an associated weight indicating its importance
• Summation & Threshold: output 1 if Σ wᵢxᵢ ≥ θ (the threshold), else output 0
• Can compute basic Boolean functions (AND, OR, NOT)
The McCulloch–Pitts neuron is a simplified model of a biological neuron: it either “fires” (1) or “does not fire” (0) depending on whether the weighted sum of inputs crosses a threshold.
• Inputs (x) represent incoming signals (like dendrites).
• Weights (w) model the strength of each input.
• Threshold (θ) acts as a decision boundary for activation.
This model laid the groundwork for all subsequent artificial neurons and neural networks.
Truth Table (AND gate, with weights w₁ = w₂ = 1 and threshold θ = 2):
x₁  x₂  Σ wᵢxᵢ  output
0   0   0       0
0   1   1       0
1   0   1       0
1   1   2       1
This shows that the MP neuron implements the AND gate exactly with these weights and threshold.
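The threshold rule for the AND gate can be sketched in a few lines of code. The weights w₁ = w₂ = 1 and threshold θ = 2 are the conventional choice for AND, used here as an illustrative assumption:

```python
# A minimal sketch of a McCulloch-Pitts neuron: fire (1) when the
# weighted sum of binary inputs reaches the threshold, else stay silent (0).
def mp_neuron(inputs, weights, threshold):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# AND gate: with w1 = w2 = 1 and threshold 2, the neuron fires
# only when both inputs are 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron((x1, x2), weights=(1, 1), threshold=2))
```

Changing only the threshold to 1 would turn the same neuron into an OR gate, which is why the MP model can cover several Boolean functions with one mechanism.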
Perceptron Model – Rosenblatt (1957)
• Input → Weighted sum → Activation function
• Learning rule adjusts weights to minimize error
• Can solve only linearly separable problems (AND, OR).
• The perceptron is an improvement over
the McCulloch–Pitts neuron because it
learns the weights from data using
a simple learning rule.
• Forward pass: Computes output
based on current weights and bias.
• Error calculation: Compares
predicted output to actual target.
• Weight update: Adjusts weights
slightly in the direction that reduces
error.
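The forward pass, error calculation, and weight update described above can be sketched as a short training loop on the linearly separable AND problem. The learning rate, epoch count, and zero initialization are illustrative assumptions, not values from the slides:

```python
# A minimal sketch of Rosenblatt's perceptron learning rule on AND.
def step(z):
    # Step activation: fire (1) if the pre-activation is non-negative.
    return 1 if z >= 0 else 0

def train_perceptron(data, lr=0.1, epochs=20):
    w = [0.0, 0.0]  # weights, one per input
    b = 0.0         # bias (replaces the fixed threshold)
    for _ in range(epochs):
        for (x1, x2), target in data:
            # Forward pass: weighted sum plus bias through the step function.
            y = step(w[0] * x1 + w[1] * x2 + b)
            # Error calculation: compare prediction to the target.
            err = target - y
            # Weight update: nudge weights in the direction that reduces error.
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
preds = [step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in and_data]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating line; running the same loop on XOR data would never converge.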
Limitations:
• Cannot solve non-linearly separable
problems, e.g., XOR.
• Single-layer perceptrons have limited representational power.
Example
Limitations of the Perceptron
• Can only solve linearly separable problems.
• Fails on non-linear tasks (classic example: XOR).
• Single-layer perceptron has limited representation power —
cannot form internal feature combinations.
• Sensitive to choice of learning rate and initialization; no guarantee of
representing complex decision boundaries.
Mathematical proof (why XOR is not linearly separable)
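One standard version of the argument, sketched below. It assumes a single perceptron with weights w₁, w₂, bias b, and a step activation that outputs 1 when w₁x₁ + w₂x₂ + b ≥ 0:

```latex
% Suppose a single perceptron realized XOR. Its four input-output
% requirements impose four linear constraints:
\begin{align*}
(0,0)\mapsto 0 &: \quad b < 0\\
(0,1)\mapsto 1 &: \quad w_2 + b \ge 0\\
(1,0)\mapsto 1 &: \quad w_1 + b \ge 0\\
(1,1)\mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{align*}
% Adding the middle two inequalities gives w_1 + w_2 + 2b \ge 0,
% i.e. w_1 + w_2 + b \ge -b > 0 (since b < 0),
% contradicting the fourth constraint. Hence no such line exists.
```

Geometrically, the two points labelled 1, (0,1) and (1,0), sit on opposite corners of the unit square from the two points labelled 0, so no single straight line can separate the classes.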
Analogy of MLP
Backpropagation
Why Backpropagation?
• The forward pass computes outputs.
• But to learn, we must adjust the weights so that predictions match the targets.
• Backpropagation provides a systematic way to compute gradients of the error w.r.t. the weights.
1. Forward pass: compute outputs layer by layer.
2. Loss calculation: compare the prediction with the target (e.g., MSE, cross-entropy).
3. Backward pass: propagate the error backwards via the chain rule:
• Compute the error at the output.
• Distribute the error across the hidden layers.
• Update the weights using gradient descent.
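The three steps above can be sketched on a tiny network. XOR is used as the training task because it is exactly the problem a single perceptron cannot solve; the 2–4–1 architecture, sigmoid activations, MSE loss, learning rate, and iteration count are all illustrative assumptions:

```python
# A minimal sketch of backpropagation: a 2-4-1 sigmoid network
# trained on XOR with mean-squared-error loss and batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(Y):
    return float(np.mean((Y - T) ** 2))

# Parameters: 2 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

initial_mse = mse(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))

lr = 0.5
for _ in range(5000):
    # 1. Forward pass: compute outputs layer by layer.
    H = sigmoid(X @ W1 + b1)   # hidden activations
    Y = sigmoid(H @ W2 + b2)   # network output

    # 2. Loss gradient at the output; dY folds in sigmoid'(z) = y(1 - y).
    dY = (Y - T) * Y * (1 - Y)

    # 3. Backward pass: distribute the error to the hidden layer (chain rule).
    dH = (dY @ W2.T) * H * (1 - H)

    # Gradient-descent weight updates.
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

final_mse = mse(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))
```

With the hidden layer in place, gradient descent can drive the XOR loss well below its starting value, which is precisely what the single-layer perceptron could not achieve.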