1. Historical Trends in Deep Learning
The evolution of deep learning has been shaped by decades of research, breakthroughs in
theory, advancements in hardware, and the availability of large datasets. Its roots can be
traced back to the 1940s with the introduction of the McCulloch-Pitts neuron, a simple
mathematical model of a biological neuron. This was followed by the development of the
Perceptron in the 1950s by Frank Rosenblatt, which demonstrated that machines could
learn from data. However, enthusiasm declined in the 1970s due to limitations in the
perceptron’s capability to solve non-linearly separable problems, such as the XOR
problem. A major resurgence occurred in the 1980s with the introduction of
backpropagation, a learning algorithm that allowed multi-layer neural networks to be
trained effectively. Despite this progress, deep learning faced another slowdown due to
limited computational power and insufficient data. The early 2000s marked a turning
point, with renewed interest driven by advances in GPU computing, large-scale datasets,
and improved algorithms. The breakthrough came in 2012 when a deep convolutional
neural network, AlexNet, won the ImageNet competition by a wide margin, showcasing
the power of deep architectures in visual recognition tasks. This success triggered a wave
of innovations in architectures, such as VGG, ResNet, and Transformers, expanding deep
learning’s impact across domains like computer vision, natural language processing, and
speech recognition. Over the years, the field has moved toward deeper, wider, and more
efficient models, along with the use of self-supervised learning, transfer learning, and
foundation models. Today, deep learning stands at the forefront of artificial intelligence,
with ongoing research aimed at improving interpretability, robustness, and efficiency.
2. McCulloch-Pitts Neuron
The McCulloch-Pitts neuron, proposed in 1943 by Warren McCulloch and Walter Pitts,
represents the earliest mathematical model of a biological neuron and laid the foundation
for artificial neural networks. It is a simple binary computational model designed to
mimic the functioning of a single neuron in the human brain. The model receives
multiple binary inputs, each representing the presence (1) or absence (0) of a signal.
These inputs are summed and compared against a fixed threshold. If the total input
exceeds the threshold, the neuron “fires” and outputs a 1; otherwise, it outputs a 0.
Mathematically, it can be represented as a threshold logic unit without the use of weights
or learning mechanisms. Although rudimentary, the McCulloch-Pitts neuron was capable
of computing basic logical functions such as AND, OR, and NOT, making it a crucial
stepping stone in understanding how complex computation can arise from simple units.
However, due to its limitations—such as the inability to learn from data or solve non-
linearly separable problems like XOR—it was later extended and improved upon in
models like the Perceptron. Despite its simplicity, the McCulloch-Pitts neuron remains
historically significant as the first formal attempt to bridge neuroscience and
computation, inspiring the development of modern neural network models.
Algorithm of McCulloch-Pitts Neuron
Inputs:
- A list of binary inputs: x1, x2, ..., xn (each input is either 0 or 1)
- A threshold value: θ (theta), a positive integer
Steps:
1. Initialize sum = 0
2. For each input xi (where i = 1 to n):
   - Add xi to sum
3. Compare the total sum with the threshold:
   - If sum >= θ: Output = 1
   - Else: Output = 0
Return:
Output (either 0 or 1)
Example – Implementing AND Gate with McCulloch-Pitts Neuron
Let:
Input x1 = 1
Input x2 = 1
Threshold θ = 2
Steps:
1. sum = x1 + x2 = 1 + 1 = 2
2. Since sum (2) >= threshold (2), the neuron fires:
Output = 1
Other combinations:
x1 = 1, x2 = 0 → sum = 1 → Output = 0
x1 = 0, x2 = 1 → sum = 1 → Output = 0
x1 = 0, x2 = 0 → sum = 0 → Output = 0
Thus, the McCulloch-Pitts neuron correctly performs the logic of an AND gate.
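The same algorithm can be expressed as a short Python sketch (the function name mp_neuron is an illustrative choice, not part of the original model):

def mp_neuron(inputs, theta):
    # McCulloch-Pitts unit: fire (1) when the sum of binary inputs
    # reaches the threshold theta; otherwise stay silent (0).
    return 1 if sum(inputs) >= theta else 0

# AND gate: with theta = 2, both inputs must be 1 for the neuron to fire.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], theta=2))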
3. Thresholding Logic
Thresholding logic is a fundamental concept in both biological and artificial neural
computation, where a decision is made based on whether an input signal surpasses a
predefined limit known as the threshold. In its simplest form, thresholding logic involves
summing input signals and comparing the result to a fixed threshold value. If the sum
equals or exceeds the threshold, the system activates or produces an output of 1;
otherwise, the output is 0. This binary decision-making process mimics the behavior of
neurons in the brain, which fire only when their inputs collectively reach a certain
activation level. Thresholding logic is the core mechanism behind early neural models
like the McCulloch-Pitts neuron and underlies the implementation of basic logic gates
such as AND, OR, and NOT. It plays a critical role in pattern recognition, classification,
and decision-making systems by enabling machines to differentiate between input
patterns based on activation criteria. Although modern neural networks use more
complex activation functions, thresholding remains a foundational idea that helps in
understanding how discrete decisions are made in both artificial and biological contexts.
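A small illustration of this idea, assuming the same summation rule as the McCulloch-Pitts unit above: changing only the threshold turns one and the same unit into different logic gates.

def threshold_unit(inputs, theta):
    # Output 1 when the summed input reaches the threshold.
    return 1 if sum(inputs) >= theta else 0

# For two binary inputs, theta = 2 realizes AND and theta = 1 realizes OR.
print(threshold_unit([1, 0], theta=2))  # 0 (AND of 1 and 0)
print(threshold_unit([1, 0], theta=1))  # 1 (OR of 1 and 0)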
4. Perceptron
The Perceptron is a fundamental model in the history of artificial neural networks,
introduced by Frank Rosenblatt in 1958 as a computational algorithm designed to mimic
the learning ability of the human brain. It is a supervised learning model used primarily
for binary classification tasks. A perceptron consists of a single layer of artificial neurons,
each of which receives multiple input signals, applies corresponding weights, sums them,
and passes the result through an activation function—typically a step function. If the
weighted sum exceeds a predefined threshold, the output is 1; otherwise, it is 0. Unlike
the earlier McCulloch-Pitts neuron, the perceptron is capable of learning by adjusting its
weights during training using an error-correction rule. The learning algorithm updates
weights iteratively based on the difference between the actual and desired output,
allowing the model to minimize classification errors over time. Despite its effectiveness
in solving linearly separable problems (like AND and OR), the perceptron cannot solve
problems involving non-linear decision boundaries, such as the XOR problem.
Nevertheless, the perceptron laid the groundwork for more advanced neural network
models, including multi-layer networks and modern deep learning architectures, making
it a pivotal innovation in the development of artificial intelligence.
5. Perceptron Learning Algorithm
The Perceptron Learning Algorithm is one of the earliest and simplest algorithms for
training a single-layer neural network for binary classification. It was introduced by
Frank Rosenblatt and is used to find the optimal weights of a perceptron based on labeled
training data. The perceptron processes each input vector by computing a weighted sum
and comparing it against a threshold. If the result is above the threshold, the output is 1;
otherwise, it is 0. The learning algorithm updates the weights whenever the output does
not match the target value, gradually reducing the classification error over time.
Inputs:
- A set of training examples: {(x1, y1), (x2, y2), ..., (xn, yn)}
  - Each input xi is a vector of features: [xi1, xi2, ..., xim]
  - Each target yi is either 0 or 1
- Learning rate: η (eta), typically a small positive number (e.g., 0.1)
- Initial weights: [w1, w2, ..., wm] and bias b (often set to 0)
Algorithm Steps:
1. Initialize weights (w1, w2, ..., wm) and bias b to small random values or zeros.
2. For each training sample (xi, yi):
   a. Compute the weighted sum:
      net = (w1 * xi1) + (w2 * xi2) + ... + (wm * xim) + b
   b. Apply the activation function (step function):
      If net >= 0, then predicted output ŷ = 1
      Else, predicted output ŷ = 0
   c. If the prediction is incorrect, update the weights and bias:
      - For each weight wj:
        wj = wj + η * (yi - ŷ) * xij
      - Update the bias:
        b = b + η * (yi - ŷ)
3. Repeat steps 2a to 2c for all samples across multiple passes (epochs) until the
weights stabilize (converge) or a maximum number of epochs is reached.
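These steps can be sketched in Python as follows, using the AND-gate truth table as a stand-in training set (names such as train_perceptron are illustrative assumptions):

def train_perceptron(samples, eta=0.1, epochs=20):
    # samples: list of (feature_vector, target) pairs with targets 0 or 1.
    m = len(samples[0][0])
    w = [0.0] * m   # weights initialized to zeros
    b = 0.0         # bias initialized to zero
    for _ in range(epochs):
        for x, y in samples:
            net = sum(wj * xj for wj, xj in zip(w, x)) + b
            y_hat = 1 if net >= 0 else 0
            err = y - y_hat
            if err != 0:  # update only on misclassification
                w = [wj + eta * err * xj for wj, xj in zip(w, x)]
                b += eta * err
    return w, b

# Learn the linearly separable AND function.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data))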
6. Representation Power of MLPs
Multilayer Perceptrons (MLPs) are powerful models in the field of deep learning due to
their ability to approximate complex functions. At their core, MLPs consist of multiple
layers of neurons where each layer applies a linear transformation followed by a non-
linear activation function. This combination enables MLPs to model highly non-linear
relationships between inputs and outputs. One of the most important theoretical results
about MLPs is the Universal Approximation Theorem, which states that a feedforward
neural network with at least one hidden layer containing a finite number of neurons can
approximate any continuous function on a compact domain, given suitable activation
functions. This means that MLPs have the potential to learn a wide variety of tasks
including classification, regression, and pattern recognition. The representation power of
an MLP increases with the number of hidden units and layers, allowing it to capture more
intricate data patterns. However, simply increasing the size of an MLP doesn't guarantee
better performance; it also requires proper training, regularization, and sufficient data.
Overall, the ability of MLPs to represent complex functions makes them a foundational
model in neural network-based machine learning.
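As a concrete illustration of this representational power, the XOR function that no single perceptron can compute is captured by a one-hidden-layer MLP with hand-picked weights; the weights below are one standard choice, shown for illustration rather than learned by training:

def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: one unit computes OR, the other computes NAND.
    h1 = step(x1 + x2 - 0.5)    # OR
    h2 = step(-x1 - x2 + 1.5)   # NAND
    # Output layer: AND of the two hidden units yields XOR.
    return step(h1 + h2 - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))  # prints 0, 1, 1, 0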
7. Sigmoid Neurons
Sigmoid neurons are a fundamental component of artificial neural networks, especially in
the context of binary classification problems. They use the sigmoid activation function,
defined as σ(z) = 1 / (1 + e^(-z)), where z is the weighted sum of inputs plus a bias term
(z = w·x + b). This function maps any real-valued number into a range between 0 and 1,
making it especially useful for predicting probabilities. One of the key advantages of the
sigmoid function is that it is smooth and differentiable, which allows for efficient
learning through gradient descent and backpropagation. Biologically, it mimics the way
real neurons activate gradually, firing more strongly with higher input stimuli. By
introducing non-linearity, sigmoid neurons enable networks to learn complex patterns
that linear models cannot. However, they are also known to suffer from the vanishing
gradient problem: for large positive or negative inputs, the gradient becomes very small,
which can hinder learning in deep networks. Despite this drawback, sigmoid neurons are
still foundational in neural network theory and are especially central to logistic regression
models. The output of a sigmoid neuron can be interpreted as the probability that the
input belongs to a particular class. Historically, sigmoid neurons were widely used before
more advanced activation functions like ReLU became standard. Their derivative, σ'(z) =
σ(z)(1 - σ(z)), plays a critical role in adjusting weights during the training process of
neural networks.
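A minimal Python sketch of the sigmoid and its derivative as defined above, showing how the gradient peaks at z = 0 and vanishes for large |z|:

import math

def sigmoid(z):
    # σ(z) = 1 / (1 + e^(-z)) maps any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # σ'(z) = σ(z) * (1 - σ(z)), largest at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-10, -2, 0, 2, 10):
    print(z, round(sigmoid(z), 4), round(sigmoid_derivative(z), 6))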
8. Gradient Descent
Gradient Descent is a fundamental optimization algorithm used in machine learning and
deep learning to minimize a loss or cost function by iteratively adjusting the model’s
parameters. The main idea is to find the direction in which the function decreases most
rapidly; this is done by computing the gradient (partial derivatives) of the cost function
with respect to each parameter. Starting from some initial values, the algorithm updates
each parameter in the opposite direction of the gradient, scaled by a factor called the
learning rate. This process continues until the algorithm converges to a minimum, ideally
the global minimum of the cost function. Gradient Descent is essential in training models
like neural networks, where manually finding optimal weights is infeasible due to high
dimensionality and complex surfaces.
Gradient Descent Algorithm
Algorithm: Gradient Descent
Input:
- A differentiable cost function J(θ)
- Learning rate α
- Initial parameters θ
- Convergence criteria (e.g., small change in cost or max number of iterations)
Steps:
1. Initialize the parameters θ randomly or with some guess.
2. Repeat until convergence:
   - Compute the gradient:
     ∇J(θ) = [∂J(θ)/∂θ₁, ∂J(θ)/∂θ₂, ..., ∂J(θ)/∂θₙ]
   - Update the parameters using:
     θ := θ - α × ∇J(θ)
3. Return the optimized parameters θ.
Description of the Algorithm
Gradient Descent is an iterative optimization process that helps minimize a given cost
function, typically representing the error of a machine learning model. The cost function
measures how well the model performs; a lower value means better accuracy. At each
step, the algorithm calculates the slope (or gradient) of the cost function with respect to
each parameter (like weights in a neural network). The gradient shows the direction of
the steepest increase in cost, so by moving in the opposite direction, we reduce the cost.
The step size is controlled by a value called the learning rate α, which must be chosen
carefully: too large a value may overshoot the minimum, while too small a value may slow down convergence.
The process repeats until the parameters settle around a minimum point where the model
performs optimally. Gradient Descent is widely used because it is simple, effective, and
scalable to large datasets and high-dimensional models.
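The algorithm is easy to sketch in Python for a one-parameter example; the cost J(θ) = (θ - 3)², with gradient 2(θ - 3) and minimum at θ = 3, is an assumed toy function for illustration:

def gradient_descent(grad, theta, alpha=0.1, max_iters=1000, tol=1e-6):
    # Step against the gradient until the update becomes negligible.
    for _ in range(max_iters):
        step = alpha * grad(theta)
        theta -= step
        if abs(step) < tol:  # convergence criterion
            break
    return theta

# Minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
print(gradient_descent(grad=lambda t: 2 * (t - 3), theta=0.0))  # ~3.0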
9. Feedforward Neural Networks
A Feedforward Neural Network (FNN) is a fundamental type of artificial neural
network architecture in which connections between the nodes do not form cycles.
It consists of an input layer, one or more hidden layers, and an output layer. Each
layer is made up of units called neurons, which are inspired by biological neurons.
In a feedforward network, data flows in one direction—from the input layer
through the hidden layers to the output layer. Each neuron in a layer is connected
to every neuron in the subsequent layer, and each connection is associated with a
numerical weight. When data is input into the network, it is multiplied by these
weights and passed through a non-linear activation function such as the sigmoid,
ReLU (Rectified Linear Unit), or tanh function. The output of each neuron
becomes the input to the neurons in the next layer. The final layer produces the
network’s prediction or output. During training, a learning algorithm like
backpropagation is used along with an optimization method such as gradient descent to
adjust the weights by minimizing the error between the predicted output and the
actual target. This learning process continues until the network’s performance
reaches a satisfactory level. FNNs are widely used for tasks such as classification,
regression, and pattern recognition due to their simplicity and ability to
approximate complex functions.
Feedforward Algorithm for Neural Networks
Input:
• Input feature vector X = [x1, x2, ..., xn]
• Weight matrices for each layer: W1, W2, ..., WL (where L is the number of layers)
• Bias vectors for each layer: b1, b2, ..., bL
• Activation function f (e.g., sigmoid, tanh, ReLU)
Output:
• Output vector Y_hat (predicted output)
Algorithm Steps:
1. Initialize input layer:
- Set A0 = X (This is the input to the first layer)
2. For each layer l = 1 to L:
- Compute the linear combination:
Zl = Wl * Al-1 + bl
- Apply activation function:
Al = f(Zl)
3. Output of the final layer:
- Y_hat = AL
Example for 3-layer Network (1 hidden layer + output):
Let:
• Input layer: X
• Hidden layer: W1, b1, activation f1
• Output layer: W2, b2, activation f2
Then:
Z1 = W1 * X + b1
A1 = f1(Z1)
Z2 = W2 * A1 + b2
Y_hat = f2(Z2)
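The same forward pass can be written as a short NumPy sketch; the layer sizes and random weights below are arbitrary stand-ins for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Arbitrary sizes: 3 inputs, 4 hidden units, 1 output.
X = rng.random(3)
W1, b1 = rng.random((4, 3)), rng.random(4)
W2, b2 = rng.random((1, 4)), rng.random(1)

Z1 = W1 @ X + b1      # linear combination, hidden layer
A1 = sigmoid(Z1)      # activation, hidden layer
Z2 = W2 @ A1 + b2     # linear combination, output layer
Y_hat = sigmoid(Z2)   # network prediction
print(Y_hat)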
10. Representation Power of Feedforward Neural Networks
Feedforward Neural Networks (FNNs) possess remarkable representational power,
making them highly effective in approximating complex functions. At their core, FNNs
consist of layers of interconnected neurons where information flows in one direction—
from the input layer, through one or more hidden layers, to the output layer. Each neuron
applies a non-linear activation function (such as sigmoid, tanh, or ReLU) to a weighted
sum of its inputs, allowing the network to model non-linear relationships. The true
strength of FNNs lies in the Universal Approximation Theorem, which states that a
feedforward neural network with just one hidden layer containing a finite number of
neurons can approximate any continuous function on a compact domain, given appropriate
weights and activation functions. This means that FNNs are capable of learning and
representing highly complex mappings between input and output spaces, including those
with intricate patterns or high-dimensional data. The depth and width of a network further
influence its ability to capture subtle structures in data; deeper networks (with more
layers) often yield more compact representations of complex functions than shallow ones.
However, this power comes with a trade-off—training deeper networks can be
computationally intensive and susceptible to issues like vanishing gradients. Nonetheless,
with sufficient data, proper initialization, and training strategies such as backpropagation
and optimization algorithms (e.g., gradient descent), feedforward networks serve as
foundational tools in modern machine learning for tasks ranging from classification and
regression to feature extraction and representation learning.
Summary of the Difference

Feature              | FNNs                                       | MLPs
---------------------+--------------------------------------------+--------------------------------------------
Definition           | Any forward-only neural network            | Fully connected forward-only neural network
Scope                | Broad (includes CNNs, MLPs, etc.)          | Narrow (only dense-layered structures)
Connectivity         | May not be fully connected                 | Always fully connected
Representation Power | Universal function approximation (general) | Universal function approximation (specific)
Use Case Examples    | CNNs, shallow nets, deep dense nets        | Standard classification/regression models