Unit-1 Notes

1. Historical Trends in Deep Learning

The evolution of deep learning has been shaped by decades of research, theoretical breakthroughs, advances in hardware, and the availability of large datasets. Its roots can be traced back to the 1940s and the McCulloch-Pitts neuron, a simple mathematical model of a biological neuron. This was followed in the 1950s by Frank Rosenblatt's Perceptron, which demonstrated that machines could learn from data. However, enthusiasm declined in the 1970s because a single-layer perceptron cannot solve problems that are not linearly separable, such as the XOR problem. A major resurgence came in the 1980s with backpropagation, a learning algorithm that allowed multi-layer neural networks to be trained effectively. Despite this progress, the field slowed again because of limited computational power and insufficient data. The 2000s marked a turning point, with renewed interest driven by GPU computing, large-scale datasets, and improved algorithms. The breakthrough came in 2012, when AlexNet, a deep convolutional neural network, won the ImageNet competition by a wide margin and showed the power of deep architectures for visual recognition. This success triggered a wave of new architectures, such as VGG, ResNet, and Transformers, extending deep learning's impact to computer vision, natural language processing, and speech recognition. Over the years, the field has moved toward deeper, wider, and more efficient models, along with self-supervised learning, transfer learning, and foundation models. Today, deep learning is at the forefront of artificial intelligence, with ongoing research aimed at improving interpretability, robustness, and efficiency.


2. McCulloch-Pitts Neuron

The McCulloch-Pitts neuron, proposed in 1943 by Warren McCulloch and Walter Pitts, is the earliest mathematical model of a biological neuron and laid the foundation for artificial neural networks. It is a simple binary computational model designed to mimic a single neuron in the brain. The model receives multiple binary inputs, each representing the presence (1) or absence (0) of a signal. The inputs are summed and compared against a fixed threshold: if the sum equals or exceeds the threshold, the neuron “fires” and outputs 1; otherwise, it outputs 0. Mathematically, it is a threshold logic unit with no adjustable weights and no learning mechanism. Although rudimentary, the McCulloch-Pitts neuron can compute basic logical functions such as AND, OR, and NOT (NOT relies on an inhibitory input, which in the original model prevents the neuron from firing whenever it is active), making it a crucial stepping stone in understanding how complex computation can arise from simple units. However, because it cannot learn from data or solve non-linearly separable problems such as XOR, it was later extended in models like the Perceptron. Despite its simplicity, the McCulloch-Pitts neuron remains historically significant as the first formal attempt to bridge neuroscience and computation, and it inspired the development of modern neural network models.


Algorithm of McCulloch-Pitts Neuron

Inputs:
• A list of binary inputs: x1, x2, ..., xn (each input is either 0 or 1)
• A threshold value: θ (theta), a positive integer

Steps:
1. Initialize sum = 0
2. For each input xi (where i = 1 to n):
   - Add xi to sum
3. Compare the total sum with the threshold:
   - If sum >= θ: Output = 1
   - Else: Output = 0

Return:
• Output (either 0 or 1)

Example – Implementing AND Gate with McCulloch-Pitts Neuron

Let:
• Input x1 = 1
• Input x2 = 1
• Threshold θ = 2

Steps:
1. sum = x1 + x2 = 1 + 1 = 2
2. Since sum (2) >= threshold (2), the neuron fires:
   Output = 1

Other combinations:
• x1 = 1, x2 = 0 → sum = 1 → Output = 0
• x1 = 0, x2 = 1 → sum = 1 → Output = 0
• x1 = 0, x2 = 0 → sum = 0 → Output = 0

Thus, the McCulloch-Pitts neuron correctly performs the logic of an AND gate.

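The algorithm and the AND example above can be sketched in Python as follows. The function name `mcculloch_pitts` is a hypothetical helper chosen for illustration; the code simply sums the binary inputs and compares the sum with θ, following the listed steps.

```python
from itertools import product


def mcculloch_pitts(inputs, theta):
    """Return 1 if the sum of the binary inputs reaches the threshold theta, else 0."""
    if any(x not in (0, 1) for x in inputs):
        raise ValueError("McCulloch-Pitts inputs must be 0 or 1")
    total = 0
    for x in inputs:                      # Step 2: accumulate the inputs
        total += x
    return 1 if total >= theta else 0     # Step 3: compare with the threshold


# AND gate: two inputs, threshold 2
for x1, x2 in product([0, 1], repeat=2):
    print(x1, x2, "->", mcculloch_pitts([x1, x2], theta=2))
```

The loop prints the full truth table; only the input pair (1, 1) produces 1, matching the hand calculation above.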
3. Thresholding Logic

Thresholding logic is a fundamental concept in both biological and artificial neural computation: a decision is made based on whether an input signal reaches a predefined limit known as the threshold. In its simplest form, thresholding logic sums the input signals and compares the result with a fixed threshold value. If the sum equals or exceeds the threshold, the system activates and produces an output of 1; otherwise, the output is 0. This binary decision-making process mimics the behavior of neurons in the brain, which fire only when their combined inputs reach a certain activation level. Thresholding logic is the core mechanism behind early neural models like the McCulloch-Pitts neuron and underlies the implementation of basic logic gates such as AND, OR, and NOT. It plays a critical role in pattern recognition, classification, and decision-making systems by enabling machines to distinguish between input patterns based on activation criteria. Although modern neural networks use more complex activation functions, thresholding remains a foundational idea for understanding how discrete decisions are made in both artificial and biological systems.
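To see how the threshold alone determines the decision, the sketch below keeps the summation fixed and varies only θ. It assumes two binary inputs and the "sum ≥ threshold" rule used above; `threshold_unit` is a hypothetical helper name. With θ = 1 the unit behaves as OR, and with θ = 2 it behaves as AND.

```python
from itertools import product


def threshold_unit(inputs, theta):
    """Fire (1) when the summed input signal reaches the threshold theta, else 0."""
    return int(sum(inputs) >= theta)


patterns = list(product([0, 1], repeat=2))
or_outputs = [threshold_unit(p, theta=1) for p in patterns]
and_outputs = [threshold_unit(p, theta=2) for p in patterns]
print(patterns)     # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(or_outputs)   # [0, 1, 1, 1]
print(and_outputs)  # [0, 0, 0, 1]
```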

4. Perceptron

The Perceptron is a fundamental model in the history of artificial neural networks, introduced by Frank Rosenblatt in 1958 as a computational algorithm designed to mimic the learning ability of the brain. It is a supervised learning model used primarily for binary classification. A perceptron consists of a single layer of artificial neurons, each of which receives multiple input signals, multiplies each by a corresponding weight, sums the results, and passes the sum through an activation function, typically a step function. If the weighted sum reaches the threshold, the output is 1; otherwise, it is 0. Unlike the earlier McCulloch-Pitts neuron, the perceptron can learn: it adjusts its weights during training using an error-correction rule. The learning algorithm updates the weights iteratively based on the difference between the desired and actual output, reducing classification errors over time. Although it solves linearly separable problems (such as AND and OR), the perceptron cannot solve problems that require a non-linear decision boundary, such as XOR. Nevertheless, the perceptron laid the groundwork for more advanced models, including multi-layer networks and modern deep learning architectures, making it a pivotal innovation in the development of artificial intelligence.
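A small, non-rigorous way to see the linear-separability limitation is to search a coarse grid of weights and biases for a single step unit (net = w1·x1 + w2·x2 + b, output 1 when net ≥ 0) that reproduces each truth table. The grid values and the helper name `find_separator` are illustrative assumptions. A grid search can only find a solution, not prove that none exists; for XOR, however, no weights at all separate the classes, so the search necessarily comes up empty.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]
XOR_TARGETS = [0, 1, 1, 0]
GRID = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0 (illustrative range)


def step(net):
    return 1 if net >= 0 else 0


def find_separator(targets):
    """Return the first (w1, w2, b) on the grid whose step output matches targets, or None."""
    for w1, w2, b in product(GRID, repeat=3):
        outputs = [step(w1 * x1 + w2 * x2 + b) for x1, x2 in INPUTS]
        if outputs == targets:
            return (w1, w2, b)
    return None


print("AND:", find_separator(AND_TARGETS))  # AND: (0.5, 0.5, -1.0)
print("XOR:", find_separator(XOR_TARGETS))  # XOR: None
```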

5. Perceptron Learning Algorithm

The Perceptron Learning Algorithm is one of the earliest and simplest algorithms for training a single-layer neural network for binary classification. Introduced by Frank Rosenblatt, it uses labeled training data to find weights that classify the training examples correctly, provided such weights exist (that is, provided the data are linearly separable). The perceptron processes each input vector by computing a weighted sum and comparing it against a threshold: if the result reaches the threshold, the output is 1; otherwise, it is 0. Whenever the output does not match the target value, the algorithm updates the weights, gradually reducing the classification error.

Inputs:
• A set of training examples: {(x1, y1), (x2, y2), ..., (xn, yn)}
  - Each input xi is a feature vector with m components, written [x1, x2, ..., xm] in the steps below
  - Each target yi is either 0 or 1
• Learning rate: η (eta), typically a small positive number (e.g., 0.1)
• Initial weights: [w1, w2, ..., wm] and bias b (often set to 0)

Algorithm Steps:

1. Initialize weights (w1, w2, ..., wm) and bias b to small random values or zeros.

2. For each training sample (xi, yi):

   a. Compute the weighted sum:
      net = (w1 * x1) + (w2 * x2) + ... + (wm * xm) + b

   b. Apply the activation function (step function):
      If net ≥ 0, then predicted output ŷ = 1
      Else, predicted output ŷ = 0

   c. Update the weights and bias if the prediction is incorrect:
      - For each weight wj:
        wj = wj + η * (yi - ŷ) * xj
      - Update the bias:
        b = b + η * (yi - ŷ)

3. Repeat steps 2a to 2c for all samples over multiple passes (epochs) until a full pass produces no errors (the weights stop changing) or a maximum number of epochs is reached. If the training data are linearly separable, the perceptron convergence theorem guarantees that this happens after a finite number of updates; otherwise (as with XOR), the updates never stop.
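A minimal Python sketch of the steps above, assuming zero initialization, η = 0.1, and a cap of 100 epochs (all illustrative choices). The helper names `train_perceptron` and `predict` are hypothetical. Training stops early once an entire epoch passes without a mistake.

```python
def step(net):
    """Step activation: 1 if net >= 0, else 0."""
    return 1 if net >= 0 else 0


def predict(w, b, x):
    """Perceptron output for feature vector x."""
    net = sum(wj * xj for wj, xj in zip(w, x)) + b   # step 2a
    return step(net)                                  # step 2b


def train_perceptron(samples, eta=0.1, max_epochs=100):
    """Perceptron learning rule; samples is a list of (features, target) pairs."""
    m = len(samples[0][0])
    w = [0.0] * m          # step 1: weights start at zero
    b = 0.0                # step 1: bias starts at zero
    for _ in range(max_epochs):           # step 3: repeat over epochs
        errors = 0
        for x, y in samples:
            y_hat = predict(w, b, x)
            if y_hat != y:                # step 2c: update only on a mistake
                errors += 1
                w = [wj + eta * (y - y_hat) * xj for wj, xj in zip(w, x)]
                b = b + eta * (y - y_hat)
        if errors == 0:                   # a clean pass: weights have stopped changing
            break
    return w, b


and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

Because an update happens only when ŷ ≠ yi, the rule in step 2c reduces to adding η·xj to the weights for a missed positive example and subtracting it for a false positive.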


6. Representation Power of MLPs

Multilayer Perceptrons (MLPs) are powerful models in deep learning because of their ability to approximate complex functions. An MLP consists of multiple layers of neurons, where each layer applies a linear transformation followed by a non-linear activation function. This combination enables MLPs to model highly non-linear relationships between inputs and outputs. One of the most important theoretical results about MLPs is the Universal Approximation Theorem, which states that a feedforward neural network with at least one hidden layer containing a sufficient (finite) number of neurons can approximate any continuous function on a compact domain to any desired accuracy, given a suitable non-linear activation function (such as the sigmoid). The theorem guarantees that suitable weights exist; it does not say how many neurons are needed or that training will find those weights. Still, it means that MLPs have the potential to learn a wide variety of tasks, including classification, regression, and pattern recognition. The representation power of an MLP increases with the number of hidden units and layers, allowing it to capture more intricate patterns in data. However, simply increasing the size of an MLP does not guarantee better performance; it also requires proper training, regularization, and sufficient data. Overall, the ability of MLPs to represent complex functions makes them a foundational model in neural network-based machine learning.
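As a concrete illustration of what one hidden layer adds, the hand-chosen weights below (an illustrative assumption, not learned) let a two-layer network of step units compute XOR, which a single-layer perceptron cannot: one hidden unit acts as OR, the other as AND, and the output unit fires when OR is on but AND is off.

```python
from itertools import product


def step(net):
    return 1 if net >= 0 else 0


def xor_mlp(x1, x2):
    """Two-layer network of step units with hand-picked (not trained) weights."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)        # hidden unit 1: OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)       # hidden unit 2: AND
    return step(1.0 * h_or - 1.0 * h_and - 0.5)   # output: OR and not AND


print([xor_mlp(a, b) for a, b in product([0, 1], repeat=2)])  # [0, 1, 1, 0]
```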

7. Sigmoid Neurons

Sigmoid neurons are a fundamental component of artificial neural networks, especially in binary classification problems. They use the sigmoid activation function, defined as σ(z) = 1 / (1 + e^(-z)), where z is the weighted sum of the inputs plus a bias term (z = w·x + b). This function maps any real number into the open interval (0, 1), which makes it useful for predicting probabilities. A key advantage of the sigmoid function is that it is smooth and differentiable, which allows efficient learning through gradient descent and backpropagation. Loosely, it resembles the way real neurons respond gradually, firing more strongly as the input stimulus increases. By introducing non-linearity, sigmoid neurons enable networks to learn complex patterns that linear models cannot. However, they suffer from the vanishing gradient problem: for large positive or negative inputs, the gradient becomes very small, which can hinder learning in deep networks. Despite this drawback, sigmoid neurons remain foundational in neural network theory and are central to logistic regression. The output of a sigmoid neuron can be interpreted as the probability that the input belongs to a particular class. Historically, sigmoid neurons were widely used before activation functions such as ReLU became standard. Their derivative, σ'(z) = σ(z)(1 - σ(z)), plays a critical role in adjusting weights during training; it reaches its maximum value of 0.25 at z = 0 and approaches 0 as |z| grows, which is the source of the vanishing gradient problem.
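A small Python sketch of the sigmoid and its derivative, using only the standard `math` module. The branch on the sign of z is a common numerical-stability choice (an implementation detail, not part of the definition). The loop checks the closed-form derivative σ(z)(1 - σ(z)) against a central finite difference and shows how small the gradient becomes at |z| = 10.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def sigmoid_prime(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Compare the closed-form derivative with a central finite difference
for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    h = 1e-5
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    print(f"z={z:6.1f}  sigma={sigmoid(z):.6f}  "
          f"sigma'={sigmoid_prime(z):.6f}  numeric={numeric:.6f}")
```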

8. Gradient Descent

Gradient Descent is a fundamental optimization algorithm used in machine learning and deep learning to minimize a loss (cost) function by iteratively adjusting the model's parameters. The main idea is to move in the direction in which the function decreases most rapidly; this direction is found by computing the gradient (the vector of partial derivatives) of the cost function with respect to each parameter. Starting from some initial values, the algorithm updates each parameter in the direction opposite to the gradient, scaled by a factor called the learning rate. This process continues until the algorithm converges to a minimum: ideally the global minimum of the cost function, although for non-convex functions such as those of neural networks it may settle in a local minimum. Gradient Descent is essential for training models like neural networks, where finding optimal weights by hand or in closed form is infeasible because of the high dimensionality and complex shape of the loss surface.


Gradient Descent Algorithm

Algorithm: Gradient Descent

Input:
• A differentiable cost function J(θ)
• Learning rate α
• Initial parameters θ
• Convergence criteria (e.g., small change in cost or max number of iterations)

Steps:
1. Initialize the parameters θ randomly or with some guess.
2. Repeat until convergence:
   - Compute the gradient:
     ∇J(θ) = [∂J(θ)/∂θ₁, ∂J(θ)/∂θ₂, ..., ∂J(θ)/∂θₙ]
   - Update the parameters using:
     θ := θ - α × ∇J(θ)
3. Return the optimized parameters θ.
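A minimal NumPy sketch of the algorithm above. The cost J(θ) = (θ₁ - 3)² + (θ₂ + 1)² is a hypothetical example chosen because its gradient is easy to write by hand and its minimum, (3, -1), is known. As its convergence test, the sketch stops when the parameter update becomes tiny (one reasonable choice; the text also mentions a small change in cost) and otherwise stops after a maximum number of iterations.

```python
import numpy as np


def gradient_descent(grad, theta0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a differentiable function given its gradient function `grad`."""
    theta = np.asarray(theta0, dtype=float)
    for i in range(max_iters):
        g = grad(theta)                                # ∇J(θ)
        new_theta = theta - alpha * g                  # θ := θ - α × ∇J(θ)
        if np.linalg.norm(new_theta - theta) < tol:    # stop when the step is tiny
            return new_theta, i + 1
        theta = new_theta
    return theta, max_iters


# Hypothetical cost J(θ) = (θ1 - 3)^2 + (θ2 + 1)^2, minimized at (3, -1)
def grad_J(theta):
    return np.array([2 * (theta[0] - 3), 2 * (theta[1] + 1)])


theta_opt, iters = gradient_descent(grad_J, theta0=[0.0, 0.0])
print(theta_opt, iters)
```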

Description of the Algorithm

Gradient Descent is an iterative optimization process that minimizes a given cost function, typically one that measures the error of a machine learning model: a lower value means a better fit to the training data. At each step, the algorithm computes the slope (gradient) of the cost function with respect to each parameter (such as the weights of a neural network). The gradient points in the direction of steepest increase in cost, so moving in the opposite direction reduces the cost. The step size is controlled by the learning rate α, which must be chosen carefully: a value that is too large may overshoot the minimum or even diverge, while a value that is too small slows convergence. The process repeats until the parameters settle near a minimum of the cost function. Gradient Descent is widely used because it is simple, effective, and, especially in its stochastic and mini-batch variants, scalable to large datasets and high-dimensional models.
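The effect of α can be checked on the simplest possible cost, J(θ) = θ² (an illustrative choice), whose gradient is 2θ. Each update is θ := θ - α·2θ = (1 - 2α)θ, so the iterates shrink toward the minimum at 0 only when |1 - 2α| < 1, i.e. 0 < α < 1. The sketch below runs 20 steps from θ = 1 for a small, a moderate, and a too-large learning rate; `run_gd` is a hypothetical helper.

```python
def run_gd(alpha, theta=1.0, steps=20):
    """Apply theta := theta - alpha * dJ/dtheta for J(theta) = theta**2."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta


for alpha in [0.01, 0.1, 1.1]:
    print(f"alpha={alpha}: theta after 20 steps = {run_gd(alpha):.6g}")
```

With α = 0.01 the iterate shrinks only slowly, with α = 0.1 it approaches 0 quickly, and with α = 1.1 the factor 1 - 2α = -1.2 makes |θ| grow at every step, which is the overshooting behaviour described above.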


9. Feedforward Neural Networks

A Feedforward Neural Network (FNN) is a fundamental type of artificial neural network architecture in which the connections between nodes do not form cycles. It consists of an input layer, one or more hidden layers, and an output layer. Each layer is made up of units called neurons, which are inspired by biological neurons. In a feedforward network, data flows in one direction: from the input layer, through the hidden layers, to the output layer. In the common fully connected case, each neuron in a layer is connected to every neuron in the next layer, and each connection has a numerical weight. Each neuron computes a weighted sum of its inputs plus a bias and passes the result through a non-linear activation function such as the sigmoid, ReLU (Rectified Linear Unit), or tanh. The output of each neuron becomes an input to the neurons in the next layer, and the final layer produces the network's prediction. During training, backpropagation is used together with an optimization method such as gradient descent to adjust the weights so as to minimize the error between the predicted output and the actual target. Training continues until the network's performance reaches a satisfactory level. FNNs are widely used for classification, regression, and pattern recognition because of their simplicity and their ability to approximate complex functions.


Feedforward Algorithm for Neural Networks

Input:
• Input feature vector X = [x1, x2, ..., xn]
• Weight matrices for each layer: W1, W2, ..., WL (where L is the number of layers)
• Bias vectors for each layer: b1, b2, ..., bL
• Activation function f (e.g., sigmoid, tanh, ReLU)

Output:
• Output vector Y_hat (predicted output)

Algorithm Steps:
1. Initialize the input layer:
   - Set A0 = X (this is the input to the first layer)

2. For each layer l = 1 to L:
   - Compute the linear combination:
     Zl = Wl * A(l-1) + bl
   - Apply the activation function:
     Al = f(Zl)

3. Output of the final layer:
   - Y_hat = AL

Example for 3-layer Network (1 hidden layer + output):


Let:

• Input layer: X

• Hidden layer: W1, b1, activation f1

• Output layer: W2, b2, activation f2

Then:

Z1 = W1 * X + b1

A1 = f1(Z1)

Z2 = W2 * A1 + b2

Y_hat = f2(Z2)
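The layer-by-layer loop and the 3-layer example above can be sketched with NumPy. Assumptions not stated in the notes: each weight matrix Wl has shape (units in layer l, units in layer l-1), vectors are 1-D arrays, and each layer may have its own activation, as in the f1/f2 example. The weights in the usage example are arbitrary illustrative numbers.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0.0, z)


def feedforward(x, weights, biases, activations):
    """Forward pass: A0 = X, then Al = fl(Wl A(l-1) + bl) for l = 1..L."""
    a = np.asarray(x, dtype=float)          # A0 = X
    for W, b, f in zip(weights, biases, activations):
        z = W @ a + b                        # Zl = Wl * A(l-1) + bl
        a = f(z)                             # Al = f(Zl)
    return a                                 # Y_hat = AL


# 3-layer example (input -> hidden -> output) with illustrative numbers
x = np.array([1.0, 2.0])
W1 = np.array([[1.0, -1.0],
               [0.5, 0.5]])   # hidden layer: 2 units, 2 inputs
b1 = np.array([0.0, -1.0])
W2 = np.array([[2.0, 2.0]])   # output layer: 1 unit, 2 hidden inputs
b2 = np.array([-1.0])

y_hat = feedforward(x, [W1, W2], [b1, b2], [relu, sigmoid])
print(y_hat)  # [0.5]
```

By hand: Z1 = [-1, 0.5], A1 = ReLU(Z1) = [0, 0.5], Z2 = 2·0 + 2·0.5 - 1 = 0, and σ(0) = 0.5.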

10. Representation Power of Feedforward Neural Networks

Feedforward Neural Networks (FNNs) have remarkable representational power, which makes them highly effective at approximating complex functions. FNNs consist of layers of interconnected neurons in which information flows in one direction: from the input layer, through one or more hidden layers, to the output layer. Each neuron applies a non-linear activation function (such as sigmoid, tanh, or ReLU) to a weighted sum of its inputs, allowing the network to model non-linear relationships. The theoretical basis of this strength is the Universal Approximation Theorem, which states that a feedforward neural network with just one hidden layer containing a sufficient (finite) number of neurons can approximate any continuous function on a compact domain to any desired accuracy, given appropriate weights and a suitable activation function. This means that FNNs are capable of representing highly complex mappings between input and output spaces, including those with intricate patterns or high-dimensional data. The depth and width of a network further influence its ability to capture subtle structures in data; deeper networks (with more layers) often represent complex functions more compactly than shallow ones. However, this power comes with a trade-off: training deeper networks can be computationally intensive and susceptible to problems such as vanishing gradients. Nonetheless, with sufficient data, proper initialization, and training based on backpropagation and optimization algorithms (e.g., gradient descent), feedforward networks serve as foundational tools in modern machine learning for tasks ranging from classification and regression to feature extraction and representation learning.
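The approximation idea can be illustrated (though not proved) with one hidden layer of ReLU units. To keep the sketch short, it uses a simplification that is not how FNNs are normally trained: the hidden weights and biases are drawn at random and frozen, and only the output layer is fitted, by linear least squares with `np.linalg.lstsq`. The target function sin(x) on [-π, π], the random seed, and the layer sizes are arbitrary illustrative choices. Because each smaller network uses a subset of the larger network's hidden units, the training error can only stay the same or drop as units are added.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)                                    # target continuous function

max_hidden = 100
w = rng.normal(size=max_hidden)                  # frozen random hidden weights
b = rng.uniform(-np.pi, np.pi, size=max_hidden)  # frozen random hidden biases
H = np.maximum(0.0, np.outer(x, w) + b)          # ReLU hidden activations, shape (200, 100)


def fit_rmse(n_hidden):
    """Fit output weights on the first n_hidden units; return the RMS training error."""
    features = np.column_stack([H[:, :n_hidden], np.ones_like(x)])  # plus output bias
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return float(np.sqrt(np.mean((features @ coef - y) ** 2)))


errors = {n: fit_rmse(n) for n in [2, 5, 20, 100]}
for n, err in errors.items():
    print(f"{n:3d} hidden units: RMS error = {err:.4f}")
```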

Summary of the Difference

Feature              | FNNs                                        | MLPs
---------------------+---------------------------------------------+--------------------------------------------
Definition           | Any forward-only neural network             | Fully connected forward-only neural network
Scope                | Broad (includes CNNs, MLPs, etc.)           | Narrow (only dense-layered structures)
Connectivity         | May not be fully connected                  | Always fully connected
Representation Power | Universal function approximation (general)  | Universal function approximation (specific)
Use Case Examples    | CNNs, shallow nets, deep dense nets         | Standard classification/regression models
