Lecture 2

Lecture 2 covers Multilayer Perceptrons (MLPs) in deep learning, explaining the structure and functioning of neural networks inspired by biological systems. It details the perceptron model, learning algorithms such as backpropagation, and design considerations for MLPs, including activation functions and issues like overfitting and the vanishing gradient problem. The lecture emphasizes the importance of network architecture and training processes in achieving effective machine learning outcomes.

Uploaded by

Abdelrhman Adel

Lecture 2: Multilayer Perceptrons

CS460: Deep Learning

What is a neural network?
• The brain is a highly complex, nonlinear, and parallel computer. It can organize its neurons to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today.
Biological Neural Networks
[Figure: biological neurons, labeled with dendrites, soma, axon, and synapse]
Modeling the single neuron
Learning in simple neurons
• Suppose we have two groups of objects, one of several written A's and the other of B's. We may want our neuron to tell the A's from the B's, as in the figure.
• We want it to output a 1 when an A is presented and a 0 when it sees a B.
Biology analogy
Biological    Artificial
Soma          Node/neuron
Dendrites     Input
Axon          Output
Synapse       Weight
The perceptron
• The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights.
• Each node computes the sum of the products of the weights and the inputs. If the value is above some threshold, the neuron fires and takes the activated value; otherwise it takes the deactivated value.
• Neurons with this kind of activation function are also called artificial neurons or linear threshold units.
• In the literature, the term perceptron often refers to networks consisting of just one of these units.
The perceptron (cont’d)
• The perceptron is an algorithm for learning a binary classifier called a threshold function: a function that maps its input x (a real-valued vector) to an output value f(x) (a single binary value):

$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

• where w is a vector of real-valued weights, w · x is the dot product $\sum_{i=1}^{m} w_i x_i$, m is the number of inputs to the perceptron, and b is the bias.
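As a minimal sketch of the threshold function described on this slide, the following Python computes the dot product of the weights and inputs, adds the bias, and outputs 1 only when the result is positive. The function name and the AND-like weights and bias are illustrative assumptions, not values from the lecture.

```python
def perceptron_output(weights, inputs, bias):
    """Return 1 if w . x + b > 0, else 0 (a linear threshold unit)."""
    # Dot product w . x over the m inputs, plus the bias b.
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Hypothetical weights and bias chosen so the unit behaves like logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5
and_outputs = [perceptron_output(and_weights, [a, b], and_bias)
               for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # [0, 0, 0, 1]
```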
The perceptron (cont’d)
• Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule. It calculates the error between the calculated output and the sample output data, and uses this error to adjust the weights, thus implementing a form of gradient descent.
• Single-layer perceptrons are only capable of learning linearly separable patterns.
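The error-driven weight adjustment can be sketched as follows. This is a hedged illustration using the classic perceptron update (which has the same form as the delta rule) on the linearly separable AND function. The function names, the learning rate of 0.1, and the epoch count are assumptions made for the example.

```python
def train_perceptron(samples, learning_rate=0.1, epochs=50):
    """Train one threshold unit by adjusting weights in proportion to the error."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # sample output minus calculated output
            # Move each weight by learning_rate * error * its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# AND is linearly separable, so a single unit can learn it.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
and_predictions = [predict(w, b, x) for x, _ in and_data]
```

Running the same loop on XOR targets would never settle, because no single line separates the two XOR classes.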
Linearly Separable
XOR Function
• It is impossible for a single-layer perceptron network to learn the XOR function, because its two output classes are not linearly separable.
Non-linear transformations
• A single-layer neural network can compute a continuous output instead of a step function. A common choice is the so-called logistic function:

$$y = \frac{1}{1 + e^{-x}}$$
Non-linear transformations
• The logistic function is one of a family of functions called sigmoid functions. It has a continuous derivative, which allows it to be used in backpropagation. It is also preferred because its derivative is easily calculated:

$$y' = y(1 - y)$$
Sigmoid function
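A small sketch, assuming the standard logistic function 1/(1 + e^(-x)), shows why its derivative is cheap to compute: it can be written as y(1 - y) using the already computed output y. The helper names and the numerical check below are illustrative, not part of the lecture.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative expressed through the output itself: y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate, used only to sanity-check the formula."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Largest disagreement between the closed form and the numerical estimate.
max_gap = max(abs(sigmoid_derivative(x) - numerical_derivative(sigmoid, x))
              for x in (-2.0, 0.0, 1.5))
```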
Multilayer Perceptron (MLP)
• An MLP is a class of feedforward (acyclic) artificial neural network (ANN).
• Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function.
• MLPs are the most basic deep neural networks, composed of a series of fully connected layers.
• Each new layer is a set of nonlinear functions of a weighted sum of all outputs (fully connected) from the prior one.
• Multilayer feedforward networks, given enough hidden units and enough training samples, can closely approximate any continuous function.
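To make the "series of fully connected layers" concrete, here is a minimal sketch in which each layer applies a sigmoid to a weighted sum of all outputs of the previous layer. The 3-2-1 layer sizes and every weight and bias value are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """Each unit sees ALL inputs (fully connected) and applies a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]


def mlp_forward(inputs, layers):
    """Feed the activations forward, layer by layer (no cycles)."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit.
layers = [
    ([[0.2, -0.3, 0.4], [0.1, -0.5, 0.2]], [-0.4, 0.2]),
    ([[-0.3, -0.2]], [0.1]),
]
prediction = mlp_forward([1.0, 0.0, 1.0], layers)
```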
The Architecture
• MLP with one hidden layer
[Figure: inputs x1, x2, x3 feed an input layer, a hidden layer, and an output layer of processing elements (PE). Each PE computes a weighted sum (S) followed by a transfer function (f); the network output is Y1.]
MLP processing
PE: processing element (or neuron)
(a) Single neuron, with inputs x1, x2 and weights w1, w2:
$$Y = X_1 W_1 + X_2 W_2$$
(b) Multiple neurons, with inputs x1, x2 and weights w11, w12, w21, w22, w23:
$$Y_1 = X_1 W_{11} + X_2 W_{21}$$
$$Y_2 = X_1 W_{12} + X_2 W_{22}$$
$$Y_3 = X_2 W_{23}$$
MLP processing (cont’d)
Inputs: X1 = 3, X2 = 1, X3 = 2
Weights: W1 = 0.2, W2 = 0.4, W3 = 0.1
Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2
Transfer function: $Y_T = 1/(1 + e^{-1.2}) = 0.77$
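The arithmetic of this slide's worked example (inputs 3, 1, 2 with weights 0.2, 0.4, 0.1, followed by a sigmoid transfer function) can be checked with a few lines of Python. The variable names are illustrative.

```python
import math

inputs = [3, 1, 2]
weights = [0.2, 0.4, 0.1]

# Summation function: weighted sum of the inputs.
y = sum(x * w for x, w in zip(inputs, weights))

# Transfer function: sigmoid applied to the weighted sum.
y_t = 1.0 / (1.0 + math.exp(-y))

print(round(y, 2), round(y_t, 2))  # 1.2 0.77
```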
Designing the MLP
• Before training can begin, the user must decide on the network topology by specifying:
  - the number of units in the input layer,
  - the number of hidden layers (if more than one) and the number of units in each hidden layer, and
  - the number of units in the output layer.
• Normalizing the input values (e.g., to the range 0.0 to 1.0) for each attribute measured in the training tuples helps speed up the learning phase and can help avoid the exploding gradient problem.
• Discrete-valued attributes may be encoded such that there is one input unit per domain value.
• The transfer function must also be chosen.
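The two input-preparation steps mentioned above can be sketched as follows: min-max scaling of a numeric attribute into the range 0.0 to 1.0, and one input unit per domain value for a discrete attribute. The function names and the sample attribute values are hypothetical.

```python
def min_max_normalize(values):
    """Scale numeric values linearly into the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, domain):
    """One input unit per domain value: 1 for the matching value, 0 elsewhere."""
    return [1 if value == d else 0 for d in domain]


# Hypothetical attributes, for illustration only.
ages = [20, 35, 50]
normalized_ages = min_max_normalize(ages)

color_domain = ["red", "green", "blue"]
encoded_color = one_hot("green", color_domain)
```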
Transformation (Transfer) Function
• Linear function
• Sigmoid (logistic activation) function, output range [0, 1]
• Hyperbolic tangent function, output range [-1, 1]
MLP: Design issues
• Neural networks can be used for both classification (to predict the class label of a given tuple) and numeric prediction (to predict a continuous-valued output).
• For classification, one output unit may be used to represent two classes (where the value 1 represents one class, and the value 0 represents the other).
• If there are more than two classes, then one output unit per class is used.
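One possible way to turn output-unit values into class labels is sketched below. The 0.5 cut-off for the single-unit case and the "largest output wins" rule for the multi-class case are common conventions assumed here, not rules stated in the lecture; the function names are illustrative.

```python
def decode_binary(output, threshold=0.5):
    """Single output unit: values near 1 mean one class, near 0 the other."""
    return 1 if output >= threshold else 0


def decode_multiclass(outputs, class_labels):
    """One output unit per class: pick the class whose unit is most active."""
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return class_labels[best]


binary_label = decode_binary(0.77)
multi_label = decode_multiclass([0.1, 0.7, 0.2], ["A", "B", "C"])
```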
MLP: Design issues
• There are no clear rules as to the “best” number of hidden layer units.
• Network design is a trial-and-error process, and the chosen design may affect the accuracy of the resulting trained network.
• The initial values of the weights may also affect the resulting accuracy.
• If a trained network's accuracy is not considered acceptable, it is common to repeat the training process with
  - a different network topology or
  - a different set of initial weights.
The XOR function - revisited
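As a hedged illustration of how a hidden layer lets a network represent XOR, the sketch below wires two hidden threshold units (one acting like OR, one like AND) into a single output unit. The weights and thresholds are hand-picked assumptions for this example, not values from the lecture, and no training is involved.

```python
def step(value):
    """Threshold unit: fires (1) only for a positive net input."""
    return 1 if value > 0 else 0


def xor_network(x1, x2):
    # Hidden unit 1 behaves like OR, hidden unit 2 like AND (hand-set weights).
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output fires when OR is on but AND is off, which is exactly XOR.
    return step(h_or - h_and - 0.5)


xor_table = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_table)  # [0, 1, 1, 0]
```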
MLP Box Office prediction example
The Learning algorithm
• It adjusts the weights of the network in order to minimize the average squared error.
Learning in MLP
• The learning algorithm procedure:
  1. Initialize the weights with random values and set the other network parameters.
  2. Read in the inputs and the desired outputs.
  3. Compute the actual output (by working forward through the layers).
  4. Compute the error (the difference between the actual and desired output).
  5. Change the weights by working backward through the hidden layers.
  6. Repeat steps 2-5 until the weights stabilize.
Learning in MLP (cont’d)
• Backpropagation learns by iteratively processing a data set of training tuples, comparing the network’s prediction for each tuple with the actual known target value.
• The target value may be the known class label of the training tuple (for classification problems) or a continuous value (for numeric prediction).
• For each training tuple, the weights are modified so as to minimize the mean-squared error between the network’s prediction and the actual target value.
Learning in MLP (cont’d)
• These modifications are made in the “backwards” direction (i.e., from the output layer) through each hidden layer down to the first hidden layer (hence the name backpropagation).
• Although it is not guaranteed, in general the weights will eventually converge, and the learning process stops.
MLP Bottlenecks
1. Dimensionality issue
• Rule of thumb: the number of training samples should be at least 5 to 10 times the number of weights in the network.
• Otherwise, the network is prone to overfitting.
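To see what the rule of thumb implies for a concrete topology, the sketch below counts the connection weights of a fully connected network and multiplies by 5 and by 10. The 3-4-1 topology is hypothetical, and counting only connection weights (not biases) toward the rule is an assumption of this example.

```python
def count_parameters(layer_sizes):
    """Return (connection weights, biases) for a fully connected network."""
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per hidden and output unit
    return weights, biases


def recommended_samples(layer_sizes):
    """Apply the 5x to 10x rule of thumb to the number of weights."""
    weights, _ = count_parameters(layer_sizes)
    return 5 * weights, 10 * weights


# Hypothetical topology: 3 inputs, 4 hidden units, 1 output unit.
n_weights, n_biases = count_parameters([3, 4, 1])
low, high = recommended_samples([3, 4, 1])
print(n_weights, n_biases, low, high)  # 16 5 80 160
```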
2. Overfitting
2. Overfitting (cont’d)
3. The black-box syndrome
• A common criticism of ANNs: the lack of transparency/explainability
• Answer: sensitivity analysis
  - Conducted on a trained ANN
  - The inputs are perturbed while the relative change in the output is measured/recorded
  - Results illustrate the relative importance of the input variables
Sensitivity analysis
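A rough sketch of the perturbation idea: nudge one input at a time, record how much the output moves, and rank the inputs by that change. The `model` function below is only a stand-in for a trained network, with hypothetical fixed weights; the perturbation size of 0.01 is also an assumption.

```python
import math


def model(inputs):
    """Stand-in for a trained ANN: hypothetical fixed weights, sigmoid output."""
    weights = [2.0, 0.1, -1.0]
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity(predict, inputs, delta=0.01):
    """Perturb each input in turn and record the absolute output change."""
    base = predict(inputs)
    changes = []
    for i in range(len(inputs)):
        perturbed = list(inputs)
        perturbed[i] += delta
        changes.append(abs(predict(perturbed) - base))
    return changes


scores = sensitivity(model, [0.5, 0.5, 0.5])
# Input indices ordered from most to least influential.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy setup the ranking simply mirrors the magnitudes of the stand-in weights; on a real trained network the result would depend on the operating point chosen for the inputs.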
4. Vanishing gradient problem
• The vanishing gradient problem is encountered when training artificial neural networks with gradient-based learning methods and backpropagation.
• In such methods, during each training iteration each of the network's weights receives an update proportional to the partial derivative of the error function with respect to the current weight.
• The problem is that in some cases the gradient will be vanishingly small, effectively preventing the weight from changing its value. In the worst case, this may completely stop the network from training further.
• As one example of the cause, traditional activation functions such as the hyperbolic tangent function have gradients in the range (0, 1], and backpropagation computes gradients by the chain rule. Computing the gradients of the early layers in an n-layer network therefore multiplies n of these small numbers, so the gradient (error signal) decreases exponentially with n and the early layers train very slowly.
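The exponential shrinkage can be illustrated with a deliberately simplified calculation: multiply one activation derivative per layer and ignore the weights entirely. The value 0.25 used below is the maximum derivative of the logistic function; treating every layer as contributing exactly that factor is an assumption made only to show the trend.

```python
def gradient_scale(per_layer_derivative, n_layers):
    """Product of n identical per-layer derivative factors (weights ignored)."""
    return per_layer_derivative ** n_layers


# Best case for the logistic function: every layer contributes its maximum 0.25.
scales = {n: gradient_scale(0.25, n) for n in (1, 5, 10, 20)}
for n, scale in scales.items():
    print(n, scale)
```

Even in this best case the factor reaching the first layer of a 20-layer network is below one part in a trillion.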
Building Neural Networks
• The architecture of a neural network is driven by the task it is intended to address
• Classification, regression, clustering, general optimization, association, ...
• Most popular architecture: feedforward multilayer perceptron with the backpropagation learning algorithm
  - Used for both classification and regression problems
• Others: recurrent networks, self-organizing feature maps, Hopfield networks, ...
Development of NNs
Backpropagation
• Multilayer networks use a variety of learning techniques, the most popular being backpropagation.
• The output values are compared with the correct answer to compute the value of some predefined error function. By various techniques, the error is then fed back through the network.
• The algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount.
• After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small.
• In this case, one would say that the network has learned a certain target function. To adjust the weights properly, one applies a general method for non-linear optimization called gradient descent. For this, the network calculates the derivative of the error function with respect to the network weights, and changes the weights such that the error decreases (thus going downhill on the surface of the error function).
• For this reason, backpropagation can only be applied to networks with differentiable activation functions.
The steps of backpropagation
Initialize the weights:
• The weights in the network are initialized to small random numbers (e.g., ranging from −1.0 to 1.0, or −0.5 to 0.5).
• Each unit has a bias associated with it, as explained later.
• The biases are similarly initialized to small random numbers.
• Each training tuple, X, is processed by the following steps.
Propagate the inputs forward:
• First, the training tuple is fed to the network’s input layer.
• The inputs pass through the input units unchanged. That is, for an input unit j, its output, Oj, is equal to its input value, Ij.
• Next, the net input and output of each unit in the hidden and output layers are computed.
• The net input to a unit in the hidden or output layers is computed as a linear combination of its inputs.
The steps of backpropagation
• Propagate the inputs forward:
  - Each hidden layer or output layer unit has a number of inputs that are, in fact, the outputs of the units connected to it in the previous layer.
Propagate the inputs forward
• To compute the net input to a unit, each input connected to the unit is multiplied by its corresponding weight, and the products are summed.
• Given a unit j in a hidden or output layer, the net input, Ij, to unit j is

$$I_j = \sum_i w_{ij} O_i + \theta_j$$

• where wij is the weight of the connection from unit i in the previous layer to unit j; Oi is the output of unit i from the previous layer; and θj is the bias of unit j.
• The bias acts as a threshold in that it serves to vary the activity of the unit.
Propagate the inputs forward
• Each unit in the hidden and output layers takes its net input and then applies an activation function to it.
• The function symbolizes the activation of the neuron represented by the unit.
• The logistic, or sigmoid, function is used. Given the net input Ij to unit j, then Oj, the output of unit j, is computed as

$$O_j = \frac{1}{1 + e^{-I_j}}$$

• The logistic function is nonlinear and differentiable, allowing the backpropagation algorithm to model classification problems that are linearly inseparable.
Propagate the inputs forward
• We compute the output values, Oj, for each hidden layer, up to and including the output layer, which gives the network’s prediction.
• In practice, it is a good idea to cache (i.e., save) the intermediate output values at each unit, as they are required again later when backpropagating the error.
• This trick can substantially reduce the amount of computation required.
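The caching idea can be sketched as a forward pass that keeps every layer's outputs instead of only the final prediction. The net input is taken as the weighted sum of the previous layer's outputs plus the unit's bias, and the output as the logistic function of that net input, following the description in these slides. The function name and all weight and bias values are hypothetical.

```python
import math


def forward_with_cache(inputs, layers):
    """Return the outputs O_j of every layer, input layer first."""
    cache = [list(inputs)]  # input units pass their values through unchanged
    for weights, biases in layers:
        previous = cache[-1]
        outputs = []
        for unit_weights, theta in zip(weights, biases):
            # Net input I_j: weighted sum of previous outputs plus the bias.
            net = sum(w * o for w, o in zip(unit_weights, previous)) + theta
            # Output O_j: logistic function of the net input.
            outputs.append(1.0 / (1.0 + math.exp(-net)))
        cache.append(outputs)  # saved for reuse when backpropagating the error
    return cache


# Hypothetical 3-2-1 network (illustrative weights and biases).
layers = [
    ([[0.2, 0.4, -0.5], [-0.3, 0.1, 0.2]], [-0.4, 0.2]),
    ([[-0.3, -0.2]], [0.1]),
]
cache = forward_with_cache([1, 0, 1], layers)
prediction = cache[-1][0]
```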
Back propagate the error
• The error is propagated backward by updating the weights and biases to reflect the error of the network’s prediction. For a unit j in the output layer, the error Errj is computed by

$$Err_j = O_j (1 - O_j)(T_j - O_j)$$

• where Oj is the actual output of unit j, and Tj is the known target value of the given training tuple.
• Note that Oj(1−Oj) is the derivative of the logistic function.
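A one-function sketch of the output-layer error, assuming the usual form in which the difference between target and output is scaled by the logistic derivative Oj(1 − Oj). The sample output 0.474 and target 1.0 are hypothetical values chosen for illustration.

```python
def output_error(o_j, t_j):
    """Err_j for an output unit: logistic derivative times (target - output)."""
    return o_j * (1.0 - o_j) * (t_j - o_j)


# Hypothetical output 0.474 with target 1.0.
err_output = output_error(0.474, 1.0)
print(round(err_output, 4))  # 0.1311
```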
Back propagate the error
• To compute the error of a hidden layer unit j, the weighted sum of the errors of the units connected to unit j in the next layer is considered.
• The error of a hidden layer unit j is

$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$

• where wjk is the weight of the connection from unit j to a unit k in the next higher layer, and Errk is the error of unit k.
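The hidden-unit error can be sketched in the same way, assuming the usual form: the weighted sum of the next layer's errors, scaled by the logistic derivative Oj(1 − Oj) of the hidden unit. The numbers below are hypothetical.

```python
def hidden_error(o_j, next_errors, weights_to_next):
    """Err_j for a hidden unit j, given Err_k and w_jk for each next-layer unit k."""
    back_signal = sum(err_k * w_jk
                      for err_k, w_jk in zip(next_errors, weights_to_next))
    return o_j * (1.0 - o_j) * back_signal


# Hypothetical hidden output 0.332, one next-layer unit with error 0.1311,
# connected through weight -0.3.
err_hidden = hidden_error(0.332, [0.1311], [-0.3])
print(round(err_hidden, 4))  # -0.0087
```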
Back propagate the error
• The weights and biases are updated to reflect the propagated errors.
• Weights are updated by the following equations, where Δwij is the change in weight wij:

$$\Delta w_{ij} = (l) \, Err_j \, O_i$$
$$w_{ij} = w_{ij} + \Delta w_{ij}$$

• The variable l is the learning rate, a constant typically having a value between 0.0 and 1.0.
• The learning rate helps avoid getting stuck at a local minimum in decision space. If the learning rate is too small, learning will occur at a very slow pace. If the learning rate is too large, the weights may oscillate between inadequate solutions.
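A minimal sketch of a single weight update, assuming the change in weight is the learning rate times the error of the receiving unit j times the output of the sending unit i. The learning rate of 0.9 and the other numbers are hypothetical.

```python
def update_weight(w_ij, err_j, o_i, learning_rate):
    """Return the updated weight: w_ij + learning_rate * Err_j * O_i."""
    delta_w = learning_rate * err_j * o_i
    return w_ij + delta_w


# Hypothetical values: weight -0.3, Err_j = 0.1311, O_i = 0.332, l = 0.9.
new_weight = update_weight(-0.3, 0.1311, 0.332, 0.9)
print(round(new_weight, 3))  # -0.261
```

The same call with a much smaller learning rate moves the weight only slightly, which is the slow-learning case described above.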
Back propagate the error
• Biases are updated by the following equations, where Δθj is the change in bias θj:

$$\Delta \theta_j = (l) \, Err_j$$
$$\theta_j = \theta_j + \Delta \theta_j$$

• Updating the weights and biases after the presentation of each tuple is referred to as case updating.
• Alternatively, the weight and bias increments can be accumulated in variables, so that the weights and biases are updated after all the tuples in the training set have been presented (called epoch updating).
• Batch/mini-batch updating: the weights and biases are updated after several samples.
• One iteration through the training set is an epoch.
Terminating condition
• Training stops when:
  - all Δwij in the previous epoch are so small as to be below some specified threshold, or
  - the percentage of tuples misclassified in the previous epoch is below some threshold, or
  - a pre-specified number of epochs has expired.
• In practice, several hundreds of thousands of epochs may be required before the weights converge.
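The pieces described in these slides (random initialization, forward pass, error backpropagation, case updating, and the three stopping tests) can be combined into one hedged sketch. It trains a network with one hidden layer on XOR; the function name, hidden-layer size, learning rate, thresholds, and seed are all assumptions, and, as noted earlier, convergence is not guaranteed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, n_hidden=2, learning_rate=0.5, max_epochs=5000,
          weight_tol=1e-4, error_tol=0.0, seed=0):
    """Backpropagation with case updating; returns (epochs run, stop reason)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Small random initial weights and biases in [-0.5, 0.5].
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = rng.uniform(-0.5, 0.5)

    for epoch in range(1, max_epochs + 1):
        largest_delta = 0.0
        misclassified = 0
        for x, target in samples:
            # Propagate the inputs forward.
            hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                      for ws, b in zip(w_h, b_h)]
            out = sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + b_o)
            if (out >= 0.5) != (target == 1):
                misclassified += 1
            # Backpropagate the error (hidden errors use the old output weights).
            err_o = out * (1 - out) * (target - out)
            err_h = [h * (1 - h) * err_o * w for h, w in zip(hidden, w_o)]
            # Case updating: adjust weights and biases after every tuple.
            for j, h in enumerate(hidden):
                delta = learning_rate * err_o * h
                w_o[j] += delta
                largest_delta = max(largest_delta, abs(delta))
            b_o += learning_rate * err_o
            for j in range(n_hidden):
                for i, xi in enumerate(x):
                    delta = learning_rate * err_h[j] * xi
                    w_h[j][i] += delta
                    largest_delta = max(largest_delta, abs(delta))
                b_h[j] += learning_rate * err_h[j]
        # Terminating conditions, checked once per epoch.
        if largest_delta < weight_tol:
            return epoch, "weights_stable"
        if misclassified / len(samples) <= error_tol:
            return epoch, "accurate"
    return max_epochs, "max_epochs"


xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
epochs_run, reason = train(xor_data)
```

Which of the three reasons is returned depends on the random initial weights, which is one reason the slides recommend retrying with a different set of initial weights when accuracy is unacceptable.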

