
EEE 485-585, SPRING 2019

Chapter 8 - Perceptron, Neural Networks and Backpropagation

Slide 8.6

The perceptron is an online algorithm that takes one data instance at each iteration and updates the weights based on its performance on that particular data instance. When the classes are linearly separable, it converges in a finite number of iterations and is able to classify all training data points correctly. The learning rate of the perceptron can be taken as a constant $\eta(n) = \eta$.
When the prediction at the $n$th iteration is correct, we have $y(n) - \hat{y}(n) = 0$. This implies that $w(n+1) = w(n)$.
When the prediction at the $n$th iteration is wrong, there are two possibilities:
• If $\hat{y}(n) = 1$ and $y(n) = 0$, then $w(n+1) = w(n) - \eta(n)\, x(n)$, so the weight moves in the direction opposite to $x(n)$.
• If $\hat{y}(n) = 0$ and $y(n) = 1$, then $w(n+1) = w(n) + \eta(n)\, x(n)$, so the weight moves in the direction of $x(n)$.
This ensures that the weights are updated in a way that corrects mistakes.
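As a concrete illustration of the update rule above, here is a minimal perceptron sketch in Python/NumPy. The function name, the inclusion of the bias as $x_0 = 1$, and the stopping criterion are choices made for this example, not part of the slides.

```python
import numpy as np

def perceptron_train(X, y, eta=1.0, max_epochs=100):
    """Online perceptron for labels y in {0, 1}.

    X is an (n, d) array of inputs; a bias input x_0 = 1 is prepended here.
    Returns the weight vector, or the last iterate if the data are not
    separated within max_epochs passes.
    """
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])        # prepend bias input x_0 = 1
    w = np.zeros(d + 1)
    for _ in range(max_epochs):
        errors = 0
        for x_n, y_n in zip(Xb, y):
            y_hat = 1 if w @ x_n >= 0 else 0    # step activation
            w = w + eta * (y_n - y_hat) * x_n   # covers both mistake cases; no change when correct
            errors += int(y_hat != y_n)
        if errors == 0:                         # all training points classified correctly
            break
    return w
```

Note that the single line `w + eta * (y_n - y_hat) * x_n` implements both cases: the factor $y(n) - \hat{y}(n)$ is $-1$, $+1$, or $0$.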

Slide 8.10

Single-layer perceptrons are only capable of learning linearly separable patterns. The XOR problem is not linearly separable. However, multilayer neural networks can solve this problem; moreover, they can approximate much more general functions. In class, we discussed how the XOR problem can be solved by a neural network with one hidden layer, where we calculated the weights by hand. This is not possible for high-dimensional and complex datasets. Thus, we need methods that automatically learn the weights from the training dataset.
The perceptron we considered used the step function as the activation function. Due to its discontinuity at 0 and its zero derivative everywhere except 0, the step function is not practical when learning the weights via gradient descent (the standard method for training neural networks). Therefore, other activation functions are used to train neural networks with many neurons and many layers. Some examples are given below; the first two are sigmoid nonlinearities.
Logistic function:
$$\phi(v) = \frac{1}{1 + \exp(-av)}, \quad a > 0,$$
where the hyperparameter $a$ can be taken as 1. The derivative is $\phi'(v) = a\phi(v)(1 - \phi(v))$.
Hyperbolic tangent function:
$$\phi(v) = a \tanh(bv), \quad \text{where } \tanh(z) = \frac{\sinh z}{\cosh z} = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}.$$
Softplus:
$$\phi(v) = \ln(1 + e^{v})$$
Rectified linear unit (ReLU):
$$\phi(v) = \max\{0, v\}$$

What is the relation between softplus and ReLU?
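The following sketch implements the four activations above in NumPy and prints softplus and ReLU side by side on a small grid, which can help with the question above. The function names and the grid of test points are choices made for this example.

```python
import numpy as np

def logistic(v, a=1.0):
    """Logistic (sigmoid) activation with slope hyperparameter a."""
    return 1.0 / (1.0 + np.exp(-a * v))

def tanh_act(v, a=1.0, b=1.0):
    """Scaled hyperbolic tangent activation a * tanh(b v)."""
    return a * np.tanh(b * v)

def softplus(v):
    """Softplus activation ln(1 + e^v)."""
    return np.log1p(np.exp(v))

def relu(v):
    """Rectified linear unit max{0, v}."""
    return np.maximum(0.0, v)

# Compare softplus and ReLU on a grid of inputs.
v = np.linspace(-5, 5, 11)
print(np.round(relu(v), 3))
print(np.round(softplus(v), 3))   # a smooth curve that stays close to ReLU away from 0
```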

Slide 8.11

In this chapter, we consider feedforward neural networks. A feedforward neural network has a very special structure. As shown in the figure, every neuron in layer $l$ is connected to all neurons in layer $l+1$. There are no direct connections between neurons in non-adjacent layers or between neurons in the same layer. Although the figure shows a neural network with two hidden layers, we will present the results for neural networks with an arbitrary number of layers.
Feedforward neural network structure:
Consider a neural network with $L - 1$ hidden layers. We call the input layer layer 0. Hidden layers are indexed by $l \in \{1, \dots, L-1\}$, and the output layer is layer $L$. We use the following notation:

• $d^{(l)}$: number of neurons in layer $l$. We index the bias neuron (bias term) by 0.
• $x_i^{(l-1)}$, $0 \le i \le d^{(l-1)}$: output of the $i$th neuron at layer $l-1$.
• $x_j^{(l)}$, $0 \le j \le d^{(l)}$: output of the $j$th neuron at layer $l$.
• $w_{ij}^{(l)}$, $0 \le i \le d^{(l-1)}$, $1 \le j \le d^{(l)}$: weight between the $i$th neuron in layer $l-1$ and the $j$th neuron in layer $l$.
• $w = \{w_{ij}^{(l)}\}$: set of all weights.
• $v_j^{(l)}$, $1 \le j \le d^{(l)}$: induced local field of the $j$th neuron at layer $l$.

We have for $j > 0$:
$$v_j^{(l)} = \sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)} x_i^{(l-1)}.$$
Thus, the output of neuron $j$ at layer $l$ is given as
$$x_j^{(l)} = \phi\big(v_j^{(l)}\big).$$

Performing a forward pass to compute the output given the input:

In a feedforward neural network, given a $d^{(0)}$-dimensional input $x = [x_1^{(0)}, \dots, x_{d^{(0)}}^{(0)}]^T$, the $d^{(L)}$-dimensional output $x^{(L)} = [x_1^{(L)}, \dots, x_{d^{(L)}}^{(L)}]^T$ can be computed by calculating the outputs of layers 1 to $L$ consecutively:
$$x = \begin{bmatrix} x_1^{(0)} \\ \vdots \\ x_{d^{(0)}}^{(0)} \end{bmatrix} \rightarrow \begin{bmatrix} x_1^{(1)} \\ \vdots \\ x_{d^{(1)}}^{(1)} \end{bmatrix} \rightarrow \dots \rightarrow \begin{bmatrix} x_1^{(l)} = \phi\big(\sum_{i=0}^{d^{(l-1)}} w_{i1}^{(l)} x_i^{(l-1)}\big) \\ \vdots \\ x_{d^{(l)}}^{(l)} = \phi\big(\sum_{i=0}^{d^{(l-1)}} w_{i d^{(l)}}^{(l)} x_i^{(l-1)}\big) \end{bmatrix} \rightarrow \dots \rightarrow \begin{bmatrix} x_1^{(L)} \\ \vdots \\ x_{d^{(L)}}^{(L)} \end{bmatrix} = x^{(L)}$$

Assuming that the activation functions are fixed, the output $x^{(L)}$ is a (vector-valued) function of the input and the weights in the network, and thus it can be written as $f_w(x) = [f_{w,1}(x), \dots, f_{w,d^{(L)}}(x)]^T = x^{(L)}$. The $k$th component of $f_w(x)$ is $f_{w,k}(x) = x_k^{(L)}$.
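The forward pass above can be written compactly with one weight matrix per layer. Below is a minimal NumPy sketch under the convention that row 0 of each weight matrix holds the bias weights and the bias output $x_0^{(l)} = 1$ is prepended at every layer; this layout and the example dimensions are assumptions of the sketch, not part of the notes.

```python
import numpy as np

def forward_pass(x, weights, phi):
    """Compute x^(L) for a feedforward network.

    x       : input vector of length d^(0) (without the bias entry).
    weights : list of arrays; weights[l] has shape (d^(l-1) + 1, d^(l)),
              with row 0 holding the bias weights w_{0j}^(l).
    phi     : activation function applied elementwise at every layer.
    """
    out = np.asarray(x, dtype=float)
    for W in weights:
        out = np.concatenate(([1.0], out))   # prepend bias output x_0 = 1
        v = W.T @ out                        # induced local fields v_j^(l)
        out = phi(v)                         # layer outputs x_j^(l) = phi(v_j^(l))
    return out                               # x^(L)

# Example: a 2-3-1 network with logistic activations and random weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(4, 1))]
phi = lambda v: 1.0 / (1.0 + np.exp(-v))
print(forward_pass([0.5, -1.0], weights, phi))
```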
Loss functions:
Consider a dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$. The loss on data instance $i$ is given as $l(f_w(x_i), y_i)$, where $l(\cdot, \cdot)$ is usually taken as a continuously differentiable function for the sake of computing gradients. The loss over the entire dataset $D$ is given as
$$J(w) = \sum_{i=1}^{n} l(f_w(x_i), y_i).$$

In a regression problem usually the identity activation is used at the output layer. Thus, $x_k^{(L)} = v_k^{(L)}$. In this case, $l(\cdot, \cdot)$ can be taken as the squared error
$$l(f_w(x_i), y_i) = \sum_{k=1}^{d^{(L)}} \big(y_{ik} - f_{w,k}(x_i)\big)^2.$$
In a regression problem usually $d^{(L)} = 1$. In this case, the loss becomes
$$l(f_w(x_i), y_i) = \big(y_i - f_w(x_i)\big)^2.$$
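A small sketch of the squared-error loss for both the multi-output and the scalar-output case; the function name and test values are made up for illustration.

```python
import numpy as np

def squared_error(y, y_pred):
    """Squared-error loss (y_ik - f_{w,k}(x_i))^2, summed over output dimensions k."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    y_pred = np.atleast_1d(np.asarray(y_pred, dtype=float))
    return float(np.sum((y - y_pred) ** 2))

print(squared_error(1.5, 1.2))                 # d^(L) = 1: (1.5 - 1.2)^2 = 0.09
print(squared_error([1.0, 0.0], [0.8, 0.3]))   # multi-output case, summed over k
```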

When using a neural network for a classification problem, each class is assigned to one output neuron. Thus, if there are $K$ classes, then $d^{(L)} = K$. It is also customary for the output to represent a probability distribution over the classes. For this, usually the softmax activation function is used at the output layer. Then, the output becomes
$$x_k^{(L)} = \frac{e^{v_k^{(L)}}}{\sum_{j=1}^{d^{(L)}} e^{v_j^{(L)}}}, \quad 1 \le k \le d^{(L)}.$$
Predictions can be produced by the argmax operation, i.e., $\hat{y} = \arg\max_{1 \le k \le d^{(L)}} x_k^{(L)}$.
In addition, labels of data instances are one-hot encoded, i.e., if data instance $i$ belongs to class $k$, then its label is represented by $y_i = [y_{i1}, \dots, y_{iK}]$ where $y_{ik} = 1$ and $y_{ij} = 0$ for $j \ne k$. A suitable loss function for classification problems is the cross-entropy
$$l(f_w(x_i), y_i) = -\sum_{k=1}^{d^{(L)}} y_{ik} \log\big(f_{w,k}(x_i)\big).$$
Since class probabilities are always in $[0, 1]$, this loss depends on the log of the probability that the output puts on the correct label.
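Here is a small sketch of the softmax output and the cross-entropy loss for a one-hot label. The max-subtraction inside softmax is a standard numerical-stability trick added for this example, and the sample local fields are made up.

```python
import numpy as np

def softmax(v):
    """Softmax over the output-layer local fields v^(L)."""
    z = np.exp(v - np.max(v))          # subtract max for numerical stability
    return z / np.sum(z)

def cross_entropy(y_onehot, probs):
    """Cross-entropy -sum_k y_k log(p_k) for a one-hot label."""
    return float(-np.sum(y_onehot * np.log(probs)))

v_L = np.array([2.0, 0.5, -1.0])       # local fields of the K = 3 output neurons
probs = softmax(v_L)                   # network output x^(L): a distribution over classes
y = np.array([1.0, 0.0, 0.0])          # one-hot label: instance belongs to class 1
print(probs, probs.argmax())           # predicted class via argmax
print(cross_entropy(y, probs))         # loss = -log(probability of the correct class)
```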
Minimizing the loss:
Similar to what is done in the previous chapters, our objective is to select a set of weights that minimizes the loss, i.e.,
$$w^{*} = \arg\min_{w} J(w).$$
This is a daunting task since there are many parameters to optimize. Moreover, $J(w)$ can have many local minimizers whose performance is much worse than the global minimizer. Nevertheless, in the following part, we will explain how gradient descent can be implemented in a feedforward neural network in a computationally efficient way to minimize $J(w)$.
Batch gradient descent: Start with a random set of initial weights $w(0)$. At each iteration perform the following update
$$w(n+1) = w(n) - \eta \nabla J(w(n)),$$
where $J(w(n))$ denotes the loss of the neural network with weights $w(n)$ on the entire training data. This is not the preferred method since we need to evaluate the loss on the entire dataset at each iteration, which is computationally prohibitive.
Stochastic gradient descent (SGD): Start with a random set of initial weights $w(0)$. At each iteration randomly pick one data instance $(x(n), y(n))$ from $D$, and perform the following update:
$$w(n+1) = w(n) - \eta \nabla l\big(f_{w(n)}(x(n)), y(n)\big).$$
This allows faster updates. Moreover, the randomization also helps by avoiding bad local minima, and the expectation of the gradient taken over the randomness of the chosen data sample is equal to the batch gradient.
To make SGD less noisy, usually mini-batch gradient descent is preferred, where the gradient is computed over a mini-batch of samples (e.g., 10 samples) instead of a single sample.
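The three variants differ only in how many samples enter each gradient evaluation. Below is a schematic mini-batch SGD loop; grad_loss is a hypothetical function returning the gradient of the loss over the given samples (for a neural network it would be supplied by backpropagation, discussed next), and the batch size and number of epochs are arbitrary choices for this sketch.

```python
import numpy as np

def minibatch_sgd(w0, X, Y, grad_loss, eta=0.1, batch_size=10, epochs=50, seed=0):
    """Generic mini-batch SGD loop over arrays X (inputs) and Y (labels).

    grad_loss(w, X_batch, Y_batch) must return the gradient of the loss on the
    batch with respect to the weights w. batch_size=1 recovers SGD;
    batch_size=len(X) recovers batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                       # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w = w - eta * grad_loss(w, X[idx], Y[idx])   # w(n+1) = w(n) - eta * gradient
    return w
```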

Slide 8.14

Here, we explain how to compute gradients using backpropagation. We focus on SGD. Let
$$e(w) = l(f_w(x), y).$$
For each layer-$l$ neuron $j$, let
$$\delta_j^{(l)} = -\frac{\partial e(w)}{\partial v_j^{(l)}}.$$
By the chain rule, we can calculate $\delta_i^{(l-1)}$ for each layer-$(l-1)$ neuron $i$ if we know $\delta_j^{(l)}$ for each layer-$l$ neuron $j$, as follows:
$$\delta_i^{(l-1)} = -\frac{\partial e(w)}{\partial v_i^{(l-1)}} = \sum_{j=1}^{d^{(l)}} \left(-\frac{\partial e(w)}{\partial v_j^{(l)}}\right) \times \frac{\partial v_j^{(l)}}{\partial x_i^{(l-1)}} \times \frac{\partial x_i^{(l-1)}}{\partial v_i^{(l-1)}} = \sum_{j=1}^{d^{(l)}} \delta_j^{(l)} w_{ij}^{(l)} \phi'\big(v_i^{(l-1)}\big)$$

Thus, all gradients can be computed in the following way:

• First compute $\delta_j^{(L)}$ for $j = 1, \dots, d^{(L)}$.
• Use $\delta_j^{(l)}$, $j = 1, \dots, d^{(l)}$, to compute $\delta_j^{(l-1)}$ for all $j = 1, \dots, d^{(l-1)}$.

To update the weights, we need to compute the gradients with respect to the weights. This can be done easily after the $\delta$s are computed. Note that
$$\frac{\partial e(w)}{\partial w_{ij}^{(l)}} = \frac{\partial e(w)}{\partial v_j^{(l)}} \times \frac{\partial v_j^{(l)}}{\partial w_{ij}^{(l)}} = -\delta_j^{(l)} x_i^{(l-1)}.$$

Finally, we show how the $\delta$s can be computed at the output layer. We illustrate this using the squared-error loss function. A similar analysis can be done for the cross-entropy loss function as well. Let
$$e(w) = \frac{1}{2} \sum_{k=1}^{d^{(L)}} e_k^2(w), \quad \text{where } e_k(w) = (y_k - \hat{y}_k) = \big(y_k - \phi(v_k^{(L)})\big).$$
Then,
$$\delta_j^{(L)} = -\frac{\partial \left(\frac{1}{2} \sum_{k=1}^{d^{(L)}} e_k^2(w)\right)}{\partial v_j^{(L)}} = -e_j(w) \frac{\partial e_j(w)}{\partial v_j^{(L)}} = e_j(w)\, \phi'\big(v_j^{(L)}\big).$$
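Putting the pieces together, here is a sketch of one backpropagation pass for a network with logistic activations in every layer and the squared-error loss above. The weight-matrix layout (row 0 for bias weights) matches the earlier forward-pass sketch, and the example dimensions and learning rate are assumptions of this illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop(x, y, weights):
    """One forward + backward pass; returns de/dw for every layer.

    weights[l] has shape (d^(l-1) + 1, d^(l)), with row 0 holding bias weights.
    Uses logistic activations everywhere and e(w) = 0.5 * sum_k (y_k - xhat_k)^2,
    so phi'(v) = x * (1 - x) with x = phi(v).
    """
    # Forward pass, storing each layer's outputs with the bias x_0 = 1 prepended.
    outputs = [np.concatenate(([1.0], np.asarray(x, dtype=float)))]
    for W in weights:
        v = W.T @ outputs[-1]
        outputs.append(np.concatenate(([1.0], sigmoid(v))))
    x_L = outputs[-1][1:]                               # network output x^(L)

    # Output-layer deltas: delta_j^(L) = e_j(w) * phi'(v_j^(L)).
    delta = (np.asarray(y, dtype=float) - x_L) * x_L * (1.0 - x_L)

    grads = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        # de/dw_ij = -delta_j * x_i^(l-1), including the bias entry i = 0.
        grads[l] = -np.outer(outputs[l], delta)
        if l > 0:
            x_prev = outputs[l][1:]                     # x_i^(l-1) for i >= 1 (skip bias)
            # delta_i^(l-1) = sum_j delta_j^(l) w_ij^(l) * phi'(v_i^(l-1)).
            delta = (weights[l][1:, :] @ delta) * x_prev * (1.0 - x_prev)
    return grads

# One SGD step on a 2-3-1 network with random weights, for illustration only.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(4, 1))]
grads = backprop([0.5, -1.0], [1.0], weights)
eta = 0.1
weights = [W - eta * g for W, g in zip(weights, grads)]
```

The backward loop computes the $\delta$s layer by layer exactly in the order listed above, and the returned arrays are the gradients $\partial e(w)/\partial w_{ij}^{(l)}$ used in the SGD update.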
