Introduction To Artificial Neural Networks and Perceptron

- Artificial neural networks (ANNs) were first introduced in 1943 and aimed to mimic the human brain. Early successes led to belief in intelligent machines, but progress stalled until the 1980s, when new network architectures and better training techniques revived the field.
- Biological neurons receive and transmit electrical signals at connections called synapses. ANNs abstract this model with simple mathematical units (artificial neurons) that perform computations.
- The perceptron is a basic ANN architecture, consisting of a single layer of linear threshold units (LTUs) that classify inputs. Weights are updated using the perceptron learning rule to minimize errors on the training data. However, perceptrons cannot represent non-linear problems like XOR.
- The limitations


Introduction to Artificial Neural Networks and Perceptron
History of ANN
From Biological to Artificial Neurons
• ANNs were first introduced in 1943 by the neurophysiologist Warren
McCulloch and the mathematician Walter Pitts.
• McCulloch and Pitts presented a simplified computational model of how
biological neurons might work together to perform complex computations
using propositional logic.
• The early successes of ANNs until the 1960s led to the widespread belief
that we would soon be conversing with truly intelligent machines.
• When it became clear that this promise would go unfulfilled (at least for
quite a while), funding flew elsewhere and ANNs entered a long dark
era.
• In the early 1980s there was a revival of interest in ANNs as new
network architectures were invented and better training techniques
were developed.
• ANNs now seem to have entered a virtuous circle of funding and progress.
• ANN-based products regularly make headlines, pulling more and more
attention and funding toward them, resulting in more and more
progress and even more amazing products.
• But by the 1990s, powerful alternative Machine Learning techniques
such as Support Vector Machines were favored by most researchers, as
they seemed to offer better results and stronger theoretical
foundations.
• Another wave of interest in ANNs is due to a few good reasons:
• ANNs outperform other ML techniques on very large datasets and complex
problems.
• Thanks to the tremendous increase in computing power, it is now possible
to train large neural networks in a reasonable amount of time.
• The training algorithms have been improved.
Biological Neurons
• The cell is composed of a cell body containing the nucleus and most of the
cell’s complex components, many branching extensions
called dendrites, and one very long extension called the axon.
• The axon’s length may be just a few times longer than the cell body, or
up to tens of thousands of times longer.
• Near its extremity the axon splits off into many branches
called telodendria, and at the tip of these branches are miniature
structures called synaptic terminals (or simply synapses), which are
connected to the dendrites (or directly to the cell body) of other
neurons.
• Biological neurons receive short electrical impulses called signals from
other neurons via these synapses.

• When a neuron receives a sufficient number of signals from other
neurons within a few milliseconds, it fires its own signals.
• Each neuron is typically connected to thousands of other neurons; the
neurons are organized in a vast network of billions of neurons.
• Highly complex computations can be performed by a vast network of
fairly simple neurons.
Neuron Model

• Neuron collects signals from dendrites


• Sends out spikes of electrical activity through an
axon, which splits into thousands of branches.
• At the end of each branch, a synapse converts the activity
into either excitation or inhibition of a
dendrite of another neuron.
• A neuron fires when excitatory activity surpasses
inhibitory activity.
• Learning changes the effectiveness of the synapses
Neuron Model

• Abstract neuron model (figure).
Logical Computations with Neurons
• Warren McCulloch and Walter Pitts proposed a very simple model of the
biological neuron, which later became known as the artificial neuron.
• It has one or more binary (on/off) inputs and one binary output.
• The artificial neuron simply activates its output when more than a
certain number of its inputs are active.
• Even with such a simplified model, it is possible to perform complex
computations using propositional logic.
• For example, let’s build a few ANNs that perform various logical
computations , assuming that a neuron is activated when at least two of
its inputs are active.
ANN performs simple Logical Computations

1. Identity function: If neuron A is activated, then neuron C gets activated
as well (since it receives two input signals from neuron A); but if neuron A
is off, then neuron C is off as well.
2. Logical AND: Neuron C is activated only when both neurons A and B are
activated (a single input signal is not enough to activate neuron C).
3. Logical OR: Neuron C gets activated if either neuron A or neuron B is
activated (or both).
4. A slightly more complex logical proposition: Neuron C is
activated only if neuron A is active and neuron B is off. If neuron A is
active all the time, then you get a logical NOT: neuron C is active when
neuron B is off, and vice versa.
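The four networks above can be sketched with a minimal McCulloch-Pitts-style unit: binary inputs, and the neuron fires when at least two of its connections are active (the threshold assumed in the text). The function names and the negative-weight treatment of case 4 are illustrative, not from the original figure.

```python
def mp_neuron(inputs, threshold=2):
    """Fires (outputs 1) when at least `threshold` inputs are active."""
    return 1 if sum(inputs) >= threshold else 0

def identity(a):
    # Case 1: C receives two input signals from A, so C mirrors A.
    return mp_neuron([a, a])

def and_gate(a, b):
    # Case 2: a single active input is not enough to reach the threshold.
    return mp_neuron([a, b])

def or_gate(a, b):
    # Case 3: each input is connected twice, so one active input suffices.
    return mp_neuron([a, a, b, b])

def a_and_not_b(a, b):
    # Case 4 sketch: B's connection is inhibitory (negative weight),
    # so C fires only when A is on and B is off.
    return 1 if 2 * a - 2 * b >= 2 else 0
```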
The Perceptron
• The Perceptron is one of the simplest ANN architectures, invented in
1957 by Frank Rosenblatt.

• It is based on a slightly different artificial neuron called a linear
threshold unit (LTU).
• The inputs and output are now numbers (instead of binary on/off
values) and each input connection is associated with a weight.
• The LTU computes a weighted sum of its inputs, z = w1x1 + w2x2 + ⋯
+ wnxn = wT · x, then applies a step function to that sum and outputs the
result: hw(x) = step(z) = step(wT · x).
The Perceptron: linear threshold unit (LTU)
or threshold logic unit (TLU)
• The most common step function used in Perceptrons is the Heaviside
step function, sometimes the signum function is used instead.

• A single LTU can be used for simple linear binary classification.


• It computes a linear combination of the inputs and if the result exceeds
a threshold, it outputs the positive class or else outputs the negative
class (just like a Logistic Regression classifier or a linear SVM).
• For example, a single LTU can be used to classify iris flowers based on
petal length and width.
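This iris example can be sketched with Scikit-Learn's Perceptron class (introduced later in these slides). Classifying Iris setosa versus the other species from petal length and width is a linearly separable problem; the test point [2.0, 0.5] is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]              # petal length, petal width
y = (iris.target == 0).astype(int)    # 1 = Iris setosa, 0 = other species

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

# A short, narrow petal should land on the setosa side of the boundary.
y_pred = per_clf.predict([[2.0, 0.5]])
```
Because the classes are linearly separable, the perceptron convergence theorem guarantees the learned boundary classifies the training set perfectly.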
• A Perceptron is simply composed of a single layer of LTUs, with each
neuron connected to all the inputs.
• These connections are often represented using special pass-through
neurons called input neurons: they just output whatever input they are fed.
• Moreover, an extra bias feature is generally added (x0 = 1).
• This bias feature is typically represented using a special type of neuron
called a bias neuron, which just outputs 1 all the time.
• A Perceptron with two inputs and three outputs is represented in the figure.
• This Perceptron can classify instances simultaneously into three different
binary classes, which makes it a multioutput classifier.
The Perceptron
Perceptron learning rule (weight update)

wi,j(next step) = wi,j + η (yj − ŷj) xi

• wi,j is the connection weight between the ith input neuron and the
jth output neuron.
• xi is the ith input value of the current training instance.
• ŷj is the output of the jth output neuron for the current training instance.
• yj is the target output of the jth output neuron for the current training
instance.
• η is the learning rate.
• The decision boundary of each output neuron is linear, so Perceptrons
are incapable of learning complex patterns (just like Logistic Regression
classifiers).

• However, if the training instances are linearly separable, Rosenblatt
demonstrated that this algorithm would converge to a solution. This
is called the Perceptron convergence theorem.
• Scikit-Learn provides a Perceptron class that implements a single-LTU
network.
• The Perceptron learning algorithm strongly resembles Stochastic
Gradient Descent (SGD).
• In fact, Scikit-Learn’s Perceptron class is equivalent to using
an SGDClassifier with the following hyperparameters:
loss="perceptron", learning_rate="constant", eta0=1 (the learning
rate), and penalty=None (no regularization).
• Note that contrary to Logistic Regression classifiers, Perceptrons do not
output a class probability; rather, they just make predictions based on a
hard threshold.
• This is one of the good reasons to prefer Logistic Regression over
Perceptrons.
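The equivalence above can be sketched as follows; the AND-gate toy data is illustrative, chosen only because it is trivially linearly separable.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# AND-gate data: linearly separable toy problem
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 0, 1])

# The hyperparameters listed in the text make SGDClassifier behave
# like the Perceptron class: perceptron loss, constant learning rate
# eta0=1, and no regularization.
sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                        eta0=1, penalty=None, random_state=42)
sgd_clf.fit(X, y)
```
Note that, as the text says, neither model outputs class probabilities; `predict` applies a hard threshold to the decision function.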
OR GATE Perceptron Training Rule
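The OR-gate training shown on these slides can be sketched from scratch using the perceptron learning rule wi ← wi + η (y − ŷ) xi; the starting weights of zero and the learning rate of 0.1 are illustrative choices.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])        # OR-gate targets

w = np.zeros(2)                   # connection weights, initialised to zero
b = 0.0                           # bias weight
eta = 0.1                         # learning rate (illustrative)

def step(z):
    """Heaviside step function."""
    return 1 if z >= 0 else 0

for epoch in range(10):
    for xi, target in zip(X, y):
        y_hat = step(w @ xi + b)
        # Perceptron learning rule: w_i <- w_i + eta * (y - y_hat) * x_i
        w = w + eta * (target - y_hat) * xi
        b = b + eta * (target - y_hat)

predictions = [step(w @ xi + b) for xi in X]
```
Because OR is linearly separable, the convergence theorem guarantees this loop settles on weights that classify all four patterns correctly.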
Parallel Implementation of Perceptron

• The training of a perceptron is an inherently sequential process.
• If the number of dimensions of the vectors involved is huge, then we
might obtain some parallelism by computing dot products in parallel.
• In order to get significant parallelism, we have to modify the
perceptron algorithm slightly, so that many training examples are
used with the same estimated weight vector w.
• Let us formulate the parallel algorithm as a MapReduce job.
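The batched variant can be sketched as follows: within each round, every example is scored against the same fixed weight vector (the "map" step, which is embarrassingly parallel), and the per-example updates are then summed (the "reduce" step). This is a single-process sketch of the idea, not an actual MapReduce job; the data, eta=1.0, and round count are illustrative.

```python
import numpy as np

# OR-gate examples with a constant bias input of 1 prepended
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])

def map_step(w, X, y, eta=1.0):
    """'Map': score all examples against the SAME fixed w and emit one
    (possibly zero) weight update per example — these dot products
    could be computed in parallel."""
    preds = (X @ w >= 0).astype(int)
    errors = y - preds                  # 0 for correctly classified examples
    return eta * errors[:, None] * X    # per-example updates

def reduce_step(w, updates):
    """'Reduce': sum the per-example updates into the new weight vector."""
    return w + updates.sum(axis=0)

w = np.zeros(3)
for _ in range(20):                     # a few synchronous rounds
    w = reduce_step(w, map_step(w, X, y))

preds = (X @ w >= 0).astype(int)
```
Once the weight vector separates the data, every error is zero and the updates vanish, so later rounds leave w unchanged.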
Weaknesses of Perceptron
• Incapable of solving some trivial problems, e.g., the Exclusive OR (XOR)
classification problem, as highlighted by Marvin Minsky and Seymour
Papert.

• So, many researchers dropped connectionism (i.e., the study of neural
networks) altogether in favor of higher-level problems such as logic,
problem solving, and search.
• The limitations of Perceptrons can be eliminated by stacking multiple
Perceptrons.
• The resulting ANN is called a Multi-Layer Perceptron (MLP).
Perceptron Learning Theorem

• Recap: a perceptron (threshold unit) can learn anything that it can
represent (i.e., anything separable with a hyperplane).
The Exclusive OR problem

A Perceptron cannot represent Exclusive OR since it is not linearly
separable.
Minsky & Papert (1969) offered a solution to the XOR problem by
combining perceptron unit responses using a second layer of
units: piecewise-linear classification using an MLP with
threshold (perceptron) units. (figure)
• In particular, an MLP can solve the XOR problem
• For each combination of inputs: with inputs (0, 0) or (1, 1) the network
outputs 0, and with inputs (0, 1) or (1, 0) it outputs 1.
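A two-layer threshold network of this kind can be sketched with hand-set weights (the particular weights and thresholds below are illustrative; the slide's figure may use different values): one hidden unit computes OR, the other AND, and the output fires for "OR and not AND", which is exactly XOR.

```python
def step(z):
    """Heaviside step function used by the threshold (perceptron) units."""
    return 1 if z >= 0 else 0

def xor_mlp(a, b):
    h1 = step(a + b - 0.5)        # hidden unit 1 fires for OR(a, b)
    h2 = step(a + b - 1.5)        # hidden unit 2 fires for AND(a, b)
    return step(h1 - h2 - 0.5)    # output: h1 AND NOT h2, i.e. XOR
```
Each unit alone draws a linear boundary; stacking them yields the piecewise-linear region a single perceptron cannot represent.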
Multi-Layer Perceptron and Its Properties
• An MLP is composed of one (pass-through) input layer, one or more
layers of LTUs called hidden layers, and one final layer of LTUs called
the output layer (often three or more layers in total).
• Every layer except the output layer includes a bias neuron and is fully
connected to the next layer.
• There are no connections within a layer.
• There are no direct connections between the input and output layers.
• The number of output units need not equal the number of input units.
• The number of hidden units per layer can be more or less than the
number of input or output units.
• When an ANN has two or more hidden layers, it is called a deep neural
network (DNN).
What does each of the layers do?
• 1st layer draws linear boundaries.
• 2nd layer combines the boundaries.
• 3rd layer can generate arbitrarily complex boundaries.
MLP with Back propagation
• Rumelhart introduced the back propagation training algorithm for MLPs.
• BP has two phases: a forward pass and a backward pass.
• Forward pass phase: computes the ‘functional signal’, the feed-forward
propagation of input pattern signals through the network.
• For each training instance, the algorithm feeds it to the network and
computes the output of every neuron in each consecutive layer.
• Backward pass phase: computes the ‘error signal’, propagating the error
backwards through the network starting at the output units (where the
error is the difference between actual and desired output values).
• It then proceeds to measure how much of these error contributions
came from each neuron in the previous hidden layer, and so on until
the algorithm reaches the input layer.

• This reverse pass efficiently measures the error gradient across all the
connection weights in the network by propagating the error gradient
backward in the network (hence the name of the algorithm).
• For each training instance, the back propagation algorithm:
1. First makes a prediction (forward pass),
2. Measures the error,
3. Then goes through each layer in reverse to
measure the error contribution from each
connection (reverse pass),
4. and finally slightly tweaks the connection weights
to reduce the error (Gradient Descent step).
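The four steps can be sketched from scratch on a tiny 2-2-1 network with logistic activations; the network size, random initial weights, squared-error loss, and learning rate below are all illustrative choices, not the slides' worked example.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer weights
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # output layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, eta=0.5):
    global W1, b1, W2, b2
    # 1. Forward pass: make a prediction.
    h = sigmoid(x @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # 2. Measure the error (squared error here).
    err = y_hat - y
    # 3. Reverse pass: error contribution of each connection, layer by layer.
    delta2 = err * y_hat * (1 - y_hat)            # output-layer local gradient
    delta1 = (delta2 @ W2.T) * h * (1 - h)        # hidden-layer local gradient
    # 4. Gradient Descent step: slightly tweak every connection weight.
    W2 -= eta * np.outer(h, delta2); b2 -= eta * delta2
    W1 -= eta * np.outer(x, delta1); b1 -= eta * delta1
    return (err ** 2).item()

x, y = np.array([1.0, 0.0]), np.array([1.0])
losses = [train_step(x, y) for _ in range(200)]
```
Repeating the step on the same instance drives the error down, which is all the Gradient Descent update promises.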
Key change to the MLP’s architecture
• For the algorithm to work properly, the step function is replaced with
the logistic function, σ(z) = 1 / (1 + exp(–z)).
• This is because the step function contains only flat segments, and
Gradient Descent cannot move on a flat surface.
• The logistic function, in contrast, has a well-defined nonzero derivative
everywhere, allowing Gradient Descent to make some
progress at every step.
• The back propagation algorithm also works with two other
popular activation functions:
• The hyperbolic tangent function: tanh (z) = 2σ(2z) – 1
• It is S-shaped, continuous, and differentiable, but its output
value ranges from –1 to 1 (instead of 0 to 1 in the case of the
logistic function),
• which tends to make each layer’s output more or less
normalized (i.e., centered around 0) at the beginning of
training. This often helps speed up convergence.
• The ReLU function: ReLU (z) = max (0, z).
• It is continuous but unfortunately not differentiable at z = 0
(the slope changes abruptly, which can make Gradient
Descent bounce around).
• In practice it works very well and has the advantage of being
fast to compute.
• Most importantly, the fact that it does not have a maximum
output value also helps reduce some issues during Gradient
Descent
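The activation functions discussed above can be written out directly, and the identity tanh(z) = 2σ(2z) − 1 from the text can be checked numerically:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """ReLU(z) = max(0, z): continuous, cheap, not differentiable at 0."""
    return np.maximum(0.0, z)

z = np.linspace(-3.0, 3.0, 7)
# tanh is S-shaped like the logistic function but ranges from -1 to 1:
tanh_via_sigmoid = 2 * sigmoid(2 * z) - 1
```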
Conceptually: Forward Activity, Backward Error
Forward Propagation of Activity
• Step 1: Initialise weights at random, choose a
learning rate η
• Until network is trained:
• For each training example i.e. input pattern and
target output(s):
• Step 2: Do forward pass through net (with fixed
weights) to produce output(s)
• i.e., in Forward Direction, layer by layer:
• Inputs applied
• Multiplied by weights
• Summed
• ‘Squashed’ by sigmoid activation function
• Output passed to each neuron in next layer
• Repeat above until network output(s) produced

Step 3: Back-propagation of error
• Compute the error (delta or local gradient) for each output unit δk.
• Layer by layer, compute the error (delta or local gradient) for each
hidden unit δj by backpropagating the errors (as shown previously).

Step 4: Update all the weights Δwij by gradient descent, and go back to Step 2:

Update = LearningFactor · (DesiredOutput − ActualOutput) · Input

• The overall MLP learning algorithm, involving the forward pass
and backpropagation of error (until network training completes),
is known as the Generalised Delta Rule (GDR) or, more commonly,
the Back Propagation (BP) algorithm.
‘Back-prop’ algorithm summary (with NO Maths!)
‘Back-prop’ algorithm summary (with Maths!) (Not Examinable)
MLP/BP: A worked example
Worked example: Forward Pass
Worked example: Backward Pass
Worked example: Update Weights Using the Generalized Delta Rule (BP)

Update = LearningFactor · (DesiredOutput − ActualOutput) · Input

Similarly for all the weights wij:
Verification that it works
• An MLP is often used for classification, with each output
corresponding to a different binary class (e.g., spam/ham,
urgent/not-urgent, and so on).
• When the classes are exclusive (e.g., classes 0 through 9 for
digit image classification), the output layer is typically
modified by replacing the individual activation functions with a
shared softmax function.
• The output of each neuron then corresponds to the estimated
probability of the corresponding class.
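The shared softmax function can be sketched as follows: the raw output-layer scores (the example values are illustrative) are turned into a probability distribution over the exclusive classes.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max before exponentiating is a standard trick for
    # numerical stability; it does not change the result.
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw output-layer scores for 3 classes
probs = softmax(scores)              # estimated class probabilities
```
The outputs are positive and sum to 1, and the highest score maps to the highest probability.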

Note that the signal flows only in one direction (from the
inputs to the outputs), so this architecture is an example of
a feedforward neural network (FNN).
• Biological neurons seem to implement a roughly sigmoid (S-shaped)
activation function, so researchers stuck to sigmoid functions for a very
long time.

• But it turns out that the ReLU activation function generally works better
in ANNs.
