Introduction to Artificial Neural Networks and the Perceptron
History of ANN
From Biological to Artificial Neurons
• ANNs were first introduced in 1943 by the neurophysiologist Warren
McCulloch and the mathematician Walter Pitts.
• McCulloch and Pitts presented a simplified computational model of how
biological neurons might work together to perform complex computations
using propositional logic.
• The early successes of ANNs until the 1960s led to the widespread belief
that we would soon be conversing with truly intelligent machines.
• When it became clear that this promise would go unfulfilled (at least for
quite a while), funding flew elsewhere and ANNs entered a long dark
era.
• In the early 1980s there was a revival of interest in ANNs as new
network architectures were invented and better training techniques
were developed.
• ANNs now seem to have entered a virtuous circle of funding and progress.
• Amazing products based on ANNs regularly make headline news, which
pulls more and more attention and funding toward them, resulting in more
and more progress, and even more amazing products.
• But by the 1990s, powerful alternative Machine Learning techniques
such as Support Vector Machines were favored by most researchers, as
they seemed to offer better results and stronger theoretical
foundations.
• Another wave of interest in ANNs is due to a few good reasons:
• ANNs frequently outperform other ML techniques on very large and complex
problems.
• The tremendous increase in computing power makes it possible to train large
neural networks in a reasonable amount of time.
• The training algorithms have been improved.
Biological Neurons
• The cell is composed of a cell body containing the nucleus and most of the
cell’s complex components, many branching extensions called dendrites,
and one very long extension called the axon.
• The axon’s length may be just a few times longer than the cell body, or
up to tens of thousands of times longer.
• Near its extremity the axon splits off into many branches
called telodendria, and at the tip of these branches are miniature
structures called synaptic terminals (or simply synapses), which are
connected to the dendrites (or directly to the cell body) of other
neurons.
• Biological neurons receive short electrical impulses called signals from
other neurons via these synapses.
• When a neuron receives a sufficient number of signals from other
neurons within a few milliseconds, it fires its own signals.
• Each neuron is typically connected to thousands of other neurons, and the
neurons are organized in a vast network of billions of neurons.
• Highly complex computations can be performed by a vast network of
fairly simple neurons.
Neuron Model
• Neuron collects signals from dendrites
• Sends out spikes of electrical activity through an
axon, which splits into thousands of branches.
• At the end of each branch, a synapse converts activity
into either excitatory or inhibitory activity of a
dendrite at another neuron.
• A neuron fires when excitatory activity surpasses
inhibitory activity.
• Learning changes the effectiveness of the synapses
Neuron Model
• Abstract neuron model:
Logical Computations with Neurons
• Warren McCulloch and Walter Pitts proposed a very simple model of the
biological neuron, which later became known as the artificial neuron.
• It has one or more binary (on/off) inputs and one binary output.
• The artificial neuron simply activates its output when more than a
certain number of its inputs are active.
• Even with this simplified model, it is possible to build networks of
artificial neurons that compute complex logical propositions.
• For example, let’s build a few ANNs that perform various logical
computations, assuming that a neuron is activated when at least two of
its inputs are active.
ANNs Performing Simple Logical Computations
1. Identity function: If neuron A is activated, then neuron C gets activated
as well (since it receives two input signals from neuron A), but if neuron A
is off, then neuron C is off as well.
2. Logical AND: Neuron C is activated only when both neurons A and B are
activated (a single input signal is not enough to activate neuron C).
3. Logical OR: Neuron C gets activated if either neuron A or neuron B is
activated (or both).
4. Computes a slightly more complex logical proposition: Neuron C is
activated only if neuron A is active and if neuron B is off. If neuron A is
active all the time, then you get a logical NOT: neuron C is active when
neuron B is off, and vice versa.
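To make these four cases concrete, here is a minimal Python sketch of such threshold neurons. The function name and the explicit handling of inhibitory connections are assumptions made only for illustration: the neuron fires when at least two excitatory inputs are active and no inhibitory input is active.

```python
def mcp_neuron(excitatory, inhibitory=(), threshold=2):
    """McCulloch-Pitts style binary neuron (illustrative sketch).

    Fires (returns 1) when at least `threshold` excitatory inputs are active
    and no inhibitory input is active; otherwise returns 0.
    """
    if any(inhibitory):
        return 0
    return int(sum(excitatory) >= threshold)

for A in (0, 1):
    for B in (0, 1):
        identity = mcp_neuron([A, A])          # 1. C = A (two signals from A)
        logical_and = mcp_neuron([A, B])       # 2. C = A AND B
        logical_or = mcp_neuron([A, A, B, B])  # 3. C = A OR B
        a_and_not_b = mcp_neuron([A, A], [B])  # 4. C = A AND (NOT B)
        print(A, B, identity, logical_and, logical_or, a_and_not_b)
```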
The Perceptron
• The Perceptron is one of the simplest ANN architectures, invented in
1957 by Frank Rosenblatt.
• It is based on a slightly different artificial neuron called a linear
threshold unit (LTU).
• The inputs and output are now numbers (instead of binary on/off
values) and each input connection is associated with a weight.
• The LTU computes a weighted sum of its inputs (z = w1x1 + w2x2 + ⋯
+ wnxn = wᵀ·x), then applies a step function to that sum and outputs the
result: hw(x) = step(z) = step(wᵀ·x).
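As a minimal sketch of this computation (the input and weight values below are assumptions chosen only for illustration):

```python
import numpy as np

def heaviside(z):
    # Heaviside step function: 0 if z < 0, else 1
    return int(z >= 0)

def ltu(x, w):
    z = np.dot(w, x)      # weighted sum: z = w1*x1 + ... + wn*xn = w.T @ x
    return heaviside(z)   # h_w(x) = step(z)

x = np.array([1.4, 0.2])    # example input features
w = np.array([0.5, -1.0])   # example weights
print(ltu(x, w))            # prints 1, since 0.5*1.4 - 1.0*0.2 = 0.5 >= 0
```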
The Perceptron: Linear Threshold Unit (LTU) or Threshold Logic Unit (TLU)
• The most common step function used in Perceptrons is the Heaviside
step function (0 when z < 0, 1 when z ≥ 0); sometimes the sign function is
used instead.
• A single LTU can be used for simple linear binary classification.
• It computes a linear combination of the inputs and if the result exceeds
a threshold, it outputs the positive class or else outputs the negative
class (just like a Logistic Regression classifier or a linear SVM).
• For example, a single LTU can be used to classify iris flowers based on petal
length and width.
• A Perceptron is simply composed of a single layer of LTUs, with each
neuron connected to all the inputs.
• These connections are often represented using special pass-through
neurons called input neurons: they just output whatever input they are fed.
• Moreover, an extra bias feature is generally added (x0 = 1).
• This bias feature is typically represented using a special type of neuron
called a bias neuron, which just outputs 1 all the time.
• As an example, consider a Perceptron with two inputs and three outputs.
• This Perceptron can classify instances simultaneously into three different
binary classes, which makes it a multioutput classifier.
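As a minimal sketch of this layer computation (the input values, weights, and biases below are assumptions chosen only for illustration), the three outputs can be computed in one vectorized step:

```python
import numpy as np

# One instance with two input features (illustrative values)
X = np.array([[0.5, 1.5]])

# One column of weights per output neuron (2 inputs x 3 outputs), plus biases
W = np.array([[ 0.2, -0.4, 0.1],
              [-0.3,  0.5, 0.7]])
b = np.array([0.0, 0.1, -0.2])   # contributed by the bias neuron (always 1)

# Each output neuron is an LTU: step(weighted sum)
outputs = ((X @ W + b) >= 0).astype(int)
print(outputs)   # three binary class predictions: [[0 1 1]]
```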
Perceptron learning rule (weight update)
• wi, j is the connection weight between the ith input neuron and the
jth output neuron.
• xi is the ith input value of the current training instance.
• ŷj is the output of the jth output neuron for the current training instance.
• yj is the target output of the jth output neuron for the current training
instance.
• η is the learning rate.
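Putting these terms together, the standard Perceptron weight-update rule (Rosenblatt's learning rule, written here in the slides' inline notation) is:
wi,j (next step) = wi,j + η (yj − ŷj) xi
i.e., each weight is nudged in the direction that reduces the error the jth output neuron made on the current training instance.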
• The decision boundary of each output neuron is linear, so Perceptrons
are incapable of learning complex patterns (just like Logistic Regression
classifiers).
• However, if the training instances are linearly separable, Rosenblatt
demonstrated that this algorithm would converge to a solution. This
is called the Perceptron convergence theorem.
• Scikit-Learn provides a Perceptron class that implements a single LTU
network.
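For instance, here is a minimal usage sketch of that Perceptron class on the iris petal features mentioned earlier; treating "is the flower an Iris setosa?" as the target is an assumed choice for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]              # petal length, petal width
y = (iris.target == 0).astype(int)    # 1 if Iris setosa, else 0

per_clf = Perceptron(random_state=42) # a single-LTU network
per_clf.fit(X, y)

y_pred = per_clf.predict([[2.0, 0.5]])  # predict the class of a new flower
print(y_pred)
```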
OR GATE Perceptron Training Rule
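The worked example for the OR gate can be sketched as follows. The initial weights, learning rate, and number of epochs are assumptions chosen only for illustration; the update line is the learning rule defined above.

```python
import numpy as np

# OR gate training data (each input includes a constant bias input x0 = 1)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])

w = np.zeros(3)   # assumed initial weights (w[0] is the bias weight)
eta = 0.1         # assumed learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        output = int(np.dot(w, xi) >= 0)    # LTU with Heaviside step
        w += eta * (target - output) * xi   # Perceptron learning rule

print(w)  # converged weights implementing the OR function
```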
Parallel Implementation of Perceptron
• The training of a perceptron is an inherently sequential process.
• If the number of dimensions of the vectors involved is huge, then we
might obtain some parallelism by computing dot products in parallel.
• In order to get significant parallelism, we have to modify the
perceptron algorithm slightly, so that many training examples are
used with the same estimated weight vector w.
• Let us formulate the parallel algorithm as a MapReduce job, as sketched below.
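A rough sketch of that idea follows; how the chunks, learning rate, and driver loop are organized here is an assumption for illustration, not a specification from the slides. Each map task processes a chunk of training examples using the same fixed weight vector w and emits a partial weight update; the reduce step sums the partial updates and applies them to w, and the job is repeated until convergence.

```python
import numpy as np

def map_task(chunk_X, chunk_y, w, eta=0.1):
    """Map: compute weight updates for one chunk of examples,
    all using the same (fixed) weight vector w."""
    update = np.zeros_like(w)
    for xi, target in zip(chunk_X, chunk_y):
        output = int(np.dot(w, xi) >= 0)
        update += eta * (target - output) * xi
    return update

def reduce_updates(updates):
    """Reduce: sum the partial updates from all map tasks."""
    return np.sum(updates, axis=0)

# Illustrative driver loop (data, chunking, and iteration count are assumptions)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w = np.zeros(3)
for _ in range(20):
    chunks = [(X[:2], y[:2]), (X[2:], y[2:])]   # pretend these run in parallel
    updates = [map_task(cx, cy, w) for cx, cy in chunks]
    w += reduce_updates(updates)
print(w)
```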
Weaknesses of Perceptron
• Incapable of solving some trivial problems, e.g., the Exclusive OR (XOR)
classification problem, as highlighted by Marvin Minsky and Seymour
Papert.
• So, many researchers dropped connectionism altogether (i.e., the study
of neural networks) in favor of higher-level problems such as logic,
problem solving, and search.
• Some of the limitations of Perceptrons can be eliminated by stacking multiple
Perceptrons.
• The resulting ANN is called a Multi-Layer Perceptron (MLP).
Activation Function
• The activation function decides whether a neuron should be
activated or not, based on the weighted sum of its inputs plus a bias term.
• The Activation Functions can be basically divided into 2 types:
1. Linear Activation Function, with equation f(x) = x
2. Non-linear Activation Functions
Non-linear Activation Function
• Sigmoid or Logistic Activation Function
• The main reason we use the sigmoid function is that its output lies in
the range (0, 1). Therefore, it is especially useful for models where
we have to predict a probability as the output: since probabilities only
take values between 0 and 1, sigmoid is a natural choice.
• Tanh or hyperbolic tangent Activation Function
• tanh is similar to the logistic sigmoid, but often works better. The range of
the tanh function is (-1, 1), and it is also sigmoidal (S-shaped).
• The advantage is that the negative inputs will be mapped strongly
negative and the zero inputs will be mapped near zero in the tanh
graph.
• ReLU (Rectified Linear Unit) Activation Function
• ReLU is half-rectified (from the bottom): f(z) = 0 when z < 0, and
f(z) = z when z ≥ 0.
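A minimal NumPy sketch of the three activation functions described above (the formulas are the standard ones; plotting code is omitted):

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: output in (-1, 1), S-shaped like the sigmoid
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: 0 for z < 0, z for z >= 0
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```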
[Figure: abstract neuron. Inputs x1, x2, x3 (plus a constant +1) in the input layer are combined with weights w1, w2, w3 and bias b into the weighted sum z = ∑ wi xi + b; a non-linear activation function σ turns z into the activation a, which is the output value y.]