Artificial Neural Networks
by Dr. A. Sharmila, SELECT
ANN
According to the father of Artificial Intelligence, John McCarthy, it is
“The science and engineering of making intelligent machines, especially
intelligent computer programs”.
Artificial Intelligence is a way of making a computer, a computer-controlled
robot, or a piece of software think intelligently, in a manner similar to how
intelligent humans think.
AI is accomplished by studying how the human brain thinks, and how humans
learn, decide, and work while trying to solve a problem, and then using
the outcomes of this study as a basis for developing intelligent software
and systems.
From a practical point of view, an ANN is just a parallel computational
system consisting of many simple processing elements connected
together in a specific way in order to perform a particular task.
Why Study Artificial Neural Networks?
They are extremely powerful computational devices (Turing equivalent,
universal computers)
Massive parallelism makes them very efficient
They can learn and generalize from training data – so there is no need for
enormous amounts of explicit programming
They are particularly fault tolerant – this is equivalent to the “graceful
degradation” found in biological systems
They are very noise tolerant – so they can cope with situations where
normal symbolic systems would have difficulty
In principle, they can do anything a symbolic/logic system can do, and
more. (In practice, getting them to do it can be rather difficult…)
What are Artificial Neural Networks Used for?
As with the field of AI in general, there are two basic goals for
neural network research:
Brain modeling: The scientific goal of building models of how
real brains work
This can potentially help us understand the nature of human
intelligence, formulate better teaching strategies, or devise better
remedial actions for brain-damaged patients.
Artificial System Building: The engineering goal of building
efficient systems for real-world applications.
This may make machines more powerful, relieve humans of
tedious tasks, and may even improve upon human
performance.
What are Artificial Neural Networks Used for?
Brain modeling
Models of human development – help children with developmental problems
Simulations of adult performance – aid our understanding of how the brain
works
Neuropsychological models – suggest remedial actions for brain-damaged
patients
Real world applications
Financial modeling – predicting stocks, shares, currency exchange rates
Other time series prediction – climate, weather, airline marketing tactician
Computer games – intelligent agents, backgammon, first person shooters
Control systems – autonomous adaptable robots, microwave controllers
Pattern recognition – speech recognition, handwriting recognition, sonar
signals
Data analysis – data compression, data mining
Noise reduction – function approximation, ECG noise reduction
Bioinformatics – protein secondary structure, DNA sequencing
Learning in Neural Networks
There are many forms of neural networks. Most
operate by passing neural ‘activations’ through a
network of connected neurons.
One of the most powerful features of neural
networks is their ability to learn and generalize
from a set of training data. They adapt the
strengths/weights of the connections between
neurons so that the final output activations are
correct.
Learning in Neural Networks
There are three broad types of learning:
1. Supervised learning (i.e. learning with a teacher)
2. Reinforcement learning (i.e. learning with limited feedback)
3. Unsupervised learning (i.e. learning with no help)
A Brief History
1943 McCulloch and Pitts proposed the McCulloch-Pitts neuron model
1949 Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.
1958 Rosenblatt introduced the simple single layer networks now called Perceptrons.
1969 Minsky and Papert’s book Perceptrons demonstrated the limitations of single-layer perceptrons, and almost
the whole field went into hibernation.
1982 Hopfield published a series of papers on Hopfield networks.
1982 Kohonen developed the Self-Organizing Maps that now bear his name.
1986 The Back-Propagation learning algorithm for Multi-Layer Perceptrons was re-discovered and the whole
field took off again.
1990s The sub-field of Radial Basis Function Networks was developed.
2000s The power of Ensembles of Neural Networks and Support Vector Machines becomes apparent.
Overview
Artificial Neural Networks are powerful computational systems
consisting of many simple processing elements connected
together to perform tasks analogously to biological brains.
They are massively parallel, which makes them efficient, robust,
fault tolerant and noise tolerant.
They can learn from training data and generalize to new
situations.
They are useful for brain modeling and real world applications
involving pattern recognition, function approximation,
prediction, …
The Nervous System
The human nervous system can be divided into three stages that may be
represented in block diagram form as:
Stimulus → Receptors → Neural Network (Brain) → Effectors → Response
The receptors collect information from the environment – e.g. photons on
the retina.
The effectors generate interactions with the environment – e.g. activate
muscles.
The flow of information/activation is represented by arrows – feed forward
and feedback.
Levels of Brain Organization
The brain contains both large scale and small scale anatomical
structures and different functions take place at higher and lower
levels. There is a hierarchy of interwoven levels of organization:
1. Molecules and Ions
2. Synapses
3. Neuronal microcircuits
4. Dendritic trees
5. Neurons
6. Local circuits
7. Inter-regional circuits
8. Central nervous system
The ANNs we study in this module are crude approximations to
levels 5 and 6.
Brains vs. Computers
There are approximately 10 billion neurons in the human cortex,
compared with tens of thousands of processors in the most powerful
parallel computers.
Each biological neuron is connected to several thousand other
neurons, similar to the connectivity in powerful parallel computers.
The lack of processing units can be compensated for by speed. The typical
operating speeds of biological neurons are measured in milliseconds (10⁻³
s), while a silicon chip can operate in nanoseconds (10⁻⁹ s).
The human brain is extremely energy efficient, using approximately
10⁻¹⁶ joules per operation per second, whereas the best computers today
use around 10⁻⁶ joules per operation per second.
Brains have been evolving for tens of millions of years, whereas computers
have been evolving for only a few decades.
Structure of a Human Brain
The Human Brain
• Brain contains about 10¹⁰ basic units called neurons. Each
neuron, in turn, is connected to about 10⁴ other neurons.
• A neuron is a small cell that receives electro-chemical
signals from its various sources and in turn responds by
transmitting electrical impulses to other neurons.
Training vs. Inference
• Training: acquiring knowledge
• Inference: solving a problem using the
acquired knowledge
Biological Neural Networks
[Figure: a biological neuron, with its dendrites, soma, axon, and synapses labeled]
A biological neuron has three main components: dendrites, the soma (or cell
body), and the axon.
The majority of neurons encode their outputs or activations as a series of
brief electrical pulses (i.e. spikes or action potentials).
Dendrites are the receptive zones that receive activation from other
neurons, i.e. they accept the inputs.
The cell body (soma) of the neuron processes the incoming activations and
converts them into output activations, i.e. it processes the inputs.
Axons are transmission lines that send activation to other neurons, i.e.
they turn the processed inputs into outputs.
Synapses allow weighted transmission of signals (using neurotransmitters)
between axons and dendrites to build up large neural networks. A synapse is
an electrochemical contact between neurons.
Neural network: Definition
• Neural network: information processing paradigm inspired by
biological nervous systems, such as our brain
• Structure: large number of highly interconnected processing
elements (neurons) working together
• Like people, they learn from experience (by example)
Artificial Neural Network: Definition
• The idea of ANNs: neural networks learn relationships between cause and
effect, or organize large volumes of data into orderly and informative
patterns.
• Definition of ANN: “Data processing system consisting of a large
number of simple, highly interconnected processing elements
(artificial neurons) in an architecture inspired by the structure of
the cerebral cortex of the brain”
(Tsoukalas & Uhrig, 1997).
Artificial Neurons
• ANNs have been developed as generalizations of
mathematical models of neural biology, based on the
assumptions that:
1. Information processing occurs at many simple
elements called neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight, which, in a
typical neural net, multiplies the signal transmitted.
4. Each neuron applies an activation function to its net input to
determine its output signal.
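As a minimal sketch of assumptions 1–4 (in Python, with illustrative names and weights that are not from the slides), a neuron sums its weighted inputs and applies a step activation function:

```python
# A minimal sketch of a single artificial neuron (illustrative, not a library API).

def step(net_input, threshold=0.0):
    """Step activation: fire (1) if the net input reaches the threshold."""
    return 1 if net_input >= threshold else 0

def neuron_output(inputs, weights, threshold=0.0):
    """Assumptions 2-4: weighted signals are summed, then thresholded."""
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return step(net_input, threshold)

# Example: two inputs with hypothetical weights 0.5 and -0.2.
print(neuron_output([1, 1], [0.5, -0.2]))  # net input 0.3 >= 0, so the neuron fires (1)
```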
Analogy of ANN With Biological NN
[Figure: a biological neuron, with its dendrites, soma, axon, and synapses labeled]
• Dendrites receive signals from other neurons.
• The soma sums the incoming signals. When sufficient input is received, the cell fires; that is,
it transmits a signal over its axon to other cells.
• Associated terminologies of biological and artificial neural networks:
Biological Neural Network → Artificial Neural Network
Cell body (soma) → Neuron
Dendrite → Input
Synapse → Weight
Axon → Output
Typical Architecture of ANNs
• A typical neural network contains a large number of artificial neurons
called units arranged in a series of layers.
Typical Architecture of ANNs (cont.)
• Input layer — contains those units (artificial neurons) which
receive input from the outside world, on which the network will
learn, recognize, or otherwise process.
• Output layer — contains units that respond with the
information about how the network has learned any task.
• Hidden layer — these units sit between the input and output
layers. The job of the hidden layer is to transform the input into
something that the output units can use in some way.
Most neural networks are fully connected, which means that each
hidden neuron is fully connected to every neuron in its previous
(input) layer and in the next (output) layer.
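As a rough sketch of this layered, fully connected structure, here is a hypothetical two-layer network in Python (the weights and layer sizes are illustrative assumptions, not from the slides):

```python
import math

def sigmoid(x):
    # Squashes the net input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # Fully connected: every output unit sees every input unit.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row))) for row in weights]

# Hypothetical network: 3 input units -> 2 hidden units -> 1 output unit.
w_hidden = [[0.2, -0.5, 0.1],
            [0.4,  0.3, -0.2]]
w_output = [[0.7, -0.6]]

hidden = layer_forward([1.0, 0.5, -1.0], w_hidden)   # hidden layer transforms the input
output = layer_forward(hidden, w_output)             # output layer uses that transformation
print(output)
```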
Popular ANNs Architectures (Sample)
http://www.asimovinstitute.org/neural-network-zoo/
Popular ANNs Architectures (cont.)
Single layer perceptron — a neural network having two input units and one
output unit, with no hidden layers.
Multilayer perceptron — these networks use one or more hidden layers of
neurons, unlike the single layer perceptron. These are also known as deep
feedforward neural networks.
Hopfield network — a fully interconnected network of neurons in which each
neuron is connected to every other neuron. The network is trained with an
input pattern by setting the values of the neurons to the desired pattern;
then its weights are computed, and the weights are not changed afterwards.
Once trained for one or more patterns, the network will converge to the
learned patterns.
Popular ANNs Architectures (cont.)
Deep learning neural network — a feedforward neural network with a large
structure (many hidden layers and a large number of neurons in each layer)
used for deep learning.
Recurrent neural network — a type of neural network in which the hidden
layer neurons have self-connections, so recurrent neural networks possess
memory. At any instant, a hidden layer neuron receives activation from the
layer below as well as its own previous activation value.
Long Short-Term Memory (LSTM) network — a type of recurrent network in
which a memory cell is incorporated inside the hidden layer neurons.
Convolutional neural network — a class of deep feedforward artificial
neural networks that has been successfully applied to analyzing visual
imagery.
How are ANNs being used in solving problems?
• The problem variables are mainly: inputs, weights, and outputs.
• Examples (training data) represent a solved problem, i.e. both the
inputs and outputs are known.
• There are many different algorithms that can be used when training
artificial neural networks, each with its own advantages and
disadvantages.
• The learning process within ANNs is a result of altering the network's
weights and biases (thresholds) with some kind of learning algorithm.
• The objective is to find a set of weight matrices which, when applied to
the network, map any input to a correct output.
• For a new problem, we then have the inputs and the weights; therefore,
we can easily compute the outputs.
Learning Techniques in ANNs
Supervised Learning
• In supervised learning, the training data is input to the network and
the desired output is known; weights and biases are adjusted until the
output yields the desired value.
Unsupervised Learning
• The input data is used to train the network, but the desired output is
unknown. The network classifies the input data and adjusts the weights by
feature extraction from the input data.
Reinforcement Learning
• Here the desired value of the output is unknown, but the network
receives feedback on whether its output is right or wrong. It is a form of
semi-supervised learning.
Learning Algorithms
Hebbian learning — depends on the input-output correlation:
w = Σ_{j=1}^{m} X_j Y_jᵀ
Gradient descent — weights are adjusted to descend (minimize) the error E:
Δw_ij = −η ∂E/∂w_ij
Competitive learning — only the output neuron with the highest input is
updated (winner-take-all strategy).
Stochastic learning — weights are adjusted in a probabilistic fashion, such
as in simulated annealing.
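A tiny sketch of the Hebbian rule above in Python with NumPy (the bipolar training pairs are made-up illustrative data): the weight matrix accumulates the outer products of the training inputs and outputs.

```python
import numpy as np

def hebbian_weights(X, Y):
    """Hebbian learning: w = sum over j of X_j Y_j^T (input-output correlation)."""
    w = np.zeros((X.shape[1], Y.shape[1]))
    for x, y in zip(X, Y):
        w += np.outer(x, y)   # strengthen weights where input and output co-occur
    return w

# Two hypothetical bipolar training pairs.
X = np.array([[1, -1, 1], [-1, 1, 1]])
Y = np.array([[1], [-1]])
print(hebbian_weights(X, Y))
```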
Learning Algorithms
Gradient Descent
• This is the simplest training algorithm used in the case of a supervised
training model.
• If the actual output differs from the target output, the difference, or
error, is computed.
• The gradient descent algorithm changes the weights of the network in
such a manner as to minimize this error.
Back propagation
• It is an extension of the gradient-based delta learning rule.
• Here, after finding the error (the difference between the desired and
actual output), the error is propagated backward from the output layer to
the input layer via the hidden layers.
• It is used in the case of multilayer neural networks.
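A minimal sketch of one gradient descent step for a single linear neuron with squared error (Python; the data, learning rate, and loop length are illustrative assumptions, and this is not the full backpropagation algorithm):

```python
# Linear neuron y = w.x with error E = (t - y)^2 / 2, so dE/dw_i = -(t - y) * x_i.
# The update w_i += eta * (t - y) * x_i therefore moves downhill on E.

def gradient_step(weights, x, target, eta=0.1):
    y = sum(w * xi for w, xi in zip(weights, x))          # actual output
    error = target - y                                    # difference from target
    return [w + eta * error * xi for w, xi in zip(weights, x)]

w = [0.0, 0.0]
for _ in range(50):                                       # repeat until the error shrinks
    w = gradient_step(w, x=[1.0, 2.0], target=1.0)
print(w)  # w.x is now close to the target 1.0
```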
Learning Data Sets in ANN
Training set
• A training dataset is a dataset of examples used for learning, that is,
to fit the parameters (e.g., weights) of a classifier.
• One epoch comprises one full training cycle on the training set.
Validation set (development set)
• A validation dataset is a set of examples used to tune the
hyperparameters (e.g., the number of hidden units) of a classifier.
• The validation set should follow the same probability distribution as
the training dataset.
Test set
• A test set is a set of examples used only to assess the performance
(i.e. generalization) of a fully specified classifier.
• A better fit on the training dataset as opposed to the test dataset
usually points to overfitting.
• The test set should follow the same probability distribution as the
training dataset.
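A quick sketch of such a three-way split in Python (the 60/20/20 proportions and the shuffling seed are illustrative choices, not from the slides):

```python
import random

def split_dataset(examples, train=0.6, val=0.2, seed=0):
    """Shuffle once, then carve out training, validation, and test sets."""
    data = examples[:]
    random.Random(seed).shuffle(data)    # keeps all three sets on the same distribution
    n_train = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[:n_train],                    # fit weights here
            data[n_train:n_train + n_val],     # tune hyperparameters here
            data[n_train + n_val:])            # assess generalization here

train_set, val_set, test_set = split_dataset(list(range(100)))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```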
Applications of ANNs
• Signal processing
• Pattern recognition, e.g. handwritten character or face identification
• Diagnosis, or mapping symptoms to a medical case
• Speech recognition
• Human emotion detection
• Educational loan forecasting
• Computer vision
• Deep learning
The McCulloch-Pitts Neuron
The First Artificial Neuron
As mentioned in the brief history above, McCulloch and Pitts (1943) produced the first neural
network, which was based on their artificial neuron. Although this work was developed in the
early forties, many of its principles can still be seen in the neural networks of today.
We can make the following statements about a McCulloch-Pitts network:
The activation of a neuron is binary. That is, the neuron either fires (activation of one) or does not
fire (activation of zero).
For the network shown in the figure, the activation function for unit Y is
f(y_in) = 1 if y_in ≥ T, and 0 otherwise,
where y_in is the total input signal received and T is the threshold for Y.
Neurons in a McCulloch-Pitts network are connected by directed, weighted paths.
If the weight on a path is positive the path is excitatory, otherwise it is inhibitory.
All excitatory connections into a particular neuron have the same weight, although different
weighted connections can be input to different neurons.
Each neuron has a fixed threshold. If the net input into the neuron is greater than the threshold,
the neuron fires.
The threshold is set such that any non-zero inhibitory input will prevent the neuron from firing.
It takes one time step for a signal to pass over one connection.
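A sketch of these rules in Python (illustrative, not from the slides): binary activations, a fixed threshold, and an absolute veto by any non-zero inhibitory input.

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: binary output with a fixed threshold;
    any non-zero inhibitory input prevents the neuron from firing."""
    if any(inhibitory):
        return 0                                   # absolute inhibition
    return 1 if sum(excitatory) >= threshold else 0

# x1 AND (NOT x2): x1 excitatory, x2 inhibitory, threshold 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', mp_neuron([x1], [x2], threshold=1))
```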
The McCulloch-Pitts Neuron
This vastly simplified model of real neurons is also known as a Threshold Logic
Unit :
A set of synapses (i.e. connections) brings in activations
from other neurons.
A processing unit sums the inputs, and then applies a non-
linear activation function (i.e. squashing/transfer/threshold
function).
An output line transmits the result to other neurons.
Networks of McCulloch-Pitts Neurons
Artificial neurons have the same basic components as biological
neurons. The simplest ANNs consist of a set of McCulloch-Pitts
neurons labeled by indices k, i, j and activation flows between
them via synapses with strengths w_ki and w_ij:
Some Useful Notation
We often need to talk about ordered sets of related numbers – we call them
vectors, e.g.
x = (x₁, x₂, x₃, …, xₙ) , y = (y₁, y₂, y₃, …, yₘ)
The components xᵢ can be added up to give a scalar (number), e.g.
s = x₁ + x₂ + x₃ + … + xₙ = Σᵢ₌₁ⁿ xᵢ
Two vectors of the same length may be added to give another vector, e.g.
z = x + y = (x₁ + y₁, x₂ + y₂, …, xₙ + yₙ)
Two vectors of the same length may be multiplied to give a scalar (the dot
product), e.g.
p = x · y = x₁y₁ + x₂y₂ + … + xₙyₙ = Σᵢ₌₁ⁿ xᵢyᵢ
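These operations map directly onto code; a small Python sketch with made-up three-component vectors:

```python
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

s = sum(x)                                   # scalar sum of components: 6.0
z = [xi + yi for xi, yi in zip(x, y)]        # vector addition: [5.0, 7.0, 9.0]
p = sum(xi * yi for xi, yi in zip(x, y))     # dot product: 32.0

print(s, z, p)
```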
Some Useful Functions
Common activation functions
Identity function
f(x) = x for all x
Binary step function (with threshold θ)
f(x) = 1 if x ≥ θ
f(x) = 0 if x < θ
Some Useful Functions
Binary sigmoid
f(x) = 1 / (1 + e⁻ˣ)
Bipolar sigmoid
g(x) = 2 f(x) − 1 = (1 − e⁻ˣ) / (1 + e⁻ˣ)
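These three functions as a quick Python sketch (a straightforward transcription of the formulas above):

```python
import math

def binary_step(x, theta=0.0):
    return 1.0 if x >= theta else 0.0          # fires at or above the threshold

def binary_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))          # output in (0, 1)

def bipolar_sigmoid(x):
    return 2.0 * binary_sigmoid(x) - 1.0       # rescaled to (-1, 1)

for f in (binary_step, binary_sigmoid, bipolar_sigmoid):
    print(f.__name__, f(0.5))
```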
The McCulloch-Pitts Neuron Equation
Using the above notation, we can now write down a simple
equation for the output of a McCulloch-Pitts neuron as a
function of its n inputs inᵢ:
out = 1 if Σᵢ₌₁ⁿ inᵢ ≥ T, and 0 otherwise,
where T is the neuron's threshold.
Review
Biological neurons, consisting of a cell body, axons,
dendrites and synapses, are able to process and
transmit neural activation.
The McCulloch-Pitts neuron model (Threshold Logic
Unit) is a crude approximation to real neurons that
performs a simple summation and thresholding function
on activation levels
Appropriate mathematical notation facilitates the
specification and programming of artificial neurons and
networks of artificial neurons.
Networks of McCulloch-Pitts Neurons
One neuron can’t do much on its own. Usually we will have many neurons labeled
by indices k, i, j and activation flows between them via synapses with strengths
w_ki and w_ij:
The Artificial Neuron
The McCulloch-Pitts Neuron
• In the context of neural networks, a McCulloch-Pitts
neuron is an artificial neuron using the step function
as its activation function.
• It is also called a Threshold Logic Unit.
• Threshold step function:
F(x) = 0 for x < T
F(x) = 1 for x ≥ T
The McCulloch-Pitts Neuron (cont.)
• In simple words, the output of the McCulloch-Pitts
neuron equals 1 if the weighted sum of its inputs,
S = Σᵢ wᵢxᵢ, is greater than or equal to T; otherwise
the output equals zero.
• Here T is a threshold value.
Example 1
• A McCulloch-Pitts neuron has 3 inputs (x1 = 1, x2 = 1, x3 = 1),
the weights are (w1 = 1, w2 = −1, w3 = −1), and there is no bias.
Find the output, taking the threshold T = 0.
[Figure: three-input McCulloch-Pitts neuron with threshold T]
Sum = (1×1) + (1×(−1)) + (1×(−1)) + 0 = −1
Since −1 < T, the neuron does not fire: output = 0.
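The same computation as a small Python sketch (taking the threshold as 0, as above):

```python
def mp_output(inputs, weights, threshold=0.0):
    s = sum(x * w for x, w in zip(inputs, weights))   # weighted sum, no bias
    return 1 if s >= threshold else 0

print(mp_output([1, 1, 1], [1, -1, -1]))  # sum = -1 < 0, so the output is 0
```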
Features of McCulloch-Pitts model
• Allows binary 0/1 states only
• Operates under a discrete-time assumption
• Weights and the neurons' thresholds are fixed in the model, and
there is no interaction among network neurons (no learning)
• It is just a primitive model
• We can use multiple layers of McCulloch-Pitts neurons to
implement the basic logic gates. All we need to do is find the
appropriate connection weights and neuron thresholds to produce
the right output for each set of inputs (see the sketch below).
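For instance, a sketch of AND, OR, and NOT built from single threshold units (Python; these weights and thresholds are one valid choice among many):

```python
def threshold_unit(inputs, weights, threshold):
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

AND = lambda x1, x2: threshold_unit([x1, x2], [1, 1], threshold=2)  # fires only if both fire
OR  = lambda x1, x2: threshold_unit([x1, x2], [1, 1], threshold=1)  # fires if either fires
NOT = lambda x:      threshold_unit([x],      [-1],   threshold=0)  # inhibitory input vetoes firing

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), OR(x1, x2), NOT(x1))
```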
Activation Functions
• Assume a neuron computes the weighted sum of its inputs, S = Σᵢ wᵢxᵢ.
• S can be anything, ranging from −∞ to +∞. The neuron really doesn't
know the bounds of the value. So how do we decide whether the neuron
should fire or not (output = 1 or 0)?
• So we add “activation functions” for this purpose: to check the S
value produced by a neuron and decide whether outside connections should
consider this neuron as “fired” or not. Or rather, let's say “activated”
or not.
• The activation function serves as a threshold and is also called a
“transfer function”.
Activation Functions (cont.)
• Activation functions can be divided into two basic types:
1. Linear activation functions
2. Non-linear activation functions
(unit step, sigmoid, tanh, ReLU, leaky ReLU, softmax, …)
• In most cases, activation functions are non-linear; that is, the
role of the activation function is to make neural networks non-linear.
Activation Functions (cont.)
• Many kinds of activation functions (over 640 different activation
function proposals) have been proposed over the years.
• However, best practice confines use to only a limited set of
activation functions.
• Next we will explore the most important and widely used activation
functions.
• But the most important question is: “how do we know which one to
use?”
• Answer: it depends on best practice and the nature of the problem.
Popular Activation Functions
• Linear or Identity
• Step Activation Function (previously explained)
• Sigmoid or Logistic Activation Function
• Tanh or hyperbolic tangent Activation Function
• ReLU (Rectified Linear Unit) Activation Function
• Leaky ReLU
• Softmax function
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
Linear or Identity Activation Function
• The function is a line, i.e. linear: f(x) = x.
• Therefore, the output of the function is not confined to any range.
Step Activation Function
• Used in the McCulloch-Pitts neuron.
F(x) = 0 for x < T; 1 for x ≥ T
• The hard limiter activation function is a special case of the step
function with the threshold at 0:
F(x) = hardlim(x) = 0 for x < 0; 1 for x ≥ 0
• The sign activation function is a special case of the step function
with outputs −1 and +1:
F(x) = sign(x) = −1 for x < 0; +1 for x ≥ 0
Sigmoid or Logistic Activation Function
• The sigmoid function curve looks like an S-shape.
• It exists between 0 and 1, and is used in binary classifiers to
predict a probability (0 to 1) as the output.
f(x) = 1 / (1 + e⁻ˣ)
Tanh or hyperbolic tangent Activation Function
• The tanh function curve also looks like an S-shape.
• The range of the tanh function is from −1 to 1.
f(x) = tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ)
ReLU (Rectified Linear Unit) Activation Function
• ReLU is the most widely used activation function.
• Any negative input given to the ReLU activation function turns the
value into zero immediately.
f(x) = max(0, x)
Leaky ReLU
• Leaky ReLUs allow a small, non-zero gradient when the unit is not
active (negative values): f(x) = x for x ≥ 0 and f(x) = αx for x < 0,
where α is a small slope such as 0.01.
Softmax Activation Function
• The softmax function is a generalization of the sigmoid (logistic)
function to multiple outputs:
softmax(x)ᵢ = e^(xᵢ) / Σⱼ e^(xⱼ)
• The softmax function is used in multiclass classification methods,
since it turns a vector of scores into a probability distribution.
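The popular activation functions above as a compact Python/NumPy sketch (an illustrative implementation; the max-shift in softmax is the usual trick for numerical stability):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # range (0, 1)

def tanh(x):
    return np.tanh(x)                          # range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                  # zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)      # small slope for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))                  # shift for numerical stability
    return e / e.sum()                         # outputs form a probability distribution

x = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu, softmax):
    print(f.__name__, f(x))
```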