Lecture - 05 (Introduction to ANN)
Acknowledgement
https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53
https://medium.com/@stanleydukor/neural-representation-of-and-or-not-xor-and-xnor-logic-gates-perceptron-algorithm-b0275375fea1
Biological Neuron
McCulloch-Pitts Neurons
The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
The model may be divided into two parts: the first part, g, takes the inputs and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
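A minimal sketch of this two-part structure in Python (the threshold value theta is an assumed example, not taken from the slide):

```python
def g(inputs):
    # Aggregation: sum of the binary inputs
    return sum(inputs)

def f(aggregate, theta=2):
    # Decision: fire (1) if the aggregate reaches the threshold theta, else 0
    return 1 if aggregate >= theta else 0

def mcculloch_pitts_neuron(inputs, theta=2):
    return f(g(inputs), theta)

# With theta = 2, the neuron behaves like an AND over two binary inputs
print(mcculloch_pitts_neuron([1, 1]))  # 1
print(mcculloch_pitts_neuron([1, 0]))  # 0
```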
Neural Networks vs ML Techniques
A neural network is a method in artificial intelligence that
teaches computers to process data in a way that is inspired by
the human brain.
Need for Systematic Notation
Perceptron
• A perceptron is a single-layer neural network; a multi-layer perceptron is what is usually called a neural network.
• The perceptron is a linear (binary) classifier used in supervised learning; it classifies the given input data into one of two classes.
• The perceptron consists of 4 parts:
• Input values, or one input layer
• Weights and bias
• Net sum
• Activation function
How Does a Perceptron Work?
• Each input x is multiplied by its corresponding weight w.
• All the products are added together; the result is called the weighted sum.
• The weighted sum (plus the bias) is passed through the chosen activation function.
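The three steps above can be sketched in a few lines of Python (the specific weights, bias, and step activation here are illustrative assumptions):

```python
def step(z):
    # Step activation: 1 if the weighted sum is positive, else 0
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    # Weighted sum of inputs plus bias, passed through the activation
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w)) + b
    return step(weighted_sum)

# Example with assumed weights and bias
print(perceptron([1.0, 0.5], w=[0.4, 0.6], b=-0.5))  # 1, since 0.4 + 0.3 - 0.5 = 0.2 > 0
```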
Weight, Bias and Activation Function
• Weights show the strength (importance) of a particular input or node.
• A bias value allows you to shift the activation function curve up or down, effectively moving the firing threshold away from zero.
• Activation functions map the weighted input to the required range of values, such as (0, 1) or (-1, 1).
Implementing Logic Gates
Implementing Logic Gates: AND
Prediction: ŷ = 1 if Wx + b > 0, and ŷ = 0 if Wx + b ≤ 0
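One standard choice of weights and bias that realizes AND under this rule (an illustrative assumption, since the slide's figure is not reproduced here) is w1 = w2 = 1, b = -1.5:

```python
def AND(x1, x2):
    # Assumed weights w1 = w2 = 1 and bias b = -1.5: fires only when both inputs are 1
    return 1 if x1 + x2 - 1.5 > 0 else 0

print(AND(0, 0), AND(0, 1), AND(1, 0), AND(1, 1))  # 0 0 0 1
```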
Implementing Logic Gates: OR
Prediction: ŷ = 1 if Wx + b > 0, and ŷ = 0 if Wx + b ≤ 0
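For OR, one assumed choice of parameters that satisfies the same decision rule is w1 = w2 = 1, b = -0.5:

```python
def OR(x1, x2):
    # Assumed weights w1 = w2 = 1 and bias b = -0.5: fires if at least one input is 1
    return 1 if x1 + x2 - 0.5 > 0 else 0

print(OR(0, 0), OR(0, 1), OR(1, 0), OR(1, 1))  # 0 1 1 1
```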
Implementing Logic Gates: NOT
Prediction: ŷ = 1 if Wx + b > 0, and ŷ = 0 if Wx + b ≤ 0
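NOT takes a single input; an assumed weight w = -1 with bias b = 0.5 implements it under the same rule:

```python
def NOT(x1):
    # Assumed weight w = -1 and bias b = 0.5: output is 1 only when the input is 0
    return 1 if -1 * x1 + 0.5 > 0 else 0

print(NOT(0), NOT(1))  # 1 0
```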
Implementing Logic Gates: NOR
Prediction: ŷ = 1 if Wx + b > 0, and ŷ = 0 if Wx + b ≤ 0
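NOR can be obtained by negating both weights of OR; the values below are an assumed example:

```python
def NOR(x1, x2):
    # Assumed weights w1 = w2 = -1 and bias b = 0.5: fires only when both inputs are 0
    return 1 if -x1 - x2 + 0.5 > 0 else 0

print(NOR(0, 0), NOR(0, 1), NOR(1, 0), NOR(1, 1))  # 1 0 0 0
```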
Implementing Logic Gates: NAND
Prediction: ŷ = 1 if Wx + b > 0, and ŷ = 0 if Wx + b ≤ 0
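Likewise, NAND follows from negating the AND parameters; the weights below are an assumed example:

```python
def NAND(x1, x2):
    # Assumed weights w1 = w2 = -1 and bias b = 1.5: fires unless both inputs are 1
    return 1 if -x1 - x2 + 1.5 > 0 else 0

print(NAND(0, 0), NAND(0, 1), NAND(1, 0), NAND(1, 1))  # 1 1 1 0
```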
Implementing Logic Gates: XOR
Prediction: ŷ = 1 if Wx + b > 0, and ŷ = 0 if Wx + b ≤ 0
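XOR is not linearly separable, so no single perceptron with this rule can implement it; it needs at least two layers, for example XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). A sketch using the assumed weights from the previous slides:

```python
def gate(x1, x2, w1, w2, b):
    # Single perceptron decision rule: 1 if Wx + b > 0, else 0
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def XOR(x1, x2):
    h1 = gate(x1, x2, 1, 1, -0.5)    # OR   (assumed weights)
    h2 = gate(x1, x2, -1, -1, 1.5)   # NAND (assumed weights)
    return gate(h1, h2, 1, 1, -1.5)  # AND of the two hidden outputs

for p in (0, 1):
    for q in (0, 1):
        print(p, q, XOR(p, q))  # 1 only for (0,1) and (1,0)
```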
ANN Architectures
Examples of Network Types
Activation Functions
• An activation function in a neural network takes the weighted sum of inputs plus the bias and uses it to decide whether a neuron should be activated (fire) or not.
• It transforms the value presented to it and produces the output that the neuron passes on to the rest of the network.
• Activation functions are also referred to as transfer functions in some literature.
• They can be either linear or nonlinear, depending on the function they represent, and are used to control the output of neural networks across different domains.
These activation functions are needed to convert the linear combination of input signals into a non-linear output signal; without this non-linearity a stack of layers would collapse into a single linear model, so non-linear activations are what allow deeper networks to learn complex, high-order mappings.
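A small numerical sketch of that point (the matrices here are made-up examples): composing two linear layers is still a single linear map, whereas inserting a non-linear activation between them is not.

```python
import numpy as np

W1 = np.array([[1.0, -2.0], [0.5, 1.0]])   # example weights, layer 1
W2 = np.array([[2.0, 0.0], [1.0, 1.0]])    # example weights, layer 2
x = np.array([1.0, 3.0])

# Two linear layers collapse into one linear layer with weights W2 @ W1
linear_stack = W2 @ (W1 @ x)
single_layer = (W2 @ W1) @ x
print(np.allclose(linear_stack, single_layer))  # True

# With a ReLU in between, the composition is no longer a single linear map
relu = lambda z: np.maximum(z, 0.0)
nonlinear_stack = W2 @ relu(W1 @ x)
print(nonlinear_stack, single_layer)  # generally different
```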
Activation Functions
• The sigmoid function curve looks like an S-shape.
• While building a network for a multiclass problem, the output layer has as many neurons as the number of classes in the target.
• For instance, if we have three classes, there would be three neurons in the output layer. Suppose the outputs from these neurons are [1.2, 0.9, 0.75].
• Applying the softmax function over these values gives [0.42, 0.31, 0.27].
• These represent the probabilities of the data point belonging to each class.
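A quick sketch that reproduces those numbers (NumPy assumed):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(np.round(softmax([1.2, 0.9, 0.75]), 2))  # [0.42 0.31 0.27]
```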
Contd..
• The tanh function is very similar to the
sigmoid function.
• The only difference is that it is symmetric
around the origin.
• The range of values in this case is from -1
to 1.
• The advantage is that strongly negative inputs are mapped to strongly negative outputs, while inputs near zero are mapped to outputs near zero on the tanh graph.
• The function is differentiable.
• The function is monotonic while its
derivative is not monotonic.
• Both tanh and logistic sigmoid activation
functions are used in feed-forward nets.
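A compact sketch of the two functions and their ranges (NumPy assumed; tanh is just a scaled and shifted sigmoid):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

def tanh(z):
    return np.tanh(z)                 # range (-1, 1), symmetric around the origin

z = np.array([-3.0, 0.0, 3.0])
print(sigmoid(z))  # ~[0.047, 0.5, 0.953]
print(tanh(z))     # ~[-0.995, 0.0, 0.995]
# tanh(z) equals 2 * sigmoid(2z) - 1
print(np.allclose(tanh(z), 2 * sigmoid(2 * z) - 1))  # True
```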
Contd..
• Rectified Linear Unit: the rectified linear unit (ReLU) is an activation function g applied element-wise to all elements of the volume. It aims at introducing non-linearity into the network. Its variants are summarized below:
Contd..
ReLU:
• The ReLU is half rectified (from the bottom).
• f(z) is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero.
• Range: [0, infinity).
• Both the function and its derivative are monotonic.
• The issue is that all negative values become zero immediately, which decreases the ability of the model to fit or train from the data properly: any negative input given to the ReLU activation function turns into zero, so the negative values are not mapped appropriately.
Leaky ReLU, Randomized ReLU and ELU:
• In Leaky ReLU, the leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.
• When a is not fixed at 0.01 but is chosen randomly, the function is called Randomized ReLU.
• Therefore the range of the Leaky ReLU is (-infinity, infinity).
• Both Leaky and Randomized ReLU functions are monotonic in nature, and so are their derivatives.
• Exponential Linear Unit (ELU) is also a variant of the Rectified Linear Unit (ReLU) that modifies the slope of the negative part of the function.
• Unlike the Leaky ReLU and Parametric ReLU functions, instead of a straight line, ELU uses an exponential curve, α(e^z − 1), to define the negative values.
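A minimal sketch of these variants (the α values are common defaults, assumed here rather than taken from the slide):

```python
import numpy as np

def relu(z):
    # 0 for z < 0, z otherwise; range [0, inf)
    return np.maximum(z, 0.0)

def leaky_relu(z, a=0.01):
    # Small slope a on the negative side; range (-inf, inf)
    return np.where(z > 0, z, a * z)

def elu(z, alpha=1.0):
    # Exponential curve alpha * (exp(z) - 1) on the negative side
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # [0.     0.     0.    1.5]
print(leaky_relu(z))  # [-0.02  -0.005  0.    1.5]
print(elu(z))         # [-0.865 -0.393  0.    1.5]
```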
Building Neural Networks