Lecture - 05 (Introduction to ANN)

The document introduces Artificial Neural Networks (ANN) and their components, focusing on the perceptron as a fundamental model. It explains the differences between neural networks and traditional machine learning techniques, the role of activation functions, and how perceptrons can implement basic logic gates. Various activation functions like sigmoid, softmax, tanh, and ReLU are discussed, highlighting their characteristics and applications in neural network architectures.


Introduction to ANN

Biometrics (CSE 717)

Lecture - 05
Acknowledgement

• Networks of Artificial Neurons, Single Layer Perceptrons by John A. Bullinaria, 2015
• https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53
• https://medium.com/@stanleydukor/neural-representation-of-and-or-not-xor-and-xnor-logic-gates-perceptron-algorithm-b0275375fea1
Biological Neuron
McCulloch-Pitts Neurons
The first computational model of a neuron was proposed by Warren McCulloch (a neuroscientist) and Walter Pitts (a logician) in 1943.

It may be divided into two parts: the first part, g, takes the inputs and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
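A minimal Python sketch of this two-part model (the threshold value and function names below are illustrative, not taken from the original paper):

```python
def mcp_neuron(inputs, theta):
    """McCulloch-Pitts neuron: aggregate with g, then decide with f."""
    g = sum(inputs)               # aggregation of the boolean inputs
    f = 1 if g >= theta else 0    # decision against a threshold theta
    return f

# Example: behaves like a 2-input AND gate when theta = 2
print(mcp_neuron([1, 1], theta=2))  # -> 1
print(mcp_neuron([1, 0], theta=2))  # -> 0
```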

Neural Networks vs ML Techniques
A neural network is a method in artificial intelligence that teaches computers to process data in a way inspired by the human brain.

A neural network arranges algorithms so that it can make reliable decisions on its own, whereas a classical ML model makes decisions based on what it has learnt from the data. As a result, while machine learning models can learn from data, they may need some human intervention in the early stages.

Need for Systematic Notation

Perceptron
• A perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network.
• The perceptron is a linear (binary) classifier used in supervised learning. It helps to classify the given input data.
• The perceptron consists of 4 parts:
• Input values (one input layer)
• Weights and bias
• Net sum
• Activation function

How Does a Perceptron Work?
• All the inputs x are multiplied by their weights w; call these products k.
• Add all the products k; the result is called the weighted sum.
• Apply the appropriate activation function to the weighted sum, here the unit step function (see the sketch below).

Unit Step Activation Function
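A minimal Python sketch of this forward pass with a unit step activation (the input, weight, and bias values are illustrative):

```python
import numpy as np

def unit_step(z):
    """Unit step activation: 1 if z > 0, else 0."""
    return np.where(z > 0, 1, 0)

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through the activation."""
    weighted_sum = np.dot(w, x) + b
    return unit_step(weighted_sum)

# Illustrative values
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, 0.4])
b = -0.3
print(perceptron(x, w, b))  # -> 1, since 0.4 - 0.2 + 0.8 - 0.3 = 0.7 > 0
```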

Weight, Bias and Activation Function
• Weights show the strength of a particular node.
• A bias value allows you to shift the activation function curve up or down.
• Activation functions map the weighted sum into a required range such as (0, 1) or (-1, 1).

Implementing Logic Gates

Implementing Logic Gates: AND
Prediction: y′ = 1 if Wx + b > 0, and y′ = 0 if Wx + b ≤ 0
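The slide's weight values are given in a figure; one choice that satisfies this rule for AND (an illustrative choice, not necessarily the one in the figure) is W = [1, 1], b = -1.5:

```python
import numpy as np

def predict(x, w, b):
    """Single perceptron with a unit step: 1 if Wx + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

w, b = np.array([1, 1]), -1.5   # one possible AND perceptron
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, predict(np.array(x), w, b))
# (0,0)->0  (0,1)->0  (1,0)->0  (1,1)->1
```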

Implementing Logic Gates: OR
Prediction: y′ = 1 if Wx + b > 0, and y′ = 0 if Wx + b ≤ 0

Implementing Logic Gates: NOT
Prediction: y′ = 1 if Wx + b > 0, and y′ = 0 if Wx + b ≤ 0

Implementing Logic Gates: NOR
Prediction: y′ = 1 if Wx + b > 0, and y′ = 0 if Wx + b ≤ 0

Implementing Logic Gates: NAND
Prediction: y′ = 1 if Wx + b > 0, and y′ = 0 if Wx + b ≤ 0
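The same prediction rule realises the other single-layer gates by changing only W and b. The values below are illustrative choices that satisfy the rule, not necessarily those shown in the slide figures:

```python
import numpy as np

def predict(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

gates = {
    "OR":   (np.array([1, 1]),   -0.5),
    "NOR":  (np.array([-1, -1]),  0.5),
    "NAND": (np.array([-1, -1]),  1.5),
    "NOT":  (np.array([-1]),      0.5),   # single input
}

# Truth tables for the two-input gates
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    row = {name: predict(np.array(x), w, b)
           for name, (w, b) in gates.items() if name != "NOT"}
    print(x, row)

# NOT gate on a single input
print("NOT:", {x: predict(np.array([x]), *gates["NOT"]) for x in (0, 1)})
```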

Implementing Logic Gates: XOR
Prediction: y′ = 1 if Wx + b > 0, and y′ = 0 if Wx + b ≤ 0
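XOR is not linearly separable, so no single choice of W and b satisfies this rule for all four inputs; a single-layer perceptron cannot implement XOR. One standard two-layer construction is XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). A sketch reusing the illustrative weights from the previous examples:

```python
import numpy as np

def predict(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

OR   = lambda x: predict(x, np.array([1, 1]),   -0.5)
NAND = lambda x: predict(x, np.array([-1, -1]),  1.5)
AND  = lambda x: predict(x, np.array([1, 1]),   -1.5)

def xor(x):
    hidden = np.array([OR(x), NAND(x)])   # first (hidden) layer
    return AND(hidden)                    # second (output) layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor(np.array(x)))
# (0,0)->0  (0,1)->1  (1,0)->1  (1,1)->0
```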

ANN Architectures

Examples of Network Types

Activation Functions
• Activation functions take the weighted sum of inputs and biases computed by a neuron and use it to decide whether the neuron should be activated or not.
• They transform the presented data and produce the neuron's output for the rest of the network.
• Activation functions are also referred to as transfer functions in some literature.
• They can be either linear or nonlinear depending on the function they represent, and they are used to control the output of neural networks across different domains.

These activation functions are needed to convert the linear input signals of a node into non-linear output signals, which is what allows deeper networks to learn complex, high-order mappings.
Activation Functions
• The sigmoid function curve looks like an S-shape.
• We prefer to use the sigmoid function because its output lies between 0 and 1.
• It is used for models where we have to predict a probability as the output.
• The function is differentiable, meaning we can find the slope of the sigmoid curve at any point.
• The function is monotonic, but its derivative is not.
• The logistic sigmoid function can cause a neural network to get stuck during training.
• The sigmoid function is computationally expensive, causes the vanishing gradient problem, and is not zero-centred. It is generally used for binary classification problems.
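A minimal numpy sketch of the sigmoid and its derivative (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z)); it peaks at 0.25, one reason
    deep sigmoid networks suffer from vanishing gradients."""
    s = sigmoid(z)
    return s * (1 - s)

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))             # ~[0.0067, 0.5, 0.9933]
print(sigmoid_derivative(z))  # ~[0.0066, 0.25, 0.0066]
```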
Contd..
• Softmax function is described as a combination of multiple
sigmoids.
• The softmax function can be used for multiclass classification
problems.
• This function returns the probability for a datapoint belonging to
each individual class.
• The mathematical expression is softmax(z_i) = exp(z_i) / Σ_j exp(z_j).

• While building a network for a multiclass problem, the output layer would have as many neurons as there are classes in the target.
• For instance, if we have three classes, there would be three neurons in the output layer. Suppose the outputs from these neurons are [1.2, 0.9, 0.75].
• Applying the softmax function over these values gives [0.42, 0.31, 0.27].
• These represent the probability for the data point belonging to each class.
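A short numpy sketch reproducing the three-class example above:

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j); shifting by max(z) keeps
    the exponentials numerically stable without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([1.2, 0.9, 0.75])   # raw outputs of the three output neurons
probs = softmax(logits)
print(np.round(probs, 2))  # -> [0.42 0.31 0.27], summing to 1
```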
Contd..
• The tanh function is very similar to the
sigmoid function.
• The only difference is that it is symmetric
around the origin.
• The range of values in this case is from -1
to 1.
• The advantage is that the negative inputs
will be mapped strongly negative and the
zero inputs will be mapped near zero in
the tanh graph.
• The function is differentiable.
• The function is monotonic while its
derivative is not monotonic.
• Both tanh and logistic sigmoid activation
functions are used in feed-forward nets.
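A short sketch of tanh's zero-centred output and its relationship to the sigmoid, tanh(z) = 2·sigmoid(2z) − 1 (sample inputs are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(np.tanh(z))              # ~[-0.964, 0.0, 0.964], range (-1, 1)
print(2 * sigmoid(2 * z) - 1)  # same values: tanh is a rescaled, shifted sigmoid
```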
Contd..
• Rectified Linear Unit: the rectified linear unit (ReLU) is an activation function g applied element-wise to its input. It aims at introducing non-linearity into the network. Its variants are summarized below:
Contd..
ReLU:
• The ReLU is half rectified (from the bottom).
• f(z) is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero.
• Range: [0, infinity).
• The function and its derivative are both monotonic.
• The issue is that all negative values become zero immediately, which decreases the ability of the model to fit or train from the data properly. Any negative input given to the ReLU activation function turns into zero immediately, which in turn affects the resulting mapping by not representing the negative values appropriately.

Leaky ReLU and ELU:
• In Leaky ReLU, the leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.
• When a is not fixed at 0.01 but chosen randomly, it is called Randomized ReLU.
• Therefore the range of the Leaky ReLU is (-infinity, infinity).
• Both Leaky and Randomized ReLU functions are monotonic in nature, and so are their derivatives.
• Exponential Linear Unit (ELU) is also a variant of ReLU that modifies the slope of the negative part of the function.
• Unlike the Leaky ReLU and parametric ReLU functions, which use a straight line, ELU uses an exponential curve for defining the negative values.
Building Neural Networks
