
UNIT – III

Supervised Learning – II (Neural Networks)


Neural Network Representation – Problems – Perceptrons, Activation Functions, Artificial Neural Networks (ANN), Back Propagation Algorithm. Convolutional Neural Networks – Convolution and Pooling layers, Recurrent Neural Networks (RNN).

Neural Network Representation

The term " Neural Network" is derived from Biological neural networks that develop the
structure of a human brain. Similar to the human brain that has neurons interconnected to
one
anothe
r,
artifici
al
neural
networks also
have neurons
that are

interconnected to one another in various layers of the networks. These neurons are known
as nodes.

The typical diagrams of a biological neural network and an artificial neural network are shown in the corresponding figures. Dendrites in a biological neural network represent inputs in an artificial neural network, the cell nucleus represents nodes, synapses represent weights, and the axon represents the output.

Relationship between Biological neural network and artificial neural network:


An Artificial Neural Network is a construct in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. An artificial neural network is created by programming computers to behave like interconnected brain cells.

There are on the order of 100 billion neurons in the human brain, and each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data in parallel from memory when necessary. We can therefore say that the human brain is made up of incredibly powerful parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes inputs and gives an output, such as an "OR" gate with two inputs: if one or both inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends directly on the input. Our brain does not perform such a fixed task; the relationship between outputs and inputs keeps changing because the neurons in our brain are "learning."

The architecture of an artificial neural network:

To understand the concept of the architecture of an artificial neural network, we have to understand what a neural network consists of. A neural network consists of a large number of artificial neurons, which are termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:

The hidden layer lies between the input and output layers. It performs all the calculations to find hidden features and patterns.

Output Layer:
The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs
and includes a bias. This computation is represented in the form of a transfer function.

This weighted total is then passed as an input to an activation function to produce the output. Activation functions decide whether a node should fire or not; only the nodes that fire contribute to the output layer. There are different activation functions available, chosen according to the sort of task we are performing.
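As a minimal sketch of this computation in Python/NumPy (the variable names and the simple step activation below are illustrative assumptions, not taken from the notes), a single node can be written as:

import numpy as np

def neuron_output(inputs, weights, bias):
    # weighted sum of the inputs plus a bias (the "transfer function")
    weighted_total = np.dot(weights, inputs) + bias
    # activation function decides whether the node "fires"
    return 1 if weighted_total > 0 else 0  # simple step activation

# example: three inputs with illustrative weights
x = np.array([0.5, 0.2, 0.1])
w = np.array([0.4, -0.6, 0.9])
print(neuron_output(x, w, bias=0.1))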

Perceptrons

The Perceptron is a building block of an Artificial Neural Network. It was invented by Frank Rosenblatt in the mid-20th century (1957) to perform certain calculations for detecting capabilities in input data or business intelligence. The Perceptron is a linear Machine Learning algorithm used for supervised learning of various binary classifiers. The algorithm enables neurons to learn elements and process them one by one during training. In this section, "Perceptron in Machine Learning," we discuss the Perceptron and its basic functions in brief, starting with a basic introduction.

A Perceptron is a Machine Learning algorithm for supervised learning of various binary classification tasks. A Perceptron can also be understood as an artificial neuron or neural network unit that helps to detect certain computations in input data for business intelligence.

The Perceptron model is also treated as one of the simplest types of Artificial Neural Networks. It is a supervised learning algorithm for binary classifiers; hence, we can consider it a single-layer neural network with four main parameters, i.e., input values, weights and bias, net sum, and an activation function.

Basic Components of Perceptron


Mr. Frank Rosenblatt invented the perceptron model as a binary classifier which
contains
three main components. These are as follows:
 Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the system
for further processing. Each input node contains a real numerical value.

 Weight and Bias:

The weight parameter represents the strength of the connection between units and is another important parameter of the Perceptron components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept in a linear equation.

 Activation Function:

These are the final and important components that help to determine whether the neuron
will fire or not. Activation Function can be considered primarily as a step function.

Types of Activation functions:

 Sign function
 Step function, and

 Sigmoid function
The data scientist chooses the activation function to make a decision based on the problem statement and to form the desired outputs. The activation function used in a perceptron model (e.g., Sign, Step, or Sigmoid) may differ depending on whether the learning process is slow or suffers from vanishing or exploding gradients.

How does Perceptron work?

In Machine Learning, Perceptron is considered as a single-layer neural network that


consists of four main parameters named input values (Input nodes), weights and Bias,
net sum, and an activation function. The perceptron model begins with the
multiplication of all input values and their weights, then adds these values together
to create the weighted sum. Then this weighted sum is applied to the activation
function 'f' to obtain the desired output. This activation function is also known as
the step function and is represented by 'f'.
This step function or Activation function plays a vital role in ensuring that output is
mapped between required values (0,1) or (-1,1). It is important to note that the weight
of input is indicative of the strength of a node. Similarly, an input's bias value gives
the ability to shift the activation function curve up or down.

Perceptron model works in two important steps as follows:

Step-1

In the first step first, multiply all input values with corresponding weight values and
then add them to determine the weighted sum. Mathematically, we can calculate the
weighted sum as follows:

∑wi*xi = w1*x1 + w2*x2 + … + wn*xn

Add a special term called bias 'b' to this weighted sum to improve the model's performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied with the above-mentioned weighted
sum, which gives us output either in binary form or a continuous value as follows:

Y = f(∑wi*xi + b)
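The two steps above can be written directly as code. A minimal sketch in Python (the function name predict, the step activation, and the sample values are illustrative assumptions):

def predict(x, w, b):
    # Step 1: weighted sum  ∑ wi*xi + b
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Step 2: step activation function f
    return 1 if weighted_sum > 0 else 0

print(predict([1, 0], [0.7, 0.7], b=-0.5))  # -> 1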

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single Layer Perceptron Model:

This is one of the simplest types of artificial neural networks (ANNs). A single-layer perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm does not start from recorded data, so it begins with randomly allocated weight parameters. It then sums up all the weighted inputs; if the total sum is more than a pre-determined threshold value, the model is activated and shows the output value as +1.

If the outcome matches the pre-determined threshold value, the performance of the model is considered satisfactory, and the weights are not changed. However, discrepancies arise when multiple weighted input values are fed into the model. Hence, to obtain the desired output and minimize errors, some changes to the weights are necessary.

"Single-layer perceptron can learn only linearly separable patterns.".

Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model also has the
same model structure but has a greater number of hidden layers.
The multi-layer perceptron model is trained using the Backpropagation algorithm, which executes in two stages as follows:

 Forward Stage: Activations start from the input layer in the forward stage and terminate at the output layer.
 Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the desired output is propagated backward, starting at the output layer and ending at the input layer.

Hence, a multi-layer perceptron model can be considered as multiple artificial neural network layers in which the activation function does not remain linear, unlike in a single-layer perceptron model. Instead of a linear function, the activation function can be sigmoid, TanH, ReLU, etc.
A multi-layer perceptron model has greater processing power and can process linear and
non- linear patterns. Further, it can also implement logic gates such as AND, OR, XOR,
NAND, NOT, XNOR, NOR.

Advantages of Multi-Layer Perceptron:

 A multi-layered perceptron model can be used to solve complex non-linear problems.


 It works well with both small and large input data.
 It helps us to obtain quick predictions after the training.
 It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

 In Multi-layer perceptron, computations are difficult and time-consuming.


 In multi-layer Perceptron, it is difficult to predict how much the dependent
variable affects each independent variable.
 The model functioning depends on the quality of the training.

Perceptron Function

The Perceptron function 'f(x)' is obtained by multiplying the input 'x' with the learned weight coefficient 'w' and applying a threshold.
Mathematically, we can express it as follows:

f(x) = 1 if w.x + b > 0
f(x) = 0 otherwise

 'w' represents real-valued weights vector


 'b' represents the bias
 'x' represents a vector of input x values.

Characteristics of Perceptron

The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for supervised learning of


binary classifiers.
2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and the decision is made
whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weight function
is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between the
two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it must have
an output signal; otherwise, no output will be shown.

Limitations of Perceptron Model

A perceptron model has limitations as follows:

 The output of a perceptron can only be a binary number (0 or 1) due to the hard
limit transfer function.
 Perceptron can only be used to classify the linearly separable sets of input vectors.
If input vectors are non-linear, it is not easy to classify them properly.

Activation Functions

An activation function also helps to normalize the output of any input into a range such as -1 to 1. The activation function must be efficient and should reduce computation time, because a neural network is sometimes trained on millions of data points.
Without an activation function, a neural network becomes a linear regression model. By introducing an activation function, the neural network performs a non-linear transformation of the input and becomes suitable for solving problems like image classification, sentence prediction, or language translation.

A neuron basically computes a weighted sum of its inputs, and this sum is passed through an activation function to get an output.

Y = ∑ (weights*input + bias)

Here Y can be anything for a neuron, in the range -infinity to +infinity. So, we have to bound the output to get the desired prediction or generalized results.

Y = Activation function(∑ (weights*input + bias))

So, we pass the neuron's sum through an activation function to bound the output values.

Why do we need Activation Functions?

Without an activation function, weights and bias would only perform a linear transformation, and the neural network would just be a linear regression model. A linear equation is a polynomial of degree one, which is simple to solve but limited in its ability to solve complex problems or model higher-degree relationships.

In contrast, adding an activation function to the neural network applies a non-linear transformation to the input and makes it capable of solving complex problems such as language translation and image classification.

In addition, activation functions are differentiable, which allows backpropagation to be applied: an optimization strategy that measures the gradient of the loss function with respect to the weights of the neural network.

Types of Activation Functions

The two main categories of activation functions are:

 Linear Activation Function


 Non-linear Activation Functions
Linear Neural Network Activation Function

Linear Activation Function

Equation: A linear function's equation is y = x, which is the equation of a straight line.

The final activation of the last layer is nothing more than a linear function of the input to the first layer, regardless of how many layers we have, if they are all linear in nature. The range is -inf to +inf.

Uses: The linear activation function is typically applied only at the output layer.

If we differentiate a linear function, the outcome no longer depends on the input "x": the derivative becomes a constant, so the algorithm won't exhibit any novel behaviour during learning.

A good example of a regression problem is determining the price of a house. We can use a linear activation at the output layer since the price of a house may take any large or small value. Even in this case, the neural network's hidden layers must perform some sort of non-linear function.

Equation : f(x) = x
Range : (-infinity to infinity)

It doesn’t help with the complexity or various parameters of usual data that is fed to
the neural networks.

Non Linear Neural Network Activation Function

1. Sigmoid or Logistic Activation Function

It is a functional that is graphed in a "S"

shape. A is equal to 1/(1 + e-x).

Non-linear in nature. Observe that while Y values are fairly steep, X values range from -2 to
2. To put it another way, small changes in x also would cause significant shifts in the value
of
Y. spans from 0 to 1.

Uses: Sigmoid function is typically employed in the output nodes of a classi?cation, where
the result may only be either 0 or 1. Since the value for the sigmoid function only ranges
from 0 to 1, the result can be easily anticipated to be 1 if the value is more than 0.5 and 0 if
it is not.

 It is a function which is plotted as ‘S’ shaped graph.


 Equation : A = 1/(1 + e^-x)
 Nature : Non-linear. Notice that X values lies between -2 to 2, Y values are
very steep. This means, small changes in x would also bring about large
changes in the value of Y.
 Value Range : 0 to 1
 Uses : Usually used in output layer of a binary classification, where result is either
0 or 1, as value for sigmoid function lies between 0 and 1 only so, result can be
predicted easily to be 1 if value is greater than 0.5 and 0 otherwise.

Tanh Function

The activation that consistently outperforms the sigmoid function is the tangent hyperbolic (tanh) function. It is actually a mathematically shifted version of the sigmoid function; the two are comparable and derivable from one another.

Range of values: -1 to +1. Non-linear in nature.

Uses: Since its values range from -1 to 1, the mean of the activations for a hidden layer of a neural network will be 0 or very close to it. This helps to centre the data by bringing the mean close to 0, which greatly facilitates learning for the following layer.
 The activation that works almost always better than sigmoid function is Tanh
function also known as Tangent Hyperbolic function. It’s actually mathematically
shifted version of the sigmoid function. Both are similar and can be derived from
each other.
 Equation :- A = tanh(x) = (e^x - e^-x) / (e^x + e^-x)

 Value Range :- -1 to +1
 Nature :- non-linear
 Uses :- Usually used in hidden layers of a neural network as it’s values lies between -
1 to 1 hence the mean for the hidden layer comes out be 0 or very close to it,
hence helps in centering the data by bringing mean close to 0. This makes
learning for the next layer much easier.

ReLU (Rectified Linear Unit) Activation Function

Currently, ReLU is the activation function that is employed the most globally, since practically all convolutional neural networks and deep learning systems employ it. The derivative and the function are both monotonic.

However, the problem is that all negative values instantly become zero, which reduces the model's capacity to fit or learn from the data effectively. Any negative input to a ReLU activation function immediately becomes zero, which affects the resulting output by not mapping the negative values appropriately.

 It Stands for Rectified linear unit. It is the most widely used activation function.
Chiefly implemented in hidden layers of Neural network.
 Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
 Value Range :- [0, inf)
 Nature :- non-linear, which means we can easily backpropagate the errors and
have multiple layers of neurons being activated by the ReLU function.
 Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At a time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.

In simple words, RELU learns much faster than sigmoid and Tanh function.

Softmax Function
Although it is a type of sigmoid function, the softmax function comes in handy when dealing with multiclass classification problems.

It is used frequently when managing several classes; the softmax is typically present in the output nodes of image classification problems. The softmax function divides by the sum of the outputs, squeezing all outputs for each category between 0 and 1. The softmax function is best applied at the output unit of the classifier, where we are actually attempting to obtain the probabilities to determine the class of each input.

The usual rule of thumb is that, if we really are unsure of which activation function to apply, we use ReLU, which is the general-purpose activation function for hidden layers and is employed in the majority of cases these days.

If your output is for binary classification, the sigmoid function is a very logical choice for the output layer. If the output involves multiple classes, softmax can be quite helpful in predicting the probability of each class.

The softmax function is also a type of sigmoid function but is handy when we are trying
to handle multi- class classification problems.

 Nature :- non-linear
 Uses :- Usually used when trying to handle multiple classes. The softmax function is commonly found in the output layer of image classification problems. It squeezes the outputs for each class between 0 and 1 and also divides by the sum of the outputs.
 Output:- The softmax function is ideally used in the output layer of the
classifier where we are actually trying to attain the probabilities to define the
class of each input.
 The basic rule of thumb is if you really don’t know what activation function to
use, then simply use RELU as it is a general activation function in hidden layers
and is used in most cases these days.
 If your output is for binary classification then, sigmoid function is very natural
choice for output layer.
 If your output is for multi-class classification, then Softmax is very useful for predicting the probabilities of each class.
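As a compact reference, the activation functions discussed above can be sketched in Python/NumPy (a minimal illustration; the function names are chosen here and are not prescribed by the notes):

import numpy as np

def sigmoid(x):                   # output in (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):                      # output in (-1, 1)
    return np.tanh(x)

def relu(x):                      # output in [0, inf)
    return np.maximum(0, x)

def softmax(x):                   # outputs sum to 1 across classes
    e = np.exp(x - np.max(x))     # subtract max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))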

3.2 Artificial Neural Networks (ANN)

Artificial Neural Networks (ANNs) are algorithms based on brain function and are used to model complicated patterns and forecasting problems. The Artificial Neural Network (ANN) is a deep learning method that arose from the concept of biological neural networks in the human brain. The development of ANNs was the result of an attempt to replicate the workings of the human brain. The workings of ANNs are extremely similar to those of biological neural networks, although they are not identical. The ANN algorithm accepts only numeric and structured data.

Artificial Neural Networks Architecture

1. There are three layers in the network architecture: the input layer, the hidden layer (there can be more than one), and the output layer. Because of these numerous layers, such networks are sometimes referred to as MLPs (Multi-Layer Perceptrons).

2. It is possible to think of the hidden layer as a “distillation layer,” which extracts some of the most relevant patterns from the inputs and sends them on to the next layer for further analysis. It accelerates and improves the efficiency of the network by recognizing just the most important information from the inputs and discarding the redundant information.

3. The activation function is important for two reasons:

 This model captures the presence of non-linear relationships between the inputs.
 It contributes to the conversion of the input into a more usable output.

4. Finding the “optimal values of W — weights” that minimize prediction error is critical to building a successful model. The “backpropagation algorithm” does this, converting the ANN into a learning algorithm by learning from mistakes.

5. The optimization approach uses a “gradient descent” technique to quantify prediction errors. To find the optimum value for W, small adjustments in W are tried, and the impact on prediction errors is examined. Finally, those values of W are chosen as ideal when further changes to W no longer reduce the errors.

How artificial neural networks function

The core component of ANNs is artificial neurons. Each neuron receives inputs from several other neurons, multiplies them by assigned weights, adds them, and passes the sum on to one or more neurons. Some artificial neurons might apply an activation function to the output before passing it on to the next neuron.
At its core, this might sound like a very trivial math operation. But when you place
hundreds, thousands and millions of neurons in multiple layers and stack them up on top of
each other, you’ll obtain an artificial neural network that can perform very complicated
tasks, such as classifying images or recognizing speech.

Artificial neural networks are composed of an input layer, which receives data from outside
sources (data files, images, hardware sensors, microphone…), one or more hidden layers
that process the data, and an output layer that provides one or more data points
based on the function of the network. For instance, a neural network that detects persons,
cars and animals will have an output layer with three nodes. A network that
classifies bank transactions between safe and fraudulent will have a single output.

Training artificial neural networks

Artificial neural networks start by assigning random values to the weights of the
connections between neurons. The key for the ANN to perform its task correctly
and accurately is to adjust these weights to the right numbers. But finding the right
weights is not very easy, especially when you’re dealing with multiple layers and
thousands of neurons.

This calibration is done by “training” the network with annotated examples. For instance, if
you want to train the image classifier mentioned above, you provide it with multiple photos,
each labeled with its corresponding class (person, car or animal). As you provide it with
more and more training examples, the neural network gradually adjusts its weights to
map each input to the correct outputs.

Basically, what happens during training is that the network adjusts itself to glean specific
patterns from the data. Again, in the case of an image classifier network, when you train the
AI model with quality examples, each layer detects a specific class of features. For
instance, the first layer might detect horizontal and vertical edges, the next layers
might detect corners and round shapes. Further down the network, deeper layers will start
to pick out more advanced features such as faces and objects.
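As an illustrative sketch only (not taken from these notes), a small multi-layer network such as the three-class image classifier described above could be defined in Keras roughly as follows; the flattened input shape, layer sizes, and optimizer are assumptions:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32 * 32 * 3,)),       # flattened input image (assumed size)
    layers.Dense(128, activation="relu"),    # hidden layer 1
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(3, activation="softmax"),   # person / car / animal outputs
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10)     # trained on annotated (labeled) examples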

3.3 Backpropagation Algorithm

Backpropagation is an algorithm that propagates the errors from the output nodes back to the input nodes. Therefore, it is simply referred to as the backward propagation of errors. It is used in vast applications of neural networks in data mining, such as character recognition and signature verification.

Backpropagation is the essence of neural network training. It is the method of fine-


tuning the weights of a neural network based on the error rate obtained in the previous
epoch (i.e., iteration). Proper tuning of the weights allows you to reduce error rates
and make the model reliable by increasing its generalization.

How Backpropagation Algorithm Works

The backpropagation algorithm in a neural network computes the gradient of the loss function for a single weight by the chain rule. It efficiently computes one layer at a time, unlike a naive direct computation. It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.

Consider the following Back propagation neural network example diagram to understand:

Backpropagation Algorithm:
Step 1: Inputs X, arrive through the preconnected path.
Step 2: The input is modeled using true weights W. Weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer.
Step 4: Calculate the error in the outputs
Backpropagation Error= Actual Output – Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce
the error.
Step 6: Repeat the process until the desired output is achieved.

Training Algorithm :

Step 1: Initialize the weights to small random values.

Step 2: While the stopping condition is false, do steps 3 to 9.

Step 3: For each training pair, do steps 4 to 9 (Feed-Forward).

Step 4: Each input unit receives the input signal xi and transmits it to all the units in the hidden layer.

Step 5: Each hidden unit zj (j = 1 to a) sums its weighted input signals to calculate its net input:

zinj = v0j + Σ xi vij   (i = 1 to n)

Applying the activation function, zj = f(zinj), it sends this signal to all units in the layer above, i.e., the output units.

Each output unit yk (k = 1 to m) sums its weighted input signals:

yink = w0k + Σ zj wjk   (j = 1 to a)

and applies its activation function to calculate the output signal:

yk = f(yink)

Backpropagation Error:

Step 6: Each output unit yk (k = 1 to m) receives a target pattern tk corresponding to the input pattern, and the error information term is calculated as:

δk = (tk – yk) f′(yink)

Step 7: Each hidden unit zj (j = 1 to a) sums its delta inputs from the units in the layer above:

δinj = Σ δk wjk   (k = 1 to m)

The error information term is calculated as:

δj = δinj f′(zinj)

Updating of weights and biases:

Step 8: Each output unit yk (k = 1 to m) updates its bias and weights (j = 0 to a). The weight correction term is given by:

Δwjk = α δk zj

and the bias correction term is given by:

Δw0k = α δk

Therefore:

wjk(new) = wjk(old) + Δwjk
w0k(new) = w0k(old) + Δw0k

Each hidden unit zj (j = 1 to a) updates its bias and weights (i = 0 to n). The weight correction term is:

Δvij = α δj xi

and the bias correction term is:

Δv0j = α δj

Therefore:

vij(new) = vij(old) + Δvij
v0j(new) = v0j(old) + Δv0j

Step 9: Test the stopping condition. The stopping condition can be the minimization of error or reaching a specified number of epochs.
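The training algorithm above can be summarized in a short NumPy sketch for one hidden layer with sigmoid activations (a minimal illustration of the same update rules; the learning rate α, layer sizes, and random initialization are assumptions):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(x, t, V, v0, W, w0, alpha=0.1):
    # feed-forward (steps 4-5)
    z_in = v0 + x @ V            # net input of hidden units
    z = sigmoid(z_in)
    y_in = w0 + z @ W            # net input of output units
    y = sigmoid(y_in)
    # backpropagation of error (steps 6-7); for sigmoid, f'(y_in) = y * (1 - y)
    delta_k = (t - y) * y * (1 - y)
    delta_j = (delta_k @ W.T) * z * (1 - z)
    # weight and bias updates (step 8)
    W += alpha * np.outer(z, delta_k)
    w0 += alpha * delta_k
    V += alpha * np.outer(x, delta_j)
    v0 += alpha * delta_j
    return y

# illustrative sizes: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
V, v0 = rng.normal(size=(2, 3)), np.zeros(3)
W, w0 = rng.normal(size=(3, 1)), np.zeros(1)
train_step(np.array([0.5, 0.1]), np.array([1.0]), V, v0, W, w0)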

Need for Backpropagation:


Backpropagation is “backpropagation of errors” and is very useful for training neural
networks. It’s fast, easy to implement, and simple. Backpropagation does not require
any parameters to be set, except the number of inputs. Backpropagation is a flexible
method because no prior knowledge of the network is required.

Types of Backpropagation

There are two types of backpropagation networks.


 Static backpropagation: Static backpropagation is a network designed to map static inputs to static outputs. These types of networks are capable of solving static classification problems such as OCR (Optical Character Recognition).
 Recurrent backpropagation: Recurrent backpropagation is another network used for fixed-point learning. Activation in recurrent backpropagation is fed forward until a fixed value is reached. Static backpropagation provides an instant mapping, while recurrent backpropagation does not provide an instant mapping.

Advantages:

 It is simple, fast, and easy to program.


 Only numbers of the input are tuned, not any other parameter.
 It is Flexible and efficient.
 No need for users to learn any special functions.

Disadvantages:

 It is sensitive to noisy data and irregularities. Noisy data can lead to inaccurate
results.
 Performance is highly dependent on input data.
 Training can consume a lot of time.
 The matrix-based approach is preferred over a mini-batch approach.

3.4 Convolutional Neural Network

A convolutional neural network is a feed-forward neural network that is generally


used to analyze visual images by processing data with grid-like topology. It’s also
known as a ConvNet. A convolutional neural network is used to detect and classify objects
in an image.

Convolutional Neural Networks, also called covnets, are simply neural networks that share their parameters. Suppose there is an image represented as a cuboid with length, width, and height, where the depth of the image is given by the Red, Green, and Blue channels, as shown in the image given below.

Now assume we take a small patch of this image and run a small neural network on it, having k outputs represented vertically. When we slide this small neural network over the whole image, it produces another image with a different width, height, and depth. Instead of just the R, G, B channels, we now have more channels, but with smaller width and height; this is the concept of Convolution. If the patch size were the same as the image size, it would be a regular neural network. It is because of this small patch that we have fewer weights.

Mathematically it could be understood as follows;


 Convolutional layers consist of a set of learnable filters, where each filter has a small width and height but the same depth as the provided input volume (3 if the image is the input layer).
 Suppose we want to run the convolution over an image of dimension 34x34x3; the size of a filter can then be a x a x 3, where a can be 3, 5, 7, etc. The filter must be small in comparison to the dimension of the image.
 During the forward pass, each filter is slid over the entire input volume step by step. Each individual step is called a stride (which can have a value of 2, 3, or 4 for higher-dimensional images), and at each step the dot product between the filter's weights and the patch of the input volume is calculated.
 This results in a 2-dimensional output for each filter as we slide it; these outputs are stacked together to obtain an output volume whose depth equals the number of filters. The network then learns all the filters.

Working of CNN

Generally, a Convolutional Neural Network has three layers, which are as follows;

 Input: If the image has a width of 32, a height of 32, and three R, G, B channels, this layer holds the raw pixel values of the image ([32x32x3]).
 Convolution: It computes the output of those neurons, which are associated
with input's local regions, such that each neuron will calculate a dot product in
between weights and a small region to which they are actually linked to in the input
volume. For example, if we choose to incorporate 12 filters, then it will result in a
volume of [32x32x12].
 ReLU Layer: It applies an activation function elementwise, such as max(0, x) thresholding at zero. It results in a volume of unchanged size ([32x32x12]).
 Pooling: This layer performs a downsampling operation along the spatial dimensions (width, height), resulting in a [16x16x12] volume.

Locally Connected: This can be defined as a regular neural network layer that receives input from the preceding layer, computes the class scores, and results in a 1-dimensional array whose size equals the number of classes.
We start with an input image to which we apply multiple feature detectors, also called filters, to create the feature maps that make up a Convolution layer. Then, on top of that layer, we apply the ReLU (Rectified Linear Unit) to remove linearity, i.e., increase non-linearity, in our images.

Next, we apply a Pooling layer to our Convolutional layer, so that from every feature map we create a pooled feature map; the main purpose of the pooling layer is to make sure that we have spatial invariance in our images. It also helps to reduce the size of our images and to avoid overfitting of our data. After that, we flatten all of our pooled feature maps into one long vector or column of values, which is fed into our artificial neural network. Lastly, we feed it into the locally connected layer to achieve the final output.

Pooling Layers
The pooling operation involves sliding a two-dimensional filter over each channel of
feature map and summarising the features lying within the region covered by the filter.
For a feature map having dimensions nh x nw x nc, the dimensions of the output obtained after a pooling layer are

((nh - f)/s + 1) x ((nw - f)/s + 1) x nc

where,

-> nh - height of feature map


-> nw - width of feature map
-> nc - number of channels in the feature map
-> f - size of filter
-> s - stride length

A common CNN model architecture is to have a number of convolution and pooling


layers stacked one after the other.
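As a quick worked example (using the volumes from the CNN layer description earlier in these notes): pooling a 32 x 32 x 12 feature map with a filter of size f = 2 and stride s = 2 gives ((32 - 2)/2 + 1) x ((32 - 2)/2 + 1) x 12 = 16 x 16 x 12, matching the [16x16x12] pooled volume mentioned above.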

Why use Pooling Layers?

 Pooling layers are used to reduce the dimensions of the feature maps. Thus, it
reduces the number of parameters to learn and the amount of computation
performed in the network.

 The pooling layer summarises the features present in a region of the feature map generated by a convolution layer. So, further operations are performed on summarised features instead of precisely positioned features generated by the convolution layer. This makes the model more robust to variations in the position of the features in the input image.

Types of Pooling Layers:

Max Pooling

Max pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter. Thus, the output after a max-pooling layer is a feature map containing the most prominent features of the previous feature map.

This can be achieved using the MaxPooling2D layer in Keras, as sketched below.
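A minimal Keras sketch (the input shape and pool size are illustrative assumptions, not values from the notes):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 12)),                    # feature maps from a conv layer
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),   # keep the max of each 2x2 region
])
print(model.output_shape)  # (None, 16, 16, 12)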

Average Pooling

Average pooling computes the average of the elements present in the region of feature map
covered by the filter. Thus, while max pooling gives the most prominent feature in
a particular patch of the feature map, average pooling gives the average of features present
in a patch.

Global Pooling

Global pooling reduces each channel in the feature map to a single value. Thus, an nh x nw x nc feature map is reduced to a 1 x 1 x nc feature map. This is equivalent to using a filter of dimensions nh x nw, i.e., the dimensions of the feature map. Further, it can be either global max pooling or global average pooling.
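Average pooling and global pooling are available as analogous Keras layers; a minimal sketch (the layer choices shown here are illustrative, not prescribed by the notes):

from tensorflow.keras import layers

avg_pool = layers.AveragePooling2D(pool_size=(2, 2))   # average of each 2x2 region
global_max = layers.GlobalMaxPooling2D()               # nh x nw x nc -> one value per channel
global_avg = layers.GlobalAveragePooling2D()           # nh x nw x nc -> one value per channel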

In convolutional neural networks (CNNs), the pooling layer is a common type of layer that
is typically added after convolutional layers. The pooling layer is used to reduce the
spatial dimensions (i.e., the width and height) of the feature maps, while preserving the
depth (i.e., the number of channels).
1. The pooling layer works by dividing the input feature map into a set of
non- overlapping regions, called pooling regions. Each pooling region is then
transformed into a single output value, which represents the presence of a particular
feature in that region. The most common types of pooling operations are max
pooling and average pooling.
2. In max pooling, the output value for each pooling region is simply the
maximum value of the input values within that region. This has the effect of
preserving the most salient features in each pooling region, while discarding
less relevant information. Max pooling is often used in CNNs for object
recognition tasks, as it helps to identify the most distinctive features of an object,
such as its edges and corners.
3. In average pooling, the output value for each pooling region is the average
of the input values within that region. This has the effect of preserving more
information than max pooling, but may also dilute the most salient features.
Average pooling is often used in CNNs for tasks such as image segmentation and
object detection, where a more fine-grained representation of the input is required.

Pooling layers are typically used in conjunction with convolutional layers in a CNN,
with each pooling layer reducing the spatial dimensions of the feature maps,
while the convolutional layers extract increasingly complex features from the input.
The resulting feature maps are then passed to a fully connected layer, which
performs the final classification or regression task.

Advantages of Pooling Layer:

1. Dimensionality reduction: The main advantage of pooling layers is that they help
in reducing the spatial dimensions of the feature maps. This reduces the
computational cost and also helps in avoiding overfitting by reducing the number of
parameters in the model.
2. Translation invariance: Pooling layers are also useful in achieving
translation invariance in the feature maps. This means that the position of an object
in the image does not affect the classification result, as the same features are
detected regardless of the position of the object.
3. Feature selection: Pooling layers can also help in selecting the most important
features from the input, as max pooling selects the most salient features and average
pooling preserves more information.
Disadvantages of Pooling Layer:

1. Information loss: One of the main disadvantages of pooling layers is that they
discard some information from the input feature maps, which can be important
for the final classification or regression task.
2. Over-smoothing: Pooling layers can also cause over-smoothing of the feature
maps, which can result in the loss of some fine-grained details that are
important for the final classification or regression task.
3. Hyperparameter tuning: Pooling layers also introduce hyperparameters such as
the size of the pooling regions and the stride, which need to be tuned in order to
achieve optimal performance. This can be time-consuming and requires some
expertise in model building.

Recurrent Neural Network (RNN)

Recurrent Neural Network(RNN) is a type of Neural Network where the output


from the previous step is fed as input to the current step. In traditional neural networks, all
the inputs and outputs are independent of each other, but in cases when it is required to
predict the next word of a sentence, the previous words are required and hence there is a
need to remember the previous words. Thus RNN came into existence, which solved this
issue with the help of a Hidden Layer. The main and most important feature of RNN
is its Hidden state, which remembers some information about a sequence. The state is
also referred to as Memory State since it remembers the previous input to the network. It
uses the same parameters for each input as it performs the same task on all the inputs or
hidden layers to produce the output.
This reduces the complexity of parameters, unlike other neural
networks.
Architecture Of Recurrent Neural Network

RNNs have the same input and output architecture as any other deep neural architecture. However, differences arise in the way information flows from input to output. Unlike deep neural networks, where we have different weight matrices for each dense layer, in an RNN the weights across the network remain the same. The RNN calculates a hidden state ht for every input xt using the following formulas:

ht = σ(U·xt + W·ht-1 + B)

yt = O(V·ht + C)

Hence

yt = f(xt, ht-1, W, U, V, B, C)

Here the state matrix S has element si as the state of the network at timestep i. The parameters in the network are W, U, V, B, C, which are shared across timesteps.

The Recurrent Neural Network consists of multiple fixed activation function units, one
for each time step. Each unit has an internal state which is called the hidden state of the
unit. This hidden state signifies the past knowledge that the network currently holds
at a given time step. This hidden state is updated at every time step to signify the change
in the knowledge of the network about the past. The hidden state is updated using the
following recurrence relation:-

The formula for calculating the current state:

ht = f(ht-1, xt)

where:
ht -> current state
ht-1 -> previous state
xt -> input state

Formula for applying the activation function (tanh):

ht = tanh(whh·ht-1 + wxh·xt)

where:
whh -> weight at recurrent neuron
wxh -> weight at input neuron

The formula for calculating output:

yt = why·ht

where:
yt -> output
why -> weight at output layer

These parameters are updated using Backpropagation. However, since RNN works on
sequential data here we use an updated backpropagation which is known as
Backpropagation through time.

Training through RNN
1. A single-time step of the input is provided to the network.
2. Then calculate its current state using a set of current input and the previous state.
3. The current ht becomes ht-1 for the next time step.
4. One can go through as many time steps as the problem requires and join the information from all the previous states.
5. Once all the time steps are completed the final current state is used to calculate the
output.
6. The output is then compared to the actual output, i.e., the target output, and the error is generated.
7. The error is then back-propagated to the network to update the weights and hence the
network (RNN) is trained using back propagation through time
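The recurrence above can be sketched as a simple forward pass in NumPy (a minimal illustration of steps 1 to 5; the sizes and random initialization are assumptions):

import numpy as np

def rnn_forward(inputs, w_xh, w_hh, w_hy):
    h = np.zeros(w_hh.shape[0])              # initial hidden state
    for x_t in inputs:                       # one time step at a time
        h = np.tanh(w_xh @ x_t + w_hh @ h)   # ht = tanh(wxh*xt + whh*ht-1)
    return w_hy @ h                          # output from final state: yt = why*ht

rng = np.random.default_rng(0)
w_xh, w_hh, w_hy = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
sequence = [rng.normal(size=3) for _ in range(5)]   # 5 time steps, 3 features each
print(rnn_forward(sequence, w_xh, w_hh, w_hy))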

Classification performance metrics:


Confusion Matrix in Machine Learning

The confusion matrix is a matrix used to determine the performance of classification models for a given set of test data. It can only be determined if the true values for the test data are known. The matrix itself is easy to understand, but the related terminology may be confusing. Since it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix. Some features of the confusion matrix are given below:

o For 2 prediction classes of a classifier, the matrix is a 2*2 table; for 3 classes, it is a 3*3 table, and so on.
o The matrix is divided into two dimensions, that are predicted values and
actual values along with the total number of predictions.
o Predicted values are those values, which are predicted by the model, and actual
values are the true values for the given observations.
o It looks like the below table:

                     Actual: Yes          Actual: No
Predicted: Yes       True Positive        False Positive
Predicted: No        False Negative       True Negative

The above table has the following cases:

o True Negative: Model has given prediction No, and the real or actual value was also
No.

o True Positive: The model has predicted Yes, and the actual value was also Yes.
o False Negative: The model has predicted No, but the actual value was Yes. It is also called a Type-II error.
o False Positive: The model has predicted Yes, but the actual value was No. It is also
called a Type-I error.

Precision and Recall in Machine Learning

While building any machine learning model, the first thing that comes to our mind is how
we can build an accurate & 'good fit' model and what the challenges are that will come
during the entire procedure. Precision and Recall are the two most important but confusing
concepts in Machine Learning. Precision and recall are performance metrics
used for pattern recognition and classification in machine learning. These concepts are
essential to build a perfect machine learning model which gives more precise and accurate
results. Some of the models in machine learning require more precision and some model
requires more recall. So, it is important to know the balance between Precision and recall
or, simply, precision-recall trade-off.
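For reference, the standard definitions in terms of the confusion matrix entries (not stated explicitly in the original text) are:

Precision = TP / (TP + FP): of all the subjects the model labeled positive, how many were actually positive?
Recall = TP / (TP + FN): of all the subjects that are actually positive, how many did the model label as positive?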

Accuracy

It’s the ratio of the correctly labeled subjects to the whole pool of subjects.

Accuracy is the most intuitive one.

Accuracy answers the following question: How many students did we correctly label
out of all the students?

Accuracy = (TP+TN)/(TP+FP+FN+TN)
numerator: all correctly labeled subjects (all trues)
denominator: all subjects

F1-score (aka F-Score / F-Measure)

F1 Score considers both precision and recall.

It is the harmonic mean(average) of the precision and recall.


The F1 Score is best if there is some sort of balance between precision (p) and recall (r) in the system. Conversely, the F1 Score is not as high if one measure is improved at the expense of the other.

For example, if P is 1 & R is 0, F1 score is 0.

F1 Score = 2*(Recall * Precision) / (Recall + Precision)
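These metrics can be computed directly from the confusion matrix counts; a minimal sketch in Python (the counts used here are made-up illustrative numbers):

def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# illustrative counts
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))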


AUC-ROC curve, or Area Under the Receiver Operating Characteristic curve, is a
graphical representation of the performance of a binary classification model at various
classification thresholds. It is commonly used in machine learning to assess the ability of a
model to distinguish between two classes, typically the positive class (e.g., presence of a
disease) and the negative class (e.g., absence of a disease).
 ROC: Receiver Operating Characteristics
 AUC: Area Under Curve
Receiver Operating Characteristics (ROC) Curve
ROC stands for Receiver Operating Characteristics, and the ROC curve is the graphical
representation of the effectiveness of the binary classification model. It plots the true
positive rate (TPR) vs the false positive rate (FPR) at different classification thresholds.

Area Under the Curve (AUC):


AUC stands for the Area Under the Curve, and the AUC curve represents the area under
the ROC curve. It measures the overall performance of the binary classification model. As
both TPR and FPR range between 0 to 1, So, the area will always lie between 0 and 1, and
A greater value of AUC denotes better model performance. Our main goal is to maximize
this area in order to have the highest TPR and lowest FPR at the given threshold. The AUC
measures the probability that the model will assign a randomly chosen positive instance a
higher predicted probability compared to a randomly chosen negative instance.
It represents the probability with which our model can distinguish between the two classes
present in our target.
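In practice the ROC curve and AUC are usually computed with a library; a minimal sketch using scikit-learn (the labels and scores below are made-up illustrative values):

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                     # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]   # predicted probabilities for class 1
fpr, tpr, thresholds = roc_curve(y_true, y_score)     # points on the ROC curve
print(roc_auc_score(y_true, y_score))                 # area under that curve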
