Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
Department of Information Technology
Prepared by
Dr. R. D. Chintamani
Assistant Professor
Department of Information Technology
Machine Learning
(IT312)
Unit-VI
DEEP LEARNING
Course Objectives: To understand the concept of Deep Learning
Course Outcome (CO6): Understand Deep Learning
 Generative AI tools like ChatGPT and Midjourney are able to replicate (and often exceed) human-like performance on tasks like taking exams, generating text, and making art.
 Even to seasoned programmers, their abilities can seem magical. But,
obviously, there is no magic.
 These things are “just” artificial neural networks – circuits inspired by the
architecture of biological brains.
 In fact, much like real brains, when broken down to their building blocks,
these systems can seem “impossibly simple” relative to what they achieve.
Artificial Neural Network (ANN)
 Artificial Neural Networks (ANN) are algorithms based on brain function, used to model complicated patterns and solve forecasting problems.
 The Artificial Neural Network (ANN) is a deep learning method that arises from the concept of biological neural networks in the human brain.
 The development of ANN was the result of an attempt to replicate the workings of the human brain. The workings of ANN are extremely similar to those of biological neural networks, although they are not identical. The ANN algorithm accepts only numeric and structured data.
 Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used to accept unstructured and non-numeric data forms such as image, text, and speech.
 An Artificial Neural Network (ANN) is a computational model inspired by
the human brain’s neural structure.
 It consists of interconnected nodes (neurons) organized into layers.
 Information flows through these nodes, and the network adjusts the
connection strengths (weights) during training to learn from data, enabling it
to recognize patterns, make predictions, and solve various tasks in machine
learning and artificial intelligence.
Artificial Neural Networks Architecture: (figure)
Types of Artificial Neural Networks: (figure showing five types)
How do Artificial Neural Networks learn? (figure)
Application of Artificial Neural Networks:
ANNs have a wide range of applications because of their unique properties. A
few of the important applications of ANNs include:
1. Image Processing and Character Recognition:
 ANNs play a significant part in image and character recognition because of their capacity to take in many inputs, process them, and infer hidden, complicated, non-linear correlations.
 Character recognition, such as handwriting recognition, has many
applications in fraud detection (for example, bank fraud) and even national
security assessments.
 Image recognition is a rapidly evolving discipline with several applications
ranging from social media facial identification to cancer detection in medicine
to satellite image processing for agricultural and defence purposes.
 Deep neural networks, which form the core of “deep learning,” have opened up new and transformative advances in computer vision, speech recognition, and natural language processing – a notable example being self-driving vehicles.
2. Forecasting:
 Forecasting is widely used in everyday business decisions (sales, the financial allocation between goods, and capacity utilization), economic and monetary policy, finance, and the stock market.
 Forecasting issues are frequently complex; for example, predicting stock
prices is complicated with many underlying variables (some known, some
unseen).
 Traditional forecasting models have flaws when it comes to accounting for
these complicated, non-linear interactions.
 Given their capacity to model and extract previously unknown characteristics and correlations, ANNs can provide a reliable alternative when used correctly. Unlike conventional models, ANNs also impose no restrictions on the input and residual distributions.
Advantages of Artificial Neural Networks: (figure)
Disadvantages of Artificial Neural Networks: (figure)
McCulloch-Pitts Neuron:
 The McCulloch-Pitts neuron was a binary device, functioning in an
all-or-nothing manner.
 It had multiple inputs and one output; if the combined inputs
exceeded a certain threshold, the neuron would ‘fire’, otherwise, it
remained inactive.
 This simple yet powerful concept mimicked the basic function of neurons in the brain.
 The MP neuron model is also known as the linear threshold gate model.
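As a small illustration (our sketch, not from the slides), a McCulloch-Pitts unit in Python; the threshold value and the AND-gate example are our own choices:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts unit: fires (outputs 1) only if the sum of its
    binary inputs reaches the threshold; otherwise it stays inactive (0)."""
    return 1 if sum(inputs) >= threshold else 0

# With two inputs and threshold 2, the unit computes logical AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], threshold=2))
```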
Perceptron:
 A perceptron is the smallest element of a neural network.
 Perceptron is a single-layer linear neural network, or a machine learning algorithm used for supervised learning of various binary classifiers.
 It works as an artificial neuron, learning weights for the input elements and processing them to extract information from the input data.
 A perceptron network is a group of simple logical statements that come
together to create an array of complex logical statements, known as the
neural network.
Components of a Perceptron: (figure)
Why do we Need Weight and Bias?
 Weight and bias are two important aspects of the perceptron model.
 These are learnable parameters and as the network gets trained it adjusts both
parameters to achieve the desired values and the correct output.
 Weights are used to measure the importance of each feature in predicting
output value.
 Features whose weights are close to zero are said to have lesser significance; they have less importance in the prediction process compared to features whose weights are further from zero, i.e., weights with a larger value.
 If the weight of a feature is positive then it has a direct relation with the
target value, and if it is negative then it has an inverse relationship with the
target value.
 In contrast to the weights, which determine how strongly an input drives the activation function, the bias shifts (delays or advances) the point at which the activation function triggers.
 It acts like an intercept in a linear equation. Simply stated, bias is a constant used to adjust the output and help the model provide the best fit for the given data.
Perceptron Learning Rule:
 According to this rule, the perceptron learns automatically, adjusting its weight coefficients to generate the desired results. For a training example with inputs xᵢ, target t, and output o, each weight is updated as wᵢ ← wᵢ + η(t − o)xᵢ, where η is the learning rate.
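A minimal, hedged Python sketch of this rule (our example; the learning rate, epoch count, and AND-gate data are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - output) * x.
    The bias is folded in as an extra input fixed at 1."""
    X = np.hstack([X, np.ones((len(X), 1))])  # append bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            output = 1 if w @ x >= 0 else 0    # threshold activation
            w += lr * (target - output) * x    # update only when wrong
    return w

# Toy example: learn logical AND, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
print(train_perceptron(X, t))  # learned weights and bias
```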
The anatomy of a Perceptron: (figure)
Why is perceptron used?
 Perceptron is a linear classifier used to classify data into two binary classes.
 Facilitating the supervised learning of binary classifiers, the perceptron algorithm learns and processes the elements of the training set one at a time.
 It helps detect features in the input and classify it, since it enables machines to learn the weight coefficients automatically.
 Perceptrons are commonly used as building blocks for operations like data compression, data visualization, image recognition, and encryption.
Single Layer Perceptron Model:
 A single-layer perceptron model is the simplest type of artificial neural
network.
 It includes a feed-forward network that can analyze only linearly separable objects, and it depends on a threshold transfer function. The model returns only binary outcomes (targets), i.e., 1 and 0.
 The algorithm in a single-layered perceptron model has no prior information initially; the weights start from arbitrary values, so the algorithm simply adds up all the weighted inputs.
 Since the single-layer perceptron is a linear classifier, it cannot classify cases that are not linearly separable.
Multilayer Perceptron Model
 A multi-layer perceptron model uses the backpropagation algorithm. Though
it has the same structure as that of a single-layer perceptron, it has one or
more hidden layers.
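To make the contrast with the single-layer model concrete, a hedged sketch using scikit-learn's MLPClassifier (our library choice, not the slides'; assumes scikit-learn is installed). One small hidden layer is enough to learn XOR, which no single-layer perceptron can separate:

```python
from sklearn.neural_network import MLPClassifier

# XOR is not linearly separable, so a single-layer perceptron fails on it;
# an MLP with one hidden layer can learn it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

clf = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", random_state=1)
clf.fit(X, y)
print(clf.predict(X))  # should recover XOR, i.e. [0 1 1 0] (may vary with seed)
```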
Activation Functions:
 Activation functions are an integral building block of neural networks that
enable them to learn complex patterns in data.
 They transform the input signal of a node in a neural network into an output
signal that is then passed on to the next layer. Without activation functions,
neural networks would be restricted to modeling only linear relationships
between inputs and outputs.
 Activation functions introduce non-linearities, allowing neural networks to
learn highly complex mappings between inputs and outputs.
 An activation function determines the range of values of activation of an artificial neuron. It is applied to the sum of the weighted input data of the neuron.
 Without an activation function, the only operations in computing the output of a multilayer perceptron would be linear combinations of the weights and the input values.
 A neural network that does not have an activation function in the hidden layer
would not be able to mathematically realize such complex relationships, and
would not be able to solve the tasks we are trying to solve with the network.
Why Are Activation Functions Essential?
 Without activation functions, neural networks would just consist of linear
operations like matrix multiplication. All layers would perform linear
transformations of the input, and no non-linearities would be introduced.
 Most real-world data is non-linear. For example, relationships between house
prices and size, income, and purchases, etc., are non-linear. If neural
networks had no activation functions, they would fail to learn the complex
non-linear patterns that exist in real-world data.
 Activation functions enable neural networks to learn these non-linear
relationships by introducing non-linear behaviour through activation
functions.
Types of Activation Functions:
1. Sigmoid or Logistic Activation Function:
 It is especially used for models where we have to predict the probability as
an output. Since probability of anything exists only between the range of 0
and 1, sigmoid is the right choice.
 The function is differentiable; that means we can find the slope of the sigmoid curve at any point.
 The main purpose of the activation function is to keep the output or predicted value in a particular range, which improves the efficiency and accuracy of the model.
The mathematical definition of the sigmoid function is:
σ(x) = 1 / (1 + e^(−x))
 It takes a real-valued input and squashes it to a value between 0 and 1.
 The sigmoid function has an "S"-shaped curve that asymptotes to 0 for large
negative numbers and 1 for large positive numbers.
 The outputs can be easily interpreted as probabilities, which makes it natural
for binary classification problems.
 The main use case of the sigmoid function is as the activation for the output layer of binary classification models.
Types of Activation Functions:
DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON
2. Tanh or hyperbolic tangent Activation Function:
 Tanh is similar to the logistic sigmoid, but often works better. The range of the tanh function is (−1, 1). Tanh is also sigmoidal (S-shaped).
 The advantage of tanh is that negative inputs are mapped strongly negative and zero inputs are mapped near zero in the tanh graph.
 The tanh function is mainly used for classification between two classes.
 Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
The mathematical definition of the tanh function is:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
 Unlike the sigmoid function, tanh is zero-centered, which means that its
output is symmetric around the origin of the coordinate system. This is often
considered an advantage because it can help the learning algorithm converge
faster.
 The tanh function is frequently used in the hidden layers of a neural network.
Because of its zero-centered nature, when the data is also normalized to have
mean zero, it can result in more efficient training.
 If one has to choose between sigmoid and tanh, the decision can also be influenced by the specific use case and the behavior of the network during initial training experiments.
3. ReLU (Rectified Linear Unit) Activation Function
 The ReLU is the most widely used activation function right now, since it is used in almost all convolutional neural networks and deep learning models.
 Even though ReLU is linear for half of its input space, it is technically a non-linear function because it has a non-differentiable point at x = 0, where its behavior changes abruptly. This non-linearity allows neural networks to learn complex patterns.
 Since ReLU outputs zero for all negative inputs, it naturally leads to sparse
activations; at any time, only a subset of neurons are activated, leading to
more efficient computation.
The ReLU function is defined as:
ReLU(x) = max(0, x)
 The ReLU function is computationally inexpensive because it involves simple
thresholding at zero. This allows networks to scale to many layers without a
significant increase in computational burden, compared to more complex
functions like tanh or sigmoid.
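An illustrative NumPy sketch of the three activation functions above (ours, not from the slides):

```python
import numpy as np

def sigmoid(x):
    """Squashes any real input into (0, 1); natural for probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Zero-centered squashing into (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: 0 for negative inputs, identity for positive."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # values in (0, 1)
print(tanh(x))     # values in (-1, 1), symmetric around 0
print(relu(x))     # negative inputs clipped to 0
```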
Loss Functions (Cost/Error Function):
Loss Function:
 The loss function helps determine how effectively your algorithm models the given dataset. Loss is a measure of how far your model's predictions are from the expected results.
 It is a mathematical function of the parameters of the machine learning algorithm.
 In simple linear regression, the prediction is calculated using the slope (m) and intercept (b). The loss for a point is (Yᵢ − Ŷᵢ)², i.e., the loss function is a function of the slope and intercept. Regression loss functions like the MSE loss are commonly used to evaluate the performance of regression models.
 A loss function measures how good a neural network model is in performing a
certain task, which in most cases is regression or classification.
 We must minimize the value of the loss function during the back-propagation
step in order to make the neural network better.
 We use the cross-entropy loss function in classification tasks, when we want the neural network to predict probabilities.
 For regression tasks, when we want the network to predict continuous numbers, we use the mean squared error loss function.
 Loss functions play a pivotal role in machine learning algorithms, acting as
objective measures of the disparity between predicted and actual values.
 They serve as the basis for model training, guiding algorithms to adjust model
parameters in a direction that minimizes the loss and improves predictive
accuracy.
 In machine learning, loss functions quantify the extent of error between
predicted and actual outcomes. They provide a means to evaluate the
performance of a model on a given dataset and are instrumental in optimizing
model parameters during the training process.
Loss Functions in Deep Learning
1. Mean Squared Error / Squared Loss / L2 Loss:
 The Mean Squared Error (MSE) is a straightforward and widely used loss
function.
 To calculate the MSE, you take the difference between the actual value and the
model prediction, square it, and then average it across the entire dataset.
The MSE is computed as:
MSE = (1/n) · Σ (Yᵢ − Ŷᵢ)²
2. Mean Absolute Error / L1 Loss:
 The Mean Absolute Error (MAE) is another simple loss function. It calculates
the average absolute difference between the actual value and the model
prediction across the dataset.
The MAE is computed as:
MAE = (1/n) · Σ |Yᵢ − Ŷᵢ|
3. Binary Cross-Entropy / Log Loss:
 It is used in binary classification problems, i.e., problems with two classes: for example, whether a person has COVID or not, or whether an article becomes popular or not.
 Binary cross-entropy compares each predicted probability to the actual class output, which can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance from the expected value, that is, how close or far they are from the actual value. It is computed as BCE = −(1/n) · Σ [Yᵢ · log(Pᵢ) + (1 − Yᵢ) · log(1 − Pᵢ)].
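A hedged NumPy sketch of the three loss functions (our example values; the eps clipping is a standard numerical-stability trick, not from the slides):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average squared difference."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute difference."""
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """BCE = -(1/n) * sum(y*log(p) + (1-y)*log(1-p)).
    Probabilities are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 1.0, 1.0])   # actual classes
p = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities
print(mse(y, p), mae(y, p), binary_cross_entropy(y, p))
```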
Convolutional Neural Network (CNN)
 A Convolutional Neural Network, also known as CNN or ConvNet, is a class
of neural networks that specializes in processing data that has a grid-like
topology, such as an image.
 A digital image is a binary representation of visual data. It contains pixels arranged in a grid-like fashion, whose values denote how bright and what colour each pixel should be.
 A CNN takes an input image, applies filters, flattens the result, and “votes” to classify the image.
Building Blocks of a CNN:
Input Image:
 First of all, the input image will be broken down into pixels.
 If it is a black-and-white image, it will have only one channel, and the pixels will be interpreted as a 2D array with values from 0 to 255. If it is a coloured image, it will have 3 channels (red, green, blue) and will be interpreted as a 3D array.
Convolution Layer
 This is the first layer, which filters the input images. Its purpose is to extract features from the image: it captures colour, edges, gradient orientation, and other features so that images can be differentiated.
Convolution Layer: A convolutional layer within a neural network should
have the following attributes:
 Convolutional kernels defined by a width and height (hyper-parameters).
 The number of input channels and output channels (hyper-parameters).
 The depth of the convolution filter (the input channels) must be equal to the number of channels (depth) of the input feature map.
There are two types of results from this layer:
 Same Padding: the output feature maps have the same size as the input feature maps; the input is padded so that size is preserved.
 Valid Padding: no padding is applied, so the output feature maps are smaller than the input feature maps.
Pooling Layer:
 This layer is usually added after the convolutional layer.
 The pooling layer reduces the spatial size of the output from the convolutional layer and extracts dominant features. Pooling layers can be differentiated into two types:
 Max Pooling: returns the maximum value from the portion of the image covered by the kernel. It discards noisy activations and helps reduce over-fitting by providing an abstracted form of the representation.
 Average Pooling: returns the average value from the portion of the image covered by the kernel.
Fully Connected Input Layer (Flatten):
 Fully connected layers are layers where all the inputs from one layer are connected to every activation unit of the next layer. This layer takes the output of the pooling layer and flattens it into a single vector.
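Putting the building blocks together, a hedged sketch using the Keras API (the layer sizes, input shape, and class count are illustrative assumptions, not from the slides):

```python
from tensorflow.keras import layers, models

# Minimal CNN: convolution -> pooling -> flatten -> fully connected "vote".
model = models.Sequential([
    layers.Conv2D(16, (3, 3), padding="same", activation="relu",
                  input_shape=(28, 28, 1)),     # one channel: grayscale image
    layers.MaxPooling2D((2, 2)),                # spatial down-sampling
    layers.Flatten(),                           # single vector for dense layer
    layers.Dense(10, activation="softmax"),     # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```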
Autoencoder
 Autoencoders have emerged as one of the technologies and techniques that
enable computer systems to solve data compression problems more
efficiently.
 An autoencoder is a type of artificial neural network used to learn data
encodings in an unsupervised manner.
 The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
The architecture of Autoencoders:
1. Encoder: A module that compresses the train-validate-test set input data into
an encoded representation that is typically several orders of magnitude smaller
than the input data.
2. Bottleneck: A module that contains the compressed knowledge representations
and is therefore the most important part of the network.
3. Decoder: A module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form. The output is then compared with the ground truth.
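A hedged Keras sketch of this encoder–bottleneck–decoder layout (the dimensions are illustrative assumptions, e.g. flattened 28×28 images):

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                         # flattened image
encoded = layers.Dense(32, activation="relu")(inputs)       # encoder -> bottleneck
decoded = layers.Dense(784, activation="sigmoid")(encoded)  # decoder reconstructs

autoencoder = models.Model(inputs, decoded)
# The reconstruction is compared with the ground-truth input,
# so the same data serves as both input and target:
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```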
How to train autoencoders? (figure)
Applications of autoencoders
1. Dimensionality reduction
 Undercomplete autoencoders are those that are used for dimensionality
reduction.
 These can be used as a pre-processing step for dimensionality reduction as they
can perform fast and accurate dimensionality reductions without losing much
information.
 Furthermore, while dimensionality reduction procedures like PCA can only
perform linear dimensionality reductions, undercomplete autoencoders can
perform large-scale non-linear dimensionality reductions.
2. Image denoising
 Autoencoders like the denoising autoencoder can be used for performing
efficient and highly accurate image denoising.
 Unlike traditional methods of denoising, autoencoders do not search for noise; they extract the image from the noisy data fed to them by learning a representation of it. The representation is then decompressed to form a noise-free image.
 Denoising autoencoders can thus denoise complex images that cannot be denoised via traditional methods.
3. Anomaly detection
 Undercomplete autoencoders can also be used for anomaly detection.
 For example, consider an autoencoder that has been trained on a specific dataset P. For any image sampled from the training dataset, the autoencoder is bound to give a low reconstruction loss and is supposed to reconstruct the image as is.
 For any image that is not present in the training dataset, however, the autoencoder cannot perform the reconstruction well, as the latent attributes are not adapted to an image the network has never seen; the resulting high reconstruction loss flags the image as an anomaly.
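A short, hedged sketch of this idea (assuming a trained Keras-style autoencoder with a predict method, such as the one sketched above; the threshold choice is our assumption):

```python
import numpy as np

def flag_anomalies(autoencoder, x, threshold):
    """Flag samples whose reconstruction error exceeds a threshold.
    High error means the latent attributes were not adapted to the sample."""
    reconstruction = autoencoder.predict(x)
    errors = np.mean((x - reconstruction) ** 2, axis=1)  # per-sample MSE
    return errors > threshold                            # True = anomaly

# The threshold can be set from errors on held-out normal data, e.g.:
# threshold = np.percentile(normal_errors, 99)
```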
Long Short-Term Memory Networks (LSTM)
 LSTM (Long Short-Term Memory) is a recurrent neural network (RNN)
architecture widely used in Deep Learning. It excels at capturing long-term
dependencies, making it ideal for sequence prediction tasks.
 Unlike traditional neural networks, LSTM incorporates feedback connections,
allowing it to process entire sequences of data, not just individual data points.
This makes it highly effective in understanding and predicting patterns in
sequential data like time series, text, and speech.
 LSTM has become a powerful tool in artificial intelligence and deep learning,
enabling breakthroughs in various fields by uncovering valuable insights from
sequential data.
LSTM Architecture:
 The first part chooses whether the information coming from the previous
timestamp is to be remembered or is irrelevant and can be forgotten.
 In the second part, the cell tries to learn new information from the input to this
cell.
 At last, in the third part, the cell passes the updated information from the
current timestamp to the next timestamp. This one cycle of LSTM is
considered a single-time step.
 These three parts of an LSTM unit are known as gates.
 The first gate is called the Forget gate, the second gate is known as the Input gate, and the last one is the Output gate.
 An LSTM unit consisting of these three gates and a memory cell (the LSTM cell) can be considered as a layer of neurons, as in a traditional feed-forward neural network, with each neuron having a hidden state and a current state.
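For reference, the standard LSTM gate equations (the standard formulation, not shown on the slides; σ is the sigmoid, ⊙ the element-wise product, and the W and b are learned parameters):

```latex
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) &&\text{forget gate}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) &&\text{input gate}\\
\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) &&\text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{updated cell state}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) &&\text{output gate}\\
h_t &= o_t \odot \tanh(c_t) &&\text{hidden state passed to the next timestamp}
\end{aligned}
```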
Recent Trends in Deep Learning
1. Hybrid Model Integration
 An application provides a mechanism for integrating hybrid models from data
sources such as census, weather, and social media into decision support tools.
Moreover, it enables the creation of a new nested domain for the location data,
which can then become part of decision support systems.
 The results suggest that incorporating deep learning networks into hybrid
models can lead to better decisions concerning hazards and performance
measures such as growth and employment.
 Hybrid models combine the benefits of symbolic AI and deep learning. It’s a
top-down approach to artificial intelligence.
2. The Vision Transformer
 Commonly referred to as ViT, the Vision Transformer is an image classification model developed by researchers at Google; it has been used in sentiment analysis, object recognition, and image captioning.
 ViT consists of an input layer, a middle layer, and an output layer. The input
layer contains training images that have been labeled with one of several
possible sentiments (cheerful, negative, neutral, uncertain, sad, happy, angry).
The middle layer detects the types of objects in the image.
 The output layer returns a confidence score based on what the middle and input layers have detected.
3. Self-Supervised Learning
 This deep and self-supervised learning module helps in automation.
 Rather than depending on labeled data to train a system, it learns to categorize
the raw data automatically.
 Each input component can predict any other part of the input. It might, for
example, forecast the future based on historical records.
 In a self-supervised learning system, the input is labeled either by an intelligent
agent or by some external source.
 The output is also marked with a label that reflects the overall quality of the
prediction made by the system.
4. Neuroscience-Based Deep Learning
 Neuroscience-based deep learning is a type of ML that uses data from
neuroscience experiments to train artificial neural networks. It allows
researchers to develop models that better understand how the brain works.
 Artificial neural networks constructed on computers are comparable to those
seen in human brains.
 As a result of this formation, scientists and researchers have uncovered
thousands of neurological remedies and ideas. With the deployment of
progressively more robust, comprehensive, and advanced deep learning
implementations and solutions, the dynamics of adaptability ratio have
improved significantly.
5. High-Performance NLP Models
 Machine Learning-based NLP is still in the early stages. However, there is
presently no method that will allow NLP computers to recognize the meanings
of different words in various contexts and respond appropriately.
 One approach to solving this problem is to build a model that can recognize
patterns in large amounts of text (e.g., millions of documents).
 This is where machine learning algorithms come in, as they can automatically learn from data and train models to make predictions.
6. Generative Adversarial Networks (GANs):
 Generative Adversarial Networks (GANs) are a type of neural network that
generates new, realistic data based on existing data. GANs work by pitting two
neural networks against each other, with one network generating fake data and
the other network trying to detect whether the data is real or fake.
 This approach enables GANs to generate new data that is indistinguishable
from real data, and has a wide range of applications, including image and
video generation, music synthesis, and natural language processing.
 One of the advantages of GANs is their ability to generate a diverse range of
outputs, which can be used to train machine learning models that are more
robust and accurate.
7. Reinforcement Learning
 Reinforcement learning is a machine learning approach in which an agent
learns how to behave in an environment by performing actions and receiving
feedback in the form of rewards or penalties. The agent learns from its actions
and adjusts its behavior accordingly to maximize the cumulative reward over
time.
 Reinforcement learning has been used to develop game-playing agents that
have beaten human champions in games such as Go and Chess. The technique
has also been applied to robotics, where agents learn how to manipulate
objects or navigate through environments.

More Related Content

DOCX
Neural networks of artificial intelligence
PDF
Artificial neural network in Audiology.pdf
PDF
Artificial Neural Network and its Applications
DOCX
Artifical neural networks
PDF
Artificial Neural Networking
PDF
Neural networking this is about neural networks
PPT
neuralnetworklearningalgorithm-231219123006-bb13a863.ppt
Neural networks of artificial intelligence
Artificial neural network in Audiology.pdf
Artificial Neural Network and its Applications
Artifical neural networks
Artificial Neural Networking
Neural networking this is about neural networks
neuralnetworklearningalgorithm-231219123006-bb13a863.ppt

Similar to ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf (20)

PPS
Neural Networks Ver1
PDF
Artificial Neural Network: A brief study
PDF
Artificial Neural Networks.pdf
PPTX
Neural network
PDF
Artificial Neural Networks in Human Life: Future Challenges and its Applications
PPT
ANN_B.TechPresentation of ANN basics.ppt
PPT
Artificial Neural Network Learning Algorithm.ppt
PPTX
02 Fundamental Concepts of ANN
PDF
[IJET V2I2P20] Authors: Dr. Sanjeev S Sannakki, Ms.Anjanabhargavi A Kulkarni
DOCX
Neural network
PDF
NeuralNetwork Artificial Intellegence Material
PDF
Artificial neural network
PPTX
Artificial neural network
PPT
Artificial neural networks
DOCX
Neural networks report
PPTX
Artificial Neural Network in Medical Diagnosis
PPTX
Artificial Neural Network
PPTX
Artificial neural network
PPTX
artificialneuralnetwork-200611082546.pptx
PDF
Neural Network
Neural Networks Ver1
Artificial Neural Network: A brief study
Artificial Neural Networks.pdf
Neural network
Artificial Neural Networks in Human Life: Future Challenges and its Applications
ANN_B.TechPresentation of ANN basics.ppt
Artificial Neural Network Learning Algorithm.ppt
02 Fundamental Concepts of ANN
[IJET V2I2P20] Authors: Dr. Sanjeev S Sannakki, Ms.Anjanabhargavi A Kulkarni
Neural network
NeuralNetwork Artificial Intellegence Material
Artificial neural network
Artificial neural network
Artificial neural networks
Neural networks report
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network
Artificial neural network
artificialneuralnetwork-200611082546.pptx
Neural Network
Ad

More from rameshwarchintamani (19)

PDF
Distributed System Lab_DS ASSIGNMENT NO 2.pdf
PDF
Unit_2_Part_1_Notes._COMMUNICATION AND CO-ORDINATIONpdf
PPT
Unit_2_Clock_Synchronization_logical_clock.ppt
PPT
Unit_2_Communication and Coordination_Elections.ppt
PDF
communicationsection123-150610092456-lva1-app6891.pdf
PPT
Unit_2_COMMUNICATION AND CO-ORDINATION.ppt
PDF
DSL_Assignment No 1_SOCKET AND RMI_CLIENT_SERVER.pdf
PPT
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
PPT
UNit 1 DS.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
PPT
Unit_I.ppt_introduction to distributed sy
PDF
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
PDF
Unit4_Clustering k means_Clustering in ML.pdf
PDF
ML_Unit_IV_Clustering in Machine Learning.pdf
PDF
DCS Unit-II COMMUNICATION AND COORDINATION.pdf
PPT
Chapter-5-DFS.ppt
PDF
Unit-5_2 PPT on Distributed Web based System.pdf
PPTX
Unit_4_Fault_Tolerance.pptx
PPT
Unit_2_Midddleware_2.ppt
PPT
Distributed System Lab_DS ASSIGNMENT NO 2.pdf
Unit_2_Part_1_Notes._COMMUNICATION AND CO-ORDINATIONpdf
Unit_2_Clock_Synchronization_logical_clock.ppt
Unit_2_Communication and Coordination_Elections.ppt
communicationsection123-150610092456-lva1-app6891.pdf
Unit_2_COMMUNICATION AND CO-ORDINATION.ppt
DSL_Assignment No 1_SOCKET AND RMI_CLIENT_SERVER.pdf
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
UNit 1 DS.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
Unit_I.ppt_introduction to distributed sy
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
Unit4_Clustering k means_Clustering in ML.pdf
ML_Unit_IV_Clustering in Machine Learning.pdf
DCS Unit-II COMMUNICATION AND COORDINATION.pdf
Chapter-5-DFS.ppt
Unit-5_2 PPT on Distributed Web based System.pdf
Unit_4_Fault_Tolerance.pptx
Unit_2_Midddleware_2.ppt
Ad

Recently uploaded (20)

PDF
UEFA_Embodied_Carbon_Emissions_Football_Infrastructure.pdf
PPTX
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
PDF
MLpara ingenieira CIVIL, meca Y AMBIENTAL
PDF
Computer System Architecture 3rd Edition-M Morris Mano.pdf
PPTX
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
PDF
August 2025 - Top 10 Read Articles in Network Security & Its Applications
PDF
August -2025_Top10 Read_Articles_ijait.pdf
PDF
distributed database system" (DDBS) is often used to refer to both the distri...
PDF
UEFA_Carbon_Footprint_Calculator_Methology_2.0.pdf
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PDF
Computer organization and architecuture Digital Notes....pdf
PPTX
Software Engineering and software moduleing
PPTX
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
PPTX
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PPTX
Petroleum Refining & Petrochemicals.pptx
PDF
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
PPTX
MAD Unit - 3 User Interface and Data Management (Diploma IT)
DOC
T Pandian CV Madurai pandi kokkaf illaya
PPTX
Chapter 2 -Technology and Enginerring Materials + Composites.pptx
PDF
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf
UEFA_Embodied_Carbon_Emissions_Football_Infrastructure.pdf
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
MLpara ingenieira CIVIL, meca Y AMBIENTAL
Computer System Architecture 3rd Edition-M Morris Mano.pdf
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
August 2025 - Top 10 Read Articles in Network Security & Its Applications
August -2025_Top10 Read_Articles_ijait.pdf
distributed database system" (DDBS) is often used to refer to both the distri...
UEFA_Carbon_Footprint_Calculator_Methology_2.0.pdf
Exploratory_Data_Analysis_Fundamentals.pdf
Computer organization and architecuture Digital Notes....pdf
Software Engineering and software moduleing
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
Petroleum Refining & Petrochemicals.pptx
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
MAD Unit - 3 User Interface and Data Management (Diploma IT)
T Pandian CV Madurai pandi kokkaf illaya
Chapter 2 -Technology and Enginerring Materials + Composites.pptx
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf

ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf

  • 1. Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423 603 Department of Information Technology Prepared by Dr.R.D.Chintamani Assistant Professor Department of Information Technology Department of Information Technology, SRES’s Sanjivani College of Engineering, Kopargaon- Machine Learning (IT312)
  • 2. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Unit-VI DEEP LEARNING Course Objectives : To understand the Deep Learning concept Course Outcome(CO6) : Understand the Deep learning,
  • 3. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON  Generative AI tools like ChatGPT and Midjournery are able to replicate (and often exceed) human-like performance on tasks like taking exams, generating text and making art.  Even to seasoned programmers, their abilities can seem magical. But, obviously, there is no magic.  These things are “just” artificial neural networks – circuits inspired by the architecture of biological brains.  In fact, much like real brains, when broken down to their building blocks, these systems can seem “impossibly simple” relative to what they achieve. Artificial Neural Network(ANN)
  • 4. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON  Artificial Neural Networks (ANN) are algorithms based on brain function and are used to model complicated patterns and forecast issues.  The Artificial Neural Network (ANN) is a deep learning method that arise from the concept of the human brain Biological Neural Networks.  The development of ANN was the result of an attempt to replicate the workings of the human brain. The workings of ANN are extremely similar to those of biological neural networks, although they are not identical. ANN algorithm accepts only numeric and structured data.  Convolutional Neural Networks (CNN) and Recursive Neural Networks (RNN) are used to accept unstructured and non-numeric data forms such as Image, Text, and Speech. Artificial Neural Network(ANN)
  • 5. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Artificial Neural Networks (ANN):  An Artificial Neural Network (ANN) is a computational model inspired by the human brain’s neural structure.  It consists of interconnected nodes (neurons) organized into layers.  Information flows through these nodes, and the network adjusts the connection strengths (weights) during training to learn from data, enabling it to recognize patterns, make predictions, and solve various tasks in machine learning and artificial intelligence. Artificial Neural Network(ANN)
  • 6. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Artificial Neural Networks Architecture Artificial Neural Network(ANN)
  • 7. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Artificial Neural Networks Architecture: Artificial Neural Network(ANN)
  • 8. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Types of Artificial Neural Networks: Five Types of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 9. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Types of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 10. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON How do Artificial Neural Networks learn? Artificial Neural Network(ANN)
  • 11. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON How do Artificial Neural Networks learn? Artificial Neural Network(ANN)
  • 12. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: ANNs have a wide range of applications because of their unique properties. A few of the important applications of ANNs include: 1. Image Processing and Character recognition:  ANNs play a significant part in picture and character recognition because of their capacity to take in many inputs, process them, and infer hidden and complicated, non-linear correlations.  Character recognition, such as handwriting recognition, has many applications in fraud detection (for example, bank fraud) and even national security assessments. Artificial Neural Network(ANN)
  • 13. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 1. Image Processing and Character recognition: Artificial Neural Network(ANN)
  • 14. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 1. Image Processing and Character recognition:  Image recognition is a rapidly evolving discipline with several applications ranging from social media facial identification to cancer detection in medicine to satellite image processing for agricultural and defence purposes.  Deep neural networks, which form the core of “deep learning,” have now opened up all of the new and transformative advances in computer vision, speech recognition, and natural language processing – notable examples being self-driving vehicles, Artificial Neural Network(ANN)
  • 15. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 2. Forecasting:  It is widely used in everyday company decisions (sales, the financial allocation between goods, and capacity utilization), economic and monetary policy, finance, and the stock market.  Forecasting issues are frequently complex; for example, predicting stock prices is complicated with many underlying variables (some known, some unseen).  Traditional forecasting models have flaws when it comes to accounting for these complicated, non-linear interactions. Artificial Neural Network(ANN)
  • 16. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 2. Forecasting:  Given its capacity to model and extract previously unknown characteristics and correlations, ANNs can provide a reliable alternative when used correctly. ANN also has no restrictions on the input and residual distributions, unlike conventional models. Artificial Neural Network(ANN)
  • 17. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Advantages of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 18. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Disadvantages of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 19. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Disadvantages of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 20. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON McCulloch-Pitts Neuron:  The McCulloch-Pitts neuron was a binary device, functioning in an all-or-nothing manner.  It had multiple inputs and one output; if the combined inputs exceeded a certain threshold, the neuron would ‘fire’, otherwise, it remained inactive.  This simple yet powerful concept mimicked the basic function of neurons in the brain,  MP neuron model is also known as linear threshold gate model, Artificial Neural Network(ANN)
  • 21. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON McCulloch-Pitts Neuron: Artificial Neural Network(ANN)
  • 22. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Perceptron:  A perceptron is the smallest element of a neural network.  Perceptron is a single-layer neural network linear or a Machine Learning algorithm used for supervised learning of various binary classifiers.  It works as an artificial neuron to perform computations by learning elements and processing them for detecting the business intelligence and capabilities of the input data.  A perceptron network is a group of simple logical statements that come together to create an array of complex logical statements, known as the neural network. Perceptron
  • 23. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Perceptron: Perceptron
  • 24. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Components of a Perceptron: Perceptron
  • 25. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why do we Need Weight and Bias?  Weight and bias are two important aspects of the perceptron model.  These are learnable parameters and as the network gets trained it adjusts both parameters to achieve the desired values and the correct output.  Weights are used to measure the importance of each feature in predicting output value.  Features with values close to zero are said to have lesser weight or significance. These have less importance in the prediction process compared to the features with values further from zero known as weights with a larger value. Perceptron
  • 26. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why do we Need Weight and Bias?  If the weight of a feature is positive then it has a direct relation with the target value, and if it is negative then it has an inverse relationship with the target value.  In contrast to weight in a neural network that increases the speed of triggering an activation function, bias delays the trigger of the activation function.  It acts like an intercept in a linear equation. Simply stated, Bias is a constant used to adjust the output and help the model to provide the best fit output for the given data. Perceptron
  • 27. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Perceptron Learning Rule:  According to the rule, perceptron can learn automatically to generate the desired results through optimal weight coefficients. Perceptron
  • 28. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON The anatomy of a Perceptron: Perceptron
  • 29. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why is perceptron used?  Perceptron is a linear classifier used for data classification into the two binary sections.  Facilitating the supervised learning of binary classifiers, the perceptron algorithm learns and processes elements in the training set one at a time.  It helps detect features from an input to derive business intelligence and classify the inputs as it enables machines to automatically learn coefficients of weight.  Perceptron is commonly used for basic operations like data compression, data visualization, high-quality complex image recognition, and encryption. Perceptron
  • 30. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Single Layer Perceptron Model:  A single-layer perceptron model is the simplest type of artificial neural network.  It includes a feed-forward network that can analyze only linearly separable objects while being dependent on a threshold transfer function. The model returns only binary outcomes(target) i.e. 1, and 0.  The algorithm in a single-layered perceptron model does not have any previous information initially. The weights are allocated inconsistently, so the algorithm simply adds up all the weighted inputs.  since the single-layer perceptron is a linear classifier and it does not classify cases if they are not linearly separable Perceptron
  • 31. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Single Layer Perceptron Model: Perceptron
  • 32. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Multilayer Perceptron Model  A multi-layer perceptron model uses the backpropagation algorithm. Though it has the same structure as that of a single-layer perceptron, it has one or more hidden layers. Perceptron
  • 33. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Activation Functions:  Activation functions are an integral building block of neural networks that enable them to learn complex patterns in data.  They transform the input signal of a node in a neural network into an output signal that is then passed on to the next layer. Without activation functions, neural networks would be restricted to modeling only linear relationships between inputs and outputs.  Activation functions introduce non-linearities, allowing neural networks to learn highly complex mappings between inputs and outputs. Activation Functions in ANN
  • 34. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Activation Functions:  An activation function determines the range of values of activation of an artificial neuron,  This is applied to the sum of the weighted input data of the neuron.  Without the application of an activation function, the only operations in computing the output of a multilayer perceptron would be the linear products between the weights and the input values.  A neural network that does not have an activation function in the hidden layer would not be able to mathematically realize such complex relationships, and would not be able to solve the tasks we are trying to solve with the network. Activation Functions in ANN
  • 35. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why Are Activation Functions Essential?  Without activation functions, neural networks would just consist of linear operations like matrix multiplication. All layers would perform linear transformations of the input, and no non-linearities would be introduced.  Most real-world data is non-linear. For example, relationships between house prices and size, income, and purchases, etc., are non-linear. If neural networks had no activation functions, they would fail to learn the complex non-linear patterns that exist in real-world data.  Activation functions enable neural networks to learn these non-linear relationships by introducing non-linear behaviour through activation functions. Activation Functions in ANN
  • 36. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 1. Sigmoid or Logistic Activation Function:  It is especially used for models where we have to predict the probability as an output. Since probability of anything exists only between the range of 0 and 1, sigmoid is the right choice.  The function is differentiable.That means, we can find the slope of the sigmoid curve at any two points.  The main purpose of the activation function is to maintain the output or predicted value in the particular range, which makes the good efficiency and accuracy of the model. Types of Activation Functions:
  • 37. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 1. Sigmoid or Logistic Activation Function: The mathematical definition of the sigmoid function is as follows: Types of Activation Functions:
  • 38. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 1. Sigmoid or Logistic Activation Function:  It takes a real-valued input and squashes it to a value between 0 and 1.  The sigmoid function has an "S"-shaped curve that asymptotes to 0 for large negative numbers and 1 for large positive numbers.  The outputs can be easily interpreted as probabilities, which makes it natural for binary classification problems.  The main use case of the sigmoid function is as the activation for the output layer of binary classification models Types of Activation Functions:
  • 39. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 2. Tanh or hyperbolic tangent Activation Function:  Tanh is also like logistic sigmoid but better. The range of the Tanh function is from (-1 to 1). Tanh is also sigmoidal (s – shaped).  The advantage of Tanh is that the negative inputs will be mapped strongly negative and the zero inputs will be mapped near zero in the tanh graph.  The tanh function is mainly used classification between two classes.  Both Tanh and logistic sigmoid activation functions are used in feed- forward nets. Types of Activation Functions:
  • 40. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 2. Tanh or hyperbolic tangent Activation Function: Types of Activation Functions:
  • 41. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 2. Tanh or hyperbolic tangent Activation Function:  Unlike the sigmoid function, tanh is zero-centered, which means that its output is symmetric around the origin of the coordinate system. This is often considered an advantage because it can help the learning algorithm converge faster.  The tanh function is frequently used in the hidden layers of a neural network. Because of its zero-centered nature, when the data is also normalized to have mean zero, it can result in more efficient training.  If one has to choose between the sigmoid and tanh the decision can also be influenced by the specific use case and the behavior of the network during initial training experiments. Types of Activation Functions:
  • 42. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 3. ReLU (Rectified Linear Unit) Activation Function  The ReLU is the most used activation function in the world right now.Since, it is used in almost all the convolutional neural networks or deep learning.  Even though ReLU is linear for half of its input space, it is technically a non- linear function because it has a non-differentiable point at x=0, where it abruptly changes from x. This non-linearity allows neural networks to learn complex patterns,  Since ReLU outputs zero for all negative inputs, it naturally leads to sparse activations; at any time, only a subset of neurons are activated, leading to more efficient computation. Types of Activation Functions:
 The ReLU function is defined mathematically as:
f(x) = max(0, x)
 The ReLU function is computationally inexpensive because it involves simple thresholding at zero. This allows networks to scale to many layers without a significant increase in computational burden, compared to more complex functions like tanh or sigmoid.
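A minimal NumPy sketch of ReLU (illustrative, not from the original slides):

import numpy as np

def relu(x):
    # simple thresholding at zero: negatives become 0, positives pass through
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, -1.0, 0.0, 2.0, 4.0])))  # [0. 0. 0. 2. 4.] -- sparse activations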
Loss Functions (Cost/Error Function):
 The loss function measures how effectively your algorithm models the given dataset: loss quantifies how far the model's predictions are from the expected results.
 It is a mathematical function of the parameters of the machine learning algorithm.
 In simple linear regression, the prediction is calculated using the slope (m) and intercept (b). The loss for one sample is the squared error (yᵢ − ŷᵢ)², so the loss function is a function of the slope and intercept. Regression loss functions like the MSE loss are commonly used to evaluate the performance of regression models.
 A loss function measures how good a neural network model is at performing a certain task, which in most cases is regression or classification.
 We must minimize the value of the loss function during the back-propagation step in order to make the neural network better.
 The cross-entropy loss function is used in classification tasks, where we want the neural network to predict probabilities.
 For regression tasks, where we want the network to predict continuous numbers, the mean squared error loss function is used.
 Loss functions play a pivotal role in machine learning algorithms, acting as objective measures of the disparity between predicted and actual values.
 They serve as the basis for model training, guiding algorithms to adjust model parameters in a direction that minimizes the loss and improves predictive accuracy.
 In machine learning, loss functions quantify the extent of error between predicted and actual outcomes. They provide a means to evaluate the performance of a model on a given dataset and are instrumental in optimizing model parameters during training.
Loss Functions in Deep Learning:
1. Mean Squared Error / Squared Loss / L2 Loss:
 The Mean Squared Error (MSE) is a straightforward and widely used loss function.
 To calculate the MSE, you take the difference between the actual value and the model prediction, square it, and then average it across the entire dataset.
 The MSE is defined mathematically as:
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
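A minimal NumPy sketch of MSE (the sample values are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # average of the squared differences across the dataset
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))  # 0.25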
2. Mean Absolute Error / L1 Loss:
 The Mean Absolute Error (MAE) is another simple loss function. It calculates the average absolute difference between the actual value and the model prediction across the dataset.
 The MAE is defined mathematically as:
MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
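And the corresponding NumPy sketch of MAE, on the same illustrative values:

import numpy as np

def mae(y_true, y_pred):
    # average of the absolute differences across the dataset
    return np.mean(np.abs(y_true - y_pred))

print(mae(np.array([3.0, 5.0]), np.array([2.5, 5.5])))  # 0.5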
3. Binary Cross-Entropy / Log Loss:
 It is used in binary (two-class) classification problems, for example predicting whether a person has COVID or whether an article becomes popular.
 Binary cross-entropy compares each predicted probability to the actual class output, which can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance from the expected value, i.e., how close to or far from the actual value they are.
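A minimal NumPy sketch of binary cross-entropy (the clipping constant eps is an implementation detail to avoid log(0)):

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # the penalty grows the further a prediction drifts from the true class
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(np.array([1.0]), np.array([0.9])))  # ~0.105 (confident, correct)
print(binary_cross_entropy(np.array([1.0]), np.array([0.1])))  # ~2.303 (confident, wrong)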
Convolutional Neural Network (CNN):
 A Convolutional Neural Network, also known as CNN or ConvNet, is a class of neural networks that specializes in processing data with a grid-like topology, such as an image.
 A digital image is a binary representation of visual data. It contains a series of pixels arranged in a grid-like fashion, with pixel values denoting how bright and what colour each pixel should be.
 A CNN takes an input image, applies filters, flattens the resulting feature maps, and "votes" to classify the image.
Building Blocks of a Convolutional Neural Network (CNN):
Input Image:
 First of all, the input image is broken down into pixels.
 If it is a black-and-white image, it has a single channel, and the pixels are interpreted as a 2D array with values from 0 to 255. If it is a coloured image, it has 3 channels (red, green, blue) and is interpreted as a 3D array.
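For example (the 28×28 shape is an illustrative assumption), in NumPy:

import numpy as np

gray = np.zeros((28, 28), dtype=np.uint8)      # grayscale: 2D array, values 0-255
color = np.zeros((28, 28, 3), dtype=np.uint8)  # colour: 3D array (height, width, RGB)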
Convolution Layer:
 This is the first layer, which filters the input image. Its purpose is to extract features from the image: it captures colour, edges, gradient orientation, and other features so the image can be differentiated.
Convolution Layer: a convolutional layer within a neural network has the following attributes:
 Convolutional kernels defined by a width and height (hyper-parameters).
 The number of input channels and output channels (hyper-parameters).
 The depth of the convolution filter (the input channels) must be equal to the number of channels (depth) of the input feature map.
There are two common padding choices for this layer (see the worked example below):
 Same Padding: the input is padded so that the output feature maps have the same spatial size as the input feature maps.
 Valid Padding: no padding is applied; the kernel is placed only at positions where it fully overlaps the input, so the output feature maps are smaller than the input.
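As a quick worked example (not from the original slides): for an n×n input, a k×k kernel, padding p, and stride s, the output size is ⌊(n + 2p − k)/s⌋ + 1. With a 5×5 input and a 3×3 kernel at stride 1, valid padding (p = 0) gives a 3×3 output, while same padding (p = 1) gives a 5×5 output, the same size as the input.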
Pooling Layer:
 This layer is usually added after the convolutional layer.
 The pooling layer reduces the spatial size of the output from the convolutional layer and extracts dominant features. Pooling layers come in two main types:
 Max Pooling: returns the maximum value from the portion of the image covered by the kernel. It discards noisy activations and helps reduce over-fitting by providing an abstracted form of the representation.
 Average Pooling: returns the average value from the portion of the image covered by the kernel.
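A small sketch of 2×2 max pooling on a 4×4 feature map (the values are illustrative; the reshape trick assumes the map divides evenly into blocks):

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 5, 6]])
blocks = fmap.reshape(2, 2, 2, 2)  # split into four 2x2 blocks
print(blocks.max(axis=(1, 3)))     # [[6 4] [7 9]] -- the dominant feature per block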
Fully Connected Input Layer (Flatten):
 Fully connected layers are layers where all the inputs from one layer are connected to every activation unit of the next layer. The flatten step takes the output of the pooling layer and flattens it into a single vector.
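Putting the building blocks together, a minimal Keras sketch (not from the slides; the 28×28×1 input, filter count, and 10 output classes are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                               # grayscale input image
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # convolution: extract features
    layers.MaxPooling2D((2, 2)),                                   # pooling: keep dominant features
    layers.Flatten(),                                              # flatten into a single vector
    layers.Dense(10, activation='softmax'),                        # fully connected "voting" layer
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])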
Autoencoder:
 Autoencoders have emerged as one of the technologies and techniques that enable computer systems to solve data compression problems more efficiently.
 An autoencoder is a type of artificial neural network used to learn data encodings in an unsupervised manner.
 The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input.
The architecture of autoencoders:
1. Encoder: a module that compresses the train-validate-test set input data into an encoded representation, typically several orders of magnitude smaller than the input data.
2. Bottleneck: a module that contains the compressed knowledge representation and is therefore the most important part of the network.
3. Decoder: a module that helps the network "decompress" the knowledge representation, reconstructing the data from its encoded form. The output is then compared with the ground truth.
How to train autoencoders?
 Four hyperparameters need to be set before training: the code (bottleneck) size, the number of layers, the number of nodes per layer, and the loss function (e.g., MSE or binary cross-entropy).
 The network is then trained with back-propagation, using the input itself as the target, to minimize the reconstruction loss between the input and the output.
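A minimal Keras sketch of such training (not from the slides; the 784-dim input, 128-unit hidden layers, and 32-dim bottleneck are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                         # e.g., a flattened 28x28 image
encoded = layers.Dense(128, activation='relu')(inputs)      # encoder
bottleneck = layers.Dense(32, activation='relu')(encoded)   # bottleneck (the code)
decoded = layers.Dense(128, activation='relu')(bottleneck)  # decoder
outputs = layers.Dense(784, activation='sigmoid')(decoded)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# The input is also the target -- the network learns to reconstruct it:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)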
Applications of autoencoders:
1. Dimensionality reduction
 Undercomplete autoencoders are those used for dimensionality reduction.
 They can be used as a pre-processing step for dimensionality reduction, as they perform fast and accurate reductions without losing much information.
 Furthermore, while procedures like PCA can only perform linear dimensionality reduction, undercomplete autoencoders can perform large-scale non-linear dimensionality reduction (see the encoder sketch below).
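For instance, reusing the encoder half of the autoencoder sketch above (the names inputs, bottleneck, and models come from that illustrative sketch) yields a non-linear dimensionality reducer:

encoder = models.Model(inputs, bottleneck)  # 784 dims -> 32 dims (illustrative)
# codes = encoder.predict(x_data)           # low-dimensional representations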
2. Image denoising
 Autoencoders like the denoising autoencoder can be used for efficient and highly accurate image denoising.
 Unlike traditional denoising methods, autoencoders do not search for noise; they extract the image from the noisy data fed to them by learning a representation of it. The representation is then decompressed to form a noise-free image.
 Denoising autoencoders can thus denoise complex images that cannot be denoised via traditional methods.
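In practice (a hedged sketch continuing the example above, where autoencoder and x_train are the illustrative names from that sketch), a denoising autoencoder is trained on corrupted inputs with the clean data as the target:

import numpy as np

x_noisy = x_train + 0.1 * np.random.normal(size=x_train.shape)  # add Gaussian noise
x_noisy = np.clip(x_noisy, 0.0, 1.0)                            # keep a valid pixel range
# autoencoder.fit(x_noisy, x_train, epochs=10, batch_size=256)  # learn to denoise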
3. Anomaly detection
 Undercomplete autoencoders can also be used for anomaly detection.
 For example, consider an autoencoder trained on a specific dataset P. For any image sampled from the training distribution, the autoencoder gives a low reconstruction loss and reconstructs the image essentially as-is.
 For an image not represented in the training dataset, however, the autoencoder cannot reconstruct it well, because the latent attributes are not adapted to an image the network has never seen; the resulting high reconstruction loss signals an anomaly.
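A hedged sketch of this idea (assuming a trained autoencoder such as the one above; the threshold heuristic is an assumption, not from the slides):

import numpy as np

def detect_anomalies(autoencoder, x, threshold):
    reconstructions = autoencoder.predict(x)
    errors = np.mean((x - reconstructions) ** 2, axis=1)  # per-sample reconstruction MSE
    return errors > threshold                             # True = likely anomaly

# A common heuristic: set threshold = mean + 3 * std of the
# reconstruction errors measured on normal training data.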
Long Short-Term Memory Networks (LSTM):
 LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture widely used in deep learning. It excels at capturing long-term dependencies, making it ideal for sequence prediction tasks.
 Unlike traditional feed-forward neural networks, LSTM incorporates feedback connections, allowing it to process entire sequences of data, not just individual data points. This makes it highly effective at understanding and predicting patterns in sequential data like time series, text, and speech.
 LSTM has become a powerful tool in artificial intelligence and deep learning, enabling breakthroughs in various fields by uncovering valuable insights from sequential data.
LSTM Architecture:
 The first part chooses whether the information coming from the previous timestamp is to be remembered or is irrelevant and can be forgotten.
 In the second part, the cell tries to learn new information from the input to this cell.
 In the third part, the cell passes the updated information from the current timestamp to the next timestamp. One such cycle of the LSTM is considered a single time step.
 These three parts of an LSTM unit are known as gates.
 The first gate is called the Forget gate, the second is the Input gate, and the last one is the Output gate.
 An LSTM unit consisting of these three gates and a memory cell can be thought of like a layer of neurons in a traditional feed-forward network, with each neuron maintaining a hidden state and a cell state.
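For reference, the standard LSTM gate equations (not shown on the original slide), where σ is the sigmoid function, ⊙ is element-wise multiplication, x_t is the input, h_t the hidden state, and C_t the cell state:

f_t = σ(W_f · [h_(t−1), x_t] + b_f)       (forget gate)
i_t = σ(W_i · [h_(t−1), x_t] + b_i)       (input gate)
C̃_t = tanh(W_C · [h_(t−1), x_t] + b_C)    (candidate cell state)
C_t = f_t ⊙ C_(t−1) + i_t ⊙ C̃_t          (cell state update)
o_t = σ(W_o · [h_(t−1), x_t] + b_o)       (output gate)
h_t = o_t ⊙ tanh(C_t)                     (hidden state)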
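As a usage illustration, a minimal Keras sketch for sequence classification (the 50-timestep, 8-feature input shape and 64 units are illustrative assumptions, not from the slides):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(50, 8)),            # sequences of 50 timesteps, 8 features each
    layers.LSTM(64),                        # 64 LSTM units; returns the final hidden state
    layers.Dense(1, activation='sigmoid'),  # binary prediction from the whole sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy')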
Recent Trends in Deep Learning
1. Hybrid Model Integration
 Applications now provide mechanisms for integrating hybrid models built from data sources such as census records, weather, and social media into decision-support tools. This also enables the creation of new nested domains for location data, which can then become part of decision-support systems.
 Results suggest that incorporating deep learning networks into hybrid models can lead to better decisions concerning hazards and performance measures such as growth and employment.
 Hybrid models combine the benefits of symbolic AI, a top-down approach to artificial intelligence, with those of deep learning.
2. The Vision Transformer
 Commonly referred to as ViT, this image classification model was developed by researchers at Google. It applies the transformer architecture to images by splitting them into patches, and is used in tasks such as sentiment analysis, object recognition, and image captioning.
 ViT consists of an input layer, a middle layer, and an output layer. The input layer contains training images labeled with one of several possible sentiments (cheerful, negative, neutral, uncertain, sad, happy, angry). The middle layer detects the types of objects in the image.
 The output layer returns a confidence score based on what the middle and input layers detected.
3. Self-Supervised Learning
 This deep, self-supervised learning approach helps in automation.
 Rather than depending on labeled data to train a system, it learns to categorize the raw data automatically.
 Each input component can be used to predict any other part of the input. It might, for example, forecast the future based on historical records.
 In a self-supervised learning system, the supervisory labels are generated automatically from the raw data itself, rather than being supplied by human annotators.
 The output can also be scored with a label that reflects the overall quality of the prediction made by the system.
4. Neuroscience-Based Deep Learning
 Neuroscience-based deep learning is a type of ML that uses data from neuroscience experiments to train artificial neural networks. It allows researchers to develop models that better capture how the brain works.
 Artificial neural networks constructed on computers are comparable to those found in human brains.
 As a result, scientists and researchers have uncovered thousands of neurological remedies and ideas. With the deployment of progressively more robust, comprehensive, and advanced deep learning implementations and solutions, adaptability has improved significantly.
5. High-Performance NLP Models
 Machine-learning-based NLP is still in its early stages: there is presently no method that allows NLP systems to recognize the meanings of words in all contexts and respond appropriately.
 One approach to this problem is to build a model that can recognize patterns in large amounts of text (e.g., millions of documents).
 This is where machine learning algorithms come in, as they can automatically learn from data and train models to make predictions.
6. Generative Adversarial Networks (GANs)
 Generative Adversarial Networks (GANs) are a type of neural network that generates new, realistic data based on existing data. GANs work by pitting two neural networks against each other: one network generates fake data, while the other tries to detect whether the data is real or fake.
 This approach enables GANs to generate new data that is indistinguishable from real data, and it has a wide range of applications, including image and video generation, music synthesis, and natural language processing.
 One advantage of GANs is their ability to generate a diverse range of outputs, which can be used to train machine learning models that are more robust and accurate.
7. Reinforcement Learning
 Reinforcement learning is a machine learning approach in which an agent learns how to behave in an environment by performing actions and receiving feedback in the form of rewards or penalties. The agent learns from its actions and adjusts its behavior to maximize the cumulative reward over time.
 Reinforcement learning has been used to develop game-playing agents that have beaten human champions in games such as Go and chess. The technique has also been applied to robotics, where agents learn how to manipulate objects or navigate through environments.