Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
Department of Information Technology
Prepared by
Dr. R. D. Chintamani
Assistant Professor
Department of Information Technology
Machine Learning
(IT312)
Unit-VI
DEEP LEARNING
Course Objectives: To understand the concept of Deep Learning
Course Outcome (CO6): Understand Deep Learning
 Generative AI tools like ChatGPT and Midjourney are able to replicate (and often exceed) human-like performance on tasks like taking exams, generating text, and making art.
 Even to seasoned programmers, their abilities can seem magical. But,
obviously, there is no magic.
 These things are “just” artificial neural networks – circuits inspired by the
architecture of biological brains.
 In fact, much like real brains, when broken down to their building blocks,
these systems can seem “impossibly simple” relative to what they achieve.
Artificial Neural Network (ANN)
 Artificial Neural Networks (ANN) are algorithms based on brain function, used to model complicated patterns and solve forecasting problems.
 The Artificial Neural Network (ANN) is a deep learning method that arises from the concept of biological neural networks in the human brain.
 The development of ANN was the result of an attempt to replicate the workings of the human brain. The workings of ANN are extremely similar to those of biological neural networks, although they are not identical. The ANN algorithm accepts only numeric and structured data.
 Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used to accept unstructured and non-numeric data forms such as image, text, and speech.
 An Artificial Neural Network (ANN) is a computational model inspired by
the human brain’s neural structure.
 It consists of interconnected nodes (neurons) organized into layers.
 Information flows through these nodes, and the network adjusts the
connection strengths (weights) during training to learn from data, enabling it
to recognize patterns, make predictions, and solve various tasks in machine
learning and artificial intelligence.
Artificial Neural Networks Architecture: (figure)
Types of Artificial Neural Networks: (figure showing five types)
How do Artificial Neural Networks learn? (figure)
Application of Artificial Neural Networks:
ANNs have a wide range of applications because of their unique properties. A
few of the important applications of ANNs include:
1. Image Processing and Character Recognition:
 ANNs play a significant part in image and character recognition because of their capacity to take in many inputs, process them, and infer hidden, complicated, non-linear correlations.
 Character recognition, such as handwriting recognition, has many
applications in fraud detection (for example, bank fraud) and even national
security assessments.
 Image recognition is a rapidly evolving discipline with several applications
ranging from social media facial identification to cancer detection in medicine
to satellite image processing for agricultural and defence purposes.
 Deep neural networks, which form the core of “deep learning,” have opened up new and transformative advances in computer vision, speech recognition, and natural language processing – a notable example being self-driving vehicles.
2. Forecasting:
 Forecasting is widely used in everyday business decisions (sales, the financial allocation between goods, and capacity utilization), economic and monetary policy, finance, and the stock market.
 Forecasting issues are frequently complex; for example, predicting stock
prices is complicated with many underlying variables (some known, some
unseen).
 Traditional forecasting models have flaws when it comes to accounting for
these complicated, non-linear interactions.
 Given their capacity to model and extract previously unknown characteristics and correlations, ANNs can provide a reliable alternative when used correctly. Unlike conventional models, ANNs also impose no restrictions on the input and residual distributions.
Advantages of Artificial Neural Networks: (figure)
Disadvantages of Artificial Neural Networks: (figure)
McCulloch-Pitts Neuron:
 The McCulloch-Pitts neuron was a binary device, functioning in an
all-or-nothing manner.
 It had multiple inputs and one output; if the combined inputs
exceeded a certain threshold, the neuron would ‘fire’, otherwise, it
remained inactive.
 This simple yet powerful concept mimicked the basic function of neurons in the brain.
 The MP neuron model is also known as the linear threshold gate model.
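As a small illustration (our sketch, not from the slides), a McCulloch-Pitts unit in Python; the threshold value and the AND-gate example are our own choices:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts unit: fires (outputs 1) only if the sum of its
    binary inputs reaches the threshold; otherwise it stays inactive (0)."""
    return 1 if sum(inputs) >= threshold else 0

# With two inputs and threshold 2, the unit computes logical AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], threshold=2))
```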
Perceptron:
 A perceptron is the smallest element of a neural network.
 Perceptron is a single-layer linear neural network, or a machine learning algorithm used for supervised learning of various binary classifiers.
 It works as an artificial neuron, learning weights for the input elements and processing them to extract information from the input data.
 A perceptron network is a group of simple logical statements that come
together to create an array of complex logical statements, known as the
neural network.
Components of a Perceptron: (figure)
Why do we Need Weight and Bias?
 Weight and bias are two important aspects of the perceptron model.
 These are learnable parameters and as the network gets trained it adjusts both
parameters to achieve the desired values and the correct output.
 Weights are used to measure the importance of each feature in predicting
output value.
 Features whose weights are close to zero are said to have lesser significance; they have less importance in the prediction process compared to features whose weights are further from zero, i.e., weights with a larger value.
 If the weight of a feature is positive then it has a direct relation with the
target value, and if it is negative then it has an inverse relationship with the
target value.
 In contrast to the weights, which determine how strongly an input drives the activation function, the bias shifts (delays or advances) the point at which the activation function triggers.
 It acts like an intercept in a linear equation. Simply stated, bias is a constant used to adjust the output and help the model provide the best fit for the given data.
Perceptron Learning Rule:
 According to this rule, the perceptron learns automatically, adjusting its weight coefficients to generate the desired results. For a training example with inputs xᵢ, target t, and output o, each weight is updated as wᵢ ← wᵢ + η(t − o)xᵢ, where η is the learning rate.
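A minimal, hedged Python sketch of this rule (our example; the learning rate, epoch count, and AND-gate data are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - output) * x.
    The bias is folded in as an extra input fixed at 1."""
    X = np.hstack([X, np.ones((len(X), 1))])  # append bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            output = 1 if w @ x >= 0 else 0    # threshold activation
            w += lr * (target - output) * x    # update only when wrong
    return w

# Toy example: learn logical AND, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
print(train_perceptron(X, t))  # learned weights and bias
```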
The anatomy of a Perceptron: (figure)
Why is perceptron used?
 Perceptron is a linear classifier used to classify data into two binary classes.
 Facilitating the supervised learning of binary classifiers, the perceptron algorithm learns and processes the elements of the training set one at a time.
 It helps detect features in the input and classify it, since it enables machines to learn the weight coefficients automatically.
 Perceptrons are commonly used as building blocks for operations like data compression, data visualization, image recognition, and encryption.
Single Layer Perceptron Model:
 A single-layer perceptron model is the simplest type of artificial neural
network.
 It includes a feed-forward network that can analyze only linearly separable objects, and it depends on a threshold transfer function. The model returns only binary outcomes (targets), i.e., 1 and 0.
 The algorithm in a single-layered perceptron model has no prior information initially; the weights start from arbitrary values, so the algorithm simply adds up all the weighted inputs.
 Since the single-layer perceptron is a linear classifier, it cannot classify cases that are not linearly separable.
Multilayer Perceptron Model
 A multi-layer perceptron model uses the backpropagation algorithm. Though
it has the same structure as that of a single-layer perceptron, it has one or
more hidden layers.
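To make the contrast with the single-layer model concrete, a hedged sketch using scikit-learn's MLPClassifier (our library choice, not the slides'; assumes scikit-learn is installed). One small hidden layer is enough to learn XOR, which no single-layer perceptron can separate:

```python
from sklearn.neural_network import MLPClassifier

# XOR is not linearly separable, so a single-layer perceptron fails on it;
# an MLP with one hidden layer can learn it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

clf = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", random_state=1)
clf.fit(X, y)
print(clf.predict(X))  # should recover XOR, i.e. [0 1 1 0] (may vary with seed)
```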
Activation Functions:
 Activation functions are an integral building block of neural networks that
enable them to learn complex patterns in data.
 They transform the input signal of a node in a neural network into an output
signal that is then passed on to the next layer. Without activation functions,
neural networks would be restricted to modeling only linear relationships
between inputs and outputs.
 Activation functions introduce non-linearities, allowing neural networks to
learn highly complex mappings between inputs and outputs.
 An activation function determines the range of values of activation of an artificial neuron. It is applied to the sum of the weighted input data of the neuron.
 Without an activation function, the only operations in computing the output of a multilayer perceptron would be linear combinations of the weights and the input values.
 A neural network that does not have an activation function in the hidden layer
would not be able to mathematically realize such complex relationships, and
would not be able to solve the tasks we are trying to solve with the network.
Why Are Activation Functions Essential?
 Without activation functions, neural networks would just consist of linear
operations like matrix multiplication. All layers would perform linear
transformations of the input, and no non-linearities would be introduced.
 Most real-world data is non-linear. For example, relationships between house
prices and size, income, and purchases, etc., are non-linear. If neural
networks had no activation functions, they would fail to learn the complex
non-linear patterns that exist in real-world data.
 Activation functions enable neural networks to learn these non-linear
relationships by introducing non-linear behaviour through activation
functions.
Types of Activation Functions:
1. Sigmoid or Logistic Activation Function:
 It is especially used for models where we have to predict the probability as
an output. Since probability of anything exists only between the range of 0
and 1, sigmoid is the right choice.
 The function is differentiable; that means we can find the slope of the sigmoid curve at any point.
 The main purpose of the activation function is to keep the output or predicted value in a particular range, which improves the efficiency and accuracy of the model.
The mathematical definition of the sigmoid function is:
σ(x) = 1 / (1 + e^(−x))
 It takes a real-valued input and squashes it to a value between 0 and 1.
 The sigmoid function has an "S"-shaped curve that asymptotes to 0 for large
negative numbers and 1 for large positive numbers.
 The outputs can be easily interpreted as probabilities, which makes it natural
for binary classification problems.
 The main use case of the sigmoid function is as the activation for the output layer of binary classification models.
Types of Activation Functions:
DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON
2. Tanh or hyperbolic tangent Activation Function:
 Tanh is similar to the logistic sigmoid, but often works better. The range of the tanh function is (−1, 1). Tanh is also sigmoidal (S-shaped).
 The advantage of tanh is that negative inputs are mapped strongly negative and zero inputs are mapped near zero in the tanh graph.
 The tanh function is mainly used for classification between two classes.
 Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
The mathematical definition of the tanh function is:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
 Unlike the sigmoid function, tanh is zero-centered, which means that its
output is symmetric around the origin of the coordinate system. This is often
considered an advantage because it can help the learning algorithm converge
faster.
 The tanh function is frequently used in the hidden layers of a neural network.
Because of its zero-centered nature, when the data is also normalized to have
mean zero, it can result in more efficient training.
 If one has to choose between sigmoid and tanh, the decision can also be influenced by the specific use case and the behavior of the network during initial training experiments.
3. ReLU (Rectified Linear Unit) Activation Function
 The ReLU is the most widely used activation function right now, since it is used in almost all convolutional neural networks and deep learning models.
 Even though ReLU is linear for half of its input space, it is technically a non-linear function because it has a non-differentiable point at x = 0, where its behavior changes abruptly. This non-linearity allows neural networks to learn complex patterns.
 Since ReLU outputs zero for all negative inputs, it naturally leads to sparse
activations; at any time, only a subset of neurons are activated, leading to
more efficient computation.
The ReLU function is defined as:
ReLU(x) = max(0, x)
 The ReLU function is computationally inexpensive because it involves simple
thresholding at zero. This allows networks to scale to many layers without a
significant increase in computational burden, compared to more complex
functions like tanh or sigmoid.
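An illustrative NumPy sketch of the three activation functions above (ours, not from the slides):

```python
import numpy as np

def sigmoid(x):
    """Squashes any real input into (0, 1); natural for probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Zero-centered squashing into (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: 0 for negative inputs, identity for positive."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # values in (0, 1)
print(tanh(x))     # values in (-1, 1), symmetric around 0
print(relu(x))     # negative inputs clipped to 0
```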
Loss Functions (Cost/Error Function):
Loss Function:
 The loss function helps determine how effectively your algorithm models the given dataset. Loss is a measure of how far your model's predictions are from the expected results.
 It is a mathematical function of the parameters of the machine learning algorithm.
 In simple linear regression, the prediction is calculated using the slope (m) and intercept (b). The loss for a point is (Yᵢ − Ŷᵢ)², i.e., the loss function is a function of the slope and intercept. Regression loss functions like the MSE loss are commonly used to evaluate the performance of regression models.
 A loss function measures how good a neural network model is in performing a
certain task, which in most cases is regression or classification.
 We must minimize the value of the loss function during the back-propagation
step in order to make the neural network better.
 We use the cross-entropy loss function in classification tasks, when we want the neural network to predict probabilities.
 For regression tasks, when we want the network to predict continuous numbers, we use the mean squared error loss function.
 Loss functions play a pivotal role in machine learning algorithms, acting as
objective measures of the disparity between predicted and actual values.
 They serve as the basis for model training, guiding algorithms to adjust model
parameters in a direction that minimizes the loss and improves predictive
accuracy.
 In machine learning, loss functions quantify the extent of error between
predicted and actual outcomes. They provide a means to evaluate the
performance of a model on a given dataset and are instrumental in optimizing
model parameters during the training process.
Loss Functions in Deep Learning
1. Mean Squared Error / Squared Loss / L2 Loss:
 The Mean Squared Error (MSE) is a straightforward and widely used loss
function.
 To calculate the MSE, you take the difference between the actual value and the
model prediction, square it, and then average it across the entire dataset.
The MSE is computed as:
MSE = (1/n) · Σ (Yᵢ − Ŷᵢ)²
2. Mean Absolute Error / L1 Loss:
 The Mean Absolute Error (MAE) is another simple loss function. It calculates
the average absolute difference between the actual value and the model
prediction across the dataset.
The MAE is computed as:
MAE = (1/n) · Σ |Yᵢ − Ŷᵢ|
3. Binary Cross-Entropy / Log Loss:
 It is used in binary classification problems, i.e., problems with two classes: for example, whether a person has COVID or not, or whether an article becomes popular or not.
 Binary cross-entropy compares each predicted probability to the actual class output, which can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance from the expected value, that is, how close or far they are from the actual value. It is computed as BCE = −(1/n) · Σ [Yᵢ · log(Pᵢ) + (1 − Yᵢ) · log(1 − Pᵢ)].
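A hedged NumPy sketch of the three loss functions (our example values; the eps clipping is a standard numerical-stability trick, not from the slides):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average squared difference."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute difference."""
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """BCE = -(1/n) * sum(y*log(p) + (1-y)*log(1-p)).
    Probabilities are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 1.0, 1.0])   # actual classes
p = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities
print(mse(y, p), mae(y, p), binary_cross_entropy(y, p))
```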
Convolutional Neural Network (CNN)
 A Convolutional Neural Network, also known as CNN or ConvNet, is a class
of neural networks that specializes in processing data that has a grid-like
topology, such as an image.
 A digital image is a binary representation of visual data. It contains pixels arranged in a grid-like fashion, whose values denote how bright and what colour each pixel should be.
 A CNN takes an input image, applies filters, flattens the result, and “votes” to classify the image.
Building Blocks of a CNN:
Input Image:
 First of all, the input image will be broken down into pixels.
 If it is a black-and-white image, it will have only one channel, and the pixels will be interpreted as a 2D array with values from 0 to 255. If it is a coloured image, it will have 3 channels (red, green, blue) and will be interpreted as a 3D array.
Convolution Layer
 This is the first layer, which filters the input images. Its purpose is to extract features from the image: it captures colour, edges, gradient orientation, and other features so that images can be differentiated.
Convolution Layer: A convolutional layer within a neural network should
have the following attributes:
 Convolutional kernels defined by a width and height (hyper-parameters).
 The number of input channels and output channels (hyper-parameters).
 The depth of the convolution filter (the input channels) must be equal to the number of channels (depth) of the input feature map.
There are two types of results from this layer:
 Same Padding: the output feature maps have the same size as the input feature maps; the input is padded so that size is preserved.
 Valid Padding: no padding is applied, so the output feature maps are smaller than the input feature maps.
Pooling Layer:
 This layer is usually added after the convolutional layer.
 The pooling layer reduces the spatial size of the output from the convolutional layer and extracts dominant features. Pooling layers can be differentiated into two types:
 Max Pooling: returns the maximum value from the portion of the image covered by the kernel. It discards noisy activations and helps reduce over-fitting by providing an abstracted form of the representation.
 Average Pooling: returns the average value from the portion of the image covered by the kernel.
Fully Connected Input Layer (Flatten):
 Fully connected layers are layers where all the inputs from one layer are connected to every activation unit of the next layer. This layer takes the output of the pooling layer and flattens it into a single vector.
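Putting the building blocks together, a hedged sketch using the Keras API (the layer sizes, input shape, and class count are illustrative assumptions, not from the slides):

```python
from tensorflow.keras import layers, models

# Minimal CNN: convolution -> pooling -> flatten -> fully connected "vote".
model = models.Sequential([
    layers.Conv2D(16, (3, 3), padding="same", activation="relu",
                  input_shape=(28, 28, 1)),     # one channel: grayscale image
    layers.MaxPooling2D((2, 2)),                # spatial down-sampling
    layers.Flatten(),                           # single vector for dense layer
    layers.Dense(10, activation="softmax"),     # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```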
Autoencoder
 Autoencoders have emerged as one of the technologies and techniques that
enable computer systems to solve data compression problems more
efficiently.
 An autoencoder is a type of artificial neural network used to learn data
encodings in an unsupervised manner.
 The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
The architecture of Autoencoders:
1. Encoder: A module that compresses the train-validate-test set input data into
an encoded representation that is typically several orders of magnitude smaller
than the input data.
2. Bottleneck: A module that contains the compressed knowledge representations
and is therefore the most important part of the network.
3. Decoder: A module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form. The output is then compared with the ground truth.
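A hedged Keras sketch of this encoder–bottleneck–decoder layout (the dimensions are illustrative assumptions, e.g. flattened 28×28 images):

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                         # flattened image
encoded = layers.Dense(32, activation="relu")(inputs)       # encoder -> bottleneck
decoded = layers.Dense(784, activation="sigmoid")(encoded)  # decoder reconstructs

autoencoder = models.Model(inputs, decoded)
# The reconstruction is compared with the ground-truth input,
# so the same data serves as both input and target:
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```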
How to train autoencoders? (figure)
Applications of autoencoders
1. Dimensionality reduction
 Undercomplete autoencoders are those that are used for dimensionality
reduction.
 These can be used as a pre-processing step for dimensionality reduction as they
can perform fast and accurate dimensionality reductions without losing much
information.
 Furthermore, while dimensionality reduction procedures like PCA can only
perform linear dimensionality reductions, undercomplete autoencoders can
perform large-scale non-linear dimensionality reductions.
2. Image denoising
 Autoencoders like the denoising autoencoder can be used for performing
efficient and highly accurate image denoising.
 Unlike traditional methods of denoising, autoencoders do not search for noise; they extract the image from the noisy data fed to them by learning a representation of it. The representation is then decompressed to form a noise-free image.
 Denoising autoencoders can thus denoise complex images that cannot be denoised via traditional methods.
3. Anomaly detection
 Undercomplete autoencoders can also be used for anomaly detection.
 For example, consider an autoencoder that has been trained on a specific dataset P. For any image sampled from the training dataset, the autoencoder is bound to give a low reconstruction loss and is supposed to reconstruct the image as is.
 For any image that is not present in the training dataset, however, the autoencoder cannot perform the reconstruction well, as the latent attributes are not adapted to an image the network has never seen; the resulting high reconstruction loss flags the image as an anomaly.
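A short, hedged sketch of this idea (assuming a trained Keras-style autoencoder with a predict method, such as the one sketched above; the threshold choice is our assumption):

```python
import numpy as np

def flag_anomalies(autoencoder, x, threshold):
    """Flag samples whose reconstruction error exceeds a threshold.
    High error means the latent attributes were not adapted to the sample."""
    reconstruction = autoencoder.predict(x)
    errors = np.mean((x - reconstruction) ** 2, axis=1)  # per-sample MSE
    return errors > threshold                            # True = anomaly

# The threshold can be set from errors on held-out normal data, e.g.:
# threshold = np.percentile(normal_errors, 99)
```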
Long Short-Term Memory Networks (LSTM)
 LSTM (Long Short-Term Memory) is a recurrent neural network (RNN)
architecture widely used in Deep Learning. It excels at capturing long-term
dependencies, making it ideal for sequence prediction tasks.
 Unlike traditional neural networks, LSTM incorporates feedback connections,
allowing it to process entire sequences of data, not just individual data points.
This makes it highly effective in understanding and predicting patterns in
sequential data like time series, text, and speech.
 LSTM has become a powerful tool in artificial intelligence and deep learning,
enabling breakthroughs in various fields by uncovering valuable insights from
sequential data.
LSTM Architecture:
 The first part chooses whether the information coming from the previous
timestamp is to be remembered or is irrelevant and can be forgotten.
 In the second part, the cell tries to learn new information from the input to this
cell.
 At last, in the third part, the cell passes the updated information from the
current timestamp to the next timestamp. This one cycle of LSTM is
considered a single-time step.
 These three parts of an LSTM unit are known as gates.
 The first gate is called the Forget gate, the second gate is known as the Input gate, and the last one is the Output gate.
 An LSTM unit consisting of these three gates and a memory cell (the LSTM cell) can be considered as a layer of neurons, as in a traditional feed-forward neural network, with each neuron having a hidden state and a current state.
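For reference, the standard LSTM gate equations (the standard formulation, not shown on the slides; σ is the sigmoid, ⊙ the element-wise product, and the W and b are learned parameters):

```latex
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) &&\text{forget gate}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) &&\text{input gate}\\
\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) &&\text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{updated cell state}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) &&\text{output gate}\\
h_t &= o_t \odot \tanh(c_t) &&\text{hidden state passed to the next timestamp}
\end{aligned}
```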
Recent Trends in Deep Learning
1. Hybrid Model Integration
 An application provides a mechanism for integrating hybrid models from data
sources such as census, weather, and social media into decision support tools.
Moreover, it enables the creation of a new nested domain for the location data,
which can then become part of decision support systems.
 The results suggest that incorporating deep learning networks into hybrid
models can lead to better decisions concerning hazards and performance
measures such as growth and employment.
 Hybrid models combine the benefits of symbolic AI and deep learning. It’s a
top-down approach to artificial intelligence.
2. The Vision Transformer
 Commonly referred to as ViT, the Vision Transformer is an image classification model developed by researchers at Google; it has been used in sentiment analysis, object recognition, and image captioning.
 ViT consists of an input layer, a middle layer, and an output layer. The input
layer contains training images that have been labeled with one of several
possible sentiments (cheerful, negative, neutral, uncertain, sad, happy, angry).
The middle layer detects the types of objects in the image.
 The output layer returns a confidence score based on what the middle and input layers have detected.
3. Self-Supervised Learning
 This deep and self-supervised learning module helps in automation.
 Rather than depending on labeled data to train a system, it learns to categorize
the raw data automatically.
 Each input component can predict any other part of the input. It might, for
example, forecast the future based on historical records.
 In a self-supervised learning system, the input is labeled either by an intelligent
agent or by some external source.
 The output is also marked with a label that reflects the overall quality of the
prediction made by the system.
4. Neuroscience-Based Deep Learning
 Neuroscience-based deep learning is a type of ML that uses data from
neuroscience experiments to train artificial neural networks. It allows
researchers to develop models that better understand how the brain works.
 Artificial neural networks constructed on computers are comparable to those
seen in human brains.
 As a result of this formation, scientists and researchers have uncovered
thousands of neurological remedies and ideas. With the deployment of
progressively more robust, comprehensive, and advanced deep learning
implementations and solutions, the dynamics of adaptability ratio have
improved significantly.
5. High-Performance NLP Models
 Machine Learning-based NLP is still in the early stages. However, there is
presently no method that will allow NLP computers to recognize the meanings
of different words in various contexts and respond appropriately.
 One approach to solving this problem is to build a model that can recognize
patterns in large amounts of text (e.g., millions of documents).
 This is where machine learning algorithms come in, as they can automatically learn from data and train models to make predictions.
6. Generative Adversarial Networks (GANs):
 Generative Adversarial Networks (GANs) are a type of neural network that
generates new, realistic data based on existing data. GANs work by pitting two
neural networks against each other, with one network generating fake data and
the other network trying to detect whether the data is real or fake.
 This approach enables GANs to generate new data that is indistinguishable
from real data, and has a wide range of applications, including image and
video generation, music synthesis, and natural language processing.
 One of the advantages of GANs is their ability to generate a diverse range of
outputs, which can be used to train machine learning models that are more
robust and accurate.
7. Reinforcement Learning
 Reinforcement learning is a machine learning approach in which an agent
learns how to behave in an environment by performing actions and receiving
feedback in the form of rewards or penalties. The agent learns from its actions
and adjusts its behavior accordingly to maximize the cumulative reward over
time.
 Reinforcement learning has been used to develop game-playing agents that
have beaten human champions in games such as Go and Chess. The technique
has also been applied to robotics, where agents learn how to manipulate
objects or navigate through environments.

More Related Content

DOCX
Neural networks of artificial intelligence
PDF
Artificial neural network in Audiology.pdf
PDF
Artificial Neural Network and its Applications
DOCX
Artifical neural networks
PDF
Artificial Neural Networking
PDF
Neural networking this is about neural networks
PPT
neuralnetworklearningalgorithm-231219123006-bb13a863.ppt
Neural networks of artificial intelligence
Artificial neural network in Audiology.pdf
Artificial Neural Network and its Applications
Artifical neural networks
Artificial Neural Networking
Neural networking this is about neural networks
neuralnetworklearningalgorithm-231219123006-bb13a863.ppt

Similar to ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf (20)

PPS
Neural Networks Ver1
PDF
Artificial Neural Network: A brief study
PDF
Artificial Neural Networks.pdf
PPTX
Neural network
PDF
Artificial Neural Networks in Human Life: Future Challenges and its Applications
PPT
ANN_B.TechPresentation of ANN basics.ppt
PPT
Artificial Neural Network Learning Algorithm.ppt
PPTX
02 Fundamental Concepts of ANN
PDF
[IJET V2I2P20] Authors: Dr. Sanjeev S Sannakki, Ms.Anjanabhargavi A Kulkarni
DOCX
Neural network
PDF
NeuralNetwork Artificial Intellegence Material
PDF
Artificial neural network
PPTX
Artificial neural network
PPT
Artificial neural networks
DOCX
Neural networks report
PPTX
Artificial Neural Network in Medical Diagnosis
PPTX
Artificial Neural Network
PPTX
Artificial neural network
PPTX
artificialneuralnetwork-200611082546.pptx
PDF
Neural Network
Neural Networks Ver1
Artificial Neural Network: A brief study
Artificial Neural Networks.pdf
Neural network
Artificial Neural Networks in Human Life: Future Challenges and its Applications
ANN_B.TechPresentation of ANN basics.ppt
Artificial Neural Network Learning Algorithm.ppt
02 Fundamental Concepts of ANN
[IJET V2I2P20] Authors: Dr. Sanjeev S Sannakki, Ms.Anjanabhargavi A Kulkarni
Neural network
NeuralNetwork Artificial Intellegence Material
Artificial neural network
Artificial neural network
Artificial neural networks
Neural networks report
Artificial Neural Network in Medical Diagnosis
Artificial Neural Network
Artificial neural network
artificialneuralnetwork-200611082546.pptx
Neural Network
Ad

More from rameshwarchintamani (19)

PDF
Distributed System Lab_DS ASSIGNMENT NO 2.pdf
PDF
Unit_2_Part_1_Notes._COMMUNICATION AND CO-ORDINATIONpdf
PPT
Unit_2_Clock_Synchronization_logical_clock.ppt
PPT
Unit_2_Communication and Coordination_Elections.ppt
PDF
communicationsection123-150610092456-lva1-app6891.pdf
PPT
Unit_2_COMMUNICATION AND CO-ORDINATION.ppt
PDF
DSL_Assignment No 1_SOCKET AND RMI_CLIENT_SERVER.pdf
PPT
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
PPT
UNit 1 DS.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
PPT
Unit_I.ppt_introduction to distributed sy
PDF
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
PDF
Unit4_Clustering k means_Clustering in ML.pdf
PDF
ML_Unit_IV_Clustering in Machine Learning.pdf
PDF
DCS Unit-II COMMUNICATION AND COORDINATION.pdf
PPT
Chapter-5-DFS.ppt
PDF
Unit-5_2 PPT on Distributed Web based System.pdf
PPTX
Unit_4_Fault_Tolerance.pptx
PPT
Unit_2_Midddleware_2.ppt
PPT
Distributed System Lab_DS ASSIGNMENT NO 2.pdf
Unit_2_Part_1_Notes._COMMUNICATION AND CO-ORDINATIONpdf
Unit_2_Clock_Synchronization_logical_clock.ppt
Unit_2_Communication and Coordination_Elections.ppt
communicationsection123-150610092456-lva1-app6891.pdf
Unit_2_COMMUNICATION AND CO-ORDINATION.ppt
DSL_Assignment No 1_SOCKET AND RMI_CLIENT_SERVER.pdf
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
UNit 1 DS.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
Unit_I.ppt_introduction to distributed sy
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
Unit4_Clustering k means_Clustering in ML.pdf
ML_Unit_IV_Clustering in Machine Learning.pdf
DCS Unit-II COMMUNICATION AND COORDINATION.pdf
Chapter-5-DFS.ppt
Unit-5_2 PPT on Distributed Web based System.pdf
Unit_4_Fault_Tolerance.pptx
Unit_2_Midddleware_2.ppt
Ad

Recently uploaded (20)

PDF
UEFA_Embodied_Carbon_Emissions_Football_Infrastructure.pdf
PPTX
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
PDF
MLpara ingenieira CIVIL, meca Y AMBIENTAL
PDF
Computer System Architecture 3rd Edition-M Morris Mano.pdf
PPTX
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
PDF
August 2025 - Top 10 Read Articles in Network Security & Its Applications
PDF
August -2025_Top10 Read_Articles_ijait.pdf
PDF
distributed database system" (DDBS) is often used to refer to both the distri...
PDF
UEFA_Carbon_Footprint_Calculator_Methology_2.0.pdf
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PDF
Computer organization and architecuture Digital Notes....pdf
PPTX
Software Engineering and software moduleing
PPTX
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
PPTX
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PPTX
Petroleum Refining & Petrochemicals.pptx
PDF
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
PPTX
MAD Unit - 3 User Interface and Data Management (Diploma IT)
DOC
T Pandian CV Madurai pandi kokkaf illaya
PPTX
Chapter 2 -Technology and Enginerring Materials + Composites.pptx
PDF
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf
UEFA_Embodied_Carbon_Emissions_Football_Infrastructure.pdf
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
MLpara ingenieira CIVIL, meca Y AMBIENTAL
Computer System Architecture 3rd Edition-M Morris Mano.pdf
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
August 2025 - Top 10 Read Articles in Network Security & Its Applications
August -2025_Top10 Read_Articles_ijait.pdf
distributed database system" (DDBS) is often used to refer to both the distri...
UEFA_Carbon_Footprint_Calculator_Methology_2.0.pdf
Exploratory_Data_Analysis_Fundamentals.pdf
Computer organization and architecuture Digital Notes....pdf
Software Engineering and software moduleing
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
Petroleum Refining & Petrochemicals.pptx
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
MAD Unit - 3 User Interface and Data Management (Diploma IT)
T Pandian CV Madurai pandi kokkaf illaya
Chapter 2 -Technology and Enginerring Materials + Composites.pptx
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf

ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf

  • 1. Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423 603 Department of Information Technology Prepared by Dr.R.D.Chintamani Assistant Professor Department of Information Technology Department of Information Technology, SRES’s Sanjivani College of Engineering, Kopargaon- Machine Learning (IT312)
  • 2. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Unit-VI DEEP LEARNING Course Objectives : To understand the Deep Learning concept Course Outcome(CO6) : Understand the Deep learning,
  • 3. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON  Generative AI tools like ChatGPT and Midjournery are able to replicate (and often exceed) human-like performance on tasks like taking exams, generating text and making art.  Even to seasoned programmers, their abilities can seem magical. But, obviously, there is no magic.  These things are “just” artificial neural networks – circuits inspired by the architecture of biological brains.  In fact, much like real brains, when broken down to their building blocks, these systems can seem “impossibly simple” relative to what they achieve. Artificial Neural Network(ANN)
  • 4. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON  Artificial Neural Networks (ANN) are algorithms based on brain function and are used to model complicated patterns and forecast issues.  The Artificial Neural Network (ANN) is a deep learning method that arise from the concept of the human brain Biological Neural Networks.  The development of ANN was the result of an attempt to replicate the workings of the human brain. The workings of ANN are extremely similar to those of biological neural networks, although they are not identical. ANN algorithm accepts only numeric and structured data.  Convolutional Neural Networks (CNN) and Recursive Neural Networks (RNN) are used to accept unstructured and non-numeric data forms such as Image, Text, and Speech. Artificial Neural Network(ANN)
  • 5. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Artificial Neural Networks (ANN):  An Artificial Neural Network (ANN) is a computational model inspired by the human brain’s neural structure.  It consists of interconnected nodes (neurons) organized into layers.  Information flows through these nodes, and the network adjusts the connection strengths (weights) during training to learn from data, enabling it to recognize patterns, make predictions, and solve various tasks in machine learning and artificial intelligence. Artificial Neural Network(ANN)
  • 6. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Artificial Neural Networks Architecture Artificial Neural Network(ANN)
  • 7. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Artificial Neural Networks Architecture: Artificial Neural Network(ANN)
  • 8. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Types of Artificial Neural Networks: Five Types of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 9. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Types of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 10. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON How do Artificial Neural Networks learn? Artificial Neural Network(ANN)
  • 11. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON How do Artificial Neural Networks learn? Artificial Neural Network(ANN)
  • 12. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: ANNs have a wide range of applications because of their unique properties. A few of the important applications of ANNs include: 1. Image Processing and Character recognition:  ANNs play a significant part in picture and character recognition because of their capacity to take in many inputs, process them, and infer hidden and complicated, non-linear correlations.  Character recognition, such as handwriting recognition, has many applications in fraud detection (for example, bank fraud) and even national security assessments. Artificial Neural Network(ANN)
  • 13. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 1. Image Processing and Character recognition: Artificial Neural Network(ANN)
  • 14. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 1. Image Processing and Character recognition:  Image recognition is a rapidly evolving discipline with several applications ranging from social media facial identification to cancer detection in medicine to satellite image processing for agricultural and defence purposes.  Deep neural networks, which form the core of “deep learning,” have now opened up all of the new and transformative advances in computer vision, speech recognition, and natural language processing – notable examples being self-driving vehicles, Artificial Neural Network(ANN)
  • 15. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 2. Forecasting:  It is widely used in everyday company decisions (sales, the financial allocation between goods, and capacity utilization), economic and monetary policy, finance, and the stock market.  Forecasting issues are frequently complex; for example, predicting stock prices is complicated with many underlying variables (some known, some unseen).  Traditional forecasting models have flaws when it comes to accounting for these complicated, non-linear interactions. Artificial Neural Network(ANN)
  • 16. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Application of Artificial Neural Networks: 2. Forecasting:  Given its capacity to model and extract previously unknown characteristics and correlations, ANNs can provide a reliable alternative when used correctly. ANN also has no restrictions on the input and residual distributions, unlike conventional models. Artificial Neural Network(ANN)
  • 17. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Advantages of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 18. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Disadvantages of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 19. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Disadvantages of Artificial Neural Networks: Artificial Neural Network(ANN)
  • 20. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON McCulloch-Pitts Neuron:  The McCulloch-Pitts neuron was a binary device, functioning in an all-or-nothing manner.  It had multiple inputs and one output; if the combined inputs exceeded a certain threshold, the neuron would ‘fire’, otherwise, it remained inactive.  This simple yet powerful concept mimicked the basic function of neurons in the brain,  MP neuron model is also known as linear threshold gate model, Artificial Neural Network(ANN)
  • 21. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON McCulloch-Pitts Neuron: Artificial Neural Network(ANN)
  • 22. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Perceptron:  A perceptron is the smallest element of a neural network.  Perceptron is a single-layer neural network linear or a Machine Learning algorithm used for supervised learning of various binary classifiers.  It works as an artificial neuron to perform computations by learning elements and processing them for detecting the business intelligence and capabilities of the input data.  A perceptron network is a group of simple logical statements that come together to create an array of complex logical statements, known as the neural network. Perceptron
  • 23. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Perceptron: Perceptron
  • 24. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Components of a Perceptron: Perceptron
  • 25. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why do we Need Weight and Bias?  Weight and bias are two important aspects of the perceptron model.  These are learnable parameters and as the network gets trained it adjusts both parameters to achieve the desired values and the correct output.  Weights are used to measure the importance of each feature in predicting output value.  Features with values close to zero are said to have lesser weight or significance. These have less importance in the prediction process compared to the features with values further from zero known as weights with a larger value. Perceptron
  • 26. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why do we Need Weight and Bias?  If the weight of a feature is positive then it has a direct relation with the target value, and if it is negative then it has an inverse relationship with the target value.  In contrast to weight in a neural network that increases the speed of triggering an activation function, bias delays the trigger of the activation function.  It acts like an intercept in a linear equation. Simply stated, Bias is a constant used to adjust the output and help the model to provide the best fit output for the given data. Perceptron
  • 27. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Perceptron Learning Rule:  According to the rule, perceptron can learn automatically to generate the desired results through optimal weight coefficients. Perceptron
  • 28. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON The anatomy of a Perceptron: Perceptron
  • 29. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why is perceptron used?  Perceptron is a linear classifier used for data classification into the two binary sections.  Facilitating the supervised learning of binary classifiers, the perceptron algorithm learns and processes elements in the training set one at a time.  It helps detect features from an input to derive business intelligence and classify the inputs as it enables machines to automatically learn coefficients of weight.  Perceptron is commonly used for basic operations like data compression, data visualization, high-quality complex image recognition, and encryption. Perceptron
  • 30. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Single Layer Perceptron Model:  A single-layer perceptron model is the simplest type of artificial neural network.  It includes a feed-forward network that can analyze only linearly separable objects while being dependent on a threshold transfer function. The model returns only binary outcomes(target) i.e. 1, and 0.  The algorithm in a single-layered perceptron model does not have any previous information initially. The weights are allocated inconsistently, so the algorithm simply adds up all the weighted inputs.  since the single-layer perceptron is a linear classifier and it does not classify cases if they are not linearly separable Perceptron
  • 31. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Single Layer Perceptron Model: Perceptron
  • 32. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Multilayer Perceptron Model  A multi-layer perceptron model uses the backpropagation algorithm. Though it has the same structure as that of a single-layer perceptron, it has one or more hidden layers. Perceptron
  • 33. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Activation Functions:  Activation functions are an integral building block of neural networks that enable them to learn complex patterns in data.  They transform the input signal of a node in a neural network into an output signal that is then passed on to the next layer. Without activation functions, neural networks would be restricted to modeling only linear relationships between inputs and outputs.  Activation functions introduce non-linearities, allowing neural networks to learn highly complex mappings between inputs and outputs. Activation Functions in ANN
  • 34. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Activation Functions:  An activation function determines the range of values of activation of an artificial neuron,  This is applied to the sum of the weighted input data of the neuron.  Without the application of an activation function, the only operations in computing the output of a multilayer perceptron would be the linear products between the weights and the input values.  A neural network that does not have an activation function in the hidden layer would not be able to mathematically realize such complex relationships, and would not be able to solve the tasks we are trying to solve with the network. Activation Functions in ANN
  • 35. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON Why Are Activation Functions Essential?  Without activation functions, neural networks would just consist of linear operations like matrix multiplication. All layers would perform linear transformations of the input, and no non-linearities would be introduced.  Most real-world data is non-linear. For example, relationships between house prices and size, income, and purchases, etc., are non-linear. If neural networks had no activation functions, they would fail to learn the complex non-linear patterns that exist in real-world data.  Activation functions enable neural networks to learn these non-linear relationships by introducing non-linear behaviour through activation functions. Activation Functions in ANN
  • 36. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 1. Sigmoid or Logistic Activation Function:  It is especially used for models where we have to predict the probability as an output. Since probability of anything exists only between the range of 0 and 1, sigmoid is the right choice.  The function is differentiable.That means, we can find the slope of the sigmoid curve at any two points.  The main purpose of the activation function is to maintain the output or predicted value in the particular range, which makes the good efficiency and accuracy of the model. Types of Activation Functions:
  • 37. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 1. Sigmoid or Logistic Activation Function: The mathematical definition of the sigmoid function is as follows: Types of Activation Functions:
  • 38. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 1. Sigmoid or Logistic Activation Function:  It takes a real-valued input and squashes it to a value between 0 and 1.  The sigmoid function has an "S"-shaped curve that asymptotes to 0 for large negative numbers and 1 for large positive numbers.  The outputs can be easily interpreted as probabilities, which makes it natural for binary classification problems.  The main use case of the sigmoid function is as the activation for the output layer of binary classification models Types of Activation Functions:
  • 39. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 2. Tanh or hyperbolic tangent Activation Function:  Tanh is also like logistic sigmoid but better. The range of the Tanh function is from (-1 to 1). Tanh is also sigmoidal (s – shaped).  The advantage of Tanh is that the negative inputs will be mapped strongly negative and the zero inputs will be mapped near zero in the tanh graph.  The tanh function is mainly used classification between two classes.  Both Tanh and logistic sigmoid activation functions are used in feed- forward nets. Types of Activation Functions:
  • 40. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 2. Tanh or hyperbolic tangent Activation Function: Types of Activation Functions:
  • 41. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 2. Tanh or hyperbolic tangent Activation Function:  Unlike the sigmoid function, tanh is zero-centered, which means that its output is symmetric around the origin of the coordinate system. This is often considered an advantage because it can help the learning algorithm converge faster.  The tanh function is frequently used in the hidden layers of a neural network. Because of its zero-centered nature, when the data is also normalized to have mean zero, it can result in more efficient training.  If one has to choose between the sigmoid and tanh the decision can also be influenced by the specific use case and the behavior of the network during initial training experiments. Types of Activation Functions:
  • 42. DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE,KOPARGAON 3. ReLU (Rectified Linear Unit) Activation Function  The ReLU is the most used activation function in the world right now.Since, it is used in almost all the convolutional neural networks or deep learning.  Even though ReLU is linear for half of its input space, it is technically a non- linear function because it has a non-differentiable point at x=0, where it abruptly changes from x. This non-linearity allows neural networks to learn complex patterns,  Since ReLU outputs zero for all negative inputs, it naturally leads to sparse activations; at any time, only a subset of neurons are activated, leading to more efficient computation. Types of Activation Functions:
 The ReLU function is defined mathematically as:
f(x) = max(0, x)
 The ReLU function is computationally inexpensive because it involves simple thresholding at zero. This allows networks to scale to many layers without a significant increase in computational burden, compared to more complex functions like tanh or sigmoid.
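A minimal NumPy sketch of ReLU (illustrative, not from the original slides):

import numpy as np

def relu(x):
    # simple thresholding at zero: negatives become 0, positives pass through
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, -1.0, 0.0, 2.0, 4.0])))  # [0. 0. 0. 2. 4.] -- sparse activations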
Loss Functions (Cost/Error Function):
 The loss function measures how effectively your algorithm models the given dataset: loss quantifies how far the model's predictions are from the expected results.
 It is a mathematical function of the parameters of the machine learning algorithm.
 In simple linear regression, the prediction is calculated using the slope (m) and intercept (b). The loss for one sample is the squared error (yᵢ − ŷᵢ)², so the loss function is a function of the slope and intercept. Regression loss functions like the MSE loss are commonly used to evaluate the performance of regression models.
 A loss function measures how good a neural network model is at performing a certain task, which in most cases is regression or classification.
 We must minimize the value of the loss function during the back-propagation step in order to make the neural network better.
 The cross-entropy loss function is used in classification tasks, where we want the neural network to predict probabilities.
 For regression tasks, where we want the network to predict continuous numbers, the mean squared error loss function is used.
 Loss functions play a pivotal role in machine learning algorithms, acting as objective measures of the disparity between predicted and actual values.
 They serve as the basis for model training, guiding algorithms to adjust model parameters in a direction that minimizes the loss and improves predictive accuracy.
 In machine learning, loss functions quantify the extent of error between predicted and actual outcomes. They provide a means to evaluate the performance of a model on a given dataset and are instrumental in optimizing model parameters during training.
Loss Functions in Deep Learning:
1. Mean Squared Error / Squared Loss / L2 Loss:
 The Mean Squared Error (MSE) is a straightforward and widely used loss function.
 To calculate the MSE, you take the difference between the actual value and the model prediction, square it, and then average it across the entire dataset.
 The MSE is defined mathematically as:
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
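A minimal NumPy sketch of MSE (the sample values are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # average of the squared differences across the dataset
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))  # 0.25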
2. Mean Absolute Error / L1 Loss:
 The Mean Absolute Error (MAE) is another simple loss function. It calculates the average absolute difference between the actual value and the model prediction across the dataset.
 The MAE is defined mathematically as:
MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
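And the corresponding NumPy sketch of MAE, on the same illustrative values:

import numpy as np

def mae(y_true, y_pred):
    # average of the absolute differences across the dataset
    return np.mean(np.abs(y_true - y_pred))

print(mae(np.array([3.0, 5.0]), np.array([2.5, 5.5])))  # 0.5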
3. Binary Cross-Entropy / Log Loss:
 It is used in binary (two-class) classification problems, for example predicting whether a person has COVID or whether an article becomes popular.
 Binary cross-entropy compares each predicted probability to the actual class output, which can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance from the expected value, i.e., how close to or far from the actual value they are.
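A minimal NumPy sketch of binary cross-entropy (the clipping constant eps is an implementation detail to avoid log(0)):

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # the penalty grows the further a prediction drifts from the true class
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(np.array([1.0]), np.array([0.9])))  # ~0.105 (confident, correct)
print(binary_cross_entropy(np.array([1.0]), np.array([0.1])))  # ~2.303 (confident, wrong)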
Convolutional Neural Network (CNN):
 A Convolutional Neural Network, also known as CNN or ConvNet, is a class of neural networks that specializes in processing data with a grid-like topology, such as an image.
 A digital image is a binary representation of visual data. It contains a series of pixels arranged in a grid-like fashion, with pixel values denoting how bright and what colour each pixel should be.
 A CNN takes an input image, applies filters, flattens the resulting feature maps, and "votes" to classify the image.
Building Blocks of a Convolutional Neural Network (CNN):
Input Image:
 First of all, the input image is broken down into pixels.
 If it is a black-and-white image, it has a single channel, and the pixels are interpreted as a 2D array with values from 0 to 255. If it is a coloured image, it has 3 channels (red, green, blue) and is interpreted as a 3D array.
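For example (the 28×28 shape is an illustrative assumption), in NumPy:

import numpy as np

gray = np.zeros((28, 28), dtype=np.uint8)      # grayscale: 2D array, values 0-255
color = np.zeros((28, 28, 3), dtype=np.uint8)  # colour: 3D array (height, width, RGB)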
Convolution Layer:
 This is the first layer, which filters the input image. Its purpose is to extract features from the image: it captures colour, edges, gradient orientation, and other features so the image can be differentiated.
Convolution Layer: a convolutional layer within a neural network has the following attributes:
 Convolutional kernels defined by a width and height (hyper-parameters).
 The number of input channels and output channels (hyper-parameters).
 The depth of the convolution filter (the input channels) must be equal to the number of channels (depth) of the input feature map.
There are two common padding choices for this layer (see the worked example below):
 Same Padding: the input is padded so that the output feature maps have the same spatial size as the input feature maps.
 Valid Padding: no padding is applied; the kernel is placed only at positions where it fully overlaps the input, so the output feature maps are smaller than the input.
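As a quick worked example (not from the original slides): for an n×n input, a k×k kernel, padding p, and stride s, the output size is ⌊(n + 2p − k)/s⌋ + 1. With a 5×5 input and a 3×3 kernel at stride 1, valid padding (p = 0) gives a 3×3 output, while same padding (p = 1) gives a 5×5 output, the same size as the input.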
Pooling Layer:
 This layer is usually added after the convolutional layer.
 The pooling layer reduces the spatial size of the output from the convolutional layer and extracts dominant features. Pooling layers come in two main types:
 Max Pooling: returns the maximum value from the portion of the image covered by the kernel. It discards noisy activations and helps reduce over-fitting by providing an abstracted form of the representation.
 Average Pooling: returns the average value from the portion of the image covered by the kernel.
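A small sketch of 2×2 max pooling on a 4×4 feature map (the values are illustrative; the reshape trick assumes the map divides evenly into blocks):

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 5, 6]])
blocks = fmap.reshape(2, 2, 2, 2)  # split into four 2x2 blocks
print(blocks.max(axis=(1, 3)))     # [[6 4] [7 9]] -- the dominant feature per block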
Fully Connected Input Layer (Flatten):
 Fully connected layers are layers where all the inputs from one layer are connected to every activation unit of the next layer. The flatten step takes the output of the pooling layer and flattens it into a single vector.
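Putting the building blocks together, a minimal Keras sketch (not from the slides; the 28×28×1 input, filter count, and 10 output classes are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                               # grayscale input image
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # convolution: extract features
    layers.MaxPooling2D((2, 2)),                                   # pooling: keep dominant features
    layers.Flatten(),                                              # flatten into a single vector
    layers.Dense(10, activation='softmax'),                        # fully connected "voting" layer
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])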
Autoencoder:
 Autoencoders have emerged as one of the technologies and techniques that enable computer systems to solve data compression problems more efficiently.
 An autoencoder is a type of artificial neural network used to learn data encodings in an unsupervised manner.
 The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input.
The architecture of autoencoders:
1. Encoder: a module that compresses the train-validate-test set input data into an encoded representation, typically several orders of magnitude smaller than the input data.
2. Bottleneck: a module that contains the compressed knowledge representation and is therefore the most important part of the network.
3. Decoder: a module that helps the network "decompress" the knowledge representation, reconstructing the data from its encoded form. The output is then compared with the ground truth.
How to train autoencoders?
 Four hyperparameters need to be set before training: the code (bottleneck) size, the number of layers, the number of nodes per layer, and the loss function (e.g., MSE or binary cross-entropy).
 The network is then trained with back-propagation, using the input itself as the target, to minimize the reconstruction loss between the input and the output.
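A minimal Keras sketch of such training (not from the slides; the 784-dim input, 128-unit hidden layers, and 32-dim bottleneck are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                         # e.g., a flattened 28x28 image
encoded = layers.Dense(128, activation='relu')(inputs)      # encoder
bottleneck = layers.Dense(32, activation='relu')(encoded)   # bottleneck (the code)
decoded = layers.Dense(128, activation='relu')(bottleneck)  # decoder
outputs = layers.Dense(784, activation='sigmoid')(decoded)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# The input is also the target -- the network learns to reconstruct it:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)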
Applications of autoencoders:
1. Dimensionality reduction
 Undercomplete autoencoders are those used for dimensionality reduction.
 They can be used as a pre-processing step for dimensionality reduction, as they perform fast and accurate reductions without losing much information.
 Furthermore, while procedures like PCA can only perform linear dimensionality reduction, undercomplete autoencoders can perform large-scale non-linear dimensionality reduction (see the encoder sketch below).
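For instance, reusing the encoder half of the autoencoder sketch above (the names inputs, bottleneck, and models come from that illustrative sketch) yields a non-linear dimensionality reducer:

encoder = models.Model(inputs, bottleneck)  # 784 dims -> 32 dims (illustrative)
# codes = encoder.predict(x_data)           # low-dimensional representations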
2. Image denoising
 Autoencoders like the denoising autoencoder can be used for efficient and highly accurate image denoising.
 Unlike traditional denoising methods, autoencoders do not search for noise; they extract the image from the noisy data fed to them by learning a representation of it. The representation is then decompressed to form a noise-free image.
 Denoising autoencoders can thus denoise complex images that cannot be denoised via traditional methods.
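In practice (a hedged sketch continuing the example above, where autoencoder and x_train are the illustrative names from that sketch), a denoising autoencoder is trained on corrupted inputs with the clean data as the target:

import numpy as np

x_noisy = x_train + 0.1 * np.random.normal(size=x_train.shape)  # add Gaussian noise
x_noisy = np.clip(x_noisy, 0.0, 1.0)                            # keep a valid pixel range
# autoencoder.fit(x_noisy, x_train, epochs=10, batch_size=256)  # learn to denoise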
3. Anomaly detection
 Undercomplete autoencoders can also be used for anomaly detection.
 For example, consider an autoencoder trained on a specific dataset P. For any image sampled from the training distribution, the autoencoder gives a low reconstruction loss and reconstructs the image essentially as-is.
 For an image not represented in the training dataset, however, the autoencoder cannot reconstruct it well, because the latent attributes are not adapted to an image the network has never seen; the resulting high reconstruction loss signals an anomaly.
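A hedged sketch of this idea (assuming a trained autoencoder such as the one above; the threshold heuristic is an assumption, not from the slides):

import numpy as np

def detect_anomalies(autoencoder, x, threshold):
    reconstructions = autoencoder.predict(x)
    errors = np.mean((x - reconstructions) ** 2, axis=1)  # per-sample reconstruction MSE
    return errors > threshold                             # True = likely anomaly

# A common heuristic: set threshold = mean + 3 * std of the
# reconstruction errors measured on normal training data.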
Long Short-Term Memory Networks (LSTM):
 LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture widely used in deep learning. It excels at capturing long-term dependencies, making it ideal for sequence prediction tasks.
 Unlike traditional feed-forward neural networks, LSTM incorporates feedback connections, allowing it to process entire sequences of data, not just individual data points. This makes it highly effective at understanding and predicting patterns in sequential data like time series, text, and speech.
 LSTM has become a powerful tool in artificial intelligence and deep learning, enabling breakthroughs in various fields by uncovering valuable insights from sequential data.
LSTM Architecture:
 The first part chooses whether the information coming from the previous timestamp is to be remembered or is irrelevant and can be forgotten.
 In the second part, the cell tries to learn new information from the input to this cell.
 In the third part, the cell passes the updated information from the current timestamp to the next timestamp. One such cycle of the LSTM is considered a single time step.
 These three parts of an LSTM unit are known as gates.
 The first gate is called the Forget gate, the second is the Input gate, and the last one is the Output gate.
 An LSTM unit consisting of these three gates and a memory cell can be thought of like a layer of neurons in a traditional feed-forward network, with each neuron maintaining a hidden state and a cell state.
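For reference, the standard LSTM gate equations (not shown on the original slide), where σ is the sigmoid function, ⊙ is element-wise multiplication, x_t is the input, h_t the hidden state, and C_t the cell state:

f_t = σ(W_f · [h_(t−1), x_t] + b_f)       (forget gate)
i_t = σ(W_i · [h_(t−1), x_t] + b_i)       (input gate)
C̃_t = tanh(W_C · [h_(t−1), x_t] + b_C)    (candidate cell state)
C_t = f_t ⊙ C_(t−1) + i_t ⊙ C̃_t          (cell state update)
o_t = σ(W_o · [h_(t−1), x_t] + b_o)       (output gate)
h_t = o_t ⊙ tanh(C_t)                     (hidden state)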
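As a usage illustration, a minimal Keras sketch for sequence classification (the 50-timestep, 8-feature input shape and 64 units are illustrative assumptions, not from the slides):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(50, 8)),            # sequences of 50 timesteps, 8 features each
    layers.LSTM(64),                        # 64 LSTM units; returns the final hidden state
    layers.Dense(1, activation='sigmoid'),  # binary prediction from the whole sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy')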
Recent Trends in Deep Learning
1. Hybrid Model Integration
 Applications now provide mechanisms for integrating hybrid models built from data sources such as census records, weather, and social media into decision-support tools. This also enables the creation of new nested domains for location data, which can then become part of decision-support systems.
 Results suggest that incorporating deep learning networks into hybrid models can lead to better decisions concerning hazards and performance measures such as growth and employment.
 Hybrid models combine the benefits of symbolic AI, a top-down approach to artificial intelligence, with those of deep learning.
2. The Vision Transformer
 Commonly referred to as ViT, this image classification model was developed by researchers at Google. It applies the transformer architecture to images by splitting them into patches, and is used in tasks such as sentiment analysis, object recognition, and image captioning.
 ViT consists of an input layer, a middle layer, and an output layer. The input layer contains training images labeled with one of several possible sentiments (cheerful, negative, neutral, uncertain, sad, happy, angry). The middle layer detects the types of objects in the image.
 The output layer returns a confidence score based on what the middle and input layers detected.
3. Self-Supervised Learning
 This deep, self-supervised learning approach helps in automation.
 Rather than depending on labeled data to train a system, it learns to categorize the raw data automatically.
 Each input component can be used to predict any other part of the input. It might, for example, forecast the future based on historical records.
 In a self-supervised learning system, the supervisory labels are generated automatically from the raw data itself, rather than being supplied by human annotators.
 The output can also be scored with a label that reflects the overall quality of the prediction made by the system.
4. Neuroscience-Based Deep Learning
 Neuroscience-based deep learning is a type of ML that uses data from neuroscience experiments to train artificial neural networks. It allows researchers to develop models that better capture how the brain works.
 Artificial neural networks constructed on computers are comparable to those found in human brains.
 As a result, scientists and researchers have uncovered thousands of neurological remedies and ideas. With the deployment of progressively more robust, comprehensive, and advanced deep learning implementations and solutions, adaptability has improved significantly.
5. High-Performance NLP Models
 Machine-learning-based NLP is still in its early stages: there is presently no method that allows NLP systems to recognize the meanings of words in all contexts and respond appropriately.
 One approach to this problem is to build a model that can recognize patterns in large amounts of text (e.g., millions of documents).
 This is where machine learning algorithms come in, as they can automatically learn from data and train models to make predictions.
6. Generative Adversarial Networks (GANs)
 Generative Adversarial Networks (GANs) are a type of neural network that generates new, realistic data based on existing data. GANs work by pitting two neural networks against each other: one network generates fake data, while the other tries to detect whether the data is real or fake.
 This approach enables GANs to generate new data that is indistinguishable from real data, and it has a wide range of applications, including image and video generation, music synthesis, and natural language processing.
 One advantage of GANs is their ability to generate a diverse range of outputs, which can be used to train machine learning models that are more robust and accurate.
7. Reinforcement Learning
 Reinforcement learning is a machine learning approach in which an agent learns how to behave in an environment by performing actions and receiving feedback in the form of rewards or penalties. The agent learns from its actions and adjusts its behavior to maximize the cumulative reward over time.
 Reinforcement learning has been used to develop game-playing agents that have beaten human champions in games such as Go and chess. The technique has also been applied to robotics, where agents learn how to manipulate objects or navigate through environments.