
DL Notes

MOD 1

Q. What is MLP?

A Multilayer Perceptron (MLP) is a type of artificial neural network characterized by its multiple layers of interconnected nodes or neurons.

• It consists of an input layer, one or more hidden layers, and an output layer.

• Each layer contains nodes, and nodes in one layer are connected to nodes in
the adjacent layers.

• The architecture of an MLP enables it to learn complex patterns and relationships within the data.

Components :

1. Input Layer :

The input layer receives the features or input data and transmits them to the
neurons in the hidden layer. Each node in the input layer represents a feature of
the input data.

2. Hidden Layers :

Hidden layers, as the name suggests, are intermediary layers between the input
and output layers. Each node in a hidden layer receives inputs from all nodes in
the previous layer and produces an output for nodes in the subsequent layer.

The presence of multiple hidden layers allows MLPs to learn intricate and
hierarchical representations of data.

3. Activation Functions :

Activation functions introduce non-linearities to the model, enabling MLPs to learn complex relationships. Common activation functions include the sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).

4. Output Layer :

The output layer produces the final result or prediction. The number of nodes in
the output layer depends on the nature of the task. For binary classification, a
single node with a sigmoid activation function is often used.

For multi-class classification, the output layer may have multiple nodes, often
with a softmax activation.

Representation Power of MLP :

• MLPs can learn complex non-linear relationships between inputs and outputs.

• Universal Approximation Theorem: An MLP with at least one hidden layer and enough neurons can approximate any continuous function to any desired accuracy.

• By stacking multiple hidden layers, MLPs represent patterns hierarchically:

o Lower layers learn simple features.

o Higher layers combine these to detect complex features.

Example:
For handwriting recognition (digits 0–9),

• Lower layers may detect basic strokes and curves.

• Higher layers combine these strokes to identify full digits.


This ability to progressively learn from simple to complex features makes
MLPs highly powerful for pattern recognition tasks.
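
To make this structure concrete, here is a minimal sketch of an MLP in PyTorch (illustrative only; the 784-input and 10-output sizes are assumed values matching the digit-recognition example above):

import torch
import torch.nn as nn

# A small MLP: input layer -> two hidden layers -> output layer.
# 784 inputs (flattened 28x28 image), 10 outputs (digits 0-9) are assumed sizes.
mlp = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> hidden layer 1
    nn.ReLU(),            # non-linear activation
    nn.Linear(128, 64),   # hidden layer 1 -> hidden layer 2
    nn.ReLU(),
    nn.Linear(64, 10),    # hidden layer 2 -> output layer (one node per digit)
)

x = torch.rand(1, 784)    # one dummy flattened image
logits = mlp(x)           # forward pass through all layers
print(logits.shape)       # torch.Size([1, 10])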

Q. Explain how MLP overcomes limitations of perceptron.

Limitations of Single-Layer Perceptron

• Can only solve linearly separable problems (e.g., AND, OR gates).

• Fails for non-linearly separable problems like XOR.

• Lacks hidden layers → cannot learn complex patterns.

• Limited representation power.


How MLP Overcomes These Limitations

1. Multiple Hidden Layers

o Adds one or more hidden layers between input and output, enabling the network to break down complex problems into simpler parts.

2. Non-Linear Activation Functions

o Uses functions like ReLU, Sigmoid, or Tanh to introduce non-linearity, allowing the model to solve non-linearly separable problems (e.g., XOR).

3. Increased Representation Power

o By the Universal Approximation Theorem, an MLP can approximate any continuous function with enough neurons and layers.

4. Backpropagation Training

o Learns by adjusting weights in all layers to minimize error, improving accuracy for complex tasks.

5. Hierarchical Feature Learning

o Lower layers learn simple features, higher layers learn complex structures, enabling recognition in images, text, and speech.

Example :
A single perceptron can’t classify the XOR logic gate because a perceptron can only draw a straight-line decision boundary, and XOR’s classes cannot be separated by a straight line. An MLP with two hidden neurons and a non-linear activation can classify XOR correctly by creating a non-linear decision boundary.
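
A minimal NumPy sketch of this idea, with hand-chosen weights (assumed for illustration): the first hidden neuron behaves like OR, the second like AND, and the output neuron fires only when OR is true and AND is false, which is exactly XOR:

import numpy as np

def step(z):
    return (z >= 0).astype(int)          # step activation: 1 if z >= 0, else 0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: neuron 1 ~ OR(x1, x2), neuron 2 ~ AND(x1, x2)
W_hidden = np.array([[1, 1],             # weights for hidden neuron 1
                     [1, 1]])            # weights for hidden neuron 2
b_hidden = np.array([-0.5, -1.5])

# Output neuron: fires when OR is on but AND is off -> XOR
w_out = np.array([1, -1])
b_out = -0.5

h = step(X @ W_hidden.T + b_hidden)      # hidden layer activations
y = step(h @ w_out + b_out)              # output layer
print(y)                                 # [0 1 1 0] -> XOR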
Q. What is an Activation Function? Types?

An activation function is a mathematical function in a neural network that decides whether a neuron should be activated or not and how strongly it should influence the next layer. It adds non-linearity to the network so it can learn complex patterns, not just straight-line relationships.

Without activation functions, an MLP would behave like a simple linear model
(even with many layers). With activation functions, the network can solve
problems like XOR, recognize images, translate languages, etc.

Types :

1. Step Function : The step function is a binary activation function. If the input is above a certain threshold, it outputs one; otherwise, it outputs zero.

Equation: f(x) = 1 if x ≥ threshold, else f(x) = 0

2. Sigmoid Function : The sigmoid (logistic) function squashes input values to the range (0, 1). It is commonly used in the output layer of binary classification models.

Equation: σ(x) = 1 / (1 + e^(−x))

3. ReLU (Rectified Linear Unit) : ReLU is a popular activation function that outputs the input for positive values and zero for negative values. It introduces non-linearity and is computationally efficient.

Equation: f(x) = max(0, x)

4. Hyperbolic Tangent (Tanh) : Similar to the sigmoid function, the tanh function maps input values to the range (−1, 1). It is often used in hidden layers of neural networks.

Equation: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
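
For reference, a short NumPy sketch of these four activation functions (illustrative only):

import numpy as np

def step(x, threshold=0.0):
    return np.where(x >= threshold, 1.0, 0.0)   # binary output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))             # squashes to (0, 1)

def relu(x):
    return np.maximum(0.0, x)                   # 0 for negatives, x otherwise

def tanh(x):
    return np.tanh(x)                           # squashes to (-1, 1)

x = np.array([-2.0, -0.5, 0.0, 1.0])
print(step(x), sigmoid(x), relu(x), tanh(x))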

Q. Terminologies and Classes of Deep Networks.

Terminologies :

1. Neuron (Node)

The basic processing unit of a neural network, inspired by brain cells. It takes inputs, multiplies them by weights, adds a bias, applies an activation function, and sends the result to the next layer.
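
As a quick illustration (the input, weight, and bias values are assumed), a single neuron computes a weighted sum plus bias and passes it through an activation:

import numpy as np

x = np.array([0.5, 0.2, 0.1])    # inputs (features)
w = np.array([0.4, 0.7, 0.2])    # weights
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum + bias
output = max(0.0, z)             # ReLU activation
print(output)                    # 0.5*0.4 + 0.2*0.7 + 0.1*0.2 + 0.1 = 0.46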

2. Layers

Neurons are arranged in layers:

• Input Layer – receives raw data.

• Hidden Layers – extract and learn patterns.

• Output Layer – produces the final prediction.


The number of hidden layers defines the depth of the network.

3. Weights

Trainable parameters that determine the strength of the connection between neurons. During training, weights are adjusted using algorithms like Gradient Descent to reduce prediction error.
4. Dropout : Dropout is a regularization method that randomly removes a
fraction of neurons during training to prevent overfitting. This encourages the
network to learn robust, general features instead of relying on specific neurons.
For example, a 0.5 dropout rate means half the neurons in a layer are ignored in
each training step.
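
A minimal NumPy sketch of (inverted) dropout with an assumed rate of 0.5; a random mask zeroes roughly half the activations and the survivors are rescaled:

import numpy as np

rng = np.random.default_rng(0)
activations = np.array([0.8, 0.3, 0.5, 0.9, 0.1, 0.7])

rate = 0.5                                    # dropout rate
mask = rng.random(activations.shape) >= rate  # keep roughly half the neurons
dropped = activations * mask / (1 - rate)     # inverted dropout: rescale kept units
print(dropped)                                # dropped neurons are exactly 0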

5. Learning Rate : The learning rate controls how much weights change
during training. High values train faster but risk overshooting, while low values
are more precise but slower. Choosing the right rate is crucial, and methods like
scheduling or adaptive optimizers can adjust it automatically.

Classes :

1. Feedforward Neural Networks (FNN)


Feedforward Neural Networks are the simplest type of artificial neural networks
where information moves in one direction—from input to output—without
looping back. They consist of an input layer, hidden layers, and an output layer,
with each neuron passing its output to the next layer. These networks are mainly
used for tasks like image classification, spam detection, and regression
problems.
Example: Predicting house prices based on features like location, size, and
amenities.

2. Convolutional Neural Networks (CNN)


Convolutional Neural Networks are specially designed to process grid-like data
such as images. They use convolutional layers to automatically detect important
features like edges, textures, and shapes, reducing the need for manual feature
extraction. CNNs are widely used in computer vision tasks due to their high
accuracy and efficiency.
Example: Identifying objects in images, such as detecting whether a photo
contains a cat or a dog.

3. Recurrent Neural Networks (RNN)


Recurrent Neural Networks are designed to handle sequential data, where the
order of information matters. They have loops that allow information to persist,
making them suitable for tasks involving time series or language. However,
standard RNNs face problems like vanishing gradients, which are improved by
variants such as LSTMs and GRUs.
Example: Predicting the next word in a sentence or forecasting stock prices
based on past data.

4. Autoencoders
Autoencoders are neural networks that learn to compress data into a smaller
representation (encoding) and then reconstruct it back to its original form
(decoding). They are mainly used for dimensionality reduction, feature learning,
and noise removal from data. By learning efficient data representations,
autoencoders are useful in anomaly detection and image compression.
Example: Detecting fraudulent credit card transactions by learning normal
spending patterns.

5. Generative Adversarial Networks (GANs)


Generative Adversarial Networks consist of two networks—a generator and a
discriminator—that compete with each other. The generator creates fake data,
and the discriminator tries to identify if it is real or fake. Over time, the
generator becomes skilled at producing highly realistic data.
Example: Creating realistic-looking human faces that don’t actually exist.
MOD 2

Q. Explain loss function. Discuss mean squared error. Suppose you have a dataset with actual values and predicted values as follows; find the mean squared error. Actual Values = [5, 10, 15, 20], Predicted Values = [7, 9, 14, 18]

Loss Function

A loss function measures how well or poorly a neural network’s predictions match the actual target values.
It outputs a single value representing the error of the model for given data. The
training process aims to minimize this value by adjusting weights.
Different loss functions are used depending on the type of task (e.g., regression,
classification).

Mean Squared Error (MSE)

Mean Squared Error is a common loss function for regression problems. It calculates the average of the squared differences between actual values and predicted values.

Formula: MSE = (1/n) · Σ (yᵢ − ŷᵢ)², where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of samples.
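
Worked solution for the given data (Actual = [5, 10, 15, 20], Predicted = [7, 9, 14, 18]):

Errors: (5 − 7) = −2, (10 − 9) = 1, (15 − 14) = 1, (20 − 18) = 2
Squared errors: 4, 1, 1, 4
MSE = (4 + 1 + 1 + 4) / 4 = 10 / 4 = 2.5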
Q. Cross-entropy loss and its types

Cross-Entropy Loss (also called Log Loss) is a loss function commonly used in
classification problems.
It measures the difference between the predicted probability distribution and the
actual distribution (true labels).

If the predicted probability for the correct class is high, the loss will be small; if it is low, the loss will be large. It penalizes confident but wrong predictions more heavily than MSE.

Types :

1. Binary Cross-Entropy – used for two-class problems, where the model outputs a single probability (usually from a sigmoid).

2. Categorical Cross-Entropy – used for multi-class problems with one-hot encoded labels (usually with a softmax output layer).

3. Sparse Categorical Cross-Entropy – the same as categorical cross-entropy, but the labels are given as class indices instead of one-hot vectors.
Q. Explain categorical cross-entropy. Consider a binary classification problem where the actual result is 1 and the predicted result is 0.8; find the binary cross-entropy loss.
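
Categorical cross-entropy compares a one-hot true label vector with the predicted probability distribution: L = −Σ yᵢ · log(pᵢ), summed over all classes. Binary cross-entropy is the two-class special case: L = −[y · log(p) + (1 − y) · log(1 − p)].

Worked solution (using the natural logarithm) for y = 1, p = 0.8:
L = −[1 · log(0.8) + 0 · log(0.2)] = −log(0.8) ≈ 0.223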
Q. What is Gradient Descent? Types?

Gradient Descent is an optimization algorithm used to minimize a loss function in machine learning and deep learning.
It works by iteratively adjusting the model’s parameters (weights and biases) in
the opposite direction of the gradient (slope) of the loss function.
The size of each adjustment is controlled by a parameter called the learning rate.
Working :

1. Initialize the weights (randomly or with small values).

2. Compute the loss on the training data with the current weights.

3. Compute the gradient of the loss with respect to each weight.

4. Update each weight in the opposite direction of its gradient: w = w − η · ∂L/∂w, where η is the learning rate.

5. Repeat steps 2–4 until the loss stops decreasing (convergence).

Types :

1. Batch Gradient Descent – uses the entire training set for each update; stable but slow for large datasets.

2. Stochastic Gradient Descent (SGD) – updates the weights after every single training example; fast but noisy.

3. Mini-Batch Gradient Descent – updates the weights using small batches of examples; a practical middle ground and the most commonly used variant.
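
To make the update rule concrete, here is a minimal Python sketch of gradient descent (an illustrative toy problem, not from the notes: fitting a single weight w so that w·x matches y):

import numpy as np

# Toy data: y = 3x, so the ideal weight is 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 6.0, 9.0, 12.0])

w = 0.0              # initial weight
lr = 0.01            # learning rate (eta)

for i in range(200):
    y_pred = w * x
    loss = np.mean((y - y_pred) ** 2)        # MSE loss
    grad = np.mean(-2 * x * (y - y_pred))    # dLoss/dw
    w = w - lr * grad                        # move against the gradient

print(round(w, 3))   # close to 3.0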
Regularization

• Regularization is a technique used in machine learning to reduce overfitting and improve a model’s ability to generalize to unseen data.

• Overfitting happens when a model learns the training data too well —
including noise — and performs poorly on new data.

• Regularization works by adding a penalty term to the loss function, discouraging the model from fitting overly complex patterns.

Types :

1. L1 Regularization (Lasso) – adds the sum of absolute weight values (λ · Σ|w|) to the loss, pushing many weights to exactly zero.

2. L2 Regularization (Ridge / weight decay) – adds the sum of squared weights (λ · Σw²) to the loss, shrinking weights toward zero.

3. Dropout – randomly disables neurons during training.

4. Early Stopping – stops training when performance on validation data stops improving.

Example :
With L2 regularization, the training objective becomes Loss = MSE + λ · Σw². If λ = 0.01, a model with very large weights pays a noticeable penalty, so training favours smaller weights and a smoother, better-generalizing model.
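
A minimal Python sketch of the L2 example above (the data, weights, and λ value are assumed for illustration):

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

weights = np.array([0.5, -1.2, 3.0])          # model weights (assumed)
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8, 0.7])

lam = 0.01                                    # regularization strength (lambda)
l2_penalty = lam * np.sum(weights ** 2)       # L2 (weight decay) term
total_loss = mse(y_true, y_pred) + l2_penalty
print(total_loss)                             # data loss + penalty on large weights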

For Activation Functions, refer to the PPT.


MOD 3

Autoencoders

• Autoencoders are a special type of neural network used for unsupervised learning, mainly for data compression (encoding) and reconstruction (decoding).

• They learn to map input data to a lower-dimensional representation (called a latent space or code) and then reconstruct the original data from it.

• Structure:

1. Encoder: Compresses the input into a smaller code.

2. Latent Space: Stores compressed data representation.

3. Decoder: Reconstructs data from the code.

Working of an Autoencoder

1. Structure:
An autoencoder is a type of neural network with two main parts:

• Encoder → Compresses the input into a smaller representation called the latent code or bottleneck.

• Decoder → Expands the latent code back to reconstruct the original input.

The network is trained so that the output is as close as possible to the input.

2. Step-by-Step Working:

Step 1 – Input Data

• You give the autoencoder raw input data (e.g., an image, text vector, or
audio signal).

• Example: A 28×28 grayscale image → 784 input neurons.


Step 2 – Encoding

• The encoder applies linear transformations (matrix multiplication) and non-linear activations (like ReLU, sigmoid) to compress the input.

• This produces a latent vector (compressed version) containing the most important features.

• Example: 784 → 128 → 64 → 32 neurons (bottleneck layer).

Step 3 – Latent Space

• The bottleneck layer holds a compact representation of the data.

• This is where important patterns (edges, shapes, etc.) are stored in a smaller form.

Step 4 – Decoding

• The decoder takes the latent vector and applies transformations to expand
it back to the original size.

• Example: 32 → 64 → 128 → 784 neurons.

Step 5 – Reconstruction Loss

• The network compares the reconstructed output to the original input using a loss function like Mean Squared Error (MSE) or Binary Cross-Entropy.

• Loss = Difference between original input and reconstructed output.

Step 6 – Training (Backpropagation)

• Using the loss, the network updates its weights via gradient descent so
that the reconstruction becomes more accurate over time.

3. Example Flow – Image Compression:

1. Input: A 28×28 handwritten digit image (784 pixels).

2. Encoding: Reduce it to 128 → 64 → 32 neurons (compressed form).


3. Latent Space: Stores only essential features of the digit (e.g., shape of
curves).

4. Decoding: Expand it back 32 → 64 → 128 → 784 neurons.

5. Output: A reconstructed digit that looks almost identical to the input.

6. Use Case: Store the compressed 32-value representation instead of the 784-pixel image to save space.
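
Here is a minimal PyTorch sketch of the 784 → 128 → 64 → 32 → 64 → 128 → 784 autoencoder described above (the training data and hyperparameters are assumed; this is an illustrative sketch, not the exact model from the notes):

import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32),                  # bottleneck / latent code
)
decoder = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),  # pixel values in [0, 1]
)

autoencoder = nn.Sequential(encoder, decoder)
loss_fn = nn.MSELoss()                  # reconstruction loss
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

x = torch.rand(16, 784)                 # a batch of 16 flattened 28x28 images (dummy data)
for epoch in range(10):
    x_hat = autoencoder(x)              # encode then decode
    loss = loss_fn(x_hat, x)            # compare reconstruction with the input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()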

Types of Autoencoders :

1. Linear Autoencoder

Definition:
A linear autoencoder is an autoencoder that uses only linear transformations,
like matrix multiplication and addition, without applying any nonlinear
activation functions. It can only capture straight-line (linear) relationships in the
data.

Working:
The encoder compresses input data into a smaller code using linear equations,
and the decoder reconstructs it back. Since it is linear, its behaviour is similar to
Principal Component Analysis (PCA), focusing on variance preservation rather
than complex patterns.

Example:
If you have 4 numerical features in a dataset, a linear autoencoder can reduce
them to 2 features while keeping the most important information, just like PCA
would.

2. Undercomplete Autoencoder

Definition:
An undercomplete autoencoder has a latent space with fewer dimensions than
the input. This limits the network’s capacity and forces it to learn only the most
important features.
Working:
It compresses the input into a smaller code, which cannot store all details. This
restriction makes the model focus on essential patterns instead of memorizing
data. Such models are useful for compression and dimensionality reduction.

Example:
A 28×28 image (784 pixels) can be encoded into just 64 values, making it
possible to store or transmit the image in a smaller size while still keeping
important details.

3. Overcomplete Autoencoder

Definition:
An overcomplete autoencoder has a latent space with more dimensions than the
input, giving it more capacity to store information.

Working:
If no restrictions are applied, it might simply copy the input instead of learning
meaningful patterns. To prevent this, regularization methods like sparsity or
dropout are added so it still learns useful representations.

Example:
A 784-pixel image might be expanded to a 1024-value latent vector to capture
finer details for image generation, while using constraints to avoid
memorization.

4. Sparse Autoencoder

Definition:
A sparse autoencoder uses a large latent space but ensures that most neurons
remain inactive (values close to zero). This is achieved by adding a sparsity
penalty in the loss function.

Working:
Even though there are many neurons, the model activates only a few for each
input. This makes the representation more efficient and focused on the most
important features.
Example:
A text document could be encoded into a 100-value vector where only 5
positions have non-zero values, helping in feature selection for NLP tasks.

5. Denoising Autoencoder

Definition:
A denoising autoencoder is trained to remove noise from an input and
reconstruct the original clean version.

Working:
It takes a noisy version of the input, encodes it into a latent space, and decodes
it into a cleaner version. The model learns to focus on essential patterns rather
than memorizing noise.

Example:
A blurred or noisy photograph of a handwritten digit can be cleaned up by a
denoising autoencoder to produce a clear and readable digit image.
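
A short sketch of the denoising setup in PyTorch (the noise level is assumed), reusing an autoencoder like the one sketched earlier: the model sees the noisy input, but the loss is computed against the clean original:

import torch

clean = torch.rand(16, 784)                    # clean images (dummy data)
noisy = clean + 0.2 * torch.randn_like(clean)  # add Gaussian noise
noisy = noisy.clamp(0.0, 1.0)                  # keep pixel values in [0, 1]

# x_hat = autoencoder(noisy)                   # forward pass on the NOISY input
# loss = loss_fn(x_hat, clean)                 # reconstruct the CLEAN image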

6. Contractive Autoencoder

Definition:
A contractive autoencoder learns representations that are stable even when small
changes are made to the input.

Working:
It adds a penalty term that reduces sensitivity of the latent space to slight
variations in the input. This makes the features more robust and useful for
classification tasks.

Example:
If one pixel in a cat image is changed, the contractive autoencoder will still
produce almost the same encoded vector, focusing on overall shapes and
patterns instead of tiny details.
Regularization in Autoencoders

Definition:
Regularization in autoencoders is a set of techniques used to guide the network
toward learning useful, generalizable features instead of memorizing the
training data. It works by adding constraints or penalties during training,
making the model avoid overfitting and focus on patterns that truly represent the
data. This is especially important when the autoencoder has high capacity (e.g.,
overcomplete latent space).

Working:
Without regularization, an autoencoder might take the “shortcut” of copying the
input directly to the output, which makes the latent code meaningless.
Regularization forces the network to represent information in a compressed or
meaningful way by restricting its behavior. Some common methods include:

• Sparsity Constraint → Adds a penalty if too many latent neurons are active, ensuring only the most important ones fire.

• Weight Decay (L1/L2) → Penalizes large weights so the network doesn’t rely on very strong connections.

• Dropout → Randomly turns off neurons during training, encouraging redundancy and robust learning.

• Denoising → Corrupts input with noise and trains the model to recover
the original, forcing it to focus on essential features.

• Contractive Penalty → Penalizes sensitivity in latent space so that small input changes don’t drastically change the encoding.

Example:
Suppose we train an autoencoder on 28×28 images (784 pixels) but use 1024
neurons in the latent space. Without regularization, the model might memorize
each pixel and reconstruct perfectly without understanding shapes. If we add
sparsity regularization, only a few neurons in the latent layer will activate for
each image. This makes the autoencoder store high-level patterns like edges,
curves, or textures, which are more meaningful and transferable to other tasks
like classification or denoising.
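
As an illustration of the sparsity constraint described above (the 1024-unit latent size follows the example; the penalty weight is assumed), an L1 penalty on the latent activations can be added to the reconstruction loss in PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 1024), nn.ReLU())   # overcomplete latent space
decoder = nn.Sequential(nn.Linear(1024, 784), nn.Sigmoid())

x = torch.rand(16, 784)                        # dummy batch of flattened images
code = encoder(x)                              # latent activations
x_hat = decoder(code)

recon_loss = F.mse_loss(x_hat, x)              # reconstruction term
sparsity_penalty = 1e-3 * code.abs().mean()    # L1 penalty: most activations pushed toward 0
loss = recon_loss + sparsity_penalty           # regularized objective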
