Introduction to Deep Learning 17th January 2025 (2)

The document provides an introduction to deep learning, covering various machine learning techniques for classification, including K-Nearest Neighbours, Bayes Classifier, and neural networks. It discusses deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), along with applications in image classification, video captioning, and sequence-to-sequence mapping tasks. Additionally, it highlights optimization and regularization methods for training deep learning models.


Introduction to Deep Learning

C. Chandra Sekhar
Dept. of Computer Science and Engineering
Indian Institute of Technology Madras
Chennai-600036

[email protected]

Office Room: SSB 407

Regression and Classification Tasks

Machine Learning Model

S.J.D.Prince, Understanding Deep Learning, MIT Press, 2023
Learning Tasks with Structured Outputs

S.J.D.Prince, Understanding Deep Learning, MIT Press, 2023

Machine Learning Techniques for Classification
• K-Nearest Neighbours Method
• Bayes Classifier
– Statistical modeling
– Unimodal distribution modeling
– Multimodal distribution modeling: Gaussian Mixture Model
• Multilayer feedforward neural network based classification
• Support vector machine based classification
• Classification using decision tree
– Random forest based classification
• Classification of sequential or temporal patterns
– Hidden Markov model
Image Classification

Example image classes: Tiger, Giraffe, Horse, Bear
Pattern Classification Tasks in Speech Processing

is bu le Tin ki mu khya sa mA chAr

mu nnAL mu da la mei ccar sel vi jey la li ta

I rO ju vAr ta lo lu mu khyam sa lu

• Speech Recognition • Speech Emotion Recognition

• Speaker Recognition • Spoken Language Identification

Text Processing Tasks

• Sentence classification
• Parts-of-speech tagging
• Named entity recognition
• Sentiment analysis

Classification using Deep Learning Models
Representation Learning: Conventional machine learning techniques (Bayes classifiers, MLFFNNs and SVMs) take hand-designed features as input to the model. Deep learning techniques instead learn the representation (features) directly from the raw data given as input to the model.

Conventional Approaches to Pattern Classification:

Raw Data → Feature Extraction → Representation → Classification Model (Bayes Classifier/MLFFNN/SVM) → Class Label

Deep Learning based Approaches to Pattern Classification:

Raw Data → Feature Extraction and Classification (Deep Convolutional Neural Network) → Class Label
Content based Image Retrieval

• Query-by-example (QBE) Approach

• Suitable method for matching
• Measure of dissimilarity: Distance metric learning
Content based Image Retrieval

• Query-by-semantics (QBS) Approach

• Images in the repository should be annotated
• Image annotation: Multi-label pattern classification
Image Captioning

A group of people shopping at an outdoor market.

There are many vegetables at the fruit stand

O. Vinyals, A. Toshev, S. Bengio and D. Erhan, “Show and tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 652-663, April 2017.

A woman holding a camera in crowd

Fang et al., “From captions to visual concepts and back,” CVPR, 2015.
Video Captioning
● Generate text descriptions by localizing interesting
events in a video.
○ Event detection: Event Proposal Module
○ Event description: Captioning Module

Input video → Event Proposal Module → Proposed events → Captioning Module → Generated captions for different events
Visual Question Answering

• Who is wearing glasses? Man / Woman
• Is there something to cut the vegetables with? Yes / No
• How many children are in the bed? Two / One
Visual Commonsense Reasoning

Deep Learning Models
• Deep Feedforward Neural Networks (DFNNs)
• Stacked Autoencoder based Pre-training for DFNNs
• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
• Long Short Term Memory (LSTM) Networks
• Attention based Models: Transformers
– Pre-training of transformer model: BERT
• Generative Models
– Generative Pre-trained Transformers (GPT)
– Variational Autoencoders
– Generative Adversarial Networks (GANs)
– Diffusion Models

Multilayer Feedforward Neural Network
• Architecture of an MLFFNN
– Input layer: Linear neurons
– Hidden layers (1 or 2): Sigmoidal neurons
– Output layer: Sigmoidal neurons or Softmax neurons

Network diagram: inputs x1, x2, …, xd enter input-layer nodes 1, …, d (index i); these connect to hidden-layer nodes 1, …, J (index j), which connect to output-layer nodes 1, …, K (index k) producing outputs s1, s2, …, sK.
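The layer structure above can be sketched as a forward pass. This is a minimal pure-Python illustration, assuming one hidden layer of sigmoidal neurons and a softmax output layer; the function names and the toy weights are hypothetical and are not taken from the slides.

```python
import math


def sigmoid(a):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-a))


def softmax(values):
    """Softmax over a list, shifted by the maximum for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def layer(inputs, weights, biases):
    """Weighted sums of one layer; weights[j][i] connects input i to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlffnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Linear input layer -> sigmoidal hidden layer -> softmax output layer."""
    hidden = [sigmoid(a) for a in layer(x, w_hidden, b_hidden)]
    return softmax(layer(hidden, w_out, b_out))


# Toy network (assumed sizes): d = 2 inputs, J = 3 hidden nodes, K = 2 outputs.
x = [0.5, -1.0]
w_hidden = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
b_hidden = [0.0, 0.0, 0.0]
w_out = [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]
b_out = [0.0, 0.0]
s = mlffnn_forward(x, w_hidden, b_hidden, w_out, b_out)
print(s)  # K class scores that sum to 1
```

With softmax output neurons the K outputs can be read as class probabilities; with sigmoidal output neurons each output would lie in (0, 1) independently.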
Deep Feedforward Neural Network
(DFNN)

Block diagram: Input X → Input Layer → Hidden Layers 1 to 5 → Output Layer → Output S
Optimization Methods for Training a DFNN

• Slow convergence of gradient descent method

• Problem addressed: How to reduce the number of
epochs taken to reach a local minimum?
• Weight update methods that use the past history
of updates have been shown to be effective.
• Generalized delta rule that uses momentum
factor
• Weight-specific learning rate scheduling methods
(Adaptive learning rate methods)
– AdaGrad
– RMSProp
– AdaDelta
– Adam
• Second-order methods for optimization
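Two of the update rules listed above can be sketched on a one-dimensional problem. This is an illustrative sketch only: the quadratic error function, the learning rates and the step counts are assumptions chosen for the demo, and the Adam constants are the commonly used defaults.

```python
import math


def grad(w):
    """Gradient of the toy error function E(w) = (w - 3)^2 (assumed for the demo)."""
    return 2.0 * (w - 3.0)


def momentum_descent(w, lr=0.1, alpha=0.9, steps=300):
    """Generalized delta rule: the update reuses the previous update via momentum alpha."""
    delta = 0.0
    for _ in range(steps):
        delta = alpha * delta - lr * grad(w)
        w += delta
    return w


def adam_descent(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: per-weight step size from running averages of the gradient and its square."""
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias correction for the second moment
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_momentum = momentum_descent(0.0)
w_adam = adam_descent(0.0)
print(w_momentum, w_adam)  # both approach the minimum at w = 3
```

Both methods use the past history of gradients: momentum accumulates past updates, while Adam additionally scales each weight's step by a running estimate of the squared gradient.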
Regularization Methods for Training a DFNN

• Underfitting: Model complexity is low

• Overfitting:

– Model complexity is high
– Training dataset size is small

• L2 regularization method

• Dropout method

• Drop connect method

• Batch normalization
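Two of the regularization methods above can be sketched in a few lines. This is a hedged illustration: the "inverted dropout" scaling used here is one common formulation and is an assumption, as are the function names and the penalty coefficient `lam`.

```python
import random


def l2_penalty(weights, lam):
    """L2 regularization term added to the training error: (lam / 2) * sum of w^2."""
    return 0.5 * lam * sum(w * w for w in weights)


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each activation with probability drop_prob during
    training and rescale the survivors so the expected value is unchanged."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 8
dropped = dropout(hidden, 0.5, rng)              # training: entries are 0.0 or 2.0
kept = dropout(hidden, 0.5, rng, training=False)  # inference: unchanged
penalty = l2_penalty([1.0, 2.0], 0.1)
print(dropped, kept, penalty)
```

Drop connect differs in that individual weights, rather than whole node outputs, are set to zero at random.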
Auto-Association Neural Network (AANN)
Network diagram: Input Layer (x1, x2, x3, …, xd) → Encoder → Dimension Reduction Layer → Decoder → Output Layer. Actual output: s1, s2, s3, …, sd. Desired output: x1, x2, x3, …, xd.

• AANN uses linear neurons in the Input layer, Dimension reduction layer and Output layer. It uses sigmoidal neurons in the other two hidden layers.
• AANN is trained using the backpropagation learning method.
• After the model is trained, the output of the Bottleneck Layer (Dimension reduction layer) is used as the reduced-dimension representation of the input.
• Encoder in AANN, also called an autoencoder, is used in deep stacked autoencoder network models.
Auto-Association Neural Network (AANN)

Network diagram: inputs x1, x2, x3, …, xd → Encoder → Decoder → outputs s1, s2, s3, …, sd
Multiple AANNs for Stacked Autoencoder
AANN 1: Input x (dimension d) → Encoder 1 → Bottleneck features z1 (dimension l1) → Decoder 1 → Desired output x (dimension d)
AANN 2: Input z1 (dimension l1) → Encoder 2 → Bottleneck features z2 (dimension l2) → Decoder 2 → Desired output z1 (dimension l1)
AANN 3: Input z2 (dimension l2) → Encoder 3 → Bottleneck features z3 (dimension l3) → Decoder 3 → Desired output z2 (dimension l2)
Stacked Autoencoder for Pre-training a DFNN

Block diagram: Input X → Autoencoder 1 → Autoencoder 2 → Autoencoder 3 → Output Layer → Output S

• Weights of the autoencoders are learnt using unsupervised learning with unlabeled examples. These weights are used as the initial weights for the DNN.
• Fine-tuning of the DNN modifies the weights using the backpropagation learning method with a small set of labeled examples.
Convolutional Neural Networks (CNNs)

• Convolutional neural network (CNN) is a special type of multilayer feedforward neural network (MLFFNN) that is well suited for image classification.
• Development of CNN is neuro-biologically motivated.
• A CNN is an MLFFNN designed specifically to recognize 2-dimensional shapes with a high degree of invariance to translation, scaling, skewing and other forms of distortion.

S. Haykin, Neural Networks and Learning Machines, Prentice-Hall of India, 2011

LeNet5: CNN for
Handwritten Character Recognition
Input 32x32 → Convolution → 6 feature maps 28x28 → Pooling → 6 feature maps 14x14 → Convolution → 16 feature maps 10x10 → Pooling → 16 feature maps 5x5 → Output 26

• Input: 32x32 pixel image of a character centered and normalized in size

• Weight sharing: All the nodes in a feature map in a convolutional layer have the same
synaptic weights (~278000 connections, but only ~1700 weight parameters)

• Output layer: 26 nodes with one node for each character. Each node in the output layer is
connected to the nodes in all the feature maps in the 4th hidden layer.
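The feature-map sizes above follow from simple arithmetic. The sketch below assumes 5x5 convolution kernels with stride 1 and no padding, and 2x2 non-overlapping pooling; these values are not stated on the slide but are consistent with the 32 → 28 → 14 → 10 → 5 progression. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Side length of a feature map after a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window):
    """Side length after non-overlapping pooling with a square window."""
    return size // window


size = 32
sizes = [size]
for kind, k in [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2)]:
    size = conv_output_size(size, k) if kind == "conv" else pool_output_size(size, k)
    sizes.append(size)
print(sizes)  # [32, 28, 14, 10, 5]

# Weight sharing in the first convolutional layer: each of the 6 feature maps
# has one 5x5 kernel plus one bias, reused at all 28x28 positions.
c1_params = 6 * (5 * 5 + 1)
c1_connections = c1_params * 28 * 28
print(c1_params, c1_connections)  # 156 122304
```

The gap between the two numbers for this single layer shows why the total number of connections in the network is far larger than the number of weight parameters.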

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, November 1998.
CNN Models for Image Classification

• Image Classification (on ImageNet data):

• AlexNet
• VGG-Net
• ResNet
• GoogLeNet
• PReLU-Net
• Batch Normalization(BN)-Inception-ResNet

• W. Rawat and Z. Wang, “Deep convolutional neural networks for image classification:
A comprehensive survey,” Neural Computation, vol.29, pp.2352-2449, 2017.

VGG-Net Architecture
• Deep CNN developed by the Visual Geometry Group (VGG) of the University of Oxford
• Task: Classification of color images belonging to 1000 classes in the ImageNet dataset
U-Net for Image Segmentation

O.Ronneberger, P.Fischer, and T.Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation ”, arXiv, 2015.

Faster Region-based CNN (Faster R-CNN)
for Object Detection

S. Ren, K. He, R. Girshick and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv, 2016.
Image Captioning

A group of people at an outdoor market.

O. Vinyals, A. Toshev, S. Bengio and D. Erhan, “Show and tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 652-663, April 2017.

A woman holding a camera in crowd

Fang et al., “From captions to visual concepts and back,” CVPR, 2015.
Encoder-Decoder Paradigm
for Image Captioning

Image → Deep Convolutional Neural Network (DCNN) as Encoder → Representation of Image → Recurrent Neural Network (RNN) as Decoder → Caption
Recurrent Neural Network (RNN)
Block diagram: Input at time t (𝒙𝒕) and previous hidden state (𝒉𝒕−𝟏) → Hidden Layer → State of hidden layer at time t (𝒉𝒕) → Output Layer → Output at time t (𝒔𝒕)

• The hidden layer uses sigmoidal neurons

• The state of hidden layer (outputs of nodes in the hidden layer) at
time t, 𝒉𝒕 , is dependent on the input at time t and the state of the
hidden layer at time t-1.
• The RNN that uses sigmoidal neurons in its hidden layer is shown
to have the vanishing and exploding gradients problem, due to
which the convergence during training is slow.
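The recurrence described above can be sketched as follows. This is a minimal illustration, assuming the common form h_t = sigmoid(W_xh x_t + W_hh h_(t-1) + b_h); the weight names and toy values are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """New hidden state from the current input and the previous hidden state."""
    h_t = []
    for j in range(len(b_h)):
        a = b_h[j]
        a += sum(w * x for w, x in zip(w_xh[j], x_t))
        a += sum(w * h for w, h in zip(w_hh[j], h_prev))
        h_t.append(sigmoid(a))
    return h_t


def run_rnn(xs, h0, w_xh, w_hh, b_h):
    """Apply the same weights at every time step and collect the hidden states."""
    states = []
    h = h0
    for x_t in xs:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


def gradient_bound(w, steps):
    """Upper bound on |d h_t / d h_(t-steps)| for a scalar sigmoidal RNN:
    the sigmoid slope is at most 0.25, so each step contributes at most 0.25 * |w|."""
    return (0.25 * abs(w)) ** steps


xs = [[1.0], [0.0], [-1.0]]
w_xh = [[0.5], [-0.5]]
w_hh = [[0.1, 0.2], [0.3, 0.4]]
b_h = [0.0, 0.0]
states = run_rnn(xs, [0.0, 0.0], w_xh, w_hh, b_h)
print(states)
print(gradient_bound(1.0, 10))  # shrinks geometrically with the number of steps
```

The bound illustrates the vanishing-gradient effect for a scalar state: with a recurrent weight of magnitude below 4, the gradient through many time steps decays geometrically.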

Long Short-Term Memory (LSTM)
• Structure of an LSTM Cell

• The RNN that uses LSTM neurons in its hidden layer is shown to avoid the vanishing gradients problem, leading to faster convergence during training
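Since the cell diagram is not reproduced here, the following sketch shows one commonly used formulation of an LSTM cell with forget, input and output gates. It is an assumption that the slide's figure matches this formulation; the scalar state and the parameter layout are simplifications made for the example.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each of "forget", "input", "output", "candidate" to a tuple
    (w_x, w_h, b) of input weight, recurrent weight and bias (hypothetical layout).
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))      # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))       # how much of the candidate to write
    o = sigmoid(pre_activation("output"))      # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate cell content
    c = f * c_prev + i * g                     # additive cell-state update
    h = o * math.tanh(c)
    return h, c


zero = (0.0, 0.0, 0.0)
params = {"forget": zero, "input": zero, "output": zero, "candidate": zero}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, params=params)
print(h, c)
```

The additive update of the cell state `c` is what lets gradients pass across many time steps without being repeatedly squashed by a sigmoid.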
Encoder-Decoder Paradigm
for Image Captioning

Image → Encoder (VGG-Net) → Decoder (LSTM)
Decoder input: <startseq> young person drawing face on sheet
Decoder output: young person drawing face on sheet <endseq>

• The output of the pre-final layer in the CNN based encoder is used as the initial state of the hidden layer of LSTM based decoder
Embedding Methods

• Image Embedding Methods

• Output of pre-final layer of a deep CNN

• Word Embedding Methods

• Word2Vec
• GloVe
• FastText

Sequence-to-Sequence Mapping Tasks
• Neural Machine Translation: Translation of a sentence in
the source language to a sentence in the target
language
• Input: A sequence of words
• Output: A sequence of words
• Video Captioning: Generation of a sentence as the
caption for a video represented as a sequence of frames
• Input: A sequence of feature vectors extracted from
the frames of a video
• Output: A sequence of words
• Each of the above tasks involves mapping an input
sequence to an output sequence

Encoder-Decoder Paradigm for
Sequence-to-Sequence Mapping

Input Sequence → Recurrent Neural Network (RNN) as Encoder → Representation of Input Sequence → Recurrent Neural Network (RNN) as Decoder → Output Sequence
Encoder-Decoder Paradigm for
Sequence-to-Sequence Mapping
• Sequence-to-Sequence Mapping using Encoder-Decoder Paradigm
• Encoder: Generate a representation of the input sequence
• Representation generated by Encoder is given as input to Decoder
• Decoder: Generate the output sequence (A sequence of words)

• Relationship among the elements of a sequence:

• Typically, an element in the input sequence is related to a few other
elements in the input sequence
• Typically, a word in the output sequence to be generated is related to a few
elements in the input sequence

• LSTM based approach to Sequence-to-Sequence Mapping

• Bidirectional LSTM based Encoder captures dependencies among elements in
the input sequence
• Bidirectional LSTM based Decoder captures dependencies among elements in
the output sequence
• Attention mechanism is introduced to capture dependencies of elements in
the output sequence on elements in the input sequence

• Training the LSTM based Sequence-to-Sequence mapping systems is computationally intensive, and there is not much scope for parallelization of operations in the training process
Attention based Models for
Sequence-to-Sequence Mapping

• Attention based models try to capture and use

• Relations among elements in the input
sequence (Self-Attention)
• Relations among elements in the output
sequence (Self-Attention)
• Relations between elements in the input
sequence and elements in the output
sequence (Cross-Attention)
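The two kinds of attention above can be sketched with scaled dot-product attention. This is a simplified illustration: it omits the learned query, key and value projections and the multiple heads of a full transformer, and the toy vectors are made up for the example.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, a weighted average of the values, with weights given by
    softmax(q . k / sqrt(d_k)) over all keys."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy input-sequence elements
target = [[0.5, 0.5], [1.0, 0.0]]              # toy output-sequence elements

# Self-attention: queries, keys and values all come from the same sequence.
self_attended = scaled_dot_product_attention(source, source, source)

# Cross-attention: queries come from the output sequence,
# keys and values come from the input sequence.
cross_attended = scaled_dot_product_attention(target, source, source)
print(self_attended, cross_attended)
```

Every query is processed independently of the others, which is what allows attention over a whole sequence to be computed in parallel.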

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” NIPS, 2017.
Attention-based Model: Transformer

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” NIPS, 2017.
Pre-training of Transformer

Encoder and/or decoder of a transformer can be pre-trained using a large amount of unlabeled data, and then fine-tuned using a small amount of labeled data for a downstream task.

• Encoder pre-training for text data
o Bidirectional Encoder Representations from Transformers (BERT)

• Decoder pre-training for text data
o Generative Pre-trained Transformer (GPT)
Bidirectional Encoder Representations from
Transformers (BERT)
• Pre-train the generic
representation for several
Natural Language Processing
(NLP) tasks

• Pre-training Methods:
• Masked Language Modelling
(Mask LM)
• Next Sentence Prediction
(NSP)

• Fine-tuned for tasks such as

• Sentence classification
• Sentence relationship
• Textual question answering

Image source : BERT(Devlin et al., 2019)

J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” NAACL, 2019.
Generative Pre-trained Transformer (GPT)
• Transformer decoder is pre-trained using unlabeled text data
• GPT can be fine-tuned for downstream tasks that involve text
data
• Auto-regressive model: A word in a sentence is predicted
using all the words preceding that word in the sentence
• Masked multi-head self-attention (MSA) in each layer of
transformer decoder takes the sequence of words preceding a
word in a sentence.
• The decoder is trained to predict the next word in the
sentence.
• GPT-1, GPT-2 and GPT-3: Pre-trained models with different
number of layers trained with different corpora for different
pre-training tasks
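The masking used in the decoder's self-attention can be sketched as follows. This is an illustrative sketch, assuming raw attention scores are already available as a square matrix; the function name is hypothetical.

```python
import math


def causal_attention_weights(scores):
    """scores[i][j] is the raw score of position i attending to position j.
    Position i may attend only to positions j <= i; later positions get weight 0."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


# With equal raw scores, each position spreads its attention uniformly
# over itself and the positions before it.
w = causal_attention_weights([[0.0, 0.0, 0.0] for _ in range(3)])
for row in w:
    print(row)
```

Because no position can see later words, the representation at each position can be trained to predict the next word without leaking the answer.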

A. Radford, K. Narasimhan, T. Salimans and I. Sutskever, “Improving Language Understanding by Generative Pre-training,” 2018

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” 2019

T. Brown et al., “Language Models are Few-Shot Learners,” arXiv:2005.14165v4, 22nd July, 2020
Visual Question Answering (VQA) for Images

• Who is wearing glasses? Man / Woman
• Is there something to cut the vegetables with? Yes / No
• How many children are in the bed? Two / One
Open Ended VQA

Image VQA Framework

Image → Image Encoder → Representation of Image
Question (Text), e.g. “How many children are in the bed?” → Question Encoder → Representation of Question
Representation of Image + Representation of Question → Fusion of Representations → Answer Generator → Answer (e.g. “Two”)

Image Encoder: CNN, ViT Encoder, Swin Transformer

Question Encoder: LSTM, Transformer encoder, BERT fine-tuned with questions in VQA dataset

Fusion of Representations: Concatenation, Co-attention transformer

Answer Generator: Classifier, Text generator such as GPT fine-tuned with answers in VQA dataset
Open Ended VQA Framework

Question and Partial Answer → Transformer Decoder → Representation of Question and Partial Answer
Image → Image Encoder → Representation of Image
Both representations → Fusion of Representations → Next Word Predictor → Next word

In open ended VQA, the answer is a sequence of words. The system generates
one word of the answer at a time. The next word in the answer is predicted using
the representations of image, question, and the partial answer corresponding to
the sequence of words generated so far.
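The word-by-word generation described above can be sketched as a loop. The predictor here is a canned stand-in, not a trained model; `predict_next_word`, the `<end>` token and the toy answer are all hypothetical names introduced for the illustration.

```python
def generate_answer(image_repr, question_tokens, predict_next_word,
                    max_len=10, end_token="<end>"):
    """Generate the answer one word at a time. Each prediction sees the image,
    the question, and the partial answer generated so far."""
    answer = []
    for _ in range(max_len):
        word = predict_next_word(image_repr, question_tokens, answer)
        if word == end_token:
            break
        answer.append(word)
    return answer


def toy_predictor(image_repr, question_tokens, partial_answer):
    """Stand-in for encoder + fusion + next word predictor: returns a fixed
    answer one word at a time, based only on how many words exist so far."""
    canned = ["two", "children", "<end>"]
    return canned[len(partial_answer)]


question = ["how", "many", "children", "are", "in", "the", "bed", "?"]
answer = generate_answer(None, question, toy_predictor)
print(answer)  # ['two', 'children']
```

In a real system the predictor would return the most probable word (or a sampled word) from the next word predictor's output distribution.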

A. M. Bellini, N. Parde, M. Matteucci and M. J. Carman, “Towards Open-Ended VQA Models using Transformers,” EMNLP, 2020.
Generative Models

• Models capable of generation of data (Text, Image, Video, Music)
• Restricted Boltzmann machine (RBM)
• Variational autoencoder
• Generative pre-trained transformer (GPT)
• Large Language Models (LLMs)
• Generative adversarial network (GAN)
• Diffusion models
• Text-to-image
• Text-to-video
• Text-to-audio
• Text-to-music
LLMs: Evolution of GPT Models

NLP Benchmarks:

LAMBADA: LAnguage Modeling Broadened to Account for Discourse Aspects

GLUE: General Language Understanding Evaluation

SQuAD: Stanford Question Answering Dataset

Image Generation
Image-to-Image Translation
Sketch-to-Image Generation
Denoising Diffusion Models
for Image Generation

Forward process (Data → Noise): Destructing data by adding progressively increasing levels of noise
Reverse process (Noise → Data): Generating a new sample by denoising
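The noising direction can be sketched with the closed-form sampling used in denoising diffusion probabilistic models. This is a hedged illustration: the linear noise schedule, its endpoints and the number of steps are assumptions for the example, not values given in the slides, and the learned denoising network of the reverse direction is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that increase linearly over the diffusion steps (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar[t] is the product of (1 - beta) up to step t: the fraction of
    signal variance that survives after t noising steps."""
    out = []
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noisy_sample(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
x_early = noisy_sample(x0, 10, alpha_bars, rng)   # still close to the data
x_late = noisy_sample(x0, 999, alpha_bars, rng)   # almost pure noise
print(alpha_bars[10], alpha_bars[999])
```

Generation runs in the opposite direction: starting from pure noise, a trained network removes a little noise at each step until a new sample is obtained.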

L.Yang et al., “Diffusion Models: A Comprehensive Survey of Methods and Applications,” arXiv, 2023.

Text-to-Image Translation

Output of Stable Diffusion Model
Word prompts: a dream of time gone by, oil painting, red blue white, canvas, watercolor, koi fish, and animals.
Retrieval Augmented Generation (RAG)
RAG Process for Textual Question Answering

Yunfan Gao et al, “Retrieval-Augmented Generation for Large Language Models: A Survey”, arxiv:
2312.10997v5 [cs.CL] 27 March 2024
Coverage of Topics
1. Introduction to deep learning
2. Feedforward neural networks: Model of an artificial neuron, Activation functions: Sigmoidal function, Rectified linear unit (ReLU) function, Softmax function, Multi-layer feedforward neural network, Backpropagation method, Gradient descent method, Stochastic gradient descent method
3. Optimization and regularization methods for deep feedforward neural networks (DFNNs): Optimization methods: Generalized delta rule, AdaGrad, RMSProp, AdaDelta, Adam, Second order methods; Regularization methods: Dropout, Dropconnect; Batch normalization
4. Autoencoders: Autoassociative neural network, Stacked autoencoder, Greedy layer-wise training, Pre-training of a DFNN using a stacked autoencoder
Coverage of Topics (Contd.)
5. Convolutional neural networks (CNNs): Basic CNN architecture, Deep CNNs for image classification: LeNet, VGGNet, GoogLeNet, ResNet; CNNs for image segmentation: U-Net and Fast RCNN; 1-d CNNs, 3-d CNNs
6. Recurrent neural networks (RNNs): Architecture of an RNN, Unfolding an RNN, Backpropagation through time, Vanishing and exploding gradient problems in RNNs, Long short term memory (LSTM) units, Gated recurrent units, Bidirectional RNNs
7. Embedding methods: Image and video embedding methods; Word embedding methods: Word2Vec, GloVe
Coverage of Topics (Contd.)
8. Transformer models: Attention based models, Scaled dot-product attention, Multi-head attention (MHA), Self-attention MHA, Cross-attention MHA, Position encoding, Encoder module in a transformer, Decoder module in a transformer, Sequence to sequence mapping using transformer, Bidirectional encoder representations from transformers (BERT) model for text processing, Pre-training a BERT model, Fine-tuning, Generative pre-trained transformer (GPT), Introduction to large language models (LLMs)
9. Generative Models: Variational autoencoder (VAE), Generative adversarial networks (GANs), Introduction to diffusion models
Books and Evaluation Pattern
Text Books:
1. C.M.Bishop and H.Bishop, Deep Learning: Foundations and Concepts, Springer,
2024
2. Charu C. Aggarwal, Neural Networks and Deep Learning, Springer, 2nd Ed., 2023
3. S.J.D.Prince, Understanding Deep Learning, MIT Press, 2023

Reference Books:
1. I.Goodfellow, Y.Bengio and A.Courville, Deep Learning, MIT Press, 2016
2. U.Kamath, J.Liu and J.Whitaker, Deep Learning for NLP and Speech Recognition,
Springer, 2019
3. Nithin Buduma, Nikhil Buduma, Joe Papa, Fundamentals of Deep Learning,
O’Reilly, 2nd Ed., 2022
4. I.Drori, The Science of Deep Learning, Cambridge University Press, 2022

Evaluation Pattern (Tentative)

•Assignments: 30%
•Midsem Examination: 25% (12noon to 1.30PM, Friday, 14th March, 2025)
•Endsem Examination: 45% (9AM to 12noon, Thursday, 8th May, 2025)
