Unit – IV
Deep Neural Networks, Convolutional Neural Networks
Deep Learning
• MLP has the advantage that the first layer (feature
extraction) and the second layer (how those features are
combined to predict the output) are learned together in a
coupled and supervised manner
• MLP with multiple hidden layers can learn more
complicated functions of the input.
• In deep neural networks, starting from the raw input, each hidden layer combines the values in its preceding layer and learns more complicated functions of the input.
Deep Learning Key Ideas
• Hierarchical learning
– Increasing abstraction levels (bottom up or
top down?)
• Structure/Pattern automatically discovered
during training
– Avoids feature engineering
– A reason DL is preferred over SVMs, etc.
Deep Neural Networks
Intro to DNNs
• Deep Feedforward Neural Networks
(depth > 3 hidden layers, maybe)
– No loops; networks with loops are Recurrent Neural Networks
• Loops: the output depends on previous inputs, i.e., the output has a memory functionality
• Goal: approximate a function
– I.e., learn the mapping between input and output
Intro to DNNs
• Defines a mapping y = f(x; θ) and learns the parameters θ
• Can also be represented as a directed acyclic graph (DAG)
– A composition of functions, e.g. f(x) = f^(3)(f^(2)(f^(1)(x)))
• f^(1) is the first layer, f^(2) the second layer, and so on
• The length of this chain is the depth of the network
– The dimension of the hidden layers is the WIDTH of the model
Intro to DNNs
• Transform the input features by a
nonlinear operator Φ
• How to choose Φ?
– Transform to some high dimensional space –
good for reducing training error, but will not
generalize
– Manually design Φ (the old style of feature engineering), then use it with an SVM, etc.
– DL: learn Φ as well
Example: Learning XOR
• Assume a regression problem (minimize squared error)
• Define a linear model: f(x; w, b) = x^T w + b
• Solving gives w = 0, b = 0.5, i.e., the output is constant at 0.5
– Hence, this model cannot fit XOR directly
Example: Learning XOR
• Regression will work, but the features have to be transformed first
• The hidden layer h transforms the inputs; the output layer does regression on the transformed space
• The complete model is now f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
• The first layer is a linear transform followed by a nonlinear activation function
Example…
• The rectified linear activation function and the first-layer weights transform the x-space to the h-space
• Hence, a linear model can be fit in the h-space
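As a concrete check of this claim, here is a minimal NumPy sketch that evaluates the two-layer ReLU network on the four XOR inputs, using the hand-picked weights from Goodfellow et al.'s worked example; these particular weight values are illustrative, not the only solution a trained network could find.

```python
# XOR network from the slides: h = ReLU(W^T x + c), y = w^T h + b,
# with the hand-chosen weights from Goodfellow et al. (illustrative).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # the four XOR inputs
W = np.array([[1, 1], [1, 1]])                   # first-layer weights
c = np.array([0, -1])                            # first-layer bias
w = np.array([1, -2])                            # output-layer weights
b = 0.0                                          # output-layer bias

h = np.maximum(0, X @ W + c)   # hidden layer: nonlinear transform to h-space
y = h @ w + b                  # linear regression in h-space

print(y)   # [0. 1. 1. 0.] -- exactly the XOR targets
```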
DNN Concepts
• Gradient based learning – similar to ANN
• Cost functions – to calculate the loss
• Output units – based on type of
classification
• Hidden units – the artificial neurons used in the hidden layers
• Architecture design
• Backpropagation for DNNs
Gradient Based Learning
• The nonlinear transformation makes the loss function non-convex
– No guarantee of reaching the optimum solution
• Initialize weights and biases to small
values
• SGD used for weight update to descend
along the cost function
• Error is measured using a Cost Function
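A minimal sketch of the procedure above: weights initialized to small values, then mini-batch SGD steps against the gradient of a squared-error cost. The toy regression data, batch size, and learning rate are assumptions made for illustration.

```python
# Toy mini-batch SGD: initialize small weights, then repeatedly step
# opposite to the gradient of the (mean squared error) cost.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0]) + 1.0      # toy regression targets (assumed)

w = rng.normal(scale=0.01, size=2)       # small initial weights
b = 0.0
eta = 0.1                                # learning rate (assumed)

for epoch in range(50):
    order = rng.permutation(len(X))
    for start in range(0, len(X), 10):   # mini-batches of 10
        idx = order[start:start + 10]
        err = X[idx] @ w + b - y[idx]                 # prediction error
        w -= eta * (X[idx].T @ err) / len(idx)        # w <- w - eta * dC/dw
        b -= eta * err.mean()                         # b <- b - eta * dC/db
```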
Cost Function
• Directly affects convergence
• Most popular is the cross-entropy loss: L = −Σ_k y_k log ŷ_k
– For two classes this reduces to Binary CE: L = −[y log ŷ + (1 − y) log(1 − ŷ)]
• Calculate Binary CE of each model.
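A short NumPy sketch of this exercise; the labels and predicted probabilities below are made-up numbers, since the models from the original slide are not reproduced here.

```python
# Binary cross-entropy: -mean[ y*log(y_hat) + (1-y)*log(1-y_hat) ]
import numpy as np

y_true = np.array([1, 0, 1, 1])             # ground-truth labels (assumed)
y_pred = np.array([0.9, 0.2, 0.6, 0.95])    # one model's predicted P(y=1) (assumed)

bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(bce)   # approx. 0.2227 -- lower is better, so compare this value across models
```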
Output Units
• Complete the task of hidden layers to produce appropriate output
• Output units affect the training procedure
– If the output saturates for some inputs, training will not proceed
• Linear output units – given h, produce w^T h + b
– Linear, so the output will not saturate
• Sigmoid units – useful for binary classification
– Here and in softmax, log likelihood cancels exp and prevents saturation
• Softmax units – multinomial classification
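A minimal sketch of the three output units listed above, assuming NumPy; the log-softmax form shows how taking the log-likelihood cancels the exponential, which is the anti-saturation point made on the slide.

```python
import numpy as np

def linear_output(h, w, b):
    return h @ w + b                 # w^T h + b: linear, so it never saturates

def sigmoid_output(z):
    return 1.0 / (1.0 + np.exp(-z))  # binary classification: P(y = 1 | x)

def log_softmax(z):
    z = z - z.max()                  # stabilize; does not change the result
    return z - np.log(np.exp(z).sum())   # log cancels exp, preventing saturation
```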
Hidden Units
• How to choose the type of hidden unit to use in the
hidden layers of the model
• Rectified Linear Unit (ReLU) is the most popular
– Similar to linear units, except the output is 0 for negative inputs
– No problem of gradient saturation
• Earlier, sigmoid and tanh were popular for hidden layers
– However, their gradients saturate in all regions except near 0
Architecture Design
• Layer
– a 2D array of artificial neurons
– Some other layers like max pooling to reduce dimensionality
– DNNs are sequence of such layers
• How many such layers and properties of each layer
• Choosing a DNN – belief that the pattern can be modeled as a composition of simpler functions (WARNING – NOT ALL PROBLEMS ARE MEANT FOR DNNs)
– Results show that greater depth generalizes better
Architecture Design
• Skip connections – skip layer i+1 and connect layer i directly to layer i+2
– Allows gradients to flow faster from the output towards the input
• How to connect layers?
– E.g., fully connected?
– Convolutional?
– Receptive Field size?
– Stride?
Backpropagation
Backpropagation Algorithm
• Modify weights incrementally until error is
minimum
– Hence an iterative algorithm
• Choose direction in weight space, where change
in error is maximum
• Recalculate error and continue
• Terminate on number of iterations or minimum
error/change in error
• As the error propagates back towards the input side, the algorithm is called backpropagation
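A schematic of the loop described above, with both stopping rules (iteration budget and small error / small change in error). The helper names compute_error and compute_gradient are placeholders standing in for a forward pass plus error backpropagation, not a specific API.

```python
# Schematic of the iterative weight-update loop: step against the gradient,
# recompute the error, and stop on an iteration budget or on a small
# error / small change in error.
def train(weights, compute_error, compute_gradient,
          eta=0.01, max_iters=10000, tol=1e-6):
    prev_err = float("inf")
    for it in range(max_iters):                      # stop 1: iteration budget
        grad = compute_gradient(weights)             # direction of steepest increase
        weights = weights - eta * grad               # move against it
        err = compute_error(weights)
        if err < tol or abs(prev_err - err) < tol:   # stop 2: small error / change
            break
        prev_err = err
    return weights
```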
Backpropagation for Regression
• Consider the problem of nonlinear regression with a single hidden layer: y^t = Σ_h v_h z_h^t + v_0, where z_h^t = sigmoid(w_h^T x^t)
• Only the primary output has a target – the error can be calculated only as r^t − y^t
• Hence, all other values have to be
expressed in terms of this error
Backpropagation for Regression
• The error function to be minimized over all samples: E(W, v | X) = (1/2) Σ_t (r^t − y^t)^2
• Second-layer weight update using least squares: Δv_h = η Σ_t (r^t − y^t) z_h^t
Backpropagation for Regression
• Applying the chain rule for the first-layer weights: Δw_hj = η Σ_t (r^t − y^t) v_h z_h^t (1 − z_h^t) x_j^t
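A minimal NumPy sketch of these update equations for one sigmoid hidden layer and a single linear output, using batch updates with a fixed learning rate η; the network size, initialization range, and epoch count are assumptions (the hidden-layer bias is omitted for brevity).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_regression(X, r, n_hidden=4, eta=0.05, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.01, 0.01, size=(n_hidden, X.shape[1]))  # hidden weights w_h
    v = rng.uniform(-0.01, 0.01, size=n_hidden)                # output weights v_h
    v0 = 0.0
    for _ in range(epochs):
        z = sigmoid(X @ W.T)          # z_h^t = sigmoid(w_h^T x^t)
        y = z @ v + v0                # y^t = sum_h v_h z_h^t + v_0
        err = r - y                   # r^t - y^t
        dv = eta * (err @ z)          # Delta v_h  = eta * sum_t (r^t - y^t) z_h^t
        dv0 = eta * err.sum()         # Delta v_0  = eta * sum_t (r^t - y^t)
        # Delta w_hj = eta * sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t
        dW = eta * ((err[:, None] * v * z * (1 - z)).T @ X)
        W, v, v0 = W + dW, v + dv, v0 + dv0
    return W, v, v0
```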
Convolutional Networks
CNNs
• Convolutional Neural Networks or CNNs
– Convolution – linear operation
• CNNs: networks that use at least one
convolution operation in at least one layer
• Single most popular DNN variant
• Pooling – used with convolutional layers
• Variants and intuition
Convolution Operation
• Operation on two functions, usually denoted as x*w
• CNN Terminology:
– X: input
– W: Kernel
– Output: Feature map
• Input and Kernel can have more than one dimension
• For discrete inputs, replace the integral with a summation: s(t) = (x ∗ w)(t) = Σ_a x(a) w(t − a)
Convolution Operation
• Input and Kernel data types are referred to as
tensors
• Consider a 2D image input I, and a 2D kernel K
• The convolution operation is given as: S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)
• (Q) What are i, j and m, n?
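A minimal NumPy sketch of this sum. Note that it computes the cross-correlation form, S(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n), which is what most CNN libraries actually implement; flipping the kernel gives the convolution written above. Here i, j index positions in the output feature map and m, n index positions inside the kernel.

```python
import numpy as np

def conv2d(I, K):
    H, W = I.shape
    m, n = K.shape
    out = np.zeros((H - m + 1, W - n + 1))      # "valid" output size
    for i in range(out.shape[0]):               # i, j: output feature map
        for j in range(out.shape[1]):           # m, n: kernel positions
            out[i, j] = np.sum(I[i:i + m, j:j + n] * K)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)    # toy 4x4 input
K = np.array([[1.0, 0.0], [0.0, -1.0]])         # toy 2x2 kernel
print(conv2d(I, K))                             # 3x3 feature map (every entry is -5.0)
```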
2D Convolution Example
CNN
• Sparsity
• Pooling
• Parameter Sharing
• Stride
• Zero padding
• Local connections
• Tiled Convolution
• Typical CNN
Motivation 1: Sparsity
• Sparsity w.r.t. connections
– In a fully connected ANN, the "kernel" is the same size as the input
– A convolutional kernel is much smaller in size
– Why does it work?
• The features to be detected are similar in size to the kernel
• Domain knowledge about the data
• O(W × H) parameters for a fully connected layer versus O(m × n) for a sparse kernel
– Note that W, H >> m, n
Sparse Connection from Below
Sparse Connection from Above
Motivation 2: Parameter Sharing
• There is a SET of kernels
• This set is applied to all positions in the
input
• Hence, a fixed set of kernels is learnt and shared across all regions of the input
– I.e., each region is tested for the same set of features
• This also reduces the number of parameters to be learnt
Parameter Sharing
• (Q) For a sample input and kernel set,
compare the reduction in number of
parameters due to sparse Kernels and
parameter sharing in CNNs.
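A worked version of this question under assumed sizes: a 32×32 single-channel input, a 32×32 output map, and a 3×3 kernel (the sizes are chosen purely for illustration).

```python
# Parameter counts: fully connected vs. sparse (local) vs. sparse + shared.
in_units = 32 * 32      # assumed 32x32 single-channel input
out_units = 32 * 32     # assumed 32x32 output map
k = 3 * 3               # assumed 3x3 kernel

fully_connected = in_units * out_units   # every output sees every input
local_only      = out_units * k          # sparse (local) connections, no sharing
shared_kernel   = k                      # sparse + parameter sharing: one kernel

print(fully_connected, local_only, shared_kernel)   # 1048576, 9216, 9
```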
Equivariance
• Convolution operation follows the equivariance
rule of functions
– f(g(x)) = g(f(x))
• A shift in the location of a feature shifts the output in the same way, so the feature is still detected
– E.g., in time series data, features will be detected
even if shifted in time
– Image – translated across image is still detected
• Note that changes in scale or rotation are not equivariant transforms
Pooling
• Create a statistic of neighbours to replace
a group by a single value
– E.g., L1/L2 norm, the average of the items
– The L-infinity norm (commonly called max pooling)
• Makes the output invariant to small local translations in the input
• Pooling over different convolutions
– Network will learn which transformations to
become invariant to
• Pooling reduces the number of parameters
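A minimal NumPy sketch of 2×2 max pooling with stride 2 (i.e., pooling with downsampling), replacing each 2×2 neighbourhood by its maximum.

```python
import numpy as np

def max_pool2x2(x):
    # Group the array into 2x2 blocks and take the max of each block.
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 3, 2],
              [2, 2, 4, 1]], dtype=float)
print(max_pool2x2(x))   # [[4. 5.]
                        #  [2. 4.]]
```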
Max Pooling – Spatial Invariance
Max Pooling – Learned Invariance
Pooling with Downsampling
Typical CNN Structure
Sample CNNs
Variants of Basic CNNs
• Input is not always 2D
– E.g., colour images have three 2D channels
– Hence, input is 3D Tensor
– Accounting for mini-batch, it is 4D Tensor
• Skip some positions of the Kernel –
STRIDE
– Reduced computations, at the cost of missed
features
– Strides can be different in each dimension
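Stride and zero padding together determine the size of the feature map; a small sketch, assuming the standard formula out = floor((in + 2·pad − kernel) / stride) + 1.

```python
# Output-size arithmetic for strided convolution with zero padding.
def conv_output_size(in_size, kernel, stride=1, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_output_size(32, 3, stride=1, pad=0))   # 30: "valid" convolution
print(conv_output_size(32, 3, stride=1, pad=1))   # 32: "same" output size
print(conv_output_size(32, 3, stride=2, pad=1))   # 16: stride skips positions
```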
Stride
Zero Padding
Local Connections
• Connections are local, but weights are not
shared
• Every connection weight is different
• Can be used when a feature does not appear in all regions of the input
Tiled Convolutions
• Weights are different locally
– I.e., like local connections
• But are reused in different parts of the
input
– I.e., the set of kernels is cycled (rotated) as we move across the image
• Reduces number of parameters to be
learnt/stored