
Deep Learning - Week 7

1. Which of the following statements about L2 regularization is true?

(a) It adds a penalty term to the loss function that is proportional to the absolute
value of the weights.
(b) It results in sparse solutions for w.
(c) It adds a penalty term to the loss function that is proportional to the square of
the weights.
(d) It is equivalent to adding Gaussian noise to the weights.

Correct Answer: (c)


Solution:
It adds a penalty term to the loss function that is proportional to the
square of the weights. L2 regularization, also known as Ridge Regularization,
adds a penalty term to the loss function that is proportional to the sum of the squares
of the weights. The modified loss function typically looks like:
L_reg = L + λ Σ w²

where λ is a hyperparameter that controls the strength of regularization.


Now, let’s analyze the other options:
It adds a penalty term to the loss function that is proportional to the
absolute value of the weights. Incorrect. This describes L1 regularization
(Lasso), not L2.
It results in sparse solutions for w. Incorrect. L2 regularization does not lead
to sparse solutions (i.e., it does not force weights to be exactly zero). Instead, it
shrinks weights toward zero but usually keeps them nonzero. L1 regularization is
the one that encourages sparsity.
It is equivalent to adding Gaussian noise to the weights. Incorrect. While
L2 regularization can be interpreted as a prior in a Bayesian framework (i.e.,
assuming a Gaussian prior on weights), it does not mean that Gaussian noise is
explicitly added to the weights during training.
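To make the penalty concrete, here is a minimal NumPy sketch (not from the original solution) of gradient descent on a ridge-regularized squared loss; the data, λ, and learning rate are illustrative assumptions.

```python
import numpy as np

def ridge_step(w, X, y, lam=0.1, lr=0.01):
    """One gradient step on L(w) = ||Xw - y||^2 + lam * ||w||^2."""
    residual = X @ w - y
    grad = 2 * X.T @ residual + 2 * lam * w   # data-fit gradient + L2 penalty gradient
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                              # toy inputs
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=20)

w = np.zeros(5)
for _ in range(500):
    w = ridge_step(w, X, y)
print(w)   # weights are shrunk toward zero but typically remain nonzero
```

The penalty gradient 2λw shrinks every weight in proportion to its size, which is why L2 regularization does not produce exact zeros (no sparsity), in contrast to L1.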

Common Data Q2-Q3


Consider two models:

f̂1(x) = w0 + w1x

f̂2(x) = w0 + w1x + w2x² + w4x⁴ + w5x⁵

2. Which of these models has higher complexity?

(a) f̂1(x)
(b) f̂2(x)
(c) It is not possible to decide without knowing the true distribution of data points
in the dataset.

Correct Answer: (b)


Solution: Model f̂2(x) has higher complexity than Model f̂1(x). The complexity of a model generally increases with the degree of its polynomial terms. Model f̂1(x) is a linear model, whereas Model f̂2(x) includes higher-degree polynomial terms (up to x⁵), making it capable of capturing more complex patterns. Therefore, f̂2(x) is more complex.

3. We generate the data using the following model:

y = 7x³ + 12x² + x + 2.

We fit the two models f̂1(x) and f̂2(x) on this data and train them using a neural network. Choose the correct statement(s).

(a) f̂1(x) has a higher bias than f̂2(x).
(b) f̂2(x) has a higher bias than f̂1(x).
(c) f̂2(x) has a higher variance than f̂1(x).
(d) f̂1(x) has a higher variance than f̂2(x).

Correct Answer: (a),(c)


Solution: f̂1(x) has a higher bias than f̂2(x), because f̂1(x) is simpler and cannot capture the true complexity of the data. f̂2(x) has a higher variance than f̂1(x), because f̂2(x) is more complex and may fit the training data too closely.
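A minimal NumPy sketch (not part of the original solution) that makes the bias/variance claim measurable: it refits a degree-1 and a degree-5 polynomial on many resampled training sets drawn from the cubic above. The sample sizes and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=30):
    # noisy samples from the assumed data-generating cubic y = 7x^3 + 12x^2 + x + 2
    x = rng.uniform(-1, 1, size=n)
    y = 7 * x**3 + 12 * x**2 + x + 2 + rng.normal(scale=1.0, size=n)
    return x, y

x_test = np.linspace(-1, 1, 200)
true_y = 7 * x_test**3 + 12 * x_test**2 + x_test + 2

preds_deg1, preds_deg5 = [], []
for _ in range(200):                                    # many resampled training sets
    x, y = make_data()
    preds_deg1.append(np.polyval(np.polyfit(x, y, 1), x_test))   # f1_hat: linear
    preds_deg5.append(np.polyval(np.polyfit(x, y, 5), x_test))   # f2_hat: degree 5

for name, preds in [("f1_hat", preds_deg1), ("f2_hat", preds_deg5)]:
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - true_y) ** 2)   # squared bias vs. true curve
    var = np.mean(preds.var(axis=0))                       # variance across training sets
    print(name, "bias^2 ~", round(bias2, 3), "variance ~", round(var, 3))
```

With settings like these, the linear model shows the larger squared bias and the degree-5 model the larger variance, matching options (a) and (c).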

4. Suppose that we apply Dropout regularization to a feedforward neural network. Suppose further that the mini-batch gradient descent algorithm is used for updating the parameters of the network. Choose the correct statement(s) from the following statements.

(a) The dropout probability p can be different for each hidden layer
(b) Batch gradient descent cannot be used to update the parameters of the network
(c) Dropout with p = 0.5 acts as an ensemble regularizer
(d) The weights of the neurons that were dropped during the forward propagation at the t-th iteration will not get updated during the (t+1)-th iteration

Correct Answer: (a),(c)


Solution:

(a) The dropout probability p can be different for each hidden layer:
• True. It is common practice to apply different dropout rates to different
hidden layers, which allows for more control over the regularization strength
applied to each layer.
(b) Batch gradient descent cannot be used to update the parameters of
the network:
• False. Batch gradient descent, as well as mini-batch gradient descent, can
be used to update the parameters of a network with dropout regularization.
Dropout affects the training phase by randomly dropping neurons but does
not prevent the use of gradient descent algorithms for parameter updates.
(c) Dropout with p = 0.5 acts as an ensemble regularizer:
• True. Dropout with p = 0.5 can be seen as an ensemble method in the sense
that, during training, different subsets of neurons are active, which can
be interpreted as training a large number of “thinned” networks. During
testing, the full network is used but with the weights scaled to account for
the dropout, effectively acting as an ensemble of these thinned networks.
(d) The weights of the neurons that were dropped during the forward propagation at the t-th iteration will not get updated during the (t+1)-th iteration:
• False. Dropout masks are resampled independently at every iteration, so a neuron that was dropped at the t-th iteration may well be active at the (t+1)-th iteration. Its weights then receive gradients during backpropagation and are updated as usual; being dropped at one iteration does not freeze the weights for later iterations.
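As context for options (a) and (c), here is a minimal sketch (not part of the original solution) of inverted dropout applied to hidden activations, with a possibly different rate per layer; the layer sizes and rates are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p, training=True):
    """Inverted dropout: drop units with probability p and rescale by 1/(1-p).

    At test time the full network is used unchanged, which is what lets dropout
    act like an (approximate) ensemble of the thinned networks seen in training.
    """
    if not training or p == 0.0:
        return h
    mask = (rng.random(h.shape) >= p).astype(h.dtype)   # keep with probability 1-p
    return h * mask / (1.0 - p)

h1 = rng.normal(size=(4, 16))     # activations of hidden layer 1 (batch of 4)
h2 = rng.normal(size=(4, 8))      # activations of hidden layer 2
h1_out = dropout(h1, p=0.2)       # option (a): different p per layer is fine
h2_out = dropout(h2, p=0.5)       # option (c): p = 0.5
```

A fresh mask is drawn for every mini-batch, which is also why option (d) is false: a unit dropped in one iteration is usually active, and updated, in later iterations.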

5. We have trained four different models on the same dataset using various hyperparameters. The training and validation errors for each model are provided below. Based on this information, which model is likely to perform best on the test dataset?

   Model   Training error   Validation error
     1          0.8               1.4
     2          2.5               0.5
     3          1.7               1.7
     4          0.2               0.6

(a) Model 1
(b) Model 2
(c) Model 3
(d) Model 4

Correct Answer: (d)


Solution: Model 4 has both a low training error and a low validation error, whereas the other models show either high errors or a large gap between the two. Hence Model 4 is expected to perform best on the test dataset.
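One illustrative way to encode "both errors should be low" in code (a heuristic sketch, not a rule stated in the document) is to rank the models by the worse of their two errors:

```python
# (model, training error, validation error) from the table above
results = [(1, 0.8, 1.4), (2, 2.5, 0.5), (3, 1.7, 1.7), (4, 0.2, 0.6)]

# "both errors low" ranked by the worse of the two errors per model
best = min(results, key=lambda r: max(r[1], r[2]))
print("selected model:", best[0])   # -> 4
```

Ranking by validation error alone would favor Model 2, but its very high training error suggests something inconsistent about that fit, which is why the combined view points to Model 4.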

Common Data Q6-Q9


Consider the function L(w, b) = 0.4w² + 7b² + 1 and its contour plot (contour plot omitted here).

6. What is the value of L(w∗, b∗), where w∗ and b∗ are the values that minimize the function?
Correct Answer: 1
Solution: To find the value of L(w∗, b∗), where w∗ and b∗ are the values that minimize the function

L(w, b) = 0.4w² + 7b² + 1,

we follow these steps:


1. Find the Minimum Values of w and b:
The partial derivatives of L with respect to w and b are:

∂L/∂w = 0.8w
∂L/∂b = 14b
Setting these partial derivatives to zero:

0.8w = 0 =⇒ w = 0
14b = 0 =⇒ b = 0

Therefore, the values that minimize the function are w∗ = 0 and b∗ = 0.


2. Evaluate L at w∗ and b∗:
Substitute w∗ = 0 and b∗ = 0 into the function L(w, b):

L(w∗, b∗) = L(0, 0) = 0.4(0)² + 7(0)² + 1 = 1

Thus, the value of L(w∗ , b∗ ) is 1.
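As a quick numerical check (not part of the original solution), a few hundred gradient-descent steps on L converge to w ≈ 0, b ≈ 0 with L ≈ 1; the starting point and learning rate below are illustrative choices.

```python
import numpy as np

def L(w, b):
    return 0.4 * w**2 + 7 * b**2 + 1.0

def grad(w, b):
    return np.array([0.8 * w, 14.0 * b])   # (∂L/∂w, ∂L/∂b)

theta = np.array([3.0, -2.0])    # arbitrary starting point (w, b)
lr = 0.05                        # small enough for the steep b-direction (curvature 14)
for _ in range(500):
    theta = theta - lr * grad(*theta)

print(theta, L(*theta))          # ~ [0, 0] and ~ 1.0
```

Note that the b-coordinate shrinks much faster per step than the w-coordinate, a direct consequence of the curvatures 14 vs. 0.8 analyzed in Q8 and Q9 below.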

7. What is the sum of the elements of ∇L(w∗ , b∗ )?


Correct Answer: 0
Solution: The gradient ∇L(w, b) is:

∇L(w, b) = (∂L/∂w, ∂L/∂b) = (0.8w, 14b).

At w∗ = 0 and b∗ = 0, the gradient is:

∇L(w∗ , b∗ ) = (0, 0) .

The sum of the elements of ∇L(w∗ , b∗ ) is:

0 + 0 = 0.

8. What is the determinant of HL(w∗, b∗), where HL is the Hessian of the function?


Correct Answer: 11.2
Solution: The Hessian matrix HL(w, b) is:

HL(w, b) = [ ∂²L/∂w²     ∂²L/∂w∂b ]
           [ ∂²L/∂b∂w    ∂²L/∂b²  ]

Compute the second-order partial derivatives:

∂²L/∂w² = 0.8
∂²L/∂b² = 14
∂²L/∂w∂b = ∂²L/∂b∂w = 0
Thus, the Hessian matrix is:

HL(w, b) = [ 0.8    0 ]
           [ 0     14 ]

The determinant of this matrix is:

Determinant = (0.8 · 14) − (0 · 0) = 11.2.

9. Compute the eigenvalues and eigenvectors of the Hessian. According to the eigenvalues of the Hessian, which parameter is the loss more sensitive to?
(a) b
(b) w

Correct Answer: (a)


Solution: The Hessian matrix is:

HL(w, b) = [ 0.8    0 ]
           [ 0     14 ]

The eigenvalues are λ1 = 0.8 and λ2 = 14, with corresponding eigenvectors (1, 0)ᵀ and (0, 1)ᵀ, respectively. The larger eigenvalue, λ2 = 14, corresponds to the parameter b, so the loss is more sensitive to changes in b.
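The determinant and eigen-decomposition in Q8 and Q9 can be verified with a small NumPy check (not part of the original solution):

```python
import numpy as np

H = np.array([[0.8,  0.0],
              [0.0, 14.0]])           # Hessian of L at (w*, b*)

print(np.linalg.det(H))               # 11.2
eigvals, eigvecs = np.linalg.eigh(H)  # symmetric matrix -> eigh
print(eigvals)                        # [ 0.8 14. ]
print(eigvecs)                        # columns are the eigenvectors (1, 0) and (0, 1)

# The largest eigenvalue (14) lies along the b-axis, so the loss curves most
# steeply, i.e. is most sensitive, in the b direction.
```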

10. Consider the problem of recognizing a letter (in upper case or lower case) of the English language in an image. There are 26 letters in the language. Therefore, a team decided to use a CNN to solve this problem. Suppose that a data augmentation technique is being used for regularization. Which of the following transformation(s) on all the training images is (are) appropriate for the problem?

(a) Rotating the images by ±10°
(b) Rotating the images by ±180°
(c) Translating the images by 1 pixel in all directions
(d) Cropping

Correct Answer: (a),(c),(d)


Solution:
Cropping:
Appropriate. Cropping is useful for augmenting data by varying the parts of the
image that are used for training. This can help the model learn to recognize letters
even if they are partially obscured or not centered perfectly. It ensures that the model
is robust to variations in the position of the letter within the image.

Rotating the images by ±10°:

Appropriate. Rotating images slightly (such as ±10°) helps the model become invariant to small rotational changes. This is useful because in practical scenarios,
characters might be slightly tilted, and the model should be able to recognize them
regardless of minor rotations.

Rotating the images by ±180°:

Not appropriate. Rotating images by 180° is generally not useful for character recognition because it produces completely inverted images. For example, 'A' would become '∀' and 'b' would become 'q', so such a rotation can even turn one valid character into a different one and corrupt the labels. These rotations do not represent valid variations in the context of character recognition.

Translating the image by 1 pixel in all directions:


Appropriate. Translating images by small amounts (such as 1 pixel) helps the model
become robust to slight positional shifts. This can improve the model’s ability to
recognize characters that are not perfectly aligned or are slightly shifted.
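A minimal sketch of such an augmentation pipeline, assuming torchvision is used and 28×28 grayscale character images; the exact parameter values (rotation range, padding, translation fraction) are illustrative choices, not specified in the question.

```python
from torchvision import transforms

# Small rotations, ~1-pixel translations, and modest random crops are
# label-preserving for character images; large (180°) rotations are not.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),                # option (a): ±10°
    transforms.RandomAffine(degrees=0,
                            translate=(1 / 28, 1 / 28)),  # option (c): ~1-pixel shift
    transforms.RandomCrop(size=28, padding=2),            # option (d): cropping
    transforms.ToTensor(),
])
```

RandomAffine's translate argument is a fraction of the image size, so 1/28 of a 28-pixel image corresponds to roughly a one-pixel shift in each direction.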
