
Deep Learning - Week 3

Use the following data to answer questions 1 and 2


A neural network contains an input layer h0 = x, three hidden layers (h1 , h2 , h3 ), and
an output layer O. All the hidden layers use the Sigmoid activation function, and the
output layer uses the Softmax activation function.
Suppose the input x ∈ R^200, and all the hidden layers contain 10 neurons each. The
output layer contains 4 neurons.

1. How many parameters (including biases) are there in the entire network?
Correct Answer: 2274
Solution:
Number of parameters:
Input layer to h1: 200 × 10 + 10 = 2010
h1 to h2: 10 × 10 + 10 = 110
h2 to h3: 10 × 10 + 10 = 110
h3 to output layer: 10 × 4 + 4 = 44
Total parameters: 2010 + 110 + 110 + 44 = 2274
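The same count can be verified programmatically; a minimal Python sketch, where the `sizes` list is simply the layer widths from the question:

```python
# Layer widths: input 200, three hidden layers of 10, output 4
sizes = [200, 10, 10, 10, 4]

# Each layer contributes fan_in * fan_out weights plus fan_out biases
total = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
print(total)  # 2274
```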

2. Suppose all elements in the input vector are zero, and the corresponding true label is
also 0. Further, suppose that all the parameters (weights and biases) are initialized
to zero. What is the loss value if the cross-entropy loss function is used? Use the
natural logarithm (ln).
Correct Answer: Range(1.317,1.455)
Solution:
Loss with zero inputs and parameters: x = 0, all weights and biases = 0.
Hidden layers: σ(0) = 0.5 for every neuron.
Output layer logits: [0, 0, 0, 0].
Softmax: Softmax(z_i) = 1/4 for all i.
Cross-entropy loss: −ln(1/4) = ln(4) ≈ 1.386.
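The value can also be checked numerically; a minimal NumPy sketch assuming the standard softmax and cross-entropy definitions:

```python
import numpy as np

logits = np.zeros(4)                           # zero weights and biases give zero logits
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> [0.25, 0.25, 0.25, 0.25]
loss = -np.log(probs[0])                       # cross-entropy with true class 0
print(loss)  # 1.3862943611198906, i.e. ln(4)
```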


Use the following data to answer questions 3 and 4


The diagram below shows a neural network. The network contains two hidden layers
and one output layer. The input to the network is a column vector x ∈ R^3. The first
hidden layer contains 9 neurons, the second hidden layer contains 5 neurons and the
output layer contains 2 neurons. Each neuron in the lth layer is connected to all the
neurons in the (l + 1)th layer. Each neuron has a bias connected to it (not explicitly
shown in the figure).
[Figure: network diagram with input layer (x1, x2, x3), hidden layer 1 (h_1^(1) through h_9^(1)), hidden layer 2 (h_1^(2) through h_5^(2)), and output layer (ŷ1, ŷ2); the weight matrices W1, W2, and W3 connect successive layers.]
In the diagram, W1 is a matrix and x, a1, h1, and O are all column vectors. The notation Wi[j, :] denotes the j-th row of the matrix Wi, Wi[:, j] denotes the j-th column of the matrix Wi, and Wk[i, j] denotes the element at the i-th row and j-th column of the matrix Wk.

3. Choose the correct dimensions of W1 and a1

(a) W1 ∈ R^(3×9)
(b) a1 ∈ R^(9×5)
(c) W1 ∈ R^(9×3)
(d) a1 ∈ R^(1×9)
(e) W1 ∈ R^(1×9)
(f) a1 ∈ R^(9×1)

Correct Answer: (c),(f)


Solution: a1 = W1 · x must be a 9 × 1 column vector (one pre-activation per neuron in hidden layer 1), and for the product W1 · x with x ∈ R^3 to be defined and yield 9 entries, W1 must be 9 × 3.

4. How many learnable parameters (including biases) are there in the network?


Correct Answer: 98
Solution:
Number of parameters in W1: (9 × 3) + 9 = 36
Number of parameters in W2: (5 × 9) + 5 = 50
Number of parameters in W3: (2 × 5) + 2 = 12
Total: 36 + 50 + 12 = 98.
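The same fan-in/fan-out count used in Question 1 reproduces this; a quick sketch with this network's layer widths:

```python
sizes = [3, 9, 5, 2]  # input, hidden layer 1, hidden layer 2, output
total = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
print(total)  # 98
```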
5. We have a multi-class classification problem that we decide to solve by training a feedforward neural network. Which activation function should we use in the output layer to get the best results?

(a) Logistic
(b) Step function
(c) Softmax
(d) Linear

Correct Answer: (c)


Solution: Softmax works best for multi-class classification problems since it outputs a valid probability distribution over the classes and is invariant to adding the same constant to every logit.
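A small sketch illustrating both properties: the outputs form a probability distribution (non-negative, summing to 1), and adding a constant to every logit leaves them unchanged:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtracting the max improves numerical stability
    e = np.exp(z)
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
print(softmax(z), softmax(z).sum())              # probabilities summing to 1
print(np.allclose(softmax(z), softmax(z + 10)))  # True: invariant to a constant shift
```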

6. Which of the following statements about backpropagation is true?

(a) It is used to compute the output of a neural network.


(b) It is used to optimize the weights in a neural network.
(c) It is used to initialize the weights in a neural network.
(d) It is used to regularize the weights in a neural network.

Correct Answer: (b)


Solution: Backpropagation is the standard algorithm for computing the gradients needed to optimize the weights in a neural network. It computes the gradient of the loss function with respect to each weight in the network, and an optimizer such as gradient descent then uses those gradients to update the weights in a way that minimizes the loss function.
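A minimal sketch of this gradient loop for a single linear neuron with squared-error loss; the toy data and learning rate here are illustrative assumptions, not part of the question:

```python
import numpy as np

x, y_true = np.array([1.0, 2.0]), 3.0  # illustrative toy data (assumed)
w, lr = np.zeros(2), 0.05              # weights and learning rate (assumed)

for _ in range(100):
    y_pred = w @ x                     # forward pass
    grad = 2 * (y_pred - y_true) * x   # dL/dw for L = (y_pred - y_true)^2
    w -= lr * grad                     # gradient-descent update
print(w @ x)  # ~3.0: the loss has been driven toward zero
```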

7. Given two probability distributions p and q, under what conditions is the cross entropy
between them minimized?

(a) All the values in p are lower than corresponding values in q


(b) All the values in p are higher than corresponding values in q
(c) p = 0 (0 is a vector)
(d) p = q

Correct Answer: (d)


Solution: Cross-entropy is lowest when both distributions are identical.
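This is Gibbs' inequality: H(p, q) = −Σᵢ pᵢ ln qᵢ ≥ H(p, p), with equality exactly when p = q. A quick numerical check on an arbitrary example distribution:

```python
import numpy as np

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(cross_entropy(p, p))  # ~0.802, the entropy of p (the minimum)
print(cross_entropy(p, q))  # ~0.887, strictly larger since q != p
```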

8. Given that the probability of Event A occurring is 0.18 and the probability of Event
B occurring is 0.92, which of the following statements is correct?

(a) Event A has a low information content


(b) Event A has a high information content
(c) Event B has a low information content
(d) Event B has a high information content

Correct Answer: (b),(c)


Solution: Events with high probability have low information content while events
with low probability have high information content.
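Concretely, with information content I(x) = −ln P(x):

```python
import numpy as np

print(-np.log(0.18))  # ~1.715 nats: the rare Event A carries high information
print(-np.log(0.92))  # ~0.083 nats: the likely Event B carries little information
```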

Use the following data to answer the questions 9 and 10


The following diagram represents a neural network containing two hidden layers and
one output layer. The input to the network is a column vector x ∈ R^3. The activation
function used in the hidden layers is sigmoid. The output layer doesn't contain any
activation function, and the loss used is the squared-error loss (ŷ − y)².
[Figure: network diagram with input layer (x1, x2, x3), hidden layer 1 (h_1^(1), h_2^(1), h_3^(1)), hidden layer 2 (h_1^(2), h_2^(2)), and a single output ŷ1.]

The network doesn't contain any biases, and the weights of the network are given below:

$$W_1 = \begin{bmatrix} 1 & 1 & 3 \\ 2 & -1 & 1 \\ 1 & 2 & -2 \end{bmatrix}, \quad W_2 = \begin{bmatrix} 1 & 1 & 2 \\ 3 & 1 & 1 \end{bmatrix}, \quad W_3 = \begin{bmatrix} 1 & 2 \end{bmatrix}$$

The input to the network is $x = \begin{bmatrix} 1 & 2 & 1 \end{bmatrix}^\top$ and the target value is $y = 5$.

9. What is the predicted output for the given input x after doing the forward pass?
Correct Answer: Range(2.9,3.0)
Solution:
Doing the forward pass in the network we get:

$$h_1 = W_1 x = \begin{bmatrix} 1 & 1 & 3 \\ 2 & -1 & 1 \\ 1 & 2 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 1 \\ 3 \end{bmatrix}$$

$$a_1 = \text{sigmoid}(h_1) = \begin{bmatrix} 0.997 \\ 0.731 \\ 0.952 \end{bmatrix}$$

$$h_2 = W_2 a_1 = \begin{bmatrix} 1 & 1 & 2 \\ 3 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0.997 \\ 0.731 \\ 0.952 \end{bmatrix} = \begin{bmatrix} 3.632 \\ 4.674 \end{bmatrix}$$

$$a_2 = \text{sigmoid}(h_2) = \begin{bmatrix} 0.974 \\ 0.990 \end{bmatrix}$$

$$\hat{y} = W_3 a_2 = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 0.974 \\ 0.990 \end{bmatrix} = 2.954$$

10. Compute and enter the loss between the output generated by input x and the true
output y.
Correct Answer: Range(3.97,4.39)
Solution: Loss = (5 − 2.954)² ≈ 4.186.
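Both answers can be reproduced with a short NumPy forward pass; a minimal sketch using the weights from the question:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W1 = np.array([[1, 1, 3], [2, -1, 1], [1, 2, -2]])
W2 = np.array([[1, 1, 2], [3, 1, 1]])
W3 = np.array([[1, 2]])
x, y = np.array([1, 2, 1]), 5

a1 = sigmoid(W1 @ x)     # hidden layer 1 activations
a2 = sigmoid(W2 @ a1)    # hidden layer 2 activations
y_hat = (W3 @ a2)[0]     # linear output layer
print(y_hat)             # ~2.956, within the accepted range (2.9, 3.0)
print((y - y_hat) ** 2)  # ~4.179, within the accepted range (3.97, 4.39)
```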
