
Introduction to Machine Learning: Exercises
Institute for Machine Learning, Dept. of Computer Science, ETH Zürich
FS 2018, Prof. Dr. Andreas Krause
Web: https://las.inf.ethz.ch/teaching/introml-s18
Series 4 (ANNs), Apr 10th, 2018
Email questions to: Mojmir Mutny, [email protected]

Problem 1 (Recurrent Neural Networks):


In the lecture so far, we saw feedforward artificial neural networks, which do not contain any cycles and for which
the nodes do not maintain a persistent state over several runs. This exercise considers artificial neural networks
with nodes that maintain a persistent state that can be updated. This kind of neural network is called a recurrent
neural network (RNN). As an example, consider the following RNN with

y_t = W x_t + V s_t
s_{t+1} = y_t

from some initial state s_0, where t denotes the t-th call of the RNN, i.e., x_t is the t-th input.

Figure 1

(a) What is the recurrent state in the RNN from Figure 1? Name one example of data that is more naturally
modeled with RNNs than with feedforward neural networks.
(b) As the state of an RNN changes over different runs of the RNN, the loss functions that we use for feedforward
neural networks do not yield consistent results. For a given dataset X, propose a loss function (based
on the mean squared loss) for RNNs and justify why you chose this loss function.
(c) For a dataset X := {(x_t, y_t)}_{t=1}^k (for some k ∈ N), show how information is propagated by drawing a feedforward
neural network that corresponds to the RNN from Figure 1 for k = 3. Recall that a feedforward neural
network does not contain nodes with a persistent state. (Hint: unfold the RNN.)

Solution 1:

(a) The recurrent state is denoted s. In this case it coincides with the output. Recurrent models are used to
model data with temporal structure, e.g., time series, speech, or sound.
(b) We have data X = {(x_t, y_t)}_{t=1}^T, where we assume that the pairs are ordered temporally. Thus, we define the
loss function to be

L(W, V, s_0) = Σ_{t=1}^T (y_t − f(x_t, s_{t−1}; W, V))^2,

where s_{t−1} is the previous recurrent state (itself a function of W, V and the earlier inputs). Summing over
the whole sequence is natural because, through the state, the output at time t depends on all earlier inputs.
The initial state s_0 needs to be specified, and the loss depends on it as well.

(c) Check Figure 2.

Figure 2
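To make the unrolled computation and the proposed loss concrete, here is a minimal sketch (assuming NumPy, scalar inputs and states, and illustrative weight values that are not part of the exercise):

```python
import numpy as np

def rnn_forward(x_seq, W, V, s0=0.0):
    """Unrolled forward pass of the RNN y_t = W x_t + V s_t, s_{t+1} = y_t."""
    s = s0
    outputs = []
    for x in x_seq:
        y = W * x + V * s   # output at time t
        s = y               # the output becomes the next recurrent state
        outputs.append(y)
    return np.array(outputs)

def rnn_loss(x_seq, y_seq, W, V, s0=0.0):
    """Summed squared loss over the whole sequence, L(W, V, s_0)."""
    preds = rnn_forward(x_seq, W, V, s0)
    return np.sum((np.array(y_seq) - preds) ** 2)

# Illustrative sequence and weights (assumed values, not from the exercise)
x_seq = [1.0, 0.5, -1.0]
y_seq = [0.8, 0.9, -0.3]
print(rnn_loss(x_seq, y_seq, W=0.7, V=0.2))
```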

Problem 2 (Expressiveness of Neural Networks):
In this question we will consider neural networks with sigmoid activation functions of the form
ϕ(z) = 1 / (1 + exp(−z)).

If we denote by v_j^l the value of neuron j at layer l, its value is computed as

v_j^l = ϕ( w_0 + Σ_{i ∈ Layer_{l−1}} w_{j,i} v_i^{l−1} ).

In the following questions you will have to design neural networks that compute functions of two Boolean inputs
X1 and X2 . Given that the outputs of the sigmoid units are real numbers Y ∈ (0, 1), we will treat the final
output as Boolean by considering it as 1 if greater than 0.5 and 0 otherwise.

(a) Give 3 weights w0 , w1 , w2 for a single unit with two inputs X1 and X2 that implements the logical OR
function Y = X1 ∨ X2 .
(b) Can you implement the logical AND function Y = X1 ∧ X2 using a single unit? If so, give weights that
achieve this. If not, explain the problem.
(c) It is impossible to implement the XOR function Y = X1 ⊕ X2 using a single unit. However, you can do it
using a multi-layer neural network. Use the smallest number of units you can to implement the XOR function.
Draw your network and show all the weights.
(d) Create a neural network with only one hidden layer (of any number of units) that implements
(A ∨ ¬B) ⊕ (¬C ∨ ¬D).
Draw your network and show all the weights.

Solution 2:

(a) We choose w_0 = −0.5, w_1 = 1 and w_2 = 1. We check that the output is the desired OR function. The
network computes
A ∨ B = round(ϕ(w_0 + w_1 A + w_2 B))
A ∨ B = round(ϕ(−0.5 + A + B))
A B A∨B Network Round
1 1 1 ≈ 0.81 1
0 1 1 ≈ 0.62 1
1 0 1 ≈ 0.62 1
0 0 0 ≈ 0.37 0
(b) Yes: choose w_0 = −1.5, w_1 = 1 and w_2 = 1. We check that the output is the desired AND function. The
network computes
A ∧ B = round(ϕ(w_0 + w_1 A + w_2 B))
A ∧ B = round(ϕ(−1.5 + A + B))
A B A∧B Network Round
1 1 1 ≈ 0.62 1
0 1 0 ≈ 0.37 0
1 0 0 ≈ 0.37 0
0 0 0 ≈ 0.18 0
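As a quick numerical sanity check, here is a minimal sketch (assuming NumPy) that evaluates both single-unit gates from parts (a) and (b) over all Boolean inputs and reproduces the tables above:

```python
import numpy as np
from itertools import product

def phi(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def unit(A, B, w0, w1, w2):
    """Single sigmoid unit; the output is treated as Boolean by thresholding at 0.5."""
    out = phi(w0 + w1 * A + w2 * B)
    return out, int(out > 0.5)

for A, B in product([0, 1], repeat=2):
    or_out, or_bit = unit(A, B, w0=-0.5, w1=1, w2=1)    # OR weights from part (a)
    and_out, and_bit = unit(A, B, w0=-1.5, w1=1, w2=1)  # AND weights from part (b)
    print(A, B, round(or_out, 2), or_bit, round(and_out, 2), and_bit)
```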

(c) We find the weights by fixing the weights of the first layer and optimizing over the weights of the last
layer such that the required inequalities are satisfied.
We use a network with one hidden layer containing two units:

A ⊕ B = round(ϕ(−ϕ(A + B) + 0.84 ϕ(2A + 2B)))

A B A⊕B Network Round
0 0 0 ≈ 0.4800 0
0 1 1 ≈ 0.5022 1
1 0 1 ≈ 0.5022 1
1 1 0 ≈ 0.4860 0

Figure 3

For a sketch, see Figure 3.
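For completeness, here is a minimal sketch (assuming NumPy) that evaluates this two-layer XOR network and reproduces the table above:

```python
import numpy as np
from itertools import product

def phi(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def xor_net(A, B):
    """Two hidden units h1 = phi(A + B), h2 = phi(2A + 2B), combined by the output unit."""
    h1 = phi(A + B)
    h2 = phi(2 * A + 2 * B)
    out = phi(-h1 + 0.84 * h2)
    return out, int(out > 0.5)

for A, B in product([0, 1], repeat=2):
    out, bit = xor_net(A, B)
    print(A, B, A ^ B, round(out, 4), bit)
```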


(d) There are multiple ways to solve this exercise. One universal way is to guess, but this quickly becomes
too complicated for large expressions. Another approach is to use numerical software to fit the network to
match the logic; however, this is a rather brute-force approach for such problems.
We provide here a third, more principled alternative for building such neural networks. I am not certain whether this
method generalizes to arbitrary logical expressions (if you can show it, good for you), but it works for this case.
Namely, we transform the expression into disjunctive normal form and then build a simple classifier on top of it.
Recall from your previous courses that any Boolean expression can be converted to
disjunctive normal form: an OR of several clauses, each of which contains only ANDs of (possibly negated)
variables. As AND and OR operations can be modeled easily with our neural network, we can
perform the AND operations in one layer and the OR operation in the second layer.
Specifically, after applying De Morgan's laws, distributivity and symmetry to our expression several times, we
arrive at the following minimal form,

(A ∨ ¬B) ⊕ (¬C ∨ ¬D) ⇐⇒ (A ∧ C ∧ D) ∨ (¬A ∧ B ∧ ¬C) ∨ (¬A ∧ B ∧ ¬D) ∨ (¬B ∧ C ∧ D). (1)

The detailed derivation works as follows. Using X ⊕ Y ⇐⇒ (X ∧ ¬Y) ∨ (¬X ∧ Y) with X = (A ∨ ¬B) and
Y = (¬C ∨ ¬D), together with De Morgan's laws ¬(¬C ∨ ¬D) ⇐⇒ (C ∧ D) and ¬(A ∨ ¬B) ⇐⇒ (¬A ∧ B), we get

(A ∨ ¬B) ⊕ (¬C ∨ ¬D)

⇐⇒ ((A ∨ ¬B) ∧ (C ∧ D)) ∨ ((¬A ∧ B) ∧ (¬C ∨ ¬D))

applying the distributivity law (X ∨ Y) ∧ Z ⇐⇒ (X ∧ Z) ∨ (Y ∧ Z), we get

⇐⇒ ((A ∧ C ∧ D) ∨ (¬B ∧ C ∧ D)) ∨ ((¬A ∧ B ∧ ¬C) ∨ (¬A ∧ B ∧ ¬D))

applying associativity and symmetry, we get

⇐⇒ (A ∧ C ∧ D) ∨ (¬A ∧ B ∧ ¬C) ∨ (¬A ∧ B ∧ ¬D) ∨ (¬B ∧ C ∧ D).
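A quick brute-force check of this equivalence over all 16 input assignments, as a minimal sketch in plain Python:

```python
from itertools import product

# Verify equation (1): the XOR expression equals its disjunctive normal form.
for A, B, C, D in product([False, True], repeat=4):
    lhs = (A or not B) != (not C or not D)  # XOR of the two clauses
    rhs = ((A and C and D) or (not A and B and not C)
           or (not A and B and not D) or (not B and C and D))
    assert lhs == rhs
print("Equivalence (1) holds for all 16 assignments.")
```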

Our expression decomposes into 4 clauses that are combined via logical ORs. Thus, we use 4 hidden units,
each modeling one of the AND clauses, and a final output unit that takes the OR of all of them.
The AND units can be implemented using the intuition from the previous parts.

1. (A ∧ C ∧ D) : implements h1 = ϕ(η(A + C + D − 2.5))


2. (¬A ∧ B ∧ ¬C) : implements h2 = ϕ(η(−A + B − C − 0.4))
3. (¬A ∧ B ∧ ¬D) : implements h3 = ϕ(η(−A + B − D − 0.4))
4. (¬B ∧ C ∧ D) : implements h4 = ϕ(η(−B + C + D − 1.4))

In order to get outputs close to 0 or 1, we take η large, e.g. η = 200.


To implement the OR, we require that at least one of these units is activated, so the final layer becomes

H(h1, h2, h3, h4) = ϕ(h1 + h2 + h3 + h4 − 0.5).

In full,

output(A, B, C, D) = H(h1(A, C, D), h2(A, B, C), h3(A, B, D), h4(B, C, D)). (2)

We check the truth table,


A B C D   Target: (A ∨ ¬B) ⊕ (¬C ∨ ¬D)   Network (rounded)
0 0 0 0 0 0
0 0 0 1 0 0
0 0 1 0 0 0
0 0 1 1 1 1
0 1 0 0 1 1
0 1 0 1 1 1
0 1 1 0 1 1
0 1 1 1 0 0
1 0 0 0 0 0
1 0 0 1 0 0
1 0 1 0 0 0
1 0 1 1 1 1
1 1 0 0 0 0
1 1 0 1 0 0
1 1 1 0 0 0
1 1 1 1 1 1
For a figure see Figure 4.

Figure 4
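To verify the whole construction, here is a minimal sketch (assuming NumPy) of the two-layer network with η = 200 that reproduces the truth table above:

```python
import numpy as np
from itertools import product

def phi(z):
    """Sigmoid activation; inputs are clipped to avoid overflow warnings for very sharp units."""
    z = np.clip(z, -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(-z))

def dnf_network(A, B, C, D, eta=200.0):
    """Hidden layer: one sharp sigmoid unit per AND clause; output unit: their OR."""
    h1 = phi(eta * ( A     + C + D - 2.5))  # A ∧ C ∧ D
    h2 = phi(eta * (-A + B - C     - 0.4))  # ¬A ∧ B ∧ ¬C
    h3 = phi(eta * (-A + B     - D - 0.4))  # ¬A ∧ B ∧ ¬D
    h4 = phi(eta * (    -B + C + D - 1.4))  # ¬B ∧ C ∧ D
    out = phi(h1 + h2 + h3 + h4 - 0.5)
    return int(out > 0.5)

for A, B, C, D in product([0, 1], repeat=4):
    target = int((A or not B) != (not C or not D))
    assert dnf_network(A, B, C, D) == target
print("The network matches the target expression on all 16 inputs.")
```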

Problem 3 (Exam question: Artificial Neural Networks):


Consider the following neural network with two logistic hidden units h1 , h2 , and three inputs x1 , x2 , x3 . The
output neuron f is a linear unit, and we are using the squared error cost function
E = (y − f)^2. The logistic function is defined as ρ(x) = 1/(1 + e^{−x}).
[Note: You can solve part (c) without using the solution for part (b).]

(a) Consider a single training example x = [x1 , x2 , x3 ] with target output (label) y. Write down the sequence
of calculations required to compute the squared error cost (called forward propagation).
(b) A way to reduce the number of parameters to avoid overfitting is to tie certain weights together, so that
they share a parameter. Suppose we decide to tie the weights w1 and w4, so that w1 = w4 = w_tied. What
is the derivative of the error E with respect to w_tied, i.e., ∇_{w_tied} E?
(c) For a data set D = {(x^{(1)}, y^{(1)}), ..., (x^{(n)}, y^{(n)})} consisting of n labeled examples, write the pseudocode
of the stochastic gradient descent algorithm with learning rate η_t for optimizing the weight w_tied (assume
all the other parameters are fixed).

Solution 3:

Past exam question; a detailed solution is not provided.

(a) Forward propagation is basic algebra following the computation DAG: compute the hidden activations h1 and h2 from the inputs, then the linear output f, and finally the cost E = (y − f)^2.


(b) Apply the chain rule; since w_tied appears in two places, the two contributions add up: ∇_{w_tied} E = ∂E/∂w1 + ∂E/∂w4.

(c) 1. Pick a random initial guess w_tied^0 and a learning rate schedule η_k.
    2. Repeat, with k = 0, 1, 2, ... as the iteration index:
       (a) Sample a data point j ∼ Unif[n].
       (b) Update w_tied^{k+1} = w_tied^k − η_k ∇_{w_tied} E^{(j)}(w_tied^k), where E^{(j)} = (y^{(j)} − f(x^{(j)}))^2.
    3. Until happy with the solution.
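Since the network figure is not reproduced here, the following is only an illustrative sketch of this SGD loop under an assumed weight layout: w_tied is taken to be the weight from x1 into h1 and from x2 into h2, all other weights are held fixed, and the per-example gradient is approximated numerically rather than via the chain-rule expression from part (b):

```python
import numpy as np

def rho(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_tied, fixed):
    """Forward pass under the assumed architecture: two logistic hidden units, linear output."""
    # Assumption: w_tied feeds x[0] into h1 and x[1] into h2; 'fixed' holds the remaining weights.
    h1 = rho(w_tied * x[0] + fixed["w2"] * x[1] + fixed["w3"] * x[2])
    h2 = rho(fixed["w5"] * x[0] + w_tied * x[1] + fixed["w6"] * x[2])
    return fixed["u1"] * h1 + fixed["u2"] * h2  # linear output unit f

def sgd_tied(X, y, fixed, w0=0.0, n_iters=1000, lr=0.1, eps=1e-5):
    """Stochastic gradient descent on E = (y - f)^2 with respect to w_tied only."""
    w = w0
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        j = rng.integers(len(X))                      # sample a data point uniformly
        E = lambda w_: (y[j] - forward(X[j], w_, fixed)) ** 2
        grad = (E(w + eps) - E(w - eps)) / (2 * eps)  # central-difference gradient estimate
        w = w - lr * grad                             # gradient step with learning rate lr
    return w

# Illustrative data and fixed weights (assumed values, not from the exam question)
X = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]])
y = np.array([0.3, -0.2])
fixed = {"w2": 0.1, "w3": -0.2, "w5": 0.4, "w6": 0.3, "u1": 1.0, "u2": -1.0}
print(sgd_tied(X, y, fixed))
```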
