CS 236, Fall 2018 Midterm Exam: Stanford University Honor Code
– that they will not give or receive aid in examinations; that they will not give or receive unpermitted
aid in class work, in the preparation of reports, or in any other work that is to be used by the
instructor as the basis of grading;
– that they will do their share and take an active part in seeing to it that others as well as themselves
uphold the spirit and letter of the Honor Code.
• The faculty on its part manifests its confidence in the honor of its students by refraining from proctoring
examinations and from taking unusual and unreasonable precautions to prevent the forms of dishonesty
mentioned above. The faculty will also avoid, as far as practicable, academic procedures that create
temptations to violate the Honor Code.
• While the faculty alone has the right and obligation to set academic requirements, the students and
faculty will work together to establish optimal conditions for honorable academic work.
Signature
I attest that I have not given or received aid in this examination, and that I have done my share and taken an
active part in seeing to it that others as well as myself uphold the spirit and letter of the Stanford University
Honor Code.
Name / SUnetID:
Signature:
Question 1: / 15    Question 5: / 20
Question 2: / 20    Question 6: / 20
Question 3: / 15    Question 7: / 20
Question 4: / 20
Note: Partial credit will be given for partially correct answers. Zero points will be given to
answers left blank.
where W and V are weight matrices, b and c are bias vectors, g is a nonlinear activation function and
sigmoid(a) = 1/(1 + exp(−a)).
MADE modifies the autoencoder to build an autoregressive model. To satisfy the autoregressive
property p(x) = ∏_{d=1}^{D} p(x_d | x_{<d}), we use the d-th output of MADE, x̂_d, to parameterize the conditional
probability p(x_d | x_{<d}), which means that x̂_d must depend only on the preceding inputs x_{<d}. In order
to enforce this property, MADE multiplies each weight matrix by a mask matrix. For a single hidden
layer autoencoder, we write

h(x) = g(b + (W ⊙ MW) x)
x̂ = sigmoid(c + (V ⊙ MV) h(x))

where MW and MV are the masks for W and V respectively, and ⊙ denotes element-wise multiplication. Note that the entries of a mask matrix can only be 0 or 1.
In this question, we consider a MADE model with a single hidden layer. The dimension of the input
x is assumed to be D, and the number of hidden units in h(x) is D − 1.
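As an illustration (not part of the exam), here is a minimal numpy sketch of the degree-based mask construction from the MADE paper and the masked forward pass, assuming g = tanh; the dimension and weight values are arbitrary placeholders:

import numpy as np

rng = np.random.default_rng(0)
D = 4                 # input dimension (placeholder value)
H = D - 1             # number of hidden units, as in this question

# Assign hidden unit k the degree m[k] in {1, ..., D-1}.
m = np.arange(1, D)

# MW[k, d] = 1 iff hidden unit k may depend on input x_{d+1}, i.e. (d+1) <= m[k].
MW = (np.arange(1, D + 1)[None, :] <= m[:, None]).astype(float)   # shape (D-1, D)
# MV[d, k] = 1 iff output x_hat_{d+1} may depend on hidden unit k, i.e. m[k] < d+1.
MV = (m[None, :] < np.arange(1, D + 1)[:, None]).astype(float)    # shape (D, D-1)

W = 0.1 * rng.normal(size=(H, D)); b = np.zeros(H)
V = 0.1 * rng.normal(size=(D, H)); c = np.zeros(D)

def made_forward(x):
    h = np.tanh(b + (W * MW) @ x)                  # h(x) = g(b + (W ⊙ MW) x)
    return 1 / (1 + np.exp(-(c + (V * MV) @ h)))   # x_hat = sigmoid(c + (V ⊙ MV) h(x))

# The product MV MW encodes which inputs each output can see;
# it is strictly lower triangular, so x_hat_d depends only on x_{<d}.
M = MV @ MW
print(np.allclose(np.triu(M), 0))   # True
print(made_forward(rng.normal(size=D)))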
(a) [2 points] What are the shapes (number of rows and columns) of the mask matrices MW and
MV ?
Answer: Shape of MW: (D − 1, D); shape of MV: (D, D − 1).
(b) [8 points] Consider two candidate MADE models as shown below. For both models, D = 3, and
there are 2 hidden units. In the figures, an arrow connecting two neurons a → b indicates
that the value of b depends on the value of a. Check whether they satisfy the autoregressive
property. If yes, write down the mask matrices MW, MV and compute MV MW. Otherwise, explain
why the autoregressive property is violated.
[Figure: two candidate MADE connectivity diagrams, (a) and (b), each with inputs x1, x2, x3 and hidden units h1, h2.]
Answer: (a) is a valid MADE. (b) is not, because x̂1 cannot depend on x1.

For (a):

MW =
  1 0 0
  1 1 0

MV =
  0 0
  1 0
  0 1

MV MW =
  0 0 0
  1 0 0
  1 1 0
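A quick numpy check (not part of the exam) that the masks above for model (a) satisfy the autoregressive property:

import numpy as np

MW = np.array([[1, 0, 0],
               [1, 1, 0]])   # mask for W, shape (D-1, D) with D = 3
MV = np.array([[0, 0],
               [1, 0],
               [0, 1]])      # mask for V, shape (D, D-1)

M = MV @ MW
print(M)                          # [[0 0 0], [1 0 0], [1 1 0]]
print(np.allclose(np.triu(M), 0)) # True: M is strictly lower triangular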
(c) [5 points] Let M = MV MW . What is the maximum number of non-zero entries that M can
have in order to preserve the autoregressive property? Briefly explain your results. [Hint: The
answer should be a function of D.]
Answer: D(D − 1)/2. The matrix M must be strictly lower triangular, and a strictly lower triangular
D × D matrix has at most 1 + 2 + · · · + (D − 1) = D(D − 1)/2 non-zero entries.
(d) [5 points] It can often be advantageous to have direct connections between the input x and
the output layer x̂. In this context, the reconstruction part of MADE becomes

x̂ = sigmoid(c + (V ⊙ MV) h(x) + (A ⊙ MA) x)
where A is the weight matrix that directly connects input and output, and MA is its mask matrix.
What is the maximum number of non-zero entries that MA can have to satisfy the autoregressive
property? Briefly explain your results. [Hint: The answer should be a function of D.]
Answer: D(D − 1)/2. The matrix MA must be strictly lower triangular: a direct connection from x_j to
x̂_d is allowed only if j < d, which again permits at most D(D − 1)/2 non-zero entries.
(b) [5 points] Suppose we have trained a VAE parameterized by φ and θ. We can obtain a sample
from pdata(x) by first drawing a sample z0 ∼ pθ(z), then drawing another sample x0 ∼ pθ(x | z0).
(False; this procedure samples from the model distribution pθ(x), which need not equal pdata(x). It is True only if the VAE is optimal, i.e., pθ(x) = pdata(x).)
(c) [5 points] After learning a VAE model on a dataset, Alice gives Bob the trained decoder pθ (x|z)
and the prior p(z) she used. However, she forgets to give Bob the encoder. Given sufficient
computation, can Bob still infer a latent representation for a new test point x0? (True; with enough computation Bob can evaluate the posterior pθ(z | x0) ∝ p(z) pθ(x0 | z) directly via Bayes' rule, without the encoder.)
p(z) = N(z; 0, 1)

p(x | z) = 1  if z ≥ 0 ∧ x = x(1)
           0  if z ≥ 0 ∧ x ≠ x(1)
           1  if z < 0 ∧ x = x(2)
           0  if z < 0 ∧ x ≠ x(2)

where p(x | z) is a probability mass function and p(z) is a probability density function.
In other words, the generative model will always sample the first image x(1) when conditioned on
z ≥ 0, and the model will always sample the second image x(2) when conditioned on z < 0. N (z; 0, 1)
indicates a Gaussian distribution with mean zero and variance 1.
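A minimal Python sketch of this sampling process (not part of the exam; the strings "x(1)" and "x(2)" are stand-ins for the two fixed images):

import numpy as np

rng = np.random.default_rng(0)

def sample():
    z = rng.standard_normal()              # z ~ N(0, 1)
    return "x(1)" if z >= 0 else "x(2)"    # deterministic choice given the sign of z

# Marginally, each image is generated with probability P(z >= 0) = P(z < 0) = 1/2.
print([sample() for _ in range(8)])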
(a) [4 points] We can consider the log-likelihood of some image x under our model, ℓ_like(x) := log p(x). Show that the evidence lower bound ℓ_ELBO(x; q) := E_{q(z)}[log(p(x, z)/q(z))], for an arbitrary distribution q(z), satisfies ℓ_ELBO(x; q) = log p(x) − DKL(q(z) ‖ p(z | x)).
Answer: Since p(x, z) = p(x) p(z | x),

E_{q(z)}[log (p(x, z)/q(z))] = log p(x) + E_{q(z)}[log (p(z | x)/q(z))]    (1)
                             = log p(x) − E_{q(z)}[log (q(z)/p(z | x))]    (2)
                             = log p(x) − DKL(q(z) ‖ p(z | x)).            (3)
(d) [2 points] For parts (d) and (e), suppose q(z) is a univariate Gaussian distribution with positive
variance. What is the set of all points z for which q(z) > 0?
Answer: R (the entire real line: a Gaussian with positive variance has strictly positive density everywhere).
(e) [4 points] Using what you have determined so far, select the option that is correct.
The log-likelihood ℓ_like(x(1)) is:
i. Finite and negative
ii. 0
iii. Finite and positive
iv. None of the above
Answer: Finite and negative. Since p(x(1)) = P(z ≥ 0) = 1/2, we have ℓ_like(x(1)) = log(1/2) ≈ −0.693.
For any q that is univariate Gaussian with positive variance, ℓ_ELBO(x(1); q) is:
i. Finite and negative
ii. 0
iii. Finite and positive
iv. None of the above
Answer: None of the above. The KL divergence DKL(q(z) ‖ p(z | x(1))) is infinite, since q(z) assigns non-zero probability mass to the interval (−∞, 0) while p(z | x(1)) assigns zero mass to that interval. Hence ℓ_ELBO(x(1); q) = log p(x(1)) − DKL(q(z) ‖ p(z | x(1))) = −∞, which is neither 0 nor finite.
(a) [5 points] Let Z ∼ Uniform[−2, 3] and X = exp(Z). What is pX(5)?
Answer: By the change of variables formula, pX(x) = (1/x) pZ(log x). Since log 5 ≈ 1.61 ∈ [−2, 3], we have pZ(log 5) = 1/5, so pX(5) = (1/5)(1/5) = 1/25.
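A quick Monte Carlo check of this value (a sketch, not part of the exam): estimate the density of X = exp(Z) near 5 from samples.

import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(-2.0, 3.0, size=2_000_000)    # Z ~ Uniform[-2, 3]
x = np.exp(z)                                 # X = exp(Z)

eps = 0.05
density_at_5 = np.mean(np.abs(x - 5.0) < eps) / (2 * eps)
print(density_at_5, 1 / 25)                   # both close to 0.04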
For each of the statements below, state true or false. Explain your answer for full points.
(b) [5 points] For efficient learning and inference in flow models, any discrete or continuous distribu-
tion which allows for efficient sampling and likelihood evaluation can be used to specify the prior
distribution over latent variables.
Answer: (False; only continuous distributions can be used. Flow models rely on the change-of-variables formula, which requires the prior to have a density, and an invertible map cannot turn a discrete prior into a continuous distribution over x.)
(c) [5 points] In Parallel Wavenet, evaluating the likelihood assigned by the student model for any
external data point is computationally intractable (i.e., requires exponential time in the number
of dimensions of the sample).
Answer: (False; likelihood evaluation for the student (inverse autoregressive flow) model is expensive because inverting the flow for an external data point must be done sequentially over the dimensions, but this takes linear rather than exponential time, so it is not computationally intractable.)
(d) [5 points] A permutation matrix is defined as a binary square matrix with {0, 1} entries such
that every column and every row sums to 1. The Jacobian for a RealNVP model can be expressed
as the product of a series of (upper or lower) triangular matrices and permutation matrices.
Answer: (True; each affine coupling layer has an (upper or lower) triangular Jacobian, and permuting the dimensions between layers contributes permutation-matrix factors, so by the chain rule the Jacobian of the full RealNVP model is a product of triangular matrices and permutation matrices.)
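To illustrate (not part of the exam), a small numpy sketch of a 2-D affine coupling layer followed by a permutation; the scale and shift networks s and t are toy placeholders. The coupling layer's Jacobian is lower triangular, and composing with the permutation matrix gives a permutation-times-triangular factor whose determinant is still cheap to compute.

import numpy as np

def s(x1): return np.tanh(x1)     # toy scale network (placeholder)
def t(x1): return 0.5 * x1        # toy shift network (placeholder)

def coupling(x):
    # y1 = x1, y2 = x2 * exp(s(x1)) + t(x1): an affine coupling layer
    x1, x2 = x
    return np.array([x1, x2 * np.exp(s(x1)) + t(x1)])

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # permutation matrix swapping the two dimensions

def numerical_jacobian(f, x, eps=1e-6):
    J = np.zeros((len(x), len(x)))
    for j in range(len(x)):
        d = np.zeros(len(x)); d[j] = eps
        J[:, j] = (f(x + d) - f(x - d)) / (2 * eps)
    return J

x = np.array([0.3, -1.2])
J_coupling = numerical_jacobian(coupling, x)
print(np.allclose(J_coupling, np.tril(J_coupling)))            # True: triangular Jacobian
print(np.abs(np.linalg.det(P @ J_coupling)), np.exp(s(x[0])))  # determinants match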
6. [20 points total] Flow + GAN: Maximum Likelihood vs. Adversarial Training
Let pdata (x) denote a data distribution that we are trying to learn with a generative model, where
x ∈ Rn . Consider the simple generative model parameterized by a single invertible matrix A ∈ Rn×n ,
where a sample is obtained by first sampling an n-dimensional vector z ∼ p(z) from a given distribution
p(z), and returning the matrix-vector product Az. Let D be a training set of samples from pdata (x).
(a) [10 points] Write a loss function L(A) that trains this model using maximum likelihood.
Answer:
L(A) = −E_{x∼pdata(x)}[ log p(A^{-1} x) + log |det(A^{-1})| ]
(b) [10 points] Write a loss function L(A) that trains this model as the generator in a generative
adversarial network. You may assume that a discriminator Dφ : Rn → R that outputs the
probability that the input is real has been defined and is trained alongside the generative model.
Answer:
L(A) = E_{z∼p(z)}[ log(1 − Dφ(Az)) ]
or
L(A) = −E_{z∼p(z)}[ log Dφ(Az) ]
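A short numpy sketch of both losses (not part of the exam), assuming a standard normal prior p(z) = N(0, I) and a toy placeholder discriminator Dφ:

import numpy as np

rng = np.random.default_rng(0)
n = 2                                    # data dimension (assumed for illustration)
A = np.array([[2.0, 0.5], [0.0, 1.5]])   # invertible generator matrix

def log_prior(z):
    # assumed prior p(z) = N(0, I): log density of a standard normal
    return -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * n * np.log(2 * np.pi)

def mle_loss(A, x_batch):
    # L(A) = -E_x[ log p(A^{-1} x) + log|det(A^{-1})| ]
    A_inv = np.linalg.inv(A)
    z = x_batch @ A_inv.T
    _, logabsdet = np.linalg.slogdet(A_inv)
    return -np.mean(log_prior(z) + logabsdet)

def D_phi(x_batch):
    # placeholder discriminator: any map R^n -> (0, 1) would do for this sketch
    return 1.0 / (1.0 + np.exp(-x_batch @ np.array([1.0, -1.0])))

def gan_generator_loss(A, z_batch):
    # minimax form: L(A) = E_z[ log(1 - D_phi(A z)) ]
    x_fake = z_batch @ A.T
    return np.mean(np.log(1.0 - D_phi(x_fake)))

x_batch = rng.normal(size=(128, n)) @ A.T      # stand-in for samples from the training set D
z_batch = rng.normal(size=(128, n))
print(mle_loss(A, x_batch), gan_generator_loss(A, z_batch))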
(a) [8 points] Derive an expression for log qφ,ψ (z|x). The function should take x and z as input,
output a scalar value, and depend on μφ, σφ, and fψ. You can use N(u; μ, diag(σ²)) to denote
the pdf of a normal distribution with mean μ and covariance diag(σ²), evaluated at u.
Answer: By the change of variables formula, the log probability of z is

log qφ,ψ(z | x) = log N(fψ^{-1}(z); μφ(x), diag(σφ(x)²)) + log |det (∂fψ^{-1}(z) / ∂z)|
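To make the formula concrete (not part of the exam), a small numpy sketch, assuming for illustration that fψ is an elementwise affine flow z = a ⊙ ẑ + b (invertible when all entries of a are non-zero) and that μφ(x), σφ(x) are fixed vectors:

import numpy as np

mu_phi = np.array([0.0, 1.0, -0.5])      # encoder mean  mu_phi(x)
sigma_phi = np.array([1.0, 0.5, 2.0])    # encoder std   sigma_phi(x)
a = np.array([2.0, -1.0, 0.5])           # flow scale (invertible since all a != 0)
b = np.array([0.1, 0.0, -0.3])           # flow shift

def f_inv(z):
    return (z - b) / a                   # f_psi^{-1}(z) for the assumed affine flow

def log_q(z):
    z_hat = f_inv(z)
    # log N(f^{-1}(z); mu, diag(sigma^2)) for a diagonal Gaussian
    log_normal = -0.5 * np.sum(((z_hat - mu_phi) / sigma_phi) ** 2
                               + np.log(2 * np.pi * sigma_phi ** 2))
    # log |det d f^{-1}(z) / dz| = -sum log|a| for an elementwise affine flow
    log_det = -np.sum(np.log(np.abs(a)))
    return log_normal + log_det

print(log_q(np.array([0.7, -0.2, 1.3])))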
(b) [12 points] Consider rφ (z|x) as the basic Gaussian inference model (without using the flow
model), with the following sampling process:
• Sample z ∼ N (z; µφ (x), diag(σφ (x)2 ))
Show that the best evidence lower bound we can achieve with qφ,ψ is at least as tight as the best
one we can achieve with rφ, i.e.,

max_{φ,ψ} ELBO(x; p, qφ,ψ) ≥ max_φ ELBO(x; p, rφ),

where
ELBO(x; p, q) = E_{q(z|x)}[ log p(x, z) − log q(z | x) ]
You may assume that fψ can represent any invertible function RF → RF .
Answer: For any φ chosen for rφ, we can choose fψ to be the identity function in qφ,ψ. The identity map
is invertible and has unit Jacobian, so for this choice of ψ we have qφ,ψ(z | x) = rφ(z | x), and the two
models achieve exactly the same ELBO. Hence for every rφ there is a qφ,ψ attaining the same ELBO, so
the maximum ELBO over qφ,ψ is greater than or equal to the maximum over rφ.