
CS 236 Homework 2 Solutions

Instructors: Stefano Ermon and Aditya Grover


{ermon,adityag}@cs.stanford.edu

Available: 10/15/2018; Due: 23:59 PST, 10/29/2018

Problem 1: Implementing the Variational Autoencoder (VAE) (25 points)


For this problem we will be using PyTorch to implement the variational autoencoder (VAE) and learn a
probabilistic model of the MNIST dataset of handwritten digits. Formally, we observe a sequence of binary
pixels x ∈ {0, 1}d , and let z ∈ Rk denote a set of latent variables. Our goal is to learn a latent variable model
pθ (x) of the high-dimensional data distribution pdata (x).
The VAE is a latent variable model that learns a specific parameterization pθ (x) = ∫ pθ (x, z) dz = ∫ p(z) pθ (x|z) dz.
Specifically, the VAE is defined by the following generative process:

p(z) = N (z|0, I)

pθ (x|z) = Bern(x|fθ (z))


In other words, we assume that the latent variables z are sampled from a unit Gaussian distribution N (z|0, I).
The latent z are then passed through a neural network decoder fθ (·) to obtain the parameters of the d Bernoulli
random variables which model the pixels in each image.
Although we would like to maximize the marginal likelihood pθ (x), computation of pθ (x) = ∫ p(z) pθ (x|z) dz is
generally intractable as it involves integration over all possible values of z. Therefore, we posit a variational
approximation to the true posterior and perform amortized inference as we have seen in class:

qφ (z|x) = N (z|µφ (x), diag(σφ2 (x)))

Specifically, we pass each image x through a neural network which outputs the mean µφ and diagonal covariance
diag(σφ2 (x)) of the multivariate Gaussian distribution that approximates the distribution over latent variables
z given x. We then maximize a lower bound to the marginal log-likelihood, known as the evidence lower bound (ELBO):

log pθ (x) ≥ ELBO(x; θ, φ) = Eqφ (z|x) [log pθ (x|z)] − DKL (qφ (z|x)||p(z))

Notice that the ELBO as shown on the right-hand side of the above expression decomposes into two terms: (1) the reconstruction loss −Eqφ (z|x) [log pθ (x|z)], and (2) the Kullback-Leibler (KL) term DKL (qφ (z|x)||p(z)).
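For concreteness, a minimal sketch of what such an amortized encoder might look like is given below. The architecture, layer sizes, and softplus parameterization of the variance are illustrative assumptions; the starter code provides its own encoder.

import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    # Hypothetical amortized encoder for q_phi(z|x) = N(z | mu_phi(x), diag(sigma_phi^2(x))).
    # Layer sizes are illustrative only; the provided codebase defines its own networks.
    def __init__(self, x_dim=784, z_dim=10, h_dim=300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, h_dim), nn.ELU(),
            nn.Linear(h_dim, 2 * z_dim),   # jointly outputs the mean and a pre-variance
        )

    def forward(self, x):
        h = self.net(x)
        m, h_v = h.chunk(2, dim=-1)
        v = F.softplus(h_v) + 1e-8         # enforce a strictly positive variance
        return m, v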

Your objective is to implement the variational autoencoder by modifying utils.py and vae.py.

1. [5 points] Implement the reparameterization trick in the function sample_gaussian of utils.py. Specifically, your answer will take in the mean m and variance v of the Gaussian distribution qφ (z|x) and return
a sample z ∼ qφ (z|x).

Solution:
Code:
def sample_gaussian(m, v):
    eps = torch.randn_like(v)
    z = m + torch.sqrt(v) * eps
    return z
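As a quick sanity check (not required by the assignment), the sample produced this way remains differentiable with respect to the variational parameters, which is exactly what the reparameterization trick buys us:

import torch

m = torch.zeros(4, 2, requires_grad=True)
v = torch.ones(4, 2, requires_grad=True)
z = m + torch.sqrt(v) * torch.randn_like(v)   # same computation as sample_gaussian
z.sum().backward()
print(m.grad.shape, v.grad.shape)             # gradients reach both variational parameters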

2. [5 points] Next, implement negative_elbo_bound in the file vae.py. Several of the functions in utils.py will be helpful, so please check what is provided. Note that we ask for the negative ELBO, as PyTorch optimizers minimize the loss function. Additionally, since we are computing the negative ELBO over a mini-batch of data {x(i) }_{i=1}^{n}, make sure to compute the average −(1/n) Σ_{i=1}^{n} ELBO(x(i) ; θ, φ) over the mini-batch. Finally, note that the ELBO itself cannot be computed exactly, since exact computation of the reconstruction term is intractable. Instead we ask that you estimate the reconstruction term via Monte Carlo sampling

−Eqφ (z|x) [log pθ (x|z)] ≈ − log pθ (x|z(1) ),

where z(1) ∼ qφ (z|x) denotes a single sample. The function kl_normal in utils.py will be helpful. Note: negative_elbo_bound also expects you to return the average reconstruction loss and KL divergence.

Solution:
Code:
def negative_elbo_bound(self, x):
    m, v = self.enc.encode(x)
    z = ut.sample_gaussian(m, v)
    logits = self.dec.decode(z)

    kl = ut.kl_normal(m, v, self.z_prior[0], self.z_prior[1])
    rec = -ut.log_bernoulli_with_logits(x, logits)
    nelbo = kl + rec

    nelbo, kl, rec = nelbo.mean(), kl.mean(), rec.mean()

    return nelbo, kl, rec
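The kl_normal helper is provided in utils.py and not reproduced in this handout; a minimal sketch of the analytic KL divergence between two diagonal Gaussians that it presumably computes is shown below (the exact signature in utils.py may differ):

import torch

def kl_normal(qm, qv, pm, pv):
    # KL( N(qm, diag(qv)) || N(pm, diag(pv)) ), summed over the last dimension.
    element_wise = 0.5 * (torch.log(pv) - torch.log(qv) + qv / pv + (qm - pm).pow(2) / pv - 1)
    return element_wise.sum(-1)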

3. [10 points] To test your implementation, run python run_vae.py to train the VAE. Once the run is
complete (20000 iterations), it will output (assuming your implementation is correct): the average (1)
negative ELBO, (2) KL term, and (3) reconstruction loss as evaluated on a test subset that we have
selected. Report the three numbers you obtain as part of the write-up. Since we're using stochastic optimization, you may wish to run the model multiple times and report each metric's mean and corresponding standard error. (Hint: the negative ELBO on the test subset should be somewhere around 100.)

Solution:
Standard deviations are provided (based on 50 runs). We provide numbers computed on both the CPU
and GPU. CPU numbers are slightly better since the test subset is slightly different.
GPU Numbers:
(a) VAE negative ELBO: 101.21 ± 0.61
(b) VAE KL: 19.34 ± 0.19
(c) VAE Rec: 81.86 ± 0.66
CPU Numbers:
(a) VAE negative ELBO: 99.20 ± 0.56
(b) VAE KL: 19.35 ± 0.18
(c) VAE Rec: 79.84 ± 0.61
4. [5 points] Visualize 200 digits (generate a single image tiled in a grid of 10 × 20 digits) sampled from
pθ (x).

Solution:
Visualization of VAE samples
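The sampled grid itself is omitted here. Below is a hedged sketch of how the 10 × 20 panel might be produced; the trained model object, its dec.decode method, the flattened 28 × 28 output shape, and the output path are assumptions based on the snippets above.

import torch
from torchvision.utils import save_image

with torch.no_grad():
    z = torch.randn(200, model.z_dim)              # 200 draws from p(z) = N(0, I)
    probs = torch.sigmoid(model.dec.decode(z))     # Bernoulli pixel means, assumed shape (200, 784)
    x = torch.bernoulli(probs)                     # sample binary digits (or plot probs directly)
    save_image(x.view(200, 1, 28, 28), 'vae_samples.png', nrow=20)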

Problem 2: Implementing the Mixture of Gaussians VAE (GMVAE) (30 points)
Recall that in Problem 1, the VAE’s prior distribution was a parameter-free isotropic Gaussian p(z) =
N (z|0, I). While this original setup works well, there are settings in which we desire more expressivity to better
model our data. In this problem we will implement the GMVAE, which has a mixture of Gaussians as the prior
distribution. Specifically:
pθ (z) = (1/k) Σ_{i=1}^{k} N (z|µi , diag(σi2 ))

where i ∈ {1, . . . , k} denotes the ith cluster index. For notational simplicity, we shall subsume our mixture of Gaussians parameters {µi , σi }_{i=1}^{k} into our generative model parameters θ. For simplicity, we have also assumed fixed uniform weights 1/k over the possible different clusters. Apart from the prior, the GMVAE shares an identical setup with the VAE:
qφ (z|x) = N (z|µφ (x), diag(σφ2 (x)))
pθ (x|z) = Bern(x|fθ (z))

Although the ELBO for the GMVAE, Eqφ (z|x) [log pθ (x|z)] − DKL (qφ (z|x)||pθ (z)), is identical in form to that of the VAE, we
note that the KL term DKL (qφ (z|x)||pθ (z)) cannot be computed analytically between a Gaussian distribution
qφ (z|x) and a mixture of Gaussians pθ (z). However, we can obtain its unbiased estimator via Monte Carlo
sampling:

DKL (qφ (z|x)||pθ (z)) ≈ log qφ (z(1) |x) − log pθ (z(1) )


= log N (z(1) |µφ (x), diag(σφ2 (x))) − log [ (1/k) Σ_{i=1}^{k} N (z(1) |µi , diag(σi2 )) ],

where z(1) ∼ qφ (z|x) denotes a single sample.

1. [15 points] Implement the (1) log_normal and (2) log_normal_mixture functions in utils.py, and the function negative_elbo_bound in gmvae.py. The function log_mean_exp in utils.py will be helpful for
this problem.

Solution:
Code:
def log_normal(x, m, v):
    element_wise = -0.5 * (torch.log(v) + (x - m).pow(2) / v + np.log(2 * np.pi))
    log_prob = element_wise.sum(-1)
    return log_prob

def log_normal_mixture(z, m, v):
    # (batch, dim) -> (batch, 1, dim)
    z = z.unsqueeze(1)
    # (batch, 1, dim) -> (batch, mix, dim) -> (batch, mix)
    log_prob = log_normal(z, m, v)
    # (batch, mix) -> (batch,)
    log_prob = log_mean_exp(log_prob, dim=1)
    return log_prob
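The log_mean_exp helper is provided in utils.py; a sketch of what it presumably computes (a numerically stable log of the mean of exponentials) is:

import math
import torch

def log_mean_exp(x, dim):
    # log( mean( exp(x) ) ) along `dim`, computed stably via logsumexp.
    return torch.logsumexp(x, dim) - math.log(x.size(dim))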

def negative_elbo_bound(self, x):
    # Prior
    prior = ut.gaussian_parameters(self.z_pre, dim=1)

    m, v = self.enc.encode(x)

    z = ut.sample_gaussian(m, v)
    logits = self.dec.decode(z)

    kl = ut.log_normal(z, m, v) - ut.log_normal_mixture(z, *prior)
    rec = -ut.log_bernoulli_with_logits(x, logits)
    nelbo = kl + rec

    nelbo, kl, rec = nelbo.mean(), kl.mean(), rec.mean()

    return nelbo, kl, rec

2. [10 points] To test your implementation, run python run_gmvae.py to train the GMVAE. Once the
run is complete (20000 iterations), it will output: the average (1) negative ELBO, (2) KL term, and (3)
reconstruction loss as evaluated on a test subset that we have selected. Report the three numbers you
obtain as part of the write-up. Since we’re using stochastic optimization, you may wish to run the model
multiple times and report each metric’s mean and the corresponding standard error.

Solution:
Standard deviations are provided (based on 50 runs). We provide numbers computed on both the CPU
and GPU. CPU numbers are slightly better since the test subset is slightly different.
GPU Numbers:
(a) GMVAE negative ELBO: 98.58 ± 0.46
(b) GMVAE KL: 17.82 ± 0.18
(c) GMVAE Rec: 80.77 ± 0.51

CPU Numbers:
(a) GMVAE negative ELBO: 96.75 ± 0.56
(b) GMVAE KL: 17.78 ± 0.20
(c) GMVAE Rec: 78.96 ± 0.59

3. [5 points] Visualize 200 digits (generate a single image tiled in a grid of 10 × 20 digits) sampled from
pθ (x).

Solution:
Visualization of GMVAE samples
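Sampling from the GMVAE differs from the VAE only in how z is drawn: a cluster index is picked uniformly at random and z is sampled from the corresponding Gaussian component. A hedged sketch is shown below; the trained model object, its z_pre attribute, and the ut (utils) module follow the solution code above and may differ in your setup.

import torch

with torch.no_grad():
    pm, pv = ut.gaussian_parameters(model.z_pre, dim=1)   # assumed shape (1, k, z_dim) means/variances
    idx = torch.randint(pm.size(1), (200,))                # uniform cluster assignments
    m, v = pm[0, idx], pv[0, idx]
    z = m + torch.sqrt(v) * torch.randn_like(v)            # draw z from the chosen components
    probs = torch.sigmoid(model.dec.decode(z))             # decode and tile exactly as in the VAE case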

Problem 3: Implementing the Importance Weighted Autoencoder (IWAE) (25 points)
While the ELBO serves as a lower bound to the true marginal log-likelihood, it may be loose if the variational
posterior qφ (z|x) is a poor approximation to the true posterior pθ (z|x). It is worth noting that, for a fixed
choice of x, the ELBO is the expectation of the log of the unnormalized density ratio

pθ (x, z) / qφ (z|x) = ( pθ (z|x) / qφ (z|x) ) · pθ (x),

where z ∼ qφ (z|x). As can be seen from the RHS, the density ratio is unnormalized, since it is multiplied by the constant pθ (x). We can obtain a tighter bound by averaging multiple unnormalized density
ratios. This is the key idea behind IWAE, which uses m > 1 samples from the approximate posterior qφ (z|x)
to obtain the following IWAE bound:
Lm (x; θ, φ) = E_{z(1),...,z(m) i.i.d.∼ qφ (z|x)} [ log ( (1/m) Σ_{i=1}^{m} pθ (x, z(i)) / qφ (z(i) |x) ) ]

Notice that for the special case of m = 1, the IWAE objective reduces to the standard ELBO.

1. [5 points] Prove that the IWAE bound is a valid lower bound of the log-likelihood, and that the ELBO lower bounds the IWAE bound:
log pθ (x) ≥ Lm (x) ≥ L1 (x)
for any m ≥ 1. [Hint: consider Jensen’s Inequality]

Solution:
A step-by-step proof is given below. The crucial steps are (3) and (4), where we apply Jensen's inequality; identifying these two steps is sufficient for full credit.

log pθ (x) = log E_{z(i) ∼ qφ (z|x)} [ (1/m) Σ_{i=1}^{m} pθ (x, z(i)) / qφ (z(i) | x) ]                              (1)
           = log E_{z(1),...,z(m) i.i.d.∼ qφ (z|x)} [ (1/m) Σ_{i=1}^{m} pθ (x, z(i)) / qφ (z(i) | x) ]               (2)
           ≥ E_{z(1),...,z(m) i.i.d.∼ qφ (z|x)} [ log ( (1/m) Σ_{i=1}^{m} pθ (x, z(i)) / qφ (z(i) | x) ) ] = Lm (x)  (3)
           ≥ E_{z(1),...,z(m) i.i.d.∼ qφ (z|x)} [ (1/m) Σ_{i=1}^{m} log ( pθ (x, z(i)) / qφ (z(i) | x) ) ]           (4)
           = (1/m) Σ_{i=1}^{m} E_{z(i) ∼ qφ (z|x)} [ log ( pθ (x, z(i)) / qφ (z(i) | x) ) ]                          (5)
           = (1/m) Σ_{i=1}^{m} E_{z ∼ qφ (z|x)} [ log ( pθ (x, z) / qφ (z | x) ) ]                                   (6)
           = E_{z ∼ qφ (z|x)} [ log ( pθ (x, z) / qφ (z | x) ) ] = L1 (x).                                           (7)

Step (1) holds because each unnormalized density ratio has expectation pθ (x) under qφ (z|x); step (2) rewrites the expectation jointly over the m i.i.d. samples; step (3) applies Jensen's inequality to the concave log, moving it inside the expectation, which yields Lm (x); step (4) applies Jensen's inequality again to the inner average, using log((1/m) Σ_i a_i) ≥ (1/m) Σ_i log a_i; steps (5)-(7) use linearity of expectation and the fact that the z(i) are identically distributed.
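The ordering log pθ (x) ≥ Lm (x) ≥ L1 (x) can also be checked numerically on a toy model whose marginal likelihood is known in closed form. The snippet below is a self-contained illustration only, not part of the required solution; it assumes p(z) = N(0, 1), p(x|z) = N(z, 1), and a deliberately crude posterior q(z|x) = N(0, 1), for which pθ (x) = N(x; 0, 2).

import math
import torch

torch.manual_seed(0)

def log_normal_pdf(value, mean, var):
    return -0.5 * (math.log(2 * math.pi) + torch.log(var) + (value - mean) ** 2 / var)

x = torch.tensor(1.5)
zero, one = torch.tensor(0.0), torch.tensor(1.0)
exact = log_normal_pdf(x, zero, torch.tensor(2.0))          # exact marginal: p(x) = N(x; 0, 2)

def iwae_estimate(m, n_rep=20000):
    z = torch.randn(n_rep, m)                                # z ~ q(z|x) = N(0, 1)
    # log importance weight: log p(z) + log p(x|z) - log q(z|x)
    log_w = log_normal_pdf(z, zero, one) + log_normal_pdf(x, z, one) - log_normal_pdf(z, zero, one)
    return (torch.logsumexp(log_w, dim=1) - math.log(m)).mean()

print(exact.item(), iwae_estimate(100).item(), iwae_estimate(1).item())
# Observed ordering (up to Monte Carlo noise): exact log p(x) >= L_100 >= L_1.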

2. [5 points] Implement IWAE for the VAE in the negative_iwae_bound function in vae.py. The functions duplicate and log_mean_exp defined in utils.py will be helpful.

Solution:
Code:

def negative_iwae_bound(self, x, iw):
    m, v = self.enc.encode(x)

    # Duplicate
    m = ut.duplicate(m, iw)
    v = ut.duplicate(v, iw)
    x = ut.duplicate(x, iw)
    z = ut.sample_gaussian(m, v)
    logits = self.dec.decode(z)

    kl = ut.log_normal(z, m, v) - ut.log_normal(z, self.z_prior[0], self.z_prior[1])

    # Important: it is technically incorrect to calculate the KL analytically.
    # The IWAE bound requires that we calculate the importance sample weight,
    # so do not do: kl = ut.kl_normal(m, v, self.z_prior[0], self.z_prior[1])
    rec = -ut.log_bernoulli_with_logits(x, logits)
    nelbo = kl + rec
    niwae = -ut.log_mean_exp(-nelbo.reshape(iw, -1), dim=0)

    niwae, kl, rec = niwae.mean(), kl.mean(), rec.mean()

    return niwae, kl, rec
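The duplicate helper is provided in utils.py; a sketch of the tiling it presumably performs (so that reshape(iw, -1) later recovers the grouping by importance sample) is:

import torch

def duplicate(x, rep):
    # Tile x `rep` times along a new leading dimension and flatten:
    # (batch, ...) -> (rep * batch, ...).
    return x.expand(rep, *x.shape).reshape(-1, *x.shape[1:])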

3. [10 points] Run python run_vae.py --train 0 to evaluate your implementation against the test subset.
This will output IWAE bounds for m = {1, 10, 100, 1000}. Check that the IWAE-1 result is consistent
with your reported ELBO for the VAE. Report all four IWAE bounds for this write-up.

Solution:
Standard deviations are provided (based on 50 runs). We provide numbers computed on both the CPU
and GPU. CPU numbers are slightly better since the test subset is slightly different.
GPU Numbers:

(a) VAE negative IWAE-1: 101.21 ± 0.61


(b) VAE negative IWAE-10: 98.50 ± 0.51
(c) VAE negative IWAE-100: 97.46 ± 0.50
(d) VAE negative IWAE-1000: 96.93 ± 0.45
CPU Numbers:

(a) VAE negative IWAE-1: 99.20 ± 0.56


(b) VAE negative IWAE-10: 96.52 ± 0.46
(c) VAE negative IWAE-100: 95.54 ± 0.41
(d) VAE negative IWAE-1000: 95.07 ± 0.40

4. [5 points] As IWAE only requires the averaging of multiple unnormalized density ratios, the IWAE
bound is also applicable to the GMVAE model. Repeat parts 2 and 3 for the GMVAE by implementing
the negative_iwae_bound function in gmvae.py. Compare and contrast IWAE bounds for GMVAE and
VAE.

Solution:
Code:
def negative_iwae_bound(self, x, iw):
    # Prior
    prior = ut.gaussian_parameters(self.z_pre, dim=1)

    m, v = self.enc.encode(x)

    # Duplicate
    m = ut.duplicate(m, iw)
    v = ut.duplicate(v, iw)
    x = ut.duplicate(x, iw)
    z = ut.sample_gaussian(m, v)
    logits = self.dec.decode(z)

    kl = ut.log_normal(z, m, v) - ut.log_normal_mixture(z, *prior)
    rec = -ut.log_bernoulli_with_logits(x, logits)
    nelbo = kl + rec
    niwae = -ut.log_mean_exp(-nelbo.reshape(iw, -1), dim=0)

    niwae, kl, rec = niwae.mean(), kl.mean(), rec.mean()

    return niwae, kl, rec

Standard deviations are provided (based on 50 runs). We provide numbers computed on both the CPU
and GPU. CPU numbers are slightly better since the test subset is slightly different.
GPU Numbers:

(a) GMVAE negative IWAE-1: 98.58 ± 0.46


(b) GMVAE negative IWAE-10: 96.12 ± 0.42
(c) GMVAE negative IWAE-100: 95.16 ± 0.42
(d) GMVAE negative IWAE-1000: 94.71 ± 0.39

CPU Numbers:
(a) GMVAE negative IWAE-1: 96.75 ± 0.56
(b) GMVAE negative IWAE-10: 94.30 ± 0.48
(c) GMVAE negative IWAE-100: 93.43 ± 0.46
(d) GMVAE negative IWAE-1000: 92.98 ± 0.41

Problem 4: Implementing the Semi-Supervised VAE (SSVAE) (20 points)
So far we have dealt with generative models in the unsupervised setting. We now consider semi-supervised learning on the MNIST dataset, where we have a small number of labeled pairs Xℓ = {(x(i) , y(i) )}_{i=1}^{100} in our training data and a large amount of unlabeled data Xu = {x(i) }_{i=101}^{60000} . A label y(i) for an image is simply the number that the image x(i) represents. We are interested in building a classifier that predicts the label y given the sample x. One approach is to train a classifier with standard supervised methods using only the labeled data. However, we would like to leverage the large amount of unlabeled data that we have to improve our classifier's performance.

We will use a latent variable generative model (a VAE), where the labels y are partially observed, and z are
always unobserved. The benefit of a generative model is that it allows us to naturally incorporate unlabeled
data into the maximum likelihood objective function simply by marginalizing y when it is unobserved. We
will implement the Semi-Supervised VAE (SSVAE) for this task, which follows the generative process specified
below:
p(z) = N (z|0, I)
p(y) = Categorical(y|π) = 1/10
pθ (x|y, z) = Bern(x|fθ (y, z))
where π = (1/10, . . . , 1/10) is a fixed uniform prior over the 10 possible labels and each sequence of pixels x is
modeled by a Bernoulli random variable parameterized by the output of a neural network decoder fθ (·).


Figure 1: Graphical model for SSVAE. Gray nodes denote observed variables; unshaded nodes denote latent
variables. Left: SSVAE for the setting where the labels y are unobserved; Right: SSVAE where some data
points (x, y) have observed labels.

To train a model on the datasets Xℓ and Xu , the principle of maximum likelihood suggests that we find the model pθ which maximizes the likelihood over both datasets. Assuming the samples from Xℓ and Xu are drawn i.i.d., this translates to the following objective

max_{θ} Σ_{x∈Xu} log pθ (x) + Σ_{(x,y)∈Xℓ} log pθ (x, y),

where
pθ (x) = Σ_{y∈Y} ∫ pθ (x, y, z) dz
pθ (x, y) = ∫ pθ (x, y, z) dz.

To overcome the intractability of exact marginalization of the latent variables z, we will instead maximize their
respective evidence lower bounds,
max_{θ,φ} Σ_{x∈Xu} ELBO(x; θ, φ) + Σ_{(x,y)∈Xℓ} ELBO(x, y; θ, φ),

where we introduce some amortized inference model qφ (y, z|x) = qφ (y|x)qφ (z|x, y). Specifically,

qφ (y|x) = Categorical(y|fφ (x))

qφ (z|x, y) = N (z|µφ (x, y), diag(σφ2 (x, y)))

where the parameters of the Gaussian distribution are obtained through a forward pass of the encoder. We note
that qφ (y|x) = Categorical(y|fφ (x)) is actually an MLP classifier that is also a part of the inference model, and
it predicts the probability of the label y given the observed data x.

We use this amortized inference model to construct the ELBOs.


 
ELBO(x; θ, φ) = Eqφ (y,z|x) [ log ( pθ (x, y, z) / qφ (y, z|x) ) ]
ELBO(x, y; θ, φ) = Eqφ (z|x,y) [ log ( pθ (x, y, z) / qφ (z|x, y) ) ].

However, Kingma et al. (2014)1 observed that maximizing the lower bound of the log-likelihood is not sufficient
to learn a good classifier. Instead, they proposed to introduce an additional training signal that directly trains
the classifier on the labeled data
max_{θ,φ} Σ_{x∈Xu} ELBO(x; θ, φ) + Σ_{(x,y)∈Xℓ} ELBO(x, y; θ, φ) + α Σ_{(x,y)∈Xℓ} log qφ (y|x),

where α ≥ 0 weights the importance of the classification accuracy. In this problem, we will consider a simpler
variant of this objective that works just as well in practice,
max_{θ,φ} Σ_{x∈X} ELBO(x; θ, φ) + α Σ_{(x,y)∈Xℓ} log qφ (y|x).

It is worth noting that the introduction of the classification loss has a natural interpretation as maximizing the
ELBO subject to the soft constraint that the classifier qφ (y|x) (which is a component of the amortized inference
model) achieves good performance on the labeled dataset. This approach of controlling the generative model
by constraining its inference model is thus a form of amortized inference regularization.2
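As a concrete illustration, a hypothetical training-step sketch of this simplified objective is given below. The attribute names (negative_elbo_bound, cls.classify) follow the solution snippets in this document; the actual training loop lives in the provided run script and may differ.

import torch.nn.functional as F

def ssvae_loss(model, x_all, x_labeled, y_labeled, alpha=1.0):
    # -ELBO(x) term over the inputs, with y marginalized out inside the model.
    nelbo, kl_z, kl_y, rec = model.negative_elbo_bound(x_all)
    # Classification term on the small labeled set: alpha * E[-log q_phi(y|x)].
    y_logits = model.cls.classify(x_labeled)
    ce = F.cross_entropy(y_logits, y_labeled)
    return nelbo + alpha * ce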

1. [1 point] Run python run_ssvae.py --gw 0. The --gw flag denotes how much weight to put on the
ELBO(x) term in the objective function; scaling the term by zero corresponds to a traditional supervised
learning setting on the small labeled dataset only, where we ignore the unlabeled data. Report your
classification accuracy on the test set after the run completes (30000 iterations).

Solution:
Classifier accuracy using only the limited labeled data: 73.5%
2. [10 points] Implement the negative_elbo_bound function in ssvae.py. Note that the function expects
as output the negative Evidence Lower Bound as well as its decomposition into the following terms,
 
−ELBO(x; θ, φ) = −Eqφ (y|x) [ log ( p(y) / qφ (y|x) ) ] − Eqφ (y|x) Eqφ (z|x,y) [ log ( p(z) / qφ (z|x, y) ) + log pθ (x|z, y) ]
               = DKL (qφ (y|x)||p(y)) + Eqφ (y|x) [ DKL (qφ (z|x, y)||p(z)) ] + Eqφ (y,z|x) [− log pθ (x|z, y)],
where the three terms are, respectively, the label KL term (KLy ), the latent KL term (KLz ), and the reconstruction term.

Since there are only ten labels, we shall compute the expectations with respect to qφ (y|x) exactly, while
using a single Monte Carlo sample of the latent variables z sampled from each qφ (z|x, y) when dealing
with the reconstruction term. In other words, we approximate the negative ELBO with
DKL (qφ (y|x)||p(y)) + Σ_{y∈Y} qφ (y|x) [ DKL (qφ (z|x, y)||p(z)) − log pθ (x|z(y) , y) ],

where z(y) ∼ qφ (z|x, y) denotes a sample from the inference distribution when conditioned on a possible
(x, y) pairing. The functions kl_normal and kl_cat in utils.py will be useful.
1 Kingma et al. Semi-Supervised Learning with Deep Generative Models. Neural Information Processing Systems, 2014.
2 Shu et al. Amortized Inference Regularization. Neural Information Processing Systems, 2018.

Solution:
Code:
def negative_elbo_bound(self, x):
    y_logits = self.cls.classify(x)
    y_logprob = F.log_softmax(y_logits, dim=1)
    y_prob = torch.softmax(y_logprob, dim=1)  # (batch, y_dim)

    # Duplicate y based on x's batch size. Then duplicate x
    y = np.repeat(np.arange(self.y_dim), x.size(0))
    y = x.new(np.eye(self.y_dim)[y])
    x = ut.duplicate(x, self.y_dim)

    m, v = self.enc.encode(x, y)
    z = ut.sample_gaussian(m, v)
    x_logits = self.dec.decode(z, y)

    kl_y = ut.kl_cat(y_prob, y_logprob, np.log(1.0 / self.y_dim))
    kl_z = ut.kl_normal(m, v, self.z_prior[0], self.z_prior[1])  # (y_dim * batch)
    rec = -ut.log_bernoulli_with_logits(x, x_logits)             # (y_dim * batch)

    # Compute the expected reconstruction and kl (under the distribution q(y|x))
    rec = (y_prob.t() * rec.reshape(self.y_dim, -1)).sum(0)      # (batch,)
    kl_z = (y_prob.t() * kl_z.reshape(self.y_dim, -1)).sum(0)    # (batch,)

    # Reduce to means
    kl_y, kl_z, rec = kl_y.mean(), kl_z.mean(), rec.mean()
    nelbo = rec + kl_z + kl_y
    return nelbo, kl_z, kl_y, rec
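The kl_cat helper is provided in utils.py; a sketch of the categorical KL divergence it presumably computes (against a prior supplied in log space, here the scalar log(1/y_dim) for the uniform prior) is:

import torch

def kl_cat(q, log_q, log_p):
    # KL( Categorical(q) || Categorical(exp(log_p)) ), summed over the category dimension.
    # log_p may be a scalar (e.g. log(1 / y_dim)) thanks to broadcasting.
    return (q * (log_q - log_p)).sum(-1)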

3. [9 points] Run python run_ssvae.py. This will run the SSVAE with the ELBO(x) term included, and
thus perform semi-supervised learning. Report your classification accuracy on the test set after the run
completes.
SSVAE Accuracy: 93.52 ± 1.09%

Bonus: Style and Content Disentanglement in SVHN (10 points)
A curious property of the SSVAE graphical model is that, in addition to the latent variable y learning to encode the content (i.e., the label) of the image, the latent variables z also learn to encode the style of the image. We shall demonstrate this phenomenon on the SVHN dataset. To make the problem simpler, we will only consider the fully-supervised scenario where y is fully observed. This yields the fully-supervised VAE, shown below.


Figure 2: Graphical model for FSVAE. Gray nodes denote observed variables.

1. [3 points] Since the fully-supervised VAE (FSVAE) always conditions on an observed y in order to generate the sample x, it is a special case of the conditional variational autoencoder. Derive the Evidence Lower
Bound ELBO(x; θ, φ, y) of the conditional log probability log pθ (x|y). You are allowed to introduce the
amortized inference model qφ (z|x, y).

Solution:
The only difference from the vanilla VAE is that we condition on the additional information y. The
prior p(z) is not conditioned on y since y does not point to z in the generative model.
 
log pθ (x | y) ≥ Eqφ (z|x,y) [ log ( p(z) pθ (x | z, y) / qφ (z | x, y) ) ] = Eqφ (z|x,y) [log pθ (x | z, y)] − DKL (qφ (z | x, y)||p(z)).    (8)

2. [7 points] Implement the negative_elbo_bound function in fsvae.py. In contrast to the MNIST dataset,
the SVHN dataset has a continuous observation space

p(z) = N (z|0, I)

pθ (x|y, z) = N (x|µθ (y, z), diag(σθ2 (y, z))).


To simplify the problem further, we shall assume that the variance is fixed at diag(σθ2 (y, z)) = (1/10) I,
and only train the decoder mean function µθ . Once you have implemented negative_elbo_bound, run python run_fsvae.py. The default settings will use a max iteration count of 1 million. We suggest checking the image quality of clip(µθ (y, z)), where clip(·) clips outputs element-wise to the range [0, 1], every 10k iterations and stopping the training whenever the digit classes are recognizable.3
Once you have learned a sufficiently good model, generate twenty latent variables z(0) , . . . , z(19) i.i.d.∼ p(z). Then, generate 200 SVHN digits (a single image tiled in a grid of 10 × 20 digits) where the digit in the ith row, jth column (assuming zero-indexing) is the image clip(µθ (y = i, z = z(j) )).

Solution:
Code:
3 An important note about sample quality inspection when modeling continuous images using VAEs with Gaussian observation
decoders: modeling continuous image data distributions is quite challenging. Rather than truly sampling x ∼ pθ (x|y), a common
heuristic is to simply sample clip(µθ (y, z)) instead.

def negative_elbo_bound(self, x, y):
    m, v = self.enc.encode(x, y)
    z = ut.sample_gaussian(m, v)
    x_mean = self.dec.decode(z, y)

    kl_z = ut.kl_normal(m, v, self.z_prior[0], self.z_prior[1]).mean()
    rec = -ut.log_normal(x, x_mean, 0.1 * torch.ones_like(x_mean)).mean()
    nelbo = kl_z + rec
    return nelbo, kl_z, rec
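A hedged sketch of the conditional sampling described above is given below. The trained model object and its dec.decode signature follow the earlier snippets, and the flattened 3 × 32 × 32 SVHN output shape is an assumption.

import torch
from torchvision.utils import save_image

with torch.no_grad():
    z = torch.randn(20, model.z_dim)                 # z(0), ..., z(19) ~ p(z), one per column
    rows = []
    for i in range(10):                              # one row per digit class y = i
        y = torch.zeros(20, 10)
        y[:, i] = 1.0                                # one-hot label
        mu = model.dec.decode(z, y)                  # decoder mean mu_theta(y, z)
        rows.append(mu.clamp(0, 1))                  # clip(.) to [0, 1]
    grid = torch.cat(rows, dim=0).reshape(200, 3, 32, 32)   # assumes flattened RGB 32x32 output
    save_image(grid, 'fsvae_samples.png', nrow=20)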

Visualization of FSVAE samples

