
Deep Learning Srihari

Variational Autoencoder
Applications
Sargur N. Srihari
[email protected]

Topics in VAE
1. VAE as a Generative Model
• https://cedar.buffalo.edu/~srihari/CSE676/21.1-VAE-Theory.pdf
2. VAE: The neural network perspective
• https://cedar.buffalo.edu/~srihari/CSE676/21.2-VAE-NeuralNets.pdf
3. VAE Summary and Applications


VAE Summary and Applications


1. Summary of VAE
   1. Architecture
   2. Training
   3. Sample Generation
2. Deep Recurrent Attentive Writer (DRAW) for image generation
3. Semi-supervised learning
4. Interpolating between sentences
5. Radiology

VAE Architecture

Source: https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html


VAE Training
(without/with reparameterization)

Trained by minimizing the negative ELBO:

l_i(θ, φ) = −E_{z ∼ q_φ(z|x_i)}[ log p_θ(x_i | z) ] + KL( q_φ(z | x_i) || p(z) )
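As a sketch, the loss above can be written in a few lines of numpy. This is an illustrative single-sample Monte Carlo estimate, assuming a Bernoulli decoder and a diagonal Gaussian posterior; the `decode` function and all names are placeholders, not the course's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def negative_elbo(x, mu, log_var, decode):
    """One-sample estimate of l_i(theta, phi) for a single example x."""
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # Reconstruction term: Bernoulli negative log-likelihood of x under p(x|z)
    p = decode(z)
    recon = -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # KL(q_phi(z|x) || N(0, I)) has a closed form for a diagonal Gaussian
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return recon + kl
```

Note that with mu = 0 and log_var = 0 the posterior equals the prior, so the KL term vanishes and only the reconstruction term remains.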

Generating samples from a VAE


• Assume a trained VAE
• Generating a sample:
1. Draw a sample ∊ from p(∊); with the standard normal prior this yields z
2. Run z through the decoder to get p(x|z)
• It consists of the parameters of a Bernoulli or multinomial distribution
3. Obtain a sample x from this distribution
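The three steps can be sketched as follows; the toy decoder below is just a stand-in for a trained network, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(decoder, latent_dim):
    # Step 1: draw a latent sample from the N(0, I) prior
    z = rng.standard_normal(latent_dim)
    # Step 2: the decoder maps z to the parameters of p(x|z),
    # here per-pixel Bernoulli probabilities
    probs = decoder(z)
    # Step 3: sample a binary image from that distribution
    return (rng.random(probs.shape) < probs).astype(int)

# Toy stand-in decoder (a real one would be a trained neural network):
# a fixed random linear map squashed through a sigmoid
W = rng.standard_normal((784, 8))
toy_decoder = lambda z: 1.0 / (1.0 + np.exp(-(W @ z)))
image = generate(toy_decoder, 8)
```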


Advantages/disadvantages of VAEs
• Advantages
– Elegant
• Generates data from samples drawn from N(0, I)
– Theoretically pleasing
• Maximizes the ELBO during training
– Excellent results
– Among the state of the art for generative modeling
• Disadvantage
– Samples from image VAEs tend to be blurry

Reasons for blurriness in VAE images


1. Effect of minimizing D_KL(p_data || p_model)
– The model tends to assign high probability to points in the data set, but also to other points, which appear as blurry images

2. Gaussian p_model(x; g(z))
– Maximizing the lower bound is then similar to training an autoencoder with mean squared error
– MSE has a tendency to ignore features that occupy few pixels
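To make the MSE connection concrete: for a fixed-variance Gaussian p_model, the negative log-likelihood is, up to an additive constant, a scaled squared error. A minimal illustration (function names are ours, not from the slides):

```python
import numpy as np

def gaussian_nll(x, g_z, sigma=1.0):
    # Negative log-likelihood of x under N(g(z), sigma^2 I),
    # dropping the additive constant: this is just scaled squared error
    return np.sum((x - g_z) ** 2) / (2.0 * sigma ** 2)

def mse(x, g_z):
    return np.mean((x - g_z) ** 2)
```

For a D-dimensional x with sigma = 1, gaussian_nll equals (D/2) times the MSE, so minimizing one minimizes the other.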

Extending VAE to other architectures


• Extending the VAE to other architectures is straightforward
– This is a key advantage over Boltzmann machines
• Which require careful design for tractability
• VAEs work very well with a wide variety of differentiable operators
– DRAW is a sophisticated recurrent encoder combined with an attention mechanism
• Generation consists of visiting different small patches and drawing the values of the pixels at those points


Motivation for DRAW


• A person drawing a scene does so sequentially, reassessing their handiwork after each modification
– Outlines are replaced by precise forms,
– lines are sharpened, darkened or erased,
– shapes are altered, and
– the final picture emerges
• DRAW: Recurrent Neural Net For Image Generation
– Google DeepMind 2015
• https://www.youtube.com/watch?v=Zt-7MI9eKEo

Drawing images
The Deep Recurrent Attentive Writer (DRAW) architecture
• Represents a natural form of image construction:
– Parts of a scene are created independently from others
– Approximate sketches are successively refined

DRAW is a VAE in which the encoder and decoder are both RNNs (LSTMs)

Approaches to image generation


• Most automatic image generation approaches aim to generate entire scenes at once
– This means that all pixels are conditioned on a single latent distribution
– As well as precluding the possibility of iterative self-correction, the “one shot” approach is fundamentally difficult to scale to large images
• DRAW attempts a more natural image construction
– Parts of a scene are created independently from others, and sketches are successively refined

A recurrent neural network (RNN)


Conventional VAE Design


The conventional VAE architecture is used in three different modes:

1. Generation:
a sample z is drawn from the prior p(z) and passed through the feedforward decoder network to compute the probability p(x|z) of the input given the sample

2. Inference:
the input x is passed to the encoder network, producing an approximate posterior q(z|x) over the latent variables

3. Training:
z is sampled from q(z|x) and then used to compute the loss
KL[q(z|x) || p(z)] − log p(x|z),
which is minimized with SGD

The DRAW Network


• Basic network structure is similar to VAEs
– Encoder network
• Determines distribution over latent codes that capture
salient information about input data
– Decoder network
• Receives samples from the code distribution and uses
them to condition its own distribution over images
• There are three key differences
– Described next

Differences between DRAW and VAE


1. Both encoder and decoder are RNNs
– A sequence of code samples is exchanged
– The encoder is privy to the decoder’s previous outputs, allowing it to tailor the codes it sends according to the decoder’s behavior so far
2. The decoder’s outputs are successively added to the distribution that ultimately generates the data
– as opposed to emitting the distribution in a single step
3. Dynamically updated attention is used
– to restrict both the input region observed by the encoder and the output region modified by the decoder
DRAW VAE Architecture

Encoder and decoder are both RNNs (LSTMs)


Compare with the conventional VAE.

• At each time-step, a sample z_t from the prior p(z_t) is passed to the recurrent decoder network, which then modifies part of the canvas matrix. The final canvas matrix c_T is used to compute p(x|z_{1:T})
• During inference, the input is read at every time-step and the result is passed to the encoder RNN. The RNNs at the previous time-step specify where to read. The output of the encoder RNN is used to compute the approximate posterior over the latent variables at that time-step
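The generation loop can be sketched as follows; `write_step` is a stand-in for the decoder RNN plus its write operation, and the toy linear map at the bottom is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_generate(write_step, T, latent_dim, canvas_shape):
    # The canvas starts empty; each step's decoder output is ADDED to it,
    # rather than the image being emitted in a single step
    c = np.zeros(canvas_shape)
    for t in range(T):
        z_t = rng.standard_normal(latent_dim)  # z_t ~ p(z_t)
        c = c + write_step(z_t)                # successive additive refinement
    # The final canvas c_T parameterizes p(x | z_{1:T}), e.g. pixel-wise
    # Bernoulli probabilities via a sigmoid
    return 1.0 / (1.0 + np.exp(-c))

# Toy write step (a real one is the decoder LSTM plus a write attention window)
W = rng.standard_normal((28 * 28, 10)) * 0.1
probs = draw_generate(lambda z: W @ z, T=5, latent_dim=10, canvas_shape=28 * 28)
```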

Long Short Term Memory


• Explicitly designed to avoid the long-term dependency problem
• RNNs have the form of a repeating chain structure
– The repeating module has a simple structure, such as a single tanh layer
• LSTMs also have a chain structure
– but the repeating module has a different structure

Modeling the Drawing Mechanism


• Iterative part
– The decoder’s outputs are successively added to the distribution that ultimately generates the data, as opposed to emitting it in a single step
• Attention mechanism
– To model working on one part of the image, and then another
– Used to restrict both the input region observed by the encoder and the output region modified by the decoder
• The network decides at each time-step:
– where to read, where to write and what to write

Attention Mechanism
A classic image captioning system would encode the image using a pre-trained CNN that produces a single hidden state h.

With an attention mechanism, the image is first divided into n parts, and a CNN computes representations h_1,…,h_n of each part.

When the RNN is generating a new word, the attention mechanism focuses on the relevant part of the image, so the decoder only uses specific parts of the image.


Selective Attention Model


• N × N grid of Gaussian filters (illustrated with a 3 × 3 grid)
– Grid center (g_X, g_Y) and stride δ determine the mean of the filter at patch (i, j):
µ_i^X = g_X + (i − N/2 − 0.5)δ ,  µ_j^Y = g_Y + (j − N/2 − 0.5)δ

Figure: three N × N patches extracted from the image (N = 12). Green rectangles indicate the boundary and precision (σ) of the patches, while the patches themselves are shown to the right.
1. The top patch has a small δ and high σ, giving a zoomed-in but blurry view of the center of the digit
2. The middle patch has a large δ and low σ, effectively down-sampling the whole image
3. The bottom patch has high δ and σ
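The filter-center formula can be checked directly; a small sketch (the function name is ours), using 1-indexed i as in the slide:

```python
import numpy as np

def patch_filter_means(g_x, g_y, delta, N):
    # mu_i^X = g_X + (i - N/2 - 0.5) * delta, i = 1..N (similarly for Y),
    # so the N x N grid of Gaussian filters is centred on (g_X, g_Y)
    # and spaced delta pixels apart
    i = np.arange(1, N + 1)
    offsets = (i - N / 2.0 - 0.5) * delta
    return g_x + offsets, g_y + offsets
```

Shrinking delta zooms the attention window in around the grid center; growing it covers (down-samples) more of the image, matching the patch behavior described above.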


VAE and Semi-supervised Learning


• Semi-supervised learning falls in between unsupervised and supervised learning
– It makes use of both labeled and unlabeled data in supervised tasks (e.g. classification, regression)
• There are billions of unlabeled images on the internet; only a small fraction are labeled
– Humans can identify anteaters after seeing very few examples
• The goal is to get the best performance from a tiny labeled data set

VAE for semi-supervised learning


Vanilla VAE vs. semi-supervised VAE:
the semi-supervised VAE has an extra latent variable y for the class label


PGM for semi-supervised VAE

Generative model P, inference model Q

a are auxiliary variables such that
q(a, z|x) = q(z|a, x) q(a|x), and the marginal distribution q(z|x) can fit more complicated posteriors p(z|x)

The incoming joint connections to each variable are deep neural networks with parameters θ and ϕ


Word2Vec for Language

Word2Vec example: for the sentence "Have a great day",
start with one-hot vectors of size |V|, where V = {have, a, great, day}

Two common architectures:
• Continuous Bag of Words (CBOW): the C context words ("Have a great") are input to predict the target word ("day")
• Skip-gram: predicting the context; the word ("great") is input, and for each of the C context positions we get a probability distribution over the |V| words

The result is an embedding of each word (e.g. "great") in a vector space of N dimensions
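The one-hot representation and the embedding lookup can be sketched as follows; the weight matrix here is random, standing in for trained word2vec weights:

```python
import numpy as np

V = ["have", "a", "great", "day"]

def one_hot(word, vocab):
    # |V|-dimensional vector: 1 at the word's index, 0 elsewhere
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# Multiplying the one-hot vector by a |V| x N weight matrix selects one row:
# that row is the word's N-dimensional embedding
rng = np.random.default_rng(0)
N = 3
W = rng.standard_normal((len(V), N))
embedding = one_hot("great", V) @ W  # equals W[2], the row for "great"
```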

Sentence prediction by
conventional autoencoder
Sentences produced by greedily decoding from points between two sentence encodings with a conventional autoencoder:
the intermediate sentences are not plausible English


VAE Language Model


• Words are represented using a learned dictionary of embedding vectors


VAE sentence interpolation


• Paths between random points in VAE space
• Intermediate sentences are grammatical
• Topic and syntactic structure are consistent
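The interpolation itself is just points on a straight line between two latent codes; a minimal sketch (decoding each point into a sentence would require the trained decoder):

```python
import numpy as np

def interpolate(z_a, z_b, steps):
    # Evenly spaced points on the line between two sentence encodings;
    # each point would be greedily decoded into a sentence
    ts = np.linspace(0.0, 1.0, steps)
    return [(1 - t) * z_a + t * z_b for t in ts]
```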


VAE for Radiology


• Combines two types of models:
– discriminative & generative models
– Models trained jointly using a variational EM framework

Left: discriminative deep neural network model
– Input: observed variables
– Generates posterior distributions over latent variables and possibly (if unobserved) class labels
– Performs inference of the latent variables necessary to perform the variational updates

Right: generative PGM with inputs:
1. class label y (diseases)
2. nuisance variables s (hospital identifiers)
3. latent variables z (size, shape, other brain properties)
– Provides causality of observation
