
High-Resolution Image Synthesis and Semantic Manipulation With Conditional Gans

This document describes pix2pixHD, a method for high-resolution image synthesis and semantic manipulation using conditional generative adversarial networks (GANs). It extends the baseline pix2pix method to produce higher-resolution images via a coarse-to-fine generator, multi-scale discriminators, and a robust objective function. It also leverages instance-level segmentation maps to improve object boundaries and enables multi-modal outputs by conditioning on embedded semantic features.


High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
(pix2pixHD)

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao,


Jan Kautz, Bryan Catanzaro
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Introduction

[Figure: a semantic label map with regions labeled Building, Tree, Car, Road, and Sidewalk, alongside the photorealistic street scene synthesized from it]
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Related Work
Generative Adversarial Network (GAN): Goodfellow et al. [2014], Radford et al. [2015], Arjovsky et al. [2017]
Figure from: https://medium.com/@devnag/

Image-to-Image Translation (low resolution): Johnson et al. [2016], Isola et al. [2017]
Cascaded Refinement Network (CRN, lacks fine details): Chen and Koltun [2017]
Deep Visual Manipulation (low resolution): Zhu et al. [2016], Zhang et al. [2017]
Our Work
High-resolution image synthesis and semantic manipulation, building on:
• GANs: Goodfellow et al. [2014], Radford et al. [2015], Arjovsky et al. [2017]
• Image-to-image translation (low resolution): Johnson et al. [2016], Isola et al. [2017]
• Cascaded Refinement Network (lacks fine details): Chen and Koltun [2017]
• Deep visual manipulation (low resolution): Zhu et al. [2016], Zhang et al. [2017]
Outline
• Introduction
• Related work
• Method
• Baseline method
• Our method
• Results
• Applications
• Conclusion
Baseline Method: pix2pix
• Discriminator training

[Figure: the discriminator is trained to classify (input label map, synthesized image) pairs as fake and (input label map, real image) pairs as real]
*work by Isola et al. 2017


Baseline Method: pix2pix
• Generator training

[Figure: the generator is trained to fool the discriminator on (input label map, synthesized image) pairs]

*work by Isola et al. 2017
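The pairing above can be sketched in a few lines: the conditional discriminator never sees an image alone, only an (input, image) pair concatenated along the channel axis. A minimal NumPy sketch; the shapes and function name are illustrative, not from the authors' code:

```python
import numpy as np

def pair_for_discriminator(label_map, image):
    """Concatenate the conditioning label map with an image along the
    channel axis, since the pix2pix discriminator scores (input, image)
    pairs. Arrays are (C, H, W)."""
    return np.concatenate([label_map, image], axis=0)

# Toy shapes: a 1-channel label map and a 3-channel image.
label_map = np.zeros((1, 8, 8), dtype=np.float32)
real_image = np.ones((3, 8, 8), dtype=np.float32)
fake_image = np.zeros((3, 8, 8), dtype=np.float32)

real_pair = pair_for_discriminator(label_map, real_image)  # D should score this as real
fake_pair = pair_for_discriminator(label_map, fake_image)  # D should score this as fake
print(real_pair.shape)  # (4, 8, 8)
```

Because the label map rides along with the image, the discriminator can penalize outputs that are realistic but inconsistent with the input.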


Outline
• Introduction
• Related work
• Method
• Baseline method
• Our method
• Results
• Applications
• Conclusion
Our Method
• Extending to high resolution
• New generator
• New discriminator
• New objective function
• Using instance-level segmentation maps
Coarse-to-fine Generator, Multi-scale Discriminators, Robust Objective

[Figure: the coarse-to-fine generator stacks a global network and a local enhancer, each built from residual blocks; three discriminators D1, D2, D3 score the real and synthesized images at three scales; the objective matches discriminator features between real and synthesized images]

*Coarse-to-fine generator: similar ideas in Denton et al. 2015, Huang et al. 2017, Chen et al. 2017, Zhang et al. 2017
*Multi-scale discriminators: similar ideas in Durugkar et al. 2016, Iizuka et al. 2017, Zhang et al. 2017
*Feature matching objective: similar ideas in Larsen et al. 2016
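The two discriminator-side ideas can be sketched as follows: an image pyramid feeds D1, D2, D3 at full, half, and quarter resolution, and the robust objective adds an L1 match between discriminator features of real and synthesized images. A NumPy illustration under our own naming, not the released implementation:

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling over a (C, H, W) array -- one plausible way to
    build the pyramid for the multi-scale discriminators."""
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def image_pyramid(img, num_scales=3):
    """Inputs for D1 (full resolution), D2 (1/2), and D3 (1/4)."""
    pyramid = [img]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

def feature_matching_loss(real_feats, fake_feats):
    """L1 distance between discriminator features of real and synthesized
    images, averaged over layers -- the 'match' arrows in the diagram."""
    return float(np.mean([np.abs(r - f).mean()
                          for r, f in zip(real_feats, fake_feats)]))

img = np.random.rand(3, 16, 16)
pyr = image_pyramid(img)
print([p.shape for p in pyr])  # [(3, 16, 16), (3, 8, 8), (3, 4, 4)]
```

The coarser discriminators see a larger effective receptive field and enforce global consistency, while the full-resolution one pushes for fine detail.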
Our Method
• Extending to high resolution
• Using instance-level segmentation maps
• Boundary improvement
• Multi-modal results using feature embedding
Our Method
• Boundary improvement

[Figure: synthesized results without instance maps (left) vs. with instance maps (right); boundaries between adjacent objects of the same class are much sharper with instance maps]
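The boundary channel behind this improvement is simple to derive from an instance map: a pixel is a boundary pixel if any of its 4-neighbours carries a different instance ID. A hypothetical NumPy sketch of the idea (`boundary_map` is our name, not the paper's):

```python
import numpy as np

def boundary_map(instance_ids):
    """Binary (H, W) map that is 1 wherever a pixel's 4-neighbourhood
    contains a different instance ID -- the instance boundary channel
    appended to the generator input."""
    b = np.zeros_like(instance_ids, dtype=np.uint8)
    horiz = (instance_ids[:, 1:] != instance_ids[:, :-1]).astype(np.uint8)
    vert = (instance_ids[1:, :] != instance_ids[:-1, :]).astype(np.uint8)
    b[:, 1:] |= horiz   # pixel differs from left neighbour
    b[:, :-1] |= horiz  # pixel differs from right neighbour
    b[1:, :] |= vert    # pixel differs from upper neighbour
    b[:-1, :] |= vert   # pixel differs from lower neighbour
    return b

# Two adjacent cars share one semantic label but have distinct instance
# IDs, so a boundary appears between them even inside the "car" region.
ids = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2]])
print(boundary_map(ids))  # [[0 1 1 0], [0 1 1 0]]
```

This is exactly the information a semantic label map lacks: with labels alone, touching objects of the same class merge into one blob.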


Our Method
• Extending to high resolution
• Using instance-level segmentation maps
• Boundary improvement
• Multi-modal results using feature embedding

*Similar ideas in Zhu et al. 2017


Our Method
• Multi-modal (one-to-many) results
Feature Embedding: Training and Inference

[Figure: real images pass through the feature encoder network and instance-wise average pooling to produce low-dimensional per-object features; the labels and features are fed to the image generation network to produce the synthesized image]
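The instance-wise average pooling step can be sketched as follows: every pixel's encoder feature is replaced by the mean over its instance, so one low-dimensional vector describes each object. The function name and toy shapes here are illustrative stand-ins for real encoder features:

```python
import numpy as np

def instance_avg_pool(features, instance_ids):
    """Average a (C, H, W) feature map within each instance of an (H, W)
    instance-ID map, so all pixels of one object share one feature vector."""
    pooled = np.empty_like(features)
    for inst in np.unique(instance_ids):
        mask = instance_ids == inst
        pooled[:, mask] = features[:, mask].mean(axis=1, keepdims=True)
    return pooled

feats = np.array([[[1.0, 3.0],
                   [5.0, 7.0]]])        # (C=1, H=2, W=2)
ids = np.array([[0, 0],
                [1, 1]])               # two instances
print(instance_avg_pool(feats, ids))   # instance 0 -> 2.0, instance 1 -> 6.0
```

At inference, sampling different pooled vectors per object is what yields the multi-modal (one-to-many) outputs: the same label map can be rendered, say, with cars of different colors.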
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Results
• Comparisons with
• pix2pix [Isola et al. 2017]
• CRN [Chen and Koltun 2017]
• Datasets
• Cityscapes [Cordts et al. 2016]
• NYU [Silberman et al. 2012]
• ADE20K [Zhou et al. 2017]
• Helen Face [Smith et al. 2013]
• CelebA-HQ [Karras et al. 2017]
Results
• Quantitative comparisons (Cityscapes)
• Semantic segmentation scores on synthesized images

Method        Pixel Acc   Mean IoU
pix2pix       78.34       0.39
CRN           70.55       0.35
Ours          83.78       0.64
Oracle (GT)   84.29       0.69

• Subjective scores (pairwise human preference, %)

Ours 93.8 vs. pix2pix 6.2
Ours 86.2 vs. CRN 13.8
Results
• Qualitative comparisons
Results on NYU dataset

pix2pix CRN Ours


Results on ADE20K dataset

Labels Ours Ground truth


Results on CelebA-HQ

Edges    Synthesized    Ground truth
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Applications: style changing
Applications: label changing
Applications: adding objects
Applications: adding strokes
Applications: face-to-painting

Video credit: Mario Klingemann at https://twitter.com/quasimondo/status/981161785116975104?s=12


Extension: vid2vidHD

Details and code will be released soon!


Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Conclusion
• We present a GAN-based framework that can
• Synthesize high-res realistic images
• Generate multi-modal results
Acknowledgements
• We thank the following people for helpful comments
• Taesung Park, Phillip Isola, Tinghui Zhou, Richard Zhang,
Rafael Valle, Kevin Shih, Guilin Liu, and Alexei A. Efros
• We thank CRN [Chen and Koltun 2017] and pix2pix
[Isola et al. 2017] for sharing their code
Thank you!
Project: https://tcwang0509.github.io/pix2pixHD/
Code: https://github.com/NVIDIA/pix2pixHD
Training details
• LSGAN, Adam solver
• Feature embedding:
• 3-dimensional feature vector
• Training:
• 1024x512 resolution: GPU with 12 GB memory
• 2048x1024 resolution:
• FP32: NVIDIA Quadro M6000 GPU (24 GB)
• FP16: NVIDIA Volta V100 GPU (16 GB)
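For reference, the LSGAN objective listed above replaces the usual cross-entropy GAN loss with least squares. A hedged NumPy sketch in which raw score arrays stand in for discriminator outputs (not the released training code):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push scores on real data toward 1
    and scores on synthesized data toward 0."""
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push scores on synthesized data toward 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

# A perfect discriminator scores real data 1 and fake data 0, giving zero loss.
print(lsgan_d_loss(np.ones(4), np.zeros(4)))  # 0.0
```

The quadratic penalty keeps gradients informative even for samples the discriminator classifies confidently, which helps stabilize training.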
Comparison to SIMS
• Faster speed (~1000x)
• Better alignment with the input

            SIMS   Ours
Pixel Acc   65.5   83.8
*Qi et al. Semi-parametric Image Synthesis. In CVPR 2018.
Comparison to SIMS
SIMS Ours

*Qi et al. Semi-parametric Image Synthesis. In CVPR 2018.
