
High-Resolution Image Synthesis and Semantic Manipulation With Conditional Gans

This document describes pix2pixHD, a method for high-resolution image synthesis and semantic manipulation using conditional generative adversarial networks (GANs). It extends the baseline pix2pix method to produce higher-resolution images via a coarse-to-fine generator, multi-scale discriminators, and a robust objective function. It also leverages instance-level segmentation maps to improve object boundaries and enables multi-modal outputs by conditioning on embedded semantic features.


High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
(pix2pixHD)

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao,


Jan Kautz, Bryan Catanzaro
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Introduction

[Figure: a semantic label map with regions labeled Building, Tree, Car, Road, and Sidewalk, alongside the photorealistic street scene synthesized from it]
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Related Work
Generative Adversarial Network (GAN): Goodfellow et al. [2014], Radford et al. [2015], Arjovsky et al. [2017]
Figure from: https://medium.com/@devnag/

Image-to-Image Translation (low resolution): Johnson et al. [2016], Isola et al. [2017]
Cascaded Refinement Network (CRN, lacks fine details): Chen and Koltun [2017]
Deep Visual Manipulation (low resolution): Zhu et al. [2016], Zhang et al. [2017]
Our Work
High-resolution image synthesis and semantic manipulation, building on:
• GANs: Goodfellow et al. [2014], Radford et al. [2015], Arjovsky et al. [2017]
• Image-to-image translation (low resolution): Johnson et al. [2016], Isola et al. [2017]
• Cascaded Refinement Network (lacks fine details): Chen and Koltun [2017]
• Deep visual manipulation (low resolution): Zhu et al. [2016], Zhang et al. [2017]
Outline
• Introduction
• Related work
• Method
• Baseline method
• Our method
• Results
• Applications
• Conclusion
Baseline Method: pix2pix
• Discriminator training

[Figure: the discriminator is trained to classify (input label map, synthesized image) pairs as fake and (input label map, real image) pairs as real]
*work by Isola et al. 2017


Baseline Method: pix2pix
• Generator training

[Figure: the generator is trained to fool the discriminator on (input label map, synthesized image) pairs]

*work by Isola et al. 2017
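The pairing above can be sketched in a few lines: the conditional discriminator never sees an image alone, only an (input, image) pair concatenated along the channel axis. A minimal NumPy sketch; the shapes and function name are illustrative, not from the authors' code:

```python
import numpy as np

def pair_for_discriminator(label_map, image):
    """Concatenate the conditioning label map with an image along the
    channel axis, since the pix2pix discriminator scores (input, image)
    pairs. Arrays are (C, H, W)."""
    return np.concatenate([label_map, image], axis=0)

# Toy shapes: a 1-channel label map and a 3-channel image.
label_map = np.zeros((1, 8, 8), dtype=np.float32)
real_image = np.ones((3, 8, 8), dtype=np.float32)
fake_image = np.zeros((3, 8, 8), dtype=np.float32)

real_pair = pair_for_discriminator(label_map, real_image)  # D should score this as real
fake_pair = pair_for_discriminator(label_map, fake_image)  # D should score this as fake
print(real_pair.shape)  # (4, 8, 8)
```

Because the label map rides along with the image, the discriminator can penalize outputs that are realistic but inconsistent with the input.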


Outline
• Introduction
• Related work
• Method
• Baseline method
• Our method
• Results
• Applications
• Conclusion
Our Method
• Extending to high resolution
• New generator
• New discriminator
• New objective function
• Using instance-level segmentation maps
Coarse-to-fine Generator, Multi-scale Discriminators, Robust Objective

[Figure: the coarse-to-fine generator stacks a global network and a local enhancer, each built from residual blocks; three discriminators D1, D2, D3 score the real and synthesized images at three scales; the objective matches discriminator features between real and synthesized images]

*Coarse-to-fine generator: similar ideas in Denton et al. 2015, Huang et al. 2017, Chen et al. 2017, Zhang et al. 2017
*Multi-scale discriminators: similar ideas in Durugkar et al. 2016, Iizuka et al. 2017, Zhang et al. 2017
*Feature matching objective: similar ideas in Larsen et al. 2016
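The two discriminator-side ideas can be sketched as follows: an image pyramid feeds D1, D2, D3 at full, half, and quarter resolution, and the robust objective adds an L1 match between discriminator features of real and synthesized images. A NumPy illustration under our own naming, not the released implementation:

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling over a (C, H, W) array -- one plausible way to
    build the pyramid for the multi-scale discriminators."""
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def image_pyramid(img, num_scales=3):
    """Inputs for D1 (full resolution), D2 (1/2), and D3 (1/4)."""
    pyramid = [img]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

def feature_matching_loss(real_feats, fake_feats):
    """L1 distance between discriminator features of real and synthesized
    images, averaged over layers -- the 'match' arrows in the diagram."""
    return float(np.mean([np.abs(r - f).mean()
                          for r, f in zip(real_feats, fake_feats)]))

img = np.random.rand(3, 16, 16)
pyr = image_pyramid(img)
print([p.shape for p in pyr])  # [(3, 16, 16), (3, 8, 8), (3, 4, 4)]
```

The coarser discriminators see a larger effective receptive field and enforce global consistency, while the full-resolution one pushes for fine detail.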
Our Method
• Extending to high resolution
• Using instance-level segmentation maps
• Boundary improvement
• Multi-modal results using feature embedding
Our Method
• Boundary improvement

[Figure: synthesized results without instance maps (left) vs. with instance maps (right); boundaries between adjacent objects of the same class are much sharper with instance maps]
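The boundary channel behind this improvement is simple to derive from an instance map: a pixel is a boundary pixel if any of its 4-neighbours carries a different instance ID. A hypothetical NumPy sketch of the idea (`boundary_map` is our name, not the paper's):

```python
import numpy as np

def boundary_map(instance_ids):
    """Binary (H, W) map that is 1 wherever a pixel's 4-neighbourhood
    contains a different instance ID -- the instance boundary channel
    appended to the generator input."""
    b = np.zeros_like(instance_ids, dtype=np.uint8)
    horiz = (instance_ids[:, 1:] != instance_ids[:, :-1]).astype(np.uint8)
    vert = (instance_ids[1:, :] != instance_ids[:-1, :]).astype(np.uint8)
    b[:, 1:] |= horiz   # pixel differs from left neighbour
    b[:, :-1] |= horiz  # pixel differs from right neighbour
    b[1:, :] |= vert    # pixel differs from upper neighbour
    b[:-1, :] |= vert   # pixel differs from lower neighbour
    return b

# Two adjacent cars share one semantic label but have distinct instance
# IDs, so a boundary appears between them even inside the "car" region.
ids = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2]])
print(boundary_map(ids))  # [[0 1 1 0], [0 1 1 0]]
```

This is exactly the information a semantic label map lacks: with labels alone, touching objects of the same class merge into one blob.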


Our Method
• Extending to high resolution
• Using instance-level segmentation maps
• Boundary improvement
• Multi-modal results using feature embedding

*Similar ideas in Zhu et al. 2017


Our Method
• Multi-modal (one-to-many) results
Feature Embedding: Training and Inference

[Figure: real images pass through the feature encoder network and instance-wise average pooling to produce low-dimensional per-object features; the labels and features are fed to the image generation network to produce the synthesized image]
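The instance-wise average pooling step can be sketched as follows: every pixel's encoder feature is replaced by the mean over its instance, so one low-dimensional vector describes each object. The function name and toy shapes here are illustrative stand-ins for real encoder features:

```python
import numpy as np

def instance_avg_pool(features, instance_ids):
    """Average a (C, H, W) feature map within each instance of an (H, W)
    instance-ID map, so all pixels of one object share one feature vector."""
    pooled = np.empty_like(features)
    for inst in np.unique(instance_ids):
        mask = instance_ids == inst
        pooled[:, mask] = features[:, mask].mean(axis=1, keepdims=True)
    return pooled

feats = np.array([[[1.0, 3.0],
                   [5.0, 7.0]]])        # (C=1, H=2, W=2)
ids = np.array([[0, 0],
                [1, 1]])               # two instances
print(instance_avg_pool(feats, ids))   # instance 0 -> 2.0, instance 1 -> 6.0
```

At inference, sampling different pooled vectors per object is what yields the multi-modal (one-to-many) outputs: the same label map can be rendered, say, with cars of different colors.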
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Results
• Comparisons with
• pix2pix [Isola et al. 2017]
• CRN [Chen and Koltun 2017]
• Datasets
• Cityscapes [Cordts et al. 2016]
• NYU [Silberman et al. 2012]
• ADE20K [Zhou et al. 2017]
• Helen Face [Smith et al. 2013]
• CelebA-HQ [Karras et al. 2017]
Results
• Quantitative comparisons (Cityscapes)
• Semantic segmentation scores on synthesized images

Method        Pixel Acc   Mean IoU
pix2pix       78.34       0.39
CRN           70.55       0.35
Ours          83.78       0.64
Oracle (GT)   84.29       0.69

• Subjective scores (pairwise human preference, %)

Ours 93.8 vs. pix2pix 6.2
Ours 86.2 vs. CRN 13.8
Results
• Qualitative comparisons
Results on NYU dataset

pix2pix CRN Ours


Results on ADE20K dataset

Labels Ours Ground truth


Results on CelebA-HQ

Edges    Synthesized    Ground truth
Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Applications: style changing
Applications: label changing
Applications: adding objects
Applications: adding strokes
Applications: face-to-painting

Video credit: Mario Klingemann at https://twitter.com/quasimondo/status/981161785116975104?s=12


Extension: vid2vidHD

Details and code will be released soon!


Outline
• Introduction
• Related work
• Method
• Results
• Applications
• Conclusion
Conclusion
• We present a GAN-based framework that can
• Synthesize high-res realistic images
• Generate multi-modal results
Acknowledgements
• We thank the following people for helpful comments
• Taesung Park, Phillip Isola, Tinghui Zhou, Richard Zhang,
Rafael Valle, Kevin Shih, Guilin Liu, and Alexei A. Efros
• We thank CRN [Chen and Koltun 2017] and pix2pix
[Isola et al. 2017] for sharing their code
Thank you!
Project: https://tcwang0509.github.io/pix2pixHD/
Code: https://github.com/NVIDIA/pix2pixHD
Training details
• LSGAN, Adam solver
• Feature embedding:
• 3-dimensional feature vector
• Training:
• 1024x512 resolution: GPU with 12 GB memory
• 2048x1024 resolution:
• FP32: NVIDIA Quadro M6000 GPU (24 GB)
• FP16: NVIDIA Volta V100 GPU (16 GB)
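For reference, the LSGAN objective listed above replaces the usual cross-entropy GAN loss with least squares. A hedged NumPy sketch in which raw score arrays stand in for discriminator outputs (not the released training code):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push scores on real data toward 1
    and scores on synthesized data toward 0."""
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push scores on synthesized data toward 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

# A perfect discriminator scores real data 1 and fake data 0, giving zero loss.
print(lsgan_d_loss(np.ones(4), np.zeros(4)))  # 0.0
```

The quadratic penalty keeps gradients informative even for samples the discriminator classifies confidently, which helps stabilize training.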
Comparison to SIMS
• Faster speed (~1000x)
• Better alignment with the input

            SIMS   Ours
Pixel Acc   65.5   83.8
*Qi et al. Semi-parametric Image Synthesis. In CVPR 2018.
Comparison to SIMS
SIMS Ours

*Qi et al. Semi-parametric Image Synthesis. In CVPR 2018.
