
DEEP LEARNING – GAN

林伯慎 Prof. Bor-Shen Lin
[email protected]
VECTOR SPACE DECOMPOSITION AND SYNTHESIS

[Diagram: x → Φ^t (decomposition) → c → Φ (synthesis) → x̃]

• Assume Φ = {φ_i}_{i=1}^n is an orthonormal set and x is a vector.
• Decomposition: c_i = ⟨x, φ_i⟩ for i = 1, 2, …, n.
  - c_i is the amount of projection of x in the direction of φ_i.
  - c = Φ^t x is the decomposition of the vector x.
• Synthesis: x̃ = Σ_{i=1}^n c_i φ_i = ΦΦ^t x.
  - x̃ is the reconstruction of x, with reconstruction loss L_2(x, x̃).
  - If Φ is a basis, L_2(x, x̃) = 0.
ANALYSIS

• If Φ = {φ_i}_{i=1}^n is a set of orthonormal vectors in a vector space and x is a vector in that space:
  - c_i = ⟨x, φ_i⟩ for i = 1, 2, …, n.
  - c_i is the projection of the vector x on the direction of φ_i.
• Decomposition of the vector x in the subspace:
  c = Φ^t x = [φ_1 φ_2 … φ_n]^t x,
  i.e. [c_1; …; c_n] = [φ_1^t; …; φ_n^t] x = [φ_1^t x; …; φ_n^t x],
  so c_i = φ_i^t x = ⟨x, φ_i⟩.
• Φ^t acts as an analysis network, with φ_i as the connection weights of neuron i.
SYNTHESIS

• x̃ = Σ_{i=1}^n c_i φ_i = [φ_1 φ_2 … φ_n][c_1; …; c_n] = Φc = ΦΦ^t x.
• x̃ is the reconstruction of the vector x in the linear subspace spanned by Φ.
• Reconstruction error: L_2(x, x̃).
• When Φ is a basis of the vector space, L_2(x, x̃) = 0.
• c is a representation of x.

[Diagram: analysis x → Φ^t → c; synthesis c → Φ → x̃]
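As an illustration that is not part of the original slides, here is a minimal NumPy sketch of the analysis/synthesis pair above; building Φ from a QR factorization and the sizes d = 8, n = 5 are arbitrary choices for the example.

```python
import numpy as np

# Build an orthonormal set Phi of n vectors in R^d (here n < d, so Phi spans
# only a subspace and the reconstruction is a projection).
d, n = 8, 5
rng = np.random.default_rng(0)
Phi, _ = np.linalg.qr(rng.standard_normal((d, n)))   # columns are orthonormal

x = rng.standard_normal(d)

# Decomposition (analysis): c_i = <x, phi_i>, i.e. c = Phi^t x
c = Phi.T @ x

# Synthesis: x_tilde = sum_i c_i phi_i = Phi c = Phi Phi^t x
x_tilde = Phi @ c

# Reconstruction loss L2(x, x_tilde); zero only when Phi is a full basis (n == d)
loss = np.sum((x - x_tilde) ** 2)
print("L2 reconstruction loss:", loss)
```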
EXAMPLE: DFT / IDFT

• Discrete Fourier transform
  - Decomposition of a discrete-time signal x[n] of length N on a subspace with basis Φ = {e^{jωn}}.
  - FT: X(ω) = ⟨x[n], e^{jωn}⟩  (continuous spectrum)
  - DFT: X[k] = ⟨x[n], e^{j2πkn/N}⟩  (discrete spectrum)
  - The ingredients of x[n] at different frequencies ω.
• Inverse discrete Fourier transform
  - Reconstruction of the signal using the features and the basis Φ.
  - IFT: x̃[n] = (1/2π) ∫ X(ω) e^{jωn} dω
  - IDFT: x̃[n] = (1/N) Σ_{k=0}^{N−1} X[k] e^{j2πkn/N}
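A minimal NumPy sketch (not from the slides) of the DFT as an inner product with the basis e^{j2πkn/N} and the IDFT as the synthesis step; the test signal and N = 16 are arbitrary.

```python
import numpy as np

N = 16
n = np.arange(N)
x = np.cos(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 5 * n / N)

# DFT as an inner product with the basis e^{j 2*pi*k*n/N}:
# X[k] = <x[n], e^{j 2*pi*k*n/N}> = sum_n x[n] * e^{-j 2*pi*k*n/N}
k = n.reshape(-1, 1)
W = np.exp(-2j * np.pi * k * n / N)      # analysis matrix (conjugated basis vectors as rows)
X = W @ x

# IDFT (synthesis): x_tilde[n] = (1/N) * sum_k X[k] * e^{j 2*pi*k*n/N}
x_tilde = (np.conj(W).T @ X) / N

print(np.allclose(x, x_tilde))           # True: Phi is a basis, so the loss is zero
print(np.allclose(X, np.fft.fft(x)))     # matches NumPy's FFT
```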
DECOMPOSITION

• Car
  - A car → a handle, 4 wheels, …
• Hamburger
  - A hamburger → water, starch, minerals, …
• 3D vector projected onto a 2D plane
  - The error vector e is perpendicular to the plane spanned by Φ.
  - The projection is the reconstruction of x.
• Fourier analysis
  - Decomposing the signal with a set of cosine functions.
  - "Fourier transform": decomposition of the signal.
  - "Inverse Fourier transform": reconstruction of the signal.
AUTO-ENCODER

[Diagram: x → Encoder (analysis, D) → z → Decoder (synthesis, G) → x̃, trained with L_2(x, x̃)]

• Self-reconstruction of a vector, trained to minimize L_2(x, x̃).
• D/G could be an FNN, CNN/DCNN, RNN, or other networks.
• Representation learning (unsupervised): z is the feature of x.
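A minimal PyTorch sketch of such an auto-encoder, assuming a fully connected encoder/decoder on 784-dimensional inputs; the layer sizes, the class name AutoEncoder, and z_dim = 32 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Analysis (encoder) followed by synthesis (decoder)."""
    def __init__(self, x_dim=784, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))

    def forward(self, x):
        z = self.encoder(x)          # z is the learned feature of x
        x_tilde = self.decoder(z)    # reconstruction of x
        return x_tilde, z

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)              # a batch of dummy inputs

x_tilde, z = model(x)
loss = torch.mean((x - x_tilde) ** 2)   # L2 reconstruction loss
opt.zero_grad(); loss.backward(); opt.step()
```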
GAN
(GENERATIVE ADVERSARIAL NETWORK)

[Diagram: noise z → Generator G (synthesis) → Xfake; Xfake and real photos Xreal → Discriminator D (analysis) → D(X)]
DISCRIMINATOR

• Binary classifier
  - Tells whether an object is of a specific type or not.
  - Trained with positive/negative samples, e.g. a CNN.
• Example: face detection
  - Positives: any face photos.
  - Negatives: any non-face photos.
• D(X) ∈ (0,1), trained against the 0/1 label with cross-entropy loss.
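A PyTorch sketch of such a discriminator: a small CNN with a sigmoid output D(X) ∈ (0,1) trained with cross-entropy against 0/1 labels; the 28x28 input size and layer widths are assumptions for illustration.

```python
import torch
import torch.nn as nn

# A small CNN discriminator for 28x28 grayscale images; D(X) in (0,1).
D = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),   # 28x28 -> 14x14
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),  # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
x_pos = torch.rand(16, 1, 28, 28)     # positive samples (e.g. face photos)
x_neg = torch.rand(16, 1, 28, 28)     # negative samples (non-face photos)

scores = D(torch.cat([x_pos, x_neg]))                    # shape (32, 1)
labels = torch.cat([torch.ones(16, 1), torch.zeros(16, 1)])
loss = bce(scores, labels)                               # cross-entropy with 0/1 labels
loss.backward()
```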
FNN GENERATOR

[Diagram: noise z → fully connected network → Y (784 values) → 28 x 28 image X]

• The fully connected output Y is mapped to the image by X[i][j] = Y[i*28 + j].
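A minimal PyTorch sketch of the fully connected generator above; the hidden width of 256 and the Tanh output are illustrative choices.

```python
import torch
import torch.nn as nn

# Fully connected generator: noise z -> 784 values Y -> 28x28 image X,
# with X[i][j] = Y[i*28 + j] (i.e. a simple reshape).
G = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

z = torch.randn(8, 100)          # a batch of noise vectors
Y = G(z)                         # (8, 784)
X = Y.view(-1, 28, 28)           # map each row to a 28x28 image
print(X.shape)                   # torch.Size([8, 28, 28])
```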
DCNN GENERATOR

[Diagram: z → G (DCNN with up pooling) → Xfake]

Layer | Operation                              | Input     | Output
1     | Fully connected (16,384 x 100 weights) | z (100)   | 16,384 → 8x8x256
2     | Up pooling + Conv 128@3x3x256          | 8x8x256   | 16x16x128
3     | Up pooling + Conv 64@3x3x128           | 16x16x128 | 32x32x64
4     | Up pooling + Conv 3@3x3x64             | 32x32x64  | 64x64x3
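A PyTorch sketch that follows the layer table above, using nn.Upsample for the "up pooling" stages; the activation choices (ReLU/Tanh) are assumptions not specified on the slide.

```python
import torch
import torch.nn as nn

# DCNN generator following the layer table: FC to 8x8x256, then three
# (up pooling + convolution) stages up to a 64x64x3 image.
class DCNNGenerator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 8 * 8 * 256)                              # 100 -> 16,384
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(256, 128, 3, padding=1),  # 8x8x256 -> 16x16x128
            nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1),   # 16x16x128 -> 32x32x64
            nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1),     # 32x32x64 -> 64x64x3
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 256, 8, 8)   # reshape the 16,384 features to 8x8x256
        return self.net(h)

x_fake = DCNNGenerator()(torch.randn(4, 100))
print(x_fake.shape)                          # torch.Size([4, 3, 64, 64])
```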
TRAINING OF GAN

[Diagram: noise z → G → Xfake; Xfake and real photos Xreal → D → D(X)]

GAN learning:
1. Xreal: its goal is to be accepted by D when learning D (gold label 1):
   max_D log(D(Xreal)).
2. Xfake: its goal is to be rejected by D when learning D (gold label 0):
   max_D log(1 − D(Xfake)).
3. Xfake: its goal is to pretend to be real and be accepted by D, so the gold label is set to 1 to generate gradients for learning G (D is NOT updated):
   max_G D(G(z)).
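A minimal PyTorch sketch of one training step implementing the three objectives above with cross-entropy losses; the MLP architectures and the batch of random "real" samples are placeholders.

```python
import torch
import torch.nn as nn

# D maps an image to a probability in (0,1); G maps noise to a fake image.
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

x_real = torch.rand(64, 784)                 # a batch of real samples (dummy here)

# Steps 1 and 2: learn D. Gold label 1 for X_real, 0 for X_fake, which maximizes
# log D(X_real) + log(1 - D(X_fake)). G is detached so only D is updated.
x_fake = G(torch.randn(64, 100)).detach()
loss_D = bce(D(x_real), ones) + bce(D(x_fake), zeros)
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# Step 3: learn G. X_fake pretends to be real, so the gold label is set to 1
# to generate gradients for G; D is NOT updated here.
x_fake = G(torch.randn(64, 100))
loss_G = bce(D(x_fake), ones)                # i.e. maximize log D(G(z))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```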
HOW DOES GAN WORK?

[Figure: positive samples Xreal (+) enclosed by a decision border; generated samples Xfake (−) act as attackers that push the border tighter around the positives as the generator improves.]
DISCUSSIONS

• The discriminator is a binary classifier with positive samples ONLY; the negative samples are produced by the generator.
• If the generator is not good enough,
  - the generated Xfake are too far away from Xreal, which makes the decision boundary lousy.
  - You cannot train a troop with weak imaginary enemies.
• When the generator becomes tougher,
  - the generated samples come closer to the positive samples, and the decision boundary shrinks back toward the positive samples.
  - Train Olympic athletes in real games.

[Figure: defenders (Xreal, +) inside the decision border, attackers (Xfake, −) outside.]
GOALS OF GAN

• The goal may be to train the discriminator (D) or the generator (G).
• When the goal is to train the discriminator
  - It is possible to train a discriminator with GAN when only positive samples are available.
  - The generator is used to produce more negative samples so as to better train the discriminator.
• When the goal is to train the generator
  - It is possible to generate something similar to the positive samples (reals) but with variation (through the noise z).
  - It is not expected to generate exactly the same things.
  - Mode collapse:
    → when changing z, the output does not change (the loss allows a many-to-one mapping)
    → the characteristics of the generated output cannot be controlled
CONDITIONAL GAN (C-GAN)

[Diagram: noise z and condition c (e.g. the label 7) → G → Xfake; (Xfake or real photos Xreal, together with c) → D → (0,1)]

• Training inputs: image + condition (implemented with fully connected layers).
• Use c to control the condition and z to produce variation.
• Conditions: label, image, or text.
• Cited from C-GAN by M. Mirza.
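A minimal sketch of the conditioning, assuming a one-hot label concatenated with z for G and with the flattened image for D; the layer sizes and variable names are illustrative.

```python
import torch
import torch.nn as nn

n_classes, z_dim = 10, 100

# The condition c (a digit label here) is one-hot encoded and concatenated
# with the noise z for G, and with the (flattened) image for D.
G = nn.Sequential(nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784 + n_classes, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

z = torch.randn(16, z_dim)
c = nn.functional.one_hot(torch.full((16,), 7), n_classes).float()  # condition: digit 7

x_fake = G(torch.cat([z, c], dim=1))          # vary z for variation, c for content
score = D(torch.cat([x_fake, c], dim=1))      # D judges the (image, condition) pair
print(x_fake.shape, score.shape)
```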
C-GAN EXAMPLE – MNIST

[Figure: with the label as condition, changing z varies the handwriting style, while changing the condition c (e.g. to the digit 2) changes the generated digit.]

• Label as condition.
• Cited from C-GAN by M. Mirza.
C-GAN EXAMPLE – AUTO TAGGING

[Diagram: z and condition c → G → Xfake; D judges it against Xreal and the real tags (e.g. "raspberry"). In the tagging example, G generates tags such as "creek, lake, waters, river".]

• Cited from C-GAN by M. Mirza.


C-GAN FOR IMAGE-TO-IMAGE TRANSLATION

[Figure: the input image serves as the condition.]

• Cited from Image-to-Image Translation with Conditional Adversarial Networks.
• D uses PatchGAN: it judges whether each arbitrary NxN patch is real or fake.
  - This reduces the Xreal space and provides more positive samples.
DOMAIN TRANSFORMATION

• Auto-Encoder
• Variational Auto-Encoder (VAE)
• GAN / cGAN transformer
• Cycle-Consistent GAN
• StarGAN
AUTO-ENCODER, AE (TRANSFORMATION)

[Diagram: X → D (encoder) → z → G (decoder) → Ỹ, trained against the target Y with loss L_p(Ỹ, Y)]

• Encoder-decoder transformation (e.g. U-net / ResNet).
• Learns a transformation, minimizing L_1(Y, Ỹ).
• Needs paired data (X_i, Y_i).
• Example: gray-to-color.
VARIATIONAL AUTO-ENCODER

[Diagram: X → D (encoder) → μ, σ; z = μ + n·σ with n ~ N(0,1); z → G (decoder) → Ỹ, trained with loss L_p(Ỹ, Y)]

• Encoder output: mean μ and standard deviation σ.
• z_i = μ_i + n_i σ_i, with n_i ~ N(0,1).
  - Record n_i, then update μ_i and σ_i.
• Adds uncertainty to G, due to n_i.
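A minimal PyTorch sketch of the reparameterization step above; the encoder here outputs a log-variance (a common parameterization) rather than σ directly, and the KL term is the standard VAE regularizer, which is not written on the slide.

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 16
encoder = nn.Linear(x_dim, 2 * z_dim)        # outputs mean and log-variance
decoder = nn.Sequential(nn.Linear(z_dim, x_dim), nn.Sigmoid())

x = torch.rand(32, x_dim)
mu, log_var = encoder(x).chunk(2, dim=1)
sigma = torch.exp(0.5 * log_var)

# Reparameterization: z_i = mu_i + n_i * sigma_i with n_i ~ N(0,1).
# The sampled noise n is "recorded" (held fixed) for the step, so gradients
# flow only through mu and sigma, which is what allows them to be updated.
n = torch.randn_like(sigma)
z = mu + n * sigma

y_tilde = decoder(z)                         # reconstruction with added uncertainty
rec_loss = torch.mean((x - y_tilde) ** 2)
kl_loss = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())  # standard VAE term
loss = rec_loss + kl_loss
```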
GAN / CGAN

[Diagram: X → encoder D → generator G → Ỹ; a discriminator compares Ỹ against real photos Y (for the conditional GAN, the discriminator also sees the input X as the condition).]

• GAN
  - Does not need paired data: X = {X_i}, Y = {Y_j}.
  - Not easy to converge well.
  - An L1 loss can be added if paired data are available.
• cGAN (conditional)
  - Needs paired data: T = {(X_i, Y_i)}.
  - Could add an L1 loss.
CYCLE GAN

[Diagram: X (X-domain) → F → Yfake = F(X) → G → X̃, with cycle loss L_CYC(X, X̃); D_Y judges Yfake against Yreal (Y-domain), giving L_GAN(D_Y, F).]

• X-domain and Y-domain data are not required to be paired.
• F for X → Y, G for Y → X.
• 2 cycle losses: L_CYC(X, X̃) and L_CYC(Y, Ỹ).
  - The transformed samples serve as fake data, the originals as real data.
• 2 GAN losses: L_GAN(D_X, G) and L_GAN(D_Y, F).
• Optimization over multiple networks (F, G, D_X, D_Y) with multiple objectives.
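A minimal sketch of the four-network objective with linear stand-ins for F, G, D_X, and D_Y; the cycle-loss weight of 10 is a common choice, not taken from the slide.

```python
import torch
import torch.nn as nn

# Minimal stand-ins for the four networks: F (X -> Y), G (Y -> X), D_X, D_Y.
dim = 64
F   = nn.Linear(dim, dim)
G   = nn.Linear(dim, dim)
D_X = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
D_Y = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
bce, l1 = nn.BCELoss(), nn.L1Loss()

x_real = torch.randn(8, dim)                  # unpaired samples from the two domains
y_real = torch.randn(8, dim)
ones = torch.ones(8, 1)

y_fake, x_fake = F(x_real), G(y_real)         # transformed samples act as fake data

# Two cycle losses: X -> F -> G should return to X, and Y -> G -> F back to Y.
loss_cyc = l1(G(y_fake), x_real) + l1(F(x_fake), y_real)

# Two GAN losses (generator side): the fakes should be accepted by the discriminators.
loss_gan = bce(D_Y(y_fake), ones) + bce(D_X(x_fake), ones)

loss_total = loss_gan + 10.0 * loss_cyc       # lambda = 10 is a common weighting
```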
CYCLE GAN – EXAMPLE

• Cited from Learning to Discover Cross-Domain Relations with Generative Adversarial Networks.
CYCLE GAN – EXAMPLE

• Cited from Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.
EXAMPLES

[Figure: grayscale input compared with OpenCV and Photoshop results.]

• Cited from Daiva's master thesis.
DISCUSSIONS ON CYCLE-GAN

• The goal is to train the transformer instead of the generator.
  - Domain transformation, e.g. black hair to blond hair, horse to zebra.
  - Without requiring paired data (compare with the transformer approach, which requires paired data).
• Complicated and time-consuming
  - Joint optimization of multiple networks with multiple objectives.
  - A reconstruction loss may help to improve the quality (peeking at the ground truth).
  - A U-net or residual net can be used to accelerate convergence.
• Inconvenient for transforming among multiple attributes.
STARGAN

• If using CycleGAN for multiple domains:
  - Multiple transformers are needed.
  - A lot of computation.
  - Not flexible.
STARGAN

[Diagram: X and the target-domain condition c (e.g. black hair) → G → Yfake; D outputs real/fake and a domain classification (is it black hair or not), trained with Yreal from the Y domain; Yfake and the original-domain condition (e.g. blond hair) → G → X̃, with reconstruction loss L_rec(X, X̃).]

• A single generator G takes the image and the target-domain condition c.
• A single discriminator D outputs real/fake plus a domain classification.
• Translating Yfake back with the original condition gives X̃ and the reconstruction loss L_rec(X, X̃).
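A minimal sketch of the single-generator / single-discriminator setup with a real/fake head and a domain-classification head; the linear stand-ins, the domain count, and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_domains, dim = 5, 64

# G takes (image, target-domain code c); D has a real/fake head and a
# domain-classification head, so one G/D pair covers all domains.
G = nn.Linear(dim + n_domains, dim)
D_src = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())   # real/fake head
D_cls = nn.Linear(dim, n_domains)                        # domain classifier head

x = torch.randn(8, dim)                                  # input images (flattened stand-ins)
c_orig = nn.functional.one_hot(torch.zeros(8, dtype=torch.long), n_domains).float()
c_trg  = nn.functional.one_hot(torch.ones(8, dtype=torch.long), n_domains).float()

y_fake = G(torch.cat([x, c_trg], dim=1))                 # translate to the target domain
x_rec  = G(torch.cat([y_fake, c_orig], dim=1))           # translate back to the original domain

loss_rec = nn.functional.l1_loss(x_rec, x)               # reconstruction loss L_rec(X, X~)
loss_adv = nn.functional.binary_cross_entropy(D_src(y_fake), torch.ones(8, 1))
loss_cls = nn.functional.cross_entropy(D_cls(y_fake), torch.ones(8, dtype=torch.long))
loss_G = loss_adv + loss_cls + 10.0 * loss_rec           # weighting is an assumption
```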
STARGAN EXAMPLE

• Cited from StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation.
