DEFENSE Against Adversarial Attacks
BETTERFELLOW
Through the machine's eye: Boat + Perturbation → "iPod"
Adversarial Example, or Attack
Our Approach
Cleanse the attacked image with autoencoders.
Our defense model ≈ a filter. Input: Attacked Image → Expected Output: Purified Image.
Pipeline: Attacked Image → Our defense model (autoencoder) → Purified Image → Model to defend (CNN) → Expected Result: "Boat"
The defense model reverses the effect of the perturbation.
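A minimal sketch of this purification pipeline in PyTorch. The names (`purify_and_classify`, `defense_autoencoder`, `classifier`) are ours, not from the slides; any trained autoencoder and classifier fit the same shape:

```python
import torch

def purify_and_classify(attacked_image: torch.Tensor,
                        defense_autoencoder: torch.nn.Module,
                        classifier: torch.nn.Module) -> torch.Tensor:
    """Pass the attacked image through the autoencoder 'filter',
    then classify the purified result with the model to defend."""
    with torch.no_grad():
        purified = defense_autoencoder(attacked_image)  # remove the perturbation
        logits = classifier(purified)                   # expected: "Boat" again
    return logits.argmax(dim=-1)
```

The defense never touches the classifier's weights; it only rewrites the input, which is what lets it sit in front of any model to defend.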
Research papers
ATTACKS & DEFENSES
Adversarial examples
GENERALIZE
across different deep learning models
Explaining and Harnessing Adversarial Examples
I.J. Goodfellow et al.
arXiv preprint arXiv:1412.6572, 2014
Universal Adversarial Perturbations
S.M. Moosavi-Dezfooli et al.
IEEE Conference on CVPR, 2017.
There exists
a lower-dimensional subspace
that captures the decision boundaries
They fool classifiers mainly
due to covariate shift.
PixelDefend: Leveraging Generative Models
to Understand and Defend against
Adversarial Examples
Y. Song et al.
arXiv preprint arXiv:1710.10766, 2017.
INTUITIONS & HYPOTHESIS
• All image classifiers are approximating the same function for a given task
• There seems to be a lower-dimensional latent subspace for the decision boundaries
• Adversarial examples lie in low-probability regions of the training distribution
• If we can capture the latent subspace of the training distribution,
we might be able to pull the adversarial examples back toward its mean.
OUR PREVIOUS TOY EXPERIMENTS
Our Previous Toy Experiments: Built from scratch
1. Convolutional Denoising Autoencoder (dAE) → Model to defend: LeNet-5 → "It's a five"
2. Convolutional Variational Autoencoder (VAE) → Model to defend: LeNet-5 → "It's a five"
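As a sketch, the first toy model might look like this in PyTorch; the layer sizes are illustrative assumptions, since the slides do not list them:

```python
import torch
import torch.nn as nn

# A minimal convolutional denoising autoencoder for 28x28 MNIST images.
class ConvDenoisingAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),        # 7 -> 14
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid(),     # 14 -> 28
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training pairs: (attacked or noised image) -> (clean image),
# with a pixel-wise reconstruction loss such as BCE or MSE.
```

The VAE variant swaps the bottleneck for a mean/log-variance pair with the reparameterization trick and adds a KL term to the reconstruction loss.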
Toy Defense Model Comparison: Output comparison
[Image grid: Raw → Attacked → Denoised (Conv dAE) and Reconstructed (Conv VAE)]
Toy Defense Model Comparison: Accuracy comparison (%)
Raw Images: 90
Attacked Images: 65
Denoised Images (Conv dAE): 75
Reconstructed Images (Conv VAE): 77
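The accuracies above come from running the defended classifier on each image set. A minimal sketch of that measurement, assuming hypothetical trained models (`lenet5`, `dae`, `vae`) and test tensors (`raw`, `attacked`, `labels`):

```python
import torch

def accuracy(classifier, images, labels):
    """Top-1 accuracy of a classifier on a tensor of images."""
    with torch.no_grad():
        preds = classifier(images).argmax(dim=-1)
    return (preds == labels).float().mean().item()

# for name, imgs in [("Raw", raw), ("Attacked", attacked),
#                    ("Denoised (dAE)", dae(attacked)),
#                    ("Reconstructed (VAE)", vae(attacked))]:
#     print(f"{name}: {100 * accuracy(lenet5, imgs, labels):.0f}%")
```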
OUR MAIN EXPERIMENT
Our Main Experiment
• Real image dataset: CIFAR-10
• Concentrate on the architecture of the VAE
• Train defense models on the target model's output
• Black-box defense
• Use of better computation power
• Baseline: PixelDefend's 30% → 52%
Our Main Experiment: Built from scratch
[dVAE architecture diagram: 32x32x3 input and output; encoder feature maps 32x32x64 → 16x16x128 → 8x8x256 → 4x4x256 → 2x2x512, fully-connected layers of size 1024, 512, and 128 around the latent (Z_mu, Z_logvar, with sampled-epsilon noise added); the decoder mirrors the encoder back up to 32x32x3]
Denoising Variational Auto-Encoder (dVAE) := dAE + VAE
Im, D.J., et al., Denoising criterion for variational auto-encoding framework. Proceedings of AAAI, pp. 2059-2065, 2017.
Architectural Reference: X. Yan et al., Attribute2Image: Conditional Image Generation from Visual Attributes. In ECCV, 2016.
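Written out, the denoising criterion of Im et al. (2017) keeps the usual evidence-lower-bound form, but the encoder reads the corrupted input x̃ (here, the attacked image) while the decoder is trained to reconstruct the clean target x:

```latex
\mathcal{L}_{\mathrm{dVAE}}(x, \tilde{x})
  = \mathbb{E}_{q_\phi(z \mid \tilde{x})}\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid \tilde{x}) \,\|\, p(z)\right)
```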
Input: Attacked Image → Expected Output: Purified Image
Our Main Experiment: Overall Architecture
[Diagram: Attacked Image → dVAE → Purified Image → VGG16 → Classification Result, compared against the True Label; training signals: Reconstruction Loss and Latent Loss on the dVAE, Classification Loss through VGG16]
• Reconstruction Loss: Binary Cross-entropy
• Latent Loss: Kullback-Leibler Divergence
• Classification Loss: Binary Cross-entropy
Input: Attacked Image → Expected Output: Purified Image (reconstruction target: Raw Image)
• Learning rate: 3×10⁻⁵
• Batch size: 32
• Epochs: 80
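A sketch of one training step combining the three losses above, assuming the dVAE returns `(purified, z_mu, z_logvar)`, the VGG16 is frozen, and equal loss weights (the slides do not state the weighting):

```python
import torch
import torch.nn.functional as F

def training_step(dvae, vgg16, attacked, raw, labels, optimizer,
                  w_rec=1.0, w_kl=1.0, w_cls=1.0):  # weights are assumptions
    purified, z_mu, z_logvar = dvae(attacked)

    # Reconstruction loss: binary cross-entropy against the raw image
    # (assumes the dVAE ends in a sigmoid, so `purified` lies in (0, 1)).
    rec = F.binary_cross_entropy(purified, raw, reduction="sum")

    # Latent loss: KL divergence between q(z | attacked) and N(0, I).
    kl = -0.5 * torch.sum(1 + z_logvar - z_mu.pow(2) - z_logvar.exp())

    # Classification loss through the frozen VGG16; its weights are not in
    # `optimizer`, so gradients flow through it into the dVAE only. The
    # slides list binary cross-entropy; cross_entropy is the usual
    # multi-class form for CIFAR-10's 10 labels.
    cls = F.cross_entropy(vgg16(purified), labels)

    loss = w_rec * rec + w_kl * kl + w_cls * cls
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```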
Our Main Experiment: CIFAR-10 Dataset
RAW Images vs. ATTACKED Images,
attacked by the Fast Gradient Sign Method (FGSM)
Goodfellow, I.J., et al. Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572, 2014.
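FGSM takes a single step along the sign of the input gradient of the loss. A minimal sketch for images scaled to [0, 1] (ε is the attack strength; 8/255 is an illustrative choice of ours):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """x_adv = x + epsilon * sign(grad_x J(theta, x, y))  (Goodfellow et al., 2014)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One signed gradient step, then clip back to the valid [0, 1] image range.
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()
```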
Our Main Experiment: CIFAR-10 Dataset
VGG16 on RAW Images: Accuracy 90%
VGG16 on ATTACKED Images: Accuracy 30%
Our Main Experiment: CIFAR-10 Dataset
ATTACKED Images → dVAE → VGG16: Accuracy 17%
RAW Images → dVAE → VGG16: Accuracy 17%
Discussion
One might conclude that
the VAE (X. Yan et al., 2016) is the problem
[dVAE architecture diagram repeated from the main-experiment slide]
Architectural Reference: X. Yan et al., Attribute2Image: Conditional Image Generation from Visual Attributes. In ECCV, 2016.
Discussion
However, the VAE did converge,
and its output is comparable to
that of other supposedly well-trained VAEs.
Discussion
So, to classify an image correctly,
the classifier needs much more high-spatial-frequency information
(≈ a high-resolution image)
Discussion
dVAE → VGG16: Accuracy 17%, when only low-spatial-frequency information is given
≈ the proportion that low-spatial-frequency information contributes to classification
The drop from Accuracy 90% is a loss of accuracy due to the loss of high-spatial-frequency information
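This reading can be probed without any autoencoder: low-pass filter the raw images and measure how far VGG16's accuracy drops. A sketch using an FFT mask (the cutoff of 8 bins is an arbitrary choice of ours):

```python
import torch

def low_pass(images: torch.Tensor, cutoff: int = 8) -> torch.Tensor:
    """Zero out all spatial frequencies above `cutoff` (in FFT bins)
    for a batch of NCHW images, keeping only low-frequency content."""
    f = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    h, w = images.shape[-2:]
    yy = torch.arange(h, device=images.device).view(-1, 1)
    xx = torch.arange(w, device=images.device).view(1, -1)
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= cutoff ** 2
    return torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1))).real

# Comparing VGG16's accuracy on `images` vs. `low_pass(images)` isolates
# how much of its decision rests on high-spatial-frequency content.
```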
Discussion
However, humans with the prior knowledge that there are 10 classes,
{Airplane, Car, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck},
and leveraging the low-spatial-frequency information,
are able to do better than the machine.
∴ A difference from our biological vision system
Conclusion
Why our model did not succeed:
• Not VAEs, but a more suitable generative model for high resolution (e.g., DiscoGAN)
Why adversarial attacks work:
• An architectural limitation of ConvNets
• Not leveraging low-spatial-frequency information
• No use of prior knowledge in the system
REFERENCES
In order of appearance
• I.J. Goodfellow et al., Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572, 2014.
• S.M. Moosavi-Dezfooli et al., Universal Adversarial Perturbations. IEEE Conference on CVPR, 2017.
• Y. Song et al., PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples. arXiv preprint arXiv:1710.10766, 2017.
• Y. LeCun et al., Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998.
• P. Vincent et al., Extracting and Composing Robust Features with Denoising Autoencoders. Proceedings of ICML, 2008.
• X. Yan et al., Attribute2Image: Conditional Image Generation from Visual Attributes. In ECCV, 2016.
• D.J. Im et al., Denoising Criterion for Variational Auto-Encoding Framework. Proceedings of AAAI, pp. 2059-2065, 2017.
THANK YOU
René Magritte, The castle in the Pyrenees, 1959
