This document summarizes research on defending deep learning models against adversarial attacks. It describes using a denoising variational autoencoder (dVAE) to purify images that have been attacked. The dVAE is trained on the CIFAR-10 dataset with a combination of reconstruction loss, latent loss, and classification loss. With the dVAE defense, accuracy on attacked images is 17%, compared to 30% for the unmodified model; since chance accuracy on CIFAR-10 is 10%, this indicates some ability to recover information lost to the attack. However, the dVAE captures only low-spatial-frequency information and lacks the high-resolution detail needed for accurate classification. Architectural limitations and a failure to leverage prior knowledge are identified as the reasons the defense was not fully successful.
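As a rough illustration of the training setup described above, the sketch below combines the three losses in PyTorch. The architecture, the loss weights (beta, gamma), and the choice of MSE reconstruction against the clean image are assumptions for the sake of a runnable example; the summary does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingVAE(nn.Module):
    """Minimal convolutional dVAE for 32x32 CIFAR-10 images (illustrative sizes)."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(self.fc_dec(z)), mu, logvar


def dvae_loss(model, classifier, x_attacked, x_clean, labels, beta=1.0, gamma=1.0):
    """Three-term objective (weights beta/gamma are hypothetical): reconstruct the
    clean image from the attacked one, regularize the latent toward N(0, I), and
    keep the purified image classifiable."""
    x_recon, mu, logvar = model(x_attacked)
    recon = F.mse_loss(x_recon, x_clean)                           # reconstruction loss
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # latent (KL) loss
    cls = F.cross_entropy(classifier(x_recon), labels)             # classification loss
    return recon + beta * kl + gamma * cls
```

At test time, an attacked image would be passed through the dVAE and the reconstruction fed to the classifier; the low-spatial-frequency character of VAE reconstructions noted above is what limits this purification step.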