Anti-Diffusion - Preventing Abuse of Modifications of Diffusion-Based Models
Li Zheng1*, Liangbin Xie1,2*, Jiantao Zhou1†, Xintao Wang3, Haiwei Wu1, Jinyu Tian4
1 University of Macau, 2 Shenzhen Institute of Advanced Technology, 3 Kuaishou Technology, 4 Macau University of Science and Technology
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
Although diffusion-based techniques have shown remarkable success in image generation and editing tasks, their abuse can lead to severe negative social impacts. Recently, some works have been proposed to provide defense against the abuse of diffusion-based methods. However, their protection may be limited in specific scenarios by manually defined prompts or the stable diffusion (SD) version. Furthermore, these methods solely focus on tuning methods, overlooking editing methods that could also pose a significant threat. In this work, we propose Anti-Diffusion, a privacy protection system designed for general diffusion-based methods, applicable to both tuning and editing techniques. To mitigate the limitations of manually defined prompts on defense performance, we introduce the prompt tuning (PT) strategy, which enables precise expression of original images. To provide defense against both tuning and editing methods, we propose the semantic disturbance loss (SDL) to disrupt the semantic information of protected images. Given the limited research on defense against editing methods, we develop a dataset named Defense-Edit to assess the defense performance of various methods. Experiments demonstrate that our Anti-Diffusion achieves superior defense performance across a wide range of diffusion-based techniques in different scenarios.

Code — https://github.com/whulizheng/Anti-Diffusion

Figure 1: Our defense system, called Anti-Diffusion, can provide defense against both tuning and editing methods.

*These authors contributed equally.
†Corresponding author.
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction
The field of text-to-image synthesis (Li et al. 2023; Ramesh et al. 2021; Gafni et al. 2022; Ding et al. 2021) has experienced significant advancements, primarily driven by diffusion models (Ho, Jain, and Abbeel 2020; Song, Meng, and Ermon 2020). Numerous diffusion models have demonstrated the ability to generate images of exceptional quality, such as SD (Rombach et al. 2022; Yang et al. 2023) and PixArt (Chen et al. 2023, 2024). Based on these diffusion models, controllable generation methods (ControlNet (Zhang, Rao, and Agrawala 2023), T2I-Adapter (Mou et al. 2023)) and personalized methods (DreamBooth (Ruiz et al. 2023), LoRA (Hu et al. 2021), Textual Inversion (Gal et al. 2022)) have also been proposed. With the rapid advancement of text-to-image techniques, many industry professionals and even ordinary users can create images or train personalized models based on their ideas.

However, technology is a double-edged sword. Individuals can easily utilize images to train personalized models (e.g., DreamBooth, LoRA) and manipulate images using editing methods such as MasaCtrl (Cao et al. 2023) and DiffEdit (Couairon et al. 2023). Similar to DeepFake (Liu et al. 2023; Rana et al. 2022), when these methods are abused by malicious users to create fake news, plagiarize artistic creations, or violate personal privacy, they can have severe negative impacts on both individuals and society (Wang et al. 2023). Hence, finding ways to protect images from the potential abuse of these methods is a pressing issue that requires immediate attention.

Anti-DreamBooth (Anti-DB) (Van Le et al. 2023) has made attempts to address this issue. By adding subtle adversarial noise to images, Anti-DB forces the personalized model trained on them to produce outputs with significant visual artifacts. However, Anti-DB demands additional substitute data and manually defined prompts, which increases its complexity of use. Moreover, in practical scenarios, it is challenging to anticipate the prompts that malicious users might utilize, thereby limiting its defense performance. Additionally, existing methods (Truong, Dang, and Le 2024) focus solely on defending against
personalized generative models, overlooking another crucial scenario: defense against editing models. Editing models can directly modify the content of input images during inference using prompts, thereby presenting a significant security and privacy threat if abused.

In this work, we propose Anti-Diffusion, a privacy protection system to prevent images from being abused by general diffusion-based methods. This system adds subtle adversarial noise (Goodfellow, Shlens, and Szegedy 2014) to users' images before publishing in order to disrupt the tuning and editing processes of diffusion-based methods. To mitigate the impact of different prompts during both defense and malicious use, and to overcome the limitations of manually defined prompts in achieving optimal performance, as shown in Tab. 1, we propose the prompt tuning (PT) strategy. This strategy optimizes a text embedding that more accurately captures the information of protected images. With PT, our method does not require manual selection of prompts during the defense phase and still provides good protection against malicious users training with unknown prompts. Furthermore, as SD achieves semantic control of images through cross-attention (Vaswani et al. 2017), we introduce the semantic disturbance loss (SDL) to disrupt the semantic information of protected images. By minimizing the distance between the cross-attention map and a zero-filled map, it maximizes the semantic distance between clean images and protected images. When equipped with these two designs, our Anti-Diffusion achieves robust defense against both tuning and editing methods, as shown in Fig. 1. To better evaluate the effectiveness of current defense methods against diffusion-based editing methods, we further construct a dataset named Defense-Edit. We hope this dataset can draw attention to the privacy protection challenges posed by diffusion-based image editing models. In summary, our contributions are as follows:

1) We expand the defense to include both tuning-based and editing-based methods, while other baselines focus only on tuning-based methods.
2) We introduce the PT strategy to ensure a better representation of protected images and provide more generalized protection against unexpected prompts.
3) We integrate the SDL to disrupt the semantic information of protected images, enhancing the defense against both tuning-based and editing-based methods.
4) We contribute a dataset called Defense-Edit for evaluating the defense performance against editing-based methods.

Based on both quantitative and qualitative results, our proposed method, Anti-Diffusion, achieves superior defense effects across several diffusion-based techniques, including tuning methods (such as DreamBooth/LoRA) and editing methods (such as MasaCtrl/DiffEdit).

Defense          Test   FDFR↑   ISM↓   BRISQUE↑
Anti-DB (c1)     c1     0.60    0.24   37.41
Anti-DB (c2)     c1     0.48    0.20   37.21
Anti-Diffusion   c1     0.62    0.15   40.46
Anti-DB (c1)     c2     0.37    0.27   36.37
Anti-DB (c2)     c2     0.40    0.25   36.96
Anti-Diffusion   c2     0.60    0.17   40.66

Table 1: Defense performance on the DreamBooth model with different prompts: c1 ("a photo of sks person") and c2 ("a dslr portrait of sks person").

Preliminary

Stable Diffusion
Stable diffusion is a Latent Diffusion Model (LDM) that has been trained on large-scale data. The LDM is a generative model capable of synthesizing high-quality images from Gaussian noise. Unlike traditional diffusion models, the diffusion process in an LDM occurs in the latent space. Consequently, in addition to a diffusion model, an autoencoder comprising an encoder $\mathcal{E}$ and a decoder $\mathcal{D}$ is required. For an image $x$ and an encoder $\mathcal{E}$, the diffusion process introduces noise to the encoded latent variable $z = \mathcal{E}(x)$, resulting in a noisy latent variable $z_t$, with the noise level escalating over timesteps $t \in T$. Subsequently, a UNet $\epsilon_\theta$ is trained to predict the noise added to the noisy latent variable $z_t$, given the text embedding instruction $f$. The loss function of latent diffusion is:

$$\mathcal{L}_{ldm} := \mathbb{E}_{z \sim \mathcal{E}(x),\, f,\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[\left\|\epsilon - \epsilon_\theta(z_t, t, f)\right\|_2^2\right] \tag{1}$$
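To make Eq. (1) concrete, below is a minimal PyTorch sketch of one LDM training step. The `encoder`, `unet`, and `alphas_cumprod` noise schedule are toy stand-ins for the corresponding SD components, not the actual implementation.

```python
import torch
import torch.nn.functional as F

def ldm_loss(unet, encoder, x, f, alphas_cumprod):
    """L_ldm = E[ || eps - eps_theta(z_t, t, f) ||_2^2 ]  (Eq. 1)."""
    z = encoder(x)                                        # z = E(x)
    t = torch.randint(0, alphas_cumprod.numel(), (z.shape[0],), device=z.device)
    eps = torch.randn_like(z)                             # eps ~ N(0, I)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    z_t = a_bar.sqrt() * z + (1.0 - a_bar).sqrt() * eps   # forward diffusion to step t
    eps_pred = unet(z_t, t, f)                            # noise prediction conditioned on f
    return F.mse_loss(eps_pred, eps)
```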
Cross Attention Mechanism
The attention mechanism allows a model to refer to one related sequence while processing another. It is an important part of diffusion models: it introduces conditional information into the denoising process, thereby guiding the generated image. Many editing methods, such as MasaCtrl and DiffEdit, also use attention mechanisms to edit images. Cross-attention in diffusion can be expressed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right) \cdot V \tag{2}$$

where $Q = W_Q \cdot \varphi(z_t)$, $K = W_K \cdot f$, and $V = W_V \cdot f$. Here $\varphi(z_t)$ denotes an intermediate representation of the UNet implementing $\epsilon_\theta$, $d$ is used to normalize the input of the softmax layer, and each $W$ is a learnable weight matrix.
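A minimal single-head version of Eq. (2) in PyTorch is sketched below; shapes and projection names are illustrative, not the SD implementation. The softmax output M is the cross-attention map that the semantic disturbance loss will later target.

```python
import math
import torch

def cross_attention(phi_zt, f, W_Q, W_K, W_V):
    """phi_zt: (n_pixels, d_img) UNet features; f: (n_tokens, d_txt) text embedding."""
    Q = phi_zt @ W_Q                                             # (n_pixels, d)
    K = f @ W_K                                                  # (n_tokens, d)
    V = f @ W_V
    M = torch.softmax(Q @ K.T / math.sqrt(Q.shape[-1]), dim=-1)  # attention map M
    return M @ V, M                                              # conditioned features and M
```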
Methods
In this work, we aim to protect images by adding adversarial noise. We first provide a detailed definition of this problem. Subsequently, we introduce the overall framework of Anti-Diffusion, which encompasses three stages of iterative optimization: the first stage involves PT, the second stage optimizes the adversarial noise to obtain adversarial samples, and the final stage updates the UNet with these adversarial samples.

Problem Definition
Recalling that our aim is to prevent the malicious use of diffusion-based image generation models on private images, we achieve this by adding adversarial noise to those images.
Figure 2: The overview framework of Anti-Diffusion during the jth epoch. Here $x_j$ represents the image to be protected. In stage (1), the text embedding $f_j$ undergoes fine-tuning with $\mathcal{L}_{LDM}$. Subsequently, in stage (2), adversarial noise is optimized and added to $x_j$ using PGD with our proposed loss functions $\mathcal{L}_{URL}$ and $\mathcal{L}_{SDL}$ to obtain the adversarial sample $\hat{x}_j$. In stage (3), the UNet is updated with $\mathcal{L}_{UNet}$ using the adversarial sample $\hat{x}_j$ and text embedding $\hat{f}_j$ to simulate the tuning process of malicious users. This process repeats cyclically, returning to stage (1) in the next epoch.
This adversarial noise disrupts the functionality of the malicious models while minimizing the visual impact on the images. Let $x$ represent the image that requires protection. An adversarial noise $\delta$ is added, resulting in a protected image $\hat{x} = x + \delta$. The optimization of this adversarial noise $\delta$ can be described as a min-max optimization problem. The minimization simulates the actions of malicious users attempting to overcome the adversarial noise added to the protected images. The maximization aims to degrade the performance of the malicious model by adding adversarial noise under a constraint on the maximal perturbation of the protected images. This min-max problem P.1 can be described as:

$$\text{P.1}: \quad \min_{\theta} \max_{\delta}\; \mathcal{L}(\epsilon_\theta, \hat{x}, f) + \mathcal{C}(\epsilon_\theta, \hat{x}, f), \quad \text{s.t.}\ \|\delta\|_p \le \eta, \tag{3}$$

where $\eta$ controls the $L_p$-norm perturbation magnitude of the adversarial noise $\delta$. $\mathcal{L}$ is the loss function of the generation model trained on the modified images. $\mathcal{C}$ measures the feature dissimilarity between the images generated by the diffusion-based generation model $\epsilon_\theta$, the input image, and the target prompt. $f$ is the text embedding of the input prompt. We generate the adversarial noise by maximizing the objective function of P.1. Then we optimize the model $\epsilon_\theta$ to minimize this function following the original training process of SD.
Overview Framework
To solve the min-max problem P.1, we apply alternating optimization over multiple epochs. In each epoch, we divide this optimization into three stages: (1) prompt tuning, (2) adversarial noise optimization, and (3) UNet update, as illustrated in Fig. 2. Specifically, stage (2) corresponds to the maximization in P.1, while stage (3) corresponds to its minimization. Given that an accurate text embedding $f$ is crucial for P.1, we include stage (1) to train the text embedding $f$ at the beginning of each epoch.

Fig. 2 illustrates the optimization path during the jth epoch. In the jth epoch, $x_j$ and $f_j$ are first input into the prompt tuning stage. At this point, the parameters of the image encoder and UNet are fixed; we only optimize $f_j$ to obtain a better $\hat{f}_j$ that corresponds to the semantic information of the input image. Subsequently, $x_j$ and the optimized $\hat{f}_j$ enter the adversarial noise optimization stage, in which $x_j$ is continually optimized with the PGD algorithm using the loss functions $\mathcal{L}_{URL}$ and $\mathcal{L}_{SDL}$. The adversarial sample $\hat{x}_j$ and $\hat{f}_j$ are then input into the UNet update stage to facilitate the update of the UNet parameters. After the jth epoch, the updated $\hat{x}_j$, $\hat{f}_j$, and $\hat{\theta}$ serve as $x_{j+1}$, $f_{j+1}$, and $\theta_{j+1}$. Note that in the first stage, the image $x_0$ is initialized with the clean image, and the text embedding $f_0$ is the embedding of an empty prompt. After $N$ epochs, we obtain the final protected image $x_N$.
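A condensed PyTorch sketch of this three-stage loop is given below. The helper names are ours: `ldm_loss` is the Eq. (1) sketch above; `sdl_loss` is assumed to run the UNet once and apply Eq. (8), sketched later, to the collected cross-attention maps; and since $\mathcal{L}_{URL}$ is not defined in the text recovered here, the LDM loss stands in for it. Optimizer choices and learning rates are assumptions.

```python
import torch

def protect_image(x, f0, unet, encoder, alphas_cumprod,
                  n_epochs=5, pt_iters=30, pgd_iters=10, unet_iters=20,
                  alpha=0.002, eta=0.05):
    """Alternating three-stage optimization of P.1 (iteration counts and
    PGD settings follow the supplementary training configurations)."""
    x_adv, f = x.clone(), f0.clone()
    for _ in range(n_epochs):
        # Stage (1): prompt tuning -- only f is optimized, model frozen.
        f = f.detach().requires_grad_(True)
        opt_f = torch.optim.Adam([f], lr=1e-3)             # lr is an assumption
        for _ in range(pt_iters):
            opt_f.zero_grad()
            ldm_loss(unet, encoder, x_adv.detach(), f, alphas_cumprod).backward()
            opt_f.step()
        # Stage (2): PGD ascent on the defense losses (maximization of P.1),
        # projected back into the eta-ball around the original image x.
        for _ in range(pgd_iters):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = (ldm_loss(unet, encoder, x_adv, f.detach(), alphas_cumprod)
                    + sdl_loss(unet, encoder, x_adv, f.detach()))
            loss.backward()
            with torch.no_grad():
                x_adv = x_adv + alpha * x_adv.grad.sign()
                x_adv = x + (x_adv - x).clamp(-eta, eta)
        # Stage (3): UNet update on the adversarial sample, simulating the
        # malicious user's fine-tuning (minimization of P.1).
        opt_u = torch.optim.Adam(unet.parameters(), lr=1e-5)
        for _ in range(unet_iters):
            opt_u.zero_grad()
            ldm_loss(unet, encoder, x_adv.detach(), f.detach(), alphas_cumprod).backward()
            opt_u.step()
    return x_adv.detach()
```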
Prompt Tuning Strategy
Due to the inability to predict what prompts malicious users will utilize to train their models, it is challenging for Anti-DB to manually define a prompt that provides the best protection across different metrics. Therefore, we propose the PT strategy to address this issue. As shown in Fig. 2 (1), we iteratively optimize $f_j$ in each epoch to obtain a more accurate representation corresponding to $x_j$. Initially, the image $x_j$ is processed by the image encoder before being combined with the noise map to generate the noisy latent $z_t$. This noisy latent is then fed into the UNet, where it interacts with $f_j$ via cross-attention. We optimize $f_j$ to obtain $\hat{f}_j$ using the loss function $\mathcal{L}_{LDM}$ of the latent diffusion model, with the parameters of the image encoder and UNet fixed. By continuously optimizing the text embedding $f$ so that the model can predict the correct noise, the semantics of $\hat{f}$ are expected to gradually align with the feature content of the images.
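Sketched in isolation, stage (1) looks as follows. The only trainable tensor is the token embedding (77 x 1024 per the supplementary configuration); `ldm_loss` is the Eq. (1) sketch, and the learning rate is an assumption. Note the frozen parameters would need re-enabling before the UNet-update stage.

```python
import torch

def prompt_tuning_stage(unet, encoder, x_j, f_j, iters=30, lr=1e-3):
    """Optimize only the text embedding f against L_LDM; model frozen."""
    for p in unet.parameters():
        p.requires_grad_(False)     # UNet frozen during PT
    for p in encoder.parameters():
        p.requires_grad_(False)     # image encoder frozen during PT
    f = f_j.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([f], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        ldm_loss(unet, encoder, x_j, f, alphas_cumprod).backward()
        opt.step()
    return f.detach()               # this is f_hat_j
```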
Semantic Disturbance Loss
The SDL is designed to interfere with the original semantic information of the protected image, rendering the editing method ineffective on the protected image. $\mathcal{L}_{SDL}$ is designed as follows:

$$\mathcal{L}_{SDL} := \mathbb{E}_{z \sim \mathcal{E}(x),\, \hat{f}_j,\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[\left\| M_{target} - M\big(\epsilon_\theta, t, z_t, \hat{f}_j\big) \right\|_2^2\right] \tag{8}$$

where $M_{target}$ is the target attention map. In our experiments, we set it as a zero matrix with the same size as $M$.
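Given the cross-attention maps collected from one noise prediction, Eq. (8) with a zero target reduces to a mean-squared penalty on the maps themselves. A minimal sketch, assuming the maps have already been hooked out of the UNet's 16x16-scale cross-attention layers:

```python
import torch
import torch.nn.functional as F

def sdl_from_maps(attn_maps, target=None):
    """Eq. (8): pull each cross-attention map M toward M_target (zero by
    default). `attn_maps` is assumed to be a list of maps gathered during
    one forward pass of the UNet on (z_t, t, f_hat)."""
    total = 0.0
    for M in attn_maps:
        tgt = torch.zeros_like(M) if target is None else target
        total = total + F.mse_loss(M, tgt)
    return total
```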
Table 2: Comparing the defense performance of different methods on the DreamBooth model. The inference prompt adopted in DreamBooth is "a photo of sks person". The best-performing defense under each metric is marked in bold.

Table 3: Comparing the defense performance of different methods on the LoRA model on VGGFace2. The inference prompt adopted in LoRA is "a photo of sks person".
On CelebA-HQ, Anti-Diffusion provides superior defense performance. The qualitative results in Fig. 4 further support this conclusion. While methods like Photo Guard, MIST, PID, and Anti-DB offer some level of protection by reducing the visual quality of the generated images, Anti-Diffusion significantly degrades the image quality generated by the disrupted DreamBooth model and also disturbs the generated identities. As shown in Tab. 3, we also present the quantitative defense results of different methods for LoRA. Anti-Diffusion achieves the best results on all metrics. This effectively demonstrates the good generalization ability of Anti-Diffusion against different tuning methods.

Method       PSNR↑   MasaCtrl           DiffEdit
                     BRI↑     CLI↓      BRI↑     CLI↓
no defense   -       22.18    27.44     16.55    27.65
Photo        35.57   20.40    27.41     18.76    26.55
MIST         34.87   21.11    27.38     21.77    26.45
PID          35.37   22.67    27.73     23.62    26.47
Anti-DB      33.44   25.72    27.42     24.61    26.69
Anti-DF      36.73   25.82    26.44     25.26    25.25

Table 4: Comparing the defense performance against MasaCtrl and DiffEdit on the Defense-Edit dataset. "Photo" and "Anti-DF" denote Photo Guard and Anti-Diffusion. "BRI" and "CLI" are BRISQUE and CLIP Score.

Comparison on MasaCtrl/DiffEdit. We also compare the defense performance of different methods on MasaCtrl and DiffEdit. The quantitative results are shown in Tab. 4, where Anti-Diffusion achieves the best performance on all three metrics. Specifically, Anti-Diffusion has the lowest value on the CLIP Score, indicating that when images are protected by Anti-Diffusion, neither MasaCtrl nor DiffEdit can modify them according to the instructions. This is further validated by the qualitative results in Fig. 5. Specifically, for the image "dog", when no noise is added, MasaCtrl can successfully change it from a standing posture to a jumping posture. For the protected images obtained from Photo Guard, MIST, PID, and Anti-DreamBooth, MasaCtrl can still successfully edit them. Only the images protected by Anti-Diffusion can effectively prevent MasaCtrl from editing. The same phenomenon is observed with DiffEdit, where Anti-Diffusion effectively prevents DiffEdit from changing the "apples" in the image to "oranges".

Figure 5: Qualitative defense results of different defense methods on MasaCtrl and DiffEdit. The instance is from our proposed dataset Defense-Edit.

Ablation Studies

PT   LSDL   FDFR↑   ISM↓   BRISQUE↑   FID↑
-    -      0.50    0.22   37.63      432.25
✓    -      0.52    0.19   40.34      441.43
-    ✓      0.53    0.22   38.45      432.53
✓    ✓      0.62    0.15   40.46      457.13

Table 5: Comparing the defense performance on DreamBooth with or without PT and LSDL.

To validate the effectiveness of the PT and the SDL, we conduct comparative experiments based on DreamBooth. The details are presented in Tab. 5. The first experiment is a baseline with a fixed prompt (i.e., "a photo of a person"), which incorporates neither PT nor LSDL. In the second row, we replace the fixed prompt with PT. For the third row, we add LSDL to the first row. The fourth row is the final Anti-Diffusion equipped with both PT and LSDL. The quantitative results of these experiments reveal that PT and LSDL play complementary roles in enhancing the defense performance.
In the experiment, we used a zero map as the target attention map for LSDL. Since cross-attention represents semantic similarity, zero attention maps result in semantic dissimilarity between perturbed and original images. We also explored the use of random or diagonal matrices as targets; as Tab. 6 shows, they are not as effective as zero attention maps in defense performance. A short sketch of the three candidate targets follows the table.

Target     FDFR↑   ISM↓   BRISQUE↑   FID↑
Zero       0.62    0.15   40.46      457.13
Noise      0.58    0.17   38.92      412.56
Diagonal   0.59    0.15   39.44      424.19

Table 6: Comparing the defense performance of different attention targets. Here, "Noise" means a random noise map as the target attention map, and "Diagonal" means a diagonal matrix whose diagonal values are set to one.
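For illustration, the three target maps compared in Tab. 6 could be constructed as below; the shapes are illustrative (at the 16x16 scale a map has 256 pixel queries over 77 text tokens).

```python
import torch

n_pix, n_tok = 256, 77                      # 16x16 pixels x 77 text tokens
zero_target = torch.zeros(n_pix, n_tok)     # "Zero": default, best in Tab. 6
noise_target = torch.rand(n_pix, n_tok)     # "Noise": random map
diag_target = torch.eye(n_pix, n_tok)       # "Diagonal": ones on the diagonal
```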
Unexpected Scenarios
In practical scenarios, the specific utilization of SD models by malicious users is unpredictable. Therefore, in this section, we assess the defense capabilities of Anti-Diffusion in various unexpected scenarios. More results for unexpected scenarios can be found in the supplementary materials.

Unexpected Version. To evaluate the robustness of Anti-Diffusion across diverse versions of SD, we apply it to the VGGFace2 dataset using various versions of SD models, including v2.1 and v1.5. As shown in Tab. 7, Anti-Diffusion can provide sufficient protection even when the versions of the SD models do not match.

Def.   Test   FDFR↑   ISM↓   BRISQUE↑   FID↑
v2.1   v2.1   0.62    0.15   40.46      457.13
v2.1   v1.5   0.89    0.03   43.24      489.45
v1.5   v2.1   0.61    0.16   36.45      442.23
v1.5   v1.5   0.82    0.04   37.24      486.56
no     v2.1   0.10    0.66   17.43      144.02
no     v1.5   0.06    0.45   21.43      134.76

Table 7: Comparing the defense performance on different versions of SD. The terms "Def." and "Test" refer to the SD version used for defending with Anti-Diffusion and the version used for training DreamBooth by malicious users.

Unexpected Prompts. For DreamBooth, different prompts can be used to generate various content. As illustrated in Tab. 8, we introduce three additional prompts p1, p2, and p3, namely "a photo of sks person with sad face", "facial close up of sks person", and "a photo of sks person yawning in a speech", to evaluate the performance. We can see that Anti-Diffusion also provides defense against different prompts in various scenarios.

P    Def.   FDFR↑   ISM↓   BRISQUE↑   FID↑
p1   yes    0.53    0.18   39.40      457.27
p1   no     0.09    0.56   16.34      169.35
p2   yes    0.81    0.08   27.22      346.21
p2   no     0.05    0.42   15.67      145.76
p3   yes    0.63    0.05   37.81      440.53
p3   no     0.02    0.31   18.35      189.21

Table 8: Comparing the defense performance on different prompts. "P" and "Def." refer to prompt and defense.

Conclusion
In conclusion, this paper presents Anti-Diffusion, a defense system designed to prevent images from the abuse of both tuning-based and editing-based methods. During the generation of the protected images, we incorporate the PT strategy to enhance defense performance, eliminating the need for manually defined prompts. Additionally, we introduce the SDL to disrupt the semantic information of the protected images, enhancing the defense against both tuning-based and editing-based methods. We also introduce the Defense-Edit dataset to evaluate the defense performance of current defense methods against diffusion-based editing methods. Through a broad range of experiments, it has been shown that Anti-Diffusion excels in defense performance when dealing with various diffusion-based techniques in different scenarios.
Acknowledgments
This work was supported in part by the Macau Science and Technology Development Fund under SKLIOTSC-2021-2023, 0022/2022/A1, and 0014/2022/AFJ; in part by the Research Committee at the University of Macau under MYRG-GRG2023-00058-FST-UMDF and MYRG2022-00152-FST; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012536.

References
Cao, M.; Wang, X.; Qi, Z.; Shan, Y.; Qie, X.; and Zheng, Y. 2023. MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 22560–22570.
Cao, Q.; Shen, L.; Xie, W.; Parkhi, O. M.; and Zisserman, A. 2018. VGGFace2: A dataset for recognising faces across pose and age. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 67–74. IEEE.
Chen, J.; Wu, Y.; Luo, S.; Xie, E.; Paul, S.; Luo, P.; Zhao, H.; and Li, Z. 2024. PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models. arXiv:2401.05252.
Chen, J.; Yu, J.; Ge, C.; Yao, L.; Xie, E.; Wu, Y.; Wang, Z.; Kwok, J.; Luo, P.; Lu, H.; and Li, Z. 2023. PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis. arXiv:2310.00426.
Couairon, G.; Verbeek, J.; Schwenk, H.; and Cord, M. 2023. DiffEdit: Diffusion-based semantic image editing with mask guidance. In The Eleventh International Conference on Learning Representations.
Deng, J.; Guo, J.; Ververas, E.; Kotsia, I.; and Zafeiriou, S. 2020. RetinaFace: Single-shot multi-level face localisation in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5203–5212.
Deng, J.; Guo, J.; Xue, N.; and Zafeiriou, S. 2019. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4690–4699.
Ding, M.; Yang, Z.; Hong, W.; Zheng, W.; Zhou, C.; Yin, D.; Lin, J.; Zou, X.; Shao, Z.; Yang, H.; et al. 2021. CogView: Mastering text-to-image generation via transformers. Advances in Neural Information Processing Systems, 34: 19822–19835.
Gafni, O.; Polyak, A.; Ashual, O.; Sheynin, S.; Parikh, D.; and Taigman, Y. 2022. Make-A-Scene: Scene-based text-to-image generation with human priors. In European Conference on Computer Vision, 89–106. Springer.
Gal, R.; Alaluf, Y.; Atzmon, Y.; Patashnik, O.; Bermano, A. H.; Chechik, G.; and Cohen-Or, D. 2022. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618.
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Hessel, J.; Holtzman, A.; Forbes, M.; Le Bras, R.; and Choi, Y. 2021. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7514–7528. Association for Computational Linguistics.
Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; and Hochreiter, S. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30.
Ho, J.; Jain, A.; and Abbeel, P. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33: 6840–6851.
Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; and Chen, W. 2021. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
Karras, T.; Aila, T.; Laine, S.; and Lehtinen, J. 2017. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
Kawar, B.; Zada, S.; Lang, O.; Tov, O.; Chang, H.; Dekel, T.; Mosseri, I.; and Irani, M. 2023. Imagic: Text-based real image editing with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6007–6017.
Korhonen, J.; and You, J. 2012. Peak signal-to-noise ratio revisited: Is simple beautiful? In 2012 Fourth International Workshop on Quality of Multimedia Experience, 37–38. IEEE.
Li, A.; Mo, Y.; Li, M.; and Wang, Y. 2024. PID: Prompt-Independent Data Protection Against Latent Diffusion Models. arXiv preprint arXiv:2406.15305.
Li, Y.; Liu, H.; Wu, Q.; Mu, F.; Yang, J.; Gao, J.; Li, C.; and Lee, Y. J. 2023. GLIGEN: Open-set grounded text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 22511–22521.
Liang, C.; Wu, X.; Hua, Y.; Zhang, J.; Xue, Y.; Song, T.; Xue, Z.; Ma, R.; and Guan, H. 2023. Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, 20763–20786. PMLR.
Liu, K.; Perov, I.; Gao, D.; Chervoniy, N.; Zhou, W.; and Zhang, W. 2023. DeepFaceLab: Integrated, flexible and extensible face-swapping framework. Pattern Recognition, 141: 109628.
Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations.
Mittal, A.; Moorthy, A. K.; and Bovik, A. C. 2012. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12): 4695–4708.
Mittal, A.; Soundararajan, R.; and Bovik, A. C. 2012. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3): 209–212.
Mou, C.; Wang, X.; Xie, L.; Wu, Y.; Zhang, J.; Qi, Z.; Shan,
Y.; and Qie, X. 2023. T2i-adapter: Learning adapters to
dig out more controllable ability for text-to-image diffusion
models. arXiv preprint arXiv:2302.08453.
Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Rad-
ford, A.; Chen, M.; and Sutskever, I. 2021. Zero-shot text-to-
image generation. In International Conference on Machine
Learning, 8821–8831. PMLR.
Rana, M. S.; Nobi, M. N.; Murali, B.; and Sung, A. H. 2022.
Deepfake detection: A systematic literature review. IEEE
access, 10: 25494–25513.
Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; and Om-
mer, B. 2022. High-resolution image synthesis with latent
diffusion models. In Proceedings of the IEEE/CVF confer-
ence on computer vision and pattern recognition, 10684–
10695.
Ruiz, N.; Li, Y.; Jampani, V.; Pritch, Y.; Rubinstein, M.; and
Aberman, K. 2023. Dreambooth: Fine tuning text-to-image
diffusion models for subject-driven generation. In Proceed-
ings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 22500–22510.
Salman, H.; Khaddaj, A.; Leclerc, G.; Ilyas, A.; and Madry,
A. 2023. Raising the Cost of Malicious AI-Powered Image
Editing. arXiv preprint arXiv:2302.06588.
Song, J.; Meng, C.; and Ermon, S. 2020. Denoising diffusion
implicit models. arXiv preprint arXiv:2010.02502.
Terhorst, P.; Kolf, J. N.; Damer, N.; Kirchbuchner, F.; and
Kuijper, A. 2020. SER-FIQ: Unsupervised estimation of
face image quality based on stochastic embedding robust-
ness. In Proceedings of the IEEE/CVF conference on com-
puter vision and pattern recognition, 5651–5660.
Truong, V. T.; Dang, L. B.; and Le, L. B. 2024. Attacks and
Defenses for Generative Diffusion Models: A Comprehen-
sive Survey. arXiv preprint arXiv:2408.03400.
Van Le, T.; Phung, H.; Nguyen, T. H.; Dao, Q.; Tran, N. N.;
and Tran, A. 2023. Anti-DreamBooth: Protecting users from
personalized text-to-image synthesis. In Proceedings of the
IEEE/CVF International Conference on Computer Vision,
2116–2127.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones,
L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. At-
tention is all you need. Advances in neural information pro-
cessing systems, 30.
Wang, T.; Zhang, Y.; Qi, S.; Zhao, R.; Xia, Z.; and Weng,
J. 2023. Security and privacy on generative data in aigc: A
survey. arXiv preprint arXiv:2309.09435.
Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.;
Zhang, W.; Cui, B.; and Yang, M.-H. 2023. Diffusion mod-
els: A comprehensive survey of methods and applications.
ACM Computing Surveys, 56(4): 1–39.
Zhang, L.; Rao, A.; and Agrawala, M. 2023. Adding condi-
tional control to text-to-image diffusion models. In Proceed-
ings of the IEEE/CVF International Conference on Com-
puter Vision, 3836–3847.
Supplementary Materials for Anti-Diffusion: Preventing Abuse of Modifications of
Diffusion-Based Models
In this supplementary material for Anti-Diffusion, we first supplement related works and detailed training configurations of Anti-Diffusion, DreamBooth, and LoRA. Then we provide a comprehensive introduction to our constructed dataset, Defense-Edit. Subsequently, we conduct a series of experiments to present both quantitative and qualitative analyses. These experiments encompass the performance evaluation on LoRA (Hu et al. 2021), the exploration of various Stable Diffusion (SD) (Rombach et al. 2022) versions in Anti-Diffusion, and the examination of different terms, prompts, and SD versions for DreamBooth (Ruiz et al. 2023). Finally, we offer additional qualitative analyses to illustrate the efficacy of our method across diverse editing tasks.

Related Works

Diffusion Models
Text-to-image generation models (Li et al. 2023; Ramesh et al. 2021; Gafni et al. 2022; Ding et al. 2021) aim to synthesize realistic images from natural language descriptions. Recently, diffusion models have emerged as a promising alternative to GANs (Dhariwal and Nichol 2021; Goodfellow et al. 2014), generating high-quality images by reversing a stochastic diffusion process. ControlNet (Zhang, Rao, and Agrawala 2023) controls diffusion models by adding extra conditions, such as pose, edge detection, and depth maps; it enhances the controllability of the SD model by creating "frozen" and "trainable" copies. Similar to ControlNet, T2I-Adapter (Mou et al. 2023) aligns internal knowledge with external control signals by learning lightweight adapters, thereby enhancing the controllability of the model without affecting the original network topology or generation capability, and with efficient operating characteristics. GLIGEN (Li et al. 2023) models the relationship between text descriptions, spatial locations, and images by introducing additional trainable self-attention layers (Vaswani et al. 2017) on pretrained SD models. Through this method, GLIGEN can accurately control image generation based on given text descriptions and positional information. Meanwhile, a number of personalized customization methods based on SD have emerged, which can give the generated images certain characteristics using only a small number of samples. Textual Inversion achieves personalized generation by learning the representation of new "words" in the embedding space of the text encoder; it only needs 3-5 related images provided by the user to guide the generation by learning these concepts. DreamBooth fine-tunes the SD model with a small number of images to bind the desired concepts to a rarely used word. Similarly, with only a small number of samples required, LoRA accelerates the training process of the model and significantly reduces the storage cost of model files by slightly modifying and recalculating the weight layers of the original base model.

In addition to generating images, editing methods powered by SD can also quickly edit input images through instructions, such as the actions of characters or the number of items in the image. MasaCtrl is a method that does not require fine-tuning and can achieve consistent image generation and editing simultaneously. It converts self-attention in the diffusion model into mutual attention to query relevant local content and textures in the source image, achieving consistency between the edited image and the original image. DiffEdit can replace the target in the image according to different text descriptions. It automatically generates a mask that highlights the input image area to be edited, and then uses this mask and a text-guided diffusion model to achieve semantic image editing.

Among these works, DreamBooth and LoRA are widely used due to their excellent generation ability and few-shot learning ability. Moreover, image editing methods cannot be ignored, since MasaCtrl can immediately edit images without additional tuning. Therefore, we focus on protecting against the malicious use of these methods.

Defense Against SD
While SD brings efficient personalized generation and editing capabilities, defense against SD is also being studied to protect images from unauthorized learning. Photo Guard (Salman et al. 2023) generates adversarial noise for the encoder and denoising process in SD. Similar to Photo Guard, MIST (Liang et al. 2023) interferes with both processes. At the same time, MIST constructs a specific watermark that not only interferes with the image generation process of SD, but also marks specific watermarks on the generated image.
Yet, for DreamBooth, its training process constantly updates the parameters of the SD model, which makes the above two methods prone to failure. On the basis of interfering with the denoising process of SD, Anti-DreamBooth (Van Le et al. 2023) simulates the training of DreamBooth through a constructed dataset, dynamically updating the parameters of SD. Through this alternating defense and training, better defense performance has been achieved against DreamBooth. PID (Li et al. 2024) generates adversarial noise to disturb the VAE encoder of SD, which has the advantage of ignoring the impact of prompts and providing broader protection than Anti-DB. However, PID relies solely on the VAE and does not participate in the diffusion process, so its defense performance is greatly affected by the VAE version. Given the unpredictability of both prompts and model versions in practical applications, a more comprehensive defense strategy is necessary. At the same time, these methods mainly focus on tuning models and overlook diffusion-based editing methods, emphasizing the necessity for a more extensive defense approach against both threats. These methods demonstrate their ability to protect images from unauthorized learning and editing, but there are still some shortcomings. Photo Guard and MIST generate adversarial examples against fixed SD parameters, which results in poor protection against methods such as DreamBooth that require tuning, while Anti-DreamBooth requires users to construct additional datasets, which is impractical in real applications. Moreover, Anti-DB only interferes with the denoising process of SD with a manually selected prompt, without disturbing the semantic features of the adversarial examples, which limits its privacy protection ability and its resistance to editing methods.
Training Configurations
During the Anti-Diffusion training process, it takes a total of 5 epochs to obtain the final perturbed images. Each epoch consists of 30 iterations for prompt tuning, 10 iterations of PGD (Projected Gradient Descent) for adversarial noise optimization, and 20 iterations for the UNet update. For the PGD part, we utilize a step size of α = 0.002 and a default noise budget of η = 0.05. The whole training process, when executed on an NVIDIA A6000 GPU, takes about 3 minutes to complete. During the optimization of the Prompt Tuning strategy, the dimension of the optimized token embedding is 77 × 1024. For the Semantic Disturbance Loss, we only extract cross-attention maps at a scale of 16 × 16 to save memory.

DreamBooth is trained by utilizing a batch size of 2 and a learning rate of 5 × 10−7 across 1,000 training steps. The most recent version of SD (v2.1) is employed as the pretrained generator by default. Unless otherwise indicated, "a photo of sks person" and "a photo of person" are set as the training instance prompt and prior prompt, respectively. The training of LoRA takes 400 epochs with a learning rate of 1e−5; the settings for the training instance prompt and prior prompt are the same as for DreamBooth.
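For reference, the hyperparameters reported in this section can be gathered in one place; the variable names below are ours, not from the released code.

```python
# Reported Anti-Diffusion training configuration (names are illustrative).
config = dict(
    epochs=5,                      # outer alternating epochs
    pt_iters=30,                   # prompt-tuning steps per epoch
    pgd_iters=10,                  # PGD steps per epoch
    unet_iters=20,                 # UNet-update steps per epoch
    pgd_alpha=0.002,               # PGD step size
    noise_budget=0.05,             # eta, the L_p perturbation bound
    token_embed_shape=(77, 1024),  # optimized token embedding
    attn_scale=16,                 # only 16x16 cross-attention maps used
)
```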
Details of the Dataset Defense-Edit
To evaluate the performance of defense methods on SD-based editing, we collect a dataset named Defense-Edit with 50 pairs of images and prompts. To simulate different scenarios, we collect 30 images from real photos and generate 20 images using SD. For the real photos, we select them from CelebA-HQ (Karras et al. 2017), VGGFace2 (Cao et al. 2018), and TEdBench (Kawar et al. 2023) to cover categories such as faces, animals (e.g., dogs and horses), and objects (e.g., apples and boxes). For the generated images, we adopt SD v2.1 with our constructed prompts. In creating the instructions, to achieve extensive coverage of a wide range of conceivable scenarios, we incorporate transformations that include changes in facial expressions, item substitutions, additions, deletions, and pose adjustments. To avoid situations where the editing effect itself is not ideal, we select data that can be well edited by MasaCtrl (Cao et al. 2023) or DiffEdit (Couairon et al. 2023). Some representative images from our Defense-Edit dataset, with their instructions and their edited versions using DiffEdit and MasaCtrl, can be found in Fig. 1.

Figure 1: Examples of Defense-Edit with original images, instructions, and editing effects.

Additional quantitative results
In the main paper, we analyze the performance of Anti-Diffusion utilizing the most recent SD version, v2.1. The term "sks" is employed during the training phase of DreamBooth, and the prompt "a photo of sks person" is used during the inference stage. In this section, we provide more defense results on LoRA. Then we explore various terms, prompts, and versions of SD on DreamBooth to evaluate the performance of Anti-Diffusion.
Defense performance on LoRA
In addition to applying LoRA on VGGFace2, we further investigate the effectiveness of different defense methods by applying LoRA on the CelebA-HQ dataset. Other baselines like Anti-DreamBooth, PID, Photo Guard, and MIST are also adopted as a comparison. As demonstrated in Tab. 1, Anti-Diffusion makes LoRA generate more meaningless images (the highest FDFR (Deng et al. 2020) value) and disrupts LoRA's ability to learn the image's ID (the lowest ISM (Deng et al. 2019) and SER-FQA (Terhorst et al. 2020) values). In addition, LoRA, when trained with images perturbed using Anti-Diffusion, tends to generate images of the lowest quality (the highest BRISQUE (Mittal, Moorthy, and Bovik 2012), FID (Heusel et al. 2017), and NIQE (Mittal, Soundararajan, and Bovik 2012) values). This suggests that Anti-Diffusion generalizes well when applied to another diffusion-based method.

Performance across different SD versions
In real-world situations, it is impractical to know the specific version of SD that will be employed for training DreamBooth. To evaluate the robustness of our defense method across diverse versions of SD, we apply Anti-Diffusion on the VGGFace2 dataset across various versions of SD models, including v2.1, v1.5, and v1.4. The defense effects of images protected with different versions of SD are listed in Tab. 5, Tab. 6, and Tab. 7, respectively. Compared to images with no defense, Anti-Diffusion proves effective across various SD versions. It remains effective even when the SD version used for defense differs from that used for DreamBooth. Notably, our method outperforms Anti-DreamBooth on all the metrics, especially on ISM, suggesting that it is more difficult for DreamBooth to imitate the facial features of images protected by Anti-Diffusion.
Table 2: Defense performance comparison across different terms for DreamBooth on VGGFace2.
Table 3: Defense performance comparison across different terms for DreamBooth on CelebA-HQ.
Method            PSNR↑   FDFR↑   ISM↓   SER-FQA↓   BRISQUE↑   FID↑     NIQE↑
"a photo of sks person with sad face"
Anti-Diffusion    35.91   0.53    0.18   0.43       39.40      457.27   5.13
Anti-DreamBooth   34.55   0.47    0.23   0.45       36.32      425.75   5.01
no defense        -       0.09    0.56   0.71       16.34      169.35   4.11
"a photo of sks person with angry face"
Anti-Diffusion    35.91   0.55    0.16   0.41       39.51      432.54   5.16
Anti-DreamBooth   34.55   0.49    0.17   0.50       37.42      412.93   5.12
no defense        -       0.11    0.59   0.75       15.36      173.69   4.08
Figure 3: Defense performance comparison when applying LoRA on the VGGFace2 dataset (the first two rows) and the CelebA-HQ dataset (the last two rows).
Figure 4: Defense performance comparison across different terms for DreamBooth.