CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models

Naen Xu2, Changjiang Li4, Tianyu Du2(✉), Minxi Li2, Wenjie Luo2, Jiacheng Liang4, Yuyuan Li5,
Xuhong Zhang2, Meng Han2, Jianwei Yin2, Ting Wang4
Naen Xu and Changjiang Li are the co-first authors. Tianyu Du is the corresponding author. 2Zhejiang University, 4Stony Brook University, 5Hangzhou Dianzi University
E-mails: {xunaen, zjradty, breathing, zhangxuhong, mhan, zjuyjw}@zju.edu.cn
Abstract

Text-to-image diffusion models have emerged as powerful tools for generating high-quality images from textual descriptions. However, their increasing popularity has raised significant copyright concerns, as these models can be misused to reproduce copyrighted content without authorization. In response, recent studies have proposed various copyright protection methods, including adversarial perturbation, concept erasure, and watermarking techniques. However, their effectiveness and robustness against advanced attacks remain largely unexplored. Moreover, the lack of unified evaluation frameworks has hindered systematic comparison and fair assessment of different approaches.

To bridge this gap, we systematize existing copyright protection methods and attacks, providing a unified taxonomy of their design spaces. We then develop CopyrightMeter, a unified evaluation framework that incorporates 17 state-of-the-art protections and 16 representative attacks. Leveraging CopyrightMeter, we comprehensively evaluate protection methods across multiple dimensions, thereby uncovering how different design choices impact fidelity, efficacy, and resilience under attacks. Our analysis reveals several key findings: (i) most protections (16/17) are not resilient against attacks; (ii) the “best” protection varies depending on the target priority; (iii) more advanced attacks significantly promote the upgrading of protections. These insights provide concrete guidance for developing more robust protection methods, while CopyrightMeter's unified evaluation protocol establishes a standard benchmark for future copyright protection research in text-to-image generation.

1 Introduction

Recent advances in text-to-image diffusion models (T2I DMs), such as Stable Diffusion (SD) [1], DALL·E 3 [2], and Imagen [3], have revolutionized digital content creation by generating high-quality images from textual descriptions. While these models foster creativity by producing art and realistic scenes, they also raise significant copyright concerns [4]. Fine-tuning pre-trained models on specialized datasets allows them to mimic specific themes such as distinct art styles, which can lead to unauthorized reproductions [5, 6]. Artists are increasingly worried that their unique styles could be copied without permission, resulting in potential copyright infringement [7]. Furthermore, models trained on extensive datasets may produce images that closely resemble the style or content of specific artists, even if the artist or their creations are not directly referenced in the prompt [8]. As these AI-driven technologies evolve, it is crucial to balance innovation with the protection of creators’ rights, as many artists fear that their unique art styles could be easily copied, potentially drawing customers away [9].

The urgent need to safeguard digital intellectual property has led to the development of three main protection categories: (i) Obfuscation Processing, which preprocesses data before release online to prevent unauthorized use, often using adversarial perturbation to confuse AI models while preserving content for normal users [10, 7]. (ii) Model Sanitization, which modifies pre-trained DMs to remove or alter protected copyright elements before public deployment [11, 12]. (iii) Digital Watermarking, which embeds invisible identifiers in AI-generated content to assert copyright ownership and support effective content management [13, 1, 14].

Given the significance of these protection mechanisms, recent studies have raised concerns about their effectiveness and robustness [15, 16, 17, 18]. This has led to several critical questions: RQ1: What are the strengths and limitations of different protection mechanisms, especially their robustness against attacks? RQ2: What are the best practices for copyright protection, even in adversarial and evolving environments? RQ3: How can existing copyright protection methods be further improved?

Despite their importance for understanding and improving copyright protection, these questions are under-explored due to the following challenges.

TABLE I: Comparison of conclusions in prior and our work (○ – inconsistent; ◐ – partially inconsistent; ● – consistent).

| Previous conclusion | Refined conclusion in this paper | Explanation | Consistency |
|---|---|---|---|
| In Obfuscation Processing, Mist shows strong effectiveness against various noise purification methods, including under the SOTA online platform NovelAI I2I scenario. | Mist has limited protective effectiveness against local DiffPure attacks and the latest version of NovelAI (NAI Diffusion Anime). | The original protection may lose resilience as new attacks circumvent current protections, rendering previous methods vulnerable. | ◐ (Sec 4.2 and 5.5) |
| In Model Sanitization, FMN [12], ESD [11], UCE [19], and SLD [20] remove a copyright concept while preserving the model’s ability to generate images without it. | All Model Sanitization methods maintain unrelated concepts without copyright concepts well. | Despite removing explicit copyright concepts, these methods ensure that the model retains its ability to generate irrelevant images, preserving its utility and effectiveness. | ● (Sec 4.3) |
| In Model Sanitization, ESD permanently removes concepts from DMs, rather than modifying outputs in inference, so it cannot be circumvented even if model weights are accessible. | Model Sanitization methods are vulnerable to concept recovery methods such as DreamBooth, Textual Inversion, Concept Inversion, or even model-weights-free approaches like Ring-A-Bell. | The training dataset of DM, such as LAION, contains images with varying content, and it is almost impossible to remove elements with copyright concepts permanently. | ○ (Sec 4.3) |
| In Digital Watermarking, the techniques Diag, StabSig, and GShade demonstrate relative resilience against Watermark Removal attacks. | Regarding attack resilience, Diag exhibits vulnerability to Blur attacks, StabSig is vulnerable to Rotate, Blur, VAE, and DiffPure attacks, and GShade demonstrates vulnerability to Rotate attacks. | The vulnerability of Diag to Blur attack is attributed to different datasets, as the original paper employs the Pokemon dataset. Besides, StabSig and GShade are vulnerable to specific attacks not covered in the original paper. | ◐ (Sec 4.4) |

Non-holistic evaluations – Existing studies often lack comprehensive evaluations of protections and attacks [21, 22] and focus narrowly on limited perspectives; for example, [16] focuses solely on model sanitization against textual inversion. Moreover, many rely on limited metrics, failing to fully capture the characteristics and impacts of the protections being evaluated.

Non-unified framework – Inconsistent datasets and DM versions across studies lead to evaluations under varying conditions, making comparison challenging. For example, Glaze [7] and Mist [23] are evaluated with different SD versions, which complicates direct comparison of their results.

Outdated evaluations – While new attacks quickly lead to updated protections to bolster security, many studies focus solely on older protection methods, missing recent developments. For instance, [24] evaluated only the original Mist system as reported by [23], without considering the updated Mist v2 system described by [25].

To address these issues, we introduce a systematic taxonomy for copyright protection methods and develop CopyrightMeter, a systematic framework for evaluating them across three dimensions: fidelity, efficacy, and resilience. Fidelity evaluates how well protected content retains its original quality; efficacy measures the protection method’s effectiveness in preventing unauthorized use or mimicry; and resilience indicates the method’s ability to withstand attacks. By reviewing literature and evaluating current practices, our study provides insights into challenges and opportunities, guiding policymakers, content creators, and technologists striving to navigate the complex interplay between copyright law and technological advancement. Our contributions are summarized in three major aspects:

Framework – We develop CopyrightMeter, the first unified framework for extensively evaluating copyright protection in T2I DMs. It integrates 17 protection methods, 16 representative attacks, and 10 key metrics for in-depth analysis of these methods. We plan to open source CopyrightMeter to facilitate copyright protection research and encourage the community to contribute more techniques.

Evaluation – Leveraging CopyrightMeter, we explore the landscape of copyright protection in T2I DMs, conducting a systematic study of existing protections and attacks, uncovering key insights that challenge prior conclusions, as summarized in Table I. Our findings reveal that different protections manifest delicate trade-offs among fidelity, efficacy, and resilience. For instance, Mist achieves strong protection against mimicry but slightly compromises fidelity; ESD shows high efficacy but relatively weak resilience; ZoDiac and GShade have high fidelity and efficacy, but are less resilient to attacks. These observations indicate the importance of using comprehensive metrics to evaluate copyright protections, and suggest the optimal practices of applying them under different settings.

Exploration – We further explore improving existing protections, leading to several critical insights including (i) the generalizability of various copyright protection methods differs significantly; (ii) in scenarios prioritizing efficiency, inference-guiding Ms are preferred to model fine-tuning Ms; (iii) the ongoing arms race between protections and attacks promotes the development of more advanced protections. We envision that the CopyrightMeter platform and our findings will facilitate future research on copyright protection and shed light on designing and building T2I DMs in a more trustworthy manner.

2 Background

Figure 1: Overall system design of CopyrightMeter.

2.1 Text-to-image Diffusion Models

Diffusion models are a class of generative models that transform random noise into coherent data through a forward step that gradually adds noise to data and a reverse step that denoises it to recover the original data distribution. Our study focuses on the latent diffusion models (LDMs) for their strong performance and low computational costs.

Text-to-image diffusion models (T2I DMs) generate images from textual descriptions by learning to reverse the noise addition process guided by text. A notable open-source T2I DM example is Stable Diffusion (SD). Given a text prompt, it generates an image that reflects the specified semantic features. This involves two key components:

Conditioning on Textual Descriptions – The reverse diffusion process is guided by textual descriptions, which are embedded into a high-dimensional vector using transformer-based models or other deep learning architectures. This vector informs each step of the reverse diffusion to align the generated image with the text.

Training Objective – T2I DMs are trained to predict and remove noise at each step, guiding image generation to match text prompts. This is achieved by minimizing the difference between actual and predicted noise:

$$\mathcal{L}_{DM}(\theta)=\mathbb{E}_{x_{0},\epsilon,t,y}\left[\|\epsilon-\epsilon_{\theta}(x_{t},y,t)\|^{2}\right] \tag{1}$$

where $\epsilon$ is the noise vector, and $\epsilon_{\theta}(x_{t},y,t)$ is the model’s estimate of the noise, conditioned on the noisy image $x_{t}$, the textual description $y$, and the timestep $t$.
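As a rough illustration of the training objective in Eq. (1), the following sketch estimates the loss on a toy batch. It is not the training code of any particular model: the forward noising rule (a DDPM-style mix of clean image and Gaussian noise), the schedule `alpha_bar`, and the two stand-in predictors `zero_model` and `oracle_model` are assumptions introduced only for this example.

```python
import numpy as np

def diffusion_loss(x0, y, eps_model, alpha_bar, rng):
    """Monte Carlo estimate of Eq. (1) on one batch of clean images x0."""
    batch = x0.shape[0]
    t = rng.integers(0, len(alpha_bar), size=batch)           # one timestep per sample
    eps = rng.standard_normal(x0.shape)                       # true noise epsilon
    a = alpha_bar[t].reshape(batch, *([1] * (x0.ndim - 1)))   # broadcast over pixels
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps            # assumed forward process
    eps_hat = eps_model(x_t, y, t)                            # epsilon_theta(x_t, y, t)
    per_sample = np.sum((eps - eps_hat) ** 2, axis=tuple(range(1, x0.ndim)))
    return float(per_sample.mean())

rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.99, 0.01, 10)   # hypothetical noise schedule
x0 = rng.standard_normal((4, 8))          # 4 toy "images" with 8 pixels each
y = ["a prompt"] * 4                      # text condition, unused by the toy models

def zero_model(x_t, y, t):
    # Untrained stand-in: always predicts zero noise.
    return np.zeros_like(x_t)

def oracle_model(x_t, y, t):
    # Ideal stand-in: inverts the forward process using the known x0.
    a = alpha_bar[t].reshape(-1, 1)
    return (x_t - np.sqrt(a) * x0) / np.sqrt(1.0 - a)

loss_zero = diffusion_loss(x0, y, zero_model, alpha_bar, rng)
loss_oracle = diffusion_loss(x0, y, oracle_model, alpha_bar, rng)
print(loss_zero, loss_oracle)
```

The oracle recovers the injected noise and drives the loss to numerical zero, while the zero predictor leaves the full noise energy as loss. A trained model falls somewhere between these two extremes.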

Beyond these design aspects, T2I DMs have driven advancements in generative AI across content creation, design, education, and entertainment, bridging the gap between textual descriptions and visual content. T2I DMs can also be fine-tuned with tools like DreamBooth, which enables them to mimic specific visual styles or objects by training on a few reference images, thus allowing the model to produce images closely resembling these reference examples.

2.2 Copyright Protection in Text-to-Image Models

Figure 2: Examples of existing copyright protections and attacks integrated in CopyrightMeter.

In T2I DMs, copyright protection is a critical concern. The central challenge is ensuring that images generated from the model do not resemble copyrighted images. Techniques like DreamBooth fine-tuning on DM allow models to mimic specific copyrighted content, while DMs may also inadvertently produce similar works. Another significant challenge is ensuring generated images can be traced back to their copyrighted sources. Conversely, the associated attacks aim to exploit these models for unauthorized purposes. The two primary attack methods involve generating an image that matches a specific, potentially copyrighted image or manipulating the generated image to make it untraceable. We will formalize and discuss the prominent categories of protection and corresponding attack methods in Section 3.

3 Taxonomy

In this section, we provide a holistic overview of various copyright protection and attack methods. As depicted in Figure 1, we divide copyright protection into three categories: Obfuscation Processing (Op), Model Sanitization (Ms), and Digital Watermarking (Dw). Correspondingly, we identify three attack categories: Noise Purification (Np), Concept Recovery (Cr), and Watermark Removal (Wr). Table II presents the definitions and detailed methods of these protections and their corresponding attacks. Figure 1 shows the overall system design of CopyrightMeter, while Figure 2 provides specific examples of copyright protection and attack scenarios. We will briefly introduce each category in the subsequent sections. For convenience, we summarize the acronyms and notations in Table III.

TABLE II: Overview of copyright protections and attacks.

| Category | Definition | Methods |
|---|---|---|
| Obfuscation Processing | Add adversarial perturbations on images to avoid image mimicry. | AdvDM [26], Mist [23], Glaze [7], PGuard [10], AntiDB [27] |
| Noise Purification | Purify the protected images to nullify adversarial perturbations. | JPEG [28], Quant [29], TVM [30], IMPRESS [15], DiffPure [31] |
| Model Sanitization | Prevent DM from generating images containing specific concept. | FMN [12], ESD [11], AC [32], UCE [19], NP [33], SLD [20] |
| Concept Recovery | Retrieve the eliminated concept to recover the content generation. | LoRA [34], DB [5], TI [35], CI [16], RB [36] |
| Digital Watermarking | Embed DM-based watermark into image generation. | DShield [13], Diag [37], StabSig [1], ZoDiac [38], TR [14], GShade [39] |
| Watermark Removal | Tamper with images to remove watermark. | Bright [39], Rotate [39], Crop [39], Blur [40], VAE [41], DiffPure [31] |

TABLE III: Acronyms and notations.

| Group | Notation | Definition |
|---|---|---|
| General | $x$ | original copyrighted image to be protected |
| | $\theta$ | Diffusion Model (DM)’s weights |
| | $D_{p}(\cdot,\cdot)$ | pixel distance between images |
| | $D_{z}(\cdot,\cdot)$ | latent space distance between images |
| | $\epsilon$ | upper bound of pixel distance between two images |
| Op & Np | $\delta$ | perturbation introduced during obfuscation processing (Op) |
| | $x_{\text{t}}$ | the chosen target image, dissimilar from $x$, in Op |
| | $x_{\text{pro}}$ | protected image with perturbation applied in Op |
| | $\tau$ | transformation applied in noise purification (Np) to remove $\delta$ |
| | $x_{\text{pur}}$ | purified image after $\tau$ in Np |
| Ms & Cr | $C$ | set of all possible concepts |
| | $c_{\text{cr}}$ | copyright concept for model sanitization (Ms) |
| | $c_{\varnothing}$ | a specific unrelated concept that excludes $c_{\text{cr}}$ |
| | $c_{\text{ref}}$ | concept in reference image similar to copyrighted image $x$ |
| | $p(x \mid c)$ | image generation distribution by DM given concept $c$ |
| | $D_{KL}(\cdot\parallel\cdot)$ | the divergence between the two image output distributions |
| Dw & Wr | $m$ | original watermarked message embedded into $x$ |
| | $w$ | watermark embedding function |
| | $e$ | watermark extraction function |
| | $x_{\text{wm}}$ | watermarked image with message $m$ via $w$ |
| | $m_{\text{wm}}$ | extracted message from $x_{\text{wm}}$ |
| | $x_{\text{wr}}$ | image after watermark removal |
| | $D_{t}(\cdot,\cdot)$ | text distance between two watermarked messages |

3.1 Protection Schemes

This subsection contains survey-style descriptions of the investigated copyright protection schemes. Table IV shows the copyright protection methods and detailed characteristics.

TABLE IV: Summary of copyright protection methods in CopyrightMeter.

| Protection | Category | Auxiliary Guidance | Distortion: Sem. | Distortion: Graph. | Scenario: I2I | Scenario: T2I | Main Application | Implementer |
|---|---|---|---|---|---|---|---|---|
| AdvDM [26] | Obfuscation Processing | Diffusion Model | ✓ | × | ✓ | ✓ | Unauthorized style mimicry | Data Owner |
| Mist [23] | | Image Encoder & Diffusion | ✓ | ✓ | ✓ | ✓ | Unauthorized style mimicry | |
| Glaze [7] | | Image Encoder | ✓ | × | ✓ | ✓ | Unauthorized style mimicry | |
| PGuard [10] | | Image Encoder | × | ✓ | ✓ | × | Unauthorized editing | |
| AntiDB [27] | | Diffusion Model | ✓ | ✓ | × | ✓ | Unauthorized image mimicry | |
| FMN [12] | Model Sanitization | Model Fine-tuning | ✓ | × | × | ✓ | Identity, object, or style | Model Provider |
| ESD [11] | | Model Fine-tuning | ✓ | × | × | ✓ | Style, explicit content, or object | |
| AC [32] | | Textual Inversion | ✓ | × | × | ✓ | Instance, style, or memorized images | |
| UCE [19] | | Textual Inversion | ✓ | × | × | ✓ | Artist or objects | |
| NP [33] | | Inference Guiding | ✓ | × | × | ✓ | Specific concepts or features | |
| SLD [20] | | Inference Guiding | ✓ | × | × | ✓ | Inappropriate concept | |
| DShield [13] | Digital Watermarking | Model Fine-tuning | × | ✓ | ✓ | × | Multi-bit watermark for existing images | Works Publisher |
| Diag [37] | | Model Fine-tuning | × | ✓ | ✓ | × | Zero-bit watermark for existing images | |
| StabSig [1] | | Model Fine-tuning | × | ✓ | × | ✓ | Multi-bit watermark for generating images | |
| TR [14] | | Latent Space Modifying | × | ✓ | × | ✓ | Zero-bit watermark for generating images | |
| ZoDiac [38] | | Latent Space Modifying | × | ✓ | ✓ | × | Zero-bit watermark for existing images | |
| GShade [39] | | Latent Space Modifying | × | ✓ | × | ✓ | Multi-bit watermark for generating images | |

Note: Auxiliary Guidance – model components integrated for perturbation optimization in Op, or methods used in Ms and Dw. Sem – the semantic-distortion-based method, Graph – the graphical-distortion-based method, I2I – image-to-image generation; T2I – text-to-image generation.

3.1.1 Obfuscation Processing (Op)

This approach introduces protective perturbations into copyrighted images to prevent replication from T2I DMs. When these protected images are used as training or reference data (e.g., in image-to-image transformation), they mislead DMs that aim to replicate the originals, thereby protecting data owners from unauthorized replication and misuse of their data.

Formalization – Given a copyrighted image $x$, the aim is to create a protected image $x_{\text{pro}}$ by adding a carefully crafted perturbation $\delta$, such that $x_{\text{pro}}=x+\delta$. This perturbation $\delta$ is designed to either maximize the latent space distance between $x_{\text{pro}}$ and $x$ (untargeted protection) or minimize the latent space distance between $x_{\text{pro}}$ and a deliberately chosen dissimilar target image $x_{\text{t}}$ (targeted protection). Additionally, to ensure the perturbation remains inconspicuous, the pixel distance between $x$ and $x_{\text{pro}}$ is constrained by an upper bound $\epsilon$, maintaining the visual fidelity of the protected image. This can be formalized as:

$$\max_{\delta}D_{z}(x,x_{\text{pro}})\;\text{or}\;\min_{\delta}D_{z}(x_{\text{pro}},x_{\text{t}}),\quad\text{s.t. }D_{p}(x,x_{\text{pro}})\leq\epsilon. \tag{2}$$

Approaches – Since all methods maintain visual similarity by ensuring that the perturbation $\delta$ keeps the pixel-space distance between $x$ and $x_{\text{pro}}$ small, we omit this commonality and focus solely on the unique protection concept of each method. AdvDM [26] optimizes $\delta$ to maximize the diffusion training loss and increase the distance between the latent noise vectors of $x_{\text{pro}}$ and $x$. Building on AdvDM, Mist [23] optimizes $\delta$ to maximize the distance in both the latent noise vector and the latent encoded representation. Glaze [7] optimizes $\delta$ so that $x_{\text{pro}}$ approaches a target $x_{\text{t}}$ with a specific style, aiming to minimize $D_{z}(x_{\text{pro}},x_{\text{t}})$. PhotoGuard (PGuard) [10] offers two schemes, using either the encoder or the entire diffusion process to optimize $\delta$ to minimize $D_{z}(x_{\text{pro}},x_{\text{t}})$ in the latent space of the encoder and the LDM, respectively. Anti-DreamBooth (AntiDB) [27] optimizes $\delta$ to minimize the DM’s generation ability by making $x$ difficult to reconstruct from $x_{\text{pro}}$.
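To make the untargeted case of Eq. (2) concrete, the sketch below runs a projected signed-gradient ascent on a toy problem. It is only an illustration under stated assumptions and does not reproduce any of the methods above: the "encoder" is a hypothetical random linear map `W`, the pixel distance is taken as the L-infinity norm, the latent distance as the squared L2 norm, and `protect_untargeted` is a name made up for this example.

```python
import numpy as np

def protect_untargeted(x, W, eps=0.05, step=0.01, iters=50, seed=0):
    """Signed-gradient ascent on D_z(x, x + delta) subject to ||delta||_inf <= eps."""
    rng = np.random.default_rng(seed)
    # Random start: the gradient of the latent distance vanishes at delta = 0.
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(iters):
        grad = 2.0 * W.T @ (W @ delta)        # gradient of ||W(x + delta) - W x||^2
        delta = delta + step * np.sign(grad)  # ascent step on the latent distance
        delta = np.clip(delta, -eps, eps)     # project back into the L_inf ball
    return x + delta

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 16))      # hypothetical linear encoder: 16 pixels -> 4 latents
x = rng.uniform(0.0, 1.0, size=16)    # toy image
x_pro = protect_untargeted(x, W)

D_p = np.max(np.abs(x_pro - x))             # pixel distance (L_inf norm in this sketch)
D_z = np.sum((W @ x_pro - W @ x) ** 2)      # latent distance (squared L2 in this sketch)
print(D_p, D_z)
```

The targeted variant would instead descend on the distance between the latent of the perturbed image and the latent of a chosen target image, with the same projection step enforcing the pixel-distance bound.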

3.1.2 Model Sanitization (Ms)

This approach is designed for model providers by guiding pre-trained DMs to remove copyright concepts before public deployment, ensuring that the models do not reproduce copyrighted content illegally.

Formalization – Given a copyright-protected concept $c_{\text{cr}}\in C$ (where $C$ is the set of all concepts) and a specific unrelated concept $c_{\varnothing}\in C\setminus c_{\text{cr}}$, model sanitization shifts the model’s generation distribution conditioned on $c_{\text{cr}}$, denoted as $p_{\phi}(x|c_{\text{cr}})$, toward the distribution conditioned on the unrelated concept $c_{\varnothing}$, denoted as $p(x|c_{\varnothing})$. The alignment is measured by the KL divergence $D_{KL}$ between these distributions: through the transformation $\phi$, the model’s output distribution is adjusted to reduce its ability to generate images corresponding to $c_{\text{cr}}$. The objective can be formalized as:

$$\arg\min_{\phi}D_{KL}(p(x|c_{\varnothing})\parallel p_{\phi}(x|c_{\text{cr}})). \tag{3}$$
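A small numerical example may help to read the objective in Eq. (3). The sketch below computes the KL divergence for discrete toy distributions over three image types; the probabilities are invented for illustration and do not come from any evaluated model.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy output distributions over three image types:
# [copyrighted style, generic art, photo]
p_unrelated = [0.05, 0.55, 0.40]   # p(x | c_empty): generation for the unrelated concept
p_cr_before = [0.90, 0.07, 0.03]   # p_phi(x | c_cr) before sanitization
p_cr_after = [0.06, 0.54, 0.40]    # p_phi(x | c_cr) after a hypothetical sanitization

kl_before = kl_divergence(p_unrelated, p_cr_before)
kl_after = kl_divergence(p_unrelated, p_cr_after)
print(kl_before, kl_after)
```

A sanitization that works as intended lowers the divergence: once the distribution conditioned on the copyright concept resembles the one for the unrelated concept, prompting with the copyright concept no longer favors the copyrighted style.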

Approaches – Based on the difference in distribution alignment, the approaches can be categorized into two types: fine-tuning and inference guiding methods.

Fine-tuning methods adjust $p_{\phi}(x|c_{\text{cr}})$ by modifying the DM’s U-Net weights, targeting different components depending on the method [42]. For instance, Forget-Me-Not (FMN) [12] fine-tunes the weights of the U-Net cross-attention layers to minimize the Frobenius norm of the attention maps between the input features and the embedding of $c_{\text{cr}}$, aligning $p_{\phi}(x|c_{\text{cr}})$ more closely with $p(x|c_{\varnothing})$. Erased Stable Diffusion (ESD) [11] fine-tunes both cross-attention and unconditional layers to diminish the influence of $c_{\text{cr}}$ in the denoising prediction. Ablating Concepts (AC) [32] further fine-tunes U-Net weights, including projection matrices in cross-attention layers, and the text transformer embedding to minimize the KL divergence for a tighter alignment. Unified Concept Editing (UCE) [19] strategically modifies the U-Net’s cross-attention keys and values associated with the text embeddings of $c_{\text{cr}}$ to align $p_{\phi}(x|c_{\text{cr}})$ with $p(x|c_{\varnothing})$ while preserving unrelated concepts $c_{\varnothing}$.

Inference guiding methods adjust the sampling process without altering model weights. In SD, each sampling step involves conditional and unconditional denoising. The final noise prediction is derived by taking the difference between these two samplings. Negative Prompt (NP) [33] replaces the unconditional noise prediction with noise conditioned on $c_{\text{cr}}$, guiding diffusion away from $c_{\text{cr}}$. Safe Latent Diffusion (SLD) [20] adds a safety guidance term, further shifting the distribution away from $p_{\phi}(x|c_{\text{cr}})$.
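The effect of replacing the unconditional branch can be seen in a two-dimensional toy example. The sketch assumes the standard classifier-free guidance rule, in which the final prediction extrapolates from a base prediction toward the prompt-conditioned one. The vectors, the scale of 7.5, and the function name `guided_noise` are illustrative assumptions, and the additional safety term of SLD is not modeled.

```python
import numpy as np

def guided_noise(eps_cond, eps_base, scale=7.5):
    """Classifier-free guidance: extrapolate from eps_base toward eps_cond."""
    return eps_base + scale * (eps_cond - eps_base)

eps_prompt = np.array([1.0, 0.0])   # noise prediction conditioned on the user prompt
eps_uncond = np.array([0.0, 0.0])   # unconditional noise prediction
eps_c_cr = np.array([0.0, 1.0])     # noise prediction conditioned on the concept c_cr

standard = guided_noise(eps_prompt, eps_uncond)   # usual sampling step
negative = guided_noise(eps_prompt, eps_c_cr)     # NP: c_cr replaces the unconditional branch
print(standard, negative)
```

In this toy setup the second coordinate stands for the direction of the copyright concept. With the negative prompt, that coordinate of the guided prediction becomes negative, so each sampling step is pushed away from the concept while still following the user prompt.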

3.1.3 Digital Watermarking (Dw)

This approach embeds invisible messages in images to trace image origins and verify copyright. Unlike traditional post-hoc watermarks [43, 44], which are applied after image generation and do not involve DMs, we discuss watermarks embedded during the generation process of DMs. This can be achieved by embedding watermarks directly in the training data and fine-tuning the DM, or by modifying latent vectors to impact the generation of images.

TABLE V: Summary of copyright attack methods in CopyrightMeter.

| Attack | Category | Approach Type | Methodology | Accessibility: Text | Accessibility: Image | Accessibility: Model | Scenario: I2I | Scenario: T2I | Target | Capability of Adversary |
|---|---|---|---|---|---|---|---|---|---|---|
| JPEG [28] | Noise Purification | Empirical | Data Compression | × | ✓ | × | ✓ | × | Lossy Compression | Raw Data Availability |
| Quant [29] | | Empirical | Data Compression | × | ✓ | × | ✓ | × | Lossy Compression | |
| TVM [30] | | Optimization | Denoising and Smoothing | × | ✓ | × | ✓ | × | Perturbation Purification | |
| IMPRESS [15] | | Optimization | Denoising and Smoothing | × | ✓ | ✓ | ✓ | × | Perturbation Purification | |
| DiffPure [31] | | Optimization | Image Regeneration | × | ✓ | ✓ | ✓ | × | Perturbation Purification | |
| LoRA [34] | Concept Recovery | Optimization | Model Fine-Tuning | ✓ | ✓ | ✓ | ✓ | ✓ | Personalizing Generation | Model Weights Availability |
| DB [5] | | Optimization | Model Fine-Tuning | ✓ | ✓ | ✓ | ✓ | ✓ | Personalizing Generation | |
| TI [35] | | Optimization | Model Fine-Tuning | ✓ | ✓ | ✓ | ✓ | ✓ | Personalizing Generation | |
| CI [16] | | Optimization | Model Fine-Tuning | ✓ | ✓ | ✓ | ✓ | × | Sanitized Concepts Retrieval | |
| RB [36] | | Optimization | Prompt Engineering | ✓ | × | × | × | ✓ | Sanitized Concepts Retrieval | |
| Bright [39] | Watermark Removal | Empirical | Image Distortion | × | ✓ | × | ✓ | × | Watermark Obscuration | Final Image Availability |
| Rotate [39] | | Empirical | Image Distortion | × | ✓ | × | ✓ | × | Watermark Obscuration | |
| Crop [39] | | Empirical | Image Distortion | × | ✓ | × | ✓ | × | Watermark Obscuration | |
| Blur [40] | | Empirical | Image Distortion | × | ✓ | × | ✓ | × | Watermark Obscuration | |
| VAE [41] | | Optimization | Image Regeneration | × | ✓ | ✓ | ✓ | × | Image Compression | |
| DiffPure [31] | | Optimization | Image Regeneration | × | ✓ | ✓ | ✓ | × | Perturbation Purification | |

Formalization – Embedding a watermark message $m$ into an image $x$ with a function $w$ results in a watermarked image $x_{\text{wm}} = w(x, m)$. An extraction function $e$ decodes the message $m_{\text{wm}} = e(x_{\text{wm}})$. The watermarked image $x_{\text{wm}}$ should remain visually similar to $x$, and $m_{\text{wm}}$ should accurately reflect $m$. The goal is to find $w$ that minimizes either the pixel distance $D_p$ or the latent space distance $D_z$ between $x$ and $x_{\text{wm}}$, while optionally minimizing the text discrepancy $D_t$ between $m$ and $m_{\text{wm}}$, depending on the specific method. This can be formalized as:

$$\min_{w}\left[\alpha D_p(x, x_{\text{wm}}) + \beta D_z(x, x_{\text{wm}}) + \lambda D_t(m, m_{\text{wm}})\right], \tag{4}$$

where $\alpha$, $\beta$, and $\lambda$ are weights that balance image quality, latent space similarity, and message accuracy, respectively. Depending on the approach, either $D_p$ or $D_z$ (or both) may be used, and $D_t$ is included if relevant.
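To make the role of the three weights concrete, the following minimal sketch evaluates the objective in Eq. (4) for given inputs. The concrete distance choices (mean squared error for $D_p$ and $D_z$, bit error rate for $D_t$) and all function names are assumptions made for illustration; individual watermarking methods use their own distances.

```python
import numpy as np

def pixel_distance(x, x_wm):
    """D_p: mean squared error in pixel space (an assumed choice)."""
    return float(np.mean((np.asarray(x) - np.asarray(x_wm)) ** 2))

def latent_distance(z, z_wm):
    """D_z: mean squared error between latent codes (an assumed choice)."""
    return float(np.mean((np.asarray(z) - np.asarray(z_wm)) ** 2))

def message_discrepancy(m, m_wm):
    """D_t: fraction of message bits that differ (an assumed choice)."""
    return float(np.mean(np.asarray(m) != np.asarray(m_wm)))

def watermark_objective(x, x_wm, z, z_wm, m, m_wm,
                        alpha=1.0, beta=0.0, lam=1.0):
    """Weighted sum from Eq. (4); setting a weight to 0 drops that term."""
    return (alpha * pixel_distance(x, x_wm)
            + beta * latent_distance(z, z_wm)
            + lam * message_discrepancy(m, m_wm))

# Toy inputs: a uniform pixel shift of 0.1 and one flipped bit out of four.
x = np.zeros((4, 4))
x_wm = x + 0.1
z = np.zeros(8)
z_wm = np.zeros(8)
m = [1, 0, 1, 1]
m_wm = [1, 0, 0, 1]
loss = watermark_objective(x, x_wm, z, z_wm, m, m_wm,
                           alpha=1.0, beta=0.0, lam=2.0)
```

With `beta=0.0` the sketch mimics a method that constrains only pixel distance and message accuracy; a latent-space method would instead set `alpha=0.0` and a nonzero `beta`.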

Approaches – The following methods outline different watermark embedding processes, denoted by $w$. DiffusionShield (DShield) [13] encodes the watermark message $m$ as a binary sequence, embedding each bit into distinct regions of the image $x$, with a decoder optimized to minimize the discrepancy between $m_{\text{wm}}$ and $m$, while controlling the $\ell_\infty$-norm to reduce the pixel distance between $x$ and $x_{\text{wm}}$. Diagnosis (Diag) [37] applies a text trigger to a dataset subset, fine-tuning the model to generate $x_{\text{wm}}$, and trains a binary classifier for watermark detection. Stable Signature (StabSig) [1] fine-tunes the decoder of the image generator with a binary signature, producing $x_{\text{wm}}$ while minimizing the perceptual distortion $D_p(x, x_{\text{wm}})$ and the message discrepancy $D_t(m, m_{\text{wm}})$. Tree-Ring (TR) [14] embeds $m$ in the Fourier space of the initial noise latent vector, detectable through DDIM inversion [45], while minimizing the $L_1$ distance between $m$ and the $m_{\text{wm}}$ obtained from the Fourier transform of the inverted noise vector.
ZoDiac [38] watermarks existing images by embedding $m$ into the latent vector through DDIM inversion, incorporating Euclidean distance, SSIM loss, and Watson-VGG perceptual loss to minimize the pixel distance between $x_{\text{wm}}$ and $x$. Gaussian Shading (GShade) [39] maps $m$ to latent representations following a standard Gaussian distribution, aiming to preserve the distribution between $x$ and $x_{\text{wm}}$ for fidelity.
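The idea of placing a key in the Fourier space of a noise latent and detecting it with an $L_1$ distance can be sketched in a few lines. This is a heavily simplified illustration and not the Tree-Ring implementation: it omits the diffusion model and DDIM inversion entirely and detects directly on the latent, and the ring radii, the constant key value, and all function names are assumptions.

```python
import numpy as np

def ring_mask(size, r_min, r_max):
    """Boolean mask selecting an annulus around the centre of a shifted spectrum."""
    ys, xs = np.ogrid[:size, :size]
    dist = np.sqrt((ys - size // 2) ** 2 + (xs - size // 2) ** 2)
    return (dist >= r_min) & (dist <= r_max)

def embed_key(latent, mask, key_value):
    """Overwrite the masked Fourier coefficients of a real latent with a real key."""
    spectrum = np.fft.fftshift(np.fft.fft2(latent))
    spectrum[mask] = key_value
    # The annulus is symmetric about the centre and the key is real, so the
    # edited spectrum stays Hermitian and the inverse transform is real
    # up to floating-point error.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real

def key_distance(latent, mask, key_value):
    """Mean L1 distance between the masked spectrum and the key."""
    spectrum = np.fft.fftshift(np.fft.fft2(latent))
    return float(np.mean(np.abs(spectrum[mask] - key_value)))

rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 64))      # stand-in for an initial noise latent
mask = ring_mask(64, r_min=4, r_max=10)    # hypothetical ring radii
marked = embed_key(clean, mask, key_value=50.0)

d_marked = key_distance(marked, mask, 50.0)  # close to zero
d_clean = key_distance(clean, mask, 50.0)    # large for an unmarked latent
```

A detector would threshold this distance; in a real pipeline the latent would first have to be recovered from the generated image, which is where inversion error and attacks enter.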

3.2 Attack Schemes

This subsection outlines the copyright attack schemes evaluated. Table V summarizes attack methods and their detailed characteristics.

3.2.1 Noise Purification (Np)

This process employs specific transformations as attacks to remove the protective perturbations that Op adds to images, thereby assessing how effective and resilient Op remains under attack.

Formalization – Given a protected image $x_{\text{pro}} = x + \delta$, the adversary applies a transformation $\tau$ to remove the perturbation $\delta$, yielding a purified image $x_{\text{pur}} = \tau(x_{\text{pro}})$. These methods fall into two categories: (i) Experience-based methods use common transformations (e.g., JPEG compression) as $\tau$ to remove $\delta$ while having little impact on the pixel difference between $x_{\text{pro}}$ and $x_{\text{pur}}$. (ii) Optimization-based methods eliminate the potential protection $\delta$ more accurately by customizing transformations to align the latent and pixel spaces of $x_{\text{pur}}$. Specifically, they minimize the pixel distance between $x_{\text{pur}}$ and the reconstructed image $f_\theta(x_{\text{pur}})$ generated from its latent representation. For purification fidelity, $x_{\text{pur}}$ must also remain visually similar to the original image $x$. However, $x$ is typically unavailable during attacks, so $x_{\text{pro}}$ is used to approximate $x$, which is reasonable because the perturbation $\delta$ is minor. Visual similarity is then achieved by constraining the pixel distance between $x_{\text{pur}}$ and $x_{\text{pro}}$. The overall process can be formalized as:

$$\min_{\tau} D_p\big(x_{\text{pur}}, f_\theta(x_{\text{pur}})\big), \quad \text{s.t. } D_p(x_{\text{pro}}, x_{\text{pur}}) \leq \epsilon. \tag{5}$$

Approaches – Experience-based methods use the following transformations as $\tau$: JPEG [28] is a lossy compression algorithm that uses the discrete cosine transform to remove high-frequency components from $x_{\text{pro}}$; Quantization (Quant) [29] compresses ranges of pixel values into single discrete values.
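As a toy illustration of an experience-based transformation, the sketch below applies uniform quantization as $\tau$ and checks the constraint of Eq. (5). It is an idealized setting chosen for clarity: the "original" image is assumed to lie exactly on the quantization grid and the perturbation is assumed to be smaller than half a quantization step, so quantization removes it completely. Real images and perturbations will not behave this cleanly. The number of levels and the $\ell_\infty$ choice for $D_p$ are assumptions.

```python
import numpy as np

def quantize(x_pro, levels=8):
    """Map pixel values in [0, 1] onto `levels` evenly spaced values."""
    x = np.clip(x_pro, 0.0, 1.0)
    return np.round(x * (levels - 1)) / (levels - 1)

def linf_distance(a, b):
    """D_p taken as the largest absolute per-pixel difference."""
    return float(np.max(np.abs(a - b)))

rng = np.random.default_rng(0)
x = rng.integers(0, 8, size=(8, 8)) / 7.0        # toy original on the 8-level grid
delta = rng.uniform(-0.03, 0.03, size=x.shape)   # small protective perturbation
x_pro = np.clip(x + delta, 0.0, 1.0)             # protected image
x_pur = quantize(x_pro, levels=8)                # purified image tau(x_pro)

removed = np.allclose(x_pur, x)                  # perturbation fully removed here
budget_ok = linf_distance(x_pro, x_pur) <= 0.03 + 1e-12
```

The same check structure applies to other choices of $\tau$: apply the transformation, then verify that the purified image stays within the distance budget $\epsilon$ of the protected one.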

Optimization-based methods include the following. Total Variation Minimization (TVM) [30] reduces $\delta$ by minimizing unnecessary pixel intensity variations (i.e., gradient amplitude). IMPRESS [15] purifies $x_{\text{pro}}$ by minimizing the inconsistency between $x_{\text{pur}}$ and $f_\theta(x_{\text{pur}})$ while limiting the LPIPS distance between $x_{\text{pro}}$ and $x_{\text{pur}}$ for visual similarity. DiffPure [31] adds noise to $x_{\text{pro}}$ and then denoises it to remove $\delta$, limiting the upper bound of the pixel distance between $x_{\text{pro}}$ and $x_{\text{pur}}$.
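The total-variation idea can be sketched as plain gradient descent on a fidelity term plus a smoothed total-variation penalty. This is a generic denoising sketch under assumed hyperparameters (`weight`, `step`, `iters`, `eps`) and an assumed anisotropic, smoothed form of the penalty; it is not the exact procedure of [30].

```python
import numpy as np

def total_variation(u, eps=1e-3):
    """Smoothed anisotropic total variation of a 2-D image."""
    dx = np.diff(u, axis=1)
    dy = np.diff(u, axis=0)
    return float(np.sum(np.sqrt(dx ** 2 + eps)) + np.sum(np.sqrt(dy ** 2 + eps)))

def tv_gradient(u, eps=1e-3):
    """Gradient of `total_variation` with respect to every pixel."""
    grad = np.zeros_like(u)
    dx = np.diff(u, axis=1)
    gx = dx / np.sqrt(dx ** 2 + eps)
    grad[:, 1:] += gx
    grad[:, :-1] -= gx
    dy = np.diff(u, axis=0)
    gy = dy / np.sqrt(dy ** 2 + eps)
    grad[1:, :] += gy
    grad[:-1, :] -= gy
    return grad

def tvm_purify(x_pro, weight=0.1, step=0.02, iters=300, eps=1e-3):
    """Minimise 0.5 * ||u - x_pro||^2 + weight * TV(u) by gradient descent."""
    u = x_pro.astype(float).copy()
    for _ in range(iters):
        u -= step * ((u - x_pro) + weight * tv_gradient(u, eps))
    return u

rng = np.random.default_rng(0)
x = np.zeros((16, 16))
x[:, 8:] = 1.0                                    # piecewise-constant toy image
x_pro = x + rng.normal(0.0, 0.05, size=x.shape)   # toy perturbation
x_pur = tvm_purify(x_pro)
```

The fidelity term keeps `x_pur` close to `x_pro`, playing the role of the constraint in Eq. (5), while the penalty suppresses small pixel-to-pixel fluctuations and largely preserves the sharp edge in the toy image.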

3.2.2 Concept Recovery (Cr)

This process targets vulnerabilities to recover sanitized concepts, enabling sanitized models to generate images with copyrighted concepts, thus posing a risk of illegal replication. This evaluation assesses the resilience of sanitized models to such recovery attempts.

Formalization – For a sanitized model with output distribution $p_\theta(x \mid c_{\text{cr}})$ aligned with an unrelated concept distribution $p(x \mid c_{\varnothing})$, Cr aims to realign $p_\theta(x \mid c_{\text{cr}})$ to a reference distribution $p(x \mid c_{\text{ref}})$. This reference distribution corresponds to images containing $c_{\text{ref}}$, which are similar to the copyrighted content. The goal is to minimize the divergence between $p(x \mid c_{\text{ref}})$ and $p_\theta(x \mid c_{\text{cr}})$, enabling the sanitized model to regenerate images containing $c_{\text{cr}}$. This can be formalized as:

$$\arg\min_{\theta} D_{KL}\big(p(x \mid c_{\text{ref}}) \,\|\, p_\theta(x \mid c_{\text{cr}})\big). \tag{6}$$
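For intuition about the objective in Eq. (6), the sketch below computes the KL divergence between small discrete distributions. The three toy distributions are made-up numbers standing in for the reference distribution, a sanitized model's output, and a recovered model's output; in practice these are high-dimensional image distributions and the divergence is only minimized implicitly through a training loss.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions given as nonnegative weights."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

p_ref = [0.70, 0.20, 0.10]        # toy reference distribution p(x | c_ref)
p_sanitized = [0.10, 0.20, 0.70]  # toy sanitized output p_theta(x | c_cr)
p_recovered = [0.60, 0.25, 0.15]  # toy output after a recovery attack

kl_before = kl_divergence(p_ref, p_sanitized)
kl_after = kl_divergence(p_ref, p_recovered)
```

A successful recovery attack corresponds to `kl_after` being much smaller than `kl_before`.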

Approaches – These methods learn embeddings from reference images with $c_{\text{ref}}$ and adjust the sanitized model to realign its output distribution. LoRA [34] modifies $\theta$ using a low-rank decomposition of weight updates, efficiently fine-tuning the model to align $p_\theta(x \mid c_{\text{cr}})$ with $p(x \mid c_{\text{ref}})$. Similarly, DreamBooth (DB) [5] fine-tunes models on a set of images with $c_{\text{ref}}$, embedding $c_{\text{ref}}$ into the model’s output domain to produce images with the distribution $p(x \mid c_{\text{ref}})$. Textual Inversion (TI) [35] optimizes an embedding for $c_{\text{ref}}$ by modifying the loss function to incorporate $c_{\text{ref}}$ during noise prediction, minimizing the discrepancy between noise predictions for generated and reference images. Concept Inversion (CI) [16] learns specialized embeddings that can recover $c_{\text{cr}}$ for each Ms approach to further improve alignment with $c_{\text{cr}}$.
Ring-A-Bell (RB) [36] is a model-agnostic method that extracts holistic representations of $c_{\text{cr}}$ to identify prompts that might trigger unauthorized generation of copyrighted content.
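The low-rank decomposition used by LoRA can be illustrated on a single weight matrix. The sketch below shows only the parameterization $W' = W + BA$ with a rank-$r$ update and the resulting parameter savings; the matrix sizes, rank, and `scale` factor are assumptions, and no fine-tuning loop is included.

```python
import numpy as np

def lora_update(W, A, B, scale=1.0):
    """Return the adapted weight W + scale * (B @ A)."""
    return W + scale * (B @ A)

d_out, d_in, r = 64, 32, 4                 # hypothetical layer size and rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, random initialisation
B = np.zeros((d_out, r))                   # trainable, zero initialisation

W_start = lora_update(W, A, B)             # equals W before any training

full_params = d_out * d_in                 # parameters in a full update
lora_params = r * (d_in + d_out)           # parameters in the low-rank update

# After training, B is nonzero; the update still has rank at most r.
B_trained = rng.standard_normal((d_out, r))
update_rank = np.linalg.matrix_rank(B_trained @ A)
```

Because only `A` and `B` are trained, an adversary with access to the sanitized weights needs to optimize far fewer parameters than a full fine-tune would require.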

3.2.3 Watermark Removal (Wr)

This process attempts to remove embedded watermarks in order to assess the resilience of Dw.

Formalization – Given a watermarked image $x_{\text{wm}}$, an adversary applies a typical image transformation attack $a$ to generate a watermark-removed image $x_{\text{wr}} = a(x_{\text{wm}})$. The goal is to make the watermark undetectable while keeping $x_{\text{wr}}$ visually similar to $x_{\text{wm}}$. Following [14, 46], the pixel-level distortion between $x_{\text{wr}}$ and $x_{\text{wm}}$ is constrained to stay below a threshold $\epsilon$, ensuring visual similarity. Formally, this constraint is expressed as:

$$D_p(x_{\text{wm}}, x_{\text{wr}}) \leq \epsilon. \tag{7}$$

Approaches – Brightness Adjustment (Bright) [39] adjusts the brightness of $x_{\text{wm}}$ to produce $x_{\text{wr}}$. Image Rotation (Rotate) [39] rotates $x_{\text{wm}}$ to disrupt synchronization between the watermark embedder and detector. Random Crop (Crop) [39] removes portions of $x_{\text{wm}}$. Gaussian Blur (Blur) [39] convolves $x_{\text{wm}}$ with a Gaussian kernel to smooth the image and reduce watermark visibility. VAE-Cheng20 (VAE) [41] compresses $x_{\text{wm}}$ using discretized Gaussian mixture likelihoods and attention modules to obscure the watermark. DiffPure [31] adds noise to $x_{\text{wm}}$, followed by DM-based denoising to remove the watermark.
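Two of the empirical distortions above, together with the budget check of Eq. (7), can be sketched with NumPy alone. The parameter values (`factor`, `sigma`, `radius`, `eps`), the use of mean squared error as $D_p$, and the restriction to 2-D grayscale arrays in $[0, 1]$ are assumptions for illustration.

```python
import numpy as np

def adjust_brightness(x_wm, factor):
    """Scale pixel intensities and clip back to [0, 1]."""
    return np.clip(x_wm * factor, 0.0, 1.0)

def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(x_wm, sigma=1.0, radius=2):
    """Separable Gaussian blur of a 2-D image with edge padding."""
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(x_wm, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def within_budget(x_wm, x_wr, eps):
    """Check D_p(x_wm, x_wr) <= eps with D_p as mean squared error."""
    return float(np.mean((x_wm - x_wr) ** 2)) <= eps

rng = np.random.default_rng(0)
x_wm = rng.random((16, 16))                  # stand-in for a watermarked image
x_bright = adjust_brightness(x_wm, factor=1.1)
x_blur = gaussian_blur(x_wm, sigma=1.0, radius=2)
ok_bright = within_budget(x_wm, x_bright, eps=0.01)
```

An evaluation would run the watermark detector on each attacked image that passes the budget check and report the remaining detection accuracy.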

3.3 Threat Model

We systematically categorize the security threats to copyright protection methods based on the adversary’s objective, knowledge, and capability.

Adversary’s objective. In the field of text-to-image (T2I) diffusion models, adversaries aim to generate images of a specific style or concept. They exploit system flaws and challenge security measures to enable illegal copying and editing of images. Their objectives are multifaceted: emulating a specific artist’s style despite existing obfuscation protections, regenerating sanitized concepts from purposefully sanitized models, and evading watermark detection. All these objectives are pursued while maintaining a level of quality akin to the original copyrighted images.

Adversary’s knowledge. Because protection methods differ, we tailor our model of the adversary’s background knowledge to each category. For obfuscation processing and digital watermarking, the adversary can access the protected or watermarked artistic images. For model sanitization, the adversary can access the sanitized model and a small set of reference images embodying the target concepts.

Adversary’s capability. Similarly, we tailor our model of the adversary’s capability to each category. For obfuscation processing and digital watermarking, the adversary can modify the protected or watermarked images. For model sanitization, the adversary can draw upon their knowledge of sanitization methods to retrain sanitized models using example images, thereby recovering the sanitized concepts.

4 Experiments

Leveraging CopyrightMeter, we conduct a systematic evaluation of existing copyright protection and attack methods, uncovering their intricate design landscape.

4.1 Experimental Setup

Datasets. We evaluate on three datasets: WikiArt [47], CustomConcept101 [6] (referred to as Concept), and Person [16]. WikiArt contains over 42,000 artworks from 129 artists, categorized by genre (e.g., Impressionism). Concept consists of images of 101 specific concepts, each with 3 to 15 images. Person consists of photos of 10 distinct celebrities, with 15 images for each individual derived from the LAION dataset [48]. For Op, following [15, 38], we use WikiArt and Concept. For Ms, following [12, 16], we use WikiArt and Person. For Dw, we use all three datasets.

Models. We evaluate Stable Diffusion (SD), a widely used open-source DM implementation. Previous studies used different versions of SD, making it difficult to isolate the effects of copyright protection methods from model variations. For a unified evaluation, we select SD [49] version 1.5 (https://huggingface.co/runwayml/stable-diffusion-v1-5), the most widely downloaded version on the Hugging Face platform, with a resolution of 512×512, as the T2I DM in image generation experiments.

Metrics. Following [15, 26, 10], CopyrightMeter incorporates several key metrics: Peak Signal-to-Noise Ratio (PSNR) quantifies the ratio between the maximum possible signal power and the noise power; Structural Similarity Index Measure (SSIM) evaluates structural similarity, brightness, and contrast between two images; Visual Information Fidelity (VIFp) assesses image quality based on information fidelity; Learned Perceptual Image Patch Similarity (LPIPS) uses deep learning features for perceptual similarity assessment; Fréchet Inception Distance (FID) measures the distribution distance between feature vectors for real and generated images; CLIP-I and CLIP-T use the CLIP model [50] to assess image-image similarity and text-image alignment, respectively; ACC denotes the detection accuracy of watermarks. Except for FID and LPIPS, higher values indicate closer alignment with the reference image or corresponding text. Table VII in Appendix A provides a detailed breakdown of these metrics across fidelity, efficacy, and resilience.
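Of the metrics listed above, PSNR has the simplest closed form, $10 \log_{10}(\text{MAX}^2 / \text{MSE})$, and can be sketched directly. The function name and the assumption that images are scaled to $[0, 1]$ (so `max_value=1.0`) are illustrative; the other metrics rely on learned or statistical models and are best taken from established libraries.

```python
import numpy as np

def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
original = np.zeros((8, 8))
protected = original + 0.1
score = psnr(original, protected)
```

Higher PSNR means the protected image is closer to the original, which is why it is read as a fidelity metric.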

Implementation. All experiments are conducted on a server with two Intel Xeon CPUs, 64 GB memory, a 4TB HDD, and an NVIDIA A800 GPU. Appendix B details the experimental setup for copyright protections and attacks.

4.2 Obfuscation Processing Evaluation

In this subsection, we evaluate the performance of obfuscation processing (Op) methods to understand how different design choices affect their effectiveness. Specifically, we begin by applying the Op methods to generate the protected images, and then employ DreamBooth [5] to mimic the style of these protected images. Our evaluation focuses on three aspects: fidelity, which measures the similarity between protected and original images; efficacy, which measures how effectively the protected images prevent mimicry; and resilience, which examines the robustness of protection when using noise purification (Np) attacks to remove the perturbation. The setup details are shown in Appendix B.1.

Fidelity – For a protected image, it is crucial that it appears visually identical to the original to maintain the image’s utility. High fidelity indicates better preservation of artistic and semantic values. To evaluate the fidelity of various Op protections, we use several widely used metrics, namely LPIPS, SSIM, PSNR, VIFp, and FID, as outlined in Table VII. Figures 3 and 5 present the quantitative and qualitative fidelity evaluation, respectively, comparing the protected images with the originals.

Figure 3: Fidelity evaluation of Op.

Figure 3 illustrates that fidelity varies among different Op protection methods. AntiDB shows the highest fidelity, with the lowest LPIPS averaging around 0.1 and FID around 80, along with the highest SSIM (exceeding 0.9), PSNR (above 36), and VIFp (around 4.4) across datasets. AdvDM and Mist also exhibit relatively low FID averaging around 110, suggesting better preservation of image quality. This is likely due to these methods incorporating DMs in perturbation optimization (cf. Table IV), enhancing fidelity compared to methods relying solely on an image encoder.

Inconsistencies arise when comparing metrics like LPIPS and FID. For instance, PGuard’s low LPIPS of around 0.095 suggests a high visual similarity to the original images, but its high FID of over 180 suggests poor overall fidelity across the dataset. This disparity may stem from the different focuses of these metrics: LPIPS emphasizes semantic and perceptual similarities and visual details, while FID assesses how well the generated images align with the distribution of the original dataset, considering broader structural and statistical properties. Therefore, relying on a single metric can be misleading, highlighting the need for diverse metrics to comprehensively evaluate fidelity.

These findings align with the visualizations in Figure 5, where the AntiDB-protected images closely resemble the originals, while those from AdvDM and Mist maintain high similarity with only slight noise. Notably, all three methods incorporate DMs into their optimization, contributing to their fidelity advantage. In contrast, images protected by Glaze and PGuard show more noticeable alterations. Specifically, Glaze introduces subtle and unique distortions, while PGuard results in a stretched appearance compared to the original images. Overall, while fidelity differs across Op methods, all successfully preserve key visual characteristics, ensuring utility without compromising the viewer’s experience.

Remark 1 – Op methods that use diffusion models during the perturbation optimization process tend to achieve higher fidelity compared to those that rely solely on image encoders.
Figure 4: Efficacy evaluation of Op.

Efficacy – Following [26, 7], we fine-tune pre-trained SD models using protected images to generate mimicked images. For comparison, we also generate mimicked images from original (unprotected) images as a baseline. We assess the similarity between mimicked images produced from protected images and the original images using FID and CLIP-I, and assess text-image alignment using CLIP-T. Figure 4 presents the quantitative results, and Figure 5 displays visual examples.

Figure 5: Fidelity and efficacy visualization of Op methods. Row 1&3: protected images; row 2&4: mimicked images.

As shown in Figure 4, mimicked images generated from protected images show a stronger deviation from originals, reflected in higher FID and lower CLIP-I and CLIP-T compared with originals, indicating efficacy in deterring copyright mimicry. Notably, Mist shows the highest efficacy, with an average FID increase (from around 150 to 400) and reductions in CLIP-I (from 0.7 to 0.55) and CLIP-T (from 0.30 to 0.23) across the two datasets. This indicates that mimicked images significantly diverge from the originals and their text descriptions, thus effectively mitigating copyright mimicry. Mist’s strong efficacy is due to its incorporation of both image encoders and diffusion models into adversarial perturbation optimization [23], effectively increasing latent space distance while minimizing pixel-level deviation. Other protections, such as AdvDM, AntiDB, and PGuard, provide moderate protection with smaller changes in FID, CLIP-I, and CLIP-T, indicating subtler deviations. In contrast, Glaze provides limited protection, as it only slightly increases the FID (e.g., from 206 to 212 on the WikiArt dataset) and slightly reduces both CLIP-T and CLIP-I. This result can be partly explained by the differences in fine-tuning methods used for image mimicry, as DreamBooth differs from the fine-tuning methods employed in Glaze (details in Sec 5.1).

Figure 4 demonstrates that the efficacy of protection methods varies across datasets. For instance, in the WikiArt dataset, almost all protection methods significantly reduce the similarity between generated images and text descriptions, as quantified by CLIP-T. However, in the Concept dataset, only Mist shows a reduction in CLIP-T from 0.3 to 0.28, while other methods remain close to the baseline. This suggests that protecting Concept is more challenging than WikiArt, likely due to two factors: (i) many protection methods [26, 23, 7] are optimized for artwork, enhancing their performance in art-centric datasets like WikiArt, and (ii) the distinct styles in WikiArt are easier for protection methods to exploit, which underscores the complexity of evaluating protection methods across different datasets. These findings highlight a need for more adaptable protection methods catering to varied data characteristics and contexts.

Notably, fidelity and efficacy do not always align. While stronger protections typically lead to greater quality degradation, our observations reveal counterintuitive results. For instance, AntiDB exhibits strong fidelity (cf. Figure 3) but does not achieve the best performance in efficacy (cf. Figure 4). Similarly, Mist shows high efficacy but ranks moderately in fidelity. These showcase the complex interplay between fidelity (preserving image quality) and efficacy (ensuring robust copyright protection against mimicry), underscoring the need for a balance between visual quality and protection effectiveness in practical applications.

These quantitative findings align with visualization results in Figure 5, where mimicked images from protected images show distinct styles from the originals. Mist shows the most unique textures (highest efficacy), while AdvDM and AntiDB show artifacts (moderate efficacy). Recognizing the efficacy of each method is crucial for selecting optimal strategies to prevent mimicry of copyrighted content.

Figure 6: Resilience evaluation of Op against Np.
Figure 7: Resilience visualization of Op against Np. Column 1: mimicked images generated from protected images. Column 2-5: mimicked images generated from attacked images.

Resilience – Following the approach outlined by [15], we assess the resilience of Op protection methods against Np attacks. Our evaluation process involves fine-tuning Stable Diffusion (SD) models using purified images generated by applying Np to protected images (i.e., copyrighted images with Op applied). We then evaluate the mimicry performance of these fine-tuned models, where a higher mimicry performance indicates lower resilience of the protection method. Figure 6 presents the resilience evaluation results of various Op protection methods against Np attacks.

Analysis of Figure 6 reveals several key insights. 1) All protection methods, except Glaze, show a notable decline in effectiveness when subjected to purification attacks, as evidenced by higher mimicry performance. For instance, AdvDM-protected images, when purified, achieve lower FID and higher CLIP-I and CLIP-T compared to their unpurified counterparts, indicating a higher mimicry performance. Note that Glaze’s apparent resilience stems more from its initially limited protection performance than from superior defensive capabilities. 2) Different Op methods show varying protection abilities under attack. For example, thanks to its strong initial protection performance, Mist still maintains relatively higher FID and lower CLIP-I and CLIP-T than other protection methods under attack. 3) TVM and DiffPure emerge as the most potent methods for diminishing Op protection, achieving higher mimicry performance. 4) CLIP-T shows less sensitivity than other metrics, especially on the Concept dataset, where it remains nearly constant across most attack methods. We believe this is due to its robustness to minor protection artifacts, with significant changes only when major distortions obscure the original concept content.

Furthermore, we visualize the mimicry results of various Np methods against Op techniques in Figure 7. Our visual findings align with the quantitative analysis presented earlier. Specifically, Mist demonstrates superior protection performance even when Np methods are applied, with the exceptions of TVM and DiffPure. This observation further underscores that TVM and DiffPure are the most potent methods for diminishing Op protection: the artifacts in the mimicry images under TVM and DiffPure are notably less pronounced compared to other methods. Additionally, we observe that while Np can indeed diminish protection to some extent, certain Op protection methods still demonstrate a robust ability to prevent mimicry effectively. For instance, we can discern obvious protection patterns for Mist and AdvDM even after the application of Np.

In summary, both quantitative and qualitative analyses demonstrate that Op techniques can be compromised by certain attacks. These findings underscore the critical importance of evaluating protection methods not only for their initial effectiveness but also for their resilience against subsequent attacks. This comprehensive approach to assessment is essential for developing robust and reliable protection strategies in the face of evolving threats.

Remark 2 – TVM and DiffPure serve as dominant attacks to test the lower bounds of resilience in Op protection methods.

4.3 Model Sanitization Evaluation

Similar to Sec 4.2, we assess model sanitization (Ms) across three key dimensions: fidelity, efficacy, and resilience. Fidelity measures the sanitized model’s ability to maintain performance on unrelated content. Efficacy gauges how effectively the sanitized model prevents the generation of copyrighted content, evaluating the thoroughness of the sanitization process. Resilience examines the sanitized model’s robustness against concept recovery (Cr) attacks, assessing whether it consistently avoids reproducing copyright-protected concepts even under adversarial conditions. The detailed experimental setup is given in Appendix B.2.

Fidelity – For sanitized models, it is crucial that sanitization preserves the ability to generate images for other concepts while excluding the copyright concept. Table VI evaluates the fidelity of Ms methods on MS-COCO 2017 [51] 30K dataset prompts. We use FID to measure the differences between images generated by the sanitized models (with the original DM for reference) and real-world images from the dataset. Additionally, CLIP-T is used to assess the alignment between generated images and prompts.
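As a rough illustration of the text-alignment metric, the sketch below treats CLIP-T as the mean cosine similarity between paired image and prompt embeddings. This is a minimal sketch under stated assumptions: the embeddings are taken as already computed by a CLIP encoder (not shown), and the function names `cosine_similarity` and `clip_t_score` are hypothetical, not part of CopyrightMeter.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def clip_t_score(image_embeddings, text_embeddings):
    """Mean image-prompt similarity over paired embeddings.

    Assumption: image_embeddings[i] and text_embeddings[i] come from the
    same CLIP model and describe the i-th generated image and its prompt.
    """
    if not image_embeddings or len(image_embeddings) != len(text_embeddings):
        raise ValueError("need equally many, non-empty image and text embeddings")
    sims = [cosine_similarity(i, t) for i, t in zip(image_embeddings, text_embeddings)]
    return sum(sims) / len(sims)
```

Under this reading, a sanitized model preserves fidelity on unrelated concepts when its score on the MS-COCO prompts stays close to that of the original model.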

Our analysis reveals that model sanitization (Ms) methods achieve a successful balance between copyright protection and image generation capabilities, with only minor impacts on overall performance. In Table VI, sanitized models experience a marginal increase in FID scores compared to their original counterparts, with SLD showing the most notable change (16.95 vs. 16.21 for the original SD model). This subtle increase suggests a minor impact on image fidelity, likely due to the model inadvertently altering representations of unrelated but adjacent concepts or facing creative constraints when adjusted to exclude copyrighted content. Interestingly, CLIP-T scores remain remarkably consistent across all methods (0.30-0.31), indicating well-preserved textual alignment. These findings align with previous research [19, 20, 11], confirming that while Ms methods may slightly affect image fidelity, they successfully maintain text alignment for unrelated concepts.

In summary, the sanitization process achieves its primary goal of removing specific content without significantly compromising overall performance, demonstrating an effective balance between protecting copyrighted material and maintaining generative capabilities.

TABLE VI: Fidelity evaluation of Ms.
Method     SD     FMN    ESD    AC     UCE    NP     SLD
FID ↓      16.21  16.47  16.51  16.95  16.64  16.89  16.95
CLIP-T ↑   0.31   0.30   0.30   0.31   0.31   0.30   0.30
Figure 8: Efficacy evaluation of Ms.
Figure 9: Efficacy visualization of Ms. Column 1: original images; columns 2-7: sanitized models’ outputs.

Efficacy – To evaluate the effectiveness of various model sanitization (Ms) methods in removing copyrighted concepts, we focus on two key metrics: image similarity and text-image alignment. We compare images generated by sanitized models to those from the original model using FID and measure the alignment between generated images and their prompts using CLIP-T scores. A higher FID or a lower CLIP-T implies a more effective Ms method.

Figure 8 reveals the variation in FID across images generated from different sanitized models. A higher FID reflects greater efficacy in removing copyright concepts from the sanitized model’s outputs, and CLIP-T also shows a marked decrease from the baseline (original model alignment), suggesting substantial divergence from copyrighted content. ESD achieves the most effective sanitization (average FID 311, CLIP-T 0.17). Notably, model fine-tuning methods (i.e., ESD, FMN, and UCE) generally outperform inference-guiding methods (i.e., NP and SLD) with higher FID and lower CLIP-T, reflecting more effective sanitization. This is likely because fine-tuning methods directly modify model parameters, making deeper adjustments that reduce the retention of copyrighted content, while inference-guiding methods only adjust output directions, resulting in superficial removals.

Visualizations in Figure 9 support these findings. Fine-tuning-based methods like ESD and UCE effectively sanitize artistic styles by visibly altering the original textures and colors in WikiArt, and they turn portraits into non-face images (i.e., landscapes or still lifes) in Person. In contrast, inference-guiding methods like SLD still leave faint traces of the original artistic style or individual characteristics. Additionally, these two categories differ significantly in time efficiency (cf. Sec 5.2).

Remark 3 – Fine-tuning-based Ms methods show greater efficacy than inference-guiding Ms methods.

Resilience – We evaluate the resilience of Ms against Cr attacks following [16]. Our evaluation process involves generating images from both original models (i.e., models capable of generating content with copyright concepts) and recovered models (i.e., sanitized models subjected to Cr). Figure 10 presents the resilience evaluation results of various Ms protection methods against Cr attacks.

Figure 10: Resilience evaluation of Ms against Cr.
Figure 11: Resilience visualization of Ms against Cr. Column 1: original images; column 2: FMN-sanitized images; columns 3-7: images generated from the recovered model.

Analysis of Figure 10 uncovers several critical insights. 1) All Ms protection methods show reduced effectiveness under Cr attacks. The FID between images from recovered models and originals is lower than that between sanitized models and originals, and the CLIP-T of images from recovered models is higher than that of sanitized models, indicating enhanced resemblance to copyrighted content. For instance, FMN- and AC-sanitized models show relatively low resilience, with low FID scores and high CLIP-T under Cr. Thus, while Ms methods provide initial protection, their resilience against Cr attacks is limited. 2) The resilience of Ms varies with the Cr attack applied. Fine-tuning-based attacks (e.g., LoRA and DB) are the most potent methods for diminishing Ms protection, lowering FID and raising CLIP-T from the baseline. In contrast, textual-inversion-based attacks (e.g., TI and CI) cause moderate changes in FID and CLIP-T, while prompt-engineering-based attacks (e.g., RB) lead to minimal deviation. This may stem from incomplete pre-filtering of copyright content in the DM’s training dataset [52, 16], as Ms methods often remap these concepts to new embeddings rather than fully removing them. 3) High-potency Cr attacks tend to be limited in applicability. LoRA, DB, and TI are potent, but they mostly apply to Ms methods built on standardized open-source models. CI uses a custom pipeline for each Ms method, which makes it adaptable to various Ms methods, though its applicability to new Ms methods is uncertain. In contrast, RB bypasses protections solely through prompt modifications, making it adaptable across diverse T2I DMs.
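The comparison logic in point 1) can be sketched as follows. This is a minimal sketch, assuming that both FID values are measured against images from the original model and that both CLIP-T values use the copyright-concept prompts; the names `Scores`, `recovery_effect`, and `is_recovered` are hypothetical and not taken from CopyrightMeter.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    fid: float     # FID against original-model images (lower = closer to the concept)
    clip_t: float  # alignment with the copyright-concept prompt (higher = closer)


def recovery_effect(sanitized: Scores, recovered: Scores) -> dict:
    """How far a Cr attack moves the model back toward the original concept."""
    return {
        "fid_drop": sanitized.fid - recovered.fid,
        "clip_t_gain": recovered.clip_t - sanitized.clip_t,
    }


def is_recovered(sanitized: Scores, recovered: Scores) -> bool:
    """True when both metrics move toward the copyrighted content."""
    effect = recovery_effect(sanitized, recovered)
    return effect["fid_drop"] > 0 and effect["clip_t_gain"] > 0
```

A larger `fid_drop` together with a larger `clip_t_gain` corresponds to a more potent attack, or equivalently to a less resilient Ms method.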

Visualizations in Figure 11 of images generated from the FMN-sanitized and recovered models reveal the varying resilience of Ms protection against Cr attacks with differing adversary capabilities. Attacks that enable deeper model manipulation, such as fine-tuning and textual-inversion methods, recover original styles in WikiArt and portrait characteristics in Person more effectively. This trend reflects a strong correlation between higher adversary capability and greater attack impact. In contrast, less invasive prompt-engineering attacks have limited success in recovering detailed human portraits, but may still pose a feasible threat in scenarios with constrained adversary capabilities. These findings underscore the need for robust Ms methods that can withstand attacks across varying levels of attacker capability.

Remark 4 – In real-world scenarios where models are accessible but not fine-tunable, the greatest threat to Ms methods comes from prompt engineering-based approaches.

4.4 Digital Watermarking Evaluation

Similarly, we evaluate digital watermarking (Dw) based on three criteria: fidelity, assessing the visual consistency between images before and after Dw; efficacy, determined by the ACC of extracted watermark messages; and resilience, measuring the ACC of messages extracted from images after watermark removal (Wr) attacks. Further experimental setup details are outlined in Appendix B.3.

Figure 12: Fidelity evaluation of Dw.
Figure 13: Efficacy evaluation of Dw.
Figure 14: Fidelity visualization of Dw. Column 1: original images; columns 2-7: watermarked images.

Fidelity – Maintaining visual similarity to the original image and alignment with the corresponding prompt is essential for watermarked images. Figure 12 evaluates fidelity across all Dw methods, using FID as a general metric. Specifically, for watermarks embedded directly onto existing images, we measure visual consistency with metrics such as LPIPS and SSIM; for generative watermarks that produce watermarked images from prompts, we assess text alignment with CLIP-T. Visualizations are presented in Figure 14.

Figure 12 shows that Dw methods have minimal impact on image fidelity, where lower LPIPS and higher SSIM, PSNR, VIFp, and CLIP-I suggest greater fidelity. Specifically, DShield, ZoDiac, and Diag exhibit low FID (below 80), indicating minimal visual alteration. This is attributed to the approach of embedding the watermark in the Fourier frequencies of the latent space, which makes disturbances less visually perceptible [38]. In contrast, GShade, StabSig, and TR display slightly higher FID (exceeding 90) but maintain CLIP-T scores comparable to watermarks on existing images (around 0.3), indicating preserved semantic consistency.

Visualizations in Figure 14 confirm these findings. DShield, ZoDiac, and Diag retain a close resemblance to the originals, while GShade, StabSig, and TR introduce differences, with content and artistic style largely unchanged at the semantic level. This consistency across metrics and visuals supports the fidelity of these watermark designs.

Efficacy – For watermarks, it is crucial that the decoded message exhibits high ACC compared to the embedded message. High efficacy indicates better copyright verification.

Figure 13 shows that most Dw methods achieve ACC close to 100% across datasets, except for DShield, underscoring the efficacy of these watermarks. Notably, TR stands out with 100% ACC across all datasets, indicating robust watermark embedding and decoding capability. DShield shows slightly lower ACC, possibly due to the diverse and complex datasets we used, underscoring a limitation of fine-tuning-based watermarking methods, where efficacy depends on data specificity and quality.

In summary, these findings highlight that the efficacy of Dw is largely dependent on the embedding strategy or the quality of training data, while modifications in latent space show particular promise for high ACC in diverse settings.

Figure 15: Resilience evaluation of Dw against Wr.

Resilience – We assess Dw protection resilience against Wr attacks by comparing the ACC of messages extracted after watermark removal with the originally embedded message. A higher ACC indicates a stronger resilience.
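To make the ACC comparison concrete, the following sketch computes the fraction of message bits that survive extraction. It is a minimal sketch under the assumption that messages are equal-length bit strings; `bit_accuracy` is a hypothetical helper name, and detection-style watermarks may be scored differently in practice.

```python
def bit_accuracy(embedded: str, extracted: str) -> float:
    """Fraction of positions where the extracted bit matches the embedded bit."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)


# Illustrative (made-up) messages: two of eight bits flipped by an attack.
embedded = "10110010"
after_attack = "10010011"
print(bit_accuracy(embedded, after_attack))  # 0.75
```

Resilience is then read off by comparing this value after a Wr attack with the value obtained on the un-attacked watermarked image.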

Figure 15 presents the resilience of various Dw protections against Wr attacks, revealing several key insights. 1) Most watermarks show reduced protection after attacks, with ACC lower than the baseline (un-attacked watermarked images). For example, Diag’s ACC sharply declines under Blur, while StabSig, ZoDiac, and GShade are vulnerable to Rotate. 2) Dw methods vary in resilience against attacks. Compared to the baseline, DShield and TR exhibit only slight declines under attacks, while others face larger reductions under certain attacks. 3) Under attack, latent-space-modifying methods exhibit higher ACC than model fine-tuning methods, with TR maintaining nearly 100% ACC across attacks due to its invisible Fourier-space embedding that resists pixel disruption. 4) ZoDiac and GShade share similar vulnerabilities under Bright, Rotate, and Crop attacks, with the lowest ACC observed under Rotate.

In summary, these insights highlight the need to carefully consider specific attack scenarios when choosing watermark strategies. We speculate that latent-space modifying methods leverage the inherent distribution of the diffusion model’s latent space to embed watermarks more subtly and securely, making them harder to detect and remove.

Remark 5 – Latent space-based watermarks tend to be more resilient against attacks than fine-tuning-based watermarks.

5 Exploration

Next, we explore the generalizability, efficiency, and sensitivity of current protection methods. We further compare these methods with their contemporary versions and industry-leading online text-to-image applications. We also conduct user studies to evaluate the alignment between evaluation metrics and human judgment.

Figure 16: Various style mimicry methods based on fine-tuning.
Figure 17: Efficiency evaluation.

5.1 Generalizability

While previous experiments use DreamBooth [5] for image mimicry, other fine-tuning methods can also achieve mimicry. To assess generalizability, following [13, 24], we employ a standard script from Diffusers (https://huggingface.co/docs/diffusers/training/text2image) for fine-tuning (details in Appendix B.4). In Figure 16, differences in texture patterns, artifacts, or deviations from protected images reveal protection effectiveness against specific mimicry techniques. Notably, AdvDM and Mist exhibit the highest generalizability, providing strong protection under both fine-tuning methods. Glaze is less effective against DreamBooth but performs better with the standard script. Conversely, PGuard is robust with DreamBooth but is less effective with the standard script. AntiDB provides noticeable protection against both methods, performing particularly well against DreamBooth mimicry. These findings emphasize the need for protection methods that account for diverse mimicry techniques to enhance generalizability.

Remark 6 – The generalizability of various copyright protection methods differs significantly.
Figure 18: Sensitivity analysis on watermark removal.

5.2 Efficiency

Computational cost is a key factor in copyright protection applications. Op and Dw involve lightweight, image-level manipulations, while Ms and Cr require deeper model-level changes, increasing time consumption. While prior studies often overlook time efficiency, we explore the time consumption of Ms and Cr methods.

As shown in Figure 17, within Ms, inference-guiding methods are more efficient than model fine-tuning methods as they can sanitize a concept within four minutes without retraining. Additionally, Cr methods are generally more time-efficient than Ms methods on average. This is likely because Cr simply enhances existing representations in the model, whereas Ms must first overcome the model’s training biases with a reverse optimization process. Notably, DB is the most efficient Cr method, highlighting the vulnerability of Ms. Therefore, practitioners should carefully select Ms methods based on available computational resources.

Remark 7 – In scenarios where efficiency is a priority, inference-guiding methods are preferred to model fine-tuning methods.

5.3 Sensitivity Analysis

Following [14, 44, 39], we take Dw as a representative for sensitivity analysis, where small perturbations may sharply impact watermark resilience, offering generalizable insights to other protection methods.

As shown in Figure 18, most protection methods exhibit a sharp ACC drop as attack hyperparameter values increase. For instance, higher brightness, crop ratios, or blur radii result in ACC drops. This implies that stronger attack settings weaken the robustness of most methods. Notably, both TR and Diag are resilient to the Rotate attack, maintaining near 100% ACC even at 90° or 180° rotation, while GShade and ZoDiac suffer sharp decreases. The superior performance of TR is attributed to its multi-ring pattern in Fourier space. Similarly, Diag achieves robustness through trigger-based embedding of robust watermark patterns. These insights help practitioners choose protection methods tailored to real-world attack scenarios.

5.4 Contemporary Assessment

In the evolving field of copyright protection, methods and infringement attacks are in constant competition. We analyze recent versions of protections and attacks: (i) Glaze v2.1 (https://glaze.cs.uchicago.edu/) is a closed-source update optimized for styles with clear colors and smooth textures. (ii) Mist v2 (https://psyker-team.github.io/) enhances the vanilla Mist [23] with improved efficacy and efficiency. (iii) Noisy Upscaler [24] is an advanced attack that first adds a small amount of random noise to a protected image, then purifies the image using the Upscaler [49].

Figure 19 compares Glaze v2.1 (details in Appendix C) with our open-sourced Glaze implementation, along with Mist v2 and vanilla Mist. Although both Glaze versions introduce similar perturbations, Glaze v2.1 still demonstrates limited resilience, especially against JPEG attacks. Mist v2 achieves improved resilience with post-attack images displaying noticeable mottling and color shifts, while images protected by the original Mist method show a closer resemblance to the originals. These observations underscore the vulnerabilities of existing copyright protection methods to advanced attacks, highlighting the ongoing need for improved protective solutions. In Figure 20, current protections are particularly susceptible to the Noisy Upscaler attack, with heightened vulnerability compared to other methods.

Remark 8 – The arms race between protections and attacks promotes the development of more advanced protections.
Figure 19: Comparison of Glaze v2.1 and our open-sourced Glaze implementation, along with Mist v2 and Mist.
Figure 20: The result of Mist and Noisy Upscaler.

5.5 Real-world Online Applications

After analyzing SOTA strategies in academic settings, we compare them with industry-leading online applications. We have reported our findings to the respective companies.

Scenario.gg and NovelAI for image mimicry. To assess the efficacy of Op, we explore two online applications, scenario.gg (https://www.scenario.gg/) and NovelAI (https://novelai.net/). Figure 21 illustrates that Mist, the strongest protection, effectively prevents mimicry on scenario.gg, as the perturbation remains intact. Further, we observe that artifacts are nearly undetectable in images mimicked from TVM- and DiffPure-purified images, making these the most potent attacks, in line with the findings in Section 4.2. On NovelAI, its style transfer removes Mist’s perturbation, suggesting that frequent model updates may reduce protection efficacy. This highlights the importance of ongoing protection updates to counter mimicry threats.

Figure 21: Visualization results of scenario.gg and NovelAI.

Amazon Titan Image Generator for watermarking. We assess the resilience of the watermark embedded in Amazon Bedrock Titan Image Generator (https://aws.amazon.com/cn/bedrock/titan/) against various attacks. Figure 22 shows the watermark remains intact against basic attacks (e.g., Bright, Crop, and JPEG), but becomes undetectable under more complex attacks. This implies that online watermarks share similar vulnerabilities to those applied locally. Additionally, online watermarks lack customization options (e.g., watermark strength). Therefore, practitioners should take flexibility and customization into consideration when choosing watermark methods.

Figure 22: Watermark from Amazon Titan Image Generator.

5.6 User Study

We conduct user studies to evaluate the alignment between metrics and human perception (details in Appendix D). First, we assess the visual quality and style mimicry of protected and purified images in WikiArt. Following [24], we define success rate as the percentage of users preferring mimicry images fine-tuned on protected or purified images over unprotected ones. We observe that success rates increase after purification, suggesting greater visual similarity to the originals. Notably, average success rates across all mimicry scenarios remain below 50% (50% suggests perfect mimicry), showing that from human perspectives, even mimicked images from purified images still differ significantly from the originals. Mist yields the lowest mimicry success rate (under 10%), indicating the highest efficacy for protection, whereas DiffPure attacks reduce resilience, with success rates around 35%, supporting observations in Sec 4.2. Second, following [11, 19, 20], we further examine whether Ms methods impact the fidelity of images of unrelated concepts. Over 50% of users rate the fidelity and text alignment of images generated by sanitized models as equal to or better than the original SD model, supporting the observations in Sec 4.3, suggesting that sanitized models produce images comparable to those from the original SD. The alignment between metrics and human judgment confirms that CopyrightMeter effectively captures human perception for assessing copyright protection methods.
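The success-rate definition above can be sketched in a few lines. This is a minimal sketch assuming one boolean vote per pairwise comparison; the function name `success_rate` and the example votes are hypothetical and are not data from the study.

```python
def success_rate(votes):
    """Percentage of comparisons in which the rater preferred the mimicry
    produced from protected (or purified) images over the one produced
    from unprotected images. A value near 50 means raters cannot tell
    the two apart, i.e., the protection has no visible effect.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return 100.0 * sum(1 for v in votes if v) / len(votes)


# Made-up example: 1 of 4 raters preferred the mimicry from protected images.
print(success_rate([True, False, False, False]))  # 25.0
```

Lower values therefore indicate stronger protection, while an increase after purification indicates that the attack has eroded it.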

6 Discussion

Limitations and future work. First, CopyrightMeter integrates most mainstream copyright protection methods in T2I DMs. Although it does not implement all strategies, its modular design allows easy incorporation of new protections, attacks, and metrics. Second, we primarily apply default settings from original papers, as these are typically optimized for performance. However, our framework supports alternative configurations. Finally, most protections require modifying original artworks, posing challenges for established artists whose unprotected works remain vulnerable. Unlike software security, where updates can fix vulnerabilities, copyright protection cannot be easily patched. As offense-defense dynamics evolve, existing protections may not withstand future attacks. We hope that CopyrightMeter provides interim protection and advocates for the establishment of more comprehensive laws and regulations.

Guidance for enhancing protection methods. Our findings reveal limitations in current copyright protections, with CopyrightMeter serving as a valuable benchmark for improvement. For example, adversarial perturbations in Op are easily compromised by simple attacks like JPEG, so incorporating a JPEG loss into the optimization may improve resilience. In Ms, resilience can be improved through adversarial training with crafted adversarial inputs that induce the generation of copyright concepts, minimizing the model's output probability under these inputs to facilitate concept erasure in more complex scenarios. Alternatively, if Cr is inevitable, refining Ms to slow recovery efforts provides additional protection. For Dw, designing watermarks with common attack strategies in mind can strengthen resilience.

Additional related work. Recent studies [22, 16, 24] have surveyed copyright protection and attack methods in T2I DMs but are limited to single-level implementations without empirical evaluation. For instance, [22] discusses Op and Ms without experimental validation or quality assessment of generated images. Similarly, [24] highlights Op artistic style imitation, showing through user studies that all existing copyright protections can be bypassed, but lacks quantitative metrics for image quality. In contrast, CopyrightMeter provides a comprehensive framework for evaluation, covering major protection and attack categories within a unified platform for empirical analysis.

7 Conclusion

In this paper, we design and implement CopyrightMeter, a uniform platform dedicated to the comprehensive evaluation of copyright protection for text-to-image diffusion models. Leveraging CopyrightMeter, we conduct systematic evaluations from the perspectives of fidelity, efficacy, and resilience. To our knowledge, this platform is the first of its kind to provide a uniform, comprehensive, informative, and extensible evaluation of existing copyright protections and attacks. It offers empirical support and addresses the under-explored intricacies of copyright protections and attacks that have previously suffered from non-holistic and non-standardized evaluations, thereby tackling long-standing questions in the field.

References

  • [1] P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon, “The stable signature: Rooting watermarks in latent diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22 466–22 477.
  • [2] J. Betker, G. Goh, L. Jing, T. Brooks, J. Wang, L. Li, L. Ouyang, J. Zhuang, J. Lee, Y. Guo, W. Manassra, P. Dhariwal, C. Chu, Y. Jiao, and A. Ramesh, “Improving image generation with better captions,” 2023.
  • [3] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans et al., “Photorealistic text-to-image diffusion models with deep language understanding,” Advances in Neural Information Processing Systems, vol. 35, pp. 36 479–36 494, 2022.
  • [4] (2023) Generative ai has an intellectual property problem. [Online]. Available: https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
  • [5] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 22 500–22 510.
  • [6] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu, “Multi-concept customization of text-to-image diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1931–1941.
  • [7] S. Shan, J. Cryan, E. Wenger, H. Zheng, R. Hanocka, and B. Y. Zhao, “Glaze: Protecting artists from style mimicry by text-to-image models,” in 32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 2187–2204.
  • [8] P. Samuelson, “Generative ai meets copyright,” Science, vol. 381, no. 6654, pp. 158–161, 2023.
  • [9] M. Heikkilä, “This artist is dominating ai-generated art. and he’s not happy about it,” MIT Technology Review, vol. 125, no. 6, pp. 9–10, 2022.
  • [10] H. Salman, A. Khaddaj, G. Leclerc, A. Ilyas, and A. Madry, “Raising the cost of malicious ai-powered image editing,” in Proceedings of the 40th International Conference on Machine Learning. PMLR, 2023.
  • [11] R. Gandikota, J. Materzynska, J. Fiotto-Kaufman, and D. Bau, “Erasing concepts from diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 2426–2436.
  • [12] G. Zhang, K. Wang, X. Xu, Z. Wang, and H. Shi, “Forget-me-not: Learning to forget in text-to-image diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1755–1764.
  • [13] Y. Cui, J. Ren, H. Xu, P. He, H. Liu, L. Sun, Y. Xing, and J. Tang, “Diffusionshield: A watermark for copyright protection against generative diffusion models,” arXiv preprint arXiv:2306.04642, 2023.
  • [14] Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein, “Tree-rings watermarks: Invisible fingerprints for diffusion images,” in Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 58 047–58 063.
  • [15] B. Cao, C. Li, T. Wang, J. Jia, B. Li, and J. Chen, “Impress: Evaluating the resilience of imperceptible perturbations against unauthorized data usage in diffusion-based generative ai,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  • [16] M. Pham, K. O. Marshall, N. Cohen, G. Mittal, and C. Hegde, “Circumventing concept erasure methods for text-to-image generative models,” in The Twelfth International Conference on Learning Representations, 2023.
  • [17] G. Li, Y. Chen, J. Zhang, J. Li, S. Guo, and T. Zhang, “Towards the vulnerability of watermarking artificial intelligence generated content,” arXiv preprint arXiv:2310.07726, 2023.
  • [18] Z. Jiang, J. Zhang, and N. Z. Gong, “Evading watermark based detection of ai-generated content,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 1168–1181.
  • [19] R. Gandikota, H. Orgad, Y. Belinkov, J. Materzyńska, and D. Bau, “Unified concept editing in diffusion models,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 5111–5120.
  • [20] P. Schramowski, M. Brack, B. Deiseroth, and K. Kersting, “Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22 522–22 531.
  • [21] T. Šarčević, A. Karlowicz, R. Mayer, R. Baeza-Yates, and A. Rauber, “U can’t gen this? a survey of intellectual property protection methods for data in generative ai,” arXiv preprint arXiv:2406.15386, 2024.
  • [22] J. Ren, H. Xu, P. He, Y. Cui, S. Zeng, J. Zhang, H. Wen, J. Ding, H. Liu, Y. Chang et al., “Copyright protection in generative ai: A technical perspective,” arXiv preprint arXiv:2402.02333, 2024.
  • [23] C. Liang and X. Wu, “Mist: Towards improved adversarial examples for diffusion models,” arXiv preprint arXiv:2305.12683, 2023.
  • [24] R. Hönig, J. Rando, N. Carlini, and F. Tramèr, “Adversarial perturbations cannot reliably protect artists from generative ai,” 2024.
  • [25] B. Zheng, C. Liang, X. Wu, and Y. Liu, “Understanding and improving adversarial attacks on latent diffusion model,” arXiv preprint arXiv:2310.04687, 2023.
  • [26] C. Liang, X. Wu, Y. Hua, J. Zhang, Y. Xue, T. Song, Z. Xue, R. Ma, and H. Guan, “Adversarial example does good: Preventing painting imitation from diffusion models via adversarial examples,” in International Conference on Machine Learning. PMLR, 2023, pp. 20 763–20 786.
  • [27] T. Van Le, H. Phung, T. H. Nguyen, Q. Dao, N. N. Tran, and A. Tran, “Anti-dreambooth: Protecting users from personalized text-to-image synthesis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 2116–2127.
  • [28] G. K. Wallace, “The jpeg still picture compression standard,” Communications of the ACM, vol. 34, no. 4, pp. 30–44, 1991.
  • [29] P. Heckbert, “Color image quantization for frame buffer display,” ACM Siggraph Computer Graphics, vol. 16, no. 3, pp. 297–307, 1982.
  • [30] A. Chambolle, “An algorithm for total variation minimization and applications,” Journal of Mathematical imaging and vision, vol. 20, pp. 89–97, 2004.
  • [31] W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, and A. Anandkumar, “Diffusion models for adversarial purification,” in International Conference on Machine Learning (ICML), 2022.
  • [32] N. Kumari, B. Zhang, S.-Y. Wang, E. Shechtman, R. Zhang, and J.-Y. Zhu, “Ablating concepts in text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22 691–22 702.
  • [33] AUTOMATIC1111, “Negative prompt,” https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Negative-prompt, 2022, accessed: 2024-07-01.
  • [34] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” in The Tenth International Conference on Learning Representations, 2022.
  • [35] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, “An image is worth one word: Personalizing text-to-image generation using textual inversion,” in The Eleventh International Conference on Learning Representations, 2023.
  • [36] Y.-L. Tsai, C.-Y. Hsu, C. Xie, C.-H. Lin, J.-Y. Chen, B. Li, P.-Y. Chen, C.-M. Yu, and C.-Y. Huang, “Ring-a-bell! how reliable are concept removal methods for diffusion models?” in The Twelfth International Conference on Learning Representations, 2024.
  • [37] Z. Wang, C. Chen, L. Lyu, D. N. Metaxas, and S. Ma, “Diagnosis: Detecting unauthorized data usages in text-to-image diffusion models,” in The Twelfth International Conference on Learning Representations, 2023.
  • [38] L. Zhang, X. Liu, A. V. Martin, C. X. Bearfield, Y. Brun, and H. Guan, “Attack-resilient image watermarking using stable diffusion,” Advances in Neural Information Processing Systems, 2024.
  • [39] Z. Yang, K. Zeng, K. Chen, H. Fang, W. Zhang, and N. Yu, “Gaussian shading: Provable performance-lossless image watermarking for diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 12 162–12 171.
  • [40] O. Hosam, “Attacking image watermarking and steganography-a survey,” International Journal of Information Technology and Computer Science, vol. 11, no. 3, pp. 23–37, 2019.
  • [41] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image compression with discretized gaussian mixture likelihoods and attention modules,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 7939–7948.
  • [42] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18.   Springer, 2015, pp. 234–241.
  • [43] F. Y. Shih, Digital watermarking and steganography: fundamentals and techniques.   CRC press, 2017.
  • [44] K. A. Zhang, L. Xu, A. Cuesta-Infante, and K. Veeramachaneni, “Robust invisible video watermarking with attention,” arXiv preprint arXiv:1909.01285, 2019.
  • [45] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in neural information processing systems, vol. 34, pp. 8780–8794, 2021.
  • [46] X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, and L. Li, “Invisible image watermarks are provably removable using generative ai,” arXiv preprint arXiv:2306.01953, 2023.
  • [47] B. Saleh and A. Elgammal, “Large-scale classification of fine-art paintings: Learning the right metric on the right feature,” arXiv preprint arXiv:1505.00855, 2015.
  • [48] C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in Neural Information Processing Systems, vol. 35, pp. 25 278–25 294, 2022.
  • [49] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695.
  • [50] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning.   PMLR, 2021, pp. 8748–8763.
  • [51] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13.   Springer, 2014, pp. 740–755.
  • [52] A. Birhane, V. U. Prabhu, and E. Kahembwe, “Multimodal datasets: misogyny, pornography, and malignant stereotypes,” arXiv preprint arXiv:2110.01963, 2021.
  • [53] A. Mądry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” stat, vol. 1050, no. 9, 2017.
  • [54] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu, “Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps,” Advances in Neural Information Processing Systems, vol. 35, pp. 5775–5787, 2022.
Figure 23: Fidelity visualization of Op. Column 1: original images; column 2-6: protected images.
Figure 24: Efficacy visualization of Op. Row 1-2: Mist-protected image and its mimicked artwork; row 3-12: protected image under attacks and their mimicked artworks.
Figure 25: Efficacy visualization of Ms. Row 1: original images; row 2-7: images generated by Ms.
Figure 26: Resilience visualization of Ms against Cr. Row 1: original images; row 2: FMN-sanitized images; row 3-7: images generated from recovered model.
Figure 27: Fidelity visualization of Dw. Row 1: original artworks from WikiArt dataset; row 2-7: watermarked images.
Figure 28: Fidelity visualization of Dw against Wr. Column 1: original image; column 2: Zodiac-watermarked image; column 3-8: watermarked images under attacks.

Appendix A Metrics Overview and Visualization Results

TABLE VII: Properties of copyright protection methods.
| Category | Property | Description | PSNR | SSIM | FID | VIFp | LPIPS | CLIP-I | CLIP-T | ACC |
|---|---|---|---|---|---|---|---|---|---|---|
| Obfuscation Processing | Fidelity | Protected images resemble the originals under all scenarios. | ✓ | ✓ | ✓ | ✓ | ✓ | × | × | × |
| Obfuscation Processing | Efficacy | Protected images mitigate copyright infringement. | × | × | ✓ | × | × | ✓ | ✓ | × |
| Obfuscation Processing | Resilience | Protected images mitigate copyright mimicking under attack. | × | × | ✓ | × | × | ✓ | ✓ | × |
| Model Sanitization | Fidelity | Sanitized models are unaffected for unrelated concepts. | × | × | ✓ | × | × | × | ✓ | × |
| Model Sanitization | Efficacy | Sanitized models forget specific copyright concepts. | × | × | ✓ | × | × | × | ✓ | × |
| Model Sanitization | Resilience | Sanitized models struggle to relearn copyright concepts under attack. | × | × | ✓ | × | × | × | ✓ | × |
| Digital Watermark | Fidelity | Watermarked images resemble the originals under all scenarios. | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × |
| Digital Watermark | Efficacy | Watermark extractable from protected images. | × | × | × | × | × | × | × | ✓ |
| Digital Watermark | Resilience | Watermark extractable from attacked images. | × | × | × | × | × | × | × | ✓ |

We use metrics to assess the fidelity, efficacy, and resilience of copyright protection methods; Table VII summarizes which metrics apply to each property in each category. For obfuscation processing and noise purification, Figure 23 presents protected images alongside the original artwork, while Figure 24 shows DreamBooth fine-tuned images with reference to the protected and purified images. For model sanitization and concept recovery, Figure 25 shows images generated after a specific concept has been sanitized, and Figure 26 shows the recovered images. Finally, for digital watermark and watermark removal, the results are illustrated in Figures 27 and 28.
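As a concrete illustration of two of the simpler metrics in Table VII, the following minimal sketch computes PSNR between an original and a protected image and a bit-level accuracy for an extracted watermark. Images are represented as flat lists of pixel values for brevity, the function names are hypothetical, and reading ACC as bit accuracy is an assumption made for this example rather than a definition taken from the evaluation code.

```python
import math


def psnr(original, protected, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel lists."""
    if len(original) != len(protected):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, protected)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def bit_accuracy(embedded_bits, extracted_bits):
    """Fraction of watermark bits recovered correctly (assumed meaning of ACC)."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit strings must have the same length")
    matches = sum(1 for a, b in zip(embedded_bits, extracted_bits) if a == b)
    return matches / len(embedded_bits)
```

Higher PSNR indicates that the protected image stays closer to the original, while a bit accuracy near 0.5 would mean the extracted watermark is no better than chance for a binary message.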

Appendix B Experimental Setup

B.1 Obfuscation Processing and Noise Purification

In Op, AdvDM [26] trains with a learning rate of 0.003 for 100 steps and a perturbation limit of 0.06. Mist [23] uses an $\ell_\infty$ constraint, 100 PGD steps, a per-step perturbation of 1/255, and a total budget of 16/255. Because Glaze [7] is closed-source, we follow the implementation in the code released with [15], using a learning rate of 0.001 for 500 steps, a perceptual perturbation budget of 0.05, and an LPIPS loss weight of 0.1. PhotoGuard (PGuard) [10] uses an $\ell_\infty$ perturbation limit of 16/255, a step size of 2/255, and 200 optimization steps. Anti-DreamBooth (AntiDB) [27] employs 100 PGD iterations for FSMG and 50 for ASPL, with a perturbation budget of 8/255, a step size of 1/255, and a noise budget $\eta$ of 0.05, minimized over 1000 training steps.
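The role of the step size and the total budget in these PGD-based protections can be sketched as follows. This is a toy illustration under stated assumptions, not any method's actual implementation: the image is a flat list of pixel values in [0, 1], and `grad_fn` is a hypothetical stand-in for the gradient of whatever objective a given protection optimizes. The defaults mirror the Mist settings quoted above (100 steps, 1/255 per step, 16/255 in total).

```python
def pgd_linf(image, grad_fn, steps=100, step_size=1 / 255, budget=16 / 255):
    """Signed-gradient ascent with projection onto the l_inf ball around `image`.

    `grad_fn` is a hypothetical callable returning one gradient value per pixel.
    """
    adv = list(image)
    for _ in range(steps):
        grad = grad_fn(adv)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)  # -1, 0, or +1
            moved = adv[i] + step_size * sign
            # Project back into the perturbation budget around the original pixel.
            moved = min(max(moved, image[i] - budget), image[i] + budget)
            # Keep the result a valid pixel value.
            adv[i] = min(max(moved, 0.0), 1.0)
    return adv
```

Because each pixel is clamped after every step, the final perturbation never exceeds the budget regardless of how many steps are taken; the step count and step size only control how quickly that bound is reached.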

In Np, JPEG Compression (JPEG) [28] sets the quality to 0.75, and Quantize (Quant) [29] sets the bit depth to 8. Total Variation Minimization (TVM) [30] sets a regularization weight of 0.5 with the $\ell_2$ norm and is optimized with the BFGS algorithm. For IMPRESS [15], we use the original authors’ hyperparameters, setting the learning rate to 0.001, the purification intensity to 0.1, and the number of iterations to 3000. For DiffPure [31], we use classifier-free guidance with a scale of 7.5 and tune the diffusion timesteps with a strength of 1,000 via AutoPipelineForImage2Image (https://huggingface.co/docs/diffusers/api/pipelines/auto_pipeline).

B.2 Model Sanitization and Concept Recovery

We use several methods for Ms. For Forget-Me-Not (FMN) [12], we fine-tune using the textual inversion scripts provided by the authors. For Erased Stable Diffusion (ESD) [11], a separate model is trained for each category. Ablating Concepts (AC) [32] employs scripts from the authors for both artistic and personal concepts, using WikiArt artworks and generated photos, respectively. Unified Concept Editing (UCE) [19] models are trained using default parameters. Negative Prompt (NP) [33] is applied during inference via txt2img.py (https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py). Safe Latent Diffusion (SLD) [20] uses a new SD pipeline based on diffusers (https://github.com/huggingface/diffusers/), with safety concepts defined for both artistic and personal elements. All parameters are set to default: guidance scale at 2000, warm-up steps at 7, threshold at 0.025, momentum scale at 0.5, and momentum beta at 0.7. Additionally, to confirm the sanitized models’ fidelity on unrelated concepts, we compare 30,000 real-world images from the MS-COCO 2017 dataset with the SD-generated images from the corresponding text descriptions.

In Cr, LoRA [34] trains with a batch size of 1 and a learning rate of $1\times 10^{-4}$ for 100 steps. DreamBooth (DB) [5] trains with a batch size of 2 and a learning rate of $5\times 10^{-7}$ for 1000 steps, using prompts such as “a painting in the style of [V]” for the WikiArt dataset and “A photo of sks [V]” for the Person dataset, where “[V]” represents an artist or concept name. Textual Inversion (TI) [35] uses textual_inversion.py (https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py) with 1000 steps and a learning rate of $5\times 10^{-4}$, using the same prompts as DreamBooth. Concept Inversion (CI) [16] trains with a batch size of 4, a learning rate of $5\times 10^{-3}$, and 1000 steps, using frozen erased model weights. Ring-A-Bell (RB) [36] uses a prompt length of 77, a tuning coefficient of 3, and a genetic algorithm with a population of 200, 3000 iterations, a mutation rate of 0.25, and a crossover rate of 0.5. UCE uniquely generates non-standard .pt model files, preventing further fine-tuning. Inference-guiding protections (NP, SLD) do not generate model files and are vulnerable only to CI and RB attacks.
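To make the roles of the genetic-algorithm hyperparameters listed for Ring-A-Bell concrete, the sketch below shows a generic genetic search over fixed-length token sequences. It is not Ring-A-Bell's actual procedure: the selection scheme, the single-point crossover, the single-position mutation, and the `fitness` callable are all assumptions chosen for illustration, and only the default values (length 77, population 200, 3000 iterations, mutation rate 0.25, crossover rate 0.5) come from the setup above.

```python
import random


def genetic_search(fitness, vocab_size, length=77, population=200,
                   iterations=3000, mutation_rate=0.25, crossover_rate=0.5,
                   seed=0):
    """Generic genetic search over token-id sequences (illustrative only).

    `fitness` is a hypothetical scoring function; higher is better.
    """
    rng = random.Random(seed)
    pop = [[rng.randrange(vocab_size) for _ in range(length)]
           for _ in range(population)]
    for _ in range(iterations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: population // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < population:
            a, b = rng.sample(parents, 2)
            child = list(a)
            if rng.random() < crossover_rate:  # single-point crossover
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:  # mutate one random position
                child[rng.randrange(length)] = rng.randrange(vocab_size)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In this sketch the crossover and mutation rates are per-child probabilities, so with the quoted values roughly half of the children are recombined and a quarter are mutated in each iteration.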

B.3 Digital Watermark and Watermark Removal

In Dw, DiffusionShield (DShield) [13] uses a patch shape of $(u,v)=(4,4)$ and sets a quaternary message to 2. For joint optimization, a 5-step PGD [53] is applied with $\ell_\infty \leq \epsilon$, while SGD optimizes the classifier. Diagnosis (Diag) [37] uses a 100% coating rate for unconditional and 20% for trigger-conditioned memorization, with warping strengths of 2.0 and 1.0, respectively. Stable Signature (StabSig) [1] fine-tunes the LDM decoder to generate watermarked images. Tree-Ring (TR) [14] uses a guidance scale of 7.5 for 50 inference steps, with a watermark radius of 10 for DDIM inversion. ZoDiac [38] uses a pre-trained SD model with 50 denoising steps and optimizes the latent variable over 100 iterations, using a watermark radius of 10 and weights of 0.1 for SSIM loss and 0.01 for perceptual loss. Gaussian Shading (GShade) [39] samples 50 steps using DPMSolver [54] with a guidance scale of 7.5 and performs 50 steps of DDIM inversion, using settings of $f_c=1$, $f_{hw}=8$, and $l=1$ with a capacity of 256 bits.
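The "watermark radius" used by Tree-Ring and ZoDiac refers to the size of a circular region in which the watermark pattern is placed. A minimal sketch of such a mask is given below; the 64-by-64 grid size is an assumption (the usual latent resolution for 512-by-512 SD images), and the function name is hypothetical.

```python
def circle_mask(size=64, radius=10):
    """Boolean mask that is True inside a centered circle of the given radius."""
    center = size // 2
    return [
        [(row - center) ** 2 + (col - center) ** 2 <= radius ** 2
         for col in range(size)]
        for row in range(size)
    ]


mask = circle_mask()
covered = sum(sum(row) for row in mask)  # number of grid cells inside the circle
```

With these assumed dimensions the circle covers only a small fraction of the grid, which is one way to see why a radius of 10 leaves most of the latent untouched.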

In Wr, Brightness Adjustment (Bright) [39] applies a factor of 6, and Image Rotation (Rotate) [39] performs a 90-degree rotation. Random Crop (Crop) [39] retains a randomly selected 50% of the image area, while Gaussian Blur (Blur) [40] uses a kernel size of 4. VAE-Cheng20 (VAE) [41] is used with a quality level of 3. Moreover, DiffPure [31] uses the AutoPipelineForImage2Image pipeline (https://huggingface.co/docs/diffusers/api/pipelines/auto_pipeline), with classifier-free guidance set to 7.5 and diffusion timesteps tuned to a strength of 1,000.
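The simpler distortion attacks can be sketched on a grayscale image stored as a list of rows. This is an illustrative approximation rather than the evaluation code: treating the brightness factor as a plain multiplier, rotating counter-clockwise, and cropping a window whose sides are scaled by the square root of the area fraction are all assumptions, and the function names are hypothetical.

```python
import random


def adjust_brightness(image, factor=6.0, max_value=255):
    """Scale every pixel by `factor`, saturating at `max_value`."""
    return [[min(max_value, round(p * factor)) for p in row] for row in image]


def rotate_90(image):
    """Rotate the image counter-clockwise by 90 degrees."""
    return [list(row) for row in zip(*image)][::-1]


def random_crop(image, area_fraction=0.5, seed=0):
    """Keep a random window covering roughly `area_fraction` of the image area."""
    height, width = len(image), len(image[0])
    scale = area_fraction ** 0.5  # scale both sides so the area shrinks by the fraction
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    rng = random.Random(seed)
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]
```

A watermark detector would then be run on the output of each function to check whether the embedded message survives the distortion.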

B.4 Style Mimicry Experimental Details

DreamBooth is a subject-driven generation method that can be used for style/concept transfer. In Op and Np, we use unprotected, protected, and attacked images as references to fine-tune a pre-trained SD model via DreamBooth, using the implementation provided by diffusers (https://github.com/huggingface/diffusers/). Additionally, we use the T2I fine-tuning script provided by diffusers to test the generalization of the protections (cf. Section 5.1). Following [24] for optimal style mimicry, we use 2000 training steps, a batch size of 4, and a learning rate of $5\times 10^{-6}$.

Appendix C Implementation Validity Analysis of Glaze

We use the reproduction of Glaze from the IMPRESS code (https://github.com/AAAAAAsuka/Impress/blob/main/glaze.py) since the latest version of Glaze (v2.1, https://Glaze.cs.uchicago.edu/downloads.html) is not open-sourced. For Glaze v2.1, we set the intensity to high and the render quality to slowest for maximum protection. The comparison of protected images shows that while our implementation offers slightly lower protection, it achieves higher fidelity (cf. Table VIII). Both approaches display similar “style cloaks,” confirming the validity of our implementation (cf. Figure 29).

Figure 29: Comparison of images generated with our simplified implementation of Glaze and with Glaze v2.1.
TABLE VIII: Comparison of fidelity and efficacy on Glaze.
| Method | LPIPS ↓ | FID ↓ | CLIP-I ↓ | CLIP-T ↓ |
|---|---|---|---|---|
| Glaze v2.1 | 0.403 ± 0.053 | 283 | 0.625 ± 0.008 | 0.248 ± 0.002 |
| Our Implementation | 0.133 ± 0.031 | 182 | 0.698 ± 0.010 | 0.292 ± 0.001 |

Appendix D User Study

User Study of Op. Our human evaluation assesses both the visual quality and the style mimicry of protected images under various attacks. Following [7, 24], we measure the correlation between metrics and human judgment regarding artist style mimicry. Annotators on Amazon MTurk (https://www.mturk.com/) are presented with original artworks as style references and asked to evaluate two scenarios: (i) a generated artwork without protection versus one with protection, and (ii) a generated artwork without protection versus one with protection after attack. We use original artist images from WikiArt and the corresponding protected images from different protection methods as reference pictures to fine-tune the DreamBooth model with the prompt “a painting in the style of [artist]”. Participants view 10 original artworks by a specific artist as reference samples, followed by one protected and one unprotected generated image in the same style. We focus on two key aspects: 1) Visual Quality. Participants assess each image based on four questions corresponding to metrics targeting noise level, fidelity (including artifacts), alignment with brightness/contrast/structure, and overall stylistic fit (cf. Table IX). To ensure unbiased assessments, we randomize image order, comparison sequences, and model generation seeds. 2) Style Mimicry. Inspired by Glaze [7], we ask participants to rate the style mimicry of the generated images on a 5-point Likert scale, evaluating how well each image resembles the reference style samples. The options are: (i) Not successful at all, (ii) Not very successful, (iii) Somewhat successful, (iv) Successful, and (v) Very successful.

Figure 30: Visual quality and style mimicry success rates across different Op protections against attacks.
TABLE IX: Question list for Op.
| Dimension | Metric | Human Evaluation Question |
|---|---|---|
| Visual Quality | PSNR | Which image has less noise? |
| Visual Quality | VIFp | Which image has better fidelity and fewer artifacts (distorted, unrealistic)? |
| Visual Quality | SSIM, LPIPS | Based on brightness, contrast, and structure, which better matches the referred image? |
| Visual Quality | FID, CLIP-I/T | Which image better fits the style of the referred image samples and the description “a painting in the style of [artist]”? |
| Style Mimicry | | How successfully does the style of the image mimic the samples? |

Fidelity of Ms. We conduct this user study to examine whether the Ms methods affect the fidelity of images of unrelated concepts from a human perspective. We evaluate image fidelity and text alignment by generating 2000 images per Ms model. Participants assess 25 random image pairs, each comparing an SD reference image with an image from an erased model, and answer two questions: (i) Which image is of higher quality? (ii) Which image better represents the text caption? For each pair, participants can respond with: (i) I prefer image A, (ii) I am indifferent, or (iii) I prefer image B. The study is conducted via Amazon Mechanical Turk, requiring participants to have a HIT Approval Rate above 95% and at least 1000 approved HITs. Each image pair batch is evaluated by three annotators, with each prompt receiving 30 assessments.
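Responses of this three-option form can be tallied into percentage shares as in the sketch below. The option labels and the function name are hypothetical, and how these shares map onto the reported user percentages is not specified here, so the sketch only shows the tallying step.

```python
from collections import Counter

# Hypothetical labels for the three answer options.
OPTIONS = ("prefer_A", "indifferent", "prefer_B")


def preference_shares(responses):
    """Return the percentage of responses falling into each answer option."""
    if not responses:
        raise ValueError("no responses to aggregate")
    counts = Counter(responses)
    unknown = set(counts) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    total = len(responses)
    return {option: 100.0 * counts[option] / total for option in OPTIONS}
```

For example, four responses consisting of two votes for image A, one indifferent, and one vote for image B yield shares of 50%, 25%, and 25%.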

TABLE X: Image fidelity and text alignment of Ms.
| Dimension | Metric | SD | FMN | ESD | AC | UCE | NP | SLD |
|---|---|---|---|---|---|---|---|---|
| Image Fidelity | FID-30k ↓ | 16.21 | 16.47 | 16.51 | 16.95 | 16.64 | 16.89 | 16.95 |
| Image Fidelity | User/% ↑ | - | 62.93 | 63.02 | 63.87 | 63.56 | 63.50 | 63.98 |
| Text Alignment | CLIP-T ↑ | 0.31 | 0.30 | 0.30 | 0.31 | 0.31 | 0.30 | 0.30 |
| Text Alignment | User/% ↑ | - | 59.37 | 59.46 | 61.04 | 60.89 | 59.51 | 59.38 |

Appendix E Extensions to Other Image Editing Tasks

Our experiments with methods like AdvDM, Mist, and Glaze focused on unauthorized style imitation. It is essential to assess whether these protections also prevent unauthorized attribute editing. PGuard [10], originally designed for style imitation, applies imperceptible adversarial perturbations to handle a range of unauthorized edits, indicating potential for broader editing protection. For PGuard, we use Img2ImgPipeline (https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img) for image editing. For the WikiArt dataset, we guide the SD model with style transformation prompts, such as: “Transform Vincent van Gogh’s ‘Starry Night’ into a surrealist painting in the style of Salvador Dalí.” For the CustomConcept101 dataset, we use prompts like “Change the background to a snowy mountain landscape during sunset while keeping the person unchanged.”

In examining PGuard’s resilience against various Np methods, Figures 31 and 32 reveal that while PGuard effectively prevents unauthorized edits, it faces challenges under specific attack scenarios. These attacks can lead to I2I transformations and result in images that merge original features with prompt modifications. This aligns with Section 4.2, highlighting that no Op protection is entirely resistant to Np attacks. Our findings suggest that protection effectiveness is closely linked to the intended application context, underscoring the need for further exploration into broader challenges like attribute editing.

Figure 31: The result of PGuard protection.
Figure 32: The result of PGuard’s protection under attacks.