CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models

Naen Xu2^∗, Changjiang Li4^∗, Tianyu Du2(✉), Minxi Li2, Wenjie Luo2, Jiacheng Liang4, Yuyuan Li5, Xuhong Zhang2, Meng Han2, Jianwei Yin2, Ting Wang4
2Zhejiang University, 4Stony Brook University, 5Hangzhou Dianzi University
∗Naen Xu and Changjiang Li are the co-first authors. ✉Tianyu Du is the corresponding author.
E-mails: {xunaen, zjradty, breathing, zhangxuhong, mhan, zjuyjw}@zju.edu.cn

Abstract

Text-to-image diffusion models have emerged as powerful tools for generating high-quality images from textual descriptions. However, their increasing popularity has raised significant copyright concerns, as these models can be misused to reproduce copyrighted content without authorization. In response, recent studies have proposed various copyright protection methods, including adversarial perturbation, concept erasure, and watermarking techniques. However, their effectiveness and robustness against advanced attacks remain largely unexplored. Moreover, the lack of unified evaluation frameworks has hindered systematic comparison and fair assessment of different approaches.

To bridge this gap, we systematize existing copyright protection methods and attacks, providing a unified taxonomy of their design spaces. We then develop CopyrightMeter, a unified evaluation framework that incorporates 17 state-of-the-art protections and 16 representative attacks. Leveraging CopyrightMeter, we comprehensively evaluate protection methods across multiple dimensions, thereby uncovering how different design choices impact fidelity, efficacy, and resilience under attacks. Our analysis reveals several key findings: (i) most protections (16/17) are not resilient against attacks; (ii) the “best” protection varies depending on the target priority; (iii) more advanced attacks significantly promote the upgrading of protections. These insights provide concrete guidance for developing more robust protection methods, while the unified evaluation protocol of CopyrightMeter establishes a standard benchmark for future copyright protection research in text-to-image generation.

1 Introduction

Recent advances in text-to-image diffusion models (T2I DMs), such as Stable Diffusion (SD) [1], DALL·E 3 [2], and Imagen [3], have revolutionized digital content creation by generating high-quality images from textual descriptions. While these models foster creativity by producing art and realistic scenes, they also raise significant copyright concerns[4]. Fine-tuning pre-trained models on specialized datasets allows them to mimic specific themes such as distinct art styles, which can lead to unauthorized reproductions[5, 6]. Artists are increasingly worried that their unique styles could be copied without permission, resulting in potential copyright infringement[7]. Furthermore, models trained on extensive datasets may produce images that closely resemble the style or content of specific artists, even if the artist or their creations are not directly referenced in the prompt[8]. As these AI-driven technologies evolve, it is crucial to balance innovation with the protection of creators’ rights, as many artists fear that their unique art styles could be easily copied, potentially drawing customers away[9].

The urgent need to safeguard digital intellectual property has led to the development of three main protection categories: (i) Obfuscation Processing, which preprocesses data before release online to prevent unauthorized use, often using adversarial perturbation to confuse AI models while preserving content for normal users [10, 7]. (ii) Model Sanitization, which modifies pre-trained DMs to remove or alter protected copyright elements before public deployment [11, 12]. (iii) Digital Watermarking, which embeds invisible identifiers in AI-generated content to assert copyright ownership and support effective content management [13, 1, 14].

Given the significance of these protection mechanisms, recent studies have raised concerns about their effectiveness and robustness [15, 16, 17, 18]. This has led to several critical questions: RQ₁ – What are the strengths and limitations of different protection mechanisms, especially their robustness against attacks? RQ₂ – What are the best practices for copyright protection even in adversarial and evolving environments? RQ₃ – How can existing copyright protection methods be further improved?

Despite their importance for understanding and improving copyright protection, these questions are under-explored due to the following challenges.

TABLE I: Comparison of conclusions in prior and our work ($\Circle$ – inconsistent; $\LEFTcircle$ – partially inconsistent; $\CIRCLE$ – consistent).

Previous conclusion	Refined conclusion in this paper	Explanation	Consistency
In Obfuscation Processing, Mist shows strong effectiveness against various noise purification methods, including under the SOTA online platform NovelAI I2I scenario.	Mist has limited protective effectiveness against local DiffPure attacks and the latest version of NovelAI — NAI Diffusion Anime.	The original protection may lose resilience as new attacks circumvent current protections, rendering previous methods vulnerable.	$\LEFTcircle$ (Sec 4.2 and 5.5)
In Model Sanitization, FMN [12], ESD [11], UCE [19], and SLD [20] remove a copyright concept while preserving the model’s ability to generate images without it.	All Model Sanitization methods preserve unrelated concepts well after the copyright concepts are removed.	Despite removing explicit copyright concepts, these methods ensure that the model retains its ability to generate irrelevant images, preserving its utility and effectiveness.	$\CIRCLE$ (Sec 4.3)
In Model Sanitization, ESD permanently removes concepts from DMs, rather than modifying outputs at inference, so it cannot be circumvented even if model weights are accessible.	Model Sanitization methods are vulnerable to concept recovery methods such as DreamBooth, Textual Inversion, Concept Inversion, or even model-weights-free approaches like Ring-A-Bell.	The training dataset of a DM, such as LAION, contains images with varying content, and it is almost impossible to remove elements with copyright concepts permanently.	$\Circle$ (Sec 4.3)
In Digital Watermarking, the techniques Diag, StabSig, and GShade demonstrate relative resilience against Watermark Removal attacks.	Regarding attack resilience, Diag exhibits vulnerability to Blur attacks, StabSig is vulnerable to Rotate, Blur, VAE, and DiffPure attacks, and GShade demonstrates vulnerability to Rotate attacks.	The vulnerability of Diag to Blur attacks is attributed to different datasets, as the original paper employs the Pokemon dataset. Besides, StabSig and GShade are vulnerable to specific attacks not covered in the original papers.	$\LEFTcircle$ (Sec 4.4)

Non-holistic evaluations – Existing studies often lack comprehensive evaluation of protections and attacks [21, 22] and focus narrowly on limited perspectives. For example, [16] focuses solely on model sanitization against textual inversion, without providing a holistic evaluation. Moreover, many rely on limited metrics, failing to fully capture the characteristics and impacts of the protections being evaluated.

Non-unified framework – Inconsistent datasets and DM versions across studies lead to evaluations under varying conditions, making comparison challenging. For example, Glaze [7] and Mist [23] are evaluated with different SD versions, complicating direct comparison of their results.

Outdated evaluations – While new attacks quickly lead to updated protections to bolster security, many studies focus solely on older protection methods, missing recent developments. For instance, [24] evaluated only the original Mist system as reported by [23], without considering the updated Mist v2 system described by [25].

To address these issues, we introduce a systematic taxonomy for copyright protection methods and develop CopyrightMeter, a systematic framework for evaluating them across different dimensions, including fidelity, efficacy, and resilience: fidelity evaluates how well protected content retains its original quality; efficacy measures the protection method’s effectiveness in preventing unauthorized use or mimicry; and resilience indicates the method’s ability to withstand attacks. By reviewing literature and evaluating current practices, our study provides insights into challenges and opportunities, guiding policymakers, content creators, and technologists striving to navigate the complex interplay between copyright law and technological advancement. Our contributions are summarized in three major aspects:

Framework – We develop CopyrightMeter, the first unified framework for extensively evaluating copyright protection in T2I DMs. It integrates 17 protection methods, 16 representative attacks, and 10 key metrics for in-depth analysis of these methods. We plan to open source CopyrightMeter to facilitate copyright protection research and encourage the community to contribute more techniques.

Evaluation – Leveraging CopyrightMeter, we explore the landscape of copyright protection in T2I DMs, conducting a systematic study of existing protections and attacks, uncovering key insights that challenge prior conclusions, as summarized in Table I. Our findings reveal that different protections manifest delicate trade-offs among fidelity, efficacy, and resilience. For instance, Mist achieves strong protection against mimicry but slightly compromises fidelity; ESD shows high efficacy but relatively weak resilience; ZoDiac and GShade have high fidelity and efficacy, but are less resilient to attacks. These observations indicate the importance of using comprehensive metrics to evaluate copyright protections, and suggest the optimal practices of applying them under different settings.

Exploration – We further explore improving existing protections, leading to several critical insights including (i) the generalizability of various copyright protection methods differs significantly; (ii) in scenarios prioritizing efficiency, inference-guiding Ms are preferred to model fine-tuning Ms; (iii) the ongoing arms race between protections and attacks promotes the development of more advanced protections. We envision that the CopyrightMeter platform and our findings will facilitate future research on copyright protection and shed light on designing and building T2I DMs in a more trustworthy manner.

2 Background

Figure 1: Overall system design of CopyrightMeter.

2.1 Text-to-image Diffusion Models

Diffusion models are a class of generative models that transform random noise into coherent data through a forward step that gradually adds noise to data and a reverse step that denoises it to recover the original data distribution. Our study focuses on the latent diffusion models (LDMs) for their strong performance and low computational costs.

Text-to-image diffusion models (T2I DMs) generate images from textual descriptions by learning to reverse the noise addition process guided by text. A notable open-source T2I DM is Stable Diffusion (SD). Given a text prompt, it generates an image that reflects the specified semantic features. This process involves two key components:

Conditioning on Textual Descriptions – The reverse diffusion process is guided by textual descriptions, which are embedded into a high-dimensional vector using transformer-based models or other deep learning architectures. This vector informs each step of the reverse diffusion to align the generated image with the text.

Training Objective – T2I DMs are trained to predict and remove noise at each step, guiding image generation to match text prompts. This is achieved by minimizing the difference between actual and predicted noise:

$$\mathcal{L}_{DM}(\theta)=\mathbb{E}_{x_{0},\epsilon,t,y}\left[\|\epsilon-\epsilon_{\theta}(x_{t},y,t)\|^{2}\right] \tag{1}$$

where $\epsilon$ is the noise vector, and $\epsilon_{\theta}(x_{t},y,t)$ is the model’s estimate of the noise, conditioned on the noisy image $x_{t}$ , the textual description $y$ , and the timestep $t$ .
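As a minimal sketch of how Eq. (1) can be estimated on a batch, the NumPy snippet below uses toy noise predictors in place of a real U-Net. The forward noising rule $x_{t}=\sqrt{\bar{\alpha}_{t}}x_{0}+\sqrt{1-\bar{\alpha}_{t}}\epsilon$ is the standard DDPM parameterization and is an assumption here, since the text does not fix a noise schedule; the names `dm_loss`, `zero_model`, and `oracle_model` are hypothetical.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar):
    # Standard DDPM forward step (assumed; the paper does not fix a schedule).
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def dm_loss(eps_model, x0, y, alpha_bars, rng):
    """Monte Carlo estimate of Eq. (1) for one batch of flattened latents."""
    batch = x0.shape[0]
    t = rng.integers(0, len(alpha_bars), size=batch)   # one random timestep per sample
    eps = rng.standard_normal(x0.shape)                # true noise
    x_t = add_noise(x0, eps, alpha_bars[t][:, None])   # noisy latent x_t
    pred = eps_model(x_t, y, t)                        # model's noise estimate
    return float(np.mean(np.sum((eps - pred) ** 2, axis=1)))

rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.99, 0.01, 10)   # toy schedule (hypothetical)
x0 = rng.standard_normal((4, 8))           # four toy "latents" of dimension 8
y = None                                   # placeholder for a text embedding

def zero_model(x_t, y, t):
    # Untrained stand-in: always predicts zero noise.
    return np.zeros_like(x_t)

def oracle_model(x_t, y, t):
    # Cheating stand-in that knows x0 and inverts the forward step exactly.
    a = alpha_bars[t][:, None]
    return (x_t - np.sqrt(a) * x0) / np.sqrt(1.0 - a)

loss_zero = dm_loss(zero_model, x0, y, alpha_bars, rng)
loss_oracle = dm_loss(oracle_model, x0, y, alpha_bars, rng)
print(loss_zero, loss_oracle)
```

The oracle drives the loss to numerical zero, whereas the zero predictor leaves a loss on the order of the latent dimension; training moves a real model between these two extremes.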

Beyond these design aspects, T2I DMs have driven advancements in generative AI across content creation, design, education, and entertainment, bridging the gap between textual descriptions and visual content. T2I DMs can also be fine-tuned with tools like DreamBooth, which enables them to mimic specific visual styles or objects by training on a few reference images, thus allowing the model to produce images closely resembling these reference examples.

2.2 Copyright Protection in Text-to-Image Models

In T2I DMs, copyright protection is a critical concern. The central challenge is ensuring that images generated from the model do not resemble copyrighted images. Techniques like DreamBooth fine-tuning on DM allow models to mimic specific copyrighted content, while DMs may also inadvertently produce similar works. Another significant challenge is ensuring generated images can be traced back to their copyrighted sources. Conversely, the associated attacks aim to exploit these models for unauthorized purposes. The two primary attack methods involve generating an image that matches a specific, potentially copyrighted image or manipulating the generated image to make it untraceable. We will formalize and discuss the prominent categories of protection and corresponding attack methods in Section 3.

3 Taxonomy

In this section, we provide a holistic overview of various copyright protection and attack methods. As depicted in Figure 1, we divide copyright protection into three categories: Obfuscation Processing (Op), Model Sanitization (Ms), and Digital Watermarking (Dw). Correspondingly, we identify three attack categories: Noise Purification (Np), Concept Recovery (Cr), and Watermark Removal (Wr). Table II presents the definitions and detailed methods of these protections and their corresponding attacks. Figure 1 shows the overall system design of CopyrightMeter, while Figure 2 provides specific examples of copyright protection and attack scenarios. We will briefly introduce each category in the subsequent sections. For convenience, we summarize the acronyms and notations in Table III.

TABLE II: Overview of copyright protections and attacks.

Category	Definition	Methods
Obfuscation Processing	Add adversarial perturbations on images to avoid image mimicry.	AdvDM [26], Mist [23], Glaze [7], PGuard [10], AntiDB [27].
Noise Purification	Purify the protected images to nullify adversarial perturbations.	JPEG [28], Quant [29], TVM [30], IMPRESS [15], DiffPure [31].
Model Sanitization	Prevent DM from generating images containing specific concept.	FMN [12], ESD [11], AC [32], UCE [19], NP [33], SLD [20].
Concept Recovery	Retrieve the eliminated concept to recover the content generation.	LoRA [34], DB [5], TI [35], CI [16], RB [36].
Digital Watermarking	Embed DM-based watermark into image generation.	DShield [13], Diag [37], StabSig [1], ZoDiac [38], TR [14], GShade [39].
Watermark Removal	Tamper with images to remove watermark.	Bright [39], Rotate [39], Crop [39], Blur [40], VAE [41], DiffPure [31].

TABLE III: Acronyms and notations.

Group	Notation	Definition
General	$x$	original copyrighted image to be protected
	$\theta$	Diffusion Model (DM)’s weights
	$D_{p}(\cdot,\cdot)$	pixel distance between images
	$D_{z}(\cdot,\cdot)$	latent space distance between images
	$\epsilon$	upper bound of pixel distance between two images
Op & Np	$\delta$	perturbation introduced during obfuscation processing (Op)
	$x_{\text{t}}$	the chosen dissimilar target image that differs from $x$ in Op
	$x_{\text{pro}}$	protected image with perturbation applied in Op
	$\tau$	transformation applied in noise purification (Np) to remove $\delta$
	$x_{\text{pur}}$	purified image after $\tau$ in Np
Ms & Cr	$C$	set of all possible concepts
	$c_{\text{cr}}$	copyright concept for model sanitization (Ms)
	$c_{\text{$\varnothing$}}$	a specific unrelated concept that excludes $c_{\text{cr}}$
	$c_{\text{ref}}$	concept in reference image similar to copyrighted image $x$
	$p(x|c)$	image generation distribution by DM given concept $c$
	$D_{KL}(\cdot\parallel\cdot)$	the divergence between the two image output distributions
Dw & Wr	$m$	original watermark message embedded into $x$
	$w$	watermark embedding function
	$e$	watermark extraction function
	$x_{\text{wm}}$	watermarked image with message $m$ via $w$
	$m_{\text{wm}}$	extracted message from $x_{\text{wm}}$
	$x_{\text{wr}}$	image after watermark removal
	$D_{t}(\cdot,\cdot)$	text distance between two watermarked messages

3.1 Protection Schemes

This subsection contains survey-style descriptions of the investigated copyright protection schemes. Table IV shows the copyright protection methods and detailed characteristics.

TABLE IV: Summary of copyright protection methods in CopyrightMeter.

Protection	Category	Auxiliary Guidance	Distortion: Sem.	Distortion: Graph.	Scenario: I2I	Scenario: T2I	Main Application	Implementer
AdvDM [26]	Obfuscation Processing	Diffusion Model	$\checkmark$	$\times$	$\checkmark$	$\checkmark$	Unauthorized style mimicry	Data Owner
Mist [23]	Obfuscation Processing	Image Encoder & Diffusion	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	Unauthorized style mimicry	Data Owner
Glaze [7]	Obfuscation Processing	Image Encoder	$\checkmark$	$\times$	$\checkmark$	$\checkmark$	Unauthorized style mimicry	Data Owner
PGuard [10]	Obfuscation Processing	Image Encoder	$\times$	$\checkmark$	$\checkmark$	$\times$	Unauthorized editing	Data Owner
AntiDB [27]	Obfuscation Processing	Diffusion Model	$\checkmark$	$\checkmark$	$\times$	$\checkmark$	Unauthorized image mimicry	Data Owner
FMN [12]	Model Sanitization	Model Fine-tuning	$\checkmark$	$\times$	$\times$	$\checkmark$	Identity, object, or style	Model Provider
ESD [11]	Model Sanitization	Model Fine-tuning	$\checkmark$	$\times$	$\times$	$\checkmark$	Style, explicit content, or object	Model Provider
AC [32]	Model Sanitization	Textual Inversion	$\checkmark$	$\times$	$\times$	$\checkmark$	Instance, style, or memorized images	Model Provider
UCE [19]	Model Sanitization	Textual Inversion	$\checkmark$	$\times$	$\times$	$\checkmark$	Artist or objects	Model Provider
NP [33]	Model Sanitization	Inference Guiding	$\checkmark$	$\times$	$\times$	$\checkmark$	Specific concepts or features	Model Provider
SLD [20]	Model Sanitization	Inference Guiding	$\checkmark$	$\times$	$\times$	$\checkmark$	Inappropriate concept	Model Provider
DShield [13]	Digital Watermarking	Model Fine-tuning	$\times$	$\checkmark$	$\checkmark$	$\times$	Multi-bit watermark for existing images	Works Publisher
Diag [37]	Digital Watermarking	Model Fine-tuning	$\times$	$\checkmark$	$\checkmark$	$\times$	Zero-bit watermark for existing images	Works Publisher
StabSig [1]	Digital Watermarking	Model Fine-tuning	$\times$	$\checkmark$	$\times$	$\checkmark$	Multi-bit watermark for generating images	Works Publisher
TR [14]	Digital Watermarking	Latent Space Modifying	$\times$	$\checkmark$	$\times$	$\checkmark$	Zero-bit watermark for generating images	Works Publisher
ZoDiac [38]	Digital Watermarking	Latent Space Modifying	$\times$	$\checkmark$	$\checkmark$	$\times$	Zero-bit watermark for existing images	Works Publisher
GShade [39]	Digital Watermarking	Latent Space Modifying	$\times$	$\checkmark$	$\times$	$\checkmark$	Multi-bit watermark for generating images	Works Publisher

Note: Auxiliary Guidance – model components integrated for perturbation optimization in Op, or methods used in Ms and Dw. Sem – the semantic-distortion-based method, Graph – the graphical-distortion-based method, I2I – image-to-image generation; T2I – text-to-image generation.

3.1.1 Obfuscation Processing (Op)

This approach introduces protective perturbations into copyrighted images to prevent replication from T2I DMs. When these protected images are used as training or reference data (e.g., in image-to-image transformation), they mislead DMs that aim to replicate the originals, thereby protecting data owners from unauthorized replication and misuse of their data.

Formalization – Given a copyrighted image $x$ , the aim is to create a protected image $x_{\text{pro}}$ by adding a carefully crafted perturbation $\delta$ , such that $x_{\text{pro}}=x+\delta$ . This perturbation $\delta$ is designed to either maximize the latent space distance between $x_{\text{pro}}$ and $x$ (untargeted protection) or minimize the latent space distance between $x_{\text{pro}}$ and a deliberately chosen dissimilar target image $x_{t}$ (targeted protection). Additionally, to ensure the perturbation remains inconspicuous, the pixel distance between $x$ and $x_{\text{pro}}$ should be constrained by an upper bound $\epsilon$ , maintaining the visual fidelity of the protected image. This can be formalized as:

$$\max_{\delta}D_{z}(x,x_{\text{pro}})\;\text{or}\;\min_{\delta}D_{z}(x_{\text{pro}},x_{t}),\quad\text{s.t. }D_{p}(x,x_{\text{pro}})\leq\epsilon. \tag{2}$$

Approaches – Since all methods maintain visual similarity by ensuring that the perturbation $\delta$ keeps the pixel space distance between $x$ and $x_{\text{pro}}$ small, we omit this commonality and focus solely on the unique protection concepts of each method. AdvDM [26] optimizes $\delta$ to maximize the diffusion training loss, increasing the distance between the latent noise vectors of $x_{\text{pro}}$ and $x$ . Based on AdvDM, Mist [23] optimizes $\delta$ to maximize the distance in both the latent noise vector and the latent encoded representation. Glaze [7] optimizes $\delta$ so that $x_{\text{pro}}$ approaches a target $x_{t}$ with a specific style, aiming to minimize $D_{z}(x_{\text{pro}},x_{t})$ . PhotoGuard (PGuard) [10] offers two schemes, using either the encoder or the entire diffusion process to optimize $\delta$ to minimize $D_{z}(x_{\text{pro}},x_{t})$ in the latent space of the encoder and the LDM, respectively. Anti-DreamBooth (AntiDB) [27] optimizes $\delta$ to degrade the DM’s generation ability by making $x$ difficult to reconstruct from $x_{\text{pro}}$ .
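To make the targeted variant of Eq. (2) concrete, the following is a minimal sketch under strong simplifying assumptions: the image encoder is replaced by a toy linear map `W`, $D_{z}$ is the squared Euclidean distance in that toy latent space, and $D_{p}$ is the $\ell_{\infty}$ distance. It uses plain projected gradient descent with step size $1/L$; the methods surveyed above typically use signed-gradient steps on real encoders or DMs, so this is an illustration of the constrained objective rather than any specific method. All names are hypothetical.

```python
import numpy as np

def latent_dist(W, a, b):
    """Toy D_z: squared Euclidean distance between latents W @ a and W @ b."""
    return float(np.sum((W @ a - W @ b) ** 2))

def targeted_obfuscation(x, x_t, W, eps=0.05, iters=100):
    """Minimize D_z(x + delta, x_t) subject to ||delta||_inf <= eps."""
    lipschitz = 2.0 * np.linalg.norm(W, 2) ** 2   # gradient Lipschitz constant
    delta = np.zeros_like(x)
    for _ in range(iters):
        grad = 2.0 * W.T @ (W @ (x + delta - x_t))  # gradient of D_z w.r.t. delta
        # Gradient step, then projection onto the L_inf ball of radius eps.
        delta = np.clip(delta - grad / lipschitz, -eps, eps)
    return x + delta

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))      # toy linear "encoder" (assumption)
x = rng.uniform(0.0, 1.0, 16)         # original image, flattened
x_t = rng.uniform(0.0, 1.0, 16)       # dissimilar target image
x_pro = targeted_obfuscation(x, x_t, W)

d_before = latent_dist(W, x, x_t)
d_after = latent_dist(W, x_pro, x_t)
print(d_before, d_after, np.max(np.abs(x_pro - x)))
```

The protected image moves toward the target in latent space while every pixel stays within the budget $\epsilon$ of the original, which mirrors the constraint in Eq. (2).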

3.1.2 Model Sanitization (Ms)

This approach is designed for model providers by guiding pre-trained DMs to remove copyright concepts before public deployment, ensuring that the models do not reproduce copyrighted content illegally.

Formalization – Given a concept protected by copyright $c_{\text{cr}}\in C$ (where $C$ is the set of all concepts) and a specific unrelated concept $c_{\text{$\varnothing$}}\in C\setminus\{c_{\text{cr}}\}$ , Ms shifts the model’s generation distribution conditioned on $c_{\text{cr}}$ , denoted as $p_{\phi}(x|c_{\text{cr}})$ , toward the distribution conditioned on the unrelated concept $c_{\text{$\varnothing$}}$ , denoted as $p(x|c_{\text{$\varnothing$}})$ . Alignment is measured by the KL divergence $D_{KL}$ between these distributions: by minimizing it through the transformation $\phi$ , the model’s output distribution is adjusted to reduce its ability to generate images corresponding to $c_{\text{cr}}$ . The objective can be formalized as:

$$\arg\min_{\phi}D_{KL}(p(x|c_{\text{$\varnothing$}})\parallel p_{\phi}(x|c_{\text{cr}})). \tag{3}$$
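As a small worked illustration of the objective in Eq. (3), the snippet below evaluates the KL divergence on made-up categorical distributions over three coarse image types. The probabilities are purely illustrative assumptions, not measurements, and real DMs define distributions over images rather than three categories.

```python
import numpy as np

def kl_divergence(p, q, tiny=1e-12):
    """D_KL(p || q) for two discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + tiny) - np.log(q + tiny))))

# Image types: [copyrighted style, generic painting, other] (hypothetical).
p_unrelated = [0.02, 0.68, 0.30]  # p(x | c_unrelated), the alignment target
p_before = [0.90, 0.07, 0.03]     # p_phi(x | c_cr) before sanitization
p_after = [0.05, 0.65, 0.30]      # p_phi(x | c_cr) after sanitization

kl_before = kl_divergence(p_unrelated, p_before)
kl_after = kl_divergence(p_unrelated, p_after)
print(kl_before, kl_after)
```

A sanitized model that responds to $c_{\text{cr}}$ almost as it would to the unrelated concept yields a much smaller divergence, which is the quantity that Ms methods drive down.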

Approaches – Based on the difference in distribution alignment, the approaches can be categorized into two types: fine-tuning and inference guiding methods.

Fine-tuning methods adjust $p_{\phi}(x|c_{\text{cr}})$ by modifying the DM’s U-Net weights, targeting different components depending on the method [42]. For instance, Forget-Me-Not (FMN) [12] fine-tunes the weights of the U-Net cross-attention layers to minimize the Frobenius norm of attention maps between the input feature and the embedding of $c_{\text{cr}}$ , aligning $p_{\phi}(x|c_{\text{cr}})$ more closely with $p(x|c_{\text{$\varnothing$}})$ . Erased Stable Diffusion (ESD) [11] fine-tunes both cross-attention and unconditional layers to diminish the influence of $c_{\text{cr}}$ in denoising prediction. Ablating Concepts (AC) [32] further fine-tunes U-Net weights, including projection matrices in cross-attention layers, and the text transformer embedding to minimize KL divergence for a tighter alignment. Unified Concept Editing (UCE) [19] strategically modifies the U-Net’s cross-attention keys and values associated with text embeddings of $c_{\text{cr}}$ to align $p_{\phi}(x|c_{\text{cr}})$ with $p(x|c_{\text{$\varnothing$}})$ while preserving unrelated concepts $c_{\text{$\varnothing$}}$ .

Inference guiding methods adjust the sampling process without altering model weights. In SD, each sampling step involves conditional and unconditional denoising. The final noise prediction is obtained by starting from the unconditional prediction and moving along the difference between the conditional and unconditional predictions, scaled by a guidance weight. Negative Prompt (NP) [33] replaces the unconditional noise prediction with noise conditioned on $c_{\text{cr}}$ , guiding diffusion away from $c_{\text{cr}}$ . Safe Latent Diffusion (SLD) [20] adds a safety guidance term, further shifting the distribution away from $p_{\phi}(x|c_{\text{cr}})$ .
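The guidance arithmetic behind NP can be sketched in a few lines. The snippet below uses the standard classifier-free guidance combination and two-dimensional toy vectors in place of real noise predictions; the vectors, the guidance scale of 7.5, and the function name are illustrative assumptions. SLD's additional safety term is not reproduced here.

```python
import numpy as np

def guided_noise(eps_cond, eps_base, scale):
    """Classifier-free guidance: start at eps_base and extrapolate toward eps_cond."""
    return eps_base + scale * (eps_cond - eps_base)

# Toy 2-D noise predictions (hypothetical): axis 0 ~ user prompt, axis 1 ~ c_cr.
eps_uncond = np.array([0.0, 0.0])   # unconditional prediction
eps_prompt = np.array([1.0, 0.0])   # prediction conditioned on the user prompt
eps_cr = np.array([0.0, 1.0])       # prediction conditioned on the concept c_cr

standard = guided_noise(eps_prompt, eps_uncond, scale=7.5)
# Negative Prompt: the c_cr-conditioned prediction replaces the unconditional one.
negative = guided_noise(eps_prompt, eps_cr, scale=7.5)
print(standard, negative)
```

With the negative prompt, the combined prediction acquires a negative component along the $c_{\text{cr}}$ direction, so each sampling step is pushed away from the concept while still following the user prompt.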

3.1.3 Digital Watermarking (Dw)

This approach embeds invisible messages in images to trace image origins and verify copyright. Unlike traditional post-hoc watermarks [43, 44], which are applied after image generation and do not involve DMs, we discuss watermarks embedded during the generation process of DMs. This can be achieved by embedding watermarks directly in the training data and fine-tuning the DM, or by modifying latent vectors to impact the generation of images.

TABLE V: Summary of copyright attack methods in CopyrightMeter.

Attack	Category	Approach Type	Methodology	Accessibility: Text	Accessibility: Image	Accessibility: Model	Scenario: I2I	Scenario: T2I	Target	Capability of Adversary
JPEG [28]	Noise Purification	Empirical	Data Compression	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Lossy Compression	Raw Data Availability
Quant [29]	Noise Purification	Empirical	Data Compression	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Lossy Compression	Raw Data Availability
TVM [30]	Noise Purification	Optimization	Denoising and Smoothing	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Perturbation Purification	Raw Data Availability
IMPRESS [15]	Noise Purification	Optimization	Denoising and Smoothing	$\times$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	Perturbation Purification	Raw Data Availability
DiffPure [31]	Noise Purification	Optimization	Image Regeneration	$\times$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	Perturbation Purification	Raw Data Availability
LoRA [34]	Concept Recovery	Optimization	Model Fine-Tuning	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	Personalizing Generation	Model Weights Availability
DB [5]	Concept Recovery	Optimization	Model Fine-Tuning	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	Personalizing Generation	Model Weights Availability
TI [35]	Concept Recovery	Optimization	Model Fine-Tuning	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	Personalizing Generation	Model Weights Availability
CI [16]	Concept Recovery	Optimization	Model Fine-Tuning	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	Sanitized Concepts Retrieval	Model Weights Availability
RB [36]	Concept Recovery	Optimization	Prompt Engineering	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$	Sanitized Concepts Retrieval	Model Weights Availability
Bright [39]	Watermark Removal	Empirical	Image Distortion	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Watermark Obscuration	Final Image Availability
Rotate [39]	Watermark Removal	Empirical	Image Distortion	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Watermark Obscuration	Final Image Availability
Crop [39]	Watermark Removal	Empirical	Image Distortion	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Watermark Obscuration	Final Image Availability
Blur [40]	Watermark Removal	Empirical	Image Distortion	$\times$	$\checkmark$	$\times$	$\checkmark$	$\times$	Watermark Obscuration	Final Image Availability
VAE [41]	Watermark Removal	Optimization	Image Regeneration	$\times$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	Image Compression	Final Image Availability
DiffPure [31]	Watermark Removal	Optimization	Image Regeneration	$\times$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	Perturbation Purification	Final Image Availability

Formalization – Embedding a watermark message $m$ into an image $x$ with a function $w$ results in a watermarked image $x_{\text{wm}}=w(x,m)$ . An extraction function $e$ decodes the message $m_{\text{wm}}=e(x_{\text{wm}})$ . The watermarked image $x_{\text{wm}}$ should remain visually similar to $x$ , and $m_{\text{wm}}$ should accurately reflect $m$ . The goal is to find $w$ that minimizes either the pixel distance $D_{p}$ or the latent space distance $D_{z}$ between $x$ and $x_{\text{wm}}$ , while optionally minimizing the text discrepancy $D_{t}$ between $m$ and $m_{\text{wm}}$ , depending on the specific method. This can be formalized as:

$$\min_{w}\left[\alpha D_{p}(x,x_{\text{wm}})+\beta D_{z}(x,x_{\text{wm}})+\lambda D_{t}(m,m_{\text{wm}})\right], \tag{4}$$

where $\alpha$ , $\beta$ , and $\lambda$ are weights that balance image quality, latent space similarity, and message accuracy, respectively. Depending on the approach, either $D_{p}$ or $D_{z}$ (or both) may be used, and $D_{t}$ is included if relevant.

Approaches – The following methods outline different watermark embedding processes, denoted by $w$ . DiffusionShield (DShield) [13] encodes the watermark message $m$ as a binary sequence, embedding each bit into distinct regions of the image $x$ , with a decoder optimized to minimize the discrepancy between $m_{\text{wm}}$ and $m$ , while controlling the $\ell_{\infty}$ -norm to reduce the pixel distance between $x$ and $x_{\text{wm}}$ . Diagnosis (Diag) [37] applies a text trigger to a dataset subset, fine-tuning the model to generate $x_{\text{wm}}$ , and trains a binary classifier for watermark detection. Stable Signature (StabSig) [1] fine-tunes the decoder of the image generator with a binary signature, producing $x_{\text{wm}}$ while minimizing perceptual distortion $D_{p}(x,x_{\text{wm}})$ and message discrepancy $D_{t}(m,m_{\text{wm}})$ . Tree-Ring (TR) [14] embeds $m$ in the Fourier space of the initial noise latent vector, detectable through DDIM inversion [45], while minimizing the $L_{1}$ distance between $m$ and the $m_{\text{wm}}$ obtained from the Fourier transform of the inverted noise vector. ZoDiac [38] is designed for watermarking existing images by embedding $m$ into the latent vector through DDIM inversion, incorporating Euclidean distance, SSIM loss, and Watson-VGG perceptual loss to minimize the pixel distance between $x_{\text{wm}}$ and $x$ . Gaussian Shading (GShade) [39] maps $m$ to latent representations following a standard Gaussian distribution, aiming to preserve the distribution between $x$ and $x_{\text{wm}}$ for fidelity.
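To illustrate the latent-space family, the following is a toy sketch in the spirit of Tree-Ring: a key is written into the low-frequency Fourier coefficients of the initial noise and later detected via the $L_{1}$ distance in Fourier space. It makes several simplifying assumptions that differ from the actual method: the key is real-valued and constant on each ring, the latent is a single 2-D array, and generation plus DDIM inversion are skipped entirely (the detector sees the initial latent directly). All function names are hypothetical.

```python
import numpy as np

def make_ring_key(size, radius, rng):
    """Real-valued key, constant on each integer-radius ring around the spectrum center."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    d = np.rint(np.sqrt((yy - c) ** 2 + (xx - c) ** 2)).astype(int)
    ring_values = rng.standard_normal(d.max() + 1)
    return ring_values[d], d <= radius   # key pattern and low-frequency mask

def embed(noise, key, mask):
    """Overwrite masked Fourier coefficients of the initial noise with the key."""
    spec = np.fft.fftshift(np.fft.fft2(noise))
    spec[mask] = key[mask]
    # The key is real and radially symmetric, so the inverse transform is real.
    return np.fft.ifft2(np.fft.ifftshift(spec)).real

def detect_distance(latent, key, mask):
    """Mean L1 distance between the latent's masked spectrum and the key."""
    spec = np.fft.fftshift(np.fft.fft2(latent))
    return float(np.mean(np.abs(spec[mask] - key[mask])))

rng = np.random.default_rng(0)
key, mask = make_ring_key(size=32, radius=8, rng=rng)
clean = rng.standard_normal((32, 32))   # ordinary initial noise
marked = embed(clean, key, mask)        # watermarked initial noise

d_wm = detect_distance(marked, key, mask)
d_clean = detect_distance(clean, key, mask)
print(d_wm, d_clean)
```

A detector would threshold this distance: it is near zero for the watermarked latent and large for ordinary noise. In practice the gap narrows because inversion is imperfect and attacks distort the image, which is what the resilience evaluation measures.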

3.2 Attack Schemes

This subsection outlines the copyright attack schemes evaluated. Table V summarizes attack methods and their detailed characteristics.

3.2.1 Noise Purification (Np)

This process employs a specific transformation as an attack to remove the protective perturbations added to images in Op, thereby evaluating the effectiveness of Op under attack and assessing its resilience.

Formalization – Given a protected image $x_{\text{pro}}=x+\delta$ , the adversary aims to apply a transformation $\tau$ to remove the perturbation $\delta$ , yielding a purified image $x_{\text{pur}}$ . These methods can be classified into two categories: (i) Experience-based methods, which use common transformations (e.g., JPEG compression) as $\tau$ to remove the perturbation $\delta$ while having little impact on the pixel difference between $x_{\text{pro}}$ and $x_{\text{pur}}$ . (ii) Optimization-based methods, which eliminate the potential protection $\delta$ more accurately by customizing transformations to align the latent and pixel spaces of $x_{\text{pur}}$ . Specifically, they minimize the pixel distance between $x_{\text{pur}}$ and the reconstructed image $f_{\theta}(x_{\text{pur}})$ generated from its latent representation. In addition, for purification fidelity, it is crucial that $x_{\text{pur}}$ remains visually similar to the original image $x$ . However, $x$ is typically unavailable during attacks, so $x_{\text{pro}}$ is used to approximate $x$ , which is reasonable because the perturbation $\delta$ is minor. Visual similarity is then achieved by constraining the pixel distance between $x_{\text{pur}}$ and $x_{\text{pro}}$ . The overall process can be formalized as:

$$\min_{\tau}D_{p}(x_{\text{pur}},f_{\theta}(x_{\text{pur}})),\quad\text{s.t. }D_{p}(x_{\text{pro}},x_{\text{pur}})\leq\epsilon. \tag{5}$$

Approaches – Experience-based methods use the following transformations as $\tau$ : JPEG [28] is a lossy compression algorithm that uses the discrete cosine transform to remove high-frequency components from $x_{\text{pro}}$ ; Quantization (Quant) [29] maps ranges of nearby pixel values to single discrete values.

Optimization-based methods include: Total Variation Minimization (TVM) [30], which reduces $\delta$ by minimizing unnecessary pixel intensity variations (i.e., gradient amplitude); IMPRESS [15], which purifies $x_{\text{pro}}$ by minimizing the inconsistency between $x_{\text{pur}}$ and $f_{\theta}(x_{\text{pur}})$ while limiting the LPIPS between $x_{\text{pro}}$ and $x_{\text{pur}}$ for visual similarity; and DiffPure [31], which adds noise to $x_{\text{pro}}$ and then denoises it to remove $\delta$ , limiting the upper bound of the pixel distance between $x_{\text{pro}}$ and $x_{\text{pur}}$ .
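The structure of the constrained objective in Eq. (5) can be sketched with a toy projected gradient descent. Here the reconstruction $f_{\theta}$ is replaced by a hypothetical linear projection onto a low-dimensional subspace and $D_{p}$ by simple norms; none of this reproduces TVM, IMPRESS, or DiffPure, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def reconstruction_error(x: np.ndarray, P: np.ndarray) -> float:
    """Squared distance between x and its toy reconstruction f(x) = P @ x."""
    return float(np.sum((x - P @ x) ** 2))

def purify(x_pro: np.ndarray, P: np.ndarray, eps: float = 0.05,
           lr: float = 0.25, steps: int = 50) -> np.ndarray:
    """Minimize ||x_pur - P x_pur||^2 subject to ||x_pur - x_pro||_inf <= eps."""
    x_pur = x_pro.copy()
    for _ in range(steps):
        # Gradient of ||(I - P) x||^2 when P is an orthogonal projection.
        grad = 2.0 * (x_pur - P @ x_pur)
        x_pur = x_pur - lr * grad
        # Project back onto the eps-box around x_pro (the constraint in Eq. (5)).
        x_pur = np.clip(x_pur, x_pro - eps, x_pro + eps)
    return x_pur

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(16, 4)))    # orthonormal basis of a toy "latent" subspace
P = U @ U.T                                       # toy stand-in for f_theta (assumption)
x = U @ rng.normal(size=4)                        # clean signal lies in the subspace
x_pro = x + rng.uniform(-0.03, 0.03, size=16)     # protected signal with a small perturbation

eps = 0.05
x_pur = purify(x_pro, P, eps=eps)
err_before = reconstruction_error(x_pro, P)
err_after = reconstruction_error(x_pur, P)
print(err_before, err_after)
```

The clipping step is what keeps the purified signal close to $x_{\text{pro}}$; without it the optimizer would be free to discard visual content along with the perturbation.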

3.2.2 Concept Recovery (Cr)

This process targets vulnerabilities to recover sanitized concepts, enabling sanitized models to generate images with copyrighted concepts, thus posing a risk of illegal replication. This evaluation assesses the resilience of sanitized models to such recovery attempts.

Formalization – For a sanitized model whose output distribution $p_{\theta}(x|c_{\text{cr}})$ is aligned with an unrelated concept distribution $p(x|c_{\varnothing})$ , Cr aims to realign $p_{\theta}(x|c_{\text{cr}})$ with a reference distribution $p(x|c_{\text{ref}})$ . This reference distribution corresponds to images containing $c_{\text{ref}}$ , which are similar to the copyrighted content. The goal is to minimize the divergence between $p(x|c_{\text{ref}})$ and $p_{\theta}(x|c_{\text{cr}})$ , enabling the sanitized model to regenerate images containing $c_{\text{cr}}$ . This can be formulated as:

\arg\min_{\theta}D_{KL}(p(x|c_{\text{ref}})\parallel p_{\theta}(x|c_{\text{cr}})).

(6)

Approaches – These methods learn embeddings from reference images with $c_{\text{ref}}$ and adjust the sanitized model to realign its output distribution. LoRA [34] modifies $\theta$ using a low-rank decomposition of weight updates, efficiently fine-tuning the model to align $p_{\theta}(x|c_{\text{cr}})$ with $p(x|c_{\text{ref}})$ . Similarly, DreamBooth (DB) [5] fine-tunes models on a set of images with $c_{\text{ref}}$ , embedding $c_{\text{ref}}$ into the model’s output domain to produce images with the distribution $p(x|c_{\text{ref}})$ . Textual Inversion (TI) [35] optimizes an embedding for $c_{\text{ref}}$ by modifying the loss function to incorporate $c_{\text{ref}}$ during noise prediction, minimizing the discrepancy between noise predictions for generated and reference images. Concept Inversion (CI) [16] learns specialized embeddings that can recover $c_{\text{cr}}$ for each Ms approach to further improve alignment with $c_{\text{cr}}$ . Ring-A-Bell (RB) [36] is a model-agnostic method that extracts holistic representations of $c_{\text{cr}}$ to identify prompts that might trigger unauthorized generation of copyrighted content.
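To make the objective in Eq. (6) concrete, the following sketch computes the KL divergence between discrete toy distributions. The three-bin distributions are invented purely for illustration; real Cr attacks operate on image distributions induced by a diffusion model rather than on small probability vectors.

```python
import numpy as np

def kl_divergence(p, q, tiny: float = 1e-12) -> float:
    """D_KL(p || q) for discrete distributions given as non-negative vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # `tiny` guards against log(0); it is a numerical convenience, not part of Eq. (6).
    return float(np.sum(p * np.log((p + tiny) / (q + tiny))))

# Hypothetical distributions over three coarse "image types".
p_ref = [0.7, 0.2, 0.1]           # reference distribution p(x | c_ref)
p_sanitized = [0.1, 0.2, 0.7]     # sanitized model: mass moved to unrelated content
p_recovered = [0.6, 0.25, 0.15]   # model after a hypothetical recovery attack

kl_sanitized = kl_divergence(p_ref, p_sanitized)
kl_recovered = kl_divergence(p_ref, p_recovered)
print(kl_sanitized, kl_recovered)
```

A successful recovery attack corresponds to the second divergence being much smaller than the first, i.e., the model's outputs for $c_{\text{cr}}$ move back toward the reference distribution.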

3.2.3 Watermark Removal (Wr)

To assess the resilience of Dw, this attack attempts to remove the embedded watermarks from watermarked images.

Formalization – Given a watermarked image $x_{\text{wm}}$ , an adversary applies a typical image transformation attack $a$ to generate a watermark-removed image $x_{\text{wr}}=a(x_{\text{wm}})$ . The goal is to make the watermark undetectable while keeping $x_{\text{wr}}$ visually similar to $x_{\text{wm}}$ . Following [14, 46], the pixel-level distortion between $x_{\text{wr}}$ and $x_{\text{wm}}$ is constrained to stay below a threshold $\epsilon$ , ensuring visual similarity. Formally, this constraint is expressed as:

D_{p}(x_{\text{wm}},x_{\text{wr}})\leq\epsilon.

(7)

Approaches – Brightness Adjustment (Bright) [39] adjusts the brightness of $x_{\text{wm}}$ to produce $x_{\text{wr}}$ . Image Rotation (Rotate) [39] rotates $x_{\text{wm}}$ to disrupt synchronization between the watermark embedder and detector. Random Crop (Crop) [39] removes portions of $x_{\text{wm}}$ . Gaussian Blur (Blur) [39] convolves $x_{\text{wm}}$ with a Gaussian kernel to smooth the image and reduce watermark visibility. VAE-Cheng20 (VAE) [41] compresses $x_{\text{wm}}$ using discretized Gaussian mixture likelihoods and attention modules to obscure the watermark. DiffPure [31] adds noise to $x_{\text{wm}}$ , followed by DM-based denoising to remove the watermark.
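Two of the simpler transformations above, together with the distortion constraint of Eq. (7), can be sketched in plain NumPy. The parameter values, the use of RMSE as $D_{p}$, and the random stand-in image are assumptions for illustration and do not reflect the settings used in the evaluation.

```python
import numpy as np

def adjust_brightness(x_wm: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities and clip back to the valid range [0, 1]."""
    return np.clip(x_wm * factor, 0.0, 1.0)

def gaussian_blur(x_wm: np.ndarray, sigma: float = 1.0, radius: int = 2) -> np.ndarray:
    """Separable Gaussian blur on a 2-D grayscale image with edge padding."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(x_wm, radius, mode="edge")
    # Blur along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square pixel distortion, a stand-in for D_p (assumption)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def within_budget(x_wm: np.ndarray, x_wr: np.ndarray, eps: float) -> bool:
    """Check the constraint D_p(x_wm, x_wr) <= eps from Eq. (7)."""
    return rmse(x_wm, x_wr) <= eps

rng = np.random.default_rng(0)
x_wm = rng.random((16, 16))                    # stand-in for a watermarked image
for name, x_wr in [("Bright", adjust_brightness(x_wm, 1.2)),
                   ("Blur", gaussian_blur(x_wm, sigma=1.0))]:
    print(name, round(rmse(x_wm, x_wr), 4), within_budget(x_wm, x_wr, eps=0.1))
```

An attack only counts as successful under this formalization if it both defeats the detector and stays inside the distortion budget, so the budget check belongs alongside every transformation.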

3.3 Threat Model

We systematically categorize the security threats to copyright protection methods based on the adversary’s objective, knowledge, and capability.

Adversary’s objective. In the context of text-to-image (T2I) diffusion models, adversaries aim to generate images of a specific style or concept. They exploit system flaws and challenge security measures to enable illegal copying and editing of images. Their objectives include emulating a specific artist’s style despite existing obfuscation protections, regenerating sanitized concepts from purposefully sanitized models, and evading watermark detection, all while maintaining a level of quality akin to the original copyrighted images.

Adversary’s knowledge. Because protection methods differ, we tailor the adversary’s background knowledge to each protection category. For obfuscation protections and digital watermarking, the adversary can access the protected or watermarked artistic images. For model sanitization, the adversary can access the sanitized model and a small set of reference images embodying the target concepts.

Adversary’s capability. Similarly, we tailor the adversary’s capability to each protection category. For obfuscation processing and digital watermarking, the adversary can modify the protected or watermarked images. For model sanitization, the adversary can draw upon their knowledge of sanitization methods to retrain sanitized models using example images, thereby recovering the sanitized concepts.

4 Experiments

Leveraging CopyrightMeter, we conduct a systematic evaluation of existing copyright protection and attack methods, uncovering their intricate design landscape.

4.1 Experimental Setup

Datasets. We evaluate on three datasets: WikiArt [47], CustomConcept101 [6] (referred to as Concept), and Person [16]. WikiArt contains over 42,000 artworks from 129 artists, categorized by genre (e.g., Impressionism). Concept consists of images of 101 specific concepts, each with 3 to 15 images. Person consists of photos of 10 distinct celebrities, with 15 images for each individual derived from the LAION dataset [48]. For Op, following [15, 38], we use WikiArt and Concept. For Ms, following [12, 16], we use WikiArt and Person. For Dw, we use all three datasets.

Models. We evaluate the widely used, open-source DM implementation Stable Diffusion (SD). Previous studies used different versions of SD, making it difficult to isolate the effects of copyright protection methods from model variations. For a unified evaluation, we select a representative version, SD [49] v1.5 (https://huggingface.co/runwayml/stable-diffusion-v1-5), the most widely downloaded version on the Hugging Face platform, with a resolution of 512 $\times$ 512, as the T2I DM in image generation experiments.

Metrics. Following [15, 26, 10], CopyrightMeter incorporates several key metrics: Peak Signal-to-Noise Ratio (PSNR) quantifies the ratio between maximum possible signal power and noise; Structural Similarity Index Measure (SSIM) evaluates structural similarity, brightness, and contrast between two images; Visual Information Fidelity (VIFp) assesses image quality based on information fidelity; Learned Perceptual Image Patch Similarity (LPIPS) uses deep learning features for perceptual similarity assessment; Fréchet Inception Distance (FID) measures the distribution distance between feature vectors for real and generated images; CLIP-I and CLIP-T use the CLIP model [50] to assess image-image similarity and text-image alignment, respectively; ACC denotes the detection accuracy of watermarks. Except for FID and LPIPS, higher values indicate closer alignment with the reference image or corresponding text. Table VII in Appendix A provides a detailed breakdown of these metrics across fidelity, efficacy, and resilience.
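As a worked example of one of these metrics, PSNR follows the standard definition $10\log_{10}(\mathrm{MAX}^{2}/\mathrm{MSE})$. The sketch below assumes images scaled to $[0,1]$ (so $\mathrm{MAX}=1$); the function name and test images are illustrative and not part of CopyrightMeter.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    mse = float(np.mean((reference - test) ** 2))
    if mse == 0.0:
        return float("inf")    # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.zeros((4, 4))
test = np.full((4, 4), 0.1)    # uniform error of 0.1 -> MSE = 0.01
value = psnr(reference, test)  # 10 * log10(1 / 0.01) = 20 dB
print(value)
```

Because PSNR is logarithmic in the mean squared error, halving the per-pixel error raises it by roughly 6 dB.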

Implementation. All experiments are conducted on a server with two Intel Xeon CPUs, 64 GB memory, a 4TB HDD, and an NVIDIA A800 GPU. Appendix B details the experimental setup for copyright protections and attacks.

4.2 Obfuscation Processing Evaluation

In this subsection, we evaluate the performance of obfuscation processing (Op) methods to understand how different design choices affect their effectiveness. Specifically, we begin by applying the Op methods to generate the protected images, and then employ DreamBooth [5] to mimic the style of these protected images. Our evaluation focuses on three aspects: fidelity, which measures the similarity between protected and original images; efficacy, which measures how effectively the protected images prevent mimicry; and resilience, which examines the robustness of protection when using noise purification (Np) attacks to remove the perturbation. The setup details are shown in Appendix B.1.

Fidelity – A protected image should appear visually identical to the original to maintain the image’s utility. High fidelity indicates better preservation of artistic and semantic values. To evaluate the fidelity of various Op protections, we use several widely used metrics, namely LPIPS, SSIM, PSNR, VIFp, and FID, as outlined in Table VII. Figures 3 and 5 compare the protected images with the originals quantitatively and qualitatively, respectively.

Figure 3 illustrates that fidelity varies among different Op protection methods. AntiDB shows the highest fidelity, with the lowest LPIPS averaging around 0.1 and FID around 80, along with the highest SSIM (exceeding 0.9), PSNR (above 36), and VIFp (around 4.4) across datasets. AdvDM and Mist also exhibit relatively low FID, averaging around 110, suggesting better preservation of image quality. This is likely because these methods incorporate DMs in perturbation optimization (cf. Table IV), which enhances fidelity compared to methods relying solely on an image encoder.

Inconsistencies arise when comparing metrics like LPIPS and FID. For instance, PGuard’s low LPIPS of around 0.095 suggests a high visual similarity to the original images, but its high FID of over 180 suggests poor overall fidelity across the dataset. This disparity may stem from the different focuses of these metrics: LPIPS emphasizes semantic and perceptual similarities and visual details, while FID assesses how well the generated images align with the distribution of the original dataset, considering broader structural and statistical properties. Therefore, relying on a single metric can be misleading, highlighting the need for diverse metrics to comprehensively evaluate fidelity.

These findings align with the visualizations in Figure 5, where the AntiDB-protected images closely resemble the originals, while those from AdvDM and Mist maintain high similarity with only slight noise. Notably, all three methods incorporate DMs into their optimization, contributing to their fidelity advantage. In contrast, images protected by Glaze and PGuard show more noticeable alterations. Specifically, Glaze introduces subtle and unique distortions, while PGuard results in a stretched appearance compared to the original images. Overall, while fidelity differs across Op methods, all successfully preserve key visual characteristics, ensuring utility without compromising the viewer’s experience.

Efficacy – Following [26, 7], we fine-tune pre-trained SD models using protected images to generate mimicked images. For comparison, we also generate mimicked images from original (unprotected) images as a baseline. We assess the similarity between mimicked images produced from protected images and original images using FID, CLIP-I, and text-image alignment with CLIP-T. Figure 4 presents the quantitative results, and Figure 5 displays visual examples.

As shown in Figure 4, mimicked images generated from protected images show a stronger deviation from originals, reflected in higher FID and lower CLIP-I and CLIP-T compared with originals, indicating efficacy in deterring copyright mimicry. Notably, Mist shows the highest efficacy, with an average FID increase (from around 150 to 400) and reductions in CLIP-I (from 0.7 to 0.55) and CLIP-T (from 0.30 to 0.23) across two datasets. This indicates that mimicked images significantly diverge from the originals and their text descriptions, thus effectively mitigating copyright mimicry. Mist’s strong efficacy is due to its incorporation of both image encoders and diffusion models into adversarial perturbation optimization [23], effectively increasing latent space distance while minimizing pixel-level deviation. Other protections, such as AdvDM, AntiDB, and PGuard, provide moderate protection with smaller changes in FID, CLIP-I, and CLIP-T, indicating subtler deviations. In contrast, Glaze provides limited protection, as it slightly increases the FID (e.g., from 206 to 212 on the WikiArt dataset) while also slightly reducing both CLIP-T and CLIP-I. This result can be partly explained by the differences in fine-tuning methods used for image mimicry, as DreamBooth differs from the fine-tuning methods employed in Glaze (details in Sec 5.1).

Figure 4 demonstrates that the efficacy of protection methods varies across datasets. For instance, in the WikiArt dataset, almost all protection methods significantly reduce the similarity between generated images and text descriptions, as quantified by CLIP-T. However, in the Concept dataset, only Mist shows a reduction in CLIP-T from 0.3 to 0.28, while other methods remain close to the baseline. This suggests that protecting Concept is more challenging than WikiArt, likely due to two factors: (i) many protection methods [26, 23, 7] are optimized for artwork, enhancing their performance in art-centric datasets like WikiArt, and (ii) the distinct styles in WikiArt are easier for protection methods to exploit, which underscores the complexity of evaluating protection methods across different datasets. These findings highlight a need for more adaptable protection methods catering to varied data characteristics and contexts.

Notably, fidelity and efficacy do not always align. While stronger protections typically lead to greater quality degradation, our observations reveal counterintuitive results. For instance, AntiDB exhibits strong fidelity (cf. Figure 3) but does not achieve the best performance in efficacy (cf. Figure 4). Similarly, Mist shows high efficacy but ranks moderately in fidelity. These showcase the complex interplay between fidelity (preserving image quality) and efficacy (ensuring robust copyright protection against mimicry), underscoring the need for a balance between visual quality and protection effectiveness in practical applications.

These quantitative findings align with visualization results in Figure 5, where mimicked images from protected images show distinct styles from the originals. Mist shows the most unique textures (highest efficacy), while AdvDM and AntiDB show artifacts (moderate efficacy). Recognizing the efficacy of each method is crucial for selecting optimal strategies to prevent mimicry of copyrighted content.

Resilience – Following the approach outlined by [15], we assess the resilience of Op protection methods against Np attacks. Our evaluation process involves fine-tuning Stable Diffusion (SD) models using purified images generated by applying Np to protected images (i.e., copyrighted images with Op applied). We then evaluate the mimicry performance of these fine-tuned models, where a higher mimicry performance indicates lower resilience of the protection method. Figure 6 presents the resilience evaluation results of various Op protection methods against Np attacks.

Analysis of Figure 6 reveals several key insights. 1) All protection methods, except Glaze, show a notable decline in effectiveness when subjected to purification attacks, as evidenced by higher mimicry performance. For instance, AdvDM-protected images, when purified, achieve lower FID and higher CLIP-I and CLIP-T compared to their unpurified counterparts, indicating a higher mimicry performance. Note that Glaze’s apparent resilience stems more from its initially limited protection performance than from superior defensive capabilities. 2) Different Op methods show varying protection abilities under attack. For example, thanks to its initial strong protection performance, Mist still maintains relatively higher FID and lower CLIP-I and CLIP-T than other protection methods under attack. 3) TVM and DiffPure emerge as the most potent methods for diminishing Op protection, achieving higher mimicry performance. 4) CLIP-T shows less sensitivity than other metrics, especially on the Concept dataset where it remains nearly constant across most attack methods. We believe this is due to its robustness to minor protection artifacts, with significant changes only when major distortions obscure the original concept content.

Furthermore, we visualize the mimicry results of various Np methods against Op techniques in Figure 7. Our visual findings align with the quantitative analysis presented earlier. Specifically, Mist demonstrates superior protection performance even when Np methods are applied, with the exceptions of TVM and DiffPure. This observation further underscores that TVM and DiffPure are the most potent methods for diminishing Op protection: the artifacts in the mimicry images under TVM and DiffPure are notably less pronounced compared to other methods. Additionally, we observe that while Np can indeed diminish protection to some extent, certain Op protection methods still demonstrate a robust ability to prevent mimicry effectively. For instance, we can discern obvious protection patterns for Mist and AdvDM even after the application of Np.

In summary, both quantitative and qualitative analyses demonstrate that Op techniques can be compromised by certain attacks. These findings underscore the critical importance of evaluating protection methods not only for their initial effectiveness but also for their resilience against subsequent attacks. This comprehensive approach to assessment is essential for developing robust and reliable protection strategies in the face of evolving threats.

4.3 Model Sanitization Evaluation

Similar to Sec 4.2, we assess model sanitization (Ms) across three key dimensions: fidelity, efficacy, and resilience. Fidelity measures the sanitized model’s ability to maintain performance on unrelated content. Efficacy gauges how effectively the sanitized model prevents the generation of copyrighted content, evaluating the thoroughness of the sanitization process. Resilience examines the sanitized model’s robustness against concept recovery (Cr) attacks, assessing whether it consistently avoids reproducing copyright-protected concepts even under adversarial conditions. The detailed experimental setup is given in Appendix B.2.

Fidelity – For sanitized models, it is crucial that sanitization preserves the ability to generate images for other concepts while excluding the copyright concept. Table VI evaluates the fidelity of Ms methods on MS-COCO 2017 [51] 30K dataset prompts. We use FID to measure the differences between images generated by the sanitized models (with the original DM for reference) and real-world images from the dataset. Additionally, CLIP-T is used to assess the alignment between generated images and prompts.
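For reference, the Fréchet distance underlying FID compares two Gaussians fitted to feature sets: $\lVert\mu_{a}-\mu_{b}\rVert^{2}+\mathrm{Tr}(\Sigma_{a}+\Sigma_{b}-2(\Sigma_{a}\Sigma_{b})^{1/2})$. The sketch below is a minimal NumPy version operating on arbitrary feature matrices; actual FID uses Inception network features, which are replaced here by random vectors as an assumption for illustration.

```python
import numpy as np

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues of
    # cov_a @ cov_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))
    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    return mean_term + float(np.trace(cov_a) + np.trace(cov_b)) - 2.0 * trace_sqrt

rng = np.random.default_rng(0)
feats_real = rng.normal(size=(500, 4))     # stand-in for features of real images
feats_shifted = feats_real + 3.0           # same covariance, mean shifted by 3 per dimension

d_same = frechet_distance(feats_real, feats_real)
d_shift = frechet_distance(feats_real, feats_shifted)   # about 4 * 3^2 = 36
print(d_same, d_shift)
```

In the fidelity setting a small distance to real images is desirable, whereas in the efficacy setting below a large distance to the original model's outputs indicates stronger sanitization.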

Our analysis reveals that model sanitization (Ms) methods achieve a successful balance between copyright protection and image generation capabilities, with only minor impacts on overall performance. In Table VI, sanitized models experience a marginal increase in FID scores compared to their original counterparts, with SLD showing the most notable change (16.95 vs. 16.21 for the original SD model). This subtle increase suggests a minor impact on image fidelity, likely due to the model inadvertently altering representations of unrelated but adjacent concepts or facing creative constraints when adjusted to exclude copyrighted content. Interestingly, CLIP-T scores remain remarkably consistent across all methods (0.30-0.31), indicating well-preserved textual alignment. These findings align with previous research [19, 20, 11], confirming that while Ms methods may slightly affect image fidelity, they successfully maintain text alignment for unrelated concepts.

In summary, the sanitization process achieves its primary goal of removing specific content without significantly compromising overall performance, demonstrating an effective balance between protecting copyrighted material and maintaining generative capabilities.

TABLE VI: Fidelity evaluation of Ms.

Method	SD	FMN	ESD	AC	UCE	NP	SLD
FID $\downarrow$	16.21	16.47	16.51	16.95	16.64	16.89	16.95
CLIP-T $\uparrow$	0.31	0.30	0.30	0.31	0.31	0.30	0.30

Efficacy – To evaluate the effectiveness of various model sanitization (Ms) methods in removing copyrighted concepts, we focus on two key metrics: image similarity and text-image alignment. We compare images generated by sanitized models to those from the original model using FID and measure the alignment between generated images and their prompts using CLIP-T scores. A higher FID or a lower CLIP-T implies a more effective Ms method.

Figure 8 reveals the variation in FID across images generated from different sanitized models. Higher FID reflects greater efficacy in removing copyrighted concepts from the sanitized model’s outputs, and CLIP-T also shows a marked decrease from the baseline (original model alignment), suggesting substantial divergence from the copyrighted content. ESD achieves the strongest sanitization (average FID 311, CLIP-T 0.17). Notably, model fine-tuning methods (i.e., ESD, FMN, and UCE) generally outperform inference-guiding methods (i.e., NP and SLD) with higher FID and lower CLIP-T, reflecting more effective sanitization. This is likely because fine-tuning methods directly modify model parameters for deeper adjustments to reduce the retention of copyrighted content, while inference-guiding methods only adjust output directions, resulting in superficial removals.

Visualizations in Figure 9 support these findings. Fine-tuning-based methods like ESD and UCE effectively sanitize artistic styles by visibly altering original textures and colors in WikiArt and by turning portraits into non-face images (e.g., landscapes or still lifes) in Person. In contrast, inference-guiding methods like SLD still leave faint traces of the original artistic style or individual characteristics. Additionally, these categories differ significantly in time efficiency (cf. Sec 5.2).

Resilience – We evaluate the resilience of Ms against Cr attacks following [16]. Our evaluation process involves generating images from both original models (i.e., models capable of generating content with copyright concepts) and recovered models (i.e., sanitized models subjected to Cr). Figure 10 presents the resilience evaluation results of various Ms protection methods against Cr attacks.

Analysis of Figure 10 uncovers several critical insights. 1) All Ms protection methods show reduced effectiveness under Cr attacks. The FID between images from recovered models and originals is lower than that between sanitized models and originals, and the CLIP-T of images from recovered models is higher than that of sanitized models, indicating enhanced resemblance to copyrighted content. For instance, FMN- and AC-sanitized models show relatively low resilience, with low FID scores and high CLIP-T under Cr. Thus, while Ms methods provide initial protection, their resilience against Cr attacks is limited. 2) The resilience of Ms varies with the Cr attack applied. Fine-tuning-based attacks (e.g., LoRA and DB) are the most potent methods for diminishing Ms protection, lowering FID and raising CLIP-T from baseline. In contrast, textual-inversion-based attacks (e.g., TI and CI) cause moderate changes in FID and CLIP-T, while prompt-engineering-based attacks (e.g., RB) lead to minimal deviation. This may stem from incomplete pre-filtering of copyrighted content in the DM’s training dataset [52, 16], as Ms methods often remap these concepts to new embeddings rather than fully removing them. 3) High-potency Cr attacks tend to be limited in applicability. LoRA, DB, and TI are potent, but they mostly apply to Ms methods built on standardized open-source models. CI uses a custom pipeline for each Ms method, which makes it adjustable to various Ms methods, though its applicability to new Ms methods is uncertain. In contrast, RB bypasses protections solely through prompt modifications, making it adaptable across diverse T2I DMs.

Visualizations in Figure 11 of images generated from the FMN-sanitized and recovered models reveal the varying resilience of Ms protection against Cr attacks with differing adversary capabilities. Attacks that enable deeper model manipulation, such as fine-tuning and textual-inversion methods, recover original styles in WikiArt and portrait characteristics in Person more effectively. This trend reflects a strong correlation between higher adversary capability and greater attack impact. In contrast, less invasive prompt-engineering attacks have limited success in recovering detailed human portraits, but may still pose a feasible threat in scenarios with constrained adversary capabilities. These findings underscore the need for robust Ms methods that can withstand attacks across varying levels of attacker capability.

4.4 Digital Watermarking Evaluation

Similarly, we evaluate digital watermarking (Dw) based on three criteria: fidelity, assessing the visual consistency between images before and after Dw; efficacy, determined by the ACC of extracted watermark messages; and resilience, measuring the ACC of messages extracted from images after watermark removal (Wr) attacks. Further experimental setup details are outlined in Appendix B.3.

Fidelity – Maintaining visual similarity to the original image and alignment with the corresponding prompt is essential for watermarked images. Figure 13 evaluates fidelity across all Dw methods, using FID as a general metric. Specifically, for watermarks embedded directly onto existing images, we measure visual consistency with metrics such as LPIPS and SSIM; for generative watermarks that produce watermarked images from prompts, we assess text alignment with CLIP-T. Visualizations are presented in Figure 14.

Figure 13 shows that Dw methods have minimal impact on image fidelity, where lower LPIPS and higher SSIM, PSNR, VIFp, and CLIP-I suggest greater fidelity. Specifically, DShield, ZoDiac, and Diag exhibit low FID (below 80), indicating minimal visual alteration. This is attributed to the approach of embedding the watermark in the latent space’s Fourier frequencies, making disturbances less visually perceptible [38]. In contrast, GShade, StabSig, and TR display slightly higher FID (exceeding 90) but maintain CLIP-T scores comparable to watermarks on existing images (around 0.3), indicating preserved semantic consistency.

Visualizations in Figure 14 confirm these findings. DShield, ZoDiac, and Diag retain a close resemblance to the originals, while GShade, StabSig, and TR introduce differences, with content and artistic style largely unchanged at the semantic level. This consistency across metrics and visuals supports the fidelity of these watermark designs.

Efficacy – For watermarks, it is crucial that the decoded message exhibits high ACC compared to the embedded message. High efficacy indicates better copyright verification.

Figure 13 shows that most Dw methods achieve ACC close to 100% across datasets, except for DShield, underscoring the efficacy of these watermarks. Notably, TR stands out with 100% ACC across all datasets, indicating robust watermark embedding and decoding capability. DShield shows slightly lower ACC, possibly due to the diverse and complex datasets we used, underscoring a limitation of fine-tuning-based watermarking methods, where efficacy depends on data specificity and quality.

In summary, these findings highlight that the efficacy of Dw is largely dependent on the embedding strategy or the quality of training data, while modifications in latent space show particular promise for high ACC in diverse settings.

Resilience – We assess Dw protection resilience against Wr attacks by comparing the ACC of messages extracted after watermark removal with the originally embedded message. A higher ACC indicates a stronger resilience.
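The ACC comparison can be illustrated with a small sketch that treats ACC as bit accuracy, i.e., the fraction of message bits recovered correctly. This interpretation, along with the 8-bit message and the flipped positions, is an assumption for illustration; the exact ACC definition for each Dw method is given in Table VII and may differ.

```python
import numpy as np

def bit_accuracy(embedded, extracted) -> float:
    """Fraction of positions where the extracted bits match the embedded message."""
    embedded = np.asarray(embedded)
    extracted = np.asarray(extracted)
    if embedded.shape != extracted.shape:
        raise ValueError("messages must have the same length")
    return float(np.mean(embedded == extracted))

embedded = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Baseline: extraction from an un-attacked watermarked image (hypothetical, error-free).
extracted_clean = embedded.copy()

# After a hypothetical Wr attack that corrupts two of the eight bits.
extracted_attacked = embedded.copy()
extracted_attacked[[2, 5]] ^= 1

acc_clean = bit_accuracy(embedded, extracted_clean)        # 8/8 bits match
acc_attacked = bit_accuracy(embedded, extracted_attacked)  # 6/8 bits match
print(acc_clean, acc_attacked)
```

Note that for a random binary message, an ACC near 50% already means the watermark carries no usable information, so resilience should be judged against that floor rather than against zero.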

Figure 15 presents the resilience of various Dw protections against Wr attacks, revealing several key insights. 1) Most watermarks show reduced protection after attacks, with ACC lower than the baseline (un-attacked watermarked images). For example, Diag’s ACC sharply declines under Blur, while StabSig, ZoDiac, and GShade are vulnerable to Rotate. 2) Dw methods vary in resilience against attacks. Compared to the baseline, DShield and TR exhibit only slight declines under attacks, while others face larger reductions under certain attacks. 3) Under attack, latent space modifying methods exhibit higher ACC compared to model fine-tuning methods, with TR maintaining nearly 100% ACC across attacks due to its invisible Fourier space embedding that resists pixel disruption. 4) ZoDiac and GShade share similar vulnerabilities under Bright, Rotate, and Crop attacks, with the lowest ACC observed under Rotate.

In summary, these insights highlight the need to carefully consider specific attack scenarios when choosing watermark strategies. We speculate that latent-space modifying methods leverage the inherent distribution of the diffusion model’s latent space to embed watermarks more subtly and securely, making them harder to detect and remove.

5 Exploration

Next, we explore the generalizability, efficiency, and sensitivity of current protection methods. We further compare these methods with their contemporary versions and industry-leading online text-to-image applications. We also conduct user studies to evaluate the alignment between evaluation metrics and human judgment.

5.1 Generalizability

While previous experiments use DreamBooth [5] for image mimicry, other fine-tuning methods can also achieve mimicry. To assess generalizability, following [13, 24], we employ a standard script from Diffusers (https://huggingface.co/docs/diffusers/training/text2image) for fine-tuning (details in Appendix B.4). In Figure 17, differences in texture patterns, artifacts, or deviations from protected images reveal protection effectiveness against specific mimicry techniques. Notably, AdvDM and Mist exhibit the highest generalizability, providing strong protection under both fine-tuning methods. Glaze is less effective against DreamBooth but performs better with the standard script. Conversely, PGuard is robust with DreamBooth but is less effective with the standard script. AntiDB provides noticeable protection against both methods, particularly performing well against DreamBooth mimicry. These findings emphasize the need for protection methods that account for diverse mimicry techniques to enhance generalizability.

5.2 Efficiency

Computational cost is a key factor in copyright protection applications. Op and Dw involve lightweight, image-level manipulations, while Ms and Cr require deeper model-level changes, increasing time consumption. While prior studies often overlook time efficiency, we explore the time consumption of Ms and Cr methods.

As shown in Figure 17, within Ms, inference-guiding methods are more efficient than model fine-tuning methods as they can sanitize a concept within four minutes without retraining. Additionally, Cr methods are generally more time-efficient than Ms methods on average. This is likely because Cr simply enhances existing representations in the model, whereas Ms must first overcome the model’s training biases with a reverse optimization process. Notably, DB is the most efficient Cr method, highlighting the vulnerability of Ms. Therefore, practitioners should carefully select Ms methods based on available computational resources.

5.3 Sensitivity Analysis

Following [14, 44, 39], we take Dw as a representative for sensitivity analysis, where small perturbations may sharply impact watermark resilience, offering generalizable insights to other protection methods.

As shown in Figure 18, most protection methods exhibit a sharp ACC drop as attack hyperparameter values increase. For instance, higher brightness, crop ratios, or blur radii all reduce ACC. This implies that stronger attack settings weaken the robustness of most methods. Notably, both TR and Diag are resilient to the Rotate attack, maintaining near 100% ACC even at 90° or 180° rotation, while GShade and ZoDiac suffer sharp decreases. The superior performance of TR is attributed to its multi-ring pattern in Fourier space. Similarly, Diag achieves robustness by using triggers to embed robust watermark patterns. These insights help practitioners choose protection methods tailored to real-world attack scenarios.
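The sweep described above can be sketched as a small harness that applies one attack at several strengths and records ACC at each. This is a minimal illustration, not CopyrightMeter's implementation: the names `sensitivity_sweep`, `attack`, and `detect` are hypothetical, and ACC is assumed here to be the fraction of attacked images whose watermark is still detected.

```python
def sensitivity_sweep(images, attack, detect, strengths):
    """Return {strength: ACC} for one attack applied at several strengths.

    attack: hypothetical callable (image, strength) -> attacked image
    detect: hypothetical callable image -> True if the watermark is found
    ACC is assumed to be the fraction of attacked images whose watermark
    is still detected.
    """
    if not images:
        raise ValueError("at least one image is required")
    results = {}
    for strength in strengths:
        hits = sum(1 for img in images if detect(attack(img, strength)))
        results[strength] = hits / len(images)
    return results


# Toy stand-ins: each "image" is a single watermark score, the attack
# divides the score by its strength, and detection is a fixed threshold.
def toy_attack(score, strength):
    return score / strength


def toy_detect(score):
    return score >= 0.5


acc = sensitivity_sweep([1.0, 0.8, 0.6], toy_attack, toy_detect, [1, 1.5, 2])
```

In this toy setup ACC falls as the strength grows, which is the qualitative pattern reported for most methods; a method that is resilient to a given attack would show a flat curve instead.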

5.4 Contemporary Assessment

In the evolving field of copyright protection, methods and infringement attacks are in constant competition. We analyze recent versions of protections and attacks: (i) Glaze v2.1 (https://glaze.cs.uchicago.edu/) is a closed-source update optimized for styles with clear colors and smooth textures. (ii) Mist v2 (https://psyker-team.github.io/) enhances the vanilla Mist [23] with improved efficacy and efficiency. (iii) Noisy Upscaler [24] is an advanced attack that first adds a small amount of random noise to a protected image, then purifies the image using the Upscaler [49].
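The two-stage structure of the Noisy Upscaler attack can be sketched as follows. This is a hedged illustration only: the Gaussian noise model, the `noise_std` value, and the `purify` callable (a stand-in for the Upscaler [49]) are assumptions, since the text states only that "a small amount of random noise" is added before purification.

```python
import random


def noisy_upscaler(image, purify, noise_std=0.05, seed=None):
    """Add small random noise to an image, then purify the result.

    image:     flat list of pixel values in [0, 1]
    purify:    hypothetical callable standing in for the Upscaler [49]
    noise_std: assumed Gaussian noise level (not specified in the text)
    """
    rng = random.Random(seed)
    # Stage 1: perturb each pixel and clamp back to the valid range.
    noisy = [min(1.0, max(0.0, p + rng.gauss(0.0, noise_std))) for p in image]
    # Stage 2: hand the noisy image to the purifier.
    return purify(noisy)


# Usage with an identity purifier as a placeholder for the real upscaler.
attacked = noisy_upscaler([0.1, 0.5, 0.9], purify=lambda x: x, seed=0)
```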

Figure 19 compares Glaze v2.1 (details in Appendix C) with our open-sourced Glaze implementation, along with Mist v2 and vanilla Mist. Although both Glaze versions introduce similar perturbations, Glaze v2.1 still demonstrates limited resilience, especially against JPEG attacks. Mist v2 achieves improved resilience with post-attack images displaying noticeable mottling and color shifts, while images protected by the original Mist method show a closer resemblance to the originals. These observations underscore the vulnerabilities of existing copyright protection methods to advanced attacks, highlighting the ongoing need for improved protective solutions. In Figure 20, current protections are particularly susceptible to the Noisy Upscaler attack, with heightened vulnerability compared to other methods.

5.5 Real-world Online Applications

After analyzing SOTA strategies in academic settings, we compare them with industry-leading online applications. We have reported our findings to the respective companies.

Scenario.gg and NovelAI for image mimicry. To assess the efficacy of Op, we explore two online applications, scenario.gg (https://www.scenario.gg/) and NovelAI (https://novelai.net/). Figure 21 illustrates that Mist, the strongest protection, effectively prevents mimicry on scenario.gg, as the perturbation remains intact. Further, we observe that images mimicked from TVM- and DiffPure-purified images show nearly undetectable artifacts, making these the most potent attacks, in line with the findings in Section 4.2. On NovelAI, the style transfer removes Mist’s perturbation, suggesting that frequent model updates may reduce protection efficacy. This highlights the importance of ongoing protection updates to counter mimicry threats.

Amazon Titan Image Generator for watermarking. We assess the resilience of the watermark embedded in Amazon Bedrock Titan Image Generator (https://aws.amazon.com/cn/bedrock/titan/) against various attacks. Figure 22 shows the watermark remains intact against basic attacks (e.g., Bright, Crop, and JPEG), but becomes undetectable under more complex attacks. This implies that online watermarks share similar vulnerabilities to those applied locally. Additionally, online watermarks lack customization options (e.g., watermark strength). Therefore, practitioners should take flexibility and customization into consideration when choosing watermark methods.

5.6 User Study

We conduct user studies to evaluate the alignment between metrics and human perception (details in Appendix D). First, we assess the visual quality and style mimicry of protected and purified images in WikiArt. Following [24], we define success rate as the percentage of users preferring mimicry images fine-tuned on protected or purified images over unprotected ones. We observe that success rates increase after purification, suggesting greater visual similarity to the originals. Notably, average success rates across all mimicry scenarios remain below 50% (50% suggests perfect mimicry), showing that, from a human perspective, even images mimicked from purified images still differ significantly from the originals. Mist yields the lowest mimicry success rate (under 10%), indicating the highest efficacy for protection, whereas DiffPure attacks reduce resilience, with success rates around 35%, supporting the observations in Section 4.2. Second, following [11, 19, 20], we further examine whether Ms methods impact the fidelity of images of unrelated concepts. Over 50% of users rate the fidelity and text alignment of images generated by sanitized models as equal to or better than those of the original SD model. This supports the observations in Section 4.3 and suggests that sanitized models produce images comparable to those from the original SD. The alignment between metrics and human judgment confirms that CopyrightMeter effectively captures human perception for assessing copyright protection methods.
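As a small worked example of the success-rate definition above, the computation reduces to a percentage over pairwise preference votes. The function name and the boolean vote encoding are assumptions made for this sketch.

```python
def success_rate(votes):
    """Percentage of votes that prefer the protected/purified mimicry.

    votes: list of booleans, True when a participant preferred the image
           generated from protected (or purified) references over the one
           generated from unprotected references.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return 100.0 * sum(votes) / len(votes)


# One of four participants prefers the mimicry from protected images.
rate = success_rate([True, False, False, False])  # 25.0
```

Under this definition a rate of 50% means participants cannot tell the two mimicry sources apart, which is why 50% is read as perfect mimicry.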

6 Discussion

Limitations and future work. First, CopyrightMeter integrates most mainstream copyright protection methods in T2I DMs. Although it does not implement all strategies, its modular design allows easy incorporation of new protections, attacks, and metrics. Second, we primarily apply default settings from original papers, as these are typically optimized for performance. However, our framework supports alternative configurations. Finally, most protections require modifying original artworks, posing challenges for established artists whose unprotected works remain vulnerable. Unlike software security, where updates can fix vulnerabilities, copyright protection cannot be easily patched. As offense-defense dynamics evolve, existing protections may not withstand future attacks. We hope that CopyrightMeter provides interim protection and advocates for the establishment of more comprehensive laws and regulations.

Guidance for enhancing protection methods. Our findings reveal limitations in current copyright protections, with CopyrightMeter serving as a valuable benchmark for improvement. For example, adversarial perturbations in Op are easily compromised by simple attacks like JPEG, so incorporating JPEG loss into optimization may improve resilience. In Ms, resilience can be improved through adversarial training with crafted adversarial inputs that induce the generation of copyright concepts, minimizing model output probability under these inputs to facilitate concept erasure in more complex scenarios. Alternatively, if Cr is inevitable, refining Ms to slow recovery efforts provides additional protection. For Dw, designing watermarks with common attack strategies in mind can strengthen resilience.

Additional related work. Recent studies [22, 16, 24] have surveyed copyright protection and attack methods in T2I DMs but are limited to single-level implementations without empirical evaluation. For instance, [22] discusses Op and Ms without experimental validation or quality assessment of generated images. Similarly, [24] highlights artistic style imitation in Op, showing through user studies that all existing copyright protections can be bypassed, but lacks quantitative metrics for image quality. In contrast, CopyrightMeter provides a comprehensive framework for evaluation, covering major protection and attack categories within a unified platform for empirical analysis.

7 Conclusion

In this paper, we design and implement CopyrightMeter, a uniform platform dedicated to the comprehensive evaluation of copyright protection for text-to-image diffusion models. Leveraging CopyrightMeter, we conduct systematic evaluations from the perspectives of fidelity, efficacy, and resilience. To our knowledge, this platform is the first of its kind to provide a uniform, comprehensive, informative, and extensible evaluation of existing copyright protections and attacks. It offers empirical support and addresses the under-explored intricacies of copyright protections and attacks that have previously suffered from non-holistic and non-standardized evaluations, thereby tackling long-standing questions in the field.

References

[1] P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon, “The stable signature: Rooting watermarks in latent diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22 466–22 477.
[2] J. Betker, G. Goh, L. Jing, T. Brooks, J. Wang, L. Li, L. Ouyang, J. Zhuang, J. Lee, Y. Guo, W. Manassra, P. Dhariwal, C. Chu, Y. Jiao, and A. Ramesh, “Improving image generation with better captions,” 2023.
[3] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans et al., “Photorealistic text-to-image diffusion models with deep language understanding,” Advances in Neural Information Processing Systems, vol. 35, pp. 36 479–36 494, 2022.
[4] (2023) Generative ai has an intellectual property problem. [Online]. Available: https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
[5] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 22 500–22 510.
[6] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu, “Multi-concept customization of text-to-image diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1931–1941.
[7] S. Shan, J. Cryan, E. Wenger, H. Zheng, R. Hanocka, and B. Y. Zhao, “Glaze: Protecting artists from style mimicry by text-to-image models,” in 32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 2187–2204.
[8] P. Samuelson, “Generative ai meets copyright,” Science, vol. 381, no. 6654, pp. 158–161, 2023.
[9] M. Heikkilä, “This artist is dominating ai-generated art. and he’s not happy about it,” MIT Technology Review, vol. 125, no. 6, pp. 9–10, 2022.
[10] H. Salman, A. Khaddaj, G. Leclerc, A. Ilyas, and A. Madry, “Raising the cost of malicious ai-powered image editing,” in Proceedings of the 40th International Conference on Machine Learning. PMLR, 2023.
[11] R. Gandikota, J. Materzynska, J. Fiotto-Kaufman, and D. Bau, “Erasing concepts from diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 2426–2436.
[12] G. Zhang, K. Wang, X. Xu, Z. Wang, and H. Shi, “Forget-me-not: Learning to forget in text-to-image diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1755–1764.
[13] Y. Cui, J. Ren, H. Xu, P. He, H. Liu, L. Sun, Y. Xing, and J. Tang, “Diffusionshield: A watermark for copyright protection against generative diffusion models,” arXiv preprint arXiv:2306.04642, 2023.
[14] Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein, “Tree-rings watermarks: Invisible fingerprints for diffusion images,” in Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 58 047–58 063.
[15] B. Cao, C. Li, T. Wang, J. Jia, B. Li, and J. Chen, “Impress: Evaluating the resilience of imperceptible perturbations against unauthorized data usage in diffusion-based generative ai,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[16] M. Pham, K. O. Marshall, N. Cohen, G. Mittal, and C. Hegde, “Circumventing concept erasure methods for text-to-image generative models,” in The Twelfth International Conference on Learning Representations, 2023.
[17] G. Li, Y. Chen, J. Zhang, J. Li, S. Guo, and T. Zhang, “Towards the vulnerability of watermarking artificial intelligence generated content,” arXiv preprint arXiv:2310.07726, 2023.
[18] Z. Jiang, J. Zhang, and N. Z. Gong, “Evading watermark based detection of ai-generated content,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 1168–1181.
[19] R. Gandikota, H. Orgad, Y. Belinkov, J. Materzyńska, and D. Bau, “Unified concept editing in diffusion models,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 5111–5120.
[20] P. Schramowski, M. Brack, B. Deiseroth, and K. Kersting, “Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22 522–22 531.
[21] T. Šarčević, A. Karlowicz, R. Mayer, R. Baeza-Yates, and A. Rauber, “U can’t gen this? a survey of intellectual property protection methods for data in generative ai,” arXiv preprint arXiv:2406.15386, 2024.
[22] J. Ren, H. Xu, P. He, Y. Cui, S. Zeng, J. Zhang, H. Wen, J. Ding, H. Liu, Y. Chang et al., “Copyright protection in generative ai: A technical perspective,” arXiv preprint arXiv:2402.02333, 2024.
[23] C. Liang and X. Wu, “Mist: Towards improved adversarial examples for diffusion models,” arXiv preprint arXiv:2305.12683, 2023.
[24] R. Hönig, J. Rando, N. Carlini, and F. Tramèr, “Adversarial perturbations cannot reliably protect artists from generative ai,” 2024.
[25] B. Zheng, C. Liang, X. Wu, and Y. Liu, “Understanding and improving adversarial attacks on latent diffusion model,” arXiv preprint arXiv:2310.04687, 2023.
[26] C. Liang, X. Wu, Y. Hua, J. Zhang, Y. Xue, T. Song, Z. Xue, R. Ma, and H. Guan, “Adversarial example does good: Preventing painting imitation from diffusion models via adversarial examples,” in International Conference on Machine Learning. PMLR, 2023, pp. 20 763–20 786.
[27] T. Van Le, H. Phung, T. H. Nguyen, Q. Dao, N. N. Tran, and A. Tran, “Anti-dreambooth: Protecting users from personalized text-to-image synthesis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 2116–2127.
[28] G. K. Wallace, “The jpeg still picture compression standard,” Communications of the ACM, vol. 34, no. 4, pp. 30–44, 1991.
[29] P. Heckbert, “Color image quantization for frame buffer display,” ACM Siggraph Computer Graphics, vol. 16, no. 3, pp. 297–307, 1982.
[30] A. Chambolle, “An algorithm for total variation minimization and applications,” Journal of Mathematical imaging and vision, vol. 20, pp. 89–97, 2004.
[31] W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, and A. Anandkumar, “Diffusion models for adversarial purification,” in International Conference on Machine Learning (ICML), 2022.
[32] N. Kumari, B. Zhang, S.-Y. Wang, E. Shechtman, R. Zhang, and J.-Y. Zhu, “Ablating concepts in text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22 691–22 702.
[33] AUTOMATIC1111, “Negative prompt,” https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Negative-prompt, 2022, accessed: 2024-07-01.
[34] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” in The Tenth International Conference on Learning Representations, 2022.
[35] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, “An image is worth one word: Personalizing text-to-image generation using textual inversion,” in The Eleventh International Conference on Learning Representations, 2023.
[36] Y.-L. Tsai, C.-Y. Hsu, C. Xie, C.-H. Lin, J.-Y. Chen, B. Li, P.-Y. Chen, C.-M. Yu, and C.-Y. Huang, “Ring-a-bell! how reliable are concept removal methods for diffusion models?” in The Twelfth International Conference on Learning Representations, 2024.
[37] Z. Wang, C. Chen, L. Lyu, D. N. Metaxas, and S. Ma, “Diagnosis: Detecting unauthorized data usages in text-to-image diffusion models,” in The Twelfth International Conference on Learning Representations, 2023.
[38] L. Zhang, X. Liu, A. V. Martin, C. X. Bearfield, Y. Brun, and H. Guan, “Attack-resilient image watermarking using stable diffusion,” Advances in Neural Information Processing Systems, 2024.
[39] Z. Yang, K. Zeng, K. Chen, H. Fang, W. Zhang, and N. Yu, “Gaussian shading: Provable performance-lossless image watermarking for diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 12 162–12 171.
[40] O. Hosam, “Attacking image watermarking and steganography-a survey,” International Journal of Information Technology and Computer Science, vol. 11, no. 3, pp. 23–37, 2019.
[41] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image compression with discretized gaussian mixture likelihoods and attention modules,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 7939–7948.
[42] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 2015, pp. 234–241.
[43] F. Y. Shih, Digital watermarking and steganography: fundamentals and techniques. CRC press, 2017.
[44] K. A. Zhang, L. Xu, A. Cuesta-Infante, and K. Veeramachaneni, “Robust invisible video watermarking with attention,” arXiv preprint arXiv:1909.01285, 2019.
[45] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in neural information processing systems, vol. 34, pp. 8780–8794, 2021.
[46] X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, and L. Li, “Invisible image watermarks are provably removable using generative ai,” arXiv preprint arXiv:2306.01953, 2023.
[47] B. Saleh and A. Elgammal, “Large-scale classification of fine-art paintings: Learning the right metric on the right feature,” arXiv preprint arXiv:1505.00855, 2015.
[48] C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in Neural Information Processing Systems, vol. 35, pp. 25 278–25 294, 2022.
[49] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695.
[50] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning. PMLR, 2021, pp. 8748–8763.
[51] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
[52] A. Birhane, V. U. Prabhu, and E. Kahembwe, “Multimodal datasets: misogyny, pornography, and malignant stereotypes,” arXiv preprint arXiv:2110.01963, 2021.
[53] A. Mądry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” stat, vol. 1050, no. 9, 2017.
[54] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu, “Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps,” Advances in Neural Information Processing Systems, vol. 35, pp. 5775–5787, 2022.

Appendix A Metrics Overview and Visualization Results

TABLE VII: Properties of copyright protection methods.

Category	Property	Description	PSNR	SSIM	FID	VIFp	LPIPS	CLIP-I	CLIP-T	ACC
Obfuscation Processing	Fidelity	Protected images resemble the originals under all scenarios.	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	$\times$	$\times$
	Efficacy	Protected images mitigate copyright infringement.	$\times$	$\times$	$\checkmark$	$\times$	$\times$	$\checkmark$	$\checkmark$	$\times$
	Resilience	Protected images mitigate copyright mimicking under attack.	$\times$	$\times$	$\checkmark$	$\times$	$\times$	$\checkmark$	$\checkmark$	$\times$
Model Sanitization	Fidelity	Sanitized models are unaffected for unrelated concepts.	$\times$	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$	$\times$
	Efficacy	Sanitized models forget specific copyright concepts.	$\times$	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$	$\times$
	Resilience	Sanitized models struggle to relearn copyright concepts under attack.	$\times$	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$	$\times$
Digital Watermark	Fidelity	Watermarked images resemble the originals under all scenarios.	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\times$
	Efficacy	Watermark extractable from protected images.	$\times$	$\times$	$\times$	$\times$	$\times$	$\times$	$\times$	$\checkmark$
	Resilience	Watermark extractable from attacked images.	$\times$	$\times$	$\times$	$\times$	$\times$	$\times$	$\times$	$\checkmark$

We use metrics to assess fidelity, efficacy, and resilience of copyright protection methods, with Table VII summarizing these properties across different categories. For obfuscation processing and noise purification, Figure 23 presents protected images alongside the original artwork, while Figure 24 shows DreamBooth fine-tuned images with reference to the protected and purified images. For model sanitization and concept recovery, Figure 25 shows protected images with a specific concept sanitized, and Figure 26 shows the recovered images. Finally, for digital watermark and watermark removal, the results are illustrated in Figure 27.
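As a concrete reference for one of the fidelity metrics listed in Table VII, PSNR can be computed from the mean squared error between an original and a protected image. The following is a minimal sketch, assuming 8-bit pixel values stored as flat lists; the function name and the list-based image representation are illustrative assumptions rather than part of CopyrightMeter.

```python
import math


def psnr(original, protected, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel intensities (an assumption made
    for this sketch); a higher PSNR means the protected image is closer
    to the original.
    """
    if len(original) != len(protected) or not original:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(original, protected)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, two images that differ by exactly one intensity level at every pixel have an MSE of 1 and a PSNR of about 48.13 dB.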

Appendix B Experimental Setup

B.1 Obfuscation Processing and Noise Purification

In Op, AdvDM [26] trains with a learning rate of 0.003 for 100 steps and a perturbation limit of 0.06. Mist [23] uses an $\ell_{\infty}$ constraint, 100 PGD steps, a per-step perturbation of 1/255, and a total budget of 16/255. Given that Glaze [7] is closed-source, we follow the implementation from the code of [15], using a learning rate of 0.001 for 500 steps, a perceptual perturbation budget of 0.05, and an LPIPS loss weight of 0.1. PhotoGuard (PGuard) [10] uses an $\ell_{\infty}$ perturbation limit of 16/255, a step size of 2/255, and 200 optimization steps. Anti-Dreambooth (AntiDB) [27] employs 100 PGD iterations for FSMG and 50 for ASPL, with a perturbation budget of 8/255, a step size of 1/255, and a noise budget $\eta$ of 0.05, minimized over 1000 training steps.
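To illustrate how a per-step perturbation and a total $\ell_{\infty}$ budget interact in PGD, the sketch below uses the Mist values quoted above (100 steps, 1/255 per step, 16/255 budget) as defaults. It is a toy under stated assumptions: `grad_fn` is a hypothetical callable, whereas the actual methods obtain gradients by backpropagating through a diffusion model, and images are flat lists of values in [0, 1].

```python
def pgd_linf(image, grad_fn, steps=100, step_size=1 / 255, budget=16 / 255):
    """Signed-gradient ascent with projection onto an l_inf ball.

    image:   flat list of pixel values in [0, 1]
    grad_fn: hypothetical callable returning d(loss)/d(pixel) for each
             pixel of the current perturbed image
    """
    perturbed = list(image)
    for _ in range(steps):
        grads = grad_fn(perturbed)
        for i, g in enumerate(grads):
            sign = (g > 0) - (g < 0)
            value = perturbed[i] + step_size * sign
            # Project back into the l_inf ball around the original pixel.
            value = max(image[i] - budget, min(image[i] + budget, value))
            # Keep the result a valid pixel intensity.
            perturbed[i] = max(0.0, min(1.0, value))
    return perturbed


# Toy objective: push every pixel away from 0.5 (stand-in for a real loss).
def toy_grad(pixels):
    return [2.0 * (p - 0.5) for p in pixels]


protected = pgd_linf([0.2, 0.5, 0.9], toy_grad)
```

Because the step size is 1/255 and the budget is 16/255, a pixel whose gradient sign never changes reaches the boundary of the ball after 16 steps; the remaining steps are absorbed by the projection.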

In Np, JPEG Compression (JPEG) [28] sets the quality to 0.75, and Quantize (Quant) [29] sets the bit depth to 8. Total Variation Minimization (TVM) [30] uses a regularization weight of 0.5 with the $\ell_{2}$ norm and is optimized with the BFGS algorithm. For IMPRESS [15], we use the original authors’ hyperparameters, setting the learning rate to 0.001, purification intensity to 0.1, and 3000 iterations. For DiffPure [31], we use classifier-free guidance with a scale of 7.5 and fine-tune the diffusion timesteps with a strength of 1,000 via AutoPipelineForImage2Image (https://huggingface.co/docs/diffusers/api/pipelines/auto_pipeline).

B.2 Model Sanitization and Concept Recovery

We use several methods for Ms. For Forget-Me-Not (FMN) [12], we fine-tune using the textual inversion scripts provided by the authors. For Erased Stable Diffusion (ESD) [11], models are trained for each category. Ablating Concepts (AC) [32] employs scripts from the authors for both artistic and personal concepts, utilizing WikiArt artworks and generated photos, respectively. Unified Concept Editing (UCE) [19] models are trained using default parameters. Negative Prompt (NP) is applied during inference via txt2img.py (https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py). Safe Latent Diffusion (SLD) [20] uses a new SD pipeline based on diffusers (https://github.com/huggingface/diffusers/), with safety concepts defined for both artistic and personal elements. All parameters are set to default: guidance scale at 2000, warm-up steps at 7, threshold at 0.025, momentum scale at 0.5, and momentum beta at 0.7. Additionally, to confirm sanitized models’ fidelity on unrelated concepts, we compare 30,000 real-world images from the MS-COCO 2017 dataset with the SD-generated images from the corresponding text descriptions.

In Cr, LoRA [34] trains with a batch size of 1 and a learning rate of $1\times 10^{-4}$ for 100 steps. DreamBooth (DB) [5] trains with a batch size of 2 and a learning rate of $5\times 10^{-7}$ for 1000 steps, using prompts such as “a painting in the style of [V]” for the WikiArt dataset and “A photo of sks [V]” for the Person dataset, where “[V]” represents an artist or concept name. Textual Inversion (TI) [35] uses textual_inversion.py (https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py) with 1000 steps and a learning rate of $5\times 10^{-4}$, using the same prompts as DreamBooth. Concept Inversion (CI) [16] trains with a batch size of 4, a learning rate of $5\times 10^{-3}$, and 1000 steps, using frozen erased model weights. Ring-A-Bell (RB) [36] uses a prompt length of 77, a tuning coefficient of 3, and a genetic algorithm with a population of 200, 3000 iterations, a mutation rate of 0.25, and a crossover rate of 0.5. UCE uniquely generates non-standard .pt model files, preventing further fine-tuning. Inference-guiding protections (NP, SLD) do not generate model files and are vulnerable only to CI and RB attacks.

B.3 Digital Watermark and Watermark Removal

In Dw, DiffusionShield (DShield) [13] uses a patch shape of $(u,v)=(4,4)$ and sets a quaternary message to 2. For joint optimization, a 5-step PGD [53] is applied with $\ell_{\infty}\leq\epsilon$, while SGD optimizes the classifier. Diagnosis (Diag) [37] uses a 100% coating rate for unconditional and 20% for trigger-conditioned memorization, with warping strengths of 2.0 and 1.0, respectively. Stable Signature (StabSig) [1] fine-tunes the LDM decoder using decoder to generate watermarked images. Tree-Ring (TR) [14] uses a guidance scale of 7.5 for 50 inference steps, with a watermark radius of 10 for DDIM inversion. ZoDiac [38] uses a pre-trained SD model with 50 denoising steps and optimizes the latent variable over 100 iterations, using a watermark radius of 10 and weights of 0.1 for SSIM loss and 0.01 for perceptual loss. Gaussian Shading (GShade) [39] samples 50 steps using DPMSolver [54] with a guidance scale of 7.5 and performs 50 steps of DDIM inversion, using settings of $f_{c}=1$, $f_{hw}=8$, and $l=1$ with a capacity of 256 bits.

In Wr, Brightness Adjustment (Bright) [39] applies a factor of 6, and Image Rotation (Rotate) [39] performs a 90-degree rotation. Random Crop (Crop) [39] selects 50% of the image area, while Gaussian Blur (Blur) [40] uses a kernel size of 4. VAE-Cheng20 (VAE) is utilized with a quality level of 3 [41]. Moreover, DiffPure [31] implements the AutoPipelineForImage2Image pipeline (https://huggingface.co/docs/diffusers/api/pipelines/auto_pipeline), with classifier-free guidance set to 7.5 and diffusion timesteps tuned to a strength of 1,000.
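Three of the basic Wr attacks above can be sketched on a grayscale image stored as a list of rows. This is an illustration under assumptions, not the benchmark code: the function names are hypothetical, the rotation direction is chosen as counter-clockwise, and the 50% crop is interpreted as 50% of the image area (so each side is scaled by the square root of 0.5).

```python
import math
import random


def brighten(img, factor=6.0, max_value=255):
    """Scale pixel intensities by `factor`, clipping at `max_value`."""
    return [[min(max_value, round(p * factor)) for p in row] for row in img]


def rotate90(img):
    """Rotate the image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def random_crop(img, area_ratio=0.5, seed=None):
    """Keep a random window covering roughly `area_ratio` of the area."""
    height, width = len(img), len(img[0])
    scale = math.sqrt(area_ratio)
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    rng = random.Random(seed)
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


image = [[x + 10 * y for x in range(10)] for y in range(10)]
cropped = random_crop(image, seed=0)  # 7 x 7 window of the 10 x 10 image
```

With a brightness factor of 6, any 8-bit pixel above 42 saturates at 255, which gives a sense of how aggressive this default setting is.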

B.4 Style Mimicry Experimental Details

DreamBooth is a subject-driven generation method that can be used for style/concept transfer. In Op and Np, we use unprotected, protected, and attacked images as references to fine-tune a pre-trained SD model via DreamBooth, utilizing the implementation provided by diffusers (https://github.com/huggingface/diffusers/). Additionally, we use the T2I fine-tuning script provided by diffusers to test the generalization of the protections (cf. Section 5.1). Following [24] for optimal style mimicry, we use 2000 training steps, a batch size of 4, and a learning rate of $5\times 10^{-6}$.

Appendix C Implementation Validity Analysis of Glaze

We use the reproduction code for Glaze from IMPRESS (https://github.com/AAAAAAsuka/Impress/blob/main/glaze.py) since the latest version of Glaze (v2.1) (https://Glaze.cs.uchicago.edu/downloads.html) is not open-sourced. For Glaze v2.1, we set the intensity to high and render quality to slowest for maximum protection. The comparison of protected images shows that while our implementation offers slightly lower protection, it achieves higher fidelity (cf. Table VIII). Both approaches display similar “style cloaks,” confirming the validity of our implementation (cf. Figure 29).

TABLE VIII: Comparison of fidelity and efficacy on Glaze.

Method	LPIPS $\downarrow$	FID $\downarrow$	CLIP-I $\downarrow$	CLIP-T $\downarrow$
Glaze v2.1	0.403 $\pm$ 0.053	283	0.625 $\pm$ 0.008	0.248 $\pm$ 0.002
Our Implementation	0.133 $\pm$ 0.031	182	0.698 $\pm$ 0.010	0.292 $\pm$ 0.001

Appendix D User Study

User Study of Op. Our human evaluation assesses both visual quality and style mimicry of protected images under various attacks. Following [7, 24], we measure the correlation between metrics and human judgment regarding artist style mimicry. Annotators on Amazon MTurk (https://www.mturk.com/) are presented with original artworks as style references and asked to evaluate two scenarios: (i) a generated artwork without protection versus one with protection, and (ii) a generated artwork without protection versus one with protection after attack. We employ original artist images from WikiArt and the corresponding protected images from different protection methods as reference pictures to fine-tune the DreamBooth model with the prompt “a painting in the style of [artist]”. Participants view 10 original artworks by a specific artist as reference samples, followed by one protected and one unprotected generated image in the same style. We focus on two key aspects: 1) Visual Quality. Participants assess each image based on four questions corresponding to metrics targeting noise level, fidelity (including artifacts), alignment with brightness/contrast/structure, and overall stylistic fit (cf. Table D). To ensure unbiased assessments, we randomize image order, comparison sequences, and model generation seeds. 2) Style Mimicry. Inspired by Glaze [7], we ask participants to rate the style mimicry of the generated images on a 5-point Likert scale, evaluating how well each image resembles the reference style samples. The options are: (i) Not successful at all, (ii) Not very successful, (iii) Somewhat successful, (iv) Successful, and (v) Very successful.

Dimensions	Metric	Human Evaluation Questions
Visual Quality	PSNR	Which image has less noise?
	VIFp	Which image has better fidelity and fewer artifacts (distorted, unrealistic)?
	SSIM, LPIPS	Based on brightness, contrast, and structure, which better matches the referred image?
	FID, CLIP-I/T	Which image better fits the style of the referred image samples and the description “a painting in the style of [artist]”?
Style Mimicry		How successfully does the style of the image mimic the samples?

	Method	SD	FMN	ESD	AC	UCE	NP	SLD
Image Fidelity	FID-30k $\downarrow$	16.21	16.47	16.51	16.95	16.64	16.89	16.95
Image Fidelity	User/% $\uparrow$	-	62.93	63.02	63.87	63.56	63.50	63.98
Text Alignment	CLIP-T $\uparrow$	0.31	0.30	0.30	0.31	0.31	0.30	0.30
Text Alignment	User/% $\uparrow$	-	59.37	59.46	61.04	60.89	59.51	59.38