Machine Unlearning in Computer Vision (2023–2025)
Introduction
Machine unlearning refers to the process of making a trained machine learning model “forget” certain training
data as if that data had never been used 1 . This concept emerged in response to practical needs like the
“right to be forgotten” under data protection laws (e.g. GDPR) and the removal of poisoned or erroneous
data from models 2 . In essence, when a user or regulator requests data deletion, not only must the raw
data be deleted, but any influence of that data on deployed models should be erased. Early foundational
work by Cao and Yang (2015) introduced the term machine unlearning and framed it as transforming
learning algorithms into a summation form to enable post-hoc deletion of specific data points 3 .
Subsequent efforts formalized the goal: an unlearned model should be indistinguishable from a model
trained from scratch without the data in question 1 . Achieving this ideal is challenging, especially for deep
neural networks common in computer vision (CV). Key requirements for practical unlearning include:
effectiveness (complete removal of the data’s influence) 4 , efficiency (prompt unlearning, to meet legal
time-frames and deployment needs) 5 , utility (minimal degradation of accuracy on the remaining data)
6 , and compatibility (methods that can be applied to existing model training pipelines) 6 .
In the CV domain – spanning tasks like image classification, object detection, and face recognition – these
requirements are particularly hard to satisfy. Vision models are typically high-capacity deep networks with
complex, non-convex training objectives, making exact re-training for every data deletion infeasible 7 .
Nevertheless, 2023–2025 has seen significant advances in machine unlearning techniques tailored to computer
vision. Researchers have built on foundational ideas (like those by Cao & Yang and the SISA framework by
Bourtoule et al. 8 ) to devise novel algorithms that selectively remove information from CV models while
preserving their performance on retained data. Below, we review recent developments in each subdomain
(classification, detection, face recognition), highlight core techniques (exact vs. approximate unlearning,
“scrubbing”, certified removal, etc.), discuss challenges such as catastrophic forgetting and computational
cost, and propose future extensions for this emerging field.
Exact vs. Approximate Unlearning: Exact unlearning guarantees a result equivalent to retraining the model from scratch without the forgotten data, whereas approximate unlearning only requires that the unlearned model be statistically indistinguishable from such a retrained model (i.e. the model’s predictions differ only negligibly or within some probabilistic bound), though slight deviations
are permitted. In practice, most recent CV-oriented unlearning methods are approximate – they strive to
closely mimic the result of exact retraining but in a fraction of the time 12 .
“Scrubbing” Model Parameters: One family of approximate techniques focuses on scrubbing the model’s
weights to eliminate information about the forgotten data. Golatkar et al. (2020) pioneered this for deep
networks by leveraging influence functions and Fisher information to identify which model parameters
encode information about a particular training subset and then perturbing those parameters to “forget”
that info 13 14 . Their approach, sometimes termed Fisher Forgetting, adds carefully crafted noise to the
weights (derived from a second-order approximation of the training objective) such that the model’s output
on the forget data is destroyed while minimally affecting other outputs 13 . More recently, Kurmanji et al.
(2023) proposed an improved scrubbing algorithm called SCRUB, which combines a teacher-student model
framework with application-specific forget objectives 15 . SCRUB was shown to consistently achieve high
forgetting quality (i.e. the model performs as if the data were never seen) across different scenarios – such
as removing biased data, correcting mislabels, or honoring user privacy – while maintaining model utility
on retained data 15 . Scrubbing-based methods operate directly on the model, making them model-intrinsic
approaches: they don’t require retraining from scratch, but rather edit the model’s parameters to erase
memory of the target data.
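A minimal sketch of this weight-scrubbing idea is shown below, assuming a PyTorch classifier `model`, a `retain_loader` over the data to keep, and a strength parameter `alpha`; it uses a simplified diagonal Fisher estimate rather than Golatkar et al.'s exact second-order derivation.

```python
# Minimal sketch of diagonal-Fisher weight scrubbing, loosely following the idea in
# Golatkar et al. (2020) but NOT their exact derivation. Assumed names: `model` is a
# trained PyTorch classifier, `retain_loader` yields (x, y) batches of data to KEEP,
# and `alpha` controls the forgetting strength.
import torch
import torch.nn.functional as F

def diagonal_fisher(model, retain_loader, device="cpu"):
    """Estimate a diagonal Fisher information matrix on the retained data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in retain_loader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(retain_loader), 1) for n, f in fisher.items()}

@torch.no_grad()
def fisher_scrub(model, fisher, alpha=1e-4, eps=1e-8):
    """Add noise inversely scaled by Fisher importance: parameters that matter little
    for the retained data are perturbed the most, erasing other (forgotten) info."""
    for n, p in model.named_parameters():
        std = (alpha / (fisher[n] + eps).sqrt()).clamp(max=1.0)
        p.add_(torch.randn_like(p) * std)
```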
Knowledge Distillation and Student-Teacher Unlearning: Another powerful approach is to train a new
model (student) to replicate the behavior of the original model (teacher) on the data that should be
retained, but not on the data to forget. This is often done via knowledge distillation. For example, Bonato et
al. (2024) introduced SCAR (Selective-distillation for Class-agnostic Unlearning), which distills knowledge from
the original model to a new model using carefully chosen data 16 . Crucially, SCAR does this without
needing to keep aside a large “retain set” of original data: it uses out-of-distribution (OOD) images and a
distillation trick to preserve the original model’s accuracy on in-distribution data 17 . The forget set’s
representations are nudged towards incorrect classes (using a modified Mahalanobis distance in feature
space) to erase class-specific knowledge 18 . This yields an unlearned model that retains high test accuracy
on remaining classes while the forgotten class is effectively unlearned 19 . Distillation-based unlearning is
attractive because it refreshes the model (producing a new model as output) and can naturally incorporate
constraints to ensure the forgotten data has low influence (e.g., the teacher’s outputs for that data can be
deliberately not replicated). A related idea is selective fine-tuning: instead of training a completely new
model, one can fine-tune the existing model on a carefully curated dataset (or even synthetic data) that
represents the retained knowledge, thereby washing out the influence of the forgotten data. This is, in
effect, a limited form of retraining guided by the original model’s knowledge of what should remain.
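The sketch below illustrates one unlearning step of this student-teacher recipe; the names `teacher`, `student`, `proxy_batch`, and `forget_batch` are assumptions, and the uniform-output forgetting term is a simplification of SCAR's feature-space objective.

```python
# Minimal sketch of one distillation-based unlearning step, loosely inspired by the
# student-teacher recipe above (not SCAR's exact losses). Assumed names: `teacher` is
# the frozen original model, `student` the model being unlearned, `proxy_batch` images
# standing in for retained knowledge (retain data or OOD images), `forget_batch`
# images of the class to forget.
import torch
import torch.nn.functional as F

def unlearning_step(teacher, student, proxy_batch, forget_batch, optimizer, T=2.0):
    teacher.eval()
    student.train()
    # 1) Retain: match the teacher's soft predictions on the proxy images.
    with torch.no_grad():
        t_logits = teacher(proxy_batch)
    s_logits = student(proxy_batch)
    retain_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                           F.softmax(t_logits / T, dim=1),
                           reduction="batchmean") * T * T
    # 2) Forget: push predictions on the forget data toward the uniform distribution
    #    (SCAR instead nudges forget features toward other classes' distributions).
    f_logits = student(forget_batch)
    uniform = torch.full_like(f_logits, 1.0 / f_logits.size(1))
    forget_loss = F.kl_div(F.log_softmax(f_logits, dim=1), uniform,
                           reduction="batchmean")
    loss = retain_loss + forget_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return retain_loss.item(), forget_loss.item()
```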
Data Partitioning and Sharded Training: Techniques like SISA (Sharded, Isolated, Sliced, Aggregated
training) 8 approach unlearning from the training process side. The idea is to train the model in a
modular way (e.g., on disjoint shards of data and aggregated ensembles), so that when a deletion is
needed, only the components that trained on the “to-be-forgotten” shard need to be retrained 20 .
Bourtoule et al. (2021) demonstrated that SISA can significantly speed up exact unlearning – for instance, up
to a 4.6× retraining speed-up on certain benchmark datasets – by limiting how much of the model is affected by
any single data point 21 . Essentially, partitioning reduces entanglement between data points in the model.
A recent CVPR 2023 work by Lin et al. went further, introducing an Entanglement-Reduced Masking
strategy during training to make knowledge of different classes more separable 22 . Their method, ERM-
KTP, trains the initial model with masked neurons to reduce class entanglement, and on an unlearning
request, it transfers the knowledge of non-forgotten classes to a new model while prohibiting (masking out)
the knowledge of the target class 23 . This knowledge transfer/prohibition approach yields an interpretable
unlearning process where one can explicitly see which part of the network was “responsible” for the
forgotten data 24 . The trade-off is that methods like SISA or ERM-KTP require a specialized training
procedure (sharding or masking) from the start, which may not always be feasible to implement in existing
pipelines, but they make later unlearning requests much more efficient.
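A minimal sketch of the sharded-ensemble idea (shards only, without SISA's additional slicing) might look as follows, with `make_model`, `train_fn`, and `dataset` assumed to be supplied by the caller.

```python
# Minimal sketch of SISA-style sharded training (shards only, without the paper's
# extra "slicing" and checkpointing). Assumed names: `make_model()` builds a fresh
# classifier, `train_fn(model, subset)` trains and returns it, `dataset` is an
# indexable torch Dataset.
import torch
from torch.utils.data import Subset

class SisaEnsemble:
    def __init__(self, dataset, n_shards, make_model, train_fn):
        self.dataset, self.make_model, self.train_fn = dataset, make_model, train_fn
        # Assign every sample index to exactly one shard.
        self.shards = [list(range(i, len(dataset), n_shards)) for i in range(n_shards)]
        self.models = [train_fn(make_model(), Subset(dataset, idx))
                       for idx in self.shards]

    def unlearn(self, sample_idx):
        """Delete one training point: retrain only the shard that saw it."""
        for s, idx in enumerate(self.shards):
            if sample_idx in idx:
                idx.remove(sample_idx)
                self.models[s] = self.train_fn(self.make_model(),
                                               Subset(self.dataset, idx))
                return s  # only this one sub-model was retrained

    @torch.no_grad()
    def predict(self, x):
        """Aggregate sub-model outputs (here: average the softmax scores)."""
        probs = torch.stack([m(x).softmax(dim=1) for m in self.models])
        return probs.mean(dim=0).argmax(dim=1)
```

On a deletion, only one of the $n$ sub-models is retrained, which is where the reported speed-ups come from.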
Certified Removal Guarantees: A prominent line of research aims to prove that a model has forgotten a
data point, often by leveraging techniques from differential privacy. Guo et al. (2020) introduced the idea of
certified removal, which formalizes a guarantee that after unlearning, the model’s parameters (or outputs)
are indistinguishable (up to a small probability) from those of a retrained model that never saw the data 25 .
In practice, this usually involves adding randomization or noise to the training/updating process. For
example, one might train a model with differential privacy and then show that removing a point changes
the distribution of model parameters by at most what a DP guarantee allows, thus “certifying” its negligible
influence. Early certified removal methods focused on convex models (like logistic regression) or shallow
models where theoretical analysis is tractable 14 . Extending these guarantees to deep CV models is non-
trivial, but it’s an active area: recent approximate unlearning methods often strive for a statistical guarantee
of forgetting (e.g., bounding the increase in a membership inference attacker’s success rate after purported
unlearning). Certified removal sits at the intersection of security and ML – it provides a worst-case assurance
that even an adversary cannot tell the difference between the unlearned model and a model never trained
on the data. In summary, certified approaches offer strong privacy guarantees, but typically at the cost of
some model accuracy or added training overhead (since techniques like differential privacy inherently add
noise or require multiple model samples to verify removal).
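Formally, Guo et al.'s $\epsilon$-certified removal can be paraphrased as requiring, for a removal mechanism $M$ applied to a model trained by algorithm $A$ on dataset $D$:

$$ e^{-\epsilon} \;\le\; \frac{\Pr\big[\,M(A(D), D, x) \in T\,\big]}{\Pr\big[\,A(D \setminus \{x\}) \in T\,\big]} \;\le\; e^{\epsilon} \qquad \text{for all measurable sets of models } T,\ \text{datasets } D,\ \text{and points } x \in D, $$

where the $(\epsilon, \delta)$ variant relaxes both inequalities by an additive $\delta$, mirroring differential privacy.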
Summary of key unlearning methods – Method (Year); Type; Key Idea; Efficiency; Impact on Accuracy; Notes/Use Cases:

• SISA (Bourtoule et al., 2021) 8 21 – Type: Exact (modular retraining). Key Idea: shard training data into isolated slices; retrain only the affected shard(s) on deletion 21. Efficiency: 4×–5× faster unlearning than full retraining on simple image tasks 21. Impact on Accuracy: minor drop if many shards are used (due to less combined training) 21. Notes/Use Cases: applied to image classification (e.g. ImageNet) with some retraining overhead 26; speeds up exact unlearning.

• Fisher Forgetting (Golatkar et al., 2020) 13 – Type: Approx (weight scrubbing). Key Idea: use Fisher information to identify parameters influencing the target data; perturb those weights to erase their influence. Efficiency: much faster than retraining (one-shot update). Impact on Accuracy: small degradation possible if the added noise is large. Notes/Use Cases: demonstrated on image classifiers (CIFAR/ImageNet); no retraining needed, but requires computing second-order gradients.

• Boundary Unlearning (Chen et al., 2023) 27 28 – Type: Approx (decision-boundary shift). Key Idea: instead of altering all weights, adjust the model’s decision boundary for the target class to mimic scratch-model behavior. Efficiency: ~17× speed-up for forgetting one class in CIFAR-10 28. Impact on Accuracy: negligible drop on other classes’ accuracy (the change is focused on the target class). Notes/Use Cases: for image classification and face recognition; efficiently unlearns an entire class (e.g. one identity) 12; doesn’t require original training data.

• ERM-KTP (Lin et al., 2023) 22 29 – Type: Exact→Approx hybrid. Key Idea: train with entanglement-reduced masks to separate class knowledge; on unlearning, transfer only non-target class knowledge to a new model (knowledge transfer + prohibition masks). Efficiency: efficient at unlearning time (knowledge transfer is faster than a full retrain). Impact on Accuracy: maintains high fidelity on retained classes (target class effectively removed). Notes/Use Cases: image classification focus; requires a custom training procedure (with masks) for the original model 22; produces an interpretable mask showing the forgotten knowledge.

• SCAR (Bonato et al., 2024) 16 17 – Type: Approx (distillation). Key Idea: knowledge distillation from the original to a new model using out-of-distribution images; no “retain set” of original data needed to preserve accuracy. Efficiency: high (no full retrain; a few epochs of distillation). Impact on Accuracy: original accuracy largely preserved 16; forgotten-class accuracy drops to chance. Notes/Use Cases: image classification setting; can even work when original data cannot be kept after training; introduces a “self-forget” variant for zero access to the forget set 30.

• GS-LoRA (Zhao et al., 2024) 31 32 – Type: Approx (fine-tuning modules). Key Idea: for each forget request, fine-tune low-rank adaptation (LoRA) modules on a vision Transformer, with group sparsity to isolate the forgotten class’s information. Efficiency: very fast for sequential requests (few trainable parameters per request). Impact on Accuracy: minimal impact on non-forgotten classes 32 (the design explicitly avoids catastrophic forgetting). Notes/Use Cases: used for continual forgetting on large pre-trained models (classification, detection, face recognition) 32; scalable to many sequential deletions by accumulating modules.

• MetaUnlearn (De Min et al., 2025) 33 34 – Type: Approx (meta-learning). Key Idea: meta-learn an unlearning procedure that can forget a class given only a single example of that class (for cases where the original training data is gone). Efficiency: efficient one-shot unlearning at inference time (meta-training is done in advance). Impact on Accuracy: moderate trade-off on that class versus having full data available; existing methods struggled if the provided image is very out-of-distribution 34. Notes/Use Cases: developed for face identity unlearning when user data cannot be retained post-training; defines the 1-SHUI benchmark for one-shot unlearning of personal data 33.

• Certified Removal (Guo et al., 2020) 25 – Type: Approx (theoretical guarantee). Key Idea: define unlearning so that the resulting model is probabilistically indistinguishable from a model trained without the data; often uses noise (differential privacy techniques) to guarantee removal. Efficiency: typically requires retraining with DP or multiple model samples; not as fast as other approximate methods. Impact on Accuracy: small hit (due to noise in training) but offers a formal privacy guarantee. Notes/Use Cases: initially for simple models (logistic regression, SVM); the concept is being extended to deep nets, aiming at a rigorous “certificate” that data is forgotten 25.
(Table notes: “Efficiency” is relative to naive full retraining. “Impact on Accuracy” refers to the effect on the model’s
utility for retained data.)
This table highlights that no one-size-fits-all solution exists yet. Exact methods (like SISA or entanglement-
reduced training) preserve the ideal of complete forgetting but impose constraints or overhead either
upfront or at removal time. Approximate methods (like scrubbing, boundary shifting, or distillation)
dramatically cut down unlearning cost – often by an order of magnitude or more – at the cost of needing to
validate that the forgetting is sufficient (since they don’t literally reproduce the scratch-trained model, slight
traces or differences could remain). Many cutting-edge techniques for CV (Boundary Unlearning, SCAR, GS-
LoRA) focus on class-level unlearning (forgetting all samples of a particular class) as a worst-case scenario
that is highly relevant (e.g. a person’s identity in face recognition, or a class of images users object to). We
now delve into each CV subdomain to see how these techniques are applied and what progress has been
made.
One line of work aims to expedite retraining by design. The SISA framework (2021) is a prime example: by
partitioning the training data into $n$ shards and training $n$ sub-models whose results are aggregated,
the impact of any single data point is limited to one shard’s sub-model 8 . To unlearn that point, only the
sub-model on that shard needs retraining (plus a lightweight aggregation update). This yielded substantial
speedups in practice – e.g., 4.63× faster unlearning on a Purchase intent dataset and 1.36× faster on
ImageNet, compared to full retraining 21 . The downside is a slight hit in accuracy (since each sub-model
sees less data) and more complex training. Building on this idea of modular training, Lin et al. (CVPR 2023)
introduced ERM-KTP, which doesn’t use shards but instead trains a single CNN with special masking to
reduce inter-class entanglement 22 . At training time, neurons are encouraged to specialize so that each
class’s features can be somewhat decoupled. At unlearning time, ERM-KTP transfers knowledge of all non-
target classes to a new model and suppresses the target class’s knowledge using a mask, effectively “zeroing
out” that class’s contribution 29 . This approach was shown to be both effective and interpretable, since the
mask explicitly highlights which parameters encoded the forgotten class 24 . Such methods still involve
retraining a model (in ERM-KTP’s case, training a new model via knowledge transfer), but they confine the
scope of retraining to a minimal necessary subset of knowledge.
A more direct approach in classification is to update the trained model’s parameters to forget specific
data. Here, influence-function-based scrubbing techniques have shown promise. Golatkar et al.’s work on
fisher-informed unlearning demonstrated that by adding a tailored noise to the weights (proportional to each
weight’s importance for the forgotten data), a classifier’s performance on that forgotten class can be
drastically reduced without a full retrain 13 . For instance, if a CIFAR-10 model should unlearn the “cat”
class, one can compute which weight changes would occur if “cat” images were removed (using a second-
order approximation) and then apply those changes directly. The result is a model that hardly recognizes
cats (as desired) but still recognizes the other 9 classes almost as well as before. However, one challenge is
calibrating the perturbation: too little and traces of the data remain; too much and you degrade overall
accuracy. Recent scrubbing methods like SCRUB (2023) automate this process and even personalize it to the
unlearning objective (e.g., different criteria for “forgetting” in privacy vs. bias removal scenarios) 15 . These
methods report high forget accuracy (the model performs as poorly on the forgotten data as a scratch
model would) while maintaining strong accuracy on retained classes 15 .
Another breakthrough in 2023 was Boundary Unlearning for deep classifiers 36 . Chen et al. observed that
to forget a class, one doesn’t necessarily need to alter every weight in the network – it can suffice to adjust
the decision boundary in the final layer of the classifier 27 . They proposed two strategies, Boundary Shrink
and Boundary Expanding, that effectively push the forgotten class’s decision boundary so that the model no
longer confidently predicts that class 37 38 . In practical terms, if “class X” is to be forgotten, Boundary
Unlearning will retrain only the classifier layer (and possibly a few adjacent layers) for a few epochs, without
using any “class X” data, until the model’s behavior aligns with a retrained-from-scratch model. This was
found to be extremely fast – forgetting one class in CIFAR-10 took only a small fraction of full retraining
time, achieving about a 17× speed-up, while successfully making the model’s accuracy on that class plummet to
chance levels 12 . Importantly, utility on the other classes remained high (the paper reports minimal change
in overall test accuracy) 39 . The reason it works is that in the retrained-from-scratch model (the gold
standard of forgetting), the forgotten class’s samples typically end up either misclassified uniformly or
mapped to the boundaries of other class clusters 40 . Boundary Unlearning explicitly mimics this by pushing
those samples to the borders of other classes in feature space, without perturbing the internal feature extractor
much 41 . This method is particularly apt for class-level unlearning where an entire class label must be
dropped from a classifier.
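A simplified sketch of the boundary-shrink idea follows: forget-class images are relabeled with the model's own runner-up prediction and only the final classification layer is fine-tuned. The original method selects the nearest incorrect class via adversarial perturbation of the inputs, so the runner-up heuristic, and the names `model.fc`, `forget_loader`, and `forget_class`, are illustrative assumptions.

```python
# Simplified sketch in the spirit of Boundary Shrink (Chen et al., 2023): forget-class
# images are relabeled with the model's own runner-up prediction and only the final
# classification layer is fine-tuned. The original method finds the nearest incorrect
# class via adversarial perturbation of the inputs; the runner-up heuristic and the
# names `model.fc`, `forget_loader`, `forget_class` are illustrative assumptions.
import torch
import torch.nn.functional as F

def boundary_shrink(model, forget_loader, forget_class, epochs=5, lr=1e-3):
    # Freeze everything except the decision boundary (the last linear layer).
    for p in model.parameters():
        p.requires_grad_(False)
    for p in model.fc.parameters():
        p.requires_grad_(True)
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)

    model.train()
    for _ in range(epochs):
        for x, _ in forget_loader:
            logits = model(x)
            # "Nearest" incorrect class: mask the forgotten class, take the argmax.
            masked = logits.detach().clone()
            masked[:, forget_class] = float("-inf")
            pseudo_labels = masked.argmax(dim=1)
            loss = F.cross_entropy(logits, pseudo_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```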
In 2024, knowledge distillation techniques came to the forefront with SCAR. Many prior approximate
unlearning methods rely on keeping some of the original data as a “retain set” to restore the model’s
performance after removal (since fine-tuning or distilling on only the remaining data can be seen as a mini-
retraining) 42 . However, storing a large retain set undermines privacy (you’re holding on to user data
longer) and can be impractical if the forget requests are large. SCAR demonstrated that one can use public
or unrelated images (out-of-distribution data) as a proxy for the retain set during distillation 17 . By feeding
these images through the original model and new model and aligning the outputs (a form of zero-shot
knowledge distillation), SCAR retained the original accuracy without needing any original training image
after unlearning 19 . Meanwhile, it unlearns the target class by shifting its features toward other classes, as
mentioned. The results showed SCAR matched or exceeded prior methods that did use retain sets 43 ,
highlighting that clever use of auxiliary data can substitute for holding back a chunk of the training set.
In summary, for image classification, the state of the art by 2025 offers several effective unlearning
paradigms: from retraining-light methods (sharding, masking) to model editing methods (scrubbing weights,
shifting decision boundaries) to student-teacher methods (distillation with or without retain data). Core
challenges remain in this domain: (1) Verifying that the forgetting is complete – many works evaluate this by
measuring how the model performs on the forgotten class or via membership inference attacks to see if the
model still “remembers” the data 15 . If the model’s outputs or internal representations still inadvertently
encode the forgotten information (even if accuracy is low, an attacker might extract something), then
unlearning isn’t fully achieved. (2) Balancing accuracy vs. forgetting: Methods like Boundary Unlearning and
SCAR explicitly emphasize retaining accuracy on non-forgotten classes 41 16 . This is crucial because a
naive approach to forgetting (e.g., simply remove the last layer for a class or fine-tune on all other classes
only) can lead to catastrophic forgetting of other classes – the model might degrade on everything, which is
not acceptable. Techniques such as regularization, knowledge transfer, and selective parameter updates are
employed to avoid this pitfall. (3) Scalability: Forgetting one class from CIFAR-10 is one thing; forgetting
arbitrary subsets of hundreds of classes from ImageNet-scale models is another. Ongoing research
addresses scaling up these methods (for example, making scrubbing and distillation more efficient for
billions of parameters). Despite these challenges, the progress in 2023–2025 indicates that efficient and
quality-unlearning for image classifiers is becoming feasible, paving the way for deployment in real systems
where users might request class removals or withdrawal of their images.
One of the latest advances came from Zhao et al. (CVPR 2024) in their GS-LoRA approach, which addressed
continual unlearning for large pre-trained vision models across tasks including object detection 32 . They
use a pretrained transformer-based detector and attach small LoRA (Low-Rank Adaptation) modules to it.
For each forgetting request (say, “forget class Car”), they fine-tune a set of LoRA modules to specifically
erase the model’s ability to detect that class, while freezing the rest of the model’s weights 44 . A group-
sparse regularization is applied so that only a subset of the LoRA parameters actively alter the model (those
associated with the Car class), and others remain zeroed 45 . After this process, the LoRA module
encapsulating “Car” knowledge can be dropped. The result, as reported, is that the detector no longer detects
the forgotten class (specific objects), yet its performance on other object classes remains essentially
unchanged 32 . In their experiments, GS-LoRA could sequentially forget multiple classes (face recognition,
object detection, etc.) with minimal compounding error, showcasing a viable path for multi-round
unlearning in detectors 32 .
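The sketch below shows the two ingredients in loose form – a LoRA adapter around a frozen linear layer, and a group-lasso penalty over whole adapters – without claiming to reproduce the paper's exact objective; `base_linear` and the loss combination in the trailing comment are assumptions.

```python
# Minimal sketch of the two ingredients in a GS-LoRA-style update: a LoRA adapter
# wrapped around a frozen linear layer, and a group-lasso penalty over whole adapters
# so that only a few of them actually change for a given forget request. This is a
# loose PyTorch illustration, not the paper's exact formulation; `base_linear` and the
# loss combination in the trailing comment are assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + B (A x)."""
    def __init__(self, base_linear: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base_linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base_linear.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()

def group_sparse_penalty(lora_layers, weight=1e-3):
    """Group lasso over each whole adapter: unneeded adapters are driven to zero."""
    penalty = sum(torch.sqrt((m.A ** 2).sum() + (m.B ** 2).sum() + 1e-12)
                  for m in lora_layers)
    return weight * penalty

# A forget request would then minimize something like:
#   loss = forget_loss(target_class) + retain_loss(other_data) \
#          + group_sparse_penalty(lora_layers)
```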
Adapting unlearning techniques to detection often means dealing with both classification and
localization aspects. For example, if we want to unlearn the class “Car” from an object detector trained on a
driving dataset, it’s not enough to remove the “Car” output—the model should ideally also “forget” how to
draw proper boxes around cars. In practice, many detectors are two-stage (region proposal + classification)
or one-stage with combined outputs. Unlearning a class likely involves removing or altering the
corresponding neurons in the classification head and also perhaps adjusting how the model’s backbone
responds to that object’s features. Some possible approaches (by analogy to classification) include:
retraining the detection head without that class (which might break some bounding box regressors), or
using knowledge distillation where a teacher is a model that never saw that class. The GS-LoRA method
effectively handles this by fine-tuning on a task “remove class X”: the model is optimized such that any
proposals for class X are suppressed (perhaps reclassified as background), and only minimal changes occur
to other outputs. This addresses a key challenge: detection models share features across classes (e.g.,
edges, textures), so forcing the model to forget one class could degrade those shared features and hurt
other classes. By using a parameter-efficient fine-tuning isolated to the forgetting task, GS-LoRA ensures
the rest of the knowledge is preserved (no catastrophic forgetting of other classes) 46 .
Another potential approach, inspired by Boundary Unlearning, would be to adjust the decision boundaries
in the detector’s output space. For detectors like Faster R-CNN or SSD, one could imagine shifting the
confidence scores for the forgotten class down to zero for all proposals, perhaps by adjusting the bias
terms or classification layer weights for that class. Indeed, Boundary Unlearning was demonstrated on a face
detection scenario (which is a specialized object detection) with success 47 . However, one must also ensure
that the model doesn’t just mislabel those objects as a different class (a forgotten car should ideally not be
misdetected as, say, a bus either). In detectors, this might be handled by the background class absorbing
those instances. Some research may approach this by data augmentation: e.g., remove all annotations of the
target class and fine-tune the detector for a few epochs – effectively, the model will start treating those
objects as background. This is a simple unlearning-by-fine-tuning strategy, which is approximate but could
work in practice for one-off requests. The drawback is the potential for catastrophic forgetting of other
classes if not done carefully (the model might overfit to the limited data during fine-tuning). Techniques like
regularization or knowledge distillation from the original model (for the classes to retain) can mitigate that.
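A rough sketch of the unlearning-by-fine-tuning route is given below for a torchvision-style detector whose training forward pass takes images plus target dicts and returns a dict of losses; `forget_label`, `loader`, and the hyperparameters are assumptions, and a distillation term against the original detector (omitted) would normally be added to guard the retained classes.

```python
# Rough sketch of "unlearning by fine-tuning" for a torchvision-style detector whose
# training forward pass takes (images, targets) and returns a dict of losses. Dropping
# the forgotten class's boxes makes its instances fall into the background.
# `forget_label`, `loader`, and the hyperparameters are assumptions.
import torch

def strip_forgotten_class(targets, forget_label):
    """Remove boxes/labels of the forgotten class from a list of target dicts."""
    cleaned = []
    for t in targets:
        keep = t["labels"] != forget_label
        cleaned.append({"boxes": t["boxes"][keep], "labels": t["labels"][keep]})
    return cleaned

def finetune_to_forget(detector, loader, forget_label, epochs=1, lr=1e-4):
    params = [p for p in detector.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)
    detector.train()
    for _ in range(epochs):
        for images, targets in loader:
            targets = strip_forgotten_class(targets, forget_label)
            loss_dict = detector(images, targets)   # per-head detection losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return detector
```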
So far, the literature has one clear example (GS-LoRA) dealing with detectors. It showed that even large
pretrained detectors (e.g., Transformer-based) can be unlearned for specific classes efficiently 32 . We expect
more to follow. Some challenges unique to detection:
• Multi-label complexity: An image can contain both “to-forget” and “to-keep” objects. If a user says
“remove my data” and that person appears in an image with other objects, we face a dilemma. Does
one remove the entire image’s influence or somehow only remove the features corresponding to
that person? A fine-grained solution might require detecting the user’s instances and treating those
as the unit of forgetting. This could be an area where instance-specific unlearning is needed – possibly
first detect all instances of that user or class and then remove their influence while leaving the rest
of the image’s information intact. This granularity is largely unexplored and quite complex, as it
verges into explainable AI territory (the model would need to “know” which part of its weights
corresponded to that instance’s features).
• Performance metrics: In classification, forgetting can be measured by the drop in accuracy on the forget class. In detection, the natural analogue is the drop in AP (average precision) for the forgotten class, which should fall to zero or chance level. Zhao et al. presumably verified that GS-LoRA drives AP for forgotten classes to near zero while leaving AP for other classes essentially unchanged. Ensuring that forgotten objects are not simply misdetected as something else (e.g., a forgotten “Car” reappearing as a false-positive “Truck”) is tricky – one might need a secondary check, or a way to ensure that the features for “Car” are not merely relabeled. One idea is to insert a temporary “tombstone” class that absorbs those detections and is then eliminated, as a way to isolate the forgotten knowledge.
In conclusion, object detection unlearning is still emerging. The key insight from recent work is that
structural approaches (like adding trainable components per class or per request) can handle forgetting
without retraining the whole detector. Looking forward, integrating unlearning into the training of detectors
(so they are “unlearnable by design”) and handling instance-level forgetting are important next steps. But as
of 2025, we have proof-of-concept that even complex detection models can forget selectively – an ability
that might be crucial for applications like removing a particular person’s data from surveillance models or
complying with requests to stop detecting certain objects (for ethical or legal reasons).
An early adaptation of unlearning to face recognition is found in works like Boundary Unlearning (2023),
which explicitly tested their method on VGGFace2, a large face dataset 48 12 . By using their decision
boundary shifting technique, they were able to make a face recognition model “forget” a particular identity
class with a 19× speed improvement over full retraining 49 . In practical terms, if the model originally could
recognize Person A among others, after unlearning Person A, the model’s accuracy for Person A dropped to
essentially 0%, and any images of A would either be rejected or misidentified (as they would in a model that
never saw A) 12 . Meanwhile, recognition accuracy for all other people remained high. This kind of class-
level removal is very suitable for FR, since a user’s request “delete my data” translates to “remove class
[User] from the classifier”.
However, face recognition introduces some unique challenges. Often, FR models are not used as explicit
classifiers at inference time; instead they produce an embedding (a feature vector) for each face, and
recognition is done by comparing embeddings (e.g., via cosine similarity) to see if they match known
identities. Unlearning an identity in this scenario means the model’s embedding for that person’s face
should no longer be distinctive or close to that person’s previous embeddings. If the model had a centroid
for Person A in embedding space, ideally that centroid is gone or merged with others. Techniques like
weight scrubbing can be applied here: e.g., if the final layer is a fully-connected layer with one neuron per
identity (common in training FR with softmax), scrubbing or zeroing out Person A’s neuron and any
associated weights can be a start (this is like removing the class). But one must also consider earlier layers
that might have carved features specialized for Person A (perhaps Person A had a unique hairstyle the
model latched onto – those feature detectors might need adjustment if they solely contributed to
identifying A).
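The final-layer part of that surgery is simple enough to sketch directly: the row (prototype) of the identity to forget is dropped from the classification head. `model.head` and `identity_idx` are assumed names, and as noted this alone does not remove identity-specific features in earlier layers.

```python
# Minimal sketch of the final-layer surgery described above: drop the row (prototype)
# of the identity to forget from a per-identity classification head. Assumed names:
# `head` is the model's nn.Linear output layer, `identity_idx` the class index of the
# person to forget. Earlier-layer features may still encode the identity, so this is
# only a starting point, usually combined with scrubbing or fine-tuning.
import torch
import torch.nn as nn

@torch.no_grad()
def drop_identity_prototype(head: nn.Linear, identity_idx: int) -> nn.Linear:
    keep = [i for i in range(head.out_features) if i != identity_idx]
    new_head = nn.Linear(head.in_features, len(keep), bias=head.bias is not None)
    new_head.weight.copy_(head.weight[keep])
    if head.bias is not None:
        new_head.bias.copy_(head.bias[keep])
    # Note: the remaining class indices shift down after the removed row.
    return new_head

# Hypothetical usage: model.head = drop_identity_prototype(model.head, idx_of_person_a)
```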
A very recent development addressing a critical real-world scenario is the One-Shot Unlearning of
Personal Identities (1-SHUI) task proposed by De Min et al. (2025) 33 . They recognized that in practice,
due to privacy regulations, a company might delete the original training data once the model is trained (to
comply with data minimization). If a user later asks to be forgotten, the company might no longer have that
user’s training images on hand to help unlearn – a setting where most existing unlearning methods would
fail, since they often rely on retraining or at least having some data to tune on. 1-SHUI addresses this by
assuming the user can provide one personal image (a “portrait picture”) of themselves to aid in unlearning
50 . Essentially, the user says: “Here’s my photo, please make sure your model forgets me.” The researchers
created benchmarks on CelebA and related face datasets to simulate this and found that existing unlearning
methods struggled when only one image (and no original training set) is available 34 . They then proposed
MetaUnlearn, a meta-learning based approach where the FR model is trained in a way that it can quickly
unlearn a class given just one example of that class 51 . During meta-training, the model learns how to
remove a class by essentially doing many “leave-one-class-out” episodes. The result is a model that contains
an internal procedure (or a small side-network) which can take a single image of the target identity and
adjust the main model’s weights to forget that identity. MetaUnlearn proved more effective than naive fine-
tuning in this one-shot scenario, but the findings also highlighted a challenge: if the provided image is not
very representative of the variations of that person’s face (for instance, the training data had the person in
diverse poses and the provided photo is just one frontal shot), unlearning may be incomplete 34 . The
model might forget the face as seen in that one photo, but could still recognize the person in a different
pose (because that information wasn’t countered by the single image). This points to an inherent tension:
without access to all of a person’s data, can we robustly make a model forget them? It suggests future work
might involve generating additional views of that person from the one photo (using generative techniques) to
better erase them – a possible extension we discuss later.
Apart from one-shot scenarios, other approaches to face recognition unlearning leverage the fact that
these models often have identifiable components tied to identities. For example, some FR models use a loss
like ArcFace which encourages each identity to occupy a distinct region in feature space. Unlearning an
identity could be approached by removing that region – e.g., adding noise to any feature vector close to
that identity’s centroid, or fine-tuning the model on a mix of other identities such that Person A’s centroid
drifts into another cluster (making it non-discriminative). The SCRUB algorithm mentioned before was
evaluated in different contexts and could be applied to face recognition as well: by formulating the objective
that an attacker should not be able to tell the model was trained on Person A’s data 15 (perhaps via
membership inference attack success as a metric). Indeed, the privacy guarantee notion in unlearning is very
fitting for face data: we want to guarantee that the model doesn’t leak any information about the forgotten
individual. This might involve rigorous testing with model inversion attacks (which attempt to reconstruct
images from the model) to ensure that no recognizable feature of the person can be extracted.
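One way the “drift the centroid” idea could be realized for an embedding-based FR model is sketched below; all names (`model`, `original`, `centroid_a`, the margin and weighting) are illustrative assumptions rather than a published method.

```python
# Illustrative sketch (not a published method) of embedding-space identity forgetting:
# fine-tune so Person A's images lose similarity to A's stored centroid, while a
# distillation term against the frozen original model keeps other identities'
# embeddings in place. Assumed names: `model` and `original` map face crops to
# embeddings, `centroid_a` is A's mean embedding, `forget_batch` holds A's images,
# `other_batch` other people's images.
import torch
import torch.nn.functional as F

def forget_identity_step(model, original, centroid_a, forget_batch, other_batch,
                         optimizer, margin=0.3, lam=1.0):
    model.train()
    original.eval()
    # 1) Push A's embeddings away from A's old centroid (down to a similarity margin).
    emb_a = F.normalize(model(forget_batch), dim=1)
    sim_to_a = emb_a @ F.normalize(centroid_a, dim=0)
    forget_loss = F.relu(sim_to_a - margin).mean()
    # 2) Keep everyone else's embeddings where the original model put them.
    emb_o = F.normalize(model(other_batch), dim=1)
    with torch.no_grad():
        emb_ref = F.normalize(original(other_batch), dim=1)
    retain_loss = (1.0 - (emb_o * emb_ref).sum(dim=1)).mean()
    loss = forget_loss + lam * retain_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```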
In terms of results reported, beyond the Boundary Unlearning speed-ups (17× for CIFAR-10, 19× for
VGGFace2) 28 , many works show qualitative evidence: e.g., before unlearning, the model confidently
identifies Person X; after unlearning, the model either misidentifies them as someone else or rejects the
identification (which is what we want, effectively). A balanced unlearning in FR would mean: Person X’s
images yield low confidence for X and perhaps get distributed among various other identities at random (or
have high entropy in prediction), similar to what a scratch-trained model (that never knew X) would do 40
52 . Meanwhile, for all other persons Y, the model should behave unchanged as if X was never there
(except that if X was a possible mistaken identity for Y before, those mistakes might even reduce).
A particular challenge in FR is that people’s faces share features. Forgetting one individual might be harder
if they look very similar to another individual in the dataset – the model might have a single feature that
triggers on both. Unlearning one could unintentionally affect the other. This hasn’t been deeply explored
yet, but it’s analogous to class entanglement issues and could be handled with methods like ERM
(decoupling features) or careful fine-tuning that checks collateral damage on similar identities.
In summary, face recognition unlearning research in 2023–2025 has solidified the feasibility of class-level
unlearning (for identities) and broken ground on the much tougher limited-data unlearning scenario. With
regulations such as the EU’s GDPR and various biometric privacy laws, these techniques are likely to be
critical. It’s worth noting that the Federal Trade Commission (FTC) in the U.S. has already mandated at least
one company to delete algorithms/models derived from unlawfully obtained face data 53 – essentially
enforcing machine unlearning. The techniques discussed (boundary adjustment, meta-unlearning, etc.)
could be directly applicable to comply with such orders. The key is to do so efficiently (companies may have
to remove one person at a time upon request, rather than retrain on millions of faces from scratch every
time) and provably (showing regulators or users that “yes, we really removed your data’s influence”).
• Avoiding Catastrophic Forgetting vs. Sufficient Forgetting: There’s a fine line between selectively
forgetting the target data and inadvertently degrading the model’s knowledge of everything else.
Many unlearning methods explicitly aim to confine the changes to only what’s necessary for
forgetting. For example, GS-LoRA emphasizes that the forgetting procedure should have minimal
impact on remaining knowledge 46 . If one naively fine-tunes a model on the dataset with the
sensitive data removed, the model might overfit or shift decision boundaries, harming accuracy on
other classes (a form of catastrophic forgetting). The challenge is that in neural networks, knowledge
is entangled – completely isolating one data point’s influence is hard. Techniques like regularization
(e.g., group sparsity in GS-LoRA to only adjust certain parameter groups 45 ) and distillation
(transferring knowledge to a student except for the part to forget) are used to tackle this. The trade-
off often comes as a parameter: stronger forgetting (making sure even nuanced influences of that
data are gone) can cause a larger perturbation to the model, possibly affecting other outputs. On
the other hand, cautious forgetting (small changes) might preserve accuracy but leave some traces.
Tuning methods to find that sweet spot is non-trivial.
• Accuracy vs. Privacy/Forgetting Quality: This is arguably the core trade-off. An ideal unlearning
removes all information of the forgotten data (privacy maximized) while preserving the model’s
performance on the rest (utility maximized) 41 . In practice, methods lean towards one side. Exact
retraining maximizes both (in theory, you lose nothing except what’s needed), but it’s expensive.
Approximate methods sometimes introduce a slight utility loss; for instance, SISA reported a small
degradation in accuracy when using many shards 54 , and adding noise for certified removal
definitely can hurt accuracy. Boundary Unlearning showed it could achieve both good utility and
privacy guarantees with minimal cost 12 55 , but that was for a specific scenario (one class
removal). When forgetting multiple items or a large portion of data, it becomes harder to not
degrade utility. Researchers often measure this trade-off by plotting “forgetting success” vs “accuracy
retained.” Forgetting success might be measured by the model’s error rate on the forgotten data
(higher is better for forgetting) or by membership inference advantage (lower is better). There is
often a point of diminishing returns: trying to drive the forgotten data performance all the way to
random often causes a noticeable drop in overall accuracy. Some approximate methods thus settle
for statistical indistinguishability where the model is almost as good as retrained (but not exactly) 11 .
The notion of $\epsilon$-Certified Removal encapsulates this: it allows an $\epsilon$ difference
between the unlearned model and a retrained model. The smaller $\epsilon$ is (more privacy),
typically the larger the impact on accuracy or training time.
• Computational Cost vs. Completeness: Unlearning algorithms must be efficient – GDPR and other
laws imply data should be removed promptly. If retraining a detector takes 2 weeks, that’s not
practical whenever a single user opts out. Many studies report the time saved compared to retraining
as a key metric 56 . For instance, Boundary Unlearning achieving a 17× speed-up means a task that
took hours could take minutes 28 . However, the most computationally efficient methods (like one-
shot weight updates) may not achieve the most complete forgetting. There’s a trade-off between
doing a quick-and-dirty job vs. a thorough job. Some scenarios might accept a quick partial
unlearning as interim (perhaps followed by periodic full retrains offline to “true-up” the model). In
critical cases, though, one might favor a slower but guaranteed removal. The challenge is to push
methods to be both fast and complete. Advances like using OOD data (SCAR) reduce cost by not
requiring storing and handling large retain sets 19 , and using small adapter modules (LoRA) means
you fine-tune only thousands of parameters, not millions 45 . These innovations improve the cost
side of the equation significantly.
• Verification of Unlearning: How can we verify that a model has truly forgotten specific data? This is
a challenge technically and from a governance perspective. Two types of verification exist: invasive
(requiring model owners to provide some proof, possibly by instrumenting the model) and non-
invasive (third-party tests treating the model as a black box) 57 58 . In research, a common
verification is to retrain a reference model without the data and compare the unlearned model to it
on various evaluations. If they match closely (e.g., predictions or confidences align), then unlearning
is successful 59 . Membership inference attacks serve as another test: if an attacker cannot tell
whether the data was in the training set or not after unlearning, that’s a good sign of forgetting (a simple sketch of such a check appears after this list).
Some works have even proposed watermarking the data: e.g., inserting a hidden trigger during
training that only influences the target data, and then after unlearning check if the trigger effect is
gone 60 . The challenge for model providers is that providing full transparency (like giving out the
model weights for inspection) might not be feasible or secure. Certified removal offers one solution
by giving a mathematical proof, but implementing that in deep nets is hard. So, a practical trade-off
emerges: simpler unlearning methods might be easier to verify (e.g., if you literally retrained part of
the model, you know what you did), whereas complex approximate methods might require extensive
evaluation to convince skeptics.
• Limited Access to Data for Unlearning: As noted in the face recognition discussion, sometimes the
original training data or even a portion of it may no longer be available when an unlearning request
comes in 61 . Privacy best practices encourage deletion of raw data once models are trained (or at
least not keeping a hold-out copy without reason). This creates a challenge: many unlearning
methods implicitly assume you have the dataset (to either do a retraining, fine-tune on retain data, or
at least test against a retrained model). Without the data, some strategies break. The 1-SHUI
scenario explicitly highlights this and is forcing researchers to think of creative ways (like user-
provided data or generating synthetic data) to perform unlearning 33 . SCAR’s approach of using
unrelated images to maintain performance is one clever workaround for not storing original data
17 . Another idea is to store some distilled summary of the data (not the raw images, but say
embeddings or gradient information for the original model) that could help guide unlearning;
however, even that could pose privacy concerns if it indirectly contains info on individuals. Therefore,
there is a trade-off between data retention (for ease of unlearning) and data minimization (for
compliance and security). Solutions that don’t require any original data (like self-forget versions of
algorithms 30 ) are especially valuable.
• Data Protection Laws and Timelines: Legal requirements add external constraints. GDPR’s right to
be forgotten requires erasure “without undue delay”, and erasure requests must generally be handled
within about one month. If a model is
powering a live service (say a photo app with face recognition), one can’t afford to take it offline for
days to retrain it each time someone opts out. Thus, unlearning methods need to be on-demand and
fast. Many works cite this motivation explicitly 62 . Another aspect is auditability: if challenged
legally, a company might need to demonstrate that they indeed removed the person’s data from the
model. This ties back to verification: having a certificate or at least a clear procedure helps.
Additionally, some laws (like proposed AI regulations) might require documentation of any machine
unlearning performed, treating it as a model edit. From a deployment perspective, one challenge is
implementing an unlearning pipeline in production that can intake a request, locate the relevant
data and model components, apply the unlearning algorithm, and redeploy the model – all perhaps
seamlessly to users. This involves not just ML challenges but also engineering ones (model
versioning, ensuring no requests are served by an un-processed model, etc.).
• Complex Model Architectures: Modern CV models (think of ResNet backbones, FPNs in detectors,
ViT, or even bigger multimodal models) are complex, making it hard to pinpoint where the
information about a specific training sample lies. Unlike a database where a particular row
corresponds to a user’s data, neural networks smear information across many weights. The non-
convex nature of deep learning also means we can’t directly apply some of the nice mathematical
tools developed for convex models 14 63 . This complexity is a fundamental challenge; it’s why
approximate methods that treat the model as a black box (e.g., focusing only on outputs or last-layer
features) sometimes succeed – they sidestep the messy internals. But for more rigorous unlearning,
one might need to dive into those internals (like influence functions do). The trade-off here is
between generality and specificity: methods that don’t care about model architecture (distillation,
fine-tuning) can be widely applied but may not be the most efficient or certifiable. Methods that
exploit structure (like LoRA for transformers, or linearity in last-layer, or dual form of SVM, etc.) can
be more exact or faster, but they may need to be re-developed for each new architecture innovation.
For instance, an unlearning method tailored to CNNs might not directly work for vision transformers,
which is why GS-LoRA’s innovation to use LoRA modules is significant for transformer-based models
44 .
• Attacks and Robustness: An often overlooked but important challenge is: could an adversary
manipulate an unlearning system? For example, if a malicious user repeatedly requests unlearning of
certain data, could that degrade the model or cause denial-of-service? Some research has
considered worst-case sequences of unlearning queries (which motivated the continual unlearning
problem setup 64 ). Another angle: if the unlearning method is known, an attacker might try to infer
what data was removed by examining changes to the model (a kind of side-channel). Ideally,
unlearning should not reveal which data was deleted (that would itself be a privacy leak for others).
Certified indistinguishability addresses this by saying the model after unlearning looks like one of
many models trained on any subset of data of that size 25 , so you can’t tell which point was
removed. Maintaining such robustness is a challenge, especially as models might be exposed via
APIs where differences could be probed.
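As a concrete example of the non-invasive route, the sketch below implements the simplest membership-inference-style audit referenced in the verification item above; `forget_loader`, `unseen_loader`, and the threshold are assumptions, and real audits would use stronger attacks or the retrained-reference comparison.

```python
# Minimal sketch of a black-box audit: a confidence-threshold membership inference
# test comparing the unlearned model's behavior on the supposedly forgotten samples
# against held-out samples it never saw. Advantage near zero is weak evidence of
# forgetting. `forget_loader`, `unseen_loader`, and the threshold are assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def true_label_confidences(model, loader):
    model.eval()
    scores = []
    for x, y in loader:
        probs = F.softmax(model(x), dim=1)
        scores.append(probs.gather(1, y.unsqueeze(1)).squeeze(1))
    return torch.cat(scores)

def membership_advantage(model, forget_loader, unseen_loader, threshold=0.5):
    """Crude attack: predict 'member' when the true-label confidence exceeds a
    threshold. Returns TPR - FPR; large values suggest residual memorization."""
    f = true_label_confidences(model, forget_loader)
    u = true_label_confidences(model, unseen_loader)
    tpr = (f > threshold).float().mean()   # forgotten samples flagged as members
    fpr = (u > threshold).float().mean()   # unseen samples flagged as members
    return (tpr - fpr).item()
```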
In summary, machine unlearning in CV is a balancing act. Effectiveness, efficiency, and accuracy form a
sort of “triangle” where improving two can be at the expense of the third 65 . The recent research strives to
get most of the way on all three: e.g., fast unlearning (efficiency) that thoroughly removes data influence
(effectiveness) and keeps model performance (utility) high. Techniques like Boundary Unlearning and SCAR
show this is possible in specific cases, but general solutions are still being refined. Verification and
compliance add another layer, ensuring that these algorithms can be trusted in practice. As we look ahead,
these challenges will drive new research, some potential directions for which are outlined next.
• Towards Certified Unlearning for Deep Models: While the concept of certified removal has been
introduced for simple models (e.g., convex models with differential privacy guarantees) 25 ,
extending strong formal guarantees to deep neural networks is largely uncharted. Future work could
explore techniques like probabilistic verification of unlearning – for example, using ensembles of
models or dropout-based Bayesian approximations to statistically confirm that a particular data
point’s influence is gone. Another angle is to incorporate differential privacy during training such that
unlearning any single sample would only cause a bounded change in the model (making post-hoc
removal easier to certify) 25 . Achieving certified unlearning in CV would give users and regulators
high confidence, but it likely requires new theoretical breakthroughs (perhaps combining insights
from robustness verification and privacy). What’s needed: development of metrics that upper-bound
the “influence” of a data point on a deep model’s predictions (for instance, a deep analog of leverage
scores), and algorithms that can efficiently adjust the model within those bounds. This might involve
advances in understanding loss landscapes – making sure that after removal, the model is in a
parameter region that could have been reached without that data.
• Model Editing for Large Pre-trained Models: Web-scale foundation models (e.g. CLIP) cannot be retrained for every deletion request, so concept-level model editing is a natural direction. For
example, if a foundation model knows the face of Celebrity X (because it was in the training data), an
editing approach could identify neurons highly activated by that face and modify them. This would
require tools to localize concepts within massive models. What’s needed: new research that applies
concept attribution or network dissection methods to identify where in a model certain visual
knowledge is stored, and then methods to modify or erase that specific knowledge without a full
retraining. We might also see hybrid human-in-the-loop approaches, where the model owner could
intervene (for instance, CLIP could be prompted or guided to forget by “unlearning prompts” that
represent the concept to remove).
• Fine-Grained and Partial Unlearning: Most current works consider forgetting entire training
instances or whole classes. But what if only part of an image or certain attributes need to be
forgotten? For instance, maybe a dataset was used to train a model to detect people, and now we
want the model to forget specifically the attribute of people’s race (to mitigate bias), while still
detecting people generally. This is a form of targeted unlearning that isn’t just “forget this subset of
images” but “forget this aspect of the data.” Achieving this could involve adversarially training the
model to be invariant to the attribute (a sketch of this adversarial route appears after this list), or editing the feature space so that information about that
attribute is scrambled. Another example: in an autonomous driving dataset, suppose all images in a
city had a particular billboard that should be forgotten (maybe it contained personal info). One
wouldn’t remove the whole image, just that part. Fine-grained unlearning might leverage
segmentation: first detect the regions of images to forget, then tune the model to reduce its sensitivity
to those specific pixels. This is somewhat related to unlearning features vs. labels as studied by
Warnecke et al. (NDSS 2023) 66 – they introduced a method to unlearn specific features from a
model without removing entire instances. Extending such ideas, future methods could allow a user
to specify which part of their data they want removed (like “forget my face but you can keep using the
rest of the photo that had a landmark”). What’s needed: a combination of interpretability (to find
model components related to the attribute or region) and targeted training (e.g., if the model has a
bias neuron or an attribute classifier, one could zero it out or invert gradients on that attribute).
Research into knowledge partitioning in neural nets will help here – if we can isolate the sub-
representation for a particular concept, we can attempt to remove it. This fine-grained control would
broaden unlearning applicability (not all or nothing, but selective forgetting of certain content in
images).
• Continual and Interactive Unlearning Systems: Zhao et al.’s work on continual forgetting 64 opens
the door to models that can handle a sequence of unlearning (and perhaps learning) tasks. In real
deployments, requests may come one by one over time. A promising direction is to integrate
continual learning and unlearning: a model that can both learn new classes and forget others on the
fly. Imagine a surveillance system that regularly updates its model with new people (who opt in) and
drops people who opt out. Current continual learning algorithms focus on not forgetting old tasks;
here we want to intentionally forget some while learning others – a dual problem. Methods like GS-
LoRA already show a way to accumulate “forget” modules; one could similarly accumulate “learn”
modules. The model could then become a dynamic entity that grows or prunes itself as needed.
Another aspect is interactive unlearning: perhaps a user might not be satisfied with one round (“I
think the model still recognizes me in profile view, please forget that too”). Tools that allow iterative
refinement of what is forgotten (maybe via a user-provided test: “here’s an image, does the model
still identify it?”) could be part of future systems. What’s needed: research into the stability of
repeated unlearning – ensuring that forgetting A then B is the same as forgetting B then A
(commutativity, which might not hold and could be problematic), and that after many additions and
deletions, the model doesn’t become a frankenstein with degraded performance. Some theoretical
groundwork on unlearning in an online learning setting would be valuable. Practically, developing
efficient retraining schedules (perhaps using experience replay for knowledge to keep, and
something like “experience anti-replay” for knowledge to drop) might allow a unified framework for
model updates.
• Integration of Generative Models in Unlearning: Generative models can play a dual role – as
targets for unlearning (e.g., a GAN that has learned a person’s face should forget it, which is another
can of worms), and as tools to aid unlearning. The latter is intriguing: a generative model could
create synthetic data that either stands in for the data we want to keep (as SCAR did with OOD
images) or stands in for the data we want to remove (to guide the forgetting). For example, if only
one image of a person is available, one could use a generative face model to create many variations
of that person (different poses, lighting) and then fine-tune the FR model to forget all those
generated images – likely resulting in a more complete forgetting of that identity. Similarly, to
unlearn a certain object class, one could generate many images of that object with a generative
adversarial network and use them with a negative learning objective (like “these images should all
be classified as ‘unknown’”). This could accelerate forgetting and ensure thoroughness. What’s
needed: careful use of generative models so as not to leak the very information we want to erase
(using a person’s face generator to forget them is counter-intuitive if that generator was trained on
them; better to use an independently trained generator or one guided by the user’s own data
provided for deletion). Research could explore data augmentation for unlearning – akin to how
data augmentation helps learning generalize, here it could help forgetting generalize (covering
variations of the data so that all traces are gone). There is some early work in the context of
unlearning data biases using generated counterfactuals 67 , which shows removing biased correlations
via unlearning can be effective. Extending that, generative scrubbing approaches (like Zhang et al.
2022 for image retrieval, who trained a generator to produce noise that specifically corrupts certain
feature directions 68 ) might be applied to standard vision tasks as well.
• Standardized Benchmarks and Evaluation Protocols: As the field matures, there is a growing need
for common benchmarks to evaluate unlearning methods in CV. The introduction of 1-SHUI for face
data 33 is an example, focusing on the toughest scenario of no training data. We can envision
creating benchmark suites for, say, classification unlearning (with datasets like CIFAR-10 or ImageNet
where specific classes must be unlearned, measuring accuracy drop vs forgetting success, maybe
with membership inference metrics 15 ), or detection unlearning (where a COCO subset’s class must
be dropped, measuring before/after AP). Perhaps an adaptation of the “Catastrophic Forgetting”
evaluations used in continual learning could be used, but in reverse (intentional forgetting but
measured collateral damage). Moreover, evaluation metrics need to be refined: accuracy on
forgotten class and retained classes, yes, but also measures like forgetfulness index (how
indistinguishable the model is from a scratch model, possibly measured by some distance in output
space or by training a discriminator to tell them apart). Another metric is an efficiency index – time or
computational cost normalized by model/train size. Having agreed-upon metrics will help compare
methods fairly. The community has even begun to hold competitions (e.g., the NeurIPS 2023
Kaggle competition on machine unlearning 69 ) which can greatly accelerate development. What’s
needed: cooperation to define these benchmarks. This includes selecting datasets, defining what
constitutes a “forget request” scenario (single image? single class? multiple random points?), and
what threat model is assumed (honest but curious tester vs malicious attacker). The benchmarks
should include both utility and privacy tests – e.g., membership inference attack results before and
after unlearning as part of the score.
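For the attribute-level forgetting discussed in the fine-grained unlearning item above, one plausible route is adversarial invariance training with a gradient reversal layer, sketched here in generic form; `backbone`, `task_head`, `attr_head`, and the batch format are assumptions, and this is a generic technique rather than a method from the surveyed papers.

```python
# Illustrative sketch of adversarial attribute removal with a gradient reversal layer
# (domain-adversarial style): an auxiliary head tries to predict the sensitive
# attribute from the backbone's features, while reversed gradients push the backbone
# to strip that information. Assumed names: `backbone` maps images to features,
# `task_head` predicts the task label, `attr_head` predicts the sensitive attribute.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through reversed (and scaled); no gradient for lambd.
        return -ctx.lambd * grad_output, None

def adversarial_step(backbone, task_head, attr_head, batch, optimizer, lambd=1.0):
    x, y_task, y_attr = batch
    feats = backbone(x)
    task_loss = F.cross_entropy(task_head(feats), y_task)
    # The attribute head learns to predict the attribute, but the reversed gradient
    # pushes the backbone to make its features uninformative about it.
    attr_loss = F.cross_entropy(attr_head(GradReverse.apply(feats, lambd)), y_attr)
    loss = task_loss + attr_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task_loss.item(), attr_loss.item()
```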
In conclusion, the trajectory of machine unlearning in computer vision is set to make models more flexible,
transparent, and aligned with ethical/legal requirements. By tackling the challenges above, future
researchers will enable models that can learn from data when allowed and forget that data when required. This
will likely involve interdisciplinary work – bridging machine learning engineering, theoretical computer
science, security, and law. The extensions outlined aim to make unlearning more comprehensive (handling
any scenario of forgetting), more reliable (with guarantees and verifiability), and more integrated (part of
the standard ML lifecycle). Given the rapid progress from 2023 to 2025, it’s an exciting time where concepts
once deemed “too hard” (like editing a trained deep model) are becoming reality, and machine unlearning is
evolving from a niche research topic into a practical necessity for deploying AI responsibly in the real world.
Sources:
• Cao, Y. & Yang, J. (2015). Towards Making Systems Forget with Machine Unlearning. IEEE S&P 2015.
(Introduced the concept of machine unlearning)
• Ginart, A. et al. (2019). Making AI Forget You: Data Deletion in Machine Learning. NeurIPS 2019.
(Efficient data deletion for k-means clustering) 72
• Bourtoule, L. et al. (2021). Machine Unlearning. IEEE S&P 2021. (SISA training for unlearning) 8 21
• Guo, C. et al. (2020). Certified Data Removal from Machine Learning Models. ICML 2020. (Introduced certified removal with
formal guarantees) 25
• Golatkar, A. et al. (2020). Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations. ECCV 2020. (Fisher
information based weight scrubbing) 13
• Peste, A. et al. (2021). Adaptive Forgetting in Neural Networks. NeurIPS 2021. (Studied influence of
forgetting with added noise)
• Sekhari, A. et al. (2021). Remember What You Want to Forget: Algorithms for Machine Unlearning.
NeurIPS 2021. (Theoretical analysis of unlearning vs. differential privacy) 73 74
• Chen, M. et al. (2023). Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision
Boundary. CVPR 2023 27 75 .
• Lin, S. et al. (2023). ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer. CVPR 2023
22 23 .
• Bonato, J. et al. (2024). Is Retain Set All You Need in Machine Unlearning? (SCAR). ECCV 2024 16 17 .
• Zhao, H. et al. (2024). Continual Forgetting for Pre-trained Vision Models (GS-LoRA). CVPR 2024 31 32 .
• De Min, T. et al. (2025). Unlearning Personal Data from a Single Image (1-SHUI, MetaUnlearn). TMLR
2025 33 34 .
• Kurmanji, M. et al. (2023). Towards Unbounded Machine Unlearning (SCRUB). arXiv 2302.09880 15 .
• Survey – Li, N. et al. (2024). Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and
Prospects. arXiv 2403.08254 9 7 . (Comprehensive survey of the field)
• Survey – Wang, T. et al. (2023). Machine Unlearning: A Comprehensive Survey. arXiv. (Another survey
covering approaches and metrics)
Citation links:
• 3, 14, 37, 38, 40, 41, 49, 52, 55, 56, 57, 58, 59, 62, 63, 65, 75 – Chen et al., Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary (CVPR 2023). https://openaccess.thecvf.com/content/CVPR2023/papers/Chen_Boundary_Unlearning_Rapid_Forgetting_of_Deep_Networks_via_Shifting_the_CVPR_2023_paper.pdf
• 8, 20, 21, 26, 54 – Bourtoule et al., Machine Unlearning (SISA). https://arxiv.org/pdf/1912.03817
• 13 – Golatkar et al. (ECCV 2020). https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740375.pdf
• 16, 17, 18, 19, 30, 42, 43 – Bonato et al., SCAR (ECCV 2024). https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00004.pdf
• 25, 66, 70, 71 – NDSS 2023 paper (mlsec.org). https://mlsec.org/docs/2023-ndss.pdf
• 31, 32, 44, 45, 46, 64 – Zhao et al., Continual Forgetting for Pre-trained Vision Models (CVPR 2024). https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Continual_Forgetting_for_Pre-trained_Vision_Models_CVPR_2024_paper.pdf
• 53, 73, 74 – Sekhari et al., Remember What You Want to Forget (NeurIPS 2021). https://openreview.net/pdf?id=pvCLqcsLJ1N