Deepfake Detection and Localization Using Multi-View Inconsistency Measurement
Some methods exploited inconsistencies such as lip movement [15] and heart rate [16]. More recent methods [17]–[20] concentrated on inconsistencies between adjacent frames through well-designed modules. Moreover, some works [21]–[23] combined spatial and temporal information and achieved decent results, but they have a relatively fixed composition of network modules and lack a fine-grained and comprehensive design for both spatial and temporal inconsistency information. Some studies [24] have compared these methods fairly to provide more accurate practical guidance.

However, all of the above approaches treat deepfake detection as a binary classification problem, while fine-grained localization is more significant and valuable in the research field of multimedia forensics, as it is fundamental to uncovering the intention of a forger. Some approaches attempted to perform localization for image forgery detection [25]–[28]: [25] proposed a two-stream network using RGB features and noise features, [26] utilized both frequency- and spatial-domain features to locate tampered regions, and [27] modeled the relationships within image patches at multiple scales for manipulation detection. These methods were developed to detect image forgeries rather than face forgeries.

In terms of deepfake localization, a previous method [29] highlighted manipulated regions with the assistance of an attention mechanism under the supervision of tampering masks. Other methods [30]–[32] aimed to improve the classification performance of the model with the help of localization tasks. Meanwhile, [33]–[35] utilized the imperfections associated with the upsampling process in GAN-based forgeries to locate manipulated regions, and [36]–[38] introduced noise features into a two-stream branch and combined them with multiscale features to accomplish detection. The existing deepfake localization methods mentioned above primarily focus on spatial aspects and overlook the significant contribution of temporal information to deepfake localization.

Besides, commonly used public datasets, i.e., FaceForensics++ [39] and Celeb-DF [40], have a strong selection bias [41] toward trimmed videos, each of which involves only one person. As a result, they are not competent enough to represent real-world, multi-face circumstances. As shown in Fig. 1(a), multi-face video frames often contain many people active in the scene, with only a small subset having been manipulated [41]. Existing deepfake detection methods based on these datasets did not take multi-face forensics into consideration, and there still exists some distance between these methods and real application scenarios.

To address the aforementioned limitations, we propose a novel Multi-View Inconsistency Measurement (MVIM) network that simultaneously explores inconsistencies caused by face manipulation from the noise view and inconsistencies caused by inter-frame differences from the temporal view for detecting and localizing tampered regions in deepfake videos. From the perspective of noise inconsistency, face manipulation changes the spatial distribution of noise features at different positions of the image, causing inconsistencies in noise patterns. Fig. 1(a) shows the median noise of real and fake videos, and it can be seen that the noise pattern of a real face is more consistent and homogeneous than others [37], [42], indicating that noise is effective in identifying suspicious tampering areas. From the perspective of temporal inconsistency, deepfake videos are generated frame by frame, which inevitably introduces inter-frame differences [17], [18]. Fig. 1(b) illustrates temporal inconsistencies in the mouth and eye regions of fake faces across consecutive frames. Forgery manipulation leaves abrupt inter-frame differences, whereas normal facial motions are more consistent and leave fewer inter-frame differences. Temporal inconsistencies reflected in inter-frame differences are therefore quite instructive for deepfake localization.

More specifically, in this paper we design a Noise Inconsistency Measurement (Noise-IM) module to identify noise-inconsistent regions from the noise domain, considering the inconsistent noise patterns of fake faces compared to real faces and the background. Noise-IM first calculates the noise similarity by means of an attention mechanism. Then the attention map is filtered by a face mask to measure the noise inconsistency among faces and between faces and the background. Finally, we compute the consistency loss based on noise similarity scores according to whether their corresponding image locations contain consistent noise patterns: we penalize areas containing inconsistent noise for having a low similarity score while awarding consistent areas a high similarity score. Furthermore, we design a Temporal Inconsistency Measurement (Temporal-IM) module to capture suspicious tampering traces between frames in the temporal domain. Temporal-IM incorporates a self-attention mechanism and calculates the cosine similarity between corresponding regions in successive frames to measure the degree of inconsistency. A fine-grained denoising operation is then carried out to mitigate the influence of normal facial motions. To better exploit temporal inconsistency information, Temporal-IM extracts more comprehensive representations using convolution along both the height and width directions, thereby expanding the temporal receptive field. Temporal-IM highlights inconsistent positions in the temporal domain so that the network can focus on suspected areas. Noise inconsistency features from Noise-IM and temporal inconsistency features from Temporal-IM are then fused by the Feature Fusion Module (FFM). Finally, multi-level fused features are used to detect and locate tampered regions.

The main contributions of this work can be summarized as follows:
• We propose a novel Multi-View Inconsistency Measurement (MVIM) network that simultaneously measures noise inconsistencies and temporal inconsistencies for detecting and localizing tampered regions in deepfake videos, making it more in line with real-world applications in multi-face scenarios.
• A novel Noise Inconsistency Measurement (Noise-IM) module is devised to finely capture the noise co-occurrence of faces and their backgrounds by a masked attention mechanism, allowing better use of noise inconsistency information to guide the network in locating tampered regions.
• A new Temporal Inconsistency Measurement (Temporal-IM) module is designed to capture suspected tampering traces reflected in inter-frame inconsistencies using self-attention.
Fig. 2. The overall framework of the proposed MVIM model. Given a video frame under investigation, MVIM is capable of performing both classification and localization tasks. Adjacent frames are fed into the upper temporal branch and noise features are sent into the noise branch at the bottom. The proposed Noise-IM and Temporal-IM modules are inserted into the blocks of the two ConvNeXt branches, turning them into the Noise-IM and Temporal-IM blocks respectively. Different-resolution features are fused by the FFM module.
Fig. 3. Details of the proposed measurement modules: (a) the Noise Inconsistency Measurement (Noise-IM) block and (b) the Temporal Inconsistency Measurement (Temporal-IM) block.
The fused features F^i_fuse from different stages are used to accomplish localization. More details are described as follows.

B. Noise inconsistency measurement

It is observed that noise can be considered an intrinsic characteristic of images: a tampering operation will disrupt the coherence of the original features and leave different kinds of traces in the noise domain, which are reflected as noise inconsistencies. In multi-face scenarios, the noise patterns of real faces and backgrounds tend to be more consistent than those of fake faces. Thus we design Noise-IM to uncover inconsistent information at a finer granularity from the noise view. As shown in Fig. 3(a), Noise-IM uses a face mask M obtained during the data preprocessing stage, where the value of each pixel indicates whether the position belongs to a face area or not. A masked attention mechanism is then applied to measure the noise inconsistency among faces and between faces and the background.

Specifically, we use one 1×1 convolution to map the input noise features F_noise ∈ R^{C×H×W} into Q_n ∈ R^{HW×C} (Query), and use two 1×1 convolutions to map the noise features into K_n ∈ R^{HW×C} and V_n ∈ R^{C×H×W} (Key and Value). Attention maps are then obtained to measure the inconsistency between different positions by computing their negative dot-product similarity:

S = softmax(−Q_n K_n^T / √d),   (1)

where √d denotes the scaling factor and S ∈ R^{HW×HW} indicates the noise inconsistency score between every two patches of the noise features.

To measure the noise similarity among faces and between faces and the background, we filter the attention map using the face mask, retaining only the attention scores at locations corresponding to face regions. More specifically, we downsample M to match the size of the input feature F_noise at each stage. The elements of the face mask M are updated to obtain M̂ ∈ R^{H×W}, where positions with a value of 1 are turned into 0 and positions with a value of 0 become −1e9, as follows:

M̂_{i,j} = 0 if M_{i,j} = 1, and M̂_{i,j} = −1e9 otherwise.   (2)

We aim to use M̂ to zero out the values in S corresponding to non-face regions. To achieve this, M̂ is reshaped to R^{HW×1} and then expanded to M̂′ ∈ R^{HW×HW} by repeating it along the columns to match the size of S. The expanded M̂′ is then added to the negative dot-product, ensuring that after the softmax activation the inconsistency scores for non-face regions are close to 0 and thus filtered out:

A_noise = softmax((M̂′ − Q_n K_n^T) / √d).   (3)

We then sum A_noise along each row and normalize the result to R^{HW×1}, which is reshaped to M̃ ∈ R^{H×W}. Each entry m̃_{i,j} (1 ≤ i ≤ H, 1 ≤ j ≤ W) of M̃ measures how inconsistent the noise pattern at that position is with the noise features of other faces or the background; a value close to 1 indicates more inconsistency and a value close to 0 the opposite. This summation allows us to model the relationships between fake faces, real faces, and the background from a global perspective. Overall, M̃ highlights areas of suspected tampering where the noise is inconsistent. Noise-IM finally updates the noise features F_noise:

F_noise = α M̃ ⊗ V_n + F_noise,   (4)

where α is a trainable parameter that adaptively adjusts the influence of the Noise-IM module.
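For concreteness, the following is a minimal PyTorch sketch of the masked noise attention of Eqs. (1)–(4). It is an illustrative reading of the description above rather than the authors' implementation; the class name NoiseIM, the softmax normalization axis, and the min–max normalization of the map M̃ are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoiseIM(nn.Module):
    """Sketch of the masked noise attention of Eqs. (1)-(4)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)   # 1x1 conv producing Q_n
        self.key = nn.Conv2d(channels, channels, 1)     # 1x1 conv producing K_n
        self.value = nn.Conv2d(channels, channels, 1)   # 1x1 conv producing V_n
        self.alpha = nn.Parameter(torch.zeros(1))       # trainable alpha of Eq. (4)

    def forward(self, f_noise: torch.Tensor, face_mask: torch.Tensor):
        # f_noise: (B, C, H, W) noise features; face_mask: (B, 1, H0, W0), 1 = face pixel.
        b, c, h, w = f_noise.shape
        q = self.query(f_noise).flatten(2).transpose(1, 2)      # (B, HW, C)
        k = self.key(f_noise).flatten(2).transpose(1, 2)        # (B, HW, C)
        v = self.value(f_noise)                                  # (B, C, H, W)

        # Eq. (1): negative dot-product similarity between every pair of patches.
        logits = -(q @ k.transpose(1, 2)) / (c ** 0.5)           # (B, HW, HW)

        # Eq. (2): downsample the face mask, face -> 0, non-face -> -1e9.
        mask = F.interpolate(face_mask.float(), size=(h, w), mode="nearest")
        m_hat = torch.where(mask > 0.5,
                            torch.zeros_like(mask),
                            torch.full_like(mask, -1e9)).flatten(1)   # (B, HW)

        # Eq. (3): add the expanded mask so that non-face positions are suppressed
        # after softmax (the normalization axis is an assumption).
        a_noise = torch.softmax(logits + m_hat.unsqueeze(2), dim=1)   # (B, HW, HW)

        # Row-sum and min-max normalization yield the inconsistency map M-tilde.
        m_tilde = a_noise.sum(dim=2)                                   # (B, HW)
        m_min = m_tilde.amin(1, keepdim=True)
        m_max = m_tilde.amax(1, keepdim=True)
        m_tilde = ((m_tilde - m_min) / (m_max - m_min + 1e-6)).view(b, 1, h, w)

        # Eq. (4): residual update of the noise features.
        return self.alpha * m_tilde * v + f_noise, m_tilde
```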
To better facilitate the learning of the Noise-IM module, the attention map M̃ is placed under the supervision of consistency maps M̄_gt ∈ R^{H×W}, which are generated from the ground truth manipulation mask M_gt through bi-linear down-sampling to match the size. The noise inconsistency loss is calculated using the binary cross-entropy loss L_bce:

L_noise = L_bce(M̃, M̄_gt).   (5)

C. Temporal inconsistency measurement

The video tampering process will unavoidably cause temporal inconsistencies, since deepfake videos are synthesized frame by frame. Temporal inconsistencies caused by inter-frame differences are reflected in face motions and head movements, which is crucial for deepfake detection and localization as they indicate the areas where tampering traces can be found. In order to effectively capture suspicious traces left behind during the tampering process, which are reflected as inconsistencies in the temporal domain, we introduce Temporal-IM to measure inconsistent information at a finer granularity.

For the frame I_t to be detected, we utilize its adjacent frames I_{t−1} and I_{t+1} to extract temporal inconsistency information. Denote the features extracted by the temporal backbone as F_temp ∈ R^{T×C×H×W}, where T is the number of frames and is set to 3 in this paper, C is the number of channels, and H, W are the spatial dimensions.

As shown in Fig. 3(b), we use different 1×1 convolutions to project F_temp to Q_t ∈ R^{H×W×T×C} and K_t ∈ R^{H×W×C×T} (Query and Key). The negative cosine similarity score between corresponding image patches is then calculated to represent the degree of inconsistency between adjacent frames, denoted as A_temp ∈ R^{H×W×T×T}. By summing the attention map A_temp along the time dimension to R^{H×W×T}, we can measure the temporal inconsistency degree of each video region from a global perspective. However, the inter-frame differences characterized in this way include not only the abnormal inter-frame jitter introduced by video forgery but also normal facial motion differences. Considering that real regions exhibit continuous and consistent motion changes, while forged regions may show random and intense jitter, we further use a fine-grained denoising operation to mitigate the interference of normal facial motion.

The denoising operation on A_temp first calculates the variance of each position, which characterizes the variability of the inter-frame differences at that position. Regions with high variance have large inter-frame differences and are more likely to be forged regions, while real regions exhibit the opposite behavior. The variance is then compared to a threshold θ, and regions smaller than the threshold are filtered out. The attention weights corresponding to real regions are thus reduced, allowing the network to focus on regions with greater inter-frame differences and completing the denoising process. The threshold θ is manually set to the H-th largest variance in our implementation. The denoised attention map is multiplied with the value feature V_t to obtain the temporal inconsistency feature R_t1 ∈ R^{T×C×H×W}. The overall process can be expressed as follows:

R_t1 = f_denoise(m_cos(Q_t, K_t)) V_t,   (6)

where f_denoise represents the denoising operation and m_cos denotes the negative cosine distance.

Furthermore, in order to extract inter-frame differences from a different perspective, the temporal features F_temp are reshaped into two coordinate-wise representations F^h_temp ∈ R^{W×C×H×T} and F^w_temp ∈ R^{H×C×T×W}, followed by convolution along both the height and width directions:

R_t2 = f_{1×1}(f^t_h(F^h_temp)) + f_{1×1}(f^t_w(F^w_temp)),   (7)

where f_{1×1} represents a 1×1 convolution and f^t_h, f^t_w are stripe convolutions. Unlike traditional convolutions that focus on the spatial dimensions, stripe convolutions can simultaneously capture temporal and spatial features along the height or width, focusing on inter-frame inconsistencies from different directions and correlating spatial with temporal information to obtain R_t2. Combining the two temporal inconsistency features to obtain a more comprehensive representation, Temporal-IM eventually updates F_temp:

F_temp = (β R_t1 + (1 − β) R_t2) + F_temp,   (8)

where β is a trainable parameter that adaptively adjusts the influence of the two inconsistency features. Finally, intermediate frames M_temp ∈ R^{C×H×W} are extracted from the inconsistency features F_temp to represent the inconsistent relationship between I_t and I_{t−1}, I_{t+1}, which is beneficial for indicating suspected tampering areas. We downsample the ground truth mask to match the corresponding size and obtain M′_gt, and the learning process is supervised by:

L_temp = L_bce(M_temp, M′_gt).   (9)
measure the temporal inconsistency degree of each video
It is widely acknowledged that high-level features possess
region from a global perspective. However, the inter-frame
a relatively large perceptual field and strong ability to charac-
differences characterized in this way include not only the
terize semantic information, while displaying weaknesses in
abnormal inter-frame jitter introduced by video forgery but
the characterization of geometric information, i.e., the lack
also the normal facial motion differences. Considering that
of local texture feature details. On the contrary, low-level
real regions exhibit continuous and consistent motion changes,
features, which have smaller perceptual fields, possess strong
while forged regions may show random and intense jitter, we
abilities in the characterization of geometric detail information,
further use a fine-grained denoising operation to mitigate the
but exhibit limitations in characterizing semantic information.
interference of normal facial motion.
In order to fully utilize the information of different hierarchical
The denoising operation on Atemp first calculates the vari-
features, FFM is designed to fuse inconsistency features from
ance of each position, which characterizes the variability of
noise branch and temporal branch at different scales to get
inter-frame differences at each position. Regions with high
Ffi use , which is used for final classification and localization:
variance have large inter-frame differences and are more likely
to be forged regions, while real regions exhibit the opposite Ffi use = F F M (Fnoise
i i
, Ftemp ), i = 1, ..., N (10)
behavior. The variance is then compared to a threshold θ, and i
regions smaller than the threshold are filtered out. Now the where N indicates stage number, Fnoise
represents features
i
attention weights corresponding to real regions are reduced, from Noise-IM and Ftemp represents features from Temporal-
allowing the network to focus on regions with greater inter- IM. As is shown in Fig. 4, FFM consists of channel enhance-
frame differences, finally completing the denoising process. ment and spatial enhancement to accomplish feature fusion.
i
The threshold θ is manually set to the H-th largest variance in Specifically, noise features Fnoise ∈ RC×H×W and temporal
i C×H×W
our implementation. The denoised attention map is multiplied features Ftemp ∈ R are first concatenate by channel
with the value feature Vt to obtain the temporal inconsistency F̄ i ∈ R2C×H×W . Channel enhancement is then applied from
feature Rt1 ∈ RT ×C×H×W . The overall process can be global and local perspective:
expressed as follows: W = σ(fact (GAP (F̄ i ) + fact (F̄ i ))), (11)
Rt1 = fdenoise (mcos (Qt , Kt ))Vt , (6) bi
F =W⊗ i
Fnoise + (1 − W ) ⊗ i
Ftemp , (12)
where fdenoise represents the denoising operation and mcos where fact represents operation of a 1×1 convolution followed
denotes the negative cosine distance. by ReLU function and another 1×1 convolution, GAP stands
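As a rough sketch of the channel enhancement in Eqs. (11)–(12), the FFM gate could be implemented as follows; the 2C-to-C channel reduction inside f_act and the use of two separate (non-shared) f_act branches are assumptions.

```python
import torch
import torch.nn as nn


class ChannelEnhancement(nn.Module):
    """Sketch of the FFM channel enhancement of Eqs. (11)-(12)."""

    def __init__(self, channels: int):
        super().__init__()

        def f_act():  # 1x1 conv -> ReLU -> 1x1 conv, reducing 2C back to C (assumed)
            return nn.Sequential(
                nn.Conv2d(2 * channels, channels, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 1),
            )

        self.f_act_global = f_act()   # applied to GAP(F-bar): global statistics
        self.f_act_local = f_act()    # applied to F-bar itself: local statistics

    def forward(self, f_noise: torch.Tensor, f_temp: torch.Tensor) -> torch.Tensor:
        # f_noise, f_temp: (B, C, H, W) stage-i features from Noise-IM and Temporal-IM.
        f_bar = torch.cat([f_noise, f_temp], dim=1)               # (B, 2C, H, W)
        gap = f_bar.mean(dim=(2, 3), keepdim=True)                # global average pooling
        weight = torch.sigmoid(self.f_act_global(gap) + self.f_act_local(f_bar))  # Eq. (11)
        return weight * f_noise + (1.0 - weight) * f_temp         # Eq. (12)
```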
Fig. 4. The architecture of the FFM module. Features from the noise branch F^i_noise and features from the temporal branch F^i_temp are fused through channel enhancement and spatial enhancement. ⊕ and ⊗ denote element-wise addition and multiplication, respectively.

In order to enhance local details and suppress irrelevant regions, spatial enhancement is then applied, where σ represents the sigmoid activation and f_avg and f_max denote the channel-wise average pooling and max pooling operations.
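The spatial-enhancement equations are not reproduced in this version of the text; the sketch below therefore follows the CBAM-style [44] layout suggested by Fig. 4 (channel-wise average and max pooling, a 7×7 convolution and a sigmoid gate) and should be read as an assumption rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn


class SpatialEnhancement(nn.Module):
    """Assumed CBAM-style [44] spatial enhancement matching the Fig. 4 layout."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f_hat: torch.Tensor) -> torch.Tensor:
        # f_hat: (B, C, H, W) channel-enhanced features from Eq. (12).
        f_avg = f_hat.mean(dim=1, keepdim=True)            # channel-wise average pooling
        f_max = f_hat.amax(dim=1, keepdim=True)            # channel-wise max pooling
        attn = torch.sigmoid(self.conv(torch.cat([f_avg, f_max], dim=1)))
        return attn * f_hat                                 # suppress irrelevant regions
```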
Fig. 5. Schematic diagram of the up process. F^i_fuse represents fused features of a smaller size and F^{i−1}_fuse stands for those of a larger size.

The overall training objective combines the noise, temporal, classification and localization losses as follows:

L = λ1 L_noise + λ2 L_temp + λ3 L_cls + λ4 L_loc,   (15)

where λ1, λ2, λ3 and λ4 are weighting parameters that are optimized during the network training process.
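A compact sketch of the supervision in Eqs. (5), (9) and (15) is given below, assuming single-channel inconsistency maps and precomputed classification and localization losses; the default loss weights are placeholders, not the values used by the authors.

```python
import torch
import torch.nn.functional as F


def consistency_loss(pred_map: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
    """Eqs. (5)/(9): BCE between a predicted inconsistency map and the ground-truth
    manipulation mask, bilinearly resized to the prediction size. Both maps are
    treated as single-channel probability maps here, which is a simplification."""
    gt = F.interpolate(gt_mask.float(), size=pred_map.shape[-2:],
                       mode="bilinear", align_corners=False)
    return F.binary_cross_entropy(pred_map.clamp(1e-6, 1 - 1e-6), gt)


def total_loss(m_tilde, m_temp, gt_mask, l_cls, l_loc, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Eq. (15): weighted sum of the four training losses.
    The equal default weights are an assumption; the paper optimizes them."""
    l1, l2, l3, l4 = lambdas
    return (l1 * consistency_loss(m_tilde, gt_mask)
            + l2 * consistency_loss(m_temp, gt_mask)
            + l3 * l_cls + l4 * l_loc)
```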
III. EXPERIMENTS

A. Datasets

In this paper, we conduct experiments on the widely-used datasets FFIW [41], DF-Platter [45], DFD [46] and FF++ [39].
1) FFIW: is a large-scale multi-face face forgery dataset, and the number of identities in each frame varies.
Three compression levels are provided in FF++ (i.e., raw, c23, and c40) and manipulation masks are also provided.

B. Implementation details

TABLE I
THE QUANTITATIVE COMPARISONS AMONG RECENT METHODS AND THE PROPOSED ON FFIW DATASET. F1 SCORE (%) AND IOU (%) ARE ADOPTED FOR LOCALIZATION AND ACC (%) AND AUC (%) ARE ADOPTED FOR CLASSIFICATION. THE BEST PERFORMANCES ARE MARKED AS BOLD.
TABLE III
THE QUANTITATIVE COMPARISONS AMONG RECENT METHODS AND THE PROPOSED ON DFD DATASETS. F1 SCORE (%) AND IOU (%) ARE ADOPTED FOR LOCALIZATION AND ACC (%) AND AUC (%) ARE ADOPTED FOR CLASSIFICATION. THE BEST PERFORMANCES ARE MARKED AS BOLD.

TABLE IV
THE QUANTITATIVE COMPARISONS AMONG RECENT METHODS AND THE PROPOSED ON FF++ DATASET, C40 VERSION. F1 SCORE (%) AND IOU (%) ARE ADOPTED FOR LOCALIZATION AND ACC (%) AND AUC (%) ARE ADOPTED FOR CLASSIFICATION. THE BEST PERFORMANCES ARE MARKED AS BOLD.
The selected comparison methods include the classic classification method Xception [60], the consistency learning method PCL [30], the high-frequency noise-based method GFFD [42], the long-distance attention-based method LDAM [22], the spatio-temporal dynamic difference learning method DDLM [23], and the latest representation learning method MINet [61]. The experimental results are shown in Table II. All methods are capable of effectively judging the authenticity of video samples, but our method achieves the best classification results thanks to the fine-grained design of the inconsistency measurement modules.

2) Comparison results on DFD dataset: We also conduct comparison experiments on another popular dataset, DFD, in which most videos contain only one person active in the scene. There are two compression versions of the DFD dataset: c23 stands for low-level compression with high visual quality, and c40 the contrary. In general, detection performance on low-quality videos is not as good as that on high-quality videos, because a high compression rate causes some texture details of the video to be lost, and these details are among the important cues the network needs to pay attention to. As listed in Table III, the proposed MVIM achieves state-of-the-art performance on both versions of the DFD dataset for the classification and localization tasks. The comparison methods mainly capture suspicious tampering traces from the spatial domain, which may be lost to some extent during the compression process. In contrast, our approach offers improved performance by incorporating inconsistency information from both the noise and temporal views, thereby enhancing the multi-view detection of tampering traces.

3) Comparison results on FF++ dataset: To further illustrate the validity and practicality of the proposed method, we also carried out experiments on the most popular dataset, FF++, which is a single-face dominated dataset with multiple tampering methods. Notice that the tamper masks provided for NeuralTextures cover the entire facial area, while the actual tampered area is limited to the mouth region only, so we performed experiments on the remaining three methods. Considering the compression applied by common social platforms in real life, we only carry out experiments on the c40 version here. As shown in Table IV, the proposed MVIM achieves state-of-the-art performance on the FF++ dataset for the Face2Face and FaceSwap tampering methods on both the classification and localization tasks, and sub-optimal classification results on Deepfakes. The comparison methods achieve decent classification accuracy for each tampering method, but there is still some room for improvement in the accuracy of localization. Since our method focuses on the generic tampering features left by the tampering process in both the noise and temporal domains, MVIM obtains better classification and localization results under different tampering methods.

To compare the localization performance of different methods more intuitively, we present the predicted localization masks of different methods in Fig. 6. Notably, the comparison methods sometimes exhibit mislocalizations in different regions and fail to maintain consistent performance across diverse circumstances. For instance, FFD tends to overdetect appearing faces, such as the cases in the first and last two columns.
Fig. 6. Visualization of the predicted localization masks of different methods. From top to bottom, we show input frames, GT masks, and the predictions of FFD, D&L, M2TR, SFFs, SDIML, LVNet and our MVIM, respectively.
D&L and M2TR face challenges in accurately locating tampered areas, as shown in columns two through six. SFFs may occasionally overlook frontal or minor faces in the third and last columns, and sometimes overdetects faces in the remaining columns. Leveraging noise information and multi-scale features, SDIML performs well in the first two columns but still exhibits mislocalizations in the last few columns. LVNet is able to locate the approximate position of the forged area in most cases, except for the third and fifth columns, but it is not sufficiently accurate. In contrast, our method achieves not only more accurate tampered-region localization but also more precise boundary estimation, benefiting from the inconsistency information derived from the noise and temporal views.

D. Cross-dataset comparison results

As generative techniques and forgery methods continue to evolve, it becomes increasingly difficult to anticipate and acquire all types of forged video samples for training in practical applications. Cross-dataset experiments are therefore carried out to evaluate the generalization performance of different methods in the face of unknown samples.
1) Comparison results from FF++ to FFIW dataset: Firstly, we trained different methods on the c23 version of the traditional FF++ dataset and tested them on the multi-face FFIW dataset. The experimental results are shown in Table V. The performance of all methods decreases significantly, indicating considerable differences between traditional single-face datasets and multi-face datasets. Cross-dataset detection poses significant challenges and raises higher requirements for future detection schemes. Nevertheless, our method still achieves leading cross-dataset detection performance.
2) Comparison results from FFIW to DF-Platter dataset: Furthermore, we selected the DF-Platter dataset, in addition to the FFIW dataset, to further evaluate cross-dataset performance.
TABLE VI
THE QUANTITATIVE COMPARISONS AMONG RECENT METHODS AND THE PROPOSED TRAINED ON FFIW AND TESTED ON DF-PLATTER DATASET. ACC (%) AND AUC (%) ARE ADOPTED FOR CLASSIFICATION, AND THE BEST PERFORMANCES ARE MARKED AS BOLD.

Methods        ACC      AUC
FFD            56.90    59.01
D&L            83.40    84.31
M2TR           71.36    73.25
SFFs           76.49    78.16
SDIML          83.00    83.69
LVNet          60.51    62.59
MVIM (Ours)    85.56    84.91
Fig. 7. Robustness evaluation against Gaussian Blur, Gaussian Noise and JPEG compression on FFIW. Localization F1 score and classification ACC score are reported.

E. Robustness evaluation

1) Generalization results on DFD: To examine the robustness of the proposed MVIM under video compression, we train different methods on the high-quality (c23) version of DFD and test them on the low-quality (c40) version. As shown in Table VII, video compression has a relatively large impact on the detection performance of different methods, leading to worse results than training and testing on the low-quality version of DFD. It is worth noting that SDIML performs well at a specific compression rate, but its classification accuracy drops more when trained and tested at different compression rates, indicating that the stability of the method is not robust enough. The remaining methods demonstrate relatively similar performance on the classification task, while our method achieves the highest ACC and AUC metrics. Our MVIM considers both the noise and temporal domains to effectively extract suspicious tampering traces, thereby mitigating the negative impact of video compression. Moreover, our approach exhibits the best localization results, which illustrates its robust performance under video compression.
2) Performance under various distortions: To further explore the performance of different methods under various distortions, we apply different image distortion methods to raw frames from the FFIW dataset and evaluate their localization performance.
Fig. 9. t-SNE visualization of features derived from different methods on the test sets of the DFD and FFIW datasets. From top to bottom, we show original features, FFD features and our MVIM features, respectively. The orange points represent the forgery samples and the blue points stand for the real samples.

The visualizations are compared with the ground truth localization mask. The results indicate that MVIM assigns higher activation weights to suspected tampered regions, allowing the network to focus on crucial areas and effectively achieve the localization task.

We also visualize the feature distributions derived from both the comparison method FFD and our MVIM by the t-SNE algorithm [62] on two datasets in Fig. 9. Initially, the two datasets exhibit a random distribution. However, both FFD and our method can effectively distinguish between forgery and real samples. Our method produces a more compact feature distribution on the FFIW dataset and a more discriminative distribution on the DFD dataset, which benefits from our multi-view inconsistency measurement that ensures the discriminative power of the feature representations learned by our method.

IV. CONCLUSION

In this paper, we propose a Multi-View Inconsistency Measurement (MVIM) network to measure inconsistencies from both the noise and temporal views. Taking into account the defects of existing deepfake video generation methods, a novel Noise Inconsistency Measurement (Noise-IM) module is designed to identify regions with inconsistent noise patterns in multi-face scenarios, where the noise patterns of fake faces are inconsistent with those of real faces and backgrounds. A Temporal Inconsistency Measurement (Temporal-IM) module is developed to capture suspicious tampering traces between frames in the temporal view, based on the observation that the facial jitter of tampered regions is more intense than that of real regions. In addition, a Feature Fusion Module (FFM) is proposed to fuse noise inconsistency features from Noise-IM and temporal inconsistency features from Temporal-IM to detect and locate tampered regions. Extensive experiments against multiple state-of-the-art methods on different benchmark datasets have been performed to verify the superiority of our MVIM network.

ACKNOWLEDGMENTS

This work is conducted on the RTAI cluster, which is supported by the School of Computer Science and Engineering and the Institute of Artificial Intelligence, Sun Yat-sen University.

REFERENCES

[1] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin, "Variational autoencoder for deep learning of images, labels and captions," in Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), 2016, pp. 2360–2368.
[2] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2223–2232.
[3] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019, pp. 4396–4405.
[4] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS), 2020, pp. 6840–6851.
[5] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10674–10685.
[6] J. Morris, S. Newman, K. Palaniappan, J. Fan, and D. Lin, ""Do you know you are tracked by photos that you didn't take": Large-scale location-aware multi-party image privacy protection," IEEE Transactions on Dependable and Secure Computing, 2021.
[7] C. Liu, H. Chen, T. Zhu, J. Zhang, and W. Zhou, "Making deepfakes more spurious: Evading deep face forgery detection via trace removal attack," IEEE Transactions on Dependable and Secure Computing, 2023.
[8] C. Yu, X. Zhang, Y. Duan, S. Yan, Z. Wang, Y. Xiang, S. Ji, and W. Chen, "Diff-ID: An explainable identity difference quantification framework for deepfake detection," IEEE Transactions on Dependable and Secure Computing, 2024.
[9] F. Matern, C. Riess, and M. Stamminger, "Exploiting visual artifacts to expose deepfakes and face manipulations," in Proceedings of the IEEE Winter Applications of Computer Vision Workshops (WACVW). IEEE, 2019, pp. 83–92.
[10] D.-T. Dang-Nguyen, G. Boato, and F. G. De Natale, "Discrimination between computer generated and natural human faces based on asymmetry information," in Proceedings of the 20th European Signal Processing Conference (EUSIPCO). IEEE, 2012, pp. 1234–1238.
[11] L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, "Face X-ray for more general face forgery detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5001–5010.
[12] B. Peng, W. Wang, J. Dong, and T. Tan, "Optimized 3D lighting environment estimation for image forgery detection," IEEE Transactions on Information Forensics and Security, vol. 12, no. 2, pp. 479–494, 2016.
[13] S. J. Sohrawardi, A. Chintha, B. Thai, S. Seng, A. Hickerson, R. Ptucha, and M. Wright, "Poster: Towards robust open-world detection of deepfakes," in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 2613–2615.
[14] B. Zi, M. Chang, J. Chen, X. Ma, and Y.-G. Jiang, "WildDeepfake: A challenging real-world dataset for deepfake detection," in Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2382–2390.
[15] A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic, "Lips don't lie: A generalisable and robust approach to face forgery detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5039–5049.
[16] S. Fernandes, S. Raj, E. Ortiz, I. Vintila, M. Salter, G. Urosevic, and S. Jha, "Predicting heart rate variations of deepfake videos using neural ODE," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019, pp. 1721–1729.
[17] Z. Gu, T. Yao, C. Yang, R. Yi, S. Ding, and L. Ma, "Region-aware temporal inconsistency learning for deepfake video detection," in Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI), vol. 1, 2022.
[18] Z. Gu, Y. Chen, T. Yao, S. Ding, J. Li, and L. Ma, "Delving into the local: Dynamic inconsistency learning for deepfake video detection," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 1, 2022, pp. 744–752.
[19] Z. Yu, R. Cai, Z. Li, W. Yang, J. Shi, and A. C. Kot, "Benchmarking joint face spoofing and forgery detection with visual and physiological cues," IEEE Transactions on Dependable and Secure Computing, 2024.
[20] C. Zhu, B. Zhang, Q. Yin, C. Yin, and W. Lu, "Deepfake detection via inter-frame inconsistency recomposition and enhancement," Pattern Recognition, p. 110077, 2023.
[21] D. Zhang, F. Lin, Y. Hua, P. Wang, D. Zeng, and S. Ge, "Deepfake video detection with spatiotemporal dropout transformer," in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 5833–5841.
[22] W. Lu, L. Liu, B. Zhang, J. Luo, X. Zhao, Y. Zhou, and J. Huang, "Detection of deepfake videos using long-distance attention," IEEE Transactions on Neural Networks and Learning Systems, 2023.
[23] Q. Yin, W. Lu, B. Li, and J. Huang, "Dynamic difference learning with spatio-temporal correlation for deepfake video detection," IEEE Transactions on Information Forensics and Security, vol. 18, pp. 4046–4058, 2023.
[24] J. Deng, C. Lin, P. Hu, C. Shen, Q. Wang, Q. Li, and Q. Li, "Towards benchmarking and evaluating deepfake detection," IEEE Transactions on Dependable and Secure Computing, pp. 1–16, 2024.
[25] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis, "Learning rich features for image manipulation detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1053–1061.
[26] J. H. Bappy, C. Simons, L. Nataraj, B. Manjunath, and A. K. Roy-Chowdhury, "Hybrid LSTM and encoder–decoder architecture for detection of image forgeries," IEEE Transactions on Image Processing, vol. 28, no. 7, pp. 3286–3300, 2019.
[27] X. Hu, Z. Zhang, Z. Jiang, S. Chaudhuri, Z. Yang, and R. Nevatia, "SPAN: Spatial pyramid attention network for image manipulation localization," in Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 312–328.
[28] W. Lu, W. Xu, and Z. Sheng, "An interpretable image tampering detection approach based on cooperative game," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 2, pp. 952–962, 2023.
[29] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain, "On the detection of digital face manipulation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5781–5790.
[30] T. Zhao, X. Xu, M. Xu, H. Ding, Y. Xiong, and W. Xia, "Learning self-consistency for deepfake detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15023–15033.
[31] J. Wang, Z. Wu, W. Ouyang, X. Han, J. Chen, Y.-G. Jiang, and S.-N. Li, "M2TR: Multi-modal multi-scale transformers for deepfake detection," in Proceedings of the International Conference on Multimedia Retrieval (ICMR), 2022, pp. 615–623.
[32] J. Wang, Y. Sun, and J. Tang, "LiSiam: Localization invariance Siamese network for deepfake detection," IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2425–2436, 2022.
[33] K. Songsri-in and S. Zafeiriou, "Complement face forensic detection and localization with facial landmarks," arXiv preprint arXiv:1910.05455, 2019.
[34] B. Chen, X. Ju, B. Xiao, W. Ding, Y. Zheng, and V. H. C. de Albuquerque, "Locally GAN-generated face detection based on an improved Xception," Information Sciences, vol. 572, pp. 16–28, 2021.
[35] Y. Huang, F. Juefei-Xu, Q. Guo, Y. Liu, and G. Pu, "FakeLocator: Robust localization of GAN-based face manipulations," IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2657–2672, 2022.
[36] P. Chen, J. Liu, T. Liang, C. Yu, S. Zou, J. Dai, and J. Han, "DLFMNet: End-to-end detection and localization of face manipulation using multi-domain features," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2021, pp. 1–6.
[37] C. Kong, B. Chen, H. Li, S. Wang, A. Rocha, and S. Kwong, "Detect and locate: Exposing face manipulation by semantic- and noise-level telltales," IEEE Transactions on Information Forensics and Security, vol. 17, pp. 1741–1756, 2022.
[38] C. Shuai, J. Zhong, S. Wu, F. Lin, Z. Wang, Z. Ba, Z. Liu, L. Cavallaro, and K. Ren, "Locate and verify: A two-stream network for improved deepfake detection," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7131–7142.
[39] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 1–11.
[40] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, "Celeb-DF: A large-scale challenging dataset for deepfake forensics," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 3207–3216.
[41] T. Zhou, W. Wang, Z. Liang, and J. Shen, "Face forensics in the wild," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5778–5788.
[42] Y. Luo, Y. Zhang, J. Yan, and W. Liu, "Generalizing face forgery detection with high-frequency features," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16317–16326.
[43] B. Bayar and M. C. Stamm, "Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection," IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2691–2706, 2018.
[44] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
[45] K. Narayan, H. Agarwal, K. Thakral, S. Mittal, M. Vatsa, and R. Singh, "DF-Platter: Multi-face heterogeneous deepfake dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 9739–9748.
[46] Google AI Blog: Contributing data to deepfake detection research. [Online]. Available: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html.
[47] I. Perov, D. Gao, N. Chervoniy, K. Liu, S. Marangonda, C. Umé, M. Dpfks, C. S. Facenheim, L. RP, J. Jiang et al., "DeepFaceLab: Integrated, flexible and extensible face-swapping framework," arXiv preprint arXiv:2005.05535, 2020.
[48] Y. Nirkin, Y. Keller, and T. Hassner, "FSGAN: Subject agnostic face swapping and reenactment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 7184–7193.
[49] FaceSwap. [Online]. Available: https://www.github.com/MarekKowalski/FaceSwap.
[50] L. Li, J. Bao, H. Yang, D. Chen, and F. Wen, "FaceShifter: Towards high fidelity and occlusion aware face swapping," arXiv preprint arXiv:1912.13457, 2019.
[51] Deepfakes. [Online]. Available: https://github.com/deepfakes/faceswap.
[52] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Niessner, "Face2Face: Real-time face capture and reenactment of RGB videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2387–2395.
[53] J. Thies, M. Zollhöfer, and M. Nießner, "Deferred neural rendering: Image synthesis using neural textures," ACM Transactions on Graphics (TOG), vol. 38, no. 4, pp. 1–12, 2019.
[54] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 11976–11986.
[55] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[56] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[57] P. Yu, J. Fei, Z. Xia, Z. Zhou, and J. Weng, "Improving generalization by commonality learning in face forgery detection," IEEE Transactions on Information Forensics and Security, vol. 17, pp. 547–558, 2022.
[58] J. Zhang, H. Tohidypour, Y. Wang, and P. Nasiopoulos, "Shallow- and deep-fake image manipulation localization using deep learning," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC). IEEE, 2023, pp. 468–472.
[59] T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun, "Unified perceptual parsing for scene understanding," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 418–434.