A Study On Combating Emerging Threat of Deepfake Weaponization
Abstract—A breakthrough in the emerging use of machine learning and deep learning is the concept of autoencoders and Generative Adversarial Networks (GANs), architectures that can generate believable synthetic content called deepfakes. The threat lies in these low-tech doctored images, videos, and audios blurring the line between fake and genuine content and being used as weapons to cause damage to an unprecedented degree. This paper presents a survey of the underlying technology of deepfakes and of the methods proposed for their detection. Based on a detailed study of the proposed detection models, this paper presents SSTNet as the best model to date, one that uses spatial, temporal, and steganalysis features for detection. The threat posed by document and signature forgery, which is yet to be explored by researchers, is also highlighted. The paper concludes with a discussion of research directions in this field and of the development of more robust techniques to deal with the increasing threats surrounding deepfake technology.

Index Terms—Deep Learning, Generative Adversarial Networks, Autoencoders, Deepfake detection, Fake image, Fake video

I. INTRODUCTION

Deepfake technology is a new automatic computer graphics tool that portrays entirely unrealistic events as real through digital media manipulation. Deepfake gained its name from the Reddit platform, where an anonymous user coined the term, a combination of "deep learning" and "fake", for splicing celebrities into adult video clips. As soon as the code was made public, widespread interest in generating fake content spawned among users. ObamaNet [1] was an architecture that featured an impressive use of lip-syncing technology to generate synchronized, photo-realistic lip-sync videos. The deepfake technique makes it possible to generate unauthentic videos of people expressing or saying things they have never said before [2].

Deepfakes make use of Artificial Intelligence (AI), machine learning, and deep learning concepts. AI deals with intelligence at the machine level. Machine learning is an evolving concept of Computer Science where machines can be trained to learn from provided data and accordingly take decisions on their own, just like humans do. Deep learning is a broader aspect of machine learning, where highly complex networks are trained to learn from a massive database of unstructured data.

Today there are various free deepfake applications, like the Chinese app Zao [3], which lets users easily swap faces with movie stars so they can see themselves playing that role in the movie; DeepNude [4], which can create nonconsensual porn; FakeApp; FaceSwap; and DeepFaceLab. The existence of such open-source software and the availability of devices for fabricating and propagating this falsified information has brought to attention the immediate need for the detection and elimination of malicious deepfake content. Deepfakes can act as a powerful weapon for insurgent groups and terrorist organizations, who may depict their adversaries using inflammatory words or engaging in provocative actions to maximize the galvanizing impact on their target audiences. For instance, a member of the Islamic State (ISIS) could generate fake content showing government officials or soldiers discussing bombing attacks on a mosque, to aid the group's recruitment [5]. States can use this weapon to undermine their non-state opponents. Deepfakes can affect the outcome of an election and are hence a threat to democracy. Deepfakes are even weaponizing satellite images of Earth by showing the existence of objects in landscapes and locations that do not exist in reality, so as to mislead military analysts and influence the decisions they base on these fake images [6][7].

Amidst the threats posed by deepfakes in various sectors, the ability to generate realistic simulations can also have a whole new positive impact on humanity. It can create an array of opportunities in the fields of education, entertainment, and business. Historical figures can be made to communicate with students. In movies, face-swapping can be achieved for scenes that cannot be fulfilled by the actors alone. For example, in
Authorized licensed use limited to: UNIVERSITY OF CONNECTICUT. Downloaded on May 18,2021 at 00:09:24 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC)
IEEE Xplore Part Number:CFP20OSV-ART; ISBN: 978-1-7281-5464-0
2016’s Rogue One, the late Peter Cushing’s appearance as Grand Moff Tarkin was made possible through similar technology [8]. Deepfakes can be used in business so that customers can view exactly how they would appear in products they wish to buy, without trying them on in reality. In the medical world, this technology can play a great role in training doctors, nurses, and surgeons to operate on real-life scenarios in a virtual environment [9].

However, the potential of deepfakes to cause a broad spectrum of serious harm to society is a matter of greater concern. They can act as a new weapon for humiliation and destruction, identity theft and exploitation, defamation, and manipulation of legal evidence. Several methods have been proposed to detect deepfakes by focusing on minute details of the content such as facial texture, head poses, eye blinking, skin color, lip movements, spatio-temporal features, and capsule forensics. Most of these rely on the same deep learning techniques that are used for the creation of deepfakes.

In the foreseeable future, deepfakes will continue evolving; thus, it is important to investigate their development and improve the methods of detection accordingly. The main objective of this paper is to present a survey of methods used for the creation and detection of deepfakes. Section II explains the popular underlying principles of deepfake architecture. Section III discusses and compares different proposed methods for deepfake detection. Section IV presents the contribution of this paper to the survey. Research opportunities in this direction are further highlighted in Section V.

II. DEEPFAKE CREATION

Deepfakes use Deep Neural Networks (DNNs). A DNN consists of a set of interconnected units called neurons. Together, these units perform some form of computational task and help solve complex problems. Two popular technologies associated with deepfake creation are the autoencoder-decoder model and the GAN architecture, both discussed below.

A. Autoencoders

The autoencoder was the first technology to be used in deepfake creation. An autoencoder is used to recreate the images it is trained on, and it operates in three phases: an encoder, a latent space, and a decoder [10]. The encoder first compresses the input pixels to a relatively smaller size by encoding special attributes like skin texture, skin color, facial expressions, open or closed eyes, head pose, and other minute details of the face. This compressed representation is sent to the latent space, which is useful for understanding and learning patterns and structural similarities between the data points [11]. Lastly, the decoder decompresses this information to reconstruct an output based on its representation in latent space. The decoder tries to recreate an image that resembles the original as closely as possible.

Fig. 1 shows how an autoencoder can be used to swap two faces. Following the path indicated by the red arrows, Face B is reconstructed to be similar to Face A. The important aspect here is that both faces use the same encoder. This helps the encoder learn general features that are common to both faces, so their positions in latent space will also be similar. This allows the autoencoders to produce the same picture with the faces of the target and the original individual swapped. Here, the latent representation of Face A is passed to the decoder of Face B, which reconstructs a Face B that resembles Face A. This technique has applications in various deepfake technologies like DFaker, DeepFaceLab, and TensorFlow-based deepfakes [12].

B. Generative Adversarial Networks

The majority of current deepfake technologies incorporate the use of GANs. The GAN architecture was first proposed by Ian Goodfellow in 2014 [13]. He introduced a framework involving two neural networks that work by challenging each other: the first generates new data while the other discriminates this new data from the original training data set. Pitting the two neural networks against each other tends to improve both the quality of the fake data produced and the network's discrimination ability. If a large number of images are fed to a GAN, it can create a unique image on its own [14]. However, it is necessary to attach a filter that can classify these unique outputs as acceptable or not. For this, GANs make use of a discriminative network that checks the generated data against true data. Both networks are trained together until the discriminator falsely classifies the generated output as authentic almost 50% of the time, at which point we can conclude that the generator model is successfully producing plausible examples. Fig. 2 shows the block diagram explaining the workflow of the GAN architecture.

The GAN architecture uses the min-max method for training the generator and discriminator [13]: the min (0) represents a fake output while the max (1) represents a genuine output. The goal of the generator is to drive the discriminator's output as close as possible to the max value, so that a realistic-looking deepfake is generated which can then be used for face-swapping in images and videos. GANs are more suitable for generating new data [15]. Their main advantage over autoencoders is that they can be used for a wider range of tasks, for example producing several classes of data, similar to the MNIST dataset [16]. Autoencoders, on the other hand, are more suitable for compressing data to lower dimensions or generating semantic vectors from it.
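The shared-encoder, per-identity-decoder arrangement behind this face swap can be sketched compactly. The following NumPy fragment is a minimal linear illustration, not code from DFaker or DeepFaceLab: the weight matrices, dimensions, and function names are hypothetical stand-ins for trained convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, LATENT_DIM = 64 * 64, 128  # hypothetical image and latent sizes

# One encoder shared by both identities, one decoder per identity.
W_enc = rng.normal(0, 0.01, (LATENT_DIM, IMG_DIM))
W_dec_a = rng.normal(0, 0.01, (IMG_DIM, LATENT_DIM))
W_dec_b = rng.normal(0, 0.01, (IMG_DIM, LATENT_DIM))

def encode(face):
    """Compress a flattened face image into the shared latent space."""
    return W_enc @ face

def decode(latent, W_dec):
    """Reconstruct a face image from its latent representation."""
    return W_dec @ latent

def swap_a_to_b(face_a):
    """Encode Face A with the shared encoder, decode with B's decoder:
    after training, the result keeps A's pose/expression but B's identity."""
    return decode(encode(face_a), W_dec_b)

face_a = rng.random(IMG_DIM)
swapped = swap_a_to_b(face_a)
print(swapped.shape)  # → (4096,)
```

In a real pipeline, both decoders are trained jointly on reconstruction loss so that the shared latent code captures pose and expression while each decoder supplies the identity-specific appearance.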
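The adversarial min-max training described in Section II-B can be made concrete with a toy example: a one-dimensional generator trying to match a "real" distribution against a logistic discriminator, trained with alternating updates. This NumPy sketch is illustrative only; the target distribution, learning rate, and parameterization are assumptions, and a real deepfake GAN uses deep convolutional networks for both players.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Toy setup: real data ~ N(4, 1), generator g(z) = a*z + b with z ~ N(0, 1),
# discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.02, 128

for _ in range(3000):
    z = rng.normal(0, 1, batch)
    x_real = rng.normal(4, 1, batch)
    x_fake = a * z + b

    # Discriminator ascent step: maximize log D(real) + log(1 - D(fake))
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * np.mean((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator ascent step: maximize log D(fake), i.e. fool the discriminator
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

z_test = rng.normal(0, 1, 1000)
print(np.mean(a * z_test + b))  # generated mean drifts toward the real mean of 4
```

The alternating updates are the min-max game in miniature: the discriminator pushes its output toward 1 on real samples and 0 on fakes, while the generator shifts its output distribution to push the discriminator back toward 1.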
carried out the extraction of spatial and steganalysis features of the digital content using CNNs. Spatial features involve the identification of visible inconsistencies in the image, such as blurs, facial texture, artificial smoothness, and contrast differences. Steganalysis helps analyze the hidden information in an image through low-level feature extraction. However, with many evolving extensions of GANs, new generation mechanisms are being adopted, so it is important to improve the generalization ability of detection tools. Instead of focusing on low-level pixel statistics, Xuan et al. [20] proposed the use of Gaussian blur and Gaussian noise to force classifiers to learn more detailed and meaningful features from the improved statistical similarity at the pixel level in images.

Likewise, in [21] a two-phase model is proposed that pairs fake and real images. These pairs are used to learn a discriminative common fake feature network (CFFN), and the discriminative common fake features are then used to identify the authenticity of an image. The authors highlight that it is challenging to identify subjects excluded from the training phase using supervised learning, so they introduce a contrastive loss to learn through pairwise learning. The objective of this framework is to detect newly encountered fake images, including those generated by a new GAN. Their experimental results demonstrate better precision and recall than other state-of-the-art methods.

B. Fake Video Detection

differentiate between a fake and an authentic visual. This method outperformed the one proposed in [27], which used facial landmark detectors to identify the eye corners and eyelid contours for precise detection of eyes and, with the use of two different models, namely an SVM (Support Vector Machine) and an HMM (Hidden Markov Model), detected the rate of eye blinks. This technique was further enhanced to differentiate between complete and incomplete eye blinks: Fogelton and Benesova [28] designed a model that detects a complete blink, an incomplete blink, or no blink for every frame. This method was implemented on the Researcher's Night dataset, and it outperformed all other related work by almost 8%.

Stressing the fact that several intra-frame and temporal inconsistencies can occur across frames in deepfake video generation, Guera and Delp [23] proposed temporal feature analysis of videos to detect fake videos, using a simple convolutional Long Short-Term Memory (LSTM) structure. Similarly, an advanced framework known as SSTNet [29] uses a combination of CNN-based spatial feature extraction, steganalysis feature extraction, and temporal feature extraction by a single LSTM for fake video detection. This framework achieved good generalization on the GAN dataset and outperformed previous steganalysis methods when tested on the FaceForensics++ dataset [30]. Fig. 3 shows the working of the SSTNet model.
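The blink-rate cue used by the landmark-based detectors above reduces, at its simplest, to counting dips in a per-frame eye-openness signal such as the eye aspect ratio (EAR). The sketch below is a hypothetical thresholding illustration of that idea, not the SVM/HMM pipeline of [27] or the per-frame classifier of [28]; the function name, threshold, and minimum run length are assumptions.

```python
import numpy as np

def count_blinks(ear, threshold=0.2, min_frames=2):
    """Count blinks in a sequence of per-frame eye aspect ratio (EAR)
    values: a blink is a run of at least `min_frames` consecutive frames
    in which the eye is closed (EAR below the threshold)."""
    closed = np.asarray(ear) < threshold
    blinks, run = 0, 0
    for frame_closed in closed:
        if frame_closed:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # handle a blink that ends the sequence
        blinks += 1
    return blinks

# Synthetic EAR trace: eyes open around 0.3, with two blink dips.
trace = [0.31, 0.30, 0.12, 0.10, 0.29, 0.30, 0.32, 0.11, 0.09, 0.13, 0.30]
print(count_blinks(trace))  # → 2
```

Dividing the blink count by the clip duration gives the blink rate that such detectors compare against normal human blinking to flag synthetic faces.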
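SSTNet-style detection combines per-frame spatial features with an LSTM that accumulates evidence over time. The untrained toy below illustrates only the shape of that pipeline, not the published SSTNet: the random projection standing in for the CNN/steganalysis extractor, the dimensions, and all weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

FEAT, HIDDEN = 32, 16  # hypothetical feature and hidden-state sizes

# Stand-in for the CNN spatial/steganalysis feature extractor: a fixed
# random projection of each flattened frame (a real system uses trained
# convolutional features).
W_cnn = rng.normal(0, 0.1, (FEAT, 8 * 8))

# Single LSTM cell parameters: input (i), forget (f), output (o),
# and candidate (c) gates.
W = {g: rng.normal(0, 0.1, (HIDDEN, FEAT + HIDDEN)) for g in "ifoc"}
w_out = rng.normal(0, 0.1, HIDDEN)

def classify_video(frames):
    """Run per-frame features through an LSTM; score the final hidden state."""
    h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
    for frame in frames:
        x = W_cnn @ frame.ravel()            # spatial features for this frame
        z = np.concatenate([x, h])
        i, f, o = (sigmoid(W[g] @ z) for g in "ifo")
        c = f * c + i * np.tanh(W["c"] @ z)  # temporal state update
        h = o * np.tanh(c)
    return sigmoid(w_out @ h)                # probability the video is fake

video = rng.random((10, 8, 8))               # ten toy 8x8 grayscale frames
score = classify_video(video)
print(0.0 < score < 1.0)  # → True
```

The key design point is that the LSTM sees the frames in order, so inconsistencies between consecutive frames, invisible to a per-frame classifier, can shift the final score.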
phonemes, while VC systems can convert the voice of an utterance into another voice with the same content. DNN models have popularly been used to extract dynamic acoustic features of audiovisuals and label them as real or fake [35]. This methodology has been shown to outperform the static feature analysis of Gaussian Mixture Model (GMM) classifiers [36]. The most prominent deepfake detection methods are summarized in Table 1.

TABLE I
SUMMARY OF PROMINENT DEEPFAKE DETECTION TECHNIQUES

IV. CONTRIBUTIONS

After an in-depth review of several proposed detection methods for deepfakes, the SSTNet model proves to be the most convincing one due to its flexibility and its generalized approach to detecting both fake images and fake videos. This model achieved an accuracy of around 90% to 95%, much higher than other models trained on the same datasets. The method can be further improved by incorporating other detection cues, like the rate of complete eye blinks and evidence of lip-syncing, on the datasets released by giant tech companies like Google [37] and Facebook [38] for classifying fake videos.

Many researchers have failed to recognize that signature forgery and document forgery are also matters of great concern. Signatures ensure a great deal of authenticity in sectors such as banking, insurance, healthcare, copyright, and governmental regulatory compliance. Forgery of any form could potentially lead to even more disastrous consequences. Thus, it is important for researchers to equally invest in this sector as well.

V. CONCLUSION & FUTURE RESEARCH DIRECTIONS

As deepfake technology approaches generating fake content of considerably improved quality, it will likely soon become impossible to detect it. It is thus important to respond immediately and with great caution to the emerging threat posed by deepfakes. To be able to distinguish between generated and authentic content, organizations can start developing encrypted digital stamps for authentic digital media.

Believability and accessibility have become the most significant drivers of deepfake technology. Stopping deepfakes from spreading across massive networks is a considerable challenge and requires social media platforms to step up. They need to develop tools and extensions that help with deepfake content moderation and detection and that prevent mainstream media coverage of deepfakes. Confusing deepfake generators into producing more flawed output can also make deepfakes easier to detect. This can be achieved by adding special noise to digital photos uploaded on social media, such that they create a decoy suggesting there is a face where, in reality, there is none [39].

To combat deepfakes, just developing and deploying one or two successful tools is not enough. It will
require constant reinvention of these tools, as the technology is evolving at a much faster rate, and machine learning plays a crucial role in achieving this. Therefore, the research community should continue developing countermeasures using machine learning and deep learning to combat the weaponization of deepfakes.

REFERENCES
[1] R. Kumar, J. Sotelo, K. Kumar, A. de Brebisson, and Y. Bengio, "ObamaNet: Photo-realistic lip-sync from text," pp. 1–4, 2017. [Online]. Available: http://arxiv.org/abs/1801.01442
[2] D. Yadav and S. Salmani, "Deepfake: A survey on facial forgery technique using generative adversarial network," 2019 Int. Conf. Intell. Comput. Control Syst. (ICCS 2019), pp. 852–857, 2019, doi: 10.1109/ICCS45141.2019.9065881.
[3] "Chinese deepfake app Zao sparks privacy row after going viral | Privacy | The Guardian." https://www.theguardian.com/technology/2019/sep/02/chinese-face-swap-app-zao-triggers-privacy-fears-viral (accessed Aug. 26, 2020).
[4] "AI deepfake app DeepNude transformed photos of women into nudes - Vox." https://www.vox.com/2019/6/27/18761639/ai-deepfake-deepnude-app-nude-women-porn (accessed Aug. 26, 2020).
[5] "Deepfakes and the New Disinformation War | Foreign Affairs." https://www.foreignaffairs.com/articles/world/2018-12-11/deepfakes-and-new-disinformation-war (accessed Aug. 30, 2020).
[6] "The Newest AI-Enabled Weapon: 'Deep-Faking' Photos of the Earth - Defense One." https://www.defenseone.com/technology/2019/03/next-phase-ai-deep-faking-whole-world-and-china-ahead/155944/ (accessed Aug. 24, 2020).
[7] "Deep fakes: AI-manipulated media will be 'WEAPONISED' to trick military | Science | News | Express.co.uk." https://www.express.co.uk/news/science/1109783/deep-fakes-ai-artificial-intelligence-photos-video-weaponised-china (accessed Aug. 24, 2020).
[8] R. Chesney and D. K. Citron, "Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security," SSRN Electron. J., pp. 1753–1820, 2018, doi: 10.2139/ssrn.3213954.
[9] "Don't believe your eyes: Exploring the positives and negatives of deepfakes - AI News." https://artificialintelligence-news.com/2019/08/05/dont-believe-your-eyes-exploring-the-positives-and-negatives-of-deepfakes/ (accessed Aug. 25, 2020).
[10] J. Kietzmann, L. W. Lee, I. P. McCarthy, and T. C. Kietzmann, "Deepfakes: Trick or treat?," Bus. Horiz., vol. 63, no. 2, pp. 135–146, 2020, doi: 10.1016/j.bushor.2019.11.006.
[11] "Understanding Latent Space in Machine Learning | by Ekin Tiu | Towards Data Science." https://towardsdatascience.com/understanding-latent-space-in-machine-learning-de5a7c687d8d (accessed Aug. 28, 2020).
[12] T. T. Nguyen, C. M. Nguyen, D. T. Nguyen, D. T. Nguyen, and S. Nahavandi, "Deep Learning for Deepfakes Creation and Detection: A Survey," pp. 1–12, 2019. [Online]. Available: http://arxiv.org/abs/1909.11573
[13] I. J. Goodfellow et al., "Generative adversarial nets," Adv. Neural Inf. Process. Syst., vol. 3, pp. 2672–2680, 2014.
[14] "Generative Adversarial Networks: The Tech Behind DeepFake and FaceApp." https://interestingengineering.com/generative-adversarial-networks-the-tech-behind-deepfake-and-faceapp (accessed Aug. 27, 2020).
[15] M. Mirza and S. Osindero, "Conditional Generative Adversarial Nets," pp. 1–7, 2014. [Online]. Available: http://arxiv.org/abs/1411.1784
[16] "What is the difference between Generative Adversarial Networks and Autoencoders? - Quora." https://www.quora.com/What-is-the-difference-between-Generative-Adversarial-Networks-and-Autoencoders (accessed Aug. 29, 2020).
[17] L. M. Dang, S. I. Hassan, S. Im, J. Lee, S. Lee, and H. Moon, "Identification Using Convolutional Neural Network," Appl. Sci., 2018, doi: 10.3390/app8122610.
[18] Y. Aslam and N. Santhi, "A Review of Deep Learning Approaches for Image Analysis," Proc. 2nd Int. Conf. Smart Syst. Inven. Technol. (ICSSIT 2019), pp. 709–714, 2019, doi: 10.1109/ICSSIT46314.2019.8987922.
[19] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," pp. 1251–1258, 2017.
[20] X. Xuan, B. Peng, W. Wang, and J. Dong, "On the Generalization of GAN Image Forensics," Lect. Notes Comput. Sci., vol. 11818 LNCS, pp. 134–141, 2019, doi: 10.1007/978-3-030-31456-9_15.
[21] C. Hsu, Y. Zhuang, and C. Lee, "Deep Fake Image Detection Based on Pairwise Learning," Appl. Sci., 2020, doi: 10.3390/app10010370.
[22] Y. Li and S. Lyu, "Exposing DeepFake Videos By Detecting Face Warping Artifacts."
[23] D. Guera and E. J. Delp, "Deepfake Video Detection Using Recurrent Neural Networks."
[24] D. Afchar and V. Nozick, "MesoNet: a Compact Facial Video Forgery Detection Network."
[25] M. Wang, L. Guo, and W. Chen, "Blink detection using Adaboost and contour circle for fatigue recognition," Comput. Electr. Eng., pp. 1–11, 2016, doi: 10.1016/j.compeleceng.2016.09.008.
[26] Y. Li, M. Chang, and S. Lyu, "In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking."
[27] T. Soukupová and J. Čech, "Eye Blink Detection Using Facial Landmarks," 2016.
[28] A. Fogelton and W. Benesova, "Eye blink completeness detection," Comput. Vis. Image Underst., vol. 176–177, pp. 78–85, 2018, doi: 10.1016/j.cviu.2018.09.006.
[29] Y. Gao and Y. Xiao, "SSTNet: Detecting Manipulated Faces Through Spatial, Steganalysis and Temporal Features," pp. 2952–2956, 2020.
[30] A. Rössler, D. Cozzolino, L. Verdoliva, J. Thies, M. Nießner, and C. Riess, "FaceForensics++: Learning to Detect Manipulated Facial Images."
[31] B. Paris and J. Donovan, "Deepfakes and Cheap Fakes," Data Soc., p. 47, 2019. [Online]. Available: https://datasociety.net/library/deepfakes-and-cheap-fakes/
[32] "Resemble AI launches voice synthesis platform and deepfake detection tool | VentureBeat." https://venturebeat.com/2019/12/17/resemble-ai-launches-voice-synthesis-platform-and-deepfake-detection-tool/ (accessed Aug. 30, 2020).
[33] "Detecting Audio Deepfakes With AI | by Dessa | Dessa News | Medium." https://medium.com/dessa-news/detecting-audio-deepfakes-f2edfd8e2b35 (accessed Aug. 30, 2020).
[34] "WaveNet: A generative model for raw audio | DeepMind." https://deepmind.com/blog/article/wavenet-generative-model-raw-audio (accessed Aug. 30, 2020).
[35] H. Yu, Z.-H. Tan, Z. Ma, R. Martin, and J. Guo, "Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features," IEEE Trans. Neural Networks Learn. Syst., vol. 29, no. 10, pp. 4633–4644, 2018, doi: 10.1109/TNNLS.2017.2771947.
[36] D. Reynolds, "Gaussian Mixture Models," Encycl. Biometrics, pp. 659–663, 2009, doi: 10.1007/978-0-387-73003-5_196.
[37] "Google has released a giant database of deepfakes to help fight deepfakes | MIT Technology Review." https://www.technologyreview.com/2019/09/25/132884/google-has-released-a-giant-database-of-deepfakes-to-help-fight-deepfakes/ (accessed Aug. 30, 2020).
[38] "Deepfake Detection Challenge Dataset." https://ai.facebook.com/datasets/dfdc/ (accessed Aug. 30, 2020).
[39] "Scientists Are Taking the Fight Against Deepfakes to Another Level | Discover Magazine." https://www.discovermagazine.com/technology/scientists-are-taking-the-fight-against-deepfakes-to-another-level (accessed Aug. 30, 2020).