Final Year Thesis
On
“DEEPFAKE DETECTION”
Submitted By :
Anurag Gajbhiye
Grishma Barbate
Yutika Kale
Sagar Samarth
Guided by :
Prof. Nikita Hatwar
CONTENT
1 Index
2 List of Figures
3 List of Tables
4 Abstract
INDEX
2 Technical Keywords
4.4 Outcome
4.5 Applications
5 Project Plan
6 Software Requirement Specification
6.1 Introduction
7.1 Introduction
8.1 Introduction
8.2.1 Planning
8.2.6 Applications
9 Software Testing
10.1 Screenshots
10.2 Outputs
11 Deployment and Maintenance
11.1 Deployment
11.2 Maintenance
12 Conclusion and Future Scope
12.1 Conclusion
A References
B Project Planner
Abstract
Growing computational power has made deep learning algorithms so powerful that
creating a synthesized human video that is nearly indistinguishable from a real one,
popularly called a deepfake, has become very simple. It is easy to envision scenarios
in which these realistic face-swapped deepfakes are used to create political distress,
fake terrorism events, or revenge porn, or to blackmail people. In this work, we
describe a new deep learning-based method that can effectively distinguish
AI-generated fake videos from real videos. Our method is capable of automatically
detecting replacement and reenactment deepfakes. We are trying to use Artificial
Intelligence (AI) to fight Artificial Intelligence (AI). Our system uses a ResNext
Convolutional Neural Network (CNN) to extract frame-level features, and these
features are then used to train a Long Short-Term Memory (LSTM) based Recurrent
Neural Network (RNN) to classify whether the video has been subject to any kind of
manipulation, i.e. whether the video is a deepfake or a real video. To emulate
real-time scenarios and make the model perform better on real-time data, we evaluate
our method on a large, balanced and mixed dataset prepared by combining various
available datasets such as FaceForensics++ [1], the Deepfake Detection Challenge [2],
and Celeb-DF [3]. We also show how our system can achieve competitive results using
a very simple and robust approach.
Keywords:
ResNext Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM)
Computer vision
Deepfake Video Detection
Chapter 1
Synopsis
Deepfake is a technique for human image synthesis based on neural network tools
such as GANs (Generative Adversarial Networks) or autoencoders. These tools
superimpose target images onto source videos using deep learning techniques and
create a realistic-looking deepfake video. These deepfake videos are so realistic that
it becomes nearly impossible to spot the differences with the naked eye. In this work,
we describe a new deep learning-based method that can effectively distinguish
AI-generated fake videos from real videos. We use the limitations of the deepfake
creation tools as a powerful way to distinguish between pristine and deepfake videos.
During the creation of a deepfake, current creation tools leave distinguishable
artifacts in the frames. These artifacts may not be visible to a human being, but a
trained neural network can spot them, and we show that they can be effectively
captured by ResNext Convolutional Neural Networks.
Chapter 2
Technical Keywords
Area of Project
Our project is a deep learning project. Deep learning is a sub-branch of Artificial
Intelligence that deals with neural network technology inspired by the human brain.
Computer vision plays an important role in our project: it is used to process the
video and its frames with the help of OpenCV. A model trained in PyTorch acts as a
classifier that labels the source video as deepfake or pristine.
Technical Keywords
• Deep learning
• Computer vision
• OpenCV
• Face Recognition
• PyTorch.
Chapter 3
Introduction
Project Idea
In a world of ever-growing social media platforms, deepfakes are considered
a major threat posed by AI. It is easy to envision scenarios in which these
realistic face-swapped deepfakes are used to create political distress, fake
terrorism events, or revenge porn, or to blackmail people. Examples include
fabricated nude videos of celebrities such as Brad Pitt and Angelina Jolie.
It therefore becomes very important to spot the difference between a deepfake
and a pristine video. We are using AI to fight AI. Deepfakes are created using
tools like FaceApp [11] and Face Swap [12], which use pre-trained neural
networks such as GANs or autoencoders. Our method uses a pre-trained ResNext
CNN to extract frame-level features, and these features are then used to train
an LSTM-based Recurrent Neural Network that performs sequential temporal
analysis of the video frames and classifies the video as deepfake or real. To
emulate real-time scenarios and make the model perform better on real-time
data, we trained our method on a large, balanced combination of various
available datasets such as FaceForensics++ [1], the Deepfake Detection
Challenge [2], and Celeb-DF [3].
The spread of deepfakes over social media platforms has become very common,
leading to spamming and the circulation of wrong information on these
platforms. Just imagine a deepfake of our prime minister declaring war against
neighboring countries, or a deepfake of a reputed celebrity abusing fans.
Deepfakes of this kind would be terrible, and could threaten and mislead
common people.
Literature Survey
Face Warping Artifacts [15] detects artifacts by comparing the generated face
areas and their surrounding regions with a dedicated Convolutional Neural
Network model. This work considers two kinds of face artifacts.
Their method is based on the observation that current deepfake algorithms
can only generate images of limited resolution, which then need to be further
transformed to match the faces to be replaced in the source video.
Their method does not consider temporal analysis of the frames.
Detection by Eye Blinking [16] describes a new method that uses eye blinking
as the crucial parameter for classifying videos as deepfake or pristine. A
Long-term Recurrent Convolutional Network (LRCN) was used for temporal
analysis of the cropped frames of eye blinking. Deepfake generation algorithms
have since become so powerful that a lack of eye blinking cannot be the only
clue for detecting deepfakes. Other parameters must also be considered, such
as teeth enhancement, wrinkles on faces, and wrong placement of eyebrows.
Capsule networks to detect forged images and videos [17] uses a capsule
network to detect forged, manipulated images and videos in different
scenarios, such as replay attack detection and computer-generated video
detection. Their method adds random noise in the training phase, which is not
a good option. The model still performed well on their dataset, but it may
fail on real-time data because of the noise used in training. Our method is
proposed to be trained on noiseless, real-time datasets.
Recurrent Neural Network (RNN) for deepfake detection [18] uses an RNN for
sequential processing of the frames along with an ImageNet pre-trained model.
Their process used the HOHA [19] dataset, consisting of just 600 videos.
Their dataset contains a small number of videos, all of the same type, so the
model may not perform very well on real-time data. We will be training our
model on a large amount of real-time data.
Synthetic Portrait Videos using Biological Signals [20] extracts biological
signals from facial regions in pairs of pristine and deepfake portrait videos.
It applies transformations to compute spatial coherence and temporal
consistency, captures the signal characteristics in feature vectors and
photoplethysmography (PPG) maps, and then trains a probabilistic Support
Vector Machine (SVM) and a Convolutional Neural Network (CNN). The average of
the authenticity probabilities is then used to classify whether the video is a
deepfake or pristine.
FakeCatcher detects fake content with high accuracy, independent of the
generator, content, resolution, and quality of the video. However, because it
lacks a discriminator, its findings on preserving biological signals are
weakened, and formulating a differentiable loss function that follows the
proposed signal processing steps is not a straightforward process.
Chapter 4
Problem Definition and scope
Problem Statement
Convincing manipulations of digital images and videos have been demonstrated
for several decades through the use of visual effects, but recent advances in deep
learning have led to a dramatic increase in the realism of fake content and the ease
with which it can be created. This so-called AI-synthesized media is popularly
referred to as deepfakes. Creating deepfakes with AI tools is a simple task, but
detecting them is a major challenge. There are already many examples in which
deepfakes have been used as a powerful way to create political tension [14], fake
terrorism events, or revenge porn, or to blackmail people. It therefore becomes very
important to detect these deepfakes and stop them from percolating through social
media platforms. We have taken a step forward in detecting deepfakes using
LSTM-based artificial neural networks.
• Our project aims at discovering the distorted truth behind deepfakes.
• Our project will reduce the abuse and misleading of common people on the
world wide web.
• Our project will distinguish and classify a video as deepfake or pristine.
• Our project will provide an easy-to-use system to upload a video and
determine whether it is real or fake.
Statement of scope
There are many tools available for creating deepfakes, but hardly any tool is
available for detecting them. Our approach for detecting deepfakes will be a
great contribution to stopping the percolation of deepfakes over the world
wide web. We will provide a web-based platform for the user to upload a video
and classify it as fake or real. This project can be scaled up from a
web-based platform to a browser plugin for automatic deepfake detection. Even
big applications like WhatsApp and Facebook can integrate this project for
easy pre-detection of deepfakes before they are sent to another user. The
software is described in terms of size of input, bounds on input, input
validation, input dependency, I/O state diagram, and major inputs and outputs,
without regard to implementation detail.
Major Constraints
• User: The user of the application will be able to detect whether the uploaded
video is fake or real, along with the model's confidence in the prediction.
• Prediction: The user will be able to see the playing video with the output
shown on the face, along with the confidence of the model.
• Solution Requirement
We analysed the problem statement and established the feasibility of a
solution, reading the research papers mentioned in 3.3. After checking the
feasibility of the problem statement, the next step was dataset gathering and
analysis. We analysed the dataset under different training approaches, such as
negative or positive training, i.e. training the model with only fake or only
real videos, but found that this may add extra bias to the model and lead to
inaccurate predictions. After a lot of research, we found that balanced
training of the algorithm is the best way to avoid bias and variance in the
algorithm and to get good accuracy.
• Solution Constraints
We analysed the solution in terms of cost, speed of processing, requirements,
level of expertise, and availability of equipment.
• Parameter Identified
Design
After research and analysis, we developed the system architecture of the solution, as
described in Chapter 6. We decided on the baseline architecture of the model, including
the different layers and their numbers.
Development
After analysis, we decided to use the PyTorch framework with the Python 3 language
for programming. PyTorch was chosen because it has good support for CUDA, i.e.
for Graphics Processing Units (GPUs), and it is customizable. Google Cloud Platform
was used for training the final model on a large dataset.
Evaluation
We evaluated our model on a large real-time dataset that includes a YouTube videos
dataset. The confusion matrix approach is used to evaluate the accuracy of the trained
model.
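As an illustration of the confusion matrix approach, the following minimal sketch counts the four outcome types for a binary real/fake classifier and derives accuracy from them. The label encoding (1 for fake, 0 for real), the function names, and the sample labels are assumptions made for this example; they are not results from this project.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as 'fake' and 0 as 'real'."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # fake video correctly flagged as fake
        elif truth == 0 and pred == 1:
            fp += 1  # real video wrongly flagged as fake
        elif truth == 1 and pred == 0:
            fn += 1  # fake video missed by the classifier
        else:
            tn += 1  # real video correctly accepted as real
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(tp, fp, fn, tn, accuracy(tp, fp, fn, tn))
```

Keeping the four counts separate, rather than reporting accuracy alone, shows whether the errors are missed fakes or false alarms.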
Outcome
The outcome of the solution is a trained deepfake detection model that helps users
check whether a new video is a deepfake or real.
Applications
A web-based application is used by the user to upload a video and submit it for
processing. The model pre-processes the video and predicts whether the uploaded
video is a deepfake or a real video.
Chapter 5
Project Plan
Reconciled Estimates
Since we have a small team, less rigid requirements, and a long deadline, we are
using the organic COCOMO [23] model.
1. Effort Applied: It defines the amount of labor that will be required to
complete a task. It is measured in person-months (PM).
$E = a_b \,(\mathrm{KLOC})^{b_b}$
$E = 2.4 \times (20.5)^{1.05}$
$E = 57.2206\ \mathrm{PM}$
2. Development Time: It simply means the amount of time required for the
completion of the job, which is, of course, related to the effort put in. It is
measured in units of time such as weeks or months.
$D = c_b \,(E)^{d_b}$
$D = 2.5 \times (57.2206)^{0.38}$
$D = 11.6\ \text{months}$
3. People Required: The number of developers needed to complete the project.
$P = E / D$
$P = 57.2206 / 11.6$
$P = 4.93$
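The three estimates above can be reproduced with a short script. This is a minimal sketch that assumes the organic-mode coefficients quoted in the text (2.4, 1.05, 2.5, 0.38) and a size of 20.5 KLOC; the function and constant names are illustrative only. Because the script does not round the development time before dividing, the people estimate may differ slightly from the hand calculation in the last digit.

```python
# Organic-mode coefficients of the basic COCOMO model, as used in the text.
A_B, B_B, C_B, D_B = 2.4, 1.05, 2.5, 0.38


def cocomo_organic(kloc):
    """Return (effort in person-months, development time in months, people)."""
    effort = A_B * kloc ** B_B        # E = a_b * (KLOC)^b_b
    dev_time = C_B * effort ** D_B    # D = c_b * (E)^d_b
    people = effort / dev_time        # P = E / D
    return effort, dev_time, people


effort, dev_time, people = cocomo_organic(20.5)
print(f"E = {effort:.2f} PM, D = {dev_time:.2f} months, P = {people:.2f}")
```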
Before training, we need to prepare thousands of images of both persons. We can
take a shortcut and use a face detection library to scrape facial pictures from their
videos. Significant time should be spent improving the quality of the facial pictures,
as it impacts the final result significantly.
1. Remove any picture frames that contain more than one person.
2. Make sure there is an abundance of video footage. Extract facial pictures that
contain different poses, face angles, and facial expressions.
3. Some resemblance between the two persons may help, such as similar face shapes.
Risk Analysis
In deepfakes, a mask is created on the generated face so that it can blend in with
the target video. To further eliminate the artifacts:
1. Apply a Gaussian filter to further diffuse the mask boundary area.
Project Schedule
Project task set
• Task 2: Module 1 implementation
Module 1 implementation consists of splitting the video into frames and
cropping each frame around the detected face.
• Task 3: Pre-processing
Pre-processing includes the creation of a new dataset that contains only
face-cropped videos.
Timeline chart
Chapter 6
Software requirement specification
Introduction
Purpose and Scope of Document
This document lays out a project plan for the development of deepfake video
detection using neural networks. The intended readers of this document are current
and future developers working on deepfake video detection using neural networks and
the sponsors of the project. The plan will include, but is not restricted to, a
summary of the system functionality, the scope of the project from the perspective of
the “Deepfake video detection” team (me and my mentors), the use case diagram, data
flow diagram, activity diagram, functional and non-functional requirements, project
risks and how those risks will be mitigated, the process by which we will develop the
project, and metrics and measurements that will be recorded throughout the project.
Functional Model and Description
A description of each major software function is presented, along with the data flow
(structured analysis) or class hierarchy (analysis class diagram with class
descriptions for an object-oriented system).
DFD Level-0 indicates the basic flow of data in the system. In this system, the input
is given the same importance as the output.
DFD Level-1
DFD Level-2
Activity Diagram: Training
Workflow:
Figure 6.5: Training Workflow
Testing Workflow:
Safety Requirement
• Data integrity is preserved. Once a video is uploaded to the system, it is
processed only by the algorithm. The videos are kept secure from human
intervention, as the uploaded video is not available for human manipulation.
• To further protect the videos, those uploaded by the user are deleted from the
server after 30 minutes.
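The 30-minute retention rule can be checked with a small helper. This is a minimal sketch under stated assumptions: the function names, the dictionary of upload timestamps, and the file names are hypothetical, and the actual deletion mechanism used by the application is not described in this document.

```python
import time

RETENTION_SECONDS = 30 * 60  # 30-minute retention window from the requirement


def is_expired(uploaded_at, now=None, retention=RETENTION_SECONDS):
    """Return True when an upload is at least `retention` seconds old."""
    if now is None:
        now = time.time()
    return now - uploaded_at >= retention


def expired_uploads(uploads, now=None):
    """`uploads` maps a (hypothetical) file name to its upload timestamp."""
    return [name for name, ts in uploads.items() if is_expired(ts, now)]


# Example with fixed timestamps so the outcome does not depend on the clock.
stale = expired_uploads({"a.mp4": 0, "b.mp4": 1000}, now=1900)
print(stale)
```

A periodic job could call `expired_uploads` and remove the listed files from the server.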
Security Requirement
• While a video is being uploaded, it is encrypted using a symmetric encryption
algorithm. On the server, the video is also stored only in encrypted form.
The video is decrypted only from preprocessing until the output is obtained;
after that, the video is encrypted again.
• This cryptography will help in maintaining the security and integrity of the video.
Sequence Diagram
Chapter 7
Detailed Design Document
Introduction
In this system, we have trained our PyTorch deepfake detection model on an equal
number of real and fake videos in order to avoid bias in the model. The system
architecture of the model is shown in the figure. In the development phase, we took
a dataset, preprocessed it, and created a new processed dataset that includes only
face-cropped videos.
To detect deepfake videos, it is very important to understand how a deepfake is
created. The majority of the tools, including those based on GANs and autoencoders,
take a source image and a target video as input. These tools split the video into
frames, detect the face in the video, and replace the source face with the target face
in each frame. The replaced frames are then combined using different pre-trained
models. These models also enhance the quality of the video by removing the traces
left over by the deepfake creation model, which results in a deepfake that looks
realistic. We have used the same approach to detect deepfakes. Deepfakes created
using pre-trained neural network models are so realistic that it is almost impossible
to spot the difference with the naked eye. In reality, however, the deepfake creation
tools leave some traces or artifacts in the video that may not be noticeable to the
naked eye. The motive of this paper is to identify these unnoticeable traces and
distinguishable artifacts and to classify the video as a deepfake or a real video.
1. Faceswap
2. Faceit
Architectural Design
Module 1 : Data-set Gathering
To make the model efficient for real-time prediction, we gathered data from
different available datasets: FaceForensics++ (FF) [1], the Deepfake Detection
Challenge (DFDC) [2], and Celeb-DF [3]. We then mixed the collected datasets to
create our own new dataset, so that detection is accurate and works in real time on
different kinds of videos. To avoid training bias in the model, we used 50% real and
50% fake videos.
The Deepfake Detection Challenge (DFDC) dataset [2] contains some audio-altered
videos. As audio deepfakes are out of scope for this paper, we preprocessed the
DFDC dataset and removed the audio-altered videos by running a Python script.
After preprocessing the DFDC dataset, we took 1500 real and 1500 fake videos
from the DFDC dataset, 1000 real and 1000 fake videos from the FaceForensics++
(FF) [1] dataset, and 500 real and 500 fake videos from the Celeb-DF [3] dataset.
This gives a dataset of 3000 real and 3000 fake videos, 6000 videos in total.
Figure 2 depicts the distribution of the datasets.
Module 2 : Pre-processing
In this step, the videos are preprocessed and all unrequired content and noise is
removed from them. Only the required portion of the video, i.e. the face, is detected
and cropped. The first step in preprocessing is to split the video into frames. After
splitting, the face is detected in each frame and the frame is cropped around the
face. The cropped frames are then combined again into a new video. This process is
followed for each video, which leads to a processed dataset containing face-only
videos. Frames that do not contain a face are ignored during preprocessing.
To keep the number of frames uniform, we selected a threshold value based on the
mean of the total frame count of each video. Another reason for selecting a threshold
is limited computation power. A video of 10 seconds at 30 frames per second (fps) has
300 frames in total, and it is computationally very difficult to process 300 frames at
a single time in the experimental environment. Based on the computational power of
our Graphics Processing Unit (GPU) in the experimental environment, we selected 150
frames as the threshold value. While saving frames to the new dataset, we saved only
the first 150 frames of each video to the new video. To make proper use of Long
Short-Term Memory (LSTM), we considered the frames in sequential order, i.e. the
first 150 frames, and not randomly. The newly created video is saved at a frame rate
of 30 fps and a resolution of 112 x 112.
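The frame selection rule described above (skip frames without a face, keep frames in their original order, stop at the 150-frame threshold) can be sketched independently of any video library. In this minimal sketch the `has_face` predicate is a hypothetical stand-in for the face detector, and integers stand in for decoded frames. Whether the 150-frame limit is applied before or after discarding faceless frames is not stated in the text; the sketch assumes it is applied after.

```python
def select_face_frames(frames, has_face, limit=150):
    """Keep the first `limit` frames that contain a face, in original order."""
    selected = []
    for frame in frames:
        if not has_face(frame):
            continue  # frames without a face are ignored
        selected.append(frame)
        if len(selected) == limit:
            break  # threshold reached; later frames are not processed
    return selected


# Toy run: a 10 s clip at 30 fps, where every 10th "frame" has no face.
frames = range(300)
kept = select_face_frames(frames, has_face=lambda f: f % 10 != 0)
print(len(kept), kept[:5])
```

Taking a contiguous prefix rather than a random sample preserves the temporal order that the LSTM relies on.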
The dataset is split into train and test sets with a ratio of 70% train videos (4,200)
and 30% test videos (1,800). The split is balanced, i.e. each split contains 50% real
and 50% fake videos.
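A balanced 70/30 split can be obtained by splitting the real and fake videos separately and then merging the parts. The following is a minimal sketch; the file names, the label strings, and the fixed random seed are assumptions for illustration and do not reflect the project's actual file layout.

```python
import random


def balanced_split(real, fake, train_ratio=0.7, seed=0):
    """Split each class separately so both splits stay 50% real, 50% fake."""
    rng = random.Random(seed)
    train, test = [], []
    for label, videos in (("REAL", real), ("FAKE", fake)):
        items = list(videos)
        rng.shuffle(items)
        cut = round(len(items) * train_ratio)
        train += [(name, label) for name in items[:cut]]
        test += [(name, label) for name in items[cut:]]
    rng.shuffle(train)  # mix the classes within each split
    rng.shuffle(test)
    return train, test


# Hypothetical file names standing in for the 3000 real and 3000 fake videos.
real = [f"real_{i}.mp4" for i in range(3000)]
fake = [f"fake_{i}.mp4" for i in range(3000)]
train, test = balanced_split(real, fake)
print(len(train), len(test))
```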
Our model is a combination of a CNN and an RNN. We used the pre-trained ResNext
CNN model to extract features at the frame level, and an LSTM network is trained on
the extracted features to classify the video as deepfake or pristine. Using the Data
Loader on the training split, the videos and their labels are loaded and fed into the
model for training.
ResNext :
Instead of writing the code from scratch, we used the pre-trained ResNext model
for feature extraction. ResNext is a residual CNN optimized for high performance
on deeper neural networks. For experimental purposes we used the
resnext50_32x4d model, i.e. a ResNext with 50 layers, a cardinality of 32, and a
bottleneck width of 4.
Next, we fine-tune the network by adding the extra required layers and selecting
a proper learning rate so that the gradient descent of the model converges properly.
The 2048-dimensional feature vectors produced after the last pooling layer of
ResNext are used as the sequential LSTM input.
The 2048-dimensional feature vectors are fed as input to the LSTM. We use 1
LSTM layer with 2048 latent dimensions and 2048 hidden units, along with a dropout
probability of 0.4, which is sufficient to achieve our objective. The LSTM processes
the frames sequentially so that a temporal analysis of the video can be made by
comparing the frame at time ‘t’ with the frame at time ‘t-n’, where n can be any
number of frames before t.
The model also uses a Leaky ReLU activation function. A linear layer with 2048
input features and 2 output features is used to make the model capable of learning
the average rate of correlation between input and output. An adaptive average pooling
layer with the output parameter 1 is used in the model, which gives a target output
size of the form H x W. A Sequential layer is used for sequential processing of the
frames. A batch size of 4 is used for batch training. A SoftMax layer is used to get
the confidence of the model during prediction.
Hyper-parameter tuning is the process of choosing the best hyper-parameters for
achieving the maximum accuracy. The best hyper-parameters for our dataset were
chosen after iterating many times on the model. To enable an adaptive learning rate,
the Adam [21] optimizer is used with the model parameters. The learning rate is
tuned to 1e-5 (0.00001) to reach a better minimum during gradient descent. The
weight decay used is 1e-3.
As this is a classification problem, the cross entropy approach is used to
calculate the loss. To use the available computation power properly, batch training
is used with a batch size of 4, which was found to be the ideal size for training in
our development environment.
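To make the loss concrete, the following minimal sketch computes the cross entropy for a two-class output and averages it over a batch of 4. It uses plain Python rather than the framework's built-in loss, and the logit values and targets are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def batch_loss(batch_logits, targets):
    """Mean cross-entropy over a batch."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical logits for a batch of 4 videos (index 0 = real, 1 = fake).
batch_logits = [[2.0, -1.0], [0.5, 0.5], [-1.5, 3.0], [0.2, 1.1]]
targets = [0, 1, 1, 0]
print(batch_loss(batch_logits, targets))
```

The loss is small when the logit of the correct class dominates and grows as the model becomes confident in the wrong class.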
The user interface for the application is developed using the Django framework.
Django is used to enable the scalability of the application in the future.
The first page of the user interface, index.html, contains a tab to browse for and
upload a video. The uploaded video is then passed to the model, which makes the
prediction. The model returns whether the video is real or fake, along with its
confidence. The output is rendered in predict.html on the face of the playing video.
Chapter 8
Project Implementation
Introduction
There are many examples where deepfake creation technology has been used to
mislead people on social media platforms by sharing false deepfake videos of famous
personalities, such as the Mark Zuckerberg video released on the eve of the House
A.I. hearing [4], the Donald Trump Breaking Bad clip in which he was introduced as
James McGill, Barack Obama’s public service announcement, and many more [5]. These
types of deepfakes create huge panic among ordinary people, which raises the need to
spot deepfakes accurately so that they can be distinguished from real videos.
Tools and Technologies Used
Planning
1. OpenProject
UML Tools
1. draw.io
Programming Languages
1. Python3
2. JavaScript
Programming Frameworks
1. PyTorch
2. Django
IDE
Versioning Control
1. Git
Cloud Services
Libraries
1. torch
2. torchvision
3. os
4. numpy
5. cv2
6. matplotlib
7. face_recognition
8. json
9. pandas
10. copy
11. glob
12. random
13. sklearn
Algorithm Details
Dataset Details
Refer to 7.2.1.
Preprocessing Details
• Using glob we imported all the videos in the directory in a python list.
• cv2.VideoCapture is used to read the videos and get the mean number of frames
per video.
• To maintain uniformity, based on the mean, a value of 150 is selected as the
ideal frame count for creating the new dataset.
• The video is split into frames and the frames are cropped at the face location.
• The face-cropped frames are written to a new video using VideoWriter.
• The new video is written at 30 frames per second with a resolution of 112 x 112
pixels in the mp4 format.
• Instead of selecting random frames, the first 150 frames are written to the new
video to make proper use of the LSTM for temporal sequence analysis.
Model Details
• LSTM Layer: The LSTM is used for sequence processing and to spot temporal
changes between the frames. The 2048-dimensional feature vectors are fed as
input to the LSTM. We use 1 LSTM layer with 2048 latent dimensions and 2048
hidden units, along with a dropout probability of 0.4, which is sufficient to
achieve our objective. The LSTM processes the frames sequentially so that a
temporal analysis of the video can be made by comparing the frame at time ‘t’
with the frame at time ‘t-n’, where n can be any number of frames before t.
• ReLU: A Rectified Linear Unit is an activation function that outputs 0 if the
input is less than 0 and the raw input otherwise. That is, if the input is
greater than 0, the output is equal to the input. The operation of ReLU is closer
to the way our biological neurons work. ReLU is non-linear and, unlike the
sigmoid function, is much less prone to vanishing gradients during
backpropagation. For larger neural networks, models based on ReLU are also very
fast to build.
• Dropout Layer: A dropout layer with a value of 0.4 is used to avoid overfitting
in the model. It helps the model generalize by randomly setting the output of a
given neuron to 0. When an output is set to 0, the cost function becomes more
sensitive to neighbouring neurons, which changes the way the weights are updated
during backpropagation.
• Train Test Split: The dataset is split into train and test sets with a ratio of
70% train videos (4,200) and 30% test videos (1,800). The split is balanced,
i.e. each split contains 50% real and 50% fake videos. Refer to figure 7.6.
• Data Loader: It is used to load the videos and their labels with a batch size
of 4.
• Training: The training is done for 20 epochs with a learning rate of 1e-5
(0.00001) and a weight decay of 1e-3 (0.001), using the Adam optimizer.
• Cross Entropy: To calculate the loss function Cross Entropy approach is used
because we are training a classification problem.
• Confusion Matrix: A confusion matrix gives insight not only into the errors
being made by a classifier but, more importantly, into the types of errors that
are being made. The confusion matrix is used to evaluate our model and calculate
the accuracy.
• Export Model: After the model is trained, it is exported so that it can be used
for prediction on real-time data.
• The new video for prediction is preprocessed (refer 8.3.2, 7.2.2) and passed to
the loaded model for prediction.
• The trained model performs the prediction and returns whether the video is real
or fake, along with the confidence of the prediction.
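The confidence returned with a prediction can be derived from the two output values of the model with a softmax. The sketch below uses plain Python; the index-to-label mapping and the sample logits are assumptions for illustration.

```python
import math

LABELS = ("REAL", "FAKE")  # assumed mapping from output index to label


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label, confidence in percent) for one video."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx] * 100.0


label, confidence = predict([0.3, 2.1])  # hypothetical model output
print(f"{label} ({confidence:.1f}% confidence)")
```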
Chapter 9
Software Testing
Non-functional Testing
Chapter 10
Screen shots
Chapter 11
Deployment and Maintenance
Deployment
The following steps are to be followed to deploy the application.
• cd DeepFake_Detection
• python server.py
• Access the application at http://localhost:5000
Chapter 12
Conclusion and Future Scope
Conclusion
We presented a neural network-based approach to classify a video as deepfake
or real, along with the confidence of the proposed model. Our method is capable of
predicting the output by processing 1 second of video (10 frames per second) with
good accuracy. We implemented the model using a pre-trained ResNext CNN model
to extract frame-level features and an LSTM for temporal sequence processing to
spot the changes between frames t and t-1. Our model can process videos in frame
sequences of 10, 20, 40, 60, 80, and 100.
Future Scope
There is always scope for enhancement in any developed system, especially
when the project is built using the latest trending technology and has good scope
in the future.
• The web-based platform can be extended to a browser plugin for ease of access
for the user.
• Currently the algorithm detects only face deepfakes, but it can be enhanced to
detect full-body deepfakes.
Appendix A
References
[1] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus
Thies, Matthias Nießner, “FaceForensics++: Learning to Detect Manipulated Facial
Images” in arXiv:1901.08971.
[2] Deepfake detection challenge dataset:
https://www.kaggle.com/c/deepfake-detection-challenge/data (Accessed on 26
March, 2020)
[3] Yuezun Li, Xin Yang, Pu Sun, Honggang Qi and Siwei Lyu, “Celeb-DF: A
Large-scale Challenging Dataset for DeepFake Forensics” in arXiv:1909.12962.
[4] Deep Fake Video of Mark Zuckerberg Goes Viral on Eve of House A.I. Hearing:
https://fortune.com/2019/06/12/deepfake-mark-zuckerberg/ (Accessed on 26
March, 2020)
[5] 10 deep fake examples that terrified and amused the internet:
https://www.creativebloq.com/features/deepfake-examples (Accessed on 26 March,
2020)
[6] TensorFlow: https://www.tensorflow.org/ (Accessed on 26 March, 2020)
[7] Keras: https://keras.io/ (Accessed on 26 March, 2020)
[8] PyTorch: https://pytorch.org/ (Accessed on 26 March, 2020)
[9] G. Antipov, M. Baccouche, and J.-L. Dugelay. Face aging with conditional
generative adversarial networks. arXiv:1702.01983, Feb. 2017.
[10] J. Thies et al. Face2Face: Real-time face capture and reenactment of RGB
videos. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 2387–2395, June 2016. Las Vegas, NV.
[11] Face app: https://www.faceapp.com/ (Accessed on 26 March, 2020)
[12] Face Swap: https://faceswaponline.com/ (Accessed on 26 March, 2020)
[13] Deepfakes, Revenge Porn, And The Impact On Women:
https://www.forbes.com/sites/chenxiwang/2019/11/01/deepfakes-revenge-porn-and-the-impact-on-women/
[14] The rise of the deep fake and the threat to democracy:
https://www.theguardian.com/technology/ng-interactive/2019/jun/22/the-rise-of-the-deepfake-and-the-threat-to-democracy
(Accessed on 26 March, 2020)
[15] Yuezun Li, Siwei Lyu, “Exposing DeepFake Videos By Detecting Face Warping
Artifacts,” in arXiv:1811.00656v3.
[16] Yuezun Li, Ming-Ching Chang and Siwei Lyu, “Exposing AI Created Fake
Videos by Detecting Eye Blinking” in arXiv:1806.02877v2.
[17] Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen, “Using capsule
networks to detect forged images and videos” in arXiv:1810.11215.
[18] D. Güera and E. J. Delp, "Deepfake Video Detection Using Recurrent Neural
Networks," 2018 15th IEEE International Conference on Advanced Video and
Signal Based Surveillance (AVSS), Auckland, New Zealand, 2018, pp. 1-6.
[19] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic
human actions from movies. Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 1–8, June 2008. Anchorage, AK.
[20] Umur Aybars Ciftci, İlke Demir, Lijun Yin, “Detection of Synthetic Portrait
Videos using Biological Signals” in arXiv:1901.02212v2.
[21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization.
arXiv:1412.6980, Dec. 2014.
[22] ResNext Model: https://pytorch.org/hub/pytorch_vision_resnext/ (Accessed on
06 April 2020)
[23] https://www.geeksforgeeks.org/software-engineering-cocomo-model/ (Accessed
on 15 April 2020)
[24] Deepfake Video Detection using Neural Networks:
http://www.ijsrd.com/articles/IJSRDV8I10860.pdf
[25] International Journal for Scientific Research and Development: http://ijsrd.com/
Appendix B
Project Planner
APPENDIX C
FINAL CO-MAPPING OF PROJECT
Project Outcomes :
PO8: Ethics
PO10: Communication
Communicate with engineers and the community at large in written and oral forms. (Real life, Industry based project): 3 2 1 2 1 2 1 1 3 1 3 3
Be able to present the project outlining the approach and expected results using good oral presentation skills: 3 2 1 2 1 1 2 1 3 3 2 3
Annexure
Information of Project Group Members:
Photos Details