This document discusses deepfakes, including what they are, their history, present uses, future challenges, and consequences. Deepfakes use deep learning techniques like GANs to manipulate images and audio to deceive viewers into thinking something is real when it is actually fake. While initially developed by researchers, open-source tools now allow anyone to generate deepfakes. The future poses challenges around reducing training data needs, improving temporal coherence in videos, and preventing identity leakage, among other issues. Deepfakes could potentially target politicians, actors and public figures to manipulate perceptions. Prevention strategies include developing counter-AI techniques, using blockchain, and raising awareness.
• When something real is taken and deep learning is applied to it, turning it into something fake.
• Deep learning + Fake = Deepfake
• Deep learning involves training generative neural network architectures, such as Autoencoders or Generative Adversarial Networks (GANs); a minimal GAN sketch follows this list.
• The generated visual and audio content has a high potential to deceive.
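As a rough illustration of the adversarial training behind GANs, here is a minimal sketch in PyTorch. The layer sizes, learning rates, and the use of simple fully connected networks are illustrative assumptions, not taken from any real deepfake system:

```python
import torch
import torch.nn as nn

LATENT_DIM, IMG_DIM = 64, 28 * 28  # illustrative sizes

# Generator: maps random noise to a fake "image" vector in [-1, 1].
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),
)
# Discriminator: outputs a single real/fake logit for an image vector.
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):  # real_images: (batch, IMG_DIM) in [-1, 1]
    batch = real_images.size(0)
    fake_images = generator(torch.randn(batch, LATENT_DIM))

    # Discriminator step: push real toward 1 and fake toward 0.
    d_loss = (bce(discriminator(real_images), torch.ones(batch, 1)) +
              bce(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make the discriminator call fakes "real".
    g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

The two networks improve against each other: as the discriminator gets better at spotting fakes, the generator is forced to produce more convincing ones, which is what gives GAN-based deepfakes their deceptive quality.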
• Photo manipulation was developed in the 19th century and soon applied to motion pictures.
• Technology steadily improved during the 20th century,
and more quickly with digital video.
• Deepfake technology was initially developed by researchers at academic institutions in the 1990s, and later by amateurs in online communities.
• An early landmark project was the Video Rewrite
program, published in 1997, which modified existing
video footage of a person speaking to depict that
person mouthing the words contained in a different
audio track.
• The Face2Face program, published in 2016, transfers the facial expressions of a source actor onto a target video in real time.
• The “Synthesizing Obama” program, published in 2017, generates video of Barack Obama with mouth movements synthesized from a separate audio track.
• In August 2018, researchers at the University of
California, Berkeley published a paper introducing a
fake dancing app that can create the impression of
masterful dancing ability using AI.
• By 2019, open-source software such as Faceswap and the command-line-based DeepFaceLab had made deepfake creation accessible to the public.
• An influential research paper, Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, was published in June 2018; it showed that a voice could be cloned from only a few seconds of a speaker’s audio.
• In January 2020, another research paper, Neural Voice Puppetry: Audio-driven Facial Reenactment, was published.
• Avatarify, launched in April 2020, provides photorealistic avatars for video-conferencing apps; more interestingly, it is open source.
• Even before these papers were published, Lyrebird AI (2017) offered software for creating synthesized audio.
• Desktop apps such as FakeApp and mobile apps such as Impression and Doublicat followed.
• And many more…
• Generalization: High-quality deepfakes are often achieved by training on hours of footage of the target. The challenge is to minimize the amount of training data required to produce quality images and to enable trained models to run on new identities unseen during training (a few-shot adaptation sketch follows).
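One hedged sketch of that few-shot direction: reuse a generator pretrained on many faces, freeze its backbone, and fine-tune only a small per-identity head on a handful of images of the new identity. The architecture, sizes, and training recipe here are hypothetical, purely to show the adaptation pattern:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained generator: a frozen identity-agnostic backbone
# plus a small head that is re-fitted for each new identity.
backbone = nn.Sequential(nn.Linear(128, 512), nn.ReLU(),
                         nn.Linear(512, 512), nn.ReLU())
head = nn.Linear(512, 64 * 64 * 3)  # per-identity output layer

for p in backbone.parameters():
    p.requires_grad = False  # keep what was learned from many identities

opt = torch.optim.Adam(head.parameters(), lr=1e-4)
l1 = nn.L1Loss()

def adapt_to_new_identity(codes, few_images, steps=200):
    """codes: (N, 128) driving codes; few_images: (N, 64*64*3) targets.
    N can be very small -- that is the point of few-shot adaptation."""
    for _ in range(steps):
        pred = head(backbone(codes))
        loss = l1(pred, few_images)
        opt.zero_grad(); loss.backward(); opt.step()
```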
• Paired Training: Training a supervised model can produce high-quality results, but requires data pairing, i.e. the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as CycleGAN (sketched below), or the manipulation of network embeddings.
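The cycle-consistency idea behind CycleGAN-style unpaired training can be sketched as follows: two generators map between domains A and B, and each round trip must reconstruct its input, so no (input, output) pairs are ever needed. The tiny linear "generators" and the loss weight are illustrative assumptions:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(256, 256), nn.Tanh())  # maps domain A -> B
F = nn.Sequential(nn.Linear(256, 256), nn.Tanh())  # maps domain B -> A
l1 = nn.L1Loss()

def cycle_loss(a, b, weight=10.0):
    # a: batch of images from domain A; b: batch from domain B (unpaired).
    loss_a = l1(F(G(a)), a)  # A -> B -> A should return to the original a
    loss_b = l1(G(F(b)), b)  # B -> A -> B should return to the original b
    return weight * (loss_a + loss_b)
```

In a full system this term is added to the usual adversarial losses; the sketch isolates only the part that removes the need for paired data.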
• Identity leakage: This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Proposed solutions include attention mechanisms, few-shot learning, disentanglement (sketched below), boundary conversions, and skip connections.
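Of those remedies, disentanglement is the easiest to sketch: encode identity and expression into separate codes, then decode the target's identity code together with the driver's expression code, so the driver's identity has no path into the output. The encoders, sizes, and flattened-image inputs are illustrative assumptions:

```python
import torch
import torch.nn as nn

id_encoder = nn.Linear(256, 64)    # encodes *who* the face is
expr_encoder = nn.Linear(256, 32)  # encodes *what* the face is doing
decoder = nn.Sequential(nn.Linear(64 + 32, 256), nn.Tanh())

def reenact(driver_img, target_img):
    expr = expr_encoder(driver_img)  # take only the driver's expression
    ident = id_encoder(target_img)   # keep the target's identity
    return decoder(torch.cat([ident, expr], dim=-1))
```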
• Occlusions: When part of the face is obstructed by a hand, hair, glasses, or any other item, artifacts can occur. A common occlusion is a closed mouth, which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training (sketched below) and in-painting.
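A hedged sketch of how a segmentation mask can be used during training: pixels covered by hands, hair, or glasses are simply excluded from the reconstruction loss, so the network is neither penalized for nor encouraged to hallucinate occluded regions. Obtaining the mask itself (e.g. from a face-parsing model) is assumed:

```python
import torch

def masked_l1(pred, target, face_mask):
    # face_mask: 1.0 where the face is visible, 0.0 where it is occluded.
    diff = (pred - target).abs() * face_mask
    return diff.sum() / face_mask.sum().clamp(min=1.0)
```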
• Temporal coherence: In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal-coherence losses to improve realism (one possible loss is sketched below).
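One simple form such a temporal-coherence loss could take (an illustrative assumption; published methods often warp frames by optical flow first) is to require consecutive generated frames to change about as much as the corresponding real frames, which suppresses flicker:

```python
import torch

def temporal_loss(fake_frames, real_frames):
    # Both tensors: (T, C, H, W) sequences of video frames.
    fake_diff = fake_frames[1:] - fake_frames[:-1]  # frame-to-frame change
    real_diff = real_frames[1:] - real_frames[:-1]
    return (fake_diff - real_diff).abs().mean()
```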
The following are the main targets, and why:
• Politicians (easy to fake, with dangerous consequences)
• Film actors (abundant footage available as training data)
• Social figures
• General public