Lightweight Deep Learning for Image Forgery Detection

The document discusses image forgery detection using fusion of lightweight deep learning models. It introduces a robust deep learning system to identify image forgeries from double image compression artifacts. The proposed model is lightweight and faster than state-of-the-art approaches, achieving 92.23% validation accuracy on detecting different types of forgeries from compression artifacts.

Uploaded by

Venkat Karthik
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
298 views78 pages

Lightweight Deep Learning for Image Forgery Detection

The document discusses image forgery detection using fusion of light weight deep learning models. It introduces a robust deep learning system to identify image forgeries from double image compression artifacts. The proposed model is lightweight and faster than state-of-the-art approaches, achieving 92.23% validation accuracy on detecting different types of forgeries from compression artifacts.

Uploaded by

Venkat Karthik
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

IMAGE FORGERY DETECTION BASED ON FUSION OF LIGHTWEIGHT DEEP LEARNING MODELS

A project report submitted in partial fulfillment
of the requirements for the award of the degree of
Master of Computer Applications
Submitted by
STUDENT_NAME
ROLL_NO
Under the esteemed guidance of
GUIDE_NAME
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


ST. MARY’S GROUP OF INSTITUTIONS, GUNTUR
(Affiliated to JNTU Kakinada, Approved by AICTE, Accredited by NBA)
CHEBROLU-522 212, A.P., INDIA
2014-16
ST. MARY’S GROUP OF INSTITUTIONS, CHEBROLU, GUNTUR
(Affiliated to JNTU Kakinada)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the project report entitled “PROJECT_NAME” is the bonafide record of
project work carried out by STUDENT_NAME, a student of this college, during the academic
year 2014-2016, in partial fulfillment of the requirements for the award of the degree of Master
of Computer Applications from St. Mary’s Group of Institutions Guntur of Jawaharlal Nehru
Technological University, Kakinada.

GUIDE_NAME,
Asst. Professor
(Project Guide)

Associate Professor
(Head of Department, CSE)
DECLARATION

We hereby declare that the project report entitled “PROJECT_NAME” is an original work
done at St. Mary’s Group of Institutions Guntur, Chebrolu, Guntur and submitted in fulfillment
of the requirements for the award of Master of Computer Applications, to St. Mary’s Group of
Institutions Guntur, Chebrolu, Guntur.

STUDENT_NAME
ROLL_NO
ACKNOWLEDGEMENT

We consider it a privilege to thank all those who helped us in the successful
completion of the project “PROJECT_NAME”. We extend special gratitude to
our guide GUIDE_NAME, Asst. Professor, whose stimulating suggestions and
encouragement helped us coordinate our project, especially in writing this report,
and whose valuable guidance and comprehensive assistance helped us greatly in
presenting the project.
We would also like to acknowledge with much appreciation the crucial role of our
co-ordinator GUIDE_NAME, Asst. Professor, for helping us complete our project.
Thank you for being such a wonderful educator as well as a person.
We express our heartfelt thanks to HOD_NAME, Head of the Department, CSE,
for his spontaneous sharing of knowledge, which helped us in bringing this
project through the academic year.

STUDENT_NAME
ROLL_NO
ABSTRACT:

Capturing images has been increasingly popular in recent years, owing to the widespread availability of
cameras. Images are essential in our daily lives because they contain a wealth of information, and it is often
required to enhance images to obtain additional information. A variety of tools are available to improve
image quality; nevertheless, they are also frequently used to falsify images, resulting in the spread of
misinformation. This increases the severity and frequency of image forgeries, which is now a major source
of concern. Numerous traditional techniques have been developed over time to detect image forgeries. In
recent years, convolutional neural networks (CNNs) have received much attention, and CNNs have also
influenced the field of image forgery detection. However, most CNN-based image forgery techniques
in the literature are limited to detecting a specific type of forgery (either image splicing or
copy-move). As a result, a technique capable of efficiently and accurately detecting the presence of
unseen forgeries in an image is required. In this paper, we introduce a robust deep learning-based system for
identifying image forgeries in the context of double image compression. The difference between an image’s
original and recompressed versions is used to train our model. The proposed model is lightweight, and its
performance demonstrates that it is faster than state-of-the-art approaches. The experiment results are
encouraging, with an overall validation accuracy of 92.23%.

TABLE OF CONTENTS

1. ABSTRACT
2. INTRODUCTION
2.1 SYSTEM ANALYSIS
2.2 PROPOSED SYSTEM
2.3 OVERVIEW OF THE PROJECT
3. LITERATURE SURVEY
3.1 REQUIREMENT SPECIFICATIONS
3.2 HARDWARE AND SOFTWARE SPECIFICATIONS
3.3 TECHNOLOGIES USED
3.4 INTRODUCTION TO PYTHON
3.5 MACHINE LEARNING
3.6 SUPERVISED LEARNING
4. DESIGN AND IMPLEMENTATION CONSTRAINTS
4.1 CONSTRAINTS IN ANALYSIS
4.2 CONSTRAINTS IN DESIGN
5. DESIGN AND IMPLEMENTATION
6. ARCHITECTURE DIAGRAM
7. MODULES
8. CODING AND TESTING
9. APPENDICES

CHAPTER 1

SYNOPSIS
Due to technological advancements and globalization, electronic equipment is now widely and inexpensively
available. As a result, digital cameras have grown in popularity. There are many camera sensors all around us, and
we use them to collect a lot of images. Images are required in the form of a soft copy for various documents that
must be filed online, and a large number of images are shared on social media every day. The amazing thing about
images is that even illiterate people can look at them and extract information from them. As a result, images are an
integral component of the digital world, and they play an essential role in storing and distributing data. There are
numerous tools accessible for quickly editing the images [1,2]. These tools were created with the intention of
enhancing and improving the images. However, rather than enhancing the image, some people exploit their
capabilities to falsify images and propagate falsehoods [3,4]. This is a significant threat, as the damage caused by
faked images is not only severe, but also frequently irreversible.
There are two basic types of image forgery, image splicing and copy-move, which are discussed below:

• Image Splicing: A portion of a donor image is copied into a source image. A sequence of donor images can
likewise be used to build the final forged image.
• Copy-Move: This scenario involves a single image: a portion of the image is copied and pasted elsewhere
within it, frequently to conceal other objects. The final forged image contains no components from other
images.
The primary purpose in both cases of image forgery is to spread misinformation by replacing the original content of
an image with something else [5,6]. Images were once an extremely credible source for information exchange;
however, due to image forgery, they are now used to spread misinformation. This is eroding public trust in
images, as forgeries may not be visible or recognizable to the naked eye. As a result, it is essential to detect
image forgeries to prevent the spread of misinformation and to restore public trust in images. This can be done
by exploring the various artifacts left behind when an image forgery is performed, which can be identified using
various image processing techniques.
Researchers have proposed a variety of methods for detecting the presence of image forgeries [7–9]. Conventional
image forgery detection techniques detect forgeries by concentrating on the multiple artifacts present in a forged
image, such as changes in illumination, contrast, compression, sensor noise, and shadow. CNNs have gained
popularity in recent years for various computer vision tasks, including image object recognition, semantic
segmentation, and image classification. Two major features contribute to CNNs’ success in computer vision.
Firstly, a CNN takes advantage of the significant correlation between adjacent pixels. As a result, a CNN prefers
locally grouped connections over one-to-one connections between all pixels. Second, each output feature map is
produced through a convolution operation by sharing weights. Moreover, compared to traditional methods that
depend on engineered features to detect a specific forgery, a CNN uses features learned from training images and
can generalize to detect unseen forgeries. These advantages make CNNs a promising tool for detecting
the presence of forgery in an image. It is possible to train a CNN-based model to learn the many artifacts found in a
forged image [10–13]. Thus, we propose a very light CNN-based network, with the primary goal of learning the
artifacts that occur in a tampered image as a result of differences in the features of the original image and the
tampered region.
The major contributions of the proposed technique are as follows:

• A lightweight CNN-based architecture is designed to detect image forgery efficiently. The proposed
technique explores numerous artifacts left behind in the image tampering process, and it takes advantage of
differences in image sources through image recompression.

• While most existing algorithms are designed to detect only one type of forgery, our technique can detect both
image splicing and copy-move forgeries and has achieved high accuracy in image forgery detection.

• Compared to existing techniques, the proposed technique is fast and can detect the presence of image
forgery in significantly less time. Its accuracy and speed make it suitable for real-world applications, as it can
function well even on slower devices.
The rest of the paper is organized as follows. Section 2 provides a literature review of image forgery detection
methodologies. Section 3 introduces the proposed framework for detecting the presence of forgeries in an image.
Section 4 contains a discussion of the experimentation and the results achieved. Finally, in Section 5, we
summarize the conclusions.

2. Literature Review
Various approaches have been proposed in the literature to deal with image forgery. The majority of traditional
techniques are based on particular artifacts left by image forgery, whereas recently techniques based on CNNs and
deep learning were introduced, which are mentioned below. First, we will mention the various traditional
techniques and then move on to deep learning based techniques.

In [14], the authors proposed error level analysis (ELA) for the detection of forgery in an image. In [15],
forgery in an image is detected based on the lighting conditions of objects: the method tries to find the forgery
from the difference in the lighting direction between the forged part and the genuine part of an image. In [16],
various traditional image forgery detection techniques are evaluated. In [17], Habibi et al. use the contourlet
transform to retrieve the edge pixels for forgery detection. In [18], Dua et al. presented a JPEG
compression-based method. The discrete cosine transform (DCT) coefficients are assessed independently for each
block of an image partitioned into non-overlapping blocks of size 8 × 8 pixels. The statistical features of the AC
components of the block DCT coefficients alter when a JPEG-compressed image is tampered with. An SVM is used to
classify authentic and forged images using the retrieved feature vector. Ehret et al. in [19] introduced a
technique that relies on SIFT, which provides sparse keypoints with scale-, rotation-, and illumination-invariant
descriptors for forgery detection. A method for fingerprint faking detection utilizing deep Boltzmann machines
(DBMs) for image analysis of high-level characteristics is proposed in [20]. Balsa et al. in [21] compared the
DCT, Walsh–Hadamard transform (WHT), Haar wavelet transform (DWT), and discrete Fourier transform (DFT) for
analog image transmission, changing compression and comparing quality. These can be used for image forgery
detection by exploring the image in different domains. Thanh et al. proposed a hybrid approach for image splicing
in [22], in which they try to retrieve the original images that were used to construct a spliced image if a given
image is proven to be spliced. They present a hybrid image retrieval approach that uses Zernike moments and SIFT
features.
Bunk et al. established a method for detecting image forgeries based on resampling features and deep learning in
[23]. Bondi et al. in [24] suggested a method for detecting image tampering by the clustering of camera-based
CNN features. Myung-Joon in [2] introduced CAT-Net to acquire forensic aspects of compression artifacts in the DCT
and RGB domains simultaneously. Their primary network is HRNet (high-resolution network). They used the technique
proposed in [25], which shows how DCT coefficients can be used to train a CNN, as feeding DCT
coefficients to a CNN directly will not train it efficiently. Ashraful et al. in [26] proposed DOA-GAN, a GAN with
dual attention, to detect and localize copy-move forgeries in an image. The first-order attention in the generator
is designed to collect copy-move location information, while the second-order attention for patch co-occurrence
exploits more discriminative properties. The affinity matrix is utilized to extract both attention maps, which are
then used to combine location-aware and co-occurrence features for the network’s final detection and
localization branches.
Yue et al. in [27] proposed BusterNet for copy-move image forgery detection. It has a two-branch architecture with
a fusion module in the middle. The branches use visual artifacts to locate potential manipulation locations and
visual similarities to locate copy-move regions. Yue et al. in [28] employed a CNN to extract block-like
characteristics from an image, compute self-correlations between various blocks, locate matching points using a
point-wise feature extractor, and reconstruct a forgery mask using a deconvolutional network. Yue et al. in [3]
designed ManTra-Net, a fully convolutional network that can handle any size of image and a variety of
forgery types, including copy-move, enhancement, splicing, removal, and even unknown forgery forms. Liu et al.
in [29] proposed PSCC-Net, which analyses the image in a two-path methodology: a top-down route that retrieves
global and local features and a bottom-up route that senses whether the image is tampered with and predicts its
masks at four levels, each mask being constrained on the preceding one.
In [30], Yang et al. proposed a technique based on two concatenated CNNs, a coarse CNN and a refined CNN,
which extracts the differences between the image itself and the splicing regions from patch descriptors of
different scales. They enhanced their work in [1] and proposed a patch-based coarse-to-refined network (C2RNet).
The coarse network is based on VGG16, and the refined network is based on VGG19. In [31], Xiuli et al. proposed a
ringed residual U-Net to detect splicing-type image forgery in images. Younis et al. in [32] utilized a
reliability fusion map for the detection of forgery. Utilizing CNNs, Younis et al. in [33] classify an
image as original or as containing copy-move image forgery. In [34], Vladimir et al. train four models at the
same time: a generative annotation model GA, a generative retouching model GR, and two discriminators DA and
DR that check the outputs of GA and GR. Mayer et al. in [35] introduced a system that maps sets of image regions
to a value that indicates whether they include the same or different forensic traces.
In [36], Minyoung et al. designed an algorithm that leverages automatically recorded EXIF metadata to train a
model to identify whether an image is self-consistent, that is, whether its content could have been generated from a
single image. In [37], Rongyu et al. proposed a U-Net that consists of dense convolutional and deconvolutional
networks: the first is a down-sampling path for retrieving features, while the second is an up-sampling
path for recovering the feature map size. In [38], Lui et al. introduced a CNN segmentation-based approach to
find manipulated regions in digital photos. First, a uniform CNN architecture is built to deal with color input
sliding windows of various scales. Then, using sampled training regions, they meticulously build the CNN training
processes.
In [39], an unfixed encoder and a fixed encoder are used to build a Dual-encoder U-Net (D-Unet). The unfixed
encoder learns on its own the image fingerprints that distinguish between genuine and tampered regions, while
the fixed encoder offers directional data to facilitate the network’s learning and detection. In [40], Francesco
et al. tested the efficiency of several image forgery detectors on image-to-image translation, both in ideal
settings and in the presence of compression, which is commonly performed when uploading to social media
sites. Kadam et al. in [41] proposed a method for multiple image splicing detection using MobileNet V1. Jaiswal et al.
in [42] proposed a framework in which images are fed into a CNN and processed through several layers to
extract features, which are then utilized as a training vector for the detection model. For feature extraction, they
employed a pre-trained ResNet-50.
Hao et al. in [43] proposed using an attention method to analyze and refine feature maps for the detection task. The
learned attention maps emphasize informative areas to enhance binary classification (false face vs. genuine face)
and illustrate the altered regions. In [44], Nguyen et al. developed a CNN that employs a multi-task learning
strategy to detect altered images and videos while also locating the forged areas. The information gained from
one task is shared with the other, improving the performance of both. To boost the network’s
generalizability, a semi-supervised learning strategy is adopted. The network includes an encoder and a Y-shaped
decoder. Li et al. introduced a deepfake detection method in [45]. DeepFake techniques can only create
fixed-size images of the face, which must be affinely warped to match the source’s face arrangement. Due to the
resolution disparity between the warped face area and the surrounding context, this warping produces distinctive
artifacts, so DeepFake videos can be identified using them. Komodakis et al. in [46] suggested a
method for learning image features by training CNNs to recognize the two-dimensional rotation applied to
the picture received as input. The method proposed in [47] is composed of three parts: single-image
super-resolution, semantic segmentation super-resolution, and a feature affinity module for semantic segmentation.
In [48], Yu et al. used dual attention on pyramid visual feature maps to fully examine the visual-semantic
relationships and enhance the quality of produced sentences. For more details about image forgery and media
forensics, readers may refer to [5–13].
The state-of-the-art techniques available for detecting the presence of tampering in images generally take a
very long time to process the images. Most of them can detect either image splicing forgery or copy-move forgery,
not both, and many detect forgery with low accuracy. Hence, there is a need for a better framework that is fast
and more accurate. To address this, we present a novel image recompression-based system. Apart from achieving
better image forgery detection accuracy, our proposed framework has also achieved faster response times. This
makes it suitable for real-life applications, as it is more accurate and can be utilized even on slower machines.
The proposed framework is detailed in the next section.

OVERVIEW OF THE PROJECT


The Image Forgery Detection project is an innovative and advanced system designed to address the growing
concerns related to the manipulation and forgery of digital images. As the prevalence of image tampering continues
to rise, the need for robust and accurate forgery detection mechanisms becomes increasingly crucial. This project
leverages state-of-the-art techniques in deep learning to create a sophisticated solution capable of identifying
various forms of image manipulations.

In the digital age, the ease of image editing tools has led to an upsurge in deceptive practices, ranging from simple
retouching to more malicious activities like spreading misinformation. Recognizing the significance of maintaining
the integrity of digital content, this project introduces a cutting-edge approach to image forgery detection.

Key Components and Features:

1. Lightweight Deep Learning Models:


The core of the system revolves around the utilization of lightweight deep learning models. These models are
carefully trained to extract intricate features from images, enabling the system to discern subtle alterations. The
emphasis on lightweight models ensures efficient processing without compromising on accuracy, making the
forgery detection process both swift and reliable.

2. Fusion Techniques:
To enhance the robustness of forgery detection, the project incorporates fusion techniques. These techniques
intelligently combine outputs from multiple lightweight models, resulting in a consolidated decision that reflects a
more comprehensive understanding of the image content. The fusion process contributes to reducing false positives
and negatives, thereby elevating the system's overall performance.

3. Extensive Dataset Training:


The success of any deep learning-based system hinges on the quality and diversity of the training dataset. This
project meticulously curates an extensive dataset, encompassing various types of image forgeries, alterations, and
manipulations. The models undergo rigorous training to ensure adaptability to a wide array of scenarios, making
the system versatile and capable of handling real-world challenges.

Applications:

1. Media Forensics:
The project finds applications in media forensics, assisting professionals in verifying the authenticity of images
used in journalism, legal proceedings, and other critical domains. By providing a tool to identify potential
forgeries, the system contributes to maintaining the credibility of digital media.
2. Social Media Integrity:
In the era of social media, where visual content is widely shared, the project plays a pivotal role in ensuring the
integrity of shared images. It aids in combating the spread of misinformation and deceptive visuals, fostering a
more reliable and transparent online environment.

3. Legal and Law Enforcement Support:


Law enforcement agencies and legal professionals benefit from the project's capabilities in investigating digital
evidence. The system aids in verifying the authenticity of images presented as evidence in legal cases,
strengthening the integrity of the judicial process.

Challenges and Future Developments:

1. Evolving Threat Landscape:


As image manipulation techniques evolve, the project acknowledges the need for continuous research and
development. Future iterations will focus on staying ahead of emerging threats and incorporating adaptive
mechanisms to counter new forms of forgery.

2. User-Friendly Interface:
To enhance accessibility, the project aims to develop a user-friendly interface that allows users with varying levels
of technical expertise to seamlessly interact with the forgery detection system. This includes the integration of
intuitive dashboards and reporting tools.

Conclusion:

The "Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models" project stands at the
forefront of technological advancements in the realm of image forensics. By combining the power of lightweight
deep learning models and fusion techniques, the system provides an effective solution to the pervasive issue of
image manipulation. As digital content continues to shape various facets of our lives, the project's contributions
towards maintaining the integrity of visual information are paramount. Through ongoing research and user-centric
developments, the project aspires to make a lasting impact on digital content verification and authentication.

CHAPTER 2

SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

1. Current Image Forgery Detection Techniques:

In the realm of image forensics, various techniques are employed to detect forged or manipulated images. The
existing system typically leverages a combination of traditional image processing methods and machine learning
approaches. Some of the common techniques include:

Image Forensic Analysis:


Traditional image forensic analysis involves examining image metadata, such as EXIF data, to identify anomalies
or inconsistencies. This method is effective in detecting basic alterations but may fall short in identifying
sophisticated forgeries.

Digital Watermarking:
Digital watermarking is a method where invisible or semi-visible information is embedded within an image to
verify its authenticity. However, this technique may not be foolproof and could be circumvented by advanced
forgery methods.

Statistical Analysis:
Statistical features of an image, such as noise patterns or color distributions, are analyzed to identify irregularities.
While this method is useful for certain types of forgeries, it may not be robust against more advanced manipulation
techniques.
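As a toy illustration of this statistical idea (our own sketch, not part of the described system), the code below computes a simple high-pass noise residual and flags the image block whose local noise level deviates most from the rest; the synthetic image, noise levels, and block size are all assumptions:

```python
import numpy as np

def noise_residual(gray: np.ndarray) -> np.ndarray:
    """High-pass residual: each pixel minus the mean of its 4 neighbours."""
    g = gray.astype(np.float64)
    neigh = (np.roll(g, 1, 0) + np.roll(g, -1, 0) +
             np.roll(g, 1, 1) + np.roll(g, -1, 1)) / 4.0
    return g - neigh

def blockwise_std(res: np.ndarray, b: int = 16) -> np.ndarray:
    """Standard deviation of the residual inside each b-by-b block."""
    h, w = res.shape
    blocks = res[:h - h % b, :w - w % b].reshape(h // b, b, w // b, b)
    return blocks.std(axis=(1, 3))

rng = np.random.default_rng(1)
img = rng.normal(0.0, 2.0, (64, 64))                 # low-noise "authentic" area
img[16:32, 16:32] = rng.normal(0.0, 12.0, (16, 16))  # noisier "pasted" patch
stds = blockwise_std(noise_residual(img))
suspect = tuple(int(i) for i in np.unravel_index(stds.argmax(), stds.shape))
print(suspect)  # the block containing the inconsistent patch
```

Real systems rely on far more robust statistics (for example, sensor noise fingerprints); this only demonstrates the principle of flagging locally inconsistent noise.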

2. Lightweight Deep Learning Models:

To address the limitations of traditional methods, the existing system incorporates lightweight deep learning
models for image forgery detection. These models are designed to balance accuracy with computational efficiency,
making them suitable for real-time applications and resource-constrained environments.

Convolutional Neural Networks (CNNs):


CNNs have shown significant promise in image forgery detection. Lightweight variants of CNNs are
optimized for faster inference without compromising detection accuracy. They can analyze image features and
patterns to identify anomalies indicative of forgery.

MobileNet and SqueezeNet Architectures:


Lightweight architectures like MobileNet and SqueezeNet are commonly employed in the existing system. These
models are designed to be computationally efficient, making them suitable for deployment on edge devices or
platforms with limited processing power.

3. Feature Extraction and Fusion:

The existing system utilizes feature extraction techniques to capture relevant information from images. Features
extracted from multiple lightweight models are then fused to create a comprehensive representation of the image.
Feature fusion enhances the system's ability to detect a wide range of forgery types and improves overall detection
accuracy.
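The fusion step can be sketched as a simple late-fusion routine; the embedding sizes below (1280-d and 512-d, loosely modeled on MobileNet- and SqueezeNet-sized outputs) are illustrative assumptions, not values fixed by this document:

```python
import numpy as np

def fuse_features(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Late fusion: L2-normalise each model's feature vector so neither
    backbone dominates, then concatenate into one representation."""
    a = feat_a / (np.linalg.norm(feat_a) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b) + 1e-8)
    return np.concatenate([a, b])

# Hypothetical embeddings from two lightweight backbones
rng = np.random.default_rng(0)
f1 = rng.random(1280).astype(np.float32)  # e.g. MobileNet-sized vector
f2 = rng.random(512).astype(np.float32)   # e.g. SqueezeNet-sized vector
fused = fuse_features(f1, f2)
print(fused.shape)  # → (1792,)
```

A classifier head trained on `fused` then makes the final authentic/forged decision; averaging the two models' output probabilities is an equally common fusion alternative.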

4. Transfer Learning:

Transfer learning is a prevalent technique in the existing system, allowing lightweight models to leverage
knowledge gained from pre-trained models on large datasets. This approach accelerates the training process and
enhances the models' ability to generalize patterns associated with image forgeries.
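The transfer-learning pattern of freezing a pre-trained feature extractor and training only a small head can be illustrated with a self-contained sketch; here a fixed random projection stands in for a real pre-trained backbone, and the data are synthetic, so everything below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Frozen backbone": a fixed random projection standing in for a
# pre-trained feature extractor whose weights are never updated.
W_frozen = rng.standard_normal((64, 16)) / 8.0

def extract_features(x):
    # Backbone outputs stay fixed during training (transfer learning)
    return np.tanh(x @ W_frozen)

# Synthetic stand-in data: authentic (0) vs forged (1)
X = rng.standard_normal((200, 64))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

F = extract_features(X)

# Train only the small classification head with gradient descent
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid output
    w -= 0.5 * (F.T @ (p - y)) / len(y)      # update head weights only
    b -= 0.5 * float((p - y).mean())

acc = float((((F @ w + b) > 0) == (y > 0.5)).mean())
print(f"head-only training accuracy: {acc:.2f}")
```

In practice the frozen projection would be replaced by a real pre-trained network (for example a MobileNet feature extractor), but the training loop touches only the head in exactly the same way.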

5. Challenges in the Existing System:

Despite advancements in image forgery detection, the existing system faces challenges that need to be addressed
for further improvement:

Adversarial Attacks:
Adversarial attacks involve manipulating images in a way that evades detection by the existing system.
Lightweight models may be susceptible to such attacks, requiring additional robustness measures.

Limited Generalization:
Lightweight models, while efficient, may have limitations in generalizing to diverse types of forgeries. Ensuring
the system's effectiveness across various manipulation techniques remains a challenge.

Real-time Processing:
Achieving real-time processing for image forgery detection on resource-constrained devices is a challenge.
Optimizing the system's performance without compromising accuracy is crucial for practical applications.

2.2 PROPOSED SYSTEM

CNNs, which are inspired by the human visual system, are designed as non-linear interconnected neurons. They
have already demonstrated extraordinary potential in a variety of computer vision applications, including image
segmentation and object detection. They may be beneficial for a variety of additional purposes, including image
forensics. With the various tools available today, image forgery is fairly simple to carry out, and because it is
extremely dangerous, detecting it is crucial. When a fragment of an image is moved from one image to another, a
variety of artifacts occur due to the images’ disparate origins. While these artifacts may be undetectable to the
naked eye, CNNs can detect their presence in faked images. Because the sources of the forged region and the
background image are distinct, when we recompress such images the forged region is enhanced differently due to the
compression difference. We use this concept in the proposed approach by training a CNN-based model to determine
whether an image is genuine or fake.

A region spliced onto another image will most likely have a statistically different distribution of DCT coefficients
than the original region. The authentic region is compressed twice: first in the camera, and then again in the fake,
resulting in periodic patterns in the histogram [2]. The spliced section behaves similarly to a singly compressed
region when the secondary quantization table is used.

As previously stated, when an image containing a forgery is recompressed, the forged portion of the image
compresses differently from the remainder of the image due to the difference between the source of the original
image and the source of the forged portion. When the difference between the original image and its recompressed
version is analyzed, this considerably emphasizes the forged component. As a result, we use it to train our
CNN-based model for detecting image forgery.

Algorithm 1 shows the working of the proposed technique, which is explained here. We take the forged
image A (the images shown in Figure 1b are tampered images) and recompress it; let us call the recompressed image
A_recompressed (the images shown in Figure 1c are recompressed forged images). We then take the difference between
the original image and the recompressed image; let us call it A_diff (the images shown in Figure 1e are the
differences of Figure 1b,c, respectively). Due to the difference between the source of the forged part and the
original part of the image, the forged part gets highlighted in A_diff (as we can observe in Figure 1d,e,
respectively). We train a CNN-based network to categorize an image as forged or genuine using A_diff as our input
features (we label it the featured image). Figure 2 gives a pictorial view of the overall working of the proposed method.
To generate A_recompressed from A, we use JPEG compression: image A undergoes JPEG compression and produces A_recompressed, as described in Figure 3. When there is a single compression, the histogram of the dequantized coefficients exhibits the pattern shown in Figure 4; this type of pattern is shown by the forged part of the image. Moreover, when there is double compression, as described in Figure 5, gaps appear between the dequantized coefficients, as shown in Figure 6; this type of pattern is shown by the genuine part of the image.
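The gap pattern left by double quantization can be illustrated numerically. The quantization steps q1 and q2 below are arbitrary assumed values, and the Gaussian samples merely stand in for real DCT coefficients.

```python
# Toy illustration of why double quantization leaves gaps: coefficients
# quantized with step q1, dequantized, and requantized with step q2 can
# only land in a subset of the histogram bins.
import numpy as np

rng = np.random.default_rng(0)
coeffs = rng.normal(0, 40, size=100_000)   # stand-in for DCT coefficients

q1, q2 = 10, 7                              # assumed quantization steps
single = np.round(coeffs / q2)                      # single compression
double = np.round(np.round(coeffs / q1) * q1 / q2)  # double compression

edges = np.arange(-20.5, 21.5)              # bins centered on -20 .. 20
h_single, _ = np.histogram(single, bins=edges)
h_double, _ = np.histogram(double, bins=edges)

empty_single = int(np.sum(h_single == 0))
empty_double = int(np.sum(h_double == 0))
# The doubly quantized histogram has periodic empty bins; the singly
# quantized one does not.
print(empty_single, empty_double)
```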
We constructed a very light CNN model with minimal parameters in our proposed model (lines 5 to 13 of Algorithm 1). The model consists of 3 convolutional layers followed by a dense fully connected layer, as described below:
• The first convolutional layer consists of 32 filters of size 3-by-3, stride size one, and “relu” activation
function.
• The second convolutional layer consists of 32 filters of size 3-by-3, stride size one, and “relu” activation
function.
• The third convolutional layer consists of 32 filters of size 7-by-7, stride size one, and “relu” activation
function, followed by max-pooling of size 2-by-2.
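The three convolutional layers listed above can be sketched in Keras. The input resolution and the widths of the dense and output layers are assumptions, since this section does not fix them.

```python
# A minimal Keras sketch of the described model: three convolutional
# layers, a max-pooling step, then a dense fully connected head.
# Assumptions: 128x128 RGB inputs, a 64-unit dense layer, two classes.
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), strides=1, activation="relu"),
        layers.Conv2D(32, (3, 3), strides=1, activation="relu"),
        layers.Conv2D(32, (7, 7), strides=1, activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),    # width assumed
        layers.Dense(2, activation="softmax"),  # forged vs. genuine
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```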

CHAPTER 3

REQUIREMENT SPECIFICATIONS

3.1 HARDWARE AND SOFTWARE SPECIFICATION :

3.1.1 HARDWARE REQUIREMENTS:

❖ Hard disk : 500 GB and above.

❖ Processor : i3 and above.

❖ Ram : 4GB and above.

3.1.2 SOFTWARE REQUIREMENTS :

❖ Operating System : Windows 10

❖ Software : python

❖ Tools :Anaconda (Jupyter Note Book IDE)

3.2 TECHNOLOGIES USED:

❖ Programming Language: Python.

3.2.1 Introduction to Python:

Python is a widely used general-purpose, high-level programming language. It was initially designed by Guido van Rossum, first released in 1991, and is developed by the Python Software Foundation. It was mainly developed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate systems
more efficiently.

It is used for:

● web development (server-side),

● software development,

● mathematics,

● System scripting.

What can Python do?

● Python can be used on a server to create web applications.

● Python can be used alongside software to create workflows.

● Python can connect to database systems. It can also read and modify files.

● Python can be used to handle big data and perform complex mathematics.

● Python can be used for rapid prototyping, or for production-ready software


development.

Why Python?

✧ Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).

✧ Python has a simple syntax similar to the English language.

✧ Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
✧ Python runs on an interpreter system, meaning that code can be executed as soon as
it is written. This means that prototyping can be very quick.
✧ Python can be treated in a procedural way, an object-orientated way or a functional way.

Python Syntax compared to other programming languages

● Python was designed for readability, and has some similarities to the English
language with influence from mathematics.

● Python uses new lines to complete a command, as opposed to other programming


languages which often use semicolons or parentheses.
● Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets
for this purpose.

Python is Interpreted

● Many languages are compiled, meaning the source code you create needs to be
translated into machine code, the language of your computer’s processor, before it
can be run. Programs written
in an interpreted language are passed straight to an interpreter that runs them directly.
● This makes for a quicker development cycle because you just type in your code and
run it, without the intermediate compilation step.

● One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly
than interpreted
programs. For some applications that are particularly computationally intensive, like
graphics processing or intense number crunching, this can be limiting.
● In practice, however, for most programs, the difference in execution speed is
measured in milliseconds, or seconds at most, and not
appreciably noticeable to a human user. The expediency of coding in an interpreted
language is typically worth it for most applications.
● For all its syntactical simplicity, Python supports most constructs that would be
expected in a very high-level language, including complex dynamic data types,
structured and functional
programming, and object-oriented programming.

● Additionally, a very extensive library of classes and functions is available that provides
capability well beyond what is built into the language, such as database manipulation
or GUI programming.
● Python accomplishes what many programming languages don’t: the language itself is
simply designed, but it is very versatile in terms of what you can accomplish with it.

3.2.2 Machine learning


Introduction:
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of
artificial intelligence. Machine learning algorithms build a mathematical model based on
sample data, known as "training data", in order to make predictions or decisions without being
explicitly programmed to perform the task. Machine learning algorithms are used in a wide
variety of applications, such as email filtering and computer vision, where it is difficult or
infeasible to develop a conventional algorithm for effectively performing the task.
Machine learning is closely related to computational statistics, which focuses on making
predictions using computers. The study of mathematical optimization delivers
methods, theory and application domains to the field of machine learning. Data mining is a
field of study within machine learning, and focuses on exploratory data analysis
through learning. In its application across business problems, machine learning is also referred
to as predictive analytics.

Machine learning tasks:

Machine learning tasks are classified into several broad categories. In supervised
learning, the algorithm builds a mathematical model from a set of data that contains both the
inputs and the desired outputs. For example, if the task were determining whether an image
contained a certain object, the training data for a supervised learning algorithm would include
images with and without that object (the input), and each image would have a label (the
output) designating whether it contained the object. In special cases, the input may be only
partially available, or restricted to special feedback. Semi-supervised learning algorithms develop mathematical

models from incomplete training data, where a portion of the sample input doesn't have labels.

Classification algorithms and regression algorithms are types of supervised learning.


Classification algorithms are used when the outputs are restricted to a limited set of values. For
a classification algorithm that filters emails, the input would be an incoming email, and the
output would be the name of the folder in which to file the email. For an algorithm that
identifies spam emails, the output would be the prediction of either "spam" or "not spam",
represented by the Boolean values true and false. Regression algorithms are named for their
continuous outputs, meaning they may have any value within a range. Examples of a
continuous value are the temperature, length, or price of an object.

In unsupervised learning, the algorithm builds a mathematical model from a set of data that
contains only inputs and no desired output labels. Unsupervised learning algorithms are used
to find structure in the data, like grouping or clustering of data points. Unsupervised learning
can discover patterns in the data, and can group the inputs into categories, as in feature
learning. Dimensionality reduction is the process of reducing the number of "features", or
inputs, in a set of data.

Active learning algorithms access the desired outputs (training labels) for a limited set of inputs based on a budget, and optimize the choice of inputs for which they will acquire training labels. When used interactively, these inputs can be presented to a human user for labeling.
Reinforcement learning algorithms are given feedback in the form of positive or negative
reinforcement in a dynamic environment and are used in autonomous vehicles or in learning
to play a game against a human opponent. Other

specialized algorithms in machine learning include topic modeling, where the computer
program is given a set of natural language documents and finds other documents that cover
similar topics. Machine learning algorithms can be used to find the unobservable probability
density function in density estimation problems. Meta learning algorithms learn their own
inductive bias based on previous experience. In developmental robotics, robot learning
algorithms generate their own sequences of learning experiences, also known as a curriculum,
to cumulatively acquire new skills through self-guided exploration and social interaction with
humans. These robots use guidance mechanisms such as active learning, maturation, motor
synergies, and imitation.

Types of learning algorithms:

The types of machine learning algorithms differ in their approach, the type of data they
input and output, and the type of task or problem that they are intended to solve.

Supervised learning:

Supervised learning algorithms build a mathematical model of a set of data that


contains both the inputs and the desired outputs. The data is known as training data, and
consists of a set of training examples. Each training example has one or more inputs and the
desired output, also known as a supervisory signal. In the mathematical model, each training
example is represented by an array or vector, sometimes called a feature vector, and the
training data is represented by a matrix. Through iterative optimization of an objective
function, supervised learning algorithms learn a function that can be used to predict the
output associated with new

inputs. An optimal function will allow the algorithm to correctly determine the output for
inputs that were not a part of the training data. An algorithm that improves the accuracy of its
outputs or predictions over time is said to have learned to perform that task.

Supervised learning algorithms include classification and regression. Classification algorithms


are used when the outputs are restricted to a limited set of values, and regression algorithms
are used when the outputs may have any numerical value within a range. Similarity
learning is an area of supervised machine learning closely related to regression and
classification, but the goal is to learn from examples using a similarity function that measures
how similar or related two objects are. It has applications in ranking, recommendation
systems, visual identity tracking, face verification, and speaker verification.

In the case of semi-supervised learning algorithms, some of the training examples are missing
training labels, but they can nevertheless be used to improve the quality of a model. In weakly
supervised learning, the training labels are noisy, limited, or imprecise; however, these labels
are often cheaper to obtain, resulting in larger effective training sets.

Unsupervised Learning:

Unsupervised learning algorithms take a set of data that contains only inputs, and find
structure in the data, like grouping or clustering of data points. The algorithms, therefore, learn
from test data that has not been labeled, classified or categorized. Instead of responding to
feedback, unsupervised learning algorithms identify commonalities in the data and react based
on the presence or absence of such commonalities in each

new piece of data. A central application of unsupervised learning is in the field of density
estimation in statistics, though unsupervised learning encompasses other domains involving
summarizing and explaining data features.

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to one or more pre designated
criteria, while observations drawn from different clusters are dissimilar. Different clustering
techniques make different assumptions on the structure of the data, often defined by some
similarity metric and evaluated, for example, by internal compactness, or the similarity
between members of the same cluster, and separation, the difference between clusters. Other
methods are based on estimated density and graph connectivity.
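As a concrete illustration of clustering, the sketch below runs scikit-learn's KMeans on two synthetic blobs; both the algorithm and the data are assumed choices for this example, not something prescribed by the text above.

```python
# Grouping unlabeled points into clusters with KMeans.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two well-separated blobs of unlabeled 2-D points
data = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                  rng.normal(5.0, 0.5, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
labels = km.labels_
# Points from the same blob end up sharing a cluster label.
```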

Semi-supervised learning:

Semi-supervised learning falls between unsupervised learning (without any labeled training
data) and supervised learning (with completely labeled training data). Many machine-learning
researchers have found that unlabeled data, when used in conjunction with a small amount of
labeled data, can produce a considerable improvement in learning accuracy.

K-Nearest Neighbors

Introduction

In four years of analytics work, more than 80% of the models built were classification models, and just 15-20% were regression models. These ratios can be more or less generalized throughout the industry. The reason for this bias towards classification models is that most analytical problems involve making a decision: for instance, will a customer attrite or not, should we target customer X for digital campaigns, does a customer have high potential, and so on. This kind of analysis is insightful and links directly to an implementation roadmap. In this article, we will talk about another widely used classification technique called K-nearest neighbors (KNN). Our focus will be primarily on how the algorithm works and how the input parameter affects the output/prediction.

KNN algorithm

KNN can be used for both classification and regression predictive problems. However,
it is more widely used in classification problems in the industry. To evaluate any technique
we generally look at 3 important aspects:
1. Ease to interpret output
2. Calculation time
3. Predictive Power
Let us take a few examples to place KNN on this scale.

The KNN algorithm fares well across all of these parameters of consideration. It is commonly used for its ease of interpretation and low calculation time.

How the KNN algorithm works

Let’s take a simple case to understand this algorithm. Following is a spread of red circles
(RC) and green squares (GS):

You intend to find out the class of the blue star (BS). BS can be either RC or GS and nothing else. The "K" in KNN is the number of nearest neighbors we wish to take a vote from. Let's say K = 3. Hence, we will now draw a circle centered on BS just big enough to enclose only three data points on the plane. Refer to the following diagram for more details:

The three closest points to BS are all RC. Hence, with a good confidence level, we can say that BS should belong to class RC. Here the choice was very obvious, as all three votes from the closest neighbors went to RC. The choice of the parameter K is crucial in this algorithm.

How do we choose the factor K?

First, let us try to understand what exactly K influences in the algorithm. In the last example, given that all 6 training observations remain constant, with a given K value we can draw boundaries for each class. These boundaries segregate RC from GS. Similarly, let us see the effect of the value of K on the class boundaries. The following are the different boundaries separating the two classes for different values of K.

If you watch carefully, you can see that the boundary becomes smoother with increasing values of K. As K increases to infinity, the prediction finally becomes all blue or all red, depending on the total majority. The training error rate and the validation error rate are two parameters we need to assess for different K values. The following is the curve of the training error rate for varying values of K:

As you can see, the error rate at K=1 is always zero for the training sample, because the closest point to any training data point is itself; hence the prediction is always accurate with K=1. If the validation error curve had been similar, our choice of K would have been 1. The following is the validation error curve for varying values of K:

This makes the story clearer. At K=1, we were overfitting the boundaries; hence, the error rate initially decreases and reaches a minimum. After the minimum point, it increases with increasing K. To get the optimal value of K, you can segregate training and validation sets from the initial dataset, then plot the validation error curve to get the optimal value of K. This value of K should be used for all predictions.
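The K-selection procedure above can be sketched with cross-validation; scikit-learn and its bundled iris dataset are assumed stand-ins for the training/validation split the text describes.

```python
# Sweep K, score each value on held-out folds, and keep the best.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16):
    clf = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds stands in for the validation curve
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
```

Plotting `scores` against `k` reproduces the validation error curve described above (error = 1 - accuracy).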

Breaking it down – Pseudo Code of KNN

We can implement a KNN model by following the below steps:

1. Load the data


2. Initialize the value of k
3. For getting the predicted class, iterate from 1 to total number of training data
points
1. Calculate the distance between test data and each row of training data. Here
we will use Euclidean distance as our distance metric since it’s the most
popular method. The other metrics that can be used are Chebyshev, cosine,
etc.
2. Sort the calculated distances in ascending order based on distance values
3. Get top k rows from the sorted array
4. Get the most frequent class of these rows
5. Return the predicted class
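The pseudo code above can be written out directly in Python; the sample points below mirror the red-circle/green-square example and are made up for illustration.

```python
# K-nearest neighbors, following the five steps listed above.
import math
from collections import Counter

def knn_predict(train_X, train_y, test_point, k):
    # Step 1: Euclidean distance from the test point to each training row
    dists = [(math.dist(test_point, row), label)
             for row, label in zip(train_X, train_y)]
    # Step 2: sort ascending by distance; Step 3: keep the top k rows
    dists.sort(key=lambda pair: pair[0])
    top_k = [label for _, label in dists[:k]]
    # Steps 4-5: return the most frequent class among them
    return Counter(top_k).most_common(1)[0][0]

# Red circles near the origin, green squares near (5, 5)
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["RC", "RC", "RC", "GS", "GS", "GS"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # RC
```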

Conclusion

KNN algorithm is one of the simplest classification algorithms. Even with such
simplicity, it can give highly competitive results. KNN algorithm can also be used for
regression problems. The only difference from the discussed methodology will be using
averages of nearest neighbors rather than voting from nearest neighbors.

Decision tree introduction:

In a decision tree, the algorithm starts at the root node of the tree, then compares the values of different attributes and follows the next branch until it reaches a leaf node. It uses different algorithms to decide the split and the variable that yield the most homogeneous sets of the population.

Decision trees are widely used in data science and are a key, proven tool for making decisions in complex scenarios. In machine learning, decision trees and ensemble methods built on them, such as random forests, are widely used. Decision trees are a type of supervised learning algorithm in which data is continuously divided into different categories according to certain parameters.

So in this blog, I will explain the decision tree algorithm: how it is used, how it functions, and everything else related to decision trees.

What is a Decision Tree?

A decision tree, as the name suggests, is a flowchart-like tree structure that works on the principle of conditions. It is efficient and relies on strong algorithms used for predictive analysis. Its main attributes include internal nodes, branches, and terminal (leaf) nodes.

Every internal node holds a "test" on an attribute, each branch holds an outcome of the test, and every leaf node represents a class label. This is one of the most used algorithms when it comes to supervised learning techniques.

It is used for both classification and regression, and is often termed "CART", which stands for Classification and Regression Tree. Tree algorithms are often preferred for their stability and interpretability.

How can an algorithm be used to represent a tree

Let us see an example of a basic decision tree where it is to be decided under what conditions to play cricket and under what conditions not to play. You might have got a fair idea of the conditions on which decision trees work from the above example. Let us now see the common terms used in decision trees, as stated below:

❖ Branches - Division of the whole tree is called branches.

❖ Root Node - Represent the whole sample that is further divided.

❖ Splitting - Division of nodes is called splitting.

❖ Terminal Node - Node that does not split further is called a terminal node.
❖ Decision Node - A sub-node that gets further divided into different sub-nodes.
❖ Pruning - Removal of subnodes from a decision node.

❖ Parent and Child Node - When a node gets divided further then that node is termed as
parent node whereas the divided nodes or the sub- nodes are termed as a child node of the
parent node.

How Does Decision Tree Algorithm Work

It works with both categorical and continuous inputs and outputs. In classification problems, the decision tree asks questions, and based on the answers (yes/no) it splits the data into further sub-branches.

In a decision tree, the algorithm starts with a root node of a tree then compares the value of
different attributes and follows the next branch until it reaches the end leaf node. It uses
different algorithms to check about the split and variable that allow the best homogeneous sets
of population.

32
"Decision trees create a tree-like structure by computing the relationship between independent features and a target. This is done by making use of functions that are based on comparison operators on the independent features."

Types of Decision Tree:

The type of decision tree depends on the type of target variable, which may be categorical or numerical:

1. If the target is a categorical variable, such as whether an applicant will default or not (yes/no), the decision tree is called a categorical variable decision tree, also called a classification tree.
2. If the target is numeric or continuous in nature, as when we have to predict a house price, the decision tree is called a continuous variable decision tree, also called a regression tree.
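Both tree types can be illustrated with scikit-learn (an assumed library choice); the tiny datasets below are invented for the example.

```python
# A classification tree for a categorical target and a regression tree
# for a continuous one (e.g., a house price).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Categorical (yes/no) target -> classification tree
X_cls = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_cls = ["no", "yes", "no", "yes"]
clf = DecisionTreeClassifier(random_state=0).fit(X_cls, y_cls)

# Continuous target (price vs. floor area) -> regression tree
X_reg = [[50], [80], [120], [200]]
y_reg = [150_000, 210_000, 300_000, 480_000]
reg = DecisionTreeRegressor(random_state=0).fit(X_reg, y_reg)

print(clf.predict([[1, 0]])[0])  # "yes"
print(reg.predict([[120]])[0])   # 300000.0
```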

Lists of Algorithms:

● ID3 (Iterative Dichotomiser 3) - This decision tree algorithm, developed by Ross Quinlan, uses a greedy approach to generate multiway trees. Trees are grown to maximum size before pruning.
● C4.5 extended ID3 by removing the restriction that features must be categorical; it dynamically defines discrete attributes for numerical features, and converts the trained trees into sets of if-then rules.
● C5.0 uses less memory and creates smaller rule sets than C4.5.
● CART (Classification and Regression Trees) is similar to C4.5, but it supports numerical target variables and does not compute rule sets. It generates binary trees.
Why do we use Decision Trees?

Decision trees provide an effective method of decision making because they: clearly lay out the problem so that all options can be challenged; allow us to fully analyze the possible consequences of a decision; and provide a framework to quantify the values of outcomes and the probabilities of achieving them.

What is decision tree in interview explain?

A Decision Tree is a supervised machine learning algorithm that can be used for both
Regression and Classification problem statements. It divides the complete dataset into smaller
subsets while at the same time an associated Decision Tree is incrementally developed.

Where are Decision Trees used?

Decision trees are commonly used in operations research, specifically in decision analysis,
to help identify a strategy most likely to reach a goal, but are also a popular tool in machine
learning.
What is a decision tree classification model?

Decision Tree - Classification. A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. ... Decision trees can handle both categorical and numerical data.

What is the final objective of decision tree?

Since the goal of a decision tree is to make the optimal choice at each node, it needs an algorithm capable of doing just that. That algorithm is known as Hunt's algorithm, which is both greedy and recursive.


Logistics is the art and science of management, engineering, and technical activities concerned with requirements, design, and supplying and maintaining resources to support objectives, plans, and operations.
How do you explain logistics?

Logistics refers to the overall process of managing how resources are acquired, stored,
and transported to their final destination. Logistics management involves identifying
prospective distributors and suppliers and determining their effectiveness and accessibility.
What are the 3 types of logistics?

Logistics has three types; inbound, outbound, and reverse logistics.

What are the 7 R's of logistics?

So, what are the 7 Rs? The Chartered Institute of Logistics & Transport UK (2019) defines
them as: Getting the Right product, in the Right quantity, in the Right condition, at the
Right place, at the Right time, to the Right customer, at the Right price.

What are the importance of logistics?

Logistics is an important element of a successful supply chain that helps increase the sales
and profits of businesses that deal with the production, shipment, warehousing and
delivery of products. Moreover, a reliable logistics service can boost a business' value
and help in maintaining a positive public image.

What is logistics in real life?

Logistics is the strategic vision of how you will create and deliver your product or service
to your end customer. If you take the city, town or village that you live in, you can see a very
clear example of what the logistical strategy was when they were designing it.

What are the 3 main activities of logistics systems?

Logistics activities or Functions of Logistics

● Order processing. The Logistics activities start from the order processing which might be the
work of the commercial department in an organization. ...

● Materials handling. ...

● Warehousing. ...

● Inventory control. ...

● Transportation. ...

● Packaging.

What is 3PL and 4PL in logistics?

A 3PL (third-party logistics) provider manages all aspects of fulfillment, from warehousing to
shipping. A 4PL (fourth-party logistics) provider manages a 3PL on behalf of the
customer and other aspects of the supply chain.
What are the five major components of logistics? There are five
elements of logistics:
● Storage, warehousing and materials handling.

● Packaging and unitisation.

● Inventory.

● Transport.

● Information and control.

What is logistic cycle?

Logistics management cycle includes key activities such as product selection, quantification
and procurement, inventory management, storage, and distribution. Other activities that
help drive the logistics cycle and are also at the heart of logistics are organisation and staffing,
budget, supervision, and evaluation.

Why did you choose logistics?

We choose logistics because it is one of the most important career sectors in the world, and one to be excited about. ... Logistics can be a challenging field, and working in it can bring an important level of job satisfaction.

What is logistics and SCM?

The basic difference between Logistics and Supply Chain Management is that logistics management is the process of integration and maintenance (flow and storage) of goods in an organization, whereas Supply Chain Management is the coordination and management (movement) of the supply chains of an organization.

CHAPTER 4

4.1 Design and Implementation Constraints

4.1.1 Constraints in Analysis

♦ Constraints as Informal Text

♦ Constraints as Operational Restrictions

♦ Constraints Integrated in Existing Model Concepts

♦ Constraints as a Separate Concept

♦ Constraints Implied by the Model Structure

4.1.2 Constraints in Design

♦ Determination of the Involved Classes

♦ Determination of the Involved Objects

♦ Determination of the Involved Actions

♦ Determination of the Require Clauses

♦ Global actions and Constraint Realization

4.1.3 Constraints in Implementation

A hierarchical structuring of relations may result in more classes and a
more complicated structure to implement. Therefore it is advisable to transform the
hierarchical relation structure to a simpler structure such as a classical flat one. It is
rather straightforward to transform the developed hierarchical model into a bipartite,
flat model, consisting of classes on the one hand and flat relations on the other. Flat
relations are preferred at the design level for reasons of simplicity and implementation
ease. There is no identity or functionality associated with a flat relation. A flat relation
corresponds with the relation concept of entity-relationship modeling and many object
oriented methods.

4.2 Other Nonfunctional Requirements

4.2.1 Performance Requirements

The application on the client side controls and communicates with the following main components.

⮚ Client Tier: an embedded browser in charge of navigation and access to the web service.

⮚ Server Tier: the server side contains the main parts of the functionality of the proposed architecture. The components at this tier are the following: Web Server, Security Module, Server-Side Capturing Engine, Preprocessing Engine, Database System, Verification Engine, and Output Module.

4.2.2 Safety Requirements


1. The software may be safety-critical. If so, there are issues associated with its
integrity level
2. The software may not be safety-critical although it forms part of a safety-critical
system. For example, software may simply log transactions.
3. If a system must be of a high integrity level and if the software is shown to be of
that integrity level, then the hardware must be at least of the same integrity level.
4. There is little point in producing 'perfect' code in some language if hardware and system software (in the widest sense) are not reliable.
5. If a computer system is to run software of a high integrity level then that system
should not at the same time accommodate software of a lower integrity level.
6. Systems with different requirements for safety levels must be separated.
7. Otherwise, the highest level of integrity required must be applied to all systems in
the same environment.

CHAPTER 5

5.1 Architecture Diagram:

5.2 Sequence Diagram:

A sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.

5.3 Use Case Diagram:

Unified Modeling Language (UML) is a standardized general-purpose modeling language
in the field of software engineering. The standard is managed and was created by the Object
Management Group. UML includes a set of graphic notation techniques to create visual
models of software intensive systems. This language is used to specify, visualize, modify,
construct and document the artifacts of an object oriented software intensive system under
development.

5.3.1. USE CASE DIAGRAM

A Use case Diagram is used to present a graphical overview of the functionality provided
by a system in terms of actors, their goals and any dependencies between those use cases.
Use case diagram consists of two parts:

Use case: A use case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse.

Actor: An actor is a person, organization, or external system that plays a role in one or more interactions with the system.

5.4 Activity Diagram:

An activity diagram is a graphical representation of workflows of stepwise activities and actions with support for choice, iteration, and concurrency. An activity diagram shows the overall flow of control.
The most important shape types:

● Rounded rectangles represent activities.

● Diamonds represent decisions.

● Bars represent the start or end of concurrent activities.

● A black circle represents the start of the workflow.

● An encircled circle represents the end of the workflow.

5.5 Collaboration Diagram:

UML Collaboration Diagrams illustrate the relationship and interaction between


software objects. They require use cases, system operation contracts and domain model to
already exist. The collaboration diagram illustrates messages being sent between classes and
objects.

CHAPTER 6

6.1 MODULES

⮚ Dataset collection

⮚ Machine Learning Algorithm

⮚ Prediction

6.2 MODULE EXPLANATION:

6.2.1 Dataset collection:


The dataset is collected from kaggle.com. The dataset has attributes such as gender, marital status, self-employment status, monthly income, etc. The outcome to be predicted depends upon the customer information. The data is preprocessed and passed on to the next step.

6.2.2 Machine Learning Algorithm:

In this stage, the collected data is given to the machine learning algorithms for the training process. We use multiple algorithms to obtain a high accuracy of prediction. The preprocessed dataset is processed by different machine learning algorithms; each algorithm yields some accuracy level, and each is then compared.

✔ Logistic Regression

✔ K-Nearest Neighbors

✔ Decision Tree Classifier
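As a hedged sketch of this comparison step (the report's actual training script and Kaggle data are not shown here, so a synthetic dataset stands in), the three classifiers above could be trained and compared like this:

```python
# Sketch: train the three classifiers named above and compare held-out accuracy.
# A synthetic dataset stands in for the real preprocessed Kaggle data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)                      # train on the preprocessed data
    accuracies[name] = model.score(X_test, y_test)   # held-out accuracy
    print(f"{name}: {accuracies[name]:.2%}")
```

The model with the highest held-out accuracy would then be carried forward to the prediction module.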

CHAPTER 7

CODING AND TESTING

7.1 CODING

Once the design aspect of the system is finalized, the system enters the coding and testing phase. The coding phase brings the actual system into action by converting the design of the system into code in a given programming language. Therefore, a good coding style has to be followed so that, whenever changes are required, they can easily be incorporated into the system.
7.2 CODING STANDARDS

Coding standards are guidelines to programming that focus on the physical structure and appearance of the program. They make the code easier to read, understand, and maintain. This phase of the system actually implements the blueprint developed during the design phase. The coding specification should be such that any programmer is able to understand the code and can bring about changes whenever necessary. Some of the standards needed to achieve the above-mentioned objectives are as follows:


⮚ Program should be simple, clear and easy to understand.

⮚ Naming conventions

⮚ Value conventions

⮚ Script and comment procedure

⮚ Message box format

⮚ Exception and error handling
7.2.1 NAMING CONVENTIONS

Naming conventions of classes, data member, member functions, procedures etc., should
be self-descriptive. One should even get the meaning and scope of the variable by its name.
The conventions are adopted for easy understanding of the intended message by the user. So
it is customary to follow the conventions. These conventions are as follows:
Class names

Class names are problem-domain equivalents; they begin with a capital letter and use mixed case.
Member Function and Data Member name

Member function and data member names begin with a lowercase letter, with the first letter of each subsequent word in uppercase and the rest of the letters in lowercase.
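The two naming rules above can be illustrated with a short, purely hypothetical Python snippet (the class and member names are invented for illustration; note that this report's lowerCamelCase convention differs from Python's usual PEP 8 style):

```python
# Illustrative only: the report's naming conventions applied in Python.
class ImageRecord:                 # class name: begins with a capital, mixed case
    def __init__(self, filePath, isForged):
        self.filePath = filePath   # data member: starts lowercase, new words capitalized
        self.isForged = isForged

    def fileExtension(self):       # member function: same lowerCamelCase rule
        return self.filePath.rsplit(".", 1)[-1]

record = ImageRecord(filePath="samples/forged_01.jpg", isForged=True)
print(record.fileExtension())      # -> "jpg"
```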
7.2.2 VALUE CONVENTIONS

Value conventions ensure values for variable at any point of time.

This involves the following:

⮚ Proper default values for the variables.

⮚ Proper validation of values in the field.

⮚ Proper documentation of flag values.

7.2.3 SCRIPT WRITING AND COMMENTING STANDARD

Script writing is an art in which indentation is of utmost importance. Conditional and looping statements are to be properly aligned to facilitate easy understanding. Comments are included to minimize the number of surprises that could occur when going through the code.

7.2.4 MESSAGE BOX FORMAT :

When something has to be prompted to the user, he must be able to understand it


properly. To achieve this, a specific format has been adopted in displaying messages to the
user. They are as follows:

⮚ X – User has performed illegal operation.

⮚ ! – Information to the user.

7.3 TEST PROCEDURE


SYSTEM TESTING
Testing is performed to identify errors. It is used for quality assurance and is an integral part of the entire development and maintenance process. The goal of testing during this phase is to verify that the specification has been accurately and completely incorporated into the design, as well as to ensure the correctness of the design itself. For example, any logic fault in the design must be detected before coding commences; otherwise the cost of fixing the fault will be considerably higher. Detection of design faults can be achieved by means of inspections as well as walkthroughs.
Testing is one of the important steps in the software development phase. Testing checks for errors; as a whole, testing of the project involves the following test cases:

⮚ Static analysis is used to investigate the structural properties of the Source code.

⮚ Dynamic testing is used to investigate the behavior of the source code by executing
the program on the test data.

7.4 TEST DATA AND OUTPUT

7.4.1 UNIT TESTING

Unit testing is conducted to verify the functional performance of each


modular component of the software. Unit testing focuses on the smallest unit of the software
design (i.e.), the module. The white-box testing techniques were heavily employed for unit
testing.
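As a minimal sketch of such a unit test (the `normalize_pixels` helper is a hypothetical stand-in for one modular component, not code from the project):

```python
# Minimal unit test for one modular component in isolation.
import unittest

def normalize_pixels(values, max_value=255.0):
    """Scale raw pixel values into the [0, 1] range used for training."""
    return [v / max_value for v in values]

class NormalizePixelsTest(unittest.TestCase):
    def test_scales_into_unit_range(self):
        result = normalize_pixels([0, 128, 255])
        self.assertEqual(result[0], 0.0)
        self.assertEqual(result[-1], 1.0)
        self.assertTrue(all(0.0 <= v <= 1.0 for v in result))

if __name__ == "__main__":
    # exit=False keeps the interpreter alive when run inside a larger script
    unittest.main(exit=False, argv=["normalize_pixels_test"])
```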

7.4.2 FUNCTIONAL TESTS

Functional test cases involved exercising the code with nominal input values
for which the expected results are known, as well as boundary values and special values, such
as logically related inputs, files of identical elements, and empty files.
Three types of tests in Functional test:

⮚ Performance Test

⮚ Stress Test

⮚ Structure Test

7.4.3 PERFORMANCE TEST

It determines the amount of execution time spent in various parts of the unit, as well as program throughput, response time, and device utilization by the program unit.
7.4.4 STRESS TEST

Stress tests are those designed to intentionally break the unit. A great deal can be learned about the strengths and limitations of a program by examining the manner in which a program unit breaks.
7.4.5 STRUCTURED TEST

Structure tests are concerned with exercising the internal logic of a program and traversing particular execution paths. A white-box test strategy was employed to ensure that the test cases guarantee that all independent paths within a module have been exercised at least once, and that they:

⮚ Exercise all logical decisions on their true or false sides.

⮚ Execute all loops at their boundaries and within their operational bounds.

⮚ Exercise internal data structures to assure their validity.

⮚ Checking attributes for their correctness.

⮚ Handle end-of-file conditions, I/O errors, buffer problems, and textual errors in output information.
7.4.6 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing; i.e., integration testing is the complete testing of the set of modules that makes up the product. The objective is to take unit-tested modules and build a program structure. The tester should identify critical modules, and critical modules should be tested as early as possible. One approach is to wait until all the units have passed testing, then combine and test them together; this approach evolved from unstructured testing of small programs. Another strategy is to construct the product in increments of tested units: a small set of modules is integrated and tested, another module is added and tested in combination, and so on. The advantage of this approach is that interface errors can be easily found and corrected.
The major error faced during the project was a linking error: when all the modules were combined, the links to the supporting files were not set properly, so the interconnections and links were checked. Errors are localized to the new module and its intercommunications. Product development can be staged, with modules integrated as they complete unit testing. Testing is complete when the last module is integrated and tested.
7.5 TESTING TECHNIQUES / TESTING STRATEGIES

7.5.1 TESTING

Testing is a process of executing a program with the intent of finding an error. A good test case is one that has a high probability of finding an as-yet-undiscovered error; a successful test is one that uncovers such an error. System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently as expected before live operation commences. It verifies that the whole set of programs hangs together. System testing consists of several key activities and steps for program, string, and system testing, and is important in adopting a successful new system. This is the last chance to detect and correct errors before the system is installed for user acceptance testing.
The software testing process commences once the program is created and the documentation and related data structures are designed. Software testing is essential for correcting errors; otherwise the program or the project is not said to be complete. Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design, and coding. Any engineering product can be tested in one of two ways:
7.5.1.1 WHITE BOX TESTING

This testing is also called glass box testing. Here, knowing the internal workings of a product, tests can be conducted to ensure that the internal operation performs according to specification and that all internal components have been adequately exercised. It is a test case design method that uses the control structure of the procedural design to derive test cases. Basis path testing is a white-box technique.
Basis path testing:

⮚ Flow graph notation

⮚ Cyclomatic complexity

⮚ Deriving test cases

⮚ Graph matrices
7.5.1.2 BLACK BOX TESTING


In this testing, by knowing the specific functions that a product has been designed to perform, tests can be conducted that demonstrate each function is fully operational while at the same time searching for errors in each function. It fundamentally focuses on the functional requirements of the software, without knowledge of the internal workings of the product.
The steps involved in black box test case design are:

⮚ Graph based testing methods

⮚ Equivalence partitioning

⮚ Boundary value analysis

⮚ Comparison testing
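Equivalence partitioning and boundary value analysis can be sketched as follows, assuming a hypothetical validator that accepts image sides between 128 and 4096 pixels (the 128-pixel minimum echoes the resolution constraint noted in the conclusion):

```python
# Hypothetical validator: accepts image side lengths of 128..4096 pixels.
def is_supported_size(side):
    return 128 <= side <= 4096

# Equivalence partitioning: one representative per class.
assert is_supported_size(64)   is False   # invalid partition (too small)
assert is_supported_size(512)  is True    # valid partition
assert is_supported_size(8192) is False   # invalid partition (too large)

# Boundary value analysis: exactly at and just outside each edge.
assert is_supported_size(127)  is False
assert is_supported_size(128)  is True
assert is_supported_size(4096) is True
assert is_supported_size(4097) is False
```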

7.5.2 SOFTWARE TESTING STRATEGIES:

A software testing strategy provides a road map for the software developer. Testing is a set of activities that can be planned in advance and conducted systematically. For this reason a template for software testing, a set of steps into which specific test case design methods can be placed, should be defined. A testing strategy should have the following characteristics:

⮚ Testing begins at the module level and works “outward” toward the integration
of the entire computer based system.

⮚ Different testing techniques are appropriate at different points in time.


⮚ The developer of the software and an independent test group conduct testing.

⮚ Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.

7.5.2.1 INTEGRATION TESTING:

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. Individual modules, which are highly prone to interface errors, should not be assumed to work instantly when put together. The problem, of course, is "putting them together": interfacing. Data may be lost across an interface; sub-functions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; and global data structures can present problems.

7.5.2.2 PROGRAM TESTING:

Logical and syntax errors are pointed out by program testing. A syntax error is an error in a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. A logic error, on the other hand, deals with incorrect data fields, out-of-range items, and invalid combinations. Since compilers will not detect logical errors, the programmer must examine the output. Condition testing exercises the logical conditions contained in a module. The possible types of elements in a condition include a Boolean operator, a Boolean variable, a pair of Boolean parentheses, a relational operator, or an arithmetic expression. The condition testing method focuses on testing each condition in the program; its purpose is to detect not only errors in the conditions of a program but also other errors in the program.
7.5.2.3 SECURITY TESTING:

Security testing attempts to verify that the protection mechanisms built into a system will, in fact, protect it from improper penetration. The system's security must be tested for invulnerability from frontal attack, and must also be tested for invulnerability from rear attack. During security testing, the tester plays the role of an individual who desires to penetrate the system.

7.5.2.4 VALIDATION TESTING

At the culmination of integration testing, the software is completely assembled as a package. Interfacing errors have been uncovered and corrected, and a final series of software tests, validation testing, begins. Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that is reasonably expected by the customer. Software validation is achieved through a series of black-box tests that demonstrate conformity with requirements. After a validation test has been conducted, one of two conditions exists:

* The function or performance characteristics conform to specifications and are accepted.

* A deviation from specification is uncovered and a deficiency list is created.

Deviations or errors discovered at this step are corrected prior to completion of the project, with the help of the user, by negotiating to establish a method for resolving deficiencies. Thus the proposed system under consideration has been tested using validation testing and found to be working satisfactorily. Though there were deficiencies in the system, they were not catastrophic.

7.5.2.5 USER ACCEPTANCE TESTING


User acceptance of the system is a key factor for the success of any system. The system under consideration was tested for user acceptance by constantly keeping in touch with prospective users at the time of development and making changes whenever required. This is done with regard to the following points:

● Input screen design.


● Output screen design.

RESULTS - INPUT AND OUTPUT SCREENSHOTS


To run the project, double click on the ‘run.bat’ file to get the below output.

In the above screen, click on the ‘Upload MICC-F220 Dataset’ button to upload the dataset and get the below
output.
In the above screen, select and upload the ‘Dataset’ folder, then click on the ‘Select Folder’ button to load the
dataset and get the below output.

In the above screen, the dataset is loaded, and now click on the ‘Preprocess Dataset’ button to read all images,
normalize them, and get the below output.
In the above screen, all images are processed. To check if the images loaded properly, one sample image is
displayed. Close the above image to get the below output.

In the above screen, we can see the dataset contains 220 images, all images are processed, and now click on the
‘Generate & Load Fusion Model’ button to train all algorithms, extract features from them, and then calculate their
accuracy.

In the above screen, we can see the accuracy of all three algorithms. In the last line, we can see that from all three
algorithms, the application extracted 576 features. Click on ‘Fine-Tuned Features Map with SVM’ to train SVM
with extracted features and get its accuracy as the fusion model.
In the above screen, with Fine-Tuned SVM fusion model, we got 95% accuracy. The confusion matrix graph
shows that both X and Y boxes contain more correctly predicted classes. Fine-tuned features with SVM have high
accuracy. Close the confusion matrix graph, then click on ‘Run Baseline SIFT Model’ button to train SVM with
SIFT existing features and get its accuracy.

In the above screen, with existing SIFT SVM features, we got 68% accuracy. The confusion matrix graph shows
that existing SIFT predicted 6 and 8 instances incorrectly. Existing SIFT features are not good in prediction. Close
the above graph, then click on ‘Accuracy Comparison Graph’ button to get the below graph.
In the above graph, x-axis represents algorithm names, and y-axis represents accuracy and other metrics. Each
different color bar represents different metrics like precision, recall, etc. Close the above graph, then click on
‘Performance Table’ button to get results in the below tabular format.

In the above screen, we can see that the proposed fusion model SVM with fine-tuned features has got 95%
accuracy, which is better than all other algorithms.
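The fusion step described above (deep features from three lightweight models concatenated into 576 features, then a fine-tuned SVM) can be sketched structurally as follows; the random arrays are stand-ins for real deep features, and the per-model width of 192 is an assumption chosen so that three models yield 576 fused features:

```python
# Structural sketch of the fusion step: features from several lightweight
# models are concatenated and an SVM is trained on top. Random arrays
# replace the real deep features; sizes follow the report (220 images, 576 features).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images = 220                               # MICC-F220 dataset size
feats_a = rng.normal(size=(n_images, 192))   # stand-in for model-1 features
feats_b = rng.normal(size=(n_images, 192))   # stand-in for model-2 features
feats_c = rng.normal(size=(n_images, 192))   # stand-in for model-3 features
fused = np.hstack([feats_a, feats_b, feats_c])   # 192 * 3 = 576 fused features
labels = rng.integers(0, 2, size=n_images)       # 0 = authentic, 1 = forged

X_tr, X_te, y_tr, y_te = train_test_split(
    fused, labels, test_size=0.2, random_state=0)
svm = SVC(kernel="rbf").fit(X_tr, y_tr)          # fine-tuned-features classifier
print("fused feature width:", fused.shape[1])
print("held-out accuracy:", svm.score(X_te, y_te))
```

With random stand-in features the accuracy is near chance; with real deep features the report observes 95%.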

CONCLUSION
The increased availability of cameras has made photography popular in recent years. Images play a crucial role in
our lives and have evolved into an essential means of conveying information since the general public quickly
understands them. There are various tools accessible to edit images; these tools are primarily intended to enhance
images; however, these technologies are frequently exploited to forge the images to spread misinformation. As a
result, image forgery has become a significant problem and a matter of concern.

In this paper, we provide a unique image forgery detection system based on neural networks and deep learning,
emphasizing the CNN architecture approach. To achieve satisfactory results, the suggested method uses a CNN
architecture that incorporates variations in image compression. We use the difference between the original and
recompressed images to train the model. The proposed technique can efficiently detect image splicing and copy-
move types of image forgeries. The experimental results are highly encouraging, showing an overall validation accuracy of 92.23% within the defined iteration limit.
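The recompression-difference input described above can be sketched as follows; the JPEG quality setting and the synthetic demo image are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of the recompression-difference input: an image is re-saved as
# JPEG in memory and the pixel-wise difference from the original is used
# as the CNN input. Quality and demo image are illustrative assumptions.
import io
import numpy as np
from PIL import Image

def recompression_difference(image, quality=90):
    """Return |original - recompressed| as a float32 array."""
    buf = io.BytesIO()
    image.save(buf, format="JPEG", quality=quality)   # recompress in memory
    buf.seek(0)
    recompressed = Image.open(buf).convert("RGB")
    orig = np.asarray(image.convert("RGB"), dtype=np.float32)
    recomp = np.asarray(recompressed, dtype=np.float32)
    return np.abs(orig - recomp)

# Demo on a synthetic 128x128 image (the minimum resolution noted above).
demo = Image.fromarray(
    np.uint8(np.random.default_rng(1).integers(0, 256, (128, 128, 3))))
diff = recompression_difference(demo)
print(diff.shape)
```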

We plan to extend our technique for image forgery localization in the future. We will also combine the suggested
technique with other known image localization techniques to improve their performance in terms of accuracy and
reduce their time complexity. We will enhance the proposed technique to handle spoofing [50] as well. The present
technique requires image resolution to be a minimum of 128 × 128, so we will enhance the proposed technique to
work well for tiny images. We will also be developing a challenging extensive image forgery database to train deep
learning networks for image forgery detection.

FUTURE ENHANCEMENTS
Future Enhancements for "Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models"

The continuous evolution of digital imaging techniques and the emergence of sophisticated forgery methods
underscore the importance of envisioning and implementing future enhancements for the Image Forgery Detection
project. As technology advances and new challenges arise, the system must adapt to stay ahead of the curve. This
section outlines a comprehensive set of future enhancements that aim to further elevate the capabilities and
effectiveness of the forgery detection system.

1. Adaptive Learning Mechanisms:


Future iterations of the project will focus on incorporating adaptive learning mechanisms. These mechanisms
will enable the system to dynamically adjust its detection algorithms based on evolving trends in image
manipulation. Machine learning algorithms will continuously learn from new types of forgeries, ensuring that the
system remains resilient in the face of emerging threats.

2. Real-time Detection:
Enhancing the system's capabilities to operate in real-time is a critical aspect of future development. The goal is
to reduce processing time significantly, making it possible for the forgery detection system to analyze and verify
images in real-time, a feature particularly crucial in applications such as live media broadcasts and online content
sharing.

3. Integration with Blockchain Technology:


The integration of blockchain technology will be explored to fortify the integrity and traceability of detected
forgeries. A blockchain-based system can securely store information about verified images, creating an immutable
record of their authenticity. This not only enhances the trustworthiness of the system but also provides a
transparent and tamper-proof audit trail.

4. Multi-Modal Forgery Detection:


To broaden the scope of forgery detection, future versions of the project will explore multi-modal approaches.
This involves extending the system's capabilities beyond image-based forgeries to include detection of audio and
video manipulations. The inclusion of multi-modal forgery detection will contribute to a more comprehensive and
holistic approach to digital media forensics.

5. Edge Computing Implementation:


With the proliferation of edge computing devices, optimizing the forgery detection system for edge deployment
is on the horizon. This will involve creating lightweight versions of the detection models suitable for deployment
on edge devices, ensuring that forgery detection can be performed efficiently on devices with limited
computational resources.

6. Explainability and Interpretability:


Enhancing the system's interpretability is a key consideration for future developments. Future versions will focus
on incorporating techniques that provide users with more insights into how the system arrives at its forgery
detection decisions. This includes implementing methods for generating explanations or visualizations that make
the decision-making process more transparent.
7. Collaborative Detection Networks:
The development of collaborative detection networks will be explored to facilitate information sharing among
forgery detection systems. This collaborative approach involves multiple systems working together to enhance
detection accuracy. A decentralized network of detection nodes could collectively contribute to a more robust and
globally effective forgery detection ecosystem.

8. User-Driven Customization:
Recognizing the diverse needs of users, future enhancements will include features that allow users to customize
and fine-tune the forgery detection system based on specific requirements. This user-driven customization can
involve adjusting sensitivity levels, defining regions of interest, and tailoring the system to different use cases.

9. Ethical Considerations and Bias Mitigation:


Future developments will prioritize addressing ethical considerations in forgery detection, including potential
biases in the models. The project will incorporate techniques for bias mitigation and ethical AI practices to ensure
fair and unbiased detection across different demographic groups and cultural contexts.

10. Cross-Domain Generalization:


Enhancing the system's ability to generalize across diverse domains will be a focal point of future research. This
involves training the models on datasets from various domains to improve adaptability and effectiveness when
faced with a wide range of forgery scenarios.

11. Interdisciplinary Collaborations:


To tackle the multidimensional challenges associated with image forgery detection, future enhancements will
involve fostering collaborations with experts from diverse fields. Engaging with professionals in law, psychology,
and human-computer interaction will provide valuable insights into the societal impact of forgery detection and
help refine the system's features accordingly.

In conclusion, the future enhancements outlined for the Image Forgery Detection project are designed to propel the
system into a new era of effectiveness, adaptability, and ethical responsibility. By embracing emerging
technologies and addressing the evolving landscape of digital forgery, the project aims to maintain its position as a
cutting-edge solution for ensuring the integrity of digital media content. The envisioned enhancements will not
only fortify the system against emerging threats but also contribute to the broader goal of fostering transparency,
accountability, and trust in the digital information ecosystem.
References
1. Xiao, B.; Wei, Y.; Bi, X.; Li, W.; Ma, J. Image splicing forgery detection combining coarse to refined convolutional neural network and adaptive clustering. Inf. Sci. 2020, 511, 172–191. [CrossRef]
2. Kwon, M.J.; Yu, I.J.; Nam, S.H.; Lee, H.K. CAT-Net: Compression Artifact Tracing Network for Detection and Localization of Image Splicing. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 5–9 January 2021; pp. 375–384.
3. Wu, Y.; Abd Almageed, W.; Natarajan, P. ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9535–9544.
4. Ali, S.S.; Baghel, V.S.; Ganapathi, I.I.; Prakash, S. Robust biometric authentication system with a secure user template. Image Vis. Comput. 2020, 104, 104004. [CrossRef]
5. Castillo Camacho, I.; Wang, K. A Comprehensive Review of Deep-Learning-Based Methods for Image Forensics. J. Imaging 2021, 7, 69. [CrossRef] [PubMed]
6. Zheng, L.; Zhang, Y.; Thing, V.L. A survey on image tampering and its detection in real-world photos. J. Vis. Commun. Image Represent. 2019, 58, 380–399. [CrossRef]
7. Jing, L.; Tian, Y. Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1. [CrossRef]
8. Meena, K.B.; Tyagi, V. Image Forgery Detection: Survey and Future Directions. In Data, Engineering and Applications: Volume 2; Shukla, R.K., Agrawal, J., Sharma, S., Singh Tomer, G., Eds.; Springer: Singapore, 2019; pp. 163–194.
9. Mirsky, Y.; Lee, W. The Creation and Detection of Deepfakes: A Survey. ACM Comput. Surv. 2021, 54, 1–41. [CrossRef]
10. Rony, J.; Belharbi, S.; Dolz, J.; Ayed, I.B.; McCaffrey, L.; Granger, E. Deep weakly-supervised learning methods for classification and localization in histology images: A survey. arXiv 2019, arXiv:1909.03354.
11. Lu, Z.; Chen, D.; Xue, D. Survey of weakly supervised semantic segmentation methods. In Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 1176–1180.
12. Zhang, M.; Zhou, Y.; Zhao, J.; Man, Y.; Liu, B.; Yao, R. A survey of semi- and weakly supervised semantic segmentation of images. Artif. Intell. Rev. 2019, 53, 4259–4288. [CrossRef]
13. Verdoliva, L. Media Forensics and DeepFakes: An Overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932. [CrossRef]
14. Luo, W.; Huang, J.; Qiu, G. JPEG Error Analysis and Its Applications to Digital Image Forensics. IEEE Trans. Inf. Forensics Secur. 2010, 5, 480–491. [CrossRef]
15. Matern, F.; Riess, C.; Stamminger, M. Gradient-Based Illumination Description for Image Forgery Detection. IEEE Trans. Inf. Forensics Secur. 2020, 15, 1303–1317. [CrossRef]
16. Christlein, V.; Riess, C.; Jordan, J.; Riess, C.; Angelopoulou, E. An Evaluation of Popular Copy-Move Forgery Detection Approaches. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1841–1854. [CrossRef]
17. Habibi, M.; Hassanpour, H. Splicing Image Forgery Detection and Localization Based on Color Edge Inconsistency using Statistical Dispersion Measures. Int. J. Eng. 2021, 34, 443–451.
18. Dua, S.; Singh, J.; Parthasarathy, H. Image forgery detection based on statistical features of block DCT coefficients. Procedia Comput. Sci. 2020, 171, 369–378. [CrossRef]

APPENDIX A:
DATA DICTIONARY
The data dictionary for the "Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models"
project is an essential reference for understanding and managing the data elements used in the system. It
provides an overview of the data attributes, their definitions, and their relationships within the context of
image forgery detection.

1. Image Dataset:
The dataset used for training and testing the deep learning models comprises authentic and manipulated images.
Each entry includes attributes such as image ID, file path, and labels indicating authenticity or manipulation.
Metadata, including image resolution, format, and capture details, is recorded for comprehensive analysis.
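The attributes listed above can be sketched as a single record type. This is a minimal illustration; the field names and types are assumptions for clarity, not the project's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative record for one dataset entry; field names are
# assumptions, not the project's actual data schema.
@dataclass
class ImageRecord:
    image_id: str                 # unique identifier for the image
    file_path: str                # location of the image file on disk
    label: int                    # 0 = authentic, 1 = manipulated
    resolution: tuple             # (width, height) in pixels
    image_format: str             # e.g. "JPEG", "PNG"
    metadata: dict = field(default_factory=dict)  # capture details

record = ImageRecord("img_0001", "data/authentic/img_0001.jpg",
                     label=0, resolution=(1024, 768), image_format="JPEG")
print(record.label)  # → 0
```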

2. Lightweight Deep Learning Models:
This section outlines the architecture and parameters of the lightweight deep learning models employed for forgery
detection. Each model's specifications, including layer configurations, activation functions, and optimization
techniques, are detailed. The data dictionary elucidates the specific weights and biases associated with each model.
3. Feature Extraction:
Features extracted from images for model training are documented in this section. This includes information on the
types of features, such as texture, color, and shape descriptors, and their corresponding values. Additionally, the
dictionary outlines the transformation processes applied during feature extraction.
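As one example of the kind of color descriptor described above, a per-channel intensity histogram can be computed as follows. This is an illustrative sketch with NumPy, not the project's exact feature set; the bin count is an assumption.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Per-channel intensity histogram, normalized so each channel sums to 1.

    `image` is an (H, W, 3) uint8 array; returns a feature vector of
    length 3 * bins.
    """
    features = []
    for c in range(3):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        features.append(hist / hist.sum())
    return np.concatenate(features)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
feat = color_histogram(img)
print(feat.shape)  # (24,)
```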

4. Fusion Techniques:
In the fusion stage, different lightweight deep learning models are combined to enhance forgery detection
accuracy. The data dictionary defines the fusion techniques employed, whether it be averaging, stacking, or other
ensemble methods. Parameters related to the fusion process, such as weights assigned to individual models, are
also included.
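Weighted averaging of per-model class probabilities (soft voting), one of the ensemble options mentioned above, can be sketched as follows. The model outputs and weights here are made up for illustration; in practice the weights would be tuned on validation data.

```python
import numpy as np

def fuse_predictions(prob_list, weights):
    """Weighted-average fusion of per-model class probabilities."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize weights to sum to 1
    stacked = np.stack(prob_list)              # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)

# Two hypothetical lightweight models scoring three images over the
# classes [authentic, forged].
m1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
m2 = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])
fused = fuse_predictions([m1, m2], weights=[2, 1])
print(fused.round(2))
```

Because the weights are normalized, each fused row remains a valid probability distribution.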

5. Evaluation Metrics:
For assessing the performance of the forgery detection system, various evaluation metrics are defined in the data
dictionary. These include precision, recall, F1 score, and area under the receiver operating characteristic (ROC)
curve. The documentation details the computation of these metrics for both training and testing phases.
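Precision, recall, and F1 follow directly from their standard definitions. The counts below are illustrative, not results from the project.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.889 0.8 0.842
```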

6. Training Parameters:
Training the deep learning models involves configuring numerous parameters, such as learning rates, batch sizes,
and epochs. The data dictionary provides an exhaustive list of these parameters along with their optimal values
determined through experimentation.
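Such a parameter list is typically kept as a single configuration object. The values shown here are common starting points, not the optimal values the project reports.

```python
# Illustrative training configuration; values are common defaults,
# not the experimentally determined optima described in the text.
train_config = {
    "learning_rate": 1e-4,          # optimizer step size
    "batch_size": 32,               # images per gradient update
    "epochs": 50,                   # full passes over the training set
    "optimizer": "adam",
    "early_stopping_patience": 5,   # epochs without improvement before stopping
}
print(train_config["batch_size"])  # 32
```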

7. Confusion Matrix:
The confusion matrix, a crucial tool for evaluating model performance, is outlined in the data dictionary. Entries
include true positives, true negatives, false positives, and false negatives, allowing for a comprehensive
understanding of the model's ability to correctly identify authentic and manipulated images.
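The four confusion-matrix entries can be tallied directly from ground-truth and predicted labels. The label lists below are made up for illustration (1 = forged, 0 = authentic).

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels: 1 = forged, 0 = authentic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```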

8. Image Forgery Labels:
The labels assigned to images indicating the presence or absence of forgery are defined in this section. Each label
corresponds to a specific class, providing clarity on the ground truth used during model training and testing.

9. Preprocessing Steps:
Various preprocessing steps are applied to the images before feeding them into the deep learning models. The data
dictionary details these steps, such as normalization, resizing, and data augmentation, along with their parameters
and effects on the image data.
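Two of the steps mentioned above, pixel normalization and a horizontal-flip augmentation, can be sketched with NumPy. Resizing is omitted to keep the sketch dependency-free; this is an illustration, not the project's full preprocessing pipeline.

```python
import numpy as np

def normalize(image):
    """Scale uint8 pixel values into the [0, 1] float range."""
    return image.astype(np.float32) / 255.0

def horizontal_flip(image):
    """Mirror an (H, W, C) image along the width axis (an augmentation)."""
    return image[:, ::-1, :]

img = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)
norm = normalize(img)
flipped = horizontal_flip(img)
print(norm.max() <= 1.0, flipped.shape == img.shape)  # True True
```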

10. Model Outputs:
The outputs generated by the deep learning models during forgery detection, including probability scores and class
predictions, are described in this section. These outputs serve as a basis for decision-making in the system.
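Turning a probability score into a class prediction is a simple thresholding step. The 0.5 cutoff below is the conventional default, not a value taken from the project.

```python
import numpy as np

def to_predictions(forgery_probs, threshold=0.5):
    """Map per-image forgery probabilities to labels: 1 = forged, 0 = authentic."""
    probs = np.asarray(forgery_probs)
    return (probs >= threshold).astype(int)

scores = [0.12, 0.93, 0.48, 0.61]
print(to_predictions(scores).tolist())  # [0, 1, 0, 1]
```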

In summary, the data dictionary for "Image Forgery Detection Based on Fusion of Lightweight Deep Learning
Models" acts as a comprehensive guide for stakeholders involved in the project. It ensures a clear understanding of
the data elements, their characteristics, and the processes involved in image forgery detection.

APPENDIX B:
OPERATION MANUAL

The operational manual for the "Image Forgery Detection Based on Fusion of Lightweight Deep Learning Models"
project serves as a detailed guide for users and administrators, providing instructions on system operation,
maintenance, and troubleshooting so that the forgery detection system can be used seamlessly and effectively.

Operational Manual: Image Forgery Detection System

The manual delineates the step-by-step procedures for deploying, operating, and maintaining the Image Forgery
Detection System. It is written for users with varying levels of technical expertise and offers clear
instructions for optimal system utilization.

1. System Deployment:
Begin by ensuring that all hardware and software prerequisites are met. Install the necessary dependencies,
including the required Python libraries and deep learning frameworks. The manual provides detailed instructions
on setting up the system environment, ensuring a smooth deployment process.

2. Data Input and Preprocessing:
Users are guided on how to input image data into the system for forgery detection. Clear instructions are provided
on data preprocessing steps, emphasizing the importance of consistent data formatting. The manual details the
supported image formats, resolutions, and any specific considerations for successful processing.

3. Model Inference:
The operational manual explains how to initiate the forgery detection process using the trained lightweight deep
learning models. Users are guided through the steps to load images, apply feature extraction, and execute the
fusion techniques for enhanced accuracy. The manual ensures users understand the significance of model outputs,
such as probability scores and class predictions.
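The inference flow described above (preprocess, score with each model, fuse, decide) can be sketched end to end. The two "models" below are hypothetical stand-in callables with hard-coded outputs, not the project's trained networks.

```python
import numpy as np

def model_a(x):
    # Hypothetical lightweight model: returns [P(authentic), P(forged)].
    return np.array([0.2, 0.8])

def model_b(x):
    # Second hypothetical lightweight model.
    return np.array([0.4, 0.6])

def detect_forgery(image, models, weights):
    """Preprocess an image, score it with each model, fuse, and decide."""
    x = image.astype(np.float32) / 255.0           # preprocessing step
    probs = np.stack([m(x) for m in models])       # per-model outputs
    w = np.asarray(weights, float) / np.sum(weights)
    fused = w @ probs                              # weighted fusion
    return int(np.argmax(fused)), fused            # predicted class, probabilities

image = np.zeros((32, 32, 3), dtype=np.uint8)
label, fused = detect_forgery(image, [model_a, model_b], weights=[1, 1])
print(label)  # 1 (forged)
```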

4. Result Interpretation:
Upon forgery detection, the manual provides insights into interpreting the results. It explains the significance of
evaluation metrics, including precision, recall, F1 score, and the ROC curve. Users are instructed on how to
comprehend the confusion matrix and make informed decisions based on the model's performance.

5. Troubleshooting:
In case users encounter issues during system operation, the operational manual includes a comprehensive
troubleshooting guide. Common challenges and their solutions are outlined to assist users in resolving issues
efficiently. This section aims to enhance the user experience by addressing potential roadblocks.

6. System Maintenance:
To ensure the longevity and optimal performance of the forgery detection system, users are provided with
guidelines for routine maintenance. This includes updating dependencies, monitoring system resources, and
managing model versions. The manual emphasizes the importance of keeping the system up to date for continued
reliability.

7. Security Considerations:
A dedicated section on security considerations outlines best practices for safeguarding the forgery detection
system. Users are educated on potential security threats and provided with recommendations for mitigating risks,
ensuring the integrity and confidentiality of sensitive data.

8. User Permissions and Access Control:
To maintain a secure environment, the manual describes how to manage user permissions and access control. This
section details the different user roles and their corresponding privileges, allowing administrators to control access
to critical system functionalities.
The operational manual aims to empower users with the knowledge required for efficient and secure operation of
the Image Forgery Detection System. By providing clear instructions and guidance, the manual ensures a user-
friendly experience while maximizing the system's effectiveness in detecting image forgeries.

You might also like