ANOMALY DETECTION
PROJECT REPORT
Submitted by
SUSHANT GAURAV (21CS1123)
HRITIK KUMAR (21CS1098)
SUBHASH KUMAR (21ME1071)
MURARI YADAV (21CS1095)
to the Pondicherry University, in partial fulfillment of the requirement for the award of degree
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
Chapter/Section No.   Title
I    BONAFIDE CERTIFICATE
II   ACKNOWLEDGEMENT
III  ABSTRACT
IV   LIST OF ABBREVIATIONS
V    LIST OF SYMBOLS
Chapter 1   INTRODUCTION
1.1  Overview
2.1  Techniques
5.5  Inferences
Chapter 6   CONCLUSION AND FUTURE ENHANCEMENTS
6.1  Conclusion
6.2  Future Enhancements
REFERENCES
LIST OF FIGURES
Table No.   Title
5.1   Complexity analysis of FastFlow and other methods
BONAFIDE CERTIFICATE
This is to certify that the Project work titled “<<title of the project>>” is a bonafide work done by
<<name of the candidates>> (<<reg. no.>>) in partial fulfillment for the award of the degree of Bachelor of
Technology in Computer Science and Engineering of the Puducherry Technological University and that
this work has not been submitted for the award of any other degree of this/any other institution.
I also express my heartfelt gratitude to Dr. E. Ilavarasan, Professor & Head (CSE), for
constantly motivating me to achieve my goal.
I would like to express my profound and sincere thanks to Dr. S. Mohan, the Vice-Chancellor,
Puducherry Technological University, for his kindness in extending the infrastructural
facilities to carry out my project work successfully.
I would be failing in my duty if I did not acknowledge the efforts of the Project Coordinator,
Dr. M. Thirumaran, and the Project Review Panel members, viz. <<names of the project
evaluation members>>, for shaping our ideas and offering constructive criticism during project review.
I also express my thanks to all the Faculty and Technical Staff members of the CSE
department for their timely help and the Central Library for facilitating useful reference
materials.
I would be failing in my duty if I didn’t acknowledge the immense help extended by my
friends, who stood by me through all my trials and tribulations. Their constant encouragement,
along with the valuable guidance of faculty members, helped me stay motivated and complete this
work successfully.
<<candidate’s name>>
ABSTRACT
This project is about anomaly detection, which means finding things in images that look
different or wrong. These "anomalies" can be small changes, missing parts, or things that do
not belong. Anomaly detection is very useful in real life, especially in factories, where it
normalizing flows, which helps the computer learn what a normal image looks like. After
learning that, the system can find anything that is not normal. What makes FastFlow special
is that it works in two dimensions (2D), so it keeps the shape and position of things in the
image. This helps the system find even small or hidden changes that other methods might
miss.
FastFlow is also very fast. It can look at an image and give results quickly, which is
important for real-time use, like in manufacturing, security, or health care. We tested
FastFlow using popular datasets like MVTec AD (for industrial defects), BTAD (real-world
anomalies), and CIFAR-10 (for image classification). In all tests, FastFlow did better than
Because of this, FastFlow is a great choice for systems that need to find problems quickly
and accurately. It can help reduce errors, save time, and improve safety in many areas.
LIST OF ABBREVIATIONS
AE Autoencoder
AUC Area Under the Curve
CNN Convolutional Neural Network
GAN Generative Adversarial Network
MSE Mean Squared Error
NF Normalizing Flow
ROC Receiver Operating Characteristic
SSIM Structural Similarity Index Measure
VAE Variational Autoencoder
ViT Vision Transformer
LIST OF SYMBOLS
θ Model parameters
σ Standard deviation
µ Mean
P (x) Probability distribution of x
N Normal distribution
CHAPTER 1 :
INTRODUCTION
1.1 OVERVIEW
This project focuses on unsupervised anomaly detection and localization using FastFlow,
a novel approach implemented with 2D normalizing flows as a probability distribution
estimator. Anomaly detection is critical in industrial applications, medical imaging, and
security checks where collecting and labeling sufficient anomaly data is often infeasible.
The primary objective of this study is to implement and evaluate FastFlow for
unsupervised anomaly detection and localization. Specific objectives include:
2. To design a lightweight network structure that can be used as a plug-in module with
various feature extractors.
Anomaly detection in computer vision is essential for identifying abnormal images and
locating abnormal areas. However, due to the low probability density of anomalies,
normal and abnormal data usually exhibit a serious long-tail distribution. In some cases,
no abnormal samples are available at all. This reality makes it difficult to collect and
annotate a large amount of abnormal data for supervised learning in practical applications.
Unsupervised anomaly detection addresses this problem by using only normal samples
during the training process while still being able to identify and locate anomalies during
testing. FastFlow represents a promising approach to this challenge by effectively
modeling the distribution of normal features and identifying deviations from this
distribution.
Chapter 4 details the proposed FastFlow model, including its architecture, design, and
module descriptions.
Finally, the thesis concludes with a summary of the contributions and suggestions for
future enhancements.
CHAPTER 2 :
LITERATURE REVIEW
2.1 TECHNIQUES
Reconstruction-based methods rely on the principle that models trained only on normal
data will fail to accurately reconstruct anomalous data. These approaches typically
employ autoencoders, variational autoencoders (VAEs), or generative adversarial networks
(GANs) to learn a compact representation of normal data. During inference, the
reconstruction error is used as an anomaly score, with higher errors indicating potential
anomalies.
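The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any cited method: the function name `reconstruction_score` and the toy pixel values are hypothetical, and the "reconstructions" are hand-written stand-ins for the output of a trained autoencoder.

```python
def reconstruction_score(image, reconstruction):
    """Mean squared error between a flattened image and its reconstruction."""
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)


# Toy flattened "images" with hypothetical pixel values in [0, 1].
normal = [0.2, 0.4, 0.6, 0.8]
normal_recon = [0.2, 0.4, 0.6, 0.8]  # assumed: a normal image is reproduced well
defect = [0.2, 0.4, 1.0, 0.8]        # third pixel carries a defect
defect_recon = [0.2, 0.4, 0.6, 0.8]  # assumed: the model falls back to the normal pattern

normal_score = reconstruction_score(normal, normal_recon)
defect_score = reconstruction_score(defect, defect_recon)
print(normal_score, defect_score)
```

The defective sample receives the higher score because the model, having seen only normal data, cannot reproduce the defective pixel.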
during inference.
2.2 SURVEY OF THE RELATED WORK
In recent years, several approaches have been proposed for unsupervised anomaly
detection. One promising method is using deep neural networks to obtain the features of
normal images and model the distribution with statistical methods, then detect abnormal
samples that have different distributions.
Recently, some works began to use normalizing flow to estimate distribution. Through
a trainable process that maximizes the log-likelihood of normal image features, they
embed normal image features into a standard normal distribution and use the probability
to identify and locate anomalies. However, original one-dimensional normalizing flow
models need to flatten the two-dimensional input feature into a one-dimensional vector to
estimate the distribution, which destroys the inherent spatial positional relationship of
the two-dimensional image and limits the ability of the flow model.
Additionally, these methods need to extract the features for a large number of patches
in images through the sliding window method and detect anomalies for each patch to
obtain anomaly location results. This leads to high complexity in inference and limits the
practical value of these methods.
Based on the literature review, it is evident that unsupervised anomaly detection remains
a challenging problem, particularly in terms of balancing detection accuracy and
computational efficiency. Existing methods often face trade-offs between these aspects,
with many approaches either achieving high accuracy at the cost of computational
complexity or maintaining efficiency at the expense of detection performance.
detection and localization, providing both high accuracy and computational efficiency.
Through this literature review, several research gaps have been identified:
1. The need for a method that effectively preserves spatial information when modeling
the distribution of normal features.
2. The requirement for an efficient end-to-end inference approach that avoids the
computational overhead of patch-based processing.
3. The importance of effectively leveraging both local and global features for anomaly
detection and localization.
4. The necessity for a flexible approach that can work with various feature extractors to
adapt to different application scenarios.
FastFlow aims to address these gaps by implementing 2D normalizing flows with fully
convolutional networks, enabling end-to-end inference, and supporting integration with
various backbone architectures.
CHAPTER 3 :
EXISTING WORK
With the development of deep learning, recent unsupervised anomaly detection methods
use deep neural networks as feature extractors, producing more promising anomaly
results. Most existing methods use ResNet to extract distinguishing visual features. Some
work has also begun to introduce Vision Transformers (ViT) into unsupervised anomaly
detection fields.
ViT has a global receptive field and can better learn the relationship between global
and local features. DeiT introduces a teacher-student strategy specific to transformers,
which makes image transformers learn more efficiently and achieves state-of-the-art
performance. CaiT proposes a simple yet effective architecture designed in the spirit of
encoder/decoder architecture and demonstrates that transformer models offer a
competitive alternative to the best convolutional neural networks.
In addition to these methods, various other feature extraction approaches have been
proposed in recent years. EfficientNet, for instance, uses a compound scaling method to
uniformly scale all dimensions of depth, width, and resolution using a simple yet highly
effective compound coefficient. MobileNet focuses on building lightweight deep neural
networks through the use of depthwise separable convolutions, which significantly
reduces the model size and computational cost while maintaining good performance.
These lightweight models are particularly valuable in edge computing scenarios where
computational resources are limited.
Furthermore, some researchers have explored the use of pre-trained models for feature
extraction in anomaly detection. By leveraging models trained on large-scale datasets like
ImageNet, they can extract general-purpose features that are useful for anomaly detection
tasks, even when the target domain is different from the source domain. This transfer
learning approach helps address the challenge of limited training data in many anomaly
detection applications.
3.2 CONVOLUTIONAL NEURAL NETWORKS FOR FEATURE EXTRACTION
Convolutional Neural Networks (CNNs) have become the backbone of many computer
vision tasks, including anomaly detection. The hierarchical nature of CNNs makes them
well-suited for extracting features at multiple scales and abstraction levels. Early layers
capture low-level features such as edges and textures, while deeper layers extract more
complex, high-level features related to object parts and complete objects.
ResNet, one of the most popular CNN architectures, introduced residual connections
that enable the training of very deep networks by addressing the vanishing gradient
problem. The skip connections allow the network to learn residual mappings instead of direct
mappings, making optimization easier and enabling networks with hundreds of layers.
This depth is particularly beneficial for capturing the complex patterns needed for
anomaly detection.
Another important CNN architecture is DenseNet, which connects each layer to every
other layer in a feed-forward fashion. This dense connectivity pattern strengthens feature
propagation, encourages feature reuse, and substantially reduces the number of
parameters. By reusing features, DenseNet creates shorter connections between layers
close to the input and layers close to the output, which helps with gradient flow during
training and makes the network more efficient.
In addition, the Feature Pyramid Network (FPN) architecture has been adopted in some
anomaly detection methods to effectively utilize multi-scale features. FPN builds a
feature pyramid with high-level semantics throughout while maintaining high resolution,
which is particularly useful for detecting anomalies of different sizes and scales. This
approach
helps address the challenge that anomalies can appear at various scales in images.
Beyond these methods, kernel density estimation (KDE) has been used to model the
probability density function of normal features. KDE is a non-parametric way to estimate
the probability density function of a random variable, which makes it flexible for
modeling complex distributions. However, KDE suffers from the curse of dimensionality
and can be computationally expensive for high-dimensional features.
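A minimal sketch of KDE-based scoring is given below, assuming an isotropic Gaussian kernel with a hand-picked bandwidth. The function name, the two-dimensional toy features, and the bandwidth value are illustrative assumptions and are not taken from any of the surveyed methods.

```python
import math


def gaussian_kde_log_density(query, samples, bandwidth):
    """Log density of `query` under an isotropic Gaussian KDE built from `samples`."""
    dim = len(query)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for sample in samples:
        sq_dist = sum((q - s) ** 2 for q, s in zip(query, sample))
        log_terms.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp for numerical stability, then average over the stored samples.
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum - math.log(len(samples))


# Hypothetical 2-D features extracted from normal images.
normal_features = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.05]]

# Anomaly score = negative log density: larger means less likely under the normal data.
score_normal = -gaussian_kde_log_density([0.0, 0.05], normal_features, bandwidth=0.2)
score_anomaly = -gaussian_kde_log_density([2.0, 2.0], normal_features, bandwidth=0.2)
print(score_normal, score_anomaly)
```

Every query loops over all stored normal samples, which illustrates why the cost grows with both the training set size and the feature dimension.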
Another approach is using one-class classification methods like Support Vector Data
Description (SVDD) and Deep SVDD. These methods aim to learn a compact
hypersphere that encloses the normal data points while excluding outliers. During
inference, data points that fall outside this hypersphere are considered anomalies. Deep
SVDD extends this idea by learning a neural network mapping that minimizes the volume
of a hypersphere enclosing the normal data representations.
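The hypersphere test can be sketched as follows. This is a simplified illustration under stated assumptions: the learned network mapping is omitted and the embeddings are given directly, the center is taken as the mean of the normal embeddings, and the radius is set to the largest normal distance. The function names are hypothetical.

```python
def hypersphere_center(features):
    """Center of the hypersphere, taken here as the mean of the normal embeddings."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def svdd_score(feature, center):
    """Squared Euclidean distance to the center, used as the anomaly score."""
    return sum((x - c) ** 2 for x, c in zip(feature, center))


# Hypothetical embeddings of normal samples (the network that produces them is omitted).
normal_embeddings = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [1.0, 1.0]]
center = hypersphere_center(normal_embeddings)

# Assumption: squared radius = largest squared distance among the normal samples.
radius_sq = max(svdd_score(f, center) for f in normal_embeddings)

test_scores = {
    "normal-like": svdd_score([1.1, 0.9], center),
    "anomalous": svdd_score([3.0, 3.0], center),
}
flags = {name: score > radius_sq for name, score in test_scores.items()}
print(flags)
```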
Some recent works have also explored the use of self-supervised learning for
distribution estimation. These methods learn representations by solving pretext tasks,
such as predicting the rotation of an image or solving jigsaw puzzles, using only normal
samples. The intuition is that models trained on such tasks will perform well on normal
samples but poorly on anomalous ones, as the latter do not follow the patterns learned
from normal data.
For anomaly localization, most existing work focuses on how to reasonably use multi-scale
features to identify anomalies at different scales and semantic levels and achieve
pixel-level anomaly localization through the sliding window method. These methods cannot
fully exploit the correlation between global information and local anomalies,
and the sliding window method needs to test a large number of image patches with high
computational complexity.
To address these problems, FastFlow obtains learnable modeling of global and local
feature distributions through an end-to-end testing phase, instead of designing a
complicated multi-scale strategy and using the sliding window method.
Other approaches for anomaly localization include using attention mechanisms to
highlight regions that contribute most to the anomaly score. These attention maps can be used
directly as anomaly localization maps or combined with other localization methods to
improve accuracy. Attention mechanisms help the model focus on the most relevant parts of
the image for the task at hand, which is particularly useful for identifying subtle
anomalies in complex scenes.
Additionally, the use of segmentation networks for anomaly localization has been
explored. These methods train a segmentation model to predict the normal regions of an
image, and any deviation from the normal prediction is considered an anomaly. This
approach directly produces pixel-level localization maps without requiring additional
post-processing steps, but it typically requires more labeled data for training compared to other
methods.
Autoencoders have been widely used for anomaly detection due to their ability to learn
compact representations of normal data. An autoencoder consists of an encoder that
compresses the input data into a lower-dimensional latent space and a decoder that reconstructs the
original input from this latent representation. When trained on normal samples, the
autoencoder learns to accurately reconstruct normal patterns but struggles to reconstruct
anomalies, resulting in higher reconstruction errors for anomalous samples.
Generative Adversarial Networks (GANs) have emerged as powerful tools for anomaly
detection. In the GAN framework, a generator learns to produce realistic samples from
a random noise distribution, while a discriminator learns to distinguish between real and
generated samples. After training on normal data, the generator learns to produce samples
that resemble normal patterns but struggles to generate anomalous patterns.
Existing approaches for unsupervised anomaly detection and localization have several
limitations:
2. Using the sliding window method to process patches for localization leads to high
computational complexity and limits the practical value of these methods.
3. The importance of the correlation between global information and local anomalies is
not fully utilized in current approaches.
4. Some methods achieve good image-level anomaly detection but fail to obtain exact
anomaly localization results.
5. Using hard-coded position embeddings to leverage the distribution learned by
normalizing flows may underperform on more complicated datasets.
8. Most methods struggle with highly textured or complex backgrounds, where
distinguishing between normal variations and actual anomalies becomes challenging.
10. Few methods address the challenge of domain adaptation, where a model trained
on one dataset needs to be applied to a different but related dataset with minimal
fine-tuning.
These limitations motivated the development of FastFlow, which addresses these issues
by implementing 2D normalizing flows with fully convolutional networks and enabling
end-to-end inference.
CHAPTER 4 :
PROPOSED WORK
1. A feature extractor that can be any backbone network like ResNet or Vision
Transformer.
Figure 4.1: Overview of the proposed FastFlow approach
The key idea is to transform a simple probability density through a sequence of invertible
mappings to produce a more complex density. This transformation allows for both exact
likelihood computation and efficient sampling.
Formally, let $z \sim p_z(z)$ be a random variable with a simple distribution, and let
$f : \mathbb{R}^d \to \mathbb{R}^d$ be an invertible function. Then the random variable $x = f(z)$ has a probability
density given by the change of variables formula:
\[
p_x(x) = p_z\left(f^{-1}(x)\right) \left| \det \frac{\partial f^{-1}}{\partial x} \right| \tag{4.1}
\]
where $\left| \det \frac{\partial f^{-1}}{\partial x} \right|$ is the absolute value of the determinant of the Jacobian of $f^{-1}$ with
respect to $x$.
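As a worked check of the change of variables formula, consider the one-dimensional affine map $x = f(z) = az + b$ with a standard normal base distribution. The sketch below, whose function names and parameter values are illustrative assumptions, evaluates the density through the formula and compares it with the closed-form density of a normal distribution with mean $b$ and standard deviation $|a|$.

```python
import math


def standard_normal_pdf(z):
    """Density of the base distribution p_z = N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def flow_density(x, a, b):
    """Density of x = f(z) = a*z + b with z ~ N(0, 1), via the change of variables formula."""
    z = (x - b) / a          # f^{-1}(x)
    jacobian = abs(1.0 / a)  # |d f^{-1} / dx|
    return standard_normal_pdf(z) * jacobian


def normal_pdf(x, mean, std):
    """Closed-form density of N(mean, std^2), used only as a reference."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


a, b = 2.0, 1.0  # illustrative flow parameters
density_via_flow = flow_density(1.5, a, b)
density_closed_form = normal_pdf(1.5, mean=b, std=abs(a))
print(density_via_flow, density_closed_form)
```

Both values agree, since an affine transformation of a standard normal variable is again normally distributed.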
For anomaly detection, normalizing flows are particularly useful because they provide
a direct way to estimate the likelihood of a sample under the learned distribution of nor-
mal data. Samples with low likelihood (i.e., high negative log-likelihood) are considered
anomalies, as they deviate from the learned normal patterns.
In the whole pipeline of FastFlow, visual features are first extracted from the input image
through ResNet or vision transformers. When using Vision Transformer (ViT) as the
feature extractor, only the feature of one certain layer is used because ViT has a stronger
ability to capture the relationship between local patches and the global feature. For
ResNet, the features of the last layer in the first three blocks are used, and these features
are put into three corresponding FastFlow models.
\[
f = f_1 \circ f_2 \circ \cdots \circ f_K \tag{4.2}
\]
The subnet in each coupling layer plays a crucial role in modeling the transformation
parameters. In FastFlow, the subnet is implemented as a fully convolutional network to
preserve spatial information. For backbone networks with large model capacities, such
as CaiT and Wide-ResNet50-2, the subnet alternates between 3×3 and 1×1 convolution
kernels to balance expressiveness and computational efficiency. For backbones with
smaller capacities, such as DeiT and ResNet18, only 3×3 convolution kernels are used to
maximize the expressiveness of the subnet.
The feature extractor can be any backbone network like ResNet or Vision Transformer.
When using ResNet, the features of the last layer in the first three blocks are used, and
these features are put into three corresponding FastFlow models. When using Vision
Transformer, only the feature of one certain layer is used because ViT has a stronger
ability to capture the relationship between local patches and the global feature.
The ResNet feature extractor leverages the residual connections to enable deeper
networks and better feature extraction. The features from different layers capture information
at various semantic levels, with early layers focusing on low-level patterns like edges and
textures, while deeper layers capture more abstract, high-level concepts. By using features
from multiple layers, FastFlow can detect anomalies at different levels of abstraction.
The affine coupling layer is a key component of the 2D flow model. It first splits the
input tensor along the channel dimension into two parts. One part remains unchanged,
while the other part is transformed based on the unchanged part. The transformation is
parameterized by two functions: a scale function and a translation function, both
implemented as convolutional neural networks. This design ensures that the
transformation is invertible and has a tractable Jacobian determinant.
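The split, transform, and invert steps can be sketched on a single channel vector. This is a simplified illustration rather than the project's implementation: it works on a plain list at one spatial position instead of a full feature tensor, and `toy_subnet` is a hypothetical stand-in for the convolutional scale and translation networks.

```python
import math


def toy_subnet(x_a):
    """Hypothetical stand-in for the convolutional subnet: returns (log-scale, shift)."""
    log_scale = [math.tanh(0.5 * v) for v in x_a]
    shift = [2.0 * v + 1.0 for v in x_a]
    return log_scale, shift


def coupling_forward(x):
    """Affine coupling: keep the first half, scale and shift the second half."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_subnet(x_a)
    y_b = [xb * math.exp(s) + ti for xb, s, ti in zip(x_b, log_s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of the log-scales.
    log_det = sum(log_s)
    return x_a + y_b, log_det


def coupling_inverse(y):
    """Exact inverse: the unchanged half reproduces the same scale and shift."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_subnet(y_a)
    x_b = [(yb - ti) * math.exp(-s) for yb, s, ti in zip(y_b, log_s, t)]
    return y_a + x_b


x = [0.5, -1.0, 2.0, 0.3]  # toy channel vector with an even number of channels
y, log_det = coupling_forward(x)
x_recovered = coupling_inverse(y)
print(y, log_det, x_recovered)
```

Because the unchanged half is passed through as-is, the inverse never needs to invert the subnet itself, which is why the subnet can be an arbitrary convolutional network.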
In the training phase, FastFlow learns to transform the input visual feature into a tractable
distribution, specifically a standard normal distribution in two-dimensional space. This is
achieved by maximizing the log-likelihood of normal image features.
The training objective is to maximize the log-likelihood of normal samples under the
model. Given a dataset of normal images $D = \{x_1, x_2, \ldots, x_N\}$, the training objective
is:
\[
\max_{\theta} \sum_{i=1}^{N} \log p_\theta(x_i) \tag{4.3}
\]
where $\theta$ represents the parameters of the FastFlow model, and $p_\theta(x_i)$ is the probability
density of $x_i$ under the model. Using the change of variables formula, this can be rewritten
as:
\[
\max_{\theta} \sum_{i=1}^{N} \left[ \log p_z\left(f_\theta^{-1}(x_i)\right) + \log \left| \det \frac{\partial f_\theta^{-1}}{\partial x_i} \right| \right] \tag{4.4}
\]
where $p_z$ is the density of the base distribution (e.g., standard normal), and $f_\theta^{-1}$ is the
inverse of the flow transformation.
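For the standard normal base distribution, the objective takes a particularly simple form. The short derivation below assumes $p_z = \mathcal{N}(0, I)$ on $\mathbb{R}^d$ and writes $z_i = f_\theta^{-1}(x_i)$; the loss symbol $\mathcal{L}(\theta)$ is introduced here for illustration only.

```latex
\begin{align*}
\log p_z(z_i) &= -\frac{d}{2}\log(2\pi) - \frac{1}{2}\lVert z_i \rVert^2 \\
\mathcal{L}(\theta) &= \sum_{i=1}^{N} \left[ \frac{1}{2}\left\lVert f_\theta^{-1}(x_i) \right\rVert^2
  - \log \left| \det \frac{\partial f_\theta^{-1}}{\partial x_i} \right| \right]
\end{align*}
```

Since the term $\frac{d}{2}\log(2\pi)$ does not depend on $\theta$, maximizing the log-likelihood is equivalent to minimizing $\mathcal{L}(\theta)$: the first term pulls the transformed features toward the origin, while the log-determinant term prevents the flow from collapsing all inputs to a single point.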
In the inference phase, the likelihood value at each location on the two-dimensional
feature is used as the anomaly score. Features of anomalous images should be out of
distribution and hence have lower likelihoods than normal images.
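A minimal sketch of per-location scoring is shown below. It assumes a standard normal base distribution and, as a simplification made only for this example, ignores the Jacobian term, so each location is scored by the negative log-density of its latent channel vector. The function name and the toy latent map are hypothetical.

```python
import math


def anomaly_map(z):
    """Negative base log-density per location; z is indexed as z[channel][row][col]."""
    channels, height, width = len(z), len(z[0]), len(z[0][0])
    const = 0.5 * channels * math.log(2.0 * math.pi)
    return [
        [0.5 * sum(z[c][i][j] ** 2 for c in range(channels)) + const
         for j in range(width)]
        for i in range(height)
    ]


# Toy latent map with 2 channels and 2x2 spatial positions;
# the bottom-right position lies far from the origin (out of distribution).
z = [
    [[0.1, -0.2], [0.0, 3.0]],
    [[0.0, 0.1], [-0.1, 2.5]],
]
scores = anomaly_map(z)
print(scores)
```

Locations whose latent vectors sit close to the origin receive low scores, while the out-of-distribution location stands out with the highest score.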
One of the key advantages of FastFlow is its efficient inference process. Unlike
previous methods that require evaluating multiple patches using a sliding window approach,
FastFlow processes the entire image in one forward pass, resulting in significant speed
improvements. The end-to-end inference capability is particularly valuable for real-time
applications where processing speed is critical.
Figure 4.3: FastFlow training and inference process
The FastFlow model is implemented using the PyTorch deep learning framework. The
feature extractors are pre-trained on ImageNet and fine-tuned on the target dataset. For
the 2D flow model, we use a stack of 8 transformation blocks for most experiments, with
each block containing 2 affine coupling layers.
For training, we use the Adam optimizer with a learning rate of 2e-4 and weight decay
of 1e-5. The batch size is set to 32, and the model is trained for 100 epochs. Learning rate
scheduling with a cosine annealing policy is employed to improve convergence.
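The hyperparameters above and the shape of a cosine annealing schedule can be sketched as follows. The learning rate, weight decay, batch size, and epoch count come from the text; the minimum learning rate of zero and the per-epoch update granularity are assumptions, and the function is a plain-Python illustration of the standard cosine formula rather than the scheduler used in the project.

```python
import math

# Values stated in the text.
BASE_LR = 2e-4
WEIGHT_DECAY = 1e-5
BATCH_SIZE = 32
EPOCHS = 100

# Assumption: the text does not state a minimum learning rate.
MIN_LR = 0.0


def cosine_annealed_lr(epoch, base_lr=BASE_LR, min_lr=MIN_LR, total_epochs=EPOCHS):
    """Learning rate at `epoch` under a single cosine annealing cycle."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cosine


schedule = [cosine_annealed_lr(epoch) for epoch in range(EPOCHS + 1)]
print(schedule[0], schedule[50], schedule[100])
```

The rate starts at the base value, passes through half of it at the midpoint, and decays smoothly toward the minimum by the final epoch.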
Data augmentation plays a crucial role in training effective anomaly detection models.
For the MVTec AD dataset, we use standard augmentations such as random cropping,
horizontal flipping, and slight color jittering. These augmentations help the model learn
robust representations that are invariant to common variations in normal samples.
For multi-scale feature fusion in the ResNet-based models, we apply a weighted sum of
the anomaly maps produced by the FastFlow models operating on different feature levels.
The weights are learned during training to optimally combine information from different
semantic levels.
During inference, the anomaly score for the entire image is computed as the maximum
value in the anomaly map, reflecting the highest anomaly probability across all spatial
locations. For localization, the anomaly map is thresholded using Otsu’s method to generate
a binary mask indicating anomalous regions.
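Both post-processing steps can be sketched in plain Python. This is an illustrative version under stated assumptions: the histogram uses 16 bins, the anomaly map is a small hand-written example, and the function names are hypothetical. A production pipeline would more likely rely on an image-processing library.

```python
def image_score(anomaly_map):
    """Image-level score: maximum value over all spatial locations."""
    return max(max(row) for row in anomaly_map)


def otsu_threshold(values, bins=16):
    """Otsu's method: pick the histogram split that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))
    best_var, best_k = -1.0, 0
    w0, sum0 = 0, 0.0
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += centers[k] * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        # Proportional to the between-class variance (counts instead of probabilities).
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_k = between, k
    return lo + (best_k + 1) * width  # upper edge of the last low-score bin


# Toy anomaly map: three high-scoring pixels on a low-scoring background.
toy_map = [
    [0.10, 0.20, 0.10],
    [0.15, 0.90, 0.95],
    [0.10, 0.20, 0.85],
]
score = image_score(toy_map)
threshold = otsu_threshold([v for row in toy_map for v in row])
mask = [[v > threshold for v in row] for row in toy_map]
print(score, threshold, mask)
```

The threshold falls in the gap between the background values and the three high-scoring pixels, so only those pixels are marked in the binary mask.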
2. It uses fully convolutional networks as the subnet in the flow model, which
maintains the relative position of the space to improve the performance of anomaly
detection.
3. It supports end-to-end inference of the whole image and directly outputs the
anomaly detection and location results at once, improving inference efficiency.
4. It can be used as a plug-in module with arbitrary deep feature extractors such as
ResNet and vision transformer.
5. It achieves high accuracy (99.4% AUC) in anomaly detection with high inference
efficiency.
7. FastFlow is more robust to variations in normal samples and can better handle
complex backgrounds and textures compared to methods that rely on reconstruction errors
or patch-based approaches.
8. The bidirectional nature of normalizing flows allows FastFlow to be used for both
anomaly detection (forward process) and feature generation (reverse process), providing
additional flexibility for various applications.
9. The model can be trained end-to-end without requiring complex multi-stage training
procedures, making it easier to implement and deploy in practice.
CHAPTER 5 :
SIMULATION RESULTS
URL: https://www.mvtec.com/company/research/datasets/mvtec-ad
URL: https://github.com/openvinotoolkit/anomalib/tree/main/datasets/btad
The performance of the proposed method and all comparable methods is measured by
the Area Under the Receiver Operating Characteristic Curve (AUROC) at image or pixel
level:
2. Pixel-level AUROC: Measures the ability to localize anomalies at the pixel level.
For the detection task, evaluated models are required to output a single score (anomaly
score) for each input test image. In the localization task, methods need to output anomaly
scores for every pixel.
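AUROC can be computed directly from its probabilistic interpretation: the chance that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The sketch below is a minimal pairwise implementation with hypothetical scores and labels; it is quadratic in the number of samples and is meant for illustration, not for large pixel-level evaluations.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical image-level anomaly scores; label 1 marks an anomalous image.
image_scores = [0.1, 0.4, 0.35, 0.8]
image_labels = [0, 0, 1, 1]
image_auroc = auroc(image_scores, image_labels)
print(image_auroc)
```

For the pixel-level metric, the same function can be applied after flattening the anomaly maps and the ground-truth masks into one list of pixel scores and one list of pixel labels.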
5.3 EXPERIMENTAL SETUP
1. Hardware Configuration:
2. Backbone Networks:
• ResNet18
• Wide-ResNet50-2
3. FastFlow Configuration:
• For backbone networks with large model capacities (CaiT and Wide-ResNet50-2),
alternating 3×3 and 1×1 convolution kernels are used in the subnet.
• For backbone networks with small model capacities (DeiT and ResNet18), only
3×3 convolution kernels are used in the subnet.
Table 5.1: Complexity analysis of FastFlow and other methods
5.4 RESULTS AND GRAPHS
The experimental results on the MVTec AD dataset show that FastFlow surpasses
previous state-of-the-art methods in terms of accuracy and inference efficiency with
various backbone networks.
Table 5.4: Anomaly detection results on CIFAR-10 dataset
Method             AUC (%)
OC-SVM             58.6
KDE                61.0
l2-AE              53.6
VAE                58.3
Pixel CNN          55.1
LSA                64.1
AnoGAN             61.8
DSVDD              64.8
OCGAN              65.6
FastFlow (Ours)    66.7
5.5 INFERENCES
3. FastFlow works well with various backbone networks, including ResNet and Vision
Transformers.
4. The alternating use of 3×3 and 1×1 convolution kernels in the subnet improves
performance for backbone networks with large model capacities while reducing the number
of parameters.
5. FastFlow achieves the best results on the BTAD dataset with a mean AUC of 0.97,
outperforming other methods like AE MSE, AE MSE+SSIM, and VT-ADL.
7. The bidirectional nature of FastFlow allows for both anomaly detection (forward
process) and feature generation (reverse process), as demonstrated in the feature
visualization and generation experiments.
These results demonstrate that FastFlow is an effective and efficient approach for
unsupervised anomaly detection and localization, outperforming previous state-of-the-art
methods in terms of both accuracy and inference efficiency.
Figure 5.2: Feature visualization and generation using FastFlow
CHAPTER 6 :
EXPERIMENTAL RESULTS ON TEST IMAGES AND
PERFORMANCE METRICS VISUALIZATION
Figure 6.10: Collage of test images: Original, overlay, and heatmap for clothes, toothbrush, and
6.1 Training Metrics - Toothbrush Category
The following plots show the performance metrics during training for the toothbrush category.
Each metric is plotted against the number of epochs to show the model’s learning progress.
6.3.1 AUROC Score
6.5 Training Loss - Toothbrush Category
6.7.2 Comparison of Different Runs
The following plots show the comparison between different training runs:
CHAPTER 7 :
CONCLUSION AND FUTURE ENHANCEMENTS
7.1 CONCLUSION
2. Designing a lightweight network structure for FastFlow with the alternate stacking
of large and small convolution kernels for all steps, adopting an end-to-end inference
phase with high efficiency.
3. Demonstrating that the proposed FastFlow model can be used as a plug-in model
with various different feature extractors, including ResNet and Vision Transformers.
Extensive experimental results on the MVTec AD, BTAD, and CIFAR-10 datasets
show that FastFlow surpasses previous state-of-the-art methods in terms of accuracy and
inference efficiency with various backbone networks. Specifically, FastFlow achieves
99.4% AUC in anomaly detection on the MVTec AD dataset with high inference
efficiency.
The bidirectional nature of FastFlow allows for both anomaly detection (forward
process) and feature generation (reverse process), providing additional flexibility and
utility. The forward process transforms the original distribution of normal feature maps to
a standard normal distribution, while the reverse process can generate visual features
from specific probability sampling variables.
7.2 FUTURE ENHANCEMENTS
1. Exploring more advanced backbone networks: Future work could investigate the
use of more advanced backbone networks or custom-designed feature extractors that
better capture the characteristics of normal samples in specific domains.
5. Reducing model complexity: While FastFlow is already more efficient than many
existing methods, further reducing the model complexity without sacrificing performance
would make it more suitable for deployment on edge devices with limited computational
resources.
These future enhancements would further strengthen FastFlow’s capabilities and broaden
its applicability to various real-world anomaly detection scenarios.