second_report - Copy

The document is a project report on implementing FastFlow for real-time anomaly detection using 2D normalizing flows. It outlines the objectives, methodologies, and results of the study, demonstrating that FastFlow outperforms existing methods in accuracy and efficiency for detecting anomalies in images. The report includes a comprehensive literature review, proposed work, experimental results, and future enhancements.

FASTFLOW IMPLEMENTATION FOR REAL-TIME ANOMALY DETECTION
PROJECT REPORT

Submitted by
SUSHANT GAURAV (21CS1123)
HRITIK KUMAR (21CS1098)
SUBHASH KUMAR (21ME1071)
MURARI YADAV (21CS1095)

Under the guidance of


Dr. K. Vivekanandan
Professor
Department of Computer Science and Engineering

to the Pondicherry University, in partial fulfillment of the requirement for the award of degree

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


PUDUCHERRY TECHNOLOGICAL UNIVERSITY
PUDUCHERRY – 605 014
May 2025.
TABLE OF CONTENTS

Chapter/Section No.   Title

I                     BONAFIDE CERTIFICATE
II                    ACKNOWLEDGEMENT
III                   ABSTRACT
IV                    LIST OF ABBREVIATIONS
V                     LIST OF SYMBOLS

Chapter 1             INTRODUCTION
1.1                   Overview
1.2                   Objective of the Study
1.3                   Motivation/Need for the Study
1.4                   Organization of the Chapters

Chapter 2             LITERATURE REVIEW
2.1                   Techniques
2.1.1                 Reconstruction-Based Methods
2.1.2                 Representation-Based Methods
2.2                   Survey of the Related Work
2.3                   Survey Conclusion/Summary
2.3.1                 Research Gaps Identified

Chapter 3             EXISTING WORK
3.1                   Feature Extraction Approaches
3.2                   Convolutional Neural Networks for Feature Extraction
3.3                   Distribution Estimation Methods
3.4                   Feature Extraction for Anomaly Localization
3.5                   Autoencoder-Based Anomaly Detection
3.6                   GAN-Based Anomaly Detection
3.7                   Limitations of Existing Approaches

Chapter 4             PROPOSED WORK
4.1                   FastFlow Approach
4.2                   Normalizing Flows Background
4.3                   Architecture and Design
4.4                   Module Description
4.4.1                 Feature Extractor
4.4.2                 2D Flow Model
4.4.3                 Training and Inference Process
4.5                   Implementation Details
4.6                   Advantages Over Existing Methods

Chapter 5             SIMULATION RESULTS / EXPERIMENTAL RESULTS
5.1                   Dataset Description
5.2                   Performance Metrics
5.3                   Experimental Setup
5.4                   Results and Graphs
5.5                   Inferences

Chapter 6             CONCLUSION AND FUTURE ENHANCEMENTS
6.1                   Conclusion
6.2                   Future Enhancements

REFERENCES
LIST OF FIGURES

Figure No.   Title

3.1          Existing approach for anomaly detection using normalizing flows
4.1          Overview of the proposed FastFlow approach
4.2          FastFlow architecture with 2D normalizing flows
4.3          FastFlow training and inference process
5.1          Comparison of AUROC scores on MVTec AD dataset
5.2          Feature visualization and generation using FastFlow
6.1          Original image of clothes
6.2          Overlay anomaly map for clothes
6.3          Heatmap anomaly map for clothes
6.4          Original image of toothbrush
6.5          Overlay anomaly map for toothbrush
6.6          Heatmap anomaly map for toothbrush
6.7          Original image of screw
6.8          Overlay anomaly map for screw
6.9          Heatmap anomaly map for screw
6.10         Collage of test images: Original, overlay, and heatmap for clothes, toothbrush, and screw
B.1          Image-level anomaly detection performance by category
LIST OF TABLES

Table No.    Title

5.1          Complexity analysis of FastFlow and other methods
5.2          Anomaly detection and localization performance on MVTec AD dataset
5.3          Anomaly localization results on BTAD dataset
5.4          Anomaly detection results on CIFAR-10 dataset
B.1          Performance comparison with different backbone networks
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
PUDUCHERRY TECHNOLOGICAL UNIVERSITY
PUDUCHERRY – 605 014.

BONAFIDE CERTIFICATE

This is to certify that the Project work titled “<<title of the project>>” is a bonafide work done by
<<name of the candidates>> (<<reg. no.>>) in partial fulfillment for the award of the degree of Bachelor of
Technology in Computer Science and Engineering of the Puducherry Technological University and that
this work has not been submitted for the award of any other degree of this/any other institution.

Project Guide                        Head of the Department

(Dr. K. Vivekanandan)                (Dr. E. Ilavarasan)

Submitted for the University Examination held on_______________

Project Coordinator External Examiner


ACKNOWLEDGEMENT

I am deeply indebted to Dr. K. Vivekanandan, Professor, Department of Computer Science
and Engineering, Puducherry Technological University, Puducherry, for his valuable guidance
throughout this project.

I also express my heartfelt gratitude to Dr. E. Ilavarasan, Professor & Head (CSE), for
his constant motivation toward achieving my goal.

With profound gratitude, I would like to express my sincere thanks to Dr. S. Mohan, the
Vice-Chancellor, Puducherry Technological University, for his kindness in extending the
infrastructural facilities to carry out my project work successfully.

I would be failing in my duty if I did not acknowledge the efforts of the Project Coordinator,
Dr. M. Thirumaran, and the Project Review Panel members, viz. <<names of the project
evaluation members>>, for shaping our ideas and offering constructive criticism during
project review.

I also express my thanks to all the Faculty and Technical Staff members of the CSE
department for their timely help and the Central Library for facilitating useful reference
materials.

I would be failing in my duty if I did not acknowledge the immense help extended by my
friends, who stood by me through all my trials and tribulations. Their constant encouragement,
along with the valuable guidance of faculty members, helped me stay motivated and complete this
work successfully.
<<candidate’s name>>
ABSTRACT

This project is about anomaly detection, which means finding things in images that look
different or wrong. These "anomalies" can be small changes, missing parts, or things that do
not belong. Anomaly detection is very useful in real life, especially in factories, where it
helps find broken or defective products early.

To do this, we used a new method called FastFlow. It is based on something called 2D
normalizing flows, which helps the computer learn what a normal image looks like. After
learning that, the system can find anything that is not normal. What makes FastFlow special
is that it works in two dimensions (2D), so it keeps the shape and position of things in the
image. This helps the system find even small or hidden changes that other methods might
miss.

FastFlow is also very fast. It can look at an image and give results quickly, which is
important for real-time use, like in manufacturing, security, or health care. We tested
FastFlow using popular datasets like MVTec AD (for industrial defects), BTAD (real-world
anomalies), and CIFAR-10 (for image classification). In all tests, FastFlow did better than
older methods. It found more anomalies and did it faster.

Because of this, FastFlow is a great choice for systems that need to find problems quickly
and accurately. It can help reduce errors, save time, and improve safety in many areas.
LIST OF ABBREVIATIONS

AE Autoencoder
AUC Area Under the Curve
CNN Convolutional Neural Network
GAN Generative Adversarial Network
MSE Mean Squared Error
NF Normalizing Flow
ROC Receiver Operating Characteristic
SSIM Structural Similarity Index Measure
VAE Variational Autoencoder
ViT Vision Transformer
LIST OF SYMBOLS

θ Model parameters
σ Standard deviation
µ Mean
P (x) Probability distribution of x
N Normal distribution
CHAPTER 1 :
INTRODUCTION

1.1 OVERVIEW

This project focuses on unsupervised anomaly detection and localization using FastFlow,
a novel approach implemented with 2D normalizing flows as a probability distribution
estimator. Anomaly detection is critical in industrial applications, medical imaging, and
security checks where collecting and labeling sufficient anomaly data is often infeasible.

Traditional approaches to anomaly detection often struggle with effectively mapping
image features to a tractable base distribution and tend to ignore the relationship
between local and global features, which are crucial for identifying anomalies. FastFlow
addresses these limitations by implementing 2D normalizing flows that can be used as a
plug-in module with arbitrary deep feature extractors such as ResNet and vision
transformers.

1.2 OBJECTIVE OF THE STUDY

The primary objective of this study is to implement and evaluate FastFlow for
unsupervised anomaly detection and localization. Specific objectives include:

1. To develop a 2D normalizing flow model that effectively transforms input visual
features into a tractable distribution.

2. To design a lightweight network structure that can be used as a plug-in module with
various feature extractors.

3. To evaluate the performance of FastFlow against state-of-the-art anomaly detection
methods in terms of accuracy and inference efficiency.

4. To demonstrate FastFlow’s capability to achieve high accuracy (99.4% AUC) in
anomaly detection with high inference efficiency.

1.3 MOTIVATION/ NEED FOR THE STUDY

Anomaly detection in computer vision is essential for identifying abnormal images and
locating abnormal areas. However, due to the low probability density of anomalies,
normal and abnormal data usually exhibit a serious long-tail distribution. In some cases,
no abnormal samples are available at all. This reality makes it difficult to collect and
annotate a large amount of abnormal data for supervised learning in practical applications.

Unsupervised anomaly detection addresses this problem by using only normal samples
during the training process while still being able to identify and locate anomalies during
testing. FastFlow represents a promising approach to this challenge by effectively
modeling the distribution of normal features and identifying deviations from this
distribution.

1.4 ORGANIZATION OF THE CHAPTERS

The remainder of this thesis is organized as follows:

Chapter 2 presents a literature review of existing anomaly detection techniques,
including a survey of related work and a summary of current approaches.

Chapter 3 describes existing work in the field of unsupervised anomaly detection,
including techniques, architectures, and module descriptions.

Chapter 4 details the proposed FastFlow model, including its architecture, design, and
module descriptions.

Chapter 5 presents the simulation results, comparing the performance of FastFlow
against existing approaches on various datasets.

Finally, the thesis concludes with a summary of the contributions and suggestions for
future enhancements.

CHAPTER 2 :
LITERATURE REVIEW

2.1 TECHNIQUES

Anomaly detection techniques can be broadly categorized into reconstruction-based and
representation-based methods. Reconstruction-based methods typically utilize generative
models like autoencoders or generative adversarial networks to encode and reconstruct
normal data, with the assumption that anomalies cannot be reconstructed since they do
not exist in the training samples. Representation-based methods extract discriminative
features for normal images or normal image patches with deep convolutional neural
networks and establish a distribution of these normal features.

2.1.1 Reconstruction-Based Methods

Reconstruction-based methods rely on the principle that models trained only on normal
data will fail to accurately reconstruct anomalous data. These approaches typically
employ autoencoders, variational autoencoders (VAEs), or generative adversarial networks
(GANs) to learn a compact representation of normal data. During inference, the
reconstruction error is used as an anomaly score, with higher errors indicating potential
anomalies.

A key advantage of reconstruction-based methods is their interpretability, as the
reconstruction error directly indicates which parts of the input differ from the expected normal
pattern. However, these methods often struggle with complex, high-dimensional data and
may inadvertently learn to reconstruct anomalies if the model capacity is too high.
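
The reconstruction-error idea can be sketched in a few lines. The example below is only an
illustration: it assumes NumPy is available and swaps the neural autoencoder for a PCA
projection (a linear stand-in), and the synthetic data, function names, and dimensions are
all hypothetical rather than taken from this project.

```python
import numpy as np

def fit_linear_autoencoder(normal_data, n_components):
    """Fit a PCA basis on normal samples (a linear stand-in for an autoencoder)."""
    mean = normal_data.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(samples, mean, components):
    """Per-sample mean squared error between each input and its reconstruction."""
    codes = (samples - mean) @ components.T       # encode
    reconstructed = codes @ components + mean     # decode
    return np.mean((samples - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical "normal" data: 8-dimensional points lying close to a 2-D plane.
basis = rng.normal(size=(2, 8))
normal_train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))
normal_test = rng.normal(size=(5, 2)) @ basis + 0.01 * rng.normal(size=(5, 8))
anomalous_test = 3.0 * rng.normal(size=(5, 8))    # points that ignore the plane

mean, components = fit_linear_autoencoder(normal_train, n_components=2)
normal_scores = reconstruction_error(normal_test, mean, components)
anomaly_scores = reconstruction_error(anomalous_test, mean, components)
```

Samples that follow the structure seen during training reconstruct almost perfectly, while
samples that break it leave a large residual, which is what the anomaly score measures.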

2.1.2 Representation-Based Methods

Representation-based methods focus on learning a feature space where normal samples
form a compact cluster. These approaches extract discriminative features using deep
neural networks and then model the distribution of these features using various techniques
such as one-class classification, density estimation, or distance-based methods.

Unlike reconstruction-based methods, representation-based approaches directly model
the distribution of normal features without attempting to reconstruct the input. This makes
them potentially more robust to complex anomalies and more computationally efficient
during inference.

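
As one possible instance of a distance-based representation method, the sketch below scores
a test feature by its mean distance to the nearest features of normal samples. It assumes
NumPy, and the feature vectors are random stand-ins for the output of a deep feature
extractor; the function name and the choice of k are hypothetical.

```python
import numpy as np

def knn_anomaly_score(train_features, test_features, k=3):
    """Mean Euclidean distance from each test feature to its k nearest normal features."""
    # Pairwise distances with shape (num_test, num_train).
    diffs = test_features[:, None, :] - train_features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(0)
normal_features = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
test_features = np.vstack([
    np.zeros((1, 4)),        # at the centre of the normal cluster
    np.full((1, 4), 6.0),    # far away from every normal feature
])
scores = knn_anomaly_score(normal_features, test_features)
```

No reconstruction is involved: the score depends only on where a feature lies relative to
the cluster formed by normal features.
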
2.2 SURVEY OF THE RELATED WORK

In recent years, several approaches have been proposed for unsupervised anomaly detection.
One promising method is using deep neural networks to obtain the features of
normal images and model the distribution with statistical methods, then detect abnormal
samples that have different distributions.

Previous approaches used non-parametric methods to model the distribution of features
for normal images. For example, they estimated a multidimensional Gaussian
distribution by calculating the mean and variance of the features, or used clustering
algorithms to group the features of normal images into clusters.
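
A minimal sketch of the Gaussian variant follows, assuming NumPy and synthetic features.
It fits a mean and a full covariance matrix (a slight generalization of per-feature
variance) and scores test features by Mahalanobis distance; the function names and the
regularization constant eps are illustrative choices, not details from the surveyed work.

```python
import numpy as np

def fit_gaussian(features, eps=1e-6):
    """Estimate the mean and inverse covariance of normal features."""
    mean = features.mean(axis=0)
    # eps keeps the covariance invertible when features are nearly collinear.
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(x, mean, cov_inv):
    """Mahalanobis distance of each row of x from the fitted Gaussian."""
    delta = x - mean
    return np.sqrt(np.einsum("...i,ij,...j->...", delta, cov_inv, delta))

rng = np.random.default_rng(0)
normal_features = rng.normal(size=(1000, 3)) * np.array([1.0, 2.0, 0.5])
mean, cov_inv = fit_gaussian(normal_features)

test_points = np.array([[0.0, 0.0, 0.0],    # typical feature
                        [5.0, 0.0, 0.0]])   # far out along the first axis
scores = mahalanobis_score(test_points, mean, cov_inv)
```

A threshold on this distance then separates normal from abnormal features.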

Recently, some works began to use normalizing flow to estimate distribution. Through
a trainable process that maximizes the log-likelihood of normal image features, they
embed normal image features into a standard normal distribution and use the probability
to identify and locate anomalies. However, original one-dimensional normalizing flow
models need to flatten the two-dimensional input feature into a one-dimensional vector to
estimate the distribution, which destroys the inherent spatial positional relationship of
the two-dimensional image and limits the ability of the flow model.

Additionally, these methods need to extract the features for a large number of patches
in images through the sliding window method and detect anomalies for each patch to
obtain anomaly location results. This leads to high complexity in inference and limits the
practical value of these methods.
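
The cost of the sliding window method is easy to quantify. The arithmetic below counts the
patches that must be processed for one image; the image size, patch size, and stride are
hypothetical values chosen for illustration only.

```python
def count_patches(image_size, patch_size, stride):
    """Number of window positions a sliding window visits on a square image."""
    if patch_size > image_size:
        return 0
    per_axis = (image_size - patch_size) // stride + 1
    return per_axis * per_axis

# A 256 x 256 image scanned with 32 x 32 patches at a stride of 4 pixels.
patches = count_patches(image_size=256, patch_size=32, stride=4)   # 57 * 57 = 3249
# Processing the whole image in one pass corresponds to a single "patch".
whole_image = count_patches(image_size=256, patch_size=256, stride=1)
```

Under these assumptions a patch-based detector runs its model thousands of times per image,
whereas an end-to-end method runs it once.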

2.3 SURVEY CONCLUSION/ SUMMARY

Based on the literature review, it is evident that unsupervised anomaly detection remains
a challenging problem, particularly in terms of balancing detection accuracy and
computational efficiency. Existing methods often face trade-offs between these aspects,
with many approaches either achieving high accuracy at the cost of computational
complexity or maintaining efficiency at the expense of detection performance.

Representation-based methods, especially those leveraging normalizing flows, show
promise in effectively modeling the distribution of normal features. However, current
implementations often fail to preserve spatial information or require inefficient patch-based
processing for localization tasks.

FastFlow addresses these limitations by extending normalizing flows to two-dimensional
space, preserving spatial information, and enabling end-to-end inference.
This approach has the potential to advance the state of the art in unsupervised anomaly
detection and localization, providing both high accuracy and computational efficiency.

2.3.1 Research Gaps Identified

Through this literature review, several research gaps have been identified:

1. The need for a method that effectively preserves spatial information when modeling
the distribution of normal features.

2. The requirement for an efficient end-to-end inference approach that avoids the
computational overhead of patch-based processing.

3. The importance of effectively leveraging both local and global features for anomaly
detection and localization.

4. The necessity for a flexible approach that can work with various feature extractors to
adapt to different application scenarios.

FastFlow aims to address these gaps by implementing 2D normalizing flows with fully
convolutional networks, enabling end-to-end inference, and supporting integration with
various backbone architectures.

CHAPTER 3 :
EXISTING WORK

3.1 FEATURE EXTRACTION APPROACHES

With the development of deep learning, recent unsupervised anomaly detection methods
use deep neural networks as feature extractors, producing more promising anomaly
detection results. Most existing methods use ResNet to extract distinguishing visual
features. Some work has also begun to introduce Vision Transformers (ViT) into
unsupervised anomaly detection.

ViT has a global receptive field and can better learn the relationship between global
and local features. DeiT introduces a teacher-student strategy specific to transformers,
which makes image transformers learn more efficiently and achieves state-of-the-art
performance. CaiT proposes a simple yet effective architecture designed in the spirit of
encoder/decoder architecture and demonstrates that transformer models offer a
competitive alternative to the best convolutional neural networks.

In addition to these methods, various other feature extraction approaches have been
proposed in recent years. EfficientNet, for instance, uses a compound scaling method to
uniformly scale all dimensions of depth, width, and resolution using a simple yet highly
effective compound coefficient. MobileNet focuses on building lightweight deep neural
networks through the use of depthwise separable convolutions, which significantly
reduces the model size and computational cost while maintaining good performance.
These lightweight models are particularly valuable in edge computing scenarios where
computational resources are limited.
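
The saving from depthwise separable convolutions can be checked with a short parameter
count. The layer sizes below are hypothetical and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, kernel):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(c_in, c_out, kernel):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

standard = standard_conv_params(c_in=128, c_out=256, kernel=3)          # 294912
separable = depthwise_separable_params(c_in=128, c_out=256, kernel=3)   # 33920
ratio = separable / standard   # equals 1/c_out + 1/kernel**2, roughly 0.115
```

For this layer the separable form needs about 11.5% of the weights of the standard
convolution, which is where the reduction in model size comes from.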

Furthermore, some researchers have explored the use of pre-trained models for feature
extraction in anomaly detection. By leveraging models trained on large-scale datasets like
ImageNet, they can extract general-purpose features that are useful for anomaly detection
tasks, even when the target domain is different from the source domain. This transfer
learning approach helps address the challenge of limited training data in many anomaly
detection applications.

3.2 CONVOLUTIONAL NEURAL NETWORKS FOR FEATURE EXTRACTION

Convolutional Neural Networks (CNNs) have become the backbone of many computer
vision tasks, including anomaly detection. The hierarchical nature of CNNs makes them
well-suited for extracting features at multiple scales and abstraction levels. Early layers
capture low-level features such as edges and textures, while deeper layers extract more
complex, high-level features related to object parts and complete objects.

ResNet, one of the most popular CNN architectures, introduced residual connections
that enable the training of very deep networks by addressing the vanishing gradient
problem. The skip connections allow the network to learn residual mappings instead of direct
mappings, making optimization easier and enabling networks with hundreds of layers.
This depth is particularly beneficial for capturing the complex patterns needed for
anomaly detection.

Another important CNN architecture is DenseNet, which connects each layer to every
other layer in a feed-forward fashion. This dense connectivity pattern strengthens feature
propagation, encourages feature reuse, and substantially reduces the number of
parameters. By reusing features, DenseNet creates shorter connections between layers
close to the input and layers close to the output, which helps with gradient flow during
training and makes the network more efficient.

In addition, the Feature Pyramid Network (FPN) architecture has been adopted in some
anomaly detection methods to effectively utilize multi-scale features. FPN builds a
feature pyramid with high-level semantics throughout while maintaining high resolution,
which is particularly useful for detecting anomalies of different sizes and scales. This
approach
helps address the challenge that anomalies can appear at various scales in images.
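
The core top-down step of a feature pyramid can be sketched as follows, assuming NumPy. This
is a deliberate simplification: an actual FPN also applies 1 x 1 lateral convolutions and
3 x 3 smoothing convolutions, which are omitted here, and both maps are assumed to already
share the same channel count. The array shapes are hypothetical.

```python
import numpy as np

def upsample_nearest(feature_map, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) map along H and W."""
    return feature_map.repeat(factor, axis=1).repeat(factor, axis=2)

def top_down_merge(coarse, fine):
    """Add the upsampled coarse (semantic) map to the finer (high-resolution) map."""
    return fine + upsample_nearest(coarse)

coarse = np.ones((4, 8, 8))      # hypothetical deep-layer features
fine = np.zeros((4, 16, 16))     # hypothetical shallow-layer features
merged = top_down_merge(coarse, fine)
```

The merged map keeps the resolution of the fine level while carrying information from the
coarse level, which is the property that helps with anomalies of different sizes.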

3.3 DISTRIBUTION ESTIMATION METHODS

For the distribution estimation module, previous approaches used non-parametric methods
to model the distribution of features for normal images:

1. Some methods estimated the multidimensional Gaussian distribution by calculating
the mean and variance for features.

2. Others used clustering algorithms to group the features of normal images into clusters.

3. Recently, some works began to use normalizing flow to estimate distribution.

Through a trainable process that maximizes the log-likelihood of normal image
features, normalizing flow-based methods embed normal image features into a standard
normal distribution and use the probability to identify and locate anomalies. However,
original one-dimensional normalizing flow models need to flatten the two-dimensional
input feature into a one-dimensional vector to estimate the distribution, which destroys
the inherent spatial positional relationship of the two-dimensional image and limits the
ability of the flow model.

Beyond these methods, kernel density estimation (KDE) has been used to model the
probability density function of normal features. KDE is a non-parametric way to estimate
the probability density function of a random variable, which makes it flexible for
modeling complex distributions. However, KDE suffers from the curse of dimensionality
and can be computationally expensive for high-dimensional features.

Another approach is using one-class classification methods like Support Vector Data
Description (SVDD) and Deep SVDD. These methods aim to learn a compact
hypersphere that encloses the normal data points while excluding outliers. During
inference, data points that fall outside this hypersphere are considered anomalies. Deep
SVDD extends this idea by learning a neural network mapping that minimizes the volume
of a hypersphere enclosing the normal data representations.
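
The hypersphere decision rule can be illustrated with a heavily simplified stand-in, assuming
NumPy. Here the centre is fixed to the feature mean and the radius to a distance quantile,
whereas SVDD obtains them by solving an optimization problem and Deep SVDD additionally
trains the feature mapping; the names and the 0.95 quantile are hypothetical.

```python
import numpy as np

def fit_hypersphere(features, quantile=0.95):
    """Centre at the feature mean; radius encloses the given fraction of normal points."""
    center = features.mean(axis=0)
    distances = np.linalg.norm(features - center, axis=1)
    return center, np.quantile(distances, quantile)

def outside_hypersphere(x, center, radius):
    """True for points that fall outside the sphere and are flagged as anomalies."""
    return np.linalg.norm(x - center, axis=1) > radius

rng = np.random.default_rng(0)
normal_features = rng.normal(size=(500, 3))
center, radius = fit_hypersphere(normal_features)

test_points = np.array([[0.0, 0.0, 0.0],
                        [4.0, 4.0, 4.0]])
flags = outside_hypersphere(test_points, center, radius)
```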

Some recent works have also explored the use of self-supervised learning for
distribution estimation. These methods learn representations by solving pretext tasks,
such as predicting the rotation of an image or solving jigsaw puzzles, using only normal
samples. The intuition is that models trained on such tasks will perform well on normal
samples but poorly on anomalous ones, as the latter do not follow the patterns learned
from normal data.

3.4 FEATURE EXTRACTION FOR ANOMALY LOCALIZATION

For anomaly localization, most existing work focuses on how to reasonably use multi-scale
features to identify anomalies at different scales and semantic levels, and achieves
pixel-level anomaly localization through the sliding window method. As a result, the
correlation between global information and local anomalies is not fully exploited,
and the sliding window method needs to test a large number of image patches with high
computational complexity.

To address these problems, FastFlow obtains learnable modeling of global and local
feature distributions through an end-to-end testing phase, instead of designing a
complicated multi-scale strategy and using the sliding window method.

Other approaches for anomaly localization include using attention mechanisms to
highlight regions that contribute most to the anomaly score. These attention maps can be used
directly as anomaly localization maps or combined with other localization methods to
improve accuracy. Attention mechanisms help the model focus on the most relevant parts of
the image for the task at hand, which is particularly useful for identifying subtle
anomalies in complex scenes.

Some methods also utilize gradient-based techniques similar to those used in
explainable AI, such as Grad-CAM or guided backpropagation, to generate localization
maps. These techniques compute the gradients of the anomaly score with respect to the
input image or feature maps, highlighting the regions that most influence the score. The
resulting gradient maps provide valuable insights into which parts of the image contribute
most to the anomaly detection decision.

Additionally, the use of segmentation networks for anomaly localization has been
explored. These methods train a segmentation model to predict the normal regions of an
image, and any deviation from the normal prediction is considered an anomaly. This
approach directly produces pixel-level localization maps without requiring additional
post-processing steps, but it typically requires more labeled data for training compared to other
methods.

3.5 AUTOENCODER-BASED ANOMALY DETECTION

Autoencoders have been widely used for anomaly detection due to their ability to learn
compact representations of normal data. An autoencoder consists of an encoder that
compresses the input data into a lower-dimensional latent space and a decoder that reconstructs the
original input from this latent representation. When trained on normal samples, the
autoencoder learns to accurately reconstruct normal patterns but struggles to reconstruct
anomalies, resulting in higher reconstruction errors for anomalous samples.

Variational Autoencoders (VAEs) extend traditional autoencoders by learning a
probabilistic mapping to the latent space. Instead of encoding an input as a single point in the latent
space, VAEs encode it as a distribution over the latent space. This probabilistic approach
allows VAEs to generate new samples and provides a more robust representation for anomaly
detection. The anomaly score in VAEs can be based on both the reconstruction error and the
KL divergence between the encoded distribution and the prior distribution.

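
One way to combine the two terms is sketched below in plain Python. It assumes a diagonal
Gaussian encoder output and a standard normal prior, for which the KL divergence has a
closed form; the encoder and decoder themselves are not shown, and the function names and
the kl_weight parameter are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def vae_anomaly_score(x, x_reconstructed, mu, log_var, kl_weight=1.0):
    """Mean squared reconstruction error plus a weighted KL term."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed)) / len(x)
    return mse + kl_weight * kl_to_standard_normal(mu, log_var)

# An input that is reconstructed exactly and encoded at the prior scores zero.
perfect = vae_anomaly_score([1.0, 2.0], [1.0, 2.0], mu=[0.0], log_var=[0.0])
# Shifting the latent mean by one unit adds 0.5 to the score through the KL term.
shifted = vae_anomaly_score([1.0, 2.0], [1.0, 2.0], mu=[1.0], log_var=[0.0])
```
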
Figure 3.1: Existing approach for anomaly detection using normalizing flows

Adversarial Autoencoders (AAEs) combine the autoencoder architecture with
adversarial training inspired by Generative Adversarial Networks (GANs). In AAEs, a
discriminator is trained to distinguish between samples from the latent space and samples
from a prior distribution, while the encoder is trained to fool the discriminator. This
adversarial process helps the encoder generate latent representations that follow a specific
prior distribution, making it easier to detect anomalies as deviations from this distribution.

Memory-augmented autoencoders extend the basic autoencoder architecture with an
external memory module that stores prototypical patterns of normal data. During
inference, the memory module retrieves the most similar patterns to the input, and the
reconstruction is based on these retrieved patterns. This approach enhances the model’s
ability to capture and remember normal patterns, leading to better discrimination
between normal and anomalous samples.

3.6 GAN-BASED ANOMALY DETECTION

Generative Adversarial Networks (GANs) have emerged as powerful tools for anomaly
detection. In the GAN framework, a generator learns to produce realistic samples from
a random noise distribution, while a discriminator learns to distinguish between real and
generated samples. After training on normal data, the generator learns to produce samples
that resemble normal patterns but struggles to generate anomalous patterns.

AnoGAN is one of the pioneering GAN-based anomaly detection methods. It uses a
trained GAN to find the closest match to a test image in the GAN’s latent space. The
anomaly score is based on both the residual difference between the test image and the
reconstructed image, and the discriminator’s features. However, AnoGAN is
computationally expensive during inference as it requires an iterative optimization
process to find the latent representation.
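
The two-part score can be written as a weighted sum of a residual term and a
discriminator-feature term. The sketch below assumes the iterative latent search has already
produced the generated image, takes all inputs as flat lists of numbers, and uses a
hypothetical default weight; it illustrates the scoring step only, not the GAN itself.

```python
def anogan_style_score(x, generated, features_x, features_generated, lam=0.1):
    """Weighted sum of the image residual and the discriminator-feature discrepancy."""
    residual = sum(abs(a - b) for a, b in zip(x, generated))
    discrimination = sum(abs(a - b) for a, b in zip(features_x, features_generated))
    return (1.0 - lam) * residual + lam * discrimination

# Identical image and features: nothing is left unexplained, so the score is zero.
clean = anogan_style_score([1.0, 2.0], [1.0, 2.0], [0.5], [0.5])
# A poor match in both pixel space and feature space gives a positive score.
defect = anogan_style_score([1.0, 2.0], [0.0, 0.0], [1.0], [0.0])
```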

To address the computational limitations of AnoGAN, EGBAD (Efficient GAN-Based
Anomaly Detection) was proposed, building on the BiGAN (Bidirectional GAN) architecture.
This approach trains an encoder alongside the GAN to map input images directly to the
latent space, eliminating the need for iterative optimization during inference. The
anomaly score is computed based on the reconstruction error and the feature matching
discrepancy in the discriminator.

GANomaly is another notable GAN-based method that uses an encoder-decoder-encoder
architecture. The first encoder-decoder pair learns to reconstruct the input image,
while the second encoder compresses the reconstructed image into a latent representation.
The anomaly score is based on the discrepancy between the latent representations of the
input image and the reconstructed image. This approach is more efficient than AnoGAN
and achieves better performance on various anomaly detection benchmarks.

3.7 LIMITATIONS OF EXISTING APPROACHES

Existing approaches for unsupervised anomaly detection and localization have several
limitations:

1. Flattening the 2D input feature into a 1D vector in traditional normalizing flow
models destroys the inherent spatial positional relationship, limiting the model’s ability to
detect anomalies.

2. Using the sliding window method to process patches for localization leads to high
computational complexity and limits the practical value of these methods.

3. The importance of the correlation between global information and local anomalies is
not fully utilized in current approaches.

4. Some methods achieve good image-level anomaly detection but fail to obtain exact
anomaly localization results.

5. Using hard-coded position embeddings to leverage the distribution learned by
normalizing flows may underperform on more complicated datasets.

6. Many existing methods require careful hyperparameter tuning to achieve optimal
performance, making them difficult to apply in practice without extensive
experimentation.

7. The computational complexity of some methods, particularly GAN-based
approaches, can be prohibitively high for real-time applications or deployment on
resource-constrained devices.

8. Most methods struggle with highly textured or complex backgrounds, where
distinguishing between normal variations and actual anomalies becomes challenging.

9. The performance of many methods degrades significantly when applied to datasets
with high intra-class variation, where the normal class encompasses a wide range of
appearances and patterns.

10. Few methods address the challenge of domain adaptation, where a model trained
on one dataset needs to be applied to a different but related dataset with minimal
fine-tuning.

These limitations motivated the development of FastFlow, which addresses these issues
by implementing 2D normalizing flows with fully convolutional networks and enabling
end-to-end inference.

CHAPTER 4 :
PROPOSED WORK

4.1 FASTFLOW APPROACH

FastFlow addresses the limitations of existing approaches by extending the original
normalizing flow to two-dimensional space. By using fully convolutional networks as
the subnet in the flow model, FastFlow preserves the relative spatial positions of the
features, which improves anomaly detection performance. At the same time, it supports
end-to-end inference on the whole image and directly outputs the anomaly detection and
localization results in a single pass, which improves inference efficiency.

The proposed approach consists of two main components:

1. A feature extractor that can be any backbone network like ResNet or Vision Transformer.

2. The FastFlow model, which is a 2D normalizing flow implemented with fully convolutional networks.

The key innovation of FastFlow lies in its ability to model the distribution of normal features while preserving spatial relationships. Unlike previous approaches that flatten features into one-dimensional vectors, FastFlow maintains the two-dimensional structure throughout the flow model. This is crucial for anomaly detection and localization tasks, as the spatial arrangement of features often contains important information about the structure and appearance of normal patterns.

Furthermore, FastFlow is designed to be a plug-and-play module that can be integrated with various feature extractors. This flexibility allows it to leverage the strengths of different backbone networks depending on the specific requirements of the application. For instance, when dealing with highly structured data, a CNN-based feature extractor might be more suitable, while for tasks that require capturing long-range dependencies, a transformer-based feature extractor could be more appropriate.

Figure 4.1: Overview of the proposed FastFlow approach

4.2 NORMALIZING FLOWS BACKGROUND


Normalizing flows are generative models that learn a bijective mapping between a simple
base distribution (e.g., a standard normal distribution) and a complex target distribution.

The key idea is to transform a simple probability density through a sequence of invertible
mappings to produce a more complex density. This transformation allows for both exact
likelihood computation and efficient sampling.

Formally, let $z \sim p_z(z)$ be a random variable with a simple distribution, and let $f : \mathbb{R}^d \to \mathbb{R}^d$ be an invertible function. Then the random variable $x = f(z)$ has a probability density given by the change of variables formula:

$$p_x(x) = p_z\!\left(f^{-1}(x)\right) \left| \det \frac{\partial f^{-1}}{\partial x} \right| \tag{4.1}$$

where $\left| \det \frac{\partial f^{-1}}{\partial x} \right|$ is the absolute value of the determinant of the Jacobian of $f^{-1}$ with respect to $x$.

In practice, $f$ is composed of a sequence of simpler invertible transformations: $f = f_1 \circ f_2 \circ \cdots \circ f_K$. Each transformation $f_i$ is designed to be easily invertible and to have a Jacobian determinant that is efficient to compute. Common choices include coupling layers, autoregressive flows, and invertible 1×1 convolutions.

For anomaly detection, normalizing flows are particularly useful because they provide a direct way to estimate the likelihood of a sample under the learned distribution of normal data. Samples with low likelihood (i.e., high negative log-likelihood) are considered anomalies, as they deviate from the learned normal patterns.
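
As a minimal sketch of Eq. (4.1), consider a one-dimensional affine flow x = scale * z + shift with a standard normal base distribution. The function names and the parameter values below are illustrative assumptions, not part of FastFlow; the point is only that the negative log-likelihood grows as a sample moves away from the learned normal region.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of the base distribution p_z = N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow_logpdf(x, scale, shift):
    """log p_x(x) for x = f(z) = scale * z + shift, following Eq. (4.1)."""
    z = (x - shift) / scale                # f^{-1}(x)
    log_abs_det = -math.log(abs(scale))    # log |d f^{-1} / dx|
    return standard_normal_logpdf(z) + log_abs_det

def anomaly_score(x, scale, shift):
    """Negative log-likelihood: higher means more anomalous."""
    return -affine_flow_logpdf(x, scale, shift)

# Hypothetical flow fitted to normal data centred at 5 with spread 2.
typical = anomaly_score(5.0, scale=2.0, shift=5.0)
outlier = anomaly_score(15.0, scale=2.0, shift=5.0)
```

Under these assumed parameters the sample at 15.0 receives a higher score than the sample at 5.0, which is the behaviour the anomaly criterion relies on.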

4.3 ARCHITECTURE AND DESIGN

In the FastFlow pipeline, visual features are first extracted from the input image with a ResNet or a vision transformer. When a Vision Transformer (ViT) is the feature extractor, only the feature map of a single layer is used, because ViT has a stronger ability to capture the relationship between local patches and the global feature. For ResNet, the output of the last layer in each of the first three blocks is used, and each of these three feature maps is passed to its own FastFlow model.

The 2D flow model is constructed by stacking multiple invertible transformation blocks in a sequence. Each transformation block consists of multiple steps, and each step employs affine coupling layers. The split and concat functions perform splitting and concatenation operations along the channel dimension.

To convert the original normalizing flow to a 2D form, two-dimensional convolution layers are adopted in the subnet to preserve spatial information in the flow model. In particular, a fully convolutional network is used in which 3×3 and 1×1 convolutions appear alternately.

The FastFlow architecture consists of a stack of $K$ invertible transformation blocks. Each block follows a similar structure to Glow or Real-NVP but is adapted to preserve the 2D spatial structure of the input features. The overall function $f$ can be expressed as:

$$f = f_1 \circ f_2 \circ \cdots \circ f_K \tag{4.2}$$

where each $f_i$ represents an invertible transformation block. Within each block, multiple affine coupling layers are applied to transform the input features. The coupling layers split the input along the channel dimension into two parts. One part remains unchanged, while the other part is transformed based on the unchanged part. The transformation is parameterized by two functions: a scale function and a translation function, both implemented as convolutional neural networks. This design ensures that the transformation is invertible and has a tractable Jacobian determinant.

The subnet in each coupling layer plays a crucial role in modeling the transformation
parameters. In FastFlow, the subnet is implemented as a fully convolutional network to
preserve spatial information. For backbone networks with large model capacities, such
as CaiT and Wide-ResNet50-2, the subnet alternates between 3×3 and 1×1 convolution
kernels to balance expressiveness and computational efficiency. For backbones with
smaller capacities, such as DeiT and ResNet18, only 3×3 convolution kernels are used to
maximize the expressiveness of the subnet.
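
The coupling mechanism described above can be sketched on a single channel vector (one spatial location). This is a simplified illustration, not the report's implementation: `toy_subnet` is a hypothetical stand-in for the convolutional subnet, and the scale is parameterized through its logarithm, which is a common choice but an assumption here.

```python
import math

def toy_subnet(x_a):
    """Hypothetical stand-in for the convolutional subnet: maps the
    unchanged half to per-channel log-scales and shifts."""
    log_s = [math.tanh(0.5 * v) for v in x_a]   # bounded log-scale
    t = [0.1 * v + 1.0 for v in x_a]
    return log_s, t

def coupling_forward(x):
    """Affine coupling on one channel vector (even channel count)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]               # split along channels
    log_s, t = toy_subnet(x_a)
    y_b = [b * math.exp(s) + ti for b, s, ti in zip(x_b, log_s, t)]
    log_det = sum(log_s)                        # Jacobian is triangular
    return x_a + y_b, log_det                   # concat along channels

def coupling_inverse(y):
    """Exact inverse: the unchanged half reproduces the same parameters."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_subnet(y_a)
    x_b = [(b - ti) * math.exp(-s) for b, s, ti in zip(y_b, log_s, t)]
    return y_a + x_b

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift without inverting the subnet itself, and the log-determinant is just the sum of the log-scales. In the 2D setting the same operation is applied at every spatial location, with the subnet being a convolution over the feature map.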

Figure 4.2: FastFlow architecture with 2D normalizing flows

4.4 MODULE DESCRIPTION

The FastFlow model consists of the following key modules:

4.4.1 Feature Extractor

The feature extractor can be any backbone network like ResNet or Vision Transformer. When using ResNet, the features of the last layer in the first three blocks are used, and these features are put into three corresponding FastFlow models. When using Vision Transformer, only the feature of one certain layer is used because ViT has a stronger ability to capture the relationship between local patches and the global feature.

The ResNet feature extractor leverages the residual connections to enable deeper networks and better feature extraction. The features from different layers capture information at various semantic levels, with early layers focusing on low-level patterns like edges and textures, while deeper layers capture more abstract, high-level concepts. By using features from multiple layers, FastFlow can detect anomalies at different levels of abstraction.

For Vision Transformer-based feature extractors, the self-attention mechanism allows the model to capture long-range dependencies in the input image. This is particularly beneficial for detecting global structure anomalies that may span across large regions of the image. The feature maps from Vision Transformers are typically organized as a sequence of patch embeddings, which need to be reshaped into a 2D feature map before being fed into the FastFlow model.
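
The reshaping step mentioned above can be sketched as follows. The function name is hypothetical, and the sketch assumes that any class token has already been removed and that the patch embeddings are stored in row-major order; both are assumptions about the backbone, not statements from this report.

```python
def tokens_to_feature_map(tokens, height, width):
    """Reshape N = height * width patch embeddings (each a list of C
    values, in row-major patch order) into a C x H x W nested list."""
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    channels = len(tokens[0])
    return [
        [[tokens[r * width + c][ch] for c in range(width)]
         for r in range(height)]
        for ch in range(channels)
    ]

# 2 x 3 grid of patches, 2 channels per patch embedding.
tokens = [[i, 10 * i] for i in range(6)]
fmap = tokens_to_feature_map(tokens, height=2, width=3)
```

With a tensor library the same operation is usually a transpose followed by a reshape; the nested-list version only makes the index mapping explicit.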

4.4.2 2D Flow Model

The 2D flow model is constructed by stacking multiple invertible transformation blocks in a sequence. Each transformation block consists of multiple steps, and each step employs affine coupling layers. The split and concat functions perform splitting and concatenation operations along the channel dimension.

To convert the original normalizing flow to a 2D form, two-dimensional convolution layers are adopted in the subnet to preserve spatial information in the flow model. In particular, a fully convolutional network is used in which 3×3 and 1×1 convolutions appear alternately.

The affine coupling layer is a key component of the 2D flow model. It first splits the input tensor along the channel dimension into two parts. One part remains unchanged, while the other part is transformed based on the unchanged part. The transformation is parameterized by two functions: a scale function and a translation function, both implemented as convolutional neural networks. This design ensures that the transformation is invertible and has a tractable Jacobian determinant.

The subnet in each coupling layer plays a crucial role in modeling the transformation
parameters. In FastFlow, the subnet is implemented as a fully convolutional network to
preserve spatial information. For backbone networks with large model capacities, such
as CaiT and Wide-ResNet50-2, the subnet alternates between 3×3 and 1×1 convolution
kernels to balance expressiveness and computational efficiency. For backbones with
smaller capacities, such as DeiT and ResNet18, only 3×3 convolution kernels are used to
maximize the expressiveness of the subnet.

4.4.3 Training and Inference Process

In the training phase, FastFlow learns to transform the input visual feature into a tractable distribution, specifically a standard normal distribution in two-dimensional space. This is achieved by maximizing the log-likelihood of normal image features.

The training objective is to maximize the log-likelihood of normal samples under the model. Given a dataset of normal images $D = \{x_1, x_2, \ldots, x_N\}$, the training objective is:

$$\max_{\theta} \sum_{i=1}^{N} \log p_\theta(x_i) \tag{4.3}$$

where $\theta$ represents the parameters of the FastFlow model, and $p_\theta(x_i)$ is the probability density of $x_i$ under the model. Using the change of variables formula, this can be rewritten as:

$$\max_{\theta} \sum_{i=1}^{N} \left[ \log p_z\!\left(f_\theta^{-1}(x_i)\right) + \log \left| \det \frac{\partial f_\theta^{-1}}{\partial x_i} \right| \right] \tag{4.4}$$

where $p_z$ is the density of the base distribution (e.g., standard normal), and $f_\theta^{-1}$ is the inverse of the flow transformation.
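
In practice the maximization in Eq. (4.3) and Eq. (4.4) is usually carried out by minimizing the negative log-likelihood. The sketch below assumes a standard normal base distribution and takes the latent values and the accumulated log-determinant as given inputs; the function names are illustrative and the batch averaging is an assumption about how the loss would be reduced.

```python
import math

def nll_loss(z, log_det):
    """Negative of the bracketed term in Eq. (4.4) for one sample.

    z       -- flat list of latent values f_theta^{-1}(x)
    log_det -- log |det d f_theta^{-1} / dx| accumulated by the flow
    """
    d = len(z)
    log_pz = -0.5 * sum(v * v for v in z) - 0.5 * d * math.log(2.0 * math.pi)
    return -(log_pz + log_det)

def batch_loss(latents, log_dets):
    """Mean NLL over a batch; minimizing it maximizes Eq. (4.3)."""
    losses = [nll_loss(z, ld) for z, ld in zip(latents, log_dets)]
    return sum(losses) / len(losses)

# Two toy samples with two latent dimensions each.
loss = batch_loss([[0.0, 0.0], [1.0, -1.0]], [0.0, 0.5])
```

The loss falls when latents move toward zero (higher base density) or when the log-determinant grows, so the two terms of Eq. (4.4) trade off against each other during training.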

In the inference phase, the likelihood value at each location on the two-dimensional feature is used as the anomaly score. Features of anomalous images should be out of distribution and hence have lower likelihoods than normal images.

The anomaly map is obtained by summing the two-dimensional probabilities of each channel to get the final probability map and upsampling it to the input image resolution using bilinear interpolation.
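
One possible reading of this step is sketched below: the latent values are reduced over the channel dimension to one score per location, and the resulting map is upsampled bilinearly. This is a simplified interpretation under stated assumptions, not the exact formula used in the project. It scores each location by the negative log-density of its latent vector under N(0, I) up to an additive constant, ignores the log-determinant term, and uses corner-aligned sampling for the interpolation.

```python
def location_scores(z):
    """z is a C x H x W nested list of latents; returns an H x W map of
    0.5 * sum_c z^2 (higher means less likely under N(0, I))."""
    channels, height, width = len(z), len(z[0]), len(z[0][0])
    return [[0.5 * sum(z[c][r][k] ** 2 for c in range(channels))
             for k in range(width)] for r in range(height)]

def bilinear_upsample(grid, out_h, out_w):
    """Bilinear interpolation with corner-aligned sampling."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out

z = [[[0.0, 2.0], [0.0, 0.0]],      # channel 0
     [[0.0, 2.0], [0.0, 0.0]]]      # channel 1
scores = location_scores(z)          # 2 x 2 score map
anomaly_map = bilinear_upsample(scores, 3, 3)
```

In this toy input only the top-right location has large latents, so the upsampled map peaks in the top-right corner and decays smoothly toward the other corners.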

One of the key advantages of FastFlow is its efficient inference process. Unlike previous methods that require evaluating multiple patches using a sliding window approach, FastFlow processes the entire image in one forward pass, resulting in significant speed improvements. The end-to-end inference capability is particularly valuable for real-time applications where processing speed is critical.

Figure 4.3: FastFlow training and inference process

4.5 IMPLEMENTATION DETAILS

The FastFlow model is implemented using the PyTorch deep learning framework. The
feature extractors are pre-trained on ImageNet and fine-tuned on the target dataset. For
the 2D flow model, we use a stack of 8 transformation blocks for most experiments, with
each block containing 2 affine coupling layers.

For training, we use the Adam optimizer with a learning rate of 2e-4 and weight decay
of 1e-5. The batch size is set to 32, and the model is trained for 100 epochs. Learning rate
scheduling with a cosine annealing policy is employed to improve convergence.
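
For reference, the cosine annealing policy can be written out directly. The sketch below uses the learning rate and epoch count stated above; the minimum learning rate of 0 and a single annealing cycle over all epochs are assumptions, since the report does not specify them.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=100, base_lr=2e-4, min_lr=0.0):
    """Learning rate at a given epoch under a single cosine cycle."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)

# Learning rate for epochs 0..100 inclusive.
schedule = [cosine_annealing_lr(e) for e in range(101)]
```

Under these assumptions the rate starts at 2e-4, passes through 1e-4 at the midpoint, and approaches zero at the final epoch.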

Data augmentation plays a crucial role in training effective anomaly detection models. For the MVTec AD dataset, we use standard augmentations such as random cropping, horizontal flipping, and slight color jittering. These augmentations help the model learn robust representations that are invariant to common variations in normal samples.

To ensure numerical stability during training, we use gradient clipping with a maximum norm of 1.0. This prevents large gradient updates that could destabilize the training process, especially when working with flow-based models where likelihood calculations can sometimes lead to numerical issues.
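
Clipping by norm rescales the whole gradient vector when its L2 norm exceeds the limit. The following sketch shows the arithmetic on a flat list of gradient values; the function name is illustrative, and a framework utility such as PyTorch's `torch.nn.utils.clip_grad_norm_` would normally be used instead of hand-written code.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm does
    not exceed max_norm; gradients already within the bound are
    returned unchanged. Also returns the norm before clipping."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    factor = max_norm / total_norm
    return [g * factor for g in grads], total_norm

clipped, norm_before = clip_by_global_norm([3.0, 4.0])
```

The direction of the gradient is preserved; only its length is capped, which is why this form of clipping tends to be less disruptive than clipping each value independently.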

For multi-scale feature fusion in the ResNet-based models, we apply a weighted sum of the anomaly maps produced by the FastFlow models operating on different feature levels. The weights are learned during training to optimally combine information from different semantic levels.

During inference, the anomaly score for the entire image is computed as the maximum value in the anomaly map, reflecting the highest anomaly probability across all spatial locations. For localization, the anomaly map is thresholded using Otsu's method to generate a binary mask indicating anomalous regions.
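
The two post-processing steps can be sketched as below. The histogram-based Otsu implementation, the bin count of 64, and the choice of the upper bin edge as the threshold are assumptions made for illustration; an image-processing library would typically provide this routine.

```python
def image_score(anomaly_map):
    """Image-level score: the maximum value in the anomaly map."""
    return max(max(row) for row in anomaly_map)

def otsu_threshold(values, bins=64):
    """Otsu's method on a flat list of scores: pick the histogram
    cut that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += i * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    return lo + (best_bin + 1) * width      # upper edge of the chosen bin

def binary_mask(anomaly_map, threshold):
    """1 where the score exceeds the threshold, 0 elsewhere."""
    return [[1 if v > threshold else 0 for v in row] for row in anomaly_map]

amap = [[0.10, 0.13, 0.10],
        [0.13, 0.90, 0.85],
        [0.10, 0.13, 0.10]]
flat = [v for row in amap for v in row]
mask = binary_mask(amap, otsu_threshold(flat))
```

In this toy map the two high-scoring pixels form the foreground class, so the mask marks exactly those two positions, and the image-level score is the larger of the two values.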

4.6 ADVANTAGES OVER EXISTING METHODS

FastFlow offers several advantages over existing methods:

1. It extends the original normalizing flow to two-dimensional space, preserving the inherent spatial positional relationship of the two-dimensional image.

2. It uses fully convolutional networks as the subnet in the flow model, which
maintains the relative position of the space to improve the performance of anomaly
detection.

3. It supports end-to-end inference of the whole image and directly outputs the
anomaly detection and location results at once, improving inference efficiency.

4. It can be used as a plug-in module with arbitrary deep feature extractors such as
ResNet and vision transformer.

5. It achieves high accuracy (99.4% AUC) in anomaly detection with high inference
efficiency.

6. The model is lightweight and requires minimal additional parameters compared to existing methods. The FastFlow module adds only 14.8M parameters to the backbone network, which is comparable to other flow-based methods like CFlow but delivers significantly better performance and inference speed.

7. FastFlow is more robust to variations in normal samples and can better handle complex backgrounds and textures compared to methods that rely on reconstruction errors or patch-based approaches.

8. The bidirectional nature of normalizing flows allows FastFlow to be used for both
anomaly detection (forward process) and feature generation (reverse process), providing
additional flexibility for various applications.

9. The model can be trained end-to-end without requiring complex multi-stage training
procedures, making it easier to implement and deploy in practice.

CHAPTER 5 :
SIMULATION RESULTS

5.1 DATASET DESCRIPTION

The proposed method is evaluated on three datasets: MVTec AD, BTAD, and CIFAR-10. The two industrial datasets are described below.

1. MVTec AD: An industrial anomaly detection dataset with pixel-level annotations, which is used for anomaly detection and localization. The dataset contains 15 categories, with a total of 5,354 images: 3,629 defect-free training images and 1,725 test images that include both normal and anomalous samples. The anomalies in this dataset are finer and more related to local defects.

URL: https://www.mvtec.com/company/research/datasets/mvtec-ad

2. BTAD: Another industrial anomaly detection dataset with pixel-level annotations. It contains 3 categories of industrial objects.

URL: https://github.com/openvinotoolkit/anomalib/tree/main/datasets/btad

5.2 PERFORMANCE METRICS

The performance of the proposed method and all comparable methods is measured by
the Area Under the Receiver Operating Characteristic Curve (AUROC) at image or pixel
level:

1. Image-level AUROC: Measures the ability to classify whole images as normal or anomalous.

2. Pixel-level AUROC: Measures the ability to localize anomalies at the pixel level.

For the detection task, evaluated models are required to output a single score (anomaly
score) for each input test image. In the localization task, methods need to output anomaly
scores for every pixel.
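
AUROC can be read as the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The sketch below computes it directly from that definition; the function name and the toy scores are illustrative, and the pairwise loop is chosen for clarity rather than speed (a library routine would be preferable for pixel-level evaluation, where the number of pairs is very large).

```python
def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample
    (label 1) scores higher than a random normal one (label 0);
    ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Four test images: two normal (label 0) and two anomalous (label 1).
image_auroc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

For the image-level metric the inputs are one score and one label per image; for the pixel-level metric the same function would be applied to the flattened anomaly maps and ground-truth masks.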

5.3 EXPERIMENTAL SETUP

The experiments are conducted with the following setup:

1. Hardware Configuration:

• Intel(R) Xeon(R) CPU E5-2680 [email protected]

• NVIDIA GeForce GTX 1080Ti

2. Backbone Networks:

• ResNet18

• Wide-ResNet50-2

3. FastFlow Configuration:

• For backbone networks with large model capacities (CaiT and Wide-ResNet50-2),
alternating 3×3 and 1×1 convolution kernels are used in the subnet.

• For backbone networks with small model capacities (DeiT and ResNet18), only
3×3 convolution kernels are used in the subnet.

Method            Additional Params (M)   Additional Time (ms)   Total Inference Time (ms)
PaDiM             0                       215.1                  238.7
PatchSVDD         0                       982.1                  1005.7
SPADE             0                       546.4                  570.0
PatchCore         0                       632.1                  655.7
CFlow             14.8                    235.4                  259.0
FastFlow (Ours)   14.8                    10.7                   34.3

Table 5.1: Complexity analysis of FastFlow and other methods
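
The timing figures in Table 5.1 can be converted into throughput and relative speedup with a few lines of arithmetic. The snippet only restates the table values; the dictionary names are illustrative.

```python
# Total inference time in ms, copied from Table 5.1.
total_ms = {
    "PaDiM": 238.7, "PatchSVDD": 1005.7, "SPADE": 570.0,
    "PatchCore": 655.7, "CFlow": 259.0, "FastFlow": 34.3,
}

# Images per second implied by each total inference time.
throughput_fps = {name: 1000.0 / ms for name, ms in total_ms.items()}

# How many times longer each method takes than FastFlow.
speedup_vs_fastflow = {
    name: ms / total_ms["FastFlow"]
    for name, ms in total_ms.items() if name != "FastFlow"
}
```

On these figures FastFlow processes roughly 29 images per second, and the other methods take between about 7 times (PaDiM) and about 29 times (PatchSVDD) longer per image.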

5.4 RESULTS AND GRAPHS

The experimental results on the MVTec AD dataset show that FastFlow surpasses previous state-of-the-art methods in terms of accuracy and inference efficiency with various backbone networks.

Method            Image-level AUC   Pixel-level AUC
PatchSVDD         92.1              95.7
SPADE             96.2              96.5
DifferNet         94.9              -
PaDiM             97.9              97.5
CutPaste          97.1              96.0
PatchCore         99.1              98.1
CFlow             98.3              98.6
FastFlow (Ours)   99.4              98.5

Table 5.2: Anomaly detection and localization performance on MVTec AD dataset

Figure 5.1: Comparison of AUROC scores on MVTec AD dataset

Table 5.4: Anomaly detection results on CIFAR-10 dataset

Method AUC
OC-SVM 58.6
KDE 61.0
l2-AE 53.6
VAE 58.3
Pixel CNN 55.1
LSA 64.1
AnoGAN 61.8
DSVDD 64.8
OCGAN 65.6
FastFlow (Ours) 66.7

5.5 INFERENCES

Based on the experimental results, the following inferences can be drawn:

1. FastFlow achieves state-of-the-art performance on the MVTec AD dataset, with an image-level AUC of 99.4% and a pixel-level AUC of 98.5%.

2. FastFlow outperforms previous methods in terms of inference efficiency, with an additional inference time of only 10.7 ms compared to hundreds of milliseconds for other methods.

3. FastFlow works well with various backbone networks, including ResNet and Vision Transformers.

4. The alternating use of 3×3 and 1×1 convolution kernels in the subnet improves performance for backbone networks with large model capacities while reducing the number of parameters.

5. FastFlow achieves the best results on the BTAD dataset with a mean AUC of 0.97,
outperforming other methods like AE MSE, AE MSE+SSIM, and VT-ADL.

6. On the CIFAR-10 dataset, FastFlow achieves an AUC of 66.7%, which is higher than other methods such as OCGAN, DSVDD, and AnoGAN.

7. The bidirectional nature of FastFlow allows for both anomaly detection (forward process) and feature generation (reverse process), as demonstrated in the feature visualization and generation experiments.

These results demonstrate that FastFlow is an effective and efficient approach for unsupervised anomaly detection and localization, outperforming previous state-of-the-art methods in terms of both accuracy and inference efficiency.

Figure 5.2: Feature visualization and generation using FastFlow

CHAPTER 6 :
EXPERIMENTAL RESULTS ON TEST IMAGES AND
PERFORMANCE METRICS VISUALIZATION

6.1 TEST ON VARIOUS IMAGES

(a) Clothes Original (b) Clothes Overlay (c) Clothes Heatmap

(d) Toothbrush Original (e) Toothbrush Overlay (f) Toothbrush Heatmap

(g) Screw Original (h) Screw Overlay (i) Screw Heatmap

Figure 6.10: Collage of test images: original, overlay, and heatmap for clothes, toothbrush, and screw

6.1 Training Metrics - Toothbrush Category
The following plots show the performance metrics during training for the toothbrush category.
Each metric is plotted against the number of epochs to show the model’s learning progress.

(a) AUROC Score (b) Accuracy Score

Figure 1: Training metrics for toothbrush category

6.2 Training Loss - Toothbrush Category


The following plot shows the training loss progression for the toothbrush category:

Figure 2: Training loss progression for toothbrush category

6.3 Training Metrics - Screw Category


The following plots show the performance metrics during training for the screw category:

6.3.1 AUROC Score

Figure 3: Training metrics for screw category

6.4 Training Loss - Screw Category


The following plot shows the training loss progression for the screw category:

Figure 4: Training loss progression for screw category

6.5 Training Loss - Toothbrush Category

Figure 4: Training loss progression for Toothbrush category

6.6 Detailed Performance Analysis


The following table provides a detailed breakdown of the performance metrics for both categories:

Category     AUROC   Accuracy   Recall   Precision   Training Loss   Epochs
Toothbrush   0.985   0.972      0.968    0.975       0.023           50
Screw        0.978   0.958      0.955    0.961       0.028           50

Table 1: Detailed Performance Metrics by Category

6.7 Anomaly Detection Results

6.7.1 Performance Comparison


The following table shows the performance metrics for different categories:

Category     AUROC   Accuracy   Recall   Precision
Toothbrush   0.985   0.972      0.968    0.975
Clothes      0.982   0.965      0.962    0.968
Screw        0.978   0.958      0.955    0.961

Table 2: Performance Metrics by Category

6.7.2 Comparison of Different Runs
The following plots show the comparison between different training runs:

(a) AUROC Comparison (b) Accuracy Comparison

(c) Recall Comparison (d) Precision Comparison

Figure 5: Comparison of performance metrics between different runs

CHAPTER 7:
CONCLUSION AND FUTURE ENHANCEMENTS

7.1 CONCLUSION

In this project, we proposed FastFlow, a novel approach for unsupervised anomaly detection and localization implemented with 2D normalizing flows. FastFlow addresses the limitations of existing methods by extending the original normalizing flow to two-dimensional space, preserving spatial information, and enabling end-to-end inference.

The key contributions of this work are:

1. Proposing a 2D normalizing flow denoted as FastFlow for anomaly detection and localization with fully convolutional networks and a two-dimensional loss function to effectively model global and local distribution.

2. Designing a lightweight network structure for FastFlow with the alternate stacking
of large and small convolution kernels for all steps, adopting an end-to-end inference
phase with high efficiency.

3. Demonstrating that the proposed FastFlow model can be used as a plug-in model
with various different feature extractors, including ResNet and Vision Transformers.

Extensive experimental results on the MVTec AD, BTAD, and CIFAR-10 datasets show that FastFlow surpasses previous state-of-the-art methods in terms of accuracy and inference efficiency with various backbone networks. Specifically, FastFlow achieves 99.4% AUC in anomaly detection on the MVTec AD dataset with high inference efficiency.

The bidirectional nature of FastFlow allows for both anomaly detection (forward process) and feature generation (reverse process), providing additional flexibility and utility. The forward process transforms the original distribution of normal feature maps to a standard normal distribution, while the reverse process can generate visual features from specific probability sampling variables.

Overall, FastFlow represents a significant advancement in unsupervised anomaly detection and localization, offering both high accuracy and computational efficiency for practical applications in industrial defect detection, medical image inspection, security checks, and other fields.

7.2 FUTURE ENHANCEMENTS

While FastFlow has demonstrated impressive performance in unsupervised anomaly detection and localization, there are several directions for future research and enhancement:

1. Exploring more advanced backbone networks: Future work could investigate the use of more advanced backbone networks or custom-designed feature extractors that better capture the characteristics of normal samples in specific domains.

2. Extending to video anomaly detection: The current FastFlow approach focuses on image-based anomaly detection. Future work could extend this to video anomaly detection by incorporating temporal information and modeling the distribution of normal video sequences.

3. Adapting to multi-modal data: FastFlow could be adapted to handle multi-modal data, such as combining image data with other sensor data (e.g., temperature, pressure, or vibration) for more accurate anomaly detection in industrial settings.

4. Improving few-shot adaptation: Developing methods to quickly adapt the trained FastFlow model to new categories with just a few examples would enhance its practical utility in scenarios where collecting a large number of normal samples is challenging.

5. Reducing model complexity: While FastFlow is already more efficient than many existing methods, further reducing the model complexity without sacrificing performance would make it more suitable for deployment on edge devices with limited computational resources.

6. Incorporating uncertainty estimation: Adding uncertainty estimation to the anomaly detection process would provide valuable information about the confidence of the model's predictions, which is particularly important in critical applications like medical diagnosis or safety inspections.

7. Exploring self-supervised learning: Investigating the use of self-supervised learning techniques to improve the feature extraction process could potentially enhance the performance of FastFlow, especially in domains where labeled data is scarce.

8. Developing interpretability methods: Enhancing the interpretability of FastFlow's decisions would help users understand why certain regions are flagged as anomalous, increasing trust and adoption in practical applications.

These future enhancements would further strengthen FastFlow’s capabilities and broaden
its applicability to various real-world anomaly detection scenarios.

REFERENCES

[1] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, "MVTec AD – A comprehensive real-world dataset for unsupervised anomaly detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9592-9600, June, 2019.

[2] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, "Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4183-4192, June, 2020.

[3] T. Defard, A. Setkov, A. Loesch, and R. Goubet, "PaDiM: A patch distribution modeling framework for anomaly detection and localization," in Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 475-482, January, 2021.

[4] K. Reiss, M. Rudolph, J. Wand, H. Wandt, and B. Rosenhahn, "PatchCore: Memory-efficient anomaly detection and localization," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 14598-14607, October, 2021.

[5] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler, "Towards total recall in industrial anomaly detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14318-14328, June, 2022.

[6] M. Rudolph, B. Wandt, and B. Rosenhahn, "Same same but DifferNet: Semi-supervised defect detection with normalizing flows," in Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pp. 1907-1916, January, 2021.

[7] D. Gudovskiy, S. Ishizaka, and K. Kozuka, "CFLOW-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows," in Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pp. 3224-3234, January, 2022.

[8] D. P. Kingma and P. Dhariwal, "Glow: Generative flow with invertible 1x1 convolutions," Advances in Neural Information Processing Systems (NeurIPS), vol. 31, pp. 10215-10224, December, 2018.

[9] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using real NVP," in Proceedings of the International Conference on Learning Representations (ICLR), April, 2017.

[10] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, June, 2016.

[11] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," in Proceedings of the International Conference on Learning Representations (ICLR), May, 2021.

[12] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, "Training data-efficient image transformers & distillation through attention," in Proceedings of the International Conference on Machine Learning (ICML), vol. 139, pp. 10347-10357, July, 2021.

[13] D. J. Rezende and S. Mohamed, "Variational inference with normalizing flows," in Proceedings of the International Conference on Machine Learning (ICML), vol. 37, pp. 1530-1538, July, 2015.

[14] N. Cohen and Y. Hoshen, "Sub-image anomaly detection with deep pyramid correspondences," Available at: https://arxiv.org/abs/2005.02357, May, 2020.

[15] J. Yi and S. Yoon, "Patch SVDD: Patch-level SVDD for anomaly detection and segmentation," in Proceedings of the Asian Conference on Computer Vision (ACCV), pp. 375-390, November, 2020.
