Results in Engineering 25 (2025) 104277

Contents lists available at ScienceDirect

Results in Engineering
journal homepage: www.sciencedirect.com/journal/results-in-engineering

Research paper

Anomaly detection solutions: The dynamic loss approach in VAE for manufacturing and IoT environment

Praveen Vijai, Bagavathi Sivakumar P ∗
Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India

A R T I C L E  I N F O

Keywords:
Anomaly detection
Deep learning
Variational autoencoder
Bidirectional long short term memory model
Dynamic loss function

A B S T R A C T

Anomaly detection is critical for enhancing operational efficiency, safety, and maintenance in industrial applications, particularly in the era of Industry 4.0 and IoT. While traditional anomaly detection approaches face limitations such as scalability issues, high false alarm rates, and reliance on skilled expertise, this study proposes a novel approach using a BiLSTM-Variational Autoencoder (BiLSTM-VAE) model with a dynamic loss function. The proposed model addresses key challenges, including data imbalance, interpretability issues, and computational complexity. By leveraging the bidirectional capability of BiLSTM in the encoder and decoder, the model captures comprehensive temporal dependencies, enabling more effective anomaly detection. The innovative dynamic loss function integrates a tempering index mechanism with tuneable parameters (𝛼 and 𝛾), which assigns higher weights to underrepresented classes and down-weights easily classified samples. This improves reconstruction and enhances detection accuracy, particularly for minority class anomalies. Experimental evaluations on the SKAB and TEP datasets demonstrate the superiority of the proposed framework. The model achieved an accuracy of 98% and an F1 score of 96% for binary classification on the SKAB dataset and a multiclass classification accuracy of 92% with an F1 score of 85% on the TEP dataset. These results significantly outperform state-of-the-art models, including traditional VAE, LSTM, and transformer-based approaches. The proposed BiLSTM-VAE model not only demonstrates robust anomaly detection capabilities across diverse datasets but also effectively handles data imbalance and reduces false positives, making it a scalable and reliable solution for industrial anomaly detection in the context of Industry 4.0 and IoT environments.

1. Introduction

The manufacturing industry serves as a cornerstone of economic development, encompassing a diverse range of sectors that convert raw materials into finished products through various processes [1,2]. The manufacturing sector has evolved significantly from its artisanal roots to a sophisticated, technology-driven landscape. The modern manufacturing environment is characterized by diverse production methods, advanced technologies and a focus on efficiency and sustainability [3,4]. Historically, manufacturing began with manual craftsmanship before the industrial revolution introduced mechanization and mass production techniques [5,6]. This shift allowed greater quantities of goods to be produced at lower costs, fundamentally altering economic structures and consumer markets. Typically, industrial applications depend heavily on a diverse array of components, including mechanical parts such as ball bearings, electrical components like capacitors, and critical elements such as pillow blocks and roller chains [7]. These components are essential for ensuring the reliability and efficiency of machinery, ultimately enhancing operational performance and minimizing downtime in industrial settings [8]. However, despite their importance, these components can be susceptible to various anomalies and faults under certain conditions [9]. Mechanical failures can arise from issues such as inadequate lubrication, misalignment, electrical malfunctions, operational challenges, or exposure to physical stressors.

Therefore, early detection of these anomalies is crucial for maintaining operational integrity and preventing costly disruptions [10]. Implementation of anomaly detection in industrial environments is crucial for maintaining operational efficiency, safety and security. Some of the common methods used for detecting anomalies in industrial machinery are visual inspections, checklists and SOPs (Standard Operating Procedures), data logging and analysis, operator feedback and reporting, RCA (Root Cause Analysis) and many more. When anomalies are detected, conducting a root cause analysis helps determine the underlying issues causing

* Corresponding author.
E-mail addresses: cb.en.d.cse14005@cb.students.amrita.edu (P. Vijai), pbsk@cb.amrita.edu (B.S. P).

https://doi.org/10.1016/j.rineng.2025.104277
Received 9 November 2024; Received in revised form 25 December 2024; Accepted 4 February 2025

Available online 6 February 2025


2590-1230/© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
the anomalies. This approach involves a team-based investigation that looks at historical data, operator insights and machine performance metrics. Though manual methods can deliver reasonable performance for detecting anomalies in machines, there are drawbacks to employing traditional techniques, such as time-consuming processes [11], delayed detection, limited data handling capacity, difficulty in identifying complex anomalies, inconsistent application of procedures, subjectivity and reliance on skilled experts. Therefore, to tackle the issues encountered by conventional approaches, AI-driven procedures are opted for, as AI-driven anomaly detection systems are designed to handle vast amounts of data efficiently, making them a scalable option [12]. Besides, AI systems excel in pattern recognition, analyzing vast datasets to detect subtle changes that can indicate potential problems. Moreover, AI-powered anomaly detection facilitates predictive maintenance by identifying early signs of equipment failure. This proactive approach enhances the longevity of machinery and optimizes maintenance schedules. Therefore, various state-of-the-art approaches have adopted AI algorithms for detecting anomalies in industrial machinery.

Accordingly, an ML and DL based approach [13] has been used to diagnose the defects and abnormalities in components of a hydraulic system. The study used correlation coefficients to extract the relevant features of 2335, and each hydraulic component was analyzed by the Boruta algorithm. To evaluate the TPR and TNR, the study employed XGBoost, MLP (Multi-Layer Perceptron), LDA (Linear Discriminant Analysis), SVC (Support Vector Classifier), LightGBM, random forest and decision tree, and achieved better performance. Moreover, multi-spectral imaging [14] has been used to detect leakage in pipes via personalized hardware and image processing. Likewise, anomaly detection in a rotary machine [15] has been addressed with an unsupervised model based on the optimization and fusion of ML algorithms such as K-means clustering and SIFT for detecting the anomaly in the specific industrial application. Similarly, an SAE (Stacked Autoencoder) model and an RSAE (Reduced SAE) have been employed to detect anomalies in signals received from rotary machine motors. The model obtained an accuracy rate of 0.92 for the RSAE and 0.52 for the SAE model. Regardless of the performance of the model, accuracy is considered one of the major drawbacks.

Though the state-of-the-art approaches have delivered considerable advantages for anomaly detection, there are certain limitations, such as reliance on data quality [15], the need for extensive validation in diverse industrial settings, the high computational cost of neural-network-based models [16], proneness to overfitting due to limited labeled data [17], high rates of false alarms, low accuracy, and complications in processing diverse data sources efficiently [18]. Moreover, existing unsupervised models often face difficulties when dealing with the high dimensionality, intrinsic noise and intricate interdependencies that are typical of industrial data. Therefore, to rectify these shortcomings, the proposed model emphasizes using a BiLSTM-VAE model with a dynamic loss function for anomaly detection on the TEP and SKAB datasets in industrial settings. Implementation of this model reduces overfitting by focusing on unsupervised techniques and varying hyperparameters. Further, the performance of the model is evaluated using metrics such as accuracy, recall, F1 score and precision.

Anomaly detection in industrial applications is increasingly vital due to the growing complexity and volume of data generated by machinery, particularly within the context of Industry 4.0 and IoT. Therefore, the primary motivation of the present research is to enhance operational efficiency, prevent equipment failures and minimize downtime through timely detection of irregularities in machine behavior. Traditional methods of anomaly detection are often inadequate due to reliance on manual processes, which are prone to human error and inefficiency. Therefore, the proposed model aims to automate the detection process using BiLSTM with VAE and the proposed dynamic loss in industrial machinery. The BiLSTM's ability to handle sequential dependencies and the VAE's proficiency in reconstruction ensure that the model not only learns from preceding behaviors but also understands the underlying distribution of normal operations, making it more effective at identifying deviations indicative of potential failures. Additionally, inclusion of the dynamic loss function in the proposed model overcomes the pitfalls encountered by the existing CE (Cross Entropy) function in traditional VAEs, such as difficulty in capturing complex anomalies, limited interpretability, and sensitivity to outliers, as traditional reconstruction is highly sensitive to noise. While VAEs provide a latent space representation, interpreting this space to comprehend the root cause of anomalies can be challenging. Furthermore, computational cost is one of the issues with the loss function of existing works, as training VAEs can be computationally expensive, especially for the high-dimensional sensor data commonly encountered in industrial settings. Therefore, in order to tackle these challenges, a robust loss function must be incorporated. The proposed model tackles these issues by employing a dynamic loss function in the BiLSTM-VAE model, which uses a tempering index technique to overcome these drawbacks and yields better outcomes.

The objectives of the present research work are outlined as follows:

• To perform both binary and multiclass classification for anomaly detection in industrial applications using the proposed BiLSTM-VAE with dynamic loss function, where BiLSTM models are used in the encoder and decoder parts of the VAE.
• To incorporate a tempering index in the proposed dynamic loss function to address the class imbalance issue by assigning higher weights to underrepresented classes; by doing so, the dynamic loss function pays more attention to the minority classes, improving detection accuracy.
• To assess the performance of the proposed system using standard evaluation metrics such as accuracy, recall, F1 score and precision.

The paper is organized in the following structure. Section 2 explores the work of existing researchers on anomaly detection in industrial applications using statistical, ML and DL approaches. Section 3 demonstrates the proposed methodology for anomaly detection in industrial applications using the proposed BiLSTM-VAE model. Section 4 showcases the results obtained by implementing the proposed model. Eventually, Section 5 summarizes the entire work in a crisp manner, and future recommendations are provided in the conclusion.

2. Literature review

The existing body of works on anomaly detection is thoroughly presented and analyzed in the following section.

2.1. Statistical and machine learning models

A study has used different techniques for anomaly detection, namely CBLOF (Cluster Based Local Outlier Factor), HBOS (Histogram Based Outlier Score) and OCSVM (One Class Support Vector Machine) [19]. The outcome of the model showcased that anomalies were detected precisely, ensuring the sustainability of the industrial system. Despite its advantages, the model's reliance on unsupervised learning methods, which depend on normal operating data, can pose challenges in detecting all possible fault types. This is particularly true for faults that significantly deviate from previously observed patterns, as the model can struggle to identify anomalies that fall outside the learned normal behavior. Likewise, two algorithms, IForest (Isolation Forest) and AE [20], have been employed for detecting anomalies in a mechanical component. Besides, the IForest and AE models were compared with LOF and a supervised RF algorithm using two days of industrial data collected in November 2020. Though the implementation of the statistical model delivered reasonable performance for anomaly detection, future work

of the study focused on utilizing deep AE models like LSTM, as they can effectively capture sequential data patterns.

Anomaly and fault detection in industrial systems poses a significant challenge because of the inherent complexity of these systems, which makes comprehensive monitoring difficult. Therefore, three different ML models, LOF, one-class SVM and AE [21,22], were used via a weighted average for performing anomaly detection. Findings of the study reported F1 scores of 0.904 for LOF, 0.89 for one-class SVM and 0.88 for AE. Despite the performance of the model, the computational cost of employing these algorithms was high; therefore, DL-based models will be preferred in the future to be able to classify and categorize different types of faults/anomalies. Clustering approaches such as HMM and autoencoders [23] were used for identifying large deviations within environmental data. The outcome of the model meets the need for sustainable manufacturing by enabling the analysis of data collected from different machines. Moreover, SVM has been used for detecting anomalies in the operation of a rotating bearing within a commercial semiconductor manufacturing machine. Reinforcement learning-based approaches, such as the adaptive miner-misuse method, can enhance online anomaly detection in power systems by adaptively learning from real-time data and optimizing detection strategies to improve accuracy and reduce false alarms in smart city energy management systems [24].

Industrial data presents considerable difficulties for conventional statistical and clustering techniques [25,26], as these challenges stem from factors such as high dimensionality, intrinsic noise and the diverse nature of the data, all of which can negatively affect the performance of these methods. Additionally, reliance on specific distributional assumptions, the need for careful algorithm tuning, and high computational demands further constrain the efficacy of traditional anomaly detection strategies in intricate industrial environments. Besides, the high-dimensional characteristics of industrial data complicate the precise modeling and detection of anomalies using traditional approaches. Furthermore, this type of data is frequently affected by noise resulting from sensor inaccuracies and environmental factors, which can hinder the effectiveness of statistical and clustering approaches.

Moreover, the robustness of statistical and ML methods presents a significant challenge. Many statistical techniques depend on assumptions regarding the underlying data distributions, such as the assumption of normality, which is often not met by industrial data, thereby diminishing their effectiveness. Likewise, conventional clustering methods frequently necessitate extensive parameter tuning to achieve optimal results, which can be particularly difficult with complex industrial datasets. Furthermore, the computational demands of traditional clustering approaches can render them unsuitable for anomaly detection in high-throughput industrial settings [19].

2.2. Deep learning models

The G-LSTM-AE (Gated Long Short Term Memory Autoencoder) model [27] combines the strengths of LSTM networks and autoencoders by effectively learning the temporal dependencies in time series data while also reconstructing input signals to detect anomalies based on reconstruction errors. Two different datasets, an automatic guided vehicle dataset and the SKAB dataset, were employed, and findings showed that the G-LSTM-AE technique resulted in satisfactory anomaly detection in industrial scenarios. Likewise, a DL-based CNN and LSTM autoencoder model [28,29] was utilized for optimizing the detection rate of all anomalies. The analytical outcome of the model was reasonable for time series anomaly detection. Three real-world datasets were considered in a study for detecting abnormality in a manufacturing system using a DL-based 1DCNN technique [30]. In order to evaluate the performance of the model, traditional techniques such as LSTM, autoencoder, LSTM-autoencoder and ARIMA were compared with 1DCNN, and 1DCNN yielded a significantly better outcome for anomaly detection. Moreover, in various industrial settings, gears are often deemed critical components, as their unexpected failures can lead to significant operational disruptions. Hence, a six-layer autoencoder model has been used for fault analysis [17] and anomaly detection. The model demonstrated an accuracy rate of 91% in detecting anomalies; better accuracy could be attained by employing DL models with different industrial benchmark datasets.

Industrial sensors have become essential tools for monitoring environmental conditions within manufacturing systems. However, if these smart sensors exhibit abnormal behavior, it can lead to failures or potential risks during operation, ultimately jeopardizing the overall reliability of the manufacturing process. Hence, an LSTM-based model [31] has been used for reconstructing the time series in reverse sequence. Subsequently, the discrepancies between the reconstructed values and the actual values are utilized to calculate the probability of an anomaly using a maximum probability estimation approach. Likewise, the ATASML (Adversarial Task Augmented Sequential Meta-Learning) model [32] has been used for detecting faults in industrial components. The model incorporated two different datasets, SKAB and TEP. Findings reflected 94.7% accuracy for the SKAB dataset and 90.13% accuracy for the TEP dataset. Therefore, the strategic combination of adversarial learning with task sequencing in ATASML has focused on fault diagnosis in various operational contexts. It is known that anomalies in mechanical systems can result in breakdowns with serious safety, environmental and economic impacts. Hence, in order to proceed with anomaly detection in mechanical equipment, two DL-based approaches [33] have been used, SAE (Stacked AE) and an LSTM network. The combination of these techniques identified anomalous conditions in an unsupervised manner, and the work stated that the model resulted in better performance for anomaly detection. A DL-based approach [34] has been employed for the detection of anomalies in industrial machines, particularly rotating machinery. To accomplish this, a CNN model was used as a feature extractor for the reconstruction of input information, and a prototype algorithm was used for improving the training process of an arbitrarily initialized feature extractor. Moreover, a BAGAN (Balancing Generative Adversarial Network) [35] has been used for tackling the challenge of imbalanced fault diagnosis by harnessing data generation and a sample selection process. Initially, the BAGAN technique was used for creating more distinct fault samples, as this approach utilized both fault and normal samples to enhance the quality of the generated data. Following this, to classify the faults effectively, an SAE-based DNN model was used on the TEP dataset. The results indicated that the BAGAN-based model, coupled with an active sample selection strategy, significantly enhanced performance in diagnosing imbalances within chemical fault data.

Automated early detection and prediction of process faults continues to be a difficult challenge in industrial operations. Therefore, DL-based methods have been opted for detecting faults in industrial machinery. Hence, a temporal CNN1D2D [36] approach has been executed for detecting faults on the TEP dataset by detecting various fault patterns, handling internal data fluctuations and correlations between sensors. Moreover, a GAN model was used for enriching and extending the training data. Findings of the study showed that the faults that were challenging to identify were 3, 9 and 15. Issues arising in the production line can lead to significant losses. Anticipating these faults before they happen or pinpointing their underlying causes can greatly mitigate such losses. Thus, a DL-based technique has been used in which the production process follows a spatial sequence that differs from conventional time series data. To address this, an LSTM within an encoder-decoder [37] has been used for accommodating the branched structure associated with the spatial sequence. Additionally, an attention mechanism has been employed for detecting faults and their causes in the TEP dataset. A significant limitation of this method is the complexity of the attention mechanism. This algorithm demands substantial computing resources and exhibited sub-optimal real-time performance. Another


Fig. 1. Overview of Proposed Model.

drawback of the model is the necessity for historical data to generate the output model effectively. Correspondingly, MLP, GRU and TCN (Temporal Convolutional Network) [38] have been used in a study for identifying different types of faults in an automated control system, enhancing decision making in industrial process management. Besides, a combination of BLSTM with AM [39] was developed to address the dynamic and temporal relationships in longer series observations, and the attention mechanism was adopted to highlight the features by assigning weights to the model. This was obtained using the TEP dataset, which reduced the bias between larger population parameters and sample statistics. Findings of the work illustrated an ideal tradeoff in fault diagnosis research. Likewise, a DL-based model [40] was opted for in the work for conducting fault detection and diagnosis of non-linear processes using the TEP dataset. The experimental outcome of the model showed considerable anomaly detection performance.

3. Research methodology
Proposed work on anomaly detection in industrial applications is gaining significant attention, particularly due to its potential to enhance operational efficiency and safety. Therefore, the following section delves into the methodologies employed, supported by Fig. 1, which illustrates the mechanism involved.

The process is initiated by loading the SKAB and TEP datasets. Once the datasets are loaded, pre-processing approaches such as min-max normalization and label encoding are performed. Following pre-processing, the data is separated into a train-test split (80% training and 20% testing). After the train-test split, classification takes place, where the proposed BiLSTM-VAE with dynamic loss is used. Here, the proposed dynamic loss function assigns higher weights to underrepresented classes; by doing so, data imbalance handling can be improved by generating synthetic samples for minority classes, and the class-wise adaptation of the dynamic loss ensures that the model pays specific attention to the most challenging minority classes, with enhanced reconstruction and generative capabilities. This is accomplished using a tempering index with parameters 𝛼 (alpha) and 𝛾 (gamma). The parameter 𝛼 adjusts the weights assigned to various classes, while 𝛾 reduces the loss contribution from samples that are easily classified. By incorporating a tempering index, the model is encouraged to focus more on minority-class samples, thereby addressing the challenge of data imbalance. Eventually, the model is gauged using evaluation metrics.

Fig. 2. System architecture used in SKAB.

3.1. Dataset description

Two different datasets are used in the proposed work for assessing the efficacy of the model. A detailed description of these datasets is given below.

3.1.1. SKAB dataset

The Skoltech anomaly benchmark is a dataset designed for evaluating anomaly detection algorithms, particularly focusing on outlier and changepoint detection. The dataset consists of multivariate time series data collected from sensors in a test bed; each dataset represents a single experiment with one anomaly. Table 1 showcases the columns in the SKAB dataset.

The system comprises several key components essential for its operation and is portrayed in Fig. 2. At the forefront are the solenoid valves (1, 2), which regulate the flow of water from a tank filled with water (3). This tank is connected to a water pump (4) that facilitates the movement of water throughout the system. Safety is paramount, and thus an emergency stop button (5) is integrated to allow for immediate cessation of operation in case of an emergency. The electric motor (6) powers the pump, while an inverter (7) manages the motor's speed and torque, optimizing performance based on operational demands. Central to the system's control and monitoring is a CompactRIO (8), which provides a robust platform for data acquisition and control.
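The 𝛼/𝛾 weighting described in this section can be sketched as a focal-style weighted cross-entropy. This is an illustrative reconstruction under stated assumptions, not the authors' exact formulation; the function name `dynamic_loss` and the mean reduction are assumptions:

```python
import numpy as np

def dynamic_loss(y_true, y_prob, alpha, gamma):
    """Focal-style dynamic loss for binary labels (illustrative sketch).

    alpha : weight assigned to the positive (minority) class; 1 - alpha to the other.
    gamma : tempering exponent that shrinks the loss of easily classified samples.
    """
    eps = 1e-12
    p_t = np.where(y_true == 1, y_prob, 1.0 - y_prob)   # probability of the true class
    a_t = np.where(y_true == 1, alpha, 1.0 - alpha)     # class-dependent weight
    # (1 - p_t)^gamma is near 0 for confident predictions, so easy samples
    # contribute little and the minority/hard samples dominate the training signal.
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)))
```

With 𝛾 = 0 and 𝛼 = 0.5 this reduces (up to the constant 0.5) to ordinary binary cross-entropy; raising 𝛾 progressively tempers the contribution of well-classified samples.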

4
P. Vijai and B.S. P
Results in Engineering 25 (2025) 104277
Table 1
Attributes of the SKAB dataset.

Columns            Description

datetime           Represents the date and time when the value is written to the database.
Accelerometer1RMS  Shows a vibration acceleration (amount of g units).
Accelerometer2RMS  Shows a vibration acceleration (amount of g units).
Current            Shows the amperage on the electric motor (Ampere).
Pressure           Represents the pressure in the loop after the water pump (Bar).
Temperature        Shows the temperature of the engine body (degrees Celsius).
Thermocouple       Represents the temperature of the fluid in the circulation loop (degrees Celsius).
Voltage            Shows the voltage on the electric motor (Volt).
RateRMS            Represents the circulation flow rate of the fluid inside the loop (liters per minute).
anomaly            Shows whether the point is anomalous (0 or 1).
changepoint        Shows whether the point is a changepoint for collective anomalies (0 or 1).
Fig. 3. Tennessee Eastman process flow for simulation.

Additionally, a mechanical level (9) is included to address any potential shaft misalignment issues, ensuring smooth operation and longevity of the equipment.

3.1.2. Tennessee Eastman process dataset

The TEP dataset comprises different datasets that simulate various operational conditions and faults. It encompasses 22 classes, where 21 classes represent different fault types and 1 class (Fault 0) represents the fault-free condition. Fig. 3 shows the process, which involves 5 main operating units: reactor, condenser, vapor-liquid separator, recycle compressor and product stripper.

3.2. Data pre-processing

Two different data pre-processing techniques are used in the study.

Min-Max normalization: Min-max normalization is a technique which scales the values of a dataset to a specific range, typically between 0 and 1. This method is specifically beneficial in scenarios where the data needs to be bounded within a defined interval, making it suitable for the anomaly detection model.

Label encoding: Label encoding is a critical pre-processing approach employed primarily for converting categorical variables into numerical format. This transformation is important for algorithms that need numerical input.

Additionally, the SMOTE pre-processing approach has been used for the TEP dataset to address the class imbalance in the dataset. It generates synthetic examples of the minority class rather than duplicating existing instances, which helps create a more balanced dataset and enhances the performance of the model.

3.3. Proposed BiLSTM-variational autoencoder with dynamic loss function

VAE is a powerful generative model opted for by the proposed work for anomaly detection of machinery faults. Though there are various methods for anomaly detection, the proposed work adopts VAE as VAEs encode data into a probabilistic latent space rather than a fixed point, which allows for a more nuanced understanding of the data dis-


Fig. 4. Variational Autoencoder model.

tribution. This flexibility helps capture the variability and complexity of normal operating conditions in machinery, making it easier to identify deviations that indicate anomalies. Unlike standard AEs and other approaches, which can memorize training data, VAEs encourage generalization by sampling from the learned latent distribution. This characteristic enables VAEs to reconstruct inputs more effectively, as they learn to model the underlying data distribution rather than just memorizing specific instances. Besides, the reconstruction process in VAEs involves comparing the original input with its reconstruction from the latent space. This allows for a nuanced assessment of anomalies, as deviations from normal patterns can be detected through reconstruction error. VAEs can learn to reconstruct typical patterns while highlighting anomalies, which is imperative in industrial settings where normal operating conditions can vary significantly. Furthermore, a VAE creates a smoother and more continuous latent space due to its regularization techniques such as KL divergence. This characteristic leads to better clustering of similar data points and more reliable similarity measures, which enhances the detection of anomalies. This smoothness of the latent space ensures that even minor deviations from the norm can be captured effectively. Owing to these factors, the VAE model is used. Fig. 4 shows the working of a traditional VAE.

The architecture of a VAE consists of two main components, the encoder and the decoder. This structure is crucial for its operation, as it allows the model to learn a probabilistic representation of the input data.

3.3.1. Encoder

The encoder part of the VAE is responsible for mapping the input data into a latent space. The encoder takes raw input data A and transforms it into a latent space representation C. Instead of producing a single deterministic output, the encoder outputs parameters for a probability distribution, typically a Gaussian distribution characterized

𝑝(𝐴 ∣ 𝐶) = 𝒩(𝐴; 𝜇′(𝐶), 𝜎′²(𝐶))  (2)

where 𝜇′(𝐶) showcases the mean and 𝜎′²(𝐶) represents the variance of the reconstructed output given the latent variable C. Therefore, the goal of the decoder is to maximize the likelihood of reconstructing the original data from the latent representation.

3.3.3. Loss function

The loss function of the VAE is a fundamental aspect that governs its training and performance, comprising two main components, the reconstruction loss and the KL divergence. Each of these components plays a distinct role in shaping the model's ability to learn a meaningful representation of the data and to generate new samples. The reconstruction loss measures how precisely the VAE can predict the input data from its latent representation. It is essential for ensuring that the model captures the essential features of the input. This loss is defined mathematically as

Reconstruction Loss = 𝔼_{𝑞(𝐶∣𝐴)}[log 𝑝(𝐴 ∣ 𝐶)]  (3)

Here, A denotes the original input, C represents the latent variable sampled from the encoder's output, and p(A|C) is the likelihood of reconstructing A given C. For continuous data, a Gaussian distribution can be used, leading to a reconstruction loss computed via MSE; for binary data, BCE is used, which is given as

BCE = −(1/𝑁) Σ_{𝑖=1}^{𝑁} [𝐴ᵢ log(𝐴̂ᵢ) + (1 − 𝐴ᵢ) log(1 − 𝐴̂ᵢ)]  (4)

This part of the loss function ensures that the decoder learns to generate outputs that closely resemble the original inputs, thus driving accurate reconstructions.

Likewise, the KL divergence ensures that the learned latent distribution approximates a prior distribution and promotes a well-structured latent
by a mean 𝜇 and variance 𝜎 2 . This transformation is mathematically space where different regions can be effectively utilized for generating
expressed in the equation (1) as new samples, therefore, the KL divergence term can be expressed in
terms of the parameters of the learned distribution and is depicted as,
𝑞(𝑐𝑜 ∣ 𝐴) =  (𝐶; 𝜇(𝐴), 𝜎 2 (𝐴)) (1)
𝑑 ( )
1∑
Here, 𝑞 (𝑐 ∣ 𝐴) is the approximate posterior distribution. The encoder’s 𝐷𝐾𝐿 (𝑞(𝐶|𝐴)‖𝑝(𝐶)) = − 1 + log(𝜎𝑗2 ) − 𝜇𝑗2 − 𝜎𝑗2 (5)
2 𝑗=1
goal is(to)ensure that this distribution closely matches a prior distribu­
tion 𝑝 𝑐𝑜 , often chosen as a standard normal distribution  (0, 𝐼) Where, d is denoted as dimensionality of the latent space and 𝜇𝑗 and 𝜎𝑗
denotes the mean and variance for each dimension of the latent variable.
3.3.2. Decoder Therefore, In VAE, the encoder is designed to transform the in­
The decoder performs the reverse operation, where the decoder takes put data into a lower-dimensional representation characterized by a
samples from the latent space and reconstructs them back into the data probability distribution. This probabilistic encoding is essential for the
space. It aims to generate data points that closely match the original latent space C to possess meaningful abstract properties that facili­
inputs. The decoder outputs another set of parameters for a distribution tate the reconstruction of the observed data. To ensure that this la­
over the reconstructed data, typically modeled as, tent space adheres to a well-defined structure, regularization technique


Fig. 5. Proposed architecture of BiLSTM with VAE.

are employed, allowing the VAE to effectively learn variational inference throughout the training process. The weight parameters 𝜙 of the encoder network are optimized to transform input samples into an encoded feature representation, referred to as C. In contrast, the weight parameters 𝜃 of the decoder network are trained to generate new samples by mapping from the encoded space C back to the original data space. Throughout the training process there is a possibility of information loss, which can affect the accuracy of the reconstruction. Therefore, the primary objective is to establish an ideal encoder-decoder pair that maximizes information retention during encoding while minimizing reconstruction error during decoding. This approach ensures that the model effectively captures the essential features of the input data and precisely reconstructs it. Moreover, traditional VAEs often struggle with computational resources, especially when dealing with huge datasets, which leads to increased training times and operational cost. While traditional VAEs are designed to reconstruct input data, they often produce outputs that lack high fidelity or are distorted. This can hinder the effectiveness of anomaly detection, since the model can misinterpret anomalies as normal variations due to poor reconstruction quality. Therefore, in order to overcome these pitfalls, the proposed model employs the BiLSTM technique within a variational autoencoder together with the proposed dynamic loss. Though there are various algorithms, the proposed work adopts the BiLSTM model, as it is designed to handle sequential data by processing information in both forward and backward directions. This bidirectionality allows it to capture long-range dependencies more effectively than traditional models, which typically only process data in one direction. BiLSTM models can scale to larger datasets and more complex tasks without a significant drop in performance. Unlike standard models, BiLSTMs mitigate issues such as the vanishing gradient, making them more capable of learning from long sequences of data. This is important for machinery data, which can exhibit complex temporal patterns over extended periods. As a result of these merits, the BiLSTM is picked over other models. The integration of the BiLSTM model with the VAE enhances the anomaly detection process in industrial applications by incorporating the BiLSTM mechanism in the encoder and decoder parts, as depicted in Fig. 5. The BiLSTM architecture consists of two LSTM layers that process the input sequence in both forward and backward directions.

• Forward LSTM layer: processes the input sequence from the beginning to the end.
• Backward LSTM layer: processes the input sequence from the end to the beginning.

This dual processing allows the model to capture information from both past and future states, which is crucial for comprehending sequential data. By analyzing sequences bidirectionally, the BiLSTM can learn complex dependencies and relationships in time-series data more effectively than unidirectional models. In the encoder, the BiLSTM takes time-series data from machinery as input and processes it in both forward and backward directions. This dual processing allows the model to capture temporal dependencies more effectively, as it considers both past and future information for each time step. This is useful in industrial settings where the state of machinery is influenced by previous and subsequent states. After processing the input data, the BiLSTM generates hidden states that are then used to form a latent representation. This latent space is important for capturing the underlying patterns of normal operational behavior. The encoder outputs are passed through a variational layer that approximates a posterior distribution over the latent variables: the output from the BiLSTM encoder is fed into a re-parameterization layer, where two vectors (the mean and the variance) are produced. The use of a BiLSTM in the encoder can handle varying input lengths and capture long-range dependencies more effectively, a capability that is crucial for upholding the integrity of temporal information when encoding input sequences into latent representations. The decoder begins by sampling from the latent space using the parameters generated by the encoder. This sampling is essential for generating new data points that mimic the input data distribution. A BiLSTM decoder then interprets these latent representations and generates output sequences. Similar to the encoder, the BiLSTM decoder processes information bidirectionally, allowing it to consider both past outputs and future predictions when generating each step of the output sequence. This feature enhances its ability to produce coherent and contextually relevant output. Hence, the mathematical equations for the proposed model are as follows.

The goal is to reconstruct data for a specific minority class. Therefore, the training of the proposed model involves the inclusion of additional sample data associated with the designated class label b. During training, the network develops an optimal latent distribution corresponding to the particular class label, and the loss function of the VAE is computed by employing equation (6),

$L_{\text{vae}}(\phi, \theta, A, b) = -\log(x_t) - D_{KL}\left[Q(C \mid A, b) \,\|\, P(C \mid b)\right]$   (6)

where $L_{\text{vae}}(\phi, \theta, A, b)$ is the variational lower bound of the VAE. The CE term used is defined in equation (7),

$CE(x_t) = -\log(x_t)$   (7)

However, the traditional CE (Cross Entropy) loss used in the VAE does not possess the ability to optimize the latent distribution. Moreover, when CE is employed as the reconstruction loss in the context of imbalanced datasets, the majority class tends to dominate the loss calculation, which in turn skews the gradient updates during the training process.
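To make the loss machinery concrete, the re-parameterization step, the KL term of eq. (5), the plain CE of eq. (7), and the tempered form the proposed dynamic loss builds on (eqs. (8) and (10)) can be sketched in NumPy. This is an illustrative sketch under assumed names and shapes, not the authors' implementation, and it is written as a quantity to minimize, so the KL term enters with a plus sign:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample C = mu + sigma * eps, eps ~ N(0, I): the re-parameterization step."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def kl_divergence(mu, log_var):
    """KL(Q(C|A) || N(0, I)) for a diagonal Gaussian posterior (eq. (5))."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def cross_entropy(x_t, eps=1e-7):
    """Plain CE of eq. (7): -log(x_t), with x_t the predicted probability of the true class."""
    return -np.log(np.clip(x_t, eps, 1.0))

def tempered_ce(x_t, alpha_t=1.0, gamma=2.0, eps=1e-7):
    """Tempered CE: alpha_t * (1 - x_t)^gamma * (-log(x_t)).
    The (1 - x_t)^gamma factor shrinks the loss of confident, correctly
    classified (mostly majority-class) samples, so gradient updates are
    no longer dominated by them."""
    x_t = np.clip(x_t, eps, 1.0 - eps)
    return -alpha_t * (1.0 - x_t) ** gamma * np.log(x_t)

def dynamic_vae_loss(x_t, mu, log_var, alpha_t=1.0, gamma=2.0):
    """Tempered reconstruction term plus KL term, in the shape of eq. (10)."""
    return tempered_ce(x_t, alpha_t, gamma) + kl_divergence(mu, log_var)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(64), np.zeros(64)   # latent dimension of 64, as in Table 2
c = reparameterize(mu, log_var, rng)       # one latent sample
# A well-classified sample (x_t = 0.9) keeps only (1 - 0.9)^2 = 1% of its
# plain CE loss, while a hard sample (x_t = 0.1) keeps 81% of it.
easy, hard = tempered_ce(0.9), tempered_ce(0.1)
```

With 𝜇 = 0 and log 𝜎² = 0 the posterior already equals the prior, so the KL term vanishes and only the tempered reconstruction term contributes.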

Table 2
Parameters employed in the proposed model.

Parameter Value
Latent Dimension 64
Encoder LSTM Units 128, 64
Decoder LSTM Units 64, 128
Classifier Dense Units 64
Optimizer Adam (lr=1e-4, clip_value=1.0)
Batch Size 32
Epochs 30

Most importantly, the cross-entropy function can be sensitive to outliers, and this factor can make the model become biased towards predicting the majority class, leading to high false-negative rates for anomalies. There is also difficulty in balancing the loss components: in a VAE, the objective function typically includes both the reconstruction loss and the KL divergence, and the use of cross-entropy can complicate this balance, since it can dominate the loss function if not properly scaled, potentially leading to suboptimal learning of latent representations. Therefore, in order to overcome this drawback, the proposed model utilizes a dynamic loss alongside the KL term by employing the tempering index $(1 - x_t)$ with a tunable parameter 𝛾, overcoming the pitfalls encountered with the CE loss. The $(1 - x_t)$ factor is applied to misclassified and true negative samples. The resulting expression is derived in equation (8),

$TI(x_t) = -\alpha_t (1 - x_t)^{\gamma} \log(x_t)$   (8)

Here, 𝛼 is used for handling the class imbalance issue, where

$\alpha_t = \begin{cases} \alpha, & \text{if } b = 1 \\ 1 - \alpha, & \text{otherwise} \end{cases}$   (9)

The weighting term $\alpha_t$ takes the value 𝛼 for the positive class and 1 − 𝛼 for the negative class; the use of 𝛼 therefore balances the significance of the majority as well as the minority examples. 𝛾 is tailored to the various classes depending on their imbalance characteristics, with the goal of reducing the relative errors for minority classes by emphasizing their significance. The hyperparameter 𝛾 influences the shape of the loss curve, allowing for targeted adjustments in the learning process. The primary purpose of the proposed dynamic loss is to minimize the error contribution from well-handled instances and to amplify the error for those examples that otherwise tolerate a low loss. Hence, the mathematical equation for the proposed dynamic loss is provided in equation (10),

$L_{\text{cflvae}}(\phi, \theta, A, b) = -\alpha_t (1 - x_t)^{\gamma} \log(x_t) - D_{KL}\left[Q(C \mid A, b) \,\|\, P(C \mid b)\right]$   (10)

The proposed dynamic loss function therefore treats different minority-class samples differently and learns the best distribution of the observed data. Table 2 showcases the parameters used in the model.

The table shows the parameters used in the proposed model. The latent dimension refers to the size of the encoded representation of the input data produced by the encoder; a latent dimension of 64 indicates that the model compresses the input sequence into a fixed-length vector of size 64. The encoder consists of two LSTM layers with 128 and 64 units: the first layer (128 units) captures complex temporal patterns in the input sequence, while the second layer (64 units) refines this information into a more compact representation before passing it to the decoder. Similar to the encoder, the decoder also has two LSTM layers. The first layer has 64 units and processes the encoded state from the encoder, generating a sequence of hidden states for output generation; the second layer has 128 units. After processing through the decoder, a dense layer with 64 units is used to classify the final output from the decoder's hidden states. The choice of 128 units balances the complexity and performance of the model. Likewise, the optimizer used in the proposed work is the Adam optimizer, due to its efficiency in handling sparse gradients and noisy data. The learning rate (lr = 1e-4) controls how much the weights are adjusted during training, and the clipping value (clip_value = 1.0) prevents exploding gradients by limiting the maximum value of the gradients during backpropagation. The batch size used in the model is 32, which strikes a balance between computational efficiency and convergence stability, and the number of epochs is 30, which allows sufficient iterations for learning without risking overfitting.

These parameters are used in the proposed model for building a superior anomaly detection model which performs both binary and multiclass classification efficaciously. The results obtained using the proposed model are demonstrated in the subsequent section.

4. Results and discussion

The results obtained using the proposed BiLSTM with VAE model are depicted in this section. Performance metrics, performance analysis and comparative analysis are carried out.

4.1. Performance metrics

4.1.1. Accuracy
Accuracy is the proportion of total correct classifications. It is calculated using equation (11),

$\text{Acc} = \frac{TN + TP}{TN + FN + TP + FP}$   (11)

where TN denotes True Negative, FN denotes False Negative, and TP and FP denote True Positive and False Positive, respectively.

4.1.2. Precision
Precision is determined from the count of correct positive classifications. It is estimated using equation (12),

$\text{Precision} = \frac{TP}{TP + FP}$   (12)

4.1.3. F-measure
The F1 score is the weighted harmonic mean of precision and recall. Equation (13) defines the formula employed for determining the F1 score,

$\text{F1-score} = 2 \times \frac{R \times P}{R + P}$   (13)

where P denotes precision and R denotes recall.

4.1.4. Recall
Recall is the metric that assesses the number of correct positive predictions made out of all actual positive instances. Equation (14) shows the mathematical model for recall,

$\text{Recall} = \frac{TP}{TP + FN}$   (14)

4.2. Performance analysis

Performance analysis is performed to assess the efficacy of the model for anomaly detection in industrial applications using the SKAB and TEP datasets.

4.2.1. SKAB dataset
The subsequent section explores the confusion matrix, model accuracy, model loss and ROC curve for the proposed model using the SKAB dataset. Fig. 6 shows the confusion matrix of the proposed model.
hidden states. The choice of 128 units can balance the complexity and s­fication model, where it typically consist of 4 different components

8
P. Vijai and B.S. P
Results in Engineering 25 (2025) 104277
Table 3
Performance Metrics of Different Models for SKAB dataset [41].

Model Accuracy Precision Recall F1 Score

Hotelling 0.46 0.46 0.99 0.63


Hotelling-Q 0.51 0.47 0.59 0.52
iForest 0.54 0.50 0.98 0.66
MSET 0.72 1.00 0.39 0.57
Autoencoder 0.46 0.46 0.99 0.63
LSTM 0.35 0.37 0.57 0.45
Conv-AE 0.88 0.90 0.85 0.87
MSCRED 0.79 0.79 0.75 0.77
Transformer 0.70 0.68 0.66 0.67
Saxformer+CNN 0.66 0.68 0.50 0.58
Saxformer+Boosting+CNN 0.90 0.95 0.83 0.88
Proposed BiLSTM-VAE model 0.98 0.95 0.96 0.96

Fig. 6. Confusion Matrix for SKAB dataset.

A confusion matrix is employed for evaluating the performance of a classification model and typically consists of four components: TP, TN, FP and FN. TP counts correct predictions of the positive class, TN counts correct predictions of the negative class, FP counts incorrect predictions of the positive class, and FN counts incorrect predictions of the negative class. Here, the rows represent the actual classes and the columns depict the predicted labels. Overall, Fig. 6 illustrates that misclassifications are far fewer than correct classifications, showing that the model has delivered a good result.

Likewise, model accuracy is showcased in Fig. 7a, where model accuracy measures how well the model's predictions match the actual outcomes. The y-axis represents the accuracy, the proportion of correct predictions, and the x-axis denotes the number of epochs. Initially, the training accuracy starts very low and improves steadily over the first 10 epochs; after about 15 epochs it plateaus near 1.0, showing that the model learns the training data essentially perfectly. The validation accuracy closely follows the training accuracy curve during the initial epochs.

Model loss is illustrated in Fig. 7b, where the model loss quantifies how well the proposed model's predictions align with the actual outcomes, focusing on the errors made during prediction. The y-axis indicates the loss, which measures the error or difference between the predicted and true values, and the x-axis represents the number of epochs. At the start, the training loss is extremely high; after the first epoch, the training loss drops sharply to near 0.

The ROC plot is demonstrated in Fig. 8, where the ROC curve plots the TPR against the FPR at different threshold levels. The TPR, also known as recall, measures the proportion of actual positives correctly identified by the model, while the FPR indicates the proportion of actual negatives incorrectly identified as positives. The ROC curve therefore helps visualize the trade-off between sensitivity and specificity, assisting in selecting an optimal classification threshold. Likewise, the metric values of the model are as follows: the accuracy gained by the model is 0.98, the precision is 0.95, the recall is 0.96, and the F1 score is 0.96.

4.3. TEP dataset

Like the binary classification results, the results of multiclass classification for the TEP dataset are demonstrated in the subsequent section. The confusion matrix for the proposed model is shown in Fig. 9.

The confusion matrix provides important insights into how the proposed model performs across the various classes. When most of the values are concentrated along the diagonal, it shows high accuracy for the majority of classes. For instance, there are 3648 correct predictions for Class 1 out of around 3700, and Class 20 has 2807 correct predictions. However, errors occur away from the diagonal, especially in classes with fewer instances like Class 4 (75 correct) and Class 9 (139 correct), indicating lower performance, possibly due to imbalanced classes or similar features. There are also instances where neighboring classes are confused, such as Class 1 being misclassified as Classes 2, 3, or 20, and Class 12 (3220 correct) sometimes being misclassified as Classes 11 or 13. To enhance performance, it is essential to investigate the difficulties faced by the poorly performing classes and tackle issues like data imbalance or feature overlap.

Model accuracy is highlighted in Fig. 10a. The model demonstrates efficient learning as the training and validation losses decrease consistently until they reach a point of stability around epochs 10 to 12, showing convergence. However, there is an increase in training loss at epoch 16, surpassing 30, potentially indicating problems such as weight initialization or optimizer instability. The validation loss, though, remains unchanged, suggesting this is probably a temporary variation. Following the spike, the training loss stabilizes again and closely matches the validation loss, indicating no signs of overfitting. Fig. 10b shows significant improvement in accuracy, with both the training and validation accuracy starting at about 0.5 and steadily increasing to around 0.9 by epoch 30, demonstrating effective learning without notable overfitting or underfitting. It is interesting to note that the validation accuracy slightly exceeds the training accuracy at certain points, indicating good generalization and minimal overfitting. The accuracy levels off after epoch 20, with minor fluctuations around 0.9, suggesting that the model has likely reached its maximum performance with the current setup.

The ROC curve for the TEP dataset is showcased in Fig. 11, where the dashed line represents a random classifier, for which the TPR equals the FPR. The figure lists the 18 classes and their respective AUC scores, where higher AUC values indicate better performance for that class. The key findings show that many classes (class 1, class 2, class 6, class 7 and class 14) have an AUC of 1.0, which signifies that the proposed model classifies them perfectly. Even for the lowest-performing class (class 0), the AUC is still quite high at 0.89. This indicates that the model has strong predictive capability across all classes. The performance metrics of the multi-class classification model are as follows: the accuracy obtained is 0.92, the precision is 0.82, and the recall and F1 score are 0.92 and 0.85, respectively.

4.3.1. Comparative analysis

Though the proposed model has delivered a good outcome for anomaly detection for both binary and multiclass classification, it is important to compare the performance of the proposed framework with state-of-the-art approaches, to highlight the working mechanism of the proposed research.

Table 3 reflects the metric values obtained by the proposed and state-of-the-art approaches for anomaly detection using the SKAB dataset, in which the lowest accuracy among the existing models, 0.35, and the lowest F1 score, 0.45, are both obtained by the LSTM model, showing its ineffectiveness in the anomaly detection process. When compared to the existing models, the proposed BiLSTM with


Fig. 7. Accuracy and Loss on SKAB dataset.

VAE reflects a better accuracy rate of 0.98. This higher accuracy demonstrates the superior performance of the proposed work.

Fig. 8. ROC Curve for Class 0 and Class 1.

Table 4 showcases the comparison analysis of the proposed work with SOTA approaches. From the table, it can be seen that the proposed model has obtained better performance for anomaly detection, with an accuracy rate of 0.92, whereas the accuracy of MSSA-PNN was 0.88, LDA was 0.69, QDA was 0.81, KNN was 0.70, SVM was 0.75, MaxEnt was 0.52 and CS-BP was 0.84. The findings of the table therefore exhibit the strong performance of the proposed framework on the TEP dataset.

Likewise, Table 5 shows the class-wise F1 scores obtained by the existing and proposed models, in which the mean F1 score gained by the proposed model is 0.85, which is superior to most of the state-of-the-art models.

4.4. Ablation study

A deeper understanding of the proposed model can be gained through an ablation study, by pinpointing and removing key components. This approach showcases the effectiveness of the proposed model under various settings. Hence, Table 6 shows the ablation study of the present work for the two datasets.

From Table 6, it is identified that the original proposed configuration achieves the highest performance, with an accuracy of 0.98 on SKAB and 0.92 on TEP, serving as the benchmark. Removing the first BiLSTM layer results in a decrease on both SKAB (0.96) and TEP (0.89), indicating that this layer contributes positively to model performance. Excluding the second BiLSTM layer leads to a slight drop on SKAB (0.97) and a modest decline on TEP (0.90), indicating some redundancy while still retaining significant effectiveness. The complete removal of both BiLSTM layers causes a substantial decline in performance, with SKAB dropping to 0.92 and TEP to 0.82, highlighting the critical role of the BiLSTM in the model architecture. Reducing the latent space dimension by half results in a minor decrease on SKAB (0.97) but maintains TEP at 0.91, indicating that the model remains effective despite the dimensionality reduction. Doubling the latent space dimension keeps SKAB at 0.98 and TEP at 0.92, suggesting that a larger latent space preserves model capacity and representation. The removal of dynamic loss weighting leads to a slight decrease on SKAB (0.97) and TEP (0.90), implying that this feature contributes positively but is not as critical as the BiLSTM layers.

Therefore, from the ablation work, it is well established that the proposed BiLSTM-VAE with dynamic loss has delivered superior performance on both the SKAB and TEP datasets in terms of anomaly detection in industrial machinery.

4.5. Discussion and limitation of the study

The findings of the present framework reflect the effectiveness of the proposed BiLSTM-VAE for anomaly detection and classification, in terms of both binary and multiclass classification. Most prior work carries out either binary classification or multiclass classification alone, whereas the proposed work has carried out both. Likewise, different state-of-the-art approaches use autoencoder models such as G-LSTM-AE [23], stacked AE and LSTM networks [28], and SAE-based DNN models [30]; however, these autoencoder-based models resulted in lower accuracy, which can be attributed to the combination of models incorporated with the AEs. This drawback is overcome by using the BiLSTM model with VAEs and the proposed dynamic loss function, where this function enhances the performance of the model by propagating the loss back through the encoder-decoder mechanism, resulting in higher accuracy. Few studies have explored the combination of the SKAB and TEP datasets; the proposed model works on both, as this combination leverages the strengths of each dataset. Moreover, SKAB provides detailed mechanical failure data, while TEP offers rich temporal data from a chemical process. This amalgamation helps in creating robust models that can generalize better across different scenarios. Besides, the ATASML model faced computational efficiency issues for anomaly detection; the proposed model does not experience this challenge due to the implementation of advanced models.


Table 4
Comparison of Model Performances Across Categories for TEP dataset [42].

Class MSSA-PNN LDA QDA KNN SVM MaxEnt CS-BP Proposed

1 1 0.99 1 0.72 0.99 0.93 0.99 0.94


2 0.98 0.97 0.99 0.59 0.99 0.44 0.98 0.92
3 0.98 0.54 0.59 0.67 0.66 0.36 0.71 0.92
4 1 1 1 0.67 1 0.68 1 0.89
5 1 0.99 1 0.62 1 0.39 1 0.92
6 0.99 0.98 1 0.98 1 0.49 0.99 0.92
7 1 1 1 0.66 1 0.42 1 0.92
8 0.8 0.56 0.92 0.59 0.66 0.41 0.92 0.92
9 0.82 0.58 0.59 0.62 0.65 0.67 0.67 0.92
10 0.79 0.52 0.64 0.65 0.65 0.66 0.63 0.93
11 0.78 0.51 0.6 0.6 0.67 0.41 0.66 0.92
12 0.88 0.58 0.96 0.74 0.44 0.37 0.95 0.92
13 0.98 0.65 0.98 0.88 0.47 0.44 0.97 0.92
14 1 0.51 1 0.93 0.66 0.44 1 0.91
15 0.76 0.6 0.57 0.65 0.66 0.66 0.64 0.92
16 0.78 0.52 0.56 0.68 0.65 0.71 0.63 0.93
17 0.84 0.53 0.67 0.63 0.65 0.49 0.74 0.92
18 0.84 0.88 0.91 0.86 0.92 0.74 0.9 0.92
19 0.79 0.53 0.57 0.62 0.63 0.4 0.65 0.93
20 0.71 0.56 0.66 0.67 0.66 0.49 0.74 0.92
21 0.81 0.59 0.78 0.64 0.65 0.43 0.82 0.92

Mean 0.88 0.69 0.81 0.7 0.75 0.52 0.84 0.92
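As a quick sanity check, the Mean row of Table 4 can be reproduced from the per-class values; for instance, for the Proposed column (values transcribed from the table above):

```python
# Per-class accuracies of the proposed model for classes 1-21 (from Table 4).
proposed = [0.94, 0.92, 0.92, 0.89, 0.92, 0.92, 0.92, 0.92, 0.92, 0.93,
            0.92, 0.92, 0.92, 0.91, 0.92, 0.93, 0.92, 0.92, 0.93, 0.92, 0.92]
mean_acc = sum(proposed) / len(proposed)   # rounds to 0.92, matching the Mean row
```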

Table 5
Comparison analysis of class-wise F1 scores for the TEP dataset.

Class MSSA-PNN LDA QDA KNN SVM MaxEnt CS-BP Proposed

1 1.00 0.99 1.00 0.80 1.00 0.94 1.00 0.69


2 0.98 0.97 0.99 0.72 0.99 0.41 0.98 0.93
3 0.98 0.61 0.63 0.77 0.79 0.18 0.79 0.93
4 1.00 1.00 1.00 0.78 1.00 0.70 1.00 0.40
5 1.00 1.00 1.00 0.74 1.00 0.28 1.00 0.81
6 1.00 0.98 1.00 0.98 1.00 0.39 0.99 0.92
7 1.00 1.00 1.00 0.75 1.00 0.30 1.00 0.94
8 0.85 0.61 0.94 0.72 0.80 0.33 0.94 0.93
9 0.87 0.66 0.66 0.73 0.79 0.69 0.78 0.93
10 0.84 0.57 0.68 0.75 0.79 0.68 0.74 0.55
11 0.83 0.59 0.65 0.71 0.80 0.28 0.75 0.90
12 0.91 0.61 0.97 0.80 0.61 0.16 0.96 0.90
13 0.98 0.68 0.98 0.90 0.60 0.30 0.98 0.93
14 1.00 0.57 1.00 0.95 0.79 0.26 1.00 0.93
15 0.82 0.67 0.61 0.76 0.80 0.67 0.77 0.93
16 0.84 0.59 0.59 0.77 0.79 0.73 0.75 0.66
17 0.88 0.61 0.71 0.74 0.79 0.40 0.80 0.86
18 0.87 0.89 0.93 0.89 0.93 0.75 0.93 0.93
19 0.85 0.59 0.62 0.74 0.78 0.35 0.77 0.94
20 0.79 0.63 0.69 0.76 0.80 0.48 0.81 0.92
21 0.85 0.64 0.81 0.74 0.79 0.39 0.86 0.93

Mean 0.91 0.74 0.83 0.79 0.84 0.46 0.88 0.85

Table 6
Ablation study on performance metrics.

Experiment Description SKAB TEP

Base Model Original BiLSTM-VAE with dynamic loss weighting 0.98 0.92
No First BiLSTM Removed the first BiLSTM layer in the encoder 0.96 0.89
No Second BiLSTM Removed the second BiLSTM layer in the encoder 0.97 0.90
No BiLSTMs Removed both BiLSTM layers 0.92 0.82
Half Latent Dimension Reduced the latent space dimension by half 0.97 0.91
Double Latent Dimension Increased the latent space dimension by double 0.98 0.92
No Dynamic Loss Weighting Removed the dynamic loss weighting callback 0.97 0.90
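The pattern in Table 6, with every ablation scoring at or below the base configuration, can be checked mechanically (scores transcribed from the table; the dictionary layout is just for illustration):

```python
# (SKAB, TEP) accuracies per ablation experiment, transcribed from Table 6.
ablations = {
    "Base Model": (0.98, 0.92),
    "No First BiLSTM": (0.96, 0.89),
    "No Second BiLSTM": (0.97, 0.90),
    "No BiLSTMs": (0.92, 0.82),
    "Half Latent Dimension": (0.97, 0.91),
    "Double Latent Dimension": (0.98, 0.92),
    "No Dynamic Loss Weighting": (0.97, 0.90),
}
base_skab, base_tep = ablations["Base Model"]
# The largest drop comes from removing both BiLSTM layers.
worst = min(ablations, key=lambda name: sum(ablations[name]))
```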


Fig. 9. Confusion Matrix for TEP dataset.

Fig. 10. Accuracy and Loss on TEP dataset.

5. Conclusion

The proposed mechanism of BiLSTM-VAE with dynamic loss was adopted in the present research work for detecting anomalies in different industrial machineries. The incorporation of the dynamic loss in the proposed BiLSTM-VAE model resulted in better anomaly detection performance on the SKAB and TEP datasets due to the application of the tempering index. The proposed dynamic loss resolved the drawback of the conventional loss function encountered in the traditional VAE model. Therefore, the execution of the BiLSTM and the dynamic loss function enhanced the working mechanism of the anomaly detection process. The proposed BiLSTM-VAE achieved an accuracy rate of 0.98 on the SKAB dataset and 0.92 on the TEP dataset. Hence, the numerical outcomes of the proposed technique highlight its anomaly detection performance. The current work can therefore aid industrial applications in detecting anomalies by utilizing sophisticated algorithms combined with DL mechanisms, and can eventually assist organizations in enhancing operational efficiency and minimizing the risk associated with machine faults. As the present work focuses on the implementation of ML and DL, future work can focus exclusively


Fig. 11. ROC Curve for TEP dataset.

on a cloud-based environment, due to the ability of the cloud to facilitate real-time analysis in industrial settings.

CRediT authorship contribution statement

Praveen Vijai: Software, Methodology. Bagavathi Sivakumar P: Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Open source data are being used.

References

[1] R. Figliè, R. Amadio, M. Tyrovolas, C. Stylios, Ł. Paśko, D. Stadnicka, A. Carreras-Coch, A. Zaballos, J. Navarro, D. Mazzei, Towards a taxonomy of industrial challenges and enabling technologies in industry 4.0, IEEE Access (2024).
[2] A. Khang, V. Abdullayev, V. Hahanov, V. Shah, Advanced IoT Technologies and
[7] D. Singh, Dictionary of Mechanical Engineering, Springer, 2024.
[8] V. Bafandegan Emroozi, M. Kazemi, M. Doostparast, A. Pooya, Improving industrial maintenance efficiency: a holistic approach to integrated production and maintenance planning with human error optimization, Process Int. Opt. Sustain. 8 (2) (2024) 539--564.
[9] N.R. Palakurti, Challenges and future directions in anomaly detection, in: Practical Applications of Data Processing, Algorithms, and Modeling, IGI Global, 2024, pp. 269--284.
[10] A. Jaramillo-Alcazar, J. Govea, W. Villegas-Ch, Anomaly detection in a smart industrial machinery plant using iot and machine learning, Sensors 23 (19) (2023) 8286.
[11] S.F. Chevtchenko, E.D.S. Rocha, M.C.M. Dos Santos, R.L. Mota, D.M. Vieira, E.C. De Andrade, D.R.B. De Araújo, Anomaly detection in industrial machinery using iot devices and machine learning: a systematic mapping, IEEE Access 11 (2023) 128288--128305.
[12] A. Mishra, Scalable AI and Design Patterns: Design, Develop, and Deploy Scalable AI Solutions, Springer Nature, 2024.
[13] D. Kim, T.-Y. Heo, Anomaly detection with feature extraction based on machine learning using hydraulic system iot sensor data, Sensors 22 (7) (2022) 2479.
[14] N. Bao, Y. Fan, Z. Ye, A. Simeone, A machine vision-based pipe leakage detection system for automated power plant maintenance, Sensors 22 (4) (2022) 1588.
[15] M. Carratù, V. Gallo, S.D. Iacono, P. Sommella, A. Bartolini, F. Grasso, L. Ciani, G. Patrizi, A novel methodology for unsupervised anomaly detection in industrial electrical systems, IEEE Trans. Instrum. Meas. (2023).
[16] R. Anuradha, B. Swathi, A. Nagpal, P. Chaturvedi, R. Kalra, A.A. Alwan, Deep learning for anomaly detection in large-scale industrial data, in: 2023 10th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), vol. 10, IEEE, 2023, pp. 1551--1556.
[17] I. Ahmed, M. Ahmad, A. Chehri, G. Jeon, A smart-anomaly-detection system for
Applications in the Industry 4.0 Digital Economy, CRC Press, 2024. industrial machines based on feature autoencoder and deep learning, Micromachines
[3] F.J. Folgado, D. Calderón, I. González, A.J. Calderón, Review of industry 4.0 from 14 (1) (2023) 154.
the perspective of automation and supervision systems: definitions, architectures and [18] A. Gholami, C. Qin, S. Pannala, A.K. Srivastava, F. Rahmatian, R. Sharma, S. Pandey,
recent trends, Electronics 13 (4) (2024) 782. D-pmu data generation and anomaly detection using statistical and clustering tech­
[4] K. Shriram, S.K. Karthiban, A.C. Kumar, S. Mathiarasu, P. Saleeshya, Productivity niques, in: 2022 10th Workshop on Modelling and Simulation of Cyber-Physical
improvement in a paper manufacturing company through lean and iot-a case study, Energy Systems (MSCPES), IEEE, 2022, pp. 1--6.
Int. J. Bus. Syst. Res. 17 (1) (2023) 97--119. [19] E.A. Hinojosa-Palafox, O.M. Rodríguez-Elías, J.H. Pacheco-Ramírez, J.A. Hoyo­
[5] S. Tyagi, N. Rastogi, A. Gupta, K. Joshi, Significant leap in the industrial revolution Montaño, M. Pérez-Patricio, D.F. Espejel-Blanco, A novel unsupervised anomaly
from industry 4.0 to industry 5.0: needs, problems, and driving forces, in: Manage­ detection framework for early fault detection in complex industrial settings, IEEE
ment and Production Engineering Review, 2024. Access (2024).
[6] B. Wang, H. Ma, F. Wang, U. Dampage, M. Al-Dhaifallah, Z.M. Ali, M.A. Mohamed, [20] D. Ribeiro, L.M. Matos, G. Moreira, A. Pilastri, P. Cortez, Isolation forests and deep
An iot-enabled stochastic operation management framework for smart grids, IEEE autoencoders for industrial screw tightening anomaly detection, Computers 11 (4)
Trans. Intell. Transp. Syst. 24 (1) (2022) 1025--1034. (2022) 54.

13
P. Vijai and B.S. P
Results in Engineering 25 (2025) 104277
[21] D. Velásquez, E. Pérez, X. Oregui, A. Artetxe, J. Manteca, J.E. Mansilla, M. Toro, [33] Z. Li, J. Li, Y. Wang, K. Wang, A deep learning approach for anomaly detection based
M. Maiza, B. Sierra, A hybrid machine-learning ensemble for anomaly detection in on sae and lstm in mechanical equipment, Int. J. Adv. Manuf. Technol. 103 (2019)
real-time industry 4.0 systems, IEEE Access 10 (2022) 72024--72036. 499--510.
[22] N. Murugesan, A.N. Velu, B.S. Palaniappan, B. Sukumar, M.J. Hossain, Mitigating [34] R. de Paula Monteiro, M.C. Lozada, D.R.C. Mendieta, R.V.S. Loja, C.J.A. Bastos Filho,
missing rate and early cyberattack discrimination using optimal statistical approach A hybrid prototype selection-based deep learning approach for anomaly detection
with machine learning techniques in a smart grid, Energies 17 (8) (2024) 1965. in industrial machines, Expert Syst. Appl. 204 (2022) 117528.
[23] R. Sorostinean, Z. Burghelea, A. Gellert, Anomaly detection in smart industrial ma­ [35] P. Peng, H. Zhang, X. Wang, W. Huang, H. Wang, Imbalanced chemical process fault
chinery through hidden Markov models and autoencoders, IEEE Access (2024). diagnosis using balancing gan with active sample selection, IEEE Sens. J. 23 (13)
[24] A. Almalaq, S. Albadran, M.A. Mohamed, An adoptive miner-misuse based online (2023) 14826--14833.
anomaly detection approach in the power system: an optimum reinforcement learn­ [36] I. Lomov, M. Lyubimov, I. Makarov, L.E. Zhukov, Fault detection in Tennessee East­
ing method, Mathematics 11 (4) (2023) 884. man process with temporal deep learning models, J. Ind. Inform. Int. 23 (2021)
[25] T. Klaeger, S. Gottschall, L. Oehm, Data science on industrial data—today’s chal­ 100216.
lenges in Brown field applications, Challenges 12 (1) (2021) 2. [37] Y. Li, A fault prediction and cause ident­fication approach in complex industrial pro­
[26] H.C. Altunay, Z. Albayrak, A hybrid cnn+ lstm-based intrusion detection system for cesses based on deep learning, Comput. Intell. Neurosci. 2021 (1) (2021) 6612342.
industrial iot networks, Eng. Sci. Technol. Int. J. 38 (2023) 101322. [38] V. Pozdnyakov, A. Kovalenko, I. Makarov, M. Drobyshevskiy, K. Lukyanov, Adversar­
[27] M. Hu, P. Xia, Industrial time-series signal anomaly detection based on g-lstm-ae ial attacks and defenses in fault detection and diagnosis: a comprehensive benchmark
model, in: International Conference on Art­ficial Intelligence in China, Springer, on the Tennessee Eastman process, IEEE Open J. Ind. Electron. Soc. (2024).
2022, pp. 383--391. [39] S. Zhao, Y. Duan, N. Roy, B. Zhang, A novel fault diagnosis framework empowered
[28] F. Khanmohammadi, R. Azmi, Time-series anomaly detection in automated vehicles by lstm and attention: a case study on the Tennessee Eastman process, Can. J. Chem.
using d-cnn-lstm autoencoder, IEEE Trans. Intell. Transp. Syst. (2024). Eng. (2024).
[29] P.K. Sebastian, K. Deepa, N. Neelima, R. Paul, T. Özer, A comparative analysis of
[40] R. Verma, R. Yerolla, C.S. Besta, Deep learning-based fault detection in the Tennessee
deep neural network models in iot-based smart systems for energy prediction and
Eastman process, in: 2022 Second International Conference on Art­ficial Intelligence
theft detection, IET Renew. Power Gener. 18 (3) (2024) 398--411.
and Smart Energy (ICAIS), IEEE, 2022, pp. 228--233.
[30] D.H. Tran, V.L. Nguyen, H. Nguyen, Y.M. Jang, Self-supervised learning for time­
[41] Y. Song, D. Li, Application of a novel data-driven framework in anomaly detection
series anomaly detection in industrial internet of things, Electronics 11 (14) (2022)
of industrial data, IEEE Access (2024).
2146.
[42] H. Xu, T. Ren, Z. Mo, X. Yang, A fault diagnosis model for Tennessee Eastman pro­
[31] S. Dou, G. Zhang, Z. Xiong, Anomaly detection of process unit based on lstm time
cesses based on feature selection and probabilistic neural network, Appl. Sci. 12 (17)
series reconstruction, CIESC J. 70 (2) (2019) 481.
(2022) 8868.
[32] D. Sun, Y. Fan, G. Wang, Enhancing fault diagnosis in industrial processes through
adversarial task augmented sequential meta-learning, Appl. Sci. 14 (11) (2024)
4433.

14

You might also like