Anomaly Detection Solutions: The Dynamic Loss Approach in VAE for Manufacturing and IoT Environments
Results in Engineering
journal homepage: www.sciencedirect.com/journal/results-in-engineering
Research paper
Keywords: Anomaly detection; Deep learning; Variational autoencoder; Bidirectional long short term memory model; Dynamic loss function

Abstract: Anomaly detection is critical for enhancing operational efficiency, safety, and maintenance in industrial applications, particularly in the era of Industry 4.0 and IoT. While traditional anomaly detection approaches face limitations such as scalability issues, high false alarm rates, and reliance on skilled expertise, this study proposes a novel approach using a BiLSTM-Variational Autoencoder (BiLSTM-VAE) model with a dynamic loss function. The proposed model addresses key challenges, including data imbalance, interpretability issues, and computational complexity. By leveraging the bidirectional capability of BiLSTM in the encoder and decoder, the model captures comprehensive temporal dependencies, enabling more effective anomaly detection. The innovative dynamic loss function integrates a tempering index mechanism with tuneable parameters (𝛼 and 𝛾), which assigns higher weights to underrepresented classes and down-weights easily classified samples. This improves reconstruction and enhances detection accuracy, particularly for minority-class anomalies. Experimental evaluations on the SKAB and TEP datasets demonstrate the superiority of the proposed framework. The model achieved an accuracy of 98% and an F1 score of 96% for binary classification on the SKAB dataset and a multiclass classification accuracy of 92% with an F1 score of 85% on the TEP dataset. These results significantly outperform state-of-the-art models, including traditional VAE, LSTM, and transformer-based approaches. The proposed BiLSTM-VAE model not only demonstrates robust anomaly detection capabilities across diverse datasets but also effectively handles data imbalance and reduces false positives, making it a scalable and reliable solution for industrial anomaly detection in the context of Industry 4.0 and IoT environments.
1. Introduction

The manufacturing industry serves as a cornerstone of economic development, encompassing a diverse range of sectors that convert raw materials into finished products through various processes [1,2]. The manufacturing sector has evolved significantly from its artisanal roots to a sophisticated, technology-driven landscape. The modern manufacturing environment is characterized by diverse production methods, advanced technologies and a focus on efficiency and sustainability [3,4]. Historically, manufacturing began with manual craftsmanship before the industrial revolution introduced mechanization and mass-production techniques [5,6]. This shift allowed greater quantities of goods to be produced at lower costs, fundamentally altering economic structures and consumer markets. Industrial applications typically depend heavily on a diverse array of components, including mechanical parts such as ball bearings, electrical components like capacitors, and critical elements such as pillow blocks and roller chains [7]. These components are essential for ensuring the reliability and efficiency of machinery, ultimately enhancing operational performance and minimizing downtime in industrial settings [8]. However, despite their importance, these components can be susceptible to various anomalies and faults under certain conditions [9]. Mechanical failures can arise from issues such as inadequate lubrication, misalignment, electrical malfunctions, operational challenges, or exposure to physical stressors.

Early detection of these anomalies is therefore crucial for maintaining operational integrity and preventing costly disruptions [10]. Implementation of anomaly detection in industrial environments is crucial for maintaining operational efficiency, safety and security. Common methods for detecting anomalies in industrial machinery include visual inspections, checklists and SOPs (Standard Operating Procedures), data logging and analysis, operator feedback and reporting, RCA (Root Cause Analysis) and many more. When anomalies are detected, conducting a root cause analysis helps determine the underlying issues causing
* Corresponding author.
E-mail addresses: cb.en.d.cse14005@cb.students.amrita.edu (P. Vijai), pbsk@cb.amrita.edu (B.S. P).
https://doi.org/10.1016/j.rineng.2025.104277
Received 9 November 2024; Received in revised form 25 December 2024; Accepted 4 February 2025
P. Vijai and B.S. P
Results in Engineering 25 (2025) 104277
of the study has focused on utilizing deep AE models such as LSTM, as they can effectively capture sequential data patterns.

Anomaly and fault detection in industrial systems poses a significant challenge because of the inherent complexity of these systems, which makes comprehensive monitoring difficult. Therefore, three different ML models, LOF, one-class SVM and AE [21,22], were combined via a weighted average for performing anomaly detection. The study reported F1 scores of 0.904 for LOF, 0.89 for one-class SVM and 0.88 for AE. Despite this performance, the computational cost of employing these algorithms was high; therefore, DL-based models will be preferred in the future to classify and categorize different types of faults/anomalies. Clustering approaches such as HMM and autoencoders [23] were used for identifying large deviations within environmental data. The outcome of the model meets the need for sustainable manufacturing by enabling the analysis of data collected from different machines. Moreover, SVM was used for detecting anomalies in the operation of a rotating bearing within a commercial semiconductor manufacturing machine. Reinforcement learning-based approaches, such as the adaptive miner-misuse method, can enhance online anomaly detection in power systems by adaptively learning from real-time data and optimizing detection strategies to improve accuracy and reduce false alarms in smart city energy management systems [24].

Industrial data presents considerable difficulties for conventional statistical and clustering techniques [25,26], as these challenges stem from factors such as high dimensionality, intrinsic noise and the diverse nature of the data, all of which can negatively affect the performance of these methods. Additionally, reliance on specific distributional assumptions, the need for careful algorithm tuning, and high computational demands further constrain the efficacy of traditional anomaly detection strategies in intricate industrial environments. Besides, the high-dimensional characteristics of industrial data complicate the precise modeling and detection of anomalies using traditional approaches. Furthermore, this type of data is frequently affected by noise resulting from sensor inaccuracies and environmental factors, which can hinder the effectiveness of statistical and clustering approaches.

Moreover, the robustness of statistical and ML methods presents a significant challenge. Many statistical techniques depend on assumptions regarding the underlying data distributions, such as the assumption of normality, which is often not met by industrial data, thereby diminishing their effectiveness. Likewise, conventional clustering methods frequently necessitate extensive parameter tuning to achieve optimal results, which can be particularly difficult with complex industrial datasets. Furthermore, the computational demands of traditional clustering approaches can render them unsuitable for anomaly detection in high-throughput industrial settings [19].

2.2. Deep learning models

The G-LSTM-AE (Gated Long Short Term Memory Autoencoder) model [27] has focused on combining the strengths of LSTM networks and autoencoders by effectively learning the temporal dependencies in time series data while also reconstructing input signals to detect anomalies based on reconstruction errors. Two different datasets, an automatic guided vehicle dataset and the SKAB dataset, were employed, and the findings showed that the G-LSTM-AE technique achieved satisfactory anomaly detection in industrial scenarios. Likewise, a DL-based CNN and LSTM autoencoder model [28,29] was utilized for optimizing the detection rate of all anomalies, and its analytical outcome was reasonable for time series anomaly detection. Three real-world datasets were considered in a study for detecting abnormality in manufacturing systems using a DL-based 1DCNN technique [30]. In order to evaluate the performance of the model, traditional techniques such as LSTM, autoencoder, LSTM-autoencoder and ARIMA were compared with 1DCNN, which produced significantly better outcomes for anomaly detection. Moreover, in various industrial settings, gears are often deemed critical components, as their unexpected failures can lead to significant operational disruptions. Hence, a six-layer autoencoder model was used for fault analysis [17] and anomaly detection. The model demonstrated an accuracy rate of 91% in detecting anomalies. Though the model delivered an accuracy rate of 91%, better accuracy can be attained by employing DL models with different industrial benchmark datasets.

Industrial sensors have become essential tools for monitoring environmental conditions within manufacturing systems. However, if these smart sensors exhibit abnormal behavior, it can lead to failures or potential risks during operation, ultimately jeopardizing the overall reliability of the manufacturing process. Hence, an LSTM-based model [31] was used for reconstructing the time series in reverse sequence. Subsequently, the discrepancies between the reconstructed values and the actual values are utilized to calculate the probability of an anomaly using a maximum probability estimation approach. Likewise, the ATASML (Adversarial Task Augmented Sequential Meta-Learning) model [32] was used for detecting faults in industrial components. The model incorporated two different datasets, SKAB and TEP. The findings showed that the model delivered 94.7% accuracy on the SKAB dataset and 90.13% accuracy on the TEP dataset. Therefore, the strategic combination of adversarial learning with task sequencing in ATASML has focused on fault diagnosis in various operational contexts. It is known that anomalies in mechanical systems can result in breakdowns with serious safety, environmental and economic impacts. Hence, in order to proceed with anomaly detection in mechanical equipment, two DL-based approaches [33] have been used: SAE (Stacked AE) and LSTM networks. The combination of these techniques made it possible to identify anomalous conditions in an unsupervised manner, and the work reported that the model achieved better performance for anomaly detection. A DL-based approach [34] was employed for the detection of anomalies in industrial machines, particularly in rotating machinery. To accomplish this, a CNN model was used as a feature extractor for the reconstruction of input information, and a prototype algorithm was used for improving the training process of an arbitrarily initialized feature extractor. Moreover, a BAGAN (Balancing Generative Adversarial Network) [35] was used for tackling the challenge of imbalanced fault diagnosis by harnessing data generation and a sample selection process. Initially, the BAGAN technique was used for creating more distinct fault samples, as this approach utilized both fault and normal samples for enhancing the quality of the generated data. Following this, to classify the faults effectively, an SAE-based DNN model was used on the TEP dataset. The results indicated that the BAGAN-based model, coupled with an active sample selection strategy, significantly enhanced performance in diagnosing imbalances within chemical fault data.

Automated early detection and prediction of process faults continues to be a difficult challenge in industrial operations. Therefore, DL-based methods have been adopted for detecting faults in industrial machinery. Hence, in order to process this effectively, a temporal CNN1D2D [36] approach was executed for detecting faults using the TEP dataset by detecting various fault patterns, handling internal data fluctuations and correlations between sensors. Moreover, a GAN model was used for enriching and extending the training data. The findings showed that the faults that were challenging to identify were 3, 9 and 15. Issues arising in the production line can lead to significant losses. Anticipating these faults before they happen or pinpointing their underlying causes can greatly mitigate such losses. Thus, a DL-based technique was used in which the production process follows a spatial sequence that differs from conventional time series data. To address this, an LSTM within an encoder-decoder [37] was used for accommodating the branched structure associated with the spatial sequence. Additionally, an attention mechanism was employed for detecting faults and their causes in the TEP dataset. A significant limitation of this method is the complexity of the attention mechanism. This algorithm demands substantial computing resources and exhibited sub-optimal real-time performance. Another
drawback of the model is the necessity for historical data to generate the output model effectively. Correspondingly, MLP, GRU and TCN (Temporal Convolutional Network) models [38] were used in a study for identifying different types of faults in an automated control system, enhancing decision making in industrial process management. Besides, a combination of BLSTM with AM [39] was developed to address the dynamic and temporal relationships in longer series observations, and the attention mechanism was adopted to highlight features by assigning weights in the model. This was obtained using the TEP dataset, which reduced the bias between larger population parameters and sample statistics. The findings of the work illustrated an ideal tradeoff in fault diagnosis research. Likewise, a DL-based model [40] was adopted for conducting fault detection and diagnosis of non-linear processes using the TEP dataset. The experimental outcome of the model showed considerable anomaly detection performance.
3. Research methodology
Fig. 2. System architecture used in SKAB.
Table 1
Attributes of SKAB Dataset.
Columns Description
Datetime Represents dates and times of the moment when the value is written to the database.
Accelerometer1RMS Shows a vibration acceleration (Amount of g units)
Accelerometer2RMS Shows a vibration acceleration (Amount of g units)
Current Shows the amperage on the electric motor (Ampere)
Pressure Represents the pressure in the loop after the water pump (Bar)
Temperature Shows the temperature of the engine body (The degree Celsius)
Thermocouple Represents the temperature of the fluid in the circulation loop (The degree Celsius)
Voltage Shows the voltage on the electric motor (Volt)
RatRMS Represents the circulation flow rate of the fluid inside the loop (Liter per minute)
Anomaly Shows if the point is anomalous (0 or 1)
changepoint Shows if the point is a changepoint for collective anomalies (0 or 1)
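The attribute layout in Table 1 can be illustrated with a short, self-contained sketch that parses a SKAB-style record and separates the eight sensor channels from the two label columns. This is a minimal sketch, assuming a CSV layout with the column names of Table 1; the sample values below are made up for illustration and are not taken from the actual dataset files.

```python
import csv
import io

# Illustrative SKAB-style rows; column names follow Table 1,
# values are invented for the sketch.
RAW = """datetime,Accelerometer1RMS,Accelerometer2RMS,Current,Pressure,Temperature,Thermocouple,Voltage,RatRMS,Anomaly,changepoint
2020-03-09 12:14:36,0.027,0.040,1.33,0.41,89.0,25.0,219.0,32.0,0,0
2020-03-09 12:14:37,0.284,0.319,1.41,0.40,89.2,25.1,220.0,31.5,1,0
"""

LABEL_COLUMNS = {"Anomaly", "changepoint"}  # target columns, per Table 1

def split_record(row):
    """Separate numeric sensor features from the two label columns."""
    features = {k: float(v) for k, v in row.items()
                if k not in LABEL_COLUMNS and k != "datetime"}
    labels = {k: int(row[k]) for k in LABEL_COLUMNS}
    return features, labels

rows = list(csv.DictReader(io.StringIO(RAW)))
features, labels = split_record(rows[1])
print(sorted(features))   # the eight sensor channels
print(labels["Anomaly"])  # 1 -> this point is anomalous
```

In a real pipeline the same split would be applied to every row before scaling the feature columns and feeding windows of them to the model.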
included to address any potential shaft misalignment issues, ensuring smooth operation and longevity of the equipment.

3.1.2. Tennessee Eastman process dataset

The TEP dataset comprises different datasets that simulate various operational conditions and faults. It encompasses 22 classes, where 21 classes represent different fault types and 1 class (Fault 0) represents the fault-free condition. Fig. 3 shows the process involving 5 main operating units: reactors, condenser, vapor-liquid separator, recycle compressor and product stripper.

3.2. Data pre-processing

Two different data pre-processing techniques are used in the study.

Min-Max normalization: Min-Max normalization is a technique which scales the values of a dataset to a specific range, typically between 0 and 1. This method is specifically beneficial in scenarios where the data needs to be bounded within a defined interval, making it suitable for the anomaly detection model.

Label encoding: Label encoding is a critical pre-processing approach employed primarily for converting categorical variables into numerical format. This transformation is important for algorithms that need numerical input.

Additionally, the SMOTE pre-processing approach is used on the TEP dataset to address its class imbalance. It generates synthetic examples of the minority class rather than duplicating existing instances, which helps create a more balanced dataset and enhances the performance of the model.

3.3. Proposed BiLSTM-variational autoencoder with dynamic loss function

VAE is a powerful generative model adopted by the proposed work for anomaly detection of machinery faults. Though there are various methods for anomaly detection, the proposed work adopts VAE because VAEs encode data into a probabilistic latent space rather than a fixed point, which allows for a more nuanced understanding of the data distribution. This flexibility helps capture the variability and complexity of normal operating conditions in machinery, making it easier to identify deviations that indicate anomalies. Unlike standard AEs and other approaches, which can memorize training data, VAEs encourage generalization by sampling from the learned latent distribution. This characteristic enables VAEs to reconstruct inputs more effectively, as they learn to model the underlying data distribution rather than just memorizing specific instances. Besides, the reconstruction process in VAEs involves comparing the original input with its reconstruction from the latent space. This allows for a nuanced assessment of anomalies, as deviations from normal patterns can be detected through reconstruction error. VAEs can learn to reconstruct typical patterns while highlighting anomalies, which is imperative in industrial settings where normal operating conditions can vary significantly. Furthermore, VAEs create a smoother and more continuous latent space due to regularization techniques such as KL divergence. This characteristic leads to better clustering of similar data points and more reliable similarity measures, which enhances the detection of anomalies. This smoothness of the latent space ensures that even minor deviations from the norm can be captured effectively. Owing to these factors, the VAE model is used. Fig. 4 shows the working of the traditional VAE.

The architecture of a VAE consists of two main components: the encoder and the decoder. This structure is crucial for its operation, as it allows the model to learn a probabilistic representation of the input data.

3.3.1. Encoder

The encoder part of the VAE is responsible for mapping the input data into a latent space. The encoder takes raw input data 𝐴 and transforms it into a latent space representation 𝐶. Instead of producing a single deterministic output, the encoder outputs parameters of a probability distribution, typically a Gaussian distribution characterized by a mean 𝜇 and variance 𝜎². This transformation is mathematically expressed in equation (1) as

𝑞(𝐶 ∣ 𝐴) = 𝒩(𝐶; 𝜇(𝐴), 𝜎²(𝐴)) (1)

Here, 𝑞(𝐶 ∣ 𝐴) is the approximate posterior distribution. The encoder's goal is to ensure that this distribution closely matches a prior distribution 𝑝(𝐶), often chosen as a standard normal distribution 𝒩(0, 𝐼).

3.3.2. Decoder

The decoder performs the reverse operation: it takes samples from the latent space and reconstructs them back into the data space. It aims to generate data points that closely match the original inputs. The decoder outputs another set of parameters for a distribution over the reconstructed data, typically modeled as

𝑝(𝐴 ∣ 𝐶) = 𝒩(𝐴; 𝜇′(𝐶), 𝜎′²(𝐶)) (2)

where 𝜇′(𝐶) denotes the mean and 𝜎′²(𝐶) the variance of the reconstructed output given the latent variable 𝐶. The goal of the decoder is to maximize the likelihood of reconstructing the original data from the latent representation.

3.3.3. Loss function

The loss function of a VAE is a fundamental aspect that governs its training and performance, comprising two main components: reconstruction loss and KL divergence. Each of these components plays a distinct role in shaping the model's ability to learn a meaningful representation of the data and to generate new samples. The reconstruction loss measures how precisely the VAE can predict the input data from its latent representation. It is essential for ensuring that the model captures the essential features of the input. This loss is defined mathematically as

Reconstruction Loss = 𝔼_{𝑞(𝐶∣𝐴)}[log 𝑝(𝐴 ∣ 𝐶)] (3)

Here, 𝐴 denotes the original input, 𝐶 represents the latent variable sampled from the encoder's output, and 𝑝(𝐴 ∣ 𝐶) is the likelihood of reconstructing 𝐴 given 𝐶. For continuous data, a Gaussian distribution can be used, leading to a reconstruction loss computed via MSE; for binary data, BCE is used, given as

BCE = −(1/𝑁) Σ_{𝑖=1}^{𝑁} [𝐴ᵢ log(𝐴̂ᵢ) + (1 − 𝐴ᵢ) log(1 − 𝐴̂ᵢ)] (4)

This part of the loss function ensures that the decoder learns to generate outputs that closely resemble the original inputs, thus driving accurate reconstructions.

Likewise, the KL divergence ensures that the learned latent distribution approximates a prior distribution and promotes a well-structured latent space where different regions can be effectively utilized for generating new samples. The KL divergence term can be expressed in terms of the parameters of the learned distribution as

𝐷_KL(𝑞(𝐶∣𝐴) ‖ 𝑝(𝐶)) = −(1/2) Σ_{𝑗=1}^{𝑑} (1 + log(𝜎ⱼ²) − 𝜇ⱼ² − 𝜎ⱼ²) (5)

where 𝑑 denotes the dimensionality of the latent space and 𝜇ⱼ and 𝜎ⱼ denote the mean and standard deviation for each dimension of the latent variable.

In a VAE, the encoder is designed to transform the input data into a lower-dimensional representation characterized by a probability distribution. This probabilistic encoding is essential for the latent space 𝐶 to possess meaningful abstract properties that facilitate the reconstruction of the observed data. To ensure that this latent space adheres to a well-defined structure, regularization techniques
are employed, allowing the VAE to effectively learn variational inference throughout the training process. The weight parameter 𝜙 of the encoder network is optimized to transform input samples into an encoded feature representation, referred to as 𝐶. In contrast, the weight parameter 𝜃 of the decoder network is trained to generate new samples by mapping from the encoded space 𝐶 back to the original data space. Throughout the training process, there is a possibility of information loss, which can affect the accuracy of the reconstruction. Therefore, the primary objective is to establish an ideal encoder-decoder pair that maximizes information retention during encoding while minimizing reconstruction error during decoding. This approach ensures that the model effectively captures the essential features of the input data and precisely reconstructs it. Moreover, traditional VAEs often struggle with computational resources, especially when dealing with huge datasets, which leads to increased training times and operational cost. While traditional VAEs are designed to reconstruct input data, they often produce outputs that lack high fidelity or are distorted. This can hinder the effectiveness of anomaly detection, since the model can misinterpret anomalies as normal variations due to poor reconstruction quality. Therefore, in order to overcome these pitfalls, the proposed model employs the BiLSTM technique within the variational autoencoder together with the proposed dynamic loss. Though there are various algorithms, the proposed work adopts the BiLSTM model, as it is designed to handle sequential data by processing information in both forward and backward directions. This bidirectionality allows it to capture long-range dependencies more effectively than traditional models, which typically only process data in one direction. BiLSTM models can scale with larger datasets and complex tasks without a significant drop in performance. Unlike standard models, BiLSTM mitigates issues such as vanishing gradients, making it more capable of learning from long sequences of data. This is important for machinery data, as it can exhibit complex temporal patterns over extended periods. As a result of these merits, BiLSTM is picked over other models. The integration of the BiLSTM model with the VAE enhances the anomaly detection process in industrial applications by incorporating the BiLSTM mechanism in the encoder and decoder, as depicted in Fig. 5. The BiLSTM architecture consists of two LSTM layers that process the input sequence in both forward and backward directions.

• Forward LSTM layer: processes the input sequences from the beginning to the end.
• Backward LSTM layer: processes the input sequences from the end to the beginning.

This dual processing allows the model to capture information from both past and future states, which is crucial for comprehending sequential data. By analyzing sequences bidirectionally, the BiLSTM can learn complex dependencies and relationships in time-series data more effectively than unidirectional models. In the encoder part, the encoder takes time series data from machinery as input. Then, the BiLSTM processes the input data in both forward and backward directions. This dual processing allows the model to capture the temporal dependencies more effectively, as it considers both past and future information for each time step. This is useful in industrial settings where the state of machinery is influenced by previous and subsequent states. After processing the input data, the BiLSTM generates hidden states that are then used to form a latent representation. This latent space is important for capturing the underlying patterns in normal operational behavior. The encoder outputs are passed through a variational layer that approximates a posterior distribution over the latent variables. The output from the BiLSTM encoder is typically fed into a re-parameterization layer, where two vectors are produced. The use of BiLSTM in the encoder can handle varying input lengths and capture long-range dependencies more effectively. This capability is crucial for upholding the integrity of temporal information when encoding input sequences into latent representations. The decoder begins by sampling from the latent space using the parameters generated by the encoder. This sampling is essential for generating new data points that mimic the input data distribution. A BiLSTM decoder interprets these latent representations and generates output sequences. Similar to the encoder, the BiLSTM decoder processes information bidirectionally, allowing it to consider both past outputs and future predictions when generating each step of the output sequence. This feature enhances its ability to produce coherent and contextually relevant output. Hence, the mathematical equations for the proposed model are listed as follows.

The goal is to reconstruct data for a specific minority class. Therefore, the training of the proposed model involves the inclusion of additional sample data associated with the designated class label 𝑏. During training, the network develops an optimal latent distribution corresponding to the particular class label, and the loss function of the VAE is computed by employing equation (6),

𝐿_vae(𝜙, 𝜃, 𝐴, 𝑏) = −log(𝑥ₜ) − 𝐷_KL[𝑄(𝐶 ∣ 𝐴, 𝑏) ‖ 𝑃(𝐶 ∣ 𝑏)] (6)

where 𝐿_vae(𝜙, 𝜃, 𝐴, 𝑏) denotes the variational lower bound of the VAE. The CE term used here is defined in equation (7),

𝐶𝐸(𝑥ₜ) = −log(𝑥ₜ) (7)

However, the traditional CE (Cross Entropy) loss used in the VAE does not possess the ability to optimize the latent distribution. Moreover, when CE is employed as the reconstruction loss in the context of imbalanced datasets, the majority class tends to dominate the loss calculation, which in turn skews the gradient updates during the training process.
Table 2
Parameters Employed in proposed model.
Parameter Value
Latent Dimension 64
Encoder LSTM Units 128, 64
Decoder LSTM Units 64, 128
Classifier Dense Units 64
Optimizer Adam (lr=1e-4, clip_value=1.0)
Batch Size 32
Epochs 30

Most importantly, the cross entropy function can be sensitive to outliers, and this factor can make the model become biased towards predicting the majority class, leading to high false negative rates for anomalies. There is also difficulty in balancing the loss components: in a VAE, the objective function typically includes both reconstruction loss and KL divergence, and the use of cross entropy can complicate the balance between them, since it can dominate the loss function if not properly scaled, potentially leading to suboptimal learning of latent representations. Therefore, in order to overcome these drawbacks, the proposed model utilizes a dynamic loss alongside the KL term by employing a tempering index (1 − 𝑥ₜ) with tuneable parameter 𝛾, overcoming the pitfalls encountered with the CE loss. The (1 − 𝑥ₜ) factor is applied to misclassified and true negative samples. The mathematical expression is derived in equation (8),

𝑇𝐼(𝑥ₜ) = −𝛼ₜ(1 − 𝑥ₜ)^𝛾 log(𝑥ₜ) (8)

Here, 𝛼 is used for handling the class imbalance issue, where

𝛼ₜ = 𝛼, if 𝑏 = 1; (1 − 𝛼), otherwise (9)

The weighting term is denoted as 𝛼ₜ, whose value is 𝛼 for the positive class and 1 − 𝛼 for the negative class. Therefore, the usage of 𝛼 balances the significance of the majority as well as the minority examples. 𝛾 is tailored to various classes depending on their imbalance characteristics, with the goal of reducing the relative errors for minority classes by emphasizing their significance. The hyperparameter 𝛾 influences the shape of the loss curve, allowing for targeted adjustments in the learning process. The primary purpose of the proposed dynamic loss is to minimize the error contribution from well-classified instances and amplify the error for those examples that would otherwise incur a low loss. Hence, the mathematical equation for the proposed dynamic loss is provided in equation (10),

𝐿_cflvae(𝜙, 𝜃, 𝐴, 𝑏) = −𝛼ₜ(1 − 𝑥ₜ)^𝛾 log(𝑥ₜ) − 𝐷_KL[𝑄(𝐶 ∣ 𝐴, 𝑏) ‖ 𝑃(𝐶 ∣ 𝑏)] (10)

Therefore, the proposed dynamic loss function treats different minority-class samples differently and learns the best distribution of the observed data. Table 2 showcases the parameters used in the model.

The table shows the parameters used in the proposed model, where the latent dimension refers to the size of the encoded representation of the input data produced by the encoder. A latent dimension of 64 indicates that the model will compress the input sequence into a fixed-length vector of size 64. The encoder consists of two LSTM layers with 128 and 64 units. The first layer (128 units) captures complex temporal patterns in the input sequence, while the second layer (64 units) refines this information, improving the performance of the model. Likewise, the optimizer used in the proposed work is the Adam optimizer, due to its efficiency in handling sparse gradients and noisy data. The learning rate (𝑙𝑟 = 1e−4) controls how much the weights are adjusted during training. The clipping value (clip_value=1.0) prevents exploding gradients by limiting the maximum value of gradients during back propagation. Likewise, the batch size used in the model is 32, which strikes a balance between computational efficiency and convergence stability, and the number of epochs chosen is 30, as 30 epochs allows sufficient iterations for learning without risking overfitting.

These parameters are used in the proposed model for building a superior anomaly detection model which performs both binary and multiclass classification efficaciously. The results obtained using the proposed model are demonstrated in the subsequent section.

4. Results and discussion

Results obtained using the proposed BiLSTM with VAE model are depicted in the corresponding section. Performance metrics, performance analysis and comparative analysis are carried out in this section.

4.1. Performance metrics

4.1.1. Accuracy
Accuracy is the proportion of total correct classifications. The accuracy is calculated using equation (11),

Acc = (𝑇𝑁 + 𝑇𝑃) / (𝑇𝑁 + 𝐹𝑁 + 𝑇𝑃 + 𝐹𝑃) (11)

where TN is True Negative, FN is False Negative, and TP and FP denote True Positive and False Positive, respectively.

4.1.2. Precision
Precision is the proportion of predicted positives that are correctly classified. It is estimated using equation (12),

Precision = 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑃) (12)

4.1.3. F-measure
The F1 score is the weighted harmonic mean of precision and recall. Equation (13) defines the formula employed for determining the F1 score,

F1-score = 2 × (𝑅 × 𝑃) / (𝑅 + 𝑃) (13)

where P denotes precision and R denotes recall.

4.1.4. Recall
Recall measures the proportion of correct positive classifications out of all actual positive instances. Equation (14) shows the mathematical model for recall,

Recall = 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑁) (14)

4.2. Performance analysis

Performance analysis is performed for assessing the efficacy of the model for anomaly detection in industrial applications using SKAB
formation into a more compact representation before passing it to the and TEP dataset.
decoder. Similar to the encoder, the decoder also has two LSTM layers.
The first layer has 64 units, which processes the encoded state from the 4.2.1. SKAB dataset
encoder and generates a sequence of hidden states for output generation. Subsequent section explores the confusion matrix, model accuracy,
The second layer has only 128 unit, indicating that it produces a single model loss and ROC curve for proposed model using SKAB dataset. Fig. 6
output at each time step. After processing through the decoder, a dense shows the confusion matrix of the proposed model.
layer with 64 units is used to classify the final output from the decoder’s Confusion matrix is employed for evaluating the performance of clas
hidden states. The choice of 128 units can balance the complexity and sfication model, where it typically consist of 4 different components
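The four components (TP, TN, FP, FN) and the metrics of equations (11)–(14) can be computed in a few lines. A minimal pure-Python sketch; the label vectors are illustrative, not taken from SKAB:

```python
# Confusion-matrix counts and the metrics of equations (11)-(14).
# Convention assumed here: 1 = anomaly (positive), 0 = normal.

def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    accuracy = (tn + tp) / (tn + fn + tp + fp)          # eq. (11)
    precision = tp / (tp + fp)                          # eq. (12)
    recall = tp / (tp + fn)                             # eq. (14)
    f1 = 2 * recall * precision / (recall + precision)  # eq. (13)
    return accuracy, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(metrics(*confusion_counts(y_true, y_pred)))  # (0.75, 0.75, 0.75, 0.75)
```

In practice a library routine (e.g. scikit-learn's `classification_report`) would be used, but the arithmetic is exactly that of equations (11)–(14).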
P. Vijai and B.S. P
Results in Engineering 25 (2025) 104277
Fig. 6. Confusion Matrix for SKAB dataset.

These components are TP, TN, FP, and FN: TP is a correct prediction of the positive class, TN a correct prediction of the negative class, FP an incorrect prediction of the positive class, and FN an incorrect prediction of the negative class. Here, rows represent the actual classes and columns depict the predicted labels. Overall, Fig. 6 illustrates that misclassifications are far fewer than correct classifications, showing that the model delivers a strong result.

Likewise, model accuracy is showcased in Fig. 7a. Model accuracy measures how well the model's predictions match the actual outcomes: the y-axis represents accuracy, the proportion of correct predictions, and the x-axis denotes the number of epochs. Initially, the training accuracy starts very low and improves steadily over the first 10 epochs; after about 15 epochs it plateaus near 1.0, showing that the model learns the training data almost perfectly. The validation accuracy closely follows the training accuracy curve during the initial epochs.

Model loss is illustrated in Fig. 7b. Model loss quantifies how well the proposed model's predictions align with the actual outcomes, focusing on the errors made during prediction: the y-axis indicates the loss, the difference between predicted and true values, and the x-axis represents the number of epochs. At the start, the training loss is extremely high; after the first epoch it drops sharply to near 0.

The ROC plot is demonstrated in Fig. 8. The ROC curve plots the TPR against the FPR at different threshold levels. The TPR, also known as recall, measures the proportion of actual positives correctly identified by the model, while the FPR indicates the proportion of actual negatives incorrectly identified as positives. The ROC curve therefore helps visualize the trade-off between sensitivity and specificity, assisting in the selection of an optimal classification threshold. On the SKAB dataset, the model achieves an accuracy of 0.98, a precision of 0.95, a recall of 0.96, and an F1 score of 0.96.

4.3. TEP dataset

As with binary classification, the results of multiclass classification on the TEP dataset are presented in this section. The confusion matrix for the proposed model is shown in Fig. 9.

The confusion matrix provides important insights into how the proposed model performs across the various classes. Most values are concentrated along the diagonal, indicating high accuracy for the majority of classes: for instance, there are 3648 correct predictions for Class 1 out of around 3700, and Class 20 has 2807 correct predictions. However, errors occur away from the diagonal, especially in classes with fewer instances, such as Class 4 (75 correct) and Class 9 (139 correct), indicating lower performance possibly due to imbalanced classes or similar features. There are also instances where neighboring classes are confused, such as Class 1 being misclassified as Classes 2, 3, or 20, and Class 12 (3220 correct) sometimes being misclassified as Classes 11 or 13. To enhance performance, it is essential to examine the difficulties faced by the poorly performing classes and tackle issues like data imbalance or feature overlap.

Model loss is highlighted in Fig. 10a. The model demonstrates efficient learning, as the training and validation losses decrease consistently until they stabilize around epochs 10 to 12, showing convergence. However, there is an increase in training loss at epoch 16, surpassing 30, potentially indicating problems such as weight initialization or optimizer instability. The validation loss, though, remains unchanged, suggesting this is probably a temporary variation. Following the spike, the training loss stabilizes again and closely matches the validation loss, indicating no signs of overfitting. Fig. 10b shows significant improvement in accuracy, with both the training and validation accuracy starting at about 0.5 and steadily increasing to around 0.9 by epoch 30, demonstrating effective learning without notable overfitting or underfitting. Interestingly, the validation accuracy slightly exceeds the training accuracy at certain points, indicating good generalization and minimal overfitting. The accuracy levels off after epoch 20, with minor fluctuations around 0.9, suggesting that the model has likely reached its maximum performance with the current setup.

The ROC curve for the TEP dataset is showcased in Fig. 11, where the dashed line represents a random classifier for which the TPR equals the FPR. The figure lists the 18 classes and their respective AUC scores; higher AUC values indicate better performance for a class. Many classes (classes 1, 2, 6, 7, and 14) have an AUC of 1.0, signifying that the proposed model classifies them perfectly. Even for the lowest-performing class (class 0), the AUC is still quite high at 0.89, indicating that the model has strong predictive capability across all classes. The overall performance metrics of the multiclass model are an accuracy of 0.92, a precision of 0.82, a recall of 0.92, and an F1 score of 0.85.

4.3.1. Comparative analysis

Although the proposed model delivers strong anomaly detection results for both binary and multiclass classification, it is important to compare its performance with state-of-the-art approaches to highlight the working mechanism of the proposed research.

Table 3
Performance Metrics of Different Models for SKAB dataset [41].

Table 3 reflects the metric values obtained by the proposed and state-of-the-art approaches for anomaly detection on the SKAB dataset. The lowest accuracy among the existing models is obtained by the LSTM model (0.35), which also records the lowest F1 score (0.45), showing its ineffectiveness in the anomaly detection process. However, when compared to the existing models, the proposed BiLSTM with VAE framework achieves the highest scores (accuracy 0.98, F1 score 0.96), confirming its superiority.
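The comparative gains reported above are driven largely by the dynamic loss of equations (8)–(10). Its tempering-index term can be sketched in a few lines of pure Python; the α and γ values below are illustrative assumptions, not the paper's settings, and the KL-divergence term of equation (10) is omitted:

```python
import math

def tempering_index_loss(x_t, b, alpha=0.25, gamma=2.0):
    """Tempering-index term of equation (8): x_t is the predicted
    probability of the true class, b = 1 marks the positive (anomaly)
    class. alpha balances class frequencies; gamma down-weights
    easily classified samples."""
    alpha_t = alpha if b == 1 else 1.0 - alpha   # equation (9)
    return -alpha_t * (1.0 - x_t) ** gamma * math.log(x_t)

# An easy, well-classified anomaly (x_t = 0.9) contributes far less
# loss than a hard, misclassified one (x_t = 0.1):
easy = tempering_index_loss(0.9, b=1)
hard = tempering_index_loss(0.1, b=1)
```

With γ = 0 and α = 0.5 the term reduces to plain scaled cross-entropy, which makes the contribution of the tempering index easy to isolate in an ablation.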
Table 4
Comparison of Model Performances Across Categories for TEP dataset [42].

Table 5
Comparison Analysis for TEP dataset for F1 score.

Table 6
Ablation Study on Performance Metrics.

Variant | Description | Accuracy (SKAB) | Accuracy (TEP)
Base Model | Original BiLSTM-VAE with dynamic loss weighting | 0.98 | 0.92
No First BiLSTM | Removed the first BiLSTM layer in the encoder | 0.96 | 0.89
No Second BiLSTM | Removed the second BiLSTM layer in the encoder | 0.97 | 0.90
No BiLSTMs | Removed both BiLSTM layers | 0.92 | 0.82
Half Latent Dimension | Reduced the latent space dimension by half | 0.97 | 0.91
Double Latent Dimension | Increased the latent space dimension by double | 0.98 | 0.92
No Dynamic Loss Weighting | Removed the dynamic loss weighting callback | 0.97 | 0.90
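The ROC analysis discussed earlier (Figs. 8 and 11) plots TPR against FPR across thresholds, and each class's AUC equals the probability that a randomly chosen positive sample receives a higher anomaly score than a randomly chosen negative one. A minimal pure-Python sketch of that rank statistic; the scores below are illustrative assumptions:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank statistic: the fraction
    of positive/negative pairs ranked correctly (ties count as half)."""
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos_scores for sn in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.75, 0.6]   # scores assigned to true anomalies
neg = [0.4, 0.3, 0.65, 0.2]   # scores assigned to normal samples
# A perfect separator yields AUC = 1.0; a random one about 0.5.
print(auc(pos, neg))  # 0.9375
```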
on a cloud-based environment. This is due to the ability of the cloud to facilitate real-time analysis in industrial settings.

CRediT authorship contribution statement

Praveen Vijai: Software, Methodology. Bagavathi Sivakumar P: Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Open source data are being used.

References

[1] R. Figliè, R. Amadio, M. Tyrovolas, C. Stylios, Ł. Paśko, D. Stadnicka, A. Carreras Coch, A. Zaballos, J. Navarro, D. Mazzei, Towards a taxonomy of industrial challenges and enabling technologies in industry 4.0, IEEE Access (2024).
[2] A. Khang, V. Abdullayev, V. Hahanov, V. Shah, Advanced IoT Technologies and Applications in the Industry 4.0 Digital Economy, CRC Press, 2024.
[3] F.J. Folgado, D. Calderón, I. González, A.J. Calderón, Review of industry 4.0 from the perspective of automation and supervision systems: definitions, architectures and recent trends, Electronics 13 (4) (2024) 782.
[4] K. Shriram, S.K. Karthiban, A.C. Kumar, S. Mathiarasu, P. Saleeshya, Productivity improvement in a paper manufacturing company through lean and iot – a case study, Int. J. Bus. Syst. Res. 17 (1) (2023) 97–119.
[5] S. Tyagi, N. Rastogi, A. Gupta, K. Joshi, Significant leap in the industrial revolution from industry 4.0 to industry 5.0: needs, problems, and driving forces, in: Management and Production Engineering Review, 2024.
[6] B. Wang, H. Ma, F. Wang, U. Dampage, M. Al-Dhaifallah, Z.M. Ali, M.A. Mohamed, An iot-enabled stochastic operation management framework for smart grids, IEEE Trans. Intell. Transp. Syst. 24 (1) (2022) 1025–1034.
[7] D. Singh, Dictionary of Mechanical Engineering, Springer, 2024.
[8] V. Bafandegan Emroozi, M. Kazemi, M. Doostparast, A. Pooya, Improving industrial maintenance efficiency: a holistic approach to integrated production and maintenance planning with human error optimization, Process Int. Opt. Sustain. 8 (2) (2024) 539–564.
[9] N.R. Palakurti, Challenges and future directions in anomaly detection, in: Practical Applications of Data Processing, Algorithms, and Modeling, IGI Global, 2024, pp. 269–284.
[10] A. Jaramillo-Alcazar, J. Govea, W. Villegas-Ch, Anomaly detection in a smart industrial machinery plant using iot and machine learning, Sensors 23 (19) (2023) 8286.
[11] S.F. Chevtchenko, E.D.S. Rocha, M.C.M. Dos Santos, R.L. Mota, D.M. Vieira, E.C. De Andrade, D.R.B. De Araújo, Anomaly detection in industrial machinery using iot devices and machine learning: a systematic mapping, IEEE Access 11 (2023) 128288–128305.
[12] A. Mishra, Scalable AI and Design Patterns: Design, Develop, and Deploy Scalable AI Solutions, Springer Nature, 2024.
[13] D. Kim, T.-Y. Heo, Anomaly detection with feature extraction based on machine learning using hydraulic system iot sensor data, Sensors 22 (7) (2022) 2479.
[14] N. Bao, Y. Fan, Z. Ye, A. Simeone, A machine vision-based pipe leakage detection system for automated power plant maintenance, Sensors 22 (4) (2022) 1588.
[15] M. Carratù, V. Gallo, S.D. Iacono, P. Sommella, A. Bartolini, F. Grasso, L. Ciani, G. Patrizi, A novel methodology for unsupervised anomaly detection in industrial electrical systems, IEEE Trans. Instrum. Meas. (2023).
[16] R. Anuradha, B. Swathi, A. Nagpal, P. Chaturvedi, R. Kalra, A.A. Alwan, Deep learning for anomaly detection in large-scale industrial data, in: 2023 10th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), vol. 10, IEEE, 2023, pp. 1551–1556.
[17] I. Ahmed, M. Ahmad, A. Chehri, G. Jeon, A smart-anomaly-detection system for industrial machines based on feature autoencoder and deep learning, Micromachines 14 (1) (2023) 154.
[18] A. Gholami, C. Qin, S. Pannala, A.K. Srivastava, F. Rahmatian, R. Sharma, S. Pandey, D-pmu data generation and anomaly detection using statistical and clustering techniques, in: 2022 10th Workshop on Modelling and Simulation of Cyber-Physical Energy Systems (MSCPES), IEEE, 2022, pp. 1–6.
[19] E.A. Hinojosa-Palafox, O.M. Rodríguez-Elías, J.H. Pacheco-Ramírez, J.A. Hoyo Montaño, M. Pérez-Patricio, D.F. Espejel-Blanco, A novel unsupervised anomaly detection framework for early fault detection in complex industrial settings, IEEE Access (2024).
[20] D. Ribeiro, L.M. Matos, G. Moreira, A. Pilastri, P. Cortez, Isolation forests and deep autoencoders for industrial screw tightening anomaly detection, Computers 11 (4) (2022) 54.
[21] D. Velásquez, E. Pérez, X. Oregui, A. Artetxe, J. Manteca, J.E. Mansilla, M. Toro, M. Maiza, B. Sierra, A hybrid machine-learning ensemble for anomaly detection in real-time industry 4.0 systems, IEEE Access 10 (2022) 72024–72036.
[22] N. Murugesan, A.N. Velu, B.S. Palaniappan, B. Sukumar, M.J. Hossain, Mitigating missing rate and early cyberattack discrimination using optimal statistical approach with machine learning techniques in a smart grid, Energies 17 (8) (2024) 1965.
[23] R. Sorostinean, Z. Burghelea, A. Gellert, Anomaly detection in smart industrial machinery through hidden Markov models and autoencoders, IEEE Access (2024).
[24] A. Almalaq, S. Albadran, M.A. Mohamed, An adoptive miner-misuse based online anomaly detection approach in the power system: an optimum reinforcement learning method, Mathematics 11 (4) (2023) 884.
[25] T. Klaeger, S. Gottschall, L. Oehm, Data science on industrial data—today's challenges in brownfield applications, Challenges 12 (1) (2021) 2.
[26] H.C. Altunay, Z. Albayrak, A hybrid cnn+lstm-based intrusion detection system for industrial iot networks, Eng. Sci. Technol. Int. J. 38 (2023) 101322.
[27] M. Hu, P. Xia, Industrial time-series signal anomaly detection based on g-lstm-ae model, in: International Conference on Artificial Intelligence in China, Springer, 2022, pp. 383–391.
[28] F. Khanmohammadi, R. Azmi, Time-series anomaly detection in automated vehicles using d-cnn-lstm autoencoder, IEEE Trans. Intell. Transp. Syst. (2024).
[29] P.K. Sebastian, K. Deepa, N. Neelima, R. Paul, T. Özer, A comparative analysis of deep neural network models in iot-based smart systems for energy prediction and theft detection, IET Renew. Power Gener. 18 (3) (2024) 398–411.
[30] D.H. Tran, V.L. Nguyen, H. Nguyen, Y.M. Jang, Self-supervised learning for time series anomaly detection in industrial internet of things, Electronics 11 (14) (2022) 2146.
[31] S. Dou, G. Zhang, Z. Xiong, Anomaly detection of process unit based on lstm time series reconstruction, CIESC J. 70 (2) (2019) 481.
[32] D. Sun, Y. Fan, G. Wang, Enhancing fault diagnosis in industrial processes through adversarial task augmented sequential meta-learning, Appl. Sci. 14 (11) (2024) 4433.
[33] Z. Li, J. Li, Y. Wang, K. Wang, A deep learning approach for anomaly detection based on sae and lstm in mechanical equipment, Int. J. Adv. Manuf. Technol. 103 (2019) 499–510.
[34] R. de Paula Monteiro, M.C. Lozada, D.R.C. Mendieta, R.V.S. Loja, C.J.A. Bastos Filho, A hybrid prototype selection-based deep learning approach for anomaly detection in industrial machines, Expert Syst. Appl. 204 (2022) 117528.
[35] P. Peng, H. Zhang, X. Wang, W. Huang, H. Wang, Imbalanced chemical process fault diagnosis using balancing gan with active sample selection, IEEE Sens. J. 23 (13) (2023) 14826–14833.
[36] I. Lomov, M. Lyubimov, I. Makarov, L.E. Zhukov, Fault detection in Tennessee Eastman process with temporal deep learning models, J. Ind. Inform. Int. 23 (2021) 100216.
[37] Y. Li, A fault prediction and cause identification approach in complex industrial processes based on deep learning, Comput. Intell. Neurosci. 2021 (1) (2021) 6612342.
[38] V. Pozdnyakov, A. Kovalenko, I. Makarov, M. Drobyshevskiy, K. Lukyanov, Adversarial attacks and defenses in fault detection and diagnosis: a comprehensive benchmark on the Tennessee Eastman process, IEEE Open J. Ind. Electron. Soc. (2024).
[39] S. Zhao, Y. Duan, N. Roy, B. Zhang, A novel fault diagnosis framework empowered by lstm and attention: a case study on the Tennessee Eastman process, Can. J. Chem. Eng. (2024).
[40] R. Verma, R. Yerolla, C.S. Besta, Deep learning-based fault detection in the Tennessee Eastman process, in: 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), IEEE, 2022, pp. 228–233.
[41] Y. Song, D. Li, Application of a novel data-driven framework in anomaly detection of industrial data, IEEE Access (2024).
[42] H. Xu, T. Ren, Z. Mo, X. Yang, A fault diagnosis model for Tennessee Eastman processes based on feature selection and probabilistic neural network, Appl. Sci. 12 (17) (2022) 8868.