
Deep Learning For Infrared Thermal Image Based Machine Health Monitoring

This document discusses using deep learning techniques like convolutional neural networks (CNNs) for machine health monitoring using infrared thermal (IRT) video. Specifically: 1) Current machine health monitoring requires experts to engineer features from sensor data, but deep learning can automatically learn features without domain expertise. 2) The authors apply CNNs to IRT video from two machine health monitoring applications - fault detection and oil level prediction - achieving 95% and 91.67% accuracy respectively. 3) They show CNNs can identify important regions in IRT images related to specific machine conditions, potentially providing new physical insights without detailed domain knowledge.


IEEE/ASME TRANSACTIONS ON MECHATRONICS, VOL. 23, NO. 1, FEBRUARY 2018 151

Deep Learning for Infrared Thermal Image Based Machine Health Monitoring
Olivier Janssens, Rik Van de Walle, Mia Loccufier, and Sofie Van Hoecke

Abstract—The condition of a machine can be identified automatically by creating and classifying features that summarize characteristics of measured signals. Currently, experts in their respective fields devise these features based on their knowledge. Hence, the performance and usefulness of these features depend on the expert's knowledge of the underlying physics or statistics. Furthermore, if new and additional conditions should be detectable, experts have to implement new feature extraction methods. To mitigate the drawbacks of feature engineering, a method from the subfield of feature learning, i.e., deep learning (DL), more specifically convolutional neural networks (NNs), is studied in this paper. The objective of this paper is to investigate if and how DL can be applied to infrared thermal (IRT) video to automatically determine the condition of the machine. By applying this method to IRT data in two use cases, i.e., machine-fault detection and oil-level prediction, we show that the proposed system is able to detect many conditions in rotating machinery very accurately (i.e., 95% and 91.67% accuracy for the respective use cases), without requiring any detailed knowledge about the underlying physics, and thus has the potential to significantly simplify condition monitoring using complex sensor data. Furthermore, we show that by using the trained NNs, important regions in the IRT images that are related to specific conditions can be identified, which can potentially lead to new physical insights.

Index Terms—Fault detection, machine learning algorithms, neural networks, preventive maintenance.

Manuscript received September 15, 2016; revised January 20, 2017 and March 21, 2017; accepted June 11, 2017. Date of publication July 3, 2017; date of current version February 14, 2018. Recommended by Technical Editor Y. Shen. This work was supported by the Vlaamse innovatiesamenwerkingsverband Operations and Maintenance (VIS O&M) Excellence project of Flanders Innovation and Entrepreneurship (VLAIO), and has been performed in the framework of the Offshore Wind Infrastructure Application Lab. (Corresponding author: Olivier Janssens.)

O. Janssens, R. Van de Walle, and S. Van Hoecke are with the Interuniversitair Micro-Elektronica Centrum (IMEC), Ghent University, Internet Technology and Data Science Lab (IDLab), Ghent 9000, Belgium.

M. Loccufier is with the Dynamical Systems and Control (DySC) Research Group, Ghent University, Zwijnaarde 9052, Belgium.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMECH.2017.2722479

1083-4435 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA. Downloaded on April 18,2023 at 07:58:13 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Condition monitoring (CM) of a machine and its components is crucial to avoid downtime and unnecessary costs, enhance the machine's lifetime, and improve safety by recognizing abnormal behavior of a machine or machine component. This generally implies comparing healthy and faulty situations indicated by processed measurements, obtained either manually by operators with portable devices or continuously through built-in sensors. From these measurements, a CM expert extracts (engineers) informative characteristics (features), which then have to be interpreted to determine the machine's condition. To automate CM, the streams of measurements need to be processed automatically, requiring a system that automatically extracts features from the streams of measurements and provides these to a machine learning algorithm that determines the machine's condition. Before such an algorithm can be used, it has to be trained to detect the different conditions.
1) First, a dataset of measurements is created, with accompanying labels that indicate the different machine conditions.
2) The algorithm is subsequently trained, using features extracted from the dataset, to detect the different conditions from streams of measurements.

A block diagram illustrating this process is given in Fig. 1. As can be seen in this figure, sensory data (X) are gathered together with the corresponding labels (Y), i.e., the machine's condition during the measurements. From the data, features (φ) are extracted that are subsequently given to a machine learning algorithm together with the labels (Y). The machine learning algorithm learns a model (θ) that can distinguish between the different conditions. When the system is put into production, new measurements are taken for which the conditions are unknown. To detect the condition, features are extracted from these measurements and fed to the trained model, which predicts the condition of the machine (Ŷ).

Fig. 1. Block diagram illustrating feature engineering (FE) and the training of a machine learning model for CM.

Currently, the features extracted from the measurements are engineered by CM experts. The FE process can be either data driven or model driven. Data-driven FE entails creating features that describe measurable characteristics that are related to a condition by observation, without taking the underlying physics into account. Model-driven FE requires reasoning about the underlying physics of the conditions to deduce which phenomena can occur and how to quantify them in features.
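To make the workflow of Fig. 1 concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes NumPy; `extract_features` is a hypothetical stand-in for the expert-engineered features φ (here only the mean and standard deviation of each recording), and a nearest-centroid rule (`NearestCentroid`, also hypothetical) stands in for the learned model θ. Any other classifier could take its place.

```python
import numpy as np


def extract_features(signal):
    """Hypothetical hand-engineered feature extractor (phi): mean and std."""
    signal = np.asarray(signal, dtype=float)
    return np.array([signal.mean(), signal.std()])


class NearestCentroid:
    """Illustrative stand-in for the learned model theta: one centroid per condition."""

    def fit(self, features, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features):
        # Euclidean distance from every sample to every class centroid.
        dist = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[dist.argmin(axis=1)]


def train(X, Y):
    """Training phase of Fig. 1: X = recordings, Y = condition labels."""
    phi = np.stack([extract_features(x) for x in X])
    return NearestCentroid().fit(phi, Y)


def predict(model, X_new):
    """Production phase of Fig. 1: features of unlabeled recordings -> Y_hat."""
    phi = np.stack([extract_features(x) for x in X_new])
    return model.predict(phi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic signals, invented for illustration only.
    healthy = [rng.normal(0.0, 1.0, 1000) for _ in range(5)]
    faulty = [rng.normal(0.0, 3.0, 1000) for _ in range(5)]
    model = train(healthy + faulty, ["healthy"] * 5 + ["faulty"] * 5)
    print(predict(model, [rng.normal(0.0, 3.0, 1000)]))
```

The split into `train` and `predict` mirrors the two phases of the block diagram; in an FE-based system, only `extract_features` would encode expert knowledge.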
Much research [1], [2] has already been devoted to understanding the underlying physics for vibration analysis, a commonly used technique for CM, resulting in many useful signal processing techniques and methodologies for model-driven FE. Data-driven FE has also been applied successfully to vibration data [3], [4], for example, by extracting well-known statistics from the vibration signals (e.g., mean, standard deviation, median, and skew).
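The statistics named above can be computed per signal as in the sketch below. It assumes NumPy; `statistical_features` is an illustrative name, and skewness is taken as the biased moment estimator, which may differ from the estimator used in [3], [4]. The resulting vector is one example of a data-driven feature vector φ.

```python
import numpy as np


def statistical_features(signal):
    """Data-driven features of a 1-D vibration signal.

    Returns [mean, standard deviation, median, skewness], where skewness is
    the biased moment estimator E[(x - mean)^3] / std^3 (0 for a constant signal).
    """
    x = np.asarray(signal, dtype=float)
    mean = x.mean()
    std = x.std()
    skew = np.mean((x - mean) ** 3) / std ** 3 if std > 0 else 0.0
    return np.array([mean, std, np.median(x), skew])
```

For a symmetric signal such as [1, 2, 3, 4, 5] the skewness is 0, whereas occasional large positive spikes (e.g., impacts) make it positive.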
As certain faults in rotating machinery, for example lubrication-related problems [5], remain difficult to detect using vibration signals alone, other types of sensors, like thermal imaging, can be considered. Interpreting the phenomena in the infrared thermal (IRT) images, however, requires substantial insight into the mechanics and thermodynamics of the systems. As complete physics-based modeling of a machine is difficult and requires a lot of knowledge, effort, and time [6], mainly data-driven FE has been applied for IRT-based CM [5], [7]–[9].

Despite the successful applications of FE, both in vibration and IRT data, an FE-based system might not perform optimally for two reasons.
1) FE depends on an expert with knowledge about mechanical engineering or statistics, who might not be able to devise features that fully describe the dynamics of the signals required for correct classification.
2) The knowledge required to create features might not be available yet.
Furthermore, a CM system is typically designed for a specific set of conditions; hence, when new conditions should be detectable, an expert has to implement new feature extraction capabilities in the system.

In order to circumvent these problems, feature learning (FL), also called representation learning, can be employed. As opposed to FE, wherein a human creates the features, FL uses a machine learning algorithm to learn and create useful features from raw data. Without human input, the algorithm learns features that optimally represent the raw data for the required task, and therefore has the potential to be more powerful than manually engineered features. It should be noted that FL is different from feature selection, which is used to select the most informative subset of features from all the available (manually engineered) features; there is no FL during feature selection. A schematic representation illustrating the difference between FE without feature selection, FE with feature selection, and FL can be seen in Fig. 2. FE takes input data (X) and extracts features (φ) that are subsequently used to train a classifier $f_{\theta}(\cdot)$ with learnable parameters (θ) that outputs predictions (Ŷ). When feature selection is applied, the most informative subset of features ($\psi \subseteq \phi$) is selected to train the classification algorithm. Conversely, FL does not extract features, but uses the raw input data and transforms it using $t_{\theta_1}(\cdot)$, wherein $\theta_1$ consists of the learnable parameters of the transformation. The transformation provides a new representation of the input data, better suited for the classification task. The transformation steps can be repeated many times, each with its own set of learnable parameters, in order to transform the data optimally for classification, i.e., optimal features are learned for the classification task.

Fig. 2. Schematic representation of FE without feature selection, with feature selection, and FL, respectively.

In recent years, FL has become very popular with the introduction of deep learning (DL) [10]. DL methods are representation-learning methods with multiple levels of representation, obtained by composing simple nonlinear modules that each transform the representation (of the data) at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. By composing multiple transformations, very complex functions can be learned for handling complex data such as images and video. DL is done using various types of deep neural networks (DNNs), wherein every layer learns a new representation of the data (i.e., learns features). DL achieves good results in image recognition, speech recognition, drug discovery, analysis of particle accelerator data, and natural language processing [10]. All these applications involve complex data with many variables, which is precisely where DL has proven successful. In previous work, we applied a DL technique, i.e., a convolutional neural network (CNN), to vibration signals to detect machine faults [11]. As vibration signals are one-dimensional (1-D), a shallow convolutional network sufficed. IRT data, however, have many variables (i.e., pixels), and are therefore a perfect fit for researching CNNs as a tool for CM. A vast amount of data is required to train a very deep NN. To mitigate this problem, we investigate transfer learning, a solution when a lot of data would otherwise be needed, and its usability for IRT data. We apply our proposed DNN to two use cases, namely machine-fault detection and oil-level prediction, and show that by using FL, i.e., not manually designing features but letting a DNN learn them, a significant gain in condition detection performance can be achieved. Finally, we show that the neural networks (NNs) actually focus on certain parts of the IRT images to make the classification decisions, which can potentially lead to new physical insights.

The remainder of the paper is organized as follows. In Section II, an overview of the recent progress regarding NNs for CM is given. In Section III, our DNN is discussed together with the


transfer learning approach, and insights into the decision-making process are visualized. In Sections IV and V, the two use cases to which the DNNs are applied are presented. Finally, in Section VI, a conclusion is provided.

II. NNS FOR CM

NNs have been used for many decades. However, most often they are used in combination with features engineered by an expert [12], [13]. In contrast, FL uses a raw representation of the input data and lets an algorithm learn and create a suitable representation of the data, i.e., features. An example of such a process using NNs is given in [14], wherein vibration spectrum images are created and given to NNs for rolling element bearing (REB) fault classification. Feature learning can be done using supervised and/or unsupervised methods. For REB-fault detection using vibration measurements, unsupervised methods using autoencoders have been used recently [15]. Autoencoders are NNs that are designed to replicate the given input. The NN has a single hidden layer containing fewer nodes than the input layer. The purpose of this hidden layer is to learn a compressed representation of the input data. An autoencoder is used to extract features that are given to a classification algorithm. It should be noted that many autoencoders can be stacked on top of each other to form a DNN. Each layer is trained individually, as training an entire DNN at once suffers from the vanishing gradient problem. During NN training, a (local) minimum of the error function is found by iteratively taking small steps (i.e., gradient descent) in the direction of the negative derivative of the error with respect to the network's weights (i.e., the gradients). To calculate these gradients, backpropagation is used. Backpropagation is, in essence, the chain rule applied to an NN. Hence, the gradient is propagated backward through each layer. With each subsequent layer, the magnitude of the gradients gets exponentially smaller (vanishes), making the steps also exponentially smaller, resulting in very slow learning of the weights in the lower (first) layers of a DNN. An important factor causing the gradients to shrink is the activation function derivatives (i.e., the derivative of a layer's output with respect to its input), which are multiplied together by the chain rule. When the sigmoid activation function is used in the network, the magnitude of the sigmoid derivative is well below one over the function's range (its maximum is 0.25), causing the gradient to vanish. To solve this problem, in 2012, Krizhevsky et al. [16] popularized another type of activation function, the rectified linear unit, whose derivative equals one for positive inputs and which therefore does not suffer from this problem. Hence, the vanishing gradient problem was mostly solved, enabling much deeper (supervised) NNs to be trained as a whole, resulting in many new state-of-the-art results.

An NN is commonly dense and fully connected, meaning that every neuron of a layer is connected to every neuron in the subsequent layer. Each connection is a weight, which results in many parameters. Such a large number of parameters is difficult to train, as the network will memorize the data (overfitting), especially when too little data is available. If possible, a partial solution to this problem is to gather more data; nevertheless, the training procedure will then take very long. Another partial solution is implicitly provided in CNNs [17]. CNNs are designed to deal with images and therefore exploit certain properties, i.e., local connectivity, weight sharing, and pooling, which result in a faster training phase and fewer parameters to train.
1) Local connectivity: When providing an image as input, instead of connecting every neuron in the first hidden layer to every pixel, a neuron is connected to a specific local region of pixels called a local receptive field. A local receptive field has a grid structure with a height (h), width (w), and depth (d) and is connected to a hidden neuron in the next layer. Such a local receptive field is slid across the input grid structure (i.e., the image). Each local receptive field is connected to a different hidden neuron, where each connection is again a weight.
2) Weight sharing: The weights (also called kernel or filter) form a grid structure equal in size to a local receptive field. Instead of having a unique set of weights for each location in the input grid structure, the weights are shared. As with any other image processing filter, the weights in a CNN extract features from the input. Due to weight sharing, the same feature can be extracted at different locations of the input. The output of such a transformation is called a feature map. It should be noted that in a CNN, every layer has multiple sets of weights so that a multitude of features can be extracted, resulting in multiple feature maps (k). Due to weight sharing, the number of weights in the NN is reduced.
3) Pooling: Pooling is done after a convolutional layer and reduces the dimension of the feature maps. It is applied by sliding a small window over the feature maps while extracting a single value from each region by the use of, for example, a max or mean operation. A feature map hence reduces in size, resulting in fewer parameters and fewer computations in subsequent layers.
For more information on CNNs, we refer the reader to [17].

Datasets for tasks in specialized fields are often very small compared to the amount of data required to train a DNN. Hence, DNNs will tend to overfit. To overcome this problem, pretrained networks can be used, which are NNs trained for another task for which a lot of data were available. In essence, the weights of the already trained network are repurposed for the new task. It has been shown that such an NN will have learned general features that can be used for other tasks [18], [19]. It has also been shown that NNs trained on images of everyday scenery can be repurposed and modified to be applicable to tasks that require domain-specific images, such as medical images [20] or aerial images [21]. The process of reusing and modifying a trained NN is called transfer learning. There are several methods to apply transfer learning [18], which are as follows.
1) Remove the last layer (k) or multiple layers (k−t, ..., k). Hence, by providing the modified pretrained NN with input samples, the network outputs intermediary abstract representations of the data that can be given to a new classifier, such as a support vector machine. The idea behind this approach is that the network has learned reusable features, which at a certain layer are useful for the task at hand, and that only a new classifier has to be trained using these reusable features.
2) In addition to removing one or more layers, it is also possible to attach new layers to the modified pretrained network. The idea behind this method is that the initial layers have learned useful weights, but that the subsequent layers have not. Hence, they have to be replaced and trained.
3) Following the above-mentioned method, one can choose to train only the newly added layers (using gradient descent in combination with backpropagation) in order to modify the weights of these new layers without modifying the weights of the transferred layers.
4) As opposed to training only the newly added layers, it is also possible to train the entire network, i.e., both the pretrained layers and the new layers. The idea behind this method is that neighboring layers coadapt during training, which can only be done when training all layers [18].
The application of transfer learning in this paper is discussed in the next section.

III. NN ARCHITECTURE

Images are complex data as they consist of many variables (pixels); hence, a deep network is required. However, we determined that the datasets we constructed in the two use cases contained too little data to properly train a DNN for the IRT data, and gathering enough data is infeasible. Hence, we research transfer learning for IRT data.

Various transfer learning methods were tested; however, the last option discussed in Section II, i.e., training both the pretrained and new layers, provided the best results. We opted to use a pretrained VGG network (an NN created by the Visual Geometry Group at the University of Oxford) [22], which achieves state-of-the-art results on the ImageNet dataset. The VGG network is a very deep CNN containing 16 layers, which was trained on natural images. The goal of the VGG network was to classify images into one of a thousand categories. The VGG network uses rectified linear activation functions in every layer except the last layer, which is a fully connected layer where softmax activation functions are used. A layer with softmax activation functions provides a probabilistic, mutually exclusive classification, i.e., it provides 1000 values ranging between 0 and 1 whose sum equals 1. Hence, it gives the probability of a sample belonging to each class.

For transfer learning purposes, the last layer of the VGG network was removed, as our dataset has fewer classes. A new fully connected layer was attached to the network. This new layer also uses softmax activation functions, but has fewer weights, as there are fewer classes to distinguish for the task at hand. In the end, this means that all layers of the VGG network except one (all pretrained) are reused in our network and solely the last layer is new. In Fig. 3, the architecture of the network can be seen.

Fig. 3. Architecture of the deep convolutional neural network for IRT CM. $C^{k}_{h \times w}$ denotes a convolutional layer with k feature maps and a receptive field of dimension h × w. P denotes a pooling layer. $D_n$ denotes a dense fully connected layer with n neurons. S denotes a softmax layer.

As has been demonstrated in other research, the fact that a network's layers have been trained using a certain type of images does not mean that transfer learning is impossible for totally different types of images. Hence, we hypothesize that a pretrained DNN, such as the VGG network, can be reused for machine condition detection using IRT images.

As the input layer of the VGG network is reused, our dataset has to be preprocessed in the same way as the data that were initially provided to the original VGG network; hence, the preprocessing described in [22] was applied. Images (i.e., frames) are preprocessed by removing the mean value. Next, smoothing is applied using a Gaussian kernel with a standard deviation of 3 pixels. Then, all frames are aligned to a common reference frame (i.e., image registration) and subsequently cropped to a width and height of 224 pixels.

Training is done using minibatch gradient descent, updating all the weights of the network, including the weights of the pretrained layers. However, the learning rate for the minibatch gradient descent algorithm should be smaller than the original learning rate, so that the already pretrained layers are influenced only minimally. Therefore, it was set to 1 × 10^−5. The network was trained using a minibatch size of 8 for 100 epochs.

A. Insights Into IRT Data

It is difficult to know where to look on an IRT image in order to detect a specific machine condition. NNs are nevertheless able to discover what is important in the images to make a decision regarding the conditions. Thus, it can be concluded that the necessary information is present in the thermal images. Extracting the regions in an image that are important for an NN, by applying the technique proposed by Zeiler et al. [23], can potentially lead to new physical insights. The Zeiler method has three steps that are iterated as follows.
1) The first step masks a part of the input image (i.e., a 7 × 7 square of pixels is set to a constant value).
2) In step two, the modified incomplete image is classified by the trained CNN. The CNN has softmax activation functions in the output layer, which give a probability for every possible class.
3) In the third step, the class probability corresponding to the correct class is saved in a matrix with the same dimensions as the image. The probability is stored at the location corresponding to the location that was masked in the original image.
These three steps are iterated so that every part of the image is masked once. The idea behind this method is that if an important and crucial part of the image is masked, the probability for the correct class will be low (i.e., closer to zero). Hence, if such a drop in probability is observed when a specific part of the image is masked, it can be concluded that said part of the


image is crucial for that particular class. An intuitive example is given in [23], where a CNN is trained to detect objects in natural images. One of the possible classes is "dog." Hence, if a picture of a dog is given to the NN with the face of the dog hidden by the mask, the probability for the class "dog" provided by the network will be much lower than when the dog's face is not masked.

In the next section, the first use case, in which the CNN presented above is applied, is discussed.

IV. USE CASE ONE: MACHINE-FAULT DETECTION

In this use case, IRT video is recorded for various conditions in a rotating machinery setup. The CNN uses the IRT data to detect the condition of an REB and the gradation of imbalance in the machine. Two separate datasets were created using the same setup but at separate moments in time. The conditions present in each dataset are listed in Table I and Table II, respectively.

TABLE I
SUMMARY OF THE EIGHT CONDITIONS IN DATASET ONE

REB condition | No imbalance | Imbalance: 13 g or 17.3 N
Healthy REB (HB) | Condition 1 | Condition 2
Outer-raceway fault (ORF) | Condition 3 | Condition 4
Mildly inadequately lubricated bearing | Condition 5 | Condition 6
Extremely inadequately lubricated bearing | Condition 7 | Condition 8

A. Setup

The setup can be seen in Fig. 4; the rotation speed was set to 25 Hz. The REB in the housing on the right-hand side of the setup is changed between test runs; hence, this is the housing that is monitored by the thermal camera. In addition to the IRT camera, two thermocouples are mounted to measure the ambient temperature.

The bearings used were spherical roller bearings. To imitate outer-raceway faults (ORF) in the REBs, three small shallow grooves were added mechanically on the REBs' outer raceway (see Fig. 5 for an example of such a groove). The ORF is placed at the 10 o'clock position in the housing (i.e., close to the top of the housing, facing the IRT camera) for dataset one and at the 6 o'clock position (i.e., the loaded zone) for dataset two.

Lubricant grease is added to every REB. The required amount of grease is 2.5 g, as discussed in [5]. Both the healthy bearings (HB) and those with an ORF are placed in a housing that contains a grease reservoir. The grease reservoir contains 20 g [24]. For the REBs with reduced lubricant in dataset one, i.e., the mildly inadequately lubricated bearing (MILB) and the extremely inadequately lubricated bearing (EILB), no grease reservoir is present. For the MILBs, the grease on each individual REB is superficially removed (1.5 g reduction). Similarly, for the EILBs, the grease in the REBs is decreased further (0.75 g reduction). For the hard particle faults in dataset two, 0.02 g of iron particles is mixed into the lubricant of the REBs. To complete the datasets, all the different REB conditions are also tested under imbalance; this is done by adding bolts to the rotor at a radius of 5.4 cm. The weight of the bolts can be seen in Tables I and II.

B. Dataset

To construct datasets that are large enough to validate the proposed methods, every condition in both datasets is created for five different REBs. By using multiple REBs, variability is introduced in the dataset due to manufacturing, mounting, and grease distribution. Each REB is run for 1 h, and the last 10 min, when steady state is reached, are captured by the IRT camera. For dataset one, in total, 5 REBs × 8 conditions = 40 IRT recordings are made. For dataset two, 5 REBs × 12 conditions = 60 recordings are made. It should also be noted that relative temperatures are used and not absolute temperatures. This is done by subtracting the temperature measured by the thermocouple from the temperatures measured by the thermal camera. For more information regarding dataset one, we refer the reader to [5]. It should be noted that accelerometer measurements were captured simultaneously with the IRT measurements to check the validity of the dataset.

C. Application of the CNN

To detect the multitude of conditions in the two datasets, two CNNs are required. One CNN, as described in Section III, is trained to detect the different REB conditions. The goal of a second CNN is to detect the gradation of imbalance; it is trained using differenced frames. It was determined that imbalance could not be detected by solely using spatial information of the heat distribution in the component. Temporal information is required to make the vibrations due to imbalance visible in the images. Hence, subsequent images are differenced (i.e., $I_{t-1} - I_t$). By differencing these frames, the movement in the images becomes visible, as described in [5].

By combining the outputs of the two CNNs (i.e., one focusing on the spatial aspects and one focusing on the temporal aspects), an overall classification can be done.

D. Results

To put the results of the CNN approach in perspective, they are compared to the FE approaches given in [5]. Accuracy [see (1)] is used as the metric for fault-detection performance. This metric specifies the ratio between the number of samples that are correctly classified and the total number of samples. The scores were determined using fivefold cross validation. For example, in one fold, the CNNs were trained on recorded data from REBs two, three, four, and five, and subsequently tested on data from REB one. This is done five times so that every bearing is in the test set once. For more information on this evaluation procedure, we refer the reader to [5].

$$\text{accuracy} = \frac{\text{Number of correctly classified frames}}{\text{Total number of frames that were classified}} \tag{1}$$

The results for both the FE-based approach and the FL-based approach on dataset one are listed in Table III. The detection of imbalance can be done perfectly (accuracy 100%) using either FE or FL. However, for the detection of the REB condition,

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA. Downloaded on April 18,2023 at 07:58:13 UTC from IEEE Xplore. Restrictions apply.
156 IEEE/ASME TRANSACTIONS ON MECHATRONICS, VOL. 23, NO. 1, FEBRUARY 2018

TABLE II
SUMMARY OF THE 12 CONDITIONS IN DATASET TWO

REB condition                      No imbalance   Imbalance: 4.1 g or 5.5 N   Imbalance: 9.3 g or 12.4 N   Imbalance: 13 g or 17.3 N
Healthy REB (HB)                   Condition 1    Condition 2                 Condition 3                  Condition 4
Outer-raceway fault (ORF)          Condition 5    Condition 6                 Condition 7                  Condition 8
Hard particle contamination (HP)   Condition 9    Condition 10                Condition 11                 Condition 12
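The preprocessing text states that subsequent images are differenced ($I_{t-1} - I_t$) and that the outputs of a spatial and a temporal CNN are combined. The Python sketch below illustrates both steps for dataset two. It assumes each recording is a NumPy array of shape (T, H, W). The fusion rule is an assumption: here the spatial CNN supplies the REB condition and the temporal CNN supplies the imbalance level, which simply mirrors the row and column layout of Table II. The helper names `difference_frames` and `table_ii_condition` are hypothetical.

```python
import numpy as np

# Row and column order of Table II (dataset two).
REB_CONDITIONS = ["HB", "ORF", "HP"]
IMBALANCE_MASSES_G = [0.0, 4.1, 9.3, 13.0]  # 0.0 means no imbalance

def difference_frames(frames):
    """Return I_{t-1} - I_t for every pair of consecutive frames.

    frames: array-like of shape (T, H, W); the result has shape (T - 1, H, W).
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames of shape (H, W)")
    # frames[:-1] holds I_{t-1} and frames[1:] holds I_t
    return frames[:-1] - frames[1:]

def table_ii_condition(reb_label, imbalance_index):
    """Hypothetical fusion step: map the spatial CNN's REB label and the
    temporal CNN's imbalance index to a condition number of Table II."""
    row = REB_CONDITIONS.index(reb_label)  # raises ValueError for unknown labels
    if not 0 <= imbalance_index < len(IMBALANCE_MASSES_G):
        raise ValueError("imbalance_index out of range")
    return row * len(IMBALANCE_MASSES_G) + imbalance_index + 1
```

For example, `table_ii_condition("ORF", 2)` returns 7, the ORF bearing with the 9.3 g imbalance. Static regions cancel in the differenced frames, so what remains is mostly the frame-to-frame motion caused by imbalance vibrations.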

TABLE IV
RESULTS OF BOTH THE FE- AND FL-BASED APPROACH ON DATASET TWO

Method   Conditions             Accuracy
FE       HP, ORF, HB            65.00% (σ = 16.16%)
FL       HP, ORF, HB            98.33% (σ = 3.33%)
FE       Imbalance gradation    88.33% (σ = 12.47%)
FL       Imbalance gradation    93.33% (σ = 9.72%)
FE       All 12 conditions      55.00% (σ = 11.31%)
FL       All 12 conditions      91.67% (σ = 9.13%)

σ denotes the standard deviation.
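The column headers of Table II give each imbalance both as an added mass and as a force. If those forces are centrifugal forces $F = m r \omega^2$ at the 25 Hz shaft speed (an assumption; the text does not say how they were derived), the radius at which the added mass acts can be back-calculated. The Python sketch below does this. The radius is not stated anywhere in the text and is only inferred here; `implied_radius` and `centrifugal_force` are hypothetical helpers.

```python
import math

ROTATION_HZ = 25.0  # shaft speed stated for the setup
# (added mass in g, force in N) pairs from the column headers of Table II
TABLE_II_IMBALANCE = [(4.1, 5.5), (9.3, 12.4), (13.0, 17.3)]

def centrifugal_force(mass_kg, radius_m, freq_hz):
    """F = m * r * omega^2 for a point mass rotating at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    return mass_kg * radius_m * omega ** 2

def implied_radius(mass_g, force_n, freq_hz=ROTATION_HZ):
    """Solve F = m * r * omega^2 for r, given the mass in grams."""
    omega = 2.0 * math.pi * freq_hz
    return force_n / ((mass_g / 1000.0) * omega ** 2)

for mass_g, force_n in TABLE_II_IMBALANCE:
    r_mm = implied_radius(mass_g, force_n) * 1000.0
    print(f"{mass_g:4.1f} g -> {force_n:4.1f} N implies r = {r_mm:.1f} mm")
```

Each of the three pairs implies a radius of about 54 mm (54.4, 54.0, and 53.9 mm), so under this assumption the tabulated forces are mutually consistent and scale linearly with the added mass, which is what the "imbalance gradation" rows of Table IV distinguish.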

FL achieves better results (about 7 percentage points higher accuracy: 95.00% versus 88.25%). Because both approaches detect imbalance perfectly, the FL approach also provides an approximately 7-percentage-point better result for all eight conditions together.

Fig. 4. 3-D image of the setup. The labels are: (1) servomotor; (2) coupling; (3) bearing housing; (4) bearing; (5) disk; (6) shaft; (7) thermocouple; and (8) metal plate. The red square indicates the area that the IRT camera records.
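Eq. (1) and the fivefold, leave-one-bearing-out evaluation described in the results text can be sketched in plain Python. This is a minimal sketch, assuming frame-level labels and a bearing ID for every frame; the helper names are hypothetical, and using the population standard deviation (`statistics.pstdev`) for σ is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

def accuracy(y_true, y_pred):
    """Eq. (1): correctly classified frames divided by all classified frames."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def leave_one_bearing_out(bearing_ids):
    """Yield (held_out, train_idx, test_idx), one fold per bearing, so that
    every bearing appears in the test set exactly once."""
    for held_out in sorted(set(bearing_ids)):
        train_idx = [i for i, b in enumerate(bearing_ids) if b != held_out]
        test_idx = [i for i, b in enumerate(bearing_ids) if b == held_out]
        yield held_out, train_idx, test_idx

def summarize(fold_accuracies):
    """Mean and standard deviation over folds (population std is an assumption)."""
    return mean(fold_accuracies), pstdev(fold_accuracies)
```

With five bearings, `leave_one_bearing_out` yields five folds. Training on `train_idx`, computing `accuracy` on `test_idx`, and passing the five fold accuracies to `summarize` gives a mean and σ in the form reported in Tables III and IV. Splitting by bearing rather than by frame keeps frames of the same bearing out of both training and test sets.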
Results for both the FE-based approach and the FL-based approach on dataset two are listed in Table IV. FL provides substantially better results both for detecting the imbalance gradation (93.33% versus 88.33%) and for detecting the specific REB condition (98.33% versus 65.00%). For all 12 conditions together, the FL approach achieves an accuracy about 37 percentage points higher than the FE approach (91.67% versus 55.00%).
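The improvements quoted in the text are absolute differences in percentage points between the FL and FE accuracies, not relative gains. The short Python snippet below, which uses only the "all conditions" accuracies reported in Tables III, IV, and V, makes that distinction explicit; the dictionary layout is purely illustrative.

```python
# "All conditions" accuracies (%) as (FE, FL) pairs from Tables III, IV, and V.
RESULTS = {
    "dataset one (Table III)": (88.25, 95.00),
    "dataset two (Table IV)": (55.00, 91.67),
    "use case two (Table V)": (80.00, 86.67),
}

for name, (fe, fl) in RESULTS.items():
    points = fl - fe                  # absolute gain in percentage points
    relative = 100.0 * points / fe    # gain relative to the FE baseline
    print(f"{name}: +{points:.2f} points, +{relative:.1f}% relative")
```

The smallest absolute gain (6.67 points, use case two) and the largest (36.67 points, dataset two) are the figures summarized in the conclusion; relative to the FE baseline, the dataset two gain is about 67%.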
In general, the CNN approach achieves high accuracy on both datasets without requiring expert knowledge about the problem. A downside, however, is that NNs are black-box systems: their inner workings are not interpretable by humans. Nevertheless, insights can be derived from NNs using the method described in Section III-A.
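The interpretation method itself is described in Section III-A, and the conclusion attributes it to Zeiler et al. [23]. The Python sketch below shows one common variant of that idea, occlusion sensitivity: a constant patch is slid over the thermal image and the drop in the predicted probability of the target class is recorded, then rescaled to [0, 1] so that values close to 1 mark the most important regions, as in Fig. 6. It assumes a single-channel (H, W) frame and a `predict_proba` callable that returns class probabilities; the patch size, stride, fill value (the image mean), and the clipping of probability increases are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def occlusion_map(image, predict_proba, target_class, patch=8, stride=4, fill=None):
    """Occlusion sensitivity map for one class, rescaled to [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim != 2:
        raise ValueError("expected a single-channel (H, W) image")
    fill_value = image.mean() if fill is None else fill
    baseline = predict_proba(image)[target_class]
    height, width = image.shape
    drop = np.zeros_like(image)
    counts = np.zeros_like(image)
    for top in range(0, max(height - patch, 0) + 1, stride):
        for left in range(0, max(width - patch, 0) + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill_value
            delta = baseline - predict_proba(occluded)[target_class]
            drop[top:top + patch, left:left + patch] += delta
            counts[top:top + patch, left:left + patch] += 1
    # Average over overlapping patches; pixels never covered stay 0.
    drop = np.divide(drop, counts, out=np.zeros_like(drop), where=counts > 0)
    drop = np.clip(drop, 0.0, None)  # keep only regions whose occlusion lowers the score
    peak = drop.max()
    return drop / peak if peak > 0 else drop
```

Each patch position costs one forward pass, so the number of passes grows roughly with (H/stride)·(W/stride); a larger stride trades spatial resolution of the map for speed.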
In Fig. 6, the output of this method is visualized for the six bearing conditions. The panels indicate which regions of the IRT image are important for each condition. For example, to identify whether an REB is extremely inadequately lubricated, the area around the seal is very important [see Fig. 6(c)], possibly because of heat generated by increased friction between the shaft and the seal. Another example is the large highlighted area for an ORF at the 10 o'clock position [see Fig. 6(d)]. Because the ORF faces the camera inside the housing, a possible increase in heat is observable in this area. In general, these locations can help link the classification to the underlying physics and can potentially lead to new insights. However, further research is needed to relate each highlighted image region to the specific underlying physical phenomenon.

Fig. 5. Three shallow grooves in the outer raceway of a bearing, simulating an ORF.

TABLE III
RESULTS OF BOTH THE FE- AND FL-BASED APPROACH ON DATASET ONE

Method   Conditions               Accuracy
FE       MILB, EILB, HB, ORF      88.25% (σ = 8.07%)
FL       MILB, EILB, HB, ORF      95.00% (σ = 6.12%)
FE       Balance and imbalance    100.0% (σ = 0.00%)
FL       Balance and imbalance    100.0% (σ = 0.00%)
FE       All eight conditions     88.25% (σ = 8.07%)
FL       All eight conditions     95.00% (σ = 6.12%)

σ denotes the standard deviation.

When testing our method on an Nvidia GeForce GTX TITAN X, 122.26 frames/s can be processed with a standard deviation of 7.27 frames/s, showing that the presented method can be used for real-time CM.
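The reported 122.26 frames/s corresponds to roughly 8.2 ms per frame. The text does not describe how throughput was measured; the Python harness below is a minimal sketch, assuming throughput is averaged over repeated passes through a set of frames. `measure_throughput` is a hypothetical helper, and `process_frame` stands in for whatever per-frame inference the model performs.

```python
import statistics
import time

def measure_throughput(process_frame, frames, repeats=5):
    """Mean and standard deviation of frames/s over `repeats` timed passes.

    process_frame: any callable applied to one frame (e.g., a CNN forward pass).
    """
    if repeats < 2 or len(frames) == 0:
        raise ValueError("need at least two repeats and one frame")
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for frame in frames:
            process_frame(frame)
        elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero timer delta
        rates.append(len(frames) / elapsed)
    return statistics.mean(rates), statistics.stdev(rates)

# Per-frame time budget implied by the reported mean rate of 122.26 frames/s.
MS_PER_FRAME = 1000.0 / 122.26
```

When timing a GPU model, `process_frame` should block until the device has finished (in PyTorch, for example, by calling `torch.cuda.synchronize()`); otherwise asynchronous kernel launches make the measured rate look higher than it is.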

JANSSENS et al.: DEEP LEARNING FOR INFRARED THERMAL IMAGE BASED MACHINE HEALTH MONITORING 157

Fig. 6. Regions that influence the CNN's output for (a) HB, (b) MILB, (c) EILB, (d) ORF at the 10 o'clock position, (e) ORF at the loaded zone, and (f) hard particles. The closer a value is to 1, the more important the region is for the respective class.

V. USE CASE TWO: OIL LEVEL PREDICTION

The second use case deals with predicting the oil level in an REB without having to shut down the machinery.

A. Setup and Dataset
The setup is shown in Fig. 7. The main difference from the setup of use case one is that a much larger REB (a cylindrical roller bearing) and a recirculating oil lubrication system are used. Furthermore, a static load of 5000 N was applied, and the rotation speed, oil flow rate, and oil temperature were varied between test runs. The room temperature was controlled and set to a constant 23 °C. In total, 30 recordings were made at various rotation speeds, flow rates, and oil temperatures. As opposed to use case one, only one REB is used in this use case. The main body of the REB cover is made of stainless steel. On the left-hand side of the cover, a small plexiglass window was added to visually monitor the oil level and provide ground-truth (i.e., labeled) data. However, in the preprocessing phase the


Fig. 7. Image of the setup used: (1) bearing, (2) hydrostatic pad to apply radial load on the bearing, (3) pneumatic muscle for loading the bearing, (4) force cell for friction torque, and (5) temperature measurements.

TABLE V
RESULTS OF BOTH THE FE- AND FL-BASED APPROACH IN USE CASE TWO

Method   Conditions       Accuracy
FE       Full, not full   80.00%
FL       Full, not full   86.67%

plexiglass part is removed from the IRT image. For more information on the setup and dataset, we refer the reader to [25].
The goal is to let the CNN described in Section II automatically determine whether the REB is full of oil or not, as humans cannot determine this visually from the IRT images. The same training and preprocessing procedures as described for use case one are applied.
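The text states only that the plexiglass part is removed from the IRT image during preprocessing. Since the window sits on the left-hand side of the cover, one simple option is to crop away the image columns it occupies; the Python sketch below does that with NumPy. The `window_cols` range is a hypothetical parameter that would have to be measured for the actual camera placement, and cropping (rather than, say, masking the region with a constant value) is an assumption.

```python
import numpy as np

def remove_window(frame, window_cols):
    """Drop the columns of an (H, W) thermal frame covered by the plexiglass window.

    window_cols: hypothetical (start, stop) column range of the window.
    """
    frame = np.asarray(frame)
    start, stop = window_cols
    if frame.ndim != 2 or not 0 <= start < stop <= frame.shape[1]:
        raise ValueError("expected an (H, W) frame and a valid column range")
    return np.delete(frame, np.s_[start:stop], axis=1)
```

Cropping keeps only stainless-steel surface in the frame, so the CNN cannot learn the oil level from the window itself, which would defeat the purpose of predicting it from thermal information alone.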
B. Results
The accuracy score was determined using leave-one-out cross validation, as the variability in conditions in the dataset is rather large for the number of samples. To put the FL results in perspective, an FE-based approach similar to the one discussed in [5], which uses general statistical features, is also evaluated. The results are listed in Table V. The FL-based approach provides better results (6.67 percentage points higher accuracy). Not only does FL provide better results, it also does not require an expert to engineer features.

The search for the image regions responsible for the classification result yielded Fig. 8(a) and (b) for an REB full of oil and an REB not full of oil, respectively. In contrast to use case one, the underlying physics of these images can be interpreted. To detect whether an REB is full of oil, the top side of the REB is important, which is to be expected, as this part will be hotter when the REB is full of oil. Conversely, to detect whether the REB is not full, the bottom part of the REB is important, as only this part will be significantly warmer when the REB is not full of oil.

Fig. 8. Regions that influence the CNN's output for an REB (a) full of oil and (b) not full of oil.

VI. CONCLUSION
In this paper, it is shown that CNNs, an FL tool, can be used to detect various machine conditions. The advantage of FL is that no FE, and thus no expert knowledge, is required. FE can also result in a suboptimal system, especially when the data are very complex, as is the case for thermal infrared imaging data.

DNNs, such as CNNs, require a vast amount of data to train. To mitigate this problem, we investigated transfer learning, a method to reuse layers of a pretrained DNN. We show that by using transfer learning, wherein layers of a CNN trained on natural images are repurposed, the CNN outperforms classical FE in both the machine-fault detection and the oil-level prediction use cases. For both use cases, the FL approach provides an accuracy at least 6.67 percentage points higher than the FE approach, with an improvement of up to 37 percentage points for dataset two of use case one.

Finally, as it is difficult to know where in the image to look to detect a certain condition, we show that by applying the method


of Zeiler et al. [23] to the trained CNNs, valuable insights into the important regions of the thermal images can be obtained, potentially leading to new physical insights.

The presented method has the potential to improve online CM in, for example, offshore wind turbines. The maintenance costs for offshore wind turbines are very high due to their limited accessibility. Installing an IRT camera in the offshore wind turbine's nacelle, combined with the presented method, allows for online CM. Another potential application is the monitoring of bearings in manufacturing lines. Using thermal imaging together with the method of Zeiler et al. applied to the trained CNN makes it possible to identify the location of faults in the manufacturing lines.

REFERENCES
[1] W. A. Smith and R. B. Randall, "Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study," Mech. Syst. Signal Process., vol. 64, pp. 100–131, 2015.
[2] I. El-Thalji and E. Jantunen, "A summary of fault modelling and predictive health monitoring of rolling element bearings," Mech. Syst. Signal Process., vols. 60–61, pp. 252–272, 2015.
[3] R. Heng and M. Nor, "Statistical analysis of sound and vibration signals for monitoring rolling element bearing condition," Appl. Acoust., vol. 53, no. 1–3, pp. 211–226, 1998.
[4] Y. L. Murphey, M. A. Masrur, Z. Chen, and B. Zhang, "Model-based fault diagnosis in electric drives using machine learning," IEEE/ASME Trans. Mechatronics, vol. 11, no. 3, pp. 290–303, Jun. 2006.
[5] O. Janssens et al., "Thermal image based fault diagnosis for rotating machinery," Infrared Phys. Technol., vol. 73, pp. 78–87, 2015.
[6] W. Moussa, "Thermography-assisted bearing condition monitoring," Ph.D. dissertation, Dept. Mech. Eng., University of Ottawa, Ottawa, ON, Canada, 2014.
[7] A. Widodo, D. Satrijo, T. Prahasto, G.-M. Lim, and B.-K. Choi, "Confirmation of thermal images and vibration signals for intelligent machine fault diagnostics," Int. J. Rotating Mach., vol. 2012, pp. 1–10, 2012.
[8] V. T. Tran, B.-S. Yang, F. Gu, and A. Ball, "Thermal image enhancement using bi-dimensional empirical mode decomposition in combination with relevance vector machine for rotating machinery fault diagnosis," Mech. Syst. Signal Process., vol. 38, no. 2, pp. 601–614, Jul. 2013.
[9] G.-M. Lim, Y. Ali, and B.-S. Yang, The Fault Diagnosis and Monitoring of Rotating Machines by Thermography, J. Mathew, L. Ma, A. Tan, M. Weijnen, and J. Lee, Eds. London, U.K.: Springer, 2012, pp. 557–565.
[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[11] O. Janssens et al., "Convolutional neural network based fault detection for rotating machinery," J. Sound Vib., vol. 377, pp. 331–345, 2016.
[12] B. Li, M.-Y. Chow, Y. Tipsuwan, and J. C. Hung, "Neural-network-based motor rolling bearing fault diagnosis," IEEE Trans. Ind. Electron., vol. 47, no. 5, pp. 1060–1069, Oct. 2000.
[13] Z. Chen, C. Li, and R.-V. Sanchez, "Gearbox fault identification and classification with convolutional neural networks," Shock Vib., vol. 2015, pp. 1–10, 2015.
[14] M. Amar, I. Gondal, and C. Wilson, "Vibration spectrum imaging: A novel bearing fault classification approach," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 494–502, Jan. 2015.
[15] N. Verma, V. Gupta, M. Sharma, and R. Sevakula, "Intelligent condition based monitoring of rotating machines using sparse auto-encoders," in Proc. IEEE Conf. Progn. Health Manage., 2013, pp. 1–7.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[18] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3320–3328.
[19] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2014, pp. 806–813.
[20] W. Zhang et al., "Deep model based transfer and multi-task learning for biological image analysis," in Proc. 21th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 1475–1484.
[21] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, "Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery," Remote Sens., vol. 7, no. 11, pp. 14680–14707, 2015.
[22] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–14.
[23] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Computer Vision – ECCV 2014 (ser. Lecture Notes in Computer Science). New York, NY, USA: Springer, 2014, pp. 818–833.
[24] Schaeffler, "FAG split plummer block housings of series SNV," pp. 1–84, 2015. [Online]. Available: http://www.schaeffler.com/remotemedien/media/_shared_media/08_media_library/01_publications/schaeffler_2/tpi/downloads_8/tpi_175_de_en.pdf
[25] O. Janssens, M. Rennuy, S. Devos, M. Loccufier, R. Van de Walle, and S. Van Hoecke, "Towards intelligent lubrication control: Infrared thermal imaging for oil level prediction in bearings," in Proc. IEEE Multi-Conf. Syst. Control, 2016, pp. 1330–1335.

Olivier Janssens received the master's degree in industrial engineering focusing on information and communication technology from the University College of West Flanders, Kortrijk, Belgium, in 2012.
Following his studies, he joined the IDLab with the Department of Electronics and Information Systems, Ghent University–Interuniversitair Micro-Elektronica Centrum (IMEC), Ghent, Belgium, in order to research multisensor data-driven condition monitoring methods.

Rik Van de Walle received the M.Sc. and Ph.D. degrees in engineering from Ghent University, Ghent, Belgium, in 1994 and 1998, respectively.
After a visiting scholarship at the University of Arizona, Tucson, AZ, USA, he returned to Ghent University, where he became a Professor of multimedia systems and applications, and the Head of the Multimedia Lab. His research interests include multimedia content delivery, presentation and archiving, coding and description of multimedia data, content adaptation, and interactive (mobile) multimedia applications.

Mia Loccufier received the M.S. degree in electromechanical engineering, the M.S. degree in automatic control engineering, and the Ph.D. degree in electromechanical engineering from Ghent University, Ghent, Belgium.
She is a Professor with the DySC Research Group, Department of Electrical Energy, Systems, and Automation, Faculty of Engineering, Ghent University, where she is a Lecturer of mechanical vibrations, structural dynamics, and systems dynamics. Her research interests include the dynamics of technical systems, passive control, especially nonlinear tuned mass dampers of mechanical systems and structures, dynamics of rotating machinery, stability and bifurcation analysis of nonlinear systems and structures, and control of underactuated mechanical systems.

Sofie Van Hoecke received the master's degree in computer science from Ghent University, Ghent, Belgium, in 2003, and the Ph.D. degree in computer science engineering from the Department of Information Technology, Ghent University, in 2009.
She is currently an Assistant Professor with Ghent University and a Senior Researcher with the IDLab, Ghent University–IMEC. Her research interests include the design of multisensor architectures, Quality of Service (QoS)-brokering of novel services, innovative Information and Communication Technology (ICT) solutions for care, and multisensor condition monitoring.
