
Journal of Intelligent Manufacturing (2022) 33:1139–1163

https://doi.org/10.1007/s10845-021-01892-y

Machine learning with domain knowledge for predictive quality monitoring in resistance spot welding

Baifan Zhou1,2,3 · Tim Pychynski1 · Markus Reischl2 · Evgeny Kharlamov3,4 · Ralf Mikut2

Received: 10 July 2021 / Accepted: 6 December 2021 / Published online: 2 March 2022
© The Author(s) 2022

Abstract
Digitalisation trends of Industry 4.0 and the Internet of Things have led to an unprecedented growth of manufacturing data. This opens new horizons for data-driven methods, such as Machine Learning (ML), in the monitoring of manufacturing processes. In this work, we propose ML pipelines for quality monitoring in Resistance Spot Welding. Previous approaches mostly focused on estimating welding quality based on data collected from laboratory or experimental settings. Moreover, they mostly treated welding operations as independent events, although welding is a continuous process with systematic dynamics and production cycles caused by maintenance. Besides, model interpretation based on engineering know-how, which is an important and common practice in the manufacturing industry, has mostly been ignored. In this work, we address these three issues by developing a novel feature-engineering-based ML approach. Our method was developed on top of real production data. It makes it possible to analyse sequences of welding instances collected from running manufacturing lines. By capturing dependencies across sequences of welding instances, our method can predict the quality of upcoming welding operations before they happen. Furthermore, we strive to combine the views of engineering and data science by discussing characteristics of welding data that have received little attention in the literature, by designing sophisticated feature engineering strategies with the support of domain knowledge, and by interpreting the results of the ML analysis intensively to provide insights for engineering. We developed 12 ML pipelines along two dimensions: settings of feature engineering and ML methods, where we considered 4 feature settings and 3 ML methods (linear regression, multi-layer perceptron and support vector regression). We extensively evaluated our ML pipelines on data from two running industrial production lines with 27 welding machines, with promising results.

Keywords Condition monitoring · Quality monitoring · Machine learning · Resistance spot welding · Predictive maintenance ·
Feature engineering · Industry 4.0
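The 12-pipeline grid described in the abstract (4 feature settings × 3 ML methods) can be sketched with scikit-learn. This is an illustrative enumeration only: the feature-setting names `FS1`–`FS4` are placeholders, not the paper's actual settings, and the hyperparameters are assumptions (the paper's MLP has one hidden layer).

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Placeholder names for the paper's four feature-engineering settings.
feature_settings = ["FS1", "FS2", "FS3", "FS4"]

# The three classic ML methods named in the abstract; the hyperparameters
# here are illustrative assumptions.
ml_methods = {
    "LR": LinearRegression(),
    "MLP": MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000),
    "SVR": SVR(kernel="rbf"),
}

# One pipeline per (feature setting, ML method) combination: 4 x 3 = 12.
pipelines = {
    (fs, name): make_pipeline(StandardScaler(), estimator)
    for fs in feature_settings
    for name, estimator in ml_methods.items()
}
print(len(pipelines))  # 12
```

Each pipeline would then be fitted on the feature matrix produced by its feature setting and evaluated on held-out welds, which is how the paper arrives at 12 models per dataset.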

Baifan Zhou: [email protected]
1 Bosch Corporate Research, 71272 Renningen, Germany
2 Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
3 SIRIUS Centre, University of Oslo, 0316 Oslo, Norway
4 Bosch Center for Artificial Intelligence, 71272 Renningen, Germany

Introduction

Technological advances in trends of Industry 4.0 (Kagermann 2015) and the Internet of Things (ITU 2012), including technologies in sensing, communication, information processing, and actuation, have opened horizons of new opportunities and challenges to change the paradigm of many industrial processes, such as manufacturing, oil and gas, chemical and process industries. Kagermann describes Industry 4.0 in the way that smart machines, storage systems and production facilities will be incorporated into aggregate solutions that are often referred to as Cyber-Physical Systems (Kagermann 2013; NSF 2010). Furthermore, such systems are then integrated in smart factories.

This trend towards smart factories has unlocked unprecedented volumes of data. Indeed, modern manufacturing machines and production lines are equipped with sensors that constantly collect and send data. The machines have control units that monitor and process the data, coordinate machines and the manufacturing environment, and send messages, notifications and requests. Such data generated during manufacturing (Chand and Davis 2010; Wuest et al. 2016) has led to a large growth of interest in data analysis for a wide range of industrial applications (Mikhaylov et al. 2019a, b; Zhou et al. 2017, 2019).


Content courtesy of Springer Nature, terms of use apply. Rights reserved.



Resistance spot welding. In this work, we investigate such data analysis for a particular industrial scenario of Resistance Spot Welding (RSW) and its applications at Bosch, a large manufacturing company that is one of the world leaders in automated welding in the automotive industry.

We illustrate RSW with Fig. 4, in which the two electrode caps of the welding gun press two or three worksheets between the electrodes with force. A high electric current then flows from one electrode, through the worksheets, to the other electrode, generating a substantial amount of heat as a result of electric resistance. The material in a small area between the worksheets will melt and form a welding nugget connecting the worksheets, known as the welding spot. The quality of welding operations is typically quantified by quality indicators like spot diameters, as prescribed in international and German standards (ISO 2004; DVS 2016). To obtain the spot diameters or tensile shear strength precisely, the common practice is to tear the welded chassis apart and measure these two quality indicators (DVS 2016), which is extremely expensive. Nevertheless, monitoring of welding quality has great importance. Indeed, consider a scenario in a car factory, where cars are continuously produced in several production lines. In each production line, a series of chassis is produced, with up to six thousand spots welded on each chassis. If a quality failure happens on one spot, the whole production line needs to be stopped, which means the loss of several cars, production down-time, and cost to bring the production line back to running. Considering the number of cars produced every day, this reveals the huge economic benefit behind improving quality monitoring of RSW. Furthermore, if the technology developed for improving RSW can be generalised to other manufacturing processes or other industries, the industrial impact behind the research endeavour to improve RSW will be tremendous.

Machine learning for resistance spot welding. Bosch RSW solutions are fully automated and produce large volumes of heterogeneous data. Therefore, we focus on data analyses in this work, in particular on Machine Learning (ML), for quality monitoring¹ for RSW. Note that ML approaches have proven their great potential for quality monitoring and thus they have received increasing attention in industry (Zhao et al. 2019). The reasons are that ML allows predicting the quality by relying on statistical theory in building mathematical models. ML thus enables computers to make inferences from data without being explicitly programmed (Alpaydin 2009; Samuel 2000). Moreover, ML methods are especially important because the reliable estimation of welding quality can decrease or even obviate the expensive destructive measuring of welding quality. Furthermore, if data-driven methods can predict the quality of future welding operations, necessary measures can be undertaken before the actual quality failure happens. Finally, ML models are beneficial as they can potentially perform quality control for every welding spot reliably, ensuring process capability and reducing costs for quality monitoring (Zhou et al. 2018). In the ML community there are two large groups of approaches (LaCasse et al. 2019): feature engineering with classic machine learning, and feature learning with neural networks. In this work, we focus on the former, feature engineering, which means manual design of strategies to extract new features from existing features (Bengio et al. 2013); examples include the extraction of statistic features like maximum, mean, etc., or geometric features, such as slopes, drops, etc.

¹ Note that the quality monitoring that we investigate here falls into the wider well-known category of condition monitoring, which is typically divided into two categories: machine health monitoring (Zhao et al. 2019), which aims at maintaining the healthy state of an equipment, and quality monitoring, which aims at ensuring the product quality within acceptable limits. The common quality monitoring practice is to estimate or assess the product quality after a manufacturing operation happens, so as to provide results for future manufacturing operations. Predictive quality monitoring, in contrast, is to predict the quality of a manufacturing operation before the actual operation happens, so that necessary measures can be undertaken even before the quality deficiency happens (Zhou et al. 2020a).

Limitation of previous works. Many previous studies have adopted the approach of data-driven models for quality monitoring in RSW. Many have studied the problem to classify (Martín et al. 2007; Sun et al. 2017), estimate (El Ouafi et al. 2010; Wan et al. 2016), or optimise (Panchakshari and Kadam 2013) the welding quality of each individual operation. In most studies, data were collected from laboratory or experimental settings (Summerville et al. 2017; Boersch et al. 2016). The typical data amount with labels (quality data) was less than 500 (Zhang et al. 2017), with very few above 3000 (Kim and Ahmed 2018). A number of different features were used for analysis, such as process curves like electrode displacement (Li et al. 2012), or scalar process parameters like coating (Yu 2015). Various methods of feature processing and ML modelling were explored, including classic ML methods like Random Forests (Sumesh et al. 2015) or Neural Networks (Afshari et al. 2014).

However, previous works treated the welding operations as independent events, ignoring the fact that welding is a continuous process with systematic dynamics, e.g. wearing of welding tools and production cycles caused by maintenance. The reason for this is that the characteristics of welding data were insufficiently recognised. Very few studies were conducted on real production data. It is questionable whether models developed from laboratory data are applicable in real production, as the welding conditions (such as cooling time and wear) are usually different. Furthermore, although some feature processing methods seemed to consider some domain knowledge of welding, the integration of domain knowledge in data analysis was limited, and the interpretation of data


analysis also provided limited insights from the perspective of engineering know-how.

Fig. 1 Welding workflow with quality prediction. If the predicted quality (Q-Value in this paper) of the next welding operation is good, the welding process continues. Otherwise necessary measures are to be undertaken, e.g. changing electrode caps, dressing electrodes ("Resistance spot welding" section)

Our approach. In this work, we develop ML approaches to predict the welding quality before the actual welding process happens, considering the characteristics of welding data, especially the temporal dependency. The envisioned welding process is to undertake necessary measures before possible quality failures (Fig. 1). Furthermore, this work strives to integrate domain knowledge in machine learning modelling, combining views from data science and welding engineering know-how. We focus on feature engineering because it is more transparent than feature learning / deep learning, and transparency is highly desired in an industrial setting. Four settings of engineered features are designed for machine learning modelling to explore and test whether and to what degree feature engineering can increase the prediction power. Three ML methods, linear regression (LR), multi-layer perceptron with one hidden layer (MLP), and support vector regression (SVR), are studied as representative classic machine learning methods (LaCasse et al. 2019). The combination of the feature processing settings and ML methods gives 12 ML pipelines. The developed ML pipelines are extensively evaluated with data collected from two running industrial production lines of 27 welding machines at Bosch's plants or development partners. Results from two representative datasets collected from two welding machines are demonstrated in this paper. In total, results of 24 ML models are demonstrated. The algorithms in this paper are implemented with the MATLAB toolbox SciXMiner (Mikut et al. 2017; Bartschat et al. 2019) and Python (Rossum 1995).

The machine learning in this work (Fig. 2) is slightly adapted from the pipeline of Fayyad et al. (1996) and Mikut et al. (2006). The complete pipeline of ML for data analysis includes data collection, data preparation, data pre-processing, modelling, and interpretation. Data preparation refers to the activities that transform the industrial data from different sources and formats, e.g. csv, SQL database, txt, xlsx, to a uniform format so that the data can be processed in a uniform way. Data pre-processing is basically the process of feature extraction (in this work, feature engineering specifically), that is, changing the representation of data (Bengio et al. 2013) so that it is suitable for the subsequent machine learning modelling.

Fig. 2 Basic machine learning pipeline in this work (Fayyad et al. 1996; Mikut et al. 2006; LaCasse et al. 2019). Question definition is illustrated in Fig. 1. The collected data is described in the "Data and description" section and prepared to the formats in "Data in two formats". The data pre-processing and modelling are studied in the "Methodology" section. Interpretation and visualisation are presented in the "Experiment results" section and "Discussion" section

Our contributions. We now summarise the main contributions of our paper as follows:

– We conducted an in-depth study and revealed characteristics of welding data collected from running production with domain knowledge. These characteristics are the natural results of welding production and reveal the temporal dependencies of welding operations, which to the best of our knowledge have never been discussed in depth in the literature (except for minor mentions). The discussion and visualisation of the data are important for understanding the data from both the perspectives of engineering and data science.
– We demonstrated that sophisticated feature engineering with the support of domain knowledge can greatly improve


the performance of ML methods for RSW. To this end we designed 4 settings with 4 levels of feature engineering. This in particular illustrates how domain knowledge can play an important role in ML-based data analysis.
– We developed and compared novel ML-based methods for predicting welding quality. The adopted ML algorithms include Linear Regression (LR), Multi-layer Perceptron (MLP) and Support Vector Regression (SVR). The combination of three ML algorithms and four settings of feature engineering gives twelve ML models for each dataset. We showed that, on the one hand, simplistic methods such as LR can have comparable performance to MLP, given that efficient feature engineering strategies are adopted; while on the other hand, LR has a very desired property in the manufacturing industry, which is transparency, compared to less transparent methods like MLP or Deep Learning.
– We conducted an extensive evaluation of the twelve ML models with real industrial data collected from two welding stations in running production lines, resulting in twenty-four models in total. This in particular provides guidance to a wider community on how to develop ML-based quality monitoring systems in production.
– We interpreted the ML results and feature selection with engineering knowledge and provided insights that enable engineers to understand the data and the process from a data science perspective. This demonstrates the advantage of the transparency of domain-knowledge-supported feature engineering, compared to the approach of feature learning. Thus, ML can be a natural method in the toolbox of engineers for common engineering practice.

This work has been conducted as a part of the PhD study of the first author (Zhou 2021). The material presented in this paper significantly extends our previously presented conference paper (Zhou et al. 2020a) as follows. First, we provide an extensive systematic review of related work of the past 20 years. Second, we discuss characteristics of welding data that were only shortly mentioned in the conference paper, especially hierarchical temporal structures. Third, we incorporate domain knowledge more deeply in data handling, data splitting, and feature engineering for time series. Fourth, we design four feature engineering strategies: two of them are new, and the other two were only briefly discussed in the conference paper, while here we present them in much more detail. Fifth, here we present three types of classic ML pipelines (LR, MLP, SVR), and only one of them (that is, LR) was in the conference paper. Sixth, here we evaluate our 12 ML pipelines with a dataset that is much more complex and reflects more of the real, complicated dynamics in production than the one used in the conference paper. Seventh, we introduce a new section that is entirely devoted to an extensive interpretation of modelling results to provide insights for engineering; this was not presented in the conference paper.

Organisation of the paper. This paper is organised as follows. The "Related work" section gives a detailed review of related work and reveals its limitations. The "Data and Problem Statement" section describes the Resistance Spot Welding process and the structures of the data collected from the production lines. The "Methods" section introduces the strategies for handling the special data structures and for feature engineering. The "Experiment settings" section describes the experiment settings for evaluating the feature engineering approaches. The "Results and discussion" section presents and discusses the evaluation results. "Interpreting ML results for engineering insights" further interprets the features, visualisation and results to extract engineering insights. The "Conclusion and outlook" section concludes the work and previews future research directions.

Related work

Many previous studies have adopted the approach of data-driven models for quality monitoring (summarised in Table 1). It can be seen that regression (R) and classification (C) have been the focus in the past 20 years. The interest in machine learning for RSW has been growing in recent years. This work will discuss and summarise them from four perspectives: question definition, data collection, feature processing, and machine learning modelling.

Question definition. There exist two aspects of question definition. The first one is which quality indicator is used as the target feature. Most previous works have studied estimation of the spot diameter (Boersch et al. 2016; Kim and Ahmed 2018), as this is the suggestion by standards. Many works studied estimation of tensile shear strength (Cho and Rhee 2000; Martín et al. 2009; Zhang et al. 2017; Sun et al. 2017), and other less common quality indicators like load (Tseng 2006), gaps (Hamedi et al. 2007), penetration (El Ouafi et al. 2010), and pitting potential (Martín et al. 2010). All of these quality indicators are physical quantities that can be measured.

The second aspect is to study the question from the perspective of classification, regression, or optimisation. Many works (Lee et al. 2003; Cho and Rhee 2004; El-Banna et al. 2008; Yu 2015) treated the problem as classification, that is, to predict the category of quality: good, bad, and sometimes the concrete failure types, which is the final important goal for quality monitoring. Some works (Lee et al. 2001; Martín et al. 2009; Afshari et al. 2014; Summerville et al. 2017) defined the question as regression, that is, to assess a numerical value of the quality. This may seem unnecessary, but could be beneficial in many senses. The exact quality values are of great interest for process experts to gain better



Table 1 An overview of related works of machine learning in RSW. All studies are carried out with a laboratory data source on RSW except for (Kim and Ahmed 2018). #Data: number of data tuples. C: classification, R: regression, O: optimisation, TSS: tensile shear strength, D: diameter, h: height, E_pitt: pitting potential. SF: single features, TS: time series, FE: feature engineering, TSFE: time series features engineered, DKFE: domain knowledge supported feature engineering, Re: resistance, F: force, t: time, I: current, s: displacement, U: voltage, MLP: multi-layer perceptron, NN: neural networks, LR: linear regression, SOM: self-organising maps, BN: Bayesian network, LVQ: learning vector quantisation, LDA/QDA: linear/quadratic discriminant analysis, kNN: k-nearest neighbours, GRNN: general regression neural networks, GA: genetic algorithms, LogisticR: logistic regression, ANOVA: analysis of variance, DT: decision trees, RF: random forests, SVR/SVM: support vector regression/machine, PolyR: polynomial regression, PSO: particle swarm optimisation, KELM: kernel extreme learning machines, CART: classification and regression tree, GLM: generalised linear model, SAE: sparse auto-encoder, CNN: convolutional neural networks

Study | Question | #Data | Feature Processing | ML Methods
Cho and Rhee (2000) | R: TSS | 50 | TS: Re; TSFE: DKFE (slope, max, std, mxpo) | MLP
Li et al. (2000) | R: D | 170 | SF: F, t, I; TS: F, Re, s; TSFE: subsampling; FE: PCA, rms | MLP
Lee et al. (2001) | R: TSS | 80 | TS: Re, s; TSFE: subsampling, DKFE (slope, mxpo, range) | Fuzzy NN
Cho and Rhee (2002) | R: D, TSS | 60 | TS: Re; TSFE: DKFE (max, slope, min, std), stats | LR, MLP
Lee et al. (2003) | C: D | 390 | Image from SAM; ImageFE: dilation, quantisation | MLP
Cho and Rhee (2004) | C: TSS | 10 | TS: Re; TSFE: TS to bipolarised image | HopfieldNN
Junno et al. (2004) | C: D | 192 | TS: I, F, U; TSFE: means of segmented TS | SOM
Laurinen et al. (2004) | C: D | 192 | TS: I, F, U; TSFE: histogram, discretisation of quartiles | BN
Park and Cho (2004) | C: TSS, h | 78 | TS: s; TSFE: subsampling, slopes | LVQ, MLP
Podržaj et al. (2004) | C: spatter | 30 | TS: U, I, Re, s, F; TSFE: subsampling, std | LVQ
Haapalainen et al. (2005) | C: TSS | 3879 | TS: I, U; TSFE: transitions, segmentation mean; FE: PCA | LDA, QDA, kNN, LVQ
Tseng (2006) | R, O: load | 25 | SF: I, F, t, thickness | GRNN, GA
Hamedi et al. (2007) | R, O: gaps | 39 | SF: I, t, thickness | MLP, GA
Koskimaki et al. (2007) | C: TSS | 3879 | TS: I, U; TSFE: segmentation mean | kNN, SOM
Martín et al. (2007) | C: D | 438 | TS: ultrasonic oscillogram; TSFE: DKFE (heights & distances of echoes) | MLP
El-Banna et al. (2008) | C: D | 1270 | TS: Re; TSFE: segmentation, DKFE (max, min, mean, std, range, rms, slope); FS: power of the test | LVQ
Haapalainen et al. (2008) | C: TSS | 3879 | TS: I, U; TSFE: transitions, segmentation mean; FE: PCA; FS: SFS, SBS, SFFS, SBFS, nBest | kNN
Martín et al. (2009) | R: TSS | 242 | SF: t, I, F | LR, MLP
El Ouafi et al. (2010) | R: h, penetration, D | 54 | SF: thickness, I, F, t; TS: Re; TSFE: subsampling | MLP
Martín et al. (2010) | R: E_pitt | 242 | SF: t, I, F; FE: polynomial features | MLP
Li et al. (2012) | R: D | 145 | TS: s; TSFE: DKFE (geometric features) | MLP
Panchakshari and Kadam (2013) | R, O: D, TSS, load | 25 | SF: I, welding time, holding time, squeezing time; FS: ANOVA | LR, GA
Afshari et al. (2014) | R: D | 54 | SF: F, t, I | MLP
Yu (2015) | C: TSS, spatter | 473 | SF: coating, F; TS: power; TSFE: DKFE (mxpo, max, drop) | LogisticR
Pereda et al. (2015) | C: D | 330 | SF: t, I, material, treatment | ANOVA, DT, NN, RF, SVR, LogisticR, QDA
Zhang et al. (2015) | C: TSS, D | 200 | TS: s; TSFE: segmentation, mean, radar chart, geometric features | DT
Boersch et al. (2016) | R: D | 3241 | TS: I, U, T; TSFE: derivative, segmentation, scale-space filtering, geometric features, stats | LR, DT, RF, SVM, kNN
Pashazadeh et al. (2016) | R, O: D, h | 48 | SF: t, I, pressure | PolyR, MLP, GA
Wan et al. (2016) | C, R: D, load | 60 | SF: F, I, t; TS: Re; TSFE: PCA | MLP
Summerville et al. (2017) | R: D | 126 | TS: Re; TSFE: PCA | LR
Summerville et al. (2017) | R: TSS | 170 | TS: Re; TSFE: PCA | PolyR
Sun et al. (2017) | C: TSS | 67 | SF: I, t, pressure | PSO, KELM
Zhang et al. (2017) | C: TSS | 120 | TS: s; TSFE: DKFE stats, Chernoff images, binary features | HopfieldNN
Hou et al. (2017) | C: good/bad | 88 | X-ray images | SAE
Kim and Ahmed (2018) | R: D | 3344 | SF: material, thickness, t, coating, I | DT, CART
Gavidel et al. (2019) | R: D | 39 | SF: material, thickness, various surface conditions, F, I, t | CART, RF, MLP, kNN, SVM, GLM, Bootstrapping
Amiri et al. (2020) | R: TSS, fatigue life | 60 | SF: statistic features extracted from ultrasonic images | MLP
Dai et al. (2021) | C: good/bad | 400 | pictures of outlook of spots, assisted visual inspection | CNN
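To make the feature-processing abbreviations in Table 1 concrete, the following sketch computes a few of the listed time-series features from one process curve (e.g. a dynamic resistance measurement): statistic features (max, std, range, rms, mxpo = position of the maximum, slope) and the geometric "drop". The function name and the exact formulas are illustrative assumptions, not any cited work's implementation.

```python
import numpy as np

def ts_features(curve: np.ndarray) -> dict:
    """Illustrative statistic and geometric features of one process curve."""
    feats = {
        "max": float(curve.max()),
        "min": float(curve.min()),
        "mean": float(curve.mean()),
        "std": float(curve.std()),
        "range": float(np.ptp(curve)),                    # max - min
        "rms": float(np.sqrt(np.mean(curve ** 2))),       # root-mean-square
        "mxpo": int(np.argmax(curve)),                    # position of the maximum
        "slope": float(np.polyfit(np.arange(len(curve)), curve, 1)[0]),
    }
    # "drop": value decrease from the global maximum down to the lowest
    # value that follows it (a simplifying assumption; per-peak drops are
    # also possible).
    feats["drop"] = float(curve[feats["mxpo"]:].max() - curve[feats["mxpo"]:].min())
    return feats

feats = ts_features(np.array([1.0, 3.0, 2.5, 2.0, 2.2]))
```

A feature matrix for classic ML methods would then be assembled by applying such a function to every recorded curve of every welding operation.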

insights into the influence of input factors on the quality indicator. After predicting the numerical values of spot diameters, a classification can still be made according to the tolerance bands, which usually vary for different welding conditions and user specifications. Other works (Tseng 2006; Hamedi et al. 2007; Pashazadeh et al. 2016) studied the optimisation problem, that is, to optimise the influential factors so that the quality can improve.

Data collection. This is an important issue to discuss from two aspects. The first aspect is the data amount. The amount of data labelled with quality features is extremely limited (Zhou et al. 2018) due to the costly data collection process, as discussed before. Therefore, most of the previous works used a relatively small amount of data. The number of welding spots ranges from 10 (Cho and Rhee 2004) to less than 4000 (Haapalainen et al. 2008). The typical data amount in the literature is less than 500, e.g. in (Tseng 2006; Sun et al. 2017; Zhang et al. 2017; Summerville et al. 2017; Zhang et al. 2015; Martín et al. 2010; Laurinen et al. 2004).

The second aspect is the data source. There could be three major data sources: (1) simulation data (Zhou et al. 2018), where the largest amount of data labelled with quality features could be produced with less cost, but the data conditions may deviate from production because it is very difficult to perfectly reproduce the real production conditions in simulations; (2) laboratory data (Summerville et al. 2017), where the welding condition could be more similar to real production but less labelled data can be produced; (3) production data, which is the final target application of welding quality monitoring but where there exist the most restrictions on the amount of labelled data, cost for data collection and number of sensors. Almost all of the previous studies have collected data from laboratories or experimental settings (Lee et al. 2001, 2003; Koskimaki et al. 2007; Martín et al. 2010; Wan et al. 2016; Sumesh et al. 2015; Yu 2015), except for Kim and Ahmed (2018), which used about 3400 welded spots of accumulated production data.

Feature Processing. In the literature, two types of features are commonly used. The first type is single features, which are recorded as (aggregated) constants for a welding operation. Examples include welding time (t) (Li et al. 2000), sheet thickness (thickness) (El Ouafi et al. 2010), sheet coating (coating) (Yu 2015), electrode force (F) (Tseng 2006), welding current (I) (Hamedi et al. 2007), and pressure (Pashazadeh et al. 2016). The second type is sensor measurements, or process curves, which are physical quantities measured along time, and are therefore referred to as time series (TS). Examples include dynamic resistance (Re) (Cho and Rhee 2000), electrode force (F) (Junno et al. 2004), welding current (I) (Haapalainen et al. 2005), electrode displacement (s) (Park and Cho 2004), welding voltage (U) (Haapalainen et al. 2008), ultrasonic images (Martín et al. 2007; Amiri et al. 2020), and power. Besides the two common types, Lee et al. (2003) used images collected from Scanning Acoustic Microscopy (SAM).

Some authors have attempted to exploit domain knowledge for designing feature engineering strategies to extract features from process curves. The works (Cho and Rhee 2000, 2002) used statistic or geometric features of process curves, like slope, maximum, standard deviation, position of maximum, range, and root-mean-squares of measurements (El-Banna et al. 2008). Some more specific features based on domain knowledge, such as heights of and distances between echoes from ultrasonic oscillograms, were studied (Martín et al. 2007). Geometric features (Li et al. 2012) were introduced later, e.g. "drop" in process curves (Yu 2015), which is the value decrease from a peak to the following valley. In recent years, richer features were extracted, e.g. derivatives, filtering (Boersch et al. 2016), and Chernoff images drawn with statistic features from process curves (Zhang et al. 2017).

Feature Selection (FS), such as analysis of variance (ANOVA) (Panchakshari and Kadam 2013), step-wise regression (Panchakshari and Kadam 2013), etc., was also performed. The work (Haapalainen et al. 2008) discussed and compared various feature selection methods, e.g. Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Sequential Forward Floating Selection (SFFS), Sequential Backward Floating Selection (SBFS) and N-Best Features Selection (nBest).

ML Modelling. Most of the methods used for machine learning modelling can be classified as classical machine learning methods (LaCasse et al. 2019), like Linear Regression (LR) (Cho and Rhee 2002; Martín et al. 2009; Panchakshari and Kadam 2013), Polynomial Regression (PolyR) (Pashazadeh et al. 2016; Summerville et al. 2017), Generalised Linear Models (GLM) (Gavidel et al. 2019), k-Nearest Neighbours (kNN) (Haapalainen et al. 2005; Koskimaki et al. 2007; Boersch et al. 2016), Decision Trees (DT) (Zhang et al. 2015; Kim and Ahmed 2018), Random Forests (RF) (Pereda et al. 2015; Boersch et al. 2016), Support Vector Machines (SVM), etc. Statistic methods like Linear or Quadratic Discriminant Analysis (LDA and QDA) are also used for classification. Bayesian Networks (BN), Genetic Algorithms (GA) (Tseng 2006; Panchakshari and Kadam 2013), and Particle Swarm Optimisation (PSO) (Sun et al. 2017) are often used for optimisation. The Artificial Neural Networks (ANN) used in previous studies include Fuzzy Neural Networks (FuzzyNN) (Lee et al. 2001), Learning Vector Quantisation (LVQ) (Junno et al. 2004), Self-Organising Maps (SOM) (Junno et al. 2004), General Regression Neural Networks (GRNN) (Tseng 2006), Hopfield Neural Networks (HopfieldNN) (Zhang et al. 2017), Kernel Extreme Learning Machines (KELM) (Sun et al.

1146 Journal of Intelligent Manufacturing (2022) 33:1139–1163

2017) and Multilayer-Perceptrons (MLP). Since these networks either have fewer than two hidden layers, or do not demonstrate the characteristic of hierarchical feature learning (Bengio et al. 2013), they can still be classified in the category of classic machine learning.

There exist also several works that use deep learning (DL) for quality monitoring in RSW. The work (Hou et al. 2017) applied a sparse auto-encoder network for detecting welding defects from X-ray images. Convolutional neural networks (CNN) are proposed (Dai et al. 2021) to assist visual inspection by classifying spots into good/bad based on pictures of the outside of welded car body parts. Long short-term memory neural networks (Zhou et al. 2020a) are applied for learning abstract representations from the temporal data for predicting quality. Many more works exist that apply DL for quality monitoring or optimisation in other welding processes, such as laser welding (Mikhaylov et al. 2019a; Shevchik et al. 2020; Zhang et al. 2019a), arc welding (Nomura et al. 2021; Zhang et al. 2019b), resistance wire welding (Guo et al. 2017), etc. It can be seen that there exist many more works on classic ML than on DL for quality monitoring of RSW. In our industrial environment, classic methods are also preferred over DL methods. One reason could be that the amount of collected data is not large enough to demonstrate the advantage of DL methods, as deep learning usually requires a very large amount of data (LaCasse et al. 2019). DL becomes more useful when the analysed data are "heavier", e.g. X-ray images, pictures, and laser welding images. However, quality judged from X-ray images and pictures for RSW is less reliable (indirect quality indicators) and less precise (only good/bad instead of numeric values), compared to diameters and other numerical quality indicators. Another reason could be that the relationship between input features and target features is not very complex, or not very non-linear, so that deep learning methods cannot demonstrate their benefits (Bengio et al. 2013).

Summary. There exist the following aspects that the previous work has not covered sufficiently:

– The temporal dependency of welding data is only discussed to a limited extent. Each welding operation has been deemed an independent event in almost all of the literature, that is to say, the quality estimation of each welding operation is carried out independently from other operations. This is logically and physically true, but there could be systematic trends between welding operations performed continuously, e.g. the wearing effect of the welding electrodes should also increase continuously.
– Due to the insufficient recognition of interdependency between welding operations, the literature has been focusing on estimating the welding quality with the data of each welding operation, but the prediction of future welding quality has been largely ignored. If the functionality of estimating welding quality is realised in a welding system, this system can make a quality estimation for each welding operation only after the operation, during which the quality failure may already have happened.
– Most of the previous work has used a limited amount of data collected from experimental settings. It is questionable whether models developed from laboratory data are applicable in real production, as the welding conditions (such as cooling time and wear) are usually different. More studies should be conducted on a larger amount of data (e.g. more than 4000 welding spots) and data collected from real industrial production.
– Domain knowledge is insufficiently considered in ML analysis. This can be seen from the fact that most previous studies treated each welding operation independently. In addition, the design of feature engineering strategies can rely more on domain knowledge. Furthermore, feature selection has been performed, but was not used for understanding the importance of the input features on the quality features. The interpretation of ML analysis results and features should rely more on domain knowledge and generate more insights for a better understanding of the process and data.

Data and problem statement

This section gives a brief introduction to the resistance spot welding process ("Resistance spot welding" section), discusses the characteristics of welding data collected from running industrial production lines ("Hierarchical temporal structures in the data" section), their formats ("Data in two formats" section), and defines the problem of predictive quality monitoring ("Problem statement" section).

Resistance spot welding

Resistance Spot Welding (RSW) is widely applied in the automotive industry, e.g. for car chassis production. During the welding process, the two electrode caps, equipped at the end of the two welding electrodes (Fig. 4), press two or three worksheets (two layers of car chassis parts in the automotive industry) with force. Then an electric current flows from one electrode, through the worksheets, to the other electrode, generating a large amount of heat due to electric resistance. The materials in a small area between the two worksheets, called the welding spot, will melt and form a weld nugget connecting the worksheets. The electrode caps directly touching the worksheets wear quickly due to high thermal-electric-mechanical loads and oxidation, and need to be maintained regularly. After a fixed number of spots are welded, a very thin layer of the cap will be removed to restore the surface


Fig. 3 (a) Examples of process curves. The adaptive control will try to force the process curves to follow the reference curves defined by the welding program. Left y-axis: Resistance (anonymised); right y-axis: Current (anonymised); x-axis: Samples. (b) Consecutive welding operations constitute the welding operation level. For each operation, a data tuple consisting of single features on the welding operation level and process curves on the welding time level (Sect. 3.3) is recorded by the adaptive control system that tries to force the actual curves to follow the reference curves

condition of the electrode cap. This is called Dressing. After a certain number of dressings are performed, the electrode cap becomes too short and needs to be changed altogether. This is called Cap Change.

The welding process is controlled by the welding control system, which is an Adaptive Control System and also provides the function to store the collected data. It will try to force the electric current flowing through the electrodes and worksheets to follow a pre-designed profile. This profile is called the reference current curve (Fig. 3a) in Adaptive Control, and it is determined in the process development stage. Apart from the reference current curve, other reference curves, e.g. voltage, are also determined in the process development stage. A complete set of all reference curves is stored in a welding program. The welding program also prescribes other information, e.g. the welding position on the car part, and the thickness, material, surface coating condition of the worksheets and the glue between the worksheets.

Fig. 4 A schematic illustration of a welding gun in Resistance Spot Welding (RSW) (Zhou et al. 2018). The electrode caps, equipped at the end of two welding electrodes, press the worksheets (e.g. chassis parts in the automotive industry) with force. A high electric current (the blue arrows) flows through, generating heat and forming a welding nugget, whose diameter D is an important quality indicator

The diameter of the welding nugget is typically used as the quality indicator of a single welding act according to international standards (ISO 2004) and German standards (DVS 2016), but it is difficult to measure precisely. Destructive methods, i.e. tearing the welded chassis apart, are expensive and can only control the quality partially. Non-destructive methods, including ultrasonic wave and X-ray, are also costly, time-consuming and yield imprecise results (El Ouafi et al. 2010; Zhang et al. 2004; Cho and Rhee 2000). In industry, substitute quality indicators are therefore often developed to describe the welding quality. For example, the Q-Value, developed by Bosch Rexroth, will be studied in this work. The Q-Value is an aggregated value calculated using process curves, by extracting their statistic, geometric and other features that incorporate domain knowledge. The exact way of calculating the Q-Value cannot be disclosed here, since it is a know-how of Bosch Rexroth. The optimal Q-Value is one; values below one normally indicate quality deficiency, and values above one indicate energy inefficiency.

Hierarchical temporal structures in the data

Previous studies (Boersch et al. 2016; Zhang et al. 2004; Cho and Rhee 2000) have treated each welding operation independently. If we closely examine the data, e.g. Fig. 5a showing the Q-Value along a number of welding operations for an example welding machine, we can clearly see that the data have strong periodicity, which indicates that the data very likely have temporal dependencies. This section elaborates on the multiple levels of temporal structure in the data as an intrinsic result of the structure of the production processes.

The first time level is the welding time level during a single welding operation, which usually spans several hundred samples (Fig. 3a). For each welding operation, data of process


Fig. 5 (a) Q-Value along a number of welding operations for Welding Machine 1. The red rectangle indicates the area for a closer look in (b). Meaning of Q-Values: 1: optimal; <1: quality deficiency; >1: energy inefficiency. The Q-Values rise gradually due to wearing effects, and drop abruptly when there is a maintenance. (b) Welding operations with different welding programs are performed for spots at different welding positions on the car part, thus often possessing different dynamics, e.g. the means of the Q-Values are different. (c) Q-Value along a number of welding operations for Welding Machine 2 (partially shown). The irregular trends of the Q-Value indicate complex production conditions. The red rectangle indicates the area for a closer look in (d). (d) Welding operations performed with different welding programs. Note that there exists a change of production arrangement. Before the 618th welding operation, only three welding programs were performed (Prog6, 7 and 8). After that, three more welding programs were performed (Prog1, 2 and 3) due to a change of production plan

curves and single features are recorded (Fig. 3b). The consecutive welding operations constitute the second time level, the welding operation level. A data sample on this level contains the complete information of a welding operation. The welding operations are controlled by the adaptive control system and are operated with their respective welding programs. A closer look at a small area reveals that the welding programs of the operations are arranged in a specific order (Fig. 5b and d), defined by the order of car part types manufactured in the production lines.

As the welding process goes on, the electrode wears. The wearing effect is quantified using the single feature WearCount (Fig. 7). When the WearCount reaches a fixed threshold pre-designed by the process experts, dressing is performed. A complete dressing-welding-dressing procedure forms a dress cycle. Since the wearing effect repeats in each dress cycle, the consecutive dress cycles form the dress cycle level. A data sample on this level contains the complete information of all welding operations in a dress cycle. According to the domain expert, the strong periodicity of the Q-Value is caused by the wearing effect. The Q-Values in Fig. 5 begin with small values at each start of a dress cycle, rise as the electrode cap wears, and ideally become stable at the end of the dress cycle (Fig. 5a), but can also demonstrate complex trends under other conditions (Fig. 5c). After a certain number of dress cycles, the electrode cap is changed. The behaviour of the Q-Value is then influenced by the new electrode. All operations welded by one electrode cap constitute an electrode cycle. The consecutive electrode cycles comprise the electrode cycle level.
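The counter semantics behind these cycles (WearCount, DressCount, CapCount; see also the Fig. 7 caption) can be sketched in a few lines. This is a minimal illustration only: the event encoding ("weld", "dress", "cap_change") is an assumption for the sketch, not the welding system's actual log format, and the rule that CapCount simply increments per cap change is likewise assumed.

```python
def run_counters(events):
    """Simulate WearCount / DressCount / CapCount over a sequence of events.

    events: iterable of "weld", "dress" or "cap_change" (assumed encoding).
    WearCount  increases by one per weld, resets on dress or cap change.
    DressCount increases by one per dress, resets on cap change.
    CapCount   is assumed to increase by one per cap change.
    Returns the (wear, dress, cap) values after each event.
    """
    wear = dress = cap = 0
    trace = []
    for e in events:
        if e == "weld":
            wear += 1
        elif e == "dress":
            dress += 1
            wear = 0          # a fresh dressing restarts the wear counter
        elif e == "cap_change":
            cap += 1
            wear = 0          # a new cap restarts both lower-level counters
            dress = 0
        trace.append((wear, dress, cap))
    return trace
```

Running it over an event log reproduces the increment/reset behaviour described in the Fig. 7 caption.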


Fig. 6 Sequential structure of a simplified RSW production line, where multiple types of car chassis parts, each with a certain amount of welding spots of specified types, go through a sequence of welding machines. Each spot type is welded with one pre-defined welding program by one machine in a fixed order

Until here, the time levels of a single welding machine are explained. If we step further to see the multiple machines organised in production lines, we see two typical structures of production lines. Figure 6 illustrates a simplified production line for RSW with a sequential structure. This organisation of welding machines constitutes the machine level. Depending on the structures of the production lines, the collected data points need to be organised with different temporal structures.

There exist other latent time levels, including the level of car chassis parts, of production batches of parts and machines, and of suppliers of the materials, components and machines. Since this information is normally not available in the welding data, these time levels will NOT be addressed in this work.

Fig. 7 Example of temporal structures in the RSW data quantified by WearCount, DressCount, and CapCount. WearCount is a counter on the welding operation level. It increases by one after one welding operation, and is reset to zero after a dressing is performed, or the cap is changed. DressCount is a counter on the dress cycle level. It increases by one after one dressing operation, and is reset to zero after the cap is changed

Data in two formats

From the data collected from two production lines with a total of 27 welding machines, we have selected two representative datasets for this paper to demonstrate the results: Welding Machine 1 (WM1) with 2 welding programs and 1998 welding operations, and Welding Machine 2 (WM2)² with 6 welding programs and 5839 welding operations. The data come in inhomogeneous formats, including welding protocols generated in real time, feedback curve databases, reference curve databases, and meta settings.

² Note this is a different welding machine, with much more complex data, than the WM2 in Zhou et al. (2020a).

After collection and preparation, the inhomogeneous data are prepared in two formats for each single welding operation (Fig. 3b). We use the term Data Tuple (DT) (Mikut et al. 2006) to indicate a single data instance that contains all features fully describing the instance. In this work, data tuples are on the welding operation level, and are comprised of the following two types of features.

– Time series (TS), including eight effective features.
  – Four Reference Curves prescribed by the welding programs of the adaptive control system, e.g.


Fig. 8 Hierarchical feature extraction. RawSF and padded TS go through different feature engineering modules. The resulting features (EngSF and TSFE) are combined with RawSF and go through a further advanced FE module with respect to ProgNo. All raw features (RawSF and Padded TS) and engineered features (EngSF, TSFE, EngF_Prog) are combined again and go through data reshaping to accentuate short-time dependency. After that, features are flattened to be made suitable for classic ML methods. Feature selection reduces these features to a small amount (20). Three ML methods are studied for modelling. LR: linear regression, MLP: multi-layer perceptron, and SVR: support vector machine
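The "data reshaping" and "flattening" steps named in the caption can be sketched as a look-back window builder: the inputs for one prediction are the flattened features of the previous operations plus the already-known single features of the next operation. The list-of-lists layout and all names below are illustrative assumptions, not the authors' implementation:

```python
def build_windows(tuples, known_next, q, lookback=2):
    """Assemble (input, target) pairs from consecutive welding operations.

    tuples     : per-operation feature vectors (e.g. flattened SF + TSFE)
    known_next : per-operation single features already known before the
                 operation runs (e.g. WearCount-derived EngSF)
    q          : per-operation Q-Values
    lookback   : look-back length (number of previous operations used)
    """
    inputs, targets = [], []
    for t in range(lookback, len(q)):
        x = []
        for past in tuples[t - lookback:t]:
            x.extend(past)              # flatten the previous operations
        x.extend(known_next[t])         # known features of the next operation
        inputs.append(x)
        targets.append(q[t])            # Q-Value to be predicted
    return inputs, targets
```

Each flattened row can then be fed to a classic ML method, as in the pipeline of Fig. 8.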


    reference current (I_ref), reference voltage (U_ref), reference resistance (R_ref), and reference pulse width modulation (PWM_ref). The reference curves for a specific welding program usually remain identical unless they are manually changed.
  – Four Actual Process Curves, which are the actually measured process feedback curves, e.g. electric current (I), voltage (U), resistance (R), pulse width modulation (PWM).
– Single features (SF), containing 164 effective features.
  – Single features that describe the information of the temporal structures in the data, including the aforementioned WearCount, DressCount, CapCount, and Program Numbers (ProgNo), which are ordinal numbers of the welding programs.
  – Other raw single features, examples including Status, describing the operating or control status of the welding operation; ProcessCurveMeans, which are the average values of the process curves and their welding stages calculated by the welding software system; and QualityIndicators, which are categorical or numerical values describing the quality of the welding operations, e.g. HasSpatter (boolean), ProcessStabilityFactor (PSF), Q-Value.

Problem statement

The representation of the question defined in the "Introduction" (Figure 1) is to find a function between the available information and the Q-Value of the next welding operation Q_{k+1}, shown in Eq. 1, where X_1, ..., X_{k-1}, X_k include the data tuples (single features and time series) of the previous welding operations (from time step 1 to k) and the known features of the next welding operation (SF*_{k+1}, e.g. welding program):

Q_{k+1} = f(X_1, ..., X_{k-1}, X_k, SF*_{k+1}).   (1)

In the following text, the subscripts 1, ..., k-1, k, k+1 will be replaced by 1, ..., pre2, pre1, next1 for better understanding. Thus, Eq. 1 becomes Eq. 2:

Q_{next1} = f(X_1, ..., X_{pre2}, X_{pre1}, SF*_{next1})   (2)

Methods

This section elaborates on the machine learning methods studied for quality monitoring. It first introduces the strategies to handle the temporal structures in the data and meaningfully combine features on different time levels ("Feature engineering" section), then describes feature selection ("Feature selection" section) and machine learning modelling ("Machine learning modelling" section).

Feature engineering

Since the data exist on at least two levels, the welding time level and the welding operation level, feature engineering is also performed on different time levels, resulting in an ML pipeline of hierarchical feature extraction. We first take a glance over all levels, and then dive into each module to understand the details.

Hierarchical feature extraction. To address the issue of temporal structures, feature extraction is performed on different time levels (Fig. 8). Features are first extracted from the padded time series on the welding time level. These extracted features (TSFE) can be seen as vectors containing compressed information from the time series. After feature extraction, the TSFE become features on the welding operation level, since they do not change along the welding


time, but change for different welding operations. The TSFE can be combined with the single features. The consecutive combined features again form time series on the welding operation level. The combined features can go through further feature extraction modules. This hierarchical feature extraction can continue on further time levels, depending on the desired granularity of time levels. In this work, feature extraction is performed on the first two time levels. After that, the extracted features are reshaped on the welding operation level and then used for ML modelling.

Fig. 9 Resistance curve as an example to explain the time series features engineered (TSFE, partially shown). The resistance curve is the most complicated process curve and is often deemed the most informative time series feature (Cho and Rhee 2000; El-Banna et al. 2008; El Ouafi et al. 2010; Lee et al. 2001)

FE on the welding time level. After considering strategies proposed in the literature (Junno et al. 2004; Park and Cho 2004; Zhang et al. 2015; Lee et al. 2001; El-Banna et al. 2008; Yu 2015) and engineering knowledge, this work extracts statistic and geometrical features to aggregate the information of the time series (on the welding time level) into time series features engineered (TSFE) on the welding operation level. First, all eight time series are synchronised to the same time point of welding-start. The pre-welding stage, where nothing actually happens, is chopped off. Then these time series are padded to the maximum length according to their physical meaning. For example, current is padded with zero, because after welding the current becomes zero, while resistance is padded with its last value, because resistance does not disappear after welding. Then, the following features are extracted (Figure 9): length (WeldTime) (Panchakshari and Kadam 2013), maximum (WeldMax) (Cho and Rhee 2000), minimum (WeldMin) (El-Banna et al. 2008), maximum position (WeldMxPo) (Lee et al. 2001), minimum position (WeldMnPo), slope (WeldSlope) (Cho and Rhee 2002) (slope = (max − min)/(mxpo − mnpo)), mean (WeldMean) (El-Banna et al. 2008), median (WeldMedian), standard deviation (WeldStd) (El-Banna et al. 2008), and end value (WeldEndValue). We can see that these extracted statistic features can characterise the time series. Other time series, e.g. current curves and pulse width modulation curves, are much simpler than the resistance curve and can also be described by these features.

FE on the welding operation level. Strategies for feature engineering on single features are designed based on the meaning of the features in domain knowledge, changing the representations of the raw features to facilitate machine learning modelling. Denoted as Engineered Single Features (EngSF), 3 new features are generated, listed below.

• WearDiff is calculated as the difference between the WearCount of two consecutive data tuples, characterising the degree of change of the wearing effect. The value is normally ONE if the data is continuous; if some data tuples are missing, the value will be other numbers that correctly describe the wearing effect; and the value will be a large negative value after each fresh dressing.
• NewDress will be ONE after each dressing, and ZERO for other welding operations.
• NewCap will be ONE after each Cap Change, and ZERO for other welding operations.

Note that before the next welding operation happens, the WearCount, DressCount, and CapCount of the next operation are already known, since they are artificially designed incremental features. The EngSF based on them are therefore also known. These features corresponding to the next welding operation can therefore be used for predicting the Q-Value of the next welding spot.

According to the welding expert, and observing from Fig. 5b, Q-Values of different welding programs have different behaviours, but this work does not consider the Program Number (ProgNo) a good feature for machine learning modelling. The same value of ProgNo would have different meanings for different welding machines if raw values of ProgNo were used for modelling, and the number of features may change in case of One-Hot Encoding. Therefore, this work creates another type of features to incorporate the information of ProgNo implicitly, avoiding the use of the feature ProgNo (Fig. 11). Firstly, all single features that form time series on the welding operation level are decomposed into sub-time series, each belonging to only one ProgNo. Secondly, the aforementioned EngSF are extracted separately from each sub-time series. We give this group of features a name: Engineered Single Features considering ProgNo (EngSF_Prog). Concretely, WearDiff_Prog is calculated as the difference between consecutive WearCounts that belong to the same ProgNo. NewDress_Prog and NewCap_Prog are created similarly.
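The ten TSFE statistics and the physics-aware padding described under "FE on the welding time level" can be sketched as follows. The slope follows the definition given in the text ((max − min)/(mxpo − mnpo)); the function and key names are only illustrative assumptions:

```python
from statistics import mean, median, pstdev

def pad_curve(curve, length, pad="last"):
    """Pad a process curve to a common length according to its physics:
    current is padded with zero (it drops to zero after welding), while
    resistance is padded with its last value (it does not disappear)."""
    fill = 0.0 if pad == "zero" else curve[-1]
    return list(curve) + [fill] * (length - len(curve))

def tsfe(curve):
    """Extract the ten TSFE statistics from one (padded) process curve."""
    mx, mn = max(curve), min(curve)
    mxpo, mnpo = curve.index(mx), curve.index(mn)
    return {
        "WeldTime": len(curve),
        "WeldMax": mx,
        "WeldMin": mn,
        "WeldMxPo": mxpo,
        "WeldMnPo": mnpo,
        # slope = (max - min) / (mxpo - mnpo), as defined in the text
        "WeldSlope": (mx - mn) / (mxpo - mnpo) if mxpo != mnpo else 0.0,
        "WeldMean": mean(curve),
        "WeldMedian": median(curve),
        "WeldStd": pstdev(curve),
        "WeldEndValue": curve[-1],
    }
```

Applied to each of the eight padded time series, this yields the 8 × 10 TSFE on the welding operation level.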


Moreover, the following features are also created to implicitly incorporate the information of the welding program.

• RawSF_Prog indicates the features generated by decomposing the raw single features of the data points belonging to the same ProgNo.
• TSFE_Prog indicates the features generated by decomposing the time series features engineered of the data points belonging to the same ProgNo.

EngSF_Prog, RawSF_Prog, and TSFE_Prog are grouped under the name Engineered Features considering ProgNo (EngF_Prog).

Before feeding the extracted features on the desired time level into machine learning algorithms, these features need to be reshaped to form small time snippets of a certain look-back length l.

Feature selection

The number of features grows enormously because of feature engineering. From each of the 8 time series, 10 features are extracted, resulting in 80 features. From the single features describing temporal structures, 3 new features are generated. There are 164 raw single features. After reshaping l previous operations, then flattening to a table, the number of engineered features can grow to more than 2000. For example, suppose l = 10; the number of engineered features is (164 + 3 + 80) × 10 = 2470. When adding the EngF_Prog, the number of features even doubles: (164 + 3 + 80) × 2 = 494, and 494 × 10 = 4940 (Fig. 8).

This work suggests using step-wise forward feature selection (Mikut et al. 2006) to keep the number of features for modelling small, for two purposes: 1) to retain the prediction power of the ML models, especially in cases where the number of data points is less than the number of features; 2) to attain transparency of the ML models. Considering the time cost of feature selection, for linear regression models this work applies a wrapper method to select the features. For MLP and SVR, a pre-selection with linear regression is performed.

Machine learning modelling

Three ML methods. Three types of classic machine learning methods are tested in this work. Linear Regression (LR) is selected as the representative algorithm of classic ML methods and extensively studied. Two non-linear methods, Multi-Layer Perceptrons with one hidden layer (MLP) and Support Vector Regression (SVR), are studied to see if non-linear methods can improve the performance further.

Performance metrics. To evaluate the model prediction power, this work has selected the performance metric mean absolute percentage error (mape) (Eq. 3). mape is the percentage representation of the mean absolute error. It is intuitive to understand for process experts, and relatively insensitive to local errors where the predictions of some points deviate from the true value to a larger degree. Other performance metrics (e.g. mean absolute error, mae) were considered less intuitive by the process experts in our group and are not presented in the paper.

mape = (1/N) Σ_{n=1}^{N} |(y_n − ŷ_n)/y_n| × 100%.   (3)

Experiment settings

We now explain the data splitting strategies, the benchmarks, the feature engineering settings, and finally the modelling.

Data splitting according to the temporal structures

Data splitting also needs to take the temporal structures of the data into consideration. The splitting point should be chosen at complete units of some time levels. According to the process experts, the deployment scenario will only be to test the developed machine learning methods on complete dress cycles in the future. It is therefore more meaningful to split the data in this way as well. In this work, the data is split into training, validation, and test sets in a 0.8 : 0.1 : 0.1 ratio, rounded to complete dress cycles, as illustrated in Fig. 10 for both welding machines using the Q-Value. It is also important to note that the validation data and test data should contain at least one complete dress cycle to ensure they cover the wearing effect through a full dress cycle. For details see Table 2.

Table 2 Details of datasets

Welding Machine     #Training data   #Validation data   #Test data
Welding Machine 1   1456             219                323
Welding Machine 2   4474             602                763

Benchmarks

Three benchmarks are designed with the process experts using intuition and domain knowledge, to provide baselines for the evaluation of the performance of the ML models.

– Benchmark 1 is a simple estimation that predicts the next Q-Value as equal to the previous Q-Value: Q̂_next1 = Q_pre1.
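Equation 3 translates directly into code; a minimal sketch:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error (Eq. 3), in percent."""
    n = len(y_true)
    return sum(abs((yt - yp) / yt) for yt, yp in zip(y_true, y_pred)) / n * 100.0
```

For the benchmarks and models, this is evaluated between the predicted and the recorded Q-Values of the test set.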


Fig. 10 Example of data splitting rounded to complete dress cycles (left panel: data splitting of WM1; right panel: data splitting of WM2; y-axis: Q-Value, x-axis: Welding Operations; colours mark training, validation and test). Note the data of Welding Machine 2 is much more complicated than that of Welding Machine 1. Especially in the beginning cycles, complicated dressing operations were performed. This may be caused by the change of production arrangement (Figure 5)
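The splitting shown in Fig. 10 can be sketched as follows: the 0.8 : 0.1 : 0.1 ratio is applied to whole dress cycles rather than to individual operations, so that each partition contains only complete cycles. The cycle-id representation below is an illustrative assumption:

```python
def split_by_dress_cycle(cycle_ids, ratios=(0.8, 0.1, 0.1)):
    """Split operation indices into train/validation/test, rounded to
    complete dress cycles. cycle_ids[i] is the dress cycle of operation i."""
    # distinct dress cycles, in order of first appearance
    cycles = sorted(set(cycle_ids), key=cycle_ids.index)
    n_train = round(len(cycles) * ratios[0])
    n_val = round(len(cycles) * ratios[1])
    train_c = set(cycles[:n_train])
    val_c = set(cycles[n_train:n_train + n_val])
    train, val, test = [], [], []
    for i, c in enumerate(cycle_ids):
        (train if c in train_c else val if c in val_c else test).append(i)
    return train, val, test
```

Because whole cycles are assigned to one partition, the validation and test sets each cover the wearing effect through at least one full dress cycle, as required in the text.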

Table 3 Performance of benchmarks on the test set evaluated using mape

Benchmarks                                         WM 1     WM 2
Benchmark 1   Q̂_next1 = Q_pre1                    3.19%    10.41%
Benchmark 2   Average over WearCount               2.70%    5.13%
Benchmark 3   Q̂_next1 = Q_pre1_Prog               2.48%    5.03%

Fig. 11 Example for generating EngF_Prog on data of Welding Machine 1. Each dot indicates a welding spot and its data. Purple dots belong to Welding Program 1 and yellow dots belong to Welding Program 2. All single features that form time series on the welding operation level are decomposed into sub-time series, each belonging to only one ProgNo. The Engineered Features considering ProgNo (EngF_Prog) are extracted separately from each sub-time series

– Benchmark 2, Average over WearCount, assumes the behaviour of the Q-Value of the same welding program across all dress cycles should be nominally identical. The Q-Value with WearCount = i in any dress cycle is therefore calculated as the average of all Q-Values whose WearCount = i and ProgNo = P, across all dress cycles in the training data: Q̂_next1|WearCount=i = mean({Q | Q ∈ TrainingSet, WearCount = i, ProgNo = P}).
– Benchmark 3 is a slight adaptation of Benchmark 1, taking more domain knowledge into consideration. Thus: Q̂_next1 = Q_pre1_Prog.

We have calculated the benchmarks on the two welding machines and report them in Table 3.

Four settings of feature engineering

Feature engineering can be performed on two types of features: time series features and single features (the latter also form time series on the level of welding operations). We have designed four settings of features to study whether, and to which degree, feature engineering on the two types of features can increase model prediction power.

– Setting 0, no feature engineering. Only the raw single features will be used in machine learning modelling. Notice that the ProcessCurveMeans in the raw single features already provide some information on the time series. A total of 1640 (164 × 10) features are generated before feature selection.
– Setting 1, only performing feature engineering on time series. The resulting time series features engineered (TSFE) will be combined with the raw single features (RawSF) in machine learning modelling. The TSFE serve as a supplement to the ProcessCurveMeans. A total of 2440 ((164 + 8 × 10) × 10) features are generated before feature selection.
– Setting 2, performing feature engineering on time series and single features. The time series features engineered (TSFE), raw single features (RawSF) and engineered single features (EngSF) will be combined and used in machine learning modelling. A total of 2470 ((164 + 3 + 8 × 10) × 10) features are generated before feature selection.



– Setting 3, performing a further step of feature engineering on time series and single features. The time series features engineered (TSFE), raw single features (RawSF), engineered single features (EngSF), and EngF_Prog (RawSF_Prog, TSFE_Prog and EngSF_Prog) will be combined and used in machine learning modelling. A total of 4940 ((164 + 3 + 8 × 10) × 2 × 10) features are generated before feature selection.

ML model training and hyper-parameter selection

We trained the ML models on the training set and selected the hyper-parameters based on the performance evaluated on the validation set.

Two hyper-parameters are to be selected for linear regression: the number of selected features (ω) and the look-back length (l). These two hyper-parameters are similar in the sense that, as they increase, the amount of data provided to the ML model increases, and therefore the model performance evaluated on the training set should always increase. In industrial application, it is desirable to find hyper-parameters that provide relatively good performance, avoid overfitting, and ideally make the model insensitive to the hyper-parameters.

We performed a limited grid search to find the hyper-parameters: first fix the first hyper-parameter and vary the second one, then select the second hyper-parameter in an area where the model performance is good and insensitive to that hyper-parameter; then fix the second hyper-parameter, vary the first one, and select the first one.

To make a fair comparison, we want to keep the amount of data delivered to the model the same, so that any performance difference is indeed caused by the quality of the features, not the amount of data. We therefore unify the hyper-parameters in the four models trained on the four feature settings. After a series of experiments we selected 20 as the number of selected features and 10 as the look-back length for all four feature settings. Figure 12a and b illustrate the model performance evaluated on the validation set of Welding Machine 1. The models become insensitive to the hyper-parameters after more than approximately 20 features are selected and when the look-back window is longer than about 10. A further reason that 20 is selected is that we want to limit the number of selected features to retain model transparency, which is very desirable from the view of process experts. The performance of Setting 0 and Setting 1 is more sensitive to the hyper-parameters than that of Setting 2 and Setting 3.

The same feature sets determined by LR are tested on MLP and SVR. The reasons are twofold: (1) the features selected by LR should already cover more than the necessary information for a successful prediction of the next Q-Value; this hypothesis is confirmed by extra experiments of feature selection; (2) feature selection with MLP and SVR takes more time than LR, which makes it less desirable for the quick adaptation of the methods to new datasets and industry application scenarios.

For MLP and SVR, the other hyper-parameters are selected using limited grid search. MLP has two hyper-parameters: the number of neurons in the hidden layer and the activation function. SVR has two or three hyper-parameters: kernel type, regularisation factor, and degree in the case of polynomial kernel types (Table 4).

Results and discussion

This section presents the experiment results and discusses the performance of the ML analysis. Four feature settings and three ML methods (LR, MLP, SVR) give 12 models for each dataset and 24 models in total.

After the hyper-parameters are determined, we trained the models again with the selected hyper-parameters on the combined set of training data and validation data, and tested the models on the respective test sets. The results of linear regression (LR) models trained with these four feature settings on the datasets of Welding Machine 1 and Welding Machine 2 are presented in Table 5. The performance of the four settings is compared to Setting 0 and to the best benchmark (Benchmark 3), and percentage improvements are calculated.

Results and discussion of linear regression

Firstly, from Table 5 one can conclude that the model performance increases as the degree of feature engineering increases. We observe the effect of the features derived from domain knowledge, namely that they help to improve the model performance, by comparing the performance of Settings 0 to 3 on WM1 and WM2. The improvement becomes significant when we compare Setting 3 to Setting 0 (28.09% for WM1 and 29.31% for WM2), where Setting 3 has the most advanced features derived from domain knowledge. The domain-knowledge-derived features also reveal more insights from the engineering perspective, and help to make the model insensitive to the hyper-parameters. These are discussed extensively in the "Interpreting ML results for engineering insights" section.

The performance of Setting 0 and Setting 1 on the dataset of Welding Machine 1 shows deterioration compared to Benchmark 3. Observing Figure 5a, we can see the Q-Value behaviour of Welding Machine 1 is relatively stable; Benchmark 3 therefore works very well. The performance of Setting 0 and Setting 1 on the dataset of Welding Machine 2 shows some improvement, because the Q-Value behaviour of Welding Machine 2 (Figure 5d) is much more complicated.

The performance improvement of Setting 1 compared to Setting 0 is insignificant, but consistent (Fig. 12), which




Fig. 12 (a) Performance of models evaluated on the validation set of Welding Machine 1. The models are trained on the dataset of Welding Machine 1 with different numbers of selected features (ω) but a fixed look-back length (l) of 10. The performance changes less than 0.1% after approximately 20 features are selected. (b) Performance of models evaluated on the validation set of Welding Machine 1. The models are trained on the dataset of Welding Machine 1 with different look-back lengths but a fixed number of selected features of 20. The performance changes less than 0.1% when the length of the look-back window is about 10. Note that in Setting 3, where the EngF_Prog is used, the effective look-back time step is the length of the look-back window × the number of welding programs (l × #Prog). (c) Performance of models evaluated on the validation set of Welding Machine 2. (d) Performance of models evaluated on the validation set of Welding Machine 2

Table 4 Hyper-parameter selection of ML methods. ω: #selected features, l: look-back length, λ: regularisation factor

Methods   Hyper-parameters                Selection method
LR        ω, l, features                  Limited grid search
MLP       ω, l, features                  Kept the same as LR
          #Neurons, activation function   Limited grid search
SVR       ω, l, features                  Kept the same as LR
          Kernel type, λ, degree          Limited grid search
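The coordinate-wise limited grid search described in the "ML model training and hyper-parameter selection" section can be sketched as follows. This is an illustrative sketch, not the authors' code; evaluate stands for training a model with the given pair of hyper-parameters and returning its validation error.

```python
def coordinate_grid_search(evaluate, first_grid, second_grid, first_init):
    """Limited grid search: fix the first hyper-parameter at an initial
    guess and sweep the second; then fix the chosen second value and
    sweep the first. evaluate(first, second) must return a validation
    error (lower is better)."""
    # Step 1: fix the first hyper-parameter, vary the second.
    best_second = min(second_grid, key=lambda s: evaluate(first_init, s))
    # Step 2: fix the selected second hyper-parameter, vary the first.
    best_first = min(first_grid, key=lambda f: evaluate(f, best_second))
    return best_first, best_second
```

With evaluate returning the validation mape for a given (ω, l), a well-behaved error surface yields a selection such as ω = 20 and l = 10; in the paper, the choice additionally favours regions where the performance is insensitive to the hyper-parameters and the model stays transparent, which a plain argmin does not capture.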

implies the features in the RawSF that contain time series information (the ProcessCurveMeans) already provide valuable information. A significant improvement begins with Setting 2, which indicates the feature engineering strategies on the welding operation level are meaningful. The performance difference between Setting 2 and Setting 3 analysed on the dataset of Welding Machine 1 is rather small, but that on the dataset of Welding Machine 2 is evident. In Figure 12, we can see this difference is not a random effect but systematic.

A further inspection of the two datasets reveals that the welding programs of Welding Machine 1 are always arranged in a fixed interlaced order (Figure 5b), i.e. the ProgNo always repeats the same pattern: Prog1, Prog2, Prog1, Prog2, ..., but the welding programs of Welding Machine 2 are not arranged in a fixed order. A change of production arrangement happened in the 618th welding operation (Fig. 5d), before which the production arrangement was comprised of three welding programs. After the 618th welding operation, three extra welding programs were added to the production arrangement, which is quite usual in the manufacturing industry, since the production arrangement can change at any time in agile manufacturing. This explains why Setting 3, in which the welding program information is specially handled, shows a significant improvement on the dataset of WM2, but a less significant improvement on the dataset of WM1.

Table 5 Performance of the Linear Regression (LR) models evaluated on test sets of representative welding machines. Percentage improvements are calculated with respect to Setting 0 (the 4th column), and to the best benchmark (the 5th column). Note that the performance of the models trained with Setting 0 and 1 shows deterioration with respect to Benchmark 3. This phenomenon will be discussed in Sect. 6. mape: mean absolute percentage error, "Imprv. w.r.t.": "improvement with respect to"

Feature settings                                   Welding Machine 1 (WM1)                              Welding Machine 2 (WM2)
or benchmarks                                      mape (LR)  Imprv. w.r.t.  Imprv. w.r.t.             mape (LR)  Imprv. w.r.t.  Imprv. w.r.t.
                                                              Setting 0      Benchmark 3                          Setting 0      Benchmark 3
Benchmark 3   Q̂_next1 = Q_pre1_Prog               2.48%      7.12%          –                          5.03%      −4.57%         –
Setting 0     RawSF                                2.67%      –              −7.66%                     4.81%      –              4.37%
Setting 1     RawSF, TSFE                          2.55%      4.49%          −2.82%                     4.55%      5.41%          9.54%
Setting 2     RawSF, TSFE, EngSF                   2.10%      21.35%         15.32%                     4.19%      12.89%         16.70%
Setting 3     RawSF, TSFE, EngSF, EngF_Prog        1.92%      28.09%         22.58%                     3.40%      29.31%         32.41%

Figure 13 illustrates the visualisation of the target values and estimated values. It can be seen that Setting 0 and Setting 1 learned a general rising trend of the behaviour of Q-Values. However, the dynamics of the behaviour of Q-Values are more complex than a simple rising trend. We can observe the Q-Value trend first rises, then declines slightly, and remains stable at the end. These dynamics are better learned by the more complicated feature settings.

Besides, although most of the Q-Values are predicted with small errors, there exist quite a few outliers that show an apparently different behaviour than the "normal" Q-Values. These outliers seem to be random, and cannot be explained with the trained models.

The complex dynamics of the Q-Value behaviours imply that the welding quality is not solely influenced by linear wearing effects. According to process experts, other influential factors include the statistical and stochastic variance caused by dressing, the chassis to be welded, etc.

Results and discussion of MLP and SVR

The results of the MLP and SVR models are shown in Table 6. We have performed experiments of feature selection with these methods. The results show that the selected features largely overlap with those selected by LR. Thus, the MLP and SVR models are trained with the features determined by the LR models, and their hyper-parameters are selected using limited grid search.

Table 6 demonstrates that performance can indeed be improved by non-linear models. This means there exist non-linearity and interactions between the selected features, which cannot be described using the LR models.

The performance of the MLP models is usually better compared to the LR models (thus also better than the benchmarks). The improvement becomes quite significant for the complicated dataset (Welding Machine 2) with the highest degree of feature engineering (Setting 3). Conspicuous is that the Setting 1 model performance is worse than that of Setting 0 for both welding machines. This indicates the engineered time series features may cause overfitting in some cases.

A closer look at the results of the LR and MLP models on the test set of Welding Machine 2 for two example welding programs with Setting 3 is illustrated in Fig. 14. This reveals that




Fig. 13 Prediction results zoomed in on the test set area of Welding Machine 1, with the model of linear regression with the four settings of feature engineering ((a) Setting 0, (b) Setting 1, (c) Setting 2, (d) Setting 3), 20 features selected, and a look-back length of 10. The test set takes 10% of the data, and the training data area is not shown

although the different welding programs have very different behaviours of Q-Values, both the LR and MLP models are able to capture the different dynamics. From the figure, the performance of the two models is not easy to differentiate. This indicates the importance of the numerical performance metrics in Tables 5 and 6.

The performance of the SVR models is on par with the LR models (and better than the benchmarks). Also conspicuous is that the performance of the SVR models does not always improve as the degree of feature engineering increases. A further investigation of the SVR models reveals that their performance fluctuates irregularly by a noticeable amount (up to 1.2% mape) as the regularisation factor changes, which means the SVR models often fall into local optima. Some SVR models perform well on the training and validation sets but perform badly on the test sets, which indicates a tendency to overfitting.

Generalisability over other welding machines

We have been testing our methods on a number of other welding machines (four extra welding machines by the time of submission) and other quality indicators (e.g. Process Stability Factor). The results have shown comparably promising results and similar diagrams to Figures 12, 13 and 14. Our evaluation has confirmed our hypothesis of the generalisability of the proposed approaches over other welding machines and quality indicators.

Interpreting ML results for engineering insights

ML analysis in engineering should not only deliver results, but also provide insights that are helpful for engineering practice. Here, feature engineering has the advantage that the engineered features can efficiently represent the necessary information of the data and reveal influential factors and other insights. This section interprets the benchmarks, the selected features, and the visualisation of hyper-parameter selection to gain insights from the ML analysis for engineering practice.

Interpretation of the benchmarks

Referring to Table 3, we can see that through simple consideration it is already possible to build very effective predictors for the next Q-Value. According to process experts, Benchmark 1, Q̂_next1 = Q_pre1, works to some degree because the behaviour of the Q-Value should normally be stable and influenced by the wearing effect, while the wearing effect progresses gradually. The Q-Value therefore cannot change abruptly. Benchmark 2, Average over WearCount, works well because dressing should restore the change of surface condition caused by the wearing effect, and therefore also restores the behaviour of the Q-Value. In other words, the behaviour of the Q-Value should be nominally identical in different dress cycles. Benchmark 3, Q̂_next1 = Q_pre1_Prog, has the best performance because the welding operations of the same welding program provide more valuable information for the prediction of future welding quality of that welding program than the other welding programs do.

Extensive interpretation of selected features

Welding Machine 1. Table 7 lists the 5 most important features in order of descending importance for Setting 0, Setting 1, Setting 2 and Setting 3 on the dataset of Welding Machine 1, respectively. The Q-Value of the previous second welding spot is selected as the most important feature in three settings. Since the ProgNo always repeats the same pattern (see the "Results and discussion of linear regression" section), Prog1, Prog2, Prog1, Prog2, ..., the feature RawSF_Q_pre2 is equal to EngF_Prog_Q_pre1. EngF_Prog_Q_pre1 is namely the Q-Value of the previous spot welded with the same welding program as the next welding spot (this feature is identical to Benchmark 3). This means the quality of welding usually does not have abrupt changes.
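The coincidence of RawSF_Q_pre2 and EngF_Prog_Q_pre1 under a strictly interlaced two-program sequence can be illustrated with a small sketch of the per-ProgNo decomposition of Fig. 11. The (prog_no, q) tuple layout and the function name are assumptions for illustration, not the paper's implementation.

```python
def prog_lag_features(operations, look_back):
    """For each operation, collect the Q-Values of the previous `look_back`
    operations welded with the same ProgNo (EngF_Prog_Q_pre1, _pre2, ...).

    `operations` is a time-ordered list of (prog_no, q) pairs; missing
    lags are filled with None.
    """
    history = {}  # prog_no -> list of past Q-Values of that program, oldest first
    rows = []
    for prog_no, q in operations:
        past = history.setdefault(prog_no, [])
        lags = [past[-k] if len(past) >= k else None
                for k in range(1, look_back + 1)]
        rows.append(lags)
        past.append(q)
    return rows
```

For the interlaced sequence Prog1, Prog2, Prog1, Prog2, ... the first per-program lag of an operation is exactly the operation-level second lag, which is why RawSF_Q_pre2 and EngF_Prog_Q_pre1 coincide on Welding Machine 1 but not on Welding Machine 2, whose program order is not fixed.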




Table 6 Performance and hyper-parameters of the Multi-Layer Perceptrons (MLP) and Support Vector Regression (SVR) models evaluated on test sets of representative welding machines. Radial-Basis-Function has always been selected as the kernel type of SVR, and is therefore not listed in the table. mape: mean absolute percentage error, Act. function: activation function, λ: regularisation factor

Feature settings   Welding Machine 1 (WM1)                                    Welding Machine 2 (WM2)
                   mape (MLP)  Act. function  #Neurons  mape (SVR)  λ         mape (MLP)  Act. function  #Neurons  mape (SVR)  λ
Setting 0          2.27%       tanh           5         2.25%       8.17      3.87%       logistic       50        4.02%       19.02
Setting 1          2.42%       tanh           25        1.94%       10.19     4.07%       tanh           47        4.49%       21.29
Setting 2          2.03%       tanh           50        2.08%       6.15      3.57%       tanh           44        3.34%       16.00
Setting 3          1.92%       tanh           53        1.87%       8.17      2.97%       tanh           60        4.32%       14.79

Moreover, the features RawSF_WearCount_next1, EngSF_NewDress_next1 and EngF_Prog_WearDiff_next1 are selected, which means the wearing effect has a strong influence on the welding quality, and therefore quality prediction should use features characterising the wearing effect.

The features RawSF_I2_Mean_pre1, TSFE_R_WeldStd_pre1, TSFE_I_WeldMin_pre3 and RawSF_I_Mean_pre3 are selected, which means the time series features extracted from the welding stage indeed contain some information for predicting the next spot quality. Note these features are of the previous first or third spots, which are not welded with the same ProgNo as the next spot. This indicates that the information provided by historical spots that are not welded with the same ProgNo as the next spot is also important. The reason may be that the wearing effect taking place during these temporally adjacent welding operations has an influence on the next welding quality.

No time series features extracted from the initial stage are selected among the most important features. This indicates the initial stage may be less important for the welding quality, which is also reasonable considering that the time of the initial stage is short and the current is relatively too small to exert an effect.

In Settings 0 and 1, the selection of features of relatively early operations, like RawSF_Q_pre9 and TSFE_R_WeldStd_pre8, is questionable, because their influence should not be greater than that of temporally more adjacent features. In Settings 2 and 3 we can see these questionable features are no longer selected.

Welding Machine 2. As for Welding Machine 2 (Table 8), the selected features are different. The most obvious difference is that RawSF_Q_pre2 is no longer selected as the most important feature. As mentioned in the "Results and discussion of linear regression" section, a change of arrangement of welding programs happened in the 618th welding operation. For the same reason, no EngSF is selected among the most important features, since EngSFs do not incorporate information of welding programs.

Although the feature RawSF_WearCount_next1 can also describe the dependency of the Q-Value on the wearing effect to some degree, as it is selected through Settings 0 to 2, the performance of the models trained on Setting 3 demonstrates a significant improvement (Table 5). This highlights the advantage of the feature EngF_Prog_Q_pre1 in Setting 3.

Many TSFEs extracted from reference curves are selected as more important than those from actual process curves. Reference curves are prescribed by the welding programs, and are therefore always identical for a specific program. This implies that the next spot quality is also dependent on the welding programs performed on the previous spots, rather than the corresponding actual process curves. This




Fig. 14 Examples of prediction results on the test set of Welding Machine 2 illustrated for two welding programs ((a) Setting 3, Prog 1; (b) Setting 3, Prog 3), with the model of MLP with Setting 3 of feature engineering, 20 features selected, and a look-back length of 10

phenomenon is not evident for Welding Machine 1, since its welding program arrangement is fixed.

Similar to the case of Welding Machine 1, feature engineering avoids the selection of questionable features such as RawSF_Power_Mean_pre10, which is far away in terms of temporal influence and should not be included, if considered from the view of engineering know-how.

Interpretation of hyper-parameter selection

The results of the selection of hyper-parameters are illustrated in Fig. 12. The trend that the performance of the models increases as the look-back length increases (Figure 12b) implies that the hypothesis holds that there exist temporal dependencies between welding spots. That is to say, the Q-Value of the next welding spot is indeed dependent on the previous welding operations.

We observe that when the number of selected features ω > 15 and the look-back length l > 10, the model performance does not change significantly for WM1 (Fig. 12a and b). The same phenomenon is revealed for Welding Machine 2 (Fig. 12c and d): the model performance becomes insensitive in the areas of the selected hyper-parameters. The same hyper-parameters (ω = 20, l = 10) are therefore selected for the datasets of both Welding Machine 1 and Welding Machine 2.

The performance of Setting 0 and Setting 1 is more sensitive to the hyper-parameters than that of Setting 2 and Setting 3. This indicates again that the features derived from domain knowledge also make the model more insensitive to the hyper-parameters.

Conclusion and outlook

Conclusion. This work firstly reveals characteristics of welding data that are little discussed in the literature, especially the hierarchical temporal structures in the production data that are important for quality prediction. Then, machine learning (ML) approaches to deal with the hierarchical temporal structures, and feature engineering with deep consideration of engineering knowledge, are introduced. After that, the ML approaches are evaluated on two industrial production datasets to test their generalisability. A great advantage of our solution that is very desired in industry is that the ML approaches are insensitive to the hyper-parameters, the number of features and the look-back length. Our results demonstrate that the prediction power of even the most simplistic modelling method, linear regression, can be substantially enhanced through cunning design of engineered features. Furthermore, the transparency of feature engineering allows the interpretation of ML results to gain engineering insights. On the contrary, a blind training of ML models would select questionable and less robust features. The extensive interpretation of ML results enables a better understanding of the meaning of features and of temporal dependency, crossing the border between the two disciplines, ML and engineering, and making them more deeply intertwined.

Outlook. We have been conducting, or plan to conduct, investigations in the following directions, which we believe are valuable from both industrial and academic points of view:

– Testing the proposed approach on more datasets to further verify the generalisability.
– Exploring other feature extraction strategies, especially feature learning (deep learning), to compare the (dis)advantages with respect to feature engineering.



Table 7 The most important 5 features selected from the feature settings in the analysis of the dataset of Welding Machine 1, listed in ranking of descending importance. Note the score is evaluated on a multivariate basis, i.e. the importance of the ω-th feature is the combined score of the 1st to ω-th features (Sect. 4.2). The correlation coefficient between the model estimation and the target value is chosen as the feature score for an intuitive comparison. The prefixes RawSF, TSFE, EngSF, EngF_Prog indicate the feature source, the suffixes indicate the time stamp of the features, and the stems indicate the physical meanings, e.g. Q for Q-Value, R for resistance, I for current, I2_Mean for ProcessCurveMeans of current, I_WeldMin for the minimum extracted from the welding stage

#  Setting 0               Score  Setting 1               Score  Setting 2             Score  Setting 3                 Score
1  RawSF_Q_pre2            0.72   RawSF_Q_pre2            0.72   RawSF_Q_pre2          0.72   EngF_Prog_Q_pre1          0.73
2  RawSF_I2_Mean_pre1      0.78   RawSF_I2_Mean_pre1      0.78   EngSF_NewDress_next1  0.80   EngF_Prog_WearDiff_next1  0.86
3  RawSF_WearCount_next1   0.79   RawSF_WearCount_next1   0.79   RawSF_I2_Mean_pre1    0.84   EngF_Prog_I2_Mean_pre1    0.87
4  RawSF_Q_pre9            0.81   TSFE_I_WeldMin_pre3     0.81   TSFE_R_WeldStd_pre1   0.86   TSFE_R_WeldStd_pre1       0.88
5  RawSF_Q_pre3            0.82   TSFE_R_WeldStd_pre8     0.82   RawSF_I_Mean_pre3     0.87   EngF_Prog_Q_pre2          0.88

Table 8 The most important 5 features selected from the feature settings in the analysis of the dataset of Welding Machine 2, listed in ranking of descending importance. Feature scores are similar to Table 7. The prefixes RawSF, TSFE, EngSF, EngF_Prog indicate the feature source, the suffixes indicate the time stamp of the features, and the stems indicate the physical meanings, e.g. Q for Q-Value, I2_Mean for ProcessCurveMeans of current, R for resistance, RefU_WeldEndValue for the end value extracted from the welding stage of the reference curve of voltage, I_WeldMin for the minimum extracted from the welding stage, RefPWM for the reference curve of the Pulse Width Modulation

#  Setting 0               Score  Setting 1                    Score  Setting 2                    Score  Setting 3                 Score
1  RawSF_WearCount_next1   0.75   RawSF_WearCount_next1        0.75   RawSF_WearCount_next1        0.75   EngF_Prog_Q_pre1          0.79
2  RawSF_Time_pre2         0.81   TSFE_RefU_WeldEndValue_pre2  0.85   TSFE_RefU_WeldEndValue_pre2  0.85   EngF_Prog_WearDiff_next1  0.91
3  RawSF_I2_Mean_pre2      0.85   TSFE_RefR_WeldMax_pre1       0.88   TSFE_RefR_WeldMax_pre1       0.88   TSFE_RefR_WeldSlope_pre2  0.92
4  RawSF_R_Mean_pre1       0.86   TSFE_I_WeldMin_pre3          0.90   TSFE_I_WeldMin_pre3          0.90   RawSF_WearCount_next1     0.93
5  RawSF_Power_Mean_pre10  0.87   TSFE_R_WeldMax_pre1          0.90   TSFE_R_WeldMax_pre1          0.90   TSFE_RefPWM_WeldMean_pre4 0.93
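The multivariate scoring described in the caption of Table 7 can be mimicked with a greedy forward-selection sketch: at each step the candidate feature is scored by the correlation between a least-squares fit on all features selected so far and the target. This is an illustrative reconstruction under assumptions, not the authors' exact selection procedure.

```python
import numpy as np

def forward_select(X, y, n_features):
    """Greedy forward selection. At each step, add the column that
    maximises the correlation between the least-squares fit on the
    selected columns and the target -- the 'combined score of the
    1st to ω-th features'."""
    selected, scores = [], []
    remaining = list(range(X.shape[1]))
    def score(cols):
        A = np.column_stack([X[:, cols], np.ones(len(y))])  # with intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return np.corrcoef(A @ coef, y)[0, 1]
    for _ in range(n_features):
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
        scores.append(score(selected))
    return selected, scores
```

Because each score is computed on the jointly fitted subset, the score column in Tables 7 and 8 is non-decreasing down the ranking, unlike a per-feature univariate correlation.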

– Investigating other ML methods (Zou et al. 2021; Feng Amiri, N., Farrahi, G., Kashyzadeh, K. R., & Chizari, M. (2020). Appli-
et al. 2020), e.g. artificial neural networks, especially recurrent neural networks, which are well suited to processing data with temporal structure.

– Predicting the Q-Value of the next welding spot as a probability distribution. Before the next weld actually happens, its Q-Value cannot be predicted deterministically; the prediction of the next Q-Value in this work is in fact a prediction of the mean value. A better approach is probabilistic forecasting.
– Using the prediction results as a basis for process optimisation. Once the quality of the next spot has been predicted, several countermeasures are possible, e.g. flexible dressing, adaptation of the reference curves, or switching to the non-adaptive control mode.
– Modelling domain knowledge in ontologies (Svetashova et al. 2020a, b; Zhou et al. 2020b, 2021a), knowledge graphs (Zhou et al. 2021b, c) and rule-based systems (Kharlamov et al. 2017a, b, 2019; Horrocks et al. 2016). Domain knowledge can then be integrated more deeply into data integration (Jiménez-Ruiz et al. 2015; Pinkel et al. 2015, 2018; Kalayci et al. 2020; Kharlamov et al. 2016), data querying, reasoning (Thinh Ho et al. 2018; Ringsquandl et al. 2018) and the automatic construction of ML pipelines.
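The probabilistic-forecasting idea above can be sketched with a deliberately simple stand-in model: instead of a single point estimate, the forecaster emits a Gaussian (mean, standard deviation) for the next Q-Value, here built from exponentially weighted statistics over the weld sequence. Everything below (the `probabilistic_next_q` helper, the simulated drift, the smoothing factor `alpha`) is illustrative and not from this work; a production forecaster would more likely be a recurrent network with a Gaussian output layer trained by negative log-likelihood.

```python
import math
import random

def probabilistic_next_q(history, alpha=0.3):
    """Predict the next Q-Value as a Gaussian (mean, std) instead of a point.

    An exponentially weighted mean tracks the current level, and the spread
    of recent one-step-ahead errors provides the predictive uncertainty.
    """
    mean = history[0]
    var = 1e-6
    for q in history[1:]:
        err = q - mean                      # one-step-ahead error of the running forecast
        var = (1 - alpha) * var + alpha * err ** 2
        mean = (1 - alpha) * mean + alpha * q
    return mean, math.sqrt(var)

# Simulated Q-Value sequence: slow drift (e.g. electrode wear) plus noise.
random.seed(0)
series = [1.0 + 0.001 * t + random.gauss(0, 0.02) for t in range(500)]

# Rolling evaluation: how often does the 95% interval cover the realised Q-Value?
covered = 0
trials = 0
for t in range(100, 499):
    mu, sigma = probabilistic_next_q(series[:t])
    lo, hi = mu - 1.96 * sigma, mu + 1.96 * sigma
    covered += lo <= series[t] <= hi
    trials += 1
print(f"empirical coverage of 95% interval: {covered / trials:.2f}")
```

A distributional output like this connects directly to the process-optimisation point: a countermeasure (dressing, reference-curve adaptation, control-mode switch) can be triggered when the lower bound of the predicted interval, not just the mean, crosses a quality threshold.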
Funding Open access funding provided by University of Oslo (incl Oslo University Hospital). The work was partially supported by the H2020 projects Dome 4.0 (Grant Agreement No. 953163), OntoCommons (Grant Agreement No. 958371), and DataCloud (Grant Agreement No. 101016835), and the SIRIUS Centre, Norwegian Research Council project number 237898.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Afshari, D., Sedighi, M., Karimi, M. R., & Barsoum, Z. (2014). Prediction of the nugget size in resistance spot welding with a combination of a finite-element analysis and an artificial neural network. Materiali in Tehnologije, 48(1), 33–38.

Alpaydin, E. (2009). Introduction to machine learning. Massachusetts: MIT Press.

Amiri, N., Farrahi, G. H., Kashyzadeh, K. R., & Chizari, M. (2020). Applications of ultrasonic testing and machine learning methods to predict the static and fatigue behavior of spot-welded joints. Journal of Manufacturing Processes, 52, 26–34.

Bartschat, A., Reischl, M., & Mikut, R. (2019). Data mining tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(4), e1309.

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

Boersch, I., Füssel, U., Gresch, C., Großmann, C., & Hoffmann, B. (2016). Data mining in resistance spot welding. The International Journal of Advanced Manufacturing Technology, 1–15.

Chand, S., & Davis, J. (2010). What is smart manufacturing. Time Magazine Wrapper, 7, 28–33.

Cho, Y., & Rhee, S. (2000). New technology for measuring dynamic resistance and estimating strength in resistance spot welding. Measurement Science and Technology, 11(8), 1173.

Cho, Y., & Rhee, S. (2002). Primary circuit dynamic resistance monitoring and its application to quality estimation during resistance spot welding. Welding Journal, 81(6), 104–111.

Cho, Y., & Rhee, S. (2004). Quality estimation of resistance spot welding by using pattern recognition with neural networks. IEEE Transactions on Instrumentation and Measurement, 53(2), 330–334.

Dai, W., Li, D., Tang, D., Jiang, Q., Wang, D., Wang, H., & Peng, Y. (2021). Deep learning assisted vision inspection of resistance spot welds. Journal of Manufacturing Processes, 62, 262–274.

DVS. (2016). Widerstandspunktschweißen von Stählen bis 3 mm Einzeldicke – Konstruktion und Berechnung [Resistance spot welding of steels up to 3 mm single-sheet thickness – design and calculation]. Standard, Deutscher Verband für Schweißen und Verwandte Verfahren e. V., Düsseldorf, DE.

El-Banna, M., Filev, D., & Chinnam, R. B. (2008). Online qualitative nugget classification by using a linear vector quantization neural network for resistance spot welding. The International Journal of Advanced Manufacturing Technology, 36(3–4), 237–248.

El Ouafi, A., Bélanger, R., & Méthot, J. F. (2010). An on-line ANN-based approach for quality estimation in resistance spot welding. Advanced Materials Research, 112, 141–148.

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37–54.

Feng, W., Zhang, J., Dong, Y., Han, Y., Luan, H., Xu, Q., Yang, Q., Kharlamov, E., & Tang, J. (2020). Graph random neural networks for semi-supervised learning on graphs. In NeurIPS.

Gavidel, S. Z., Lu, S., & Rickli, J. L. (2019). Performance analysis and comparison of machine learning algorithms for predicting nugget width of resistance spot welding joints. The International Journal of Advanced Manufacturing Technology, 105(9), 3779–3796.

Guo, Z., Ye, S., Wang, Y., & Lin, C. (2017). Resistance welding spot defect detection with convolutional neural networks. In International Conference on Computer Vision Systems (pp. 169–174). Springer.

Haapalainen, E., Laurinen, P., Junno, H., Tuovinen, L., & Röning, J. (2005). Methods for classifying spot welding processes: A comparative study of performance. In International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems (pp. 412–421). Bari, Italy: Springer.

Haapalainen, E., Laurinen, P., Junno, H., Tuovinen, L., & Röning, J. (2008). Feature selection for identification of spot welding processes. In Informatics in Control Automation and Robotics (pp. 69–79). Berlin/Heidelberg: Springer.

Hamedi, M., Shariatpanahi, M., & Mansourzadeh, A. (2007). Optimizing spot welding parameters in a sheet metal assembly by neural networks and genetic algorithm. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 221(7), 1175–1184.

Hou, W., Wei, Y., Guo, J., Jin, Y., et al. (2017). Automatic detection of welding defects using deep neural network. Journal of Physics: Conference Series, 933, 012006. IOP Publishing.

Horrocks, I., Giese, M., Kharlamov, E., & Waaler, A. (2016). Using semantic technology to tame the data variety challenge. IEEE Internet Computing, 20(6), 62–66.

ISO. (2004). Resistance welding – procedures for determining the weldability lobe for resistance spot. International Materials Reviews, 49(2), 45–75.

ITU. (2012). Recommendation ITU-T Y.2060: Overview of the Internet of Things. Tech. rep., International Telecommunication Union.

Jiménez-Ruiz, E., Kharlamov, E., Zheleznyakov, D., Horrocks, I., Pinkel, C., Skjaeveland, M. G., Thorstensen, E., & Mora, J. (2015). BootOX: Practical mapping of RDBs to OWL 2. In ISWC (2) (pp. 113–132).

Junno, H., Laurinen, P., Tuovinen, L., & Röning, J. (2004). Studying the quality of resistance spot welding joints using self-organising maps. In Fourth International ICSC Symposium on Engineering of Intelligent Systems (EIS) (pp. 705–711).

Kagermann, H. (2013). Recommendations for implementing the strategic initiative INDUSTRIE 4.0. Tech. rep., acatech – National Academy of Science and Engineering, Frankfurt, DE.

Kagermann, H. (2015). Change through digitization – value creation in the age of Industry 4.0. In Management of Permanent Change (pp. 23–45). Springer.

Kalayci, E. G., Grangel-González, I., Lösch, F., Xiao, G., ul Mehdi, A., Kharlamov, E., & Calvanese, D. (2020). Semantic integration of Bosch manufacturing data using virtual knowledge graphs. In ISWC (2) (pp. 464–481).

Kharlamov, E., Brandt, S., Jiménez-Ruiz, E., Kotidis, Y., Lamparter, S., Mailis, T., Neuenstadt, C., Özçep, Ö. L., Pinkel, C., Svingos, C., Zheleznyakov, D., Horrocks, I., Ioannidis, Y. E., & Möller, R. (2016). Ontology-based integration of streaming and static relational data with Optique. In SIGMOD Conference (pp. 2109–2112).

Kharlamov, E., Hovland, D., Skjaeveland, M. G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G., Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., Giese, M., Lie, H., Ioannidis, Y. E., Kotidis, Y., Koubarakis, M., & Waaler, A. (2017a). Ontology based data access in Statoil. Journal of Web Semantics, 44, 3–36.

Kharlamov, E., Mailis, T., Mehdi, G., Neuenstadt, C., Özçep, Ö. L., Roshchin, M., Solomakhina, N., Soylu, A., Svingos, C., Brandt, S., Giese, M., Ioannidis, Y. E., Lamparter, S., Möller, R., Kotidis, Y., & Waaler, A. (2017b). Semantic access to streaming and static data at Siemens. Journal of Web Semantics, 44, 54–74.

Kharlamov, E., Kotidis, Y., Mailis, T., Neuenstadt, C., Nikolaou, C., Özçep, Ö. L., Svingos, C., Zheleznyakov, D., Ioannidis, Y. E., Lamparter, S., Möller, R., & Waaler, A. (2019). An ontology-mediated analytics-aware approach to support monitoring and diagnostics of static and streaming data. Journal of Web Semantics, 56, 30–55.

Kim, K. Y., & Ahmed, F. (2018). Semantic weldability prediction with RSW quality dataset and knowledge construction. Advanced Engineering Informatics, 38, 41–53.

Koskimaki, H. J., Laurinen, P., Haapalainen, E., Tuovinen, L., & Roning, J. (2007). Application of the extended knn method to resistance spot welding process identification and the benefits of process information. IEEE Transactions on Industrial Electronics, 54(5), 2823–2830.

LaCasse, P. M., Otieno, W., & Maturana, F. P. (2019). A survey of feature set reduction approaches for predictive analytics models in the connected manufacturing enterprise. Applied Sciences, 9(5), 843.

Laurinen, P., Junno, H., Tuovinen, L., & Röning, J. (2004). Studying the quality of resistance spot welding joints using Bayesian networks. In Proceedings of the Artificial Intelligence and Applications (pp. 705–711). Toulouse, France.

Lee, H. T., Wang, M., Maev, R., & Maeva, E. (2003). A study on using scanning acoustic microscopy and neural network techniques to evaluate the quality of resistance spot welding. International Journal of Advanced Manufacturing Technology, 22(9–10), 727–732.

Lee, S. R., Choo, Y. J., Lee, T. Y., Kim, M. H., & Choi, S. K. (2001). A quality assurance technique for resistance spot welding using a neuro-fuzzy algorithm. Journal of Manufacturing Systems, 20(5), 320–328.

Li, W., Hu, S. J., & Ni, J. (2000). On-line quality estimation in resistance spot welding. Journal of Manufacturing Science and Engineering, 122(3), 511–512.

Li, Y., Zhao, W., Xue, H., & Ding, J. (2012). Defect recognition of resistance spot welding based on artificial neural network. Advances in Intelligent and Soft Computing, 115(2), 423–430.

Martín, Ó., López, M., & Martín, F. (2007). Artificial neural networks for quality control by ultrasonic testing in resistance spot welding. Journal of Materials Processing Technology, 183(2–3), 226–233.

Martín, O., Tiedra, P. D., López, M., San-Juan, M., García, C., Martín, F., & Blanco, Y. (2009). Quality prediction of resistance spot welding joints of 304 austenitic stainless steel. Materials and Design, 30(1), 68–77.

Martín, T., De Tiedra, P., & López, M. (2010). Artificial neural networks for pitting potential prediction of resistance spot welding joints of AISI 304 austenitic stainless steel. Corrosion Science, 52(7), 2397–2402.

Mikhaylov, D., Zhou, B., Kiedrowski, T., Mikut, R., & Lasagni, A. F. (2019a). High accuracy beam splitting using SLM combined with ML algorithms. Optics and Lasers in Engineering, 121, 227–235.

Mikhaylov, D., Zhou, B., Kiedrowski, T., Mikut, R., & Lasagni, A. F. (2019b). ML aided phase retrieval algorithm for beam splitting with an LCoS-SLM. In Laser Resonators, Microresonators, and Beam Control XXI (Vol. 10904, p. 109041M). San Francisco: International Society for Optics and Photonics.

Mikut, R., Bartschat, A., Doneit, W., Ordiano, J. Á. G., Schott, B., Stegmaier, J., Waczowicz, S., & Reischl, M. (2017). The MATLAB toolbox SciXMiner: User's manual and programmer's guide. arXiv:1704.03298.

Mikut, R., Reischl, M., Burmeister, O., & Loose, T. (2006). Data mining in medical time series. Biomedizinische Technik, 51(5/6), 288–293.

Nomura, K., Fukushima, K., Matsumura, T., & Asai, S. (2021). Burn-through prediction and weld depth estimation by deep learning model monitoring the molten pool in gas metal arc welding with gap fluctuation. Journal of Manufacturing Processes, 61, 590–600.

NSF. (2010). NSF 11-516: Cyber-physical systems (CPS). Tech. rep., National Science Foundation, Virginia, USA.

Panchakshari, A. S., & Kadam, M. S. (2013). Optimization of the process parameters in resistance spot welding using genetic algorithm. International Journal of Multidisciplinary Science and Engineering, 4(3), 1438–1442.

Park, Y. J., & Cho, H. (2004). Quality evaluation by classification of electrode force patterns in the resistance spot welding process using neural networks. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 218(11), 1513–1524.

Pashazadeh, H., Gheisari, Y., & Hamedi, M. (2016). Statistical modeling and optimization of resistance spot welding process parameters using neural networks and multi-objective genetic algorithm. Journal of Intelligent Manufacturing, 27(3), 549–559.

Pereda, M., Santos, J., Martín, Ó., & Galán, J. (2015). Direct quality prediction in resistance spot welding process: sensitivity, specificity and predictive accuracy comparative analysis. Science and Technology of Welding and Joining, 20(8), 679–685.

Pinkel, C., Binnig, C., Jiménez-Ruiz, E., May, W., Ritze, D., Skjaeveland, M. G., Solimando, A., & Kharlamov, E. (2015). RODI: A benchmark for automatic mapping generation in relational-to-ontology data integration. In ESWC (pp. 21–37).

Pinkel, C., Binnig, C., Jiménez-Ruiz, E., Kharlamov, E., May, W., Nikolov, A., Sasa Bastinos, A., Skjaeveland, M. G., Solimando, A., Taheriyan, M., Heupel, C., & Horrocks, I. (2018). RODI: Benchmarking relational-to-ontology mapping generation quality. Semantic Web, 9(1), 25–52.

Podržaj, P., Polajnar, I., Diaci, J., & Kariž, Z. (2004). Expulsion detection system for resistance spot welding based on a neural network. Measurement Science and Technology, 15(3), 592.

Ringsquandl, M., Kharlamov, E., Stepanova, D., Hildebrandt, M., Lamparter, S., Lepratti, R., Horrocks, I., & Kröger, P. (2018). Event-enhanced learning for KG completion. In ESWC (pp. 541–559).

Rossum, G. (1995). Python reference manual. Tech. rep., CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands.

Samuel, A. L. (2000). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 44(1/2), 206–226.

Shevchik, S., Le-Quang, T., Meylan, B., Farahani, F. V., Olbinado, M. P., Rack, A., Masinelli, G., Leinenbach, C., & Wasmer, K. (2020). Supervised deep learning for real-time quality monitoring of laser welding with X-ray radiographic guidance. Scientific Reports, 10(1), 1–12.

Sumesh, A., Rameshkumar, K., Mohandas, K., & Babu, R. S. (2015). Use of machine learning algorithms for weld quality monitoring using acoustic signature. Procedia Computer Science, 50, 316–322.

Summerville, C., Adams, D., Compston, P., & Doolan, M. (2017). Nugget diameter in resistance spot welding: a comparison between a dynamic resistance based approach and ultrasound C-scan. Procedia Engineering, 183, 257–263.

Summerville, C. D. E., Adams, D., Compston, P., & Doolan, M. (2017). Process monitoring of resistance spot welding using the dynamic resistance signature. Welding Journal, 11, 403–412.

Sun, H., Yang, J., & Wang, L. (2017). Resistance spot welding quality identification with particle swarm optimization and a kernel extreme learning machine model. The International Journal of Advanced Manufacturing Technology, 91(5–8), 1879–1887.

Svetashova, Y., Zhou, B., Pychynski, T., Schmidt, S., Sure-Vetter, Y., Mikut, R., & Kharlamov, E. (2020a). Ontology-enhanced machine learning: A Bosch use case of welding quality monitoring. In ISWC (pp. 531–550).

Svetashova, Y., Zhou, B., Schmid, S., Pychynski, T., & Kharlamov, E. (2020b). SemML: Reusable ML models for condition monitoring in discrete manufacturing. In ISWC (Demos/Industry), Vol. 2721 (pp. 213–218).

Thinh Ho, V., Stepanova, D., Gad-Elrab, M. H., Kharlamov, E., & Weikum, G. (2018). Rule learning from knowledge graphs guided by embedding models. In ISWC (1) (pp. 72–90).

Tseng, H. Y. (2006). Welding parameters optimization for economic design using neural approximation and genetic algorithm. International Journal of Advanced Manufacturing Technology, 27(9–10), 897–901.

Wan, X., Wang, Y., & Zhao, D. (2016). Quality monitoring based on dynamic resistance and principal component analysis in small scale resistance spot welding process. The International Journal of Advanced Manufacturing Technology, 86(9–12), 3443–3451.

Wuest, T., Weimer, D., Irgens, C., & Thoben, K. D. (2016). Machine learning in manufacturing: advantages, challenges, and applications. Production & Manufacturing Research, 4(1), 23–45.

Yu, J. (2015). Quality estimation of resistance spot weld based on logistic regression analysis of welding power signal. International Journal of Precision Engineering and Manufacturing, 16(13), 2655–2663.

Zhang, H., Hou, Y., Zhang, J., Qi, X., & Wang, F. (2015). A new method for nondestructive quality evaluation of the resistance spot welding based on the radar chart method and the decision tree classifier. International Journal of Advanced Manufacturing Technology, 78(5–8), 841–851.

Zhang, H., Hou, Y., Zhao, J., Wang, L., Xi, T., & Li, Y. (2017). Automatic welding quality classification for the spot welding based on the Hopfield associative memory neural network and Chernoff face description of the electrode displacement signal features. Mechanical Systems and Signal Processing, 85, 1035–1043.

Zhang, Y., Chen, G., & Lin, Z. (2004). Study on weld quality control of resistance spot welding using a neuro-fuzzy algorithm. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 544–550). Springer.

Zhang, Y., You, D., Gao, X., Zhang, N., & Gao, P. P. (2019). Welding defects detection based on deep learning with multiple optical sensors during disk laser welding of thick plates. Journal of Manufacturing Systems, 51, 87–94.

Zhang, Z., Wen, G., & Chen, S. (2019). Weld image deep learning-based on-line defects detection using convolutional neural networks for Al alloy in robotic arc welding. Journal of Manufacturing Processes, 45, 208–216.

Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., & Gao, R. X. (2019). Deep learning and its applications to machine health monitoring. Mechanical Systems and Signal Processing, 115, 213–237.

Zhou, B. (2021). Machine learning methods for product quality monitoring in electric resistance welding. Ph.D. thesis.

Zhou, B., Chioua, M., & Schlake, J. C. (2017). Practical methods for detecting and removing transient changes in univariate oscillatory time series. IFAC-PapersOnLine, 50(1), 7987–7992.

Zhou, B., Chioua, M., Bauer, M., Schlake, J. C., & Thornhill, N. F. (2019). Improving root cause analysis by detecting and removing transient changes in oscillatory time series with application to a 1,3-butadiene process. Industrial and Engineering Chemistry Research, 58(26), 11234–11250.

Zhou, B., Pychynski, T., Reischl, M., & Mikut, R. (2018). Comparison of machine learning approaches for time-series-based quality monitoring of resistance spot welding (RSW). Archives of Data Science, Series A (Online First), 5(1), 1–27.

Zhou, B., Svetashova, Y., Byeon, S., Pychynski, T., Mikut, R., & Kharlamov, E. (2020a). Predicting quality of automated welding with machine learning and semantics: a Bosch case study. In CIKM.

Zhou, B., Svetashova, Y., Pychynski, T., & Kharlamov, E. (2020b). Semantic ML for manufacturing monitoring at Bosch. In ISWC (Demos/Industry), Vol. 2721, p. 398.

Zhou, B., Svetashova, Y., Gusmao, A., Soylu, A., Cheng, G., Mikut, R., Waaler, A., & Kharlamov, E. (2021a). SemML: Facilitating development of ML models for condition monitoring with semantics. Journal of Web Semantics, 71, 100664.

Zhou, D., Zhou, B., Chen, J., Cheng, G., Kostylev, E. V., & Kharlamov, E. (2021b). Towards ontology reshaping for KG generation with user-in-the-loop: Applied to Bosch welding. In IJCKG.

Zhou, B., Zhou, D., Chen, J., Svetashova, Y., Cheng, G., & Kharlamov, E. (2021c). Scaling usability of ML analytics with knowledge graphs: Exemplified with a Bosch welding case. In IJCKG.

Zou, X., Zheng, Q., Dong, Y., Guan, X., Kharlamov, E., Lu, J., & Tang, J. (2021). TDGIA: Effective injection attacks on graph neural networks. In KDD (pp. 2461–2471).

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

