
Process Safety and Environmental Protection 158 (2022) 382–399

Contents lists available at ScienceDirect

Process Safety and Environmental Protection


journal homepage: www.elsevier.com/locate/psep

Identification of risk features using text mining and BERT-based models: Application to an oil refinery


July Bias Macêdo, Márcio das Chagas Moura, Diego Aichele, Isis Didier Lins
Center for Risk Analysis and Environmental Modeling, Department of Industrial Engineering, Universidade Federal de Pernambuco, Brazil

ARTICLE INFO

Article history:
Received 30 July 2021
Received in revised form 9 December 2021
Accepted 10 December 2021
Available online 13 December 2021

Keywords:
Hazard identification
Hazard assessment
Natural language processing
Text mining
Oil refineries

ABSTRACT

The uncontrollable release of hazardous substances may lead to catastrophic accidents. In this context, risk studies are aimed at either recommending preventive measures or designing safeguards to mitigate the consequences. To that end, risk experts postulate possible leakages, then identify their causes and consequences, and finally evaluate and classify the risks into categories. These analyses rely on examining different engineering textual documents and attending numerous meetings, which is very time-consuming. Moreover, this qualitative process of hazard identification and assessment is usually the first step of quantitative risk analysis (QRA) and is paramount to ensure its quality. Therefore, we here propose to use text mining and fine-tuned, pre-trained bidirectional encoder representations from transformers (BERT) models to support and reduce the efforts required for completing the early stages of QRA. Our idea is to apply these techniques to identify the potential consequences of accidents related to the operation of an oil refinery and to classify each scenario in terms of severity of the consequence and likelihood of occurrence. The proposed method was applied to an actual oil refinery and presented very promising results. The potential consequences, the severity categories, and the likelihood categories were predicted with a mean accuracy of 97.42%, 86.44%, and 94.34%, respectively. The models resulting from this research were embedded into a web-based app called HALO (hazard analysis based on language processing for oil refineries).

© 2021 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Corresponding author.
E-mail addresses: [email protected] (J.B. Macêdo), [email protected], [email protected] (M. das Chagas Moura), [email protected] (D. Aichele), [email protected] (I.D. Lins).
https://doi.org/10.1016/j.psep.2021.12.025
0957-5820/© 2021 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

1. Introduction

Oil refineries are expensive, complex systems that provide essential resources by converting crude oil into useful products such as fuels, lubricants, and asphalt. Oil refining encompasses a great variety of physical and chemical processes, and it is divided into three basic steps: separation, conversion, and treatment (Demirbas and Bamufleh, 2017). These stages contain highly flammable, explosive, and/or toxic substances, which are handled and stored in extreme conditions. Thus, the loss of containment of these materials may threaten personal safety and the environment (Pramoth et al., 2020). Indeed, the consequences of these undesirable events depend on different variables such as the nature of the released material, its physical state, and the environmental conditions (Casal, 2017).

In this context, national and international regulators demand quantitative risk analysis (QRA) for both new and existing installations in order to provide a thorough picture of the hazards and, then, manage and minimize the potential risks related to their operation (Vinnem and Røed, 2020; Wang et al., 2018). Thus, QRA enables organizations to propose and implement preventive measures either to avoid undesired events or to design efficient safeguards (Basheer et al., 2019).

Overall, in the first steps of QRA (hazard identification and analysis), experts recognize relevant scenarios that may arise, assessing and reporting their likelihood and potential consequences (Steijn et al., 2020). To complete these steps, a multidisciplinary team of experts on design, operation, and maintenance of the plant adopts qualitative approaches (Fuentes-Bargues et al., 2016). Examples include preliminary hazard analysis (PHA) (Yan and Xu, 2019), hazard and operability analysis (HAZOP) (Guiochet, 2016), failure mode and effect analysis (FMEA) (Bhattacharjee et al., 2020), and fault tree (FT) analysis (Ramos et al., 2020b).

In order to support the development and periodic review of QRA, several studies have proposed advanced approaches to address different challenges usually faced in the analysis. Bernechea et al. (2013) proposed a methodology to incorporate domino effects into QRA by estimating the frequency with which new accidents will occur, while Kamil et al. (2019), Lisi et al. (2015), and Zhou and Reniers (2018) focused on modelling such effects. Other studies have been done to
estimate more realistic accident frequency (Badri et al., 2013; Landucci and Paltrinieri, 2016) and to quantify and/or update the probability of failure/accidents (Guo et al., 2021; Li et al., 2020; Meng et al., 2021; Sarvestani et al., 2021). However, these studies were not concerned with the efforts required in the early stages of QRA. Indeed, they adopted traditional and time-consuming approaches to identify hazards, namely PHA, HAZOP, bow-tie (constructed after assessing accident databases), assessment of historical data, literature review, and others.

Generally, these techniques involve examining different engineering documents that describe the installation (e.g., flowcharts, equipment, and material lists) and attending numerous meetings to postulate possible leakages, identify hazards and their possible causes and consequences and, finally, evaluate and classify risks (Carrasquilla and Melko, 2017).

Regarding the initial steps of QRA, for instance, Aziz et al. (2019) built an ontology to model knowledge and design an expert database system from the hazard scenarios. The preliminary step of the proposed approach consists of outlining the hazard scenarios and gathering relevant information, which requires tedious procedures and several brainstorming sessions. Furthermore, Ahmad et al. (2019) incorporated thematic analysis for hazard prevention strategies, which was applied to extract information from accident databases; this analysis involves transcribing and/or re-reading the data repeatedly to obtain the key points of the hazard prevention suggestions from the accident databases.

Therefore, this paper proposes the application of text mining (TM) and natural language processing (NLP) to support the initial qualitative steps of a QRA. Indeed, TM and NLP techniques can be applied to extract, organize, and classify information from text, allowing the automatic identification of patterns (Drury and Roche, 2019). For this reason, these techniques seem attractive in the QRA context as a way to reduce the effort needed to perform the analysis.

More specifically, TM can be described as the discovery of knowledge from written data and has been used in different fields such as healthcare, marketing, education, and industry (Gagne et al., 2019; Heidinger and Gatzert, 2018; Yim and Warschauer, 2017; Zare, 2019). TM is an interdisciplinary field that uses different analysis tools and involves techniques from machine learning (ML) and NLP. In a nutshell, NLP can be considered a subdiscipline of artificial intelligence and computational linguistics that includes any manipulation of natural language to allow computers to generate statements and/or words written in human languages (Khurana et al., 2017).

Given that, this paper applies TM to extract information from text data and fine-tunes pre-trained bidirectional encoder representations from transformers (BERT) (Devlin et al., 2018) to identify risk features in an oil refinery. To that end, the pre-trained BERT model was fine-tuned with domain-specific datasets to perform three tasks: (i) to predict the potential consequences of accidents related to the operation of an oil refinery, (ii) to classify each scenario in terms of severity of the consequence, and (iii) to classify each scenario in terms of likelihood of occurrence. Each dataset used to train the three models was built based on PHA documents available for an oil refinery. It is noteworthy that there are different documents that store information as textual data. We here focused on PHA spreadsheets because they summarize and store information from experts and other refinery documents; thus, PHA documents contain valuable textual information. These documents were developed by a group of experts specific to each production unit of the oil refinery. To perform the risk analysis, they followed a standard (ANP 2014) that guides them in filling out the forms and in defining the potential consequences and the frequency and severity categories for each of the scenarios.

We expect that these three models could be highly useful for new plants that depend on the approval of the environmental regulators to start the development of the facility design and construction. In these situations, almost no specific information is available for that plant, and risk experts usually rely on partially relevant risk studies performed for similar facilities. Thus, with a model trained on all the available information garnered from past risk studies, experts could use that entire source of knowledge to reduce uncertainty. Instead of starting the analysis from scratch, risk analysts could reuse knowledge, imbued in the trained models, from previous studies or use QRA performed for similar plants as a starting point to identify potential consequences, qualitatively characterize frequency and severity of accidental scenarios, and prioritize the most critical events.

We claim that such a method is particularly important for oil refineries because regulatory agencies require highly detailed PHA in order to provide valuable information to decision makers and surrounding communities and to be suitable for QRA (Baybutt, 2015). Thus, risk analysts are required to postulate and analyse hundreds or even thousands of scenarios. For instance, for a medium-size oil refinery that processes around 200k barrels of oil per day, a PHA resulted in the identification of more than 3000 accidental hypotheses.

The remainder of this paper is organized as follows. Section 2 introduces key concepts on risk analysis, provides an overview of the characteristics of TM and NLP, and presents some advances in risk analysis and works that applied TM and NLP techniques in this context. Section 3 describes the proposed methodology to predict the potential accidents and discusses the text data used, the preprocessing step, and the modelling process. Section 4 shows an application of the proposed model to an oil refinery. Finally, Section 5 presents concluding remarks and points out future research directions.

2. Theoretical background

2.1. Risk analysis

Risk analysis allows managing hazards properly to prevent potential accidents from happening. QRA oftentimes resorts to systematic approaches for characterizing a risk. The early stages of QRA consist of hazard identification and analysis, which represent the most difficult steps due to the many possibilities (scenarios) of what may go wrong (Pasman and Rogers, 2018; Ramos et al., 2020a; Zeng and Zio, 2017). To perform these steps, a team of experts usually attends several meetings aimed at brainstorming all hazards and potential leakages, their possible causes, expected frequency, and consequences. To that end, the experts need to consider several engineering documents to gather relevant information about the system and its environment (Pasman et al., 2018; Zio and Aven, 2012).

There are different techniques that are widely adopted in the early stages of QRA of process industries such as oil refineries. The choice of the right technique depends on different factors, such as available resources, the amount and quality of the data, and the complexity of the system analysed. The aim of PHA is to identify all possible leakages and the accidental events that may occur and to provide a qualitative estimate of the severity and likelihood of each accidental scenario (Li et al., 2018).

Then, PHA is an approach to screen out the low-risk scenarios, while the most critical events are further analysed by a quantitative
approach to estimate their physical effects, generally related to fire, explosion, and toxic dispersion. Finally, this piece of information is conflated for all critical events to calculate the individual and social risks associated with the entire facility, which are compared to the risk tolerability criteria established by regulatory agencies. These same steps are also performed for existing oil refineries with the objective of presenting evidence that both risk estimates are still below the thresholds. For instance, this is a requirement of the environmental regulator in order to permit the plant’s life extension.

According to ISO 31010 (ISO, 2018), Table 1 and Table 2 show the consequence and likelihood classes, respectively, that are commonly adopted in PHA. Their combination represents the risk category (Arunraj and Maiti, 2007; Aven and Zio, 2018). Note that the categories are defined in terms of the damage to human life. It is worth mentioning that other assets could also be analysed, such as environment, property, or reputation. However, the scope of this work is limited to human life.

Table 1
Description of the consequence levels in terms of the effects to human life.
Source: adapted from ISO (2018).

Category   Consequence   Effects
I          Negligible    without injuries
II         Minor         minor injuries or first aid treatment
III        Moderate      serious injuries inside or mild injuries outside the facility
IV         Significant   fatality inside or serious injuries outside the facility

Table 2
Description of the likelihood categories.
Source: adapted from ISO (2018).

Category   Likelihood   Description
A          Remote       conceptually possible, but there are no records in the literature
B          Unlikely     unlikely to occur in normal conditions
C          Possible     might occur sometime
D          Likely       will probably occur

The results of the PHA are usually reported as spreadsheets, as illustrated in Table 3, which is an example that represents the description of a potential accident due to the release of contaminated and oily water from a basin of the industrial wastewater treatment unit in an oil refinery. Table 3 also contains the operating conditions, a list of equipment existing in the analysed subsystem, and the pipeline material. Note that, given the initiating event (i.e., leakage or rupture), a variety of consequences may occur (e.g., a small leakage might cause a toxic vapour cloud and/or irritation), and these may have different impacts on human life, which are defined by the “severity” column, whereas the rate of occurrence of each initiating event is indicated in the “likelihood” column.

For an oil refinery with a capacity of processing about 230,000 barrels per day, the PHA resulted in a dataset of 1635 reports similar to that of Table 3. These spreadsheets summarize the assessment performed by the experts and represent their tacit knowledge; thus, these documents contain valuable information about the risks related to the operation of different subsystems present in the oil refinery. Given that, TM and NLP techniques can be applied to automatically extract the text and assess the content of previous analyses to develop tools that may support future QRA.

2.2. Text mining and NLP

Information about an industrial system is usually stored in the form of textual data. Indeed, approximately 76% of activities in industries require natural language understanding (Baker et al., 2020). Commonly, the number of available documents is overwhelming and difficult to process manually. Thus, TM can reduce the time and human effort required for content analysis of these documents using different methods from NLP, ML, information retrieval, and knowledge management (Galati and Bigliardi, 2019; Pejic-Bach et al., 2020).

There are several applications of NLP that are used in TM, such as topic tracking and clustering. For instance, text classification is one NLP task with several real-world applications, such as sentiment analysis of movie reviews and spam, bot, and fraud detection (Minaee et al., 2021). It involves extracting rules from an annotated corpus, which is a set of related labelled documents/texts; once the classifier is trained, it can classify new textual data based on the patterns detected (Bengfort et al., 2018).

Table 3
Example of data contained in PHA documents.

Unit: Industrial wastewater treatment
System: Flow regularization system
Subsystem description: Basin with possible presence of toxic substance hydrocarbon from another unit
Pipeline material: Carbon steel
Operating conditions: temperature 25 °C; pressure 1.03 kgf·cm⁻²; flow rate 3000 kg·h⁻¹
Equipment: Sump pump
Chemical product: Contaminated and oily water

Initiating event   Potential consequences   Severity   Likelihood
Leakage            Irritation               II         D
Leakage            Toxic vapour cloud       II         D
Rupture            Irritation               III        A
Rupture            Toxic vapour cloud       III        A
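As a minimal sketch (not the authors' code), the categories of Tables 1 and 2 and the sheet of Table 3 could be held in a small Python structure such as the one below. The class names `Scenario` and `PHASheet`, the field names, and the `describe` helper are illustrative assumptions; only the category labels and the example values come from the tables.

```python
from dataclasses import dataclass, field

# Category labels transcribed from Table 1 (severity) and Table 2 (likelihood).
SEVERITY = {"I": "Negligible", "II": "Minor", "III": "Moderate", "IV": "Significant"}
LIKELIHOOD = {"A": "Remote", "B": "Unlikely", "C": "Possible", "D": "Likely"}


@dataclass(frozen=True)
class Scenario:
    """One scenario row of a PHA sheet (hypothetical structure)."""
    initiating_event: str
    consequence: str
    severity: str    # key of SEVERITY
    likelihood: str  # key of LIKELIHOOD

    def describe(self) -> str:
        """Spell out the category codes using the labels of Tables 1 and 2."""
        return (
            f"{self.initiating_event} -> {self.consequence}: "
            f"severity {self.severity} ({SEVERITY[self.severity]}), "
            f"likelihood {self.likelihood} ({LIKELIHOOD[self.likelihood]})"
        )


@dataclass
class PHASheet:
    """Header fields and scenarios of one sheet (field names are assumptions)."""
    unit: str
    system: str
    subsystem: str
    pipeline_material: str
    equipment: str
    chemical_product: str
    operating_conditions: dict
    scenarios: list = field(default_factory=list)


# Values transcribed from Table 3.
sheet = PHASheet(
    unit="Industrial wastewater treatment",
    system="Flow regularization system",
    subsystem="Basin with possible presence of toxic substance hydrocarbon from another unit",
    pipeline_material="Carbon steel",
    equipment="Sump pump",
    chemical_product="Contaminated and oily water",
    operating_conditions={"temperature_C": 25, "pressure_kgf_cm2": 1.03, "flow_rate_kg_h": 3000},
    scenarios=[
        Scenario("Leakage", "Irritation", "II", "D"),
        Scenario("Leakage", "Toxic vapour cloud", "II", "D"),
        Scenario("Rupture", "Irritation", "III", "A"),
        Scenario("Rupture", "Toxic vapour cloud", "III", "A"),
    ],
)

for scenario in sheet.scenarios:
    print(scenario.describe())
```

Keeping the category codes as dictionary keys means a sheet stores exactly what the spreadsheet shows, while the descriptive labels are looked up only when a human-readable summary is needed.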
2.2.1. Preprocessing

In its raw format, the corpus is useless for algorithms that work on numeric feature spaces. For this reason, it must be preprocessed and converted into a standard format for knowledge distillation (e.g., as input to an ML model), which consists of discovering patterns and identifying information by applying different mechanisms on the vectorized corpus (Bengfort et al., 2018; Moreno and Redondo, 2016). The most commonly applied preprocessing operations are: stopword removal, where irrelevant words (e.g., “a”, “it” and “to”) are cut out; punctuation removal; upper-to-lower case conversion, to ensure that the same word is treated identically regardless of case (e.g., “Hello” and “hello”); stemming, to reduce words to their root form (e.g., “processing” is reduced to “process”) (Te et al., 2014); and tokenization, where the word boundaries are identified and the texts are then broken down into a sequence of words/terms known as tokens (e.g., “Hello world” is converted to “Hello”, “world”) (Chowdhary, 2020; Moreno and Redondo, 2016).

Moreover, the processed texts need to be converted into a numerical representation, a feature vector that can be used as input for supervised and/or unsupervised algorithms. These representations are the basis for knowledge distillation. However, obtaining high-quality word representations is quite challenging because they should represent the syntax, semantics, and context of a word (Feldman and Sanger, 2007).

There are several methods to vectorize texts, from empirical models (based on Bag-of-Words representations) to word embeddings. In particular, a continuous representation is derived by using a neural network with word embedding methods. Overall, the word vectors of the input sentence are processed in the layers of the architecture. Each layer yields a more abstract representation of the input sentence until a single vector representing the entire input text is obtained. Then, different output layers can be adopted, depending on the task performed by the network. For example, a sigmoid or softmax can be considered for classification, or a decoder (sequence-to-sequence configuration) for translation (Baker et al., 2020; George, Joseph, 2014; Kim et al., 2020). In this context, deep learning (DL) architectures have achieved state-of-the-art results on different NLP tasks (e.g., machine translation, email spam detection, information extraction, and text summarization) (Howard and Ruder, 2018). For instance, BERT is a novel language representation model proposed by Google (Devlin et al., 2018). Next, we give a brief introduction to BERT.

2.2.2. BERT

BERT consists of a multi-layer transformer encoder; the transformer is an architecture developed by Vaswani et al. (2017) that is entirely based on attention mechanisms, which learn contextual relations between words in a text. As opposed to directional models, such as long short-term memory networks (LSTMs), transformers encode all the words in the input sentence at once; thus, they learn the context of a word from all its surroundings and train faster than sequential models.

BERT was pre-trained by Google on an extremely large corpus (BooksCorpus (Zhu et al., 2015) and Wikipedia). During pre-training, the model learns the relation between words within a sentence and between sentences by training on two unsupervised tasks: “masked language modelling” and “next sentence prediction”. For “masked language modelling”, BERT takes in a sentence in which 15% of the words have been randomly masked; the objective is then to predict the original word of each masked token based only on its context. This objective allows the representation to combine the right and the left contexts. For “next sentence prediction”, BERT takes in two sentences and the aim is to predict whether the second one follows the first. This task allows the model to understand the relation across sentences (Devlin et al., 2018).

In a nutshell, during pre-training, a pair of sentences, with some words masked, is fed to the model and each word is converted into a feature vector representation. These input data are passed to the transformer encoder layers; each encoder is broken down into two sub-layers: a multi-head attention layer and a feed-forward neural network. The encoder’s inputs first flow through a multi-head attention layer (12 attention heads), which helps the model focus on other words in the input sentence as it encodes a specific word. Next, the outputs of the attention heads are concatenated and fed to the feed-forward neural network. Finally, each word vector is passed into a fully connected layer in the output layer; for more details, see Devlin et al. (2018) and Vaswani et al. (2017).

Nevertheless, training these models from scratch would require large datasets and a long time to converge. Thus, pre-trained word representations have been the key component for improving different NLP tasks. Several pre-trained versions of the model are available for download; thus, we can further train BERT to perform a supervised-learning task by adding an untrained layer of neurons on top of the pre-trained model. Overall, during fine-tuning, the pre-trained parameters are adopted to initialize the model and are then fine-tuned using specific, labelled data for solving the supervised task. In this research, we fine-tuned the PyTorch implementation of pre-trained BERT supplied by Google (Wolf et al., 2020) using the data extracted from PHA documents to perform three text classification tasks.

2.2.3. Related works

ML, TM, and NLP have been recently applied to different research areas. In this section, we focus on the different issues that these fields can handle in the risk analysis context. Indeed, Zio (2018) pointed out that the advance in computing power and the growing availability of data favour the development of models for mining the knowledge acquired for QRA.

For example, Rachman and Ratnayake (2019) developed an ML-based model to conduct risk-based inspection screening assessment, which is used to identify equipment that makes a major contribution to the system’s total risk of failure, thus allowing high-risk systems to be prioritized. These authors had to perform feature selection to build a dataset from previous risk-based inspections conducted for offshore oil and gas production and processing units, where the output was the risk category. Kurian et al. (2020) applied ML to analyse process and occupational-type incident reports from five oil sand industries. These reports were manually classified and provided to different ML algorithms (AdaBoost, decision trees, K-Nearest Neighbors, Random Forest, Support Vector Machines, Multilayer Perceptron, Multinomial Naive Bayes, and Logistic Regression) in order to predict labels for incident type, consequence type, actual risk score, and potential risk score. However, these approaches require a feature engineering step, where the specialist must manually process the database and select which features will be used for feeding the model.

For instance, Heidarysafa et al. (2018), Nayak et al. (2009), and Zhang et al. (2019) applied TM methods to accident narratives to comprehend contextual relationships inherent to road accidents, identify causes, and predict secondary crashes. Boggs et al. (2020) applied TM to perform an exploratory analysis of automated vehicle crash reports to quantify the pre-crash and location factors. Kuhn (2019), Robinson (2019), and Sjöblom (2014) developed approaches based on TM and NLP in the context of aviation accident analysis to find similarities between the accident reports by applying clustering, and to
automatically identify topics within reports by topic modelling in order to pinpoint trends to prioritize safety activities (Vayansky and Kumar, 2020).

Liu et al. (2021) employed K-means clustering to assess incident narratives from the Pipeline and Hazardous Materials Safety Administration database in order to identify contributing factors and latent causality. Suh (2021) applied TM and topic modelling to extract topics from narrative texts contained in accident reports of the Occupational Safety and Health Administration database to identify sectoral patterns, which are defined by categorizing the common nature of accidents shared across industries.

Wang et al. (2016) proposed a system based on TM to extract risk elements from risk matrices. The authors extracted the risk origin, component, causes, probability, and severity from those documents and annotated the identified words. Then, they adopted support vector machines (SVM) (Vapnik and Izmailov, 2017) to perform a binary classification (high or low risk).

Sarkar (2016) proposed a TM and ML-based model to identify basic events that influence the primary causes of occupational accidents (reactive data) in a steel plant. Then, the author predicted the probability of the occurrence of a given cause through a Bayesian network (Leu and Chang, 2013). Next, Sarkar et al. (2018) included proactive data, which consist of observations by safety inspectors that indicate a certain level of potential hazard, and used decision tree classifiers to predict the occurrence of accidents.

Moreover, Passmore et al. (2018) applied topic modelling to summarize narrative reports of injuries that occurred in coal mines to identify the main theme of the documents. Singh et al. (2019) used reactive data (accident reports) and proactive data (workplace conditions during an accident-free period) to identify chains of events in accident paths. Zhang and Mahadevan (2019) developed an SVM and DL-based model to examine previous accidents. The authors applied the ensemble model to extract features from reports of aviation accidents and assign the risk level to a corresponding aviation incident. Their model classified the risk associated with the consequences of accidental events as high, moderately high, medium, moderately medium, or low.

Although the mentioned studies significantly contributed to the advance of risk analysis, there is still a lack of studies focused on supporting preventive risk studies in process industries by applying TM and DL techniques. Most studies that apply TM and DL to risk analysis use reactive data such as accident reports; however, valuable information is stored in different forms of text data, and their adoption could support risk studies.

In this paper, we propose an approach based on TM and DL models to extract information from the proactive risk analysis of an oil refinery in order to specify the potential consequences and classify each accidental scenario in terms of severity of its consequence and likelihood of occurrence. To that end, we developed an approach based on the pre-trained BERT model to extract relevant information from PHA documents. The text data were used to feed and fine-tune the pre-trained model. In this way, the model is capable of learning patterns that allow it to characterize risk scenarios (predict potential consequences, severity of consequences, and likelihood of occurrence) given the occurrence of an uncontrollable release of hazardous material. Next, we present our proposed approach.

3. Proposed methodology for identifying risk features

This paper aims at reducing the efforts required to develop risk studies. To that end, we adopted TM techniques to extract text data from PHA documents and then performed text classification tasks to identify risk features in the subsystems of an oil refinery. Our idea is to develop models capable of learning and recognizing risk features, and thus extract useful knowledge about accidental scenarios.

We here developed three models by fine-tuning the pre-trained BERT model with the extracted data to perform three classification tasks: i) identification of possible consequences, given an occurrence of a leakage; ii) classification of the severity of the consequences; iii) classification of the likelihood of occurrence of the accidental scenario. Each model was trained with a specific annotated corpus that was built from the PHA sheets. Indeed, the corpus contains the data extracted from PHA documents (e.g., Table 3) and the target related to its corresponding task.

Fig. 1. General overview of the proposed methodology.

Fig. 1 provides an overview of the proposed methodology. First, we developed two scripts: one that automatically extracts text from a collection of PHA spreadsheets, and another to organize and build an annotated corpus for each supervised-learning task, also referred
Table 4
Data from Table 3 converted into CSV format.

Fields: unit, system, subsystem description, chemical product, operating conditions, material, equipment, initiating event, potential consequence, severity, likelihood

Instance   Data
1          Industrial wastewater treatment, Flow regularization system, Basin with possible presence of toxic substance hydrocarbon from another unit, Contaminated and oily water, 25, 1.033 3000, Carbon steel, Sump pump, Leakage, Irritation, II, D
2          Industrial wastewater treatment, Flow regularization system, Basin with possible presence of toxic substance hydrocarbon from another unit, Contaminated and oily water, 25, 1.033 3000, Carbon steel, Sump pump, Leakage, Toxic vapour cloud, II, D
3          Industrial wastewater treatment, Flow regularization system, Basin with possible presence of toxic substance hydrocarbon from another unit, Contaminated and oily water, 25, 1.033 3000, Carbon steel, Sump pump, Rupture, Irritation, II, A
4          Industrial wastewater treatment, Flow regularization system, Basin with possible presence of toxic substance hydrocarbon from another unit, Contaminated and oily water, 25, 1.033 3000, Carbon steel, Sump pump, Rupture, Toxic vapour cloud, III, A

Table 5
Definition of the input and output of each corpus.

Corpus | Input data | Output | Number of instances
1 | Unit, system, subsystem description, chemical product, initiating event, equipment, equipment specifications, and operating conditions | Potential consequence | 1391
2 | Unit, system, subsystem description, chemical product, initiating event, equipment, equipment specifications, operating conditions, and potential consequence | Severity category | 2974
3 | Unit, system, subsystem description, chemical product, initiating event, equipment, equipment specifications, operating conditions, and potential consequence | Likelihood category | 2974
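The mapping from CSV instances (Table 4) to the three corpora (Table 5) can be illustrated with a minimal sketch. It is not the authors' script: the dictionary keys, the function name `build_corpora`, and the shortened description string are hypothetical, and the integer encodings simply assume that categories I to IV and A to D map to 0 to 3.

```python
from collections import defaultdict

# Order of the seven potential consequences (n = 1..7) as listed in the text.
CONSEQUENCES = [
    "burn injury", "vapour cloud explosion", "flash fire", "irritation",
    "pool fire", "toxic vapour cloud", "jet fire",
]
SEVERITY = {"I": 0, "II": 1, "III": 2, "IV": 3}      # assumed encoding
LIKELIHOOD = {"A": 0, "B": 1, "C": 2, "D": 3}        # assumed encoding


def build_corpora(rows):
    """Build (sentence, label) pairs for the three tasks from CSV-like rows."""
    grouped = defaultdict(set)          # base sentence -> set of consequences
    corpus2, corpus3 = [], []
    for r in rows:
        base = f"{r['description']} {r['event']}"
        grouped[base].add(r["consequence"].lower())
        sentence = f"{base} {r['consequence']}"
        corpus2.append((sentence, SEVERITY[r["severity"]]))
        corpus3.append((sentence, LIKELIHOOD[r["likelihood"]]))
    # Corpus 1: one multi-hot vector per (subsystem, initiating event) sentence.
    corpus1 = [
        (s, [1 if c in found else 0 for c in CONSEQUENCES])
        for s, found in grouped.items()
    ]
    return corpus1, corpus2, corpus3


# The four instances of Table 4 (description shortened for readability).
DESC = "Industrial wastewater treatment Flow regularization system Sump pump"
rows = [
    {"description": DESC, "event": "Leakage", "consequence": "Irritation",
     "severity": "II", "likelihood": "D"},
    {"description": DESC, "event": "Leakage", "consequence": "Toxic vapour cloud",
     "severity": "II", "likelihood": "D"},
    {"description": DESC, "event": "Rupture", "consequence": "Irritation",
     "severity": "II", "likelihood": "A"},
    {"description": DESC, "event": "Rupture", "consequence": "Toxic vapour cloud",
     "severity": "III", "likelihood": "A"},
]
corpus1, corpus2, corpus3 = build_corpora(rows)
```

With these four rows, corpus 1 receives two sentences (one per initiating event), while corpora 2 and 3 receive four sentences each, which mirrors why corpus 1 is smaller than the other two in Table 5.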

to as dataset in this paper. Next, the corpus was preprocessed and converted into a manageable format for feeding the learning algorithms. Below, we describe these steps in more detail.

3.1. Text extraction

The first script accesses each PHA document (available as DOC and DOCX files), and the textual data are extracted by searching for each header: unit, system, subsystem description, equipment, chemical product, pipeline/equipment material, temperature, pressure, flow rate, initiating event, potential consequences, likelihood, and severity. Then, all texts were automatically extracted and stored into a CSV file. This step is paramount to make the data more manageable.
Since some headers have multiple text data (e.g., initiating event, see Table 3), various instances were generated per document. Each row of the resulting CSV file corresponds to an instance, and the texts associated with each header are separated by commas. Table 4 illustrates the instances resulting from Table 3 in the CSV file.

3.2. Text organization

As Fig. 1 depicts, the target depends on the task performed. For this reason, an annotated corpus (i.e., labelled dataset) was built for each of the three tasks. Thus, the second script selects the textual data to construct the input sentences and the target from the CSV file according to the header. The script also discards the rows of the CSV file without data related to the “initiating event”, “potential consequence”, “severity category” or “likelihood category”.
Each row of the CSV file may provide multiple input sentences. For example, a set of potential consequences was specified for two initiating events (leakage and rupture, see Table 3) and, thus, we can build a sentence for each set. It is also possible to construct different input sentences using the potential consequences, where the output pair can be either the severity or the likelihood category. Thus, we built corpus 1 with 1391 instances, i.e., input sentence and label pairs, for Model 1, and corpus 2 and corpus 3 with 2974 instances each for Models 2 and 3, respectively. Table 5 summarizes the input and output that compose the datasets and presents the number of instances of each corpus.
Each input sequence provided to Model 1 characterizes a possible leakage in a specific subsystem of the oil refinery. Since two initiating events (leakage and rupture) were considered by the experts during the development of the PHA, every subsystem is represented twice in this dataset. For the first task, we defined our input as a sentence constructed by joining the following text: unit, system, subsystem description, chemical product, operating conditions, equipment, equipment specifications, and initiating event. Considering the example in Table 3, one of the raw sentences $i$ used as input for task 1, $x_{1,i}$, is given in Eq. 1:

$$x_{1,i} = \text{Industrial wastewater treatment Flow regularization system Basin with possible presence of toxic substance hydrocarbon from other unit Contaminated and oily water 25 1.033 3,000 Carbon steel Sump pump Leakage} \qquad (1)$$

This sentence represents that a small leakage of petroleum might cause toxic vapour cloud and/or irritation; thus, the output is a vector that contains the combination of the potential consequences. In this example, the output $y_{1,i}$ corresponding to $x_{1,i}$ is given in Eq. 2:

$$y_{1,i} = [0, 0, 0, 1, 0, 1, 0]^{T} \qquad (2)$$

Note that this vector contains 7 positions, one for each potential consequence that can damage human life found in the PHA documents. Thus, each position $y_{1,i}^{n}$, $n = 1, 2, 3, 4, 5, 6,$ or $7$ (burn injury, vapour cloud explosion, flash fire, irritation, pool fire, toxic vapour cloud, or jet fire


Fig. 2. a) Elbow method b) corpus’ clusters.

respectively), assumes the values 0 or 1, where 1 indicates the presence of the potential consequence $n$. These potential consequences were defined by the experts based on their knowledge, experience, and creativity. The use of textual data from PHA may allow the model to extract and understand this knowledge. Thus, it would enable postulating an accidental scenario in an oil refinery subsystem.
For the second and third tasks, the input sentences were similarly constructed by joining the same textual data used for the first task with the addition of the potential consequence (see Table 5). Using the same example, two input sentences can be constructed: one by adding “toxic vapour cloud” and the other by adding “irritation”. These input sentences are present in both corpora (i.e., for task 2 and for task 3). For example, one of these sentences used as input for tasks 2 and 3, $x_{2,i}$ and $x_{3,i}$, is given in Eq. 3.

$$x_{2,i} = x_{3,i} = \text{Industrial wastewater treatment Flow regularization system Basin with possible presence of toxic substance hydrocarbon from other unit Contaminated and oily water 25 1.033 3,000 Carbon steel Sump pump Leakage Toxic vapour cloud} \qquad (3)$$

The output for the second task may be assigned to four possible values (0, 1, 2, or 3), which represent the severity categories (I to IV; see Table 1). For instance, the output of Model 2 for the $i$-th input sentence is $y_{2,i} = 1$ (II). Also, the output for the third task may be assigned to four possible values (0, 1, 2, or 3), which represent the likelihood category (A to D; see Table 2), while the output of Model 3 for the $i$-th input sentence is $y_{3,i} = 3$ (D). Thus, we created an appropriate dataset for each model.

3.3. Data analysis

In order to check for systematic biases in the corpus, we performed k-means clustering. The number of clusters was selected through the Elbow method, in which the Within-Cluster Sum of Squared Errors (WSS) is calculated for different numbers of clusters; then, the number of clusters selected is the one beyond which the decrease in WSS starts to level off. In the plot of WSS versus number of clusters, this is visible as an “elbow” (D’Silva and Sharma, 2020). The Elbow method provides the optimal number of clusters within a pre-defined search space; here, the minimum number of clusters was defined as 2 and the maximum as 10. Thus, considering the WSS versus number of clusters curve for our corpus, depicted in Fig. 2a, we set 5 clusters for the k-means clustering algorithm. These clusters are displayed in Fig. 2b.
One can see that the composition of the clusters is unbalanced; however, there is no cluster large enough to indicate biases in the learning process of the models.

3.4. Text preprocessing

Textual data oftentimes present noise, such as different variations of capitalization for the same word, punctuation, special characters, etc. Given that, three preprocessing operations (lowercasing, noise removal and tokenization) were performed to transform the input sentences into a cleaner format that can help improve the learning process of the models. The lowercasing and noise removal were implemented in Python using regular expression operations and the Pandas library (McKinney, 2010), and the tokenization was performed using the tokenizer provided by the transformers library (Wolf et al., 2020).
First, we converted upper-cased to lower-cased words. Lowercasing all data is simple and one of the most effective processes to solve data sparsity issues, and it has been recommended to improve accuracy regardless of language and domain (Uysal and Gunal, 2014).
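As an illustration of the lowercasing and noise-removal operations, a minimal sketch using Python regular expressions is given below. It is not the authors' code: the function name `clean_sentence` and the choice to keep decimal points inside numbers (so that values such as 1.033 survive) are assumptions made for this example.

```python
import re


def clean_sentence(text):
    """Lowercase a raw sentence and strip special characters and punctuation."""
    text = text.lower()
    # Drop periods that are not decimal points (assumption: keep "1.033").
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", " ", text)
    # Replace every remaining character that is not a letter, digit,
    # underscore, whitespace, or period with a space (e.g., '-', '/', '%', '#').
    text = re.sub(r"[^\w\s.]", " ", text)
    # Collapse repeated white spaces.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_sentence("Sump-pump / Leakage, 25 1.033 #3000%")
```

Because `\w` is Unicode-aware in Python 3, accented characters such as those found in Portuguese text are preserved by this pattern.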

Table 6
Examples of tokenized sentences.

Corpus Tokenized sentence

1 ['[CLS]', 'industrial', 'waste', '##water', 'treatment', 'flow', 'regularization', 'system', 'basin', 'with', 'possible', 'presence', 'of', 'toxic', 'substances', 'hydrocarbon', 'from', 'other', 'unit', 'contaminated', 'and', 'oily', 'water', 'sump', 'pump', 'leak', '##age', '[SEP]']
2 and 3 ['[CLS]', 'industrial', 'waste', '##water', 'treatment', 'flow', 'regularization', 'system', 'basin', 'with', 'possible', 'presence', 'of', 'toxic', 'substances', 'hydrocarbon', 'from', 'other', 'unit', 'contaminated', 'and', 'oily', 'water', 'sump', 'pump', 'leak', '##age', 'toxic', 'vapour', 'cloud', '[SEP]']
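The sub-word splits shown in Table 6 (e.g., 'waste' + '##water', 'leak' + '##age') follow the WordPiece scheme. The sketch below is a toy greedy longest-match WordPiece tokenizer with zero padding, written only to illustrate the idea; the tiny vocabulary, the function names, and the maximum length of 10 are hypothetical, and the real tokenizer of the pre-trained model uses its own vocabulary and a maximum length of 512.

```python
# Toy vocabulary (hypothetical); index 0 is reserved for padding.
VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "waste": 4, "##water": 5, "leak": 6, "##age": 7, "sump": 8, "pump": 9,
}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-word units by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:                      # no match: whole word is unknown
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def encode(sentence, vocab, max_len=10):
    """Add [CLS]/[SEP], map tokens to indices, and pad with zeros to max_len."""
    body = [p for w in sentence.split() for p in wordpiece(w, vocab)]
    tokens = ["[CLS]"] + body[: max_len - 2] + ["[SEP]"]
    ids = [vocab[t] for t in tokens]
    ids += [vocab["[PAD]"]] * (max_len - len(ids))
    return tokens, ids


tokens, ids = encode("wastewater sump pump leakage", VOCAB)
```

Padding every sequence to the same length is what allows sentences of different sizes to be stacked into a single batch tensor.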


Fig. 3. a) Training and validation optimization learning curves for Model 1; b) Training and validation accuracy learning curves for Model 1.

Next, noise removal includes deleting special characters, white spaces, and punctuation. In the input sentences, several special characters were present (e.g., ‘-’, ‘/’, ‘%’, ‘#’). This step also includes the expansion of abbreviations (e.g., “adu” was converted to “atmospheric distillation unit”). Thus, noise removal was paramount to construct cleaner sentences. To mark the beginning and the end of each sentence, it was necessary to add [CLS] and [SEP] respectively; this is because BERT was pre-trained using the format [CLS] sentence [SEP].
Moreover, it is necessary to use the same tokenization to fine-tune a pre-trained model; for this reason, we used the ‘AutoTokenizer’ provided by the transformers library, which splits the sentences into a sequence of tokens according to punctuation and word pieces (i.e., sub-word units), converts raw text to sparse index encodings, and stores the vocabulary token-to-index map. Thus, the cleaned sentences were processed by the tokenizer; Table 6 presents some examples. In addition, the tokenizer pads all sequences to a maximum length (512 tokens) by adding zeros, since the model requires inputs that have the same shape and size.

3.5. Modelling process

We used the PyTorch implementation of pre-trained BERT available in the transformers library (Wolf et al., 2020), which provides APIs to quickly download and use several pre-trained models on a given text, and thus fine-tune them on our own datasets. We here adopted the ‘bert-base-multilingual-cased’ version of the ‘BertForSequenceClassification’ model, which is pre-trained in 104 languages (including Portuguese, in which our text data are originally written).
We added one output layer on top of the pre-trained model to adapt it for performing a classification task. The final-layer representation of the special token [CLS] of the input sentence is fed to the output layer, i.e., the final hidden state $h$ of the token [CLS] is used to represent the sentence. Then, an activation function is applied to a linear transformation of $h$ to convert it into probabilities (Eq. 4).

$$p(c \mid h) = \mathrm{activation}(h^{T} W + b) \qquad (4)$$

where $c$ is the class of the input sentence, $b$ is the bias, and $W$ is the weights matrix of the added output layer.
Moreover, we added a different output layer for each of the three models because Model 1 aims at predicting multiple classes that are not mutually exclusive, while Models 2 and 3 classify the input sentence into a single class among mutually exclusive classes. Then, we used a sigmoid activation function for Model 1, and a softmax for Models 2 and 3 to predict the probability of each label $c$. Simply put, the sigmoid function returns a value in the range 0–1 for each label (i.e., independent probabilities); thus, the predicted labels are the ones that have probability greater than 0.5. In turn, the softmax

Fig. 4. Confusion matrices for Model 1’s classification of test data.


Table 7
Results of the classification of test data for each potential consequence.
n Potential consequence Precision (%) Recall (%) F1-score (%)
1 Burn injury 87.09 90 88.52
2 Vapour cloud explosion 95.59 98.48 97.01
3 Flash fire 100 100 100
4 Irritation 100 100 100
5 Pool fire 75 90 81.82
6 Toxic vapour cloud 91.55 94.20 92.75
7 Jet fire 100 97.06 98.51

activation function outputs are mutually exclusive, and then the sum of their probabilities is 1; thus, the predicted class is the one that has the highest probability (Farhadi et al., 2019; Gao and Pavel, 2017). Model 1 was fitted using the binary cross-entropy loss function (to penalize each output independently), while Models 2 and 3 adopted the categorical cross-entropy (further details can be found in Goodfellow et al., 2016).
Thus, we fine-tuned the pre-trained model three times, each time using the specific dataset of a given classification task. In this case, the loss is propagated through the entire architecture and all BERT pre-trained parameters as well as $W$ and $b$ are fine-tuned and updated based on the new dataset. However, $W$ and $b$ are the only parameters that need to be randomly initialized and learned from scratch. Although the developed models are initialized with the same pre-trained parameters, training for distinct tasks provides different fine-tuned models in an efficient way. Indeed, this approach allows us to build models with state-of-the-art architectures within a reasonable time. Training these architectures from scratch can take days (Howard and Ruder, 2018); however, using the GeForce RTX 2080 Ti, the fine-tuning of each model took about four to five hours. We adopted ‘BertAdam’ as optimizer, available in the transformers library, with a learning rate of $10^{-6}$, a batch size of 16, and a warm-up proportion of 0.1 to train all models.
Finally, each dataset was randomly split into 90% for training (10% of which was held out for validation) and the remaining 10% for test (unseen data). We then evaluated each model’s performance on the test data. The results achieved with each model are discussed in the following section.

4. Results and discussion

4.1. Model 1

To evaluate the learning and generalization of Model 1, the training and validation learning curves are presented in Fig. 3a, and the accuracy (i.e., proportion of true positives and true negatives among the total number of observations) curves are given in Fig. 3b. The training and validation accuracies were remarkably high (99.97% and 99.9% respectively), which indicates a good fit of the learning algorithm. Then, we stopped the training after 150 epochs to avoid overfitting.
As explained, Model 1 performs a multi-label classification in which an instance can be assigned to different potential consequences simultaneously; thus, the output is a vector with 7 dimensions, which represent potential consequences, and the value assumed in each dimension is binary, indicating whether the example contains that consequence or not. Fig. 4 provides a confusion matrix with the prediction on test data for each consequence in order to evaluate the performance of Model 1. In the confusion matrices, the element in the $r$-th row and $j$-th column, $c_{r,j}$, indicates the number of observations $j$ predicted as $r$. For example, $c_{0,0}^{(\text{burn injury})}$ (element of

Table 8
Wrong predictions made by Model 1.

# of errors | Error type | Description | Prediction | Target
1 | False positive | Predicted toxic vapour cloud | 0000011 | 0000001
1 | False positive | Predicted toxic vapour cloud | 0110011 | 0110001
1 | False positive | Predicted vapour cloud explosion | 0100110 | 0000110
1 | False positive | Predicted burn injury | 1110010 | 0110010
1 | False positive | Predicted toxic vapour cloud | 0011010 | 0011000
1 | False positive | Predicted burn injury | 1100000 | 0100000
1 | False positive | Predicted toxic vapour cloud | 1100010 | 1100000
1 | False positive | Predicted vapour cloud explosion | 0100110 | 0000110
1 | False negative | Did not predict burn injury | 0110010 | 1110010
1 | False negative | Did not predict toxic vapour cloud | 0011000 | 0011010
1 | False negative | Did not predict vapour cloud explosion | 0000010 | 0100010
1 | False negative | Did not predict toxic vapour cloud | 1100000 | 1100010
1 | False negative | Did not predict pool fire | 1100000 | 1100100
1 | False negative | Did not predict burn injury | 0100000 | 1100000
1 | False negative | Did not predict jet fire | 0100000 | 0100001
1 | False negative | Did not predict toxic vapour cloud | 0011000 | 0011010
2 | False positive | Predicted burn injury and pool fire | 1110110 | 0110010
2 | False positive | Predicted burn injury and toxic vapour cloud | 1100010 | 0100000
2 | False negative | Did not predict pool fire and toxic vapour cloud | 1100110 | 1100000
2 | False negative | Did not predict burn injury; predicted pool fire | 0100100 | 1100000
2 | False negative and false positive | Did not predict toxic vapour cloud; predicted vapour cloud explosion | 0100100 | 0000110
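The prediction and target vectors in Table 8 can be related to the output activations described in the modelling section. The following sketch, which is illustrative only, thresholds sigmoid outputs at 0.5 to obtain a multi-label prediction, lists the false positives and false negatives against a target vector, and shows the softmax counterpart used for single-label tasks. The logit values and function names are hypothetical.

```python
import math

CONSEQUENCES = [
    "burn injury", "vapour cloud explosion", "flash fire", "irritation",
    "pool fire", "toxic vapour cloud", "jet fire",
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Multi-label prediction: each label is decided independently."""
    return [1 if sigmoid(z) > threshold else 0 for z in logits]


def label_errors(pred, target, names=CONSEQUENCES):
    """Return the names of false-positive and false-negative labels."""
    fp = [n for p, t, n in zip(pred, target, names) if p == 1 and t == 0]
    fn = [n for p, t, n in zip(pred, target, names) if p == 0 and t == 1]
    return fp, fn


def predict_class(logits):
    """Single-label prediction: softmax probabilities, then the most likely class."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return probs.index(max(probs)), probs


# Hypothetical logits reproducing the first row of Table 8.
pred = predict_labels([-3.0, -2.0, -4.0, -1.0, -2.5, 0.8, 2.1])
fp, fn = label_errors(pred, [0, 0, 0, 0, 0, 0, 1])
severity_class, severity_probs = predict_class([0.2, 1.5, -0.3, 0.1])
```

An instance counts as fully correct only when both lists returned by `label_errors` are empty, which is the criterion behind the whole-set evaluation reported for Model 1.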

Table 9
Assessing scenarios provided to Model 1 and the potential consequences predicted.

i | $x_{1,i}$ | $y_{1,i}$ | $\hat{y}_{1,i}$ | Error
1 | naphtha hydrotreating unit reactor system hydrogen and hydrogen sulphide section between exchanger and recycling gas injection point naphtha and naphtha stream 57 200 91,461 carbon steel small leakage | 1100000 | 0100000 | Did not predict burn injury
2 | naphtha hydrotreating unit cooling and separation system hydrogen and hydrogen sulphide section between the separation point of the output stream from the stabilization vessel and the naphtha stream for recycling exchanger 31 260 82,943 carbon steel large leakage | 0100000 | 0100000 | –
3 | naphtha hydrotreating unit cooling and separation system hydrogen and hydrogen sulphide section between exchanger and condenser 33 300 20,1291 carbon steel large leakage | 1100100 | 1100000 | Did not predict pool fire


the confusion matrix for burn injury) indicates that 110 of the instances that are not labelled as burn injury (0) were correctly classified as 0.
Model 1 was able to predict accurately most of the potential consequences, even those with the lowest frequency. For instance, there are only eleven instances in test data labelled with irritation and the model correctly classified all of them. Indeed, Model 1 achieved a mean accuracy of 97.42% in predicting the potential consequences of test samples. From the confusion matrices, we computed precision, recall, and F1-score for each class, where precision is the number of true positives (instances correctly predicted as 1) over all positive predictions (all instances predicted as 1), recall is the number of true positives over all instances with 1 as true label, and F1-score is the harmonic mean between precision and recall. Table 7 summarizes the scores for each category.
One can see the inferior precision for pool fire (75%) in comparison to the others. This can be explained by the relatively high number of false positives. In the early stages of QRA, the aim is to identify all the possible hazardous scenarios; thus, this type of error is deemed to be acceptable since the risk analyst may assess the coherence of the model’s results. Moreover, Model 1 yielded more false positives for burn injury and for toxic vapour cloud than for pool fire. However, the precision for these consequences was less affected by them. Model 1 also yielded more false negatives for burn injury and toxic vapour cloud than for the other consequences. Nevertheless, the recall for burn injury and toxic vapour cloud is 90% and 94.2% respectively. Thus, Model 1 presented satisfactory results considering all potential consequences and achieved a mean F1-score of 94.09%.
We also evaluated the model’s ability to correctly predict the whole list of potential consequences of a given instance. This means that if one or more labels of a sample were misclassified, we considered the prediction as a model error. Table 8 presents all misclassifications, the actual target, and a brief description of the error; 85.41% of the test data were assigned the correct set of labels.
Note that 21 out of 144 instances in the test set were misclassified: 16 out of 21 instances had one incorrect label and 5 instances had two incorrect labels. However, in only 11 instances were there unpredicted potential consequences; in these cases, experts should review the results and manually include them. It is noteworthy that most unpredicted potential consequences were burn injury and toxic vapour cloud.

Fig. 6. Confusion matrix for Model 2’s classifications of test data.

Table 10
Results of the classification of test data for each severity category.
Severity Precision (%) Recall (%) F1-score (%)
I 75.86 95.65 84.61
II 92.17 85.48 88.69
III 80.41 88.63 81.28
IV 92.45 81.67 84.21

A possible explanation for the inferior performance in predicting some potential consequences might be the presence of similar scenarios in the PHA documents that cannot lead to such consequences. Table 9 shows some examples of input (sentences shown without preprocessing and tokenization to facilitate the analysis), output, and the prediction made by Model 1, $x_{1,i}$, $y_{1,i}$, and $\hat{y}_{1,i}$, respectively.
For instance, there are several similarities in the description of instances 1 and 2 (in bold); however, instance 2 does not generate burn injury despite the higher operating temperature and it was correctly classified by Model 1. Also, among 635 accidental scenarios related to the naphtha hydrotreating unit, only 8 can cause burn injury. Moreover, instances 2 and 3 even involve the same system (cooling and separation system), equipment (exchanger) and similar

Fig. 5. a) Training and validation optimization learning curves for Model 2; b) Training and validation performance learning curves for Model 2.


Table 11
Samples of input and output pairs in the data set.

i | $x_{2,i}$ | $y_{2,i}$ | $\hat{y}_{2,i}$
1 | Hydrotreating unit loading and unloading of chemicals system dmds from container to pump and from pump to unit flange 1.33 25 container tank rupture toxic vapour cloud | III | –
2 | Hydrotreating unit loading and unloading of chemicals system dmds from truck to the loading area and container in the waiting area 1.33 25 container tank rupture toxic vapour cloud | II | III
3 | Powerhouse steam system medium pressure steam from boiler to the medium pressure steam collection and distribution header 17.5 260 64.73 rupture burn injury | II | –
4 | Powerhouse steam system medium pressure steam from medium pressure steam collection and distribution header to the battery limit 17.5 260 128.92 rupture burn injury | III | II
5 | Water treatment unit chemical product system gaseous chlorine from chlorine cylinders to the chlorination system in the chlorinator house carbon steel 25 0 30 rupture toxic vapour cloud | III | –
6 | Water treatment unit chemical product system gaseous chlorine cylinders inside the cylinder room carbon steel 25 21.1 40 rupture toxic vapour cloud | IV | III

operating conditions; however, differently from instance 3, the scenario described in instance 2 cannot lead to pool fire.
In addition, it is worth mentioning that burn injury, toxic vapour cloud, and pool fire (for which the model had the worst recall) are related to a wide variety of scenarios, which can make it difficult for the model to learn/recognize all features that characterize these potential consequences. For instance, considering our database, pool fire and toxic vapour cloud occur in more than 15 different units of the oil refinery (i.e., almost all units in the refinery) and they are associated with the release of more than 100 chemical products, whereas jet fire occurs in only 7 units, due to the outflow of 50 substances.
How to improve the performance of the model will be the object of future research. A possible solution to overcome this problem may be to perform data augmentation to build a more homogeneous dataset in relation to the different features, such as unit and/or chemical (Liu et al., 2020). It is worth mentioning that the model’s outcomes represent a starting point for completing the QRA. Thus, analysts should critically evaluate these results in order to add/remove scenarios, and then build a more representative database. The updated database could then be used to retrain the models as a way to improve their performances and system representativeness.
Overall, Model 1 provides satisfactory predictions of the potential consequences for different subsystems of the oil refinery. The team of risk analysts could evaluate whether the outputs are coherent and could postulate new potential consequences. Thus, the model predictions could be used as a starting point for risk analysis purposes.

Fig. 8. Confusion matrix of Model 3’s classifications of test data.

Table 12
Results of the classification of test data for each likelihood category.
Likelihood Precision (%) Recall (%) F1-score (%)
A 89.28 75.76 81.97
B 92.56 97.39 94.91
C 100 91.80 95.72
D 95.7 100 97.8
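The per-class metrics reported in the result tables follow directly from the confusion-matrix counts. The sketch below restates the definitions given in the text; the function names and the counts (9 true positives, 3 false positives, 1 false negative) are hypothetical values chosen only because they reproduce a precision of 75% and a recall of 90%, as reported for pool fire in Table 7.

```python
def precision(tp, fp):
    """True positives over all positive predictions."""
    return tp / (tp + fp)


def recall(tp, fn):
    """True positives over all instances whose true label is positive."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean between precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts consistent with 75% precision and 90% recall.
p = precision(tp=9, fp=3)
r = recall(tp=9, fn=1)
f1 = f1_score(p, r)

# Mean of the F1-scores listed in Table 7 for the seven consequences.
table7_f1 = [88.52, 97.01, 100, 100, 81.82, 92.75, 98.51]
mean_f1 = sum(table7_f1) / len(table7_f1)
```

Because the harmonic mean is dominated by the smaller of the two values, a class with many false positives (low precision) keeps a modest F1-score even when its recall is high.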

4.2. Model 2

The sentences (instances) provided to Model 2 are labelled with the severity of the potential consequences (see Table 1). The optimization and the accuracy learning curves for Model 2 are shown in Fig. 5a and b, respectively. Model 2 was trained for 150 epochs until its loss curves reached some stability. The accuracy curves reached 92.83% on training and 92.92% on validation. Again, the proximity of the training and validation curves and the behaviour of

Fig. 7. a) Training and validation optimization learning curves for Model 3; b) Training and validation accuracy curves for Model 3.


Table 13 Moreover, most of the scenarios of the water treatment unit in


Confusion matrix. our database are classified as III, such as x2,5, which involves the
Risk Matrix rupture of pipeline from chlorine cylinders to the chlorination
Consequence Likelihood
system. There are only three scenarios classified as IV, which may
have compromised the learning process. One of them is x2,6 (mis­
A B C D
classified as III) that represents a toxic vapour cloud generated due
IV M M M NT to the rupture of the chlorine cylinder. Additionally, x2,5 and x2,6 are
III T M M M
also similar; both encompass the release of gaseous chlorine and the
II T T M M
I T T T T
chlorine cylinder. Thus, the data provided might not be sufficient to
make the model capture these subtleties between category III and
others, which may have compromised the learning process. On the
the learning plots indicate a good fit of the model. Fig. 6 shows the other hand, these errors for categories III and IV are tolerable be­
confusion matrix on the test data. The model achieved an accuracy of cause in practice scenarios classified as III or IV must be further
86.44% to classify the test data. analysed quantitatively. Thus, the classification provided by Model 2
The lowest precision was for the prediction of category I (75.86%) would certainly be useful to identify the least severe scenarios, fil­
and III (78.35%). These outcomes for category I can be explained by tering the most critical cases for further quantitative analysis.
the small number of instances; thus, the instances II misclassified as Moreover, since the main purpose of QRA is to identify all accidental
I had a big impact on the precision (note that the recall for category II scenarios, underestimate scenarios are undesirable. Thus, to avoid
was less affected by these errors). Yet, Model 2 predicted correctly this situation we plan to incorporate in the models’ training a
95.65% of the instances classified as I. Moreover, the worst recall was function to penalize non-conservative predictions. Furthermore, we
computed for category IV, where 10 out of 11 misclassifications were will investigate ensemble models such as the combination of BERT
predicted as III. Nevertheless, the results obtained were satisfactory. with other ML classifiers to process the continuous variables sepa­
Indeed, F1-scores were above 80% for all categories. Table 10 sum­ rately, so that the model can better learn the relationship between
marizes the results for each category. these variables and the description of subsystems.
A possible explanation for the inferior performance to predict
category III might be the presence of similar scenarios descriptions
in the PHA documents, which are classified with different severity 4.3. Model 3
levels. For instance, Table 11 shows some examples of input (sen­
tences showed without preprocessing and tokenization to facilitate Finally, the input sequence provided to Model 3 is labelled with
de analysis) and output pairs (x2, i , y2, i ) used to train Model 2 and the likelihood category; see Table 2. The evaluation of the model
misclassified scenarios (i.e., prediction ŷ2, i different from y2, i ). Note training can be done through the optimization and the accuracy
that the instances without predictions correspond to the training learning curves in Fig. 7a and b respectively. The model was trained
instances. for 130 epochs until its loss curve reached some stability and
As we can see, x2,1 (training input) and x2,2 (test input) are very achieved an accuracy of 97.68% on training and 95.48% on validation;
similar, both represent the transport of dmds (dimethyl disulphide) in the curves indicate a good fit of the model. Moreover, Model 3
the hydrotreating unit under similar operating conditions; then, it is achieved an accuracy of 94.34% on test. A detailed description of
reasonable that Model 2 classified x2,2 as III, i.e., the same category as Model 3 outcomes is given in Fig. 8 and the performance metrics are
x2,1. However, the system described in x2,1 starts at the container tank summarized in Table 12.
and goes to pump; while the system described in x2,2 starts at the Model 3 predicted all classes with great precision. The worst
truck and goes to the container tank. Likewise, Model 2 misclassified performance was in the prediction of category A, which represents
x2,4 as II. Both x2,3 and x2,4 represent a burn injury due to rupture of a the least frequent events (Table 2) that are indeed more difficult to
pipeline containing medium pressure steam in the powerhouse under envision and usually leads to more uncertain estimates (Jin et al.,
similar operating conditions; however, the system described in x2,3 2020; Marchiori and Guida, 2015). In fact, 24.24% of the instances of
goes from the boiler to the header, while x2,4 considers the header up category A were misclassified as B. These errors have a greater im­
to the battery limit and involves a higher flow rate. pact on the metrics for category A, since it corresponds to the
smallest group on test. Note that these errors do not have a major
impact on the metrics for category B. Moreover, one interesting
finding is that most of the model's errors were in predicting in­
stances into categories that represent more likely events. These er­
rors may lead to more critical risk classification of the scenarios (ISO,
2018). Finally, all performance metrics for category B, C, and D were
above 90%. These results of Model 3 suggest a reasonable ability to
learn and recognize patterns about all likelihood categories.

4.4. Concatenation of errors

To analyse the concatenation of errors, we combined the predictions made by Models 2 and 3 on test data. To that end, we considered the risk matrix (Table 13), according to which risks are classified as tolerable (T), moderate (M), or non-tolerable (NT) (ISO, 2018).
The risk categories provided by the models are summarized in the confusion matrix (Fig. 9). One can see that more than 17% of the test samples were assigned to a more critical risk category; thus, the results provided are more conservative.

Fig. 9. Confusion matrix with the result on test data of the combination of Model 2 and 3.
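
As a rough illustration of how a severity class (Model 2) and a likelihood class (Model 3) can be combined into a risk class and then compared against the reference, a minimal Python sketch follows. The scoring rule below is a hypothetical placeholder and does not reproduce Table 13; the function names `risk_class` and `compare_risk` are likewise assumptions made for this example. The ordering of the categories (IV as most severe, A as least frequent) follows the text.

```python
# Illustrative placeholder rule; it is NOT the risk matrix of Table 13.
SEVERITY = {"I": 1, "II": 2, "III": 3, "IV": 4}   # IV taken as most severe
LIKELIHOOD = {"A": 1, "B": 2, "C": 3, "D": 4}     # A taken as least frequent
RISK_RANK = {"T": 0, "M": 1, "NT": 2}             # tolerable < moderate < non-tolerable


def risk_class(severity: str, likelihood: str) -> str:
    """Map a (severity, likelihood) pair to T, M or NT with a hypothetical score rule."""
    score = SEVERITY[severity] * LIKELIHOOD[likelihood]
    if score <= 3:
        return "T"
    if score <= 8:
        return "M"
    return "NT"


def compare_risk(predicted: tuple, target: tuple) -> str:
    """Say whether the predicted pair lands in a more, equally or less critical class."""
    diff = RISK_RANK[risk_class(*predicted)] - RISK_RANK[risk_class(*target)]
    if diff > 0:
        return "more critical"
    if diff < 0:
        return "less critical"
    return "same"


# Hypothetical scenario: severity predicted correctly, likelihood predicted B instead of A.
print(compare_risk(predicted=("III", "B"), target=("III", "A")))
```

Under this placeholder rule, overestimating the likelihood by one category is enough to move the scenario from tolerable to moderate, which is the kind of conservative shift discussed above.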


Fig. 10. Confusion matrices for Model 1's classification of test data using 70/30 split.

Table 14
Results of the classification of test data using 70/30 split for each potential consequence.
n   Potential consequence     Precision (%)   Recall (%)   F1-score (%)
1   Burn injury               93.62           93.62        93.62
2   Vapour cloud explosion    99.49           98.99        99.24
3   Flash fire                99.39           100          99.69
4   Irritation                100             100          100
5   Pool fire                 85.19           100          92.00
6   Toxic vapour cloud        97.16           98.08        97.62
7   Jet fire                  100             99.09        99.54

Table 16
Performance metrics for Model 2 using 70/30 split.
Severity   Precision (%)   Recall (%)   F1-score (%)
I          80.39           94.25        86.77
II         88.18           86.87        87.52
III        86.50           85.87        86.18
IV         94.12           90.26        92.15
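
The F1-scores in Tables 14 and 16 are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The short Python check below is only a sketch of that arithmetic (the function name `f1_score` is chosen for this example); it recomputes two of the tabulated rows from their precision and recall values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from Tables 16 and 14.
rows = {
    "Severity I (Table 16)": (80.39, 94.25),
    "Pool fire (Table 14)": (85.19, 100.0),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}%")
```

The recomputed values, 86.77% and 92.00%, match the corresponding table entries.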
4.5. Sensitivity analysis

In order to further evaluate the performance of the algorithms, we retrained the models using 70% of the dataset; then, we evaluated their performances on the remaining data. Fig. 10 provides the confusion matrices with the prediction on test data for each consequence to evaluate the performance of Model 1.
From the confusion matrices, we computed the performance metrics summarized in Table 14. Comparing Table 14 and Table 7, one can see that the performance of Model 1 improved and the false negative rate dropped for all potential consequences. This result is

Table 15
Performance metrics for Model 1 using 70/30 split.

# of errors   Error type                          Description                                                 Prediction   Target
1             False positive                      Predicted toxic vapour cloud                                1100010      1100000
                                                  Predicted toxic vapour cloud                                0000011      0000001
                                                  Predicted vapour cloud explosion                            0010110      0000110
                                                  Predicted toxic vapour cloud                                0011010      0011000
                                                  Predicted toxic vapour cloud                                0110011      0110001
                                                  Predicted toxic vapour cloud                                0011010      0011000
                                                  Predicted burn injury                                       1110010      0110010
                                                  Predicted burn injury                                       1100000      0100000
                                                  Predicted burn injury                                       1110010      0110010
                                                  Predicted vapour cloud explosion                            0100110      0000110
                                                  Predicted burn injury                                       1110010      0110010
                                                  Predicted burn injury                                       1100010      0100010
              False negative                      Did not predict pool fire                                   1100010      1100110
                                                  Did not predict toxic vapour cloud                          1100000      1100010
                                                  Did not predict burn injury                                 0110000      1110000
                                                  Did not predict burn injury                                 0100010      1100010
                                                  Did not predict burn injury                                 0100000      1100000
                                                  Did not predict toxic vapour cloud                          0011000      0011010
                                                  Did not predict vapour cloud explosion                      0000010      0100010
                                                  Did not predict pool fire                                   0010000      0010100
                                                  Did not predict vapour cloud explosion                      0010000      0110000
                                                  Did not predict pool fire                                   1110000      1110100
                                                  Did not predict toxic vapour cloud                          0000100      0000110
                                                  Did not predict burn injury                                 0100000      1100000
                                                  Did not predict toxic vapour cloud                          1110000      1110010
                                                  Did not predict jet fire                                    0100000      0100001
                                                  Did not predict pool fire                                   1100000      1100100
2             False negative and false positive   Did not predict burn injury; predicted toxic vapour cloud   0100010      1100000
              False positive                      Predicted burn injury and toxic vapour cloud                1100010      0100000
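
The prediction and target columns of Table 15 can be read as seven binary flags, one per potential consequence. The Python sketch below shows one way to turn such a pair into lists of false positives and false negatives. It assumes that the flags follow the numbering of Table 14 (burn injury first, jet fire last), which is not stated explicitly in the text, and the function name `label_errors` is introduced only for this example.

```python
# Assumed label order: the numbering of the potential consequences in Table 14.
LABELS = [
    "burn injury",
    "vapour cloud explosion",
    "flash fire",
    "irritation",
    "pool fire",
    "toxic vapour cloud",
    "jet fire",
]


def label_errors(prediction: str, target: str):
    """Return (false_positives, false_negatives) for two 7-character 0/1 strings."""
    if len(prediction) != len(LABELS) or len(target) != len(LABELS):
        raise ValueError("each vector must hold one flag per label")
    pairs = list(zip(LABELS, prediction, target))
    false_pos = [name for name, p, t in pairs if p == "1" and t == "0"]
    false_neg = [name for name, p, t in pairs if p == "0" and t == "1"]
    return false_pos, false_neg


# An instance with two incorrect labels (prediction 0100010, target 1100000).
fp, fn = label_errors("0100010", "1100000")
print("false positives:", fp)
print("false negatives:", fn)
```

For this instance the sketch reports toxic vapour cloud as a false positive and burn injury as a false negative, in line with the description of an instance with two incorrect labels.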


Table 17
Performance metrics for Model 3 using 70/30 split.
Likelihood   Precision (%)   Recall (%)   F1-score (%)
A            93.24           67.65        78.41
B            90.82           96.74        93.48
C            82.80           81.25        82.02
D            88.15           90.50        89.31

very positive, since the main purpose of the early stages of QRA is to predict all possible consequences.
Moreover, with the increase from 144 to 431 test instances, Model 1 made only 8 more errors considering the assignment of the correct set of potential consequences, i.e., 29 out of 431 instances on test set were misclassified (Table 15). Among them, 27 instances had

Fig. 11. Overview of HALO (hazard analysis based on language processing for oil refineries) app. The top image shows the first page of the app. The bottom image presents an
example of input provided by the user.


Fig. 12. Example of outcome provided by the app.

one incorrect label and 2 instances had two incorrect labels. As mentioned, the main purpose of the early stages of QRA is to identify all possible hazards; thus, only 16 errors (unpredicted consequences) are critical regarding the following steps of QRA. Yet, the consequence most often left unpredicted was burn injury (6 out of 16), which is not quantitatively evaluated in the following steps of QRA, since it is not often related to casualties.
Regarding the performance of Model 2, some performance metrics improved (see Table 16 and Table 10); however, more than 12% of the instances of category III were classified as a less severe category. As we mentioned, in practice, scenarios classified as III and IV must be further analysed in the QRA, while the less severe scenarios are screened out from the analysis. Thus, we believe that such errors made by Model 2 with the 70/30 split would result in the removal of many scenarios from the following stages of a QRA, diminishing the usefulness of the proposed methodology.
Moreover, for Model 3, classifying the likelihood of occurrence with the 70/30 split resulted in worse performance (see Table 17 and Table 12). This can be explained because we have an unbalanced dataset; then, 70% can represent a small amount of training data considering some classes. For instance, there are 330 instances labelled as A, while there are 1150 instances of category B. We also evaluated the performance of the model using 70% of the data for training.

4.6. HALO (hazard analysis based on language processing for oil refineries)

Overall, the proposed methodology presented satisfactory results; thus, the trained classifiers could be a useful tool to support the QRA early qualitative stages. To that end, we embedded the trained classifiers into a web app, known as HALO¹ (hazard analysis based on language processing for oil refineries), to support risk analysts in identifying and assessing accidental scenarios related to the operation of oil refineries. In summary, the user (risk analyst) provides information about the system and the trained classifiers take the user's input. Fig. 11 shows the first page of the app and an example of how the information is provided to the app. Further details on the app structure and outcomes can be found in (Macedo et al., 2021).

¹ http://nlprisk.ceerma.com/

For example, considering the system illustrated in Table 3, the user defines nine variables to characterize the system under analysis: unit (industrial wastewater treatment), system (flow regularization system), chemical product involved in the hypothetical spill (contaminated and oily water), equipment present in the subsystem analysed (sump pump), operational temperature (25), pressure (1.033), and mass flow rate (3000), material (carbon steel), description of the subsystem (Basin with possible presence of toxic substance hydrocarbon from another unit).
To define unit, system, chemical product, equipment, and material, the user chooses one option from a drop-down list. The options available are limited by the data used to build the models. For instance, the models are only able to predict risk features related to chemical releases in 22 units of an oil refinery (e.g., atmospheric distillation unit, delayed coking unit, hydrotreater unit, and others). If the unit that the user is interested in is not in the drop-down list, it means that there was no data related to this unit in our database. Moreover, the models predict consequences and their respective severity and likelihood of occurrence based on the variables provided during the training phase. Thus, if the user selects one of the options of the drop-down list that is not present in the subsystem analysed, it is important to keep in mind that the predicted scenarios may be significantly different, being necessary to critically assess these differences. For example, if a critical valve is selected by the user, but it is missing on the system being analysed, the models will predict based on learnings where the valve is present; then, the predicted risk features may be more critical.
The other variables to characterize the system are provided in a user-defined field. To define subsystem description, the user must provide a short description of the analysed "section". This description must be provided in Portuguese because the models were trained in this language. Moreover, the temperature, pressure and flow rate must be provided in °C, kgf·cm⁻² and kg·h⁻¹, respectively. Finally, the user confirms the features, and the app provides a list of potential consequences due to hypothetical chemical spills in the oil refinery system described by the user. In addition, the app qualitatively estimates the severity and the likelihood of occurrence of each predicted potential consequence. Fig. 12 shows an example of the outcomes that are provided by the classifiers built in this paper.

5. Conclusions

Overall, this study strengthens the idea that information contained as text data can be automatically extracted and processed by


text mining and NLP techniques to support risk studies. A method based on TM techniques and a pre-trained BERT model was developed to identify risk features in an oil refinery.
First, a corpus was built using PHA documents to train the models. The texts were automatically extracted from the documents and, then, preprocessed into a convenient format for the learning algorithms. Next, the pre-trained model was tailored for performing three tasks: i) to identify possible consequences, given the occurrence of a leakage; ii) to classify the severity of the consequences; iii) to classify the likelihood of occurrence of the accident scenario. As a result, we developed three models that could extract sufficient knowledge from the textual data and yielded satisfactory training and test outcomes.
Model 1 presented a great performance both in the individual prediction of each potential consequence and in the prediction of the set of consequences associated with the different subsystems. Model 2 also showed satisfactory results, since only the precision to classify the severity level III was less than 80%. In addition, part of the decrease in precision for this category is due to the classification of less severe instances (I and II) as III; thus, the experts can focus their efforts on the assessment of the most severe accident scenarios. Finally, most of Model 3's errors were more conservative (i.e., predicted an instance into a more likely category). Moreover, Model 3 presented promising results, achieving high performance in the prediction of most likelihood categories.
These outcomes underscore that TM and NLP can be adopted to support risk analysts in identifying the potential consequences of different scenarios and in describing risks qualitatively in terms of expected likelihood and severity of consequences. Indeed, the proposed method could be a useful tool to support hazard identification and analysis; instead of starting the QRA from scratch, analysts could either reuse knowledge from previous studies or process studies for similar plants. This may be especially useful for brand-new plants, which depend on the approval of the environmental regulators to start the development of the facility design and construction. Then, experts may use that entire source of knowledge to reduce the uncertainty for performing risk analysis based on a model trained with all the available information collected and processed from past risk studies.
Although the scope of this study was restricted to the oil refinery context, the methodology can be applied to different industrial systems, being necessary to provide data from the context of interest to train the models. It is important to emphasize that the model's predictions are limited to what is provided through the training data. Therefore, caution must be taken when generalizing the predictions, and the results must be carefully evaluated by the risk analysts.
Further studies, which would take into account other engineering documents, such as flowcharts, equipment and material lists, should be undertaken to investigate the possibility of conflating more information about the system and the data stored in the PHA documents. This could improve the model's learning process and reduce biases usually found in early stages of QRA.

Acknowledgements

The authors thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo a Ciência e Tecnologia do Estado de Pernambuco (FACEPE), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil - Finance Code 001, for the financial support through research scholarships and grants.

Declaration of Competing Interest

There has been no conflict of interest of the authors regarding the research area or analysed results.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.psep.2021.12.025.

References

Ahmad, S.I., Hashim, H., Hassim, M.H., Rashid, R., 2019. Development of hazard prevention strategies for inherent safety assessment during early stage of process design. Process Saf. Environ. Prot. 121, 271–280. https://doi.org/10.1016/j.psep.2018.10.006
Arunraj, N.S., Maiti, J., 2007. Risk-based maintenance—techniques and applications. J. Hazard. Mater. 142, 653–661. https://doi.org/10.1016/j.jhazmat.2006.06.069
Aven, T., Zio, E., 2018. Knowledge in risk assessment and management, 1st ed.
Aziz, A., Ahmed, S., Khan, F.I., 2019. An ontology-based methodology for hazard identification and causation analysis. Process Saf. Environ. Prot. 123, 87–98. https://doi.org/10.1016/j.psep.2018.12.008
Badri, N., Nourai, F., Rashtchian, D., 2013. A multivariable approach for estimation of vapor cloud explosion frequencies for independent congested spaces to be used in occupied building risk assessment. Process Saf. Environ. Prot. 91, 19–30. https://doi.org/10.1016/j.psep.2011.12.002
Baker, H., Hallowell, M.R., Tixier, A.J.P., 2020. Automatically learning construction injury precursors from text. Autom. Constr. 118, 103145. https://doi.org/10.1016/j.autcon.2020.103145
Basheer, A., Tasneem, S.M.T., Abbasi, A.S.A., 2019. Methodologies for assessing risks of accidents in chemical process industries. J. Fail. Anal. Prev. 19, 623–648. https://doi.org/10.1007/s11668-019-00642-w
Baybutt, P., 2015. The importance of defining the purpose, scope, and objectives for process hazard analysis studies. Process Saf. Prog. 34, 84–88. https://doi.org/10.1002/prs
Bengfort, B., Bilbro, R., Ojeda, T., 2018. Applied Text Analysis with Python: Enabling Language-aware Data Products with Machine Learning. O'Reilly Media, Inc.
Bernechea, E.J., Vílchez, J.A., Arnaldos, J., 2013. A model for estimating the impact of the domino effect on accident frequencies in quantitative risk assessments of storage facilities. Process Saf. Environ. Prot. 91, 423–437. https://doi.org/10.1016/j.psep.2012.09.004
Bhattacharjee, P., Dey, V., Mandal, U.K., 2020. Risk assessment by failure mode and effects analysis (FMEA) using an interval number based logistic regression model. Saf. Sci. 132, 104967. https://doi.org/10.1016/j.ssci.2020.104967
Boggs, A.M., Wali, B., Khattak, A.J., 2020. Exploratory analysis of automated vehicle crashes in California: a text analytics & hierarchical Bayesian heterogeneity-based approach. Accid. Anal. Prev. 135, 105354. https://doi.org/10.1016/j.aap.2019.105354
Carrasquilla, Juan, Melko, Roger, 2017. Machine learning phases of matter. Nature Physics 13, 431–434. https://doi.org/10.1038/nphys4035
Casal, J., 2017. Evaluation of the Effects and Consequences of Major Accidents in Industrial Plants, second ed. Elsevier.
Chowdhary, K.R., 2020. Fundamentals of Artificial Intelligence. Springer India, Jodhpur.
D'Silva, J., Sharma, U., 2020. Unsupervised automatic text summarization of Konkani texts using K-means with Elbow Method. Int. J. Eng. Res. Technol. 13, 2380–2384. https://doi.org/10.37624/ijert/13.9.2020.2380-2384
Demirbas, A., Bamufleh, H.S., 2017. Optimization of crude oil refining products to valuable fuel blends. Pet. Sci. Technol. 35, 406–412. https://doi.org/10.1080/10916466.2016.1261162
Devlin, J., Chang, M., Kenton, L., Kristina, T., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv Prepr. arXiv1810.04805.
Drury, B., Roche, M., 2019. A survey of the applications of text mining for agriculture. Comput. Electron. Agric. 163, 104864. https://doi.org/10.1016/j.compag.2019.104864
Farhadi, F., Nia, V.P., Lodi, A., 2019. Activation Adaptation in Neural Networks. arXiv Prepr. arXiv1901.09849.
Feldman, R., Sanger, J., 2007. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.
Fuentes-bargues, L., Gonz, C., Carmen, M., Ingeniería, D.D.P., De, Polit, U., 2016. Risk assessment of a compound feed process based on HAZOP analysis and linguistic terms Ver o. J. Loss Prev. Process Ind. 44, 44–52.
Gagne, J.C.De, Hall, K., Conklin, J.L., Yamane, S.S., Roth, N.W., Chang, J., Kim, S.S., 2019. Uncovering cyberincivility among nurses and nursing students on twitter: a data mining study. Int. J. Nurs. Stud. 89, 24–31. https://doi.org/10.1016/j.ijnurstu.2018.09.009


Galati, F., Bigliardi, B., 2019. Industry 4.0: Emerging themes and future research avenues using a text mining approach. Comput. Ind. 109, 100–113. https://doi.org/10.1016/j.compind.2019.04.018
Gao, B., Pavel, L., 2017. On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning. arXiv Prepr. arXiv1704.00805 1–10.
George, K., Joseph, S., S, 2014. Text Classification by Augmenting Bag of Words (BOW) representation with co-occurrence feature. IOSR J. Comput. Eng. 16, 34–38. https://doi.org/10.9790/0661-16153438
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Guiochet, J., 2016. Hazard analysis of human-robot interactions with HAZOP-UML. Saf. Sci. 84, 225–237. https://doi.org/10.1016/j.ssci.2015.12.017
Guo, X., Ji, J., Khan, F., Ding, L., 2021. Fuzzy bayesian network based on an improved similarity aggregation method for risk assessment of storage tank accident. Process Saf. Environ. Prot. 149, 817–830. https://doi.org/10.1016/j.psep.2020.07.030
Heidarysafa, M., Kowsari, K., Barnes, L., Brown, D., 2018. Analysis of Railway Accidents' Narratives Using Deep Learning. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp. 1446–1453. doi: 10.1109/ICMLA.2018.00235.
Heidinger, D., Gatzert, N., 2018. Awareness, determinants and value of reputation risk management: empirical evidence from the banking and insurance industry. J. Bank. Financ. 91, 106–118. https://doi.org/10.1016/j.jbankfin.2018.04.004
Howard, J., Ruder, S., 2018. Universal Language Model Fine-tuning for Text Classification. arXiv Prepr. arXiv1801.06146.
ISO, 2018. ISO 31000: risk management—guidelines.
Jin, R., Wang, F., Liu, D., 2020. Dynamic probabilistic analysis of accidents in construction projects by combining precursor data and expert judgments. Adv. Eng. Inform. 44, 101062. https://doi.org/10.1016/j.aei.2020.101062
Kamil, M.Z., Taleb-Berrouane, M., Khan, F., Ahmed, S., 2019. Dynamic domino effect risk assessment using Petri-nets. Process Saf. Environ. Prot. 124, 308–316. https://doi.org/10.1016/j.psep.2019.02.019
Khurana, D., Koli, A., Khatter, K., Singh, S., Rachna, M., 2017. Natural Language Processing: state of the art, current trends and challenges. arXiv Prepr. arXiv1708.05148.
Kim, J., Yoon, J., Park, E., Choi, S., 2020. Patent document clustering with deep embeddings. Scientometrics 123, 563–577. https://doi.org/10.1007/s11192-020-03396-7
Kuhn, K.D., 2019. Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transp. Res. Part C 87, 105–122. https://doi.org/10.1016/j.trc.2017.12.018
Kurian, D., Sattari, F., Lefsrud, L., Ma, Y., 2020. Using machine learning and keyword analysis to analyze incidents and reduce risk in oil sands operations. Saf. Sci. 130, 104873. https://doi.org/10.1016/j.ssci.2020.104873
Landucci, G., Paltrinieri, N., 2016. A methodology for frequency tailorization dedicated to the Oil & Gas sector. Process Saf. Environ. Prot. 104, 123–141. https://doi.org/10.1016/j.psep.2016.08.012
Leu, S., Chang, C., 2013. Bayesian-network-based safety risk assessment for steel construction projects. Accid. Anal. Prev. 54, 122–133.
Li, M., Wang, H., Wang, D., Shao, Z., He, S., 2020. Risk assessment of gas explosion in coal mines based on fuzzy AHP and bayesian network. Process Saf. Environ. Prot. 135, 207–218. https://doi.org/10.1016/j.psep.2020.01.003
Li, X., Chen, G., Jiang, S., He, R., Xu, C., Zhu, H., 2018. Developing a dynamic model for risk analysis under uncertainty: case of third-party damage on subsea pipelines. J. Loss Prev. Process Ind. 54, 289–302.
Lisi, R., Consolo, G., Maschio, G., Milazzo, M.F., 2015. Estimation of the impact probability in domino effects due to the projection of fragments. Process Saf. Environ. Prot. 93, 99–110. https://doi.org/10.1016/j.psep.2014.05.003
Liu, G., Boyd, M., Yu, M., Halim, S.Z., Quddus, N., 2021. Identifying causality and contributory factors of pipeline incidents by employing natural language processing and text mining techniques. Process Saf. Environ. Prot. 152, 37–46. https://doi.org/10.1016/j.psep.2021.05.036
Liu, S., Lee, K., Lee, I., 2020. Document-level multi-topic sentiment classification of Email data with BiLSTM and data augmentation. Knowl.-Based Syst. 197, 105918. https://doi.org/10.1016/j.knosys.2020.105918
Macedo, J., Aichele, D., Moura, M. das C., Lins, I.D., 2021. A web app to support hazard identification of oil refineries. In: 31st European Safety and Reliability Conference. Angers, France.
Marchiori, D., Guida, S., Di, 2015. Supplemental material for noisy retrieval models of over- and undersensitivity to rare events. Decision 2, 82–106. https://doi.org/10.1037/dec0000023.supp
McKinney, W., 2010. Data Structures for Statistical Computing in Python. In: Proceedings of the 9th Python in Science Conference. https://doi.org/10.25080/majora-92bf1922-00a.
Meng, X., Zhu, J., Fu, J., Li, T., Chen, G., 2021. An accident causation network for quantitative risk assessment of deepwater drilling. Process Saf. Environ. Prot. 148, 1179–1190. https://doi.org/10.1016/j.psep.2021.02.035
Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., Gao, J., 2021. Deep Learning Based Text Classification: A Comprehensive Review. arXiv 54.
Moreno, A., Redondo, T., 2016. Text Analytics: the convergence of Big Data and Artificial Intelligence. Int. J. Interact. Multimed. Artif. Intell. 3, 57–64. https://doi.org/10.9781/ijimai.2016.369
Nayak, R., Piyatrapoomi, N., Weligamage, J., Asset, R., Branch, M., 2009. Application of text mining in analysing road crashes for road asset. In: Proceedings of the 4th World Congress on Engineering Asset Management, pp. 49–50.
Pasman, H., Rogers, W., 2018. How trustworthy are risk assessment results, and what can be done about the uncertainties they are plagued with? J. Loss Prev. Process Ind. 55, 162–177. https://doi.org/10.1016/j.jlp.2018.06.004
Pasman, H.J., Rogers, W.J., Mannan, M.S., 2018. How can we improve process hazard identification? What can accident investigation methods contribute and what other recent developments? A brief historical survey and a sketch of how to advance. J. Loss Prev. Process Ind. 55, 80–106.
Passmore, D., Chae, C., Kustikova, Y., Baker, R., Yim, J., 2018. An exploration of text mining of narrative reports of injury incidents to assess risk. MATEC Web Conf, 251, 251.
Pejic-bach, M., Bertoncel, T., Meško, M., Krstić, Ž., 2020. Text mining of industry 4.0 job advertisements. Int. J. Inf. Manag. 50, 416–431. https://doi.org/10.1016/j.ijinfomgt.2019.07.014
Pramoth, R., Sudha, S., Kalaiselvam, S., 2020. Resilience-based Integrated Process System Hazard Analysis (RIPSHA) approach: application to a chemical storage area in an edible oil refinery. Process Saf. Environ. Prot. 141, 246–258. https://doi.org/10.1016/j.psep.2020.05.028
Rachman, A., Ratnayake, R.M.C., 2019. Machine learning approach for risk-based inspection screening assessment. Reliab. Eng. Syst. Saf. 185, 518–532. https://doi.org/10.1016/j.ress.2019.02.008
Ramos, M.A., López Droguett, E., Mosleh, A., Das Chagas Moura, M., 2020a. A human reliability analysis methodology for oil refineries and petrochemical plants operation: Phoenix-PRO qualitative framework. Reliab. Eng. Syst. Saf. 193, 106672. https://doi.org/10.1016/j.ress.2019.106672
Ramos, M.A., Thieme, C.A., Utne, I.B., Mosleh, A., 2020b. A generic approach to analysing failures in human – system interaction in autonomy. Saf. Sci. 129, 104808. https://doi.org/10.1016/j.ssci.2020.104808
Robinson, S.D., 2019. Temporal topic modeling applied to aviation safety reports: a subject matter expert review. Saf. Sci. 116, 275–286. https://doi.org/10.1016/j.ssci.2019.03.014
Sarkar, S., 2016. Text Mining based Safety Risk Assessment and Prediction of Occupational Accidents in a Steel Plant. In: 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), pp. 439–444.
Sarkar, S., Verma, A., Maiti, J., 2018. Prediction of occupational incidents using proactive and reactive data: a data mining approach. In: Industrial Safety Management. Springer, Singapore, pp. 65–79.
Sarvestani, K., Ahmadi, O., Mortazavi, S.B., Mahabadi, H.A., 2021. Development of a predictive accident model for dynamic risk assessment of propane storage tanks. Process Saf. Environ. Prot. 148, 1217–1232. https://doi.org/10.1016/j.psep.2021.02.018
Singh, K., Maiti, J., Dhalmahapatra, K., 2019. Chain of events model for safety management: data analytics approach. Saf. Sci. 118, 568–582. https://doi.org/10.1016/j.ssci.2019.05.044
Sjöblom, O., 2014. Data Mining in Promoting Aviation Safety Management. In: International Conference on Well-Being in the Information Society, pp. 186–187.
Steijn, W.M.P., Van Kampen, J.N., Van der Beek, D., Groeneweg, J., Van Gelder, P.H.A.J.M., 2020. An integration of human factors into quantitative risk analysis using Bayesian Belief Networks towards developing a 'QRA+'. Saf. Sci. 122, 104514. https://doi.org/10.1016/j.ssci.2019.104514
Suh, Y., 2021. Sectoral patterns of accident process for occupational safety using narrative texts of OSHA database. Saf. Sci. 142, 105363. https://doi.org/10.1016/j.ssci.2021.105363
Te, W., Adhitya, A., Srinivasan, R., 2014. Sustainability trends in the process industries: a text mining-based analysis. Comput. Ind. 65, 393–400.
Uysal, A.K., Gunal, S., 2014. The impact of preprocessing on text classification. Inf. Process. Manag. 50, 104–112. https://doi.org/10.1016/j.ipm.2013.08.006
Vapnik, V., Izmailov, R., 2017. Knowledge transfer in SVM and neural networks. Ann. Math. Artif. Intell. 81, 3–19. https://doi.org/10.1007/s10472-017-9538-x
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L., 2017. Attention Is All You Need, in: Advances in Neural Information Processing Systems, pp. 5998–6008.
Vayansky, I., Kumar, S.A.P., 2020. A review of topic modeling methods. Inf. Syst. 94, 101582. https://doi.org/10.1016/j.is.2020.101582
Vinnem, J., Røed, W., 2020. Offshore Risk Assessment, fourth ed. Springer, London. https://doi.org/10.1007/978-1-4471-7444-8
Wang, N., An, S., Mai, Q., 2016. Space Engineering Risk Analysis from Risk Assessment Matrix Using Text Mining. In: 2016 International Conference on Management Science & Engineering (23rd), pp. 917–922.
Wang, Q., Zhang, L., Hu, J., 2018. Real-time risk assessment of casing-failure incidents in a whole fracturing process. Process Saf. Environ. Prot. 120, 206–214. https://doi.org/10.1016/j.psep.2018.06.039
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Funtowicz, M., Davison, J., Shleifer, S., Platen, P. Von, Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T. Le, Gugger, S., Drame, M., Lhoest, Q., Rush, A.M., 2020. Transformers: State-of-the-Art Natural Language Processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, pp. 38–45.


Yan, F., Xu, K., 2019. Methodology and case study of quantitative preliminary hazard analysis based on cloud model. J. Loss Prev. Process Ind. 60, 116–124. https://doi.org/10.1016/j.jlp.2019.04.013
Yim, S., Warschauer, M., 2017. Web-based collaborative writing in L2 contexts: methodological insights from text mining. Lang. Learn. Technol. 21, 146–165.
Zare, P., 2019. The investigation of multiple product rating based on data mining approaches. Comput. Eng. Intell. Syst. 10, 15–25. https://doi.org/10.7176/CEIS
Zeng, Z., Zio, E., 2017. A classification-based framework for trustworthiness assessment of quantitative risk analysis. Saf. Sci. 99, 215–226.
Zhang, X., Green, E., Chen, M., Souleyrette, R.R.R., Zhang, X., Green, E., Chen, M., Souleyrette, R.R.R., 2019. Identifying secondary crashes using text mining techniques. J. Transp. Saf. Secur. 1–21. https://doi.org/10.1080/19439962.2019.1597795
Zhang, X., Mahadevan, S., 2019. Ensemble machine learning models for aviation incident risk prediction. Decis. Support Syst. 116, 48–63. https://doi.org/10.1016/j.dss.2018.10.009
Zhou, J., Reniers, G., 2018. A matrix-based modeling and analysis approach for fire-induced domino effects. Process Saf. Environ. Prot. 116, 347–353. https://doi.org/10.1016/j.psep.2018.02.014
Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., 2015. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27.
Zio, E., 2018. The future of risk assessment. Reliab. Eng. Syst. Saf. 177, 176–190. https://doi.org/10.1016/j.ress.2018.04.020
Zio, E., Aven, T., 2012. Industrial disasters: extreme events, extremely rare. Some reflections on the treatment of uncertainties in the assessment of the associated risks. Process Saf. Environ. Prot. 1, 31–45. https://doi.org/10.1016/j.psep.2012.01.004
