
2017 25th Mediterranean Conference on Control and Automation (MED)

July 3-6, 2017. Valletta, Malta

Cloud Computing for Big Data Analytics in the Process Control Industry

E. Goldin¹, D. Feldman¹, G. Georgoulas², M. Castano², and G. Nikolakopoulos²

¹ GSTAT, Israel
² Robotic Team, Division of Signals and Systems, Electrical Engineering Department, Luleå University of Technology, Luleå, Sweden

The work has received funding from the European Union's Horizon 2020 Research and Innovation Programme under the Grant Agreement No. 636834, DISIRE.

Abstract— The aim of this article is to present an example of a novel cloud computing infrastructure for big data analytics in the Process Control Industry. The latest innovations in the field of Process Analyzer Techniques (PAT), big data and wireless technologies have created a new environment in which almost all stages of the industrial process can be recorded and utilized, not only for safety, but also for real-time optimization. Based on the analysis of historical sensor data, machine learning based optimization models can be developed and deployed in real-time closed control loops. However, the local implementation of those systems still requires a huge investment in hardware and software, as a direct result of the big data nature of the sensor data being recorded continuously. The current technological advancements in cloud computing for big data processing open new opportunities for the industry, while acting as an enabler for a significant reduction in costs, making the technology available to plants of all sizes. The main contribution of this article stems from the presentation, for the first time, of a pilot cloud based architecture for the application of a data driven modeling and optimal control configuration in the field of Process Control. As will be presented, these developments have been carried out in close relationship with the process industry and pave the way for a generalized application of cloud based approaches, towards the future of Industry 4.0.

I. INTRODUCTION

For many years SCADA systems have been used to collect sensor data in order to control industrial processes, usually in real time [1]. The topological complexity of these systems (see [2]) involves large costs associated with scaling and adapting to the vast amount of signals gathered for allowing a general reconfiguration of the control structure of the process plant (see [3]). It should also be mentioned that the majority of these SCADA systems, up to now, have been utilized mainly for providing an overview of the controlled process, while having the ability to perform Process Analyzer Techniques (PAT) mainly for the statistical processing of the received data in an off-line analysis.

However, the recent innovations in online PAT and wireless embedded technologies have created a new era in which almost all stages of the industrial process can be recorded, stored and analyzed. This process is producing a massive amount of sampled data that needs to be stored and processed in real time for allowing an overall reconfiguration of the control plant and for achieving a continuous operational optimality against the variations of the production stages.

Towards this vision, the industrial processes require an IT infrastructure that can efficiently manage massive amounts of complex data structures collected from disparate data sources, while providing the necessary computational power and tools for analyzing these data in batch, near and hard real-time approaches. The overall problem becomes more complex because of the diversity of the acquired data, mainly due to the different data and sensor types, data reliability levels, measurement frequencies and missing data. Moreover, in every case, the acquired data need to be filtered, stored and often aggregated before any meaningful analysis can be performed.

With the explosion of the "Internet of Things" [4] in the last decade, a world of new technologies has become readily accessible and relevant for the industrial process. Nowadays, with relatively low costs, it is possible to send torrents of data to the "cloud" for storage and analysis. Cloud computing encompasses cloud storage, and batch and streaming analysis of data using the latest Machine Learning (ML) algorithms. The potential benefits of using cloud computing for dynamic optimal control in industrial plants include:

• Dramatically reduced costs of storing and analyzing large amounts of data
• Low levels of complexity relative to existing systems
• Enabling the use of advanced ML algorithms in batch and real time
• Reduced entry-level costs for the industry to implement advanced control systems
• Enabling large scale implementations with many low cost sensors
• Very easy management from the cloud
• Easy scaling or modification of storage capacities

Inspired by these capabilities of the cloud infrastructure and the reachability of these technologies nowadays, the proposed architecture aims to combine the existing PAT based analysis of the process, which is most of the time carried out off-line or on a batch of time samples, with the multiple streams of sensory data describing the process and product states. The low-dimensional data should be robust against infrequent updates of PAT measurements and missing data, while handling largely varying measurement intervals. The model should also be able to handle the multivariate and auto-correlated nature of process data and the high quantities of data from regular on-line measurements. Principles from wireless sensor networks, estimation and statistical signal processing will be integrated and evaluated with real process data in order to create a novel and reliable PAT based swarm sensing and data analysis approach that would drive the changes in the Integrated Process Control (IPC) industry. Based on such an architecture it will be, for the first time, feasible to acquire and process online huge streams of data, improve the process models and correspondingly perform an online reconfiguration or re-tuning of the control scheme, in order to meet the changing demands of the process under investigation and to apply plantwide control techniques (see [5], [6]). Towards this vision, the corresponding architecture of the cloud computing for the big data analytics will be presented, which forms the major contribution of this article. Furthermore, the proposed technological platform will be adjusted to the use case of a walking beam furnace.
The rest of this article is structured as follows. In Section II the architecture and the components of cloud computing are introduced, while in Section III a use case of a dynamic optimal control design problem that can be implemented using the described architecture is analyzed. Finally, Section IV concludes the article by summarizing the benefits and limitations of using the described architecture in the industrial process.

II. ARCHITECTURE FOR CLOUD COMPUTING

In batch computing, data is first stored in a Big Data Repository where it can be properly cleaned, aggregated or transformed before being analyzed by the process managers (see [7]). Often this includes saving the data in the Parquet format, which can reduce the size of the data by up to 90% of its original size.

In the proposed prototype architecture for batch processing over the Cloud, users (industrial processes) were given access to an Amazon web portal for S3 storage services. All users were encouraged to contribute their raw batch data to the S3 repository. From the S3 storage service it is feasible to collect the data onto virtual computers ("instances") implemented over the EC2 Amazon elastic computing framework, for data analysis and cleaning. On these virtual computers a Hadoop cluster [8] has been installed with a Spark engine [9] for computing and an RStudio Server [10] as an analytic access point for the end-users. Further access to the virtual computers is also provided via the RStudio Server IDE, through which the end-users can perform ML algorithms and a vast array of statistical analyses on the data. The overall architecture of the proposed cloud infrastructure is presented in Figure 1.

Fig. 1. Schematic Diagram of the Cloud Based Architecture

In the architecture depicted in Figure 1, historical data collected from sensors embedded in the industrial process are uploaded to the S3 storage on Amazon Web Services (AWS). After the upload the data are cleaned and prepared for analysis on the big data framework. The process managers can access these data via local computers, from which they can send, develop and test their algorithms, including dynamic optimal control algorithms, on the cloud of the monitored process.

Historical Data Repository - Users were given access to an Amazon S3 storage facility to which they were able to upload their historical/batch data in various formats (csv, JSON, etc.). Amazon Simple Storage Service (S3) is a web storage interface that can facilitate storage of virtually unlimited data, bucketed into objects of up to 5 terabytes in size. Furthermore, the analytic architecture on the cloud is comprised of a "big data" infrastructure, where the files are distributed over several machines for storage and parallel computing, and a statistical software environment from which the data can be transformed and analyzed.

A. Cloud Storage

Amazon Web Services (AWS) offers a suite of over 70 services that form an on-demand computing platform. The two core services offered are:
1) Amazon Elastic Compute Cloud (EC2) - a virtual computer rental service through which users can run any software they desire and tailor the computer specifications to their specific needs. The payment scheme is per hour of actual usage, where computers can be "stopped" and "started" on demand.
2) Amazon Simple Storage Service (S3) - a web storage interface which can facilitate storage of virtually unlimited data, bucketed into objects of up to 5 terabytes in size.

In the presented architecture, the utilized Amazon on-demand platform allowed for higher flexibility in pricing and an almost instantaneous setup of our prototype architecture. It also served as a platform where the different partners could easily upload and access their data for further analysis, as sketched in the example below.
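As an illustration only, contributing a raw batch file to the S3 repository can be done with a few lines of Python using the boto3 AWS SDK; the bucket and key names below are hypothetical placeholders, not the ones used in the project.

```python
import boto3

# Credentials are assumed to be configured in the environment
# (e.g. via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
s3 = boto3.client("s3")

# Upload one raw batch file of historical sensor data to the shared repository.
s3.upload_file(
    Filename="wbf_sensor_batch_2017-05.csv",
    Bucket="disire-historical-data",                      # hypothetical bucket
    Key="wbf/raw/wbf_sensor_batch_2017-05.csv",           # hypothetical key
)
```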
B. Hadoop Cluster (HDFS)

Apache Hadoop is the leading open-source software framework for distributed storage and processing of Big Data [8]. While Hadoop encompasses a suite of Apache software programs that help manage the tasks on the distributed system, the two core components of Hadoop are:
1) Hadoop Distributed File System (HDFS) - The system that takes very large data, breaks it down into separate pieces and distributes them to different nodes (servers) in a cluster.
2) MapReduce - The computational engine that can perform analysis on the cluster.

HDFS was designed to store Big Data with very high reliability and with the flexibility to scale up by simply adding commodity servers. In the presented prototype architecture, Hadoop has been utilized as the framework for setting up the HDFS cluster on which the sensor data are stored.

C. Apache Spark Engine

The main feature of Apache Spark is its in-memory cluster computing, which increases the processing speed well beyond that of Hadoop's MapReduce technology. Spark uses HDFS for storage purposes, while the calculations are performed in memory on each of the nodes. Aside from the increased speed in computation, the Spark engine is able to provide the following (a minimal batch example is sketched after this list):
• Built-in APIs for multiple languages: Java, Scala, Python and R
• Spark-SQL for querying big data with SQL-like code
• Spark-MLlib [11] for big data parallel machine learning algorithms such as linear and logistic regression, K-means clustering, decision trees, random forests, neural networks, recommendation engines and more
• Spark-Streaming for running machine learning algorithms on streaming data
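To make the above concrete, the following is a minimal, hypothetical PySpark sketch of a batch job in the spirit of the proposed architecture: it reads historical measurements stored as Parquet, runs a Spark-SQL aggregation and fits an MLlib regression model. All paths and column names (e.g. fuel_rate, slab_exit_temp) are illustrative assumptions, not the actual project data layout.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("wbf-batch-model").getOrCreate()

# Historical sensor data, assumed to be stored as a wide Parquet table
# (one row per time step, one column per measured variable).
df = spark.read.parquet("s3a://disire-historical-data/wbf/parquet/")

# Spark-SQL: a quick aggregation over the raw measurements.
df.createOrReplaceTempView("measurements")
spark.sql(
    "SELECT AVG(fuel_rate) AS avg_fuel, AVG(slab_exit_temp) AS avg_exit_temp "
    "FROM measurements"
).show()

# Spark-MLlib: fit a simple data-driven model (feature/label columns are hypothetical).
assembler = VectorAssembler(
    inputCols=["fuel_rate", "atomization_air", "combustion_air", "exhaust_flow"],
    outputCol="features",
)
train = assembler.transform(df.dropna())
model = LinearRegression(featuresCol="features", labelCol="slab_exit_temp").fit(train)
model.write().overwrite().save("s3a://disire-models/wbf-linear-regression")
```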
D. Process Managers

At the other end of the proposed architecture are the process managers who, through local computers, can access and run machine learning algorithms on the data stored in the Hadoop cluster. The two leading programs that serve as an interface for conducting statistical analysis using the Spark engine are:
1) R - An open-source statistical language used widely both in industry and in academia.
2) Python - An open-source, all-around language which has a vast library of functions for implementing machine learning algorithms.

As mentioned above, both of these coding languages have APIs that pass commands to the Spark engine. The process managers access and run these programs through a number of web-based development environments and notebooks, such as the Jupyter notebook, which is popular in the Python community, and RStudio, which is the leading IDE amongst R users.

E. Control Feedback Loop

After the process managers have performed their analysis, they can set up dynamic models for implementation in the cloud that can push back responses to the industrial processes. This process is explained further in the Near Real-time Computing subsection.

F. Historical Big Data Repository

In the cloud, the raw data and the process managers' recommendations will be stored at the historical big data repository (AWS S3). AWS offers great flexibility in storage plans that have the merit of being easily scaled as needed.

G. Near Real-time Computing

Apache Kafka [12] is a publish-subscribe messaging application that enables sending and receiving streaming information between the plants and the Spark engine on the cloud. On the local computers (in the plants) a Kafka API (which consists of a few Java libraries) sends streaming data to a Kafka server set up on AWS that manages the queue of information passed on to the Spark engine. The Spark engine then performs the streaming analysis and pushes the results back to the Kafka server and from there back to the plants. The analysis can be either cleaning of the data, searching for outliers or implementing an ML algorithm in real time. In addition, every 10 minutes the Spark server sends the accumulated data to the Historical Big Data Repository for future use or for batch computing.
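Purely as an illustration of this plant-to-cloud loop (the plant-side client described in the text is a set of Java libraries, so the Python sketch below is only a stand-in): a producer publishes measurements to a topic on the AWS Kafka server, and a consumer listens for the recommendations pushed back. The broker address and topic names are hypothetical.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # kafka-python package

BROKER = "kafka.example-cloud-host:9092"  # placeholder AWS broker address

# Plant side: publish one sensor reading to the topic consumed by the Spark engine.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)
producer.send("plant-measurements", {"sensor_id": 17, "value": 1134.5, "unit": "C"})
producer.flush()

# Plant side: listen for the recommendations pushed back from the cloud.
consumer = KafkaConsumer(
    "plant-recommendations",
    bootstrap_servers=BROKER,
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for record in consumer:
    print("recommendation:", record.value)  # e.g. new set-points for the local gateway
    break
```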
H. Batch Computing

In batch computing, the data are initially stored in the Historical Big Data Repository where they can be properly cleaned, aggregated or transformed before being analyzed by the process managers. In many cases, this step includes saving the data in the Parquet format, which can considerably reduce the size of the data, using the R or Python languages. In general, the process managers can choose from a vast array of ML algorithms that can be implemented on the cluster through the Spark engine.

III. THE USE CASE OF THE WALKING BEAM FURNACE

The walking beam furnace is used to re-heat slabs (large steel beams) to a specific temperature before their refinement in the steel industry (see [13]). The slabs are walked from the feed to the output of the furnace by the cyclic movement of so-called walking beams. During this passage, the items are directly exposed to the heat produced by burners located inside the furnace. Since the heat distribution affects the quality of the finished product, a natural optimal control problem in this context is to regulate pre-assigned temperatures at specific points of the furnace, while minimizing the energy expenditure for the heat generation (see [14], [15]).

The walking beam furnace at MEFOS is an experimental furnace and lacks some of the features of an industrial furnace. Specifically, the temperatures throughout the furnace are not feedback controlled (as is otherwise customary in the industry), i.e., the furnace operates in open loop. Currently, a human operator configures the furnace set-points manually (the set-point values are, however, computed numerically) and then measures the slab temperature at the furnace exit using a pyrometer. In fact, under normal operating conditions, the open-loop control can be tuned to work well. Additionally, this industrial installation is affected by stops and other variations that influence the control performance and correspondingly create the need for a feedback control loop. In the described use case the main variables that need to be controlled are thus: a) the furnace temperatures in the several zones of the furnace and b) the temperature of the slabs at the output (the target temperature). Furthermore, the main objective is to reduce the operating costs through the reduction of energy consumption. In this respect, a small decrease in energy consumption, such as 0.5%, translates into a saving of 2 kWh per ton of heated product, while optimal control strategies could lead to quality improvements as well. The overall schematic diagram of the WBF with the indicative control loops, the sensors and the different heating zones is depicted in Figure 2.

Fig. 2. Schematic Diagram of the Walking Beam Furnace

To achieve these goals there is a need to gather more information about the process on-line, while the optimal controller's output would optimize the process by controlling the following variables: 1) the fuel supply rate at the burners, one burner at each zone, for a total of three burners, 2) the fuel atomization air supply rate, one for each burner, 3) the combustion air flow, one at each zone, for a total of three zones, and 4) the exhaust flow, e.g. the exhaust damper position, with one exhaust damper in the furnace.

In this use case, MEFOS has installed a dedicated PC at the WBF site for managing the flow of the measurement data. Figure 3 presents the flow of the sensory data from the ABB control system to the connectivity server, from there to the corresponding PC, and in the sequel to the cloud.

Fig. 3. Cloud Based Implemented Architecture of the WBF

In the presented use case it is intended to stream the data on-line, in near real-time, from the process to the Kafka service in the cloud by using the Kafka-producer component, Apache Kafka being a publish-subscribe messaging application. In the cloud the data will be pulled by the Kafka-consumer that will be implemented at the Spark cluster. At the cluster, the data will be verified, cleaned, aggregated, organized and sent to the optimal control system to determine recommendations. Afterwards the optimizer's recommendations will be pushed back to Kafka, while the corresponding gateway will determine the fuel supply rate at the burners, the fuel atomization air supply rate, the combustion air flow and the exhaust flow. In the cloud the raw data and the optimizer's recommendations will be stored at the historical big data repository (AWS S3). The overall schematic representation of the presented architecture is depicted in Figure 4.

Fig. 4. Schematic description of the Architecture

For this use case, the variables required by the optimal control module are the ones listed in Figure 5. The minimum data input for the optimal control is 200 past values of the 10-second averages of the above parameters, i.e. one value every 10 seconds over the last 2,000 seconds (33 minutes and 20 seconds).

Fig. 5. Variables required by the optimal control module
A. Transferring data from the sensors to the cloud

For transferring data from the sensors to the cloud, a computer connected to the WBF process is utilized that is able to manage and update the site metadata, i.e. a MefosService method which runs preliminarily for the synchronization of the factory list, zone list, sensor list, batch list and model list. Furthermore, this method creates a file in JSON structure with three fields, FactoryID, ZoneID and SensorID, in every possible combination of values, while the posted data can be either a single message or an array. The input messages are processed at the Kafka server by using a specific topic that is known by both sides, the MefosService and the MefosSpark, and requires a suitable configuration, e.g. "ToSpark". The Kafka API provides a callback method which verifies the input streaming received on the Kafka server. The POST method "/SendMeasurements" uses this API to evaluate any loss, if there is one. The two message types are summarized in Table I, and an illustrative message follows the table.

TABLE I - MESSAGE TYPES

Message Type 1 - Process Status Change
  Factory ID        F key  [Predefined Integer]
  Batch ID          F key  [Predefined Integer]
  Status ID         P key  [Running Integer]
  Date time                [Time Stamp]
  Current Status           [Predefined String: Idle/Start/Stop/Pause/Restart]

Message Type 2 - Measurements
  Factory ID        F key  [Predefined Integer: -1 / 1 / 2 / 3]
  Zone ID           F key  [Predefined Integer]
  Sensor ID         F key  [Predefined Integer]
  Batch ID          F key  [Predefined Integer]
  Date Time                [Time Stamp]
  Measurement value        [Double]
  Measurement unit         [Char: C / % / m3/h / kg/h / MMWC / Boolean]
  Quality                  [Integer]
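For illustration, a single "Measurements" message following the field list of Table I might be serialized as JSON as shown below before being posted to the Kafka topic; the concrete values and key casing are assumptions, not taken from the project's actual payloads.

```python
import json

# One hypothetical Message Type 2 ("Measurements") record, per Table I.
measurement_message = {
    "FactoryID": 1,                     # F key, predefined integer
    "ZoneID": 2,                        # F key, predefined integer
    "SensorID": 17,                     # F key, predefined integer
    "BatchID": 42,                      # F key, predefined integer
    "DateTime": "2017-05-16T10:32:10",  # time stamp
    "MeasurementValue": 1134.5,         # double
    "MeasurementUnit": "C",             # one of: C, %, m3/h, kg/h, MMWC, Boolean
    "Quality": 1,                       # integer quality flag
}

# The posted data can be either a single message or an array of messages.
payload = json.dumps([measurement_message])
print(payload)
```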
B. The Cloud side

On the cloud's side (AWS) there will be the Kafka server, which will receive the streaming data and will manage the queue. Overall, the data will be routed through the Kafka server into the Spark cluster and from there back to Kafka. As mentioned before, the Kafka server will be held responsible for managing the messages that arrive from the MefosService. The Spark streaming process consumes the measurement data from the Kafka server, stores them in memory, and feeds the relevant process models every 10 seconds. In every batch interval the process receives the recommendations per measurement type from each model and sends the recommendations to the Kafka server. In the sequel, the Spark streaming process saves the measurement data along with the recommendations to AWS S3. Overall, the streaming process is depicted in Figure 6.

Fig. 6. Overview of the Streaming Process
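The following sketch shows what the cloud-side consume-and-persist leg of this loop could look like using Spark Structured Streaming; the prototype described here predates that API, so this is an assumed, modernized stand-in, and the broker address, topic name, schema and S3 paths are all placeholders (the Kafka connector package is also assumed to be on the Spark classpath).

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, IntegerType,
                               DoubleType, StringType)

spark = SparkSession.builder.appName("wbf-streaming").getOrCreate()

# Schema mirroring the "Measurements" message of Table I (assumed field names).
schema = StructType([
    StructField("FactoryID", IntegerType()),
    StructField("ZoneID", IntegerType()),
    StructField("SensorID", IntegerType()),
    StructField("BatchID", IntegerType()),
    StructField("DateTime", StringType()),
    StructField("MeasurementValue", DoubleType()),
    StructField("MeasurementUnit", StringType()),
    StructField("Quality", IntegerType()),
])

# Consume the measurement stream from the Kafka server on AWS.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "kafka.example-cloud-host:9092")
       .option("subscribe", "ToSpark")
       .load())

measurements = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("m"))
                   .select("m.*"))

# Persist each 10-second micro-batch to the historical repository (S3, Parquet).
query = (measurements.writeStream
         .trigger(processingTime="10 seconds")
         .format("parquet")
         .option("path", "s3a://disire-historical-data/wbf/stream/")
         .option("checkpointLocation", "s3a://disire-historical-data/wbf/checkpoints/")
         .start())
query.awaitTermination()
```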
The Kafka server will also keep, and be responsible for, the recommendations data queue that arrives from the Spark cluster. For transferring the results from the cloud back to the process, the Kafka server keeps the control recommendations data and streams them on a specific output topic to a consumer, while the MefosService includes the Kafka-consumer feature that pulls the recommendations data from the output topic, e.g. "FromSpark". Finally, the output recommendations reach the Web-API of the process through a provided URL.

For the big data repository, the Spark-Streaming process metadata are synchronized and pre-processed. After this step the data are pushed from the Mefos-Service PC into the Kafka server and from there they are pulled by the Spark cluster. At the Spark-Streaming stage, the initial data are accumulated in memory and afterwards saved at the historical Big Data repository. The control recommendations data are also accumulated in memory and saved at the historical Big Data repository, which relies on AWS S3 (Amazon Simple Storage Service), while the files will be saved in the Parquet file type with the following benefits: 1) the structure of the table, i.e. the number of the columns, their types and the delimiter between columns, will be saved, 2) the data are compressed, a fact that saves about 60% of their volume compared to a text file type, and 3) it enables the direct loading into Spark's in-memory data storage, so no conversions will be needed. Furthermore, the historical Big Data repository will enable deep investigation of the data in case it is required, e.g. for the development of new models, BI reports, etc.
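As a small illustration of these Parquet benefits (all paths are hypothetical), saving and re-loading the accumulated data keeps the schema and the compression without any extra conversion step:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wbf-historical-repo").getOrCreate()

# Hypothetical accumulated measurements and recommendations from the streaming job.
df = spark.read.json("s3a://disire-historical-data/wbf/staging-json/")

# Parquet keeps column names and types with the data and compresses it
# (Snappy by default in Spark), typically well below the raw text size.
df.write.mode("append").parquet("s3a://disire-historical-data/wbf/parquet/")

# Later, e.g. for new models or BI-style reports, the data loads back directly.
historical = spark.read.parquet("s3a://disire-historical-data/wbf/parquet/")
historical.printSchema()  # schema preserved; no manual conversion needed
```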
IV. CONCLUSIONS

In this article an example of a novel cloud computing infrastructure for big data analytics in the Process Control Industry has been presented. The current technological advancements in cloud computing for big data processing open new opportunities for the industry, while acting as an enabler for a significant reduction in costs, making the technology available to plants of all sizes. The main contribution of this article has been the presentation, for the first time, of a pilot cloud based architecture for the application of a data driven modeling and optimal control configuration in the field of Process Control. These developments have been carried out in close relationship with the process industry, since a use case at the walking beam furnace of MEFOS in the Swedish steel industry has been presented. Part of the future work includes the full extended experimentation and validation of the proposed scheme in WBF campaigns.
REFERENCES
[1] D. Bailey and E. Wright, Practical SCADA for industry. Newnes,
2003.
[2] O. Sporns and G. Tononi, “Classes of network connectivity and
dynamics,” in Complexity, vol. 7, pp. 28–38, 2001.
[3] M. van de Wal and B. de Jager, “Control structure design: a survey,”
in Proceedings of the 1995 American Control Conference, vol. 1,
pp. 225–229, June 1995.
[4] L. Atzori, A. Iera, and G. Morabito, “The internet of things: A survey,”
Computer networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[5] S. Skogestad, “Plantwide control: the search for the self-optimizing
control structure,” Journal of Process Control, vol. 10, pp. 487–507,
October 2000.
[6] W. L. Luyben, B. D. Tyreus, and M. L. Luyben, Plant-wide process
control. McGraw-Hill, 1998.
[7] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and
its role in the internet of things,” in Proceedings of the first edition
of the MCC workshop on Mobile cloud computing, pp. 13–16, ACM,
2012.
[8] T. White, Hadoop: The Definitive Guide. O’Reilly Media, 2012.
[9] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica,
“Spark: Cluster computing with working sets,” HotCloud, vol. 10,
no. 10-10, p. 95, 2010.
[10] J. Allaire, “RStudio: Integrated development environment for R,”
Boston, MA, 2012.
[11] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu,
J. Freeman, D. Tsai, M. Amde, S. Owen, et al., “Mllib: Machine
learning in apache spark,” Journal of Machine Learning Research,
vol. 17, no. 34, pp. 1–7, 2016.
[12] N. Garg, Apache Kafka. Packt Publishing Ltd, 2013.
[13] H. S. Ko, J.-S. Kim, T.-W. Yoon, M. Lim, D. R. Yang, and I. S. Jun,
“Modeling and predictive control of a reheating furnace,” in American
Control Conference, 2000. Proceedings of the 2000, vol. 4, pp. 2725–
2729, IEEE, 2000.
[14] B. Leden, “A control system for fuel optimization of reheating
furnaces,” Scand. J. Metall., vol. 15, no. 1, pp. 16–24, 1986.
[15] J. Srisertpol, S. Tantrairatn, P. Tragrunwong, and V. Khomphis, “Es-
timation of the mathematical model of the reheating furnace walking
hearth type in heating curve up process,”
