Machine Learning in Python for Dynamic Process Systems
ML for Process Industry Series
First Edition
Ankur Kumar
Jesus Flores-Cerrillo
Dedicated to our spouses, family, friends, motherland, and all the data-science enthusiasts
www.MLforPSE.com
All rights reserved. No part of this book may be reproduced or transmitted in any form or in
any manner without the prior written permission of the authors.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented and obtain permissions for usage of copyrighted materials.
However, the authors make no warranties, expressed or implied, regarding errors or
omissions and assume no legal liability or responsibility for loss or damage resulting from
the use of information contained in this book.
Series Introduction
In the 21st century, data science has become an integral part of the work culture in every
manufacturing industry, and the process industry is no exception to this modern phenomenon.
From predictive maintenance to process monitoring, fault diagnosis to advanced process
control, machine learning-based solutions are being used to achieve higher process reliability
and efficiency. However, few books are available that adequately cater to the needs of
budding process data scientists. The scant available resources include: 1) generic data
science books that fail to account for the specific characteristics and needs of process plants
2) process domain-specific books with rigorous and verbose treatment of underlying
mathematical details that become too theoretical for industrial practitioners. Understandably,
this leaves a lot to be desired. Books are sought that have process systems in the backdrop,
stress application aspects, and provide a guided tour of ML techniques that have proven
useful in process industry. This series ‘Machine Learning for Process Industry’ addresses
this gap to reduce the barrier-to-entry for those new to process data science.
The first book of the series ‘Machine Learning in Python for Process Systems
Engineering’ covers the basic foundations of machine learning and provides an overview of
a broad spectrum of ML methods primarily suited for static systems. Step-by-step guidance on
building ML solutions for process monitoring, soft sensing, predictive maintenance, etc. is
provided using real process datasets. Aspects relevant to process systems, such as modeling
correlated variables via PCA/PLS, handling outliers in noisy multidimensional datasets, and
controlling processes using reinforcement learning, are covered. This second book of
the series is focused on dynamic systems and provides a guided tour along the wide range
of available dynamic modeling choices. Emphasis is placed on both classical methods (ARX,
CVA, ARMAX, OE, etc.) and modern neural network methods. Applications in time series
analysis, noise modeling, system identification, and process fault detection are illustrated
with examples. Future books of the series will continue to focus on other aspects and needs
of process industry. It is hoped that these books can help process data scientists find
innovative ML solutions to the real-world problems faced by the process industry.
Books of the series will be useful to practicing process engineers looking to ‘pick up’ machine
learning as well as data scientists looking to understand the needs and characteristics of
process systems. With the focus on practical guidelines and real industrial case studies, we
hope that these books lead to a wider adoption of data science in the process industry.
Preface
Model predictive control (MPC) and real-time optimization (RTO) are among the most critical
technologies that drive the process industry. Any experienced process engineer would vouch
that the success of MPCs and RTOs depends heavily on the accuracy of the underlying
process models. Similarly, the key requirement for other important process technologies, like
dynamic data reconciliation, process monitoring, etc., is the availability of accurate process
models. Modeling efforts during commissioning of these tools can easily consume up to 90%
of the project cost and time. The modern data revolution, whereby abundant process
data are easily available, and practical difficulties in building high-fidelity first-principles
dynamic models for complex industrial processes have popularized the usage of empirical
data-driven/machine learning (ML) models. Building dynamic models using process data is
called system identification (SysID) and it’s a very mature field with extensive literature.
However, it is also very easy for a process data scientist (PDS) new to this field to get
overwhelmed with the SysID mathematics and ‘drowned’ in the sea of SysID terminology.
Several noteworthy ML books have been written on time-series analysis; however, in process
industry, input-output models are of greater import. Unfortunately, there aren’t many books
that cater to the needs of modern PDSs interested in dynamic process modeling (DPM)
without weighing them down with too much mathematical detail, and therein lies our
motivation for authoring this book: specifically, a reader-friendly and easy-to-understand book
that provides a comprehensive coverage of ML techniques that have proven useful for
building dynamic process models with focus on practical implementations.
It should be clear to you by now that this book is designed to teach working process engineers
and budding PDSs about DPM. While doing so, this book attempts to avoid a pitfall that
several generic ML books fall into: overemphasis on ‘modern’ and complex ML techniques
such as artificial neural networks (ANNs) and undertreatment of classical DPM methods.
Classical techniques like FIR and ARX still dominate the dynamic modeling solutions offered
by commercial vendors of industrial solutions and are no less ‘machine-learning’ than the
ANNs. These two along with other classical techniques (such as OE, ARIMAX, CVA) predate
the ANN-craze era and have stood the test of time in providing equal (if not superior)
performance compared to ANNs. Correspondingly, along with modern ML techniques like
RNNs, a considerable portion of the book is devoted to classical dynamic models, with
emphasis on understanding the implications of modeling decisions, such as the impact of the
implicit noise model assumption in the ARX model, the implications of differencing data, etc.
Guided by our own experience from building process models for varied industrial applications
over the past several years, this book covers a curated set of ML techniques that have proven
useful for DPM. The broad objectives of the book can be summarized as follows:
• reduce barrier-to-entry for those new to the field of SysID
• provide working-level knowledge of SysID techniques to the readers
• enable readers to make judicious selection of a SysID technique appropriate for their
problems through intuitive understanding of the advantages and drawbacks of
different methods
• provide step-by-step guidance for developing tools for soft sensing, process
monitoring, predictive maintenance, etc. using SysID methods
This book adopts a tutorial-style approach. The focus is on guidelines and practical
illustrations with a delicate balance between theory and conceptual insights. Hands-on
learning is emphasized, and therefore detailed code examples with industrial-scale datasets
are provided to concretize the implementation details. A deliberate attempt is made not to
weigh readers down with mathematical details, but rather to use mathematics as a vehicle for
better conceptual understanding. Complete code implementations have been provided in the
GitHub repository. Although most of the existing literature on SysID uses MATLAB as the
programming environment, we have adopted Python in this book due to its immense
popularity among the broad ML community. Several Python libraries are now available which
make DPM using Python convenient.
We are quite confident that this text will enable its readers to build dynamic models for
challenging problems with confidence. We wish them the best of luck in their career.
Pre-requisites
No prior experience with machine learning or Python is needed. Undergraduate-level
knowledge of basic linear algebra and calculus is assumed.
Book organization
Under the broad theme of ML for process systems engineering, this book is an extension of
the first book of the series (which dealt with fundamentals of ML and its varied applications
in process industry); however, it can also be used as a standalone text. To give due treatment
to various aspects of SysID, the book has been divided into three parts. Part 1 of the book
provides a perspective on the importance of ML for dynamic process modeling and lays down
the basic foundations of ML-DPM (machine learning for dynamic process modeling). Part 2
provides an in-depth presentation of classical ML techniques and has been written keeping in
mind the different modeling requirements and process characteristics that determine a
model’s suitability for a problem at hand. These include, amongst others, presence of
multiple correlated outputs, process nonlinearity, need for low model bias, need to model
disturbance signal accurately, etc. Part 3 is focused on artificial neural networks and deep
learning. While deep learning is the current buzzword in ML community, we would like to
caution the reader against the temptation to deploy a deep learning model for every problem
at hand. For example, the models covered in Part 2 still dominate the portfolio of models
used in industrial controllers and can often provide comparable (or even superior)
performance compared to ANNs with relatively less hassle.
Symbol notation
The following notation has been adopted in the book for representing different types of
variables:
- lower-case letters refer to vectors (x ∈ ℝ^(m×1)) and upper-case letters denote matrices
(X ∈ ℝ^(n×m))
- individual elements of a vector and a matrix are denoted as x_j and x_ij, respectively
- the i-th vector in a dataset is represented as a subscripted lower-case letter (e.g., x_i ∈
ℝ^(m×1)); its distinction from an individual element x_j will be clear from the
corresponding context
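These conventions map directly onto NumPy arrays; a small illustrative sketch (the values are our own, not from the book):

```python
import numpy as np

# data matrix X in R^(n x m): n = 4 samples (rows), m = 3 variables (columns)
X = np.arange(12).reshape(4, 3)

x_i = X[1, :]     # the i-th sample vector (here i = 1, i.e., the 2nd row)
x_ij = X[1, 2]    # an individual element x_ij (row i, column j)

print(x_i.tolist())   # [3, 4, 5]
print(int(x_ij))      # 5
```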
Table of Contents
1.1. Process Systems Engineering, Dynamic Process Modeling, and Machine Learning
-- components of a dynamic process model
1.2. ML-DPM Workflow
1.3. Taxonomy of ML-based Dynamic Models
1.4. Applications of DPM in Process Industry
Part 1
Introduction & Fundamentals
Chapter 1
Machine Learning and Dynamic Process Modeling: An Introduction
Process industry operations are dynamic in nature. In complex process plants such as
oil refineries, 1000s of process measurements may be recorded every second to
capture crucial process trends. Plant engineers often employ dynamic process models
to predict future values of process variables in applications such as process control, process
monitoring, etc. Machine learning (ML) provides a convenient mechanism to bring together
the dynamic process modeling (DPM) needs and the large data resources available in modern
process plants.
This chapter provides an overview of what ML has to offer for DPM. It also addresses a
dichotomy between the ML community and the DPM community: while the former generally
tends to claim that ML modeling has been reduced to mere execution of ‘model-fitting’ actions,
the latter vouches for the need for ‘experts’ to build ‘successful’ models. We will attempt to
reconcile these two differing viewpoints.
Overall, this chapter provides a whirlwind tour of how the power of machine learning is
harnessed for dynamic process modeling. Specifically, the following topics are covered
• Introduction to dynamic process modeling and the need for machine learning
• Typical workflow in a ML-based DPM (ML-DPM) project
• Taxonomy of popular ML-DPM methods and models
• Applications of DPM in process industry
Let’s now tighten our seat-belts as we embark upon this exciting journey of de-mystifying
machine learning for dynamic process modeling.
Process industry is an umbrella term used to refer to industries like petrochemical, chemical,
power, paper, cement, pharmaceutical, etc. These industries use processing plants to
manufacture intermediate or final consumer products. As emphasized in Figure 1.1, the prime
concern of the management of these plants is optimal design and operations, and high
reliability through proactive process monitoring, quality control, data reconciliation, etc. All
these tasks fall under the ambit of process systems engineering (PSE).
Figure 1.1: Overview of industries that constitute process industry and the tasks process
systems engineers perform
The common theme among the PSE tasks is that they all rely on reliable dynamic
mathematical process models that can accurately predict the future state of the process using
current and past process measurements. Therefore, DPM is one of the defining skills of
process systems engineers. Process industry has historically utilized both first-
principles/phenomenological and empirical/data-based models for DPM. While the former
models provide higher fidelity, the latter models are easier to build for complex systems. The
immense rise in popularity of machine learning/data science in recent years and exponential
increase in sensor measurements collected at plants have led to renewed interest in ML-
based DPM. Several classical and ‘modern’ ML methods for DPM are at a process data
scientist’s disposal. The rest of the book will take you on a whirlwind tour of these methods.
Let’s now jump straight into the nitty-gritty of ML-DPM.
Figure 1.2: (a) Process system (and its dynamic model) with input, output, and noise signals (b) A pH
neutralization process broken down into its dynamic components
The impact of measurement and process noise can be clubbed together and explained via a
stochastic model as shown in the figure above. The resulting disturbance signal summarizes
all the uncertain characteristics of the process. The stochastic and deterministic models can
be estimated simultaneously as well as separately.
where α and β are estimable model parameters and the disturbance variable, v(k),
summarizes all the uncertainties. Here, the value of the output at the kth time instant is
predicated upon the past values of both output and input, and the disturbance. In the later
chapters, we will study how to characterize v(k) using stochastic models.
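A difference equation of this form is straightforward to simulate. The sketch below uses illustrative parameter values (α = 0.7, β = 1.5, our own choices, not from the book's example) with a step input and a small random disturbance:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.7, 1.5                # illustrative model parameters
N = 50
u = np.ones(N)                        # step input
v = 0.05 * rng.standard_normal(N)     # disturbance signal v(k)

y = np.zeros(N)
for k in range(1, N):
    # output at instant k depends on past output, past input, and disturbance
    y[k] = alpha * y[k - 1] + beta * u[k - 1] + v[k]

# the output settles near the steady-state gain beta/(1 - alpha) = 5
print(round(float(y[-1]), 1))
```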
Figure 1.3: Representative dynamic changes in a SISO process output upon a step change
in input
➢ In a static model, we are only concerned with being able to predict y_steady,
the final impact of the step change in u.
➢ In a dynamic model, on the other hand, we care about the transition
dynamics as well.
➢ It is obvious that dynamic model estimation is a more demanding
problem, and the complexity only increases for multivariable systems
with stochastic components.
dy/dt = f(y(t), u(t))
In an alternate scenario, if the input variables are constant between samples, then
an exact analytical discrete-time representation can be derived for linear processes.
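For a first-order linear process, this exact discrete-time equivalent can be written down and verified in a few lines (the parameter values below are illustrative assumptions):

```python
import math

# first-order linear process dx/dt = a*x + b*u with the input held
# constant between samples (zero-order hold), sample time T
a, b, T = -0.5, 2.0, 0.1       # assumed illustrative values

# exact discrete-time equivalent: x(k+1) = A_d*x(k) + B_d*u(k)
A_d = math.exp(a * T)
B_d = (b / a) * (math.exp(a * T) - 1.0)

# march the discrete model 20 steps from x(0) = 0 with u = 1 ...
x = 0.0
for _ in range(20):
    x = A_d * x + B_d * 1.0

# ... and compare against the analytical continuous-time response at t = 20*T
x_true = (b / a) * (math.exp(a * 20 * T) - 1.0)
print(abs(x - x_true) < 1e-10)  # True: the discretization is exact, not approximate
```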
Figure 1.4 shows the typical steps involved in system identification. As you can see, SysID is
more than just curve fitting. Overall, there are five broad tasks: data collection, exploratory
data analysis, data pre-treatment, model identification, and model validation. Although we will
study each step in detail in Chapter 4, let’s take a quick overview:
• Data pre-treatment: This step consists of several activities designed to remove the
portions of training data that are unimportant (or even detrimental) to model
identification. It may entail removal of outliers and noise, removal of trends, etc.
Alternatively, training data can also be massaged to manipulate the model’s accuracy
as suited for the end purpose of the model. For example, if the model is to be used for
control purposes, then the training data may be pre-filtered to bolster the model’s
accuracy for high-frequency signals. Overall, the generic guideline is that you have
better chances of a successful SysID with a better-conditioned dataset.
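As a minimal sketch of such pre-treatment on synthetic data (the 3-sigma threshold and linear detrending below are generic choices for illustration, not the book's specific recipe):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
t = np.arange(N)
y = 0.02 * t + rng.standard_normal(N)   # measurement with a slow drift
y[50] = 25.0                            # a gross outlier

# outlier removal: replace points far outside a 3-sigma band with the median
med, sigma = np.median(y), np.std(y)
y_clean = np.where(np.abs(y - med) > 3 * sigma, med, y)

# detrending: subtract a least-squares linear trend
slope, intercept = np.polyfit(t, y_clean, 1)
y_detrended = y_clean - (slope * t + intercept)

print(bool(abs(y_detrended.mean()) < 0.1))  # True: detrended data is ~zero-mean
```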
• Model training: Model training is the most critical step in SysID and entails a few sub-
steps. First, a choice must be made on the type of disturbance model and deterministic
model. The end use of the model, the available a priori system knowledge, and the
desired degree of modeling complexity dictate these selections. For example, if the
model is to be used for simulation purposes, then the OE¹ structure may be preferred over
ARX; if the process disturbance has different dynamics compared to the process input, then
¹ We will cover these model structures in the upcoming chapters.
Figure 1.4: Steps (with sample sub-steps) involved in a typical ML-based dynamic modeling
• Model validation: Once a model has been fitted, the next step is to check if the model
truly represents the underlying process. Several techniques are at a modeler’s
disposal. For example, you could check the model’s performance on a dataset that has
not been used during parameter estimation, plot the modeling errors to look for leftover
patterns, or check if the model agrees with the a priori information about the system.
Often, your first model will fail the validation procedure. An expert modeler can use the
validation results to infer the reasons for the failure. Some examples of the cause could
be
➢ Training data was not ‘rich’ enough
➢ Training data was not pre-treated adequately
➢ Choice of model structure was wrong
➢ Estimation algorithm did not converge
Once diagnosed, appropriate corrections are made, and the iterative procedure
continues.
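One common validation check — inspecting the autocorrelation of modeling errors for leftover patterns — can be sketched as follows. The residuals here are synthetic, and the 2/√N band is the usual 95% rule of thumb:

```python
import numpy as np

rng = np.random.default_rng(2)

def residual_autocorr(e, max_lag=20):
    """Normalized autocorrelation of residuals at lags 1..max_lag."""
    e = e - e.mean()
    r0 = np.dot(e, e)
    return np.array([np.dot(e[:-k], e[k:]) / r0 for k in range(1, max_lag + 1)])

N = 1000
band = 2 / np.sqrt(N)   # approximate 95% confidence band for white residuals

# white residuals (a good model) stay mostly inside the band
e_white = rng.standard_normal(N)
print(np.mean(np.abs(residual_autocorr(e_white)) < band))

# correlated residuals (leftover dynamics) clearly violate the band at low lags
e_corr = np.convolve(e_white, [1, 0.8], mode='same')
print(bool(abs(residual_autocorr(e_corr)[0]) > band))  # True
```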
We hope you now understand that SysID is more than just a black-box application of a
modeling algorithm for estimation of model parameters. There are several practical aspects
that require active participation of the modeler during the SysID process. Without
exaggeration, we can say that SysID is an art, and the rest of the book will help you acquire
the necessary skills to become a SysID artist!
² Models where the time variable does not appear as an explicit variable.
Figure 1.5: Some popular SysID methods and models covered in this book
There are several noteworthy points in Figure 1.5. First, the apparent domination of linear
models: you may have expected nonlinear models to be preferable for modeling complex
industrial process systems; however, as it turns out, linear models are often justified for DPM.
The justification stems from the fact that industrial processes often operate around an optimal
point, making linear models pretty good approximations for usage in process control and
process monitoring applications. If the end use of the model is process simulation or design,
where the system response over a wide range of input values is of interest, then nonlinear
models can prove more suitable. Within the linear category, different modeling options exist
to cater to process systems with different characteristics, viz., multivariable outputs, presence
of correlated noise, disturbances sharing dynamics with inputs, presence of measurement
noise only, presence of drifts or non-stationarities, etc.
The classification at the root of the linear model sub-tree is based on the methodology used
for model fitting. While PEM methods employ minimization of prediction errors, subspace
methods are based on matrix algebra and do not involve any iterative optimization for
parameter estimation.³ Do not worry if these terms do not make much sense right now; they
will soon become ‘obvious’ to you. Without training on the nuanced differences between these
different models and methods, you may not have much idea at the outset about which model
would be best for your system. This book will help you gain adequate conceptual understanding
to become adept at making the right choice and obtaining your coveted model quickly.
³ Another distinction is that PEM is used to obtain input-output models while SIM is used to obtain state-space models.
Types of models
In Eq. 1 we saw one way of representing a dynamic process model. The models from Figure
1.5 can be used to generate other representations of your dynamic systems; the figure below
shows the different forms/types of models that we will learn to derive in this book using
machine learning. We will also understand the pros and cons of these different model forms.
Figure 1.6: Different forms of dynamic process model (that we will learn to generate in this
book), their defining characteristics, and corresponding representative examples
Process model (input u, measured output y):
x(k + 1) + a x(k) = b u(k)
If not careful, you may just ignore the presence of measurement noise and attempt to fit
the following input-output model to estimate the model parameters a and b.
Data file ‘simpleProcess.csv’ contains 1000 samples of u and y obtained from the true
process. Below are the parameter estimates we get using this data,
Two things are striking here: the estimates are grossly inaccurate, and the parameter
error estimates seem to suggest high confidence in these wrong values! What went
wrong in our approach? The reason is that the SNR (signal-to-noise ratio) is not very
high (the data was generated with SNR ≈ 10) and the fitted ARX model is a wrong input-
output form of the true process. The correct form would be the following
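The parameter bias described above can be reproduced with synthetic data generated from the same model structure. The sketch below is ours, not the book's 'simpleProcess.csv' data, and the parameter values are assumed for illustration; it shows the naive ARX (ordinary least-squares) fit attenuating the autoregressive coefficient when the output is corrupted by measurement noise:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
a, b = -0.8, 1.0                 # true process: x(k+1) + a*x(k) = b*u(k)

u = rng.standard_normal(N)
x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = -a * x[k] + b * u[k]

y = x + 0.5 * rng.standard_normal(N)   # measured output = state + noise (SNR ~ 10)

# naive ARX fit: regress y(k+1) on [-y(k), u(k)] by ordinary least squares
Phi = np.column_stack([-y[:-1], u[:-1]])
a_hat, b_hat = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]

# |a_hat| < |a|: the estimate is biased toward zero because the
# regressor y(k) itself contains measurement noise; b_hat stays ~unbiased
print(round(float(a_hat), 2), round(float(b_hat), 2))
```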
In Figure 1.1, we saw some of the applications of dynamic models in process industry. To
provide further perspective into how a typical plant operator or management may use these
varied applications, Figure 1.7 juxtaposes DPM applications alongside the typical decision-
making hierarchy in a process plant. Every step of plant operation is nowadays heavily
reliant on DPM-based tools, and machine learning has proven to be a useful vehicle for quickly
building these tools. You can use the models from Figure 1.5 for building these tools, or, if you
use commercial vendor solutions, you will find these models employed in their products.
For example, in the process control field, FIR models have been the bedrock of industrial MPC
controllers. In the last few years, commercial vendors have included CVA models in their
offerings due to the advantages provided by subspace models. The latest offering by Aspen,
DMC3⁴, incorporates neural networks for MPC and inferential modeling. ARX and BJ models are
also used in industrial MPC⁵.
⁴ https://www.aspentech.com/en/products/msc/aspen-dmc3
⁵ Qin and Badgwell, A survey of industrial model predictive control technology. Control Engineering Practice, 2003.
This concludes our quick attempt to establish the connection between process industry,
dynamic process modeling, and machine learning. It must now be obvious to you that modern
process industry relies heavily on dynamic process modeling to achieve its objectives of
reducing maintenance costs, and increasing productivity, reliability, safety, and product
quality. ML-based DPM helps build tools quickly to facilitate achieving these objectives.
This introductory chapter has also cautioned against blind application of ‘convenient’ ML
models for dynamic modeling. Your process data will throw several questions at you at each
stage of system identification. No straightforward answers exist to these questions and only
some time-tested guiding principles are available. While the rest of the book will familiarize
you with these principles, the onus still lies on you to use your process insights and SysID
understanding to make the right modeling choices. And yes, remember the age-old advice⁶,
“All models are wrong, but some are useful.”
Summary
This chapter impressed upon you the importance of DPM in process industry and the role
machine learning plays in it. We familiarized ourselves with the typical SysID workflow,
explored its different tasks, and looked at different ML models available at our disposal for
model identification. We also explored the application areas in process industry where ML has
proved useful. In the next chapter we will take the first step and learn about the environment
we will use to execute our Python scripts containing SysID code.
⁶ Attributed to the famous statistician George E. P. Box. It basically implies that your (SysID) model will seldom exactly represent the real process; however, it can be close enough to be useful for practical purposes.
Chapter 2
The Scripting Environment
In the previous chapter we studied the various aspects of system identification and learned
about its different uses in process industry. In this chapter we will quickly familiarize
ourselves with the Python language and the scripting environment that we will use to write
ML codes, execute them, and see results. This chapter won’t make you an expert in Python
but will give you enough understanding of the language to get you started and help you
understand the several in-chapter code implementations in the upcoming chapters. If you already know
the basics of Python, have a preferred code editor, and know the general structure of a typical
SysID script, then you can skip to Chapter 3.
If you skim through the system identification literature, you will find almost exclusive usage of
MATLAB as the computing environment. This is mostly attributed to the System
Identification Toolbox, a very powerful MATLAB toolbox developed by Prof. Lennart Ljung (a
legend in system identification). Unfortunately, this tool is not freely available, and MATLAB is
not yet as popular as Python among the ML community. Luckily, several good souls in the
Python community have developed specialized libraries for all aspects of SysID. Most of the
popular SysID models can now be generated using off-the-shelf Python libraries. Considering
the dominance of Python for deep learning, Python becomes an excellent choice for SysID
scripting.
In the above context, we will cover the following topics to familiarize you with Python:
• Introduction to Python language
• Introduction to Spyder and Jupyter, two popular code editors
• Overview of Python data structures and scientific computing libraries
• Python libraries for system identification
• Overview of a typical ML-DPM/SysID script
Python is a high-level general-purpose computer programming language that can be used for
application development and scientific computing. If you have used other computer languages
like Visual Basic, C#, C++, Java, then you would understand the fact that Python is an
interpreted and dynamic language. If not, then think of Python as just another name in the list
of computer languages. What is more important is that Python offers several features that
set it apart from the rest of the pack, making it the most preferred language for machine
learning. Figure 2.1 lists some of these features. Python provides all the tools to conveniently
carry out all steps of an ML-DPM project, namely, data collection, data exploration, data pre-
processing, model ID, visualization, and solution deployment to end-users. In addition, freely
available tools make writing Python code very easy⁷.
Installing Python
One can download the official and latest version of Python from the python.org website.
However, the most convenient way to install and use Python is to install Anaconda
(www.anaconda.com) which is an open-source distribution of Python. Along with the core
Python, Anaconda installs a lot of other useful packages. Anaconda comes with a GUI called
Anaconda Navigator (Figure 2.2) from where you can launch several other tools.
⁷ Most of the content of this chapter is similar to that in Chapter 2 of the book ‘Machine Learning in Python for Process Systems Engineering’ and has been reproduced with appropriate changes to maintain the standalone nature of this book.
Jupyter Notebooks are another very popular way of writing and executing Python code. These
notebooks allow combining code, execution results, explanatory text, and multimedia
resources in a single document. As you can imagine, this makes saving and sharing complete
data analysis very easy.
In the next section, we will provide you with enough familiarity on Spyder and Jupyter so that
you can start using them.
Figure 2.3 shows the interface⁸ (and its different components) that comes up when you launch
Spyder. These are the 3 main components:
• Editor: You can type and save your code here. Clicking the run button executes the code
in the active editor tab.
• Console: Script execution results are shown here. It can also be used for executing
Python commands and interacting with variables in the workspace.
• Variable explorer: All the variables generated by running editor scripts or console are
shown here and can be interactively browsed.
Like any IDE, Spyder offers several convenience features. You can divide your script into cells
and execute only a selected cell if you choose to (by pressing Ctrl + Enter). IntelliSense
allows you to autocomplete your code by pressing the Tab key. Extensive debugging
functionalities make troubleshooting easier. These are only some of the features available in
Spyder. You are encouraged to explore the different options (such as pausing and canceling
script execution, clearing out the variable workspace, etc.) on the Spyder GUI.
⁸ If you have used MATLAB, you will find the interface very familiar.
With Spyder, you have to run your script again to see execution results if you close and reopen
your script. In contrast, consider the Jupyter interface in Figure 2.4. Note that the
Jupyter interface opens in a browser. We can save the shown code, the execution outputs,
and explanatory text/figures as a (.ipynb) file and have them remain intact when we reopen the
file in Jupyter Notebook.
You can designate any input cell as code or markdown (formatted explanatory text). You
can press Ctrl + Enter to execute any active cell. All the input cells can be executed via
the Cell menu.
This completes our quick overview of Spyder and Jupyter interfaces. You can choose either
of them for working through the codes in the rest of the book.
In the current and next sections, we will see several simple examples of manipulating data
using Python and scientific packages. While these simple operations may seem unremarkable
(and boring) in the absence of any larger context, they form the building blocks of more
complex scripts presented later in the book. Therefore, it will be worthwhile to give these
at least a quick glance.
Note that you will find ‘#’ used a lot in these examples; these hash marks are used to insert
explanatory comments in code. Python ignores (does not execute) anything written after # on
a line.
Tuples are another sequence construct like lists, with the difference that their items and sizes
cannot be changed. Since tuples are immutable/unchangeable, they are more memory
efficient.
# creating tuples
tuple1 = (0,1,'two')
tuple2 = (list1, list2) # equals ([2, 4, 6, 8], ['air', 3, 1, 5])
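For instance, attempting to modify a tuple raises an error (a quick sketch):

```python
tuple1 = (0, 1, 'two')
try:
    tuple1[0] = 5            # tuples do not support item assignment
except TypeError:
    print('tuples cannot be modified')
```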
A couple of examples below illustrate list comprehension, which is a very useful way of creating
new lists from other sequences.
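A couple of simple sketches (the input sequences are illustrative):

```python
# build a list of squares from another sequence
squares = [x**2 for x in [1, 2, 3, 4]]         # [1, 4, 9, 16]

# keep only some items by adding a condition
evens = [x for x in range(10) if x % 2 == 0]   # [0, 2, 4, 6, 8]
```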
Note that Python indexing starts from zero. Very often, we need to work with multiple items of
the list. This can be accomplished easily as shown below.
print(list4[4:len(list4)]) # displays [3,1,5]; len() function returns the number of items in list
print(list4[4:]) # same as above
print(list4[::3]) # displays [2, 'air', 5]
print(list4[::-1]) # displays list4 backwards [5, 1, 3, 'air', 6, 4, 2]
list4[2:4] = [0,0,0] # list4 becomes [2, 4, 0, 0, 0, 3, 1, 5]
# selectively execute code based on condition
if list1[0] > 0:
    list1[0] = 'positive'
else:
    list1[0] = 'negative'
# list1 becomes ['positive', 4, 6]

# compute sum of squares of numbers in list3
sum_of_squares = 0
for i in range(len(list3)):
    sum_of_squares += list3[i]**2
print(sum_of_squares) # displays 78
Custom functions
Previously we used Python's built-in functions and methods (len(), append()) to carry out
operations pre-defined for these functions. Python allows defining our own custom functions as well. The
advantage of custom functions is that we can define a set of instructions once and then re-
use them multiple times in our script and project.
For illustration, let's define a function to compute the sum of squares of items in a list:
def sumSquares(input_list):
    sum_of_squares = 0
    for item in input_list:
        sum_of_squares += item**2
    return sum_of_squares
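As a self-contained sketch, such a function and a quick usage might look like this (the function name is illustrative; the input list matches the sum-of-squares example above):

```python
def sumSquares(input_list):
    # compute the sum of squares of the items in a list
    sum_of_squares = 0
    for item in input_list:
        sum_of_squares += item**2
    return sum_of_squares

print(sumSquares([2, 5, 7]))  # displays 78
```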
You might have noticed in our custom function code above that we used
different indentations (number of whitespaces at beginning of code lines) to
separate the ‘for loop’ code from the rest of the function code. This practice is
actually enforced by Python and will result in errors or bugs if not followed.
While other popular languages like C++, C# use braces ({}) to demarcate a
code block (body of a function, loop, if statement, etc.), Python uses
indentation. You can choose the amount of indentation but it must be consistent
within a code block.
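For example, a small sketch showing consistent indentation at each block level (the function and values are illustrative):

```python
def classify(x):
    # four-space indentation marks the function body
    if x > 0:
        label = 'positive'   # deeper indentation marks the if block
    else:
        label = 'negative'
    return label

print(classify(3))   # positive
```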
This concludes our extremely selective coverage of Python basics. However, this should be
sufficient to enable you to understand the code in the subsequent chapters. Let's continue
now to learn about specialized scientific packages.
While the core Python data-structures are quite handy, they are not very convenient for the
advanced data manipulations we require for machine learning tasks. Fortunately, specialized
packages like NumPy, SciPy, and Pandas exist which provide convenient multidimensional
tabular data structures suited for scientific computing. Let's quickly make ourselves familiar
with these packages.
NumPy
In NumPy, ndarrays are the basic data structures which put data in a grid of values.
Illustrations below show how 1D and 2D arrays can be created and their items accessed.
import numpy as np

# create a 1D array
arr1D = np.array([1,4,6])
Note that the concept of rows and columns does not apply to a 1D array. Also, you would have
noticed that we imported the NumPy package before using it in our script (‘np’ is just a short
alias). Importing a package makes available all its functions and sub-packages for use in our
script.
Executing arr2D.sum() returns the scalar sum over the whole array, i.e., 25.
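The creation of arr2D is not reproduced above; a sketch with assumed entries (chosen so the total is 25, as stated) also illustrates axis-wise aggregation:

```python
import numpy as np

# illustrative 2D array; the entries are assumed
arr2D = np.array([[1, 4, 6],
                  [2, 5, 7]])

print(arr2D.sum())         # 25, scalar sum over the whole array
print(arr2D.sum(axis=0))   # column-wise sums: [ 3  9 13]
print(arr2D.sum(axis=1))   # row-wise sums: [11 14]
```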
# slicing
arr8 = np.arange(10).reshape((2,5)) # rearrange the 1D array into shape (2,5)
print((arr8[0:1,1:3]))
>>> [[1 2]]
print((arr8[0,1:3])) # note that a 1D array is returned here instead of the 2D array above
>>> [1 2]
An important thing to note about NumPy array slices is that any change made on a sliced view
modifies the original array as well! See the following example:
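A minimal sketch of this behavior (the array values are illustrative):

```python
import numpy as np

arr = np.arange(10)      # [0 1 2 3 4 5 6 7 8 9]
view = arr[0:3]          # basic slicing returns a view, not a copy
view[0] = 99
print(arr[0])            # 99: the original array was modified too
```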
This feature becomes quite handy when we need to work on only a small part of a large
array/dataset. We can simply work on a leaner view instead of carrying around the large
dataset. However, situations may arise where we need to work on a separate copy of a
subarray without worrying about modifying the original array. This can be accomplished via
the copy method.
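For instance (values illustrative):

```python
import numpy as np

arr = np.arange(10)
sub = arr[0:3].copy()    # copy() gives an independent subarray
sub[0] = 99
print(arr[0])            # 0: the original array is unaffected
```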
Fancy indexing is another way of obtaining a copy instead of a view of the array being indexed.
Fancy indexing simply entails using integer or boolean arrays/lists to access array items.
Examples below clarify this concept.
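A sketch with an assumed array; note that, unlike basic slicing, fancy indexing returns a copy:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[[0, 2, 4]])    # integer-array indexing: [10 30 50]
print(arr[arr > 25])     # boolean indexing: [30 40 50]

fancy = arr[[0, 2, 4]]   # fancy indexing returns a copy
fancy[0] = -1
print(arr[0])            # 10: the original array is unchanged
```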
Vectorized operations
Suppose you need to perform element-wise summation of two 1D arrays. One
approach is to access the items at each index, one at a time, in a loop and sum them.
Another approach is to sum up items at multiple indexes at once. The latter
approach is called a vectorized operation and can lead to a significant reduction in
computation time for large datasets and complex operations.
# vectorized operations
vec1 = np.array([1,2,3,4])
vec2 = np.array([5,6,7,8])
vec_sum = vec1 + vec2 # returns array([6,8,10,12]); no need to loop through index 0 to 3
Broadcasting
Consider the following summation of arr2D and arr1D arrays
# item-wise addition of arr2D and arr1D
arr_sum = arr2D + arr1D
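A sketch with assumed entries (arr2D of shape (2,3) and arr1D of shape (3,)); arr1D is stretched, i.e., broadcast, across each row of arr2D:

```python
import numpy as np

# entries assumed for illustration
arr2D = np.array([[1, 4, 6],
                  [2, 5, 7]])
arr1D = np.array([1, 4, 6])

arr_sum = arr2D + arr1D   # arr1D is added to every row of arr2D
print(arr_sum)
# [[ 2  8 12]
#  [ 3  9 13]]
```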
Pandas
Pandas is another very powerful scientific package. It is built on top of NumPy and offers
several data structures and functionalities which make (tabular) data analysis and pre-
processing very convenient. Some noteworthy features include label-based slicing/indexing,
(SQL-like) data grouping/aggregation, data merging/joining, and time-series functionalities.
Series and dataframe are the 1D and 2D array-like structures, respectively, provided by
Pandas.
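A sketch of creating both structures (the values are assumed, chosen to match the 'id'/'value' dataframe used in the examples that follow):

```python
import pandas as pd

s = pd.Series([10, 8, 6])                                  # 1D, with an index
df = pd.DataFrame({'id': [1, 1, 1], 'value': [10, 8, 6]})  # 2D, labeled columns

print(s.values)    # [10  8  6], the underlying NumPy array
print(df.values)   # the dataframe's underlying NumPy array
```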
Note that s.values and df.values convert the series and dataframe into corresponding NumPy
arrays.
9 numpy.org/doc/stable/user/basics.broadcasting.html
Data access
Pandas allows accessing rows and columns of a dataframe using labels as well as integer
locations. You will find this feature pretty convenient.
# column(s) selection
print(df['id']) # returns column 'id' as a series
print(df.id) # same as above
print(df[['id']]) # returns specified columns in the list as a dataframe
>>> id
0 1
1 1
2 1
# row selection
df.index = [100, 101, 102] # changing row indices from [0,1,2] to [100,101,102] for illustration
print(df)
>>> id value
100 1 10
101 1 8
102 1 6
print(df.loc[101]) # returns 2nd row as a series; can provide a list for multiple rows selection
print(df.iloc[1]) # integer location-based selection; same result as above
Data aggregation
As alluded to earlier, Pandas facilitates quick analysis of data. Check out one quick example
below for group-based mean aggregation.
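A sketch with a hypothetical two-group dataframe:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2],
                   'value': [10, 8, 32, 24]})

# mean of 'value' within each 'id' group
print(df.groupby('id').mean())
#     value
# id
# 1     9.0
# 2    28.0
```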
File I/O
Conveniently reading data from external sources and files is one of the strong fortes of Pandas.
Below are a couple of illustrative examples.
# reading from excel and csv files
dataset1 = pd.read_excel('filename.xlsx') # several parameter options are available to customize what data is read
dataset2 = pd.read_csv('filename.csv')
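As a self-contained sketch, a small dataframe can be written to a temporary CSV file and read back (the filename is illustrative):

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})

path = os.path.join(tempfile.gettempdir(), 'example.csv')
df.to_csv(path, index=False)          # write without the row index
df_back = pd.read_csv(path)

print(df_back.equals(df))             # True: the round trip preserves the data
```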
This completes our very brief look at Python, NumPy, and Pandas. If you are new to Python
(or coding), this may have been overwhelming. Don't worry. Now that you are at least aware
of the different data structures and ways of accessing data, you will become more and more
comfortable with Python scripting as you work through the in-chapter code examples.
In the previous chapter we saw several dynamic modeling options for SysID. We also saw
some of the pre-processing steps involved in data preparation and in the subsequent residual
assessment step. Technically, one can employ the previously described scientific packages
to implement the SysID algorithms, but it can be cumbersome and inconvenient. Fortunately,
the Python user-community has made available several libraries that make implementation of
the SysID algorithms very convenient. Figure 2.5 shows some of these libraries.
Figure 2.5: SysID-relevant Python packages10 and the corresponding available functionalities
You will find us using these packages heavily in the upcoming chapters. We understand that
many of the terms in the above figure may be alien to you right now, but we figured that it
would be good to give you a ‘feel’ of what Python has to offer for SysID.
Note that we did not include the celebrated Sklearn library in Figure 2.5, which offers
several generic ML-related functionalities such as dataset splitting,
standardization, scoring, etc. Our SysID scripts will make extensive use of the
Sklearn library as well.
10 SIPPY: https://github.com/CPCLAB-UNIPI/SIPPY
SysIdent: https://sysidentpy.org/; https://github.com/wilsonrljr/sysidentpy
Pyflux: https://pyflux.readthedocs.io/en/latest/
The concluding message is that Python provides most of the tools for SysID that you may find
in other languages such as R and MATLAB.
In Figure 1.4 in Chapter 1, we graphically portrayed different stages of a SysID exercise. Let’s
now see what a typical SysID script looks like. You will also understand how Python, NumPy,
and advanced packages are utilized for ML-DPM scripting. We will study several SysID
aspects in much greater detail in the next few chapters, but this simple script is a good start.
The objective of this simple script is to take data for an input and an output variable from a file
and build an ARX model between them. The first few code lines take care of importing the
libraries that the script will employ.
# import packages
from sippy import system_identification              # SIPPY provides ARX models
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
from statsmodels.graphics.tsaplots import plot_acf   # used for model diagnostics
We studied NumPy before. SIPPY, Sklearn, and Statsmodels were introduced in the previous
section. Another library11 that we see here is matplotlib12, which is used for creating
visualization plots. The next few lines of code fetch raw data and mean-center them.
# fetch data
data = np.loadtxt('InputOutputData.txt')
u = data[:,2:]; y = data[:,1:2] # first column is timestamp
11 If any package/library that you need is not installed on your machine, you can get it by running the command pip install <package-name> in the Spyder console.
12 Seaborn is another popular library for creating nice-looking plots.
Here NumPy’s loadtxt function is used to read space-separated data in the file
InputOutputData.txt. The data get stored in a 2D NumPy array, data, where the 2nd and 3rd
columns contain data for the output and input variables, respectively. NumPy slicing is used
to separate the u and y data. Thereafter, variables are pre-processed to mean-center them.
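The centering code itself is not shown above; a sketch of how it could look using StandardScaler (the u and y values here are hypothetical stand-ins for the data read from the file):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical stand-ins for the u and y arrays read from the file
u = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[10.0], [12.0], [14.0], [16.0]])

# mean-center only; disable division by the standard deviation
u_scaler = StandardScaler(with_std=False)
y_scaler = StandardScaler(with_std=False)
u_centered = u_scaler.fit_transform(u)
y_centered = y_scaler.fit_transform(y)

print(y_centered.ravel())   # [-3. -1.  1.  3.]
```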
Next, an ARX model13 is fitted and used to make predictions.
plt.figure()
plt.plot(y_centered, 'o', label='raw data')
plt.plot(y_centered_pred, '--', label='ARX fit')
plt.legend(), plt.xlabel('k'), plt.ylabel('y')
The residual plot suggests that further model refinement is warranted. In the later chapters,
we will learn how this inference has been made and what recourses are at our disposal for
model refinement.
13 SIPPY: Model coefficients can be obtained via the G (or NUMERATOR and DENOMINATOR) attribute(s) of the model object. See SIPPY's manual for more details.