INTELLIGENT SYSTEMS
Mihir Sen
Department of Aerospace and Mechanical Engineering
University of Notre Dame
Notre Dame, IN 46556, U.S.A.
May 11, 2006
Preface
Intelligent systems form part of many engineering applications that we deal with these days, and
for this reason it is important for mechanical and aerospace engineers to be aware of the basics in
this area. The present notes are for the course AME 60655 Intelligent Systems given during the
Spring 2006 semester to undergraduate seniors and beginning graduate students. The objective of
this course is to introduce the theory and applications of this subject.
These pages are at present in the process of being written. I will be glad to receive comments
and suggestions, or have mistakes brought to my attention.
Mihir Sen
Department of Aerospace and Mechanical Engineering
University of Notre Dame
Notre Dame, IN 46556
U.S.A.
Copyright © by M. Sen, 2006
Contents
Preface iii
1 Introduction 1
1.1 Intelligent systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Related disciplines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Systems theory 3
2.1 Mathematical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Algebraic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Ordinary differential . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.3 Partial differential . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.4 Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.5 Functional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.6 Stochastic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.7 Uncertain systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.8 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.9 Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 System response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Linear system identification . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5.1 Static systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5.2 Frequency response of linear dynamic systems . . . . . . . . . . . . . . . . . . 12
2.5.3 Sampled functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.4 Impulse response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.5 Step response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.6 Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.7 Model adjustment technique . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.8 Auto-regressive models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5.9 Least squares and regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.10 Nonlinear systems identification . . . . . . . . . . . . . . . . . . . . 16
2.5.11 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 Linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6.1 Linear algebraic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6.2 Ordinary differential . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6.3 Partial differential . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6.4 Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6.5 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 Nonlinear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7.1 Algebraic equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7.2 Ordinary dierential equations . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7.3 Bifurcations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.8 Cellular automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.9.1 Linear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.9.2 Nonlinear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.10 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.10.1 Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.10.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.10.3 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.11 Intelligent systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.11.1 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.11.2 Need for intelligent systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Artificial neural networks 35
3.1 Single neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Network architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.1 Single-layer feedforward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.2 Multilayer feedforward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.3 Recurrent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.4 Lattice structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Learning rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.1 Hebbian learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.2 Competitive learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.3 Boltzmann learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.4 Delta rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 Multilayer perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.1 Feedforward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.2 Backpropagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4.3 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4.4 Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5 Radial basis functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6 Other examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7.1 Heat exchanger control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7.2 Control of natural convection . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7.3 Turbulence control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4 Fuzzy logic 57
4.1 Fuzzy sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 Mamdani method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.2 Takagi-Sugeno-Kang (TSK) method . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Defuzzification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 Fuzzy reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5 Fuzzy-logic modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.6 Fuzzy control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.8 Other applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Probabilistic and evolutionary algorithms 63
5.1 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Genetic algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3 Genetic programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4.1 Noise control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4.2 Fin optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4.3 Electronic cooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Expert and knowledge-based systems 67
6.1 Basic theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7 Other topics 69
7.1 Hybrid approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.2 Neurofuzzy systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.3 Fuzzy expert systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.4 Data mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.5 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8 Electronic tools 71
8.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.1.1 Digital electronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.1.2 Mechatronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.1.3 Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.1.4 Actuators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2 Computer programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.1 Basic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.2 Fortran . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.3 LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.4 C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.5 Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2.6 C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.2.7 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.3 Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.3.1 Workstations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.3.2 PCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.3.3 Programmable logic devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.3.4 Microprocessors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9 Applications: heat transfer correlations 73
9.1 Genetic algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.1.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.1.2 Applications to compact heat exchangers . . . . . . . . . . . . . . . . . . . . 75
9.1.3 Additional applications in thermal engineering . . . . . . . . . . . . . . . . . 78
9.1.4 General discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.2 Artificial neural networks . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.2.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
9.2.2 Application to compact heat exchangers . . . . . . . . . . . . . . . . . . . . . 84
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Bibliography 95
Chapter 1
Introduction
The adjective intelligent (or smart) is frequently applied to many common engineering systems.
1.1 Intelligent systems
A system is a small part of the universe that we are interested in. It may be natural like the weather
or man-made like an automobile; it may be an object like a machine or abstract like a system for
electing political leaders. The surroundings are everything else that interacts with the system. The
system may sometimes be further subdivided into subsystems which also interact with each other.
This division into subsystems is not necessarily unique. In this study we are mostly interested in
mechanical devices that we design for some specific purpose. This by itself helps us define what the
system to be considered is.
Though it is hard to quantify the intelligence of a system, one can certainly recognize the
following two extremes in relation to some of the characteristics that it may possess:
(a) Low intelligence: Typically a simple system; it has to be told everything and needs complete
instructions, needs low-level control, its parameters are set, and it is usually mechanical.
(b) High intelligence: Typically a complex system; it is autonomous to a certain extent and needs
few instructions, determines for itself what the goals are, demands high-level control, is adaptive,
makes decisions and choices, and is usually computerized.
There is thus a continuum between these two extremes, and most practical devices fall somewhere
along it. Because of this broad definition, all control systems are intelligent to a certain extent,
and in this respect they are similar. However, the more intelligent systems are able to handle more
complex situations and make more complex decisions. As computer hardware and software improve,
it becomes possible to engineer systems that are more intelligent under this definition.
We will be using a collection of techniques known as soft computing. These are inspired by
biology and work well on nonlinear, complex problems.
1.2 Applications
The three areas in which intelligent systems impact the discipline of mechanical engineering are
control, design and data analysis. Some of the specic areas in which intelligent systems have been
applied are the following: instrument landing system, automatic pilot, collision-avoidance system,
anti-lock brake, smart air bag, intelligent road vehicles, planetary rovers, medical diagnoses, image
processing, intelligent data analysis, financial risk analysis, temperature and flow control, process
1
2 1. Introduction
control, intelligent CAD, smart materials, smart manufacturing, intelligent buildings, internet search
engines, machine translators.
1.3 Related disciplines
Areas of study that are closely related to the subject of these notes are systems theory, control
theory, computer science and engineering, artificial intelligence and cognitive science.
1.4 References
[24, 23, 35, 37, 55, 62, 65, 75, 81, 98]. A good textbook is [31].
Chapter 2
Systems theory
A system, shown schematically in Fig. 2.1, has an input u(t) and an output y(t), where t is time.
In addition one must consider the state of the system x(t), the disturbance to the system w_s(t),
and the disturbance to the measurements w_m(t). The reason for distinguishing between x and y is
that in many cases the entire state of the system may not be known, but only the output is. All
the quantities belong to suitably defined vector spaces [59]. For example, x may be in R^n (finite
dimensional) or L^2 (infinite dimensional).
The model of a system is the set of equations that relate u, x and y. It may be obtained from a direct,
first-principles approach (modeling), or deduced from empirical observations (system identification).
The response of the system may be mathematically represented in differential form as

\dot{x} = f(x, u, w_s)    (2.1)
y = g(x, u, w_m)    (2.2)

In discrete form we have

x_{i+1} = f(x_i, u_i, w_{s,i})    (2.3)
y_{i+1} = g(x_{i+1}, u_{i+1}, w_{m,i+1})    (2.4)

where i is an index that corresponds to time. In both cases f and g are operators [59] (also called
mappings or transformations) that take an argument (or pre-image) that belongs to a certain set of
possible values to an image that belongs to another set.
Figure 2.1: Block diagram of a system.
2.1 Mathematical models
A model is something that represents reality; it may, for instance, be physical, such as an
experiment, or mathematical. The input-output relationship of a mathematical model may be
symbolically represented as y = T(u), where T is an operator. The following are some of the types
that are commonly used.
2.1.1 Algebraic
May be matricial, polynomial or transcendental.
Example 2.1
T(u) = e^u \sin u

Example 2.2
T(u) = Au
where A is a rectangular matrix and u is a vector of suitable length.
2.1.2 Ordinary differential

May be of any given integer or fractional order. For non-integer order \alpha > 0, the derivative
may be written in terms of the fractional integral (defined below in Eq. (2.5)) as

{}_c D_t^{\alpha} u(t) = {}_c D_t^{m} \left[ {}_c D_t^{-(m-\alpha)} u(t) \right]

where m is the smallest integer larger than \alpha. A fractional derivative of order 1/2 is called a
semi-derivative.

Example 2.3
T(u) = \frac{d^2 u}{dt^2} + \frac{du}{dt}

Example 2.4
T(u) = \frac{d^{1/2} u}{dt^{1/2}}
2.1.3 Partial differential

Applies if the dependent variable is a function of more than one independent variable.

Example 2.5
T(u) = \frac{\partial^2 u}{\partial \xi^2} - \frac{\partial u}{\partial t}

where \xi is a spatial coordinate.
2.1.4 Integral
May be of any given integer or fractional order. A fractional integral of order \alpha > 0 is defined
by [73] [93]

{}_c D_t^{-\alpha} u(t) = \frac{1}{\Gamma(\alpha)} \int_c^t (t-s)^{\alpha-1} u(s) \, ds    (Riemann-Liouville)    (2.5)

{}_c D_t^{\alpha} u(t) = \frac{1}{\Gamma(n-\alpha)} \int_c^t (t-s)^{n-\alpha-1} u^{(n)}(s) \, ds    (n-1 < \alpha < n)    (Caputo)    (2.6)

where the gamma function is defined by

\Gamma(\alpha) = \int_0^{\infty} r^{\alpha-1} e^{-r} \, dr

A fractional integral of order 1/2 is a semi-integral.

For \alpha = 1, Eq. (2.5) gives the usual integral. Also, it can be shown by differentiation that

\frac{d}{dt} \left[ {}_c D_t^{\alpha} u(t) \right] = {}_c D_t^{\alpha+1} u(t)
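The Riemann-Liouville integral of Eq. (2.5) is straightforward to approximate numerically. The sketch below (with c = 0 and an illustrative test function u(s) = s^p) checks a midpoint-rule quadrature against the known closed form {}_0D_t^{-\alpha} t^p = \Gamma(p+1)/\Gamma(p+1+\alpha) \, t^{p+\alpha}; the midpoint rule is chosen here only so that the integrable singularity at s = t is never evaluated.

```python
import math

# A numerical sketch of the Riemann-Liouville fractional integral, Eq. (2.5),
# with c = 0, using the midpoint rule so the integrable singularity at s = t
# is never evaluated. For u(s) = s^p the closed form is
#   {}_0 D_t^{-alpha} t^p = Gamma(p+1)/Gamma(p+1+alpha) * t^(p+alpha).
def rl_integral(u, t, alpha, n=20000):
    ds = t / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds                 # midpoint of panel i
        total += (t - s) ** (alpha - 1) * u(s) * ds
    return total / math.gamma(alpha)

t, alpha, p = 2.0, 0.5, 1.0
numeric = rl_integral(lambda s: s ** p, t, alpha)
exact = math.gamma(p + 1) / math.gamma(p + 1 + alpha) * t ** (p + alpha)
print(numeric, exact)   # the two values should agree to about three digits
```

For \alpha = 1 the same routine reduces to an ordinary quadrature of u, consistent with the remark above.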
2.1.5 Functional
Involves functions which have different arguments.

Example 2.6
T(u) = u(t) + u(2t)

Example 2.7
T(u) = u(t) + u(t - \tau)
where \tau is a delay.
2.1.6 Stochastic
Includes random variables with certain probability distributions. In a Markov process the probable
future state of a system depends only on the present state and not on the past.
Let x(t) be a continuous random variable. Its expected value is

E\{f(x)\} = \lim_{T \to \infty} \frac{1}{T} \int_0^T f(x(t)) \, dt.

An example of a probability density is the Gaussian

P_x(y) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(y - \bar{y})^2}{2\sigma^2} \right),    (2.11)

where \bar{y} is the mean and \sigma is the standard deviation.
Joint distributions and densities are defined by

D_{x_1 x_2}(y_1, y_2) = \mathrm{Prob}\{x_1 < y_1 \text{ and } x_2 < y_2\},    (2.12)

P_{x_1 x_2}(y_1, y_2) = \frac{\partial^2}{\partial y_1 \, \partial y_2} D_{x_1 x_2}(y_1, y_2).    (2.13)

The expected value is

E\{x\} = \int x(y) \, dP_x(y).    (2.14)
Example 2.8
An example of a stochastic differential equation is the Langevin equation [94]

\frac{du}{dt} = -\beta u + F(t),

where F(t) is a stochastic fluctuation. The solution is

u = u_0 e^{-\beta t} + e^{-\beta t} \int_0^t e^{\beta t'} F(t') \, dt'.    (2.15)

Let u = dx/dt, so that

x = x_0 + \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right) + \int_0^t e^{-\beta t'} \int_0^{t'} e^{\beta t''} F(t'') \, dt'' \, dt'.    (2.16)

Assuming F(t) to be Gaussian with

E\{F(t)\} = 0, \qquad E\{F(t_1) F(t_2)\} = \phi(|t_1 - t_2|),

where

\tau = \int_{-\infty}^{\infty} \phi(z) \, dz,

it can be shown that

E\{u^2(t)\} = \frac{\tau}{2\beta} + \left( u_0^2 - \frac{\tau}{2\beta} \right) e^{-2\beta t},

E\{x(t) - x_0\} = \frac{u_0}{\beta} \left( 1 - e^{-\beta t} \right),

E\{(x(t) - x_0)^2\} = \frac{\tau}{\beta^2} t + \frac{u_0^2}{\beta^2} \left( 1 - e^{-\beta t} \right)^2 + \frac{\tau}{2\beta^3} \left( -3 + 4 e^{-\beta t} - e^{-2\beta t} \right).

For long times these become

E\{u^2(t)\} = \frac{\tau}{2\beta}, \qquad E\{x(t) - x_0\} = \frac{u_0}{\beta}, \qquad E\{(x(t) - x_0)^2\} = \frac{\tau}{\beta^2} t.
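The long-time result E\{u^2\} = \tau/(2\beta) can be checked by direct simulation. The sketch below uses an Euler-Maruyama step and assumes the white-noise limit E\{F(t_1)F(t_2)\} = \tau \, \delta(t_1 - t_2), so each step adds \sqrt{\tau \, dt} times a standard normal; the parameter values and path counts are illustrative.

```python
import math, random

# A Monte Carlo sketch of the Langevin equation du/dt = -beta*u + F(t), using
# an Euler-Maruyama step. We assume the white-noise limit
# E{F(t1)F(t2)} = tau * delta(t1 - t2), so each step adds sqrt(tau*dt)*xi with
# xi ~ N(0, 1). Parameters below are illustrative.
random.seed(1)
beta, tau = 1.0, 0.5
dt, steps, paths = 0.01, 700, 2000   # integrate to t = 7, i.e. seven time constants

finals = []
for _ in range(paths):
    u = 0.0
    for _ in range(steps):
        u += -beta * u * dt + math.sqrt(tau * dt) * random.gauss(0.0, 1.0)
    finals.append(u)

var = sum(u * u for u in finals) / paths
print(var, tau / (2 * beta))   # the sample variance should be near tau/(2*beta) = 0.25
```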
2.1.7 Uncertain systems
[1]

There is uncertainty from several sources in models. If

x - y = 0,    (2.17)
x + y - 2 = 0,    (2.18)

are the exact equations, for which x = y = 1 is the solution, then the equations with uncertainty
could perhaps be

(x - y)^2 = \epsilon_1,    (2.19)
(x + y)^2 - 4 = \epsilon_2.    (2.20)

Then

(x - 1)^2 + (y - 1)^2 \le \epsilon_3.    (2.21)

The problem is to find \epsilon_3, given \epsilon_1 and \epsilon_2.

Sometimes the model is an oversimplification of the exact one. For example, the hydrodynamic
equations applicable to convection heat transfer are often reduced to a heat transfer coefficient.

There is also possible uncertainty in physical parameters. For an object at temperature T(t)
that is cooling in an ambient at T_\infty, we can write

\frac{dT}{dt} + \beta T = \beta T_\infty.    (2.22)

If

\beta = \bar{\beta} + \delta\beta,    (2.23)

then we can find the uncertainty in the solution to be given by

\delta T = (T_\infty - T(0)) \, t \, e^{-\bar{\beta} t} \, \delta\beta.    (2.24)
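Eq. (2.24) can be checked against a centered finite difference of the exact solution T(t) = T_\infty + (T(0) - T_\infty) e^{-\beta t}. The numerical values (T_0, T_\infty, \beta, t) below are illustrative.

```python
import math

# A check of the parameter-sensitivity formula, Eq. (2.24), against a
# centered finite difference. The exact solution of Eq. (2.22) is
# T(t) = Tinf + (T0 - Tinf) * exp(-beta*t); numerical values are illustrative.
T0, Tinf = 300.0, 350.0

def T(t, beta):
    return Tinf + (T0 - Tinf) * math.exp(-beta * t)

def dT_dbeta(t, beta):
    # Eq. (2.24): dT/dbeta = (Tinf - T(0)) * t * exp(-beta*t)
    return (Tinf - T0) * t * math.exp(-beta * t)

beta, t, db = 0.5, 2.0, 1e-6
fd = (T(t, beta + db) - T(t, beta - db)) / (2 * db)
print(fd, dT_dbeta(t, beta))   # both close to 50 * 2 * exp(-1)
```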
2.1.8 Combinations
Such as integro-differential operators.

Example 2.9
T(u) = \frac{d^2 u}{dt^2} + \int_0^t u(s) \, ds
2.1.9 Switching
The operator changes depending on the value of the independent or dependent variable.

Example 2.10
T(u) = \frac{d^2 u}{dt^2} + \frac{du}{dt} \quad \text{if } n \, \Delta t \le t < (n+1) \, \Delta t
     = \frac{du}{dt} \quad \text{if } (n+1) \, \Delta t \le t < (n+2) \, \Delta t
where n is even and 2\Delta t is the time period.

Example 2.11
T(u) = \frac{d^2 u}{dt^2} + \frac{du}{dt} \quad \text{if } u_1 \le u < u_2
     = \frac{du}{dt} \quad \text{otherwise}
where u_1 and u_2 are limits within which the first equation is valid.
2.2 Operators

If x_1 and x_2 belong to a vector space, then so do x_1 + x_2 and \alpha x_1, where \alpha is a
scalar. Vectors in a normed vector space have suitably defined norms or magnitudes; the norm of x is
written as ‖x‖. Vectors in inner product vector spaces have inner products defined; the inner product
of x_1 and x_2 is written as ⟨x_1, x_2⟩. A complete vector space is one in which every Cauchy
sequence converges. Complete normed and inner product spaces are also called Banach and Hilbert
spaces respectively. Commonly used vector spaces are R^n (finite dimensional) and L^2 (infinite
dimensional).

An operator maps a vector (called the pre-image) belonging to one vector space to another
vector (called the image) in another vector space. The operators themselves belong to a vector
space. Examples of mappings and operators are:

(a) R^n → R^m, such as x_2 = A x_1, where x_1 ∈ R^n and x_2 ∈ R^m are vectors, and the operator
A ∈ R^{m×n} is a matrix.

(b) R → R, such as x_2 = f(x_1), where x_1 ∈ R and x_2 ∈ R are real numbers and the operator f is a
function.

The operators given in the previous section are linear combinations of these and others (such as
derivative or integral operators).
An operator T is linear if

T(u_1 + u_2) = T(u_1) + T(u_2)

and

T(\alpha u) = \alpha T(u),

where \alpha is a scalar. Otherwise it is nonlinear.

Example 2.12
Indicate which are linear and which are not: (a) T(u) = au, (b) T(u) = au + b, (c) T(u) = a \, du/dt,
(d) T(u) = a (du/dt)^2, where a and b are constants, and u is a scalar.
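The two linearity conditions can be tested numerically. The sketch below checks cases (a) and (b) of Example 2.12 for illustrative constants a = 2, b = 3; a single pair of test points is enough to expose the affine term in (b).

```python
# A numerical check of the linearity conditions for cases (a) and (b) of
# Example 2.12, with illustrative constants a = 2, b = 3.
def T_a(u):                  # (a) T(u) = a*u: linear
    return 2 * u

def T_b(u):                  # (b) T(u) = a*u + b: affine, hence nonlinear
    return 2 * u + 3

def is_linear(T, u1=1.5, u2=-0.7, alpha=4.0):
    additive = abs(T(u1 + u2) - (T(u1) + T(u2))) < 1e-9
    homogeneous = abs(T(alpha * u1) - alpha * T(u1)) < 1e-9
    return additive and homogeneous

print(is_linear(T_a), is_linear(T_b))   # True False
```

Cases (c) and (d) would need a function-valued u, but the same two conditions apply: the derivative is linear, while squaring it is not.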
2.3 System response
We can represent an input-output relationship by y = T(u) where T is an operator. Thus if we
know the input u(t), then the operations represented by T must be carried out to obtain the output.
This is the forward or operational mode of the system and is the subject matter of courses such as
algebra and calculus, depending on the form of the operators.
Example 2.13
Determine y(t) if u(t) = sin t and T(u) = u^2.
2.4 Equations
Very often, for design or control purposes, we need to solve the inverse problem, i.e. to find what u(t)
would be for a given y(t). This is much more difficult and is normally studied in subjects such as
linear algebra or differential and integral equations. The solutions may not be unique.
Example 2.14
Determine u(t) if y(t) = sin t and T(u) = u^2.

Example 2.15
Determine u(t) if y(t), kernel K and parameter \lambda are given, where

u(t) = y(t) + \lambda \int_0^1 K(t, s) \, u(s) \, ds    (Fredholm equation of the second kind)

Example 2.16
Determine u(t) if y(t), kernel K and parameter \lambda are given, where

u(t) = y(t) + \lambda \int_0^t K(t, s) \, u(s) \, ds    (Volterra equation of the second kind)

Example 2.17
Determine u(t) given y(t) and T(u) = Au, where u and y are m- and s-dimensional vectors and A is an
s × m matrix.
The solution is unique if s = m and A is not singular.

Example 2.18
Find the probability distribution of u(t) given that

\frac{dy}{dt} = T(t, u, w)

where w(t) is a random variable with a given distribution.

Example 2.19
Find the probability distribution of y(t) given that

\frac{dy}{dt} = -\beta y(t) + N(t)    (Langevin equation)

where N(t) is white noise.
2.5 Linear system identification

Generally we develop the structure of the model itself based on the natural laws which we believe
govern the system. It may also happen that we do not have complete knowledge of the physics
of the phenomena that govern the system but can experiment with it. Thus we may have a set
of values for u(t) and y(t), and we would like to know what T is. This is a system identification
problem. It is even more difficult than the previous problems, and we have no general way of doing
it. At present we assume the operators to be of certain forms with undetermined coefficients and then
find the values that fit the data best. Identification can be either off-line, when the system is not
in use, or on-line, when it is.

[50] [69] [70]

Example 2.20
If u = sin t and y = cos t, what is T such that y = T(u)?
Possibilities are
(a) T(u) = u(t + \pi/2)
(b) T(u) = du/dt.
2.5.1 Static systems
Let

y = f(u, \lambda)

where \lambda is a set of parameters, and a set of data pairs for y and u is available. This can be
reduced to an optimization problem: we assume the form of f and minimize

\sum \left[ y - f(u, \lambda) \right]^2

over the data. There are local, e.g. gradient-based, methods; there are also global methods such as
simulated annealing, genetic algorithms, and interval methods.

Example 2.21
Fit the data set (x_i, y_i), i = 1, \ldots, N, to the straight line y = ax + b.

The sum of the squares of the errors is

S = \sum_{i=1}^{N} \left[ y_i - (a x_i + b) \right]^2

To minimize S we put \partial S / \partial a = \partial S / \partial b = 0, from which

N b + a \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i

b \sum_{i=1}^{N} x_i + a \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i

Thus

a = \frac{N \sum_{i=1}^{N} x_i y_i - \left( \sum_{i=1}^{N} x_i \right) \left( \sum_{i=1}^{N} y_i \right)}{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}

b = \frac{\left( \sum_{i=1}^{N} y_i \right) \left( \sum_{i=1}^{N} x_i^2 \right) - \left( \sum_{i=1}^{N} x_i y_i \right) \left( \sum_{i=1}^{N} x_i \right)}{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}
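The closed-form expressions for a and b translate directly into code. The data below are illustrative, chosen to lie exactly on y = 2x + 1 so that the result of the fit is known in advance.

```python
# A minimal sketch of the closed-form least-squares fit of Example 2.21,
# applied to illustrative data lying exactly on y = 2x + 1.
def fit_line(xs, ys):
    N = len(xs)
    Sx, Sy = sum(xs), sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    den = N * Sxx - Sx * Sx             # common denominator of a and b
    a = (N * Sxy - Sx * Sy) / den
    b = (Sy * Sxx - Sxy * Sx) / den
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]
a, b = fit_line(xs, ys)
print(a, b)   # 2.0 1.0
```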
2.5.2 Frequency response of linear dynamic systems
Using a Laplace transform defined as

F(s) = \mathcal{L}[f(t)] = \int_0^{\infty} f(t) e^{-st} \, dt,    (2.25)

we get the system transfer function

G(s) = \frac{Y(s)}{U(s)},    (2.26)

where Y(s) and U(s) are usually polynomials. Replacing s by i\omega, we get

G(\omega) = M(\omega) e^{i \phi(\omega)},    (2.27)

where M is the amplitude and \phi is the phase angle.

Example 2.22
For a first-order system

\frac{dy}{dt} + \beta y = u(t)

the transfer function is

G(\omega) = \frac{1}{\beta + i\omega}.

Multiplying numerator and denominator by \beta - i\omega, we get

M(\omega) = \frac{1}{\sqrt{\beta^2 + \omega^2}}

and

\phi(\omega) = -\tan^{-1}(\omega/\beta).

In the extreme limits, we have

\omega \to 0: \quad M(\omega) = \frac{1}{\beta}, \quad \phi = 0,

\omega \to \infty: \quad M(\omega) = \frac{1}{\omega}, \quad \phi = -\frac{\pi}{2}.
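The amplitude and phase of Example 2.22 follow directly from the complex number G(\omega) = 1/(\beta + i\omega). The sketch below uses an illustrative \beta = 2 and checks against the closed forms above.

```python
import cmath, math

# Amplitude and phase of the first-order system of Example 2.22, computed
# from G(omega) = 1/(beta + i*omega), with an illustrative beta = 2.
def freq_response(omega, beta=2.0):
    G = 1 / complex(beta, omega)
    return abs(G), cmath.phase(G)      # M(omega), phi(omega)

M, phi = freq_response(omega=2.0)
print(M, phi)   # 1/sqrt(8) and -pi/4
```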
2.5.3 Sampled functions
If f(t) is continuous, then let

f^*(t) = \sum_{k=0}^{\infty} f(kh) \, \delta(t - kh),    (2.28)

where h is the sampling interval, and \delta is the so-called delta distribution. The Laplace
transform is

F^*(s) = \sum_{k=0}^{\infty} f(kh) e^{-ksh}.    (2.29)

Writing z = e^{sh}, we get the z-transform

F^*(z) = \sum_{k=0}^{\infty} f(kh) z^{-k}.    (2.30)

The transfer function is then G(z) = Y(z)/U(z).
2.5.4 Impulse response
The \delta function can be defined as the limit of several different functions, such as the one
shown in Fig. 2.3.

Figure 2.3: Impulse of magnitude U (height U/\Delta t between t_0 and t_0 + \Delta t).
2.5.5 Step response
A step function is shown in Fig. 2.4.

Figure 2.4: Step of magnitude U at time t_0.
Example 2.23
For a first-order system the step response is

y(t) = C e^{-\beta t} + \frac{U}{\beta}.

With the initial condition y = y_0 at t = 0, we get

\frac{y - U/\beta}{y_0 - U/\beta} = e^{-\beta t}.

The time constant \tau is defined as the value of t at which the left side is 1/e of its initial
value, so that \tau = 1/\beta here.
2.5.6 Deconvolution
The convolution integral is

y(t) = \int_0^t u(\tau) \, w(t - \tau) \, d\tau,    (2.31)

where w(t) is the impulse response of the system. A system is said to be causal if the output at a
certain time depends only on the past, but not on the future. Given u(t) and y(t), the goal is to find
w(t). Assume that the value of the variable is held constant between samplings, so that u(t) = u(nh)
and y(t) = y(nh) for nh \le t < (n + 1)h, where n = 0, 1, 2, \ldots. The convolution integral gives

y(h) = h \left[ u(0) w(0) \right],    (2.32)

y(2h) = h \left[ u(0) w(h) + u(h) w(0) \right],    (2.33)

\vdots    (2.34)

y(Nh) = h \sum_{k=0}^{N-1} u(kh) \, w(Nh - kh - h).    (2.35)

The solution is

w(nh) = \frac{1}{u(0)} \left[ \frac{1}{h} y(nh + h) - \sum_{k=1}^{n} u(kh) \, w(nh - kh) \right].    (2.36)
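The recursion of Eq. (2.36) can be sketched as follows. The input and impulse-response samples below are illustrative: the output is first synthesized from a known w via Eq. (2.35), and the recursion then recovers it.

```python
# A sketch of the deconvolution recursion, Eq. (2.36): recover the sampled
# impulse response w from sampled input u and output y. The data below are
# illustrative; y is built from a known w, which the recursion should recover.
def convolve(u, w, h):
    # y((n+1)h) = h * sum_{k=0}^{n} u(kh) w((n-k)h), as in Eq. (2.35)
    return [h * sum(u[k] * w[n - k] for k in range(n + 1))
            for n in range(len(u))]

def deconvolve(u, y, h):
    # w(nh) from Eq. (2.36); y[n] stores y((n+1)h)
    w = []
    for n in range(len(y)):
        s = sum(u[k] * w[n - k] for k in range(1, n + 1))
        w.append((y[n] / h - s) / u[0])
    return w

h = 0.1
u = [1.0, 0.5, 0.25, 0.125]
w_true = [2.0, 1.0, 0.5, 0.25]
y = convolve(u, w_true, h)
print(deconvolve(u, y, h))   # recovers w_true up to rounding
```

Note that the recursion requires u(0) ≠ 0, i.e. the input must start immediately; this is the discrete counterpart of the causality remark above.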
2.5.7 Model adjustment technique

This is described in Fig. 2.5.

Figure 2.5: Model adjustment technique: the input u(t) is fed to both the system and the model,
and the error e(t) between their outputs drives the parameter adjustment.

2.5.8 Auto-regressive models

The model output is written as y(kh) = \theta^T \varphi(kh) (Eq. (2.40)), where \theta is a vector
of unknown constants and \varphi(kh) is a vector of past values of the output and input. The mean
square error over N measurements is

E = \frac{1}{N} \sum_{k=1}^{N} \left[ y(kh) - \theta^T \varphi(kh) \right]^2.    (2.41)
Differentiating with respect to \theta results in

\theta = \left[ \sum_{k=1}^{N} \varphi(kh) \, \varphi^T(kh) \right]^{-1} \sum_{k=1}^{N} \varphi(kh) \, y(kh).    (2.42)

The values outside the measured range are usually taken to be zero. Once the constants \theta
are determined, then y(kh) can be calculated from Eq. (2.40). White noise e may be added to the
mathematical model to give

y(kh) = -\sum_{i=1}^{m} a_i \, y(kh - ih) + \sum_{i=1}^{n} b_i \, u(kh - ih) + \sum_{i=0}^{p} c_i \, e(kh - ih).    (2.43)
Example 2.24
For a first-order difference equation

y(kh) + a \, y(kh - h) = b \, u(kh - h),

we have

\theta = [a \;\; b]^T, \qquad \varphi(kh) = [-y(kh - h) \;\; u(kh - h)]^T.

From measurements

E = \frac{1}{N} \sum_{k=1}^{N} \left[ y(kh) + a \, y(kh - h) - b \, u(kh - h) \right]^2.

Differentiating with respect to a and b, we get

a \sum_{k=1}^{N} y^2(kh - h) - b \sum_{k=1}^{N} y(kh - h) \, u(kh - h) = -\sum_{k=1}^{N} y(kh) \, y(kh - h),

-a \sum_{k=1}^{N} y(kh - h) \, u(kh - h) + b \sum_{k=1}^{N} u^2(kh - h) = \sum_{k=1}^{N} y(kh) \, u(kh - h),

so that

\begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} \sum y^2(kh - h) & -\sum y(kh - h) u(kh - h) \\ -\sum y(kh - h) u(kh - h) & \sum u^2(kh - h) \end{bmatrix}^{-1}
\begin{bmatrix} -\sum y(kh) y(kh - h) \\ \sum y(kh) u(kh - h) \end{bmatrix}.
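The normal equations of Example 2.24 can be solved directly for noise-free data. In the sketch below the data are generated with illustrative values a = -0.5, b = 1.0, which the estimates should recover exactly (up to rounding).

```python
# A minimal sketch of identifying a and b in the first-order difference
# equation y(kh) + a*y(kh-h) = b*u(kh-h), via the normal equations of
# Example 2.24. Data are generated with illustrative true values.
a_true, b_true = -0.5, 1.0
u = [1.0, 0.0, 0.5, -1.0, 2.0, 0.3, -0.7, 1.5]
y = [0.0]
for k in range(1, len(u)):
    y.append(-a_true * y[k - 1] + b_true * u[k - 1])

Syy = sum(y[k - 1] ** 2 for k in range(1, len(y)))
Syu = sum(y[k - 1] * u[k - 1] for k in range(1, len(y)))
Suu = sum(u[k - 1] ** 2 for k in range(1, len(y)))
ry = -sum(y[k] * y[k - 1] for k in range(1, len(y)))
ru = sum(y[k] * u[k - 1] for k in range(1, len(y)))

# Solve [Syy -Syu; -Syu Suu][a b]^T = [ry ru]^T by Cramer's rule
det = Syy * Suu - Syu ** 2
a_est = (ry * Suu + Syu * ru) / det
b_est = (Syy * ru + Syu * ry) / det
print(a_est, b_est)   # close to -0.5 and 1.0
```

With measurement noise the same equations give the least-squares estimate rather than the exact parameters.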
2.5.9 Least squares and regression
Least-squares estimator, nonlinear problems (Gauss-Newton and Levenberg-Marquardt methods)
[55].
2.5.10 Nonlinear systems identification
[50]
Let

\frac{dx}{dt} = F(x(t), u(t)), \qquad y = G(x(t)).

Different models have been proposed.
Control-affine

F = f(x) + G(x) u

For example the Lorenz equations (2.49)-(2.51), in which the variable r is taken to be the input
u, can be written in this fashion as

f = \begin{pmatrix} \sigma (x_2 - x_1) \\ -x_2 - x_1 x_3 \\ -b x_3 + x_1 x_2 \end{pmatrix}, \qquad
G = \begin{pmatrix} 0 \\ x_1 \\ 0 \end{pmatrix}
Bilinear

This corresponds to a control-affine model with u \in R, f = Ax and G = Nx + b. A MIMO extension
can be made by taking

G(x) u = \sum_{i=1}^{m} u_i(t) \, N_i x + B u

where u_i are the components of the vector u.
Volterra

y(t) = y_0(t) + \sum_{n=1}^{\infty} \int \cdots \int k_n(t; t_1, \ldots, t_n) \, u(t_1) \cdots u(t_n) \, dt_1 \cdots dt_n

where u, y \in R. In the discrete case, this is

y(kh) = y_0 + \sum_{i=0}^{\infty} a_i u(kh - ih) + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} b_{ij} u(kh - ih) u(kh - jh) + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \sum_{l=0}^{\infty} c_{ijl} u(kh - ih) u(kh - jh) u(kh - lh) + \cdots    (2.44)
Block-oriented
Either the static or the dynamic parts are chosen to be linear or nonlinear and the two arranged in
series. Thus we have two possibilities. In a Hammerstein model (the equations below are not right
since the dynamics are not evident)
v = N(u)
y = L(v)
where L and N are linear and nonlinear operators respectively, and v is an intermediate variable.
Another possibility is the Wiener model where
v = L(u)
y = N(v)
Discrete-time
ARMAX (autoregressive moving average with exogenous inputs)
y
k
=
p
j=1
a
j
y
kj
+
q
j=0
b
j
u
kj
+
r
j=0
c
j
e
kj
where e
k
is a modeling error and can be represented, for example by a Gaussian white noise. A
special case of this is the ARMA model where u
k
is identically zero.
An extension is NARMAX (nonlinear ARMAX), where
\[
y_k = F(y_{k-1}, \ldots, y_{k-p},\ u_k, \ldots, u_{k-q},\ e_{k-1}, \ldots, e_{k-r}) + e_k.
\]
2.5.11 Statistical analysis
Principal component analysis, clustering, k-means.
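As one concrete instance of the clustering methods mentioned, here is a minimal k-means (Lloyd's algorithm) sketch; the two-blob data set and cluster count are assumed for illustration and are not from the text.

```python
import numpy as np

# Minimal k-means: alternate nearest-center assignment and center update.
def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it if cluster empty).
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),   # blob near (0, 0)
               rng.normal(5.0, 0.1, (50, 2))])  # blob near (5, 5)
centers, labels = kmeans(X, 2)
```

For well-separated blobs like these, the iteration converges to the blob means after a few passes.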
2.6 Linear equations
2.6.1 Linear algebraic
Let
\[
y = Au,
\]
where $u$ and $y$ are n-dimensional vectors and $A$ is an $n \times n$ matrix. Then, if $A$ is non-singular, we
can write
\[
u = A^{-1} y,
\]
where $A^{-1}$ is the inverse of $A$.
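Numerically, $u$ is recovered by solving the linear system directly rather than forming $A^{-1}$; the small example below is assumed for illustration.

```python
import numpy as np

# Forward problem y = A u, then inverse problem u = A^{-1} y.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
u = np.array([1.0, -1.0])
y = A @ u                              # forward problem
u_recovered = np.linalg.solve(A, y)    # preferred over np.linalg.inv(A) @ y
```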
2.6.2 Ordinary differential
Consider the system
\[
\frac{dx}{dt} = Ax + Bu, \qquad (2.45)
\]
\[
y = Cx + Du, \qquad (2.46)
\]
where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $y \in \mathbb{R}^s$, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{s \times n}$, $D \in \mathbb{R}^{s \times m}$. The solution of Eq.
(2.45) with $x(t_0) = x_0$ is
\[
x(t) = e^{A(t-t_0)} x_0 + \int_{t_0}^{t} e^{A(t-\tau)} B u(\tau)\, d\tau,
\]
where the exponential matrix is defined by
\[
e^{At} = I + At + \frac{A^2 t^2}{2!} + \frac{A^3 t^3}{3!} + \ldots
\]
Using Eq. (2.46), the output is related to the input by
\[
y(t) = C \left[ e^{A(t-t_0)} x_0 + \int_{t_0}^{t} e^{A(t-\tau)} B u(\tau)\, d\tau \right] + Du.
\]
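The exponential-matrix series can be summed term by term; the sketch below uses an assumed rotation-generator matrix whose exponential has a known closed form, so the truncated series can be checked against it. In practice a library routine such as scipy.linalg.expm is preferable to a raw series.

```python
import numpy as np

# Truncated series e^{At} = I + At + (At)^2/2! + ... (a sketch, not a
# production implementation).
def expm_series(A, t, terms=30):
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ (A * t) / k      # accumulates (At)^k / k!
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # rotation generator (assumed example)
E = expm_series(A, 1.0)
# For this A, e^{At} = [[cos t, sin t], [-sin t, cos t]].
```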
Linear differential equations are frequently treated using Laplace transforms. The transform
of the function $f(t)$ is $F(s)$, where
\[
F(s) = \int_0^{\infty} f(t)\, e^{-st}\, dt,
\]
and the inverse is
\[
f(t) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} F(s)\, e^{st}\, ds,
\]
where $\gamma$ is a sufficiently positive real number. Application of Laplace transforms reduces ordinary
differential equations to algebraic equations. The input-output relationship of a linear system is
often expressed as a transfer function, which is a ratio of Laplace transforms.
2.6.3 Partial differential

Consider
\[
\frac{\partial x}{\partial t} = \frac{\partial^2 x}{\partial \xi^2} \quad \text{for } 0 \le \xi < \infty,
\]
\[
y = x(0, t)
\]
in the semi-infinite domain $[0, \infty)$, where $x = x(\xi, t)$. The solution with $x(\xi, 0) = f(\xi)$, $-k(\partial x/\partial \xi)(0, t) =
u(t)$ and $(\partial x/\partial \xi)(\xi, t) \to 0$ as $\xi \to \infty$ is
\[
x(\xi, t) = \frac{e^{-\xi^2/4t}}{\sqrt{\pi t}} \int_0^{\infty} f(s)\, e^{-s^2/4t} \cosh\!\left(\frac{\xi s}{2t}\right) ds
+ \frac{\xi}{k\sqrt{\pi}} \int_{\xi/2\sqrt{t}}^{\infty} \frac{e^{-s^2}}{s^2}\, u\!\left(t - \frac{\xi^2}{4s^2}\right) ds,
\]
y = ?
2.6.4 Integral

The solution to Abel's equation
\[
\int_0^t \frac{u(s)}{(t-s)^{1/2}}\, ds = y(t)
\]
is
\[
u(t) = \frac{1}{\pi} \frac{d}{dt} \int_0^t \frac{y(s)}{(t-s)^{1/2}}\, ds.
\]
2.6.5 Characteristics

(a) Superposition: In a linear operator, the change in the image is proportional to the change in the
pre-image. This makes it fairly simple to use a trial-and-error method to achieve a target output by
changing the input. In fact, if one makes two trials, a third one derived from linear interpolation
should succeed.
(b) Unique equilibrium: There is only one steady state at which, if placed there, the system stays.
(c) Unbounded response: If the steady state is unstable, the response may be unbounded.
(d) Solutions: Though many linear systems can be solved analytically, not all have closed-form
solutions but must be solved numerically. Partial differential equations are especially difficult.
2.7 Nonlinear systems

2.7.1 Algebraic equations

An iterated map $f: \mathbb{R}^n \to \mathbb{R}^n$ of the form
\[
x_{i+1} = f(x_i)
\]
marches forward in the index $i$. As an example we can consider the nonlinear map
\[
x_{i+1} = r x_i (1 - x_i) \qquad (2.47)
\]
called the logistic map, where $x \in [0, 1]$ and $r \in [0, 4]$. A fixed point $x^*$ maps to itself, so that
\[
x^* = r x^* (1 - x^*),
\]
from which $x^* = 0$ and $x^* = 1 - r^{-1}$. Fig. 2.6 shows the results of the map for several different values of r. For
some, like r = 0.5 and r = 1.5, the stable fixed points are reached after some iterations. For r = 3.1
there is a periodic oscillation, while for r = 3.5 the oscillations have double the period. This period-doubling
phenomenon continues as r is increased until the period becomes infinite and the values of
x are not repeated. This is deterministic chaos, an example of which is shown for r = 3.9.
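The behavior described can be reproduced in a few lines; this sketch iterates Eq. (2.47) from $x_0 = 0.5$ as in Fig. 2.6.

```python
# Iterate the logistic map x_{i+1} = r x_i (1 - x_i).
def logistic_orbit(r, x0=0.5, n=20):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# For r = 1.5 the orbit settles on the fixed point x* = 1 - 1/r = 1/3.
fixed = logistic_orbit(1.5)[-1]
```

For r = 3.9 the same function produces an aperiodic (chaotic) orbit that nonetheless stays in [0, 1].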
2.7.2 Ordinary differential equations

We consider a set of n scalar ordinary differential equations written as
\[
\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n) \quad \text{for } i = 1, 2, \ldots, n. \qquad (2.48)
\]
The critical (singular or equilibrium) points are the steady states of the system, so that
\[
f_i(x_1, x_2, \ldots, x_n) = 0 \quad \text{for } i = 1, 2, \ldots, n.
\]
Singularity theory looks at the solutions to this equation. In general there are m critical points
$(x_1, x_2, \ldots, x_n)$ depending on the form of $f_i$.
2.7.3 Bifurcations

Bifurcations are qualitative changes in the nature of the response of a system due to changes in a
parameter. An example has already been given for the iterative map (2.47). Similar behavior can
also be observed for differential systems.
Figure 2.6: Logistic map; $x_0 = 0.5$ and r = (a) 0.5, (b) 1.5, (c) 3.1, (d) 3.5, (e) 3.9. (Each panel plots x(i) against the iteration index i.)
Suppose that there are parameters $\lambda \in \mathbb{R}^m$ in the system
\[
\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n;\ \lambda_1, \lambda_2, \ldots, \lambda_m) \quad \text{for } i = 1, 2, \ldots, n,
\]
which may vary. Then the dynamical system may have different long-time solutions depending
on the nature of $f_i$ and the values of $\lambda_j$. The following are some examples of bifurcations which
commonly occur in nonlinear dynamical systems: steady to steady, steady to oscillatory, and oscillatory
to chaotic. Some examples are given below.
The first three examples are for the one-dimensional equation $dx/dt = f(x, \lambda)$, where $x \in \mathbb{R}$.
(a) Pitchfork if $f(x) = -x[x^2 - (\lambda - \lambda_0)]$.
(b) Transcritical if $f(x) = -x[x - (\lambda - \lambda_0)]$.
(c) Saddle-node if $f(x) = -x^2 + (\lambda - \lambda_0)$.
(d) Hopf: In two-dimensional space we have
\[
\frac{dx_1}{dt} = (\lambda - \lambda_0) x_1 - x_2 - (x_1^2 + x_2^2) x_1,
\]
\[
\frac{dx_2}{dt} = x_1 + (\lambda - \lambda_0) x_2 - (x_1^2 + x_2^2) x_2.
\]
There is a Hopf bifurcation at $\lambda = \lambda_0$, which can be readily observed by transforming to polar
coordinates $(r, \theta)$, where $r^2 = x_1^2 + x_2^2$ and $\tan\theta = x_2/x_1$, to get
\[
\frac{dr}{dt} = r(\lambda - \lambda_0) - r^3, \qquad \frac{d\theta}{dt} = 1.
\]
(e) 3-dimensional dynamical system: Consider the Lorenz equations
\[
\frac{dx_1}{dt} = \sigma (x_2 - x_1), \qquad (2.49)
\]
\[
\frac{dx_2}{dt} = r x_1 - x_2 - x_1 x_3, \qquad (2.50)
\]
\[
\frac{dx_3}{dt} = -b x_3 + x_1 x_2. \qquad (2.51)
\]
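A minimal sketch of integrating these equations with a fourth-order Runge-Kutta step; the parameter values $\sigma = 10$, $b = 8/3$, $r = 28$ are the classical chaotic ones and are assumed here, not taken from the text.

```python
import numpy as np

# Right-hand side of the Lorenz equations (2.49)-(2.51).
def lorenz(x, sigma=10.0, r=28.0, b=8.0 / 3.0):
    x1, x2, x3 = x
    return np.array([sigma * (x2 - x1),
                     r * x1 - x2 - x1 * x3,
                     -b * x3 + x1 * x2])

# One classical fourth-order Runge-Kutta step of size h.
def rk4_step(f, x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

x = np.array([1.0, 1.0, 1.0])
h = 0.01
traj = [x]
for _ in range(2000):          # integrate to t = 20
    x = rk4_step(lorenz, x, h)
    traj.append(x)
traj = np.array(traj)
```

The trajectory is chaotic but remains bounded on the attractor.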
The critical points of this system of equations are
\[
(0, 0, 0) \quad \text{and} \quad \left( \pm\sqrt{b(r-1)},\ \pm\sqrt{b(r-1)},\ r-1 \right).
\]
(f) Natural convection: A fluid layer of thickness H heated from below is governed by the Boussinesq equations
\[
\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u} - \beta (T - T_0)\,\mathbf{g},
\]
\[
\frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T = \alpha \nabla^2 T,
\]
where $\mathbf{u}$, $p$ and $T$ are the velocity, pressure and temperature fields respectively, $\rho$ is the density, $\nu$ is
the kinematic viscosity, $\alpha$ is the thermal diffusivity, $\mathbf{g}$ is the gravity vector, and $\beta$ is the coefficient of
thermal expansion. The thermal boundary conditions are the temperatures of the upper and lower
surfaces. Below a critical temperature difference between the two surfaces, $\Delta T$, the $\mathbf{u} = 0$ conductive
solution is stable. At the critical value it becomes unstable and bifurcates into two convective ones.
For rigid walls, this occurs when the Rayleigh number $g\beta\,\Delta T\, H^3/\nu\alpha = 1708$. At higher Rayleigh
numbers, the convective rolls also become unstable and other solutions appear.
(g) Mechanical systems: The system of springs and bars in Fig. 2.7(a) will show snap-through
bifurcation as indicated in Fig. 2.7(b).
(h) Chemical reaction: The temperature T of a continuously stirred chemical reactor can be represented
as [16]
\[
\frac{dT}{dt} = e^{-E/T} - \alpha (T - T_\infty),
\]
where E is the activation energy of the reaction, $\alpha$ is the heat transfer coefficient, and $T_\infty$ is the
external temperature. Fig. 2.8(a) shows the functions $e^{-E/T}$ and $\alpha(T - T_\infty)$; their intersections A
and B in Fig. 2.8(b) are the critical points, which change with $\alpha$, the bifurcation parameter, as in Fig. 2.8(c).
(i) Design: Sometimes the number of choices of a certain component in a mechanical system design
depends on a parameter. Thus, for example, there may be two electric motors available for 1/4 HP
and below while there may be three for 1/2 HP and below. At 1/4 HP there is thus a bifurcation.
Bifurcations can be supercritical or subcritical depending on whether the bifurcated state is
found only above the critical value of the bifurcation parameter or even below it.
2.8 Cellular automata
Cellular automata (CA), originally invented by von Neumann [99], are finite-state systems that
change in time through specific rules [10, 21, 22, 25, 44, 53, 56, 97, 100, 106, 107]. In general a CA
consists of a discrete lattice of cells. All cells are equivalent and interact only with those in their
local neighborhood. The value at each cell takes on one of a finite number of discrete states, which is
updated according to given rules in discrete time. Even simple rules may give rise to fairly complex
dynamic behavior. The initial state also plays a significant role in the long-time dynamics, and
different initial states may end up at different final conditions.
A one-dimensional automaton is a linear array of cells which at a given instant in time are
either black or white. At the next time step the cells may change color according to a given rule.
For example, one rule could be that if a cell is black and has one neighbor black, it will change to
Figure 2.7: Mechanical system with snap-through bifurcation.
[Figure 2.8: (a) the curves $e^{-E/T}$ and $\alpha(T - T_\infty)$ vs. T; (b) their intersections A and B; (c) the critical points as a function of the bifurcation parameter.]
white. The rule is applied to all the cells to obtain the new state of the automaton. In general, the
value at the ith cell at the (k+1)th time step, $c_i^{k+1}$, is given by
\[
c_i^{k+1} = F(c_{i-r}^k, c_{i-r+1}^k, \ldots, c_{i+r-1}^k, c_{i+r}^k), \qquad (2.52)
\]
where $c_i$ can take on n different (usually integer) values. The process is marched successively in a
similar manner in discrete time. Initial conditions are needed to start the process, and the boundaries
may be considered periodic. For two states and nearest neighbors (n = 2, r = 1) there are $2^8 = 256$ different possible rules. The results of two of them with
an initial black cell are shown in Figure ?. Fractal (i.e. self-similar) and chaotic behaviors are shown.
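A one-dimensional two-state automaton of the form (2.52) with r = 1 can be coded directly; this sketch uses the standard Wolfram rule numbering (also used later in the problems) and periodic boundaries, and evolves rule 90, whose growth from a single black cell is self-similar.

```python
# One step of an elementary (two-state, r = 1) cellular automaton.
# The neighborhood (left, center, right) indexes a bit of the rule number.
def step(cells, rule):
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2
                      + cells[(i + 1) % n])) & 1
            for i in range(n)]

def run(rule, width=31, steps=15):
    cells = [0] * width
    cells[width // 2] = 1          # single black cell initially
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

hist90 = run(90)   # rule 90: each new cell is the XOR of its two neighbors
```

The number of black cells at step t of rule 90 follows Pascal's triangle modulo 2, giving the Sierpinski-like pattern referred to in the text.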
In a two-dimensional automaton, the cells are laid out in the form of a two-dimensional grid.
The lattice may be triangular, square or hexagonal. In each case, there are different ways in which a
neighborhood may be defined. In a simple CA there are black and white dots laid out in a plane as
in a checkerboard. Once again, a dot looks at its neighbors (four for a von Neumann neighborhood,
eight for Moore, etc.) and decides on its new color at the new instant in time. One very popular set
of rules is the Game of Life by Conway [40], which relates the color of a cell to that of its 8 neighbors:
a black cell will remain black only when surrounded by 2 or 3 black neighbors, a white cell will
become black when surrounded by exactly 3 black neighbors, and in all other cases the cell will
remain or become white. A variety of behaviors are obtained for different initial conditions, among
them periodic, translational, and chaotic.
There are variants of CAs that we can include within the general framework. In a coupled-map
lattice, the cell can take any real number value instead of one from a discrete set. In an asynchronous
CA the cell values are not necessarily updated together. In other cases, probabilistic instead of
deterministic rules may be used, or the rules may not be the same for all cells. In a mobile CA the
cells are allowed to move.
CAs have characteristics that make them suitable for modeling the dynamics of complex
physical systems. They can capture both temporal and spatial characteristics of a physical system
through simple rules. The rules are usually proposed based on physical intuition and the results
compared with observations. Another way is to relate the rules to a mathematical model based
perhaps on partial differential equations [71,96]. An early example of this is the numerical simulation
of fluid flows carried out on a hexagonal grid on which the governing equations are
simulated; this is called a lattice gas method [14, 38, 80, 105, 114]. There are many other applications
in which CAs have been used, like convection [110], computer graphics [42], robot control [20], urban
studies [102], microstructure evolution [111], data mining [63], pattern recognition [84], music [8],
ecology [78], biology and biotechnology [7,26], information processing [17], robot manufacturing [57],
design [90], and recrystallization [43]. Chopard and Droz [21] provide a compilation of applications
of CAs to physical problems, which include statistical mechanics, diffusion phenomena, reaction-diffusion
processes, and nonequilibrium phase transitions. Harris et al. [46] is another source of
physically-based visual simulations on graphics hardware, including the boiling phenomenon.
2.9 Stability

2.9.1 Linear

To determine the stability of any one of the critical points, the dynamical system (2.48) is linearized
around it to get
\[
\frac{dx_i}{dt} = \sum_{j=1}^{n} A_{ij} x_j \quad \text{for } i = 1, 2, \ldots, n.
\]
This system of equations has a unique critical point, i.e. the origin. The eigenvalues of the matrix
$A = [A_{ij}]$ determine its linear stability, i.e. its stability to small disturbances. If all eigenvalues
have negative real parts, the system is stable.
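This eigenvalue test is immediate to code; the two matrices below are assumed examples, one stable and one a saddle.

```python
import numpy as np

# Linear stability: all eigenvalues of A must have negative real parts.
def is_linearly_stable(A):
    return bool(np.all(np.linalg.eigvals(A).real < 0))

A_stable = np.array([[-1.0, 2.0],
                     [0.0, -3.0]])    # eigenvalues -1, -3
A_unstable = np.array([[1.0, 0.0],
                       [0.0, -1.0]])  # a saddle: eigenvalues +1, -1
```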
2.9.2 Nonlinear

It is possible for a system to be stable to small disturbances but unstable to large ones. In general
it is not possible to determine the nonlinear stability of any system.
The Lyapunov method is one that often works. Let us translate the coordinate system to a
critical point so that the origin is now one of the critical points of the new system. If there exists a
function $V(x_1, x_2, \ldots, x_n)$ such that (a) $V \ge 0$ and (b) $dV/dt \le 0$, with the equalities holding only
at the origin, then the origin is stable for all perturbations, large or small. In this case V is known
as a Lyapunov function.
2.10 Applications

2.10.1 Control

Open-loop

The objective of open-loop control is to find u such that $y = y_s(t)$, where $y_s$, known as a reference
value, is prescribed. The problem is one of regulation if $y_s$ is a constant, and tracking if it is a function
of time.
Consider a system
\[
\frac{dx_1}{dt} = a_1 x_1, \qquad \frac{dx_2}{dt} = a_2 x_2.
\]
For regulation the objective is to go from an initial location $(x_1, x_2)$ to a final one. We can
calculate the effect that errors in initial position and system parameters will have on its success.
Errors due to these will continue to grow, so that after a long time the actual and desired states may
be very different. Open-loop control is also usually of limited use since the mathematical model of
the plant may not be correctly known.
Feedback

For closed-loop control, there is a feedback from the output to the input of the system, as shown in
Fig. 2.9. Some physical quantity is measured by a sensor, the signal is processed by a controller,
and then used to move an actuator. The process can be represented mathematically by
\[
\dot{x} = f(x, u, w), \qquad y = g(x, u, w), \qquad u = h(y, y_s).
\]
The sensor may be used to determine the error
\[
e = y - y_s
\]
through a comparator.
Figure 2.9: Block diagram of a system with feedback.
PID control

The manipulated variable is taken to be
\[
u(t) = K_p\, e(t) + K_i \int_0^t e(s)\, ds + K_d\, \frac{de(t)}{dt}.
\]
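A discrete-time sketch of this control law, applied to an assumed first-order plant $dx/dt = -x + u$ with the error taken as $e = y_s - y$ (the rectangle rule approximates the integral and a backward difference the derivative; plant and gains are illustrative, not from the text):

```python
# Discrete PID regulating the assumed plant dx/dt = -x + u to y_s = 1.
def simulate_pid(Kp, Ki, Kd, y_s=1.0, h=0.01, steps=2000):
    x, integral, e_prev = 0.0, 0.0, y_s
    for _ in range(steps):
        e = y_s - x                       # error (sign convention e = y_s - y)
        integral += e * h                 # rectangle-rule integral of e
        u = Kp * e + Ki * integral + Kd * (e - e_prev) / h
        e_prev = e
        x += h * (-x + u)                 # explicit Euler step of the plant
    return x

y_final = simulate_pid(Kp=5.0, Ki=2.0, Kd=0.1)
```

The integral term removes the steady-state error, so the output settles at the reference value.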
Some work has also been done on $PI^{\lambda}D^{\mu}$ control, in which the integral and derivative in the control law are of fractional orders $\lambda$ and $\mu$.

Problems

1. The fractional derivative of order $\lambda$ may be defined through
\[
\frac{d^{\lambda} x}{dt^{\lambda}} = \frac{1}{\Gamma(2 - \lambda)} \frac{d^2}{dt^2} \int_c^t (t - s)^{1-\lambda}\, x(s)\, ds.
\]
Show that the usual first-order derivative is recovered for $\lambda = 1$.
2. Write a computer code to integrate numerically the Lorenz equations (2.49)-(2.51). Choose values of the
parameters to illustrate different kinds of dynamic behavior.
3. Choose a set of $(x_i, y_i)$ for $i = 1, \ldots, 100$ that correspond to a power law $y = ax^n$. Write a regression program
to find a and n.
4. Determine the uncertainty in the frequency of oscillation of a pendulum given the uncertainty in its length.
5. The action of a cooling coil in a room may be modeled as
\[
\frac{dT_r}{dt} = k_r (T_\infty - T_r) + k_{ac} (T_{ac} - T_r),
\]
where $T_r$ is the room temperature, $T_\infty$ is the outside temperature, and $T_{ac}$ is the temperature of the cooling
coils. Also
\[
k_{ac} = \begin{cases} k_1 & \text{if the AC is on} \\ 0 & \text{if the AC is off} \end{cases}
\]
The cooling comes on when $T_r$ increases to $T_{c2}$ and goes off when it decreases to $T_{c1}$, where $T_{c1} < T_{c2}$. Taking
$T_\infty = 100$°F, $T_{ac} = 40$°F, $T_{c1} = 70$°F, $T_{c2} = 80$°F, $k_r = 0.01$ s$^{-1}$, $k_1 = 0.1$ s$^{-1}$, plot the variation with time
of the room temperature $T_r$. Find the period of oscillation analytically and numerically.
6. Set up a stable controller to bring a spring-mass-damper system with m = 0.1 kg, k = 10 N/m, and c = 10
Ns/m from an arbitrary to a given position. First choose (a) a proportional controller and then (b) add a
derivative part to change it to a PD controller. In each case choose suitable values of the controller parameters
and a reference position, and plot the displacement vs. time curves.
7. The forced Duffing equation
\[
\frac{d^2 x}{dt^2} + \delta \frac{dx}{dt} - x + x^3 = A \cos \omega t
\]
is a nonlinear model, for example, for the motion of a cantilever beam in the nonuniform field of two permanent
magnets.
(a) By letting $v = dx/dt$, write the equation as two first-order equations.
(b) For $A = 0$, determine the critical points (i.e. $\bar{x}, \bar{v}$) and, by considering the linearized equation around
such points, determine whether they are stable or unstable.
(c) The Duffing system may exhibit chaotic behavior when external forcing is added and $A > 0$. In order
to demonstrate this, consider the two equations with $\delta = 0.1$, $\omega = 1.4$, and $A$ = (i) 0.2, (ii) 0.31, (iii)
0.337, (iv) 0.38, and initial conditions $x = 0.1$, $v = 0$. Numerically integrate the equations with these
parameters. For each case, plot (a) the time dependence x vs. t and v vs. t, (b) the phase space x vs. v, and
(c) a Poincaré section¹. Discuss the results. Note: To get the long-time behavior of the motion, rather
than just the initial start-up, take $t > 800$ at least.
8. Fig. 2.10 is a schematic of a mass-spring system in which the mass moves in the transverse y-direction; k is the
spring constant, m is the mass, and L(t) is the length of each spring; $L_0$ is the length when $y = 0$. The unstretched,
uncompressed spring length is $\ell$.
(a) Find the governing equation. Neglect gravity.
(b) Find the critical points. Note: There should be only one for $L_0 > \ell$ (initially stretched spring) and three
for $L_0 < \ell$ (initially compressed spring).
(c) By taking $m = 0.1$ kg, $k = 10$ N/m, and $\ell = 0.1$ m, perform numerical simulations with $L_0 = 0.08$ and
$L_0 = 0.18$, with the initial condition $y(0) = 0$ m and $dy/dt|_{t=0} = 0.01$ m/s.
(d) Apply a suitable vertical, sinusoidal force on the mass. Perform numerical simulations to show the effect
of hysteresis.
Figure 2.10: Schematic diagram of transverse mass-spring system.
¹The Poincaré section is a plot of the discrete set of (x, v) at every period of the external forcing, i.e. (x, v) at
$t = 2\pi/\omega,\ 4\pi/\omega,\ 6\pi/\omega,\ 8\pi/\omega, \ldots$ If the solution is periodic, the Poincaré section is just a single point. When the
period has doubled, it consists of two points, and so on.
9. There are three types of problems associated with $L[x] = u$: operations (given L and x, find u), equations
(given L and u, find x), and system ID (given u and x, find L). Operations are very straightforward and the
result is unique; equations can be more difficult, and solutions symbolically represented as $x = L^{-1}[u]$ are not
necessarily unique. For (a)-(e) and (g)-(h) below, $x = x(t)$, $u = u(t)$, and for (f) $x = x(t)$, u = a real number.
For (a)-(g), (i) show that the operator L is linear, (ii) find the most general form of the solution to the equation
$L[x] = u$, and (iii) state if the inverse operator $L^{-1}$ is unique or not. In (h), show that there are at least two
L for which $L[x] = u$.
(a) Scalar multiplier
\[
L = t, \qquad u(t) = \sin(t)
\]
(b) Matrix multiplier
\[
L = \begin{pmatrix} 3 & 3 & 1 \\ 1 & 2 & 0 \\ 4 & 5 & 1 \end{pmatrix}, \qquad
u = \begin{pmatrix} 16 \\ 8 \\ 24 \end{pmatrix}
\]
(f) Definite integral
\[
L = \int_a^b (\,\cdot\,)\, dt, \qquad u = 2
\]
(g) Differential
\[
L = \frac{d^2}{dt^2}, \qquad u(t) = \cos(2t)
\]
(h) System identification
\[
x(t) = t, \qquad u(t) = t^2
\]
10. Consider the numerical integration of the Langevin equation
\[
\frac{dv}{dt} = -\gamma v + F(t), \qquad (2.53)
\]
where
\[
v = \frac{dx}{dt}, \qquad (2.54)
\]
and F(t) is a white-noise force. There are several numerical methods to integrate Eqs. (2.53)-(2.54), among
them the following⁴.
Euler scheme
\[
x_{i+1} = x_i + h v_i, \qquad (2.55)
\]
\[
v_{i+1} = v_i - \gamma v_i h + W(h), \qquad (2.56)
\]
with
\[
W(h) = (12h)^{1/2} (R - 0.5). \qquad (2.57)
\]
Heun scheme
\[
x_{i+1} = x_i + h v_i - \tfrac{1}{2} \gamma h^2 v_i, \qquad (2.58)
\]
\[
v_{i+1} = v_i - \gamma h v_i + \tfrac{1}{2} \gamma^2 h^2 v_i + W(h) - \tfrac{1}{2} \gamma h W(h), \qquad (2.59)
\]
²Defined by $E_h[f(t)] = f(t + h)$.
³Defined by $\Delta[f(t)] = f(t + h) - f(t)$.
⁴For more details regarding derivation of these schemes see A. Greiner, et al., Journal of Statistical Physics, vol.
15, No. 1/2, p. 94-108, 1988.
with
\[
W(h) = \begin{cases} -(3h)^{1/2} & \text{if } R < 1/6, \\ 0 & \text{if } 1/6 \le R < 5/6, \\ (3h)^{1/2} & \text{if } 5/6 \le R. \end{cases} \qquad (2.60)
\]
Here $W(h) = \int_{t_i}^{t_{i+1}} F(t')\, dt'$; $v_i = v(t_i)$, which is the approximate value at $t_i = ih$; h denotes the step size used
in the integration, and R represents random numbers⁵ that are uniformly distributed on the interval (0, 1). By
taking $\gamma = 1.0$, $(x(0), v(0)) = (1, 0)$, and the final time $t = 10$, and using either numerical scheme (or your
own), perform a large number of realizations. Let
\[
E\{M^k\} = \frac{1}{N} \sum_{n=1}^{N} (M_n)^k \qquad (2.61)
\]
be a moment of order k over all realizations, where N is the number of realizations and $M_n$ is the result of the
nth simulation. Calculate and plot the quantities $E\{v(t)^2\}$ and $E\{(x(t) - x(0))^2\}$. Do they agree with those
of the theoretical estimate?
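A sketch of the Euler scheme (2.55)-(2.57); the Greek damping coefficient is garbled in the source, so $\gamma = 1.0$ is assumed from "taking $\gamma = 1.0$" in the statement. With the unit-strength noise implied by Eq. (2.57) (each W(h) has variance h), the stationary value of $E\{v^2\}$ is $1/(2\gamma) = 1/2$, which the sample average approaches.

```python
import random

# Euler scheme (2.55)-(2.57) for the Langevin equation with gamma = 1 (assumed).
def euler_langevin(h=0.01, t_final=10.0, seed=None):
    rng = random.Random(seed)
    gamma = 1.0
    x, v = 1.0, 0.0                      # (x(0), v(0)) = (1, 0)
    for _ in range(int(t_final / h)):
        W = (12.0 * h) ** 0.5 * (rng.random() - 0.5)   # Eq. (2.57): var(W) = h
        x, v = x + h * v, v - gamma * v * h + W        # Eqs. (2.55)-(2.56)
    return x, v

# Sample moment E{v^2} over many realizations, Eq. (2.61) with k = 2.
N = 2000
mean_v2 = sum(euler_langevin(seed=i)[1] ** 2 for i in range(N)) / N
```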
11. Write a computer code to calculate the logistic map
\[
x_{n+1} = r x_n (1 - x_n) \qquad (2.62)
\]
for $0 \le r \le 4$. Plot the bifurcation diagram, which represents the long-term behavior of x as a function of r. Let
$r_i$ be the location at which the onset of the solution with $2^i$ periods occurs (the bifurcation point). Determine
the precise values of at least the first seven $r_i$. Then estimate Feigenbaum's constant,
\[
\delta = \lim_{i \to \infty} \frac{r_i - r_{i-1}}{r_{i+1} - r_i}. \qquad (2.63)
\]
12. The nondimensional equation for the cooling of a body by convection and radiation is
\[
\frac{dT}{dt} + T + \epsilon T^4 = 0, \qquad (2.64)
\]
where $\epsilon$ is a constant and $T(0) = 1$. There is an uncertainty in the value of $\epsilon$, so that $\epsilon = 0.2(1 + \xi)$, where $\xi$
is a random variable with standard deviation 0.1. Let $T_\xi(t)$ be the solution of Eq. (2.64) for a certain value of $\xi$. Perform a large
number of integrations to determine $E\{T_\xi(t)\}$, compare with $T_0(t)$ (the case where $\xi = 0$), and find where the
maximum deviation between the two occurs and what that maximum deviation value is.
13. The correlation dimension of a set of points may be calculated from the slope of the $\ln C(r)$ vs. $\ln r$ plot, where
\[
C(r) = \lim_{m \to \infty} \frac{N(r)}{m^2}.
\]
N(r) is the number of pairs of points in the set for which the distance between them is less than r; m is the
total number of points. Using this, find the correlation dimension of the Lorenz attractor.
14. This problem considers the use of an auto-regressive model to identify a system. Here it is assumed that the
system is modeled by a difference equation of the form
\[
y(kh) = \sum_{j=1}^{p} a_j\, y(kh - jh). \qquad (2.65)
\]
(a) Calculate N uniformly-sampled points of the variable $x_2(t)$, for $15 \le t \le 18$, of the Lorenz equations with
$r = 350$, $\sigma = 10$ and $b = 8/3$ and initial condition $x_1(0) = x_2(0) = x_3(0) = 1$ as a test signal. By using
the first n points (with, of course, $n > p$), determine the auto-regressive coefficients $a_j$ for p = 2, 3, 6,
and 10. Then use these coefficients in the auto-regressive model to calculate the rest of the test signal⁶.
Plot discrepancies between the actual test signal and the modeled test signals. In addition, report the
root-mean-square error of the first n samples, of the rest, and of the entire signal. Discuss the obtained
results.
⁵Random numbers can be generated using the Matlab function rand(). There are similar commands in Fortran,
C, and C++.
⁶The procedure consists of using Eq. (2.65) to predict the signal at $t = kh$, denoted as a modeled signal $\hat{y}(kh)$,
from $\{y(kh - jh),\ j = 1, \ldots, p\}$, the known actual samples from the previous times.
(b) Repeat with the values of $x_2(t)$ for $20 \le t \le 80$ with $r = 28$, the other parameters being the same as
before.
(c) A cellular automaton consists of a line of cells, each colored either black or white. At every step, the
color of a cell at the next instant in time is determined by a definite rule from the color of that cell and
its immediate left and right neighbors on the previous step, i.e.
\[
a_i^n = \text{rule}\left[ a_{i-1}^{n-1}, a_i^{n-1}, a_{i+1}^{n-1} \right], \qquad (2.66)
\]
where $a_i^n$ denotes the color of cell i at step n. It is easy to see that there are eight possibilities
of $[a_{i-1}^{n-1}, a_i^{n-1}, a_{i+1}^{n-1}]$, and each combination could yield a new cell $a_i^n$ with either black or white color.
Therefore, there is a total of $2^8 = 256$ possible sets of rules. These rules can be numbered from 0 to 255, as
depicted in Fig. 2.11.
With 0 representing white and 1 black, the number assigned is such that when it is written in base 2, it
gives a sequence of 0s and 1s that correspond to the sequence of new colors chosen for each of the eight
possible cases. For example, rule 90, which is 01011010 in base 2, is the case that
\[
[1,1,1] \to 0, \quad [1,1,0] \to 1, \quad [1,0,1] \to 0, \quad [1,0,0] \to 1,
\]
\[
[0,1,1] \to 1, \quad [0,1,0] \to 0, \quad [0,0,1] \to 1, \quad [0,0,0] \to 0,
\]
where each mapping is $[a_{i-1}^{n-1}, a_i^{n-1}, a_{i+1}^{n-1}] \to a_i^n$.
Write a computer code (MatLab, C/C++, or Fortran) to generate the cellular automaton.
i. Take n = 50 (the number of evolution steps) and start from a single black cell. Display⁷ the cellular
automata of rules 18, 22, 45, 73, 75, 150, 161 and 225 (and any rule that you may be interested
in). As an example, Fig. 2.12 illustrates the cellular automaton, starting with a single black cell, of
rule 90 with n = 50.
ii. Start from a single black cell. Display the cellular automata of rule 30 and rule 110 with
n = 40, 200, 1000, and 2000 (or higher).
Discuss the results obtained.
(d) Let us look at a cellular automaton involving three colors, rather than two. In this case, cells can also be
gray in addition to black and white. Instead of considering every possible rule, the so-called totalistic
rule is considered. In this rule, the color of a given cell depends on the average color of its immediately
neighboring cells and itself, i.e.
\[
a_i^n = \text{rule}\left[ \frac{1}{3} \sum_{l=i-1}^{i+1} a_l^{n-1} \right]. \qquad (2.67)
\]
It can be seen that, with three possible colors for each cell, there are seven possible values of the average
color, and each average color could give a new cell of black, white or gray color. Therefore, there are
$3^7 = 2187$ total possible totalistic rules. These rules can be conveniently numbered by a code number, as
depicted in Fig. 2.13.
With 0 representing white, 1 gray and 2 black, the code number assigned is such that when it is written
in base 3, it gives a sequence of 0s, 1s and 2s that correspond to the sequence of the new colors chosen
for each of the seven possible cases.
Write a computer code to generate the totalistic cellular automaton with three possible colors for each cell.
i. Start from a single gray cell and take n = 50. Display the cellular automata of the totalistic rules
237, 1002, 1020, 1038, 1056, and 1086 (and any rule you may be interested in).
ii. Start from a single gray cell. Display the cellular automata of the totalistic rules 1635 and 1599
with n = 50, 200, 1000, and 2000 (or higher).
Discuss the results obtained.
⁷One way to accomplish these plotting tasks is to use the MatLab functions imagesc() and colormap(grayscale).
Figure 2.11: The sequence of 256 possible cellular automaton rules. In each rule, the top row in
each box represents one of the possible combinations of colors $[a_{i-1}^{n-1}, a_i^{n-1}, a_{i+1}^{n-1}]$ of a cell and its
immediate neighbors. The bottom row specifies what color the considered cell $a_i^n$ should be in each
of these cases.
Figure 2.12: Fifty steps in the evolution of the rule 90 cellular automaton starting from a single
black cell.
Figure 2.13: The sequence of 2187 possible totalistic rules. In each rule, the top row in each box
represents one of the possible average colors of a cell and its immediate neighbors, i.e. the possible
values of $\frac{1}{3}\sum_{l=i-1}^{i+1} a_l^{n-1}$. The bottom row specifies what color the considered cell $a_i^n$ should be
in each of these cases. Note that 0 represents white, 1 gray and 2 black. The rightmost top-row
element of the rule represents the result for average color 0, while the element immediately to its
left represents the result for average color 1/3, and so on.
Chapter 3
Artificial neural networks
The technique is derived from efforts to understand the workings of the brain [47]. The brain has
a large number of interconnected neurons, of the order of $10^{11}$, with about $10^{15}$ connections between
them. Each neuron consists of dendrites which serve as signal inputs, the soma that is the body of
the cell, and an axon which is the output. Signals in the form of electrical pulses from the neurons
are stored in the synapses as chemical information. A cell fires if the sum of the inputs to it exceeds
a certain threshold. Some of the characteristics of the brain are: the neurons are connected in
a massively parallel fashion, it learns from experience and has memory, and it is extremely fault
tolerant to loss of neurons or connections. In spite of being much slower than modern silicon devices,
the brain can perform certain tasks such as pattern recognition and association remarkably well.
A brief history of the subject is given in Haykin [48]. McCulloch and Pitts [108] in 1943
defined a single Threshold Logic Unit for which the input and output were Boolean, i.e. either 0 or
1. Hebb's [49] main contribution in 1949 was to the concept of machine learning. Rosenblatt [79]
introduced the perceptron. Widrow and Hoff [104] proposed the least mean-square algorithm and
used it in the procedure called ADALINE (adaptive linear element). After Minsky and Papert [66]
showed that the capabilities of a single-layer perceptron were very restricted, there was a decade-long
break in activity in the area; however, their results did not apply to multilayer networks. Hopfield [51] in
1982 showed how information could be stored in dynamically stable feedback networks. Kohonen [58]
studied self-organizing maps. In 1986 a key contribution was made by Rumelhart et al. [83] [82], who
with the backpropagation algorithm made the multilayer perceptron easy to use. Broomhead and
Lowe [15] introduced radial basis functions.
The objective of artificial neural network technology has been to use the analogy with biological
neurons to produce a computational process that can perform certain tasks well. The
main characteristics of these networks are their ability to learn and to adapt; they are also massively
parallel and consequently robust and fault tolerant. Further details on neural networks are given
in [85] [48] [103] [89] [88] [19] [45] [36].
3.1 Single neuron
For purposes of computation the neuron (also called a node, cell or unit), as shown in Fig. 3.1,
is assumed to take in multiple inputs, sum them and then apply an activation function to the
sum before putting it out. The information is stored in the weights. The weights can be positive
(excitatory), zero, or negative (inhibitory).
Figure 3.1: Schematic of a single neuron.
The argument s of the activation (or squashing) function $\sigma(s)$ is related to the inputs through
\[
s_j = \sum_i w_{ij} y_i - \theta_j,
\]
where $\theta_j$ is the threshold; the term bias, which is the negative of the threshold, is also sometimes used.
The threshold can be considered to be an additional input of magnitude $-1$ and weight $\theta_j$. $y_i$ is the
output of neuron i, and the sum is over all the neurons i that feed to neuron j. With this
\[
s_j = \sum_i w_{ij} y_i.
\]
The output of the neuron j is
\[
y_j = \sigma(s_j).
\]
The activation functions $\sigma(s)$ with range [0, 1] (binary) and [-1, 1] (bipolar) that are normally used
are shown in Table 3.1. The constant c represents the slope of the sigmoid functions, and is sometimes
taken to be unity. The activation function should not be linear, so that the effect of multiple neurons
cannot be easily combined.
For a single neuron the net effect is then
\[
y_j = \sigma\left( \sum_i w_{ij} y_i \right).
\]
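Numerically, the single neuron is just a weighted sum, a threshold, and a squashing function; a sketch using the logistic sigmoid of Table 3.1, with weight and threshold values assumed for illustration:

```python
import math

# Single neuron: s = sum_i w_i x_i - theta, y = sigma(s), with the logistic
# sigmoid sigma(s) = 1 / (1 + exp(-c s)) and slope c = 1.
def neuron(inputs, weights, theta, c=1.0):
    s = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-c * s))

y = neuron([1.0, 0.5], [0.4, -0.2], theta=0.3)   # here s = 0.4 - 0.1 - 0.3 = 0
```

Since s = 0 for these values, the output is the sigmoid's midpoint, 0.5.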
3.2 Network architecture
3.2.1 Single-layer feedforward
This is also called a perceptron. An example is shown in Fig. 3.2.
3.2.2 Multilayer feedforward
A two-layer network is shown in Fig. 3.3.
Function | binary $\sigma(s)$ | bipolar $\sigma(s)$
Step (Heaviside, threshold) | 1 if $s > 0$; 0 if $s \le 0$ | 1 if $s > 0$; 0 if $s = 0$; $-1$ if $s < 0$
Piecewise linear | 1 if $s > 1/2$; $s + 1/2$ if $-1/2 \le s \le 1/2$; 0 if $s < -1/2$ | 1 if $s > 1/2$; $2s$ if $-1/2 \le s \le 1/2$; $-1$ if $s < -1/2$
Sigmoid (logistic) | $(1 + \exp(-cs))^{-1}$ | $\tanh(cs/2)$
Table 3.1: Commonly used activation functions.
Figure 3.2: Schematic of a single-layer network.
3.2.3 Recurrent
There must be at least one neuron with feedback as inFig. 3.4. Self-feedback occurs when the output
of a neuron is fed back to itself.
The network shown in Fig. 3.5 is known as the Hopeld network.
3.2.4 Lattice structure
The neurons are laid out in the form of a 1-, 2-, or higher-dimensional lattice. An example is shown
in Fig. 3.6.
38 3. Articial neural networks
In competitive learning (Fig. 3.8), only the winning neuron, i.e. the one whose weight vector is closest to the input, has its weights updated:

Δw_ij = η(u_i − w_ij) if neuron j wins
Δw_ij = 0 otherwise

The weights stop changing when they approach the input values.
(a) In a self-organizing feature map (Kohonen) the weights in Fig. 3.9 are changed according to

Δw_ij = η(x_j − w_ij) for all neurons in the neighborhood of the winner
Δw_ij = 0 otherwise

Similar input patterns produce geometrically close winners. Thus high-dimensional input data are projected onto a two-dimensional grid.
(b) Another example is the Hopfield network.
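The competitive and self-organizing updates above can be sketched as follows (the winner is taken to be the neuron whose weight vector is closest to the input, and the neighborhood function is a hypothetical stand-in for a real lattice neighborhood):

```python
def winner(weights, x):
    """Index of the neuron whose weight vector is closest to the input."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def som_step(weights, x, neighborhood, eta=0.1):
    """Kohonen update: move weights in the winner's neighborhood toward x."""
    win = winner(weights, x)
    for j in neighborhood(win):
        weights[j] = [wi + eta * (xi - wi) for wi, xi in zip(weights[j], x)]
    return win

# Competitive learning is the special case where only the winner is updated:
w = [[0.0, 0.0], [1.0, 1.0]]
win = som_step(w, [0.9, 0.9], neighborhood=lambda j: [j], eta=0.5)
```

Repeated presentations of the same input drive the winning weight vector toward that input, which is the sense in which the weights "stop changing when they approach the input values."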
Figure 3.5: Hopfield network.
3.3.3 Boltzmann learning
This is a recurrent network in which each neuron has a state S ∈ {−1, +1}. The energy of the network is

E = −(1/2) Σ_{j≠i} w_ij S_i S_j

In this procedure a neuron j is chosen at random and its state changed from S_j to −S_j with probability 1/(1 + exp(−ΔE/T)). T is a parameter called the temperature, and ΔE is the change in energy due to the change in S_j. Neurons may be visible, i.e. interact with the environment, or invisible. Visible neurons may be clamped (i.e. fixed) or free.
3.3.4 Delta rule
This is also called the error-correction learning rule. If y_j is the output of a neuron j when the desired value should be ȳ_j, then the error is

e_j = ȳ_j − y_j

The weights w_ij leading to the neuron are modified in the following manner:

Δw_ij = η e_j u_i

The learning rate η is a positive value that should be neither too large, to avoid runaway instability, nor too small, to avoid taking a long time to converge. One possible measure of the overall error is

E = (1/2) Σ_k (e_k)²

where the sum is over all the output nodes.
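One delta-rule update can be sketched as follows (the identity activation and the toy one-input data are arbitrary illustrations, not part of the text):

```python
def delta_rule_step(w, u, target, phi, eta=0.1):
    """One delta-rule update: w_i <- w_i + eta * e * u_i, with e = target - output."""
    s = sum(wi * ui for wi, ui in zip(w, u))
    e = target - phi(s)  # error = desired value minus actual output
    return [wi + eta * e * ui for wi, ui in zip(w, u)], e

# Repeated updates drive the error toward zero on this toy problem
w = [0.0]
for _ in range(60):
    w, e = delta_rule_step(w, [1.0], 1.0, phi=lambda s: s, eta=0.5)
```

With η = 0.5 the error here shrinks geometrically; a larger η can overshoot and diverge, illustrating the instability mentioned above.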
Figure 3.6: Schematic of neurons in a lattice.
Figure 3.7: Pair of neurons i and j connected by weight w_ij.
3.4 Multilayer perceptron
For simplicity, we will use the logistic activation function

y = φ(s) = 1/(1 + e^{−s})

This has the following derivative:

dy/ds = e^{−s}/(1 + e^{−s})² = y(1 − y)
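The identity dy/ds = y(1 − y) can be checked numerically with a quick finite-difference sketch (the evaluation point s = 0.7 is arbitrary):

```python
import math

def logistic(s):
    """Logistic activation 1/(1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

# Central finite difference vs. the closed form y*(1 - y)
s, h = 0.7, 1e-6
numeric = (logistic(s + h) - logistic(s - h)) / (2.0 * h)
analytic = logistic(s) * (1.0 - logistic(s))
```

This form of the derivative is what makes backpropagation cheap: the gradient is computed from the already-available output y, without re-evaluating the exponential.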
3.4.1 Feedforward
Consider neuron i connected to neuron j. The outputs of the two are y_i and y_j respectively.
Figure 3.8: Connections for competitive learning.
Figure 3.9: Self-organizing map (input nodes, output nodes, and winning node).
3.4.2 Backpropagation
According to the delta rule

Δw_ij = η δ_j y_i

where δ_j is the local gradient. We will consider neurons that are in the output layer and then those that are in hidden layers.
(a) Neurons in output layer: If the target output value is ȳ_j and the actual output is y_j, then the error is

e_j = ȳ_j − y_j

The squared output error summed over all the output neurons is

E = (1/2) Σ_j e_j²
We can write

x_j = Σ_i w_ij y_i
y_j = φ_j(x_j)

The rate of change of E with respect to the weight w_ij is

∂E/∂w_ij = (∂E/∂e_j)(∂e_j/∂y_j)(∂y_j/∂x_j)(∂x_j/∂w_ij)
         = (e_j)(−1)(φ′_j(x_j))(y_i)

Using a gradient descent

Δw_ij = −η ∂E/∂w_ij = η e_j φ′_j(x_j) y_i
(b) Neurons in hidden layer: Consider the neurons j in the hidden layer connected to neurons k in the output layer. Then

δ_j = −(∂E/∂y_j)(∂y_j/∂x_j) = −(∂E/∂y_j) φ′_j(x_j)
The squared error is

E = (1/2) Σ_k e_k²
from which

∂E/∂y_j = Σ_k e_k (∂e_k/∂y_j) = Σ_k e_k (∂e_k/∂x_k)(∂x_k/∂y_j)
Since

e_k = ȳ_k − y_k = ȳ_k − φ_k(x_k)

we have

∂e_k/∂x_k = −φ′_k(x_k)
Also, since

x_k = Σ_j w_jk y_j
we have

∂x_k/∂y_j = w_jk
Thus we have

∂E/∂y_j = −Σ_k e_k φ′_k(x_k) w_jk = −Σ_k δ_k w_jk

so that

δ_j = (Σ_k δ_k w_jk) φ′_j(x_j)
The local gradients in the hidden layer can thus be calculated from those in the output layer.
3.4.3 Normalization
The input to the neural network should be normalized, say between y_min = 0.15 and y_max = 0.85, and unnormalized at the end. If x is an unnormalized variable and y its normalized version, then

y = ax + b

Since y = y_min for x = x_min and y = y_max for x = x_max, we have

a = (y_max − y_min)/(x_max − x_min)
b = (x_max y_min − x_min y_max)/(x_max − x_min)

This can be used to transform variables back and forth between the normalized and unnormalized versions.
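The mapping can be sketched as follows (the sample range 10 to 50 is arbitrary):

```python
def scale_coeffs(xmin, xmax, ymin=0.15, ymax=0.85):
    """Coefficients of y = a*x + b mapping [xmin, xmax] onto [ymin, ymax]."""
    a = (ymax - ymin) / (xmax - xmin)
    b = (xmax * ymin - xmin * ymax) / (xmax - xmin)
    return a, b

a, b = scale_coeffs(10.0, 50.0)
y = a * 30.0 + b   # normalize the midpoint of the range
x = (y - b) / a    # invert the map to recover the unnormalized value
```

The inverse map x = (y − b)/a is what is used to unnormalize the network output at the end.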
3.4.4 Fitting
Fig. 3.10 shows the phenomena of underfitting and overfitting during the training process.

Figure 3.10: Overfitting in a learning process (error vs. time for the training and testing data, showing the underfitting and overfitting regimes).
3.5 Radial basis functions
There are three layers: input, hidden and output. The interpolation functions are of the form

F(x) = Σ_{i=1}^{N} w_i φ(||x − x_i||)    (3.1)

where the φ(||x − x_i||) are a set of nonlinear radial-basis functions, the x_i are the centers of these functions, and ||·|| is the Euclidean norm. The unknown weights can be found by solving a linear matrix equation.
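A sketch of this procedure in one dimension (the Gaussian basis φ(r) = exp(−r²) and the target values fitting f(x) = x² are assumptions for illustration; the linear system Φw = f is solved by elementary Gaussian elimination):

```python
import math

def rbf_matrix(centers, width=1.0):
    """Interpolation matrix Phi with entries phi(|c_i - c_j|), Gaussian basis."""
    return [[math.exp(-((ci - cj) / width) ** 2) for cj in centers] for ci in centers]

def solve(A, f):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [fi] for row, fi in zip(A, f)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def rbf_eval(x, centers, w, width=1.0):
    """Evaluate F(x) = sum_i w_i * phi(|x - x_i|)."""
    return sum(wi * math.exp(-((x - ci) / width) ** 2) for wi, ci in zip(w, centers))

centers = [0.0, 1.0, 2.0]
w = solve(rbf_matrix(centers), [0.0, 1.0, 4.0])  # fit f(x) = x^2 at the centers
```

Because the weights solve the interpolation system exactly, F reproduces the data at the centers; between centers it gives a smooth blend of the basis functions.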
3.6 Other examples
Cerebellar model articulation controller, adaptive resonance networks, feedback linearization [39].
3.7 Applications
ANNs have generally been used in statistical data analysis such as nonlinear regression and cluster analysis. Input-output relationships such as y = f(u), with y ∈ R^m and u ∈ R^n, can be approximated. Pattern recognition in the face of incomplete data and noise is another important application. In association, information that is stored in a network can be recalled when presented with partial data. Nonlinear dynamical systems can be simulated so that, given the past history of a system, the future can be predicted. This is often used in neurocontrol.
3.7.1 Heat exchanger control
Diaz [28] used neural networks for the prediction and control of heat exchangers. Input variables were the mass flow rates of the in-tube and over-tube fluids, and the inlet temperatures. The output of the ANN was the heat rate.
3.7.2 Control of natural convection
[112]
3.7.3 Turbulence control
[41] [60]
Problems
1. This problem concerns feedforward in a trained network (i.e. the set of weights w_ij and biases b_j is given to you, but you write the feedforward program). Consider the neural network consisting of two neurons in one hidden layer and one in the output layer as shown in Fig. 3.11.
Columns 1-6 of the Boston housing data are used as inputs and column 14 is used as target data in the training, using the error backpropagation technique and the activation function φ(s) = tanh s. Below is the set of weights obtained.
Figure 3.11: A feedforward neural network with one hidden layer; there are two neurons in the hidden layer, and one in the output layer.
Neuron 1: b_1 = 1.0612, w_{x1,1} = 0.7576, w_{x2,1} = 0.1604, w_{x3,1} = 0.0100, w_{x4,1} = 0.1560, w_{x5,1} = 0.0743, w_{x6,1} = 0.4465.
Neuron 2: b_2 = 0.6348, w_{x1,2} = 0.3835, w_{x2,2} = 0.1729, w_{x3,2} = 0.0088, w_{x4,2} = 0.2584, w_{x5,2} = 0.2134, w_{x6,2} = 0.5738.
Neuron 3: b_3 = 1.1919, w_13 = 1.1938, w_23 = 1.0434.
Download the file housing.data² and write a computer code for this feedforward network. Find the output of the model (by feeding the data of columns 1-6 to the network) and then compare it with the target data. Remember that, before feeding the input data to the network, you should scale them to zero mean and unit variance.
2. This problem is on the delta learning rule with the gradient descent method for a single neuron with multiple inputs, no hidden layer, and one output.
(a) Write a computer program (MatLab, C/C++, or Fortran) to apply the delta learning rule to the auto-mpg data³. Take column one as target data and column four as input. Use the activation function φ(s) = tanh s. Apply the learning rule until Δw_11 and Δb_1 are sufficiently small (i.e. when one is sufficiently near the minimum of the error function) and report the numerical values of the weights w_11 and b_1. To see how the weights are being adjusted, plot the weights w_11 and b_1 against the number of iterations. Also, on the same graph, plot the approximate data and the actual.
(b) Repeat using data columns four, five, and six as input data. Report the numeric values of all weights w_j1 (not just w_11). Instead of plotting the approximate data, plot the root mean squared error against the number of iterations.
Appendix: A Gradient Descent Algorithm
Consider a single neuron as shown in Fig. 3.14. To train a neural network with the gradient descent algorithm, one needs to compute the gradient G of the error function with respect to each weight w_ij of the network. For training data consisting of p points, define the error function by the mean squared error, so
Figure 3.12: A model of a single neuron. The vector x = (x_1, x_2, . . . , x_n) denotes the input; w_k = (w_jk), j = 1, . . . , n, represents the synaptic weights; b_k is the bias; φ(·) is an activation function applied to s_k = Σ_j w_jk x_j + b_k.
E = Σ_p E_p,    E_p = (1/2) Σ_o (t_o^p − y_o^p)²    (3.2)
where o ranges over the output neurons of the network, and t_o^p is the target data of training point p. The gradient G_jk is defined by

G_jk = ∂E/∂w_jk = ∂/∂w_jk Σ_p E_p = Σ_p ∂E_p/∂w_jk    (3.3)
The equation above implies that the gradient G is the summation of the gradients over all training data. It is therefore sufficient to describe the computation of the gradient for a single data point (G is just the summation of these components).
For notational simplicity, the superscript p is dropped. Using the chain rule, one gets

∂E/∂w_io = −(t_o − y_o) (∂y_o/∂s_o)(∂s_o/∂w_io)    (3.4)
where s_o = Σ_i w_io x_i + b_o. Since y_o = φ_o(s_o), the second factor can be written as φ′_o(s_o). Using s_o = Σ_i w_io x_i + b_o, the third factor becomes x_i. Substituting these back into the above equation, one obtains

∂E/∂w_io = −(t_o − y_o) φ′_o(s_o) x_i    (3.5)
Note again that the gradient G_io for the entire training data is obtained by summing at each weight the contribution given by Eq. (3.5) over all the training data. Then the weights can be updated by

w_io = w_io − η G_io.    (3.6)

where η is a small positive constant called the learning rate. If the value of η is too large, the algorithm can become unstable. If it is too small, the algorithm will take a long time to converge.
The steps in the algorithm are:
² It is available at /afs/nd.edu/user10/dwirasae/Public. A description of each column is given in housing.names.
³ auto-mpg1.dat can be downloaded from /afs/nd.edu/user10/dwirasae/Public/. auto-mpg.name1 contains the descriptions of each column.
- Initialize the weights to small random values.
- Repeat until the stopping criterion is satisfied:
  - For each weight, set Δw_ij to zero.
  - For each training data point (x, t)^p:
    - Compute s_j and y_j.
    - For each weight, set Δw_ij = Δw_ij + η(t_j − y_j) φ′(s_j) x_i.
  - For each weight w_ij, set w_ij = w_ij + Δw_ij.
The algorithm is terminated when one is sufficiently close to the minimum of the error function, where G ≈ 0.
1. This problem is on the use of the gradient descent algorithm with backpropagation of error to train a multi-layer, fully connected neural network. In a fully connected network each node in a given layer is connected to every node in the next layer. The auto-mpg data is the system to be modeled. The data auto-mpg.dat can be downloaded from /afs/nd.edu/user10/diwrasae/Public/; auto-mpg.name1 contains the descriptions of each column. Take column one as target data and columns three, four, five, and six as input data.
Another problem
1. Write a computer program to train a network with one hidden layer with two neurons in this layer. For the neurons in the hidden layer, use the sigmoidal activation function φ(s) = 1/(1 + e^{−s}). For the output neuron, there is no activation function (or it is simply linear). Plot the root mean squared error as a function of the number of iterations. Report the numerical values of the weights w_ij and biases b_i. Compare the output of the network and the target data by plotting them together in one plot.
2. Repeat Part 1 with a network consisting of two hidden layers in which each layer consists of two neurons. Compare the output obtained with that of Part 1.
Note that, before training the network, it is recommended to scale the input and target data, say between 0.15 and 0.85.
Appendix: Error Backpropagation and Gradient Descent Algorithm
In this appendix, we describe the gradient descent algorithm with error backpropagation to train a multi-layer neural network. Assume here that we have p pairs (x, t) of training data. The vector x denotes an input to the network and t the corresponding target (desired output). As seen in the previous assignment, the overall gradient G is the summation of the gradients for each training data point. It is therefore sufficient to describe the computation of the gradient for a single data point. Let w_ij represent the weight from neuron j to neuron i as in Fig. 3.13 (note that this was defined as w_ji in the last homework). In addition, let us define the following:
The error for neuron i: δ_i = −∂E/∂s_i.
The negative gradient for weight w_ij: Δw_ij = −∂E/∂w_ij.
The set of neurons anterior to neuron i: A_i = {j : w_ij exists}.
The set of neurons posterior to neuron i: P_i = {j : w_ji exists}.
Note that s_i is the activation potential at neuron i (it is the argument of the activation function at neuron i). Examples of the sets A_i and P_i are shown in Fig. 3.14.
As done before, using the chain rule, the gradient can be written as

Δw_ij = −(∂E/∂s_i)(∂s_i/∂w_ij).
The first factor on the right-hand side is δ_i. Since the activation potential is defined by

s_i = Σ_{k∈A_i} w_ik y_k,
Figure 3.13: Pair of neurons i and j connected by weight w_ij.
Figure 3.14: Schematic of the set of neurons anterior and posterior to neuron i (input layer, hidden layer, output layer).
the second factor is therefore nothing but y_j. Putting them together, we then obtain

Δw_ij = δ_i y_j.
In order to compute this gradient, the error at neuron i and the output of the relevant neuron j must be given. The output of neuron i is determined by

y_i = φ_i(s_i),

where φ_i is the activation function of neuron i. Now the remaining task is to compute the error δ_i. To accomplish this, we first compute the error in the output layer. This error is then propagated back to the neurons in the hidden layers.
Let us consider the output layer. As done before, we define the error function by the mean squared error, so

E = (1/2) Σ_o (t_o − y_o)²,
where o ranges over the output neurons of the network. Using the chain rule, the error for the output neuron o is determined by

δ_o = (t_o − y_o) φ′_o(s_o),

where φ′_o = ∂φ_o/∂s_o. For the hidden units, we propagate the error back from the output neurons. Again using the chain rule, we can expand the error for a hidden neuron in terms of its posterior nodes as

δ_j = −∂E/∂s_j = −Σ_{i∈P_j} (∂E/∂s_i)(∂s_i/∂y_j)(∂y_j/∂s_j).
The first factor on the right-hand side is −δ_i. Since s_i = Σ_{k∈A_i} w_ik y_k, the second is simply w_ij. The third is the derivative of the activation function of neuron j. Substituting these back, we obtain

δ_j = φ′_j(s_j) Σ_{i∈P_j} δ_i w_ij.
The procedure for computing the gradient can be summarized as follows. For given weights w_ij, first perform the feedforward, layer by layer, to get the output of the neurons in the hidden layers and the output layer. Then calculate the error δ_o in the output layer. After that, backpropagate the error, layer by layer, to get the errors δ_i. Finally, calculate the gradient Δw_ij. The weight w_ij can then be updated by

w_ij = w_ij + η Σ_p Δw_ij^p,

where η is a small positive constant (note that the superscript p is used to denote the training point; it is not an exponent).
For a feedforward network which is fully connected, i.e., each node in a given layer connected to every node in the next layer, one can write the backpropagation algorithm in matrix notation (rather than using the graph form described above; although more general, an implementation of the graph form usually requires the use of an abstract data type). In this notation, the biases, activation potentials, and error signals for all neurons in a single layer can be represented as vectors of dimension n, where n is the number of neurons in that layer. All the non-bias weights from a layer anterior to a given layer form a matrix of dimension m × n, where m is the number of neurons in the given layer and n is the number of neurons in the anterior layer (the ith row of this matrix represents the weights from the neurons in the anterior layer to neuron i in the given layer). Number the layers from 0 (the input layer) to L (the output layer).
The steps of the algorithm for off-line learning in matrix notation are:
- Initialize the weights W_l and bias weights b_l for layers l = 1, . . . , L, where b_l is the vector of bias weights, to small random values.
- Repeat until the stopping criterion is satisfied:
  - Set ΔW_l and Δb_l to zero.
  - For each training data point (x, t):
    - Initialize the input layer: y_0 = x.
    - Feedforward: for l = 1, 2, . . . , L,
      y_l = φ_l(W_l y_{l−1} + b_l).
    - Calculate the error in the output layer:
      δ_L = (t − y_L) ⊙ φ′_L(s_L),
      where δ denotes the vector of error signals, s denotes the vector of activation potentials, and ⊙ is understood as elementwise multiplication.
    - Backpropagate the error: for l = L−1, L−2, . . . , 1,
      δ_l = (W_{l+1}^T δ_{l+1}) ⊙ φ′_l(s_l),
      where T is the transpose operator.
    - Accumulate the gradients and bias increments: ΔW_l = ΔW_l + η δ_l y_{l−1}^T and Δb_l = Δb_l + η δ_l for l = 1, 2, . . . , L.
  - Update the weights W_l = W_l + ΔW_l and bias weights b_l = b_l + Δb_l.
The algorithm is terminated when it is sufficiently close to the minimum of the error function (i.e. when W at the current iteration step differs only slightly from that of the previous step).
Comment from Damrongsak Wirasaet
With tanh() as the activation function the output from the neural network will not exceed ±1. For the first problem, the network was trained using target data scaled to zero mean and unit variance; the scaled data may have some values that are greater than 1 (or lower than −1). For the reason given above, the output from the feedforward NN with the coefficients given in the problem statement will not exceed ±1. This is normal and you can leave it like that.
For the second problem, before training the network, you may scale the input to zero mean and unit variance. However, scale the target data by subtracting the mean defined by
mean = (max(t) + min(t))/2
and dividing by
std = (max(t) - min(t))/2.
This makes the target data lie between ±1.
Another comment
Actually, I have a hard time training the network with two hidden layers using the sigmoid activation function. I always get an output with constant value, and that value is the average of the target data. I am not sure of the reason why (I suspect that the network coefficients I get correspond to a local minimum of the error function). Indeed, some of you encountered the same problem. Note that this problem goes away when I use the tanh function as the activation function. And I do not ask you to use the tanh() activation function.
Below are the codes I used to train the network.
One hidden layer
clear ;
% load housing.data
housing = load('auto-mpg.dat') ;
% Cooked-up data
X = linspace(-10,10,100) ; X = X' ;
t = tanh(X) ;
% t = 1./(1 + exp(-X)) ;
% t = cos(X) ;
% X = housing(:,1:6) ;
% t = housing(:,14) ;
% X = housing(:,[3 4 5 6]) ;
% t = housing(:,1) ;
%-----------, normalize between +/-1
% xmean = mean(X) ;
% xstd = std(X) ;
% X = (X - ones(size(X,1),1)*xmean)./(ones(size(X,1),1)*xstd) ;
%
% xmean = (max(X) + min(X))/2 ;
% xstd = (max(X) - min(X))/2 ;
% for i = 1: size(X,1)
%   X(i,1:size(X,2)) = (X(i,1:size(X,2)) - xmean)./xstd ;
% end
%
% tmean = (max(t) + min(t))/2 ;
% tstd = (max(t) - min(t))/2 ;
% t = (t - tmean)/tstd ;
%------------------------------------------------------
% ymin = 0.15 ;
% ymax = 0.85 ;
% for i = 1: size(X,2)
%   xmax = max(X(:,i)) ;
%   xmin = min(X(:,i)) ;
%   a(i) = (ymax - ymin)/(xmax - xmin) ;
%   b(i) = (xmax*ymin - xmin*ymax)/(xmax - xmin) ;
% end
% for i = 1: size(X,1)
%   X(i,:) = a.*X(i,:) + b ;
% end
% xmax = max(t) ;
% xmin = min(t) ;
% a = (ymax - ymin)/(xmax - xmin) ;
% b = (xmax*ymin - xmin*ymax)/(xmax - xmin) ;
% t = a*t + b ;
%-------------------------------------------------------
numHidden = 2 ;
randn('seed', 123456) ;
W1 = 0.1*randn(numHidden, size(X,2)) ;
W2 = 0.1*randn(size(t,2), numHidden) ;
b1 = 0.1*randn(numHidden, 1) ;
b2 = 0.1*randn(size(t,2), 1) ;
numEpochs = 2000 ;
numPatterns = size(X,1) ;
eta = 0.005 ;
for i = 1:numEpochs
  disp( i ) ;
  dw1 = zeros(numHidden, size(X,2)) ;
  dw2 = zeros(size(t,2), numHidden) ;
  db1 = zeros(numHidden, 1) ;
  db2 = zeros(size(t,2), 1) ;
  err = zeros(size(X,1), 1) ;
  for n = 1: numPatterns
    y0 = X(n,:)' ;
    % Output, error, and gradient
    s1 = W1*y0 + b1 ;
    y1 = tanh(s1) ;                       % tanh()
    % y1 = 1./(1 + exp(-s1)) ;
    s2 = W2*y1 + b2 ;
    y2 = s2 ;                             % linear output neuron
    sigma2 = (y2 - t(n,:)) ; err(n) = sigma2 ;
    sigma1 = (W2'*sigma2).*(1 - y1.*y1) ; % tanh()
    % sigma1 = (W2'*sigma2).*y1.*(1 - y1) ;
    dw1 = dw1 + sigma1*y0' ; db1 = db1 + sigma1 ;
    dw2 = dw2 + sigma2*y1' ; db2 = db2 + sigma2 ;
  end
  % Gradient descent update
  W1 = W1 - eta*dw1 ; b1 = b1 - eta*db1 ;
  W2 = W2 - eta*dw2 ; b2 = b2 - eta*db2 ;
  % mse(i) = var(err) ;
  E = sqrt(err'*err)/size(t,2) ;
  mse(i) = E ;
end
% Report the weights and biases
b1
W1
b2
W2
semilogy(1:numEpochs, mse, '-') ;
hold on ;
Two hidden layers
clear ;
housing = load('auto-mpg.dat') ;
% Cooked-up data
% X = linspace(-5,5,100) ; X = X' ;
% t = 1./(1 + exp(-X)) ;
% t = tanh(X) ;
% t = sin(X) ;
X = housing(:,[3 4 5 6]) ;
t = housing(:,1) ;
%-----------, normalize between +/-1
% xmean = mean(X) ;
% xstd = std(X) ;
% X = (X - ones(size(X,1),1)*xmean)./(ones(size(X,1),1)*xstd) ;
%
% xmean = (max(X) + min(X))/2 ;
% xstd = (max(X) - min(X))/2 ;
% for i = 1: size(X,1)
%   X(i,1:size(X,2)) = (X(i,1:size(X,2)) - xmean)./xstd ;
% end
%
% tmean = (max(t) + min(t))/2 ;
% tstd = (max(t) - min(t))/2 ;
% t = (t - tmean)/tstd ;
%------------------------------------------------------------
ymin = 0.15 ;
ymax = 0.85 ;
for i = 1: size(X,2)
  xmax = max(X(:,i)) ;
  xmin = min(X(:,i)) ;
  a(i) = (ymax - ymin)/(xmax - xmin) ;
  b(i) = (xmax*ymin - xmin*ymax)/(xmax - xmin) ;
end
for i = 1: size(X,1)
  X(i,:) = a.*X(i,:) + b ;
end
xmax = max(t) ;
xmin = min(t) ;
a = (ymax - ymin)/(xmax - xmin) ;
b = (xmax*ymin - xmin*ymax)/(xmax - xmin) ;
t = a*t + b ;
%-------------------------------------------------------
numHidden1 = 2 ;
numHidden2 = 2 ;
% randn('seed', 123456) ;
W1 = 0.1*randn(numHidden1, size(X,2)) ;
W2 = 0.1*randn(numHidden2, numHidden1) ;
W3 = 0.1*randn(size(t,2), numHidden2) ;
b1 = 0.1*randn(numHidden1, 1) ;
b2 = 0.1*randn(numHidden2, 1) ;
b3 = 0.1*randn(size(t,2), 1) ;
numEpochs = 3000 ;
numPatterns = size(X,1) ;
eta = 0.0008 ;
for i = 1:numEpochs
  disp( i ) ;
  dw1 = zeros(numHidden1, size(X,2)) ;
  dw2 = zeros(numHidden2, numHidden1) ;
  dw3 = zeros(size(t,2), numHidden2) ;
  db1 = zeros(numHidden1, 1) ;
  db2 = zeros(numHidden2, 1) ;
  db3 = zeros(size(t,2), 1) ;
  err = zeros(size(X,1), 1) ;
  for n = 1: numPatterns
    y0 = X(n,:)' ;
    % Output, error, and gradient
    s1 = W1*y0 + b1 ;
    % y1 = 1./(1 + exp(-s1)) ;
    y1 = tanh(s1) ;
    s2 = W2*y1 + b2 ;
    % y2 = 1./(1 + exp(-s2)) ;
    y2 = tanh(s2) ;
    s3 = W3*y2 + b3 ;
    y3 = s3 ;                             % linear output neuron
    sigma3 = (y3 - t(n,:)) ; err(n) = sigma3 ;
    % sigma2 = (W3'*sigma3).*y2.*(1 - y2) ;
    % sigma1 = (W2'*sigma2).*y1.*(1 - y1) ;
    sigma2 = (W3'*sigma3).*(1 - y2.*y2) ; % tanh()
    sigma1 = (W2'*sigma2).*(1 - y1.*y1) ; % tanh()
    dw1 = dw1 + sigma1*y0' ; db1 = db1 + sigma1 ;
    dw2 = dw2 + sigma2*y1' ; db2 = db2 + sigma2 ;
    dw3 = dw3 + sigma3*y2' ; db3 = db3 + sigma3 ;
    % On-line update could instead be done here (commented out):
    % W1 = W1 - eta*dw1 ; b1 = b1 - eta*db1 ;
    % W2 = W2 - eta*dw2 ; b2 = b2 - eta*db2 ;
    % W3 = W3 - eta*dw3 ; b3 = b3 - eta*db3 ;
  end
  % Gradient descent update
  W1 = W1 - eta*dw1 ; b1 = b1 - eta*db1 ;
  W2 = W2 - eta*dw2 ; b2 = b2 - eta*db2 ;
  W3 = W3 - eta*dw3 ; b3 = b3 - eta*db3 ;
  % mse(i) = var(err) ;
  E = sqrt(err'*err)/size(t,2) ;
  mse(i) = E ;
end
% Report the weights and biases
b1
W1
b2
W2
b3
W3
semilogy(1:numEpochs, mse, '-') ;
hold on ;
Chapter 4
Fuzzy logic
[24] [103] [88] [95] [9] [52] [18] [5]
Uncertainty can be quantified with a certain probability. For example, if it is known that one of a number of bottles contains poison, the probability of choosing the poisoned bottle can be calculated. On the other hand, if each bottle had a certain amount of poison in it, there would be no bottle with pure water nor any with pure poison. This is handled with fuzzy set theory, introduced by Zadeh [113].
In crisp (or classical) sets, a given element is either a member of the set or not. Let us consider a universe of discourse U that contains all the elements x that we are interested in. A set A ⊆ U is formed by all x ∈ A. The complement of A is defined by A′ = {x : x ∉ A}. We can also define the following operations between sets A and B:

A ∩ B = {x : x ∈ A and x ∈ B}   intersection
A ∪ B = {x : x ∈ A or x ∈ B}    union
A \ B = {x : x ∈ A and x ∉ B}   difference

We have the following laws:

A ∪ A′ = U              excluded middle
A ∩ A′ = ∅              contradiction
(A ∩ B)′ = A′ ∪ B′      De Morgan's first law
(A ∪ B)′ = A′ ∩ B′      De Morgan's second law
4.1 Fuzzy sets
A fuzzy set A, where x ∈ A ⊆ U, has members x, each of which has a membership μ_A(x) that lies in the interval [0, 1]. The core of A is the set of values of x with μ_A(x) = 1, and the support is the set of those with μ_A(x) > 0. A set is normal if there is at least one element with μ_A(x) = 1, i.e. if the core is not empty. It is convex if μ_A(x) is unimodal.
An α-cut A_α is defined as

A_α = {x : μ_A(x) ≥ α}

Representation theorem:

A = ∪_{α∈[0,1]} α A_α
The intersection (AND operation) between fuzzy sets A and B can be defined in several ways. One is through the α-cut

(A ∩ B)_α = A_α ∩ B_α,   α ∈ [0, 1]

The membership function is

μ_{A∩B}(x) = min{μ_A(x), μ_B(x)}    (4.1)

for all x ∈ U. A and B are disjoint if their intersection is empty. Similarly, the union (OR operation) and complement (NOT operation) are defined by

μ_{A∪B}(x) = max{μ_A(x), μ_B(x)}
μ_{A′}(x) = 1 − μ_A(x)

Fuzzy sets A = B iff μ_A(x) = μ_B(x), and A ⊆ B iff μ_A(x) ≤ μ_B(x), for all x ∈ U.
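These pointwise definitions translate directly into code (a sketch; the triangular membership function and the temperature sets hot and warm are illustrative assumptions, not from the text):

```python
def f_and(mu_a, mu_b):
    """Fuzzy intersection (AND): pointwise minimum of membership values."""
    return lambda x: min(mu_a(x), mu_b(x))

def f_or(mu_a, mu_b):
    """Fuzzy union (OR): pointwise maximum of membership values."""
    return lambda x: max(mu_a(x), mu_b(x))

def f_not(mu_a):
    """Fuzzy complement (NOT)."""
    return lambda x: 1.0 - mu_a(x)

def tri(c, w):
    """Triangular membership function, peak at c, support [c - w, c + w]."""
    return lambda x: max(0.0, 1.0 - abs(x - c) / w)

hot = tri(30.0, 10.0)
warm = tri(20.0, 10.0)
```

For instance, a temperature of 25 belongs to both hot and warm with membership 0.5, so "hot AND warm" also has membership 0.5 there.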
Fuzzy numbers: These are sets in R that are normal and convex. The operations of addition and multiplication (and similarly subtraction and division) with fuzzy numbers A and B are defined as

μ_{A+B}(z) = sup_{x+y=z} min{μ_A(x), μ_B(y)}
μ_{AB}(z) = sup_{xy=z} min{μ_A(x), μ_B(y)}
Fuzzy functions: These are defined in terms of fuzzy numbers and the operations defined above.
Linguistic variables: To use fuzzy numbers, certain variables may be referred to with names rather than values. For example, the temperature may be represented as fuzzy numbers that are given names such as hot, normal, or cold, each with a corresponding membership function.
Fuzzy rule: This is expressed in the form

IF A THEN C.

where A, called the antecedent, and C, the consequent, are fuzzy variables or statements.
4.2 Inference
This is the process by which a set of rules is applied. Thus we may have a set of rules for n input variables

IF A_i THEN C_i,   for i = 1, 2, . . . , n.
4.2.1 Mamdani method
In this the form is

IF x_1 is A_1 AND . . . AND x_n is A_n THEN y is B.

where the A_i (i = 1, . . . , n) and B are linguistic variables. The AND operation has been defined in Eq. (4.1).
4.2.2 Takagi-Sugeno-Kang (TSK) method
Here

IF x_1 is A_1 AND . . . AND x_n is A_n THEN y = f(x_1, . . . , x_n).

The consequent is then crisp. Usually an affine linear function

f = a_0 + Σ_{i=1}^{n} a_i x_i

is used.
4.3 Defuzzification
This converts a single membership function μ_A(x) or a set of membership functions μ_{A_i}(x) to a crisp value x̄. There are several ways to do this.
Height or maximum membership: For a membership function with a single peaked maximum, x̄ can be chosen such that μ_A(x̄) is the maximum.
Mean-max or middle of maxima: If there is more than one value of x with the maximum membership, then the average of the smallest and largest such values can be used.
Centroid, center of area or center of gravity: The centroid of the shape of the membership function can be determined as

x̄ = ∫ x μ_A(x) dx / ∫ μ_A(x) dx

The union is taken if there are a number of membership functions.
Bisector of area: x̄ divides the area into two equal parts so that

∫_{x<x̄} μ_A(x) dx = ∫_{x>x̄} μ_A(x) dx
Weighted average: For a set of membership functions, this method weights each by its maximum value μ_{A_i}(x_m) at x = x_m, so that

x̄ = Σ_i x_m μ_{A_i}(x_m) / Σ_i μ_{A_i}(x_m)

This works best if the membership functions are symmetrical about the maximum value.
Center of sums: For a set of membership functions, each one of them can be weighted as

x̄ = Σ_i ∫ x μ_{A_i}(x) dx / Σ_i ∫ μ_{A_i}(x) dx

This is similar to the weighted average, except that the integral of each membership function is used instead of its value at the maximum.
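The centroid rule, the most common of these, can be sketched numerically (midpoint-rule integration and the symmetric triangular membership function are illustrative choices):

```python
def centroid_defuzzify(mu, xmin, xmax, n=1000):
    """Approximate x* = (integral of x*mu(x)) / (integral of mu(x)) by the midpoint rule."""
    dx = (xmax - xmin) / n
    xs = [xmin + (i + 0.5) * dx for i in range(n)]
    num = sum(x * mu(x) for x in xs) * dx
    den = sum(mu(x) for x in xs) * dx
    return num / den

# A symmetric triangular membership function peaked at x = 2
mu = lambda x: max(0.0, 1.0 - abs(x - 2.0))
xbar = centroid_defuzzify(mu, 0.0, 4.0)
```

For a membership function symmetric about its peak, the centroid coincides with the peak, which is a quick sanity check on an implementation.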
4.4 Fuzzy reasoning
In classical logic, statements are either true or false. For example, one may say that if x and y then z, where x, y and z are statements that are either true or false. However, in fuzzy logic the truth value of a statement lies between 0 and 1. In fuzzy logic x, y and z above will each be associated with some truth value.

             Crisp                          Fuzzy
Fact         (x is A)                       (x is A′)
Rule         IF (x is A) THEN (y is B)      IF (x is A) THEN (y is B)
Conclusion   (y is B)                       (y is B′)

where in the last column A, A′, B and B′ are fuzzy sets. The crisp output can be computed as a weighted average of the rule consequents

ȳ = Σ_i [min_j μ_{A_ij}] (p_0^i + p_1^i x_1 + . . . + p_k^i x_k) / Σ_i [min_j μ_{A_ij}]

where the p's are determined by minimizing the least-squares error using a gradient descent or some other procedure.
4.6 Fuzzy control
This is based on rules that use human knowledge in the form of IF-THEN rules. The IF part is, however, applied in a fuzzy manner so that the application of the rules changes gradually in the space of input variables.
Consider the problem of stabilization of an inverted pendulum placed on a cart. The inputs are the crisp angular displacement from the desired position, θ, and the crisp angular velocity, θ̇. The controller must find a suitable crisp force F to apply to the cart.
The steps for a Mamdani-type fuzzy logic control are:
1. Create linguistic variables and their membership functions for the input variables θ and θ̇, and the output variable F.
2. Write suitable IF-THEN rules.
3. For given θ and θ̇ values, determine their linguistic versions and the corresponding memberships.
4. For each combination of the linguistic versions of θ and θ̇, choose the smallest membership. Cap the F membership at that value.
5. Draw the F membership function. Defuzzify to determine a crisp value of F.
4.7 Clustering
[13]
We have m vectors that represent points in n-dimensional space. The data can be first normalized to the range [0, 1]. This is the set U. The objective is to divide U into k non-empty subsets A_1, . . . , A_k such that

∪_{i=1}^{k} A_i = U
A_i ∩ A_j = ∅ for i ≠ j
For crisp sets this is done by minimizing

J = Σ_{i=1}^{m} Σ_{j=1}^{k} μ_{A_j}(x_i) d_ij²

where μ_{A_j}(x_i) is the characteristic function for cluster A_j (i.e. μ_{A_j}(x_i) = 1 if x_i ∈ A_j, and = 0 otherwise), and d_ij is the (suitably defined) distance between x_i and the center of cluster A_j at

v_j = Σ_{i=1}^{m} μ_{A_j}(x_i) x_i / Σ_{i=1}^{m} μ_{A_j}(x_i)
Similarly, fuzzy clustering is done by minimizing

J = Σ_{i=1}^{m} Σ_{j=1}^{k} [μ_{A_j}(x_i)]^r d_ij²

where the center of cluster A_j is at

v_j = Σ_{i=1}^{m} [μ_{A_j}(x_i)]^r x_i / Σ_{i=1}^{m} [μ_{A_j}(x_i)]^r

with the weighting parameter r ≥ 1.
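The cluster-center formula can be sketched as follows (one cluster in one dimension; the membership values and points are arbitrary illustrations):

```python
def fuzzy_center(memberships, points, r=2.0):
    """Weighted cluster center v = sum(mu^r * x) / sum(mu^r) for one cluster."""
    num = sum((mu ** r) * x for mu, x in zip(memberships, points))
    den = sum(mu ** r for mu in memberships)
    return num / den

v = fuzzy_center([1.0, 0.8, 0.1], [0.0, 1.0, 10.0])
```

With crisp (0/1) memberships this reduces to the ordinary mean of the points assigned to the cluster, recovering the crisp formula above.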
Cluster validity: In the preceding analysis, the number of clusters has to be provided. Validation
involves determining the best number of clusters in terms of minimizing a validation measure.
There are many ways in which this can be dened [81].
4.8 Other applications
Decision making, classication, pattern recognition. Consumer electronics and appliances [86].
Problems
1. Write a computer program to simulate the fuzzy-logic control of an inverted pendulum. The system to be considered is that shown at the end of Section 14.4 of the MEMS handbook. Use the functions given in Fig. 14.25 as membership functions for cart and pendulum. Simulate the problem with the following initial conditions (units in degrees and degrees/s):
(i) θ(0) = 10 and θ̇(0) = 0,
(ii) θ(0) = 30 and θ̇(0) = 0,
(iii) θ(0) = 15 and θ̇(0) = 0,
(iv) θ(0) = 0 and θ̇(0) = 15.
In each case, plot pendulum angle, pendulum angular velocity, and cart force as functions of time. Does the controller bring the response of the system to the desired state (θ = 0 and θ̇ = 0 as t → ∞)?
Remark
To implement this problem, one needs values of the pendulum angle θ(t) and angular velocity θ̇(t). As a reminder, in an actual system one obtains these values from sensors. In a purely computer simulation, one gets these values from a mathematical model. For this particular problem, we can assume that the pendulum mass is concentrated at the end of the rod and that the rod is massless. The mathematical model approximating the physical problem can be written as

    (M + m) ẍ − ml (sin θ) θ̇² + ml (cos θ) θ̈ = u
    m ẍ cos θ + ml θ̈ = mg sin θ

where x(t) is the position of the cart, θ is the angle of the pendulum, M denotes the mass of the cart, m is the pendulum mass, u(t) represents a force on the cart, and l is the length of the rod (see Fig. 14.16 for a schematic diagram). Extra credit will be given if you verify the above equations.
Chapter 5
Probabilistic and evolutionary algorithms
There is a class of search algorithms that are not gradient-based and hence are suitable for the search for global extrema. Among them are simulated annealing, random search, downhill simplex search and evolutionary methods [55]. Evolutionary algorithms are those that change or evolve as the computation proceeds. They are usually probabilistic searches, based on multiple search points, and inspired by biological evolution. Common algorithms in this genre are the genetic algorithm (GA), evolution strategies, evolutionary programming and genetic programming (GP).
5.1 Simulated annealing
This is a derivative-free probabilistic search method. It can be used for both continuous and discrete optimization problems. The technique is based on what happens when metals are slowly cooled. The falling temperature decreases the random motion of the atoms and lets them eventually line up in a regular crystalline structure with the least potential energy.

If we want to minimize f(x), where f ∈ R and x ∈ Rⁿ, the value of the function (called the objective function) is the analog of the energy level E. The temperature T is a variable that controls the jump from x to x + Δx. An annealing or cooling schedule is a predetermined temperature decrease, and the simplest is to let it fall at a fixed rate. A generating function g is the probability density of Δx. A Boltzmann machine has

    g = (2πT)^{−n/2} exp( −‖Δx‖² / 2T )
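A minimal sketch of the method for a one-dimensional minimization follows; the test function, the Gaussian jump distribution, and the cooling parameters are all illustrative choices, not from the text.

```python
import math
import random

def simulated_annealing(f, x0, t0=2.0, cooling=0.999, steps=20000, seed=1):
    """Sketch of simulated annealing for minimizing f(x): Gaussian jumps whose
    size shrinks with the temperature T, Metropolis acceptance of uphill moves
    with probability exp(-dE/T), and a fixed-rate cooling schedule."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        xn = x + rng.gauss(0.0, math.sqrt(t))   # generating function g
        fn = f(xn)
        if fn < fx or rng.random() < math.exp(-(fn - fx) / t):
            x, fx = xn, fn                      # accept the jump
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                            # annealing (cooling) schedule
    return best, fbest

# A function with a local minimum near x = 1.6 and the global one near x = -0.5;
# a pure hill-descender started at x = 3 would be trapped in the local well.
f = lambda x: x ** 2 + 4.0 * math.sin(3.0 * x)
x, fx = simulated_annealing(f, x0=3.0)
print(x, fx)
```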
5.2 Genetic algorithms
Generate an initial population of solutions x_i and evaluate the fitness f(x_i) of each.
Select pairs of solutions with probability according to the normalized fitness.
Apply crossover with a certain probability.
Apply mutation with a certain probability.
Apply elitism.
Apply the process to the new generation, and repeat as many times as necessary.
Evolutionary programming is very similar to GAs, except that only mutation is used.
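The selection, crossover, mutation and elitism steps listed above can be sketched as a generic loop; the bit-string representation, all parameter values, and the toy "one-max" fitness (count of 1-bits) are illustrative.

```python
import random

def genetic_algorithm(fitness, nbits=16, npop=20, pc=0.9, pm=0.02, gens=50, seed=2):
    """Sketch of a GA: fitness-proportionate (roulette) selection, one-point
    crossover with probability pc, bitwise mutation with probability pm,
    and elitism (the best individual survives unchanged)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(nbits)] for _ in range(npop)]
    for _ in range(gens):
        fits = [fitness(ind) for ind in pop]
        new = [pop[fits.index(max(fits))][:]]          # elitism
        while len(new) < npop:
            p1, p2 = rng.choices(pop, weights=fits, k=2)   # selection
            c1, c2 = p1[:], p2[:]
            if rng.random() < pc:                          # crossover
                cut = rng.randint(1, nbits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for c in (c1, c2):                             # mutation
                for i in range(nbits):
                    if rng.random() < pm:
                        c[i] = 1 - c[i]
            new += [c1, c2]
        pop = new[:npop]
    fits = [fitness(ind) for ind in pop]
    return pop[fits.index(max(fits))]

# Toy fitness: number of 1-bits; the GA should drive the string toward all ones.
best = genetic_algorithm(lambda ind: sum(ind) + 1e-9)
print(sum(best))
```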
[89] [88]
5.3 Genetic programming
In GPs [81], tree structures are used to represent computer programs. Crossover is then between
branches of the trees representing parts of the program, as in Fig. 5.1.
5.4 Applications
5.4.1 Noise control
[27]
Figure 5.1: Crossover in genetic programming.
5.4.2 Fin optimization
[32] [33] [34]
5.4.3 Electronic cooling
[74]
Problems
1. Use the Genetic Algorithm Optimization Toolbox (GAOT)¹ or any other free software to find the solution of the following problems:

¹ C. Houck, J. Joines, and M. Kay, A Genetic Algorithm for Function Optimization: A Matlab Implementation, NCSU-IE TR 95-09, 1995. It can be downloaded at the following URL: http://www.ie.ncsu.edu/mirage/GAToolBox/gaot/
(a) The maximum of the function

    f(x, y) = sin(2x) sin(2y) + cos(x) cos(y) + (3/2) exp(−50 [(x − 0.5)² + y²]),   (x, y) ∈ [−1, 1]²
(b) Consider the ellipse defined by the intersection of the surfaces x + y = 1 and x² + 2y² + z² = 1. Find the points on this ellipse that are furthest from and nearest to the origin.

Provide not only the solutions but also the salient parameters used and, if possible, the resulting population at a few specific generations.
Chapter 6
Expert and knowledge-based systems
[6, 30, 87, 91]
6.1 Basic theory
6.2 Applications
Chapter 7
Other topics
7.1 Hybrid approaches
7.2 Neurofuzzy systems
7.3 Fuzzy expert systems
[6]
7.4 Data mining
7.5 Measurements
[77]
Chapter 8
Electronic tools
Digital electronics and computers are essential to the practical use of intelligent systems in engineer-
ing. The hardware and software are continuously in a process of change.
8.1 Tools
8.1.1 Digital electronics
8.1.2 Mechatronics
[54, 68]
8.1.3 Sensors
8.1.4 Actuators
8.2 Computer programming
8.2.1 Basic
8.2.2 Fortran
8.2.3 LISP
8.2.4 C
8.2.5 Matlab
Programs can be written in the Matlab language. In many cases, however, it is possible within Matlab to use a Toolbox that is already written. Toolboxes for artificial neural networks, genetic algorithms, and fuzzy logic are available.
8.2.6 C++
8.2.7 Java
8.3 Computers
Workstations, mainframes, and high-performance computers are generally used for applications like CAD and intensive number crunching such as in CFD, FEM, etc. PCs also have many of the same functions but in addition do CAM and process control in manufacturing. Microprocessors are more special-purpose devices used in applications like embedded control and in places where low cost and small size are important.
8.3.1 Workstations
8.3.2 PCs
Languages such as LabVIEW are used.
8.3.3 Programmable logic devices
8.3.4 Microprocessors
Problems
1. This homework is intended to get you a little more familiar with programming in LabVIEW. For each of the problems there are many possible solutions, and each can be as easy, or as complicated, as you make it.
(a) Make a calculator that will, at a minimum, add, subtract, multiply, and divide two numbers. Feel free to add more functions.
(b) Use LabVIEW's waveform generators to generate a sine wave. On the front panel, include controls for the wave's amplitude, phase, and frequency, and plot the wave. Now add white noise to the signal and, using LabVIEW's analysis tools, calculate the FFT power spectrum of the signal. Include this graph on the front panel as well.
(c) Simulate data acquisition by assuming a sampling rate and sampling your favorite function. Take at least 200 data points and include, on the front panel, a control for the sampling rate and an X-Y graph of your sampled data.
Save each file as your-afs-id pr#.vi (e.g. jmayes pr1.vi) and, when finished with all three problems, email the files as attachments to [email protected]. Each file will then be downloaded and run. Files should not need instructions or additional functions or sub-.vis.
Chapter 9
Applications: heat transfer correlations
9.1 Genetic algorithms
See [72].
Evolutionary programming, of which genetic algorithms and programming are examples, allows programs to change or evolve as they compute. GAs, specifically, are based on the principle of Darwinian selection. One of their most important applications in the thermal sciences is in the area of optimization of various kinds.

Optimization by itself is fundamental to many applications. In engineering, for example, it is important to the design of systems; analysis permits the prediction of the behavior of a given system, but optimization is the technique that searches among all possible designs of the system to find the one that is best for the application. The importance of this problem has given rise to a wide variety of techniques which help search for the optimum. There are searches that are gradient-based and those that are not. In the former, the search for the optimum solution, as for example the maximum of a function of many variables, starts from some point and directs itself in an incremental fashion towards the optimum; at each stage the gradient of the function surface determines the direction of the search. Local optima can be found in this way, the search for a global optimum being more difficult. Again, if one visualizes a multi-variable function, it can have many peaks, any one of which can be approached by a hill-climbing algorithm. To find the highest of these peaks, the entire domain has to be searched; the narrower this peak, the finer the searching comb must be. For many applications this brute-force approach is too expensive in terms of computational time. Alternatives, like simulated annealing, have been proposed, and the GA is one of them.

In what follows we will provide an overview of the genetic algorithm and genetic programming. A numerical example will be explained in some detail. The methodology will be applied to one of the heat exchangers discussed before. There will be a discussion of other applications in thermal engineering, and comments will be made on potential uses in the future.
9.1.1 Methodology
GAs are discussed in detail by Holland (1975, 1992), Mitchell (1997), Goldberg (1989), Michalewicz (1992) and Chipperfield (1997). One of the principal advantages of this method is its ability to pick out a global extremum in a problem with multiple local extrema. For example, we can discuss finding the maximum of a function f(x) in a given domain a ≤ x ≤ b. In outline, the steps of the procedure are the following.

Figure 9.1: Distribution of fitnesses.
First, an initial population of n members x_1, x_2, . . ., x_n ∈ [a, b] is randomly generated.

Then, for each x a fitness is evaluated. The fitness or effectiveness is the parameter that determines how good the current x is in terms of being close to an optimum. Clearly, in this case the fitness is the function f(x) itself, since the higher the value of f(x) the closer we are to the maximum.

The probability distribution for the next generation is found based on the fitness values of each member of the population. Pairs of parents are then selected on the basis of this distribution.

The offspring of these parents are found by crossover and mutation. In crossover, two numbers in binary representation, for example, produce two others by interchanging part of their bits. After this, and based on a preselected probability, some bits are randomly changed from 0 to 1 or vice versa. Crossover and mutation create a new generation with a population that is more likely to be fitter than the previous generation.

The process is continued as long as desired or until the largest fitness in a generation does not change much any more.
The procedure can be easily generalized to a function of many variables.
Let us consider a numerical example that is shown in detail in Table 9.1. Suppose that one has to find the x at which f(x) = x(1 − x) is globally a maximum between 0 and 1. We have taken n = 6, meaning that each generation will have six numbers. Thus, for a start, 6 random numbers are selected between 0 and 1. Now we choose n_b, which is the number of bits used to represent a number in binary form. Taking n_b = 5, we can write the numbers in binary form normalized between 0 and the largest number possible for n_b bits, which is 2^{n_b} − 1 = 31. In one run the numbers chosen, and written down in the first column of the table labeled G = 0, are 25, 30, 28, 19, 3, and 1, respectively. The fitnesses of each one of the numbers, i.e. f(x), are computed and shown in column two. These values are normalized by their sum and shown in the third column as s(x). The normalized fitnesses are drawn on a roulette wheel in Figure 9.1. The probability of crossover is taken to be 100%, meaning that crossover will always occur. Pairs of numbers are chosen by spinning the wheel, the numbers having a bigger piece of the wheel having a larger probability of being selected. This produces column four, marked G = 1/4, and shuffling to produce random pairing gives column five, marked G = 1/2. The numbers are now split up in pairs, and crossover applied to each pair. The first pair [0 0 0 1 1] and [1 1 1 0 0] produces [0 0 0 1 0] and [1 1 1 0 1]. This is illustrated in Figure 9.2(a), where the crossover position is between the fourth and fifth bit; the bits to the right of this line are interchanged. Crossover positions in the other pairs are randomly selected. Crossover produces column six, marked as G = 3/4. Finally, one of the numbers, in this case the last number in the list [0 0 1 1 0], is mutated to [0 0 1 0 0] by changing one randomly selected bit from 1 to 0, as shown in Figure 9.2(b). From the numbers in generation G = 0, these steps have now produced a new generation G = 1. The process is repeated until the largest fitness in each generation increases no more. In this particular case, values within 3.22% of the exact value of x for maximum f(x), which is the best that can be done using 5 bits, were usually obtained within 10 generations.
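This example can be reproduced in code. The script below follows the same procedure (5-bit numbers normalized by 31, six members, roulette-wheel selection, crossover with probability 1, one mutated bit per generation), though its random draws will of course differ from the run tabulated in Table 9.1.

```python
import random

rng = random.Random(0)
NB = 5                                    # bits per number, normalized by 2**NB - 1 = 31
f = lambda x: x * (1.0 - x)               # function to maximize on 0 <= x <= 1
decode = lambda bits: int("".join(map(str, bits)), 2) / 31.0

pop = [[rng.randint(0, 1) for _ in range(NB)] for _ in range(6)]   # generation G = 0
best, fbest = None, -1.0
for gen in range(10):
    fits = [f(decode(ind)) for ind in pop]
    if max(fits) > fbest:                                          # track best x so far
        fbest = max(fits)
        best = decode(pop[fits.index(fbest)])
    # Roulette-wheel selection on normalized fitness, then crossover (pc = 1).
    parents = rng.choices(pop, weights=[s + 1e-9 for s in fits], k=6)
    pop = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = rng.randint(1, NB - 1)
        pop += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
    i, j = rng.randrange(6), rng.randrange(NB)                     # mutate one random bit
    pop[i][j] = 1 - pop[i][j]

print(best)   # best x found, limited by the 5-bit resolution around x = 0.5
```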
The genetic programming technique (Koza, 1992; Koza, 1994) is an extension of this procedure
in which computer codes take the place of numbers. It can be used in symbolic regression to search
G = 0 f(x) s(x) G = 1/4 G = 1/2 G = 3/4 G = 1
11001 0.1561 0.2475 00011 00011 00010 00010
11110 0.0312 0.0495 00011 11100 11101 11101
11100 0.0874 0.1386 11110 00011 10011 10011
10011 0.2373 0.3762 10011 10011 00011 00011
00011 0.0874 0.1386 00011 11110 11011 11011
00001 0.0312 0.0495 11100 00011 00110 00100
Table 9.1: Example of use of the genetic algorithm.
Figure 9.2: (a) Crossover and (b) mutation in a genetic algorithm.
within a set of functions for the one which best fits experimental data. The procedure is similar to that for the GA, except for the crossover operation. If each function is represented in tree form, though not necessarily of the same length, crossover can be achieved by cutting and grafting. As an example, Figure 9.3 shows the result of the operation on the two functions 3x(x + 1) and x(3x + 1) to give 3x(3x + 1) and x(x + 1). The crossover points may be different for each parent.
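The cut-and-graft operation of Figure 9.3 can be sketched with expression trees stored as nested tuples; the path-based helper below is an illustrative implementation, not from the text.

```python
# Expression trees as nested tuples: (op, left, right), with 'x' and numbers as leaves.
p1 = ('*', ('*', 3, 'x'), ('+', 'x', 1))    # 3x(x+1)
p2 = ('*', 'x', ('+', ('*', 3, 'x'), 1))    # x(3x+1)

def crossover(t1, t2, path1, path2):
    """Swap the subtree of t1 at path1 with the subtree of t2 at path2.
    A path is a sequence of child indices (1 = left operand, 2 = right)."""
    def get(t, path):
        for i in path:
            t = t[i]
        return t
    def put(t, path, sub):
        if not path:
            return sub
        t = list(t)
        t[path[0]] = put(t[path[0]], path[1:], sub)
        return tuple(t)
    return put(t1, path1, get(t2, path2)), put(t2, path2, get(t1, path1))

# Exchange the second factors: (x+1) in p1 and (3x+1) in p2.
c1, c2 = crossover(p1, p2, (2,), (2,))
print(c1)   # ('*', ('*', 3, 'x'), ('+', ('*', 3, 'x'), 1)), i.e. 3x(3x+1)
print(c2)   # ('*', 'x', ('+', 'x', 1)), i.e. x(x+1)
```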
9.1.2 Applications to compact heat exchangers
The following analysis is on the basis of data collected on a single-row heat exchanger referred to as heat exchanger 1 in Section 2.2. In the following, a set of N = 214 experimental runs provided the data base. The heat rate is determined by

    Q = ṁ_a c_{p,a} (T_a^out − T_a^in)    (9.1)
      = ṁ_w c_w (T_w^in − T_w^out)        (9.2)

For prediction purposes we will use functions of the type

    Q = q(T_w^in, T_a^in, ṁ_a, ṁ_w)       (9.3)
The conventional way of correlating data is to determine correlations for inner and outer heat transfer coefficients. For example, power laws of the following form

    Nu_a = a Re_a^m Pr_a^{1/3}    (9.4)
    Nu_w = b Re_w^n Pr_w^{0.3}    (9.5)

are common. The two Nusselt numbers provide the heat transfer coefficients on each side, and the overall heat transfer coefficient, U, is related to h_a and h_w by

    1/(U A_a) = 1/(h_w A_w) + 1/(h_a A_a)    (9.6)
Figure 9.3: Crossover in genetic programming. Parents are 3x(x + 1) and x(3x + 1); offspring are 3x(3x + 1) and x(x + 1).
Figure 9.4: Section of the S_U(a, b, m, n) surface.
Figure 9.5: Ratio of the predicted air- and water-side Nusselt numbers.
To find the constants a, b, m, n, the mean square error

    S_U = (1/N) Σ (1/U_p − 1/U_e)²    (9.7)

must be minimized, where N is the number of experimental data sets, U_p is the prediction made by the power-law correlation, and U_e is the experimental value for that run. The sum is over all N runs.

This procedure was carried out for the data collected. It was found that S_U had local minima for many different sets of the constants, the following two being examples.
Correlation a b m n
A 0.1018 0.0299 0.591 0.787
B 0.0910 0.0916 0.626 0.631
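Evaluating S_U of eq. (9.7) from eqs. (9.4)-(9.6) can be sketched as follows. The Reynolds and Prandtl numbers, conductivities, diameters, and areas below are hypothetical stand-ins for the experimental data set; synthetic "experimental" values are generated from correlation A so the code is self-checking.

```python
import numpy as np

def u_pred(c, d):
    """Overall heat transfer coefficient U predicted by the power-law
    correlations (9.4)-(9.5) combined through eq. (9.6). The property
    values in d (conductivities k, diameters D, areas A) are hypothetical."""
    a, b, m, n = c
    h_a = (a * d['Re_a'] ** m * d['Pr_a'] ** (1.0 / 3.0)) * d['k_a'] / d['D_a']
    h_w = (b * d['Re_w'] ** n * d['Pr_w'] ** 0.3) * d['k_w'] / d['D_w']
    return 1.0 / (d['A_a'] * (1.0 / (h_w * d['A_w']) + 1.0 / (h_a * d['A_a'])))

def s_u(c, d):
    # Mean square error of eq. (9.7) over the N experimental runs.
    return np.mean((1.0 / u_pred(c, d) - 1.0 / d['U_e']) ** 2)

rng = np.random.default_rng(0)
d = {'Re_a': rng.uniform(1e3, 1e4, 50), 'Pr_a': np.full(50, 0.7),
     'Re_w': rng.uniform(2e3, 2e4, 50), 'Pr_w': np.full(50, 5.0),
     'k_a': 0.026, 'D_a': 0.01, 'k_w': 0.6, 'D_w': 0.01,
     'A_a': 1.0, 'A_w': 0.2}
d['U_e'] = u_pred((0.1018, 0.0299, 0.591, 0.787), d)    # synthetic data from correlation A
print(s_u((0.1018, 0.0299, 0.591, 0.787), d))           # zero by construction
print(s_u((0.0910, 0.0916, 0.626, 0.631), d))           # correlation B on the same data
```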
Figure 9.4 shows a section of the S_U surface that passes through the two minima A and B. The coordinate z is a linear combination of the constants a, b, m and n such that it is zero and unity at the two minima. Though the values of S_U for the two correlations are very similar, and the heat rate predictions for the two correlations are also almost equally accurate, the predictions of the thermal resistances on either side are different. Figure 9.5 shows the ratio of the predicted air- and water-side Nusselt numbers using these two correlations. R_a is the ratio of the Nusselt number on the air side predicted by Correlation A divided by that predicted by Correlation B. R_w is the same value for the water side. The predictions, particularly the one on the water side, are very different.

There are several reasons for this multiplicity of minima of S_U. Experimentally, it is very difficult to measure the temperature at the wall separating the two fluids, or even to specify where it should be measured, and mathematically, it is due to the nonlinearity of the function to be minimized. This raises the question as to which of the local minima is the correct one. A possible conclusion is that the one which gives the smallest value of the function should be used. This leads to the search for the global minimum, which can be done using the GA.
For this data, Pacheco-Vega et al. (1998) conducted a global search among a proposed set of heat transfer correlations using the GA. The experimentally determined heat rate of the heat exchanger was correlated with the flow rates and input temperatures, with all values being normalized. To reduce the number of possibilities, the total thermal resistance was correlated with the mass flow rates in the form

    (T_w^in − T_a^in) / Q = f(ṁ_a, ṁ_w)    (9.8)
The functions f(ṁ_a, ṁ_w) that were used are indicated in Table 9.2. The GA was used to seek the values of the constants associated with each correlation, the objective being to minimize the variance

    S_Q = (1/N) Σ (Q_p − Q_e)²    (9.9)
Correlation            f                                              a        b         c        d        S_Q
Power law              a ṁ_w^b + c ṁ_a^d                              0.1875   0.9997    0.5722   0.5847   0.0252
Inverse linear         (a + b ṁ_w)^{−1} + (c + d ṁ_a)^{−1}            0.0171   5.3946    0.4414   1.3666   0.0326
Inverse exponential    (a + e^{b ṁ_w})^{−1} + (c + e^{d ṁ_a})^{−1}    0.9276   3.8522    0.4476   0.6097   0.0575
Exponential            a e^{b ṁ_w} + c e^{d ṁ_a}                      3.4367   6.8201    1.7347   0.8398   0.0894
Inverse quadratic      (a + b ṁ_w²)^{−1} + (c + d ṁ_a²)^{−1}          0.2891   20.3781   0.7159   0.7578   0.0859
Inverse logarithmic    (a + b ln ṁ_w)^{−1} + (c + d ln ṁ_a)^{−1}      0.4050   0.0625    0.5603   0.2048   0.1165
Logarithmic            a − b ln ṁ_w − c ln ṁ_a                        0.6875   0.4714    0.4902            0.1664
Linear                 a − b ṁ_w − c ṁ_a                              2.3087   0.8533    0.8218            0.2118
Quadratic              a − b ṁ_w² − c ṁ_a²                            1.8229   0.6156    0.5937            0.2468

Table 9.2: Comparison of best fits for different correlations.
Figure 9.6: Experimental vs. predicted normalized heat flow rates for a power-law correlation. The straight line is the line of equality between prediction and experiment, and the broken lines are ±10%.
where the sum is over all N runs, between the predictions of a correlation, Q_p, and the actual experimental values, Q_e. Since the unknowns are the set of constants a, b, c and sometimes d, a single binary string represents them; the first part of the string is a, the next is b, and so on. The rest of the GA is as in the numerical example given before. The results obtained for each correlation are also summarized in the table in descending order of S_Q. The last column shows the mean square error defined in a manner similar to equations (9.19)-(9.20). The parameters used for the computations are: population size 20, number of generations 1000, bits for each variable 30, probability of crossover 1, and probability of mutation 0.03.
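The encoding described here (one binary string, the first field representing a, the next b, and so on) can be sketched as below. The 30-bit fields follow the parameters quoted above, while the search interval [−10, 10] is an assumed range, not from the original study.

```python
def decode(bits, nvars, lo=-10.0, hi=10.0, nb=30):
    """Split a single bit string into nvars fields of nb bits each and map
    every field linearly onto [lo, hi]; the first field is a, the next b, etc.
    The interval [lo, hi] is an assumed search range for illustration."""
    out = []
    for v in range(nvars):
        field = bits[v * nb:(v + 1) * nb]
        x = int("".join(map(str, field)), 2)         # integer value of the field
        out.append(lo + (hi - lo) * x / (2 ** nb - 1))
    return out

# A string of 3 x 30 bits: all zeros maps to lo, all ones to hi,
# and a leading 1 followed by zeros to roughly the midpoint.
bits = [0] * 30 + [1] * 30 + [1] + [0] * 29
a, b, c = decode(bits, 3)
print(a, b, c)
```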
Some correlations are clearly seen to be superior to others. However, the difference in S_Q between the first- and second-place correlations, the power-law and inverse logarithmic, which have mean errors of 2.5% and 3.3% respectively, is only about 8%, indicating that either could do just as well in predictions even though their functional forms are very different. In fact, the mean error in many of the correlations is quite acceptable. Figure 9.6 shows the predictions of the power-law correlation versus the experimental values, all in normalized variables. The prediction is seen to be very good. The quadratic correlation, on the other hand, is the worst in the set of correlations considered, and Figure 9.7 shows its predictions. It must also be remarked that, because of the random numbers used in the procedure, the computer program gives slightly different results each time it is run, changing the lineup of the less appropriate correlations somewhat.
Figure 9.7: Experimental vs. predicted normalized heat flow rates for a quadratic correlation. The straight line is the line of equality between prediction and experiment, and the broken lines are ±10%.
9.1.3 Additional applications in thermal engineering
Though the GA is a relatively new technique in relation to its application to thermal engineering, there are a number of different applications that have already been successful. Davalos and Rubinsky (1996) adopted an evolutionary-genetic approach for numerical heat-transfer computations. Shape optimization is another area that has been developed. Fabbri (1997) used a GA to determine the optimum shape of a fin. The two-dimensional temperature distribution for a given fin shape was found using a finite-element method. The fin shape was proposed as a polynomial, the coefficients of which have to be calculated. The fin was optimized for polynomials of degree 1 through 5. Von Wolfersdorf et al. (1997) did shape optimization of cooling channels using GAs. The design procedure is inherently an optimization process. Androulakis and Venkatasubramanian (1991) developed a methodology for design and optimization that was applied to heat exchanger networks; the proposed algorithm was able to locate solutions where gradient-based methods failed. Abdel-Magid and Dawoud (1995) optimized the parameters of an integral and a proportional-plus-integral controller of a reheat thermal system with GAs. The fact that the GA can be used to optimize in the presence of variables that take on discrete values was put to advantage by Schmit et al. (1996), who used it for the design of a compact high-intensity cooler. The placing of electronic components as heat sources is a problem that has become very important recently from the point of view of computers. Queipo et al. (1994) applied GAs to the optimized cooling of electronic components. Tang and Carothers (1996) showed that the GA worked better than some other methods for the optimum placement of chips. Queipo and Gil (1997) worked on the multiobjective optimization of component placement and presented a solution methodology for the collocation of convectively and conductively air-cooled electronic components on planar printed wiring boards. Meysenc et al. (1997) studied the optimization of microchannels for the cooling of high-power transistors. Inverse problems may also involve the optimization of the solution. Allred and Kelly (1992) modified the GA for extracting thermal profiles from infrared image data, which can be useful for the detection of malfunctioning electronic components. Jones et al. (1995) used thermal tomographic methods for the detection of inhomogeneities in materials by finding local variations in the thermal conductivity. Raudensky et al. (1995) used the GA in the solution of inverse heat conduction problems. Okamoto et al. (1996) reconstructed a three-dimensional density distribution from limited projection images with the GA. Wood (1996) studied an inverse thermal field problem based on noisy measurements and compared a GA and the sequential function specification method. Li and Yang (1997) used a GA for inverse radiation problems. Castrogiovanni and Sforza (1996, 1997) studied high heat flux flow boiling systems using a numerical method in which the boiling-induced turbulent eddy diffusivity term was used with an adaptive GA closure scheme to predict the partial nucleate boiling regime.
Applications involving genetic programming are rarer. Lee et al. (1997) studied the problem of correlating the CHF for upward water flow in vertical round tubes under low-pressure and low-flow conditions. Two sets of independent parameters were tested. Both sets included the tube diameter, fluid pressure and mass flux. The inlet condition type had, in addition, the heated length and the subcooling enthalpy; the local condition type had the critical quality. Genetic programming was used as a symbolic regression tool. The parameters were non-dimensionalized; logarithms were taken of the parameters that were very small. The fitness function was defined as the mean square difference between the predicted and experimental values. The four arithmetical operations (addition, subtraction, multiplication and division) were used to generate the proposed correlations. The programs ran up to 50 generations and produced 20 populations in each generation. In a first attempt, 90% of the data sets was randomly selected for training and the rest for testing. Since no significant difference was found in the error for each of the sets, the entire data set was finally used both for training and testing. The final correlations that were found had predictions better than those in the literature. The advantage of the genetic programming method in seeking an optimum functional form was exploited in this application.
9.1.4 General discussion
The evolutionary programming method has the advantage that, unlike the ANN, a functional form of the relationship is obtained. Genetic algorithms, genetic programming and symbolic regression are relatively new techniques from the perspective of thermal engineering, and we can only expect the applications to grow. There are a number of areas in prediction, control and design in which these techniques can be effectively used. One of these, in which progress can be expected, is in thermal-hydronic networks. Networks are complex systems built up from a large number of simple components; though the behavior of each component may be well understood, the behavior of the network requires massive computations that may not be practical. Optimization of networks is an important issue from the perspective of design, since it is not obvious what the most energy-efficient network, given certain constraints, should be. The constraints are usually in the form of the locations that must be served and the range of thermal loads that are needed at each position. A search methodology based on the calculation of every possible network configuration would be very expensive in terms of computational time. An alternative based on evolutionary techniques would be much more practical. Under this procedure, a set of networks that satisfy the constraints would be proposed as candidates for the optimum. From this set a new and fitter generation would evolve, and the process would be repeated until the design does not change much. The definition of fitness, for this purpose, would be based on the energy requirements of the network.
9.2 Artificial neural networks
See [29]. In this section we will discuss the ANN technique, which is generally considered to be a sub-class of AI, and its application to the analysis of complex thermal systems. Applications of ANNs have been found in such diverse fields as philosophy, psychology, business and economics, sociology, and science, as well as in engineering. The common denominator is the complexity of the field.
The technique is rooted in and inspired by the biological network of neurons in the human brain that learns from external experience, handles imprecise information, stores the essential characteristics of the external input, and generalizes previous experience (Eeckman, 1992). In the biological network of interconnecting neurons, each receives many input signals from other neurons and gives only one output signal, which is sent to other neurons as part of their inputs. If the sum of the inputs to a given neuron exceeds a set threshold, normally determined by the electric potential of the receiver neuron, which may be modified under different circumstances, the neuron fires and sends a signal to all the connected receiver neurons. If not, the signal is not transmitted. The firing decision represents the key to the learning and memory ability of the neural network.

The ANN attempts to mimic the biological neural network: the processing unit is the artificial neuron; it has synapses or inter-neuron connections characterized by synaptic weights; an operator performs a summation of the input signals weighted by the respective synapses; an activation function limits the permissible amplitude range of the output signal. It is also important to realize the essential difference between a biological neural network and an ANN. Biological neurons function much slower
than the computer calculations associated with an artificial neuron in an ANN. On the other hand, the delivery of information across the biological neural network is much faster. The biological one compensates for the relatively slow chemical reactions in a neuron by having an enormous number of interconnected neurons doing massively parallel processing, while the number of artificial neurons must necessarily be limited by the available hardware.
In this section we will briefly discuss the basic principles and characteristics of the multilayer ANN, along with the details of the computations made in the feedforward mode and the associated backpropagation algorithm which is used for training. Issues related to the actual implementation of the algorithm will also be noted and discussed. Specific examples of the performance of two different compact heat exchangers analyzed by the ANN approach will then be shown, followed by a discussion on how the technique can also be applied to the dynamic performance of heat exchangers as well as to their control in real thermal systems. Finally, the potential of applying similar ANN techniques to other thermal-system problems and their specific advantages will be delineated.
9.2.1 Methodology
The interested reader is referred to the text by Haykin (1994) for an account of the history of the ANN and its mathematical background. Many different definitions of ANNs are possible; the one proposed by Schalkoff (1997) is that an ANN is a network composed of a number of artificial neurons. Each neuron has an input/output characteristic and implements a local computation or function. The output of any neuron is determined by this function, its interconnection with other neurons, and external inputs. The network usually develops an overall functionality through one or more forms of training; this is the learning process. Many different network structures and configurations have been proposed, along with their own methodologies of training (Warwick et al., 1992).
Feedforward network
There are many different types of ANNs, but one of the most appropriate for engineering applications is the supervised, fully-connected, multilayer configuration (Zeng, 1998), in which learning is accomplished by comparing the output of the network with the data used for training. The feedforward or multilayer perceptron is the only configuration that will be described in some detail here. Figure 9.8 shows such an ANN consisting of a series of layers, each with a number of nodes. The first and last layers are for input and output, respectively, while the others are the hidden layers. The network is said to be fully-connected when any node in a given layer is connected to all the nodes in the adjacent layers.
We introduce the following notation: (i, j) is the jth node in the ith layer. The line connecting a node (i, j) to another node in the next layer i + 1 represents the synapse between the two nodes. x_{i,j} is the input of the node (i, j), y_{i,j} is its output, θ_{i,j} is its bias, and w_{i−1,k}^{i,j} is the synaptic weight between nodes (i − 1, k) and (i, j). The total number of layers, including those for input and output, is I, and the number of nodes in the ith layer is J_i. The input information is propagated forward through the network; J_1 values enter the network and J_I leave. The flow of information through the layers is a function of the computational processing occurring at every internal node in the network. The relation between the output of node (i − 1, k) in one layer and the input of node (i, j) in the following layer is

    x_{i,j} = θ_{i,j} + Σ_{k=1}^{J_{i−1}} w_{i−1,k}^{i,j} y_{i−1,k}    (9.10)

Thus the input x_{i,j} of node (i, j) consists of a sum of all the outputs from the previous nodes, modified by the respective inter-node synaptic weights w_{i−1,k}^{i,j}, and a bias θ_{i,j}. The weights are characteristic
9.2. Artificial neural networks 81

[Figure 9.8 appears here: layers i = 1, 2, ..., I, each with nodes j = 1, ..., J_i, connected by synaptic weights such as w^{2,1}_{1,1}, w^{2,2}_{1,1}, w^{2,3}_{1,1}.]
Figure 9.8: Schematic of a fully-connected multilayer ANN.
of the connection between the nodes, and the bias of the node itself. The bias represents the
propensity for the combined incoming input to trigger a response from the node and presents a
degree of freedom which gives additional flexibility in the training process. Similarly, the synaptic
weights are the weighting functions which determine the relative importance of the signals originating
from the previous nodes.
The input and output of the node (i, j) are related by

y_{i,j} = φ_{i,j}(x_{i,j})    (9.11)

where φ_{i,j}(x), called the activation or threshold function, plays the role of the biological neuron in
determining whether it should fire or not on the basis of the input to that neuron. A schematic of the
nodal operation is shown in Figure 9.9. It is obvious that the activation function plays a central role
in the processing of information through the ANN. Keeping in mind the analogy with the biological
neuron, when the input signal is small, the neuron suppresses the signal altogether, resulting in a
vanishing output, and when the input exceeds a certain threshold, the neuron fires and sends a signal
to all the neurons in the next layer. This behavior is determined by the activation function. Several
appropriate activation functions have been studied (Haykin, 1994; Schalkoff, 1997). For instance, a
simple step function can be used, but the presence of non-continuous derivatives causes computing
difficulties. The most popular one is the logistic sigmoid function

φ_{i,j}(ξ) = 1 / (1 + e^{-ξ/c})    (9.12)

for i > 1, where c determines the steepness of the function. For i = 1, φ_{i,j}(ξ) = ξ is used instead.
The sigmoid function is an approximation to the step function, but with continuous derivatives.
82 9. Applications: heat transfer correlations
Figure 9.9: Nodal operation in an ANN.
The nonlinear nature of the sigmoid function is particularly beneficial in the simulation of practical
problems. For any input x_{i,j}, the output of a node y_{i,j} always lies between 0 and 1. Thus, from
a computational point of view, it is desirable to normalize all the input and output data with the
largest and smallest values of each of the data sets.
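The feedforward pass of equations (9.10)-(9.12) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; all function and variable names are chosen for the example rather than taken from the text:

```python
import numpy as np

def sigmoid(x, c=1.0):
    """Logistic sigmoid activation function of equation (9.12)."""
    return 1.0 / (1.0 + np.exp(-x / c))

def feedforward(inputs, weights, biases, c=1.0):
    """Propagate input values forward through the network.

    Each entry of weights maps one layer to the next and has shape
    (nodes in next layer, nodes in previous layer); biases holds one
    vector per non-input layer.  The input layer uses the identity.
    """
    y = np.asarray(inputs, dtype=float)   # layer 1 output: phi(x) = x
    for W, b in zip(weights, biases):
        x = b + W @ y                     # equation (9.10)
        y = sigmoid(x, c)                 # equations (9.11)-(9.12)
    return y

# Example: a 4-5-1 network with weights and biases in (-1, 1)
rng = np.random.default_rng(0)
sizes = [4, 5, 1]
weights = [rng.uniform(-1, 1, (n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.uniform(-1, 1, n) for n in sizes[1:]]
out = feedforward([0.2, 0.5, 0.3, 0.7], weights, biases)
print(out)  # a single value in (0, 1)
```

Because the output layer also uses the sigmoid here, every network output lies in (0, 1), which is why the data must be normalized as described above.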
Training
For a given network, the weights and biases must be adjusted for known input-output values through
a process known as training. The back-propagation method is a widely-used deterministic training
algorithm for this type of ANN (Rumelhart et al., 1986). The central idea of this method is to
minimize an error function by the method of steepest descent, adding small changes in the direction of
minimization. This algorithm may be found in many recent texts on ANNs (for instance, Rzempoluck,
1998), and only a brief outline will be given here.
In usual complex thermal-system applications where no physical models are available, the
appropriate training data come from experiments. The first step in the training algorithm is to
assign initial values to the synaptic weights and biases in the network based on the chosen ANN
configuration. The values may be either positive or negative and, in general, are taken to be less
than unity in absolute value. The second step is to initiate the feedforward of information starting
from the input layer. In this manner, successive input and output of each node in each layer can all
be computed. When finally i = I, the value of y_{I,j} will be the output of the network. Training of
the network consists of modifying the synaptic weights and biases until the output values differ little
from the experimental data which are the targets. This is done by means of the back-propagation
method. First an error δ_{I,j} is quantified by

δ_{I,j} = (t_{I,j} - y_{I,j}) y_{I,j} (1 - y_{I,j})    (9.13)

where t_{I,j} is the target output for the j-node of the last layer. The above equation is simply a
finite-difference approximation of the derivative of the sigmoid function. After calculating all the
δ_{I,j}, the computation then moves back to layer I-1. Since the target outputs for this layer do
not exist, a surrogate error is used instead for this layer, defined as

δ_{I-1,k} = y_{I-1,k} (1 - y_{I-1,k}) Σ_{j=1}^{J_I} δ_{I,j} w^{I,j}_{I-1,k}    (9.14)
A similar error δ_{i,j} is used for all the rest of the inner layers. These calculations are then continued
layer by layer backward until layer 2. It is seen that the nodes of the first layer have neither δ
nor θ values assigned, since the input values are all known and invariant. After all the errors δ_{i,j}
are known, the changes in the synaptic weights and biases can then be calculated by the generalized
delta rule (Rumelhart et al., 1986):

Δw^{i,j}_{i-1,k} = η δ_{i,j} y_{i-1,k}    (9.15)

Δθ_{i,j} = η δ_{i,j}    (9.16)

for i < I, from which all the new weights and biases can be determined. The quantity η is known as
the learning rate that is used to scale down the degree of change made to the nodes and connections.
The larger the training rate, the faster the network will learn, but the chances of the ANN reaching
the desired outcome may become smaller as a result of possible oscillating error behaviors. Small
training rates would normally imply the need for longer training to achieve the same accuracy. Its
value, usually around 0.4, is determined by numerical experimentation for any given problem.
A cycle of training consists of computing a new set of synaptic weights and biases successively
for all the experimental runs in the training data. The calculations are then repeated over many
cycles while recording an error quantity E for a given run within each cycle, where
E = (1/2) Σ_{j=1}^{J_I} (t_{I,j} - y_{I,j})²    (9.17)
The output error of the ANN at the end of each cycle can be based on either a maximum or an averaged
value for a given cycle. Note that the weights and biases are continuously updated throughout the
training runs and cycles. The training is terminated when the error of the last cycle, barring the
existence of local minima, falls below a prescribed threshold. The final set of weights and biases
can then be used for prediction purposes, and the corresponding ANN becomes a model of the
input-output relation of the thermal-system problem.
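The back-propagation steps of equations (9.13)-(9.17) can be collected into a single training cycle. The sketch below is a minimal NumPy illustration of the algorithm as outlined here, not the code used for the results in this chapter; the learning rate η appears as eta, and all names are illustrative:

```python
import numpy as np

def sigmoid(x, c=1.0):
    """Logistic sigmoid of equation (9.12)."""
    return 1.0 / (1.0 + np.exp(-x / c))

def train_cycle(runs, weights, biases, eta=0.4, c=1.0):
    """One cycle of back-propagation over all training runs.

    runs is a list of (input, target) pairs; weights and biases are
    updated in place by the delta rule (9.15)-(9.16).  The per-run
    errors E of equation (9.17) are returned.
    """
    errors = []
    for inp, target in runs:
        # feedforward, keeping every layer's outputs
        ys = [np.asarray(inp, dtype=float)]
        for W, b in zip(weights, biases):
            ys.append(sigmoid(b + W @ ys[-1], c))
        t = np.asarray(target, dtype=float)
        errors.append(0.5 * np.sum((t - ys[-1]) ** 2))       # eq. (9.17)
        delta = (t - ys[-1]) * ys[-1] * (1.0 - ys[-1])       # eq. (9.13)
        for i in reversed(range(len(weights))):
            W_old = weights[i].copy()
            weights[i] += eta * np.outer(delta, ys[i])       # eq. (9.15)
            biases[i] += eta * delta                         # eq. (9.16)
            if i > 0:
                # surrogate error for the inner layers, eq. (9.14)
                delta = ys[i] * (1.0 - ys[i]) * (W_old.T @ delta)
    return errors

# Example: repeated cycles for a small 2-3-1 network
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.uniform(-1, 1, (n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.uniform(-1, 1, n) for n in sizes[1:]]
runs = [([0.2, 0.8], [0.3]), ([0.6, 0.4], [0.7])]
for cycle in range(1000):
    E = train_cycle(runs, weights, biases)
print(sum(E))  # total error after the last cycle
```

Note that, as in the text, the old weights are used in equation (9.14) even though the update of equations (9.15)-(9.16) has already been applied to that layer.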
Implementation issues
In the implementation of a supervised fully-connected multilayered ANN, the user is faced with
several uncertain choices, which include the number of hidden layers, the number of nodes in each layer,
the initial assignment of weights and biases, the training rate, the minimum number of training data
sets and runs, the learning rate, and the range within which the input-output data are normalized.
Such choices are by no means trivial, and yet are rather important in achieving good ANN results.
Since there is no general sound theoretical basis for specific choices, past experience and numerical
experimentation are still the best guides, despite the fact that much research is now going on to
provide a rational basis (Zeng, 1998).
On the issue of the number of hidden layers, there is a sufficient, but certainly not necessary,
theoretical basis known as Kolmogorov's mapping neural network existence theorem as presented
by Hecht-Nielsen (1987), which essentially stipulates that only one hidden layer of artificial neurons
is sufficient to model the input-output relations as long as the hidden layer has 2J_1 + 1 nodes. Since
in realistic problems involving a large set of input parameters, the nodes in the hidden layer would
be excessive to satisfy this requirement, the general practice is to use two hidden layers as a starting
point, and then to add more layers as the need arises, while keeping a reasonable number of nodes
in each layer (Flood and Kartam, 1994).
A slightly better situation is in the choice of the number of nodes in each layer and in the entire
network. Increasing the number of internal nodes provides a greater capacity to fit the training data.
In practice, however, too many nodes suffer the same fate as the polynomial curve-fitting routine
by collocation at specific data points, in which the interpolations between data points may lead to
large errors. In addition, a large number of internal nodes slows down the ANN both in training
and in prediction. One interesting suggestion given by Rogers (1994) and Jenkins (1995) is that

N_t = 1 + N_n (J_1 + J_I + 1) / J_I    (9.18)

where N_t is the number of training data sets, and N_n is the total number of internal nodes in the
network. If N_t, J_1 and J_I are known in a given problem, the above equation determines the suggested
minimum number of internal nodes. Also, if N_n, J_1 and J_I are known, it gives the minimum value
of N_t. The number of data sets used should be larger than that given by this equation to insure
the adequate determination of the weights and biases in the training process. Other suggested
procedures for choosing the parameters of the network include the one proposed by Karmin (1990)
of first training a relatively large network that is then reduced in size by removing nodes which do not
significantly affect the results, and the so-called Radial-Gaussian system which adds hidden neurons
to the network in an automatic, sequential and systematic way during the training process (Gagarin
et al., 1994). Also available is the use of evolutionary programming approaches to optimize ANN
configurations (Angeline et al., 1994). Some authors (see, for example, Thibault and Grandjean,
1991) present studies of the effect of varying these parameters.
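Equation (9.18) is easy to apply in either direction. A small sketch in Python, assuming the equation reads N_t = 1 + N_n (J_1 + J_I + 1)/J_I; the function names are hypothetical:

```python
def min_training_sets(N_n, J_1, J_I):
    """Equation (9.18): suggested minimum number of training sets N_t
    for a network with N_n internal nodes, J_1 inputs and J_I outputs."""
    return 1 + N_n * (J_1 + J_I + 1) / J_I

def min_internal_nodes(N_t, J_1, J_I):
    """Equation (9.18) solved for N_n, given N_t training sets."""
    return (N_t - 1) * J_I / (J_1 + J_I + 1)

# Example: J_1 = 4 inputs, J_I = 1 output (as for heat exchanger 1)
print(min_training_sets(11, 4, 1))   # 11 internal nodes -> N_t = 67.0
```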
The issue of assigning the initial synaptic weights and biases is less uncertain. Despite the
fact that better initial guesses would require less training effort, or even less training data, such
initial guesses are generally unavailable in applying the ANN analysis to a new problem. The
initial assignment then normally comes from a random number generator of bounded numbers.
Unfortunately, this does not guarantee that the training will converge to the final weights and biases
for which the error is a global minimum. Also, the ANN may take a large number of training cycles
to reach the desired level of error. Wessels and Barnard (1992), Drago and Ridella (1992) and
Lehtokangas et al. (1995) suggested other methods for determining the initial assignment so that
the network converges faster and avoids local minima. On the other hand, when the ANN needs
upgrading by additional or new experimental data sets, the initial weights and biases are simply the
existing ones.
During the training process, the weights and biases continuously change as training proceeds
in accordance with equations (9.15) and (9.16), which are the simplest correction formulae to use.
Other possibilities, however, are also available (Kamarthi, 1992). The choice of the training rate η
is largely by trial. It should be selected to be as large as possible, but not so large as to lead to non-convergent
oscillatory error behaviors. Finally, since the sigmoid function has the asymptotic limits
of [0,1] and may thus cause computational problems in these limits, it is desirable to normalize all
physical variables into a more restricted range such as [0.15, 0.85]. The choice is somewhat arbitrary.
However, pushing the limits closer to [0,1] does commonly produce more accurate training results
at the expense of larger computational efforts.
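The normalization into [0.15, 0.85] mentioned above is a simple linear map of each data set onto that range using its own extremes; a sketch, assuming NumPy and with illustrative names:

```python
import numpy as np

def normalize(x, lo=0.15, hi=0.85):
    """Map a data set linearly onto [lo, hi] using its own extremes."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def denormalize(y, xmin, xmax, lo=0.15, hi=0.85):
    """Invert the mapping, returning to physical units."""
    return xmin + (np.asarray(y, dtype=float) - lo) * (xmax - xmin) / (hi - lo)

T = np.array([20.0, 35.0, 50.0])   # e.g. inlet temperatures in deg C
print(normalize(T))                # [0.15 0.5  0.85]
```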
9.2.2 Application to compact heat exchangers
In this section the ANN analysis will be applied to the prediction of the performance of two different
types of compact heat exchangers, one being a single-row fin-tube heat exchanger (called heat
exchanger 1), and the other a much more complicated multi-row multi-column fin-tube heat exchanger
(heat exchanger 2). In both cases, air is either heated or cooled on the fin side by water flowing
inside the serpentine tubes. Except at the tube ends, the air is in a cross-flow configuration. Details
of the analyses are available in the literature (Díaz et al., 1996, 1998, 1999; Pacheco-Vega et al.,
1999). For either heat exchanger, the normal practice is to predict the heat transfer rates by using
separate dimensionless correlations for the air- and water-side coefficients of heat transfer based on
the experimental data and definitions of specific temperature differences.
Heat exchanger 1
The simpler single-row heat exchanger, a typical example being shown in Figure 9.10, is treated first.
It is a nominal 18 in. × 24 in. plate-fin-tube type manufactured by the Trane Company with a single
circuit of 12 tubes connected by bends. The experimental data were obtained in a variable-speed
open wind-tunnel facility shown schematically in Figure 9.11. A PID-controlled electrical resistance
heater provides hot water, and its flow rate is measured by a turbine flow meter. All temperatures
are measured by Type T thermocouples. Additional experimental details can be found in the thesis
Figure 9.10: Schematic of compact heat exchanger 1.
Figure 9.11: Schematic arrangement of test facility; (1) centrifugal fan, (2) flow straightener, (3)
heat exchanger, (4) Pitot-static tube, (5) screen, (6) thermocouple, (7) differential pressure gage,
(8) motor. View A-A shows the placement of five thermocouples.
by Zhao (1995). A total of N = 259 test runs were made, of which only the data for N_t = 197 runs
were used for training, while the rest were used for testing the predictions. It is advisable to include
the extreme cases in the training data sets so that the predictions will be within the same range.
For the ANN analysis, there are four input nodes, each corresponding to the normalized
quantities: air flow rate m_a, water flow rate m_w, inlet air temperature T_a^in, and inlet water temperature
T_w^in. There is a single output node for the normalized heat transfer rate Q. Normalization of the
variables was done by limiting them within the range [0.15, 0.85]. Coefficients of heat transfer
have not been used, since that would imply making some assumptions about the similarity of the
temperature fields.
Fourteen different ANN configurations were studied, as shown in Table 9.3. As an example,
the training results of the 4-5-2-1-1 configuration, with three hidden layers with 5, 2 and 1 nodes
respectively, are considered in detail. The input and output layers have 4 nodes and one node,
respectively, corresponding to the four input variables and a single output. Training was carried out
to 200,000 cycles to show how the errors change along the way. The average and maximum values
of the errors for all the runs can be found, where the error for each run is defined in equation (9.17).
These errors are shown in Figure 9.12. It is seen that the maximum error asymptotes at about
150,000 cycles, while the corresponding level of the average error is reached at about 100,000. In
either case, the error levels are sufficiently small.
After training, the ANNs were used to predict the N_p = 62 testing data which were not used
in the training process; the mean and standard deviations of the error for each configuration, R and
σ respectively, are shown in Table 9.3. R and σ are defined by

R = (1/N_p) Σ_{r=1}^{N_p} R_r    (9.19)

σ = [ Σ_{r=1}^{N_p} (R_r - R)² / N_p ]^{1/2}    (9.20)

where R_r is the ratio Q^e/Q^p_ANN for run number r, Q^e is the experimental heat-transfer rate, and
Q^p_ANN is the corresponding prediction of the ANN. R is an indication of the average accuracy of
the prediction, while σ is that of the scatter, both quantities being important for an assessment
of the relative success of the ANN analysis. The network configuration with R closest to unity is
4-1-1-1, while 4-5-5-1 is the one with the smallest σ. If both factors are taken into account, it seems
that 4-5-1-1 would be the best, even though the exact criterion is of the user's choice. It is also of
interest to note that adding more hidden layers may not improve the ANN results. Comparisons of
the values of R_r for all test cases are shown in Figure 9.13 for two configurations. It is seen that
Figure 9.12: Training error results for configuration 4-5-2-1-1 ANN.
Configuration    R        σ
4-1-1 1.02373 0.266
4-2-1 0.98732 0.084
4-5-1 0.99796 0.018
4-1-1-1 1.00065 0.265
4-2-1-1 0.96579 0.089
4-5-1-1 1.00075 0.035
4-5-2-1 1.00400 0.018
4-5-5-1 1.00288 0.015
4-1-1-1-1 0.95743 0.258
4-5-1-1-1 0.99481 0.032
4-5-2-1-1 1.00212 0.018
4-5-5-1-1 1.00214 0.016
4-5-5-2-1 1.00397 0.019
4-5-5-5-1 1.00147 0.022
Table 9.3: Comparison of heat transfer rates predicted by different ANN configurations for heat
exchanger 1.
Figure 9.13: Ratio of heat transfer rates R_r for all testing runs ( 4-5-5-1; + 4-5-1-1) for heat
exchanger 1.
although the 4-5-1-1 configuration is the second best in R, there are still several points at which the
predictions differ from the experiments by more than 14%. The 4-5-5-1 network, on the other hand,
has errors confined to 3.7%.
The effect of the normalization range for the physical variables was also studied. Additional
trainings were carried out for the 4-5-5-1 network using the different normalization range
of [0.05, 0.95]. For 100,000 training cycles, the results show that R = 1.00063 and σ = 0.016. Thus,
in this case, more accurate averaged results can be obtained with the range closer to [0,1].
We also compare the heat-transfer rates obtained by the ANN analysis based on the 4-5-5-1
configuration, Q^p_ANN, and those determined from the dimensionless correlations of the coefficients
of heat transfer, Q^p_cor. For the experimental data used, the least-square correlation equations have
been given by Zhao (1995) and Zhao et al. (1995) to be

η Nu_a = 0.1368 Re_a^0.585 Pr_a^1/3    (9.21)

Nu_w = 0.01854 Re_w^0.752 Pr_w^0.3    (9.22)

applicable for 200 < Re_a < 700 and 800 < Re_w < 4.5 × 10^4, where η is the fin effectiveness. The
Reynolds, Nusselt, and Prandtl numbers are defined as follows,
Re_a = V_a δ / ν_a ;    Nu_a = h_a δ / k_a ;    Pr_a = ν_a / α_a    (9.23)

Re_w = V_w D / ν_w ;    Nu_w = h_w D / k_w ;    Pr_w = ν_w / α_w    (9.24)

where the subscripts a and w refer to the air- and water-side, respectively, V is the average flow
velocity, δ is the fin spacing, D is the tube inside diameter, and ν and k are the kinematic viscosity
and thermal conductivity of the fluids, respectively. The correlations are based on the maximum
temperature differences between the two fluids. The results are shown in Figure 9.14, where the
superscript e is used for the experimental values and p for the predicted. For most of the data the
ANN error is within 0.7%, while the predictions of the correlation are of the order of 10%. The
superiority of the ANN is evident.
Figure 9.14: Comparison of 4-5-5-1 ANN (+) and correlation () predictions for heat exchanger 1.
These results suggest that the ANNs have the ability of recognizing all the consistent patterns
in the training data, including the relevant physics as well as random and biased measurement errors.
It can perhaps be said that it catches the underlying physics much better than the correlations do,
since the error level is consistent with the uncertainty in the experimental data (Zhao, 1995a).
However, the ANN does not know and does not have to know what the physics is. It completely
bypasses simplifying assumptions such as the use of coefficients of heat transfer. On the other hand,
any unintended and biased errors in the training data set are also picked up by the ANN. The trained
ANN, therefore, is not better than the training data, but not worse either.
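For reference, the correlations (9.21)-(9.22) are straightforward to evaluate. In the sketch below the fin effectiveness factor eta is included on the assumption that it multiplies Nu_a in (9.21); that placement, and the function names, are illustrative rather than taken from the source:

```python
def nusselt_air(Re_a, Pr_a, eta=1.0):
    """Air-side correlation (9.21), valid for 200 < Re_a < 700.
    eta is the fin effectiveness, assumed to multiply Nu_a."""
    return 0.1368 * Re_a**0.585 * Pr_a**(1.0 / 3.0) / eta

def nusselt_water(Re_w, Pr_w):
    """Water-side correlation (9.22), valid for 800 < Re_w < 4.5e4."""
    return 0.01854 * Re_w**0.752 * Pr_w**0.3

# Example values inside the stated ranges of validity
print(nusselt_air(500.0, 0.7), nusselt_water(1.0e4, 5.0))
```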
Problems
1. This is a problem
References
[1] J. Ackermann. Robust Control: Systems with Uncertain Physical Parameters. Springer-Verlag, London, 1993.
[2] J.S. Albus and A.M. Meystel. Engineering of Mind: An Introduction to the Science of Intelligent Systems. Wiley, New York, 2001.
[3] J.S. Albus and A.M. Meystel. Intelligent Systems: Architecture, Design, and Control. Wiley, New York, 2002.
[4] R.A. Aleev and R.R. Aleev. Soft Computing and its Applications. World Scientific, Singapore, 2001.
[5] R. Babuska. Fuzzy Modeling for Control. Kluwer Academic Publishers, Boston, 1998.
[6] A.B. Badiru and J.Y. Cheung. Fuzzy Engineering Expert Systems with Neural Network Applications. John Wiley, New York, NY, 2002.
[7] F. Bagnoli, P. Lio, and S. Ruffo, editors. Dynamical Modeling in Biotechnologies. World Scientific, Singapore, 2000.
[8] P. Ball. Natural talent. New Scientist, 188(2523):50–51, 2005.
[9] H. Bandemer and S. Gottwald. Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications. John Wiley & Sons, Chichester, 1995.
[10] S. Bandini and T. Worsch, editors. Theoretical and Practical Issues on Cellular Automata. Springer, London, 2001.
[11] A.-L. Barabási. Linked: The New Science of Networks. Perseus, Cambridge, MA, 2002.
[12] A.-L. Barabási, R. Albert, and H. Jeong. Mean-field theory for scale-free random networks. Physica A, 272:173–187, 1999.
[13] J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.
[14] M.J. Biggs and S.J. Humby. Lattice-gas automata methods for engineering. Chemical Engineering Research & Design, 76(A2):162–174, 1998.
[15] D.S. Broomhead and D. Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, 2:321–355, 1988.
[16] J.D. Buckmaster and G.S.S. Ludford. Lectures on Mathematical Combustion. SIAM, Philadelphia, 1983.
[17] Z.C. Chai, Z.F. Cao, and Y. Zhou. Encryption based on reversible second-order cellular automata. Lecture Notes in Computer Science, 3759:350–358, 2005.
[18] G. Chen and T.T. Pham. Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems. CRC Press, Boca Raton, FL, 2001.
[19] M. Chester. Neural Networks: A Tutorial. PTR Prentice Hall, Englewood Cliffs, NJ, 1969.
[20] S.B. Cho and G.B. Song. Evolving CAM-Brain to control a mobile robot. Applied Mathematics and Computation, 111(2-3):147–162, 2000.
[21] B. Chopard and M. Droz. Cellular Automata Modeling of Physical Systems. Cambridge University Press, Cambridge, U.K., 1998.
[22] E.F. Codd. Cellular Automata. Academic Press, New York, 1968.
[23] E. Czogala and J. Leski. Fuzzy and Neuro-Fuzzy Intelligent Systems. Physica-Verlag, Heidelberg, New York, 2000.
[24] C.W. de Silva. Intelligent Control: Fuzzy Logic Applications. CRC, Boca Raton, FL, 1995.
[25] J. Demongeot, E. Goles, and M. Tchuente, editors. Dynamical Systems and Cellular Automata. Academic Press, London, 1985.
[26] A. Deutsch and S. Dormann, editors. Cellular Automaton Modeling of Biological Pattern Formation: Characterization, Applications, and Analysis. Birkhäuser, New York, 2005.
[27] Z.G. Diamantis, D.T. Tsahalis, and I. Borchers. Optimization of an active noise control system inside an aircraft, based on the simultaneous optimal positioning of microphones and speakers, with the use of genetic algorithms. Computational Optimization and Applications, 23:65–76, 2002.
[28] G. Díaz. Simulation and Control of Heat Exchangers Using Artificial Neural Networks. PhD thesis, Department of Aerospace and Mechanical Engineering, University of Notre Dame, 2000.
[29] G. Díaz, M. Sen, K.T. Yang, and R.L. McClain. Simulation of heat exchanger performance by artificial neural networks. International Journal of HVAC&R Research, 1999.
[30] C.L. Dym and R.E. Levitt. Knowledge-Based Systems in Engineering. McGraw-Hill, New York, 1991.
[31] A.P. Engelbrecht. Computational Intelligence: An Introduction. Wiley, Chichester, U.K., 2002.
[32] G. Fabbri. A genetic algorithm for fin profile optimization. International Journal of Heat and Mass Transfer, 40(9):2165–2172, 1997.
[33] G. Fabbri. Heat transfer optimization in internally finned tubes under laminar flow conditions. International Journal of Heat and Mass Transfer, 41(10):1243–1253, 1998.
[34] G. Fabbri. Heat transfer optimization in corrugated wall channels. International Journal of Heat and Mass Transfer, 43:4299–4310, 2000.
[35] S.G. Fabri and V. Kadirkamanathan. Functional Adaptive Control: An Intelligent Systems Approach. Springer, London, New York, 2001.
[36] L. Fausett. Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall, Englewood Cliffs, NJ, 1997.
[37] D.B. Fogel and C.J. Robinson, editors. Computational Intelligence: The Experts Speak. IEEE, 2003.
[38] U. Frisch, B. Hasslacher, and Y. Pomeau. Lattice-gas automata for the Navier-Stokes equation. Physical Review Letters, 56:1505–1508, 1986.
[39] F. Garces, V.M. Becerra, C. Kambhampati, and K. Warwick. Strategies for Feedback Linearisation: A Dynamic Neural Network Approach. Springer, New York, 2003.
[40] M. Gardner. The fantastic combinations of John Conway's new solitaire game "life". Scientific American, 223(4):120–123, October 1970.
[41] E.A. Gillies. Low-dimensional control of the circular cylinder wake. Journal of Fluid Mechanics, 371:157–178, 1998.
[42] S. Gobron and N. Chiba. 3D surface cellular automata and their applications. Journal of Visualization and Computer Animation, 10(3):143–158, 1999.
[43] R.L. Goetz. Particle stimulated nucleation during dynamic recrystallization using a cellular automata model. Scripta Materialia, 52(9):851–856, 2005.
[44] E. Goles and S. Martínez, editors. Cellular Automata, Dynamical Systems, and Neural Networks. Kluwer, Dordrecht, 1994.
[45] K. Gurney. An Introduction to Neural Networks. UCL Press, London, 1997.
[46] M.J. Harris, G. Coombe, T. Scheuermann, and A. Lastra. Physically-based visual simulation on graphics hardware. In Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 109–118, 2002.
[47] M.H. Hassoun. Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, MA, 1995.
[48] S. Haykin. Neural Networks: A Comprehensive Foundation. Macmillan, New York, 1994.
[49] D.O. Hebb. The Organization of Behavior: A Neuropsychological Theory. Wiley, New York, 1949.
[50] M.A. Henson and D.E. Seborg, editors. Nonlinear Process Control. Prentice Hall, Upper Saddle River, NJ, 1997.
[51] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the U.S.A., 79:2554–2558, 1982.
[52] H.W. Lewis III. The Foundations of Fuzzy Control. Plenum Press, New York, 1997.
[53] A. Ilachinski. Cellular Automata: A Discrete Universe. World Scientific, Singapore, 2001.
[54] R. Isermann. Mechatronic Systems: Fundamentals. Springer, London, 2003.
[55] J.-S.R. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, Upper Saddle River, NJ, 1997.
[56] K. Preston Jr. and M.J.B. Duff. Modern Cellular Automata: Theory and Applications. Plenum Press, New York, 1984.
[57] K.J. Kim and S.B. Cho. A comprehensive overview of the applications of artificial life. Artificial Life, 12(1):153–182, 2006.
[58] T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59–69, 1982.
[59] E. Kreyszig. Introductory Functional Analysis with Applications. John Wiley, New York, 1978.
[60] C. Lee, J. Kim, D. Babcock, and R. Goodman. Application of neural networks to turbulence control for drag reduction. Physics of Fluids, 9(6):1740–1747, 1997.
[61] L. Ljung. System Identification: Theory for the User. Prentice Hall, Upper Saddle River, NJ, 1999.
[62] G.F. Luger and P. Johnson. Cognitive Science: The Science of Intelligent Systems. Springer, London, New York, 1994.
[63] P. Maji and P.P. Chaudhuri. Cellular automata based pattern classifying machine for distributed data mining. Lecture Notes in Computer Science, 3316:848–853, 2004.
[64] B.D. McCandliss, J.A. Fiez, M. Conway, and J.L. McClelland. Eliciting adult plasticity for Japanese adults struggling to identify English |r| and |l|: Insights from a Hebbian model and a new training procedure. Journal of Cognitive Neuroscience, page 53, 1999.
[65] L.R. Medsker. Hybrid Intelligent Systems. Kluwer Academic Publishers, Boston, 1995.
[66] M.L. Minsky and S.A. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
[67] L. Nadel and D.L. Stein, editors. 1990 Lectures in Complex Systems. Addison-Wesley, Redwood City, CA, 1991.
[68] D. Necsulescu. Mechatronics. Prentice Hall, Upper Saddle River, NJ, 2002.
[69] O. Nelles. Nonlinear System Identification. Springer, Berlin, 2001.
[70] J.P. Norton. An Introduction to Identification. Academic Press, London, 1986.
[71] S. Omohundro. Modeling cellular automata with partial-differential equations. Physica D, 10(1-2):128–134, 1984.
[72] A. Pacheco-Vega, M. Sen, K.T. Yang, and R.L. McClain. Genetic-algorithm-based predictions of fin-tube heat exchanger performance. Heat Transfer 1998, 6:137–142, 1998.
[73] I. Podlubny. Fractional Differential Equations. Academic Press, San Diego, 1999.
[74] N. Queipo, R. Devarakonda, and J.A.C. Humphrey. Genetic algorithms for thermosciences research: application to the optimized cooling of electronic components. International Journal of Heat and Mass Transfer, 37(6):893–908, 1998.
[75] M. Rao, Q. Wang, and J. Cha. Integrated Distributed Intelligent Systems in Manufacturing. Chapman and Hall, London, 1993.
[76] C.R. Reeves and J.W. Rowe. Genetic Algorithms Principles and Perspectives: A Guide to GA Theory. Kluwer, Boston, 1997.
[77] L. Reznik and V. Kreinovich, editors. Soft Computing in Measurement and Information Acquisition. Springer-Verlag, Berlin, 2003.
[78] K. Rohde. Cellular automata and ecology. Oikos, 110(1):203–207, 2005.
[79] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.
[80] D.H. Rothman and S. Zaleski. Lattice-Gas Cellular Automata: Simple Models of Complex Hydrodynamics. Cambridge University Press, Cambridge, U.K., 1997.
[81] D. Ruan, editor. Intelligent Hybrid Systems: Fuzzy Logic, Neural Networks, and Genetic Algorithms. Kluwer, Boston, 1997.
[82] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1, chapter 8, pages 620–661. MIT Press, Cambridge, MA, 1986.
[83] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
[84] J.R. Sanchez. Pattern recognition of one-dimensional cellular automata using Markov chains. International Journal of Modern Physics C, 15(4):563–567, 2004.
[85] R.J. Schalkoff. Artificial Neural Networks. McGraw-Hill, New York, 1997.
[86] G.G. Schwartz, G.J. Klir, H.W. Lewis, and Y. Ezawa. Applications of fuzzy-sets and approximate reasoning. Proceedings of the IEEE, 82(4):482–498, 1994.
[87] E. Sciubba and R. Melli. Artificial Intelligence in Thermal Systems Design: Concepts and Applications. Nova Science Publishers, Commack, N.Y., 1998.
[88] M. Sen and J.W. Goodwine. Soft computing in control. In M. Gad-el-Hak, editor, The MEMS Handbook, chapter 4.24, pages 620–661. CRC, Boca Raton, FL, 2001.
[89] M. Sen and K.T. Yang. Applications of artificial neural networks and genetic algorithms in thermal engineering. In F. Kreith, editor, The CRC Handbook of Thermal Engineering, chapter 4.24, pages 620–661. CRC, Boca Raton, FL, 2000.
[90] S. Setoodeh, Z. Gürdal, and L.T. Watson. Design of variable-stiffness composite layers using cellular automata. Computer Methods in Applied Mechanics and Engineering, 195(9-12):836–851, 2006.
[91] J.N. Siddall. Expert Systems for Engineers. Marcel Dekker, New York, 1990.
[92] N.K. Sinha and B. Kuszta. Modeling and Identification of Dynamic Systems. Van Nostrand
Reinhold, New York, 1983.
[93] I.M. Sokolov, J. Klafter, and A. Blumen. Fractional kinetics. Physics Today, 55(11):48–54,
2002.
[94] S.K. Srinivasan and R. Vasudevan. Introduction to Random Differential Equations and Their
Applications. Elsevier, New York, 1971.
[95] A. Tettamanzi and M. Tomassini. Soft Computing: Integrating Evolutionary, Neural, and
Fuzzy Systems. Springer, Berlin, 2001.
[96] T. Toffoli. Cellular automata as an alternative to (rather than an approximation of)
differential equations in modeling physics. Physica D, 10(1-2):117–127, 1984.
[97] T. Toffoli and N. Margolus. Cellular Automata Machines. MIT Press, Cambridge, MA, 1987.
[98] E. Turban and J.E. Aronson. Decision Support Systems and Intelligent Systems. Prentice
Hall, Upper Saddle River, N.J., 1998.
[99] J. von Neumann. Theory of Self-Reproducing Automata, (completed and edited by A.W.
Burks). University of Illinois, Urbana-Champaign, IL, 1966.
[100] B.H. Voorhees. Computational Analysis of One-Dimensional Cellular Automata. World
Scientific, Singapore, 1996.
[101] D.J. Watts and S.H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440–
442, 1998.
[102] C. Webster and F.L. Wu. Coase, spatial pricing and self-organising cities. Urban Studies,
38(11):2037–2054, 2001.
[103] D.A. White and D.A. Sofge, editors. Handbook of Intelligent Control: Neural, Fuzzy and
Adaptive Approaches. Van Nostrand, New York, 1992.
[104] B. Widrow and M.E. Hoff, Jr. Adaptive switching circuits. IRE WESCON Convention Record,
pages 96–104, 1960.
[105] D.A. Wolf-Gladrow. Lattice-Gas Cellular Automata and Lattice Boltzmann Models: An Intro-
duction. Springer, Berlin, 2000.
[106] S. Wolfram, editor. Theory and Applications of Cellular Automata. World Scientific, Singapore,
1987.
[107] S. Wolfram. A New Kind of Science. Wolfram Media, Champaign, IL, 2002.
[108] W.S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity.
Bulletin of Mathematical Biophysics, 5:115–133, 1943.
[109] H. Xie, R.L. Mahajan, and Y.-C. Lee. Fuzzy logic models for thermally based microelectronic
manufacturing. IEEE Transactions on Semiconductor Manufacturing, 8(3):219–227, 1995.
[110] T. Yanagita. Coupled map lattice model for boiling. Physics Letters A, 165(5-6):405–408,
1992.
[111] W. Yu, C.D. Wright, S.P. Banks, and E.J. Palmiere. Cellular automata method for simulating
microstructure evolution. IEE Proceedings - Science, Measurement and Technology, 150(5):211–
213, 2003.
[112] P.K. Yuen and H.H. Bau. Controlling chaotic convection using neural nets - theory and
experiments. Neural Networks, 11(3):557–569, 1998.
[113] L.A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.
[114] R.Y. Zhang and H.D. Chen. Lattice Boltzmann method for simulations of liquid-vapor
thermal flows. Physical Review E, 67:066711, 2003.