Recursive Estimation
Abstract
These are my notes for the course Recursive Estimation (151-0566-00L, spring semester 2023) by Dr. D’Andrea
at ETH Zürich. Almost all of the material in these notes comes from the lecture notes, slides and exercises of the course, with
the exception of minor additions that were introduced for clarity.
License
This work is licensed under a Creative Commons “Attribution-NonCommercial-ShareAlike
4.0 International” license.
Contents
1 Probability Basics Review
1.1 Random Variables
1.2 Expectation and Moments
1.3 Sampling Distributions
1.4 Change of Variables
1.5 Bayes’ Theorem
1.6 Gaussian Random Variables
2 Bayesian Tracking
2.1 Problem Statement
2.2 Recursive Estimator
3 Extracting Estimates
3.1 Maximum Likelihood (ML)
3.2 Maximum a Posteriori (MAP)
3.3 Recursive Least Squares (RLS)
3.3.1 Problem Statement
3.3.2 Standard Weighted LS
3.3.3 Recursive LS
4 Kalman Filter
4.1 Problem Statement
4.2 Bayesian Formulation
4.3 Alternate Formulation
4.4 Detectability and Stabilizability
4.5 The Steady-State KF
7 Observer Based Control
7.1 Static State-Feedback Control
7.2 Separation Principle and Theorem
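1 Probability Basics Review
1.1 Random Variables
Definition 1.1 (Random Variable). We call $x \in \mathcal{X}$ a random variable (RV) from the set of possible outcomes $\mathcal{X}$, with an associated probability density function (PDF) $p_x : \mathcal{X} \to \mathbb{R}$ that satisfies
• $p_x(\bar{x}) \geq 0$ for all $\bar{x} \in \mathcal{X}$, and
• the normalization
\[ \sum_{\bar{x} \in \mathcal{X}} p_x(\bar{x}) = 1 \quad \text{or} \quad \int_{\mathcal{X}} p_x(\bar{x}) \, d\bar{x} = 1, \]
where the sum applies if $\mathcal{X}$ is countable (discrete random variable, DRV) and the integral in the case of a continuous random variable (CRV) ($\mathcal{X}$ is an interval).
The PDF is then used to define the notion of probability, i.e. the probability that a discrete RV $x$ takes the value $\bar{x} \in \mathcal{X}$ is $p_x(\bar{x})$, and is written as $\Pr\{x = \bar{x}\}$. For a continuous RV $x$ the probability of any specific value is always 0; instead we can only refer to the RV being in some interval $[a, b] \subseteq \mathcal{X}$ and we write
\[ \Pr\{x \in [a, b]\} = \int_a^b p_x(\bar{x}) \, d\bar{x}. \]
Definition 1.2 (Joint PDF). Let $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ be RVs. The joint PDF satisfies
• $p_{xy}(\bar{x}, \bar{y}) \geq 0$ for all $\bar{x} \in \mathcal{X}$ and $\bar{y} \in \mathcal{Y}$,
• further
\[ \sum_{\bar{x} \in \mathcal{X}} \sum_{\bar{y} \in \mathcal{Y}} p_{xy}(\bar{x}, \bar{y}) = 1 \quad \text{or} \quad \iint_{\mathcal{X} \times \mathcal{Y}} p_{xy}(\bar{x}, \bar{y}) \, d\bar{x} \, d\bar{y} = 1. \]
[…] and so does conditioning
\[ p(x_1, \dots, x_{N-1} | x_N) \, p(x_N) = p(x_1, \dots, x_N). \]
However, now there can be mixed cases of conditioning.
Proposition 1.1 (Conditioning). Given the RVs $x$, $y$, $z$:
\[ p_{x|yz}(\bar{x} | \bar{y}, \bar{z}) = \frac{p_{xy|z}(\bar{x}, \bar{y} | \bar{z})}{p_{y|z}(\bar{y} | \bar{z})}. \]
This generalizes to more variables.
Definition 1.5 (Independence). The RVs $x$ and $y$ are said to be independent if $p(x|y) = p(x)$.
From the definition it follows that $p(x, y) = p(x)p(y)$ and $p(y|x) = p(y)$.
1.2 Expectation and Moments
The expectation is to be understood as a statistical average, or as a weighted sum with the coefficients being the probabilities.
Definition 1.6 (Expectation). For a RV $x \in \mathcal{X}$
\[ E_x\{x\} = \sum_{\bar{x} \in \mathcal{X}} \bar{x} \, p_x(\bar{x}) \quad \text{or} \quad \int_{\mathcal{X}} \bar{x} \, p_x(\bar{x}) \, d\bar{x}. \]
In the definition above $p_x$ can be replaced with a conditional $p_{x|y}$, to obtain the conditional expectation
\[ E_{x|y}\{x | \bar{y}\} = \int_{\mathcal{X}} \bar{x} \, p_{x|y}(\bar{x} | \bar{y}) \, d\bar{x}. \]
Theorem 1.1 (Law of the Unconscious Statistician). Let $y = g(x) \in \mathcal{Y} = g(\mathcal{X})$ where $x \in \mathcal{X}$ is a DRV or CRV. Then
\[ E_y\{y\} = \sum_{\bar{x} \in \mathcal{X}} g(\bar{x}) \, p_x(\bar{x}) \quad \text{or} \quad \int_{\mathcal{X}} g(\bar{x}) \, p_x(\bar{x}) \, d\bar{x}, \]
[…] and has the property that $\hat{F}_x(-\infty) = 0$ and $\hat{F}_x(\infty) = 1$. Let $\bar{u}$ be a sample of $u \sim \mathcal{U}(0, 1)$. To find a sample $\bar{x}$ of $x$ we solve for an $\bar{x}$ such that $\hat{F}_x(\bar{x} - 1) < \bar{u}$ and $\bar{u} \leq \hat{F}_x(\bar{x})$.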
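As a minimal sketch of the inverse-CDF rule that closes algorithm 1.1, the following Python snippet draws samples of a finite DRV with outcomes 1, …, N. The function name `sample_drv` and the example PMF are illustrative assumptions and do not come from the course material.

```python
import random
from itertools import accumulate

def sample_drv(pmf, u):
    """Return the outcome x in {1, ..., len(pmf)} with F(x - 1) < u <= F(x).

    pmf[i] is the desired probability of outcome i + 1 and u is a sample
    of a uniform RV on (0, 1).
    """
    cdf = list(accumulate(pmf))  # F(1), F(2), ..., F(N)
    for x, F in enumerate(cdf, start=1):
        if u <= F:
            return x
    return len(pmf)  # guard against round-off when u is very close to 1

# Hypothetical desired PMF over the outcomes {1, 2, 3}.
pmf = [0.2, 0.5, 0.3]
rng = random.Random(0)
samples = [sample_drv(pmf, rng.random()) for _ in range(10_000)]
freq = [samples.count(x) / len(samples) for x in (1, 2, 3)]
print(freq)  # relative frequencies, expected to be close to pmf
```

Each call consumes one fresh uniform sample, so successive samples are independent as long as the uniform generator is.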
Algorithm 1.2 (Sample multiple finite DRV). Given a desired joint PDF $\hat{p}_{xy}$ for the scalar DRVs $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, where $N_x = |\mathcal{X}|$ and $N_y = |\mathcal{Y}|$ are both finite, let $\mathcal{Z} = \{1, 2, \dots, N_x N_y\}$. Then define a new $\hat{p}_z$ such that $\hat{p}_z(1) = \hat{p}_{xy}(1, 1)$, $\hat{p}_z(2) = \hat{p}_{xy}(1, 2)$, …, $\hat{p}_z(N_x N_y) = \hat{p}_{xy}(N_x, N_y)$, and apply algorithm 1.1 to $\hat{p}_z$.
If the constraint of having finite sets of outcomes is a problem, the following algorithm also works for infinite sets $\mathcal{X}$ and $\mathcal{Y}$.
Algorithm 1.3 (Sample multiple DRVs). Given a desired joint PDF $\hat{p}_{xy}$, decompose it into $\hat{p}_{x|y}(\bar{x}|\bar{y}) \, \hat{p}_y(\bar{y})$. Apply algorithm 1.1 to get a sample $\bar{y}$ for $y$ via $\hat{p}_y(\bar{y})$, then with $\bar{y}$ fixed apply algorithm 1.1 again to get $\bar{x}$ for $x$ via $\hat{p}_{x|y}(\bar{x}|\bar{y})$.
Remark. The independence of the uniform number generator between successive calls is important. Further, both algorithms were described for 2 variables but they both generalize to any number of DRVs.
Algorithm 1.4 (Sample a CRV). Given a desired piecewise continuous and bounded PDF $\hat{p}_x$ for a CRV $x$, let
\[ \hat{F}_x(\bar{x}) = \int_{-\infty}^{\bar{x}} \hat{p}_x(\lambda) \, d\lambda = \Pr\{x \leq \bar{x}\} \]
be the CDF of $x$. To find a sample of $x$, let $\bar{u}$ be a sample of $u \sim \mathcal{U}(0, 1)$ and let $\bar{x}$ be any solution to $\bar{u} = \hat{F}_x(\bar{x})$; then $x$ has PDF $p_x = \hat{p}_x$.
Algorithm 1.5 (Sample multiple CRVs). Analogously to algorithm 1.3, decompose the given desired joint PDF into $\hat{p}_{xy}(\bar{x}, \bar{y}) = \hat{p}_{x|y}(\bar{x}|\bar{y}) \, \hat{p}_y(\bar{y})$. Then, apply algorithm 1.4 to get a $\bar{y}$ for $y$ via $\hat{p}_y(\bar{y})$, and with $\bar{y}$ fixed apply it again to get a sample $\bar{x}$ of $x$ with $\hat{p}_{x|y}(\bar{x}|\bar{y})$.
1.4 Change of Variables
When we work with functions of RVs we usually also wish to know the PDFs of the results.
Proposition 1.2 (Change of variables for DRVs). Let $p_y$ be given for $y \in \mathcal{Y}$ and consider $x = g(y) \in \mathcal{X} = g(\mathcal{Y})$. For each $\bar{x} \in \mathcal{X}$ let
\[ \mathcal{Y}_{\bar{x}} = \{\bar{y}_i : \bar{y}_i \in \mathcal{Y}, \, g(\bar{y}_i) = \bar{x}\}, \]
then
\[ p_x(\bar{x}) = \sum_{\bar{y} \in \mathcal{Y}_{\bar{x}}} p_y(\bar{y}) = \sum_{\bar{y} \in \mathcal{Y} : g(\bar{y}) = \bar{x}} p_y(\bar{y}). \]
Proposition 1.3 (Change of variables for CRVs). Consider a strictly monotonic differentiable continuous function $x = g(y)$, then with $\bar{y} = g^{-1}(\bar{x})$
\[ p_x(\bar{x}) = \frac{p_y(\bar{y})}{|g'(\bar{y})|} = \frac{p_y \circ g^{-1}(\bar{x})}{|g' \circ g^{-1}(\bar{x})|}. \]
Proposition 1.4 (Multivariate change of variables for CRVs). Let $g : \mathbb{R}^m \to \mathbb{R}^m$, $w \mapsto g(w)$, be a map with nonsingular Jacobian for all $w$, i.e.
\[ \det \frac{\partial g}{\partial w} = \det \begin{pmatrix} \partial_{w_1} g_1 & \cdots & \partial_{w_m} g_1 \\ \vdots & \ddots & \vdots \\ \partial_{w_1} g_m & \cdots & \partial_{w_m} g_m \end{pmatrix} \neq 0, \quad \forall w. \]
Further, assume that $z = g(w)$ has a unique solution for $w$ in terms of $z$, say $w = h(z)$. Then
\[ p_z(\bar{z}) = p_w(h(\bar{z})) \left| \det \frac{\partial g}{\partial w}(h(\bar{z})) \right|^{-1}. \]
1.5 Bayes’ Theorem
Theorem 1.2 (Bayes’ theorem). For the RVs $x$ and $z$
\[ p(x|z) = \frac{p(z|x) \, p(x)}{p(z)}. \]
Remark. The interpretation is as follows: $x$ is the unknown quantity of interest (state); $p(x)$ is the prior belief of the state; $z$ is an observation related to the state; $p(z|x)$ answers the question: for a given state, what is the probability of observing $z$? $p(x|z)$ is the posterior belief, that is: given the observation, what is the probability that the state is $x$?
Bayes’ theorem is a systematic way of combining prior beliefs with observations. This is needed because observing $z$ is usually not enough to directly determine $x$: usually $\dim z < \dim x$, and with noise $p(z|x)$ is not “sharp”.
Proposition 1.5 (Generalization of Bayes’ theorem). Suppose there are $N$ (vector or scalar) observations $z_1, \dots, z_N$. Assuming conditional independence, i.e.
\[ p(z_1, \dots, z_N | x) = p(z_1|x) \cdots p(z_N|x), \]
then
\[ p(x | z_1, \dots, z_N) = \frac{p(x) \prod_i p(z_i|x)}{p(z_1, \dots, z_N)}, \]
where the normalization is
\[ p(z_1, \dots, z_N) = \sum_{x \in \mathcal{X}} p(x) \prod_i p(z_i|x) \]
by the total probability theorem.
A possible interpretation for the independence assumption is that a measurement of the state $x$ is corrupted by noise which is independent at each time step.
1.6 Gaussian Random Variables
For the Kalman filter we need to know the properties of Gaussian RVs.
Definition 1.8 (Gaussian RV (GRV)). The PDF of a Gaussian (normally) distributed $D$-dimensional CRV $y = (y_1, \dots, y_D)$ is
\[ p(y) = \frac{1}{\sqrt{(2\pi)^D \det \Sigma}} \exp\left( -\frac{1}{2} (y - \mu)^T \Sigma^{-1} (y - \mu) \right), \]
where $\mu \in \mathbb{R}^D$ is the mean vector and $\Sigma \in \mathbb{R}^{D \times D}$ is symmetric ($\Sigma^T = \Sigma$) and positive definite ($\Sigma \succ 0$).
Proposition 1.6. In the special case where $\Sigma$ is a diagonal matrix with entries $\sigma_i^2$
\[ p(y) = \prod_{i=1}^{D} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{(y_i - \mu_i)^2}{2\sigma_i^2} \right). \]
Hence, the PDF is a product of scalar GRVs, and thus the variables are mutually independent (the converse is also true).
Remark. For a time-dependent GRV $y(k)$ we say that it is spatially independent for a fixed time $k$ if $y_1(k)$, …, $y_D(k)$ are mutually independent, and temporally independent if $y(1)$, …, $y(k)$ are mutually independent.
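The factorization in proposition 1.6 can be checked numerically. The sketch below evaluates the general Gaussian PDF of definition 1.8 for a diagonal Σ and compares it with the product of scalar Gaussian PDFs. It assumes NumPy is available; the function names and the test point are illustrative choices, not part of the course material.

```python
import numpy as np

def gaussian_pdf(y, mu, Sigma):
    """General D-dimensional Gaussian PDF (definition 1.8)."""
    D = len(mu)
    diff = y - mu
    norm = np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma))
    # solve(Sigma, diff) computes Sigma^{-1} diff without forming the inverse
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def scalar_gaussian_pdf(y, mu, sigma2):
    """Scalar Gaussian PDF with variance sigma2, evaluated elementwise."""
    return np.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Hypothetical mean, variances and evaluation point.
mu = np.array([1.0, -2.0, 0.5])
sigma2 = np.array([0.5, 2.0, 1.5])
y = np.array([0.7, -1.0, 1.1])

joint = gaussian_pdf(y, mu, np.diag(sigma2))
product = np.prod(scalar_gaussian_pdf(y, mu, sigma2))
print(joint, product)  # the two values should agree up to round-off
```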
Definition 1.9 (Jointly GRVs). Two GRVs $x$ and $y$ are said to be jointly Gaussian if the vector RV $(x, y)$ is also a GRV.
Remark. If two variables are GRVs this does not imply that they are jointly GRVs.
Proposition 1.7. If two GRVs $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ and $y \sim \mathcal{N}(\mu_y, \Sigma_y)$ are independent, i.e. $p(x, y) = p(x)p(y)$, then they are jointly Gaussian.
Lemma 1.1 (Affine transformation of a GRV is a GRV). Let $y$ be a GRV, $M$ a matrix and $b$ a vector of appropriate size, then $x = My + b$ is a GRV.
Lemma 1.2 (Linear combination of jointly GRVs is a […]
Proposition 2.2 (Measurement Update). We combine the new observation using Bayes’ rule and get
\[ p(x(k) | z(1:k)) = p(x(k) | z(k), z(1:k-1)) = \frac{\overbrace{p(z(k) | x(k), z(1:k-1))}^{\text{measurement model}} \; \overbrace{p(x(k) | z(1:k-1))}^{\text{prior}}}{\underbrace{p(z(k) | z(1:k-1))}_{\text{normalization}}}, \]
where the normalization can be computed using the total probability theorem
\[ p(z(k) | z(1:k-1)) = \sum_{x(k) \in \mathcal{X}} p(z(k) | x(k)) \, p(x(k) | z(1:k-1)). \]
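For a finite state space, the measurement update of proposition 2.2 reduces to an elementwise product followed by a normalization. The following is a minimal sketch under that assumption; the function name `measurement_update` and the numbers are hypothetical.

```python
def measurement_update(prior, likelihood):
    """Posterior p(x(k) | z(1:k)) on a finite state space.

    prior[i]      : p(x(k) = i | z(1:k-1))
    likelihood[i] : p(z(k) = observed value | x(k) = i)
    """
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    normalization = sum(unnormalized)  # total probability theorem
    if normalization == 0.0:
        raise ValueError("the observation has zero probability under the prior")
    return [w / normalization for w in unnormalized]

# Hypothetical three-state example.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.7, 0.2]
posterior = measurement_update(prior, likelihood)
print(posterior)  # mass shifts towards the state that explains the observation best
```

The posterior returned here would serve as the input of the next prior update, which gives the estimator its recursive structure.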
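Definition 3.1 (Maximum Likelihood Estimator). Let $z \in \mathcal{Z}$ be the measurement of the observation model $p_{z|x}(\bar{z}|\bar{x})$. For a given observation $\bar{z}$, ML seeks the value for the parameter $x$ that makes the observation $\bar{z}$ most likely:
\[ \hat{x}^{ML} = \arg\max_{\bar{x} \in \mathcal{X}} p_{z|x}(\bar{z}|\bar{x}). \]
Example. Consider the scalar observation model $z = x + w$ with $w \sim \mathcal{N}(0, 1)$, $x \sim \mathcal{N}(\mu, \sigma^2)$, $w$ and $x$ independent. Then
\[ p_x(\bar{x}) \propto \exp\left( -\frac{1}{2} \frac{(\bar{x} - \mu)^2}{\sigma^2} \right), \quad \text{and} \quad p_{z|x}(\bar{z}|\bar{x}) \propto \exp\left( -\frac{1}{2} (\bar{z} - \bar{x})^2 \right). \]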
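To make the difference between the two estimators concrete, the sketch below maximizes the likelihood and the product of likelihood and prior from the example over a grid. The numerical values and the grid are illustrative assumptions, and the closed-form MAP expression used for comparison is the standard completion-of-squares result, which is not quoted from the notes.

```python
def log_likelihood(z, x):
    """log p(z | x) up to a constant, for z = x + w with w ~ N(0, 1)."""
    return -0.5 * (z - x) ** 2

def log_prior(x, mu, sigma2):
    """log p(x) up to a constant, for x ~ N(mu, sigma2)."""
    return -0.5 * (x - mu) ** 2 / sigma2

# Hypothetical observation and prior parameters.
z, mu, sigma2 = 2.0, 0.0, 0.5
grid = [i * 0.001 for i in range(-5000, 5001)]  # candidate values for x

x_ml = max(grid, key=lambda x: log_likelihood(z, x))
x_map = max(grid, key=lambda x: log_likelihood(z, x) + log_prior(x, mu, sigma2))

# Closed form obtained by setting the derivative of the log posterior to zero.
x_map_closed = (mu + sigma2 * z) / (1.0 + sigma2)
print(x_ml, x_map, x_map_closed)
```

The ML estimate simply returns the measurement, while the MAP estimate is pulled towards the prior mean, more strongly the smaller σ² is.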
An intuition for the structure of this estimator is given by the fact that if the measurement coincides with the estimate then $\bar{z}_k - H_k \hat{x}_{k-1} = 0$ and $\hat{x}_k = \hat{x}_{k-1}$.
4 Kalman Filter
The Kalman filter (KF) is a Bayesian estimator for linear time-varying (LTV) systems with Gaussian process and measurement noise. The KF is particularly exceptional because it has a closed form analytical solution.
4.1 Problem Statement
Consider an LTV system
\[ x(k) = A(k-1) x(k-1) + u(k-1) + v(k-1), \]
\[ z(k) = H(k) x(k) + w(k), \]
where $x(k)$ is the state, $u(k)$ is a known control input, $v(k) \sim \mathcal{N}(0, Q(k))$ is the process noise, $z(k)$ is the measurement and $w(k) \sim \mathcal{N}(0, R(k))$ is the sensor noise. Further, the initial state is $x(0) \sim \mathcal{N}(x_0, P_0)$, and $x(0)$, $\{v(\cdot)\}$ and $\{w(\cdot)\}$ are mutually independent.
Remark. If $v(k)$ has nonzero mean, say $v(k) \sim \mathcal{N}(\alpha, Q(k))$, absorb the mean into the input by defining $\bar{u} = u + \alpha$. Similarly if $w(k) \sim \mathcal{N}(\beta, R(k))$, redefine $\bar{z} = z - \beta$.
4.2 Bayesian Formulation
For the Bayesian interpretation of the KF we reformulate the problem using auxiliary variables “p” for the prediction, and “m” for measurement:
\[ x_m(0) = x(0), \]
\[ x_p(k) = A(k-1) x_m(k-1) + u(k-1) + v(k-1), \]
\[ z_m(k) = H(k) x_p(k) + w(k), \]
where $x_m(k)$ is defined via its PDF
\[ p_{x_m(k)}(\xi) = p_{x_p(k)|z_m(k)}(\xi | \bar{z}(k)), \quad \forall \xi. \]
Lemma 4.1. With the above formulation
\[ p_{x_p(k)}(\xi) = p_{x(k)|z(1:k-1)}(\xi | \bar{z}(1:k-1)), \]
\[ p_{x_m(k)}(\xi) = p_{x(k)|z(1:k)}(\xi | \bar{z}(1:k)), \]
for all $\xi$ and $k = 1, 2, \dots$. That is, $x_p(k)$ is the RV $x(k)$ conditioned on $z(1:k-1)$ and $x_m(k)$ is the RV $x(k)$ conditioned on $z(1:k)$.
Now, introducing the following notation for the mean and variance of the prediction and measurement
\[ \hat{x}_p(k) = E\{x_p(k)\}, \quad P_p(k) = \mathrm{Var}\{x_p(k)\}, \]
\[ \hat{x}_m(k) = E\{x_m(k)\}, \quad P_m(k) = \mathrm{Var}\{x_m(k)\}, \]
we make use of the following fact:
Lemma 4.2. For all $k$, $x_p(k)$ and $x_m(k)$ are GRVs.
Hence, we can compute expressions for $\hat{x}_p(k)$, $P_p(k)$, $\hat{x}_m(k)$ and $P_m(k)$, i.e. the Kalman filter equations.
Theorem 4.1 (Kalman Filter Equations). The prior update or prediction step is
\[ \hat{x}_p(k) = A(k-1) \hat{x}_m(k-1) + u(k-1), \]
\[ P_p(k) = A(k-1) P_m(k-1) A^T(k-1) + Q(k-1), \]
then, the a posteriori update or measurement step is
\[ P_m(k) = \left( P_p^{-1}(k) + H^T(k) R^{-1}(k) H(k) \right)^{-1}, \]
\[ \hat{x}_m(k) = \hat{x}_p(k) + P_m(k) H^T(k) R^{-1}(k) \left( \bar{z}(k) - H(k) \hat{x}_p(k) \right). \]
Therefore, the Kalman filter is the analytical solution to the Bayesian state estimation problem for a linear system with Gaussian distributions.
4.3 Alternate Formulation
A more common formulation for the Kalman filter is as follows.
Theorem 4.2 (KF with Kalman Gain). Let
\[ K(k) = P_p(k) H^T(k) \left( H(k) P_p(k) H^T(k) + R(k) \right)^{-1} \]
be the Kalman filter gain. Then, the a posteriori update can be computed with
\[ \hat{x}_m(k) = \hat{x}_p(k) + K(k) \left( \bar{z}(k) - H(k) \hat{x}_p(k) \right), \]
\[ P_m(k) = (I - K(k) H(k)) P_p(k). \]
Lemma 4.3 (Joseph form of the covariance update). In the a posteriori update of the KF with Kalman Gain, $P_m(k)$ can be computed with
\[ P_m(k) = (I - K(k) H(k)) P_p(k) (I - K(k) H(k))^T + K(k) R(k) K^T(k). \]
This form is more computationally expensive, but less sensitive to numerical errors.
This is the same as the recursive least squares algorithm 3.1, therefore, the KF can also be applied to non-Gaussian RVs. Then, the KF can be interpreted as a linear unbiased estimator that minimizes the mean square error (MMSE). However, the KF will no longer be optimal in the Bayesian sense.
Remark. If $A(k)$, $H(k)$, $Q(k)$, $R(k)$ and $P_0$ are known for all $k$, then $P_p(k)$, $P_m(k)$ and $K(k)$ can all be precomputed offline.
Remark. The KF assumes positive definiteness for $P_0 \succ 0$, $Q(k) \succ 0$, $R(k) \succ 0$, however the KF also makes sense when they are positive semidefinite, as long as the matrix inversions are well defined. That is, when some states are known exactly.
4.4 Detectability and Stabilizability
Definition 4.1 (Detectability). Consider the deterministic system
\[ x(k+1) = A x(k), \quad z(k) = H x(k), \]
where $k = 0, 1, \dots$ and $x(0) = x_0 \in \mathbb{R}^n$. The system is said to be detectable if
\[ \lim_{k \to \infty} z(k) = 0 \implies \lim_{k \to \infty} x(k) = 0, \]
for any $x_0$.
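Returning to the filter equations of §4.2 and §4.3, the sketch below runs one KF iteration with the Kalman gain (theorem 4.2) and checks numerically that the resulting variance agrees with the information form of theorem 4.1 and with the Joseph form of lemma 4.3. It assumes NumPy; the function name `kf_step` and all system matrices are illustrative choices rather than course material.

```python
import numpy as np

def kf_step(x_m, P_m, u, z, A, H, Q, R):
    """One KF iteration: prior update, then measurement update with the Kalman gain."""
    # Prior update (prediction step)
    x_p = A @ x_m + u
    P_p = A @ P_m @ A.T + Q
    # Measurement update using the Kalman gain
    K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
    x_m_new = x_p + K @ (z - H @ x_p)
    P_m_new = (np.eye(len(x_p)) - K @ H) @ P_p
    return x_p, P_p, K, x_m_new, P_m_new

# Hypothetical time-invariant example: position/velocity state, position measurement.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
z = np.array([1.2])

x_p, P_p, K, x_m, P_m = kf_step(np.array([0.0, 1.0]), np.eye(2), np.zeros(2), z, A, H, Q, R)

# Information form of the variance and mean update
P_info = np.linalg.inv(np.linalg.inv(P_p) + H.T @ np.linalg.inv(R) @ H)
x_info = x_p + P_info @ H.T @ np.linalg.inv(R) @ (z - H @ x_p)

# Joseph form of the variance update
I_KH = np.eye(2) - K @ H
P_joseph = I_KH @ P_p @ I_KH.T + K @ R @ K.T

print(np.allclose(P_m, P_info), np.allclose(P_m, P_joseph), np.allclose(x_m, x_info))
```

In exact arithmetic the three variance expressions coincide; they differ only in cost and in sensitivity to round-off.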
Definition 4.1 implies that the system is detectable iff $Hw \neq 0$ for any eigenvector $w$ of the matrix $A$ corresponding to an eigenvalue with $|\lambda| \geq 1$. In other words, we need to be able to see unstable modes. This can be expressed with the condition
\[ \operatorname{rank} \begin{pmatrix} A - \lambda I \\ H \end{pmatrix} = n, \quad \forall \lambda \in \mathbb{C}, \; |\lambda| \geq 1, \]
i.e. the matrix is full rank (PBH test); or, equivalently, the eigenvalues of $A - LH$ or $(I - LH)A$ can be placed within the unit circle by a suitable choice of $L \in \mathbb{R}^{n \times m}$.
Definition 4.2 (Stabilizability). Consider the deterministic system
\[ x(k+1) = A x(k) + B u(k), \quad z(k) = H x(k), \]
where $k = 0, 1, \dots$ and $x(0) = x_0$, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$. The system is said to be stabilizable if
\[ \exists \, u(0:k-1) \text{ such that } \lim_{k \to \infty} x(k) = 0 \]
for any $x_0$.
Equivalently, the system is stabilizable if $\begin{pmatrix} A - \lambda I & B \end{pmatrix}$ is full rank for all $\lambda \in \mathbb{C}$ with $|\lambda| \geq 1$ (PBH test); or if the eigenvalues of $A - BK$ or $(I - BK)A$ can be placed within the unit circle by choosing $K \in \mathbb{R}^{m \times n}$. Stabilizability is the dual of detectability, i.e. $(A, B)$ is stabilizable iff $(A^T, B^T)$ is detectable.
5 Extended Kalman Filter (EKF)
5.1 Problem Statement
Consider the nonlinear discrete-time system
\[ x(k) = q_{k-1}(x(k-1), v(k-1)), \quad z(k) = h_k(x(k), w(k)), \]
where
\[ E\{x(0)\} = x_0, \quad \mathrm{Var}\{x(0)\} = P_0, \]
\[ E\{v(k-1)\} = 0, \quad \mathrm{Var}\{v(k-1)\} = Q(k-1), \]
\[ E\{w(k)\} = 0, \quad \mathrm{Var}\{w(k)\} = R(k). \]
Moreover, $x(0)$, $\{v(\cdot)\}$, $\{w(\cdot)\}$ are mutually independent, $q_{k-1}$ is continuously differentiable wrt $x(k-1)$ and $v(k-1)$, and $h_k$ is continuously differentiable wrt $x(k)$ and $w(k)$. That is, a system that is mildly nonlinear. Any known input to the system is implicitly absorbed in $q_{k-1}$.
5.2 The EKF Equations
The EKF works by linearizing the nonlinear system at the current state estimate and then applying the KF equations.
Theorem 5.1 (EKF process update equations). Linearizing $q_{k-1}(x(k-1), v(k-1))$ about $\hat{x}_m(k-1)$ and $E\{v(k-1)\} = 0$ yields […]
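Returning to §4.4, the PBH tests for detectability and stabilizability can be sketched numerically as follows. The rank condition only needs to be checked at the eigenvalues of A, because A − λI already has full rank for any other λ. The sketch assumes NumPy; the function names and example matrices are hypothetical.

```python
import numpy as np

def is_detectable(A, H):
    """PBH test: rank [A - lam*I; H] = n for every eigenvalue lam of A with |lam| >= 1."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1.0:  # only unstable or marginally stable modes matter
            pbh = np.vstack([A - lam * np.eye(n), H])
            if np.linalg.matrix_rank(pbh) < n:
                return False
    return True

def is_stabilizable(A, B):
    """Duality: (A, B) is stabilizable iff (A^T, B^T) is detectable."""
    return is_detectable(A.T, B.T)

# Hypothetical system with one unstable mode (2.0) and one stable mode (0.5).
A = np.diag([2.0, 0.5])
print(is_detectable(A, np.array([[1.0, 0.0]])))      # the unstable mode is measured
print(is_detectable(A, np.array([[0.0, 1.0]])))      # the unstable mode is hidden
print(is_stabilizable(A, np.array([[1.0], [0.0]])))  # the input reaches the unstable mode
```

Eigenvalues that are numerically very close to the unit circle make the threshold `abs(lam) >= 1.0` fragile, so a tolerance may be needed in practice.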
[…] zero-mean with variance $M(k) R(k) M^T(k)$, and the update equations are
\[ K(k) = P_p(k) H^T(k) \left( H(k) P_p(k) H^T(k) + M(k) R(k) M^T(k) \right)^{-1}, \]
\[ \hat{x}_m(k) = \hat{x}_p(k) + K(k) \left( \bar{z}(k) - h_k(\hat{x}_p(k), 0) \right), \]
\[ P_m(k) = (I - K(k) H(k)) P_p(k). \]
Intuition: correct for the mismatch between the actual measurement $\bar{z}(k)$ and its nonlinear prediction $h_k(\hat{x}_p(k), 0)$, and correct the variance according to the linearized equations.
Remark. In this case the Kalman gain cannot be computed offline even if the noise distributions are known for all $k$, hence the EKF is more computationally expensive.
The EKF variables $\hat{x}_p(k)$, $\hat{x}_m(k)$, $P_p(k)$ and $P_m(k)$ are only approximations! The EKF would be exact if $q_{k-1}$ and $E\{\cdot\}$ commuted:
\[ E\{q_{k-1}(x, v)\} = q_{k-1}(E\{x\}, E\{v\}) \]
(and likewise for $h_k$), which is not the case for general nonlinear $q_{k-1}$, but true for linear $q_{k-1}$.
Therefore, the EKF does not have general convergence guarantees, but it works well for mildly nonlinear systems with unimodal noise distributions.
5.3 Hybrid EKF
In practice the process dynamics are usually continuous in time, and measurements are taken at discrete time steps. That is
\[ \dot{x}(t) = q(x(t), v(t), t), \quad z[k] = h_k(x[k], w[k]), \]
with $E\{w[k]\} = 0$, $\mathrm{Var}\{w[k]\} = R$ (assumed constant for simplicity). We use the notation $x[k] = x(kT)$ with $T$ being a constant sampling time.
Remark. One could discretize the dynamics and work with the EKF above. However, if the process is “fast” (needs very small $T$), working with a lot of samples may end up being more expensive than the hybrid EKF which works with the continuous-time dynamics.
Definition 5.1 (White Noise). A discrete time signal $v_d[k]$ is said to be white noise if $E\{v_d[k]\} = 0$ and $E\{v_d[k] v_d^T[k+n]\} = Q \delta_d[n]$, where $n$ is an integer and $\delta_d[n]$ is the Kronecker delta, i.e. $\delta_d[0] = 1$ and $\delta_d[n] = 0$ when $n \neq 0$.
Similarly a continuous time signal $v(t)$ is white noise if $E\{v(t)\} = 0$ and $E\{v(t) v^T(t + \tau)\} = Q_c \delta(\tau)$ where $\delta(\tau)$ is the Dirac delta, which may be defined as
\[ \delta(\tau) = \lim_{\epsilon \to 0} \begin{cases} 1/(2\epsilon) & -\epsilon < \tau < \epsilon \\ 0 & \text{otherwise} \end{cases}. \]
Remark. True continuous white noise cannot exist because it would have infinite power (it has constant power spectral density), but it is nonetheless a useful approximation.
Theorem 5.3. The Dirac pulse has the property that
\[ \int_a^b \xi(\tau) \delta(\tau) \, d\tau = \xi(0), \]
for all $a < 0$ and $b > 0$ and any real valued function $\xi(t)$ that is continuous at 0.
Theorem 5.4 (Hybrid EKF process update). Solve in the interval $(k-1)T \leq t \leq kT$ the ODE
\[ \dot{\hat{x}}(t) = q(\hat{x}(t), 0, t), \quad \hat{x}((k-1)T) = \hat{x}_m[k-1] \]
and set $\hat{x}_p[k] = \hat{x}(kT)$. Then solve in the same interval the matrix ODE
\[ \dot{P}(t) = A(t) P(t) + P(t) A^T(t) + L(t) Q_c L^T(t), \]
where $A(t) = \partial_x q(\hat{x}(t), 0, t)$ and $L(t) = \partial_v q(\hat{x}(t), 0, t)$ with $P((k-1)T) = P_m[k-1]$. Then set $P_p[k] = P(kT)$.
Proof. Consider only $0 \leq t \leq T$ and generalize later for other $k$. To obtain the mean update we take the expectation of the dynamics $E\{\dot{x}(t)\} = E\{q(x(t), v(t), t)\}$. Then the time-derivative and $E\{\cdot\}$ commute, and we assume that $E\{q(\cdot)\} \approx q(E\{\cdot\})$ to get the ODE for $\hat{x}(t)$.
For the variance update linearize the system with $A(t) = \partial_x q(\hat{x}(t), 0, t)$, $L(t) = \partial_v q(\hat{x}(t), 0, t)$ and let $\tilde{x}(t) = x(t) - \hat{x}(t)$; assume $\tilde{x}$ and $v(t)$ are small (may be a bad assumption, especially if $v(t)$ is unbounded). Then $\dot{\tilde{x}}(t) \approx A(t) \tilde{x}(t) + L(t) v(t)$, and
\[ \tilde{x}(t + \tau) \approx \tilde{x}(t) + \int_t^{t+\tau} A(\xi) \tilde{x}(\xi) + L(\xi) v(\xi) \, d\xi \approx \tilde{x}(t) + \tau A(t) \tilde{x}(t) + L(t) \int_t^{t+\tau} v(\xi) \, d\xi + O(\tau^2), \]
by linearizing around $\tau = 0$. The integral of $v(\xi)$ cannot be approximated in the same way since $v$ is white noise, whose autocorrelation $Q_c \delta(\cdot)$ is not continuous.
Now, define $P(t) = \mathrm{Var}\{x(t)\} \approx E\{\tilde{x}(t) \tilde{x}^T(t)\}$ similarly to the mean, then
\[ P(t + \tau) \approx P(t) + \tau A(t) P(t) + \tau P(t) A^T(t) + L(t) \iint_{[t, t+\tau]^2} E\{v(\xi) v^T(\eta)\} \, d\xi \, d\eta \; L^T(t) + O(\tau^2) \]
\[ = P(t) + \tau A(t) P(t) + \tau P(t) A^T(t) + \tau L(t) Q_c L^T(t) + O(\tau^2), \]
where in the second step we used the fact that the integrand equals $Q_c \delta(\xi - \eta)$. Reordering the equation to isolate $(P(t + \tau) - P(t))/\tau$ on one side and letting $\tau \to 0$ yields the variance update ODE.
The measurement equations are the same as for the discrete-time EKF.
Theorem 5.5 (Hybrid EKF measurement update). Let
\[ H[k] = \partial_x h_k(\hat{x}_p[k], 0), \quad M[k] = \partial_w h_k(\hat{x}_p[k], 0). \]
Then the update equations are
\[ K[k] = P_p[k] H^T[k] \left( H[k] P_p[k] H^T[k] + M[k] R M^T[k] \right)^{-1}, \]
\[ \hat{x}_m[k] = \hat{x}_p[k] + K[k] \left( \bar{z}[k] - h_k(\hat{x}_p[k], 0) \right), \]
\[ P_m[k] = (I - K[k] H[k]) P_p[k]. \]
Remark. Solving the matrix ODE is usually done with numerical ODE solvers such as Runge-Kutta (e.g. ode45 in MATLAB), and the accuracy largely depends on the order of the solver. Numerical accuracy often comes at the cost of increased computation.
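As an illustration of theorem 5.4, the sketch below integrates the mean and variance ODEs for a scalar state with a classical fourth-order Runge-Kutta scheme. Everything here is an assumption made for the example: the function name `hybrid_ekf_predict`, the fixed step count, and the linear test dynamics q(x, v, t) = −x + v, which were picked because the result can be compared with a closed-form solution.

```python
import math

def hybrid_ekf_predict(x_m, P_m, q, dq_dx, dq_dv, Qc, t0, T, steps=100):
    """Scalar hybrid EKF process update: integrate both ODEs over [t0, t0 + T] with RK4."""
    def f(t, x, P):
        A = dq_dx(x, t)  # A(t) = dq/dx evaluated at (x_hat, v = 0)
        L = dq_dv(x, t)  # L(t) = dq/dv evaluated at (x_hat, v = 0)
        # Scalar case: A P + P A^T = 2 A P
        return q(x, 0.0, t), 2.0 * A * P + L * Qc * L

    h = T / steps
    t, x, P = t0, x_m, P_m
    for _ in range(steps):
        k1x, k1P = f(t, x, P)
        k2x, k2P = f(t + h / 2, x + h / 2 * k1x, P + h / 2 * k1P)
        k3x, k3P = f(t + h / 2, x + h / 2 * k2x, P + h / 2 * k2P)
        k4x, k4P = f(t + h, x + h * k3x, P + h * k3P)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        P += h / 6 * (k1P + 2 * k2P + 2 * k3P + k4P)
        t += h
    return x, P  # x_p[k] and P_p[k]

# Hypothetical test dynamics: q(x, v, t) = -x + v, hence A = -1 and L = 1.
q = lambda x, v, t: -x + v
dq_dx = lambda x, t: -1.0
dq_dv = lambda x, t: 1.0

x_p, P_p = hybrid_ekf_predict(2.0, 1.0, q, dq_dx, dq_dv, Qc=0.4, t0=0.0, T=0.5)

# Closed-form solutions of the two linear ODEs, for comparison.
x_exact = 2.0 * math.exp(-0.5)
P_exact = 0.2 + (1.0 - 0.2) * math.exp(-1.0)
print(x_p, x_exact, P_p, P_exact)
```

For nonlinear q the same routine applies with state-dependent derivatives, but no closed form is then available to check against.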
6 Particle Filter (PF)
The basic idea of the particle filter is to approximate the Bayesian state estimator for nonlinear systems and general (non Gaussian) noise distributions, by representing the state PDF by a large number of samples called particles. The overview of the particle filter is as follows:
• the particles are propagated through the process model,
• the particles are then weighted according to the measurement likelihood,
• a resampling generates a new set of particles.
6.1 Problem Statement
Consider the nonlinear discrete-time system
\[ x(k) = q_{k-1}(x(k-1), v(k-1)), \quad z(k) = h_k(x(k), w(k)), \]
where $x(0)$, $\{v(\cdot)\}$ and $\{w(\cdot)\}$ are mutually independent and can be CRVs or DRVs with known PDF (no assumption on the shape of the PDF). Any known input is implicitly absorbed in $q_{k-1}(\cdot)$.
6.2 Monte Carlo (MC) Sampling
MC sampling is a basic technique of using a large number of samples called particles to approximate the PDF of a RV.
MC approximant of a DRV Let $y \in \mathcal{Y} = \{1, 2, \dots, \bar{Y}\}$ be a DRV with PDF $p_y$. Let $\{y^1, y^2, \dots, y^N\}$ be i.i.d. DRVs with PDF $p_y$ that model $N$ random samples of $y$, and define
\[ s_i^n = \delta(i - y^n) = \begin{cases} 1 & \text{if } y^n = i \\ 0 & \text{otherwise} \end{cases}, \]
where $i = 1, \dots, \bar{Y}$ and $n = 1, \dots, N$ (there are $N \times \bar{Y}$ $s_i^n$’s). By the law of the unconscious statistician
\[ E_{y^n}\{s_i^n\} = \sum_{\bar{y}^n = 1}^{\bar{Y}} \delta(i - \bar{y}^n) \, p_y(\bar{y}^n) = p_y(i). \]
[…] with $a$ indexing the bins. Again, by the law of the unconscious statistician
\[ E_{y^n}\{s_a^n\} = \int_{\mathbb{R}} \left[ \int_a^{a + \Delta y} \delta(\xi - \bar{y}^n) \, d\xi \right] p_y(\bar{y}^n) \, d\bar{y}^n = \int_a^{a + \Delta y} p_y(\bar{y}^n) \, d\bar{y}^n = \Pr\{a \leq y < a + \Delta y\}. \]
Hence $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} s_a^n = E_{y^n}\{s_a^n\} = \Pr\{y \in [a, a + \Delta y)\}$ by the LLN, since
\[ \frac{1}{N} \sum_{n=1}^{N} s_a^n = \frac{1}{N} \sum_{n=1}^{N} \int_a^{a + \Delta y} \delta(\xi - y^n) \, d\xi = \int_a^{a + \Delta y} \frac{1}{N} \sum_{n=1}^{N} \delta(\xi - y^n) \, d\xi \to \int_a^{a + \Delta y} p_y(\bar{y}) \, d\bar{y}. \]
Thus for a smooth and bounded $p_y$ and small $\Delta y$ we approximate
\[ p_y(\xi) \approx \frac{1}{N} \sum_{n=1}^{N} \delta(\xi - \bar{y}^n), \quad \forall \xi, \]
where the approximation is understood in the sense that integrating both sides over the same interval gives similar numbers.
Change of variables for MC approximant Consider $x = g(y)$, with $x \in \mathcal{X} = g(\mathcal{Y})$. Let $x^n = g(y^n)$, $j \in \mathcal{X}$, and $r_j^n = \delta(j - x^n)$ (similar to $s_i^n$), then
\[ p_x(j) \approx \frac{1}{N} \sum_{n=1}^{N} \delta(j - g(\bar{y}^n)), \quad j \in g(\mathcal{Y}), \]
i.e. we can approximate $p_x$ by using samples from $p_y$. This also holds for joint RVs
\[ p_x(\xi) \approx \frac{1}{N} \sum_{n=1}^{N} \delta(\xi - \bar{x}^n), \quad \forall \xi, \]
where $\xi$ and $\bar{x}^n$ may be vectors. Moreover, $\mathcal{X}$ and $\mathcal{Y}$ may also be infinite.
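The three steps listed in the overview at the start of this section (propagate, weight, resample) can be sketched for a scalar state as follows. The particles play the role of the MC samples of §6.2. All names (`pf_step`, `sample_v`, the lambdas) and the linear-Gaussian test system are assumptions made for this example; that system was chosen only because the Kalman filter then provides a reference value for the posterior mean.

```python
import bisect
import math
import random
from itertools import accumulate

def pf_step(particles, z, q, likelihood, sample_v, rng):
    """One particle filter iteration for a scalar state: propagate, weight, resample."""
    # 1) Propagate every particle through the process model with its own noise sample.
    predicted = [q(x, sample_v(rng)) for x in particles]
    # 2) Weight by the measurement likelihood and normalize so the weights sum to 1.
    weights = [likelihood(z, x) for x in predicted]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all particles have zero likelihood")
    cdf = list(accumulate(w / total for w in weights))
    # 3) Resample: for r ~ U(0, 1) pick the first index whose cumulative weight is >= r.
    resampled = []
    for _ in particles:
        idx = bisect.bisect_left(cdf, rng.random())
        resampled.append(predicted[min(idx, len(predicted) - 1)])  # guard against round-off
    return resampled

# Hypothetical system: x(k) = 0.5 x(k-1) + v, z = x + w, v ~ N(0, 0.1^2), w ~ N(0, 0.5^2).
rng = random.Random(1)
N = 5000
particles = [rng.gauss(0.0, 1.0) for _ in range(N)]  # samples of x(0) ~ N(0, 1)
posterior = pf_step(
    particles,
    z=1.0,
    q=lambda x, v: 0.5 * x + v,
    likelihood=lambda z, x: math.exp(-0.5 * ((z - x) / 0.5) ** 2),
    sample_v=lambda r: r.gauss(0.0, 0.1),
    rng=rng,
)
pf_mean = sum(posterior) / N
# Reference from the KF: P_p = 0.5^2 * 1 + 0.1^2 = 0.26, R = 0.25, x_p = 0.
kf_mean = 0.26 / (0.26 + 0.25) * 1.0
print(pf_mean, kf_mean)
```

The likelihood only needs to be known up to a constant factor, since the weights are normalized before resampling.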
Prior Update Given the PDF $p_{x_m(k-1)}$ we construct $p_{x_p(k)}$ by approximating both with MC sampling. Let
\[ p_{x_m(k-1)}(\xi) \approx \frac{1}{N} \sum_{n=1}^{N} \delta(\xi - \bar{x}_m^n(k-1)), \quad \forall \xi, \]
where $\{\bar{x}_m^n(k-1)\}$ are $N$ particles to approximate $x_m(k-1)$. Then
\[ p_{x_p(k)}(\xi) \approx \frac{1}{N} \sum_{n=1}^{N} \delta(\xi - \bar{x}_p^n(k)), \quad \forall \xi, \]
where $\bar{x}_p^n(k) = q_{k-1}(\bar{x}_m^n(k-1), \bar{v}^n(k-1))$, and $\{\bar{v}^n(k-1)\}$ are MC samples of $p_{v(k-1)}$. In words: we “simply” propagate the particles through the dynamics.
Measurement Update (and Resampling) Given the PDF $p_{x_p(k)}$ of $x_p(k) \in \mathcal{X}$ and a measurement $\bar{z}(k)$ we approximately construct $p_{x_m(k)}$ using MC sampling. By Bayes’ rule (proposition 2.2 in §2.2):
\[ p_{x_m(k)}(\xi) = \frac{p_{z_m(k)|x_p(k)}(\bar{z}(k)|\xi) \, p_{x_p(k)}(\xi)}{\sum_{\zeta \in \mathcal{X}} p_{z_m(k)|x_p(k)}(\bar{z}(k)|\zeta) \, p_{x_p(k)}(\zeta)}, \quad \forall \xi. \]
Substituting the MC approximation for $p_{x_p(k)}$, and making use of the fact that $f(\xi)\delta(\xi) = f(0)\delta(\xi)$,
\[ p_{x_m(k)}(\xi) \approx \sum_{n=1}^{N} \beta_n \delta(\xi - \bar{x}_p^n(k)), \]
where $\sum_{n=1}^{N} \beta_n = 1$, and
\[ \beta_n = \alpha \, p_{z_m(k)|x_p(k)}(\bar{z}(k)|\bar{x}_p^n(k)), \quad \alpha = \left( \sum_{n=1}^{N} p_{z_m(k)|x_p(k)}(\bar{z}(k)|\bar{x}_p^n(k)) \right)^{-1}. \]
In words: at points of high prior there are many particles; in the posterior they are scaled by the measurement likelihood.
To complete the measurement update we need to resample the particles. This is algorithm 1.1 of §1.3. Repeat $N$ times:
• Select a random number $r \sim \mathcal{U}(0, 1)$,
• Pick particle $\bar{n}$ such that
\[ \sum_{n=1}^{\bar{n}-1} \beta_n < r \quad \text{and} \quad \sum_{n=1}^{\bar{n}} \beta_n \geq r. \]
The result is $N$ new particles $\bar{x}_m^n(k)$, drawn from a subset of the old particles, that all have equal weight.
6.4 Sample Impoverishment
A possible problem of the PF is that all particles may converge to the same one and become a bad representation of the PDF. This is because we have a finite number of samples $N$, and is called sample impoverishment. The simplest solution to prevent this is roughening.
After the resampling perturb the particles with
\[ \bar{x}_m^n(k) \leftarrow \bar{x}_m^n(k) + \Delta x^n(k), \]
where $\Delta x^n(k)$ is drawn from a zero-mean, finite-variance distribution. To choose the variance of said distribution a simple way is to let
\[ \sigma_i = K E_i N^{-1/d}, \]
where $K \ll 1$ is a tuning parameter, $d$ is the dimension of the state space, $E_i = \max_{n_1, n_2} |x_{m,i}^{n_1} - x_{m,i}^{n_2}|$ is the maximum inter-sample variability and $N^{-1/d}$ is related to the spacing between nodes of a uniform grid.
7 Observer Based Control
For many modern control strategies knowledge of the system state $x(k)$ is required. If perfect state measurements are not available it is replaced with a state estimate $\hat{x}(k)$. Henceforth we will discuss why and when it makes sense to separate the problem into estimation and feedback control (separation principle). Consider the LTI system
\[ x(k) = A x(k-1) + B u(k-1) + v(k-1), \]
\[ z(k) = H x(k) + w(k), \]
where $v(k-1)$ and $w(k)$ are zero-mean CRVs to model noise. We want $\hat{x}(k) \to x(k)$ as $k \to \infty$ in absence of noise, and $\hat{x}(k) \to E\{x(k)\}$ as $k \to \infty$ with bounded variance with noise.
Definition 7.1 (Luenberger Observer).
\[ \hat{x}(k) = A \hat{x}(k-1) + B u(k-1) + K(\bar{z}(k) - \hat{z}(k)), \]
\[ \hat{z}(k) = H(A \hat{x}(k-1) + B u(k-1)), \]
where $K$ is a static correction matrix that is to be designed.
Remark. In absence of noise ($v(k-1) = 0$, $w(k) = 0$) $\bar{z}(k) = z(k)$ and the error is
\[ e(k) = x(k) - \hat{x}(k) = (I - KH) A e(k-1), \]
hence $e(k) \to 0$ as $k \to \infty$ iff $(I - KH)A$ is stable (all eigenvalues $|\lambda_i| < 1$). Also, from linear system theory we know there exists such a stabilizing $K$ iff $(A, HA)$ is detectable, which holds iff $(A, H)$ is detectable.
Remark. An alternate formulation of the Luenberger Observer is
\[ \hat{x}(k+1) = A \hat{x}(k) + B u(k) + K(\bar{z}(k) - \hat{z}(k)), \]
\[ \hat{z}(k) = H \hat{x}(k). \]
The error dynamics are then $e(k+1) = (A - KH) e(k)$, which are stable iff $A - KH$ is stable; a stabilizing $K$ exists iff $(A, H)$ is detectable.
7.1 Static State-Feedback Control
In the deterministic case (no noise, $v(k-1) = 0$, $w(k) = 0$) we introduce a linear static feedback law $u(k) = F x(k) = F H^{-1} z(k)$ (for invertible $H$) by choosing a matrix $F$. The closed loop dynamics $x(k) = (A + BF) x(k-1)$ are stable iff $A + BF$ is stable. From linear system theory a stabilizing $F$ exists iff $(A, B)$ is stabilizable.
The feedback $F$ can be chosen by pole placement or using a linear quadratic regulator (LQR), which yields the $F$ that minimizes
\[ J_{LQR} = \sum_{k=0}^{\infty} x^T(k) \bar{Q} x(k) + u^T(k) \bar{R} u(k), \]
where $\bar{Q} = \bar{Q}^T \succeq 0$ and $\bar{R} = \bar{R}^T \succ 0$.
The LQR is dual to the steady-state KF design problem, because the solution for the optimal $F$ is
\[ F = -(B^T P B + \bar{R})^{-1} B^T P A, \]
[…] hence
\[ \begin{pmatrix} x(k) \\ e(k) \end{pmatrix} = \begin{pmatrix} A + BF & -BF \\ 0 & (I - KH)A \end{pmatrix} \begin{pmatrix} x(k-1) \\ e(k-1) \end{pmatrix}. \]
[…]
1. Design a steady-state KF (independent of $\bar{Q}$ and $\bar{R}$) that provides the estimate $\hat{x}(k)$ of $x(k)$.
2. Design an optimal state-feedback strategy $u(k) = F x(k)$ (independent of $Q$ and $R$) for the deterministic LQR problem ($x(k) = A x(k-1) + B u(k-1)$) that minimizes $J_{LQR}$ (above).
3. Combine the two.
Because of optimality this is called the separation theorem for LTI systems and quadratic cost.
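The three design steps can be sketched numerically as follows. The sketch assumes NumPy and takes P in the formula for F to be the solution of the discrete algebraic Riccati equation, a standard result that this excerpt does not state explicitly. The steady-state KF gain is obtained by simply iterating the KF variance equations of §4 until they settle. The function names, the system matrices, the weights and the iteration counts are all illustrative choices.

```python
import numpy as np

def lqr_gain(A, B, Q_bar, R_bar, iters=1000):
    """Iterate the discrete Riccati recursion, then return F = -(B^T P B + R_bar)^{-1} B^T P A."""
    P = Q_bar.copy()
    for _ in range(iters):
        BtP = B.T @ P
        P = Q_bar + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(BtP @ B + R_bar, BtP @ A)
    return -np.linalg.solve(B.T @ P @ B + R_bar, B.T @ P @ A)

def steady_state_kf_gain(A, H, Q, R, iters=1000):
    """Iterate the KF variance equations and return the (approximately) converged gain K."""
    n = A.shape[0]
    P_m = np.eye(n)
    for _ in range(iters):
        P_p = A @ P_m @ A.T + Q
        K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
        P_m = (np.eye(n) - K @ H) @ P_p
    return K

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

# Hypothetical double integrator with position measurement.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])

K = steady_state_kf_gain(A, H, Q=0.1 * np.eye(2), R=np.array([[1.0]]))   # step 1
F = lqr_gain(A, B, Q_bar=np.eye(2), R_bar=np.array([[1.0]]))            # step 2

# Step 3: combined closed loop in the coordinates (x, e).
A_ctrl = A + B @ F
A_est = (np.eye(2) - K @ H) @ A
closed_loop = np.block([[A_ctrl, -B @ F], [np.zeros((2, 2)), A_est]])

print(spectral_radius(A_ctrl), spectral_radius(A_est), spectral_radius(closed_loop))
```

Because the closed-loop matrix is block triangular, its eigenvalues are those of A + BF together with those of (I − KH)A, which is why the estimator and the controller can be designed independently.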