
Collection of ROB 501 Lecture Notes

J.W. Grizzle

Fall 2015


Rob 501 Fall 2014


Lecture 01
Typeset by: Jimmy Amin
Proofread by: Ross Hartley

Introduction to Mathematical Arguments

Notation:

N = {1, 2, 3, · · · }  Natural numbers or counting numbers

Z = {· · · , −3, −2, −1, 0, 1, 2, 3, · · · }  Integers or whole numbers

Q = {m/q | m, q ∈ Z, q ≠ 0, no common factors (reduce all fractions)}  Rational numbers

R = Real numbers

C = {α + jβ | α, β ∈ R, j² = −1}  Complex numbers
∀ means "for every", "for all", "for each".
∃ means "for some", "there exist(s)", "there is/are", "for at least one".
∼ means "not". In books, and some of our handouts, you see ¬.
p ⇒ q means "if p is true, then q is true".
p ⇐⇒ q means "p is true if and only if q is true".
p ⇐⇒ q is logically equivalent to:
(a) p ⇒ q and
(b) q ⇒ p.
The contrapositive of p ⇒ q is ∼q ⇒ ∼p (logically equivalent).
The converse of p ⇒ q is q ⇒ p.
Relation: (p ⇒ q) ⇔ (∼q ⇒ ∼p)

However, in general, (p ⇒ q) DOES NOT IMPLY (q ⇒ p), and vice-versa


□ = Q.E.D. (Latin: "quod erat demonstrandum" = "which was to be demonstrated")

Review of Some Proof Techniques

Direct Proofs: We derive a result by applying the rules of logic to the given
assumptions, definitions, axioms, and (already) known theorems.

Example:
Def. An integer n is even if n = 2k for some integer k; it is odd if n = 2k + 1
for some integer k. Prove that the sum of two odd integers is even.

(Remark: In a definition, "if" means "if and only if".)

Proof: Let a and b be odd integers.


Hence, there exist integers k1 and k2 such that
a = 2k1 + 1
b = 2k2 + 1
It follows that
a + b = (2k1 + 1) + (2k2 + 1) = 2(k1 + k2 + 1)
Because (k1 + k2 + 1) is an integer, a + b is even. 

Proof by Contrapositive: To establish p ⇒ q, we prove its logical equivalent,

∼q ⇒ ∼p.

As an example, let n be an integer. Prove that if n² is even, then n is even.

p = "n² is even",  ∼p = "n² is odd"
q = "n is even",   ∼q = "n is odd"


Our proof of p ⇒ q is to show ∼q ⇒ ∼p (i.e., if n is odd, then n² is odd).

Assume n is odd. ∴ n = 2k + 1 for some integer k.
Therefore
n² = (2k + 1)² = 4k² + 4k + 1 = 2(2k² + 2k) + 1

Because (2k² + 2k) is an integer, n² is odd, and we are done. □

Proof by Exhaustion: Reduce the proof to a finite number of cases, and then
prove each case separately.

Proofs by Induction:

First Principle of Induction (Standard Induction): Let P (n) denote a


statement about the natural numbers with the following properties:

(a) Base case: P (1) is true


(b) Induction part: If P (k) is true, then P (k + 1) is true.

∴ P (n) is true for all n ≥ 1 (n ≥ base case)

Example:
Claim: For all n ≥ 1, 1 + 3 + 5 + · · · + (2n − 1) = n²
Proof:

Step 1: Base case: n = 1: 1 = 1² ✓

Step 2: Induction hypothesis: Assume 1 + 3 + 5 + · · · + (2k − 1) = k².
Step 3: To show: 1 + 3 + 5 + · · · + (2k − 1) + (2(k + 1) − 1) = (k + 1)².

By the induction hypothesis,

1 + 3 + 5 + · · · + (2k − 1) + (2(k + 1) − 1) = k² + (2(k + 1) − 1)


But,
k² + (2(k + 1) − 1) = k² + 2k + 2 − 1 = k² + 2k + 1 = (k + 1)²
which is what we wanted to show. □
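(Added illustration, not part of the original notes.) A short Python snippet can spot-check the claim numerically for small n before one trusts the induction argument:

# Verify 1 + 3 + ... + (2n - 1) = n^2 for n = 1, ..., 20.
for n in range(1, 21):
    lhs = sum(2 * k - 1 for k in range(1, n + 1))  # sum of the first n odd numbers
    assert lhs == n ** 2, (n, lhs)
print("Checked: 1 + 3 + ... + (2n - 1) = n^2 for n = 1, ..., 20")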


Rob 501 Fall 2014


Lecture 02
Typeset by: Ross Hartley
Proofread by: Jimmy Amin

Review of Some Proof Techniques (Continued)

Second Principle of Induction (Strong Induction): Let P (n) be a state-


ment about the natural numbers with the following properties:

(a) Base Case: P (1) is true.

(b) Induction: If P (j) is true for all 1 ≤ j ≤ k, then P (k + 1) is true.

Conclusion: P (n) is true for all n ≥ 1 (n ≥ Base Case).

Fact: The two principles of induction are equivalent. Sometimes, the second

method is easier to apply.

Example:
Def.: A natural number n is composite if it can be factored as n = a · b, where
a and b are natural numbers satisfying 1 < a, b < n. Otherwise, n is prime.

Theorem: (Fundamental Theorem of Arithmetic) Every natural number n≥2


can be factored as a product of one or more primes.

Proof:

Base Case: The number 2 can be written as the product of a single prime.

Induction: Assume that every integer between 2 and k can be written as

the product of one or more primes.


To Show: k+1 can be written as the product of one or more primes.

There are two cases:

Case 1: k + 1 is prime. We are done because k + 1 is the product of one or


more primes (itself ).

Case 2: k + 1 is composite. Then, there exist two natural numbers a and

b, 1 < a, b ≤ k , such that k + 1 = a · b

Therefore, by the induction step:

a = p1 · p2 · · · · · pi , for some primes p1 , . . . , pi
b = q1 · q2 · · · · · qj , for some primes q1 , . . . , qj
Hence, a · b = (p1 · p2 · · · · · pi ) · (q1 · q2 · · · · · qj ) is a product of primes. 

Proof by Contradiction: We want to show that a statement p is true.

We assume instead that the statement is false. We derive a "contradiction",

meaning some statement that is obviously false, such as " 1 + 1 = 3". More

generally, we derive that R is true and R is also false (This is a contradiction.)

We conclude that ∼p is impossible (led to a contradiction). Hence, p must be

true!


Example: Prove that √2 is an irrational number.

Proof by Contradiction: Assume √2 is rational.

Conclusion: There exist natural numbers m and n (n ≠ 0), with m and n having no

common factors, such that

√2 = m/n

∴ 2 = m²/n² ⇒ 2n² = m² ⇒ m² is even ⇒ m has to be even. (Proven in the previous

lecture: if m² is even, then m is even.)

∴ ∃ a natural number k such that m = 2k


∴ 2n² = (2k)² = 4k²
∴ n² = 2k² ⇒ n² is even ⇒ n is even
Conclusion: m and n have 2 as a common factor. This contradicts m and n
having no common factors.

Hence, √2 is not a rational number.

∴ √2 must be irrational. □

Explanation:

p: √2 is irrational.

We start with the assumption that (∼p:) √2 is a rational number.

Based on that assumption, we can deduce that

(R:) ∃ m, n, n ≠ 0, m and n do not have common factors, such that √2 = m/n.

However, from √2 = m/n, we can show that (∼R:) m and n have 2 as a common
factor.

∴ R ∧ (∼R), which is a contradiction.


Conclusion: ∼ p is impossible.

∴ p is true.

Proof Types: In conclusion, we have following proof techniques.

• Direct Proof: p⇒q


• Proof by Contrapositive: ∼ q ⇒∼ p
(Start with the conclusion being false, that is ∼q and do logical steps to

arrive at ∼ p)
• Proof by Contradiction: p ∧ (∼q)
(Assume p is true and q is false. Find that both R and ∼R are true,

which is a contradiction.)

Negating a Statement:
Examples:

p:x≥0 ∼p:x<0


p : ∀x ∈ R, f (x) > 0 ∼ p : ∃x ∈ R, f (x) ≤ 0


In general, ∼∀=∃ and ∼ ∃ = ∀.

Exercise: Let y ∈ R,

p : ∀δ > 0, ∃x ∈ Q such that |x − y| < δ


What is ∼ p?

Answer:

∼ p : ∃δ > 0, ∀x ∈ Q such that |x − y| ≥ δ

Key Properties of Real Numbers: Let A be a non-empty subset of R.

Def.

(1) A is bounded from above if ∃b ∈ R such that x ∈ A ⇒ x ≤ b.


(2) A number b∈R is an upper bound for A if ∀x ∈ A, x ≤ b.
(3) A number b is a least upper bound for A if

(i) b is an upper bound for A, and

(ii) b is less than or equal to every upper bound.

Notation: Least upper bound of A is denoted by sup(A), the supremum of A.

Theorem: Every non-empty subset of R that is bounded from above has a supremum.

This is FALSE for Q.


Here is a classical example:

Consider A = {x ∈ Q | x² < 2}.

An obvious candidate for the supremum is x = √2, but √2 is irrational.


Rob 501 Fall 2014


Lecture 03
Typeset by: Pedro Di Donato
Proofread by: Mia Stevens

Abstract Linear Algebra

Def: Field (Chen, 2nd edition, page 8): A field consists of a set, denoted
by F, of elements called scalars and two operations called addition "+" and
multiplication "·"; the two operations are defined over F such that they satisfy
the following conditions:

1. To every pair of elements α and β in F, there corresponds an element α + β

in F called the sum of α and β, and an element α · β in F called the product
of α and β.
2. Addition and multiplication are respectively commutative: For any α and
β in F,
α+β =β+α α·β =β·α

3. Addition and multiplication are respectively associative: For any α, β , γ


in F,
(α + β) + γ = α + (β + γ) (α · β) · γ = α · (β · γ)

4. Multiplication is distributive with respect to addition: For any α, β , γ in


F,
α · (β + γ) = (α · β) + (α · γ)

5. F contains an element, denoted by 0, and an element, denoted by 1, such


that α + 0 = α, 1 · α = α for every α in F .

6. To every α in F , there is an element β in F such that α + β = 0. The


element β is called the additive inverse.


7. To every α in F which is not the element 0, there is an element γ in F


such that α · γ = 1. The element γ is called the multiplicative inverse.

Remark: R is a typical example of a field.

Examples: R, C, Q.

Non-examples: the irrational numbers (fails axiom 1); 2×2 matrices with real
coefficients (fails axiom 2); 2×2 diagonal matrices with real coefficients (fails
axiom 7).

Def: Vector Space (Linear Space) (Chen, 2nd edition, page 9): A linear
space over a field F, denoted by (X , F), consists of a set, denoted by X , of
elements called vectors, a field F, and two operations called vector addition
and scalar multiplication. The two operations are defined over X and F such
that they satisfy all the following conditions:

1. To every pair of vectors x1 and x2 in X , there corresponds a vector x1 + x2

in X , called the sum of x1 and x2.

2. Addition is commutative: For any x1 , x2 inX , x1 + x2 = x2 + x1 .



3. Addition is associative: For any x1, x2, and x3 in X ,
(x1 + x2) + x3 = x1 + (x2 + x3).
4. X contains a vector, denoted by 0, such that 0+x = x for every x in X.
The vector 0 is called the zero vector or the origin.

5. To every x in X, there is a vector x̄ in X, such that x + x̄ = 0.


6. To every α in F , and every x in X , there corresponds a vector α·x in X
called the scalar product of α and x.

7. Scalar multiplication is associative: For any α, β in F and any x in X,


α · (β · x) = (α · β) · x.
¹ We use x1, x2, x3 to denote different vectors; the superscript does not denote powers!


8. Scalar multiplication is distributive with respect to vector addition: For

any α in F and any x1, x2 in X , α · (x1 + x2) = α · x1 + α · x2.
9. Scalar multiplication is distributive with respect to scalar addition: For
any α, β in F and any x in X , (α + β) · x = α · x + β · x.
10. For any x in X , 1 · x = x, where 1 is the element 1 in F.

Remark: F = field, X = set of vectors

Examples:

1. Every field forms a vector space over itself: (F, F). Examples: (R, R),
(C, C), (Q, Q).
2. X = C, F = R: (C, R).
3. F = R, D ⊂ R (examples: D = [a, b]; D = (0, ∞); D = R) and X =
{f : D → R} = {functions from D to R}.
For f, g ∈ X , define f + g ∈ X by ∀t ∈ D, (f + g)(t) := f (t) + g(t), and for
α ∈ R and f ∈ X , define α · f ∈ X by ∀t ∈ D, (α · f )(t) := α · f (t).
4. Let F be a field and define F^n, the set of n-tuples written as columns:

   F^n = { [α1, . . . , αn]^T | αi ∈ F, 1 ≤ i ≤ n } = X

   Vector addition: [α1, . . . , αn]^T + [β1, . . . , βn]^T = [α1 + β1, . . . , αn + βn]^T

   Scalar multiplication: α · x = [αx1, . . . , αxn]^T

5. X = F^{n×m} = {n × m matrices with coefficients in F}


Non-examples:

1. X = R, F = C, (R, C) - Fails the definition of scalar multiplication (and

others).

2. X = {x ≥ 0, x ∈ R}, F = R - Fails the definition of scalar multiplication

(and others).

Def: Subspace: Let (X , F) be a vector space, and let Y be a subset of X .

Then Y is a subspace if, using the rules of vector addition and scalar multipli-
cation defined in (X , F), we have that (Y, F) is a vector space.

Remark: To apply the definition, you have to check axioms 1 to 10.

Proposition: (Tools to check that something is a subspace) Let (X , F) be a


vector space and Y ⊂ X. Then, the following are equivalent (TFAE):

1. (Y, F) is a subspace.

2. ∀y 1 , y 2 ∈ Y, y 1 + y 2 ∈ Y (closed under vector addition), and ∀y ∈ Y and


α ∈ F, αy ∈ Y (closed under scalar multiplication).
3. ∀y 1 , y 2 ∈ Y , ∀α ∈ F, α · y 1 + y 2 ∈ Y .

Example: (X , F), F = R, X = {f : (−∞, ∞) → R},

Y = {polynomials with real coefficients}
Is Y a subspace? Yes, by part 2 of the proposition.

Non-example: X = R², F = R,
Y = { [x1, x2]^T ∈ R² | x1 + x2 = 3 }.
Let x = [x1, x2]^T ∈ Y and y = [y1, y2]^T ∈ Y. Then x + y = [x1 + y1, x2 + y2]^T ∉ Y
because x1 + y1 + x2 + y2 = 6.

Therefore, x + y ∉ Y, which means that this space is not closed under vector
addition! Thus, it is not a subspace!

Note: Every vector space needs to contain the 0 vector.


ROB 501 Fall 2014


Lecture 04
Typeset by: Xiangyu Ni
Proofread by: Sulbin Park

Abstract Linear Algebra (Continued)

Def. Let (X , F) be a vector space. A linear combination is a finite sum of

the form α1 x1 + α2 x2 + · · · + αn xn, where n ≥ 1, αi ∈ F, xi ∈ X .

Remark: x^i denotes an individual vector (with components x^i_1, . . . , x^i_n when
X = F^n); the superscript is an index, not a power.

Something of the form Σ_{k=1}^{∞} αk v^k is not a linear combination because it is not
finite.

Def. A finite set of vectors {v1, . . . , vk} is linearly dependent if ∃αi ∈ F,

not all zero, such that α1 v1 + α2 v2 + · · · + αk vk = 0. Otherwise, the set is
linearly independent.

Remark: For a linearly independent set {v 1 , . . . , v k }, α1 v 1 + α2 v 2 + · · · +


αk v k = 0 ⇐⇒ α1 = 0, α2 = 0, . . . , αk = 0.

Def. An arbitrary set of vectors S ⊂ X is linearly independent if every finite

subset is linearly independent.


Remark: Suppose {v1, . . . , vk} is a linearly dependent set. Then ∃α1, . . . , αk,

not all zero, such that α1 v1 + α2 v2 + · · · + αk vk = 0.

Suppose α1 ≠ 0. Then
α1 v1 = −α2 v2 − α3 v3 − · · · − αk vk
v1 = −(α2/α1) v2 − (α3/α1) v3 − · · · − (αk/α1) vk

∴ v1 is a linear combination of {v2, . . . , vk}.

Example: X = P(t) = {set of polynomials with real coefficients}. F = R.

Claim: The monomials are linearly independent. In particular, for each n ≥ 0,
the set {1, t, . . . , t^n} is linearly independent.

Proof: Let α0 + α1 t + · · · + αn t^n = 0 = the zero polynomial. We need to show that

α0 = α1 = · · · = αn = 0.

Recall that if p(t) ≡ 0, then (d^k p(t)/dt^k)|_{t=0} = 0 for k = 0, 1, 2, . . . .

p(t) = α0 + α1 t + · · · + αn t^n
0 = p(0) ⇐⇒ α0 = 0
0 = (dp(t)/dt)|_{t=0} = (α1 + 2α2 t + · · · + n αn t^{n−1})|_{t=0} ⇐⇒ α1 = 0
...
Etc. □
Example: Let X = {2×3 matrices with real coefficients}. Let

v1 = [1 0 0; 2 0 0],  v2 = [1 0 0; 0 0 0],  v3 = [0 0 1; 0 0 0],  v4 = [0 0 0; 1 0 0].

{v1, v2} is a linearly independent set:

α1 v1 + α2 v2 = 0 ⇐⇒ [α1 0 0; 2α1 0 0] + [α2 0 0; 0 0 0] = [0 0 0; 0 0 0]
⇐⇒ α1 = α2 = 0.


{v1, v2, v4} is a linearly dependent set:

α1 v1 + α2 v2 + α4 v4 = 0
⇐⇒ [α1 0 0; 2α1 0 0] + [α2 0 0; 0 0 0] + [0 0 0; α4 0 0] = [0 0 0; 0 0 0],
which is satisfied, for example, by α1 = 1, α2 = −1, α4 = −2.

Remark: F is important when determining whether a set is linearly indepen-
dent or not. For example, let X = C and v1 = 1, v2 = j = √−1. v1 and v2

are linearly independent when F = R. However, they are linearly dependent

when F = C.

Def. Let S be a subset of a vector space (X , F). The span of S , denoted


span{S}, is the set of all linear combinations of elements of S .
span{S} = {x ∈ X |∃n ≥ 1, α1 , . . . αn ∈ F, v 1 , . . . , v n ∈ S, x = α1 v 1 + · · · +
αn v n }.

Remark: span{S} is a subspace of X .

Example: Let X = {f : R → R} and F = R. S = {1, t, t², . . . } = {t^k | k ≥
0}. span{S} = P(t) = {polynomials with real coefficients}.

Is e^t ∈ span{S}? No. Although e^t can be written as a sum of monomials

(its Taylor series), the number of terms in that sum is infinite, while a
linear combination has to be finite.

Def. A set of vectors B in (X , F) is a basis for X if

• B is linearly independent.
• span{B} = X .

     
Example: (F^n, F) where F is R or C. e1 = [1, 0, . . . , 0]^T, e2 = [0, 1, . . . , 0]^T, . . . , en = [0, 0, . . . , 1]^T.
{e1, e2, . . . , en} is both linearly independent and its span is F^n.
∴ It is a basis.
It is called the Natural Basis.

Moreover, {e1, e2, . . . , en, je1, je2, . . . , jen} is a basis for C^n in (C^n, R). How-
ever, it is not a basis for C^n in (C^n, C).

Let v1 = [1, 0, . . . , 0]^T, v2 = [1, 1, 0, . . . , 0]^T, . . . , vn = [1, 1, . . . , 1]^T. Then {v1, v2, . . . , vn}
is also a basis for (F^n, F) where F is R or C.

Example: The innite set {1, t, . . . , tn , . . . } is a basis for (P(t), R).

Def. Let n > 0 be an integer. The vector space (X , F) has finite dimension
n if

• there exists a set with n linearly independent vectors, and

• any set with n + 1 or more vectors is linearly dependent.

(X , F) is infinite dimensional if for every n > 0, there is a linearly independent

set with n or more elements in it.


Examples:
dim(F n , F) = n
dim(Cn , R) = 2n
dim(P(t), R) = ∞

Theorem: Let (X , F) be an n-dimensional vector space (n finite). Then,

any set of n linearly independent vectors is a basis.

Proof: Let (X , F) be n-dimensional and let {v1, · · · , vn} be a linearly inde-

pendent set.
To show: ∀x ∈ X , ∃α1, · · · , αn ∈ F such that x = α1 v1 + · · · + αn vn.
How: Because (X , F) is n-dimensional, {x, v1, · · · , vn} is a linearly dependent
set (otherwise dim X > n, which is false). Hence, ∃β0, β1, · · · , βn ∈ F,
NOT ALL ZERO, such that β0 x + β1 v1 + · · · + βn vn = 0.
Claim: β0 ≠ 0
Proof: Suppose that β0 = 0. Then,

1. At least one of β1, · · · , βn is non-zero.

2. β1 v1 + · · · + βn vn = 0.

1 and 2 above imply that {v1, · · · , vn} is a linearly dependent set, which is a

contradiction. Hence, β0 = 0 cannot hold. Completing the proof, we write
β0 x = −β1 v1 − · · · − βn vn
x = (−β1/β0) v1 + · · · + (−βn/β0) vn
∴ α1 = −β1/β0, · · · , αn = −βn/β0. □


ROB 501 Fall 2014


Lecture 05
Typeset by: Meghan Richey
Proofread by: Su-Yang Shieh

Abstract Linear Algebra (Continued)

Proposition: Let (X , F) be a vector space with basis {v 1 , · · · , v n }. Let


x ∈ X . Then, ∃ unique coecients α1 , · · · , αn such that x = α1 v 1 + α2 v 2 +
· · · + αn v n .

Proof: Suppose x can also be written as x = β1 v 1 + β2 v 2 + · · · + βn v n .


We need to show: α1 = β1 , α2 = β2 , · · · , αn = βn .
0 = x − x = (α1 − β1 )v 1 + · · · + (αn − βn )v n
By linear independence of {v 1 , · · · , v n }, we can obtain that α1 −β1 = 0, · · · , αn −
βn = 0.
Hence, α1 = β1, · · · , αn = βn; that is, the coefficients are unique. □

Def: x ∈ X , x = α1 v1 + · · · + αn vn. x uniquely defines the column [α1, α2, . . . , αn]^T ∈ F^n.

[x]_v = [α1, α2, . . . , αn]^T is the representation of x with respect to the basis v = {v1, · · · , vn}
if and only if x = α1 v1 + · · · + αn vn.

Example: F = R, X = {2 × 2 matrices with real coefficients}

       
Basis 1: v1 = [1 0; 0 0], v2 = [0 1; 0 0], v3 = [0 0; 1 0], v4 = [0 0; 0 1]

Basis 2: w1 = [1 0; 0 0], w2 = [0 1; 1 0], w3 = [0 1; −1 0], w4 = [0 0; 0 1]

x = [5 3; 1 4] = 5w1 + 2w2 + 1w3 + 4w4

Therefore, [x]_w = [5, 2, 1, 4]^T ∈ R^4.

Easy Facts:

1. Addition of vectors in (X , F) ≡ Addition of the representations in


(F n , F).
[x + y]v = [x]v + [y]v
2. Scalar multiplication in (X , F) ≡ Scalar multiplication with the repre-
sentations in (F n , F).
[αx]v = α[x]v
3. Once a basis is chosen, any n-dimensional vector space (X , F) "looks
like" (F n , F).

Change of Basis Matrix: Let {u1 , · · · , un } and {ū1 , · · · , ūn } be two bases
for (X , F). Is there a relation between [x]u and [x]ū ?

Theorem: ∃ an invertible matrix P , with coecients in F , such that ∀x ∈


(X , F), [x]ū = P [x]u .
Moreover, P = [P1 |P2 | · · · |Pn ] with Pi = [ui ]ū ∈ F n where Pi is the ith column
of the matrix P and [ui ]ū is the representation of ui with respect to ū.


Proof: Let x = α1 u1 + · · · + αn un = ᾱ1 ū1 + · · · + ᾱn ūn.

α = [α1, α2, . . . , αn]^T = [x]_u

ᾱ = [ᾱ1, ᾱ2, . . . , ᾱn]^T = [x]_ū

ᾱ = [x]_ū = [ Σ_{i=1}^{n} αi u^i ]_ū = Σ_{i=1}^{n} αi [u^i]_ū = Σ_{i=1}^{n} αi Pi = P α.
Therefore, ᾱ = P α = P [x]u .

Now we need to show that P is invertible:


Dene P̄ = [P̄1 |P̄2 | · · · |P̄n ] with P̄i =[ūi ]u .
Do the same calculations and obtain α = P̄ ᾱ.
Then, we can obtain that α = P̄ P α and ᾱ = P P̄ ᾱ.
Therefore, P P̄ = P̄ P = I .
In conclusion, P̄ is the inverse of P (P̄ = P −1 ). 

Example: X = {2 × 2 matrices with real coefficients}, F = R.

u = { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }

ū = { [1 0; 0 0], [0 1; 1 0], [0 1; −1 0], [0 0; 0 1] }

We have the following relations:
ᾱ = P α with Pi = [u^i]_ū, and α = P̄ ᾱ with P̄i = [ū^i]_u. (P̄^{−1} = P, P^{−1} = P̄)
Typically, compute the easier of P or P̄, and obtain the other by inversion.


We choose to compute P̄:

P̄1 = [ū1]_u = [1, 0, 0, 0]^T

P̄2 = [ū2]_u = [0, 1, 1, 0]^T

P̄3 = [ū3]_u = [0, 1, −1, 0]^T

P̄4 = [ū4]_u = [0, 0, 0, 1]^T

Therefore,
P̄ = [1 0 0 0; 0 1 1 0; 0 1 −1 0; 0 0 0 1]  and  P = P̄^{−1}
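(Added illustration with NumPy, not part of the original notes.) The change-of-basis relation [x]_ū = P [x]_u can be checked numerically for this example; Pbar below is the P̄ just computed:

import numpy as np

# Pbar has columns [u_i-bar]_u; P = inv(Pbar) maps [x]_u to [x]_ubar.
Pbar = np.array([[1, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 1, -1, 0],
                 [0, 0, 0, 1]], dtype=float)
P = np.linalg.inv(Pbar)

# x = [5 3; 1 4] has representation (5, 3, 1, 4) in the standard basis u.
x_u = np.array([5, 3, 1, 4], dtype=float)
x_ubar = P @ x_u
print(x_ubar)          # [5, 2, 1, 4], matching [x]_w computed in Lecture 05
print(Pbar @ x_ubar)   # recovers [x]_u = (5, 3, 1, 4)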

Def. Let A be an n × n matrix with complex coecients. A scalar λ ∈ C is


an eigenvalue (e-value) of A, if ∃ a non-zero vector v ∈ Cn such that Av = λv .
Any such vector v is called an eigenvector (e-vector) associated with λ. Eigen-
vectors are not unique.

To find eigenvalues, we need to know conditions under which ∃v ≠ 0 such

that Av = λv.
Av = λv for some v ≠ 0 ⇐⇒ (λI − A)v = 0 for some v ≠ 0 ⇐⇒ det(λI − A) = 0

Example: A = [0 1; −1 0], det(λI − A) = λ² + 1 = 0.
Therefore, the eigenvalues are λ1 = j, λ2 = −j.
To find eigenvectors, we need to solve (A − λi I) v^i = 0.

The eigenvectors are v1 = [1, j]^T, v2 = [1, −j]^T.
Note that both eigenvalues and eigenvectors come in complex conjugate pairs.
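(Added numerical check using NumPy, not part of the original notes.) The eigenvalues and eigenvectors of this A can be computed directly:

import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
evals, evecs = np.linalg.eig(A)
print(evals)   # approximately [0+1j, 0-1j], i.e., lambda = +/- j
# Each column of evecs is an eigenvector; check A v = lambda v for the first pair.
v = evecs[:, 0]
print(np.allclose(A @ v, evals[0] * v))  # True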


ROB 501 Fall 2014


Lecture 06
Typeset by: Katie Skinner
Proofread by: Meghan Richey
Edited by Grizzle: 24 Sept 2015

Abstract Linear Algebra (Continued)

Def. ∆(λ) = det(λI − A) is called the characteristic polynomial. ∆(λ) = 0 is

called the characteristic equation.
∆(λ) = (λ − λ1)^{m1} (λ − λ2)^{m2} · · · (λ − λp)^{mp}
where λ1, · · · , λp are the distinct eigenvalues, and mi is the multiplicity of λi,
such that
m1 + m2 + · · · + mp = n

Theorem: Let A be an n × n matrix with coefficients in R or C. If the

e-values {λ1, · · · , λn} are distinct, that is, λi ≠ λj for all 1 ≤ i ≠ j ≤ n, then
the e-vectors {v1, · · · , vn} are linearly independent in (C^n, C).

Remark: Restatement of the theorem: If {λ1 , · · · , λn } are distinct then


{v , · · · , v } is a basis for (Cn ,C).
1 n

Proof: We prove the contrapositive and show there is a repeated e-value (λi = λj
for some i ≠ j).

{v1, · · · , vn} linearly dependent ⇒ ∃ α1, · · · , αn ∈ C, not all zero, such that

α1 v1 + · · · + αn vn = 0.  (∗)

Without loss of generality, we can suppose α1 ≠ 0 (that is, we can always

reorder the e-vectors so that the first coefficient is nonzero).


Because v i is an e-vector,
(A − λj I)v i = Av i − λj v i = λi v i − λj v i = (λi − λj )v i

Side Note: It is an easy exercise to show

(A − λ2 I)(A − λ3 I) · · · (A − λn I) v^i = (λi − λ2)(λi − λ3) · · · (λi − λn) v^i,  1 ≤ i ≤ n.
For i = 1, this equals
(λ1 − λ2)(λ1 − λ3) · · · (λ1 − λn) v^1.
For i = 2,
(λ2 − λ2)(λ2 − λ3) · · · (λ2 − λn) v^2 = 0.
Etc.

Combining the above with (∗), we obtain


0 = (A − λ2 I)(A − λ3 I) · · · (A − λn I)(α1 v 1 + · · · + αn v n )

= α1 (λ1 − λ2 )(λ1 − λ3 ) · · · (λ1 − λn )v 1

We know α1 ≠ 0, as stated above, and v1 ≠ 0, by definition of e-vectors.

∴ 0 = (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λn)
At least one of the terms (λ1 − λk), 2 ≤ k ≤ n, must be zero, and thus there is a
repeated e-value λ1 = λk for some 2 ≤ k ≤ n. □

Def. Let (X , F ) and (Y, F ) be vector spaces. L : X → Y is a linear operator


if for all x, z ∈ X , α, β ∈ F ,
L(αx + βz) = αL(x) + βL(z)

Equivalently,
L(x + z) = L(x) + L(z)
L(αx) = αL(x)


Example:

1. Let A be an n × m matrix with coefficients in F.

Define L : F^m → F^n by L(x) = Ax; then L is a linear operator. Check
that additivity and multiplication by a scalar are satisfied to prove this.
2. Let X = {polynomials of degree ≤ 3}, F = R, Y = X . Then for
p ∈ X , L(p) = dp(t)/dt.

Def. Let (X , F) and (Y, F) be finite dimensional vector spaces, and L :

X → Y be a linear operator. A matrix representation of L with respect to a
basis {u1, · · · , um} for X and {v1, · · · , vn} for Y is an n × m matrix A, with
coefficients in F, such that ∀x ∈ X , [L(x)]_{v1,··· ,vn} = A [x]_{u1,··· ,um}.

Theorem: Let (X , F) and (Y, F) be finite dimensional vector spaces, L :

X → Y a linear operator, {u1, · · · , um} a basis for X and {v1, · · · , vn} a basis
for Y. Then L has a matrix representation A = [A1 | · · · | Am], where the ith
column of A is given by

Ai = [L(u^i)]_{v1,··· ,vn},  1 ≤ i ≤ m

Proof: x ∈ X , x = α1 u1 + · · · + αm um, so that its representation is

[x]_{u1,··· ,um} = [α1, α2, . . . , αm]^T ∈ F^m

As in the theorem, we define

Ai = [L(u^i)]_{v1,··· ,vn},  1 ≤ i ≤ m.


Using linearity,
L(x) = L(α1 u1 + · · · + αm um)
     = α1 L(u1) + · · · + αm L(um)

Hence, computing representations, we have

[L(x)]_{v1,··· ,vn} = [α1 L(u1) + · · · + αm L(um)]_{v1,··· ,vn}

= α1 [L(u1)]_{v1,··· ,vn} + · · · + αm [L(um)]_{v1,··· ,vn}

= α1 A1 + · · · + αm Am

= [A1 | A2 | · · · | Am] [α1, α2, . . . , αm]^T

= A [x]_{u1,··· ,um}

∴ [L(x)]_{v1,··· ,vn} = A [x]_{u1,··· ,um}  □


Example:

F = R, X = {polynomials of degree ≤ 3}, Y = {polynomials of degree ≤ 3}.

Put the same basis on X and Y, namely {1, t, t², t³}. Let L : X → Y be differentiation.

Find the matrix representation, A, which will be a real 4 × 4 matrix.

Solution: Compute A column by column, where A = [A1 | A2 | A3 | A4].

A1 = [L(1)]_{1,t,t²,t³} = [0, 0, 0, 0]^T

A2 = [L(t)]_{1,t,t²,t³} = [1, 0, 0, 0]^T

A3 = [L(t²)]_{1,t,t²,t³} = [0, 2, 0, 0]^T

A4 = [L(t³)]_{1,t,t²,t³} = [0, 0, 3, 0]^T

and thus

A = [0 1 0 0; 0 0 2 0; 0 0 0 3; 0 0 0 0]

5
Page = 30

Let's check that it makes sense:

p(t) = a0 + a1 t + a2 t² + a3 t³
and
[p(t)]_{1,t,t²,t³} = [a0, a1, a2, a3]^T

A [p(t)]_{1,t,t²,t³} = [0 1 0 0; 0 0 2 0; 0 0 0 3; 0 0 0 0] [a0, a1, a2, a3]^T = [a1, 2a2, 3a3, 0]^T

Does this correspond to differentiating the polynomial p(t)? We see that

d p(t)/dt = a1 + 2a2 t + 3a3 t²

[d p(t)/dt]_{1,t,t²,t³} = [a1, 2a2, 3a3, 0]^T

and thus, yes indeed,

A [p(t)]_{1,t,t²,t³} = [d p(t)/dt]_{1,t,t²,t³}.
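(Added illustration with NumPy, not part of the original notes; the sample coefficients are arbitrary.) One can check numerically that A sends the coefficient vector of p to the coefficient vector of dp/dt:

import numpy as np

# Matrix representation of d/dt on polynomials of degree <= 3, basis {1, t, t^2, t^3}.
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]], dtype=float)

p = np.array([1.0, 2.0, 5.0, 7.0])   # p(t) = 1 + 2t + 5t^2 + 7t^3
print(A @ p)                          # [2, 10, 21, 0], i.e., p'(t) = 2 + 10t + 21t^2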


Rob 501 Fall 2014


Lecture 07
Typeset by: Zhiyuan Zuo
Proofread by: Vittorio Bichucher
Revised by Ni on 31 October 2015

Abstract Linear Algebra (Continued)

Elementary Properties of Matrices (Assumed Known)

A = n × m matrix with coefficients in R or C.
Def. Rank of A = # of linearly independent columns of A.
Theorem: rank(A) = rank(A^T) = rank(AA^T) = rank(A^T A).
Corollary: # of linearly independent rows = # of linearly independent columns.

Normed Spaces:
Let the field F be R or C.
Def. A function ‖·‖ : X → R is a norm if it satisfies

(a) ‖x‖ ≥ 0, ∀x ∈ X , and ‖x‖ = 0 ⇔ x = 0

(b) Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖, ∀x, y ∈ X

(c) ‖αx‖ = |α|·‖x‖, ∀x ∈ X , α ∈ F. (If α ∈ R, |α| means the absolute value;
if α ∈ C, |α| means the magnitude.)

Examples:

1. F = R or C, X = F^n.

   i) ‖x‖_2 = ( Σ_{i=1}^{n} |xi|² )^{1/2},  two-norm, Euclidean norm

   ii) ‖x‖_p = ( Σ_{i=1}^{n} |xi|^p )^{1/p},  1 ≤ p < ∞,  p-norm

   iii) ‖x‖_∞ = max_{1≤i≤n} |xi|,  max-norm, sup-norm, ∞-norm

2. F = R, D ⊂ R, D = [a, b], a < b < ∞,

   X = {f : D → R | f is continuous}.

   i) ‖f‖_2 = ( ∫_a^b |f(t)|² dt )^{1/2}

   ii) ‖f‖_p = ( ∫_a^b |f(t)|^p dt )^{1/p},  1 ≤ p < ∞

   iii) ‖f‖_∞ = max_{a≤t≤b} |f(t)|, which is also written ‖f‖_∞ = sup_{a≤t≤b} |f(t)|

Def. (X , F, ‖·‖) is called a normed space.

Distance: For x, y ∈ X , d(x, y) := ‖x − y‖ is called the distance from x to y.
Note: d(x, y) = d(y, x).
Distance to a set: Let S ⊂ X be a subset.
d(x, S) := inf_{y∈S} ‖x − y‖

If ∃x* ∈ S such that d(x, S) = ‖x − x*‖, then x* is a best approximation of x by

elements of S.
Sometimes, we write x̂ for x* because we are really thinking of the solution as an
approximation.

Important questions:

a) When does an x* exist?

b) How to characterize (compute) x* such that ‖x − x*‖ = d(x, S), x* ∈ S?
c) If a solution exists, is it unique?

Notation: When x* (or x̂) exists, we write x* = arg min_{y∈S} ‖x − y‖.

Inner Product Space:


Recall: z = α + jβ ∈ C, α, β ∈ R, z̄ = z 's complex conjugate = α − jβ


Def. Let (X , C) be a vector space. A function ⟨·, ·⟩ : X × X → C is an

inner product if

(a) ⟨a, b⟩ = conj ⟨b, a⟩ (the complex conjugate).

(b) ⟨α1 x1 + α2 x2, y⟩ = α1 ⟨x1, y⟩ + α2 ⟨x2, y⟩, i.e., linear in the left argument. A sum
can also appear in the right argument; just use property (a).
(c) ⟨x, x⟩ ≥ 0 for any x ∈ X , and ⟨x, x⟩ = 0 ⇔ x = 0. (⟨x, x⟩ is a real
number; therefore, it can be compared to 0.)

Remarks:

1) ⟨x, x⟩ = conj ⟨x, x⟩, by (a). Hence, ⟨x, x⟩ is always a real number.

2) If the vector space is defined as (X , R), replace (a) with (a') ⟨a, b⟩ = ⟨b, a⟩.
Examples:

a) (C^n, C), ⟨x, y⟩ = x^T conj(y) = Σ_{i=1}^{n} xi conj(yi).

b) (R^n, R), ⟨x, y⟩ = x^T y = Σ_{i=1}^{n} xi yi.

c) F = R, X = {A | n × m real matrices}, ⟨A, B⟩ = tr(AB^T) = tr(A^T B).

d) X = {f : [a, b] → R, f continuous}, F = R, ⟨f, g⟩ = ∫_a^b f(t) g(t) dt.

Theorem: (Cauchy-Schwarz Inequality) Let F be R or C, and let (X , F, ⟨·, ·⟩) be

an inner product space. Then, for all x, y ∈ X ,
|⟨x, y⟩| ≤ ⟨x, x⟩^{1/2} ⟨y, y⟩^{1/2}.

Proof: (We will assume F = R.)

If y = 0, the result is clearly true.

Assume y ≠ 0 and let λ ∈ R be a value to be chosen. We have
0 ≤ ⟨x − λy, x − λy⟩
  = ⟨x, x − λy⟩ − λ⟨y, x − λy⟩
  = ⟨x, x⟩ − λ⟨x, y⟩ − λ⟨y, x⟩ + λ²⟨y, y⟩
  = ⟨x, x⟩ − 2λ⟨x, y⟩ + λ²⟨y, y⟩.
Now, select λ = ⟨x, y⟩/⟨y, y⟩.
Then,
0 ≤ ⟨x − λy, x − λy⟩
  = ⟨x, x⟩ − 2|⟨x, y⟩|²/⟨y, y⟩ + |⟨x, y⟩|²/⟨y, y⟩
  = ⟨x, x⟩ − |⟨x, y⟩|²/⟨y, y⟩.

Therefore, we can conclude that |⟨x, y⟩|² ≤ ⟨x, x⟩⟨y, y⟩ ⇒ |⟨x, y⟩| ≤ ⟨x, x⟩^{1/2} ⟨y, y⟩^{1/2}. □


Rob 501 Fall 2014


Lecture 08
Typeset by: Sulbin Park
Proofread by: Ming-Yuan Yu

Orthogonal Bases

Corollary: Let (X , F, ⟨·, ·⟩) be an inner product space. Then,

‖x‖ := ⟨x, x⟩^{1/2}
is a norm.

Proof: (For F = R) We will only check the triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖,
which is equivalent to showing
‖x + y‖² ≤ ‖x‖² + ‖y‖² + 2‖x‖·‖y‖:
‖x + y‖² = ⟨x + y, x + y⟩
= ⟨x, x + y⟩ + ⟨y, x + y⟩
= ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
= ‖x‖² + ‖y‖² + 2⟨x, y⟩
≤ ‖x‖² + ‖y‖² + 2 |⟨x, y⟩|
≤ ‖x‖² + ‖y‖² + 2‖x‖·‖y‖  □

Def.

(a) Two vectors x and y are orthogonal if ⟨x, y⟩ = 0. Notation: x ⊥ y

(b) A set of vectors S is orthogonal if
∀x, y ∈ S, x ≠ y ⇒ ⟨x, y⟩ = 0 (i.e., x ⊥ y)

(c) If in addition ‖x‖ = 1 for all x ∈ S, then S is an orthonormal set.


Remark:
If x ≠ 0, then x/‖x‖ has norm 1:
‖ x/‖x‖ ‖ = (1/‖x‖) · ‖x‖ = 1

Pythagorean Theorem: If x ⊥ y, then

‖x + y‖² = ‖x‖² + ‖y‖².

Proof: From the proof of the triangle inequality,

‖x + y‖² = ‖x‖² + ‖y‖² + 2⟨x, y⟩
= ‖x‖² + ‖y‖²  (because ⟨x, y⟩ = 0)  □

Pre-projection Theorem: Let X be a finite-dimensional (real) inner product

space, M be a subspace of X , and x be an arbitrary point in X .

(a) If ∃m0 ∈ M such that

‖x − m0‖ ≤ ‖x − m‖  ∀m ∈ M,
then m0 is unique.
(b) A necessary and sufficient condition that m0 is a minimizing vector in M
is that the vector x − m0 is orthogonal to M.

Remarks:

(a') If ∃m0 ∈ M such that ‖x − m0‖ = d(x, M) = inf_{m∈M} ‖x − m‖, then m0 is

unique. (equivalent to (a))
(b') ‖x − m0‖ = d(x, M) ⇔ x − m0 ⊥ M. (equivalent to (b))


Proof:
Claim 1: If m0 ∈ M satisfies ‖x − m0‖ = d(x, M), then x − m0 ⊥ M.
Proof: (By contrapositive) Assume x − m0 is not ⊥ M; we will find m1 ∈ M such
that ‖x − m1‖ < ‖x − m0‖.
Suppose x − m0 is not ⊥ M. Hence, ∃m ∈ M such that ⟨x − m0, m⟩ ≠ 0. We know
m ≠ 0, and hence we define m̃ = m/‖m‖ ∈ M.

Define δ := ⟨x − m0, m̃⟩ ≠ 0 and
m1 = m0 + δ m̃
∴ m1 ∈ M
‖x − m1‖² = ‖x − m0 − δ m̃‖²
= ⟨x − m0 − δ m̃, x − m0 − δ m̃⟩
= ⟨x − m0, x − m0⟩ − δ ⟨x − m0, m̃⟩ − δ ⟨m̃, x − m0⟩ + δ² ⟨m̃, m̃⟩
  (the two middle inner products each equal δ, and ⟨m̃, m̃⟩ = 1)
= ‖x − m0‖² − δ²
∴ ‖x − m1‖² < ‖x − m0‖²  □

Claim 2: If x − m0 ⊥ M, then ‖x − m0‖ = d(x, M) and m0 is unique.

Proof: Recall the Pythagorean Theorem:
‖x + y‖² = ‖x‖² + ‖y‖² when x ⊥ y.
Let m ∈ M be arbitrary and suppose x − m0 ⊥ M.
Then,
‖x − m‖² = ‖(x − m0) + (m0 − m)‖²   (note m0 − m ∈ M)
= ‖x − m0‖² + ‖m0 − m‖²   (because x − m0 ⊥ M)
∴ inf_{m∈M} ‖x − m‖ = ‖x − m0‖ and the unique minimizer is m0. □

How to Construct Orthogonal Sets

Gram-Schmidt Process: Let {y1, . . . , yn} be a linearly independent set of
vectors. We will produce {v1, . . . , vn} orthogonal, such that
∀1 ≤ k ≤ n, span{y1, . . . , yk} = span{v1, . . . , vk}.

Step 1: v1 = y1
Remark: v1 ≠ 0 because {y1, . . . , yn} is linearly independent.

Step 2: v2 = y2 − a21 v1, and choose a21 such that v1 ⊥ v2:

0 = ⟨v2, v1⟩ = ⟨y2 − a21 v1, v1⟩ = ⟨y2, v1⟩ − a21 ⟨v1, v1⟩
∴ a21 = ⟨y2, v1⟩ / ‖v1‖²   (‖v1‖ ≠ 0 because v1 ≠ 0)

Claim: span{y1, y2} = span{v1, v2}.

Proof: We know span{y1} = span{v1}.
To show: y2 ∈ span{v1, v2} and v2 ∈ span{y1, y2}.

Rob 501 Fall 2014


Lecture 09
Typeset by: Pengcheng Zhao
Proofread by: Xiangyu Ni
Revised by Ni on 1 November 2015

Orthogonal Bases (Continued)

Gram-Schmidt Process: Let {y1, · · · , yn} be a linearly independent set of

vectors. We will produce {v1, · · · , vn} orthogonal such that, ∀1 ≤ k ≤ n,
span{v1, · · · , vk} = span{y1, · · · , yk}.

Step 1:
v1 = y1
Step 2:
v2 = y2 − a21 v1
⟨v2, v1⟩ = 0 ⇔ a21 = ⟨y2, v1⟩ / ‖v1‖²
Step 3:
v3 = y3 − a31 v1 − a32 v2
Choose the coefficients such that ⟨v3, v1⟩ = 0 and ⟨v3, v2⟩ = 0:
0 = ⟨v3, v1⟩ = ⟨y3, v1⟩ − a31 ⟨v1, v1⟩ − a32 ⟨v2, v1⟩,  where ⟨v2, v1⟩ = 0

0 = ⟨v3, v2⟩ = ⟨y3, v2⟩ − a31 ⟨v1, v2⟩ − a32 ⟨v2, v2⟩,  where ⟨v1, v2⟩ = 0

∴ a31 = ⟨y3, v1⟩ / ‖v1‖²,  a32 = ⟨y3, v2⟩ / ‖v2‖²

Therefore, we can conclude that v^k = y^k − Σ_{j=1}^{k−1} ( ⟨y^k, v^j⟩ / ‖v^j‖² ) v^j.
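(Added illustration, not part of the original notes; function and variable names are mine.) A short NumPy sketch of this recursion for vectors in R^n with the standard inner product:

import numpy as np

def gram_schmidt(Y):
    """Given the columns of Y (assumed linearly independent), return V whose
    columns are orthogonal and satisfy span{v1..vk} = span{y1..yk}."""
    Y = np.asarray(Y, dtype=float)
    V = np.zeros_like(Y)
    for k in range(Y.shape[1]):
        v = Y[:, k].copy()
        for j in range(k):
            vj = V[:, j]
            v -= (Y[:, k] @ vj) / (vj @ vj) * vj   # subtract the projection onto v^j
        V[:, k] = v
    return V

Y = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(Y)
print(np.round(V.T @ V, 8))   # off-diagonal entries are 0: the columns are orthogonal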


Proof of G-S Process: We need to show span{v1, · · · , vk} = span{y1, · · · , yk}, which holds if and only if

{v1, · · · , vk} ⊆ span{y1, · · · , yk} (⇔ v^k ∈ span{y1, · · · , yk}), and
{y1, · · · , yk} ⊆ span{v1, · · · , vk} (⇔ y^k ∈ span{v1, · · · , vk}).

Intermediate Facts
Proposition: Let(X , F) be an n-dimensional vector space and let {v 1 , · · · , v k }
be a linearly independent set with 0 < k < n. Then, ∃v k+1 such that
{v 1 , · · · , v k , v k+1 } is linearly independent.

Proof: (By contradiction)


Suppose no such v k+1 exists. Hence, ∀x ∈ X , x ∈ span{v 1 , · · · , v k }.
∴ X ⊂ span{v 1 , · · · , v k }.
∴ dim(X ) ≤ dim(span{v 1 , · · · , v k }).
∴ n ≤ k , which contradicts k < n. 

Corollary: In a finite dimensional vector space, any linearly independent set

can be completed to a basis. More precisely, let {v1, · · · , vk} be linearly inde-
pendent, n = dim(X ), k < n.
Then, ∃ v^{k+1}, · · · , v^n such that {v1, · · · , vk, v^{k+1}, · · · , vn} is a basis for X .

Proof: Previous proposition+Induction

Def. Let (X , F, h·, ·i) be an inner product space, and S ⊆ X a subset.


(Doesn't have to be a subspace.)
S ⊥ := {x ∈ X |x ⊥ S} = {x ∈ X |hx, yi = 0 for all y ∈ S}
is called the orthogonal complement of S .

Exercise: S ⊥ is always a subspace.

Proposition: Let (X , F, ⟨·, ·⟩) be a finite dimensional inner product space,
and M a subspace of X . Then,
X = M ⊕ M⊥.

Proof: If x ∈ M ∩ M ⊥ , hx, xi = 0 ⇔ x = 0.
Hence, M ∩ M ⊥ = {0}.

Let {y 1 , · · · , y k } be a basis of M . Complete it to be a basis for X :


{y 1 , y 2 , · · · , y k , y k+1 , · · · , y n }
Apply G.S. to produce orthogonal vectors {v 1 , · · · , v k , v k+1 , · · · , v n } such that
span{v 1 , · · · , v k } = span{y 1 , · · · , y k } = M .
An easy calculation gives
M⊥ = span{v^{k+1}, · · · , v^n}

Why? Write
x = α1 v^1 + · · · + αk v^k + α_{k+1} v^{k+1} + · · · + αn v^n
x ⊥ M ⇔ ⟨x, v^i⟩ = 0, 1 ≤ i ≤ k
⟨x, v^i⟩ = α1 ⟨v^1, v^i⟩ + · · · + αi ⟨v^i, v^i⟩ + · · · + αn ⟨v^n, v^i⟩
        = αi ⟨v^i, v^i⟩       (all other terms vanish by orthogonality)
        = αi ‖v^i‖²
∴ x ⊥ M ⇔ αi = 0 for 1 ≤ i ≤ k ⇔ x = α_{k+1} v^{k+1} + · · · + αn v^n ⇔ x ∈ span{v^{k+1}, · · · , v^n}.
∴ x ∈ M⊥ ⇔ x ∈ span{v^{k+1}, · · · , v^n}.

Projection Theorem

Theorem: (Classical Projection Theorem)

Let X be a finite dimensional inner product space and M a subspace of X .
Then, ∀x ∈ X , ∃ a unique m0 ∈ M such that
‖x − m0‖ = d(x, M) = inf_{m∈M} ‖x − m‖.

Moreover, m0 is characterized by x − m0 ⊥ M.

Proof: To show: m0 exists. (Uniqueness and orthogonality were shown in the

Pre-projection Theorem.)
From G-S, we learned that X = M ⊕ M⊥.
Hence, we can write
x = m0 + m̃
where
m0 ∈ M and m̃ ∈ M⊥.
Hence,
x − m0 = m̃ ∈ M⊥ ⇒ x − m0 ⊥ M. □


Rob 501 Handout: Grizzle


Lecture 10
Orthogonal Projection and Normal Equations

Projection Theorem (Continued)

Orthogonal Projection Operator

Let X be a finite dimensional (real) inner product space and M a subspace of
X . For x ∈ X and m0 ∈ M, the Projection Theorem shows that the following are equivalent (TFAE):

(a) x − m0 ⊥ M.
(b) ∃ m̃ ∈ M⊥ such that x = m0 + m̃.
(c) ‖x − m0‖ = d(x, M) = inf_{m∈M} ‖x − m‖.

Def. P : X → M defined by P(x) = m0, where m0 satisfies any of (a), (b), or (c), is

called the orthogonal projection of X onto M.

Exercise 1: P : X → M is a linear operator.

Exercise 2: Let {v1, · · · , vk} be an orthonormal basis for M. Then

P(x) = Σ_{i=1}^{k} ⟨x, v^i⟩ v^i.
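(Added illustration, not from the notes.) The formula in Exercise 2 can be implemented directly in NumPy for M spanned by orthonormal columns:

import numpy as np

def orthogonal_projection(x, V):
    """Project x onto M = span of the columns of V, assumed orthonormal."""
    return sum((x @ V[:, i]) * V[:, i] for i in range(V.shape[1]))

# M = the xy-plane in R^3, with orthonormal basis e1, e2.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([3.0, -2.0, 7.0])
m0 = orthogonal_projection(x, V)
print(m0)        # [ 3. -2.  0.]
print(x - m0)    # the residual [0, 0, 7] is orthogonal to M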


Normal Equations
Let X be a finite dimensional (real) inner product space and M = span{y1, · · · , yk},
with {y1, · · · , yk} linearly independent. Given x ∈ X , seek x̂ ∈ M such that

‖x − x̂‖ = d(x, M) = inf_{m∈M} ‖x − m‖ = min_{m∈M} ‖x − m‖,

where we can write "min" because the Projection Theorem assures the existence
of a minimizing vector x̂ ∈ M.

Notation: x̂ = arg min_{m∈M} ‖x − m‖

Remark: One solution is Gram Schmidt and the orthogonal projection oper-
ator. We provide an alternative way to compute the answer.

By the Projection Theorem, x̂ exists and is characterized by x − x̂⊥M . Write

x̂ = α1 y 1 + α2 y 2 + · · · + αk y k
and impose x − x̂⊥M ⇔ x − x̂⊥y i , 1 ≤ i ≤ k .

Then, hx − x̂, y i i = 0, ∀1 ≤ i ≤ k yields

hx̂, y i i = hx, y i i i = 1, 2, · · · , k
⇔hα1 y 1 + α2 y 2 + · · · + αk y k , y i i = hx, y i i i = 1, 2, · · · , k.

We now write this out in matrix form.


i=1
α1 hy 1 , y 1 i + α2 hy 2 , y 1 i + · · · + αk hy k , y 1 i = hx, y 1 i
i=2
α1 hy 1 , y 2 i + α2 hy 2 , y 2 i + · · · + αk hy k , y 2 i = hx, y 2 i

..
.

i=k
α1 hy 1 , y k i + α2 hy 2 , y k i + · · · + αk hy k , y k i = hx, y k i


These are called the Normal Equations.

Def. G = G(y1, · · · , yk) is the k × k matrix with entries

Gij = ⟨y^i, y^j⟩;

it is called the Gram matrix.

Remark: Because we are assuming F = R, ⟨y^i, y^j⟩ = ⟨y^j, y^i⟩, and we therefore

have G = G^T.

Letting α = [α1, α2, . . . , αk]^T, the normal equations in matrix form are

G^T α = β,

where
βi = ⟨x, y^i⟩,  β = [β1, β2, . . . , βk]^T.


Def. g(y1, y2, · · · , yk) = det G(y1, · · · , yk) is the determinant of the Gram

Matrix.

Prop. g(y1, y2, · · · , yk) ≠ 0 ⇔ {y1, · · · , yk} is linearly independent.

The proof is given at the end of the handout.

Summary: Here is the solution of our best approximation problem by the

normal equations. Assume the set {y1, · · · , yk} is linearly independent and
M := span{y1, · · · , yk}. Then x̂ = arg min_{m∈M} ‖x − m‖ if, and only if,

x̂ = α1 y1 + α2 y2 + · · · + αk yk

G^T α = β

Gij = ⟨y^i, y^j⟩

βi = ⟨x, y^i⟩.


Application: Overdetermined system of linear equations in R^n

Aα = b,
where A is an n × m real matrix, n ≥ m, rank(A) = m (the columns of A are linearly
independent). From the dimensions of A, we have that α ∈ R^m, b ∈ R^n.

Original Problem Formulation:

Seek α̂ such that
‖Aα̂ − b‖ = min_{α∈R^m} ‖Aα − b‖,
where
‖x‖² = Σ_{i=1}^{n} (xi)².

Solution:

X = R^n, F = R, ⟨x, y⟩ = x^T y = y^T x = Σ_{i=1}^{n} xi yi

Therefore,
‖x‖² = ⟨x, x⟩ = Σ_{i=1}^{n} |xi|².
Write
A = [A1 | A2 | · · · | Am] and α = [α1, α2, · · · , αm]^T,
and we note that
Aα = α1 A1 + α2 A2 + · · · + αm Am.

New Problem Formulation:

Seek
x̂ = Aα̂ ∈ span{A1, A2, · · · , Am} =: M
such that
‖x̂ − b‖ = d(b, M) ⇔ x̂ − b ⊥ M.


From the Projection Theorem and the Normal Equations,

x̂ = α̂1 A1 + α̂2 A2 + · · · + α̂m Am

and G^T α̂ = β, with
Gij = ⟨Ai, Aj⟩ = Ai^T Aj

βi = ⟨b, Ai⟩ = b^T Ai = Ai^T b.

Aside: writing A^T row-wise as the stack of A1^T, . . . , Am^T and A = [A1 | · · · | Am], we have

(A^T A)ij = Ai^T Aj

G = G^T = A^T A
(A^T b)i = Ai^T b

The Normal Equations are

A^T A α̂ = A^T b.
From the Proposition, G^T = A^T A is invertible ⇔ the columns of A are linearly
independent. Hence,
α̂ = (A^T A)^{−1} A^T b.
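(Added numerical illustration; the data and names are mine, not from the notes.) The solution α̂ = (A^T A)^{−1} A^T b can be computed and compared against NumPy's built-in least-squares solver:

import numpy as np

# Overdetermined system: 4 equations, 2 unknowns, full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.1, 1.1, 1.9, 3.2])

# Normal-equations solution (fine for small, well-conditioned problems).
alpha_hat = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's least-squares routine (uses a more numerically robust factorization).
alpha_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(alpha_hat, alpha_lstsq)       # the two solutions agree
print(A.T @ (A @ alpha_hat - b))    # residual is orthogonal to the columns of A (~0)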


Prop. g(y1, y2, · · · , yk) ≠ 0 ⇔ {y1, · · · , yk} is linearly independent.

Proof: g(y1, y2, · · · , yk) = 0 ⇔ ∃α ≠ 0 such that G^T α = 0.

From our construction of the normal equations, G^T α = 0 if, and only if,

⟨α1 y1 + α2 y2 + · · · + αk yk, y^i⟩ = 0,  i = 1, 2, · · · , k.
This is equivalent to

α1 y1 + α2 y2 + · · · + αk yk ⊥ y^i,  i = 1, 2, · · · , k,
which is equivalent to

α1 y1 + α2 y2 + · · · + αk yk ⊥ span{y1, · · · , yk} =: M,
and thus

α1 y1 + α2 y2 + · · · + αk yk ∈ M⊥.

Because α1 y1 + α2 y2 + · · · + αk yk ∈ M, we have that

α1 y1 + α2 y2 + · · · + αk yk ∈ M ∩ M⊥
and therefore
α1 y1 + α2 y2 + · · · + αk yk = 0.
By the linear independence of {y1, · · · , yk}, we deduce that

α1 = α2 = · · · = αk = 0. □


Rob 501 Fall 2014


Lecture 11
Typeset by: Su-Yang Shieh
Proofread by: Zhiyuan Zuo
Updated by Grizzle on 8 October 2015

Symmetric Matrices

Def. An n × n real matrix A is symmetric if A> = A.

Claim 1: The eigenvalues of a symmetric matrix are real.

Proof: Let λ ∈ C be an eigenvalue. To show: λ = λ̄ where λ̄ is the complex


conjugate of λ.
Because λ ∈ C is an eigenvalue, ∃v ∈ Cn , v 6= 0, such that
Av = λv.
Take the complex conjugate of both sides, yielding
Āv̄ = λ̄v̄.
Because A is real, we have Ā = A and thus
Av̄ = λ̄v̄.
Now, take the transpose of both sides to obtain
v̄ > A> = λ̄v̄ > .
Because A is symmetric, A> = A, and hence,
v̄ > A = λ̄v̄ >
⇒ v̄ > Av = λ̄v̄ > v
⇒ v̄ > λv = λ̄v̄ > v
∴ λkvk2 = λ̄kvk2


where hx, yi = x> ȳ and kxk2 = hx, xi = x> x̄ = x̄> x. Because kvk2 6= 0, we
deduce that λ = λ̄, proving the result. 

Remark: We now know that when A is real and symmetric, an eigenvalue

λ is real, and therefore we can assume the corresponding eigenvector is real.
Indeed, (A − λI) v = 0 with A − λI real.
Hence we have v ∈ R^n and we can use the real inner product on R^n, namely

⟨x, y⟩ = x^T y.

Claim 2: Eigenvectors corresponding to distinct eigenvalues are orthogonal.


That is, let λ1 , λ2 ∈ R, v 1 , v 2 ∈ Rn , Av 1 = λ1 v 1 , Av 2 = λ2 v 2 , v 1 6= 0, v 2 6= 0.
Then,
λ1 6= λ2 ⇒ hv 1 , v 2 i = 0.

Proof: Av 1 = λ1 v 1 .
Take the transpose of both sides, and use A = A> . Then,
(v 1 )> A = λ1 (v 1 )>
(v 1 )> Av 2 = λ1 (v 1 )> v 2
(v 1 )> λ2 v 2 = λ1 (v 1 )> v 2
(λ1 − λ2 )(v 1 )> v 2 = 0
λ1 6= λ2 , ⇒ (v 1 )> v 2 = 0. 

Def.: A matrix Q is orthogonal if Q> Q = I . That is, Q−1 = Q> .

Claim 3: Suppose the eigenvalues of A are all distinct. Then there exists an
orthogonal matrix Q such that
Q> AQ = Λ = diag(λ1 , · · · , λn ).

Proof: λ1, · · · , λn distinct implies that the eigenvectors v1, · · · , vn are orthog-
onal, and thus

⟨v^i, v^j⟩ = (v^i)^T v^j = 0,  i ≠ j.
WLOG (without loss of generality), we can assume ‖v^i‖ = 1, i.e., (v^i)^T v^i = 1.
We define
Q = [v1 | v2 | · · · | vn].
Then
(Q^T Q)ij = (v^i)^T v^j = 1 if i = j, and 0 if i ≠ j.
∴ Q^T Q = I, so Q is orthogonal. □

Fact: [See HW06] Even if the eigenvalues are repeated, A = A^T ⇒

∃ Q orthogonal such that Q^T A Q = Λ = diag(λ1, · · · , λn). Symmetric matrices
are rather special in that one can ALWAYS find a basis consisting of e-vectors.

Useful Observation: Let A be m × n real matrix. Then both A> A and AA>
are symmetric, and hence their eigenvalues are real.

Claim 4: Eigenvalues of A> A and AA> are non-negative.

Proof: We do the proof for A> A.


Let A> Av = λv where v ∈ Rn , v 6= 0, λ ∈ R, v ∈ Rn . To show: λ ≥ 0.
Multiply both sides by v >
v > A> Av = v > λv
hAv, Avi = λhv, vi
∴ kAvk2 = λkvk2
∴ λ ≥ 0, because kvk2 > 0, kAvk2 ≥ 0. 


Quadratic Forms

Def. Let M be an n × n real matrix and x ∈ Rn . Then x> M x is called a


quadratic form.

Def. An n × n matrix W is skew symmetric if W = −W > .

Exercise: If W is skew symmetric, then x> W x = 0 for all x ∈ Rn .

Exercise: For a real matrix M, M = (M + M^T)/2 + (M − M^T)/2, where the first
term is symmetric and the second is skew symmetric.

Def. (M + M^T)/2 is the symmetric part of M.

Exercise: x^T M x = x^T ( (M + M^T)/2 ) x.

Consequence: When working with a quadratic form, always assume M is

symmetric.

Def. A real symmetric matrix P is positive definite if, for all x ∈ R^n, x ≠ 0

⇒ x^T P x > 0.


Rob 501 Fall 2014


Lecture 12
Typeset by: Yong Xiao
Proofread by: Pedro Donato

Positive Definite Matrices and the Schur Complement

Notation: P > 0 means P is positive definite. (It does not mean all entries of P are
positive.)

Theorem: A symmetric matrix P is positive definite if and only if all of its

eigenvalues are greater than 0.

Proof:
Claim 1: P is positive definite ⇒ all eigenvalues of P are greater than 0.
Proof: Let λ ∈ R, P x = λx, x ≠ 0 (λ is an eigenvalue of P).
Then, we have:
x^T P x = x^T λx = λ ‖x‖² > 0
∴ ‖x‖ > 0 ⇒ λ > 0. □

Claim 2: All eigenvalues of P are greater than 0 ⇒ P is positive definite.

Proof: To show: x ≠ 0 ⇒ x^T P x > 0.
Without loss of generality, assume ‖x‖ = 1,

∴ x^T x = 1.
x^T P x ≥ min_{x∈R^n, ‖x‖=1} x^T P x = λ_min(P),

where λ_min(P) is the smallest eigenvalue of P.

Meanwhile, λ_min(P) > 0 because all eigenvalues of P are positive and there is
only a finite number of them.
∴ x^T P x ≥ λ_min(P) > 0. □


Exercise: Show that
P = [2 −1; −1 2] > 0.

Definition: P = P^T is positive semidefinite if x^T P x ≥ 0 for all x ≠ 0.

Theorem: P is positive semidefinite if and only if all eigenvalues of P are

non-negative. (Notation: P ≥ 0, also written P ⪰ 0.)

Definition: N is a square root of a symmetric matrix P if N^T N = P.

Note: (N^T N)^T = N^T N ⇒ N^T N is always symmetric.

Theorem: P ≥ 0 ⇔ ∃N such that N^T N = P.

Proof:

1. Suppose N^T N = P, and let x ∈ R^n. Then

x^T P x = x^T N^T N x = (N x)^T (N x) = ‖N x‖² ≥ 0.

2. Now suppose P ≥ 0. To show: ∃N such that N^T N = P.
Since P is symmetric, there exists an orthogonal matrix O such that
P = O^T Λ O,
where Λ = diag(λ1, λ2, · · · , λn).
Since P ≥ 0, λi ≥ 0 for all i = 1, 2, . . . , n.
Define Λ^{1/2} := diag(√λ1, √λ2, . . . , √λn), so that
Λ = (Λ^{1/2})^T Λ^{1/2} = Λ^{1/2} Λ^{1/2}.

Let N = Λ^{1/2} O; then

N^T N = O^T (Λ^{1/2})^T Λ^{1/2} O = O^T Λ O = P.

∴ N^T N = P. □
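(Added NumPy sketch, not from the notes.) The construction in part 2 translates directly into code: build N = Λ^{1/2} O from the eigendecomposition of a positive semidefinite P:

import numpy as np

def symmetric_sqrt_factor(P):
    """Return N with N.T @ N = P, for symmetric positive semidefinite P."""
    evals, Q = np.linalg.eigh(P)          # P = Q diag(evals) Q^T, Q orthogonal
    evals = np.clip(evals, 0.0, None)     # guard against tiny negative round-off
    return np.diag(np.sqrt(evals)) @ Q.T  # N = Lambda^{1/2} O with O = Q^T

P = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
N = symmetric_sqrt_factor(P)
print(np.allclose(N.T @ N, P))  # True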


Exercise: For a symmetric matrix P and x, y ∈ R^n, prove (x + y)^T P (x + y) =

x^T P x + y^T P y + 2 x^T P y. (This uses the facts that y^T P x is a scalar and P = P^T.)

Theorem: (Schur Complement) Suppose that A is n × n, symmetric, and

invertible, B is n × m, C is m × m, symmetric, and invertible, and
M = [A B; B^T C]
is symmetric.

Then the following are equivalent:

1. M > 0.
2. A > 0 and C − B^T A^{−1} B > 0.
3. C > 0 and A − B C^{−1} B^T > 0.

Definition: C − B^T A^{−1} B is the Schur Complement of A in M.

Definition: A − B C^{−1} B^T is the Schur Complement of C in M.

Proof: We will show 1 ⇔ 2; the proof of 1 ⇔ 3 is identical.

First, let's show 1 ⇒ 2.

Suppose M > 0. Then for all x ∈ R^n, x ≠ 0, taking the block vector [x; 0],

0 < [x; 0]^T M [x; 0] = [x; 0]^T [Ax; B^T x] = x^T A x.

∴ A is positive definite.
We will make a nice choice of [x; y] to show C − B^T A^{−1} B > 0.

We want Ax + By = 0, thus let x = −A^{−1} B y, y ≠ 0. Then

0 < [x; y]^T M [x; y] = [−A^{−1}By; y]^T [A B; B^T C] [−A^{−1}By; y]
  = [−y^T B^T A^{−1} | y^T] [0; −B^T A^{−1} B y + C y]
  = y^T C y − y^T B^T A^{−1} B y
  = y^T (C − B^T A^{−1} B) y.
∴ C − B^T A^{−1} B > 0.

Second, let's show 2 ⇒ 1.

Suppose A > 0 and C − B^T A^{−1} B > 0. To show: M > 0.
(Equivalently, to show: for an arbitrary [x; y] ≠ [0; 0], [x; y]^T M [x; y] > 0.)

For an arbitrary [x; y], define x̄ = x + A^{−1} B y.

Note that [x; y] ≠ [0; 0] ⇔ [x̄; y] ≠ [0; 0].

[x; y]^T M [x; y] = [x̄ − A^{−1}By; y]^T M [x̄ − A^{−1}By; y]
= [x̄; 0]^T M [x̄; 0] + [−A^{−1}By; y]^T M [−A^{−1}By; y] + 2 [x̄; 0]^T M [−A^{−1}By; y]
= x̄^T A x̄ + y^T (C − B^T A^{−1} B) y + 0 > 0. □


Rob 501 Fall 2014


Lecture 13
Typeset by: Ming-Yuan Yu
Proofread by: Ilsun Song

Weighted Least Squares

Let Q be an n × n positive definite matrix (Q > 0),

and let the inner product on R^n be
⟨x, y⟩ = x^T Q y.
We re-do Aα = b, where A is n × m, n ≥ m, rank(A) = m, α ∈ R^m, and b ∈
R^n. We seek α̂ such that
‖Aα̂ − b‖ = min_{α∈R^m} ‖Aα − b‖,

where ‖x‖ = ⟨x, x⟩^{1/2} = (x^T Q x)^{1/2} and Q > 0.

Solution: X = R^n, F = R, ⟨x, y⟩ = x^T Q y.
Write A = [A1 | A2 | · · · | Am].

Normal Equations:
x̂ = α̂1 A1 + α̂2 A2 + · · · + α̂m Am
G^T α̂ = β, with G = G^T
[G^T]ij = [G]ij = ⟨Ai, Aj⟩ = Ai^T Q Aj = [A^T Q A]ij
βi = ⟨b, Ai⟩ = b^T Q Ai = Ai^T Q b = [A^T Q b]i.

∴ A^T Q A α̂ = A^T Q b.

Since A^T Q A is invertible (because rank(A) = m and Q > 0), we conclude that

α̂ = (A^T Q A)^{−1} A^T Q b.
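(Added NumPy sketch; the data and names are illustrative, not from the notes.) The weighted least-squares formula α̂ = (A^T Q A)^{−1} A^T Q b in code:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([0.0, 1.2, 1.9])
Q = np.diag([1.0, 4.0, 0.25])   # positive definite weights: trust the second row most

alpha_hat = np.linalg.solve(A.T @ Q @ A, A.T @ Q @ b)
print(alpha_hat)
# With Q = I this reduces to the ordinary least-squares solution:
print(np.linalg.solve(A.T @ A, A.T @ b))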


Recursive Least Squares

Model:

y_i = C_i x + e_i,  i = 1, 2, 3, · · ·

C_i ∈ R^{m×n}
i = time index
x = an unknown constant vector ∈ R^n
y_i = measurements ∈ R^m
e_i = model "mismatch" ∈ R^m

Objective 1: Compute a least squared error estimate of x at time k, using

all available data at time k, (y_1, · · · , y_k)!

Objective 2: Discover a computationally attractive form for the answer.

Solution:

x̂_k := argmin_{x∈R^n} Σ_{i=1}^{k} (y_i − C_i x)^T S_i (y_i − C_i x)
     = argmin_{x∈R^n} Σ_{i=1}^{k} e_i^T S_i e_i,

where S_i is an m × m positive definite matrix (S_i > 0 for every time index i).

Batch Solution:

Y_k = [y_1; y_2; . . . ; y_k],  A_k = [C_1; C_2; . . . ; C_k],  E_k = [e_1; e_2; . . . ; e_k]  (stacked),

R_k = diag(S_1, S_2, · · · , S_k) > 0

Y_k = A_k x + E_k,  [model for 1 ≤ i ≤ k]
‖Y_k − A_k x‖² = ‖E_k‖² := E_k^T R_k E_k

Since x̂_k is the value minimizing the error ‖E_k‖, which is the unexplained
part of the model,
x̂_k = argmin_{x∈R^n} ‖E_k‖ = argmin_{x∈R^n} ‖Y_k − A_k x‖,

which satisfies the Normal Equations (A_k^T R_k A_k) x̂_k = A_k^T R_k Y_k.

∴ x̂_k = (A_k^T R_k A_k)^{−1} A_k^T R_k Y_k, which is called the Batch Solution.

Drawback: A_k is a km × n matrix, and it grows at each step!

Solution: Find a recursive means to compute x̂_{k+1} in terms of x̂_k and the
new measurement y_{k+1}!

The normal equations at time k, (A_k^T R_k A_k) x̂_k = A_k^T R_k Y_k, are equivalent to

( Σ_{i=1}^{k} C_i^T S_i C_i ) x̂_k = Σ_{i=1}^{k} C_i^T S_i y_i.

We define
Q_k = Σ_{i=1}^{k} C_i^T S_i C_i
so that
Q_{k+1} = Q_k + C_{k+1}^T S_{k+1} C_{k+1}.


At time k + 1,
( Σ_{i=1}^{k+1} C_i^T S_i C_i ) x̂_{k+1} = Σ_{i=1}^{k+1} C_i^T S_i y_i,
where the term in parentheses is Q_{k+1}, or

Q_{k+1} x̂_{k+1} = Σ_{i=1}^{k} C_i^T S_i y_i + C_{k+1}^T S_{k+1} y_{k+1} = Q_k x̂_k + C_{k+1}^T S_{k+1} y_{k+1}.

∴ Q_{k+1} x̂_{k+1} = Q_k x̂_k + C_{k+1}^T S_{k+1} y_{k+1}

Good start on a recursion! The estimate at time k + 1 is expressed as a linear

combination of the estimate at time k and the latest measurement at time k + 1.

Continuing,
x̂_{k+1} = Q_{k+1}^{−1} ( Q_k x̂_k + C_{k+1}^T S_{k+1} y_{k+1} ).
Because
Q_k = Q_{k+1} − C_{k+1}^T S_{k+1} C_{k+1},
we have
x̂_{k+1} = x̂_k + Q_{k+1}^{−1} C_{k+1}^T S_{k+1} ( y_{k+1} − C_{k+1} x̂_k ),
where Q_{k+1}^{−1} C_{k+1}^T S_{k+1} is the Kalman gain and y_{k+1} − C_{k+1} x̂_k is the innovations.

Innovations: y_{k+1} − C_{k+1} x̂_k = measurement at time k + 1 minus the "predicted"

value of the measurement = "new information".

In a real-time implementation, computing the inverse of Q_{k+1} can be time con-

suming. An attractive alternative can be obtained by applying the Matrix Inversion
Lemma:
(A + BCD)^{−1} = A^{−1} − A^{−1} B ( D A^{−1} B + C^{−1} )^{−1} D A^{−1}.
Now, following the substitution rule shown below,
A ↔ Q_k,  B ↔ C_{k+1}^T,  C ↔ S_{k+1},  D ↔ C_{k+1},


we can obtain

Q_{k+1}^{−1} = ( Q_k + C_{k+1}^T S_{k+1} C_{k+1} )^{−1}
            = Q_k^{−1} − Q_k^{−1} C_{k+1}^T ( C_{k+1} Q_k^{−1} C_{k+1}^T + S_{k+1}^{−1} )^{−1} C_{k+1} Q_k^{−1},

which is a recursion for Q_k^{−1}!

Upon defining
P_k = Q_k^{−1},
we have
P_{k+1} = P_k − P_k C_{k+1}^T ( C_{k+1} P_k C_{k+1}^T + S_{k+1}^{−1} )^{−1} C_{k+1} P_k.
We note that we are now inverting a matrix that is m × m, instead of one that
is n × n. Typically, n > m, sometimes by a lot!
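(Added Python sketch, not from the notes; the simulated data, the variable names, and the large-initial-P initialization are my own illustrative assumptions.) One recursive least-squares update using the P_k recursion and the Kalman-gain form of the estimate update:

import numpy as np

def rls_update(x_hat, P, C, S, y):
    """One RLS step for y = C x + e with weight S > 0 (m x m) and P = Q^{-1}."""
    S_inv = np.linalg.inv(S)
    denom = C @ P @ C.T + S_inv                         # m x m, cheap to invert
    P_new = P - P @ C.T @ np.linalg.solve(denom, C @ P) # P_{k+1}
    gain = P_new @ C.T @ S                              # Kalman gain Q_{k+1}^{-1} C^T S
    x_new = x_hat + gain @ (y - C @ x_hat)              # innovations y - C x_hat
    return x_new, P_new

rng = np.random.default_rng(0)
x_true = np.array([1.0, -2.0])
x_hat, P = np.zeros(2), 1e3 * np.eye(2)  # large initial P ~ vague prior (an assumption)
S = np.eye(1)
for k in range(200):
    C = rng.normal(size=(1, 2))
    y = C @ x_true + 0.1 * rng.normal(size=1)
    x_hat, P = rls_update(x_hat, P, C, S, y)
print(x_hat)   # approaches x_true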


Rob 501 Fall 2014


Lecture 14
Typeset by: Bo Lin
Proofread by: Hiroshi Yamasaki
Revised: 28 October 2015

Weighted Least Square

We suppose the inner product on Rn is dened by < x, y >= x> Sy , where


S is an n × n positive denite matrix. We denote the corresponding norm by
kxkS := (x> Sx)1/2 .

Overdetermined Equation:
Let Ax = b, where x ∈ Rn , b ∈ Rm , A = m × n, n < m, and rank(A) = n.
−1
Then, we conclude that x̂ = A> SA A> Sb, where x̂ = argminkxkS .
Ax=b

Underdetermined Equation:
Let Ax = b, where x ∈ R^n, b ∈ R^m, A = m × n, n > m, and rank(A) = m. In
other words, we are assuming the rows of A are linearly independent instead of
the columns of A being linearly independent.

Def. If ∀ b0 ∈ R^m, ∃ x0 ∈ R^n such that b0 = A x0, then the equation b = Ax is consistent.

Fact: If rank(A) = the number of rows, then the equation b = Ax is consistent.

Fact: Suppose x0 is such that b = A x0, and V = {x ∈ R^n | Ax = b} is

the set of solutions. Then, V = x0 + N(A), where N(A) = {x ∈ R^n | Ax = 0}
is the null space of A. Therefore, V is the translate of a subspace. We can also
say that V is an "affine" space.

Theorem: If the rows of A are linearly independent, then

x̂ := argmin_{x∈V} ‖x‖_S = argmin_{Ax=b} ‖x‖_S = argmin_{Ax=b} (x^T S x)^{1/2}

exists, is unique, and is given by

x̂ = S^{−1} A^T ( A S^{−1} A^T )^{−1} b.

Best Linear Unbiased Estimator (BLUE)

Let y = Cx + ε, y ∈ R^m, x ∈ R^n, E{ε} = 0, cov{ε, ε} = E{ε ε^T} = Q > 0.

We assume no stochastic (random) model for the unknown x. We also assume
that the columns of C are linearly independent.

Seek: x̂ = Ky that minimizes E{‖x̂ − x‖²} = E{ Σ_{i=1}^{n} |x̂i − xi|² }, where ‖·‖ is
the standard Euclidean norm on R^n.

Aside:
(v + w)^T (v + w) = v^T v + w^T w + v^T w + w^T v
= ‖v‖² + ‖w‖² + 2 v^T w   (because v^T w is a scalar)

∴ E{‖x̂ − x‖²} = E{‖Ky − x‖²}

= E{‖KCx + Kε − x‖²}
= E{(KCx − x + Kε)^T (KCx − x + Kε)}
= E{(KCx − x)^T (KCx − x) + 2(Kε)^T (KCx − x) + ε^T K^T K ε}
From E{ε} = 0 and x being deterministic, we have
2 E{(Kε)^T (KCx − x)} = 0.
Moreover, by using the properties of the trace, we have
E{ε^T K^T K ε} = E{tr(ε^T K^T K ε)} = E{tr(K ε ε^T K^T)} = tr(K E{ε ε^T} K^T).

∴ E{‖x − x̂‖²} = ‖KCx − x‖² + tr(K Q K^T).

Difficulty: The optimal K depends on the unknown x through ‖KCx − x‖²!


Observation: If KC = I, then the problematic term disappears, i.e.,

‖KCx − x‖² = 0.
Interpretation: The estimator is unbiased:
E{x̂} = E{Ky}
= E{KCx + Kε}
= KCx
= x  (if KC = I).
New Problem:
K̂ = argmin{ tr(K Q K^T) } subject to KC = I.
New Observation:
Write K = [k_1; k_2; . . . ; k_n] (partition K by rows).

Then K^T = [k_1^T | k_2^T | · · · | k_n^T] and

tr( K Q K^T ) = Σ_{i=1}^{n} k_i Q k_i^T = Σ_{i=1}^{n} ‖k_i^T‖²_Q.

KC = I ⇔ C^T K^T = I_{n×n}

⇔ C^T [k_1^T | · · · | k_n^T] = [e_1 | · · · | e_n]
⇔ C^T k_i^T = e_i,  1 ≤ i ≤ n.
∴ We have n separate optimization problems involving the column vectors k_i^T:

k̂_i^T = argmin ‖k_i^T‖²_Q subject to C^T k_i^T = e_i.

From our formula for underdetermined equations, we have

k̂_i^T = Q^{−1} C ( C^T Q^{−1} C )^{−1} e_i, which yields
K̂^T = [ k̂_1^T | · · · | k̂_n^T ] = Q^{−1} C ( C^T Q^{−1} C )^{−1}.
Therefore,
K̂ = ( C^T Q^{−1} C )^{−1} C^T Q^{−1}.

Theorem: Let x ∈ R^n, y ∈ R^m, y = Cx + ε, E{ε} = 0, E{ε ε^T} =: Q > 0,

and rank(C) = n. The Best Linear Unbiased Estimator (BLUE) is x̂ = K̂ y,
where
K̂ = ( C^T Q^{−1} C )^{−1} C^T Q^{−1}.
Moreover, the covariance of the error is
E{ (x̂ − x)(x̂ − x)^T } = ( C^T Q^{−1} C )^{−1}.

Remark: The error covariance computation is an exercise. Solution (from previous

calculations):
E{ (x̂ − x)(x̂ − x)^T } = K̂ Q K̂^T
= ( C^T Q^{−1} C )^{−1} C^T Q^{−1} Q Q^{−1} C ( C^T Q^{−1} C )^{−1}
= ( C^T Q^{−1} C )^{−1} ( C^T Q^{−1} C ) ( C^T Q^{−1} C )^{−1}
= ( C^T Q^{−1} C )^{−1}.
Indeed,
x̂ − x = K̂ y − x
= K̂ C x + K̂ ε − x
= K̂ ε   (because K̂ C = I)
∴ E{ (x̂ − x)(x̂ − x)^T } = E{ (K̂ ε)(K̂ ε)^T }
= E{ K̂ ε ε^T K̂^T }
= K̂ Q K̂^T.

Remarks:

• Comparing Weighted Least Squares to BLUE, we see that they are identical
when the weighting matrix is taken as the inverse of the covariance matrix
of the noise term: S = Q−1 .
• Another way to say this: if you solve a least squares problem with weight matrix S , you are implicitly assuming that your uncertainty in the measurements has zero mean and a covariance matrix of Q = S −1 .
• If you know the uncertainty has zero mean and a covariance matrix of Q, using S = Q−1 makes a lot of sense! For simplicity, assume that Q is diagonal. A large entry of Q means high variance, which means the measurement is highly uncertain. Hence, the corresponding component of y should not be weighted very much in the optimization problem... and indeed, taking S = Q−1 does just that because the weight term S is small for large terms in Q.
• The inverse of the covariance matrix is sometimes called the information
matrix. Hence, there is low information when the variance (or covariance)
is large!
• Wow! We do all this abstract math, and the answer makes sense!


Rob 501 Fall 2014


Lecture 15
Typeset by: Connie Qiu
Proofread by: Bo Lin
Revised by Grizzle on 29 October 2015

Minimum Variance Estimator


y = Cx + ε, y ∈ Rm , x ∈ Rn , and ε ∈ Rm .

Stochastic assumptions:

E{x} = 0, E{ε} = 0 (means).

E{εε> } = Q, E{xx> } = P, E{xε> } = 0 (covariances).

Remark: E{xε> } = 0 implies that the states and noise are uncorrelated.
Recall that uncorrelated does NOT imply independence, except for Gaussian
random variables.

Assumptions: Q ≥ 0, P ≥ 0, CP C > + Q > 0. (will see why later)

Objective: minimize the variance

E{kx̂ − xk2 } = E{ Σ^{n}_{i=1} (x̂i − xi )2 } = Σ^{n}_{i=1} E{(x̂i − xi )2 }.

We see that there are n separate optimization problems.

Remark: suppose x̂ = Ky . It is automatically unbiased, because

E{x̂} = E{Ky} = E{KCx + Kε} = KCE{x} + KE{ε} = 0 = E{x}


Problem Formulation: We will pose this as a minimum norm problem in a


vector space of random variables.

F = R,
X = span{x1 , x2 , . . . , xn , ε1 , ε2 , . . . , εm },
where x = (x1 , . . . , xn )> and ε = (ε1 , . . . , εm )> .

For z1 , z2 ∈ X , we define their inner product by:

< z1 , z2 > = E{z1 z2 }

M = span{y1 , y2 , . . . , ym } ⊂ X (measurements),

yi = Ci x + εi = Σ^{n}_{j=1} Cij xj + εi , 1 ≤ i ≤ m, (i-th row of y)

x̂i = arg min_{m∈M} kxi − mk (so that kxi − x̂i k = d(xi , M )).

Fact: {y1 , y2 , . . . , ym } is linearly independent if, and only if, CP C > + Q is


positive definite. This is proven below when we compute the Gram matrix.
(Recall, {y1 , y2 , . . . , ym } linearly independent if, and only if G is full rank,
where Gij :=< yi , yj > .)


Solution via the Normal Equations

By the normal equations,

x̂i = α̂1 y1 + α̂2 y2 + · · · + α̂m ym


where G> α̂ = β .
Gij = < yi , yj > = E{yi yj } = E{[Ci x + εi ][Cj x + εj ]}
= E{[Ci x + εi ][Cj x + εj ]> }
= E{[Ci x + εi ][x> Cj> + εj ]}
= E{Ci xx> Cj> } + E{Ci xεj } + E{εi x> Cj> } + E{εi εj }
= Ci P Cj> + Qij
= [CP C > + Q]ij
where we have used the fact that x and ε are uncorrelated. We conclude that

G = CP C > + Q.

We now turn to computing β. Let's note that xi , the i-th component of x, is equal to x> ei , where ei is the i-th standard basis vector in Rn .
βj = < xi , yj > = E{xi yj }
= E{xi [Cj x + εj ]}
= E{xi Cj x} + E{xi εj }
= Cj E{xxi }
= Cj E{xx> ei }
= Cj E{xx> }ei
= Cj P ei
= Cj Pi
where P = [P1 |P2 | . . . |Pn ].


Putting all this together, we have

G> α̂ = β
⇕
[CP C > + Q]α̂ = CPi
⇕
α̂ = [CP C > + Q]−1 CPi

x̂i = α̂1 y1 + α̂2 y2 + · · · + α̂m ym = α̂> y (row vector × column vector), where α̂ = (α̂1 , . . . , α̂m )> .

We now seek to identify the gain matrix K so that
x̂ = Ky ⇔ x̂i = Ki y, where K is partitioned by rows as K = [K1 ; K2 ; . . . ; Kn ]; that is, Ki is the i-th row of K.

Ki> = α̂ = [CP C > + Q]−1 CPi


[K1> | . . . |Kn> ] = [CP C > + Q]−1 CP
K = P C > [CP C > + Q]−1

x̂ = Ky = P C > [CP C > + Q]−1 y
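
A small numerical sketch of the minimum variance estimator in Python/NumPy (all numbers below are hypothetical; note that m < n is allowed here):

import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
C = rng.standard_normal((m, n))
P = np.diag([1.0, 2.0, 0.5, 1.5])                 # prior covariance of x
Q = 0.1 * np.eye(m)                               # noise covariance

x = rng.multivariate_normal(np.zeros(n), P)
y = C @ x + rng.multivariate_normal(np.zeros(m), Q)

K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Q)      # K = P C^T (C P C^T + Q)^{-1}
x_hat = K @ y

# The reduction in covariance, P - P_post = P C^T (C P C^T + Q)^{-1} C P, is PSD
P_post = P - K @ C @ P
print(np.all(np.linalg.eigvalsh(P - P_post) >= -1e-12))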


Remarks:

1. Exercise: E{(x̂ − x)(x̂ − x)> } = P − P C > [CP C > + Q]−1 CP


2. The term P C > [CP C > + Q]−1 CP represents the value of the measure-
ments. It is the reduction in the variance of x given the measurement
y.
3. If Q > 0 and P > 0, then from the Matrix Inversion Lemma,

x̂ = Ky = [C > Q−1 C + P −1 ]−1 C > Q−1 y.

This form of the equation is useful for comparing BLUE vs MVE.

4. BLUE vs MVE

• BLUE: x̂ = [C > Q−1 C]−1 C > Q−1 y

• MVE: x̂ = [C > Q−1 C + P −1 ]−1 C > Q−1 y

• Hence, BLUE = MVE when P −1 = 0.

• P −1 = 0 roughly means P = ∞I , that is, infinite covariance in x, which in turn means no idea about how x is distributed!

• For BLUE to exist, we need dim(y) ≥ dim(x)


• For MVE to exist, we can have dim(y) < dim(x) as long as

(CP C > + Q) > 0


Solution to Exercise

We seek E{(x̂ − x)(x̂ − x)> }. To get started, let's note that

x̂ − x = Ky − x = KCx + Kε − x = (KC − I)x + Kε

and thus

(x̂ − x)(x̂ − x)> = (KC − I)xx> (KC − I)> + Kεε> K > + (KC − I)xε> K > + Kεx> (KC − I)>

Taking expectations, and recalling that x and ε are uncorrelated (so the cross terms have zero mean), we have

E{(x̂ − x)(x̂ − x)> } = (KC − I)P (KC − I)> + KQK >
= KCP C > K > + P − KCP − P C > K > + KQK >
= P + K[CP C > + Q]K > − KCP − P C > K >
Substituting K = P C > [CP C > + Q]−1 (so that KCP = P C > K > = K[CP C > + Q]K > ) and simplifying yields the result.


Solution to MIL

We will show that if Q > 0 and P > 0, then

P C > [CP C > + Q]−1 = [C > Q−1 C + P −1 ]−1 C > Q−1

MIL: Suppose that A, B , C and D are compatible¹ matrices. If A, C , and (C −1 + DA−1 B) are each square and invertible, then A + BCD is invertible and
(A + BCD)−1 = A−1 − A−1 B(C −1 + DA−1 B)−1 DA−1

We apply the MIL to [C > Q−1 C + P −1 ]−1 , where we identify A = P −1 , B =


C > , C = Q−1 , D = C . This yields

[C > Q−1 C + P −1 ]−1 = P − P C > [Q + CP C > ]−1 CP

Hence

[C > Q−1 C + P −1 ]−1 C > Q−1 = P C > Q−1 − P C > [Q + CP C > ]−1 CP C > Q−1
= P C > [ I − [Q + CP C > ]−1 CP C > ] Q−1
= P C > [ [Q + CP C > ]−1 [Q + CP C > ] − [Q + CP C > ]−1 CP C > ] Q−1
= P C > [Q + CP C > ]−1 [ [Q + CP C > ] − CP C > ] Q−1
= P C > [Q + CP C > ]−1 [ Q + CP C > − CP C > ] Q−1
= P C > [Q + CP C > ]−1 [Q] Q−1
= P C > [Q + CP C > ]−1

¹ The sizes are such that the matrix products and sum in A + BCD make sense.
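
A quick numerical sanity check of the identity just derived, with random P > 0 and Q > 0 of my own choosing:

import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 5
C = rng.standard_normal((m, n))
Mp = rng.standard_normal((n, n)); P = Mp @ Mp.T + np.eye(n)   # P > 0
Mq = rng.standard_normal((m, m)); Q = Mq @ Mq.T + np.eye(m)   # Q > 0

lhs = P @ C.T @ np.linalg.inv(C @ P @ C.T + Q)
rhs = np.linalg.inv(C.T @ np.linalg.inv(Q) @ C + np.linalg.inv(P)) @ C.T @ np.linalg.inv(Q)
print(np.allclose(lhs, rhs))      # the two forms of the MVE gain agree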


Rob 501 Fall 2014


Lecture 16
Typeset by: Kurt Lundeen
Proofread by: Connie Qiu
Revised by Ni on 6 November 2015

Matrix Factorizations

QR Decomposition or Factorization: Let A be a real m × n matrix with


linearly independent columns (rank of A = n = # columns). Then there exist
an m × n matrix Q with orthonormal columns and an upper triangular n × n
matrix R such that
A = QR.

Notes:

1) Q> Q = In×n
2) [R]ij = 0 for i > j , i.e., R is upper triangular:
R = [ r11 r12 · · · r1n ; 0 r22 · · · r2n ; . . . ; 0 · · · 0 rnn ]
3) Columns of A linearly independent ⇔ R is invertible

Utility of QR Decomposition:

1) Suppose Ax = b is overdetermined with columns of A linearly independent.


Write A = QR and consider


A> Ax̂ = A> b
A> A = R> Q> QR = R> R
A> b = R> Q> b
∴ R> Rx̂ = R> Q> b
Rx̂ = Q> b (because R is invertible)
∴ Solve for x̂ by back substitution, using the triangular nature of R.
For example, when n = 3,
[ r11 r12 r13 ; 0 r22 r23 ; 0 0 r33 ] x̂ = Q> b,
and x̂3 , x̂2 , x̂1 can be obtained easily, without any matrix inversion.
2) Suppose Ax = b is underdetermined with rows of A linearly independent, as illustrated in the sketch after this list.
Recall: x̂ = A> (AA> )−1 b is x of smallest norm satisfying Ax = b.
A> has linearly independent columns.
∴ A> = QR, Q> Q = I, R is upper triangular and invertible.
AA> = R> Q> QR = R> R
x̂ = QR(R> R)−1 b
= QRR−1 (R> )−1 b
x̂ = Q(R> )−1 b
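
Both uses of the QR factorization can be reproduced in a few lines of Python (NumPy/SciPy); the matrices below are arbitrary random examples of mine:

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(4)

# 1) Overdetermined: A is 8 x 3 with independent columns
A = rng.standard_normal((8, 3)); b = rng.standard_normal(8)
Q1, R1 = np.linalg.qr(A)                           # "economy" QR: Q1 is 8 x 3
x_ls = solve_triangular(R1, Q1.T @ b)              # back substitution for R x = Q^T b
print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))

# 2) Underdetermined: A is 3 x 8 with independent rows; factor A^T = QR
A = rng.standard_normal((3, 8)); b = rng.standard_normal(3)
Q2, R2 = np.linalg.qr(A.T)
x_mn = Q2 @ solve_triangular(R2.T, b, lower=True)  # x_hat = Q (R^T)^{-1} b
print(np.allclose(A @ x_mn, b))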

Computation of QR Factorization:
Gram Schmidt with Normalization:
A = [A1 |A2 | · · · |An ], Ai ∈ Rm , hx, yi = x> y .
For 1 ≤ k ≤ n, {A1 , A2 , · · · , An } → {v1 , v2 , · · · , vn }


by (writing ṽ k for the vector before normalization)
v 1 = A1 / kA1 k;
ṽ 2 = A2 − hA2 , v 1 iv 1 ;  v 2 = ṽ 2 / kṽ 2 k;
...
ṽ k = Ak − hAk , v 1 iv 1 − hAk , v 2 iv 2 − · · · − hAk , v k−1 iv k−1 ;  v k = ṽ k / kṽ k k.

In pseudocode:
For k = 1 : n
  v k = Ak
  For j = 1 : k − 1
    v k = v k − hAk , v j iv j
  End
  v k = v k / kv k k
End

Q = [v 1 |v 2 | · · · |v n ] has orthonormal columns, and hence Q> Q = In×n because [Q> Q]ij = hv i , v j i = δij .

What about R?
Ai ∈ span{v 1 , · · · , v i }, and
Ai = hAi , v 1 iv 1 + hAi , v 2 iv 2 + · · · + hAi , v i iv i .
We define Ri = ( hAi , v 1 i, hAi , v 2 i, . . . , hAi , v i i, 0, . . . , 0 )> , where the entries of Ri are 0 from the (i+1)-th element to the n-th element.
∴ QRi = Ai ⇔ QR = A


Modified Gram-Schmidt Algorithm:

We have been using the classical Gram-Schmidt Algorithm. It behaves poorly under round-off error.
Here is a standard example:
y 1 = (1, ε, 0, 0)> ,  y 2 = (1, 0, ε, 0)> ,  y 3 = (1, 0, 0, ε)> ,  ε > 0.
Let {e1 , e2 , e3 , e4 } be the standard basis vectors of R4 (the j-th entry of ei is 0 for j 6= i and 1 for j = i).
We note that
y 2 = y 1 + ε(e3 − e2 )
y 3 = y 2 + ε(e4 − e3 )
and thus
span{y 1 , y 2 } = span{y 1 , (e3 − e2 )}
span{y 1 , y 2 , y 3 } = span{y 1 , (e3 − e2 ), (e4 − e3 )}
Then, GS applied to {y 1 , y 2 , y 3 } and to {y 1 , (e3 − e2 ), (e4 − e3 )} should produce the same orthonormal vectors.
We go to MATLAB, and for ε = 0.1, we do indeed get the same results. But with ε = 10−8 ,
kQ1 − Q2 k = 0.5

Initial data {y 1 , · · · , y n } linearly independent.


For k = 1 : n
  v k = y k
end
For i = 1 : n
  v i = v i / kv i k
  For j = i + 1 : n
    v j = v j − hv i , v j iv i


end
end
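
The classical and modified algorithms are easy to compare numerically. Below is a Python sketch of both, run on the example above with ε = 10−8; the exact printed numbers depend on the machine, but the classical variant loses orthogonality badly while the modified one does not:

import numpy as np

def gram_schmidt(A, modified=False):
    # Orthonormalize the columns of A (classical or modified variant)
    V = A.astype(float).copy()
    n = V.shape[1]
    for k in range(n):
        if modified:
            V[:, k] /= np.linalg.norm(V[:, k])
            for j in range(k + 1, n):
                V[:, j] -= (V[:, k] @ V[:, j]) * V[:, k]
        else:
            for j in range(k):
                V[:, k] -= (A[:, k] @ V[:, j]) * V[:, j]
            V[:, k] /= np.linalg.norm(V[:, k])
    return V

eps = 1e-8
A = np.array([[1, 1, 1],
              [eps, 0, 0],
              [0, eps, 0],
              [0, 0, eps]])

for flag in (False, True):
    Q = gram_schmidt(A, modified=flag)
    err = np.linalg.norm(Q.T @ Q - np.eye(3))    # deviation from orthonormality
    print("modified" if flag else "classical", err)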


Rob 501 Fall 2014

Lecture 17

Typeset by: Joshua Mangelson

Proofread by: Katie Skinner

Revised by Ni on Nov. 20, 2015

Singular Value Decomposition

We will use the SVD (Singular Value Decomposition) to understand "numeri-


cal" rank of a matrix, "numerical linear independence", etc.

Def. Rectangular diagonal matrix: Σ is an m × n matrix of the form
a) m > n: Σ = [S ; 0] (S stacked on top of an (m − n) × n zero block), where S is an n × n diagonal matrix;
b) m < n: Σ = [S 0], where S is an m × m diagonal matrix.

Diagonal of Σ is equal to diagonal of S .


Another way to say Rectangular Diagonal Matrix is [Σ]ij = 0 for i 6= j .

SVD Theorem: Any real m × n matrix A can be factorized as A = Q1 ΣQ2> , where Q1 is an m × m orthogonal matrix, Q2 is an n × n orthogonal matrix, Σ is an m × n rectangular diagonal matrix, and diag(Σ) = [σ1 , σ2 , · · · , σk ] satisfies σ1 ≥ σ2 ≥ · · · ≥ σk ≥ 0, where k = min(n, m). Moreover, the columns of Q1 are eigenvectors of AA> , the columns of Q2 are eigenvectors of A> A, and σ1^2 , σ2^2 , · · · , σk^2 are eigenvalues of A> A and AA> .


Remark: The entries of diag(Σ) are called singular values.

The SVD generalizes the decomposition of a symmetric matrix, P = OΛO> .

Projection process embedded in SVD: Interpret the SVD in the case of an overdetermined system of equations.
Y = Ax, Y ∈ Rm , x ∈ Rn , A ∈ Rm×n ,
where rank(A) = n (m > n), A = Q1 ΣQ2> , Σ = [S ; 0], and S is an n × n diagonal matrix.
A> A = Q2 Σ> Q1> Q1 ΣQ2>
= Q2 [S 0] Q1> Q1 [S ; 0] Q2>
= Q2 [S 0] I [S ; 0] Q2>
= Q2 S 2 Q2>
A> Y = Q2 [S 0] Q1> Y.
Define Ỹ = Q1> Y = [Ỹ1 ; Ỹ2 ], with Ỹ1 ∈ Rn and Ỹ2 ∈ Rm−n . Then
A> Y = Q2 [S 0] Ỹ
= Q2 [S 0] [Ỹ1 ; Ỹ2 ]
= Q2 S Ỹ1 .

Projection! Notice how Ỹ2 gets multiplied by 0 in the last line above. Here we are throwing away the orthogonal part.

We have decomposed Y into a part associated with the column span of A, namely Ỹ1 , and a part not in that span, Ỹ2 .
Ax = Y
⇒ A> Ax̂ = A> Y
⇒ Q2 S 2 Q2> x̂ = Q2 S Ỹ1
⇒ S 2 Q2> x̂ = S Ỹ1 (rank(A) = # columns ⇒ S invertible)
⇒ SQ2> x̂ = Ỹ1
∴ x̂ = Q2 S −1 Ỹ1

Remarks:

• Q2 only rotates, no scaling.

• Only S −1 scales.

• If S has small elements, the elements of S −1 are big. Therefore, x̂ is very sensitive to noise or perturbations in the measurements.

Hermitian of X: Consider x ∈ Cn . Then we dene the vector "x Hermitian"


by xH := x̄> . That is, xH is the complex conjugate transpose of x. Similarly,
for a matrix A ∈ Cm×n , we dene AH ∈ Cn×m by Ā> . We say that a square
matrix A ∈ Cn×n is a Hermitian matrix if A = AH .


Another common way to write the SVD:
A = U [Σ ; 0] V H for m > n,  and  A = U [Σ 0] V H for m < n.

Unitary Matrix: A matrix U ∈ Cn×n is unitary if U H U = U U H = In .

Numerical Rank: numerical rank(A) = # of nonzero singular values larger


than a threshold.

Fact: The numerical rank of A is the number of singular values that are larger
than a given threshold. Often the threshold is chosen as a percentage of the
largest singular value.
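
A short NumPy sketch of the idea, on a hypothetical 4 × 3 matrix whose third column is, up to a tiny perturbation, the sum of the first two:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 2))
A = np.hstack([A, A[:, [0]] + A[:, [1]] + 1e-10 * rng.standard_normal((4, 1))])

s = np.linalg.svd(A, compute_uv=False)          # singular values, largest first
tol = 1e-6 * s[0]                               # threshold: a fraction of sigma_1
print(s)
print("numerical rank:", int(np.sum(s > tol)))  # 2, even though the exact rank is 3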


Rob 501 Fall 2014


Lecture 18
Lecture: Random Vector
Typeset by: Xianan Huang
Proofread by: Josh Mangelson
Revised by Grizzle 10 Nov 2015
Probability Review

1 Random Variables

I will assume known the definitions of a probability space, a set of events, and a random variable. My scanned lecture notes are attached at the end of this handout.
Given: (Ω, F , P ) a probability space
X : Ω → R random variable

2 Random Vectors

Def. A random vector is a function X : Ω → Rp where each component of X = (X1 , X2 , . . . , Xp )> is a random variable, that is, Xi : Ω → R for 1 ≤ i ≤ p.
Assumption: ∀x ∈ Rp , the set {ω ∈ Ω | X(ω) ≤ x} ∈ F , where the inequality is understood pointwise, that is,
{ω ∈ Ω | X(ω) ≤ x} = ∩^{p}_{i=1} {ω ∈ Ω | Xi (ω) ≤ xi }

Distributions and Densities For a random vector X : Ω → Rp , the cumu-


lative probability distribution function is
FX (x) = P (X ≤ x) = P ({ω ∈ Ω | X(ω) ≤ x})


The probability density function of a continuous random vector X is


fX (x) = ∂^p FX (x) / (∂x1 ∂x2 · · · ∂xp ),
which is equivalent to
FX (x1 , x2 , . . . , xp ) = ∫^{xp}_{−∞} · · · ∫^{x2}_{−∞} ∫^{x1}_{−∞} fX (x̄1 , x̄2 , . . . , x̄p ) dx̄1 dx̄2 · · · dx̄p

Suppose the vector X is partitioned into two components X1 and X2 , so that, by abuse of notation, we have
X = [X1 ; X2 ],  X1 : Ω → Rn ,  X2 : Ω → Rm ,  X : Ω → Rp with p = n + m

Def. X1 and X2 are independent if the distribution function factors


FX (x) = FX1 ,X2 (x1 , x2 ) = FX1 (x1 )FX2 (x2 ).
The same is true for densities.

3 Conditioning

Recall: For two events A, B ∈ F with P (B) > 0,
P (A | B) := P (A ∩ B) / P (B).
Note:
If B ⊂ A, then P (A | B) = P (A ∩ B)/P (B) = P (B)/P (B) = 1.
If A ⊂ B, then P (A | B) = P (A ∩ B)/P (B) = P (A)/P (B) ≥ P (A).


Def. The conditional distribution of X1 given X2 = x2 is
FX1 |X2 (x1 | x2 ) = lim_{ε→0} P (X1 ≤ x1 | x2 − ε ≤ X2 ≤ x2 + ε) = lim_{ε→0} P (A ∩ Bε ) / P (Bε )
where A = {ω ∈ Ω | X1 (ω) ≤ x1 } and Bε = {ω ∈ Ω | x2 − ε ≤ X2 (ω) ≤ x2 + ε}

In general, this is unpleasant to compute, but for Gaussian random vectors, the handout “Useful Facts About Gaussian Random Variables and Vectors” shows that it is quite easy.

Def. The conditional density is fX1 |X2 (x1 | x2 ) = fX1 X2 (x1 , x2 ) / fX2 (x2 ). Sometimes we simply write f (x1 | x2 )

Very important: X1 given X2 = x2 is a random vector. We have produced


its distribution and density!


4 Moments

Suppose g : Rp → R. Then
E{g(X)} = ∫_{Rp} g(x)fX (x)dx = ∫^{∞}_{−∞} · · · ∫^{∞}_{−∞} g(x1 , . . . , xp )fX (x1 , . . . , xp )dx1 · · · dxp

Mean or Expected Value

µ = E{X} = E{(X1 , . . . , Xp )> } = (µ1 , . . . , µp )>

Covariance Matrices
cov(X) = cov(X, X) = E{(X − µ)(X − µ)T }
where
(X − µ) is p × 1, (X − µ)T is 1 × p, (X − µ)(X − µ)T is p × p

Exercise cov(X) is positive semidefinite
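
A sketch of one way to do the exercise (this argument is mine, not copied from the notes): for any fixed a ∈ Rp ,

a> cov(X) a = a> E{(X − µ)(X − µ)> } a = E{(a> (X − µ))2 } ≥ 0,

so cov(X) is positive semidefinite.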

 
If we have X decomposed in blocks, X = [X1 ; X2 ] with X1 : Ω → Rn and X2 : Ω → Rm , we may compute
cov(X1 , X2 ) = E{(X1 − µ1 )(X2 − µ2 )T }
where (X1 − µ1 ) is n × 1, (X2 − µ2 )T is 1 × m, and (X1 − µ1 )(X2 − µ2 )T is n × m

Def. X1 and X2 are uncorrelated if cov(X1 , X2 ) = 0

Fact: In general, independence ⇒ uncorrelated, but the converse is false.


5 Derivation of the conditional density formula from the definition of the conditional distribution:

P (A ∩ Bε ) = ∫^{x1}_{−∞} ∫^{x2+ε}_{x2−ε} fX1 X2 (x̄1 , x̄2 ) dx̄2 dx̄1
P (Bε ) = ∫^{x2+ε}_{x2−ε} fX2 (x̄2 ) dx̄2
FX1 |X2 (x1 | x2 ) = P (A ∩ Bε ) / P (Bε ) = [ ∫^{x1}_{−∞} ∫^{x2+ε}_{x2−ε} fX1 X2 (x̄1 , x̄2 ) dx̄2 dx̄1 ] / [ ∫^{x2+ε}_{x2−ε} fX2 (x̄2 ) dx̄2 ],  ε small

Density: differentiate w.r.t. x1 :
fX1 |X2 (x1 | x2 ) = [ ∫^{x2+ε}_{x2−ε} fX1 X2 (x1 , x̄2 ) dx̄2 ] / [ ∫^{x2+ε}_{x2−ε} fX2 (x̄2 ) dx̄2 ] = [ fX1 X2 (x1 , x2 ) · 2ε ] / [ fX2 (x2 ) · 2ε ] = fX1 X2 (x1 , x2 ) / fX2 (x2 )

ROB 501 Fall 2014
Lecture 19
Typeset by:
Proofread by:
There was no lecture on this day.


Rob 501 Fall 2014


Lecture 20
Typeset by: Yevgeniy Yesilevskiy
Revised by Ni on 21 Nov. 2015

Multivariate Random Variables or Vectors

Let (Ω, F , P ) be a probability space.

 
X = [X1 ; X2 ], where X1 ∈ Rn and X2 ∈ Rm , and let p = n + m.
Then, the distribution function

FX1 X2 (x1 , x2 ) = P (X1 ≤ x1 , X2 ≤ x2 )


= P ({ω ∈ Ω|X1 (ω) ≤ x1 , X2 (ω) ≤ x2 })

Conditioning:

FX1 |X2 (x1 |x2 ) = P (X1 ≤ x1 |X2 = x2 )
= lim_{ε→0} P (A ∩ Bε ) / P (Bε )
where A = {ω|X1 (ω) ≤ x1 }, Bε = {ω|x2 − ε ≤ X2 (ω) ≤ x2 + ε}

Conditional Density:
fX1 |X2 (x1 |x2 ) = fX1 X2 (x1 , x2 ) / fX2 (x2 )
Sometimes, it is convenient to write f (x1 |x2 ).


Conditional Mean (Expectation):


µ(x2 ) = E{X1 |X2 = x2 } = ∫_{Rn} x1 f (x1 |x2 )dx1 = ∫_{Rn} x1 fX1 |X2 (x1 |x2 )dx1

Theorem: Let x̂ = argmin_{z=g(x2 )} E{kX1 − zk2 |X2 = x2 }, where g varies over all functions g : Rm → Rn . Then, x̂ = µ(x2 ) = E{X1 |X2 = x2 }.

Remark: g : Rm → Rn includes linear, quadratic, cubic, ... terms.
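
A sketch of why the theorem holds (a standard argument, stated here for completeness): write µ = µ(x2 ) and let z be any candidate (a function of x2 only). Then

E{kX1 − zk2 | X2 = x2 } = E{kX1 − µk2 | X2 = x2 } + kµ − zk2 ,

because the cross term 2(µ − z)> E{X1 − µ | X2 = x2 } vanishes. The right-hand side is minimized by taking z = µ(x2 ).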


Rob 501 Fall 2014


Lecture 21
Typeset by: Jeff Koller
Proofread by: Yevgeniy Yesilevskiy
Revised by Grizzle on 10 Nov. 2015

Luenberger Observers

Luenberger Observers: This is a deterministic estimator. We consider the easiest case

xk+1 = Axk
yk = Cxk

where x ∈ Rn , y ∈ Rp , A ∈ Rn×n , and C ∈ Rp×n .

Question 1: When can we reconstruct the initial condition (xo ) from the
measurements y0 , y1 , y2 , . . .
yo = Cxo
y1 = Cx1 = CAxo
y2 = Cx2 = CAx1 = CA2 xo
..
.

yk = CAk xo

Represent the above in matrix form:


   
[ yo ; y1 ; . . . ; yk ] = [ C ; CA ; . . . ; CAk ] xo

We note that if rank[ C ; CA ; . . . ; CAk ] = n, then we can determine xo uniquely on the basis of the measurements.

Cayley-Hamilton Theorem:

rank[ C ; CA ; . . . ; CAn−1 ] = rank[ C ; CA ; . . . ; CAk ]  for all k ≥ n − 1

Theorem: rank[ C ; CA ; . . . ; CAn−1 ] = n means that we can determine xo uniquely from the measurements. (This is called the Kalman observability rank condition.)

Question 2: Can we process the measurements dynamically (i.e. recursively)


and estimate xk ?


Full-State Luenberger Observer:


x̂k+1 = Ax̂k + L(yk − C x̂k )

We define the error to be ek = xk − x̂k . We want conditions such that ek → 0


as k → ∞. Want ek → 0 because then x̂k → xk !!!

ek+1 = xk+1 − x̂k+1


= Axk − [Ax̂k + L(yk − C x̂k )]
= A(xk − x̂k ) − LC(xk − x̂k )
= Aek − LCek

ek+1 = (A − LC)ek

Theorem: Let e0 ∈ Rn and define ek+1 = (A − LC)ek . Then the sequence ek → 0 as k → ∞ for all e0 ∈ Rn if, and only if, |λi (A − LC)| < 1 for i = 1, . . . , n.

Theorem: A sufficient condition for the existence of an L : Rp → Rn that places the eigenvalues of (A − LC) inside the unit circle is:
rank[ C ; CA ; . . . ; CAn−1 ] = n = dim(x).
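
A concrete numerical sketch in Python/NumPy (the system and the gain below are my own illustrative choices, not an example from the lecture):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])

O = np.vstack([C, C @ A])                     # observability matrix for n = 2
print(np.linalg.matrix_rank(O))               # 2, so the rank condition holds

L = np.array([[1.0], [0.25]])                 # chosen by hand so eig(A - LC) = {0.5, 0.5}
print(np.abs(np.linalg.eigvals(A - L @ C)))   # both inside the unit circle

e = np.array([1.0, -2.0])                     # error dynamics e_{k+1} = (A - LC) e_k
for _ in range(30):
    e = (A - L @ C) @ e
print(np.linalg.norm(e))                      # essentially zero: x_hat_k -> x_k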

Remarks: L = constant similar to Kss = steady-state Kalman Gain


1. Reason to choose one gain over the other: Optimality of the estimate when
you know the noise statistics.

2. Kalman Filter works for time varying models Ak , Ck , Gk , etc.


Rob 501 Fall 2014

Lecture 22

Typeset by Ni on 18 Nov. 2015

Real Analysis

Let (X , R, k · k) be a real normed space.


Recall k · k : X → [0, +∞) such that

1. kxk ≥ 0 and kxk = 0 ⇔ x = 0


2. kα · xk = |α| · kxk for all α ∈ R, x ∈ X
3. kx + yk ≤ kxk + kyk for all x, y ∈ X .

Recall:
Def.

1. For x, y ∈ X , d(x, y) := kx − yk.


2. For x ∈ X and S ⊂ X a subset,
d(x, S) := inf_{y∈S} kx − yk.

Def. Let x0 ∈ X and a ∈ R, a > 0. The open ball of radius a centered at x0 is

Ba (x0 ) = {x ∈ X | kx − x0 k < a}.

Examples:


1. R2 , k · k 2 : Euclidean norm


2. R2 , k · k 1 : One norm

k(x1 , x2 )k1 = |x1 | + |x2 |



3. R2 , k · k∞ : Max norm

k(x1 , x2 )k∞ = max{|x1 |, |x2 |}

Lemma: Let (X , k · k) be a normed space, x ∈ X, and S ⊂ X. Then,

d(x, S) = 0 ⇔ ∀ε > 0, ∃y ∈ S, kx − yk < ε
⇔ ∀ε > 0, Bε (x) ∩ S 6= ∅.

Corollary:

d(x, S) > 0 ⇔ ∃ε > 0, ∀y ∈ S, kx − yk ≥ ε
⇔ ∃ε > 0 such that Bε (x) ∩ S = ∅
In the following, we assume (X , k · k) is given.

Def.

1. Let P ⊂ X be a subset of X . A point p ∈ P is an interior point of P if ∃ε > 0 such that Bε (p) ⊂ P .
2.
P̊ = {p ∈ P | p is an interior point}
  = {p ∈ P | ∃ε > 0 such that Bε (p) ⊂ P }

Remark for later use: p ∈ P̊ ⇔ ∃ε > 0, Bε (p) ⊂ P ⇔ ∃ε > 0 such that Bε (p) ∩ (∼ P ) = ∅ ⇔ d(p, ∼ P ) > 0, where ∼ P = P C = complement = {x ∈ X | x ∉ P }.

3. P is open if P = P̊ . (Every point in P is an interior point.)


Proposition: x ∈ P̊ ⇔ d(x, ∼ P ) > 0

Example:

• P = (0, 1) ⊂ (R, | · |) is open:
for x ∈ P with 0 < x ≤ 1/2, take ε = x/2; then Bε (x) ⊂ P , and
for x ∈ P with 1/2 ≤ x < 1, take ε = (1 − x)/2; then Bε (x) ⊂ P .
• P = [0, 1) ⊂ (R, | · |) is not open because 0 ∈ P and ∀ε > 0, Bε (0) ∩ (∼ P ) 6= ∅; equivalently, 0 ∈ P and d(0, ∼ P ) = 0.

Def.

1. A point x ∈ X is a closure point of P if ∀ε > 0, ∃p ∈ P such that kx − pk < ε, [i.e., d(x, P ) = 0].

2.
Closure of P = P̄ := {x ∈ X | x is a closure point of P }
        = {x ∈ X | d(x, P ) = 0}
3. P is closed if P = P̄ .

Example:

1. P = {x ∈ [0, 1] | x rational} ⇒ P̄ = [0, 1]

2. P = (0, 1) ⇒ P̄ = [0, 1]

Proposition:

For x ∈ X , x ∈ P̄ ⇔ d(x, P ) = 0.
For x ∈ X , x ∈ P̊ ⇔ d(x, ∼ P ) > 0.


Proposition:

P is closed ⇔ P = P̄ .
P is open ⇔ P = P̊ .

Proposition:

P is closed ⇔ ∼ P is open.
P is open ⇔ ∼ P is closed.

Proof:

P is open ⇔ P = P̊ ⇔ ∼ P = ∼ (P̊ ) = {x ∈ X | d(x, ∼ P ) = 0} = closure of ∼ P ⇔ ∼ P is closed. □


Rob 501 Fall 2014


Lecture 23
Typeset by: Ilsun Song
Proof-read by: Yunxiang Xu
Revised by Ni on 21 Nov. 2015

Sequence

Def. A set of vectors indexed by the non-negative integers is called a sequence


(xn ) or {xn }. Let (xn ) be a sequence and n1 < n2 < n3 < · · · be an infinite
set of strictly increasing integers. Then, (xni ) is called a subsequence of (xn ).
Example:
ni = 2i + 1 or ni = 2i

Def. A sequence of vectors (xn ) converges to x ∈ X if, ∀ε > 0, ∃N (ε) < ∞ such that, if n ≥ N , then kxn − xk < ε; i.e., n ≥ N ⇒ xn ∈ Bε (x). One writes lim_{n→∞} xn = x or xn → x.

Proposition: Suppose xn → x. Then,

1. kxn k → kxk
2. sup_n kxn k < ∞ (The sequence is bounded.)
3. If xn → y then y = x. (Limits are unique.)

Aside: Useful inequality (from the triangle inequality)


For x, y ∈ X,
kxk = kx − y + yk ≤ kx − yk + kyk
⇒ kxk − kyk ≤ kx − yk
∴ |kxk − kyk| ≤ kx − yk


Proof:

1. |kxk − kxn k| ≤ kx − xn k → 0 as n → ∞.

2. Set ε = 1; ∃N (1) < ∞ such that n ≥ N ⇒ kxn − xk ≤ 1.
∴ ∀ n ≥ N, kxn k = kxn − x + xk ≤ kxn − xk + kxk ≤ 1 + kxk.
sup_k kxk k ≤ max{kx1 k, kx2 k, · · · , kxN −1 k, 1 + kxk} < ∞ (the maximum of a finite set of finite numbers).

3. kx − yk = kx − xn + xn − yk ≤ kx − xn k + kxn − yk → 0 as n → ∞.

Def. x ∈ X , P ⊂ X a subset. x is a limit point of P if ∃ a sequence of elements of P that converges to x. That is, ∃(xn ), xn ∈ P, and lim_{n→∞} xn = x.

Proposition: x is a limit point of P ⇔ x ∈ P̄ .


Proof:

1. Suppose x is a limit point. Then, ∃(xn ) such that xn ∈ P and xn → x. Because xn → x, ∀ε > 0, ∃xn ∈ P such that kxn − xk < ε ⇒ d(x, P ) = 0 ⇒ x ∈ P̄ .
2. Suppose x ∈ P̄ . Then, ∀ε > 0, ∃y ∈ P such that kx − yk < ε. Let ε = 1/n. Then, ∃xn ∈ P such that kx − xn k < 1/n ⇒ xn → x.
∴ x is a limit point.

Corollary: P is closed ⇔ it contains its limit points.

Complete Spaces (Banach Spaces)

Def. A sequence (xn ) in (X , k · k) is a Cauchy sequence if ∀ ε > 0, ∃N (ε) <


∞, such that n, m ≥ N ⇒ kxn − xm k < ε.


Notation: kxn − xm k → 0 as n, m → ∞

Proposition: If xn → x, then (xn ) is Cauchy.


Proof: Let ε > 0 and choose N < ∞ such that n ≥ N ⇒ kxn − xk < ε/2. Then,
kxn − xm k = kxn − x + x − xm k
≤ kxn − xk + kx − xm k
< ε/2 + ε/2 = ε for all n, m ≥ N □
Unfortunately, not all Cauchy sequences are convergent. For a reason we will understand shortly, all counterexamples are infinite dimensional.

Example:
X = {f : [0, 1] → R | f continuous}, with kf k1 = ∫^{1}_{0} |f (τ )|dτ .

Define a sequence as follows:
fn (t) = 0 for 0 ≤ t ≤ 1/2 − 1/n;
fn (t) = 1 + n(t − 1/2) for 1/2 − 1/n ≤ t ≤ 1/2;
fn (t) = 1 for t ≥ 1/2.
Then kfn − fm k1 = (1/2)|1/n − 1/m| → 0 as n, m → ∞, but there is no continuous f such that fn → f in k · k1 (the limit would have to be a discontinuous step).

Def. A normed space (X, R, k · k) is complete if every Cauchy Sequence in X


has a limit in X . Such spaces are called Banach spaces.
There are many useful and known Banach spaces.

In EECS562, you will use (C [0, T ] , k · k∞ ).

Def. A subset P of a normed space is complete if every Cauchy Sequence in


P has a limit in P.


Remark: P is complete ⇒ P is closed.

Theorem:

1. In a normed linear space, any finite dimensional subspace is complete.

2. Any closed subset of a complete set is also complete.
3. (C[a, b], k·k∞ ) is complete, where C[a, b] = {f : [a, b] → R | f continuous}.
Note: a < b, both finite.


Rob 501 Fall 2014


Lecture 24
Typeset by: Kevin Chen
Proofread by: Yong Xiao
Revised by Ni on Nov. 21, 2015
Newton-Raphson & Contraction Mapping
Let h : Rn → Rn satisfy, ∀x ∈ Rn : the Jacobian ∂h/∂x (x) exists and is invertible; moreover, ∂h/∂x (x) is a continuous function of x.

Remark: One says h is C 1 when its derivative exists and is continuous.

Problem: For y ∈ Rn , find a solution to y = h(x), i.e., seek x∗ ∈ Rn s.t. h(x∗ ) = y .

Approach: Generate a sequence of approximate solutions. Then, refer to the


literature to ensure convergence.

Idea: Given xk , seek xk+1 such that h(xk+1 ) − y ≈ 0. We write xk+1 = xk + ∆xk so that h(xk + ∆xk ) − y ≈ 0. Applying Taylor's Theorem and keeping only the zeroth and first order terms,
h(xk ) + (∂h/∂x)(xk ) ∆xk − y ≈ 0
(∂h/∂x)(xk ) ∆xk ≈ −(h(xk ) − y)
∆xk ≈ −[(∂h/∂x)(xk )]−1 (h(xk ) − y)
∴ xk+1 = xk − [(∂h/∂x)(xk )]−1 (h(xk ) − y) =: T (xk )


As indicated, we define T (x) = x − [(∂h/∂x)(x)]−1 (h(x) − y). Then,
x∗ = T (x∗ ) (Fixed Point)
⇔ x∗ = x∗ − [(∂h/∂x)(x∗ )]−1 (h(x∗ ) − y)
⇔ 0 = [(∂h/∂x)(x∗ )]−1 (h(x∗ ) − y)
⇔ y = h(x∗ )
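
Here is a minimal Python sketch of the resulting iteration on a hypothetical map h : R2 → R2 of my own choosing (so that x∗ = (1, 1) solves h(x∗ ) = (2, 2)):

import numpy as np

def h(x):
    return np.array([x[0] + x[1]**3, x[0]**3 + x[1]])

def jac(x):                                    # Jacobian dh/dx
    return np.array([[1.0, 3 * x[1]**2],
                     [3 * x[0]**2, 1.0]])

y = np.array([2.0, 2.0])
x = np.array([0.5, 0.2])                       # initial guess
for _ in range(20):
    x = x - np.linalg.solve(jac(x), h(x) - y)  # x_{k+1} = x_k - [dh/dx(x_k)]^{-1}(h(x_k) - y)

print(x, np.linalg.norm(h(x) - y))             # converges to (1, 1) from this initial guess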

Let (X , R, k · k) be a normed space, S ⊂ X , and T : S → S .

Questions:
1. When does ∃x∗ s.t. T (x∗ ) = x∗ ? (Fixed point)
2. If a fixed point exists, is it unique?
3. When can a fixed point be determined by the Method of Successive Approximations: xn+1 = T (xn )?

Def. T : S → S is a contraction mapping if

∃ 0 ≤ α < 1 s.t. ∀x, y ∈ S, kT (x) − T (y)k ≤ αkx − yk.

Contraction Mapping Theorem: If T is a contraction mapping on a com-


plete subset S of a normed space (X , R, k · k), then there exists a unique vector
x∗ ∈ S such that T (x∗ ) = x∗ . Moreover, for every initial point x0 ∈ S , the
sequence xn+1 = T (xn ) , n ≥ 0, is Cauchy, and xn → x∗ .

Proof: For all n ≥ 1


kxn+1 − xn k = kT (xn ) − T (xn−1 ) k
≤ αkxn − xn−1 k


By induction, kxn+1 − xn k ≤ αn kx1 − x0 k. Consider kxm − xn k, and WLOG,


suppose m = n + p, p > 0. Then,
kxm − xn k = kxn+p − xn k
= kxn+p − xn+p−1 + xn+p−1 − · · · + xn+1 − xn k
≤ kxn+p − xn+p−1 k + · · · + kxn+1 − xn k

≤ (α^{n+p−1} + α^{n+p−2} + · · · + α^n ) kx1 − x0 k
= α^n ( Σ^{p−1}_{i=0} α^i ) kx1 − x0 k
≤ α^n ( Σ^{∞}_{i=0} α^i ) kx1 − x0 k
= (α^n / (1 − α)) kx1 − x0 k → 0 as n, m → ∞

∴ (xn ) is a Cauchy sequence in S , and by completeness, ∃x∗ ∈ S such that xn → x∗ . □

Claim: x∗ = T (x∗)
Proof: For every n ≥ 1,
kx∗ − T (x∗ ) k = kx∗ − xn + xn − T (x∗ ) k
= kx∗ − xn + T (xn−1 ) − T (x∗ ) k
≤ kx∗ − xn k + kT (xn−1 ) − T (x∗ ) k
≤ kx∗ − xn k + αkxn−1 − x∗ k → 0 as n → ∞. □

Claim: x∗ is unique.
Proof: Suppose y ∗ = T (y ∗ ).
Then,
kx∗ − y ∗ k = kT (x∗ ) − T (y ∗ ) k
≤ αkx∗ − y ∗ k and 0 ≤ α < 1


The only non-negative real number γ that satisfies γ ≤ αγ for some 0 ≤ α < 1 is γ = 0. Hence, by the properties of norms, 0 = kx∗ − y ∗ k ⇔ x∗ = y ∗ . □
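
As a simple illustration of the Method of Successive Approximations, consider T (x) = cos(x) on S = [0, 1] (my choice of example). T maps [0, 1] into [cos 1, 1] ⊂ [0, 1] and, by the mean value theorem, |cos x − cos y| ≤ sin(1)|x − y| with sin(1) ≈ 0.84 < 1, so T is a contraction on a complete set. A few lines of Python show the iteration converging to the unique fixed point:

import math

x = 0.0
for _ in range(60):
    x = math.cos(x)                 # successive approximations x_{n+1} = T(x_n)

print(x, abs(x - math.cos(x)))      # x is (numerically) the unique fixed point x* = cos(x*)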

Continuous Functions and Compact Sets

Def. Let (X , k · k) and (Y, ||| · |||) be two normed spaces.

(a) f : X → Y is continuous at x0 ∈ X if ∀ε > 0, ∃δ(ε, x0 ) > 0 such that
kx − x0 k < δ ⇒ |||f (x) − f (x0 )||| < ε,
i.e., ∀ε > 0, ∃δ > 0 s.t. x ∈ Bδ (x0 ) ⇒ f (x) ∈ Bε (f (x0 )).

(b) f is continuous if it is continuous at x0 for all x0 ∈ X .

Theorem: Let (X , k · k) and (Y, ||| · |||) be two normed spaces and f : X → Y a function.

(a) If f is continuous at x0 and the sequence (xn ) converges to x0 (i.e., xn → x0 ), then f (xn ) → f (x0 ).
(b) If f is not continuous at x0 (discontinuous), then there exists a sequence (xn ) such that xn → x0 and f (xn ) ↛ f (x0 ), that is, f (xn ) does not converge to f (x0 ).

The proof is done in HW 10.


Rob 501 Fall 2014


Lecture 25
Typeset by: Yunxiang Xu
Proofread by: Jakob Hoellerbauer
Revised by Ni on Nov. 29, 2015

Continuous Functions and Compact Sets (Continued)

Def. A set C is bounded if ∃r < ∞ such that C ⊂ Br (0).

Bolzano-Weierstrass Theorem (Sequential Compactness Theorem): In a finite dimensional normed space (X , R, || · ||), the following two properties are equivalent for a set C ⊂ X .

(a) C is closed and bounded;

(b) For every sequence (xn ) in C (i.e., xn ∈ C ), there exists x0 ∈ C and a subsequence (xni ) of (xn ) such that xni → x0 . (Every sequence in C contains a convergent subsequence.)
Subsequence: 1 ≤ n1 < n2 < n3 < · · ·

Def. A set C satisfying (a) or (b) is said to be compact.

Example: C = [0, 1] is a compact subset of R. For any sequence (xn ) in C , there are two possibilities:

(a) (xn ) has a finite number of distinct values, and at least one of them is repeated infinitely often;
(b) (xn ) has an infinite number of distinct values.

Weierstrass Theorem: If C is compact and f : C → R is continuous, then f



achieves its extreme values. That is,
∃x∗ ∈ C s.t. f (x∗ ) = sup_{x∈C} f (x)
and
∃x∗ ∈ C s.t. f (x∗ ) = inf_{x∈C} f (x).

Proof: Let f ∗ := sup_{x∈C} f (x). To show: ∃x∗ ∈ C s.t. f (x∗ ) = f ∗ .

Assume f ∗ is finite (this can be shown, but we skip it).

f ∗ = supremum = least upper bound, so ∀ε > 0, ∃xε ∈ C s.t. |f ∗ − f (xε )| < ε.

Set ε = 1/n, and deduce that ∃(xn ) in C such that |f ∗ − f (xn )| < 1/n.

C is compact ⇒ ∃(xni ) and x∗ ∈ C s.t. xni → x∗ .

Because f is continuous, f (xni ) → f (x∗ ), and

|f ∗ − f (x∗ )| = |f ∗ − f (xni ) + f (xni ) − f (x∗ )|
≤ |f ∗ − f (xni )| + |f (xni ) − f (x∗ )|
≤ 1/ni + |f (xni ) − f (x∗ )| → 0 as i → ∞.

∴ f ∗ = f (x∗ ). □

Convex Sets and Convex Functions

Def. Let (V, R) be a vector space. C ⊂ V is convex if, ∀x, y ∈ C and 0 ≤ λ ≤ 1, λx + (1 − λ)y ∈ C .


Remark:

[Figure: two sets, each with points x and y marked and the segment joining them.]

(a) If x, y ∈ C , then the line segment connecting x and y also lies in C .


(b) Balls are always convex.

Def. Suppose C is convex. Then f : C → R is convex if, ∀x, y ∈ C and 0 ≤ λ ≤ 1,

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).

[Figure: graphs of a convex function and a non-convex function, with the values f (x) and f (y) marked at points x and y.]

Def. Suppose (V, R, || · ||) is a normed space, D ⊂ V a subset, and f : D → R a function.


(a) x∗ ∈ D is a local minimum of f if ∃δ > 0 s.t. ∀x ∈ Bδ (x∗ ), f (x∗ ) ≤ f (x).

(b) x∗ ∈ D is a global minimum if ∀y ∈ D, f (x∗ ) ≤ f (y).

Theorem: If D and f are both convex, then any local minimum is also a global minimum.

Proof: We prove the contrapositive.

We show that if x is not a global minimum, then it cannot be a local minimum.
Let x ∈ D not be a global minimum; hence ∃y ∈ D s.t. f (y) < f (x).

To show: ∀δ > 0, ∃z ∈ Bδ (x) s.t. f (z) < f (x).

Claim: ∀δ > 0, ∃ 0 < λ < 1 s.t. z = (1 − λ)x + λy ∈ Bδ (x). Indeed,
||z − x|| = ||(1 − λ)x + λy − x||
= ||λ(y − x)||
= λ||y − x||,
∴ any λ < δ/||y − x|| works.

Then
f (z) = f ((1 − λ)x + λy)
≤ (1 − λ)f (x) + λf (y)
< (1 − λ)f (x) + λf (x)
= f (x).
∴ f (z) < f (x), so x is not a local minimum. □


Rob 501 Fall 2014


Lecture 26
Typeset by: Vittorio Bichucher
Proofread by: Mia Stevens
Revised by Ni on Nov. 29, 2015

Convex Sets and Convex Functions (Continued)

Additional Facts:

• All norms k · k : X → [0, ∞) are convex. (proof using triangle inequality)


• For all 1 ≤ β < ∞, k · k^β is convex (a convex function composed with a convex, strictly increasing function). Hence, on Rn , Σ^{n}_{i=1} |xi |^3 is convex.
• Let r > 0 and k · k a norm; then Br (x0 ) is a convex set.
  Special case: B1 (0) is a convex set (the unit ball about the origin).
  Conversely, let C be an open, bounded, convex set with 0 ∈ C and C symmetric (C = −C). Then ∃ k·k : X → [0, ∞) such that C = {x ∈ X | ||x|| < 1} = B1 (0).
• K1 convex, K2 convex ⇒ K1 ∩ K2 is convex. (Proved by taking the line segment inside each set.)
• Consider (Rn , R), A a real m by n matrix, b ∈ Rm . Then:
  – K = {x ∈ Rn | Ax ≤ b} is convex. (linear inequality)
  – K = {x ∈ Rn | Ax = b} is convex. (linear equality)
  – K = {x ∈ Rn | Aeq x = beq , Ain x ≤ bin } is convex as well. (intersection property)

Remark: Ãx ≥ b̃ ⇔ −Ãx ≤ −b̃.


Quadratic Programming

x ∈ Rn , Q ≥ 0.
Minimize x> Qx (quadratic term) + f x (linear term), subject to Ain x ≤ bin and Aeq x = beq .

Note: the objective x> Qx + f x is convex (because Q ≥ 0), and the constraint sets defined by Ain and Aeq are convex. Also, check that the constraints do not define the empty set.

There are special-purpose solvers available! See S. Boyd's website!
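
For instance, CVXPY (a modeling tool from Boyd's group) lets you state such a problem almost verbatim; the data below are hypothetical placeholders of mine, just to show the pattern:

import numpy as np
import cvxpy as cp

n = 3
Q = np.diag([1.0, 2.0, 0.5])                     # Q >= 0
f = np.array([1.0, -1.0, 0.0])
A_in = np.vstack([np.eye(n), -np.eye(n)])        # box constraints -2 <= x_i <= 2
b_in = 2.0 * np.ones(2 * n)
A_eq = np.ones((1, n)); b_eq = np.array([1.0])   # components sum to 1

x = cp.Variable(n)
objective = cp.Minimize(cp.quad_form(x, Q) + f @ x)
constraints = [A_in @ x <= b_in, A_eq @ x == b_eq]
prob = cp.Problem(objective, constraints)
prob.solve()
print(x.value)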

Example using robot equation:


D(q)q̈ + C(q, q̇)q̇ + G(q) = Bu
where q ∈ Rn , u ∈ Rm .

Further, the ground reaction forces can be modeled as
F = Λ0 (q, q̇) + Λ1 (q)u = [F h ; F v ]  (horizontal and vertical components).

Suppose the desired feedback signal is u = γ(q, q̇), but we need to respect bounds on the ground reaction forces:
F v ≥ 0.2 mtotal g,
that is, the normal force should be at least 20% of the total weight, and
|F h | ≤ 0.6 F v ,
that is, the friction force lies in a cone: its magnitude is less than


60% of the total vertical force. Putting it all together:
[ F v ≥ 0.2 mtotal g,  F h ≤ 0.6F v ,  −F h ≤ 0.6F v ]  ⇔  Ain (q)u ≤ bin (q, q̇).

QP:
u∗ = argmin u> u + p d> d
subject to Ain (q)u ≤ bin (q, q̇)
      u = γ(q, q̇) + d
where d is often called the relaxation variable (and d> d the relaxation term). Further, p is a weighting factor and should be very large (e.g., on the order of 1 · 10^4 or more). Dr. Grizzle finished by showing his handout on linear programming and quadratic programming. And remember Stephen Boyd!
