
Arindama Singh

MA2031 Classnotes
Linear Algebra for Engineers

© A.Singh
Contents

Syllabus

1 Vector Spaces
1.1 Introduction
1.2 What is a vector space?
1.3 Subspaces
1.4 Span
1.5 Linear independence
1.6 Basis
1.7 Dimension
1.8 Extracting a basis

2 Inner Product Spaces
2.1 Inner Products
2.2 Orthonormal basis
2.3 Gram-Schmidt orthogonalization
2.4 Best approximation

3 Linear Transformations
3.1 What is a linear transformation?
3.2 Action on a basis
3.3 Range space and null space
3.4 Isomorphisms
3.5 Adjoint of a Linear Transformation

4 Linear Transformations and Matrices
4.1 Solution of linear equations
4.2 Least squares solution
4.3 Matrix of a linear transformation
4.4 Matrix operations
4.5 Change of basis
4.6 Equivalence

5 Spectral Representation
5.1 Eigenvalues and eigenvectors
5.2 Characteristic polynomial
5.3 Schur triangularization
5.4 Diagonalizability
5.5 Jordan form
5.6 Singular value decomposition
5.7 Polar decomposition

Answers to Exercises

Index
Syllabus
Vector spaces: Real and complex vector spaces, subspaces, span, linear independence, dimension.
Linear Transformations: Linear transformations, rank and nullity, matrix representation, change of bases, solvability of linear systems.
Inner product spaces: Inner products, angle, orthogonal and orthonormal sets, Gram-Schmidt orthogonalization, orthogonal and orthonormal bases, orthogonal complement, QR-factorization, best approximation and least squares, Riesz representation and adjoint.
Eigenpairs of linear transformations: Eigenvalues and eigenvectors, spectral mapping theorem, characteristic polynomial, Cayley-Hamilton theorem.
Matrix representations: Block-diagonalization, Schur triangularization, diagonalization theorem, generalized eigenvectors, Jordan form, singular value decomposition, polar decomposition.
Texts:
1. S. Lang, Linear Algebra, 3rd Ed., Springer, 2004.
2. D. W. Lewis, Matrix Theory, World Scientific, 1991.

References:
1. H. Anton & C. Rorres, Elementary Linear Algebra: Applications, 11th Ed.,
Wiley, 2013.
2. K. Janich, Linear Algebra, 3rd Ed., Springer, 2004.
3. B. Kolman & D. Hill, Elementary Linear Algebra, 9th Ed., Pearson, 2007.
4. A. Singh, Introduction to Matrix Theory, Ane Books, 2018.

1 Vector Spaces

1.1 Introduction
Consider a vector 𝑢® in the plane. As we know, two vectors are equal iff they have
the same direction and same length. Thus, we may draw 𝑢® anywhere on the plane.
Let us fix the 𝑥 and the 𝑦 axes, and mark the origin as (0, 0). Next, we draw 𝑢® with
its initial point at the origin. Its endpoint is some point, say, (𝑎, 𝑏). Thus, by making
the convention that each plane vector has its initial point at the origin, we see that
any plane vector is identified with a point (𝑎, 𝑏) ∈ R2 .
Take another vector 𝑣® with its initial point at the origin and endpoint as (𝑐, 𝑑).
By using the parallelogram law of adding two vectors, we see that the vector 𝑢® + 𝑣®
has initial point (0, 0) and endpoint (𝑎 + 𝑐, 𝑏 + 𝑑). This gives rise to addition of two
points in R2 as in the following:

(𝑎, 𝑏) + (𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 + 𝑑).

We call it component-wise addition. Similarly, for any 𝛼 ∈ R, we see that the


endpoint of 𝛼𝑢® is (𝛼𝑎, 𝛼𝑏). That is, the scalar multiplication may be defined on R2
by the rule
𝛼 (𝑎, 𝑏) = (𝛼𝑎, 𝛼𝑏).
These two operations of addition of two points and left-multiplication of a point by
a scalar have the same properties as the corresponding operations on plane vectors.
Moreover, by the technique of identification, some other entities such as functions
can also be seen to have similar properties. For example, consider 𝑆 as the set of
all functions from {1, 2} to R. Such a function 𝑓 ∈ 𝑆 is completely specified if we
know how 𝑓 acts on 1 and how it acts on 2. That is, any element 𝑓 ∈ 𝑆 is completely
specified by the ordered pair (𝑓 (1), 𝑓 (2)), which is, of course, a point in R2 . This
way, 𝑆 is identified with R2 .
Now, what about the addition and scalar multiplication on 𝑆? Suppose 𝑓 ∈ 𝑆 and
𝛼 ∈ R. As per the scalar multiplication in R2, we have

𝛼 (𝑓 (1), 𝑓 (2)) = (𝛼 𝑓 (1), 𝛼 𝑓 (2)).


That is, we may define the scalar multiplication on 𝑆 by the rule

(𝛼 𝑓 )(𝑥) = 𝛼 𝑓 (𝑥) for each 𝑥 ∈ {1, 2}.

This defines the new function 𝛼 𝑓 obtained from the function 𝑓 and the scalar 𝛼 .
Similarly, if 𝑓 , 𝑔 ∈ 𝑆, the addition on R2 says that

(𝑓 (1), 𝑓 (2)) + (𝑔(1), 𝑔(2)) = (𝑓 (1) + 𝑔(1), 𝑓 (2) + 𝑔(2)).

Thus, our identification dictates us to define the addition of two functions in 𝑆 by


the rule
(𝑓 + 𝑔)(𝑥) = 𝑓 (𝑥) + 𝑔(𝑥) for each 𝑥 ∈ {1, 2}.
This defines the new function 𝑓 + 𝑔 ∈ 𝑆 obtained from the functions 𝑓 and 𝑔.
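For readers who like to experiment, this identification is easy to try on a computer. The following Python sketch (not part of the notes proper; the helper names as_pair, add_fn and scale_fn are our own choices) stores a function on {1, 2} as a small table and checks that the operations just defined on 𝑆 match the component-wise operations on R2.

# A function f from {1, 2} to R is stored as a small table (a dict).
# The helper names below are illustrative choices, not standard notation.

def as_pair(f):
    """Identify a function f on {1, 2} with the point (f(1), f(2))."""
    return (f[1], f[2])

def add_fn(f, g):
    """(f + g)(x) = f(x) + g(x) for each x in {1, 2}."""
    return {x: f[x] + g[x] for x in (1, 2)}

def scale_fn(alpha, f):
    """(alpha f)(x) = alpha f(x) for each x in {1, 2}."""
    return {x: alpha * f[x] for x in (1, 2)}

f = {1: 2.0, 2: -1.0}        # f(1) = 2, f(2) = -1
g = {1: 0.5, 2: 3.0}

# The identification respects addition and scalar multiplication:
assert as_pair(add_fn(f, g)) == (f[1] + g[1], f[2] + g[2])
assert as_pair(scale_fn(2.0, f)) == (2.0 * f[1], 2.0 * f[2])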
We see that points in R2 , the functions in 𝑆, and also the solutions in Sol may
all be thought of as vectors in the plane so far as these two operations of addition
and scalar multiplication are concerned. That is, if we say that the set of all plane
vectors is a vector space, then so are R2 and the set of functions 𝑆.
The essential properties of vectors in the plane are the following:
Addition is commutative and associative.
There exists a vector, denoted by 0®, such that for all vectors 𝑥®, 𝑥® + 0® = 𝑥® = 0® + 𝑥®.
For each vector 𝑥®, there exists (possibly) another vector, denoted by −𝑥®, such that 𝑥® + (−𝑥®) = 0® = (−𝑥®) + 𝑥®.
For each vector 𝑥®, 1 · 𝑥® = 𝑥®.
Scalar multiplication distributes over addition.
The approach is to find out the essential properties of plane vectors with respect
to these operations. If we use these properties as the defining criteria, then our
familiar way of working with plane vectors will yield results in any unfamiliar
domain that satisfies these criteria. This is the greedy principle of abstraction used
in mathematics. Using this principle, we end up with the notion of a vector space.

1.2 What is a vector space?


Let R denote the set of all real numbers and let C denote the set of all complex
numbers. To talk about them both, we let F denote either R or C. The familiar
properties of addition and multiplication of numbers from F are the following:
For all 𝑎, 𝑏, 𝑐 ∈ F,
1. 𝑎 + 𝑏 = 𝑏 + 𝑎, (𝑎 + 𝑏) + 𝑐 = 𝑎 + (𝑏 + 𝑐), 0 + 𝑎 = 𝑎, 𝑎 − 𝑎 := 𝑎 + (−𝑎) = 0,
2. 𝑎𝑏 = 𝑏𝑎, (𝑎𝑏)𝑐 = 𝑎(𝑏𝑐), 1 𝑎 = 𝑎, 𝑎 𝑎 −1 = 1 for 𝑎 ≠ 0,
3. 𝑎(𝑏 + 𝑐) = (𝑎𝑏) + (𝑎𝑐) := 𝑎𝑏 + 𝑎𝑐, (𝑎 + 𝑏)𝑐 = (𝑎𝑐) + (𝑏𝑐) := 𝑎𝑐 + 𝑏𝑐.
In fact, any set having at least two distinct elements, written 0 and 1, on which
operations of addition and multiplication are defined satisfying the above
properties, is called a field. And we need a field for defining a vector space. To make
the matter simple, we take our field as F, which is either R or C.
In what follows, we will write the Cartesian product of F with itself, taken 𝑛 times
as F𝑛 . That is,
F𝑛 := {(𝑎 1, . . . , 𝑎𝑛 ) : 𝑎 1, . . . , 𝑎𝑛 ∈ F}.
An 𝑛-tuple in F𝑛 is also written in two other forms: as the row vector [𝑎 1 · · · 𝑎𝑛 ],
or as the column vector having the entries 𝑎 1, . . . , 𝑎𝑛 written one below the other.
The first is called a row vector, and the second is called a column vector. We also
write a column vector using the transpose notation: the column vector with entries
𝑎 1, . . . , 𝑎𝑛 is written as [𝑎 1 · · · 𝑎𝑛 ] t .
To distinguish the set of row vectors and the set of column vectors, we write

F1×𝑛 := {[𝑎 1 · · · 𝑎𝑛 ] : 𝑎 1, . . . , 𝑎𝑛 ∈ F},


F𝑛×1 := {[𝑎 1 · · · 𝑎𝑛 ] t : 𝑎 1, . . . , 𝑎𝑛 ∈ F}.

In fact, when we are not very specific, we write a row vector and also a column
vector as an 𝑛-tuple. So, both F1×𝑛 and F𝑛×1 are written as F𝑛 .
For 𝑚, 𝑛 ∈ N, we write F𝑚×𝑛 as the set of all 𝑚 × 𝑛 matrices with entries as
numbers from F.
A nonempty set 𝑉 with two operations: + (addition) that associates any two
elements 𝑢, 𝑣 in 𝑉 to a single element 𝑢 + 𝑣 in 𝑉 , and · (scalar multiplication) that
associates a number 𝛼 ∈ F and an element 𝑣 in 𝑉 to an element 𝛼 · 𝑣 in 𝑉 , is said to
be a vector space over F iff it satisfies the following conditions:
(1) For all 𝑥, 𝑦 ∈ 𝑉 , 𝑥 + 𝑦 = 𝑦 + 𝑥 .
(2) For all 𝑥, 𝑦, 𝑧 ∈ 𝑉 , (𝑥 + 𝑦) + 𝑧 = 𝑥 + (𝑦 + 𝑧).
(3) There exists an element 0 ∈ 𝑉 such that 𝑥 + 0 = 𝑥 for all 𝑥 ∈ 𝑉 .

(4) For each 𝑥 ∈ 𝑉 , there exists (−𝑥) ∈ 𝑉 such that 𝑥 + (−𝑥) = 0.


(5) For each 𝑥 ∈ 𝑉 , 1 · 𝑥 = 𝑥 .
(6) For all 𝛼, 𝛽 ∈ F and for all 𝑥, 𝑦 ∈ 𝑉 , (𝛼𝛽) · 𝑥 = 𝛼 · (𝛽 · 𝑥).
(7) For each 𝛼 ∈ F and for all 𝑥, 𝑦 ∈ 𝑉 , 𝛼 · (𝑥 + 𝑦) = (𝛼 · 𝑥) + (𝛼 · 𝑦).
(8) For all 𝛼, 𝛽 ∈ F and for each 𝑥 ∈ 𝑉 , (𝛼 + 𝛽) · 𝑥 = (𝛼 · 𝑥) + (𝛽 · 𝑥).
If the underlying field F is R we say that 𝑉 is a real vector space; and if F = C,
we say that 𝑉 is a complex vector space. Elements of a vector space are called
vectors. We will use 𝑢, 𝑣, 𝑤, 𝑥, 𝑦, 𝑧 with or without subscripts for vectors. Elements
of the underlying field F are called scalars, and they will be denoted by the Roman
letters 𝑎, 𝑏, 𝑐, . . . and also by the Greek letters 𝛼, 𝛽, 𝛾, . . . with or without subscripts.
The symbol 0 will stand for both ‘zero vector’ and ‘zero scalar’; you should know
which one it represents from its use in a specific context. We write 𝛼 · 𝑥 as 𝛼𝑥 .
We accept the usual precedence rule of arithmetic; that is, the expression 𝛼𝑥 + 𝛽𝑦
will be understood as (𝛼 · 𝑥) + (𝛽 · 𝑦). Also, we will shorten 𝑦 + (−𝑥) to 𝑦 − 𝑥, and
(−𝑥) + 𝑦 to −𝑥 + 𝑦.
Notice that the first property of commutativity of addition allows us to write 𝑥 +𝑦
as 𝑦 + 𝑥 whenever it facilitates our understanding. Similarly, the second property
of associativity of addition allows us to put more parentheses or remove some
according to our convenience. Analogous comments hold for other properties.
In mentioning the properties, we have used a shorthand. In the third property,
when we say “there exists an element 0 ∈ 𝑉 such that”, what we mean is “there
exists an element 𝑦 ∈ 𝑉 , which we write as 0, such that”. Similarly, in the fourth
property, “for each 𝑥 ∈ 𝑉 there exists (−𝑥) ∈ 𝑉 such that” means: “for each 𝑥 ∈ 𝑉 ,
there exists an element 𝑦 ∈ 𝑉 , which we denote as −𝑥, such that”.
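Before looking at examples, readers who wish to experiment may spot-check the eight conditions numerically. The Python sketch below (our own illustration; verifying a handful of sample vectors is of course not a proof) encodes each condition for R2 with component-wise operations as an assertion.

# Spot-checking the eight conditions for R^2 with component-wise operations
# on a few sample vectors and scalars. Checking samples only illustrates
# the definition; it does not prove anything.

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(a, x):
    return (a * x[0], a * x[1])

zero = (0.0, 0.0)
vectors = [(1.0, 2.0), (-3.0, 0.5), (0.0, 4.0)]
scalars = [2.0, -1.5, 0.0]

for x in vectors:
    assert add(x, zero) == x                                   # condition (3)
    assert add(x, mul(-1.0, x)) == zero                        # condition (4)
    assert mul(1.0, x) == x                                    # condition (5)
    for y in vectors:
        assert add(x, y) == add(y, x)                          # condition (1)
        for z in vectors:
            assert add(add(x, y), z) == add(x, add(y, z))      # condition (2)
    for a in scalars:
        for b in scalars:
            assert mul(a * b, x) == mul(a, mul(b, x))          # condition (6)
            assert mul(a + b, x) == add(mul(a, x), mul(b, x))  # condition (8)
        for y in vectors:
            assert add(mul(a, x), mul(a, y)) == mul(a, add(x, y))  # condition (7)

print("all eight conditions hold on the sample data")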

(1.1) Example
1. {0} is a vector space over F with 0 + 0 = 0 and 𝛼 · 0 = 0 for each 𝛼 ∈ F.
2. F is a vector space over F with addition and multiplication as in F.
3. R𝑛 , R1×𝑛 and R𝑛×1 are real vector spaces with component-wise addition and
scalar multiplication, for any 𝑛 ∈ N.
4. C𝑛 , C1×𝑛 and C𝑛×1 are complex vector spaces with component-wise addition and
scalar multiplication, for any 𝑛 ∈ N.
5. Consider C with usual addition of complex numbers. For any 𝛼 ∈ R, consider
the scalar multiplication 𝛼𝑥 as the real number 𝛼 multiplied with the complex
number 𝑥 for any 𝑥 ∈ C. Then C is a real vector space. Similarly, C𝑛 , C1×𝑛 and
C𝑛×1 are also real vector spaces.
6. 𝑉 = {(𝑎, 𝑏) ∈ R2 : 𝑏 = 0} is a real vector space under component-wise addition
and scalar multiplication.
7. 𝑉 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 + 𝑏 = 0} is a real vector space under component-wise
addition and scalar multiplication.
8. Let 𝑉 = {(𝑎, 𝑏) ∈ R2 : 3𝑎 + 5𝑏 = 1}. We see that ( 1/3, 0), (0, 1/5) ∈ 𝑉 . But
their sum ( 1/3, 1/5) ∉ 𝑉 . [Also, 3( 1/3, 0) ∉ 𝑉 .] Thus 𝑉 is not a vector space with
component-wise addition and scalar multiplication.
9. F𝑛 [𝑡] := {𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 : 𝑎𝑖 ∈ F} with addition as the usual addition
of polynomials and scalar multiplication as multiplication of a polynomial by
a number, is a vector space over F. Here, F𝑛 [𝑡] contains all polynomials in the
variable 𝑡 of degree less than or equal to 𝑛.
10. F[𝑡] := the set of all polynomials (of all degrees) with coefficients from F is
a vector space over F with + as the addition of two polynomials and · as the
multiplication of a polynomial by a number from F.
11. Let 𝑉 = R2 . For (𝑎, 𝑏), (𝑐, 𝑑) ∈ 𝑉 and 𝛼 ∈ R, define addition as component-wise
addition, and scalar multiplication as in the following:
𝛼 (𝑎, 𝑏) = (0, 0) if 𝛼 = 0; 𝛼 (𝑎, 𝑏) = (𝛼𝑎, 𝑏/𝛼) if 𝛼 ≠ 0.

Then (1 + 1)(0, 1) = 2(0, 1) = (0, 1/2) but 1(0, 1) + 1(0, 1) = (0, 2). Thus 𝑉 is
not a vector space over R.
12. Let 𝑉 = F𝑚×𝑛 . If 𝐴 = [𝑎𝑖 𝑗 ] and 𝐵 = [𝑏𝑖 𝑗 ] are in F𝑚×𝑛 , then define 𝐴+𝐵 = [𝑎𝑖 𝑗 +𝑏𝑖 𝑗 ]
and 𝛼𝐴 = [𝛼𝑎𝑖 𝑗 ]. Then 𝑉 is a vector space over F. The zero vector is the zero matrix
0 and −[𝑎𝑖 𝑗 ] = [−𝑎𝑖 𝑗 ].
13. Let 𝑉 be the set of all functions from a nonempty set 𝑆 to F. Define addition of
two functions and scalar multiplication by

(𝑓 + 𝑔)(𝑥) = 𝑓 (𝑥) + 𝑔(𝑥), (𝛼 𝑓 )(𝑥) = 𝛼 𝑓 (𝑥), for 𝑥 ∈ 𝑆, 𝛼 ∈ F.

Then 𝑉 is a vector space over F. Here, zero vector is the zero map 0 given by
0(𝑥) = 0 for all 𝑥 ∈ 𝑆; and for each 𝑓 ∈ 𝑉 , its additive inverse −𝑓 is the map
given by (−𝑓 )(𝑥) = −𝑓 (𝑥) for 𝑥 ∈ 𝑆.
14. Let 𝑉 be the set of all continuous functions from the closed interval [𝑎, 𝑏] to
R. Define addition and scalar multiplication as in (13). Then 𝑉 is a real vector
space.
15. Let 𝑉 be the set of all two times differentiable functions from R to R such that
𝑓 ′′ + 𝑓 = 0, where 𝑓 ′ denotes the derivative of 𝑓 with respect to the independent
real variable 𝑡 . Define addition and scalar multiplication as in (13). For 𝑓 , 𝑔 ∈ 𝑉

and 𝛼 ∈ R, we see that

(𝑓 + 𝑔) ′′ + (𝑓 + 𝑔) = (𝑓 ′′ + 𝑓 ) + (𝑔 ′′ + 𝑔) = 0
(𝛼 𝑓 ) ′′ + (𝛼 𝑓 ) = 𝛼 (𝑓 ′′ + 𝑓 ) = 0.

Therefore, the operations of addition and scalar multiplication are well-defined


on 𝑉 . It is easy to verify the eight properties. Hence 𝑉 is a real vector space.
16. Let 𝑉 = R∞, the set of all sequences of real numbers. For sequences (𝑎𝑛 ), (𝑏𝑛 ) ∈
𝑉 , define (𝑎𝑛 ) + (𝑏𝑛 ) := (𝑎𝑛 + 𝑏𝑛 ); and for 𝛼 ∈ R, define 𝛼 (𝑎𝑛 ) := (𝛼𝑎𝑛 ). Then 𝑉
is a real vector space.
17. Let 𝑉 = 𝑐 00, the set of all sequences of real numbers having a finite number of
nonzero terms. With the addition and scalar multiplication as in (16), 𝑉 is a real
vector space.

Any vector that behaves like the 0 in the third property, is called a zero vector.
Similarly, any vector that behaves like (−𝑥) for a given vector 𝑥, is called an additive
inverse of 𝑥 . In fact, there cannot be more than one zero vector, and there cannot be
more than one additive inverse of any vector. Along with this we show some other
expected facts.

(1.2) Theorem
In any vector space, the following are true:
(1) Zero vector is unique.
(2) Each vector has a unique additive inverse.
(3) For any vectors 𝑥, 𝑦, 𝑧, and any scalar 𝛼, the following hold:
(a) If 𝑥 + 𝑦 = 𝑥 + 𝑧, then 𝑦 = 𝑧.
(b) 0 · 𝑥 = 0.
(c) 𝛼 · 0 = 0.
(d) (−1) · 𝑥 = −𝑥 .
(e) If 𝛼 · 𝑥 = 0, then 𝛼 = 0 or 𝑥 = 0.

Proof. Let 𝑉 be a vector space over F.


(1) Let 𝜃 1 and 𝜃 2 be zero vectors in 𝑉 . Then 𝜃 2 = 𝜃 2 + 𝜃 1 = 𝜃 1 + 𝜃 2 = 𝜃 1 .
(2) Let 𝑥 ∈ 𝑉 . Suppose 𝑥 1, 𝑥 2 ∈ 𝑉 satisfy 𝑥 + 𝑥 1 = 0 = 𝑥 + 𝑥 2 . Then

𝑥1 = 𝑥1 + 0 = 𝑥1 + 𝑥 + 𝑥2 = 𝑥2 + 𝑥 + 𝑥1 = 𝑥2 + 0 = 𝑥2.

(3) Let 𝑥, 𝑦, 𝑧 ∈ 𝑉 and let 𝛼 ∈ F.


(a) 𝑥 + 𝑦 = 𝑥 + 𝑧 ⇒ −𝑥 + 𝑥 + 𝑦 = −𝑥 + 𝑥 + 𝑧 ⇒ 0 + 𝑦 = 0 + 𝑧 ⇒ 𝑦 = 𝑧.
(b) 0 · 𝑥 + 0 = 0 · 𝑥 = (0 + 0) · 𝑥 = 0 · 𝑥 + 0 · 𝑥 ⇒ 0 · 𝑥 = 0, by (a).
(c) 𝛼 · 0 + 0 = 𝛼 · 0 = 𝛼 · (0 + 0) = 𝛼 · 0 + 𝛼 · 0 ⇒ 𝛼 · 0 = 0, by (a).
(d) 𝑥 + (−1)𝑥 = 1 · 𝑥 + (−1) · 𝑥 = (1 + (−1)) · 𝑥 = 0 · 𝑥 = 0 ⇒ (−1)𝑥 = −𝑥 .
(e) Suppose 𝛼 · 𝑥 = 0 but 𝛼 ≠ 0. Then 𝛼 −1 exists in F. Consequently,

𝑥 = 1 · 𝑥 = 𝛼 −1 · 𝛼 · 𝑥 = 𝛼 −1 · 0 = 0.
As (1.2) shows, we may work with vectors the same way as we work with numbers.
However, we cannot multiply two vectors, since no such operation is available in
a vector space. This is the reason we used 𝛼 −1 in the proof of (1.2-3e) instead of
using 𝑥 −1 . In fact, there is no such vector as 𝑥 −1 .
In what follows, we write 𝑉 as a vector space without mentioning the underlying
field. We assume that the underlying field is F which may be R or C. We consider
F𝑛 to be a vector space over F. In particular, R𝑛 is taken as a real vector space; and
if nothing is specified, we take C𝑛 as a complex vector space.

Exercises for § 1.2


1. In each of the following a nonempty set 𝑉 is given and some operations are
defined. Check whether 𝑉 is a vector space with these operations. Whenever
𝑉 is a vector space, write explicitly the zero vector and the additive inverse of
any vector 𝑣 ∈ 𝑉 .

(a) 𝑉 = {(𝑎, 0) : 𝑎 ∈ R} with + and · as in R2 .


Ans: It is; 0 = (0, 0), −(𝑎, 0) = (−𝑎, 0).
(b) 𝑉 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 + 3𝑏 = 0} with + and · as in R2 . Ans: It is.
(c) 𝑉 = {(𝑎, 𝑏) ∈ R2 : 𝑎 + 𝑏 = 1} with + and · as in R2 . Ans: It is not.
(d) 𝑉 = R2 with + as in R2, and · defined by
0 · (𝑎, 𝑏) = (0, 0), and for 𝛼 ≠ 0, 𝛼 ∈ R, 𝛼 · (𝑎, 𝑏) = (𝑎/𝛼, 𝛼𝑏).
Ans: It is not.
(e) 𝑉 = R2 with + as in R2, and · defined by 𝛼 · (𝑎, 𝑏) = (𝑎, 0) for 𝛼 ∈ R.
Ans: It is not.
(f) 𝑉 = C2 with + and · defined by (𝑎, 𝑏) + (𝑐, 𝑑) = (𝑎 + 2𝑐, 𝑏 + 3𝑑), and
𝛼 · (𝑎, 𝑏) = (𝛼𝑎, 𝛼𝑏) for (𝑎, 𝑏), (𝑐, 𝑑) ∈ 𝑉 and 𝛼 ∈ C. Ans: It is not.
(g) 𝑉 = R+, the set of all positive real numbers, with addition ⊕ and scalar
multiplication ⊙ defined by 𝑥 ⊕ 𝑦 = 𝑥𝑦, and 𝛼 ⊙ 𝑥 = 𝑥^𝛼 for 𝑥, 𝑦 ∈ 𝑉 and
𝛼 ∈ R. Ans: It is.
(h) 𝑉 = R+ ∪ {0} with addition ⊕ and scalar multiplication ⊙ defined by
𝑥 ⊕ 𝑦 = 𝑥𝑦, and 𝛼 ⊙ 𝑥 = |𝛼 |𝑥 for 𝑥, 𝑦 ∈ 𝑉 and 𝛼 ∈ R. Ans: It is not.

(i) 𝑉 = R × R+, with addition ⊕ as (𝑎, 𝑏) ⊕ (𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏𝑑) and scalar


multiplication defined by 𝛼 (𝑎, 𝑏) = (𝛼𝑎, 𝑏 𝛼 ) for 𝛼 ∈ R.
Ans: It is not.
(j) 𝑉 = Q, the set of all rational numbers, with + and · as in R.
Ans: It is not.
(k) (R \ Q) ∪ {0, 1} with + and · as in R. Ans: It is not.
(l) 𝑉 = R with addition ⊕ as 𝑎 ⊕ 𝑏 = 𝑎 + 𝑏 − 2 and the scalar multiplication ⊙
defined by 𝛼 ⊙ 𝑏 = 𝛼𝑏 + 2(𝛼 − 1) for 𝛼 ∈ R. Ans: It is not.
(m) 𝑉 = R𝑛 with addition ⊕ as 𝑢 ⊕ 𝑣 = 𝑢 + 𝑣 − 𝑤 and scalar multiplication ⊙
as 𝛼 ⊙ 𝑢 = 𝛼 (𝑢 − 𝑤) + 𝑤, where 𝑤 is a fixed given vector in 𝑉 , and
𝛼 ∈ R, 𝑢, 𝑣 ∈ 𝑉 . Ans: It is.
(n) 𝑉 = {(𝑎, 𝑏) ∈ R2 : 𝑎 + 𝑏 = 1} with addition ⊕ and scalar multiplication ⊙
as (𝑎, 𝑏) ⊕ (𝑐, 𝑑) = (𝑎 + 𝑐 − 1, 𝑏 + 𝑑), and 𝛼 ⊙ (𝑎, 𝑏) = (𝛼𝑎 − 𝛼 + 1, 𝛼𝑏).
Ans: It is.
2. Is the set of all polynomials of degree 5 with usual addition and scalar
multiplication of polynomials a vector space? Ans: No.
3. Is the set of all purely imaginary numbers a real vector space with usual
addition and scalar multiplication? Ans: No.
4. Is the set of all 𝑛 × 𝑛 hermitian matrices a real vector space? Ans: Yes.
5. Is the set of all 𝑛 × 𝑛 hermitian matrices a complex vector space? Ans: No.
6. Let 𝑉 be a vector space over F. Show the following:
(a) For all 𝑥, 𝑦, 𝑧 ∈ 𝑉 , 𝑥 + 𝑦 = 𝑧 + 𝑦 implies 𝑥 = 𝑧.
(b) Let 𝛼, 𝛽 ∈ F, 𝑥 ∈ 𝑉 , 𝑥 ≠ 0. Then, 𝛼𝑥 ≠ 𝛽𝑥 iff 𝛼 ≠ 𝛽.
(c) If 𝑉 has two vectors, then 𝑉 has an infinite number of vectors.

1.3 Subspaces
Consider the following two nonempty subsets of R2 :

𝑈 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 + 𝑏 = 0}, 𝑊 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 + 𝑏 = 1}.

We have seen that 𝑈 is a vector space with the same operations of addition and scalar
multiplication as in R2 . Of course, the operations are well defined on 𝑈 . That is,
whenever 𝑥, 𝑦 ∈ 𝑈 and 𝛼 ∈ F, we have 𝑥 + 𝑦 ∈ 𝑈 and 𝛼𝑥 ∈ 𝑈 . But 𝑊 is not a vector
space with the same operations. In fact, the sum of two vectors from 𝑊 does not
necessarily result in a vector from 𝑊 . For instance, (0, 1) ∈ 𝑊 and (1, −1) ∈ 𝑊 but
(0, 1) + (1, −1) = (1, 0) ∉ 𝑊 . Similarly, multiplying a scalar with a vector from 𝑊
may not result in a vector from 𝑊 . We would like to separate out the first interesting
case of 𝑈 .
Let 𝑉 be a vector space. A subset 𝑈 of 𝑉 is called a subspace of 𝑉 iff the
following conditions are satisfied:
(1) 𝑈 ≠ ∅.
(2) For all 𝑥, 𝑦 ∈ 𝑈 , 𝑥 + 𝑦 ∈ 𝑈 .
(3) For each 𝑥 ∈ 𝑈 and for each 𝛼 ∈ F, 𝛼𝑥 ∈ 𝑈 .
A subspace of 𝑉 which is not equal to 𝑉 is called a proper subspace of 𝑉 .
Notice that the second and the third conditions together, in the definition of a
subspace, are equivalent to the following single condition:

For all 𝑥, 𝑦 ∈ 𝑈 and for each 𝛼 ∈ F, 𝑥 + 𝛼𝑦 ∈ 𝑈 .
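The single condition is also convenient for checking examples by machine. A small Python sketch (our own illustration; the helper names are ours) tests it on sample vectors for the set 𝑈 from the beginning of this section, and shows how it fails for 𝑊 .

# U = {(a,b) : 2a + b = 0} satisfies x + alpha*y in U on the samples below;
# W = {(a,b) : 2a + b = 1} does not. (A sample check is not a proof.)

def in_U(v):
    return 2 * v[0] + v[1] == 0

def in_W(v):
    return 2 * v[0] + v[1] == 1

def combo(x, alpha, y):
    return (x[0] + alpha * y[0], x[1] + alpha * y[1])

u1, u2 = (1, -2), (-3, 6)          # both satisfy 2a + b = 0
assert in_U(u1) and in_U(u2)
assert in_U(combo(u1, 5, u2))      # closed under x + alpha*y

w1, w2 = (0, 1), (1, -1)           # both satisfy 2a + b = 1
assert in_W(w1) and in_W(w2)
assert not in_W(combo(w1, 1, w2))  # (1, 0) is not in W; W is not a subspace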

(1.3) Example

1. If 𝑉 is a vector space, then {0} is a subspace of 𝑉 . It is called the zero subspace


of 𝑉 .
2. Any vector space 𝑉 is a subspace of itself.
3. The 𝑥-axis, that is, {(𝑎, 0) : 𝑎 ∈ R} is a subspace of R2 . We identify the 𝑥-axis
with R and say that R is a proper subspace of R2 .
4. Let 𝑚 < 𝑛. Then {(𝑎 1, . . . , 𝑎𝑚 , 0, . . . , 0) ∈ F𝑛 } is a subspace of F𝑛 . We identify
this subspace as F𝑚 , and say that F𝑚 is a proper subspace of F𝑛 .
5. All straight lines in R2 passing through the origin are proper subspaces of R2 .
6. 𝑊 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 − 3𝑏 + 2𝑐 = 0} is a proper subspace of R3 .
7. All planes and all straight lines in R3 passing through the origin are proper
subspaces of R3 .
8. F𝑚 [𝑡] is a proper subspace of F𝑛 [𝑡] for 𝑚 < 𝑛.
9. 𝑊 = {𝐴 ∈ F𝑛×𝑛 : tr(𝐴) = 0} is a subspace of F𝑛×𝑛 .
10. The set of all upper triangular 𝑛 × 𝑛 matrices is a subspace of F𝑛×𝑛 .
11. 𝐶 [𝑎, 𝑏] := the set of all continuous functions from [𝑎, 𝑏] to R, is a proper sub-
space of the set of all functions from the closed interval [𝑎, 𝑏] to R, with usual
operations of + and · ; see (1.1-13).
12. Interpret each polynomial 𝑝 (𝑡) ∈ R𝑛 [𝑡] as a function, where 𝑡 ∈ [𝑎, 𝑏]. Then
R𝑛 [𝑡] is a proper subspace of 𝐶 [𝑎, 𝑏].
13. R [𝑎, 𝑏] := the set of all Riemann integrable functions from [𝑎, 𝑏] to R is a real
vector space. And the vector space 𝐶 [𝑎, 𝑏] is a proper subspace of R [𝑎, 𝑏].

14. 𝐶 𝑘 [𝑎, 𝑏] := the set of all 𝑘 times continuously differentiable functions from
[𝑎, 𝑏] to R is a proper subspace of 𝐶 [𝑎, 𝑏].
15. Let 𝑉 = 𝐶 [−1, 1]. Let 𝑈 = {𝑓 ∈ 𝑉 : 𝑓 is an odd function }. As a convention,
the zero function is taken as an odd function. So, 𝑈 ≠ ∅. If 𝑓 , 𝑔 ∈ 𝑈 and 𝛼 ∈ R,
then (𝑓 + 𝛼𝑔)(−𝑥) = 𝑓 (−𝑥) + 𝛼𝑔(−𝑥) = −𝑓 (𝑥) + 𝛼 (−𝑔(𝑥)) = −(𝑓 + 𝛼𝑔)(𝑥).
So, 𝑓 + 𝛼𝑔 ∈ 𝑈 . Therefore, 𝑈 is a subspace of 𝑉 .

In fact, a subspace is a vector space on its own right, with the operations inherited
from the parent vector space.

(1.4) Theorem
Let 𝑉 be a vector space over F with + as the addition and · as the scalar multiplica-
tion. Let 𝑈 be a subspace of 𝑉 . Then 𝑈 is a vector space over F with the addition
as the restriction of + to 𝑈 , and scalar multiplication as the restriction of · to 𝑈 .

Proof. Since 𝑈 is a subspace of 𝑉 , the restriction of + and · to 𝑈 are well de-


fined operations on 𝑈 . The commutativity and associativity of addition, distributive
properties, and scalar multiplication with 1 are satisfied in 𝑉 ; and hence, they are
true in 𝑈 too. Therefore, we only need to verify the existence of the zero vector and
the additive inverse in 𝑈 .
Since 𝑈 ≠ ∅, there exists an 𝑥 ∈ 𝑈 . In 𝑉 , we know that −𝑥 = (−1)𝑥 . Since
(−1)𝑥 ∈ 𝑈 , we see that −𝑥 ∈ 𝑈 . Now, since both 𝑥 and −𝑥 ∈ 𝑈 , we have 𝑥 + (−𝑥) =
0 ∈ 𝑈 . Moreover, this 0 serves as the additive identity in 𝑈 ; and this −𝑥 serves as
the additive inverse of 𝑥 in 𝑈 .
Therefore, 𝑈 is a vector space with the restricted operations.
The proof of (1.4) reveals that the zero vector of a subspace is the same zero
vector of the parent vector space. And the additive inverse of a vector 𝑥 in 𝑈 is the
same additive inverse −𝑥 in 𝑉 .
Verify that all subspaces given in (1.3) are vector spaces on their own right.
Given two subspaces, can we use set operations for obtaining new subspaces?

(1.5) Theorem
The intersection of two subspaces of a vector space is also a subspace.

Proof. Let 𝑈 and 𝑊 be subspaces of a vector space 𝑉 . Since 0 ∈ 𝑈 and 0 ∈ 𝑊 ,


𝑈 ∩ 𝑊 ≠ ∅.
Let 𝑥, 𝑦 ∈ 𝑈 ∩ 𝑊 and let 𝛼 ∈ F. Then 𝑥 + 𝛼𝑦 ∈ 𝑈 and 𝑥 + 𝛼𝑦 ∈ 𝑊 , since they are
subspaces. So, 𝑥 + 𝛼𝑦 ∈ 𝑈 ∩ 𝑊 . Therefore, 𝑈 ∩ 𝑊 is a subspace.
For illustration, consider two distinct planes passing through the origin in R3 .
Their intersection is a straight line passing through the origin; it is a subspace of
R3 .
However, union of two subspaces need not be a subspace. For example, consider
the 𝑥-axis 𝑋 = {(𝑎, 0) : 𝑎 ∈ R} and the 𝑦-axis 𝑌 = {(0, 𝑏) : 𝑏 ∈ R}. These are
subspaces of R2 . But their union is not a subspace of R2 . Reason?

(1, 0), (0, 1) ∈ 𝑋 ∪ 𝑌, but (1, 0) + (0, 1) = (1, 1) ∉ 𝑋 ∪ 𝑌 .

(1.6) Theorem
Union of two subspaces is a subspace iff one of them is a subset of the other.

Proof. Let 𝑈 and 𝑊 be subspaces of a vector space 𝑉 .


Suppose 𝑈 ⊆ 𝑊 or 𝑊 ⊆ 𝑈 . If 𝑈 ⊆ 𝑊 , then 𝑈 ∪ 𝑊 = 𝑊 . If 𝑊 ⊆ 𝑈 , then
𝑈 ∪ 𝑊 = 𝑈 . In either case, 𝑈 ∪ 𝑊 is a subspace of 𝑉 .
Conversely, suppose that 𝑈 ⊈ 𝑊 and 𝑊 ⊈ 𝑈 . Then there exist vectors 𝑥 and 𝑦
such that 𝑥 ∈ 𝑈 , 𝑥 ∉ 𝑊 , and 𝑦 ∈ 𝑊 , 𝑦 ∉ 𝑈 . Now, both 𝑥, 𝑦 ∈ 𝑈 ∪𝑊 . Where is 𝑥 +𝑦?
If 𝑥 + 𝑦 ∈ 𝑈 , then 𝑦 = (𝑥 + 𝑦) − 𝑥 ∈ 𝑈 , which is wrong.
If 𝑥 + 𝑦 ∈ 𝑊 , then 𝑥 = (𝑥 + 𝑦) − 𝑦 ∈ 𝑊 ; which is also wrong.
Therefore 𝑥 + 𝑦 ∉ 𝑈 ∪ 𝑊 . Consequently, 𝑈 ∪ 𝑊 is not a subspace of 𝑉 .

Exercises for § 1.3


1. In each of the following, a vector space 𝑉 and a subset 𝑈 of 𝑉 are given.
Check whether 𝑈 is a subspace of 𝑉 .
(a) 𝑉 = R2 , 𝑈 = {(𝑎, 𝑏) ∈ 𝑉 : 𝑏 = 2𝑎 − 𝛼 } for some 𝛼 ≠ 0. Ans: It is not
(b) 𝑉 = R3 , 𝑈 = {(𝑎, 𝑏, 𝑐) ∈ 𝑉 : 2𝑎 − 𝑏 − 𝑐 = 0}. Ans: It is.
(c) 𝑉 = F3 [𝑡], 𝑈 = {𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ∈ 𝑉 : 𝑎 = 0}. Ans: It is.
(d) 𝑉 = C3 [𝑡], 𝑈 = {𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ∈ 𝑉 : 𝑎 + 𝑏 + 𝑐 + 𝑑 = 0}. Ans: It is.
(e) 𝑉 = C3 [𝑡], 𝑈 = {𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ∈ 𝑉 : 𝑎, 𝑏, 𝑐, 𝑑 integers}.
Ans: It is not.
(f) 𝑉 = 𝐶 [−1, 1], 𝑈 = {𝑓 ∈ 𝑉 : 𝑓 is an even function}. Ans: It is.
(g) 𝑉 = 𝐶 [0, 1], 𝑈 = {𝑓 ∈ 𝑉 : 𝑓 (𝑡) ≥ 0 for all 𝑡 ∈ [0, 1]}. Ans: It is not.
(h) 𝑉 = R𝑛×𝑛 , 𝑈 = {𝐴 ∈ 𝑉 : 𝐴t = 𝐴}. Ans: It is.
(i) 𝑉 = C𝑛×𝑛 , 𝑈 = {𝐴 ∈ 𝑉 : 𝐴∗ = 𝐴}. Ans: It is not.
2. Describe all subspaces of R2, and of R3 .
3. Let 𝑈𝑏 = {(𝑎 1, 𝑎 2, . . . , 𝑎𝑛 ) : 𝑎 1, 𝑎 2, . . . , 𝑎𝑛 ∈ R, 𝑎 1 + 2𝑎 2 + · · · + 𝑛𝑎𝑛 = 𝑏}. For
which real numbers 𝑏, is 𝑈𝑏 a subspace of R𝑛 ? Ans: Only for 𝑏 = 0.
4. Let 𝑈 be a subspace of 𝑉 and let 𝑉 be a subspace of 𝑊 . Is 𝑈 a subspace of
𝑊? Ans: Yes.

5. Why is the set of all polynomials of degree 𝑛 not a vector space, with usual
addition and scalar multiplication of polynomials?
Ans: Not closed under addition (scalar multiplication).
6. Is the set of all skew-symmetric 𝑛 × 𝑛 matrices a subspace of R𝑛×𝑛 ?
Ans: Yes.
7. Determine whether the following are real vector spaces:
(a) R∞ := the set of all sequences of real numbers.
(b) ℓ ∞ := the set of all bounded sequences of real numbers.
(c) ℓ 1 := the set of all absolutely convergent sequences of real numbers.
Ans: (a)-(c): Yes.
8. Let 𝑆 be a nonempty set and let 𝑠 ∈ 𝑆. Let 𝑉 be the set of all functions
𝑓 : 𝑆 → R with 𝑓 (𝑠) = 0. Is 𝑉 a vector space over R with the usual addition
and scalar multiplication of functions?
Ans: It is a subspace of the space of all functions from 𝑆 to R.
9. Show that the set 𝐵(𝑆) of all bounded functions from a nonempty set 𝑆 to R
is a real vector space.

1.4 Span
We have seen that the union of two subspaces may fail to be a subspace since the
union need not be closed under the operations. It is quite possible that we enlarge
the union so that it becomes a subspace. Of course, a trivial enlargement of the
union is the whole vector space. A better option would be to enlarge the union in
a minimal way; that is by including only those vectors that are required to obtain a
subspace.
To see the requirement in a general way, let 𝑆 be a nonempty subset of a vector
space 𝑉 . Suppose 𝑆 is not a subspace of 𝑉 . Let 𝑢 ∈ 𝑆. In any minimal enlargement
of 𝑆, all vectors of the form 𝛼𝑢 must be present. Similarly, if 𝑣 ∈ 𝑆, then all vectors
of the form 𝛽𝑣 must be present in the enlargement. Then the vectors of the form
𝛼𝑢 + 𝛽𝑣 must also be in the enlargement. In general, if 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑆, then in this
enlargement, we must have all vectors of the form

𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣 𝑛 for 𝛼 1, . . . , 𝛼𝑛 ∈ F.

Moreover, trivially, if 𝑆 = ∅, then the minimal way we can enlarge it to a subspace


is the zero space {0}.
Let 𝑉 be a vector space. Let 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑉 . A vector 𝑣 ∈ 𝑉 is said to be a linear
combination of 𝑣 1, . . . , 𝑣𝑛 iff 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 for some scalars 𝛼 1, . . . , 𝛼𝑛 ∈ F.
Let 𝑆 be a nonempty subset of 𝑉 . Then the set of all linear combinations of
elements of 𝑆 is called the span of 𝑆, and is denoted by span (𝑆). Further, span of
the empty set is taken to be {0}.

(1.7) Example

1. span (∅) = span ({0}) = {0}.


2. C = span {1, 𝑖} with scalars from R.
3. C = span {1} = span {1, 𝑖} with scalars from C.
4. Let 𝑒 1 = (1, 0) and let 𝑒 2 = (0, 1). Then F2 = span {𝑒 1, 𝑒 2 }, where scalars are
from F.
5. Let 𝑒𝑖 denote the vector in F𝑛 having 1 at the 𝑖th place and 0 elsewhere. Then
F𝑛 = span {𝑒 1, . . . , 𝑒𝑛 }, where scalars are from F.
6. In R3, span {(1, 2, 3)} = {𝑎(1, 2, 3) : 𝑎 ∈ R}. It is the straight line passing through
the origin and the point (1, 2, 3).
7. In R3, span {(1, 2, 3), (3, 2, 1)} = {(𝑎 + 3𝑏, 2𝑎 + 2𝑏, 3𝑎 + 𝑏) : 𝑎, 𝑏 ∈ R}. It is the
plane passing through the points (0, 0, 0), (1, 2, 3) and (3, 2, 1).
8. F3 [𝑡] = span {1, 𝑡, 𝑡 2, 𝑡 3 } with scalars from F.
9. F[𝑡] = span {1, 𝑡, 𝑡 2, . . .} with scalars from F.
       
10. F2×2 is the span of the four matrices
    [1 0]   [0 1]   [0 0]   [0 0]
    [0 0] , [0 0] , [1 0] , [0 1] .

Notice that a linear combination is always a finite sum. If 𝑆 is a finite set, say
𝑆 = {𝑣 1, . . . , 𝑣𝑚 }, then span (𝑆) = {𝛼 1𝑣 1 + · · · + 𝛼𝑚 𝑣𝑚 : 𝛼 1, . . . , 𝛼𝑚 ∈ F}.
In general if 𝑆 is a nonempty set, then we can write its span as

span (𝑆) = {𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 : 𝛼 1, . . . , 𝛼𝑛 ∈ F, 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑆 for some 𝑛 ∈ N}.

Caution: The set 𝑆 can have infinitely many elements, but a linear combination is
a sum of finitely many elements from 𝑆, multiplied with some scalars.
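In R𝑛 , membership in the span of finitely many vectors can be decided numerically: 𝑣 ∈ span {𝑣 1, . . . , 𝑣𝑘 } exactly when appending 𝑣 as an extra column does not increase the rank of the matrix whose columns are 𝑣 1, . . . , 𝑣𝑘 . A Python sketch using NumPy (our own illustration; the helper name in_span is ours):

import numpy as np

def in_span(v, vectors):
    # v is in the span iff adding it as a column does not raise the rank.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(np.column_stack([A, v])) == np.linalg.matrix_rank(A)

v1, v2 = np.array([1., 2., 3.]), np.array([3., 2., 1.])
print(in_span(np.array([4., 4., 4.]), [v1, v2]))   # True:  (4,4,4) = v1 + v2
print(in_span(np.array([1., 1., 0.]), [v1, v2]))   # False: not on the plane of (1.7-7)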
We show that the notion of ‘span’ serves its purpose in enlarging a subset to a
minimal subspace.

(1.8) Theorem
The span of a subset is the minimal subspace that contains the subset.

Proof. Let 𝑆 be a subset of a vector space 𝑉 over F. If 𝑆 = ∅, then span (𝑆) = {0};
which is clearly the minimal subspace containing ∅. So, let 𝑆 ≠ ∅. Clearly, 𝑆 ⊆

span (𝑆). Let 𝑥, 𝑦 ∈ span (𝑆). Then 𝑥 = 𝑎 1𝑥 1 + · · · + 𝑎𝑛 𝑥𝑛 and 𝑦 = 𝑏 1𝑦1 + · · · + 𝑏𝑚𝑦𝑚


for some 𝑎𝑖 , 𝑏 𝑗 ∈ F, 𝑥𝑖 , 𝑦 𝑗 ∈ 𝑆, and 𝑛, 𝑚 ∈ N. Let 𝛼 ∈ F. Now,

𝑥 + 𝛼𝑦 = 𝑎 1𝑥 1 + · · · + 𝑎𝑛 𝑥𝑛 + 𝛼𝑏 1𝑦1 + · · · + 𝛼𝑏𝑚𝑦𝑚 ∈ span (𝑆).

Hence span (𝑆) is a subspace containing 𝑆.


If 𝑈 is a subspace containing 𝑆, then 𝑈 contains all linear combinations of vectors
from 𝑈 ; in particular, from 𝑆. That is, 𝑈 contains span (𝑆). So, span (𝑆) is a minimal
subspace containing 𝑆.
Further, if 𝑆 ′ is also a minimal subspace containing 𝑆, then 𝑆 ′ ⊆ span (𝑆). Now that
𝑆 ′ is a subspace that contains 𝑆, we also have span (𝑆) ⊆ 𝑆 ′ . That is, 𝑆 ′ = span (𝑆).
Therefore, span (𝑆) is the minimal subspace containing 𝑆.
In fact, span (𝑆) is the intersection of all subspaces of 𝑉 that contain 𝑆.
Notice that if 𝑈 and 𝑊 are subspaces of a vector space 𝑉 , then any linear com-
bination of vectors from 𝑈 is a vector in 𝑈 . Similarly, any linear combination of
vectors from 𝑊 is also a vector in 𝑊 . We guess that any vector in span (𝑈 ∪ 𝑊 ) is
a sum of vectors from 𝑈 and 𝑊 .
Let 𝑆 1 and 𝑆 2 be nonempty subsets of a vector space 𝑉 . The Sum of these subsets
is defined as
𝑆 1 + 𝑆 2 := {𝑥 + 𝑦 : 𝑥 ∈ 𝑆 1, 𝑦 ∈ 𝑆 2 }.
For example, consider the sum of the 𝑥-axis and the 𝑦-axis in R2 . We see that
each vector (𝑎, 𝑏) ∈ R2 can be written as (𝑎, 𝑏) = (𝑎, 0) + (0, 𝑏). Hence the sum of
the 𝑥-axis and the 𝑦-axis is the whole of R2 .
In R3, the sum of 𝑥-axis and the 𝑦-axis is the 𝑥𝑦-plane.
We show that the sum of two subspaces is the minimal enlargement of their union
so that the enlarged set is a subspace.

(1.9) Theorem
The sum of two subspaces of a vector space is equal to the span of their union.

Proof. Let 𝑈 and 𝑊 be subspaces of a vector space 𝑉 . Since 𝑈 ⊆ 𝑈 + 𝑊 ,


𝑈 + 𝑊 ≠ ∅. Let 𝑧 1, 𝑧 2 ∈ 𝑈 + 𝑊 . There exist vectors 𝑥 1, 𝑥 2 ∈ 𝑈 and 𝑦1, 𝑦2 ∈ 𝑊 such
that 𝑧 1 = 𝑥 1 + 𝑦1 and 𝑧 2 = 𝑥 2 + 𝑦2 . Let 𝛼 ∈ F. Then

𝑧 1 + 𝛼𝑧 2 = 𝑥 1 + 𝑦1 + 𝛼 (𝑥 2 + 𝑦2 ) = (𝑥 1 + 𝛼𝑥 2 ) + (𝑦1 + 𝛼𝑦2 ) ∈ 𝑈 + 𝑊 .
Hence 𝑈 + 𝑊 is a subspace of 𝑉 that contains 𝑈 ∪ 𝑊 . Since span (𝑈 ∪ 𝑊 ) is the
minimal subspace of 𝑉 that contains 𝑈 ∪ 𝑊 , we have span (𝑈 ∪ 𝑊 ) ⊆ 𝑈 + 𝑊 . On
the other hand, 𝑈 + 𝑊 ⊆ span (𝑈 ∪ 𝑊 ). So, 𝑈 + 𝑊 = span (𝑈 ∪ 𝑊 ).
Though a vector space is very large, there can be a small subset whose span is the
vector space. For example, R2 = span {(1, 0), (0, 1)}.
A subset 𝑆 of a vector space 𝑉 is said to span 𝑉 iff span (𝑆) = 𝑉 . In this case, we
also say that 𝑆 is a spanning set of 𝑉 , and 𝑉 is spanned by 𝑆.

(1.10) Example
1. The subset {(1, 0), (0, 1)} of R2 spans R2 .
2. The subset {(1, 2), (2, 1), (2, 2)} of R2 spans R2 .
3. The subset {𝑒 1, . . . , 𝑒𝑛 } of F𝑛 is a spanning set of F𝑛 .
4. F𝑛 [𝑡] is spanned by {1, 𝑡, . . . , 𝑡 𝑛 }.
5. The subset {(1, 2)} of R2 spans the vector space {𝛼 (1, 2) : 𝛼 ∈ R}. Here, the
vector space is the straight line that passes through the origin and the point (1, 2);
it is a proper subspace of R2 . For instance, (1, 1) ∉ span {(1, 2)}. We see that
{(1, 2)} does not span R2 .
6. span {(1, 1, 1), (0, 1, 1), (1, −1, −1), (1, 3, 3)} = span {(1, 1, 1), (0, 1, 1)}. Why?

𝑎(1, 1, 1) + 𝑏 (0, 1, 1) + 𝑐 (1, −1, −1) + 𝑑 (1, 3, 3)


= (𝑎 + 𝑐 + 𝑑)(1, 1, 1) + (𝑏 − 2𝑐 + 2𝑑)(0, 1, 1).

The span of the given vectors is the plane in R3 that contains the points (0, 0, 0),
(1, 1, 1) and (0, 1, 1).

If 𝑆 is a spanning set of a vector space 𝑉 , then any superset of 𝑆 is also a spanning
set of 𝑉 . However, a subset of 𝑆 may or may not be a spanning set. In (1.10-6),
the plane is spanned by {(1, 1, 1), (0, 1, 1)}; but it is not spanned by {(1, 1, 1)}.

Exercises for § 1.4


1. Do the polynomials 𝑡 3 − 2𝑡 2 + 1, 4𝑡 2 − 𝑡 + 3, and 3𝑡 − 2 span F3 [𝑡]?
Ans: No.
2. What is span {𝑡 𝑛 : 𝑛 = 0, 2, 4, 6, . . .} in R[𝑡]?
Ans: {𝑎 0 + 𝑎 1𝑡 + 𝑎 2𝑡 2 + · · · + 𝑎𝑛 𝑡 𝑛 : 𝑎 2𝑖−1 = 0}.
3. In F3, find span {𝑒 1 + 𝑒 2, 𝑒 2 + 𝑒 3, 𝑒 3 + 𝑒 1 }, where 𝑒 1 = (1, 0, 0), 𝑒 2 = (0, 1, 0) and
𝑒 3 = (0, 0, 1). Ans: F3 .
4. Let 𝑢, 𝑣 1, 𝑣 2, . . . , 𝑣𝑛 be 𝑛 + 1 distinct vectors in a vector space 𝑉 . Take 𝑆 1 =
{𝑣 1, 𝑣 2, . . . , 𝑣𝑛 } and 𝑆 2 = {𝑢, 𝑣 1, 𝑣 2, . . . , 𝑣𝑛 }. Prove that 𝑢 ∈ span (𝑆 1 ) iff
span (𝑆 1 ) = span (𝑆 2 ).
5. Let 𝑆 be a subset of a vector space𝑉 . Show that 𝑆 is a subspace iff 𝑆 = span (𝑆).
6. Let 𝑢 1 (𝑡) = 1, and for 𝑛 = 2, 3, . . . , let 𝑢𝑛 (𝑡) = 1 + 𝑡 + . . . + 𝑡 𝑛−1 . Show that
{𝑢 1, . . . , 𝑢𝑛 } spans F𝑛−1 [𝑡]. Is it true that {𝑢 1, 𝑢 2, . . .} spans F[𝑡]? Ans: Yes.

7. Let 𝑆 be a subset of a vector space 𝑉 . Prove that span (𝑆) is the intersection
of all subspaces that contain 𝑆.
8. We know that 𝑒 𝑡 = 1 + 𝑡 + (1/2!) 𝑡 2 + · · · for each 𝑡 ∈ R. Does it imply that
𝑒 𝑡 ∈ span {1, 𝑡, 𝑡 2, . . .}? Ans: No.
9. Show that every vector space has at least two spanning sets.
10. Let 𝑉 be the real vector space of all functions from {1, 2} to R. Construct a
spanning set of 𝑉 with two elements.
11. Find a finite spanning set for the real vector space of all real symmetric
matrices of order 𝑛.
12. Let 𝐴 and 𝐵 be subsets of a vector space 𝑉 . Prove (a)-(c) and give counter
examples for (d)-(f):
(a) span (span (𝐴)) = span (𝐴).
(b) If 𝐴 ⊆ 𝐵, then span (𝐴) ⊆ span (𝐵).
(c) span (𝐴 ∩ 𝐵) ⊆ span (𝐴) ∩ span (𝐵).
(d) span (𝐴) ∩ span (𝐵) ⊆ span (𝐴 ∩ 𝐵).
(e) span (𝐴) \ span (𝐵) ⊆ span (𝐴 \ 𝐵).
(f) span (𝐴 \ 𝐵) ⊆ (span (𝐴) \ span (𝐵)) ∪ {0}.
13. Give suitable real vector spaces 𝑈 , 𝑉 ,𝑊 so that 𝑈 + 𝑉 = 𝑈 + 𝑊 but 𝑉 ≠ 𝑊 .
14. Let 𝑈 and 𝑊 be subspaces of a vector space such that 𝑈 ∩ 𝑊 = {0}. Prove
that if 𝑥 ∈ 𝑈 + 𝑊 , then there exist unique 𝑢 ∈ 𝑈 , 𝑤 ∈ 𝑊 such that 𝑥 = 𝑢 + 𝑤 .
15. Let 𝑈 , 𝑉 , and 𝑊 be subspaces of a vector space 𝑋 .
(a) Prove that (𝑈 ∩ 𝑉 ) + (𝑈 ∩ 𝑊 ) ⊆ 𝑈 ∩ (𝑉 + 𝑊 ).
(b) Give suitable 𝑈 , 𝑉 ,𝑊 , 𝑋 so that 𝑈 ∩ (𝑉 + 𝑊 ) ⊈ (𝑈 ∩ 𝑉 ) + (𝑈 ∩ 𝑊 ).
(c) Prove that 𝑈 + (𝑉 ∩ 𝑊 ) ⊆ (𝑈 + 𝑉 ) ∩ (𝑈 + 𝑊 ).
(d) Give suitable 𝑈 , 𝑉 ,𝑊 , 𝑋 so that (𝑈 + 𝑉 ) ∩ (𝑈 + 𝑊 ) ⊈ 𝑈 + (𝑉 ∩ 𝑊 ).
16. Let 𝑒𝑖 be the sequence (0, 0, . . . , 0, 1, 0, 0, . . .) where the 𝑖th term is 1 and the
rest are all 0. What is span ({𝑒 1, 𝑒 2, . . .})?
Ans: 𝑐 00, the set of all sequences each having finitely many nonzero terms.

1.5 Linear independence


The vector space 𝑉 spans itself, though our interest is in finding smaller subsets of
𝑉 that would span 𝑉 . We would like to have a spanning set none of whose proper
subsets is a spanning set. In that case, no vector in such a set is in the span of the
rest.
Let 𝑆 be a subset of a vector space 𝑉 . 𝑆 is said to be linearly dependent iff there
exists a vector 𝑣 ∈ 𝑆 such that 𝑣 ∈ span (𝑆 \ {𝑣 }).
A list of vectors 𝑣 1, . . . , 𝑣𝑛 is said to be linearly dependent iff either a vector is
repeated in the list or the set {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent, or both.
A set or a list is called linearly independent iff it is not linearly dependent.
When a list of vectors 𝑣 1, . . . , 𝑣𝑛 is linearly dependent (or independent), we say
that the vectors 𝑣 1, . . . , 𝑣𝑛 are linearly dependent (or independent).
Observe that ∅ is linearly independent, and {0} is linearly dependent. Moreover,
a set of at least two vectors is linearly dependent iff one of the vectors in the set is a
linear combination of others.

(1.11) Example

1. The set {(1, 0), (0, 1)} is linearly independent in R2 .


2. The set {(1, 1), (2, 1), (2, 3)} is linearly dependent in R2 .
3. The vectors (1, 1), (2, 1), (2, 3) are linearly dependent in R2 .
4. The vectors 1, 𝑖 in the real vector space C are linearly independent.
5. The vectors 1, 𝑖 in the complex vector space C are linearly dependent.
6. The set {1, 1 + 𝑡, 𝑡 2 − 𝑡 3, 2𝑡 + 𝑡 2 − 𝑡 3, 1 + 𝑡 + 𝑡 2 + 𝑡 3 } is linearly dependent in F3 [𝑡].
Reason: 2𝑡 + 𝑡 2 − 𝑡 3 = −2(1) + 2(1 + 𝑡) + (𝑡 2 − 𝑡 3 ).
7. The set of functions {𝑡, cos 𝑡 } is linearly independent in 𝐶 [0, 𝜋/2]. For, if cos 𝑡 =
𝛼𝑡, then at 𝑡 = 0, we have 1 = 𝛼 × 0 = 0, which is impossible. On the other
hand, if 𝑡 = 𝛽 cos 𝑡, then at 𝑡 = 𝜋/2, we obtain 𝜋/2 = 𝛽 cos(𝜋/2) = 0, which is again
impossible.
   
8. The set of the two matrices
    [1 0]   [0 1]
    [0 1] , [1 0]
    is linearly independent in F2×2 .

Given a set of vectors, how do we determine whether one of them is a linear


combination of others? Or, how do we show that none of the vectors in the set is a
linear combination of others?

(1.12) Theorem
Let 𝑣 1, . . . , 𝑣𝑛 be vectors in a vector space 𝑉 , where 𝑛 ≥ 2. Then
(1) {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent iff 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 for some scalars
𝛼 1, . . . , 𝛼𝑛 , at least one of which is nonzero.
(2) {𝑣 1, . . . , 𝑣𝑛 } is linearly independent iff for scalars 𝛼 1, . . . , 𝛼𝑛 ,
𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 implies 𝛼 1 = · · · = 𝛼𝑛 = 0.

Proof. (1) Suppose that {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent. Then we have at least


one 𝑗 ∈ {1, . . . , 𝑛} such that

𝑣 𝑗 = 𝛼 1𝑣 1 + · · · + 𝛼 𝑗−1𝑣 𝑗−1 + 𝛼 𝑗+1𝑣 𝑗+1 + · · · + 𝛼𝑛 𝑣𝑛 .

Then 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 where 𝛼 𝑗 = −1 ≠ 0.
Conversely, suppose that 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0, where 𝛼 1, . . . , 𝛼𝑛 ∈ F, and we have
(some) 𝑗 ∈ {1, . . . , 𝑛} such that 𝛼 𝑗 ≠ 0. Then

𝑣 𝑗 = −(𝛼 𝑗 ) −1 (𝛼 1𝑣 1 + · · · + 𝛼 𝑗−1𝑣 𝑗−1 + 𝛼 𝑗+1𝑣 𝑗+1 + · · · + 𝛼𝑛 𝑣𝑛 ).

Therefore, {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent.


(2) If the given condition is satisfied, then there does not exist a nonzero 𝛼 𝑗 such
that 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0. That is, {𝑣 1, . . . , 𝑣𝑛 } is linearly independent.
Conversely, if the condition is not satisfied, then 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 with at least
one 𝛼 𝑗 ≠ 0. That is, the set {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent.
As (1.12) shows, given vectors are linearly independent iff the only way the zero
vector can be written as a linear combination of the vectors is the trivial linear
combination. It thus gives us a method of determining whether a given set of
vectors is linearly dependent or independent. We start with the equation
𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣 𝑛 = 0
and try to determine the scalars 𝛼 1, . . . , 𝛼𝑛 . If it turns out that all of them must be
zero, then the vectors are linearly independent. In case, we fail to show this, we
would obtain a nontrivial linear combination of the zero vector. That would supply
us with a vector that can be expressed as a linear combination of others.
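For vectors in R𝑛 this test is easy to mechanize: the equation 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 has only the trivial solution exactly when the matrix with the given vectors as columns has rank equal to the number of vectors. A Python sketch using NumPy (our own illustration), applied to the first two sets worked out in (1.13) below:

import numpy as np

def linearly_independent(vectors):
    # Full column rank means only the trivial linear combination gives 0.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(linearly_independent([np.array([1., 0., 0.]),
                            np.array([1., 2., 0.]),
                            np.array([1., 1., 1.])]))   # True
print(linearly_independent([np.array([1., 0.]),
                            np.array([1., 1.]),
                            np.array([3., 2.])]))       # False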

(1.13) Example
1. Is {(1, 0, 0), (1, 2, 0), (1, 1, 1)} linearly independent in R3 ? We start with the
equation
𝛼 (1, 0, 0) + 𝛽 (1, 2, 0) + 𝛾 (1, 1, 1) = (0, 0, 0).
Comparing the components, we obtain: 𝛼 + 𝛽 + 𝛾 = 0, 2𝛽 + 𝛾 = 0, and 𝛾 = 0. It
implies 𝛼 = 𝛽 = 𝛾 = 0.
Thus, the vectors are linearly independent.
2. {(1, 0), (1, 1), (3, 2)} is linearly dependent in R2 . To illustrate (1.12), suppose

𝑎(1, 0) + 𝑏 (1, 1) + 𝑐 (3, 2) = (0, 0).

It gives (𝑎 + 𝑏 + 3𝑐, 𝑏 + 2𝑐) = (0, 0). That is, 𝑎 + 𝑏 + 3𝑐 = 0, 𝑏 + 2𝑐 = 0. Or,


𝑐 = −𝑎, 𝑏 = 2𝑎. With 𝑎 = 0, we obtain the trivial linear combination of the zero
vector. With 𝑎 ≠ 0, we obtain (3, 2) = 1(1, 0) + 2(1, 1).
3. {1, 𝑡, 𝑡 2 } is a linearly independent set in F2 [𝑡]. For, if 𝑎 1 + 𝑏 𝑡 + 𝑐 𝑡 2 = 0, the zero
polynomial, then 𝑎 = 𝑏 = 𝑐 = 0.
4. Is {sin 𝑡, cos 𝑡 } linearly independent in 𝐶 [0, 𝜋]? Assume that

𝛼 sin 𝑡 + 𝛽 cos 𝑡 = 0.

Notice that the 0 on the right hand side is the zero function. Putting 𝑡 = 0 we
get 𝛽 = 0. By taking 𝑡 = 𝜋/2, we get 𝛼 = 0. Hence the set {sin 𝑡, cos 𝑡 } is linearly
independent.
5. If {𝑢 1, . . . , 𝑢𝑛 } is linearly dependent, then for any vector 𝑣 ∈ 𝑉 , {𝑢 1, . . . , 𝑢𝑛 , 𝑣 } is
linearly dependent. Why?

We observe the following:


1. Any set containing the zero vector is linearly dependent.
2. The vectors 𝑢, 𝑣 are linearly dependent iff one of them is a scalar multiple of
the other.
3. Each superset of a linearly dependent set is linearly dependent.
4. Each subset of a linearly independent set is linearly independent.
5. Moreover, that the set {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent does not imply that each
vector in it is in the span of the remaining vectors.
Given a linearly dependent list of five vectors, suppose, we have discovered that the
second vector is a linear combination of first, third, and fourth, where the coefficient
of the fourth vector is nonzero. Then the fourth vector can also be expressed as a
linear combination of the first, second and third.

(1.14) Theorem
A list of 𝑛 vectors is linearly dependent iff there exists 𝑘 ∈ {1, . . . , 𝑛} such that the
first 𝑘 − 1 vectors are linearly independent, and the 𝑘th vector is in the span of the
first 𝑘 − 1 vectors.

If the first vector in the list is the zero vector, then the list of previous vectors is
taken as the empty list ∅, in which case 𝑘 = 1.
Proof. Write 𝑆 0 := ∅, and for 1 ≤ 𝑗 ≤ 𝑛, define the list 𝑆 𝑗 as the list of first 𝑗
vectors. We notice that 𝑆 0 is a sublist of 𝑆 1, which is a sublist of 𝑆 2, and so on. In
this increasing list of lists 𝑆 0, 𝑆 1, 𝑆 2, . . . , 𝑆𝑛 , the first list 𝑆 0 is linearly independent
and the last list 𝑆𝑛 is linearly dependent. Notice that if 𝑆 𝑗 is linearly independent,
then all of 𝑆 0, . . . , 𝑆 𝑗 are linearly independent; and if 𝑆 𝑗 is linearly dependent, then
all of 𝑆 𝑗 , 𝑆 𝑗+1, . . . , 𝑆𝑛 are linearly dependent. Therefore, somewhere the switching
from linearly independent to linearly dependent happens. That is, there exists a

𝑘 ∈ {1, . . . , 𝑛} such that 𝑆𝑘−1 is linearly independent and 𝑆𝑘 is linearly dependent;


in which case, the 𝑘-th vector is in span (𝑆𝑘−1 ).
In the proof of (1.14), the number 𝑘 is the least number 𝑖 such that the first 𝑖
vectors are linearly dependent. In fact, the proof is only a detailed explanation of
this idea.
So, if a list of vectors is linearly dependent, then either the first vector is the
zero vector, or there exists a vector in the list which is a linear combination of the
previous ones. Notice that considering a linearly dependent set as an ordered set
(a list) has this bonus. In a linearly dependent list, a vector which depends linearly
on the previous ones need not be unique. Further, such a vector can change if we
choose a different ordering on the given set.
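The switching index 𝑘 of (1.14) can likewise be located numerically for vectors in R𝑛 by growing the list one vector at a time and watching the rank, as in the Python sketch below (our own illustration; the helper name is ours).

import numpy as np

def first_dependent_index(vectors):
    # Return the least k for which v_1, ..., v_k are linearly dependent,
    # i.e. the matrix of the first k columns has rank < k; None if no such k.
    for k in range(1, len(vectors) + 1):
        if np.linalg.matrix_rank(np.column_stack(vectors[:k])) < k:
            return k
    return None

vs = [np.array([1., 0.]), np.array([1., 1.]), np.array([3., 2.])]
print(first_dependent_index(vs))   # 3, since (3,2) = 1(1,0) + 2(1,1)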

(1.15) Theorem
Let 𝑉 be a vector space. Let 𝐴 be a linearly independent set of 𝑚 vectors, and let 𝐵
be a spanning set of 𝑉 consisting of 𝑛 vectors. Then 𝑚 ≤ 𝑛.

Proof. Let 𝐴 = {𝑢 1, . . . , 𝑢𝑚 } and let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 }. Since 𝐴 is linearly inde-


pendent, each 𝑢𝑖 is a nonzero vector. Assume that 𝑚 > 𝑛. Then we have vectors
𝑢𝑛+1, . . . , 𝑢𝑚 in 𝐴. Since 𝐵 is a spanning set, 𝑢 1 ∈ span (𝐵). Thus, the list 𝐵 1 of
vectors
𝑢 1, 𝑣 1, 𝑣 2, . . . , 𝑣 𝑛
is linearly dependent. By (1.14), we have a vector in this list which is in the span
of the previous ones. Notice that 𝑢 1 is not such a vector since 𝑢 1 ≠ 0. So, let 𝑣𝑘 be
such a vector. Now, remove this 𝑣𝑘 from 𝐵 1 to obtain the list 𝐶 1 of vectors
𝑢 1, 𝑣 1, . . . , 𝑣𝑘−1, 𝑣𝑘+1, . . . , 𝑣𝑛 .
Notice that span (𝐶 1 ) = span (𝐵 1 ) = span (𝐵) = 𝑉 . By including 𝑢 2, enlarge the list
𝐶 1 to 𝐵 2 of vectors
𝑢 2, 𝑢 1, 𝑣 1, . . . , 𝑣𝑘−1, 𝑣𝑘+1, . . . , 𝑣𝑛 .
Again, 𝐵 2 is linearly dependent. Then one of the vectors in 𝐵 2 is a linear combi-
nation of the previous ones. Such a vector is neither 𝑢 1 nor 𝑢 2, since {𝑢 1, . . . , 𝑢𝑛 }
is linearly independent. Thus, let 𝑣 𝑗 be such a vector. Remove this 𝑣 𝑗 from 𝐵 2 to
obtain a list 𝐶 2 . Now, span (𝐶 2 ) = span (𝐵 2 ) = span (𝐵) = 𝑉 .
Continue this process of introducing a 𝑢 and removing a 𝑣 for 𝑛 steps. Finally, 𝑣𝑛
is removed and we end up with the list 𝐶𝑛 of vectors
𝑢𝑛 , 𝑢𝑛−1, . . . , 𝑢 2, 𝑢 1,
which spans 𝑉 . Then 𝑢𝑛+1 ∈ span (𝐶𝑛 ). This is a contradiction since 𝐴 is linearly
independent. Therefore, our assumption that 𝑚 > 𝑛 is wrong.
Exercises for § 1.5
1. In each of the following, a vector space 𝑉 and a subset 𝑆 of 𝑉 are given.
Determine whether 𝑆 is linearly dependent; and if it is, express one of the
vectors in 𝑆 as a linear combination of the remaining vectors.
(a) 𝑉 = R3, 𝑆 = {(1, 0, −1), (2, 5, 1), (0, −4, 3)}. Ans: LI.
(b) 𝑉 = R3, 𝑆 = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}. Ans: LD.
(c) 𝑉 = C3, 𝑆 = {(1, −3, −2), (−3, 1, 3), (2, 5, 7)}. Ans: LI.
(d) 𝑉 = C3, 𝑆 = {(1, 3, 2), (3, 1, 3), (1, 2, 3), (4, 7, 5)}. Ans: LD.
(e) 𝑉 = F3 [𝑡], 𝑆 = {𝑡 2 − 3𝑡 + 5, 𝑡 3 + 2𝑡 2 − 𝑡 + 1, 𝑡 3 + 3𝑡 2 − 1}. Ans: LI.
(f) 𝑉 = F3 [𝑡], 𝑆 = {−2𝑡 3 − 𝑡 2 + 3𝑡 + 2, 𝑡 3 − 2𝑡 2 + 3𝑡 + 1, 𝑡 3 + 𝑡 2 + 3𝑡 }.
Ans: LI.
(g) 𝑉 = F3 [𝑡], 𝑆 = {6𝑡 3 − 3𝑡 2 + 𝑡 + 2, 𝑡 3 − 𝑡 2 + 2𝑡 + 3, 2𝑡 3 + 𝑡 2 − 3𝑡 + 1}.
Ans: LI.
(h) 𝑉 = F2×2 , 𝑆 = the set of the four matrices
    [2 0]   [1 −1]   [2  1]   [ 1 2]
    [2 3] , [2  1] , [0 −1] , [−1 0] .        Ans: LI.
(i) 𝑉 = the set of all functions from R to R, 𝑆 = {2, sin2 𝑡, cos2 𝑡 }.
Ans: LD.
(j) 𝑉 = the set of all functions from R to R, 𝑆 = {1, sin 𝑡, sin 2𝑡 }.
Ans: LI.
(k) 𝑉 = the set of all functions from R to R, 𝑆 = {sin 𝑡, cos 𝑡, sin 2𝑡 }.
Ans: LI.
2. Show that in R2, {(𝑎, 𝑏), (𝑐, 𝑑)} is linearly independent iff 𝑎𝑑 − 𝑏𝑐 ≠ 0.
3. Show that in R2 , any three vectors are linearly dependent.
4. Give three linearly dependent vectors in R2 such that none is a scalar multiple
of another.
5. Give four vectors in R3 which are linearly dependent so that any three of them
are linearly independent.
6. Prove the following:
(a) Each superset of a linearly dependent set is linearly dependent.
(b) Each subset of a linearly independent set is linearly independent.
(c) Union of any two linearly dependent sets is linearly dependent.
(d) Intersection of any two linearly independent sets is linearly independent.
7. Construct examples to show that the following statements are false:
(a) Each subset of a linearly dependent set is linearly dependent.
(b) Each superset of a linearly independent set is linearly independent.
(c) Union of any two linearly independent sets is linearly independent.
(d) Intersection of any two linearly dependent sets is linearly dependent.

8. Let 𝐴 and 𝐵 be subsets of a vector space. Prove or disprove:


(a) If span (𝐴) ∩ span (𝐵) = {0}, then 𝐴 ∪ 𝐵 is linearly independent.
(b) If 𝐴 ∪ 𝐵 is linearly independent, then span (𝐴) ∩ span (𝐵) = {0}.
Ans: (a) disprove, (b) prove.
9. Show that in the vector space of all functions from R to R, the set of functions
{𝑒 𝑡 , 𝑡𝑒 𝑡 , 𝑡 3𝑒 𝑡 } is linearly independent.
10. Is {sin 𝑡, sin 2𝑡, sin 3𝑡, . . . , sin 𝑛𝑡 } linearly independent in 𝐶 [−𝜋, 𝜋]?
11. Let 𝑝 1 (𝑡), . . . , 𝑝𝑘 (𝑡) be polynomials with coefficients from F with distinct
degrees. Show that {𝑝 1 (𝑡), . . . , 𝑝𝑘 (𝑡)} is linearly independent in F[𝑡].
12. Let 𝑢, 𝑣, 𝑤, 𝑥, 𝑦1, 𝑦2, 𝑦3, 𝑦4, 𝑦5 be vectors in a vector space 𝑉 satisfying the
relations: 𝑦1 = 𝑢 + 𝑣 + 𝑤, 𝑦2 = 2𝑣 + 𝑤 + 𝑥, 𝑦3 = 𝑢 + 3𝑤 + 𝑥, 𝑦4 = 2𝑢 + 𝑣 + 4𝑥,
and 𝑦5 = 𝑢 + 2𝑣 + 3𝑤 + 4𝑥 . Are the vectors 𝑦1, . . . , 𝑦5 linearly dependent or
independent? Ans: LD.
13. Let 𝑆 be a linearly independent subset of a vector space 𝑉 , and let 𝑣 ∈ 𝑉 \ 𝑆.
Show that if 𝑆 ∪ {𝑣 } is linearly dependent then 𝑣 ∈ span (𝑆).
14. Let 𝑣 1, . . . , 𝑣𝑛 be linearly independent vectors in a vector space 𝑉 . Let 𝑢 ∈ 𝑉
be such that 𝑢 + 𝑣 1, . . . , 𝑢 + 𝑣𝑛 are linearly dependent. Show that 𝑢, 𝑣 1, . . . , 𝑣𝑛
are linearly dependent.

1.6 Basis
A spanning set of a vector space may contain a vector which is in the span of the
other vectors in the set. Throwing away such a vector leaves a spanning set, again.
That is, in a spanning set, there may be redundancy. On the other hand, a linearly
independent set may fail to span the vector space. That is, in a linearly independent
set there may be deficiency. We would like to have a set of vectors which is neither
redundant nor deficient.
Let 𝑉 be a vector space over F. A linearly independent subset that spans 𝑉 is called
a basis of 𝑉 . A basis depends on the underlying field since linear combinations
depend on the field.
A vector space may have many bases. For example, both {1} and {2} are bases of
R. In fact {𝑥 }, for any nonzero 𝑥 ∈ R, is a basis of R. However, {0} has the unique
basis ∅.

(1.16) Example
1. Recall that in R2, the vectors 𝑒 1 = (1, 0) and 𝑒 2 = (0, 1) are linearly independent.
They also span R2 since (𝑎, 𝑏) = 𝑎𝑒 1 + 𝑏𝑒 2 for any 𝑎, 𝑏 ∈ R. Therefore, {𝑒 1, 𝑒 2 } is
a basis of R2 .
2. The set {(1, 1), (1, 2)} is a basis of R2 . Reason? Since neither of them is a
scalar multiple of the other, the set is linearly independent. To see that the
given set of vectors spans R2, we must show that each vector (𝑎, 𝑏) ∈ R2 can
be expressed as a linear combination of these vectors. So, we ask whether the
equation (𝑎, 𝑏) = 𝛼 (1, 1) + 𝛽 (1, 2) has a solution for 𝛼 and 𝛽. The requirement
amounts to 𝛼 + 𝛽 = 𝑎 and 𝛼 + 2𝛽 = 𝑏. We see that 𝛽 = 𝑏 − 𝑎 and 𝛼 = 2𝑎 − 𝑏 do
the job. Hence {(1, 1), (1, 2)} is a basis of R2 .
3. Let 𝑉 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 − 𝑏 = 0}. Clearly, (1, 2) ∈ 𝑉 . If (𝑎, 𝑏) ∈ 𝑉 , then
𝑏 = 2𝑎. That is, (𝑎, 𝑏) = (𝑎, 2𝑎) = 𝑎(1, 2). So, 𝑉 = span {(1, 2)}. Also, {(1, 2)}
is linearly independent. Therefore, it is a basis of 𝑉 .
4. The set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} as a subset of R3 is a basis of R3 . Also, the
same set as a subset of C3 is a basis of C3 .
5. Let 𝑉 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 − 2𝑏 + 𝑐 = 0}. Let (𝑎, 𝑏, 𝑐) ∈ 𝑉 . Then 𝑎 = 2𝑏 − 𝑐;
that is, (𝑎, 𝑏, 𝑐) = (2𝑏 − 𝑐, 𝑏, 𝑐) = 𝑏 (2, 1, 0) + 𝑐 (−1, 0, 1). So, 𝑉 is spanned by
𝐵 := {(2, 1, 0), (−1, 0, 1)}. Further, 𝐵 is a linearly independent subset of 𝑉 .
Therefore, 𝐵 is a basis of 𝑉 .
6. Let 𝑉 = F𝑚×𝑛 . Let 𝐸𝑖 𝑗 be the matrix in 𝑉 whose (𝑖, 𝑗)th entry is 1 and all other
entries are 0. Then 𝐵 = {𝐸𝑖 𝑗 ∈ 𝑉 : 1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛} is a basis of 𝑉 .

Recall that 𝑒 𝑗 in F𝑛 has the 𝑗th component as 1 and all other components as 0. The
set {𝑒 1, . . . , 𝑒𝑛 } is a basis of F𝑛 . As an ordered set, this basis is called the Standard
Basis of F𝑛 .
We also write the ordered set {𝑒 1, . . . , 𝑒𝑛 } for the standard basis of F𝑛×1, where
each 𝑒 𝑗 is taken as a column vector. Similarly, we write the standard basis of F1×𝑛 as
{𝑒 1, . . . , 𝑒𝑛 }, where each 𝑒𝑖 is taken as a row vector. The context will clarify whether
it is a column vector or a row vector, and the particular 𝑛.
We now show formally that a basis neither has redundancy nor has deficiency in
spanning the vector space. We use the following terminology.
A subset 𝐵 of a vector space 𝑉 is a maximal linearly independent set means that
𝐵 is linearly independent in 𝑉 , and each proper superset of 𝐵 is linearly dependent.
Similarly, a subset 𝐵 of 𝑉 is a minimal spanning set of 𝑉 means that 𝐵 spans 𝑉 ,
and each proper subset of 𝐵 fails to span 𝑉 .

(1.17) Theorem
Any subset of a vector space is a basis iff it is a minimal spanning set iff it is a
maximal linearly independent set.

Proof. Let 𝐵 be a subset of a vector space 𝑉 .


Suppose that 𝐵 is a basis of 𝑉 . Then 𝐵 is a spanning set of 𝑉 . If 𝐵 is not a minimal
spanning set of 𝑉 , then there exists a vector 𝑣 ∈ 𝐵 such that 𝐵 \ {𝑣 } is also a spanning

set. In particular, 𝑣 ∈ span (𝐵 \ {𝑣 }). This contradicts the assumption that 𝐵 is


linearly independent. Therefore, each basis of 𝑉 is a minimal spanning set of 𝑉 .
Let 𝐵 be a minimal spanning set. If 𝐵 is linearly dependent, then we have a vector
𝑣 ∈ 𝐵 such that 𝑣 ∈ span (𝐵 \ {𝑣 }). Then span (𝐵 \ {𝑣 }) = span (𝐵) = 𝑉 . This
contradicts minimality of 𝐵. Hence, 𝐵 is linearly independent. If 𝐵 is not maximal
linearly independent, then there exists a vector 𝑤 such that 𝐵 ∪ {𝑤 } is linearly
independent. In that case, 𝑤 ∉ span (𝐵). This contradicts the assumption that 𝐵 is
a spanning set. Therefore, each minimal spanning set of 𝑉 is a maximal linearly
independent set of 𝑉 .
Assume that 𝐵 is a maximal linearly independent set of 𝑉 . If 𝐵 does not span
𝑉 , then there exists 𝑥 ∈ 𝑉 such that 𝑥 ∉ span (𝐵). In that case, 𝐵 ∪ {𝑥 } is linearly
independent. This contradicts the maximality of 𝐵. That is, 𝐵 spans 𝑉 ; and hence 𝐵
is a basis of 𝑉 . Therefore, each maximal linearly independent set is a basis.
Any vector (𝑎, 𝑏) ∈ R2 can be written uniquely as 𝑎(1, 0) + 𝑏 (0, 1). Also, (𝑎, 𝑏)
can be written as a linear combination of the vectors (1, 1), (−1, 2), and (1, 0); but
not uniquely. For example,
(2, 3) = 3(1, 1) + 0(−1, 2) − 1(1, 0) = 1(1, 1) + 1(−1, 2) + 2(1, 0).
Here, both the sets {(1, 0), (0, 1)} and {(1, 1), (−1, 2), (1, 0)} are spanning sets of
R2 . We see that neither (1, 0) is in the span of (0, 1), nor (0, 1) is in the span of (1, 0).
Considering the second set, we find that (1, 0) is in the span of {(1, 1), (−1, 2)}.
Reason?
(1, 0) = 2/3 (1, 1) − 1/3 (−1, 2).
In this case, uniqueness of writing a vector as a linear combination breaks down
because one of the vectors in the set is a linear combination of others. This is
another reason why we are interested in a basis.
To express the intended idea formally, we require an ordering of the vectors in a
basis. When we consider the set {𝑣 1, . . . , 𝑣𝑛 } as an ordered set, we mean that 𝑣 1 is
the first vector, 𝑣 2 is the second vector, etc, and finally, 𝑣𝑛 is the 𝑛th vector in the
ordered set.

(1.18) Theorem
Let 𝑣 1, . . . , 𝑣𝑛 be vectors in a vector space 𝑉 . The ordered set 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } is a
basis of 𝑉 iff for each 𝑣 ∈ 𝑉 there exists a unique 𝑛-tuple of scalars (𝛼 1, . . . , 𝛼𝑛 )
such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 .

Proof. Suppose the ordered set 𝐵 is a basis of 𝑉 . Let 𝑣 ∈ 𝑉 . As span (𝐵) = 𝑉 , there
exist scalars 𝛼 1, . . . , 𝛼𝑛 such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 . For uniqueness, suppose

𝑣 = 𝛼 1 𝑣 1 + · · · + 𝛼 𝑛 𝑣 𝑛 = 𝛽 1 𝑣 1 + · · · + 𝛽𝑛 𝑣 𝑛 .
Then (𝛼 1 − 𝛽 1 )𝑣 1 + · · · + (𝛼𝑛 − 𝛽𝑛 )𝑣𝑛 = 0. Due to linear independence of 𝐵,

𝛼 1 = 𝛽 1 , . . . , 𝛼 𝑛 = 𝛽𝑛 .

This proves uniqueness of the 𝑛-tuple (𝛼 1, . . . , 𝛼𝑛 ).


Conversely, suppose for each 𝑣 ∈ 𝑉 , there exists a unique 𝑛-tuple of scalars
(𝛼 1, . . . , 𝛼𝑛 ) such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 . Then 𝑉 ⊆ span (𝐵) ⊆ 𝑉 . That is, 𝐵
spans 𝑉 . Further, let 𝛽 1𝑣 1 + · · · + 𝛽𝑛 𝑣𝑛 = 0. Now, 0 𝑣 1 + · · · + 0 𝑣𝑛 = 0. By uniqueness
of the 𝑛-tuple of scalars in the linear combination of the zero vector, we have
𝛽 1 = · · · = 𝛽𝑛 = 0. That is, 𝐵 is linearly independent. Hence, 𝐵 is a basis of 𝑉 .
A basis can be extracted from a finite spanning set.

(1.19) Theorem
If a vector space has a finite spanning set, then every finite spanning set contains a
basis.

Proof. Let 𝑆 = {𝑣 1, . . . , 𝑣𝑛 } be a spanning set of a vector space 𝑉 . Consider 𝑆 as an


ordered set. If 𝑆 is linearly independent, then 𝑆 is a basis. Otherwise, due to (1.14),
there exists a vector 𝑣𝑘 which is in the span of 𝑣 1, . . . , 𝑣𝑘−1 . Delete 𝑣𝑘 from 𝑆 and
apply the same check again. Repeatedly throwing away those vectors from 𝑆 which
are in the span of the previous vectors, we end up with a basis.

(1.20) Example
Let 𝐵 = {(1, 0, 1), (1, 2, 1), (2, 2, 2), (0, 2, 0)} and let 𝑉 = span (𝐵). We see that
(1, 2, 1) is not a scalar multiple of (1, 0, 1). Next, (2, 2, 2) = (1, 0, 1) + (1, 2, 1).
Removing the vector (2, 2, 2) from 𝐵, we obtain

𝐵 1 = {(1, 0, 1), (1, 2, 1), (0, 2, 0)}.

Here, 𝑉 = span (𝐵) = span (𝐵 1 ). Next, (0, 2, 0) = −(1, 0, 1) + (1, 2, 1). Removing
(0, 2, 0) from 𝐵 1, we end up with

𝐵 2 = {(1, 0, 1), (1, 2, 1)}.

Notice that 𝑉 = span (𝐵) = span (𝐵 1 ) = span (𝐵 2 ), and 𝐵 2 is linearly independent.


Thus 𝐵 2 is a basis of 𝑉 .

Similarly, by enlarging a linearly independent set we may also end up with a


spanning set, keeping linear independence preserved.

(1.21) Theorem
If a vector space has a finite spanning set, then every linearly independent set can
be extended to a basis.

Proof. Let 𝑆 = {𝑢 1, . . . , 𝑢𝑛 } be a spanning set of a vector space 𝑉 . Let 𝐵 =


{𝑣 1, . . . , 𝑣𝑚 } be a linearly independent subset of 𝑉 . If 𝐵 also spans 𝑉 , then it is a
basis of 𝑉 . Otherwise, construct the ordered set 𝐶 = {𝑣 1, . . . , 𝑣𝑚 , 𝑢 1, . . . , 𝑢𝑛 }. Now,
𝐶 is a linearly dependent spanning set. Since 𝐵 is linearly independent, by (1.14),
some 𝑢𝑖 is a linear combination of the earlier vectors in 𝐶. Throw away all such
vectors one-by-one. The remaining set so constructed from 𝐶 is a basis of 𝑉 ; and it
is an extension of 𝐵.

(1.22) Example
Let 𝐵 = {(1, 0, 1), (1, 2, 1), (2, 2, 2), (0, 2, 0)} and let 𝑉 = span (𝐵), as in the last
example. The vector (2, −2, 2) ∈ 𝑉 , since (2, −2, 2) = 3(1, 0, 1) − (1, 2, 1).
For extending the set {(2, −2, 2)} to a basis, we construct the spanning set

𝐶 = {(2, −2, 2), (1, 0, 1), (1, 2, 1), (2, 2, 2), (0, 2, 0)}.

Notice that 𝐶 is a spanning set of 𝑉 since its subset 𝐵 is a spanning set. Further, we
see that {(2, −2, 2), (1, 0, 1)} is linearly independent. And,

(1, 2, 1) = (−1)(2, −2, 2) + 3(1, 0, 1),


(2, 2, 2) = (−1)(2, −2, 2) + 4(1, 0, 1),
(0, 2, 0) = (−1)(2, −2, 2) + 2(1, 0, 1).

Deleting these vectors from 𝐶, we end up with the basis {(2, −2, 2), (1, 0, 1)} of
𝑉 , which is an extension of {(2, −2, 2)}.

Exercises for § 1.6


1. Determine which of the following sets are bases of F2 [𝑡].
(a) {1, 1 + 𝑡, 1 + 𝑡 + 𝑡 2 }. Ans: Basis.
(b) {1 + 2𝑡 + 𝑡 2, 3 + 𝑡 2, 𝑡 + 𝑡 2 }. Ans: Basis.
(c) {1 + 2𝑡 + 3𝑡 2, 2 − 5𝑡 + 3𝑡 2, 3𝑡 + 𝑡 2 }. Ans: Not a basis.
(d) {−1 − 𝑡 − 2𝑡 2, 2 + 𝑡 − 2𝑡 2, 1 − 2𝑡 + 4𝑡 2 }. Ans: Basis.
2. Is {1 + 𝑡 𝑛 , 𝑡 + 𝑡 𝑛 , . . . , 𝑡 𝑛−1 + 𝑡 𝑛 , 𝑡 𝑛 } a basis of F𝑛 [𝑡]? Ans: Yes.
3. Let {𝑥, 𝑦, 𝑧} be a basis of a vector space 𝑉 . Are the following bases of 𝑉 ?
(a) {𝑥 + 𝑦, 𝑦 + 𝑧, 𝑧 + 𝑥 } (b) {𝑥 − 𝑦, 𝑦 − 𝑧, 𝑧 − 𝑥 }
Ans: (a) Yes. (b) No.
4. Find a basis for the subspace {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 + 𝑐 = 0} of R3 .
5. Find a basis of 𝑉 = {(𝑎 1, . . . , 𝑎 5 ) ∈ C5 : 𝑎 1 + 𝑎 3 − 𝑎 5 = 0 = 𝑎 2 − 𝑎 4 }.
6. Let 𝑉 = {𝑝 (𝑡) ∈ R2 [𝑡] : 𝑝 (0) + 2𝑝 ′ (0) + 3𝑝 ′′ (0) = 0}. Show that 𝑉 is a vector
space and find a basis for it.
7. Extend the set {1 + 𝑡 2, 1 − 𝑡 2 } to a basis of F3 [𝑡].
8. Let 𝑢 1 = 1 and let 𝑢 𝑗 = 1 + 𝑡 + 𝑡 2 + · · · + 𝑡 𝑗−1 for 𝑗 = 2, 3, 4, . . . Is {𝑢 1, . . . , 𝑢𝑛 }
a basis of F𝑛 [𝑡]? Is {𝑢 1, 𝑢 2, . . .} a basis of F[𝑡]? Ans: Yes; Yes.
9. Construct three bases for R3 so that no two of them have a common vector.
10. Construct a basis for the vector space of all functions from {1, 2, 3} to R.
11. Construct a basis for the real vector space of all 𝑛 × 𝑛 hermitian matrices.
12. Construct a basis for the vector space of all 𝑛 × 𝑛 matrices of trace 0.
13. Show that 𝑊 := {𝐴 ∈ C2×2 : 𝐴∗ + 𝐴 = 0} is a real vector space. Find a basis
for 𝑊 .

1.7 Dimension
Due to (1.19) if a vector space has a finite spanning set, then all its bases are finite.
We show something more.

(1.23) Theorem
If a vector space has a finite spanning set, then each basis has the same number of
elements.

Proof. Let 𝑉 be a vector space having a finite spanning set. Let 𝐵 and 𝐸 be two
bases of 𝑉 . Both 𝐵 and 𝐸 are finite sets. Let 𝐵 have 𝑚 vectors and let 𝐸 have 𝑛
vectors. Consider 𝐵 as a linearly independent set and 𝐸 as a spanning set. Then
𝑚 ≤ 𝑛, due to (1.15). Now, consider 𝐸 as a linearly independent set and 𝐵 as a
spanning set. Then 𝑛 ≤ 𝑚. Therefore, 𝑚 = 𝑛.
Let 𝑉 be a vector space having a finite spanning set. Then the number of elements
in a basis is called the dimension of 𝑉 ; and it is denoted by dim (𝑉 ).

(1.24) Example

1. dim (R) = 1; dim (R𝑛 ) = 𝑛; dim (C) = 1; dim (C𝑛 ) = 𝑛; dim (F𝑛 [𝑡]) = 𝑛 + 1.
2. The dimension of the zero space {0} is 0. Reason? ∅ is a basis of {0}.
3. If C is considered as a vector space over R, then dim (C) = 2. For instance, {1, 𝑖}
is a basis of the real vector space C.
4. The real vector space C𝑛 has dimension 2𝑛. Can you construct a basis of C𝑛
considered as a vector space over R?
5. The real vector space C𝑛 [𝑡] has dimension 2(𝑛 + 1).

A vector space which has a finite basis is called a finite dimensional vector
space. We also write dim (𝑉 ) < ∞ to express the fact that “𝑉 is finite dimensional”.
Due to (1.19), each vector space having a finite spanning set is finite dimensional.
The dimension of a finite dimensional vector space is a non-negative integer. For
instance, F𝑛 and F𝑛 [𝑡] are finite dimensional vector spaces over F with dim (F𝑛 ) = 𝑛
and dim (F𝑛 [𝑡]) = 𝑛 + 1.
A vector space which does not have a finite basis, is called an infinite dimensional
vector space; and we express this fact by writing dim (𝑉 ) = ∞. From (1.15) it follows
that
a vector space is infinite dimensional
iff no finite subset of it is its basis
iff no finite subset of it is a spanning set
iff it contains an infinite linearly independent set.

(1.25) Example
1. F∞ is infinite dimensional since {𝑒 1, 𝑒 2, . . .} is linearly independent in F∞ .
2. The set of all polynomials, F[𝑡], is an infinite dimensional vector space. Reason?
Suppose the dimension is finite, say dim (F[𝑡]) = 𝑛. Then any set of 𝑛 + 1 vectors
is linearly dependent. But {1, 𝑡, . . . , 𝑡 𝑛 } is linearly independent! Notice that
{1, 𝑡, 𝑡 2, . . .} is a basis of F[𝑡].
3. 𝐶 [𝑎, 𝑏] is an infinite dimensional vector space. Reason? Take the collection of
functions {𝑓𝑛 : 𝑓𝑛 (𝑡) = 𝑡 𝑛 for all 𝑡 ∈ [𝑎, 𝑏]; 𝑛 = 0, 1, 2, . . .}. Then {𝑓0, 𝑓1, . . . , 𝑓𝑛 }
is linearly independent for every 𝑛. So, 𝐶 [𝑎, 𝑏] cannot have a finite basis.

We will not study infinite dimensional vector spaces, though occasionally, we will
give an example to illustrate a point.
Inter-dependence of the notions of spanning set, linear independence, and basis
can be seen using the notion of dimension. For a finite set 𝑆, we write |𝑆 | for the
number of elements in 𝑆. The following theorems (1.26-1.27) state some relevant
facts; their proofs are easy.

(1.26) Theorem
Let 𝑆 be a finite subset of a finite dimensional vector space 𝑉 .
(1) 𝑆 is a basis of 𝑉 iff 𝑆 is a spanning set and |𝑆 | = dim (𝑉 ).
(2) 𝑆 is a basis of 𝑉 iff 𝑆 is linearly independent and |𝑆 | = dim (𝑉 ).
(3) If |𝑆 | < dim (𝑉 ), then 𝑆 does not span 𝑉 .
(4) If |𝑆 | > dim (𝑉 ), then 𝑆 is linearly dependent.
(1.27) Theorem
Let 𝑈 be a subspace of a finite dimensional vector space 𝑉 .
(1) 𝑈 is a proper subspace of 𝑉 iff dim (𝑈 ) < dim (𝑉 ).
(2) (Basis Extension) Each basis of 𝑈 can be extended to a basis of 𝑉 .

In view of (1.27-2) each linearly independent subset of a finite dimensional vector


space can be extended to a basis for the space.
Given two subspaces 𝑈 and 𝑊 of a finite dimensional vector space, we have two
other subspaces, 𝑈 ∩ 𝑊 and 𝑈 + 𝑊 . What about their dimensions? Since
𝑈 ∩𝑊 is a subspace of both 𝑈 and 𝑊 , and each of 𝑈 and 𝑊 is a subspace of 𝑈 +𝑊 ,
we obtain

dim (𝑈 ∩ 𝑊 ) ≤ dim (𝑈 ), dim (𝑊 ) ≤ dim (𝑈 + 𝑊 ).

(1.28) Example

1. Let 𝑈 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 − 𝑏 = 0} and 𝑊 = {(𝑎, 𝑏) ∈ R2 : 𝑎 + 𝑏 = 0}. For these


subspaces of R2, we see that

𝑈 ∩ 𝑊 = {(𝑎, 𝑏) : 2𝑎 − 𝑏 = 0 = 𝑎 + 𝑏} = {0}.
𝑈 + 𝑊 = {(𝑎, 2𝑎) + (𝑐, −𝑐) : 𝑎, 𝑐 ∈ R} = {(𝑎 + 𝑐, 2𝑎 − 𝑐) : 𝑎, 𝑐 ∈ R}.

Further, if 𝛼, 𝛽 ∈ R, then

    (𝛼, 𝛽) = ( (𝛼 + 𝛽)/3 + (2𝛼 − 𝛽)/3, 2(𝛼 + 𝛽)/3 − (2𝛼 − 𝛽)/3 ).
That is, each vector in R2 can be expressed in the form (𝑎 + 𝑐, 2𝑎 − 𝑐). Thus
𝑈 + 𝑊 = R2 .
Here, dim (𝑈 ∩ 𝑊 ) + dim (𝑈 + 𝑊 ) = 2 = dim (𝑈 ) + dim (𝑊 ).
2. Consider the following subspaces of R3 :

𝑈 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 + 𝑐 = 0}; 𝑊 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 − 𝑏 − 𝑐 = 0}.

We determine (some) bases for 𝑈 , 𝑊 , 𝑈 ∩ 𝑊 and 𝑈 + 𝑊 .


It is easy to see that a basis for 𝑈 is {(1, 0, −1), (1, −1, 0)}; and a basis for 𝑊 is
{(1, 1, 0), (1, 0, 1)}.
Next, solving 𝑎 + 𝑏 + 𝑐 = 0 and 𝑎 − 𝑏 − 𝑐 = 0, we see that 𝑎 = 0 and 𝑐 = −𝑏.
Thus 𝑈 ∩ 𝑊 = {(0, 𝑏, −𝑏) : 𝑏 ∈ R} has a basis {(0, 1, −1)}.

Next, 𝑈 + 𝑊 is the span of the union of bases of 𝑈 and 𝑊 ; that is,

𝑈 + 𝑊 = span {(1, 0, −1), (1, −1, 0), (1, 1, 0), (1, 0, 1)} ⊆ R3 .

Since (1, 0, −1), (1, −1, 0), (1, 1, 0) are linearly independent, 𝑈 + 𝑊 = R3 .
Thus dim (𝑈 ∩ 𝑊 ) + dim (𝑈 + 𝑊 ) = 1 + 3 = dim (𝑈 ) + dim (𝑊 ).

We show that our observation in (1.28) connecting dim (𝑈 ), dim (𝑊 ), dim (𝑈 ∩𝑊 )


and dim (𝑈 + 𝑊 ) is true in general.

(1.29) Theorem
Let 𝑈 and 𝑊 be finite dimensional subspaces of a vector space 𝑉 . Then

dim (𝑈 ∩ 𝑊 ) + dim (𝑈 + 𝑊 ) = dim (𝑈 ) + dim (𝑊 ).

Proof. Since 𝑈 ∩ 𝑊 is a subspace of 𝑈 , dim (𝑈 ∩ 𝑊 ) < ∞. Let 𝐵 = {𝑢 1, . . . , 𝑢𝑛 }


be a basis of 𝑈 ∩𝑊 . (Here, if 𝑈 ∩𝑊 = {0}, we take 𝐵 = ∅; and thus 𝑛 = 0.) By the
basis extension theorem, there are bases 𝐵 ∪ 𝐶 and 𝐵 ∪ 𝐷 for 𝑈 and 𝑊 , respectively,
where 𝐶 = {𝑣 1, . . . , 𝑣𝑘 } and 𝐷 = {𝑤 1, . . . , 𝑤𝑚 }, for some 𝑘, 𝑚 ≥ 0. Notice that no 𝑣
is in 𝑈 ∩ 𝑊 , and no 𝑤 is in 𝑈 ∩ 𝑊 . Also, no 𝑣 is in 𝑊 . Reason? If 𝑣 𝑗 ∈ 𝑊 , then
𝑣 𝑗 ∈ 𝑈 ∩ 𝑊 , which is wrong. Similarly, no 𝑤 is in 𝑈 . Therefore, 𝐵 ∪ 𝐶 has 𝑛 + 𝑘
vectors, and 𝐵 ∪ 𝐷 has 𝑛 + 𝑚 vectors. Also, the set

𝐸 = 𝐵 ∪ 𝐶 ∪ 𝐷 = {𝑢 1, . . . , 𝑢𝑛 , 𝑣 1, . . . , 𝑣𝑘 , 𝑤 1, . . . , 𝑤𝑚 }

has exactly 𝑛 + 𝑘 + 𝑚 vectors. We show that 𝐸 is a basis of 𝑈 + 𝑊 .


Let 𝑧 ∈ 𝑈 + 𝑊 . Then there exist 𝑥 ∈ 𝑈 and 𝑦 ∈ 𝑊 such that 𝑧 = 𝑥 + 𝑦.
Further, there exist scalars 𝛼 1, . . . , 𝛼𝑛+𝑘 , 𝛽 1, . . . , 𝛽𝑛+𝑚 such that

𝑥 = 𝛼 1𝑢 1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛼𝑛+1𝑣 1 + · · · + 𝛼𝑛+𝑘 𝑣𝑘 ,


𝑦 = 𝛽 1𝑢 1 + · · · + 𝛽𝑛𝑢𝑛 + 𝛽𝑛+1𝑤 1 + · · · + 𝛽𝑛+𝑚𝑤𝑚 .

Then
    𝑧 = 𝑥 + 𝑦 = Σ_{𝑖=1}^{𝑛} (𝛼𝑖 + 𝛽𝑖 )𝑢𝑖 + Σ_{𝑗=1}^{𝑘} 𝛼𝑛+𝑗 𝑣 𝑗 + Σ_{ℓ=1}^{𝑚} 𝛽𝑛+ℓ 𝑤 ℓ ∈ span (𝐸).

So, 𝑈 + 𝑊 ⊆ span (𝐸). As 𝐸 ⊆ 𝑈 + 𝑊 , we have span (𝐸) ⊆ 𝑈 + 𝑊 . Consequently,


span (𝐸) = 𝑈 + 𝑊 .
To prove the linear independence of 𝐸, let

𝛼 1𝑢 1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛽 1𝑣 1 + · · · + 𝛽𝑘 𝑣𝑘 + 𝛾 1𝑤 1 + · · · + 𝛾𝑚𝑤𝑚 = 0.
Then
𝛼 1𝑢 1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛽 1𝑣 1 + · · · + 𝛽𝑘 𝑣𝑘 = −𝛾 1𝑤 1 − · · · − 𝛾𝑚𝑤𝑚 .
The left hand side is a vector in 𝑈 , and the right hand side is a vector in𝑊 . Therefore,
both are in 𝑈 ∩ 𝑊 . Since 𝐵 is a basis for 𝑈 ∩ 𝑊 , we have

−𝛾 1𝑤 1 − · · · − 𝛾𝑚𝑤𝑚 = 𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛

for some scalars 𝑎 1, . . . , 𝑎𝑛 . Thus

𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 + 𝛾 1𝑤 1 + · · · + 𝛾𝑚𝑤𝑚 = 0.

Since 𝐵 ∪ 𝐷 is linearly independent, 𝑎 1 = · · · = 𝑎𝑛 = 𝛾 1 = · · · = 𝛾𝑚 = 0. Substituting


the values of 𝛾𝑖 ’s, we get

𝛼 1𝑢 1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛽 1𝑣 1 + · · · + 𝛽𝑘 𝑣𝑘 = 0.

Since 𝐵 ∪ 𝐶 is linearly independent, 𝛼 1 = · · · = 𝛼𝑛 = 𝛽 1 = · · · = 𝛽𝑘 = 0. That is,

𝛼 1 = · · · = 𝛼𝑛 = 𝛽 1 = · · · = 𝛽𝑘 = 𝛾 1 = · · · = 𝛾𝑚 = 0.

Hence 𝐸 is linearly independent.


Now that 𝐸 is a basis of 𝑈 + 𝑊 , we have
dim (𝑈 ∩ 𝑊 ) + dim (𝑈 + 𝑊 ) = 𝑛 + 𝑛 + 𝑚 + 𝑘 = dim (𝑈 ) + dim (𝑊 ).
It thus follows that two distinct planes through the origin in R3 intersect on a
straight line.
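For readers who like to experiment, the identity in (1.29) can also be checked numerically. The following Python sketch (assuming the NumPy library is available) computes dim (𝑈 ), dim (𝑊 ) and dim (𝑈 + 𝑊 ) for the subspaces of (1.28-2) as matrix ranks, and then recovers dim (𝑈 ∩ 𝑊 ) from the identity; it agrees with the value 1 found by hand.

    import numpy as np

    U = np.array([[1, 0, -1], [1, -1, 0]])               # basis of U as rows
    W = np.array([[1, 1, 0], [1, 0, 1]])                 # basis of W as rows

    dim_U = np.linalg.matrix_rank(U)                     # 2
    dim_W = np.linalg.matrix_rank(W)                     # 2
    dim_sum = np.linalg.matrix_rank(np.vstack([U, W]))   # dim(U + W) = 3
    dim_cap = dim_U + dim_W - dim_sum                    # dim(U ∩ W) = 1, by (1.29)
    print(dim_U, dim_W, dim_sum, dim_cap)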

Exercises for § 1.7


1. Describe all subspaces of R3 .
2. Find bases and dimensions of the following subspaces of R5 :
(a) {(𝑎 1, 𝑎 2, 𝑎 3, 𝑎 4, 𝑎 5 ) ∈ R5 : 𝑎 1 − 𝑎 3 − 𝑎 4 = 0}. Ans: 4.
(b) {(𝑎 1, 𝑎 2, 𝑎 3, 𝑎 4, 𝑎 5 ) ∈ R5 : 𝑎 2 = 𝑎 3 = 𝑎 4, 𝑎 1 + 𝑎 5 = 0}. Ans: 2.
(c) Span of the set of vectors (1, −1, 0, 2, 1), (2, 1, −2, 0, 0), (0, −3, 2, 4, 2), (2, 4, 1, 0, 1),
(3, 3, −4, −2, −1) and (5, 7, −3, −2, 0). Ans: 3.
3. Find dim span {1 + 𝑡 2, −1 + 𝑡 + 𝑡 2, −6 + 3𝑡, 1 + 𝑡 2 + 𝑡 3, 𝑡 3 } in F3 [𝑡]. Ans: 3.


4. Determine the dimension of the vector space of all matrices, over R or C or


both (as appropriate):
(a) 𝑛 × 𝑛 matrices. Ans: 𝑛 2 .
(b) symmetric 𝑛 × 𝑛 matrices. Ans: (𝑛 2 + 𝑛)/2.

(c) skew-symmetric 𝑛 × 𝑛 matrices.


Ans: In F𝑛×𝑛 , dim F is (𝑛 2 − 𝑛)/2. In C𝑛×𝑛 , dim R is 𝑛 2 − 𝑛.
(d) hermitian 𝑛 × 𝑛 matrices.
Ans: In C𝑛×𝑛 , dim R is 𝑛 2 . In R𝑛×𝑛 , dim R is (𝑛 2 + 𝑛)/2.
(e) upper triangular 𝑛 × 𝑛 matrices.
Ans: In F𝑛×𝑛 , dim F is (𝑛 2 + 𝑛)/2. In C𝑛×𝑛 , dim R is 𝑛 2 + 𝑛.
(f) diagonal 𝑛 × 𝑛 matrices.
Ans: In F𝑛×𝑛 , dim F is 𝑛. In C𝑛×𝑛 , dim R is 2𝑛.
(g) scalar 𝑛 × 𝑛 matrices.
Ans: In F𝑛×𝑛 , dim F is 1. In C𝑛×𝑛 , dim R is 2.
5. Let 𝑈 and 𝑊 be subspaces of a finite dimensional vector space. Prove that if
dim (𝑈 ∩ 𝑊 ) = dim (𝑈 ), then 𝑈 ⊆ 𝑊 .
6. Show that if 𝑈 and 𝑊 are subspaces of R9 with dim (𝑈 ) = 5 = dim (𝑊 ), then
𝑈 ∩ 𝑊 ≠ {0}.
7. Let 𝑈 = span {(1, 2, 3), (2, 1, 1)} and 𝑊 = span {(1, 0, 1), (3, 0, −1)}. Find a
basis for 𝑈 ∩ 𝑊 . Also, find dim (𝑈 + 𝑊 ).
Ans: dim (𝑈 ∩ 𝑊 ) = 1, dim (𝑈 + 𝑊 ) = 3.
8. Compute dim (𝑈 ), dim (𝑉 ), dim (𝑈 + 𝑉 ) and dim (𝑈 ∩ 𝑉 ), where

𝑈 = {(𝑎 1, . . . , 𝑎 50 ) ∈ R50 : 𝑎𝑖 = 0 when 3 divides 𝑖}


𝑉 = {(𝑎 1, . . . , 𝑎 50 ) ∈ R50 : 𝑎𝑖 = 0 when 4 divides 𝑖}.

Ans: dim (𝑈 ) = 34, dim (𝑉 ) = 38, dim (𝑈 + 𝑉 ) = 46, dim (𝑈 ∩ 𝑉 ) = 26.


9. Let 𝑉 be the vector space of all functions from {1, 2, 3} to R. Consider each
polynomial in R[𝑡] as a function from {1, 2, 3} to R. Is the set of vectors
{𝑡, 𝑡 2, 𝑡 3, 𝑡 4, 𝑡 5 } linearly independent in 𝑉 ? Ans: No.
10. Let 𝑆 be a set consisting of 𝑛 elements. Let 𝑉 be the set of all functions from
𝑆 to R. Show that 𝑉 is a vector space of dimension 𝑛.
11. Prove (1.26-1.27).
12. Let 𝑉 be the set of all functions 𝑓 (𝑡) having a power series expansion for
|𝑡 | < 1. Show that 𝑉 is an infinite dimensional vector space.

1.8 Extracting a basis


A question of computational importance is that given a finite list of vectors, how do
we construct a basis for the subspace spanned by these vectors?
Given a list of vectors 𝑣 1, . . . , 𝑣𝑛 in a vector space 𝑉 , we need to systematically
eliminate those vectors which are linear combinations of the previous ones in the
list, using (1.14), so that the span of the shortened list will be the same as that of
the given list. For example, if 𝑣𝑖 = 0, then 𝑣𝑖 is a linear combination of others; also,
𝑣𝑖 ∈ span (∅). That is, a zero vector can safely be dropped from the list.
The notion of elementary operations helps in achieving this, at least in F𝑛 . We
observe the following connections between linear combinations, linear dependence
and linear independence of rows and columns of a matrix, and its row reduced
echelon form (RREF).

(1.30) Observation In the RREF of 𝐴 suppose 𝑅𝑖1, . . . , 𝑅𝑖𝑟 are the rows of 𝐴
which have become the nonzero rows in the RREF, and other rows have become the
zero rows. Also, suppose 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 for 𝑗1 < · · · < 𝑗𝑟 , are the columns of 𝐴 which
have become the pivotal columns in the RREF, other columns being non-pivotal.
Then the following are true:
1. The rows 𝑅𝑖1, . . . , 𝑅𝑖𝑟 are linearly independent; and the other rows of 𝐴 are
linear combinations of 𝑅𝑖1, . . . , 𝑅𝑖𝑟 .
2. The columns 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 have respectively become 𝑒 1, . . . , 𝑒𝑟 in the RREF.
3. The columns 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 are linearly independent; and other columns of 𝐴
are linear combinations of 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 .
4. If 𝑒 1, . . . , 𝑒𝑘 are all the pivotal columns in the RREF that occur to the
left of a non-pivotal column, then the non-pivotal column is in the form
(𝑎 1, . . . , 𝑎𝑘 , 0, . . . , 0)𝑇 . Further, if a column 𝐶 in 𝐴 has become this non-pivotal
column in the RREF, then 𝐶 = 𝑎 1𝐶 𝑗1 + · · · + 𝑎𝑘 𝐶 𝑗𝑘 .

The above observations can be used in two ways. Let 𝑣 1, . . . , 𝑣𝑚 ∈ F𝑛 and let
𝑈 = span {𝑣 1, . . . , 𝑣𝑚 }. We consider these vectors as row vectors, form a matrix of
𝑚 rows where the 𝑖th row is 𝑣𝑖 . Then reduce this matrix to its RREF. The zero rows
are obviously in the span of the pivoted rows. The pivoted rows form a basis for 𝑈 .
In the second method, we consider the vectors 𝑣 1, . . . , 𝑣𝑚 as column vectors,
and form a matrix with its 𝑗th column as 𝑣 𝑗 . Then we reduce the matrix to its RREF.
If the column indices of the pivoted columns are 𝑖 1, . . . , 𝑖𝑟 , then 𝑣𝑖 1 , . . . , 𝑣𝑖𝑟 form a
basis for 𝑈 . Further, using the last item in the above observation, we can also find
out exactly how a vector among 𝑣 1, . . . , 𝑣𝑚 not in the basis is expressed as a linear
combination of the basis vectors.

(1.31) Example
Let 𝑈 = span {𝑣 1, 𝑣 2, 𝑣 3, 𝑣 4, 𝑣 5 } in R4, where 𝑣 1 = (1, 1, 0, −1), 𝑣 2 = (2, −1, 1, 0),
𝑣 3 = (1, 2, −2, 1), 𝑣 4 = (1, 5, −3, −1), 𝑣 5 = (4, −1, 0, 2). It is required to extract a
basis for 𝑈 from the list of these vectors.

Method 1: Taking the vectors as rows of a matrix and reducing it to its RREF, we have

    [ 1  1  0 −1 ]             [ 1  0  0   1/5 ]
    [ 2 −1  1  0 ]             [ 0  1  0  −6/5 ]
    [ 1  2 −2  1 ]  −−RREF−→   [ 0  0  1  −8/5 ]
    [ 1  5 −3 −1 ]             [ 0  0  0    0  ]
    [ 4 −1  0  2 ]             [ 0  0  0    0  ]

Discarding the zero rows, we obtain the basis for 𝑈 as

    {(1, 0, 0, 1/5), (0, 1, 0, −6/5), (0, 0, 1, −8/5)}.

Method 2: Taking the vectors as columns of a matrix and reducing it to its RREF, we have

    [  1  2  1  1  4 ]             [ 1  0  0  2 −1 ]
    [  1 −1  2  5 −1 ]             [ 0  1  0 −1  2 ]
    [  0  1 −2 −3  0 ]  −−RREF−→   [ 0  0  1  1  1 ]
    [ −1  0  1 −1  2 ]             [ 0  0  0  0  0 ]

In the RREF, the first three columns are pivoted columns and the last two are non-pivoted. Therefore, a basis for 𝑈 is {𝑣 1, 𝑣 2, 𝑣 3 }. Further, the entries in the non-pivoted columns show that 𝑣 4 = 2𝑣 1 − 𝑣 2 + 𝑣 3 and 𝑣 5 = −𝑣 1 + 2𝑣 2 + 𝑣 3 .
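The row reduction above can also be delegated to a computer algebra system. The following Python sketch (assuming the SymPy library is available) carries out Method 2 of (1.31): the pivot column indices identify which of the given vectors form a basis, and the non-pivoted columns of the RREF give the coefficients of the remaining vectors.

    from sympy import Matrix

    v = [(1, 1, 0, -1), (2, -1, 1, 0), (1, 2, -2, 1), (1, 5, -3, -1), (4, -1, 0, 2)]
    A = Matrix(v).T                 # the vectors v1, ..., v5 as columns of a 4x5 matrix

    R, pivots = A.rref()
    print(pivots)                   # (0, 1, 2): v1, v2, v3 form a basis of U
    print(R.col(3), R.col(4))       # v4 = 2 v1 - v2 + v3 and v5 = -v1 + 2 v2 + v3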

We can extend this method of extracting a basis to F𝑛 [𝑡] by considering the


coefficients of the powers of 𝑡 . It is illustrated in the following example.

(1.32) Example
Let 𝑈 = span {𝑡 + 2𝑡 2 + 3𝑡 3, 1 + 2𝑡 2 + 4𝑡 3, −1 +𝑡 −𝑡 3, 3 −𝑡 + 4𝑡 2 + 9𝑡 3, 1 +𝑡 +𝑡 2 +𝑡 3 }.
To construct a basis for 𝑈 , we write the polynomial 𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 as the tuple
(𝑎, 𝑏, 𝑐, 𝑑) and then follow the earlier methods of reducing the appropriate matrix to
its RREF.
Method 1: Here, we take the tuples corresponding to the given polynomials as rows of a matrix and convert it to its RREF:

    [  0  1  2  3 ]             [ 1  0  0  0 ]
    [  1  0  2  4 ]             [ 0  1  0 −1 ]
    [ −1  1  0 −1 ]  −−RREF−→   [ 0  0  1  2 ]
    [  3 −1  4  9 ]             [ 0  0  0  0 ]
    [  1  1  1  1 ]             [ 0  0  0  0 ]

Writing the nonzero rows as the corresponding polynomials, we get a basis for 𝑈 , namely, {1, 𝑡 − 𝑡 3, 𝑡 2 + 2𝑡 3 }.
Method 2: We write the same tuples as columns of a matrix and convert the matrix to its RREF:

    [ 0  1 −1  3  1 ]             [ 1  0  1 −1  0 ]
    [ 1  0  1 −1  1 ]             [ 0  1 −1  3  0 ]
    [ 2  2  0  4  1 ]  −−RREF−→   [ 0  0  0  0  1 ]
    [ 3  4 −1  9  1 ]             [ 0  0  0  0  0 ]

In the RREF, columns 1, 2 and 5 are pivoted; thus a basis for 𝑈 consists of the first, second, and the fifth polynomial. That is, a basis for 𝑈 is

    {𝑡 + 2𝑡 2 + 3𝑡 3, 1 + 2𝑡 2 + 4𝑡 3, 1 + 𝑡 + 𝑡 2 + 𝑡 3 }.

Further, the entries in the non-pivoted columns in the RREF show that

    −1 + 𝑡 − 𝑡 3 = (𝑡 + 2𝑡 2 + 3𝑡 3 ) − (1 + 2𝑡 2 + 4𝑡 3 ) and
    3 − 𝑡 + 4𝑡 2 + 9𝑡 3 = −(𝑡 + 2𝑡 2 + 3𝑡 3 ) + 3(1 + 2𝑡 2 + 4𝑡 3 ).

Exercises for § 1.8


1. In R3, what is dim (span {𝑒 1 + 𝑒 2, 𝑒 2 + 𝑒 3, 𝑒 3 + 𝑒 1 })? Ans: 3.
2. Find a basis and dimension of the subspace of R5 that is spanned by the vectors
(1, −1, 0, 2, 1), (2, 1, −2, 0, 0), (0, −3, 2, 4, 2), (3, 3, −4, −2, −1), (2, 4, 1, 0, 1),
(5, 7, −3, −2, 0). Ans: dim = 3.
3. Determine a basis for span {1 + 𝑡 2, −1 + 𝑡 + 𝑡 2, −6 + 3𝑡, 1 + 𝑡 2 + 𝑡 3, 𝑡 3 }.
Ans: dim = 3.
4. In each of the following subspaces 𝑈 and 𝑊 of a vector space 𝑉 , determine
the bases and dimensions of 𝑈 , 𝑊 , 𝑈 + 𝑊 and of 𝑈 ∩ 𝑊 .
(a) 𝑉 = R3, 𝑈 = span {(1, 2, 3), (2, 1, 1)}, 𝑊 = span {(1, 0, 1), (3, 0, −1)}.
Ans: dim (𝑈 ) = 2, dim (𝑊 ) = 2, dim (𝑈 ∩ 𝑊 ) = 1, dim (𝑈 + 𝑊 ) = 3.
(b) 𝑉 = R4, 𝑈 = span {(1, 0, 2, 0), (1, 0, 3, 0), },
𝑊 = span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)}.
Ans: dim (𝑈 ) = 2, dim (𝑊 ) = 3, dim (𝑈 ∩ 𝑊 ) = 1, dim (𝑈 + 𝑊 ) = 4.
(c) 𝑉 = C4, 𝑈 = span {(1, 0, 0, 2), (3, 1, 0, 2), (7, 0, 5, 2)},
𝑊 = span {(1, 0, 3, 2), (10, 4, 14, 8), (1, 1, −1, −1)}.
Ans: dim (𝑈 ) = 3, dim (𝑊 ) = 2, dim (𝑈 ∩ 𝑊 ) = 1, dim (𝑈 + 𝑊 ) = 4.
5. Let 𝑈 = {(𝑎, 𝑏, 𝑐, 𝑑) ∈ R4 : 𝑏 = −𝑎} and 𝑊 = {(𝑎, 𝑏, 𝑐, 𝑑) ∈ R4 : 𝑐 = −𝑎}.
Find the dimensions of the subspaces 𝑈 , 𝑊 , 𝑈 + 𝑊 and 𝑈 ∩ 𝑊 of R4 .
Ans: dim (𝑈 ) = 3, dim (𝑊 ) = 3, dim (𝑈 ∩ 𝑊 ) = 2, dim (𝑈 + 𝑊 ) = 4.
6. Determine conditions on the scalars 𝑎 and 𝑏 so that the vectors (1, 𝑎, 4),
(1, 3, 1) and (0, 2, 𝑏) are linearly dependent. Ans: 𝑏 (𝑎 − 3) = 6.

7. Let 𝑉 be a vector space. Let 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑉 . Show the following:


(a) If 𝑣 1, . . . , 𝑣𝑛 span 𝑉 , then 𝑣 1, 𝑣 2 − 𝑣 1, . . . , 𝑣𝑛 − 𝑣 1 span 𝑉 .
(b) If 𝑣 1, . . . , 𝑣𝑛 are linearly independent, then 𝑣 1, 𝑣 2 − 𝑣 1, . . . , 𝑣𝑛 − 𝑣 1 are
linearly independent.
2
Inner Product Spaces

2.1 Inner Products


Vectors in the plane or in the usual three dimensional space are defined as entities
having certain lengths and directions. In the last chapter we have only abstracted
the notion of a vector via the operations of addition and scalar multiplication. We
wish to give directions and lengths to our abstract vectors in a vector space. Notice
that directions are relatively fixed by defining angle between vectors. In the plane,
both angle and length are defined by the dot product. For instance, writing k𝑥 k for
the length of a vector 𝑥, we have
𝑥 ·𝑦
k𝑥 k 2 = 𝑥 · 𝑥,

cos ∠(𝑥, 𝑦) = .
k𝑥 k k𝑦 k
We will refer to the dot product as an inner product. As usual, we will define
this inner product abstractly, as a certain map satisfying some properties. In fact,
we take up those fundamental properties of the dot product in connection with the
already available operations of addition and scalar multiplication.
An inner product on a vector space 𝑉 over F is a map from 𝑉 × 𝑉 to F, which
associates a pair of vectors 𝑥, 𝑦 ∈ 𝑉 to a scalar h𝑥, 𝑦i in F satisfying the following
properties:
(1) For each 𝑥 ∈ 𝑉 , h𝑥, 𝑥i ≥ 0.
(2) For each 𝑥 ∈ 𝑉 , h𝑥, 𝑥i = 0 iff 𝑥 = 0.
(3) For all 𝑥, 𝑦, 𝑧 ∈ 𝑉 , h𝑥 + 𝑦, 𝑧i = h𝑥, 𝑧i + h𝑦, 𝑧i.
(4) For each 𝛼 ∈ F and for all 𝑥, 𝑦 ∈ 𝑉 , h𝛼𝑥, 𝑦i = 𝛼 h𝑥, 𝑦i.
(5) For all 𝑥, 𝑦 ∈ 𝑉 , h𝑦, 𝑥i is the complex conjugate of h𝑥, 𝑦i.
The usual dot product on R2 is an inner product on R2 .

(2.1) Example
1. For 𝑥 = (𝑎 1, . . . , 𝑎𝑛 ), 𝑦 = (𝑏 1, . . . , 𝑏𝑛 ) ∈ R𝑛 , h𝑥, 𝑦i = Σ_{𝑗=1}^{𝑛} 𝑎 𝑗 𝑏 𝑗 defines an inner product. It is called the standard inner product on R𝑛 .
2. For 𝑥 = (𝑎 1, . . . , 𝑎𝑛 ), 𝑦 = (𝑏 1, . . . , 𝑏𝑛 ) ∈ C𝑛 , h𝑥, 𝑦i = Σ_{𝑗=1}^{𝑛} 𝑎 𝑗 𝑏̄ 𝑗 , with the bar denoting complex conjugation, defines an inner product. It is called the standard inner product on C𝑛 . Notice that h𝑥, 𝑦i = Σ_{𝑗=1}^{𝑛} 𝑎 𝑗 𝑏 𝑗 , without the conjugation, is not an inner product on C𝑛 .
3. In R𝑛 [𝑡], for 𝑝 (𝑡) = 𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 , 𝑞(𝑡) = 𝑏 0 + 𝑏 1𝑡 + · · · + 𝑏𝑛 𝑡 𝑛 , take h𝑝, 𝑞i = Σ_{𝑖=0}^{𝑛} 𝑎𝑖 𝑏𝑖 . This defines an inner product on R𝑛 [𝑡].
4. In C𝑛 [𝑡], for 𝑝 (𝑡) = 𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 , 𝑞(𝑡) = 𝑏 0 + 𝑏 1𝑡 + · · · + 𝑏𝑛 𝑡 𝑛 , take h𝑝, 𝑞i = Σ_{𝑖=0}^{𝑛} 𝑎𝑖 𝑏̄𝑖 . This defines an inner product on C𝑛 [𝑡].
5. Let 𝑡 1, 𝑡 2, . . . , 𝑡𝑛+1 be distinct real numbers. For any 𝑝, 𝑞 ∈ R𝑛 [𝑡], h𝑝, 𝑞i = Σ_{𝑖=1}^{𝑛+1} 𝑝 (𝑡𝑖 )𝑞(𝑡𝑖 ) defines an inner product on R𝑛 [𝑡].
6. Consider each polynomial 𝑝 ∈ R𝑛 [𝑡] as a function from [0, 1] to R. For 𝑝, 𝑞 ∈ R𝑛 [𝑡], define h𝑝, 𝑞i = ∫_0^1 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 . This is an inner product on R𝑛 [𝑡].
7. Let 𝑉 be a finite dimensional vector space with an ordered basis 𝐵 = {𝑢 1, 𝑢 2, . . . , 𝑢𝑛 }. For 𝑥 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑢𝑖 , 𝑦 = Σ_{𝑖=1}^{𝑛} 𝛽𝑖 𝑢𝑖 , define h𝑥, 𝑦i𝐵 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝛽̄𝑖 . It is an inner product on 𝑉 .
8. For 𝐴 = [𝑎𝑖 𝑗 ] and 𝐵 = [𝑏𝑖 𝑗 ] in F𝑚×𝑛 , define h𝐴, 𝐵i = Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛} 𝑎𝑖 𝑗 𝑏̄𝑖 𝑗 . It is an inner product on F𝑚×𝑛 .
9. For 𝑓 , 𝑔 ∈ 𝐶 [𝑎, 𝑏], h𝑓 , 𝑔i = ∫_𝑎^𝑏 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡 is an inner product on 𝐶 [𝑎, 𝑏].
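A small computational note on items 1 and 2 (a Python sketch, assuming the NumPy library): for complex vectors the conjugation is essential, and since np.vdot conjugates its first argument, the convention h𝑥, 𝑦i = Σ 𝑎 𝑗 𝑏̄ 𝑗 used here corresponds to np.vdot(y, x).

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, -1.0, 2.0])
    print(np.dot(x, y))                  # standard inner product on R^n

    u = np.array([1 + 1j, 2j])
    v = np.array([3 - 1j, 1 + 1j])
    print(np.vdot(v, u))                 # <u, v> = sum_j u_j * conj(v_j)
    print(np.sum(u * np.conj(v)))        # the same value, written out directly
    # np.dot(u, v) omits the conjugate and does not give an inner product on C^n.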

A vector space with an inner product on it is called an inner product space.


Read the abbreviation ips as ‘inner product space’.
An ips with the underlying field as R is called a real ips or an Euclidean space.
An ips with F = C is called a complex ips or a unitary space.
Notice that in a real ips, the fifth property reads as h𝑦, 𝑥i = h𝑥, 𝑦i for all vectors 𝑥
and 𝑦.

(2.2) Theorem
Let 𝑉 be an ips. For all 𝑥, 𝑦, 𝑧 ∈ 𝑉 and for each 𝛼 ∈ F,

h𝑥, 𝑦 + 𝑧i = h𝑥, 𝑦i + h𝑥, 𝑧i and h𝑥, 𝛼𝑦i = 𝛼̄ h𝑥, 𝑦i.

Proof. h𝑥, 𝑦 + 𝑧i is the complex conjugate of h𝑦 + 𝑧, 𝑥i = h𝑦, 𝑥i + h𝑧, 𝑥i; taking conjugates term by term, h𝑥, 𝑦 + 𝑧i = h𝑥, 𝑦i + h𝑥, 𝑧i.
Similarly, h𝑥, 𝛼𝑦i is the complex conjugate of h𝛼𝑦, 𝑥i = 𝛼 h𝑦, 𝑥i; hence h𝑥, 𝛼𝑦i = 𝛼̄ h𝑥, 𝑦i.
Let 𝑉 be an ips. For any 𝑥 ∈ 𝑉 , the length of 𝑥, also called the norm of 𝑥 is
defined as

    k𝑥 k = √(h𝑥, 𝑥i).
Notice that h𝑥, 𝑥i ≥ 0 implies that k𝑥 k is a real number.
In any ips 𝑉 , the norm satisfies the following properties:
1. For each 𝑥 ∈ 𝑉 , k𝑥 k ≥ 0.
2. For each 𝑥 ∈ 𝑉 , 𝑥 = 0 iff k𝑥 k = 0.
3. For each 𝑥 ∈ 𝑉 and for each 𝛼 ∈ F, k𝛼𝑥 k = |𝛼 |k𝑥 k.

Other most used properties of the norm, in an ips, are proved in the following
theorem.

(2.3) Theorem
For all vectors 𝑥, 𝑦 in an ips, the following are true:
(1) (Parallelogram Law) k𝑥 + 𝑦 k 2 + k𝑥 − 𝑦 k 2 = 2k𝑥 k 2 + 2k𝑦 k 2 .
(2) (Cauchy-Schwarz Inequality) |h𝑥, 𝑦i| ≤ k𝑥 k k𝑦 k.
Further, equality holds iff one of 𝑥, 𝑦 is a scalar multiple of the other.
(3) (Triangle Inequality) k𝑥 + 𝑦 k ≤ k𝑥 k + k𝑦 k.
(4) (Reverse Triangle Inequality) k𝑥 k − k𝑦 k ≤ k𝑥 − 𝑦 k.

Proof. (1) k𝑥 + 𝑦 k 2 = h𝑥 + 𝑦, 𝑥 + 𝑦i = h𝑥, 𝑥i + h𝑥, 𝑦i + h𝑦, 𝑥i + h𝑦, 𝑦i. Similarly,


expand k𝑥 − 𝑦 k 2 and complete the proof.
(2) If 𝑦 = 0, then h𝑥, 𝑦i = 0 = k𝑥 k k𝑦 k. Assume that 𝑦 ≠ 0. Set 𝛼 = h𝑥, 𝑦i/h𝑦, 𝑦i. Then 𝛼̄ = h𝑦, 𝑥i/h𝑦, 𝑦i. Consequently, 𝛼̄ h𝑥, 𝑦i = 𝛼 h𝑦, 𝑥i = 𝛼 𝛼̄ h𝑦, 𝑦i = |h𝑥, 𝑦i| 2 /k𝑦 k 2 . Now,

    k𝑥 − 𝛼𝑦 k 2 = h𝑥 − 𝛼𝑦, 𝑥 − 𝛼𝑦i = h𝑥, 𝑥i − 𝛼̄ h𝑥, 𝑦i − 𝛼 h𝑦, 𝑥i + 𝛼 𝛼̄ h𝑦, 𝑦i = k𝑥 k 2 − |h𝑥, 𝑦i| 2 /k𝑦 k 2 .

As 0 ≤ k𝑥 − 𝛼𝑦k 2, we have |h𝑥, 𝑦i| ≤ k𝑥 k k𝑦 k.


Next, |h𝑥, 𝑦i| = k𝑥 k k𝑦 k iff 𝑦 = 0 or k𝑥 − 𝛼𝑦 k 2 = 0 iff 𝑦 = 0 or 𝑥 = 𝛼𝑦 iff one is a
scalar multiple of the other.
(3) k𝑥 + 𝑦 k 2 = h𝑥 + 𝑦, 𝑥 + 𝑦i = k𝑥 k 2 + h𝑥, 𝑦i + h𝑦, 𝑥i + k𝑦 k 2
= k𝑥 k 2 + 2 Reh𝑥, 𝑦i + k𝑦 k 2 ≤ k𝑥 k 2 + 2|h𝑥, 𝑦i| + k𝑦 k 2
≤ k𝑥 k 2 + 2k𝑥 kk𝑦 k + k𝑦 k 2 = (k𝑥 k + k𝑦 k) 2 .
(4) Using (3), we have k𝑥 k ≤ k𝑥 − 𝑦 k + k𝑦 k. Thus k𝑥 k − k𝑦 k ≤ k𝑥 − 𝑦 k. Similarly,
k𝑦 k − k𝑥 k ≤ k𝑥 − 𝑦 k. Then the required inequality follows.
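The inequalities in (2.3) are easy to test numerically. The sketch below (Python, assuming NumPy) checks the Cauchy-Schwarz and triangle inequalities on randomly generated vectors of R5; it is only a sanity check, not a proof.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(5), rng.standard_normal(5)

    print(abs(np.dot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y))      # Cauchy-Schwarz
    print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # triangle inequality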

Exercises for § 2.1


1. Check that the maps given in (2.1) are inner products on the respective vector
spaces.
2. Why is the map h , i not an inner product on the given vector space?

(a) h𝑥, 𝑦i = 𝑎𝑐 for 𝑥 = (𝑎, 𝑏), 𝑦 = (𝑐, 𝑑) in R2 .
(b) h𝑥, 𝑦i = 𝑎𝑐 for 𝑥 = (𝑎, 𝑏), 𝑦 = (𝑐, 𝑑) in C2 .
(c) h𝑥, 𝑦i = 𝑎𝑐 − 𝑏𝑑 for 𝑥 = (𝑎, 𝑏), 𝑦 = (𝑐, 𝑑) in R2 .
(d) h𝑝, 𝑞i = ∫_0^1 𝑝 ′ (𝑡)𝑞(𝑡) 𝑑𝑡 for 𝑝, 𝑞 ∈ R[𝑡].
(e) h𝑥, 𝑦i = ∫_0^1 𝑥 ′ (𝑡)𝑦 ′ (𝑡) 𝑑𝑡 for 𝑥, 𝑦 ∈ 𝐶 1 [0, 1].
(f) h𝑥, 𝑦i = ∫_0^1 𝑥 ′ (𝑡)𝑦 ′ (𝑡) 𝑑𝑡 for 𝑥, 𝑦 ∈ 𝐶 1 [0, 1].
(g) h𝑓 , 𝑔i = ∫_0^{1/2} 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡 for 𝑓 , 𝑔 ∈ 𝐶 [0, 1].
Ans: For each of these, construct a nonzero vector 𝑣 with h𝑣, 𝑣i = 0.


3. Let 𝐴 = [𝑎𝑖 𝑗 ] ∈ R2×2 . For 𝑥, 𝑦 ∈ R2, let 𝑓𝐴 (𝑥, 𝑦) = 𝑦 t𝐴𝑥 . Show that 𝑓𝐴 is an
inner product on R2 iff 𝑎 12 = 𝑎 21, 𝑎 11 > 0, 𝑎 22 > 0, and 𝑎 11𝑎 22 − 𝑎 12𝑎 21 > 0.
4. Let 𝐵 be a basis for a finite dimensional ips 𝑉 . Let 𝑦 ∈ 𝑉 be such that h𝑥, 𝑦i = 0
for all 𝑥 ∈ 𝐵. Show that 𝑦 = 0.
5. Let 𝑉 be an inner product space, and let 𝑥, 𝑦 ∈ 𝑉 . Show the following:
(a) k𝑥 k ≥ 0.
(b) 𝑥 = 0 iff k𝑥 k = 0.
(c) k𝛼𝑥 k = |𝛼 |k𝑥 k, for all 𝛼 ∈ F.
(d) k𝑥 + 𝛼𝑦k = k𝑥 − 𝛼𝑦 k for all 𝛼 ∈ F iff h𝑥, 𝑦i = 0.
(e) If k𝑥 + 𝑦 k = k𝑥 k + k𝑦 k, then at least one of 𝑥, 𝑦 is a scalar multiple of
the other.
6. Let 𝑉 be a complex ips. Show that Reh𝑖𝑥, 𝑦i = −Imh𝑥, 𝑦i for all 𝑥, 𝑦 ∈ 𝑉 .
7. (Polarization Identity): Let 𝑉 be an ips over F. Let 𝑥, 𝑦 ∈ 𝑉 . Show the
following:
(a) If F = R, then 4h𝑥, 𝑦i = k𝑥 + 𝑦 k 2 − k𝑥 − 𝑦 k 2 .
(b) If F = C, then 4h𝑥, 𝑦i = k𝑥 + 𝑦 k 2 − k𝑥 − 𝑦 k 2 + 𝑖 k𝑥 + 𝑖𝑦 k 2 − 𝑖 k𝑥 − 𝑖𝑦 k 2 .
8. For 1 ≤ 𝑗 ≤ 𝑛, let 𝑎 𝑗 ≥ 0 and 𝑏 𝑗 ≥ 0. Show that
𝑛
Õ 2 𝑛
Õ 𝑛
Õ
2 2
𝑎 𝑗𝑏 𝑗 ≤ ( 𝑗𝑎 𝑗 ) 𝑎 𝑗 /𝑗 .
𝑗=1 𝑗=1 𝑗=1

2.2 Orthonormal basis


In the presence of an inner product, we can define the non-obtuse angle between
two vectors. Let 𝑥 and 𝑦 be nonzero vectors in an ips. The angle between 𝑥 and 𝑦,
denoted by 𝜃 (𝑥, 𝑦), is defined by
    cos 𝜃 (𝑥, 𝑦) = |h𝑥, 𝑦i| / (k𝑥 k k𝑦 k).
Notice that the angle is well defined due to the Cauchy-Schwarz inequality. We will
mainly work with a particular case of the angle, that is, when it is 𝜋/2.
Let 𝑥, 𝑦 be vectors in an ips. We say that 𝑥 is orthogonal to 𝑦 iff h𝑥, 𝑦i = 0. When
𝑥 is orthogonal to 𝑦, we write 𝑥 ⊥ 𝑦.
Thus the zero vector is orthogonal to every vector, and 𝑥 ⊥ 𝑦 implies 𝑦 ⊥ 𝑥 .

(2.4) Example

1. Let {𝑒 1, 𝑒 2, . . . , 𝑒𝑛 } be the standard basis for R𝑛 . If 𝑖 ≠ 𝑗, then 𝑒𝑖 ⊥ 𝑒 𝑗 .


2. In R2 [𝑡], with the inner product h𝑝 (𝑡), 𝑞(𝑡)i = ∫_{−1}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡, the vectors 1 and 𝑡 are orthogonal to each other.
3. In F𝑛 [𝑡] with the inner product as

h𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 , 𝑏 0 + 𝑏 1𝑡 + · · · + 𝑏𝑛 𝑡 𝑛 i = 𝑎 0𝑏 0 + 𝑎 1𝑏 1 + · · · + 𝑎𝑛𝑏 𝑛 ,

the polynomials 𝑡 𝑗 and 𝑡 𝑘 are orthogonal to each other, provided 𝑗 ≠ 𝑘.


4. In 𝐶 [0, 2𝜋], define h𝑓 , 𝑔i = ∫_0^{2𝜋} 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡 . For all 𝑚, 𝑛 ∈ N, ∫_0^{2𝜋} cos 𝑚𝑡 sin 𝑛𝑡 𝑑𝑡 = 0. Hence cos 𝑚𝑡 ⊥ sin 𝑛𝑡 .

(2.5) Theorem (Pythagoras)


Let 𝑥 and 𝑦 be vectors in an ips 𝑉 .
(1) If 𝑥 ⊥ 𝑦, then k𝑥 + 𝑦 k 2 = k𝑥 k 2 + k𝑦 k 2 .
(2) If 𝑉 is a real ips, then k𝑥 + 𝑦 k 2 = k𝑥 k 2 + k𝑦 k 2 implies 𝑥 ⊥ 𝑦.
Proof. (1) h𝑥 + 𝑦, 𝑥 + 𝑦i = k𝑥 k 2 + h𝑥, 𝑦i + h𝑦, 𝑥i + k𝑦 k 2 . Since 𝑥 ⊥ 𝑦, h𝑥, 𝑦i = 0 =
h𝑦, 𝑥i. Then the equality follows.
(2) Suppose that 𝑉 is a real ips. Then h𝑥, 𝑦i = h𝑦, 𝑥i. Now, k𝑥 + 𝑦 k 2 = k𝑥 k 2 + k𝑦 k 2
implies h𝑥, 𝑦i = 0.
For a complex ips, the conclusion of the second statement in Pythagoras theorem
need not be true. For example, consider 𝑉 = C, a complex ips with h𝑥, 𝑦i = 𝑥 𝑦̄, as usual. Then k1 + 𝑖 k 2 = (1 + 𝑖)(1 − 𝑖) = 1 + 1 = k1k 2 + k𝑖 k 2 . But h1, 𝑖i = 1 · (−𝑖) = −𝑖 ≠ 0.
We extend the notion of orthogonality to a set of vectors.
Let 𝑆 be a nonempty subset of nonzero vectors in an ips 𝑉 . 𝑆 is called an
orthogonal set in 𝑉 iff for all 𝑥, 𝑦 ∈ 𝑆 with 𝑥 ≠ 𝑦, we have 𝑥 ⊥ 𝑦.
𝑆 is called an orthonormal set in 𝑉 iff 𝑆 is an orthogonal set in 𝑉 , and k𝑥 k = 1
for each 𝑥 ∈ 𝑆.

(2.6) Example

1. The standard basis of R𝑛 is an orthonormal set.


2. Consider 𝐶 [0, 2𝜋] as a real ips with h𝑓 , 𝑔i = ∫_0^{2𝜋} 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡 as the inner product. The set of functions {cos 𝑚𝑡 : 𝑚 ∈ N} is an orthogonal set in 𝐶 [0, 2𝜋]. But ∫_0^{2𝜋} cos² 𝑡 𝑑𝑡 ≠ 1. Hence, the set is not orthonormal.

3. However, {(cos 𝑚𝑡)/√𝜋 : 𝑚 ∈ N} is an orthonormal set in 𝐶 [0, 2𝜋].
4. Any singleton set {𝑣 } for 𝑣 ≠ 0, is an orthogonal set in any ips 𝑉 .

Notice that if {𝑣 1, . . . , 𝑣𝑛 } is an orthonormal set, then for 1 ≤ 𝑖, 𝑗 ≤ 𝑛,

    h𝑣𝑖 , 𝑣 𝑗 i = 𝛿𝑖 𝑗 , where 𝛿𝑖 𝑗 = 1 if 𝑖 = 𝑗, and 𝛿𝑖 𝑗 = 0 if 𝑖 ≠ 𝑗 .

Further, if {𝑣 1, . . . , 𝑣𝑛 } is an orthogonal set, then {𝑣 1 /k𝑣 1 k, . . . , 𝑣𝑛 /k𝑣𝑛 k} is an or-


thonormal set.
Each orthogonal or orthonormal set in (2.6) is linearly independent; this is not a
mere coincidence.

(2.7) Theorem
Every orthogonal (orthonormal) set is linearly independent.

Proof. Let 𝑆 be an orthogonal set in an ips 𝑉 . Let 𝑛 ∈ N, 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑆, and let


𝛼 1, . . . , 𝛼𝑛 ∈ F. Suppose Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑣𝑖 = 0. Let 1 ≤ 𝑗 ≤ 𝑛. Taking the inner product with 𝑣 𝑗 , and using the fact that h𝑣𝑖 , 𝑣 𝑗 i = 0 for 𝑖 ≠ 𝑗, we have

    0 = hΣ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑣𝑖 , 𝑣 𝑗 i = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 h𝑣𝑖 , 𝑣 𝑗 i = 𝛼 𝑗 h𝑣 𝑗 , 𝑣 𝑗 i.

Since 𝑣 𝑗 ≠ 0, 𝛼 𝑗 = 0. Therefore, 𝑆 is linearly independent.


Like maximal linearly independent sets, a maximal orthogonal set is an orthogonal
set which, when enlarged by including a vector not in the set, is no longer orthogonal. In general, a maximal orthogonal set is called an orthogonal
basis for the ips. Similarly, an orthonormal basis is a maximal orthonormal set. In
infinite dimensional ips, an orthonormal basis may not span the space. However,
in a finite dimensional ips, an orthogonal or an orthonormal basis spans the space;
and hence, it is a basis in the conventional sense. Since we are concerned with
finite dimensional ips only, we define orthogonal and orthonormal bases as in the
following.
Let 𝑉 be a finite dimensional ips. An orthogonal set which is also a basis for 𝑉 is
called an orthogonal basis of 𝑉 . Similarly, an orthonormal set which is also a basis
for 𝑉 is called an orthonormal basis of 𝑉 .
In case, we have an orthonormal basis for an ips, each vector can be expressed as
a linear combination of the basis vectors, where the coefficients are just the inner
products of the vector with the basis vectors. Moreover, the length of any vector
can be expressed in a nice way.

(2.8) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } be an orthonormal basis of an ips 𝑉 . Let 𝑥 ∈ 𝑉 . Then
(1) (Fourier Expansion) 𝑥 = Σ_{𝑗=1}^{𝑛} h𝑥, 𝑣 𝑗 i𝑣 𝑗 ;
(2) (Parseval’s Identity) k𝑥 k 2 = Σ_{𝑗=1}^{𝑛} |h𝑥, 𝑣 𝑗 i| 2 .

Proof. (1) Since 𝐵 is a basis of 𝑉 , 𝑥 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑣𝑖 for some scalars 𝛼𝑖 . Now,

    h𝑥, 𝑣 𝑗 i = hΣ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑣𝑖 , 𝑣 𝑗 i = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝛿𝑖 𝑗 = 𝛼 𝑗 for 1 ≤ 𝑗 ≤ 𝑛.

Therefore, 𝑥 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑣𝑖 = Σ_{𝑗=1}^{𝑛} 𝛼 𝑗 𝑣 𝑗 = Σ_{𝑗=1}^{𝑛} h𝑥, 𝑣 𝑗 i𝑣 𝑗 .

(2) Using (1), we have

    k𝑥 k 2 = hΣ_{𝑗=1}^{𝑛} 𝛼 𝑗 𝑣 𝑗 , Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑣𝑖 i = Σ_{𝑗=1}^{𝑛} Σ_{𝑖=1}^{𝑛} 𝛼 𝑗 𝛼̄𝑖 h𝑣 𝑗 , 𝑣𝑖 i
          = Σ_{𝑗=1}^{𝑛} Σ_{𝑖=1}^{𝑛} 𝛼 𝑗 𝛼̄𝑖 𝛿 𝑗𝑖 = Σ_{𝑗=1}^{𝑛} 𝛼 𝑗 𝛼̄ 𝑗 = Σ_{𝑗=1}^{𝑛} |𝛼 𝑗 | 2 = Σ_{𝑗=1}^{𝑛} |h𝑥, 𝑣 𝑗 i| 2 .
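Both parts of (2.8) can be verified numerically. In the Python sketch below (assuming NumPy), the columns of the factor Q in a QR factorization of a random matrix serve as an orthonormal basis of R4.

    import numpy as np

    rng = np.random.default_rng(1)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # columns: an orthonormal basis v_1, ..., v_4
    x = rng.standard_normal(4)

    coeffs = Q.T @ x                                    # the scalars <x, v_j>
    print(np.allclose(x, Q @ coeffs))                   # Fourier expansion: x = sum_j <x, v_j> v_j
    print(np.isclose(np.dot(x, x), np.sum(coeffs**2)))  # Parseval's identity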

If the finite set 𝐵 is not a basis for the ips 𝑉 , then instead of an equality in Parseval’s
identity, we have an inequality.

(2.9) Theorem (Bessel’s Inequality)


Let 𝐸 = {𝑢 1, . . . , 𝑢𝑚 } be an orthonormal set in an ips 𝑉 . Let 𝑥 ∈ 𝑉 . Then
    Σ_{𝑗=1}^{𝑚} |h𝑥, 𝑢 𝑗 i| 2 ≤ k𝑥 k 2 .

Proof. Consider the ips 𝑈 = span {𝑢 1, . . . , 𝑢𝑚 }, which is a subspace of 𝑉 . Now,


𝐸 is an orthonormal basis of 𝑈 . Let 𝑦 = Σ_{𝑗=1}^{𝑚} h𝑥, 𝑢 𝑗 i 𝑢 𝑗 . Now, 𝑦 ∈ 𝑈 . By Fourier expansion, 𝑦 = Σ_{𝑗=1}^{𝑚} h𝑦, 𝑢 𝑗 i 𝑢 𝑗 . Since {𝑢 1, . . . , 𝑢𝑚 } is a basis of 𝑈 , by (1.18), we have
expansion, 𝑦 = 𝑚𝑗=1 h𝑦, 𝑢 𝑗 i 𝑢 𝑗 . Since {𝑢 1, . . . , 𝑢𝑚 } is a basis of 𝑈 , by (1.18), we have
h𝑥, 𝑢 𝑗 i = h𝑦, 𝑢 𝑗 i for 1 ≤ 𝑗 ≤ 𝑚. That is, 𝑥 − 𝑦 ⊥ 𝑢 𝑗 for 1 ≤ 𝑗 ≤ 𝑚.
Then 𝑥 − 𝑦 ⊥ 𝑦. By Pythagoras’ theorem and Parseval’s identity, we obtain
    k𝑥 k 2 = k𝑥 − 𝑦 k 2 + k𝑦 k 2 ≥ k𝑦 k 2 = Σ_{𝑗=1}^{𝑚} |h𝑦, 𝑢 𝑗 i| 2 = Σ_{𝑗=1}^{𝑚} |h𝑥, 𝑢 𝑗 i| 2 .

The vector 𝑦 in the proof of Bessel’s inequality has geometric significance. For illustration, take 𝑈 as the 𝑥𝑦-plane, 𝑉 as R3, and 𝑥 = (1, 2, 3). Choose the standard basis {𝑒 1, 𝑒 2 } as the orthonormal basis for 𝑈 . Then

h𝑥, 𝑒 1 i = 1, h𝑥, 𝑒 2 i = 2, 𝑦 = 1 𝑒 1 + 2 𝑒 2 = (1, 2, 0).

The vector 𝑦 is the orthogonal projection of 𝑥 on 𝑈 .


We may view Bessel’s inequality in a different way. Imagine extending the
orthonormal basis 𝐸 of the subspace 𝑈 to a basis 𝐵 of the ips 𝑉 . If 𝐵 is also
orthonormal, then Bessel’s inequality would follow from Parseval’s identity in a
trivial way. But is this extension of an orthonormal basis of a subspace to the parent
space always possible? We address this question in the next section.

Exercises for § 2.2


1. Let 𝑊 = {𝑥 ∈ R4 : 𝑥 ⊥ (1, 0, −1, 1), 𝑥 ⊥ (2, 3, −1, 2)}, where R4 is the real
ips with standard inner product. Show that 𝑊 is a subspace of R4 . Also, find
a basis for 𝑊 .
2. For vectors 𝑥, 𝑦 in a real ips, prove that 𝑥 + 𝑦 ⊥ 𝑥 − 𝑦 iff k𝑥 k = k𝑦 k.
3. Consider the standard basis for R𝑛 . Take 𝑥 = (𝑎 1, . . . , 𝑎𝑛 ). Verify Fourier expansion and Parseval’s identity.
4. Take 𝑥 = (𝑛, 𝑛 − 1, . . . , 1) ∈ R𝑛 . Consider the orthonormal set {𝑒 2, . . . , 𝑒𝑛−1 }.
Verify Bessel’s inequality.
5. Let {𝑢 1, 𝑢 2, . . . , 𝑢𝑘 } be an orthogonal set in a real ips 𝑉 . Let 𝑎 1, . . . , 𝑎𝑘 ∈ R. Is it true that kΣ_{𝑖=1}^{𝑘} 𝑎𝑖 𝑢𝑖 k 2 = Σ_{𝑖=1}^{𝑘} |𝑎𝑖 | 2 k𝑢𝑖 k 2 ? Ans: Yes.

6. Formulate a converse of Parseval’s identity and prove it.


7. In the proof of Bessel’s inequality, by plugging in the expression for 𝑦, show
directly that h𝑥 − 𝑦, 𝑦i = 0.
8. For a subset 𝑆 of an ips 𝑉 , define 𝑆 ⊥ = {𝑥 ∈ 𝑉 : h𝑥, 𝑢i = 0, for all 𝑢 ∈ 𝑆 }.
Show the following:
(a) 𝑉 ⊥ = {0} (b) {0}⊥ = 𝑉 (c) 𝑆 ⊥ is a subspace of 𝑉 (d) 𝑆 ⊆ 𝑆 ⊥⊥ .
9. Let 𝑊 be a subspace of a finite dimensional ips 𝑉 . Show the following:
(a) 𝑉 = 𝑊 + 𝑊 ⊥ (b) 𝑊 ∩ 𝑊 ⊥ = {0} (c) 𝑊 ⊥⊥ = 𝑊 .
10. Show that {sin 𝑡, sin(2𝑡), . . . , sin(𝑚𝑡)} is linearly independent in 𝐶 [0, 2𝜋].

2.3 Gram-Schmidt orthogonalization


Given two linearly independent vectors 𝑢 1, 𝑢 2 on the plane how do we construct two
orthogonal vectors 𝑣 1, 𝑣 2 such that span {𝑢 1, 𝑢 2 } = span {𝑣 1, 𝑣 2 }?
Keep 𝑣 1 = 𝑢 1 . Project 𝑢 2 on 𝑢 1, and subtract the result from 𝑢 2 to get 𝑣 2 . Now,
𝑣 2 ⊥ 𝑣 1 . It is easy to see that the span condition is also satisfied.
We may continue this process of taking projections in 𝑛 dimensions. Its general
version is called the Gram-Schmidt orthogonalization process. We formulate and
prove this process in the following theorem.

(2.10) Theorem
Let 𝑢 1, . . . , 𝑢𝑛 be linearly independent vectors in an ips 𝑉 . Construct the vectors
𝑣 1, . . . , 𝑣𝑛 as follows:

    𝑣 1 = 𝑢 1 ,
    𝑣𝑘 = 𝑢𝑘 − (h𝑢𝑘 , 𝑣 1 i/h𝑣 1, 𝑣 1 i) 𝑣 1 − (h𝑢𝑘 , 𝑣 2 i/h𝑣 2, 𝑣 2 i) 𝑣 2 − · · · − (h𝑢𝑘 , 𝑣𝑘−1 i/h𝑣𝑘−1, 𝑣𝑘−1 i) 𝑣𝑘−1 for 𝑘 > 1.

Then span {𝑣 1, . . . , 𝑣𝑛 } = span {𝑢 1, . . . , 𝑢𝑛 }, and {𝑣 1, . . . , 𝑣𝑛 } is orthogonal.

Proof. We use induction on 𝑛. For 𝑛 = 1, 𝑣 1 = 𝑢 1 . Thus span {𝑣 1 } = span {𝑢 1 }.


Moreover, {𝑢 1 } is linearly independent implies that 𝑢 1 ≠ 0. So, 𝑣 1 = 𝑢 1 ≠ 0. Then
{𝑣 1 } is an orthogonal set.
Assume that the conclusion of the theorem is true for 𝑛 = 𝑚. We show that it is
true for 𝑛 = 𝑚 + 1.
By assumption, span {𝑣 1, . . . , 𝑣𝑚 } = span {𝑢 1, . . . , 𝑢𝑚 }. Let 𝑣 be a linear combina-
tion of 𝑣 1, . . . , 𝑣𝑚 , 𝑣𝑚+1 . Then 𝑣 is a linear combination of 𝑢 1, . . . , 𝑢𝑚 , 𝑣𝑚+1 . As 𝑣𝑚+1 is
a linear combination of 𝑣 1, . . . , 𝑣𝑚 , 𝑢𝑚+1, it is a linear combination of 𝑢 1, . . . , 𝑢𝑚 , 𝑢𝑚+1 .
Thus 𝑣 is a linear combination of 𝑢 1, . . . , 𝑢𝑚 , 𝑢𝑚+1 . Similarly, if 𝑢 is a linear combi-
nation of 𝑢 1, . . . , 𝑢𝑚 , 𝑢𝑚+1, then it is a linear combination of 𝑣 1, . . . , 𝑣𝑚 , 𝑢𝑚+1 . Again,
𝑢𝑚+1 is a linear combination of 𝑣 1, . . . , 𝑣𝑚+1 implies that 𝑢 is a linear combination of
𝑣 1, . . . , 𝑣𝑚+1 . This proves that span {𝑣 1, . . . , 𝑣𝑚+1 } = span {𝑢 1, . . . , 𝑢𝑚+1 }.
By assumption, {𝑣 1, . . . , 𝑣𝑚 } is orthogonal. Let 1 ≤ 𝑗 ≤ 𝑚. Then for any 𝑖 ≠ 𝑗,
1 ≤ 𝑖 ≤ 𝑚, h𝑣𝑖 , 𝑣 𝑗 i = 0. We obtain

    h𝑣𝑚+1, 𝑣 𝑗 i = h𝑢𝑚+1 − Σ_{𝑖=1}^{𝑚} (h𝑢𝑚+1, 𝑣𝑖 i/h𝑣𝑖 , 𝑣𝑖 i) 𝑣𝑖 , 𝑣 𝑗 i = h𝑢𝑚+1, 𝑣 𝑗 i − (h𝑢𝑚+1, 𝑣 𝑗 i/h𝑣 𝑗 , 𝑣 𝑗 i) h𝑣 𝑗 , 𝑣 𝑗 i = 0.

Therefore, {𝑣 1, . . . , 𝑣𝑚 , 𝑣𝑚+1 } is an orthogonal set.


Starting from a basis of a finite dimensional ips 𝑉 , Gram-Schmidt process yields
an orthogonal basis for 𝑉 .

(2.11) Example
The vectors 𝑢 1 = (1, 1, 0), 𝑢 2 = (0, 1, 1), 𝑢 3 = (1, 0, 1) form a basis for F3 . Apply
Gram-Schmidt Orthogonalization to obtain an orthogonal basis of F3 .

    𝑣 1 = (1, 1, 0).
    𝑣 2 = 𝑢 2 − (h𝑢 2, 𝑣 1 i/h𝑣 1, 𝑣 1 i) 𝑣 1 = (0, 1, 1) − (1/2)(1, 1, 0) = (−1/2, 1/2, 1).
    𝑣 3 = 𝑢 3 − (h𝑢 3, 𝑣 1 i/h𝑣 1, 𝑣 1 i) 𝑣 1 − (h𝑢 3, 𝑣 2 i/h𝑣 2, 𝑣 2 i) 𝑣 2
        = (1, 0, 1) − (1/2)(1, 1, 0) − (1/3)(−1/2, 1/2, 1) = (2/3, −2/3, 2/3).

The required orthogonal basis of F3 is {(1, 1, 0), (−1/2, 1/2, 1), (2/3, −2/3, 2/3)}.
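The computation in (2.11) can also be scripted. The sketch below (Python, assuming NumPy) is a direct transcription of the formulas in (2.10) and reproduces the vectors (1, 1, 0), (−1/2, 1/2, 1), (2/3, −2/3, 2/3).

    import numpy as np

    def gram_schmidt(vectors):
        basis = []
        for u in vectors:
            v = u.astype(float).copy()
            for w in basis:
                v -= (np.dot(u, w) / np.dot(w, w)) * w   # subtract the projection of u on w
            basis.append(v)
        return basis

    us = [np.array([1, 1, 0]), np.array([0, 1, 1]), np.array([1, 0, 1])]
    for v in gram_schmidt(us):
        print(v)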




(2.12) Example
The vectors 𝑢 1 = 1, 𝑢 2 = 𝑡, 𝑢 3 = 𝑡 2 form a linearly independent set in the ips of all
polynomials considered as functions from [−1, 1] to R; with the inner product as
h𝑝 (𝑡), 𝑞(𝑡)i = ∫_{−1}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 . The Gram-Schmidt process yields:

    𝑣 1 = 𝑢 1 = 1.
    𝑣 2 = 𝑢 2 − (h𝑢 2, 𝑣 1 i/h𝑣 1, 𝑣 1 i) 𝑣 1 = 𝑡 − (∫_{−1}^{1} 𝑡 𝑑𝑡 / ∫_{−1}^{1} 𝑑𝑡) 1 = 𝑡 .
    𝑣 3 = 𝑢 3 − (h𝑢 3, 𝑣 1 i/h𝑣 1, 𝑣 1 i) 𝑣 1 − (h𝑢 3, 𝑣 2 i/h𝑣 2, 𝑣 2 i) 𝑣 2
        = 𝑡 2 − (∫_{−1}^{1} 𝑡 2 𝑑𝑡 / ∫_{−1}^{1} 𝑑𝑡) 1 − (∫_{−1}^{1} 𝑡 3 𝑑𝑡 / ∫_{−1}^{1} 𝑡 2 𝑑𝑡) 𝑡 = 𝑡 2 − 1/3.

The set {1, 𝑡, 𝑡 2 − 1/3} is an orthogonal set in this ips. And, span {1, 𝑡, 𝑡 2 } = span {1, 𝑡, 𝑡 2 − 1/3}.

In fact, orthogonalization of {1, 𝑡, 𝑡 2, 𝑡 3, 𝑡 4, . . .} with the above inner product gives


the Legendre Polynomials.
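The same computation can be done symbolically. The following sketch (Python, assuming the SymPy library) uses ∫_{−1}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 as the inner product and reproduces 1, 𝑡, 𝑡 2 − 1/3.

    from sympy import symbols, integrate, simplify

    t = symbols('t')
    ip = lambda p, q: integrate(p * q, (t, -1, 1))     # the inner product of (2.12)

    us = [1, t, t**2]
    vs = []
    for u in us:
        v = u
        for w in vs:
            v = v - ip(u, w) / ip(w, w) * w            # Gram-Schmidt step
        vs.append(simplify(v))
    print(vs)                                          # [1, t, t**2 - 1/3]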
Observe that an orthogonal set can be made orthonormal by dividing each vector
by its norm. We thus obtain the following result.
(2.13) Theorem
An orthonormal set in a finite dimensional ips can be extended to an orthonormal
basis. Every finite dimensional ips has an orthonormal basis.

Notice that if you apply Gram-Schmidt orthogonalization on a linearly dependent


set, then it will generate the zero vector. That is, if {𝑢 1, . . . , 𝑢𝑘 } is linearly inde-
pendent and 𝑢𝑘+1 is a linear combination of 𝑢 1, . . . , 𝑢𝑘 , then 𝑣𝑘+1 will turn out to be
the zero vector. Ignoring such zero vectors in the Gram-Schmidt process leads to
an orthogonal basis for the span of the given vectors. This is how Gram-Schmidt
orthogonalization process is used to extract a basis from a finite spanning set.

Exercises for § 2.3


1. Consider R3 with the standard inner product. Apply Gram-Schmidt process
on the given set of vectors.
(a) {(1, 2, 0), (2, 1, 0), (1, 1, 1)} (b) {(1, 1, 1), (1, −1, 1), (1, 1, −1)}
(c) {(0, 1, 1), (0, 1, −1), (−1, 1, −1)}.
Ans: (a) (1, 2, 0), (6/5, −3/5, 0), (0, 0, 1) (b) (1, 1, 1), (2, −4, 2)/3, (1, 0, −1)
(c) (0, 1, 1), (0, 1, −1), (−1, 0, 0).
2. Consider R3 with the standard inner product. In each of the following, find a
vector of norm 1 which is orthogonal to the given two vectors:
(a) (2, 1, 0), (1, 2, 1) (b) (1, 2, 3), (2, 1, −2) (c) (0, 2, −1), (−1, 2, −1).
Ans: (a) (1/√14)(1, −2, 3) (b) (1/√122)(7, −8, 3) (c) (1/(2√5))(0, 2, 4).
3. Consider R4 with the standard inner product. In each of the following, find
the set of all vectors orthogonal to both 𝑢 and 𝑣.
(a) 𝑢 = (1, 2, 0, 1), 𝑣 = (2, 1, 0, −1) (b) 𝑢 = (1, 1, 1, 0), 𝑣 = (1, −1, 1, 1)
(c) 𝑢 = (0, 1, 1, −1), 𝑣 = (0, 1, −1, 1).
Ans: (a) span {(1, −1, 0, 1), (0, 0, 1, 0)}. (b) span {(−6, 1, 5, 2), (0, 1, −1, 2)}.
(c) span {(1, 0, 0, 0), (0, 0, 1, 1)}.
4. Consider C3 with the standard inner product. Find an orthonormal basis for
the subspace spanned by the vectors (1, 0, 𝑖) and (2, 1, 1 + 𝑖).
Ans: {(1/√2)(1, 0, 𝑖), (1/(2√2))(1 + 𝑖, 2, 1 − 𝑖)}.
5. Consider the polynomials 𝑢 0 (𝑡) = 1, 𝑢 1 (𝑡) = 𝑡, 𝑢 2 (𝑡) = 𝑡 2 in R2 [𝑡]. Using Gram-Schmidt orthogonalization, find the orthogonal polynomials obtained from 𝑢 0, 𝑢 1, 𝑢 2 with respect to the following inner products:
(a) h𝑝, 𝑞i = ∫_0^1 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 (b) h𝑝, 𝑞i = ∫_{−1}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 (c) h𝑝, 𝑞i = ∫_{−1}^{0} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 .
Ans: (a) 1, 𝑡 − 1/2, 𝑡 2 − 𝑡 + 1/6. (b) 1, 𝑡, 𝑡 2 − 1/3. (c) 1, 𝑡 + 1/2, 𝑡 2 + 𝑡 + 1/6.


6. Equip R3 [𝑡] with the inner product h𝑓 , 𝑔i = ∫_0^1 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡 .
(a) Find the set of all vectors orthogonal to the constant polynomials.
(b) Apply the Gram-Schmidt process to the ordered basis {1, 𝑡, 𝑡 2, 𝑡 3 }.
Ans: (a) {−(𝑏 + 𝑐 + 𝑑) + 2𝑏𝑡 + 3𝑐𝑡 2 + 4𝑑𝑡 3 : 𝑏, 𝑐, 𝑑 ∈ R}.
(b) 1, 𝑡 − 1/2, 𝑡 2 − 𝑡 + 1/6, 𝑡 3 − 3𝑡 2 /2 + 3𝑡/5 − 1/20.

2.4 Best approximation


In R3, given a plane and a point not on the plane, we are interested in finding a point
on the plane that is closest to the given point. Such a point on the plane may be told
to approximate the given point from the plane.
Of course, the phrase ‘closest’ is meaningful when we have the notion of distance.
If 𝑢 and 𝑣 are two vectors in an ips, the distance between them may be defined as
k𝑢 − 𝑣 k. Given a subspace 𝑈 of an ips 𝑉 , and a vector 𝑣 ∈ 𝑉 , we look for a vector in
𝑈 that minimizes the distance k𝑣 − 𝑥 k while 𝑥 varies over the subspace 𝑈 .
Let 𝑈 be a subspace of an ips 𝑉 . Let 𝑣 ∈ 𝑉 . A vector 𝑢 ∈ 𝑈 is called a best
approximation of 𝑣 from 𝑈 iff k𝑣 − 𝑢 k ≤ k𝑣 − 𝑥 k for each 𝑥 ∈ 𝑈 .

(2.14) Example
Let 𝑈 be the plane {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 + 𝑐 = 0}. Find a best approximation of the
point (1, 1, 1) from 𝑈 .
Suppose (𝛼, 𝛽, 𝛾) is a best approximation of (1, 1, 1) from 𝑈 . Such a point satisfies
𝛼 +𝛽+𝛾 = 0 and minimizes the distance k(1, 1, 1)−(𝛼, 𝛽, 𝛾)k. Substituting𝛾 = −𝛼 −𝛽,
we look for 𝛼, 𝛽 ∈ R so that
    𝑓 (𝛼, 𝛽) = ((1 − 𝛼)² + (1 − 𝛽)² + (1 + 𝛼 + 𝛽)²)^(1/2)

is minimum. Simplifying the expression for 𝑓 (𝛼, 𝛽), we see that it is equivalent to
minimizing
𝑔(𝛼, 𝛽) = 𝛼 2 + 𝛽 2 + 𝛼𝛽.
Then by using the methods of calculus of functions of two variables, we may determine
the required best approximation as (0, 0, 0).

As it happens a best approximation in an ips can always be obtained in an easier


way than applying calculus of several variables.
(2.15) Theorem
Let 𝑈 be a subspace of an ips 𝑉 . Let 𝑣 ∈ 𝑉 . A vector 𝑢 ∈ 𝑈 is a best approximation
of 𝑣 from 𝑈 iff 𝑣 − 𝑢 ⊥ 𝑥 for each 𝑥 ∈ 𝑈 . Moreover, a best approximation is unique.

Proof. Let 𝑢 ∈ 𝑈 satisfy 𝑣 − 𝑢 ⊥ 𝑥 for each 𝑥 ∈ 𝑈 . If 𝑥 ∈ 𝑈 , then 𝑥 − 𝑢 ∈ 𝑈 . Thus


for each 𝑥 ∈ 𝑈 , by Pythagoras’ theorem,

k𝑣 − 𝑥 k 2 = k(𝑣 − 𝑢) + (𝑢 − 𝑥)k 2 = k𝑣 − 𝑢 k 2 + k𝑢 − 𝑥 k 2 ≥ k𝑣 − 𝑢 k 2 .

Therefore, 𝑢 is a best approximation of 𝑣 from 𝑈 .


Conversely, suppose that 𝑢 is a best approximation of 𝑣. Then

k𝑣 − 𝑢 k ≤ k𝑣 − 𝑥 k for each 𝑥 ∈ 𝑈 . (2.4.1)

Let 𝑦 ∈ 𝑈 . We want to show that h𝑣 − 𝑢, 𝑦i = 0. For 𝑦 = 0, clearly h𝑣 − 𝑢, 𝑦i = 0.


For 𝑦 ≠ 0, let 𝛼 = h𝑣 − 𝑢, 𝑦i/k𝑦 k 2 . Then

h𝑣 − 𝑢, 𝛼𝑦i = 𝛼̄ h𝑣 − 𝑢, 𝑦i = |𝛼 | 2 k𝑦 k 2, and h𝛼𝑦, 𝑣 − 𝑢i = 𝛼 h𝑦, 𝑣 − 𝑢i = |𝛼 | 2 k𝑦 k 2 .

Notice that 𝑢 + 𝛼𝑦 ∈ 𝑈 . From (2.4.1), we have

k𝑣 − 𝑢 k 2 ≤ k𝑣 − 𝑢 − 𝛼𝑦 k 2 = h𝑣 − 𝑢 − 𝛼𝑦, 𝑣 − 𝑢 − 𝛼𝑦i
= k𝑣 − 𝑢 k 2 − h𝑣 − 𝑢, 𝛼𝑦i − h𝛼𝑦, 𝑣 − 𝑢i + 𝛼𝛼 h𝑦, 𝑦i
= k𝑣 − 𝑢 k 2 − |𝛼 | 2 k𝑦 k 2 .

Hence, |𝛼 | 2 k𝑦 k 2 = 0. As 𝑦 ≠ 0, |𝛼 | 2 = 0. It follows that h𝑣 − 𝑢, 𝑦i = 0.


To see the uniqueness of a best approximation, suppose that 𝑢, 𝑤 ∈ 𝑈 are best
approximations to 𝑣. Then k𝑣 −𝑢 k ≤ k𝑣 − 𝑤 k and k𝑣 − 𝑤 k ≤ k𝑣 −𝑢 k. So, k𝑣 −𝑢 k =
k𝑣 − 𝑤 k.
Moreover, 𝑤 − 𝑢 ∈ 𝑈 . Therefore, by what we have just proved, 𝑣 − 𝑤 ⊥ 𝑤 − 𝑢.
By Pythagoras’ theorem,

k𝑣 − 𝑢 k 2 = k(𝑣 − 𝑤) + (𝑤 − 𝑢)k 2 = k𝑣 − 𝑤 k 2 + k𝑤 − 𝑢 k 2 = k𝑣 − 𝑢 k 2 + k𝑤 − 𝑢 k 2 .

Thus, k𝑤 − 𝑢 k 2 = 0. That is, 𝑤 = 𝑢.


Observe that (2.15) does not guarantee the existence of a best approximation. We
show that if 𝑈 is finite dimensional subspace of the ips 𝑉 , then corresponding to
each vector 𝑣 ∈ 𝑉 , there exists a unique best approximation 𝑢 to 𝑣 from 𝑈 ; and such
a vector 𝑢 can be given in a closed form using an orthonormal basis for 𝑈 .

(2.16) Theorem
Let {𝑢 1, . . . , 𝑢𝑛 } be an orthonormal basis for a subspace 𝑈 of an ips 𝑉 . Let 𝑣 ∈ 𝑉 .
Then 𝑢 = Σ_{𝑖=1}^{𝑛} h𝑣, 𝑢𝑖 i𝑢𝑖 is the best approximation of 𝑣 from 𝑈 .
Proof. Write 𝑢 := Σ_{𝑖=1}^{𝑛} h𝑣, 𝑢𝑖 i𝑢𝑖 . Since 𝑢 ∈ 𝑈 , by Fourier expansion, we have 𝑢 = Σ_{𝑖=1}^{𝑛} h𝑢, 𝑢𝑖 i𝑢𝑖 . Due to (1.18), h𝑣, 𝑢𝑖 i = h𝑢, 𝑢𝑖 i for 1 ≤ 𝑖 ≤ 𝑛. That is, h𝑣 − 𝑢, 𝑢𝑖 i = 0 for each 𝑖 ∈ {1, . . . , 𝑛}. Now, if 𝑥 ∈ 𝑈 , then there exist scalars 𝑎 1, . . . , 𝑎𝑛 such that 𝑥 = Σ_{𝑖=1}^{𝑛} 𝑎𝑖 𝑢𝑖 . Then h𝑣 − 𝑢, 𝑥i = Σ_{𝑖=1}^{𝑛} 𝑎̄𝑖 h𝑣 − 𝑢, 𝑢𝑖 i = 0. That is, 𝑣 − 𝑢 ⊥ 𝑥 for each
𝑥 ∈ 𝑈.
By (2.15), 𝑢 is the best approximation of 𝑣 from 𝑈 .
For obvious geometrical reasons, the vector 𝑢 in (2.16) is called the orthogonal
projection of 𝑣 on the subspace 𝑈 , and it is denoted by proj𝑈 (𝑣).
Notice that the orthogonality condition 𝑣 − 𝑢 ⊥ 𝑥 for each 𝑥 ∈ 𝑈 in (2.15) is
equivalent to 𝑣 − 𝑢 ⊥ 𝑢 𝑗 for each 𝑗 whenever {𝑢 1, . . . , 𝑢𝑛 } is a spanning set for 𝑈 .
This is helpful in computing the best approximation, without using an orthonormal
basis.
Suppose {𝑢 1, . . . , 𝑢𝑛 } is any basis of 𝑈 . Write the best approximation of 𝑣 from 𝑈
as 𝑢 = Σ_{𝑗=1}^{𝑛} 𝛽 𝑗 𝑢 𝑗 with unknown scalars 𝛽 𝑗 . Then using the orthogonality condition, we have h𝑣 − Σ_{𝑗=1}^{𝑛} 𝛽 𝑗 𝑢 𝑗 , 𝑢𝑖 i = 0. This way, the scalars 𝛽 𝑗 are determined from the linear system

    Σ_{𝑗=1}^{𝑛} h𝑢 𝑗 , 𝑢𝑖 i 𝛽 𝑗 = h𝑣, 𝑢𝑖 i for 𝑖 = 1, . . . , 𝑛.

(2.17) Example
1. For the best approximation of 𝑣 = (1, 0) ∈ R2 from 𝑈 = {(𝑎, 𝑎) : 𝑎 ∈ R}, we
look for a point (𝛼, 𝛼) so that (1, 0) − (𝛼, 𝛼) ⊥ (𝛽, 𝛽) for all 𝛽. That is, we look
for an 𝛼 so that (1 − 𝛼, −𝛼) · (1, 1) = 0. Or, 1 − 𝛼 − 𝛼 = 0. It leads to 𝛼 = 1/2. The
best approximation here is ( 1/2, 1/2).
2. Reconsider (2.14). We require a vector (𝛼, 𝛽, 𝛾) ∈ 𝑈 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 +
𝑐 = 0} which is the best approximation to (1, 1, 1). A basis for 𝑈 is given by
{(1, −1, 0), (0, 1, −1)}. The orthogonality condition in (2.15) implies that

(1, 1, 1) − (𝛼, 𝛽, 𝛾) ⊥ (1, −1, 0), (1, 1, 1) − (𝛼, 𝛽, 𝛾) ⊥ (0, 1, −1).

These equations along with the fact (𝛼, 𝛽, 𝛾) ∈ 𝑈 give

1 − 𝛼 − 1 + 𝛽 = 0, 1 − 𝛽 − 1 + 𝛾 = 0, 𝛼 + 𝛽 + 𝛾 = 0.

That is, 𝛼 = 𝛽 = 𝛾 = 0. Therefore, (0, 0, 0) is the required best approximation.


Alternatively, an orthonormal basis for 𝑈 is given by {𝑢 1, 𝑢 2 }, where 𝑢 1 =
(1/√2, −1/√2, 0) and 𝑢 2 = (1/√6, 1/√6, −2/√6). With 𝑣 = (1, 1, 1), we obtain h𝑣, 𝑢 1 i =
0 = h𝑣, 𝑢 2 i. The best approximation of 𝑣 from 𝑈 is given by 𝑢 = h𝑣, 𝑢 1 i𝑢 1 +
h𝑣, 𝑢 2 i𝑢 2 = (0, 0, 0).
3. In R[𝑡] with h𝑓 , 𝑔i = ∫_0^1 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡, what is the best approximation of 𝑡 2 from R1 [𝑡]?
Write the best approximation of 𝑡 2 from R1 [𝑡] as 𝛼 + 𝛽𝑡 . The orthogonality
condition asks us to determine 𝛼, 𝛽 ∈ R so that 𝑡 2 −(𝛼 +𝛽𝑡) ⊥ 1 and 𝑡 2 −(𝛼 +𝛽𝑡) ⊥
𝑡 . That is,

    ∫_0^1 (𝑡 2 − 𝛼 − 𝛽𝑡) 𝑑𝑡 = 0 = ∫_0^1 (𝑡 3 − 𝛼𝑡 − 𝛽𝑡 2 ) 𝑑𝑡 .

This gives 1/3 − 𝛼 − 𝛽/2 = 0 = 1/4 − 𝛼/2 − 𝛽/3, leading to 𝛼 = −1/6 and 𝛽 = 1. Therefore, the best approximation is −1/6 + 𝑡 .
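The computation in item 3 amounts to solving the small linear system displayed before (2.17). A Python sketch (assuming NumPy) of this normal-equation approach, using the basis 𝑢 1 = 1, 𝑢 2 = 𝑡 of R1 [𝑡] and h𝑡^𝑚, 𝑡^𝑛 i = 1/(𝑚 + 𝑛 + 1), is given below.

    import numpy as np

    # Gram matrix [<u_j, u_i>] for u_1 = 1, u_2 = t, and right hand side [<t^2, u_i>],
    # with <t^m, t^n> = integral over [0, 1] of t^(m+n) dt = 1/(m + n + 1).
    G = np.array([[1.0, 1/2],
                  [1/2, 1/3]])
    b = np.array([1/3, 1/4])
    alpha, beta = np.linalg.solve(G, b)
    print(alpha, beta)               # -1/6 and 1, so the best approximation is -1/6 + t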

Observe that the best approximation 𝑢 of a vector 𝑣 ∈ 𝑉 from 𝑈 is the orthogonal


projection of 𝑣 on 𝑈 . It is the same vector 𝑢 that we have used in the proof of Bessel’s
inequality.

Exercises for § 2.4


Find the best approximation of 𝑣 ∈ 𝑉 from 𝑈 in the following:
1. 𝑉 = R3, 𝑣 = (1, 2, 1), 𝑈 = span {(3, 1, 2), (1, 0, 1)}.
2. 𝑉 = R3, 𝑣 = (1, 2, 1), 𝑈 = {(𝛼, 𝛽, 𝛾) ∈ R3 : 𝛼 + 𝛽 + 𝛾 = 0}.
3. 𝑉 = R4, 𝑣 = (1, 0, −1, 1), 𝑈 = span {(1, 0, −1, 1), (0, 0, 1, 1)}.
4. 𝑉 = R3 [𝑡], 𝑣 = 𝑡 3, 𝑈 = span {1, 1 + 𝑡, 1 + 𝑡 2 }, h𝑝, 𝑞i = ∫_0^1 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 .
5. 𝑉 = 𝐶 [−1, 1], 𝑣 (𝑡) = 𝑒 𝑡 , 𝑈 = R2 [𝑡], h𝑓 , 𝑔i = ∫_{−1}^{1} 𝑓 (𝑡)𝑔(𝑡) 𝑑𝑡 .
Ans: 1. (5/3, 4/3, 1/3) 2. (−1/3, 2/3, −1/3) 3. (1, 0, −1, 1)
4. 1/20 − 3𝑡/5 + 3𝑡 2 /2 5. (𝑒 − 1/𝑒)/2 + (3/𝑒)𝑡 + (15/4)(𝑒 − 7/𝑒)(𝑡 2 − 1/3).
3
Linear Transformations

3.1 What is a linear transformation?


The interesting maps in a domain of mathematical discourse are those maps which
preserve structures. Since a vector space gets its structure from the two operations
of addition and scalar multiplication, we are interested in maps that preserve these
operations.
Let 𝑉 and 𝑊 be vector spaces over the same field F. A function 𝑇 : 𝑉 → 𝑊 is
said to be a linear transformation (or a linear map, or a linear operator) iff for all
𝑥, 𝑦 ∈ 𝑉 and for each 𝛼 ∈ F,

𝑇 (𝑥 + 𝑦) = 𝑇 (𝑥) + 𝑇 (𝑦) and 𝑇 (𝛼𝑥) = 𝛼𝑇 (𝑥).

A linear transformation from 𝑉 to 𝑉 is called a linear operator on 𝑉 .


A linear transformation from 𝑉 to F is called a linear functional.
Observe that the two conditions in the definition of a linear transformation amount
to the single condition

𝑇 (𝑥 + 𝛼𝑦) = 𝑇 (𝑥) + 𝛼𝑇 (𝑦) for all 𝑥, 𝑦 ∈ 𝑉 , and for each 𝛼 ∈ F.

(3.1) Example
1. Let 𝑉 be a vector space. The map 𝑇 : 𝑉 → 𝑉 defined by 𝑇 (𝑣) = 0 for each 𝑣 ∈ 𝑉
is a linear operator on 𝑉 ; it is called the zero operator.
2. Let 𝑉 be a vector space. The map 𝑇 : 𝑉 → 𝑉 defined by 𝑇 (𝑣) = 𝑣 is a linear
operator on 𝑉 ; it is called the identity operator.
3. Let 𝑉 be a vector space. Let 𝛼 be any scalar. Then the map 𝑇 : 𝑉 → 𝑉 defined
by 𝑇 (𝑣) = 𝛼𝑣 is a linear operator on 𝑉 .
4. Define the map 𝑇 : R3 → R2 by 𝑇 (𝑎, 𝑏, 𝑐) = (2𝑎 + 𝑏, 𝑏 − 𝑐). Then 𝑇 is a linear
transformation.
5. The map 𝑇 : R2 → R3 defined by 𝑇 (𝑎, 𝑏) = (𝑎 + 𝑏, 2𝑎 − 𝑏, 𝑎 + 6𝑏) is a linear
transformation.

6. For 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛, let 𝑎𝑖 𝑗 ∈ F. Define 𝑇 : F𝑛 → F𝑚 by
    𝑇 (𝛽 1, . . . , 𝛽𝑛 ) = ( Σ_{𝑗=1}^{𝑛} 𝑎 1𝑗 𝛽 𝑗 , . . . , Σ_{𝑗=1}^{𝑛} 𝑎𝑚 𝑗 𝛽 𝑗 ).

Then, 𝑇 is a linear transformation.


7. Fix 𝜙 ∈ [0, 2𝜋]. For any 𝑥 = (𝑟 cos 𝜃, 𝑟 sin 𝜃 ) ∈ R2 , the map 𝑇 : R2 → R2
defined by

𝑇 (𝑥) = (𝑟 cos(𝜃 + 𝜙), 𝑟 sin(𝜃 + 𝜙))
is the rotation by the angle 𝜙. It is a linear operator on R2 .
8. Fix 𝜙 ∈ [0, 2𝜋]. For any 𝑥 = (𝑟 cos 𝜃, 𝑟 sin 𝜃 ) ∈ R2 , the map 𝑇 : R2 → R2
defined by

𝑇 (𝑥) = (𝑟 cos(𝜙 − 𝜃 ), 𝑟 sin(𝜙 − 𝜃 ))
is the reflection on the line making an angle 𝜙/2 with R × {0}, the 𝑥-axis. 𝑇 is a
linear operator on R2 .
9. For 1 ≤ 𝑗 ≤ 𝑛, define 𝑇 𝑗 : R𝑛 → R by 𝑇 𝑗 (𝑎 1, . . . , 𝑎𝑛 ) = 𝑎 𝑗 . Then 𝑇 𝑗 is a linear
functional. In general, let 𝛼 1, . . . , 𝛼𝑛 ∈ R. Define the map 𝑇 : R𝑛 → R by
𝑇 (𝑎 1, . . . , 𝑎𝑛 ) = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑎𝑖 . Then 𝑇 is a linear functional.
10. Let 𝑉 be a vector space with basis {𝑣 1, . . . , 𝑣𝑛 }. Given any vector 𝑣, there
exist unique scalars 𝛼 1, . . . , 𝛼𝑛 ∈ F such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 . Fix any
𝑖 ∈ {1, . . . , 𝑛}. Define the map 𝑇𝑖 : 𝑉 → F by 𝑇𝑖 (𝑣) = 𝛼𝑖 . Then 𝑇𝑖 is a linear
functional. Reason?
Any 𝑣 ∈ 𝑉 can be written as 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 , where 𝛼𝑖 is a unique scalar
for each 𝑖 ∈ {1, . . . , 𝑛}. Thus 𝑇𝑖 (𝑣) = 𝛼𝑖 defines a function 𝑇𝑖 from 𝑉 to F. To see
that 𝑇𝑖 is a linear functional, let 𝑣, 𝑤 ∈ 𝑉 . There exist unique scalars 𝛼 1, . . . , 𝛼𝑛
and 𝛽 1, . . . , 𝛽𝑛 such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 and 𝑤 = 𝛽 1𝑣 1 + · · · + 𝛽𝑛 𝑣𝑛 . Then
    𝑇𝑖 (𝑣 + 𝑤) = 𝑇𝑖 (Σ_{𝑗=1}^{𝑛} (𝛼 𝑗 + 𝛽 𝑗 )𝑣 𝑗 ) = 𝛼𝑖 + 𝛽𝑖 = 𝑇𝑖 (𝑣) + 𝑇𝑖 (𝑤).

Similarly the other condition can be verified.


11. Let 𝑛 ∈ N. Define the map 𝑇 : F𝑛 [𝑡] → F𝑛−1 [𝑡] by 𝑇 (𝑝 (𝑡)) = 𝑝 ′ (𝑡), the
derivative of 𝑝 (𝑡) with respect to 𝑡 . Then 𝑇 is a linear transformation.
12. Let 𝛼 ∈ [𝑎, 𝑏]. Define the function 𝑇𝛼 : 𝐶 [𝑎, 𝑏] → F by 𝑇𝛼 (𝑓 ) = 𝑓 (𝛼). Verify
that 𝑇𝛼 is a linear functional.
13. Let the function 𝑇 : 𝐶 1 [𝑎, 𝑏] → 𝐶 [𝑎, 𝑏] be defined by 𝑇 (𝑓 ) = 𝑓 ′ . Then 𝑇
is a linear transformation. (Here 𝐶 𝑘 [𝑎, 𝑏] is the vector space of all 𝑘-times
continuously differentiable functions from [𝑎, 𝑏] to R.)

14. Let the map 𝑇 : 𝐶 1 [𝑎, 𝑏] → 𝐶 [𝑎, 𝑏] be defined by 𝑇 (𝑓 ) = 𝛼 𝑓 + 𝛽 𝑓 ′, where 𝛼, 𝛽


are fixed scalars. Verify that 𝑇 is a linear transformation.
15. Let 𝑆 and 𝑇 be linear transformations from 𝑉 to 𝑊 . Let 𝛼, 𝛽 ∈ F. Then the map
𝐴 : 𝑉 → 𝑊 defined by 𝐴(𝑣) = 𝛼𝑆 (𝑣) + 𝛽𝑇 (𝑣) is a linear transformation.
16. Let 𝐴 ∈ F𝑚×𝑛 . Define 𝑇 : F𝑛×𝑘 → F𝑚×𝑘 by 𝑇 (𝑋 ) = 𝐴𝑋 . Then 𝑇 is a linear
transformation since 𝑇 (𝑋 + 𝛼𝑌 ) = 𝐴(𝑋 + 𝛼𝑌 ) = 𝐴𝑋 + 𝛼𝐴𝑌 for any 𝛼 ∈ F.
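As a computational aside (a Python sketch, assuming NumPy): the rotation of item 7 acts by multiplication with a 2×2 matrix, and the defining condition 𝑇 (𝑥 + 𝛼𝑦) = 𝑇 (𝑥) + 𝛼𝑇 (𝑦) can be spot-checked on random inputs. The matrix form below is the usual one for a rotation and is not derived in the text above.

    import numpy as np

    phi = 0.7
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    T = lambda x: R @ x                    # rotation by the angle phi, as in item 7

    rng = np.random.default_rng(2)
    x, y, a = rng.standard_normal(2), rng.standard_normal(2), 1.3
    print(np.allclose(T(x + a * y), T(x) + a * T(y)))   # consistent with linearity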

Remember that we talk of a linear transformation from 𝑉 to 𝑊 only when both 𝑉


and 𝑊 are vector spaces over the same field F, be it R or C.
Convention: When we say that 𝑇 : 𝑉 → 𝑊 is a linear transformation, we assume
that 𝑉 and 𝑊 are vector spaces over the same field F.

(3.2) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then the following are true:
(1) 𝑇 (0) = 0.
(2) For all 𝑢, 𝑣 ∈ 𝑉 , 𝑇 (𝑢 − 𝑣) = 𝑇 (𝑢) − 𝑇 (𝑣).
(3) For any 𝑛 ∈ N, for all 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑉 and for all scalars 𝛼 1, . . . , 𝛼𝑛 ,
𝑇 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = 𝛼 1𝑇 (𝑣 1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ).

Proof. (1) 𝑇 (0) + 𝑇 (0) = 𝑇 (0 + 0) = 𝑇 (0) = 𝑇 (0) + 0. Hence 𝑇 (0) = 0.


(2) 𝑇 (𝑢) = 𝑇 (𝑢 − 𝑣 + 𝑣) = 𝑇 (𝑢 − 𝑣) + 𝑇 (𝑣). Hence 𝑇 (𝑢 − 𝑣) = 𝑇 (𝑢) − 𝑇 (𝑣).
(3) It follows by induction.
Every map that ‘looks linear’ need not be a linear transformation. For instance,
the map 𝑇 : R → R defined by 𝑇 (𝑥) = 2𝑥 + 3 is not a linear transformation since
𝑇 (0) ≠ 0.

(3.3) Theorem
Let 𝑇 : 𝑈 → 𝑉 and 𝑆 : 𝑉 → 𝑊 be linear transformations. Then 𝑆 ◦ 𝑇 : 𝑈 → 𝑊 is
a linear transformation.

Proof. Recall that the map 𝑆 ◦𝑇 is defined by (𝑆 ◦𝑇 )(𝑢) = 𝑆 (𝑇 (𝑢)) for 𝑢 ∈ 𝑈 . Let
𝑥, 𝑦 ∈ 𝑈 and let 𝛼 ∈ F. Now,

(𝑆 ◦ 𝑇 )(𝑥 + 𝛼𝑦) = 𝑆 (𝑇 (𝑥 + 𝛼𝑦)) = 𝑆 (𝑇 (𝑥) + 𝛼𝑇 (𝑦))


= 𝑆 (𝑇 (𝑥)) + 𝛼𝑆 (𝑇 (𝑦)) = (𝑆 ◦ 𝑇 )(𝑥) + 𝛼 (𝑆 ◦ 𝑇 )(𝑦).

Therefore, (𝑆 ◦ 𝑇 ) is a linear transformation.


As the above theorem shows, the composition of two, hence any finite number of,
linear transformations is a linear transformation.
In what follows we will abbreviate 𝑇 (𝑥) to 𝑇 𝑥 and 𝑆 ◦ 𝑇 to 𝑆𝑇 , whenever it does
not harm our understanding.

Exercises for § 3.1


1. Why is the function 𝑇 : R2 → R2 given below not a linear transformation?
(a) 𝑇 (𝑎, 𝑏) = (1, 𝑏) (b) 𝑇 (𝑎, 𝑏) = (𝑎, 𝑎 2 ) (c) 𝑇 (𝑎, 𝑏) = (sin 𝑎, 0)
(d) 𝑇 (𝑎, 𝑏) = (|𝑎|, 𝑏) (e) 𝑇 (𝑎, 𝑏) = (𝑎 + 1, 𝑏) (f) 𝑇 (𝑎, 𝑏) = (2𝑎 + 𝑏, 𝑎 + 𝑏 2 ).
Ans: Look at (a) 𝑇 (0, 0) (b) 𝑇 (2, 2), 2𝑇 (1, 1) (c) 𝑇 (𝜋/2, 0), 2𝑇 (𝜋/4, 0)
(d) 𝑇 (−1, 0), (−1)𝑇 (1, 0) (e) 𝑇 (0, 0) (f) 𝑇 (0, 2), 2𝑇 (0, 1).
2. What are the linear operators on R? Ans: 𝑇 (𝑥) = 𝛼𝑥 for some 𝛼 .
3. Let 𝑇 : R2 → R2 be a linear transformation with 𝑇 (1, 0) = (1, 4) and
𝑇 (1, 1) = (2, 5). What is 𝑇 (2, 3)? Is 𝑇 one-one? Ans: (5, 11), Yes.
4. Let 𝑆 : 𝐶 1 [0, 1] → 𝐶 [0, 1] and 𝑇 : 𝐶 [0, 1] → R be defined by 𝑆 (𝑢) = 𝑢 ′ and 𝑇 (𝑣) = ∫_0^1 𝑣 (𝑡) 𝑑𝑡 . Find, if possible, 𝑆𝑇 and 𝑇 𝑆. Are they linear transformations?
Ans: 𝑇 𝑆 (𝑢) = 𝑢 (1) − 𝑢 (0); 𝑆𝑇 (𝑣) = 0, if each scalar 𝑇 (𝑣) is viewed as a constant function. Yes.
5. Can you construct a linear operator on R2 that maps the square with corners
at (−1, −1), (1, −1), (1, 1) and (−1, 1) onto the square with corners at (0, 0),
(1, 0), (1, 1) and (0, 1)? Ans: No. Look at pre-images of (1, 1), (−1, −1).
6. Give a linear transformation 𝑇 from 𝑉 onto F2, where dim (𝑉 ) = 2.

3.2 Action on a basis


Consider the linear transformation 𝐷 : R3 [𝑡] → R2 [𝑡] defined by 𝐷 (𝑝 (𝑡)) = 𝑝 ′ (𝑡).
We know that 𝐷 (𝑡 3 ) = 3𝑡 2, 𝐷 (𝑡 2 ) = 2𝑡, 𝐷 (𝑡) = 1 and 𝐷 (1) = 0. We may then use
its linearity to obtain 𝐷 (𝑝 (𝑡)) for any polynomial 𝑝 (𝑡) ∈ R3 [𝑡]. For instance,

𝐷 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ) = 𝑎𝐷 (1) + 𝑏𝐷 (𝑡) + 𝑐𝐷 (𝑡 2 ) + 𝑑𝐷 (𝑡 3 ) = 𝑏 + 2𝑐𝑡 + 3𝑑𝑡 2 .

Similarly, let 𝑇 : R3 → R be a linear transformation with 𝑇 (1, 0, 0) = 2 and


𝑇 (0, 1, 0) = −1. Since (2, 3, 0) = 2(1, 0, 0) + 3(0, 1, 0),

𝑇 (2, 3, 0) = 2𝑇 (1, 0, 0) + 3𝑇 (0, 1, 0) = 2 × 2 + 3 × (−1) = 1.

Can we construct a linear transformation 𝑇 : R3 → R2 with 𝑇 (1, 0, 0) = (1, 0)?


In fact, we can give infinitely many such linear transformations. For instance, if
𝛼 ∈ R, then the map 𝑇 given by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎, 𝛼𝑏) is a linear transformation.

We may also give infinitely many linear transformations 𝑇 : R3 → R2 with


𝑇 (1, 0, 0) = (1, 0) and 𝑇 (0, 1, 0) = (0, 1). For instance, corresponding to each 𝛼 ∈ R
define 𝑇 by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝛼𝑐, 𝑏 + 𝛼𝑐).
However, there exists only one linear transformation 𝑇 : R3 → R2 with

𝑇 (1, 0, 0) = (1, 0), 𝑇 (0, 1, 0) = (0, 1), 𝑇 (0, 0, 1) = (1, 1).

Reason?

𝑇 (𝑎, 𝑏, 𝑐) = 𝑎 𝑇 (1, 0, 0) + 𝑏 𝑇 (0, 1, 0) + 𝑐 𝑇 (0, 0, 1)


= 𝑎 (1, 0) + 𝑏 (0, 1) + 𝑐 (1, 1) = (𝑎 + 𝑐, 𝑏 + 𝑐).

What information is required to describe a linear transformation? The


following theorem provides an answer, which we state informally and explain its
formal meaning in the proof.

(3.4) Theorem
A linear transformation is uniquely determined from its action on a basis.

Proof. Let 𝑉 and 𝑊 be vector spaces. Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } be a basis of 𝑉 . Let


𝑤 1, . . . , 𝑤𝑛 ∈ 𝑊 . We want to show that there exists a unique linear transformation
𝑇 : 𝑉 → 𝑊 with 𝑇 (𝑣 1 ) = 𝑤 1, 𝑇 (𝑣 2 ) = 𝑤 2, . . . , 𝑇 (𝑣𝑛 ) = 𝑤𝑛 . Notice that the vectors
𝑤 1, . . . , 𝑤𝑛 , which are the images of the basis vectors, need neither be linearly
independent, nor be distinct.
For the existence of such a map, we construct one from 𝑉 to 𝑊 , and then prove
that this map is a linear transformation.
Let 𝑥 ∈ 𝑉 . Then 𝑥 = 𝑎 1𝑣 1 + · · · + 𝑎𝑛 𝑣𝑛 for unique scalars 𝑎 1, . . . , 𝑎𝑛 . Define

𝑇 (𝑥) = 𝑎 1𝑤 1 + · · · + 𝑎𝑛𝑤𝑛 .

Due to uniqueness of the scalars, this map is well-defined. We must verify the two
defining conditions of a linear transformation.
Let 𝑢, 𝑣 ∈ 𝑉 . Then 𝑢 = 𝑏 1𝑣 1 + · · · + 𝑏𝑛 𝑣𝑛 and 𝑣 = 𝑐 1𝑣 1 + · · · + 𝑐𝑛 𝑣𝑛 for some scalars
𝑏𝑖 , 𝑐 𝑗 . Now, 𝑢 + 𝑣 = (𝑏 1 + 𝑐 1 )𝑣 1 + · · · + (𝑏𝑛 + 𝑐𝑛 )𝑣𝑛 . Thus

𝑇 (𝑢 + 𝑣) = (𝑏 1 + 𝑐 1 )𝑤 1 + · · · + (𝑏𝑛 + 𝑐𝑛 )𝑤𝑛
= (𝑏 1𝑤 1 + · · · + 𝑏𝑛𝑤𝑛 ) + (𝑐 1𝑤 1 + · · · + 𝑐𝑛𝑤𝑛 ) = 𝑇 (𝑢) + 𝑇 (𝑣).

Similarly, for any scalar 𝛼,

𝑇 (𝛼𝑢) = 𝑇 (𝛼𝑏 1𝑣 1 + · · · + 𝛼𝑏𝑛 𝑣𝑛 ) = 𝛼𝑏 1𝑤 1 + · · · + 𝛼𝑏𝑛𝑤𝑛


= 𝛼 (𝑏 1𝑤 1 + · · · + 𝑏𝑛𝑤𝑛 ) = 𝛼𝑇 (𝑢).
Therefore 𝑇 is a linear transformation.
For uniqueness of 𝑇 , suppose 𝑆 : 𝑉 → 𝑊 is a linear transformation with 𝑆 (𝑣 1 ) =
𝑤 1, 𝑆 (𝑣 2 ) = 𝑤 2, . . . , 𝑆 (𝑣𝑛 ) = 𝑤𝑛 . That is,

𝑆 (𝑣𝑖 ) = 𝑇 (𝑣𝑖 ) for 1 ≤ 𝑖 ≤ 𝑛.

We show that 𝑆 = 𝑇 . For this, let 𝑦 ∈ 𝑉 . We have scalars 𝛼𝑖 ∈ F such that


𝑦 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 . Then

𝑆 (𝑦) = 𝑆 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = 𝛼 1𝑆 (𝑣 1 ) + · · · + 𝛼𝑛 𝑆 (𝑣𝑛 )
= 𝛼 1𝑇 (𝑣 1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ) = 𝑇 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = 𝑇 (𝑦).

Hence 𝑆 = 𝑇 as maps from 𝑉 to 𝑊 .

(3.5) Example

1. Let 𝑇 : R3 → R2 be a linear transformation such that 𝑇 (1, 0, 0) = (2, 3),


𝑇 (0, 1, 0) = (−1, 4) and 𝑇 (0, 0, 1) = (5, −3). Then

𝑇 (3, −4, 5) = 3𝑇 (1, 0, 0) + (−4)𝑇 (0, 1, 0) + 5𝑇 (0, 0, 1)


= 3(2, 3) + (−4)(−1, 4) + 5(5, −3) = (35, −22).

2. Construct a linear transformation 𝑇 : R2 → {(𝑎, 𝑏, 𝑐) : 𝑎 − 𝑏 − 𝑐 = 0}.


We start with a basis, say, {𝑣 1 = (1, 0), 𝑣 2 = (0, 1)} of R2 . Let 𝑊 := {(𝑎, 𝑏, 𝑐) :
𝑎 − 𝑏 − 𝑐 = 0}. Choose any two vectors in 𝑊 , for instance, 𝑤 1 = (1, 1, 0) and
𝑤 2 = (1, 0, 1). We want 𝑇 (1, 0) = 𝑤 1 and 𝑇 (0, 1) = 𝑤 2 . Thus define

𝑇 (𝑎, 𝑏) = 𝑇 (𝑎(1, 0) + 𝑏 (0, 1)) = 𝑎𝑇 (1, 0) + 𝑏𝑇 (0, 1)


= 𝑎𝑤 1 + 𝑏𝑤 2 = 𝑎(1, 1, 0) + 𝑏 (1, 0, 1) = (𝑎 + 𝑏, 𝑎, 𝑏).

This is one of the required linear transformations from R2 to 𝑊 . Find another


linear transformation.
3. Construct a linear transformation 𝑇 : R3 [𝑡] → R2 such that

𝑇 (1 + 𝑡) = (1, 2), 𝑇 (1 − 𝑡) = (1, 1), 𝑇 (1 − 𝑡 2 ) = (2, 1).

For such a linear transformation 𝑇 , we have


𝑇 (1) = 1/2 (𝑇 (1 + 𝑡) + 𝑇 (1 − 𝑡)) = 1/2 ((1, 2) + (1, 1)) = (1, 3/2).
𝑇 (𝑡) = 𝑇 (1 + 𝑡) − 𝑇 (1) = (1, 2) − (1, 3/2) = (0, 1/2).
𝑇 (𝑡 2 ) = 𝑇 (1) − 𝑇 (1 − 𝑡 2 ) = (1, 3/2) − (2, 1) = (−1, 1/2).

We are free to choose 𝑇 (𝑡 3 ). For convenience, we choose 𝑇 (𝑡 3 ) = (0, 0). Then a


required linear transformation is given by

𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ) = 𝑎𝑇 (1) + 𝑏𝑇 (𝑡) + 𝑐𝑇 (𝑡 2 ) + 𝑑𝑇 (𝑡 3 )
= (𝑎 − 𝑐, 1/2 (3𝑎 + 𝑏 + 𝑐)).


Find another linear transformation satisfying the given conditions.
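The computations in (3.5) can be checked numerically: once the images of the basis vectors are collected as the columns of a matrix, applying 𝑇 is just a matrix-vector product. The following minimal sketch uses Python with numpy (an assumption; it is not part of these notes) to verify item 1 of (3.5).

    import numpy as np

    # Columns are T(e1), T(e2), T(e3) from item 1 of (3.5).
    T_on_basis = np.array([[2, -1,  5],
                           [3,  4, -3]])

    v = np.array([3, -4, 5])     # coordinates of (3, -4, 5) in the standard basis

    # T(v) = 3 T(e1) - 4 T(e2) + 5 T(e3), i.e., a matrix-vector product.
    print(T_on_basis @ v)        # expected: [ 35 -22]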

Exercises for § 3.2


1. In each of the following, construct a linear transformation 𝑇 if it exists.
(a) 𝑇 : R2 → R2 ; 𝑇 (1, 1) = (1, −1), 𝑇 (0, 1) = (−1, 1), 𝑇 (2, −1) = (1, 0).
Ans: No. 𝑇 (2, −1) =?
(b) 𝑇 : R2 → R3 ; 𝑇 (1, 1) = (1, 0, 2), 𝑇 (2, 3) = (1, −1, 4).
Ans: 𝑇 (𝑎, 𝑏) = (2𝑎 − 𝑏, 𝑎 − 𝑏, 2𝑎).
(c) 𝑇 : R3 → R2 ; 𝑇 (1, 0, 3) = (1, 1), 𝑇 (−2, 0, −6) = (2, 1).
Ans: No. 𝑇 (−2, 0, −6) =?
(d) 𝑇 : R3 → R2 ; 𝑇 (1, 1, 0) = (0, 0), 𝑇 (0, 1, 1) = (1, 1), 𝑇 (1, 0, 1) = (1, 0).
Ans: 𝑇 (𝑎, 𝑏, 𝑐) = (𝑐, (𝑏 + 𝑐 − 𝑎)/2).
(e) 𝑇 : R3 [𝑡] → R; 𝑇 (𝑎 + 𝑏𝑡 2 ) = 0 for any 𝑎, 𝑏 ∈ R.
Ans: 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ) = 𝑏 + 𝑑 and others.
(f) 𝑇 : R𝑛 [𝑡] → R; 𝑇 (𝑝 (𝑡)) = 𝑝 (𝛼) for a fixed 𝛼 ∈ R, and 𝑇 is onto.
Ans: This 𝑇 . ∫1
(g) 𝑇 : 𝐶 1 [0, 1] → R; 𝑇 (𝑢) = ∫₀¹ (𝑢 (𝑡)) 2 𝑑𝑡 . Ans: No. 𝑇 (1 + 𝑡) =?
(h) 𝑇 : 𝐶 1 [0, 1] → R2 ; 𝑇 (𝑢) = ( ∫₀¹ 𝑢 (𝑡) 𝑑𝑡, 𝑢 ′ (0) ). Ans: This 𝑇 .


2. Does there exist a linear operator on R2 which maps the square with cor-
ners at (−1, −1), (1, −1), (1, 1), (−1, 1) onto the square with corners at
(−1, 0), (1, 0), (1, 2), (−1, 2)? Ans: No. Pre-image of (1, 2) =?
3. Let 𝑉 and 𝑊 be real ips. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Prove
that for all 𝑥, 𝑦 ∈ 𝑉 , h𝑇 𝑥,𝑇𝑦i = h𝑥, 𝑦i iff k𝑇 𝑥 k = k𝑥 k.

3.3 Range space and null space


Recall that a function 𝑓 : 𝑋 → 𝑌 is called a one-one (injective) function iff for
all 𝑤, 𝑥 ∈ 𝑋, 𝑓 (𝑤) = 𝑓 (𝑥) implies that 𝑤 = 𝑥 . The function 𝑓 is called an onto
(surjective) function iff its range {𝑓 (𝑥) : 𝑥 ∈ 𝑋 } is equal to its co-domain 𝑌 .
Whether a linear transformation is one-one or onto can be decided by looking at
two subspaces.
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then
𝑁 (𝑇 ) = {𝑣 ∈ 𝑉 : 𝑇 (𝑣) = 0} is called the null space of 𝑇 ;
𝑅(𝑇 ) = {𝑇 (𝑣) : 𝑣 ∈ 𝑉 } is called the range space of 𝑇 .
The null space of 𝑇 is also called the kernel of 𝑇 . We should justify as to why
these two sets of vectors are called spaces.

(3.6) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then 𝑁 (𝑇 ) is a subspace of 𝑉 and
𝑅(𝑇 ) is a subspace of 𝑊 .

Proof. Since 𝑇 (0) = 0, 0 ∈ 𝑁 (𝑇 ); so 𝑁 (𝑇 ) is a nonempty subset of 𝑉 . Let


𝑢, 𝑣 ∈ 𝑁 (𝑇 ) and let 𝛼 be a scalar. Then 𝑇 (𝑢) = 𝑇 (𝑣) = 0. Consequently, 𝑇 (𝑢 +𝛼𝑣) =
𝑇 (𝑢) + 𝛼𝑇 (𝑣) = 0. That is, 𝑢 + 𝛼𝑣 ∈ 𝑁 (𝑇 ). Therefore, 𝑁 (𝑇 ) is a subspace of 𝑉 .
Again, since 𝑇 (0) = 0, 𝑅(𝑇 ) is a nonempty subset of 𝑊 . Let 𝑥, 𝑦 ∈ 𝑅(𝑇 ) and let
𝛽 be a scalar. Then there exist 𝑢, 𝑣 ∈ 𝑉 such that 𝑥 = 𝑇 (𝑢) and 𝑦 = 𝑇 (𝑣). Now,
𝑥 + 𝛽𝑦 = 𝑇 (𝑢) + 𝛽𝑇 (𝑣) = 𝑇 (𝑢 + 𝛽𝑣). That is, 𝑥 + 𝛽𝑦 ∈ 𝑅(𝑇 ). Therefore, 𝑅(𝑇 ) is a
subspace of 𝑊 .
Since the null space and the range space of a linear transformation are vector
spaces, they have some dimensions. We thus give names to these numbers.
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then null(𝑇 ) := dim (𝑁 (𝑇 )) is called
the nullity of 𝑇 ; and rank(𝑇 ) := dim (𝑅(𝑇 )) is called the rank of 𝑇 .

(3.7) Example
Let 𝑇 : R3 → R2 be defined by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏, 𝑎 − 𝑐). To determine null(𝑇 ),
suppose 𝑇 (𝑎, 𝑏, 𝑐) = (0, 0), then 𝑎 = −𝑏 and 𝑎 = 𝑐. Therefore

𝑁 (𝑇 ) = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 = −𝑏 = 𝑐} = {(𝑎, −𝑎, 𝑎) : 𝑎 ∈ R}.

A basis for 𝑁 (𝑇 ) is {(1, −1, 1)}. Therefore, null(𝑇 ) = 1.


Any vector in 𝑅(𝑇 ) is of the form (𝑎 + 𝑏, 𝑎 − 𝑐) for 𝑎, 𝑏, 𝑐 ∈ R. Since

(𝑎 + 𝑏, 𝑎 − 𝑐) = 𝑎(1, 1) + 𝑏 (1, 0) + 𝑐 (0, −1),

𝑅(𝑇 ) = span {(1, 1), (1, 0), (0, −1)}. Now, (1, 1) ∈ span {(1, 0), (0, −1)}; and
{(1, 0), (0, −1)} is a linearly independent set. Therefore, a basis for 𝑅(𝑇 ) is
{(1, 0), (0, −1)}. Consequently, rank(𝑇 ) = 2.
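The rank and nullity found in (3.7) can be confirmed with a short numerical check; here Python with numpy is assumed, and the matrix below is the matrix of 𝑇 with respect to the standard bases.

    import numpy as np

    # Matrix of T(a, b, c) = (a + b, a - c).
    A = np.array([[1, 1,  0],
                  [1, 0, -1]])

    rank = np.linalg.matrix_rank(A)
    print(rank, A.shape[1] - rank)     # rank(T) = 2, null(T) = 3 - 2 = 1
    print(A @ np.array([1, -1, 1]))    # (1, -1, 1) spans N(T); output is [0 0]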

Since a linear transformation is completely determined by its action on a basis,


the images of the basis vectors should span its range space. We show this formally
in the following theorem.

(3.8) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Let 𝐵 be a basis of 𝑉 . Then

𝑅(𝑇 ) = span {𝑇 (𝑣) : 𝑣 ∈ 𝐵}.

Proof. Let 𝑤 ∈ 𝑅(𝑇 ). There exists 𝑢 ∈ 𝑉 such that𝑇 (𝑢) = 𝑤 . Since 𝐵 is a basis of𝑉 ,
there exist scalars 𝛼 1, . . . , 𝛼𝑛 and vectors 𝑣 1, . . . , 𝑣𝑛 ∈ 𝐵 such that 𝑢 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 .
Then

𝑤 = 𝑇 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = 𝛼 1𝑇 (𝑣 1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ) ∈ span {𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 )}.

That is, 𝑅(𝑇 ) ⊆ span {𝑇 (𝑣) : 𝑣 ∈ 𝐵}. Conversely, for each 𝑣 ∈ 𝐵, 𝑇 (𝑣) ∈ 𝑅(𝑇 ). As
𝑅(𝑇 ) is a vector space, we have span {𝑇 (𝑣) : 𝑣 ∈ 𝐵} ⊆ 𝑅(𝑇 ).
As we see next, whether a linear transformation is one-one or onto can be characterized
in terms of its nullity and rank.

(3.9) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation.
(1) 𝑇 is one-one iff 𝑁 (𝑇 ) ⊆ {0} iff 𝑁 (𝑇 ) = {0} iff null(𝑇 ) = 0.
(2) 𝑇 is an onto map iff 𝑊 ⊆ span {𝑇 𝑣 : 𝑣 ∈ 𝐵} for any basis 𝐵 of 𝑉 iff
rank(𝑇 ) = dim (𝑊 ).

Proof. (1) Assume that 𝑇 is one-one. If 𝑥 ∈ 𝑁 (𝑇 ), then 𝑇 (𝑥) = 0, which is equal


to 𝑇 (0). Now, 𝑇 (𝑥) = 𝑇 (0) implies 𝑥 = 0. That is, 𝑁 (𝑇 ) ⊆ {0}.
Conversely, suppose 𝑁 (𝑇 ) ⊆ {0}. Let 𝑇 (𝑥) = 𝑇 (𝑦). Then 𝑇 (𝑥 − 𝑦) = 0. So,
𝑥 − 𝑦 ∈ 𝑁 (𝑇 ). That is, 𝑥 − 𝑦 = 0. Therefore, 𝑇 is one-one.
Other equivalences are trivial.
(2) As 𝑅(𝑇 ) ⊆ 𝑊 , the linear transformation 𝑇 is onto iff 𝑊 ⊆ 𝑅(𝑇 ). Let 𝐵 be any
basis of 𝑉 . From (3.8), it follows that 𝑅(𝑇 ) = span {𝑇 𝑣 : 𝑣 ∈ 𝐵}. Therefore, 𝑇 is an
onto map iff 𝑊 ⊆ span {𝑇 𝑣 : 𝑣 ∈ 𝐵}. The other equivalence is trivial.
Look at (3.7). Does it suggest any connection between the rank and the nullity of
a linear transformation?

(3.10) Theorem (Rank-nullity)


Let 𝑉 be a finite dimensional vector space. Let 𝑇 : 𝑉 → 𝑊 be a linear transfor-
mation. Then rank(𝑇 ) + null(𝑇 ) = dim (𝑉 ).

Proof. If 𝑇 = 0, the zero map, then 𝑅(𝑇 ) = {0} and 𝑁 (𝑇 ) = 𝑉 . Clearly, the
dimension formula holds. So, assume that 𝑇 is a nonzero linear transformation.
The null space 𝑁 (𝑇 ) is a subspace of the finite dimensional vector space 𝑉 . So, let
𝐵 = {𝑣 1, . . . , 𝑣𝑘 } be a basis of 𝑁 (𝑇 ). [It includes the case of 𝑁 (𝑇 ) = {0}. In this
case, 𝐵 = ∅; and we take 𝑘 = 0, which is the number of vectors in 𝐵.] Extend 𝐵 to
a basis 𝐵 ∪ {𝑤 1, . . . , 𝑤𝑛 } for 𝑉 . Let 𝐸 = {𝑇 (𝑤 1 ), . . . ,𝑇 (𝑤𝑛 )}. We show that 𝐸 is a
basis of 𝑅(𝑇 ).
By (3.8), 𝑅(𝑇 ) = span {𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑘 ),𝑇 (𝑤 1 ), . . . ,𝑇 (𝑤𝑛 )}. Since 𝑇 (𝑣𝑖 ) = 0 for
each 𝑖 ∈ {1, . . . , 𝑘 }, 𝑅(𝑇 ) = span {𝑇 (𝑤 1 ), . . . ,𝑇 (𝑤𝑛 )} = span (𝐸).
For linear independence of 𝐸, let 𝑏 1, . . . , 𝑏𝑛 be scalars such that

𝑏 1𝑇 (𝑤 1 ) + · · · + 𝑏𝑛𝑇 (𝑤𝑛 ) = 0.

Then 𝑇 (𝑏 1𝑤 1 + · · · + 𝑏𝑛𝑤𝑛 ) = 0. So, 𝑏 1𝑤 1 + · · · + 𝑏𝑛𝑤𝑛 ∈ 𝑁 (𝑇 ). Since 𝐵 is a basis


of 𝑁 (𝑇 ), there exist scalars 𝑎 1, . . . , 𝑎𝑘 such that

𝑏 1𝑤 1 + · · · + 𝑏𝑛𝑤𝑛 = 𝑎 1𝑣 1 + · · · + 𝑎𝑘 𝑣𝑘 .

That is,
𝑎 1𝑣 1 + · · · + 𝑎𝑘 𝑣𝑘 − 𝑏 1𝑤 1 − · · · − 𝑏𝑛𝑤𝑛 = 0.
Since 𝐵 ∪ {𝑤 1, . . . , 𝑤𝑛 } is a basis of 𝑉 , we have 𝑎 1 = · · · = 𝑎𝑘 = 𝑏 1 = · · · = 𝑏𝑛 = 0.
Hence 𝐸 is linearly independent.
Now that 𝐵 is a basis for 𝑁 (𝑇 ), 𝐸 is a basis for 𝑅(𝑇 ), and 𝐵 ∪ {𝑤 1, . . . , 𝑤𝑛 } is a
basis for 𝑉 , we have rank(𝑇 ) + null(𝑇 ) = 𝑛 + 𝑘 = dim (𝑉 ).

(3.11) Example
Let 𝑇 : R2 [𝑡] → R4 be defined by 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 ) = (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎).
Determine rank(𝑇 ) and null(𝑇 ).
We find that 𝑅(𝑇 ) = {(𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) : 𝑎, 𝑏, 𝑐 ∈ R}. Since

(𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) = 𝑎(1, 0, 1, −2) + 𝑏 (−1, 1, 0, 0) + 𝑐 (0, −1, 1, 0),

the vectors (1, 0, 1, −2), (−1, 1, 0, 0) and (0, −1, 1, 0) span 𝑅(𝑇 ).
Alternatively, we start with a basis of R2 [𝑡], say {1, 𝑡, 𝑡 2 }. Now,

𝑇 (1) = (1, 0, 1, −2), 𝑇 (𝑡) = (−1, 1, 0, 0), 𝑇 (𝑡 2 ) = (0, −1, 1, 0).

By (3.8), 𝑅(𝑇 ) = span {(1, 0, 1, −2), (−1, 1, 0, 0), (0, −1, 1, 0)}.
To check linear independence, suppose

𝑎(1, 0, 1, −2) + 𝑏 (−1, 1, 0, 0) + 𝑐 (0, −1, 1, 0) = 0.

Then (𝑎−𝑏, 𝑏 −𝑐, 𝑐 +𝑎, −2𝑎) = (0, 0, 0, 0). So, 𝑎−𝑏 = 0, 𝑏 −𝑐 = 0, 𝑐 +𝑎 = 0, −2𝑎 = 0.
Solving, we find that 𝑎 = 𝑏 = 𝑐 = 0. So, 𝐵 = {(1, 0, 1, −2), (−1, 1, 0, 0), (0, −1, 1, 0)}
is linearly independent. That is, 𝐵 is a basis of 𝑅(𝑇 ). Hence, rank(𝑇 ) = 3.

To compute null(𝑇 ), let 𝑣 = (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 ) ∈ 𝑁 (𝑇 ). Then

𝑇 (𝑣) = (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) = (0, 0, 0, 0).

As earlier, it implies that 𝑎 = 𝑏 = 𝑐 = 0. That is, 𝑣 = 0. So, 𝑁 (𝑇 ) = {0}; and then


null(𝑇 ) = 0.
Notice that by Rank-nullity theorem, rank(𝑇 ) = dim (R2 [𝑡]) − null(𝑇 ) = 3. This
is a shorter way to compute rank(𝑇 ).
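As a numerical cross-check of (3.11), the images 𝑇 (1), 𝑇 (𝑡), 𝑇 (𝑡 2 ) can be placed as the columns of a matrix and its rank computed; a Python/numpy sketch (assumed tooling) follows.

    import numpy as np

    # Columns are T(1), T(t), T(t^2) as vectors in R^4, from (3.11).
    M = np.array([[ 1, -1,  0],
                  [ 0,  1, -1],
                  [ 1,  0,  1],
                  [-2,  0,  0]])

    rank = np.linalg.matrix_rank(M)
    print(rank, M.shape[1] - rank)   # rank(T) = 3, null(T) = 0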

Further, the Rank-nullity theorem implies the following:


If 𝑚 < 𝑛, then no linear transformation from R𝑚 to R𝑛 is onto.
If 𝑚 > 𝑛, then no linear transformation from R𝑚 to R𝑛 is one-one.

Exercises for § 3.3


1. Determine rank(𝑇 ) and null(𝑇 ) in each of the following cases:
(a) 𝑇 : R2 → R2 ; 𝑇 (𝑎, 𝑏) = (𝑎 − 𝑏, 2𝑏).
(b) 𝑇 : R3 → R2 ; 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 − 𝑏, 2𝑐).
(c) 𝑇 : R2 → R3 ; 𝑇 (𝑎, 𝑏) = (𝑎 + 𝑏, 𝑎 − 𝑏, 0).
(d) 𝑇 : R2 → R3 ; 𝑇 (𝑎, 𝑏) = (𝑎 + 𝑏, 0, 2𝑏 − 𝑎).
(e) 𝑇 : R2 → R3 ; 𝑇 (𝑎, 𝑏) = (𝑎 + 𝑏, 𝑎 + 2𝑏, 2𝑎 − 2𝑏).
(f) 𝑇 : R2 [𝑡] → R3 [𝑡]; 𝑇 (𝑝 (𝑡)) = 𝑡 𝑝 (𝑡) + 𝑝 ′ (𝑡).
Ans: rank(𝑇 ), null(𝑇 ) = (a) 2,0 (b) 2,1 (c) 2,0 (d) 2,0 (e) 2,0 (f) 3,0.
2. Let 𝑉 be the vector space of all functions from R to R, having derivatives of
all orders. Let 𝑇 : 𝑉 → 𝑉 be the differential operator: 𝑇 𝑥 = 𝑥 ′. What is
𝑁 (𝑇 )? Ans: Space of all const. functions.
3. Define 𝑇 : R3 → R3 by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑏 + 𝑐, 𝑎 + 𝑏 + 2𝑐, 𝑎 + 𝑐).
(a) Determine the null space 𝑁 (𝑇 ) and the range space 𝑅(𝑇 ).
(b) Let 𝑆 = {𝑣 ∈ R3 : 𝑇 (𝑣) = (1, 3, 2)}. Express the subset 𝑆 of R3 in the
form 𝑆 = {𝑢 + 𝑥 : 𝑥 ∈ 𝑁 (𝑇 )} for a particular vector 𝑢 ∈ 𝑆.
Ans: (a) 𝑁 (𝑇 ) = {(𝑎, 𝑎, −𝑎) : 𝑎 ∈ R}, 𝑅(𝑇 ) = {(𝑎, 𝑎 + 𝑏, 𝑏) : 𝑎, 𝑏 ∈ R}.
(b) 𝑆 = {(1, 0, 1) + 𝑥 : 𝑥 ∈ 𝑁 (𝑇 )}.
4. Let 𝑉 and 𝑊 be finite dimensional vector spaces. Let 𝑇 : 𝑉 → 𝑊 be a linear
transformation. Give reasons for the following:
(a) rank(𝑇 ) ≤ min{dim (𝑉 ), dim (𝑊 )}.
(b) 𝑇 is onto implies dim (𝑊 ) ≤ dim (𝑉 ).
(c) 𝑇 is one-one implies dim (𝑉 ) ≤ dim (𝑊 ).
(d) dim (𝑉 ) > dim (𝑊 ) implies 𝑇 is not one-one.
(e) dim (𝑉 ) < dim (𝑊 ) implies 𝑇 is not onto.
5. Give an example for each of the following:
(a) A linear transformation 𝑇 : R2 → R2 with 𝑁 (𝑇 ) = 𝑅(𝑇 ).
(b) Linear transformations 𝑆 ≠ 𝑇 with 𝑁 (𝑆) = 𝑁 (𝑇 ) and 𝑅(𝑆) = 𝑅(𝑇 ).
Ans: (a) 𝑇 (𝑎, 𝑏) = (𝑎 − 𝑏, 𝑎 − 𝑏) (b) 𝑆 (𝑎, 𝑏) = (𝑎, 𝑏), 𝑇 (𝑎, 𝑏) = (𝑏, 𝑎).
6. Let 𝑈 be a subspace of a finite dimensional vector space 𝑉 .
(a) Does there exist a linear operator 𝑇 on 𝑉 such that 𝑅(𝑇 ) = 𝑈 ?
(b) Does there exist a linear operator 𝑇 on 𝑉 such that 𝑁 (𝑇 ) = 𝑈 ?
Ans: (a) Yes (b) Yes. Use extension of basis.
7. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation and let {𝑣 1, . . . , 𝑣𝑛 } be a basis of
𝑉 . Show the following:
(a) 𝑇 is one-one iff 𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 ) are linearly independent.
(b) 𝑇 is onto iff span {𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 )} = 𝑊 .
(c) 𝑇 is one-one and onto iff the list of vectors 𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 ) is a basis
for 𝑊 .
8. Show that a subset {𝑣 1, . . . , 𝑣𝑛 } of a vector space 𝑉 is linearly independent
iff the map 𝑓 : F𝑛 → 𝑉 defined by 𝑓 (𝛼 1, 𝛼 2, . . . , 𝛼𝑛 ) = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 is
one-one.
9. Let 𝑇 : 𝑉 → 𝑉 be a linear operator such that 𝑇 2 = 𝑇 . Let 𝐼 denote the identity
operator. Prove that 𝑅(𝑇 ) = 𝑁 (𝐼 − 𝑇 ) and 𝑁 (𝑇 ) = 𝑅(𝐼 − 𝑇 ).
10. Let 𝑆 : 𝑈 → 𝑉 and 𝑇 : 𝑉 → 𝑊 be linear transformations where 𝑈 , 𝑉 and 𝑊
are of finite dimensions. Show that rank(𝑇 𝑆) ≤ min{rank(𝑇 ), rank(𝑆)}.

3.4 Isomorphisms
Recall that a function 𝑓 : 𝑋 → 𝑌 is one-one and onto iff there exists a unique
function 𝑔 : 𝑌 → 𝑋 such that 𝑔 ◦ 𝑓 = 𝐼𝑋 and 𝑓 ◦ 𝑔 = 𝐼𝑌 . Here, the map 𝐼𝑋 is the
identity map on 𝑋 and similarly, 𝐼𝑌 is the identity map on 𝑌 . In such a case, the
function 𝑓 is said to be invertible, and its inverse, which is the function 𝑔, is denoted
by 𝑓 −1 .
We give a name to a one-one onto linear transformation.
A one-one and onto linear transformation is called an isomorphism.
Two vector spaces are called isomorphic to each other iff there exists an isomor-
phism from one to the other.

(3.12) Example

1. Define the linear transformation 𝑇 : R2 → R2 by

𝑇 (𝑎, 𝑏) = (𝑎 − 𝑏, 2𝑎 + 𝑏) for (𝑎, 𝑏) ∈ R2 .

If (𝑎, 𝑏) ∈ 𝑁 (𝑇 ), then 𝑇 (𝑎, 𝑏) = (0, 0). It leads to 𝑎 − 𝑏 = 0 = 2𝑎 + 𝑏. Thus,


𝑎 = 𝑏 = 0. That is, 𝑁 (𝑇 ) ⊆ {0}. Therefore 𝑇 is one-one.
By Rank-nullity theorem rank(𝑇 ) = 2 − null(𝑇 ) = 2 − 0 = 2. Thus 𝑅(𝑇 ) = R2 ;
consequently, 𝑇 is an onto map.
Therefore, 𝑇 is an isomorphism.
2. 𝑇 : R𝑛 → R𝑛 defined by 𝑇 (𝑎 1, . . . , 𝑎𝑛 ) = (𝑎 1, 𝑎 1 + 𝑎 2, . . . , 𝑎 1 + · · · + 𝑎𝑛 ) is an
isomorphism.
3. 𝑇 : R𝑛 [𝑡] → R𝑛+1 defined by 𝑇 (𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 ) = (𝑎 0, . . . , 𝑎𝑛 ) is an
isomorphism.
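For item 1 of (3.12), one-one and onto can also be seen from the invertibility of the matrix of 𝑇 ; the sketch below (Python with numpy assumed) checks this via the determinant.

    import numpy as np

    # Matrix of T(a, b) = (a - b, 2a + b) in the standard basis.
    A = np.array([[1, -1],
                  [2,  1]])

    print(np.linalg.det(A))    # 3.0, nonzero, so T is an isomorphism
    print(np.linalg.inv(A))    # matrix of the inverse isomorphism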

(3.13) Theorem
The inverse of an isomorphism is also an isomorphism.

Proof. Let 𝑇 : 𝑉 → 𝑊 be an isomorphism. Since 𝑇 is one-one and onto, its


inverse 𝑇 −1 exists, and it is also one-one and onto. We show that 𝑇 −1 is a linear
transformation. So, let 𝑥, 𝑦 ∈ 𝑊 and let 𝛼 be a scalar. Since 𝑇 is onto, there exist
𝑢, 𝑣 ∈ 𝑉 such that 𝑇 (𝑢) = 𝑥 and 𝑇 (𝑣) = 𝑦. Then 𝑇 (𝑢 + 𝛼𝑣) = 𝑇 (𝑢) + 𝛼𝑇 (𝑣) = 𝑥 + 𝛼𝑦.
That is, 𝑇 −1 (𝑥 + 𝛼𝑦) = 𝑢 + 𝛼𝑣 = 𝑇 −1 (𝑥) + 𝛼𝑇 −1 (𝑦). Therefore, 𝑇 −1 is a linear
transformation.
Each vector space is isomorphic to itself. Reason? The identity map is an
isomorphism. If 𝑉 is isomorphic to 𝑊 , then the inverse of an isomorphism from 𝑉
to 𝑊 is an isomorphism from 𝑊 to 𝑉 . That is, 𝑊 is also isomorphic to 𝑉 . Further,
(3.3) implies that a composition of isomorphisms is an isomorphism. Thus, if
𝑈 is isomorphic to 𝑉 and 𝑉 is isomorphic to 𝑊 , then 𝑈 is also isomorphic to 𝑊 .
Therefore, ‘is isomorphic to’ is an equivalence relation on the set of all vector spaces.
This is the reason we have been talking of “an isomorphism between spaces” and
“two spaces being isomorphic”.
The following result is a corollary to Rank-nullity theorem.

(3.14) Theorem
Let 𝑉 and 𝑊 be finite dimensional vector spaces with dim (𝑉 ) = dim (𝑊 ). Let
𝑇 : 𝑉 → 𝑊 be a linear transformation. Then 𝑇 is an isomorphism iff 𝑇 is one-one
iff 𝑇 is onto.
Proof. 𝑇 is one-one iff null(𝑇 ) = 0 iff rank(𝑇 ) = dim (𝑉 ) iff rank(𝑇 ) = dim (𝑊 )
iff 𝑇 is onto.

(3.15) Example
Let 𝑊 = {(𝑎, 𝑏, 𝑐, 𝑑) ∈ R4 : 𝑎 + 𝑏 + 𝑐 + 𝑑 = 0}. Define 𝑇 : R2 [𝑡] → 𝑊 by

𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 ) = (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎).

The map 𝑇 is well defined since (𝑎 − 𝑏) + (𝑏 − 𝑐) + (𝑐 + 𝑎) + (−2𝑎) = 0 shows that


(𝑎 −𝑏, 𝑏 −𝑐, 𝑐 +𝑎, −2𝑎) ∈ 𝑊 . It is also easy to verify that 𝑇 is a linear transformation.
We look for 𝑁 (𝑇 ) and 𝑅(𝑇 ).
As in (3.11), 𝑁 (𝑇 ) ⊆ {0}. So, null(𝑇 ) = 0. Thus rank(𝑇 ) = 3. Also, dim (𝑊 ) = 3,
as {(1, −1, 0, 0), (1, 0, −1, 0), (1, 0, 0, −1)} is a basis of 𝑊 . Hence 𝑇 is an isomor-
phism.
To see that 𝑊 = 𝑅(𝑇 ) directly we may proceed as follows. We start with a
basis of 𝑊 , say, {(1, 0, 0, −1), (0, 1, 0, −1), (0, 0, 1, −1)}. Are these basis vectors in
𝑅(𝑇 )? For the first basis vector, we look for a polynomial 𝑎 + 𝑏𝑡 + 𝑐𝑡 2 such that
𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 ) = (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) = (1, 0, 0, −1).
It leads to 𝑎 = 1/2, 𝑏 = −1/2, 𝑐 = −1/2. We verify:

𝑇 ( 1/2 − 1/2 𝑡 − 1/2 𝑡 2 ) = ( 1/2 + 1/2, −1/2 + 1/2, −1/2 + 1/2, −2 × 1/2 ) = (1, 0, 0, −1).

Proceeding similarly for the other two basis vectors, we see that

𝑇 ( 1/2 + 1/2 𝑡 − 1/2 𝑡 2 ) = ( 1/2 − 1/2, 1/2 + 1/2, −1/2 + 1/2, −2 × 1/2 ) = (0, 1, 0, −1).
𝑇 ( 1/2 + 1/2 𝑡 + 1/2 𝑡 2 ) = ( 1/2 − 1/2, 1/2 − 1/2, 1/2 + 1/2, −2 × 1/2 ) = (0, 0, 1, −1).
Therefore, 𝑊 = span {(1, 0, 0, −1), (0, 1, 0, −1), (0, 0, 1, −1)} ⊆ 𝑅(𝑇 ) ⊆ 𝑊 .

Clearly, a one-one linear transformation is an isomorphism from its domain space


to its range space.
The dimensions of two vector spaces provide complete information as to when
an isomorphism may exist from one to the other.

(3.16) Theorem
A vector space is isomorphic to a finite dimensional vector space iff both the spaces
have equal dimensions.

Proof. Let 𝑉 and 𝑊 be vector spaces, where 𝑉 is of finite dimension. Let


𝑇 : 𝑉 → 𝑊 be an isomorphism. Then 𝑁 (𝑇 ) = {0} and 𝑅(𝑇 ) = 𝑊 . By Rank-nullity
theorem, dim (𝑉 ) = null(𝑇 ) + rank(𝑇 ) = 0 + rank(𝑇 ) = dim (𝑊 ).

Conversely, suppose dim (𝑉 ) = dim (𝑊 ) = 𝑛. Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } be a basis


of 𝑉 and let 𝐸 = {𝑤 1, . . . , 𝑤𝑛 } be a basis of 𝑊 . Using (3.4), define the linear
transformation 𝑇 : 𝑉 → 𝑊 by 𝑇 (𝑣𝑖 ) = 𝑤𝑖 for 1 ≤ 𝑖 ≤ 𝑛.
Since 𝐵 and 𝐸 span 𝑉 and 𝑊 respectively, 𝑇 is onto. By (3.14), 𝑇 is an isomor-
phism.
It thus follows that a finite dimensional vector space cannot be isomorphic to an
infinite dimensional vector space.
As (3.16) asserts, isomorphisms preserve dimension. We add a little flavour to it.

(3.17) Theorem
Let 𝑃 : 𝑈 → 𝑉 , 𝑇 : 𝑉 → 𝑊 and 𝑄 : 𝑊 → 𝑋 be linear transformations, where 𝑉
and 𝑊 are finite dimensional vector spaces. Suppose 𝑃 and 𝑄 are isomorphisms.
Then rank(𝑄𝑇 𝑃) = rank(𝑇 ) and null(𝑄𝑇 𝑃) = null(𝑇 ).

Proof. We first prove that (1) 𝑅(𝑇 𝑃) = 𝑅(𝑇 ), (2) 𝑁 (𝑄𝑇 ) = 𝑁 (𝑇 ).


(1) Let 𝑥 ∈ 𝑅(𝑇 𝑃). There exists 𝑢 ∈ 𝑈 such that 𝑥 = 𝑇 (𝑃 (𝑢)). Clearly, 𝑥 ∈ 𝑅(𝑇 ).
Conversely, let 𝑦 ∈ 𝑅(𝑇 ). There exists 𝑣 ∈ 𝑉 such that 𝑦 = 𝑇 (𝑣). Then 𝑦 = 𝑇 (𝑣) =
(𝑇 𝑃) 𝑃 −1 (𝑣) ∈ 𝑅(𝑇 𝑃). Hence 𝑅(𝑇 𝑃) = 𝑅(𝑇 ).


(2) Let 𝑢 ∈ 𝑁 (𝑄𝑇 ). Then 𝑄𝑇 (𝑢) = 0 implies 𝑇 (𝑢) = 𝑄 −1 (𝑄𝑇 (𝑢)) = 0. That is,
𝑢 ∈ 𝑁 (𝑇 ). Conversely, let 𝑧 ∈ 𝑁 (𝑇 ). Then 𝑇 (𝑧) = 0. Clearly, 𝑄 (𝑇 (𝑧)) = 0. So,
𝑧 ∈ 𝑁 (𝑄𝑇 ). Therefore, 𝑁 (𝑄𝑇 ) = 𝑁 (𝑇 ).
By (1), rank(𝑇 𝑃) = rank(𝑇 ). Since 𝑃 : 𝑈 → 𝑉 is an isomorphism, dim (𝑈 ) =
dim (𝑉 ). By Rank-nullity theorem, we obtain

null(𝑇 𝑃) = dim (𝑈 ) − rank(𝑇 𝑃) = dim (𝑉 ) − rank(𝑇 ) = null(𝑇 ).

Similarly, using (2), we have

rank(𝑄𝑇 ) = dim (𝑉 ) − null(𝑄𝑇 ) = dim (𝑉 ) − null(𝑇 ) = rank(𝑇 ).

So, rank(𝑇 𝑃) = rank(𝑇 ) = rank(𝑄𝑇 ) and null(𝑇 𝑃) = null(𝑇 ) = null(𝑄𝑇 ). It thus


follows that rank(𝑄𝑇 𝑃) = rank(𝑇 ) and null(𝑄𝑇 𝑃) = null(𝑇 ).
The following result provides a converse to (3.17).

(3.18) Theorem
Let 𝑈 , 𝑉 ,𝑊 and 𝑋 be finite dimensional vector spaces with dim (𝑈 ) = dim (𝑉 ) and
dim (𝑊 ) = dim (𝑋 ). Let 𝑇 : 𝑉 → 𝑊 and 𝑆 : 𝑈 → 𝑋 be linear transformations. If
rank(𝑆) = rank(𝑇 ), then there exist isomorphisms 𝑃 : 𝑈 → 𝑉 and 𝑄 : 𝑊 → 𝑋 such
that 𝑆 = 𝑄𝑇 𝑃 .
The conditions dim (𝑈 ) = dim (𝑉 ) and dim (𝑊 ) = dim (𝑋 ) imply that there exist
isomorphisms 𝑃 : 𝑈 → 𝑉 and 𝑄 : 𝑊 → 𝑋 . However, the composition formula
𝑆 = 𝑄𝑇 𝑃 may not hold, in general. The theorem says that such a composition
formula holds for some isomorphisms 𝑃 and 𝑄 when rank(𝑆) = rank(𝑇 ).
Proof. Suppose that dim (𝑈 ) = dim (𝑉 ) = 𝑛, dim (𝑊 ) = dim (𝑋 ) = 𝑚, and
rank(𝑇 ) = rank(𝑆) = 𝑟 . Then null(𝑇 ) = null(𝑆) = 𝑛 − 𝑟 .
Choose a basis {𝑣 1, . . . , 𝑣𝑛−𝑟 } for 𝑁 (𝑇 ), which is a subspace of 𝑉 . Extend this
basis to {𝑣 1, . . . , 𝑣𝑛−𝑟 , . . . , 𝑣𝑛 } for 𝑉 . Similarly, choose a basis {𝑢 1, . . . , 𝑢𝑛−𝑟 } for
𝑁 (𝑆), which is a subspace of 𝑈 . Extend this to a basis {𝑢 1, . . . , 𝑢𝑛−𝑟 , . . . , 𝑢𝑛 } for 𝑈 .
Then define the vectors

𝑤 1 = 𝑇 𝑣𝑛−𝑟 +1, . . . , 𝑤𝑟 = 𝑇 𝑣𝑛 ; 𝑥 1 = 𝑆𝑢𝑛−𝑟 +1, . . . , 𝑥𝑟 = 𝑆𝑢𝑛 .

As in the proof of Rank-nullity theorem, {𝑤 1, . . . , 𝑤𝑟 } is a basis for 𝑅(𝑇 ), and


{𝑥 1, . . . , 𝑥𝑟 } is a basis for 𝑅(𝑆). Then extend these to bases {𝑤 1, . . . , 𝑤𝑟 , . . . , 𝑤𝑚 } for
𝑊 , and {𝑥 1, . . . , 𝑥𝑟 , . . . , 𝑥𝑚 } for 𝑋 . Define the linear transformations 𝑃 : 𝑈 → 𝑉 and
𝑄 : 𝑊 → 𝑋 by specifying their actions on the bases of 𝑈 and 𝑊 as in the following:

𝑃 (𝑢 1 ) = 𝑣 1, . . . , 𝑃 (𝑢𝑛 ) = 𝑣𝑛 ; 𝑄 (𝑤 1 ) = 𝑥 1, . . . , 𝑄 (𝑤𝑚 ) = 𝑥𝑚 .

Since each basis vector 𝑣𝑖 of 𝑉 is in 𝑅(𝑃), 𝑉 ⊆ 𝑅(𝑃); that is, 𝑃 is onto. By (3.14), 𝑃
is an isomorphism. Similarly, 𝑄 is also an isomorphism.
For 1 ≤ 𝑖 ≤ 𝑛 − 𝑟, 𝑄 (𝑇 (𝑃 (𝑢𝑖 ))) = 𝑄 (𝑇 (𝑣𝑖 )) = 𝑄 (0) = 0 = 𝑆 (𝑢𝑖 ).
For 𝑛 − 𝑟 < 𝑖 ≤ 𝑛, 𝑄 (𝑇 (𝑃 (𝑢𝑖 ))) = 𝑄 (𝑇 (𝑣𝑖 )) = 𝑄 (𝑤𝑖 ) = 𝑥𝑖 = 𝑆 (𝑢𝑖 ).
Since the linear transformations 𝑄𝑇 𝑃 and 𝑆 act the same way on each of the basis
vectors 𝑢 1, . . . , 𝑢𝑛 of 𝑈 , we conclude that 𝑄𝑇 𝑃 = 𝑆.
The results in (3.17-3.18) are often quoted by telling informally that
isomorphisms preserve rank and nullity of a linear transformation.

Exercises for § 3.4


1. Let {𝑣 1, . . . , 𝑣𝑛 } be an ordered basis of a vector space 𝑉 . Prove that the linear
transformation 𝑇 : 𝑉 → F𝑛 given by 𝑇 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = (𝛼 1, . . . , 𝛼𝑛 ) is an
isomorphism.
2. Find an isomorphism between F𝑛+1 and F𝑛 [𝑡] other than that in (3.12).
Ans: 𝑇 (𝑎 0, 𝑎 1, . . . , 𝑎𝑛 ) = (𝑎 1 + 𝑎 2𝑡 + · · · + 𝑎𝑛 𝑡 𝑛−1 + 𝑎 0𝑡 𝑛 ).
3. Show that a linear transformation 𝑇 : 𝑉 → 𝑊 is an isomorphism iff for every
basis {𝑣 1, . . . , 𝑣𝑛 } of 𝑉 , the list of vectors 𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 ) is an ordered basis
of 𝑊 .

4. Let 𝑇 : 𝑉 → R be a nonzero linear transformation, where 𝑉 ≠ {0}. Prove or


disprove: 𝑇 is onto iff null(𝑇 ) = dim (𝑉 ) − 1.
5. Let 𝑉 and 𝑊 be finite dimensional vector spaces, and let 𝑆, 𝑇 : 𝑉 → 𝑊 be
linear transformations. Prove that rank(𝑆) = rank(𝑇 ) iff there exist isomor-
phisms 𝑃 : 𝑉 → 𝑉 and 𝑄 : 𝑊 → 𝑊 such that 𝑆 = 𝑄𝑇 𝑃 .
6. Let 𝑇 : 𝑉 → F𝑛 be an isomorphism. Let h , i be the standard inner product
on F𝑛 . Show that h𝑥, 𝑦i𝑇 := h𝑇 (𝑥),𝑇 (𝑦)i for all 𝑥, 𝑦 ∈ 𝑉 , defines an inner
product on 𝑉 .
7. Let 𝑇 : 𝑉 → 𝑊 and 𝑆 : 𝑊 → 𝑉 be linear transformations, where 𝑉 and 𝑊
are finite dimensional vector spaces. Show the following:

(a) If dim (𝑉 ) = dim (𝑊 ), then 𝑆𝑇 = 𝐼𝑉 implies that 𝑇 𝑆 = 𝐼𝑊 .


(b) If dim (𝑉 ) ≠ dim (𝑊 ), then 𝑆𝑇 = 𝐼𝑉 does not necessarily imply that
𝑇 𝑆 = 𝐼𝑊 .

8. Give an example of an infinite dimensional vector space 𝑉 and linear operators


𝑆,𝑇 on 𝑉 such that 𝑆𝑇 = 𝐼 but 𝑇 𝑆 ≠ 𝐼 .
Ans: 𝑉 = R∞, 𝑇 (𝑎 1, 𝑎 2, . . .) = (0, 𝑎 1, 𝑎 2, . . .), 𝑆 (𝑏, 𝑎 1, 𝑎 2, . . .) = (𝑎 1, 𝑎 2, . . .).

3.5 Adjoint of a Linear Transformation


Recall that for ease in reading we write 𝑇 (𝑥) as 𝑇 𝑥 and 𝑆 ◦ 𝑇 as 𝑆𝑇 . We also write
inner products on different spaces using the same notation h , i.
In inner product spaces, equality of two linear transformations can be shown by
using the inner products.

(3.19) Theorem
Let 𝑉 be a vector space, 𝑊 be an inner product space, and let 𝑆, 𝑇 : 𝑉 → 𝑊 be
linear transformations. Then, 𝑆 = 𝑇 iff h𝑆𝑣, 𝑤i = h𝑇 𝑣, 𝑤i for all 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 .

Proof. If 𝑆 = 𝑇 , then h𝑆𝑣, 𝑤i = h𝑇 𝑣, 𝑤i for all 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 . Conversely, let


h𝑆𝑣, 𝑤i = h𝑇 𝑣, 𝑤i for all 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 . In particular, with 𝑤 = (𝑆 − 𝑇 )𝑣, we obtain
h(𝑆 − 𝑇 )𝑣, (𝑆 − 𝑇 )𝑣i = 0. It implies that for each 𝑣 ∈ 𝑉 , (𝑆 − 𝑇 )𝑣 = 0.
That is, 𝑆 = 𝑇 .

Linear transformations on finite dimensional inner product spaces give rise to


new linear transformations that work backward. This can be explicitly exhibited in
the presence of an orthonormal basis for the domain space.
(3.20) Theorem
Let 𝑉 and 𝑊 be inner product spaces, where 𝑉 is of finite dimension. Then,
corresponding to each linear transformation𝑇 : 𝑉 → 𝑊 , there exists a unique linear
transformation 𝑆 : 𝑊 → 𝑉 such that h𝑇 𝑣, 𝑤i = h𝑣, 𝑆𝑤i for all 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 .

Proof. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Let {𝑣 1, . . . , 𝑣𝑛 } be an


orthonormal basis of 𝑉 . Let 𝑣 ∈ 𝑉 . By Fourier expansion, we have
𝑣 = h𝑣, 𝑣 1 i𝑣 1 + · · · + h𝑣, 𝑣𝑛 i𝑣𝑛 . Then 𝑇 𝑣 = h𝑣, 𝑣 1 i𝑇 𝑣 1 + · · · + h𝑣, 𝑣𝑛 i𝑇 𝑣𝑛 . For 𝑤 ∈ 𝑊 ,
we compute h𝑇 𝑣, 𝑤i:

h𝑇 𝑣, 𝑤i = h𝑣, 𝑣 1 ih𝑇 𝑣 1, 𝑤i + · · · + h𝑣, 𝑣𝑛 ih𝑇 𝑣𝑛 , 𝑤i
= h𝑣, h𝑤,𝑇 𝑣 1 i 𝑣 1 + · · · + h𝑤,𝑇 𝑣𝑛 i 𝑣𝑛 i,

since h𝑣, h𝑤,𝑇 𝑣𝑖 i 𝑣𝑖 i = h𝑇 𝑣𝑖 , 𝑤i h𝑣, 𝑣𝑖 i for each 𝑖.

This suggests that we define the map 𝑆 : 𝑊 → 𝑉 by


𝑆 (𝑤) = h𝑤,𝑇 𝑣 1 i 𝑣 1 + · · · + h𝑤,𝑇 𝑣𝑛 i 𝑣𝑛 for each 𝑤 ∈ 𝑊 .

Is 𝑆 a linear transformation? Let 𝑥, 𝑦 ∈ 𝑊 and let 𝛼 be a scalar. Now,


𝑆 (𝑥 + 𝛼𝑦) = h𝑥 + 𝛼𝑦,𝑇 𝑣 1 i 𝑣 1 + · · · + h𝑥 + 𝛼𝑦,𝑇 𝑣𝑛 i 𝑣𝑛
= (h𝑥,𝑇 𝑣 1 i + 𝛼 h𝑦,𝑇 𝑣 1 i) 𝑣 1 + · · · + (h𝑥,𝑇 𝑣𝑛 i + 𝛼 h𝑦,𝑇 𝑣𝑛 i) 𝑣𝑛
= (h𝑥,𝑇 𝑣 1 i𝑣 1 + · · · + h𝑥,𝑇 𝑣𝑛 i𝑣𝑛 ) + 𝛼 (h𝑦,𝑇 𝑣 1 i𝑣 1 + · · · + h𝑦,𝑇 𝑣𝑛 i𝑣𝑛 ) = 𝑆 (𝑥) + 𝛼𝑆 (𝑦).

Therefore, 𝑆 is a linear transformation satisfying h𝑇 𝑣, 𝑤i = h𝑣, 𝑆𝑤i for all 𝑣 ∈ 𝑉 ,


𝑤 ∈𝑊.
For uniqueness of 𝑆, let 𝑄 : 𝑊 → 𝑉 be a linear transformation such that

h𝑇 𝑣, 𝑤i = h𝑣, 𝑄𝑤i for all 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 .

Then for all 𝑣 ∈ 𝑉 and 𝑤 ∈ 𝑊 , we obtain h𝑄𝑤, 𝑣i = h𝑇 𝑣, 𝑤i = h𝑣, 𝑆𝑤i = h𝑆𝑤, 𝑣i.
By (3.19), we conclude that 𝑄 = 𝑆.
We give a name to the linear transformation 𝑆, in (3.20), corresponding to 𝑇 .
Let 𝑉 and 𝑊 be inner product spaces, where 𝑉 is of finite dimension. Let
𝑇 : 𝑉 → 𝑊 be a linear transformation. The linear transformation 𝑇 ∗ : 𝑊 → 𝑉
defined by
h𝑇 𝑥, 𝑦i = h𝑥,𝑇 ∗𝑦i for all 𝑥 ∈ 𝑉 , 𝑦 ∈ 𝑊

is called the adjoint of 𝑇 . The proof of (3.20) supplies an explicit representation of


the adjoint in the presence of an orthonormal basis. If {𝑣 1, . . . , 𝑣𝑛 } is an orthonormal
basis of 𝑉 , and 𝑇 : 𝑉 → 𝑊 is a linear transformation, then 𝑇 ∗ : 𝑊 → 𝑉 is given by
𝑇 ∗ (𝑤) = h𝑤,𝑇 (𝑣 1 )i 𝑣 1 + · · · + h𝑤,𝑇 (𝑣𝑛 )i 𝑣𝑛 for each 𝑤 ∈ 𝑊 . (3.5.1)

Notice that in (3.5.1), the inner product is the inner product of 𝑊 .

(3.21) Example
Consider the spaces R3 and R4 with their standard bases and the standard inner
products. Define the linear transformation 𝑇 : R4 → R3 by

𝑇 (𝑎, 𝑏, 𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 − 2𝑐 + 𝑑, 𝑎 − 𝑏 + 𝑐 − 𝑑).

For obtaining 𝑇 ∗ : R3 → R4, we proceed as follows:

h(𝑎, 𝑏, 𝑐, 𝑑), 𝑇 ∗ (𝛼, 𝛽, 𝛾)i = h𝑇 (𝑎, 𝑏, 𝑐, 𝑑), (𝛼, 𝛽, 𝛾)i
= h(𝑎 + 𝑐, 𝑏 − 2𝑐 + 𝑑, 𝑎 − 𝑏 + 𝑐 − 𝑑), (𝛼, 𝛽, 𝛾)i
= (𝑎 + 𝑐)𝛼 + (𝑏 − 2𝑐 + 𝑑)𝛽 + (𝑎 − 𝑏 + 𝑐 − 𝑑)𝛾
= 𝑎(𝛼 + 𝛾) + 𝑏 (𝛽 − 𝛾) + 𝑐 (𝛼 − 2𝛽 + 𝛾) + 𝑑 (𝛽 − 𝛾)
= h(𝑎, 𝑏, 𝑐, 𝑑), (𝛼 + 𝛾, 𝛽 − 𝛾, 𝛼 − 2𝛽 + 𝛾, 𝛽 − 𝛾)i.

Therefore, 𝑇 ∗ (𝛼, 𝛽, 𝛾) = (𝛼 + 𝛾, 𝛽 − 𝛾, 𝛼 − 2𝛽 + 𝛾, 𝛽 − 𝛾).


Alternatively, using the standard basis as the orthonormal basis for R4, we obtain

𝑇 (𝑒 1 ) = (1, 0, 1), 𝑇 (𝑒 2 ) = (0, 1, −1), 𝑇 (𝑒 3 ) = (1, −2, 1), 𝑇 (𝑒 4 ) = (0, 1, −1).

Then, for each 𝑤 = (𝛼, 𝛽, 𝛾) ∈ R3, we have


𝑇 ∗ (𝑤) = h𝑤,𝑇 (𝑒 1 )i 𝑒 1 + h𝑤,𝑇 (𝑒 2 )i 𝑒 2 + h𝑤,𝑇 (𝑒 3 )i 𝑒 3 + h𝑤,𝑇 (𝑒 4 )i 𝑒 4
= (𝛼 + 𝛾)𝑒 1 + (𝛽 − 𝛾)𝑒 2 + (𝛼 − 2𝛽 + 𝛾)𝑒 3 + (𝛽 − 𝛾)𝑒 4
= (𝛼 + 𝛾, 𝛽 − 𝛾, 𝛼 − 2𝛽 + 𝛾, 𝛽 − 𝛾).
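In (3.21), with the standard inner products the adjoint is represented by the transpose of the matrix of 𝑇 . A small check of h𝑇 𝑥, 𝑦i = h𝑥,𝑇 ∗𝑦i on sample vectors, in Python with numpy (assumed), is sketched below.

    import numpy as np

    # Matrix of T(a, b, c, d) = (a + c, b - 2c + d, a - b + c - d).
    A = np.array([[1,  0,  1,  0],
                  [0,  1, -2,  1],
                  [1, -1,  1, -1]])

    x = np.array([1.0, 2.0, -1.0, 3.0])   # arbitrary test vectors
    y = np.array([2.0, -1.0, 4.0])

    # T* is represented by the transpose A.T; both inner products agree.
    print(np.dot(A @ x, y), np.dot(x, A.T @ y))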

(3.22) Theorem (Riesz Representation)


Corresponding to each linear functional 𝑓 on a finite dimensional ips 𝑉 , there exists
a unique 𝑦 ∈ 𝑉 such that for each 𝑥 ∈ 𝑉 , 𝑓 (𝑥) = h𝑥, 𝑦i.

Proof. Let 𝑉 be a finite dimensional inner product space and let 𝑓 : 𝑉 → F


be a linear functional. Let {𝑣 1, . . . , 𝑣𝑛 } be an orthonormal basis of 𝑉 . Take 𝑦 =
𝑓 (𝑣 1 ) 𝑣 1 + · · · + 𝑓 (𝑣𝑛 ) 𝑣𝑛 . Now, if 𝑥 ∈ 𝑉 , then 𝑥 = h𝑥, 𝑣 1 i𝑣 1 + · · · + h𝑥, 𝑣𝑛 i𝑣𝑛 , due
to (2.8). Then

𝑓 (𝑥) = h𝑥, 𝑣 1 i𝑓 (𝑣 1 ) + · · · + h𝑥, 𝑣𝑛 i𝑓 (𝑣𝑛 ) = h𝑥, 𝑓 (𝑣 1 ) 𝑣 1 i + · · · + h𝑥, 𝑓 (𝑣𝑛 ) 𝑣𝑛 i
= h𝑥, 𝑓 (𝑣 1 ) 𝑣 1 + · · · + 𝑓 (𝑣𝑛 ) 𝑣𝑛 i = h𝑥, 𝑦i.

Thus, 𝑓 (𝑥) = h𝑥, 𝑦i for all 𝑥 ∈ 𝑉 .


For uniqueness of such a vector 𝑦, let 𝑦1, 𝑦2 ∈ 𝑉 be such that

𝑓 (𝑥) = h𝑥, 𝑦1 i, 𝑓 (𝑥) = h𝑥, 𝑦2 i for all 𝑥 ∈ 𝑉 .

Then h𝑥, 𝑦1 − 𝑦2 i = 0 for all 𝑥 ∈ 𝑉 . Therefore, 𝑦1 = 𝑦2 .


Existence of an adjoint of a linear transformation can be proved using Riesz
representation theorem. It is as follows.
Call the vector 𝑦 in the equation 𝑓 (𝑥) = h𝑥, 𝑦i as the Riesz representer of the
functional 𝑓 ; and we denote it by 𝑣 𝑓 . Due to (3.22), if {𝑣 1, . . . , 𝑣𝑛 } is an orthonormal
basis of 𝑉 , then the Riesz representer of 𝑓 is given by
𝑣 𝑓 = 𝑓 (𝑣 1 ) 𝑣 1 + · · · + 𝑓 (𝑣𝑛 ) 𝑣𝑛 .

Let 𝑤 ∈ 𝑊 . Consider the map 𝑓 : 𝑉 → F defined by

𝑓 (𝑥) = h𝑇 𝑥, 𝑤i for each 𝑥 ∈ 𝑉 .

Since 𝑇 : 𝑉 → 𝑊 is a linear transformation, 𝑓 is a linear functional on 𝑉 . Using


the Riesz representer 𝑣 𝑓 of the functional 𝑓 , we see that

h𝑇 𝑥, 𝑤i = 𝑓 (𝑥) = h𝑥, 𝑣 𝑓 i for each 𝑥 ∈ 𝑉 .

Due to the uniqueness of the Riesz representer, the correspondence 𝑤 ↦→ 𝑣 𝑓 defines


a function from 𝑊 to 𝑉 . Call this function as 𝑆. That is, let

𝑆 :𝑊 →𝑉 be given by 𝑆 (𝑤) = 𝑣 𝑓 .

We show that 𝑆 is a linear transformation. For this, let 𝑥 ∈ 𝑉 , 𝑦, 𝑧 ∈ 𝑊 ; and 𝛼 ∈ F.


Then

h𝑥, 𝑆 (𝑦 + 𝑧)i = h𝑇 𝑥, 𝑦 + 𝑧i = h𝑇 𝑥, 𝑦i + h𝑇 𝑥, 𝑧i = h𝑥, 𝑆𝑦i + h𝑥, 𝑆𝑧i = h𝑥, 𝑆𝑦 + 𝑆𝑧i,

h𝑥, 𝑆 (𝛼𝑦)i = h𝑇 𝑥, 𝛼𝑦i = 𝛼 h𝑇 𝑥, 𝑦i = 𝛼 h𝑥, 𝑆𝑦i = h𝑥, 𝛼𝑆𝑦i.



These equations hold for all 𝑥 ∈ 𝑉 . That is, 𝑆 (𝑦 + 𝑧) = 𝑆𝑦 + 𝑆𝑧 and 𝑆 (𝛼𝑦) = 𝛼𝑆𝑦.
Therefore, 𝑆 is a linear transformation satisfying

h𝑇 𝑥, 𝑤i = h𝑥, 𝑆𝑤i for each 𝑥 ∈ 𝑉 .

It follows from Riesz representation that 𝑆 is a unique linear transformation satisfy-


ing this property.
In the following theorems, we prove some useful facts about the adjoint.

(3.23) Theorem
Let 𝑈 , 𝑉 and 𝑊 be finite dimensional ips. Let 𝑆 : 𝑈 → 𝑉 and 𝑇 ,𝑇1,𝑇2 : 𝑉 → 𝑊 be
linear transformations. Let 𝐼 : 𝑉 → 𝑉 be the identity operator and let 𝛼 ∈ F. Then

(𝑇1 + 𝑇2 ) ∗ = 𝑇1∗ + 𝑇2∗, (𝛼𝑇 ) ∗ = 𝛼 𝑇 ∗, (𝑇 ∗ ) ∗ = 𝑇 , 𝐼 ∗ = 𝐼, (𝑇 𝑆) ∗ = 𝑆 ∗𝑇 ∗ .

Proof. h𝑥, (𝑇1 + 𝑇2 ) ∗𝑦i = h(𝑇1 + 𝑇2 )𝑥, 𝑦i = h𝑇1𝑥, 𝑦i + h𝑇2𝑥, 𝑦i = h𝑥,𝑇1∗𝑦i + h𝑥,𝑇2∗𝑦i
= h𝑥,𝑇1∗𝑦 + 𝑇2∗𝑦i. Therefore, (𝑇1 + 𝑇2 ) ∗ = 𝑇1∗ + 𝑇2∗ . Other equalities are proved
similarly.

(3.24) Theorem
Let 𝑉 and 𝑊 be finite dimensional inner product spaces, and let 𝑇 : 𝑉 → 𝑊 be a
linear transformation. Then the following are true:
(1) 𝑁 (𝑇 ∗𝑇 ) = 𝑁 (𝑇 ), 𝑁 (𝑇𝑇 ∗ ) = 𝑁 (𝑇 ∗ ).
(2) rank(𝑇 ∗ ) = rank(𝑇 ∗𝑇 ) = rank(𝑇𝑇 ∗ ) = rank(𝑇 ).
(3) 𝑅(𝑇 ∗𝑇 ) = 𝑅(𝑇 ∗ ), 𝑅(𝑇𝑇 ∗ ) = 𝑅(𝑇 ).
(4) If dim (𝑉 ) = dim (𝑊 ), then null(𝑇 ∗ ) = null(𝑇 ∗𝑇 ) = null(𝑇𝑇 ∗ ) = null(𝑇 ).

Proof. (1) Let 𝑥 ∈ 𝑁 (𝑇 ). Then 𝑇 𝑥 = 0; so, 𝑇 ∗𝑇 𝑥 = 0. That is, 𝑥 ∈ 𝑁 (𝑇 ∗𝑇 ).


Conversely, if 𝑦 ∈ 𝑁 (𝑇 ∗𝑇 ), then 𝑇 ∗𝑇𝑦 = 0. Taking inner product with 𝑦, we have
h𝑦,𝑇 ∗𝑇𝑦i = 0. It follows that h𝑇𝑦,𝑇𝑦i = 0. That is, 𝑇𝑦 = 0. So, 𝑦 ∈ 𝑁 (𝑇 ). Therefore,
𝑁 (𝑇 ∗𝑇 ) = 𝑁 (𝑇 ). Replacing 𝑇 with 𝑇 ∗, and using the fact that (𝑇 ∗ ) ∗ = 𝑇 , we have
𝑁 (𝑇𝑇 ∗ ) = 𝑁 (𝑇 ∗ ).
(2) From (1) it follows that null(𝑇 ∗𝑇 ) = null(𝑇 ). Then Rank-nullity theorem implies
that rank(𝑇 ∗𝑇 ) = rank(𝑇 ). Similarly, rank(𝑇𝑇 ∗ ) = rank(𝑇 ∗ ).
Now, if 𝑦 ∈ 𝑅(𝑇 ∗𝑇 ), then 𝑦 = 𝑇 ∗ (𝑇 𝑥) for some 𝑥 ∈ 𝑉 . That is, 𝑦 ∈ 𝑅(𝑇 ∗ ). So,
𝑅(𝑇 ∗𝑇 ) is a subspace of 𝑅(𝑇 ∗ ).
Therefore, rank(𝑇 ) = rank(𝑇 ∗𝑇 ) ≤ rank(𝑇 ∗ ). Similarly, with 𝑇 ∗ instead of 𝑇 , we
obtain rank(𝑇 ∗ ) = rank(𝑇𝑇 ∗ ) ≤ rank(𝑇 ). Combining these, we get the required
equalities.
(3) We have already seen that 𝑅(𝑇 ∗𝑇 ) is a subspace of 𝑅(𝑇 ∗ ); and (2) says that their
dimensions are equal. Therefore, 𝑅(𝑇 ∗𝑇 ) = 𝑅(𝑇 ∗ ).
With 𝑇 ∗ instead of 𝑇 , we obtain 𝑅(𝑇𝑇 ∗ ) = 𝑅(𝑇 ).
(4) We have null(𝑇 ∗ ) = dim (𝑊 ) − rank(𝑇 ∗ ), null(𝑇 ) = dim (𝑉 ) − rank(𝑇 ), and
dim (𝑉 ) = dim (𝑊 ). Then (2) implies null(𝑇 ∗ ) = null(𝑇 ). Now, other equalities
follow from (1).
Basing on the adjoint, special types of linear operators can be defined.
A linear operator 𝑇 on a finite dimensional ips 𝑉 is called

1. self-adjoint iff 𝑇 ∗ = 𝑇 ;
2. normal iff 𝑇 ∗𝑇 = 𝑇𝑇 ∗ ;
3. unitary iff 𝑇 ∗𝑇 = 𝑇𝑇 ∗ = 𝐼 ;
4. orthogonal iff 𝑇 ∗𝑇 = 𝑇𝑇 ∗ = 𝐼 and 𝑉 is a real ips;
5. isometric iff k𝑇 𝑥 k = k𝑥 k for each 𝑥 ∈ 𝑉 .

We will come across these types of linear operators at various places. For now,
we observe that each self-adjoint linear operator is normal, and each unitary linear
operator is normal and invertible. Similarly, it can be shown that a linear operator
on a finite dimensional ips is unitary iff it is isometric.
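These defining identities are easy to test on concrete matrices. The sketch below (Python with numpy assumed) checks a plane rotation, which is unitary and hence normal but not self-adjoint, and a symmetric matrix, which is self-adjoint, hence normal, but not unitary.

    import numpy as np

    Q = np.array([[0.0, -1.0],
                  [1.0,  0.0]])      # rotation by 90 degrees
    S = np.array([[2.0, 1.0],
                  [1.0, 3.0]])       # symmetric

    print(np.allclose(Q.T @ Q, np.eye(2)), np.allclose(Q @ Q.T, np.eye(2)))  # unitary
    print(np.allclose(S, S.T))                    # True: self-adjoint
    print(np.allclose(S.T @ S, S @ S.T))          # True: normal
    print(np.allclose(S.T @ S, np.eye(2)))        # False: not unitary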

Exercises for § 3.5


1. Fix a vector 𝑢 in an ips 𝑉 . Define a linear functional 𝑇 on 𝑉 by 𝑇 𝑣 = h𝑣, 𝑢i
for 𝑣 ∈ 𝑉 . What is 𝑇 ∗ (𝛼) for a scalar 𝛼? Ans: 𝑇 ∗ (𝛼) = 𝛼𝑢.
2. Define an operator 𝑇 on F𝑛 by 𝑇 (𝑎 1, 𝑎 2, . . . , 𝑎𝑛 ) = (0, 𝑎 1, . . . , 𝑎𝑛−1 ). What is
𝑇 ∗ (𝑎 1, . . . , 𝑎𝑛 )? Ans: 𝑇 ∗ (𝑎 1, . . . , 𝑎𝑛 ) = (𝑎 2, . . . , 𝑎𝑛 , 0).
3. Prove that a linear operator 𝑇 on a finite dimensional ips is invertible iff 𝑇 ∗ is
invertible. In that case, show that (𝑇 ∗ ) −1 = (𝑇 −1 ) ∗ .
4. Fundamental subspaces: Let 𝑇 : 𝑉 → 𝑊 be a linear operator, where 𝑉 and
𝑊 are finite dimensional ips. Prove the following:
(a) 𝑁 (𝑇 ) = 𝑅(𝑇 ∗ ) ⊥ (b) 𝑁 (𝑇 ∗ ) = 𝑅(𝑇 ) ⊥ (c) 𝑅(𝑇 ) = 𝑁 (𝑇 ∗ ) ⊥
(d) 𝑅(𝑇 ∗ ) = 𝑁 (𝑇 ) ⊥ (e) 𝑉 = 𝑁 (𝑇 ) ⊕ 𝑁 (𝑇 ) ⊥ = 𝑅(𝑇 ) ⊕ 𝑅(𝑇 ) ⊥ = 𝑁 (𝑇 ∗ ) ⊕
𝑁 (𝑇 ∗ ) ⊥ = 𝑅(𝑇 ∗ ) ⊕ 𝑅(𝑇 ∗ ) ⊥ .
Recall that for 𝑆 ⊆ 𝑉 , 𝑆 ⊥ = {𝑥 ∈ 𝑉 : 𝑥 ⊥ 𝑢 for each 𝑢 ∈ 𝑆 }. See Exercise 9 of
§ 2.2. Also, 𝑉 = 𝑋 ⊕ 𝑌 means 𝑉 = 𝑋 + 𝑌 and 𝑋 ∩ 𝑌 = {0}.
5. Show that on a finite dimensional ips, a linear operator is unitary iff it is
isometric.
6. Give an example of a linear operator which is

(a) normal but not self-adjoint. (b) normal but not unitary.
(c) unitary but not self-adjoint. (d) self-adjoint but not unitary.
Ans: (a)-(b) 𝑇 : F2 → F2, 𝑇 (𝑎, 𝑏) = (2𝑎 − 3𝑏, 3𝑎 − 2𝑏). See that
𝑇 ∗ (𝑎, 𝑏) = (2𝑎 + 3𝑏, −3𝑎 + 2𝑏).
(c) 𝑇 : F2 → F2, 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎), 𝑇 ∗ (𝑎, 𝑏) = (𝑏, −𝑎). (d) 𝑇 = 2𝐼 .
7. Determine a polynomial 𝑞(𝑡) ∈ R2 [𝑡] so that ∫₀¹ 𝑝 (𝑡)𝑞(𝑡)𝑑𝑡 = 𝑝 (1/2) for
each 𝑝 (𝑡) ∈ R2 [𝑡]. Ans: 𝑞(𝑡) = −3/2 + 15𝑡 − 15𝑡 2 .
8. Define 𝑓 : C3 → C by 𝑓 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏 + 𝑐)/3. Find a vector 𝑦 ∈ C3 such
that 𝑓 (𝑥) = h𝑥, 𝑦i for each 𝑥 ∈ C3 . Ans: (1/3, 1/3, 1/3).
9. Let 𝑇 be a linear operator on a finite dimensional ips. Using (3.24) show that
if 𝑇 ∗𝑇 = 𝐼, then 𝑇𝑇 ∗ = 𝐼 .
4 Linear Transformations and Matrices

4.1 Solution of linear equations


We discuss application of linear transformations to the solvability of systems of
linear equations. For this, notice that if 𝐴 ∈ F𝑚×𝑛 and 𝑥 ∈ F𝑛×1, then 𝐴𝑥 ∈ F𝑚×1 .
Further, it satisfies the following:

𝐴(𝑥 + 𝑦) = 𝐴𝑥 + 𝐴𝑦, 𝐴(𝛼𝑥) = 𝛼𝐴𝑥 for 𝑥, 𝑦 ∈ F𝑛×1, 𝛼 ∈ F.

Thus we lay out the following convention:


Convention: A matrix 𝐴 ∈ F𝑚×𝑛 is viewed as the linear transformation 𝐴 : F𝑛×1 →
F𝑚×1 defined by 𝐴(𝑥) = 𝐴𝑥 for 𝑥 ∈ F𝑛×1 .
Let 𝐴 = [𝑎𝑖 𝑗 ] ∈ F𝑚×𝑛 and let 𝑏 ∈ F𝑚×1 . Then 𝐴𝑥 = 𝑏, written explicitly as

𝑎 11𝑥 1 + · · · + 𝑎 1𝑛 𝑥𝑛 = 𝑏 1
⋮
𝑎𝑚1𝑥 1 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚

is called a system of linear equations with 𝑚 equations and 𝑛 unknowns 𝑥 1, . . . , 𝑥𝑛 ,


where the coefficients 𝑎𝑖 𝑗 are from F. We abbreviate the phrase ‘a system of linear
equations’ to ‘ a linear system’. When 𝑏 = 0, the linear system is said to be
homogeneous.
The solution set of the linear system 𝐴𝑥 = 𝑏 is defined as

Sol (𝐴, 𝑏) = {𝑥 ∈ F𝑛×1 : 𝐴𝑥 = 𝑏}.

Thus the system 𝐴𝑥 = 𝑏 has a solution iff Sol (𝐴, 𝑏) ≠ ∅.


We use the notation [𝐴|𝑏] for the new matrix in F𝑚×(𝑛+1) obtained by taking
the columns of 𝐴 in the same order along with 𝑏 as the (𝑛 + 1)th column. We
call [𝐴|𝑏] an augmented matrix. The system 𝐴𝑥 = 𝑏 is called consistent iff
rank([𝐴|𝑏]) = rank(𝐴).
Recall that rank(𝐴) is the dimension of 𝑅(𝐴), the range space of 𝐴, which is the
subspace of F𝑚×1 spanned by the columns of 𝐴.


(4.1) Theorem
A linear system has a solution iff it is consistent.

Proof. Let 𝑈 be the subspace of F𝑚×1 spanned by the columns of [𝐴|𝑏]. Notice
that 𝑅(𝐴) is a subspace of 𝑈 . Then,
𝐴𝑥 = 𝑏 has a solution iff 𝑏 = 𝐴𝑥 for some 𝑥 ∈ F𝑛×1 iff 𝑏 ∈ 𝑅(𝐴)
iff 𝑅(𝐴) = 𝑈 iff rank(𝐴) = dim (𝑅(𝐴)) = dim (𝑈 ) = rank([𝐴|𝑏]).
Recall that 𝑁 (𝐴) is the null space of 𝐴, which is equal to the solution set of
the homogeneous system. That is, 𝑁 (𝐴) = Sol (𝐴, 0). We connect Sol (𝐴, 𝑏) and
Sol (𝐴, 0).

(4.2) Theorem
If 𝑢 is a solution of 𝐴𝑥 = 𝑏, then Sol (𝐴, 𝑏) = 𝑢 + 𝑁 (𝐴) = {𝑢 + 𝑥 : 𝑥 ∈ 𝑁 (𝐴)}.

Proof. Let 𝐴𝑢 = 𝑏. If 𝑣 ∈ Sol (𝐴, 𝑏), then 𝐴(𝑣 − 𝑢) = 𝐴𝑣 − 𝐴𝑢 = 𝑏 − 𝑏 = 0. That is,


𝑣 − 𝑢 ∈ 𝑁 (𝐴). Then, 𝑣 ∈ 𝑢 + 𝑁 (𝐴).
Conversely, let 𝑥 ∈ 𝑢 +𝑁 (𝐴). Then 𝑥 = 𝑢 +𝑦 for some 𝑦 ∈ 𝑁 (𝐴). That is, 𝑥 = 𝑢 +𝑦
for some 𝑦 with 𝐴𝑦 = 0. Now, 𝐴(𝑥) = 𝐴(𝑢 + 𝑦) = 𝐴𝑢 + 𝐴𝑦 = 𝐴𝑢 = 𝑏. Therefore,
𝑥 ∈ Sol (𝐴, 𝑏).
It means that any solution of 𝐴𝑥 = 𝑏 can be obtained by taking a particular
solution 𝑢 of 𝐴𝑥 = 𝑏 and then adding it to any solution of the homogeneous system
𝐴𝑥 = 0. Notice that 𝑁 (𝐴) has either a single vector or an infinite number of vectors.
Therefore, a linear system has either no solutions, or a unique solution, or infinitely
many solutions.
As a corollary to (4.2), we obtain the following result.

(4.3) Theorem
Let 𝐴 ∈ F𝑚×𝑛 and let 𝑏 ∈ F𝑚×1 . Write 𝑘 := null(𝐴) = 𝑛 − rank(𝐴).
(1) The linear system 𝐴𝑥 = 𝑏 has a unique solution iff 𝑘 = 0 and 𝑏 ∈ 𝑅(𝐴) iff
rank([𝐴|𝑏]) = rank(𝐴) = 𝑛.
(2) If 𝑢 is a solution of 𝐴𝑥 = 𝑏 and {𝑣 1, . . . , 𝑣𝑘 } is a basis for 𝑁 (𝐴), then

Sol (𝐴, 𝑏) = {𝑢 + 𝛼 1𝑣 1 + · · · + 𝛼𝑘 𝑣𝑘 : 𝛼 1, . . . , 𝛼𝑘 ∈ F}.

(3) For 𝑚 = 𝑛, 𝐴𝑥 = 𝑏 has a unique solution iff det(𝐴) ≠ 0.

When a system has a unique solution, a determinant formula can be given for
obtaining the solution. We will discuss later how to compute the solution set of a
general linear system.
(4.4) Theorem (Cramer’s Rule)
Let 𝐴 ∈ F𝑛×𝑛 with det(𝐴) ≠ 0, and let 𝑏 ∈ F𝑛×1 . Let 𝐴𝑖 [𝑏] denote the matrix obtained
from 𝐴 by replacing its 𝑖th column with the vector 𝑏. Then the solution of 𝐴𝑥 = 𝑏
is given by
𝑥𝑖 = det(𝐴𝑖 [𝑏]) / det(𝐴) for 1 ≤ 𝑖 ≤ 𝑛.

Proof. Since det(𝐴) ≠ 0, there exists a unique 𝑥 ∈ F𝑛×1 such that 𝐴𝑥 = 𝑏. Let
𝑥 = (𝑥 1, . . . , 𝑥𝑛 ) t . Write 𝐴𝑥 = 𝑏 as 𝑥 1𝐶 1 + · · · + 𝑥𝑛𝐶𝑛 = 𝑏, where 𝐶 𝑗 is the 𝑗th column
of 𝐴. Next, move 𝑏 to the left side to obtain

𝑥 1𝐶 1 + · · · + (𝑥𝑖 𝐶𝑖 − 𝑏) + · · · + 𝑥𝑛𝐶𝑛 = 0.

So, the column vectors 𝐶 1, . . . , 𝑥𝑖 𝐶𝑖 − 𝑏, . . . , 𝐶𝑛 are linearly dependent. Thus,

det(𝐴𝑖 [𝑥𝑖 𝐶𝑖 − 𝑏]) = 0.

Using Properties (9)-(10) of the determinant, we get

det[𝐶 1, . . . , 𝑥𝑖 𝐶𝑖 , . . . , 𝐶𝑛 ] − det[𝐶 1, . . . , 𝑏, . . . , 𝐶𝑛 ] = 0.

This is the same as 𝑥𝑖 det(𝐴) − det(𝐴𝑖 [𝑏]) = 0.
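For small systems, Cramer's rule translates directly into a few determinant evaluations. A minimal Python/numpy sketch (assumed tooling; the function name cramer is ours) is given below; it agrees with a direct solver on a 2×2 example.

    import numpy as np

    def cramer(A, b):
        # Solve Ax = b by Cramer's rule; assumes det(A) != 0.
        d = np.linalg.det(A)
        n = A.shape[0]
        x = np.empty(n)
        for i in range(n):
            Ai = A.copy()
            Ai[:, i] = b                  # A_i[b]: replace the i-th column by b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    b = np.array([3.0, 5.0])
    print(cramer(A, b), np.linalg.solve(A, b))   # both give [0.8 1.4]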

Cramer’s rule helps in studying the map (𝐴, 𝑏) ↦→ 𝑥, when det(𝐴) ≠ 0. For
computing actual solutions of a linear system, it does not help when the order of
the matrix is larger than five, in general. We rather use Gauss-Jordan elimination
method.
Gauss-Jordan elimination uses conversion of the augmented matrix to its RREF.
We now discuss this systematic approach for solving systems of linear equations.

(4.5) Theorem
Let [𝐴0 |𝑏 0] be an augmented matrix obtained from the augmented matrix [𝐴|𝑏] by
elementary row operations. Then Sol (𝐴, 𝑏) = Sol (𝐴0, 𝑏 0).

Proof. Let 𝐵 be an invertible matrix. If 𝑥 satisfies 𝐴𝑥 = 𝑏, then 𝑥 also satisfies


𝐵𝐴𝑥 = 𝐵𝑏, and vice-versa. That is, Sol (𝐴, 𝑏) = Sol (𝐵𝐴, 𝐵𝑏). Since each elementary
matrix is invertible, and elementary operations amount to pre-multiplying both 𝐴
and 𝑏 with a product of elementary matrices, we see that Sol (𝐴, 𝑏) = Sol (𝐴0, 𝑏 0).

In Gauss-Jordan elimination, we reduce the augmented matrix to its RREF, and


then write down the solution set directly. We illustrate this method with an example.

(4.6) Example
Consider solving the following system of linear equations:

𝑥1 + 𝑥2 + 2𝑥 3 + 𝑥5 =1
3𝑥 1 + 5𝑥 2 + 5𝑥 3 + 𝑥 4 + 𝑥5 =2
4𝑥 1 + 6𝑥 2 + 7𝑥 3 + 𝑥 4 + 2𝑥 5 =3
𝑥1 + 5𝑥 2 + 5𝑥 4 + 𝑥5 =2
2𝑥 1 + 8𝑥 2 + 𝑥 3 + 6𝑥 4 + 0𝑥 5 = 2.
In Gauss-Jordan elimination, the reduction of the augmented matrix to RREF goes
as follows:

 1 1 2 0 1 1             1 1  2 0  1  1
 3 5 5 1 1 2             0 2 −1 1 −2 −1
 4 6 7 1 2 3   −𝑂1−→     0 2 −1 1 −2 −1
 1 5 0 5 1 2             0 4 −2 5  0  1
 2 8 1 6 0 2             0 6 −3 6 −2  0

          1 0  5/2 −1/2  2  3/2              1 0  5/2 0  8/3  2
          0 1 −1/2  1/2 −1 −1/2              0 1 −1/2 0 −5/3 −1
 −𝑂2−→    0 0  0    0    0  0      −𝑂3−→     0 0  0   1  4/3  1
          0 0  0    3    4  3                0 0  0   0  0    0
          0 0  0    3    4  3                0 0  0   0  0    0
Here, 𝑂1 = 𝑅2 ← 𝑅2 − 3𝑅1, 𝑅3 ← 𝑅3 − 4𝑅1, 𝑅4 ← 𝑅4 − 𝑅1, 𝑅5 ← 𝑅5 − 2𝑅1 ;
𝑂2 = 𝑅2 ← 1/2𝑅2, 𝑅1 ← 𝑅1 − 𝑅2, 𝑅3 ← 𝑅3 − 2𝑅2, 𝑅4 ← 𝑅4 − 4𝑅2, 𝑅5 ← 𝑅5 − 6𝑅2 ;
𝑂3 = 𝑅3 ↔ 𝑅4, 𝑅3 ← 1/3𝑅3, 𝑅1 ← 𝑅1 + 1/2𝑅3, 𝑅2 ← 𝑅2 − 1/2𝑅3, 𝑅5 ← 𝑅5 − 3𝑅3 .
The equations now look like

𝑥 1 + 5/2 𝑥 3 + 8/3 𝑥 5 = 2
𝑥 2 − 1/2 𝑥 3 − 5/3 𝑥 5 = −1
𝑥 4 + 4/3 𝑥 5 = 1.
The basic variables are 𝑥 1, 𝑥 2, 𝑥 4 and the free variables are 𝑥 3 and 𝑥 5 . Assigning the
free variables arbitrary values, say, 𝑥 3 = 𝛼 and 𝑥 5 = 𝛽, we have

𝑥 1 = 2 − 5/2 𝛼 − 8/3 𝛽
𝑥 2 = −1 + 1/2 𝛼 + 5/3 𝛽
𝑥 3 = 𝛼
𝑥 4 = 1 − 4/3 𝛽
𝑥 5 = 𝛽.
Hence the solution set is

Sol (𝐴, 𝑏) = { (2, −1, 0, 1, 0) t + 𝛼 (−5/2, 1/2, 1, 0, 0) t + 𝛽 (−8/3, 5/3, 0, −4/3, 1) t : 𝛼, 𝛽 ∈ F }.
In fact, you can write the solution set from the RREF of the augmented matrix quite
mechanically instead of rewriting as a set of equations. In this process, we delete all
zero rows at the bottom of the RREF. Next, we insert suitable zero rows so that the
pivots are on the diagonal and the 𝐴-portion is a square matrix. In this phase, we
may require adding more zero rows at the bottom. Next, we change all non-pivot
entries (containing zero now) to −1. Then the non-pivotal columns form a basis for
𝑁 (𝐴), and any solution of the system is the vector in the 𝑏-portion plus a linear
combination of the non-pivotal columns. To see this process for the above RREF,
we proceed as follows. The RREF of [𝐴|𝑏] is

 1 0  5/2 0  8/3  2
 0 1 −1/2 0 −5/3 −1
 0 0  0   1  4/3  1
 0 0  0   0  0    0
 0 0  0   0  0    0

After 𝑑𝑒𝑙−0 (deleting the zero rows at the bottom):

 1 0  5/2 0  8/3  2
 0 1 −1/2 0 −5/3 −1
 0 0  0   1  4/3  1

After 𝑖𝑛𝑠−0 (inserting zero rows so that the pivots fall on the diagonal):

 1 0  5/2 0  8/3  2
 0 1 −1/2 0 −5/3 −1
 0 0  0   0  0    0
 0 0  0   1  4/3  1
 0 0  0   0  0    0

After changing the non-pivot diagonal entries to −1:

 1 0  5/2 0  8/3  2
 0 1 −1/2 0 −5/3 −1
 0 0 −1   0  0    0
 0 0  0   1  4/3  1
 0 0  0   0 −1    0
Then

Sol (𝐴, 𝑏) = { (2, −1, 0, 1, 0) t + 𝑎 (5/2, −1/2, −1, 0, 0) t + 𝑏 (8/3, −5/3, 0, 4/3, −1) t : 𝑎, 𝑏 ∈ F }.
It is easy to see that this process yields the same solution set as earlier.
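The solution set obtained in (4.6) can be verified numerically: the particular solution must satisfy 𝐴𝑥 = 𝑏 and the two direction vectors must lie in 𝑁 (𝐴). A Python/numpy sketch (assumed tooling) follows.

    import numpy as np

    A = np.array([[1, 1, 2, 0, 1],
                  [3, 5, 5, 1, 1],
                  [4, 6, 7, 1, 2],
                  [1, 5, 0, 5, 1],
                  [2, 8, 1, 6, 0]], dtype=float)
    b = np.array([1, 2, 3, 2, 2], dtype=float)

    u  = np.array([2, -1, 0, 1, 0], dtype=float)        # particular solution
    n1 = np.array([-5/2, 1/2, 1, 0, 0])                  # basis vectors of N(A)
    n2 = np.array([-8/3, 5/3, 0, -4/3, 1])

    print(np.allclose(A @ u, b))                             # True
    print(np.allclose(A @ n1, 0), np.allclose(A @ n2, 0))    # True True
    print(np.allclose(A @ (u + 2*n1 - 3*n2), b))             # every vector in u + N(A) solves Ax = b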

Exercises for § 4.1


1. Let 𝐴 = [𝑎𝑖 𝑗 ] ∈ F𝑚×𝑛 , where 𝑚 < 𝑛. Show that there exists a nonzero vector
(𝛼 1, . . . , 𝛼𝑛 ) ∈ F𝑛 such that 𝛼 1𝑎𝑖1 + · · · +𝛼𝑛 𝑎𝑖𝑛 = 0 for all 𝑖 = 1, . . . , 𝑚. Interpret

the result for linear systems.


2. Prove (4.3).
3. Suppose a linear homogeneous system 𝐴𝑥 = 0 has 𝑚 equations and 𝑛 un-
knowns. Show the following:
(a) If rank(𝐴) = 𝑛, then 𝐴𝑥 = 0 has a unique solution, the trivial solution.
(b) If rank(𝐴) < 𝑛, then 𝐴𝑥 = 0 has infinitely many solutions.
(c) If 𝑚 ≥ 𝑛, then both (a)-(b) are possible.
(d) If 𝑚 < 𝑛 then 𝐴𝑥 = 0 has infinitely many solutions.
(e) 𝐴𝑥 = 0 has infinitely many solutions iff the columns of 𝐴 are linearly
dependent.
4. Let the linear system 𝐴𝑥 = 𝑏 have 𝑚 equations and 𝑛 unknowns. Show the
following:
(a) If rank([𝐴 | 𝑏]) > rank(𝐴), then 𝐴𝑥 = 𝑏 has no solutions.
(b) If rank([𝐴 | 𝑏]) = rank(𝐴) = 𝑛, then 𝐴𝑥 = 𝑏 has a unique solution.
(c) If rank([𝐴 | 𝑏]) = rank(𝐴) < 𝑛, then 𝐴𝑥 = 𝑏 has infinitely many solu-
tions.
(d) 𝐴𝑥 = 𝑏 has at least one solution iff 𝑏 ∈ span {𝐴1, . . . , 𝐴𝑛 }.
(e) 𝐴𝑥 = 𝑏 has at most one solution iff 𝐴1, . . . , 𝐴𝑛 are linearly independent.
5. Let 𝐴 ∈ F𝑚×𝑛 . Let 𝑏 be a nonzero vector in F𝑚×1 orthogonal to each column
of 𝐴. Show that the linear system 𝐴𝑥 = 𝑏 has no solutions.
6. Prove: If 𝑈 is a subspace of F𝑛×1 and 𝑣 ∈ F𝑛×1, then there exists a system of
linear equations having 𝑛 equations and 𝑛 unknowns, with coefficients in F,
such that its solution set equals 𝑣 + 𝑈 .
7. Let 𝐴 ∈ F𝑚×𝑛 . Let 𝑈 = {𝑋 ∈ F𝑛×𝑘 : 𝐴𝑋 = 0}. What is dim (𝑈 )?
Ans: dim (𝑈 ) = 𝑘 · null(𝐴).
8. Determine whether the following systems of linear equations are consistent.
If consistent, then find the solution set.
(a) 𝑥 1 − 𝑥 2 + 2𝑥 3 − 3𝑥 4 = 7, 4𝑥 1 + 3𝑥 3 + 𝑥 4 = 9, 2𝑥 1 − 5𝑥 2 + 𝑥 3 = −2,
3𝑥 1 − 𝑥 2 − 𝑥 3 + 2𝑥 4 = −2.
Ans: 𝑥 1 = 10/9, 𝑥 2 = 11/9, 𝑥 3 = 17/9, 𝑥 4 = −10/9.
(b) 𝑥 1 − 𝑥 2 + 2𝑥 3 − 3𝑥 4 = 7, 4𝑥 1 + 3𝑥 3 + 𝑥 4 = 9, 2𝑥 1 − 5𝑥 2 + 𝑥 3 = −2,
3𝑥 1 + 4𝑥 2 + 4𝑥 3 − 2𝑥 4 = 18.
Ans: { (−10/9, 23/27, 121/27, 0) t + 𝛼 (−2, −1/3, 7/3, 1) t : 𝛼 ∈ F }.

9. Find all possible values of 𝑘 ∈ R such that the system of linear equations
𝑥 + 𝑦 + 2𝑧 − 5𝑤 = 3, 2𝑥 + 5𝑦 − 𝑧 − 9𝑤 = −3, 𝑥 − 2𝑦 + 6𝑧 − 7𝑤 = 7,
2𝑥 + 2𝑦 + 2𝑧 + 𝑘𝑤 = −4
has more than one solution. Ans: 𝑘 = −12.
10. Determine the values of 𝑘 ∈ R so that system of linear equations

𝑥 + 𝑦 − 𝑧 = 1, 2𝑥 + 3𝑦 + 𝑘𝑧 = 3, 𝑥 + 𝑘𝑦 + 3𝑧 = 2.

has (a) no solution, (b) infinitely many solutions, (c) exactly one solution.
Ans: (a) 𝑘 = −3. (b) 𝑘 = 2. (c) 𝑘 ∉ {−3, 2}.
11. For all possible values of the scalars 𝑎 and 𝑏, discuss the number of solutions
of the linear system 𝑥 + 2𝑦 + 3𝑧 = 1, 𝑥 − 𝑎𝑦 + 21𝑧 = −2, 3𝑥 + 7𝑦 + 𝑎𝑧 = 𝑏.
Ans: 𝑎 ∉ {0, 7} : unique solution; 𝑎 = 0, 𝑏 = 5/2 or 𝑎 = 7, 𝑏 = 4/9 : infinitely
many solutions; 𝑎 = 0, 𝑏 ≠ 5/2 or 𝑎 = 7, 𝑏 ≠ 4/9 : no solutions.

4.2 Least squares solution


In applications, we often neglect small parameters to arrive at a workable model.
Also, reading of instruments is never exact. These considerations lead to inexact
data. Suppose that after admitting such inaccuracies in our model we reached at a
linear system of equations, which is inconsistent. How do we solve such a linear
system for a required solution?
If 𝑥 is any suggested solution to the equation 𝐴𝑥 = 𝑏, then we look at the residual
k𝐴𝑥 − 𝑏 k. If 𝑥 is indeed a solution of 𝐴𝑥 = 𝑏, then the residual is equal to 0. Thus
we take up the heuristic of minimizing the residual for arriving at a vector which
may be best among the suggested solutions.
Let 𝑇 : 𝑈 → 𝑉 be a linear transformation, where 𝑈 is a vector space and 𝑉 is an
ips. A vector 𝑢 ∈ 𝑈 is called a least squares solution of the equation 𝑇 𝑥 = 𝑦 iff
k𝑇𝑢 − 𝑦 k ≤ k𝑇𝑤 − 𝑦 k for all 𝑤 ∈ 𝑈 .
Notice that if 𝑢 ∈ 𝑈 is a least squares solution of 𝑇 𝑥 = 𝑦, then 𝑇𝑢 is the best
approximation of 𝑦 from 𝑅(𝑇 ). Thus a least squares solution of 𝑇 𝑥 = 𝑦 is also called
a best approximate solution.
To determine a least squares solution of 𝑇 𝑥 = 𝑦, we need to determine the best
approximation 𝑣 of 𝑦 from 𝑅(𝑇 ). And then we would find an appropriate 𝑢 ∈ 𝑈
which satisfies 𝑇𝑢 = 𝑣. Of course, 𝑣 ∈ 𝑅(𝑇 ) guarantees that there exists such a
𝑢 ∈ 𝑈 . This strategy along with (2.15-2.16) yield the following result.

(4.7) Theorem
Let 𝑇 : 𝑈 → 𝑉 be a linear transformation, where 𝑈 is a subspace of an ips 𝑉 . Let
𝑦 ∈ 𝑉 . Then the following are true:
(1) If 𝑅(𝑇 ) is finite dimensional, then 𝑇 𝑥 = 𝑦 has a least squares solution.
(2) A vector 𝑢 ∈ 𝑈 is a least squares solution of 𝑇 𝑥 = 𝑦 iff 𝑇𝑢 − 𝑦 ⊥ 𝑧 for each
𝑧 ∈ 𝑅(𝑇 ).

(3) A least squares solution is unique iff 𝑇 is one-one.

In case of matrices, we have the following simplification.

(4.8) Theorem
Let 𝐴 ∈ F𝑚×𝑛 , and let 𝑏 ∈ F𝑚×1 . A vector 𝑢 ∈ F𝑛×1 is a least squares solution of the
system of linear equations 𝐴𝑥 = 𝑏 iff 𝐴∗𝐴𝑢 = 𝐴∗𝑏.

Proof. Let 𝑢 1, . . . , 𝑢𝑛 be the columns of 𝐴. These vectors span 𝑅(𝐴). Using (4.7),
we see that
𝑢 is a least squares solution of 𝐴𝑥 = 𝑏
iff h𝐴𝑢 − 𝑏, 𝑢𝑖 i = 0 for 𝑖 = 1, . . . , 𝑛
iff 𝑢𝑖∗ (𝐴𝑢 − 𝑏) = 0 for 𝑖 = 1, . . . , 𝑛
iff 𝐴∗ (𝐴𝑢 − 𝑏) = 0
iff 𝐴∗𝐴𝑢 = 𝐴∗𝑏.

(4.9) Example

    
1 1 0 1
Suppose 𝐴 = ,𝑏= , and 𝑢 = .
0 0 1 −1
We see that 𝐴 ∈ R2×2 and 𝐴t𝐴𝑢 = 𝐴t𝑏. Therefore, 𝑢 is a least squares solution of
𝐴𝑥 = 𝑏. Notice that 𝐴𝑥 = 𝑏 has no solution.
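The check in (4.9) is the normal equation of (4.8). In the sketch below (Python with numpy assumed), note that the columns of 𝐴 are linearly dependent, so by (4.7) the least squares solution is not unique; numpy's lstsq may return a different minimizer, but with the same residual.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 0.0]])
    b = np.array([0.0, 1.0])
    u = np.array([1.0, -1.0])

    print(np.allclose(A.T @ A @ u, A.T @ b))     # True: u satisfies A*Au = A*b

    x, *_ = np.linalg.lstsq(A, b, rcond=None)    # another least squares solution
    print(np.linalg.norm(A @ u - b), np.linalg.norm(A @ x - b))   # both residuals equal 1.0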

A least squares solution can be written in a simplified form by using the QR-
factorization, which stems from Gram-Schmidt orthogonalization. To see this, we
first present this matrix factorization.
A QR-factorization of a matrix 𝐴 ∈ F𝑚×𝑛 is the determination of a
matrix 𝑄 ∈ F𝑚×𝑛 with orthonormal columns, and an upper triangular matrix
𝑅 ∈ F𝑛×𝑛 such that 𝐴 = 𝑄𝑅.

(4.10) Theorem
Each matrix with linearly independent columns has a QR-factorization, where 𝑅 is
invertible. Consequently, 𝑅 = 𝑄 ∗𝐴.

Proof. Let 𝑢 1, . . . , 𝑢𝑛 be the columns of 𝐴 ∈ F𝑚×𝑛 . Suppose the columns are linearly
independent. It ensures that 𝑚 ≥ 𝑛. Use Gram-Schmidt process and orthonormalize
to obtain the orthonormal vectors 𝑣 1, . . . , 𝑣𝑛 . We know that for each 𝑘 ∈ {1, . . . , 𝑛},
span {𝑢 1, . . . , 𝑢𝑘 } = span {𝑣 1, . . . , 𝑣𝑘 }.
In particular, 𝑢𝑘 ∈ span {𝑣 1, . . . , 𝑣𝑘 }. Hence there exist scalars 𝑎𝑖 𝑗 such that the
following equalities hold:

𝑢 1 = 𝑎 11𝑣 1
𝑢 2 = 𝑎 12𝑣 1 + 𝑎 22𝑣 2
..
.
𝑢𝑛 = 𝑎 1𝑛 𝑣 1 + 𝑎 2𝑛 𝑣 2 + · · · + 𝑎𝑛𝑛 𝑣𝑛 .

Since the vectors 𝑢 1, . . . , 𝑢𝑛 are linearly independent, the scalars 𝑎 11, . . . , 𝑎𝑛𝑛 are
nonzero. Put 𝑎𝑖 𝑗 = 0 for 𝑖 > 𝑗 . Write 𝑅 = [𝑎𝑖 𝑗 ] and 𝑄 = [𝑣 1, · · · , 𝑣𝑛 ]. Then the
above equalities give
𝐴 = [𝑢 1, · · · , 𝑢𝑛 ] = 𝑄𝑅.
Here, 𝑄 ∈ F𝑚×𝑛 has orthonormal columns; and 𝑅 ∈ F𝑛×𝑛 is upper triangular.
Further, 𝑅 is invertible since the diagonal entries 𝑎𝑖𝑖 in 𝑅 are nonzero.
Moreover, the inner product in F𝑛×1 is given by h𝑢, 𝑣i = 𝑣 ∗𝑢. Therefore, 𝑄 has
orthonormal columns means that 𝑄 ∗𝑄 = 𝐼 . Then 𝑄𝑅 = 𝐴 implies that 𝑅 = 𝑄 ∗𝐴.
Notice that 𝑄 ∗𝑄 = 𝐼 does not imply that 𝑄𝑄 ∗ is 𝐼, in general. In case 𝐴 is a square
matrix, 𝑄𝑄 ∗ = 𝐼 and thus 𝑄 is unitary.

(4.11) Example
1 1
 
Let 𝐴 = 0 1 . Gram-Schmidt process on the columns of 𝐴 followed by orthonor-
1 1
 
malization yields the following:
1 1/√2
1
k𝑤 1 k 2 = 𝑤 1t 𝑤 1 = 2,
   
𝑤 1 = 𝑢 1 = 0 , 𝑣1 = 𝑤 1 =  0  .
1 k𝑤 1 k 1/√2
   
1 1/√2 0 0
t
  √     1  
𝑤 2 = 𝑢 2 − (𝑣 1 𝑢 2 )𝑣 1 = 1 − 2  0  = 1 , 𝑣 2 =
      𝑤 2 = 1 .
1 1/√2 0 k𝑤 2 k 0
       
Therefore,
 1/√2 0 
 
𝑄 = [𝑣 1 𝑣 2 ] =  0 1  .
 1/√2 0 
 
√ √ 
2 2
Then 𝑅 = 𝑄 ∗𝐴 = 𝑄 t𝐴 = . Verify that 𝐴 = 𝑄𝑅.
0 1
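The same factorization is produced (up to signs of the columns of 𝑄 and the corresponding rows of 𝑅) by numpy's built-in QR routine; a sketch, with Python/numpy assumed, is given below.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])

    Q, R = np.linalg.qr(A)     # Q is 3x2 with orthonormal columns, R is 2x2 upper triangular
    print(Q)
    print(R)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))   # True True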
The QR-factorization can be used to express a least squares solution in closed
form.

(4.12) Theorem
Let 𝐴 ∈ F𝑚×𝑛 have linearly independent columns, and let 𝑏 ∈ F𝑚×1 . Then the least
squares solution of 𝐴𝑥 = 𝑏 is unique; and it is given by 𝑢 = 𝑅 −1𝑄 ∗𝑏, where 𝐴 = 𝑄𝑅
is a QR-factorization of 𝐴.

Proof. Since 𝐴 has linearly independent columns, rank(𝐴) = 𝑛. It implies that


null(𝐴) = 𝑛 − rank(𝐴) = 0. Thus, as a linear transformation, 𝐴 is one-one. By
(4.7-3), we have a unique least squares solution of 𝐴𝑥 = 𝑏.
Using (4.10), take 𝑢 = 𝑅 −1𝑄 ∗𝑏. Now,

𝐴∗𝐴𝑢 = 𝑅 ∗𝑄 ∗𝑄𝑅𝑅 −1𝑄 ∗𝑏 = 𝑅 ∗𝑄 ∗𝑏 = 𝐴∗𝑏.

That is, 𝑢 satisfies the equation 𝐴∗𝐴𝑥 = 𝐴∗𝑏. Therefore, 𝑢 is the least squares
solution.
Assume that 𝐴 has linearly independent columns. Is the least squares solution 𝑢
a solution of 𝐴𝑥 = 𝑏? We have 𝐴𝑢 = 𝑄𝑅𝑅 −1𝑄 ∗𝑏 = 𝑄𝑄 ∗𝑏. As seen earlier, this is
not necessarily equal to 𝑏; and then 𝑢 need not be a solution of 𝐴𝑥 = 𝑏. However, if
𝑢 is also a solution of 𝐴𝑥 = 𝑏, then 𝑄 has orthonormal rows. In that case, 𝐴 must be
a square matrix.
Again, if a solution 𝑣 exists for 𝐴𝑥 = 𝑏, then 𝑣 = 𝑢. Reason? If 𝐴𝑣 = 𝑏, then
𝑄𝑅𝑣 = 𝑏 implies that 𝑅𝑣 = 𝑄 ∗𝑏. Hence, 𝑣 = 𝑢.
Notice that the linear system 𝑅𝑢 = 𝑄 ∗𝑏 is easy to solve since 𝑅 is upper triangular.

(4.13) Example
Consider computing the least squares solution of the system 𝐴𝑥 = 𝑏, where 𝐴 is the
matrix in (4.11), and 𝑏 = (1, 2, 3) t . We have seen that
1 1   1/√2 0  √ √ 
   2 2
𝐴 =  0 1  = 𝑄𝑅, 𝑄 =  0 1  , 𝑅= .
1  1/√2 0  0 1
 1   
If 𝑢 is the least squares solution of 𝐴𝑥 = 𝑏, then 𝑢 satisfies 𝑅𝑢 = 𝑄 ∗𝑏, or
√2 𝑢 1 + √2 𝑢 2 = 2√2, 𝑢 2 = 2.

It is a triangular system, which is solved by back-substitution to obtain 𝑢 1 = 0 and


𝑢 2 = 2.
Alternatively, the least squares solution 𝑢 = (𝑢 1, 𝑢 2 ) t satisfies 𝐴∗𝐴𝑢 = 𝐴∗𝑏, that is,

2𝑢 1 + 2𝑢 2 = 4, 2𝑢 1 + 3𝑢 2 = 6.

Solving, we obtain the same solution 𝑢 1 = 0 and 𝑢 2 = 2.
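Both routes in (4.13) are easy to reproduce numerically. The sketch below (Python with numpy assumed) computes 𝑢 from the triangular system 𝑅𝑢 = 𝑄 ∗𝑏 and confirms the normal equations.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
    b = np.array([1.0, 2.0, 3.0])

    Q, R = np.linalg.qr(A)
    u = np.linalg.solve(R, Q.T @ b)      # solves R u = Q* b
    print(u)                             # approximately [0. 2.]
    print(A.T @ A @ u, A.T @ b)          # both equal [4. 6.]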


Exercises for § 4.2
1. Find least squares solutions for the following systems:
(a) 3𝑥 + 𝑦 = 1, 𝑥 + 2𝑦 = 0, 2𝑥 − 𝑦 = −2.
(b) 𝑥 + 𝑦 + 𝑧 = 0, −𝑥 + 𝑧 = 1, 𝑥 − 𝑦 = −1, 𝑦 − 𝑧 = −2.
Ans: (a) 𝑥 = −1/5, 𝑦 = 3/5. (b) 𝑥 = −2/3, 𝑦 = −1/3, 𝑧 = 1.
2. Find a QR-factorization of each of the following matrices:
 1 −1 4 
0 1 1 1 0  
     1 4 −2 
(a) 1 1
  (b) 1 0 1
  (c)  .
0 1 0 1 1  1 4 2 

     1 −1 0 
 3 1 −√2
√   √
0 1/ 2 √ √ 
 
  1 √1
Ans: (a) 𝑄 = 1 0√  , 𝑅 = . (b) 𝑄 = √1  3 −1
√2 

,
0 1/ 2 0 2 6 
  0 2 2 

√ √ √ 1 −1 1 
2 3 3 3    2 3 2 
1
  1
1 1 −1  
𝑅 = √  0 3 √ 1  . (c) 𝑄 = 2   , 𝑅 = 0 5 −2 .
6
1 1 1 
  
 0 0 2 2 0 0 4 
  1 −1 −1  
 
3. Let 𝐴 ∈ F𝑚×𝑛 have linearly independent columns. Show that if 𝐴 has a
QR-factorization where the upper triangular matrix 𝑅 has positive diagonal
entries, then that is the only possible QR-factorization.
4. Let 𝐴 ∈ R𝑚×𝑛 have linearly independent columns. Let 𝑏 ∈ R𝑚×1 . Show that
there exists a unique 𝑥 ∈ R𝑛×1 such that 𝐴∗𝐴𝑥 = 𝐴∗𝑏.
5. Let 𝐴 ∈ F𝑚×𝑛 be a matrix of rank 𝑛. Show the following:
(a) There exist an invertible matrix 𝐵 ∈ F𝑚×𝑚 and a matrix 𝐶 ∈ F𝑚×𝑛 such
that 𝐴 = 𝐵𝐶 and 𝐶 ∗𝐶 is a diagonal matrix.
(b) There exist an invertible matrix 𝐵 ∈ F𝑚×𝑚 and a matrix 𝐶 ∈ F𝑚×𝑛 such
that 𝐴 = 𝐵𝐶 and 𝐶 ∗𝐶 = 𝐼 .

4.3 Matrix of a linear transformation


Let 𝑉 be a vector space of dimension 𝑛 over F. As dimension of F𝑛×1 is also 𝑛, 𝑉
and F𝑛×1 are isomorphic. In fact, if 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } is an ordered basis of 𝑉 , and
{𝑒 1, . . . , 𝑒𝑛 } is the standard basis of F𝑛×1, then the map that takes 𝑣𝑖 to 𝑒𝑖 provides
the required natural isomorphism. If 𝑣 ∈ 𝑉 is any vector, then there exist unique

scalars 𝑎 1, . . . , 𝑎𝑛 ∈ F such that 𝑣 = 𝑎 1𝑣 1 + · · · + 𝑎𝑛 𝑣𝑛 . Then this natural isomorphism


from 𝑉 to F𝑛×1 is given by
𝑣 ↦→ [𝑣]𝐵 := (𝑎 1, . . . , 𝑎𝑛 ) t .
The vector [𝑣]𝐵 is called the coordinate vector of 𝑣 with respect to 𝐵. The coordi-
nate vector is well-defined since 𝐵 is an ordered basis for 𝑉 ensures that the scalars
𝑎 1, . . . , 𝑎𝑛 are uniquely determined from 𝑣.
We write the map 𝑣 ↦→ [𝑣]𝐵 as [ ]𝐵 : 𝑉 → F𝑛×1 , and [ ]𝐵 (𝑣) as [𝑣]𝐵 . Also, we call
the map [ ]𝐵 as the coordinate vector map with respect to the ordered basis 𝐵 of 𝑉 .

(4.14) Example
1. Consider the ordered basis 𝐵 = {(1, −1), (1, 0)} for F2 . Then
(0, 1) = −1(1, −1) + 1(1, 0). Thus [(0, 1)]𝐵 = (−1, 1) t .
2. Consider the ordered basis 𝐵 = {(1, 0), (1, −1)} for F2 . Then
(0, 1) = 1(1, 0) − 1(1, −1). Thus [(0, 1)]𝐵 = (1, −1) t .
3. Consider the ordered basis 𝐵 = {1, 1 + 𝑡, 1 + 𝑡 2 } for F2 [𝑡]. Then
1 + 𝑡 + 𝑡 2 = −1(1) + 1(1 + 𝑡) + 1(1 + 𝑡 2 ). Thus [1 + 𝑡 + 𝑡 2 ]𝐵 = (−1, 1, 1) t .

The coordinate vectors would be possibly different if we alter the positions of the
basis vectors in the ordered basis. Further, we speak of a coordinate vector map
with respect to a given ordered basis of a finite dimensional vector space.

(4.15) Theorem
Each coordinate vector map is an isomorphism.

Proof. Let 𝑉 be a vector space with an ordered basis {𝑣 1, . . . , 𝑣𝑛 }. Let 𝑥, 𝑦 ∈ 𝑉 .


There exist unique scalars 𝛼 1, . . . , 𝛼𝑛 and 𝛽 1, . . . , 𝛽𝑛 such that

𝑥 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣 𝑛 , 𝑦 = 𝛽 1 𝑣 1 + · · · 𝛽𝑛 𝑣 𝑛 .

Then [𝑥]𝐵 = (𝛼 1, . . . , 𝛼𝑛 )𝑡 and [𝑦]𝐵 = (𝛽 1, . . . , 𝛽𝑛 )𝑡 . Thus, for any 𝑐 ∈ F,

[𝑥 + 𝑐𝑦]𝐵 = [(𝛼 1 + 𝑐𝛽 1 )𝑣 1 + · · · + (𝛼𝑛 + 𝑐𝛽𝑛 )𝑣𝑛 ]𝐵


= (𝛼 1 + 𝑐𝛽 1, . . . , 𝛼𝑛 + 𝑐𝛽𝑛 )𝑡 = [𝑥]𝐵 + 𝑐 [𝑦]𝐵 .

So, the map [ ]𝐵 is a linear transformation from 𝑉 to F𝑛×1 .


Linear Transformations and Matrices 87
Let 𝑣 ∈ 𝑁 ([ ]𝐵 ). Then [𝑣]𝐵 = 0 ∈ F𝑛×1 . Thus 𝑣 = 0 𝑣 1 + · · · + 0 𝑣𝑛 = 0. That is,
𝑁 ( [ ]𝐵 ) ⊆ {0}. Therefore, [ ]𝐵 is one-one. By (3.14), [ ]𝐵 is an isomorphism.
Thus we say that an ordered basis of an 𝑛-dimensional vector space 𝑉 brings in a
coordinate system by inducing an isomorphism between 𝑉 and F𝑛×1 .
Let 𝑉 and 𝑊 be finite dimensional vector spaces over F with respective ordered
bases as 𝐵 = {𝑣 1, . . . 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 }. Let 𝑇 : 𝑉 → 𝑊 be a linear
transformation. Let 𝑣 ∈ 𝑉 . We have unique scalars 𝛼 1, . . . , 𝛼𝑛 such that 𝑣 = 𝛼 1𝑣 1 +
· · · + 𝛼𝑛 𝑣𝑛 . Then 𝑇 (𝑣) = 𝛼 1𝑇 (𝑣 1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ). It follows that

[𝑇 (𝑣)]𝐸 = 𝛼 1 [𝑇 (𝑣 1 )]𝐸 + · · · + 𝛼𝑛 [𝑇 (𝑣𝑛 )]𝐸 .

That is, the coordinate vector of 𝑇 (𝑣) can be obtained once we know the coordinate
vectors of 𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 ) with respect to 𝐸. Suppose the coordinate vectors of
𝑇 (𝑣 𝑗 ) are given as follows:

 𝑎 11   𝑎 1𝑗   𝑎 1𝑛 
     
[𝑇 (𝑣 1 )]𝐸 =  .  , · · · , [𝑇 (𝑣 𝑗 )]𝐸 =  .  , · · · , [𝑇 (𝑣𝑛 )]𝐸 =  ... 
 ..   ..   
     
𝑎𝑚1  𝑎𝑚 𝑗  𝑎𝑚𝑛 
     
for scalars 𝑎𝑖 𝑗 . Notice that this is equivalent to expressing 𝑇 (𝑣 𝑗 ) in terms of the basis
vectors 𝑤 1, . . . , 𝑤𝑚 as in the following:

𝑇 (𝑣 1 ) = 𝑎 11𝑤 1 + · · · + 𝑎𝑚1𝑤𝑚
..
.
𝑇 (𝑣𝑛 ) = 𝑎 1𝑛𝑤 1 + · · · + 𝑎𝑚𝑛𝑤𝑚 .

We put together the coordinate vectors [𝑇 (𝑣 𝑗 )]𝐸 in that order to obtain the follow-
ing array of scalars 𝑎𝑖 𝑗 :

 𝑎 11 · · · 𝑎 1𝑗 · · · 𝑎 1𝑛 
 
 .
. 
h i  . 

[𝑇 (𝑣 1 )]𝐸 · · · [𝑇 (𝑣 𝑗 )]𝐸 · · · [𝑇 (𝑣𝑛 )]𝐸 =  𝑎𝑖1 · · · 𝑎𝑖 𝑗 · · · 𝑎𝑖𝑛  .

 .. 
 . 
 
𝑎𝑚1 · · · 𝑎𝑚 𝑗 · · · 𝑎𝑚𝑛 
 
Such an array of 𝑚𝑛 scalars 𝑎𝑖 𝑗 is called an 𝑚 × 𝑛 matrix with entries from F. The
set of all 𝑚 × 𝑛 matrices with entries from F is denoted by F𝑚×𝑛 . The scalar at the
𝑖th row and the 𝑗th column of a matrix is called its (𝑖, 𝑗)th entry. By writing

𝐴 = [𝑎𝑖 𝑗 ] ∈ F𝑚×𝑛
88 MA2031 Classnotes

we mean that 𝐴 is an 𝑚 × 𝑛 matrix with its (𝑖, 𝑗)th entry as 𝑎𝑖 𝑗 .


Two matrices in F𝑚×𝑛 are equal iff their corresponding entries are equal.
We summarize and give a notation. Let 𝑉 and 𝑊 be vector spaces over F with
dimensions 𝑛 and 𝑚, and ordered bases 𝐵 = {𝑣 1, . . . 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 },
respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Let [𝑇 (𝑣 𝑗 )]𝐸 denote the
coordinate vector of 𝑇 (𝑣 𝑗 ) with respect to 𝐸. The matrix [𝑇 ]𝐸,𝐵 of 𝑇 with respect
to the ordered bases 𝐵 and 𝐸 is given by
h i
[𝑇 ]𝐸,𝐵 := [𝑇 (𝑣 1 )]𝐸 . . . [𝑇 (𝑣𝑛 )]𝐸 .

Now we know how to construct the matrix of a linear transformation. We first


express the images of the basis vectors of the domain space in terms of the basis
vectors of the co-domain space as in the following:

𝑇 (𝑣 1 ) = 𝑎 11𝑤 1 + · · · + 𝑎𝑖1𝑤𝑖 + · · · + 𝑎𝑚1𝑤𝑚


..
.
𝑇 (𝑣 𝑗 ) = 𝑎 1𝑗 𝑤 1 + · · · + 𝑎𝑖 𝑗 𝑤𝑖 + · · · 𝑎𝑚 𝑗 𝑤𝑚
..
.
𝑇 (𝑣𝑛 ) = 𝑎 1𝑛𝑤 1 + · · · + 𝑎𝑖𝑛𝑤𝑖 + · · · 𝑎𝑚𝑛𝑤𝑚 .

Then
 𝑎 11 · · · 𝑎 1𝑗 · · · 𝑎 1𝑛 
 
 .
. 

 . 

[𝑇 ]𝐸,𝐵 =  𝑎𝑖1 · · · 𝑎𝑖 𝑗 · · · 𝑎𝑖𝑛  .

 .. 
 . 
 
𝑎𝑚1 · · · 𝑎𝑚 𝑗 · · · 𝑎𝑚𝑛 
 
Caution: Mark which 𝑎𝑖 𝑗 goes where.

(4.16) Example
In the following, we consider all bases as ordered bases.
1. Let 𝐵 = {𝑒 1, 𝑒 2 } and 𝐸 = {𝑓1, 𝑓2, 𝑓3 } be the standard bases for R2 and R3,
respectively. Consider the linear transformation 𝑇 : R2 → R3 given by 𝑇 (𝑎, 𝑏) =
(2𝑎 − 𝑏, 𝑎 + 𝑏, 𝑏 − 𝑎). Then

𝑇 (𝑒 1 ) = (2, 1, −1) = 2𝑓1 + 1𝑓2 + (−1) 𝑓3


𝑇 (𝑒 2 ) = (−1, 1, 1) = (−1) 𝑓1 + 1𝑓2 + 1𝑓3 .
 2 −1    2𝑎 − 𝑏 
  𝑎  
Therefore [𝑇 ]𝐸,𝐵 =  1 1 . Further, [(𝑎, 𝑏)]𝐵 = and [𝑇 (𝑎, 𝑏)]𝐸 =  𝑎 + 𝑏  .
−1 1 𝑏 −𝑎 + 𝑏 
   
Linear Transformations and Matrices 89
2. Consider the linear transformation 𝐷 : R3 [𝑡] → R2 [𝑡]; 𝐷 (𝑝 (𝑡)) = 𝑝 0 (𝑡). Choose
bases 𝐵 = {1, 𝑡, 𝑡 2, 𝑡 3 } and 𝐸 = {1, 𝑡, 𝑡 2 } for R3 [𝑡] and R2 [𝑡], respectively. Then

𝐷 (1) = 0 × 1 + 0 × 𝑡 + 0 × 𝑡 2
𝐷 (𝑡) = 1 × 1 + 0 × 𝑡 + 0 × 𝑡 2
𝐷 (𝑡 2 ) = 0 × 1 + 2 × 𝑡 + 0 × 𝑡 2
𝐷 (𝑡 3 ) = 0 × 1 + 0 × 𝑡 + 3 × 𝑡 2 .

So,
𝑎 
0 1 0 0  
  𝑏 
 2 3 𝑏 2 3
 
[𝐷]𝐸,𝐵 = 0 0 2 0 , [𝑎+𝑏𝑡 +𝑐𝑡 +𝑑𝑡 ]𝐵 =   , 𝐷 (𝑎+𝑏𝑡 +𝑐𝑡 +𝑑𝑡 )]𝐸 =  2𝑐  .
 
0 𝑐 
 0 0 3   3𝑑 
 
𝑑 
 

3. With the same linear transformation 𝐷 and the basis 𝐵 for R3 [𝑡] as in (2), let
𝐸 = {1, 1 + 𝑡, 1 + 𝑡 2 }. Then

𝐷 (1) = 0 × 1 + 0 × (1 + 𝑡) + 0 × (1 + 𝑡 2 )
𝐷 (𝑡) = 1 × 1 + 0 × (1 + 𝑡) + 0 × (1 + 𝑡 2 )
𝐷 (𝑡 2 ) = −2 × 1 + 2 × (1 + 𝑡) + 0 × (1 + 𝑡 2 )
𝐷 (𝑡 3 ) = −3 × 1 + 0 × (1 + 𝑡) + 3 × (1 + 𝑡 2 ).

0 1 −2 −3
 
Therefore, [𝐷]𝐸,𝐵 = 0 0 2 0  .
0 0 0 3 
 
𝐷 (𝑎 +𝑏𝑡 + 𝑐𝑡 2 +𝑑𝑡 3 ) = 𝑏 + 2𝑐𝑡 + 3𝑑𝑡 2 = (𝑏 − 2𝑐 − 3𝑑) × 1 + 2𝑐 (1 + 𝑡 2 ) + 3𝑑 (1 + 𝑡 2 ).
𝑎 
 
  𝑏 − 2𝑐 − 3𝑑 
𝑏  
So, [𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 )]𝐵 =   and [𝐷 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 )]𝐸 =  2𝑐
  .
𝑐  
  
 3𝑑 

𝑑 
 
4. Let 𝐵 = {1, 1 + 𝑡, 𝑡 + 𝑡 2 } and 𝐸 = {1, 𝑡, 𝑡 + 𝑡 2, 𝑡 2 + 𝑡 3 } be bases for R2 [𝑡] and
R3 [𝑡], respectively.
∫𝑡 Let 𝑇 : R2 [𝑡] → R3 [𝑡] be the linear transformation given
by 𝑇 (𝑝 (𝑡)) = 0 𝑝 (𝑠)𝑑𝑠. Then
∫𝑡
𝑇 (1) = 0
𝑑𝑠 = 0 × 1 + 1 × 𝑡 + 0(𝑡 + 𝑡 2 ) + 0(𝑡 2 + 𝑡 3 )
∫𝑡
𝑇 (1 + 𝑡) = 0
(1 + 𝑠) 𝑑𝑠 = 0 × 1 + 12 × 𝑡 + 12 (𝑡 + 𝑡 2 ) + 0(𝑡 2 + 𝑡 3 )
∫ 𝑡
𝑇 (𝑡 + 𝑡 2 ) = 0 (𝑠 + 𝑠 2 ) 𝑑𝑠 = 0 × 1 − 16 × 𝑡 + 16 (𝑡 + 𝑡 2 ) + 31 (𝑡 2 + 𝑡 3 ).
90 MA2031 Classnotes
0 0 0 

1
 1/2 −1/6
Therefore, [𝑇 ]𝐸,𝐵 = 1/2 1/6
.
0
 
0
 0 1/3

The coordinate vectors of a typical vector in R2 [𝑡] and its image are:

𝑣 = 𝑎 + 𝑏𝑡 + 𝑐𝑡 2 = (𝑎 − 𝑏 + 𝑐) × 1 + (𝑏 − 𝑐)(1 + 𝑡) + 𝑐 (𝑡 + 𝑡 2 )
∫𝑡
𝑇 (𝑣) = 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 ) = (𝑎 + 𝑏𝑠 + 𝑐𝑠 2 ) 𝑑𝑠 = 𝑎 𝑡 + 𝑏2 𝑡 2 + 𝑐3 𝑡 3
2
0
= 0 × 1 + (𝑎 − 𝑏
2 + 𝑐 𝑏
3 )𝑡 + ( 2 − 𝑐3 )(𝑡 + 𝑡 2 ) + 𝑐3 (𝑡 2 + 𝑡 3 ).

 0 
𝑎 − 𝑏 + 𝑐   
− / + /
 
  𝑎 𝑏 2 𝑐 3
Therefore, [𝑣]𝐵 =  𝑏 − 𝑐  and [𝑇 (𝑣)]𝐸 =  𝑏
 
.
 𝑐   / 2 − 𝑐/3 
 
   𝑐/3 
 
As we see, a linear transformation𝑇 : 𝑉 → 𝑊 with fixed ordered bases 𝐵 for𝑉 , and
𝐸 for 𝑊 , gives rise to a matrix. We can also construct back the linear transformation
from such a given matrix. For, suppose 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 }. Let
Í
𝑣 ∈ 𝑉 . There exist unique scalars 𝛽 1, . . . , 𝛽𝑛 such that 𝑣 = 𝑛𝑗=1 𝛽 𝑗 𝑣 𝑗 . Then

𝑛
Õ 𝑛
Õ
𝑇 (𝑣) = 𝛽 𝑗𝑇 (𝑣 𝑗 ) = 𝛽 𝑗 (𝑎 1𝑗 𝑤 1 + · · · + 𝑎𝑚 𝑗 𝑤𝑚 ).
𝑗=1 𝑗=1

We thus say that the matrix [𝑇 ]𝐸,𝐵 represents the linear transformation 𝑇 .

Exercises for § 4.3


1. Define 𝑇 : R2 → R3 by 𝑇 (𝑎, 𝑏) = (𝑎 − 𝑏, 𝑎, 2𝑎 − 𝑏). Let 𝐵 be the standard
basis of R2 . Take 𝐶 = {(1, 2), (2, 3)} and 𝐷 = {(1, 1, 0), (0, 1, 1), (2, 2, 3)} as
ordered bases for R3 . Compute [𝑇 ]𝐷,𝐵 and [𝑇 ]𝐷,𝐶 .
−1/3 1/3   1/3 1/3 
   
Ans: [𝑇 ]𝐷,𝐵 =  0
 1  ; [𝑇 ]𝐷,𝐶 =  2
 3  .
 2/3 −2/3 −2/3 −2/3
   
2. Define 𝑇 : R3 → R3 by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑏 + 𝑐, 𝑐 + 𝑎, 𝑎 + 𝑏). Determine [𝑇 ]𝐸,𝐵
where
(a) 𝐵 = {(1, 0, 0), (0, 0, 1), (0, 1, 0)}, 𝐸 = {(0, 0, 1), (1, 0, 0), (0, 1, 0)}.
(b) 𝐵 = {(0, 0, 1), (1, 0, 0), (0, 1, 0)}, 𝐸 = {(1, 0, 0), (0, 0, 1), (0, 1, 0)}.
(c) 𝐵 = {(1, 1, −1), (−1, 1, 1), (1, −1, 1)}, 𝐸 = {(−1, 1, 1), (1, −1, 1), (1, 1, −1)}.
Linear Transformations and Matrices 91
 1 0 1 1 0 1
   
Ans: (a)-(b) 0 1 1 . (c) 1 1 0 .
 
 1 1 0 0 1 1
   
3. Define 𝑇 : R2 [𝑡] → R by 𝑇 (𝑓 ) = 𝑓 (2). Compute
 the matrix of 𝑇 with respect
to the standard bases of the spaces. Ans: 1 2 4 .
4. Let 𝑇 : F2 [𝑡] → F3 [𝑡] be given by 𝑇 (𝑝 (𝑡)) = 𝑡 𝑝 (𝑡). Consider the ordered
bases 𝐵 = {1 + 𝑡, 1 − 𝑡, 𝑡 2 } for F2 [𝑡] and 𝐸 = {1, 1 + 𝑡, 1 + 𝑡 + 𝑡 2, 𝑡 3 } for F3 [𝑡].
 −1 −1 0 
 
 0 2 0
Find the matrix [𝑇 ]𝐸,𝐵 . Ans:  .
 1 −1 0 

 0 0 1
 

4.4 Matrix operations


Recall that a matrix 𝐴 ∈ F𝑚×𝑛 is viewed as the linear transformation 𝐴 : F𝑛×1 →
F𝑚×1 defined by 𝐴(𝑥) = 𝐴𝑥 for 𝑥 ∈ F𝑛×1 .
Conversely, any linear transformation from F𝑛×1 to F𝑚×1 is a matrix multiplication.
To see this, suppose 𝑇 : F𝑛×1 → F𝑚×1 is a linear transformation. Let 𝑒 1, . . . , 𝑒𝑛 be
the standard basis vectors of F𝑛×1 . Construct the matrix 𝐴 ∈ F𝑚×𝑛 by taking the
vector 𝑇 𝑒𝑖 as its 𝑖th column; that is,
 
𝐴 = 𝑇 (𝑒 1 ) · · · 𝑇 (𝑒𝑛 ) .

If 𝑥 ∈ F𝑛×1, then we have scalars 𝑎 1, . . . , 𝑎𝑛 such that 𝑥 = 𝑎 1𝑒 1 + · · · + 𝑎𝑛 𝑒𝑛 . Then

𝑎 1 
 
𝑇 (𝑥) = 𝑎 1𝑇 (𝑒 1 ) + · · · + 𝑎𝑛𝑇 (𝑒𝑛 ) = 𝐴  ...  = 𝐴𝑥 .
 
 
𝑎𝑛 
 
This justifies our terminology: the matrix 𝐴 represents the linear transformation 𝑇 .
In abstract vector spaces, how are the coordinate vectors of 𝑣, 𝑇 (𝑣) and the matrix
of 𝑇 related? Look back at (4.16), where we computed the coordinate vector of a
typical vector and also that of its image. In the first problem there, with 𝑣 = (𝑎, 𝑏),
we had
 2 −1    2𝑎 − 𝑏 
  𝑎  
[𝑇 ]𝐸,𝐵 =  1 1 , [𝑣]𝐵 = , [𝑇 (𝑣)]𝐸 =  𝑎 + 𝑏  .
−1 1 𝑏  −𝑎 + 𝑏 
   
92 MA2031 Classnotes

In the second problem with 𝑣 = 𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3, and 𝑇 = 𝐷, we had


𝑎 
0 1 0 0   𝑏 
 𝑏   
[𝑇 ]𝐸,𝐵 = 0 0 2 0 , [𝑣]𝐵 =   , [𝑇 (𝑣)]𝐸 =  2𝑐  .
0 0 0 3 𝑐  3𝑑 
 𝑑   
 
In third problem, with 𝑣 = 𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 and 𝑇 = 𝐷, we had

𝑎 
0 1 −2 −3   𝑏 − 2𝑐 − 3𝑑 
  𝑏   
[𝑇 ]𝐸,𝐵 = 0 0 2 0  , [𝑣]𝐵 =   , [𝑇 (𝑣)]𝐸 =  2𝑐 .
𝑐 

0 0 0 3   3𝑑 
  𝑑   
 
In the fourth problem, we had obtained
0 0 0   0 
 𝑎 − 𝑏 + 𝑐   
1 1/2 −1/6   𝑎 − 𝑏/2 + 𝑐/3
[𝑇 ]𝐸,𝐵 =  1/2 1/6
, [𝑣]𝐵 =  𝑏 − 𝑐  , [𝑇 (𝑣)]𝐸 =  𝑏
 .
0   𝑐   /2 − 𝑐/3 

0
 0 1/3   
 𝑐/3 

Can you see how do we obtain the column vector [𝑇 (𝑣)]𝐸 from the matrix [𝑇 ]𝐸,𝐵
and the column vector [𝑣]𝐵 ? The rule is simple. The first component of [𝑇 (𝑣)]𝐸 is
the dot product of the first row of [𝑇 ]𝐸,𝐵 with the column vector [𝑣]𝐵 . The second
component of [𝑇 (𝑣)]𝐸 is the dot product of the second row of [𝑇 ]𝐸,𝐵 with the column
vector [𝑣]𝐵 ; and so on.

(4.17) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 } be ordered bases for the vector spaces
𝑉 and 𝑊 , respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then [𝑇 ]𝐸,𝐵 is
the unique matrix in F𝑚×𝑛 such that for each 𝑥 ∈ 𝑉 ,
[𝑇 𝑥]𝐸 = [𝑇 ]𝐸,𝐵 [𝑥]𝐵 .

Proof. Let 𝑥 = 𝑎 1𝑣 1 + · · · + 𝑎𝑛 𝑣𝑛 . Then 𝑇 𝑥 = 𝑎 1𝑇 𝑣 1 + · · · + 𝑎𝑛𝑇 𝑣𝑛 . Let [𝑇 ]𝐸,𝐵 =


[𝑏𝑖 𝑗 ] ∈ F𝑚×𝑛 . Then we have
𝑎 1   𝑏 1𝑗 
   
[𝑥]𝐵 =  .  , [𝑇 𝑣 𝑗 ]𝐸 =  ...  for 𝑗 = 1, . . . , 𝑛.
 ..   
   
𝑎𝑛  𝑏𝑚 𝑗 
   
 𝑏 11   𝑏 1𝑛   𝑏 11𝑎 1 + · · · + 𝑏 1𝑛 𝑎𝑛 
     
 ..   ..   .
[𝑇 𝑥]𝐸 = 𝑎 1  .  + · · · + 𝑎𝑛  .  =   = [𝑇 ]𝐸,𝐵 [𝑥]𝐵 .
. 
     . 
𝑏𝑚1  𝑏𝑚𝑛  𝑏𝑚1𝑎 1 + · · · + 𝑏𝑚𝑛 𝑎𝑛 
     
Linear Transformations and Matrices 93
For the uniqueness of [𝑇 ]𝐸,𝐵 , let 𝐴 ∈ F𝑚×𝑛 satisfy [𝑇 𝑥]𝐸 = 𝐴[𝑥]𝐵 for each 𝑥 ∈ 𝑉 .
Let 𝑗 ∈ {1, . . . , 𝑛}. Take 𝑥 = 𝑣 𝑗 , the 𝑗th basis vector in 𝐵. Then, [𝑣 𝑗 ]𝐵 = 𝑒 𝑗 , the 𝑗th
standard basis vector of F𝑛×1 . Consequently,

the 𝑗th column of [𝑇 ]𝐸,𝐵 = [𝑇 (𝑣 𝑗 )]𝐸 = 𝐴[𝑣 𝑗 ]𝐵 = 𝐴𝑒 𝑗 = the 𝑗th column of 𝐴.

Therefore, [𝑇 ]𝐸,𝐵 = 𝐴.
Let 𝑆,𝑇 : 𝑉 → 𝑊 be linear transformations and let 𝛼 be a scalar. Then the
functions 𝑆 + 𝑇 : 𝑉 → 𝑊 and 𝛼𝑆 : 𝑉 → 𝑊 defined by

(𝑆 + 𝑇 )(𝑥) = 𝑆 (𝑥) + 𝑇 (𝑥), (𝛼𝑆)(𝑥) = 𝛼𝑆 (𝑥) for all 𝑥 ∈ 𝑉

are linear transformations. (Prove!) Analogously, addition of two matrices and


multiplication of a matrix with a scalar are defined entry-wise. We see that these
definitions tally, in a certain sense.

(4.18) Theorem
Let 𝐵 and 𝐸 be ordered bases for the finite dimensional vector spaces 𝑉 and 𝑊 ,
respectively. Let 𝑆, 𝑇 : 𝑉 → 𝑊 be linear transformations. Let 𝛼 ∈ F. Then

[𝑆 + 𝑇 ]𝐸,𝐵 = [𝑆]𝐸,𝐵 + [𝑇 ]𝐸,𝐵 and [𝛼𝑇 ]𝐸,𝐵 = 𝛼 [𝑇 ]𝐸,𝐵 .

Proof. Let 𝑥 ∈ 𝑉 . Since the coordinate vector of a sum is the sum of the coordinate
vectors, we have

[(𝑆 + 𝑇 )(𝑥)]𝐸 = [𝑆 (𝑥) + 𝑇 (𝑥)]𝐸 = [𝑆 (𝑥)]𝐸 + [𝑇 (𝑥)]𝐸


 
= [𝑆]𝐸,𝐵 [𝑥]𝐵 + [𝑇 ]𝐸,𝐵 [𝑥]𝐵 = [𝑆]𝐸,𝐵 + [𝑇 ]𝐸,𝐵 [𝑥]𝐵 .

By (4.17), [𝑆 + 𝑇 ]𝐸 = [𝑆]𝐸,𝐵 + [𝑇 ]𝐸,𝐵 . Similarly, [𝛼𝑇 ]𝐸,𝐵 = 𝛼 [𝑇 ]𝐸,𝐵 .

(4.19) Theorem
Let 𝐵, 𝐶 and 𝐷 be ordered bases for the finite dimensional vector spaces 𝑈 , 𝑉 and
𝑊 , respectively. Let 𝑆 : 𝑈 → 𝑉 and 𝑇 : 𝑉 → 𝑊 be linear transformations. Then

[𝑇 𝑆]𝐷,𝐵 = [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 .

Proof. Suppose 𝐵 = {𝑢 1, . . . , 𝑢𝑛 }. Let 𝑒 𝑗 be the 𝑗th standard basis vector of F𝑛×1 .


Then, for each 𝑗 ∈ {1, . . . , 𝑛}, [𝑢 𝑗 ]𝐵 = 𝑒 𝑗 . Now, for each such 𝑗,

[𝑇 𝑆]𝐷,𝐵 𝑒 𝑗 = [𝑇 𝑆]𝐷,𝐵 [𝑢 𝑗 ]𝐵 = [(𝑇 𝑆)(𝑢 𝑗 )]𝐷 = [𝑇 (𝑆 (𝑢 𝑗 )]𝐷


= [𝑇 ]𝐷,𝐶 [𝑆 (𝑢 𝑗 )]𝐶 = [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 [𝑢 𝑗 ]𝐵 = [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 𝑒 𝑗 .
94 MA2031 Classnotes

That is, the 𝑗th column of [𝑇 𝑆]𝐷,𝐵 is same as the 𝑗th column of [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 for each
such 𝑗 . Therefore, [𝑇 𝑆]𝐷,𝐵 = [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 .

As you see, the addition of matrices, a scalar multiple of a matrix, and product of
two matrices are defined in such a way that as linear transformations, they correspond
to the sum of linear transformations, a scalar multiple of a linear transformation,
and the composition of two linear transformations, respectively.
We consider a particular case. Suppose 𝑇 is an isomorphism from 𝑉 to 𝑊 , where
dim (𝑉 ) = 𝑛 = dim (𝑊 ). Then its inverse, written as 𝑇 −1 is an isomorphism from
𝑊 to 𝑉 . Then 𝑇 −1𝑇 = 𝐼, the identity map on 𝑉 . Now, fix ordered bases 𝐵 for 𝑉 , and
𝐸 for 𝑊 . If 𝐵 = {𝑣 1, . . . , 𝑣𝑛 }, then

(𝑇 −1𝑇 )(𝑣 𝑗 ) = 𝑣 𝑗 = 0 𝑣 1 + · · · + 0 𝑣 𝑗−1 + 1 𝑣 𝑗 + 0 𝑣 𝑗+1 + · · · + 0 𝑣𝑛 .

Then the 𝑛 × 𝑛 matrix [𝑇 −1𝑇 ]𝐵,𝐵 has the 𝑗th column as 𝑒 𝑗 ∈ F𝑛×1 . Similarly, the 𝑖th
column of [𝑇𝑇 −1 ]𝐸 is also 𝑒𝑖 , where 𝐸 is an ordered basis of 𝑊 .
An isomorphism maps a basis onto a basis. Looking at an 𝑛 × 𝑛 matrix as a linear
transformation, we see that the images of the standard basis vectors are the columns
of the matrix. It then follows that

a square matrix is invertible iff its columns are linearly independent.

Further, we observe that the matrix representation of the identity map with respect
to the same basis in both copies of the vector space is the identity matrix.
The matrix representation of an isomorphism and that of its inverse have an
obvious connection.

(4.20) Theorem
Let 𝐵 and 𝐸 be ordered bases for finite dimensional vector spaces 𝑉 and 𝑊 ,
respectively. Let 𝑇 : 𝑉 → 𝑊 be an isomorphism. Then
  −1
−1
[𝑇 ]𝐵,𝐸 = [𝑇 ]𝐸,𝐵 .

Proof. Due to (4.19), [𝑇 −1 ]𝐵,𝐸 [𝑇 ]𝐸,𝐵 = [𝑇 −1𝑇 ]𝐵,𝐵 = [𝐼 ]𝐵,𝐵 = 𝐼 .


Similarly, [𝑇 ]𝐸,𝐵 [𝑇 −1 ]𝐵,𝐸 = [𝑇𝑇 −1 ]𝐸,𝐸 = [𝐼 ]𝐸,𝐸 = 𝐼 .

We know that F𝑚×𝑛 is a vector space with addition and scalar multiplication of
matrices. Denote the set of all linear transformations from 𝑉 to 𝑊 by L (𝑉 ,𝑊 ). It
is easy to verify that L (𝑉 ,𝑊 ) is a vector space over the same underlying field with
the addition and scalar multiplication of linear transformations as mentioned earlier.
Can you get a basis for L (𝑉 ,𝑊 ) looking at the basis {𝐸𝑖 𝑗 : 1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛}
for F𝑚×𝑛 ?
Linear Transformations and Matrices 95
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } be an orthonormal basis for 𝑉 . Let 𝑢, 𝑣 ∈ 𝑉 . Then 𝑢 =
Í𝑛 Í𝑛
𝑖=1 h𝑢, 𝑣 𝑖 i𝑣 𝑖 , 𝑣 = 𝑗=1 h𝑣, 𝑣 𝑗 i𝑣 𝑗 , and

𝑛 Õ
Õ 𝑛 𝑛
Õ
h𝑢, 𝑣i = h𝑢, 𝑣𝑖 ih𝑣, 𝑣 𝑗 ih𝑣𝑖 , 𝑣 𝑗 i = h𝑢, 𝑣𝑖 ih𝑣, 𝑣𝑖 i = [𝑢]𝐵 · [𝑣]𝐵 .
𝑖=1 𝑗=1 𝑖=1

An orthonormal basis converts the inner product to the dot product. Using an
orthonormal basis in a finite dimensional inner product space amounts to working
in F𝑛×1 with the dot product. Moreover, orthonormal bases allow writing the entries
of the matrix representation of a linear transformation by using the inner products;
see the following theorem.

(4.21) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 } be ordered bases of the inner product
spaces 𝑉 and 𝑊 , respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. If 𝐸 is
an orthonormal basis of 𝑊 , then the (𝑖 𝑗)th entry of [𝑇 ]𝐸,𝐵 is equal to h𝑇 𝑣 𝑗 , 𝑤𝑖 i.

Proof. For 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛, let 𝑎𝑖 𝑗 denote the (𝑖 𝑗)th entry of the matrix
Í
[𝑇 ]𝐸,𝐵 . Then 𝑇 𝑣 𝑗 = 𝑎 1𝑗 𝑤 1 + · · · + 𝑎𝑚 𝑗 𝑤𝑚 = 𝑚 𝑘=1 𝑎𝑘 𝑗 𝑤 𝑘 . Since 𝐸 is orthonormal,
Í𝑚
h𝑇 𝑣 𝑗 , 𝑤𝑖 i = 𝑎 𝑤
𝑘=1 𝑘 𝑗 𝑘 , 𝑤 𝑖 i = 𝑎 𝑖𝑗 h𝑤 𝑖 , 𝑤 𝑖 = 𝑎 𝑖𝑗.

Notice that in (4.21) the basis for 𝑉 need not be orthonormal.


We find some connection between the matrix representation of the adjoint, and
the adjoint of the matrix representation of a linear transformation.

(4.22) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 } be orthonormal ordered bases of the
ips 𝑉 and 𝑊 , respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then
[𝑇 ∗ ]𝐵,𝐸 = ([𝑇 ]𝐸,𝐵 ) ∗ .

Proof. Suppose 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛. Denote the (𝑖 𝑗)th entry of [𝑇 ]𝐸,𝐵 by 𝑎𝑖 𝑗


and that of [𝑇 ∗ ]𝐵,𝐸 by 𝑏𝑖 𝑗 . By (4.21),

𝑏𝑖 𝑗 = h𝑇 ∗𝑤 𝑗 , 𝑣𝑖 i = h𝑣𝑖 ,𝑇 ∗𝑤 𝑗 i = h𝑇 𝑣𝑖 , 𝑤 𝑗 i = 𝑎 𝑗𝑖 .

Therefore, [𝑇 ∗ ]𝐵,𝐸 = ([𝑇 ]𝐸,𝐵 ) ∗ .


If the bases are not orthonormal, then the adjoint of the matrix representation may
not represent the adjoint of the linear transformation.

(4.23) Example
Let 𝑢 1 = (1, 1, 0), 𝑢 2 = (1, 0, 1) and 𝑢 3 = (0, 1, 1). Consider 𝐸 = {𝑢 1, 𝑢 2, 𝑢 3 } as a
basis of R3, and the standard basis 𝐵 = {𝑒 1, 𝑒 2, 𝑒 3, 𝑒 4 } of R4 . Use the standard inner
96 MA2031 Classnotes

products (the dot products) on these spaces. Consider the linear transformation
𝑇 : R4 → R3 defined by

𝑇 (𝑎, 𝑏, 𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 − 2𝑐 + 𝑑, 𝑎 − 𝑏 + 𝑐 − 𝑑).

The computation in (3.21) shows that 𝑇 ∗ : R3 → R4 is given by

𝑇 ∗ (𝛼, 𝛽, 𝛾) = (𝛼 + 𝛾, 𝛽 − 𝛾, 𝛼 − 2𝛽 + 𝛾, 𝛽 − 𝛾).

For the matrix representations of 𝑇 and of 𝑇 ∗, we proceed as follows:

𝑇 𝑒1 = 𝑇 (1, 0, 0, 0) = (1, 0, 1) = 0 𝑢1 + 1 𝑢2 + 0 𝑢3
𝑇 𝑒2 = 𝑇 (0, 1, 0, 0) = (0, 1, −1) = 1 𝑢1 − 1 𝑢2 + 0 𝑢3
𝑇 𝑒3 = 𝑇 (0, 0, 1, 0) = (1, −2, 1) = −1 𝑢 1 + 2 𝑢 2 − 1 𝑢 3
𝑇 𝑒4 = 𝑇 (0, 0, 0, 1) = (0, 1, −1) = 1 𝑢1 − 1 𝑢2 + 0 𝑢3

𝑇 ∗𝑢 1 = 𝑇 ∗ (1, 1, 0) = (1, 1, −1, 1) = 1 𝑒 1 + 1 𝑒 2 − 1 𝑒 3 + 1 𝑒 4


𝑇 ∗𝑢 2 = 𝑇 ∗ (1, 0, 1) = (2, −1, 2, −1) = 2 𝑒 1 − 1 𝑒 2 + 2 𝑒 3 − 1 𝑒 4
𝑇 ∗𝑢 3 = 𝑇 ∗ (0, 1, 1) = (1, 0, −1, 0) = 1 𝑒 1 + 0 𝑒 2 − 1 𝑒 3 + 0 𝑒 4
Therefore, the matrices are
 1 2 1 
 0 1 −1 1  
 1 −1 0
 
[𝑇 ∗ ]𝐵,𝐸

[𝑇 ]𝐸,𝐵 =  1 −1
 2 −1  , = .
 0 0 −1  −1 2 −1 
0   
  1 −1 0 

Notice that 𝐸 is not an orthonormal basis, and [𝑇 ∗ ]𝐵,𝐸 ≠ ([𝑇 ]𝐸,𝐵 ) ∗ .

The connection between special types of linear operators and their matrix repre-
sentations can be stated in the presence of an orthonormal basis.

(4.24) Theorem
Let 𝑇 be a linear operator on a finite dimensional inner product space 𝑉 . Let 𝐵 be
an orthonormal ordered basis of 𝑉 .
(1) 𝑇 is self-adjoint iff [𝑇 ]𝐵,𝐵 is hermitian.
(2) 𝑇 is normal iff [𝑇 ]𝐵,𝐵 is normal.
(3) 𝑇 is unitary iff [𝑇 ]𝐵,𝐵 is unitary.

Proof. (1) Due to (4.22), [𝑇 ∗ ]𝐵,𝐵 = [𝑇 ]𝐵,𝐵 ∗ . If 𝑇 is self-adjoint, then 𝑇 ∗ = 𝑇 . It

follows that [𝑇 ]𝐵,𝐵∗ = [𝑇 ∗ ]


𝐵,𝐵 = [𝑇 ]𝐵,𝐵 . That is, [𝑇 ]𝐵,𝐵 is hermitian. Conversely, if

[𝑇 ]𝐵,𝐵 = [𝑇 ]𝐵,𝐵 , then for each 𝑣 ∈ 𝑉 ,

[𝑇 ∗𝑣]𝐵 = [𝑇 ∗ ]𝐵,𝐵 [𝑣]𝐵 = [𝑇 ]𝐵,𝐵



[𝑣]𝐵 = [𝑇 ]𝐵,𝐵 [𝑣]𝐵 = [𝑇 𝑣]𝐵 .
Linear Transformations and Matrices 97
It then follows that 𝑇 ∗𝑣 = 𝑇 𝑣 for each 𝑣 ∈ 𝑉 . That is, 𝑇 ∗ = 𝑇 .
Proofs of (2)-(3) are similar to that of (1).
The proof of (4.24-1) reveals that 𝑇 is self-adjoint iff for each orthonormal basis
𝐵 of 𝑉 , [𝑇 ∗ ]𝐵,𝐵 = [𝑇 ]𝐵,𝐵 iff for some orthonormal basis 𝐵 of 𝑉 , [𝑇 ∗ ]𝐵,𝐵 = [𝑇 ]𝐵,𝐵 .
Similar comments hold for the statements in (2)-(3). Further, if 𝑉 is a real inner
product space, then ‘hermitian’ may be replaced by ‘real symmetric’ and ‘unitary’
by ‘orthogonal’.

Exercises for § 4.4


       
1 0 0 1 0 0 0 0
1. Let 𝐵 = , , , .
0 0 0 0 1 0 0 1
(a) Define 𝑇 : R2×2 → R2×2 by 𝑇 (𝐴) = 𝐴t . Compute [𝑇 ]𝐵,𝐵 .
𝑓 0 (0) 2𝑓 (1)
(b) Define 𝑇 : R2 [𝑡] → R2×2 by 𝑇 (𝑓 ) = .
0 𝑓 0 (3)
Compute [𝑇 ]𝐵,𝐷 , where 𝐷 = {1, 𝑡, 𝑡 2 }.
(c) Define 𝑇 : R2×2 → R by 𝑇 (𝐴) = tr(𝐴). Compute [𝑇 ]{1},𝐵 .
1 0 0 0 0 1 0
  
0 0 1 0 2 2 4  
Ans: (a)  . (b)   . (c) 1 0 0 1 .
0 1 0 0 0 0 0
 
0 0 0 1 0 1 6
  
2. Let 𝑆 and 𝑇 be linear operators on a finite dimensional ips 𝑉 . Prove the
following:
(a) Let 𝑆 and 𝑇 be self-adjoint. Show that 𝑆𝑇 is self-adjoint iff 𝑆𝑇 = 𝑇 𝑆 iff
𝑇 𝑆 is self-adjoint.
(b) If 𝑆 is self-adjoint, then 𝑇 ∗𝑆𝑇 is self-adjoint.
(c) If 𝑇 is invertible and 𝑇 ∗𝑆𝑇 is self-adjoint, then 𝑆 is self-adjoint.
Interpret the above results for hermitian matrices.
2×2 ∗
3. Show that 𝑊 :=  {𝐴 ∈ C :𝐴 + 𝐴 =0} is a real
  vector space. Find a basis
𝑖 0 0 0 0 −1 0 𝑖
for 𝑊 . Ans: , , , .
0 0 0 𝑖 1 0 𝑖 0
4. Construct a normal 3 × 3 matrix which is neither hermitian nor unitary.
 1 1 + 𝑖 1
 
Ans: −1 + 𝑖 1 1 .
 −1 −1 1
 
5. Find a real normal 3 × 3 matrix which is neither symmetric nor orthogonal.
1 1 0
 
Ans: 0 1 1 .
1 0 1
 
98 MA2031 Classnotes

6. Is 𝑉 := {𝐴 ∈ F𝑛×𝑛 : tr(𝐴) = 0} a vector space? Ans: Yes.


7. Show that the map 𝑇 : F𝑛×𝑛→ F defined by 𝑇 (𝐴) = tr(𝐴) is a linear
functional. Show that 𝑇 is an onto map. Also, find null(𝑇 ).
Ans: null(𝑇 ) = dim F (𝑉 ) = 𝑛 2 − 1.
 
2×2 2 0 1
8. Construct a matrix 𝐴 ∈ R with tr(𝐴 ) < 0. Ans: 𝐴 = .
−1 0
9. In the following is h , i an inner product on the vector space 𝑉 ?
(a) h𝐴, 𝐵i = tr(𝐴 + 𝐵) for 𝐴, 𝐵 in 𝑉 = R2×2 . Ans: No.
(b) h𝐴, 𝐵i = tr(𝐴t 𝐵) for 𝐴, 𝐵 in 𝑉 = R3×3 . Ans: Yes.
10. Construct matrices 𝐴, 𝐵 ∈ F𝑛×𝑛 so that tr(𝐴𝐵) ≠ tr(𝐴)tr(𝐵). Ans: 𝐴 = 𝐼 = 𝐵.
11. Let 𝐴 ∈ F𝑚×𝑛 and let 𝐵 ∈ F𝑛×𝑚 . Show that tr(𝐴𝐵) = tr(𝐵𝐴).
12. Let 𝐴, 𝐵 ∈ C𝑛×𝑛 . Show that 𝐴𝐵 − 𝐵𝐴 ≠ 𝐼 .
13. Let 𝐶 ∈ F2×2 . Show that tr(𝐶) = 0 iff there exist 𝐴, 𝐵 ∈ F2×2 such that
𝐶 = 𝐴𝐵 − 𝐵𝐴.
14. Let 𝐴 ∈ C𝑚×𝑚 . Show that if tr(𝐴∗𝐴) = 0, then 𝐴 = 0. What if 𝐴 ∈ C𝑚×𝑛 ?
Ans: If 𝑚 ≠ 𝑛, then tr(𝐴∗𝐴) = 0 ; 𝐴 = 0.
15. Let 𝐴 be a square matrix such that 𝐴∗𝐴 = 𝐴2 . Prove that 𝐴 is hermitian.

4.5 Change of basis


Suppose 𝑉 and 𝑊 are vector spaces of dimensions 𝑛 and 𝑚, respectively. If we fix
ordered bases for 𝑉 and 𝑊 , then the coordinate vector maps with respect to these
bases provide isomorphisms from 𝑉 and𝑊 to F𝑛×1 and F𝑚×1 . Using such coordinate
vector maps, we had seen how to represent a linear transformation from 𝑉 to 𝑊 by
a matrix.
To fix the ideas, let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } be an ordered basis of𝑉 . Let 𝐶 = {𝑤 1, . . . , 𝑤𝑚 }
be an ordered basis of 𝑊 . Have standard bases for F𝑛×1 and F𝑚×1 . The coordinate
vector maps [ ]𝐵 : 𝑉 → F𝑛×1 and [ ]𝐶 : 𝑊 → F𝑚×1 are isomorphisms. Let
𝑇 : 𝑉 → 𝑊 be a linear transformation and [𝑇 ]𝐶,𝐵 be its matrix representation. Then
we have the following commutative diagram:

𝑇
𝑉 −−−−→ 𝑊
'
[ ]𝐵 
[ ]
'
y y 𝐶
F𝑛×1 −−−−→ F𝑚×1
[𝑇 ]𝐶,𝐵
Linear Transformations and Matrices 99
It means
𝑇 = [ ]𝐶−1 ◦ [𝑇 ]𝐶,𝐵 ◦ [ ]𝐵 , [𝑇 ]𝐶,𝐵 = [ ]𝐶 ◦ 𝑇 ◦ [ ]𝐵−1 . (4.5.1)
Also, [ ]𝐶 ◦ 𝑇 = [𝑇 ]𝐶,𝐵 ◦ [ ]𝐵 , which amounts to

[𝑇 (𝑥)]𝐶 = [𝑇 ]𝐶,𝐵 [𝑥]𝐵 for each 𝑥 ∈ 𝑉 . (4.5.2)

We know that isomorphisms preserve rank and nullity. Since the coordinate vector
maps are isomorphisms, we obtain the following theorem.

(4.25) Theorem
Let 𝑉 and 𝑊 be finite dimensional vector spaces with ordered bases 𝐵 and 𝐶,
respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then
rank(𝑇 ) = rank([𝑇 ]𝐶,𝐵 ) and null(𝑇 ) = null([𝑇 ]𝐶,𝐵 ).

Since we are able to go from 𝑇 to its matrix representation [𝑇 ]𝐶,𝐵 and back in
a unique manner, it suggests that the matrix representation itself is some sort of
isomorphism. It is easy to verify that the map 𝑇 ↦→ [𝑇 ]𝐶,𝐵 is an isomorphism from
L (𝑉 ,𝑊 ) to F𝑚×𝑛 .
It thus follows that dim (L (𝑉 ,𝑊 )) = 𝑚𝑛. Alternatively, a basis for L (𝑉 ,𝑊 ) can
be constructed explicitly. Let {𝑣 1, . . . , 𝑣𝑛 } and {𝑤 1, . . . , 𝑤𝑚 } be ordered bases for
the vector spaces 𝑉 and 𝑊 , respectively. Suppose 1 ≤ 𝑖 ≤ 𝑛 and 1 ≤ 𝑗 ≤ 𝑚.
Define 𝑇𝑖 𝑗 : 𝑉 → 𝑊 by 𝑇𝑖 𝑗 (𝑣𝑖 ) = 𝑤 𝑗 , 𝑇𝑖 𝑗 (𝑣𝑘 ) = 0 for 𝑘 ≠ 𝑖. Then show that the set
{𝑇𝑖 𝑗 : 𝑖 = 1, . . . , 𝑛, 𝑗 = 1, . . . , 𝑚} is a basis for L (𝑉 ,𝑊 ).
We look at a particular case of the composition formulas in (4.5.1). Consider
a vector space 𝑉 of dimension 𝑛, with ordered bases 𝑂 = {𝑣 1, . . . , 𝑣𝑛 } and 𝑁 =
{𝑤 1, . . . , 𝑤𝑛 }. Consider the identity map 𝐼 on 𝑉 . Let us write 𝑉𝑂 for the vector space
𝑉 where we take the ordered basis as 𝑂. Similarly, write 𝑉𝑁 for the same space
but with the ordered basis 𝑁 . Fix the standard basis 𝐸 = {𝑒 1, . . . , 𝑒𝑛 } for F𝑛×1 . The
matrix representation diagram now looks like
𝐼
𝑉𝑂 −−−−→ 𝑉𝑁
'
[ ]𝑂 
[ ]
'
y y 𝑁
F𝑛×1 −−−−→ F𝑛×1
[𝐼 ]𝑁 ,𝑂

Here, [ ]𝑂 maps each 𝑣𝑖 to the corresponding 𝑒𝑖 and [ ]𝑁 maps each 𝑤 𝑗 to the


corresponding 𝑒 𝑗 . Then (4.5.2) says that

[𝑣]𝑁 = [𝐼 𝑣]𝑁 = [𝐼 ]𝑁 ,𝑂 [𝑣]𝑂 for each 𝑣 ∈ 𝑉 .

Thus the 𝑛 × 𝑛 matrix [𝐼 ]𝑁 ,𝑂 is called the change of basis matrix. This matrix
records the change in the coordinate vectors while we change the basis of 𝑉 from
100 MA2031 Classnotes

𝑂 to 𝑁 . It is obtained by collecting the scalars while expressing the basis vectors


𝑣𝑖 ∈ 𝑂 as linear combinations of basis vectors 𝑤 𝑗 ∈ 𝑁 .
Notice that 𝐼 𝑁 ,𝑂 is invertible, and 𝐼 𝑁−1,𝑂 = 𝐼𝑂,𝑁 .

(4.26) Example
Consider two ordered bases for R3 such as

𝑂 = {(1, 0, 1), (1, 1, 0), (0, 1, 1)}, 𝑁 = {(1, −1, 1), (1, 1, −1), (−1, 1, 1)}.

Find the change of basis matrix [𝐼 ]𝑁 ,𝑂 and verify that [𝑣]𝑁 = [𝐼 ]𝑁 ,𝑂 [𝑣]𝑂 for the
vector 𝑣 = (1, 2, 3).
We need to express each vector in 𝑂 as a linear combination of vectors in 𝑁 .
Towards this, suppose

(1, 0, 1) = 𝑎(1, −1, 1) + 𝑏 (1, 1, −1) + 𝑐 (−1, 1, 1).

Then 𝑎 + 𝑏 − 𝑐 = 1, −𝑎 + 𝑏 + 𝑐 = 0, 𝑎 − 𝑏 + 𝑐 = 1. Solving these equations, we obtain


𝑎 = 1, 𝑏 = 12 , and 𝑐 = 21 . So,

(1, 0, 1) = 1(1, −1, 1) + 12 (1, 1, −1) + 21 (−1, 1, 1).

Continuing with the second and third vectors in 𝑂, we obtain

(1, 1, 0) = 12 (1, −1, 1) + 1(1, 1, −1) + 21 (−1, 1, 1).

(0, 1, 1) = 12 (1, −1, 1) + 21 (1, 1, −1) + 1(−1, 1, 1).


Therefore the change of basis matrix is
1 1/2 1/2
 
[𝐼 ]𝑁 ,𝑂 = 1/2 1 1/2
.
1/2 1/2 1 

For 𝑣 = (1, 2, 3), we obtain

𝑣 = (1, 2, 3) = 2(1, −1, 1) + 23 (1, 1, −1) + 52 (−1, 1, 1) ⇒ [𝑣]𝑁 = [2, 3/2, 5/2] t .
𝑣 = (1, 2, 3) = 1(1, 0, 1) + 0(1, 1, 0) + 2(0, 1, 1) ⇒ [𝑣]𝑂 = [1, 0, 2] t .

1 1/2 1/2 1   2 
     
[𝐼 ]𝑁 ,𝑂 [𝑣]𝑂 = 1/2 1 1/2

0 = 3/2 = [𝑣]𝑁 .
   
1/2 1/2 1  2 5/2
    
In (4.26), construct the matrix 𝐵 by taking its columns as the transposes of vectors
in 𝑂, keeping the same order of the vectors as given in 𝑂. Similarly, construct the
Linear Transformations and Matrices 101
matrix 𝐶 by taking its columns as the transposes of vectors from 𝑁 , again keeping
the same order. We claim that [𝐼 ]𝑁 ,𝑂 = 𝐶 −1 𝐵. Indeed, the following may be easily
verified:
 1 1 −1   1 1/2 1/2
 1 1 0
 
  
𝐶 [𝐼 ]𝑁 ,𝑂 =  −1 1 1  1/2 1 1/2 = 0 1 1  = 𝐵.

 


 1 −1 1  1/2 1/2 1   1 0 1 
  

In general, if 𝑉 = F𝑛×1, then the change of basis matrix can be given in a closed
from using the given basis vectors.

(4.27) Theorem
Let 𝑂 = {𝑣 1, . . . , 𝑣𝑛 } and 𝑁 = {𝑤 1, . . . , 𝑤𝑛 } be ordered bases for F𝑛×1 . Let 𝐸 =
{𝑒 1, . . . , 𝑒𝑛 } be the standard basis for F𝑛×1 . Then the change of basis matrices [𝐼 ]𝐸,𝑂
and [𝐼 ]𝑁 ,𝑂 are given by

[𝐼 ]𝐸,𝑂 = [𝑣 1 · · · 𝑣𝑛 ] and [𝐼 ]𝑁 ,𝑂 = [𝑤 1 · · · 𝑤𝑛 ] −1 [𝑣 1 · · · 𝑣𝑛 ].

Proof. Since [𝑣 𝑗 ]𝑂 = 𝑒 𝑗 , we have

[𝐼 ]𝐸,𝑂 𝑒 𝑗 = [𝐼 ]𝐸,𝑂 [𝑣 𝑗 ]𝑂 = [𝐼 (𝑣 𝑗 )]𝐸 = [𝑣 𝑗 ]𝐸 = 𝑣 𝑗 .

That is, the 𝑗th column of the matrix [𝐼 ]𝐸,𝑂 is simply 𝑣 𝑗 . So,

[𝐼 ]𝐸,𝑂 = [𝑣 1 · · · 𝑣𝑛 ].

It is the matrix formed by putting the vectors 𝑣 1, . . . , 𝑣𝑛 as columns in that order.


Similarly,
[𝐼 ]𝐸,𝑁 = [𝑤 1 · · · 𝑤𝑛 ].
Using these equalities, we obtain
−1 [𝐼 ]
[𝐼 ]𝑁 ,𝑂 = [𝐼 ]𝑁 ,𝐸 [𝐼 ]𝐸,𝑂 = [𝐼 ]𝐸,𝑁 −1
𝐸,𝑂 = [𝑤 1 · · · 𝑤 𝑛 ] [𝑣 1 · · · 𝑣 𝑛 ].

In fact, columns of any invertible 𝑛 × 𝑛 matrix form a basis for F𝑛×1 . Therefore,
any invertible matrix is a change of basis matrix in this sense. It gives rise to the
following generalization.

(4.28) Theorem
Let 𝐴 ∈ F𝑚×𝑛 . Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐶 = {𝑤 1, . . . , 𝑤𝑚 } be ordered bases for F𝑛×1
and F𝑚×1, respectively. Then [𝐴]𝐶,𝐵 = 𝑄 −1𝐴𝑃, where

𝑄 = [𝑤 1 · · · 𝑤𝑚 ] and 𝑃 = [𝑣 1 · · · 𝑣𝑛 ].
102 MA2031 Classnotes

Proof. Take the standard bases for F𝑛×1 and F𝑚×1 as 𝐷 and 𝐸, respectively. Now,
[𝐼 ]𝐷,𝐵 = [𝑣 1 · · · 𝑣𝑛 ], [𝐼 ]𝐸,𝐶 = [𝑤 1 · · · 𝑤𝑚 ], and [𝐴]𝐸,𝐷 = 𝐴.
Then [𝐴]𝐶,𝐵 = [𝐼 ]𝐶,𝐸 [𝐴]𝐸,𝐷 [𝐼 ]𝐷,𝐵 = [𝑤 1 · · · 𝑤𝑚 ] −1 𝐴 [𝑣 1 · · · 𝑣𝑛 ].
As (4.28) shows, a matrix 𝐴 ∈ F𝑚×𝑛 and a matrix 𝑄 −1𝐴𝑃, where both 𝑄 ∈ F𝑚×𝑚
and 𝑃 ∈ F𝑛×𝑛 are invertible, represent the same linear transformation with respect
to ordered bases chosen in both the domain and the co-domain spaces. Note that a
matrix 𝐴 ∈ F𝑚×𝑛 is equal to [𝐴]𝐸 0,𝐸 , where 𝐸 is the standard basis for F𝑛×1 and 𝐸 0 is
the standard basis for F𝑚×1 .

(4.29) Example
 
1 2 3
Consider the matrix 𝐴 = as a linear transformation from R3×1 to R2×1 .
0 1 1
It maps
𝑎   
  𝑎 + 2𝑏 + 3𝑐
𝑣 = 𝑏  to 𝐴𝑣 =
  for 𝑎, 𝑏, 𝑐 ∈ R.
𝑐  𝑏 +𝑐
 
Choose ordered bases
−1
 1
    0     
1 1

   

 1 for R3×1, for R2×1 .
 
𝐵 = 1 ,  1 , 𝐶= ,

 1
 
 1
 
−1  1 −1
     
Then
    1 −1 0
1 1 −1 1 1 1  
𝑄= , 𝑄 = , 𝑃 = 1 1 1 .
1 −1 2 1 −1 1 1 −1
 
We write 𝑣 = (𝑎, 𝑏, 𝑐) t as a linear combination of basis vectors from 𝐵, and
𝐴𝑣 = (𝑎 + 2𝑏 + 3𝑐, 𝑏 + 𝑐) t as a linear combination of basis vectors from 𝐶.

𝑎  1 −1  0
  1   1   1  
𝑏  = (𝑎 + 𝑏 + 𝑐) 1 + (−2𝑎 + 𝑏 + 𝑐)  0 + (−𝑎 + 2𝑏 − 𝑐)  1 .
  3   3   3  
𝑐  1  1 −1
       
     
𝑎 + 2𝑏 + 3𝑐 1 1 1 1
= 2 (𝑎 + 3𝑏 + 4𝑐) + 2 (𝑎 + 𝑏 + 2𝑐) .
𝑏 +𝑐 1 −1
Thus their coordinate vectors with respect to bases 𝐵 and 𝐶 are as follows:

 𝑎 +𝑏 +𝑐   
1   1 𝑎 + 3𝑏 + 4𝑐
[𝑣]𝐵 =  −2𝑎 + 𝑏 + 𝑐  , [𝐴𝑣]𝐶 = .
3 
−𝑎 + 2𝑏 − 𝑐 
 2 𝑎 + 2𝑏 + 2𝑐
 
Linear Transformations and Matrices 103
Then the matrix representation of 𝐴 with respect to the new bases is given by
       
−1 1 1 1 1 2 3 1 2 3 1 8 3 −1
[𝐴]𝐶,𝐵 = 𝑄 𝐴𝑃 = = .
2 1 −1 0 1 1 0 1 1 2 4 1 −1
As it should happen, we see that
   𝑎 +𝑏 +𝑐   
1 8 3 −1 1  𝑎 + 3𝑏 + 4𝑐
  1
[𝐴]𝐶,𝐵 [𝑣]𝐵 = 2 3 −2𝑎 + 𝑏 + 𝑐  = 2 = [𝐴𝑣]𝐶 .
4 1 −1 
−𝑎 + 2𝑏 − 𝑐  𝑎 + 2𝑏 + 2𝑐
 

Exercises for § 4.5


1. For R3×1, consider the ordered bases 𝑂 = {[1, 0, 1] t, [1, 1, 0] t, [0, 1, 1] t } and
𝑁 = {[1, −1, 1] t, [1, 1, −1] t, [−1, 1, 1] t }. Let 𝑇 be the linear operator on
R3×1 whose matrix representation with respect to the standard basis of R3×1
 1 1 1
 
is given by the matrix −1 0 1 . Find the matrix [𝑇 ]𝑁 ,𝑂 and verify that
 0 1 0
 
 1  1 2 3 3
     1
 
𝑇 2  = [𝑇 ]𝑁 ,𝑂 2 . Ans: [𝑇 ]𝑁 ,𝑂 =  2 1 3  .
     2  
 3  3 0 0 2
  𝑁  𝑂  
2. Consider the ordered bases 𝑂 = {𝑢 1, 𝑢 2 } and 𝑁 = {𝑣 1, 𝑣 2 } forF2×1, where
t t t t −1 0
𝑢 1 = (1, 1) , 𝑢 2 = (−1, 1) , 𝑣 1 = (2, 1) , and 𝑣 2 = (1, 0) . Let 𝐴 = .
0 1
(a) Compute 𝑄 = [𝐴]𝑂,𝑂 and 𝑅 = [𝐴]𝑁 ,𝑁 .
(b) Find the change of basis matrix 𝑃 = [𝐼 ]𝑁 ,𝑂 .
(c) Compute 𝑆 = 𝑃𝑄𝑃 −1 .
(d) Is it true that 𝑅 = 𝑆? Why?
(e) If 𝑆 = [𝑠𝑖 𝑗 ], verify that 𝐴𝑣 1 = 𝑠 11𝑣 1 + 𝑠 21𝑣 2, 𝐴𝑣 2 = 𝑠 12𝑣 1 + 𝑠 22𝑣 2 .
     
0 1 1 0 1 1
Ans: (a) 𝑄 = ,𝑅 = . (b) 𝑃 = .
1 0 −4 −1 −1 −3
 
1 0
(c) 𝑆 = . (d) Yes.
−4 −1
3. Construct a matrix 𝐴 ∈ R2×2, a vector 𝑣 ∈ R2×1, and a basis 𝐵 = {𝑢 1, 𝑢 2 } for
R2×1 satisfying
  [𝐴𝑣]𝐵 ≠𝐴[𝑣]𝐵 .     
1 0 1 1 1
Ans: 𝐴 = ,𝑣 = ,𝐵= , .
1 1 2 0 1
4. Let 𝑉 and 𝑊 be vector spaces with bases {𝑣 1, . . . , 𝑣𝑛 } and {𝑤 1, . . . , 𝑤𝑚 },
respectively. For 𝑖 ∈ {1, . . . , 𝑛} and 𝑗 ∈ {1, . . . , 𝑚}, define 𝑇𝑖 𝑗 : 𝑉 → 𝑊 by
𝑇𝑖 𝑗 (𝑣𝑖 ) = 𝑤 𝑗 , and 𝑇𝑖 𝑗 (𝑣𝑘 ) = 0 for 𝑘 ≠ 𝑖. Then show that the set of all these 𝑇𝑖 𝑗
is a basis for L (𝑉 ,𝑊 ).
104 MA2031 Classnotes

5. Let 𝐵 = {𝑢 1, . . . , 𝑢𝑛 } and 𝐸 = {𝑣 1, . . . , 𝑣𝑚 } be ordered bases of 𝑉 and 𝑊 ,


respectively. Let 𝑇 ∈ L (𝑉 ,𝑊 ). Show the following:
(a) 𝑇 is one-one iff columns of [𝑇 ]𝐸,𝐵 are linearly independent.
(b) 𝑇 is onto iff columns of [𝑇 ]𝐸,𝐵 span F𝑚×1 .
6. Let 𝑉 and 𝑊 be vector spaces of dimensions 𝑛 and 𝑚, with ordered bases 𝐵
and 𝐸, respectively. Prove that the map 𝑇 ↦→ [𝑇 ]𝐸,𝐵 is an isomorphism from
L (𝑉 ,𝑊 ) to F𝑚×𝑛 .
7. Let 𝐵 = {𝑢 1, . . . , 𝑢𝑛 } and 𝐸 = {𝑣 1, . . . , 𝑣𝑚 } be ordered bases of 𝑉 and 𝑊 ,
respectively. Let {𝑀𝑖 𝑗 : 𝑖 = 1 . . . , 𝑚; 𝑗 = 1, . . . , 𝑛} be an ordered basis of
F𝑚×𝑛 . Let 𝑇𝑖 𝑗 ∈ L (𝑉 ,𝑊 ) be such that [𝑇𝑖 𝑗 ]𝐸,𝐵 = 𝑀𝑖 𝑗 . Show that a basis for
L (𝑉 ,𝑊 ) is given by {𝑇𝑖 𝑗 : 𝑖 = 1 . . . , 𝑚; 𝑗 = 1, . . . , 𝑛}.
8. Let {𝑢 1, . . . , 𝑢𝑛 } be an ordered basis of an ips 𝑉 . Show the following:
(a) The matrix [𝑎𝑖 𝑗 ], where 𝑎𝑖 𝑗 = h𝑢𝑖 , 𝑢 𝑗 i, is invertible. [This matrix is called
the Gram matrix with respect to the given ordered basis.]
(b) If (𝛼 1, . . . 𝛼𝑛 ) ∈ F𝑛 , then there is exactly one vector 𝑥 ∈ 𝑉 such that
h𝑥, 𝑢 𝑗 i = 𝛼 𝑗 , for 𝑗 = 1, . . . , 𝑛.
9. Let 𝑇 be a linear operator on a finite dimensional vector space 𝑉 . Let 𝐵 and
𝐶 be ordered bases for 𝑉 . Show that tr([𝑇 ]𝐶,𝐶 ) = tr([𝑇 ]𝐵,𝐵 ) and det([𝑇 ]𝐶,𝐶 ) =
det([𝑇 ]𝐵,𝐵 ). Thus, define tr(𝑇 ) and det(𝑇 ).
10. Let {𝑢 1, . . . , 𝑢𝑛 } be an orthonormal basis of an ips 𝑉 . Show that for any 𝑥, 𝑦 ∈
Í
𝑉 , h𝑥, 𝑦i = 𝑛𝑘=1 h𝑥, 𝑢𝑘 ih𝑢𝑘 , 𝑦i. Then deduce that there exists an isometric
linear transformation 𝑇 from 𝑉 to F𝑛 .

4.6 Equivalence
The effect of change of bases in vector spaces on the matrix representation of a
linear transformation leads to the following relation on matrices.
Let 𝐴, 𝐵 ∈ F𝑚×𝑛 . We say that 𝐵 is equivalent to 𝐴 iff there exist invertible matrices
𝑃 ∈ F𝑛×𝑛 and 𝑄 ∈ F𝑚×𝑚 such that 𝐵 = 𝑄 −1𝐴𝑃 .
As it happens, a change of bases in both the domain and co-domain spaces
bring up an equivalent matrix that represents the original matrix viewed as a linear
transformation.
It is easy to see that on F𝑚×𝑛 , the relation ‘is equivalent to’ is an equivalence
relation. Observe that 𝐴 and 𝐵 are equivalent iff there exist invertible matrices 𝑃
and 𝑄 of appropriate order such that 𝐵 = 𝑄𝐴𝑃 . There is an easy characterization of
equivalence of two matrices.
Linear Transformations and Matrices 105
(4.30) Theorem (Rank Theorem)
Two matrices of the same size are equivalent iff they have the same rank.

Proof. Let 𝐴 and 𝐵 be 𝑚×𝑛 matrices. We view them as linear transformations from
F𝑛×1 to F𝑚×1 . Observe that isomorphisms on F𝑘×1 are simply invertible matrices.
Now, using (3.17-3.18) we obtain the following:
𝐴 and 𝐵 are equivalent
iff there exist invertible matrices 𝑃 ∈ F𝑛×𝑛 , 𝑄 ∈ F𝑚×𝑚 such that 𝐵 = 𝑄𝐴𝑃
iff there exist isomorphisms 𝑃 on F𝑛×𝑛 , and 𝑄 on F𝑚×1 such that 𝐵 = 𝑄𝐴𝑃
iff rank(𝐵) = rank(𝐴).
It is easy to construct an 𝑚 × 𝑛 matrix of rank 𝑟 . For instance
 
𝐼𝑟 0
𝐸𝑟 = ∈ F𝑚×𝑛
0 0

is such a matrix, where 𝐼𝑟 is the identity matrix of order 𝑟, and the other zero matrices
are of appropriate size. We thus obtain the following result as a corollary to the
Rank theorem.

(4.31) Theorem (Rank factorization)


Let 𝐴 ∈ F𝑚×𝑛 . Then rank(𝐴) = 𝑟 iff 𝐴 is equivalent to 𝐸𝑟 ∈ F𝑚×𝑛 .

The RREF conversion can be used to construct the matrices 𝑃 and 𝑄 from a given
matrix 𝐴 so that 𝑄 −1𝐴𝑃 = 𝐸𝑟 .
Now, taking transpose, we have 𝐴t = 𝑃 t 𝐸𝑟t (𝑄 −1 ) t . That is, 𝐴t is equivalent to 𝐸𝑟t .
However, 𝐸𝑟t has rank 𝑟 . It thus follows that rank(𝐴t ) = rank(𝐴) = 𝑟 .
We know that the 𝑖th column of 𝐴 is equal to 𝐴(𝑒𝑖 ). Thus, 𝑅(𝐴) is the subspace of
F𝑚×1 spanned by the columns of 𝐴. Then rank(𝐴) is same as the maximum number
of linearly independent columns of 𝐴. Then rank(𝐴t ) is same as the maximum
number of linearly independent rows of 𝐴. We have thus shown that these two
numbers are equal. This fact is expressed by asserting that the column rank of a
matrix and the row rank of a matrix are equal. Of course, this also follows from
(3.24).
Further, if 𝐴 is a square matrix, then rank(𝐴t ) = rank(𝐴) implies that 𝐴t and 𝐴
are equivalent matrices.
The rank factorization yields another related factorization. For a matrix 𝐴 ∈ F𝑚×𝑛
of rank 𝑟, we have invertible matrices 𝑃 ∈ F𝑛×𝑛 and 𝑄 ∈ F𝑚×𝑚 such that 𝐴 = 𝑄𝐸𝑟 𝑃 −1 .
However, 𝐸𝑟 ∈ F𝑚×𝑛 can be written as
     
𝐼𝑟 0 𝐼𝑟  𝐼
with 𝑟 ∈ F𝑚×𝑟 , 𝐼𝑟 0 ∈ F𝑟 ×𝑛 .
  
𝐸𝑟 = = 𝐼𝑟 0
0 0 0 0
106 MA2031 Classnotes

Taking  
𝐼
0 𝑃 −1,
 
𝐵 =𝑄 𝑟 , 𝐶 = 𝐼𝑟
0
we see that rank(𝐵) = 𝑟 = rank(𝐶). We have thus have proved the following theorem.

(4.32) Theorem (Full rank factorization)


Let 𝐴 ∈ F𝑚×𝑛 be of rank 𝑟 . Then there exist rank 𝑟 matrices 𝐵 ∈ F𝑚×𝑟 and 𝐶 ∈ F𝑟 ×𝑛
such that 𝐴 = 𝐵𝐶.

A particular case of equivalence is obtained for square matrices. It will be


conceptually rewarding to look at the matrix representation of a linear operator on
a finite dimensional vector space when a change of basis occurs. Let 𝑇 : 𝑉 → 𝑉 be
a linear operator, where dim (𝑉 ) = 𝑛. Consider two ordered bases 𝐶 and 𝐸 for 𝑉 .
We have two matrix representations of 𝑇 with respect to these bases, namely, [𝑇 ]𝐶,𝐶
and [𝑇 ]𝐸,𝐸 . What is the relation between these two matrices?
By the change of basis formula, we see that [𝑇 ]𝐸,𝐸 = [𝐼 ]𝐸,𝐶 [𝑇 ]𝐶,𝐶 [𝐼 ]𝐶,𝐸 . Let
𝐴 = [𝑇 ]𝐶,𝐶 , 𝐵 = [𝑇 ]𝐸,𝐸 , and let 𝑃 = [𝐼 ]𝐶,𝐸 . Then 𝑃 is invertible and 𝑃 −1 = [𝐼 ]𝐸,𝐶 .
Therefore,
𝐵 = 𝑃 −1𝐴𝑃 .
Conversely, for matrices 𝐴, 𝐵 ∈ F𝑛×𝑛 , if there exists an invertible matrix 𝑃 such that
𝐵 = 𝑃 −1𝐴𝑃, then the columns of 𝑃 form a basis of F𝑛×1 . Take 𝐸 as the standard basis
of F𝑛×1 . Look at the matrix 𝐴 as the linear operator 𝐴 on F𝑛×1 . Then [𝐴]𝐸,𝐸 = 𝐴 and
[𝐴]𝐶,𝐶 = 𝐵.
This leads to the notion of similarity of two matrices. Let 𝐴, 𝐵 ∈ F𝑛×𝑛 . We say
that 𝐴 is similar to 𝐵 iff 𝐵 = 𝑃 −1𝐴𝑃 for some invertible matrix 𝑃 ∈ F𝑛×𝑛 .
When a change of basis occurs in F𝑛×1, a matrix in F𝑛×𝑛 is represented as a matrix
similar to itself. We thus say that similar matrices represent the same linear operator.
It is obvious that similarity is an equivalence relation on F𝑛×𝑛 .
Though equivalence is easy to characterize, similarity involves much more. We
will address this issue in Chapter 5.

Exercises for § 4.6


1. Show that for any matrix 𝐴 ∈ F𝑚×𝑛 , rank(𝐴∗ ) = rank(𝐴).
2. Derive the Rank theorem for matrices from the Rank factorization of a matrix.
3. Let 𝐴 ∈ F𝑚×𝑛 and let 𝐵 ∈ F𝑛×𝑘 . Show the following:
(a) 𝑅(𝐴𝐵) is a subspace of 𝑅(𝐴).
(b) rank(𝐴𝐵) ≤ rank(𝐴) and rank(𝐴𝐵) ≤ rank(𝐵).
4. Let 𝐴, 𝐵 ∈ F𝑚×𝑛 . Show that rank(𝐴 + 𝐵) ≤ rank(𝐴) + rank(𝐵).
Linear Transformations and Matrices 107
5. Let 𝐴 = 𝐵𝐶 be a full rank factorization of an 𝑚 × 𝑛 matrix 𝐴. Prove the
following:
(a) The columns of 𝐵 form a basis for 𝑅(𝐴).
(b) The rows of 𝐶 𝑡 form a basis for the space spanned by the rows of 𝐴.
6. Let 𝐴 = 𝐵𝐶 1 and 𝐴 = 𝐵𝐶 2 be two full rank factorizations of an 𝑚 × 𝑛 matrix
𝐴. Show that 𝐶 1 = 𝐶 2 .
7. Show that if 𝐴 = 𝐵𝐶 is a full rank factorization of 𝐴, and 𝐷 is an invertible
matrix of order rank(𝐴), then 𝐴 = (𝐵𝐷)(𝐷 −1𝐶) is also a full rank factorization.
8. Let 𝐴 = 𝐵 1𝐶 1 and 𝐴 = 𝐵 2𝐶 2 be two full rank factorizations of 𝐴. Show that
there exists an invertible matrix 𝐷 of appropriate order such that 𝐵 2 = 𝐵 1 𝐷
and 𝐶 2 = 𝐷 −1𝐶 1 .
5
Spectral Representation

5.1 Eigenvalues and eigenvectors


 
0 1
Let 𝐴 = . As a linear operator on R2×1, it transforms straight lines to straight
1 0
lines. Find a straight line that is mapped to itself by 𝐴. We see that
      
𝑎 0 1 𝑎 𝑏
𝐴 = = .
𝑏 1 0 𝑏 𝑎

Thus, the line {(𝑎, 𝑎) : 𝑎 ∈ R} never moves. So also the line {(𝑎, −𝑎) : 𝑎 ∈ R}.
Observe that
       
𝑎 𝑎 𝑎 𝑎
𝐴 =1 , 𝐴 = (−1) .
𝑎 𝑎 −𝑎 −𝑎

In general, if a straight line remains invariant under a linear operator 𝑇 , then the
image of any point on the straight line must be a point on the same straight line.
That is, 𝑇 (𝑥) must be a scalar multiple of 𝑥 . Since we are interested in fixing a
straight line, such a vector 𝑥 should be a nonzero vector.
Let 𝑇 be a linear operator on a vector space 𝑉 over F. A scalar 𝜆 ∈ F is called an
eigenvalue of 𝑇 iff there exists a nonzero vector 𝑣 ∈ 𝑉 such that 𝑇 𝑣 = 𝜆𝑣. Such a
vector 𝑣 is called an eigenvector of 𝑇 for (or associated with, or corresponding to)
the eigenvalue 𝜆.
Convention: Since eigenvectors are nonzero vectors, whenever we discuss eigen-
values of a linear operator on a vector space, we assume that the vector space is a
nonzero vector space.

(5.1) Example
1. Let 𝑇 be the linear operator on R2 given by 𝑇 (𝑎, 𝑏) = (𝑎, 𝑎 + 𝑏). We have
𝑇 (0, 1) = (0, 0 + 1) = 1 (0, 1). Thus the vector (0, 1) is an eigenvector associated
with the eigenvalue 1 of 𝑇 . Is (0, 𝑏) also an eigenvector associated with the same
eigenvalue 1?

108
Spectral Representation 109
2. Let the linear operator 𝑇 on R2 be given by 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎). If 𝜆 is an eigenvalue
of 𝑇 with an eigenvector (𝑎, 𝑏), then (−𝑏, 𝑎) = (𝜆𝑎, 𝜆𝑏). It implies that 𝑏 = −𝜆𝑎
and 𝑎 = 𝜆𝑏. It gives 𝑏 = −𝜆 2𝑏, or 𝑏 (1 + 𝜆 2 ) = 0. Since R2 is a real vector space,
𝜆 ∈ R. Then 1 + 𝜆 2 ≠ 0. Hence 𝑏 = 0. This leads to 𝑎 = 0. That is, (𝑎, 𝑏) = (0, 0).
But an eigenvector is nonzero! Therefore, 𝑇 does not have an eigenvalue.
3. Let 𝑇 : C2 → C2 be given by 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎). As in (2), if 𝜆 is an eigenvalue of
𝑇 with an eigenvector (𝑎, 𝑏), then 𝑏 (1 + 𝜆 2 ) = 0 and 𝑏 = −𝜆𝑎. If 𝑏 = 0, then 𝑎 = 0,
which is not possible as (𝑎, 𝑏) ≠ (0, 0). Thus 1 + 𝜆 2 = 0. Hence 𝜆 = ±𝑖. It is easy
to verify that the eigenvalue 𝜆 = 𝑖 is associated with an eigenvector (1, −𝑖) and
the eigenvalue 𝜆 = −𝑖 is associated with an eigenvector (1, 𝑖).
4. The linear operator 𝑇 : F[𝑡] → F[𝑡] defined by 𝑇 (𝑝 (𝑡)) = 𝑡𝑝 (𝑡) has no eigen-
vector and no eigenvalue, since for a polynomial 𝑝 (𝑡), 𝑡𝑝 (𝑡) ≠ 𝛼𝑝 (𝑡) for any
𝛼 ∈ F.
5. Let 𝑇 : R[𝑡] → R[𝑡] be defined by 𝑇 (𝑝 (𝑡)) = 𝑝 0 (𝑡), where we interpret each 𝑝 ∈
R[𝑡] as a function from the open interval (0, 1) to R. Since derivative of a constant
polynomial is 0, which equals 0 times the constant polynomial, all nonzero
constant polynomials are eigenvectors of 𝑇 associated with the eigenvalue 0.

(5.2) Theorem
Let 𝑇 : 𝑉 → 𝑉 be a linear operator. Let 𝜆 be a scalar. Then the following are true:
(1) A nonzero vector 𝑣 ∈ 𝑉 is an eigenvector of 𝑇 for the eigenvalue 𝜆 iff
𝑣 ∈ 𝑁 (𝑇 − 𝜆𝐼 ).
(2) 𝜆 is an eigenvalue of 𝑇 iff 𝑇 − 𝜆𝐼 is not one-one.

Proof. (1) A nonzero vector 𝑣 is an eigenvector of 𝑇 for the eigenvalue 𝜆 iff 𝑇 𝑣 = 𝜆𝑣


iff (𝑇 − 𝜆𝐼 )𝑣 = 0 iff 𝑣 ∈ 𝑁 (𝑇 − 𝜆𝐼 ).
(2) If 𝜆 is an eigenvalue of 𝑇 then there exists a nonzero vector 𝑣 ∈ 𝑉 such that
𝑣 ∈ 𝑁 (𝑇 − 𝜆𝐼 ). Thus 𝑇 − 𝜆𝐼 is not one-one. Conversely, if 𝑇 − 𝜆𝐼 is not one-one,
then there exist distinct vectors 𝑢, 𝑤 ∈ 𝑉 such that (𝑇 − 𝜆𝐼 )(𝑢) = (𝑇 − 𝜆𝐼 )𝑤 . Then
𝑇 (𝑢 − 𝑤) = 𝜆(𝑢 − 𝑤), where 𝑢 − 𝑤 ≠ 0. It follows that 𝜆 is an eigenvalue of 𝑇 with
an eigenvector as 𝑢 − 𝑤 .
The results on eigenvalues of linear operators are applicable to square matrices.

(5.3) Theorem
Let 𝑇 be a liner operator on a finite dimensional vector space 𝑉 over F. Let 𝐵 be an
ordered basis of 𝑉 . Then, 𝜆 ∈ F is an eigenvalue of 𝑇 with an associated eigenvector
𝑣 iff 𝜆 is an eigenvalue of the matrix [𝑇 ]𝐵,𝐵 ∈ F𝑛×𝑛 with an associated eigenvector
[𝑣]𝐵 ∈ F𝑛×1 .
110 MA2031 Classnotes

Proof. Let 𝜆 ∈ F be an eigenvalue of 𝑇 . There exists a nonzero vector 𝑣 ∈ 𝑉 such


that 𝑇 𝑣 = 𝜆𝑣. Then [𝑣]𝐵 ≠ 0 and [𝑇 ]𝐵,𝐵 [𝑣]𝐵 = [𝑇 𝑣]𝐵 = [𝜆𝑣]𝐵 = 𝜆[𝑣]𝐵 .
Conversely, let 𝜆 ∈ F be an eigenvalue of [𝑇 ]𝐵,𝐵 with an associated eigenvector
𝑢 = (𝑎 1, . . . , 𝑎𝑛 ) t ∈ F𝑛×1 . Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 }. Write 𝑣 = 𝑎 1𝑣 1 + · · · + 𝑎𝑛 𝑣𝑛 . Then
𝑢 = [𝑣]𝐵 . Now, 𝑢 ≠ 0 implies 𝑣 ≠ 0. Further, [𝜆𝑣]𝐵 = 𝜆𝑢 = [𝑇 ]𝐵,𝐵𝑢 = [𝑇 ]𝐵,𝐵 [𝑣]𝐵 =
[𝑇 𝑣]𝐵 . Thus 𝑇 𝑣 = 𝜆𝑣.
As (5.3) shows eigenvalues of a linear operator on a finite dimensional vector
space and that of its matrix representation coincide. It allows us to go back and
forth from a linear operator to its matrix representation with respect to any basis
while considering problems about its eigenvalues and eigenvectors.

Exercises for § 5.1


1. Show that the eigenvalues of a triangular matrix (upper or lower) are the
entries on the diagonal.
2. Let 𝐴 be an 𝑛 × 𝑛 matrix and let 𝛼 be a scalar such that each row (or each
column) sums to 𝛼 . Show that 𝛼 is an eigenvalue of 𝐴.
3. Let 𝜆 be an eigenvalue of a linear operator 𝑇 on a finite dimensional vector
space 𝑉 over F. Prove (a)-(c), and answer (d):
(a) 𝜆𝑘 is an eigenvalue of 𝑇 𝑘 .
(b) If 𝛼 ∈ F, then 𝜆 + 𝛼 is an eigenvalue of 𝑇 + 𝛼𝐼 .
(c) Let 𝑝 (𝑡) = 𝑎 0 + 𝑎 1𝑡 + . . . + 𝑎𝑘 𝑡 𝑘 ∈ F[𝑡]. Then 𝑝 (𝜆) is an eigenvalue of
𝑝 (𝑇 ) := 𝑎 0𝐼 + 𝑎 1𝑇 + . . . + 𝑎𝑘𝑇 𝑘 .
(d) Are all the eigenvalues of 𝑝 (𝑇 ) of the form 𝑝 (𝜆)?
Ans: (d) If F = C, then yes.
4. Let 𝑇 : 𝑉 → 𝑉 be an isomorphism, where 𝑉 is a finite dimensional vector
space. Let 𝜆 be a nonzero scalar. Show that 𝜆 is an eigenvalue of 𝑇 iff 1/𝜆 is
an eigenvalue of 𝑇 −1 .
5. Let 𝑆 and 𝑇 be linear operators on 𝑉 . Let 𝜆 and 𝜇 be eigenvalues of 𝑆 and 𝑇 ,
respectively. What is wrong with the following argument?
𝜆𝜇 an eigenvalue of 𝑆 ◦ 𝑇 because, if 𝑆 (𝑥) = 𝜆𝑥 and 𝑇 (𝑥) = 𝜇𝑥, then
(𝑆 ◦ 𝑇 )𝑥 = 𝑆 (𝜇𝑥) = 𝜇𝑆 (𝑥) = 𝜇𝜆𝑥 = 𝜆𝜇𝑥 .
6. Can any nonzero vector in any non-trivial vector space be an eigenvector of
some linear operator? Ans: Yes.
7. Given a scalar 𝜆, can any nonzero vector in any non-trivial vector space be an
eigenvector associated with the eigenvalue 𝜆 of some linear operator?
Ans: Yes.
Spectral Representation 111
8. Construct 𝐴, 𝐵 ∈ R2×2 such that 𝜆 is an eigenvalue of 𝐴, 𝜇 is an eigenvalue of
𝐵 but 𝜆𝜇 is not an eigenvalue
  of 𝐴𝐵.
1 1 1 0
Ans: 𝐴 = , 𝐵= .
1 0 1 1

5.2 Characteristic polynomial


Eigenvalues of a matrix can be seen as zeros of a certain polynomial. Due to (5.3),
the same should be true for a linear operator.

(5.4) Theorem
Let 𝑇 be a linear operator on a finite dimensional vector space 𝑉 . Let 𝐵 be an
ordered basis of 𝑉 , and let 𝐴 = [𝑇 ]𝐵,𝐵 be the matrix representation of 𝑇 with
respect to 𝐵. Then a scalar 𝜆 is an eigenvalue of 𝑇 iff det(𝐴 − 𝜆𝐼 ) = 0.

Proof. Let dim 𝑉 = 𝑛. With respect to the basis 𝐵, the matrix of 𝑇 − 𝜆𝐼 is 𝐴 − 𝜆𝐼 .


Now, 𝑇 −𝜆𝐼 is not one-one iff (𝐴−𝜆𝐼 )𝑥 = 0 has a nonzero solution iff det(𝐴−𝜆𝐼 ) = 0,
due to (4.3-3).
In (5.4), it looks as though the equation det([𝑇 ]𝐵,𝐵 − 𝜆𝐼 ) = 0 is not affected by the
choice of a basis 𝐵 for the vector space. In fact, if we choose another ordered basis,
then the matrix of 𝑇 with respect to the new basis can be written as 𝑃 −1 [𝑇 ]𝐵,𝐵 𝑃 for
some invertible matrix 𝑃; see (4.28). In that case,

det(𝑃 −1 [𝑇 ] 𝐵,𝐵 𝑃 − 𝜆𝐼 ) = det(𝑃 −1 ([𝑇 ] 𝐵,𝐵 − 𝜆𝐼 )𝑃) = det([𝑇 ]𝐵,𝐵 − 𝜆𝐼 ).

Therefore, det([𝑇 ]𝐵,𝐵 − 𝜆𝐼 ) as a polynomial in 𝜆, is independent of the particular


choice of the ordered basis 𝐵.
Further, if 𝐴 is an 𝑛 × 𝑛 matrix, then det(𝐴 − 𝑡𝐼 ) is a polynomial in 𝑡 with its
leading term as (−1)𝑛 𝑡 𝑛 . To make it a monic polynomial, that is, when the leading
term has the coefficient as 1, we multiply it with (−1)𝑛 . We know that such a monic
polynomial is not affected by a change of basis. Thus we give a name to it.
Let 𝑇 be a linear operator on a vector space 𝑉 of dimension 𝑛. Let 𝐴 be a
matrix representation of 𝑇 with respect to some ordered basis of 𝑉 . The polynomial
(−1)𝑛 det(𝐴 − 𝑡𝐼 ) in the variable 𝑡 is called the characteristic polynomial of 𝑇 ; and
it is denoted by 𝜒𝑇 (𝑡).
The zeros of the characteristic polynomial 𝜒𝑇 (𝑡) are called the characteristic
values of the operator 𝑇 .
Due to our convention in this chapter, a vector space of dimension 𝑛 necessarily
implies that 𝑛 ≥ 1.
112 MA2031 Classnotes

Since 𝑉 is a finite dimensional vector space over the field F, where F is either R
or C, the coefficients of powers of 𝑡 in 𝜒𝑇 (𝑡) are complex numbers, in general. Then
all characteristic values of 𝑇 are in C. This fact is a consequence of the fundamental
theorem of algebra, which states the following:
Each polynomial of degree 𝑛 with complex coefficients has 𝑛 number
of complex zeros, counting multiplicities.
From (5.4) it follows that the eigenvalues of 𝑇 are precisely the characteristic values
that lie in the underlying field. Explicitly, each eigenvalue of 𝑇 is its characteristic
value; if 𝑉 is a complex vector space, then each characteristic value is an eigenvalue;
and if 𝑉 is a real vector space, then all and only the real characteristic values are
eigenvalues of 𝑇 .

(5.5) Example

1. In (5.1-1), we had 𝑇 : R2 → R2 given by 𝑇 (𝑎, 𝑏) = (𝑎, 𝑎 + 𝑏). Let us take the


standard basis 𝐸 = {𝑒 1, 𝑒 2 } for R2 . Then
 
1 0 1−𝑡 0
[𝑇 ]𝐸,𝐸 = , 𝜒𝑇 (𝑡) = (−1) 2 = (𝑡 − 1) 2 .
1 1 1 1−𝑡
The eigenvalues of 𝑇 are the zeros of 𝜒𝑇 (𝑡) which are scalars in the underlying
field. That is, 1 is the only eigenvalue of 𝑇 . Solving 𝑇 (𝑎, 𝑏) = 1 (𝑎, 𝑏), we get
eigenvectors (0, 𝑏) for 𝑏 ≠ 0.
2. For the linear operator 𝑇 : R2 → R2 given by 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎), in (5.1-2), fix
the standard basis 𝐸 of R2 . Then
 
0 −1 −𝑡 −1
[𝑇 ]𝐸,𝐸 = , 𝜒𝑇 (𝑡) = (−1) 2 = 𝑡 2 + 1.
1 0 1 −𝑡

If 𝜆 is an eigenvalue of 𝑇 , then 𝜆 ∈ R and 𝜆 2 + 1 = 0. But 𝜆 2 + 1 ≠ 0 for any


𝜆 ∈ R. Therefore, 𝑇 does not have an eigenvalue.
3. The rotation by an angle 𝜃 on the plane R2 is given by the linear operator

𝑇𝜃 : R2 → R2, 𝑇𝜃 (𝑎, 𝑏) = (𝑎 cos 𝜃 − 𝑏 sin 𝜃, 𝑎 sin 𝜃 + 𝑏 cos 𝜃 ).

Taking the standard basis 𝐵 for R2, we have


 
cos 𝜃 − sin 𝜃
[𝑇𝜃 ]𝐵,𝐵 = .
sin 𝜃 cos 𝜃

Then 𝜒𝑇 (𝑡) = (𝑡 − cos 𝜃 ) 2 + sin2 𝜃 . Its zeros are cos 𝜃 ± 𝑖 sin 𝜃 . We find that if 𝜃
is not a multiple of 𝜋, then all these zeros are non-real. In this case, 𝑇𝜃 does not
have an eigenvalue.
Spectral Representation 113
Indeed, if 𝜃 is not a multiple of 𝜋, the rotation does not fix any straight line
through the origin.

Given any monic polynomial 𝑝 (𝑡), there exists a matrix 𝐶 such that 𝜒𝐶 (𝑡) = 𝑝 (𝑡).
See the following example.

(5.6) Example
0 0 · · · 0 −𝑎 0 
 
1 0 · · · 0 −𝑎 
 1 
Let 𝐶 =  ..  for scalars 𝑎 0, 𝑎 1, . . . , 𝑎𝑘 and 𝑘 ≥ 2.
. .
. 

 . 

0 0 · · · 1 −𝑎𝑘−1 
 
We show by induction on 𝑘 that

det(𝐶 − 𝑡𝐼 ) = (−1)𝑘 (𝑡 𝑘 + 𝑎𝑘−1𝑡 𝑘−1 + · · · + 𝑎 1𝑡 + 𝑎 0 ).

In the basis case, for 𝑘 = 2, and any scalars 𝑎 0, 𝑎 1, we have

−𝑡 −𝑎 0
= (−1) 2 (𝑡 2 + 𝑎 1𝑡 + 𝑎 0 ).
1 −𝑎 1 − 𝑡

So, assume the induction hypothesis that for any scalars 𝑎 0, 𝑎 1, . . . , 𝑎𝑚−1,

−𝑡 0 · · · 0 −𝑎 0
1 −𝑡 · · · 0 −𝑎 1
.. .. = (−1)𝑚 (𝑡 𝑚 + 𝑎𝑚−1𝑡 𝑚−1 + · · · + 𝑎 1𝑡 + 𝑎 0 ).
. .
0 0 · · · 1 −𝑎𝑚−1 − 𝑡

Then

−𝑡 0 · · · 0 −𝑎 0 −𝑡 0 · · · 0 −𝑎 1 1 −𝑡 · · · 0
1 −𝑡 · · · 0 −𝑎 1 1 −𝑡 · · · 0 −𝑎 2 0 1 ··· 0
𝑚+1
.. .. = −𝑡 . .. .. + (−1) 𝑎 0 ..
. . . .
0 0 · · · 1 −𝑎𝑚 − 𝑡 0 0 · · · 1 −𝑎𝑚 − 𝑡 0 0 ··· 1
= −𝑡 (−1)𝑚 (𝑡 𝑚 + 𝑎𝑚 𝑡 𝑚−1 + + · · · + 𝑎 2𝑡 + 𝑎 1 ) + (−1)𝑚+1𝑎 0
= (−1)𝑚+1 (𝑡 𝑚+1 + 𝑎𝑚 𝑡 𝑚 + · · · + 𝑎 1𝑡 + 𝑎 0 ).

This completes the induction proof. Therefore,

𝜒 (𝑡) = (−1)𝑘 det(𝐶 − 𝑡𝐼 ) = 𝑡 𝑘 + 𝑎𝑘−1𝑡 𝑘−1 + · · · + 𝑎 1𝑡 + 𝑎 0 .


𝐶
114 MA2031 Classnotes

Due to this reason, the matrix 𝐶 as given above is called the companion matrix of
the monic polynomial 𝑡 𝑘 + 𝑎𝑘−1𝑡 𝑘−1 + · · · + 𝑎 1𝑡 + 𝑎 0 .

The fundamental theorem of algebra implies that if 𝑝 (𝑡) is a polynomial with real
coefficients, then its complex zeros come in conjugate pairs. That is, for 𝛽 ≠ 0, if
𝜆 = 𝛼 + 𝑖𝛽 is a zero of such a polynomial 𝑝 (𝑡), then so is 𝜆 = 𝛼 − 𝑖𝛽. Further, it
implies that a polynomial of odd degree has a real zero. Consequently, each linear
operator on a finite dimensional complex vector space has an eigenvalue; and each
linear operator on an odd dimensional real vector space has an eigenvalue.
For matrices, we need to be a bit careful. Let 𝐴 be an 𝑛 × 𝑛 matrix. If at least
one entry of 𝐴 is a complex number with nonzero imaginary part, then 𝐴 ∈ C𝑛×𝑛 is
viewed as a linear operator on C𝑛×1 . In this case, the eigenvalues of 𝐴 are precisely
the characteristic values. On the other hand, if 𝐴 has only real entries, we may view
it as a linear operator on C𝑛×1 or on R𝑛×1 . As an operator on C𝑛×1, all characteristic
values of 𝐴 are its eigenvalues; and as an operator on R𝑛×1, only the real characteristic
values of 𝐴 are the eigenvalues.

Convention: In general, we view a matrix 𝐴 as an operator on C𝑛×1, the charac-


teristic values of 𝐴 will be referred to as complex eigenvalues of 𝐴.

In this terminology, a matrix in R𝑛×𝑛 can have complex eigenvalues. Notice that
an eigenvector corresponding to a complex eigenvalue of 𝐴 is a vector in C𝑛×1 .
For instance, in (5.1-3), the matrix [𝑇 ]𝐸,𝐸 has complex eigenvalues 𝑖 and −𝑖. The
corresponding eigenvectors are (1, −𝑖) t and (1, 𝑖) t, which are in C2×1 . Similarly,
the rotation matrix [𝑇𝜃 ]𝐵,𝐵 in (5.5-3) has complex eigenvalues cos 𝜃 ± 𝑖 sin 𝜃 with
eigenvectors [1, ∓𝑖] t .
Recall that a polynomial 𝑝 (𝑡) has 𝜆 as a zero of multiplicity 𝑚 means that (𝑡 − 𝜆)𝑚
divides the polynomial 𝑝 (𝑡) but (𝑡 − 𝜆)𝑚+1 does not divide 𝑝 (𝑡). Accordingly, if 𝑇 is
a linear operator on a finite dimensional vector space and 𝜆 is a characteristic value
of 𝑇 , where 𝜆 has multiplicity 𝑚 as a zero of 𝜒𝑇 (𝑡), then we say that the algebraic
multiplicity of the characteristic value 𝜆 of 𝑇 is 𝑚. The same terminology applies
when 𝜆 is an eigenvalue of 𝑇 or a (complex) eigenvalue of a square matrix 𝐴.
When we speak of all characteristic values of 𝑇 counting multiplicities, we
are concerned with the list of all characteristic values of 𝑇 , where each one is
repeated as many times as its algebraic multiplicity. Thus, a linear operator 𝑇
on an 𝑛-dimensional vector space has 𝑛 number of characteristic values, counting
multiplicities. Similarly, an 𝑛 × 𝑛 matrix has 𝑛 number of complex eigenvalues,
counting multiplicities. You should understand the results in the following theorem
in this sense.
Spectral Representation 115
(5.7) Theorem
(1) The diagonal entries of a triangular (upper or lower) and of a diagonal matrix
are its eigenvalues, counting multiplicities.
(2) A square matrix and its transpose have the same complex eigenvalues, count-
ing multiplicities.
(3) The determinant of a square matrix is the product of all its complex eigenval-
ues, counting multiplicities.
(4) The trace of a square matrix is the sum of all its complex eigenvalues, counting
multiplicities.

Proof. Let 𝐴 be a square matrix of order 𝑛. Let 𝜆1, . . . , 𝜆𝑛 be the 𝑛 complex


eigenvalues (characteristic values) of 𝐴, counting multiplicities. Then

𝜒 (𝑡) = (−1)𝑛 det(𝐴 − 𝑡𝐼 ) = (𝑡 − 𝜆1 ) · · · (𝑡 − 𝜆𝑛 ).


𝐴

(1) In all these cases, 𝜒𝐴 (𝑡) = (−1)𝑛 det(𝐴 − 𝑡𝐼 ) = (𝑡 − 𝑎 11 ) · · · (𝑡 − 𝑎𝑛𝑛 ).


(2) Now, 𝜒 t (𝑡) = det(𝐴t − 𝑡𝐼 ) = det((𝐴 − 𝑡𝐼 ) t ) = det(𝐴 − 𝑡𝐼 ) = 𝜒 (𝑡).
𝐴 𝐴
(3) Put 𝑡 = 0 in the expression for 𝜒𝐴 (𝑡) to get det(𝐴) = 𝜆1 · · · 𝜆𝑛 .
(4) Let 𝐴 = [𝑎𝑖 𝑗 ]. Expand det(𝐴 − 𝑡𝐼 ); and look at the coefficient of 𝑡 𝑛−1 in this
expansion. We obtain

Coeff. of 𝑡 𝑛−1 in det(𝐴 − 𝑡𝐼 ) = (−1)𝑛−1 (𝑎 11 + 𝑎 22 + · · · + 𝑎𝑛𝑛 ) = (−1)𝑛−1 tr(𝐴).

But det(𝐴 − 𝑡𝐼 ) = (−1)𝑛 𝜒𝐴 (𝑡) = (𝜆1 − 𝑡) · · · (𝜆𝑛 − 𝑡). So,

Coeff. of 𝑡 𝑛−1 in det(𝐴 − 𝑡𝐼 ) = (−1)𝑛−1 (𝜆1 + · · · + 𝜆𝑛 ).

Therefore, 𝜆1 + · · · + 𝜆𝑛 = 𝑡𝑟 (𝐴).
We say that a polynomial 𝑝 (𝑡) annihilates a linear operator 𝑇 iff 𝑝 (𝑇 ) = 0, the
zero operator. As usual, when 𝑝 (𝑡) annihilates 𝑇 , we also say that 𝑇 is annihilated
by 𝑝 (𝑡). The same terminology applies to square matrices.

(5.8) Theorem (Cayley-Hamilton)


Any linear operator on a finite dimensional nonzero vector space is annihilated by
its characteristic polynomial.

Proof. Let 𝑉 be a nonzero vector space. Let 𝑣 be a nonzero vector in 𝑉 . Consider


the vectors 𝑣, 𝑇 𝑣, 𝑇 2𝑣, . . . . This infinite list of vectors cannot be linearly independent
in 𝑉 , since 𝑉 is finite dimensional. Also {𝑣 } is linearly independent. So, let 𝑘 be
the smallest positive integer such that {𝑣, 𝑇 𝑣, . . . , 𝑇 𝑘 𝑣 } is linearly dependent. Then
116 MA2031 Classnotes

𝑆 := {𝑣, 𝑇 𝑣, . . . , 𝑇 𝑘−1𝑣 } is linearly independent; and


there exist scalars 𝑎 0, . . . , 𝑎𝑘−1 such that 𝑇 𝑘 𝑣 = 𝑎 0𝑣 + 𝑎 1𝑇 𝑣 + · · · + 𝑎𝑘−1𝑇 𝑘−1𝑣.

Extend 𝑆 to a basis 𝐵 = {𝑣, 𝑇 𝑣, . . . , 𝑇 𝑘−1𝑣, 𝑢 1, . . . , 𝑢𝑚 } for 𝑉 . Then compute the


𝑇 -images of the basis vectors and use the above equality to obtain

𝑇 𝑣 = 0 𝑣 + 1𝑇 𝑣 + 0𝑇 2𝑣 + · · · + 0𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
𝑇 (𝑇 𝑣) = 0 𝑣 + 0𝑇 𝑣 + 1𝑇 2𝑣 + · · · + 0𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
..
.
𝑇 (𝑇 𝑘−2𝑣) = 0 𝑣 + 0𝑇 𝑣 + 0𝑇 2𝑣 + · · · + 1𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
𝑇 (𝑇 𝑘−1𝑣) = 𝑎 0 𝑣 + 𝑎 1 𝑇 𝑣 + 𝑎 2 𝑇 2𝑣 + · · · + 𝑎𝑘−1 𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
𝑇 (𝑢 1 ) = 𝑏 11𝑣 + 𝑏 12𝑇 𝑣 + · · · + 𝑏 1𝑘𝑇 𝑘−1𝑣 + 𝑏 1(𝑘+1)𝑢 1 + · · · + 𝑏 1(𝑘+𝑚)𝑢𝑚
..
.
𝑇 (𝑢𝑚 ) = 𝑏𝑚1𝑣 + 𝑏𝑚2𝑇 𝑣 + · · · + 𝑏𝑚𝑘𝑇 𝑘−1𝑣 + 𝑏𝑚(𝑘+1)𝑢 1 + · · · + 𝑏𝑚(𝑘+𝑚)𝑢𝑚

where 𝑏𝑖 𝑗 are some scalars. Thus, the matrix representation of 𝑇 with respect to the
ordered basis 𝐵 is in the form
0 0 · · · 0 𝑎 0 
 
  1 0 · · · 0 𝑎 1 
𝐶 𝐴
𝑀 := [𝑇 ]𝐵,𝐵 = where 𝐶 = 
 
, .. .
0 𝐷  . 
 
0 0 · · · 1 𝑎𝑘−1 
 

Writing 𝑝 (𝑡) = (−1)𝑘 det(𝐶 − 𝑡𝐼 ) and 𝑞(𝑡) = (−1)𝑚 det(𝐷 − 𝑡𝐼 ), we have

𝜒 (𝑡) = 𝜒 (𝑡) = 𝑞(𝑡) 𝑝 (𝑡).


𝑇 𝑀

Using the result in (5.6), we see that

𝑝 (𝑡) = (−1)𝑘 det(𝐶 − 𝑡𝐼 ) = 𝑡 𝑘 − 𝑎𝑘−1𝑡 𝑘−1 − · · · − 𝑎 1𝑡 − 𝑎 0 .

Since 𝑇 𝑘 𝑣 = 𝑎 0𝑣 + 𝑎 1𝑇 𝑣 + · · · + 𝑎𝑘−1𝑇 𝑘−1𝑣, 𝑝 (𝑇 )𝑣 = 0. So, 𝜒𝑇 (𝑇 )𝑣 = 0. Notice that


the polynomial 𝑝 (𝑡) depends (possibly) on the nonzero vector 𝑣. However, it is a
factor of the characteristic polynomial 𝜒𝑇 (𝑡), and 𝜒𝑇 (𝑡) does not depend on 𝑣. We
thus conclude that 𝜒𝑇 (𝑇 )𝑣 = 0 for each nonzero vector 𝑣 ∈ 𝑉 . Therefore, 𝜒𝑇 (𝑇 ) = 0,
the zero operator.
The characteristic polynomial is not the only polynomial that annihilates the
operator. For example, the identity operator 𝐼 : 𝑉 → 𝑉 , defined by 𝐼 (𝑣) = 𝑣, on
Spectral Representation 117
an 𝑛-dimensional vector space 𝑉 has the characteristic polynomial 𝑝 (𝑡) = (𝑡 − 1)𝑛 .
But 𝐼 is annihilated by the polynomial 𝑞(𝑡) = 𝑡 − 1, which is not the characteristic
polynomial in case 𝑛 > 1. In this regard, the following result is often helpful.

(5.9) Theorem
If a linear operator on a finite dimensional vector space is annihilated by a polyno-
mial, then its eigenvalues are from among the zeros of the polynomial.

Proof. Let 𝑇 be a linear operator on an 𝑛-dimensional vector space 𝑉 over F. Let


𝑝 (𝑡) be a polynomial with coefficients from F such that 𝑝 (𝑇 ) = 0, the zero operator.
Let 𝜆 be an eigenvalue of 𝑇 . We show that 𝑝 (𝜆) = 0.
Suppose 𝑣 ∈ 𝑉 is an eigenvector corresponding to the eigenvalue 𝜆 of 𝑇 . Let
𝛼, 𝛽 ∈ F. Write the identity operator on 𝑉 as 𝐼 . We see that

(𝛼𝑇 + 𝛽𝐼 )𝑣 = 𝛼𝑇 𝑣 + 𝛽𝑣 = (𝛼𝜆 + 𝛽)𝑣, 𝑇 2𝑣 = 𝑇 (𝜆𝑣) = 𝜆𝑇 𝑣 = 𝜆 2𝑣.

It follows by induction that 𝑇 𝑘 𝑣 = 𝜆𝑘 𝑣 for any 𝑘 ∈ N. Then, 𝑝 (𝑇 )𝑣 = 𝑝 (𝜆)𝑣. Since


𝑝 (𝑇 ) = 0, we have 𝑝 (𝜆)𝑣 = 0. As 𝑣 ≠ 0, we conclude that 𝑝 (𝜆) = 0.
Thus, if an 𝑛 × 𝑛 matrix is annihilated by a polynomial 𝑝 (𝑡), then all its complex
eigenvalues are among the zeros of 𝑝 (𝑡).

(5.10) Example
Let 𝐴 ∈ R3×3 be such that 𝐴2 = 3𝐴 − 2𝐼 and det(𝐴) = 4. Then what is tr(𝐴)?
𝐴 is annihilated by the polynomial 𝑝 (𝑡) = 𝑡 2 − 3𝑡 + 2 = (𝑡 − 2)(𝑡 − 1). The zeros of
𝑝 (𝑡) are 2 and 1. Thus each (complex) eigenvalue of 𝐴 is either 2 or 1. Since 𝐴 is a
matrix of order 3, taking into account the repetition of eigenvalues, the possibilities
are 2, 2, 2 or 2, 2, 1 or 2, 1, 1 or 1, 1, 1. As det(𝐴) is the product of the eigenvalues
of 𝐴, and det(𝐴) = 4, the eigenvalues are 2, 2 and 1. So tr(𝐴) = 2 + 2 + 1 = 5.
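A quick sanity check (a sketch only; the diagonal matrix used below is one concrete matrix satisfying the hypotheses, not the only one, and is introduced purely for illustration):

    import numpy as np

    A = np.diag([2.0, 2.0, 1.0])      # one matrix with A^2 = 3A - 2I and det(A) = 4

    print(np.allclose(A @ A, 3*A - 2*np.eye(3)))   # True
    print(np.linalg.det(A))                        # 4.0
    print(np.trace(A))                             # 5.0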

We wish to find out the nature of eigenvalues and eigenvectors of these special
types of operators.

(5.11) Theorem
Let 𝑇 be a linear operator on a finite dimensional inner product space.
(1) If 𝑇 is self-adjoint, then
(a) all characteristic values of 𝑇 are real;
(b) 𝑇 has an eigenvalue;
(c) all eigenvalues of 𝑇 are real; and
(d) eigenvectors for distinct eigenvalues are orthogonal.
(2) If 𝑇 is unitary, then each eigenvalue of 𝑇 has absolute value 1.

Proof. (1) Let 𝑇 be a self-adjoint linear operator on an inner product space 𝑉 of


dimension 𝑛 over F.
(a) Let 𝜆 be a characteristic value of 𝑇 . Choose an orthonormal basis 𝐵 for 𝑉 . Due
to (4.22), 𝐴 := [𝑇 ]𝐵,𝐵 is an 𝑛 × 𝑛 hermitian matrix. Then 𝜆 is a complex eigenvalue
of 𝐴. Let 𝑣 ∈ C𝑛×1 be an eigenvector corresponding to the complex eigenvalue 𝜆 of
𝐴. Now, 𝐴𝑣 = 𝜆𝑣 implies

h𝐴𝑣, 𝑣i = h𝜆𝑣, 𝑣i = 𝜆h𝑣, 𝑣i; h𝐴𝑣, 𝑣i = h𝐴∗𝑣, 𝑣i = h𝑣, 𝐴𝑣i = h𝑣, 𝜆𝑣i = 𝜆h𝑣, 𝑣i.

Since h𝑣, 𝑣i ≠ 0, it follows that 𝜆 = 𝜆. That is, 𝜆 is real.


(b-c) These statements follow from (a).
(d) Let 𝑢, 𝑣 be eigenvectors corresponding to distinct eigenvalues 𝜆, 𝜇 of 𝑇 . The
equalities 𝑇 ∗ = 𝑇 , 𝑇𝑢 = 𝜆𝑢, 𝑇 𝑣 = 𝜇𝑣, and 𝜆, 𝜇 are real imply that

h𝑇𝑢, 𝑣i = h𝜆𝑢, 𝑣i = 𝜆h𝑢, 𝑣i;


h𝑇𝑢, 𝑣i = h𝑇 ∗𝑢, 𝑣i = h𝑢,𝑇 𝑣i = h𝑢, 𝜇𝑣i = 𝜇 h𝑢, 𝑣i = 𝜇 h𝑢, 𝑣i.

Then (𝜆 − 𝜇 )h𝑢, 𝑣i = 0. As 𝜆 ≠ 𝜇, we conclude that 𝑢 and 𝑣 are orthogonal.


(2) Suppose 𝑇 ∗𝑇 = 𝑇𝑇 ∗ = 𝐼 . Let 𝑣 be an eigenvector associated with the eigenvalue
𝜆 of 𝑇 . Now, 𝑇 𝑣 = 𝜆𝑣 implies that

h𝑇 𝑣,𝑇 𝑣i = h𝑣,𝑇 ∗𝑇 𝑣i = h𝑣, 𝑣i; h𝑇 𝑣,𝑇 𝑣i = h𝜆𝑣, 𝜆𝑣i = 𝜆𝜆h𝑣, 𝑣i = |𝜆| 2 h𝑣, 𝑣i.

That is, (1 − |𝜆| 2 )h𝑣, 𝑣i = 0. As 𝑣 ≠ 0, we have |𝜆| = 1.


From (5.11) we obtain the following results for matrices.

(5.12) Theorem

(1) Each hermitian or real symmetric matrix has an eigenvalue.


(2) Each eigenvalue of a hermitian or real symmetric matrix is real.
(3) Eigenvectors corresponding to distinct eigenvalues of a hermitian or real
symmetric matrix are orthogonal.
(4) Corresponding to each eigenvalue of a real symmetric matrix, a real eigen-
vector exists.
(5) Each eigenvalue of a skew-hermitian or real skew-symmetric matrix is purely
imaginary or zero.
(6) Each complex eigenvalue of a unitary or orthogonal matrix has absolute
value 1. The determinant of such a matrix has absolute value 1.
Proof. (1)-(3) follow from the previous theorem. Though (4) also follows implic-
itly, we give a more direct proof as in the following:
(4) Let 𝐴 be a real symmetric matrix, and let 𝜆 be an eigenvalue of 𝐴. By (2), 𝜆 ∈ R.
Let 𝑣 ∈ C𝑛×1 be an eigenvector of 𝐴 corresponding to the eigenvalue 𝜆. Write 𝑣 = 𝑥 + 𝑖𝑦, where 𝑥, 𝑦 ∈ R𝑛×1 .
Comparing the real and imaginary parts in 𝐴(𝑥 + 𝑖𝑦) = 𝜆(𝑥 + 𝑖𝑦), we have

𝐴𝑥 = 𝜆𝑥, 𝐴𝑦 = 𝜆𝑦.

Since 𝑥 + 𝑖𝑦 ≠ 0, at least one of 𝑥 or 𝑦 is nonzero. Such a nonzero vector is a real


eigenvector of 𝐴 associated with the eigenvalue 𝜆.
(5) Suppose 𝐴 is a skew-hermitian or real skew-symmetric matrix; that is, 𝐴∗ = −𝐴.
Let 𝑣 be an eigenvector associated with the eigenvalue 𝜆 of 𝐴. Then 𝐴𝑣 = 𝜆𝑣 implies

h𝐴𝑣, 𝑣i = h𝜆𝑣, 𝑣i = 𝜆h𝑣, 𝑣i; h𝐴𝑣, 𝑣i = h−𝐴∗𝑣, 𝑣i = −h𝑣, 𝐴𝑣i = −h𝑣, 𝜆𝑣i = −𝜆h𝑣, 𝑣i.

Since 𝑣 ≠ 0, we have 𝜆 = −𝜆. That is, 𝜆 is purely imaginary or zero.


(6) Suppose 𝐴 is a unitary or an orthogonal matrix. By the previous theorem, each
complex eigenvalue of 𝐴 has absolute value 1.
Further, since det(𝐴) is the product of all complex eigenvalues of 𝐴 counting
multiplicities, |det(𝐴)| = 1.
We remark that a hermitian matrix need not have a real eigenvector though all
eigenvalues are real. For example, the hermitian matrix
 
0 𝑖
−𝑖 0

has eigenvalues 1 and −1 with corresponding linearly independent eigenvectors


[𝑖 1] 𝑡 and [−𝑖 1] 𝑡 . It does not have any real eigenvector.
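This can also be confirmed numerically (a sketch; NumPy is applied to the 2 × 2 hermitian matrix above, and the tiny imaginary parts in the computed eigenvalues are only rounding artefacts):

    import numpy as np

    H = np.array([[0, 1j],
                  [-1j, 0]])          # the hermitian matrix above

    w, V = np.linalg.eig(H)
    print(w)                          # close to 1 and -1: the eigenvalues are real
    print(V)                          # the eigenvectors have genuinely complex entries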
As we know, the determinant of an orthogonal operator (matrix) is either 1 or −1.
However, an orthogonal operator need not have an eigenvalue.

(5.13) Example
Consider the linear operator 𝑇 : R2 → R2 given by 𝑇 (𝑎, 𝑏) = (1/√2)(𝑎 − 𝑏, 𝑎 + 𝑏). Its
matrix representation with respect to the standard basis 𝐸 of R2 is
    [𝑇 ]𝐸,𝐸 = [ 1/√2   −1/√2 ]
              [ 1/√2    1/√2 ] .

It is easy to verify that [𝑇 ]𝐸,𝐸 is an orthogonal matrix. Its characteristic polynomial is

    𝜒𝑇 (𝑡) = 𝑡 2 − √2 𝑡 + 1.

[𝑇 ]𝐸,𝐸 has the complex eigenvalues (1 ± 𝑖)/√2. Since 𝜒𝑇 (𝑡) has no real zeros, 𝑇 has
no eigenvalues.

In (5.13), 𝑇 is the rotation in the plane by an angle of 𝜋/4. In fact, any rotation 𝑇𝜃
of (5.5), where 𝜃 is not a multiple of 𝜋, provides such an example.
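For a numerical confirmation of (5.13) (a sketch only; NumPy is of course not part of the theory), the rotation matrix has a pair of non-real complex eigenvalues, so the operator on R2 has no real eigenvector:

    import numpy as np

    c = 1/np.sqrt(2)
    T = np.array([[c, -c],
                  [c,  c]])            # matrix of the rotation by pi/4

    print(np.linalg.eigvals(T))        # approximately (1 ± i)/sqrt(2): no real eigenvalue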

Exercises for § 5.2


1. Find the characteristic polynomial, eigenvalues and associated eigenvectors
for the matrices given below.
     −2 0 3   0 1 0
3 2 −2 −1    
(a) (b) (c)  −2 3 0  (d)  0 0 1  .
 
−1 0 5 2  0 0 5  −6 −1 4 
   
Ans: (a) (𝑡 −1)(𝑡 −2), [1 −1] 𝑡 , [−2 1] 𝑡 . (b) (𝑡 −𝑖)(𝑡 +𝑖), [1 −2−𝑖] 𝑡 , [1 𝑖 −2] 𝑡 .
(c) (𝑡 + 2)(𝑡 − 3)(𝑡 − 5), [5 2 0] 𝑡 , [0 1 0] 𝑡 , [3 − 3 7] 𝑡 .
(d) (𝑡 + 1)(𝑡 − 2)(𝑡 − 3), [1 − 1 1] 𝑡 , [1 2 4] 𝑡 , [1 3 9] 𝑡 .
2. Find the eigenvalues and associated eigenvectors of the differentiation opera-
tor 𝑑/𝑑𝑡 : R3 [𝑡] → R3 [𝑡]. Ans: Eigenvalue is 0; eigenvector is 1.
0 1 0
 
3. Let 𝐴 = 0 0 1 . Show the following:
𝑎 𝑏 𝑐 
 
(a) 𝐴 is invertible iff 𝑎 ≠ 0.
(b) If 𝑎 ≠ 0, then 𝐴−1 = (1/𝑎)(𝐴2 − 𝑐𝐴 − 𝑏𝐼 ).
4. If you know the characteristic polynomial of 𝐴 ∈ F𝑛×𝑛 , then how do you
determine whether 𝐴 is invertible or not? If 𝐴 is invertible, then how do you
compute 𝐴−1 using the characteristic polynomial of 𝐴?
5. Is it true that each real skew symmetric matrix has an eigenvalue?
Ans: No.
6. Let 𝑥 and 𝑦 be eigenvectors corresponding to distinct eigenvalues of a real
symmetric matrix of order 3. Show that their cross product is a third eigen-
vector linearly independent with 𝑥 and 𝑦.

5.3 Schur triangularization


Eigenvalues and eigenvectors can be used to represent a linear operator on an inner
product space in a nice form. One of them is the unitary triangularization of Schur.
(5.14) Theorem (Schur Triangularization)
For each linear operator on a finite dimensional complex ips, there exists an or-
thonormal ordered basis for the ips with respect to which the matrix of the linear
operator is upper triangular.

Proof. Let 𝑇 be a linear operator on a complex ips 𝑉 with dim (𝑉 ) = 𝑛. For 𝑛 = 1,
let {𝑢} be a basis of 𝑉 . Then 𝑢 ≠ 0. Take 𝑣 = 𝑢/k𝑢 k. Then 𝐵 = {𝑣 } is an
orthonormal basis of 𝑉 . Since 𝑇 𝑣 ∈ 𝑉 , we have 𝑇 𝑣 = 𝛼𝑣 for some 𝛼 ∈ C. So,
𝑇 𝑣 ∈ span {𝑣 }.
To apply induction, assume that the statement holds for all linear operators on
each complex ips of dimension less than 𝑛. Let 𝑇 be a linear operator on a complex
ips 𝑉 of dimension 𝑛. Let 𝜆 ∈ C be an eigenvalue of 𝑇 with an associated eigenvector
𝑢. Take 𝑣 1 = 𝑢/k𝑢 k; so that 𝑣 1 is an eigenvector of 𝑇 of norm 1 associated with the
eigenvalue 𝜆. Extend the set {𝑣 1 } to obtain a basis for 𝑉 . Then use Gram-Schmidt
process to obtain an orthogonal (or orthonormal) basis {𝑣 1, 𝑢 2, . . . , 𝑢𝑛 } for 𝑉 .
Let 𝑈 = span {𝑢 2, . . . , 𝑢𝑛 }. Then 𝑣 1 is orthogonal to each vector in 𝑈 . Let 𝑥 ∈ 𝑈 .
As 𝑇 𝑥 ∈ 𝑉 , there exist unique scalars 𝑎 1, 𝑎 2, . . . , 𝑎𝑛 ∈ C such that

𝑇 𝑥 = 𝑎 1𝑣 1 + 𝑎 2𝑢 2 + · · · + 𝑎𝑛𝑢𝑛 .

Define 𝑆 : 𝑈 → 𝑈 by
𝑆 (𝑥) = 𝑎 2𝑢 2 + · · · + 𝑎𝑛𝑢𝑛 .
Clearly, for 𝑢 ∈ 𝑈 , 𝑆 (𝑢) ∈ 𝑈 . We show that 𝑆 is a linear operator. For this, let
𝑦, 𝑧 ∈ 𝑈 , and let 𝛼 ∈ C. There exist unique scalars 𝑏𝑖 , 𝑐𝑖 such that

𝑇𝑦 = 𝑏 1𝑣 1 + 𝑏 2𝑢 2 + · · · + 𝑏𝑛𝑢𝑛 , 𝑇 𝑧 = 𝑐 1𝑣 1 + 𝑐 2𝑢 2 + · · · + 𝑐𝑛𝑢𝑛 ;
𝑇 (𝑦 + 𝛼𝑧) = (𝑏 1 + 𝛼𝑐 1 )𝑣 1 + (𝑏 2𝑢 2 + · · · + 𝑏𝑛𝑢𝑛 ) + 𝛼 (𝑐 2𝑢 2 + · · · + 𝑐𝑛𝑢𝑛 ).

We see that 𝑆 (𝑦 + 𝛼𝑧) = (𝑏 2𝑢 2 + · · · + 𝑏𝑛𝑢𝑛 ) + 𝛼 (𝑐 2𝑢 2 + · · · + 𝑐𝑛𝑢𝑛 ) = 𝑆 (𝑦) + 𝛼𝑆 (𝑧).


Therefore, 𝑆 : 𝑈 → 𝑈 is a linear operator. It satisfies

𝑇 𝑥 = 𝑎 1𝑣 1 + 𝑆 (𝑥),

where the scalar 𝑎 1 ∈ C and the vector 𝑆 (𝑥) ∈ 𝑈 are uniquely determined from 𝑇
and the vector 𝑥 ∈ 𝑈 .
By the induction hypothesis, there exists an orthonormal ordered basis {𝑣 2, . . . , 𝑣𝑛 }
for 𝑈 such that
𝑆 (𝑣 𝑗 ) ∈ span {𝑣 2, . . . , 𝑣 𝑗 } for 2 ≤ 𝑗 ≤ 𝑛.
Let 𝐵 = {𝑣 1, 𝑣 2, . . . , 𝑣𝑛 }. Since k𝑣 1 k = 1 and 𝑣 1 is orthogonal to each vector in 𝑈 ,
the set {𝑣 1, . . . , 𝑣𝑛 } is orthonormal. We see that 𝐵 is an orthonormal ordered basis

for 𝑉 . To see that this is the required basis, we compute the 𝑇 -images of the basis
vectors as follows: (Here, 𝛼𝑖 are some suitable scalars.)

𝑇 (𝑣 1 ) = 𝜆𝑣 1 ∈ span {𝑣 1 },
𝑇 (𝑣 2 ) = 𝛼 1𝑣 1 + 𝑆 (𝑣 2 ) ∈ span {𝑣 1, 𝑣 2 },
..
.
𝑇 (𝑣 𝑗 ) = 𝛼 𝑗 𝑣 1 + 𝑆 (𝑣 𝑗 ) ∈ span {𝑣 1, 𝑣 2, . . . , 𝑣 𝑗 }, for 2 ≤ 𝑗 ≤ 𝑛.

Clearly, [𝑇 ]𝐵,𝐵 is upper triangular.


Schur triangularization involves the choice of a basis for 𝑉 so that the linear
operator 𝑇 on 𝑉 is represented by an upper triangular matrix. When 𝑉 = C𝑛×1, the
linear operator 𝑇 is a matrix in C𝑛×𝑛 . We discuss this particular case.
Let 𝐴 ∈ C𝑛×𝑛 . Let 𝐸 = {𝑣 1, . . . , 𝑣𝑛 } be a basis for C𝑛×1 such that [𝐴]𝐸,𝐸 is
upper triangular. From (4.28) it follows that 𝑃 −1𝐴𝑃 is upper triangular, where
𝑃 = [𝑣 1 · · · 𝑣𝑛 ]. Notice that this is a stronger type of equivalence.
Let 𝐴, 𝐵 ∈ F𝑛×𝑛 . Recall that 𝐵 is called similar to 𝐴 iff there exists an invertible
matrix 𝑃 ∈ F𝑛×𝑛 such that 𝐵 = 𝑃 −1𝐴𝑃 .
𝐵 is called unitarily similar to 𝐴 iff there exists a unitary matrix 𝑃 ∈ F𝑛×𝑛 such
that 𝐵 = 𝑃 −1𝐴𝑃 .
When F = R, 𝐵 is called orthogonally similar to 𝐴 iff there exists an orthogonal
matrix 𝑃 ∈ R𝑛×𝑛 such that 𝐵 = 𝑃 −1𝐴𝑃 .
Clearly, similarity is an equivalence relation on the set of square matrices of the
same order. So, we speak of matrices similar to each other, etc. If two matrices
are similar, then they are also equivalent. However, the converse does not hold. For
instance, the identity matrix of order 𝑛 is similar to only itself. But the identity
matrix is equivalent to any invertible matrix. Thus, equivalence may not preserve
eigenvalues. In contrast, the following theorem shows that similarity preserves
eigenvalues.

(5.15) Theorem
Similar square matrices have the same eigenvalues, counting multiplicities.

Proof. Let 𝐴, 𝑃, 𝐵 ∈ F𝑛×𝑛 such that 𝑃 is invertible and 𝐵 = 𝑃 −1𝐴𝑃 . Then


    𝜒𝐵 (𝑡) = (−1)𝑛 det(𝑃 −1𝐴𝑃 − 𝑡𝐼 ) = (−1)𝑛 det(𝑃 −1 (𝐴 − 𝑡𝐼 )𝑃)
           = (−1)𝑛 det(𝑃 −1 )det(𝐴 − 𝑡𝐼 )det(𝑃) = (−1)𝑛 det(𝐴 − 𝑡𝐼 ) = 𝜒𝐴 (𝑡).

From this, the result follows.


Though equivalence of matrices is easily characterized by the rank theorem,
similarity involves much more. We will proceed towards that goal, but slowly.
For matrices, (5.14) may now be stated as follows.

(5.16) Theorem (Schur Triangularization)


Each complex square matrix is unitarily similar to an upper triangular matrix; and
each real square matrix that has only real characteristic values is orthogonally
similar to an upper triangular matrix.

The inductive construction of a unitary matrix described in the proof of Schur


triangularization amounts to the following.
Assume that for all 𝐵 ∈ C𝑚×𝑚 , 𝑚 ≥ 1, we have a unitary matrix 𝑄 ∈ C𝑚×𝑚 such
that 𝑄 ∗ 𝐵𝑄 is upper triangular. Let 𝐴 ∈ C (𝑚+1)×(𝑚+1) and let 𝜆 ∈ C be an eigenvalue
of 𝐴 with an associated eigenvector 𝑢. Consider C (𝑚+1)×1 as an ips with the usual
inner product h𝑤, 𝑧i = 𝑧 ∗𝑤 . Let 𝑣 = 𝑢/k𝑢 k, so that 𝑣 is an eigenvector of 𝐴 of norm
1 associated with the eigenvalue 𝜆. Extend the set {𝑣 } to obtain an orthonormal
ordered basis 𝐸 = {𝑣, 𝑢 2, . . . , 𝑢𝑚+1 } for C (𝑚+1)×1 . Here, you may have to use an
extension of a basis, and then Gram-Schmidt orthonormalization process. Now,
construct the matrix 𝑅 ∈ C (𝑚+1)×(𝑚+1) by taking these basis vectors as its columns,
in that order; that is, let 𝑅 = [ 𝑣  𝑢 2  · · ·  𝑢𝑚+1 ].
Since 𝐸 is an orthonormal set, 𝑅 is unitary. With respect to the basis 𝐸, the matrix
representation of 𝐴 is 𝑅 −1𝐴𝑅 = 𝑅 ∗𝐴𝑅. Using the standard basis for C (𝑚+1)×1, we see
that the first column of 𝑅 ∗𝐴𝑅 is

𝑅 ∗𝐴𝑅𝑒 1 = 𝑅 ∗𝐴𝑣 = 𝑅 −1𝜆𝑣 = 𝜆𝑅 −1𝑣 = 𝜆𝑅 −1𝑅𝑒 1 = 𝜆𝑒 1 .

Then 𝑅 ∗𝐴𝑅 can be written in the following block form:


 
    𝑅 ∗𝐴𝑅 = [ 𝜆   𝑥 ]
            [ 0   𝐶 ] ,

where 0 ∈ C𝑚×1, 𝐶 ∈ C𝑚×𝑚 and 𝑥 = [𝑣 ∗𝐴𝑢 2 · · · 𝑣 ∗𝐴𝑢𝑚+1 ] ∈ C1×𝑚 .


Notice that if 𝑚 = 1, the construction is complete. For 𝑚 > 1, by induction
hypothesis, we have a matrix 𝑆 ∈ C𝑚×𝑚 such that 𝑆 ∗𝐶𝑆 is upper triangular. Then
take
    𝑃 = 𝑅 [ 1   0 ]
          [ 0   𝑆 ] .
Since 𝑆 is unitary, direct computation shows that 𝑃 is unitary. Moreover,
    𝑃 ∗𝐴𝑃 = [ 1  0 ]∗ 𝑅 ∗𝐴𝑅 [ 1  0 ] = [ 1  0  ] [ 𝜆  𝑥 ] [ 1  0 ] = [ 𝜆   𝑦     ]
            [ 0  𝑆 ]        [ 0  𝑆 ]   [ 0  𝑆 ∗ ] [ 0  𝐶 ] [ 0  𝑆 ]   [ 0   𝑆 ∗𝐶𝑆 ]

for some 𝑦 ∈ C1×𝑚 . Since 𝑆 ∗𝐶𝑆 is upper triangular, so is 𝑃 ∗𝐴𝑃 . The construction is
complete.

If 𝐴 is a real matrix, and all its complex eigenvalues turn out to be real, then
we use the transpose instead of the adjoint everywhere in the above construction.
Thus, 𝑃 is an orthogonal matrix.

(5.17) Example
 2 1 0
 
Consider the matrix 𝐴 =  2 3 0 for Schur triangularization.
−1 −1 1
 
We find that 𝜒𝐴 (𝑡) = (𝑡 − 1) 2 (𝑡 − 4). All characteristic values of 𝐴 are real. Thus
there exists an orthogonal matrix 𝑃 such that 𝑃 t𝐴𝑃 is upper triangular. To determine
such a matrix 𝑃, we take one of the eigenvalues, say 1. An associated eigenvector
of norm 1 is 𝑣 = (0, 0, 1) t . We extend {𝑣 } to an orthonormal basis for R3×1 . For
convenience, we take the (ordered) orthonormal basis as
{(0, 0, 1) t, (1, 0, 0) t, (0, 1, 0) t }.
Taking the basis vectors as columns, we form the matrix 𝑅 as follows:
0 1 0

𝑅 = 0 0 1 .
1 0 0

We then find that
1 −1 −1
t
 
𝑅 𝐴𝑅 = 0 2 1 .
0 2 3
 
 
Now, we try to triangularize the matrix 𝐶 = [ 2  1 ; 2  3 ]. It has eigenvalues 1 and 4.
The eigenvector of unit norm associated with the eigenvalue 1 is (1/√2, −1/√2) t . We
extend it to an orthonormal basis
    {(1/√2, −1/√2) t, (1/√2, 1/√2) t }
for R2×1 . Then we construct the matrix 𝑆 by taking these basis vectors as its columns:
    𝑆 = [  1/√2   1/√2 ]
        [ −1/√2   1/√2 ] .
 
We find that 𝑆 t𝐶𝑆 = [ 1  −1 ; 0  4 ], which is an upper triangular matrix. Then
    𝑃 = 𝑅 [ 1  0 ] = [ 0  1  0 ] [ 1     0      0    ]   [ 0    1/√2   1/√2 ]
          [ 0  𝑆 ]   [ 0  0  1 ] [ 0    1/√2   1/√2  ] = [ 0   −1/√2   1/√2 ]
                     [ 1  0  0 ] [ 0   −1/√2   1/√2  ]   [ 1    0      0    ] .

Now, 𝑃 t𝐴𝑃 = [ 1  0  −√2 ]
             [ 0  1  −1  ]
             [ 0  0   4  ]  is an upper triangular matrix.

Notice that there is nothing sacred about being upper triangular. For, given
a matrix 𝐴 ∈ C𝑛×𝑛 , consider using Schur triangularization of 𝐴∗ . There exists a
unitary matrix 𝑃 such that 𝑃 ∗𝐴∗𝑃 is upper triangular. Then taking adjoint, we have
𝑃 ∗𝐴𝑃 is lower triangular. That is,
each square matrix is unitarily similar to a lower triangular matrix.
Analogously, a real square matrix having no non-real eigenvalues is also orthogo-
nally similar to a lower triangular matrix. We remark that the lower triangular form
of a matrix need not be the transpose or the adjoint of its upper triangular form.
Neither the unitary matrix 𝑃 nor the upper triangular matrix 𝑃 ∗𝐴𝑃 in Schur
triangularization is unique. That is, there can be unitary matrices 𝑃 and 𝑄 such
that 𝑃 ≠ 𝑄, 𝑃 ∗𝐴𝑃 ≠ 𝑄 ∗𝐴𝑄, and both 𝑃 ∗𝐴𝑃 and 𝑄 ∗𝐴𝑄 are upper triangular. The
non-uniqueness stems from the choices involved in the associated eigenvectors and
in extending this to an orthonormal basis. For instance, in (5.17), if you extend
{(0, 0, 1) t } to the ordered orthonormal basis

{(0, 0, 1) t, (0, 1, 0) t, (1, 0, 0) t },

then you end up with (verify)


    𝑃 = [ 0   −1/√2   1/√2 ]                  [ 1  0  −√2 ]
        [ 0    1/√2   1/√2 ] ,     𝑃 t𝐴𝑃 =    [ 0  1   1  ]
        [ 1    0      0    ]                  [ 0  0   4  ] .
 

Exercises for § 5.3


1. Let 𝑇 : R2 → R2 be given by 𝑇 (𝑎, 𝑏) = (5𝑎 + 7𝑏, −2𝑎 − 4𝑏). Determine an
   ordered basis of R2 with respect to which the matrix of 𝑇 is upper triangular.
   Ans: {(1/√2, −1/√2)𝑡 , (1/√2, 1/√2)𝑡 }.
2. Find Schur triangularization of the following matrices:
     13 8 8
7 −2 1 1  
(a) (b) (c)  −1 7 −2 .
12 −3 −2 3  −1 −2 7 
      
1 1 2 3 −14 1 1 1+𝑖
Ans: (a) 𝑃 = √ ;𝑈 = . (b) 𝑃 = √ ;
5 −2 1 0 1 3 1 − 𝑖 −1
  2 −2 1  9 0 9
2 + 𝑖 −1 + 2𝑖 
1 
  
𝑈 = . (c) 𝑃 = 3 2 1 −2 ; 𝑈 = 0 9 9 .
0 2−𝑖 1 −2 2  0 0 9
   

3. Compute the 50th power of the 3 × 3 matrix in Exercise 2(c).


 209 400 400 
49
 
Ans: 9 −50 −91 −100 .
−50 −100 −91 
 
4. Let 𝜆1, . . . , 𝜆𝑛 be the eigenvalues of an 𝑛 × 𝑛 matrix 𝐴. Prove that

|𝜆1 | 2 + · · · + |𝜆𝑛 | 2 ≤ tr(𝐴∗𝐴).

5. For a matrix 𝐴 = [𝑎𝑖 𝑗 ] ∈ C𝑛×𝑛 , let 𝑐 = max{|𝑎𝑖 𝑗 | : 1 ≤ 𝑖, 𝑗 ≤ 𝑛}. Show that


|det(𝐴)| ≤ 𝑐^𝑛 𝑛^(𝑛/2) . [Hint: Use Exercise 4 and AM-GM inequality.]
6. Let the scalar 𝜆 appear exactly 𝑚 times in the diagonal of a square upper
triangular matrix 𝐴. Prove that null((𝐴 − 𝜆𝐼 )𝑚 ) = 𝑚.
7. Let 𝐴 ∈ C𝑛×𝑛 and let 𝜆 be an eigenvalue of 𝐴 of algebraic multiplicity 𝑚.
Then prove that null((𝐴 − 𝜆𝐼 )𝑚 ) = 𝑚.

5.4 Diagonalizability
Some linear operators can be represented by matrices simpler than upper triangular
matrices. For instance, if 𝑇 is a self-adjoint linear operator on an ips 𝑉 of finite
dimension, then Schur triangularization asserts the existence of an orthonormal
basis for 𝑉 such that the matrix of 𝑇 with respect to this basis is upper triangular.
By (4.22), this matrix is hermitian. However, a hermitian upper triangular matrix is
diagonal. Therefore, 𝑇 is represented by a diagonal matrix. Shortly, we will discuss
a more general result in this connection.
Let 𝑇 : 𝑉 → 𝑉 be a linear operator, where 𝑉 is a finite dimensional vector space.
We say that 𝑇 is diagonalizable iff there exists an ordered basis 𝐵 for 𝑉 such that
[𝑇 ]𝐵,𝐵 is a diagonal matrix.
If 𝑉 is an ips, then 𝑇 is called unitarily diagonalizable iff there exists an or-
thonormal ordered basis 𝐵 for 𝑉 such that [𝑇 ]𝐵,𝐵 is a diagonal matrix.
If 𝐴 ∈ C𝑛×𝑛 , then it is viewed as a linear operator on C𝑛×1 . Of course, the matrix
representation of the linear operator 𝐴 with respect to the standard basis of C𝑛×1
is 𝐴 itself. A change of basis results in a matrix similar to 𝐴. Thus, 𝐴 ∈ C𝑛×𝑛
is diagonalizable iff 𝐴 is similar to a diagonal matrix iff there exists an invertible
matrix 𝑃 ∈ C𝑛×𝑛 such that 𝑃 −1𝐴𝑃 is a diagonal matrix. In such a case, we say that
𝐴 is diagonalized by 𝑃 .
Similarly, 𝐴 is unitarily diagonalizable iff 𝐴 is unitarily similar to a diagonal
matrix iff there exists a unitary matrix 𝑈 ∈ C𝑛×𝑛 such that 𝑈 ∗𝐴𝑈 is a diagonal
matrix.
If 𝐴 ∈ R𝑛×𝑛 , then we say that 𝐴 is orthogonally diagonalizable iff 𝐴 is orthog-
onally similar to a diagonal matrix iff there exists an orthogonal matrix 𝑄 ∈ R𝑛×𝑛
such that 𝑄 ∗𝐴𝑄 is a diagonal matrix.
The following result connects diagonalizability of a linear operator to eigenvalues
and eigenvectors.

(5.18) Theorem
A linear operator 𝑇 on a finite dimensional vector space 𝑉 is diagonalizable iff there
exists a basis for 𝑉 consisting of eigenvectors of 𝑇 . In particular, a matrix 𝐴 ∈ F𝑛×𝑛
is diagonalizable iff there exists a basis of F𝑛×1 consisting of eigenvectors of 𝐴.

Proof. Let 𝑇 be a linear operator on a vector space 𝑉 of dimension 𝑛.


If 𝑇 is diagonalizable, then there exists a basis 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } for 𝑉 such that
[𝑇 ]𝐵,𝐵 = diag (𝜆1, . . . , 𝜆𝑛 ) for scalars 𝜆𝑖 . In this case, 𝑇 𝑣𝑖 = 𝜆𝑖 𝑣𝑖 for 1 ≤ 𝑖 ≤ 𝑛.
Therefore, 𝐵 consists of eigenvectors of 𝑇 .
Conversely, if 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } is a basis for 𝑉 , where 𝑇 𝑣𝑖 = 𝜆𝑖 𝑣𝑖 for 1 ≤ 𝑖 ≤ 𝑛,
then [𝑇 ]𝐵,𝐵 = diag (𝜆1, . . . , 𝜆𝑛 ).
The statement about diagonalizability of a matrix in (5.18) follows from that for
the linear operator. However, it is instructive to see the same more directly. Let
𝐴 ∈ C𝑛×𝑛 .
Suppose that 𝐴 is diagonalized by 𝑃 . Then there exist scalars 𝜆1, . . . , 𝜆𝑛 such that

𝑃 −1𝐴𝑃 = 𝐷 = diag (𝜆1, . . . , 𝜆𝑛 ).


 
Then 𝐴𝑃 = 𝑃𝐷. If 𝑃 = [𝑣 1 · · · 𝑣𝑛 ], then 𝐴𝑣𝑖 = 𝜆𝑖 𝑣𝑖 for 1 ≤ 𝑖 ≤ 𝑛. Since 𝑃 is
invertible, its columns, which are the eigenvectors of 𝐴, form a basis for F𝑛×1 .
Conversely, if 𝜆1, . . . , 𝜆𝑛 are the eigenvalues of a matrix 𝐴 ∈ F𝑛×𝑛 with corre-
sponding eigenvectors 𝑣 1, . . . , 𝑣𝑛 which form a basis for F𝑛×1, then construct the
matrices
 
𝑃 = [𝑣 1 · · · 𝑣𝑛 ] ∈ F𝑛×𝑛 , 𝐷 = diag (𝜆1, . . . , 𝜆𝑛 ).
The matrix 𝑃 is invertible; and

𝐴𝑃𝑒𝑖 = 𝐴𝑣𝑖 = 𝜆𝑖 𝑣𝑖 = 𝜆𝑖 𝑃𝑒𝑖 = 𝑃𝐷𝑒𝑖 .

That is, 𝐴𝑃 = 𝑃𝐷 so that 𝑃 −1𝐴𝑃 = 𝐷. Hence 𝐴 is diagonalized by 𝑃 .


If 𝐴 is at all diagonalizable, then we construct the diagonalizing matrix 𝑃 by
collecting together the linearly independent eigenvectors of 𝐴. Then 𝑃 −1𝐴𝑃 will be
the diagonal matrix with its diagonal entries as the respective eigenvalues.
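The recipe can be tried out numerically (a sketch; numpy.linalg.eig returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors; the 2 × 2 matrix chosen here is arbitrary):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])        # eigenvalues 5 and 2, hence diagonalizable

    lam, P = np.linalg.eig(A)         # columns of P are eigenvectors of A
    D = np.linalg.inv(P) @ A @ P      # P^{-1} A P

    print(lam)
    print(np.round(D, 10))            # diagonal, with the eigenvalues on the diagonal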
Now, the question is how to determine whether a given matrix or a linear operator
is diagonalizable or not. That is, when can we ensure that there exists a basis for

the vector space consisting of eigenvectors of the operator? A partial answer is


provided by the following theorem.

(5.19) Theorem
Eigenvectors associated with distinct eigenvalues of a linear operator on a finite di-
mensional vector space are linearly independent. In particular, each linear operator
on a vector space of dimension 𝑛 having 𝑛 distinct eigenvalues is diagonalizable.

Proof. Let 𝑉 be a vector space of finite dimension, 𝑇 : 𝑉 → 𝑉 be a linear


operator, and let 𝜆1, . . . , 𝜆𝑚 be distinct eigenvalues of 𝑇 with corresponding eigen-
vectors as 𝑣 1, . . . , 𝑣𝑚 . Suppose, on the contrary, that the vectors 𝑣 1, . . . , 𝑣𝑚 are linearly
dependent. By (1.14), there exists 𝑘 such that {𝑣 1, . . . , 𝑣𝑘 } is linearly independent
and 𝑣𝑘+1 ∈ span {𝑣 1, . . . , 𝑣𝑘 }. Then, we have scalars 𝑎 1, . . . , 𝑎𝑘 such that

𝑣𝑘+1 = 𝑎 1𝑣 1 + · · · + 𝑎𝑘 𝑣𝑘 . (5.4.1)

Applying 𝑇 on both sides of (5.4.1), we obtain

𝑇 𝑣𝑘+1 = 𝑎 1𝑇 𝑣 1 + · · · + 𝑎𝑘 𝑇 𝑣𝑘 .

Since 𝑣 𝑗 is an eigenvector of 𝑇 for the eigenvalue 𝜆 𝑗 , we have

𝜆𝑘+1𝑣𝑘+1 = 𝑎 1𝜆1𝑣 1 + · · · + 𝑎𝑘 𝜆𝑘 𝑣𝑘 .

Multiplying 𝜆𝑘+1 to both sides of (5.4.1), we get

𝜆𝑘+1𝑣𝑘+1 = 𝑎 1𝜆𝑘+1𝑣 1 + · · · + 𝑎𝑘 𝜆𝑘+1𝑣𝑘 .

Subtracting the last equation from its preceding one, we obtain

0 = 𝑎 1 (𝜆1 − 𝜆𝑘+1 )𝑣 1 + · · · + 𝑎𝑘 (𝜆𝑘 − 𝜆𝑘+1 )𝑣𝑘 .

Since {𝑣 1, . . . , 𝑣𝑘 } is linearly independent, we find that

𝑎 1 (𝜆1 − 𝜆𝑘+1 ) = 0, . . . , 𝑎𝑘 (𝜆𝑘 − 𝜆𝑘+1 ) = 0.

As 𝜆 𝑗 − 𝜆𝑘+1 ≠ 0 for 1 ≤ 𝑗 ≤ 𝑘, we conclude that 𝑎 1 = 0, . . . , 𝑎𝑘 = 0. Then (5.4.1)


implies that 𝑣𝑘+1 = 0. But this is impossible as 𝑣𝑘+1 is an eigenvector of 𝑇 . Therefore,
𝑣 1, . . . , 𝑣𝑚 are linearly independent.
For the particular case, suppose 𝑇 has 𝑛 distinct eigenvalues 𝜆1, . . . , 𝜆𝑛 . Then
the associated eigenvectors 𝑣 1, . . . , 𝑣𝑛 are linearly independent. Since dim (𝑉 ) = 𝑛,
these vectors form a basis for 𝑉 . By (5.18), 𝑇 is diagonalizable.
For matrices, we thus conclude the following: Eigenvectors associated with
distinct complex eigenvalues of a matrix are linearly independent. In particular, if a
matrix of order 𝑛 has 𝑛 distinct complex eigenvalues, then it is similar to a diagonal
matrix. The diagonalizing matrix is, in general, a matrix with complex entries.

(5.20) Example
Let 𝑇 : R3 → R3 be given by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏 + 𝑐, 3𝑏 + 3𝑐, −2𝑎 + 𝑏 + 𝑐). If
possible, find an ordered basis 𝐵 for R3 so that [𝑇 ]𝐵,𝐵 is a diagonal matrix.
The matrix of 𝑇 with respect to the standard basis of R3 is

 1 1 1
 
𝐴 =  0 3 3  .
 −2 1 1 
 
We find that 𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) = 𝑡 (𝑡 − 2)(𝑡 − 3). Since no eigenvalue is repeated,
𝑇 is diagonalizable. For diagonalization, we compute the eigenvectors 𝑣 1, 𝑣 2, 𝑣 3
corresponding to the eigenvalues 𝜆 = 0, 2, 3. They are as follows:

𝜆 = 0 : 𝑣 1 = (0, −1, 1), 𝜆 = 2 : 𝑣 2 = (2, 3, −1), 𝜆 = 3 : 𝑣 3 = (1, 2, 0).

We see that

𝑇 (𝑣 1 ) = (0, 0, 0) = 0 𝑣 1 + 0 𝑣 2 + 0 𝑣 3
𝑇 (𝑣 2 ) = (4, 6, −2) = 0 𝑣 1 + 2 𝑣 2 + 0 𝑣 3
𝑇 (𝑣 3 ) = (3, 6, 0) = 0 𝑣 1 + 0 𝑣 2 + 3 𝑣 3 .

Therefore, with the basis 𝐵 = {𝑣 1, 𝑣 2, 𝑣 3 } for R3, we have [𝑇 ]𝐵,𝐵 = diag (0, 2, 3).
To diagonalize the matrix 𝐴 directly, we work with column vectors so that
{𝑣 1t , 𝑣 2t , 𝑣 3t } is the required basis for R3×1 . The diagonalizing matrix is

 0 2 1
 t t t  
𝑃 = 𝑣 1 𝑣 2 𝑣 3 =  −1 3 2  .
 1 −1 0 
 
 1 −1/2 1/2   1 1 1  0 2 1 0 0 0
      
Then 𝑃 −1𝐴𝑃 =  1 −1/2 −1/2   0 3 3   −1 3 2  =  0 2 0  .
    
 −1 1 1   −2 1 1   1 −1 0   0 0 3 
     
If 𝜆 is an eigenvalue of a linear operator 𝑇 , then its associated eigenvector 𝑢
is a solution of 𝑇𝑢 = 𝜆𝑢. Thus the maximum number of linearly independent
eigenvectors associated with the eigenvalue 𝜆 is dim (𝑁 (𝑇 − 𝜆𝐼 )). This number and
the algebraic multiplicity of 𝜆 have certain relations with the diagonalizability of 𝑇 .
Let 𝜆 be an eigenvalue of a linear operator 𝑇 on a finite dimensional vector space
𝑉 . The number dim (𝑁 (𝑇 − 𝜆𝐼 )) is called the geometric multiplicity of 𝜆.

Recall that the number 𝑚 such that (𝑡 − 𝜆)𝑚 divides 𝜒𝑇 (𝑡), and (𝑡 − 𝜆)𝑚+1 does
not divide 𝜒𝑇 (𝑡) is the algebraic multiplicity of 𝜆.

(5.21) Theorem
The geometric multiplicity of any eigenvalue of a linear operator on a finite dimen-
sional vector space is less than or equal to its algebraic multiplicity.

Proof. Let 𝑉 be a vector space of dimension 𝑛. Let 𝑇 be a linear operator on 𝑉 ,


and let 𝜆 be an eigenvalue of 𝑇 . Suppose the geometric multiplicity of 𝜆 is 𝑘. Then
we have 𝑘 number of linearly independent eigenvectors of 𝑇 associated with this
eigenvalue 𝜆, and no more. Let these eigenvectors be 𝑣 1, . . . , 𝑣𝑘 . Extend these to an
ordered basis 𝐵 = {𝑣 1, . . . , 𝑣𝑘 , 𝑣𝑘+1, . . . , 𝑣𝑛 } for 𝑉 . Then

𝑇 𝑣 1 = 𝜆𝑣 1, . . . , 𝑇 𝑣𝑘 = 𝜆𝑣𝑘 .

And for 𝑗 > 𝑘, 𝑇 𝑣 𝑗 can be any linear combination of 𝑣 1, . . . , 𝑣𝑛 . That is, the matrix
of 𝑇 with respect to 𝐵 is given by
 
    𝐴 := [𝑇 ]𝐵,𝐵 = [ 𝜆𝐼𝑘   𝐶 ]
                   [  0    𝐷 ] ,

where 𝐼𝑘 is the identity matrix of order 𝑘 and 𝐶 ∈ C𝑘×(𝑛−𝑘) , 𝐷 ∈ C (𝑛−𝑘)×(𝑛−𝑘) are


some matrices. Now,
    𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) = (𝑡 − 𝜆)𝑘 𝑝 (𝑡)

for some polynomial 𝑝 (𝑡) of degree 𝑛 − 𝑘. Clearly, the algebraic multiplicity of 𝜆 is


at least 𝑘.

(5.22) Theorem
A linear operator 𝑇 on a vector space of dimension 𝑛 is diagonalizable iff the
geometric multiplicity of each eigenvalue of 𝑇 is equal to its algebraic multiplicity
iff sum of geometric multiplicities of all eigenvalues of 𝑇 is 𝑛.

Proof. Suppose 𝑇 : 𝑉 → 𝑉 is diagonalizable, where dim (𝑉 ) = 𝑛. Then there


exists an ordered basis 𝐵 of 𝑉 consisting of eigenvectors of 𝑇 such that [𝑇 ]𝐵,𝐵 is a
diagonal matrix. If 𝜆 is an eigenvalue of 𝑇 of algebraic multiplicity 𝑚, then in this
diagonal matrix there are exactly 𝑚 number of entries equal to 𝜆. In the basis 𝐵 there
are exactly 𝑚 number of eigenvectors associated with 𝜆. Therefore, the geometric
multiplicity of 𝜆 is 𝑚.
Conversely, suppose that the geometric multiplicity of each eigenvalue is equal to
its algebraic multiplicity. Then corresponding to each eigenvalue 𝜆, we have exactly
as many linearly independent eigenvectors as its algebraic multiplicity. Collecting
together these eigenvectors, we get 𝑛 linearly independent eigenvectors, which form
a basis for 𝑉 . Therefore, 𝑇 is diagonalizable.
The second ‘iff’ statement follows since geometric multiplicity of each eigenvalue
is at most its algebraic multiplicity.

(5.23) Example
   
Let 𝐴 = [ 2  0 ; 0  2 ] and let 𝐵 = [ 2  1 ; 0  2 ]. We see that 𝜒𝐴 (𝑡) = 𝜒𝐵 (𝑡) = (𝑡 − 2) 2 . The
eigenvalue 𝜆 = 2 has algebraic multiplicity 2 for both 𝐴 and 𝐵.
For geometric multiplicities, we solve 𝐴𝑥 = 2𝑥 and 𝐵𝑦 = 2𝑦.
Now, 𝐴𝑥 = 2𝑥 gives 𝑥 = 𝑥, which is satisfied by the linearly independent vectors
(1, 0) t and (0, 1) t . Thus, the geometric multiplicity of the eigenvalue 2 is dim (𝑁 (𝐴−
2𝐼 )) = 2, which is equal to the algebraic multiplicity of the eigenvalue 2. Hence 𝐴
is diagonalizable; in fact, it is already a diagonal matrix.
For the matrix 𝐵, we solve 𝐵𝑥 = 2𝑥 . With 𝑥 = (𝑎, 𝑏) t, we get 2𝑎 + 𝑏 = 2𝑎 and
2𝑏 = 2𝑏. That is, 𝑏 = 0 and 𝑎 can be any complex number. For instance, 𝑥 = (1, 0) t .
Then the geometric multiplicity of the eigenvalue 2 is dim (𝑁 (𝐵 − 2𝐼 )) = 1, which
is not equal to the algebraic multiplicity of the eigenvalue 2. Therefore, 𝐵 is not
diagonalizable.
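Since the geometric multiplicity of an eigenvalue 𝜆 of an 𝑛 × 𝑛 matrix 𝑀 is 𝑛 − rank(𝑀 − 𝜆𝐼 ), the two multiplicities above can also be read off from ranks (a sketch in NumPy):

    import numpy as np

    A = np.array([[2, 0], [0, 2]], dtype=float)
    B = np.array([[2, 1], [0, 2]], dtype=float)
    I = np.eye(2)

    print(2 - np.linalg.matrix_rank(A - 2*I))   # 2: equals the algebraic multiplicity
    print(2 - np.linalg.matrix_rank(B - 2*I))   # 1: strictly less, so B is not diagonalizable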

(5.24) Example
Let 𝑇 : R3 → R3 be given by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 3𝑏 + 3𝑐, 3𝑎 + 𝑏 + 3𝑐, 3𝑎 + 3𝑏 + 𝑐).
Determine whether the linear operator 𝑇 is diagonalizable or not.
With respect to the standard basis, 𝑇 has the matrix representation
1 3 3
 
𝐴 =  3 1 3  .
3 3 1
 
We obtain 𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) = (𝑡 + 2) 2 (𝑡 − 7). The eigenvalue −2 has algebraic
multiplicity 2. To determine its geometric multiplicity, we solve the linear system
(𝑇 + 2𝐼 )(𝑎, 𝑏, 𝑐) = 0. It gives 3𝑎 + 3𝑏 + 3𝑐 = 0 or, 𝑎 + 𝑏 + 𝑐 = 0. It has two linearly
independent solutions. That is, the geometric multiplicity of the eigenvalue −2 is
equal to
dim (𝑁 (𝑇 + 2𝐼 )) = dim {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 + 𝑐 = 0} = 2.
Thus the geometric multiplicity of the eigenvalue −2 is equal to its algebraic mul-
tiplicity, 2. The geometric multiplicity of the eigenvalue 7 is at least 1 and also it
is less than or equal to its algebraic multiplicity, which is 1. That is, the geometric
multiplicity of the eigenvalue 7 is equal to its algebraic multiplicity, 1. Hence 𝑇 is
diagonalizable.

 1 1 1
 
Indeed, with 𝑃 =  −1 0 1  , we see that 𝑃 −1𝐴𝑃 = diag (−2, −2, 7).
 0 −1 1 
 
In the beginning of this section we have argued that a hermitian matrix can be
diagonalized. This observation can be generalized to normal matrices by using the
fact that a normal upper triangular matrix is necessarily a diagonal matrix.

(5.25) Lemma
A normal upper triangular matrix is diagonal. In particular, a hermitian upper
triangular matrix is diagonal.

Proof. Suppose 𝐵 is a normal upper triangular matrix. If 𝐵 is a 1 × 1 matrix,


then it is diagonal. Otherwise, assume that (induction hypothesis) all 𝑛 × 𝑛 normal
upper triangular matrices are diagonal. Let 𝐵 be an (𝑛 + 1) × (𝑛 + 1) normal upper
triangular matrix. Then write 𝐵 in block form as
 
    𝐵 = [ 𝑎   𝑥 ]
        [ 0   𝐶 ] ,

where 𝑥 ∈ F1×𝑛 , 0 ∈ F𝑛×1, and 𝐶 ∈ F𝑛×𝑛 is an upper triangular matrix. Then

    𝐵 ∗ 𝐵 = [ 𝑎̄    0∗ ] [ 𝑎   𝑥 ] = [ |𝑎| 2         𝑎̄𝑥        ]
            [ 𝑥 ∗   𝐶 ∗ ] [ 0   𝐶 ]   [ 𝑎𝑥 ∗    𝑥 ∗𝑥 + 𝐶 ∗𝐶  ]

    𝐵𝐵 ∗ = [ 𝑎   𝑥 ] [ 𝑎̄    0∗ ] = [ |𝑎| 2 + 𝑥𝑥 ∗    𝑥𝐶 ∗ ]
           [ 0   𝐶 ] [ 𝑥 ∗   𝐶 ∗ ]   [ 𝐶𝑥 ∗            𝐶𝐶 ∗ ] .

Now, 𝐵 ∗ 𝐵 = 𝐵𝐵 ∗ implies that 𝑥𝑥 ∗ = 0 and 𝑥 ∗𝑥 +𝐶 ∗𝐶 = 𝐶𝐶 ∗ . If 𝑥 = [𝑏 1 · · · 𝑏𝑛 ], then


𝑥𝑥 ∗ = |𝑏 1 | 2 + · · · + |𝑏𝑛 | 2 . So, 𝑥𝑥 ∗ = 0 implies that 𝑥 = 0. Thus 𝐶 ∗𝐶 = 𝐶𝐶 ∗ . That is,
 
    𝐵 = [ 𝑎   0 ]
        [ 0   𝐶 ] ,

where 𝐶 is an 𝑛 × 𝑛 normal upper triangular matrix. By our assumption, 𝐶 is a


diagonal matrix. Therefore, 𝐵 is a diagonal matrix.
Using (5.25) and Schur triangularization we obtain the following result.

(5.26) Theorem (Spectral theorem)


A linear operator on a finite dimensional ips is unitarily diagonalizable iff it is
a normal operator. In particular, each self-adjoint linear operator on a finite
dimensional ips is unitarily diagonalizable.
Proof. Let 𝑇 : 𝑉 → 𝑉 be a linear operator, where 𝑉 is an ips of dimension
𝑛. Assume that 𝑇 is normal. Schur triangularization implies that there exists an
orthonormal ordered basis 𝐵 for 𝑉 such that [𝑇 ]𝐵,𝐵 is upper triangular. By (4.24),
[𝑇 ]𝐵,𝐵 is a normal matrix. By (5.25), [𝑇 ]𝐵,𝐵 is a diagonal matrix.
Conversely, suppose [𝑇 ]𝐵,𝐵 is a diagonal matrix, where 𝐵 is an orthonormal
ordered basis for 𝑉 . Since a diagonal matrix is necessarily normal, [𝑇 ]𝐵,𝐵 is normal.
Again, (4.24) implies that 𝑇 is normal.
The result in (5.26) is called the spectral theorem since it uses the spectrum, that
is, the multiset of eigenvalues of the linear operator. Notice that the diagonal matrix
that represents a normal linear operator 𝑇 has its diagonal entries as the eigenvalues
of 𝑇 . In case, 𝑇 is self-adjoint, the diagonal entries are real.
The spectral theorem for matrices may be stated as follows.

(5.27) Theorem (Spectral theorem)


A square matrix is unitarily diagonalizable iff it is normal. In particular, each
hermitian matrix is unitarily diagonalizable; and each real symmetric matrix is
orthogonally diagonalizable.

There can be non-normal but diagonalizable matrices. For such a matrix 𝐴, there
does not exist a unitary matrix 𝑈 such that 𝑈 −1𝐴𝑈 is a diagonal matrix.
If unitary diagonalization is not required, we can diagonalize a diagonalizable
operator by constructing a basis of eigenvectors, which need not be orthonormal.
This means, for each eigenvalue, we choose as many linearly independent eigenvectors
as its geometric multiplicity. This is bound to succeed provided
that the operator is diagonalizable. Similarly, choosing orthogonal eigenvectors for
each eigenvalue would lead to unitary diagonalization of a normal operator.

(5.28) Example
Consider 𝑇 : R3 → R3 with 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 3𝑏 + 3𝑐, 3𝑎 + 𝑏 + 3𝑐, 3𝑎 + 3𝑏 + 𝑐). Now,

h𝑇 (𝑎, 𝑏, 𝑐), (𝛼, 𝛽, 𝛾)i = 𝑎𝛼 + 3𝑏𝛼 + 3𝑐𝛼 + 3𝑎𝛽 + 𝑏𝛽 + 3𝑐𝛽 + 3𝑎𝛾 + 3𝑏𝛾 + 𝑐𝛾
= 𝑎(𝛼 + 3𝛽 + 3𝛾) + 𝑏 (3𝛼 + 𝛽 + 3𝛾) + 𝑐 (3𝛼 + 3𝛽 + 𝛾)
= h(𝑎, 𝑏, 𝑐),𝑇 (𝛼, 𝛽, 𝛾)i.

Thus, 𝑇 is self-adjoint; it is diagonalizable. In fact, with respect to the standard


basis 𝐸 of R3 the matrix representation of 𝑇 is

1 3 3
 
[𝑇 ]𝐸,𝐸 =  3 1 3  ,
3 3 1
 

which is real symmetric. As in (5.24), the eigenvalues of 𝑇 are −2, −2 and 7.


Any eigenvector (𝑎, 𝑏, 𝑐) for the eigenvalue −2 satisfies the equation 𝑇 (𝑎, 𝑏, 𝑐) =
−2(𝑎, 𝑏, 𝑐), that is,

𝑎 + 3𝑏 + 3𝑐 = −2𝑎, 3𝑎 + 𝑏 + 3𝑐 = −2𝑏, 3𝑎 + 3𝑏 + 𝑐 = −2𝑐.

They give 𝑎 + 𝑏 + 𝑐 = 0. We choose two linearly independent eigenvectors for the


eigenvalue −2. For instance, 𝑢 1 = (1, −1, 0) and 𝑢 2 = (1, 0, −1).
For the eigenvalue 7, an eigenvector (𝑎, 𝑏, 𝑐) satisfies

𝑎 + 3𝑏 + 3𝑐 = 7𝑎, 3𝑎 + 𝑏 + 3𝑐 = 7𝑏, 3𝑎 + 3𝑏 + 𝑐 = 7𝑐.

These equations simplify to 𝑎 = 𝑏 = 𝑐. Thus we choose 𝑢 3 = (1, 1, 1) as an


eigenvector.
Then we have obtained a basis 𝐵 = {𝑢 1, 𝑢 2, 𝑢 3 } for R3 . Since these are eigenvectors,
using their corresponding eigenvalues, we find that

𝑇 (𝑢 1 ) = −2𝑢 1, 𝑇 (𝑢 2 ) = −2𝑢 2, 𝑇 (𝑢 3 ) = 7𝑢 3 .

Hence [𝑇 ]𝐵,𝐵 = diag (−2, −2, 7).


Further, the spectral theorem asserts that there exists an orthonormal basis of R3
consisting of eigenvectors of 𝑇 . For the eigenvalue −2, we choose the eigenvectors
as 𝑣 1 = (1, 0, −1) and 𝑣 2 = (−1, 2, −1), which are orthogonal to each other. For
the eigenvalue 7, we choose the eigenvector 𝑣 3 = 𝑢 3 = (1, 1, 1). Normalizing the
eigenvectors, we have

    𝑤 1 = (1/√2)(1, 0, −1), 𝑤 2 = (1/√6)(−1, 2, −1), 𝑤 3 = (1/√3)(1, 1, 1).

Now, 𝐶 = {𝑤 1, 𝑤 2, 𝑤 3 } is an orthonormal basis of R3 . Since these vectors are


eigenvectors, using their corresponding eigenvalues, we obtain

𝑇 (𝑤 1 ) = −2 𝑤 1, 𝑇 (𝑤 2 ) = −2 𝑤 2, 𝑇 (𝑤 3 ) = 7 𝑤 3 .

Hence [𝑇 ]𝐶,𝐶 = diag (−2, −2, 7).

Sometimes, choosing orthogonal eigenvectors may not be very obvious. If we


can obtain linearly independent eigenvectors, then Gram-Schmidt orthogonalization
can help.
Let 𝑇 be a normal operator. We know that if 𝜆 and 𝜇 are two distinct eigenvalues
of 𝑇 with corresponding eigenvectors 𝑢 and 𝑣, then 𝑢 ⊥ 𝑣. See (5.11-d). If an
eigenvalue 𝜆 of 𝑇 is repeated 𝑚 times, diagonalizability of 𝑇 implies that there exist
𝑚 number of linearly independent eigenvectors for the eigenvalue 𝜆. After choosing
such 𝑚 number of eigenvectors, we employ Gram-Schmidt orthogonalization to
obtain an orthogonal set of eigenvectors corresponding to 𝜆.
To see why Gram-Schmidt process succeeds here, let 𝑢 1, . . . , 𝑢𝑚 be linearly in-
dependent eigenvectors associated with an eigenvalue 𝜆 of 𝑇 . Suppose that Gram-
Schmidt orthogonalization applied on these vectors yield the vectors 𝑣 1, . . . , 𝑣𝑚 ,
respectively. Then there exist scalars 𝑎 1, . . . 𝑎 𝑗 such that

𝑣 𝑗 = 𝑎 1𝑢 1 + · · · + 𝑎 𝑗 𝑢 𝑗 for 1 ≤ 𝑗 ≤ 𝑚.

Now, 𝑇 𝑣 𝑗 = 𝑎 1𝑇𝑢 1 + · · · 𝑎 𝑗𝑇𝑢 𝑗 = 𝜆(𝑎 1𝑢 1 + · · · 𝑎 𝑗 𝑢 𝑗 ) = 𝜆𝑣 𝑗 shows that 𝑣 𝑗 is also an


eigenvector of 𝑇 with the same eigenvalue 𝜆.
We then normalize the orthogonal eigenvectors to obtain orthonormal ones. Fi-
nally, we combine all these orthonormal sets of eigenvectors corresponding to
distinct eigenvalues to construct an orthonormal basis of eigenvectors.

(5.29) Example
Consider 𝑇 : R3 → R3 given by 𝑇 (𝑎, 𝑏, 𝑐) = (−𝑎 + 𝑏 + 𝑐, 𝑎 − 𝑏 + 𝑐, 𝑎 + 𝑏 − 𝑐). With
respect to the standard basis 𝐸 of R3, we have
 −1 1 1 
 
𝐴 = [𝑇 ]𝐸,𝐸 =  1 −1 1  .
 1 1 −1 
 
The matrix 𝐴 is real symmetric. Thus, 𝑇 is self-adjoint. Now, 𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) =
(𝑡 − 1)(𝑡 + 2) 2 . Thus 𝑇 has eigenvalues 1, −2, −2. To construct an eigenvector for
the eigenvalue 1, we set up the linear system (𝑇 − 𝐼 )(𝑎, 𝑏, 𝑐) t = 0. That is,

−𝑎 + 𝑏 + 𝑐 = 𝑎, 𝑎 − 𝑏 + 𝑐 = 𝑏, 𝑎 + 𝑏 − 𝑐 = 𝑐.

Simplifying we obtain 𝑎 = 𝑏 = 𝑐. So, we choose an eigenvector as 𝑢 1 = (1, 1, 1).


For the eigenvalue −2, the equations are

−𝑎 + 𝑏 + 𝑐 = −2𝑎, 𝑎 − 𝑏 + 𝑐 = −2𝑏, 𝑎 + 𝑏 − 𝑐 = −2𝑐.

They give 𝑎 + 𝑏 + 𝑐 = 0. So, we choose two linearly independent eigenvectors

𝑢 2 = (1, −1, 0), 𝑢 3 = (1, 0, −1).

Notice that 𝑢 1 ⊥ 𝑢 2 and 𝑢 1 ⊥ 𝑢 3 . But 𝑢 2 is not orthogonal to 𝑢 3 . We employ


Gram-Schmidt process on {𝑢 2, 𝑢 3 } and obtain

𝑣 2 = 𝑢 2, 𝑣 3 = (1, 0, −1) − (1/2)(1, −1, 0) = (1/2)(1, 1, −2).

Observe that 𝑣 3 is also an eigenvector associated with the eigenvalue −2.



Next, we normalize the eigenvectors 𝑢 1, 𝑣 2, 𝑣 3 by dividing them with their respec-


tive norms to obtain orthonormal eigenvectors 𝑤 1, 𝑤 2, 𝑤 3 as follows:

    𝑤 1 = (1/√3)(1, 1, 1), 𝑤 2 = (1/√2)(1, −1, 0), 𝑤 3 = (1/√6)(1, 1, −2).

Now, 𝐵 = {𝑤 1, 𝑤 2, 𝑤 3 } is an orthonormal basis for R3 . To verify whether 𝑤 1, 𝑤 2, 𝑤 3


are the eigenvectors of 𝑇 associated with the eigenvalues 1, −2, −2, respectively, we
compute 𝑇 (𝑤 1 ), 𝑇 (𝑤 2 ), 𝑇 (𝑤 3 ) as follows:

    𝑇 (𝑤 1 ) = (1/√3) 𝑇 (1, 1, 1) = (1/√3) (1, 1, 1) = 1 𝑤 1
    𝑇 (𝑤 2 ) = (1/√2) 𝑇 (1, −1, 0) = (1/√2) (−2, 2, 0) = −2 𝑤 2
    𝑇 (𝑤 3 ) = (1/√6) 𝑇 (1, 1, −2) = (1/√6) (−2, −2, 4) = −2 𝑤 3 .

Therefore, [𝑇 ]𝐵,𝐵 = diag (1, −2, −2).


To diagonalize the matrix 𝐴, we work with the column vectors. The diagonalizing
matrix is
 1/√3 1/√2 1/√6 
 t t t  √ √ √ 
𝑃 = 𝑤 1 𝑤 2 𝑤 3 =  1/ 3 −1/ 2 1/ 6  .
 1/√3 0 −2/√6 
 
Since the columns of 𝑃 are orthonormal, 𝑃 is an orthogonal matrix. Then
1 0 0
 
𝑃 −1𝐴𝑃 = 𝑃 t𝐴𝑃 =  0 −2 0  .
 0 0 −2 
 

Exercises for § 5.4


0 1 1   7 −2 0
   
1. Diagonalize the matrices: (a) 1 0 1 (b) −2 6 −2 .
1 1 0   0 −2 5
   
1 0 1
 
Ans: (a) 𝑃 =  1 1 0  , 𝑃 −1𝐴𝑃 = diag (2, −1, −1).
 1 −1 −1 
 
1 2 2
 
(b) 𝑃 =  2 1 −2  , 𝑃 −1𝐴𝑃 = diag (3, 6, 9).
 2 −2 1 
 
2. In each of the following, determine whether 𝑇 is diagonalizable. If 𝑇 is
diagonalizable, find an ordered basis 𝐵 and the diagonal matrix [𝑇 ]𝐵,𝐵 .
(a) 𝑇 : R3 → R3 ; 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏 + 𝑐, 𝑎 + 𝑏 − 𝑐, 𝑎 − 𝑏 + 𝑐).
(b) 𝑇 : R3 → R3 ; 𝑇 𝑒 1 = 0, 𝑇 𝑒 2 = 𝑒 1, 𝑇 𝑒 3 = 𝑒 2 .
(c) 𝑇 : R3 → R3 ; 𝑇 𝑒 1 = 𝑒 2, 𝑇 𝑒 2 = 𝑒 3, 𝑇 𝑒 3 = 0.
(d) 𝑇 : R3 → R3 ; 𝑇 𝑒 1 = 𝑒 3, 𝑇 𝑒 2 = 𝑒 2, 𝑇 𝑒 3 = 𝑒 1 .
(e) 𝑇 : F3 [𝑡] → F3 [𝑡]; 𝑇 (𝑝 (𝑡)) = 𝑑𝑝/𝑑𝑡 .
(f) 𝑇 : F3 [𝑡] → F3 [𝑡]; 𝑇 (𝑝 (𝑡)) = (1/𝑡) ∫₀ᵗ 𝑝 (𝑠) 𝑑𝑠.
Ans: (a) 𝐵 = {(−1, 1, 1), (0, 1, −1), (1, 1, 0)}, [𝑇 ]𝐵,𝐵 = diag (−1, 2, 2).
(d) 𝐵 = {(1, 0, −1), (0, 1, 0), (1, 0, 1)}, [𝑇 ]𝐵,𝐵 = diag (−1, 1, 1).
(f) 𝐵 = {1, 𝑡, 𝑡 2, 𝑡 3 }, [𝑇 ]𝐵,𝐵 = diag (1, 1/2, 1/3, 1/4).
(b), (c), (e) Not diagonalizable.
3. Give three 3 × 3 matrices which cannot be diagonalized.
 0 1 0 0 0 0 0 1 0
     
Ans: 0 0 0 , 0 0 1 , 0 0 1 .
 0 0 0 0 0 0 0 0 0
     
4. Which of the following matrices are diagonalizable?
1 1 1 1 1 1 1 0 1  3/2 −1/2 0 
      
(a)  1 −1 1  (b)  0 1 1  (c)  1 1 0  (d)
     −1/2

3/2 0  .
 1 1 −1  0 0 1 0 1 1  1/2 −1/2 1 
      
Ans: (a), (c), (d) diagonalizable; (b) is not.
5. For the diagonalizable matrices in Exercise 4, is it possible to find eigenvectors
of the matrix that form a basis for R3×1 ? Ans: Yes.
 9 −5 3 
 
6. Let 𝐴 =  0 4 3  . By diagonalizing 𝐴, compute 𝐴5 .
0 0 1
 
 95 45 − 95 45 − 1 
 
Ans: 𝐴5 =  0 45 45 − 1  .
 0 0 1 

7. Prove that two hermitian matrices are similar iff they have the same charac-
teristic polynomial.

5.5 Jordan form


We now know that all linear operators (matrices) cannot be diagonalized. Non-
diagonalizability means that we cannot have a basis {𝑣 1, . . . , 𝑣𝑛 } for the underlying
space so that 𝑇 (𝑣 𝑗 ) = 𝜆 𝑗 𝑣 𝑗 . In that case, we would like to have a basis which would
bring the matrix of the linear operator to a nearly diagonal form. Specifically, if
possible, we would try to construct an ordered basis {𝑣 1, . . . , 𝑣𝑛 } such that

𝑇 (𝑣 𝑗 ) = 𝜆 𝑗 𝑣 𝑗 or 𝑇 (𝑣 𝑗 ) = 𝑎𝑣 𝑗−1 + 𝜆 𝑗 𝑣 𝑗 for each 𝑗 ∈ {1, . . . , 𝑛},



where 𝑎 is either 0 or 1. Notice that the matrix representation of 𝑇 with respect to


such a basis would possibly have nonzero entries on the diagonal and on the super
diagonal (entries above the diagonal); all other entries being 0.
Let 𝜆 be an eigenvalue of a linear operator 𝑇 on a finite dimensional complex
vector space 𝑉 with an associated eigenvector 𝑣 1 . A Jordan string for 𝜆 is a list of
nonzero vectors 𝑣 1, . . . , 𝑣𝑘 such that

𝑇 (𝑣 1 ) = 𝜆𝑣 1, 𝑇 (𝑣 2 ) = 𝑣 1 +𝜆𝑣 2, . . . , 𝑇 (𝑣𝑘 ) = 𝑣𝑘−1 +𝜆𝑣𝑘 ; 𝑇 (𝑣) ≠ 𝑣𝑘 +𝜆𝑣 for any 𝑣 ∈ 𝑉 .

Equivalently,

(𝑇 −𝜆𝐼 )𝑣 1 = 0, (𝑇 −𝜆𝐼 )𝑣 2 = 𝑣 1, . . . , (𝑇 −𝜆𝐼 )𝑣𝑘 = 𝑣𝑘−1 ; (𝑇 −𝜆𝐼 )𝑣 ≠ 𝑣𝑘 for any 𝑣 ∈ 𝑉 .

Such a Jordan string 𝑣 1, . . . , 𝑣𝑘 is said to start with 𝑣 1 and end with 𝑣𝑘 . The number
𝑘 is called the length of the Jordan string.

(5.30) Example
Consider the linear operator 𝑇 : C5 → C5 given by 𝑇 (𝑎, 𝑏, 𝑐, 𝑑, 𝑒) = (𝑎, 𝑎 + 𝑏, 𝑏 +
𝑐, 𝑑, 𝑑 + 𝑒). With respect to the standard basis of C5, 𝑇 is represented by the matrix
1 0 0 0 0

1 1 0 0 0

0 1 1 0 0 .
 
 
0 0 0 1 0
 
0 0 0 1 1

Since this matrix is lower triangular with all diagonal entries as 1, the only eigenvalue
of 𝑇 is 1 with algebraic multiplicity 5. Notice that

(𝑇 − 𝐼 )(𝑎, 𝑏, 𝑐, 𝑑, 𝑒) = (0, 𝑎, 𝑏, 0, 𝑑).

To find the associated eigenvectors, we solve (𝑇 − 𝐼 )(𝑎, 𝑏, 𝑐, 𝑑, 𝑒) = 0. We obtain


𝑎 = 𝑏 = 𝑑 = 0 and 𝑐, 𝑒 are arbitrary. The geometric multiplicity of the eigenvalue 1
is 2. The two linearly independent eigenvectors are

𝑣 1 = (0, 0, 1, 0, 0), 𝑤 1 = (0, 0, 0, 0, 1).

There are two types of Jordan strings for the eigenvalue 1, one starting with 𝑣 1
and the other starting with 𝑤 1 .
For a Jordan string starting with 𝑣 1, we need to solve (𝑇 − 𝐼 )𝑣 2 = 𝑣 1 . If 𝑣 2 =
(𝑎, 𝑏, 𝑐, 𝑑, 𝑒), then the equation is

(0, 𝑎, 𝑏, 0, 𝑑) = (0, 0, 1, 0, 0).


This implies 𝑎 = 0, 𝑏 = 1, 𝑑 = 0; leaving 𝑐, 𝑒 arbitrary. This has again two linearly
independent solutions obtained by taking 𝑐 = 1, 𝑒 = 0 or, 𝑐 = 0, 𝑒 = 1. We take the
first one. That is,
𝑣 2 = (0, 1, 1, 0, 0).
Next, we solve (𝑇 − 𝐼 )𝑣 3 = 𝑣 2 . If 𝑣 3 = (𝑎, 𝑏, 𝑐, 𝑑, 𝑒), then the equation is

(0, 𝑎, 𝑏, 0, 𝑑) = (0, 1, 1, 0, 0).

It gives 𝑎 = 1, 𝑏 = 1, 𝑑 = 0 leaving 𝑐, 𝑒 arbitrary. Once more we have two linearly


independent solutions; we take up one of them, say,

𝑣 3 = (1, 1, 1, 0, 0).

Further, we set up (𝑇 − 𝐼 )𝑣 4 = 𝑣 3 . If 𝑣 4 = (𝑎, 𝑏, 𝑐, 𝑑, 𝑒), then

(0, 𝑎, 𝑏, 0, 𝑑) = (1, 1, 1, 0, 0).

It does not have a solution; and we stop here with the Jordan string 𝑣 1, 𝑣 2, 𝑣 3 .
Notice that we could have chosen 𝑣 2 differently, and once again, 𝑣 3 could have
been chosen in a different way.
Now, with the eigenvector 𝑤 1 = (0, 0, 0, 0, 1), we proceed similarly. Suppose
𝑤 2 = (𝑎, 𝑏, 𝑐, 𝑑, 𝑒) satisfies (𝑇 − 𝐼 )𝑤 2 = 𝑤 1 . Then

(0, 𝑎, 𝑏, 0, 𝑑) = (0, 0, 0, 0, 1).

That is, 𝑎 = 0, 𝑏 = 0, 𝑑 = 1 and 𝑐, 𝑒 arbitrary. We take one of the two possibilities


to obtain
𝑤 2 = (0, 0, 1, 1, 0).
Next, we set up (𝑇 − 𝐼 )𝑤 3 = 𝑤 2, with 𝑤 3 = (𝑎, 𝑏, 𝑐, 𝑑, 𝑒), to obtain

(0, 𝑎, 𝑏, 0, 𝑑) = (0, 0, 1, 1, 0).

This equation has no solutions; we stop with the Jordan string 𝑤 1, 𝑤 2 .


What happens if we apply (𝑇 − 𝐼 ) successively to the vectors 𝑣 3 and 𝑤 2 ? Notice
that (𝑇 − 𝐼 )(𝑎, 𝑏, 𝑐, 𝑑, 𝑒) = (0, 𝑎, 𝑏, 0, 𝑑). Therefore, we see that

(𝑇 − 𝐼 )𝑣 3 = (𝑇 − 𝐼 )(1, 1, 1, 0, 0) = (0, 1, 1, 0, 0) = 𝑣 2,
(𝑇 − 𝐼 ) 2𝑣 3 = (𝑇 − 𝐼 )𝑣 2 = (𝑇 − 𝐼 )(0, 1, 1, 0, 0) = (0, 0, 1, 0, 0) = 𝑣 1,
(𝑇 − 𝐼 ) 3𝑣 3 = (𝑇 − 𝐼 )𝑣 1 = (𝑇 − 𝐼 )(0, 0, 1, 0, 0) = (0, 0, 0, 0, 0),
(𝑇 − 𝐼 )𝑤 2 = (𝑇 − 𝐼 )(0, 0, 1, 1, 0) = (0, 0, 0, 0, 1) = 𝑤 1,
(𝑇 − 𝐼 ) 2𝑤 2 = (𝑇 − 𝐼 )𝑤 1 = (𝑇 − 𝐼 )(0, 0, 0, 0, 1) = (0, 0, 0, 0, 0).

That is, (𝑇 − 𝐼 ) 3𝑣 3 = 0 = (𝑇 − 𝐼 ) 2𝑤 2 .
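For hand-checking such computations, a computer algebra system can produce a Jordan basis directly (a sketch; SymPy's jordan_form returns a matrix 𝑃 of generalized eigenvectors and the Jordan form 𝐽 with 𝐴 = 𝑃 𝐽 𝑃 −1, though its choice of Jordan strings may differ from the one made above):

    from sympy import Matrix

    # matrix of T with respect to the standard basis of C^5
    A = Matrix([[1, 0, 0, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 1, 1, 0, 0],
                [0, 0, 0, 1, 0],
                [0, 0, 0, 1, 1]])

    P, J = A.jordan_form()    # A == P * J * P**-1
    print(J)                  # one 3x3 and one 2x2 Jordan block for the eigenvalue 1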

Let us look at constructing 𝑣 2 from 𝑣 1 in a Jordan string for an eigenvalue 𝜆.


Suppose that 𝑁 (𝑇 − 𝜆𝐼 ) is a proper subspace of 𝑁 (𝑇 − 𝜆𝐼 ) 2, and we have already
chosen a nonzero vector 𝑣 1 from 𝑁 (𝑇 −𝜆𝐼 ). We know that there exist nonzero vectors
in 𝑁 (𝑇 − 𝜆𝐼 ) 2 \ 𝑁 (𝑇 − 𝜆𝐼 ). But there is no guarantee that such a vector 𝑣 2 exists
that may satisfy the equation (𝑇 − 𝜆𝐼 )𝑣 2 = 𝑣 1 . If this happens, then 𝑣 1 ∈ 𝑅(𝑇 − 𝜆𝐼 ).
Conversely, if 𝑣 1 ∈ 𝑅(𝑇 − 𝜆𝐼 ), then such a vector 𝑣 2 exists. This suggests that
for the construction of a Jordan string, we must choose the starting
vector 𝑣 1 from 𝑁 (𝑇 − 𝜆𝐼 ) ∩ 𝑅(𝑇 − 𝜆𝐼 ).
Notice that if 𝑁 (𝑇 − 𝜆𝐼 ) ∩ 𝑅(𝑇 − 𝜆𝐼 ) = {0}, then such a choice is not possible; and
then the Jordan string with starting vector 𝑣 1 ∈ 𝑁 (𝑇 − 𝜆𝐼 ) will have only 𝑣 1 in it.

(5.31) Theorem (Jordan Strings)


Let 𝑇 be a linear operator on a finite dimensional complex vector space 𝑉 . Then
there exists a basis of 𝑉 , which is a disjoint union of Jordan strings for eigenvalues
of 𝑇 . Further, if 𝜆 is an eigenvalue of 𝑇 , then in this basis, the number of Jordan
strings for 𝜆 is equal to the geometric multiplicity of 𝜆.

Proof. We use induction on 𝑛. For 𝑛 = 1, let 𝑣 be an eigenvector associated with


the eigenvalue 𝜆 of 𝑇 . Then 𝑇 𝑣 = 𝜆𝑣. As dim (𝑉 ) = 1, {𝑣 } is a basis of 𝑉 . If 𝑥 ∈ 𝑉 ,
then 𝑥 = 𝛼𝑣 for some scalar 𝛼 . Now,

(𝑇 − 𝜆𝐼 )𝑥 = 𝑇 𝑥 − 𝜆𝑥 = 𝛼𝑇 𝑣 − 𝜆𝛼𝑣 = 𝛼𝜆𝑣 − 𝛼𝜆𝑣 = 0 ≠ 𝑣.

Therefore, the Jordan string with 𝑣 consists of this single vector 𝑣. Since {𝑣 } is a
basis of 𝑉 , the statements of the theorem hold true.
Lay out the induction hypothesis that for all complex vector spaces of dimension
less than 𝑛, the statements are true. Let 𝑇 : 𝑉 → 𝑉 be a linear operator, where
𝑉 is a complex vector space of dimension 𝑛. Let 𝜆 be an eigenvalue of 𝑇 . Then
null(𝑇 − 𝜆𝐼 ) ≥ 1. Write 𝑈 = 𝑅(𝑇 − 𝜆𝐼 ). By the rank nullity theorem, 𝑟 = dim (𝑈 ) =
rank(𝑇 − 𝜆𝐼 ) < 𝑛. If 𝑥 ∈ 𝑈 , then (𝑇 − 𝜆𝐼 )𝑥 ∈ 𝑅(𝑇 − 𝜆𝐼 ) = 𝑈 . Thus, the restriction
of 𝑇 − 𝜆𝐼 to 𝑈 is a linear operator. Call this restriction linear operator as 𝑆. That is,
𝑆 is the linear operator given by

𝑆 : 𝑈 → 𝑈, 𝑆𝑥 = (𝑇 − 𝜆𝐼 )(𝑥), for each 𝑥 ∈ 𝑈 = 𝑅(𝑇 − 𝜆𝐼 ).

We now break the proof into two cases.


Case 1: Assume that 𝑁 (𝑇 − 𝜆𝐼 ) ∩ 𝑈 ≠ {0}. Then there exists a nonzero vector
𝑥 ∈ 𝑈 such that (𝑇 − 𝜆𝐼 )𝑥 = 0. So, 𝑆𝑥 = 0 = 0 𝑥 . Thus 0 is an eigenvalue of 𝑆,
and 𝑠 = null(𝑆) ≥ 1. Notice that 𝑠 is the geometric multiplicity of the eigenvalue
0 of 𝑆. By the induction hypothesis, there exists an ordered basis 𝐸 = {𝑣 1, . . . , 𝑣𝑟 }
of 𝑈 , which is a disjoint union of Jordan strings. Moreover, corresponding to the
eigenvalue 0 of 𝑆, there are 𝑠 number of Jordan strings listed in the basis 𝐸 starting
with a vector from 𝑁 (𝑆).
We first look at any possible nonzero eigenvalue of 𝑆. Let 𝑥 1, . . . , 𝑥 𝑗 be a Jordan
string for a nonzero eigenvalue 𝜇 of 𝑆 listed among 𝑣 1, . . . , 𝑣𝑟 . Then

𝑆𝑥 1 = 𝜇𝑥 1, 𝑆𝑥 2 = 𝑥 1 + 𝜇𝑥 2, . . . , 𝑆𝑥 𝑗 = 𝑥 𝑗−1 + 𝜇𝑥 𝑗 ;
𝑆𝑥 ≠ 𝑥 𝑗 + 𝜇𝑥 for any 𝑥 ∈ 𝑈 . (5.5.1)

As 𝑆 = 𝑇 − 𝜆𝐼 on 𝑈 , it follows that

𝑇 𝑥 1 = (𝜆 + 𝜇)𝑥 1, 𝑇 𝑥 2 = 𝑥 1 + (𝜆 + 𝜇)𝑥 2, . . . ,𝑇 𝑥 𝑗 = 𝑥 𝑗−1 + (𝜆 + 𝜇)𝑥 𝑗 ;


𝑇 𝑥 ≠ 𝑥 𝑗 + (𝜆 + 𝜇)𝑥 for any 𝑥 ∈ 𝑈 .

Look at the last inequality. Can it happen that there exists a vector 𝑦 ∈ 𝑉
such that 𝑇𝑦 = 𝑥 𝑗 + (𝜆 + 𝜇)𝑦? If it so happens, then (𝑇 − 𝜆𝐼 )𝑦 = 𝑥 𝑗 + 𝜇𝑦. Then
𝑥 𝑗 + 𝜇𝑦 ∈ 𝑅(𝑇 − 𝜆𝐼 ) = 𝑈 . But 𝑥 𝑗 ∈ 𝑈 . Therefore, 𝑦 ∈ 𝑈 . This will contradict (5.5.1).
Hence the last inequality is replaced by

𝑇 𝑥 ≠ 𝑥 𝑗 + (𝜆 + 𝜇)𝑥 for any 𝑥 ∈ 𝑉 .

And, we conclude that any Jordan string for an eigenvalue 𝜇 ≠ 0 of 𝑆 listed among
𝑣 1, . . . , 𝑣𝑟 is a Jordan string for an eigenvalue 𝜆 + 𝜇 of 𝑇 .
Next, we look at any Jordan string for the eigenvalue 0. Any such Jordan string
looks like a list of vectors 𝑢 1, . . . , 𝑢𝑘 in 𝑈 with

𝑆𝑢 1 = 0, 𝑆𝑢 2 = 𝑢 1, . . . , 𝑆𝑢𝑘 = 𝑢𝑘−1 ; 𝑆𝑥 ≠ 𝑢𝑘 for any 𝑥 ∈ 𝑈 . (5.5.2)

The vectors 𝑢𝑖 are from the set 𝑣 1, . . . , 𝑣𝑟 . Then (5.5.2) implies that

𝑇𝑢 1 = 𝜆𝑢 1, 𝑇𝑢 2 = 𝑢 1 + 𝜆𝑢 2, . . . ,𝑇𝑢𝑘 = 𝑢𝑘−1 + 𝜆𝑢𝑘 .

Since 𝑢𝑘 ∈ 𝑈 = 𝑅(𝑇 − 𝜆𝐼 ), there exists a vector 𝑢𝑘+1 ∈ 𝑉 such that

(𝑇 − 𝜆𝐼 )𝑢𝑘+1 = 𝑢𝑘 .

That is, 𝑇𝑢𝑘+1 = 𝑢𝑘 + 𝜆𝑢𝑘+1 . Further, if for some 𝑥 ∈ 𝑉 , (𝑇 − 𝜆𝐼 )𝑥 = 𝑢𝑘+1, then


𝑢𝑘+1 ∈ 𝑅(𝑇 − 𝜆𝐼 ) = 𝑈 . Then 𝑆𝑢𝑘+1 = (𝑇 − 𝜆𝐼 )(𝑢𝑘+1 ) = 𝑢𝑘 contradicts the inequality
in (5.5.2). Hence
𝑇 𝑥 ≠ 𝑢𝑘+1 + 𝜆𝑥 for any 𝑥 ∈ 𝑉 .

Therefore, we have an enlarged Jordan string 𝑢 1, . . . , 𝑢𝑘 , 𝑢𝑘+1 for the eigenvalue 𝜆


of 𝑇 . Observe that such a vector 𝑢𝑘+1 need not be in 𝑁 (𝑇 − 𝜆𝐼 ). This way, starting
with each Jordan string of the eigenvalue 0 of 𝑆, we end up with an enlarged Jordan
string for the eigenvalue 𝜆 of 𝑇 ; and they are 𝑠 in number.
Each of these enlarged Jordan strings has length one more than the corresponding
one listed in 𝑣 1, . . . , 𝑣𝑟 , where one vector has been added to each Jordan string, at the
end. Let 𝑤 1, . . . , 𝑤𝑠 be the added vectors. The set {𝑣 1, . . . , 𝑣𝑟 , 𝑤 1, . . . , 𝑤𝑠 } is now a
disjoint union of Jordan strings with 𝑠 number of Jordan strings for the eigenvalue 𝜆
of 𝑇 . These 𝑠 number of Jordan strings start with a vector from 𝑁 (𝑇 − 𝜆𝐼 ) ∩ 𝑈 and
end with the vectors 𝑤 1, . . . , 𝑤𝑠 . And the other Jordan strings of nonzero eigenvalues
of 𝑆 are kept as they are, accounting for Jordan strings of eigenvalues of 𝑇 other
than 𝜆.
The starting vectors of these 𝑠 number of enlarged Jordan strings form a basis for
𝑁 (𝑇 − 𝜆𝐼 ) ∩𝑈 . Extend the set of these starting vectors to a basis of 𝑁 (𝑇 − 𝜆𝐼 ). Since
null(𝑇 − 𝜆𝐼 ) = 𝑛 − 𝑟, we obtain 𝑛 − 𝑟 − 𝑠 number of linearly independent vectors
𝑧 1, . . . , 𝑧𝑛−𝑟 −𝑠 ∈ 𝑁 (𝑇 − 𝜆𝐼 ) \ 𝑈 so that these vectors and the starting vectors of the
enlarged Jordan strings form a basis for 𝑁 (𝑇 − 𝜆𝐼 ). Notice that if 𝑛 = 𝑟 + 𝑠, then
we do not need any such 𝑧𝑖 . Further, (𝑇 − 𝜆𝐼 )𝑥 ≠ 𝑧𝑖 for any 𝑥 ∈ 𝑉 , since otherwise,
𝑧𝑖 would be a vector in 𝑈 . Therefore, each of these vectors 𝑧𝑖 is a Jordan string of
length 1 for the eigenvalue 𝜆 of 𝑇 .
Now, the set 𝐶 = {𝑣 1, . . . , 𝑣𝑟 , 𝑤 1, . . . , 𝑤𝑠 , 𝑧 1, . . . , 𝑧𝑛−𝑟 −𝑠 } is a disjoint union of
Jordan strings for the eigenvalues of 𝑇 . We claim that 𝐶 is linearly independent. To
prove this, suppose that for scalars 𝛼𝑖 , 𝛽 𝑗 and 𝛾 ℓ ,

𝛼 1𝑣 1 + · · · + 𝛼𝑟 𝑣𝑟 + 𝛽 1𝑤 1 + · · · + 𝛽𝑠 𝑤𝑠 + 𝛾 1𝑧 1 + · · · + 𝛾𝑛−𝑟 −𝑠 𝑧𝑛−𝑟 −𝑠 = 0. (5.5.3)

Apply 𝑇 − 𝜆𝐼 to both the sides. Since (𝑇 − 𝜆𝐼 )𝑧 ℓ = 0 for each ℓ, we have

𝛼 1 (𝑇 − 𝜆𝐼 )𝑣 1 + · · · + 𝛼𝑟 (𝑇 − 𝜆𝐼 )𝑣𝑟 + 𝛽 1 (𝑇 − 𝜆𝐼 )𝑤 1 + · · · + 𝛽𝑠 (𝑇 − 𝜆𝐼 )𝑤𝑠 = 0.

We look at the effect of applying (𝑇 − 𝜆𝐼 ) in three stages. First, look at the starting
vectors of the Jordan strings. They are from 𝑁 (𝑇 − 𝜆𝐼 ). If 𝑣 is a such a vector, then
(𝑇 − 𝜆𝐼 )𝑣 = 0. Thus, all starting vectors of the Jordan strings vanish from the sum.
Second, look at all other 𝑣𝑖 in the Jordan strings. Since (𝑇 − 𝜆𝐼 )𝑣𝑖 = 𝑣𝑖−1, each
such 𝑣𝑖 in the Jordan string will have coefficient as 𝛼𝑖+1 in the sum instead of the
previous 𝛼𝑖 . This will reintroduce the starting vectors of the Jordan strings with
updated coefficients. Further, the vectors with which Jordan strings end are absent
in the sum.
Third, look at (𝑇 − 𝜆𝐼 )𝑤 𝑗 . Each 𝑤 𝑗 is the last vector in an enlarged Jordan string.
Thus 𝑤 𝑗 = (𝑇 − 𝜆𝐼 )𝑣 𝑝 for some 𝑝; this 𝑣 𝑝 is the vector with which a Jordan string
ends. Thus these vectors 𝑣 𝑝 are reintroduced in the sum with coefficients as 𝛽 𝑗 .
Further, the vectors 𝑤 𝑗 are absent in the sum.
Therefore, the simplified sum is a linear combination of all 𝑣𝑖 with updated coef-
ficients where vectors 𝑣 𝑝 with which a Jordan string ends has the coefficient 𝛽 𝑗 of
the corresponding next vector 𝑤 𝑗 . In this sum all 𝛼s do not occur, but all 𝛽s occur
as coefficients of these vectors 𝑣 𝑝 .
For instance, if the list 𝑣 1, 𝑣 2, 𝑣 3, 𝑤 1 is an enlarged Jordan string, and another
Jordan string starts from 𝑣 4, then after applying (𝑇 − 𝜆𝐼 ) the part for 𝑣 1, 𝑣 2, 𝑣 3, 𝑤 1 in
the sum is

𝛼 1 (𝑇 − 𝜆𝐼 )𝑣 1 + 𝛼 2 (𝑇 − 𝜆𝐼 )𝑣 2 + 𝛼 3 (𝑇 − 𝜆𝐼 )𝑣 3 + 𝛽 1 (𝑇 − 𝜆𝐼 )𝑤 1 .

Since (𝑇 − 𝜆𝐼 )𝑣 1 = 0, (𝑇 − 𝜆𝐼 )𝑣 2 = 𝑣 1, (𝑇 − 𝜆𝐼 )𝑣 3 = 𝑣 2 and (𝑇 − 𝜆𝐼 )𝑤 1 = 𝑣 3, this


portion in the simplified sum would look like

𝛼 2𝑣 1 + 𝛼 3𝑣 2 + 𝛽 1𝑣 3 .

Coming back to the proof, we have shown that {𝑣 1, . . . , 𝑣𝑟 } is linearly independent.


Thus, all scalars in the simplified sum are 0. In particular, each 𝛽 𝑗 is 0. Then (5.5.3)
simplifies to
𝛼 1𝑣 1 + · · · + 𝛼𝑟 𝑣𝑟 = −𝛾 1𝑧 1 − · · · − 𝛾𝑛−𝑟 −𝑠 𝑧𝑛−𝑟 −𝑠 .
Here, the left hand side is a vector in 𝑈 and the right hand side is a vector in
span (𝑁 (𝑇 − 𝜆𝐼 ) \𝑈 ). Since 𝑈 ∩ span (𝑁 (𝑇 − 𝜆𝐼 ) \𝑈 ) = {0}, the vector on each side
of the above equation is 0. However, each of the sets {𝑣 1, . . . , 𝑣𝑟 } and {𝑧 1, . . . , 𝑧𝑛−𝑟 −𝑠 }
is linearly independent. We conclude that each 𝛼𝑖 is equal to zero, and each 𝛾 ℓ is
equal to 0. This proves our claim.
Now the set 𝐵 := {𝑣 1, . . . , 𝑣𝑟 , 𝑤 1, . . . , 𝑤𝑠 , 𝑧 1, . . . , 𝑧𝑛−𝑟 −𝑠 } is a linearly independent
set having exactly 𝑛 vectors. So, it is a basis for 𝑉 , which is a disjoint union of
Jordan strings of eigenvalues of 𝑇 .
Case 2: Suppose that 𝑁 (𝑇 − 𝜆𝐼 ) ∩ 𝑈 = {0}. Then there exists no nonzero vector
in 𝑈 such that (𝑇 − 𝜆𝐼 )𝑥 = 0; consequently, 0 is not an eigenvalue of 𝑆. But 𝜆 is an
eigenvalue of 𝑇 anyway. Going through the proof in the first case, we find that each
Jordan string among the vectors 𝑣 1, . . . , 𝑣𝑟 is a Jordan string for a nonzero eigenvalue
of 𝑆. The starting vectors of these Jordan strings are not from 𝑁 (𝑇 − 𝜆𝐼 ). Thus, we
do not need to enlarge the set of vectors 𝑣 1, . . . , 𝑣𝑟 by adjoining any 𝑤 𝑗 . We rather
take {𝑧 1, . . . , 𝑧𝑛−𝑟 } as a basis for 𝑁 (𝑇 − 𝜆𝐼 ). Each vector 𝑧 𝑗 is a Jordan string on its
own. Essentially, in the previous construction, we take 𝑠 = 0. The proof of the first
case holds for this case also.
This completes the proof of the first statement. For the second statement about
the number of Jordan strings, notice that each 𝑧 ℓ ∈ 𝑁 (𝑇 −𝜆𝐼 ) is itself a Jordan string

of length 1 for the eigenvalue 𝜆 of 𝑇 . (Such a situation arises when 𝑛 > 𝑟 + 𝑠.) Thus
for the eigenvalue 𝜆, we have in total 𝑛 − 𝑟 − 𝑠 + 𝑠 = 𝑛 − 𝑟 number of Jordan strings
in 𝐵. Since 𝑛 − 𝑟 = null(𝑇 − 𝜆𝐼 ), the number of Jordan strings for the eigenvalue
𝜆 of 𝑇 is the geometric multiplicity of 𝜆. Jordan strings for other eigenvalues of 𝑇
remain unchanged.
The inductive construction in the proof of (5.31) goes as follows. Let 𝜆 be an
eigenvalue of 𝑇 : 𝑉 → 𝑉 . Write 𝑈 := 𝑅(𝑇 − 𝜆𝐼 ); and define the linear operator
𝑆 : 𝑈 → 𝑈 as the restriction of 𝑇 − 𝜆𝐼 to 𝑈 . Without loss of generality, suppose
the first 𝑠 Jordan strings (each Jordan string in each line on the left array below)
correspond to the eigenvalue 0 of 𝑆; and other Jordan strings may correspond to
other eigenvalues of 𝑆.
When 0 is not an eigenvalue of 𝑆, the number 𝑠 is equal to 0. If one of the remaining
Jordan strings corresponds to an eigenvalue 𝜇 − 𝜆 of 𝑆, then it also corresponds to
the eigenvalue 𝜇 of 𝑇 .
In the arrays below, 𝑛𝑖 denotes the length of the 𝑖th Jordan string; so 𝑛 1 + · · · + 𝑛𝑘 =
𝑟 = rank(𝑇 − 𝜆𝐼 ) = dim (𝑈 ). The first vectors of the first 𝑠
Jordan strings are in 𝑁 (𝑆). That is,

𝑣 11, . . . , 𝑣𝑠1 ∈ 𝑁 (𝑇 − 𝜆𝐼 ) ∩ 𝑅(𝑇 − 𝜆𝐼 ).

Next, we construct 𝑤𝑖 s such that

(𝑇 − 𝜆𝐼 )𝑤𝑖 = 𝑣𝑖𝑛𝑖 for 1 ≤ 𝑖 ≤ 𝑠.

Also we construct a basis {𝑧 1, . . . , 𝑧𝑛−𝑟 −𝑠 } for span (𝑁 (𝑇 − 𝜆𝐼 ) \ 𝑅(𝑇 − 𝜆𝐼 )). The


enlarged basis for 𝑉 consisting of Jordan strings of eigenvalues of 𝑇 may be arranged
as the array on the right:

    𝑣 11, 𝑣 12, · · · , 𝑣 1𝑛1                          𝑣 11, 𝑣 12, · · · , 𝑣 1𝑛1 , 𝑤 1
      ..                                                ..
    𝑣𝑠1, 𝑣𝑠2, · · · , 𝑣𝑠𝑛𝑠        enlarged to         𝑣𝑠1, 𝑣𝑠2, · · · , 𝑣𝑠𝑛𝑠 , 𝑤𝑠
      ..                                                ..
    𝑣𝑘1, 𝑣𝑘2, · · · , 𝑣𝑘𝑛𝑘                            𝑣𝑘1, 𝑣𝑘2, · · · , 𝑣𝑘𝑛𝑘
                                                      𝑧 1
                                                        ..
                                                      𝑧𝑛−𝑟 −𝑠

With respect to a basis that consists of disjoint union of Jordan strings, how does
a matrix representation of a linear operator look like?
Let 𝑇 be a linear operator on a complex vector space 𝑉 of dimension 𝑛 having
distinct eigenvalues 𝜆1, . . . , 𝜆𝑘 , whose geometric multiplicities are 𝑠 1, . . . , 𝑠𝑘 , and
algebraic multiplicities 𝑚 1, . . . , 𝑚𝑘 , respectively. Then we have a basis in which
there are 𝑠𝑖 number of Jordan strings for 𝜆𝑖 . We choose these 𝑠𝑖 Jordan strings in
some order, so that we may talk of first Jordan string for 𝜆𝑖 , second Jordan string for
𝜆𝑖 , and so on. We also take the eigenvalues in some order, say, 𝜆𝑖 is the 𝑖th eigenvalue.
The vectors in any Jordan string are already ordered. Then we construct an ordered
basis from these Jordan strings by listing all vectors (in order) in the first Jordan
string for 𝜆1 ; next, the vectors from the second Jordan string for 𝜆1, and so on. This
completes the list of 𝑚 1 vectors for 𝜆1 . Next, we list all vectors in order from the
first Jordan string for 𝜆2, and so on. After the list is complete we obtain an ordered
basis 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } of 𝑉 . Notice that for any 𝑣 𝑗 in this basis, we have

𝑇 (𝑣 𝑗 ) = 𝜆𝑣 𝑗 or 𝑇 (𝑣 𝑗 ) = 𝑎𝑣 𝑗−1 + 𝜆𝑣 𝑗

for an eigenvalue 𝜆 of 𝑇 with 𝑎 ∈ {0, 1}. For instance, if the first Jordan string for
𝜆1 is of length ℓ, then

𝑇 (𝑣 1 ) = 𝜆1𝑣 1, 𝑇 (𝑣 2 ) = 𝑣 1 + 𝜆1𝑣 2, . . . ,𝑇 (𝑣 ℓ ) = 𝑣 ℓ−1 + 𝜆1𝑣 ℓ ,

and after this the next Jordan string starts. In the matrix representation of 𝑇 with
respect to 𝐵, this will correspond to a block of order ℓ having diagonal entries as 𝜆1
and super diagonal entries (entries above the diagonal) as 1.
Then the next Jordan string will give rise to another block of diagonal entries 𝜆1
and super diagonal entries as 1. This block will have the order as the length of the
Jordan string. This way, when all the Jordan strings for 𝜆1 are complete, another
similar block will start with a Jordan string for 𝜆2, and so on.
With respect to this basis 𝐵, the linear operator 𝑇 will have the matrix represen-
tation in the form
[𝑇 ]𝐵,𝐵 = diag (𝐽1, 𝐽2, . . . , 𝐽𝑘 ),                                (5.5.4)
where each 𝐽𝑖 is again a block diagonal matrix looking like

𝐽𝑖 = diag ( 𝐽˜1 (𝜆𝑖 ), 𝐽˜2 (𝜆𝑖 ), . . . , 𝐽˜𝑠𝑖 (𝜆𝑖 )),

with 𝑠𝑖 as the geometric multiplicity of the eigenvalue 𝜆𝑖 . Each matrix 𝐽˜𝑗 (𝜆𝑖 ) here has
the form
              [ 𝜆𝑖   1              ]
              [      𝜆𝑖   1         ]
𝐽˜𝑗 (𝜆𝑖 ) =    [          ⋱    ⋱     ]
              [               𝜆𝑖  1 ]
              [                   𝜆𝑖 ]
The missing entries are all 0. Such a matrix 𝐽˜𝑗 (𝜆𝑖 ) is called a Jordan block with
diagonal entries 𝜆𝑖 . The order of this Jordan block is the length of the corresponding
Jordan string for the eigenvalue 𝜆𝑖 . In the matrix [𝑇 ]𝐵,𝐵 , the number of Jordan blocks
with diagonal entries 𝜆𝑖 is the number of Jordan strings for the eigenvalue 𝜆𝑖 , which
is equal to the geometric multiplicity of the eigenvalue 𝜆𝑖 . Any matrix which is in
the block diagonal form (5.5.4) is said to be in Jordan form. Using (5.31) we
obtain the following result.

(5.32) Theorem (Jordan form)


Let 𝑇 be a linear operator on a finite dimensional complex vector space 𝑉 . Then
there exists a basis 𝐵 for 𝑉 such that [𝑇 ]𝐵,𝐵 is in Jordan form, whose diagonal entries
are the eigenvalues of 𝑇 . The number of Jordan blocks in [𝑇 ]𝐵,𝐵 with diagonal entry
𝜆 is the geometric multiplicity of 𝜆. Moreover, the number 𝑟𝑘 (𝜆) of Jordan blocks of
order 𝑘 with diagonal entry 𝜆, is given by
𝑟𝑘 (𝜆) = rank((𝑇 − 𝜆𝐼 )𝑘−1 ) − 2 rank((𝑇 − 𝜆𝐼 )𝑘 ) + rank((𝑇 − 𝜆𝐼 )𝑘+1 )        (5.5.5)
for 1 ≤ 𝑘 ≤ 𝑛. Thus, the Jordan form of 𝑇 is unique up to a permutation of the
blocks.

In the formula for 𝑟𝑘 (𝜆), we use the convention that for any matrix 𝐵 of order 𝑛,
𝐵0 is the identity matrix of order 𝑛.
Proof. Existence of Jordan form and the statement about the number of Jordan
blocks with diagonal entry as 𝜆 follow from (5.31).
For the formula for 𝑟𝑘 (𝜆), let 𝜆 be an eigenvalue of 𝑇 . Write 𝐽 := [𝑇 ]𝐵,𝐵 . Suppose
1 ≤ 𝑘 ≤ 𝑛. Observe that [𝑇 − 𝜆𝐼 ]𝐵,𝐵 = 𝐽 − 𝜆𝐼 . From (4.25) it follows that for each 𝑖,
rank((𝑇 − 𝜆𝐼 )𝑖 ) = rank((𝐽 − 𝜆𝐼 )𝑖 ). Therefore, it is enough to prove the formula for
𝐽 instead of 𝑇 .
We use induction on 𝑛. In the basis case, 𝐽 = [𝜆]. Here, 𝑘 = 1; 𝑟𝑘 (𝜆) = 𝑟 1 (𝜆) = 1.
On the right hand side, due to the convention,
(𝐽 − 𝜆𝐼 )𝑘−1 = 𝐼 = [1], (𝐽 − 𝜆𝐼 )𝑘 = [0] 1 = [0], (𝐽 − 𝜆𝐼 )𝑘+1 = [0] 2 = [0].
So, the formula holds for 𝑛 = 1.
Lay out the induction hypothesis that for all matrices in Jordan form of order less
than 𝑛, the formula holds. Let 𝐽 be a matrix of order 𝑛, which is in Jordan form. We
consider two cases.
Case 1: Let 𝐽 have a single Jordan block corresponding to 𝜆. That is,
      [ 𝜆  1            ]                 [ 0  1            ]
      [    𝜆  1         ]                 [    0  1         ]
𝐽 =   [       ⋱   ⋱     ] ,   𝐽 − 𝜆𝐼 =    [       ⋱   ⋱     ] .
      [           𝜆  1  ]                 [           0  1  ]
      [              𝜆  ]                 [              0  ]
Here 𝑟 1 (𝜆) = 0, 𝑟 2 (𝜆) = 0, . . . , 𝑟𝑛−1 (𝜆) = 0 and 𝑟𝑛 (𝜆) = 1. By direct computation,
we see that (𝐽 −𝜆𝐼 ) 2 has 1 on the super-super-diagonal, and 0 elsewhere. Proceeding
similarly for higher powers of 𝐽 − 𝜆𝐼, we obtain

rank(𝐽 − 𝜆𝐼 ) = 𝑛 − 1,  rank((𝐽 − 𝜆𝐼 ) 2 ) = 𝑛 − 2,  . . . ,  rank((𝐽 − 𝜆𝐼 )𝑖 ) = 𝑛 − 𝑖,  . . . ,
rank((𝐽 − 𝜆𝐼 )𝑛 ) = 0,  rank((𝐽 − 𝜆𝐼 )𝑛+1 ) = 0,  . . . .

Then for 𝑘 < 𝑛, rank((𝐽 − 𝜆𝐼 )𝑘−1 ) − 2 rank((𝐽 − 𝜆𝐼 )𝑘 ) + rank((𝐽 − 𝜆𝐼 )𝑘+1 )


= (𝑛 − (𝑘 − 1)) − 2(𝑛 − 𝑘) + (𝑛 − 𝑘 − 1) = 0.

And for 𝑘 = 𝑛, rank((𝐽 − 𝜆𝐼 )𝑘−1 ) − 2 rank((𝐽 − 𝜆𝐼 )𝑘 ) + rank((𝐽 − 𝜆𝐼 )𝑘+1 )


= (𝑛 − (𝑛 − 1)) − 2 × 0 + 0 = 1 = 𝑟𝑛 (𝜆).

Case 2: Suppose 𝐽 has more than one Jordan block corresponding to 𝜆. By reordering
of blocks, we assume that the first Jordan block in 𝐽 corresponds to 𝜆 and has order
𝑟 < 𝑛. Then 𝐽 − 𝜆𝐼 can be written in block form as
 
            [ 𝐶  0 ]
𝐽 − 𝜆𝐼 =    [ 0  𝐷 ] ,

where 𝐶 is the Jordan block of order 𝑟 with diagonal entries as 0, and 𝐷 is the matrix
of order 𝑛 − 𝑟 consisting of other blocks of 𝐽 − 𝜆𝐼 . Then, for any 𝑗,
                 [ 𝐶 𝑗   0   ]
(𝐽 − 𝜆𝐼 ) 𝑗 =     [ 0    𝐷 𝑗  ] .
Therefore,
rank((𝐽 − 𝜆𝐼 ) 𝑗 ) = rank(𝐶 𝑗 ) + rank(𝐷 𝑗 ).
Write 𝑟𝑘 (𝜆, 𝐶) and 𝑟𝑘 (𝜆, 𝐷) for the number of Jordan blocks of order 𝑘 for the
eigenvalue 𝜆 that appear in 𝐶 and in 𝐷, respectively. Then

𝑟𝑘 (𝜆) = 𝑟𝑘 (𝜆, 𝐶) + 𝑟𝑘 (𝜆, 𝐷).

By the induction hypothesis,

𝑟𝑘 (𝜆, 𝐶) = rank(𝐶 𝑘−1 ) − 2 rank(𝐶 𝑘 ) + rank(𝐶 𝑘+1 ),
𝑟𝑘 (𝜆, 𝐷) = rank(𝐷 𝑘−1 ) − 2 rank(𝐷 𝑘 ) + rank(𝐷 𝑘+1 ).

It then follows that

𝑟𝑘 (𝜆) = rank((𝐽 − 𝜆𝐼 )𝑘−1 ) − 2 rank((𝐽 − 𝜆𝐼 )𝑘 ) + rank((𝐽 − 𝜆𝐼 )𝑘+1 ).


Since the number of Jordan blocks of order 𝑘 corresponding to each eigenvalue of


𝑇 is uniquely determined, the Jordan form of 𝑇 is also uniquely determined up to a
permutation of blocks.
Taking 𝑉 as C𝑛×1, we obtain the matrix version of (5.32).

(5.33) Theorem (Jordan form)


Each matrix 𝐴 ∈ C𝑛×𝑛 is similar to a matrix in Jordan form whose diagonal entries
are the eigenvalues of 𝐴. In a Jordan form, the number of Jordan blocks with
diagonal entry 𝜆 is the geometric multiplicity of 𝜆. The number 𝑟𝑘 (𝜆) of Jordan
blocks of order 𝑘 with diagonal entry 𝜆, is given by

𝑟𝑘 (𝜆) = rank((𝐴 − 𝜆𝐼 )𝑘−1 ) − 2 rank((𝐴 − 𝜆𝐼 )𝑘 ) + rank((𝐴 − 𝜆𝐼 )𝑘+1 )

for 1 ≤ 𝑘 ≤ 𝑛. The Jordan form of 𝐴 is unique up to a permutation of Jordan blocks.

Notice that using the rank-nullity theorem, the number 𝑟𝑘 can also be given as in the
following:

𝑟𝑘 (𝜆) = 2 null((𝐽 − 𝜆𝐼 )𝑘 ) − null((𝐽 − 𝜆𝐼 )𝑘−1 ) − null((𝐽 − 𝜆𝐼 )𝑘+1 ).        (5.5.6)

To obtain a Jordan form of a given matrix, we may construct a basis consisting of


Jordan strings. Alternatively, we may use the formulas for 𝑟𝑘 given in (5.5.5)-(5.5.6).

(5.34) Example
Consider the matrix 𝐴 of (5.30):

        [ 1 0 0 0 0 ]
        [ 1 1 0 0 0 ]
    𝐴 = [ 0 1 1 0 0 ] .
        [ 0 0 0 1 0 ]
        [ 0 0 0 1 1 ]
There, we had constructed the Jordan strings

𝑣 1 = (0, 0, 1, 0, 0) t, 𝑣 2 = (0, 1, 1, 0, 0) t, 𝑣 3 = (1, 1, 1, 0, 0) t ;


𝑤 1 = (0, 0, 0, 0, 1) t, 𝑤 2 = (0, 0, 1, 1, 0) t .

In the ordered basis {𝑣 1, 𝑣 2, 𝑣 3, 𝑤 1, 𝑤 2 }, the matrix of 𝐴 is given by 𝐽 = 𝑃 −1𝐴𝑃 with


𝑃 as the matrix whose columns are these basis vectors. That is, the Jordan form of
𝐴 is

        [ 0 0 1 0 0 ] −1 [ 1 0 0 0 0 ] [ 0 0 1 0 0 ]   [ 1 1 0 0 0 ]
        [ 0 1 1 0 0 ]    [ 1 1 0 0 0 ] [ 0 1 1 0 0 ]   [ 0 1 1 0 0 ]
    𝐽 = [ 1 1 1 0 1 ]    [ 0 1 1 0 0 ] [ 1 1 1 0 1 ] = [ 0 0 1 0 0 ] .
        [ 0 0 0 0 1 ]    [ 0 0 0 1 0 ] [ 0 0 0 0 1 ]   [ 0 0 0 1 1 ]
        [ 0 0 0 1 0 ]    [ 0 0 0 1 1 ] [ 0 0 0 1 0 ]   [ 0 0 0 0 1 ]
  
Alternatively, we compute the ranks of successive powers of 𝐴 − 𝜆𝐼 . Notice that


the only eigenvalue of 𝐴 is 1. We find that

rank(𝐴 − 𝐼 ) = 3, rank((𝐴 − 𝐼 ) 2 ) = 1, rank((𝐴 − 𝐼 )𝑖 ) = 0 for 𝑖 ≥ 3.

Then

𝑟 1 (1) = 5 − 2 × 3 + 1 = 0, 𝑟 2 (1) = 3 − 2 × 1 + 0 = 1
𝑟 3 (1) = 1 − 2 × 0 + 0 = 1, 𝑟 4 (1) = 0 = 𝑟 5 (1).

It says that in the Jordan form, there is one Jordan block of size 2 and one Jordan
block of size 3 with the sole eigenvalue 1. That gives the Jordan form as obtained
earlier, up to a permutation of the blocks.
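For a quick numerical check of such a computation, the counts in (5.5.5) can be evaluated
with a few lines of Python. The sketch below is not part of the notes; it assumes the
numpy library and simply recomputes 𝑟𝑘 (1) for the matrix of this example.

    import numpy as np

    A = np.array([[1, 0, 0, 0, 0],
                  [1, 1, 0, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 0, 0, 1, 0],
                  [0, 0, 0, 1, 1]], dtype=float)
    lam, n = 1.0, A.shape[0]
    N = A - lam * np.eye(n)

    def rk(j):
        # rank((A - lam*I)^j); by convention the 0th power is the identity, of rank n
        return np.linalg.matrix_rank(np.linalg.matrix_power(N, j))

    for k in range(1, n + 1):
        print(k, rk(k - 1) - 2 * rk(k) + rk(k + 1))
    # prints r_1 = 0, r_2 = 1, r_3 = 1, r_4 = 0, r_5 = 0, as computed above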

In the Jordan strings, observe that the vectors used are from 𝑁 ((𝑇 − 𝜆𝐼 ) 𝑗 ) for various 𝑗 .
Such a vector is called a generalized eigenvector corresponding to the eigenvalue 𝜆 of
𝑇 . For a matrix 𝐴, the similarity matrix 𝑃 in 𝐽 = 𝑃 −1𝐴𝑃 has as its columns the
vectors of the basis in which 𝐽 represents the matrix 𝐴. These vectors are specific
generalized eigenvectors of 𝐴.
The uniqueness of a Jordan form can be made exact by first ordering the eigen-
values of 𝐴 and then arranging the blocks corresponding to each eigenvalue, which
are now together, in some order, say in ascending order of their size. In doing so,
the Jordan form of any matrix becomes unique. Such a Jordan form is called the
Jordan canonical form of a matrix. It then follows that if two matrices are similar,
then they have the same Jordan canonical form. Moreover, uniqueness also implies
that two dissimilar matrices will have different Jordan canonical forms. Therefore,
Jordan form characterizes similarity of matrices. It implies the following:
Two square matrices 𝐴 and 𝐵 of the same order are similar iff they
have the same eigenvalues, and for each eigenvalue 𝜆, for each 𝑗 ∈ N,
rank((𝐴 − 𝜆𝐼 ) 𝑗 ) = rank((𝐵 − 𝜆𝐼 ) 𝑗 ).
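This criterion can be tried out numerically. The following sketch is only an illustration
(numpy assumed); it is reliable only for small matrices with exactly representable entries,
since it relies on rounded eigenvalues and numerical ranks.

    import numpy as np

    def similar(A, B, decimals=8):
        # Compare eigenvalues (with multiplicities) and the ranks of all
        # powers (A - lam*I)^j and (B - lam*I)^j, as in the criterion above.
        n = A.shape[0]
        if B.shape != A.shape:
            return False
        ev = lambda M: sorted(np.round(np.linalg.eigvals(M), decimals),
                              key=lambda z: (z.real, z.imag))
        if not np.allclose(ev(A), ev(B)):
            return False
        rank_pow = lambda M, lam, j: np.linalg.matrix_rank(
            np.linalg.matrix_power(M - lam * np.eye(n), j))
        return all(rank_pow(A, lam, j) == rank_pow(B, lam, j)
                   for lam in set(ev(A)) for j in range(1, n + 1))

    J = np.array([[2., 1.], [0., 2.]])
    print(similar(J, J.T))   # True: a Jordan block is similar to its transpose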
As an application of Jordan form, we will show that each matrix is similar to its
transpose. Let 𝐴 ∈ C𝑛×𝑛 . We know that a scalar 𝜆 is an eigenvalue of 𝐴 iff it is an
eigenvalue of 𝐴t . Further, rank(𝐴t ) = rank(𝐴). Thus, for any eigenvalue 𝜆 of 𝐴 and
for any 𝑗, we have rank((𝐴 − 𝜆𝐼 ) 𝑗 ) = rank((𝐴t − 𝜆𝐼 ) 𝑗 ). Consequently, 𝐴 and 𝐴t are
similar.
It also follows from the Jordan form that if 𝑚 is the algebraic multiplicity of
an eigenvalue 𝜆, then one can always choose 𝑚 linearly independent generalized
eigenvectors; see Exercise 7 of Section 5.3. Further, the following is guaranteed
(Exercise 8):
If the linear system (𝐴 − 𝜆𝐼 )𝑘 𝑥 = 0 has 𝑟 < 𝑚 linearly independent solutions,
then (𝐴 − 𝜆𝐼 )𝑘+1𝑥 = 0 has at least 𝑟 + 1 linearly independent solutions.
Also, (𝐴 − 𝜆𝐼 )𝑚 𝑥 = 0 has 𝑚 linearly independent solutions.
This result is often more useful for computing the exponential of a matrix than explicitly
using the Jordan form, which is comparatively difficult to compute.

Exercises for § 5.5


1. In (5.30), explore the other choices for 𝑣 2, 𝑣 3, and 𝑤 2 . Construct the corre-
sponding Jordan strings.
Ans: 𝑣 1 as given. 𝑣 2 = (0, 1, 1, 0, 0), 𝑣 3 = (1, 1, 0, 0, 1); 𝑣 2 = (0, 1, 0, 0, 1),
𝑣 3 = (1, 0, 1, 1, 0); 𝑣 2 = (0, 1, 0, 0, 1), 𝑣 3 = (1, 0, 0, 1, 1); 𝑤 1 as given.
𝑤 2 = (0, 0, 0, 1, 1).
2. Let 𝐴 ∈ F𝑛×𝑛 . Let 𝐵 = 𝑃𝐴𝑃 −1 for an invertible matrix 𝑃 ∈ F𝑛×𝑛 . Show that if
{𝑣 1, . . . , 𝑣𝑚 } is a basis of 𝑁 (𝐴), then {𝑃𝑣 1, . . . , 𝑃𝑣𝑚 } is a basis of 𝑁 (𝐵).
(This would prove directly that null(𝐵) = null(𝐴). Then rank(𝐵) = rank(𝐴).)
3. Determine the Jordan forms of the following matrices:
0 0 0 −2 −1 −3  −7 8 2 
     
(a) 1 0 0 (b)  4 3 3 (c)  −4 5 1  .
2 1 0  −2 1 −1  −23 21 7 
     
0 1 0  −4 0 0  2 1 0
     
Ans: (a) 0 0 1 . (b)  0 2 1  . (c)  0 2 0  .
0 0 0  0 0 2 0 0 1
     
4. Determine the matrix 𝑃 ∈ C3×3 such that 𝑃 −1𝐴𝑃 is in Jordan form, where 𝐴
is the matrix in Exercise 3(b).
 1 1 0
 
Ans: 𝑃 =  −1 −1 −1  .
 1 −1 0 
 
5. Let 𝐴 be the 5 × 5 matrix whose first row is (0, 1, 1, 0, 1), the second row is
(0, 0, 1, 1, 1),
 and all other rows
 are zero rows. Find the Jordan form of 𝐴.
Ans: 𝐵 = [ 0 1 0 ; 0 0 1 ; 0 0 0 ],  𝐽 = [ 𝐵 0 ; 0 0 ].
6. Let 𝐴 be a 7 × 7 matrix with 𝜒𝐴 (𝑡) = (𝑡 − 2) 4 (3 − 𝑡) 3 . Suppose that in the
Jordan form of 𝐴, the largest block for each of the eigenvalues is of order
2. Show that there are only two possible Jordan canonical forms of 𝐴; and
determine those Jordan forms.
Ans: diag ( [ 2 1 ; 0 2 ], [ 2 1 ; 0 2 ], [ 3 1 ; 0 3 ], 3 ) ;  diag ( [ 2 1 ; 0 2 ], 2, 2, [ 3 1 ; 0 3 ], 3 ).
7. Let 𝜆 be an eigenvalue of a matrix 𝐴 ∈ C𝑛×𝑛 . Let𝑚 be the algebraic multiplicity
of 𝜆. Using the Jordan form, prove that null((𝐴 − 𝜆𝐼 )𝑚 ) = 𝑚.
8. Let 𝜆 be an eigenvalue of a matrix 𝐴 ∈ C𝑛×𝑛 . Let𝑚 be the algebraic multiplicity
of 𝜆. For each 𝑘 ∈ N, if null((𝐴 − 𝜆𝐼 )𝑘 ) < 𝑚, then show that null((𝐴 − 𝜆𝐼 )𝑘 ) <
null((𝐴 − 𝜆𝐼 )𝑘+1 ).
9. Using the Jordan form of a matrix show that a matrix 𝐴 is diagonalizable iff
for each eigenvalue of 𝐴, its geometric multiplicity is equal to its algebraic
multiplicity.
10. Let 𝐴 ∈ C𝑛×𝑛 . Let 𝜆1, . . . , 𝜆𝑘 be the distinct eigenvalues of 𝐴 having algebraic
multiplicities 𝑚 1, . . . , 𝑚𝑘 , respectively. Prove that 𝐴 is similar to a matrix of
the form diag (𝐵 1, . . . , 𝐵𝑘 ), where 𝐵𝑖 is of order 𝑚𝑖 . (This is called the block
diagonal form of 𝐴.)
11. Let 𝐽𝜆 be a Jordan block. Use a matrix 𝑄 in the following form to show that 𝐽𝜆
is similar to (𝐽𝜆 ) t . Then conclude that each matrix is similar to its transpose.

        [             1 ]
        [          1    ]
    𝑄 = [       ⋰       ] .
        [    1          ]
        [ 1             ]

12. The matrix 𝐴 = [ 2 2 3 ; 1 3 3 ; −1 −2 −3 ] has the Jordan form [ 1 1 0 ; 0 1 0 ; 0 0 1 ]. Thus there exist
vectors 𝑣 1, 𝑣 2, 𝑣 3 ∈ F3×1 such that 𝐴𝑣 1 = 𝑣 1, 𝐴𝑣 2 = 𝑣 1 + 𝑣 2 and 𝐴𝑣 3 = 𝑣 3 . Now,
𝑁 (𝐴 − 𝐼 ) has a basis {𝑥 1, 𝑥 2 }, where 𝑥 1 = (1, −2, 0) t and 𝑥 2 = (−3, 0, 1) t .
Taking 𝑣 1 = 𝑥 1, we see that 𝐴𝑣 2 = 𝑥 1 + 𝑣 2 has no solutions. Also, by taking
𝑣 1 = 𝑥 2, we find that 𝐴𝑣 2 = 𝑥 2 + 𝑣 2 has no solutions. Why does it happen?
Ans: The vector 𝑣 1 must be chosen from 𝑁 (𝐴 − 𝐼 ) ∩ 𝑅(𝐴).

5.6 Singular value decomposition


Let 𝑇 : 𝑉 → 𝑊 be a linear transformation, where 𝑉 and 𝑊 are inner product spaces
with dim (𝑉 ) = 𝑛 and dim (𝑊 ) = 𝑚. There are two natural self-adjoint operators
associated with 𝑇 , namely, 𝑇 ∗𝑇 : 𝑉 → 𝑉 and 𝑇𝑇 ∗ : 𝑊 → 𝑊 . Suppose 𝜆 ∈ R is an
eigenvalue of 𝑇 ∗𝑇 with an associated eigenvector 𝑣 ∈ 𝑉 . Then 𝑇 ∗𝑇 𝑣 = 𝜆𝑣 implies
that
k𝑇 𝑣 k 2 = h𝑇 𝑣,𝑇 𝑣i = h𝑇 ∗𝑇 𝑣, 𝑣i = h𝜆𝑣, 𝑣i = 𝜆k𝑣 k 2 .

Since k𝑣 k > 0, we see that 𝜆 ≥ 0. If there are 𝑟 number of positive eigenvalues of


𝑇 ∗𝑇 , for 0 ≤ 𝑟 ≤ 𝑛, then all eigenvalues can be arranged in a decreasing list such as

𝜆1 ≥ 𝜆2 ≥ · · · ≥ 𝜆𝑟 > 0 = 𝜆𝑟 +1 = · · · = 𝜆𝑛 .

The non-negative square roots of eigenvalues of the self-adjoint linear operator


𝑇 ∗𝑇 : 𝑉 → 𝑉 are called the singular values of 𝑇 . If there are 𝑟 number of positive
eigenvalues of 𝑇 ∗𝑇 , then the singular values of 𝑇 are written as

𝑠 1 ≥ 𝑠 2 ≥ · · · ≥ 𝑠𝑟 > 0 = 𝑠𝑟 +1 = · · · = 𝑠𝑛 .

Here, 𝑠𝑖 = √𝜆𝑖 . If 𝜆 > 0 is an eigenvalue of 𝑇 ∗𝑇 with an associated eigenvector 𝑣,
then 𝑇 ∗𝑇 𝑣 = 𝜆𝑣. It yields (𝑇𝑇 ∗ )(𝑇 𝑣) = 𝜆(𝑇 𝑣). Now, 𝜆𝑣 ≠ 0 implies 𝑇 𝑣 ≠ 0. Hence
𝜆 > 0 is also an eigenvalue of 𝑇𝑇 ∗ with an associated eigenvector 𝑇 𝑣. Similarly, it
follows that each positive eigenvalue of 𝑇𝑇 ∗ is also an eigenvalue of 𝑇 ∗𝑇 .
Further, the spectral theorem implies that the self-adjoint linear operator 𝑇 ∗𝑇 is
represented by a diagonal matrix. In such a diagonal matrix, the only nonzero entries
are 𝑠 12, . . . , 𝑠𝑟2 . Therefore, 𝑇 ∗𝑇 has rank 𝑟 . It then follows that 𝑠 12, . . . , 𝑠𝑟2 are the only
positive eigenvalues of the self-adjoint linear operator 𝑇𝑇 ∗ . There are 𝑛 − 𝑟 number
of zero eigenvalues of 𝑇 ∗𝑇 , whereas 𝑇𝑇 ∗ has 𝑚 − 𝑟 number of zero eigenvalues.
The following theorem gives much more information than this by representing 𝑇
in terms of its singular values.

(5.35) Theorem (Singular value decomposition, SVD)


Let 𝑉 and 𝑊 be inner product spaces of dimensions 𝑛 and 𝑚, respectively. Let
𝑇 : 𝑉 → 𝑊 be a linear transformation with 𝑟 positive singular values 𝑠 1 ≥
. . . ≥ 𝑠𝑟 . Then there exist orthonormal ordered bases 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } for 𝑉 and
𝐸 = {𝑤 1, . . . , 𝑤𝑚 } for 𝑊 such that
 
[𝑇 ]𝐸,𝐵 = [ 𝑆 0 ; 0 0 ] ∈ C𝑚×𝑛   with   𝑆 := diag (𝑠 1, . . . , 𝑠𝑟 ) ∈ C𝑟 ×𝑟 .

Further, the vectors 𝑣𝑖 and 𝑤 𝑗 satisfy the following:


(1) Each 𝑣𝑖 is an eigenvector of 𝑇 ∗𝑇 .
(2) {𝑣 1, . . . , 𝑣𝑟 } is an orthonormal basis of 𝑅(𝑇 ∗ ).
(3) {𝑣𝑟 +1, . . . , 𝑣𝑛 } is an orthonormal basis of 𝑁 (𝑇 ).
(4) {𝑤 1, . . . , 𝑤𝑟 } is an orthonormal basis of 𝑅(𝑇 ).
(5) {𝑤𝑟 +1, . . . , 𝑤𝑚 } is an orthonormal basis of 𝑁 (𝑇 ∗ ).
(6) Each 𝑤 𝑗 is an eigenvector of 𝑇𝑇 ∗ .
Proof. We construct the bases 𝐵 and 𝐸 meeting the requirements in (1)-(6). Finally,
we show the matrix representation of 𝑇 .
(1) The eigenvalues of 𝑇 ∗𝑇 are 𝑠 12, . . . , 𝑠𝑟2, and 𝑛 − 𝑟 number of 0s. By the spectral
theorem, the self-adjoint linear operator 𝑇 ∗𝑇 is diagonalizable. So, there exists an
orthonormal ordered basis 𝐵 := {𝑣 1, . . . , 𝑣𝑛 } for 𝑉 such that

𝑇 ∗𝑇 𝑣𝑖 = 𝑠𝑖2𝑣𝑖 for 𝑖 = 1, . . . , 𝑟 ; 𝑇 ∗𝑇 𝑣𝑖 = 0 𝑣𝑖 = 0 for 𝑖 = 𝑟 + 1, . . . , 𝑛.

Thus each 𝑣𝑖 is an eigenvector of 𝑇 ∗𝑇 .


(2) For 1 ≤ 𝑖 ≤ 𝑟, 𝑣𝑖 = (𝑠𝑖2 ) −1𝑇 ∗𝑇 𝑣𝑖 . It shows that 𝑣𝑖 ∈ 𝑅(𝑇 ∗𝑇 ). Further,
rank(𝑇 ∗𝑇 ) = 𝑟 . So, {𝑣 1, . . . , 𝑣𝑟 } is an orthonormal basis of 𝑅(𝑇 ∗𝑇 ). Due to (3.24),
𝑅(𝑇 ∗𝑇 ) = 𝑅(𝑇 ∗ ). Therefore, {𝑣 1, . . . , 𝑣𝑟 } is an orthonormal basis of 𝑅(𝑇 ∗ ).
(3) For 𝑟 + 1 ≤ 𝑖 ≤ 𝑛, 𝑇 ∗𝑇 𝑣𝑖 = 0. It follows that 𝑣𝑖 ∈ 𝑁 (𝑇 ∗𝑇 ). Further,
null(𝑇 ∗𝑇 ) = 𝑛 − 𝑟 . Thus {𝑣𝑟 +1, . . . , 𝑣𝑛 } is an orthonormal basis for 𝑁 (𝑇 ∗𝑇 ). By
(3.24), 𝑁 (𝑇 ) = 𝑁 (𝑇 ∗𝑇 ). Hence {𝑣𝑟 +1, . . . , 𝑣𝑛 } is an orthonormal basis of 𝑁 (𝑇 ).
(4) For 𝑖 = 1, . . . , 𝑟, define 𝑤𝑖 = (𝑠𝑖 ) −1𝑇 𝑣𝑖 . Then

h𝑤𝑖 , 𝑤 𝑗 i = (𝑠𝑖 𝑠 𝑗 ) −1 h𝑇 𝑣𝑖 , 𝑇 𝑣 𝑗 i = (𝑠𝑖 𝑠 𝑗 ) −1 h𝑣𝑖 , 𝑇 ∗𝑇 𝑣 𝑗 i = (𝑠𝑖 𝑠 𝑗 ) −1 h𝑣𝑖 , 𝑠 2𝑗 𝑣 𝑗 i = (𝑠𝑖 ) −1𝑠 𝑗 h𝑣𝑖 , 𝑣 𝑗 i.

Since {𝑣 1, . . . , 𝑣𝑟 } is an orthonormal set, {𝑤 1, . . . , 𝑤𝑟 } is also an orthonormal set.


For 1 ≤ 𝑗 ≤ 𝑟, clearly, 𝑤 𝑗 ∈ 𝑅(𝑇 ). Further, rank(𝑇 ) = rank(𝑇 ∗𝑇 ) = 𝑟 . Therefore,
{𝑤 1, . . . , 𝑤𝑟 } is an orthonormal basis of 𝑅(𝑇 ).
(5) Extend the orthonormal set {𝑤 1, . . . , 𝑤𝑟 } in 𝑊 to an orthonormal basis

𝐸 = {𝑤 1, . . . , 𝑤𝑟 , 𝑤𝑟 +1, . . . , 𝑤𝑚 }

for 𝑊 . Let 𝑟 + 1 ≤ 𝑗 ≤ 𝑚. Then 𝑇 ∗𝑤 𝑗 ∈ 𝑅(𝑇 ∗ ). Since {𝑣 1, . . . , 𝑣𝑟 } is a basis for


𝑅(𝑇 ∗ ), there exist scalars 𝛼 1, . . . , 𝛼𝑟 such that 𝑇 ∗𝑤 𝑗 = 𝛼 1𝑣 1 + · · · + 𝛼𝑟 𝑣𝑟 . So,

𝑇𝑇 ∗𝑤 𝑗 = 𝛼 1𝑇 𝑣 1 + · · · + 𝛼𝑟 𝑇 𝑣𝑟 = 𝛼 1𝑠 1𝑤 1 + · · · + 𝛼𝑟 𝑠𝑟 𝑤𝑟 .

As {𝑤 1, . . . , 𝑤𝑟 } is linearly independent, and 𝑠𝑖 are positive, it follows that 𝛼 1 =


· · · = 𝛼𝑟 = 0. Then 𝑇 ∗𝑤 𝑗 = 0. That is, 𝑤 𝑗 ∈ 𝑁 (𝑇 ∗ ) for 𝑟 + 1 ≤ 𝑗 ≤ 𝑚. But
null(𝑇 ∗ ) = 𝑚 − rank(𝑇 ∗ ) = 𝑚 − rank(𝑇 ∗𝑇 ) = 𝑚 − 𝑟 . Hence, {𝑤𝑟 +1, . . . , 𝑤𝑚 } is an
orthonormal basis of 𝑁 (𝑇 ∗ ).
(6) For 1 ≤ 𝑗 ≤ 𝑟, 𝑇𝑇 ∗𝑤 𝑗 = (𝑠 𝑗 ) −1𝑇𝑇 ∗𝑇 𝑣 𝑗 = (𝑠 𝑗 ) −1𝑇𝑠 2𝑗 𝑣 𝑗 = 𝑠 2𝑗 𝑤 𝑗 .
For 𝑟 + 1 ≤ 𝑗 ≤ 𝑚, 𝑇 ∗𝑤 𝑗 = 0 implies that 𝑇𝑇 ∗𝑤 𝑗 = 0 𝑤 𝑗 .
Therefore, each 𝑤 𝑗 is an eigenvector of 𝑇𝑇 ∗ .
Towards the matrix representation of 𝑇 , notice that

𝑇 𝑣 𝑗 = 𝑠 𝑗 𝑤 𝑗 for 𝑗 = 1, . . . , 𝑟 ; 𝑇 𝑣 𝑗 = 0 for 𝑗 = 𝑟 + 1, . . . , 𝑛.
 
Therefore, [𝑇 ]𝐸,𝐵 = [ 𝑆 0 ; 0 0 ] with 𝑆 = diag (𝑠 1, . . . , 𝑠𝑟 ).
The matrix interpretation of SVD may be formulated as in the following.

(5.36) Theorem (SVD)


Let 𝐴 ∈ C𝑚×𝑛 be a matrix of rank 𝑟 . Then there exist unitary matrices 𝑃 ∈ C𝑚×𝑚
and 𝑄 ∈ C𝑛×𝑛 such that
 
𝐴 = 𝑃 Σ𝑄 ∗ ,   Σ := [ 𝑆 0 ; 0 0 ] ∈ C𝑚×𝑛 ,   𝑆 := diag (𝑠 1, . . . , 𝑠𝑟 ) ∈ C𝑟 ×𝑟 ,
where 𝑠 1 ≥ . . . ≥ 𝑠𝑟 are the positive singular values of 𝐴.

In (5.36), the columns of 𝑃 are eigenvectors of 𝐴𝐴∗, that form an orthonormal


basis for C𝑚×1 ; and these are called the left singular vectors of 𝐴. Further, the
columns of 𝑄 are eigenvectors of 𝐴∗𝐴 that form an orthonormal basis for C𝑛×1 ; and
these are called the right singular vectors of 𝐴.
A singular value decomposition depends on the choice of orthonormal bases. For
example, multiplying a vector in an already constructed basis by −1 gives a different
orthonormal basis. Thus, SVD is not unique.
Also, it can be shown that when 𝐴 ∈ R𝑚×𝑛 , the matrices 𝑃 and 𝑄 can be chosen
to have real entries.
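Numerically, such a factorization is available through numpy's SVD routine. The sketch
below is an illustration only (numpy assumed); the sample matrix is the one in Exercise
3(a) at the end of this section, whose singular values turn out to be 5 and 3.

    import numpy as np

    A = np.array([[3., 2., 2.],
                  [2., 3., -2.]])
    m, n = A.shape
    P, s, Qh = np.linalg.svd(A)            # full SVD: P is m x m, Qh = Q* is n x n
    Sigma = np.zeros((m, n))
    Sigma[:len(s), :len(s)] = np.diag(s)   # embed the singular values in the m x n block matrix
    print(s)                               # [5. 3.]
    assert np.allclose(A, P @ Sigma @ Qh)
    assert np.allclose(P.T @ P, np.eye(m)) and np.allclose(Qh @ Qh.T, np.eye(n))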
In the product 𝑃 Σ𝑄 ∗, there are 0 blocks in case 𝑚 ≠ 𝑛. We may thin out certain 0
blocks and obtain the same product. Let 𝐴 ∈ C𝑚×𝑛 have rank 𝑟 .
Case 1: Suppose 𝑚 < 𝑛. We delete the last 𝑛 − 𝑚 columns from Σ to obtain Σ1 ; and
delete the last 𝑛 − 𝑚 rows from 𝑄 ∗ to obtain 𝑄 1∗ . The deleted columns and rows do
not contribute anything to the product. Therefore, we have

𝐴 = 𝑃 Σ1𝑄 1∗, Σ1 = diag (𝑠 1, . . . , 𝑠𝑟 , 0, . . . , 0) ∈ C𝑚×𝑚 , 𝑄 1 = [𝑣 1 · · · 𝑣𝑚 ] ∈ C𝑛×𝑚 .

Here, 𝑃 is unitary, and 𝑄 1 has orthonormal columns.


Case 2: Suppose 𝑚 > 𝑛. We delete the last 𝑚 − 𝑛 columns of 𝑃 to obtain 𝑃2, and
delete the last 𝑚 − 𝑛 rows of Σ to obtain Σ2 . We see that

𝐴 = 𝑃2 Σ2𝑄 ∗, 𝑃2 = [𝑤 1 · · · 𝑤𝑛 ] ∈ C𝑚×𝑛 , Σ2 = diag (𝑠 1, . . . , 𝑠𝑟 , 0, . . . , 0) ∈ C𝑛×𝑛 .

Here, the columns of 𝑃 2 are orthonormal, while 𝑄 is unitary.


Case 3: When 𝑚 = 𝑛, we keep all of 𝑃, Σ, 𝑄 as they are, in 𝐴 = 𝑃 Σ𝑄 ∗ .
These three forms of SVD are called the thin SVD of 𝐴.
A further tightening in deleting the 0 blocks in the product 𝑃 Σ𝑄 ∗ is possible.
Suppose 𝐴 ∈ C𝑚×𝑛 has rank 𝑟 . Observe that the columns 𝑟 + 1 onwards in the
matrices 𝑃 and 𝑄 in the product 𝑃 Σ𝑄 ∗ produce the 0 blocks. We may then delete
the last 𝑚 − 𝑟 columns of 𝑃, the last 𝑛 − 𝑟 columns of 𝑄, and curtail Σ to its first 𝑟 × 𝑟
block. Thus we obtain

𝐴 = 𝑃˜ 𝑆 𝑄˜ ∗ ,   with   𝑃˜ = [ 𝑤 1 · · · 𝑤𝑟 ] ,   𝑄˜ = [ 𝑣 1 · · · 𝑣𝑟 ] .

Here, the matrices 𝑃˜ ∈ C𝑚×𝑟 and 𝑄˜ ∈ C𝑛×𝑟 have orthonormal columns, and 𝑆 =
diag (𝑠 1, . . . , 𝑠𝑟 ) ∈ C𝑟 ×𝑟 . This simplified decomposition of the matrix 𝐴 is called the
tight SVD of 𝐴.
In the tight SVD, the matrices 𝐴 ∈ C𝑚×𝑛 , 𝑃˜ ∈ C𝑚×𝑟 , 𝑆 ∈ C𝑟 ×𝑟 and 𝑄˜ ∗ ∈ C𝑟 ×𝑛 are
all of rank 𝑟 . Write 𝐵 = 𝑃˜ 𝑆 and 𝐶 = 𝑆 𝑄˜ ∗ to obtain

𝐴 = 𝐵 𝑄˜ ∗ = 𝑃˜ 𝐶,

where 𝐵 ∈ C𝑚×𝑟 and 𝐶 ∈ C𝑟 ×𝑛 are also of rank 𝑟 . It shows that each 𝑚 × 𝑛 matrix of
rank 𝑟 can be written as a product of an 𝑚 × 𝑟 matrix with an 𝑟 × 𝑛 matrix each of
rank 𝑟 . This way, the full rank factorization of a matrix follows from the tight SVD.
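The passage from the full SVD to the tight SVD, and then to a full rank factorization,
can be mimicked numerically. The sketch below (numpy assumed; not part of the notes)
uses the rank 1 matrix of the example that follows.

    import numpy as np

    A = np.array([[2., -1.],
                  [-2., 1.],
                  [4., -2.]])
    P, s, Qh = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10))                  # numerical rank
    P_t, S, Qh_t = P[:, :r], np.diag(s[:r]), Qh[:r, :]
    assert np.allclose(A, P_t @ S @ Qh_t)       # tight SVD
    B, C = P_t @ S, S @ Qh_t                    # full rank factorization: A = B Qh_t = P_t C
    assert np.allclose(A, B @ Qh_t) and np.allclose(A, P_t @ C)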

(5.37) Example
Obtain SVD, tight SVD, and full rank factorizations of

        [  2  −1 ]
    𝐴 = [ −2   1 ] .
        [  4  −2 ]

The matrix 𝐴∗𝐴 = [ 24 −12 ; −12 6 ] has eigenvalues 𝜆1 = 30 and 𝜆2 = 0. Thus

𝑠 1 = √30. It is easy to check that rank(𝐴) = 1 as the first column of 𝐴 is −2 times
the second column. Solving the equations 𝐴∗𝐴(𝑎, 𝑏) t = 30(𝑎, 𝑏) t, that is,

24𝑎 − 12𝑏 = 30𝑎, −12𝑎 + 6𝑏 = 30𝑏,

we obtain a solution as 𝑎 = −2, 𝑏 = 1. So, a unit eigenvector corresponding to the


eigenvalue 30 is 𝑣 1 = (1/√5) (−2, 1) t .
For the eigenvalue 𝜆2 = 0, the equations are 24𝑎 − 12𝑏 = 0, −12𝑎 + 6𝑏 = 0.
Thus a unit eigenvector orthogonal to 𝑣 1 is 𝑣 2 = (1/√5) (1, 2) t .

Then, 𝑤 1 = (1/√30) 𝐴𝑣 1 = (1/√6) (−1, 1, −2) t .

Notice that k𝑤 1 k = 1. We extend {𝑤 1 } to an orthonormal basis of C3×1 . It is

𝑤 1 = (1/√6) (−1, 1, −2) t ,   𝑤 2 := (1/√2) (1, 1, 0) t ,   𝑤 3 := (1/√3) (1, −1, −1) t .
Next, we take 𝑤 1, 𝑤 2, 𝑤 3 as the columns of 𝑃 and 𝑣 1, 𝑣 2 as the columns of 𝑄 to obtain
an SVD of 𝐴 as

    [  2  −1 ]   [ −1/√6   1/√2    1/√3 ] [ √30  0 ] [ −2/√5   1/√5 ] ∗
    [ −2   1 ] = [  1/√6   1/√2   −1/√3 ] [   0  0 ] [  1/√5   2/√5 ]   .
    [  4  −2 ]   [ −2/√6     0    −1/√3 ] [   0  0 ]
For the tight SVD, 𝑃˜ consists of the first 𝑟 columns of 𝑃, 𝑄˜ consists of the first 𝑟
columns of 𝑄, and 𝑆 is the usual 𝑟 × 𝑟 block with the positive singular values of 𝐴 as
the diagonal entries. With 𝑟 = rank(𝐴) = 1, we
thus have the tight SVD as

    [  2  −1 ]   [ −1/√6 ]          [ −2/√5 ] ∗
    [ −2   1 ] = [  1/√6 ] [ √30 ]  [  1/√5 ]   .
    [  4  −2 ]   [ −2/√6 ]
   
In the tight SVD, using associativity of matrix product, we get the full rank factor-
izations as √
 2 −1  − 5   √  ∗  −1/√6   √  ∗
  √  −2/ 5  √  −2/ 6
−2 1 =  5 √ =  1/ 6 √ .
   √  1/ 5
  √  6
 4 −2 −2 5  −2/ 6 
     
You should check that the columns of 𝑃 are eigenvectors of 𝐴𝐴∗ .

Observe that when we write an 𝑚 × 𝑛 matrix 𝐴 of rank 𝑟 in its SVD form


𝐴 = 𝑃 Σ𝑄 ∗, the columns of 𝑃 are the eigenvectors of the matrix 𝐴𝐴∗ associated with
the eigenvalues 𝑠 12, . . . , 𝑠𝑟2, 0, . . . , 0. Similarly, the columns of 𝑄 are the eigenvectors
of the matrix 𝐴∗𝐴 associated with the same eigenvalues. In the former case, there
are 𝑚 − 𝑟 zero eigenvalues and in the latter case, they are 𝑛 − 𝑟 in number. Writing
the 𝑖th column of 𝑃 as 𝑤𝑖 and the 𝑗th column of 𝑄 as 𝑣 𝑗 , SVD amounts to writing 𝐴
as
𝐴 = 𝑃 Σ𝑄 ∗ = 𝑠 1𝑤 1𝑣 1∗ + · · · + 𝑠𝑟 𝑤𝑟 𝑣𝑟∗ .
Each matrix 𝑤𝑘 𝑣𝑘∗ here is of rank 1. This means that if we know the 𝑟 positive singular
values of 𝐴 and we know their corresponding left and right singular vectors, we know
𝐴 completely. This is particularly useful when 𝐴 is a very large matrix of low rank.
No wonder, SVD is used in image processing, various compression algorithms, and
in principal components analysis. Next to the theory of linear equations, SVD is
the most important tool for applications. We will see another application of SVD
in representing a matrix in a very useful and elegant manner.
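This rank one expansion is easy to see in action numerically. In the sketch below
(numpy assumed; the matrix is a random one, chosen only for illustration), the matrix is
rebuilt from its rank one pieces, and a truncated sum gives a low rank approximation
whose error, measured by max{k (𝐴 − 𝐴2 )𝑥 k : k𝑥 k = 1} (cf. Exercise 9 at the end of
this section), is the first singular value that was dropped.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    P, s, Qh = np.linalg.svd(A)
    pieces = [s[i] * np.outer(P[:, i], Qh[i, :]) for i in range(len(s))]
    assert np.allclose(A, sum(pieces))      # A = s_1 w_1 v_1* + ... + s_r w_r v_r*
    A2 = sum(pieces[:2])                    # keep only the two largest singular values
    print(np.linalg.norm(A - A2, 2), s[2])  # the two printed numbers agree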
Exercises for § 5.6
1. Let 𝐴 ∈ C𝑚×𝑛 . Show that the positive singular values of 𝐴∗ are precisely the
positive singular values of 𝐴.
2. Prove that if 𝜆1, . . . , 𝜆𝑛 are the eigenvalues of an 𝑛 × 𝑛 hermitian matrix, then
its singular values are |𝜆1 |, . . . , |𝜆𝑛 |.
3. Compute the singular value decomposition of the following matrices:
1 0  0 1 1
√
 
3 2 2   
(a) (b)  1 1  (c)  2 2 0  .
2 3 −2 0 1  0 1 1
  √ √

 √
1/ 2

1/ 2
    1/ 2 1/ 2 0
5 0 0  1 √ √ √ 
Ans: (a) 1 √ √ / 18 −1/ 18 4/ 18  .
/ 2 −1/ 2 0 3 0  2
/3 −2/3 −1/3 
√ 
 1/√6 1/√2 1/√3   3 0   √
 1/ 2 1/√2

 √ √  
(b)  2/ 6
 0 −1/ 3   0 1   √ √ .
 1/√6 −1/√2 1/√3   0 0  1/ 2 −1/ 2
   √ 
 1/√6 −1/√3 1/√2   2 2 0 0   1/√6 1/√3 1/√2 
 √ √   √   √ 
(c)  2/ 6 1/ 3 0   0 2 0   3/ 12 0 −1/2  .
 1/√6 −1/√3 −1/√2   0 0 0   1/√12 −2/√6 1/2 
     
   
4. Show that the matrices [ 1 0 ; 1 1 ] and [ 2 −1 ; 1 0 ] are similar but they have different
singular values.
5. Show that if 𝑠 is a singular value of a matrix 𝐴, then there exists a nonzero
vector 𝑥 such that k𝐴𝑥 k = 𝑠 k𝑥 k.
6. Show that a matrix 𝐴 ∈ C𝑚×𝑛 is of rank 1 iff 𝐴 = 𝑢𝑣 for some nonzero vectors
𝑢 ∈ C𝑚×1 and 𝑣 ∈ C1×𝑛 .
7. Show that a scalar 𝜆 > 0 is an eigenvalue of 𝑇 ∗𝑇 iff it is an eigenvalue of 𝑇𝑇 ∗,
without using SVD.
8. In an SVD of a linear operator 𝑇 : 𝑉 → 𝑊 , show that the orthonormal bases
{𝑣 1, . . . , 𝑣𝑛 } for 𝑉 and {𝑤 1, . . . , 𝑤𝑚 } for 𝑊 can be chosen in such a way that
for 1 ≤ 𝑖 ≤ 𝑟, 𝑤𝑖 = (𝑠𝑖 ) −1𝑇 𝑣𝑖 and 𝑣𝑖 = (𝑠𝑖 ) −1𝑇 ∗𝑤𝑖 .
9. Let 𝐴 ∈ C𝑚×𝑛 have singular values 𝑠 1 ≥ · · · ≥ 𝑠𝑟 > 0. Let 𝑆 be the unit circle in
C𝑛×1 . That is, 𝑆 = {𝑥 ∈ C𝑛×1 : k𝑥 k = 1}. Show that 𝑠 1 = max{k𝐴𝑥 k : 𝑥 ∈ 𝑆 }
and 𝑠𝑟 = min{k𝐴𝑥 k : 𝑥 ∈ 𝑆 }.
 
10. Let 𝐴 ∈ C𝑚×𝑛 have the positive singular values 𝑠 1, . . . , 𝑠𝑟 . Let 𝐴 = 𝑃 [ 𝑆 0 ; 0 0 ] 𝑄 ∗
be an SVD of 𝐴, where 𝑆 = diag (𝑠 1, . . . , 𝑠𝑟 ). Define 𝐴† = 𝑄 [ 𝑆 −1 0 ; 0 0 ] 𝑃 ∗ ∈
C𝑛×𝑚 with 0 blocks of suitable sizes. Prove the following:

(a) For any 𝑏 ∈ C𝑚×1, 𝐴†𝑏 is a least squares solution of 𝐴𝑥 = 𝑏.


(b) (𝐴𝐴† ) ∗ = 𝐴𝐴†, (𝐴†𝐴) ∗ = 𝐴†𝐴, 𝐴𝐴†𝐴 = 𝐴 and 𝐴†𝐴𝐴† = 𝐴† .
(c) If 𝐴† is any matrix in C𝑛×𝑚 that satisfies the equalities in (b), then it is
unique.
(The matrix 𝐴† is called the generalized inverse or Moore-Penrose inverse of
the 𝑚 × 𝑛 matrix 𝐴.)
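Exercise 10 can also be checked numerically: numpy's pinv computes the Moore-Penrose
inverse through the SVD. A sketch (the system here is an arbitrary choice, made only for
illustration):

    import numpy as np

    A = np.array([[1., 0.],
                  [1., 1.],
                  [1., 2.]])
    b = np.array([1., 0., 2.])
    A_dag = np.linalg.pinv(A)                  # Moore-Penrose inverse, computed via the SVD
    x = A_dag @ b                              # a least squares solution of Ax = b
    assert np.allclose(A.T @ A @ x, A.T @ b)   # x satisfies the normal equations
    assert np.allclose(A @ A_dag @ A, A) and np.allclose(A_dag @ A @ A_dag, A_dag)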

5.7 Polar decomposition


Linear operators on an inner product space behave like complex numbers, in many
respects. Recall that a complex number is written as 𝑧 = 𝑟𝑒 𝑖𝜃 , where 𝑟 is a non-
negative real number which may be thought of as a stretching factor, and 𝑒 𝑖𝜃 is a
rotation. We aim towards representing a linear operator as a product of a positive
semi-definite matrix and a unitary matrix.
Let 𝑇 be a linear operator on a finite dimensional inner product space 𝑉 . We say
that 𝑇 is positive semi-definite iff 𝑇 is self-adjoint, and h𝑇 𝑥, 𝑥i ≥ 0 for each 𝑥 ∈ 𝑉 .
𝑇 is called positive definite iff 𝑇 is self adjoint, and h𝑇 𝑥, 𝑥i > 0 for each nonzero
𝑥 ∈ 𝑉.
Similarly, a matrix 𝐴 ∈ C𝑛×𝑛 is called positive semi-definite iff 𝐴 is hermitian and
𝑥 ∗𝐴𝑥 ≥ 0 for each 𝑥 ∈ C𝑛×1 ; and 𝐴 is called positive definite iff 𝐴 is hermitian and
𝑥 ∗𝐴𝑥 > 0 for each nonzero 𝑥 ∈ C𝑛×1 .
Examples of positive semi-definite linear operators are abundant. For, if 𝑇 is
any linear operator on a finite dimensional ips, then both 𝑇 ∗𝑇 and 𝑇𝑇 ∗ are positive
semi-definite.

(5.38) Theorem (Polar Decomposition)


Let 𝑇 : 𝑉 → 𝑊 be a linear transformation, where 𝑉 and 𝑊 are inner product spaces
of dimensions 𝑛 and 𝑚, respectively. Then there exist positive semi-definite linear
operators 𝑃 on 𝑊 , 𝑄 on 𝑉 , and a linear operator 𝑈 : 𝑉 → 𝑊 such that

𝑇 = 𝑃𝑈 = 𝑈 𝑄,

where 𝑃 2 = 𝑇𝑇 ∗, 𝑄 2 = 𝑇 ∗𝑇 ; 𝑈𝑈 ∗ = 𝐼 for 𝑚 ≤ 𝑛, 𝑈 ∗𝑈 = 𝐼 for 𝑚 ≥ 𝑛, and 𝑈 is


unitary for 𝑚 = 𝑛.

Proof. Let rank(𝑇 ) = 𝑟 . Suppose 𝑠 1 ≥ · · · ≥ 𝑠𝑟 are the positive singular values


of 𝑇 . Due to the SVD of 𝑇 , there exist orthonormal bases {𝑣 1, . . . , 𝑣𝑛 } for 𝑉 and
{𝑤 1, . . . , 𝑤𝑚 } for 𝑊 such that
𝑇𝑇 ∗𝑤𝑖 = 𝑠𝑖2𝑤𝑖 for 1 ≤ 𝑖 ≤ 𝑟, 𝑇𝑇 ∗𝑤𝑖 = 0 for 𝑟 + 1 ≤ 𝑖 ≤ 𝑚.
𝑇 ∗𝑇 𝑣𝑖 = 𝑠𝑖2𝑣𝑖 for 1 ≤ 𝑖 ≤ 𝑟, 𝑇 ∗𝑇 𝑣𝑖 = 0 for 𝑟 + 1 ≤ 𝑖 ≤ 𝑛.
𝑇 𝑣𝑖 = 𝑠𝑖 𝑤𝑖 for 1 ≤ 𝑖 ≤ 𝑟, 𝑇 𝑣𝑖 = 0 for 𝑟 + 1 ≤ 𝑖 ≤ 𝑛.

Let 𝑣 ∈ 𝑉 and let 𝑤 ∈ 𝑊 . By Fourier expansion,


𝑣 = Σ_{𝑖=1}^{𝑛} h𝑣, 𝑣𝑖 i𝑣𝑖 ,   𝑤 = Σ_{𝑗=1}^{𝑚} h𝑤, 𝑤 𝑗 i𝑤 𝑗 .

Then the above equalities imply that


𝑇𝑇 ∗𝑤 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖2 h𝑤, 𝑤𝑖 i𝑤𝑖 ,   𝑇 ∗𝑇 𝑣 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖2 h𝑣, 𝑣𝑖 i𝑣𝑖 ,   𝑇 𝑣 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑣, 𝑣𝑖 i𝑤𝑖 .

Let ℓ = min{𝑚, 𝑛} ≥ 𝑟 . Define linear operators 𝑃 : 𝑊 → 𝑊 , 𝑄 : 𝑉 → 𝑉 , and


𝑈 : 𝑉 → 𝑊 by
𝑃𝑤 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑤, 𝑤𝑖 i𝑤𝑖 ,   𝑄𝑣 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑣, 𝑣𝑖 i𝑣𝑖 ,   𝑈 𝑣 = Σ_{𝑖=1}^{ℓ} h𝑣, 𝑣𝑖 i𝑤𝑖   for 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 .

Notice that for 1 ≤ 𝑖 ≤ 𝑟, 𝑃𝑤𝑖 = 𝑠𝑖 𝑤𝑖 , 𝑄𝑣𝑖 = 𝑠𝑖 𝑣𝑖 ; for 𝑖 > 𝑟, 𝑃𝑤𝑖 = 0 = 𝑄 (𝑣𝑖 ); and
for 𝑖 > ℓ, 𝑈 𝑣𝑖 = 0. Due to the formula for the adjoint in (3.5.1), we have

𝑃 ∗𝑤 = Σ_{𝑖=1}^{𝑚} h𝑤, 𝑃𝑤𝑖 i𝑤𝑖 = Σ_{𝑖=1}^{𝑟} h𝑤, 𝑠𝑖 𝑤𝑖 i𝑤𝑖 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑤, 𝑤𝑖 i𝑤𝑖 = 𝑃𝑤 .
h𝑃𝑤, 𝑤i = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑤, 𝑤𝑖 ih𝑤𝑖 , 𝑤i = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 |h𝑤, 𝑤𝑖 i| 2 ≥ 0.
𝑃 2𝑤 = 𝑃 ( Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑤, 𝑤𝑖 i𝑤𝑖 ) = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑤, 𝑤𝑖 i𝑃𝑤𝑖 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖2 h𝑤, 𝑤𝑖 i𝑤𝑖 = 𝑇𝑇 ∗𝑤 .
𝑃𝑈 𝑣 = 𝑃 ( Σ_{𝑖=1}^{ℓ} h𝑣, 𝑣𝑖 i𝑤𝑖 ) = Σ_{𝑖=1}^{ℓ} h𝑣, 𝑣𝑖 i𝑃𝑤𝑖 = Σ_{𝑖=1}^{𝑟} 𝑠𝑖 h𝑣, 𝑣𝑖 i𝑤𝑖 = 𝑇 𝑣.

Hence 𝑃 is positive semi-definite, 𝑃 2 = 𝑇𝑇 ∗, and 𝑃𝑈 = 𝑇 . Similarly, it follows that 𝑄


is positive semi-definite, 𝑄 2 = 𝑇 ∗𝑇 , and 𝑈 𝑄 = 𝑇 . It remains to verify the conditions
on 𝑈 in different cases.
Case 1: Let 𝑚 ≤ 𝑛. Then ℓ = 𝑚. Let 𝑤 ∈ 𝑊 . For 1 ≤ 𝑖 ≤ 𝑚, using (3.5.1), we have
𝑈 ∗𝑤 = Σ_{𝑗=1}^{𝑛} h𝑤, 𝑈 𝑣 𝑗 i𝑣 𝑗 = Σ_{𝑗=1}^{𝑚} h𝑤, 𝑤 𝑗 i𝑣 𝑗 ,   𝑈 𝑣𝑖 = Σ_{𝑗=1}^{𝑚} h𝑣𝑖 , 𝑣 𝑗 i𝑤 𝑗 = 𝑤𝑖 .
𝑈𝑈 ∗𝑤 = 𝑈 ( Σ_{𝑗=1}^{𝑚} h𝑤, 𝑤 𝑗 i𝑣 𝑗 ) = Σ_{𝑗=1}^{𝑚} h𝑤, 𝑤 𝑗 i𝑈 𝑣 𝑗 = Σ_{𝑗=1}^{𝑚} h𝑤, 𝑤 𝑗 i𝑤 𝑗 = 𝑤 .

That is, 𝑈𝑈 ∗ = 𝐼 .
Case 2: Let 𝑚 ≥ 𝑛. Then ℓ = 𝑛. Let 𝑣 ∈ 𝑉 . For 1 ≤ 𝑖 ≤ 𝑛, using (3.5.1) again, we
obtain
𝑈 ∗𝑤𝑖 = Σ_{𝑗=1}^{𝑛} h𝑤𝑖 , 𝑈 𝑣 𝑗 i𝑣 𝑗 = Σ_{𝑗=1}^{𝑛} h𝑤𝑖 , 𝑤 𝑗 i𝑣 𝑗 = 𝑣𝑖 .

𝑈 ∗𝑈 𝑣 = 𝑈 ∗ ( Σ_{𝑖=1}^{𝑛} h𝑣, 𝑣𝑖 i𝑤𝑖 ) = Σ_{𝑖=1}^{𝑛} h𝑣, 𝑣𝑖 i𝑈 ∗𝑤𝑖 = Σ_{𝑖=1}^{𝑛} h𝑣, 𝑣𝑖 i𝑣𝑖 = 𝑣.

That is, 𝑈 ∗𝑈 = 𝐼 .
When 𝑚 = 𝑛, both 𝑈𝑈 ∗ = 𝐼 and 𝑈 ∗𝑈 = 𝐼 hold. Therefore, 𝑈 is unitary.

The matrix interpretation of (5.38) is straightforward.

(5.39) Theorem (Polar Decomposition)


Each matrix 𝐴 ∈ C𝑚×𝑛 can be written as 𝐴 = 𝑃𝑈 = 𝑈 𝑄, where 𝑃 ∈ C𝑚×𝑚 and
𝑄 ∈ C𝑛×𝑛 are positive semi-definite, 𝑃 2 = 𝐴𝐴∗, 𝑄 2 = 𝐴∗𝐴; 𝑈𝑈 ∗ = 𝐼 for 𝑚 ≤ 𝑛,
𝑈 ∗𝑈 = 𝐼 for 𝑚 ≥ 𝑛, and 𝑈 is unitary for 𝑚 = 𝑛.

Since 𝑃 2 = 𝐴𝐴∗ and 𝑄 2 = 𝐴∗𝐴, conventionally, they are written as


𝑃 := √(𝐴𝐴∗ ),   𝑄 := √(𝐴∗𝐴).

Notice that 𝑃 and 𝑄 are positive semi-definite square roots of the positive semi-
definite matrices 𝐴𝐴∗ and 𝐴∗𝐴, respectively. (See Exercise 9.)
The construction of polar decomposition of matrices from SVD may be summa-
rized as follows:
If 𝐴 ∈ C𝑚×𝑛 has SVD as 𝐴 = 𝐵𝐷𝐸 ∗, then 𝐴 = 𝑃𝑈 = 𝑈 𝑄, where

𝑚 =𝑛: 𝑈 = 𝐵𝐸 ∗, 𝑃 = 𝐵𝐷𝐵 ∗, 𝑄 = 𝐸𝐷𝐸 ∗ .


𝑚 <𝑛: 𝑈 = 𝐵𝐸 1∗, 𝑃 = 𝐵𝐷 1 𝐵 ∗, 𝑄 = 𝐸 1 𝐷 1 𝐸 1∗ .
𝑚 >𝑛: 𝑈 = 𝐵 1 𝐸 ∗, 𝑃 = 𝐵 1 𝐷 2 𝐵 1∗, 𝑄 = 𝐸𝐷 2 𝐸 ∗ .

Here, 𝐸 1 is constructed from 𝐸 by taking its first 𝑚 columns; 𝐷 1 is constructed


from 𝐷 by taking its first 𝑚 columns; 𝐵 1 is constructed from 𝐵 by taking its first 𝑛
columns; and 𝐷 2 is constructed from 𝐷 by taking its first 𝑛 rows.
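For 𝑚 > 𝑛, this recipe takes only a few lines numerically. The sketch below (numpy
assumed; not part of the notes) uses the matrix of the example that follows; 𝑃 and 𝑄,
being the unique positive semi-definite square roots of 𝐴𝐴∗ and 𝐴∗𝐴, come out as in the
example, while 𝑈 may differ, since an SVD (and hence 𝑈 , when 𝐴 is not of full rank) is
not unique.

    import numpy as np

    A = np.array([[2., -1.],
                  [-2., 1.],
                  [4., -2.]])
    m, n = A.shape                       # here m > n
    B, svals, Eh = np.linalg.svd(A)      # A = B D E*, with D of size m x n
    B1 = B[:, :n]                        # first n columns of B
    D2 = np.diag(svals)                  # first n rows of D
    U = B1 @ Eh
    P = B1 @ D2 @ B1.conj().T
    Q = Eh.conj().T @ D2 @ Eh
    assert np.allclose(A, P @ U) and np.allclose(A, U @ Q)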
(5.40) Example
Consider the matrix 𝐴 = [ 2 −1 ; −2 1 ; 4 −2 ] of (5.37). We had obtained its SVD
as 𝐴 = 𝐵𝐷𝐸 ∗, where

        [ −1/√6   1/√2    1/√3 ]        [ √30  0 ]        [ −2/√5   1/√5 ]
    𝐵 = [  1/√6   1/√2   −1/√3 ] ,  𝐷 = [   0  0 ] ,  𝐸 = [  1/√5   2/√5 ] .
        [ −2/√6     0    −1/√3 ]        [   0  0 ]

We construct the matrices 𝐵 1 by taking first two columns of 𝐵, and 𝐷 2 by taking


first two rows of 𝐷, as in the following:

          [ −1/√6   1/√2 ]          [ √30  0 ]
    𝐵 1 = [  1/√6   1/√2 ] ,  𝐷 2 = [   0  0 ] .
          [ −2/√6     0  ]

Then

    𝑈 = 𝐵 1 𝐸 ∗ = (1/√30) [  2+√3   −1+2√3 ]
                          [ −2+√3    1+2√3 ] ,
                          [   4        −2  ]

    𝑃 = 𝐵 1 𝐷 2 𝐵 1∗ = (√5/√6) [  1  −1   2 ]
                               [ −1   1  −2 ] ,
                               [  2  −2   4 ]

    𝑄 = 𝐸𝐷 2 𝐸 ∗ = (√6/√5) [  4  −2 ]
                           [ −2   1 ] .

As expected, we find that

    𝑃𝑈 = (√5/√6)(1/√30) [  1  −1   2 ] [  2+√3   −1+2√3 ]   [  2  −1 ]
                         [ −1   1  −2 ] [ −2+√3    1+2√3 ] = [ −2   1 ] = 𝐴,
                         [  2  −2   4 ] [   4        −2  ]   [  4  −2 ]

    𝑈 𝑄 = (1/√30)(√6/√5) [  2+√3   −1+2√3 ] [  4  −2 ]   [  2  −1 ]
                          [ −2+√3    1+2√3 ] [ −2   1 ] = [ −2   1 ] = 𝐴.
                          [   4        −2  ]              [  4  −2 ]

Exercises for § 5.7


1. Prove (5.38) by using thin SVD.
2. Derive singular value decomposition from the polar decomposition.

3. Let 𝑇 be a self-adjoint linear operator on a finite dimensional ips 𝑉 . Let 𝛼, 𝛽 ∈ R


be such that 𝛼 2 < 4𝛽. Show that 𝑇 2 + 𝛼𝑇 + 𝛽𝐼 is positive definite; and thus
invertible.
4. Let 𝑉 be an ips. Show that if h𝑥, 𝑦i = h𝑧, 𝑦i for all 𝑥, 𝑦, 𝑧 ∈ 𝑉 , then 𝑥 = 𝑧.
5. Let 𝑇 be a linear operator on a finite dimensional complex ips 𝑉 . Let 𝑥, 𝑦 ∈ 𝑉 .
Show that 4h𝑇 𝑥, 𝑦i =
h𝑇 (𝑥 +𝑦), 𝑥 +𝑦i − h𝑇 (𝑥 −𝑦), 𝑥 −𝑦i + 𝑖 h𝑇 (𝑥 +𝑖𝑦), 𝑥 +𝑖𝑦i − 𝑖 h𝑇 (𝑥 −𝑖𝑦), 𝑥 −𝑖𝑦i .
6. Let 𝑆 and 𝑇 be linear operators on a finite dimensional complex ips 𝑉 . Show
that if for all 𝑥 ∈ 𝑉 , h𝑆𝑥, 𝑥i = h𝑇 𝑥, 𝑥i, then 𝑆 = 𝑇 .
7. Let 𝑇 be a linear operator on a finite dimensional complex ips 𝑉 . Prove that if
for each 𝑥 ∈ 𝑉 , h𝑇 𝑥, 𝑥i is a non-negative real number, then 𝑇 is self-adjoint.
Note: In general, this statement neither holds in finite dimensional real ips,
nor in infinite dimensional complex ips.
8. Let 𝑈 be a linear operator on a finite dimensional ips 𝑉 . Show that if k𝑈 𝑥 k =
k𝑥 k for all 𝑥 ∈ 𝑉 , then 𝑈 is unitary.
9. Let 𝑆 and 𝑇 be linear operators on a finite dimensional ips 𝑉 . We say that 𝑆
is a square root of 𝑇 iff 𝑆 2 = 𝑇 . Prove that each positive semi-definite linear
operator on a finite dimensional ips has a unique positive semi-definite square
root.
10. Let𝑇 be a linear operator on a finite dimensional ips𝑉 . Show that the following
are equivalent:
(a) 𝑇 is positive semi-definite.
(b) 𝑇 is self-adjoint and each eigenvalue of 𝑇 is non-negative.
(c) 𝑇 has a positive semi-definite square root.
(d) 𝑇 has a self-adjoint square root.
(e) There exists a linear operator 𝑆 on 𝑉 such that 𝑇 = 𝑆 ∗𝑆.
Answers to Exercises
Answers may be brief. You should work them out in detail.
§ 1.2
1(a) It is; 0 = (0, 0), −(𝑎, 0) = (−𝑎, 0). (b) It is. (c) It is not. (d) It is not;
1(1, 1) +2(1, 1) ≠ 3(1, 1). (e) It is not; 1(1, 1) ≠ (1, 1). (f) It is not; 1(1, 1) +1(1, 1) ≠
2(1, 1). (g) It is; 0 = 1, −𝑥 = 1/𝑥 . (h) It is not; (1 + (−1)) 1 ≠ 1 1 + (−1) 1. (i) It
is; 0 = (0, 1), −(𝑎, 𝑏) = (−𝑎, 1/𝑏). (j) It is not; √2 · 1 ∉ 𝑉 . (k) It is not; √2 · √2 ∉ 𝑉 .
(l) It is not; (1 + 1) 1 ≠ 1 1 ⊕ 1 1.
(m) It is; 0 = 𝑤, −𝑥 = 𝑤 − 𝑥 . (n) It is; 0 = (1, 0), −(𝑎, 𝑏) = (2 − 𝑎, −𝑏).
2. No. 𝑡 5 + (−𝑡 5 ) is not in the set. 3. No. 4. Yes. 5. No; (𝑖𝐼 ) ∗ = −(𝑖𝐼 ).
6(a) Add −𝑦. (b) Follows from Theorem 1.2(3e). (c) Follows from (b) as there are
infinite number of scalars.
§ 1.3
1(a) It is not; (0, 0) ∉ 𝑈 . (b) It is. (c) It is. (d) It is. (e) It is not; (1/2)𝑡 ∉ 𝑈 .
(f) It is; taking zero function as even. (g) It is not; 𝑓 (𝑡) = 𝑡 2 does not have an
additive inverse in 𝑈 . (h) It is. (i) It is not; 𝑖𝐼 ∉ 𝑈 .
2. R2 : {(0, 0)}, R2, and all straight lines passing through the origin.
R3 : {(0, 0, 0)}, R3, all straight lines and planes passing through the origin.
3. Only for 𝑏 = 0. 4. Yes. 5. Not closed under addition (scalar multiplication).
6. Yes. 7(a,b,c) Yes. 8. It is a subspace of the space of all functions from 𝑆 to R.
§ 1.4
1. No. 𝑡 3 + 2𝑡 2 + 2𝑡 is not in the span. 2. {𝑎 0 + 𝑎 1𝑡 + 𝑎 2𝑡 2 + · · · + 𝑎𝑛 𝑡 𝑛 : 𝑎 2𝑖−1 = 0}.
3. F3 . 4. 𝑢 ∈ span (𝑆 2 ). So, span (𝑆 2 ) ⊆ span (𝑆 1 ) implies 𝑢 ∈ span (𝑆 1 ). Next,
𝑢 ∈ span (𝑆 1 ) ⇒ 𝑆 2 ⊆ span (𝑆 1 ) ⇒ span (𝑆 2 ) ⊆ span (𝑆 1 ).
5. 𝑆 is a subspace iff it equals the minimal subspace containing it.
6. 1 = 𝑢 1, 𝑡 𝑖 = 𝑢𝑖 − Σ_{𝑗=0}^{𝑖−1} 𝑢 𝑗 ; and F𝑛−1 [𝑡] = span {1, 𝑡, . . . , 𝑡 𝑛−1 }. Next,
F[𝑡] = ∪_{𝑛=0}^{∞} F𝑛 [𝑡]; so it is spanned by {1, 𝑡, 𝑡 2, . . .}.
7. The intersection is a (the) minimal subspace containing 𝑆.
8. No; any linear combination of 1, 𝑡, 𝑡 2, . . . is a polynomial in 𝑡 .
9. Both ∅ and {0} span {0}. If 𝑉 ≠ {0}, then both 𝑉 and 𝑉 \ {0} span 𝑉 .
10. {𝑓 , 𝑔}, where 𝑓 (1) = 1, 𝑓 (2) = 0; 𝑔(1) = 0, 𝑔(2) = 1.
11. Let 𝐸𝑖 𝑗 ∈ F𝑛×𝑛 have (𝑖, 𝑗)th entry 1; all other entries 0.
A basis is {𝐸𝑖𝑖 : 1 ≤ 𝑖 ≤ 𝑛} ∪ {𝐸𝑖 𝑗 + 𝐸 𝑗𝑖 : 1 ≤ 𝑖 < 𝑗 ≤ 𝑛}.
12(a) A linear combination of linear combinations is a linear combination.
(b) A linear combination of vectors from 𝐴 is also a linear combination of vectors
from 𝐵. (c) span (𝐴 ∩ 𝐵) ⊆ span (𝐴) and span (𝐴 ∩ 𝐵) ⊆ span (𝐵).


(d) False: 𝑉 = R, 𝐴 = {1}, 𝐵 = {2}. (e) False: 𝑉 = R2, 𝐴 = {(1, 0), (0, 1)},
𝐵 = {(1, 0)}. (f) False: 𝑉 = R2, 𝐴 = {(1, 0), (0, 1)}, 𝐵 = {(1, 1)}.
13. 𝑈 = 𝑥-axis, 𝑉 = 𝑦-axis, 𝑊 = the line 𝑦 = 𝑥 .
14. If 𝑥 = 𝑢 + 𝑤 = 𝑢 0 + 𝑤 0 for 𝑢, 𝑢 0 ∈ 𝑈 and 𝑤, 𝑤 0 ∈ 𝑊 , then 𝑢 − 𝑢 0 = 𝑤 − 𝑤 0 . So
both 𝑢 − 𝑢 0, 𝑤 − 𝑤 0 ∈ 𝑈 ∩ 𝑊 .
15(a) 𝑉 ⊆ 𝑉 +𝑊 . So, 𝑈 ∩𝑉 ⊆ 𝑈 ∩ (𝑉 +𝑊 ). Similarly, 𝑈 ∩𝑊 ⊆ 𝑈 ∩ (𝑉 +𝑊 ). Then
the conclusion follows. (b) Take 𝑋 = R2, 𝑈 = span {(1, 1)}, 𝑉 = span {(1, 0)},
𝑊 = span {(0, 1)}. (c) 𝑉 ∩ 𝑊 ⊆ 𝑉 ; so 𝑈 + (𝑉 ∩ 𝑊 ) ⊆ 𝑈 + 𝑉 . 𝑉 ∩ 𝑊 ⊆ 𝑊 ; so
𝑈 + (𝑉 ∩ 𝑊 ) ⊆ 𝑈 + 𝑊 . Therefore, 𝑈 + (𝑉 ∩ 𝑊 ) ⊆ (𝑈 + 𝑉 ) ∩ (𝑈 + 𝑊 ). (d) Take
𝑈 , 𝑉 ,𝑊 , 𝑋 as in (b).
16. 𝑐 00, the set of all sequences each having finitely many nonzero terms.
§ 1.5
1(a) Lin. Ind. (b) (7, 8, 9) = 2(4, 5, 6) − (1, 2, 3). (c) Lin. Ind. (d) 4th = 7/11 times
1st +8/11 times 2nd +13/11 times 3rd. (e) Lin. Ind. (f) Lin. Ind. (g) Lin. Ind.
(h) Lin. Ind. (i) 2 = 2 sin2 𝑡 + 2 cos2 𝑡 . (j) Lin. Ind. (k) Lin. Ind.
2. If (𝑎, 𝑏) = 𝛼 (𝑐, 𝑑), then 𝑎𝑑 − 𝑏𝑐 = 0. If (𝑐, 𝑑) = 𝛼 (𝑎, 𝑏), then 𝑎𝑑 − 𝑏𝑐 = 0. If
𝑎𝑑 = 𝑏𝑐, then (𝑎, 𝑏) = (0, 0) or (𝑐, 𝑑) = (0, 0) or (𝑎, 𝑏) = 𝛼 (𝑐, 𝑑) for some nonzero 𝛼 .
3. {(1, 0), (0, 1)} spans R2 . Use Theorem 1.15.
4. (1, 0), (0, 1), (1, 1). 5. (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1).
6(a) If a linear combination of vectors from a subset is zero, then the same shows
that a linear combination of vectors from the superset is also zero. (b) Follows from
(a). (c) Follows from (a). (d) Follows from (c).
7(a) {(1, 0)} is a lin. ind subset of the lin. dep. set {(1, 0), (0, 1)}. (b) Take the sets
in (a). (c) {(1, 0)} and {(2, 0)} are each lin. ind. but their union is not.
(d) {(1, 0), (2, 0)} and {(1, 0), (3, 0)} are each lin. dep. but their intersection is not.
8(a) Not necessarily. 𝐴 = {(1, 0), (2, 0)}, 𝐵 = {(0, 1)}. (b) If 𝑣 ≠ 0 and 𝑣 =
𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 = 𝑏 1𝑣 1 + · · · + 𝑏𝑚 𝑣𝑚 for nonzero 𝑎𝑖 , 𝑏 𝑗 and 𝑢𝑖 ∈ 𝐴, 𝑣 𝑗 ∈ 𝐵, then
𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 − 𝑏 1𝑣 1 − · · · − 𝑏𝑚 𝑣𝑚 = 0 shows that 𝐴 ∪ 𝐵 is lin. dep.
9. Suppose 𝑎𝑒 𝑡 + 𝑏𝑡𝑒 𝑡 + 𝑐𝑡 3𝑒 𝑡 = 0. Evaluate it at 𝑡 = −1, 0, 1. Solve for 𝑎, 𝑏, 𝑐.
10. Yes. Let 𝑓 (𝑡) = Σ_{𝑘=1}^{𝑛} 𝑎𝑘 sin 𝑘𝑡 . 𝑓 (𝑡) = 0 ⇒ ∫_{−𝜋}^{𝜋} sin 𝑚𝑡 𝑓 (𝑡)𝑑𝑡 = 0 for any 𝑚.
Evaluate the integral and conclude that 𝑎𝑚 = 0 for 1 ≤ 𝑚 ≤ 𝑛.
11. Otherwise, a higher degree polynomial is a linear combination of some lower
degree polynomials. Differentiate the equation to get a contradiction.
12. Five vectors in the span of four vectors are lin. dep.
Í
13. 𝑆 ∪ {𝑣 } is lin. dep. implies 𝑛𝑖=1 𝑎𝑖 𝑣𝑖 = 0, where 𝑣𝑖 ∈ 𝑆 ∪ {𝑣 } and 𝑎𝑖 are scalars
not all zero. Since 𝑆 is lin. ind., 𝑣 is one of these 𝑣𝑖0𝑠. Say, 𝑣 1 = 𝑣. Then 𝑎 1 ≠ 0 again
due to the lin. ind. of 𝑆. Then 𝑣 = 𝑣 1 = (𝑎 1 ) −1 𝑛𝑖=2 𝑣𝑖 ∈ span (𝑆).
Í
14. Without loss of generality, suppose 𝑢 + 𝑣 1 = 𝑎 2 (𝑢 + 𝑣 2 ) + · · · + 𝑎𝑛 (𝑢 + 𝑣𝑛 ). If
𝑎 2 + · · · + 𝑎𝑛 = 1, then 𝑣 1 = 𝑎 2𝑣 2 + · · · + 𝑎𝑛 𝑣𝑛 contradicting the lin. ind. of 𝑣 1, . . . , 𝑣𝑛 .
Thus (1 − 𝑎 2 − · · · − 𝑎𝑛 )𝑢 = 𝑎 2𝑣 2 + · · · 𝑎𝑛 𝑣𝑛 .
§ 1.6
1(a) Basis. (b) Basis. (c) Not a basis. (d) Basis. 2. Yes. 3(a) Yes. (b) No.
4. {(1, 0, −1), (0, 1, −1)}. 5. {(1, 0, 0, 0, 1), (0, 1, 0, 1, 0), (0, 0, 1, 0, 1)}.
6. {𝑡 − 2, 𝑡 2 − 2𝑡 − 2}. 7. {1 + 𝑡 2, 1 − 𝑡 2, 𝑡, 𝑡 3 }. 8. Yes; Yes.
9. {𝑒 1, 𝑒 2, 𝑒 3 }, {𝑒 1 + 𝑒 2, 𝑒 1 + 𝑒 3, 𝑒 2 + 𝑒 3 }, {𝑒 1 + 2𝑒 2, 𝑒 2 + 2𝑒 3, 𝑒 3 + 2𝑒 1 }.
10. {𝑓1, 𝑓2, 𝑓3 }, where for 𝑖, 𝑗 ∈ {1, 2, 3}, 𝑓𝑖 (𝑖) = 1, and 𝑓𝑖 ( 𝑗) = 0 when 𝑖 ≠ 𝑗 .
11. Let 𝐸 𝑗𝑘 ∈ C𝑛×𝑛 have 1 as ( 𝑗, 𝑘)th entry; all other entries 0. A basis is
{𝐸 𝑗 𝑗 : 1 ≤ 𝑗 ≤ 𝑛} ∪ {𝑖𝐸 𝑗 𝑗 : 1 ≤ 𝑗 ≤ 𝑛} ∪ {𝐸 𝑗𝑘 + 𝐸𝑘 𝑗 : 1 ≤ 𝑗 < 𝑘 ≤ 𝑛}
∪{𝑖 (𝐸 𝑗𝑘 + 𝐸𝑘 𝑗 ) : 1 ≤ 𝑗 < 𝑘 ≤ 𝑛}.
12. A basis is {𝐸   11 −𝐸𝑛𝑛 : 1 ≤ 𝑖< 𝑛} ∪  {𝐸 𝑖 𝑗 : 1≤ 𝑖 < 𝑗 ≤ 𝑛}.
𝑖 0 0 0 0 1 0 𝑖
13. A basis is , , , .
0 0 0 𝑖 −1 0 𝑖 0
§ 1.7
1. {0}, R3, straight lines passing through the origin, and planes passing through the
origin. 2(a) Basis: {(0, 1, 0, 0, 0), (1, 0, 1, 0, 0), (1, 0, 0, 1, 0), (0, 0, 0, 0, 1)}.
(b) Basis: {(1, 0, 0, 0, −1), (0, 1, 1, 1, 0)}. (c) Basis: {(1, −1, 0, 2, 1), (2, 1, −2, 0, 0),
(2, 4, 1, 0, 1)} 3. 3; It is span {1 + 𝑡 2, −1 + 𝑡 + 𝑡 2, 𝑡 3 }.
4(a) dim F (F𝑛×𝑛 ) = 𝑛 2 ; dim R (C𝑛×𝑛 ) = 2𝑛 2 . (b) In F𝑛×𝑛 , dim F is (𝑛 2 + 𝑛)/2. In
C𝑛×𝑛 , dim R is 𝑛 2 + 𝑛. (c) In F𝑛×𝑛 , dim F is (𝑛 2 − 𝑛)/2. In C𝑛×𝑛 , dim R is 𝑛 2 − 𝑛.
(d) In C𝑛×𝑛 , Basis over R: the 𝑛 matrices with a single 1 on the diagonal, the
𝑛(𝑛 − 1)/2 matrices with a single pair of 1s at corresponding off-diagonal elements
and the 𝑛(𝑛 − 1)/2 matrices with a single pair of 𝑖 and −𝑖 at corresponding off-
diagonal elements. Thus dim R is 𝑛 2 . In R𝑛×𝑛 , dim R is (𝑛 2 + 𝑛)/2.
(e) In F𝑛×𝑛 , dim F is (𝑛 2 + 𝑛)/2. In C𝑛×𝑛 , dim R is 𝑛 2 + 𝑛. (f) In F𝑛×𝑛 , dim F is 𝑛. In
C𝑛×𝑛 , dim R is 2𝑛. (g) In F𝑛×𝑛 , dim F is 1. In C𝑛×𝑛 , dim R is 2.
5. dim (𝑈 ∩ 𝑊 ) = dim (𝑈 ) and 𝑈 ∩ 𝑊 is a subspace of 𝑈 implies 𝑈 ∩ 𝑊 = 𝑈 .
6. If 𝑈 ∩ 𝑊 = {0}, then dim (𝑈 ) + dim (𝑊 ) = dim (𝑈 + 𝑊 ) ≤ 9; a contradiction.
7. {(−3, 0, 1)} is a basis for 𝑈 ∩ 𝑊 ; dim (𝑈 + 𝑊 ) = 3.
8. 𝑈 + 𝑉 = {(𝑎𝑖 ) ∈ R50 : 12/𝑖}, 𝑈 ∩ 𝑉 = {(𝑎𝑖 ) ∈ R50 : 3/𝑖 or 4/𝑖}, dim (𝑈 ) = 34,
dim (𝑉 ) = 38, dim (𝑈 + 𝑉 ) = 46, dim (𝑈 ∩ 𝑉 ) = 26.
9. No: dim (span {𝑡, 𝑡 2, 𝑡 3, 𝑡 4, 𝑡 5 }) ≤ dim (𝑉 ) = 3.
10. {𝑓1, . . . , 𝑓𝑛 } is a basis where 𝑓𝑖 (𝑖) = 1, 𝑓𝑖 ( 𝑗) = 0 for 𝑗 ≠ 𝑖.
11. If 𝑆 is a linearly dependent spanning set, systematically delete vectors to get
a basis; contradicting |𝑆 | = dim (𝑉 ). For Theorem 1.27, adjoin to a basis of 𝑈 the
vectors from a spanning set of 𝑉 and delete all linearly dependent vectors from the
new ones. 12. R[𝑡] ⊆ 𝑉 .
§ 1.8
1. 3. 2. Basis: {(1, −1, 0, 2, 1), (0, 3, −2, −4, −2), (0, 0, 5, 4, 2), (0, 0, 0, 0, 1)}.
3. Basis: {1 + 𝑡 2, 𝑡 + 2𝑡 2, 𝑡 3 }. 4(a) Bases for 𝑈 : {(1, 2, 3), (2, 1, 1)}; 𝑊 :
{(1, 0, 1), (3, 0, −1)}; 𝑈 + 𝑊 : {(1, 2, 3), (0, 3, 5), (0, 2, 2)}; 𝑈 ∩ 𝑊 : {(−3, 0, 1)}.
(b) Bases for 𝑈 : {(1, 0, 2, 0), (1, 0, 3, 0)}; 𝑊 : {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)};
𝑈 + 𝑊 : {(1, 0, 2, 0), (0, 1, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)}; 𝑈 ∩ 𝑊 : {(1, 0, 0, 0)}.
(c) Bases for 𝑈 : {(1, 0, 0, 2), (3, 1, 0, 2), (7, 0, 5, 2)};𝑊 : {(1, 0, 3, 2), (1, 1, −1, −1)};
𝑈 +𝑊 : {(1, 0, 0, 2), (0, 1, 0, −4), (0, 0, 3, 0), (0, 0, 0, 1)}; 𝑈 ∩𝑊 : {(1, −12, 15, 14)}.
5. dim of 𝑈 : 3; 𝑊 : 3; 𝑈 + 𝑊 : 4; 𝑈 ∩ 𝑊 : 2. 6. 𝑏 (𝑎 − 3) = 6.
7(a) Each 𝑣𝑖 ∈ span {𝑣 1, 𝑣 2 − 𝑣 1, . . . , 𝑣𝑛 − 𝑣 1 }. (b) Due to (a) and dim (𝑉 ) = 𝑛.
§ 2.1
2(a) h(0, 1), (0, 1)i = 0. (b) As in (a). (c) h(1, 1), (1, 1)i = 0. (d) h1, 1i = 0. (e) As in
(d). (f) h1, 1i = 0. (g) For 𝑓 (𝑡) = 0 in [0, 1/2] and 𝑓 (𝑡) = 𝑡 in (1/2, 1], h𝑓 , 𝑓 i = 0.
3. (𝑦 t𝐴𝑥) t = 𝑦 t𝐴𝑥 implies 𝑎 12 = 𝑎 21 . Next, 𝑥 t𝐴𝑥 ≥ 0 gives a quadratic. Complete
Í
the square and argue. 4. With 𝑦 = 𝑛𝑖=1 𝛼𝑖 𝑥𝑖 , h𝑦, 𝑦i = 0.
5. h𝑥, 𝑦i = 𝛼 + 𝑖𝛽 ⇒ Reh𝑖𝑥, 𝑦i = −𝛽.
6. (a)-(c) easy. (d) k𝑥 + 𝛼𝑦k 2 = k𝑥 − 𝛼𝑦k 2 iff Re(𝛼 h𝑥, 𝑦i) = 0. Take 𝛼 = h𝑥, 𝑦i.
(e) k𝑥 + 𝑦 k 2 = (k𝑥 k + k𝑦 k) 2 iff Reh𝑥, 𝑦i = k𝑥 k k𝑦 k (Using Reh𝑥, 𝑦i ≤ |h𝑥, 𝑦i| ≤
k𝑥 k k𝑦 k) iff |h𝑥, 𝑦i| = k𝑥 k k𝑦 k (Cauchy-Schwartz) iff one is a scalar multiple of the
other. 7. Expand the norms on the right hand side and simplify.
8. Consider 𝑥 = (𝑎 1, 2𝑎 2, . . . , 𝑛𝑎𝑛 ), 𝑦 = (𝑏 1, 𝑏 2 /2, . . . , 𝑏𝑛 /𝑛). Apply Cauchy-
Schwarz.
§ 2.2
1. 𝑊 = span {(3, −1, 3, 0), (0, −1, 3, 3)}. 2. h𝑥, 𝑦i = h𝑦, 𝑥i ⇒ h𝑥 + 𝑦, 𝑥 − 𝑦i =
k𝑥 k 2 − k𝑦 k 2 . 5. Yes. 6. If 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } is an orthonormal set and for each 𝑥,
k𝑥 k 2 = 𝑛𝑗=1 |h𝑥, 𝑣 𝑗 i| 2, then 𝐵 is an orthonormal basis. For, let 𝑦 = 𝑛𝑗=1 h𝑥, 𝑣 𝑗 i𝑣 𝑗 .
Í Í

Then k𝑥 k 2 = k𝑦 k 2 . 7. Easy.
8(a) 𝑥 ∈ 𝑉 ⊥ ⇒ h𝑥, 𝑣i = 0 for all 𝑣 ∈ 𝑉 . In particular, h𝑥, 𝑥i = 0. (b) h𝑣, 0i = 0 for
all 𝑣 ∈ 𝑉 . (c) If h𝑥, 𝑧i = 0 = h𝑦, 𝑧i, then h𝑥 + 𝛼𝑦, 𝑧i = 0. (d) If 𝑥 ∈ 𝑆, then h𝑥, 𝑦i = 0
for all 𝑦 ∈ 𝑆 ⊥ .
9(a) 𝑊 ⊆ 𝑉 , 𝑊 ⊥ ⊆ 𝑉 ; so 𝑊 + 𝑊 ⊥ ⊆ 𝑉 . Let 𝑣 ∈ 𝑉 . Let {𝑣 1, . . . , 𝑣𝑛 } be an
Í
orthonormal basis of 𝑉 . Write 𝑥 = 𝑛𝑗=1 h𝑥, 𝑣 𝑗 i𝑣 𝑗 ; 𝑦 = 𝑣 − 𝑥 . Then h𝑦, 𝑣 𝑗 i = 0. Hence
𝑦 ∈ 𝑊 ⊥ . (b) Let 𝑥 ∈ 𝑊 ∩𝑊 ⊥ . Then h𝑥, 𝑥i = 0. (c) 𝑊 ⊆ 𝑊 ⊥⊥ . Let 𝑥 ∈ 𝑊 ⊥⊥ . Using
(a), 𝑥 = 𝑤 +𝑦, for some 𝑤 ∈ 𝑊 and 𝑦 ∈ 𝑊 ⊥ . Then h𝑤, 𝑦i = 0 ⇒ 0 = h𝑥, 𝑦i = h𝑦, 𝑦i.
Then 𝑥 = 𝑤 ∈ 𝑊 . ∫ 2𝜋
Í
10. Let 𝑥 (𝑡) := 𝑛𝑗=1 𝑎 𝑗 sin( 𝑗𝑡) = 0. Compute 0 𝑥 (𝑡) sin(𝑚𝑡)𝑑𝑡 for 𝑚 = 1, 2, . . . , 𝑛.
§ 2.3
1(a) (1, 2, 0), (6/5, −3/5, 0), (0, 0, 1). (b) (1, 1, 1), (2/3, −4/3, 2/3), (1, 0, −1).
(c) (0, 1,√1), (0, 1, −1), (−1, 0, 0).
√ √
2(a) (1/ 14)(1, −2, 3). (b) (1/ 74)(7, −4, 3). (c) (1/2 5)(0, 2, 4).
3(a) span {(−1, 1, 0, 1), (0, 0, 1, 0)}. (b) span {(−6, 1, 5, 2), (0, 1, −1, 1)}.
(c) span {(1, 0, 0, 0), (0, 0, 1, 1)}. 4. (1/2)(1, 0, 𝑖/2), (1/4)(1 + 𝑖, 1, 1 − 𝑖) .
5(a) 1, 𝑡 − 1/2, 𝑡 2 − 𝑡 + 1/6. (b) 1, 𝑡, 𝑡 2 − 1/3. (c) 1, 𝑡 + 1/2, 𝑡 2 − 5𝑡 − 11/6.
6(a) {−(𝑏 + 𝑐 + 𝑑) + 2𝑏𝑡 + 3𝑐𝑡 2 + 4𝑑𝑡 3 : 𝑎, 𝑏, 𝑐, 𝑑 ∈ R}.
(b) 1, 𝑡 − 1/2, 𝑡 2 − 𝑡 + 1/6, 𝑡 3 − 3𝑡 2 /2 + 3𝑡/5 − 1/20.
§ 2.4
1. (5/3, 4/3, 1/3). 2. (−1/3, 2/3, −1/3). 3. 𝑣 since 𝑣 ∈ 𝑈 . 4. −19/20−3𝑡/5+3𝑡 2 /2.
5. 𝑒 𝑡 − 9(𝑒 − /𝑒) + 3𝑡/𝑒 − 15(𝑒 − 13/𝑒)𝑡 2 /8.
§ 3.1
1(a) 𝑇 (0, 0) ≠ (0, 0). √ (b) 𝑇 (2, 2) = (2, 4); 2𝑇 (1, 1) = (2, 2). (c) 𝑇 (𝜋/2, 0) =
(1, 0); 2𝑇 (𝜋/4, 0) = ( 2, 0). (d) 𝑇 (−1, 0) = (1, 0); (−1)𝑇 (1, 0) = (−1, 0).
(e) 𝑇 (0, 0) ≠ (0, 0). (f) 𝑇 (0, 2) = (0, 4); 2𝑇 (0, 1) = (2, 2).
2. 𝑇 (𝑥) = 𝛼𝑥 for some 𝛼 . 3. 𝑇 (2, 3) = (5, 11). 𝑇 is one-one.
4. 𝑇 𝑆 (𝑥) = 0 and 𝑆𝑇 (𝑥) = 𝑥 (1) − 𝑥 (0). Both are linear transformations.
5. No. If 𝑇 (𝑎, 𝑏) = (1, 1), then 𝑇 (−𝑎, −𝑏) = (−1, −1), which is not in the co-domain
square. 6. Fix a basis {𝑣 1, 𝑣 2 } for 𝑉 . If 𝑣 = 𝑎𝑣 1 + 𝑏𝑣 2, define 𝑇 (𝑣) = (𝑎, 𝑏).
§ 3.2
1(a) No 𝑇 since 𝑇 (2, −1) ≠ 2𝑇 (1, 1) − 3𝑇 (0, 1). (b) 𝑇 (𝑎, 𝑏) = (2𝑎 − 𝑏, 𝑎 − 𝑏, 2𝑎).
(c) No 𝑇 as 𝑇 (−2, 0, −6) ≠ −2𝑇 (1, 0, 3). (d) 𝑇 (𝑎, 𝑏, 𝑐) = (𝑐, (𝑏 + 𝑐 − 𝑎)/2).
(e) 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ) = 𝑏 + 𝑐 and many more. (f) This 𝑇 itself.
(g) No 𝑇 since 𝑇 (1 + 𝑡) ≠ 𝑇 (1) + 𝑇 (𝑡). (h) This 𝑇 is linear.
2. No. Let 𝑇 (1, 1) = (𝑎, 𝑏), 𝑇 (1, −1) = (𝑐, 𝑑). Now, −1 ≤ 𝑎, 𝑐 ≤ 1 and 0 ≤ 𝑏, 𝑑 ≤ 2.
Then 𝑇 (−1, −1) = (−𝑎, −𝑏), 𝑇 (−1, 1) = (−𝑐, −𝑑). Here, 0 ≤ −𝑏, −𝑑 ≤ 2 also.
(Image points are inside the co-domain.) It forces 𝑏 = 𝑑 = 0. So, 𝑇 (1, 1) = (𝑎, 0),

𝑇 (1, −1) = (𝑐, 0). Then 𝑇 (𝛼, 𝛽) = (𝛼 + 𝛽)𝑎/2 + (𝛼 − 𝛽)𝑏/2, 0 for all 𝛼, 𝛽 ∈ R. No
point goes to (1, 2).
3. Expand k𝑇 (𝑢 + 𝑣)k 2 and use k𝑇 𝑥 k 2 = k𝑥 k 2 for all 𝑥 ∈ 𝑉 .
§ 3.3
1(a) rank(𝑇 ) = 2, null(𝑇 ) = 0. (b) rank(𝑇 ) = 2, null(𝑇 ) = 1.
(c) rank(𝑇 ) = 2, null(𝑇 ) = 0. (d) rank(𝑇 ) = 2, null(𝑇 ) = 0.
(e) rank(𝑇 ) = 2, null(𝑇 ) = 0. (f) rank(𝑇 ) = 3, null(𝑇 ) = 0.
2. Space of all const. func.
3(a) 𝑁 (𝑇 ) = {(𝑎, 𝑎, −𝑎) : 𝑎 ∈ R}, 𝑅(𝑇 ) = {(𝑎, 𝑎 + 𝑏, 𝑏) : 𝑎, 𝑏 ∈ R}.
(b) 𝑆 = {(1, 0, 1) + 𝑥 : 𝑥 ∈ 𝑁 (𝑇 )}.
4(a) rank(𝑇 ) ≤ dim (𝑉 ) and 𝑅(𝑇 ) ⊆ 𝑊 . (b) 𝑇 is onto implies rank(𝑇 ) = dim (𝑊 ).
(c) 𝑇 is one-one implies rank(𝑇 ) = dim (𝑉 ). (d) Follows from (c).
(e) Follows from (b).
5(a) 𝑇 (𝑎, 𝑏) = (𝑎 − 𝑏, 𝑎 − 𝑏). (b) 𝑆 (𝑎, 𝑏) = (𝑎, 𝑏), 𝑇 (𝑎, 𝑏) = (𝑏, 𝑎).
6. Extend a basis {𝑢 1, . . . , 𝑢𝑘 } for 𝑈 to a basis {𝑢 1, . . . , 𝑢𝑘 , 𝑣 1, . . . , 𝑣𝑚 } for 𝑉 .
(a) 𝑇 (𝑢𝑖 ) = 𝑢𝑖 , 𝑇 (𝑣 𝑗 ) = 0. (b) 𝑇 (𝑢𝑖 ) = 0, 𝑇 (𝑣 𝑗 ) = 𝑣 𝑗 .
7. 𝑅(𝑇 ) = span {𝑇 𝑣 1, . . . ,𝑇 𝑣𝑛 }. (a)𝑇 is one-one iff 𝑅(𝑇 ) = dim (𝑉 ) iff {𝑇 𝑣 1, . . . ,𝑇 𝑣𝑛 }


is a basis of 𝑅(𝑇 ). Similarly, (b)-(c) follow. 8. 𝑓 is a linear transformation. 𝑓 is
one-one iff 𝑁 (𝑓 ) = {0} iff 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣 − 𝑛 = 0 implies 𝛼 1 = · · · = 𝛼𝑛 = 0.
9. 𝑇 (𝐼 − 𝑇 ) = (𝐼 − 𝑇 )𝑇 . Let 𝑦 ∈ 𝑅(𝑇 ). Then for some 𝑥, 𝑦 = 𝑇 𝑥 . So, (𝐼 − 𝑇 )𝑇 𝑥 =
(𝐼 − 𝑇 )𝑦 = 0. That is, 𝑦 ∈ 𝑁 (𝐼 − 𝑇 ). Similarly other implications are proved.
10. 𝑅(𝑇 𝑆) ⊆ 𝑇 (𝑅(𝑆)). So, rank(𝑇 𝑆) ≤ dim (𝑅(𝑆)) = rank(𝑆). And, 𝑅(𝑇 𝑆) ⊆
𝑇 (𝑅(𝑆)) ⊆ 𝑇 (𝑉 ) = 𝑅(𝑇 ). So, rank(𝑇 𝑆) ≤ rank(𝑇 ).
§ 3.4
1. 𝑇 (𝑣𝑖 ) = 𝑒𝑖 . 𝑁 (𝑇 ) = {0}, 𝑅(𝑇 ) = F𝑛 .
2. 𝑇 (𝑎 0, 𝑎 1, . . . , 𝑎𝑛 ) = (𝑎 1 + 𝑎 2𝑡 + · · · + 𝑎𝑛 𝑡 𝑛−1 + 𝑎 0𝑡 𝑛 ).
3. 𝑅(𝑇 ) = span {𝑇 𝑣 1, . . . ,𝑇 𝑣𝑛 }. See Exercise 7 of § 3.3.
4. null(𝑇 ) = dim (𝑉 ) − 1 iff rank(𝑇 ) = 1 iff 𝑇 is onto. 5. Use Theorem 3.18.
6. (a) h𝑇 𝑥,𝑇 𝑥i = (𝑇 𝑥) ∗ (𝑇 𝑥) ≥ 0. (b) h𝑇 𝑥,𝑇 𝑥i = (𝑇 𝑥) ∗ (𝑇 𝑥) = 0 iff 𝑇 𝑥 = 0 iff
𝑥 = 0. Similarly, other conditions can be verified.
7(a) 𝑆𝑇 = 𝐼𝑉 implies 𝑇 is one-one; then null(𝑇 ) = 0. dim (𝑉 ) = dim (𝑊 ) implies
rank(𝑇 ) = dim (𝑉 ) = dim (𝑊 ). So, 𝑇 is onto. Thus 𝑇 is an isomorphism; 𝑆 = 𝑇 −1 .
(b) Define 𝑇 : R2 → R3, 𝑆 : R3 → R2 by 𝑇 (𝑎, 𝑏) = (𝑎, 𝑏, 0) and 𝑆 (𝑎, 𝑏, 𝑐) = (𝑎, 𝑏).
Then 𝑆𝑇 (𝑎, 𝑏) = 𝑆 (𝑎, 𝑏, 0) = (𝑎, 𝑏). But 𝑇 𝑆 (𝑎, 𝑏, 𝑐) = 𝑇 (𝑎, 𝑏) = (𝑎, 𝑏, 0).
8. 𝑉 = R∞ ; 𝑇 (𝑎 1, 𝑎 2, 𝑎 3, . . .) = (𝑎 1, 𝑎 1, 𝑎 2, 𝑎 2, 𝑎 3, 𝑎 3, . . .);
𝑆 (𝑎 1, 𝑎 2, 𝑎 3, . . .) = (𝑎 1, 𝑎 3, 𝑎 5, . . .).
§ 3.5
1. 𝑇 ∗ (𝛼) = 𝛼𝑢. 2. 𝑇 ∗ (𝑎 1, . . . , 𝑎𝑛 ) = (𝑎 2, . . . , 𝑎𝑛 , 0).
3. 𝑆𝑇 = 𝑇 𝑆 = 𝐼 ⇒ 𝑇 ∗𝑆 ∗ = 𝑆 ∗𝑇 ∗ = 𝐼 . So, 𝑇 ∗ is invertible and its inverse is
𝑆 ∗ = (𝑇 −1 ) ∗ .
4(a) 𝑥 ∈ 𝑁 (𝑇 ) ⇒ h𝑇 𝑥, 𝑦i = 0 ⇒ h𝑥,𝑇 ∗𝑦i = 0 ∀𝑦 ∈ 𝑊 . So, 𝑥 ∈ 𝑅(𝑇 ∗ ) ⊥ . Also,
𝑥 ∈ 𝑅(𝑇 ∗ ) ⊥ ⇒ h𝑥,𝑇 ∗𝑦i = 0 ∀𝑦 ∈ 𝑊 ⇒ h𝑇 𝑥, 𝑦i = 0 ∀𝑦 ∈ 𝑊 . In particular,
h𝑇 𝑥,𝑇 𝑥, i = 0. So, 𝑇 𝑥 = 0 ⇒ 𝑥 ∈ 𝑁 (𝑇 ). Others are proved using 𝑇 ∗ instead of 𝑇
and Exercise 9 of § 2.2.
5. If 𝑇𝑇 ∗ = 𝑇 ∗𝑇 = 𝐼, then h𝑇 𝑥,𝑇 𝑥i = h𝑥, 𝑥i. For the converse, use polarization
identity in Exercise 7 of § 2.1 so that k𝑇 𝑥 k = k𝑥 k implies h𝑇 𝑥,𝑇𝑦i = h𝑥, 𝑦i. It gives
𝑇 ∗𝑇 = 𝑇𝑇 ∗ = 𝐼 .
6(a) 𝑇 : F2 → F2, 𝑇 (𝑎, 𝑏) = (2𝑎 − 3𝑏, 3𝑎 − 2𝑏). 𝑇 ∗ (𝑎, 𝑏) = (2𝑎 + 3𝑏, −3𝑎 + 2𝑏).
(b) The 𝑇 in (a). (c) 𝑇 : F2 → F2, 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎), 𝑇 ∗ (𝑎, 𝑏) = (𝑏, −𝑎). (d) 𝑇 = 2𝐼 .
7. (1/3, 1/3, 1/3). 8. 𝑓 (𝑝 (𝑡)) = 𝑝 (1/2) is a linear functional on R2 [𝑡]. We require
a 𝑞(𝑡) so that h𝑝, 𝑞i = 𝑓 (𝑝). Take 𝑞(𝑡) = h𝑓 , 𝑢 1 i𝑢 1 + h𝑓 , 𝑢 2 i𝑢 2 + √h𝑓 , 𝑢 3 i𝑢 3, where
{𝑢 1, 𝑢 2, 𝑢 3 } is an orthonormal basis for R2 [𝑡]. Take 𝑢 1 = 1, 𝑢 2 = 3(2𝑡 − 1), 𝑢 3 =

5(6𝑡 2 − 6𝑡 + 1). Then 𝑞(𝑡) = −3/2 + 15𝑡 − 15𝑡 2 .
9. null(𝑇 ) = null(𝑇 ∗𝑇 ) = null(𝐼 ) = 0. Thus 𝑇 is invertible. Then
𝑇 ∗𝑇 = 𝐼 ⇒ 𝑇𝑇 ∗𝑇 = 𝑇 ⇒ 𝑇𝑇 ∗ = 𝐼 .
§ 4.1
1. rank(𝐴) ≤ 𝑚 < 𝑛. So, null(𝐴) = 𝑛 − rank(𝐴) ≥ 1.
2(a) 𝐴𝑥 = 𝑏 has a unique solution iff 𝑏 ∈ 𝑅(𝐴) and 𝑁 (𝐴) = {0}. (b) Sol (𝐴, 𝑏) =
𝑢 + 𝑁 (𝐴) = 𝑢 + span (𝐵), where 𝐵 is a basis of 𝑁 (𝐴). (c) If 𝑚 = 𝑛, then 𝐴𝑥 = 𝑏 has
a unique solution iff 𝑏 ∈ 𝑅(𝐴) and 𝑁 (𝐴) = {0} iff rank(𝐴) = 𝑛 iff det(𝐴) ≠ 0.
3. 𝐴 has 𝑚 rows and 𝑛 columns.
(a) rank(𝐴) = 𝑛 ⇒ null(𝐴) = 0 ⇒ 𝑁 (𝐴) = Sol (𝐴, 0) = {0}.
(b) rank(𝐴) < 𝑛 ⇒ null(𝐴) ≥ 1 ⇒ 𝑁 (𝐴) = Sol (𝐴, 0) ≠ {0}.
(c) If 𝑚 ≥ 𝑛, then rank(𝐴) ≤ 𝑛. Construct examples.
(d) If 𝑚 < 𝑛, then rank(𝐴) < 𝑛 ⇒ null(𝐴) ≥ 1. (e) 𝐴𝑥 = 0 has infinitely many
solutions iff null(𝐴) ≥ 1 iff rank(𝐴) < 𝑛 iff columns of 𝐴 are lin. dep.
4(a) rank([𝐴|𝑏]) > rank(𝐴) ⇒ 𝑏 ∉ 𝑅(𝐴). (b) Theorem 4.3(a).
(c) rank([𝐴|𝑏]) = rank(𝐴) ⇒ 𝐴𝑥 = 𝑏 has a solution. rank(𝐴) < 𝑛 ⇒ null(𝐴) ≥
1 ⇒ Sol (𝐴, 0) is a nonzero subspace, an infinite set.
(d) 𝐴𝑥 = 𝑏 has a solution iff 𝑏 ∈ 𝑅(𝐴).
(e) 𝐴𝑥 = 𝑏 has at most one solution iff 𝐴𝑥 = 0 has a unique solution iff null(𝐴) = 0
iff rank(𝐴) = 𝑛 iff columns of 𝐴 are lin. ind.
5. If 𝐴𝑥 = 𝑏 has a solution, then 𝑏 ∈ 𝑅(𝐴). Now, 𝑏 is orthogonal to each column of
𝐴 implies 𝑏 ⊥ 𝑦 for each 𝑦 ∈ 𝑅(𝐴). Then 𝑏 ⊥ 𝑏 ⇒ 𝑏 = 0.
6. If 𝑈 = {0}, then take 𝐴 = 𝐼, 𝑏 = 𝑣 so that Sol (𝐴, 𝑏) = {𝑏}. Else, let 𝑈 be a nonzero
subspace. Extend a basis {𝑢 1, . . . , 𝑢𝑚 } of 𝑈 to a basis 𝐵 = {𝑢 1, . . . , 𝑢𝑚 , 𝑢𝑚+1, . . . , 𝑢𝑛 }
of F𝑛×1 . Define 𝑇 : F𝑛×1 → F𝑛×1 by 𝑇 (𝑢𝑖 ) = 0 for 𝑖 ≤ 𝑚, and 𝑇 (𝑢𝑖 ) = 𝑢𝑖 for
𝑖 > 𝑚. If 𝑣 = [𝑎 1, . . . , 𝑎𝑛 ] 𝑡 , take 𝑦 = 𝑇 (𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 ). Then {𝑥 ∈ F𝑛×1 :
𝑇 𝑥 = 𝑦} = (𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 ) + 𝑈 . Finally, take 𝐴 = [𝑇 ]𝐵,𝐵 and 𝑏 = [𝑦]𝐵 . Then
Sol (𝐴, 𝑏) = [𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 ]𝐵 + 𝑈 = 𝑣 + 𝑈 .
7. Let 𝑋 = [𝑋 1 · · · 𝑋𝑘 ]. 𝐴𝑋 = 0 iff 𝐴𝑋 1 = 0, . . . , 𝐴𝑋𝑘 = 0. So, dim (𝑈 ) = 𝑘 null(𝐴).
8(a)n𝑥 1 = 10/9, 𝑥 2 = 11/9, 𝑥 3 = 17/9, 𝑥 4 = −10/9. o
 𝑡  𝑡
(b) −10/9 23/27 121/27 0 + 𝛼 −2 −1/3 7/3 1 : 𝛼 ∈ F . 9. 𝑘 = −12.
10(a) 𝑘 = −3. (b) 𝑘 = 2. (c) 𝑘 ∉ {−3, 2}.
11. 𝑎 ∉ {0, 7} : unique solution; 𝑎 = 0, 𝑏 = 5/2 or 𝑎 = 7, 𝑏 = 4/9 : infinitely many
solutions; 𝑎 = 0, 𝑏 ≠ 5/2 or 𝑎 = 7, 𝑏 ≠ 4/9 : no solutions.
§ 4.2
1(a) 𝑥 = −1/5, √
𝑦 = 3/5. (b) 𝑥 = −2/3, 𝑦 = 1, 𝑧 = −1/3.
0 1/ 2  
  1 1
√ .
2(a) 𝑄 = 1 0√  , 𝑅 =
0 1/ 2 0 2
 3 1 −√2
 √  √ √ √
2 3 3 3 
√ √   
(b) 𝑄 = √1  3 −1 , 𝑅 = √1  0 3 1  .
√2 

6  6
 0 0 √
0 2 2   2 2
 
1 −1 1 
  2 3 2 
1 1 1 −1
   
(c) 𝑄 = 2  , 𝑅 = 0 5 −2 .
1 1 1 
  
0 0 4 
1 −1 −1  
 
3. Suppose 𝐴 = 𝑄 1𝑅1 = 𝑄 2𝑅2, 𝑄 1, 𝑄 2 ∈ F𝑚×𝑛 satisfy 𝑄 1∗𝑄 1 = 𝑄 2∗𝑄 2 = 𝐼, 𝑅1 = [𝑎𝑖 𝑗 ],
𝑅2 = [𝑏𝑖 𝑗 ] ∈ F𝑛×𝑛 are upper triangular, and 𝑎𝑘𝑘 > 0, 𝑏𝑘𝑘 > 0 for 1 ≤ 𝑘 ≤ 𝑛. Then
𝑅1∗𝑅1 = 𝑅1∗𝑄 1∗𝑄 1𝑅1 = (𝑄 1𝑅1 ) ∗𝑄 1𝑅1 = 𝐴∗𝐴 = (𝑄 2𝑅2 ) ∗𝑄 2𝑅2 = 𝑅2∗𝑅2 . Now, 𝑅1, 𝑅2, 𝑅1∗
and 𝑅2∗ are all invertible. Multiplying (𝑅2∗ ) −1 on the left, and (𝑅1 ) −1 on the right,
we have (𝑅2∗ ) −1𝑅1∗𝑅1 (𝑅1 ) −1 = (𝑅2∗ ) −1𝑅2∗𝑅2 (𝑅1 ) −1 . It implies (𝑅2∗ ) −1𝑅1∗ = 𝑅2𝑅1−1 . The
matrix on the left is lower triangular and that on the right is upper triangular;
so both are diagonal. Comparing the diagonal entries in the products we have
[(𝑏𝑖𝑖 ) −1 ] ∗𝑎𝑖𝑖∗ = 𝑏𝑖𝑖 (𝑎𝑖𝑖 ) −1 for 1 ≤ 𝑖 ≤ 𝑛. That is, |𝑎𝑖𝑖 | 2 = |𝑏𝑖𝑖 | 2 . Since 𝑎𝑖𝑖 > 0
and 𝑏𝑖𝑖 > 0, we see that 𝑎𝑖𝑖 = 𝑏𝑖𝑖 . Hence (𝑅2−1 ) ∗𝑅1∗ = 𝑅2𝑅1−1 = 𝐼 . Therefore,
𝑅2 = 𝑅1, 𝑄 2 = 𝐴𝑅2−1 = 𝐴𝑅1−1 = 𝑄 1 .
4. Since columns of 𝐴 are lin. ind. the least squares solution is unique.
5(a) Extend the columns of 𝐴 to a basis {𝑢 1, . . . , 𝑢𝑚 } of F𝑚×1 . Use Gram-Schmidt
orthogonalization to obtain a basis {𝑣 1, . . . , 𝑣𝑚 } for F𝑚×1 . There exists an isomor-
phism between these two bases. So, let 𝑃 ∈ F𝑚×𝑚 be the invertible matrix (this
isomorphism) such that 𝑃𝑢𝑖 = 𝑣𝑖 for 1 ≤ 𝑖 ≤ 𝑚. Let 𝐶 = [𝑣 1 · · · 𝑣𝑛 ]. Then 𝐶 ∗𝐶 is a
diagonal matrix. Moreover, 𝑃𝐴 = 𝐶 implies that 𝐴 = 𝑃 −1𝐶, which is the required
factorization of 𝐴. (b) Use orthonormalization instead of orthogonalization.

§ 4.3
−1/3 1/3   1/3 1/3  1 0 1 1 0 1
       
1. [𝑇 ]𝐷,𝐵 =  0 1  ; [𝑇 ]𝐷,𝐶 =  2 3  . 2(a)-(b) 0 1 1 . (c)
 
1 1 0 .
 
 2/3 −2/3 −2/3 −2/3 1 1 0 0 1 1
       
 −1 −1 0 
 
   0 2 0
3. 1 2 4 . 4.   .
 1 −1 0 

 0 0 1
 

§ 4.4
1 0 0 0  0 1 0
   
0 0 1 0  2 2 4  
1(a)   . (b)   . (c) 1 0 0 1 .
0 1 0 0  0 0 0
  
0 0 0 1  0 1 6
   
2(a) If 𝑆𝑇 = 𝑇 𝑆, then (𝑆𝑇 ) ∗ = (𝑇 𝑆) ∗ = 𝑆 ∗𝑇 ∗ = 𝑆𝑇 . If (𝑆𝑇 ) ∗ = 𝑆𝑇 , then 𝑆𝑇 = (𝑆𝑇 ) ∗ =
𝑇 ∗𝑆 ∗ = 𝑇 𝑆. (b) (𝑇 ∗𝑆𝑇 ) ∗ = 𝑇 ∗𝑆 ∗𝑇 = 𝑇 ∗𝑆𝑇 .
(c) 𝑇 ∗𝑆𝑇 = (𝑇 ∗𝑆𝑇 ) ∗ = 𝑇 ∗𝑆 ∗𝑇 . Multiply (𝑇 ∗ ) −1 on the left and 𝑇 −1 on the right.
3. 𝐴∗ + 𝐴 = 0 = 𝐵 ∗ + 𝐵 ⇒ (𝐴 + 𝛼𝐵) ∗ + (𝐴 + 𝛼𝐵) = 0 for real 𝛼 .
         1 1 + 𝑖 1 1 1 0
𝑖 0 0 0 0 −1 0 𝑖    
Basis: , , , . 4.  −1 + 𝑖 1 1  . 5. 0 1 1 .
0 0 0 𝑖 1 0 𝑖 0 
 −1 −1 1
 
1 0 1

   
6. tr(𝐴 + 𝛼𝐵) = tr(𝐴) + 𝛼tr(𝐵); so 𝑉 is a subspace of F𝑛×𝑛 .
7. tr(𝐴 + 𝛼𝐵) = tr(𝐴) + 𝛼tr(𝐵); with 𝑉 as in Q.6, null(𝑇 ) = dim F (𝑉 ) = 𝑛 2 − 1.
0 1
8. 𝐴 = . 9(a) No; tr(−𝐼 + (−𝐼 )) < 0. (b) Yes. 10. 𝐴 = 𝐼 = 𝐵.
−1 0Í Í Í Í Í Í
11. tr(𝐴𝐵) = 𝑛𝑗=1 𝑛𝑘=1 𝑎 𝑗𝑘 𝑏𝑘 𝑗 = 𝑛𝑘=1 𝑛𝑗=1 𝑎𝑘 𝑗 𝑏 𝑗𝑘 = 𝑛𝑗=1 𝑛𝑘=1 𝑏 𝑗𝑘 𝑎𝑘 𝑗 = tr(𝐵𝐴).
 
𝑎 𝑏
12. tr(𝐴𝐵 − 𝐵𝐴) = tr(𝐴𝐵) − tr(𝐵𝐴) = 0 ≠ tr(𝐼 ). 13. Let 𝐶 = . If 𝑎 = 0, take
𝑐 −𝑎
       
0 −𝑏 1 0 0 𝑎 0 0
𝐴= ,𝐵= . If 𝑎 ≠ 0, take 𝐴 = ,𝐵= .
𝑐 0 0 0 0 𝑐 1 𝑏/𝑎
14. tr(𝐴) = 𝑖 𝑗 |𝑎𝑖 𝑗 | 2 .
Í Í
15. 𝐴∗𝐴 = 𝐴2 ⇒ 𝐴𝐴∗ = (𝐴∗ ) 2 . Then tr[(𝐴∗ − 𝐴) ∗ (𝐴∗ − 𝐴)] = tr[𝐴𝐴∗ − 𝐴∗𝐴] = 0.

§ 4.5

 1 1 −1  −1 1 0 1
  1
 
1.  −1 1 1  = 2  1 1 0  .
 1 −1 1  0 1 1
   
2 1 1 2 3 3

1 
 
1 

(a) [𝐼 ]𝑁 ,𝑂 = 2  1 2 1  . (b) [𝑇 ]𝑁 ,𝑂 = 2  2 1 3  .

1 1 2 0 0 2
   
1 1 1  4    1  4
      1
      
(c)  2  =  0  ,  2  = 2  3  ,  𝑇  2   =  4  .
3
  𝑂   2   3 𝑁
 5    3 
      𝑁  2 
     
 
2(a) 𝑄 = [0 1; 1 0], 𝑅 = [1 0; −4 −1]. (b) 𝑃 = [1 1; −1 −3]. (c) 𝑆 = [1 0; −4 −1].
(d) 𝑆 = 𝑃𝑄𝑃⁻¹ = [𝐼]𝑁,𝑂 [𝐴]𝑂,𝑂 [𝐼]𝑁,𝑂⁻¹ = [𝐴]𝑁,𝑂 [𝐼]𝑂,𝑁 = [𝐴]𝑁,𝑁 = 𝑅.
3. 𝐴 = [1 0; 1 1], 𝑣 = (1, 2)ᵗ, 𝐵 = {(1, 0)ᵗ, (1, 1)ᵗ}; then [𝑣]𝐵 = (−1, 2)ᵗ, 𝐴[𝑣]𝐵 = (−1, 1)ᵗ, [𝐴𝑣]𝐵 = [(1, 3)ᵗ]𝐵 = (−2, 3)ᵗ.
4. If 𝑇𝑣𝑖 = 𝑎𝑖1𝑤1 + · · · + 𝑎𝑖𝑚𝑤𝑚 for 1 ≤ 𝑖 ≤ 𝑛, then 𝑇 = Σ_{𝑖=1}^{𝑛} Σ_{𝑗=1}^{𝑚} 𝑎𝑖𝑗𝑇𝑖𝑗. Next, if this sum equals 0, then each 𝑇𝑣𝑖 = 0. As {𝑤𝑗} is lin. ind., 𝑎𝑖1 = · · · = 𝑎𝑖𝑚 = 0. So, {𝑇𝑖𝑗} is lin. ind.
5(a) 𝑇 is one-one iff 𝑁(𝑇) = {0} iff null([𝑇]𝐸,𝐵) = 0 iff rank([𝑇]𝐸,𝐵) = 𝑛.
(b) 𝑇 is onto iff [𝑇]𝐸,𝐵 is onto iff rank([𝑇]𝐸,𝐵) = 𝑚.
6. Both L (𝑉 ,𝑊 ) and F𝑚×𝑛 are vector spaces. Use Exercise 5 and show that
[𝛼 𝑇 ]𝐸,𝐵 = 𝛼 [𝑇 ]𝐸,𝐵 and [𝑆 + 𝑇 ]𝐸,𝐵 = [𝑆]𝐸,𝐵 + [𝑇 ]𝐸,𝐵 .
7. Since the map 𝑇 ↦→ [𝑇 ]𝐸,𝐵 is an isomorphism, it maps a basis onto a basis.

8(a) Write 𝐶𝑗 := the 𝑗th column of 𝐴 = [⟨𝑢1, 𝑢𝑗⟩, . . . , ⟨𝑢𝑛, 𝑢𝑗⟩]ᵗ. Suppose for scalars 𝑏1, . . . , 𝑏𝑛, Σ_𝑗 𝑏𝑗𝐶𝑗 = 0. Its 𝑖th component gives Σ_𝑗 𝑏𝑗⟨𝑢𝑖, 𝑢𝑗⟩ = 0. That is, for each 𝑖, ⟨Σ_𝑗 𝑏𝑗𝑢𝑗, 𝑢𝑖⟩ = 0. Since {𝑢𝑖} is a basis, for each 𝑣 ∈ 𝑉, ⟨Σ_𝑗 𝑏𝑗𝑢𝑗, 𝑣⟩ = 0. In particular, ⟨Σ_𝑗 𝑏𝑗𝑢𝑗, Σ_𝑗 𝑏𝑗𝑢𝑗⟩ = 0. Or, Σ_𝑗 𝑏𝑗𝑢𝑗 = 0. Due to lin. ind. of {𝑢𝑗}, each 𝑏𝑗 = 0. So, the columns of 𝐴 are lin. ind.
(b) Since {𝐶1, . . . , 𝐶𝑛} is a basis for F𝑛×1, there exist unique scalars 𝑏1, . . . , 𝑏𝑛 such that [𝛼1, . . . , 𝛼𝑛]ᵗ = 𝑏1𝐶1 + · · · + 𝑏𝑛𝐶𝑛. Comparing the components, we have 𝛼𝑖 = ⟨𝑢𝑖, Σ_𝑗 𝑏𝑗𝑢𝑗⟩. So, 𝛼𝑖 = ⟨Σ_𝑗 𝑏𝑗𝑢𝑗, 𝑢𝑖⟩.
9. [𝑇]𝐶,𝐶 = [𝐼]𝐶,𝐵 [𝑇]𝐵,𝐵 [𝐼]𝐵,𝐶. Thus we show that if 𝑅 = 𝑃⁻¹𝑄𝑃, then tr(𝑅) = tr(𝑄) for 𝑛 × 𝑛 matrices 𝑃, 𝑄, 𝑅, with 𝑃 invertible. For this, use tr(𝑀1𝑀2) = tr(𝑀2𝑀1). Similarly, do for the determinant.
Í Í Í Í
10. 𝑥 = 𝑖 h𝑥, 𝑢𝑖 i𝑢𝑖 , 𝑦 = 𝑗 h𝑦, 𝑢 𝑗 i𝑢 𝑗 ⇒ h𝑥, 𝑦i = 𝑖 h𝑥, 𝑢𝑖 i 𝑗 h𝑢 𝑗 , 𝑦ih𝑢𝑖 , 𝑢 𝑗 i. This
proves the first part. Next, define 𝑇 : 𝑉 → F𝑛 by 𝑇 (𝑢𝑘 ) = 𝑒𝑘 for 𝑘 = 1, . . . , 𝑛. Since
𝑥 = 𝑘 h𝑥, 𝑢𝑘 i𝑢𝑘 , 𝑇 𝑥 = 𝑘 h𝑥, 𝑢𝑘 i𝑒𝑘 . Using first part, k𝑇 𝑥 k 2 = 𝑘 |h𝑥, 𝑢𝑘 i| 2 = k𝑥 k 2 .
Í Í Í
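
A small numerical illustration of the last part (the orthonormal basis below is generated from an arbitrary matrix via QR; not from the text): the coordinate map with respect to an orthonormal basis preserves norms.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns: an orthonormal basis u_1, ..., u_n

    x = rng.standard_normal(n)
    coords = U.T @ x                                   # the coefficients <x, u_k>
    print(np.isclose(np.linalg.norm(coords), np.linalg.norm(x)))   # True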

§ 4.6
1. For 𝐴 = [𝑎𝑖𝑗], consider the conjugate matrix 𝐴̄ = [𝑎̄𝑖𝑗]. See that rank(𝐴̄) = rank(𝐴). Then use rank(𝐵ᵗ) = rank(𝐵). 2. If rank(𝐴) = 𝑟 = rank(𝐵), then 𝐴 = 𝑄⁻¹𝐸𝑟𝑃 and
𝐵 = 𝑀 −1 𝐸𝑟 𝑆. So, 𝐵 = 𝑀 −1𝑄𝐴𝑃 −1𝑆.
3(a) 𝑅(𝐴𝐵) = {𝐴𝐵𝑥 : 𝑥 ∈ F𝑘×1 } ⊆ {𝐴𝑦 : 𝑦 ∈ F𝑛×1 } = 𝑅(𝐴). (b) From (a),
rank(𝐴𝐵) ≤ rank(𝐴). Next, rank((𝐴𝐵)𝑡 ) = rank(𝐵𝑡 𝐴𝑡 ) ≤ rank(𝐵𝑡 ) = rank(𝐵).
4. Let 𝐴 = 𝐷𝐸 and 𝐵 = 𝐹𝐺 be full rank factorizations of 𝐴 and 𝐵. Now, 𝐴 + 𝐵 = [𝐷 𝐹][𝐸; 𝐺]. By Exercise 3, rank(𝐴 + 𝐵) ≤ rank([𝐸; 𝐺]) ≤ rank(𝐸) + rank(𝐺) = rank(𝐴) + rank(𝐵).
5(a) Since 𝐴 = 𝐵𝐶, each column of 𝐴 is a linear combination of columns of 𝐵. Since 𝐵 has full rank, the columns of 𝐵 are lin. ind. (b) Use (a) on 𝐴ᵗ = 𝐶ᵗ𝐵ᵗ.
6. The columns of 𝐴 are unique linear combinations of the columns of 𝐵. The coefficients in these linear combinations give the matrix 𝐶. Thus 𝐶 is a unique matrix.
7. Since 𝐷 is invertible, rank(𝐵𝐷) = rank(𝐵).
8. From Exercise 5(a), columns of 𝐵 1 form a basis for 𝑅(𝐴). Also, the columns of
𝐵 2 form a basis for 𝑅(𝐴). The isomorphism that maps the columns of 𝐵 1 to columns
of 𝐵 2 provides such a 𝐷. Then use Exercise 6.

§ 5.1
1. Let 𝜆 be a diagonal entry of 𝐴. In 𝐴 − 𝜆𝐼, the corresponding diagonal entry
is 0. Look at the first 0 entry on the diagonal of 𝐴 − 𝜆𝐼 . (first from (1, 1)th entry
through (𝑛, 𝑛)th.) If this occurs on the 𝑘th column, then the 𝑘th column is a linear
combination of earlier columns. Thus rank(𝐴 − 𝜆𝐼 ) < 𝑛. And, null(𝐴 − 𝜆𝐼 ) ≥ 1.
Thus 𝐴𝑣 = 𝜆𝑣 for some 𝑣 ≠ 0. 2. For rows summing to 𝛼: 𝐴[1 · · · 1] 𝑡 = 𝛼 [1 · · · 1] 𝑡 .
For columns summing to 𝛼, use null(𝐴 − 𝛼𝐼 ) = null(𝐴𝑡 − 𝛼𝐼 ).
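
For instance (an arbitrary matrix whose rows all sum to 6; illustrative, not from the exercise):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 0.0, 2.0],
                  [2.0, 2.0, 2.0]])       # every row sums to 6
    ones = np.ones(3)
    print(np.allclose(A @ ones, 6 * ones))               # (1,1,1)^t is an eigenvector
    print(np.any(np.isclose(np.linalg.eigvals(A), 6)))   # 6 is an eigenvalue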
3(a) 𝑇 𝑣 = 𝜆𝑣 ⇒ 𝑇 𝑘 𝑣 = 𝜆𝑘 𝑣. (b) 𝑇 𝑣 = 𝜆𝑣 ⇒ (𝑇 + 𝛼𝐼 )𝑣 = (𝜆 + 𝛼)𝑣. (c) Follows from
(a)-(b). Next, if F = C, then yes.
4. Let 𝑇 𝑣 = 𝜆𝑣. Then 𝑇 −1𝑇 𝑣 = 𝑣 = (1/𝜆)𝜆𝑣 = (1/𝜆)𝑇 𝑣, and 𝑇 𝑣 ≠ 0 for 𝑣 ≠ 0.
5. 𝑥 may not be a common eigenvector. 6. Yes; 𝑇 = 𝐼. 7. Yes; 𝑇 = 𝜆𝐼.
8. 𝐴 = [1 1; 1 0], 𝐵 = [1 0; 1 1].
§ 5.2
1(a) (𝑡 − 1)(𝑡 − 2), [1 − 1] 𝑡 , [−2 1] 𝑡 . (b) (𝑡 − 𝑖)(𝑡 + 𝑖), [1 − 2 − 𝑖] 𝑡 , [1 𝑖 − 2] 𝑡 .
(c) (𝑡 + 2)(𝑡 − 3)(𝑡 − 5), [5 2 0] 𝑡 , [0 1 0] 𝑡 , [3 − 3 7] 𝑡 .
(d) (𝑡 + 1)(𝑡 − 2)(𝑡 − 3), [1 − 1 1] 𝑡 , [1 2 4] 𝑡 , [1 3 9] 𝑡 .
2. Eigenvalue is 0; eigenvector is 1.
3(a) If 𝑎 = 0, then 𝐴 has lin. dep. columns. If 𝑎 ≠ 0, then 𝐴 has lin. ind.
columns. (b) 𝜒𝐴 (𝑡) = 𝑡 3 − 𝑐𝑡 2 − 𝑏𝑡 − 𝑎. Use Cayley-Hamilton theorem and verify
that 𝐴(1/𝑎)(𝐴2 − 𝑐𝐴 − 𝑏𝐼 ) = 𝐼 .
4. Let 𝜒𝐴(𝑡) = 𝑎0 + 𝑎1𝑡 + · · · + 𝑎𝑛−1𝑡^{𝑛−1} + 𝑡^𝑛. Then 𝐴 is invertible iff 𝑎0 ≠ 0. And,
𝐴⁻¹ = −(𝑎0)⁻¹(𝑎1𝐼 + 𝑎2𝐴 + · · · + 𝑎𝑛−1𝐴^{𝑛−2} + 𝐴^{𝑛−1}).
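
A numerical check of this formula for an arbitrary invertible matrix (np.poly returns the characteristic polynomial coefficients in decreasing powers, so reversing it gives a_0, a_1, ..., a_{n-1}, 1 as above):

    import numpy as np
    from numpy.linalg import matrix_power, inv

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    n = A.shape[0]

    a = np.poly(A)[::-1]      # a[0] = a_0, ..., a[n-1] = a_{n-1}, a[n] = 1

    # A^{-1} = -(a_0)^{-1} (a_1 I + a_2 A + ... + a_{n-1} A^{n-2} + A^{n-1})
    A_inv = -(1 / a[0]) * sum(a[k] * matrix_power(A, k - 1) for k in range(1, n + 1))
    print(np.allclose(A_inv, inv(A)))   # True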
5. [0 −1; 1 0] has no real eigenvalue.
6. 𝑥 ⊥ 𝑦. Now, 𝑧 := 𝑥 × 𝑦 is orthogonal to both 𝑥 and 𝑦. Then {𝑥, 𝑦, 𝑧} is an orthogonal basis of R3×1. Let 𝐴𝑥 = 𝛼𝑥 and 𝐴𝑦 = 𝛽𝑦. Let 𝐴𝑧 = 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧. Then ⟨𝐴𝑧 − 𝑐𝑧, 𝑥⟩ = ⟨𝐴𝑧, 𝑥⟩ − ⟨𝑐𝑧, 𝑥⟩ = ⟨𝑧, 𝐴𝑥⟩ + 0 = ⟨𝑧, 𝛼𝑥⟩ = 0. Similarly, ⟨𝐴𝑧 − 𝑐𝑧, 𝑦⟩ = 0. So, ⟨𝐴𝑧 − 𝑐𝑧, 𝐴𝑧 − 𝑐𝑧⟩ = ⟨𝐴𝑧 − 𝑐𝑧, 𝑎𝑥 + 𝑏𝑦⟩ = 0. That is, 𝐴𝑧 − 𝑐𝑧 = 0.

§ 5.3
1. {(1/√2, −1/√2)ᵗ, (1/√2, 1/√2)ᵗ}; [𝑇] = [−2 9; 0 3].
   
2(a) 𝑃 = (1/√5)[1 2; −2 1]; 𝑈 = [3 −14; 0 1].
(b) 𝑃 = (1/√3)[1 1+𝑖; 1−𝑖 −1]; 𝑈 = [2+𝑖 −1+2𝑖; 0 2−𝑖].
(c) 𝑃 = (1/3)[2 −2 1; 2 1 −2; 1 2 2]; 𝑈 = [9 0 9; 0 9 9; 0 0 9].
   
3. Notice that with 𝐷 = diag(9, 9, 9), (𝑈 − 𝐷)² = 0. Using the binomial theorem to compute 𝑈⁵⁰ = (𝐷 + (𝑈 − 𝐷))⁵⁰ gives 𝑈⁵⁰ = 𝐷⁵⁰ + 50𝐷⁴⁹(𝑈 − 𝐷) = [9⁵⁰ 0 50·9⁵⁰; 0 9⁵⁰ 50·9⁵⁰; 0 0 9⁵⁰].
Then 𝐴⁵⁰ = 𝑃ᵗ𝑈⁵⁰𝑃 = 9⁴⁹[209 400 400; −50 −91 −100; −50 −100 −91].
 

4. Let 𝐴 = 𝑃∗𝑈𝑃, where 𝑈 is upper triangular and 𝑃 is unitary. Now, 𝑃𝐴∗𝐴𝑃∗ = 𝑃𝑃∗𝑈∗𝑃𝑃∗𝑈𝑃𝑃∗ = 𝑈∗𝑈. Then
tr(𝐴∗𝐴) = tr(𝑃𝐴∗𝐴𝑃∗) = tr(𝑈∗𝑈) = Σ_𝑖 |𝑢𝑖𝑖|² + Σ_{𝑖<𝑗} |𝑢𝑖𝑗|² ≥ Σ_𝑖 |𝑢𝑖𝑖|² = Σ_𝑖 |𝜆𝑖|².
5. |det(𝐴)| ≤ (Π|𝜆𝑖|²)^{1/2} ≤ ((1/𝑛) Σ|𝜆𝑖|²)^{𝑛/2} ≤ ((1/𝑛) tr(𝐴∗𝐴))^{𝑛/2} ≤ (𝑐²𝑛)^{𝑛/2}.
 
6. 𝐴 − 𝜆𝐼 = [𝐵 𝐶; 0 𝐷], where 𝐵 ∈ C𝑚×𝑚, 𝐶 ∈ C𝑚×(𝑛−𝑚), 0 ∈ C(𝑛−𝑚)×𝑚 and 𝐷 ∈ C(𝑛−𝑚)×(𝑛−𝑚). Further, 𝐵 and 𝐷 are upper triangular, all diagonal entries of 𝐵 are 0, and each diagonal entry of 𝐷 is nonzero. Due to these particular forms of 𝐵 and 𝐷, we see that 𝐵^𝑚 = 0 and 𝐷^𝑚 is upper triangular with nonzero diagonal entries. Taking successive powers of 𝐴 − 𝜆𝐼, we get (𝐴 − 𝜆𝐼)^𝑚 = [𝐵^𝑚 𝐶̃; 0 𝐷^𝑚] = [0 𝐶̃; 0 𝐷^𝑚], where 𝐶̃ ∈ C𝑚×(𝑛−𝑚) is some matrix. Clearly, rank((𝐴 − 𝜆𝐼)^𝑚) = 𝑛 − 𝑚. Due to the Rank-nullity theorem, null((𝐴 − 𝜆𝐼)^𝑚) = 𝑚.
7. There exists an invertible 𝑃 ∈ C𝑛×𝑛 such that 𝐵 = 𝑃 −1𝐴𝑃 . Then 𝑃 −1 (𝐴 − 𝜆𝐼 )𝑃 =
𝐵 − 𝜆𝐼 . If 𝑃 −1 (𝐴 − 𝜆𝐼 )𝑘 𝑃 = (𝐵 − 𝜆𝐼 )𝑘 , then 𝑃 −1 (𝐴 − 𝜆𝐼 )𝑘+1𝑃 = 𝑃 −1 (𝐴 − 𝜆𝐼 )𝑘 𝑃𝑃 −1 (𝐴 −
𝜆𝐼 )𝑃 = (𝐵 − 𝜆𝐼 )𝑘 (𝐵 − 𝜆𝐼 ) = (𝐵 − 𝜆𝐼 )𝑘+1 . By induction we have (𝐴 − 𝜆𝐼 )𝑚 is similar
to (𝐵 − 𝜆𝐼 )𝑚 . So, null((𝐴 − 𝜆𝐼 )𝑚 ) = null((𝐵 − 𝜆𝐼 )𝑚 ) = 𝑚 by Exercise 6.

§ 5.4
0 1 1 1 0 1
   
1(a) 𝐴 =  1 0 1  , 𝑃 =  1 1 0  , 𝑃 −1𝐴𝑃 = diag (2, −1, −1).
 
1 1 0  1 −1 −1 
   
 7 −2 0  1 2 2
   
(b) 𝐴 =  −2 6 −2  , 𝑃 =  2 1 −2  , 𝑃 −1𝐴𝑃 = diag (3, 6, 9).
 0 −2 5   2 −2 1 
   
2(a) 𝐵 = {(−1, 1, 1), (0, 1, −1), (1, 1, 0)}, [𝑇 ]𝐵,𝐵 = diag (−1, 2, 2).
(b)-(c) Geom. mult. of 𝜆 = 0 is 1; alg. mult. of 𝜆 = 0 is 3; not diagonalizable.
(d) 𝐵 = {(1, 0, −1), (0, 1, 0), (1, 0, 1)}, [𝑇 ]𝐵,𝐵 = diag (−1, 1, 1).
(e) Geom. mult. of 𝜆 = 0 is 1; alg. mult. of 𝜆 = 0 is 4; not diagonalizable.
(f) 𝐵 = {1, 𝑡, 𝑡 2, 𝑡 3 }, [𝑇 ]𝐵,𝐵 = diag (1, 1/2, 1/3, 1/4).
 0 1 0  0 0 0 0 1 0
     
3. 0 0 0 , 0 0 1 , 0 0 1 .
 0 0 0  0 0 0 0 0 0
     
4(a) Real sym., so diagonalizable.
(b) Geom. mult. of 𝜆 = 1 is 1, alg. mult. is 3; not diagonalizable.
(c)-(d) Distinct eigenvalues; diagonalizable.
5. Yes. All eigenvalues are real.
 1 1 1   95 0 0   1 −1 0   95 45 − 95 45 − 1 
       
6. 𝐴5 =  0 1 1   0 45 0   0 1 1  =  0 45 45 − 1  .
 0 0 −1   0 0 1   0 0 −1   0 0 1 
      
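
Numerically, reconstructing the matrix from the factors displayed above (P and its inverse as shown; this check is illustrative):

    import numpy as np
    from numpy.linalg import matrix_power

    P = np.array([[1, 1, 1], [0, 1, 1], [0, 0, -1]])
    Pinv = np.array([[1, -1, 0], [0, 1, 1], [0, 0, -1]])
    D = np.diag([9, 4, 1])

    A = P @ D @ Pinv                                   # the diagonalized matrix
    A5 = P @ np.diag([9**5, 4**5, 1]) @ Pinv

    print(np.array_equal(P @ Pinv, np.eye(3, dtype=int)))   # Pinv is indeed P^{-1}
    print(np.array_equal(matrix_power(A, 5), A5))           # A^5 agrees with the product above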
7. 𝐴 and 𝐵 have the same eigenvalues with respective multiplicities being equal.
By ordering the eigenvectors suitably, we have invertible matrices 𝑃 and 𝑄 such
that 𝐴 = 𝑃 −1 𝐷𝑃 and 𝐵 = 𝑄 −1 𝐷𝑄, where 𝐷 is a diagonal matrix. Then 𝐵 =
(𝑃 −1𝑄) −1𝐴(𝑃 −1𝑄).

§ 5.5

1. 𝑣 1 as given. 𝑣 2 = (0, 1, 1, 0, 0), 𝑣 3 = (1, 1, 0, 0, 1); 𝑣 2 = (0, 1, 0, 0, 1), 𝑣 3 =


(1, 0, 1, 1, 0); 𝑣 2 = (0, 1, 0, 0, 1), 𝑣 3 = (1, 0, 0, 1, 1); 𝑤 1 as given. 𝑤 2 = (0, 0, 0, 1, 1).
2. 𝐴𝑣𝑖 = 0 ⇒ 𝐵𝑃𝑣𝑖 = 𝑃𝐴𝑃 −1𝑃𝑣𝑖 = 0. Let 𝐵𝑣 = 0. Then 𝐴𝑃 −1𝑣 = 0. So, 𝑃 −1𝑣 =
𝛼 1𝑣 1 +· · ·+𝛼𝑚 𝑣𝑚 . Then 𝑣 = 𝛼 1𝑃𝑣 1 +· · ·+𝛼𝑚 𝑃𝑣𝑚 . Next, 𝑃 is an isom.; so {𝑃𝑣 1, . . . , 𝑃𝑣𝑛 }
is lin. ind.
0 1 0
 
3(a)𝜆 = 0 has geometric multiplicity 1. So, 0 0 1 .
0 0 0
 
(b) 𝜆 = 2 has geometric multiplicity 1 and algebraic multiplicity 2. 𝜆 = −4 has
 −4 0 0  2 1 0
   
algebraic multiplicity 1. So,  0 2 1  . (c)  0 2 0  .
 0 0 2 0 0 1
   
−4 0 0 1 1 0
   
4. 𝐽 =  0 2 1 . 𝐴𝑃 = 𝑃 𝐽 ⇒ 𝑃 = −1 −1 −1 .
 0 0 2  1 −1 0 
     
5. 𝐵 = [0 1 0; 0 0 1; 0 0 0], 𝐽 = [𝐵 0; 0 0].
           
6. 𝐽 = diag([2 1; 0 2], [2 1; 0 2], [3 1; 0 3], 3) or 𝐽 = diag([2 1; 0 2], 2, 2, [3 1; 0 3], 3).
7. null(𝐴 − 𝜆𝐼 )𝑚 = null(𝐽 − 𝜆𝐼 )𝑚 = null(𝐽𝜆 − 𝜆𝐼 )𝑚 , which is 𝑚. Here, 𝐽𝜆 is the
sub-matrix of the Jordan form 𝐽 of 𝐴 having each diagonal entry as 𝜆.
8. 𝑥 ∈ 𝑁 ((𝐴 − 𝜆𝐼 )𝑘 ) ⇒ (𝐴 − 𝜆𝐼 )𝑘 𝑥 = 0 ⇒ (𝐴 − 𝜆𝐼 )𝑘+1𝑥 = 0 ⇒ 𝑥 ∈ 𝑁 ((𝐴 − 𝜆𝐼 )𝑘+1 .
So, {0} ⊆ 𝑁 (𝐴 − 𝜆𝐼 ) ⊆ 𝑁 (𝐴 − 𝜆𝐼 ) 2 ⊆ · · · 𝑁 (𝐴 − 𝜆𝐼 )𝑘 ⊆ · · · 𝑁 (𝐴 − 𝜆𝐼 )𝑚 . Once
null((𝐴 − 𝜆𝐼 )𝑘 ) = 𝑚, we would have 𝑁 ((𝐴 − 𝜆𝐼 )𝑘 ) = 𝑁 ((𝐴 − 𝜆𝐼 )𝑘+1 ) = · · · =
𝑁 ((𝐴 − 𝜆𝐼 )𝑚 ).
9. If geometric mult. equals the algebraic mult. of 𝜆, then each Jordan block for 𝜆
is a 1 × 1 matrix.
10. This is obvious from the Jordan form.
11. We see that 𝑄 2 = 𝐼 . Thus 𝑄 −1 = 𝑄. Further, 𝑄 −1 𝐽𝜆 𝑄 = 𝑄 𝐽𝜆 𝑄 = (𝐽𝜆 ) t . Therefore,
each Jordan block is similar to its transpose. Now, construct a matrix 𝑅 by putting
such a matrix as its blocks matching the orders of each Jordan block in 𝐽 . Then it
follows that 𝑅 −1 𝐽 𝑅 = 𝐽 t .
12. The vector 𝑣 1 must be chosen from 𝑁 (𝐴 − 𝐼 ) ∩ 𝑅(𝐴). But 𝑥 1 ∉ 𝑅(𝐴) and
𝑥 2 ∉ 𝑅(𝐴). Of course, the basis {(1, 1, −1) t, (−3, 0, 1) t } gives the Jordan form.

§ 5.6

1. 𝐴∗𝐴𝑣 = 𝑠𝑣 and 𝐴𝐴∗𝑤 = 𝑠𝑤 implies (𝐴∗ ) ∗𝐴∗𝑤 = 𝑠𝑤 and 𝐴∗ (𝐴∗ ) ∗𝑣 = 𝑠𝑣.


2. 𝐴∗ = 𝐴, 𝑃∗𝐴𝑃 = diag(𝜆1, . . . , 𝜆𝑛) ⇒ 𝑃∗𝐴∗𝐴𝑃 = diag(𝜆1², . . . , 𝜆𝑛²).
3(a) [1/√2 1/√2; 1/√2 −1/√2] [5 0 0; 0 3 0] [1/√2 1/√2 0; 1/√18 −1/√18 4/√18; 2/3 −2/3 −1/3].
(b) [1/√6 1/√2 1/√3; 2/√6 0 −1/√3; 1/√6 −1/√2 1/√3] [√3 0; 0 1; 0 0] [1/√2 1/√2; 1/√2 −1/√2].
(c) [1/√6 −1/√3 1/√2; 2/√6 1/√3 0; 1/√6 −1/√3 −1/√2] [2√2 0 0; 0 √2 0; 0 0 0] [1/√6 1/√3 1/√2; 3/√12 0 −1/2; 1/√12 −2/√6 1/2].
        

  
4. [1 1; 0 1] [1 0; 1 1] [1 −1; 0 1] = [2 −1; 1 0]. Singular values: 0, 3; and 3 ± 2√2.
5. Take 𝑥 as an eigenvector for the eigenvalue 𝑠 2 of 𝐴∗𝐴.
6. Range of 𝑢𝑣 is span {𝑢}; so rank(𝑢𝑣) = 1. Next, if rank(𝐴) = 1, its full rank
factorization gives 𝑢 and 𝑣. Or, rank(𝐴) = 1 implies 𝐴 = [𝑎 1𝑢, . . . , 𝑎𝑛𝑢], where 𝑢
is a column vector. Then 𝐴 = 𝑢 [𝑎 1, . . . , 𝑎𝑛 ].
7. 𝑇 ∗𝑇 𝑥 = 𝜆𝑥 ⇒ (𝑇𝑇 ∗ )(𝑇 𝑥) = 𝜆(𝑇 𝑥). If 𝑇 𝑥 = 0, then 𝑇 ∗𝑇 𝑥 = 𝜆𝑥 = 0 ⇒ 𝜆 = 0, not
possible. So, 𝜆 > 0 is an eigenvalue of 𝑇 ∗𝑇 implies 𝜆 is an eigenvalue of 𝑇𝑇 ∗ . Take
𝑇 ∗ instead of 𝑇 to get the converse.
8. 𝑇 ∗𝑇 𝑣𝑖 = 𝑠𝑖2𝑣𝑖 , 𝑤𝑖 = (𝑠𝑖 ) −1𝑇 𝑣𝑖 ⇒ (𝑠𝑖 ) −1𝑇 ∗𝑤𝑖 = (𝑠𝑖 ) −2𝑇 ∗𝑇 𝑣𝑖 = 𝑣𝑖 .
9. Let 𝐴 = 𝑃 Σ 𝑄 ∗ be an SVD of 𝐴 ∈ C𝑚×𝑛 , and let 𝑥 ∈ C𝑛×1 be a unit vector. Let 𝑠 1 ≥
· · · ≥ 𝑠𝑟 be the positive singular values of 𝐴. Write 𝑦 = 𝑄 ∗𝑥 = [𝛼 1, · · · , 𝛼𝑛 ] 𝑡 . Then
‖𝑦‖² = Σ_{𝑖=1}^{𝑛} |𝛼𝑖|² = ‖𝑄∗𝑥‖² = 𝑥∗𝑄𝑄∗𝑥 = 𝑥∗𝑥 = ‖𝑥‖² = 1. ‖𝐴𝑥‖² = ‖𝑃Σ𝑄∗𝑥‖² = 𝑥∗𝑄Σ∗𝑃∗𝑃Σ𝑄∗𝑥 = 𝑥∗𝑄Σ²𝑄∗𝑥 = 𝑦∗Σ²𝑦 = Σ_{𝑗=1}^{𝑟} 𝑠𝑗²|𝛼𝑗|² ≤ 𝑠1² Σ_{𝑗=1}^{𝑟} |𝛼𝑗|² ≤ 𝑠1².
For 𝑥 = 𝑣1, ‖𝐴𝑣1‖² = 𝑠1²‖𝑢1‖² = 𝑠1². So, 𝑠1 = max{‖𝐴𝑥‖ : 𝑥 ∈ C𝑛×1, ‖𝑥‖ = 1}.
Similarly, ‖𝐴𝑥‖² = Σ_{𝑗=1}^{𝑟} 𝑠𝑗²|𝛼𝑗|² ≥ 𝑠𝑟² Σ_{𝑗=1}^{𝑟} |𝛼𝑗|² ≥ 𝑠𝑟².
With 𝑥 = 𝑣𝑟, ‖𝐴𝑣𝑟‖² = 𝑠𝑟²‖𝑢𝑟‖² = 𝑠𝑟². So, 𝑠𝑟 = min{‖𝐴𝑥‖ : 𝑥 ∈ C𝑛×1, ‖𝑥‖ = 1}.


Notice that if 𝑥 is a unit vector, then 𝑠𝑟 ≤ ‖𝐴𝑥‖ ≤ 𝑠1.
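
These extremal characterizations can be illustrated numerically with an arbitrary matrix and random unit vectors (not from the text):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 4))
    s = np.linalg.svd(A, compute_uv=False)    # s[0] >= ... >= s[-1] > 0 here

    X = rng.standard_normal((4, 1000))
    X /= np.linalg.norm(X, axis=0)            # 1000 random unit vectors as columns
    norms = np.linalg.norm(A @ X, axis=0)
    print(norms.max() <= s[0] + 1e-12, norms.min() >= s[-1] - 1e-12)   # True True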
10(a) 𝐴∗𝐴𝐴†𝑏 = 𝐴∗𝑄 [𝑆 0; 0 0] 𝑃∗𝑃 [𝑆⁻¹ 0; 0 0] 𝑄∗𝑏 = 𝐴∗𝑏, as 𝑃∗𝑃 = 𝐼, 𝑄𝑄∗ = 𝐼.
(b) Follows using 𝑃 ∗𝑃 = 𝐼, 𝑄𝑄 ∗ = 𝐼 and the formula for 𝐴† .
(c) For uniqueness, suppose 𝐵, 𝐶 ∈ F𝑚×𝑛 such that (𝐴𝐵) ∗ = 𝐴𝐵, (𝐵𝐴) ∗ = 𝐵𝐴,
𝐴𝐵𝐴 = 𝐴, 𝐵𝐴𝐵 = 𝐵, (𝐴𝐶) ∗ = 𝐴𝐶, (𝐶𝐴) ∗ = 𝐶𝐴, 𝐴𝐶𝐴 = 𝐴, 𝐶𝐴𝐶 = 𝐶. Then
𝐵 = 𝐵𝐴𝐵 = 𝐵(𝐴𝐵) ∗ = 𝐵𝐵 ∗𝐴∗ = 𝐵𝐵 ∗ (𝐴𝐶𝐴) ∗ = 𝐵𝐵 ∗𝐴∗𝐶 ∗𝐴∗ = 𝐵𝐵 ∗𝐴∗ (𝐴𝐶) ∗
= 𝐵𝐵 ∗𝐴∗𝐴𝐶 = 𝐵(𝐴𝐵) ∗𝐴𝐶 = 𝐵𝐴𝐵𝐴𝐶 = 𝐵(𝐴𝐵𝐴)𝐶 = 𝐵𝐴𝐶 = 𝐵(𝐴𝐶𝐴)𝐶
= 𝐵𝐴𝐶𝐴𝐶 = 𝐵𝐴(𝐶𝐴)𝐶 = 𝐵𝐴(𝐶𝐴) ∗𝐶 = 𝐵𝐴𝐴∗𝐶 ∗𝐶 = (𝐵𝐴) ∗𝐴∗𝐶 ∗𝐶
= 𝐴∗ 𝐵 ∗𝐴∗𝐶 ∗𝐶 = (𝐴𝐵𝐴) ∗𝐶 ∗𝐶 = 𝐴∗𝐶 ∗𝐶 = (𝐶𝐴) ∗𝐶 = 𝐶𝐴𝐶 = 𝐶.
§ 5.7
1. Let 𝑚 ≤ 𝑛. Using the thin SVD of 𝐴, we see that 𝐴 = 𝑃1 Σ1𝑄 1, where 𝑃1 ∈ F𝑚×𝑚 ,
Σ1 ∈ F𝑚×𝑚 and 𝑄 1 ∈ F𝑚×𝑛 . The rows of 𝑄 1 are the first 𝑚 rows of a unitary matrix;
so, they are orthonormal. Since 𝑃1 is unitary, we have 𝐴 = (𝑃1 Σ1𝑃1∗ )(𝑃1𝑄 1 ). Now,
𝑃 := 𝑃1 Σ1𝑃1∗ is positive semi-definite and 𝑈 := 𝑃 1𝑄 1 has orthonormal rows so
that 𝐴 = 𝑃𝑈 . Since 𝑄 1 has orthonormal rows, 𝑄 1𝑄 1∗ = 𝐼 . Thus 𝐴 = 𝑃 1 Σ1𝑄 1 =
(𝑃 1𝑄 1 )(𝑄 1∗ Σ1𝑄 1 ). Again, 𝑄 := 𝑄 1∗ Σ1𝑄 1 is positive semidefinite. So, 𝐴 = 𝑈 𝑄.
Similarly, the case 𝑚 ≥ 𝑛 is proved.
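
A minimal numerical sketch of this construction for a square matrix (arbitrary A; P1, Sigma1, Q1 are taken from numpy's SVD, following the notation above):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 4))

    P1, s, Q1 = np.linalg.svd(A)       # A = P1 @ np.diag(s) @ Q1
    S1 = np.diag(s)

    P = P1 @ S1 @ P1.T                 # positive semi-definite factor
    U = P1 @ Q1                        # factor with orthonormal rows

    print(np.allclose(A, P @ U))                       # A = P U
    print(np.allclose(U @ U.T, np.eye(4)))             # orthonormal rows
    print(np.all(np.linalg.eigvalsh(P) >= -1e-12))     # P is positive semi-definite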
2. Let 𝑚 ≤ 𝑛, 𝐴 = 𝑈 𝑄, where 𝑈 is unitary and 𝑄 is positive semi-definite
with 𝑄 2 = 𝐴∗𝐴. Then 𝑄 = 𝐶 ∗ 𝐷𝐶, 𝐴∗𝐴 = 𝐶 ∗ 𝐷 2𝐶, 𝐶 ∈ C𝑚×𝑚 is unitary and
𝐷 = diag(𝑠1, . . . , 𝑠𝑟, 0, . . . , 0) ∈ C𝑚×𝑚, where the 𝑠𝑖 are the positive singular values of 𝐴. Then take 𝑃 := 𝑈𝐶∗, Σ = [𝐷 0; 0 0], 𝑉 = [𝐶 0; 0 𝐼], so that Σ ∈ C𝑚×𝑛 and 𝑉 ∈ C𝑛×𝑛. Then 𝐴 = 𝑃Σ𝑉 is an SVD of 𝐴. Similarly, the case 𝑚 ≥ 𝑛 is tackled using the polar decomposition 𝐴 = 𝑃𝑈.
3. Let 𝑇∗ = 𝑇, 𝛼² < 4𝛽, 𝑥 ≠ 0. Then ⟨(𝑇² + 𝛼𝑇 + 𝛽𝐼)𝑥, 𝑥⟩ = ‖𝑇𝑥‖² + 𝛼⟨𝑇𝑥, 𝑥⟩ + 𝛽‖𝑥‖² ≥ ‖𝑇𝑥‖² − |𝛼| ‖𝑇𝑥‖ ‖𝑥‖ + 𝛽‖𝑥‖² = (‖𝑇𝑥‖ − |𝛼| ‖𝑥‖/2)² + (𝛽 − 𝛼²/4)‖𝑥‖² > 0. So, 𝑇² + 𝛼𝑇 + 𝛽𝐼 is positive definite. Then (𝑇² + 𝛼𝑇 + 𝛽𝐼)𝑥 ≠ 0 for any nonzero 𝑥. So, 𝑇² + 𝛼𝑇 + 𝛽𝐼 is invertible.
4. ⟨𝑥, 𝑦⟩ = ⟨𝑧, 𝑦⟩ ⇒ ⟨𝑥 − 𝑧, 𝑦⟩ = 0. With 𝑦 = 𝑥 − 𝑧, ⟨𝑥 − 𝑧, 𝑥 − 𝑧⟩ = 0.
5. Expand the norms using inner products.
6. Using Exercise 5, conclude that ⟨𝑇𝑥, 𝑥⟩ = 0 for every 𝑥 implies that ⟨𝑇𝑥, 𝑦⟩ = 0 for all 𝑥, 𝑦. With 𝑦 as 𝑇𝑥, ⟨𝑇𝑥, 𝑇𝑥⟩ = 0 for every 𝑥. That is, 𝑇 = 0. Use this for 𝑆 − 𝑇 instead of 𝑇. 7. ⟨𝑇𝑥, 𝑥⟩ ∈ R. ⟨𝑇𝑥, 𝑥⟩ = ⟨𝑥, 𝑇𝑥⟩ = ⟨𝑇∗𝑥, 𝑥⟩. By Exercise 6, 𝑇∗ = 𝑇.
8. ‖𝑈𝑥‖² = ‖𝑥‖² ⇒ ⟨𝑈∗𝑈𝑥, 𝑥⟩ = ⟨𝐼𝑥, 𝑥⟩, ⟨𝑥, 𝑈𝑈∗𝑥⟩ = ⟨𝑥, 𝐼𝑥⟩. Use Exercise 6.
9. Let dim(𝑉) = 𝑛. By Exercise 7, 𝑇∗ = 𝑇. If 𝑇𝑥 = 𝜆𝑥 with 𝑥 ≠ 0, then 𝜆 = ⟨𝑇𝑥, 𝑥⟩/‖𝑥‖² ≥ 0. Then there exists an orthonormal basis 𝐵 = {𝑣1, . . . , 𝑣𝑛} such that [𝑇]𝐵,𝐵 = diag(𝜆1, . . . , 𝜆𝑛), 𝜆𝑖 ≥ 0. Define the linear operator 𝑆 by 𝑆(𝑣𝑖) = √𝜆𝑖 𝑣𝑖. Then 𝑆² = 𝑇. For uniqueness, let 𝑅 be a positive semi-definite operator with 𝑅² = 𝑇. As 𝑅∗ = 𝑅, we have an orthonormal basis 𝐶 = {𝑢1, . . . , 𝑢𝑛} such that [𝑅]𝐶,𝐶 = diag(𝛼1, . . . , 𝛼𝑛). Fix any 𝑣𝑗 ∈ 𝐵; then 𝑇𝑣𝑗 = 𝜆𝑗𝑣𝑗. For ease in notation, write 𝑣 = 𝑣𝑗 and 𝜆 = 𝜆𝑗 so that 𝑇𝑣 = 𝜆𝑣. Now, 𝑣 = Σ_{𝑖=1}^{𝑛} ⟨𝑣, 𝑢𝑖⟩𝑢𝑖. And, 𝑅𝑣 = Σ_{𝑖=1}^{𝑛} ⟨𝑣, 𝑢𝑖⟩𝛼𝑖𝑢𝑖, so 𝑇𝑣 = 𝑅²𝑣 = Σ_{𝑖=1}^{𝑛} ⟨𝑣, 𝑢𝑖⟩𝛼𝑖²𝑢𝑖 = 𝜆𝑣 implies Σ_{𝑖=1}^{𝑛} ⟨𝑣, 𝑢𝑖⟩(𝛼𝑖² − 𝜆)𝑢𝑖 = 0. As 𝐶 is lin. ind., (𝛼𝑖² − 𝜆)⟨𝑣, 𝑢𝑖⟩ = 0 for each 𝑖. Then 𝑣 = Σ_{𝑖:𝛼𝑖²=𝜆} ⟨𝑣, 𝑢𝑖⟩𝑢𝑖. Hence 𝑅𝑣 = Σ_{𝑖:𝛼𝑖²=𝜆} ⟨𝑣, 𝑢𝑖⟩𝑅𝑢𝑖 = Σ_{𝑖:𝛼𝑖²=𝜆} ⟨𝑣, 𝑢𝑖⟩𝛼𝑖𝑢𝑖 = √𝜆 𝑣. So, 𝑅 is uniquely determined on the basis 𝐵.
10(a ⇒ b) Let 𝑇𝑣 = 𝜆𝑣. Then 0 ≤ ⟨𝑇𝑣, 𝑣⟩ = 𝜆⟨𝑣, 𝑣⟩. So, 𝜆 ≥ 0.
(b ⇒ c) Use the construction of 𝑆 in the answer to Exercise 9.
(c ⇒ d) If 𝑆 is positive semi-definite, it is self-adjoint.
(d ⇒ e) 𝑆 satisfies 𝑆² = 𝑇 and 𝑆∗ = 𝑆. So, 𝑇 = 𝑆∗𝑆.
(e ⇒ a) (𝑆∗𝑆)∗ = 𝑆∗𝑆; ⟨𝑆∗𝑆𝑥, 𝑥⟩ = ⟨𝑆𝑥, 𝑆𝑥⟩ ≥ 0.
Index

𝑁(𝑇), 59
𝑅(𝑇), 59
F𝑚×𝑛, 87
L(𝑉,𝑊), 94
Sol(𝐴, 𝑏), 75
adjoint, 70
Algebraic multiplicity, 114
angle between vectors, 40
Annihilates, 115
augmented matrix, 75
basis, 22
Bessel's inequality, 43
best approximation, 48
Cauchy Schwartz Inequality, 39
Cayley-Hamilton theorem, 115
change of basis, 99
characteristic polynomial, 111
Characteristic values, 111
column vector, 3
Companion matrix, 114
Complex eigenvalues, 114
complex vector space, 4
consistent system, 75
coordinate vector, 86
coordinate vector map, 86
Counting multiplicities, 114
Cramer's rule, 77
diagonalizable, 126
diagonalized by, 126
dimension, 27
eigenvalue, 108
eigenvector, 108
equivalent matrices, 104
Euclidean space, 38
finite dimensional, 28
Fourier expansion, 43
Full rank factorization, 106
full rank factorization, 155
fundamental subspaces, 73
Gauss-Jordan elimination, 77
generalized eigenvector, 149
geometric multiplicity, 129
Gram matrix, 104
homogeneous system, 75
identity operator, 52
infinite dimensional, 28
inner product, 37
inner product space, 38
ips, 38
isometric, 73
isomorphic, 63
isomorphism, 63
Jordan block, 145
Jordan form, 146
Jordan string, 138
least squares, 81
length of a Jordan string, 138
linearly dependent, 17
linearly independent, 17
linear combination, 12
linear equations, 75
linear functional, 52
linear map, 52
linear operator, 52
linear transformation, 52
matrix, 87
  of linear map, 88
maximal, 23
minimal, 23
norm, 38
normal, 73
nullity, 59
null space, 59
orthogonal, 73
orthogonally diagonalizable, 127
orthogonal basis, 43
orthogonal set, 41
orthogonal vectors, 41
orthonormal basis, 43
orthonormal set, 41
parallelogram law, 39
Parseval's identity, 43
polar decomposition, 158, 160
positive definite, 158
positive semi-definite, 158
Pythagoras' theorem, 41
QR-factorization, 82
range space, 59
rank, 59
Rank-nullity theorem, 60
Rank factorization, 105
Rank theorem, 105
real vector space, 4
Reverse triangle inequality, 39
Riesz representation, 70
Riesz representer, 71
row vector, 3
scalar, 4
Schur triangularization, 123
self-adjoint, 73
similar, 106
singular values, 152
solution set, 75
span, 13
spanned by, 15
spanning set, 15
square root, 162
standard basis, 23
standard inner product, 37, 38
subspace, 9
sum of subsets, 14
SVD, 152
Theorem
  Basis extension, 29
  Bessel's inequality, 43
  Cauchy Schwartz, 39
  Cayley-Hamilton, 115
  Fourier expansion, 43
  Full rank factorization, 106
  Jordan form, 146
  Jordan strings, 140
  Parallelogram law, 39
  Parseval's identity, 43
  Polar decomposition, 158, 160
  Rank-nullity, 60
  Rank factorization, 105
  Rank theorem, 105
  Reverse triangle ineq., 39
  Schur triangularization, 121
  SVD, 152, 154
  Triangle inequality, 39
thin SVD, 154
tight SVD, 155
triangle inequality, 39
unitarily diagonalizable, 126
unitary, 73
unitary space, 38
vector, 4
vector space, 3
zero operator, 52
