MA2031 Classnotes
Linear Algebra for Engineers
© A.Singh
Contents
Syllabus v
1 Vector Spaces 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 What is a vector space? . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Linear independence . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.7 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.8 Extracting a basis . . . . . . . . . . . . . . . . . . . . . . . . . 32
3 Linear Transformations 52
3.1 What is a linear transformation? . . . . . . . . . . . . . . . . . 52
3.2 Action on a basis . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3 Range space and null space . . . . . . . . . . . . . . . . . . . . 58
3.4 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Adjoint of a Linear Transformation . . . . . . . . . . . . . . . . 68
5.4 Diagonalizability . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.5 Jordan form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.6 Singular value decomposition . . . . . . . . . . . . . . . . . . . 151
5.7 Polar decomposition . . . . . . . . . . . . . . . . . . . . . . . . 158
Index 178
Syllabus
Vector spaces: Real and complex vector spaces, subspaces, span, linear indepen-
dence, dimension.
Linear Transformations: Linear transformations, rank and nullity, matrix repre-
sentation, change of bases, solvability of linear systems.
Inner product spaces: Inner products, angle, orthogonal and orthonormal sets,
Gram-Schmidt orthogonalization, orthogonal and orthonormal bases, orthogonal
complement, QR-factorization, best approximation and least squares, Riesz repre-
sentation and adjoint.
Eigenpairs of linear transformations: Eigenvalues and eigenvectors, spectral
mapping theorem, characteristic polynomial, Cayley-Hamilton theorem.
Matrix representations: Block-diagonalization, Schur triangularization, diagonal-
ization theorem, generalized eigenvectors, Jordan form, singular value decomposi-
tion, polar decomposition.
Texts:
1. S. Lang, Linear Algebra, 3rd Ed., Springer, 2004.
2. D. W. Lewis, Matrix Theory, World Scientific, 1991.
References:
1. H. Anton & C. Rorres, Elementary Linear Algebra: Applications, 11th Ed.,
Wiley, 2013.
2. K. Janich, Linear Algebra, 3rd Ed., Springer, 2004.
3. B. Kolman & D. Hill, Elementary Linear Algebra, 9th Ed., Pearson, 2007.
4. A. Singh, Introduction to Matrix Theory, Ane Books, 2018.
1
Vector Spaces
1.1 Introduction
Consider a vector u⃗ in the plane. As we know, two vectors are equal iff they have
the same direction and the same length. Thus, we may draw u⃗ anywhere on the plane.
Let us fix the 𝑥 and the 𝑦 axes, and mark the origin as (0, 0). Next, we draw 𝑢® with
its initial point at the origin. Its endpoint is some point, say, (𝑎, 𝑏). Thus, by making
the convention that each plane vector has its initial point at the origin, we see that
any plane vector is identified with a point (𝑎, 𝑏) ∈ R2 .
Take another vector v⃗ with its initial point at the origin and endpoint at (𝑐, 𝑑).
By using the parallelogram law of adding two vectors, we see that the vector u⃗ + v⃗
has initial point (0, 0) and endpoint (𝑎 + 𝑐, 𝑏 + 𝑑). This gives rise to addition of two
points in R2 as in the following:
(𝑎, 𝑏) + (𝑐, 𝑑) := (𝑎 + 𝑐, 𝑏 + 𝑑).
This defines the new function 𝛼 𝑓 obtained from the function 𝑓 and the scalar 𝛼 .
Similarly, if 𝑓 , 𝑔 ∈ 𝑆, the addition on R2 says that
In fact, when we are not very specific, we write a row vector and also a column
vector as an 𝑛-tuple. So, both F1×𝑛 and F𝑛×1 are written as F𝑛 .
For 𝑚, 𝑛 ∈ N, we write F𝑚×𝑛 as the set of all 𝑚 × 𝑛 matrices with entries as
numbers from F.
A nonempty set 𝑉 with two operations: + (addition) that associates any two
elements 𝑢, 𝑣 in 𝑉 to a single element 𝑢 + 𝑣 in 𝑉 , and · (scalar multiplication) that
associates a number 𝛼 ∈ F and an element 𝑣 in 𝑉 to an element 𝛼 · 𝑣 in 𝑉 , is said to
be a vector space over F iff it satisfies the following conditions:
(1) For all 𝑥, 𝑦 ∈ 𝑉 , 𝑥 + 𝑦 = 𝑦 + 𝑥 .
(2) For all 𝑥, 𝑦, 𝑧 ∈ 𝑉 , (𝑥 + 𝑦) + 𝑧 = 𝑥 + (𝑦 + 𝑧).
(3) There exists an element 0 ∈ 𝑉 such that 𝑥 + 0 = 𝑥 for all 𝑥 ∈ 𝑉 .
(4) For each 𝑥 ∈ 𝑉 there exists an element −𝑥 ∈ 𝑉 such that 𝑥 + (−𝑥) = 0.
(5) For all 𝑥 ∈ 𝑉 , 1 · 𝑥 = 𝑥 .
(6) For all 𝑥 ∈ 𝑉 and all 𝛼, 𝛽 ∈ F, 𝛼 · (𝛽 · 𝑥) = (𝛼𝛽) · 𝑥 .
(7) For all 𝑥, 𝑦 ∈ 𝑉 and each 𝛼 ∈ F, 𝛼 · (𝑥 + 𝑦) = 𝛼 · 𝑥 + 𝛼 · 𝑦.
(8) For all 𝑥 ∈ 𝑉 and all 𝛼, 𝛽 ∈ F, (𝛼 + 𝛽) · 𝑥 = 𝛼 · 𝑥 + 𝛽 · 𝑥 .
(1.1) Example
1. {0} is a vector space over F with 0 + 0 = 0 and 𝛼 · 0 = 0 for each 𝛼 ∈ F.
2. F is a vector space over F with addition and multiplication as in F.
3. R𝑛 , R1×𝑛 and R𝑛×1 are real vector spaces with component-wise addition and
scalar multiplication, for any 𝑛 ∈ N.
4. C𝑛 , C1×𝑛 and C𝑛×1 are complex vector spaces with component-wise addition and
scalar multiplication, for any 𝑛 ∈ N.
5. Consider C with usual addition of complex numbers. For any 𝛼 ∈ R, consider
the scalar multiplication 𝛼𝑥 as the real number 𝛼 multiplied with the complex
number 𝑥 for any 𝑥 ∈ C. Then C is a real vector space. Similarly, C𝑛 , C1×𝑛 and
C𝑛×1 are also real vector spaces.
6. 𝑉 = {(𝑎, 𝑏) ∈ R2 : 𝑏 = 0} is a real vector space under component-wise addition
and scalar multiplication.
7. 𝑉 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 + 𝑏 = 0} is a real vector space under component-wise
addition and scalar multiplication.
8. Let 𝑉 = {(𝑎, 𝑏) ∈ R2 : 3𝑎 + 5𝑏 = 1}. We see that ( 1/3, 0), (0, 1/5) ∈ 𝑉 . But
their sum ( 1/3, 1/5) ∉ 𝑉 . [Also, 3( 1/3, 0) ∉ 𝑉 .] Thus 𝑉 is not a vector space with
component-wise addition and scalar multiplication.
9. F𝑛 [𝑡] := {𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 : 𝑎𝑖 ∈ F} with addition as the usual addition
of polynomials and scalar multiplication as multiplication of a polynomial by
a number, is a vector space over F. Here, F𝑛 [𝑡] contains all polynomials in the
variable 𝑡 of degree less than or equal to 𝑛.
10. F[𝑡] := the set of all polynomials (of all degrees) with coefficients from F is
a vector space over F with + as the addition of two polynomials and · as the
multiplication of a polynomial by a number from F.
11. Let 𝑉 = R2 . For (𝑎, 𝑏), (𝑐, 𝑑) ∈ 𝑉 and 𝛼 ∈ R, define addition as component-wise
addition, and scalar multiplication as in the following:
𝛼 (𝑎, 𝑏) = (0, 0) if 𝛼 = 0;  𝛼 (𝑎, 𝑏) = (𝛼𝑎, 𝑏/𝛼) if 𝛼 ≠ 0.
Then (1 + 1)(0, 1) = 2(0, 1) = (0, 1/2) but 1(0, 1) + 1(0, 1) = (0, 2). Thus 𝑉 is
not a vector space over R.
12. Let 𝑉 = F𝑚×𝑛 . If 𝐴 = [𝑎𝑖 𝑗 ] and 𝐵 = [𝑏𝑖 𝑗 ] are in F𝑚×𝑛 , then define 𝐴 + 𝐵 = [𝑎𝑖 𝑗 + 𝑏𝑖 𝑗 ]
and 𝛼𝐴 = [𝛼𝑎𝑖 𝑗 ]. Then 𝑉 is a vector space over F. The zero vector is the zero matrix
0 and −[𝑎𝑖 𝑗 ] = [−𝑎𝑖 𝑗 ].
13. Let 𝑉 be the set of all functions from a nonempty set 𝑆 to F. Define addition of
two functions and scalar multiplication by (𝑓 + 𝑔)(𝑥) = 𝑓 (𝑥) + 𝑔(𝑥) and (𝛼 𝑓 )(𝑥) = 𝛼 𝑓 (𝑥) for 𝑥 ∈ 𝑆.
Then 𝑉 is a vector space over F. Here, the zero vector is the zero map 0 given by
0(𝑥) = 0 for all 𝑥 ∈ 𝑆; and for each 𝑓 ∈ 𝑉 , its additive inverse −𝑓 is the map
given by (−𝑓 )(𝑥) = −𝑓 (𝑥) for 𝑥 ∈ 𝑆.
14. Let 𝑉 be the set of all continuous functions from the closed interval [𝑎, 𝑏] to
R. Define addition and scalar multiplication as in (13). Then 𝑉 is a real vector
space.
15. Let 𝑉 be the set of all twice differentiable functions 𝑓 from R to R such that
𝑓 ′′ + 𝑓 = 0, where 𝑓 ′ denotes the derivative of 𝑓 with respect to the independent
real variable 𝑡 . Define addition and scalar multiplication as in (13). For 𝑓 , 𝑔 ∈ 𝑉
and 𝛼 ∈ R,
(𝑓 + 𝑔)′′ + (𝑓 + 𝑔) = (𝑓 ′′ + 𝑓 ) + (𝑔′′ + 𝑔) = 0 and (𝛼 𝑓 )′′ + (𝛼 𝑓 ) = 𝛼 (𝑓 ′′ + 𝑓 ) = 0.
So 𝑉 is closed under the operations, and 𝑉 is a real vector space.
Any vector that behaves like the 0 in the third property, is called a zero vector.
Similarly, any vector that behaves like (−𝑥) for a given vector 𝑥, is called an additive
inverse of 𝑥 . In fact, there cannot be more than one zero vector, and there cannot be
more than one additive inverse of any vector. Along with this we show some other
expected facts.
(1.2) Theorem
In any vector space, the following are true:
(1) Zero vector is unique.
(2) Each vector has a unique additive inverse.
(3) For any vectors 𝑥, 𝑦, 𝑧, and any scalar 𝛼, the following hold:
(a) If 𝑥 + 𝑦 = 𝑥 + 𝑧, then 𝑦 = 𝑧.
(b) 0 · 𝑥 = 0.
(c) 𝛼 · 0 = 0.
(d) (−1) · 𝑥 = −𝑥 .
(e) If 𝛼 · 𝑥 = 0, then 𝛼 = 0 or 𝑥 = 0.
For instance, if 𝑥1 and 𝑥2 are both additive inverses of a vector 𝑥, then
𝑥1 = 𝑥1 + 0 = 𝑥1 + (𝑥 + 𝑥2 ) = 𝑥2 + (𝑥 + 𝑥1 ) = 𝑥2 + 0 = 𝑥2 .
And if 𝛼 · 𝑥 = 0 with 𝛼 ≠ 0, then
𝑥 = 1 · 𝑥 = 𝛼 −1 · 𝛼 · 𝑥 = 𝛼 −1 · 0 = 0.
As (1.2) shows, we may work with vectors the same way as we work with numbers.
However, we cannot multiply two vectors, since no such operation is available in
a vector space. This is the reason we used 𝛼 −1 in the proof of (1.2-3e) instead of
using 𝑥 −1 . In fact, there is no such vector as 𝑥 −1 .
In what follows, we write 𝑉 as a vector space without mentioning the underlying
field. We assume that the underlying field is F which may be R or C. We consider
F𝑛 to be a vector space over F. In particular, R𝑛 is taken as a real vector space; and
if nothing is specified, we take C𝑛 as a complex vector space.
1.3 Subspaces
Consider the following two nonempty subsets of R2 :
We have seen that 𝑈 is a vector space with the same operations of addition and scalar
multiplication as in R2 . Of course, the operations are well defined on 𝑈 . That is,
whenever 𝑥, 𝑦 ∈ 𝑈 and 𝛼 ∈ F, we have 𝑥 + 𝑦 ∈ 𝑈 and 𝛼𝑥 ∈ 𝑈 . But 𝑊 is not a vector
space with the same operations. In fact, the sum of two vectors from 𝑊 does not
necessarily result in a vector from 𝑊 . For instance, (0, 1) ∈ 𝑊 and (1, −1) ∈ 𝑊 but
(0, 1) + (1, −1) = (1, 0) ∉ 𝑊 . Similarly, multiplying a scalar with a vector from 𝑊
may not result in a vector from 𝑊 . We would like to separate out the first interesting
case of 𝑈 .
Let 𝑉 be a vector space. A subset 𝑈 of 𝑉 is called a subspace of 𝑉 iff the
following conditions are satisfied:
(1) 𝑈 ≠ ∅.
(2) For all 𝑥, 𝑦 ∈ 𝑈 , 𝑥 + 𝑦 ∈ 𝑈 .
(3) For each 𝑥 ∈ 𝑈 and for each 𝛼 ∈ F, 𝛼𝑥 ∈ 𝑈 .
A subspace of 𝑉 which is not equal to 𝑉 is called a proper subspace of 𝑉 .
Notice that the second and the third conditions in the definition of a subspace, taken
together, are equivalent to the following single condition:
For all 𝑥, 𝑦 ∈ 𝑈 and each 𝛼 ∈ F, 𝑥 + 𝛼𝑦 ∈ 𝑈 .
(1.3) Example
14. 𝐶 𝑘 [𝑎, 𝑏] := the set of all 𝑘 times continuously differentiable functions from
[𝑎, 𝑏] to R is a proper subspace of 𝐶 [𝑎, 𝑏].
15. Let 𝑉 = 𝐶 [−1, 1]. Let 𝑈 = {𝑓 ∈ 𝑉 : 𝑓 is an odd function }. As a convention,
the zero function is taken as an odd function. So, 𝑈 ≠ ∅. If 𝑓 , 𝑔 ∈ 𝑈 and 𝛼 ∈ R,
then (𝑓 + 𝛼𝑔)(−𝑥) = 𝑓 (−𝑥) + 𝛼𝑔(−𝑥) = −𝑓 (𝑥) + 𝛼 (−𝑔(𝑥)) = −(𝑓 + 𝛼𝑔)(𝑥).
So, 𝑓 + 𝛼𝑔 ∈ 𝑈 . Therefore, 𝑈 is a subspace of 𝑉 .
In fact, a subspace is a vector space on its own right, with the operations inherited
from the parent vector space.
(1.4) Theorem
Let 𝑉 be a vector space over F with + as the addition and · as the scalar multiplica-
tion. Let 𝑈 be a subspace of 𝑉 . Then 𝑈 is a vector space over F with the addition
as the restriction of + to 𝑈 , and scalar multiplication as the restriction of · to 𝑈 .
(1.5) Theorem
The intersection of two subspaces of a vector space is also a subspace.
(1.6) Theorem
Union of two subspaces is a subspace iff one of them is a subset of the other.
5. Why is the set of all polynomials of degree 𝑛 not a vector space, with usual
addition and scalar multiplication of polynomials?
Ans: Not closed under addition (scalar multiplication).
6. Is the set of all skew-symmetric 𝑛 × 𝑛 matrices a subspace of R𝑛×𝑛 ?
Ans: Yes.
7. Determine whether the following are real vector spaces:
(a) R∞ := the set of all sequences of real numbers.
(b) ℓ ∞ := the set of all bounded sequences of real numbers.
(c) ℓ 1 := the set of all absolutely convergent sequences of real numbers.
Ans: (a)-(c): Yes.
8. Let 𝑆 be a nonempty set and let 𝑠 ∈ 𝑆. Let 𝑉 be the set of all functions
𝑓 : 𝑆 → R with 𝑓 (𝑠) = 0. Is 𝑉 a vector space over R with the usual addition
and scalar multiplication of functions?
Ans: It is a subspace of the space of all functions from 𝑆 to R.
9. Show that the set 𝐵(𝑆) of all bounded functions from a nonempty set 𝑆 to R
is a real vector space.
1.4 Span
We have seen that the union of two subspaces may fail to be a subspace since the
union need not be closed under the operations. It is quite possible that we enlarge
the union so that it becomes a subspace. Of course, a trivial enlargement of the
union is the whole vector space. A better option would be to enlarge the union in
a minimal way; that is by including only those vectors that are required to obtain a
subspace.
To see the requirement in a general way, let 𝑆 be a nonempty subset of a vector
space 𝑉 . Suppose 𝑆 is not a subspace of 𝑉 . Let 𝑢 ∈ 𝑆. In any minimal enlargement
of 𝑆, all vectors of the form 𝛼𝑢 must be present. Similarly, if 𝑣 ∈ 𝑆, then all vectors
of the form 𝛽𝑣 must be present in the enlargement. Then the vectors of the form
𝛼𝑢 + 𝛽𝑣 must also be in the enlargement. In general, if 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑆, then in this
enlargement, we must have all vectors of the form
𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣 𝑛 for 𝛼 1, . . . , 𝛼𝑛 ∈ F.
(1.7) Example
Notice that a linear combination is always a finite sum. If 𝑆 is a finite set, say
𝑆 = {𝑣 1, . . . , 𝑣𝑚 }, then span (𝑆) = {𝛼 1𝑣 1 + · · · + 𝛼𝑚 𝑣𝑚 : 𝛼 1, . . . , 𝛼𝑚 ∈ F}.
In general, if 𝑆 is a nonempty set, then we can write its span as
span (𝑆) = {𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 : 𝑛 ∈ N, 𝑣1, . . . , 𝑣𝑛 ∈ 𝑆, 𝛼1, . . . , 𝛼𝑛 ∈ F}.
Caution: The set 𝑆 can have infinitely many elements, but a linear combination is
a sum of finitely many elements from 𝑆, multiplied with some scalars.
We show that the notion of ‘span’ serves its purpose in enlarging a subset to a
minimal subspace.
(1.8) Theorem
The span of a subset is the minimal subspace that contains the subset.
Proof. Let 𝑆 be a subset of a vector space 𝑉 over F. If 𝑆 = ∅, then span (𝑆) = {0},
which is clearly the minimal subspace containing ∅. So, let 𝑆 ≠ ∅. Clearly, 𝑆 ⊆ span (𝑆).
(1.9) Theorem
The sum of two subspaces of a vector space is equal to the span of their union.
Proof. Recall that 𝑈 + 𝑊 = {𝑢 + 𝑤 : 𝑢 ∈ 𝑈 , 𝑤 ∈ 𝑊 }. Let 𝑧1 = 𝑥1 + 𝑦1 and 𝑧2 = 𝑥2 + 𝑦2 be in 𝑈 + 𝑊 , where 𝑥1, 𝑥2 ∈ 𝑈 and 𝑦1, 𝑦2 ∈ 𝑊 , and let 𝛼 ∈ F. Then
𝑧1 + 𝛼𝑧2 = 𝑥1 + 𝑦1 + 𝛼 (𝑥2 + 𝑦2 ) = (𝑥1 + 𝛼𝑥2 ) + (𝑦1 + 𝛼𝑦2 ) ∈ 𝑈 + 𝑊 .
Hence 𝑈 + 𝑊 is a subspace of 𝑉 that contains 𝑈 ∪ 𝑊 . Since span (𝑈 ∪ 𝑊 ) is the
minimal subspace of 𝑉 that contains 𝑈 ∪ 𝑊 , we have span (𝑈 ∪ 𝑊 ) ⊆ 𝑈 + 𝑊 . On
the other hand, 𝑈 + 𝑊 ⊆ span (𝑈 ∪ 𝑊 ). So, 𝑈 + 𝑊 = span (𝑈 ∪ 𝑊 ).
Though a vector space is very large, there can be a small subset whose span is the
vector space. For example, R2 = span {(1, 0), (0, 1)}.
A subset 𝑆 of a vector space 𝑉 is said to span 𝑉 iff span (𝑆) = 𝑉 . In this case, we
also say that 𝑆 is a spanning set of 𝑉 , and 𝑉 is spanned by 𝑆.
(1.10) Example
1. The subset {(1, 0), (0, 1)} of R2 spans R2 .
2. The subset {(1, 2), (2, 1), (2, 2)} of R2 spans R2 .
3. The subset {𝑒 1, . . . , 𝑒𝑛 } of F𝑛 is a spanning set of F𝑛 .
4. F𝑛 [𝑡] is spanned by {1, 𝑡, . . . , 𝑡 𝑛 }.
5. The subset {(1, 2)} of R2 spans the vector space {𝛼 (1, 2) : 𝛼 ∈ R}. Here, the
vector space is the straight line that passes through the origin and the point (1, 2);
it is a proper subspace of R2 . For instance, (1, 1) ∉ span {(1, 2)}. We see that
{(1, 2)} does not span R2 .
6. {(1, 1, 1), (0, 1, 1), (1, −1, −1), (1, 3, 3)} = span {(0, 0, 0), (1, 1, 1), (0, 1, 1). Why?
The span of the given vectors is the plane in R3 that contains the points (0, 0, 0),
(1, 1, 1) and (0, 1, 1).
7. Let 𝑆 be a subset of a vector space 𝑉 . Prove that span (𝑆) is the intersection
of all subspaces that contain 𝑆.
8. We know that 𝑒𝑡 = 1 + 𝑡 + 𝑡²/2! + · · · for each 𝑡 ∈ R. Does it imply that
𝑒 𝑡 ∈ span {1, 𝑡, 𝑡 2, . . .}? Ans: No.
9. Show that every vector space has at least two spanning sets.
10. Let 𝑉 be the real vector space of all functions from {1, 2} to R. Construct a
spanning set of 𝑉 with two elements.
11. Find a finite spanning set for the real vector space of all real symmetric
matrices of order 𝑛.
12. Let 𝐴 and 𝐵 be subsets of a vector space 𝑉 . Prove (a)-(c) and give counter
examples for (d)-(f):
(a) span (span (𝐴)) = span (𝐴).
(b) If 𝐴 ⊆ 𝐵, then span (𝐴) ⊆ span (𝐵).
(c) span (𝐴 ∩ 𝐵) ⊆ span (𝐴) ∩ span (𝐵).
(d) span (𝐴) ∩ span (𝐵) ⊆ span (𝐴 ∩ 𝐵).
(e) span (𝐴) \ span (𝐵) ⊆ span (𝐴 \ 𝐵).
(f) span (𝐴 \ 𝐵) ⊆ (span (𝐴) \ span (𝐵)) ∪ {0}.
13. Give suitable real vector spaces 𝑈 , 𝑉 ,𝑊 so that 𝑈 + 𝑉 = 𝑈 + 𝑊 but 𝑉 ≠ 𝑊 .
14. Let 𝑈 and 𝑊 be subspaces of a vector space such that 𝑈 ∩ 𝑊 = {0}. Prove
that if 𝑥 ∈ 𝑈 + 𝑊 , then there exist unique 𝑢 ∈ 𝑈 , 𝑤 ∈ 𝑊 such that 𝑥 = 𝑢 + 𝑤 .
15. Let 𝑈 , 𝑉 , and 𝑊 be subspaces of a vector space 𝑋 .
(a) Prove that (𝑈 ∩ 𝑉 ) + (𝑈 ∩ 𝑊 ) ⊆ 𝑈 ∩ (𝑉 + 𝑊 ).
(b) Give suitable 𝑈 , 𝑉 ,𝑊 , 𝑋 so that 𝑈 ∩ (𝑉 + 𝑊 ) * (𝑈 ∩ 𝑉 ) + (𝑈 ∩ 𝑊 ).
(c) Prove that 𝑈 + (𝑉 ∩ 𝑊 ) ⊆ (𝑈 + 𝑉 ) ∩ (𝑈 + 𝑊 ).
(d) Give suitable 𝑈 , 𝑉 ,𝑊 , 𝑋 so that (𝑈 + 𝑉 ) ∩ (𝑈 + 𝑊 ) * 𝑈 + (𝑉 ∩ 𝑊 ).
16. Let 𝑒𝑖 be the sequence (0, 0, . . . , 0, 1, 0, 0, . . .) where the 𝑖th term is 1 and the
rest are all 0. What is span ({𝑒 1, 𝑒 2, . . .})?
Ans: 𝑐 00, the set of all sequences each having finitely many nonzero terms.
(1.11) Example
(1.12) Theorem
Let 𝑣 1, . . . , 𝑣𝑛 be vectors in a vector space 𝑉 , where 𝑛 ≥ 2. Then
(1) {𝑣 1, . . . , 𝑣𝑛 } is linearly dependent iff 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 where at least one
of the scalars 𝛼 1, . . . , 𝛼𝑛 is nonzero.
(2) {𝑣 1, . . . , 𝑣𝑛 } is linearly independent iff for scalars 𝛼 1, . . . , 𝛼𝑛 ,
𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0 implies 𝛼 1 = · · · = 𝛼𝑛 = 0.
Proof. (1) Suppose {𝑣1, . . . , 𝑣𝑛 } is linearly dependent; say, 𝑣𝑗 is a linear combination of the other vectors. Moving 𝑣𝑗 to the other side of that equation, we obtain 𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 = 0 where 𝛼𝑗 = −1 ≠ 0.
Conversely, suppose that 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 = 0, where 𝛼 1, . . . , 𝛼𝑛 ∈ F, and we have
(some) 𝑗 ∈ {1, . . . , 𝑛} such that 𝛼 𝑗 ≠ 0. Then
(1.13) Example
1. Is {(1, 0, 0), (1, 2, 0), (1, 1, 1)} linearly independent in R3 ? We start with the
equation
𝛼 (1, 0, 0) + 𝛽 (1, 2, 0) + 𝛾 (1, 1, 1) = (0, 0, 0).
Comparing the components, we obtain: 𝛼 + 𝛽 + 𝛾 = 0, 2𝛽 + 𝛾 = 0, and 𝛾 = 0. It
implies 𝛼 = 𝛽 = 𝛾 = 0.
Thus, the vectors are linearly independent.
2. {(1, 0), (1, 1), (3, 2)} is linearly dependent in R2 since (3, 2) = (1, 0) + 2(1, 1).
To illustrate (1.12), let us check that {sin 𝑡, cos 𝑡 } is linearly independent in the vector space of all functions from R to R. Suppose
𝛼 sin 𝑡 + 𝛽 cos 𝑡 = 0.
Notice that the 0 on the right hand side is the zero function. Putting 𝑡 = 0 we
get 𝛽 = 0. By taking 𝑡 = 𝜋/2, we get 𝛼 = 0. Hence the set {sin 𝑡, cos 𝑡 } is linearly
independent.
5. If {𝑢 1, . . . , 𝑢𝑛 } is linearly dependent, then for any vector 𝑣 ∈ 𝑉 , {𝑢 1, . . . , 𝑢𝑛 , 𝑣 } is
linearly dependent. Why?
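The check in item 1 can also be done mechanically: vectors in F𝑛 are linearly independent exactly when the matrix having them as rows has full rank. A minimal sketch (it assumes the numpy library, which is not part of these notes):

import numpy as np

# Rows are the vectors (1,0,0), (1,2,0), (1,1,1) of item 1.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0]])

# Three vectors in R^3 are linearly independent iff this rank equals 3.
print(np.linalg.matrix_rank(A))   # prints 3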
(1.14) Theorem
A list of 𝑛 vectors is linearly dependent iff there exists 𝑘 ∈ {1, . . . , 𝑛} such that the
first 𝑘 − 1 vectors are linearly independent, and the 𝑘th vector is in the span of the
first 𝑘 − 1 vectors.
If the first vector in the list is the zero vector, then the list of previous vectors is
taken as the empty list ∅, in which case 𝑘 = 1.
Proof. Write 𝑆 0 := ∅, and for 1 ≤ 𝑗 ≤ 𝑛, define the list 𝑆 𝑗 as the list of first 𝑗
vectors. We notice that 𝑆 0 is a sublist of 𝑆 1, which is a sublist of 𝑆 2, and so on. In
this increasing list of lists 𝑆 0, 𝑆 1, 𝑆 2, . . . , 𝑆𝑛 , the first list 𝑆 0 is linearly independent
and the last list 𝑆𝑛 is linearly dependent. Notice that if 𝑆 𝑗 is linearly independent,
then all of 𝑆 0, . . . , 𝑆 𝑗 are linearly independent; and if 𝑆 𝑗 is linearly dependent, then
all of 𝑆 𝑗 , 𝑆 𝑗+1, . . . , 𝑆𝑛 are linearly dependent. Therefore, somewhere the switching
from linearly independent to linearly dependent happens. That is, there exists a
𝑘 ∈ {1, . . . , 𝑛} such that 𝑆𝑘−1 is linearly independent and 𝑆𝑘 is linearly dependent.
(1.15) Theorem
Let 𝑉 be a vector space. Let 𝐴 be a linearly independent set of 𝑚 vectors, and let 𝐵
be a spanning set of 𝑉 consisting of 𝑛 vectors. Then 𝑚 ≤ 𝑛.
1.6 Basis
A spanning set of a vector space may contain a vector which is in the span of the
other vectors in the set. Throwing away such a vector leaves a spanning set, again.
That is, in a spanning set, there may be redundancy. On the other hand, a linearly
independent set may fail to span the vector space. That is, in a linearly independent
set there may be deficiency. We would like to have a set of vectors which is neither
redundant nor deficient.
Let 𝑉 be a vector space over F. A linearly independent subset that spans 𝑉 is called
a basis of 𝑉 . A basis depends on the underlying field since linear combinations
depend on the field.
A vector space may have many bases. For example, both {1} and {2} are bases of
R. In fact {𝑥 }, for any nonzero 𝑥 ∈ R, is a basis of R. However, {0} has the unique
basis ∅.
(1.16) Example
1. Recall that in R2, the vectors 𝑒 1 = (1, 0) and 𝑒 2 = (0, 1) are linearly independent.
They also span R2 since (𝑎, 𝑏) = 𝑎𝑒 1 + 𝑏𝑒 2 for any 𝑎, 𝑏 ∈ R. Therefore, {𝑒 1, 𝑒 2 } is
a basis of R2 .
2. The set {(1, 1), (1, 2)} is a basis of R2 . Reason? Since neither of them is a
scalar multiple of the other, the set is linearly independent. To see that the
given set of vectors spans R2, we must show that each vector (𝑎, 𝑏) ∈ R2 can
be expressed as a linear combination of these vectors. So, we ask whether the
equation (𝑎, 𝑏) = 𝛼 (1, 1) + 𝛽 (1, 2) has a solution for 𝛼 and 𝛽. The requirement
amounts to 𝛼 + 𝛽 = 𝑎 and 𝛼 + 2𝛽 = 𝑏. We see that 𝛽 = 𝑏 − 𝑎 and 𝛼 = 2𝑎 − 𝑏 do
the job. Hence {(1, 1), (1, 2)} is a basis of R2 .
3. Let 𝑉 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 − 𝑏 = 0}. Clearly, (1, 2) ∈ 𝑉 . If (𝑎, 𝑏) ∈ 𝑉 , then
𝑏 = 2𝑎. That is, (𝑎, 𝑏) = (𝑎, 2𝑎) = 𝑎(1, 2). So, 𝑉 = span {(1, 2)}. Also, {(1, 2)}
is linearly independent. Therefore, it is a basis of 𝑉 .
4. The set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} as a subset of R3 is a basis of R3 . Also, the
same set as a subset of C3 is a basis of C3 .
5. Let 𝑉 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 − 2𝑏 + 𝑐 = 0}. Let (𝑎, 𝑏, 𝑐) ∈ 𝑉 . Then 𝑎 = 2𝑏 − 𝑐;
that is, (𝑎, 𝑏, 𝑐) = (2𝑏 − 𝑐, 𝑏, 𝑐) = 𝑏 (2, 1, 0) + 𝑐 (−1, 0, 1). So, 𝑉 is spanned by
𝐵 := {(2, 1, 0), (−1, 0, 1)}. Further, 𝐵 is a linearly independent subset of 𝑉 .
Therefore, 𝐵 is a basis of 𝑉 .
6. Let 𝑉 = F𝑚×𝑛 . Let 𝐸𝑖 𝑗 be the matrix in 𝑉 whose (𝑖, 𝑗)th entry is 1 and all other
entries are 0. Then 𝐵 = {𝐸𝑖 𝑗 ∈ 𝑉 : 1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛} is a basis of 𝑉 .
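In item 2 of (1.16), the coordinates were obtained by solving the system 𝛼 + 𝛽 = 𝑎, 𝛼 + 2𝛽 = 𝑏. A short sketch (numpy assumed, purely for illustration) that solves this system for a sample (𝑎, 𝑏) and confirms the formulas 𝛼 = 2𝑎 − 𝑏, 𝛽 = 𝑏 − 𝑎:

import numpy as np

# Columns of B are the basis vectors (1, 1) and (1, 2).
B = np.array([[1.0, 1.0],
              [1.0, 2.0]])
a, b = 3.0, 5.0                       # any vector (a, b) in R^2
alpha, beta = np.linalg.solve(B, [a, b])
print(alpha, beta)                    # 1.0 2.0, which equal 2a - b and b - a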
Recall that 𝑒 𝑗 in F𝑛 has the 𝑗th component as 1 and all other components as 0. The
set {𝑒 1, . . . , 𝑒𝑛 } is a basis of F𝑛 . As an ordered set, this basis is called the Standard
Basis of F𝑛 .
We also write the ordered set {𝑒 1, . . . , 𝑒𝑛 } for the standard basis of F𝑛×1, where
each 𝑒 𝑗 is taken as a column vector. Similarly, we write the standard basis of F1×𝑛 as
{𝑒 1, . . . , 𝑒𝑛 }, where each 𝑒𝑖 is taken as a row vector. The context will clarify whether
it is a column vector or a row vector, and the particular 𝑛.
We now show formally that a basis neither has redundancy nor has deficiency in
spanning the vector space. We use the following terminology.
A subset 𝐵 of a vector space 𝑉 is a maximal linearly independent set means that
𝐵 is linearly independent in 𝑉 , and each proper superset of 𝐵 is linearly dependent.
Similarly, a subset 𝐵 of 𝑉 is a minimal spanning set of 𝑉 means that 𝐵 spans 𝑉 ,
and each proper subset of 𝐵 fails to span 𝑉 .
(1.17) Theorem
Any subset of a vector space is a basis iff it is a minimal spanning set iff it is a
maximal linearly independent set.
(1.18) Theorem
Let 𝑣 1, . . . , 𝑣𝑛 be vectors in a vector space 𝑉 . The ordered set 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } is a
basis of 𝑉 iff for each 𝑣 ∈ 𝑉 there exists a unique 𝑛-tuple of scalars (𝛼 1, . . . , 𝛼𝑛 )
such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 .
Proof. Suppose the ordered set 𝐵 is a basis of 𝑉 . Let 𝑣 ∈ 𝑉 . As span (𝐵) = 𝑉 , there
exist scalars 𝛼 1, . . . , 𝛼𝑛 such that 𝑣 = 𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 . For uniqueness, suppose
𝑣 = 𝛼 1 𝑣 1 + · · · + 𝛼 𝑛 𝑣 𝑛 = 𝛽 1 𝑣 1 + · · · + 𝛽𝑛 𝑣 𝑛 .
Then (𝛼 1 − 𝛽 1 )𝑣 1 + · · · + (𝛼𝑛 − 𝛽𝑛 )𝑣𝑛 = 0. Due to linear independence of 𝐵,
𝛼 1 = 𝛽 1 , . . . , 𝛼 𝑛 = 𝛽𝑛 .
(1.19) Theorem
If a vector space has a finite spanning set, then every finite spanning set contains a
basis.
(1.20) Example
Let 𝐵 = {(1, 0, 1), (1, 2, 1), (2, 2, 2), (0, 2, 0)} and let 𝑉 = span (𝐵). We see that
(1, 2, 1) is not a scalar multiple of (1, 0, 1). Next, (2, 2, 2) = (1, 0, 1) + (1, 2, 1).
Removing the vector (2, 2, 2) from 𝐵, we obtain
𝐵1 = {(1, 0, 1), (1, 2, 1), (0, 2, 0)}.
Here, 𝑉 = span (𝐵) = span (𝐵1 ). Next, (0, 2, 0) = −(1, 0, 1) + (1, 2, 1). Removing
(0, 2, 0) from 𝐵1, we end up with
𝐵2 = {(1, 0, 1), (1, 2, 1)},
which is linearly independent. Hence 𝐵2 is a basis of 𝑉 contained in the spanning set 𝐵.
(1.21) Theorem
If a vector space has a finite spanning set, then every linearly independent set can
be extended to a basis.
(1.22) Example
Let 𝐵 = {(1, 0, 1), (1, 2, 1), (2, 2, 2), (0, 2, 0)} and let 𝑉 = span (𝐵), as in the last
example. The vector (2, −2, 2) ∈ 𝑉 , since (2, −2, 2) = 3(1, 0, 1) − (1, 2, 1).
For extending the set {(2, −2, 2)} to a basis, we construct the spanning set
𝐶 = {(2, −2, 2), (1, 0, 1), (1, 2, 1), (2, 2, 2), (0, 2, 0)}.
Notice that 𝐶 is a spanning set of 𝑉 since its subset 𝐵 is a spanning set. Further, we
see that {(2, −2, 2), (1, 0, 1)} is linearly independent. And,
(1, 2, 1) = −(2, −2, 2) + 3(1, 0, 1), (2, 2, 2) = −(2, −2, 2) + 4(1, 0, 1), (0, 2, 0) = −(2, −2, 2) + 2(1, 0, 1).
Deleting these vectors from 𝐶, we end up with the basis {(2, −2, 2), (1, 0, 1)} of
𝑉 , which is an extension of {(2, −2, 2)}.
1.7 Dimension
Due to (1.19) if a vector space has a finite spanning set, then all its bases are finite.
We show something more.
(1.23) Theorem
If a vector space has a finite spanning set, then each basis has the same number of
elements.
Proof. Let 𝑉 be a vector space having a finite spanning set. Let 𝐵 and 𝐸 be two
bases of 𝑉 . Both 𝐵 and 𝐸 are finite sets. Let 𝐵 have 𝑚 vectors and let 𝐸 have 𝑛
vectors. Consider 𝐵 as a linearly independent set and 𝐸 as a spanning set. Then
𝑚 ≤ 𝑛, due to (1.15). Now, consider 𝐸 as a linearly independent set and 𝐵 as a
spanning set. Then 𝑛 ≤ 𝑚. Therefore, 𝑚 = 𝑛.
Let 𝑉 be a vector space having a finite spanning set. Then the number of elements
in a basis is called the dimension of 𝑉 ; and it is denoted by dim (𝑉 ).
(1.24) Example
1. dim (R) = 1; dim (R𝑛 ) = 𝑛; dim (C) = 1; dim (C𝑛 ) = 𝑛; dim (F𝑛 [𝑡]) = 𝑛 + 1.
2. The dimension of the zero space {0} is 0. Reason? ∅ is a basis of {0}.
3. If C is considered as a vector space over R, then dim (C) = 2. For instance, {1, 𝑖}
is a basis of the real vector space C.
4. The real vector space C𝑛 has dimension 2𝑛. Can you construct a basis of C𝑛
considered as a vector space over R?
5. The real vector space C𝑛 [𝑡] has dimension 2(𝑛 + 1).
A vector space which has a finite basis is called a finite dimensional vector
space. We also write dim (𝑉 ) < ∞ to express the fact that “𝑉 is finite dimensional”.
Due to (1.19), each vector space having a finite spanning set is finite dimensional.
The dimension of a finite dimensional vector space is a non-negative integer. For
instance, F𝑛 and F𝑛 [𝑡] are finite dimensional vector spaces over F with dim (F𝑛 ) = 𝑛
and dim (F𝑛 [𝑡]) = 𝑛 + 1.
A vector space which does not have a finite basis, is called an infinite dimensional
vector space; and we express this fact by writing dim (𝑉 ) = ∞. From (1.15) it follows
that
a vector space is infinite dimensional
iff no finite subset of it is its basis
iff no finite subset of it is a spanning set
iff it contains an infinite linearly independent set.
(1.25) Example
1. F∞ is infinite dimensional since {𝑒 1, 𝑒 2, . . .} is linearly independent in F∞ .
2. The set of all polynomials, F[𝑡], is an infinite dimensional vector space. Reason?
Suppose the dimension is finite, say dim (F[𝑡]) = 𝑛. Then any set of 𝑛 + 1 vectors
is linearly dependent. But {1, 𝑡, . . . , 𝑡 𝑛 } is linearly independent! Notice that
{1, 𝑡, 𝑡 2, . . .} is a basis of F[𝑡].
3. 𝐶 [𝑎, 𝑏] is an infinite dimensional vector space. Reason? Take the collection of
functions {𝑓𝑛 : 𝑓𝑛 (𝑡) = 𝑡 𝑛 for all 𝑡 ∈ [𝑎, 𝑏]; 𝑛 = 0, 1, 2, . . .}. Then {𝑓0, 𝑓1, . . . , 𝑓𝑛 }
is linearly independent for every 𝑛. So, 𝐶 [𝑎, 𝑏] cannot have a finite basis.
We will not study infinite dimensional vector spaces, though occasionally, we will
give an example to illustrate a point.
Inter-dependence of the notions of spanning set, linear independence, and basis
can be seen using the notion of dimension. For a finite set 𝑆, we write |𝑆 | for the
number of elements in 𝑆. The following theorems (1.26-1.27) state some relevant
facts; their proofs are easy.
(1.26) Theorem
Let 𝑆 be a finite subset of a finite dimensional vector space 𝑉 .
(1) 𝑆 is a basis of 𝑉 iff 𝑆 is a spanning set and |𝑆 | = dim (𝑉 ).
(2) 𝑆 is a basis of 𝑉 iff 𝑆 is linearly independent and |𝑆 | = dim (𝑉 ).
(3) If |𝑆 | < dim (𝑉 ), then 𝑆 does not span 𝑉 .
(4) If |𝑆 | > dim (𝑉 ), then 𝑆 is linearly dependent.
(1.27) Theorem
Let 𝑈 be a subspace of a finite dimensional vector space 𝑉 .
(1) 𝑈 is a proper subspace of 𝑉 iff dim (𝑈 ) < dim (𝑉 ).
(2) (Basis Extension) Each basis of 𝑈 can be extended to a basis of 𝑉 .
(1.28) Example
1. Let 𝑈 = {(𝑎, 𝑏) ∈ R2 : 2𝑎 − 𝑏 = 0} and 𝑊 = {(𝑎, 𝑏) ∈ R2 : 𝑎 + 𝑏 = 0}. Then
𝑈 ∩ 𝑊 = {(𝑎, 𝑏) : 2𝑎 − 𝑏 = 0 = 𝑎 + 𝑏} = {0}.
𝑈 + 𝑊 = {(𝑎, 2𝑎) + (𝑐, −𝑐) : 𝑎, 𝑐 ∈ R} = {(𝑎 + 𝑐, 2𝑎 − 𝑐) : 𝑎, 𝑐 ∈ R}.
Further, if 𝛼, 𝛽 ∈ R, then
(𝛼, 𝛽) = ( (𝛼 + 𝛽)/3 + (2𝛼 − 𝛽)/3 , 2(𝛼 + 𝛽)/3 − (2𝛼 − 𝛽)/3 ).
That is, each vector in R2 can be expressed in the form (𝑎 + 𝑐, 2𝑎 − 𝑐). Thus
𝑈 + 𝑊 = R2 .
Here, dim (𝑈 ∩ 𝑊 ) + dim (𝑈 + 𝑊 ) = 2 = dim (𝑈 ) + dim (𝑊 ).
2. Consider the following subspaces of R3: 𝑈 = span {(1, 0, −1), (1, −1, 0)} and 𝑊 = span {(1, 1, 0), (1, 0, 1)}. Then 𝑈 ∩ 𝑊 = span {(0, 1, −1)} and
𝑈 + 𝑊 = span {(1, 0, −1), (1, −1, 0), (1, 1, 0), (1, 0, 1)} ⊆ R3 .
Since (1, 0, −1), (1, −1, 0), (1, 1, 0) are linearly independent, 𝑈 + 𝑊 = R3 .
Thus dim (𝑈 ∩ 𝑊 ) + dim (𝑈 + 𝑊 ) = 1 + 3 = dim (𝑈 ) + dim (𝑊 ).
(1.29) Theorem
Let 𝑈 and 𝑊 be finite dimensional subspaces of a vector space 𝑉 . Then
dim (𝑈 + 𝑊 ) + dim (𝑈 ∩ 𝑊 ) = dim (𝑈 ) + dim (𝑊 ).
Proof. Let 𝐵 = {𝑢1, . . . , 𝑢𝑛 } be a basis of 𝑈 ∩ 𝑊 . Extend 𝐵 to a basis 𝐵 ∪ 𝐶 of 𝑈 ,
where 𝐶 = {𝑣1, . . . , 𝑣𝑘 }, and to a basis 𝐵 ∪ 𝐷 of 𝑊 , where 𝐷 = {𝑤1, . . . , 𝑤𝑚 }. Write
𝐸 = 𝐵 ∪ 𝐶 ∪ 𝐷 = {𝑢1, . . . , 𝑢𝑛, 𝑣1, . . . , 𝑣𝑘 , 𝑤1, . . . , 𝑤𝑚 }.
We show that 𝐸 is a basis of 𝑈 + 𝑊 . First, let 𝑧 = 𝑥 + 𝑦 ∈ 𝑈 + 𝑊 , where
𝑥 = 𝛼1𝑢1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛼𝑛+1𝑣1 + · · · + 𝛼𝑛+𝑘 𝑣𝑘 ∈ 𝑈 and 𝑦 = 𝛽1𝑢1 + · · · + 𝛽𝑛𝑢𝑛 + 𝛽𝑛+1𝑤1 + · · · + 𝛽𝑛+𝑚𝑤𝑚 ∈ 𝑊 . Then
𝑧 = 𝑥 + 𝑦 = (𝛼1 + 𝛽1 )𝑢1 + · · · + (𝛼𝑛 + 𝛽𝑛 )𝑢𝑛 + 𝛼𝑛+1𝑣1 + · · · + 𝛼𝑛+𝑘 𝑣𝑘 + 𝛽𝑛+1𝑤1 + · · · + 𝛽𝑛+𝑚𝑤𝑚 ∈ span (𝐸).
So span (𝐸) = 𝑈 + 𝑊 . Next, for linear independence of 𝐸, suppose
𝛼1𝑢1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛽1𝑣1 + · · · + 𝛽𝑘 𝑣𝑘 + 𝛾1𝑤1 + · · · + 𝛾𝑚𝑤𝑚 = 0.
Then
𝛼1𝑢1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛽1𝑣1 + · · · + 𝛽𝑘 𝑣𝑘 = −𝛾1𝑤1 − · · · − 𝛾𝑚𝑤𝑚 .
The left hand side is a vector in 𝑈 , and the right hand side is a vector in 𝑊 . Therefore,
both are in 𝑈 ∩ 𝑊 . Since 𝐵 is a basis for 𝑈 ∩ 𝑊 , we have
−𝛾1𝑤1 − · · · − 𝛾𝑚𝑤𝑚 = 𝑎1𝑢1 + · · · + 𝑎𝑛𝑢𝑛
for some scalars 𝑎1, . . . , 𝑎𝑛 . That is,
𝑎1𝑢1 + · · · + 𝑎𝑛𝑢𝑛 + 𝛾1𝑤1 + · · · + 𝛾𝑚𝑤𝑚 = 0.
Since 𝐵 ∪ 𝐷 is a basis of 𝑊 , it is linearly independent; so 𝑎1 = · · · = 𝑎𝑛 = 𝛾1 = · · · = 𝛾𝑚 = 0. Then
𝛼1𝑢1 + · · · + 𝛼𝑛𝑢𝑛 + 𝛽1𝑣1 + · · · + 𝛽𝑘 𝑣𝑘 = 0.
Since 𝐵 ∪ 𝐶 is a basis of 𝑈 , it is linearly independent; so
𝛼1 = · · · = 𝛼𝑛 = 𝛽1 = · · · = 𝛽𝑘 = 𝛾1 = · · · = 𝛾𝑚 = 0.
Hence 𝐸 is linearly independent, so it is a basis of 𝑈 + 𝑊 . Therefore,
dim (𝑈 + 𝑊 ) + dim (𝑈 ∩ 𝑊 ) = (𝑛 + 𝑘 + 𝑚) + 𝑛 = (𝑛 + 𝑘) + (𝑛 + 𝑚) = dim (𝑈 ) + dim (𝑊 ).
1.8 Extracting a basis
(1.30) Observation In the RREF of 𝐴 suppose 𝑅𝑖1, . . . , 𝑅𝑖𝑟 are the rows of 𝐴
which have become the nonzero rows in the RREF, and other rows have become the
zero rows. Also, suppose 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 for 𝑗1 < · · · < 𝑗𝑟 , are the columns of 𝐴 which
have become the pivotal columns in the RREF, other columns being non-pivotal.
Then the following are true:
1. The rows 𝑅𝑖1, . . . , 𝑅𝑖𝑟 are linearly independent; and the other rows of 𝐴 are
linear combinations of 𝑅𝑖1, . . . , 𝑅𝑖𝑟 .
2. The columns 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 have respectively become 𝑒 1, . . . , 𝑒𝑟 in the RREF.
3. The columns 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 are linearly independent; and other columns of 𝐴
are linear combinations of 𝐶 𝑗1, . . . , 𝐶 𝑗𝑟 .
4. If 𝑒 1, . . . , 𝑒𝑘 are all the pivotal columns in the RREF that occur to the
left of a non-pivotal column, then the non-pivotal column is in the form
(𝑎 1, . . . , 𝑎𝑘 , 0, . . . , 0)𝑇 . Further, if a column 𝐶 in 𝐴 has become this non-pivotal
column in the RREF, then 𝐶 = 𝑎 1𝐶 𝑗1 + · · · + 𝑎𝑘 𝐶 𝑗𝑘 .
The above observations can be used in two ways. Let 𝑣 1, . . . , 𝑣𝑚 ∈ F𝑛 and let
𝑈 = span {𝑣 1, . . . , 𝑣𝑚 }. We consider these vectors as row vectors, form a matrix of
𝑚 rows where the 𝑖th row is 𝑣𝑖 . Then reduce this matrix to its RREF. The zero rows
are obviously in the span of the pivoted rows. The pivoted rows form a basis for 𝑈 .
In the second method, we consider the vectors 𝑣1, . . . , 𝑣𝑚 as column vectors,
and form a matrix with its 𝑗th column as 𝑣𝑗 . Then we reduce the matrix to its RREF.
If the column indices of the pivoted columns are 𝑖1, . . . , 𝑖𝑟 , then 𝑣𝑖1 , . . . , 𝑣𝑖𝑟 form a
basis for 𝑈 . Further, using the last item in the above observation, we can also find
out exactly how a vector among 𝑣1, . . . , 𝑣𝑚 that is not in the basis is expressed as a linear
combination of the basis vectors.
(1.31) Example
Let 𝑈 = span {𝑣 1, 𝑣 2, 𝑣 3, 𝑣 4, 𝑣 5 } in R4, where 𝑣 1 = (1, 1, 0, −1), 𝑣 2 = (2, −1, 1, 0),
𝑣 3 = (1, 2, −2, 1), 𝑣 4 = (1, 5, −3, −1), 𝑣 5 = (4, −1, 0, 2). It is required to extract a
basis for 𝑈 from the list of these vectors.
Method 1: Taking the vectors as rows of a matrix and reducing it to its RREF, we
have
[ 1  1  0 −1 ]
[ 2 −1  1  0 ]
[ 1  2 −2  1 ]
[ 1  5 −3 −1 ]
[ 4 −1  0  2 ]
   −RREF→
[ 1  0  0   1/5 ]
[ 0  1  0  −6/5 ]
[ 0  0  1  −8/5 ]
[ 0  0  0    0  ]
[ 0  0  0    0  ]
Discarding the zero rows, we obtain the basis for 𝑈 as
{(1, 0, 0, 1/5), (0, 1, 0, −6/5), (0, 0, 1, −8/5)}.
Method 2: Taking the vectors as columns of a matrix and reducing it to its RREF,
we have
[  1  2  1  1  4 ]
[  1 −1  2  5 −1 ]
[  0  1 −2 −3  0 ]
[ −1  0  1 −1  2 ]
   −RREF→
[ 1  0  0   2  −1 ]
[ 0  1  0  −1   2 ]
[ 0  0  1   1   1 ]
[ 0  0  0   0   0 ]
In the RREF, the first three columns are pivoted columns and the last two are non-
pivoted. Therefore, a basis for 𝑈 is {𝑣 1, 𝑣 2, 𝑣 3 }. Further, the entries in the non-pivoted
columns show that 𝑣 4 = 2𝑣 1 − 𝑣 2 + 𝑣 3 and 𝑣 5 = −𝑣 1 + 2𝑣 2 + 𝑣 3 .
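Both methods are mechanical row reductions, so they can be reproduced with exact arithmetic. A sketch of Method 2 for this example (it assumes the sympy library, used only as an illustration):

from sympy import Matrix

v1 = [1, 1, 0, -1]; v2 = [2, -1, 1, 0]; v3 = [1, 2, -2, 1]
v4 = [1, 5, -3, -1]; v5 = [4, -1, 0, 2]

A = Matrix([v1, v2, v3, v4, v5]).T    # the vectors become the columns of A
R, pivots = A.rref()                  # RREF and the pivotal column indices
print(pivots)                         # (0, 1, 2): v1, v2, v3 form a basis of U
print(R[:, 3].T, R[:, 4].T)           # the non-pivotal columns encode
                                      # v4 = 2v1 - v2 + v3 and v5 = -v1 + 2v2 + v3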
(1.32) Example
Let 𝑈 = span {𝑡 + 2𝑡 2 + 3𝑡 3, 1 + 2𝑡 2 + 4𝑡 3, −1 +𝑡 −𝑡 3, 3 −𝑡 + 4𝑡 2 + 9𝑡 3, 1 +𝑡 +𝑡 2 +𝑡 3 }.
To construct a basis for 𝑈 , we write the polynomial 𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 as the tuple
(𝑎, 𝑏, 𝑐, 𝑑) and then follow the earlier methods of reducing the appropriate matrix to
its RREF.
Method 1: Here, we take the tuples corresponding to the given polynomials as rows
of a matrix and convert it to its RREF:
[  0  1  2  3 ]
[  1  0  2  4 ]
[ −1  1  0 −1 ]
[  3 −1  4  9 ]
[  1  1  1  1 ]
   −RREF→
[ 1  0  0   0 ]
[ 0  1  0  −1 ]
[ 0  0  1   2 ]
[ 0  0  0   0 ]
[ 0  0  0   0 ]
Writing the nonzero rows as the corresponding polynomials, we get a basis for 𝑈 ,
namely, {1, 𝑡 − 𝑡 3, 𝑡 2 + 2𝑡 3 }.
Method 2: We write the same tuples as columns of a matrix and convert the matrix
to its RREF:
[ 0  1 −1  3  1 ]
[ 1  0  1 −1  1 ]
[ 2  2  0  4  1 ]
[ 3  4 −1  9  1 ]
   −RREF→
[ 1  0  1 −1  0 ]
[ 0  1 −1  3  0 ]
[ 0  0  0  0  1 ]
[ 0  0  0  0  0 ]
In the RREF, columns 1, 2 and 5 are pivoted; thus a basis for 𝑈 consists of the first,
second, and the fifth polynomial. That is, a basis for 𝑈 is
{𝑡 + 2𝑡 2 + 3𝑡 3, 1 + 2𝑡 2 + 4𝑡 3, 1 + 𝑡 + 𝑡 2 + 𝑡 3 }.
Further, the entries in the non-pivoted columns in the RREF show that
−1 + 𝑡 − 𝑡 3 = (𝑡 + 2𝑡 2 + 3𝑡 3 ) − (1 + 2𝑡 2 + 4𝑡 3 ) and
3 − 𝑡 + 4𝑡 2 + 9𝑡 3 = −(𝑡 + 2𝑡 2 + 3𝑡 3 ) + 3(1 + 2𝑡 2 + 4𝑡 3 ).
(2.1) Example
1. For 𝑥 = (𝑎1, . . . , 𝑎𝑛 ), 𝑦 = (𝑏1, . . . , 𝑏𝑛 ) ∈ R𝑛 , ⟨𝑥, 𝑦⟩ = 𝑎1𝑏1 + · · · + 𝑎𝑛𝑏𝑛 defines an inner
product. It is called the standard inner product on R𝑛 .
2. For 𝑥 = (𝑎1, . . . , 𝑎𝑛 ), 𝑦 = (𝑏1, . . . , 𝑏𝑛 ) ∈ C𝑛 , ⟨𝑥, 𝑦⟩ = 𝑎1𝑏̄1 + · · · + 𝑎𝑛𝑏̄𝑛 defines an inner
product. It is called the standard inner product on C𝑛 . Notice that ⟨𝑥, 𝑦⟩ = 𝑎1𝑏1 + · · · + 𝑎𝑛𝑏𝑛 ,
without the conjugates, is not an inner product on C𝑛 .
(2.2) Theorem
Let 𝑉 be an ips. For all 𝑥, 𝑦, 𝑧 ∈ 𝑉 and for each 𝛼 ∈ F,
Other frequently used properties of the norm in an ips are proved in the following
theorem.
(2.3) Theorem
For all vectors 𝑥, 𝑦 in an ips, the following are true:
(1) (Parallelogram Law) ‖𝑥 + 𝑦‖² + ‖𝑥 − 𝑦‖² = 2‖𝑥‖² + 2‖𝑦‖².
(2) (Cauchy–Schwarz Inequality) |⟨𝑥, 𝑦⟩| ≤ ‖𝑥‖ ‖𝑦‖.
Further, equality holds iff one of 𝑥, 𝑦 is a scalar multiple of the other.
(3) (Triangle Inequality) ‖𝑥 + 𝑦‖ ≤ ‖𝑥‖ + ‖𝑦‖.
(4) (Reverse Triangle Inequality) ‖𝑥‖ − ‖𝑦‖ ≤ ‖𝑥 − 𝑦‖.
With 𝛼 = ⟨𝑥, 𝑦⟩/‖𝑦‖² (for 𝑦 ≠ 0),
‖𝑥 − 𝛼𝑦‖² = ⟨𝑥 − 𝛼𝑦, 𝑥 − 𝛼𝑦⟩ = ⟨𝑥, 𝑥⟩ − 𝛼̄⟨𝑥, 𝑦⟩ − 𝛼⟨𝑦, 𝑥⟩ + 𝛼𝛼̄⟨𝑦, 𝑦⟩ = ‖𝑥‖² − |⟨𝑥, 𝑦⟩|²/‖𝑦‖².
(2.4) Example
h𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛 𝑡 𝑛 , 𝑏 0 + 𝑏 1𝑡 + · · · + 𝑏𝑛 𝑡 𝑛 i = 𝑎 0𝑏 0 + 𝑎 1𝑏 1 + · · · + 𝑎𝑛𝑏 𝑛 ,
(2.6) Example
(2.7) Theorem
Every orthogonal set of nonzero vectors is linearly independent; in particular, every orthonormal set is linearly independent.
(2.8) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } be an orthonormal basis of an ips 𝑉 . Let 𝑥 ∈ 𝑉 . Then
(1) (Fourier Expansion) 𝑥 = ⟨𝑥, 𝑣1⟩𝑣1 + · · · + ⟨𝑥, 𝑣𝑛⟩𝑣𝑛 ;
(2) (Parseval's Identity) ‖𝑥‖² = |⟨𝑥, 𝑣1⟩|² + · · · + |⟨𝑥, 𝑣𝑛⟩|².
Proof. (1) Since 𝐵 is a basis of 𝑉 , 𝑥 = 𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 for some scalars 𝛼𝑖 . Now, for 1 ≤ 𝑗 ≤ 𝑛,
⟨𝑥, 𝑣𝑗 ⟩ = ⟨𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 , 𝑣𝑗 ⟩ = 𝛼1𝛿1𝑗 + · · · + 𝛼𝑛𝛿𝑛𝑗 = 𝛼𝑗 .
Therefore, 𝑥 = 𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 = ⟨𝑥, 𝑣1⟩𝑣1 + · · · + ⟨𝑥, 𝑣𝑛⟩𝑣𝑛 .
If the finite set 𝐵 is not a basis for the ips 𝑉 , then instead of an equality in Parseval’s
identity, we have an inequality.
The vector 𝑦 in the proof of Bessel’s inequality has geometric significance. For
illustration, take 𝑈 as the 𝑥𝑦-plane, 𝑉 as R3, and 𝑥 = (1, 2, 3). Choose the standard
basis {𝑒1, 𝑒2 } as the orthonormal basis for 𝑈 . Then 𝑦 = ⟨𝑥, 𝑒1⟩𝑒1 + ⟨𝑥, 𝑒2⟩𝑒2 = (1, 2, 0), the foot of the perpendicular from 𝑥 to the 𝑥𝑦-plane.
(2.10) Theorem
Let 𝑢 1, . . . , 𝑢𝑛 be linearly independent vectors in an ips 𝑉 . Construct the vectors
𝑣 1, . . . , 𝑣𝑛 as follows:
𝑣1 = 𝑢1 ,
𝑣𝑘 = 𝑢𝑘 − (⟨𝑢𝑘 , 𝑣1⟩/⟨𝑣1, 𝑣1⟩) 𝑣1 − (⟨𝑢𝑘 , 𝑣2⟩/⟨𝑣2, 𝑣2⟩) 𝑣2 − · · · − (⟨𝑢𝑘 , 𝑣𝑘−1⟩/⟨𝑣𝑘−1, 𝑣𝑘−1⟩) 𝑣𝑘−1 for 𝑘 > 1.
Then {𝑣1, . . . , 𝑣𝑛 } is an orthogonal set of nonzero vectors, and span {𝑣1, . . . , 𝑣𝑘 } = span {𝑢1, . . . , 𝑢𝑘 } for each 𝑘 ∈ {1, . . . , 𝑛}.
For the orthogonality, proceed inductively: assuming 𝑣1, . . . , 𝑣𝑚 are mutually orthogonal and nonzero, for each 𝑗 ∈ {1, . . . , 𝑚},
⟨𝑣𝑚+1, 𝑣𝑗 ⟩ = ⟨𝑢𝑚+1 − Σ_{i=1}^{m} (⟨𝑢𝑚+1, 𝑣𝑖⟩/⟨𝑣𝑖 , 𝑣𝑖⟩) 𝑣𝑖 , 𝑣𝑗 ⟩ = ⟨𝑢𝑚+1, 𝑣𝑗 ⟩ − (⟨𝑢𝑚+1, 𝑣𝑗 ⟩/⟨𝑣𝑗 , 𝑣𝑗 ⟩) ⟨𝑣𝑗 , 𝑣𝑗 ⟩ = 0.
(2.11) Example
The vectors 𝑢 1 = (1, 1, 0), 𝑢 2 = (0, 1, 1), 𝑢 3 = (1, 0, 1) form a basis for F3 . Apply
Gram-Schmidt Orthogonalization to obtain an orthogonal basis of F3 .
𝑣1 = (1, 1, 0).
𝑣2 = 𝑢2 − (⟨𝑢2, 𝑣1⟩/⟨𝑣1, 𝑣1⟩) 𝑣1 = (0, 1, 1) − (1/2)(1, 1, 0) = (−1/2, 1/2, 1).
𝑣3 = 𝑢3 − (⟨𝑢3, 𝑣1⟩/⟨𝑣1, 𝑣1⟩) 𝑣1 − (⟨𝑢3, 𝑣2⟩/⟨𝑣2, 𝑣2⟩) 𝑣2
   = (1, 0, 1) − (1/2)(1, 1, 0) − (1/3)(−1/2, 1/2, 1) = (2/3, −2/3, 2/3).
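The recursion of (2.10) is easy to code. A minimal sketch (it assumes the numpy library, used here only for illustration) that reproduces the vectors of this example:

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize linearly independent vectors by the recursion in (2.10).
    basis = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in basis:
            v = v - (np.dot(v, w) / np.dot(w, w)) * w   # remove the component along w
        basis.append(v)
    return basis

for v in gram_schmidt([(1, 1, 0), (0, 1, 1), (1, 0, 1)]):
    print(v)    # [1. 1. 0.], [-0.5 0.5 1.], then approximately [2/3, -2/3, 2/3]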
(2.12) Example
The vectors 𝑢 1 = 1, 𝑢 2 = 𝑡, 𝑢 3 = 𝑡 2 form a linearly independent set in the ips of all
polynomials considered as functions from [−1, 1] to R; with the inner product as
⟨𝑝 (𝑡), 𝑞(𝑡)⟩ = ∫_{−1}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 . The Gram-Schmidt process yields:
𝑣1 = 𝑢1 = 1.
𝑣2 = 𝑢2 − (⟨𝑢2, 𝑣1⟩/⟨𝑣1, 𝑣1⟩) 𝑣1 = 𝑡 − ( ∫_{−1}^{1} 𝑡 𝑑𝑡 / ∫_{−1}^{1} 𝑑𝑡 ) 1 = 𝑡 .
𝑣3 = 𝑢3 − (⟨𝑢3, 𝑣1⟩/⟨𝑣1, 𝑣1⟩) 𝑣1 − (⟨𝑢3, 𝑣2⟩/⟨𝑣2, 𝑣2⟩) 𝑣2
   = 𝑡² − ( ∫_{−1}^{1} 𝑡² 𝑑𝑡 / ∫_{−1}^{1} 𝑑𝑡 ) 1 − ( ∫_{−1}^{1} 𝑡³ 𝑑𝑡 / ∫_{−1}^{1} 𝑡² 𝑑𝑡 ) 𝑡 = 𝑡² − 1/3.
(a) Find the set of all vectors orthogonal to the constant polynomials.
(b) Apply Gram-Schmidt process to the ordered basis {1, 𝑡, 𝑡 2, 𝑡 3 }.
(2.14) Example
Let 𝑈 be the plane {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 + 𝑐 = 0}. Find a best approximation of the
point (1, 1, 1) from 𝑈 .
Suppose (𝛼, 𝛽, 𝛾) is a best approximation of (1, 1, 1) from 𝑈 . Such a point satisfies
𝛼 +𝛽+𝛾 = 0 and minimizes the distance k(1, 1, 1)−(𝛼, 𝛽, 𝛾)k. Substituting𝛾 = −𝛼 −𝛽,
we look for 𝛼, 𝛽 ∈ R so that
𝑓 (𝛼, 𝛽) = [ (1 − 𝛼)² + (1 − 𝛽)² + (1 + 𝛼 + 𝛽)² ]^{1/2}
is minimum. Simplifying the expression for 𝑓 (𝛼, 𝛽), we see that it is equivalent to
minimizing
𝑔(𝛼, 𝛽) = 𝛼 2 + 𝛽 2 + 𝛼𝛽.
Then, using the calculus of functions of two variables, we determine
the required best approximation as (0, 0, 0).
k𝑣 − 𝑥 k 2 = k(𝑣 − 𝑢) + (𝑢 − 𝑥)k 2 = k𝑣 − 𝑢 k 2 + k𝑢 − 𝑥 k 2 ≥ k𝑣 − 𝑢 k 2 .
k𝑣 − 𝑢 k 2 ≤ k𝑣 − 𝑢 − 𝛼𝑦 k 2 = h𝑣 − 𝑢 − 𝛼𝑦, 𝑣 − 𝑢 − 𝛼𝑦i
= k𝑣 − 𝑢 k 2 − h𝑣 − 𝑢, 𝛼𝑦i − h𝛼𝑦, 𝑣 − 𝑢i + 𝛼𝛼 h𝑦, 𝑦i
= k𝑣 − 𝑢 k 2 − |𝛼 | 2 k𝑦 k 2 .
k𝑣 − 𝑢 k 2 = k(𝑣 − 𝑤) + (𝑤 − 𝑢)k 2 = k𝑣 − 𝑤 k 2 + k𝑤 − 𝑢 k 2 = k𝑣 − 𝑢 k 2 + k𝑤 − 𝑢 k 2 .
(2.16) Theorem
Let {𝑢 1, . . . , 𝑢𝑛 } be an orthonormal basis for a subspace 𝑈 of an ips 𝑉 . Let 𝑣 ∈ 𝑉 .
Then 𝑢 = ⟨𝑣, 𝑢1⟩𝑢1 + · · · + ⟨𝑣, 𝑢𝑛⟩𝑢𝑛 is the best approximation of 𝑣 from 𝑈 .
Proof. Write 𝑢 := ⟨𝑣, 𝑢1⟩𝑢1 + · · · + ⟨𝑣, 𝑢𝑛⟩𝑢𝑛 . Since 𝑢 ∈ 𝑈 , by Fourier expansion, we have
𝑢 = ⟨𝑢, 𝑢1⟩𝑢1 + · · · + ⟨𝑢, 𝑢𝑛⟩𝑢𝑛 . Due to (1.18), ⟨𝑣, 𝑢𝑖 ⟩ = ⟨𝑢, 𝑢𝑖 ⟩ for 1 ≤ 𝑖 ≤ 𝑛. That is, ⟨𝑣 − 𝑢, 𝑢𝑖 ⟩ = 0
for each 𝑖 ∈ {1, . . . , 𝑛}. Now, if 𝑥 ∈ 𝑈 , then there exist scalars 𝑎1, . . . , 𝑎𝑛 such that
𝑥 = 𝑎1𝑢1 + · · · + 𝑎𝑛𝑢𝑛 . Then ⟨𝑣 − 𝑢, 𝑥⟩ = ⟨𝑣 − 𝑢, 𝑎1𝑢1 + · · · + 𝑎𝑛𝑢𝑛⟩ = 0, since ⟨𝑣 − 𝑢, 𝑢𝑖 ⟩ = 0 for each 𝑖.
That is, 𝑣 − 𝑢 ⊥ 𝑥 for each 𝑥 ∈ 𝑈 .
By (2.15), 𝑢 is the best approximation of 𝑣 from 𝑈 .
For obvious geometrical reasons, the vector 𝑢 in (2.16) is called the orthogonal
projection of 𝑣 on the subspace 𝑈 , and it is denoted by proj𝑈 (𝑣).
Notice that the orthogonality condition 𝑣 − 𝑢 ⊥ 𝑥 for each 𝑥 ∈ 𝑈 in (2.15) is
equivalent to 𝑣 − 𝑢 ⊥ 𝑢 𝑗 for each 𝑗 whenever {𝑢 1, . . . , 𝑢𝑛 } is a spanning set for 𝑈 .
This is helpful in computing the best approximation, without using an orthonormal
basis.
Suppose {𝑢1, . . . , 𝑢𝑛 } is any basis of 𝑈 . Write the best approximation of 𝑣 from 𝑈
as 𝑢 = 𝛽1𝑢1 + · · · + 𝛽𝑛𝑢𝑛 with unknown scalars 𝛽𝑗 . Then using the orthogonality condition,
we have ⟨𝑣 − (𝛽1𝑢1 + · · · + 𝛽𝑛𝑢𝑛 ), 𝑢𝑖 ⟩ = 0 for each 𝑖. This way, the scalars 𝛽𝑗 are determined from the
linear system
⟨𝑢1, 𝑢𝑖 ⟩𝛽1 + · · · + ⟨𝑢𝑛 , 𝑢𝑖 ⟩𝛽𝑛 = ⟨𝑣, 𝑢𝑖 ⟩ for 𝑖 = 1, . . . , 𝑛.
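For instance, for the plane 𝑈 of (2.14) with the basis {(1, −1, 0), (0, 1, −1)}, this linear system can be assembled and solved directly. A small sketch (it assumes the numpy library, purely for illustration):

import numpy as np

u1 = np.array([1.0, -1.0, 0.0])      # a basis of U = {(a, b, c) : a + b + c = 0}
u2 = np.array([0.0, 1.0, -1.0])
v = np.array([1.0, 1.0, 1.0])

G = np.array([[u1 @ u1, u2 @ u1],    # the coefficients <u_j, u_i>
              [u1 @ u2, u2 @ u2]])
rhs = np.array([v @ u1, v @ u2])
beta = np.linalg.solve(G, rhs)
print(beta[0] * u1 + beta[1] * u2)   # the best approximation; here (0, 0, 0)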
(2.17) Example
1. For the best approximation of 𝑣 = (1, 0) ∈ R2 from 𝑈 = {(𝑎, 𝑎) : 𝑎 ∈ R}, we
look for a point (𝛼, 𝛼) so that (1, 0) − (𝛼, 𝛼) ⊥ (𝛽, 𝛽) for all 𝛽. That is, we look
for an 𝛼 so that (1 − 𝛼, −𝛼) · (1, 1) = 0. Or, 1 − 𝛼 − 𝛼 = 0. It leads to 𝛼 = 1/2. The
best approximation here is ( 1/2, 1/2).
2. Reconsider (2.14). We require a vector (𝛼, 𝛽, 𝛾) ∈ 𝑈 = {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 +
𝑐 = 0} which is the best approximation to (1, 1, 1). A basis for 𝑈 is given by
{(1, −1, 0), (0, 1, −1)}. The orthogonality condition in (2.15) implies that
1 − 𝛼 − 1 + 𝛽 = 0, 1 − 𝛽 − 1 + 𝛾 = 0, 𝛼 + 𝛽 + 𝛾 = 0.
3. In the ips of polynomials with ⟨𝑝, 𝑞⟩ = ∫_{0}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡, let us find the best approximation of 𝑡² from span {1, 𝑡 }. Writing it as 𝛼 + 𝛽𝑡, the orthogonality condition gives
∫_{0}^{1} (𝑡² − 𝛼 − 𝛽𝑡) 𝑑𝑡 = 0 = ∫_{0}^{1} (𝑡³ − 𝛼𝑡 − 𝛽𝑡²) 𝑑𝑡 .
Solving these two equations, 𝛼 = −1/6 and 𝛽 = 1; so the best approximation is 𝑡 − 1/6.
(3.1) Example
1. Let 𝑉 be a vector space. The map 𝑇 : 𝑉 → 𝑉 defined by 𝑇 (𝑣) = 0 for each 𝑣 ∈ 𝑉
is a linear operator on 𝑉 ; it is called the zero operator.
2. Let 𝑉 be a vector space. The map 𝑇 : 𝑉 → 𝑉 defined by 𝑇 (𝑣) = 𝑣 is a linear
operator on 𝑉 ; it is called the identity operator.
3. Let 𝑉 be a vector space. Let 𝛼 be any scalar. Then the map 𝑇 : 𝑉 → 𝑉 defined
by 𝑇 (𝑣) = 𝛼𝑣 is a linear operator on 𝑉 .
4. Define the map 𝑇 : R3 → R2 by 𝑇 (𝑎, 𝑏, 𝑐) = (2𝑎 + 𝑏, 𝑏 − 𝑐). Then 𝑇 is a linear
transformation.
5. The map 𝑇 : R2 → R3 defined by 𝑇 (𝑎, 𝑏) = (𝑎 + 𝑏, 2𝑎 − 𝑏, 𝑎 + 6𝑏) is a linear
transformation.
6. For 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛, let 𝑎𝑖 𝑗 ∈ F. Define 𝑇 : F𝑛 → F𝑚 by
𝑇 (𝛽1, . . . , 𝛽𝑛 ) = ( 𝑎11𝛽1 + · · · + 𝑎1𝑛𝛽𝑛 , . . . , 𝑎𝑚1𝛽1 + · · · + 𝑎𝑚𝑛𝛽𝑛 ).
Then 𝑇 is a linear transformation.
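Item 6 says that, once coordinates are fixed, such a map is just multiplication by the 𝑚 × 𝑛 matrix [𝑎𝑖 𝑗 ]. A two-line sketch (numpy assumed, only for illustration) for the map of item 4, 𝑇 (𝑎, 𝑏, 𝑐) = (2𝑎 + 𝑏, 𝑏 − 𝑐):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],    # rows carry the coefficients a_ij
              [0.0, 1.0, -1.0]])
x = np.array([1.0, 2.0, 3.0])
print(A @ x)                      # [ 4. -1.], that is, T(1, 2, 3) = (2*1 + 2, 2 - 3)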
(3.2) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then the following are true:
(1) 𝑇 (0) = 0.
(2) For all 𝑢, 𝑣 ∈ 𝑉 , 𝑇 (𝑢 − 𝑣) = 𝑇 (𝑢) − 𝑇 (𝑣).
(3) For any 𝑛 ∈ N, for all 𝑣 1, . . . , 𝑣𝑛 ∈ 𝑉 and for all scalars 𝛼 1, . . . , 𝛼𝑛 ,
𝑇 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = 𝛼 1𝑇 (𝑣 1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ).
(3.3) Theorem
Let 𝑇 : 𝑈 → 𝑉 and 𝑆 : 𝑉 → 𝑊 be linear transformations. Then 𝑆 ◦ 𝑇 : 𝑈 → 𝑊 is
a linear transformation.
Proof. Recall that the map 𝑆 ◦𝑇 is defined by (𝑆 ◦𝑇 )(𝑢) = 𝑆 (𝑇 (𝑢)) for 𝑢 ∈ 𝑈 . Let
𝑥, 𝑦 ∈ 𝑈 and let 𝛼 ∈ F. Now,
(𝑆 ◦ 𝑇 )(𝑥 + 𝛼𝑦) = 𝑆 (𝑇 (𝑥 + 𝛼𝑦)) = 𝑆 (𝑇 (𝑥) + 𝛼𝑇 (𝑦)) = 𝑆 (𝑇 (𝑥)) + 𝛼𝑆 (𝑇 (𝑦)) = (𝑆 ◦ 𝑇 )(𝑥) + 𝛼 (𝑆 ◦ 𝑇 )(𝑦).
Therefore, 𝑆 ◦ 𝑇 is a linear transformation.
(3.4) Theorem
A linear transformation is uniquely determined from its action on a basis.
𝑇 (𝑥) = 𝑎 1𝑤 1 + · · · + 𝑎𝑛𝑤𝑛 .
Due to uniqueness of the scalars, this map is well-defined. We must verify the two
defining conditions of a linear transformation.
Let 𝑢, 𝑣 ∈ 𝑉 . Then 𝑢 = 𝑏 1𝑣 1 + · · · + 𝑏𝑛 𝑣𝑛 and 𝑣 = 𝑐 1𝑣 1 + · · · + 𝑐𝑛 𝑣𝑛 for some scalars
𝑏𝑖 , 𝑐 𝑗 . Now, 𝑢 + 𝑣 = (𝑏 1 + 𝑐 1 )𝑣 1 + · · · + (𝑏𝑛 + 𝑐𝑛 )𝑣𝑛 . Thus
𝑇 (𝑢 + 𝑣) = (𝑏 1 + 𝑐 1 )𝑤 1 + · · · + (𝑏𝑛 + 𝑐𝑛 )𝑤𝑛
= (𝑏 1𝑤 1 + · · · + 𝑏𝑛𝑤𝑛 ) + (𝑐 1𝑤 1 + · · · + 𝑐𝑛𝑤𝑛 ) = 𝑇 (𝑢) + 𝑇 (𝑣).
𝑆 (𝑦) = 𝑆 (𝛼 1𝑣 1 + · · · + 𝛼𝑛 𝑣𝑛 ) = 𝛼 1𝑆 (𝑣 1 ) + · · · + 𝛼𝑛 𝑆 (𝑣𝑛 )
= 𝛼1𝑇 (𝑣1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ) = 𝑇 (𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 ) = 𝑇 (𝑦).
(3.5) Example
𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡² + 𝑑𝑡³ ) = 𝑎𝑇 (1) + 𝑏𝑇 (𝑡) + 𝑐𝑇 (𝑡²) + 𝑑𝑇 (𝑡³ ) = ( 𝑎 − 𝑐, (3𝑎 + 𝑏 + 𝑐)/2 ).
2. Does there exist a linear operator on R2 which maps the square with cor-
ners at (−1, −1), (1, −1), (1, 1), (−1, 1) onto the square with corners at
(−1, 0), (1, 0), (1, 2), (−1, 2)? No. Pre-image of (1, 2) =?
3. Let 𝑉 and 𝑊 be real ips. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Prove
that for all 𝑥, 𝑦 ∈ 𝑉 , h𝑇 𝑥,𝑇𝑦i = h𝑥, 𝑦i iff k𝑇 𝑥 k = k𝑥 k.
(3.6) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then 𝑁 (𝑇 ) is a subspace of 𝑉 and
𝑅(𝑇 ) is a subspace of 𝑊 .
(3.7) Example
Let 𝑇 : R3 → R2 be defined by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏, 𝑎 − 𝑐). To determine null(𝑇 ),
suppose 𝑇 (𝑎, 𝑏, 𝑐) = (0, 0). Then 𝑎 = −𝑏 and 𝑎 = 𝑐. Therefore
𝑁 (𝑇 ) = {(𝑎, −𝑎, 𝑎) : 𝑎 ∈ R} = span {(1, −1, 1)}, so null(𝑇 ) = 1. Next,
𝑅(𝑇 ) = span {(1, 1), (1, 0), (0, −1)}. Now, (1, 1) ∈ span {(1, 0), (0, −1)}; and
{(1, 0), (0, −1)} is a linearly independent set. Therefore, a basis for 𝑅(𝑇 ) is
{(1, 0), (0, −1)}. Consequently, rank(𝑇 ) = 2.
(3.8) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Let 𝐵 be a basis of 𝑉 . Then 𝑅(𝑇 ) = span {𝑇 (𝑣) : 𝑣 ∈ 𝐵}.
Proof. Let 𝑤 ∈ 𝑅(𝑇 ). There exists 𝑢 ∈ 𝑉 such that𝑇 (𝑢) = 𝑤 . Since 𝐵 is a basis of𝑉 ,
there exist scalars 𝛼1, . . . , 𝛼𝑛 and vectors 𝑣1, . . . , 𝑣𝑛 ∈ 𝐵 such that 𝑢 = 𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 .
Then
𝑤 = 𝑇 (𝑢) = 𝛼1𝑇 (𝑣1 ) + · · · + 𝛼𝑛𝑇 (𝑣𝑛 ) ∈ span {𝑇 (𝑣) : 𝑣 ∈ 𝐵}.
That is, 𝑅(𝑇 ) ⊆ span {𝑇 (𝑣) : 𝑣 ∈ 𝐵}. Conversely, for each 𝑣 ∈ 𝐵, 𝑇 (𝑣) ∈ 𝑅(𝑇 ). As
𝑅(𝑇 ) is a vector space, we have span {𝑇 (𝑣) : 𝑣 ∈ 𝐵} ⊆ 𝑅(𝑇 ).
As we see, whether a linear transformation is one-one or onto can be characterized in terms
of its nullity and rank.
(3.9) Theorem
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation.
(1) 𝑇 is one-one iff 𝑁 (𝑇 ) ⊆ {0} iff 𝑁 (𝑇 ) = {0} iff null(𝑇 ) = 0.
(2) 𝑇 is an onto map iff 𝑊 ⊆ span {𝑇 𝑣 : 𝑣 ∈ 𝐵} for any basis 𝐵 of 𝑉 iff
rank(𝑇 ) = dim (𝑊 ).
(3.10) Theorem (Rank-Nullity)
Let 𝑇 : 𝑉 → 𝑊 be a linear transformation, where 𝑉 is finite dimensional. Then rank(𝑇 ) + null(𝑇 ) = dim (𝑉 ).
Proof. If 𝑇 = 0, the zero map, then 𝑅(𝑇 ) = {0} and 𝑁 (𝑇 ) = 𝑉 . Clearly, the
dimension formula holds. So, assume that 𝑇 is a nonzero linear transformation.
The null space 𝑁 (𝑇 ) is a subspace of the finite dimensional vector space 𝑉 . So, let
𝐵 = {𝑣 1, . . . , 𝑣𝑘 } be a basis of 𝑁 (𝑇 ). [It includes the case of 𝑁 (𝑇 ) = {0}. In this
case, 𝐵 = ∅; and we take 𝑘 = 0, which is the number of vectors in 𝐵.] Extend 𝐵 to
a basis 𝐵 ∪ {𝑤 1, . . . , 𝑤𝑛 } for 𝑉 . Let 𝐸 = {𝑇 (𝑤 1 ), . . . ,𝑇 (𝑤𝑛 )}. We show that 𝐸 is a
basis of 𝑅(𝑇 ).
By (3.8), 𝑅(𝑇 ) = span {𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑘 ),𝑇 (𝑤 1 ), . . . ,𝑇 (𝑤𝑛 )}. Since 𝑇 (𝑣𝑖 ) = 0 for
each 𝑖 ∈ {1, . . . , 𝑘 }, 𝑅(𝑇 ) = span {𝑇 (𝑤 1 ), . . . ,𝑇 (𝑤𝑛 )} = span (𝐸).
For linear independence of 𝐸, let 𝑏 1, . . . , 𝑏𝑛 be scalars such that
𝑏1𝑇 (𝑤1 ) + · · · + 𝑏𝑛𝑇 (𝑤𝑛 ) = 0.
Then 𝑇 (𝑏1𝑤1 + · · · + 𝑏𝑛𝑤𝑛 ) = 0; that is, 𝑏1𝑤1 + · · · + 𝑏𝑛𝑤𝑛 ∈ 𝑁 (𝑇 ). Since 𝐵 is a basis of 𝑁 (𝑇 ), there exist scalars 𝑎1, . . . , 𝑎𝑘 such that
𝑏1𝑤1 + · · · + 𝑏𝑛𝑤𝑛 = 𝑎1𝑣1 + · · · + 𝑎𝑘 𝑣𝑘 .
That is,
𝑎 1𝑣 1 + · · · + 𝑎𝑘 𝑣𝑘 − 𝑏 1𝑤 1 − · · · − 𝑏𝑛𝑤𝑛 = 0.
Since 𝐵 ∪ {𝑤 1, . . . , 𝑤𝑛 } is a basis of 𝑉 , we have 𝑎 1 = · · · = 𝑎𝑘 = 𝑏 1 = · · · = 𝑏𝑛 = 0.
Hence 𝐸 is linearly independent.
Now that 𝐵 is a basis for 𝑁 (𝑇 ), 𝐸 is a basis for 𝑅(𝑇 ), and 𝐵 ∪ {𝑤 1, . . . , 𝑤𝑛 } is a
basis for 𝑉 , we have rank(𝑇 ) + null(𝑇 ) = 𝑛 + 𝑘 = dim (𝑉 ).
(3.11) Example
Let 𝑇 : R2 [𝑡] → R4 be defined by 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 ) = (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎).
Determine rank(𝑇 ) and null(𝑇 ).
We find that 𝑅(𝑇 ) = {(𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) : 𝑎, 𝑏, 𝑐 ∈ R}. Since
(𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) = 𝑎(1, 0, 1, −2) + 𝑏 (−1, 1, 0, 0) + 𝑐 (0, −1, 1, 0),
the vectors (1, 0, 1, −2), (−1, 1, 0, 0) and (0, −1, 1, 0) span 𝑅(𝑇 ).
Alternatively, we start with a basis of R2 [𝑡], say {1, 𝑡, 𝑡² }. Now,
𝑇 (1) = (1, 0, 1, −2), 𝑇 (𝑡) = (−1, 1, 0, 0), 𝑇 (𝑡²) = (0, −1, 1, 0).
To check linear independence, suppose
𝑎(1, 0, 1, −2) + 𝑏 (−1, 1, 0, 0) + 𝑐 (0, −1, 1, 0) = (0, 0, 0, 0).
Then (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎) = (0, 0, 0, 0). So, 𝑎 − 𝑏 = 0, 𝑏 − 𝑐 = 0, 𝑐 + 𝑎 = 0, −2𝑎 = 0.
Solving, we find that 𝑎 = 𝑏 = 𝑐 = 0. So, 𝐵 = {(1, 0, 1, −2), (−1, 1, 0, 0), (0, −1, 1, 0)}
is linearly independent. That is, 𝐵 is a basis of 𝑅(𝑇 ). Hence, rank(𝑇 ) = 3, and by (3.10), null(𝑇 ) = dim (R2 [𝑡]) − rank(𝑇 ) = 3 − 3 = 0.
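Identifying 𝑎 + 𝑏𝑡 + 𝑐𝑡² with (𝑎, 𝑏, 𝑐), the transformation of (3.11) is given by a 4 × 3 matrix, and the rank and nullity can be checked with exact arithmetic. A sketch (it assumes the sympy library, not part of these notes):

from sympy import Matrix

# T(a + bt + ct^2) = (a - b, b - c, c + a, -2a), acting on the coefficient vector (a, b, c).
A = Matrix([[ 1, -1,  0],
            [ 0,  1, -1],
            [ 1,  0,  1],
            [-2,  0,  0]])
print(A.rank())            # 3, so rank(T) = 3
print(len(A.nullspace()))  # 0, so null(T) = 0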
3.4 Isomorphisms
Recall that a function 𝑓 : 𝑋 → 𝑌 is one-one and onto iff there exists a unique
function 𝑔 : 𝑌 → 𝑋 such that 𝑔 ◦ 𝑓 = 𝐼𝑋 and 𝑓 ◦ 𝑔 = 𝐼𝑌 . Here, the map 𝐼𝑋 is the
identity map on 𝑋 and similarly, 𝐼𝑌 is the identity map on 𝑌 . In such a case, the
function 𝑓 is said to be invertible, and its inverse, which is the function 𝑔, is denoted
by 𝑓 −1 .
We give a name to a one-one onto linear transformation.
A one-one and onto linear transformation is called an isomorphism.
Two vector spaces are called isomorphic to each other iff there exists an isomor-
phism from one to the other.
(3.12) Example
(3.13) Theorem
The inverse of an isomorphism is also an isomorphism.
(3.14) Theorem
Let 𝑉 and 𝑊 be finite dimensional vector spaces with dim (𝑉 ) = dim (𝑊 ). Let
𝑇 : 𝑉 → 𝑊 be a linear transformation. Then 𝑇 is an isomorphism iff 𝑇 is one-one
iff 𝑇 is onto.
Proof. 𝑇 is one-one iff null(𝑇 ) = 0 iff rank(𝑇 ) = dim (𝑉 ) iff rank(𝑇 ) = dim (𝑊 )
iff 𝑇 is onto.
(3.15) Example
Let 𝑊 = {(𝑎, 𝑏, 𝑐, 𝑑) ∈ R4 : 𝑎 + 𝑏 + 𝑐 + 𝑑 = 0}. Define 𝑇 : R2 [𝑡] → 𝑊 by
𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 ) = (𝑎 − 𝑏, 𝑏 − 𝑐, 𝑐 + 𝑎, −2𝑎).
Clearly 𝑅(𝑇 ) ⊆ 𝑊 , since the coordinates of 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡²) add up to 0. Further,
𝑇 (1/2 − (1/2)𝑡 − (1/2)𝑡² ) = ( 1/2 + 1/2, −1/2 + 1/2, −1/2 + 1/2, −2 · 1/2 ) = (1, 0, 0, −1).
Proceeding similarly for the other two basis vectors, we see that
𝑇 (1/2 + (1/2)𝑡 − (1/2)𝑡² ) = ( 1/2 − 1/2, 1/2 + 1/2, −1/2 + 1/2, −2 · 1/2 ) = (0, 1, 0, −1).
𝑇 (1/2 + (1/2)𝑡 + (1/2)𝑡² ) = ( 1/2 − 1/2, 1/2 − 1/2, 1/2 + 1/2, −2 · 1/2 ) = (0, 0, 1, −1).
Therefore, 𝑊 = span {(1, 0, 0, −1), (0, 1, 0, −1), (0, 0, 1, −1)} ⊆ 𝑅(𝑇 ) ⊆ 𝑊 .
(3.16) Theorem
A vector space is isomorphic to a finite dimensional vector space iff both the spaces
have equal dimensions.
(3.17) Theorem
Let 𝑃 : 𝑈 → 𝑉 , 𝑇 : 𝑉 → 𝑊 and 𝑄 : 𝑊 → 𝑋 be linear transformations, where 𝑉
and 𝑊 are finite dimensional vector spaces. Suppose 𝑃 and 𝑄 are isomorphisms.
Then rank(𝑄𝑇 𝑃) = rank(𝑇 ) and null(𝑄𝑇 𝑃) = null(𝑇 ).
(2) Let 𝑢 ∈ 𝑁 (𝑄𝑇 ). Then 𝑄𝑇 (𝑢) = 0 implies 𝑇 (𝑢) = 𝑄 −1 (𝑄𝑇 (𝑢)) = 0. That is,
𝑢 ∈ 𝑁 (𝑇 ). Conversely, let 𝑧 ∈ 𝑁 (𝑇 ). Then 𝑇 (𝑧) = 0. Clearly, 𝑄 (𝑇 (𝑧)) = 0. So,
𝑧 ∈ 𝑁 (𝑄𝑇 ). Therefore, 𝑁 (𝑄𝑇 ) = 𝑁 (𝑇 ).
By (1), rank(𝑇 𝑃) = rank(𝑇 ). Since 𝑃 : 𝑈 → 𝑉 is an isomorphism, dim (𝑈 ) =
dim (𝑉 ). By the rank-nullity theorem, we obtain
null(𝑇 𝑃) = dim (𝑈 ) − rank(𝑇 𝑃) = dim (𝑉 ) − rank(𝑇 ) = null(𝑇 ).
(3.18) Theorem
Let 𝑈 , 𝑉 ,𝑊 and 𝑋 be finite dimensional vector spaces with dim (𝑈 ) = dim (𝑉 ) and
dim (𝑊 ) = dim (𝑋 ). Let 𝑇 : 𝑉 → 𝑊 and 𝑆 : 𝑈 → 𝑋 be linear transformations. If
rank(𝑆) = rank(𝑇 ), then there exist isomorphisms 𝑃 : 𝑈 → 𝑉 and 𝑄 : 𝑊 → 𝑋 such
that 𝑆 = 𝑄𝑇 𝑃 .
The conditions dim (𝑈 ) = dim (𝑉 ) and dim (𝑊 ) = dim (𝑋 ) imply that there exist
isomorphisms 𝑃 : 𝑈 → 𝑉 and 𝑄 : 𝑊 → 𝑋 . However, the composition formula
𝑆 = 𝑄𝑇 𝑃 may not hold, in general. The theorem says that such a composition
formula holds for some isomorphisms 𝑃 and 𝑄 when rank(𝑆) = rank(𝑇 ).
Proof. Suppose that dim (𝑈 ) = dim (𝑉 ) = 𝑛, dim (𝑊 ) = dim (𝑋 ) = 𝑚, and
rank(𝑇 ) = rank(𝑆) = 𝑟 . Then null(𝑇 ) = null(𝑆) = 𝑛 − 𝑟 .
Choose a basis {𝑣 1, . . . , 𝑣𝑛−𝑟 } for 𝑁 (𝑇 ), which is a subspace of 𝑉 . Extend this
basis to {𝑣 1, . . . , 𝑣𝑛−𝑟 , . . . , 𝑣𝑛 } for 𝑉 . Similarly, choose a basis {𝑢 1, . . . , 𝑢𝑛−𝑟 } for
𝑁 (𝑆), which is a subspace of 𝑈 . Extend this to a basis {𝑢 1, . . . , 𝑢𝑛−𝑟 , . . . , 𝑢𝑛 } for 𝑈 .
Then define the vectors
𝑃 (𝑢 1 ) = 𝑣 1, . . . , 𝑃 (𝑢𝑛 ) = 𝑣𝑛 ; 𝑄 (𝑤 1 ) = 𝑥 1, . . . , 𝑄 (𝑤𝑛 ) = 𝑥𝑛 .
Since each basis vector 𝑣𝑖 of 𝑉 is in 𝑅(𝑃), 𝑉 ⊆ 𝑅(𝑃); that is, 𝑃 is onto. By (3.14), 𝑃
is an isomorphism. Similarly, 𝑄 is also an isomorphism.
For 1 ≤ 𝑖 ≤ 𝑛 − 𝑟, 𝑄 (𝑇 (𝑃 (𝑢𝑖 ))) = 𝑄 (𝑇 (𝑣𝑖 )) = 𝑄 (0) = 0 = 𝑆 (𝑢𝑖 ).
For 𝑛 − 𝑟 < 𝑖 ≤ 𝑛, 𝑄 (𝑇 (𝑃 (𝑢𝑖 ))) = 𝑄 (𝑇 (𝑣𝑖 )) = 𝑄 (𝑤𝑖 ) = 𝑥𝑖 = 𝑆 (𝑢𝑖 ).
Since the linear transformations 𝑄𝑇 𝑃 and 𝑆 act the same way on each of the basis
vectors 𝑢 1, . . . , 𝑢𝑛 of 𝑈 , we conclude that 𝑄𝑇 𝑃 = 𝑆.
The results in (3.17-3.18) are often quoted by telling informally that
isomorphisms preserve rank and nullity of a linear transformation.
(3.19) Theorem
Let 𝑉 be a vector space, 𝑊 be an inner product space, and let 𝑆, 𝑇 : 𝑉 → 𝑊 be
linear transformations. Then, 𝑆 = 𝑇 iff h𝑆𝑣, 𝑤i = h𝑇 𝑣, 𝑤i for all 𝑣 ∈ 𝑉 , 𝑤 ∈ 𝑊 .
Then for all 𝑣 ∈ 𝑉 and 𝑤 ∈ 𝑊 , we obtain h𝑄𝑤, 𝑣i = h𝑇 𝑣, 𝑤i = h𝑣, 𝑆𝑤i = h𝑆𝑤, 𝑣i.
By (3.19), we conclude that 𝑄 = 𝑆.
We give a name to the linear transformation 𝑆, in (3.20), corresponding to 𝑇 .
Let 𝑉 and 𝑊 be inner product spaces, where 𝑉 is of finite dimension. Let
𝑇 : 𝑉 → 𝑊 be a linear transformation. The linear transformation 𝑇 ∗ : 𝑊 → 𝑉
defined by
⟨𝑇 𝑥, 𝑦⟩ = ⟨𝑥,𝑇 ∗𝑦⟩ for all 𝑥 ∈ 𝑉 , 𝑦 ∈ 𝑊
is called the adjoint of 𝑇 .
(3.21) Example
Consider the spaces R3 and R4 with their standard bases and the standard inner
products. Define the linear transformation 𝑇 : R4 → R3 by
𝑇 (𝑎, 𝑏, 𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 − 2𝑐 + 𝑑, 𝑎 − 𝑏 + 𝑐 − 𝑑).
𝑓 (𝑥) = ⟨𝑥, 𝑣1⟩𝑓 (𝑣1 ) + · · · + ⟨𝑥, 𝑣𝑛⟩𝑓 (𝑣𝑛 ) = ⟨𝑥, \overline{𝑓 (𝑣1 )}𝑣1 + · · · + \overline{𝑓 (𝑣𝑛 )}𝑣𝑛 ⟩ = ⟨𝑥, 𝑦⟩.
𝑆 :𝑊 →𝑉 be given by 𝑆 (𝑤) = 𝑣 𝑓 .
These equations hold for all 𝑥 ∈ 𝑉 . That is, 𝑆 (𝑦 + 𝑧) = 𝑆𝑦 + 𝑆𝑧 and 𝑆 (𝛼𝑦) = 𝛼𝑆𝑦.
Therefore, 𝑆 is a linear transformation satisfying
(3.23) Theorem
Let 𝑈 , 𝑉 and 𝑊 be finite dimensional ips. Let 𝑆 : 𝑈 → 𝑉 and 𝑇 ,𝑇1,𝑇2 : 𝑉 → 𝑊 be
linear transformations. Let 𝐼 : 𝑉 → 𝑉 be the identity operator and let 𝛼 ∈ F. Then
Proof. h𝑥, (𝑇1 + 𝑇2 ) ∗𝑦i = h(𝑇1 + 𝑇2 )𝑥, 𝑦i = h𝑇1𝑥, 𝑦i + h𝑇2𝑥, 𝑦i = h𝑥,𝑇1∗𝑦i + h𝑥,𝑇2∗𝑦i
= h𝑥,𝑇1∗𝑦 + 𝑇2∗𝑦i. Therefore, (𝑇1 + 𝑇2 ) ∗ = 𝑇1∗ + 𝑇2∗ . Other equalities are proved
similarly.
(3.24) Theorem
Let 𝑉 and 𝑊 be finite dimensional inner product spaces, and let 𝑇 : 𝑉 → 𝑊 be a
linear transformation. Then the following are true:
(1) 𝑁 (𝑇 ∗𝑇 ) = 𝑁 (𝑇 ), 𝑁 (𝑇𝑇 ∗ ) = 𝑁 (𝑇 ∗ ).
(2) rank(𝑇 ∗ ) = rank(𝑇 ∗𝑇 ) = rank(𝑇𝑇 ∗ ) = rank(𝑇 ).
(3) 𝑅(𝑇 ∗𝑇 ) = 𝑅(𝑇 ∗ ), 𝑅(𝑇𝑇 ∗ ) = 𝑅(𝑇 ).
(4) If dim (𝑉 ) = dim (𝑊 ), then null(𝑇 ∗ ) = null(𝑇 ∗𝑇 ) = null(𝑇𝑇 ∗ ) = null(𝑇 ).
Let 𝑇 be a linear operator on a finite dimensional ips 𝑉 . Then 𝑇 is called:
1. self-adjoint iff 𝑇 ∗ = 𝑇 ;
2. normal iff 𝑇 ∗𝑇 = 𝑇𝑇 ∗ ;
3. unitary iff 𝑇 ∗𝑇 = 𝑇𝑇 ∗ = 𝐼 ;
4. orthogonal iff 𝑇 ∗𝑇 = 𝑇𝑇 ∗ = 𝐼 and 𝑉 is a real ips;
5. isometric iff k𝑇 𝑥 k = k𝑥 k for each 𝑥 ∈ 𝑉 .
We will come across these types of linear operators at various places. For now,
we observe that each self-adjoint linear operator is normal, and each unitary linear
operator is normal and invertible. Similarly, it can be shown that a linear operator
on a finite dimensional ips is unitary iff it is isometric.
(a) normal but not self-adjoint. (b) normal but not unitary.
(c) unitary but not self-adjoint. (d) self-adjoint but not unitary.
Ans: (a)-(b) 𝑇 : F2 → F2, 𝑇 (𝑎, 𝑏) = (2𝑎 − 3𝑏, 3𝑎 + 2𝑏). See that
𝑇 ∗ (𝑎, 𝑏) = (2𝑎 + 3𝑏, −3𝑎 + 2𝑏).
(c) 𝑇 : F2 → F2, 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎), 𝑇 ∗ (𝑎, 𝑏) = (𝑏, −𝑎). (d) 𝑇 = 2𝐼 .
7. Determine a polynomial 𝑞(𝑡) ∈ R2 [𝑡] so that ∫_{0}^{1} 𝑝 (𝑡)𝑞(𝑡) 𝑑𝑡 = 𝑝 (1/2) for
each 𝑝 (𝑡) ∈ R2 [𝑡]. Ans: 𝑞(𝑡) = −3/2 + 15𝑡 − 15𝑡² .
8. Define 𝑓 : C3 → C by 𝑓 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏 + 𝑐)/3. Find a vector 𝑦 ∈ C3 such
that 𝑓 (𝑥) = h𝑥, 𝑦i for each 𝑥 ∈ C3 . Ans: (1/3, 1/3, 1/3).
9. Let 𝑇 be a linear operator on a finite dimensional ips. Using (3.24) show that
if 𝑇 ∗𝑇 = 𝐼, then 𝑇𝑇 ∗ = 𝐼 .
4
Linear Transformations and Matrices
𝑎11𝑥1 + · · · + 𝑎1𝑛𝑥𝑛 = 𝑏1
⋮
𝑎𝑚1𝑥1 + · · · + 𝑎𝑚𝑛𝑥𝑛 = 𝑏𝑚
(4.1) Theorem
A linear system has a solution iff it is consistent.
Proof. Let 𝑈 be the subspace of F𝑚×1 spanned by the columns of [𝐴|𝑏]. Notice
that 𝑅(𝐴) is a subspace of 𝑈 . Then,
𝐴𝑥 = 𝑏 has a solution iff 𝑏 = 𝐴𝑥 for some 𝑥 ∈ F𝑛×1 iff 𝑏 ∈ 𝑅(𝐴)
iff 𝑅(𝐴) = 𝑈 iff rank(𝐴) = dim (𝑅(𝐴)) = dim (𝑈 ) = rank([𝐴|𝑏]).
Recall that 𝑁 (𝐴) is the null space of 𝐴, which is equal to the solution set of
the homogeneous system. That is, 𝑁 (𝐴) = Sol (𝐴, 0). We connect Sol (𝐴, 𝑏) and
Sol (𝐴, 0).
(4.2) Theorem
If 𝑢 is a solution of 𝐴𝑥 = 𝑏, then Sol (𝐴, 𝑏) = 𝑢 + 𝑁 (𝐴) = {𝑢 + 𝑥 : 𝑥 ∈ 𝑁 (𝐴)}.
(4.3) Theorem
Let 𝐴 ∈ F𝑚×𝑛 and let 𝑏 ∈ F𝑚×1 . Write 𝑘 := null(𝐴) = 𝑛 − rank(𝐴).
(1) The linear system 𝐴𝑥 = 𝑏 has a unique solution iff 𝑘 = 0 and 𝑏 ∈ 𝑅(𝐴) iff
rank([𝐴|𝑏]) = rank(𝐴) = 𝑛.
(2) If 𝑢 is a solution of 𝐴𝑥 = 𝑏 and {𝑣1, . . . , 𝑣𝑘 } is a basis for 𝑁 (𝐴), then
Sol (𝐴, 𝑏) = {𝑢 + 𝛼1𝑣1 + · · · + 𝛼𝑘 𝑣𝑘 : 𝛼1, . . . , 𝛼𝑘 ∈ F}.
When a system has a unique solution, a determinant formula can be given for
obtaining the solution. We will discuss later how to compute the solution set of a
general linear system.
(4.4) Theorem (Cramer’s Rule)
Let 𝐴 ∈ F𝑛×𝑛 with det(𝐴) ≠ 0, and let 𝑏 ∈ F𝑛×1 . Let 𝐴𝑖 [𝑏] denote the matrix obtained
from 𝐴 by replacing its 𝑖th column with the vector 𝑏. Then the solution of 𝐴𝑥 = 𝑏
is given by
det(𝐴𝑖 [𝑏])
𝑥𝑖 = for 1 ≤ 𝑖 ≤ 𝑛.
det(𝐴)
Proof. Since det(𝐴) ≠ 0, there exists a unique 𝑥 ∈ F𝑛×1 such that 𝐴𝑥 = 𝑏. Let
𝑥 = (𝑥 1, . . . , 𝑥𝑛 ) t . Write 𝐴𝑥 = 𝑏 as 𝑥 1𝐶 1 + · · · + 𝑥𝑛𝐶𝑛 = 𝑏, where 𝐶 𝑗 is the 𝑗th column
of 𝐴. Next, move 𝑏 to the left side to obtain
𝑥 1𝐶 1 + · · · + (𝑥𝑖 𝐶𝑖 − 𝑏) + · · · + 𝑥𝑛𝐶𝑛 = 0.
Since 𝑥𝑖𝐶𝑖 − 𝑏 = −Σ_{j≠i} 𝑥𝑗𝐶𝑗 , the matrix whose 𝑖th column is 𝑥𝑖𝐶𝑖 − 𝑏 and whose other columns are those of 𝐴 has determinant 0. By linearity of the determinant in the 𝑖th column,
det[𝐶1, . . . , 𝑥𝑖𝐶𝑖 , . . . , 𝐶𝑛 ] − det[𝐶1, . . . , 𝑏, . . . , 𝐶𝑛 ] = 0.
That is, 𝑥𝑖 det(𝐴) = det(𝐴𝑖 [𝑏]).
Cramer’s rule helps in studying the map (𝐴, 𝑏) ↦→ 𝑥, when det(𝐴) ≠ 0. For
computing actual solutions of a linear system, it does not help when the order of
the matrix is larger than five, in general. We rather use Gauss-Jordan elimination
method.
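For small systems the rule is a direct computation. A sketch (it assumes the numpy library for the determinants; this is an illustration, not part of the notes):

import numpy as np

def cramer(A, b):
    # Solve Ax = b by Cramer's rule; assumes det(A) is nonzero.
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                 # A_i[b]: replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 5.0])
print(cramer(A, b))                  # [1. 2.], the same as np.linalg.solve(A, b)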
Gauss-Jordan elimination uses conversion of the augmented matrix to its RREF.
We now discuss this systematic approach for solving systems of linear equations.
(4.5) Theorem
Let [𝐴0 |𝑏 0] be an augmented matrix obtained from the augmented matrix [𝐴|𝑏] by
elementary row operations. Then Sol (𝐴, 𝑏) = Sol (𝐴0, 𝑏 0).
(4.6) Example
Consider solving the following system of linear equations:
𝑥1 + 𝑥2 + 2𝑥 3 + 𝑥5 =1
3𝑥 1 + 5𝑥 2 + 5𝑥 3 + 𝑥 4 + 𝑥5 =2
4𝑥 1 + 6𝑥 2 + 7𝑥 3 + 𝑥 4 + 2𝑥 5 =3
𝑥1 + 5𝑥 2 + 5𝑥 4 + 𝑥5 =2
2𝑥 1 + 8𝑥 2 + 𝑥 3 + 6𝑥 4 + 0𝑥 5 = 2.
In Gauss-Jordan elimination, the reduction of the augmented matrix to RREF goes
as follows:
[ 1 1 2 0 1 | 1 ]
[ 3 5 5 1 1 | 2 ]
[ 4 6 7 1 2 | 3 ]
[ 1 5 0 5 1 | 2 ]
[ 2 8 1 6 0 | 2 ]
   −𝑂1→
[ 1 1  2 0  1 |  1 ]
[ 0 2 −1 1 −2 | −1 ]
[ 0 2 −1 1 −2 | −1 ]
[ 0 4 −2 5  0 |  1 ]
[ 0 6 −3 6 −2 |  0 ]
   −𝑂2→
[ 1 0  5/2 −1/2  2 |  3/2 ]
[ 0 1 −1/2  1/2 −1 | −1/2 ]
[ 0 0   0    0   0 |   0  ]
[ 0 0   0    3   4 |   3  ]
[ 0 0   0    3   4 |   3  ]
   −𝑂3→
[ 1 0  5/2 0  8/3 |  2 ]
[ 0 1 −1/2 0 −5/3 | −1 ]
[ 0 0   0  1  4/3 |  1 ]
[ 0 0   0  0   0  |  0 ]
[ 0 0   0  0   0  |  0 ]
Here, 𝑂1 = 𝑅2 ← 𝑅2 − 3𝑅1, 𝑅3 ← 𝑅3 − 4𝑅1, 𝑅4 ← 𝑅4 − 𝑅1, 𝑅5 ← 𝑅5 − 2𝑅1 ;
𝑂2 = 𝑅2 ← 1/2 𝑅2, 𝑅1 ← 𝑅1 − 𝑅2, 𝑅3 ← 𝑅3 − 2𝑅2, 𝑅4 ← 𝑅4 − 4𝑅2, 𝑅5 ← 𝑅5 − 6𝑅2 ;
𝑂3 = 𝑅3 ↔ 𝑅4, 𝑅3 ← 1/3 𝑅3, 𝑅1 ← 𝑅1 + 1/2 𝑅3, 𝑅2 ← 𝑅2 − 1/2 𝑅3, 𝑅5 ← 𝑅5 − 3𝑅3 .
The equations now look like
𝑥1 + (5/2)𝑥3 + (8/3)𝑥5 = 2
𝑥2 − (1/2)𝑥3 − (5/3)𝑥5 = −1
𝑥4 + (4/3)𝑥5 = 1.
The basic variables are 𝑥1, 𝑥2, 𝑥4 and the free variables are 𝑥3 and 𝑥5 . Assigning the
free variables arbitrary values, say, 𝑥3 = 𝛼 and 𝑥5 = 𝛽, we have
𝑥1 = 2 − (5/2)𝛼 − (8/3)𝛽, 𝑥2 = −1 + (1/2)𝛼 + (5/3)𝛽, 𝑥3 = 𝛼, 𝑥4 = 1 − (4/3)𝛽, 𝑥5 = 𝛽.
Hence the solution set is
Sol (𝐴, 𝑏) = { (2, −1, 0, 1, 0)ᵗ + 𝛼 (−5/2, 1/2, 1, 0, 0)ᵗ + 𝛽 (−8/3, 5/3, 0, −4/3, 1)ᵗ : 𝛼, 𝛽 ∈ F }.
In fact, you can write the solution set from the RREF of the augmented matrix quite
mechanically instead of rewriting as a set of equations. In this process, we delete all
zero rows at the bottom of the RREF. Next, we insert suitable zero rows so that the
pivots are on the diagonal and the 𝐴-portion is a square matrix. In this phase, we
may require adding more zero rows at the bottom. Next, we change each non-pivot
diagonal entry (which contains zero now) to −1. Then the non-pivotal columns form a basis for
𝑁 (𝐴), and any solution of the system is the vector in the 𝑏-portion plus a linear
combination of the non-pivotal columns. To see this process for the above RREF,
we proceed as follows:
[ 1 0  5/2 0  8/3 |  2 ]
[ 0 1 −1/2 0 −5/3 | −1 ]
[ 0 0   0  1  4/3 |  1 ]
[ 0 0   0  0   0  |  0 ]
[ 0 0   0  0   0  |  0 ]
   −del-0→
[ 1 0  5/2 0  8/3 |  2 ]
[ 0 1 −1/2 0 −5/3 | −1 ]
[ 0 0   0  1  4/3 |  1 ]
   −ins-0→
[ 1 0  5/2 0  8/3 |  2 ]
[ 0 1 −1/2 0 −5/3 | −1 ]
[ 0 0   0  0   0  |  0 ]
[ 0 0   0  1  4/3 |  1 ]
[ 0 0   0  0   0  |  0 ]
   −(−1)→
[ 1 0  5/2 0  8/3 |  2 ]
[ 0 1 −1/2 0 −5/3 | −1 ]
[ 0 0  −1  0   0  |  0 ]
[ 0 0   0  1  4/3 |  1 ]
[ 0 0   0  0  −1  |  0 ]
Then
Sol (𝐴, 𝑏) = { (2, −1, 0, 1, 0)t + 𝑎 (5/2, −1/2, −1, 0, 0)t + 𝑏 (8/3, −5/3, 0, 4/3, −1)t : 𝑎, 𝑏 ∈ F }.
It is easy to see that this process yields the same solution set as earlier.
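The particular solution and the null space vectors read off above can also be checked numerically. The following sketch (Python with NumPy, our own illustration) verifies that 𝐴𝑢 = 𝑏 and that the two non-pivotal columns lie in 𝑁 (𝐴).

```python
import numpy as np

# Coefficient matrix and right hand side of (4.6).
A = np.array([[1, 1, 2, 0, 1],
              [3, 5, 5, 1, 1],
              [4, 6, 7, 1, 2],
              [1, 5, 0, 5, 1],
              [2, 8, 1, 6, 0]], dtype=float)
b = np.array([1, 2, 3, 2, 2], dtype=float)

u  = np.array([2, -1, 0, 1, 0], dtype=float)          # particular solution
n1 = np.array([5/2, -1/2, -1, 0, 0], dtype=float)      # non-pivotal column 3
n2 = np.array([8/3, -5/3, 0, 4/3, -1], dtype=float)    # non-pivotal column 5

print(np.allclose(A @ u, b))       # True
print(np.allclose(A @ n1, 0))      # True: n1 is in N(A)
print(np.allclose(A @ n2, 0))      # True: n2 is in N(A)
```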
9. Find all possible values of 𝑘 ∈ R such that the system of linear equations
𝑥 + 𝑦 + 2𝑧 − 5𝑤 = 3, 2𝑥 + 5𝑦 − 𝑧 − 9𝑤 = −3, 𝑥 − 2𝑦 + 6𝑧 − 7𝑤 = 7,
2𝑥 + 2𝑦 + 2𝑧 + 𝑘𝑤 = −4
has more than one solution. Ans: 𝑘 = −12.
10. Determine the values of 𝑘 ∈ R so that system of linear equations
𝑥 + 𝑦 − 𝑧 = 1, 2𝑥 + 3𝑦 + 𝑘𝑧 = 3, 𝑥 + 𝑘𝑦 + 3𝑧 = 2.
has (a) no solution, (b) infinitely many solutions, (c) exactly one solution.
Ans: (a) 𝑘 = −3. (b) 𝑘 = 2. (c) 𝑘 ∉ {−3, 2}.
11. For all possible values of the scalars 𝑎 and 𝑏, discuss the number of solutions
of the linear system 𝑥 + 2𝑦 + 3𝑧 = 1, 𝑥 − 𝑎𝑦 + 21𝑧 = −2, 3𝑥 + 7𝑦 + 𝑎𝑧 = 𝑏.
Ans: 𝑎 ∉ {0, 7} : unique solution; 𝑎 = 0, 𝑏 = 5/2 or 𝑎 = 7, 𝑏 = 4/9 : infinitely
many solutions; 𝑎 = 0, 𝑏 ≠ 5/2 or 𝑎 = 7, 𝑏 ≠ 4/9 : no solutions.
(4.7) Theorem
Let 𝑇 : 𝑈 → 𝑉 be a linear transformation, where 𝑈 is a subspace of an ips 𝑉 . Let
𝑦 ∈ 𝑉 . Then the following are true:
(1) If 𝑅(𝑇 ) is finite dimensional, then 𝑇 𝑥 = 𝑦 has a least squares solution.
(2) A vector 𝑢 ∈ 𝑈 is a least squares solution of 𝑇 𝑥 = 𝑦 iff 𝑇𝑢 − 𝑦 ⊥ 𝑧 for each
𝑧 ∈ 𝑅(𝑇 ).
(4.8) Theorem
Let 𝐴 ∈ F𝑚×𝑛 , and let 𝑏 ∈ F𝑚×1 . A vector 𝑢 ∈ F𝑛×1 is a least squares solution of the
system of linear equations 𝐴𝑥 = 𝑏 iff 𝐴∗𝐴𝑢 = 𝐴∗𝑏.
Proof. Let 𝑢 1, . . . , 𝑢𝑛 be the columns of 𝐴. These vectors span 𝑅(𝐴). Using (4.7),
we see that
𝑢 is a least squares solution of 𝐴𝑥 = 𝑏
iff h𝐴𝑢 − 𝑏, 𝑢𝑖 i = 0 for 𝑖 = 1, . . . , 𝑛
iff 𝑢𝑖∗ (𝐴𝑢 − 𝑏) = 0 for 𝑖 = 1, . . . , 𝑛
iff 𝐴∗ (𝐴𝑢 − 𝑏) = 0
iff 𝐴∗𝐴𝑢 = 𝐴∗𝑏.
(4.9) Example
Suppose 𝐴 = [1 1; 0 0], 𝑏 = (0, 1)t , and 𝑢 = (1, −1)t .
We see that 𝐴 ∈ R2×2 and 𝐴t𝐴𝑢 = 𝐴t𝑏. Therefore, 𝑢 is a least squares solution of
𝐴𝑥 = 𝑏. Notice that 𝐴𝑥 = 𝑏 has no solution.
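A quick numerical check of this example, assuming Python with NumPy as the tool (not part of these notes):

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 0.0]])
b = np.array([0.0, 1.0])
u = np.array([1.0, -1.0])

# u satisfies the normal equations A^t A u = A^t b of (4.8) ...
print(np.allclose(A.T @ A @ u, A.T @ b))   # True
# ... although Ax = b itself has no solution:
print(np.allclose(A @ u, b))               # False
```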
A least squares solution can be written in a simplified form by using the QR-
factorization, which stems from Gram-Schmidt orthogonalization. To see this, we
first present this matrix factorization.
A QR-factorization of a matrix 𝐴 ∈ F𝑚×𝑛 is the determination of a
matrix 𝑄 ∈ F𝑚×𝑛 with orthonormal columns, and an upper triangular matrix
𝑅 ∈ F𝑛×𝑛 such that 𝐴 = 𝑄𝑅.
(4.10) Theorem
Each matrix with linearly independent columns has a QR-factorization, where 𝑅 is
invertible. Consequently, 𝑅 = 𝑄 ∗𝐴.
Proof. Let 𝑢 1, . . . , 𝑢𝑛 be the columns of 𝐴 ∈ F𝑚×𝑛 . Suppose the columns are linearly
independent. It ensures that 𝑚 ≥ 𝑛. Use Gram-Schmidt process and orthonormalize
to obtain the orthonormal vectors 𝑣 1, . . . , 𝑣𝑛 . We know that for each 𝑘 ∈ {1, . . . , 𝑛},
span {𝑢 1, . . . , 𝑢𝑘 } = span {𝑣 1, . . . , 𝑣𝑘 }.
In particular, 𝑢𝑘 ∈ span {𝑣 1, . . . , 𝑣𝑘 }. Hence there exist scalars 𝑎𝑖 𝑗 such that the
following equalities hold:
𝑢1 = 𝑎11𝑣1
𝑢2 = 𝑎12𝑣1 + 𝑎22𝑣2
⋮
𝑢𝑛 = 𝑎1𝑛𝑣1 + 𝑎2𝑛𝑣2 + · · · + 𝑎𝑛𝑛𝑣𝑛 .
Since the vectors 𝑢 1, . . . , 𝑢𝑛 are linearly independent, the scalars 𝑎 11, . . . , 𝑎𝑛𝑛 are
nonzero. Put 𝑎𝑖 𝑗 = 0 for 𝑖 > 𝑗 . Write 𝑅 = [𝑎𝑖 𝑗 ] and 𝑄 = [𝑣 1, · · · , 𝑣𝑛 ]. Then the
above equalities give
𝐴 = [𝑢 1, · · · , 𝑢𝑛 ] = 𝑄𝑅.
Here, 𝑄 ∈ F𝑚×𝑛 has orthonormal columns; and 𝑅 ∈ F𝑛×𝑛 is upper triangular.
Further, 𝑅 is invertible since the diagonal entries 𝑎𝑖𝑖 in 𝑅 are nonzero.
Moreover, the inner product in F𝑛×1 is given by h𝑢, 𝑣i = 𝑣 ∗𝑢. Therefore, 𝑄 has
orthonormal columns means that 𝑄 ∗𝑄 = 𝐼 . Then 𝑄𝑅 = 𝐴 implies that 𝑅 = 𝑄 ∗𝐴.
Notice that 𝑄 ∗𝑄 = 𝐼 does not imply that 𝑄𝑄 ∗ is 𝐼, in general. In case 𝐴 is a square
matrix, 𝑄𝑄 ∗ = 𝐼 and thus 𝑄 is unitary.
(4.11) Example
Let 𝐴 = [1 1; 0 1; 1 1] ∈ R3×2 . Gram-Schmidt process on the columns of 𝐴 followed by orthonormalization yields the following:
𝑤1 = 𝑢1 = (1, 0, 1)t , ‖𝑤1‖² = 𝑤1t𝑤1 = 2, 𝑣1 = 𝑤1/‖𝑤1‖ = (1/√2, 0, 1/√2)t .
𝑤2 = 𝑢2 − (𝑣1t𝑢2)𝑣1 = (1, 1, 1)t − √2 (1/√2, 0, 1/√2)t = (0, 1, 0)t , 𝑣2 = 𝑤2/‖𝑤2‖ = (0, 1, 0)t .
Therefore,
𝑄 = [𝑣1 𝑣2] = [1/√2 0; 0 1; 1/√2 0] and 𝑅 = 𝑄∗𝐴 = 𝑄t𝐴 = [√2 √2; 0 1].
Verify that 𝐴 = 𝑄𝑅.
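The construction in the proof of (4.10) can be mirrored in code. The following sketch (Python with NumPy; the function name gram_schmidt_qr is our own) computes 𝑄 and 𝑅 for the matrix of (4.11) by Gram-Schmidt.

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR-factorization of A (linearly independent columns) by
    Gram-Schmidt, following the construction in (4.10)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            w -= (Q[:, i] @ A[:, j]) * Q[:, i]   # subtract projections
        Q[:, j] = w / np.linalg.norm(w)
    R = Q.T @ A                                   # R = Q* A, upper triangular
    return Q, R

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))              # True
print(np.allclose(Q.T @ Q, np.eye(2)))    # columns of Q are orthonormal
```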
The QR-factorization can be used to express a least squares solution in closed
form.
(4.12) Theorem
Let 𝐴 ∈ F𝑚×𝑛 have linearly independent columns, and let 𝑏 ∈ F𝑚×1 . Then the least
squares solution of 𝐴𝑥 = 𝑏 is unique; and it is given by 𝑢 = 𝑅 −1𝑄 ∗𝑏, where 𝐴 = 𝑄𝑅
is a QR-factorization of 𝐴.
Proof. Write 𝐴 = 𝑄𝑅, where 𝑄 has orthonormal columns and 𝑅 is invertible, and put 𝑢 = 𝑅−1𝑄∗𝑏. Then 𝐴∗𝐴𝑢 = 𝑅∗𝑄∗𝑄𝑅𝑅−1𝑄∗𝑏 = 𝑅∗𝑄∗𝑏 = 𝐴∗𝑏. That is, 𝑢 satisfies the equation 𝐴∗𝐴𝑥 = 𝐴∗𝑏. Therefore, by (4.8), 𝑢 is a least squares solution. Since 𝐴∗𝐴 = 𝑅∗𝑅 is invertible, this least squares solution is unique.
Assume that 𝐴 has linearly independent columns. Is the least squares solution 𝑢
a solution of 𝐴𝑥 = 𝑏? We have 𝐴𝑢 = 𝑄𝑅𝑅 −1𝑄 ∗𝑏 = 𝑄𝑄 ∗𝑏. As seen earlier, this is
not necessarily equal to 𝑏; and then 𝑢 need not be a solution of 𝐴𝑥 = 𝑏. However, if 𝑄 also has orthonormal rows, that is, 𝑄𝑄∗ = 𝐼, then 𝐴𝑢 = 𝑄𝑄∗𝑏 = 𝑏, so that 𝑢 is a solution of 𝐴𝑥 = 𝑏. In that case, 𝑄, and hence 𝐴, must be a square matrix.
Again, if a solution 𝑣 exists for 𝐴𝑥 = 𝑏, then 𝑣 = 𝑢. Reason? If 𝐴𝑣 = 𝑏, then
𝑄𝑅𝑣 = 𝑏 implies that 𝑅𝑣 = 𝑄 ∗𝑏. Hence, 𝑣 = 𝑢.
Notice that the linear system 𝑅𝑢 = 𝑄 ∗𝑏 is easy to solve since 𝑅 is upper triangular.
(4.13) Example
Consider computing the least squares solution of the system 𝐴𝑥 = 𝑏, where 𝐴 is the
matrix in (4.11), and 𝑏 = (1, 2, 3) t . We have seen that
𝐴 = [1 1; 0 1; 1 1] = 𝑄𝑅, 𝑄 = [1/√2 0; 0 1; 1/√2 0], 𝑅 = [√2 √2; 0 1].
If 𝑢 is the least squares solution of 𝐴𝑥 = 𝑏, then 𝑢 satisfies 𝑅𝑢 = 𝑄∗𝑏, that is,
√2 𝑢1 + √2 𝑢2 = 2√2, 𝑢2 = 2.
Back substitution gives 𝑢2 = 2 and 𝑢1 = 0; so 𝑢 = (0, 2)t . The normal equations 𝐴t𝐴𝑢 = 𝐴t𝑏, namely 2𝑢1 + 2𝑢2 = 4, 2𝑢1 + 3𝑢2 = 6, lead to the same solution.
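The same least squares solution can be computed numerically. Here we use NumPy's built-in QR routine rather than our own, so the signs inside 𝑄 and 𝑅 may differ from (4.11), but 𝑅−1𝑄t𝑏 is unchanged; this is only an illustrative check.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)               # reduced QR-factorization
u = np.linalg.solve(R, Q.T @ b)      # u = R^{-1} Q^t b
print(u)                             # approximately [0., 2.]

# Cross-check with NumPy's least squares solver.
print(np.linalg.lstsq(A, b, rcond=None)[0])
```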
(4.14) Example
1. Consider the ordered basis 𝐵 = {(1, −1), (1, 0)} for F2 . Then
(0, 1) = −1(1, −1) + 1(1, 0). Thus [(0, 1)]𝐵 = (−1, 1) t .
2. Consider the ordered basis 𝐵 = {(1, 0), (1, −1)} for F2 . Then
(0, 1) = 1(1, 0) + −1(1, −1). Thus [(0, 1)]𝐵 = (1, −1) t .
3. Consider the ordered basis 𝐵 = {1, 1 + 𝑡, 1 + 𝑡 2 } for F2 [𝑡]. Then
1 + 𝑡 + 𝑡 2 = −1(1) + 1(1 + 𝑡) + 1(1 + 𝑡 2 ). Thus [1 + 𝑡 + 𝑡 2 ]𝐵 = (−1, 1, 1) t .
The coordinate vectors would be possibly different if we alter the positions of the
basis vectors in the ordered basis. Further, we speak of a coordinate vector map
with respect to a given ordered basis of a finite dimensional vector space.
(4.15) Theorem
Each coordinate vector map is an isomorphism.
𝑥 = 𝛼1𝑣1 + · · · + 𝛼𝑛𝑣𝑛 , 𝑦 = 𝛽1𝑣1 + · · · + 𝛽𝑛𝑣𝑛 .
That is, the coordinate vector of 𝑇 (𝑣) can be obtained once we know the coordinate
vectors of 𝑇 (𝑣 1 ), . . . ,𝑇 (𝑣𝑛 ) with respect to 𝐸. Suppose the coordinate vectors of
𝑇 (𝑣 𝑗 ) are given as follows:
[𝑇 (𝑣1)]𝐸 = (𝑎11, . . . , 𝑎𝑚1)t , · · · , [𝑇 (𝑣𝑗)]𝐸 = (𝑎1𝑗, . . . , 𝑎𝑚𝑗)t , · · · , [𝑇 (𝑣𝑛)]𝐸 = (𝑎1𝑛, . . . , 𝑎𝑚𝑛)t
for scalars 𝑎𝑖 𝑗 . Notice that this is equivalent to expressing 𝑇 (𝑣 𝑗 ) in terms of the basis
vectors 𝑤 1, . . . , 𝑤𝑚 as in the following:
𝑇 (𝑣 1 ) = 𝑎 11𝑤 1 + · · · + 𝑎𝑚1𝑤𝑚
..
.
𝑇 (𝑣𝑛 ) = 𝑎 1𝑛𝑤 1 + · · · + 𝑎𝑚𝑛𝑤𝑚 .
We put together the coordinate vectors [𝑇 (𝑣 𝑗 )]𝐸 in that order to obtain the follow-
ing array of scalars 𝑎𝑖 𝑗 :
[ [𝑇 (𝑣1)]𝐸 · · · [𝑇 (𝑣𝑗)]𝐸 · · · [𝑇 (𝑣𝑛)]𝐸 ] =
𝑎11 · · · 𝑎1𝑗 · · · 𝑎1𝑛
 ⋮         ⋮         ⋮
𝑎𝑖1 · · · 𝑎𝑖𝑗 · · · 𝑎𝑖𝑛
 ⋮         ⋮         ⋮
𝑎𝑚1 · · · 𝑎𝑚𝑗 · · · 𝑎𝑚𝑛 .
Such an array of 𝑚𝑛 scalars 𝑎𝑖 𝑗 is called an 𝑚 × 𝑛 matrix with entries from F. The
set of all 𝑚 × 𝑛 matrices with entries from F is denoted by F𝑚×𝑛 . The scalar at the
𝑖th row and the 𝑗th column of a matrix is called its (𝑖, 𝑗)th entry. By writing
𝐴 = [𝑎𝑖 𝑗 ] ∈ F𝑚×𝑛
Then
[𝑇 ]𝐸,𝐵 =
𝑎11 · · · 𝑎1𝑗 · · · 𝑎1𝑛
 ⋮         ⋮         ⋮
𝑎𝑖1 · · · 𝑎𝑖𝑗 · · · 𝑎𝑖𝑛
 ⋮         ⋮         ⋮
𝑎𝑚1 · · · 𝑎𝑚𝑗 · · · 𝑎𝑚𝑛 .
Caution: Mark which 𝑎𝑖 𝑗 goes where.
(4.16) Example
In the following, we consider all bases as ordered bases.
1. Let 𝐵 = {𝑒 1, 𝑒 2 } and 𝐸 = {𝑓1, 𝑓2, 𝑓3 } be the standard bases for R2 and R3,
respectively. Consider the linear transformation 𝑇 : R2 → R3 given by 𝑇 (𝑎, 𝑏) =
(2𝑎 − 𝑏, 𝑎 + 𝑏, 𝑏 − 𝑎). Then
𝑇 (𝑒1) = (2, 1, −1) = 2 𝑓1 + 1 𝑓2 − 1 𝑓3 and 𝑇 (𝑒2) = (−1, 1, 1) = −1 𝑓1 + 1 𝑓2 + 1 𝑓3,
so that the columns of [𝑇 ]𝐸,𝐵 are (2, 1, −1)t and (−1, 1, 1)t .
2. Let 𝐵 = {1, 𝑡, 𝑡², 𝑡³} and 𝐸 = {1, 𝑡, 𝑡²} be the standard bases for R3 [𝑡] and R2 [𝑡], respectively, and let 𝐷 : R3 [𝑡] → R2 [𝑡] be the differentiation map 𝐷 (𝑝 (𝑡)) = 𝑝′(𝑡). Then
𝐷 (1) = 0 × 1 + 0 × 𝑡 + 0 × 𝑡²
𝐷 (𝑡) = 1 × 1 + 0 × 𝑡 + 0 × 𝑡²
𝐷 (𝑡²) = 0 × 1 + 2 × 𝑡 + 0 × 𝑡²
𝐷 (𝑡³) = 0 × 1 + 0 × 𝑡 + 3 × 𝑡² .
So,
[𝐷]𝐸,𝐵 = [0 1 0 0; 0 0 2 0; 0 0 0 3], [𝑎 + 𝑏𝑡 + 𝑐𝑡² + 𝑑𝑡³]𝐵 = (𝑎, 𝑏, 𝑐, 𝑑)t , [𝐷 (𝑎 + 𝑏𝑡 + 𝑐𝑡² + 𝑑𝑡³)]𝐸 = (𝑏, 2𝑐, 3𝑑)t .
3. With the same linear transformation 𝐷 and the basis 𝐵 for R3 [𝑡] as in (2), let
𝐸 = {1, 1 + 𝑡, 1 + 𝑡 2 }. Then
𝐷 (1) = 0 × 1 + 0 × (1 + 𝑡) + 0 × (1 + 𝑡 2 )
𝐷 (𝑡) = 1 × 1 + 0 × (1 + 𝑡) + 0 × (1 + 𝑡 2 )
𝐷 (𝑡 2 ) = −2 × 1 + 2 × (1 + 𝑡) + 0 × (1 + 𝑡 2 )
𝐷 (𝑡 3 ) = −3 × 1 + 0 × (1 + 𝑡) + 3 × (1 + 𝑡 2 ).
Therefore, [𝐷]𝐸,𝐵 = [0 1 −2 −3; 0 0 2 0; 0 0 0 3].
𝐷 (𝑎 + 𝑏𝑡 + 𝑐𝑡² + 𝑑𝑡³) = 𝑏 + 2𝑐𝑡 + 3𝑑𝑡² = (𝑏 − 2𝑐 − 3𝑑) × 1 + 2𝑐 (1 + 𝑡) + 3𝑑 (1 + 𝑡²).
So, [𝑎 + 𝑏𝑡 + 𝑐𝑡² + 𝑑𝑡³]𝐵 = (𝑎, 𝑏, 𝑐, 𝑑)t and [𝐷 (𝑎 + 𝑏𝑡 + 𝑐𝑡² + 𝑑𝑡³)]𝐸 = (𝑏 − 2𝑐 − 3𝑑, 2𝑐, 3𝑑)t .
4. Let 𝐵 = {1, 1 + 𝑡, 𝑡 + 𝑡 2 } and 𝐸 = {1, 𝑡, 𝑡 + 𝑡 2, 𝑡 2 + 𝑡 3 } be bases for R2 [𝑡] and
R3 [𝑡], respectively.
Let 𝑇 : R2 [𝑡] → R3 [𝑡] be the linear transformation given by 𝑇 (𝑝 (𝑡)) = ∫₀ᵗ 𝑝 (𝑠) 𝑑𝑠. Then
𝑇 (1) = ∫₀ᵗ 𝑑𝑠 = 0 × 1 + 1 × 𝑡 + 0 (𝑡 + 𝑡²) + 0 (𝑡² + 𝑡³)
𝑇 (1 + 𝑡) = ∫₀ᵗ (1 + 𝑠) 𝑑𝑠 = 0 × 1 + (1/2) × 𝑡 + (1/2) (𝑡 + 𝑡²) + 0 (𝑡² + 𝑡³)
𝑇 (𝑡 + 𝑡²) = ∫₀ᵗ (𝑠 + 𝑠²) 𝑑𝑠 = 0 × 1 − (1/6) × 𝑡 + (1/6) (𝑡 + 𝑡²) + (1/3) (𝑡² + 𝑡³).
Therefore, [𝑇 ]𝐸,𝐵 = [0 0 0; 1 1/2 −1/6; 0 1/2 1/6; 0 0 1/3].
The coordinate vectors of a typical vector in R2 [𝑡] and its image are:
𝑣 = 𝑎 + 𝑏𝑡 + 𝑐𝑡² = (𝑎 − 𝑏 + 𝑐) × 1 + (𝑏 − 𝑐) (1 + 𝑡) + 𝑐 (𝑡 + 𝑡²)
𝑇 (𝑣) = 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡²) = ∫₀ᵗ (𝑎 + 𝑏𝑠 + 𝑐𝑠²) 𝑑𝑠 = 𝑎𝑡 + (𝑏/2)𝑡² + (𝑐/3)𝑡³
      = 0 × 1 + (𝑎 − 𝑏/2 + 𝑐/3) 𝑡 + (𝑏/2 − 𝑐/3) (𝑡 + 𝑡²) + (𝑐/3) (𝑡² + 𝑡³).
Therefore, [𝑣]𝐵 = (𝑎 − 𝑏 + 𝑐, 𝑏 − 𝑐, 𝑐)t and [𝑇 (𝑣)]𝐸 = (0, 𝑎 − 𝑏/2 + 𝑐/3, 𝑏/2 − 𝑐/3, 𝑐/3)t .
As we see, a linear transformation𝑇 : 𝑉 → 𝑊 with fixed ordered bases 𝐵 for𝑉 , and
𝐸 for 𝑊 , gives rise to a matrix. We can also construct back the linear transformation
from such a given matrix. For, suppose 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 }. Let
Í
𝑣 ∈ 𝑉 . There exist unique scalars 𝛽 1, . . . , 𝛽𝑛 such that 𝑣 = 𝑛𝑗=1 𝛽 𝑗 𝑣 𝑗 . Then
𝑇 (𝑣) = Σ𝑗 𝛽𝑗𝑇 (𝑣𝑗) = Σ𝑗 𝛽𝑗 (𝑎1𝑗𝑤1 + · · · + 𝑎𝑚𝑗𝑤𝑚), the sums running over 𝑗 = 1, . . . , 𝑛.
We thus say that the matrix [𝑇 ]𝐸,𝐵 represents the linear transformation 𝑇 .
𝑇 (𝑥) = 𝑎1𝑇 (𝑒1) + · · · + 𝑎𝑛𝑇 (𝑒𝑛) = 𝐴 (𝑎1, . . . , 𝑎𝑛)t = 𝐴𝑥 .
This justifies our terminology: the matrix 𝐴 represents the linear transformation 𝑇 .
In abstract vector spaces, how are the coordinate vectors of 𝑣, 𝑇 (𝑣) and the matrix
of 𝑇 related? Look back at (4.16), where we computed the coordinate vector of a
typical vector and also that of its image. In the first problem there, with 𝑣 = (𝑎, 𝑏),
we had
[𝑇 ]𝐸,𝐵 = [2 −1; 1 1; −1 1], [𝑣]𝐵 = (𝑎, 𝑏)t , [𝑇 (𝑣)]𝐸 = (2𝑎 − 𝑏, 𝑎 + 𝑏, −𝑎 + 𝑏)t .
In the third problem, we had obtained
[𝑇 ]𝐸,𝐵 = [0 1 −2 −3; 0 0 2 0; 0 0 0 3], [𝑣]𝐵 = (𝑎, 𝑏, 𝑐, 𝑑)t , [𝑇 (𝑣)]𝐸 = (𝑏 − 2𝑐 − 3𝑑, 2𝑐, 3𝑑)t .
In the fourth problem, we had obtained
[𝑇 ]𝐸,𝐵 = [0 0 0; 1 1/2 −1/6; 0 1/2 1/6; 0 0 1/3], [𝑣]𝐵 = (𝑎 − 𝑏 + 𝑐, 𝑏 − 𝑐, 𝑐)t , [𝑇 (𝑣)]𝐸 = (0, 𝑎 − 𝑏/2 + 𝑐/3, 𝑏/2 − 𝑐/3, 𝑐/3)t .
Can you see how we obtain the column vector [𝑇 (𝑣)]𝐸 from the matrix [𝑇 ]𝐸,𝐵
and the column vector [𝑣]𝐵 ? The rule is simple. The first component of [𝑇 (𝑣)]𝐸 is
the dot product of the first row of [𝑇 ]𝐸,𝐵 with the column vector [𝑣]𝐵 . The second
component of [𝑇 (𝑣)]𝐸 is the dot product of the second row of [𝑇 ]𝐸,𝐵 with the column
vector [𝑣]𝐵 ; and so on.
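The rule is just the matrix-vector product. A small numerical check with the data of the fourth problem (Python with NumPy; the sample values of 𝑎, 𝑏, 𝑐 are arbitrary choices of ours):

```python
import numpy as np

# [T]_{E,B} and [v]_B from the fourth problem in (4.16), with v = a + b t + c t^2.
a, b, c = 2.0, 3.0, 5.0
T = np.array([[0,   0,    0 ],
              [1, 1/2, -1/6],
              [0, 1/2,  1/6],
              [0,   0,  1/3]])
v_B = np.array([a - b + c, b - c, c])

print(T @ v_B)
# equals (0, a - b/2 + c/3, b/2 - c/3, c/3) = (0, 2.1666..., -0.1666..., 1.6666...)
```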
(4.17) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 } be ordered bases for the vector spaces
𝑉 and 𝑊 , respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then [𝑇 ]𝐸,𝐵 is
the unique matrix in F𝑚×𝑛 such that for each 𝑥 ∈ 𝑉 ,
[𝑇 𝑥]𝐸 = [𝑇 ]𝐸,𝐵 [𝑥]𝐵 .
Therefore, [𝑇 ]𝐸,𝐵 = 𝐴.
Let 𝑆,𝑇 : 𝑉 → 𝑊 be linear transformations and let 𝛼 be a scalar. Then the functions 𝑆 + 𝑇 : 𝑉 → 𝑊 and 𝛼𝑆 : 𝑉 → 𝑊 defined by (𝑆 + 𝑇 )(𝑣) = 𝑆 (𝑣) + 𝑇 (𝑣) and (𝛼𝑆)(𝑣) = 𝛼 𝑆 (𝑣) are again linear transformations.
(4.18) Theorem
Let 𝐵 and 𝐸 be ordered bases for the finite dimensional vector spaces 𝑉 and 𝑊 ,
respectively. Let 𝑆, 𝑇 : 𝑉 → 𝑊 be linear transformations. Let 𝛼 ∈ F. Then
[𝑆 + 𝑇 ]𝐸,𝐵 = [𝑆]𝐸,𝐵 + [𝑇 ]𝐸,𝐵 and [𝛼𝑆]𝐸,𝐵 = 𝛼 [𝑆]𝐸,𝐵 .
Proof. Let 𝑥 ∈ 𝑉 . Since the coordinate vector of a sum is the sum of the coordinate
vectors, we have
(4.19) Theorem
Let 𝐵, 𝐶 and 𝐷 be ordered bases for the finite dimensional vector spaces 𝑈 , 𝑉 and
𝑊 , respectively. Let 𝑆 : 𝑈 → 𝑉 and 𝑇 : 𝑉 → 𝑊 be linear transformations. Then
[𝑇 𝑆]𝐷,𝐵 = [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 .
That is, the 𝑗th column of [𝑇 𝑆]𝐷,𝐵 is same as the 𝑗th column of [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 for each
such 𝑗 . Therefore, [𝑇 𝑆]𝐷,𝐵 = [𝑇 ]𝐷,𝐶 [𝑆]𝐶,𝐵 .
As you see, the addition of matrices, a scalar multiple of a matrix, and product of
two matrices are defined in such a way that as linear transformations, they correspond
to the sum of linear transformations, a scalar multiple of a linear transformation,
and the composition of two linear transformations, respectively.
We consider a particular case. Suppose 𝑇 is an isomorphism from 𝑉 to 𝑊 , where
dim (𝑉 ) = 𝑛 = dim (𝑊 ). Then its inverse, written as 𝑇 −1 is an isomorphism from
𝑊 to 𝑉 . Then 𝑇 −1𝑇 = 𝐼, the identity map on 𝑉 . Now, fix ordered bases 𝐵 for 𝑉 , and
𝐸 for 𝑊 . If 𝐵 = {𝑣1, . . . , 𝑣𝑛 }, then 𝑇 −1𝑇 (𝑣𝑗) = 𝑣𝑗 for each 𝑗 .
Then the 𝑛 × 𝑛 matrix [𝑇 −1𝑇 ]𝐵,𝐵 has the 𝑗th column as 𝑒𝑗 ∈ F𝑛×1 . Similarly, the 𝑖th
column of [𝑇𝑇 −1 ]𝐸,𝐸 is also 𝑒𝑖 , where 𝐸 is an ordered basis of 𝑊 .
An isomorphism maps a basis onto a basis. Looking at an 𝑛 × 𝑛 matrix as a linear
transformation, we see that the images of the standard basis vectors are the columns
of the matrix. It then follows that an 𝑛 × 𝑛 matrix is invertible iff its columns form a basis for F𝑛×1 .
Further, we observe that the matrix representation of the identity map with respect
to the same basis in both copies of the vector space is the identity matrix.
The matrix representation of an isomorphism and that of its inverse have an
obvious connection.
(4.20) Theorem
Let 𝐵 and 𝐸 be ordered bases for finite dimensional vector spaces 𝑉 and 𝑊 ,
respectively. Let 𝑇 : 𝑉 → 𝑊 be an isomorphism. Then
[𝑇 −1 ]𝐵,𝐸 = ([𝑇 ]𝐸,𝐵 ) −1 .
We know that F𝑚×𝑛 is a vector space with addition and scalar multiplication of
matrices. Denote the set of all linear transformations from 𝑉 to 𝑊 by L (𝑉 ,𝑊 ). It
is easy to verify that L (𝑉 ,𝑊 ) is a vector space over the same underlying field with
the addition and scalar multiplication of linear transformations as mentioned earlier.
Can you get a basis for L (𝑉 ,𝑊 ) looking at the basis {𝐸𝑖 𝑗 : 1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛}
for F𝑚×𝑛 ?
Let 𝐵 = {𝑣1, . . . , 𝑣𝑛 } be an orthonormal basis for 𝑉 . Let 𝑢, 𝑣 ∈ 𝑉 . Then 𝑢 = Σ𝑖 ⟨𝑢, 𝑣𝑖⟩𝑣𝑖 , 𝑣 = Σ𝑗 ⟨𝑣, 𝑣𝑗⟩𝑣𝑗 , and
⟨𝑢, 𝑣⟩ = Σ𝑖 Σ𝑗 ⟨𝑢, 𝑣𝑖⟩⟨𝑣, 𝑣𝑗⟩⟨𝑣𝑖 , 𝑣𝑗⟩ = Σ𝑖 ⟨𝑢, 𝑣𝑖⟩⟨𝑣, 𝑣𝑖⟩ = [𝑢]𝐵 · [𝑣]𝐵 ,
the sums running from 1 to 𝑛.
An orthonormal basis converts the inner product to the dot product. Using an
orthonormal basis in a finite dimensional inner product space amounts to working
in F𝑛×1 with the dot product. Moreover, orthonormal bases allow writing the entries
of the matrix representation of a linear transformation by using the inner products;
see the following theorem.
(4.21) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 } be ordered bases of the inner product
spaces 𝑉 and 𝑊 , respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. If 𝐸 is
an orthonormal basis of 𝑊 , then the (𝑖 𝑗)th entry of [𝑇 ]𝐸,𝐵 is equal to h𝑇 𝑣 𝑗 , 𝑤𝑖 i.
Proof. For 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛, let 𝑎𝑖𝑗 denote the (𝑖 𝑗)th entry of the matrix [𝑇 ]𝐸,𝐵 . Then 𝑇 𝑣𝑗 = 𝑎1𝑗𝑤1 + · · · + 𝑎𝑚𝑗𝑤𝑚 . Since 𝐸 is orthonormal,
⟨𝑇 𝑣𝑗 , 𝑤𝑖⟩ = ⟨𝑎1𝑗𝑤1 + · · · + 𝑎𝑚𝑗𝑤𝑚 , 𝑤𝑖⟩ = 𝑎𝑖𝑗 ⟨𝑤𝑖 , 𝑤𝑖⟩ = 𝑎𝑖𝑗 .
(4.22) Theorem
Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐸 = {𝑤 1, . . . , 𝑤𝑚 } be orthonormal ordered bases of the
ips 𝑉 and 𝑊 , respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then
[𝑇 ∗ ]𝐵,𝐸 = ([𝑇 ]𝐸,𝐵 ) ∗ .
𝑏𝑖𝑗 = ⟨𝑇 ∗𝑤𝑗 , 𝑣𝑖⟩, which is the complex conjugate of ⟨𝑣𝑖 , 𝑇 ∗𝑤𝑗⟩ = ⟨𝑇 𝑣𝑖 , 𝑤𝑗⟩ = 𝑎𝑗𝑖 .
(4.23) Example
Let 𝑢 1 = (1, 1, 0), 𝑢 2 = (1, 0, 1) and 𝑢 3 = (0, 1, 1). Consider 𝐸 = {𝑢 1, 𝑢 2, 𝑢 3 } as a
basis of R3, and the standard basis 𝐵 = {𝑒 1, 𝑒 2, 𝑒 3, 𝑒 4 } of R4 . Use the standard inner
products (the dot products) on these spaces. Consider the linear transformation
𝑇 : R4 → R3 defined by
𝑇 (𝑎, 𝑏, 𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 − 2𝑐 + 𝑑, 𝑎 − 𝑏 + 𝑐 − 𝑑).
Its adjoint works out to be 𝑇 ∗ (𝛼, 𝛽, 𝛾) = (𝛼 + 𝛾, 𝛽 − 𝛾, 𝛼 − 2𝛽 + 𝛾, 𝛽 − 𝛾). To obtain [𝑇 ]𝐸,𝐵 , we compute
𝑇 𝑒1 = 𝑇 (1, 0, 0, 0) = (1, 0, 1) = 0 𝑢1 + 1 𝑢2 + 0 𝑢3
𝑇 𝑒2 = 𝑇 (0, 1, 0, 0) = (0, 1, −1) = 1 𝑢1 − 1 𝑢2 + 0 𝑢3
𝑇 𝑒3 = 𝑇 (0, 0, 1, 0) = (1, −2, 1) = −1 𝑢 1 + 2 𝑢 2 − 1 𝑢 3
𝑇 𝑒4 = 𝑇 (0, 0, 0, 1) = (0, 1, −1) = 1 𝑢1 − 1 𝑢2 + 0 𝑢3
The connection between special types of linear operators and their matrix repre-
sentations can be stated in the presence of an orthonormal basis.
(4.24) Theorem
Let 𝑇 be a linear operator on a finite dimensional inner product space 𝑉 . Let 𝐵 be
an orthonormal ordered basis of 𝑉 .
(1) 𝑇 is self-adjoint iff [𝑇 ]𝐵,𝐵 is hermitian.
(2) 𝑇 is normal iff [𝑇 ]𝐵,𝐵 is normal.
(3) 𝑇 is unitary iff [𝑇 ]𝐵,𝐵 is unitary.
            𝑇
    𝑉  ─────→  𝑊
[ ]𝐵 ↓≅          ↓≅ [ ]𝐶
  F𝑛×1 ─────→ F𝑚×1
          [𝑇 ]𝐶,𝐵
It means
𝑇 = [ ]𝐶−1 ◦ [𝑇 ]𝐶,𝐵 ◦ [ ]𝐵 , [𝑇 ]𝐶,𝐵 = [ ]𝐶 ◦ 𝑇 ◦ [ ]𝐵−1 . (4.5.1)
Also, [ ]𝐶 ◦ 𝑇 = [𝑇 ]𝐶,𝐵 ◦ [ ]𝐵 , which amounts to [𝑇 𝑥]𝐶 = [𝑇 ]𝐶,𝐵 [𝑥]𝐵 for each 𝑥 ∈ 𝑉 .
We know that isomorphisms preserve rank and nullity. Since the coordinate vector
maps are isomorphisms, we obtain the following theorem.
(4.25) Theorem
Let 𝑉 and 𝑊 be finite dimensional vector spaces with ordered bases 𝐵 and 𝐶,
respectively. Let 𝑇 : 𝑉 → 𝑊 be a linear transformation. Then
rank(𝑇 ) = rank([𝑇 ]𝐶,𝐵 ) and null(𝑇 ) = null([𝑇 ]𝐶,𝐵 ).
Since we are able to go from 𝑇 to its matrix representation [𝑇 ]𝐶,𝐵 and back in
a unique manner, it suggests that the matrix representation itself is some sort of
isomorphism. It is easy to verify that the map 𝑇 ↦→ [𝑇 ]𝐶,𝐵 is an isomorphism from
L (𝑉 ,𝑊 ) to F𝑚×𝑛 .
It thus follows that dim (L (𝑉 ,𝑊 )) = 𝑚𝑛. Alternatively, a basis for L (𝑉 ,𝑊 ) can
be constructed explicitly. Let {𝑣 1, . . . , 𝑣𝑛 } and {𝑤 1, . . . , 𝑤𝑚 } be ordered bases for
the vector spaces 𝑉 and 𝑊 , respectively. Suppose 1 ≤ 𝑖 ≤ 𝑛 and 1 ≤ 𝑗 ≤ 𝑚.
Define 𝑇𝑖 𝑗 : 𝑉 → 𝑊 by 𝑇𝑖 𝑗 (𝑣𝑖 ) = 𝑤 𝑗 , 𝑇𝑖 𝑗 (𝑣𝑘 ) = 0 for 𝑘 ≠ 𝑖. Then show that the set
{𝑇𝑖 𝑗 : 𝑖 = 1, . . . , 𝑛, 𝑗 = 1, . . . , 𝑚} is a basis for L (𝑉 ,𝑊 ).
We look at a particular case of the composition formulas in (4.5.1). Consider
a vector space 𝑉 of dimension 𝑛, with ordered bases 𝑂 = {𝑣 1, . . . , 𝑣𝑛 } and 𝑁 =
{𝑤 1, . . . , 𝑤𝑛 }. Consider the identity map 𝐼 on 𝑉 . Let us write 𝑉𝑂 for the vector space
𝑉 where we take the ordered basis as 𝑂. Similarly, write 𝑉𝑁 for the same space
but with the ordered basis 𝑁 . Fix the standard basis 𝐸 = {𝑒 1, . . . , 𝑒𝑛 } for F𝑛×1 . The
matrix representation diagram now looks like
            𝐼
   𝑉𝑂  ─────→  𝑉𝑁
[ ]𝑂 ↓≅          ↓≅ [ ]𝑁
  F𝑛×1 ─────→ F𝑛×1
          [𝐼 ]𝑁 ,𝑂
Thus the 𝑛 × 𝑛 matrix [𝐼 ]𝑁 ,𝑂 is called the change of basis matrix. This matrix records the change in the coordinate vectors while we change the basis of 𝑉 from 𝑂 to 𝑁 .
(4.26) Example
Consider two ordered bases for R3 such as
𝑂 = {(1, 0, 1), (1, 1, 0), (0, 1, 1)}, 𝑁 = {(1, −1, 1), (1, 1, −1), (−1, 1, 1)}.
Find the change of basis matrix [𝐼 ]𝑁 ,𝑂 and verify that [𝑣]𝑁 = [𝐼 ]𝑁 ,𝑂 [𝑣]𝑂 for the
vector 𝑣 = (1, 2, 3).
We need to express each vector in 𝑂 as a linear combination of vectors in 𝑁 ; the coefficients so obtained, written column-wise, give the matrix [𝐼 ]𝑁 ,𝑂 shown below. For the vector 𝑣 = (1, 2, 3), we have
𝑣 = (1, 2, 3) = 2 (1, −1, 1) + (3/2) (1, 1, −1) + (5/2) (−1, 1, 1), so [𝑣]𝑁 = (2, 3/2, 5/2)t ;
𝑣 = (1, 2, 3) = 1 (1, 0, 1) + 0 (1, 1, 0) + 2 (0, 1, 1), so [𝑣]𝑂 = (1, 0, 2)t .
[𝐼 ]𝑁 ,𝑂 [𝑣]𝑂 = [1 1/2 1/2; 1/2 1 1/2; 1/2 1/2 1] (1, 0, 2)t = (2, 3/2, 5/2)t = [𝑣]𝑁 .
In (4.26), construct the matrix 𝐵 by taking its columns as the transposes of vectors
in 𝑂, keeping the same order of the vectors as given in 𝑂. Similarly, construct the
matrix 𝐶 by taking its columns as the transposes of vectors from 𝑁 , again keeping
the same order. We claim that [𝐼 ]𝑁 ,𝑂 = 𝐶 −1 𝐵. Indeed, the following may be easily
verified:
𝐶 [𝐼 ]𝑁 ,𝑂 = [1 1 −1; −1 1 1; 1 −1 1] [1 1/2 1/2; 1/2 1 1/2; 1/2 1/2 1] = [1 1 0; 0 1 1; 1 0 1] = 𝐵.
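The observation [𝐼 ]𝑁 ,𝑂 = 𝐶 −1 𝐵 is easy to verify numerically; a minimal sketch with the data of (4.26), assuming Python with NumPy:

```python
import numpy as np

# Basis vectors written as columns.
B = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)     # columns: vectors of O
C = np.array([[ 1, 1, -1],
              [-1, 1,  1],
              [ 1,-1,  1]], dtype=float)   # columns: vectors of N

I_NO = np.linalg.inv(C) @ B                # change of basis matrix [I]_{N,O}
print(I_NO)                                # [[1, 1/2, 1/2], [1/2, 1, 1/2], [1/2, 1/2, 1]]

v_O = np.array([1.0, 0.0, 2.0])            # [v]_O for v = (1, 2, 3)
print(I_NO @ v_O)                          # [2, 1.5, 2.5] = [v]_N
```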
In general, if 𝑉 = F𝑛×1, then the change of basis matrix can be given in a closed form using the given basis vectors.
(4.27) Theorem
Let 𝑂 = {𝑣 1, . . . , 𝑣𝑛 } and 𝑁 = {𝑤 1, . . . , 𝑤𝑛 } be ordered bases for F𝑛×1 . Let 𝐸 =
{𝑒 1, . . . , 𝑒𝑛 } be the standard basis for F𝑛×1 . Then the change of basis matrices [𝐼 ]𝐸,𝑂
and [𝐼 ]𝑁 ,𝑂 are given by
[𝐼 ]𝐸,𝑂 = [𝑣 1 · · · 𝑣𝑛 ] and [𝐼 ]𝑁 ,𝑂 = [𝑤 1 · · · 𝑤𝑛 ] −1 [𝑣 1 · · · 𝑣𝑛 ].
That is, the 𝑗th column of the matrix [𝐼 ]𝐸,𝑂 is simply 𝑣 𝑗 . So,
[𝐼 ]𝐸,𝑂 = [𝑣 1 · · · 𝑣𝑛 ].
In fact, columns of any invertible 𝑛 × 𝑛 matrix form a basis for F𝑛×1 . Therefore,
any invertible matrix is a change of basis matrix in this sense. It gives rise to the
following generalization.
(4.28) Theorem
Let 𝐴 ∈ F𝑚×𝑛 . Let 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } and 𝐶 = {𝑤 1, . . . , 𝑤𝑚 } be ordered bases for F𝑛×1
and F𝑚×1, respectively. Then [𝐴]𝐶,𝐵 = 𝑄 −1𝐴𝑃, where
𝑄 = [𝑤 1 · · · 𝑤𝑚 ] and 𝑃 = [𝑣 1 · · · 𝑣𝑛 ].
Proof. Take the standard bases for F𝑛×1 and F𝑚×1 as 𝐷 and 𝐸, respectively. Now,
[𝐼 ]𝐷,𝐵 = [𝑣 1 · · · 𝑣𝑛 ], [𝐼 ]𝐸,𝐶 = [𝑤 1 · · · 𝑤𝑚 ], and [𝐴]𝐸,𝐷 = 𝐴.
Then [𝐴]𝐶,𝐵 = [𝐼 ]𝐶,𝐸 [𝐴]𝐸,𝐷 [𝐼 ]𝐷,𝐵 = [𝑤 1 · · · 𝑤𝑚 ] −1 𝐴 [𝑣 1 · · · 𝑣𝑛 ].
As (4.28) shows, a matrix 𝐴 ∈ F𝑚×𝑛 and a matrix 𝑄 −1𝐴𝑃, where both 𝑄 ∈ F𝑚×𝑚
and 𝑃 ∈ F𝑛×𝑛 are invertible, represent the same linear transformation with respect
to ordered bases chosen in both the domain and the co-domain spaces. Note that a
matrix 𝐴 ∈ F𝑚×𝑛 is equal to [𝐴]𝐸 0,𝐸 , where 𝐸 is the standard basis for F𝑛×1 and 𝐸 0 is
the standard basis for F𝑚×1 .
(4.29) Example
Consider the matrix 𝐴 = [1 2 3; 0 1 1] as a linear transformation from R3×1 to R2×1 . It maps 𝑣 = (𝑎, 𝑏, 𝑐)t to 𝐴𝑣 = (𝑎 + 2𝑏 + 3𝑐, 𝑏 + 𝑐)t for 𝑎, 𝑏, 𝑐 ∈ R.
Choose ordered bases
𝐵 = {(1, 1, 1)t , (−1, 0, 1)t , (0, 1, −1)t } for R3×1, 𝐶 = {(1, 1)t , (1, −1)t } for R2×1 .
Then
𝑄 = [1 1; 1 −1], 𝑄 −1 = (1/2) [1 1; 1 −1], 𝑃 = [1 −1 0; 1 0 1; 1 1 −1].
We write 𝑣 = (𝑎, 𝑏, 𝑐) t as a linear combination of basis vectors from 𝐵, and
𝐴𝑣 = (𝑎 + 2𝑏 + 3𝑐, 𝑏 + 𝑐) t as a linear combination of basis vectors from 𝐶.
(𝑎, 𝑏, 𝑐)t = (1/3)(𝑎 + 𝑏 + 𝑐) (1, 1, 1)t + (1/3)(−2𝑎 + 𝑏 + 𝑐) (−1, 0, 1)t + (1/3)(−𝑎 + 2𝑏 − 𝑐) (0, 1, −1)t ,
(𝑎 + 2𝑏 + 3𝑐, 𝑏 + 𝑐)t = (1/2)(𝑎 + 3𝑏 + 4𝑐) (1, 1)t + (1/2)(𝑎 + 𝑏 + 2𝑐) (1, −1)t .
Thus their coordinate vectors with respect to the bases 𝐵 and 𝐶 are as follows:
[𝑣]𝐵 = (1/3)(𝑎 + 𝑏 + 𝑐, −2𝑎 + 𝑏 + 𝑐, −𝑎 + 2𝑏 − 𝑐)t , [𝐴𝑣]𝐶 = (1/2)(𝑎 + 3𝑏 + 4𝑐, 𝑎 + 𝑏 + 2𝑐)t .
Then the matrix representation of 𝐴 with respect to the new bases is given by
[𝐴]𝐶,𝐵 = 𝑄 −1𝐴𝑃 = (1/2) [1 1; 1 −1] [1 2 3; 0 1 1] [1 −1 0; 1 0 1; 1 1 −1] = (1/2) [8 3 −1; 4 1 −1].
As it should happen, we see that
[𝐴]𝐶,𝐵 [𝑣]𝐵 = (1/2) [8 3 −1; 4 1 −1] · (1/3)(𝑎 + 𝑏 + 𝑐, −2𝑎 + 𝑏 + 𝑐, −𝑎 + 2𝑏 − 𝑐)t = (1/2)(𝑎 + 3𝑏 + 4𝑐, 𝑎 + 𝑏 + 2𝑐)t = [𝐴𝑣]𝐶 .
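A numerical check of (4.29), again as a sketch in Python with NumPy (our own illustration; the sample values of 𝑎, 𝑏, 𝑐 are arbitrary):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, 1, 1]], dtype=float)
P = np.array([[1, -1,  0],
              [1,  0,  1],
              [1,  1, -1]], dtype=float)   # columns: the basis B
Q = np.array([[1,  1],
              [1, -1]], dtype=float)       # columns: the basis C

M = np.linalg.inv(Q) @ A @ P               # [A]_{C,B}
print(M)                                   # [[4, 1.5, -0.5], [2, 0.5, -0.5]]

a, b, c = 1.0, 2.0, 3.0
v_B  = np.array([a + b + c, -2*a + b + c, -a + 2*b - c]) / 3
Av_C = np.array([a + 3*b + 4*c, a + b + 2*c]) / 2
print(np.allclose(M @ v_B, Av_C))          # True
```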
4.6 Equivalence
The effect of change of bases in vector spaces on the matrix representation of a
linear transformation leads to the following relation on matrices.
Let 𝐴, 𝐵 ∈ F𝑚×𝑛 . We say that 𝐵 is equivalent to 𝐴 iff there exist invertible matrices
𝑃 ∈ F𝑛×𝑛 and 𝑄 ∈ F𝑚×𝑚 such that 𝐵 = 𝑄 −1𝐴𝑃 .
As it happens, a change of bases in both the domain and co-domain spaces
bring up an equivalent matrix that represents the original matrix viewed as a linear
transformation.
It is easy to see that on F𝑚×𝑛 , the relation ‘is equivalent to’ is an equivalence
relation. Observe that 𝐴 and 𝐵 are equivalent iff there exist invertible matrices 𝑃
and 𝑄 of appropriate order such that 𝐵 = 𝑄𝐴𝑃 . There is an easy characterization of
equivalence of two matrices.
(4.30) Theorem (Rank Theorem)
Two matrices of the same size are equivalent iff they have the same rank.
Proof. Let 𝐴 and 𝐵 be 𝑚×𝑛 matrices. We view them as linear transformations from
F𝑛×1 to F𝑚×1 . Observe that isomorphisms on F𝑘×1 are simply invertible matrices.
Now, using (3.17-3.18) we obtain the following:
𝐴 and 𝐵 are equivalent
iff there exist invertible matrices 𝑃 ∈ F𝑛×𝑛 , 𝑄 ∈ F𝑚×𝑚 such that 𝐵 = 𝑄𝐴𝑃
iff there exist isomorphisms 𝑃 on F𝑛×1, and 𝑄 on F𝑚×1 such that 𝐵 = 𝑄𝐴𝑃
iff rank(𝐵) = rank(𝐴).
It is easy to construct an 𝑚 × 𝑛 matrix of rank 𝑟 . For instance
𝐸𝑟 = [𝐼𝑟 0; 0 0] ∈ F𝑚×𝑛
is such a matrix, where 𝐼𝑟 is the identity matrix of order 𝑟, and the other zero matrices
are of appropriate size. We thus obtain the following result as a corollary to the
Rank theorem.
The RREF conversion can be used to construct the matrices 𝑃 and 𝑄 from a given
matrix 𝐴 so that 𝑄 −1𝐴𝑃 = 𝐸𝑟 .
Now, taking transpose in 𝑄 −1𝐴𝑃 = 𝐸𝑟 , we have 𝑃 t𝐴t (𝑄 −1 ) t = 𝐸𝑟t . That is, 𝐴t is equivalent to 𝐸𝑟t .
However, 𝐸𝑟t has rank 𝑟 . It thus follows that rank(𝐴t ) = rank(𝐴) = 𝑟 .
We know that the 𝑖th column of 𝐴 is equal to 𝐴(𝑒𝑖 ). Thus, 𝑅(𝐴) is the subspace of
F𝑚×1 spanned by the columns of 𝐴. Then rank(𝐴) is same as the maximum number
of linearly independent columns of 𝐴. Then rank(𝐴t ) is same as the maximum
number of linearly independent rows of 𝐴. We have thus shown that these two
numbers are equal. This fact is expressed by asserting that the column rank of a
matrix and the row rank of a matrix are equal. Of course, this also follows from
(3.24).
Further, if 𝐴 is a square matrix, then rank(𝐴t ) = rank(𝐴) implies that 𝐴t and 𝐴
are equivalent matrices.
The rank factorization yields another related factorization. For a matrix 𝐴 ∈ F𝑚×𝑛
of rank 𝑟, we have invertible matrices 𝑃 ∈ F𝑛×𝑛 and 𝑄 ∈ F𝑚×𝑚 such that 𝐴 = 𝑄𝐸𝑟 𝑃 −1 .
However, 𝐸𝑟 ∈ F𝑚×𝑛 can be written as
𝐸𝑟 = [𝐼𝑟 0; 0 0] = [𝐼𝑟 ; 0] [𝐼𝑟 0], with [𝐼𝑟 ; 0] ∈ F𝑚×𝑟 and [𝐼𝑟 0] ∈ F𝑟 ×𝑛 .
Taking
𝐵 = 𝑄 [𝐼𝑟 ; 0], 𝐶 = [𝐼𝑟 0] 𝑃 −1,
we see that rank(𝐵) = 𝑟 = rank(𝐶). We have thus proved the following theorem.
Thus, the line {(𝑎, 𝑎) : 𝑎 ∈ R} never moves. So also the line {(𝑎, −𝑎) : 𝑎 ∈ R}.
Observe that
𝐴 (𝑎, 𝑎)t = 1 (𝑎, 𝑎)t , 𝐴 (𝑎, −𝑎)t = (−1) (𝑎, −𝑎)t .
In general, if a straight line remains invariant under a linear operator 𝑇 , then the
image of any point on the straight line must be a point on the same straight line.
That is, 𝑇 (𝑥) must be a scalar multiple of 𝑥 . Since we are interested in fixing a
straight line, such a vector 𝑥 should be a nonzero vector.
Let 𝑇 be a linear operator on a vector space 𝑉 over F. A scalar 𝜆 ∈ F is called an
eigenvalue of 𝑇 iff there exists a nonzero vector 𝑣 ∈ 𝑉 such that 𝑇 𝑣 = 𝜆𝑣. Such a
vector 𝑣 is called an eigenvector of 𝑇 for (or associated with, or corresponding to)
the eigenvalue 𝜆.
Convention: Since eigenvectors are nonzero vectors, whenever we discuss eigen-
values of a linear operator on a vector space, we assume that the vector space is a
nonzero vector space.
(5.1) Example
1. Let 𝑇 be the linear operator on R2 given by 𝑇 (𝑎, 𝑏) = (𝑎, 𝑎 + 𝑏). We have
𝑇 (0, 1) = (0, 0 + 1) = 1 (0, 1). Thus the vector (0, 1) is an eigenvector associated
with the eigenvalue 1 of 𝑇 . Is (0, 𝑏) also an eigenvector associated with the same
eigenvalue 1?
2. Let the linear operator 𝑇 on R2 be given by 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎). If 𝜆 is an eigenvalue
of 𝑇 with an eigenvector (𝑎, 𝑏), then (−𝑏, 𝑎) = (𝜆𝑎, 𝜆𝑏). It implies that 𝑏 = −𝜆𝑎
and 𝑎 = 𝜆𝑏. It gives 𝑏 = −𝜆 2𝑏, or 𝑏 (1 + 𝜆 2 ) = 0. Since R2 is a real vector space,
𝜆 ∈ R. Then 1 + 𝜆 2 ≠ 0. Hence 𝑏 = 0. This leads to 𝑎 = 0. That is, (𝑎, 𝑏) = (0, 0).
But an eigenvector is nonzero! Therefore, 𝑇 does not have an eigenvalue.
3. Let 𝑇 : C2 → C2 be given by 𝑇 (𝑎, 𝑏) = (−𝑏, 𝑎). As in (2), if 𝜆 is an eigenvalue of
𝑇 with an eigenvector (𝑎, 𝑏), then 𝑏 (1 + 𝜆 2 ) = 0 and 𝑏 = −𝜆𝑎. If 𝑏 = 0, then 𝑎 = 0,
which is not possible as (𝑎, 𝑏) ≠ (0, 0). Thus 1 + 𝜆 2 = 0. Hence 𝜆 = ±𝑖. It is easy
to verify that the eigenvalue 𝜆 = 𝑖 is associated with an eigenvector (1, −𝑖) and
the eigenvalue 𝜆 = −𝑖 is associated with an eigenvector (1, 𝑖).
4. The linear operator 𝑇 : F[𝑡] → F[𝑡] defined by 𝑇 (𝑝 (𝑡)) = 𝑡𝑝 (𝑡) has no eigen-
vector and no eigenvalue, since for a polynomial 𝑝 (𝑡), 𝑡𝑝 (𝑡) ≠ 𝛼𝑝 (𝑡) for any
𝛼 ∈ F.
5. Let 𝑇 : R[𝑡] → R[𝑡] be defined by 𝑇 (𝑝 (𝑡)) = 𝑝 0 (𝑡), where we interpret each 𝑝 ∈
R[𝑡] as a function from the open interval (0, 1) to R. Since derivative of a constant
polynomial is 0, which equals 0 times the constant polynomial, all nonzero
constant polynomials are eigenvectors of 𝑇 associated with the eigenvalue 0.
(5.2) Theorem
Let 𝑇 : 𝑉 → 𝑉 be a linear operator. Let 𝜆 be a scalar. Then the following are true:
(1) A nonzero vector 𝑣 ∈ 𝑉 is an eigenvector of 𝑇 for the eigenvalue 𝜆 iff
𝑣 ∈ 𝑁 (𝑇 − 𝜆𝐼 ).
(2) 𝜆 is an eigenvalue of 𝑇 iff 𝑇 − 𝜆𝐼 is not one-one.
(5.3) Theorem
Let 𝑇 be a linear operator on a finite dimensional vector space 𝑉 over F. Let 𝐵 be an
ordered basis of 𝑉 . Then, 𝜆 ∈ F is an eigenvalue of 𝑇 with an associated eigenvector
𝑣 iff 𝜆 is an eigenvalue of the matrix [𝑇 ]𝐵,𝐵 ∈ F𝑛×𝑛 with an associated eigenvector
[𝑣]𝐵 ∈ F𝑛×1 .
(5.4) Theorem
Let 𝑇 be a linear operator on a finite dimensional vector space 𝑉 . Let 𝐵 be an
ordered basis of 𝑉 , and let 𝐴 = [𝑇 ]𝐵,𝐵 be the matrix representation of 𝑇 with
respect to 𝐵. Then a scalar 𝜆 is an eigenvalue of 𝑇 iff det(𝐴 − 𝜆𝐼 ) = 0.
Since 𝑉 is a finite dimensional vector space over the field F, where F is either R
or C, the coefficients of powers of 𝑡 in 𝜒𝑇 (𝑡) are complex numbers, in general. Then
all characteristic values of 𝑇 are in C. This fact is a consequence of the fundamental
theorem of algebra, which states the following:
Each polynomial of degree 𝑛 with complex coefficients has 𝑛 number
of complex zeros, counting multiplicities.
From (5.4) it follows that the eigenvalues of 𝑇 are precisely the characteristic values
that lie in the underlying field. Explicitly, each eigenvalue of 𝑇 is its characteristic
value; if 𝑉 is a complex vector space, then each characteristic value is an eigenvalue;
and if 𝑉 is a real vector space, then all and only the real characteristic values are
eigenvalues of 𝑇 .
(5.5) Example
Then 𝜒𝑇 (𝑡) = (𝑡 − cos 𝜃 ) 2 + sin2 𝜃 . Its zeros are cos 𝜃 ± 𝑖 sin 𝜃 . We find that if 𝜃
is not a multiple of 𝜋, then all these zeros are non-real. In this case, 𝑇𝜃 does not
have an eigenvalue.
Indeed, if 𝜃 is not a multiple of 𝜋, the rotation does not fix any straight line
through the origin.
Given any monic polynomial 𝑝 (𝑡), there exists a matrix 𝐶 such that 𝜒𝐶 (𝑡) = 𝑝 (𝑡).
See the following example.
(5.6) Example
Let 𝐶 = [0 0 · · · 0 −𝑎0 ; 1 0 · · · 0 −𝑎1 ; . . . ; 0 0 · · · 1 −𝑎𝑘−1 ] for scalars 𝑎0, 𝑎1, . . . , 𝑎𝑘−1 and 𝑘 ≥ 2; that is, 𝐶 has 1s on its subdiagonal, the entries −𝑎0, . . . , −𝑎𝑘−1 in its last column, and 0 elsewhere.
We show by induction on 𝑘 that 𝜒𝐶 (𝑡) = det(𝐶 − 𝑡𝐼 ) = (−1)𝑘 (𝑡𝑘 + 𝑎𝑘−1𝑡𝑘−1 + · · · + 𝑎1𝑡 + 𝑎0). In the basis case 𝑘 = 2,
det [−𝑡 −𝑎0 ; 1 −𝑎1 − 𝑡] = 𝑡 (𝑎1 + 𝑡) + 𝑎0 = (−1)² (𝑡² + 𝑎1𝑡 + 𝑎0).
So, assume the induction hypothesis that for any scalars 𝑎0, 𝑎1, . . . , 𝑎𝑚−1,
det [−𝑡 0 · · · 0 −𝑎0 ; 1 −𝑡 · · · 0 −𝑎1 ; . . . ; 0 0 · · · 1 −𝑎𝑚−1 − 𝑡] = (−1)𝑚 (𝑡𝑚 + 𝑎𝑚−1𝑡𝑚−1 + · · · + 𝑎1𝑡 + 𝑎0).
For the corresponding determinant of order 𝑚 + 1 with scalars 𝑎0, 𝑎1, . . . , 𝑎𝑚 , expanding along the first row,
det [−𝑡 0 · · · 0 −𝑎0 ; 1 −𝑡 · · · 0 −𝑎1 ; . . . ; 0 0 · · · 1 −𝑎𝑚 − 𝑡]
= −𝑡 det [−𝑡 0 · · · 0 −𝑎1 ; 1 −𝑡 · · · 0 −𝑎2 ; . . . ; 0 0 · · · 1 −𝑎𝑚 − 𝑡] + (−1)𝑚+1 𝑎0 det [1 −𝑡 · · · 0 ; 0 1 · · · 0 ; . . . ; 0 0 · · · 1]
= −𝑡 (−1)𝑚 (𝑡𝑚 + 𝑎𝑚𝑡𝑚−1 + · · · + 𝑎2𝑡 + 𝑎1) + (−1)𝑚+1𝑎0
= (−1)𝑚+1 (𝑡𝑚+1 + 𝑎𝑚𝑡𝑚 + · · · + 𝑎1𝑡 + 𝑎0).
Due to this reason, the matrix 𝐶 as given above is called the companion matrix of
the monic polynomial 𝑡 𝑘 + 𝑎𝑘−1𝑡 𝑘−1 + · · · + 𝑎 1𝑡 + 𝑎 0 .
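A quick numerical check that the eigenvalues of a companion matrix are the zeros of its polynomial; this is a Python/NumPy sketch of ours, and the function name companion and the sample polynomial are our own choices.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of t^k + a_{k-1} t^{k-1} + ... + a_1 t + a_0,
    where coeffs = [a_0, a_1, ..., a_{k-1}], as in (5.6)."""
    k = len(coeffs)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)          # 1s on the subdiagonal
    C[:, -1] = -np.array(coeffs)        # last column: -a_0, ..., -a_{k-1}
    return C

# p(t) = t^3 - 6t^2 + 11t - 6 = (t - 1)(t - 2)(t - 3)
C = companion([-6.0, 11.0, -6.0])
print(np.linalg.eigvals(C))             # the zeros 1, 2, 3 in some order
```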
The fundamental theorem of algebra implies that if 𝑝 (𝑡) is a polynomial with real
coefficients, then its complex zeros come in conjugate pairs. That is, for 𝛽 ≠ 0, if
𝜆 = 𝛼 + 𝑖𝛽 is a zero of such a polynomial 𝑝 (𝑡), then so is 𝜆 = 𝛼 − 𝑖𝛽. Further, it
implies that a polynomial of odd degree has a real zero. Consequently, each linear
operator on a finite dimensional complex vector space has an eigenvalue; and each
linear operator on an odd dimensional real vector space has an eigenvalue.
For matrices, we need to be a bit careful. Let 𝐴 be an 𝑛 × 𝑛 matrix. If at least
one entry of 𝐴 is a complex number with nonzero imaginary part, then 𝐴 ∈ C𝑛×𝑛 is
viewed as a linear operator on C𝑛×1 . In this case, the eigenvalues of 𝐴 are precisely
the characteristic values. On the other hand, if 𝐴 has only real entries, we may view
it as a linear operator on C𝑛×1 or on R𝑛×1 . As an operator on C𝑛×1, all characteristic
values of 𝐴 are its eigenvalues; and as an operator on R𝑛×1, only the real characteristic
values of 𝐴 are the eigenvalues.
In this terminology, a matrix in R𝑛×𝑛 can have complex eigenvalues. Notice that
an eigenvector corresponding to a complex eigenvalue of 𝐴 is a vector in C𝑛×1 .
For instance, in (5.1-3), the matrix [𝑇 ]𝐸,𝐸 has complex eigenvalues 𝑖 and −𝑖. The
corresponding eigenvectors are (1, −𝑖) t and (1, 𝑖) t, which are in C2×1 . Similarly,
the rotation matrix [𝑇𝜃 ]𝐵,𝐵 in (5.5-3) has complex eigenvalues cos 𝜃 ± 𝑖 sin 𝜃 with
eigenvectors [1, ∓𝑖] t .
Recall that a polynomial 𝑝 (𝑡) has 𝜆 as a zero of multiplicity 𝑚 means that (𝑡 − 𝜆)𝑚
divides the polynomial 𝑝 (𝑡) but (𝑡 − 𝜆)𝑚+1 does not divide 𝑝 (𝑡). Accordingly, if 𝑇 is
a linear operator on a finite dimensional vector space and 𝜆 is a characteristic value
of 𝑇 , where 𝜆 has multiplicity 𝑚 as a zero of 𝜒𝑇 (𝑡), then we say that the algebraic
multiplicity of the characteristic value 𝜆 of 𝑇 is 𝑚. The same terminology applies
when 𝜆 is an eigenvalue of 𝑇 or a (complex) eigenvalue of a square matrix 𝐴.
When we speak of all characteristic values of 𝑇 counting multiplicities, we
are concerned with the list of all characteristic values of 𝑇 , where each one is
repeated as many times as its algebraic multiplicity. Thus, a linear operator 𝑇
on an 𝑛-dimensional vector space has 𝑛 number of characteristic values, counting
multiplicities. Similarly, an 𝑛 × 𝑛 matrix has 𝑛 number of complex eigenvalues,
counting multiplicities. You should understand the results in the following theorem
in this sense.
(5.7) Theorem
(1) The diagonal entries of a triangular (upper or lower) and of a diagonal matrix
are its eigenvalues, counting multiplicities.
(2) A square matrix and its transpose have the same complex eigenvalues, count-
ing multiplicities.
(3) The determinant of a square matrix is the product of all its complex eigenval-
ues, counting multiplicities.
(4) The trace of a square matrix is the sum of all its complex eigenvalues, counting
multiplicities.
Therefore, 𝜆1 + · · · + 𝜆𝑛 = 𝑡𝑟 (𝐴).
We say that a polynomial 𝑝 (𝑡) annihilates a linear operator 𝑇 iff 𝑝 (𝑇 ) = 0, the
zero operator. As usual, when 𝑝 (𝑡) annihilates 𝑇 , we also say that 𝑇 is annihilated
by 𝑝 (𝑡). The same terminology applies to square matrices.
𝑇 𝑣 = 0 𝑣 + 1𝑇 𝑣 + 0𝑇 2𝑣 + · · · + 0𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
𝑇 (𝑇 𝑣) = 0 𝑣 + 0𝑇 𝑣 + 1𝑇 2𝑣 + · · · + 0𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
..
.
𝑇 (𝑇 𝑘−2𝑣) = 0 𝑣 + 0𝑇 𝑣 + 0𝑇 2𝑣 + · · · + 1𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
𝑇 (𝑇 𝑘−1𝑣) = 𝑎 0 𝑣 + 𝑎 1 𝑇 𝑣 + 𝑎 2 𝑇 2𝑣 + · · · + 𝑎𝑘−1 𝑇 𝑘−1𝑣 + 0 𝑢 1 + · · · + 0 𝑢𝑚
𝑇 (𝑢 1 ) = 𝑏 11𝑣 + 𝑏 12𝑇 𝑣 + · · · + 𝑏 1𝑘𝑇 𝑘−1𝑣 + 𝑏 1(𝑘+1)𝑢 1 + · · · + 𝑏 1(𝑘+𝑚)𝑢𝑚
..
.
𝑇 (𝑢𝑚 ) = 𝑏𝑚1𝑣 + 𝑏𝑚2𝑇 𝑣 + · · · + 𝑏𝑚𝑘𝑇 𝑘−1𝑣 + 𝑏𝑚(𝑘+1)𝑢 1 + · · · + 𝑏𝑚(𝑘+𝑚)𝑢𝑚
where 𝑏𝑖 𝑗 are some scalars. Thus, the matrix representation of 𝑇 with respect to the
ordered basis 𝐵 is in the form
𝑀 := [𝑇 ]𝐵,𝐵 = [𝐶 𝐴; 0 𝐷], where 𝐶 = [0 0 · · · 0 𝑎0 ; 1 0 · · · 0 𝑎1 ; . . . ; 0 0 · · · 1 𝑎𝑘−1 ].
(5.9) Theorem
If a linear operator on a finite dimensional vector space is annihilated by a polyno-
mial, then its eigenvalues are from among the zeros of the polynomial.
(5.10) Example
Let 𝐴 ∈ R3×3 be such that 𝐴2 = 3𝐴 − 2𝐼 and det(𝐴) = 4. Then what is tr(𝐴)?
𝐴 is annihilated by the polynomial 𝑝 (𝑡) = 𝑡 2 − 3𝑡 + 2 = (𝑡 − 2)(𝑡 − 1). The zeros of
𝑝 (𝑡) are 2 and 1. Thus the (complex) eigenvalues of 𝐴 can be 2 or 1. Now that 𝐴 is a
matrix of order 3, taking into account the repetition of eigenvalues, the possibilities
are 2, 2, 2 or 2, 2, 1 or 2, 1, 1 or 1, 1, 1. As det(𝐴) = the product of the eigenvalues
of 𝐴 = 4; the eigenvalues are 2, 2 and 1. So tr(𝐴) = 2 + 2 + 1 = 5.
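For a concrete check, one can build a matrix of the kind described in (5.10); the particular invertible matrix 𝑃 below is an arbitrary choice of ours, used only to produce a non-diagonal example.

```python
import numpy as np

# Eigenvalues 2, 2, 1, so A^2 = 3A - 2I, det(A) = 4, tr(A) = 5.
D = np.diag([2.0, 2.0, 1.0])
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])             # any invertible matrix
A = P @ D @ np.linalg.inv(P)

print(np.allclose(A @ A, 3*A - 2*np.eye(3)))              # True
print(round(np.linalg.det(A), 6), round(np.trace(A), 6))  # 4.0 5.0
```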
We wish to find out the nature of eigenvalues and eigenvectors of these special
types of operators.
(5.11) Theorem
Let 𝑇 be a linear operator on a finite dimensional inner product space.
(1) If 𝑇 is self-adjoint, then
(a) all characteristic values of 𝑇 are real;
(b) 𝑇 has an eigenvalue;
(c) all eigenvalues of 𝑇 are real; and
(d) eigenvectors for distinct eigenvalues are orthogonal.
(2) If 𝑇 is unitary, then each eigenvalue of 𝑇 has absolute value 1.
h𝐴𝑣, 𝑣i = h𝜆𝑣, 𝑣i = 𝜆h𝑣, 𝑣i; h𝐴𝑣, 𝑣i = h𝐴∗𝑣, 𝑣i = h𝑣, 𝐴𝑣i = h𝑣, 𝜆𝑣i = 𝜆h𝑣, 𝑣i.
h𝑇 𝑣,𝑇 𝑣i = h𝑣,𝑇 ∗𝑇 𝑣i = h𝑣, 𝑣i; h𝑇 𝑣,𝑇 𝑣i = h𝜆𝑣, 𝜆𝑣i = 𝜆𝜆h𝑣, 𝑣i = |𝜆| 2 h𝑣, 𝑣i.
(5.12) Theorem
𝐴𝑥 = 𝜆𝑥, 𝐴𝑦 = 𝜆𝑦.
h𝐴𝑣, 𝑣i = h𝜆𝑣, 𝑣i = 𝜆h𝑣, 𝑣i; h𝐴𝑣, 𝑣i = h−𝐴∗𝑣, 𝑣i = −h𝑣, 𝐴𝑣i = −h𝑣, 𝜆𝑣i = −𝜆h𝑣, 𝑣i.
(5.13) Example
Consider the linear operator 𝑇 : R2 → R2 given by 𝑇 (𝑎, 𝑏) = (1/√2) (𝑎 − 𝑏, 𝑎 + 𝑏). Its matrix representation with respect to the standard basis 𝐸 of R2 is
[𝑇 ]𝐸,𝐸 = [1/√2 −1/√2 ; 1/√2 1/√2] .
In (5.13), 𝑇 is the rotation in the plane by an angle of 𝜋/4. In fact, any rotation 𝑇𝜃
of (5.5), where 𝜃 is not a multiple of 𝜋, provides such an example.
𝑇 𝑥 = 𝑎 1𝑣 1 + 𝑎 2𝑢 2 + · · · + 𝑎𝑛𝑢𝑛 .
Define 𝑆 : 𝑈 → 𝑈 by
𝑆 (𝑥) = 𝑎 2𝑢 2 + · · · + 𝑎𝑛𝑢𝑛 .
Clearly, for 𝑢 ∈ 𝑈 , 𝑆 (𝑢) ∈ 𝑈 . We show that 𝑆 is a linear operator. For this, let
𝑦, 𝑧 ∈ 𝑈 , and let 𝛼 ∈ C. There exist unique scalars 𝑏𝑖 , 𝑐𝑖 such that
𝑇𝑦 = 𝑏1𝑣1 + 𝑏2𝑢2 + · · · + 𝑏𝑛𝑢𝑛 , 𝑇 𝑧 = 𝑐1𝑣1 + 𝑐2𝑢2 + · · · + 𝑐𝑛𝑢𝑛 ;
𝑇 (𝑦 + 𝛼𝑧) = (𝑏 1 + 𝛼𝑐 1 )𝑣 1 + (𝑏 2𝑢 2 + · · · + 𝑏𝑛𝑢𝑛 ) + 𝛼 (𝑐 2𝑢 2 + · · · + 𝑐𝑛𝑢𝑛 ).
𝑇 𝑥 = 𝑎 1𝑣 1 + 𝑆 (𝑥),
where the scalar 𝑎 1 ∈ C and the vector 𝑆 (𝑥) ∈ 𝑈 are uniquely determined from 𝑇
and the vector 𝑥 ∈ 𝑈 .
By the induction hypothesis, there exists an orthonormal ordered basis {𝑣 2, . . . , 𝑣𝑛 }
for 𝑈 such that
𝑆 (𝑣 𝑗 ) ∈ span {𝑣 2, . . . , 𝑣 𝑗 } for 2 ≤ 𝑗 ≤ 𝑛.
Let 𝐵 = {𝑣 1, 𝑣 2, . . . , 𝑣𝑛 }. Since k𝑣 1 k = 1, 𝑣 1 is orthogonal to each vector in 𝑈 , and
{𝑣 1, . . . , 𝑣𝑛 } is an orthonormal set. We see that 𝐵 is an orthonormal ordered basis
for 𝑉 . To see that this is the required basis, we compute the 𝑇 -images of the basis
vectors as follows: (Here, 𝛼𝑖 are some suitable scalars.)
𝑇 (𝑣 1 ) = 𝜆𝑣 1 ∈ span {𝑣 1 },
𝑇 (𝑣 2 ) = 𝛼 1𝑣 1 + 𝑆 (𝑣 2 ) ∈ span {𝑣 1, 𝑣 2 },
..
.
𝑇 (𝑣 𝑗 ) = 𝛼 𝑗 𝑣 1 + 𝑆 (𝑣 𝑗 ) ∈ span {𝑣 1, 𝑣 2, . . . , 𝑣 𝑗 }, for 2 ≤ 𝑗 ≤ 𝑛.
(5.15) Theorem
Similar square matrices have the same eigenvalues, counting multiplicities.
for some 𝑦 ∈ C1×𝑚 . Since 𝑆 ∗𝐶𝑆 is upper triangular, so is 𝑃 ∗𝐴𝑃 . The construction is
complete.
If 𝐴 is a real matrix, and all its complex eigenvalues turn out to be real, then we use the transpose instead of the adjoint everywhere in the above construction. Thus, 𝑃 is an orthogonal matrix.
(5.17) Example
Consider the matrix 𝐴 = [2 1 0; 2 3 0; −1 −1 1] for Schur triangularization.
We find that 𝜒𝐴 (𝑡) = (𝑡 − 1) 2 (𝑡 − 4). All characteristic values of 𝐴 are real. Thus
there exists an orthogonal matrix 𝑃 such that 𝑃 t𝐴𝑃 is upper triangular. To determine
such a matrix 𝑃, we take one of the eigenvalues, say 1. An associated eigenvector
of norm 1 is 𝑣 = (0, 0, 1) t . We extend {𝑣 } to an orthonormal basis for R3×1 . For
convenience, we take the (ordered) orthonormal basis as
{(0, 0, 1) t, (1, 0, 0) t, (0, 1, 0) t }.
Taking the basis vectors as columns, we form the matrix 𝑅 as follows:
𝑅 = [0 1 0; 0 0 1; 1 0 0] .
We then find that
𝑅 t𝐴𝑅 = [1 −1 −1; 0 2 1; 0 2 3] .
Now, we try to triangularize the matrix 𝐶 = [2 1; 2 3]. It has eigenvalues 1 and 4.
The eigenvector of unit norm associated with the eigenvalue 1 is (1/√2, −1/√2)t . We extend it to an orthonormal basis {(1/√2, −1/√2)t , (1/√2, 1/√2)t } for R2×1 . Then we construct the matrix 𝑆 by taking these basis vectors as its columns:
𝑆 = [1/√2 1/√2 ; −1/√2 1/√2] .
We find that 𝑆 t𝐶𝑆 = [1 −1; 0 4], which is an upper triangular matrix. Then
𝑃 = 𝑅 [1 0; 0 𝑆] = [0 1 0; 0 0 1; 1 0 0] [1 0 0; 0 1/√2 1/√2 ; 0 −1/√2 1/√2] = [0 1/√2 1/√2 ; 0 −1/√2 1/√2 ; 1 0 0] .
Now, 𝑃 t𝐴𝑃 = [1 0 −√2 ; 0 1 −1; 0 0 4] is an upper triangular matrix.
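A numerical check of this triangularization, again as a Python/NumPy sketch of ours:

```python
import numpy as np

A = np.array([[ 2,  1, 0],
              [ 2,  3, 0],
              [-1, -1, 1]], dtype=float)
s = 1/np.sqrt(2)
P = np.array([[0,  s, s],
              [0, -s, s],
              [1,  0, 0]])

print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
T = P.T @ A @ P
print(np.round(T, 6))                     # upper triangular, diagonal 1, 1, 4
```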
Notice that there is nothing sacred about being upper triangular. For, given
a matrix 𝐴 ∈ C𝑛×𝑛 , consider using Schur triangularization of 𝐴∗ . There exists a
unitary matrix 𝑃 such that 𝑃 ∗𝐴∗𝑃 is upper triangular. Then taking adjoint, we have
𝑃 ∗𝐴𝑃 is lower triangular. That is,
each square matrix is unitarily similar to a lower triangular matrix.
Analogously, a real square matrix having no non-real eigenvalues is also orthogo-
nally similar to a lower triangular matrix. We remark that the lower triangular form
of a matrix need not be the transpose or the adjoint of its upper triangular form.
Neither the unitary matrix 𝑃 nor the upper triangular matrix 𝑃 ∗𝐴𝑃 in Schur triangularization is unique. That is, there can be unitary matrices 𝑃 and 𝑄 such
that 𝑃 ≠ 𝑄, 𝑃 ∗𝐴𝑃 ≠ 𝑄 ∗𝐴𝑄, and both 𝑃 ∗𝐴𝑃 and 𝑄 ∗𝐴𝑄 are upper triangular. The
non-uniqueness stems from the choices involved in the associated eigenvectors and
in extending this to an orthonormal basis. For instance, in (5.17), if you extend
{(0, 0, 1) t } to the ordered orthonormal basis
5.4 Diagonalizability
Some linear operators can be represented by matrices simpler than upper triangular
matrices. For instance, if 𝑇 is a self-adjoint linear operator on an ips 𝑉 of finite
dimension, then Schur triangularization asserts the existence of an orthonormal
basis for 𝑉 such that the matrix of 𝑇 with respect to this basis is upper triangular.
By (4.22), this matrix is hermitian. However, a hermitian upper triangular matrix is
diagonal. Therefore, 𝑇 is represented by a diagonal matrix. Shortly, we will discuss
a more general result in this connection.
Let 𝑇 : 𝑉 → 𝑉 be a linear operator, where 𝑉 is a finite dimensional vector space.
We say that 𝑇 is diagonalizable iff there exists an ordered basis 𝐵 for 𝑉 such that
[𝑇 ]𝐵,𝐵 is a diagonal matrix.
If 𝑉 is an ips, then 𝑇 is called unitarily diagonalizable iff there exists an or-
thonormal ordered basis 𝐵 for 𝑉 such that [𝑇 ]𝐵,𝐵 is a diagonal matrix.
If 𝐴 ∈ C𝑛×𝑛 , then it is viewed as a linear operator on C𝑛×1 . Of course, the matrix
representation of the linear operator 𝐴 with respect to the standard basis of C𝑛×1
is 𝐴 itself. A change of basis results in a matrix similar to 𝐴. Thus, 𝐴 ∈ C𝑛×𝑛
is diagonalizable iff 𝐴 is similar to a diagonal matrix iff there exists an invertible
matrix 𝑃 ∈ C𝑛×𝑛 such that 𝑃 −1𝐴𝑃 is a diagonal matrix. In such a case, we say that
𝐴 is diagonalized by 𝑃 .
Similarly, 𝐴 is unitarily diagonalizable iff 𝐴 is unitarily similar to a diagonal
matrix iff there exists a unitary matrix 𝑈 ∈ C𝑛×𝑛 such that 𝑈 ∗𝐴𝑈 is a diagonal
matrix.
If 𝐴 ∈ R𝑛×𝑛 , then we say that 𝐴 is orthogonally diagonalizable iff 𝐴 is orthog-
onally similar to a diagonal matrix iff there exists an orthogonal matrix 𝑄 ∈ R𝑛×𝑛
such that 𝑄 ∗𝐴𝑄 is a diagonal matrix.
The following result connects diagonalizability of a linear operator to eigenvalues
and eigenvectors.
(5.18) Theorem
A linear operator 𝑇 on a finite dimensional vector space 𝑉 is diagonalizable iff there
exists a basis for 𝑉 consisting of eigenvectors of 𝑇 . In particular, a matrix 𝐴 ∈ F𝑛×𝑛
is diagonalizable iff there exists a basis of F𝑛×1 consisting of eigenvectors of 𝐴.
(5.19) Theorem
Eigenvectors associated with distinct eigenvalues of a linear operator on a finite di-
mensional vector space are linearly independent. In particular, each linear operator
on a vector space of dimension 𝑛 having 𝑛 distinct eigenvalues is diagonalizable.
𝑣𝑘+1 = 𝑎 1𝑣 1 + · · · + 𝑎𝑘 𝑣𝑘 . (5.4.1)
𝑇 𝑣𝑘+1 = 𝑎 1𝑣 1 + · · · + 𝑎𝑘 𝑣𝑘 .
𝜆𝑘+1𝑣𝑘+1 = 𝑎 1𝜆1𝑣 1 + · · · + 𝑎𝑘 𝜆𝑘 𝑣𝑘 .
(5.20) Example
Let 𝑇 : R3×1 → R3×1 be given by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏 + 𝑐, 3𝑏 + 3𝑐, −2𝑎 + 𝑏 + 𝑐). If
possible, find an ordered basis 𝐵 for R3 so that [𝑇 ]𝐵,𝐵 is a diagonal matrix.
The matrix of 𝑇 with respect to the standard basis of R3 is
𝐴 = [1 1 1; 0 3 3; −2 1 1] .
We find that 𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) = 𝑡 (𝑡 − 2)(𝑡 − 3). Since no eigenvalue is repeated,
𝑇 is diagonalizable. For diagonalization, we compute the eigenvectors 𝑣 1, 𝑣 2, 𝑣 3
corresponding to the eigenvalues 𝜆 = 0, 2, 3. They are 𝑣1 = (0, −1, 1), 𝑣2 = (2, 3, −1), 𝑣3 = (1, 2, 0), respectively. We see that
𝑇 (𝑣 1 ) = (0, 0, 0) = 0 𝑣 1 + 0 𝑣 2 + 0 𝑣 3
𝑇 (𝑣 2 ) = (4, 6, −2) = 0 𝑣 1 + 2 𝑣 2 + 0 𝑣 3
𝑇 (𝑣 3 ) = (3, 6, 0) = 0 𝑣 1 + 0 𝑣 2 + 3 𝑣 3 .
Therefore, with the basis 𝐵 = {𝑣 1, 𝑣 2, 𝑣 3 } for R3, we have [𝑇 ]𝐵,𝐵 = diag (0, 2, 3).
To diagonalize the matrix 𝐴 directly, we work with column vectors so that
{𝑣 1t , 𝑣 2t , 𝑣 3t } is the required basis for R3×1 . The diagonalizing matrix is
𝑃 = [𝑣1t 𝑣2t 𝑣3t ] = [0 2 1; −1 3 2; 1 −1 0] .
Then 𝑃 −1𝐴𝑃 = [1 −1/2 1/2; 1 −1/2 −1/2; −1 1 1] [1 1 1; 0 3 3; −2 1 1] [0 2 1; −1 3 2; 1 −1 0] = [0 0 0; 0 2 0; 0 0 3] .
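A numerical check of this diagonalization (Python with NumPy, our own illustration):

```python
import numpy as np

A = np.array([[ 1, 1, 1],
              [ 0, 3, 3],
              [-2, 1, 1]], dtype=float)
P = np.array([[ 0, 2, 1],
              [-1, 3, 2],
              [ 1,-1, 0]], dtype=float)   # columns: eigenvectors for 0, 2, 3

print(np.round(np.linalg.inv(P) @ A @ P, 6))   # diag(0, 2, 3)
```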
If 𝜆 is an eigenvalue of a linear operator 𝑇 , then its associated eigenvector 𝑢
is a solution of 𝑇𝑢 = 𝜆𝑢. Thus the maximum number of linearly independent
eigenvectors associated with the eigenvalue 𝜆 is dim (𝑁 (𝑇 − 𝜆𝐼 )). This number and
the algebraic multiplicity of 𝜆 have certain relations with the diagonalizability of 𝑇 .
Let 𝜆 be an eigenvalue of a linear operator 𝑇 on a finite dimensional vector space
𝑉 . The number dim (𝑁 (𝑇 − 𝜆𝐼 )) is called the geometric multiplicity of 𝜆.
Recall that the number 𝑚 such that (𝑡 − 𝜆)𝑚 divides 𝜒𝑇 (𝑡), and (𝑡 − 𝜆)𝑚+1 does
not divide 𝜒𝑇 (𝑡) is the algebraic multiplicity of 𝜆.
(5.21) Theorem
The geometric multiplicity of any eigenvalue of a linear operator on a finite dimen-
sional vector space is less than or equal to its algebraic multiplicity.
𝑇 𝑣 1 = 𝜆𝑣 1, . . . , 𝑇 𝑣𝑘 = 𝜆𝑣𝑘 .
And for 𝑗 > 𝑘, 𝑇 𝑣 𝑗 can be any linear combination of 𝑣 1, . . . , 𝑣𝑛 . That is, the matrix
of 𝑇 with respect to 𝐵 is given by
𝐴 := [𝑇 ]𝐵,𝐵 = [𝜆𝐼𝑘 𝐶; 0 𝐷],
(5.22) Theorem
A linear operator 𝑇 on a vector space of dimension 𝑛 is diagonalizable iff the
geometric multiplicity of each eigenvalue of 𝑇 is equal to its algebraic multiplicity
iff sum of geometric multiplicities of all eigenvalues of 𝑇 is 𝑛.
(5.23) Example
Let 𝐴 = [2 0; 0 2] and let 𝐵 = [2 1; 0 2]. We see that 𝜒𝐴 (𝑡) = 𝜒𝐵 (𝑡) = (𝑡 − 2)² . The eigenvalue 𝜆 = 2 has algebraic multiplicity 2 for both 𝐴 and 𝐵.
For geometric multiplicities, we solve 𝐴𝑥 = 𝑥 and 𝐵𝑦 = 𝑦.
Now, 𝐴𝑥 = 2𝑥 gives 𝑥 = 𝑥, which is satisfied by the linearly independent vectors
(1, 0) t and (0, 1) t . Thus, the geometric multiplicity of the eigenvalue 2 is dim (𝑁 (𝐴−
2𝐼 )) = 2, which is equal to the algebraic multiplicity of the eigenvalue 2. Hence 𝐴
is diagonalizable; in fact, it is already a diagonal matrix.
For the matrix 𝐵, we solve 𝐵𝑥 = 2𝑥 . With 𝑥 = (𝑎, 𝑏) t, we get 2𝑎 + 𝑏 = 2𝑎 and
2𝑏 = 2𝑏. That is, 𝑏 = 0 and 𝑎 can be any complex number. For instance, 𝑥 = (1, 0) t .
Then the geometric multiplicity of the eigenvalue 2 is dim (𝑁 (𝐵 − 𝜆𝐼 )) = 1, which
is not equal to the algebraic multiplicity of the eigenvalue 2. Therefore, 𝐵 is not
diagonalizable.
(5.24) Example
Let 𝑇 : R3 → R3 be given by 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 3𝑏 + 3𝑐, 3𝑎 + 𝑏 + 3𝑐, 3𝑎 + 3𝑏 + 𝑐).
Determine whether the linear operator 𝑇 is diagonalizable or not.
With respect to the standard basis, 𝑇 has the matrix representation
𝐴 = [1 3 3; 3 1 3; 3 3 1] .
We obtain 𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) = (𝑡 + 2) 2 (𝑡 − 7). The eigenvalue −2 has algebraic
multiplicity 2. To determine its geometric multiplicity, we solve the linear system
(𝑇 + 2𝐼 )(𝑎, 𝑏, 𝑐) = 0. It gives 3𝑎 + 3𝑏 + 3𝑐 = 0 or, 𝑎 + 𝑏 + 𝑐 = 0. It has two linearly
independent solutions. That is, the geometric multiplicity of the eigenvalue −2 is
equal to
dim (𝑁 (𝑇 + 2𝐼 )) = dim {(𝑎, 𝑏, 𝑐) ∈ R3 : 𝑎 + 𝑏 + 𝑐 = 0} = 2.
Thus the geometric multiplicity of the eigenvalue −2 is equal to its algebraic mul-
tiplicity, 2. The geometric multiplicity of the eigenvalue 7 is at least 1 and also it
is less than or equal to its algebraic multiplicity, which is 1. That is, the geometric
multiplicity of the eigenvalue 7 is equal to its algebraic multiplicity, 1. Hence 𝑇 is
diagonalizable.
Indeed, with 𝑃 = [1 1 1; −1 0 1; 0 −1 1], we see that 𝑃 −1𝐴𝑃 = diag (−2, −2, 7).
In the beginning of this section we have argued that a hermitian matrix can be
diagonalized. This observation can be generalized to normal matrices by using the
fact that a normal upper triangular matrix is necessarily a diagonal matrix.
(5.25) Lemma
A normal upper triangular matrix is diagonal. In particular, a hermitian upper
triangular matrix is diagonal.
Write the normal upper triangular matrix as 𝐵 = [𝑎 𝑥; 0 𝐶], where 𝑎 is a scalar, 𝑥 is a row vector, and 𝐶 is upper triangular. Then
𝐵∗𝐵 = [ā 0∗ ; 𝑥 ∗ 𝐶 ∗] [𝑎 𝑥; 0 𝐶] = [|𝑎|² ā𝑥 ; 𝑎𝑥 ∗ 𝑥 ∗𝑥 + 𝐶 ∗𝐶] ,
𝐵𝐵∗ = [𝑎 𝑥; 0 𝐶] [ā 0∗ ; 𝑥 ∗ 𝐶 ∗] = [|𝑎|² + 𝑥𝑥 ∗ 𝑥𝐶 ∗ ; 𝐶𝑥 ∗ 𝐶𝐶 ∗] .
There can be non-normal but diagonalizable matrices. For such a matrix 𝐴, there
does not exist a unitary matrix 𝑈 such that 𝑈 −1𝐴𝑈 is a diagonal matrix.
If unitary diagonalization is not required, we can diagonalize a diagonalizable operator by constructing a basis of eigenvectors, which need not be orthonormal. This means, for each eigenvalue, we choose as many linearly independent eigenvectors as its geometric multiplicity. This is bound to succeed provided that the operator is diagonalizable. Similarly, choosing orthonormal eigenvectors for each eigenvalue would lead to unitary diagonalization of a normal operator.
(5.28) Example
Consider 𝑇 : R3 → R3 with 𝑇 (𝑎, 𝑏, 𝑐) = (𝑎 + 3𝑏 + 3𝑐, 3𝑎 + 𝑏 + 3𝑐, 3𝑎 + 3𝑏 + 𝑐). Now,
h𝑇 (𝑎, 𝑏, 𝑐), (𝛼, 𝛽, 𝛾)i = 𝑎𝛼 + 3𝑏𝛼 + 3𝑐𝛼 + 3𝑎𝛽 + 𝑏𝛽 + 3𝑐𝛽 + 3𝑎𝛾 + 3𝑏𝛾 + 𝑐𝛾
= 𝑎(𝛼 + 3𝛽 + 3𝛾) + 𝑏 (3𝛼 + 𝛽 + 3𝛾) + 𝑐 (3𝛼 + 3𝛽 + 𝛾)
= h(𝑎, 𝑏, 𝑐),𝑇 (𝛼, 𝛽, 𝛾)i.
[𝑇 ]𝐸,𝐸 = [1 3 3; 3 1 3; 3 3 1] ,
𝑇 (𝑢 1 ) = −2𝑢 1, 𝑇 (𝑢 2 ) = −2𝑢 2, 𝑇 (𝑢 3 ) = 7𝑢 3 .
𝑇 (𝑤 1 ) = −2 𝑤 1, 𝑇 (𝑤 2 ) = −2 𝑤 2, 𝑇 (𝑤 3 ) = 7 𝑤 3 .
𝑣 𝑗 = 𝑎 1𝑢 1 + · · · + 𝑎 𝑗 𝑢 𝑗 for 1 ≤ 𝑗 ≤ 𝑚.
(5.29) Example
Consider 𝑇 : R3 → R3 given by 𝑇 (𝑎, 𝑏, 𝑐) = (−𝑎 + 𝑏 + 𝑐, 𝑎 − 𝑏 + 𝑐, 𝑎 + 𝑏 − 𝑐). With
respect to the standard basis 𝐸 of R3, we have
𝐴 = [𝑇 ]𝐸,𝐸 = [−1 1 1; 1 −1 1; 1 1 −1] .
The matrix 𝐴 is real symmetric. Thus, 𝑇 is self-adjoint. Now, 𝜒𝑇 (𝑡) = 𝜒𝐴 (𝑡) =
(𝑡 − 1)(𝑡 + 2) 2 . Thus 𝑇 has eigenvalues 1, −2, −2. To construct an eigenvector for
the eigenvalue 1, we set up the linear system (𝑇 − 𝐼 )(𝑎, 𝑏, 𝑐) t = 0. That is,
−𝑎 + 𝑏 + 𝑐 = 𝑎, 𝑎 − 𝑏 + 𝑐 = 𝑏, 𝑎 + 𝑏 − 𝑐 = 𝑐.
𝑇 (𝑤1) = (1/√3) 𝑇 (1, 1, 1) = (1/√3) (1, 1, 1) = 1 𝑤1
𝑇 (𝑤2) = (1/√2) 𝑇 (1, −1, 0) = (1/√2) (−2, 2, 0) = −2 𝑤2
𝑇 (𝑤3) = (1/√6) 𝑇 (1, 1, −2) = (1/√6) (−2, −2, 4) = −2 𝑤3 .
Equivalently, with the orthogonal matrix 𝑄 = [𝑤1 𝑤2 𝑤3] formed by taking these orthonormal eigenvectors as columns, 𝑄 t𝐴𝑄 = diag (1, −2, −2).
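A numerical check of this orthogonal diagonalization (Python with NumPy, our own illustration):

```python
import numpy as np

A = np.array([[-1, 1, 1],
              [ 1,-1, 1],
              [ 1, 1,-1]], dtype=float)
w1 = np.array([1, 1, 1]) / np.sqrt(3)
w2 = np.array([1,-1, 0]) / np.sqrt(2)
w3 = np.array([1, 1,-2]) / np.sqrt(6)
Q = np.column_stack([w1, w2, w3])

print(np.allclose(Q.T @ Q, np.eye(3)))   # Q is orthogonal
print(np.round(Q.T @ A @ Q, 6))          # diag(1, -2, -2)
```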
Such a Jordan string 𝑣 1, . . . , 𝑣𝑘 is said to start with 𝑣 1 and end with 𝑣𝑘 . The number
𝑘 is called the length of the Jordan string.
(5.30) Example
Consider the linear operator 𝑇 : C5 → C5 given by 𝑇 (𝑎, 𝑏, 𝑐, 𝑑, 𝑒) = (𝑎, 𝑎 + 𝑏, 𝑏 +
𝑐, 𝑑, 𝑑 + 𝑒). With respect to the standard basis of C5, 𝑇 is represented by the matrix
1 0 0 0 0
1 1 0 0 0
0 1 1 0 0 .
0 0 0 1 0
0 0 0 1 1
Since this matrix is lower triangular with all diagonal entries as 1, the only eigenvalue
of 𝑇 is 1 with algebraic multiplicity 5. Notice that
There are two types of Jordan strings for the eigenvalue 1, one starting with 𝑣 1
and the other starting with 𝑤 1 .
For a Jordan string starting with 𝑣1, we need to solve (𝑇 − 𝐼 )𝑣2 = 𝑣1 . If 𝑣2 = (𝑎, 𝑏, 𝑐, 𝑑, 𝑒), then the equation is (0, 𝑎, 𝑏, 0, 𝑑) = (0, 0, 1, 0, 0); one solution is 𝑣2 = (0, 1, 1, 0, 0). Next, solving (𝑇 − 𝐼 )𝑣3 = 𝑣2 the same way, we may take 𝑣3 = (1, 1, 1, 0, 0). The equation (𝑇 − 𝐼 )𝑥 = 𝑣3 demands that the first component of (𝑇 − 𝐼 )𝑥, which is always 0, equals 1. It does not have a solution; and we stop here with the Jordan string 𝑣1, 𝑣2, 𝑣3 .
Notice that we could have chosen 𝑣 2 differently, and once again, 𝑣 3 could have
been chosen in a different way.
Now, with the eigenvector 𝑤1 = (0, 0, 0, 0, 1), we proceed similarly. Suppose 𝑤2 = (𝑎, 𝑏, 𝑐, 𝑑, 𝑒) satisfies (𝑇 − 𝐼 )𝑤2 = 𝑤1 . Then (0, 𝑎, 𝑏, 0, 𝑑) = (0, 0, 0, 0, 1), so that 𝑎 = 𝑏 = 0 and 𝑑 = 1; we may take 𝑤2 = (0, 0, 1, 1, 0). As before, (𝑇 − 𝐼 )𝑥 = 𝑤2 has no solution, and the second Jordan string is 𝑤1, 𝑤2 . With 𝐴 as the matrix of 𝑇 , we verify:
(𝐴 − 𝐼 )𝑣 3 = (𝐴 − 𝐼 )(1, 1, 1, 0, 0) = (0, 1, 1, 0, 0) = 𝑣 2,
(𝐴 − 𝐼 ) 2𝑣 3 = (𝐴 − 𝐼 )𝑣 2 = (𝐴 − 𝐼 )(0, 1, 1, 0, 0) = (0, 0, 1, 0, 0) = 𝑣 1,
(𝐴 − 𝐼 ) 3𝑣 3 = (𝐴 − 𝐼 )𝑣 1 = (𝐴 − 𝐼 )(0, 0, 1, 0, 0) = (0, 0, 0, 0, 0).
(𝐴 − 𝐼 )𝑤 2 = (𝐴 − 𝐼 )(0, 0, 1, 1, 0) = (0, 0, 0, 0, 1) = 𝑤 1,
(𝐴 − 𝐼 ) 2𝑤 2 = (𝐴 − 𝐼 )𝑤 1 = (𝐴 − 𝐼 )(0, 0, 0, 1, 0) = (0, 0, 0, 0, 0).
That is, (𝐴 − 𝐼 ) 3𝑣 3 = 0 = (𝐴 − 𝐼 ) 2𝑤 2 .
Therefore, the Jordan string with 𝑣 consists of this single vector 𝑣. Since {𝑣 } is a
basis of 𝑉 , the statements of the theorem hold true.
Lay out the induction hypothesis that for all complex vector spaces of dimension
less than 𝑛, the statements are true. Let 𝑇 : 𝑉 → 𝑉 be a linear operator, where
𝑉 is a complex vector space of dimension 𝑛. Let 𝜆 be an eigenvalue of 𝑇 . Then
null(𝑇 − 𝜆𝐼 ) ≥ 1. Write 𝑈 = 𝑅(𝑇 − 𝜆𝐼 ). By the rank nullity theorem, 𝑟 = dim (𝑈 ) =
rank(𝑇 − 𝜆𝐼 ) < 𝑛. If 𝑥 ∈ 𝑈 , then (𝑇 − 𝜆𝐼 )𝑥 ∈ 𝑅(𝑇 − 𝜆𝐼 ) = 𝑈 . Thus, the restriction
of 𝑇 − 𝜆𝐼 to 𝑈 is a linear operator. Call this restriction linear operator as 𝑆. That is,
𝑆 is the linear operator given by
𝑆𝑥 1 = 𝜇𝑥 1, 𝑆𝑥 2 = 𝑥 1 + 𝜇𝑥 2, . . . , 𝑆𝑥 𝑗 = 𝑥 𝑗−1 + 𝜇𝑥 𝑗 ;
𝑆𝑥 ≠ 𝑥 𝑗 + 𝜇𝑥 for any 𝑥 ∈ 𝑈 . (5.5.1)
As 𝑆 = 𝑇 − 𝜆𝐼 on 𝑈 , it follows that
Look at the last inequality. Can it happen that there exists a vector 𝑦 ∈ 𝑉
such that 𝑇𝑦 = 𝑥 𝑗 + (𝜆 + 𝜇)𝑦? If it so happens, then (𝑇 − 𝜆𝐼 )𝑦 = 𝑥 𝑗 + 𝜇𝑦. Then
𝑥 𝑗 + 𝜇𝑦 ∈ 𝑅(𝑇 − 𝜆𝐼 ) = 𝑈 . But 𝑥 𝑗 ∈ 𝑈 . Therefore, 𝑦 ∈ 𝑈 . This will contradict (5.5.1).
Hence the last inequality is replaced by
And, we conclude that any Jordan string for an eigenvalue 𝜇 ≠ 0 of 𝑆 listed among
𝑣 1, . . . , 𝑣𝑟 is a Jordan string for an eigenvalue 𝜆 + 𝜇 of 𝑇 .
Next, we look at any Jordan string for the eigenvalue 0. Any such Jordan string
looks like a list of vectors 𝑢 1, . . . , 𝑢𝑘 in 𝑈 with
The vectors 𝑢𝑖 are from the set 𝑣 1, . . . , 𝑣𝑟 . Then (5.5.2) implies that
(𝑇 − 𝜆𝐼 )𝑢𝑘+1 = 𝑢𝑘 .
𝛼 1 (𝑇 − 𝜆𝐼 )𝑣 1 + · · · + 𝛼𝑟 (𝑇 − 𝜆𝐼 )𝑣𝑟 + 𝛽 1 (𝑇 − 𝜆𝐼 )𝑤 1 + · · · + 𝛽𝑠 (𝑇 − 𝜆𝐼 )𝑤𝑠 = 0.
We look at the effect of applying (𝑇 − 𝜆𝐼 ) in three stages. First, look at the starting
vectors of the Jordan strings. They are from 𝑁 (𝑇 − 𝜆𝐼 ). If 𝑣 is such a vector, then
(𝑇 − 𝜆𝐼 )𝑣 = 0. Thus, all starting vectors of the Jordan strings vanish from the sum.
Second, look at all other 𝑣𝑖 in the Jordan strings. Since (𝑇 − 𝜆𝐼 )𝑣𝑖 = 𝑣𝑖−1, each
such 𝑣𝑖 in the Jordan string will have coefficient as 𝛼𝑖+1 in the sum instead of the
previous 𝛼𝑖 . This will reintroduce the starting vectors of the Jordan strings with
updated coefficients. Further, the vectors with which Jordan strings end are absent
in the sum.
Third, look at (𝑇 − 𝜆𝐼 )𝑤 𝑗 . Each 𝑤 𝑗 is the last vector in an enlarged Jordan string.
Thus 𝑤 𝑗 = (𝑇 − 𝜆𝐼 )𝑣 𝑝 for some 𝑝; this 𝑣 𝑝 is the vector with which a Jordan string
ends. Thus these vectors 𝑣 𝑝 are reintroduced in the sum with coefficients as 𝛽 𝑗 .
Further, the vectors 𝑤 𝑗 are absent in the sum.
Therefore, the simplified sum is a linear combination of all 𝑣𝑖 with updated coefficients, where each vector 𝑣𝑝 with which a Jordan string ends has the coefficient 𝛽𝑗 of the corresponding next vector 𝑤𝑗 . In this sum all 𝛼s do not occur, but all 𝛽s occur
as coefficients of these vectors 𝑣 𝑝 .
For instance, if the list 𝑣 1, 𝑣 2, 𝑣 3, 𝑤 1 is an enlarged Jordan string, and another
Jordan string starts from 𝑣 4, then after applying (𝑇 − 𝜆𝐼 ) the part for 𝑣 1, 𝑣 2, 𝑣 3, 𝑤 1 in
the sum is
𝛼 1 (𝑇 − 𝜆𝐼 )𝑣 1 + 𝛼 2 (𝑇 − 𝜆𝐼 )𝑣 2 + 𝛼 3 (𝑇 − 𝜆𝐼 )𝑣 3 + 𝛽 1 (𝑇 − 𝜆𝐼 )𝑤 1 .
𝛼 2𝑣 1 + 𝛼 3𝑣 2 + 𝛽 1𝑣 3 .
of length 1 for the eigenvalue 𝜆 of 𝑇 . (Such a situation arises when 𝑛 > 𝑟 + 𝑠.) Thus
for the eigenvalue 𝜆, we have in total 𝑛 − 𝑟 − 𝑠 + 𝑠 = 𝑛 − 𝑟 number of Jordan strings
in 𝐵. Since 𝑛 − 𝑟 = null(𝑇 − 𝜆𝐼 ), the number of Jordan strings for the eigenvalue
𝜆 of 𝑇 is the geometric multiplicity of 𝜆. Jordan strings for other eigenvalues of 𝑇
remain unchanged.
The inductive construction in the proof of (5.31) goes as follows. Let 𝜆 be an
eigenvalue of 𝑇 : 𝑉 → 𝑉 . Write 𝑈 := 𝑅(𝑇 − 𝜆𝐼 ); and define the linear operator
𝑆 : 𝑈 → 𝑈 as the restriction of 𝑇 − 𝜆𝐼 to 𝑈 . Without loss of generality, suppose
the first 𝑠 Jordan strings (each Jordan string in each line on the left array below)
correspond to the eigenvalue 0 of 𝑆; and other Jordan strings may correspond to
other eigenvalues of 𝑆.
When 0 is not an eigenvalue of 𝑆, the number 𝑠 is equal to 0. If such a Jordan string
corresponds to an eigenvalue 𝜇 − 𝜆 of 𝑆, then it also corresponds to the eigenvalue
𝜇 of 𝑇 .
Here, 𝑛 1 + · · · + 𝑛𝑘 = 𝑟 = rank(𝑇 − 𝜆𝐼 ) = dim (𝑈 ). The first vectors of the first 𝑠
Jordan strings are in 𝑁 (𝑆). That is,
With respect to a basis that consists of a disjoint union of Jordan strings, what does a matrix representation of a linear operator look like?
Let 𝑇 be a linear operator on a complex vector space 𝑉 of dimension 𝑛 having
distinct eigenvalues 𝜆1, . . . , 𝜆𝑘 , whose geometric multiplicities are 𝑠 1, . . . , 𝑠𝑘 , and
algebraic multiplicities 𝑚 1, . . . , 𝑚𝑘 , respectively. Then we have a basis in which
there are 𝑠𝑖 number of Jordan strings for 𝜆𝑖 . We choose these 𝑠𝑖 Jordan strings in
some order, so that we may talk of first Jordan string for 𝜆𝑖 , second Jordan string for
𝜆𝑖 , and so on. We also take the eigenvalues in some order, say, 𝜆𝑖 is the 𝑖th eigenvalue.
The vectors in any Jordan string are already ordered. Then we construct an ordered
basis from these Jordan strings by listing all vectors (in order) in the first Jordan
string for 𝜆1 ; next, the vectors from the second Jordan string for 𝜆1, and so on. This
completes the list of 𝑚 1 vectors for 𝜆1 . Next, we list all vectors in order from the
first Jordan string for 𝜆2, and so on. After the list is complete we obtain an ordered
basis 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } of 𝑉 . Notice that for any 𝑣 𝑗 in this basis, we have
𝑇 (𝑣 𝑗 ) = 𝜆𝑣 𝑗 or 𝑇 (𝑣 𝑗 ) = 𝑎𝑣 𝑗−1 + 𝜆𝑣 𝑗
for an eigenvalue 𝜆 of 𝑇 with 𝑎 ∈ {0, 1}. For instance, if the first Jordan string for
𝜆1 is of length ℓ, then
𝑇 (𝑣1) = 𝜆1𝑣1, 𝑇 (𝑣2) = 𝑣1 + 𝜆1𝑣2, . . . , 𝑇 (𝑣ℓ ) = 𝑣ℓ−1 + 𝜆1𝑣ℓ ,
and after this the next Jordan string starts. In the matrix representation of 𝑇 with
respect to 𝐵, this will correspond to a block of order ℓ having diagonal entries as 𝜆1
and super diagonal entries (entries above the diagonal) as 1.
Then the next Jordan string will give rise to another block of diagonal entries 𝜆1
and super diagonal entries as 1. This block will have the order as the length of the
Jordan string. This way, when all the Jordan strings for 𝜆1 are complete, another
similar block will start with a Jordan string for 𝜆2, and so on.
With respect to this basis 𝐵, the linear operator 𝑇 will have the matrix represen-
tation in the form
[𝑇 ]𝐵,𝐵 = diag (𝐽1, 𝐽2, . . . , 𝐽𝑘 ), (5.5.4)
where each 𝐽𝑖 is again a block diagonal matrix looking like 𝐽𝑖 = diag ( 𝐽˜1 (𝜆𝑖 ), . . . , 𝐽˜𝑠𝑖 (𝜆𝑖 )), with 𝑠𝑖 as the geometric multiplicity of the eigenvalue 𝜆𝑖 . Each matrix 𝐽˜𝑗 (𝜆𝑖 ) here has the form
the form
𝜆𝑖 1
𝜆𝑖 1
˜𝐽 𝑗 (𝜆𝑖 ) = .. ..
. .
.
1
𝜆𝑖
The missing entries are all 0. Such a matrix 𝐽˜𝑗 (𝜆𝑖 ) is called a Jordan block with
diagonal entries 𝜆𝑖 . The order of this Jordan block is the length of the corresponding
Jordan string for the eigenvalue 𝜆𝑖 . In the matrix [𝑇 ]𝐵,𝐵 , the number of Jordan blocks
with diagonal entries 𝜆𝑖 is the number of Jordan strings for the eigenvalue 𝜆𝑖 , which
is equal to the geometric multiplicity of the eigenvalue 𝜆𝑖 . Any matrix which is in
the block diagonal form (5.5.4) is said to be in Jordan form. Using (5.31) we
obtain the following result.
Let 𝑇 be a linear operator on a complex vector space 𝑉 of dimension 𝑛. Then 𝑉 has a basis 𝐵 such that 𝐽 := [𝑇 ]𝐵,𝐵 is in Jordan form. In 𝐽 , the number of Jordan blocks with diagonal entry 𝜆 is the geometric multiplicity of the eigenvalue 𝜆 of 𝑇 ; and for 1 ≤ 𝑘 ≤ 𝑛, the number 𝑟𝑘 (𝜆) of Jordan blocks of order 𝑘 with diagonal entry 𝜆 is given by
𝑟𝑘 (𝜆) = rank((𝑇 − 𝜆𝐼 )^{𝑘−1}) − 2 rank((𝑇 − 𝜆𝐼 )^𝑘) + rank((𝑇 − 𝜆𝐼 )^{𝑘+1}).
In the formula for 𝑟𝑘 (𝜆), we use the convention that for any matrix 𝐵 of order 𝑛,
𝐵^0 is the identity matrix of order 𝑛.
Proof. Existence of Jordan form and the statement about the number of Jordan
blocks with diagonal entry as 𝜆 follow from (5.31).
For the formula for 𝑟𝑘 (𝜆), let 𝜆 be an eigenvalue of 𝑇 . Write 𝐽 := [𝑇 ]𝐵,𝐵 . Suppose
1 ≤ 𝑘 ≤ 𝑛. Observe that [𝑇 − 𝜆𝐼 ]𝐵,𝐵 = 𝐽 − 𝜆𝐼 . From (4.25) it follows that for each 𝑖,
rank((𝑇 − 𝜆𝐼 )𝑖 ) = rank((𝐽 − 𝜆𝐼 )𝑖 ). Therefore, it is enough to prove the formula for
𝐽 instead of 𝑇 .
We use induction on 𝑛. In the basis case, 𝐽 = [𝜆]. Here, 𝑘 = 1; 𝑟𝑘 (𝜆) = 𝑟 1 (𝜆) = 1.
On the right hand side, due to the convention,
(𝐽 − 𝜆𝐼 )^{𝑘−1} = 𝐼 = [1], (𝐽 − 𝜆𝐼 )^𝑘 = [0]^1 = [0], (𝐽 − 𝜆𝐼 )^{𝑘+1} = [0]^2 = [0],
so that the right hand side equals 1 − 2 · 0 + 0 = 1. So, the formula holds for 𝑛 = 1.
Assume, as the induction hypothesis, that the formula holds for all matrices in Jordan
form of order less than 𝑛. Let 𝐽 be a matrix of order 𝑛 which is in Jordan form. We
consider two cases.
Case 1: Let 𝐽 have a single Jordan block corresponding to 𝜆. That is,
\[
J = \begin{bmatrix}
\lambda & 1 & & & \\
 & \lambda & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \lambda & 1 \\
 & & & & \lambda
\end{bmatrix},
\qquad
J - \lambda I = \begin{bmatrix}
0 & 1 & & & \\
 & 0 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & 1 \\
 & & & & 0
\end{bmatrix}.
\]
Here 𝑟 1 (𝜆) = 0, 𝑟 2 (𝜆) = 0, . . . , 𝑟𝑛−1 (𝜆) = 0 and 𝑟𝑛 (𝜆) = 1. By direct computation,
we see that (𝐽 − 𝜆𝐼 )^2 has 1 on the super-super-diagonal, and 0 elsewhere. Proceeding
similarly for higher powers of 𝐽 − 𝜆𝐼, we obtain
rank((𝐽 − 𝜆𝐼 )^𝑗 ) = 𝑛 − 𝑗 for 0 ≤ 𝑗 ≤ 𝑛, and rank((𝐽 − 𝜆𝐼 )^𝑗 ) = 0 for 𝑗 ≥ 𝑛.
Hence rank((𝐽 − 𝜆𝐼 )^{𝑘−1}) − 2 rank((𝐽 − 𝜆𝐼 )^𝑘) + rank((𝐽 − 𝜆𝐼 )^{𝑘+1}) equals 0 for
1 ≤ 𝑘 < 𝑛 and equals 1 for 𝑘 = 𝑛, which agrees with 𝑟𝑘 (𝜆). So the formula holds in
this case.
Case 2: Suppose 𝐽 has more than one Jordan block corresponding to 𝜆. By reordering
of blocks, we assume that the first Jordan block in 𝐽 corresponds to 𝜆 and has order
𝑟 < 𝑛. Then 𝐽 − 𝜆𝐼 can be written in block form as
\[
J - \lambda I = \begin{bmatrix} C & 0 \\ 0 & D \end{bmatrix},
\]
where 𝐶 is the Jordan block of order 𝑟 with diagonal entries as 0, and 𝐷 is the matrix
of order 𝑛 − 𝑟 consisting of the other blocks of 𝐽 − 𝜆𝐼 . Then, for any 𝑗,
\[
(J - \lambda I)^j = \begin{bmatrix} C^j & 0 \\ 0 & D^j \end{bmatrix}.
\]
Therefore,
rank((𝐽 − 𝜆𝐼 )^𝑗 ) = rank(𝐶^𝑗 ) + rank(𝐷^𝑗 ).
Write 𝑟𝑘 (𝜆, 𝐶) and 𝑟𝑘 (𝜆, 𝐷) for the number of Jordan blocks of order 𝑘 for the
eigenvalue 𝜆 that appear in 𝐶 and in 𝐷, respectively. Then 𝑟𝑘 (𝜆) = 𝑟𝑘 (𝜆, 𝐶) + 𝑟𝑘 (𝜆, 𝐷).
Both 𝐶 + 𝜆𝐼 (a single Jordan block of order 𝑟 < 𝑛) and the matrix of order 𝑛 − 𝑟
formed by the remaining blocks of 𝐽 are in Jordan form of order less than 𝑛; so, by
Case 1 and the induction hypothesis, the formula holds for each of them. Adding the
two resulting identities and using rank((𝐽 − 𝜆𝐼 )^𝑗 ) = rank(𝐶^𝑗 ) + rank(𝐷^𝑗 ), we obtain
the formula for 𝐽 .
Notice that, using the rank-nullity theorem, the number 𝑟𝑘 (𝜆) can also be given as
𝑟𝑘 (𝜆) = 2 null((𝑇 − 𝜆𝐼 )^𝑘) − null((𝑇 − 𝜆𝐼 )^{𝑘−1}) − null((𝑇 − 𝜆𝐼 )^{𝑘+1}).
(5.34) Example
Consider the matrix
\[
A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\]
of (5.30); its only eigenvalue is 1, and there, we had constructed the Jordan strings
for this eigenvalue. We compute
rank((𝐴 − 𝐼 )^0) = 5, rank(𝐴 − 𝐼 ) = 3, rank((𝐴 − 𝐼 )^2) = 1, rank((𝐴 − 𝐼 )^3) = 0 = rank((𝐴 − 𝐼 )^4).
Then
𝑟 1 (1) = 5 − 2 × 3 + 1 = 0, 𝑟 2 (1) = 3 − 2 × 1 + 0 = 1,
𝑟 3 (1) = 1 − 2 × 0 + 0 = 1, 𝑟 4 (1) = 0 = 𝑟 5 (1).
It says that in the Jordan form, there is one Jordan block of size 2 and one Jordan
block of size 3 with the sole eigenvalue 1. That gives the Jordan form as obtained
earlier, up to a permutation of the blocks.
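The rank computations above are easy to carry out by machine. The following is a minimal sketch, assuming Python with NumPy (the variable names are ours, chosen for illustration); it recovers the Jordan block sizes of 𝐴 from the formula for 𝑟𝑘 (𝜆).

    import numpy as np

    # Matrix A of (5.34); its only eigenvalue is 1.
    A = np.array([[1, 0, 0, 0, 0],
                  [1, 1, 0, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 0, 0, 1, 0],
                  [0, 0, 0, 1, 1]], dtype=float)
    lam, n = 1.0, A.shape[0]

    # rank((A - lam*I)^j) for j = 0, 1, ..., n+1
    N = A - lam * np.eye(n)
    r = [np.linalg.matrix_rank(np.linalg.matrix_power(N, j)) for j in range(n + 2)]

    # r_k(lam) = rank^{k-1} - 2*rank^k + rank^{k+1}
    for k in range(1, n + 1):
        rk = r[k - 1] - 2 * r[k] + r[k + 1]
        if rk:
            print(f"{rk} Jordan block(s) of order {k} for eigenvalue {lam}")
    # Expected: one block of order 2 and one block of order 3.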
In the Jordan strings, observe that the vectors used are from 𝑁 ((𝑇 − 𝜆𝐼 )^𝑗 ). Such
a vector is called a generalized eigenvector corresponding to the eigenvalue 𝜆 of
𝑇 . For a matrix 𝐴, the similarity matrix 𝑃 in 𝐽 = 𝑃 −1𝐴𝑃 has as its columns the
vectors of the basis with respect to which 𝐽 represents the matrix 𝐴. These vectors
are specific generalized eigenvectors of 𝐴.
The uniqueness of a Jordan form can be made exact by first ordering the eigen-
values of 𝐴 and then arranging the blocks corresponding to each eigenvalue, which
are now together, in some order, say in ascending order of their size. In doing so,
the Jordan form of any matrix becomes unique. Such a Jordan form is called the
Jordan canonical form of a matrix. It then follows that if two matrices are similar,
then they have the same Jordan canonical form. Moreover, uniqueness also implies
that two dissimilar matrices will have different Jordan canonical forms. Therefore,
Jordan form characterizes similarity of matrices. It implies the following:
Two square matrices 𝐴 and 𝐵 of the same order are similar iff they
have the same eigenvalues, and for each eigenvalue 𝜆, for each 𝑗 ∈ N,
rank(𝐴 − 𝜆𝐼 ) 𝑗 = rank(𝐵 − 𝜆𝐼 ) 𝑗 .
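A minimal computational sketch of this criterion, assuming Python with NumPy (the helper same_jordan_data is our own illustrative name, not from the text): for a common list of eigenvalues, compare the ranks of the powers of 𝐴 − 𝜆𝐼 and 𝐵 − 𝜆𝐼 .

    import numpy as np

    def same_jordan_data(A, B, eigenvalues, tol=1e-9):
        """Check the similarity criterion for two square matrices that are assumed
        to have the same eigenvalues: equal rank((X - lam*I)^j) for each listed
        eigenvalue lam and each j up to the matrix order."""
        n = A.shape[0]
        if B.shape != A.shape:
            return False
        I = np.eye(n)
        for lam in eigenvalues:
            for j in range(1, n + 1):
                rA = np.linalg.matrix_rank(np.linalg.matrix_power(A - lam * I, j), tol=tol)
                rB = np.linalg.matrix_rank(np.linalg.matrix_power(B - lam * I, j), tol=tol)
                if rA != rB:
                    return False
        return True

    # Example: every square matrix is similar to its transpose (see below),
    # so the criterion must hold for A and A.T.
    A = np.array([[1., 1., 0.], [0., 1., 0.], [0., 0., 2.]])
    print(same_jordan_data(A, A.T, eigenvalues=[1.0, 2.0]))   # True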
As an application of Jordan form, we will show that each matrix is similar to its
transpose. Let 𝐴 ∈ C𝑛×𝑛 . We know that a scalar 𝜆 is an eigenvalue of 𝐴 iff it is an
eigenvalue of 𝐴t . Further, rank(𝐴t ) = rank(𝐴). Thus, for any eigenvalue 𝜆 of 𝐴 and
for any 𝑗, we have rank((𝐴 − 𝜆𝐼 )^𝑗 ) = rank((𝐴t − 𝜆𝐼 )^𝑗 ). Consequently, 𝐴 and 𝐴t are
similar.
It also follows from the Jordan form that if 𝑚 is the algebraic multiplicity of
an eigenvalue 𝜆, then one can always choose 𝑚 linearly independent generalized
eigenvectors; see Exercise 7 of Section 5.3. A further guarantee is formulated in
Exercise 8 there.
5.6 Singular value decomposition

Let 𝑇 : 𝑉 → 𝑊 be a linear transformation, where 𝑉 and 𝑊 are inner product spaces
with dim (𝑉 ) = 𝑛 and dim (𝑊 ) = 𝑚. The eigenvalues of the self-adjoint operator 𝑇 ∗𝑇
are non-negative real numbers. Writing 𝑟 := rank(𝑇 ∗𝑇 ), we order them as
𝜆1 ≥ 𝜆2 ≥ · · · ≥ 𝜆𝑟 > 0 = 𝜆𝑟 +1 = · · · = 𝜆𝑛 .
The corresponding singular values of 𝑇 are
𝑠 1 ≥ 𝑠 2 ≥ · · · ≥ 𝑠𝑟 > 0 = 𝑠𝑟 +1 = · · · = 𝑠𝑛 .
Here, 𝑠𝑖 = √𝜆𝑖 . If 𝜆 > 0 is an eigenvalue of 𝑇 ∗𝑇 with an associated eigenvector 𝑣,
then 𝑇 ∗𝑇 𝑣 = 𝜆𝑣. It yields (𝑇𝑇 ∗ )(𝑇 𝑣) = 𝜆(𝑇 𝑣). Now, 𝜆𝑣 ≠ 0 implies 𝑇 𝑣 ≠ 0. Hence
𝜆 > 0 is also an eigenvalue of 𝑇𝑇 ∗ with an associated eigenvector 𝑇 𝑣. Similarly, it
follows that each positive eigenvalue of 𝑇𝑇 ∗ is also an eigenvalue of 𝑇 ∗𝑇 .
Further, the spectral theorem implies that the self-adjoint linear operator 𝑇 ∗𝑇 is
represented by a diagonal matrix. In such a diagonal matrix, the only nonzero entries
are 𝑠 12, . . . , 𝑠𝑟2 . Therefore, 𝑇 ∗𝑇 has rank 𝑟 . It then follows that 𝑠 12, . . . , 𝑠𝑟2 are the only
positive eigenvalues of the self-adjoint linear operator 𝑇𝑇 ∗ . There are 𝑛 − 𝑟 number
of zero eigenvalues of 𝑇 ∗𝑇 , whereas 𝑇𝑇 ∗ has 𝑚 − 𝑟 number of zero eigenvalues.
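For matrices this can be seen numerically; a minimal illustration, assuming Python with NumPy and a hypothetical 4 × 2 matrix, compares the eigenvalues of 𝐴∗𝐴 and 𝐴𝐴∗.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 2))          # a hypothetical 4x2 real matrix

    ev_small = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of A*A  (2 of them)
    ev_big = np.linalg.eigvalsh(A @ A.T)     # eigenvalues of AA*  (4 of them)

    # The positive eigenvalues agree; AA* has two extra zero eigenvalues here.
    print(np.sort(ev_small)[::-1])
    print(np.sort(ev_big)[::-1])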
The following theorem gives much more information than this by representing 𝑇
in terms of its singular values.
𝐸 = {𝑤 1, . . . , 𝑤𝑟 , 𝑤𝑟 +1, . . . , 𝑤𝑚 }
𝑇𝑇 ∗𝑤 𝑗 = 𝛼 1𝑇 𝑣 1 + · · · + 𝛼𝑟 𝑇 𝑣𝑟 = 𝛼 1𝑠 1𝑤 1 + · · · + 𝛼𝑟 𝑠𝑟 𝑤𝑟 .
𝑇 𝑣 𝑗 = 𝑠 𝑗 𝑤 𝑗 for 𝑗 = 1, . . . , 𝑟 ; 𝑇 𝑣 𝑗 = 0 for 𝑗 = 𝑟 + 1, . . . , 𝑛.
Therefore,
\[
[T]_{E,B} = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}, \qquad S = \mathrm{diag}(s_1, \ldots, s_r).
\]
The matrix interpretation of SVD may be formulated as in the following.
𝐴 = 𝑃̃𝑆𝑄̃ ∗ , with 𝑃̃ = [𝑤 1 · · · 𝑤𝑟 ], 𝑄̃ = [𝑣 1 · · · 𝑣𝑟 ].
Here, the matrices 𝑃˜ ∈ C𝑚×𝑟 and 𝑄˜ ∈ C𝑛×𝑟 have orthonormal columns, and 𝑆 =
diag (𝑠 1, . . . , 𝑠𝑟 ) ∈ C𝑟 ×𝑟 . This simplified decomposition of the matrix 𝐴 is called the
tight SVD of 𝐴.
In the tight SVD, the matrices 𝐴 ∈ C𝑚×𝑛 , 𝑃̃ ∈ C𝑚×𝑟 , 𝑆 ∈ C𝑟×𝑟 and 𝑄̃ ∗ ∈ C𝑟×𝑛 are
all of rank 𝑟 . Write 𝐵 = 𝑃̃𝑆 and 𝐶 = 𝑆𝑄̃ ∗ to obtain
𝐴 = 𝐵𝑄̃ ∗ = 𝑃̃𝐶,
where 𝐵 ∈ C𝑚×𝑟 and 𝐶 ∈ C𝑟 ×𝑛 are also of rank 𝑟 . It shows that each 𝑚 × 𝑛 matrix of
rank 𝑟 can be written as a product of an 𝑚 × 𝑟 matrix with an 𝑟 × 𝑛 matrix each of
rank 𝑟 . This way, the full rank factorization of a matrix follows from the tight SVD.
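A minimal sketch of the tight SVD and the resulting full rank factorization, assuming Python with NumPy and a hypothetical 4 × 3 matrix (np.linalg.svd returns the SVD factors; the truncation to the numerical rank below is our own illustration):

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [2., 4., 0.],
                  [0., 0., 3.],
                  [1., 2., 3.]])               # a hypothetical rank-2 matrix

    W, s, Vh = np.linalg.svd(A)                # A = W @ D @ Vh (full SVD)
    r = int(np.sum(s > 1e-10))                 # numerical rank

    P_tilde = W[:, :r]                         # m x r, orthonormal columns
    Q_tilde = Vh[:r, :].conj().T               # n x r, orthonormal columns
    S = np.diag(s[:r])                         # r x r

    # Tight SVD and the resulting full rank factorization A = B C.
    assert np.allclose(A, P_tilde @ S @ Q_tilde.conj().T)
    B, C = P_tilde @ S, S @ Q_tilde.conj().T
    assert np.allclose(A, B @ C)
    print(r, s[:r])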
(5.37) Example
Obtain SVD, tight SVD, and full rank factorizations of
\[
A = \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix}.
\]
The matrix
\[
A^*A = \begin{bmatrix} 24 & -12 \\ -12 & 6 \end{bmatrix}
\]
has eigenvalues 𝜆1 = 30 and 𝜆2 = 0. Thus 𝑠 1 = √30. It is easy to check that
rank(𝐴) = 1, as the first column of 𝐴 is −2 times the second column. Solving the
equations 𝐴∗𝐴(𝑎, 𝑏)^t = 30(𝑎, 𝑏)^t , that is, 24𝑎 − 12𝑏 = 30𝑎 and −12𝑎 + 6𝑏 = 30𝑏,
we get 𝑎 = −2𝑏; a corresponding unit eigenvector is 𝑣 1 = (1/√5)(−2, 1)^t . Then
\[
w_1 = \frac{1}{\sqrt{30}}\,A v_1
    = \frac{1}{\sqrt{30}} \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix}
      \begin{bmatrix} -2/\sqrt5 \\ 1/\sqrt5 \end{bmatrix}
    = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix}.
\]
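The numbers in this example are easily confirmed, assuming Python with NumPy; the singular vectors returned by np.linalg.svd may differ in sign from the ones chosen above.

    import numpy as np

    A = np.array([[2., -1.], [-2., 1.], [4., -2.]])
    W, s, Vh = np.linalg.svd(A)

    print(s)                                  # approximately [sqrt(30), 0]
    print(np.isclose(s[0], np.sqrt(30)))      # True

    v1 = Vh[0]                                # +- (1/sqrt(5)) * (-2, 1)
    w1 = A @ v1 / s[0]                        # +- (1/sqrt(6)) * (-1, 1, -2)
    assert np.allclose(np.abs(v1), np.array([2., 1.]) / np.sqrt(5))
    assert np.allclose(np.abs(w1), np.array([1., 1., 2.]) / np.sqrt(6))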
𝑇 = 𝑃𝑈 = 𝑈 𝑄,
Notice that for 1 ≤ 𝑖 ≤ 𝑟, 𝑃𝑤𝑖 = 𝑠𝑖 𝑤𝑖 , 𝑄𝑣𝑖 = 𝑠𝑖 𝑣𝑖 ; for 𝑖 > 𝑟, 𝑃𝑤𝑖 = 0 = 𝑄 (𝑣𝑖 ); and
for 𝑖 > ℓ, 𝑈 𝑣𝑖 = 0. Due to the formula for the adjoint in (3.5.1), we have
\[
P^*w = \sum_{i=1}^{m} \langle w, P w_i\rangle w_i
     = \sum_{i=1}^{r} \langle w, s_i w_i\rangle w_i
     = \sum_{i=1}^{r} s_i \langle w, w_i\rangle w_i = P w.
\]
Also,
\[
\langle P w, w\rangle = \sum_{i=1}^{r} s_i \langle w, w_i\rangle \langle w_i, w\rangle
                      = \sum_{i=1}^{r} s_i\, |\langle w, w_i\rangle|^2 \ge 0.
\]
Further,
\[
P^2 w = P\Big(\sum_{i=1}^{r} s_i \langle w, w_i\rangle w_i\Big)
      = \sum_{i=1}^{r} s_i \langle w, w_i\rangle P w_i
      = \sum_{i=1}^{r} s_i^2 \langle w, w_i\rangle w_i = TT^*w.
\]
Next, for 𝑣 ∈ 𝑉 ,
\[
P U v = P\Big(\sum_{i=1}^{\ell} \langle v, v_i\rangle w_i\Big)
      = \sum_{i=1}^{\ell} \langle v, v_i\rangle P w_i
      = \sum_{i=1}^{r} s_i \langle v, v_i\rangle w_i = T v.
\]
That is, 𝑈𝑈 ∗ = 𝐼 .
Case 2: Let 𝑚 ≥ 𝑛. Then ℓ = 𝑛. Let 𝑣 ∈ 𝑉 . For 1 ≤ 𝑖 ≤ 𝑛, using (3.5.1) again, we
obtain
\[
U^* w_i = \sum_{j=1}^{n} \langle w_i, U v_j\rangle v_j
        = \sum_{j=1}^{n} \langle w_i, w_j\rangle v_j = v_i .
\]
Then
\[
U^*U v = U^*\Big(\sum_{i=1}^{n} \langle v, v_i\rangle w_i\Big)
       = \sum_{i=1}^{n} \langle v, v_i\rangle U^* w_i
       = \sum_{i=1}^{n} \langle v, v_i\rangle v_i = v.
\]
That is, 𝑈 ∗𝑈 = 𝐼 .
When 𝑚 = 𝑛, both (1) and (2) are true. Therefore, 𝑈 is unitary.
Notice that 𝑃 and 𝑄 are positive semi-definite square roots of the positive semi-
definite matrices 𝐴𝐴∗ and 𝐴∗𝐴, respectively. (See Exercise 9.)
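Such a positive semi-definite square root can be computed from a spectral decomposition; the following is a minimal sketch of that idea, assuming Python with NumPy (psd_sqrt is our own illustrative helper, not a routine from the text).

    import numpy as np

    def psd_sqrt(M, tol=1e-12):
        """Positive semi-definite square root of a positive semi-definite matrix M,
        via the spectral theorem: M = V diag(d) V*  =>  sqrt(M) = V diag(sqrt(d)) V*."""
        d, V = np.linalg.eigh(M)               # eigenvalues in ascending order
        d = np.clip(d, 0.0, None)              # clear tiny negative round-off
        return (V * np.sqrt(d)) @ V.conj().T

    A = np.array([[2., -1.], [-2., 1.], [4., -2.]])
    P = psd_sqrt(A @ A.T)                      # square root of AA*
    Q = psd_sqrt(A.T @ A)                      # square root of A*A
    assert np.allclose(P @ P, A @ A.T)
    assert np.allclose(Q @ Q, A.T @ A)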
The construction of the polar decomposition of matrices from the SVD may be
summarized as follows:
If 𝐴 ∈ C𝑚×𝑛 with 𝑚 ≥ 𝑛 has SVD 𝐴 = 𝐵𝐷𝐸 ∗, then 𝐴 = 𝑃𝑈 = 𝑈 𝑄, where
𝑈 = 𝐵1𝐸 ∗, 𝑃 = 𝐵𝐷𝐵1∗ and 𝑄 = 𝐸𝐷 2𝐸 ∗ ; here 𝐵1 consists of the first 𝑛 columns
of 𝐵, and 𝐷 2 is the top 𝑛 × 𝑛 block of 𝐷.
Then, for the matrix 𝐴 of (5.37),
\[
U = B_1E^* = \frac{1}{\sqrt{30}}\begin{bmatrix} 2+\sqrt3 & -1+2\sqrt3 \\ -2+\sqrt3 & 1+2\sqrt3 \\ 4 & -2 \end{bmatrix},
\quad
P = BDB_1^* = \frac{\sqrt5}{\sqrt6}\begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{bmatrix},
\quad
Q = ED_2E^* = \frac{\sqrt6}{\sqrt5}\begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}.
\]
Then
\[
PU = \frac{\sqrt5}{\sqrt6}\cdot\frac{1}{\sqrt{30}}
\begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{bmatrix}
\begin{bmatrix} 2+\sqrt3 & -1+2\sqrt3 \\ -2+\sqrt3 & 1+2\sqrt3 \\ 4 & -2 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix} = A,
\]
\[
UQ = \frac{1}{\sqrt{30}}\cdot\frac{\sqrt6}{\sqrt5}
\begin{bmatrix} 2+\sqrt3 & -1+2\sqrt3 \\ -2+\sqrt3 & 1+2\sqrt3 \\ 4 & -2 \end{bmatrix}
\begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix} = A.
\]
Answers to exercises
(d) False: 𝑉 = R, 𝐴 = {1}, 𝐵 = {2}. (e) False: 𝑉 = R2, 𝐴 = {(1, 0), (0, 1)},
𝐵 = {(1, 0)}. (f) False: 𝑉 = R2, 𝐴 = {(1, 0), (0, 1)}, 𝐵 = {(1, 1)}.
13. 𝑈 = 𝑥-axis, 𝑉 = 𝑦-axis, 𝑊 = the line 𝑦 = 𝑥 .
14. If 𝑥 = 𝑢 + 𝑤 = 𝑢 0 + 𝑤 0 for 𝑢, 𝑢 0 ∈ 𝑈 and 𝑤, 𝑤 0 ∈ 𝑊 , then 𝑢 − 𝑢 0 = 𝑤 − 𝑤 0 . So
both 𝑢 − 𝑢 0, 𝑤 − 𝑤 0 ∈ 𝑈 ∩ 𝑊 .
15(a) 𝑉 ⊆ 𝑉 +𝑊 . So, 𝑈 ∩𝑉 ⊆ 𝑈 ∩ (𝑉 +𝑊 ). Similarly, 𝑈 ∩𝑊 ⊆ 𝑈 ∩ (𝑉 +𝑊 ). Then
the conclusion follows. (b) Take 𝑋 = R2, 𝑈 = span {(1, 1)}, 𝑉 = span {(1, 0)},
𝑊 = span {(0, 1)}. (c) 𝑉 ∩ 𝑊 ⊆ 𝑉 ; so 𝑈 + (𝑉 ∩ 𝑊 ) ⊆ 𝑈 + 𝑉 . 𝑉 ∩ 𝑊 ⊆ 𝑊 ; so
𝑈 + (𝑉 ∩ 𝑊 ) ⊆ 𝑈 + 𝑊 . Therefore, 𝑈 + (𝑉 ∩ 𝑊 ) ⊆ (𝑈 + 𝑉 ) ∩ (𝑈 + 𝑊 ). (d) Take
𝑈 , 𝑉 ,𝑊 , 𝑋 as in (b).
16. 𝑐 00, the set of all sequences each having finitely many nonzero terms.
§ 1.5
1(a) Lin. Ind. (b) (7, 8, 9) = 2(4, 5, 6) − (1, 2, 3). (c) Lin. Ind. (d) 4th = 7/11 times
1st +8/11 times 2nd +13/11 times 3rd. (e) Lin. Ind. (f) Lin. Ind. (g) Lin. Ind.
(h) Lin. Ind. (i) 2 = 2 sin2 𝑡 + 2 cos2 𝑡 . (j) Lin. Ind. (k) Lin. Ind.
2. If (𝑎, 𝑏) = 𝛼 (𝑐, 𝑑), then 𝑎𝑑 − 𝑏𝑐 = 0. If (𝑐, 𝑑) = 𝛼 (𝑎, 𝑏), then 𝑎𝑑 − 𝑏𝑐 = 0. If
𝑎𝑑 = 𝑏𝑐, then (𝑎, 𝑏) = (0, 0) or (𝑐, 𝑑) = (0, 0) or (𝑎, 𝑏) = 𝛼 (𝑐, 𝑑) for some nonzero 𝛼 .
3. {(1, 0), (0, 1)} spans R2 . Use Theorem 1.15.
4. (1, 0), (0, 1), (1, 1). 5. (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1).
6(a) If a linear combination of vectors from a subset is zero, then the same shows
that a linear combination of vectors from the superset is also zero. (b) Follows from
(a). (c) Follows from (a). (d) Follows from (c).
7(a) {(1, 0)} is a lin. ind subset of the lin. dep. set {(1, 0), (0, 1)}. (b) Take the sets
in (a). (c) {(1, 0)} and {(2, 0)} are each lin. ind. but their union is not.
(d) {(1, 0), (2, 0)} and {(1, 0), (3, 0)} are each lin. dep. but their intersection is not.
8(a) Not necessarily. 𝐴 = {(1, 0), (2, 0)}, 𝐵 = {(0, 1)}. (b) If 𝑣 ≠ 0 and 𝑣 =
𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 = 𝑏 1𝑣 1 + · · · + 𝑏𝑚 𝑣𝑚 for nonzero 𝑎𝑖 , 𝑏 𝑗 and 𝑢𝑖 ∈ 𝐴, 𝑣 𝑗 ∈ 𝐵, then
𝑎 1𝑢 1 + · · · + 𝑎𝑛𝑢𝑛 − 𝑏 1𝑣 1 − · · · − 𝑏𝑚 𝑣𝑚 = 0 shows that 𝐴 ∪ 𝐵 is lin. dep.
9. Suppose 𝑎𝑒^𝑡 + 𝑏𝑡𝑒^𝑡 + 𝑐𝑡^3𝑒^𝑡 = 0. Evaluate it at 𝑡 = −1, 0, 1. Solve for 𝑎, 𝑏, 𝑐.
10. Yes. Let 𝑓 (𝑡) = Σ_{𝑘=1}^{𝑛} 𝑎𝑘 sin 𝑘𝑡 . 𝑓 (𝑡) = 0 ⇒ ∫_{−𝜋}^{𝜋} sin 𝑚𝑡 𝑓 (𝑡) 𝑑𝑡 = 0 for any 𝑚.
Evaluate the integral and conclude that 𝑎𝑚 = 0 for 1 ≤ 𝑚 ≤ 𝑛.
11. Otherwise, a higher degree polynomial is a linear combination of some lower
degree polynomials. Differentiate the equation to get a contradiction.
12. Five vectors in the span of four vectors are lin. dep.
13. 𝑆 ∪ {𝑣 } is lin. dep. implies Σ_{𝑖=1}^{𝑛} 𝑎𝑖 𝑣𝑖 = 0, where 𝑣𝑖 ∈ 𝑆 ∪ {𝑣 } and the 𝑎𝑖 are scalars,
not all zero. Since 𝑆 is lin. ind., 𝑣 is one of these 𝑣𝑖 's. Say, 𝑣 1 = 𝑣. Then 𝑎 1 ≠ 0, again
due to the lin. ind. of 𝑆. Then 𝑣 = 𝑣 1 = −(𝑎 1 )^{−1} Σ_{𝑖=2}^{𝑛} 𝑎𝑖 𝑣𝑖 ∈ span (𝑆).
14. Without loss of generality, suppose 𝑢 + 𝑣 1 = 𝑎 2 (𝑢 + 𝑣 2 ) + · · · + 𝑎𝑛 (𝑢 + 𝑣𝑛 ). If
𝑎 2 + · · · + 𝑎𝑛 = 1, then 𝑣 1 = 𝑎 2𝑣 2 + · · · + 𝑎𝑛 𝑣𝑛 , contradicting the lin. ind. of 𝑣 1, . . . , 𝑣𝑛 .
Thus (1 − 𝑎 2 − · · · − 𝑎𝑛 )𝑢 = 𝑎 2𝑣 2 + · · · + 𝑎𝑛 𝑣𝑛 − 𝑣 1, so that 𝑢 ∈ span {𝑣 1, . . . , 𝑣𝑛 }.
§ 1.6
1(a) Basis. (b) Basis. (c) Not a basis. (d) Basis. 2. Yes. 3(a) Yes. (b) No.
4. {(1, 0, −1), (0, 1, −1)}. 5. {(1, 0, 0, 0, 1), (0, 1, 0, 1, 0), (0, 0, 1, 0, 1)}.
6. {𝑡 − 2, 𝑡 2 − 2𝑡 − 2}. 7. {1 + 𝑡 2, 1 − 𝑡 2, 𝑡, 𝑡 3 }. 8. Yes; Yes.
9. {𝑒 1, 𝑒 2, 𝑒 3 }, {𝑒 1 + 𝑒 2, 𝑒 1 + 𝑒 3, 𝑒 2 + 𝑒 3 }, {𝑒 1 + 2𝑒 2, 𝑒 2 + 2𝑒 3, 𝑒 3 + 2𝑒 1 }.
10. {𝑓1, 𝑓2, 𝑓3 }, where for 𝑖, 𝑗 ∈ {1, 2, 3}, 𝑓𝑖 (𝑖) = 1, and 𝑓𝑖 ( 𝑗) = 0 when 𝑖 ≠ 𝑗 .
11. Let 𝐸 𝑗𝑘 ∈ C𝑛×𝑛 have 1 as ( 𝑗, 𝑘)th entry; all other entries 0. A basis is
{𝐸 𝑗 𝑗 : 1 ≤ 𝑗 ≤ 𝑛} ∪ {𝑖𝐸 𝑗 𝑗 : 1 ≤ 𝑗 ≤ 𝑛} ∪ {𝐸 𝑗𝑘 + 𝐸𝑘 𝑗 : 1 ≤ 𝑗 < 𝑘 ≤ 𝑛}
∪{𝑖 (𝐸 𝑗𝑘 + 𝐸𝑘 𝑗 ) : 1 ≤ 𝑗 < 𝑘 ≤ 𝑛}.
12. A basis is {𝐸𝑖𝑖 − 𝐸𝑛𝑛 : 1 ≤ 𝑖 < 𝑛} ∪ {𝐸𝑖 𝑗 : 1 ≤ 𝑖 < 𝑗 ≤ 𝑛}.
13. A basis is
\[
\begin{bmatrix} i & 0 \\ 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & i \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}.
\]
§ 1.7
1. {0}, R3, straight lines passing through the origin, and planes passing through the
origin. 2(a) Basis: {(0, 1, 0, 0, 0), (1, 0, 1, 0, 0), (1, 0, 0, 1, 0), (0, 0, 0, 0, 1)}.
(b) Basis: {(1, 0, 0, 0, −1), (0, 1, 1, 1, 0)}. (c) Basis: {(1, −1, 0, 2, 1), (2, 1, −2, 0, 0),
(2, 4, 1, 0, 1)} 3. 3; It is span {1 + 𝑡 2, −1 + 𝑡 + 𝑡 2, 𝑡 3 }.
4(a) dim F (F𝑛×𝑛 ) = 𝑛 2 ; dim R (C𝑛×𝑛 ) = 2𝑛 2 . (b) In F𝑛×𝑛 , dim F is (𝑛 2 + 𝑛)/2. In
C𝑛×𝑛 , dim R is 𝑛 2 + 𝑛. (c) In F𝑛×𝑛 , dim F is (𝑛 2 − 𝑛)/2. In C𝑛×𝑛 , dim R is 𝑛 2 − 𝑛.
(d) In C𝑛×𝑛 , Basis over R: the 𝑛 matrices with a single 1 on the diagonal, the
𝑛(𝑛 − 1)/2 matrices with a single pair of 1s at corresponding off-diagonal elements
and the 𝑛(𝑛 − 1)/2 matrices with a single pair of 𝑖 and −𝑖 at corresponding off-
diagonal elements. Thus dim R is 𝑛 2 . In R𝑛×𝑛 , dim R is (𝑛 2 + 𝑛)/2.
(e) In F𝑛×𝑛 , dim F is (𝑛 2 + 𝑛)/2. In C𝑛×𝑛 , dim R is 𝑛 2 + 𝑛. (f) In F𝑛×𝑛 , dim F is 𝑛. In
C𝑛×𝑛 , dim R is 2𝑛. (g) In F𝑛×𝑛 , dim F is 1. In C𝑛×𝑛 , dim R is 2.
5. dim (𝑈 ∩ 𝑊 ) = dim (𝑈 ) and 𝑈 ∩ 𝑊 is a subspace of 𝑈 implies 𝑈 ∩ 𝑊 = 𝑈 .
6. If 𝑈 ∩ 𝑊 = {0}, then dim (𝑈 ) + dim (𝑊 ) = dim (𝑈 + 𝑊 ) ≤ 9; a contradiction.
7. {(−3, 0, 1)} is a basis for 𝑈 ∩ 𝑊 ; dim (𝑈 + 𝑊 ) = 3.
8. 𝑈 + 𝑉 = {(𝑎𝑖 ) ∈ R^{50} : 12 | 𝑖}, 𝑈 ∩ 𝑉 = {(𝑎𝑖 ) ∈ R^{50} : 3 | 𝑖 or 4 | 𝑖}, dim (𝑈 ) = 34,
dim (𝑉 ) = 38, dim (𝑈 + 𝑉 ) = 46, dim (𝑈 ∩ 𝑉 ) = 26.
9. No: dim (span {𝑡, 𝑡 2, 𝑡 3, 𝑡 4, 𝑡 5 }) ≤ dim (𝑉 ) = 3.
10. {𝑓1, . . . , 𝑓𝑛 } is a basis where 𝑓𝑖 (𝑖) = 1, 𝑓𝑖 ( 𝑗) = 0 for 𝑗 ≠ 𝑖.
11. If 𝑆 is a linearly dependent spanning set, systematically delete vectors to get
a basis; contradicting |𝑆 | = dim (𝑉 ). For Theorem 1.27, adjoin to a basis of 𝑈 the
vectors from a spanning set of 𝑉 and delete all linearly dependent vectors from the
new ones. 12. R[𝑡] ⊆ 𝑉 .
§ 1.8
1. 3. 2. Basis: {(1, −1, 0, 2, 1), (0, 3, −2, −4, −2), (0, 0, 5, 4, 2), (0, 0, 0, 0, 1)}.
3. Basis: {1 + 𝑡 2, 𝑡 + 2𝑡 2, 𝑡 3 }. 4(a) Bases for 𝑈 : {(1, 2, 3), (2, 1, 1)}; 𝑊 :
{(1, 0, 1), (3, 0, −1)}; 𝑈 + 𝑊 : {(1, 2, 3), (0, 3, 5), (0, 2, 2)}; 𝑈 ∩ 𝑊 : {(−3, 0, 1)}.
(b) Bases for 𝑈 : {(1, 0, 2, 0), (1, 0, 3, 0)}; 𝑊 : {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)};
𝑈 + 𝑊 : {(1, 0, 2, 0), (0, 1, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1)}; 𝑈 ∩ 𝑊 : {(1, 0, 0, 0)}.
(c) Bases for 𝑈 : {(1, 0, 0, 2), (3, 1, 0, 2), (7, 0, 5, 2)};𝑊 : {(1, 0, 3, 2), (1, 1, −1, −1)};
𝑈 +𝑊 : {(1, 0, 0, 2), (0, 1, 0, −4), (0, 0, 3, 0), (0, 0, 0, 1)}; 𝑈 ∩𝑊 : {(1, −12, 15, 14)}.
5. dim of 𝑈 : 3; 𝑊 : 3; 𝑈 + 𝑊 : 4; 𝑈 ∩ 𝑊 : 2. 6. 𝑏 (𝑎 − 3) = 6.
7(a) Each 𝑣𝑖 ∈ span {𝑣 1, 𝑣 2 − 𝑣 1, . . . , 𝑣𝑛 − 𝑣 1 }. (b) Due to (a) and dim (𝑉 ) = 𝑛.
§ 2.1
2(a) ⟨(0, 1), (0, 1)⟩ = 0. (b) As in (a). (c) ⟨(1, 1), (1, 1)⟩ = 0. (d) ⟨1, 1⟩ = 0. (e) As in
(d). (f) ⟨1, 1⟩ = 0. (g) For 𝑓 (𝑡) = 0 in [0, 1/2] and 𝑓 (𝑡) = 𝑡 in (1/2, 1], ⟨𝑓 , 𝑓 ⟩ = 0.
3. (𝑦^t𝐴𝑥)^t = 𝑦^t𝐴𝑥 implies 𝑎 12 = 𝑎 21 . Next, 𝑥^t𝐴𝑥 ≥ 0 gives a quadratic. Complete
the square and argue. 4. With 𝑦 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖 𝑥𝑖 , ⟨𝑦, 𝑦⟩ = 0.
5. ⟨𝑥, 𝑦⟩ = 𝛼 + 𝑖𝛽 ⇒ Re⟨𝑖𝑥, 𝑦⟩ = −𝛽.
6. (a)-(c) easy. (d) ‖𝑥 + 𝛼𝑦‖² = ‖𝑥 − 𝛼𝑦‖² iff Re(𝛼⟨𝑥, 𝑦⟩) = 0. Take 𝛼 = ⟨𝑥, 𝑦⟩.
(e) ‖𝑥 + 𝑦‖² = (‖𝑥‖ + ‖𝑦‖)² iff Re⟨𝑥, 𝑦⟩ = ‖𝑥‖ ‖𝑦‖ (using Re⟨𝑥, 𝑦⟩ ≤ |⟨𝑥, 𝑦⟩| ≤
‖𝑥‖ ‖𝑦‖) iff |⟨𝑥, 𝑦⟩| = ‖𝑥‖ ‖𝑦‖ (Cauchy-Schwarz) iff one is a scalar multiple of the
other. 7. Expand the norms on the right hand side and simplify.
8. Consider 𝑥 = (𝑎 1, 2𝑎 2, . . . , 𝑛𝑎𝑛 ), 𝑦 = (𝑏 1, 𝑏 2 /2, . . . , 𝑏𝑛 /𝑛). Apply Cauchy-
Schwarz.
§ 2.2
1. 𝑊 = span {(3, −1, 3, 0), (0, −1, 3, 3)}. 2. ⟨𝑥, 𝑦⟩ = ⟨𝑦, 𝑥⟩ ⇒ ⟨𝑥 + 𝑦, 𝑥 − 𝑦⟩ =
‖𝑥‖² − ‖𝑦‖². 5. Yes. 6. If 𝐵 = {𝑣 1, . . . , 𝑣𝑛 } is an orthonormal set and for each 𝑥,
‖𝑥‖² = Σ_{𝑗=1}^{𝑛} |⟨𝑥, 𝑣 𝑗 ⟩|², then 𝐵 is an orthonormal basis. For, let 𝑦 = Σ_{𝑗=1}^{𝑛} ⟨𝑥, 𝑣 𝑗 ⟩𝑣 𝑗 .
Then ‖𝑥‖² = ‖𝑦‖². 7. Easy.
8(a) 𝑥 ∈ 𝑉⊥ ⇒ ⟨𝑥, 𝑣⟩ = 0 for all 𝑣 ∈ 𝑉 . In particular, ⟨𝑥, 𝑥⟩ = 0. (b) ⟨𝑣, 0⟩ = 0 for
all 𝑣 ∈ 𝑉 . (c) If ⟨𝑥, 𝑧⟩ = 0 = ⟨𝑦, 𝑧⟩, then ⟨𝑥 + 𝛼𝑦, 𝑧⟩ = 0. (d) If 𝑥 ∈ 𝑆, then ⟨𝑥, 𝑦⟩ = 0
for all 𝑦 ∈ 𝑆⊥ .
9(a) 𝑊 ⊆ 𝑉 , 𝑊⊥ ⊆ 𝑉 ; so 𝑊 + 𝑊⊥ ⊆ 𝑉 . Let 𝑣 ∈ 𝑉 . Let {𝑣 1, . . . , 𝑣𝑛 } be an
orthonormal basis of 𝑊 . Write 𝑥 = Σ_{𝑗=1}^{𝑛} ⟨𝑣, 𝑣 𝑗 ⟩𝑣 𝑗 ; 𝑦 = 𝑣 − 𝑥 . Then ⟨𝑦, 𝑣 𝑗 ⟩ = 0. Hence
𝑦 ∈ 𝑊⊥ . (b) Let 𝑥 ∈ 𝑊 ∩ 𝑊⊥ . Then ⟨𝑥, 𝑥⟩ = 0. (c) 𝑊 ⊆ 𝑊⊥⊥ . Let 𝑥 ∈ 𝑊⊥⊥ . Using
(a), 𝑥 = 𝑤 + 𝑦, for some 𝑤 ∈ 𝑊 and 𝑦 ∈ 𝑊⊥ . Then ⟨𝑤, 𝑦⟩ = 0 ⇒ 0 = ⟨𝑥, 𝑦⟩ = ⟨𝑦, 𝑦⟩.
Then 𝑥 = 𝑤 ∈ 𝑊 .
10. Let 𝑥 (𝑡) := Σ_{𝑗=1}^{𝑛} 𝑎 𝑗 sin( 𝑗𝑡) = 0. Compute ∫_0^{2𝜋} 𝑥 (𝑡) sin(𝑚𝑡) 𝑑𝑡 for 𝑚 = 1, 2, . . . , 𝑛.
§ 2.3
1(a) (1, 2, 0), (6/5, −3/5, 0), (0, 0, 1). (b) (1, 1, 1), (2/3, −4/3, 2/3), (1, 0, −1).
(c) (0, 1, 1), (0, 1, −1), (−1, 0, 0).
2(a) (1/√14)(1, −2, 3). (b) (1/√74)(7, −4, 3). (c) (1/(2√5))(0, 2, 4).
3(a) span {(−1, 1, 0, 1), (0, 0, 1, 0)}. (b) span {(−6, 1, 5, 2), (0, 1, −1, 1)}.
(c) span {(1, 0, 0, 0), (0, 0, 1, 1)}. 4. (1/2)(1, 0, 𝑖/2), (1/4)(1 + 𝑖, 1, 1 − 𝑖) .
5(a) 1, 𝑡 − 1/2, 𝑡 2 − 𝑡 + 1/6. (b) 1, 𝑡, 𝑡 2 − 1/3. (c) 1, 𝑡 + 1/2, 𝑡 2 − 5𝑡 − 11/6.
6(a) {−(𝑏 + 𝑐 + 𝑑) + 2𝑏𝑡 + 3𝑐𝑡 2 + 4𝑑𝑡 3 : 𝑎, 𝑏, 𝑐, 𝑑 ∈ R}.
(b) 1, 𝑡 − 1/2, 𝑡 2 − 𝑡 + 1/6, 𝑡 3 − 3𝑡 2 /2 + 3𝑡/5 − 1/20.
§ 2.4
1. (5/3, 4/3, 1/3). 2. (−1/3, 2/3, −1/3). 3. 𝑣 since 𝑣 ∈ 𝑈 . 4. −19/20−3𝑡/5+3𝑡 2 /2.
5. 𝑒 𝑡 − 9(𝑒 − /𝑒) + 3𝑡/𝑒 − 15(𝑒 − 13/𝑒)𝑡 2 /8.
§ 3.1
1(a) 𝑇 (0, 0) ≠ (0, 0). (b) 𝑇 (2, 2) = (2, 4); 2𝑇 (1, 1) = (2, 2). (c) 𝑇 (𝜋/2, 0) =
(1, 0); 2𝑇 (𝜋/4, 0) = (√2, 0). (d) 𝑇 (−1, 0) = (1, 0); (−1)𝑇 (1, 0) = (−1, 0).
(e) 𝑇 (0, 0) ≠ (0, 0). (f) 𝑇 (0, 2) = (0, 4); 2𝑇 (0, 1) = (2, 2).
2. 𝑇 (𝑥) = 𝛼𝑥 for some 𝛼 . 3. 𝑇 (2, 3) = (5, 11). 𝑇 is one-one.
4. 𝑇 𝑆 (𝑥) = 0 and 𝑆𝑇 (𝑥) = 𝑥 (1) − 𝑥 (0). Both are linear transformations.
5. No. If 𝑇 (𝑎, 𝑏) = (1, 1), then 𝑇 (−𝑎, −𝑏) = (−1, −1), which is not in the co-domain
square. 6. Fix a basis {𝑣 1, 𝑣 2 } for 𝑉 . If 𝑣 = 𝑎𝑣 1 + 𝑏𝑣 2, define 𝑇 (𝑣) = (𝑎, 𝑏).
§ 3.2
1(a) No 𝑇 since 𝑇 (2, −1) ≠ 2𝑇 (1, 1) − 3𝑇 (0, 1). (b) 𝑇 (𝑎, 𝑏) = (2𝑎 − 𝑏, 𝑎 − 𝑏, 2𝑎).
(c) No 𝑇 as 𝑇 (−2, 0, −6) ≠ −2𝑇 (1, 0, 3). (d) 𝑇 (𝑎, 𝑏, 𝑐) = (𝑐, (𝑏 + 𝑐 − 𝑎)/2).
(e) 𝑇 (𝑎 + 𝑏𝑡 + 𝑐𝑡 2 + 𝑑𝑡 3 ) = 𝑏 + 𝑐 and many more. (f) This 𝑇 itself.
(g) No 𝑇 since 𝑇 (1 + 𝑡) ≠ 𝑇 (1) + 𝑇 (𝑡). (h) This 𝑇 is linear.
2. No. Let 𝑇 (1, 1) = (𝑎, 𝑏), 𝑇 (1, −1) = (𝑐, 𝑑). Now, −1 ≤ 𝑎, 𝑐 ≤ 1 and 0 ≤ 𝑏, 𝑑 ≤ 2.
Then 𝑇 (−1, −1) = (−𝑎, −𝑏), 𝑇 (−1, 1) = (−𝑐, −𝑑). Here, 0 ≤ −𝑏, −𝑑 ≤ 2 also.
(Image points are inside the co-domain.) It forces 𝑏 = 𝑑 = 0. So, 𝑇 (1, 1) = (𝑎, 0),
𝑇 (1, −1) = (𝑐, 0). Then 𝑇 (𝛼, 𝛽) = ((𝛼 + 𝛽)𝑎/2 + (𝛼 − 𝛽)𝑐/2, 0) for all 𝛼, 𝛽 ∈ R. No
point goes to (1, 2).
3. Expand ‖𝑇 (𝑢 + 𝑣)‖² and use ‖𝑇𝑥‖² = ‖𝑥‖² for all 𝑥 ∈ 𝑉 .
§ 3.3
1(a) rank(𝑇 ) = 2, null(𝑇 ) = 0. (b) rank(𝑇 ) = 2, null(𝑇 ) = 1.
(c) rank(𝑇 ) = 2, null(𝑇 ) = 0. (d) rank(𝑇 ) = 2, null(𝑇 ) = 0.
(e) rank(𝑇 ) = 2, null(𝑇 ) = 0. (f) rank(𝑇 ) = 3, null(𝑇 ) = 0.
2. Space of all const. func.
3(a) 𝑁 (𝑇 ) = {(𝑎, 𝑎, −𝑎) : 𝑎 ∈ R}, 𝑅(𝑇 ) = {(𝑎, 𝑎 + 𝑏, 𝑏) : 𝑎, 𝑏 ∈ R}.
(b) 𝑆 = {(1, 0, 1) + 𝑥 : 𝑥 ∈ 𝑁 (𝑇 )}.
4(a) rank(𝑇 ) ≤ dim (𝑉 ) and 𝑅(𝑇 ) ⊆ 𝑊 . (b) 𝑇 is onto implies rank(𝑇 ) = dim (𝑊 ).
(c) 𝑇 is one-one implies rank(𝑇 ) = dim (𝑉 ). (d) Follows from (c).
(e) Follows from (b).
5(a) 𝑇 (𝑎, 𝑏) = (𝑎 − 𝑏, 𝑎 − 𝑏). (b) 𝑆 (𝑎, 𝑏) = (𝑎, 𝑏), 𝑇 (𝑎, 𝑏) = (𝑏, 𝑎).
6. Extend a basis {𝑢 1, . . . , 𝑢𝑘 } for 𝑈 to a basis {𝑢 1, . . . , 𝑢𝑘 , 𝑣 1, . . . , 𝑣𝑚 } for 𝑉 .
(a) 𝑇 (𝑢𝑖 ) = 𝑢𝑖 , 𝑇 (𝑣 𝑗 ) = 0. (b) 𝑇 (𝑢𝑖 ) = 0, 𝑇 (𝑣 𝑗 ) = 𝑣 𝑗 .
(c) 𝑄 = \frac{1}{2}\begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \end{bmatrix}, \quad
𝑅 = \begin{bmatrix} 2 & 3 & 2 \\ 0 & 5 & -2 \\ 0 & 0 & 4 \end{bmatrix}.
3. Suppose 𝐴 = 𝑄 1𝑅1 = 𝑄 2𝑅2, 𝑄 1, 𝑄 2 ∈ F𝑚×𝑛 satisfy 𝑄 1∗𝑄 1 = 𝑄 2∗𝑄 2 = 𝐼, 𝑅1 = [𝑎𝑖 𝑗 ],
𝑅2 = [𝑏𝑖 𝑗 ] ∈ F𝑛×𝑛 are upper triangular, and 𝑎𝑘𝑘 > 0, 𝑏𝑘𝑘 > 0 for 1 ≤ 𝑘 ≤ 𝑛. Then
𝑅1∗𝑅1 = 𝑅1∗𝑄 1∗𝑄 1𝑅1 = (𝑄 1𝑅1 ) ∗𝑄 1𝑅1 = 𝐴∗𝐴 = (𝑄 2𝑅2 ) ∗𝑄 2𝑅2 = 𝑅2∗𝑅2 . Now, 𝑅1, 𝑅2, 𝑅1∗
and 𝑅2∗ are all invertible. Multiplying (𝑅2∗ ) −1 on the left, and (𝑅1 ) −1 on the right,
we have (𝑅2∗ ) −1𝑅1∗𝑅1 (𝑅1 ) −1 = (𝑅2∗ ) −1𝑅2∗𝑅2 (𝑅1 ) −1 . It implies (𝑅2∗ ) −1𝑅1∗ = 𝑅2𝑅1−1 . The
matrix on the left is lower triangular and that on the right is upper triangular;
so both are diagonal. Comparing the diagonal entries in the products we have
[(𝑏𝑖𝑖 ) −1 ] ∗𝑎𝑖𝑖∗ = 𝑏𝑖𝑖 (𝑎𝑖𝑖 ) −1 for 1 ≤ 𝑖 ≤ 𝑛. That is, |𝑎𝑖𝑖 | 2 = |𝑏𝑖𝑖 | 2 . Since 𝑎𝑖𝑖 > 0
and 𝑏𝑖𝑖 > 0, we see that 𝑎𝑖𝑖 = 𝑏𝑖𝑖 . Hence (𝑅2−1 ) ∗𝑅1∗ = 𝑅2𝑅1−1 = 𝐼 . Therefore,
𝑅2 = 𝑅1, 𝑄 2 = 𝐴𝑅2−1 = 𝐴𝑅1−1 = 𝑄 1 .
4. Since columns of 𝐴 are lin. ind. the least squares solution is unique.
5(a) Extend the columns of 𝐴 to a basis {𝑢 1, . . . , 𝑢𝑚 } of F𝑚×1 . Use Gram-Schmidt
orthogonalization to obtain a basis {𝑣 1, . . . , 𝑣𝑚 } for F𝑚×1 . There exists an isomor-
phism between these two bases. So, let 𝑃 ∈ F𝑚×𝑚 be the invertible matrix (this
isomorphism) such that 𝑃𝑢𝑖 = 𝑣𝑖 for 1 ≤ 𝑖 ≤ 𝑚. Let 𝐶 = [𝑣 1 · · · 𝑣𝑛 ]. Then 𝐶 ∗𝐶 is a
diagonal matrix. Moreover, 𝑃𝐴 = 𝐶 implies that 𝐴 = 𝑃 −1𝐶, which is the required
factorization of 𝐴. (b) Use orthonormalization instead of orthogonalization.
§ 4.3
1. [𝑇 ]𝐷,𝐵 = \begin{bmatrix} -1/3 & 1/3 \\ 0 & 1 \\ 2/3 & -2/3 \end{bmatrix};
[𝑇 ]𝐷,𝐶 = \begin{bmatrix} 1/3 & 1/3 \\ 2 & 3 \\ -2/3 & -2/3 \end{bmatrix}.
2(a)-(b) \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
(c) \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.
−1 −1 0
0 2 0
3. 1 2 4 . 4. .
1 −1 0
0 0 1
§ 4.4
1 0 0 0 0 1 0
0 0 1 0 2 2 4
1(a) . (b) . (c) 1 0 0 1 .
0 1 0 0 0 0 0
0 0 0 1 0 1 6
2(a) If 𝑆𝑇 = 𝑇 𝑆, then (𝑆𝑇 ) ∗ = (𝑇 𝑆) ∗ = 𝑆 ∗𝑇 ∗ = 𝑆𝑇 . If (𝑆𝑇 ) ∗ = 𝑆𝑇 , then 𝑆𝑇 = (𝑆𝑇 ) ∗ =
𝑇 ∗𝑆 ∗ = 𝑇 𝑆. (b) (𝑇 ∗𝑆𝑇 ) ∗ = 𝑇 ∗𝑆 ∗𝑇 = 𝑇 ∗𝑆𝑇 .
(c) 𝑇 ∗𝑆𝑇 = (𝑇 ∗𝑆𝑇 ) ∗ = 𝑇 ∗𝑆 ∗𝑇 . Multiply (𝑇 ∗ ) −1 on the left and 𝑇 −1 on the right.
3. 𝐴∗ + 𝐴 = 0 = 𝐵 ∗ + 𝐵 ⇒ (𝐴 + 𝛼𝐵) ∗ + (𝐴 + 𝛼𝐵) = 0 for real 𝛼 .
Basis: \begin{bmatrix} i & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & i \end{bmatrix}, \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}.
4. \begin{bmatrix} 1 & 1+i & 1 \\ -1+i & 1 & 1 \\ -1 & -1 & 1 \end{bmatrix}.
5. \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}.
6. tr(𝐴 + 𝛼𝐵) = tr(𝐴) + 𝛼tr(𝐵); so 𝑉 is a subspace of F𝑛×𝑛 .
7. tr(𝐴 + 𝛼𝐵) = tr(𝐴) + 𝛼tr(𝐵); with 𝑉 as in Q.6, null(𝑇 ) = dim F (𝑉 ) = 𝑛 2 − 1.
8. 𝐴 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. 9(a) No; tr(−𝐼 + (−𝐼 )) < 0. (b) Yes. 10. 𝐴 = 𝐼 = 𝐵.
11. tr(𝐴𝐵) = Σ_{𝑗=1}^{𝑛} Σ_{𝑘=1}^{𝑛} 𝑎 𝑗𝑘 𝑏𝑘 𝑗 = Σ_{𝑘=1}^{𝑛} Σ_{𝑗=1}^{𝑛} 𝑎𝑘 𝑗 𝑏 𝑗𝑘 = Σ_{𝑗=1}^{𝑛} Σ_{𝑘=1}^{𝑛} 𝑏 𝑗𝑘 𝑎𝑘 𝑗 = tr(𝐵𝐴).
12. tr(𝐴𝐵 − 𝐵𝐴) = tr(𝐴𝐵) − tr(𝐵𝐴) = 0 ≠ tr(𝐼 ). 13. Let 𝐶 = \begin{bmatrix} a & b \\ c & -a \end{bmatrix}. If 𝑎 = 0, take
𝐴 = \begin{bmatrix} 0 & -b \\ c & 0 \end{bmatrix}, 𝐵 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. If 𝑎 ≠ 0, take
𝐴 = \begin{bmatrix} 0 & a \\ 0 & c \end{bmatrix}, 𝐵 = \begin{bmatrix} 0 & 0 \\ 1 & b/a \end{bmatrix}.
14. tr(𝐴∗𝐴) = Σ_{𝑖, 𝑗} |𝑎𝑖 𝑗 |².
15. 𝐴∗𝐴 = 𝐴2 ⇒ 𝐴𝐴∗ = (𝐴∗ ) 2 . Then tr[(𝐴∗ − 𝐴) ∗ (𝐴∗ − 𝐴)] = tr[𝐴𝐴∗ − 𝐴∗𝐴] = 0.
§ 4.5
1. \begin{bmatrix} 1 & 1 & -1 \\ -1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.
(a) [𝐼 ]𝑁,𝑂 = \frac{1}{2}\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}.
(b) [𝑇 ]𝑁,𝑂 = \frac{1}{2}\begin{bmatrix} 2 & 3 & 3 \\ 2 & 1 & 3 \\ 0 & 0 & 2 \end{bmatrix}.
(c) [(1, 2, 3)^t ]𝑂 = (1, 0, 2)^t , [(1, 2, 3)^t ]𝑁 = \frac{1}{2}(4, 3, 5)^t , [𝑇 (1, 2, 3)^t ]𝑁 = (4, 4, 2)^t .
2(a) 𝑄 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, 𝑅 = \begin{bmatrix} 1 & 0 \\ -4 & -1 \end{bmatrix}.
(b) 𝑃 = \begin{bmatrix} 1 & 1 \\ -1 & -3 \end{bmatrix}.
(c) 𝑆 = \begin{bmatrix} 1 & 0 \\ -4 & -1 \end{bmatrix}.
(d) 𝑆 = 𝑃𝑄𝑃^{−1} = [𝐼 ]𝑁,𝑂 [𝐴]𝑂,𝑂 ([𝐼 ]𝑁,𝑂 )^{−1} = [𝐴]𝑁,𝑂 [𝐼 ]𝑂,𝑁 = [𝐴]𝑁,𝑁 = 𝑅.
3. 𝐴 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, 𝑣 = (1, 2)^t , 𝐵 = {(1, 0)^t , (1, 1)^t }; then [𝑣]𝐵 = (−1, 2)^t ,
𝐴[𝑣]𝐵 = (−1, 1)^t , while [𝐴𝑣]𝐵 = [(1, 3)^t ]𝐵 = (−2, 3)^t .
4. If 𝑇 𝑣𝑖 = 𝑎𝑖1𝑤 1 + · · · + 𝑎𝑖𝑚𝑤𝑚 for 1 ≤ 𝑖 ≤ 𝑛, then 𝑇 = Σ_{𝑖=1}^{𝑛} Σ_{𝑗=1}^{𝑚} 𝑎𝑖 𝑗𝑇𝑖 𝑗 . Next, this
equals 0 implies 𝑇 𝑣𝑖 = 0. As {𝑤 𝑗 } lin. ind., 𝑎𝑖1 = · · · = 𝑎𝑖𝑚 = 0. So, {𝑇𝑖 𝑗 } is lin. ind.
5(a) 𝑇 is one-one iff 𝑁 (𝑇 ) = {0} iff null([𝑇 ]𝐸,𝐵 ) = 0 iff rank([𝑇 ]𝐸,𝐵 ) = 𝑛.
(b) 𝑇 is onto iff [𝑇 ]𝐸,𝐵 is onto iff rank([𝑇 ]𝐸,𝐵 ) = 𝑚.
6. Both L (𝑉 ,𝑊 ) and F𝑚×𝑛 are vector spaces. Use Exercise 5 and show that
[𝛼 𝑇 ]𝐸,𝐵 = 𝛼 [𝑇 ]𝐸,𝐵 and [𝑆 + 𝑇 ]𝐸,𝐵 = [𝑆]𝐸,𝐵 + [𝑇 ]𝐸,𝐵 .
7. Since the map 𝑇 ↦→ [𝑇 ]𝐸,𝐵 is an isomorphism, it maps a basis onto a basis.
8(a) Write 𝐶 𝑗 := the 𝑗th column of 𝐴 = [⟨𝑢 1, 𝑢 𝑗 ⟩, . . . , ⟨𝑢𝑛 , 𝑢 𝑗 ⟩]^𝑡 . Suppose for scalars
𝑏 1, . . . , 𝑏𝑛 , Σ_𝑗 𝑏 𝑗 𝐶 𝑗 = 0. Its 𝑖th component gives Σ_𝑗 𝑏 𝑗 ⟨𝑢𝑖 , 𝑢 𝑗 ⟩ = 0. That is, for each 𝑖,
⟨Σ_𝑗 𝑏 𝑗 𝑢 𝑗 , 𝑢𝑖 ⟩ = 0. Since {𝑢𝑖 } is a basis, for each 𝑣 ∈ 𝑉 , ⟨Σ_𝑗 𝑏 𝑗 𝑢 𝑗 , 𝑣⟩ = 0. In particular,
⟨Σ_𝑗 𝑏 𝑗 𝑢 𝑗 , Σ_𝑗 𝑏 𝑗 𝑢 𝑗 ⟩ = 0. Or, Σ_𝑗 𝑏 𝑗 𝑢 𝑗 = 0. Due to lin. ind. of {𝑢 𝑗 }, each 𝑏 𝑗 = 0. So, the
columns of 𝐴 are lin. ind.
(b) Since {𝐶 1, . . . , 𝐶𝑛 } is a basis for F𝑛×1, there exist unique scalars 𝑏 1, . . . , 𝑏𝑛
such that [𝛼 1, . . . , 𝛼𝑛 ]^𝑡 = 𝑏 1𝐶 1 + · · · + 𝑏𝑛𝐶𝑛 . Comparing the components, we have
𝛼𝑖 = ⟨𝑢𝑖 , Σ_𝑗 𝑏 𝑗 𝑢 𝑗 ⟩. So, 𝛼𝑖 = ⟨Σ_𝑗 𝑏 𝑗 𝑢 𝑗 , 𝑢𝑖 ⟩.
9. [𝑇 ]𝐶,𝐶 = [𝐼 ]𝐶,𝐵 [𝑇 ]𝐵,𝐵 [𝐼 ]𝐵,𝐶 . Thus we show that if 𝑅 = 𝑃^{−1}𝑄𝑃, then tr(𝑅) = tr(𝑄)
for 𝑛 × 𝑛 matrices 𝑃, 𝑄, 𝑅, with 𝑃 invertible. For this, use tr(𝑀1 𝑀2 ) = tr(𝑀2 𝑀1 ).
Similarly, do for the determinant.
10. 𝑥 = Σ_𝑖 ⟨𝑥, 𝑢𝑖 ⟩𝑢𝑖 , 𝑦 = Σ_𝑗 ⟨𝑦, 𝑢 𝑗 ⟩𝑢 𝑗 ⇒ ⟨𝑥, 𝑦⟩ = Σ_𝑖 ⟨𝑥, 𝑢𝑖 ⟩ Σ_𝑗 ⟨𝑢 𝑗 , 𝑦⟩⟨𝑢𝑖 , 𝑢 𝑗 ⟩. This
proves the first part. Next, define 𝑇 : 𝑉 → F𝑛 by 𝑇 (𝑢𝑘 ) = 𝑒𝑘 for 𝑘 = 1, . . . , 𝑛. Since
𝑥 = Σ_𝑘 ⟨𝑥, 𝑢𝑘 ⟩𝑢𝑘 , 𝑇𝑥 = Σ_𝑘 ⟨𝑥, 𝑢𝑘 ⟩𝑒𝑘 . Using the first part, ‖𝑇𝑥‖² = Σ_𝑘 |⟨𝑥, 𝑢𝑘 ⟩|² = ‖𝑥‖².
§ 4.6
1. For 𝐴 = [𝑎𝑖 𝑗 ], write 𝐴̄ = [𝑎̄𝑖 𝑗 ]. See that rank(𝐴̄) = rank(𝐴). Then use
rank(𝐵^𝑡 ) = rank(𝐵). 2. If rank(𝐴) = 𝑟 = rank(𝐵), then 𝐴 = 𝑄 −1 𝐸𝑟 𝑃 and
𝐵 = 𝑀 −1 𝐸𝑟 𝑆. So, 𝐵 = 𝑀 −1𝑄𝐴𝑃 −1𝑆.
3(a) 𝑅(𝐴𝐵) = {𝐴𝐵𝑥 : 𝑥 ∈ F𝑘×1 } ⊆ {𝐴𝑦 : 𝑦 ∈ F𝑛×1 } = 𝑅(𝐴). (b) From (a),
rank(𝐴𝐵) ≤ rank(𝐴). Next, rank((𝐴𝐵)𝑡 ) = rank(𝐵𝑡 𝐴𝑡 ) ≤ rank(𝐵𝑡 ) = rank(𝐵).
4. Let 𝐴 = 𝐷𝐸 and 𝐵 = 𝐹𝐺 be full rank factorizations of 𝐴 and 𝐵. Now,
𝐴 + 𝐵 = [𝐷 𝐹 ]𝑀, where 𝑀 is the matrix obtained by stacking 𝐸 over 𝐺 . By
Exercise 3, rank(𝐴 + 𝐵) ≤ rank(𝑀) ≤ rank(𝐸) + rank(𝐺) = rank(𝐴) + rank(𝐵).
5(a) Since 𝐴 = 𝐵𝐶, each column of 𝐴 is a linear combination of columns of 𝐵. Since
𝐵 has full rank, the columns of 𝐵 are lin. ind. (b) Use (a) on 𝐴𝑡 = 𝐶 𝑡 𝐵𝑡 .
6. The columns of 𝐴 are unique linear combinations of the columns of 𝐵. The coeffi-
cients in these linear combinations give the matrix 𝐶. Thus 𝐶 is a unique matrix.
7. Since 𝐷 is invertible, rank(𝐵𝐷) = rank(𝐵).
8. From Exercise 5(a), columns of 𝐵 1 form a basis for 𝑅(𝐴). Also, the columns of
𝐵 2 form a basis for 𝑅(𝐴). The isomorphism that maps the columns of 𝐵 1 to columns
of 𝐵 2 provides such a 𝐷. Then use Exercise 6.
§ 5.1
1. Let 𝜆 be a diagonal entry of 𝐴. In 𝐴 − 𝜆𝐼, the corresponding diagonal entry
is 0. Look at the first 0 entry on the diagonal of 𝐴 − 𝜆𝐼 . (first from (1, 1)th entry
through (𝑛, 𝑛)th.) If this occurs on the 𝑘th column, then the 𝑘th column is a linear
combination of earlier columns. Thus rank(𝐴 − 𝜆𝐼 ) < 𝑛. And, null(𝐴 − 𝜆𝐼 ) ≥ 1.
Thus 𝐴𝑣 = 𝜆𝑣 for some 𝑣 ≠ 0. 2. For rows summing to 𝛼: 𝐴[1 · · · 1] 𝑡 = 𝛼 [1 · · · 1] 𝑡 .
For columns summing to 𝛼, use null(𝐴 − 𝛼𝐼 ) = null(𝐴𝑡 − 𝛼𝐼 ).
3(a) 𝑇 𝑣 = 𝜆𝑣 ⇒ 𝑇 𝑘 𝑣 = 𝜆𝑘 𝑣. (b) 𝑇 𝑣 = 𝜆𝑣 ⇒ (𝑇 + 𝛼𝐼 )𝑣 = (𝜆 + 𝛼)𝑣. (c) Follows from
(a)-(b). Next, if F = C, then yes.
4. Let 𝑇 𝑣 = 𝜆𝑣. Then 𝑇 −1𝑇 𝑣 = 𝑣 = (1/𝜆)𝜆𝑣 = (1/𝜆)𝑇 𝑣, and 𝑇 𝑣 ≠ 0 for 𝑣 ≠ 0.
5. 𝑥 may not be a common eigenvector. 6. Yes; 𝑇 = 𝐼 . 7. Yes; 𝑇 = 𝜆𝐼 .
8. 𝐴 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, 𝐵 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.
§ 5.2
1(a) (𝑡 − 1)(𝑡 − 2), [1 − 1] 𝑡 , [−2 1] 𝑡 . (b) (𝑡 − 𝑖)(𝑡 + 𝑖), [1 − 2 − 𝑖] 𝑡 , [1 𝑖 − 2] 𝑡 .
(c) (𝑡 + 2)(𝑡 − 3)(𝑡 − 5), [5 2 0] 𝑡 , [0 1 0] 𝑡 , [3 − 3 7] 𝑡 .
(d) (𝑡 + 1)(𝑡 − 2)(𝑡 − 3), [1 − 1 1] 𝑡 , [1 2 4] 𝑡 , [1 3 9] 𝑡 .
2. Eigenvalue is 0; eigenvector is 1.
3(a) If 𝑎 = 0, then 𝐴 has lin. dep. columns. If 𝑎 ≠ 0, then 𝐴 has lin. ind.
columns. (b) 𝜒𝐴 (𝑡) = 𝑡 3 − 𝑐𝑡 2 − 𝑏𝑡 − 𝑎. Use Cayley-Hamilton theorem and verify
that 𝐴(1/𝑎)(𝐴2 − 𝑐𝐴 − 𝑏𝐼 ) = 𝐼 .
4. Let 𝜒𝐴 (𝑡) = 𝑎 0 + 𝑎 1𝑡 + · · · + 𝑎𝑛−1𝑡^{𝑛−1} + 𝑡^𝑛 . Then 𝐴 is invertible iff 𝑎 0 ≠ 0. And,
𝐴^{−1} = −(𝑎 0 )^{−1} (𝑎 1𝐼 + 𝑎 2𝐴 + · · · + 𝑎𝑛−1𝐴^{𝑛−2} + 𝐴^{𝑛−1} ).
5. \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} has no real eigenvalue.
6. 𝑥 ⊥ 𝑦. Now, 𝑧 := 𝑥 × 𝑦 is orthogonal to both 𝑥 and 𝑦. Then {𝑥, 𝑦, 𝑧} is an
orthogonal basis of R^{3×1} . Let 𝐴𝑥 = 𝛼𝑥 and 𝐴𝑦 = 𝛽𝑦. Let 𝐴𝑧 = 𝑎𝑥 + 𝑏𝑦 + 𝑐𝑧. Then
⟨𝐴𝑧 − 𝑐𝑧, 𝑥⟩ = ⟨𝐴𝑧, 𝑥⟩ − ⟨𝑐𝑧, 𝑥⟩ = ⟨𝑧, 𝐴𝑥⟩ + 0 = ⟨𝑧, 𝛼𝑥⟩ = 0. Similarly, ⟨𝐴𝑧 − 𝑐𝑧, 𝑦⟩ = 0.
So, ⟨𝐴𝑧 − 𝑐𝑧, 𝐴𝑧 − 𝑐𝑧⟩ = ⟨𝐴𝑧 − 𝑐𝑧, 𝑎𝑥 + 𝑏𝑦⟩ = 0. That is, 𝐴𝑧 − 𝑐𝑧 = 0.
§ 5.3
1. {(1/√2, −1/√2)^𝑡 , (1/√2, 1/√2)^𝑡 }; [𝑇 ] = \begin{bmatrix} -2 & 9 \\ 0 & 3 \end{bmatrix}.
2(a) 𝑃 = \frac{1}{\sqrt5}\begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}; 𝑈 = \begin{bmatrix} 3 & -14 \\ 0 & 1 \end{bmatrix}.
(b) 𝑃 = \frac{1}{\sqrt3}\begin{bmatrix} 1 & 1+i \\ 1-i & -1 \end{bmatrix}; 𝑈 = \begin{bmatrix} 2+i & -1+2i \\ 0 & 2-i \end{bmatrix}.
(c) 𝑃 = \frac{1}{3}\begin{bmatrix} 2 & -2 & 1 \\ 2 & 1 & -2 \\ 1 & 2 & 2 \end{bmatrix}; 𝑈 = \begin{bmatrix} 9 & 0 & 9 \\ 0 & 9 & 9 \\ 0 & 0 & 9 \end{bmatrix}.
3. Notice that with 𝐷 = diag (9, 9, 9), (𝑈 − 𝐷)² = 0. Using the binomial theorem to
compute 𝑈^{50} = (𝐷 + (𝑈 − 𝐷))^{50} gives
\[
U^{50} = D^{50} + 50\,D^{49}(U - D) =
\begin{bmatrix} 9^{50} & 0 & 50\cdot 9^{50} \\ 0 & 9^{50} & 50\cdot 9^{50} \\ 0 & 0 & 9^{50} \end{bmatrix}.
\]
Then
\[
A^{50} = P^{t} U^{50} P = 9^{49}
\begin{bmatrix} 209 & 400 & 400 \\ -50 & -91 & -100 \\ -50 & -100 & -91 \end{bmatrix}.
\]
§ 5.4
1(a) 𝐴 = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}, 𝑃 = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & -1 & -1 \end{bmatrix}, 𝑃^{−1}𝐴𝑃 = diag (2, −1, −1).
(b) 𝐴 = \begin{bmatrix} 7 & -2 & 0 \\ -2 & 6 & -2 \\ 0 & -2 & 5 \end{bmatrix}, 𝑃 = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & -2 \\ 2 & -2 & 1 \end{bmatrix}, 𝑃^{−1}𝐴𝑃 = diag (3, 6, 9).
2(a) 𝐵 = {(−1, 1, 1), (0, 1, −1), (1, 1, 0)}, [𝑇 ]𝐵,𝐵 = diag (−1, 2, 2).
(b)-(c) Geom. mult. of 𝜆 = 0 is 1; alg. mult. of 𝜆 = 0 is 3; not diagonalizable.
(d) 𝐵 = {(1, 0, −1), (0, 1, 0), (1, 0, 1)}, [𝑇 ]𝐵,𝐵 = diag (−1, 1, 1).
(e) Geom. mult. of 𝜆 = 0 is 1; alg. mult. of 𝜆 = 0 is 4; not diagonalizable.
(f) 𝐵 = {1, 𝑡, 𝑡 2, 𝑡 3 }, [𝑇 ]𝐵,𝐵 = diag (1, 1/2, 1/3, 1/4).
3. \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
4(a) Real sym., so diagonalizable.
(b) Geom. mult. of 𝜆 = 1 is 1, alg. mult. is 3; not diagonalizable.
(c)-(d) Distinct eigenvalues; diagonalizable.
5. Yes. All eigenvalues are real.
6. 𝐴⁵ = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 9^5 & 0 & 0 \\ 0 & 4^5 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 9^5 & 4^5 - 9^5 & 4^5 - 1 \\ 0 & 4^5 & 4^5 - 1 \\ 0 & 0 & 1 \end{bmatrix}.
7. 𝐴 and 𝐵 have the same eigenvalues with respective multiplicities being equal.
By ordering the eigenvectors suitably, we have invertible matrices 𝑃 and 𝑄 such
that 𝐴 = 𝑃 −1 𝐷𝑃 and 𝐵 = 𝑄 −1 𝐷𝑄, where 𝐷 is a diagonal matrix. Then 𝐵 =
(𝑃 −1𝑄) −1𝐴(𝑃 −1𝑄).
§ 5.5
§ 5.6
Index
linear transformation, 52
matrix, 87
    of linear map, 88
maximal, 23
minimal, 23
norm, 38
normal, 73
nullity, 59
null space, 59
orthogonal, 73
orthogonally diagonalizable, 127
orthogonal basis, 43
orthogonal set, 41
orthogonal vectors, 41
orthonormal basis, 43
orthonormal set, 41
parallelogram law, 39
Parseval's identity, 43
polar decomposition, 158, 160
positive definite, 158
positive semi-definite, 158
Pythagoras' theorem, 41
QR-factorization, 82
range space, 59
rank, 59
Rank-nullity theorem, 60
Rank factorization, 105
Rank theorem, 105
real vector space, 4
Reverse triangle inequality, 39
Riesz representation, 70
Riesz representer, 71
row vector, 3
scalar, 4
Schur triangularization, 123
self-adjoint, 73
similar, 106
singular values, 152
solution set, 75
span, 13
spanned by, 15
spanning set, 15
square root, 162
standard basis, 23
standard inner product, 37, 38
subspace, 9
sum of subsets, 14
SVD, 152
Theorem
    Basis extension, 29
    Bessel's inequality, 43
    Cauchy Schwartz, 39
    Cayley-Hamilton, 115
    Fourier expansion, 43
    Full rank factorization, 106
    Jordan form, 146
    Jordan strings, 140
    Parallelogram law, 39
    Parseval's identity, 43
    Polar decomposition, 158, 160
    Rank-nullity, 60
    Rank factorization, 105
    Rank theorem, 105
    Reverse triangle ineq., 39
    Schur triangularization, 121
    SVD, 152, 154
    Triangle inequality, 39
thin SVD, 154
tight SVD, 155
triangle inequality, 39
unitarily diagonalizable, 126
unitary, 73
unitary space, 38
vector, 4
vector space, 3
zero operator, 52