Chapter 3 - Spectral Theorem
In the previous chapter we were solely interested in making an invertible change of variable.
That is, the change of basis matrix P need only be invertible. When we make an invertible
change of variable, algebraic properties are all preserved. However geometric properties are not typically preserved, such as:

• length, angle, area and volume, scalar product, normal forms of curves and surfaces.

For example, under the invertible change of variable $X = 2x$, $Y = 3y$, the circle
$$x^2 + y^2 = 1$$
becomes the ellipse $\frac{X^2}{4} + \frac{Y^2}{9} = 1$, so the normal form of the curve is not preserved.
Proposition 59 The orthogonal matrices are precisely the matrices which preserve the scalar product. That is,
$$Px \cdot Py = (Px)^T Py = x^T P^T P y = x^T y = x \cdot y.$$
• Spectral theorem: Let $A$ be an $n \times n$ symmetric matrix. Then the roots of $\chi_A$ are real and $A$ has an eigenbasis consisting of mutually perpendicular unit vectors.
We prove a first result towards the theorem. Recall that eigenvectors associated with distinct eigenvalues are independent; the corresponding result for symmetric matrices is the following.

Proposition 60 Let $A$ be a symmetric matrix and let $v$, $w$ be eigenvectors of $A$ with distinct eigenvalues $\lambda$ and $\mu$ respectively. Then $v \cdot w = 0$.

Proof. As $A = A^T$ we have
$$\lambda (v \cdot w) = (Av)^T w = v^T A^T w = v^T (Aw) = \mu\, (v \cdot w),$$
so that $(\lambda - \mu)(v \cdot w) = 0$. As $\lambda \neq \mu$ then $v \cdot w = 0$.
Example 61 Let
$$A = \begin{pmatrix} 1 & \tfrac12 \\ \tfrac12 & 1 \end{pmatrix}.$$
(a) Find an orthogonal matrix $P$ such that $P^TAP$ is diagonal.
(b) Show that the curve $x^2 + xy + y^2 = 1$ is an ellipse, and find its area. Sketch the curve.
Note that the $\tfrac12$-eigenvectors and $\tfrac32$-eigenvectors are perpendicular to one another – this was bound to be the case by the previous proposition. The eigenvectors $(1,-1)^T$ and $(1,1)^T$ cannot be used as the columns of an orthogonal matrix, as they're not unit length, but if we normalize them we may take
$$P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad P^TAP = \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac32 \end{pmatrix}.$$
In the new coordinates the curve becomes $\tfrac12 X^2 + \tfrac32 Y^2 = 1$, an ellipse with semi-axes $a = \sqrt{2}$ and $b = \sqrt{2/3}$, and hence with area
$$\pi a b = \frac{2\pi}{\sqrt{3}}.$$
We can say the curve is an ellipse, and calculate its area, as the change of variable is orthogonal.
The $XY$-axes are given by the lines $y = -x$ and $y = x$ respectively.
A sketch of the ellipse, with the $XY$-axes labelled, is given in Figure 1 below.
Figure 1: $x^2 + xy + y^2 = 1$
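A quick numerical check of this example, as a minimal Python sketch using NumPy (np.linalg.eigh is a standard routine for symmetric matrices; the variable names are our own):

import numpy as np

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# eigh returns ascending eigenvalues and orthonormal eigenvectors as the columns of P
evals, P = np.linalg.eigh(A)
print(evals)                              # [0.5 1.5]
print(np.allclose(P.T @ P, np.eye(2)))    # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))          # diag(1/2, 3/2)

# semi-axes of (1/2)X^2 + (3/2)Y^2 = 1 and the area pi*a*b
a, b = 1/np.sqrt(evals[0]), 1/np.sqrt(evals[1])
print(np.pi * a * b, 2*np.pi/np.sqrt(3))  # both ~3.6276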
Proof. Let $\lambda$ be a (potentially complex) root of $\chi_A$. Then by (an appropriate complex version of) Proposition 41(a), there is a non-zero complex vector $v$ in $\mathbb{C}^n_{\mathrm{col}}$ such that $Av = \lambda v$. As the entries of $A$ are real, when we conjugate this equation we obtain $A\bar{v} = \bar{\lambda}\bar{v}$. As $A = A^T$, and by the product rule for transposes, we see
$$\bar{\lambda}\,\bar{v}^T v = (\bar{\lambda}\bar{v})^T v = (A\bar{v})^T v = \bar{v}^T A^T v = \bar{v}^T A v = \bar{v}^T \lambda v = \lambda\, \bar{v}^T v.$$
As $v \neq 0$ we have $\bar{v}^T v = |v_1|^2 + \cdots + |v_n|^2 > 0$, and so $\bar{\lambda} = \lambda$, that is $\lambda$ is real.
There are, in fact, only limited ways of doing this. As $\langle w_1\rangle = \langle v_1\rangle$ then $w_1$ is a scalar multiple of $v_1$. But as $w_1$ is a unit vector then $w_1 = \pm v_1/|v_1|$. So there are only two choices for $w_1$ and it seems most natural to take $w_1 = v_1/|v_1|$ (rather than needlessly introducing a negative sign). Next,
$$y_2 = v_2 - (v_2 \cdot w_1)\,w_1 \neq 0$$
is the component of $v_2$ perpendicular to $v_1$. We then have $w_2 = \pm y_2/|y_2|$. Again we have two choices of $w_2$ but again there is no particular reason to choose the negative option.
Figure 2: GSOP example
Figure 2 above hopefully captures the geometric nature of this process. $v_1$ spans a line and so there are only two unit vectors parallel to it, with $w_1 = v_1/|v_1|$ being a more natural choice than its negative. $\langle v_1, v_2\rangle$ is a plane divided into two half-planes by the line $\langle v_1\rangle$ and there are two choices of unit vector in this plane which are perpendicular to this line. We choose $w_2$ to be that unit vector pointing into the same half-plane as $v_2$ does. Continuing, $\langle v_1, v_2, v_3\rangle$ is a three-dimensional space divided into two half-spaces by the plane $\langle v_1, v_2\rangle$. There are two choices of unit vector in this space which are perpendicular to the plane. We choose $w_3$ to be that unit vector pointing into the same half-space as $v_3$ does. This process is known as the Gram-Schmidt orthogonalization process (GSOP)¹, with the rigorous details appearing below.
¹ Named after the Danish mathematician Jørgen Pedersen Gram (1850-1916) and the German mathematician Erhard Schmidt (1876-1959). The orthogonalization process was employed by Gram in a paper of 1883 and by Schmidt, with acknowledgements to Gram, in a 1907 paper, but in fact the process had also been used by Laplace as early as 1812.
Proof. We will prove this by induction on $i$. The result is seen to be true for $i = 1$ by taking $w_1 = v_1/|v_1|$. Suppose now that $1 \leq I < k$ and that we have so far produced orthonormal vectors $w_1, \ldots, w_I$ such that (3.1) is true for $1 \leq i \leq I$. We then set
$$y_{I+1} = v_{I+1} - \sum_{j=1}^{I} (v_{I+1} \cdot w_j)\,w_j.$$
Note that $y_{I+1} \neq 0$, for otherwise $v_{I+1}$ would lie in $\langle w_1, \ldots, w_I\rangle = \langle v_1, \ldots, v_I\rangle$, which contradicts the linear independence of $v_1, \ldots, v_I, v_{I+1}$. If we set $w_{I+1} = y_{I+1}/|y_{I+1}|$, it follows from (3.2) that $w_1, \ldots, w_{I+1}$ form an orthonormal set. Further $\langle w_1, \ldots, w_{I+1}\rangle = \langle v_1, \ldots, v_{I+1}\rangle$, completing the induction; in particular
$$\langle w_1, \ldots, w_k\rangle = \langle v_1, \ldots, v_k\rangle = U.$$
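Purely as an illustration of the algorithm (not part of the proof), here is a minimal Python sketch of the GSOP; the function name gsop and the tolerance are our own choices:

import numpy as np

def gsop(vectors, tol=1e-12):
    """Given linearly independent v1,...,vk (the rows of `vectors`), return
    orthonormal w1,...,wk with span(w1..wi) = span(v1..vi) for each i."""
    ws = []
    for v in np.asarray(vectors, dtype=float):
        # subtract the components along the w's found so far
        y = v - sum((v @ w) * w for w in ws)
        norm = np.linalg.norm(y)
        if norm < tol:
            raise ValueError("vectors are not linearly independent")
        ws.append(y / norm)
    return np.array(ws)

# example with an arbitrary independent triple in R^3
W = gsop([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(np.round(W @ W.T, 10))   # identity matrix: the rows are orthonormal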
Theorem 66 (Spectral Theorem²) Let $A$ be a real symmetric $n \times n$ matrix. Then there exists an orthogonal $n \times n$ matrix $P$ such that $P^TAP$ is diagonal.

Proof. We shall prove the result by strong induction on $n$. When $n = 1$ there is nothing to prove as all $1 \times 1$ matrices are diagonal and so we can simply take $P = I_1$.
Suppose now that the result holds for $r \times r$ real symmetric matrices where $1 \leq r < n$. By the fundamental theorem of algebra, the characteristic polynomial $\chi_A$ has a root $\lambda$ in $\mathbb{C}$, which by Proposition 62 we in fact know to be real. Let $X$ denote the $\lambda$-eigenspace, that is
$$X = \{v \in \mathbb{R}^n_{\mathrm{col}} \mid Av = \lambda v\},$$
and let $m = \dim X$. Using the GSOP we may find an orthonormal basis $v_1, \ldots, v_m$ of $X$ and extend it to an orthonormal basis $v_1, \ldots, v_n$ of $\mathbb{R}^n_{\mathrm{col}}$; let $P$ be the orthogonal matrix with these vectors as its columns.
Note that $Av_i = \lambda v_i$ for $1 \leq i \leq m$ and so the first $m$ columns of $P^TAP$ are $\lambda e_1, \ldots, \lambda e_m$. Also if $1 \leq i \leq m$ and $m < j, k \leq n$ we have, using $A = A^T$ and the product rule for transposes, that
$$v_i \cdot (Av_j) = v_i^T(Av_j) = v_i^T A^T v_j = (Av_i)^T v_j = \lambda v_i^T v_j = \lambda (v_i \cdot v_j) = 0;$$
so that $[P^TAP]_{ij} = 0$. Further, as $P^TAP$ is symmetric then $[P^TAP]_{kj} = [P^TAP]_{jk}$. Together this means that
$$P^TAP = \begin{pmatrix} \lambda I_m & 0 \\ 0 & M \end{pmatrix},$$
where $M$ is a symmetric $(n-m) \times (n-m)$ matrix. By our inductive hypothesis there is an orthogonal $(n-m) \times (n-m)$ matrix $Q$ such that $Q^TMQ$ is diagonal. If we set
$$R = \begin{pmatrix} I_m & 0 \\ 0 & Q \end{pmatrix}$$
² Appreciation of this result, at least in two variables, dates back to Descartes and Fermat. But the equivalent general result was first proven by Cauchy in 1829, though independently of the language of matrices, which were yet to be invented. Rather Cauchy's result was in terms of quadratic forms – a quadratic form in two variables is an expression of the form $ax^2 + bxy + cy^2$.
then
$$(PR)^T A (PR) = R^T P^T A P R = \begin{pmatrix} I_m & 0 \\ 0 & Q^T \end{pmatrix}\begin{pmatrix} \lambda I_m & 0 \\ 0 & M \end{pmatrix}\begin{pmatrix} I_m & 0 \\ 0 & Q \end{pmatrix} = \begin{pmatrix} \lambda I_m & 0 \\ 0 & Q^TMQ \end{pmatrix},$$
which is diagonal. Finally $PR$ is orthogonal, being a product of orthogonal matrices, and so the result follows by induction.
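As an illustrative sanity check of Theorem 66 in Python (np.linalg.eigh computes an orthonormal eigenbasis of a real symmetric matrix; the random matrix is our own example):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a random real symmetric matrix

evals, P = np.linalg.eigh(A)           # columns of P are orthonormal eigenvectors
print(np.allclose(P.T @ P, np.eye(4)))            # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(evals)))   # True: P^T A P is diagonal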
Definition 67 Let $A$ be a symmetric $n \times n$ matrix. The quadratic form $x^TAx$ (or the matrix $A$ itself) is said to be

• positive definite if $x^TAx > 0$ for all $x \neq 0$;
• positive semi-definite if $x^TAx \geq 0$ for all $x$;
• negative definite if $x^TAx < 0$ for all $x \neq 0$;
• negative semi-definite if $x^TAx \leq 0$ for all $x$;
• indefinite otherwise.

From the spectral theorem we see that these correspond respectively to the eigenvalues of $A$ being (i) all positive, (ii) all non-negative, (iii) all negative, (iv) all non-positive, (v) some positive, some negative and some possibly zero.
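For instance, a small Python sketch (the helper name classify and the tolerance are our own choices) reads off the classification from the eigenvalue signs:

import numpy as np

def classify(A, tol=1e-10):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    evals = np.linalg.eigvalsh(A)
    if np.all(evals > tol):   return "positive definite"
    if np.all(evals >= -tol): return "positive semi-definite"
    if np.all(evals < -tol):  return "negative definite"
    if np.all(evals <= tol):  return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[2.0, 1.0], [1.0, 2.0]])))    # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -3.0]])))   # indefinite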
Remark 68 (Hermitian matrices) There is a version of the spectral theorem over the complex numbers. The standard inner product on $\mathbb{C}^n$ is given by
$$\langle z, w\rangle = z \cdot \bar{w} = z_1\bar{w}_1 + \cdots + z_n\bar{w}_n.$$
So the equivalent of the orthogonal matrices are the unitary matrices which satisfy $U^{-1} = \bar{U}^T$. These are precisely the matrices that preserve the complex inner product. And the equivalent of symmetric matrices are the hermitian matrices³ which satisfy $M = \bar{M}^T$. The complex version of the spectral theorem then states that, for any hermitian matrix $M$, there exists a unitary matrix $U$ such that $\bar{U}^T M U$ is diagonal with real entries. Hermitian matrices are important in quantum theory as they represent observables.
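A small illustrative sketch in Python: np.linalg.eigh also accepts complex Hermitian matrices, and the matrix below is simply an example of our own choosing:

import numpy as np

M = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])           # Hermitian: M equals its conjugate transpose
print(np.allclose(M, M.conj().T))        # True

evals, U = np.linalg.eigh(M)
print(evals)                                            # real eigenvalues
print(np.allclose(U.conj().T @ U, np.eye(2)))           # True: U is unitary
print(np.allclose(U.conj().T @ M @ U, np.diag(evals)))  # True: diagonal with real entries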
Remark 69 We saw earlier that the theory of diagonalization applies equally well over any
field, mainly because it is part of the theory of vector spaces and linear maps. By contrast the
spectral theorem is best set in the context of inner product spaces and so there is a spectral
theorem only for symmetric matrices over R and for Hermitian matrices over C, these being
the linear maps which respect the inner product. There is a more detailed comment on this
matter at the end of the chapter.
³ After the French mathematician Charles Hermite (1822-1901).
However to find the last three columns of $P$, we need an orthonormal basis for the $-1$-eigenspace. Applying the GSOP to the above three vectors, we arrive at
$$v_2 = \frac{(1,-1,0,0)^T}{\sqrt{2}}, \qquad v_3 = \frac{(1,1,-2,0)^T}{\sqrt{6}}, \qquad v_4 = \frac{(1,1,1,-3)^T}{\sqrt{12}}.$$
Below are some important examples of symmetric matrices across mathematics and a description of their connection with quadratic forms.

Example 73 (Gram matrices) The Gram matrix $M$ for an inner product $\langle\,,\,\rangle$ on a vector space with basis $\{v_1, \ldots, v_n\}$ has $(i,j)$th entry
$$[M]_{ij} = \langle v_i, v_j\rangle.$$
This is a symmetric, positive definite matrix – because of the properties of inner products – and conversely any symmetric, positive definite matrix is the Gram matrix of an inner product.
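For instance, for the standard inner product on $\mathbb{R}^3$ (a Python sketch, with basis vectors of our own choosing), the Gram matrix of a basis is symmetric and positive definite:

import numpy as np

# three linearly independent vectors in R^3, as rows
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])

M = V @ V.T                      # [M]_ij = v_i . v_j, the Gram matrix
print(np.allclose(M, M.T))       # True: symmetric
print(np.linalg.eigvalsh(M))     # all eigenvalues positive, so M is positive definite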
Example 74 (Inertia matrix in dynamics) A rigid body, rotating about a fixed point $O$ with angular velocity $\omega$, has kinetic energy
$$T = \tfrac12\, \omega^T I_0\, \omega,$$
where $I_0$ is the inertia matrix
$$I_0 = \begin{pmatrix} A & -D & -E \\ -D & B & -F \\ -E & -F & C \end{pmatrix},$$
where
$$A = \iiint_R \rho\,(y^2 + z^2)\,dV, \qquad B = \iiint_R \rho\,(x^2 + z^2)\,dV, \qquad C = \iiint_R \rho\,(x^2 + y^2)\,dV,$$
and $D$, $E$, $F$ are the corresponding products of inertia,
where $\rho$ denotes density and $R$ is the region that the rigid body occupies. For a spinning top, symmetrical about its axis, one eigenvector of $I_0$ lies along the axis with the other two eigenvectors orthogonal to that. With respect to this basis $I_0 = \mathrm{diag}(A, A, C)$, but the spectral theorem applies to any rigid body, however irregular the distribution of matter.
One of the most important applications of the spectral theorem is the classification of quadratic forms. A quadratic form in variables $x_1, \ldots, x_n$ is an expression of the form
$$\sum_{1 \leq i \leq j \leq n} a_{ij}\, x_i x_j,$$
where the $a_{ij}$ are scalars. Thus a quadratic form in two variables $x, y$ is $ax^2 + bxy + cy^2$ where $a, b, c$ are scalars.
Alternatively a co-ordinate-free way of defining quadratic forms on a vector space $V$ is as $B(v, v)$ where $B : V \times V \to \mathbb{R}$ is a bilinear map.
There are many important examples of quadratic forms, some listed below.
Example 78 (Conics) The general degree two equation in two variables has the form
$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$$
where $A, \ldots, F$ are real scalars and $A, B, C$ are not all zero. This equation can be put into normal forms as follows. Firstly we can rewrite the equation as
$$(x, y)\, M \begin{pmatrix} x \\ y \end{pmatrix} + (D, E)\begin{pmatrix} x \\ y \end{pmatrix} + F = 0, \qquad \text{where } M = \begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}. \tag{3.3}$$
Note that $M$ is symmetric. By the spectral theorem we know that there is a $2 \times 2$ orthogonal matrix $P$ which will diagonalize $M$. If we set
$$\begin{pmatrix} x \\ y \end{pmatrix} = P \begin{pmatrix} X \\ Y \end{pmatrix},$$
then, as $P$ is orthogonal, this change of variable will not change any geometric aspects: distances, angles and areas remain unaltered. In these new variables $X, Y$, and with $P^TMP = \mathrm{diag}(\tilde{A}, \tilde{C})$ and $(D, E)P = (\tilde{D}, \tilde{E})$, our equation now reads as
$$\tilde{A}X^2 + \tilde{C}Y^2 + \tilde{D}X + \tilde{E}Y + F = 0.$$
We can now complete any squares to put this equation into a normal form.
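To make the procedure concrete, here is a rough numerical sketch in Python; the sample conic and its coefficients are our own choice of example:

import numpy as np

# sample conic: 3x^2 + 2xy + 3y^2 + 4x - 4 = 0  (coefficients chosen arbitrarily)
A_, B_, C_, D_, E_, F_ = 3.0, 2.0, 3.0, 4.0, 0.0, -4.0

M = np.array([[A_, B_/2], [B_/2, C_]])
evals, P = np.linalg.eigh(M)            # P orthogonal, P^T M P = diag(evals)
At, Ct = evals                          # the new quadratic coefficients A~, C~
Dt, Et = np.array([D_, E_]) @ P         # the new linear coefficients D~, E~

# complete the square: A~(X + D~/(2A~))^2 + C~(Y + E~/(2C~))^2 = D~^2/(4A~) + E~^2/(4C~) - F
rhs = Dt**2/(4*At) + Et**2/(4*Ct) - F_
print(At, Ct, rhs)   # A~, C~ and the constant are all positive here, so the curve is an ellipse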
Each ellipse, hyperbola, parabola can be uniquely put into one of the above forms by an
isometry of the plane. The general degree two equation also leads to some degenerate cases such
as parallel lines, intersecting lines, repeated lines, points and the empty set.
Figure 3a – ellipse    Figure 3b – hyperbola
Example 79 (Quadrics) The spectral theorem applies equally well to the general degree two equation in three variables $x, y, z$. The normal forms for the non-degenerate cases are shown in Figure 4 below; for example the hyperbolic paraboloid has normal form
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} - z = 0 \qquad (a, b > 0).$$
Each of these non-degenerate cases can be uniquely put into one of the above forms by an isometry of space. The general degree two equation in three variables also leads to some degenerate cases such as parallel planes, intersecting planes, repeated planes, points, cones, elliptic, parabolic and hyperbolic cylinders, and the empty set.
Fig. 4a: Ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$
Fig. 4b: Elliptic paraboloid $z = a^2x^2 + b^2y^2$
Fig. 4c: Hyperbolic paraboloid $z = a^2x^2 - b^2y^2$
Fig. 4d: Hyperboloid of two sheets $\frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$
Fig. 4e: Hyperboloid of one sheet $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$
Fig. 4f: Double cone $z^2 = \frac{x^2}{a^2} + \frac{y^2}{b^2}$
Example 80 Show that the equation $13x^2 + 13y^2 + 10z^2 + 4yz + 4zx + 8xy = 1$ defines an ellipsoid and find its volume.

Solution. Let
$$A = \begin{pmatrix} 13 & 4 & 2 \\ 4 & 13 & 2 \\ 2 & 2 & 10 \end{pmatrix},$$
so that the equation reads $x^TAx = 1$ where $x = (x, y, z)^T$.
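A numerical check in Python (a sketch, using the standard fact that the solid region $x^TAx \leq 1$ has volume $\frac{4\pi/3}{\sqrt{\det A}}$ when $A$ is positive definite):

import numpy as np

A = np.array([[13.0,  4.0,  2.0],
              [ 4.0, 13.0,  2.0],
              [ 2.0,  2.0, 10.0]])

print(np.linalg.eigvalsh(A))        # [ 9.  9. 18.]: all positive, so an ellipsoid
# semi-axes are 1/sqrt(lambda_i), so the volume is (4*pi/3) / sqrt(det A)
vol = (4*np.pi/3) / np.sqrt(np.linalg.det(A))
print(vol, 2*np.sqrt(2)*np.pi/81)   # both ~0.10966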
Solution. (a) The characteristic polynomial $\chi_A(x)$ equals $(x+1)(x-2)^2$ so that the eigenvalues are $-1, 2, 2$. The $-1$-eigenvectors are multiples of $(1,1,1)^T$ and the $2$-eigenspace is the plane $x + y + z = 0$. So an orthonormal eigenbasis for $A$ is
$$\frac{(1,1,1)^T}{\sqrt{3}}, \qquad \frac{(1,-1,0)^T}{\sqrt{2}}, \qquad \frac{(1,1,-2)^T}{\sqrt{6}},$$
and we may take these as the columns of an orthogonal matrix $P$.
(b) We then have that $f(Px) = (Px)^TA(Px) = x^TP^TAPx = -x^2 + 2y^2 + 2z^2$. Now we similarly have
$$g(x) = (x, y, z)\begin{pmatrix} 0 & \sqrt{2} & 0 \\ \sqrt{2} & -1 & 0 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
and this matrix (call it $B$) has characteristic polynomial $\chi_B(x) = (x+2)(x-1)(x-2)$. This means that there is an orthogonal matrix $R$ such that
$$g(Rx) = -2x^2 + y^2 + 2z^2.$$
We can then see that the (invertible but not orthogonal) map $S$ which sends $(x, y, z)^T$ to $(x/\sqrt{2}, \sqrt{2}y, z)^T$ satisfies
$$g(RSx) = -2(x/\sqrt{2})^2 + (\sqrt{2}y)^2 + 2z^2 = -x^2 + 2y^2 + 2z^2.$$
That $A$ and $B$ have the same number of eigenvalues of each sign means that there is an invertible change of variables connecting the functions $f$ and $g$, but there is no orthogonal change of variable as that would preserve the eigenvalues.
(c) The quadric surface $f(x) = 1$ is a hyperboloid of one sheet, as in Figure 4e, with the new $x$-axis being the axis of the hyperboloid.
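A short numerical check of this count of signs in Python (B is the matrix displayed in part (b); the eigenvalues of A are those found in part (a)):

import numpy as np

B = np.array([[0.0, np.sqrt(2), 0.0],
              [np.sqrt(2), -1.0, 0.0],
              [0.0, 0.0, 2.0]])

evals_B = np.linalg.eigvalsh(B)
evals_A = np.array([-1.0, 2.0, 2.0])            # eigenvalues of A from part (a)

print(evals_B)                                  # [-2.  1.  2.]: a different spectrum from A's
print(np.sign(evals_A), np.sign(evals_B))       # both [-1.  1.  1.]: the same signature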
Example 82 (Hessian matrix) Let $f(x, y)$ be a function of two variables with partial derivatives of all orders. Taylor's theorem in two variables states
$$f(a+\delta, b+\varepsilon) = f(a,b) + \big(f_x(a,b)\delta + f_y(a,b)\varepsilon\big) + \tfrac12\big(f_{xx}(a,b)\delta^2 + 2f_{xy}(a,b)\,\delta\varepsilon + f_{yy}(a,b)\varepsilon^2\big) + R_3,$$
where $R_3$ is a remainder term that is at least order three in $\delta$ and $\varepsilon$. A critical point or stationary point $(a, b)$ is one where $f_x(a,b) = f_y(a,b) = 0$. At such a point the behaviour of $f$ is governed, to second order, by the quadratic form with symmetric matrix the Hessian⁴
$$H = \begin{pmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{xy}(a,b) & f_{yy}(a,b) \end{pmatrix}.$$
By the spectral theorem there is an orthogonal change of variable from $(\delta, \varepsilon)$ to new coordinates $(\Delta, E)$ in which the quadratic terms become
$$\lambda \Delta^2 + \mu E^2,$$
where $\lambda$ and $\mu$ are the eigenvalues of $H$: if $\lambda, \mu > 0$ the critical point is a local minimum, if $\lambda, \mu < 0$ it is a local maximum, and if $\lambda\mu < 0$ it is a saddle point. When $H$ is singular then the critical point is said to be degenerate, and its classification depends on the cubic terms (or higher) in Taylor's theorem.
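A small Python sketch of this classification; the test function f(x, y) = x^2 - 3xy + y^2 and the helper name are our own choices:

import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a non-degenerate critical point from its Hessian's eigenvalues."""
    lam, mu = np.linalg.eigvalsh(H)
    if abs(lam) < tol or abs(mu) < tol:
        return "degenerate"
    if lam > 0 and mu > 0:
        return "local minimum"
    if lam < 0 and mu < 0:
        return "local maximum"
    return "saddle point"

# f(x, y) = x^2 - 3xy + y^2 has a critical point at the origin with Hessian:
H = np.array([[2.0, -3.0],
              [-3.0, 2.0]])
print(classify_critical_point(H))   # saddle point (eigenvalues -1 and 5)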
Example 83 (Norms) Given an inner product space $V$ then the norm squared $\|v\|^2 = \langle v, v\rangle$ is a positive definite quadratic form on $V$. For a smooth parameterized surface $\mathbf{r}(u, v)$ in $\mathbb{R}^3$ the tangent space $T_p$ at a point $p$ equals the span of $\mathbf{r}_u$ and $\mathbf{r}_v$. The restriction of $\|v\|^2$ to $T_p$ is the quadratic form
$$\|a\mathbf{r}_u + b\mathbf{r}_v\|^2 = Ea^2 + 2Fab + Gb^2,$$
where
$$E = \mathbf{r}_u \cdot \mathbf{r}_u, \qquad F = \mathbf{r}_u \cdot \mathbf{r}_v, \qquad G = \mathbf{r}_v \cdot \mathbf{r}_v,$$
and is known as the first fundamental form.
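For instance, a symbolic sketch in Python with SymPy, taking the unit sphere parameterization $\mathbf{r}(u, v) = (\sin u\cos v, \sin u\sin v, \cos u)$ as our example:

import sympy as sp

u, v = sp.symbols('u v', real=True)
r = sp.Matrix([sp.sin(u)*sp.cos(v), sp.sin(u)*sp.sin(v), sp.cos(u)])

ru, rv = r.diff(u), r.diff(v)
E = sp.simplify(ru.dot(ru))
F = sp.simplify(ru.dot(rv))
G = sp.simplify(rv.dot(rv))
print(E, F, G)   # 1, 0, sin(u)**2, so ||a*ru + b*rv||^2 = a**2 + sin(u)**2 * b**2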
We conclude this chapter with some comments charting the direction of spectral theory into second year linear algebra and beyond into third year functional analysis. You should consider all these remarks – and the subsequent epilogue – to be beyond the Prelims syllabus, but they may make interesting further reading for some.
Remark 84 (Adjoints) As commented earlier, the spectral theorem is most naturally stated in the context of inner product spaces; a more sophisticated version of the theorem appears in the second year course A0 Linear Algebra. The version we have met states that a real symmetric matrix (or complex Hermitian matrix) is diagonalizable via an orthogonal change of variable.
If we seek to extend this theorem to linear maps on vector spaces, our first problem is that there is no well-defined notion of the transpose of a linear map and so no notion of a symmetric linear map. The determinant of a linear map $T$ is well-defined as the determinant is the same for any two matrices representing $T$: if $A = P^{-1}BP$ then $\det A = \det B$. The transpose, however, behaves differently; if $A = P^{-1}BP$ then in general
$$A^T \neq P^{-1}B^TP.$$

⁴ After the German mathematician Ludwig Hesse (1811–1874).
This can be gotten around somewhat if we only consider orthogonal changes of variable $P$. In this case
$$A = P^TBP \implies A^T = P^TB^TP.$$
So should we only use orthonormal bases, and orthogonal changes of variable, then we can define the "transpose" of a linear map. But this discussion only makes sense in an inner product space and not in a general vector space; that "transpose" is instead referred to as the adjoint of $T$, written $T^*$.
Given a linear map $T : V \to V$ of a finite-dimensional inner product space $V$, its adjoint $T^* : V \to V$ is the unique linear map satisfying
$$\langle Tv, w\rangle = \langle v, T^*w\rangle \qquad \text{for all } v, w \in V.$$
If we choose an orthonormal basis $v_1, \ldots, v_n$ for $V$ and let $A$ and $B$ respectively be the matrices for $T$ and $T^*$ with respect to this basis, then $B = A^T$. So the matrix for $T^*$ is that of the transpose of the matrix for $T$. The "symmetric" linear maps are then those satisfying $T = T^*$, the so-called self-adjoint linear maps, which satisfy
$$\langle Tv, w\rangle = \langle v, Tw\rangle \qquad \text{for all } v, w \in V.$$
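A quick numerical illustration in Python of the defining property for the standard inner product on $\mathbb{R}^n$, where the adjoint of a matrix is its transpose (the matrix and vectors are random examples):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
v, w = rng.standard_normal(3), rng.standard_normal(3)

# <Av, w> = <v, A^T w> for the standard inner product
print(np.isclose((A @ v) @ w, v @ (A.T @ w)))   # True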
Example 85 (See Sheet 4, Exercise P3.) The $n$th Legendre polynomial $P_n(x)$ satisfies Legendre's equation
$$(1 - x^2)\frac{d^2y}{dx^2} - 2x\frac{dy}{dx} + n(n+1)y = 0$$
where $n$ is a natural number. This can be rewritten as
$$Ly = -n(n+1)y \qquad \text{where} \qquad L = \frac{d}{dx}\left[(1-x^2)\frac{d}{dx}\right].$$
So $P_n(x)$ can be viewed as a $-n(n+1)$-eigenvector of the differential operator $L$. Further it can be shown that
$$\langle P_n(x), P_m(x)\rangle = 0 \qquad \text{when } n \neq m,$$
so that the Legendre polynomials are in fact orthogonal eigenvectors; further still it is true that
$$\langle Lf, g\rangle = \langle f, Lg\rangle,$$
showing $L$ to be self-adjoint.
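A numerical check of this orthogonality in Python, assuming the inner product in question is $\langle f, g\rangle = \int_{-1}^{1} f(x)g(x)\,dx$, the usual one for Legendre polynomials:

import numpy as np
from numpy.polynomial import legendre

x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

def P(n, x):
    """Evaluate the nth Legendre polynomial at x."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return legendre.legval(x, c)

# <P_2, P_3> should be ~0, while <P_3, P_3> = 2/(2*3+1) = 2/7
print(np.sum(P(2, x) * P(3, x)) * dx)   # ~0
print(np.sum(P(3, x) * P(3, x)) * dx)   # ~0.2857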
$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + V(x)\psi = E\psi, \qquad \psi(0) = \psi(a) = 0.$$
This equation was formulated in 1925 by the Austrian physicist, Erwin Schrödinger (1887-1961). The above is the time-independent equation of a particle in the interval $0 \leq x \leq a$. The wave function $\psi$ is a complex-valued function of $x$ and $|\psi(x)|^2$ can be thought of as the probability density function of the particle's position. $m$ is its mass, $\hbar$ is the (reduced) Planck constant, $V(x)$ denotes potential energy and $E$ is the particle's energy.
A significant, confounding aspect of late nineteenth century experimental physics was the
emission spectra of atoms. (By the way, these two uses of the word "spectrum" in mathematics
and physics appear to be coincidental.) As an example, experiments showed that only certain
discrete, quantized energies could be released by an excited atom of hydrogen. Classical physical
theories were unable to explain this phenomenon.
Schrödinger's equation can be rewritten as $H\psi = E\psi$ with $E$ being an eigenvalue of the differential operator $H$ known as the Hamiltonian. One can again show that $H$ is self-adjoint, that is
$$\langle H\psi_1, \psi_2\rangle = \langle \psi_1, H\psi_2\rangle, \qquad \text{where} \qquad \langle \phi, \psi\rangle = \int_0^a \phi(x)\overline{\psi(x)}\,dx.$$
And if $V$ is constant then it's easy to show that the only non-zero solutions of Schrödinger's equation above are
$$\psi_n(x) = A_n \sin\left(\frac{n\pi x}{a}\right) \qquad \text{where} \qquad E = E_n = V + \frac{n^2\pi^2\hbar^2}{2ma^2},$$
and $n$ is a positive integer and $A_n$ is a constant. If $|\psi_n(x)|^2$ is to be a pdf then we need $A_n = \sqrt{2/a}$, and again these $\psi_n$ are orthonormal with respect to the above inner product. Note above that the energy $E$ can only take certain discrete values $E_n$.
Consider now, for example, the wave function $\psi(x) = \sqrt{30/a^5}\,x(a-x)$, for which $|\psi(x)|^2$ is a pdf. How might we write such a $\psi(x)$ as a combination of the $\psi_n(x)$? This is an infinite-dimensional version of the problem the spectral theorem solved – how in general to write a vector as a linear combination of orthonormal eigenvectors – and in the infinite dimensional case is the subject of Fourier analysis, named after the French mathematician Joseph Fourier (1768-1830). In this case Fourier analysis shows that
$$\psi(x) = \sum_{n=0}^{\infty} \alpha_{2n+1}\,\psi_{2n+1}(x) \qquad \text{where} \qquad \alpha_n = \frac{8\sqrt{15}}{\pi^3 n^3}.$$
If the particle's energy were measured it would be one of the permitted energies $E_n$, and the effect of measuring this energy is to collapse the above wave function $\psi$ to one of the eigenstates $\psi_{2n+1}$. It is the case that
$$\sum_{n=0}^{\infty} |\alpha_{2n+1}|^2 = 1,$$
with $|\alpha_{2n+1}|^2$ being the probability that the energy $E_{2n+1}$ is observed.
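A quick numerical check of this identity in Python, summing the first ten thousand odd terms:

import numpy as np

n = np.arange(1, 20001, 2)                    # odd n
alpha = 8*np.sqrt(15) / (np.pi**3 * n**3)
print(np.sum(alpha**2))                       # ~1.0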
We conclude with an important related theorem, namely the singular value decomposition theorem, which applies not just to square matrices and is important in numerical analysis, signal processing, pattern recognition and, in particular, is used in the Trinity term Statistics and Data Analysis course when discussing principal component analysis.

Recall that, given an $m \times n$ matrix $A$ of rank $r$, there exist an invertible $m \times m$ matrix $P$ and an invertible $n \times n$ matrix $Q$ such that
$$PAQ = \begin{pmatrix} I_r & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix}.$$
Solution. Firstly
$$A^TA = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 2 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 2 & 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 4 & 2 & -2 \\ 2 & 2 & 5 & 1 \\ 1 & -2 & 1 & 2 \end{pmatrix}.$$
and so
$$P = P_1^T = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}.$$
Remark 89 With notation as in Theorem 87, the pseudoinverse (or Moore-Penrose inverse) of $A$ is
$$A^+ = Q\begin{pmatrix} D^{-1} & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{pmatrix}P.$$
The following facts are then true of the pseudoinverse.
(a) If $A$ is invertible then $A^{-1} = A^+$.
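A brief Python illustration of fact (a), and of the pseudoinverse of what appears to be the $2 \times 4$ matrix of the example above (read off from the product displayed in the solution), using np.linalg.pinv:

import numpy as np

# for an invertible matrix the pseudoinverse is just the inverse
A = np.array([[2.0, 1.0], [1.0, 1.0]])
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))   # True

# for a non-square matrix, A A^+ A = A still holds
B = np.array([[1.0, 0.0, 2.0,  1.0],
              [0.0, 2.0, 1.0, -1.0]])
Bp = np.linalg.pinv(B)
print(np.allclose(B @ Bp @ B, B))   # True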