
3. THE SPECTRAL THEOREM

In the previous chapter we were solely interested in making an invertible change of variable.
That is, the change of basis matrix P need only be invertible. When we make an invertible
change of variable, algebraic properties such as

• determinant, trace, eigenvalues, dimension, rank, invertibility

are all preserved. However, geometric properties are typically not preserved, such as:

• length, angle, area and volume, scalar product, normal forms of curves and surfaces.

For example the curve with equation

x² + y² = 1,

under the invertible change of variable

X = 2x, Y = 3y,

takes on the equation


X2 Y 2
+ = 1.
4 9
What was a circle with area π has become an ellipse with area 6π.
Should we wish to make changes of variable which preserve geometric properties then we
need to make an orthogonal change of variable.

Definition 58 An n × n matrix P is orthogonal if P⁻¹ = P^T.

This is equivalent to the columns (or rows) of P being unit length and mutually perpendicular. That is to say, the columns (or rows) of P form an orthonormal basis of R^n_col (or R^n).

Proposition 59 The orthogonal matrices are precisely the matrices which preserve the scalar product. That is,

Px · Py = x · y for all x, y  ⟺  P is orthogonal.

Proof. Let P be an orthogonal matrix. Then

Px · Py = (Px)^T Py = x^T P^T Py = x^T y = x · y.

Conversely assume Px · Py = x · y for all x, y. If we set x = e_i and y = e_j then

[P^T P]_{ij} = e_i^T P^T P e_j = e_i^T e_j = δ_{ij} = [I]_{ij}.

As this is true for each i, j then P^T P = I and so P is orthogonal.
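Both characterisations are easy to check numerically. The following is a minimal sketch in Python with NumPy (not part of the original notes); the rotation matrix and the tolerance are arbitrary illustrative choices.

# Numerical check that a rotation matrix is orthogonal and preserves dot products.
import numpy as np

theta = 0.7                      # any angle gives an orthogonal (rotation) matrix
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# P^T P should be the identity, i.e. P^{-1} = P^T.
assert np.allclose(P.T @ P, np.eye(2), atol=1e-12)

# Orthogonal matrices preserve the scalar product: Px . Py = x . y.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose((P @ x) @ (P @ y), x @ y, atol=1e-12)
print("P is orthogonal and preserves the scalar product")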



A first question then is: what matrices can be diagonalized by an orthogonal change of variables? Say that

P⁻¹AP = D

where D is diagonal and P is orthogonal. Then A = PDP⁻¹ = PDP^T, so

A^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = A.

Thus if a matrix A is orthogonally diagonalizable it is necessarily symmetric. The converse is true and known as the spectral theorem:

• Spectral theorem: Let A be an n × n symmetric matrix. Then the roots of χ_A are real and A has an eigenbasis consisting of mutually perpendicular unit vectors.

We prove a first result towards the theorem. Recall that eigenvectors associated with distinct eigenvalues are independent – the corresponding result for symmetric matrices is the following.

Proposition 60 Let A be a real n × n symmetric matrix. If v and w are eigenvectors of A with associated eigenvalues λ and μ, where λ ≠ μ, then v · w = 0.

Proof. We have that Av = λv and Aw = μw where λ ≠ μ. Then, as A is symmetric, we have

λ v · w = λ v^T w = (λv)^T w = (Av)^T w = v^T A^T w = v^T Aw = v^T μw = μ v · w.

As λ ≠ μ then v · w = 0.

Example 61 Let

A = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.

(a) Find an orthogonal matrix P such that P^T AP is diagonal.
(b) Show that the curve x² + xy + y² = 1 is an ellipse, and find its area. Sketch the curve.

Solution. (a) Note

χ_A(x) = (1 - x)² - 1/4 = (1/2 - x)(3/2 - x).

When

λ = 1/2:  ker \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} = ⟨(1, -1)^T⟩;

λ = 3/2:  ker \begin{pmatrix} -1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} = ⟨(1, 1)^T⟩.

Note that the 1/2-eigenvectors and 3/2-eigenvectors are perpendicular to one another – this was bound to be the case by the previous proposition. The eigenvectors (1, -1)^T and (1, 1)^T cannot be used as the columns of an orthogonal matrix, as they're not unit length, but if we normalize them to unit vectors then they can form the columns of an orthogonal matrix. Thus we set

P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.

We then know

P^T AP = \begin{pmatrix} 1/2 & 0 \\ 0 & 3/2 \end{pmatrix}.
(b) The equation x² + xy + y² = 1 can be rewritten as

(x  y) \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 1.

Making the change of variable

\begin{pmatrix} x \\ y \end{pmatrix} = P \begin{pmatrix} X \\ Y \end{pmatrix},

the equation 1 = (x  y) A (x  y)^T becomes

1 = (X  Y) P^T AP (X  Y)^T = (1/2)X² + (3/2)Y².

This is the equation of an ellipse with semi-axes of length a = √2 and b = √(2/3), with area

πab = 2π/√3.
We can say the curve is an ellipse, and calculate its area, as the change of variable is orthogonal. The XY-axes are given by:

the X-axis, Y = 0, is in the direction of P(1, 0)^T and so is the line x + y = 0;

the Y-axis, X = 0, is in the direction of P(0, 1)^T and so is the line y = x.

A sketch of the ellipse, with the XY-axes labelled, is given in Figure 1 below.

Figure 1: x² + xy + y² = 1
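As a cross-check on Example 61, the following Python/NumPy sketch (my own illustration, not part of the notes) diagonalizes A numerically and recovers the semi-axes and the area 2π/√3; np.linalg.eigh returns an orthonormal eigenbasis for a symmetric matrix.

# Numerical check of Example 61: orthogonal diagonalization of A and the ellipse area.
import numpy as np

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
eigenvalues, P = np.linalg.eigh(A)          # eigh returns an orthonormal eigenbasis
print(eigenvalues)                          # [0.5, 1.5]
print(P.T @ A @ P)                          # diag(1/2, 3/2) up to rounding

# Semi-axes of x^2 + xy + y^2 = 1 are 1/sqrt(lambda); the area is pi*a*b.
a, b = 1 / np.sqrt(eigenvalues)
print(a, b)                                  # sqrt(2), sqrt(2/3)
print(np.pi * a * b, 2 * np.pi / np.sqrt(3)) # both equal 2*pi/sqrt(3)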

THE SPECTRAL THEOREM 34


When an n × n matrix has distinct eigenvalues then we can find n eigenvectors which are independent and so form an eigenbasis; we can create an invertible matrix P with those eigenvectors as the columns of P. Similarly when a symmetric n × n matrix has distinct eigenvalues then we can find n eigenvectors which are orthogonal (and thus independent) and so form an eigenbasis; the matrix P with those eigenvectors as its columns will not in general be orthogonal, but if we normalize the eigenvectors – scale them to unit length – then the matrix P will be orthogonal as its columns will be mutually perpendicular and unit length.
When an n × n matrix has repeated eigenvalues then there may not be any eigenbasis. This cannot happen when a symmetric square matrix has repeated eigenvalues, but this result is reasonably sophisticated. In particular we will need to prove the following for symmetric matrices:

• The roots of the characteristic polynomial are real.

• The direct sum of the eigenspaces is the entire space.

• Each eigenspace has an orthonormal basis.

We begin by demonstrating the first result.

Proposition 62 Let A be a real n × n symmetric matrix. The roots of χ_A(x) = det(xI - A) are real.

Proof. Let λ be a (potentially complex) root of χ_A. Then by (an appropriate complex version of) Proposition 41(a), there is a non-zero complex vector v in C^n_col such that Av = λv. As the entries of A are real, when we conjugate this equation we obtain Av̄ = λ̄v̄. As A = A^T, and by the product rule for transposes, we see

λ̄ v̄^T v = (λ̄v̄)^T v = (Av̄)^T v = v̄^T A^T v = v̄^T Av = v̄^T λv = λ v̄^T v.

Now for any non-zero complex vector v = (v_1, v_2, ..., v_n)^T we have

v̄^T v = v̄ · v = v̄_1 v_1 + ··· + v̄_n v_n = |v_1|² + ··· + |v_n|² > 0.

As (λ̄ - λ) v̄^T v = 0 then λ = λ̄ and so λ is real.
We now move on to the third bullet point. We will demonstrate that any subspace has an orthonormal basis. This result then applies to eigenspaces as they are subspaces. Our first result is to show how an orthonormal set can be constructed from a linearly independent one. Say that v_1, v_2, ..., v_k is an independent set in R^n; we shall construct an orthonormal basis w_1, w_2, ..., w_k such that

⟨v_1, v_2, ..., v_i⟩ = ⟨w_1, w_2, ..., w_i⟩ for 1 ≤ i ≤ k.

There are, in fact, only limited ways of doing this. As ⟨w_1⟩ = ⟨v_1⟩ then w_1 is a scalar multiple of v_1. But as w_1 is a unit vector then w_1 = ±v_1/|v_1|. So there are only two choices for w_1 and it seems most natural to take w_1 = v_1/|v_1| (rather than needlessly introducing a negative sign). With this choice of w_1 we then need to find a unit vector w_2 perpendicular to w_1 and such that ⟨w_1, w_2⟩ = ⟨v_1, v_2⟩. In particular, we have

v_2 = αw_1 + βw_2 for some scalars α, β.

We require w_2 to be perpendicular to w_1 and so α = v_2 · w_1. Note that

y_2 = βw_2 = v_2 - (v_2 · w_1)w_1 ≠ 0

is the component of v_2 perpendicular to v_1. We then have w_2 = ±y_2/|y_2|. Again we have two choices of w_2 but again there is no particular reason to choose the negative option.

Figure 2: GSOP example
Figure 2 above hopefully captures the geometric nature of this process. v_1 spans a line and so there are only two unit vectors parallel to it, with w_1 = v_1/|v_1| being a more natural choice than its negative. ⟨v_1, v_2⟩ is a plane divided into two half-planes by the line ⟨v_1⟩ and there are two choices of unit vector in this plane which are perpendicular to this line. We choose w_2 to be that unit vector pointing into the same half-plane as v_2 does. Continuing, ⟨v_1, v_2, v_3⟩ is a three-dimensional space divided into two half-spaces by the plane ⟨v_1, v_2⟩. There are two choices of unit vector in this space which are perpendicular to the plane. We choose w_3 to be that unit vector pointing into the same half-space as v_3 does. This process is known as the Gram-Schmidt orthogonalization process (GSOP)¹, with the rigorous details appearing below.

¹ Named after the Danish mathematician Jørgen Pedersen Gram (1850-1916) and the German mathematician Erhard Schmidt (1876-1959). The orthogonalization process was employed by Gram in a paper of 1883 and by Schmidt, with acknowledgements to Gram, in a 1907 paper, but in fact the process had also been used by Laplace as early as 1812.



Theorem 63 (Gram-Schmidt Orthogonalization Process (GSOP)) Let v_1, ..., v_k be independent vectors in R^n_col (or R^n). Then there are orthonormal vectors w_1, ..., w_k such that, for each 1 ≤ i ≤ k, we have

⟨w_1, ..., w_i⟩ = ⟨v_1, ..., v_i⟩.    (3.1)

Proof. We will prove this by induction on i. The result is seen to be true for i = 1 by taking w_1 = v_1/|v_1|. Suppose now that 1 ≤ I < k and that we have so far produced orthonormal vectors w_1, ..., w_I such that (3.1) is true for 1 ≤ i ≤ I. We then set

y_{I+1} = v_{I+1} - Σ_{j=1}^{I} (v_{I+1} · w_j) w_j.

Note that, for 1 ≤ i ≤ I,

y_{I+1} · w_i = v_{I+1} · w_i - Σ_{j=1}^{I} (v_{I+1} · w_j) δ_{ij} = v_{I+1} · w_i - v_{I+1} · w_i = 0.    (3.2)

So y_{I+1} is perpendicular to each of w_1, ..., w_I. Further y_{I+1} is non-zero, for if y_{I+1} = 0 then

v_{I+1} = Σ_{j=1}^{I} (v_{I+1} · w_j) w_j  is in  ⟨w_1, ..., w_I⟩ = ⟨v_1, ..., v_I⟩

which contradicts the linear independence of v_1, ..., v_I, v_{I+1}. If we set w_{I+1} = y_{I+1}/|y_{I+1}|, it follows from (3.2) that w_1, ..., w_{I+1} form an orthonormal set. Further

⟨w_1, ..., w_{I+1}⟩ = ⟨w_1, ..., w_I, y_{I+1}⟩ = ⟨w_1, ..., w_I, v_{I+1}⟩ = ⟨v_1, ..., v_I, v_{I+1}⟩

and the proof follows by induction.
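The proof is constructive and translates directly into code. Below is a minimal Python/NumPy sketch of the GSOP (the function name gram_schmidt is mine, not the notes'); it is run on the three vectors that reappear in Example 70 below.

# A direct implementation of the Gram-Schmidt orthogonalization process (GSOP).
import numpy as np

def gram_schmidt(vectors):
    """Given independent vectors v1,...,vk (rows), return orthonormal w1,...,wk
    with span(w1,...,wi) = span(v1,...,vi) for each i, as in Theorem 63."""
    ws = []
    for v in vectors:
        y = v - sum((v @ w) * w for w in ws)   # subtract components along earlier w's
        norm = np.linalg.norm(y)
        if norm == 0:
            raise ValueError("input vectors are not linearly independent")
        ws.append(y / norm)
    return np.array(ws)

vs = np.array([[1., -1., 0., 0.],
               [0., 1., -1., 0.],
               [0., 0., 1., -1.]])
W = gram_schmidt(vs)
print(np.round(W @ W.T, 10))   # identity: the w's are orthonormal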

Corollary 64 Every subspace of R^n (or R^n_col) has an orthonormal basis.

Proof. If U is a subspace of R^n then it has a basis v_1, ..., v_k. By applying the GSOP, an orthonormal set w_1, ..., w_k can be constructed from them which is a basis for U as

⟨w_1, ..., w_k⟩ = ⟨v_1, ..., v_k⟩ = U.

Corollary 65 An orthonormal set can be extended to an orthonormal basis.

Proof. Let w_1, ..., w_k be an orthonormal set in R^n. In particular it is linearly independent and so may be extended to a basis w_1, ..., w_k, v_{k+1}, ..., v_n for R^n. The GSOP can then be applied to construct an orthonormal basis x_1, ..., x_n from this basis. The nature of the GSOP means that x_i = w_i for 1 ≤ i ≤ k and so our orthonormal basis is an extension of the original orthonormal set.



We now prove the second bullet point above, that the eigenspaces of a symmetric matrix A form a direct sum for the entire space. Eigenvectors from different eigenspaces are automatically orthogonal to one another and via the GSOP we now know there exists an orthonormal basis for each eigenspace. The union of the orthonormal bases for the eigenspaces then makes an orthonormal eigenbasis for the whole space. All this is equivalent to showing that there is an orthogonal matrix P such that P^T AP is diagonal – we create P by having the orthonormal eigenbasis as its columns.

Theorem 66 (Spectral Theorem²) Let A be a real symmetric n × n matrix. Then there exists an orthogonal n × n matrix P such that P^T AP is diagonal.

Proof. We shall prove the result by strong induction on n. When n = 1 there is nothing to prove as all 1 × 1 matrices are diagonal and so we can simply take P = I_1.
Suppose now that the result holds for r × r real symmetric matrices where 1 ≤ r < n. By the fundamental theorem of algebra, the characteristic polynomial χ_A has a root λ in C, which by Proposition 62 we in fact know to be real. Let X denote the λ-eigenspace, that is

X = {v ∈ R^n_col | Av = λv}.

Then X is a non-zero subspace as it is ker(A - λI) and λ is an eigenvalue – so X has an orthonormal basis v_1, ..., v_m which we may extend to an orthonormal basis v_1, ..., v_n for R^n_col. Let P = (v_1 | ... | v_n); then P is orthogonal and by the definition of matrix multiplication

[P^T AP]_{ij} = v_i^T Av_j = v_i · (Av_j).

Note that Av_i = λv_i for 1 ≤ i ≤ m and so the first m columns of P^T AP are λe_1, ..., λe_m. Also if 1 ≤ i ≤ m and m < j, k ≤ n we have, using A = A^T and the product rule for transposes, that

v_i · (Av_j) = v_i^T (Av_j) = v_i^T A^T v_j = (Av_i)^T v_j = λ v_i^T v_j = λ(v_i · v_j) = 0;

so that [P^T AP]_{ij} = 0. Further, as P^T AP is symmetric then [P^T AP]_{kj} = [P^T AP]_{jk}. Together this means that

P^T AP = \begin{pmatrix} λI_m & 0 \\ 0 & M \end{pmatrix},

where M is a symmetric (n - m) × (n - m) matrix. By our inductive hypothesis there is an orthogonal (n - m) × (n - m) matrix Q such that Q^T MQ is diagonal. If we set

R = \begin{pmatrix} I_m & 0 \\ 0 & Q \end{pmatrix}

then R is orthogonal, PR is orthogonal and

(PR)^T A(PR) = R^T P^T AP R
= \begin{pmatrix} I_m & 0 \\ 0 & Q^T \end{pmatrix} \begin{pmatrix} λI_m & 0 \\ 0 & M \end{pmatrix} \begin{pmatrix} I_m & 0 \\ 0 & Q \end{pmatrix}
= \begin{pmatrix} λI_m & 0 \\ 0 & Q^T MQ \end{pmatrix}

which is diagonal. This concludes the proof by induction.

² Appreciation of this result, at least in two variables, dates back to Descartes and Fermat. But the equivalent general result was first proven by Cauchy in 1829, though independently of the language of matrices, which were yet to be invented. Rather Cauchy's result was in terms of quadratic forms – a quadratic form in two variables is an expression of the form ax² + bxy + cy².

Corollary 67 A real symmetric matrix A is said to be:

• positive definite if x^T Ax > 0 for all x ≠ 0;

• positive semi-definite if x^T Ax ≥ 0 for all x;

• negative definite if x^T Ax < 0 for all x ≠ 0;

• negative semi-definite if x^T Ax ≤ 0 for all x;

• indefinite otherwise.

From the spectral theorem we see that these correspond respectively to the eigenvalues of A being (i) all positive, (ii) all non-negative, (iii) all negative, (iv) all non-positive, (v) some positive, some negative and some possibly zero.
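In practice this gives a simple numerical test: compute the (real) eigenvalues and inspect their signs. A minimal Python/NumPy sketch follows; the helper name classify and the tolerance are my own illustrative choices.

# Classify a real symmetric matrix by the signs of its eigenvalues (Corollary 67).
import numpy as np

def classify(A, tol=1e-10):
    eigenvalues = np.linalg.eigvalsh(A)       # real, since A is symmetric
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues >= -tol):
        return "positive semi-definite"
    if np.all(eigenvalues < -tol):
        return "negative definite"
    if np.all(eigenvalues <= tol):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[2., 1.], [1., 2.]])))    # positive definite
print(classify(np.array([[1., 0.], [0., -1.]])))   # indefinite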

Remark 68 (Hermitian matrices) There is a version of the spectral theorem over the complex numbers. The standard inner product on C^n is given by

⟨z, w⟩ = z · w̄ = z_1 w̄_1 + ··· + z_n w̄_n.

So the equivalent of the orthogonal matrices are the unitary matrices, which satisfy U⁻¹ = Ū^T. These are precisely the matrices that preserve the complex inner product. And the equivalent of symmetric matrices are the hermitian matrices³, which satisfy M̄^T = M. The complex version of the spectral theorem then states that, for any hermitian matrix M, there exists a unitary matrix U such that Ū^T MU is diagonal with real entries. Hermitian matrices are important in quantum theory as they represent observables.
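The complex case can be explored the same way: NumPy's eigh routine accepts a Hermitian matrix and returns real eigenvalues together with a unitary matrix of eigenvectors. A minimal sketch, with an arbitrarily chosen 2 × 2 Hermitian matrix:

# A Hermitian matrix has real eigenvalues and a unitary eigenvector matrix.
import numpy as np

M = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])                 # M equals its conjugate transpose
assert np.allclose(M, M.conj().T)

eigenvalues, U = np.linalg.eigh(M)            # eigh handles Hermitian input
print(eigenvalues)                            # real numbers (here 1 and 4)
print(np.round(U.conj().T @ M @ U, 10))       # diagonal, imaginary parts vanish
assert np.allclose(U.conj().T @ U, np.eye(2)) # U is unitary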

Remark 69 We saw earlier that the theory of diagonalization applies equally well over any field, mainly because it is part of the theory of vector spaces and linear maps. By contrast the spectral theorem is best set in the context of inner product spaces and so there is a spectral theorem only for symmetric matrices over R and for Hermitian matrices over C, these being the linear maps which respect the inner product. There is a more detailed comment on this matter at the end of the chapter.

³ After the French mathematician Charles Hermite (1822-1901).



Example 70 For the matrix A below, find an orthogonal P such that P^T AP is diagonal.

A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.

Solution. The characteristic polynomial of A is χ_A(x) = (x + 1)³(x - 3). A unit length 3-eigenvector is v_1 = (1, 1, 1, 1)^T/2 and the -1-eigenspace is x_1 + x_2 + x_3 + x_4 = 0. So a basis for the -1-eigenspace is

(1, -1, 0, 0)^T, (0, 1, -1, 0)^T, (0, 0, 1, -1)^T.

However to find the last three columns of P, we need an orthonormal basis for the -1-eigenspace. Applying the GSOP to the above three vectors, we arrive at

v_2 = (1, -1, 0, 0)^T/√2,  v_3 = (1, 1, -2, 0)^T/√6,  v_4 = (1, 1, 1, -3)^T/√12.

Such a required matrix is then P = (v_1 | v_2 | v_3 | v_4).

Algorithm 71 (Orthogonal Diagonalization of a Symmetric Matrix) Let M be a symmetric matrix. The spectral theorem shows that M is diagonalizable and so has an eigenbasis. Setting this eigenbasis as the columns of a matrix P will yield an invertible matrix P such that P⁻¹MP is diagonal – in general though this P will not be orthogonal.
If v is an eigenvector of M whose eigenvalue is not repeated, then we replace it with v/|v|. This new eigenvector is of unit length and is necessarily orthogonal to other eigenvectors with different eigenvalues, by Proposition 60. If none of the eigenvalues is repeated, this is all we need do to the eigenbasis to produce an orthonormal eigenbasis.
If λ is a repeated eigenvalue then we can find a basis for the λ-eigenspace. Applying the GSOP to this basis produces an orthonormal basis for the λ-eigenspace. Again these eigenvectors are orthogonal to all eigenvectors with different eigenvalues. We can see now that the previous non-repeated case is simply a special case of the repeated case, the GSOP for a single eigenvector involving nothing other than normalizing it.
Once the given basis for each eigenspace has had the GSOP applied to it, the entire eigenbasis has been made orthonormal. We may put this orthonormal eigenbasis as the columns of a matrix P, which will be orthogonal and such that P⁻¹MP = P^T MP is diagonal.
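The algorithm can be followed step by step on the matrix of Example 70. The Python/NumPy sketch below (my own illustration, not part of the notes) starts from the eigenspace bases found by hand, applies the GSOP to the repeated eigenspace and assembles the orthogonal matrix P.

# Orthogonal diagonalization of the symmetric matrix from Example 70,
# following the steps of Algorithm 71 by hand.
import numpy as np

A = np.ones((4, 4)) - np.eye(4)               # the 4x4 matrix of Example 70

# Eigenbasis found by hand: one 3-eigenvector and a (non-orthonormal) basis
# of the repeated -1-eigenspace x1 + x2 + x3 + x4 = 0.
v1 = np.array([1., 1., 1., 1.])
neg_espace = [np.array([1., -1., 0., 0.]),
              np.array([0., 1., -1., 0.]),
              np.array([0., 0., 1., -1.])]

def gram_schmidt(vectors):
    ws = []
    for v in vectors:
        y = v - sum((v @ w) * w for w in ws)
        ws.append(y / np.linalg.norm(y))
    return ws

columns = [v1 / np.linalg.norm(v1)] + gram_schmidt(neg_espace)
P = np.column_stack(columns)

print(np.round(P.T @ P, 10))       # identity, so P is orthogonal
print(np.round(P.T @ A @ P, 10))   # diag(3, -1, -1, -1)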

Example 72 Find a 2 × 2 real symmetric matrix M such that M² = A where

A = \begin{pmatrix} 3 & \sqrt{3} \\ \sqrt{3} & 5 \end{pmatrix}.

Solution. The characteristic polynomial of A is

det(xI - A) = (x - 3)(x - 5) - (√3)² = x² - 8x + 12

which has roots 2 and 6. Determining the eigenvectors we see

λ = 2:  ker \begin{pmatrix} 1 & \sqrt{3} \\ \sqrt{3} & 3 \end{pmatrix} = ⟨(-√3, 1)^T⟩,  so take v_1 = \frac{1}{2}\begin{pmatrix} -\sqrt{3} \\ 1 \end{pmatrix}.

λ = 6:  ker \begin{pmatrix} -3 & \sqrt{3} \\ \sqrt{3} & -1 \end{pmatrix} = ⟨(1, √3)^T⟩,  so take v_2 = \frac{1}{2}\begin{pmatrix} 1 \\ \sqrt{3} \end{pmatrix}.

So with P = (v_1 | v_2) we have P^T AP = diag(2, 6), which has a clear square root of diag(√2, √6). Thus we might choose

M = P diag(√2, √6) P^T = \frac{1}{4}\begin{pmatrix} 3\sqrt{2} + \sqrt{6} & 3\sqrt{2} - \sqrt{6} \\ 3\sqrt{2} - \sqrt{6} & \sqrt{2} + 3\sqrt{6} \end{pmatrix}.
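The same recipe gives a square root of any symmetric positive semi-definite matrix, not just this 2 × 2 example. A minimal Python/NumPy sketch (the function name sqrtm_sym is my own):

# Symmetric square root via the spectral theorem: M = P diag(sqrt(lambda_i)) P^T.
import numpy as np

def sqrtm_sym(A):
    """Square root of a symmetric positive semi-definite matrix A."""
    eigenvalues, P = np.linalg.eigh(A)
    return P @ np.diag(np.sqrt(eigenvalues)) @ P.T

A = np.array([[3.0, np.sqrt(3.0)],
              [np.sqrt(3.0), 5.0]])
M = sqrtm_sym(A)
print(np.round(M, 6))
print(np.round(M @ M - A, 10))     # zero matrix: M^2 = A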

Below are some important examples of symmetric matrices across mathematics and a description of their connection with quadratic forms.

Example 73 (Gram matrices) The Gram matrix M for an inner product ⟨ , ⟩ on a vector space with basis {v_1, ..., v_n} has (i, j)th entry

[M]_{ij} = ⟨v_i, v_j⟩.

This is a symmetric, positive definite matrix – because of the properties of inner products – and conversely any symmetric, positive definite matrix is the Gram matrix of an inner product.

Example 74 (Inertia matrix in dynamics) A rigid body, rotating about a fixed point O with angular velocity ω, has kinetic energy

T = \frac{1}{2} ω^T I_O ω,

where I_O is the inertia matrix

I_O = \begin{pmatrix} A & -F & -E \\ -F & B & -D \\ -E & -D & C \end{pmatrix},

where

A = ∫∫∫_R ρ(y² + z²) dV,  B = ∫∫∫_R ρ(x² + z²) dV,  C = ∫∫∫_R ρ(x² + y²) dV,

D = ∫∫∫_R ρyz dV,  E = ∫∫∫_R ρxz dV,  F = ∫∫∫_R ρxy dV,

where ρ denotes density and R is the region that the rigid body occupies. For a spinning top, symmetrical about its axis, one eigenvector of I_O lies along that axis and the other two can be taken to be any perpendicular pair of directions orthogonal to it. With respect to this basis I_O = diag(A, A, C), but the spectral theorem applies to any rigid body, however irregular the distribution of matter.



Example 75 (Covariance and correlation matrices in probability and statistics) The covariance matrix Σ is a symmetric, positive semi-definite matrix giving the covariance between each pair of elements of a random vector. Given a random vector X = (X_1, ..., X_n)^T the covariance matrix Σ is defined by

[Σ]_{ij} = cov[X_i, X_j] = E[(X_i - E(X_i))(X_j - E(X_j))]

or equally

Σ = E[XX^T] - E(X)E(X)^T.

It follows from the spectral theorem that every symmetric positive semi-definite matrix is a covariance matrix. This matrix is important in the theory of principal component analysis (PCA).
The correlation matrix C is similarly defined with

[C]_{ij} = cov[X_i, X_j] / (σ(X_i)σ(X_j)).

C is a symmetric, positive semi-definite matrix with all its diagonal entries equalling 1.

One of the most important applications of the spectral theorem is the classification of quadratic forms.

Definition 76 A quadratic form in n variables x_1, x_2, ..., x_n is a polynomial where each term has degree two. That is, it can be written as a sum

Σ_{i≤j} a_{ij} x_i x_j

where the a_{ij} are scalars. Thus a quadratic form in two variables x, y is ax² + bxy + cy² where a, b, c are scalars.
Alternatively a co-ordinate-free way of defining quadratic forms on a vector space V is as B(v, v) where B : V × V → R is a bilinear map.

The connection with symmetric matrices is that we can write

Σ_{i≤j} a_{ij} x_i x_j = x^T Ax

where x^T = (x_1, x_2, ..., x_n) and A is the symmetric matrix with entries

[A]_{ij} = a_{ii} if i = j,  a_{ij}/2 if i < j,  a_{ji}/2 if i > j.

Thus, for example,

ax² + bxy + cy² = (x  y) \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
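The passage from the coefficients a_ij (for i ≤ j) to the symmetric matrix A is purely mechanical. The Python/NumPy sketch below illustrates it; the dictionary-of-coefficients input format is just one convenient choice of mine.

# Build the symmetric matrix of a quadratic form from its coefficients a_ij (i <= j).
import numpy as np

def quadratic_form_matrix(coeffs, n):
    """coeffs maps index pairs (i, j) with i <= j to the coefficient of x_i x_j."""
    A = np.zeros((n, n))
    for (i, j), a in coeffs.items():
        if i == j:
            A[i, i] = a
        else:
            A[i, j] = A[j, i] = a / 2.0
    return A

# The form x^2 + xy + y^2 from Example 61: a_11 = a_22 = 1, a_12 = 1.
A = quadratic_form_matrix({(0, 0): 1, (0, 1): 1, (1, 1): 1}, 2)
print(A)                                   # [[1, 0.5], [0.5, 1]]

x = np.array([2.0, -1.0])
print(x @ A @ x, 2**2 + 2*(-1) + (-1)**2)  # both give x^T A x = 3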



Definition 77 When the spectral theorem is applied to quadratic forms it is often referred to as the principal axis theorem.

There are many important examples of quadratic forms, some listed below.

Example 78 (Conics) The general degree two equation in two variables has the form

Ax² + Bxy + Cy² + Dx + Ey + F = 0

where A, ..., F are real scalars and A, B, C are not all zero. This equation can be put into normal form as follows. Firstly we can rewrite the equation as

(x, y) M \begin{pmatrix} x \\ y \end{pmatrix} + (D, E) \begin{pmatrix} x \\ y \end{pmatrix} + F = 0,  where  M = \begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}.    (3.3)

Note that M is symmetric. By the spectral theorem we know that there is a 2 × 2 orthogonal matrix P which will diagonalize M. If we set

\begin{pmatrix} x \\ y \end{pmatrix} = P \begin{pmatrix} X \\ Y \end{pmatrix},

then (3.3) becomes

(X, Y) P^T MP \begin{pmatrix} X \\ Y \end{pmatrix} + (D, E) P \begin{pmatrix} X \\ Y \end{pmatrix} + F = 0.

As P is orthogonal then this change of variable will not change any geometric aspects: distances, angles and areas remain unaltered. In these new variables X, Y, and with P^T MP = diag(Ã, C̃) and (D, E)P = (D̃, Ẽ), our equation now reads as

ÃX² + C̃Y² + D̃X + ẼY + F = 0.

We can now complete any squares to put this equation into a normal form.

• Ellipses have normal form

x²/a² + y²/b² = 1  (a ≥ b > 0).

• Hyperbolae have normal form

x²/a² - y²/b² = 1  (a, b > 0).

• Parabolae have normal form

y² = 4ax  (a > 0).

Each ellipse, hyperbola and parabola can be uniquely put into one of the above forms by an isometry of the plane. The general degree two equation also leads to some degenerate cases such as parallel lines, intersecting lines, repeated lines, points and the empty set.



Figure 3a – ellipse    Figure 3b – hyperbola
Figure 3c – parabola    Figure 3d – degenerate case

Example 79 (Quadrics) The spectral theorem applies equally well to the general degree two equation in three variables x, y, z. The normal forms for the non-degenerate cases are:

• Ellipsoids have normal form

x²/a² + y²/b² + z²/c² = 1  (a ≥ b ≥ c > 0).

• Hyperboloids of one sheet have normal form

x²/a² + y²/b² - z²/c² = 1  (a ≥ b > 0, c > 0).

• Hyperboloids of two sheets have normal form

x²/a² + y²/b² - z²/c² = -1  (a ≥ b > 0, c > 0).

• Elliptic paraboloids have normal form

x²/a² + y²/b² - z = 0  (a ≥ b > 0).

• Hyperbolic paraboloids have normal form

x²/a² - y²/b² - z = 0  (a, b > 0).

Each of these non-degenerate cases can be uniquely put into one of the above forms by an isometry of R³. The general degree two equation in three variables also leads to some degenerate cases such as parallel planes, intersecting planes, repeated planes, points, cones, elliptic, parabolic and hyperbolic cylinders, and the empty set.

Fig. 4a: Ellipsoid x²/a² + y²/b² + z²/c² = 1.  Fig. 4b: Elliptic paraboloid z = a²x² + b²y².  Fig. 4c: Hyperbolic paraboloid z = a²x² - b²y².
Fig. 4d: Hyperboloid of two sheets x²/a² - y²/b² - z²/c² = 1.  Fig. 4e: Hyperboloid of one sheet x²/a² + y²/b² - z²/c² = 1.  Fig. 4f: Double cone z² = x²/a² + y²/b².

Example 80 Show that the equation 13x² + 13y² + 10z² + 4yz + 4zx + 8xy = 1 defines an ellipsoid and find its volume.

Solution. Let

A = \begin{pmatrix} 13 & 4 & 2 \\ 4 & 13 & 2 \\ 2 & 2 & 10 \end{pmatrix}

so that x^T Ax = 13x² + 13y² + 10z² + 4yz + 4zx + 8xy. Note that χ_A(x) equals

\begin{vmatrix} x-13 & -4 & -2 \\ -4 & x-13 & -2 \\ -2 & -2 & x-10 \end{vmatrix} = \begin{vmatrix} x-9 & 9-x & 0 \\ -4 & x-13 & -2 \\ -2 & -2 & x-10 \end{vmatrix}
= (x-9)\begin{vmatrix} 1 & -1 & 0 \\ -4 & x-13 & -2 \\ -2 & -2 & x-10 \end{vmatrix} = (x-9)\begin{vmatrix} 1 & 0 & 0 \\ -4 & x-17 & -2 \\ -2 & -4 & x-10 \end{vmatrix}
= (x-9)(x² - 27x + 162) = (x-9)²(x-18).

This means that there is an orthogonal matrix P such that

P^T AP = diag(9, 9, 18).

If we set x = PX then we see our quadric now has equation

9X² + 9Y² + 18Z² = X^T P^T AP X = 1,

which is an ellipsoid. Further we have

a = 1/3,  b = 1/3,  c = 1/(3√2),

and so by Sheet 4, Question S2, and noting that the orthogonal change of variable won't change the ellipsoid's volume, we see that the volume equals

(4π/3) × (1/3) × (1/3) × 1/(3√2) = 2√2π/81.
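A quick numerical cross-check of this example (Python/NumPy sketch, using the standard volume formula 4πabc/3 for an ellipsoid with semi-axes a, b, c):

# Numerical check of Example 80: eigenvalues of A and the ellipsoid's volume.
import numpy as np

A = np.array([[13., 4., 2.],
              [4., 13., 2.],
              [2., 2., 10.]])
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)                          # [9, 9, 18]

# Semi-axes are 1/sqrt(lambda_i); volume of an ellipsoid is 4*pi*a*b*c/3.
a, b, c = 1 / np.sqrt(eigenvalues)
volume = 4 * np.pi * a * b * c / 3
print(volume, 2 * np.sqrt(2) * np.pi / 81)  # both equal 2*sqrt(2)*pi/81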

Example 81 (a) Find an orthogonal matrix P such that P^T AP is diagonal where

A = \begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{pmatrix}.

(b) Consider the real-valued functions f and g defined on R³_col by

f(x) = x² + y² + z² - 2xy - 2xz - 2yz,  g(x) = -y² + 2z² + 2√2 xy,

where x = (x, y, z)^T. Is there an invertible matrix Q such that f(Qx) = g(x)? Is there an orthogonal Q?
(c) Sketch the surface f(x) = 1.

Solution. (a) The characteristic polynomial χ_A(x) equals (x + 1)(x - 2)², so that the eigenvalues are -1, 2, 2. The -1-eigenvectors are multiples of (1, 1, 1)^T and the 2-eigenspace is the plane x + y + z = 0. So an orthonormal eigenbasis for A is

(1, 1, 1)^T/√3,  (1, -1, 0)^T/√2,  (1, 1, -2)^T/√6,

from which we can form the required

P = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{2} & \sqrt{3} & 1 \\ \sqrt{2} & -\sqrt{3} & 1 \\ \sqrt{2} & 0 & -2 \end{pmatrix}.

(b) We then have that f(Px) = (Px)^T A(Px) = x^T P^T APx = -x² + 2y² + 2z². Now we similarly have

g(x) = (x, y, z) \begin{pmatrix} 0 & \sqrt{2} & 0 \\ \sqrt{2} & -1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}

and this matrix (call it B) has characteristic polynomial χ_B(x) = (x + 2)(x - 1)(x - 2). This means that there is an orthogonal matrix R such that

g(Rx) = -2x² + y² + 2z².

We can then see that the (invertible but not orthogonal) map S which sends (x, y, z)^T to (x/√2, √2 y, z)^T satisfies

g(RSx) = -2(x/√2)² + (√2 y)² + 2z² = -x² + 2y² + 2z².

That A and B have the same number of eigenvalues of each sign means that there is an invertible change of variables connecting the functions f and g, but there is no orthogonal change of variable as that would preserve the eigenvalues.
(c) The quadric surface f(x) = 1 is a hyperboloid of one sheet, as in Figure 4e, with the new x-axis being the axis of the hyperboloid.
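The sign pattern of the eigenvalues – which is all that an invertible change of variables can preserve – is easily checked numerically. A minimal Python/NumPy sketch for the two matrices of this example:

# Eigenvalue signs for the two quadratic forms in Example 81.
import numpy as np

A = np.array([[1., -1., -1.],
              [-1., 1., -1.],
              [-1., -1., 1.]])
B = np.array([[0., np.sqrt(2.), 0.],
              [np.sqrt(2.), -1., 0.],
              [0., 0., 2.]])

eig_A = np.linalg.eigvalsh(A)
eig_B = np.linalg.eigvalsh(B)
print(eig_A)   # [-1, 2, 2]
print(eig_B)   # [-2, 1, 2]

# Same number of positive and negative eigenvalues, so f and g are related by an
# invertible change of variables; the spectra differ, so no orthogonal change works.
print(np.sum(eig_A > 0) == np.sum(eig_B > 0), np.sum(eig_A < 0) == np.sum(eig_B < 0))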

Example 82 (Hessian matrix) Let f(x, y) be a function of two variables with partial derivatives of all orders. Taylor's theorem in two variables states

f(a + δ, b + ε) = f(a, b) + (f_x(a, b)δ + f_y(a, b)ε) + (1/2)(f_xx(a, b)δ² + 2f_xy(a, b)δε + f_yy(a, b)ε²) + R_3

where R_3 is a remainder term that is at least order three in δ and ε. A critical point or stationary point (a, b) is one where

f_x(a, b) = 0 = f_y(a, b).

Thus, at a critical point, we have

f(a + δ, b + ε) = f(a, b) + (1/2)(f_xx(a, b)δ² + 2f_xy(a, b)δε + f_yy(a, b)ε²) + R_3

and so the local behaviour of f near a critical point is determined by the quadratic form

f_xx δ² + 2f_xy δε + f_yy ε² = (δ  ε) \begin{pmatrix} f_xx & f_xy \\ f_xy & f_yy \end{pmatrix} \begin{pmatrix} δ \\ ε \end{pmatrix}.

The symmetric matrix

H = \begin{pmatrix} f_xx & f_xy \\ f_xy & f_yy \end{pmatrix}

is known as the Hessian⁴. As H is symmetric, we know that we can make an orthogonal change of variables (δ, ε) → (Δ, E) so that the above quadratic form becomes

λΔ² + μE²

where λ, μ are the eigenvalues of H. We then see that:

• there is a (local) minimum at (a, b) if λ, μ > 0;

• there is a (local) maximum at (a, b) if λ, μ < 0;

• there is a saddle point at (a, b) if λ, μ have different signs.

When H is singular then the critical point is said to be degenerate, and its classification depends on the cubic terms (or higher) in Taylor's theorem.

⁴ After the German mathematician Ludwig Hesse (1811-1874).
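For a concrete function the test is easy to carry out. The Python/NumPy sketch below uses f(x, y) = x³ - 3x + y², an arbitrary example of mine with Hessian entries computed by hand, and classifies its two critical points (±1, 0).

# Classifying critical points of f(x, y) = x^3 - 3x + y^2 via its Hessian.
import numpy as np

def hessian(x, y):
    # Second partial derivatives of f, computed by hand: f_xx = 6x, f_xy = 0, f_yy = 2.
    return np.array([[6.0 * x, 0.0],
                     [0.0, 2.0]])

def classify_critical_point(H, tol=1e-10):
    eigenvalues = np.linalg.eigvalsh(H)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"
    return "degenerate"

print(classify_critical_point(hessian(1.0, 0.0)))    # local minimum
print(classify_critical_point(hessian(-1.0, 0.0)))   # saddle point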

Example 83 (Norms) Given an inner product space V then the norm squared ‖v‖² = ⟨v, v⟩ is a positive definite quadratic form on V. For a smooth parameterized surface r(u, v) in R³ the tangent space T_p at a point p equals the span of r_u and r_v. The restriction of ‖v‖² to T_p is the quadratic form

(α, β) ↦ ‖αr_u + βr_v‖² = Eα² + 2Fαβ + Gβ²

where

E = r_u · r_u,  F = r_u · r_v,  G = r_v · r_v,

and is known as the first fundamental form.

We conclude this chapter with some comments charting the direction of spectral theory into second year linear algebra and beyond into third year functional analysis. You should consider all these remarks – and the subsequent epilogue – to be beyond the Prelims syllabus, but they may make interesting further reading for some.

Remark 84 (Adjoints) As commented earlier, the spectral theorem is most naturally stated in the context of inner product spaces; a more sophisticated version of the theorem appears in the second-year course A0 Linear Algebra. The version we have met states that a real symmetric matrix (or complex Hermitian matrix) is diagonalizable via an orthogonal change of variable.
If we seek to extend this theorem to linear maps on vector spaces, our first problem is that there is no well-defined notion of the transpose of a linear map and so no notion of a symmetric linear map. The determinant of a linear map T is well-defined as the determinant is the same for any matrices A and B representing T. This is because A = P⁻¹BP for some change of basis matrix P and so

det A = det(P⁻¹BP) = (1/det P)(det B)(det P) = det B.

However if we wished to define the transpose T^T of T as the linear map defined by A^T wrt the first basis and B^T wrt the second basis, we would have just defined different linear maps, as in general

A^T ≠ P⁻¹B^T P.

This can be gotten around somewhat if we only consider orthogonal changes of variable P. In this case

A = P^T BP  ⟹  A^T = P^T B^T P.

So should we only use orthonormal bases, and orthogonal changes of variable, then we can define the "transpose" of a linear map. But this discussion only makes sense in an inner product space and not in a general vector space; that "transpose" is instead referred to as the adjoint of T, written T*.
Given a linear map T : V → V of a finite-dimensional inner product space V, its adjoint T* : V → V is the unique linear map satisfying

⟨Tv, w⟩ = ⟨v, T*w⟩ for all v, w ∈ V.

If we choose an orthonormal basis v_1, ..., v_n for V and let A and B respectively be the matrices for T and T* wrt this basis then

[A]_{ji} = ⟨Tv_i, v_j⟩ = ⟨v_i, T*v_j⟩ = [B]_{ij}.

So the matrix for T* is the transpose of the matrix for T. The "symmetric" linear maps are then those satisfying T = T*, the so-called self-adjoint linear maps, which satisfy

⟨Tv, w⟩ = ⟨v, Tw⟩ for all v, w ∈ V.

The second year version of the spectral theorem states:

• Spectral theorem for self-adjoint maps. Let T : V → V be a self-adjoint map on a finite-dimensional inner product space. Then all the eigenvalues of T are real and there is an orthonormal eigenbasis of V.

Example 85 (See Sheet 4, Exercise P3.) The nth Legendre polynomial P_n(x) satisfies Legendre's equation

(1 - x²) d²y/dx² - 2x dy/dx + n(n + 1)y = 0

where n is a natural number. This can be rewritten as

Ly = -n(n + 1)y   where   L = d/dx [(1 - x²) d/dx].

So P_n(x) can be viewed as an eigenvector of the differential operator L with eigenvalue -n(n + 1). Further it can be shown that

⟨P_n(x), P_m(x)⟩ = 0 when n ≠ m,

where the inner product ⟨ , ⟩ is defined by

⟨f, g⟩ = ∫_{-1}^{1} f(x)g(x) dx,

so that the Legendre polynomials are in fact orthogonal eigenvectors; further still it is true that

⟨Ly_1, y_2⟩ = ⟨y_1, Ly_2⟩,

showing L to be self-adjoint.

Remark 86 (Spectral Theory – infinite-dimensional spaces) Whilst the space R[x] of polynomials is infinite dimensional, the above example is not at a great remove from orthogonally diagonalizing a real symmetric matrix – after all, any polynomial can be written as a finite linear combination of Legendre polynomials. For contrast, Schrödinger's equation in quantum theory has the form

-(ℏ²/2m) d²ψ/dx² + V(x)ψ = Eψ,   ψ(0) = ψ(a) = 0.

This equation was formulated in 1925 by the Austrian physicist Erwin Schrödinger (1887-1961). The above is the time-independent equation of a particle in the interval 0 ≤ x ≤ a. The wave function ψ is a complex-valued function of x and |ψ(x)|² can be thought of as the probability density function of the particle's position. Here m is the particle's mass, ℏ is the (reduced) Planck constant, V(x) denotes potential energy and E is the particle's energy.
A significant, confounding aspect of late nineteenth century experimental physics was the emission spectra of atoms. (By the way, these two uses of the word "spectrum" in mathematics and physics appear to be coincidental.) As an example, experiments showed that only certain discrete, quantized energies could be released by an excited atom of hydrogen. Classical physical theories were unable to explain this phenomenon.
Schrödinger's equation can be rewritten as Hψ = Eψ with E being an eigenvalue of the differential operator H known as the Hamiltonian. One can again show that H is self-adjoint, that is:

⟨Hψ_1, ψ_2⟩ = ⟨ψ_1, Hψ_2⟩

where

⟨φ, ψ⟩ = ∫_0^a φ(x) ψ̄(x) dx.

And if V is constant then it's easy to show that the only non-zero solutions of Schrödinger's equation above are

ψ_n(x) = A_n sin(nπx/a)   where   E = E_n = V + n²π²ℏ²/(2ma²),

and n is a positive integer and A_n is a constant. If |ψ_n(x)|² is to be a pdf then we need A_n = √(2/a), and again these ψ_n are orthonormal with respect to the above inner product. Note above that the energy E can only take certain discrete values E_n.



In general though, a wave function need not be one of these eigenstates ψ_n and may be a finite or indeed infinite combination of them. For example we might have

ψ(x) = √(30/a⁵) x(a - x)

for which |ψ(x)|² is a pdf. How might we write such a ψ(x) as a combination of the ψ_n(x)? This is an infinite-dimensional version of the problem the spectral theorem solved – how in general to write a vector as a linear combination of orthonormal eigenvectors – and in the infinite-dimensional case is the subject of Fourier analysis, named after the French mathematician Joseph Fourier (1768-1830). In this case Fourier analysis shows that

ψ(x) = Σ_{n=0}^{∞} α_{2n+1} ψ_{2n+1}(x)   where   α_n = 8√15/(π³n³).

If the particle's energy were measured it would be one of the permitted energies E_n, and the effect of measuring this energy is to collapse the above wave function ψ to one of the eigenstates ψ_{2n+1}. It is the case that

Σ_{n=0}^{∞} |α_{2n+1}|² = 1

(this is Parseval's Identity, which is essentially an infinite-dimensional version of Pythagoras' Theorem). The probability of the particle having measured energy E_{2n+1} is |α_{2n+1}|². The role of measurement in quantum theory is very different from that of classical mechanics; the very act of measuring some observable characteristic of the particle actually affects and changes the wave function.
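The coefficients α_n and Parseval's identity can be checked by direct numerical integration. A minimal Python/NumPy sketch (taking a = 1 for convenience; the grid size is an arbitrary choice of mine):

# Numerical check of the expansion coefficients alpha_n = 8*sqrt(15)/(pi^3 n^3) (odd n)
# and of Parseval's identity, taking a = 1.
import numpy as np

a = 1.0
x = np.linspace(0.0, a, 200001)
dx = x[1] - x[0]
psi = np.sqrt(30.0 / a**5) * x * (a - x)

def alpha(n):
    psi_n = np.sqrt(2.0 / a) * np.sin(n * np.pi * x / a)
    return np.sum(psi * psi_n) * dx            # approximates the integral <psi, psi_n>

print(alpha(1), 8 * np.sqrt(15) / np.pi**3)    # agree to several decimal places
print(alpha(2))                                # even coefficients vanish
print(sum(alpha(n)**2 for n in range(1, 60)))  # close to 1, as Parseval's identity predicts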
From the more general point of view, it is important that these wave functions lie not just in an infinite-dimensional complex inner product space, but that this space is a Hilbert space, meaning it is complete – Cauchy sequences are convergent. There is a (somewhat technical) version of the spectral theorem for Hilbert spaces which is the subject of the third year functional analysis courses.

3.1 Epilogue – Singular Value Decomposition (Off-syllabus)

We conclude with an important related theorem, namely the singular value decomposition theorem, which applies not just to square matrices and is important in numerical analysis, signal processing and pattern recognition, and in particular is used in the Trinity term Statistics and Data Analysis course when discussing principal component analysis.
Recall that, given an m × n matrix A of rank r, there exist an invertible m × m matrix P and an invertible n × n matrix Q such that

PAQ = \begin{pmatrix} I_r & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix}.



The matrix P results from the elementary matrices used to put A into RRE form, and then ECOs can be used to move the r leading 1s to the first r columns and clear out the rest of the rows.
A natural question is: what form can A be put into if P and Q are required to be orthogonal instead?

Theorem 87 (Singular Value Decomposition) Let A be an m × n matrix of rank r. Then there exist an orthogonal m × m matrix P and an orthogonal n × n matrix Q such that

PAQ = \begin{pmatrix} D & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix}    (3.4)

where D is an invertible diagonal r × r matrix with positive entries listed in decreasing order.

Proof. Note that A^T A is a symmetric n × n matrix. So by the spectral theorem there is an n × n orthogonal matrix Q such that

Q^T A^T AQ = \begin{pmatrix} Δ & 0_{r,n-r} \\ 0_{n-r,r} & 0_{n-r,n-r} \end{pmatrix},

where Δ is a diagonal r × r matrix with its diagonal entries in decreasing order. Note that A^T A has the same rank r as A and that the eigenvalues of A^T A are non-negative, the positive eigenvalues being the entries of Δ. If we write

Q = (Q_1  Q_2)

where Q_1 is n × r and Q_2 is n × (n - r) then we have in particular that

Q_1^T A^T AQ_1 = Δ;   Q_2^T A^T AQ_2 = 0;   Q_1^T Q_1 = I_r;   Q_1 Q_1^T + Q_2 Q_2^T = I_n,

the last two equations following from Q's orthogonality. From the second equation AQ_2 = 0_{m,n-r}.
If Δ = diag(λ_1, ..., λ_r) then we may set D = diag(√λ_1, ..., √λ_r) so that D² = Δ. We then define P_1 to be the m × r matrix

P_1 = AQ_1 D⁻¹.

Note that

P_1 DQ_1^T = AQ_1 Q_1^T = A(I_n - Q_2 Q_2^T) = A - (AQ_2)Q_2^T = A.

We are almost done now as, by the transpose product rule and because D is diagonal, we have

P_1^T P_1 = (AQ_1 D⁻¹)^T (AQ_1 D⁻¹) = D⁻¹ Q_1^T A^T AQ_1 D⁻¹ = D⁻¹ Δ D⁻¹ = I_r

and also that

P_1^T AQ_1 = P_1^T P_1 D = I_r D = D.

That P_1^T P_1 = I_r means the columns of P_1 form an orthonormal set, which may then be extended to an orthonormal basis for R^m. We put these vectors as the columns of an orthogonal m × m matrix P^T = (P_1  P_2) and note that

P_2^T AQ_1 = P_2^T P_1 D = 0_{m-r,r} D = 0_{m-r,r}

as the columns of P^T are mutually orthogonal. Finally we have that PAQ equals

\begin{pmatrix} P_1^T \\ P_2^T \end{pmatrix} A (Q_1  Q_2) = \begin{pmatrix} P_1^T \\ P_2^T \end{pmatrix} (AQ_1  0_{m,n-r}) = \begin{pmatrix} P_1^T AQ_1 & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix} = \begin{pmatrix} D & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix}.

Example 88 Find the singular value decomposition of

A = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 2 & 1 & -1 \end{pmatrix}.

Solution. Firstly

A^T A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 2 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 2 & 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 4 & 2 & -2 \\ 2 & 2 & 5 & 1 \\ 1 & -2 & 1 & 2 \end{pmatrix}.

A^T A has characteristic polynomial

x⁴ - 12x³ + 35x² = x²(x - 5)(x - 7).

Normalized 7- and 5-eigenvectors give the first two columns of Q, and the remaining two columns may be taken to be any orthonormal basis of ker(A^T A) = ker A; for example we can take

Q = \begin{pmatrix} 1/\sqrt{14} & 1/\sqrt{10} & -4/\sqrt{21} & -1/\sqrt{15} \\ 2/\sqrt{14} & -2/\sqrt{10} & -1/\sqrt{21} & 2/\sqrt{15} \\ 3/\sqrt{14} & 1/\sqrt{10} & 2/\sqrt{21} & -1/\sqrt{15} \\ 0 & 2/\sqrt{10} & 0 & 3/\sqrt{15} \end{pmatrix},

so that Q^T A^T AQ = diag(7, 5, 0, 0). We then set D = diag(√7, √5) and P_1 = AQ_1 D⁻¹ to give

P_1 = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 2 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{14} & 1/\sqrt{10} \\ 2/\sqrt{14} & -2/\sqrt{10} \\ 3/\sqrt{14} & 1/\sqrt{10} \\ 0 & 2/\sqrt{10} \end{pmatrix} \begin{pmatrix} 1/\sqrt{7} & 0 \\ 0 & 1/\sqrt{5} \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}

and so

P = P_1^T = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}.
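The computation can be checked against NumPy's built-in SVD routine, which returns the singular values directly; the minimal sketch below also verifies that the P and Q found above satisfy PAQ = (D | 0).

# Check of Example 88: singular values of A are sqrt(7) and sqrt(5),
# and the P, Q found by hand satisfy P A Q = (D | 0).
import numpy as np

A = np.array([[1., 0., 2., 1.],
              [0., 2., 1., -1.]])

print(np.linalg.svd(A, compute_uv=False))     # [sqrt(7), sqrt(5)]

P = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2.)
Q = np.column_stack([
    np.array([1., 2., 3., 0.]) / np.sqrt(14.),
    np.array([1., -2., 1., 2.]) / np.sqrt(10.),
    np.array([-4., -1., 2., 0.]) / np.sqrt(21.),
    np.array([-1., 2., -1., 3.]) / np.sqrt(15.),
])
print(np.round(P @ A @ Q, 10))                # [[sqrt(7), 0, 0, 0], [0, sqrt(5), 0, 0]]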

Remark 89 With notation as in Theorem 87, the pseudoinverse (or Moore-Penrose inverse) of A is

A⁺ = Q \begin{pmatrix} D^{-1} & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{pmatrix} P.

The following facts are then true of the pseudoinverse.
(a) If A is invertible then A⁻¹ = A⁺.
(b) (A^T)⁺ = (A⁺)^T.
(c) (AB)⁺ ≠ B⁺A⁺ in general.
(d) The pseudoinverse has the following properties.

(I) AA⁺A = A;   (II) A⁺AA⁺ = A⁺;   (III) AA⁺ and A⁺A are symmetric.

(e) A⁺ is the only matrix to have the properties I, II, III.
(f) AA⁺ is orthogonal projection onto the column space of A.
(g) If the columns of A are independent then A⁺ = (A^T A)⁻¹A^T.
(h) Let b be in R^m_col and set x_0 = A⁺b. Then

|Ax - b| ≥ |Ax_0 - b| for all x in R^n_col.
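Property (h) says that x_0 = A⁺b is a least-squares solution of Ax = b. A minimal Python/NumPy sketch (np.linalg.pinv computes the Moore-Penrose inverse via the SVD; the random matrices are arbitrary test data of mine):

# The pseudoinverse solves least-squares problems: x0 = A+ b minimizes |Ax - b|.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))            # tall matrix with independent columns
b = rng.standard_normal(6)

A_plus = np.linalg.pinv(A)                 # Moore-Penrose pseudoinverse via the SVD
x0 = A_plus @ b

# Property (g): with independent columns, A+ = (A^T A)^{-1} A^T.
assert np.allclose(A_plus, np.linalg.inv(A.T @ A) @ A.T)

# Property (h): x0 beats (or ties) any other x.
for _ in range(5):
    x = rng.standard_normal(3)
    assert np.linalg.norm(A @ x - b) >= np.linalg.norm(A @ x0 - b)
print("least-squares property verified on random trials")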
