PART1
February 8, 2013
1 Preamble
The notes below will serve as the basis for the module lectures to be given on the
topics of Jordan form, nonnegative matrices, and Markov chains. As you'll see, most
of the proofs have been omitted in the notes. The intention is to sketch the proofs
during the lectures, but also to have the students work through some of the details
(using references such as those listed at the end of this document) as part of the
self-study component of the module. If you're having trouble getting hold of some of
the reference texts, I have copies of all of them in my office which you're welcome
to borrow from time to time. I have also included some sample exercises. These are
not intended to be terribly difficult, but if you want to discuss the problems with me
once the module is over, feel free to do so.
2 The Jordan canonical form
Suppose that the square matrix A has λ as an eigenvalue. A Jordan chain for λ
is a finite list of nonzero vectors v1, . . . , vk such that Av1 = λv1, and for j =
2, . . . , k, Avj = λvj + vj−1. Evidently v1 is an eigenvector of A corresponding to the
eigenvalue λ. The vectors v2, . . . , vk are known as generalised eigenvectors of A
corresponding to the eigenvalue λ. The algebraic multiplicity of an eigenvalue λ of A
is its multiplicity as a root of the characteristic polynomial of A, while the geometric
multiplicity of λ is the dimension of the null space of A − λI. For any eigenvalue, the
algebraic multiplicity is greater than or equal to the geometric multiplicity.
Suppose that λ ∈ C and that k ∈ N. The Jordan block Jk(λ) is the k × k upper
triangular matrix

          [ λ  1  0  . . .  0 ]
          [ 0  λ  1  . . .  0 ]
Jk(λ) =   [ . . .              ]
          [ 0  0  . . .  λ  1 ]
          [ 0  0  . . .  0  λ ]

Note that in the special case that k = 1, J1(λ) is just the 1 × 1 matrix [λ].
Remarks:
The matrix on the right hand side of (1), namely the block diagonal matrix of Jordan
blocks in the decomposition S^{-1}AS = diag(Jk1(λ1), . . . , Jkm(λm)), is the Jordan
canonical form for A. It is unique, up to a reordering of the diagonal Jordan blocks.
The numbers λ1, . . . , λm are the eigenvalues of A, and they are not necessarily
distinct.
Evidently k1 + . . . + km = n.
In the case that m = n and all kj's are equal to 1, the Jordan canonical form for A
is a diagonal matrix, and A is said to be diagonable.
The columns of the matrix S consist of eigenvectors and generalised eigenvectors
for the matrix A.
Suppose that λ is an eigenvalue of the matrix A, and that the Jordan canonical
form for A is given by (1). Let I = {j | λj = λ}. Then the algebraic multiplicity of λ
is given by Σ_{j∈I} kj, while the geometric multiplicity of λ is given by |I|.
Suppose that k, p ∈ N, λ ∈ C and consider the matrix Jk(λ)^p. An argument by
induction (on p) shows that Jk(λ)^p is an upper triangular matrix such that for each
pair of indices i, j with 1 ≤ i < j ≤ k, the (i, j) entry of Jk(λ)^p is given by:

(Jk(λ)^p)i,j = (p choose j−i) λ^{p−j+i},  if j − i ≤ p − 1,
               1,                          if j − i = p,
               0,                          if j − i > p.
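As a quick self-study check of this formula, the following short Python sketch (assuming numpy is available; the values of λ, k and p are arbitrary illustrative choices) compares Jk(λ)^p, computed by repeated multiplication, with the binomial expression for its entries.

# Sketch: numerically verify the entry formula for powers of a Jordan block.
# The values of lam, k and p below are arbitrary illustrative choices.
import numpy as np
from math import comb

lam, k, p = 2.5, 5, 7

# Build the k x k Jordan block J_k(lam): lam on the diagonal, 1 on the superdiagonal.
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)
Jp = np.linalg.matrix_power(J, p)

# Entry (i, j) should equal comb(p, j - i) * lam**(p - (j - i)) when j - i <= p,
# and 0 when j - i > p (0-based indices here).
predicted = np.zeros((k, k))
for i in range(k):
    for j in range(i, k):
        d = j - i
        if d <= p:
            predicted[i, j] = comb(p, d) * lam ** (p - d)

print(np.allclose(Jp, predicted))  # expect True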
Observe that the search for the Jordan canonical form of a given square matrix A
is essentially the same as the search for Jordan chains associated with the eigenvalues
of A. For concreteness, suppose that λ is an eigenvalue of A. We can then find a
basis for the null space of A − λI, say u1, . . . , up. Each uj is an eigenvector of A
corresponding to λ, and the geometric multiplicity of λ is p, so we know that there
will be p Jordan blocks corresponding to the eigenvalue λ in the Jordan form for
A. Further, each uj can be used to construct a Jordan chain of eigenvectors and
generalised eigenvectors, as follows.
i) Select u1, and set v1 = u1.
For each l = 1, 2, . . . , we iterate the steps below.
ii) If the null space of (A − λI)^l is a proper subspace of the null space of (A − λI)^{l+1},
we find vl+1 as a solution to the linear system (A − λI)vl+1 = vl.
iii) If the null space of (A − λI)^l is the same as the null space of (A − λI)^{l+1}, then
we stop.
Observe that this procedure produces a Jordan chain of length r (say), v1, . . . , vr,
whose initial vector is u1. The corresponding Jordan block in the Jordan form for A
will be r × r. We then repeat the procedure to produce Jordan chains for λ that start
with each of u2, . . . , up. A similar process applies to the remaining eigenvalues of A.
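For self-study, the following Python sketch (using sympy for exact arithmetic) carries out this iteration for a single eigenvalue. The 3 × 3 matrix and the eigenvalue are illustrative choices; when an eigenvalue has several Jordan blocks, the starting eigenvector must be chosen so that each linear system below remains consistent, a detail the sketch ignores.

# Sketch of the Jordan chain construction for one eigenvalue, in exact
# arithmetic via sympy. The matrix A and eigenvalue lam are illustrative.
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 1],
               [0, 0, 2]])
lam = 2
N = A - lam * sp.eye(A.rows)

# Step i): take v1 to be an eigenvector, i.e. a basis vector of null(A - lam*I).
chain = [N.nullspace()[0]]

# Steps ii)/iii): while null((A - lam*I)^l) is a proper subspace of
# null((A - lam*I)^(l+1)), solve (A - lam*I) v_{l+1} = v_l for the next
# generalised eigenvector; otherwise stop.
l = 1
while (N**l).rank() > (N**(l + 1)).rank():
    sol, params = N.gauss_jordan_solve(chain[-1])
    chain.append(sol.subs({p: 0 for p in params}))  # pick a particular solution
    l += 1

print(len(chain))   # length of the Jordan chain (3 for this A)
for v in chain:
    print(v.T)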
3 Nonnegative matrices
3.1 Primitive matrices
A square matrix A with entries in R is nonnegative (respectively, positive) if each
of its entries is nonnegative (respectively, positive). We use the notation A ≥ 0
(respectively, A > 0) to denote this, and a similar terminology and notation applies
to vectors in R^n. A nonnegative matrix A is primitive if there is a k ∈ N such that
A^k > 0.
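One crude computational test of primitivity, sketched below in Python (numpy), is to check whether the 0/1 pattern of A^k is positive for k equal to Wielandt's bound (n − 1)² + 1 from Theorem 3.15 below; a primitive matrix must have a positive power by then, and a single positive power already implies primitivity.

# Sketch: test primitivity by checking the single power k0 = (n-1)^2 + 1.
import numpy as np

def is_primitive(A):
    n = A.shape[0]
    k0 = (n - 1) ** 2 + 1
    P = (A > 0).astype(float)      # work with the 0/1 pattern only
    Pk = np.eye(n)
    for _ in range(k0):
        Pk = np.minimum(Pk @ P, 1.0)
    return bool(np.all(Pk > 0))

B = np.array([[0.0, 1.0], [1.0, 0.0]])   # irreducible but not primitive
C = np.array([[0.0, 1.0], [1.0, 1.0]])   # primitive
print(is_primitive(B), is_primitive(C))  # expect False True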
Theorem 3.1 (The Perron–Frobenius theorem for primitive matrices) Suppose that
A is a primitive nonnegative matrix of order n. Then:
a) the spectral radius, ρ(A), is an eigenvalue of A;
b) ρ(A) is an algebraically simple eigenvalue of A;
c) A has positive right and left eigenvectors corresponding to the eigenvalue ρ(A);
d) the left and right eigenspaces of A corresponding to ρ(A) both have dimension 1;
e) if λ ≠ ρ(A) is an eigenvalue of A, then |λ| < ρ(A);
f) if B is an n × n matrix such that 0 ≤ B ≤ A, then ρ(B) ≤ ρ(A), and if ρ(B) = ρ(A),
then in fact B = A;
g) if x is a positive right eigenvector of A, then necessarily Ax = ρ(A)x, with an
analogous statement holding for positive left eigenvectors.
      [ 0  3/4  1/4 ]
A =   [ 1   0    0  ] .
      [ 0   1    0  ]
The next result follows from Theorems 2.1 and 3.1. Here we use J to denote an
all-ones matrix.
Theorem 3.3 Let A be a primitive nonnegative matrix of order n with Perron value
λ. Let x and y denote right and left Perron vectors of A respectively, normalised so
that y^T x = 1. Label the eigenvalues of A as λ, λ2, . . . , λn, where |λ2| ≥ |λ3| ≥ . . . ≥
|λn|. If λ2 ≠ 0, there is a constant C such that

| (1/λ^k) A^k − x y^T | ≤ C k^{n−1} (|λ2|/λ)^k J,

where the inequality holds entrywise. On the other hand, if λ2 = 0, then (1/λ^k) A^k = x y^T
for all k ≥ n − 1. In particular, in either case we have

lim_{k→∞} (1/λ^k) A^k = x y^T.
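The convergence in Theorem 3.3 is easy to observe numerically; the Python sketch below (numpy; the 3 × 3 primitive matrix is an arbitrary choice) normalises the Perron vectors so that y^T x = 1 and prints the entrywise distance between (1/λ^k)A^k and xy^T for a few values of k.

# Sketch: illustrate (1/lambda^k) A^k -> x y^T for a primitive matrix.
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.5, 0.0, 1.0],
              [1.0, 1.0, 1.0]])      # an arbitrary primitive example

def perron_pair(M):
    """Perron value and a positive Perron vector of a primitive matrix."""
    eigvals, eigvecs = np.linalg.eig(M)
    i = np.argmax(eigvals.real)
    return eigvals.real[i], np.abs(eigvecs[:, i].real)

lam, x = perron_pair(A)
_, y = perron_pair(A.T)              # left Perron vector of A
y = y / (y @ x)                      # normalise so that y^T x = 1

for k in (1, 5, 20, 60):
    Ak = np.linalg.matrix_power(A, k) / lam**k
    print(k, np.max(np.abs(Ak - np.outer(x, y))))   # error shrinks with k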
happens to be primitive with Perron vector x, then from Corollary 3.4 we find that
as k → ∞,

(1/(1^T p(k))) p(k) → (1/(1^T x)) x.

For this reason, the vector (1/(1^T x)) x is sometimes known as the stable distribution vector
for the population. Note also that the total size of the population at time k is
1^T p(k). Since

lim_{k→∞} (1^T p(k + 1))/(1^T p(k)) = ρ(A),

the Perron value of A is interpreted as the asymptotic growth rate of the population.
Example 3.5 (North American right whale) The female subpopulation is under con-
sideration here, the time unit is one year, and the population is subdivided into five
categories: calf, immature female, mature but non-reproductive female, mother, and
resting mother (right whales do not reproduce in the year after giving birth). The
corresponding population projection matrix (based on estimated data) is
      [ 0    0    .13  0    0 ]
      [ .9   .85  0    0    0 ]
A =   [ 0    .12  .71  0    1 ]          (2)
      [ 0    0    .29  0    0 ]
      [ 0    0    0    .85  0 ]
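As a computational sketch (Python/numpy), the asymptotic growth rate and the stable distribution vector for the matrix (2) can be obtained directly from its Perron value and normalised right Perron vector:

# Sketch: Perron value (asymptotic growth rate) and stable distribution
# vector for the right whale projection matrix (2).
import numpy as np

A = np.array([[0.0, 0.0, 0.13, 0.0, 0.0],
              [0.9, 0.85, 0.0, 0.0, 0.0],
              [0.0, 0.12, 0.71, 0.0, 1.0],
              [0.0, 0.0, 0.29, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.85, 0.0]])

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals.real)
growth_rate = eigvals.real[i]             # the Perron value of A
x = np.abs(eigvecs[:, i].real)
stable_dist = x / x.sum()                 # normalised so its entries sum to 1

print(growth_rate)
print(stable_dist)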
Example 3.6 (Desert tortoise) The population projection matrix for this species is

      [ 0      0      0      0      0      1.300  1.980  2.570 ]
      [ 0.716  0.567  0      0      0      0      0      0     ]
      [ 0      0.149  0.567  0      0      0      0      0     ]
A =   [ 0      0      0.149  0.604  0      0      0      0     ]
      [ 0      0      0      0.235  0.560  0      0      0     ]
      [ 0      0      0      0      0.225  0.678  0      0     ]
      [ 0      0      0      0      0      0.249  0.851  0     ]
      [ 0      0      0      0      0      0      0.016  0.860 ]
The corresponding Perron value is 0.9581, and the stable distribution vector is
(0.2217, 0.4058, 0.1546, 0.0651, 0.0384, 0.0309, 0.0718, 0.0117)^T.
One question of interest in the mathematical ecology literature is the sensitivity
of the asymptotic growth rate to changes in demographic parameters, or, in mathe-
matical language, the sensitivity of the Perron value of A to changes in the entries of
A. To frame the issue more precisely, suppose that we're given a primitive matrix A
of order n, and we consider its Perron value ρ(A) as a function of the n² entries in
A. One way of measuring the sensitivity of the Perron value is as follows: fix a pair
of indices i, j between 1 and n. Now consider

∂ρ(A)/∂ai,j = lim_{ε→0} (1/ε) (ρ(A + εEi,j) − ρ(A)).

Here Ei,j denotes the (0, 1) matrix with a 1 in the (i, j) position, and 0's everywhere
else.
Theorem 3.7 Let A be a primitive n × n matrix with Perron value ρ(A). Suppose
that x and y are right and left Perron vectors for A, respectively, normalised so that
y^T x = 1. Fix a pair of indices i, j between 1 and n. Then

∂ρ(A)/∂ai,j = xj yi.
Example 3.8 We revisit the matrix of Example 3.6. We saw in that example that
the stable distribution vector is
x = (0.2217, 0.4058, 0.1546, 0.0651, 0.0384, 0.0309, 0.0718, 0.0117)^T,
and it turns out that the left Perron vector y, normalised so that y^T x = 1, is given
by
y = (0.1955, 0.2616, 0.6866, 1.8019, 2.7148, 4.8029, 4.3813, 5.1237)^T.
From Theorem 3.7 we find that the matrix of derivatives for the Perron value is given
by

         [ 0.0433  0.0793  0.0302  0.0127  0.0075  0.0060  0.0140  0.0023 ]
         [ 0.0580  0.1062  0.0405  0.0170  0.0100  0.0081  0.0188  0.0031 ]
         [ 0.1522  0.2786  0.1062  0.0447  0.0264  0.0212  0.0493  0.0080 ]
y x^T =  [ 0.3994  0.7313  0.2786  0.1173  0.0692  0.0556  0.1294  0.0211 ]
         [ 0.6018  1.1018  0.4198  0.1767  0.1043  0.0838  0.1949  0.0318 ]
         [ 1.0646  1.9492  0.7427  0.3126  0.1845  0.1482  0.3448  0.0563 ]
         [ 0.9712  1.7782  0.6775  0.2851  0.1683  0.1352  0.3145  0.0513 ]
         [ 1.1357  2.0794  0.7923  0.3334  0.1968  0.1581  0.3678  0.0600 ]
Here the entries in the positions where the original population projection matrix has
positive entries are the demographically relevant ones.
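The derivative matrix above can be reproduced along the following lines (a Python/numpy sketch using the desert tortoise matrix of Example 3.6; by Theorem 3.7 the outer product y x^T delivers all n² sensitivities at once).

# Sketch: sensitivities of the Perron value, d(rho)/d(a_{i,j}) = x_j * y_i,
# assembled as the outer product y x^T (Theorem 3.7), for Example 3.6.
import numpy as np

A = np.array([
    [0.0,   0.0,   0.0,   0.0,   0.0,   1.300, 1.980, 2.570],
    [0.716, 0.567, 0.0,   0.0,   0.0,   0.0,   0.0,   0.0],
    [0.0,   0.149, 0.567, 0.0,   0.0,   0.0,   0.0,   0.0],
    [0.0,   0.0,   0.149, 0.604, 0.0,   0.0,   0.0,   0.0],
    [0.0,   0.0,   0.0,   0.235, 0.560, 0.0,   0.0,   0.0],
    [0.0,   0.0,   0.0,   0.0,   0.225, 0.678, 0.0,   0.0],
    [0.0,   0.0,   0.0,   0.0,   0.0,   0.249, 0.851, 0.0],
    [0.0,   0.0,   0.0,   0.0,   0.0,   0.0,   0.016, 0.860],
])

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals.real)
x = np.abs(eigvecs[:, i].real)
x = x / x.sum()                       # stable distribution vector

eigvals_l, eigvecs_l = np.linalg.eig(A.T)
j = np.argmax(eigvals_l.real)
y = np.abs(eigvecs_l[:, j].real)
y = y / (y @ x)                       # normalise so that y^T x = 1

S = np.outer(y, x)                    # S[i, j] = x_j * y_i
print(np.round(S, 4))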
[Figure: a directed graph on vertices 1, . . . , 5.]
A walk in a directed graph D from vertex i to vertex j is a sequence of vertices
i ≡ i0 → i1 → . . . → ik ≡ j such that each ip → ip+1 is an arc of D; the length of a
walk is the number of arcs it contains. If the vertices on a walk are all distinct, then
the walk is called a path, while if we have a walk from vertex i to vertex i of the form
i ≡ i0 → i1 → i2 → . . . → ik ≡ i and the vertices i0, i1, . . . , ik−1 are distinct, then the
walk is called a cycle. A cycle of length 1, that is, an arc of the form i → i, is called
a loop. Finally, a directed graph D is strongly connected if it has the property that
for any pair of vertices i, j of D, there is a walk from i to j in D. For a square
nonnegative matrix A of order n, the directed graph D(A) of A has vertex set
{1, . . . , n}, with an arc i → j whenever ai,j > 0.
A square matrix A is said to be reducible if there is a permutation matrix P such
that P A P^T is in block triangular form, i.e.

P A P^T = [ A1,1  A1,2 ]
          [  0    A2,2 ] ,

where the blocks A1,1 and A2,2 are square. A matrix A is irreducible if no such per-
mutation exists. The following result describes the relationship between irreducibility
and directed graphs.
Theorem 3.10 A square nonnegative matrix A is irreducible if and only if its directed
graph D(A) is strongly connected.
The connection between primitivity and directed graphs becomes more apparent
with the following result.
Theorem 3.11 Let A be a square nonnegative matrix. Then A^k has a positive entry
in position (i, j) if and only if D(A) contains a walk from vertex i to vertex j of length
k.
From Theorem 3.11 we deduce that a square nonnegative matrix A is primitive
if and only if there is a k ∈ N such that for any two vertices in D(A), there is a
walk of length exactly k. From this we deduce that any primitive matrix must have
a strongly connected directed graph. However, the converse fails, as the following
example shows.
Example 3.12 Consider the following 3 × 3 matrix:

      [  0   1   0  ]
A =   [ 1/2  0  1/2 ] .
      [  0   1   0  ]
The directed graph for A is below.
[Figure: D(A), a directed graph on vertices 1, 2, 3 with arcs 1 ↔ 2 and 2 ↔ 3.]
The following number-theoretic result, sometimes called the postage stamp lemma,
is helpful. Recall that the exponent exp(A) of a primitive matrix A is the smallest
k ∈ N such that A^k > 0.
Theorem 3.15 Let A be a primitive matrix of order n. Consider D(A), and let s
denote the length of a shortest cycle in D(A). Then we have

exp(A) ≤ n + s(n − 2)

and

exp(A) ≤ (n − 1)² + 1.          (3)
Further, equality holds in (3) if and only if there is a permutation matrix P and a
collection of positive numbers a1, . . . , an+1 such that either P A P^T or P A^T P^T has the
following form:

[ 0     a1  0   . . .  0    ]
[ 0     0   a2  . . .  0    ]
[ . . .                     ]          (4)
[ an    0   0   . . .  an−1 ]
[ an+1  0   0   . . .  0    ]
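The exponent of a primitive matrix is easy to compute directly for small examples. The Python sketch below (numpy) finds the smallest k with A^k entrywise positive and compares it with the bound (3), using the 0/1 pattern of the form (4) with n = 5 (all ai set to 1), for which equality in (3) is expected.

# Sketch: compute exp(A) as the smallest k with A^k > 0, and compare
# with Wielandt's bound (3). W below is the 0/1 pattern of the form (4)
# with n = 5 and every a_i = 1.
import numpy as np

def exponent(A):
    n = A.shape[0]
    bound = (n - 1) ** 2 + 1
    P = (A > 0).astype(float)
    Pk = P.copy()
    for k in range(1, bound + 1):
        if np.all(Pk > 0):
            return k
        Pk = np.minimum(Pk @ P, 1.0)   # 0/1 pattern of A^(k+1)
    return None                        # not primitive

n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = 1.0                  # arcs i -> i+1
W[n - 2, 0] = 1.0                      # arc (n-1) -> 1
W[n - 1, 0] = 1.0                      # arc n -> 1

print(exponent(W), (n - 1) ** 2 + 1)   # equality in (3) for this pattern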
For each k ∈ N, we iterate as follows:

xi(k) = Σ_{j : j→i} yj(k − 1),          (5)

yi(k) = Σ_{j : i→j} xj(k).              (6)

Equation (5) can be interpreted as saying that the k-step authority score for vertex
i is given by the sum of the (k − 1)-step hub scores of the vertices having out-arcs to
vertex i. From (6), we see that the k-step hub score for vertex i is the sum of the
k-step authority scores of the vertices to which i has an out-arc. We can recast (5)
and (6) in matrix-vector terms as

x(k) = A^T y(k − 1),   y(k) = A x(k),   k ∈ N.
It now follows that x(k) = (A^T A)^k x(0), y(k) = (AA^T)^k y(0), k ∈ N. Since we're inter-
ested in the relative sizes of the entries in x(k) (respectively, y(k)), we consider the
normalised sequences x(k)/(1^T x(k)) and y(k)/(1^T y(k)), k ∈ N.
Suppose now that both AA^T and A^T A are primitive, and let u and v denote the
Perron vectors of A^T A and AA^T, respectively, normalised so that 1^T u = 1^T v = 1.
Then we have

lim_{k→∞} x(k)/(1^T x(k)) = u,   lim_{k→∞} y(k)/(1^T y(k)) = v.

Hence we take u and v as the authority and hub vectors for D, respectively.
Example 3.16 Consider the directed graph D shown in the figure below. The corre-
sponding adjacency matrix is

      [ 0  1  0  1 ]
A =   [ 0  0  1  0 ] ,
      [ 1  1  0  1 ]
      [ 0  1  1  0 ]
so that

          [ 2  0  2  1 ]              [ 2  1  1  1 ]
A^T A =   [ 0  2  1  0 ]  ,  AA^T =   [ 1  3  0  2 ] .
          [ 2  1  3  1 ]              [ 1  0  1  0 ]
          [ 1  0  1  1 ]              [ 1  2  0  2 ]
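As a computational sketch (Python/numpy; using the adjacency matrix as printed above), the limiting authority and hub vectors for this example are the normalised Perron vectors of A^T A and AA^T; here they are obtained from an eigendecomposition rather than the power method.

# Sketch: authority and hub vectors for Example 3.16, taken as the
# normalised Perron vectors of A^T A and A A^T respectively.
import numpy as np

A = np.array([[0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])

def perron_vector(M):
    """Perron vector of a nonnegative matrix, normalised to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(M)
    i = np.argmax(eigvals.real)
    v = np.abs(eigvecs[:, i].real)
    return v / v.sum()

authority = perron_vector(A.T @ A)   # limit of the normalised x(k)
hub = perron_vector(A @ A.T)         # limit of the normalised y(k)
print(authority)
print(hub)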
[Figure: the directed graph D of Example 3.16, on vertices 1, 2, 3, 4.]
The implementation of HITS is query-dependent. That is, once the user enters the
term(s) to be searched on, a search of web pages containing those terms (and
possibly some semantically related terms) is performed. That yields the collection of
web pages that form the vertex set of the directed graph D; we then include in D
all arcs of the form i → j where i and j are among the pages containing the search
terms, and where page i has a link out to page j. Apparently a version of HITS is
used by the search engine Teoma, which now belongs to Ask.com.
Remarks:
The matrix AA^T is primitive if and only if the corresponding directed graph is
strongly connected. A similar comment applies to A^T A.
If A contains neither a zero row nor a zero column, then AA^T is primitive if and
only if A^T A is primitive.
If the authority score vector x has been computed, we may find y easily via the
formula y = (1/(1^T Ax)) Ax.
In practice the vector x can often be estimated by the power method, that is, by
selecting some nonnegative initial vector x(0), and then computing x(k) = A^T A x(k − 1)
for several iterations (say, 10-15).
For an irreducible nonnegative matrix A that is periodic with period d, we can
simultaneously permute the rows and columns of A to put it in periodic normal form.
That is, there is a permutation matrix P so that P A P^T has the form

            [ 0    A1   0    . . .  0    ]
            [ 0    0    A2   . . .  0    ]
P A P^T =   [ . . .                      ]          (7)
            [ 0    0    . . .  0    Ad−1 ]
            [ Ad   0    . . .  0    0    ]
The following weaker version of Theorem 3.3 holds for irreducible periodic matri-
ces.
Theorem 3.18 Suppose that A is an irreducible nonnegative matrix that is periodic
with period d. Let x and y denote right and left Perron vectors for A, respectively,
normalised so that y^T x = 1. Then

lim_{k→∞} (1/d) Σ_{j=k}^{k+d−1} (1/ρ(A)^j) A^j = x y^T.
Example 3.19 Consider the adjacency matrix A of the directed graph in Figure 3.3.
Then

      [ 0  1  0 ]
A =   [ 1  0  1 ] .
      [ 0  1  0 ]

The eigenvalues of A are √2, −√2, 0, with corresponding (right) eigenvectors given
by

(1/2, 1/√2, 1/2)^T ,   (1/2, −1/√2, 1/2)^T ,   (1/√2, 0, −1/√2)^T ,
respectively. Observe that for the permutation matrix
0 1 0
P = 1 0 0 ,
0 0 1
we have
0 1 1
P AP T = 1 0 0 ,
1 0 0
which is in periodic normal form.
Example 3.20 Consider the matrix

      [ 0  1  0  0  0  0 ]
      [ 0  0  1  0  0  0 ]
A =   [ 0  0  0  1  0  0 ]
      [ 0  0  0  0  1  0 ]
      [ 0  0  0  0  0  1 ]
      [ 1  0  0  1  0  0 ]
Starting with S0 = {1}, we iterate the procedure as above to find that S2 = {2}, S3 =
{3}, S4 = {4}, S5 = {5}, S6 = {6}, S7 = {1, 4}, S8 = {2, 5}, S9 = {3, 6}. Further, for
all k ≥ 7, we find that

Sk = S7, if k ≡ 1 (mod 3),
     S8, if k ≡ 2 (mod 3),
     S9, if k ≡ 0 (mod 3).
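A Python sketch of this iteration is given below. Here Sk is taken to be the set of vertices reachable from vertex 1 by walks of length exactly k; the indexing of the sets may be shifted relative to the example above, but the eventual cyclic pattern, and hence the period 3, is the same.

# Sketch: iterate the sets S_k of vertices reachable from vertex 1 by walks
# of length exactly k, and report the eventual period of the pattern.
import numpy as np

A = np.array([[0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1],
              [1, 0, 0, 1, 0, 0]], dtype=float)

out_neighbours = [set(int(j) + 1 for j in np.nonzero(A[i])[0])
                  for i in range(A.shape[0])]

S = {1}                                # vertices are labelled 1, ..., 6
history = [frozenset(S)]
for k in range(1, 20):
    S = set().union(*(out_neighbours[v - 1] for v in S))
    history.append(frozenset(S))

# Once a set repeats, the sequence of sets cycles; the gap between two
# consecutive occurrences of the same set is the period.
first = history.index(history[-1])
second = history.index(history[-1], first + 1)
print(second - first)                  # prints 3 for this matrix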
We close the section with the following result on irreducible nonnegative matrices.
Theorem 3.21 Let A be an irreducible nonnegative matrix, and suppose that r >
ρ(A). Then rI − A is invertible, and in fact (rI − A)^{-1} = Σ_{j=0}^{∞} (1/r^{j+1}) A^j. In particular,
(rI − A)^{-1} has all positive entries.
Any square nonnegative matrix A can be symmetrically permuted into block upper
triangular form: there is a permutation matrix P so that

            [ A1,1  A1,2  . . .  A1,m ]
P A P^T =   [ 0     A2,2  . . .  A2,m ]          (8)
            [ . . .                   ]
            [ 0     . . .  0     Am,m ]

where each Aj,j is either square and irreducible, or is a 1 × 1 zero matrix. The block
triangular form in (8) is known as the Frobenius normal form for A. It is straightfor-
ward to see that for such a matrix A, we have ρ(A) = max{ρ(Aj,j) | j = 1, . . . , m}, so
that for reducible nonnegative matrices, the spectral radius is an eigenvalue. Observe
that a reducible nonnegative matrix may or may not have a positive right eigenvector
corresponding to the spectral radius; for instance consider the examples

[ 1/2  1/2 ]        [ 1/2  0 ]
[  0    1  ]   and  [  0   1 ] .
Suppose that we have a reducible nonnegative matrix A, and consider the directed
graph D(A). We say that two vertices i, j of D(A) communicate if either i = j, or
there are paths from i to j and from j to i in D(A). Evidently communication is
an equivalence relation, and the corresponding equivalence classes (which are sets of
indices) are called the classes for A. It is straightforward to determine that in fact
the classes for A are the same as the subsets of indices that correspond to diagonal
blocks in the Frobenius normal form for A. Suppose that we have two classes S and
T for A. We say that S has access to T if there is an i ∈ S and a j ∈ T such that
i has access to j. A class is called final if it does not have access to any other class.
Observe that any class is either a final class, or it has access to a final class. Lastly,
we say that a class S is basic if ρ(A[S]) = ρ(A), where A[S] is the principal submatrix
of A on the rows and columns indexed by S.
The following result addresses the existence of positive eigenvectors for reducible
nonnegative matrices.
4 Markov chains
Suppose that u1, u2, u3, . . . is a sequence of random variables, each taking values in
the set {1, . . . , n}; here Pr{A} and Pr{A|B} denote the probability of the event A
and the conditional probability of the event A given event B, respectively. We say
that the sequence uk has the Markov property if, for each k ∈ N,

Pr{uk+1 |u1, u2, . . . , uk} = Pr{uk+1 |uk}.
A sequence of random variables that enjoys the Markov property is known as a Markov
chain. A Markov chain is said to be time homogeneous if there is a collection of fixed
probabilities ti,j, i, j = 1, . . . , n such that for each k ∈ N and i, j = 1, . . . , n we have
P r{uk+1 = j|uk = i} = ti,j .
In this setting we refer to the quantity ti,j as the transition probability from state i
to state j for the Markov chain. For a time homogeneous Markov chain, we may
construct the corresponding transition matrix T as T = [ti,j ]i,j=1,...,n . We can also
represent the iterates uk of the Markov chain as vectors

vk = (Pr{uk = 1}, Pr{uk = 2}, . . . , Pr{uk = n})^T .
Observe that v_k^T 1 = 1 for each k ∈ N. It is straightforward to verify that for each
k ∈ N,

v_{k+1}^T = v_k^T T.

This last relation can be reformulated equivalently as

v_{k+1}^T = v_1^T T^k,   k ∈ N.          (9)

Evidently we may view the iterates of a time homogeneous Markov chain on the state
space {1, . . . , n} as a realisation of the power method, whereby powers of the stochastic
matrix T are applied to an initial vector v_1^T.
Observe that for any stochastic matrix T, we have ρ(T) = 1, with 1 as a corre-
sponding right eigenvector.
Theorem 4.1 (Perron–Frobenius theorem for stochastic matrices) Suppose that T is
an irreducible stochastic matrix of order n. Then we have the following:
a) ρ(T) = 1 is an algebraically and geometrically simple eigenvalue of T;
b) the right eigenspace corresponding to the eigenvalue 1 is spanned by 1;
c) there is a positive left eigenvector w corresponding to the eigenvalue 1, normalised
so that w^T 1 = 1;
d) if T is primitive, then for any eigenvalue λ ≠ 1 we have |λ| < 1;
e) if T is periodic with period d, then each of e^{2πij/d}, j = 0, . . . , d − 1, is an algebraically
simple eigenvalue of T, while all remaining eigenvalues have modulus strictly less than
1.
For an irreducible stochastic matrix T , the left eigenvector w of Theorem 4.1 is
known as the stationary distribution vector for the corresponding Markov chain, and
is a central quantity of interest.
Corollary 4.2 Suppose that T is a primitive stochastic matrix of order n with sta-
tionary distribution vector w. Then for any initial vector v ∈ R^n with v ≥ 0, 1^T v = 1,
we have

lim_{k→∞} v^T T^k = w^T.
Thus, a Markov chain with transition matrix T converges to the stationary distribution
vector w, independently of the initial distribution.
In view of Corollary 4.2, we have the interpretation that each entry in the sta-
tionary vector represents the longterm probability that the Markov chain is in the
corresponding state. So, a large entry in the stationary distribution vector corre-
sponds to a state that is visited frequently by the Markov chain, while a small entry
in the stationary distribution vector corresponds to an infrequently visited state.
Theorem 4.4 Suppose that we have an irreducible stochastic matrix T of order n
with stationary vector w. Fix an index k between 1 and n − 1, and partition T and w
conformally as
T =   [ T1,1  T1,2 ]
      [ T2,1  T2,2 ] ,
where T1,1 and T2,2 are of orders k and n − k, respectively. Then we have the following
conclusions.
a) The matrices S1 ≡ T1,1 + T1,2(I − T2,2)^{-1} T2,1 and S2 ≡ T2,2 + T2,1(I − T1,1)^{-1} T1,2
are irreducible and stochastic, of orders k and n − k, respectively.
b) Denoting the stationary distribution vectors of S1 and S2 by u1 and u2 , respectively,
the vector w is given by
w =   [ a1 u1 ]
      [ a2 u2 ] ,

where the vector (a1, a2)^T is the stationary vector of the 2 × 2 matrix

C =   [ u1^T T1,1 1   u1^T T1,2 1 ]
      [ u2^T T2,1 1   u2^T T2,2 1 ] .
The matrices S1 , S2 in Theorem 4.4 are called stochastic complements, while the
matrix C is known as the coupling matrix. Observe that Theorem 4.4 allows for the
following divide and conquer strategy for computing the stationary vector.
i) partition T ;
ii) compute the stochastic complements S1 , S2 ;
iii) find the stationary vectors u1 , u2 for S1 and S2 ;
iv) find the coupling matrix C and its stationary distribution vector;
v) assemble the stationary vector w from u1 , u2 , a1 , a2 .
Note also that at step iii) we could iterate the divide and conquer strategy if we like.
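A small Python/numpy sketch of this divide and conquer computation is given below; the 4 × 4 irreducible stochastic matrix T and the partition point k = 2 are illustrative choices, and the result is compared with a direct computation of the stationary vector.

# Sketch: stationary vector via stochastic complements and the coupling
# matrix (Theorem 4.4), for an illustrative irreducible stochastic T.
import numpy as np

def stationary(T):
    """Stationary distribution of a stochastic matrix, via an eigenvector."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    i = np.argmin(np.abs(eigvals - 1.0))
    w = np.abs(eigvecs[:, i].real)
    return w / w.sum()

T = np.array([[0.1, 0.4, 0.3, 0.2],
              [0.5, 0.1, 0.2, 0.2],
              [0.2, 0.2, 0.3, 0.3],
              [0.3, 0.3, 0.2, 0.2]])
k = 2
T11, T12, T21, T22 = T[:k, :k], T[:k, k:], T[k:, :k], T[k:, k:]
I1, I2 = np.eye(k), np.eye(T.shape[0] - k)

# Steps i)-iii): stochastic complements and their stationary vectors.
S1 = T11 + T12 @ np.linalg.solve(I2 - T22, T21)
S2 = T22 + T21 @ np.linalg.solve(I1 - T11, T12)
u1, u2 = stationary(S1), stationary(S2)

# Steps iv)-v): coupling matrix, its stationary vector, and reassembly.
one1, one2 = np.ones(k), np.ones(T.shape[0] - k)
C = np.array([[u1 @ T11 @ one1, u1 @ T12 @ one2],
              [u2 @ T21 @ one1, u2 @ T22 @ one2]])
a = stationary(C)
w = np.concatenate([a[0] * u1, a[1] * u2])

print(w)
print(stationary(T))   # direct computation, for comparison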
Suppose that we have a reducible stochastic matrix T written in Frobenius normal
form as

[ T1,1  T1,2  . . .  T1,m ]
[ 0     T2,2  . . .  T2,m ]
[ . . .                   ]
[ 0     . . .  0     Tm,m ]
Suppose that we have a basic class for T, say corresponding to the irreducible diagonal
block Tj,j. Since T is a stochastic matrix, we see that Tj,j 1 ≤ 1, and letting u be
the stationary vector for Tj,j, we find that on the one hand u^T Tj,j 1 = u^T 1 = 1 (since
ρ(Tj,j) = 1), while on the other hand u^T Tj,j 1 ≤ u^T 1 = 1. We deduce that in fact
we must have Tj,j 1 = 1, from which it follows that the blocks Tj,k, k = j + 1, . . . , m,
must all be zero blocks. Noting also that necessarily Tm,m is a basic class, we deduce
that T has the number 1 as an algebraically simple eigenvalue if and only if the only
basic class for T corresponds to the block Tm,m. When that holds, there is again a
unique left eigenvector w of T corresponding to the eigenvalue 1, normalised
so that w^T 1 = 1. We refer to this w also as the stationary distribution vector, though
observe that it may not be a positive vector.
The irreducible stochastic matrices are, in some sense, fundamental to the study
of spectral properties of all irreducible nonnegative matrices, as the following result
makes clear.
on and so the sequence of pages visited is a realisation of the Markov chain with
transition matrix T. If T is primitive, then each entry of its stationary distribution
vector describes the longterm probability that the surfer is in the corresponding
state. The interpretation then is that large entries in the stationary distribution
vector correspond to important web pages, in the sense that the internal structure of
the web directs the random surfer towards those web pages. The PageRank algorithm
then proposes to use the stationary distribution vector to rank the importance of the
web pages.
There are, however, a couple of implementational features that need to be ad-
dressed: i) many web pages, such as pictures, or pdf documents, do not contain any
outgoing links; ii) the directed graph for the world wide web is not even close to being
strongly connected (let alone primitive). To deal with these issues, two amendments
to the approach above are adopted.
i) For each vertex in the directed graph having outdegree 0, we let the corre-
sponding row of T be (1/n) 1^T, where n is the number of vertices in the directed graph.
For vertices having outdegree 1 or more, the corresponding row of T is constructed
as above. It is readily verified that the matrix T is stochastic. The random surfer
interpretation is now modified slightly: on a page having outgoing links, the surfer
clicks one of them at random, while if the surfer lands on a page with no outgoing
links, he/she jumps to a page on the web chosen uniformly at random.
ii) As noted above, T is stochastic, but because of the (very) disconnected nature of
the world wide web, T is likely to be reducible. To address that issue, we take a posi-
tive vector u ∈ R^n such that 1^T u = 1, and a scalar c ∈ (0, 1), and form the matrix
G = cT + (1 − c)1u^T. This matrix G is usually known as the Google matrix, and note
that G is positive and stochastic. The Markov chain associated with G still admits
an interpretation in terms of a random surfer: at each time step, with probability c
the surfer follows links at random as described by T, while with probability 1 − c, the
surfer selects a web page with probability determined by the corresponding entry in
u, and jumps to that page. The vector u is often known as the teleportation vector.
With these modifications in place, the stationary distribution of G is used to
rank the importance of web pages. This stationary distribution is known as the
PageRank vector. Google reports using the power method to approximate the sta-
tionary distribution vector of G; that is, they take an initial nonnegative vector v
whose entries sum to 1, and compute a number of terms in the sequence v^T G^k, k ∈ N.
Since G is primitive, we know that this sequence will converge to the stationary dis-
tribution vector, but a key practical question is how fast the convergence
is. Referring to the Jordan form for G, it is evident that the rate of convergence is
governed, asymptotically, by the eigenvalues of G. Fortunately, the following result
gives good information regarding the eigenvalues of G.
Theorem 4.6 Let M be a real square matrix of order n with eigenvalues λ1, . . . , λn,
and assume that λ1 is an algebraically simple eigenvalue, with corresponding right
eigenvector x. Let c be a real scalar and let v be a vector in R^n. Then the eigenvalues
of the matrix cM + (1 − c)xv^T are: cλ1 + (1 − c)v^T x, cλ2, . . . , cλn.
From Corollary 4.7, all of the non-Perron eigenvalues of the Google matrix have
moduli bounded above by c, and in practice it turns out that G has several eigenvalues
with modulus equal to c. Consequently, the asymptotic rate of convergence for the
power method applied to G is given by c. Interestingly enough, for the Google matrix
G it has been shown that a version of the divide and conquer technique inspired by
Theorem 4.4, used in conjunction with the power method (applied to the stochastic
complements), has an asymptotic rate of convergence that is at least as good as that
of the power method applied directly to G.
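The construction of G and the power method are easy to carry out on a toy example; in the Python/numpy sketch below the 4-page link structure, the uniform teleportation vector u and the value c = 0.85 are illustrative choices rather than data from the notes.

# Sketch: build a Google matrix G = c*T + (1-c)*1*u^T for a tiny web graph
# and approximate its stationary distribution by the power method.
import numpy as np

# L[i, j] = 1 if page i links to page j; page 4 has no outgoing links.
L = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
n = L.shape[0]

# Rows of T: uniform over out-links, or uniform over all pages if outdegree 0.
T = np.empty((n, n))
for i in range(n):
    d = L[i].sum()
    T[i] = L[i] / d if d > 0 else np.ones(n) / n

c = 0.85
u = np.ones(n) / n                          # teleportation vector
G = c * T + (1 - c) * np.outer(np.ones(n), u)

v = np.ones(n) / n                          # initial distribution
for _ in range(100):                        # power method: v^T G^k
    v = v @ G
print(v)                                    # approximate PageRank vector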
Let M = [mi,j]i,j=1,...,n denote the mean first passage matrix for the Markov chain.
Then (10) can be rewritten as

M = T(M − Mdg) + J,          (11)

where J is the n × n all ones matrix, and where for any n × n matrix A, we define
Adg as Adg = diag(a1,1, . . . , an,n).
Let w denote the stationary distribution vector for T, and note from (11) that
w^T M = w^T T(M − Mdg) + w^T J, which yields w^T Mdg = 1^T. Hence we find that for
each i = 1, . . . , n, mi,i = 1/wi. Observe that this gives us another interpretation of the
entries in the stationary distribution vector in terms of the mean first return times
for the Markov chain associated with T.
We would like to derive a formula for M in terms of T. To do this, we introduce
the notation W = diag(w). From (11), we find that (I − T)M = −T Mdg + J =
−T W^{-1} + J. Next, we consider the matrix I − T + 1w^T, and note that it is invertible.
Since (I − T + 1w^T)M = −T W^{-1} + 1(1^T + w^T M), it follows that M = (I − T +
1w^T)^{-1}(−T W^{-1} + 1(1^T + w^T M)) = (I − (I − T + 1w^T)^{-1})W^{-1} + 1w^T M. We already know
that the diagonal of M is given by W^{-1}, so we find that in fact

M = (I − (I − T + 1w^T)^{-1} + J((I − T + 1w^T)^{-1})dg) W^{-1}.          (12)

The matrix (I − T + 1w^T)^{-1} is known as the fundamental matrix for the Markov
chain.
Suppose that we have an irreducible stochastic matrix T of order n with stationary
distribution vector w and mean first passage matrix M. Referring to (12), we see that

M w = (1^T ((I − T + 1w^T)^{-1})dg 1) 1.

Consequently, we see that for each index i between 1 and n, the expression Σ_{j=1}^{n} mi,j wj
is independent of i and given by the quantity 1^T ((I − T + 1w^T)^{-1})dg 1. This quantity
is known as the Kemeny constant for the Markov chain, and can be thought of as
the expected first passage time from a given state i to a randomly chosen destination
state (where the probability of selecting a particular destination state is given by the
corresponding entry in the stationary distribution vector).
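Following the fundamental matrix formula derived above, the mean first passage matrix and the Kemeny constant can be computed as in the Python/numpy sketch below (the irreducible stochastic matrix T is an arbitrary illustrative choice).

# Sketch: mean first passage matrix via the fundamental matrix, and the
# Kemeny constant, for an illustrative irreducible stochastic matrix T.
import numpy as np

T = np.array([[0.1, 0.4, 0.3, 0.2],
              [0.5, 0.1, 0.2, 0.2],
              [0.2, 0.2, 0.3, 0.3],
              [0.3, 0.3, 0.2, 0.2]])
n = T.shape[0]

# Stationary distribution w (left eigenvector of T for the eigenvalue 1).
eigvals, eigvecs = np.linalg.eig(T.T)
w = np.abs(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real)
w = w / w.sum()

one = np.ones(n)
Q = np.linalg.inv(np.eye(n) - T + np.outer(one, w))   # fundamental matrix
W_inv = np.diag(1.0 / w)
J = np.outer(one, one)

M = (np.eye(n) - Q + J @ np.diag(np.diag(Q))) @ W_inv  # mean first passage matrix
kemeny = np.trace(Q)                                    # = 1^T (Q)_dg 1

print(np.round(M, 3))
print(kemeny, (M @ w)[0])   # each entry of M @ w equals the Kemeny constant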
Example 4.8 Suppose that a ∈ (0, 1), and let

      [ 0  0  0  a  1−a ]
      [ 1  0  0  0  0   ]
T =   [ 0  1  0  0  0   ] .
      [ 0  0  1  0  0   ]
      [ 0  0  0  1  0   ]

The mean first passage matrix for the Markov chain is then given by

      [ 5−a  4−a  3−a  2−a  (1+3a)/(1−a) ]
      [ 1    5−a  4−a  3−a  (2+2a)/(1−a) ]
M =   [ 2    1    5−a  4−a  (3+a)/(1−a)  ] .
      [ 3    2    1    5−a  4/(1−a)      ]
      [ 4    3    2    1    (5−a)/(1−a)  ]

The Kemeny constant here is equal to (15 − a)/(5 − a).
5 Sample exercises
1. Suppose that A is an n × n matrix that is singular. The index of A is defined as
being the smallest k ∈ N such that rank(A^{k+1}) = rank(A^k). Prove that the index of
A is the same as the order of the largest Jordan block associated with the eigenvalue
0 for A.
2. Suppose that A is an irreducible nonnegative matrix of order n with row sums
r1, . . . , rn, and let r = (r1, . . . , rn)^T. Prove that the following are equivalent:
i) ρ(A) = max{rj |j = 1, . . . , n}; ii) ρ(A) = min{rj |j = 1, . . . , n}; iii) r is a scalar
multiple of 1.
3. Suppose that A is a k × k (0, 1) matrix. Prove that the matrix AA^T is prim-
itive if and only if, for any pair of indices i, j between 1 and k, there are indices
i ≡ i0, i1, i2, . . . , ir−1, ir ≡ j and indices l0, l1, . . . , lr−1 such that aip,lp = 1 for each
p = 0, 1, . . . , r − 1, and aip+1,lp = 1 for each p = 0, 1, . . . , r − 1.
4. Suppose that k ∈ N. What is the smallest M ∈ N such that every integer greater
than or equal to M can be written in the form ak + b(k + 1) for some pair of nonneg-
ative integers a and b?
References
[1] A. Berman and R. Plemmons. Nonnegative Matrices in the Mathematical Sci-
ences. SIAM, Philadelphia, 1994.
[2] R. Brualdi and H. Ryser. Combinatorial Matrix Theory. Cambridge University
Press, Cambridge, 1991.
[3] H. Caswell. Matrix Population Models, 2nd edition. Sinauer, Sunderland, 2001.
[4] I. Herstein. Topics in Algebra, 2nd edition. Wiley, New York, 1975.
[5] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, Cam-
bridge, 1985.
[6] A. Langville and C. Meyer. Google's PageRank and Beyond: The Science of
Search Engine Rankings. Princeton University Press, Princeton, 2006.
[7] J. Marsden. Elementary Classical Analysis. W.H. Freeman, San Francisco, 1974.