An Introduction To Number Theory
J. J. P. Veerman
March 16, 2022
© 2022 J. J. P. Veerman
Adapt—remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
No additional restrictions—You may not apply legal terms or technological measures that legally
restrict others from doing anything the license permits.
1 Eratosthenes’ sieve up to n = 30. All multiples of a less than √31
are cancelled. The remainder are the primes less than n = 31. 4
2 f is a minimal polynomial for the irrational number r. By
minimality f 0 (p/q) is not zero. On the interval (r − 1, r), the
absolute value of the derivative of f attains its maximum at t. 9
3 A directed path γ passing through all points of Z². 13
4 A rectangle of 30 by 12 squares can be subdivided into squares
no larger than 6 by 6. 19
5 Two meshing gear wheels have 30, resp. 12 teeth. Each tiny
square represents the turning of one tooth in each wheel. After
precisely 2 turns of the first wheel and 5 of the second, both are
back in the exact same position. 20
6 The division algorithm: for any two integers r1 and r2 , we can find
an integer q and a real e ∈ [0, 1) so that r1 /r2 = q2 + e. 22
7 On the left, the function $\int_2^x \frac{dt}{\ln t}$ in blue, π(x) in red, and
x/ln x in green. On the right, we have $\int_2^x \frac{dt}{\ln t} - x/\ln x$ in blue,
4 List of Figures
9 Proof that $\sum_{n=1}^{\infty} f(n)$ (shaded in blue and green) minus f(1)
(shaded in blue) is less than $\int_1^{\infty} f(x)\,dx$ if f is positive and (strictly)
decreasing to 0. 40
10 The origin is marked by “×”. The red dots are visible from ×;
between any blue dot and × there is a red dot. The picture shows
exactly one quarter of $\{-4, \dots, 4\}^2 \setminus \{(0, 0)\} \subset Z^2$. 40
26 $ABCa_{r+s}$ is the sum of the $Bb_i Cc_j$ along the green line in the i − j
diagram. The red lines indicate where $p \nmid Bb_i$ and $p \nmid Cc_j$. So all
contributions except $Bb_r Cc_s$ are divisible by p. Thus $p \nmid ABCa_{r+s}$. 127
27 Intuitively we wrap R around a circle of length 1, so that points
that differ by an integer land on the same point. 131
28 The Gaussian integers are the lattice points in the complex plane;
both real and imaginary parts are integers. For an arbitrary point
z ∈ C — marked by x in the figure, a nearby integer is k1 + ik2
where k1 is the closest integer to Re (z) and k2 the closest integer
to Im (z). In this case that is 2 + 3i. 157
29 A depiction of Z[√−6] in the complex plane; real parts are
integers and imaginary parts are multiples of √6. 159
30 Left, the elements of the ring Z[√−3]. Right, the ring
Z[½(1 + √−3)]. The units of each ring are indicated in green and
the ideals ⟨2, 1 + √−3⟩ on the left and ⟨2⟩ on the right are indicated
in red. Fundamental domains (Definition 8.17) are shaded in blue. 161
31 Left, the fundamental domain of Z[√−3]. Here, h = i√3. Right,
one of the 2 isosceles triangles that constitute the fundamental
domain of Z[½(1 + √−3)]. Its height d equals ½√3. The point
that maximizes the distance to the closest of the 3 corner points
lies on the bisector of the top angle at height y. 162
32 Points in the area red shaded are a distance less than from√an integer
in Z. The blue area maps into the red under
√ x → 2x − 19/4
indicated by the arrow. We note that √19/4 ≈ 1.09 and
√3/2 ≈ 0.87. 165
33 Possible values of ργ⁻¹ in the proof of Proposition 8.16. 168
34 The Gaussian primes described in Proposition 8.30. There are
approximately 950 within a radius 40 of the origin (left figure) and
about 3300 within a radius 80 (right figure). 170
62 The functions θ (x)/x (green), ψ(x)/x (red), and π(x) ln x/x (blue)
for x ∈ [1, 1000]. All converge to 1 as x tends to infinity. The x-axis
is horizontal. 257
63 Plot of the function $f(n) := (\mathrm{lcm}(1, 2, \dots, n))^{1/n}$ for n in
{1, · · · , 100} (left) and in $\{10^4, \dots, 10^5\}$ (right). The function
converges to e, indicated in the plots by a line. 259
12 Contents
Bibliography 313
Index 317
Part 1
Introduction to Number
Theory
Chapter 1
4 1. A Quick Tour of Number Theory
2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
1 In a more general context (see Chapter 8) these are called irreducible numbers, while the term
prime is reserved for numbers satisfying Corollary 2.9.
1.1. Divisors and Congruences 5
Note that any non-empty set S of integers with a lower bound can be
transformed by addition of an integer b ∈ N0 into a non-empty set S + b in N0 .
Then S + b has a smallest element, and therefore so does S. Furthermore, a
non-empty set S of integers with an upper bound can also be transformed into
a non-empty set −S + b in N0 . Here, −S stands for the collection of elements
of S multiplied by −1. Thus we have the following corollary of the
well-ordering principle.
Corollary 1.10. Let S be a non-empty set in Z with a lower (upper) bound.
Then S has a smallest (largest) element.
Definition 1.11. i) An element x ∈ R is called an integer if it is a root of a
degree 1 polynomial with leading coefficient 1, that is, if x − p = 0.
ii) An element x ∈ R is called rational if it is a root of a degree 1 polynomial,
that is: qx − p = 0 where p and q ≠ 0 are integers.
iii) Otherwise it is called an irrational number.
The set of integers is denoted by Z, and the rational numbers are denoted
by Q. The usual way of expressing a rational number is that it can
be written as p/q. The advantage of expressing a rational number as the solution
of a degree 1 polynomial, however, is that it naturally paves the way to
Definitions 1.15 and 1.16.
Theorem 1.12. Any interval in R contains an element of Q. We say that Q
is dense in R.
The crux of the following proof is that we take an interval and scale it
up until we know there is an integer in it, and then scale it back down.
1.2. Rational and Irrational Numbers 7
Proof. Let I = (a, b) with b > a be any interval in R. From Corollary 1.10 we
see that there is an n such that n > 1/(b − a). Indeed, if that weren’t the case, then
N would be bounded from above, and thus it would have a largest element
n0 . But if n0 ∈ N, then so is n0 + 1. This gives a contradiction and so the
above inequality must hold.
It follows that nb − na > 1. Thus the interval (na, nb) contains an in-
teger, say, p. So we have that na < p < nb. The theorem follows upon
dividing by n.
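The scaling argument in this proof can be traced in a few lines of Python. This is only a sketch: the function name `rational_between` is hypothetical, and the endpoints are assumed to be rational (given as `Fraction` objects) so that the arithmetic stays exact, whereas the theorem itself covers arbitrary real endpoints.

```python
from fractions import Fraction
from math import floor


def rational_between(a: Fraction, b: Fraction) -> Fraction:
    """Return a rational p/n with a < p/n < b (hypothetical helper)."""
    if not a < b:
        raise ValueError("need a < b")
    n = floor(1 / (b - a)) + 1  # n > 1/(b - a), hence n*b - n*a > 1
    p = floor(n * a) + 1        # smallest integer strictly greater than n*a
    return Fraction(p, n)       # n*a < p < n*b, so a < p/n < b


x = rational_between(Fraction(1, 3), Fraction(1, 2))
print(x)  # 3/7
```

Here n = 7 scales the interval (1/3, 1/2) up to (7/3, 7/2), which contains the integer 3; scaling back down gives 3/7.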
Theorem 1.13. √2 is irrational.
Proof. Suppose √2 can be expressed as the quotient of integers r/s. We may
assume that gcd(r, s) = 1 (otherwise just divide out the common factor).
After squaring, we get
2s² = r².
The left-hand side is even, therefore the right-hand side is even. But the
square of an odd number is odd, so r is even. But then r² is a multiple of 4.
Thus s² = r²/2 is even, and so s must be even. This contradicts the assumption that gcd(r, s) = 1.
The transcendental numbers are even harder to pin down than the gen-
eral irrational numbers. We do know that e and π are transcendental, but the
proofs are considerably more difficult (see [26]). We’ll see below that the
transcendental numbers are far more abundant than the rationals or the alge-
braic numbers. In spite of this, they are harder to analyze and, in fact, even
hard to find. This paradoxical situation where the most prevalent numbers
are hardest to find, is actually pretty common in number theory.
The most accessible tool to construct transcendental numbers is Liouville’s
Theorem. The setting is the following. Given an algebraic number
y, it is the root of a polynomial with integer coefficients $f(x) = \sum_{i=0}^{d} a_i x^i$,
where we always assume that the coefficient $a_d$ of the highest power is
non-zero. That highest power is called the degree of the polynomial and is
denoted by deg(f). Note that we can always find a polynomial of higher
degree that has y as a root. Namely, multiply f by any other polynomial g.
Definition 1.17. We say that $f(x) = \sum_{i=0}^{d} a_i x^i$ in Z[x] is a minimal polynomial
in Z[x] for ρ if f is a non-zero polynomial in Z[x] of minimal degree, say d,
such that f(ρ) = 0. We say that the degree of ρ is d.
[Figure 2: a number line marking r − 1, r, p/q, r + 1, and the point t.]
So countable sets are the smallest infinite sets in the sense that there are
no infinite sets that contain no countable set. But there certainly are larger
sets, as we will see next.
Theorem 1.23. The set R is uncountable.
t(1) = 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 · · ·
t(2) = 2, 0, 2, 0, 2, 0, 2, 0, 2, 2, 2 · · ·
t(3) = 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2 · · ·
t(4) = 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0 · · ·
t(5) = 0, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0 · · ·
t(6) = 2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 2 · · ·
⋮
Construct t* as follows: for every n, its nth digit differs from the nth digit
of t(n). In the above example, t* = 2, 2, 2, 0, 2, 0, · · · . But now we have
a contradiction, because the element t* cannot occur in the list. In other
words, there is no surjection from N to T. Hence there is no bijection
between N and T.
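The diagonal step can be illustrated on the six listed rows. The sketch below assumes each sequence is truncated to its first six digits, and the name `diagonal` is hypothetical; it flips the nth digit of the nth row between 0 and 2.

```python
def diagonal(rows):
    """Return digits that differ from rows[n] at position n (digits are 0 or 2)."""
    return [2 - row[n] for n, row in enumerate(rows)]


# First six digits of t(1), ..., t(6) from the list above.
rows = [
    [0, 0, 0, 0, 0, 0],
    [2, 0, 2, 0, 2, 0],
    [0, 0, 0, 2, 2, 2],
    [2, 2, 2, 2, 2, 2],
    [0, 0, 0, 2, 0, 0],
    [2, 0, 0, 0, 0, 2],
]
t_star = diagonal(rows)
print(t_star)  # [2, 2, 2, 0, 2, 0]
```

Whatever the list is, the constructed sequence disagrees with row n in position n, so it cannot equal any row.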
The second step is to show that there is a subset K of R such that there
is no surjection (and thus no bijection) from N to K. Let t be a sequence
with digits $t_i$. Define f : T → R as follows:
$f(t) = \sum_{i=1}^{\infty} t_i 3^{-i}$.
If s and t are two distinct sequences in T, then for some k they share the
first k − 1 digits but (after swapping s and t if necessary) $t_k = 2$ and $s_k = 0$. So
$f(t) - f(s) = 2 \cdot 3^{-k} + \sum_{i=k+1}^{\infty} (t_i - s_i) 3^{-i} \ge 2 \cdot 3^{-k} - 2 \sum_{i=k+1}^{\infty} 3^{-i} = 3^{-k}$.
Thus f is injective. Therefore f is a bijection between T and the subset
K = f(T) of R. If there is a surjection g from N to K = f(T), then
$N \xrightarrow{\;g\;} K \xleftarrow{\;f\;} T$.
And so $f^{-1} g$ is a surjection from N to T. By the first step, this is impossible.
Therefore, there is no surjection g from N to K, much less from N to R.
The crucial part here is the diagonal step, where an element is constructed
that cannot be in the list. This really means the set T is strictly
larger than N. The rest of the proof seems an afterthought, and perhaps
needlessly complicated. You might think that it is much more straightforward
to just use the digits 0 and 1 and the representation of the real numbers
in base 2, as opposed to the digits 0 and 2 and base 3. But if you
do that, you run into a problem that has to be dealt with. The sequence t*
might end with an infinite all-ones subsequence such as t* = 1, 1, 1, 1, · · · .
This corresponds to the real number x = 1.0... which might be in the list.
Circumventing that problem leads to slightly more complicated proofs (see
exercise 1.10).
Meanwhile, this gives us a very nice corollary which we will have
occasion to use in later chapters. For b an integer greater than 1, denote
by $\{0, 1, 2, \dots, b-1\}^N$ the set of sequences $a_1 a_2 a_3 \cdots$ where each $a_i$ is in
{0, 1, 2, · · · , b − 1}. Such sequences are often called words.
Corollary 1.24. (i) The set of infinite sequences in $\{0, 1, 2, \dots, b-1\}^N$ is
uncountable. (ii) The set of finite sequences (but without bound) in
$\{0, 1, 2, \dots, b-1\}^N$ is countable.
Proof. The proof of (i) is the same as the proof that T is uncountable in the
proof of Theorem 1.23. The proof of (ii) consists of writing first all b words
of length 1, then all b² words of length 2, and so forth. Every finite string
will occur in the list.
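The listing used for part (ii) is easy to write down explicitly. The following sketch (the generator name `words` is hypothetical) lists all words over {0, ..., b − 1} by increasing length, so every finite word receives a finite position in the list.

```python
from itertools import count, islice, product


def words(b):
    """Yield all finite words over {0, ..., b-1}: length 1 first, then length 2, ..."""
    for length in count(1):
        for w in product(range(b), repeat=length):
            yield w


first_six = list(islice(words(2), 6))
print(first_six)  # [(0,), (1,), (0, 0), (0, 1), (1, 0), (1, 1)]
```

For b = 2 the word 101 appears after the 2 words of length 1, the 4 words of length 2, and five words of length 3, that is, at (0-based) position 11.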
Theorem 1.25. (i) The set Z² is countable. (ii) Q is countable.
1.4. Countable and Uncountable Sets 13
Proof. (i) The proof relies on Figure 3. In it, a directed path γ is traced
out that passes through all points of Z². Imagine that you start at (0, 0) and
travel along γ with unit speed. Keep a counter c ∈ N that marks the point
(0, 0) with a “1”. Increase the value of the counter by 1 whenever you hit a point
of Z². This establishes a bijection between N and Z².
(ii) Again travel along γ with unit speed. Keep a counter c ∈ N that
marks the point (0, 1) with a “1”. Increase the value of the counter by 1. Continue
to travel along the path until you hit the next point (p, q) that is not
a multiple of any previously marked point and such that q is not zero. Mark that point with
the value of the counter. Q contains N and so is infinite. Identifying each
marked point (p, q) with the rational number p/q establishes the countability
of Q.
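The counting procedure of this proof can be sketched in Python. The exact shape of the path γ in Figure 3 is not reproduced here; the code assumes a square spiral starting at (0, 0), which is one possible path through all of Z², and the names `spiral` and `rationals` are hypothetical.

```python
from fractions import Fraction
from itertools import islice


def spiral():
    """Yield every point of Z^2 exactly once along a square spiral from (0, 0)."""
    p = q = 0
    yield (p, q)
    step = 1
    while True:
        # right `step`, up `step`, left `step + 1`, down `step + 1`
        for dp, dq, n in ((1, 0, step), (0, 1, step),
                          (-1, 0, step + 1), (0, -1, step + 1)):
            for _ in range(n):
                p, q = p + dp, q + dq
                yield (p, q)
        step += 2


def rationals():
    """Yield each rational p/q exactly once, in the order met along the spiral."""
    seen = set()
    for p, q in spiral():
        if q == 0:
            continue                # (p, 0) does not represent a rational
        r = Fraction(p, q)          # multiples of an earlier point give the same r
        if r not in seen:
            seen.add(r)
            yield r


print(list(islice(rationals(), 5)))
```

Because the spiral reaches every lattice point after finitely many steps, every rational is eventually assigned a counter value, and no rational is counted twice.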
Notice that this argument really tells us that the product (Z × Z) of a
countable set (Z) and another countable set is still countable. The same
holds for any finite product of countable sets. Since an uncountable set is
strictly larger than a countable one, intuitively this means that an uncountable
set must be a lot larger than a countable set. In fact, an extension of the
above argument shows that the set of algebraic numbers is countable
(see exercises 1.9 and 1.26). And thus, in a sense, it forms a small subset
of all reals. All the more remarkable, that almost all reals that we know
1.5. Exercises
Exercise 1.1. Apply Eratosthenes’ Sieve to get all prime numbers between
1 and 200. (Hint: you should get 25 primes less than 100, and 21 between
100 and 200.)
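For checking a hand computation, the sieve can be sketched as follows. The function name `sieve` is hypothetical, and starting the crossing-out at i·i is a standard shortcut rather than something stated in the exercise.

```python
def sieve(n):
    """Return the list of primes <= n using Eratosthenes' sieve."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # cancel all multiples of i, starting at i*i
            for m in range(i * i, n + 1, i):
                is_prime[m] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]


primes = sieve(200)
print(len([p for p in primes if p < 100]))        # 25
print(len([p for p in primes if 100 < p < 200]))  # 21
```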
Exercise 1.2. Factor the following into prime numbers (write as a product
of primes).
393, 16000, 5041, 1111, 1763, 720.
Exercise 1.3. Find pairs of primes that differ by 2. These are called twin
primes. Are there infinitely many such pairs? (Hint: This is an open prob-
lem; the affirmative answer is called the twin prime conjecture.)
Conjecture 1.28 (Twin Prime Conjecture). There are infinitely many twin
prime pairs2.
Exercise 1.4. Show that small enough even integers greater than 3 can be
written as the sum of two primes. Is this always true? (Hint: This is an
open problem; the affirmative answer is called the Goldbach conjecture.)
Exercise 1.10. What is wrong in the following attempt to prove that [0, 1]
is uncountable?
Assume that [0, 1] is countable, that is: there is a bijection f between [0, 1]
and N. Let r(n) be the unique number in [0, 1] assigned to n. Thus the
infinite array (r(1), r(2), · · · ) forms an exhaustive list of the numbers in
[0, 1], as follows:
r(1) = 0.00000000000 · · ·
r(2) = 0.10101010111 · · ·
r(3) = 0.00011111111 · · ·
r(4) = 0.11111100000 · · ·
r(5) = 0.00010010010 · · ·
r(6) = 0.10000100011 · · ·
⋮
(Written as numbers in base 2.) Construct r* as the string whose nth
digit differs from that of r(n). Thus in this example:
r* = 0.111010 · · · ,
which is different from all the other listed binary numbers in [0, 1].
(Hint: what if r* ends with an infinite all-ones subsequence?)
Exercise 1.11. The set f (T ) in the proof of Theorem 1.23 is called the
middle third Cantor set. Find its construction. What does it look like?
(Hint: locate the set of numbers whose first digit (base 3) is a 1; then
the set of numbers whose second digit is a 1.)
Exercise 1.12. The integers exhibit many, many other intriguing patterns.
Given the following function:
n even: f(n) = n/2,
n odd: f(n) = (3n + 1)/2.
a) (Periodic orbit) Show that f sends 1 to 2 and 2 to 1.
b) (Periodic orbit attracts) Show that if you start with a small positive inte-
ger and apply f repeatedly, eventually you fall on the orbit in (a).
c) Show that this is true for all positive integers.
(Hint: This is an open problem; the affirmative answer is called the Collatz
conjecture.)
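Parts (a) and (b) can be explored numerically. The sketch below (the names `f` and `steps_to_one` and the `limit` safeguard are illustrative choices) iterates the map until it reaches 1; a finite check of this kind says nothing about part (c), which remains open.

```python
def f(n):
    """The map of Exercise 1.12: n/2 for even n, (3n + 1)/2 for odd n."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def steps_to_one(n, limit=10_000):
    """Count applications of f needed to reach 1; give up after `limit` steps."""
    steps = 0
    while n != 1:
        n = f(n)
        steps += 1
        if steps > limit:
            raise RuntimeError("limit exceeded")
    return steps


print(f(1), f(2))        # 2 1
print(steps_to_one(3))   # 5, via 3 -> 5 -> 8 -> 4 -> 2 -> 1
```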
Exercise 1.14. This exercise prepares for Mersenne and Fermat primes,
see Definition 5.13.
a) Use $\sum_{i=0}^{a-1} 2^{ib} = \frac{2^{ab}-1}{2^{b}-1}$ to show that if $2^p - 1$ is prime, then p must be
prime.
b) Use $\sum_{i=0}^{a-1} (-2^b)^i = \frac{(-2^b)^a-1}{(-2^b)-1}$ to show that if $2^p + 1$ is prime, then p has
no odd factor greater than 1. (Hint: assume a is odd.)
Exercise 1.15. In what follows, we assume that $e - 1 = \sum_{i=1}^{\infty} \frac{1}{i!} = \frac{p}{q}$ is
rational and show that this leads to a contradiction.
a) Show that the above assumption implies that
$\sum_{i=1}^{q} \frac{q!}{i!} + \sum_{i=1}^{\infty} \frac{q!}{(q+i)!} = p\,(q-1)!$.
(Hint: multiply both sides by q!.)
b) Show that $\sum_{i=1}^{\infty} \frac{q!}{(q+i)!} < \sum_{i=1}^{\infty} \frac{1}{(q+1)^i}$. (Hint: write out a few terms of the
sum on the left.)
c) Show that the sum on the left hand side in (b) cannot have an integer
value.
d) Show that the other two terms in (a) have an integer value.
e) Conclude there is a contradiction unless the assumption that e is rational
is false.
Exercise 1.16. Show that Liouville’s theorem (Theorem 1.18) also holds
for rational numbers ρ = r/s as long as p/q ≠ r/s.
Exercise 1.17. a) Show that for all positive integers p and n, we have
p(n + 1)n! ≤ (n + p)! .
b) Use (a) to show that
$\sum_{k=n+1}^{\infty} 10^{-k!} \le \sum_{p=1}^{\infty} 10^{-p(n+1)n!} = 10^{-(n+1)n!} \left(1 - 10^{-(n+1)n!}\right)^{-1}$.
c) Show that b) implies the affirmation after equation (1.1).
Exercise 1.19. Show that the inequality of Roth’s theorem does not hold
for all numbers. (Hint: Let ρ be a Liouville number.)
Definition 1.30. Let A be a set. Its power set P(A) is the set whose elements
are the subsets of A. This always includes the empty set, denoted by ∅.
In the next two exercises, the aim is to show something that is obvious
for finite sets, namely:
Theorem 1.31. The cardinality of a power set is always (strictly) greater
than that of the set itself.
Exercise 1.21. Let A be an arbitrary set. Assume that there is a surjection
S : A → P(A) and define
R = {a ∈ A | a ∉ S(a)} . (1.2)
a) Show that there is a q ∈ A such that S(q) = R.
b) Show that if q ∈ R, then q ∉ R. (Hint: equation (1.2).)
c) Show that if q ∉ R, then q ∈ R. (Hint: equation (1.2).)
d) Use (b) and (c) and exercise 1.20, to establish that |A| < |P(A)|. (Hint:
see Definition 1.26.)
In the next two exercises we show that the cardinality of R equals that
of P(N). This implies that |R| > |N|, which also follows from Theorem
1.23.
Exercise 1.22. Let T be the set of sequences defined in the proof of Theo-
rem 1.23. To a sequence t ∈ T , associate a set S(t) in P(N) as follows:
i ∈ S if t(i) = 2 and i ∉ S if t(i) = 0 .
a) Show that there is a bijection S : T → P(N).
b) Use the bijection f in the proof of Theorem 1.23 to show there is a bi-
jection K → P(N).
c) Show that (a) and (b) imply that |P(N)| = |K| = |T |. (Hint: see Defini-
tion 1.26.)
d) Find an injection K → R and conclude that |P(N)| ≤ |R|.
Exercise 1.24. Show that having the same cardinality (see Definition 1.26)
is an equivalence relation on sets.
Exercise 1.25. a) Fix some n > 0. Show that having the same remainder
modulo n is an equivalence relation on Z. (Hint: for example, -8, 4, and
16 have remainder 4 modulo 12.)
b) Show that addition respects this equivalence relation. (Hint: If a + b = c,
a ∼ a′, and b ∼ b′, then a′ + b′ = c′ with c ∼ c′.)
c) The same question for multiplication.
Exercise 1.28. Suppose two meshing gear wheels have n and m teeth,
respectively. Each wheel has one marked tooth.
a) Show that the positions of the wheels after ℓ teeth are traversed is
indicated by the projection of the point (ℓ, ℓ) on both axes in a rectangular
coordinate system with n by m units. See Figure 5. (Hint: each small square
corresponds to the turn through one tooth on both wheels. Show that the
first time the marked teeth return exactly to their original position occurs
when the first wheel has made lcm(n, m)/n = m/gcd(n, m) complete turns
and the second lcm(n, m)/m = n/gcd(n, m).)
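The claim in the hint can be checked by brute force. The sketch below (the function name `turns_until_reset` is hypothetical) advances both wheels one tooth at a time until both marked teeth are back in their starting positions, and compares the result with m/gcd(n, m) and n/gcd(n, m).

```python
from math import gcd


def turns_until_reset(n, m):
    """Advance tooth by tooth; return (turns of n-wheel, turns of m-wheel) at first reset."""
    if n <= 0 or m <= 0:
        raise ValueError("tooth counts must be positive")
    teeth = 0
    while True:
        teeth += 1
        if teeth % n == 0 and teeth % m == 0:  # both marked teeth are back
            return teeth // n, teeth // m


n, m = 30, 12
print(turns_until_reset(n, m))          # (2, 5)
print((m // gcd(n, m), n // gcd(n, m)))  # (2, 5)
```

For wheels with 30 and 12 teeth, the first reset happens after lcm(30, 12) = 60 teeth: two turns of the 30-tooth wheel and five of the 12-tooth wheel.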
Figure 5. Two meshing gear wheels have 30, resp. 12 teeth. Each tiny
square represents the turning of one tooth in each wheel. After precisely
2 turns of the first wheel and 5 of the second, both are back in the exact
same position.
Chapter 2
22 2. The Fundamental Theorem of Arithmetic
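[Figure 6: a number line marking 0, the integer q, and the point r1/r2, which lies a distance e to the right of q.]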
S = {ax + by : x, y ∈ Z, ax + by ≠ 0}
ν(S) = {|s| : s ∈ S} ⊆ N ∪ {0}
Then ν(S) ≠ ∅ (it contains |a| and |b|) and is bounded from below. Thus by
the well-ordering principle of N, it has a smallest element n. Then there is
an element d ∈ S that has that norm: |d| = n.
For that d, we use the division algorithm to establish that there are q
and r ≥ 0 such that
a = dq + r and |r| < |d| . (2.1)
Now substitute d = ax+by. A short computation shows that r can be rewrit-
ten as:
r = a(1 − qx) + b(−qy) .
Suppose r 6= 0. Then this shows that r ∈ S. But we also know from (2.1)
that |r| is smaller than |d|. This is a contradiction because of the way d
is defined. But r = 0 implies that d is a divisor of a. The same argument
shows that d is also a divisor of b. Thus d is a common divisor of both a
and b.
Now let e be any divisor of both a and b. Then e | (ax + by), and so
e | d. But if e | d, then |e| must be smaller than or equal to |d|. Therefore, d
is the greatest common divisor of both a and b.
By multiplying x and y by f, we see that any multiple f d of d can be
written as
a f x + b f y = f d.
On the other hand, let d be as defined above and suppose that x, y, and c are
such that
ax + by = c .
Since d divides a and b, we must have that d | c, and thus c must be a
multiple of d.
Euclid’s lemma is so often used, that it will pay off to have a few of the
standard consequences for future reference.
Theorem 2.7 (Cancellation Theorem). Let gcd(a, b) = 1 and b positive.
Then $ax =_b ay$ if and only if $x =_b y$.
Proof. The statement is trivially true if b = 1, because all integers are equal
modulo 1.
If $ax =_b ay$, then $a(x - y) =_b 0$. The latter is equivalent to b | a(x − y).
The conclusion follows from Euclid’s Lemma. Vice versa, if $x =_b y$, then
(x − y) is a multiple of b and so a(x − y) is a multiple of b.
Proof. Corollary 2.9 says that if p and all $q_i$ are primes, then there is j ≤ n
such that $p \mid q_j$. Since $q_j$ is prime, its only divisors are 1 and itself. Since
p ≠ 1 (by the definition of prime), $p = q_j$.
ii) that product is unique (up to the order of multiplication and up to multi-
plication by the units).
Remark 2.12. The theorem is also called the unique factorization theorem.
Its statement means that up to re-ordering of the $p_i$ and factors ±1, every
integer n can be uniquely expressed as
$n = \pm 1 \cdot \prod_{i=1}^{r} p_i^{\ell_i}$,
where the $p_i$ are distinct primes.
Proof. First we prove (i). Define S to be the set of integers n that are not
products of primes times a unit, and the set ν(S) their absolute values. If
the set S is non-empty, then by the well-ordering principle (Theorem 1.9),
ν(S) has a smallest element. Let a be one of the elements in S that minimize
ν(S).
If a is prime, then it can be factored into primes, namely a = a, which
contradicts the assumption. Thus a is a composite number, a = bc and both
b and c are non-units. Thus |b| and |c| are strictly smaller than |a|. By
assumption, both b and c are products of primes. Then, of course, so is
a = bc. But this contradicts the assumptions on a.
Next, we prove (ii). Let S be the set of integers that have more than
one factorization and ν(S) the set of their absolute values. If the set S is
non-empty, then, again by the well-ordering principle, ν(S) has a smallest
element. Let a be one of the elements in S that minimize ν(S).
Thus we have
$a = u \prod_{i=1}^{r} p_i = u' \prod_{i=1}^{s} p'_i$,
where at least some of the $p_i$ and $p'_i$ do not match up. Here, u and u′ are
units. Clearly, $p_1$ divides a. By Corollary 2.10, $p_1$ equals one of the $p'_i$, say,
$p'_1$. Since primes are not units, $|a/p_1|$ is strictly less than |a|. Therefore, by
hypothesis, $a/p_1$ is uniquely factorizable. But then the primes in
$\frac{a}{p_1} = u \prod_{i=2}^{r} p_i = u' \prod_{i=2}^{s} p'_i$
all match up (up to units), which contradicts the choice of a.
2.4. Corollaries of the Fundamental Theorem of Arithmetic 27
Remark 2.13. It is interesting to note that the proof of this theorem depends
on two distinct characterizations of primes. In part (i), we use Definition
1.4, which essentially says that primes are numbers that cannot be factored
into smaller numbers (the literal meaning of “irreducible”). But for part (ii),
we essentially use the fact that if a prime p divides ab, then it divides a or
b (or both). Now (through Corollary 2.10) we know both characterizations
hold in Z, but it will turn out that they are not equivalent in general (see
Proposition 8.3).
If the reader investigates the arguments carefully, it will become clear
that underneath it all lurks the division algorithm in Z. To wit, we use
Corollary 2.10, which uses Corollary 2.9, which uses Euclid’s lemma, which uses
Bézout, which finally uses the division algorithm. It is precisely this division
algorithm that is not available in all rings, and which plays an important role
in algebraic number theory (see Chapter 8).
Remark 2.14. The student might reflect on this and conclude that one cannot
write 1 as a product of primes. So how come in Theorem 2.11 we
do not make an exception for the number 1 (or −1 for that matter)? The
answer is this: 1 is a unit times “the empty product” of primes, and this is
unique. This piece of apparent sophistry actually turns out to be useful, as
we will see in Chapter 8 (Corollary 8.14).
Proof. The easiest way to see this uses prime power factorization. If
$\gcd(\prod_{i=1}^{n} a_i, b) = d > 1$, then d contains a factor p > 1 that is a prime.
Since p divides $\prod_{i=1}^{n} a_i$, at least one of the $a_i$ must contain (by Corollary
2.9) a factor p. Since p also divides b, this contradicts the assumption that
$\gcd(a_i, b) = 1$.
Vice versa, if $\gcd(a_i, b) = d > 1$ for some i, then also $\prod_{i=1}^{n} a_i$ is divisible
by d.
Corollary 2.16. For all a and b in Z not both equal to 0, we have that
gcd(a, b) · lcm (a, b) = ab up to units.
Proof. Given two numbers a and b, let $P = \{p_i\}_{i=1}^{s}$ be the list of all prime
numbers occurring in the unique factorization of a or b. We then have:
$a = u \prod_{i=1}^{s} p_i^{k_i}$ and $b = u' \prod_{i=1}^{s} p_i^{\ell_i}$,
where u and u′ are units and $k_i$ and $\ell_i$ are in N ∪ {0}. Now define:
$m_i = \min(k_i, \ell_i)$ and $M_i = \max(k_i, \ell_i)$,
and let the numbers m and M be given by
$m = \prod_{i=1}^{s} p_i^{m_i}$ and $M = \prod_{i=1}^{s} p_i^{M_i}$.
Since $m_i + M_i = k_i + \ell_i$, it is clear that the multiplication m · M yields ab (up to units).
Now all we need to do is show that m equals gcd(a, b) and that M
equals lcm(a, b). Clearly m divides both a and b. On the other hand, any
integer greater than m has a unique factorization that either contains a prime
not in the list P and therefore divides neither a nor b, or, if not, at least one
of the primes in P in its factorization has a power greater than $m_i$. In the last
case that integer is not a divisor of at least one of a and b. The proof that M equals
lcm(a, b) is similar.
A question one might ask is: how many primes are there? In other
words, how long can the list of primes in a factorization be? Euclid provided
the answer around 300BC.
Theorem 2.17 (Infinitude of Primes). There are infinitely many primes.
Proof. Suppose the list P of all primes is finite, so that $P = \{p_i\}_{i=1}^{n}$. Define
the integer d as the product of all primes (to the power 1):
$d = \prod_{i=1}^{n} p_i$.
2.5. The Riemann Hypothesis 29
Analytic functions are functions that are differentiable, that is to say, wher-
ever the derivative is non-zero, the derivative equals a scaling times a rota-
tion. Equivalently, they are locally given by a convergent power series. If f
and g are two analytic continuations to a region U of a function h given on
a region V ⊂ U, then the difference f − g is zero on V . One can then show
that the power series of f − g must be zero on the entire region U. Hence
f = g on U; that is, analytic continuations are unique.
Definition 2.19. The Riemann zeta function ζ (z) is a complex function de-
fined on {z ∈ C | Re z > 1} by
$\zeta(z) = \sum_{n=1}^{\infty} n^{-z}$.
On other values of z ∈ C it is defined by the analytic continuation of this
function (except at z = 1 where it has a simple pole).
There are two common proofs of this formula. It is worth presenting both.
proof 1. The first proof uses the Fundamental Theorem of Arithmetic. First,
we use the geometric series
$(1 - p^{-z})^{-1} = \sum_{k=0}^{\infty} p^{-kz}$
proof 2. The second proof, the one that Euler used, employs a sieve method.
This time, we start with the left-hand side of the Euler product. If we multiply
ζ by $2^{-z}$, we get back precisely the terms with n even. So
$(1 - 2^{-z})\,\zeta(z) = 1 + 3^{-z} + 5^{-z} + \cdots = \sum_{2 \nmid n} n^{-z}$.
The argument used in Eratosthenes’ sieve (Section 1.1) now serves to show
that in the right-hand side of the last equation all terms other than 1 disappear
as ℓ tends to infinity. Therefore, the left-hand side tends to 1, which
implies the proposition.
Figure 7. On the left, the function $\int_2^x \frac{dt}{\ln t}$ in blue, π(x) in red, and
The first estimate is the one we will prove directly in Chapter 12. It
turns out the second is equivalent to it (exercise 12.10). However, it is
this one that gives the better estimate of π(x). In Figure 7 on the left, we
plotted, for x ∈ [2, 1000], from top to bottom the functions $\int_2^x \frac{dt}{\ln t}$ in blue,
From this figure one may be tempted to conclude that $\int_2^x \frac{dt}{\ln t} - \pi(x)$ is
n is called the Skewes number. Not much is known about this number¹,
except that it is less than $10^{317}$.
Perhaps the most important open problem in all of mathematics is the
following. It concerns the analytic continuation of ζ (z) given above.
Conjecture 2.22 (Riemann Hypothesis). All non-real zeros of ζ(z) lie on
the line Re z = 1/2.
In his only paper on number theory [46], Riemann realized that the
hypothesis enabled him to describe detailed properties of the distribution
¹ In 2020.
2.6. Exercises
Exercise 2.1. Apply the division algorithm to the following number pairs.
(Hint: replace negative numbers by positive ones.)
a) 110 , 7.
b) 51 , −30.
c) −138 , 24.
d) 272 , 119.
e) 2378 , 1769.
f) 270 , 175560.
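For checking answers, here is a minimal sketch of one division step a = bq + r with 0 ≤ r < |b|, the normalization used in equation (2.1). The function name `division` is hypothetical, and the code follows the hint by dividing by |b| and then adjusting the sign of q.

```python
def division(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < |b|."""
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    q, r = divmod(a, abs(b))  # Python guarantees 0 <= r < |b| for a positive divisor
    if b < 0:
        q = -q                # a = |b|*q + r = b*(-q) + r
    return q, r


print(division(110, 7))    # (15, 5)
print(division(51, -30))   # (-1, 21)
print(division(-138, 24))  # (-6, 6)
```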
Exercise 2.2. In this exercise we will exhibit the division algorithm applied
to polynomials x + 1 and 3x³ + 2x + 1 with coefficients in Q, R, or C.
a) Apply long division to divide 3021 by 11. (Hint: 3021 = 11 · 275 − 4.)
b) Apply the exact same algorithm to divide 3x³ + 2x + 1 by x + 1. In this
algorithm, $x^k$ behaves as $10^k$ in (a). (Hint: at every step, cancel the highest
power of x.)
c) Verify that you obtain 3x³ + 2x + 1 = (x + 1)(3x² − 3x + 5) − 4.
d) Show that in general, if $p_1$ and $p_2$ are polynomials such that the degree
of $p_1$ is greater than or equal to the degree of $p_2$, then
$p_1 = q_2 p_2 + p_3$,
where the degree of $p_3$ is less than the degree of $p_2$. (Hint: perform long
division as in (b). Stop when the degree of the remainder is less than that
of $p_2$.)
e) Why does this division not work for polynomials with coefficients in Z?
(Hint: replace x + 1 by 2x + 1.)
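The long division of parts (b) to (d) can be sketched with exact rational coefficients. In this illustration a polynomial is represented as a list of coefficients, highest degree first; that representation and the name `poly_divmod` are assumptions made here, not notation from the text.

```python
from fractions import Fraction


def poly_divmod(num, den):
    """Long division of polynomials over Q; coefficients listed highest degree first."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    if not den or den[0] == 0:
        raise ValueError("divisor needs a non-zero leading coefficient")
    quotient = []
    rem = num[:]
    while len(rem) >= len(den):
        c = rem[0] / den[0]       # cancel the highest power of x
        quotient.append(c)
        for i, d in enumerate(den):
            rem[i] -= c * d
        rem.pop(0)                # the leading coefficient is now zero
    return quotient, rem


q, r = poly_divmod([3, 0, 2, 1], [1, 1])
print(q, r)  # quotient 3x^2 - 3x + 5, remainder -4
```

Dividing by 2x + 1 instead produces the leading quotient coefficient 3/2, which is not an integer; this is the obstruction part (e) asks about.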
2 This area of research, complex analysis methods to investigate properties of primes, is now called
analytic number theory. We take this up in Chapters 11 and 12.
Exercise 2.5. a) Use unique factorization to show that any composite number
n must have a prime factor less than or equal to √n.
b) Use that fact to prove: If we apply Eratosthenes’ sieve to {2, 3, · · · n}, it
is sufficient to sieve out multiples of numbers less than or equal to √n.
Exercise 2.7. It is possible to extend the definition of gcd and lcm to more
than two integers (not all of which are zero). For example gcd(24, 27, 54) =
3.
a) Compute gcd(6, 10, 15) and lcm (6, 10, 15).
b) Give an example of a triple whose gcd is one, but every pair of which
has a gcd greater than one.
c) Show that there is no triple {a, b, c} whose lcm equals abc, but every
pair of which has lcm less than the product of that pair. (Hint: consider
lcm (a, b) · c.)
Exercise 2.12. See exercise 2.11. Show that any number in E is a product
of primes in E. (Hint: follow the proof of Theorem 2.11, part (i).)
Exercise 2.13. See exercise 2.11 which shows that unique factorization
does not hold in E = {2, 4, 6, · · · }. The proof of unique factorization uses
Euclid’s lemma. In turn, Euclid’s lemma was a corollary of Bézout’s
lemma, which depends on the division algorithm. Where exactly does the
chain break down in this case?
Exercise 2.14. Let $L = \{p_1, p_2, \dots\}$ be the list of all (infinitely many)
primes, ordered according to ascending magnitude. Show that $p_{n+1} \le \prod_{i=1}^{n} p_i$.
(Hint: consider $d = \prod_{i=1}^{n} p_i$ and let $p_{n+1}$ be the smallest prime
divisor of d − 1. See the proof of Theorem 2.17.)
A much stronger version of exercise 2.14 is the so-called Bertrand’s Postulate.
That theorem says that for every n ≥ 1, there is a prime in {n + 1, · · · , 2n}.
It was proved by Chebyshev. Subsequently the proof was simplified
by Ramanujan and Erdős [2].
Exercise 2.15. Let p and q be primes greater than 3.
a) Show that $\mathrm{Res}_{12}(p) = r$ with r ∈ {1, 5, 7, 11}. (The same holds for q.)
b) Show that $24 \mid p^2 - q^2$. (Hint: use (a) to show that $p^2 = 24x + r^2$ and
check all cases.)
Exercise 2.16. A square full integer is an integer n that has a prime factor
and each prime factor occurs with a power at least 2. A square free integer
is an integer n such that each prime factor occurs with a power at most 1.
a) If n is square full, show that there are positive integers a and b such that
$n = a^2 b^3$.
b) Show that every integer greater than one is the product of a square free
number and a square full number.
Exercise 2.17. Let L = {p1 , p2 , · · · } be the list of all primes, ordered by
ascending magnitude. The numbers $E_n = 1 + \prod_{i=1}^{n} p_i$ are called
Euclid numbers.
a) Check the primality of $E_1$ through $E_6$.
b) Show that $E_n =_4 3$. (Hint: $E_n - 1$ is twice an odd number.)
c) Show that for n ≥ 3 the decimal representation of $E_n$ ends in a 1. (Hint:
look at the factors of $E_n$.)
Exercise 2.18. Twin primes are a pair of primes of the form p and p + 2.
a) Show that the product of two twin primes plus one is a square.
b) Show that if p > 3, the sum of the twin primes p and p + 2 is divisible by 12.
(Hint: see exercise 2.15.)
Exercise 2.19. Show that there are arbitrarily large gaps between successive
primes. More precisely, show that every integer in {n! + 2, n! + 3, · · · , n! + n}
is composite for any n ≥ 2.
The usual statement for the fundamental theorem of arithmetic includes
only natural numbers n ∈ N (i.e. not Z) and the common proof uses in-
duction on n. We review that proof in the next two problems.
Exercise 2.20. a) Prove that 2 can be written as a product of primes.
b) Let k > 2. Suppose all numbers in {1, 2, · · · k} can be written as a product
of primes (or 1). Show that k + 1 is either prime or composite.
c) If in (b), k +1 is prime, then all numbers in {1, 2, · · · k +1} can be written
as a product of primes (or 1).
d) If in (b), k + 1 is composite, then there is a divisor d ∈ {2, · · · k} such
that k + 1 = dd′.
e) Show that the hypothesis in (b) implies that also in this case, all numbers in
{1, 2, · · · k + 1} can be written as a product of primes (or 1).
f) Use the above to formulate the inductive proof that all elements of N can
be written as a product of primes.
2.6. Exercises 37
Exercise 2.21. The set-up of the proof is the same as in exercise 2.20. Use
induction on n. We assume the result of that exercise.
a) Show that n = 2 has a unique factorization.
b) Suppose that for k > 2, all numbers in {2, · · · k} can be uniquely factored. Then there
are primes $p_i$ and $q_i$, not necessarily distinct, such that
\[ k + 1 = \prod_{i=1}^{s} p_i = \prod_{i=1}^{r} q_i . \]
c) Show that then $p_1$ divides $\prod_{i=1}^{r} q_i$ and so, Corollary 2.10 implies that
there is a j ≤ r such that $p_1 = q_j$.
d) Relabel the $q_i$’s, so that $p_1 = q_1$ and divide k + 1 by $p_1 = q_1$. Show that
\[ \frac{k+1}{q_1} = \prod_{i=2}^{s} p_i = \prod_{i=2}^{r} q_i . \]
e) Show that the hypothesis in (b) implies that the remaining $p_i$ equal the
remaining $q_i$. (Hint: $\frac{k+1}{q_1} \le k$.)
f) Use the above to formulate the inductive proof that all elements of N can
be uniquely factored as a product of primes.
Here is a different characterization of gcd and lcm. We prove it as a corol-
lary of the prime factorization theorem.
Corollary 2.23. (1) A common divisor d > 0 of a and b equals gcd(a, b) if
and only if every common divisor of a and b is a divisor of d.
(2) Also, a common multiple d > 0 of a and b equals lcm (a, b) if and only
if every common multiple of a and b is a multiple of d.
Exercise 2.22. Use the characterization of gcd(a, b) and lcm (a, b) given
in the proof of Corollary 2.16 to prove Corollary 2.23.
Exercise 2.24. In this exercise we consider the Riemann zeta function for
real values of z greater than 1.
a) Show that for all x > −1, we have ln(1 + x) ≤ x.
b) Use Proposition 2.20 and a) to show that
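\[ \ln \zeta(z) = \sum_{p \text{ prime}} \ln\left(1 + \frac{p^{-z}}{1 - p^{-z}}\right) \le \sum_{p \text{ prime}} \frac{p^{-z}}{1 - p^{-z}} \le \sum_{p \text{ prime}} \frac{p^{-z}}{1 - 2^{-z}} . \]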
Figure 8. Proof that $\sum_{n=1}^{\infty} f(n)$ is greater than $\int_1^{\infty} f(x)\,dx$ if f is positive
and (strictly) decreasing.
Exercise 2.25. a) Let p be a fixed prime. Show that the probability that
two independently chosen integers in {1, · · · , n} are both divisible by p tends to
$1/p^2$ as n → ∞. Equivalently, the probability that they are not both divisible by
p tends to $1 - 1/p^2$.
b) Make the necessary assumptions, and show that the probability that
two independently chosen integers in {1, · · · , n} are not both divisible by any
prime tends to $\prod_{p \text{ prime}} (1 - p^{-2})$. (Hint: you need to assume that the
probabilities in (a) are independent and so they can be multiplied.)
c) Show that from (b) and Euler’s product formula, it follows that for 2
random (positive) integers a and b to have gcd(a, b) = 1 has probability
1/ζ (2) ≈ 0.61.
d) Show that for d > 1 and integers {a1 , a2 , · · · ad } that probability equals
1/ζ (d). (Hint: the reasoning is the same as in (a), (b), and (c).)
e) Show that for real d > 1:
\[ 1 < \zeta(d) < 1 + \int_1^{\infty} x^{-d}\,dx = 1 + \frac{1}{d-1} . \]
For the middle inequality, see Figure 9.
f) Show that for large d, the probability that gcd(a1 , a2 , · · · ad ) = 1 tends to
1.
Exercise 2.26. This exercise is based on exercise 2.25.
a) In the $\{-4, \cdots, 4\}^2 \setminus (0, 0)$ grid in $\mathbb{Z}^2$, find out which proportion of the
lattice points is visible from the origin, see Figure 10.
b) Use exercise 2.25 (c) to show that in a large grid, this proportion tends
to 1/ζ (2).
c) Use exercise 2.25 (d) to show that as the dimension increases to infinity,
the proportion of the lattice points of $\mathbb{Z}^d$ that are visible from the origin
increases to 1.
Figure 9. Proof that $\sum_{n=1}^{\infty} f(n)$ (shaded in blue and green) minus f (1)
(shaded in blue) is less than $\int_1^{\infty} f(x)\,dx$ if f is positive and (strictly)
decreasing to 0.
Figure 10. The origin is marked by “×”. The red dots are visible from
×; between any blue dot and × there is a red dot. The picture shows
exactly one quarter of $\{-4, \cdots, 4\}^2 \setminus (0, 0) \subset \mathbb{Z}^2$.
Exercise 2.27. We note here that $\zeta(2) = \frac{\pi^2}{6}$.
a) Show that the irrationality of π implies that ζ (2) is irrational.
b) Show that (a) and Proposition 2.20 yield another proof of the infinity of
primes.
Chapter 3
Linear Diophantine
Equations
Proof. On the one hand, we have r1 = r2 q2 +r3 , and so any common divisor
of r2 and r3 must also be a divisor of r1 (and of r2 ). Vice versa, since
r1 − r2 q2 = r3 , we have that any common divisor of r1 and r2 must also be
a divisor of r3 (and of r2 ).
42 3. Linear Diophantine Equations
188 = 158 · 1 + 30
158 =  30 · 5 +  8
 30 =   8 · 3 +  6
  8 =   6 · 1 +  2
  6 =   2 · 3 +  0
We see that gcd(188, 158) = 2. The numbers that multiply the ri are the
quotients of the division algorithm (see the proof of Lemma 2.2). If we call
them qi , the computation looks as follows:
\[
\begin{aligned}
r_1 &= r_2 q_2 + r_3 \\
r_2 &= r_3 q_3 + r_4 \\
&\;\;\vdots \\
r_{n-3} &= r_{n-2} q_{n-2} + r_{n-1} \\
r_{n-2} &= r_{n-1} q_{n-1} + r_n \\
r_{n-1} &= r_n q_n + 0
\end{aligned}
\tag{3.1}
\]
where we use the convention that $r_{n+1} = 0$ while $r_n \ne 0$. Observe that with
that convention, (3.1) consists of n − 1 steps. A much more concise form
(in part based on a suggestion of Katahdin [30]) to render this computation
is as follows.
    | qn | qn−1 | ··· | q3 | q2 |    |
  0 | rn | rn−1 | ··· | r3 | r2 | r1 |        (3.2)
Thus, each step ri+1 | ri | is similar to the usual long division, except that
its quotient qi+1 is placed above ri+1 (and not above ri ), while its remainder
3.2. A Particular Solution of ax + by = c 43
ri+2 is placed all the way to the left of ri+1 . The example we worked out
before, now looks like this:
    | 3 | 1 | 3 | 5  | 1   |     |
  0 | 2 | 6 | 8 | 30 | 158 | 188 |        (3.3)
There is a beautiful visualization of this process outlined in exercise 3.2.
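As a sanity check on the scheme in (3.2) and (3.3), here is a minimal Python sketch (not part of the original text; the function name euclid_steps is our own choice). It assumes r1 > r2 > 0 and simply records the remainders ri and the quotients qi produced by repeated use of the division algorithm.

```python
def euclid_steps(r1, r2):
    """Run the Euclidean algorithm on r1 > r2 > 0.

    Returns (remainders, quotients), where remainders = [r1, r2, ..., rn, 0]
    and quotients = [q2, q3, ..., qn], as in equation (3.1).
    """
    remainders = [r1, r2]
    quotients = []
    while remainders[-1] != 0:
        # One step of the division algorithm: r_{i-1} = r_i * q_i + r_{i+1}.
        q, r = divmod(remainders[-2], remainders[-1])
        quotients.append(q)
        remainders.append(r)
    return remainders, quotients


rs, qs = euclid_steps(188, 158)
print(rs)      # [188, 158, 30, 8, 6, 2, 0]
print(qs)      # [1, 5, 3, 1, 3]
print(rs[-2])  # 2, the last non-zero remainder, i.e. the gcd
```

Reading the two printed lists from right to left reproduces the two rows of (3.3).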
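Proof. Let $r_i$, $q_i$, and $Q_i$ be defined as above, and set $r_{n+1} = 0$. From equation
(3.4), we have
\[ \begin{pmatrix} r_i \\ r_{i+1} \end{pmatrix} = Q_i^{-1} \begin{pmatrix} r_{i-1} \\ r_i \end{pmatrix} \implies \begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix} = Q_{n-1}^{-1} \cdots Q_2^{-1} \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} . \]
Observe that $r_{n+1} = 0$ and so $\gcd(r_1, r_2) = r_n$ and
\[ \begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix} = \begin{pmatrix} x_{n-1} & y_{n-1} \\ x_n & y_n \end{pmatrix} \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} . \]
The theorem follows immediately by setting $x = x_n$ and $y = y_n$.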
| + | − | + | − | + | ···
| qn | qn−1 | qn−2 | qn−3 | qn−4 | ···
0 | rn | rn−1 | rn−2 | rn−3 | rn−4 | ···
| 1 | | | | |
| 0 | −qn−1 | 1 | | |
| | | qn−1 qn−2 | −qn−1 | |
| | | | −qn−3 (1 + qn−1 qn−2 ) | 1 + qn−1 qn−2 | ···
The algorithm proceeds as follows. Number the columns from right to left,
so that ri (in row 1) and qi (in row 2) are in the ith column. (The signs in
row “0” serve only to keep track of the signs of the coefficients in row 3
and below.) In the first two rows, the algorithm proceeds from right to left.
3.3. Solution of the Homogeneous equation ax + by = 0 45
From ri−1 and ri determine qi and ri+1 by ri−1 = ri qi + ri+1 . The division
guarantees that these exist, but they may not be unique (see exercise 7.22).
In rows 3 and below, the algorithm proceeds from left to right. Each column
has at most two non-zero entries. Start with column n + 1 which has only
zeroes and column n which has one 1. The bottom non-zero entry of column
i equals the sum of column i + 1 times qi times (-1). The top non-zero entry
of column i equals the sum of the entries in column i + 2. Finally, we obtain
that rn = r2 x + r1 y, where x is the sum of the entries in the 2nd column
(rows 3 and below) and y, the sum of the entries (row 3 and below) of the
1st column.
Applying this to the example gives
    | +  | −  | +  | −   | +   | −
    | 3  | 1  | 3  | 5   | 1   | 0
  0 | 2  | 6  | 8  | 30  | 158 | 188
    | 1  |    |    |     |     |
    |    | −1 | 1  |     |     |                (3.5)
    |    |    | 3  | −1  |     |
    |    |    |    | −20 | 4   |
    |    |    |    |     | 21  | −21
Adding the last two lines gives that 2 = 158(25) + 188(−21).
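The bookkeeping in this table can be sketched in a few lines of Python. This is a minimal illustration under our own naming (bezout, c_cur, c_next are hypothetical names, not from the text), assuming r1 > r2 > 0. The forward pass collects the quotients from right to left in the table; the backward pass applies the rule "column i sums to (sum of column i + 2) minus qi times (sum of column i + 1)".

```python
def bezout(r1, r2):
    """Return (g, x, y) with g = gcd(r1, r2) = r2*x + r1*y, for r1 > r2 > 0."""
    # Forward pass: quotients q2, q3, ..., qn of the Euclidean algorithm.
    quotients = []
    a, b = r1, r2
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    g = a  # last non-zero remainder r_n

    # Backward pass: column n+1 sums to 0 and column n sums to 1.
    c_next, c_cur = 0, 1
    # Visit q_{n-1}, ..., q_2; q_n is not needed for the coefficients.
    for q in reversed(quotients[:-1]):
        c_next, c_cur = c_cur, c_next - q * c_cur
    x, y = c_cur, c_next  # sums of column 2 and of column 1 (since q1 = 0)
    return g, x, y


g, x, y = bezout(188, 158)
print(g, x, y)                # 2 25 -21
print(158 * x + 188 * y == g) # True
```

The design choice here is to keep only the two most recent column sums, since each new column depends on just the two columns to its left in the table.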
Proof. On the one hand, by substituting the expressions for x and y into the
homogeneous equation, one checks they are indeed solutions. On the other
hand, x and y must satisfy
\[ \frac{r_1}{\gcd(r_1, r_2)}\, x = -\frac{r_2}{\gcd(r_1, r_2)}\, y . \]
The integers $\frac{r_i}{\gcd(r_1, r_2)}$ (for i in {1, 2}) have greatest common divisor equal
to 1. Thus Euclid’s lemma applies and therefore $\frac{r_1}{\gcd(r_1, r_2)}$ is a divisor of y
while $\frac{r_2}{\gcd(r_1, r_2)}$ is a divisor of x.
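[Figure 11 (image not reproduced): the lines m and m′ in the plane with axes $x_1$ and $x_2$, the vector $(r_1, r_2)$, and the points $(x_1^{(0)}, x_2^{(0)})$, $(z_1, z_2)$, and $(z_1 + x_1^{(0)}, z_2 + x_2^{(0)})$.]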
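Proof. Let $\begin{pmatrix} x^{(0)} \\ y^{(0)} \end{pmatrix}$ be that particular solution. Let m be the line given by
$(\vec{r}, \vec{x}) = c$. Translate m over the vector $\begin{pmatrix} -x^{(0)} \\ -y^{(0)} \end{pmatrix}$ to get the line m′. Then an
integer point $\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}$ on the line m′ is a solution of the homogeneous equation
if and only if $\begin{pmatrix} x^{(0)} + z_1 \\ y^{(0)} + z_2 \end{pmatrix}$ on m is also an integer point (see Figure 11).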
r1 = r1 · 1 + r2 · 0 .
r2 = r1 · 0 + r2 · 1
3.6. The Chinese Remainder Theorem 49
Proof. The proof follows from unique factorization and is similar to that of
Corollary 2.16. Suppose $b_j = \prod_{i=1}^{s} p_i^{k_{ij}}$, where $k_{ij} \ge 0$. Set
\[ m_i = \min_j k_{ij} \quad \text{and} \quad M_i = \max_j k_{ij} . \]
Then
\[ \gcd(b_1, \cdots, b_k) = \prod_{i=1}^{s} p_i^{m_i} \quad \text{and} \quad \operatorname{lcm}(b_1, \cdots, b_k) = \prod_{i=1}^{s} p_i^{M_i} . \]
Any common divisor of the $b_j$ must be equal to $\prod_{i=1}^{s} p_i^{\ell_i}$ with $\ell_i \le m_i$, and
similarly for common multiples.
Theorem 3.13 (Chinese Remainder Theorem). Let $n = \prod_{i=1}^{k} b_i$, where the $b_i$
are positive integers such that $\gcd(b_j, b_i) = 1$ for i ≠ j. The set of solutions
of
\[ \forall i \in \{1, \cdots, k\} : \; z =_{b_i} c_i \]
is given by
\[ z =_n \sum_{j=1}^{k} \frac{n}{b_j}\, x_j c_j \quad \text{where } x_i \text{ satisfies } \frac{n}{b_i}\, x_i =_{b_i} 1 . \]
3.7. Polynomials
In this section, we illustrate that the division and Euclidean algorithms have
much wider applications than just the integers, see also exercises 2.2 and
2.4.
Definition 3.14. A polynomial f in Q[x] of positive degree is irreducible
over Q if it cannot be written as a product of two polynomials in Q[x] with
positive degree. Recall (Definition 1.17) that f is minimal polynomial in
3.8. Exercises 51
We mention without proof (but see exercise 2.2) that in Q[x] the division
algorithm holds: given r1 and r2 , there are q2 and r3 such that
r1 = r2 q2 + r3 with degree(r3 ) < degree(r2 ) .
Remark 3.17. To make this valid without exceptions, we adopt the con-
vention that the degree of a non-zero constant equals 0, while the degree of
0 equals −∞. For example, if r1 = r2 = 1, the inequality for r3 still holds.
The student is likely already familiar with these facts.
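For readers who want to experiment, here is a minimal Python sketch of the division algorithm in Q[x]. It is an illustration under our own conventions, not the book's: a polynomial is a list of coefficients with the highest degree first, the name poly_divmod is hypothetical, and exact rational arithmetic comes from the standard fractions module. The remainder returned has fewer coefficients than the divisor, so its degree is smaller than that of the divisor.

```python
from fractions import Fraction


def poly_divmod(r1, r2):
    """Return (q2, r3) with r1 = r2*q2 + r3 and deg(r3) < deg(r2) in Q[x].

    Polynomials are coefficient lists, highest degree first; the leading
    coefficient of r2 must be non-zero.
    """
    rem = [Fraction(c) for c in r1]
    div = [Fraction(c) for c in r2]
    if div[0] == 0:
        raise ValueError("leading coefficient of r2 must be non-zero")
    quot = []
    while len(rem) >= len(div):
        factor = rem[0] / div[0]      # next coefficient of the quotient
        quot.append(factor)
        for i, c in enumerate(div):   # subtract factor * r2, aligned at the top
            rem[i] -= factor * c
        rem.pop(0)                    # the leading term is now zero
    return quot, rem


# (x^7 - x^2 + 1) divided by (x^3 + x^2)
q, r = poly_divmod([1, 0, 0, 0, 0, -1, 0, 1], [1, 1, 0, 0])
print([str(c) for c in q])  # ['1', '-1', '1', '-1', '1']
print([str(c) for c in r])  # ['-2', '0', '1']

# x^2 divided by (2x + 1): the quotient leaves Z[x]
q, r = poly_divmod([1, 0, 0], [2, 1])
print([str(c) for c in q])  # ['1/2', '-1/4']
print([str(c) for c in r])  # ['1/4']
```

The second example shows why rational coefficients are needed: dividing by a polynomial whose leading coefficient is not ±1 generally produces fractions.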
3.8. Exercises
Exercise 3.1. Let ℓ be the line in $\mathbb{R}^2$ given by y = ρx, where ρ ∈ R.
a) Show that ℓ intersects $\mathbb{Z}^2$ if and only if ρ is rational.
b) Given a rational ρ > 0, find the intersection of ℓ with $\mathbb{Z}^2$. (Hint: set
$\rho = \frac{r_1}{r_2}$ and use Proposition 3.5.)
Exercise 3.2. This problem was taken (and reformulated) from [24].
a) Tile a 188 by 158 rectangle by squares using what is called a
greedy algorithmᵃ. The first square is 158 by 158. The remaining rectangle
is 158 by 30. Now the optimal choice is five 30 by 30 squares.
What remains is a 30 by 8 rectangle, and so on. Explain how this is a
visualization of equation (3.3). See Figure 12.
b) Consider equation (3.1) or (3.2) and use a) to show that
\[ r_1 r_2 = \sum_{i=2}^{n} q_i r_i^2 . \]
(Hint: assume that $r_1 > r_2 > 0$, $r_n \ne 0$, and $r_{n+1} = 0$.)
ᵃ By “greedy” we mean that at every step, you choose the biggest square possible and as
many of them as possible. In general a greedy algorithm always makes a locally optimal choice.
Figure 12. A ‘greedy’ (or locally best) algorithm to tile the 188 ×
158 rectangle by squares. The 3 smallest (and barely visible)
squares are 2 × 2. Note how the squares spiral inward as they get
smaller. See exercise 3.13.
Exercise 3.3. In (3.1), assume that r1 > r2 > 0. What happens if you start
the Euclidean algorithm with r2 = r1 · 0 + r3 instead of r1 = r2 · q2 + r3 ?
Exercise 3.4. Apply the Euclidean algorithm to find the greatest common
divisor of the following number pairs. (Hint: replace negative numbers by
positive ones. For the division algorithm applied to these pairs (r1 , r2 ), see
exercise 2.1)
a) 110 , 7.
b) 51 , −30.
c) −138 , 24.
d) 272 , 119.
e) 2378 , 1769.
f) 270 , 175560.
Exercise 3.6. Find all solutions for x and y of the following (homogeneous)
Diophantine equations. (Hint: Use one of the algorithms in Section 3.2.)
a) 110x + 7y = 0.
b) 51x − 30y = 0.
c) −138x + 24y = 0.
d) 272x + 119y = 0.
e) 2378x + 1769y = 0.
f) 270x + 175560y = 0.
Exercise 3.7. Find the general solution for x and y in all problems of exer-
cise 3.5 that admit a solution. (Hint: use Corollary 3.8.)
Exercise 3.10. Denote the golden mean, or $\frac{1+\sqrt{5}}{2} \approx 1.618$, by g.
a) Show that $g^2 = g + 1$ and thus for n ∈ Z: $g^{n+1} = g^n + g^{n-1}$.
b) Show that $F_3 \ge g^1$ and $F_2 \ge g^0$.
c) Use induction to show that $F_{n+2} \ge g^n$ for n > 0.
d) Use the fact that $5 \log_{10}\left(\frac{1+\sqrt{5}}{2}\right) \approx 1.045$, to show that $F_{5k+2} > 10^k$ for
k ≥ 0.
Exercise 3.11. Consider the equations in (3.1) and assume that rn+2 = 0
and rn+1 > 0.
a) Show that rn+1 ≥ F2 = 1 and rn ≥ F3 = 2. (Hint: r(i) is strictly increas-
ing.)
b) Show that r1 ≥ Fn+2 .
c) Suppose r1 and r2 in N and max{r1 , r2 } < Fn+2 . Show that the Eu-
clidean Algorithm to calculate gcd(r1 , r2 ) takes at most n − 1 iterates of
the division algorithm.
Exercise 3.12. Use exercises 3.10 and 3.11 to show that the Euclidean
Algorithm to calculate gcd(r1 , r2 ) takes at most 5k − 1 iterates where
k is the number of decimal places of max{r1 , r2 }. (This is known as
Lamé’s theorem.)
Exercise 3.13. Apply the greedy algorithm of exercise 3.2 (a) to the rec-
tangle whose sides have length 1 and g (see exercise 3.10 (a)). At step 0,
we start with the 1 × 1 square.
a) Use exercise 3.10 (a) to show that at step i, you get one $g^{-i} \times g^{-i}$ square
(see Figure 13).
b) Use exercise 3.2 (b) to show that $g = \sum_{i=0}^{\infty} g^{-2i}$.
c) Use this construction, but now with a $F_{n+1} \times F_n$ Fibonacci rectangle, to
show that $F_{n+1} F_n = \sum_{i=1}^{n} F_i^2$. For $F_i$, see Definition 3.18.
d) Show that in polar coordinates (r, θ ) the red spiral connecting the corners
of the squares in Figure 13 is given by $r = C g^{2\theta/\pi}$ for some C. (Note:
this is called the golden spiral.)
Figure 13. The greedy algorithm of exercise 3.2 (a) applied to the
golden mean rectangle. The spiral connecting the corners of the square
is known as the golden spiral. (In actual fact we used a 55 by 34 rectan-
gle as an approximation. An approximation to a true spiral was created
by fitting circular segments to the corners.)
Exercise 3.14. a) Write the numbers 287, 513, and 999 in base 2, 3, and 7,
using the division algorithm. Do not use a calculating device. (Hint: start
with base 10. For example:
287 = 28 · 10 + 7
28 = 2 · 10 + 8
2 = 0 · 10 + 2
Hence the number in base 10 is $2 \cdot 10^2 + 8 \cdot 10^1 + 7 \cdot 10^0$.)
b) Show that to write n in base b takes about $\log_b n$ divisions.
Exercise 3.22. For this exercise, read Section 3.7 carefully. All polynomials
are in Q[x] (that is: with coefficients in Q). Let $p_1(x) = x^7 - x^2 + 1$,
$p_2(x) = x^3 + x^2$, and e(x) = 2 − x.
a) Use the Euclidean Algorithm to determine gcd(p1 , p2 ). Hint: We list the
steps of the Euclidean algorithm:
Exercise 3.23. All polynomials are in Q[x]. Let p(x) be a polynomial and
p′(x) its derivative.
a) Show that if p(x) has a multiple root λ of order k > 1, then p′(x) has
that same root of order k − 1. (Hint: Differentiate $p(x) = h(x)(x - \lambda)^k$.)
b) Use exercise 3.22, to give an algorithm to find a polynomial q(x) that
has the same roots as p(x), but all roots are simple (i.e. no multiple roots).
(Hint: you need to divide p by gcd(p, p′).)
Number Theoretic
Functions
Note that outside number theory, the term sequence is the one that is
most commonly used. We will use these terms interchangeably.
Definition 4.2. A multiplicative function is a sequence such that gcd(a, b) =
1 implies f (ab) = f (a) f (b). A completely multiplicative function is one
where the condition that gcd(a, b) = 1 is not needed.
Note that completely multiplicative implies multiplicative (but not vice versa).
The reason this definition is interesting, is that it allows us to evaluate the
60 4. Number Theoretic Functions
is also multiplicative.
Proof. Let $n = \prod_{i=1}^{s} p_i^{\ell_i}$. The summation $\sum_{d \mid n} f(d)$ can be written out using
the previous lemma and the fact that f is multiplicative:
Perhaps the simplest multiplicative functions are the ones where $f(n) = n^k$
for some fixed k. Indeed, $f(n) f(m) = n^k m^k = f(nm)$. In fact, this is a
completely multiplicative function. Thus Proposition 4.3 implies that the
functions $\sigma_k$ defined below are multiplicative.
Definition 4.4. Let k ∈ R. The multiplicative function $\sigma_k$ : N → R gives the
sum of the k-th power of the positive divisors of n. Equivalently:
\[ \sigma_k(n) = \sum_{d \mid n} d^k . \]
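A direct, unoptimized Python rendering of this definition makes it easy to test multiplicativity numerically. This is only a sketch: the function name sigma and the brute-force divisor search are our own choices, and we restrict to non-negative integer k for simplicity.

```python
from math import gcd


def sigma(k, n):
    """Sum of the k-th powers of the positive divisors of n (k >= 0 an integer)."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)


print(sigma(1, 6))   # 12 = 1 + 2 + 3 + 6
print(sigma(0, 12))  # 6, the number of divisors of 12

# sigma_k(ab) = sigma_k(a) * sigma_k(b) whenever gcd(a, b) = 1 ...
coprime_ok = all(
    sigma(k, a * b) == sigma(k, a) * sigma(k, b)
    for k in range(3)
    for a in range(1, 31)
    for b in range(1, 31)
    if gcd(a, b) == 1
)
print(coprime_ok)  # True

# ... but not in general: sigma_1 is not completely multiplicative.
print(sigma(1, 4), sigma(1, 2) * sigma(1, 2))  # 7 9
```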
4.1. Multiplicative Functions 61
Definition 4.7. We say that n is square free if there is no prime p such that
$p^2 \mid n$.
Lemma 4.8. The Möbius function µ is multiplicative.
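\[ \left. \begin{array}{r} ab = d \\ d \mid n \end{array} \right\} \implies b \mid n \text{ and } a \mid \frac{n}{b} . \]
And so (a, b) is in Tn . Vice versa, if (a, b) is in Tn , then by setting d ≡ ab,
we get
\[ \left. \begin{array}{r} b \mid n \\ a \mid \frac{n}{b} \end{array} \right\} \implies d \mid n \text{ and } ab = d . \]
And so (a, b) is in Sn .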
Theorem 4.13. (Möbius inversion) Let F : N → C be any number theoretic
function and µ the Möbius function. Then the following equation holds
\[ F(n) = \sum_{d \mid n} f(d) \]
For the second step we apply Lemma 4.12 to the set over which the sum-
mation takes place. This gives:
Proof. Define S(d, n) as the set of integers m between 1 and n such that
gcd(m, n) = d:
S(d, n) = {m ∈ N : m ≤ n and gcd(m, n) = d} .
4.4. Euler’s Phi or Totient Function 65
1
Proof. Apply Möbius inversion to Lemma 4.16:
\[ \varphi(d) = \sum_{a \mid d} \mu(a)\, \frac{d}{a} = d \sum_{a \mid d} \frac{\mu(a)}{a} . \tag{4.3} \]
And the best of these examples is gotten by substituting the identity ε for F
in equation (4.5):
ε = 1∗ f ⇐⇒ f = µ ∗ε = µ . (4.7)
Thus µ is the convolution inverse of the sequence (1, 1, 1 · · · ). This imme-
diately leads to an unexpected3 expression for 1/ζ (z) of equation (4.8).
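Both (4.3) and (4.7) can be checked numerically. The Python sketch below is ours, not the book's: the helper names (mobius, convolve, one, identity, epsilon) are hypothetical, the convolution is taken to be (f ∗ g)(n) = ∑ f(a) g(b) over ab = n, and I is assumed to denote the sequence I(n) = n. Everything is computed by brute force, so it is only meant for small n.

```python
from math import gcd


def mobius(n):
    """Möbius function, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:                     # one remaining prime factor
        result = -result
    return result


def convolve(f, g):
    """Dirichlet convolution: (f * g)(n) = sum of f(a) g(n/a) over a | n."""
    def h(n):
        return sum(f(a) * g(n // a) for a in range(1, n + 1) if n % a == 0)
    return h


def one(n):        # the sequence (1, 1, 1, ...)
    return 1


def identity(n):   # assumed meaning of I: I(n) = n
    return n


def epsilon(n):    # identity for convolution: (1, 0, 0, ...)
    return 1 if n == 1 else 0


def phi(n):        # Euler's totient, counted directly
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


one_star_mu = convolve(one, mobius)
mu_star_I = convolve(mobius, identity)
print(all(one_star_mu(n) == epsilon(n) for n in range(1, 200)))  # True
print(all(mu_star_I(n) == phi(n) for n in range(1, 200)))        # True
```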
Definition 4.21. Let f (n) be an arithmetic function (or sequence). A Dirichlet
series is a series of the form $F(z) = \sum_{n=1}^{\infty} f(n)\, n^{-z}$. Similarly, a Lambert series
is a series of the form $F(x) = \sum_{n=1}^{\infty} f(n)\, \frac{x^n}{1 - x^n}$.
Proof. This follows easily from re-arranging the terms in the product:
\[ \sum_{a, b \ge 1} \frac{f(a) g(b)}{(ab)^z} = \sum_{n=1}^{\infty} \left( \sum_{ab = n} f(a) g(b) \right) n^{-z} . \]
Recall from Chapter 2 that one of the chief concerns of number theory
is the location of the non-real zeros of ζ . At stake is Conjecture 2.22 which
states that all its non-real zeros are on the line Re z = 1/2. The original
definition of the zeta function is as a series that is absolutely convergent
for Re z > 1 only. Equation (4.8) converges in that same region, and so
3The fact that this follows so easily, justifies the use of the word referred to in the previous footnote
Proof.
\[ \zeta(z - k)\, \zeta(z) = \sum_{a \ge 1} a^{-z} \sum_{b \ge 1} b^k b^{-z} = \sum_{n \ge 1} n^{-z} \sum_{b \mid n} b^k . \]
Lemma 4.24. A Lambert series can be re-summed as follows:
\[ \sum_{n=1}^{\infty} f(n)\, \frac{x^n}{1 - x^n} = \sum_{n=1}^{\infty} (1 * f)(n)\, x^n . \]
Proof. We have
\[ \sum_{n \ge 1} \varphi(n)\, \frac{x^n}{1 - x^n} = \sum_{n \ge 1} (1 * \varphi)(n)\, x^n = \sum_{n \ge 1} I(n)\, x^n . \]
The first equality follows from Lemma 4.24 and the second from Lemma
4.16. The last sum can be computed as $x \frac{d}{dx} (1 - x)^{-1}$, which gives the desired
expression.
Figure 14. A one parameter family ft of maps from the circle to itself.
For every t ∈ [0, 1] the map ft is constructed by truncating the map x →
2x mod 1 as indicated in this figure.
4.6. Exercises
Exercise 4.1. Decide which functions are not multiplicative, multiplica-
tive, or completely multiplicative (see Definition 4.2).
a) f (n) = 1.
b) f (n) = 2.
c) $f(n) = \sum_{i=1}^{n} i$.
d) $f(n) = \prod_{i=1}^{n} i$.
e) f (n) = n.
f) $f(n) = n^k$.
g) $f(n) = \sum_{d \mid n} d$.
h) $f(n) = \prod_{d \mid n} d$.
Exercise 4.2. a) Let h(n) = 0 when n is even, and 1 when n is odd. Show
that h is multiplicative.
b) Now let $H(n) = \sum_{d \mid n} h(d)$. Show without using Proposition 4.3 that H is
multiplicative. (Hint: write $a = 2^k \prod_{i=1}^{r} p_i^{\ell_i}$ by unique factorization, where
the $p_i$ are odd primes. Compute the number of odd divisors. Similarly for
b.)
c) What does Proposition 4.3 say?
Figure 15. Two ways of computing the volume of a big box: add the
volumes of the small boxes, or compute the dimensions of the big box.
Exercise 4.4. a) Compute the numbers σ1 (n) = σ (n) of Definition 4.4 for
n ∈ {1, · · · , 30} without using Theorem 4.5.
b) What is the only value n for which σ (n) = n?
c) Show that σ (p) = p + 1 whenever p is prime.
d) Use (c) and multiplicativity of σ to check the list obtained in (a).
e) For what values of n in the list of (a) is n | σ (n)? (Hint: 6 and 28.)
Exercise 4.5. a) Compute the numbers σ0 (n) = τ(n) of Definition 4.4 for
n ∈ {1, · · · , 30} without using Theorem 4.5.
b) What is the only value n for which τ(n) = 1?
c) Show that τ(p) = 2 whenever p is prime.
d) Use (c) and multiplicativity of τ to check the list obtained in (a).
Exercise 4.11. Draw the following directed graph G: the set of vertices
V represent 0 and the natural numbers between 1 and 50. For a, b ∈ V , a
directed edge ab exists if σ (a) − a = b. Finally, add a loop at the vertex
representing 0. Notice that every vertex has 1 outgoing edge, but may have
more than 1 incoming edge.
a) Find the cycles of length 1 (loops). The non-zero of these represent per-
fect numbers.
b) Find the cycles of length 2 (if any). A pair of numbers a and b that
form a cycle of length 2 are called amicable numbers. Thus for such a pair,
σ (b) − b = a and σ (a) − a = b.ᵃ
c) Find any longer cycles. Numbers represented by vertices in longer cy-
cles are called sociable numbers.
d) Find numbers whose path ends in a cycle of length 1. These are called
aspiring numbers.
e) Find numbers (if any) that have no incoming edge. These are called un-
touchable numbers.
f) Determine the paths starting at 2193 and at 562. (Hint: both end in a
cycle (or loop).)
ᵃ As of 2017, about $10^9$ amicable number pairs have been discovered.
Exercise 4.13. Let $F(n) = n = \sum_{d \mid n} f(d)$. Use the Möbius inversion formula
(or $f(n) = \sum_{d \mid n} \mu(d)\, F(\frac{n}{d})$) to find f (n). (Hint: substitute the Möbius
function of Definition 4.6 and use multiplicativity where needed.)
Exercise 4.14. a) Compute the sets Sn and Tn of Lemma 4.12 explicitly for
n = 4 and n = 12.
b) Perform the resummation done in equations 4.1 and 4.2 explicitly for
n = 4 and n = 12.
Exercise 4.17. See Definition 4.10. Define $f(n) \equiv \tau(n^2)$ and $g(n) \equiv 2^{\omega(n)}$.
a) Compute ω(n), f (n), and g(n) for n equals 10n and 6!.
b) For p prime, show that $\tau(p^{2k}) = \sum_{d \mid p^k} 2^{\omega(d)} = 2k + 1$. (Hint: use Theorem
4.5.)
c) Show that f is multiplicative. (Hint: use that τ is multiplicative.)
d) Use (d) to show that g is multiplicative.
e) Show that
\[ \tau(n^2) = \sum_{d \mid n} 2^{\omega(d)} . \]
Exercise 4.18. Let S(n) denote the number of square free divisors of n
with S(1) = 1 and ω(n) the number of distinct prime divisors of n. See
also Definition 4.10.
a) Show that $S(n) = \sum_{d \mid n} |\mu(d)|$. (Hint: use Definition 4.6.)
b) Show that $S(n) = 2^{\omega(n)}$. (Hint: let W be the set of prime divisors of
n. Then every square free divisor corresponds to a subset (product) of
those primes. How many subsets of primes are there in W ?)
c) Conclude that
\[ \sum_{d \mid n} |\mu(d)| = 2^{\omega(n)} . \]
Exercise 4.21. Use exercise 4.20 (e) and the definition of ω in exercise
4.17 and λ in exercise 4.19 to show that
\[ \sum_{d \mid n} \mu(d) \lambda(d) = 2^{\omega(n)} . \]
Exercise 4.23. a) Use Euler’s product formula and the sequence µ of Def-
inition 4.6 to show that
\[ \frac{1}{\zeta(z)} = \prod_{p \text{ prime}} \left( 1 - p^{-z} \right) = \prod_{p \text{ prime}} \left( \sum_{i \ge 0} \mu(p^i)\, p^{-iz} \right) . \]
b) Without using equation (4.7), prove that the expression in (a) equals
$\sum_{n \ge 1} \mu(n)\, n^{-z}$. (Hint: since µ is multiplicative, you can write a proof
re-arranging terms as in the first proof of Euler’s product formula.)
Exercise 4.26. Show that ζ (z) has no zeroes and no poles in the region
ℜ(z) > 1. (Hint: use that ζ (z) converges for ℜ(z) > 1 and (4.8).)
Chapter 5
78 5. Modular Arithmetic and Primes
Proof. Let {xi } be a complete set of residues modulo b. Then the b numbers
{axi } form a complete set of residues unless two of them are congruent. But
that is impossible by Theorem 2.7.
Let {xi } be a reduced set of residues modulo b. Then, as above, no two
of the ϕ(b) numbers {axi } are congruent modulo b. Furthermore, Lemma
2.15 implies that if gcd(a, b) = 1 and gcd(xi , b) = 1, then gcd(axi , b) = 1.
Thus the set {axi } is a reduced set of residues modulo b.
Theorem 5.4 (Euler). Let a, b > 1 and gcd(a, b) = 1. Then $a^{\varphi(b)} =_b 1$.
Proof. Let $\{x_i\}_{i=1}^{\varphi(b)}$ be a reduced set of residues modulo b. Then by Lemma
5.3, $\{a x_i\}_{i=1}^{\varphi(b)}$ is a reduced set of residues modulo b. Because multiplication
5.1. Euler’s Theorem and Primitive Roots 79
is commutative, we get
\[ \prod_{i=1}^{\varphi(b)} x_i =_b \prod_{i=1}^{\varphi(b)} a x_i =_b a^{\varphi(b)} \prod_{i=1}^{\varphi(b)} x_i . \]
Since gcd(xi , b) = 1, Lemma 2.15 implies that $\gcd\left(\prod_{i=1}^{\varphi(b)} x_i, b\right) = 1$. The
cancelation theorem applied to the equality between the first and third terms
proves the result.
Euler’s theorem says that ϕ(b) is a multiple of $\mathrm{Ord}^{\times}_b(a)$. But it does
not say what multiple. In fact, in practice, that question is difficult to decide.
It is of theoretical importance to decide when the two are equal.
Definition 5.5. Let a and b be positive integers with gcd(a, b) = 1. If
$\mathrm{Ord}^{\times}_b(a) = \varphi(b)$, then a is called a primitive root modulo b.
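To get a feeling for how the order relates to ϕ(b), one can simply compute both for small b. The Python sketch below is an illustration with names of our own choosing (phi, order, primitive_roots); it takes the order of a modulo b to be the smallest k > 0 with a^k congruent to 1 modulo b, and finds it by brute force.

```python
from math import gcd


def phi(b):
    """Euler's totient, counted directly."""
    return sum(1 for x in range(1, b + 1) if gcd(x, b) == 1)


def order(a, b):
    """Smallest k > 0 with a**k = 1 (mod b); needs b > 1 and gcd(a, b) = 1."""
    if b < 2 or gcd(a, b) != 1:
        raise ValueError("need b > 1 and gcd(a, b) = 1")
    k, power = 1, a % b
    while power != 1:
        power = power * a % b
        k += 1
    return k


def primitive_roots(b):
    """All a in {1, ..., b-1} whose order modulo b equals phi(b)."""
    return [a for a in range(1, b) if gcd(a, b) == 1 and order(a, b) == phi(b)]


# Euler's theorem: the order always divides phi(b).
print(all(phi(b) % order(a, b) == 0
          for b in range(2, 60) for a in range(1, b) if gcd(a, b) == 1))  # True
print(primitive_roots(7))  # [3, 5]
print(primitive_roots(8))  # []
```

The last line shows that primitive roots need not exist: every unit modulo 8 has order 1 or 2, while ϕ(8) = 4.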
The salient fact about primitive roots is that we know exactly when they
occur. An accessible proof of Theorem 5.7 (i) can be found in [15], chapter
8, and part (ii) in [4], chapter 10.
1 The contrapositive of (P ⇒ Q) is (¬Q ⇒ ¬P) (or: not Q implies not P) and holds if and only if the
former holds.
Proof. The proof proceeds by executing a long division, each step of which
uses the division algorithm. Start by reducing a modulo n and call the result
r0 .
a = nq0 + r0 ,
where r0 ∈ {0, · · · n − 1}. Lemma 3.1 implies that gcd(a, n) = gcd(r0 , n) = 1.
So in particular, r0 ≠ 0. The integer part of a/n is q0 . The next step of the
long division is:
10r0 = nq1 + r1 ,
where again we choose r1 ∈ {0, · · · n − 1}.
Note that 0 ≤ 10r0 < 10n and so q1 ∈ {0, · · · 9}. We now record the first
digit “after the decimal point” of the decimal expansion: q1 . By Lemma 3.1,
we have gcd(10r0 , n) = gcd(r1 , n). In turn, this implies via Lemma 2.15 that
gcd(r0 , n) = gcd(r1 , n). And again, we see that r1 ≠ 0.
The process now repeats itself.
\[ 10\, \underbrace{(10 r_0 - n q_1)}_{r_1} = n q_2 + r_2 , \]
and we record the second digit after the decimal dot, q2 ∈ {0, · · · 9}. By the
same reasoning, gcd(r2 , n) = 1 and so r2 ≠ 0. One continues and proves by
induction that gcd(ri , n) = 1. In particular, ri ≠ 0, so the expansion does not
terminate.
Since the remainders ri are in {1, · · · n − 1}, the sequence must be
eventually periodic with (least positive) period p. At that point, we have
\[ 10^{k+p}\, r_0 =_n 10^k\, r_0 . \]
5.2. Fermat’s Little Theorem and Primality Testing 81
By Theorem 2.7, we can cancel the common factors $10^k$ and r0 , and we
obtain that $10^p =_n 1$. Since p is the least such (positive) number, we have
proved (i). Item (ii) follows directly from Euler’s Theorem.
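The long division in this proof is easy to imitate in code. The following Python sketch is ours (the names decimal_period and order_of_10 are hypothetical); it assumes, as the proof appears to, that n > 1 and that both a and 10 are relatively prime to n. It tracks the remainders ri until one repeats and compares the resulting period with the order of 10 modulo n.

```python
from math import gcd


def decimal_period(a, n):
    """Length of the repeating block of a/n, found by long division."""
    if n < 2 or gcd(a, n) != 1 or gcd(10, n) != 1:
        raise ValueError("need n > 1 and gcd(a, n) = gcd(10, n) = 1")
    r = a % n                    # r0
    first_seen = {}              # remainder -> step at which it first occurred
    step = 0
    while r not in first_seen:
        first_seen[r] = step
        r = 10 * r % n           # next remainder: 10 r_i = n q_{i+1} + r_{i+1}
        step += 1
    return step - first_seen[r]


def order_of_10(n):
    """Smallest p > 0 with 10**p = 1 (mod n), for gcd(10, n) = 1 and n > 1."""
    p, power = 1, 10 % n
    while power != 1:
        power = power * 10 % n
        p += 1
    return p


print(decimal_period(1, 7))   # 6, since 1/7 = 0.142857 142857 ...
print(decimal_period(3, 11))  # 2, since 3/11 = 0.27 27 ...
print(all(decimal_period(1, n) == order_of_10(n)
          for n in range(3, 300) if gcd(10, n) == 1))  # True
```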
$2^{n-1} =_n 1$. If that fails, we know that n is not prime. However, the converse
of Fermat’s little theorem is not true! So even if $2^{n-1} =_n 1$, it could be that
n is not prime; we will discuss this possibility at the end of this section. As
it turns out, primality testing via Fermat’s little theorem can be done much
faster than the naive method, provided one uses fast modular exponentiation
algorithms. We briefly illustrate this technique by computing $11^{340}$ modulo
341.
Start by expanding 340 in base 2 as done in exercise 3.14, where it was
shown that this takes on the order of log2 340 (long) divisions.
340 = 170 · 2 + 0
170 =  85 · 2 + 0
 85 =  42 · 2 + 1
 42 =  21 · 2 + 0
 21 =  10 · 2 + 1
 10 =   5 · 2 + 0
  5 =   2 · 2 + 1
  2 =   1 · 2 + 0
  1 =   0 · 2 + 1
And so
340 = 101010100 in base 2 .
Next, compute a table of powers $11^{2^i}$ modulo 341, as done below. This
can be done using very few computations. For instance, once $11^8 =_{341} 143$
has been established, the next up is found by computing $143^2$ modulo 341,
which gives 330, and so on. So this takes about $\log_2 340$ multiplications.
\[
\begin{array}{c|l}
0 & 11^{1} =_{341} 11 \\
0 & 11^{2} =_{341} 121 \\
1 & 11^{4} =_{341} 319 \\
0 & 11^{8} =_{341} 143 \\
1 & 11^{16} =_{341} 330 \\
0 & 11^{32} =_{341} 121 \\
1 & 11^{64} =_{341} 319 \\
0 & 11^{128} =_{341} 143 \\
1 & 11^{256} =_{341} 330
\end{array}
\]
The first column in the table thus obtained now tells us which coefficients
in the second we need to compute the result.
\[ 11^{340} =_{341} 330 \cdot 319 \cdot 330 \cdot 319 =_{341} 253 . \]
Again, this takes no more than $\log_2 340$ multiplications. Thus altogether, for
a number n and a computation in base b, this takes on the order of $2 \log_b n$
multiplications plus $\log_b n$ divisions². For large numbers, this is much more
efficient than the $\sqrt{n}$ of the naive method.
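The technique just described is usually called square-and-multiply. Here is a minimal Python sketch of it, written by us for illustration (the name power_mod is our own; Python's built-in three-argument pow does the same job and is used only as a cross-check). The loop reads off the binary digits of the exponent from least to most significant, exactly as in the table of divisions by 2, while keeping a running square.

```python
def power_mod(base, exponent, modulus):
    """Compute base**exponent modulo modulus by square-and-multiply."""
    result = 1 % modulus
    square = base % modulus          # holds base**(2**i) modulo modulus
    while exponent > 0:
        exponent, bit = divmod(exponent, 2)   # next binary digit of the exponent
        if bit == 1:
            result = result * square % modulus
        square = square * square % modulus
    return result


# The table of repeated squares of 11 modulo 341.
squares = [11]
for _ in range(8):
    squares.append(squares[-1] ** 2 % 341)
print(squares)       # [11, 121, 319, 143, 330, 121, 319, 143, 330]
print(bin(340))      # 0b101010100

print(power_mod(11, 340, 341) == pow(11, 340, 341))  # True
print(power_mod(2, 340, 341))                         # 1
```

The last line shows that 341 = 11 · 31 passes the test to the base 2 even though it is composite.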
As mentioned, the drawback is that we can get false positives. While
there are partial converses to Fermat’s little theorem, they do not yield com-
putationally efficient improvements (see exercise 5.20).
Definition 5.11. The number n ∈ N is called a pseudoprime to the base b
if gcd(b, n) = 1 and $b^{n-1} =_n 1$ but nonetheless n is composite. (When the
base is 2, the clause to the base 2 is often dropped.)
Some numbers pass all tests to every base and are still composite.
These are called Carmichael numbers. The smallest Carmichael number
is 561. It has been proved [44] that there are infinitely many of them.
Definition 5.12. The number n ∈ N is called a Carmichael number if it is
composite and it is a pseudoprime to every base.
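These two definitions can be explored by brute force for small n. The sketch below is ours, with hypothetical function names, and it reads "every base" in Definition 5.12 as every base b in {2, · · · , n − 1} with gcd(b, n) = 1, in line with Definition 5.11. Primality is decided by naive trial division, which is adequate only for small numbers.

```python
from math import gcd


def is_prime(n):
    """Naive trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_pseudoprime(n, b):
    """n is composite, gcd(b, n) = 1, and b**(n-1) = 1 (mod n)."""
    return n > 1 and not is_prime(n) and gcd(b, n) == 1 and pow(b, n - 1, n) == 1


def is_carmichael(n):
    """n is composite and a pseudoprime to every base coprime to n."""
    if n < 2 or is_prime(n):
        return False
    return all(pow(b, n - 1, n) == 1 for b in range(2, n) if gcd(b, n) == 1)


print(is_pseudoprime(341, 2))  # True
print(is_pseudoprime(341, 3))  # False
print([n for n in range(2, 2000) if is_carmichael(n)])  # [561, 1105, 1729]
```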
2Divisions take more computations than multiplications. We do not pursue this here.
Proof. We only prove (ii); (i) can be proved similarly. So suppose that a is
odd, then
\[ 2^b =_{2^b+1} -1 \implies 2^{ab} =_{2^b+1} (-1)^a =_{2^b+1} -1 \implies 2^{ab} + 1 =_{2^b+1} 0 , \]
which proves the statement. Notice that this includes the case where b = 1.
In that case, we have $3 \mid (2^a + 1)$ (whenever a is odd).
Proof. One direction follows from the previous lemma. Thus we only need
to prove that if an even number n is perfect, then it is of the form stipulated.
Since n is even, we may assume $n = q\,2^{k-1}$ where k ≥ 2 and q is odd.
Using multiplicativity of σ and the fact that σ(n) = 2n:
$\sigma(n) = \sigma(q)\,(2^k - 1) = 2n = q\,2^k$.
Thus
$(2^k - 1)\,\sigma(q) - 2^k q = 0$. (5.1)
Since $2^k - (2^k - 1) = 1$, we know by Bézout that $\gcd(2^k - 1, 2^k) = 1$. Thus
Proposition 3.5 implies that the general solution of the above equation is:
$q = (2^k - 1)\,t$ and $\sigma(q) = 2^k t$, (5.2)
where t > 0, because we know that q > 0.
Assume first that t > 1. The form of q, namely $q = (2^k - 1)\,t$, allows us
to identify at least four distinct divisors of q. This gives that
$\sigma(q) \geq 1 + t + (2^k - 1) + (2^k - 1)\,t = 2^k (t + 1)$.
This contradicts equation (5.2), and so t = 1.
Now use equation (5.2) again (with t = 1) to get that $n = q\,2^{k-1} = (2^k - 1)\,2^{k-1}$
has the required form. Furthermore, the same equation says
that $\sigma(q) = \sigma(2^k - 1) = 2^k$, which proves that $2^k - 1$ is prime.
It is unknown at the date of this writing (2021) whether any odd perfect
numbers exist.
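The characterization of even perfect numbers can be checked for small cases. The sketch below compares a brute-force search with the numbers $(2^k - 1)\,2^{k-1}$ for which $2^k - 1$ is prime; the helper names and the search bound 10000 are choices made for this illustration only.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def sigma(n):
    """Sum of all positive divisors of n."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


# Even perfect numbers below 10000, found by testing sigma(n) = 2n directly.
brute_force = [n for n in range(2, 10000, 2) if sigma(n) == 2 * n]

# Numbers of the form (2^k - 1) * 2^(k-1) with 2^k - 1 prime, for k = 2, ..., 7.
from_formula = [(2 ** k - 1) * 2 ** (k - 1)
                for k in range(2, 8) if is_prime(2 ** k - 1)]

print(brute_force)
print(from_formula)
```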
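Multiplication in $Z_5$:

  ×  |  0  1  2  3  4
  ---+---------------
  0  |  0  0  0  0  0
  1  |  0  1  2  3  4
  2  |  0  2  4  1  3
  3  |  0  3  1  4  2
  4  |  0  4  3  2  1

Multiplication in $Z_6$:

  ×  |  0  1  2  3  4  5
  ---+------------------
  0  |  0  0  0  0  0  0
  1  |  0  1  2  3  4  5
  2  |  0  2  4  0  2  4
  3  |  0  3  0  3  0  3
  4  |  0  4  2  0  4  2
  5  |  0  5  4  3  2  1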
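The two multiplication tables can be regenerated, and the invertible elements read off, with a short script. This is a minimal sketch; the function names are assumptions for illustration. An element is invertible precisely when a 1 occurs in its row.

```python
def multiplication_table(b):
    """Return the table of i*j modulo b for i, j in 0, ..., b-1."""
    return [[(i * j) % b for j in range(b)] for i in range(b)]


def invertible_elements(b):
    """Elements of Z_b whose row in the multiplication table contains a 1."""
    table = multiplication_table(b)
    return [i for i in range(b) if 1 in table[i]]


for b in (5, 6):
    print(f"Multiplication in Z_{b}:")
    for i, row in enumerate(multiplication_table(b)):
        print(i, "|", *row)
    print("invertible:", invertible_elements(b))
```

For the prime modulus 5 every non-zero element is invertible, whereas for the composite modulus 6 only 1 and 5 are.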
The optimistic reader might be inclined to think that maybe not all is
lost, as long as things work for the most important number system, Z itself.
Alas, a moment’s thought reveals that multiplication in Z, like multiplica-
tion in Zb for b non-prime, does not have an inverse. Thus our hand is
forced, and we define a structure where addition has all the nice properties
— in particular, it has an inverse — and where we are a bit more prudent in
assigning the characteristics of multiplication.
Definition 5.20. A ring is defined as a set R which is closed under two op-
erations, usually called addition and multiplication, and has the following
properties:
i) R with addition is an Abelian group (with additive identity 0).
ii) Multiplication in R is associative (see exercise 5.23).
iii) Multiplication is distributive over addition (that is: a(b + c) = ab + ac
and (b + c)a = ba + ca).
iv) R has a (multiplicative) identity denoted by 1 and 0 ≠ 1.
A commutative ring is a ring in which multiplication is commutative.
Remark 5.21. Note that N is not a ring, because addition is not invertible.
We will from here on out consider the primes as a subset of Z.
Remark 5.22. We will assume rings to be commutative and drop the ad-
jective “commutative” for brevity, unless needed for clarity.
Remark 5.23. The requirement that 0 ≠ 1 only excludes the 0 ring (R = {0}).
Remark 5.24. An important example of an “almost ring” are the multiples
nZ in Z for n > 1. Indeed, that set satisfies all the requirements of a ring
5.4. A Divisive Issue: Rings and Fields 89
The sets Z, Q, and $Z_b$ are all examples of rings, but of these only Q and
$Z_p$ with p prime are fields, because all non-zero elements are invertible as we
saw in Proposition 5.18. The field of the integers modulo a prime p will from
now on be denoted by $F_p$, where p is understood to be a prime.
Rings and fields occur in all kinds of other situations and applications.
To mention one unexpected example, we already looked at the ring of
arithmetic functions with addition and convolution as operations (exercise
4.15). Here are some other examples of rings that are not fields: real
numbers of the form $a + b\sqrt{3}$ where a and b are in Z, and complex numbers
of the form $a + ib$ or those of the form $a + ib\sqrt{6}$ where a and b are in Z.
Other examples are the n by n matrices (n ≥ 2). We have already seen the
polynomials with rational coefficients (exercise 3.22). They also form a
ring. All of these rings have different properties. For instance,
the ring of n by n matrices is not commutative. We will see later that not all
rings (that are not fields) have primes.
It is useful to reflect a moment on how the absence of division influ-
ences how we think about such sets. It is precisely that curious absence that
brings us to the study of primes, integers that have no non-trivial divisors at
all. The situation in fields like $Z_p$ (for prime p) or R is very different! Here
multiplication does have an inverse, and thus given a and b not equal to 0,
we can always write a as a non-trivial product as follows:
$a =_p (ab)\,b^{-1}$.
Proof. We have
$a^2 =_p 1 \iff a^2 - 1 =_p (a + 1)(a - 1) =_p 0 \iff p \mid (a + 1)(a - 1)$.
Because p is prime, Corollary 2.9 says that the last statement holds if and
only if either $p \mid a + 1$ (and so $a =_p -1$) or $p \mid a - 1$ (and so $a =_p +1$).
Proof. This is true for p = 2 and p = 3. If p > 3, then Proposition 5.18 (3) and
Lemma 5.28 imply that every factor $a_i$ in the product (p − 1)! other than −1
or 1 has a unique inverse $a_i'$ different from itself. The factors $a_i'$ run through
all factors 2 through p − 2 exactly once. Thus in the product, we can pair
each $a_i$ different from ±1 with an inverse $a_i'$ distinct from itself. This gives
$(p - 1)! =_p (+1)(-1) \prod a_i a_i' =_p -1$.
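The pairing argument in this proof can be made concrete. The sketch below lists the pairs of mutually inverse factors for a given prime and checks the value of (p − 1)! modulo p; the function names are illustrative, and the three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
from math import factorial


def inverse_pairs(p):
    """Pair each factor a in 2, ..., p-2 with its inverse modulo the prime p."""
    pairs, seen = [], set()
    for a in range(2, p - 1):
        if a not in seen:
            inverse = pow(a, -1, p)      # modular inverse of a
            pairs.append((a, inverse))
            seen.update((a, inverse))
    return pairs


def wilson_residue(n):
    """(n - 1)! reduced modulo n."""
    return factorial(n - 1) % n


print(inverse_pairs(11))
print([wilson_residue(p) == p - 1 for p in (2, 3, 5, 7, 11, 13)])
```

For p = 11 the factors 2 through 9 split into four pairs whose products are all 1 modulo 11, leaving only the factors 1 and 10, that is, +1 and −1.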
5.6. Exercises
Exercise 5.1. a) Let m > 0. Show that $a =_m b$ is an equivalence relation
on Z. (Use Definitions 1.7 and 1.27.)
b) Describe the equivalence classes of Z modulo 6. (Which numbers in Z
are equivalent to 0? Which are equivalent to 1? Et cetera.)
c) Show that the equivalence classes are identified by their residue, that is:
a ∼ b if and only if $\mathrm{Res}_m(a) = \mathrm{Res}_m(b)$.
Note: If we pick one element of each equivalence class, such an element
is called a representative of that class. The smallest non-negative represen-
tative of a residue class in Zm , is called the least residue (see Definition
1.8). The collection consisting of the smallest non-negative representative
of each residue class is called a complete set of least residues.
Exercise 5.2. This exercise relies on exercise 5.1. Denote the set of
equivalence classes of Z modulo m by $Z_m$ (see Definition 1.7). Prove that addition
and multiplication are well-defined in $Z_m$, using the following steps.
a) If $a =_m a'$ and $b =_m b'$, then $\mathrm{Res}_m(a) + \mathrm{Res}_m(b) =_m \mathrm{Res}_m(a') + \mathrm{Res}_m(b')$.
(Hint: show that a + b = c if and only if $a + b =_m c$. In other
words: the sum modulo m only depends on $\mathrm{Res}_m(a)$ and $\mathrm{Res}_m(b)$ and not
on which representative in the class (see exercise 5.1) you started with.)
b) Do the same for multiplication.
Exercise 5.4. Let $n = \sum_{i=0}^{k} a_i 10^i$ where $a_i \in \{0, 1, 2, \dots, 9\}$. Follow the
strategy in exercise 5.3 to prove the following facts.
a) Show that n is divisible by 5 if and only if $a_0$ is. (Hint: Show that
$n =_5 a_0$.)
b) Show that n is divisible by 2 if and only if $a_0$ is.
c) Show that n is divisible by 9 if and only if $\sum_{i=0}^{k} a_i$ is.
d) Show that n is divisible by 11 if and only if $\sum_{i=0}^{k} (-1)^i a_i$ is.
e) Find the criterion for divisibility by 4.
f) Find the criterion for divisibility by 7. (Hint: this is a more complicated
criterion!)
Exercise 5.10. a) For i in $\{1, 2, \dots, 11\}$ and j in $\{2, 3, \dots, 11\}$, make a table
of $\mathrm{Ord}^{\times}_j(i)$, i varying horizontally. After the jth column, write φ(j).
b) List the primitive roots i modulo j for i and j as in (a). (Hint: the
smallest primitive roots modulo j are: {1, 2, 3, 2, 5, 3, ∅, 2, 3, 2}.)
Exercise 5.11. We show that the 5-th Fermat number, $2^{32} + 1$, is a
composite number.
a) Show that $2^4 =_{641} -5^4$. (Hint: add $2^4$ and $5^4$.)
b) Show that $2^7 \cdot 5 =_{641} -1$.
c) Show that $2^{32} + 1 = (2^7)^4\, 2^4 + 1 =_{641} 0$.
d) Conclude that $F_5$ is divisible by 641.
Exercise 5.15. Assume the setting of exercise 5.14. Assume p and q are
such that the encryption is invertible. What is the decryption algorithm?
Prove it. (Hint: find $r \in \{0, \dots, q - 1\}$ such that $rp =_q 1$. Then multiply the
encryption by r.)
Exercise 5.16. Work out the last two problems if we encrypt using an affine
cipher (a, p) . That is, the encryption on the alphabet {0, · · · q − 1} is done
as follows:
i → a + pi mod q
Work out when this can be inverted, and what the algorithm for the inverse
is.
Exercise 5.17. Decrypt the code V’ir Tbg n Frperg.
p − 1 and equals 1 if i = 0 or i =
p.
b) Evaluate 4i mod 4 and 6i mod 6. So where in (a) did you use the
fact that p is prime?
c) Use (a) and the binomial theorem to show that if p is prime, then we
have $(a + b)^p =_p a^p + b^p$.
Exercise 5.20. In this exercise, we prove Lemma 5.31. For this purpose,
abbreviate $\mathrm{Ord}^{\times}_n(a)$ by o and assume the condition of the lemma.
a) Show that n − 1 = o j for some j ∈ N.
b) Show that if j > 1 in (a), there is a prime p dividing j such that
$a^{(n-1)/p} =_n a^{o(j/p)} =_n 1$.
c) Show that j = 1 and so $o = \mathrm{Ord}^{\times}_n(a) = n - 1$.
d) Show that (c) implies the lemma. (Hint: use Euler.)
e) Use the lemma to show that 997 is prime. (Hint: 996 has prime divisors
2, 3, and 83.)
Theorem 3.13 and exercise 3.18 show how to solve linear congruences gen-
erally. Quadratic congruences are much more complicated. As an example,
we look at the equation $x^2 =_p \pm 1$ in the following exercise.
Exercise 5.21. a) Show that Fermat’s little theorem gives a solution of
$x^2 - 1 =_p 0$ whenever p is an odd prime. (Hint: consider $x^{\frac{p-1}{2}}$.)
b) Use Lemma 5.28 to show that $x^{\frac{p-1}{2}} =_p \pm 1$.
c) Show that Wilson’s theorem implies that for odd primes p
$(-1)^{\frac{p-1}{2}} \left[ \left( \frac{p-1}{2} \right)! \right]^2 =_p -1$.
(Hint: the left-hand side gives all reduced residues modulo p.)
d) Use (c) to show that if $p =_4 1$ (examples are 13, 17, 29, etc), then
$\left( \frac{p-1}{2} \right)!$ satisfies the quadratic congruence $x^2 + 1 =_p 0$.
e) Show that if $p =_4 3$ (examples are 7, 11, 19, etc), then the quadratic
congruence $x^2 + 1 =_p 0$ has no solutions. (Hint: we have $x^4 =_p 1$ and by
Euler $x^{\varphi(p)} =_p 1$; derive a contradiction if $p =_4 3$.)
Exercise 5.22. Given b > 2, let $R \subseteq Z_b$ be the reduced set of residues and
let $S \subseteq Z_b$ be the set of solutions in $Z_b$ of $x^2 =_b 1$ (or self inverses).
a) Show that S ⊆ R. (Hint: Bézout.)
b) Show that
$\prod_{x \in R} x =_b \prod_{x \in S} x$ ($=_b 1$ if S is empty).
c) Show that if S contains a, then it contains −a.
d) Show that if $a =_b -a$, then a and −a are not in S.
e) Show that
$\prod_{x \in R} x =_b (-1)^m$ for some m.
f) Show that
$\prod_{x \in R} x =_b (-1)^{|S|/2}$.
g) Compute $\prod_{x \in R} x$ in a few cases (b = 6, 8), and verify that (f) holds.
Definition 5.32. The nth Catalan number $C_n$ equals
$\frac{1}{n+1} \binom{2n}{n} = \binom{2n}{n} - \binom{2n}{n+1}$.
Exercise 5.24. Show that the following sets with the usual additive and
multiplicative operations are not fields:
a) The numbers $a + b\sqrt{3}$ where a and b are in Z.
b) The numbers of the form $a + ib\sqrt{6}$ where a and b are in Z.
c) $Z_6$.
d) The 2 by 2 real matrices.
e) The polynomials with rational coefficients.
f) The Gaussian integers, i.e. the numbers a + bi where a and b are in Z.
(Hint: in each case, exhibit at least one element that does not have a
multiplicative inverse.)
Chapter 6
Continued Fractions
In the exercises 3.20 and 3.21, we indicated by example how the Gauss map
is related to the Euclidean algorithm.
the Euclidean algorithm (see equation (3.2)) while making sure that the
sequence of the ai in Definition 6.4 below starts with a1 .
At any rate, with these conventions, the equations of Lemma 6.2 become:
$\omega_i = \frac{1}{\omega_{i-1}} - a_{i-1} = T(\omega_{i-1})$ and $\omega_{i-1} = \frac{1}{a_{i-1} + \omega_i}$. (6.1)
The way one thinks of this is as follows. The first equation defines a
dynamical system (see footnote 1). Namely, given an initial value $\omega_1 \in [0, 1)$, the
repeated application of T gives a string of positive integers $\{a_1, a_2, \dots\}$. The
string ends only if after n steps $\omega_n = \frac{1}{\ell}$, and so $\omega_{n+1} = 0$. We show in
Theorem 6.5 that this happens if and only if $\omega_1$ is rational. The ℓth branch
of T, depicted in Figure 17, has $I_\ell = (\frac{1}{\ell+1}, \frac{1}{\ell}]$ as its domain. It is easy to see
that $a_i = \ell$ precisely if $\omega_i \in I_\ell$.
If, on the other hand, the $\{a_i\}$ are given, then we can use the second
equation to formally (see footnote 2) derive a possibly infinite quotient that
characterizes $\omega_1$. For, in that case, we have
$\omega_1 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$. (6.2)
The expression stops after n steps, if $\omega_{n+1} = 0$. Else the expression
continues forever, and we can only hope that it converges to a limit. We now
give some definitions.
Definition 6.4. Let $\omega_1 \in [0, 1]$. The expression
$\omega_1 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \overset{\mathrm{def}}{\equiv} [a_1, a_2, a_3, \dots]$ .
1 A dynamical system is basically a rule that describes short term changes. Usually the purpose of
studying such a system is to derive long term behavior, such as, in this case, deciding whether the sequence
$\{a_i\}$ is finite, periodic, or neither.
2 Here, “formally” means that we have an expression for $\omega_1$, but (1) we don’t yet know if the actual
computation of that expression converges to that number, and on the other hand (2) we “secretly” do know
that it converges, or we would not bother with it.
are called the continued fraction convergents (or continued fraction approximants)
of ω1 . The coefficients ai are called the continued fraction coefficients.
Proof. If ω is rational, then by Lemma 6.2 and Corollary 3.2, the algorithm
ends. On the other hand, if the expansion is finite, namely [a1 , a2 , · · · , an ],
then, from equation (6.2), we see that ω is rational.
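For a rational starting value the procedure is easy to run with exact arithmetic. The following minimal sketch iterates the Gauss map on a `Fraction` and collects the coefficients; the function names `gauss_map` and `continued_fraction` are assumptions for illustration, and the convention T(0) = 0 is adopted only so that the loop can stop.

```python
from fractions import Fraction


def gauss_map(w):
    """T(w) = 1/w - floor(1/w) for a rational w in (0, 1]; T(0) is taken to be 0."""
    if w == 0:
        return Fraction(0)
    inverse = 1 / w
    return inverse - (inverse.numerator // inverse.denominator)


def continued_fraction(w):
    """Coefficients [a1, a2, ...] of a rational number w in [0, 1)."""
    coefficients = []
    while w != 0:
        inverse = 1 / w
        coefficients.append(inverse.numerator // inverse.denominator)
        w = gauss_map(w)
    return coefficients


print(continued_fraction(Fraction(9, 13)))
print(continued_fraction(Fraction(20, 29)))
```

Because the denominators strictly decrease, the loop ends after finitely many steps for every rational input, in line with the first half of the proof.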
Theorem 6.6. For the continued fraction convergents, we have
where
$A_i = \begin{pmatrix} a_i & 1 \\ 1 & 0 \end{pmatrix}$ and $A_i^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & -a_i \end{pmatrix}$.
$p_{k+1} = p_k + \frac{1}{a_{k+1}}\, p_{k-1}$,
$q_{k+1} = q_k + \frac{1}{a_{k+1}}\, q_{k-1}$.
The quotient $\frac{p_{k+1}}{q_{k+1}}$ does not change if we multiply only the right-hand sides
of these equations by $a_{k+1}$ to ensure that both $p_{k+1}$ and $q_{k+1}$ are integers.
This gives the result.
Proof. The left-hand side of the expression in (i) equals the determinant of
$\begin{pmatrix} q_n & p_n \\ q_{n-1} & p_{n-1} \end{pmatrix}$, which, by Theorem 6.6, must equal the determinant of
$A_n \cdots A_2 A_1$. Finally, each $A_i$ has determinant −1. To get the second equation,
divide the first by $q_{n-1} q_n$.
Corollary 6.8. We have
(i) $p_n \geq 2^{\frac{n-1}{2}}$ and $q_n \geq 2^{\frac{n-1}{2}}$.
(ii) $\gcd(p_n, q_n) = 1$.
numbers has a limit3, the decreasing sequence has a limit ω− . Similarly, the
increasing sequence must have a limit ω+ . Now we use Corollary 6.7(ii)
again to see that for all n, the difference between the two cannot exceed
$\frac{1}{q_{n-1} q_n}$. So $\omega_+ = \omega_- = \omega$.
Corollary 6.10. Suppose ω is irrational. For every n > 0, we have
$\frac{p_{2n}}{q_{2n}} < \omega < \frac{p_{2n-1}}{q_{2n-1}}$. If ω is rational, the same happens, until we obtain
equality of ω and the last convergent.
i : 0 1 2 3 4 5 ···
ai : - 1 2 4 2 4 ···
pi : 0 1 2 9 20 89 ···
qi : 1 1 3 13 29 129 ···
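The entries of this table follow from the recursion $p_{k+1} = a_{k+1} p_k + p_{k-1}$, $q_{k+1} = a_{k+1} q_k + q_{k-1}$ with the starting values read off from the first two columns. A minimal sketch, in which the function name `convergents` is an assumption for illustration:

```python
def convergents(coefficients):
    """Return the lists p and q, starting from p0, q0 = 0, 1 and p1, q1 = 1, a1."""
    p, q = [0, 1], [1, coefficients[0]]
    for a in coefficients[1:]:
        p.append(a * p[-1] + p[-2])
        q.append(a * q[-1] + q[-2])
    return p, q


p, q = convergents([1, 2, 4, 2, 4])
print("p:", p)
print("q:", q)

# The determinant q_n p_{n-1} - p_n q_{n-1} alternates between -1 and +1,
# as expected for a product of n matrices of determinant -1.
print([q[n] * p[n - 1] - p[n] * q[n - 1] for n in range(1, len(p))])
```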
But, because the ai are eventually periodic, we can also opt for a more
explicit representation of ω1 . The periodic tail can be easily analyzed. In-
deed, let
$x = \cfrac{1}{2 + \cfrac{1}{4 + \cfrac{1}{2 + \cdots}}} \implies x = \cfrac{1}{2 + \cfrac{1}{4 + x}}$
After some manipulation, this simplifies to a quadratic equation for x with
one root in [0, 1).
$x^2 + 4x - 2 = 0 \implies x = -2 \pm \sqrt{6}$ .
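The manipulation can be spelled out as follows. The intermediate steps are not given in the text and are added here only as a worked check of the quadratic equation.

```latex
x = \cfrac{1}{2 + \cfrac{1}{4 + x}}
  = \frac{4 + x}{2(4 + x) + 1}
  = \frac{4 + x}{9 + 2x}
\;\Longrightarrow\; 2x^2 + 9x = 4 + x
\;\Longrightarrow\; x^2 + 4x - 2 = 0 .
```

Of the two roots, only $-2 + \sqrt{6} \approx 0.449$ lies in [0, 1).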
6.4. The Geometric Theory of Continued Fractions 107
Figure 18. The line y = ωx and (in red) successive iterates of the rota-
tion Rω . Closest returns in this figure are q in {2, 3, 5, 8}.
Theorem 6.13 (The closest return property). $\frac{p'}{q'}$ is a continued fraction
convergent if and only if
$|\omega_1 q' - p'| < |\omega_1 q - p|$ for all $0 < q < q'$.
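The closest return property can be observed numerically. The sketch below uses the number with coefficients 1, 2, 4, 2, 4, ... from the table earlier in this chapter, that is, 1/(1 + x) with $x = -2 + \sqrt{6}$, and records every q at which the distance from qω to the nearest integer sets a new record. It relies on floating point arithmetic, and the function name `closest_returns` is an assumption for illustration.

```python
from math import sqrt


def closest_returns(omega, q_max):
    """Values q <= q_max at which min over p of |omega*q - p| is a new record."""
    records = []
    best = float("inf")
    for q in range(1, q_max + 1):
        distance = abs(omega * q - round(omega * q))
        if distance < best:
            best = distance
            records.append(q)
    return records


omega = 1 / (sqrt(6) - 1)       # equals 1/(1 + x) with x = -2 + sqrt(6)
print(closest_returns(omega, 130))
```

The records occur at q = 1, 3, 13, 29, 129, which are exactly the denominators $q_1, \dots, q_5$ in that table.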
of $a_{n+1} + 1$ integer translates of the previous and one can check that it
inherits this property (Figure 20).
Next we show, again by induction, that the $R_\omega^{q_n}$ are closest returns, and
that there are no others. It is trivial that $R_\omega^{1}$ is the only closest return for
q = 1. It is easy to see that $R_\omega^{a_1}$ is the only closest return (see footnote 5)
for $0 < q \leq a_1$. Now suppose that up to $q = q_n$ the only closest returns are
$e_i$, i ≤ n. We have to prove that the next closest return is $e_{n+1}$. By Lemma 6.12,
$d_{n+1} < d_n$. Now we only need to prove that there are no closest returns
for q in $\{q_n + 1, q_n + 2, \dots, q_{n+1} - 1\}$. To that purpose we consider Figure
20. With the exception of the origin and the endpoints of $e_n$ and $e_{n+1}$, the
shaded regions in the figure are contained in the interior of translates of the
parallelogram $p(e_{n-1}, e_n)$, and therefore contain no lattice points. Since the
vector c is parallel to and larger than $e_n$, we have that b > a. Thus there is
a band of width $d_n$ around $y = \omega_1 x$ that contains no points in $Z^2$ except the
origin, $e_n$, and $e_{n+1}$.
and $x_2 \in [0, 1)$, we can conclude that $x_1$ lies on the $a_1$-branch of T defined on
$(\frac{1}{a_1 + 1}, \frac{1}{a_1}]$ that contains $x_1$, see Figure 17. More precisely, if $b_1 : I_1 \to [0, 1)$
is the branch of T such that $x \in I_1$, then the end point of $I_1$ that maps to zero
under T is the first convergent. It is this statement we wish to generalize.
To get an idea what iterates of T look like, let’s have a look at $T^2$ in
Figure 21. T has a countable collection of branches. Each branch maps
onto [0, 1). Thus $T^2$ has countably many branches for every single branch
b : I → [0, 1) of T. In turn, each of the branches of $T^2$ also maps onto [0, 1).
And so forth.
Proposition 6.14. Let $b_k : I_k \to [0, 1)$ be the branch of $T^k$ such that $x \in I_k$,
then the kth convergent $p_k/q_k$ of x is the (unique) end point of $I_k$ that maps
to zero under $T^k$.
5 By definition of $a_1$, the first time $q\omega_1$ is within $\omega_1$ of a natural number is when $q = a_1$.
Figure 21. A few branches of the twice iterated Gauss map $T^2$. The
points $T^{-2}(0)$ are marked in red. The reader should compare this plot
to Figure 17.
Proof. From the expression given in Definition 6.4 for $\frac{p_n}{q_n} = [a_1, a_2, \dots, a_n]$,
we see that $T([a_1, a_2, \dots, a_n]) = [a_2, \dots, a_n]$. Continuing by induction, we
find
$T^n([a_1, a_2, \dots, a_n]) = T^{n-1}([a_2, \dots, a_n]) = \cdots = T([a_n]) = 0$.
So the nth convergent is indeed an nth pre-image of 0 under T.
Similarly, (6.1) implies that
$x = [a_1, a_2, \dots] = [a_1, a_2, \dots, a_n, a_{n+1}, \dots] = [a_1, a_2, \dots, (a_n + x_{n+1})]$.
Since $x_{n+1} \in [0, 1)$, this is a single branch whose domain contains x.
odd k the convergents (the zeroes of the branches) are on the right of the
interval of definition of the branch they belong to, and for even k they
are on the left side. This is convenient, because it implies that x is always
‘sandwiched’ between two successive convergents.
6.7. Exercises
Exercise 6.1. Give the continued fraction expansion of $\frac{13}{31}$, $\frac{21}{34}$, $\frac{34}{21}$,
$\frac{n-1}{n}$ for n > 1, $\frac{n-1}{n^2}$ for n > 1 by following the steps in Section 6.3.
Exercise 6.2. Verify the continued fraction expansion of $\sqrt{2} \approx 1.4$ given
in the text by following the steps in Section 6.3.
Exercise 6.3. a) Find the continued fraction expansion of the fixed points
(i.e. solutions of T (x) = x for T in Definition 6.1) of the Gauss map.
b) Use the continued fractions in (a) to find quadratic equations for the
fixed points in (a).
c) Derive the same equations from T (x) = x.
d) Give the positive solutions of the quadratic equations in (b) and (c).
Exercise 6.4. Compute the continued fraction expansion for $\sqrt{n}$ for n
between 1 and 15.
Exercise 6.5. Given the following continued fraction expansions, deduce
a quadratic equation for x. (Hint: see Section 6.3.)
a) $x = [\overline{8}] = [8, 8, 8, 8, 8, \dots]$.
b) $x = [3, \overline{6}] = [3, 6, 6, 6, 6, \dots]$.
c) $x = [\overline{1, 2, 3}] = [1, 2, 3, 1, 2, 3, \dots]$.
d) $x = [4, 5, \overline{1, 2, 3}] = [4, 5, 1, 2, 3, 1, 2, 3, \dots]$.
Exercise 6.7. Derive a quadratic equation for the number with continued
fraction expansion: [n], [m, n], [n, m], [a, b, n, m].
Exercise 6.8. From the expressions given in Section 6.2, compute the first
6 convergents of π − 3, e − 2, θ , and g.
Exercise 6.9. In exercise 6.8, numerically check how close the nth conver-
gent of ω is to the actual value of ω.
b) Compare your answer to (a) with the decimal expansion approximation
using i digits.
Exercise 6.12. What does the matrix in Theorem 6.6 correspond to in terms
of the Euclidean algorithm of Chapter 3?
Exercise 6.14. Check Theorem 6.13 for the continued fraction convergents
in exercise 6.9.
Figure 22. Black: thread from origin with golden mean slope; red:
pulling the thread down from the origin; green: pulling the thread up
from the origin.
Figure 23. The placement of x between its convergents $p_n/q_n$ and $p_{n+1}/q_{n+1}$.
Exercise 6.17. Use exercise 6.16 to generate bounds for the errors com-
puted in exercise 6.9. Compare your answers.
Exercise 6.22. Consider Figure 24. The first plot contains the points
$\{(n, n)\}_{n=1}^{50}$ in standard polar coordinates, the first coordinate denoting the
radius and the second, the angle with the positive x-axis in radians. The
next plots are the same, but now for n ranging from 1 to 180, 330, and
2000, respectively.
a) Determine the first 4 continued fraction convergents of 2π.
b) Use a) to explain why we appear to see 6, 19, 25, and 44 spiral arms.
c) Why does the curvature of the spiral arms appear to (a) alternate and (b)
decrease?
Exercise 6.23. The exercise depends on exercise 6.22. Suppose we restrict
the points plotted in that exercise to primes (in N) only. Consider the last
plot (with 44 spiral arms) of Figure 24.
a) Show that each spiral arm corresponds to a residue class i modulo 44.
b) Show that if gcd(i, 44) > 1, that arm contains no primes (except possibly
i itself), see the left plot of Figure 25.
c) Use Theorem 6.16 to show that the primes tend (as max p → ∞) to be
equally distributed over the co-prime arms.
d) Use Theorem 4.17 to determine the number of co-prime arms. Confirm
this in the left plot of Figure 25.
e) Explain the new phenomenon occurring in the right plot of Figure 25.
The following result will be proved in Chapter 13.
Theorem 6.16 (Prime Number Theorem for Arithmetic Progressions).
For given n, denote by r any of its reduced residues. Let π(x; n, r) stand for
the number of primes p less than or equal to x such that Resn (p) = r. Then
$\lim_{x \to \infty} \frac{\pi(x; n, r)}{\pi(x)} = \frac{1}{\varphi(n)}$ .
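The theorem can be illustrated, though of course not proved, by counting primes in each reduced residue class. The sketch below does this for n = 44 and x = 30000, the values that appear in the exercises above; the function names are assumptions for illustration.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes less than or equal to n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # cross out the multiples of i, starting at i*i
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def residue_counts(x, n):
    """Number of primes p <= x in each reduced residue class r modulo n."""
    counts = {r: 0 for r in range(n) if gcd(r, n) == 1}
    for p in primes_up_to(x):
        if p % n in counts:
            counts[p % n] += 1
    return counts


counts = residue_counts(30000, 44)
total = len(primes_up_to(30000))
for r, count in counts.items():
    print(r, count, round(count / total, 4))
print("1/phi(44) =", 1 / len(counts))
```

Each printed ratio should be compared with 1/φ(44) = 1/20; the only primes that fall outside the reduced residue classes are 2 and 11, the prime divisors of 44.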
Figure 24. Plots of the points (n, n) in polar coordinates, for n ranging
from 1 to 50, 180, 330, and 3000, respectively.
Figure 25. Plots of the prime points (p, p) (p prime) in polar coordi-
nates with p ranging between 2 and 3000, and between 2 and 30000,
respectively.
Currents in Number
Theory: Algebraic,
Probabilistic, and Analytic
Chapter 7
Fields, Rings, and Ideals
Definition 7.1. A ring R[x] of polynomials is the set of polynomials with co-
efficients in a (commutative) ring R without zero divisors2 (unless otherwise
mentioned).
Without the extra requirements, the resulting ring would have very
strange properties indeed. For example, if R consists of the integers modulo
6, then, indeed, very strange factorizations can happen:
$(2x - 3)(3x + 2) =_6 6x^2 - 5x - 6 =_6 x$ .
So, in particular, the degree of the product is not equal to the sum of the de-
grees of the factors. Dropping commutativity would lead to another strange
problem. Given f ∈ R[x], we may want to evaluate f at c ∈ R by substituting
the value c for x. Suppose for example that R is the non-commutative ring
of 2 by 2 matrices. Set for some a ∈ R,
$f(x) = (x - a)(x + a) = x^2 - a^2$ .
But if we substitute another 2 by 2 matrix c for x such that the matrices
a and c do not commute, then the above equality does not hold anymore.
However, if R satisfies the two requirements, one can prove that the result-
ing polynomial ring has no zero divisors, evaluations are safe, and that the
degree of a product is additive (see [27][sections 8.5 and 8.6] for details).
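The strange factorization modulo 6 mentioned above is easy to reproduce. In this minimal sketch a polynomial is stored as a list of coefficients with the constant term first; that representation and the function name `poly_mul_mod` are assumptions made for illustration.

```python
def poly_mul_mod(f, g, m):
    """Multiply two polynomials (coefficient lists, constant term first) modulo m."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] = (product[i + j] + a * b) % m
    while len(product) > 1 and product[-1] == 0:   # drop vanishing leading terms
        product.pop()
    return product


# (2x - 3)(3x + 2) over the integers modulo 6
print(poly_mul_mod([-3, 2], [2, 3], 6))
```

The result [0, 1] represents the polynomial x: two factors of degree one multiply to a polynomial of degree one, because the leading coefficients 2 and 3 are zero divisors modulo 6.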
Definition 7.2. Recall (Definition 1.17) that f is a minimal polynomial in
R[x] for ρ if f is a non-zero polynomial in R[x] of minimal degree such that
f(ρ) = 0. A polynomial f in R[x] of positive degree is irreducible over R
if it cannot be written as a product of two polynomials in R[x] with positive
degree. A polynomial f in R[x] is prime over R if, whenever f divides gh
(g and h in R[x]), it must divide g or h.
Definition 7.3. Let f and g in R[x]. The greatest common divisor of f and
g, or gcd( f , g), is a polynomial in R[x] with maximal degree that is a factor
of both f and g. The least common multiple of f and g, or lcm ( f , g), is a
polynomial in R[x] with minimal degree that has both f and g as factors.
Remark 7.4. If p is minimal for ρ, it must be irreducible, because if not,
one of its factors with smaller degree would also have ρ as a root.
2This means that if for a, b in R, we have that ab = 0, then a = 0 or b = 0, see Definition 8.4.
7.1. Rings of Polynomials 125
It turns out that in the special case where the coefficients of the poly-
nomials are taken from a field F, the result is a ring F[x] that is very rem-
iniscent of the trusty old ring Z. The underlying reason for this similarity
is that in F[x], the division algorithm works (see exercise 7.1): given $r_1$ and
$r_2$, there are $q_2$ and $r_3$ such that (see footnote 3)
$r_1 = r_2 q_2 + r_3$ with $\deg(r_3) < \deg(r_2)$ .
Recall that the gcd of two polynomials in F[x] can be computed by factor-
ing both polynomials and multiplying together the common factors to the
lowest power as in the proof of Corollary 2.23. Since factoring polynomials
is hard, it is often easier to just use the Euclidean algorithm. An example is
given in exercise 3.22. The relation between lcm and gcd of two polynomi-
als is the same as in the proof of Corollary 2.23. The minimal polynomials
of F[x] are “like” the primes in Z. We will see later that this implies unique
factorization, and that primes and irreducibles are the same4. We give a few
properties that will be immediately useful5
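The division algorithm and the resulting Euclidean algorithm for polynomials over Q can be sketched with exact rational arithmetic. Polynomials are stored as coefficient lists with the constant term first; this representation and the names `trim`, `poly_divmod`, and `poly_gcd` are assumptions for illustration, and the gcd is normalized to have leading coefficient 1.

```python
from fractions import Fraction


def trim(f):
    """Remove vanishing leading coefficients; the zero polynomial becomes []."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f


def poly_divmod(r1, r2):
    """Return (q, r3) with r1 = r2*q + r3 and deg(r3) < deg(r2)."""
    r1 = trim([Fraction(c) for c in r1])
    r2 = trim([Fraction(c) for c in r2])
    if not r2:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r1) - len(r2) + 1, 1)
    r = list(r1)
    while len(r) >= len(r2):
        shift = len(r) - len(r2)
        factor = r[-1] / r2[-1]          # cancels the leading term of r
        q[shift] = factor
        for i, c in enumerate(r2):
            r[shift + i] -= factor * c
        r = trim(r)
    return trim(q), r


def poly_gcd(f, g):
    """Euclidean algorithm; returns the monic greatest common divisor."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        _, remainder = poly_divmod(f, g)
        f, g = g, remainder
    return [c / f[-1] for c in f]


print(poly_divmod([-1, 0, 0, 1], [-1, 1]))     # (x^3 - 1) divided by (x - 1)
print(poly_gcd([-1, 0, 1], [1, -2, 1]))        # gcd of x^2 - 1 and x^2 - 2x + 1
```

Division by the leading coefficient of the divisor is what requires the coefficients to come from a field; the same loop would fail over Z.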
Proposition 7.5. Given ρ ∈ C and p ∈ F[x] so that p(ρ) = 0.
i) p is minimal for ρ if and only if p is irreducible.
ii) If p is minimal, it has no repeated roots.
The latter is of lower degree and still has a root α. This contradicts the
minimality of p.
Proof. We paraphrase the proof of Lemma 2.5 with “degree” replacing “ab-
solute value”. Let S and ν(S) be the sets:
Next, we present a result that holds for more general rings of the form
R[x] (though not for all). For simplicity, however, we give the result for Z[x].
It says that if we can factor a polynomial in Z[x] as a product of polynomials
with rational coefficients, then, in fact, those coefficients are integers.
Lemma 7.8 (Gauss’ Lemma). Let $A_\ell \in Z$, and $b_i, c_j \in Q$. If
$\sum_{\ell=0}^{m+n} A_\ell x^\ell = \left( \sum_{i=0}^{m} b_i x^i \right) \left( \sum_{j=0}^{n} c_j x^j \right)$,
then $b_i, c_j \in Z$.
We now show that ABC = 1 and so all three are ±1. Given any prime p in
Z, let r be the minimum of the indices i such that $p \nmid B b_i$, and s the minimum
of the indices j such that $p \nmid C c_j$. From the way the coefficient $ABC\,a_{r+s}$ is
computed, see Figure 26, it immediately follows that $p \nmid ABC\,a_{r+s}$. Since
we can do this for any prime p, the result follows.
Figure 26. $ABC\,a_{r+s}$ is the sum of the $B b_i\, C c_j$ along the green line in
the i − j diagram. The red lines indicate where $p \nmid B b_i$ and $p \nmid C c_j$. So
all contributions except $B b_r\, C c_s$ are divisible by p. Thus $p \nmid ABC\,a_{r+s}$.
We end this section with a note on some notation that can be confusing.
We can “adjoin” x to a ring R in two ways. If we use square brackets [·], we
take R[x] to be the minimal (smallest) ring that contains both R and x. On
the other hand, parentheses (·) are used to indicate the minimal (smallest)
field that contains both R and x. A little reflection leads to the following
definition.
7.2. Ideals
Definition 7.9. A non-empty subset I of a ring R is called an ideal6 if
i) For all i and j in I, i ± j is in I (closed under addition and negatives).
ii) For all x in R and i in I, xi and7 ix are in I (it “absorbs” products).
6Usually “fraktur” letters (a, b, c ...) are used for ideals. On a blackboard or whiteboard, these are
hard to distinguish from normal letters. So instead we will use capital letters to indicate ideals.
7One of the two is sufficient if R is commutative.
This example also illustrates the fact that ⟨6⟩ + ⟨15⟩ is a principal ideal.
In fact, in Z, it is easy to see that every ideal I is principal. One can use
Bézout to show that I is generated by its least positive element. Another
non-trivial example of a principal ideal is the set of polynomials q satisfying
q(ρ) = 0 in the ring of polynomials over a field F. Indeed, we need to
refer to Lemma 7.6 to establish that this is the case (work out the details in
exercise 7.8).
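That the sum of the ideals generated by 6 and by 15 is generated by its least positive element can be checked on a finite window of Z. The sketch below enumerates combinations 6a + 15b with bounded coefficients; the function name `ideal_sum_elements` and the bound 30 are assumptions chosen for illustration.

```python
from math import gcd


def ideal_sum_elements(m, n, bound):
    """Elements a*m + b*n with |a|, |b| <= bound that lie in [-bound, bound]."""
    return sorted({a * m + b * n
                   for a in range(-bound, bound + 1)
                   for b in range(-bound, bound + 1)
                   if abs(a * m + b * n) <= bound})


elements = ideal_sum_elements(6, 15, 30)
generator = min(x for x in elements if x > 0)
print(generator, gcd(6, 15))
print(elements == list(range(-30, 31, 3)))
```

Within the window, the combinations are exactly the multiples of 3 = gcd(6, 15), which is the least positive element.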
Next, we look at multiplication of ideals. If ideals are to behave like
numbers, then the product of two ideals should also be an ideal. At first
glance, one would think the collection of products of one element in ⟨6⟩ and
one in ⟨15⟩ would do the trick. This is indeed the case in Z (exercise 7.4).
However, in general this construct is not closed under addition (exercise
7.5). Thus we define AB as the smallest ideal containing the products of
one element in A and one in B, or
$AB := \left\{ \sum_{i=1}^{k} a_i b_i : a_i \in A,\ b_i \in B,\ k \in N \right\}$ . (7.5)
8In most texts parentheses (·) are used. We want to avoid ambiguity with the notation for an n tuple
(i, j, · · · ).
The relation between ring and ideal is very similar to the one between
group and normal subgroup (Definition 7.28). In fact, since a ring R is an
Abelian group with respect to addition, any ideal in R is a normal subgroup.
There is one interesting difference: a normal subgroup is also a group. In
contrast an ideal (like the even numbers) may not have a multiplicative iden-
tity and so it is not a ring (see Remark 5.24). The remainder of this section
spells out the relation between rings and their ideals.
Definition 7.10. Given two rings I and J, a ring homomorphism is a map
f : I → J that preserves addition and multiplication and their respective
identities 0 and 1. The kernel of a ring homomorphism is the pre-image of
the additive identity 0. A ring isomorphism is a ring homomorphism that is
a bijection. The word “ring” is often omitted.
Proposition 7.11. i) The quotient R/K of a ring R by an ideal K is a ring.
ii) The kernel K of a ring homomorphism f : R → H is an ideal.
$ab - a'b' + (a - a')K + K(b - b') + K \cdot K =$
$(a - a')b + a'(b - b') + (a - a')K + K(b - b') + K \cdot K$ .
The absorption property of the product does the rest.
Associativity and distributivity now follow easily. For example, since
[ab]c = a[bc] in R and multiplication is well-defined, we must have
[(a + K)(b + K)] (c + K) = (a + K) [(b + K)(c + K)] .
Similarly for distributivity. Again, by absorption, (a + K)(1 + K) ⊆ (a + K)
and so 1 + K is the multiplicative identity. This proves (i).
The proof of (ii) is rather trivial. Just use Definitions 7.9 and 7.10.
Choose x and y in the kernel of f and conclude that f(x ± y) = 0 and that
for any r ∈ R, f(rx) also equals 0.
Theorem 7.12 (Fundamental Homomorphism Theorem). If f : R → H is
a surjective ring homomorphism with kernel K, then H is (ring) isomorphic
to R/K.
Theorem 7.12 has the surprising consequence, for example, that there
are no non-trivial (ring) homomorphisms C → R (see exercise 7.7).
Proof. Clearly, $\{1, \rho, \dots, \rho^{d-1}\}$ are independent over F (otherwise the
minimal polynomial would have degree less than d). Since a field is closed
under addition, subtraction, and multiplication, F(ρ) must contain
all expressions $\sum_{i=0}^{d-1} a_i \rho^i$.
All we are doing in this last proof, really, is taking an arbitrary quo-
tient f /g of polynomials f and g in ρ and reducing it using the minimal
polynomial. That insight leads to a sharper result.
Theorem 7.15. Let F(ρ) be a finite extension of a field F. Suppose p is a
minimal polynomial for ρ. Then F(ρ) is ring isomorphic to F[x]/⟨p(x)⟩.
This is all very well, but what if we adjoin another algebraic element, β ,
to Q(α)? What does Q(α, β ) look like? Are the results we just proved still
useful? The answer, miraculously, is yes. And the reason is the primitive
element theorem below (Theorem 7.18).
Let us look at an example again. Adjoin $\beta = \sqrt{3}$ to the previous
example $Q(\alpha) = Q(\sqrt{2})$, and consider Q(α, β). Since the squares of α and β
are integers, it is clear that every element of $Q(\sqrt{2}, \sqrt{3})$ can be written as
$a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ where a, b, c, d ∈ Q .
What is not immediately obvious is that 1, $\sqrt{2}$, $\sqrt{3}$, and $\sqrt{6}$ are linearly
independent over the rationals, but let us assume that for now (see exercises
7.17 to 7.20).
Remark 7.17. We obtain a 4 dimensional vector space over Q with a basis
formed by the vectors $\{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}$.
Now we make the “inspired”⁹ guess that in this example Q(α + β ) is
identical to Q(α, β )! To verify that, set γ = √2 + √3. Clearly, γ ∈ Q(α, β )
and so
Q(γ) ⊆ Q(α, β ) .
A simple computation indeed yields
γ² = 5 + 2√6 , γ³ = 11√2 + 9√3 , γ⁴ = 49 + 20√6 . (7.6)
And so γ³ − 9γ generates √2, γ³ − 11γ generates √3, while γ² − 5 generates
√6. Thus
Q(α, β ) ⊆ Q(γ) .
We have established that Q(γ) = Q(α, β ). That we can do this in general,
is the content of the primitive element theorem.
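The computation in (7.6) can be checked mechanically. What follows is only a minimal sketch in Python, assuming that a + b√2 + c√3 + d√6 is stored as the tuple (a, b, c, d), which relies on the basis of Remark 7.17. The helper names mul and combine are illustrative and do not come from the text.

```python
# Element a + b*sqrt(2) + c*sqrt(3) + d*sqrt(6) of Q(sqrt2, sqrt3),
# stored as the tuple (a, b, c, d). (Illustrative representation.)

def mul(x, y):
    """Multiply two elements, using sqrt2*sqrt3 = sqrt6,
    sqrt2*sqrt6 = 2*sqrt3 and sqrt3*sqrt6 = 3*sqrt2."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,  # rational part
        a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),        # coefficient of sqrt2
        a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),        # coefficient of sqrt3
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,              # coefficient of sqrt6
    )

def combine(x, s, y):
    """Return x + s*y for an integer (or rational) scalar s."""
    return tuple(xi + s * yi for xi, yi in zip(x, y))

one = (1, 0, 0, 0)
gamma = (0, 1, 1, 0)              # gamma = sqrt2 + sqrt3
gamma2 = mul(gamma, gamma)
gamma3 = mul(gamma2, gamma)
gamma4 = mul(gamma2, gamma2)

two_sqrt2 = combine(gamma3, -9, gamma)         # gamma^3 - 9*gamma
minus_two_sqrt3 = combine(gamma3, -11, gamma)  # gamma^3 - 11*gamma
relation = combine(combine(gamma4, -10, gamma2), 1, one)  # gamma^4 - 10*gamma^2 + 1
```

Here gamma³ − 9γ comes out as 2√2 and gamma³ − 11γ as −2√3, so both √2 and √3 are rational combinations of powers of γ, and relation is the zero element.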
⁹ “Inspired” is a pretentious way of saying that I do not want to say where I got this (but see the proof of
Theorem 7.18).
7.3. Fields and Extensions 135
Proof. If we can find a single generator ϕ for α and β , we can then repeat
the argument to find a generator θ for ϕ and γ, and so forth. Thus it is
sufficient to prove this result for F(α, β ).
Let p and q be minimal polynomials in F[x] for α and β , respectively.
The roots of p are $\{\alpha_i\}_{i=1}^{m}$ with α1 ≡ α and those of q are
$\{\beta_i\}_{i=1}^{n}$ with β1 ≡ β . Now define for c ≠ 0 in F
r(x) := p(α + cβ − cx) .
This polynomial has several intriguing properties. First, it is a member of
F(α + cβ )[x], for it has coefficients in F(α + cβ ). Furthermore,
its roots are given by
$$\alpha_1 + c\beta - c x = \alpha_i \iff x_i = \frac{\alpha_1 - \alpha_i}{c} + \beta_1 .$$
For i = 1, we of course get β = β1 as a root. But now, since F is infinite,
we can fix a value c∗ of c such that none of the other roots xi (i > 1)
equals βj for any j > 1.
Since both q ∈ F[x] ⊆ F(α + c∗ β )[x] and r ∈ F(α + c∗ β )[x] and both
have β as a root, Lemma 7.6 implies that the minimal polynomial d for β in
F(α + c∗ β )[x] must be a divisor of both q and r. But these two share only
one root, and therefore d ∈ F(α + c∗ β )[x] has degree one:
d(x) = a1 x + a0 = a1 (x − β ) .
Clearly, the ai are in F(α + c∗ β ), but then so is β = −a0 /a1 , and the same
holds for α = (α + c∗ β ) − c∗ β . Thus α + c∗ β generates F(α, β ).
Thus a primitive element generates the whole field extension through
addition and multiplication (and their inverses). In contrast, a primitive root
(Definition 5.5) is an element of F_p (the elements of Z_p with addition and
multiplication as operations) whose powers generate F_p .
As mentioned in our last example, Q(γ) is in fact a vector space over
Q. From (7.6), it is clear that γ⁴ − 10γ² + 1 = 0. Therefore Q(γ), like
Q(α, β ), has four basis vectors: the set {1, γ, γ², γ³} spans Q(γ). The
scalars are elements of Q. As such, it is somewhat confusingly denoted by
Q(γ)/Q in the literature, though this is not to be interpreted as a quotient.
We can take advantage of the fact that algebraic integers are complex
numbers, which in turn form a commutative field (and thus a ring) without
zero divisors. Many of the properties mentioned in Definition 5.20 as well
as the absence of zero divisors are thus automatically satisfied. To make a
long story short, we only need to prove that A is closed under additive
inversion, under addition, and under multiplication. The first is easy. Suppose
that θ ∈ A is a root of $x^d + a_{d-1}x^{d-1} + \cdots + a_0$, where the ai are in Z. Then,
of course, −θ is a root of the same polynomial with each ai replaced by
$(-1)^{d-i} a_i$. The remaining two criteria have a very interesting constructive proof.
To understand it, we need to define the Kronecker product.
Definition 7.20. Given two matrices A and B, their Kronecker product A ⊗
B is given by
$$A \otimes B := \begin{pmatrix} A_{11}B & A_{12}B & A_{13}B & \cdots \\ A_{21}B & A_{22}B & \cdots & \\ \vdots & \vdots & \ddots & \end{pmatrix} .$$
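To make Definition 7.20 concrete, here is a small sketch in Python with matrices stored as lists of rows. It also illustrates, under assumptions of ours, why the Kronecker product is useful for sums and products of algebraic integers: we assume the companion matrices [[0, 2], [1, 0]] and [[0, 3], [1, 0]] for x² − 2 and x² − 3 (one common convention, which may differ from the text's), and we use the standard fact that A ⊗ I + I ⊗ B has the sums of eigenvalues of A and B as eigenvalues, while A ⊗ B has their products. All function names are illustrative.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(X, Y):
    """Ordinary matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, s=1):
    """Return X + s*Y entrywise."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

# Companion matrices of x^2 - 2 and x^2 - 3 (assumed convention).
CA = [[0, 2], [1, 0]]
CB = [[0, 3], [1, 0]]
I2 = identity(2)

S = add(kron(CA, I2), kron(I2, CB))   # sqrt2 + sqrt3 is an eigenvalue of S
P = kron(CA, CB)                      # sqrt2 * sqrt3 is an eigenvalue of P

S2 = matmul(S, S)
S4 = matmul(S2, S2)
check_sum = add(add(S4, S2, -10), identity(4))    # S^4 - 10 S^2 + I
check_prod = add(matmul(P, P), identity(4), -6)   # P^2 - 6 I
```

Both check_sum and check_prod are zero matrices, in agreement with γ⁴ − 10γ² + 1 = 0 for γ = √2 + √3 and with (√2 · √3)² = 6.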
Proof of Theorem 7.19. We only need to prove that A is closed under
addition and under multiplication. So let α and β be in A . Then α is a root
of a monic polynomial pA (x) of degree a and the same for β and pB (x) of
degree b. Suppose $p_A(x) = x^a + \sum_{i=0}^{a-1} a_i x^i$. The so-called companion matrix ,
While, like Z, the algebraic integers A form a ring, that ring does not
“look” like Z at all! We will take this up later when we prove that A
is dense in the complex numbers and has no irreducibles and no primes
(Theorem 8.6). So to study factorization, we must look at more restricted
collections of algebraic integers.
Examples of more restricted rings of integers are Z(γ), the ring
consisting of numbers of the form $\sum_{i=0}^{d-1} c_i \gamma^i$ with ci ∈ Z, where γ is algebraic
of degree d. To see that Z(γ) is a ring is trivial, since we do not have to
worry about multiplicative inverses, which was the only complication in
Proposition 7.14.
We end this section with a slightly confusing definition and a warning
in the form of a Lemma.
Definition 7.22. Consider the field Q(γ). The integers of Q(γ) are those
elements in Q(γ) that are algebraic integers.
This is not necessarily the same as the set Z(γ)! As an example we will
prove the lemma below in exercise 7.25.
Lemma 7.23. Let j be square free. The integers of Q(√j) are precisely the
elements of the ring Z(½(1 + √j)) if j =₄ 1, and Z(√j) else.
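A quick way to experiment with Definition 7.22 in the quadratic case: the number a + b√j (a, b rational) is a root of x² − 2a x + (a² − b² j), so it is certainly an algebraic integer when 2a and a² − b² j are both integers. The sketch below, in Python, tests exactly this condition; the function name is illustrative, and the converse direction is the subject of exercise 7.25.

```python
from fractions import Fraction

def is_quadratic_integer(a, b, j):
    """Test whether x^2 - 2a x + (a^2 - b^2 j), which has a + b*sqrt(j)
    as a root, has integer coefficients."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a
    norm = a * a - b * b * j
    return trace.denominator == 1 and norm.denominator == 1

half = Fraction(1, 2)
golden = is_quadratic_integer(half, half, 5)      # (1 + sqrt5)/2, with 5 =_4 1
not_int = is_quadratic_integer(half, half, 3)     # (1 + sqrt3)/2, with 3 =_4 3
eisenstein = is_quadratic_integer(half, half, -3) # (1 + sqrt(-3))/2, with -3 =_4 1
```

For j = 5 and j = −3 the half-integer combination passes the test, while for j = 3 it does not, which is consistent with the case distinction in Lemma 7.23.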
7.5. Rings of Quadratic Numbers and Modules 139
7.6. Exercises
Exercise 7.1. The reader might want to review exercises 3.22 to 3.25 first.
Let f and g in F[x]. We will show that there are polynomials q and r in
F[x] such that
f = gq + r and deg(r) < deg(g) . (7.10)
a) Show that this holds if deg(g) > deg( f ).
b) Now let n = deg( f ) ≥ deg(g) = m and $f(x) = \sum_{i=0}^{n} a_i x^i$ and
$g(x) = \sum_{i=0}^{m} b_i x^i$. Define
$$f_j(x) = f(x) - \frac{a_n}{b_m}\, x^{n-m} g(x) ,$$
where f j has degree j. Show that j ≤ n − 1. (Hint: by assumption, an and
bm are not zero.)
c) Show that the computation in (b) can be repeated with f replaced by f j
as long as j ≥ m. (Hint: we are just formalizing long division here.)
d) Show that r(x) = fi (x), where fi is the first of the f j to have degree less
than m.
e) Show that the leading term of q(x) in (7.10) is $\frac{a_n}{b_m} x^{n-m}$.
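The procedure of exercise 7.1 is ordinary long division, and it can be sketched in a few lines of Python. This is only an illustration over F = Q, assuming polynomials are stored as coefficient lists [a_0, a_1, ...] (lowest degree first); the names degree and poly_divmod are ours.

```python
from fractions import Fraction

def degree(f):
    """Degree of a coefficient list [a_0, a_1, ...]; -1 for the zero polynomial."""
    d = len(f) - 1
    while d >= 0 and f[d] == 0:
        d -= 1
    return d

def poly_divmod(f, g):
    """Return (q, r) with f = g*q + r and deg(r) < deg(g), over Q."""
    f = [Fraction(c) for c in f]
    m = degree(g)
    if m < 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(degree(f) - m + 1, 1)
    while degree(f) >= m:
        n = degree(f)
        coeff = f[n] / g[m]            # a_n / b_m, as in part (b)
        q[n - m] = coeff
        for i in range(m + 1):         # f <- f - (a_n/b_m) x^(n-m) g
            f[i + n - m] -= coeff * g[i]
    return q, f[:m]

# x^3 - 2x + 1 = (x - 1)(x^2 + x - 1) + 0
q1, r1 = poly_divmod([1, -2, 0, 1], [-1, 1])
# 2x^2 + 1 divided by 3x + 1
q2, r2 = poly_divmod([1, 0, 2], [1, 3])
```

In the second example the leading coefficient of the quotient is 2/3, that is a_n/b_m, as part (e) predicts.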
Exercise 7.4. a) Consider two ideals ⟨a⟩ and ⟨b⟩ in Z. Show that
$$\langle a\rangle \cdot \langle b\rangle = \left\{ \sum_{i=1}^{k} n_i m_i\, ab : n_i, m_i \in \mathbb{Z},\ k \in \mathbb{N} \right\} .$$
b) Use (a) to prove that in Z
⟨a⟩ · ⟨b⟩ = ⟨ab⟩ .
Exercise 7.11. Consider primes p and q (in Z). Use Lemma 7.21 to find
minimal polynomials for √p √q and √p + √q.
Exercise 7.14. a) For any real α > 1, and any n ∈ N, we can choose
k = ⌊αⁿ⌋. Show that
$$k^{1/n} \le \alpha < (k+1)^{1/n} .$$
b) Use (a) to show that
$$(k+1)^{1/n} - k^{1/n} < \alpha \left( 2^{1/n} - 1 \right) .$$
c) Show that the algebraic integers are dense in {x ∈ R : x ≥ 1}. (Hint:
$k^{1/n}$ is an algebraic integer.)
d) Extend the conclusion in (c) to all of R by using exercise 7.12 (a).
e) Use (d) and Lemma 7.21 to prove that A is dense in C.
We first need a lemma that is interesting in its own right. This lemma is
proved in the next exercise.
Lemma 7.32. Let {m1 , m2 , · · · , mℓ } be a collection of distinct square free
integers in Z. Then for ai ∈ Z, $\sum_{i=1}^{\ell} a_i \sqrt{m_i} = 0$ if and only if ai = 0 for all
i.
Exercise 7.18. a) Show that Pn−1 (γn − √pn ) = 0. (Hint: show that
Pn (γn ) = 0 and use exercise 7.17 (a).)
b) Show that
Pn−1 (x − √pn ) = En−1 (x) + √pn On−1 (x) ,
where On−1 and En−1 are in Z[x]. (Hint: use exercise 7.17 (d).)
c) Show that On−1 (x) = −(n − 1)x plus higher order in x. (Hint: direct
calculation from its definition.)
d) Use (c) and Lemma 7.32 to show that On−1 (γn ) ≠ 0. (Hint: how does
the fact that an (n−)-degree polynomial in γn equals zero contradict the
lemma?)
e) Use (d) to show that Q(γn ) contains √pn .
f) Use the fact that the order of the primes is arbitrary to show that Q(γn )
contains √pi for any i ∈ {1, · · · , n}.
g) Prove Proposition 7.30 (i). That is: show that Q(γn ) = Fn or γn is a
primitive element for the field Fn . (Hint: it is trivial that Q(γn ) ⊆ Fn .)
h) Show that (e) holds for any γn = ε · √p (n) with ε ∈ Sn fixed.
Exercise 7.21. Suppose ρ ∈ A is not a unit and has minimal (monic)
polynomial p in Z[x].
a) Show that q(x) = p(x²) has root √ρ.
b) Show that any factor in Z[x] of q is monic.
c) Show that √ρ is not a unit. (Hint: if it is, then its square must be too.)
d) Conclude that ρ is not irreducible.
Exercise 7.22. We apply the Euclidean algorithm in Z[√−1] to 17 + 15i
and 7 + 5i. Compare with the computations in Section 3.2 and exercise
3.22.
a) Check all computations in the following diagram.
| + | − | + | − |
| 2 + 2i | −2 − i | 3 | 0 |
0 | −1 + i | −4 | 7 + 5i | 17 + 15i |
| 1 | | | |
| | 2+i | 1 | |
| | | −6 − 3i | −2 − i |
b) Check all computations in the following diagram.
| + | − | + | − | + |
| 1 + 2i | 1−i | 1−i | 2 | 0 |
0 | 1+i | −1 + 3i | 3 + 5i | 7 + 5i | 17 + 15i |
| 1 | | | | |
| | −1 + i | 1 | | |
| | | −2i | −1 + i | |
| | | | −2 + 4i | 1 − 2i |
c) From the diagram in (a), compute values for x and y in Z[√−1] such
that
−1 + i = (7 + 5i)x + (17 + 15i)y .
(Hint: follow instructions in Section 3.2.)
d) From the diagram in (b), compute values for x and y in Z[√−1] such
that
1 + i = (7 + 5i)x + (17 + 15i)y .
e) Compute gcd(17 + 15i, 7 + 5i) (up to invertible elements).
f) Compute lcm (17 + 15i, 7 + 5i) (up to invertible elements). (Hint: see
Corollary 2.16.)
Exercise 7.23. Find a greatest common divisor and a least common multi-
ple for each of the following pairs of Gaussian integers. (Hint: see exercise
7.22.)
a) 7 + 5i and 3 − 5i.
b) 8 + 38i and 9 + 59i.
c) −9 + 19i and 52 + 68i.
Exercise 7.24. a) Show that the arithmetic functions (Definition 4.1) with
the operations addition and Dirichlet convolution (Definition 4.19) form a
commutative ring. (Hint: see exercise 4.15.)
b) Show that the same does not hold for the multiplicative (Definition 4.1)
arithmetic functions. (Hint: see exercise 4.16.)
c) Show that the functions f : R → R together with the operations addition
and multiplication form a commutative ring.
d) Is the ring in (c) a domain?
e) Show that the square integrable functions f : [0, ∞) → [0, ∞) together
with the operations addition and convolution form almost a commutative
ring. (Hint: only the multiplicative identity is missing.)
f) Look up Titchmarsh’s convolution theorem and show that it implies that
the ring in (e) (with the “Dirac delta function” added) is a domain.
Exercise 7.25. a) Show that all elements of Q[√j], j ∈ Z, are algebraic
numbers. (Hint: see equation (7.7).)
b) Now let j be square free and show that if a + b√j is an integer of Q[√j],
then
2a ∈ Z and a² − b²j ∈ Z .
c) Show (b) implies that if a ∈ Z, then b ∈ Z. (Hint: set b = p/q where
gcd(p, q) = 1.)
d) Show that (b) implies that if a ∈ Z + ½, then 4b²j ∈ 4Z + 1.
e) Show that in (d) we obtain that b ∈ Z + ½ and j =₄ 1. (Hint: set b = p/q
where gcd(p, q) = 1 and conclude that q = 2. Then show that p²j = 2n + 1
implies that j =₄ 1.)
f) Use (c) and (e) to show that if j =₄ 1, the integers of Q[√j] are given by
$$I = \left\{ a + b\sqrt{j} : a, b \in \mathbb{Z} \right\} \cup \left\{ \left(a + \tfrac12\right) + \left(b + \tfrac12\right)\sqrt{j} : a, b \in \mathbb{Z} \right\} .$$
g) Use (f) to prove Lemma 7.23.
Chapter 8
Factorization in Rings
150 8. Factorization in Rings
¹ And the meaning of “prime” has changed to confuse non-algebraists. But we’re not falling for it!
8.2. Integral Domains 151
Set a = b = √ρ. Then a is a root of q(x) = p(x²) by exercise 7.13 (b).
Now q ∈ Z[x] is monic and any polynomial factor of q must also be monic.
Therefore a is in A . Since ρ = ab, ρ is reducible. Clearly, we also have
ρ | ab. But if ρ divides a (or b), then a/ρ is in A . Since A is closed under
multiplication, its square, which equals ρ⁻¹, would then also be in A . This
contradicts our initial choice of ρ. Hence ρ cannot divide a or b, and so ρ
is not prime.
(iii) Suppose 2 | ab in Z₆ . Then in Z, 2 divides ab + 6m. But that means
that ab is even and thus a (or b) has a factor 2. But then in Z₆ , 2 divides a
(or b). Therefore 2 is prime in Z₆ . On the other hand, 2 · 4 =₆ 2. Since both
2 and 4 are non-invertible, 2 is reducible.
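Since Z₆ is finite, the claims in (iii) can be confirmed by brute force. The following Python sketch is only an illustration; the helper names are ours, and "prime" and "reducible" are tested directly from the statements used in the argument above.

```python
n = 6
elements = range(n)

# Units of Z_6: elements with a multiplicative inverse.
units = [u for u in elements if any(u * v % n == 1 for v in elements)]

def divides(d, x):
    """True if d*c = x in Z_6 for some c."""
    return any(d * c % n == x % n for c in elements)

# 2 is prime: whenever 2 | ab, we have 2 | a or 2 | b.
two_is_prime = all(
    divides(2, a) or divides(2, b)
    for a in elements for b in elements
    if divides(2, a * b)
)

# 2 is reducible: it is a product of two non-units.
factorizations = [
    (a, b) for a in elements for b in elements
    if a * b % n == 2 and a not in units and b not in units
]
```

The list factorizations contains exactly the pairs (2, 4) and (4, 2), matching 2 · 4 =₆ 2.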
(iv) Suppose the number 3 equals the product xy, where x and y are in
Z[√−5]. Clearly, x and y cannot both be real non-units, because 3 is prime
and irreducible in Z. If both are non-real, then each has the form a + ib√5
with b ≠ 0, so each has absolute value at least √5, and |xy| ≥ 5, a
contradiction. If exactly one of them is non-real, then so is their product,
another contradiction. Therefore, one of x or y must be
a unit. This proves that 3 is irreducible in Z[√−5]. But on the other hand,
(2 + i√5)(2 − i√5) = 9 =⇒ 3 | (2 + i√5)(2 − i√5) .
But since (2 ± i√5)/3 ∉ Z[√−5], 3 does not divide either of these factors.
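The divisibility claims in (iv) are easy to test by machine. In the sketch below (Python, with a + b√−5 stored as the pair (a, b); all names are illustrative) divisibility is decided by multiplying with the conjugate of the divisor and checking whether the result is divisible by its norm.

```python
# Elements a + b*sqrt(-5) of Z[sqrt(-5)] are stored as pairs (a, b).

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    a, b = x
    return a * a + 5 * b * b

def divides(x, y):
    """True if y = x*z for some z in Z[sqrt(-5)]; x must be nonzero."""
    a, b = x
    re, im = mul(y, (a, -b))     # y * conj(x) = z * N(x)
    n = norm(x)
    return re % n == 0 and im % n == 0

product = mul((2, 1), (2, -1))                  # (2 + i sqrt5)(2 - i sqrt5)
three_divides_product = divides((3, 0), product)
three_divides_factor = divides((3, 0), (2, 1)) or divides((3, 0), (2, -1))

# No element has norm 3, so 3 (of norm 9) cannot split into two non-units.
norm3 = [(a, b) for a in range(-2, 3) for b in range(-1, 2) if norm((a, b)) == 3]
```

The norm check at the end is an alternative to the absolute value argument in the text: a proper factor of 3 would need norm 3, and a² + 5b² = 3 has no integer solutions.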
(v) Since
4 = 2 · 2 = (1 + √−3)(1 − √−3) ,
both 2 and (1 + √−3) are divisors of 4. They are also divisors of
(2 + 2√−3). However, it is a simple check to see that 2 and (1 + √−3) do not
divide each other. In other words, there is no maximal common divisor in
this case.
(vi) Using the binomial theorem, we see that modulo 6
$$2x \cdot 2x \cdot (1 + 3h(x))^n =_6 4x^2 \sum_{i=0}^{n} \binom{n}{i} 3^i h(x)^i =_6 4x^2 ,$$
It might seem that we have not done much to tame the factorization
process. However, the following result indicates that we are on the right track.
Theorem 8.7. Any prime p in an integral domain R is irreducible.
Proof. Suppose that the prime p satisfies p = ab. We need to show that a
or b is a unit. Certainly p ≠ 0 divides ab, and so, from Definition 8.1, p | a
or p | b. Assume the former. So there is a c such that pc = abc = a. We then
Proof. First, suppose that every irreducible is a prime and assume that the
following are two factorizations of x ∈ R into irreducibles.
x = u p1 · · · pk = u′ q1 · · · qℓ .
Now if p1 is a prime, upon relabeling the qi , it must divide q1 . Since q1 is
irreducible, we must have p1 = q1 up to units. Doing finitely many steps,
one proves that the factorization is unique.
Next, suppose that q is irreducible and that there are non-zero a and
b such that q | ab. This implies qc = ab. We factor both sides of this last
equation into irreducibles.
u q (p1 · · · pk ) = u′ (q1 · · · qℓ )(qℓ+1 · · · qm ) .
By unique factorization, q must equal q1 (upon relabeling and up to units)
and thus it divides a or b.
Definition 8.10. An integral domain R is a unique factorization domain² if
every element admits a unique factorization into irreducibles. This is often
abbreviated to UFD.
² The word “domain” serves as a reminder that R must be an integral domain.
By Theorems 8.7 and 8.9, in a UFD, “prime” and “irreducible” are syn-
onymous. In a UFD, the notions of greatest common divisor and least com-
mon multiple are well-defined. The reason these notions are well-defined
can be found in the proof of Corollary 2.16. To repeat that argument, sup-
pose that
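$$\alpha = u \prod_{i=1}^{s} p_i^{k_i} \quad\text{and}\quad \beta = u' \prod_{i=1}^{s} p_i^{\ell_i} ,$$
where u and u′ are units and ki and ℓi in N ∪ {0}. Now define:
mi = min(ki , ℓi ) and Mi = max(ki , ℓi ) .
Then, of course, we have
$$\gcd(\alpha, \beta) = \prod_{i=1}^{s} p_i^{m_i} \quad\text{and}\quad \operatorname{lcm}(\alpha, \beta) = \prod_{i=1}^{s} p_i^{M_i} .$$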
The pi are unique up to a unit. And so are the gcd and lcm, since the product
of units is a unit.
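In the most familiar UFD, the integers, the min/max recipe above can be written out directly. The sketch below is in Python and assumes positive integers factored by trial division; the names factorize and gcd_lcm are illustrative.

```python
from collections import Counter
from math import prod

def factorize(n):
    """Prime factorization of a positive integer as {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_lcm(alpha, beta):
    """gcd and lcm from the exponents k_i and l_i of the prime factorizations."""
    k, l = factorize(alpha), factorize(beta)
    primes = set(k) | set(l)
    g = prod(p ** min(k[p], l[p]) for p in primes)   # exponents m_i
    m = prod(p ** max(k[p], l[p]) for p in primes)   # exponents M_i
    return g, m

example = gcd_lcm(30, 12)
```

A missing prime simply has exponent 0 (a Counter returns 0 for absent keys), which corresponds to allowing ki and ℓi in N ∪ {0}.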
We still need to be slightly cautious. For instance, in Z[i], which is a
UFD, the units are ±1 and ±i. The gcd of 2i and −4 is 2 up to units, that
is: ±2 or ±2i.
needs the reformulation given by Theorem 8.5. Corollary 2.8 would need
to be reformulated (which we omit). Among other things, the unique fac-
torization, and the Euclidean algorithm of Chapter 3, which in turn led us
to continued fractions, follow from these. So the consequences of having
a Euclidean function are indeed staggering! Exercise 8.11 investigates the
relation between the two chapters.
In Euclidean domains the notions of prime and irreducible are again
happily reunited.
Proposition 8.12. Let R be a Euclidean domain. If p ∈ R is irreducible,
then p is prime.
Proof. Item (i) follows from the previous proposition together with
Theorem 8.7. Theorem 8.9 implies item (ii).
Polynomial rings over a field, such as Q[x] or R[x], are great examples
of Euclidean domains. We already saw in Section 7.1 that the degree is a
Euclidean function in these rings.
We finally come to the reason to introduce empty products in Remark
2.14.
Corollary 8.14. A field F is a Euclidean domain and therefore has unique
factorization. Namely, every non-zero x ∈ F is a unit times the empty
product of primes. In particular, there are no primes and no irreducible
numbers in F.
Thus the results in Chapter 2 starting with Theorem 2.17 (the infinitude
of primes) do not generalize to all Euclidean domains. The problem in the
proof of Theorem 2.17 is that it crucially depends on adding “1” to some
number in order to get a “bigger” number. The rest of that Chapter depends
on the embedding of the integers in the real numbers (or even C).
The last result, together with Definitions 8.11, 8.4, and 5.20, immedi-
ately implies the following.
Corollary 8.15. We have the following inclusions:
fields ⊊ Eucl. domains ⊊ UFDs ⊊ domains ⊊ comm. rings ⊊ rings .
Proof. For j a square free integer, N(α) is the square of the absolute value
of α, and so it is a positive integer. So the second requirement of Definition
8.11 follows immediately from Corollary 7.26. It remains to prove that the
first requirement is satisfied.
Given any ρ1 and ρ2 in Z[i], we can certainly choose κ and ρ3 so that
ρ1 = κρ2 + ρ3 .
⁴ This is one of the reasons we added 0 to the image of E in Definition 8.11.
8.4. Example and Counter-Example 157
Figure 28. The Gaussian integers are the lattice points in the complex
plane; both real and imaginary parts are integers. For an arbitrary point
z ∈ C, marked by x in the figure, a nearby integer is k1 + ik2 where
k1 is the closest integer to Re (z) and k2 the closest integer to Im (z). In
this case that is 2 + 3i.
The computation that leads from equation (8.1) to equation (8.2) can
also be done explicitly. Let ρ1 = a + bi and ρ2 = c + di. It is an easy
computation to see that
$$\rho_1 \rho_2^{-1} = \frac{ac + bd}{c^2 + d^2} + i\, \frac{-ad + bc}{c^2 + d^2} .$$
We want to express this as a Gaussian integer κ = k1 + ik2 plus a remainder
ρ3 ρ2⁻¹ = ε1 + iε2 whose norm is less than 1. We choose k1 to be the integer
closest (or one of the integers closest) to $\frac{ac+bd}{c^2+d^2}$, and k2 , the integer closest
to $\frac{-ad+bc}{c^2+d^2}$. With those choices, the remainders
$$\varepsilon_1 = \frac{ac + bd}{c^2 + d^2} - k_1 \quad\text{and}\quad \varepsilon_2 = \frac{-ad + bc}{c^2 + d^2} - k_2$$
are each not greater than ½ in absolute value. Thus
ρ3 = (ε1 + iε2 )(c + id) ,
with norm (ε1² + ε2²)(c² + d²) by Corollary 7.26. Since the εi are no greater
than ½ in absolute value, (8.2) follows.
⁵ If there is more than one closest Gaussian integer, pick any one of them.
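The division step just described translates almost literally into code. The following Python sketch stores a Gaussian integer a + bi as the pair (a, b) and uses only integer arithmetic; the function names are ours, and rounding ties upward is one arbitrary choice among the closest integers.

```python
def round_div(p, n):
    """Integer closest to p/n for n > 0 (ties are rounded up)."""
    return (2 * p + n) // (2 * n)

def gauss_divmod(r1, r2):
    """Return (kappa, r3) with r1 = kappa*r2 + r3 and N(r3) < N(r2).
    Gaussian integers a + bi are stored as pairs (a, b); r2 must be nonzero."""
    (a, b), (c, d) = r1, r2
    n = c * c + d * d                     # N(rho_2)
    k1 = round_div(a * c + b * d, n)      # closest integer to Re(rho_1 / rho_2)
    k2 = round_div(b * c - a * d, n)      # closest integer to Im(rho_1 / rho_2)
    r3 = (a - (k1 * c - k2 * d), b - (k1 * d + k2 * c))
    return (k1, k2), r3

def norm(z):
    return z[0] ** 2 + z[1] ** 2

def gauss_gcd(r1, r2):
    """Euclidean algorithm in Z[i]; the result is determined up to units."""
    while r2 != (0, 0):
        _, r3 = gauss_divmod(r1, r2)
        r1, r2 = r2, r3
    return r1

first_step = gauss_divmod((17, 15), (7, 5))
g = gauss_gcd((17, 15), (7, 5))
```

For 17 + 15i and 7 + 5i, the first step gives quotient 3 and remainder −4, and the algorithm ends with −1 + i, the same values that appear in the first diagram of exercise 7.22.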
The computation in the foregoing proof will be important, and so it is
useful to summarize it even more succinctly.
Definition 8.17. A fundamental domain of Z[i] is a simply connected region
in C such that it contains exactly one representative of every set z + Z[i].
Usually one takes the unit square as a fundamental domain for Z[i].
Figure 29. A depiction of Z[√−6] in the complex plane; real parts are
integers and imaginary parts are multiples of √6.
Thus each of the norms equals 2. But 2 = a² + 6b² has no integer solutions,
hence 2 is irreducible. The exact same argument applied to 5 gives that
25 = N(α)N(γ) .
Each of the norms now must equal 5. But again 5 = a² + 6b² has no integer
solutions. If we apply the argument to 2 ± i√6, we obtain
10 = N(α)N(γ) .
Thus either α must have norm 2 and γ must have norm 5, or vice versa.
But the previous arguments show that both are impossible.
hand, consider the ideal ⟨p, j⟩. The fact that it is generated by gcd(p, j) is
non-trivial: it follows from Bézout (Lemma 2.5).
Now let us see how this pans out in some examples of ideals in rings
of algebraic integers. Start by considering the ring Z[√−3] of algebraic
integers (see equation (7.7)) displayed in the left of Figure 30. We start
by showing that this ring does not have the unique factorization property.
Knowing that
4 = 2 · 2 = (1 + i√3)(1 − i√3) , (8.3)
the proof of that statement is almost verbatim that of Proposition 8.19 (see
exercise 8.20; this exercise goes on to show that 4 admits no factorization
at all into primes!).
What is interesting here is that the numbers 2 and (1 ± i√3) belong to
the same maximal ideal.
Lemma 8.21. I = ⟨2, 1 + i√3⟩ is a maximal ideal in R = Z[√−3].
Proof. I is depicted in red in the left of Figure 30. It clearly contains both 2
and 1 + i√3. It clearly forms a lattice and so is closed under addition. Next
we check the absorption property of the ideal. Denote the two generators
by x and y for brevity. For any elements α, β , and γ of R, we must have
α(β x + γy) = δ x + εy ∈ I .
It is an easy but tedious exercise to check that for any integers a, b, c, and d
(a + ib√3) · 2 + (c + id√3) · (1 + i√3) = (a − b − 2d) · 2 + (c + d + 2b) · (1 + i√3) .
And so all these elements lie in the lattice I.
If we add to I any element not in I, then the resulting set contains the
differences 1 and i√3 (see Figure 30). Taking the closure under addition, it
immediately follows that we obtain all of Z[√−3]. Thus I is maximal.
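The "easy but tedious" identity in the proof can be checked on a grid of integers. In the Python sketch below, an element x + iy√3 is stored as the pair (x, y); the function names and the membership test in_ideal (x − y even) are our own rephrasing of "being an integer combination of 2 and 1 + i√3".

```python
from itertools import product

def lhs(a, b, c, d):
    """(a + ib√3)*2 + (c + id√3)*(1 + i√3) as (real part, coefficient of i√3)."""
    return (2 * a + c - 3 * d, 2 * b + c + d)

def lattice_point(m, n):
    """m*2 + n*(1 + i√3) as (real part, coefficient of i√3)."""
    return (2 * m + n, n)

def in_ideal(x, y):
    """x + iy√3 equals m*2 + n*(1 + i√3) for integers m, n iff x - y is even."""
    return (x - y) % 2 == 0

identity_holds = all(
    lhs(a, b, c, d) == lattice_point(a - b - 2 * d, c + d + 2 * b)
    for a, b, c, d in product(range(-3, 4), repeat=4)
)
```

The membership test also makes the last step of the proof visible: 1 and i√3 fail it, and adding either of them to I produces elements of both parities, hence everything.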
The upshot is that we are tempted (or, rather, Kummer was [55]) to
think of the set I as the set of multiples of some hidden or “ideal”⁷ prime
Q. Then both 2 and (1 ± i√3) are multiples of this “ideal” number Q (up
to units at least). This way, lo and behold, unique factorization into
irreducibles or primes is restored!
⁷ Hence the name “ideal”.
8.5. Ideal Numbers 161
Figure 30. Left, the elements of the ring Z[√−3]. Right, the ring
Z[½(1 + √−3)]. The units of each ring are indicated in green and the
ideals ⟨2, 1 + √−3⟩ on the left and ⟨2⟩ on the right are indicated in red.
Fundamental domains (Definition 8.17) are shaded in blue.
There is more than a grain of truth in this. Recall that the ring R′ =
Z[½(1 + i√3)] is the ring of integers in Q(√−3) (Lemma 7.23). This ring,
depicted on the right of Figure 30, contains the units (drawn in green)
(1 ± i√3)/2. Clearly, 2 and 1 + i√3 are now the same up to a unit. Therefore,
this time around 2 generates I. In other words, R′ contains R, and has the
same set I as an ideal, only now it is a principal ideal. Indeed, in R′ ,
equation (8.3) does not represent distinct factorizations of 4, precisely
because in this ring, 2 and 1 + i√3 differ by a unit.
Finally, we finish this section by checking that indeed the norm is not a
Euclidean function for Z[√−3], while it is for Z[½(1 + i√3)]. Thus the latter
ring is a Euclidean domain and so, by Corollary 8.13, primes and irreducibles
are the same, and factorization is unique. This ring is an important example
and has its own name; its elements are called the Eisenstein integers.
Proposition 8.22. i) The norm in Z[√−3] is not a Euclidean function.
ii) The norm in Z[½(1 + √−3)] is a Euclidean function.
Proof. According to Remark 8.18, the norm — which in these two cases
is positive — is a Euclidean function if and only if it is less than 1 in a
fundamental domain. In both cases, the norm of a number is simply the
square of the usual absolute value of that number. The fundamental domains
are shaded in Figure 30.
Proof of (i). The fundamental domain D is given by a rectangle of
height |h| = √3 and width 1 (see Figure 31). The diagonals in D have
length √(1 + 3) = 2 and so we have that the distance to the nearest algebraic
integer is between 0 and 1. It equals 1 at the intersection of the diagonals.
Thus N fails to be a Euclidean function.
Proof of (ii). The fundamental domain consists of two isosceles
triangles, one of which is depicted on the right of Figure 31. Its height d is ½√3
and its base has length 1. We are looking for the point that maximizes the
distance to the nearest corner of the triangle. That point lies at height y on
the bisector of the top angle and its distance d − y to the three corners of
the triangle is the same. Thus we compute
$$\frac{1}{2^2} + y^2 = (d - y)^2 \implies y = \frac{4d^2 - 1}{8d} \implies d - y = \frac{4d^2 + 1}{8d} .$$
This evaluates to d − y = √3/3, which is less than 1.
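The two distances found in this proof (1 for Z[√−3] and √3/3 for the Eisenstein integers) can be confirmed numerically. The sketch below, in Python, samples the fundamental parallelogram spanned by 1 and ω on a grid and records the largest distance to the nearest lattice point. It is only a numerical illustration: the function name, the grid size of 60 steps (chosen so that the extremal points lie on the grid), and the window of lattice points searched are all assumptions of ours.

```python
from math import sqrt

def covering_radius(omega, steps=60):
    """Largest distance from a grid point s + t*omega (0 <= s, t <= 1) to the
    nearest lattice point m + n*omega, with m, n taken from a small window."""
    lattice = [m + n * omega for m in range(-1, 3) for n in range(-1, 3)]
    worst = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            z = i / steps + (j / steps) * omega
            nearest = min(abs(z - w) for w in lattice)
            worst = max(worst, nearest)
    return worst

r_rect = covering_radius(1j * sqrt(3))              # lattice Z[sqrt(-3)]
r_eis = covering_radius((1 + 1j * sqrt(3)) / 2)     # Eisenstein integers
```

Up to rounding, r_rect equals 1 and r_eis equals 1/√3 = √3/3, so only in the second case is every point of the plane at distance less than 1 from a lattice point.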
Figure 31. Left, the fundamental domain of Z[√−3]. Here, h = i√3.
Right, one of the 2 isosceles triangles that constitute the fundamental
domain of Z[½(1 + √−3)]. Its height d equals ½√3. The point that
maximizes the distance to the closest of the 3 corner points lies on the
bisector of the top angle at height y.
It is surprising that in the first part of the proof, the Euclidean criterion
fails at only 1 point in the fundamental domain. An analyst might
suspect that somehow we can get around the exception because it has
measure zero. Note, however, that (8.3) shows that Z[√−3] does not have
unique factorization and thus there is no Euclidean function (Proposition
8.13).
8.6. Principal Ideal Domains 163
Proof. In view of Corollary 8.15, we only need to prove (i) that a Euclidean
domain is a PID, (ii) that a PID is a UFD, and (iii) that the three categories
are not equal. We leave (iii) for the next section.
i) In a Euclidean domain, the trivial ideal {0} is of course a principal
ideal (as it has only one element). Let E be the Euclidean function in D.
Fix a non-trivial ideal I and pick x ∈ I that minimizes E on I\{0}. Pick any
other y ∈ I. Then by the division algorithm
y = xq + r and E(r) < E(x) .
But since y − xq ∈ I, r is in I, and so E(r) must be zero by the minimality
of x. Hence x generates y.
ii) Suppose x0 is an element of a principal ideal domain D that cannot
be written as a product of irreducibles. Then, clearly, there are non-zero
non-units x1 and y1 so that x0 = x1 y1 . But by definition of x0 , at least one
of x1 and y1 cannot be written as a product of irreducibles. Suppose that is
x1 . Now x1 divides x0 , and we get ⟨x0⟩ ⊊ ⟨x1⟩. We can apply the same
arguments to x1 , and so on. Thus we get what is called an (infinite) ascending
chain of ideals:
⟨x0⟩ ⊊ ⟨x1⟩ ⊊ · · · ⊊ ⟨xn⟩ ⊊ · · · .
We define $I = \bigcup_{i=0}^{\infty} \langle x_i \rangle$. It is easy to see that I is an ideal (Definition 7.9).
But because D is a PID, I must have a single generator p. The element p
must reside in ⟨xn⟩ for some n. Since p ∈ ⟨xn⟩ and xn ∈ I = ⟨p⟩, we get
⟨p⟩ = ⟨xn⟩. Thus the ascending chain must end, contradicting the hypothesis
on x0 , which implies that every element in D can be written as a product
of irreducibles.
It is then sufficient by Theorem 8.9 to show that every irreducible p is
also prime. Let a be an element not in ⟨p⟩ and consider the ideal ⟨p, a⟩. Because
D is a PID, there is a q that generates this ideal: ⟨q⟩ = ⟨p, a⟩. But then we
must have
⟨q⟩ = D ,
because if not, p has a non-trivial divisor q. In particular, we get that there
are x and y so that
1 = px + ay =⇒ ∀b ∈ D : b = pxb + ayb .
But this implies that if p | ab and p ∤ a, then we must have p | b. Thus p is
prime.
Common PID’s are Z and F[x], but these are also Euclidean domains.
Proof. For brevity, we set $\theta = \frac{1 + \sqrt{-19}}{2}$ and denote R = Z[θ ]. The norm of
a + bθ satisfies (see, for example, remark 7.27)
$$N(a + b\theta) = \left(a + \frac{b}{2}\right)^2 + \frac{19 b^2}{4} = a^2 + ab + 5b^2 \in \mathbb{N} \cup \{0\} .$$
We have that the norm of units must be ±1, so
$$\left(a + \frac{b}{2}\right)^2 + \frac{19 b^2}{4} = 1 .$$
Clearly, the only solutions are a = ±1 and b = 0.
By the multiplicative property of the norm, if 2 is reducible we have
2 = xy =⇒ N(2) = 4 = N(x)N(y) .
N(x) and N(y) are natural numbers and not equal to 0 or ±1. The only
solution is N(x) = N(y) = 2 which is easily seen to be impossible. Hence 2
is irreducible. The same reasoning works for 3.
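For completeness, here is one way to see that no element of R has norm 2 or 3. This is our own filling-in of "easily seen", based only on the norm formula above.

```latex
\begin{align*}
N(a + b\theta) &= \left(a + \tfrac{b}{2}\right)^2 + \tfrac{19}{4}\, b^2 , \\
b \neq 0 &\implies N(a + b\theta) \ge \tfrac{19}{4} > 3 , \\
b = 0 &\implies N(a + b\theta) = a^2 \in \{0, 1, 4, 9, \dots\} .
\end{align*}
```

In either case the norm is different from 2 and from 3, so neither N(x) = N(y) = 2 nor N(x) = N(y) = 3 can occur.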
Proposition 8.26. Z[(1 + √−19)/2] is not a Euclidean domain.
8.7. ED, PID, and UFD are Different 165
Figure 32. Points in the red shaded area are a distance less than 1 from
an integer in Z. The blue area maps into the red under x → 2x − √19/4
indicated by the arrow. We note that √19/4 ≈ 1.09 and √3/2 ≈ 0.87.
Proof. Of course, the second part is settled by the previous result. So here
we just prove that R is a PID. So consider any non-zero ideal I in R and pick
a b in I which minimizes the norm N(b) on the non-zero elements of I. Now
assume I is not principal, then certainly bR will not equal I. In this case,
we choose any a ∈ I\bR and investigate what happens. By the absorption
property, we have that
∀p, q ∈ R : ap − bq ∈ I .
We will show, however, that
∃p, q ∈ R : ap − bq ≠ 0 and N(ap − bq) < N(b) , (8.4)
which contradicts our choice of b, and therefore disproves the assumption
that I is not principal. By the multiplicativity of norms, (8.4) will be proved
if N (ap/b − q) < 1. By remark 7.27 then, (8.4) is equivalent to
$$\exists\, p, q \in R : \ ap - bq \ne 0 \quad\text{and}\quad \left| \frac{a}{b}\, p - q \right| < 1 . \qquad (8.5)$$
Clearly, we can choose q so that the real part of ap/b − q is not zero. Then
add a multiple of i√19/2 to q so that the imaginary part of ap/b − q is in
(−√19/4, √19/4]. Note that ap/b − q ≠ 0. If in fact the imaginary part is
in (−√3/2, √3/2) (shaded red in Figure 32), then by subtracting an integer
(in Z) from q we are done. If, however, the imaginary part of ap/b − q lands
in [√3/2, √19/4], then we multiply both p and q by 2 and subtract i√19/2.
One can check (see exercise 8.23) that the complex map g : z → 2z − i√19/4
maps the top blue shaded area in Figure 32 into the area shaded in red. The
argument for the lower blue area is identical.
Theorem 8.28. The set Z[x] is a UFD but not a PID.
8.8. Exercises
Exercise 8.1. Let R be an integral domain. Consider the set
R × {R\{0}} = {(a, b) : a, b ∈ R, b ≠ 0} .
Define an equivalence relation ∼ as follows.
(a, b) ∼ (c, d) if ad = bc .
Frac(R) is the collection of equivalence classes with addition and multipli-
cation:
(a, b) + (c, d) = (ad + bc, bd) and (a, b) · (c, d) = (ac, bd) .
It is not hard (but tedious) to show [22][Chapter 8] that ∼ is indeed an
equivalence and that Frac(R) is the minimal field containing R. Frac(R) is
called the field of fractions or field of quotients of R.
a) Show that addition and multiplication are well-defined in Frac(R).
b) What is the field of fractions of Z?
c) The identity is not used in the definition of Frac(R). What is the “field
of fractions” of the “rng” (see remark 5.24) mZ where m > 1 in N?
d) Why is it necessary to require that R has no zero divisors?
Exercise 8.2. We apply the Euclidean algorithm in Z[√−1] as in Section
8.4. For the notation, see the proof of Proposition 8.16. Suppose ργ⁻¹
falls in the unit square depicted in Figure 33. We have drawn four quarter
circles of radius 1 in the unit square, denoted by a, b, c, and d.
a) Show that we cannot always choose κ = κ1 + iκ2 where κ1 is the floor
of the real part of κ + ργ⁻¹ and κ2 the floor of the imaginary part. (Hint:
Consider the region “northeast” of the quarter circle a.)
b) Compute the coordinates of the points A, B, C, and D indicated in the
figure. (Hint: Because of the symmetries of the figure, the x coordinate of
A equals 1/2, et cetera.)
c) Show that if ργ⁻¹ falls in the interior of the convex shape FACE, then
there are four possible choices for κ so that N(ρ) < N(γ).
d) Estimate the area of the convex shape FACE. (Hint: It is contained in a
square with sides of length BD and it contains a square with sides of length
AC.)
e) Is it possible that there is only one value for κ so that N(ρ) < N(γ)?
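[Figure 33: the unit square with corners 0, 1, 1 + i, and i, showing the quarter circles and the marked points referred to in exercise 8.2.]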
In the following proposition and in exercises 8.3, 8.4, and 8.5, we study
the primes in Z[√−1], called Gaussian primes. Recall that the Gaussian
integers form a Euclidean domain (Proposition 8.16), and so we have unique
factorization and primes and irreducibles are the same (Corollary 8.13). We
use the following notation. C (for “cross”) denotes the set Z ∪ iZ minus
the origin. Recall that the units in Z[√−1] are {±1, ±i} and those in Z
are {±1}. The notation π means a prime in Z[√−1], whereas p means a
positive prime in Z.
Proposition 8.30 (Gaussian Primes). A number π ∈ Z[√−1] is prime if:
i) π ∈ C and |π| equals a prime p in Z with p =₄ 3,
ii) π ∉ C and |π|² equals a prime p in Z with p = 2 or p =₄ 1.
iii) Furthermore, if π is reducible then (i) and (ii) cannot hold. (So “if” can
be replaced by “if and only if”.)
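Read as an "if and only if", Proposition 8.30 gives a simple primality test for Gaussian integers. The following Python sketch applies the two cases literally to a + bi stored as the pair of integers a and b; the function names are illustrative and the integer primality test is plain trial division.

```python
def is_prime_int(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def is_gaussian_prime(a, b):
    """Primality of a + bi according to Proposition 8.30 (i) and (ii)."""
    if a == 0 or b == 0:
        # pi lies on the cross C (or is the origin): case (i)
        m = abs(a) + abs(b)                 # equals |pi| on the cross
        return is_prime_int(m) and m % 4 == 3
    # pi is off the cross: case (ii)
    p = a * a + b * b                       # |pi|^2
    return is_prime_int(p) and (p == 2 or p % 4 == 1)

examples = {
    "3": is_gaussian_prime(3, 0),
    "-7i": is_gaussian_prime(0, -7),
    "5": is_gaussian_prime(5, 0),
    "2": is_gaussian_prime(2, 0),
    "1+i": is_gaussian_prime(1, 1),
    "2+i": is_gaussian_prime(2, 1),
    "3+i": is_gaussian_prime(3, 1),
}
```

For instance, 5 = (2 + i)(2 − i) and 2 = (1 + i)(1 − i) are rejected, while their factors are accepted.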
Exercise 8.4. a) Use exercise 5.21 (c) to show that if p =₄ 1 and p prime
in Z, then there is m such that p | m² + 1.
b) Show that if p =₄ 1, then p is not a prime in Z[√−1]. (Hint: use that
p | (m + i)(m − i).) Also show that 2 is not a prime in Z[√−1].
c) Show that a² + b² ≠₄ 3. (Hint: compute modulo 4.)
d) Show that if a prime p in Z does not have residue 1 or 3 modulo 4, then
p = 2.
e) Use exercise 8.3 (c) and (b) of this exercise to prove Proposition 8.30
(i).
f) Then use exercise 8.3 (d) and (c) and (d) of this exercise to prove part
(ii).
Exercise 8.5. a) Show that for a reducible γ in Z[√−1], N(γ) is not prime
in Z. (Hint: use Corollary 7.26.)
b) Use (a) to show that a reducible γ cannot satisfy Proposition 8.30 (ii).
c) Assume γ in C and γ = αβ up to units. Show that if α and β are in C,
then |γ| is not prime in Z.
d) Assume γ in N and γ = αβ up to units and that α and β are not in C.
Show that if γ = p, then |α| = |β|, and therefore α and β are conjugates. (Hint: use
Corollary 7.26.) Show that this implies that N(γ) has the form $a^2 + b^2$.
e) Show that (c) and (d) and exercise 8.4 (c) imply that γ cannot satisfy
Proposition 8.30 (i).
f) Extend the reasoning in (d) and (e) to all of C.
Exercise 8.6. Again, we consider numbers in the ring R = Z[√−1].
a) Show that if $b^n - 1$ is prime in Z[√−1], then b − 1 is a unit.
b) Use (a) to show that b must be 2 or 1 ± i.
c) Use Proposition 8.30 (i) to show that if b = 2, we obtain the usual
Mersenne primes (Definition 5.13) as primes in Z[√−1].
d) Show that if n is not prime, then $b^n - 1$ is not prime. (Hint: as in
exercise 1.14 (i).)
e) Show that
$$N((1 \pm i)^n - 1) = 2^n - 2^{1+\frac{n}{2}} \cos\frac{n\pi}{4} + 1 .$$
(Hint: $(1 \pm i) = 2^{1/2} e^{\pm i\pi/4}$ and $e^{i\varphi} + e^{-i\varphi} = 2\cos\varphi$.)
f) Show that $(1 \pm i)^n - 1$ is prime if and only if its norm is prime and n is
odd. (Hint: use (d) to show that n must be odd, and then Proposition 8.30.)
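The norm formula in part (e) can be checked numerically before proving it. The sketch below is an illustration only: it represents a Gaussian integer as a pair (real, imaginary), and the names `gauss_mul`, `norm_direct`, and `norm_formula` are our own. It compares exact integer arithmetic with the closed form.

```python
import math


def gauss_mul(x, y):
    """Multiply two Gaussian integers given as (real, imaginary) pairs."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)


def norm_direct(n, sign=1):
    """Norm of (1 + sign*i)^n - 1, computed with exact integers."""
    z = (1, 0)
    for _ in range(n):
        z = gauss_mul(z, (1, sign))
    a, b = z
    return (a - 1) ** 2 + b ** 2


def norm_formula(n):
    """Closed form 2^n - 2^(1 + n/2) cos(n pi / 4) + 1 (floating point)."""
    return 2 ** n - 2 ** (1 + n / 2) * math.cos(n * math.pi / 4) + 1


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, norm_direct(n), round(norm_formula(n)))
```

The two columns printed by the loop should agree for both choices of sign, since the cosine is an even function.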
Exercise 8.11. a) Prove Lemmas 2.5 and 2.6 for a Euclidean domain.
b) Theorem 8.5 follows immediately from the absence of zero divisors
(Definition 8.4). In Chapter 2, we take the absence of zero divisors in Z for
granted. Why do we need Euclid’s Lemma (Lemma 2.6), whose proof
uses the division algorithm, to prove Theorem 2.7? (Hint: does the
cancellation take place in Z?)
Exercise 8.14. a) Which ones of the sets in exercise 5.24 are integral do-
mains?
b) Euclidean domains?
Exercise 8.15. a) Show that ±1 and ±1 ± √2 are units of Z[√2]. (Hint:
see Lemma 8.31.)
b) Show if α is a unit, then for all n ∈ Z, $\alpha^n$ is a unit.
c) Show that Z[√2] has infinitely many units.
d) Find solutions of the quadratic equation $a^2 - 2b^2 = \pm 1$. (Note: an
equation of the form $a^2 - db^2 = 1$ where d is square free, is called Pell’s
equation.)
One can show that the set of units of Z[√2] is $\{\pm(1 + \sqrt{2})^n : n \in \mathbb{Z}\}$.
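Parts (b) through (d) can be explored with a few lines of Python. In this sketch (an illustration, with our own names `mul`, `norm`, and `unit_powers`), the number a + b√2 is stored as the integer pair (a, b), and we list the first positive powers of 1 + √2 together with their norms $a^2 - 2b^2$.

```python
def mul(x, y):
    """Product of a + b*sqrt(2) and c + d*sqrt(2), as integer pairs."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)


def norm(x):
    """Norm a^2 - 2 b^2 of a + b*sqrt(2)."""
    a, b = x
    return a * a - 2 * b * b


def unit_powers(count):
    """Return (1 + sqrt(2))^k for k = 1, ..., count."""
    units = []
    u = (1, 0)
    for _ in range(count):
        u = mul(u, (1, 1))
        units.append(u)
    return units


if __name__ == "__main__":
    for a, b in unit_powers(6):
        print(a, b, norm((a, b)))
```

Every pair (a, b) produced this way solves $a^2 - 2b^2 = \pm 1$, with the sign alternating between odd and even powers.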
Exercise 8.16. Given the ring R = Z[√10].
a) Show that there is no α ∈ R with N(α) = ±2. (Hint: write α = a + b√10
and try to solve for the coefficients of α in $\mathbb{Z}_{10}$.)
b) Show that there is no α ∈ R with N(α) = ±5. (Hint: write α = a +
b√10. Then in $\mathbb{Z}_5$, show that $a =_5 0$. It follows that $25k^2 - 10b^2 = \pm 5$.
Divide by 5 and solve in $\mathbb{Z}_5$.)
c) Use (a) and (b) to show that 2 and 5 are irreducible. (Hint: assume that
2 = αβ, show that then N(α) = ±2, et cetera.)
d) Use (a) and (b) to show that √10 is irreducible.
e) Show that Z[√10] is not a Euclidean domain. (Hint: Show that 10 does
not have unique factorization.)
Exercise 8.17. Given a field F, we form the ring F[x] of polynomials. For
this exercise, read Section 3.7 again.
a) Use exercise 7.1 to show that the ring F[x] is a Euclidean domain with
the degree d (of the polynomial) as a Euclidean function.
b) What goes wrong in (a) if F = Z? (Hint: give a counter-example.)
c) What are the “primes” in F[x]? (Hint: see Proposition 7.5 and Corollary
8.13.)
d) Is $p_1(x) = x^2 + 1$ reducible over C, R, or Q? What about $p_2(x) = x^2 - 2$?
e) Show that the degree in R[x] is an additive function if R is a domain.
Exercise 8.19. Define the product of ideals A and B as the smallest ideal
containing $\{a_i b_i : a_i \in A, b_i \in B\}$.
a) Show that AB must contain $\{\sum_{i=1}^{k} a_i b_i : a_i \in A, b_i \in B, k \in \mathbb{N}\}$.
b) Show that the set in (a) is an ideal.
c) Suppose A is generated by $\{x_i\}$ and B by $\{y_j\}$. Show that AB is the ideal
generated by $\{x_i y_j\}$.
d) Use (c) to show that for I and J as in exercise 7.5, IJ = ⟨6, x⟩. (Hint: $x^2$
is in ⟨x⟩, and so forth.)
Exercise 8.20. a) Show that 2 and 1 ± i√3 are irreducible in Z[√−3].
(Hint: follow the proof of Proposition 8.19.)
b) Use (a) to show that up to units, there are two factorizations in Z[√−3]
of 4 (see equation (8.3)).
c) Use equation (8.3) to show that 4 is not prime.
d) Show that 2 and (1 ± √−3) are not prime. (Hint: see Proposition 8.3.)
e) Conclude that 4 does not admit any factorization into primes in Z[√−3].
f) Show that 2 and (1 ± √−3) are prime in $\mathbb{Z}[\frac{1}{2}(1 + \sqrt{-3})]$.
Exercise 8.21. a) Modify the first part of the proof of Proposition 8.22 to
show that the norm is a Euclidean function for Z[√−1] and Z[√−2] but
not for Z[√−n] for n ≥ 3.
b) Modify the second part of the proof of Proposition 8.22 to show that
the distance to the nearest lattice point of $\mathbb{Z}[\frac{1}{2}(1 + \sqrt{j})]$ is less than 1 if
j ∈ {−11, · · · , −1}. (Hint: the height y of the equidistant point in the triangle
on the left of Figure 31 must be such that d − y < 1 where $d = \frac{1}{2}\sqrt{|j|}$.)
c) Show that with Lemma 7.23, this implies that the norm is a Euclidean
function for the integers of $\mathbb{Q}[\sqrt{j}]$ where j ∈ {−11, −7, −3, −2, −1}.
Exercise 8.22. Use Definition 7.9 to show that I in part (ii) of the proof of
Theorem 8.24 is an ideal.
Exercise 8.23. Consider the map g : C → C, defined in the proof of part
(ii) of Theorem 8.27.
a) Show that $g(\frac{\sqrt{19}}{4}) = 0$.
b) Show that $-\frac{\sqrt{3}}{2} < g(\frac{\sqrt{3}}{2}) < 0$.
c) Show that (a) and (b) imply that g maps the blue region in Figure 32 into
the red region.
Ergodic Theory
Overview. This time we venture seemingly very distant from number the-
ory. The reason is that we wish to investigate what properties “typical” real
numbers have. By “typical” we mean “almost all”; and to define “almost
all”, we would need to delve fairly deeply into measure theory, one of the
backbones of abstract analysis. In this chapter, we will point to the tech-
nical problems that need to be addressed, and then quickly state the most
important result (the Birkhoff ergodic theorem). In Chapter 10 we will then
move to the implications for number theory. The proof of the Birkhoff er-
godic theorem will be postponed to Chapter 14. We remark that ergodic
theory was to a large extent inspired by a problem that arose in 19th century
physics [25, 39], namely how to describe statistical behavior of a
deterministic dynamical system. Broadly speaking, an ergodic dynamical system
explores all parts of the available space with equal probability, allowing
quantitative predictions for the long term behavior of such a system. The discussion
whether or not ‘physical’ systems tend to be ergodic has had a profound
impact on science, in particular physics [25, 39]. The use of probabilistic
methods to study number theory is often referred to as probabilistic number
theory.
1.26). The notion of length works perfectly well for simple sets such as
intervals. But if we want to consider more general sets – such as Cantor
sets — it is definitely very useful to have a more general notion of length,
which we denote by measure. However, there is a difficulty in formulating
a rigorous mathematical theory of measure for arbitrary sets. The source
of the difficulty is that there are, in a sense, too many sets. Recall that the
real line is uncountable (see Theorem 1.23). The collection of subsets of
the line is in fact the same as the power set (Definition 1.30) P(R) of the
real line. And thus the cardinality of the collection of subsets is strictly
larger than that of the real numbers (Theorem 1.31), making it a truly very
big set.
A reasonable theory of measure for arbitrary subsets of R should have
some basic properties that are consistent with intuitive notions of “length”.
If we denote the measure of a set A by µ(A), then we would like µ to have
the following properties.
1) µ : P(R) → [0, ∞].
2) For any interval I: µ(I) equals the length of I.
3) µ is translation invariant.
4) For a countable collection of disjoint sets $A_i$: $\mu(\cup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} \mu(A_i)$.
The problem is that no such function exists. Among all the possible sets, we
can construct an — admittedly pretty weird — set for which the last three
properties cannot simultaneously hold.
To explain this more easily, let us replace R by the circle S = R/Z.
Now define an equivalence relation (Definition 1.27) in S as follows: a ∼ b
if a − b is rational. Each element of S clearly belongs to some equivalence
class (it is equivalent to itself), and cannot belong to two distinct equiva-
lence classes, because if a ∼ b and a ∼ c, then also the difference between
b and c is rational, and hence they belong to the same class. Note that each
equivalence class is countable, and so (see exercise 1.9) there are uncount-
ably many equivalence classes.
For every one of these equivalence classes, we pick exactly one rep-
resentative. The union of these representatives forms a set V . Now by
requirement (1), any set, no matter how exotic its construction, should have
a measure that is a real number. We choose V as our set. Let r : N → Q
be a bijection between N and the rationals in S. Consider the union of the
translates
$$\cup_{i=1}^{\infty} (V + r_i) .$$
By definition of V , this union covers the entire circle. So by requirement (2)
above, its measure is 1. By requirement (3), each of the translates of V must
have the same measure, ε. Since by the previous paragraph, the translates
of V are disjoint, requirement (4) implies that
$$1 = \sum_{i=1}^{\infty} \varepsilon ,$$
which is clearly impossible!
The construction of the set V just outlined is a little vague. It is not clear
at all how exactly we could choose an individual representative, much less
how we could achieve that feat for each of the uncountably many equiva-
lence classes. If we wanted to draw a picture of the set V , we’d get nowhere.
Does this construction V really exist as an honest set? It turns out that one
needs to invoke the axiom of choice1 to make sure that V exists.
The consensus in current mathematics (2020) is to accept the axiom
of choice. One consequence of that is that if we want to define a measure,
then at least one of those four requirements above needs to be dropped or
weakened. The measure theoretic answer to this quandary is to restrict the
collection of sets for which we can determine a measure. This means that,
of the properties (1) through (4), we restrict property (1) to hold only for
certain sets. These are called the measurable sets.
One can work out [7] that the collection of Lebesgue measurable sets is
also closed under complementation, countable intersection, and countable
union. Furthermore, any open set in R is a countable union of disjoint open
intervals [33] (see also exercise 9.4). As a consequence of these facts, we
have the following result.
Proposition 9.3. i) A set S ⊂ R is Lebesgue measurable if and only if there
exist closed sets $C_i \subseteq S$ such that
$$\mu_{out}(S \setminus \cup_{i=1}^{\infty} C_i) = 0 .$$
ii) A set S ⊂ R is Lebesgue measurable if and only if there exist open sets
$O_i \supseteq S$ such that
$$\mu_{out}(\cap_{i=1}^{\infty} O_i \setminus S) = 0 .$$
Proof. First observe that every closed set is the complement of an open set
and vice versa. Since complementation preserves the Lebesgue measurable
sets (by definition 9.2), (i) and (ii) are equivalent.
Definition 9.2 implies that for a measurable set S the following holds.
For all ε > 0, there are countably many disjoint open intervals $I_i$ such that
$$\mu_{out}(S) \le \sum_i \ell(I_i) < \mu_{out}(S) + \varepsilon .$$
Thus µ is a function from the measurable sets to the positive reals and
the measurable sets are constructed so that properties (2), (3), and (4) in
Section 9.1 hold. We summarize this as follows.
Corollary 9.5. The Lebesgue measure µ on R or R/Z satisfies the follow-
ing properties
1) µ : measurable sets → [0, ∞].
2) For any interval I: µ(I) equals the length of I.
3) µ is translation invariant.
4) For a countable collection of disjoint sets $A_i$: $\mu(\cup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} \mu(A_i)$.
We remark that part (4) of this result implies that in general sub-additivity
holds:
$$\mu(\cup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} \mu(A_i) . \qquad (9.1)$$
The reason is that (4) says the measure of the union equals the sum of the
measures of the disjoint “new” parts $A_i^n$ of $A_i$, i.e. $A_i$ minus the intersection
of $A_i$ with the $A_j$ where j < i. Since $A_i^n \subseteq A_i$, we have $\mu(A_i^n) \le \mu(A_i)$.
Hence the sub-additivity.
We need some more technical terms. If we have a space X and a collection
Σ of measurable sets, then the pair (X, Σ) is called a measurable space.
A function f : X → X is called measurable if the inverse image under f of
partitions the range of f into small pieces $[y_i, y_{i+1}]$. For each such layer, the
contribution is the measure of the inverse image $f^{-1}(\{y : y \ge y_{i+1}\})$ times
$y_{i+1} - y_i$. Sets of measure zero are neglected. Summing all contributions,
one obtains an approximation of the Lebesgue integral (see Figures 35 and
70). The Lebesgue integral itself is defined as the limit (if it exists) of
these. The Lebesgue integral of a not necessarily non-negative function f
is computed by splitting up f into its non-negative part $f^+$ and its negative
part $f^-$, so that $f = f^+ + f^-$. The integral of f is then defined as
$$I = \int f^+ \, d\mu - \int (-f^-) \, d\mu .$$
We’ll see in Section 14.1 that the domains of $f^+$ and $f^-$ are measurable
so that this operation is well-defined. A function f is called integrable, or
µ-integrable for clarity, if $\int |f| \, d\mu$ exists and is finite. It turns out that the
Lebesgue integral generalizes the Riemann integral⁴ we know from calculus
(see exercise 9.6).
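The idea of partitioning the range rather than the domain can be imitated numerically. The following Python sketch is a rough illustration under strong assumptions: f is a well-behaved function on [0, 1] with values in [0, y_max], and the measure of each set $\{x : f(x) \ge y_{i+1}\}$ is merely estimated by counting sample points, which is our simplification and not part of the definition. The name `lebesgue_sum` is hypothetical.

```python
def lebesgue_sum(f, layers=400, samples=2000, y_max=1.0):
    """Approximate the integral of f over [0, 1] by summing horizontal layers.

    The range [0, y_max] is cut into `layers` pieces [y_i, y_{i+1}].
    Each layer contributes (estimated measure of {f >= y_{i+1}}) * (y_{i+1} - y_i).
    """
    xs = [(k + 0.5) / samples for k in range(samples)]   # sample points in [0, 1]
    values = [f(x) for x in xs]
    dy = y_max / layers
    total = 0.0
    for i in range(layers):
        y_next = (i + 1) * dy
        # Fraction of sample points in the inverse image of [y_next, infinity).
        measure = sum(1 for v in values if v >= y_next) / samples
        total += measure * dy
    return total


if __name__ == "__main__":
    print(lebesgue_sum(lambda x: x * x))   # compare with the exact value 1/3
```

For a continuous function such as $x^2$ this agrees with the Riemann integral up to the discretization error, in line with the remark that the Lebesgue integral generalizes the Riemann integral.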
This level of technical sophistication means that the fundamental theorems
in measure theory require a substantial mastery of the formalism.
Since pursuing all the technicalities would take a considerable effort and
would lead us well and far away from number theory, we will suppress
those details in this chapter. However, proofs will be completed in Chapter
14.
⁴ Recall that the Riemann integral is approximated by partitioning the domain of f, see Figure 35.
[Figure: a map F from X to Y, a set B, and the measures (F ν)(B) and $\nu(F^{-1}(B))$.]
Somewhat confusingly, this last result is often also called the Birkhoff er-
godic theorem. We will also adhere to that usage, just so that we can avoid
saying “the corollary to the Birkhoff ergodic theorem” on many occasions.
This corollary really says that a transformation is ergodic if and only if time
averages equal spatial averages. This is a very important result because, as
we will see, spatial averages are often much easier to compute.
One can furthermore prove that the set of invariant probability mea-
sures is non empty and every invariant measure is a convex combination
of ergodic measures [38][chapter 8]. This says that, in a sense, ergodic
measures are the building blocks of chaotic dynamics. If we find ergodic
behavior with respect to some measure µ, then we understand the statisti-
cal behavior for almost all points with respect to µ. There may be other
complicated behavior but this is “negligible” if you measure it with µ.
We start with the measure $\delta_0$ that assigns (full) measure 1 to the point
0 and measure 0 to any (measurable) set not containing 0. As we can see in
Figure 38, for any set S
$$0 \in S \iff 0 \in T^{-1}(S) .$$
Thus $\delta_0(S) = \delta_0(T^{-1}(S))$, that is: $\delta_0$ is T-invariant. Since any T-invariant
set either contains the point 0 or not, such a set trivially has measure either
Proof. Note that T restricted to the interval A = [0, 1] is just the doubling
map. Observe also that the complement $S^c$ must also be T invariant.
Suppose that $S^c$ contains an interval J of positive length and choose
an interval I so that I ∩ S is not empty. Since S is invariant, we have that
for all n > 0, $T^{-n}(S)$ is contained in S. If we can show that these
pre-images are dense in A, then they must intersect the interval J and we have a
contradiction.
The inverse image $T^{-1}(I)$ is:
$$\frac{I+0}{2} \cup \frac{I+1}{2} = (\{0.0\} \cup \{0.1\}) + 2^{-1} I ,$$
where the expressions 0.0 and 0.1 are binary (base 2), so that $0.1 = \frac{1}{2}$.
Iterating this procedure, we get
$$T^{-2}(I) = (\{0.00\} \cup \{0.01\} \cup \{0.10\} \cup \{0.11\}) + 2^{-2} I .$$
Similarly, the nth iterate gives all the expressions in base 2 of length n. This
is a collection of $2^n$ regularly spaced copies of $2^{-n} I$. Clearly, the union of
these over n is dense and so must intersect J.
⁵ The set (0, 1] has measure 0 with respect to $\delta_0$. Corollary 9.10 tells us to neglect such sets. Thus we
must take x = 0, and then the summation also gives 0.
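The structure of these pre-images is easy to reproduce with exact rational arithmetic. The sketch below is an illustration with names of our own choosing (`doubling`, `preimages`); an interval is stored as a pair of `Fraction` endpoints inside [0, 1].

```python
from fractions import Fraction


def doubling(x):
    """The doubling map x -> 2x modulo 1."""
    return (2 * x) % 1


def preimages(interval, n):
    """The 2^n intervals making up the n-th pre-image of `interval`.

    Each piece is a copy of 2^(-n) * I shifted by k / 2^n, k = 0, ..., 2^n - 1.
    """
    a, b = interval
    scale = Fraction(1, 2 ** n)
    return [(k * scale + a * scale, k * scale + b * scale) for k in range(2 ** n)]


if __name__ == "__main__":
    I = (Fraction(1, 4), Fraction(1, 2))
    for piece in preimages(I, 2):
        print(piece)
```

The total length of the pieces equals the length of I, which reflects the invariance of the Lebesgue measure under the doubling map, and the pieces are spaced $2^{-n}$ apart, so their union over n is dense.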
This result implies that if S ⊂ [0, 1] is an invariant set and its complement
in [0, 1], $S^c$, is not empty, then neither can contain an interval. This is
equivalent to the following.
Corollary 9.14. If S ⊂ A is a T invariant set containing an interval, then
S = A.
For now note that both A and B are T invariant sets and $\mu_A(A) = 1$
while $\mu_A(B) = 0$. We check Corollary 9.10 again. Let f be
$$f(x) = \begin{cases} 0 & \text{if } x \in [0, \frac{1}{2}) \\ \alpha & \text{if } x \in [\frac{1}{2}, 1] \end{cases}$$
For arbitrary x in [0, 1], we expect $T^i(x)$ to hit the interval $[0, \frac{1}{2}]$ half the time
on average. So the sum should give $\frac{\alpha}{2}$. Indeed, if we compute the integral
$\int f \, d\mu_A$, that is what we obtain.
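This time average can be simulated, with one caveat: iterating x → 2x mod 1 in floating point loses one binary digit per step and collapses to 0 after a few dozen iterations. The sketch below therefore uses exact integer arithmetic on a starting point with many random binary digits. This is our own illustrative setup (the name `time_average`, the seed, and the number of digits are assumptions), and a pseudo-random starting point only stands in for a "typical" x.

```python
import random


def time_average(alpha, steps, seed=0):
    """Average of f along an orbit of the doubling map, f = alpha on [1/2, 1]."""
    rng = random.Random(seed)
    bits = steps + 64                 # enough binary digits for all iterations
    x = rng.getrandbits(bits)         # the point is x / 2**bits in [0, 1)
    modulus = 1 << bits
    half = modulus >> 1
    total = 0.0
    for _ in range(steps):
        if x >= half:                 # current point lies in [1/2, 1)
            total += alpha
        x = (2 * x) % modulus         # doubling map, exact
    return total / steps


if __name__ == "__main__":
    print(time_average(alpha=3.0, steps=10000))   # expected to be close to 1.5
```

The printed value should be close to $\frac{\alpha}{2}$, the spatial average of f with respect to $\mu_A$.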
Now we turn to an at first sight very strange and counter-intuitive
example. In the unit interval, we consider the set of x with all possible
binary expansions, but now we construct a measure $\nu_p$ that assigns a measure
p ∈ (0, 1) to “0”, and 1 − p to “1”. In effect this amounts to assigning a
measure p to the interval $[0, \frac{1}{2}]$ and 1 − p to $[\frac{1}{2}, 1]$. The interesting case is of
course when $p \neq \frac{1}{2}$. So that is what we will assume.
Continuing the construction of the measure $\nu_p$, the set of sequences
starting with 00 get assigned a measure $p^2$; the ones starting with 01, a
measure p(1 − p); 10, a measure (1 − p)p; and 11, a measure $(1 - p)^2$. The
sum of these is 1. We now keep going ad infinitum, always keeping the sum
of the measures equal to 1, see Figure 39. So $\nu_p$ is a probability measure.
The same reasoning as in Proposition 9.13 shows that an interval I
consisting of points whose binary expansion starts with $a = a_1 a_2 \cdots a_n$ has
as pre-image the interval $I_0$ consisting of points whose expansion starts with
0a and $I_1$ where the expansion starts with 1a.
$$\nu_p(A_0) + \nu_p(A_1) = p\,\nu_p(A) + (1 - p)\,\nu_p(A) = \nu_p(A) ,$$
and the measure $\nu_p$ is T invariant.
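For intervals given by a finite binary word, the measure $\nu_p$ and the invariance identity above can be written out directly. In this sketch (an illustration; the name `nu_p` and the use of strings of "0" and "1" for words are our own conventions) exact fractions avoid any rounding.

```python
from fractions import Fraction


def nu_p(word, p):
    """Measure of the set of points whose binary expansion starts with `word`."""
    measure = Fraction(1)
    for digit in word:
        measure *= p if digit == "0" else 1 - p
    return measure


if __name__ == "__main__":
    p = Fraction(1, 3)
    a = "01"
    print(nu_p(a, p))                              # measure of the interval itself
    print(nu_p("0" + a, p) + nu_p("1" + a, p))     # measure of its pre-image
```

The two printed values coincide, which is the T invariance of $\nu_p$ on intervals of this type.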
Figure 39. The first two stages of the construction of the singular measure $\nu_p$.
This means that $\nu_p$-almost all x land in $[0, \frac{1}{2}]$ a fraction p of the time on
average. Thus the set of points that land in $[0, \frac{1}{2}]$ on average a fraction q ≠ p of
the time has $\nu_p$ measure zero. But those have full $\nu_q$ measure.
Note that the binary expansion of the $\nu_p$ typical (that is: in a subset having
full measure) x has on average a fraction of exactly p zeros.
9.6. Exercises
Exercise 9.1. Reformulate the counter example in Section 9.1 as a counter
example in R. (Hint: two numbers in [0, 1] are equivalent if their difference
is rational. Let V be a set that contains exactly one representative of each
class. Let R be the set of rationals in [−1, 2]. Then consider the union
$\cup_{r \in R} (V + r)$. Show that it should have measure between 1 and 3.)
Exercise 9.2. a) Show there is an open set in [0, 1] of arbitrarily small outer
measure that contains all the rationals in [0, 1].
b) Show there is a closed set in [0, 1] of measure greater than 1 − ε that
contains only irrational numbers.
Exercise 9.3. a) Show that countable sets have Lebesgue measure zero.
(Hint: use the Definition 9.4 and Corollary 9.5 (4).)
b) What is the Lebesgue measure of the following sets: the rationals in
[0, 1], the algebraic numbers in [0, 1], the transcendental numbers in [0, 1],
and the irrational numbers in [0, 1]?
Exercise 9.4. Show that any open set O in R is a finite or countable union
of disjoint open intervals. (Hint: for every x ∈ O there is an open interval
(a, b) ⊆ O that contains x. Now let α = inf{a : (a, b) ⊆ O , x ∈ (a, b)} and
similar for β . This way we obtain a partitioning of O into open intervals.
Each such interval must contain a rational number.)
In the next exercise, we prove the following Lemma.
Lemma 9.17. i) Any set in a probability space X with outer measure zero
is Lebesgue measurable with Lebesgue measure zero.
ii) A countable union of measure 0 sets has measure 0.
Exercise 9.5. a) Show that the empty set has measure zero. (Hint: X and
∅ are disjoint. Use criterion (4) in Section 9.1.)
b) Prove part (i) of the lemma for a non empty set. (Hint: a non empty set
contains a point which is a Borel set; now apply Definition 9.2.)
c) Prove part (ii) of the lemma. (Hint: use equation (9.1).)
Exercise 9.6. Let X = [0, 1], E the set of irrational numbers in X, and µ
the Lebesgue measure.
a) Use exercise 9.3 to show that $\int_E d\mu = 1$. (Hint: approximate the
Lebesgue integral as in Section 9.1.)
b) Show that the Riemann integral $\int_E dx$ is undefined. (Hint: look up the
exact definition of Riemann integral.)
Exercise 9.7. Construct the middle third Cantor set C ⊆ [0, 1] in the
following way (Figure 40). At stage 0, take out the open middle third
interval of the unit interval. At stage 1, take out the open middle third
interval of the two remaining intervals. At stage n, take out the open middle
third interval of each of the $2^n$ remaining intervals. The set C consists of
the points that are not removed. See also exercise 1.11.
a) Show that C consists of all points $x = \sum_{i=1}^{\infty} a_i 3^{-i}$ where $\{a_i\}_{i=1}^{\infty}$ are
arbitrary sequences in $\{0, 2\}^{\mathbb{N}}$.
Figure 40. The first two stages of the construction of the middle third
Cantor set. The shaded parts are taken out.
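The stages of this construction can be generated exactly. The sketch below is an illustration (the name `cantor_stage` is ours); it follows the numbering of exercise 9.7, where stage 0 already removes the middle third of the unit interval, so that $2^{n+1}$ closed intervals remain after stage n.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals that remain after stages 0, 1, ..., n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n + 1):
        remaining = []
        for a, b in intervals:
            third = (b - a) / 3
            remaining.append((a, a + third))      # left third is kept
            remaining.append((b - third, b))      # right third is kept
        intervals = remaining
    return intervals


if __name__ == "__main__":
    stage = cantor_stage(1)
    print(len(stage), sum(b - a for a, b in stage))
```

The total length left after stage n is $(2/3)^{n+1}$, which tends to 0. A point such as 1/4, whose base 3 expansion 0.020202... uses only the digits 0 and 2, stays inside one of the remaining intervals at every stage.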
Exercise 9.8. Construct the set C ⊆ [0, 1] in the same way as in exercise
9.7, but now at stage n, take out (open intervals of) an arbitrary fraction
$m_n \in (0, 1)$ of each of the remaining intervals.
a) Show that C is non-empty. (Hint: find a point that is never taken out.)
b) Let $m_i = 1 - e^{-\alpha^i}$ for some α ∈ (0, 1). Compute the Lebesgue measure
of C and its complement. (Hint: at every stage, consider the length of the
set that is left over. You should get $e^{-\alpha/(1-\alpha)}$.)
We remark that Cantor sets with positive measure such as those in exercise
9.8 are sometimes called fat Cantor sets.
Exercise 9.9. a) Show that the Borel sets contain the closed sets. (Hint: a
closed set is the complement of an open set.)
b) Show that the middle third Cantor set (see exercise 9.7) is a Borel set.
c) Show that the Cantor sets of exercise 9.8 are Borel sets.
d) Show the sets in (a), (b), and (c) are measurable.
e) Show that the complements of the sets in (d) are measurable.
Exercise 9.10. Construct the Cantor function c : [0, 1] → [0, 1], also called
Devil’s staircase, as follows. See also exercise 9.7.
a) Start with stage 0: c(0) = 0 and c(1) = 1. At stage 1, set $c(x) = \frac{1}{2}$ if
$x \in [\frac{1}{3}, \frac{2}{3}]$.
b) At stage 2, set $c(x) = \frac{1}{4}$ if $x \in [\frac{1}{9}, \frac{2}{9}]$ and $c(x) = \frac{3}{4}$ if $x \in [\frac{7}{9}, \frac{8}{9}]$.
c) Use a computer program to draw 5 or more stages. c(x) is the continuous
function that is the limit of this process.
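One possible starting point for part (c) is sketched below. Instead of drawing the stages one by one, this illustration evaluates the limit function through the base 3 digits of x, truncated after `depth` digits; the name `cantor_function` and the truncation depth are our own choices. Plotting the values on a fine grid with any plotting tool then shows the staircase.

```python
from fractions import Fraction


def cantor_function(x, depth=40):
    """Approximate c(x) using at most `depth` base 3 digits of x in [0, 1]."""
    if x >= 1:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x = x * 3
        digit = int(x)                    # next base 3 digit of x
        x = x - digit
        if digit == 1:
            # x lies in a removed middle third: c is constant there.
            return value + weight
        value += weight * (digit // 2)    # digit 0 -> binary 0, digit 2 -> binary 1
        weight /= 2
    return value


if __name__ == "__main__":
    for k in range(10):
        x = Fraction(k, 9)
        print(x, cantor_function(x))
```

Passing `Fraction` arguments keeps the digit extraction exact; with floating point input the digits are only reliable for moderate depth.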
Exercise 9.11. See exercise 9.10 for the definition of the Cantor function,
c(x).
a) Use exercise 9.7 (a) to show that for x in the Cantor set
$$x = \sum_{i=1}^{\infty} a_i 3^{-i} \implies c(x) = \sum_{i=1}^{\infty} \frac{a_i}{2} \, 2^{-i} .$$
b) Show that on any interval not intersecting the Cantor set c is constant.
c) Show that c : [0, 1] → [0, 1] is onto.
d) Show that c is non-decreasing.
e) Show that c(x) is continuous. (Hint: find a proof that a non-decreasing
function from an interval onto itself is continuous.)
Since c is increasing, we can interpret it as a cumulative distribution func-
tion. The measure µ of [a, b] ⊆ [0, 1] equals c(b) − c(a). If [a, b] is inside
any of the flat parts, then its measure equals zero. Thus the measure of the
complement of the Cantor set is zero, and all measure is concentrated on
the Cantor set.
Exercise 9.12. Find the Lebesgue decomposition (Theorem 9.15) of c in
exercise 9.11 interpreted as a measure. Explain!
The equation in item (c) of exercise 9.13 holds in the case where the func-
tion c admits a derivative everywhere.
Exercise 9.14. Consider the map t : [0, 1] → [0, 1] given by t(x) = {10x},
the fractional part of 10x.
a) Show that the Lebesgue measure dx is invariant under t.
b) Prove Corollary 9.14.
c) Use (b) to show that if an invariant set contains an interval, then it equals
[0, 1].
d) Show that the frequency with which $t^n(x)$ visits the interval I = [0.358, 0.359)
equals the frequency with which 358 occurs in the decimal expansion of x (if
that average exists).
e) Assuming ergodicity, show that for Lebesgue almost every x, that
average equals $10^{-3}$. (Hint: use the corollary to Birkhoff’s theorem with
f(x) = 1 on I and 0 elsewhere.)
Exercise 9.16. a) Show that there exist x in whose decimal expansion the
word “358” occurs more often than in almost all other numbers (see exer-
cise 9.14 (d)).
b) Show that the frequency of occurrences of “358” in the decimal expan-
sion of a number x does not necessarily exist.
c) What is the measure of the set of numbers referred to in (a) and (b)? (Hint:
use Birkhoff’s theorem and its corollary.)
Exercise 9.17. a) Fix b > 1 and let w be any finite word in $\{0, 1, \cdots, b - 1\}^{\mathbb{N}}$
of length n. Show that for almost all x, the frequency with which that word
occurs in the expansion in base b equals $b^{-n}$. (Hint: follow the reasoning
in exercise 9.14.)
b) The measure of the set of x for which that frequency is not $b^{-n}$ is zero.
Exercise 9.18. Use exercise 9.17, Corollary 1.24, and Lemma 9.17 to show
that the set of words not normal in base b has measure 0.
Exercise 9.19. Show that the set of absolutely normal numbers has full
measure. (Hint: follow the reasoning of exercise 9.18.)
Exercise 9.20. a) Show that the set of numbers that are not normal in base
b > 2 is uncountable. (Hint: words with a missing digit are a subset of
these; see exercise 9.7.)
b) Repeat (a), but now for base 2. (Hint: rewrite in base 4 with digits 00,
01, 10, and 11; follow (a).)
Exercise 9.21. a) Show that the set of absolutely normal numbers is dense.
(Hint: follows from exercise 9.19.)
b) Show that numbers with finite expansion in base b are non-normal in
base b.
c) For any b > 1, show that the set of non-normal numbers in base b is
also dense. (Hint: pick any number and approximate it.)
Overview. In this chapter, we consider the three maps from [0, 1) to itself
that are most important for our understanding of the statistical properties of
real numbers. They are: multiplication by an integer n modulo 1, rotation
by an irrational number, and the Gauss map that we discussed in Chapter 6.
In doing this, we review three standard techniques to establish ergodicity.
In this chapter we restrict all measures, transformations, and so on to live in
one dimension ([0, 1) or R/Z).
196 10. Three Maps and the Real Numbers
[Figure: a small interval dy and its pre-images $T^{-1}(dy)$.]
is of course equal to the length of $\left|\frac{d}{dy} T^{-1}(y)\right| dy$. Since
$$\left|\frac{d}{dy} T^{-1}(y)\right| dy = \frac{dy}{|T'(x)|} ,$$
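Proof. We already proved item (i). For item (ii), notice that
$$\nu([0, x]) = \frac{1}{\ln 2} \int_0^x \frac{1}{1 + s} \, ds = \frac{1}{\ln 2} \ln(1 + x) ,$$
so ν([0, 1]) = 1 and ν is a probability measure. It is easy to check (see
also Figure 17) that the inverse image under T of [0, x] is the union of the
intervals $[\frac{1}{a+x}, \frac{1}{a}]$, and so
$$\begin{aligned}
\nu(T^{-1}([0, x])) &= \frac{1}{\ln 2} \sum_{a=1}^{\infty} \left( \ln\frac{a+1}{a} - \ln\frac{a+1+x}{a+x} \right) \\
&= \frac{1}{\ln 2} \sum_{a=1}^{\infty} \left( \ln\frac{a+x}{a} - \ln\frac{a+1+x}{a+1} \right) \\
&= \frac{1}{\ln 2} \ln(1 + x) .
\end{aligned}$$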
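The telescoping sum in this proof can be checked numerically. The sketch below is an illustration only; the names `gauss_measure` and `gauss_measure_of_preimage` are ours, and the infinite sum over a is truncated after `terms` terms, which leaves an error of roughly x divided by (terms · ln 2).

```python
import math


def gauss_measure(left, right):
    """Measure nu([left, right]) for the density 1 / ((1 + s) ln 2)."""
    return (math.log(1 + right) - math.log(1 + left)) / math.log(2)


def gauss_measure_of_preimage(x, terms=100000):
    """Sum of nu([1/(a+x), 1/a]) over a = 1, ..., terms."""
    return sum(gauss_measure(1 / (a + x), 1 / a) for a in range(1, terms + 1))


if __name__ == "__main__":
    x = 0.3
    print(gauss_measure(0.0, x), gauss_measure_of_preimage(x))
```

The two numbers agree up to the truncation error, as the invariance of ν under the Gauss map requires.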
At the end of this last proof, we needed to jump through some hoops
to get from the invariance of the measure of simple intervals to that of all
Borel sets. This can be avoided if we prove the invariance of the density
directly via equation (10.1). But to do that, you first need to know a tricky
sum, see exercises 10.6 and 10.7.
With the invariant measures in hand, we can now turn to proving the
ergodicity of the three maps starring in this chapter.
Proof. By Proposition 9.3, there are open sets $O_n$ containing E such that
$\mu(O_n \setminus E) = \delta_n$, where $\delta_n$ tends to 0 as n tends to infinity. Using property
(4) of Corollary 9.5, we see that
$$\mu(O_n) = \mu(O_n \setminus E) + \mu(E) = \mu(E) + \delta_n . \qquad (10.2)$$
According to exercise 9.4, for each n, there is a collection of disjoint open
intervals $\{I_{n,i}\}$ such that
$$O_n = \cup_i I_{n,i} .$$
Now suppose that µ(E ∩ I) ≤ (1 − ε)µ(I) for all intervals. In particular this
holds for those intervals belonging to the collection of intervals $\{I_{n,i}\}$. So
for any n, we have
$$\mu(E \cap O_n) = \mu\bigl(E \cap (\cup_i I_{n,i})\bigr) = \sum_i \mu(E \cap I_{n,i}) \le \sum_i (1 - \varepsilon)\mu(I_{n,i}) .$$
The middle equality follows again from property (4) of Corollary 9.5.
Notice that the left-hand side equals µ(E), since $O_n$ contains E, and the
right-hand side equals $(1 - \varepsilon)\mu(O_n)$ by definition of the intervals $I_{n,i}$. Together
with equation (10.2), this gives
$$\mu(E) \le (1 - \varepsilon)\mu(O_n) = (1 - \varepsilon)(\mu(E) + \delta_n) .$$
If n tends to infinity, $\delta_n$ tends to 0, and thus µ(E) must be 0.
Proof. We want to show that for all x and y, the interval [y − δ, y + δ]
contains a point of the orbit starting at x. Denote by $\frac{p_n}{q_n}$ the continued fraction
convergents of ω (of Definition 6.4). By Lemma 6.12
$$\lim_{n \to \infty} x + q_n \omega - p_n = x .$$
Fix n big enough, so that the distance (on the circle) between x
and $x + q_n \omega - p_n$ is less than δ. Then the points $x_i := x + i q_n \omega$ modulo 1
advance (or recede) by less than δ. And thus at least one must land in the
stipulated interval (see Figure 42).
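The statement just proved can be observed experimentally: for an irrational rotation number, the orbit of any x eventually enters any interval [y − δ, y + δ]. The sketch below is a floating point illustration under our own assumptions (the names `circle_distance` and `first_visit`, the sample values, and the cap `max_steps` are all ours); it simply searches along the orbit rather than using the continued fraction convergents from the proof.

```python
import math


def circle_distance(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def first_visit(x, y, omega, delta, max_steps=10**6):
    """Smallest i with x + i*omega (mod 1) within delta of y, or None."""
    for i in range(max_steps):
        if circle_distance((x + i * omega) % 1.0, y) <= delta:
            return i
    return None


if __name__ == "__main__":
    omega = math.sqrt(2) - 1          # an irrational rotation number
    print(first_visit(0.1, 0.7, omega, 1e-3))
```

Shrinking δ makes the first visit time grow, but for irrational ω the search always succeeds once `max_steps` is large enough.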
Figure 43. ℓ(I) is between $\frac{1}{3}$ and $\frac{1}{2}$ of ℓ(J). So there are two disjoint
images of I under $R_\omega^{-1}$ that fall in J.
In the proof of the next theorem, we employ the same strategy as in the
proof of Proposition 9.13 and Corollary 9.14. But this time, the Lebesgue
density theorem helps us get a much stronger result.
Theorem 10.8. Multiplication by τ ∈ Z with |τ| > 1 modulo 1 is ergodic.
the measure of $A^c$ and A in [0, 1]. There is no way to tell, because we do not
even know what A is. The solution lies in controlling that distortion. If we
can prove that for that particular branch $\frac{\partial T^n(x_0)}{\partial T^n(y_0)}$ is bounded independent
of n by, say, K, then the argument of the proof of Proposition 10.3 gives that
a small interval with the density of A being greater than 1 − ε must map to
a large interval with density at least 1 − Kε. Since we can take ε as small as
we want, the set A ∩ [0, 1] must have measure 1.
The exposition in the remainder of this section and the next closely
follows [59].
Definition 10.11. Let $I_0$ be an interval. The distortion D of $T^n$ on that
interval is defined as
$$D := \sup_{x_0, y_0 \in I_0} \ln \frac{\partial T^n(x_0)}{\partial T^n(y_0)} .$$
Here, ∂ stands for the derivative with respect to x.
Proposition 10.12. Let T be the Gauss map. The distortion of T n on any
nth level interval I0 is uniformly bounded in n.
Now we note that $\partial \ln |\partial T|$ equals $\frac{\partial^2 T}{\partial T}$. Furthermore, the mean value
theorem (once again) gives $\frac{|I_{i+1}|}{|I_i|} = |\partial T(u_i)|$ for some $u_i \in I_i$. Substituting this into
the last equation, we get
$$D \le \sum_{i=0}^{n-1} \sup_{z_i, u_i \in I_i} \left| \frac{\partial^2 T(z_i)}{\partial T(z_i) \, \partial T(u_i)} \right| \cdot |I_{i+1}| . \qquad (10.3)$$
We start with a remarkable result that says that the arithmetic (usual)
mean of the continued fraction coefficients diverges (item (i)) for almost all
numbers, but their geometric mean almost always converges (item (ii)).
Theorem 10.14. For almost all numbers x, the continued fraction
coefficients $a_n = a_n(x)$ satisfy:
i) $\lim_{n \to \infty} \frac{a_1 + \cdots + a_n}{n} = \infty$ and
ii) $\lim_{n \to \infty} (a_1 \cdot \ldots \cdot a_n)^{1/n} = \prod_{a=1}^{\infty} \left( 1 - \frac{1}{(1+a)^2} \right)^{-\log_2 a} < \infty$.
This last constant is approximately equal to 2.68545 · · · and is called Khinchin’s
constant.
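The infinite product in item (ii) can be evaluated numerically, although it converges slowly. The sketch below is an illustration (the name `khinchin_partial` is ours); it sums logarithms of the first `terms` factors, so the result is a partial product that approaches the constant from below.

```python
import math


def khinchin_partial(terms):
    """Partial product of (1 - 1/(1+a)^2)^(-log2 a) for a = 1, ..., terms."""
    log_value = 0.0
    for a in range(2, terms + 1):         # the factor for a = 1 equals 1
        # log1p keeps ln(1 - 1/(1+a)^2) accurate when the argument is tiny.
        log_value += -math.log2(a) * math.log1p(-1.0 / (a + 1) ** 2)
    return math.exp(log_value)


if __name__ == "__main__":
    for terms in (10, 1000, 200000):
        print(terms, khinchin_partial(terms))
```

Each factor is larger than 1, so the partial products increase with the number of terms.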
This sum telescopes and the student should verify (see exercise 10.14) that
this gives
$$\frac{1}{\ln 2}\left[\ln(k+1) - k \ln\left(1 + \frac{1}{k+1}\right)\right], \qquad (10.6)$$
which diverges as k → ∞ and proves the first statement.
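The telescoping can be checked numerically. The sketch below assumes that the sum being telescoped is $\frac{1}{\ln 2}\sum_{a=1}^{k} a\left[\ln\frac{a+1}{a} - \ln\frac{a+2}{a+1}\right]$, which is what integrating the truncated coefficient function against the Gauss density would give; that form is our reading and is not quoted from the text. The function names are hypothetical.

```python
import math

def direct_sum(k: int) -> float:
    """Assumed pre-telescoping sum: (1/ln 2) * sum_a a*[ln((a+1)/a) - ln((a+2)/(a+1))]."""
    total = math.fsum(
        a * (math.log((a + 1) / a) - math.log((a + 2) / (a + 1)))
        for a in range(1, k + 1)
    )
    return total / math.log(2)

def closed_form(k: int) -> float:
    """Expression (10.6): (1/ln 2) * [ln(k+1) - k ln(1 + 1/(k+1))]."""
    return (math.log(k + 1) - k * math.log(1 + 1 / (k + 1))) / math.log(2)

for k in (1, 10, 1000):
    print(k, direct_sum(k), closed_form(k))
```

The second term of (10.6) stays bounded (it tends to 1), so the growth comes entirely from ln(k + 1).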
ii) This proof is very similar to that of (i), except that now we want to
compute the “time average”
$$\lim_{n\to\infty} \frac{\ln a_1 + \dots + \ln a_n}{n} .$$
The exponential of this will give us the result we need. So this time, we
define
$$\text{For } a \in \mathbb{N}: \quad g_\infty(x) = \ln a \ \text{ if } x \in \left(\frac{1}{a+1}, \frac{1}{a}\right] . \qquad (10.7)$$
This time around, g∞ is ν-integrable (as we will see below) and we get
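$$\frac{1}{\ln 2}\int_0^1 \frac{g_\infty(x)}{1+x}\,dx = \sum_{a=1}^{\infty} \frac{\ln a}{\ln 2}\left[\ln\frac{a+1}{a} - \ln\frac{a+2}{a+1}\right] . \qquad (10.8)$$
(Note that $\frac{\ln a}{\ln 2} = \log_2 a$.) Since we can write
$$\ln\frac{a+1}{a} - \ln\frac{a+2}{a+1} = -\ln\left(1 - \frac{1}{(a+1)^2}\right) , \qquad (10.9)$$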
we finally get the result (as well as the assertion that g∞ is ν-integrable) by
taking the exponential of the sum in (10.8).
Remark 10.16. The constant $\frac{\pi^2}{12 \ln 2} \approx 1.1866 \cdots$ is called Lévy’s constant.
Proof. Item (ii) follows very easily from (i), see exercise 10.20. So here
we will prove only (i).
To simplify notation in this proof, we will write $x_i := T^i(x_0)$ where T
is the Gauss map. For the nth approximant of $x_0 \in (0, 1)$, see Definition 6.4,
we will write $\frac{p_n(x)}{q_n(x)}$. From that same definition, we conclude
$$\frac{p_n(x_0)}{q_n(x_0)} = \frac{1}{a_1(x_0) + p_{n-1}(x_1)/q_{n-1}(x_1)} = \frac{q_{n-1}(x_1)}{a_1(x_0)\,q_{n-1}(x_1) + p_{n-1}(x_1)} .$$
See also exercise 10.2 (a). By Corollary 6.8 (ii), $\gcd(p_n, q_n) = 1$, and so
from exercise 10.2 (b) we see that $p_n(x_0)$ equals $q_{n-1}(x_1)$. More generally,
we have by the same reasoning
$$p_n(x_j) = q_{n-1}(x_{j+1}) .$$
This implies that
$$\frac{p_n(x_0)}{q_n(x_0)} \cdot \frac{p_{n-1}(x_1)}{q_{n-1}(x_1)} \cdot \frac{p_{n-2}(x_2)}{q_{n-2}(x_2)} \cdots \frac{p_1(x_{n-1})}{q_1(x_{n-1})} = \frac{1}{q_n(x_0)} ,$$
since p1 = 1 by Theorem 6.6. Now we take the logarithm of the last equa-
tion. This yields
$$-\frac{1}{n}\ln q_n(x_0) = \frac{1}{n}\sum_{i=0}^{n-1}\ln x_i - \frac{1}{n}\sum_{i=0}^{n-1}\left[\ln x_i - \ln\frac{p_{n-i}(x_i)}{q_{n-i}(x_i)}\right] .$$
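The two facts used so far, namely that the numerator of an approximant of $x_j$ equals the denominator of the next shorter approximant of $T(x_j)$, and that the product of approximants telescopes to $1/q_n(x_0)$, can be checked with exact rational arithmetic. This is a minimal sketch: the coefficient list is an arbitrary (hypothetical) example and the helper names are ours.

```python
from fractions import Fraction

def convergent(coeffs):
    """Return (p, q) with p/q = 1/(a1 + 1/(a2 + ...)); the empty list gives 0/1."""
    p_prev, p = 1, 0
    q_prev, q = 0, 1
    for a in coeffs:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p, q

# Hypothetical coefficients a1..an of x0; T^j(x0) has coefficients a_{j+1}, a_{j+2}, ...
COEFFS = [3, 1, 4, 1, 5, 9, 2, 6]

def pq(j, m):
    """(p_m, q_m) of the m-th approximant of x_j = T^j(x0)."""
    return convergent(COEFFS[j:j + m])

def shift_identity_holds():
    """Check p_m(x_j) == q_{m-1}(x_{j+1}) for every admissible j and m."""
    n = len(COEFFS)
    return all(pq(j, m)[0] == pq(j + 1, m - 1)[1]
               for j in range(n) for m in range(1, n - j + 1))

def telescoping_product():
    """Product of p_{n-i}(x_i)/q_{n-i}(x_i) for i = 0..n-1."""
    n = len(COEFFS)
    prod = Fraction(1)
    for i in range(n):
        prod *= Fraction(*pq(i, n - i))
    return prod

print(shift_identity_holds(), telescoping_product(), pq(0, len(COEFFS))[1])
```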
Two more steps are required. The first is showing that the last sum is
finite. This is not difficult, because
$$\frac{1}{n}\sum_{i=0}^{n-1} \ln\frac{q_{n-i}(x_i)\,x_i}{p_{n-i}(x_i)} = \frac{1}{n}\sum_{i=0}^{n-1} \ln\left(1 + \frac{q_{n-i}(x_i)\,x_i - p_{n-i}(x_i)}{p_{n-i}(x_i)}\right) .$$
Corollary 6.7 or, more precisely, exercise 6.16 yields that
$$\frac{|q_{n-i}(x_i)\,x_i - p_{n-i}(x_i)|}{p_{n-i}(x_i)} < \frac{1}{p_{n-i}(x_i)\,q_{n-i+1}(x_i)} < 2^{-(n-i)}\sqrt{2} ,$$
where the last inequality follows from Corollary 6.8 (i). The fact that for
small x, ln(1 + x) ≈ x concludes the first step (see also exercise 10.12).
Since the above sum is bounded and xi = T i (x0 ), we now divide by n
and take a limit to get
$$\lim_{n\to\infty} -\frac{1}{n}\ln q_n(x_0) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \ln T^i(x_0) .$$
The second step is then to compute the right-hand side of this expression.
Naturally, the ergodicity of the Gauss map invites us to employ Birkhoff’s
theorem in the guise of Corollary 9.10 with f (x) set equal to ln(x).
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} \ln T^i(x_0) = \int_0^1 \frac{\ln x}{(1+x)\ln 2}\,dx .$$
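The growth rate of $\ln q_n$ can be observed numerically. Floating point iteration of the Gauss map loses all precision after a few dozen steps, so the sketch below instead expands a random rational number with a very large denominator using exact integer arithmetic. Treating the first 1000 coefficients of such a rational as "typical" is a heuristic assumption of ours, not a statement from the text; the seed, bit size and names are arbitrary choices.

```python
import math
import random

def cf_coefficients(p, q):
    """Coefficients a1, a2, ... of p/q in (0, 1), iterating the Gauss map exactly."""
    coeffs = []
    while p:
        a, r = divmod(q, p)   # a = floor(1/x) for x = p/q, and T(x) = r/p
        coeffs.append(a)
        p, q = r, p
    return coeffs

def log_qn_over_n(coeffs, n):
    """(1/n) ln q_n, with q_n from the recurrence q_k = a_k q_{k-1} + q_{k-2}."""
    q_prev, q = 0, 1
    for a in coeffs[:n]:
        q_prev, q = q, a * q + q_prev
    return math.log(q) / n

rng = random.Random(0)                             # fixed seed for reproducibility
BITS = 4000
den = rng.getrandbits(BITS) | (1 << (BITS - 1))    # force a full-size denominator
num = rng.getrandbits(BITS - 1) | 1                # guarantees 0 < num < den
coeffs = cf_coefficients(num, den)
estimate = log_qn_over_n(coeffs, 1000)
levy = math.pi ** 2 / (12 * math.log(2))
print(len(coeffs), estimate, levy)
```

With only 1000 coefficients the estimate fluctuates in the second decimal place, which is consistent with an ergodic average that converges slowly.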
10.6. Exercises
Exercise 10.1. Show that for a set A: µ(A) = 0 (Lebesgue measure) if and
only if ν(A) = 0 (invariant measure of the Gauss map). (Hint: write both
equalities in terms of Lebesgue integrals.)
Exercise 10.3. a) Show that every probability density ρ on R/Z gives rise
to an invariant measure under the identity.
b) What are the absolutely continuous measures — i.e. with a density, see
Section 9.5 — that are invariant under rotation by 1/2? (Hint: consider
densities with period 1/2.)
c) The same for rotation by p/q for p and q in N.
d) Show that the uniform density — with density ρ(x) = 1 — is invariant
under x → nx modulo 1 (where n ∈ N).
Exercise 10.4. At the end of the proof of Theorem 10.7, assume that
|I| = |J| and complete the proof in that case.
a) Show that for every ε > 0, there is i such that $R_\omega^i(I)$ falls in an
ε-neighborhood of J.
b) Estimate the fraction of J that must be in A.
c) Show that this gives a contradiction.
Exercise 10.7. Prove that the Gauss map preserves the measure of Propo-
sition 10.3 via equation (10.1). Do not use the computation in the proof of
that proposition. (Hint: use exercise 10.6.)
Exercise 10.8. Show that ρ(x) = 1 is the only continuous invariant probability
density of an irrational rotation R. (Hint: if ρ is invariant under R, it
must be invariant under $R^i$ for all positive i. Use Lemma 10.6.)
Exercise 10.9. a) Show that ρ(x) = 1 is the only continuous invariant den-
sity for the angle doubling map. (Hint: use a reasoning similar to that of
Proposition 9.13.)
b) Check that the same is true for the map x → τx modulo 1 where τ ∈ Z
and τ > 1.
Exercise 10.10. The orbit of any irrational rotation is uniformly distributed.
So why do we encounter specifically the golden mean in phyllotaxis,
the placement of leaves? Research this and add illustrations.
Exercises 10.11 and 10.12 discuss some very useful properties of the log-
arithm for later reference. In fact, they are useful in a much wider context
than discussed here. For instance, exercise 10.11 comes up in any discus-
sion of entropy [19] or in deciding the stability of Lotka-Volterra dynamical
systems [53]. Exercise 10.12 is important for deciding the convergence of
products of the form ∏(1 + xi ).
Exercise 10.11. a) Show that if x > −1, then ln(1 + x) ≤ x with equality
iff x = 0. (Hint: draw the graphs of ln(1 + x) and x.)
b) Let $p_i$ and $q_i$ be positive and $\sum_i p_i = \sum_i q_i$. Use (a) to show that
$-\sum_i p_i \ln p_i \le -\sum_i p_i \ln q_i$. (Hint: $-\sum_i p_i(\ln p_i - \ln q_i) = \sum_i p_i \ln\frac{q_i}{p_i} \le \sum_i (q_i - p_i)$ by (a).)
c) Let $S_n$ be the open n-dimensional simplex $p_i > 0$ and $\sum_{i=1}^n p_i = 1$. Show
that $h : S_n \to \mathbb{R}$ given by $h(p) = -\sum_i p_i \ln p_i$ has a single extremum at
$p_i = \frac{1}{n}$. (Hint: The constraint is that $C = \sum_i p_i$ must be equal to 1. Deduce that
at the maximum, the gradients of h and C must be parallel.)
d) Show that this extremum is a maximum. (Hint: set f(x) := −x ln x and
show that $f''(x) < 0$. As a consequence, if $w_i$ are positive weights such
that $\sum_i w_i = 1$, we have Jensen’s inequality or $f(\sum_i w_i p_i) \ge \sum_i w_i f(p_i)$.
See Figure 45.)
Figure 45. Illustration of the fact that for a concave function f , we have
f (wx + (1 − w)y) ≥ w f (x) + (1 − w) f (y) (Jensen’s inequality).
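The inequalities of exercise 10.11 (b) to (d) are easy to probe numerically before proving them. The sketch below draws random points of the simplex (the sampling scheme and function names are our own, purely illustrative choices) and compares $-\sum_i p_i \ln p_i$ with $-\sum_i p_i \ln q_i$ and with ln n, the value at the uniform point.

```python
import math
import random

def cross_entropy(p, q):
    """-sum_i p_i ln q_i for positive vectors p and q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def entropy(p):
    """h(p) = -sum_i p_i ln p_i."""
    return cross_entropy(p, p)

def random_simplex_point(n, rng):
    """A point with positive coordinates summing to 1 (not uniformly distributed)."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(1)
n = 5
uniform = [1 / n] * n
p = random_simplex_point(n, rng)
q = random_simplex_point(n, rng)
print(entropy(p), cross_entropy(p, q), entropy(uniform), math.log(n))
```

A numerical experiment of this kind is of course no substitute for the proof; it only shows which way the inequalities point.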
Exercise 10.14. a) Show that the right-hand side of (10.4) gives (10.5).
b) Show that (10.5) gives (10.6). (Hint: write out the first few terms explic-
itly.)
c) Use exercise 10.11 (a) to bound the second term of (10.6).
d) Conclude that (10.6) is unbounded.
Exercise 10.17. Use exercise 10.16 (a) to show that Khinchin’s constant
equals $\prod_{a=1}^{\infty} \left(1 + \frac{1}{a(a+2)}\right)^{\log_2 a}$.
Exercise 10.19. a) Show that $\lim_{x\to 0} \ln(x)\ln(1+x) = 0$ (Figure 46). (Hint:
for the limit as x → 0, substitute $x = e^y$, then use L’Hopital.)
b) Use (a) to show that $I := \int_0^1 \frac{\ln x}{1+x}\,dx = -\int_0^1 \frac{\ln(1+x)}{x}\,dx$. (Hint: integration
by parts.)
c) Show that $\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n}$.
d) Substitute (c) into I and integrate term by term to get $I = \sum_{n=1}^{\infty} (-1)^n n^{-2}$.
e) The sum in (d) equals $-\frac{\pi^2}{12}$. Show that that gives the result advertised in
Theorem 10.15. (Observation: we sure took the cowardly way out in this
last step; to really work out that last sum from first principles is elementary
but very laborious. The interested student should look this up on the web.)
In exercise 10.19, note the curious fact that $\sum_{n=1}^{\infty} (-1)^n n^{-2} = -\frac{\pi^2}{12}$ while from
exercise 2.26 we have that $\zeta(2) = \sum_{n=1}^{\infty} n^{-2} = \frac{\pi^2}{6}$.
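Both sums are easy to approximate by partial sums, which also makes the sign of the alternating one visible: it is negative, about −0.8225, that is, $-\pi^2/12$. The following is a small sketch with names of our own choosing; the alternating series converges much faster than the one for ζ(2), whose error after N terms is of order 1/N.

```python
import math

def alternating_sum(terms: int) -> float:
    """Partial sum of sum_{n>=1} (-1)^n / n^2."""
    return math.fsum((-1) ** n / n ** 2 for n in range(1, terms + 1))

def zeta2_partial(terms: int) -> float:
    """Partial sum of sum_{n>=1} 1 / n^2."""
    return math.fsum(1 / n ** 2 for n in range(1, terms + 1))

N = 10_000
print(alternating_sum(N), -math.pi ** 2 / 12)
print(zeta2_partial(N), math.pi ** 2 / 6)
# Dividing the alternating sum by -ln 2 reproduces the constant pi^2 / (12 ln 2).
print(-alternating_sum(N) / math.log(2))
```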
Exercise 10.20. a) Use exercise 6.16 and Theorem 10.15 (i) to show that
for almost all ω ∈ [0, 1]
$$\lim_{n\to\infty} \frac{1}{n}\ln\left|\omega - \frac{p_n}{q_n}\right| = -\frac{\pi^2}{6\ln 2} .$$
b) What do you get in (a) if ω is rational? Is that a problem?
Definition 10.18. Given a one dimensional smooth map T : [0, 1] → [0, 1],
the Lyapunov exponent λ (x) at a point x is given by
$$\lambda(x) := \lim_{n\to\infty} \frac{1}{n}\ln|DT^n(x)| ,$$
assuming that the limit exists. (There is a natural extension of this notion
for systems in dimension greater than or equal to 2, but we do not need it
here.)
Exercise 10.21. What does Definition 10.18 tell you about how fast $T^n x$
and $T^n y$ separate if x is a typical point and y is very close to x?
Exercise 10.22. Let T be the Gauss map and µ its invariant measure.
a) Show that the Lyapunov exponent at x satisfies
$$\lambda(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} \ln\left|DT(T^j(x))\right| .$$
(Hint: think chain rule.)
b) Show that Birkhoff’s theorem (Corollary 9.10) implies that for almost
all x ∈ [0, 1]
$$\lambda(x) = \int_0^1 \frac{-2\ln x}{\ln 2\,(1+x)}\,dx .$$
(Hint: refer to exercise 10.18.)
c) Use the last part of the proof of Theorem 10.15 and exercise 10.19 to
show that for almost all x, the Lyapunov exponent equals $\frac{\pi^2}{6\ln 2} \approx 2.3731$.
Exercise 10.23. a) See exercise 10.22. Let T be the Gauss map and x =
[n, n, · · · ]. Determine the Lyapunov exponent at x. (Hint: see also exercise
6.3.)
b) Why are these exponents different from the one computed in exercise
10.22?
Exercise 10.24. Let T be the map given in Corollary 10.10. a) Show
that for almost all points x, the Lyapunov exponent is given by
$\lambda(x) = -\sum_i \ell_i \ln \ell_i$. (Hint: see also exercise 10.22.)
b) Show that the answer in (a) is greater than 1.
c) Show if the map has n branches, then the Lyapunov exponent is extremal
if all branches have the same slope. (Hint: exercise 10.11 (c).)
d) Show that this extremum is a maximum. (Hint: exercise 10.11 (d).)
Stock prices undergo multiplicative corrections, that is: each day their price
is multiplied by a factor like 0.99 or 1.01. On the basis of the previous prob-
lem, it seems reasonable that the distribution of their first digits satisfies the
logarithmic distribution of exercise 10.25. In fact, a much wider range of
real world data satisfies this distribution than this “multiplicative” explana-
tion would suggest. This phenomenon is called Benford’s law and appears
to be only partially understood [10].
Chapter 11
The Cauchy Integral Formula
We will use the fact that this says that analyticity is an open condition.
Corollary 11.3. If f is analytic at z0 , then it is analytic in an open neigh-
borhood of z0 .
1 An open neighborhood of $z_0$ minus the point $z_0$ itself is often called a punctured neighborhood of $z_0$.
One might be tempted to say that the example in item (ii) above consists
of two singularities, one of order k and one of order k −1. However, we have
$$\frac{a_k}{(z-z_0)^k} + \frac{a_{k-1}}{(z-z_0)^{k-1}} = \frac{a_k + a_{k-1}z - a_{k-1}z_0}{(z-z_0)^k} .$$
The numerator does not vanish at z0 , and so we have one singularity of order
k. A pole of “infinite order” in item (iii) means that the expansion contains
infinitely many non-zero terms ak (z − z0 )−k with k ∈ N.
Remark 11.5. A subtle, but sometimes important, point is the
observation that branch points like the origin for $z \to (z - z_0)^{1/2}$ or
$z \to \ln(z - z_0)$ are not isolated singularities. The reason is that in any punctured
neighborhood of the origin these “functions” are not one-valued. In other
words, they are not functions, and therefore a fortiori they are not analytic
functions. Even if you redefine the function in this neighborhood so that
it describes a single branch of that function, then still there is a line of
discontinuities (the branch cut) with the branch point as its endpoint.
For completeness, we mention the only other types of singularities:
cluster points, which are limit points of other singularities; and natural
boundaries, entire sets where singularities are dense. An example of the
latter is the unit circle for the function $\sum_{n=1}^{\infty} z^{n!}$. Needless to say, these
singularities are not isolated.
All singularities mentioned in this remark are non-isolated, and if z0 is
the locus of such a singularity, it is not possible to approximate its behavior
in terms of integral powers of (z − z0 ).
Definition 11.6. A function f is meromorphic in a domain A if it has only
isolated poles in the domain. It is meromorphic if this holds on all of C.
Proof. Item (i) follows immediately from the hypotheses. Item (ii) follows
from the fact that $\left|\sum_{i=n+1}^{\infty} g_i(z)\right| \le \sum_{i=n+1}^{\infty} |g_i(z)| \le \sum_{i=n+1}^{\infty} m_i$ and the convergence
of $\sum_n m_n$ (so the partial sums of $\{m_n\}$ form a Cauchy sequence).
Figure 48. Left, a curve. Then two simple, closed curves with opposite
orientation. The curve on the right is a union of two simple, closed
curves.
Remark. The surprising aspect of this formula is that the value of an ana-
lytic function at z0 is determined by the values of that function on a simple,
closed curve that encircles z0 .
Proof. Cauchy’s integral formula establishes the result for k = 0. The in-
duction step proceeds as follows. Suppose we are given
$$f^{(k-1)}(z) = \frac{(k-1)!}{2\pi i}\oint_\gamma \frac{f(w)}{(w-z)^k}\,dw .$$
Since z lies inside γ, so does z + d if d is small enough (Figure 50). We use
$$= \lim_{d\to 0} \frac{(k-1)!}{2\pi i\, d}\oint_\gamma f(w)\left[\frac{1}{(w-z-d)^k} - \frac{1}{(w-z)^k}\right] dw$$
Figure 51. F(z) does not depend on the path. So $F(z+d) - F(z) = \int_c f \approx f(z)\,d$.
Proof. Pick a point $z_0$ and set $F(z) := \int_{z_0}^{z} f(w)\,dw$. Because $\oint f(w)\,dw = 0$,
F(z) does not depend on the path from $z_0$ to z and so is uniquely defined.
Thus $F(z+d) - F(z) = \int_c f \approx f(z)\,d$, where c is a short, linear path from z
to z + d (see Figure 51). Then $F'(z) = f(z)$ and so f is the derivative of an
analytic function and therefore is itself analytic.
Proposition 11.14. Let $\{g_i\}$ be a sequence of functions that are analytic in
a region A and suppose that $\sum_{i=1}^{\infty} g_i(z)$ converges uniformly on every closed
disk contained in A. Then
i) For any curve γ in A: $\int_\gamma \lim_n \sum_{i=1}^n g_i = \lim_n \int_\gamma \sum_{i=1}^n g_i$.
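Proof. Write $f_n = \sum_{i=1}^n g_i$ and call the limit f. Then for all n > N
$$\left|\int_\gamma f_n - \int_\gamma f\right| = \left|\int_\gamma (f_n - f)\right| \le \int_\gamma |f_n - f| \le \varepsilon\,\ell(\gamma) ,$$
where $\ell(\gamma)$ is the length of γ (a curve whose image is a compact set). The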
fact that | fn (z) − f (z)| ≤ ε for all z ∈ γ is due to uniform convergence. This
proves (i).
Next, we prove (ii). Pick $z_0 \in A$ and let $B = B_r(z_0)$ be an open disk
whose closure B̄ is contained in A. By assumption, $f_n \to f$ uniformly on B̄
and thus f is continuous on B̄ (see exercise 11.17). Now let γ be any simple,
closed curve in B̄. Then by Cauchy’s theorem, $\oint_\gamma f_n = 0$. Item (i) implies
that $\oint_\gamma f = 0$. Finally, Morera’s theorem implies that f is analytic at $z_0$.
For part (iii), we have to show that $|f_n'(z) - f'(z)|$ tends to zero as n
tends to infinity. We use Theorem 11.11 to do that. Fix some small r so
that $\gamma(t) := z_0 + re^{it}$ is contained in A. Then
$$|f_n'(z_0) - f'(z_0)| \le \frac{1}{2\pi}\oint_\gamma \left|\frac{f_n(z) - f(z)}{(z - z_0)^2}\right| |dz| .$$
By uniform convergence, for large n, $|f_n(z) - f(z)|$ is less than ε on γ while
$|z - z_0| = r$ and the length of γ is 2πr.
Lemma 11.15. If |z − z0 | < |w − z0 |, then
$$\sum_{k=0}^{\infty} \frac{(z-z_0)^k}{(w-z_0)^{k+1}} = \frac{1}{w-z} .$$
Proof. $\sum_{k=0}^{\infty} \left[\frac{z-z_0}{w-z_0}\right]^k$ is a geometric series that can be written as $\sum_{k=0}^{\infty} x^k$,
where |x| < 1. This equals $\frac{1}{1-x}$. Substituting this into the left-hand side of
the lemma, after factoring out $\frac{1}{w-z_0}$, gives the result.
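The identity of Lemma 11.15 can be illustrated with a quick numerical experiment. The points chosen below are arbitrary (hypothetical) values satisfying $|z - z_0| < |w - z_0|$, and the function name is ours.

```python
def geometric_partial_sum(z: complex, w: complex, z0: complex, terms: int) -> complex:
    """Partial sum of sum_k (z - z0)^k / (w - z0)^(k+1)."""
    ratio = (z - z0) / (w - z0)
    term = 1 / (w - z0)          # the k = 0 term
    total = 0j
    for _ in range(terms):
        total += term
        term *= ratio            # move from the k-th term to the (k+1)-th
    return total

z0 = 0.5 + 0.5j
w = 2.0 + 1.0j
z = 1.0 + 0.2j
assert abs(z - z0) < abs(w - z0)     # hypothesis of the lemma
print(geometric_partial_sum(z, w, z0, 60), 1 / (w - z))
```

The error after N terms is proportional to $|x|^N$ with $x = (z - z_0)/(w - z_0)$, so convergence slows down as z approaches the circle on which w lies.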
Theorem 11.16 (Taylor’s Theorem). Suppose f is analytic in a region A
and let D be any open disk centered on z0 whose closure is contained in A.
Then for all z ∈ D we have
$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!}(z-z_0)^n ,$$
which converges on D. This is called the Taylor series of f at z0 .
Proof. Let D be the disk bounded by the curve γ given by $w(t) = z_0 + re^{it}$.
Take z inside D (see Figure 52) so that $|z - z_0| < |w - z_0|$. By Theorem 11.9
and Lemma 11.15, we have
$$f(z) = \frac{1}{2\pi i}\oint_\gamma \frac{f(w)}{w-z}\,dw = \frac{1}{2\pi i}\oint_\gamma \sum_{k=0}^{\infty} f(w)\frac{(z-z_0)^k}{(w-z_0)^{k+1}}\,dw .$$
Again because |z − z0 | < |w − z0 |, the sum converges uniformly, and so
Proposition 11.14 (i) implies that the sum and integral can be interchanged.
To the expression that then results, we apply Theorem 11.11 to get
$$\cdots = \frac{1}{2\pi i}\sum_{k=0}^{\infty}\oint_\gamma f(w)\frac{(z-z_0)^k}{(w-z_0)^{k+1}}\,dw = \sum_{k=0}^{\infty}\frac{f^{(k)}(z_0)}{k!}(z-z_0)^k .$$
By Theorem 11.11 (ii), the last expression is bounded by $M\sum_{k=0}^{\infty} \frac{|z-z_0|^k}{r^k}$.
Uniform convergence on compact sets contained in the open disk of radius r
follows from Lemma 11.7. The series is analytic by Proposition 11.14.
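The proof expresses each Taylor coefficient as a contour integral, $\frac{f^{(k)}(z_0)}{k!} = \frac{1}{2\pi i}\oint_\gamma \frac{f(w)}{(w-z_0)^{k+1}}\,dw$. The following sketch approximates that integral with the trapezoid rule on the circle $w = z_0 + re^{it}$ and compares with the known coefficients 1/k! of the exponential function. The sample count and helper name are our own choices, and the code is only a numerical illustration, not part of the proof.

```python
import cmath
import math

def taylor_coefficient(f, z0: complex, k: int, r: float = 1.0, samples: int = 256) -> complex:
    """Approximate (1/(2 pi i)) * contour integral of f(w)/(w - z0)^(k+1) over |w - z0| = r."""
    total = 0j
    for m in range(samples):
        t = 2 * math.pi * m / samples
        w = z0 + r * cmath.exp(1j * t)
        # With dw = i r e^{it} dt, the integrand reduces to f(w) / (w - z0)^k * dt / (2 pi).
        total += f(w) / (w - z0) ** k
    return total / samples

coeffs = [taylor_coefficient(cmath.exp, 0j, k) for k in range(20)]
z = 0.3 + 0.2j
approx = sum(c * z ** k for k, c in enumerate(coeffs))
print(approx, cmath.exp(z))
```

For a function analytic on a slightly larger disk, the trapezoid rule on a circle converges geometrically in the number of samples, which is why a modest sample count suffices here.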
Remark 11.17. Note that it follows that the Taylor series of an entire func-
tion (Definition 11.2) converges in all of C.
Note that $g_T'$ exists (exercise 11.24) and so $g_T$ is entire. We will prove
that for any ε > 0, we can choose T such that
$$\lim_{T\to\infty} |g_T(0) - g(0)| < \varepsilon . \qquad (11.1)$$
Since $g_T(0)$ is finite, this implies that g(0) also exists. So, fix ε > 0.
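Furthermore,
$$\left|1 + \frac{z^2}{R^2}\right|\frac{1}{|z|} = \left|1 + e^{2is}\right|\frac{1}{R} = \left|e^{-is} + e^{is}\right|\frac{1}{R} = \frac{2|c|}{R} . \qquad (11.4)$$
And finally
$$\left|e^{zT}\right| = \left|e^{RTc + iRT\sin s}\right| = e^{RTc} . \qquad (11.5)$$
Since the length of $C_+$ is πR, we thus obtain from (11.2) that
$$\left|\frac{1}{2\pi i}\int_{C_+} \cdots\right| \le \frac{1}{2\pi}\cdot\frac{Fe^{-RcT}}{Rc}\cdot\frac{2c}{R}\cdot e^{RcT}\cdot\pi R = \frac{F}{R} . \qquad (11.6)$$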
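For the second step, analyticity of $g_T$ and Theorem 11.9 imply that
$$\frac{1}{2\pi i}\int_{C_-} \frac{g_T(z)\left(1 + \frac{z^2}{R^2}\right)e^{zT}}{z}\,dz = \frac{1}{2\pi i}\int_{L_-} \frac{g_T(z)\left(1 + \frac{z^2}{R^2}\right)e^{zT}}{z}\,dz ,$$
allowing us to evaluate the integral along $C_-$. We have, now for c < 0,
$$|g_T(z)| = \left|\int_0^T f(t)e^{-zt}\,dt\right| \le F\int_0^T e^{-Rct}\,dt \le \frac{Fe^{-RcT}}{R|c|} .$$
Substituting this into the integral over $C_-$ and using (11.4) and (11.5) gives
$$\left|\frac{1}{2\pi i}\int_{C_-} \frac{g_T(z)\left(1 + \frac{z^2}{R^2}\right)e^{zT}}{z}\,dz\right| \le \frac{1}{2\pi}\,\frac{Fe^{-RcT}}{R|c|}\,\frac{2|c|}{R}\,e^{RcT}\,\pi R = \frac{F}{R} . \qquad (11.7)$$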
The third (most painful) step is the evaluation of the remaining integral,
$$\int_{L_-} G(z)e^{zT}\,dz ,$$
(see again Figure 53) where $G(z) := g(z)\left(1 + \frac{z^2}{R^2}\right)/(2\pi i z)$. On the two (compact)
circular segments $z = Re^{is}$ with $\mathrm{Re}\, z \in [-d_R, 0]$, |G| is bounded by
the constant $M_h(R, d_R)$. The combined length of these segments is less than
4d. Thus the integral over these pieces contributes at most $M_h(R, d_R)\,4d$.
On the vertical segment, |G| is bounded by another constant, $M_v(R, d)$. This
may very well increase as d decreases, since, with decreasing d, the path
passes very close to the origin. We have that $|e^{zT}| = e^{-dT}$ and the path
length is less than 2R. So the contribution of the vertical segment is at most
$M_v(R, d)\,e^{-dT}\,2R$. Summarizing, this gives
$$\left|\int_{L_-} G(z)e^{zT}\,dz\right| \le 4d\,M_h(R, d_R) + 2R\,M_v(R, d)\,e^{-dT} . \qquad (11.8)$$
Thus for R large enough, $|p(2Re^{i\varphi})|$ is larger than $|p(Re^{i\varphi})|$. The closed
disk D of radius 2R is compact and p is continuous, so it follows that |p(z)|
must have a minimum in the interior of that disk (see Figure 54).
Let $z_0$ be this minimum. Take δ in the ball |δ| < ε, and ε small so
that the ε-disk around $z_0$ is in the interior of D (see Figure 54). Now expand
$p(z_0 + \delta) = \sum_{i=0}^{d} a_i (z_0 + \delta)^i$. The expansion must contain non-trivial terms,
because otherwise p would be constant. So for some 0 < k ≤ d,
$$p(z_0 + \delta) = p(z_0) + b_k\delta^k + b_{k+1}\delta^{k+1} + \cdots + b_d\delta^d ,$$
Figure 54. In the proof of Proposition 11.20, |p(z)| must have a minimum
$z_0$ in the interior of the disk |z| < 2R, and it cannot have a minimum
at $z_0$ unless it is zero there.
where $b_k \ne 0$. Thus
$$p(z_0 + \delta) = p(z_0) + b_k\delta^k(1 + \delta B(\delta)) ,$$
where again for ε small enough |B(δ)| is bounded and so $p(z_0 + \delta) \approx p(z_0) + b_k\delta^k$.
By choosing the phase of δ appropriately and |δ| small
enough, one can make sure that if $|p(z_0)| > 0$, then $|p(z_0) + b_k\delta^k| < |p(z_0)|$.
Lest one might think that every complex function must have a zero, we
warn the reader that ez has no zero (see also exercise 11.16).
Together with exercise 3.24, the last result establishes the fundamental
theorem of algebra (Theorem 3.19), which we repeat verbatim here.
Theorem 11.21 (Fundamental Theorem of Algebra). A polynomial in
C[x] (the set of polynomials with complex coefficients) of degree d ≥ 1 has
exactly d roots, counting multiplicity.
11.6. Exercises
Exercise 11.1. Which of the following sets are regions or domains in C?
a) C\{0}.
b) C\N.
c) C minus the negative real axis.
d) C minus the real axis.
e) The union of the closed unit disks with centers at 1 and -1.
f) The same as (d), but minus the boundary.
g) The same as (e), but now add the imaginary axis.
Exercise 11.4. a) Let z = x + iy and show that for n ∈ N, $n^{-z} = n^{-x}e^{-iy\ln n}$.
b) From (a), show that $|n^{-z}| = n^{-x}$.
c) Use (b) to show that $\zeta(z) = \sum_{n=1}^{\infty} n^{-z}$ is uniformly convergent on compact
disks in Re z > 1. (Hint: use Lemma 11.7 and exercise 2.25 (e).)
Exercise 11.6. Let A be a region (that is: an open, connected set) contain-
ing a sequence {zn } converging to z0 . Let f and g be analytic functions on
A such that f (zn ) = g(zn ) for all n.
a) Show that h := f − g is analytic on A and satisfies $h(z_n) = 0$.
b) Use exercise 11.5 to show that h = 0 in an open disk containing z0 .
c) Write A as the disjoint union of
A0 := {z0 ∈ A : h(z) = 0 on an open neighborhood of z0 } and A1 := A\A0 .
Show that A0 is open in A. (Hint: by definition of A0 .)
d) Show $A_1$ is open in A. (Hint: consider $z \in A_1$; if h(z) ≠ 0, use continuity
of h; if h(z) = 0, use exercise 11.5 that h is not zero in a neighborhood of
z.)
e) Show that one of A0 or A1 must be empty. (Hint: use Definition 11.1.)
f) Conclude that the analytic continuations of f and g in A coincide. (Note
that this was remarked more informally in Section 2.5.)
The last result of exercise 11.6 will be relevant when we discuss the analytic
continuation of the zeta function. We isolate the result here.
Theorem 11.22 (Uniqueness of Analytic Continuation). Suppose f and
g are analytic in a region or domain A ⊂ C. Let Z be the set of points such
that f(z) − g(z) = 0 and suppose that Z has a limit point in A. Then the
analytic continuations of f and g coincide on A.
Exercise 11.8. a) Use exercise 11.7 (c) to show that (Figure 55) for y ∈ R
$$\cos y = \frac{1}{2}\left(e^{iy} + e^{-iy}\right) \quad\text{and}\quad \sin y = \frac{1}{2i}\left(e^{iy} - e^{-iy}\right) .$$
b) Use exercise 11.6 to show that
$$\cos z = \frac{1}{2}\left(e^{iz} + e^{-iz}\right) \quad\text{and}\quad \sin z = \frac{1}{2i}\left(e^{iz} - e^{-iz}\right)$$
are the unique extensions of the sine and cosine functions to the complex
plane.
c) Find a formula with only exponentials for tan z. (Hint: $\tan x = \frac{\sin x}{\cos x}$.)
Figure 55. The complex plane with $e^{it}$, $-e^{-it}$ and $e^{-it}$ on the unit circle.
cos t is the average of $e^{it}$ and $e^{-it}$, and i sin t is the average of $e^{it}$ and
$-e^{-it}$.
Figure 56. Moving around the origin once in the positive direction in-
creases ϕ, and thus ln z, by 2π. Discontinuities can be avoided if we
agree never to cross the half line or branch cut L.
3 This means that the partial derivatives exist and are continuous.
Exercise 11.13. Write z = x + iy and f (z) = u(x + iy) + iv(x + iy), where u
and v are real functions. In a neighborhood of (x0 + iy0 ), suppose that the
matrix of (continuous) derivatives D f (x, y) satisfies Cauchy-Riemann.
a) Use exercise 11.12 to show that this implies that D f (x, y) acts like a
complex number.
b) Use (a) to show that f is analytic.
Exercise 11.14. Write z = x + iy and f (z) = u(x + iy) + iv(x + iy), where u
and v are real functions.
a) Given that u(x + iy) = e−y cos x, compute v and f (z). (Hint: use the
Cauchy-Riemann equations to compute ∂x v and ∂y v. Integrate both to get
v. Finally, express u + iv as f (z).)
b) Given that $v(x + iy) = -y^3 + 3x^2 y - y$, compute u and f.
c) Given that f (z) = tan z, compute u and v. (Hint: use exercise 11.8 (c).)
An interesting result — though we will not prove it — is the following. A
weaker version of this is called the Casorati-Weierstrass Theorem and has
an easy proof [21][chapter 4] [35][chapter 3].
Theorem 11.25 (Picard Theorem). Let f have an isolated essential singu-
larity at z0 . Then the image of any punctured neighborhood of z0 contains
all values infinitely often with at most one exception.
$\lim_{k\to\infty} \frac{k^2}{(k+1)(k+2)} = 1$.
c) Show that $\lim_{k\to\infty} h_k(x) = 0$.
d) Show that $\frac{d}{dx}\lim_{k\to\infty} h_k(x) = 0$ while $\lim_{k\to\infty} \frac{d}{dx} h_k(x)$ does not exist
at x = 1/2 (for example).
Figure 57. The functions $g_k$ and $h_k$ of exercise 11.19 for k ∈ {2, 8, 15, 30}.
Exercise 11.20. Set α = a + ib where a and b are real and greater than zero
and let $f(z) = (z - \alpha)^{-1}$.
a) Show that f is analytic inside and on the contour C given in Figure 58.
b) Show $\oint_C f = 0$.
c) Show that $\int_{b_i} f$ tends to 0 as R tends to infinity. (Hint: |f| → 0 while the
path length remains finite.)
d) Show that $\int_r f$ tends to πi as R tends to infinity. (Hint: set $z(t) = ib + Re^{it}$ with t ∈ [0, π].)
e) Show that $\oint_p f$ tends to −2πi as R tends to infinity. (Hint: set $z(t) = \alpha + re^{-it}$ with t ∈ [0, 2π].)
f) Conclude that $\lim_{R\to\infty} \int_{-R}^{+R} f(z)\,dz = \pi i$. (Hint: use (a).)
Exercise 11.23. a) Let f (t) = 1. Show that its Laplace transform as defined
in Theorem 11.18 does not have an analytic continuation to the imaginary
axis.
b) In (a), show that the conclusion of Theorem 11.18 does not hold.
c) Repeat (a) and (b), but now for $f(t) = e^{i\omega t}$.
12.1. Preliminaries
Recall that π(x) denotes the number of primes in the interval [2, x]. So
π(2) = 1, π(3.2) = 2, and so on. The reason that the variable x is real is that
it simplifies the formulas to come. The Riemann zeta function is denoted
by ζ (s), see Definition 2.19 and Proposition 2.20. In this chapter, we will
frequently encounter sums of the form $\sum_p$. For example see Definition 12.1
below. Such sums will always be understood to be over all positive primes.
On the other hand, $\sum_{p \le x}$ indicates a sum over all positive primes p less than
240 12. The Prime Number Theorem
It is analytic in Re z > 1.
Figure 59. [Graphs of θ(x) and f(x).]
Now, θ(t) is constant except at the values t = p (a prime) where
it has a jump discontinuity of size ln p. Thus, in this case, I(x) simplifies to
$$I(x) = \int_1^x f(t)\,d\theta(t) = \sum_{p \le x} f(p)\ln(p) . \qquad (12.2)$$
On the other hand, we can find a different expression for I(x) by integration
by parts (sometimes called partial integration)
$$I(x) = \int_1^x d\big(f(t)\theta(t)\big) - \int_1^x \theta(t)\,df(t) = f(t)\theta(t)\Big|_1^x - \int_1^x f'(t)\theta(t)\,dt . \qquad (12.3)$$
The point of this operation is usually that now we have expressed the integral
in (12.2) as a fixed expression plus another integral which has better
convergence properties than the original integral. For instance if $f(t) = t^{-k}$,
then $f'(t) \propto t^{-k-1}$ and so the integral converges faster.
Lemma 12.2. We have for x ≥ 2
$$\pi(x) = \frac{\theta(x)}{\ln x} + \int_2^x \frac{\theta(t)}{t(\ln t)^2}\,dt .$$
Proof. First note that since 2 is the smallest prime, equation (12.2) gives
$$\pi(x) = \int_{2-\varepsilon}^{x} \frac{d\theta(t)}{\ln t} .$$
Apply integration by parts (12.3) to obtain
$$\pi(x) = \frac{\theta(x)}{\ln x} - \int_{2-\varepsilon}^{x} \theta(t)\,d\!\left(\frac{1}{\ln t}\right) .$$
Using $d\frac{1}{\ln t} = -\frac{dt}{t(\ln t)^2}$ to work out the last term yields the lemma with lower
limit 2 − ε in the integral. But since θ(t) = 0 for t < 2, we may replace that
limit by 2.
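Lemma 12.2 can be verified exactly on a computer, because θ is a step function: on each interval between consecutive primes the integrand is a constant times $\frac{1}{t(\ln t)^2}$, whose antiderivative is $-\frac{1}{\ln t}$. The sketch below uses that observation; the sieve and the function names are our own illustrative choices.

```python
import math

def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def lemma_12_2_rhs(x: float, primes: list) -> float:
    """theta(x)/ln x + integral_2^x theta(t) / (t ln^2 t) dt, integrated piecewise."""
    ps = [p for p in primes if p <= x]
    theta = 0.0
    integral = 0.0
    for j, p in enumerate(ps):
        theta += math.log(p)                        # value of theta on [p, next prime)
        upper = ps[j + 1] if j + 1 < len(ps) else x
        integral += theta * (1 / math.log(p) - 1 / math.log(upper))
    return theta / math.log(x) + integral

primes = primes_up_to(1000)
for x in (2, 3.2, 100, 1000):
    print(x, lemma_12_2_rhs(x, primes), sum(1 for p in primes if p <= x))
```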
Lemma 12.3. For Re z > 1, we have
$$\frac{\Phi(z)}{z} - \frac{1}{z-1} = \int_1^{\infty} \left(\frac{\theta(x)}{x} - 1\right)x^{-z}\,dx = \int_0^{\infty} \left(\theta(e^t)e^{-t} - 1\right)e^{-zt+t}\,dt .$$
Proof. Using (12.2), we can write Φ(z) as $\int_1^{\infty} x^{-z}\,d\theta(x)$. Then apply (12.3)
(partial integration) to obtain
$$\Phi(z) = x^{-z}\theta(x)\Big|_1^{\infty} + z\int_1^{\infty} x^{-z-1}\theta(x)\,dx .$$
We will see in equation (12.6) that for Re z > 1, the boundary term $x^{-z}\theta(x)\big|_1^{\infty}$
vanishes. This gives
$$\frac{\Phi(z)}{z} = \int_1^{\infty} \frac{\theta(x)}{x}\,x^{-z}\,dx .$$
Noting that $1/(z-1) = \int_1^{\infty} x^{-z}\,dx$, the first equality follows. The second
we have pk ≤ n.
Proof. We prove the right-hand side first. From the binomial theorem (Theorem
5.30), we see that (since the $p_i$ must divide k ≤ n)
$$\binom{n}{\lfloor n/2 \rfloor} < \sum_{i=0}^{n} \binom{n}{i} = 2^n .$$
For the left-hand side, we note that $\binom{n}{\lfloor n/2 \rfloor}$ is the largest of the n+1 numbers
$\binom{n}{i}$ and so
$$(n+1)\binom{n}{\lfloor n/2 \rfloor} > \sum_{i=0}^{n} \binom{n}{i} = 2^n .$$
bn/2c i=0 i
n
Lemma 12.6. i) For all n ≥ 2, we have bn/2c ≤ nπ(n) .
n
ii) For n ≥ 2 a power of 2, we have eθ (n)−θ (n/2) ≤ n/2 .
Proof. For the first inequality, use unique factorization (Theorem 2.11) and
the definition of π(n) to write
$$\binom{n}{\lfloor n/2 \rfloor} = \prod_{i=1}^{\pi(n)} p_i^{k_i} .$$
By Lemma 12.4, $p_i^{k_i} \le n$. Thus $\prod_{i=1}^{\pi(n)} p_i^{k_i} \le n^{\pi(n)}$, which yields the inequality.
For the second inequality, we start by noticing that n is even and so any
prime p in the interval $\left(\frac{n}{2}, n\right]$ is a divisor of n! but not of the denominator of
$\binom{n}{n/2}$. Therefore any such p divides $\binom{n}{n/2}$. This implies that
$$\prod_{\frac{n}{2} < p \le n} p \le \binom{n}{n/2} .$$
Noting that p = eln p and inserting the definition of θ (x) (Definition 12.1)
yields the last inequality.
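The inequalities for the central binomial coefficient used above, $\frac{2^n}{n+1} < \binom{n}{\lfloor n/2 \rfloor} < 2^n$, $\binom{n}{\lfloor n/2 \rfloor} \le n^{\pi(n)}$, and $\prod_{n/2 < p \le n} p \le \binom{n}{n/2}$, can be confirmed for small n with exact integer arithmetic. This is a brute-force sketch with helper names of our own; the ranges are arbitrary.

```python
import math

def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def binomial_bounds_hold(max_n: int = 300) -> bool:
    """Check 2^n < (n+1) C(n, n//2), C(n, n//2) < 2^n and C(n, n//2) <= n^pi(n) for 2 <= n <= max_n."""
    primes = primes_up_to(max_n)
    for n in range(2, max_n + 1):
        c = math.comb(n, n // 2)
        pi_n = sum(1 for p in primes if p <= n)
        if not (2 ** n < (n + 1) * c and c < 2 ** n and c <= n ** pi_n):
            return False
    return True

def prime_product_bound_holds(max_exp: int = 10) -> bool:
    """Check that the product of primes in (n/2, n] is at most C(n, n/2) for n = 2, 4, ..., 2^max_exp."""
    primes = primes_up_to(2 ** max_exp)
    for e in range(1, max_exp + 1):
        n = 2 ** e
        product = math.prod(p for p in primes if n // 2 < p <= n)
        if product > math.comb(n, n // 2):
            return False
    return True

print(binomial_bounds_hold(), prime_product_bound_holds())
```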
Theorem 12.7 (Chebyshev’s Theorem). For any a < ln 2 and b > 4 ln 2,
there is a large enough K such that
$$\forall x \ge K : \quad \frac{\pi(x)}{x/\ln x} \in [a, b] .$$
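To get a feeling for Chebyshev's theorem, one can tabulate the ratio $\pi(x)/(x/\ln x)$ for a few values of x and compare with the interval [ln 2, 4 ln 2]. The sketch below does this with a simple sieve; the sample points are arbitrary and the helper names are ours. It illustrates the statement but says nothing about how large K has to be.

```python
import bisect
import math

def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]

PRIMES = primes_up_to(10 ** 5)

def prime_count(x: float) -> int:
    """pi(x) for x <= 10^5, by binary search in the sorted prime list."""
    return bisect.bisect_right(PRIMES, x)

def chebyshev_ratio(x: float) -> float:
    """pi(x) / (x / ln x)."""
    return prime_count(x) / (x / math.log(x))

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    print(x, prime_count(x), chebyshev_ratio(x))
print(math.log(2), 4 * math.log(2))
```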
Equations (12.6) and (12.8) will also play an important role in the proof
of the (full) prime number theorem.
12.3. Properties of the Zeta Function
is analytic.
Proof. First, set $w := p^{-z} = e^{-z\ln p}$. Using that the Taylor series at 0
$$-\ln(1 - w) = \sum_{n \ge 1} \frac{w^n}{n} ,$$
We saw in exercise 2.24 (c) that ζ(z) diverges as $z \searrow 1$. Here
is a more precise statement. Recall that analytic continuations are well-defined
(i.e. unique) in domains with only isolated singularities (see Theorem
11.22).
Proposition 12.10. i) The functions (z − 1)ζ(z) and $(z - 1)\zeta'(z) + z\zeta(z)$
have well-defined analytic continuations on Re z > 0.
ii) (The analytic continuation of) (z − 1)ζ (z) evaluated at z = 1 equals 1.
Figure 60. Integration over the shaded triangle of area 1/2 in equation (12.11).
Proof. We have that $\int_1^{\infty} x^{-z}\,dx = 1/(z-1)$ and $\int_n^{n+1} n^{-z}\,dx = n^{-z}$. Using
$$\cdots = \sum_{n=1}^{\infty} \int_n^{n+1}\!\int_n^x -z u^{-z-1}\,du\,dx . \qquad (12.11)$$
Each term of the sum is an integral over a triangular domain of area 1/2
(Figure 60). The maximum of the integrand is
$$\left|z n^{-z-1}\right| = \sqrt{\sigma^2 + \tau^2}\; n^{-\sigma-1} ,$$
where z = σ + iτ (with σ, τ real). So, each summand has absolute value less
than half that. Thus (12.11) converges uniformly on compact disks in σ > 0
(see also exercise 11.4) and so h has an analytic continuation to Re z > 0.
$$= \sum_p \sum_{n \ge 1} \frac{2\,(1 + \cos(\tau n \ln p))^2}{n\,p^{n\sigma}} > 0 .$$
But Re E > 0 yields $|e^E| > 1$, which implies the lemma.
Proof. Taking a derivative with respect to z on both sides of the first equality
of Lemma 12.9, we obtain
$$\frac{-\zeta'(z)}{\zeta(z)} = \sum_p \frac{\ln p\; e^{-z\ln p}}{1 - e^{-z\ln p}} = \sum_p \frac{\ln p}{p^z - 1} .$$
To express this in terms of the function Φ, we use $\frac{1}{x-1} = \frac{1}{x} + \frac{1}{x(x-1)}$ to get
$$\frac{-\zeta'(z)}{\zeta(z)} = \sum_p \frac{\ln p}{p^z} + \sum_p \frac{\ln p}{p^z(p^z - 1)} .$$
The first term on the right, of course, is Φ(z) (Definition 12.1). Subtracting
the second term on the right, we see that
$$\frac{\Phi(z)}{z} - \frac{1}{z-1} = \frac{-\zeta'(z)}{z\zeta(z)} - \frac{1}{z-1} - \frac{1}{z}\sum_p \frac{\ln p}{p^z(p^z - 1)} .$$
We tackle the first term on the right-hand side. From Proposition 12.10
(i), we obtain that both the numerator and the denominator are analytic on
Re z > 0. We only need to make sure the denominator does not have zeros
in Re z ≥ 1. By Proposition 12.10 (ii), we know that it does not have a zero
at z = 1. By remark 12.8, ζ (z) has no zeroes for Re z > 1. Lemma 12.11
says that it has no zeroes if Re z = 1.
Next we look at the second term on the right-hand side. Since ln p is
smaller than any positive power of p, the last term on the right-hand side is
comparable to $p^{-2z}$. Since $|p^{-2z}| \le p^{-2}$ whenever Re z ≥ 1, it converges
uniformly in that region and is thus analytic in the desired region (Proposition
11.14 (ii)).
12.5. The Prime Number Theorem
ii) $\lim_{x\to\infty} \frac{\theta(x)}{x} = 1 \iff \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = 1$.
Proof. We first prove (i). Suppose that the conclusion of the lemma does
not hold. Then for some ε > 0 either there is a sequence of xi such that
limi→∞ xi = ∞ with θ (xi ) > (1 + ε)xi or the same holds with θ (xi ) < (1 −
ε)xi .
Let us assume the former. Since θ is monotone, we have for all i
$$\int_{x_i}^{(1+\varepsilon)x_i} \frac{\theta(y) - y}{y^2}\,dy > \int_{x_i}^{(1+\varepsilon)x_i} \frac{(1+\varepsilon)x_i - y}{y^2}\,dy = \Big[-(1+\varepsilon)x_i\,y^{-1} - \ln y\Big]_{x_i}^{(1+\varepsilon)x_i} .$$
The latter can easily be worked out and yields ε − ln(1 + ε) for each i. Since
this is strictly greater than 0 by exercise 10.11, $I(s) = \int_1^s \frac{\theta(y)-y}{y^2}\,dy$ cannot
converge to a fixed value as s tends to infinity.
The proof of non-convergence if θ (xi ) < (1 − ε)xi is almost identical
(exercise 12.17).
To prove (ii), we use Lemma 12.2 to establish that
$$\pi(x) - \frac{\theta(x)}{\ln x} = \int_2^x \frac{\theta(t)}{t(\ln t)^2}\,dt .$$
Next we use (12.6) to get rid of the θ(x) in the integrand, and subsequently
(12.8) to estimate the remaining integral. For large x, this gives
$$\pi(x) - \frac{\theta(x)}{\ln x} \le 8\ln 2\,(1+\varepsilon)\,\frac{x}{(\ln x)^2} ,$$
for any ε > 0. Now we multiply both sides by ln x/x to obtain the result.
So this lemma implies that to prove the prime number theorem at this
point, we need to show that $\int_1^{\infty} \frac{\theta(x)-x}{x^2}\,dx = \int_0^{\infty} (\theta(e^t)e^{-t} - 1)\,dt$ exists. We
restate Theorem 2.21 in its full glory.
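Theorem 12.14 (Prime Number Theorem). We have
\[ 1)\ \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = 1 \quad\text{and}\quad 2)\ \lim_{x\to\infty} \frac{\pi(x)}{\int_2^x (\ln t)^{-1}\,dt} = 1 . \]
Proof. The equivalence of parts (1) and (2) is due to the fact that L'Hôpital's
rule implies that $\lim_{x\to\infty} \frac{x(\ln x)^{-1}}{\int_2^x (\ln t)^{-1}\,dt} = 1$. Thus, for example,
\[ \lim_{x\to\infty} \frac{\pi(x)}{\int_2^x (\ln t)^{-1}\,dt} = \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x}\cdot\frac{x/\ln x}{\int_2^x (\ln t)^{-1}\,dt} . \]
The same reasoning works vice versa (exercise 12.10).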
So we only need to prove part (1). Lemma 12.3 gives
\[ \frac{\Phi(z+1)}{z+1} - \frac{1}{z} = \int_0^\infty \big(\theta(e^t)e^{-t}-1\big)\,e^{-zt}\,dt . \]
Proposition 12.12 says that the left-hand side has an analytic continuation
in Re z ≥ 0 while equation (12.6) says that $\theta(e^t)e^{-t}-1$ is bounded. But
then, by Theorem 11.18, $\int_0^\infty (\theta(e^t)e^{-t}-1)\,dt$ exists. Finally, Lemma 12.13
implies part (1).
12.6. Exercises
Exercise 12.1. Write out in full the computations referred to in the proofs
of Lemmas 12.2 and 12.3.
Proposition 12.15 (Abel Summation). For the sequence $\{a_n\}_{n=1}^\infty$, denote
$A(x) = \sum_{n\le x} a_n$. Then for any differentiable f, we have
\[ \sum_{n\le x} a_n f(n) = A(x)f(x) - \int_1^x A(t)f'(t)\,dt . \]
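A small numerical check may make Abel summation concrete. This is a sketch under stated assumptions: the function names `direct_sum` and `abel_rhs` are hypothetical, and the integral is computed exactly by using that A(t) is constant on each interval [n, n+1), so that the integral of A(t) f'(t) over such an interval is A(n)(f(n+1) − f(n)).

```python
import math

def direct_sum(a, f, x):
    """Left-hand side: sum of a(n) f(n) over 1 <= n <= x."""
    return sum(a(n) * f(n) for n in range(1, math.floor(x) + 1))

def abel_rhs(a, f, x):
    """Right-hand side: A(x) f(x) minus the integral of A(t) f'(t) over [1, x]."""
    N = math.floor(x)
    A, integral = 0.0, 0.0
    for n in range(1, N + 1):
        A += a(n)                        # A(t) = A(n) for t in [n, n+1)
        upper = n + 1 if n < N else x    # last piece stops at x
        integral += A * (f(upper) - f(n))
    return A * f(x) - integral

if __name__ == "__main__":
    # a_n = 1 and f(t) = 1/t give the harmonic sum of exercise 12.3.
    print(direct_sum(lambda n: 1.0, lambda t: 1 / t, 100.5))
    print(abel_rhs(lambda n: 1.0, lambda t: 1 / t, 100.5))
```

The two printed values agree up to rounding; the same check works for any sequence and any f, since only values of f are needed.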
Exercise 12.3. Recall the notation ⌊x⌋ (floor) and {x} (fractional part) from
Definition 2.1.
a) Use Abel summation to show that
\[ \sum_{n\le x} \frac{1}{n} = \frac{x-\{x\}}{x} + \int_1^x \frac{t-\{t\}}{t^2}\,dt . \]
Exercise 12.6. a) How many trailing zeros does 400! (in decimal notation)
have? (Hint: use the proof of Lemma 12.4 with p = 5 and p = 2.)
b) How about $\binom{400}{200}$?
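The hint amounts to counting the powers of 2 and 5 dividing a factorial. The sketch below assumes that the relevant fact from the proof of Lemma 12.4 is the standard formula that the exponent of p in n! equals the sum of ⌊n/p^k⌋ over k ≥ 1; the function names are hypothetical, and the exercise's own numbers are left for the reader to run.

```python
import math

def prime_exponent_in_factorial(n, p):
    """Exponent of the prime p in n!, as the sum of floor(n / p^k)."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e

def trailing_zeros_factorial(n):
    """Trailing decimal zeros of n!: the smaller of the exponents of 2 and 5."""
    return min(prime_exponent_in_factorial(n, 2),
               prime_exponent_in_factorial(n, 5))

def trailing_zeros_binomial(n, k):
    """Trailing decimal zeros of the binomial coefficient n choose k."""
    def v(p):
        return (prime_exponent_in_factorial(n, p)
                - prime_exponent_in_factorial(k, p)
                - prime_exponent_in_factorial(n - k, p))
    return min(v(2), v(5))

if __name__ == "__main__":
    print(trailing_zeros_factorial(30), math.factorial(30))
```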
Exercise 12.7. Consider $E(x_1, x_2) := \lfloor x_1+x_2\rfloor - \lfloor x_1\rfloor - \lfloor x_2\rfloor$ as in the proof
of Lemma 12.4 and show that E ∈ {0, 1}.
c) Use (b) to show that parts (a) and (b) of Theorem 12.14 are equivalent.
d) Compare (b) to (12.8).
In the next two problems we prove the following result.
Proposition 12.16. Let $p_n$ denote the nth prime. The prime number theorem
is equivalent to
\[ \lim_{n\to\infty} \frac{p_n}{n\ln n} = 1 . \]
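To get a feeling for how slowly this limit is approached, one can tabulate the ratio. The following is only an illustrative sketch; the helper names `primes_up_to`, `nth_prime`, and `ratio` are hypothetical, and the sieve bound is simply doubled until enough primes are found.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def nth_prime(n):
    """The nth prime (n >= 1), found by sieving ever larger ranges."""
    limit = 16
    while True:
        ps = primes_up_to(limit)
        if len(ps) >= n:
            return ps[n - 1]
        limit *= 2

def ratio(n):
    """p_n / (n ln n) for n >= 2."""
    return nth_prime(n) / (n * math.log(n))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000, 100_000):
        print(n, nth_prime(n), ratio(n))
```

The printed ratios decrease towards 1, but only slowly.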
Exercise 12.11. For this exercise, assume that $\lim_{x\to\infty} \frac{y}{x/\ln x} = 1$ and that
x → ∞ if and only if y → ∞. (In fact, y stands for π(x), and we know that
x → ∞ if and only if π(x) → ∞, see Theorem 2.17.)
a) Suppose $\lim_{x\to\infty} f_i(x) = \infty$ and $\lim_{x\to\infty} \frac{f_1(x)}{f_2(x)} = 1$. Show that
\[ \lim_{x\to\infty} \frac{\ln f_1(x)}{\ln f_2(x)} = 1 . \]
(Hint: for x large, $(1-\varepsilon) < \frac{f_1(x)}{f_2(x)} < (1+\varepsilon)$, multiply by $f_2(x)$, and take
logarithms.)
b) Show that $\lim_{x\to\infty} \frac{\ln\ln x}{\ln x} = 0$. (Hint: substitute $x = e^{e^t}$.)
c) Use the hypotheses and (a) to show that
x x y ln y 1
= = .
y ln y y ln y x/ ln x ln x − ln ln x 1 − lnlnlnxx
d) Use (b) to show that the limit in (c) as x → ∞ tends to 1. Use the
hypotheses to change to the limit as y → ∞.
e) Show that (d) implies one way of Proposition 12.16.
Exercise 12.12. For this exercise, assume that $\lim_{y\to\infty} \frac{x}{y\ln y} = 1$ and that
x → ∞ if and only if y → ∞. See exercise 12.11.
a) Follow exercise 12.11 in reverse to show that
y x y ln y 1
lim = lim = lim = 1.
x→∞ x/ ln x x→∞ y ln y x/ ln x ln x − ln ln x x→∞ 1 − ln ln x
ln x
b) Show that (a) implies the other direction of Proposition 12.16.
c) Whereabouts is the nth prime located?
Exercise 12.13. In this exercise, we fix any K > 1 and $\{x_i\}_{i=1}^\infty$ is a sequence
such that $\lim_{i\to\infty} x_i = \infty$. We also set $x' = Kx$ for notational ease.
a) Show that if $\pi(x_i') = \pi(x_i)$ and $\lim_{i\to\infty} \frac{\pi(x_i)}{x_i/\ln x_i}$ exists, then
\[ \lim_{i\to\infty} \frac{\pi(x_i')}{x_i'/\ln x_i'} = \frac{1}{K}\,\lim_{i\to\infty} \frac{\pi(x_i)}{x_i/\ln x_i} . \]
b) Show that (a) and the prime number theorem imply that for large enough
x, there are primes in $(x, x']$. (Hint: if (a) holds, then there are no primes in
$[x_i, Kx_i]$.)
c) Show that in fact, the prime number theorem implies
\[ \lim_{i\to\infty} \frac{\pi(x_i')}{\pi(x_i)} = K . \]
d) Show that (c) implies that for large enough x, there are approximately
(K − 1)π(x) primes in $(x, x']$.
In fact, the following holds for all n. We omit the proof, which involves
some careful computations. It can be found in [2].
Proposition 12.17 (Bertrand’s Postulate). For all n ≥ 2 there is a prime
in the interval [n, 2n).
The same reference [2] also mentions an open (in 2018) problem in this
direction: Is there always a prime between $n^2$ and $(n+1)^2$?
Exercise 12.14. a) Show for every m ∈ N, the set {m! + 2, · · · , m! + m}
contains no primes. (Hint: for 2 ≤ j ≤ m we have j | (m! + j).)
b) Show that from Proposition 12.16, we might reasonably expect the
“expected” prime gap $p_{n+1} - p_n$ to be equal to
\[ G_n := (n+1)\ln(n+1) - n\ln n \approx \ln((n+1)e) , \]
if n is large.
c) Use the prime number theorem to show that
\[ G_n \approx \ln p_{n+1} - \ln\ln p_{n+1} + 1 \approx \ln p_{n+1} . \]
d) Assume the twin prime conjecture to show that $\frac{p_{n+1}-p_n}{\ln p_{n+1}}$ does not
converge to a limit. See also Figure 61.
e) Use Lemma 12.13 to show that the prime number theorem is equivalent
to saying that the sum of the first n “expected” prime gaps equals $p_{n+1}$.
Figure 61. The prime gaps pn+1 − pn divided by ln pn+1 for n in {1, · · · , 1000}.
Just like the first Chebyshev function θ (x), the second Chebyshev function
ψ(x) is often used as a more tractable version of the prime counting function
π(x). In particular, in exercises 12.18 and 12.19, we will prove a lemma
similar to Lemma 12.13, namely
Lemma 12.19. We have
\[ \lim_{x\to\infty} \frac{\psi(x)}{x} = 1 \iff \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = 1 . \]
Exercise 12.19. a) Show that Definitions 12.1 and 12.18 imply that θ(x) ≤
ψ(x).
b) Use (a) and exercises 12.16 (d) and 12.18 (d) to show that
\[ (1-\varepsilon)\,\frac{(\pi(x)-x^{1-\varepsilon})\ln x}{x} \le \frac{\theta(x)}{x} \le \frac{\psi(x)}{x} \le \frac{\pi(x)\ln x}{x} . \]
c) Use (b) and Lemma 12.13 (ii) to prove Lemma 12.19.
Exercise 12.20. a) Plot θ(x)/x, ψ(x)/x, and π(x) ln x/x in one figure. (See,
for example, Figure 62.) Compare with exercise 12.18.
b) Show that all three tend to 1 as x tends to infinity.
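The three quantities of exercise 12.20 are easy to tabulate before plotting them. The sketch below makes one assumption that is not visible in this part of the text: it takes ψ(x) to be the standard second Chebyshev function, the sum of ln p over all prime powers p^k ≤ x, as the content of Definition 12.18. The function names are hypothetical.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def chebyshev_ratios(x):
    """Return (theta(x)/x, psi(x)/x, pi(x) ln(x)/x) for an integer x >= 2."""
    ps = primes_up_to(x)
    theta = sum(math.log(p) for p in ps)
    psi = 0.0
    for p in ps:
        q = p
        while q <= x:          # one contribution ln p per prime power p^k <= x
            psi += math.log(p)
            q *= p
    return theta / x, psi / x, len(ps) * math.log(x) / x

if __name__ == "__main__":
    for x in (100, 1000, 10_000, 100_000):
        print(x, chebyshev_ratios(x))
```

The output illustrates the ordering θ(x) ≤ ψ(x) ≤ π(x) ln x of exercise 12.19 (b), and that the third ratio approaches 1 much more slowly than the first two.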
Figure 62. The functions θ (x)/x (green), ψ(x)/x (red), and π(x) ln x/x
(blue) for x ∈ [1, 1000]. All converge to 1 as x tends to infinity. The
x-axis is horizontal.
Exercise 12.25. a) Show that the function H(z) of exercise 12.24 (d) has
an analytic continuation to Re z ≥ 0. (Hint: the pole at z = 0 has been
canceled by the subtraction of 1/z.)
b) Use Theorem 11.18 to show that $\int_0^\infty \frac{A(x)-x}{x^2}\,dx$ converges.
c) Use Lemma 12.13 (i) to show that $\lim_{x\to\infty} \frac{A(x)}{x} = 1$.
\[ \lim_{n\to\infty} \Big( \prod_{p\le n} p \Big)^{1/n} = e \]
if and only if the prime number theorem holds. (Hint: see Lemma 12.13 (ii).)
b) (See Figure 63.) Show that
\[ \lim_{n\to\infty} \big( \operatorname{lcm}(1, 2, \dots, n) \big)^{1/n} = e \]
if and only if the prime number theorem holds. (Hint: see Lemma 12.19.)
Figure 63. Plot of the function $f(n) := (\operatorname{lcm}(1, 2, \dots, n))^{1/n}$ for n in
{1, ..., 100} (left) and in $\{10^4, \dots, 10^5\}$ (right). The function converges
to e, indicated in the plots by a line.
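The function of Figure 63 can be evaluated directly with exact integer arithmetic. This is a minimal sketch; the name `lcm_root` is hypothetical, and the n-th root is taken through the logarithm so that the very large integer lcm(1, ..., n) never has to be converted to a float.

```python
import math

def lcm_root(n):
    """(lcm(1, 2, ..., n))^(1/n), computed as exp(ln(lcm) / n)."""
    l = 1
    for k in range(2, n + 1):
        l = l * k // math.gcd(l, k)   # lcm(l, k)
    return math.exp(math.log(l) / n)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, lcm_root(n), math.e)
```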
Part 3
Primes in Arithmetic
Progressions
264 13. Primes in Arithmetic Progressions
i=1
13.2. The Hermitian Inner Product 265
One easily checks that this binary operation satisfies the requirements
that for all x, y, and z in V and α in C
1) (x, x) ≥ 0 positivity
2) (x, x) = 0 ⇐⇒ x = 0 definiteness
3) (x, αu + v) = α(x, u) + (x, v) linearity
4) $(x, y) = \overline{(y, x)}$ conjugate symmetry
More generally, any function V × V → C that satisfies these requirements is
called an inner product, but we will not be needing that generality here.
Definition 13.5. A set {ei }ni=1 of vectors in V is an orthonormal basis if for
all i 6= j and all x in V
The property that is crucial for us is that the αi in item (3) of this defi-
nition can be computed easily, namely
\[ x = \sum_{i=1}^{n} (e_i, x)\,e_i . \qquad (13.2) \]
For more details and a good general introduction, see [6][Chapter 6].
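Formula (13.2) can be tried out on a small example. The sketch below assumes the standard Hermitian inner product on C^n, conjugate-linear in the first argument and linear in the second (in agreement with items (3) and (4) above); the basis of C^2 and the names `inner` and `expand` are illustrative choices, not taken from the text.

```python
import math

def inner(x, y):
    """Hermitian inner product: sum of conj(x_i) * y_i."""
    return sum(complex(xi).conjugate() * yi for xi, yi in zip(x, y))

def expand(x, basis):
    """Rebuild x as the sum of (e_i, x) e_i over an orthonormal basis."""
    coeffs = [inner(e, x) for e in basis]
    return [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(len(x))]

s = 1 / math.sqrt(2)
# An orthonormal basis of C^2 that is not the standard one.
BASIS = [(s, s * 1j), (s, -s * 1j)]

if __name__ == "__main__":
    x = (2 + 1j, -3j)
    print([inner(e, x) for e in BASIS])   # the coefficients alpha_i
    print(expand(x, BASIS))               # equals x up to rounding
```

Note that the order of the arguments in (e_i, x) matters: swapping them would conjugate the coefficients.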
and for a and m in $\mathbb{Z}_o^+$, we set $m/o := (\frac{m_1}{o_1}, \dots, \frac{m_r}{o_r})$ and
\[ g^a := \prod_{j=1}^{r} g_j^{a_j} ; \qquad a\cdot(m/o) := \sum_{j=1}^{r} \frac{a_j m_j \bmod o_j}{o_j} . \]
(1,1)
(1,−1)
Often these are abbreviated to L-series and L-function, though some authors
reserve those names for generalizations of those notions.
These L-functions have the “feel” of a zeta function, as the next result
indicates. We will use a complicated combination of L-functions as a “new”
zeta function to prove our main theorem. In the remainder of this chapter,
we abbreviate “the function f has a well-defined analytic continuation in the
region S” by “f is analytic in S”.
Proposition 13.14. If ψ is bounded and completely multiplicative, then
L(ψ, z) is analytic in Re z > 1 and
\[ \ln\Big( \sum_{n=1}^{\infty} \psi(n)\,n^{-z} \Big) = \ln L(\psi, z) = -\sum_{p \text{ prime}} \ln(1-\psi(p)p^{-z}) = \sum_p \sum_{n=1}^{\infty} \frac{\psi(p^n)}{n\,p^{nz}} . \]
Upon taking the logarithm, we arrive at the second equality. The third one
— and analyticity — follows from Lemma 12.9.
To prove the last part, we use Proposition 12.15 and compute
\[ L(\psi, z) = \sum_{n\le x} \psi(n)\,n^{-z} = \Psi(x)x^{-z} + z\int_1^x \Psi(t)\,t^{-z-1}\,dt , \]
where $\Psi(x) = \sum_{n\le x} \psi(n)$. Since ψ has period, say, q with average 0, we
have Ψ(x + q) = Ψ(x), and so Ψ is bounded. Thus both terms in the above
equation converge for Re z > 0.
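The equality of the series and the Euler product in Proposition 13.14 can be observed numerically. The sketch below uses, as an illustrative choice not taken from the text, the non-principal character mod 4 (value 1 at n ≡ 1, value −1 at n ≡ 3, and 0 at even n), which is bounded and completely multiplicative; the function names are hypothetical. At z = 2 the common value is known as Catalan's constant, approximately 0.915966.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def chi4(n):
    """Non-principal character mod 4."""
    return (0, 1, 0, -1)[n % 4]

def l_series(z, terms):
    """Partial sum of chi4(n) / n^z for n up to `terms`."""
    return sum(chi4(n) / n ** z for n in range(1, terms + 1))

def euler_product(z, bound):
    """Partial Euler product over primes p <= bound."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod /= 1 - chi4(p) / p ** z
    return prod

if __name__ == "__main__":
    print(l_series(2, 100_000), euler_product(2, 100_000))
```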
Proof. Since χ(a) has unit modulus, we have that $\overline{\chi(a)}\chi(a) = 1$. Because
there are ϕ(q) characters, the first equality follows.
The second equality is automatic if either a or n is not co-prime to
q. If a and n are distinct and co-prime to q, then recall that the characters form
an orthogonal basis. Thus there must be another character $\chi^* \in X_q$ so that
$\chi^*(a^{-1}n) \ne 1$. Since the reduced residues mod q form a group, from the
above we must have that $\overline{\chi(a)} = \chi(a^{-1})$. Using multiplicativity, we obtain
that $\sum_{\chi\in X_q} \overline{\chi(a)}\chi(n)$ equals
We will define quantities that allow us to mimic the proof of the prime
number theorem. To facilitate this, we use uppercase letters of the cor-
responding notation we used earlier. So ζ becomes Z, π becomes Π, θ
becomes Θ, and Φ stays the same. We will then proceed to give a proof
of the prime number theorem for arithmetic progressions that follows the
proof of Theorem 12.14 as closely as possible. As in Chapter 12, ∑ p and
∏ p mean sum or product over the (positive) primes.
The following definition should be compared with the definition of the
Riemann zeta function (Definition 2.19), of the prime counting function (in
Theorem 2.21), and Definition 12.1.
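Definition 13.16. We introduce a new zeta function $Z_{q,a}$, a function $\Pi_{q,a}$
that counts the primes congruent to a mod q, and two auxiliary functions.
\[ Z_{q,a}(z) := \prod_{\chi\in X_q} L(\chi, z)^{\overline{\chi(a)}} = \exp\Big( \sum_{\chi\in X_q} \overline{\chi(a)}\,\ln(L(\chi, z)) \Big) . \]
\[ \Pi_{q,a}(x) := \sum_{p\le x,\ p =_q a} 1 . \]
\[ \Theta_{q,a}(x) := \varphi(q) \sum_{p\le x,\ p =_q a} \ln p \quad\text{and}\quad \Phi_{q,a}(z) := \varphi(q) \sum_{p =_q a} \frac{\ln p}{p^z} . \]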
Note that Θq,a (x) ≤ ϕ(q)θ (x). Our first inequality follows from (12.6).
∃C > 0 such that Θq,a (x) ≤ Cx . (13.8)
The factor 1/ϕ(q) that figures so prominently in our main result, The-
orem 13.25, shows up in the following lemma.
Lemma 13.18. We have for x ≥ 2
\[ \Pi_{q,a}(x) = \frac{\Theta_{q,a}(x)}{\varphi(q)\ln x} + \frac{1}{\varphi(q)} \int_2^x \frac{\Theta_{q,a}(t)}{t(\ln t)^2}\,dt . \]
Proof. First note that since 2 is the smallest prime, equation (12.2) gives
\[ \Pi_{q,a}(x) = \frac{1}{\varphi(q)} \int_{2-\varepsilon}^{x} \frac{d\,\Theta_{q,a}(t)}{\ln t} . \]
The rest follows as in Lemma 12.2.
Lemma 13.19. For Re z > 1, we have
\[ \frac{\Phi_{q,a}(z)}{z} - \frac{1}{z-1} = \int_1^\infty \Big( \frac{\Theta_{q,a}(x)}{x} - 1 \Big) x^{-z}\,dx = \int_0^\infty \big( \Theta_{q,a}(e^t)e^{-t} - 1 \big)\,e^{-zt+t}\,dt . \]
Proof. Using (12.2), we can write $\Phi_{q,a}(z)$ as $\int_1^\infty x^{-z}\,d\Theta_{q,a}(x)$. Then apply
(12.3) (partial integration). The proof follows that of Lemma 12.3, except
that (12.6) is replaced by (13.8).
Proof. The first equality follows from Proposition 13.14. Then we follow
the reasoning of Lemma 12.9 to get
\[ -\ln\big( 1 - \chi(p)e^{-z\ln p} \big) = \sum_{n=1}^{\infty} \frac{\chi(p^n)}{n\,p^{nz}} , \]
where we used complete multiplicativity of χ. Since |χ| = 1, this is analytic
on Re z > 1. Substitute this back into the lemma. Analyticity then allows
us to perform the finite sum over χ first. By Lemma 13.15, this gives a
contribution ϕ(q) if both $p^n =_q a$ and $\gcd(p^n, q) = 1$, and else zero. This
proves the second equality of the lemma. Now the proof follows verbatim
the second paragraph of the proof of Lemma 12.9.
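The orthogonality relation of Lemma 13.15 that is used here can be checked by hand for a small modulus. The sketch below is restricted, as a simplifying assumption, to a prime modulus q with a given primitive root g, where the characters are χ_k(g^a) = exp(2πi k a/(q−1)); the general construction (Theorem 13.8) is not reproduced. The names `characters` and `orthogonality_sum` are hypothetical.

```python
import cmath

def characters(q, g):
    """All Dirichlet characters mod a prime q, given a primitive root g."""
    index = {pow(g, a, q): a for a in range(q - 1)}
    assert len(index) == q - 1, "g must be a primitive root mod q"

    def make(k):
        def chi(n):
            if n % q == 0:
                return 0j
            return cmath.exp(2j * cmath.pi * k * index[n % q] / (q - 1))
        return chi

    return [make(k) for k in range(q - 1)]

def orthogonality_sum(chars, a, n):
    """Sum over all characters of conj(chi(a)) * chi(n)."""
    return sum(chi(a).conjugate() * chi(n) for chi in chars)

if __name__ == "__main__":
    chars = characters(7, 3)
    for n in range(1, 15):
        print(n, abs(orthogonality_sum(chars, 2, n)))
```

For q = 7 and a = 2 the sum has modulus ϕ(7) = 6 exactly when n ≡ 2 mod 7 and vanishes otherwise, which is what singles out the primes in one residue class.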
13.6. Primes in Arithmetic Progressions 275
Now we note that by Lemma 13.20, in the region Re z > 1, we may do the
summation over χ first. We then see that by Lemma 13.15, the first term
on the right-hand side equals $\Phi_{q,a}(z)$. The rest of the proof follows that of
Proposition 12.12.
Lemma 13.24. For all q ≥ 2 and a such that gcd(a, q) = 1:
i) $\int_1^\infty \frac{\Theta_{q,a}(y)-y}{y^2}\,dy$ exists $\implies \lim_{x\to\infty} \frac{\Theta_{q,a}(x)}{x} = 1$.
Proof. The proof of (i) is entirely parallel to that of Lemma 12.13. For the
proof of (ii), we use Lemma 13.18 and (13.8) instead of Lemma 12.2 and
(12.6). So,
\[ \Pi_{q,a}(x) - \frac{\Theta_{q,a}(x)}{\varphi(q)\ln x} = \frac{1}{\varphi(q)} \int_2^x \frac{\Theta_{q,a}(t)}{t(\ln t)^2}\,dt \le \frac{1}{\varphi(q)}\,(1+\varepsilon)\,\frac{Cx}{(\ln x)^2} , \]
for any ε > 0. Multiply both sides by ln x/x to obtain the result.
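Theorem 13.25 (Prime Number Theorem for Arithmetic Progressions).
We have
\[ 1)\ \lim_{x\to\infty} \frac{\Pi_{q,a}(x)}{x/\ln x} = \frac{1}{\varphi(q)} \quad\text{and}\quad 2)\ \lim_{x\to\infty} \frac{\Pi_{q,a}(x)}{\int_2^x (\ln t)^{-1}\,dt} = \frac{1}{\varphi(q)} . \]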
Proof. The equivalence of (1) and (2) is the same as in Theorem 12.14.
So we only need to prove part (1). Lemma 13.19 gives
\[ \frac{\Phi_{q,a}(z+1)}{z+1} - \frac{1}{z} = \int_0^\infty \big( \Theta_{q,a}(e^t)e^{-t} - 1 \big)\,e^{-zt}\,dt . \]
Proposition 13.23 says that the left-hand side has an analytic continuation
in Re z ≥ 0 while equation (13.8) says that $\Theta_{q,a}(e^t)e^{-t} - 1$ is bounded. But
then, by Theorem 11.18, $\int_0^\infty (\Theta_{q,a}(e^t)e^{-t} - 1)\,dt$ exists. Finally, Lemma 13.24
implies part (1).
13.7. Exercises
Exercise 13.1. a) Finish the computation of (13.6) to show that fm is mul-
tiplicative. (Hint: see equation (13.4).)
b) Check that the entries of the table on the right in (13.3) correspond to (13.5).
Exercise 13.8. a) Use Theorem 13.8 and exercise 13.5 to construct the
characters of $\mathbb{Z}_{13}^\times$ and $\mathbb{Z}_{26}^\times$.
b) Show that these characters basically correspond to the Fourier transform
of Definition 13.26, except that the xk are re-ordered (see also exercise
13.10).
Exercise 13.10. a) For any odd prime p denote by g its smallest primitive
root. Show that there is a bijection $\mathrm{ind}_p : \mathbb{Z}_p^\times \to \mathbb{Z}_{p-1}^+$ given by
\[ \mathrm{ind}_p(g^a) = a . \]
The value $\mathrm{ind}_p(x)$ is called the index of x relative to p. The primitive root g
is called the base.
b) For every odd prime less than 20, choose the smallest primitive root as
base, and determine the indices of {1, 2, ..., p − 1}. Hint: as an example,
for p = 17 with base 3, we obtain the following table
x          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ind_17(x) 16 14  1 12  5 15 11 10  2  3  7 13  4  9  6  8
c) Prove that the indices behave like logarithms, that is:
$\mathrm{ind}_p(ab) =_{\varphi(p)} \mathrm{ind}_p(a) + \mathrm{ind}_p(b)$ and $\mathrm{ind}_p(a^k) =_{\varphi(p)} k\,\mathrm{ind}_p(a)$.
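The table in the hint can be reproduced mechanically. This is a minimal sketch with hypothetical function names; following the table, the index of 1 is recorded as p − 1 rather than 0, which is the same class modulo ϕ(p).

```python
def smallest_primitive_root(p):
    """Smallest g whose powers run through all of {1, ..., p-1} mod the prime p."""
    for g in range(2, p):
        x, seen = 1, set()
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")

def index_table(p, g):
    """Map x -> ind_p(x) with base g, using exponents 1, ..., p-1."""
    table, x = {}, 1
    for a in range(1, p):
        x = x * g % p
        table[x] = a
    assert len(table) == p - 1, "g must be a primitive root mod p"
    return table

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19):
        g = smallest_primitive_root(p)
        t = index_table(p, g)
        print(p, g, [t[x] for x in range(1, p)])
```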
Exercise 13.12. Show that for any k > 0 there are infinitely many primes
ending in k consecutive 9’s.
There are useful relations between the newly minted functions in this
chapter and their counterparts in Chapter 12. We prove the following lemma in
exercise 13.13.
Lemma 13.27. Let $q = \prod_{i=1}^{r} p_i^{k_i}$. We have the following equalities:
Exercise 13.13. a) Using the Euler product of Proposition 13.14, show that
\[ \ln L(\chi_1, z) = -\sum_p \ln(1-p^{-z}) + \sum_{p \mid q} \ln(1-p^{-z}) , \]
b) Show that (a) implies item (i) of Lemma 13.27.
c) Show that
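Exercise 13.17. a) Use Proposition 13.14 to show that for real z ≥ 1 and
$\chi \ne \chi_1$, $\sum_p \frac{\overline{\chi(a)}\chi(p)}{p^z}$ is bounded.
b) Use Lemma 13.15 to show that for Re z > 1
\[ \sum_{p =_q a} \frac{1}{p^z} = \frac{1}{\varphi(q)} \sum_{\chi\in X_q} \sum_p \frac{\overline{\chi(a)}\chi(p)}{p^z} = \frac{1}{\varphi(q)} \Big( L(\chi_1, z) + \sum_{\chi\ne\chi_1} \sum_p \frac{\overline{\chi(a)}\chi(p)}{p^z} \Big) . \]
c) Use Proposition 12.10 (ii) and (13.9) to show that $\lim_{z\searrow 1^+} \frac{L(\chi_1, z)}{\zeta(z)} = 1$.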
d) Show that (a), (b), and (c) imply Dirichlet’s theorem.
where S(x) = card (S ∩ [1, x]) and T (x) = card (T ∩ [1, x]). The Dirichlet
density of a set S ⊆ T relative to T is
\[ \lim_{z\searrow 1^+} \frac{\sum_{n\in S} n^{-z}}{\sum_{n\in T} n^{-z}} . \]
The relation between natural density and Dirichlet density (Definition 13.29)
is somewhat subtle. If the natural density exists then so does the Dirichlet
density, but not vice versa. To establish the former, we prove Lemma 13.30
below in exercise 13.19. The other direction of this statement is not so easy;
it is established by way of a counter-example in exercises 13.20 and 13.21.
Lemma 13.30. Let A and B be non-empty subsets of N and let $a_n$ and $b_n$ be
their indicator functions. That is: $a_n$ equals 1 if n ∈ A and 0 elsewhere, and
similarly for $b_n$. Furthermore, $A(x) = \sum_{n\le x} a_n$ and similarly for B(x). Now we
have for Re z > 1:
\[ \lim_{x\to\infty} \frac{A(x)}{B(x)} = \mu \implies \lim_{z\searrow 1^+} \frac{\sum_{n=1}^{\infty} a_n n^{-z}}{\sum_{n=1}^{\infty} b_n n^{-z}} = \mu . \]
Figure 65. The set S consists of the natural numbers contained in
intervals of the form $[2^{2n-1}, 2^{2n})$, shaded in the top figure. The bottom
picture is the same but with a logarithmic horizontal scale.
Exercise 13.20. We show that the set S depicted in top of Figure 65 does
not have a natural density (relative to N), but that it does have a logarithmic
density.
a) Show that the limsup of the natural density is at least 5/8 while the
liminf of the density is at most 3/8. (Hint: first, take the average up to the
green points in the figure, and then up to the blue points)
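b) Use Figure 66 to show that
\[ \sum_{j=0}^{2^m-1} \frac{1}{2^m+j} = \sum_{j=0}^{2^m-1} \frac{2^{-m}}{1+j\,2^{-m}} = \int_0^1 \frac{1}{1+x}\,dx + r_m = \ln 2 + r_m , \]
[Figure 66: the graph of a decreasing function f on [0, 1], from f(0) down to f(1), with rectangles of width dx. Surviving caption fragment: $\sum_{j=0}^{k-1} f(j\,dx)\,dx$ if f is strictly decreasing.]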
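Both parts of exercise 13.20 lend themselves to a quick experiment. The following sketch assumes, as in the caption of Figure 65, that S is the union of the intervals [2^(2n−1), 2^(2n)); the function names are hypothetical.

```python
import math

def count_S(x):
    """Number of integers n <= x that lie in some [2^(2k-1), 2^(2k))."""
    total, k = 0, 1
    while 2 ** (2 * k - 1) <= x:
        lo, hi = 2 ** (2 * k - 1), 2 ** (2 * k) - 1
        total += min(x, hi) - lo + 1
        k += 1
    return total

def natural_density(x):
    """S(x) / x, the proportion of integers up to x that belong to S."""
    return count_S(x) / x

def block_sum(m):
    """Sum of 1 / (2^m + j) for j = 0, ..., 2^m - 1."""
    return sum(1 / (2 ** m + j) for j in range(2 ** m))

if __name__ == "__main__":
    for e in range(10, 21):
        print(e, natural_density(2 ** e - 1))
    print(block_sum(10), math.log(2))
```

The proportions keep oscillating as x runs through powers of 2, while the sum over a single block stays close to ln 2, which is the mechanism behind the existence of the logarithmic density.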
\[ + \cdots = \sum_{n\in S} n^{-z} . \]
(Hint: $n_1^{-1}$ gets multiplied by $(n^{1-z} - (n+1)^{1-z})$ for $n \ge n_1$, $n_2^{-1}$ by
$(n^{1-z} - (n+1)^{1-z})$ for $n \ge n_2$, and so on. The sums as given telescope
to $n_1^{-1}(n_1^{1-z} - n_2^{1-z})$, $(n_1^{-1} + n_2^{-1})(n_2^{1-z} - n_3^{1-z})$, and so forth.)
b) Show that if the logarithmic density of S (with respect to N) equals µ,
then, by (a), we have
\[ \sum_{n\in S} n^{-z} = \sum_{n=1}^{\infty} \Big( \sum_{k\in S,\,k\le n} k^{-1} \Big) \big( n^{1-z} - (n+1)^{1-z} \big) = \sum_{n=1}^{\infty} \Big( \mu \sum_{k\le n} k^{-1} \Big) \big( n^{1-z} - (n+1)^{1-z} \big) = \mu \sum_{n\in\mathbb{N}} n^{-z} . \]
c) Use (b) to demonstrate the statement heading this exercise.
To emphasize once again the similarity between our generalized zeta func-
tions and ζ of Chapter 12, we show that Zq,a has no zeroes in Re z > 1. The
proof can be copied from exercise 4.23, provided you make the requisite
substitutions.
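Definition 13.32. The function $M_{q,a} : \mathbb{N} \to \mathbb{Z}$ is given by:
\[ M_{q,a}(n) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } \exists p \text{ prime with } p \ne_q a \text{ and } p \mid n \\ 0 & \text{if } \exists p \text{ prime with } p^2 \mid n \\ (-1)^r & \text{if } n = p_1 \cdots p_r \text{ and } p_i =_q a \end{cases} \]
This is the counterpart of the Möbius function of Definition 4.6.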
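A direct implementation may help in getting used to this definition. The sketch below factors n by trial division; the function name `M` and its argument order are hypothetical.

```python
def M(q, a, n):
    """The function M_{q,a}(n) of Definition 13.32, for n >= 1."""
    if n == 1:
        return 1
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:                 # d is a prime factor of n
            n //= d
            if n % d == 0:             # d^2 divides the original n
                return 0
            if d % q != a % q:         # a prime factor outside the class of a
                return 0
            r += 1
        d += 1
    if n > 1:                          # one remaining prime factor
        if n % q != a % q:
            return 0
        r += 1
    return (-1) ** r

if __name__ == "__main__":
    print([M(4, 1, n) for n in range(1, 70)])
```

For q = 4 and a = 1, only products of distinct primes congruent to 1 mod 4 (such as 5, 13, 17, and 65 = 5 · 13) give a non-zero value.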
Exercise 13.25. a) Let $a_n = \frac{\ln p}{p}$ if n = p is a prime and 0 else, and set f(t) =
1/ln t. Now use Abel summation (Proposition 12.15) to show that
\[ \sum_{p\le x} \frac{1}{p} = \frac{1}{\ln x} \sum_{p\le x} \frac{\ln p}{p} + \int_2^x \frac{1}{t(\ln t)^2} \sum_{p\le t} \frac{\ln p}{p}\,dt . \]
b) Use exercise 13.24 (d) applied to the previous item to show that
\[ \sum_{p\le x} \frac{1}{p} = 1 + \frac{R(x)}{\ln x} + \int_2^x \frac{1}{t\ln t}\,dt + \int_2^x \frac{R(t)}{t(\ln t)^2}\,dt \]
c) Conclude that
\[ \sum_{p\le x} \frac{1}{p} = \ln\ln x + o(\ln\ln x) . \]
d) Compare (a) with exercise 13.18(d).
e) To appreciate how agonizingly slow the approach of ln ln x to infinity is,
approximate $\ln\ln 10^{10^{10}}$. (Hint: about 24.)
f) To write that number, $10^{10^{10}}$, in full decimal notation in a series
of books, how many books would you fill? Assume that you write 2000
characters on a page and that 500 pages make one book.
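The slow growth in parts (c) and (e) can be made tangible numerically. This is a sketch with hypothetical function names; the second function uses the identity ln ln 10^(10^10) = 10 ln 10 + ln ln 10, so that the huge number itself is never formed.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def prime_reciprocal_sum(x):
    """Sum of 1/p over the primes p <= x."""
    return sum(1 / p for p in primes_up_to(x))

def loglog_of_tower():
    """ln ln (10^(10^10)) = ln(10^10 * ln 10) = 10 ln 10 + ln ln 10."""
    return 10 * math.log(10) + math.log(math.log(10))

if __name__ == "__main__":
    for x in (10 ** 2, 10 ** 4, 10 ** 6):
        print(x, prime_reciprocal_sum(x), math.log(math.log(x)))
    print(loglog_of_tower())
```

In this range the difference between the two printed columns stays roughly constant, consistent with the error term in (c) being small compared to ln ln x.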
Chapter 14
290 14. The Birkhoff Ergodic Theorem
Remark 14.2. Since $(\cup_{i\in\mathbb{N}} A_i)^c = \cap_{i\in\mathbb{N}} A_i^c$, we see that a σ-algebra is also
closed under countable intersection.
Proof. Set $h_\pm$ equal to $\sup_n f_n(x)$ and $\inf_n f_n(x)$, respectively. Then
\[ h_+^{-1}((x, \infty)) = \cup_{n=1}^{\infty} f_n^{-1}((x, \infty)) , \]
which proves the first case. The proof for $h_-$ is the same, except that the union
must be replaced by an intersection.
14.2. Dominated Convergence 291
Set $g_\pm$ equal to $\limsup_n f_n(x)$ and $\liminf_n f_n(x)$, respectively. Since
\[ g_+(x) = \lim_{n\to\infty} \sup_{i\ge n} f_i(x) \]
and $\sup_{i\ge n} f_i(x)$ is non-increasing (in n), we can replace the above limit by
the infimum, and use the above results for supremum and infimum to get
\[ g_+^{-1}((x, \infty)) = \cap_{n\ge 1} \cup_{i\ge n} f_i^{-1}((x, \infty)) . \]
Proof. Let
\[ A_{m,n} := \Big\{ x \in X : \forall i \ge m ,\ |f_i(x) - f(x)| < \frac{1}{n} \Big\} . \]
We have $A_{m,n} \subseteq A_{m+1,n}$ and $\cup_m A_{m,n}$ covers all of X, except for a measure
zero set $Z_n$ (see Figure 68). Thus we can choose $m_n$ such that
\[ \mu(X \setminus A_{m_n,n}) < \frac{\varepsilon}{2^n} . \qquad (14.1) \]
Footnote 1: a space with µ(X) < ∞.
Footnote 2: Pointwise convergence means that for x fixed $\lim_{i\to\infty} f_i(x) = f(x)$.
[Figure 68: the functions $f_1$, $f_2$, $f_3$ on [0, 1], a band of height 1/n, and the sets $A_{1,n}$, $A_{2,n}$, $A_{3,n}$.]
For any x in the intersection of all $A_{m_n,n}$, we have that for $i \ge m_n$, $|f_i(x) - f(x)| <
1/n$. And thus on $U := \cap_{n\ge 1} A_{m_n,n}$, we have uniform convergence. Within
X, we have $(\cap_{n\ge 1} A_{m_n,n})^c = \cup_{n\ge 1} A_{m_n,n}^c$ (see Figure 69 and exercise 14.1),
where the superscript indicates complement. So
\[ X \setminus U = \cup_{n\ge 1} (X \setminus A_{m_n,n}) , \]
and so, by equation (14.1) and subadditivity (9.1), µ(X\U) < ε.
[Figure 69: two sets $A_1$ and $A_2$ inside X.]
Next we prove that integrable functions nearly live on sets of finite
measure and that integrals over small sets are small.
Lemma 14.8. Suppose g : X → [0, ∞] is measurable and integrable. Then:
i) for every ε > 0 there is a set F of finite measure such that $\int_{X\setminus F} g\,d\mu < \varepsilon$.
ii) for all ε > 0, there is a δ > 0 such that for all small sets S with µ(S) < δ,
$\int_S g\,d\mu < 2\varepsilon$.
Figure 70. The definition of the Lebesgue integral. Let $\{y_i\}$ be a countable
partition of the range of f. We approximate $\int f\,d\mu$ from below by
$\sum_i \mu\big( f^{-1}(\{y : y \ge y_{i+1}\}) \big)\,(y_{i+1} - y_i)$. f is integrable if the limit
converges as the mesh of the partition goes to zero. The function y in the
proof of Lemma 14.8 (ii) is indicated in red. (Here µ is the Lebesgue
measure.)
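The sum in the caption of Figure 70 can be tried out on a simple example. The sketch below takes, as an illustrative choice, f(x) = x^2 on [0, 1] with Lebesgue measure, for which the measure of the set where f ≥ y equals 1 − √y; the partition of the range is uniform and the function names are hypothetical.

```python
import math

def lebesgue_lower_sum(superlevel_measure, y_max, n):
    """Sum of mu({f >= y_(i+1)}) * (y_(i+1) - y_i) over a uniform partition of [0, y_max]."""
    h = y_max / n
    return sum(superlevel_measure((i + 1) * h) * h for i in range(n))

def superlevel_x_squared(y):
    """Lebesgue measure of {x in [0, 1] : x^2 >= y}."""
    y = min(max(y, 0.0), 1.0)
    return 1.0 - math.sqrt(y)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, lebesgue_lower_sum(superlevel_x_squared, 1.0, n))
```

As the mesh 1/n shrinks, the sums increase towards 1/3, the value of the integral of x^2 over [0, 1], and they stay below it, as the caption says.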
Upon choosing η small enough, the result follows because µ(U) < ∞.
In the infinite measure case, we need to do one extra step. Use Lemma
14.8 (i) to first find a set F of finite measure so that
\[ \Big| \int (f_k - f)\,d\mu \Big| \le 2\int_{X\setminus F} g\,d\mu + \Big| \int_F (f_k - f)\,d\mu \Big| , \]
where the first integral on the right-hand side can be made smaller than any
ε > 0. The second integral can now be estimated in exactly the same way as
before.
Remark 14.10. While we proved the theorem here for real valued func-
tions, it also holds for complex valued functions. One simply proves the
result for the real and imaginary parts separately.
14.3. Littlewood’s Three Principles 295
One must be careful in the interpretation of this last result: it does not
mean that the points of R\S are points of continuity of f. As an example,
consider the function that is 1 on the rational numbers and 0 everywhere
else. As a function R → R, it is nowhere continuous, but its restriction to
the irrational numbers is continuous. Luzin's theorem still goes a little
further, and asserts that we can contain the rationals in an open set of arbitrarily
small measure (exercise 14.6).
If there is an ergodic map T so that $T(x_i) = x_{i+1}$ and that preserves the
Lebesgue measure, then of course, item (ii) follows from Corollary 9.10,
which says that time averages equal space averages. The standard example
of this is $T(x_k) = x_k + \rho$ where ρ is irrational, as we discussed at length
in Chapter 10. However, it is still amusing to give a very simple and direct
proof of this based on Weyl's criterion.
Indeed, it requires no more than summing a geometric series to
see that
\[ \frac{1}{n} \sum_{k=0}^{n-1} e^{2\pi i m x_k} = \frac{e^{2\pi i m x_0}}{n} \sum_{k=0}^{n-1} e^{2\pi i m k \rho} = \frac{e^{2\pi i m x_0}}{n} \cdot \frac{e^{2\pi i m n \rho} - 1}{e^{2\pi i m \rho} - 1} . \]
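The geometric series estimate can be observed directly. In the sketch below, ρ = √2 and x_0 = 0.1 are illustrative choices and the function names are hypothetical; the bound is 2/(n |e^{2πimρ} − 1|), which follows from the displayed formula because the numerator has modulus at most 2.

```python
import cmath
import math

def weyl_average(m, rho, x0, n):
    """(1/n) times the sum of exp(2 pi i m x_k) for x_k = x0 + k rho, k < n."""
    return sum(cmath.exp(2j * math.pi * m * (x0 + k * rho)) for k in range(n)) / n

def geometric_bound(m, rho, n):
    """Upper bound 2 / (n |exp(2 pi i m rho) - 1|) for the modulus of the average."""
    return 2 / (n * abs(cmath.exp(2j * math.pi * m * rho) - 1))

if __name__ == "__main__":
    rho = math.sqrt(2)
    for m in (1, 2, 3):
        for n in (10, 1000, 100_000):
            print(m, n, abs(weyl_average(m, rho, 0.1, n)), geometric_bound(m, rho, n))
```

For fixed m ≠ 0 the averages decay like 1/n, which is what Weyl's criterion requires.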
Proof. Note that this statement holds for f with “≥” if and only if it holds
for g = − f with “≤”. So it is sufficient to prove only the ≥ version.
14.5. Proof of Birkhoff’s Ergodic Theorem 299
Figure 71. A plot of $S_f^n(x_0)$ for some fixed $x_0$ for n ∈ {0, ..., N}.
First assume that n(x) is bounded (for almost all x) by some k > 0.
Then no matter how large we take N, there is some p(x) in {N − k, ..., N}
such that $S_f^{p(x)}(x) \ge 0$ (see Figure 71). We then have for µ-almost all $x_0$
\[ S_f^N(x_0) = S_f^{p(x_0)}(x_0) + S_f^{N-p(x_0)}(x_p) \ge -S_{|f|}^{N-p(x_0)}(x_p) \ge -S_{|f|}^{k}(x_{N-k}) . \]
Therefore for µ-almost all x
\[ \sum_{i=1}^{N} f(T^i(x)) \ge -\sum_{i=N-k+1}^{N} |f(T^i(x))| . \]
Bearing in mind that µ is invariant, we integrate this inequality. So by
Lemma 10.1, $\int f(T^i(x))\,d\mu = \int f(x)\,d\mu(x)$ and similarly for |f|. In this
way we obtain, after integrating, that $N\int f\,d\mu \ge -k\int |f|\,d\mu$. But since
we may take N arbitrarily large, it follows that $\int f\,d\mu \ge 0$.
Let
\[ f_k(x) = \begin{cases} f(x) & \text{if } n(x) \le k \\ 0 & \text{else} \end{cases} \]
We have $|f_k| \le |f|$ and so the $f_k$ are dominated by |f| and since f is
µ-integrable, so are the $f_k$. Since the $f_k$ converge pointwise to f, we have
\[ \int f\,d\mu = \lim_{k\to\infty} \int f_k\,d\mu \ge 0 , \]
by dominated convergence (Theorem 14.9).
By Lemma 14.5 and the comments immediately prior to it, $\langle f\rangle_\pm$ are
measurable functions. First suppose they are bounded. Then they are also
integrable, because µ(X) = 1.
Suppose that the following statement is false:
\[ \int \langle f\rangle_-\,d\mu \ge \int f\,d\mu . \]
Putting (14.5) and (14.6) together shows that if $\langle f\rangle_\pm$ are bounded, then the
average has the desired properties.
Now we drop the hypothesis that $\langle f\rangle_\pm$ are finite. So let
\[ X_n := \{ x \in X : -n \le \langle f\rangle_-(x) \le \langle f\rangle_+(x) \le n \} . \]
T maps $X_n$ to itself and so all hypotheses hold and therefore the above
conclusion holds for all $X_n$, and thus for $X_\infty = \cup_n X_n$. We are done if $X\setminus X_\infty$ has
µ-measure zero. Now $X_n$ is measurable because $\langle f\rangle_\pm$ are, and so $X_\infty$ and
its complement are also measurable. Suppose the complement has positive
measure, then since f is integrable, there must be a c > 0 so that
\[ \int_{X\setminus X_\infty} (c - f)\,d\mu > 0 . \]
We apply again the contrapositive of the maximal ergodic theorem, to get
that there must be a (positive measure of) x in $X\setminus X_\infty$ so that for all n
\[ S_{(c-f)}^{n}(x) > 0 \implies nc - S_f^n(x) > 0 . \]
But this contradicts the definition of $X\setminus X_\infty$.
Recall that Corollary 9.10, which in fact says that space averages equal
time averages, follows fairly easily from this theorem. Frequently, it is
that corollary which one has in mind when referring to Birkhoff's ergodic
theorem. We repeat that statement here for convenience. Its proof is in
Chapter 9.
Corollary 14.18. A transformation T : X → X that preserves a probability
measure µ has the property that every T invariant set has measure 0 or 1 if
and only if for every integrable function f
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) = \int_X f(x)\,d\mu \]
14.6. Exercises
Exercise 14.1. Let $A_n$ be sets in a space X, and I any (possibly uncountable)
index set.
a) Show that $(\cap_{n\in I} A_n)^c = \cup_{n\in I} A_n^c$.
b) Show that $(\cup_{n\in I} A_n)^c = \cap_{n\in I} A_n^c$.
(Note: these two statements are known as the De Morgan laws.)
Exercise 14.2. a) Show that $g_n(x) = \sup_{i\ge n} f_i(x)$ is non-increasing (in n).
b) Let $f_n(x) = \sin nx$. Determine $\limsup_n f_n(1)$. (Hint: use Lemma 10.6.)
c) Show that the twin prime conjecture (Conjecture 1.28) is equivalent to
$\liminf_n\,(p_{n+1} - p_n) = 2$.
Exercise 14.5. Explain why Henri Lebesgue wrote the following about his
method of integration (as cited by [23][ page 796]):
“I have to pay a certain sum, which I have collected in my pocket. I take
the bills and coins out of my pocket and give them to the creditor in the
order I find them until I have reached the total sum. This is the Riemann
integral. But I can proceed differently. After I have taken all the money
out of my pocket I order the bills and coins according to identical values
and then I pay the several heaps one after the other to the creditor. This is
my integral.”
Exercise 14.6. a) Show that the rational numbers in the unit interval can
be contained in an open set of arbitrarily small measure. (Hint: for
some λ > 1, put the number p/q in an open interval of length
$C\varphi(q)^{-1}\lambda^{-q}$, where ϕ is the totient function.)
b) Use (a) to show that the rational numbers in R can be contained in an
open set of arbitrarily small measure. (Hint: in each unit interval, choose
an appropriate C as defined in (a).)
Figure 72. The function $f_n$ (in red) in exercise 14.7 is a sum of very
thin triangles with height 1. Each triangle is given by $h_n(j, k, x)$ (in
black).
Exercise 14.8. Let $f_n(x) = 1/n$ for x ∈ [0, 1/n] and 0 elsewhere and set
g(x) = 1/x.
a) Show that g dominates the $f_k$.
b) Show that $\lim_{k\to\infty} \int f_k\,d\mu \ne \int \lim_{k\to\infty} f_k\,d\mu$.
c) Why do (a) and (b) not contradict Theorem 14.9?
Exercises 14.9 and 14.10 provide an interesting illustration of the dominated
convergence theorem. Generalizing exercise 11.19, for fixed r ≥ 1, consider
the functions $g_k(x) = k^r x^k (1-x)$ on [0, 1]. Define $G_k(x) = \sup_{i\le k} g_i(x)$ and
$G(x) = \sup_i g_i(x)$.
Exercise 14.9. a) Show that $g_k(x)$ is increasing on $[0, \frac{k}{k+1}]$ and decreasing
on $[\frac{k}{k+1}, 1]$.
b) Show that $g_k$ has maximum $k^{r-1} \big( \frac{k}{k+1} \big)^{k+1} \approx k^{r-1} e^{-1}$. (Hint:
$\lim_{k\to\infty} (1 - 1/k)^k = e^{-1}$.)
c) Show that gk−1 (x) = gk (x) iff x ∈ 0, k−1 k , 1 and that
r 2k
gk k−1
k = kr−1 k−1
k (1 − 1k ) ≈ kr−1 e−2 .
n r o
k
d) Show that gk (x) = gk+1 (x) iff x ∈ 0, k+1 , 1 and that
r 2k+2
gk k
k+1
k
= kr−1 k+1 (2 + 1k ) ≈ (k + 1)r−1 e−2 .
d) Show that $\big( \frac{k}{k+1} \big)^r - \big( \frac{k-1}{k} \big)^r \approx r k^{-2}$. (Hint: compute the first term in
the expansion of $(1+x)^{-r} - (1-x)^r$.)
e) Show that $\int G(x)\,dx$ is “sandwiched” between the sum of the areas of
the rectangles like the one shaded red in Figure 73 and the sum of the red
plus the green ones.
f) Conclude that G is integrable iff r < 2.
Exercise 14.10. a) Use exercise 14.9 (f) to show that the dominated
convergence theorem implies that for r < 2, we have
\[ \int_0^1 \lim_{k\to\infty} g_k(x)\,dx = \lim_{k\to\infty} \int_0^1 g_k(x)\,dx . \]
b) What goes wrong for r ≥ 2?
c) Show that
\[ \int_0^1 \lim_{k\to\infty} g_k(x)\,dx = 0 \quad\text{and}\quad \int_0^1 g_k(x)\,dx = \frac{k^r}{(k+1)(k+2)} . \]
d) Why is (c) consistent with (a) and (b)?
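The closed form in part (c) can be compared with a numerical integral, and its behaviour in k can be inspected for different r. This is a sketch under stated assumptions: Simpson's rule is an arbitrary choice of quadrature and the function names are hypothetical.

```python
def g(k, r, x):
    """g_k(x) = k^r x^k (1 - x)."""
    return k ** r * x ** k * (1 - x)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3

def closed_form(k, r):
    """k^r / ((k + 1)(k + 2)), the value of the integral of g_k over [0, 1]."""
    return k ** r / ((k + 1) * (k + 2))

if __name__ == "__main__":
    for r in (1.5, 2.0, 2.5):
        print(r, [closed_form(k, r) for k in (10, 1000, 100_000)])
    print(simpson(lambda x: g(20, 2.0, x), 0.0, 1.0), closed_form(20, 2.0))
```

The integrals tend to 0 for r < 2, to 1 for r = 2, and to infinity for r > 2, while the pointwise limit of the g_k is 0 in every case.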
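[Figure 73: the graphs of $g_{k-1}$, $g_k$, and $g_{k+1}$ on [0, 1], with the points $((k-1)/k)^2$ and $(k/(k+1))^2$ marked on the horizontal axis and the heights $k e^{-2}$ and $(k+1) e^{-2}$ on the vertical axis.]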
Many number theory textbooks state (correctly) that the fractional parts of
$f(n) = \ln p_n$ are not equidistributed. This is slightly misleading, because an
unsuspecting student could be tempted into wondering to what mysterious
distribution the fractional parts of $\ln p_n$ converge. The answer, perhaps
somewhat disappointingly, is that the logarithm increases so slowly that in
fact those numbers do not converge at all, as we show in exercises 14.12 and
14.13. We denote the fractional part of x by {x}. For a slowly increasing
function f : N → R and an interval I ⊂ [0, 1],
we define the “hitting frequency” as follows:
\[ F(0, n) := \frac{\#\{ \{f(i)\} \in I \text{ for } i \in \{1, \dots, n\} \}}{n} . \]
Note that if the fractional parts {f(n)} converge to any distribution
whatsoever, then there is a c ∈ [0, 1] so that $\lim_{n\to\infty} F(0, n) = c$.
Exercise 14.12. In this exercise, we set f(n) := ln n and let J = [α, α +
δ) ⊂ [0, 1] be an arbitrary interval. For K ∈ N, choose $n_K$ and $n_K'$ so that
\[ f(n_K) \le K + \alpha < f(n_K + 1) \quad\text{and}\quad f(n_K') \le K + \alpha + \delta < f(n_K' + 1) . \]
a) Show that
\[ \lim_{K\to\infty} \frac{n_K'}{n_K} = e^{\delta} . \]
b) Show that (see Figure 74)
\[ n_K'\,F(0, n_K') \approx n_K \cdot c + (n_K' - n_K) \cdot 1 . \]
c) Show that
\[ \lim_{K\to\infty} F(0, n_K') - F(0, n_K) = (1-c)(1-e^{-\delta}) . \]
d) Conclude that the fractional parts of f (n) = ln n do not converge to any
distribution.
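The oscillation of the hitting frequency is easy to observe. The sketch below uses the illustrative choices α = 0 and δ = 1/2, and evaluates the frequency at the integer parts of e^K and e^(K + 1/2), which play the roles of n_K and n_K'; the function name is hypothetical.

```python
import math

def hitting_frequency(alpha, delta, n):
    """Fraction of i in {1, ..., n} whose fractional part of ln i lies in [alpha, alpha + delta)."""
    hits = sum(1 for i in range(1, n + 1)
               if alpha <= math.log(i) % 1.0 < alpha + delta)
    return hits / n

if __name__ == "__main__":
    for K in range(6, 12):
        n_K = math.floor(math.exp(K))
        n_K_prime = math.floor(math.exp(K + 0.5))
        print(K, hitting_frequency(0.0, 0.5, n_K),
              hitting_frequency(0.0, 0.5, n_K_prime))
```

The two columns settle near two different values, so the frequencies along all n cannot converge.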
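[Figure 74: the graph of f(x), with the levels K + J and K + 1 on the vertical axis and the points $n_K$ and $n_K'$ on the horizontal axis.]
Figure 75. A few branches of the Lüroth map of exercise 14.16. The
names of the branches (“1” on [1/2, 1), “2” on [1/3, 1/2), “3”, and so on) are
as indicated in the figure.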
Exercise 14.17. This exercise relies on exercise 14.16 and Section 6.6. Let
$b_k(x) : I_k \to [0, 1)$ be the branch of $T^k$ such that $x \in I_k$, then the kth
convergent $[a_1, \dots, a_k]$ of x is the (unique) endpoint of $I_k$ that maps to zero under
$T^k$ (see Proposition 6.14). The branches of T are labeled as indicated in
Figure 75. For simplicity, we note (without proof) that the kth convergent
is always a rational number also denoted by $p_k/q_k$. The Lüroth expansion
of a number x ∈ [0, 1) is the list $[a_1, a_2, \dots]$ where $a_i$ is the label of the
branch in whose domain $T^i(x)$ is located. For more details, see [8].
a) Show that
\[ \Big| x - \frac{p_k}{q_k} \Big| < |I_k| , \]
where $|I_k|$ is the length of $I_k$.
b) Show that $T^k : I_k \to [0, 1)$ is an affine bijection.
c) Show that
\[ |I_{k+1}| < \Big| x - \frac{p_k}{q_k} \Big| < |I_k| , \]
(Hint: $b_k$ maps $I_k$ affinely onto [0, 1) (see Figure 76) and so the
sub-intervals of $I_k$ have the same proportions as the sub-intervals of the unit
interval in the Lüroth map of Figure 75.)
d) Use (b) to show that
\[ \ln \frac{1}{|I_k|} = \sum_{j=0}^{k-1} \ln DT(T^j(x)) . \]
(Hint: see exercise 10.22.)
e) Use (c) and (d) to show that
\[ \lim_{k\to\infty} \frac{1}{k} \ln \Big| x - \frac{p_k}{q_k} \Big| = -\lambda(x) , \]
where λ(x) is the Lyapunov exponent of T at x.
Figure 76. A few branches of the (k+1)st iterate of the Lüroth map T
restricted to the interval $I_k$. In red a branch of $T^k$ and in black a few
branches of $T^{k+1}$.
Exercise 14.19. Two measures ν and µ are said to be in the same measure
class if they have the same measure zero sets. Suppose we fix a measure
class and are given that there is an (unknown) ergodic measure in this
class.
a) Given a set S and its characteristic function $\chi_S$, show that $\mu(S) =
\int \chi_S\,d\mu$.
b) Use (a) and Corollary 14.18 to show that
\[ \mu(S) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \chi_S(T^i(x)) . \]
c) Show that this determines the measure µ.
d) Show that if there was another ergodic measure ρ, then it would live
entirely in the sets of µ-measure zero. (Hint: see Corollary 9.12.)
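The time average in part (b) is easy to experiment with. The sketch below is an illustration only, under assumptions that are not part of the exercise: it takes T to be the rotation of the circle R/Z by the golden mean (a standard example that is ergodic for Lebesgue measure), takes S = [0.2, 0.5), and uses f = χ_S. The names birkhoff_average, rotation, and chi_S are hypothetical.

```python
import math


def birkhoff_average(f, T, x, n):
    """Return (1/n) * sum_{i=0}^{n-1} f(T^i(x))."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n


ALPHA = (math.sqrt(5) - 1) / 2   # irrational rotation number (assumed example)


def rotation(x):
    """Rotation of the circle R/Z by ALPHA."""
    return (x + ALPHA) % 1.0


def chi_S(x):
    """Characteristic function of S = [0.2, 0.5), of Lebesgue measure 0.3."""
    return 1.0 if 0.2 <= x < 0.5 else 0.0


estimate = birkhoff_average(chi_S, rotation, 0.0, 10_000)
print(estimate)                  # close to 0.3
```

The average depends only on the orbit of x and on S, which is the point of part (c): the measure of every set can be read off from time averages.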
Sometimes the definition of ergodicity (Definition 9.9) is replaced by the
apparently stronger one given below. In exercise 14.20, we show that these
are in fact equivalent.
Definition 14.20. A transformation T of a measure space X to itself is
called weakly ergodic (with respect to µ) if it preserves the measure µ and
Exercise 14.20. a) Show that weakly ergodic implies ergodic. (Hint: this
is trivial.)
b) Now assume that T is ergodic with respect to the measure µ, and let S_0
be a weakly invariant set of positive measure. Show that S = ∩_{i=0}^∞ T^{-i}(S_0)
is invariant.
c) Set S_n = ∩_{i=0}^n T^{-i}(S_0) and ∆_n = S_n \ S_{n+1}. Show that µ(S_0) = µ(S) +
∑_{i=0}^∞ µ(∆_i). (Hint: use Definition 14.3.)
d) Show that if x ∈ ∆_n, then T^n x ∈ S_0 but T^n x ∉ T^{-1} S_0.
e) Use (c) and (d) to show that µ(∆_n) = 0 and thus µ(S_0) = µ(S).
f) Use ergodicity to show that S_0 has full measure.
Exercise 14.21. For this exercise, assume that the linear combinations of
the functions e^{2πinx} are dense in the set of integrable functions on the circle,
or L^1(R/Z).
a) Show that the Lebesgue measure is ergodic and measure preserving if
and only if for all m ≠ 0 in Z
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} e^{2\pi i m T^k(x)} = 0 . \]
(Hint: use the proof of Corollary 14.13.)
b) Show that T in (a) is ergodic if and only if {T^k(x)} is equidistributed.
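The averages of exponentials in part (a) can be computed directly. The sketch below is an illustration under an assumption not made in the exercise: it takes T to be the rotation of R/Z by √2, for which the sum is a geometric series and the averages decay like 1/n. The names weyl_average and rotation are hypothetical.

```python
import cmath
import math


def weyl_average(T, x, m, n):
    """Return (1/n) * sum_{k=0}^{n-1} exp(2*pi*i*m*T^k(x))."""
    total = 0j
    for _ in range(n):
        total += cmath.exp(2j * math.pi * m * x)
        x = T(x)
    return total / n


ALPHA = math.sqrt(2)             # irrational rotation number (assumed example)


def rotation(x):
    """Rotation of the circle R/Z by ALPHA."""
    return (x + ALPHA) % 1.0


for m in (1, 2, 3):
    # Each modulus should be small for large n.
    print(m, abs(weyl_average(rotation, 0.1, m, 10_000)))
```

By contrast, for the rational rotation x ↦ x + 1/2 and m = 2, every term of the sum is the same complex number of modulus 1, so the average does not tend to 0.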
We saw in Section 9.4 that a given transformation T may have uncountably
many coexisting invariant measures. The Krylov-Bogoliubov theorem (see
[31]) states that a continuous map T from a compact metric space to itself
has at least one invariant probability measure. Exercise 14.22 gives a
counterexample if we drop continuity.
Bibliography
[1] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Mathematics of
Computation 48 (1987), 1003–1026.
[2] M. Aigner and G. M. Ziegler, Proofs from the book, 6th edition, Springer Verlag, Providence, RI, 2018.
[3] T. M. Apostol, Mathematical analysis, 2nd edn, Addison-Wesley, Philippines, 1974.
[4] T. M. Apostol, Introduction to analytic number theory, Springer Verlag, New York, 1989.
[5] V. I. Arnold, Geometrical methods in the theory of ordinary differential equations, Springer Verlag,
New York, 1983.
[6] S. Axler, Linear algebra done right, 3rd edition, Springer International Publishing, Cham, Switzerland,
2015.
[7] S. Axler, Measure, integration & real analysis, Springer Nature Switzerland, Cham, Switzerland, 2020.
[8] L. Barreira and G. Iommi, Frequency of digits in the Lüroth expansion, Journal of Number Theory
(2009), 1479–1490.
[9] C. M. Bender, D. C. Brody, and M. P. Müller, Hamiltonian for the zeros of the Riemann zeta function,
Phys. Rev. Lett. 118 (Mar 2017), 130–201.
[10] A. Berger and T. P. Hill, Benford’s law strikes back: No simple explanation in sight for mathematical
gem, Mathematical Intelligencer (2011), 85–91.
[11] P. Berrizbeitia and B. Iskra, Gaussian Mersenne and Eisenstein Mersenne primes, Mathematics of
Computation 79 (2010), 1779–1791.
[12] M. V. Berry and J. P. Keating, The Riemann zeros and eigenvalue asymptotics, SIAM Review 41 (1999),
236–266.
[13] I. Boreico, My favorite problem: Linear independence of radicals, The Harvard College Mathematics
Review (2007), 83–87.
[14] Y. Bugeaud, Distribution modulo one and Diophantine approximation, Cambridge University Press,
Cambridge, UK, 2012.
[15] D. M. Burton, Elementary number theory, 7th edition, McGraw-Hill, New York, NY, 2011.
[16] G. Cantor, Über eine elementare Frage der Mannigfaltigkeitslehre 1 (1890), 75–78.
[17] J. S. Caughman, 2018. Personal communication.
[18] D. A. Clark, A quadratic field which is Euclidean but not norm-Euclidean, Manuscripta Mathematica
83 (1994), 327–330.
[19] I. P. Cornfeld, S. V. Fomin, and Ya. S. Sinai, Ergodic theory, Springer-Verlag, New York, NY, 1982.
[20] J. Esmonde and M. Ram Murty, Problems in algebraic number theory, Springer, New York, NY, 1999.
[21] B. Fornberg and C. Piret, Complex variables and analytic functions, an illustrated introduction, SIAM,
Philadelphia, 2020.
[22] W. J. Gilbert, Modern algebra with applications, John Wiley & Sons, New York, 1976.
[23] T. Gowers, The Princeton companion to mathematics, Princeton University Press, Princeton, NJ, 2008.
[24] A. Granville, Number theory revealed: a masterclass, AMS, Providence, RI, 2019.
[25] H. S. Dumas, The KAM story: A friendly introduction to the content, history, and significance of classical
Kolmogorov-Arnold-Moser theory, World Scientific, Singapore, 2014.
[26] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, sixth edition, Oxford Uni-
versity Press, London, UK, 2008.
[27] Th. W. Hungerford, Algebra, Springer Verlag, New York, 1974.
[28] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An introduction to the theory of numbers, Wiley &
Sons, New York, NY, 1991.
[29] I. Kaplansky, Set theory and metric spaces, 2nd edition, AMS Chelsea Publishing, Providence, RI,
2001.
[30] D. Katahdin, 2018. Personal communication.
[31] A. Katok and B. Hasselblatt, Cambridge, UK.
[32] A. Ya. Khintchine, Continued fractions, P. Noordhoff, Ltd, Groningen, 1963.
[33] A. N. Kolmogorov and S. V. Fomin, Introductory real analysis, Dover, New York, NY, 1970.
[34] J. E. Littlewood, Lectures of the theory of functions, Oxford University Press, Oxford, UK, 1944.
[35] J. E. Marsden and M. J. Hoffman, Basic complex analysis, 3rd edn, W. H. Freeman, New York, NY,
1999.
[36] Wolfram MathWorld, Mertens constant. Available online at: https://mathworld.wolfram.com/MertensConstant.html.
[37] F. Mertens, Ein Beitrag zur analytischen Zahlentheorie, J. reine angew. Math. 78 (1874), 46–62.
[38] J. W. Milnor, Dynamics: Introductory lectures, University of Stony Brook, 2001.
[39] C. M. Moore, Ergodic theorem, ergodic theory, and statistical mechanics, PNAS 112 (2015), 1907–
1911.
[40] J. R. Munkres, Topology, 2nd edition, Prentice-Hall, Hoboken, NJ, 2000.
[41] D. J. Newman, Simple analytic proof of the prime number theorem, The American Mathematical
Monthly 97 (1980), 693–696.
[42] P. M. Fitzpatrick, Advanced calculus: a course in mathematical analysis, PWS Publishing Company,
Boston, 1996.
[43] C. C. Pinter, A book of abstract algebra, 2nd edition, Dover, New York, 1990.
[44] C. Pomerance, J. L. Selfridge, and S. S. Wagstaff, The pseudoprimes to 25 · 10^9, Mathematics of
Computation 35 (1980), 1003–1026.
[45] C. C. Pugh, Real mathematical analysis, 2nd edn, Springer, Cham, Switzerland, 2015.
[46] B. Riemann, Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse, Monatsberichte der
Berliner Akademie (1859).
[47] S. Roman, An introduction to discrete mathematics, Harcourt Brace Jovanovich, Orlando, FL, 1989.
[48] W. Rudin, Real and complex analysis, 3rd edn, McGraw-Hill International, New York, NY, 1987.
[49] J. H. Shapiro, 2020. Informal Lecture Notes.
[50] C. L. Siegel, Algebraische Abhängigkeit von Wurzeln (German), Acta Arith. 21 (1972), 59–64.
[51] I. Soprounov, A short proof of the prime number theorem for arithmetic progressions, Journal of Num-
ber Theory (2010).
[52] H. M. Stark, On the gap in the theorem of Heegner, Journal of Number Theory 1 (1) (1969), 16–27.
[53] S. Sternberg, Dynamical systems, revised in 2013, Dover Publications, United States, 2013.
[54] I. Stewart and D. Tall, Algebraic number theory and Fermat’s last theorem, third edition, A K Peters,
Natick, MA, 2002.
[55] J. Stillwell, Mathematics and its history, third edition, Springer, New York, NY, 2010.
[56] S. Sutherland, V’ir Tbg n Frperg. Available online at: https://www.math.sunsysb.edu/~scott/papers/MSTP/crypto.pdf.
[57] J. J. P. Veerman, Symbolic dynamics of order-preserving sets, Physica D 29 (1986), 191–201.
[58] J. J. P. Veerman, Symbolic dynamics and rotation numbers, Physica A 134 (1987), 543–576.
[59] J. J. P. Veerman, The dynamics of well-ordered orbits, Autonomous University of Barcelona, Barcelona, SP,
1995.
[60] R. A. Wilson, An example of a PID which is not a Euclidean domain. Robert A. Wilson’s website,
accessed in December 2021.
[61] D. Zagier, Newman’s short proof of the prime number theorem, The American Mathematical Monthly
104 (1997), 705–708.
Index
B_ε(x), 199
F((x)), 128
F/E, 135
I(n), 66
L-function, 271
L-series, 271
L(χ, z), 270
M_{q,a}(n), 285
R(x), 128
R[[x]], 128
R[x], 128
Z_{q,a}(z), 273
[F : E], 136
C[x], 8
F_p, 89
Λ(n), 255
Ω(n), 62
Φ(z), 240
Φ_{q,a}(z), 273
Π_{q,a}(x), 273
Q[x], 8
R[x], 8
Θ_{q,a}(z), 273
Z(γ), 138
Z[x], 8
χ_1, 270
ε(n), 62, 66
γ, 251
ind_p(x), 279
λ(n), 75
⌈θ⌉, 22
⌊θ⌋, 22
µ(n), 61
ω(n), 62, 75
ϕ(n), 64
π(x), 31
ψ(x), 256
Res_m(a), 5
σ-algebra, 289
σ_k(n), 60
τ(n), 61
Frac(R), 167
θ(x), 240
ζ(z), 30
{1, 2, · · · b − 1}^N, 12
{θ}, 22
a | b, 4
a^k ∥ b, 242
deg(f), 8
1(n), 66
Abel summation, 250
Abelian group, 87
absolute convergence, 218
absolutely continuous measure, 187
absolutely normal, 192
absorbs products, 128
additive function, 62
additive order, 78
affine cipher, 94
zeta function, 30