Math 124 Notes Fall 2023
Contents
0 Preface
0 Preface
This class is from the Fall 2023 semester. Meeting times are Monday and Friday from
12-1:15pm in SC221. The main textbook is Ireland-Rosen’s A Classical Introduction
to Modern Number Theory.¹ There are no prerequisites to this course, so “if you don’t
understand something, it’s [Kisin’s] fault, not yours.” Bottom line: don’t be afraid to
ask questions!
Problem sets will be assigned approximately weekly. Kisin’s office hours will be 2-
3pm on Wednesdays in SC232. Office hours and section times for the course assistants
can be found on Canvas. There will be both a midterm and final exam, which will be
“absolutely routine if you do all of the homework.” (The problems will be taken from
the homework.)
If you see anything wrong or unclear, let me know at [email protected]!
The topics we will cover include:
• unique factorization,
• congruences,
• Quadratic Reciprocity,
• Diophantine equations.
The magic of number theory is that very sophisticated tools and results can be
developed from the most simple techniques. This means that the beginning of the
course may feel slow, perhaps even underwhelming, but it comes together very nicely
(and quickly!) in the latter half of the semester. So hold your horses.
1.1 Notation
Always good to establish from the get-go.
• N+ = {1, 2, 3, . . . }
¹Kisin believes this textbook should be titled “An Elementary Introduction to Classical Number Theory”; I share this sentiment.
As mentioned already, primes are important because every integer factors into primes:
We’re all familiar with this theorem, perhaps we use it every day. But just how
“obvious” is this theorem? Is it immediate from the definition?
The theorem statement seems to incorporate two parts to it. First, we need to show
that a prime factorization actually exists in the first place, and then assert that it is
unique. This distinction is important: there are some worlds where existence holds,
but the factorization is not unique! (For those who have taken Math 122, this may be
familiar. Otherwise, an example I like to give is that you can write 6 = 2 · 3, but if you allow for terms with √−5 then suddenly you can also write 6 = (1 + √−5)(1 − √−5). So the factorization is not unique.)
Lemma 1.4
Every n ∈ N+ is a product of primes.
So the subtle thing we are using here that makes this work is the very convenient fact that the integers have a well-ordering, i.e. given any two integers a, b, we can compare their sizes. This enters through the fact that the two divisors a, b are both less than n, which allows us to activate the inductive hypothesis.
Again, this may not seem the most enlightening stuff now, but when we grow up from N+ to some other set of numbers and try to prove similar statements, the work we’ve done here will come in very handy.
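The existence proof is effectively an algorithm: find any nontrivial divisor, split, and recurse, with the well-ordering of N+ guaranteeing termination. A minimal Python sketch (the function name `factor` is mine, not from the course):

```python
def factor(n: int) -> list[int]:
    """Return a sorted list of primes whose product is n, mirroring the induction."""
    assert n >= 1
    if n == 1:
        return []  # the empty product
    a = 2
    while a * a <= n:
        if n % a == 0:
            # n = a * (n // a) with both factors smaller than n:
            # this is exactly where the inductive hypothesis kicks in.
            return [a] + factor(n // a)
        a += 1
    return [n]  # no divisor up to sqrt(n), so n is itself prime

print(factor(60))  # → [2, 2, 3, 5]
```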
Now, we want to prove that the factorization is unique, up to ordering. For instance,
we can factor 36 = 6 · 6 = 2 · 3 · 2 · 3, or 36 = 4 · 9 = 2 · 2 · 3 · 3, which are the same up
to ordering of the factors. We want to show this is always the case, no matter which n
we choose.
We first develop a definition for greatest common divisor, an important concept that
will come up again and again.
Remark 1.6. For those familiar with ring theory, this is the ideal in Z generated
by a1 , a2 , . . . , an .
1. d | a and d | b, and
Might seem weird at first, but if you think about it, it makes perfect sense.
Remark 1.8. You might be wondering why we don’t replace the second condition
of gcd with a more familiar condition: “whenever c | a and c | b, then c ≤ d.” There
are a few reasons why we don’t want to do this, the first being that this takes
away some factors which would qualify as gcd. For instance, gcd(36, 60) = ±12,
but using the ≤ condition would remove the −12 possibility. This may not be a
concern now, but it would cause problems when we go beyond the integers.
The second reason is that it’s just not necessary. Why do we need to invoke
the well-ordering of the integers when we don’t need to? This is a good thing to
practice in math: if you don’t need something, don’t use it. You’ll thank yourself
in the long run.
Okay, we have this definition. Now we want to know this gcd always exists.
Proof. Let’s take care of the stupid case first. If a = b = 0, then gcd(a, b) = 0. Now
assume a ̸= 0. Let d be the smallest positive element of (a, b). We will show (a, b) = dZ.
When we want to show equality of two sets, it is customary to show that one contains
the other, and vice versa. One inclusion is immediate: given our choice of d above, it
is clear that d · Z ⊆ (a, b). (A ⊆ B means “A is contained in B”.) So we are left to prove (a, b) ⊆ d · Z.
Suppose e ∈ (a, b). We invoke the Division Algorithm, which tells us that e = q·d+r
where q, r ∈ Z and 0 ≤ r < d. (This is just saying when you divide e by d, you get a
quotient q with a remainder r < d.) Then, r = e − qd, so r ∈ (a, b) since d, e ∈ (a, b).
But since we chose d to be the smallest positive element of (a, b), this forces r = 0,
which means e = q · d ∈ d · Z. Since this works for any e ∈ (a, b), we conclude
(a, b) ⊆ d · Z, and equality of the two sets follows.
Now we must address the last part of the result: showing that d = gcd(a, b). Showing
that d is a common factor is immediate: a, b ∈ (a, b) = dZ, so d | a and d | b. Also note
that d ∈ (a, b), so there are integers x, y such that d = ax + by. Thus, if an integer
c ∈ Z satisfies c | a and c | b, then c | ax + by = d, so d is a gcd of a, b.
Just as a cook must know the ingredients in a recipe and not just the steps, we
should understand what are the key things being used in our proofs. Here, the main
content of the proof unravels from the Division Algorithm, a simple/intuitive yet very
important result in number theory.
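The d in this proof is the smallest positive element of (a, b), and the proof shows d = ax + by for some integers x, y. The extended Euclidean algorithm, which the notes have not introduced, produces such x and y explicitly; a sketch:

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = a*x + b*y and d = gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    d, x, y = extended_gcd(b, a % b)
    # d = b*x + (a % b)*y = b*x + (a - (a//b)*b)*y; regroup on a and b:
    return d, y, x - (a // b) * y

d, x, y = extended_gcd(36, 60)
print(d, 36 * x + 60 * y)  # → 12 12: the gcd really is a Z-combination of 36 and 60
```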
We continue to develop some more definitions and results for our proof of unique
prime factorization.
Remark 1.11. Notation: if (a, b) = dZ, i.e. d = gcd(a, b), then we abuse notation
by suppressing the gcd and writing (a, b) = d.
Lemma 1.12
If a | bc and (a, b) = 1, then a | c.
Proof. This follows from previous definitions. As (a, b) = Z (in particular, 1 ∈ (a, b)),
there exist r, s ∈ Z such that 1 = ra + sb. Thus, c = rac + sbc. Clearly, a | rac, and
from assumption, a | s · bc, so a | c as desired.
What we will use in our proof of unique factorization is a specific version of this
where a = p is a prime.
Corollary 1.13
If p is prime and p | bc, then p | b or p | c.
Proof. As the only factors of p are ±1, ±p, we either have (p, b) = 1, in which case p | c
from above, or (p, b) = p, in which case p | b.
Definition 1.14 (Order). If p is prime and n ∈ Z is nonzero, then ordp n is the largest integer a such that p^a | n.
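In code (a small helper; the name is mine):

```python
def ord_p(p: int, n: int) -> int:
    """Largest a such that p**a divides n (n nonzero)."""
    assert n != 0
    a = 0
    while n % p == 0:
        n //= p
        a += 1
    return a

print(ord_p(2, 48), ord_p(3, 48))  # → 4 1   (since 48 = 2^4 · 3)
```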
Corollary 1.15
If p is prime, and a, b ∈ Z, then ordp (ab) = ordp (a) + ordp (b).
Note that we can apply this above corollary repeatedly, e.g. ordp (abc) = ordp (ab) +
ordp (c) = ordp (a) + ordp (b) + ordp (c). Now, finally, we are ready to prove uniqueness of prime
factorization.
Proof of second part of Theorem 1.2. Let n ∈ N+. Write n = p_1^{a_1} p_2^{a_2} · · · p_n^{a_n}, where the p_i’s are distinct primes and a_i > 0. (Note that we already proved a prime factorization exists, so this is fair game.) We want to show that the exponent of any prime p in the factorization depends only on n, not on the choice of factorization.
Let a := ordp n. By the above corollary, we have ordp n = ordp (p_1^{a_1}) + ordp (p_2^{a_2}) + · · · + ordp (p_n^{a_n}). If p_i ≠ p, then p does not divide p_i, so ordp (p_i^{a_i}) = 0. On the other hand, if p_i = p, then ordp (p_i^{a_i}) = a_i. Thus a = ordp n is exactly the exponent of p, and it depends only on n, as desired.
Definition 2.1 (Irreducible). Let f ∈ k[x], and deg f > 0. Then, f is called
irreducible if whenever f = gh, either deg g = 0 or deg h = 0.
Finally, we specify the units of our ring k[x]. In Z, the units are {±1}, as they are the
only two integers with a multiplicative inverse. In k[x], try to convince yourself of the
following:
Given this, we have the following theorem, analogous to unique prime factorization in
Z.
Turns out the proof of this is almost identical to what we did last time, which is why we
went through it in the first place! So our approach will be, like last time, to first prove
existence of such a factorization, then prove its uniqueness. We continue expanding our
analogy between Z and k[x] in order to prove this theorem.
Set Z k[x]
Units {±1} k×
Size |n| deg f
Proof of existence. We may assume f is monic. Like with our proof last time, we can
start with showing a factorization exists via induction. Any linear polynomial (i.e.
deg f = 1) is irreducible, so the base case is satisfied. Otherwise, suppose factorization
exists for all h such that deg h < n. Take a degree-n polynomial f . If f is irreducible,
great, we have the trivial factorization f = f . Otherwise, we can write f = gg ′ , where
deg g, deg g ′ < deg f . By the inductive hypothesis, each of g, g ′ has a factorization into
irreducibles, so the product gg ′ = f does as well.
Recall the proof of uniqueness in the integer scenario was solo carried by the wonderful thing called the Division Algorithm: given a, b ∈ Z (b ≠ 0), there exist integers q, r ∈ Z such that a = bq + r and 0 ≤ r < |b|. To have the same proof apply to k[x], we need an analogue of this result. Because this was pretty intuitive for the integers, we omitted a proof there, but here we will need to put in the work.
Proof. Consider the set S := {f − ζg : ζ ∈ k[x]}. Let r be an element of S with minimal degree. By definition, we can write r = f − qg for some q ∈ k[x]. We have two cases for g.
First, suppose g ∈ k×. Then, we can set q = f /g, in which case r = 0. That’s good.
Now, suppose deg g > 0. We wish to show deg r < deg g. Write r = ax^ℓ + . . . and g = bx^m + . . . , so deg r = ℓ and deg g = m. Suppose for the sake of contradiction that deg r ≥ deg g. Then, r − ab^{−1}x^{ℓ−m}g = f − (q + ab^{−1}x^{ℓ−m})g, so it is in S, but deg(r − ab^{−1}x^{ℓ−m}g) < deg r, contradicting the minimality of r in S. Thus, deg r < deg g, and we conclude.
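The contradiction step — subtracting ab⁻¹x^(ℓ−m)g to kill the leading term — is one round of polynomial long division. A sketch over k = Z/pZ, with polynomials stored as coefficient lists, lowest degree first (all names mine):

```python
def poly_divmod(f, g, p):
    """Divide f by g in (Z/pZ)[x]: return (q, r) with f = q*g + r, deg r < deg g.
    Polynomials are coefficient lists, lowest degree first."""
    f = [c % p for c in f]  # work on a reduced copy
    g_lead_inv = pow(g[-1], -1, p)  # b^{-1}: the leading coefficient is a unit
    q = [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)       # x^{l-m}
        c = (f[-1] * g_lead_inv) % p  # a * b^{-1}
        q[shift] = c
        for i, gi in enumerate(g):    # subtract c * x^{shift} * g from f
            f[shift + i] = (f[shift + i] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:
            f.pop()                   # drop the cancelled leading term(s)
    return q, f

# x^2 + 1 divided by x + 1 over Z/3Z: quotient x + 2, remainder 2
q, r = poly_divmod([1, 0, 1], [1, 1], 3)
print(q, r)  # → [2, 1] [2]
```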
This stuff may look familiar to you, because it is from last lecture.
Proof. Let d be an element of minimal degree in (f, g). It is clear (d) ⊆ (f, g), so we
now prove the reverse inclusion. If c ∈ (f, g), then by the Division Algorithm (Lemma 2.5),
we can write c = qd + r where deg r < deg d or r = 0. Rewrite as r = c − qd. Because
c, d ∈ (f, g), we have r ∈ (f, g) as well. This forces r = 0, because otherwise we would
have deg r < deg d, contradicting the minimality of d. Thus, when r = 0, we have
c = qd, namely c ∈ (d). This means (f, g) ⊆ (d), so equality follows. This completes
the first part of this lemma.
Now we show that this choice of d is a gcd of f, g. First, note that (f, g) = (d),
so in particular both f ∈ (d) and g ∈ (d), meaning d | f and d | g, respectively. Let
c ∈ k[x] where c | f and c | g. Then, c | af + bg for any a, b ∈ k[x]. But note d ∈ (f, g),
so d = a′ f + b′ g for some a′ , b′ ∈ k[x] by definition! Thus, c | d, as desired.
Proof. Since (f, g) = (1), we may write 1 = af + bg for some a, b ∈ k[x]. Then,
h = af h + bgh. Clearly, f | af h, and f | b · gh, so f | h, as desired.
Corollary 2.11
If p ∈ k[x] is irreducible and p | f g, then p | f or p | g.
Definition 2.12 (Order). For p ∈ k[x] irreducible and f ∈ k[x] nonzero, define the order ordp f as the largest integer a such that p^a | f .
This is left as an exercise, but you should just follow the proof for what we did in Z.
Now we are ready to complete the proof of Theorem 2.4 by proving uniqueness of
factorization.
Proof of uniqueness. Let f ∈ k[x]. We already proved it has a factorization into irreducibles, so write f = c · f_1^{a_1} · · · f_n^{a_n} where the f_i’s are irreducible, a_i ∈ N+, and c ∈ k×. We may assume that f and the f_i’s are monic and c = 1. But the exponents a_i are uniquely determined by the order, namely if p = f_i, then a_i = ordp f . But the order is determined solely by f , so the factorization is indeed only dependent on f , i.e. it is unique.
Remark 2.15. (bypassing rings) If you don’t know what a ring is, think of an
integral domain R as a subset of C such that 0, 1 ∈ R and the set is closed under
both addition and multiplication. This will be sufficient most of the time for this
class, I think.
Proof. Let us take the function λ(a + bi) = a2 + b2 = (a + bi)(a − bi). (This will be true
even when a, b ∈ R, not just in Z.) It is clear that λ outputs non-negative integers, as
a2 , b2 ∈ Z≥0 . Furthermore, λ is multiplicative, i.e. λ((a+bi)(c+di)) = λ(a+bi)λ(c+di).
(You can prove this by expanding everything out directly, or using basic facts about
conjugation and use λ(a + bi) = (a + bi)(a − bi).)
Now we want to show that we can construct a Division Algorithm using λ. Take α, γ ∈ Z[i], γ ≠ 0, and let α/γ = r + si, where r, s ∈ R. We can choose integers m, n ∈ Z such that |r − m| ≤ 1/2 and |s − n| ≤ 1/2 (just choose the closest integers to r and s respectively). Let δ = m + ni ∈ Z[i] and ζ = α − γδ.
Proof. Suppose there are only finitely many primes, so {p1 , . . . , pn } is our complete list
of distinct primes. Consider z = p1 p2 · · · pn +1. If any pi | z, then pi | z −p1 p2 · · · pn = 1,
so pi ∤ z. But we know z has a prime factorization, so there must exist a prime dividing
z but not contained in our list {p1 , . . . , pn }. This contradicts our assumption of finitely
many primes.
There are many proofs of this result, but this was the first one, and I find it partic-
ularly nice because of its simplicity. Even better, one can use this technique to prove
even stronger statements. For instance, I challenge you to prove:
Exercise 3.2. Prove there are infinitely many primes congruent to 1 mod 4. (You
can also prove this for 3 mod 4.)
The fact about infinitely many primes in Z also comes as a consequence of the following
result:
Theorem 3.3
The sum Σ_{p prime} 1/p diverges.
Lemma 3.4
If s ∈ N+, then the sum Σ_{n=1}^∞ 1/n^s = 1 + 1/2^s + 1/3^s + · · · diverges if s = 1 and converges if s ≥ 2.
Proof. We invoke some stuff from high school calculus. Since x ↦ 1/x^s is decreasing, the sum dominates a Riemann sum for ∫_1^∞ dx/(x+1)^s and (apart from its first term) is dominated by a Riemann sum for ∫_1^∞ dx/x^s, so the sum is bounded by

∫_1^∞ 1/(x+1)^s dx ≤ Σ_{n=1}^∞ 1/n^s ≤ 1 + ∫_1^∞ 1/x^s dx.

The left integral when s = 1 is lim_{z→∞} ∫_1^z (x + 1)^{−1} dx = lim_{z→∞} log(x + 1)|_1^z → +∞, so the sum diverges for s = 1. For s ≥ 2, the right integral converges: ∫_1^z x^{−s} dx = −(1/(s−1)) x^{−(s−1)}|_1^z = (1/(s−1))(1 − 1/z^{s−1}), which is bounded as z → ∞.
λ(n) = ∏_{i=1}^{ℓ(n)} (1 − 1/p_i)^{−1}.

Each (1 − 1/p_i)^{−1} is the sum of a geometric series in disguise (with initial term 1 and common ratio 1/p_i), so we can write

λ(n) = ∏_{i=1}^{ℓ(n)} (1 − 1/p_i)^{−1}
     = ∏_{i=1}^{ℓ(n)} (1 + 1/p_i + 1/p_i² + · · ·)
     = (1 + 1/p_1 + 1/p_1² + · · ·)(1 + 1/p_2 + 1/p_2² + · · ·) · · · .
Clearly, the prime factorization of any integer m ≤ n uses only the primes ≤ n, i.e. the primes in {p_1, . . . , p_{ℓ(n)}}. Thus, in the expansion of this product, 1/m appears for all m ≤ n. In particular,

λ(n) ≥ 1 + 1/2 + · · · + 1/n,
which we know from Lemma 3.4 diverges as n → ∞.
Now we do some clever manipulations. Using the Taylor series expansion of log, which recall is

log(1 − x) = −Σ_{m=1}^∞ x^m/m,

we can now write

log λ(n) = −Σ_{i=1}^{ℓ(n)} log(1 − 1/p_i)
         = Σ_{i=1}^{ℓ(n)} Σ_{m=1}^∞ (m p_i^m)^{−1}
         = 1/p_1 + 1/p_2 + · · · + 1/p_{ℓ(n)} + Σ_{i=1}^{ℓ(n)} Σ_{m=2}^∞ (m p_i^m)^{−1}.
Call the remaining double sum at the end S. We will now show S converges. For each i,

Σ_{m=2}^∞ (m p_i^m)^{−1} < Σ_{m=2}^∞ (p_i^m)^{−1} = p_i^{−2} + p_i^{−3} + · · · = p_i^{−2}/(1 − p_i^{−1}) ≤ 2p_i^{−2}

⟹ S = Σ_{i=1}^{ℓ(n)} Σ_{m=2}^∞ (m p_i^m)^{−1} < Σ_{i=1}^{ℓ(n)} 2p_i^{−2} < 2 Σ_{n=1}^∞ 1/n²,

which converges by Lemma 3.4. So S stays bounded as n → ∞ while log λ(n) → ∞, and the formula for log λ(n) above forces 1/p_1 + · · · + 1/p_{ℓ(n)} → ∞, which proves the theorem.
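The series Σ 1/p really does diverge, but glacially: the partial sums grow like log log n. A quick numeric look using a sieve (code mine):

```python
def prime_reciprocal_sum(limit: int) -> float:
    """Sum of 1/p over primes p <= limit, via the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return sum(1.0 / p for p in range(2, limit + 1) if is_prime[p])

# the sum barely moves even as the limit grows by factors of 100
for limit in (10, 1000, 100_000):
    print(limit, round(prime_reciprocal_sum(limit), 4))
```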
We’re going to revise this question a little bit. Note that if k is an infinite field, say
k = Q or C, then I could take any irreducible polynomial of the form ax + b, and there
are infinitely many choices for (a, b), so the question is obvious. Thus, we will focus on
when k is a finite field.
We actually know how to classify all finite fields. This is not straightforward to prove,
but we will provide the fact:
Fact 3.6. A finite field has q = pr elements for some prime p and r ∈ N. (We
denote such a field as Fq .)
Now, we can provide an analogous statement to Theorem 3.3 for k[x]. This highlights
the usefulness of this theorem, because then we can conclude the infinitude of primes
for other rings beyond just the integers.
Theorem 3.7
If |k| = q, then the sum

Σ_{p(x) ∈ k[x] irreducible} q^{−deg p(x)} = Σ_{p(x) ∈ k[x] irreducible} 1/q^{deg p(x)}

diverges.
The proof of this is very similar in flavor to the proof we provided for Theorem 3.3.
Someone asked about the density of primes, so we’ll provide a brief interlude here
to address this question.
Definition 3.8. If x ∈ R+ , then let π(x) be the number of primes p such that
1 < p ≤ x.
This was conjectured by Gauss at the (impressive) age of 16.² It was proven as a theorem by Hadamard and de la Vallée Poussin in 1896 using the Riemann zeta function. Moreover, proving strong bounds on the error term for the Prime Number Theorem requires assuming the Riemann Hypothesis, just one of many reasons why the elusive conjecture is so important in math.
We won’t prove the Prime Number Theorem, as it is very involved, but we will
prove a pretty strong, related result:
Theorem 3.10
There exist constants c1, c2 > 0 such that

c2 · x/log x < π(x) < c1 · x/log x.
Proposition 3.12
For x ∈ R+ , θ(x) < (4 log 2) · x.
Proof. Consider the binomial coefficient C(2n, n) = (2n)!/(n! n!). (If you have never seen this before, this is the number of ways to choose n objects from 2n total objects.) We see that this is divisible by all primes p with n + 1 ≤ p ≤ 2n. (They appear in the numerator, but not in the denominator.) Also note that C(2n, n) appears in the binomial expansion of (1 + 1)^{2n}, so we have

2^{2n} = (1 + 1)^{2n} > C(2n, n) > ∏_{n<p≤2n, p prime} p.

²If you ever feel yourself having too big of an ego, just think of Gauss. On the other hand, if you feel yourself having pretty low self-esteem, just think of the time when Grothendieck needed to use a particular prime in a lecture and said, “Alright, take 57,” which became so famous that 57 is now called the Grothendieck prime.
Thus, for any x ∈ R+, we can find m ∈ N such that 2^{m−1} ≤ x ≤ 2^m, which (telescoping the bound above over n = 1, 2, 4, . . . , 2^{m−1}) gives us θ(x) ≤ θ(2^m) < (log 2)(2 + 4 + · · · + 2^m) < (log 2) · 2^{m+1} ≤ (4 log 2) · x, as desired.
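Writing θ(x) = Σ_{p≤x} log p (the function used in the proof of Corollary 4.1 below), the bound is easy to sanity-check numerically; a sketch, with trial division since the scale is tiny:

```python
from math import log

def theta(x: int) -> float:
    """Chebyshev's theta: sum of log p over primes p <= x (trial division)."""
    def is_prime(n: int) -> bool:
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(log(p) for p in range(2, x + 1) if is_prime(p))

for x in (10, 100, 1000):
    assert theta(x) < (4 * log(2)) * x  # the bound of Proposition 3.12
print(round(theta(1000), 1), "<", round(4 * log(2) * 1000, 1))
```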
Corollary 4.1
There exists a constant c1 > 0 such that π(x) < c1 · x/log x for x ≥ 2.
Proof. We have

θ(x) = Σ_{1<p≤x} log p,    π(x) = Σ_{1<p≤x} 1.
How can we relate π(x) with θ(x)? One (less fruitful) observation one could make
is log p ≤ log x, so θ(x) ≤ π(x) log(x). But we will do something a bit more useful,
because after all, we want an upper bound for π(x), not a lower one.
We will consider the sum Σ_{√x≤p≤x} log p. This may seem a bit out of the blue, but log √x = (1/2) log x, so we are just taking the sum over the logarithmically top half of the range 1 ≤ p ≤ x. We now have a really nice lower bound for θ(x) (equivalently, an upper bound for π(x)):

θ(x) ≥ Σ_{√x≤p≤x} log p
     ≥ log √x · π(x) − log √x · π(√x)
     ≥ log √x · π(x) − √x log √x
     = (1/2)(log x) · (π(x) − √x)

⟹ π(x) ≤ 2θ(x)/log x + √x < (8 log 2) · x/log x + √x.
So this is basically what we want. To make the √x extra term disappear, observe that √x grows slower than x/log x (more specifically, one can show √x < 2x/log x for x ≥ 2), so

π(x) ≤ (8 log 2) · x/log x + √x < (8 log 2 + 2) · x/log x,

so we can take c1 = 2 + 8 log 2.
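Numerically the ratio π(x) log x / x hovers near 1 (the Prime Number Theorem says it tends to 1), comfortably below c1 = 2 + 8 log 2 ≈ 7.55; a quick check (code mine):

```python
from math import log

def pi(x: int) -> int:
    """Count primes up to x by trial division (fine for small x)."""
    def is_prime(n: int) -> bool:
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(2, x + 1) if is_prime(n))

c1 = 2 + 8 * log(2)
for x in (100, 1000, 10000):
    ratio = pi(x) * log(x) / x
    assert ratio < c1  # Corollary 4.1
    print(x, pi(x), round(ratio, 3))  # π(100) = 25, π(1000) = 168, π(10000) = 1229
```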
To recap on the above proof, because it is a bit technical: we have this wonderful
result from Proposition 3.12, and we want to turn this into an upper bound for π(x).
This means we have to write θ(x) as an upper bound of some expression in terms of
π(x). The trick we employ is to consider the sum that only takes the “top half” of the
range 1 ≤ p ≤ x, and the magic happens in the computations.
Proposition 4.2
There exists a constant c2 > 0 such that π(x) > c2 · x/log x.
⌊2n/p^j⌋ − 2⌊n/p^j⌋ = 1 if the fractional part {n/p^j} ≥ 1/2, and 0 otherwise.

Thus, we have

2^n ≤ C(2n, n) ≤ ∏_{p≤2n} p^{t_p}

(here t_p = ordp C(2n, n), and t_p ≤ ⌊log 2n / log p⌋ since p^j > 2n contributes nothing)

⟹ n log 2 ≤ Σ_{p≤2n} t_p log p ≤ Σ_{p≤2n} ⌊log 2n / log p⌋ log p
           = Σ_{p≤√(2n)} ⌊log 2n / log p⌋ log p + Σ_{√(2n)<p≤2n} ⌊log 2n / log p⌋ log p.
³Kisin: “I always remember this formula, because it was needed in an olympiad problem while I was in high school, but I had never seen it before, so I just proved it on the spot.” Absolute chad.
⁴If this still concerns you, do this explicitly for a specific prime p and integer n. Doing examples will help!
We now consider the latter sum separately. If p > √(2n), then

log 2n / log p < log 2n / log √(2n) = log 2n / ((1/2) log 2n) = 2,

so ⌊log 2n / log p⌋ = 1. Substituting this back into our inequality above, we have

n log 2 ≤ Σ_{p≤√(2n)} ⌊log 2n / log p⌋ log p + Σ_{√(2n)<p≤2n} log p.
Whew, okay that was a very involved proof with lots of big steps. Let’s get back to
ground level and deal with things a bit less stressful.
Sometimes, when n is understood, we will suppress the (n) or the mod n and just
write a ≡ b.
This is what we call an equivalence relation. It is especially nice because it complies
with all the algebraic operations we’d want in life: if a ≡ b and c ≡ d, then a + c ≡ b + d
and a · c ≡ b · d. (The latter can be seen via ac − bd = c(a − b) + b(c − d).) You are
probably familiar with all of this already from the integers modulo n, denoted Z/nZ.
Lemma 4.4
If (a, n) = (1), then ∃ x ∈ Z such that ax ≡ 1 (mod n).
Proof. This follows very quickly from definitions. If (a, n) = (1) = Z, then 1 ∈ (a, n),
so 1 can be expressed as a linear combination of a and n. In other words, ∃ x, y ∈ Z
such that ax + ny = 1. Equivalently, this means ax ≡ 1 (mod n), as desired.
One can think, therefore, of a such that (a, n) = 1 as an invertible element modulo
n. In Z/nZ, the element corresponding to a has a multiplicative inverse.
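Python exposes this inverse directly: since version 3.8, pow(a, -1, n) computes it whenever (a, n) = 1, and raises ValueError otherwise:

```python
x = pow(7, -1, 26)      # (7, 26) = 1, so 7 is invertible mod 26
print(x, (7 * x) % 26)  # → 15 1

try:
    pow(6, -1, 26)      # (6, 26) = 2, so no inverse exists
except ValueError:
    print("6 is not invertible mod 26")
```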
Definition 4.5 (Units of Z/nZ). The units of Z/nZ, denoted (Z/nZ)×, are the congruence classes of the integers a with (a, n) = 1.
A result we will prove next time is Fermat’s Little Theorem, which states that if x ∈ (Z/pZ)×, then x^{p−1} ≡ 1 (mod p).
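We can already spot-check the statement (a quick sketch; three-argument pow does modular exponentiation):

```python
p = 101  # a prime
assert all(pow(x, p - 1, p) == 1 for x in range(1, p))  # Fermat's Little Theorem

# primality of the modulus matters: 9 is composite, and 2^8 mod 9 = 4, not 1
assert pow(2, 8, 9) != 1
print("x^100 ≡ 1 (mod 101) for every nonzero x mod 101")
```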
A natural question one might ask is how we can count the number of units of Z/nZ,
or equivalently count the number of 1 ≤ a < n such that (a, n) = 1. This is an important
quantity in number theory, so it has a specific function related to it, called the Euler totient function φ. For an integer n, φ(n) = |(Z/nZ)×| = #{1 ≤ a < n : (a, n) = 1}.
We have a nice way of counting this:
Lemma 4.7
If n = p_1^{a_1} p_2^{a_2} · · · p_r^{a_r}, then

ϕ(n) = n (1 − 1/p_1) · · · (1 − 1/p_r) = ∏_{i=1}^{r} p_i^{a_i−1}(p_i − 1).
This gives rise to a more general version of Fermat’s Little Theorem (called Euler’s
Totient Theorem), which we will state and prove next time.
Proof 1. We wish to compute the size of the set {1 ≤ m ≤ n | (m, n) = 1}. This is
now just a counting problem. We first start with all integers from 1 to n – there are n
of them. Now, we subtract all multiples of pi for each pi . This leaves us with
n − n/p_1 − n/p_2 − · · · − n/p_r.
But we have subtracted too much! For instance, we remove the multiples of p_1p_2 twice: once among the multiples of p_1, and once among the multiples of p_2. Thus, we must add back in all multiples of p_i p_j for i ≠ j. This gives us now

n − n/p_1 − · · · − n/p_r + n/(p_1p_2) + · · · + n/(p_{r−1}p_r),
as desired.
Remark 5.1. Note that this is not actually a proof, this is more of a general
argument. There needs to be some work to formalize this. For those familiar with
some combinatorics, this is essentially invoking the Principle of Inclusion-Exclusion.
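Either way, the product formula can be checked against a brute-force count of units; a small sketch (function names mine):

```python
from math import gcd

def phi_bruteforce(n: int) -> int:
    """Count 1 <= a <= n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def phi_formula(n: int) -> int:
    """Euler phi via the product formula of Lemma 4.7."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p      # multiply by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                          # one leftover prime factor
        result -= result // m
    return result

for n in (12, 36, 97, 360):
    assert phi_bruteforce(n) == phi_formula(n)
print(phi_formula(36))  # → 12, i.e. 36 · (1 − 1/2)(1 − 1/3)
```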
convolution f ∗ g of f and g as

(f ∗ g)(n) = Σ_{d_1 d_2 = n, d_1, d_2 ∈ N} f(d_1) g(d_2).
One can show that this operation is both associative and commutative (just write
it out, it’s not bad at all).
We introduce three particular functions.
1. Consider the function I : N → C where I(n) = 1 for all n ∈ N. Then, for any f : N → C, convolution gives

(f ∗ I)(n) = (I ∗ f)(n) = Σ_{d|n} f(d).
2. Let φ : N → C send φ(1) = 1 and φ(n) = 0 for all n ≠ 1 (despite the notation, this is not the Euler totient function; it is the identity for convolution). Then, for any f : N → C, we have f ∗ φ = f .
3. The last is called the Möbius function, denoted µ, and it is defined by

µ(n) = (−1)^s if n = p_1 · · · p_s with the p_i distinct primes, and µ(n) = 0 otherwise.

Some examples: µ(p) = −1 for any prime p, µ(10) = (−1)² = 1, and µ(28) = 0 since it has two factors of 2. In particular, if n is divisible by a perfect square greater than 1, then µ(n) = 0.
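In code, µ can be computed by factoring and watching for a repeated prime (a sketch; the name is mine):

```python
def mobius(n: int) -> int:
    """(-1)^s if n is a product of s distinct primes, 0 if a square > 1 divides n."""
    if n == 1:
        return 1  # the empty product: s = 0
    s, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divided the original n
            s += 1
        p += 1
    # any leftover n > 1 here is a single extra prime factor
    return (-1) ** (s + 1)

print([mobius(n) for n in (1, 2, 10, 28, 30)])  # → [1, -1, 1, 0, -1]
```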
Considering that we want to use a technique called Möbius inversion, it makes sense
that the Möbius function will be an object of interest. We begin with one nice property
of it:
Lemma 5.3
If n > 1, then Σ_{d|n} µ(d) = 0.
Here, the second equality follows from the fact that any integer divisible by a square
evaluates to 0, the third line follows from simply counting how many tuples (b1 , . . . , bs )
have i 1’s for 0 ≤ i ≤ s, and the last line follows from the Binomial Theorem.
Proof. The statement may come as a surprise, but the proof is actually not bad at all.
We will use all three functions I, φ, µ that we had before. First, note that
(µ ∗ I)(n) = Σ_{d|n} µ(d) I(n/d) = Σ_{d|n} µ(d),
which by the previous lemma is 0 for n > 1. One can easily compute it is 1 for n = 1,
so in fact µ ∗ I = φ. In this case, we have

F = f ∗ I ⟹ F ∗ µ = f ∗ I ∗ µ = f ∗ µ ∗ I = f ∗ φ = f,
as desired.
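Möbius inversion is easy to verify numerically: start from an arbitrary f, form F = f ∗ I, and convolve F with µ to recover f. A brute-force sketch (all names mine):

```python
def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n: int) -> int:
    """Brute-force Möbius function."""
    s = 0
    for p in range(2, n + 1):
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides n
            s += 1
    return (-1) ** s

f = lambda n: n * n + 3                                  # any function N -> C
F = lambda n: sum(f(d) for d in divisors(n))             # F = f * I
f_back = lambda n: sum(mobius(d) * F(n // d) for d in divisors(n))  # F * mu

assert all(f_back(n) == f(n) for n in range(1, 50))
print("F * mu recovered f(n) for all n in 1..49")
```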
Recall we want to prove Lemma 4.7 in a different way using Möbius inversion. Here
is the first step towards this proof, which gives the flavor of being related to µ.
Proposition 5.5
Σ_{d|n} ϕ(d) = n. In other words, ϕ ∗ I is the identity map id : n ↦ n.
Corollary 5.6
If n = p_1^{a_1} · · · p_s^{a_s}, then ϕ(n) = n (1 − 1/p_1) · · · (1 − 1/p_s).
Proof. Proposition 5.5 gives ϕ ∗ I = id. Möbius Inversion now tells us ϕ = id ∗ µ, so

ϕ(n) = Σ_{d|n} µ(d) id(n/d)
     = Σ_{d|n} µ(d) · n/d
     = n − n/p_1 − · · · − n/p_s + n/(p_1p_2) + · · ·
     = n (1 − 1/p_1) · · · (1 − 1/p_s).
This is actually a consequence of a more general fact in group theory called Lagrange’s Theorem. If you know some group theory, you can replace (Z/nZ)× with any finite group G, h with any element of G, and ϕ(n) with the order |G| of G, and the statement remains true.
Actually, it looks like Kisin wants to talk about this result in its group-theoretic generality, so we will talk a little bit about groups. For our purposes, we will define a subgroup of (Z/nZ)× to be a subset H ⊆ (Z/nZ)× such that 1 ∈ H and for any h1, h2 ∈ H, the product h1h2 is also in H. (Since (Z/nZ)× is finite, closure under inverses then comes for free.)
A few more definitions, unfortunately all called order. (This is not to be confused
with the more strictly number-theoretic definition of order we provided way back in
Definition 1.14.)
One can see that these two definitions of order are related. Consider the subgroup
generated by an element h ∈ H, that is, the set of all powers of h. We will notate as
⟨h⟩ = {1, h, h2 , . . . }. It follows that the order of h agrees with the order of ⟨h⟩.
Now we prove Theorem 5.7 via the following Proposition.
Proposition 5.10
|H| divides ϕ(n) = |(Z/nZ)× |.
Proof. This proof might seem a little weird; I think this is because Kisin is trying to
explain a proof in group theory without defining new terminology. For the more purely
group theory explanation, see the last paragraph.
We can define an equivalence relation between elements in (Z/nZ)× as follows. If g1, g2 ∈ (Z/nZ)×, then we say g1 ∼ g2 if there exists some h ∈ H such that g1 = g2h.
For example, if we take h = −1, then a ∼ −a for any a ∈ (Z/nZ)× . Another example:
all elements of H are equivalent to each other.
Any equivalence relation produces equivalence classes. We can consider the equiva-
lence class of some a ∈ (Z/nZ)× ; this is given by the set {ah | h ∈ H}. Each element
in this set is distinct (if ah1 = ah2 , then h1 = h2 ), so this equivalence class has exactly
|H| elements. Note that if b is in this equivalence class, i.e., b = ah′ for some h′ ∈ H,
then the equivalence class of b is the same as the equivalence class of a. (I will leave
this as an exercise, but reach out if you have questions.)
Now the finish line is in sight. Each element of (Z/nZ)× belongs to an equivalence
class (namely, its own), and each equivalence class has size |H|. Therefore, ϕ(n) =
|(Z/nZ)× | is equal to |H| times the number of equivalence classes. The latter number
is clearly an integer, so it follows that |H| divides ϕ(n), as desired.
For people familiar with group theory, we are simply considering the cosets of H in
(Z/nZ)× . Each coset has size |H|, the number of cosets is clearly an integer, and every
element in (Z/nZ)× is contained in a coset of H. It follows that |H| times the number
of cosets is ϕ(n).
and a = |H| divides ϕ(n) by the above Proposition. Thus, h^{ϕ(n)} = (h^a)^{ϕ(n)/a} = 1, done.
Note that what Fermat’s Little Theorem is telling us, in terms of orders, is that the
order of any element in (Z/pZ)× divides p − 1. But it is a neat fact about (Z/pZ)×
that there actually exists an element whose order is exactly p − 1:
Theorem 5.12
If p is prime, then there exists some h ∈ (Z/pZ)× such that h has order p − 1. In other words, (Z/pZ)× = {1, h, h², . . . , h^{p−2}} is generated by a single element, so it is cyclic.
A natural question one may ask after seeing the above examples is whether 2 is a
primitive root for infinitely many primes. This is actually an unsolved problem! There
is a conjecture by Artin, though, which claims that if a ̸= −1 is not a square, then a is
a primitive root mod p for infinitely many p. So we expect for 2 to be a primitive root
for infinitely many primes.
Lemma 6.1
If k is a field (e.g., Z/pZ, Q, R, C) and f (x) ∈ k[x] is a monic polynomial of degree
n, then f (x) = 0 has at most n solutions in k.
Proof. We will induct on n = deg f . If n = 1, then f (x) is of the form f (x) = x − a for
some a ∈ k. Clearly, the only root is x = a, so there is exactly one solution.
Before we move on to the inductive step, we will develop a useful condition for
divisibility. Suppose f (x) is a polynomial and α ∈ k such that f (α) = 0. Then, since
k[x] has a Division Algorithm (recall Lemma 2.5), we can write f (x) = (x−α)q(x)+r(x),
where deg r < deg(x−α) = 1. This forces deg r = 0, i.e., r(x) = c is a constant function.
But then 0 = f (α) = r(α) = c, which means (x − α) | f (x).
Now we proceed with the inductive step. Suppose the statement is true for polynomials of degree n − 1, and let deg f = n. If f has no roots, then the Lemma is clearly satisfied. Otherwise, choose some α such that f (α) = 0. By the above, (x − α) | f (x), so f (x) = (x − α) · f_1(x). But now deg f_1 = n − 1, so our inductive hypothesis tells us that f_1 has at most n − 1 roots. The conclusion follows.
Exercise 6.2. (For fun) Out of the fields Z/pZ, Q, R, C, can you find which ones
produce exactly n roots for a degree n polynomial? (Such fields are called al-
gebraically closed, and they are useful because, well, we can always factor a
polynomial into linear factors.)
Note that the above lemma tells us that the polynomial x^{p−1} − 1 = 0 has at most p − 1 roots, but Fermat’s Little Theorem (5.11) tells us that it actually has exactly p − 1 distinct roots. We can strengthen this observation:
Corollary 6.3
If d | p − 1, then x^d ≡ 1 (mod p) has exactly d solutions in Z/pZ.
Proof. Fermat’s Little Theorem tells us that every element a ∈ (Z/pZ)× satisfies a^{p−1} = 1, so we can factor

x^{p−1} − 1 = ∏_{a∈(Z/pZ)×} (x − a).
(We can do this because from the proof above, each (x − a) | xp−1 − 1, and as
the (x − a) linear factors are coprime, their product must collectively divide xp−1 − 1.
Comparing degrees and leading coefficients, it follows that the two are in fact equal,
hence the factorization.)
Note that if d | p − 1, then x^d − 1 | x^{p−1} − 1. (This is a strictly algebraic fact; for example, x^{15} − 1 = (x^5 − 1)(x^{10} + x^5 + 1).) Write x^{p−1} − 1 = (x^d − 1) · g(x), where deg g = p − 1 − d.
Recall we’re doing all of this to prove Theorem 5.12: there exists an element in
(Z/pZ)× of order p−1. It turns out that the above Corollary gives us enough “restricting
conditions” to force this to be true.
Let’s elaborate more in this big-picture argument. Suppose no such primitive root
(element with order p − 1) exists. We already know the order of any element must
divide |(Z/pZ)× | = p − 1. (This is the point of Proposition 5.10, and it comes as a
consequence of Euler’s Totient Theorem, Theorem 5.7.) But Corollary 6.3 gives us, for
any divisor d | p − 1, an exact number for how many elements have order d. A quick
computation, invoking some of the work from §5.2, will show us that counting over all
such elements for all divisors d < p − 1 is not enough, so there must be an element with
order p − 1.
Proof of Theorem 5.12. Let ψ(d) be the number of elements of (Z/pZ)^× of order d.
Note that an element satisfies x^d − 1 = 0 if and only if its order divides d, i.e., if c is the
smallest positive integer such that x^c − 1 = 0, then c | d. Thus, by Corollary 6.3, we have
d = Σ_{c|d} ψ(c).
(At this point, you can see we are set up nicely to use Möbius Inversion, Theorem 5.4.)
Recall (the remarkably clever) Proposition 5.5, which tells us d = Σ_{c|d} ϕ(c), where
ϕ is the Euler Totient function. Our desire now is to show ψ = ϕ. By Möbius Inversion,
taking f to be either ψ or ϕ and F = id, we have

    ϕ(d) = Σ_{c|d} µ(c) · (d/c) = ψ(d),

so ψ(d) = ϕ(d) for every d | p − 1; in particular, ψ(p − 1) = ϕ(p − 1) ≥ 1, and a
primitive root exists.
Remark 6.4. Note that not only does this show the existence of a primitive root,
it also tells us exactly how many primitive roots there are! This is given by ψ(p−1),
which at the end we saw is just ϕ(p − 1). One can see this more directly, though:
once we know (Z/pZ)^× has a primitive root, say a, then we know every element can
be written as a power a^m. Then, it is not hard to show directly that for any m such
that (m, p − 1) = 1, a^m is also a primitive root mod p. The number of such m with
1 ≤ m ≤ p − 1 is ϕ(p − 1) by definition.
The answer comes from a significant result in number theory called the Chinese Re-
mainder Theorem.5
Proof. Since all m_i's are pairwise coprime, for any prime p, p divides at most one m_i.
Denote n = m_1 · · · m_s and, for each i, denote n_i = n/m_i = m_1 · · · m_{i−1} m_{i+1} · · · m_s. By
construction, (m_i, n_i) = 1, so there exist integers (r_i, s_i) such that r_i m_i + s_i n_i = 1. Let
e_i = s_i n_i. Again, by construction, observe

    e_i ≡ 0 (mod m_j) if j ≠ i,   and   e_i ≡ 1 (mod m_i).
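The proof is completely constructive: the e_i come from the extended Euclidean algorithm, and summing r_i e_i solves any such system of congruences. A Python sketch (the function names are mine, not from the course):

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise coprime m_i, exactly as in the
    proof: e_i = s_i * n_i satisfies e_i = 1 (mod m_i) and e_i = 0 (mod m_j)."""
    n = 1
    for m in moduli:
        n *= m
    x = 0
    for r, m in zip(residues, moduli):
        ni = n // m
        _, _, si = ext_gcd(m, ni)   # r_i * m_i + s_i * n_i = 1
        x += r * si * ni            # add r_i * e_i
    return x % n

# Sunzi's classic puzzle: x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7).
assert crt([2, 3, 2], [3, 5, 7]) == 23
```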
Now we can describe the structure of (Z/nZ)× by first decomposing Z/nZ following
Sunzi’s Theorem above.
5 Note the subtle discrimination going on in the name: any result created by a European/American
is credited by name (e.g., Fermat's Little Theorem), but here they fail to give a specific name despite
knowing its founder. There is some push in the math community to rename it Sunzi's Theorem, since
the result is first known to have been stated by Sunzi.
Fact 6.7. If p is an odd prime (i.e., p > 2) and b ∈ N+, then (Z/p^b Z)^× is always
cyclic.
7.1 Motivation
Here is the premise of the topic.6 In general, we are interested in solving polynomial
equations. Doing this over R, even C, for linear and quadratic equations is the whole
point of the algebra sequence in middle/high school. Doing this in generality is the
birthplace of algebraic geometry, one of the most prominent fields in modern mathe-
matics. (Take Math 137 or 232A/B if this piques your interest.)
Number theory cares about things modulo n. We can do even better: by Sunzi's
Theorem, to study something modulo n, it suffices to study it modulo the prime power
factors of n. For instance, if we want to solve x^3 − 3 ≡ 0 (mod 30), we can solve it mod 2,
3, and 5, then combine our findings to find solutions modulo 30.
6 This was not covered in class; I'm just adding this for more context.
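To see this concretely, the x^3 − 3 example can be brute-forced; the solution counts multiply across the prime factors exactly as promised (a small sketch):

```python
def solve_mod_n(n, f):
    """All x in 0..n-1 with f(x) = 0 (mod n), by brute force."""
    return [x for x in range(n) if f(x) % n == 0]

f = lambda x: x**3 - 3

# Solve modulo each prime factor of 30 separately...
parts = {p: solve_mod_n(p, f) for p in (2, 3, 5)}
# ...then the solutions mod 30 match up one residue choice per prime
# (this is the content of Sunzi's Theorem).
combined = solve_mod_n(30, f)
assert len(combined) == len(parts[2]) * len(parts[3]) * len(parts[5])
print(combined)  # [27]
```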
Solving linear equations modulo p is easy in some cases and doable in all cases. (We
might talk more about this later in the course.) If I give you something like x − 3 = 0
(mod p), it is obvious what x can be mod p. An equation like 3x ≡ 1 (mod 7) is also
completely doable. (Do it!)
Even if I give you something clunky like Ax ≡ B (mod C) (take something ridiculous
like A = 11^1234, B = 420^420, and C = 7^7), we know in general that we can find
integers x and y such that Ax + Cy = (A, C) using the Euclidean algorithm/the process
you did on your homework. Thus, so long as (A, C) | B (which in our example is true,
since the only primes involved are 11 and 7, so (A, C) = 1), we have B = (A, C) · d, so
A(dx) + C(dy) = (A, C) · d = B ⟹ A · (dx) ≡ B (mod C). The point is that solving
linear equations is completely understood, and pretty efficient.
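As a sketch, the recipe of this paragraph (run the extended Euclidean algorithm, then scale by d = B/(A, C)) in Python:

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def solve_linear(A, B, C):
    """One solution of A*x = B (mod C), or None if (A, C) does not divide B."""
    g, x, _ = ext_gcd(A, C)        # A*x + C*y = g = (A, C)
    if B % g != 0:
        return None
    return x * (B // g) % C        # scale by d = B/g, as in the text

# The "clunky" example: (11^1234, 7^7) = 1, so a solution exists.
A, B, C = 11**1234, 420**420, 7**7
x = solve_linear(A, B, C)
assert (A * x - B) % C == 0
```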
The next step is solving quadratic equations. The simplest such equation is of the
form x^2 ≡ a (mod p). (In fact, every quadratic can be reduced to this form. For
instance, if 2x^2 + 3x − 1 ≡ 0 (mod 7), then 2x^2 − 4x ≡ 1 ≡ 8 (mod 7) ⟹ x^2 − 2x ≡
4 ⟹ (x − 1)^2 ≡ 5 (mod 7).) Quadratic Reciprocity allows us to answer these questions
in a marvelously efficient way: once we lay out the main result, such computations
become very easy.
Before reading the next section, I invite you to play around with these two baby
exercises:
Exercise 7.2. Take the first nine odd primes p ∈ {3, 5, 7, 11, 13, 17, 19, 23, 29}.
For each of these primes, determine whether −1 is a quadratic residue. I'll start:
in mod 3, −1 = 2, but 1^2 = 2^2 = 1 modulo 3, so −1 is not a quadratic residue. On
the other hand, in mod 5, −1 = 4 = 2^2, so −1 is a quadratic residue. Do this for
all primes, and try to see a pattern! The answer may come as a surprise.
For the even more curious, do the same for 2. We saw above that 2 = −1 is
not a quadratic residue modulo 3, and it turns out that 2 is also not a quadratic
residue modulo 5. Can you find any patterns?
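If you would rather let the computer tabulate the data for these exercises, here is a sketch (mild spoiler: the printed columns p mod 4 and p mod 8 are the right things to stare at):

```python
def is_qr(a, p):
    """Brute force: is a a quadratic residue modulo the odd prime p?"""
    return any(x * x % p == a % p for x in range(1, p))

for p in [3, 5, 7, 11, 13, 17, 19, 23, 29]:
    print(f"p={p:2d}  p%4={p % 4}  -1 QR? {is_qr(-1, p)}  "
          f"p%8={p % 8}  2 QR? {is_qr(2, p)}")
```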
Definition 7.4 (Legendre Symbol). Let a ∈ Z and p prime such that (a, p) = 1.
Then, the Legendre symbol is defined as
    (a/p) = 1 if a is a quadratic residue mod p, and (a/p) = −1 otherwise.
Most of the time, we will restrict our attention to when a ̸= 0, i.e., (a, p) = 1,
because if a = 0, then we can do the obvious 02 = 0, which is not so interesting.
From Theorem 5.12, we know that (Z/pZ)^× is cyclic. Let h be a primitive root
modulo p. Thus, for any a ∈ (Z/pZ)^×, we can write a = h^i for some integer i. One can
show that a is a quadratic residue if and only if i is even (in which case a = (h^{i/2})^2).
Let us elaborate on this a little bit more via the following result.
Lemma 7.5
For a ∈ Z and p an odd prime such that (a, p) = 1, we have

    (a/p) ≡ a^{(p−1)/2} (mod p).
This is quite an exciting result! For those who entertained Exercise 7.2, you probably
figured there must be a better way to check if a number is a quadratic residue. Well,
here we go.
Initially, the fastest way to compute (a/p) is by going through all x ∈ (Z/pZ)^× and
seeing if x^2 ≡ a (mod p). In particular, if a is not a quadratic residue, that would
require us to go through all x ∈ (Z/pZ)^×. For each x, we have two computations
(square x, then reduce mod p), giving a total of 2(p − 1) computations. Now, we can
compute (a/p) just by multiplying a by itself (p − 1)/2 times, which is far less computation.
Thus, to summarize, there exists an x such that x^2 ≡ a (mod p) (i.e., (a/p) = 1)
if and only if a^{(p−1)/2} ≡ 1 (mod p). When no such x exists, i.e., when (a/p) = −1, then
a^{(p−1)/2} ̸≡ 1. But note that (a^{(p−1)/2})^2 = a^{p−1} ≡ 1 by Fermat's Little Theorem, and this is
only possible if a^{(p−1)/2} ≡ ±1, so (a/p) = −1 ⟺ a^{(p−1)/2} ≡ −1 (mod p). We have now
covered both cases, so the conclusion follows.
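Lemma 7.5 is also how one computes Legendre symbols in practice, since modular exponentiation is fast; a sketch (Python's three-argument pow does repeated squaring):

```python
def legendre(a, p):
    """(a/p) via Lemma 7.5: a^((p-1)/2) mod p, read as +1 or -1."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# Cross-check against the brute-force definition for a small prime.
p = 23
squares = {x * x % p for x in range(1, p)}
for a in range(1, p):
    assert legendre(a, p) == (1 if a in squares else -1)
```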
In particular, taking a = −1 in Lemma 7.5 shows that (−1/p) = (−1)^{(p−1)/2}; that is,
−1 is a quadratic residue mod p if and only if p ≡ 1 (mod 4).
Proposition 7.7
(2/p) = (−1)^{(p^2−1)/8}. In other words,

    (2/p) = 1 if p ≡ 1, 7 (mod 8), and (2/p) = −1 if p ≡ 3, 5 (mod 8).
Finally, we state this truly remarkable result, first proved by none other than Gauss.
It may look a bit complicated at first, but it makes this question regarding quadratic
residues very simple.
This makes answering questions like Exercise 7.1 not only doable, but even doable
by hand.
As a corollary, we can prove Proposition 7.7, which says that 2 is a quadratic residue
when p ≡ 1, 7 (mod 8), and not a quadratic residue otherwise.
Proof of Proposition 7.7. We use Lemma 7.9 (Gauss's Lemma), which we just proved,
and compute µ explicitly for a = 2. Given i between 1 and (p − 1)/2, note 2i < p, so
we have 2i ≡ −m_i (mod p) for some 1 ≤ m_i ≤ (p − 1)/2 if and only if 2i > (p − 1)/2,
i.e., i > (p − 1)/4. Thus, µ is equal to the number of 1 ≤ j ≤ (p − 1)/2 such that
j > (p − 1)/4, which by complementary counting is just (p − 1)/2 − m, where
m = ⌊(p − 1)/4⌋. We compute this by casework:
• (p ≡ 1 (mod 8)) We can write p = 8k + 1. Then, m = 2k, so µ = (p − 1)/2 − m =
4k − 2k = 2k. Thus, µ is even.
• (p ≡ 3 (mod 8)) We can write p = 8k + 3. Then, m = 2k again, so µ = (p − 1)/2 − m =
(4k + 1) − 2k = 2k + 1 is odd.
• (p ≡ 5 (mod 8)) Write p = 8k + 5. Then, m = 2k + 1, so µ = (p − 1)/2 − m =
(4k + 2) − (2k + 1) = 2k + 1 is odd.
• (p ≡ 7 (mod 8)) Write p = 8k + 7. Then, m = 2k + 1, so µ = (p − 1)/2 − m =
(4k + 3) − (2k + 1) = 2k + 2 is even.
Collecting all of these computations, (2/p) = (−1)^µ is exactly as claimed, and the result
follows.
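Reading Lemma 7.9 as Gauss's Lemma, (a/p) = (−1)^µ with µ counting how many of a, 2a, . . . , ((p−1)/2)a land in the "negative" half, the casework above can be spot-checked:

```python
def mu(a, p):
    """Gauss's mu: count how many of a, 2a, ..., ((p-1)/2)a reduce mod p
    to a residue exceeding (p-1)/2, i.e. into the 'negative' half."""
    return sum(1 for i in range(1, (p - 1) // 2 + 1)
               if a * i % p > (p - 1) // 2)

# mu(2, p) is even exactly for p = 1, 7 (mod 8), matching the casework.
for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]:
    assert (mu(2, p) % 2 == 0) == (p % 8 in (1, 7))
```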
Lemma 7.10
    f(nz)/f(z) = ∏_{m=1}^{(n−1)/2} f(z + m/n) · f(z − m/n).
Lemma 7.11
If n > 0 is odd, then

    x^n − y^n = ∏_{k=0}^{n−1} (ζ^k x − ζ^{−k} y),

where ζ = e^{2πi/n}.
Proof. Note that for ζ = e^{2πi/n}, we have (ζ^k)^n = e^{2πik} = 1, so every ζ^k is a root of the
polynomial z^n − 1 = 0. But there are n such powers of ζ, and deg(z^n − 1) = n, so they
each appear as a root exactly once. In particular,

    z^n − 1 = ∏_{k=0}^{n−1} (z − ζ^k).

Substituting z = x/y and clearing denominators gives x^n − y^n = ∏_{k=0}^{n−1} (x − ζ^k y);
since n is odd, ∏_k ζ^k = ζ^{n(n−1)/2} = 1 and k ↦ −2k permutes Z/nZ, so this product
can be rewritten as ∏_{k=0}^{n−1} (ζ^k x − ζ^{−k} y), as desired.
Great, let us return to the main lemma at hand. We apply the lemma above with
x = e^{2πiz} and y = e^{−2πiz} to get
Proposition 7.12
If p is an odd prime and a ∈ Z such that (p, a) = 1, then
    ∏_{ℓ=1}^{(p−1)/2} f(ℓa/p) = (a/p) ∏_{ℓ=1}^{(p−1)/2} f(ℓ/p).
Proof. We can write ℓa ≡ ±m_ℓ (mod p) for some 1 ≤ m_ℓ ≤ (p − 1)/2. This tells us
that (ℓa ∓ m_ℓ)/p is an integer, so using the relations f(z) = f(z + 1) and f(z) = −f(−z),

    f(ℓa/p) = f(±m_ℓ/p) = ±f(m_ℓ/p).

Multiplying across all 1 ≤ ℓ ≤ (p − 1)/2 on both sides, we have

    ∏_{ℓ=1}^{(p−1)/2} f(ℓa/p) = (−1)^µ ∏_{ℓ=1}^{(p−1)/2} f(ℓ/p) = (a/p) ∏_{ℓ=1}^{(p−1)/2} f(ℓ/p),

where the last equality invokes Lemma 7.9.
Now why is this useful? Well, we can now prove Quadratic Reciprocity.
Proof of Theorem 7.8. Take p, q odd primes, and apply the above Proposition to a = q.
(Note that p, q are “symmetric” in the sense that they are interchangeable, so whatever
we do for q, we can do the same for p.) The Proposition tells us
    ∏_{ℓ=1}^{(p−1)/2} f(ℓq/p) = (q/p) ∏_{ℓ=1}^{(p−1)/2} f(ℓ/p),
    α^n + a_{n−1} α^{n−1} + · · · + a_0 = 0.
Another way to think about this is that α is an algebraic number if it satisfies some
polynomial relation p(X) = X^n + a_{n−1} X^{n−1} + · · · + a_0 = 0 for a_i ∈ Q, and likewise
with a_i ∈ Z for algebraic integers.
In general, we can always find algebraic numbers which are not algebraic integers. We
in fact have a nice characterization of when algebraic numbers are algebraic integers.
(Spoiler: it’s given in the name.)
Proposition 8.3
Let r ∈ Q. Then,
1. r is an algebraic number;
2. r is an algebraic integer if and only if r ∈ Z.
Proof. The first is trivial: r satisfies p(X) = X − r = 0. We now focus on the second
statement. Assume there are a_i ∈ Z such that r^n + a_{n−1} r^{n−1} + · · · + a_0 = 0. Write
r = p/q, where p, q ∈ Z and (p, q) = 1. Clearing denominators in our polynomial relation,
we have

    p^n + a_{n−1} p^{n−1} q + · · · + a_0 q^n = 0
    ⟹ −q(a_{n−1} p^{n−1} + a_{n−2} p^{n−2} q + · · · + a_0 q^{n−1}) = p^n.

But this means q divides p^n, which, as (p, q) = 1, is only possible if q = ±1. This
means r = ±p ∈ Z, as desired.
1. ∀ x, y ∈ V , x + y ∈ V ;
2. ∀ r ∈ Q, x ∈ V , r · x ∈ V ;
So we have entered the land of linear algebra, which is great, because we know linear
algebra really well.7
Proposition 8.5
Let V be a Q-module. Let α ∈ C such that α · V ⊆ V . Then, α is an algebraic
number.
Proof. Consider the multiplication map m_α : V → V given by m_α(x) = α · x; this lands
in V since α · V ⊆ V. It is easy to see that this is a Q-linear map. Let M ∈ M_n(Q) be
its matrix with respect to a basis γ_1, . . . , γ_n of V. Take the characteristic polynomial
P(X) = det(XI_n − M) = X^n + a_{n−1} X^{n−1} + · · · + a_0; since M ∈ M_n(Q), the
coefficients here live in Q.
7
“We” means the math community at large. I myself am pretty bad at linear algebra, oops.
Now we invoke the Cayley-Hamilton Theorem, one of the most important results in
linear algebra. The theorem states that plugging the matrix into its characteristic
polynomial gives 0, so here, we have P(m_α) = m_α^n + a_{n−1} m_α^{n−1} + · · · + a_0 = 0.
But we know exactly what m_α^k is from the definition: m_α^k(x) = α^k x. Thus, P(m_α)(x) =
(α^n + a_{n−1} α^{n−1} + · · · + a_0)x = 0 for all x. Taking x ≠ 0, this forces the sum in
the parentheses to be 0, so α^n + a_{n−1} α^{n−1} + · · · + a_0 = 0. Hence, α is algebraic, as
desired.
Proposition 8.6
The set of algebraic numbers is a field.
Proof. Let α, β ∈ C be two algebraic numbers. We want to show three things are
algebraic: (1) 1/α (so the set has inverses), (2) α · β (closed under multiplication), and
(3) α + β (closed under addition).
We start with (1). We know by definition that α, β satisfy the relations
    α^n + a_{n−1} α^{n−1} + · · · + a_0 = 0
    β^m + b_{m−1} β^{m−1} + · · · + b_0 = 0

for a_i, b_i ∈ Q. Assume that a_0 ≠ 0 (otherwise we can divide the first equation by α).
Dividing by a_0 α^n, we get a new equation

    1/a_0 + (a_{n−1}/a_0) · (1/α) + · · · + (a_{n−i}/a_0) · (1/α)^i + · · · + (1/α)^n = 0,

so 1/α is an algebraic number.
We will prove (2) and (3) with the same approach. Let V be the Q-module with
basis {α^k β^j : 0 ≤ k < n, 0 ≤ j < m}. By construction, this has finite dimension.
Furthermore, any element of V is of the form v = Σ_{i,j} r_{ij} · α^i β^j, and we have
αv = Σ_{i,j} r_{ij} · α^{i+1} β^j, which is still an element of V. (The only thing we have to
check is that α · α^{n−1} still lives in V, which is true since α^{1+(n−1)} = α^n =
−Σ_{i<n} a_i α^i ∈ V.) Likewise, βv ∈ V.
But this means α · V ⊆ V and β · V ⊆ V . Now, we can take sums and products to
get (α + β) · V ⊆ V and (αβ)V ⊆ V . Proposition 8.5 tells us then that α + β and αβ
are algebraic numbers, as desired.
To show that the set of algebraic integers forms a ring, we now provide the proof given
in the textbook. It is basically going to be the same, except instead of working over Q,
we will work over Z.
Remark 8.7. We called a Q-vector space a Q-module because modules are more
general than vector spaces. For example, there is no such thing as a Z-vector
space, since Z is not a field. However, modules are defined over rings, so it makes
sense to talk about a Z-module.
1. W is an abelian subgroup;
2. if n ∈ Z, α ∈ W , then n · α ∈ W ;
Akin to Proposition 8.5, we have this analogous result, which uses a similar Cayley-
Hamilton argument.
Proposition 8.9
Let W be a Z-module. Let α ∈ C such that αW ⊆ W . Then, α is an algebraic
integer.
Proof. Let γ_1, . . . , γ_n generate W as a Z-module. Since αW ⊆ W, we can write
αγ_i = Σ_j c_{ij} γ_j for some c_{ij} ∈ Z; equivalently, Σ_j (αδ_{ij} − c_{ij}) γ_j = 0,
where δ_{ij} is the Kronecker delta function that gives 1 when i = j and 0 otherwise.
Consider the n × n matrix M = (αδ_{ij} − c_{ij})_{i,j}. Note the above equality indicates
M · (γ_1 · · · γ_n)^⊺ = 0, which forces det M = 0 since the γ_i's are not all zero. But det M
is a polynomial in α with integer coefficients since each c_{ij} ∈ Z. Note also that the
coefficient of α in each diagonal entry is 1, so det M is a monic polynomial in α of
degree n. Hence, α is an algebraic integer.
The proof that the set of algebraic integers forms a ring follows straight from this
Proposition, similar to what we did for algebraic numbers.
Corollary 8.10
The set of algebraic integers is a ring.
Proof. Take W to be the Z-module generated by {α^i β^j | 0 ≤ i < n, 0 ≤ j < m}, and
proceed as in the proof of Proposition 8.6.
Proposition 8.11
Let w_1, w_2 ∈ Ω and p ∈ Z be a prime number. Then, (w_1 + w_2)^p ≡ w_1^p + w_2^p
(mod p).
Proof. The proof follows exactly like the proof for the integers. By the Binomial The-
orem,

    (w_1 + w_2)^p = w_1^p + w_2^p + Σ_{k=1}^{p−1} (p choose k) w_1^k w_2^{p−k}.

Since p | (p choose k) for 1 ≤ k ≤ p − 1, the result follows.
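The one divisibility fact used here, p | (p choose k) for 0 < k < p, is easy to confirm:

```python
from math import comb

# p | C(p, k) for 1 <= k <= p - 1 when p is prime: the factor p in the
# numerator of p!/(k!(p-k)!) cannot be cancelled by the smaller factorials.
for p in [2, 3, 5, 7, 11, 13]:
    assert all(comb(p, k) % p == 0 for k in range(1, p))

# Primality matters: C(4, 2) = 6 is not divisible by 4.
assert comb(4, 2) % 4 != 0
```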
Well, we know an algebraic number satisfies some polynomial relation. Can we find
this polynomial relation? I mentioned earlier that this is a bit difficult to do by hand;
for instance, √2 satisfies X^2 − 2 = 0, and √3 corresponds to X^2 − 3 = 0, but given
these two polynomials, it is hard to find the minimal polynomial that has √2 + √3 as
a root. We will "find" this minimal polynomial through some algebra.
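One way to "find" it is the Cayley-Hamilton trick from Proposition 8.5: √2 + √3 acts on the Q-module with basis 1, √2, √3, √6, and its characteristic polynomial is a polynomial it satisfies. A sketch in exact arithmetic (the Faddeev-LeVerrier recurrence is my choice of method, not from the notes):

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients [1, c1, ..., cn] of det(X*I - A), computed exactly by
    the Faddeev-LeVerrier recurrence M_k = A(M_{k-1} + c_{k-1} I)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        for i in range(n):                       # M <- M + c_{k-1} I
            M[i][i] += coeffs[-1]
        M = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]                  # M <- A * M
        coeffs.append(-sum(M[i][i] for i in range(n)) / k)
    return coeffs

# Multiplication by sqrt(2) + sqrt(3) on the basis 1, sqrt2, sqrt3, sqrt6:
M = [[0, 2, 3, 0],
     [1, 0, 0, 3],
     [1, 0, 0, 2],
     [0, 1, 1, 0]]
assert char_poly(M) == [1, 0, -10, 0, 1]   # i.e. X^4 - 10*X^2 + 1
```

Cayley-Hamilton then says √2 + √3 is a root of X^4 − 10X^2 + 1, which is in fact its minimal polynomial.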
Let α ∈ C be an algebraic number. Then, S = {P ∈ Q[X] | P(α) = 0} is an ideal
of Q[X]. But Q[X] is a principal ideal domain (it has a Euclidean algorithm, recall our
work for k[x] in Lecture 2), so S = (f ) for some irreducible monic f ∈ Q[X].
In fact, f is the polynomial of minimal degree such that f (α) = 0 and f is monic.
We call f the minimal polynomial of α, and the degree of f is called the degree of
α.
Note that we could have also come to this more directly without considering Q[X]
as a PID. Take the set S, take the element f ∈ S which is monic and has minimal
degree. Suppose g ∈ S as well. Then, the Division Algorithm in Q[X] tells us that
g(X) = f (X)q(X) + r(X) for some q, r ∈ Q[X] such that deg r < deg f . But then
r(α) = g(α) − f (α)q(α) = 0, so r ∈ S as well. This is only possible if r = 0 by
minimality of f , which implies f | g. Hence, S = (f ), and f is the minimal polynomial
of α by construction.
Define the sets Q[α] = {P(α) : P ∈ Q[X]} and Q(α) = {P(α)/Q(α) : P, Q ∈ Q[X], Q(α) ≠ 0}.
Note Q[α] ⊆ Q(α). The natural question is, then, when does equality hold? We
have one answer to this:
Proposition 8.12
If α is an algebraic integer, then Q[α] = Q(α) and it is a Q-vector space of dimension
equal to the degree of α.
Proof. Let f be the minimal polynomial of α. Let P(α)/Q(α) ∈ Q(α), where P, Q ∈ Q[X] and
Q(α) ≠ 0. This means f ∤ Q, and since f is irreducible, this means (f, Q) = 1. Thus,
the Euclidean algorithm in Q[X] tells us that ∃ h, k ∈ Q[X] such that f h + Qk = 1.
Substituting X = α, we get f(α)h(α) + Q(α)k(α) = 1 ⟹ k(α) = 1/Q(α), so
P(α)/Q(α) = P(α)k(α) ∈ Q[α]. This proves the first part of the statement.
Write f(X) = X^n + a_{n−1} X^{n−1} + · · · + a_0. Notice that α^n = −a_{n−1} α^{n−1} − · · · − a_0,
so Q[α] is generated by 1, α, . . . , α^{n−1}.
These elements are linearly independent: if not, then α would satisfy some polynomial
relation of the form

    b_{n−1} α^{n−1} + · · · + b_0 = 0,

where b_i ∈ Q, not all zero. But g(X) = Σ_{i=0}^{n−1} b_i X^i satisfies g(α) = 0 and deg g < deg f, which is
consistent with the minimality of f only if g = 0, meaning b_i = 0 for all i. The conclusion
follows.
We will prove this statement using roots of unity. In general, the nth roots of unity
are the complex numbers z ∈ C satisfying z n = 1. (If you think about it enough, you
can see that z must be of the form e2πik/n for some integer k.)
    ζ^p + ζ^{−p} = ζ + ζ^{−1} if p ≡ ±1 (mod 8),  and  ζ^p + ζ^{−p} = ζ^3 + ζ^{−3} if p ≡ ±3 (mod 8).
We can write these sums in terms of τ. The first one is just τ = ζ + ζ^{−1} by definition,
and the second can be written as ζ^3 + ζ^{−3} = −ζ^{−1} + (−ζ^{−1})^{−1} = −(ζ + ζ^{−1}) = −τ, since
ζ^4 = −1 ⟹ ζ^3 = −ζ^{−1}. To summarize,

    τ^p ≡ ζ^p + ζ^{−p} = τ if p ≡ ±1 (mod 8), and −τ if p ≡ ±3 (mod 8);

that is, τ^p ≡ (−1)^ε τ, where ε = (p^2 − 1)/8.
Now we put all of this together. From above, we have

    (−1)^ε τ ≡ τ^p ≡ (2/p) τ (mod p).

Multiplying by τ on both sides gives a factor of τ^2 = 2 on the left and the right.
But since p is odd, (2, p) = 1, so we can cancel the 2's on both sides to be left with
(−1)^ε ≡ (2/p) (mod p). But since both (−1)^ε and the Legendre symbol only take on values
of ±1, this equivalence mod p translates to equality as integers. This concludes
the proof.
I think this proof is a bit nicer than the proof we provided previously for this result,
since it gives a little more explanation as to why considering p mod 8 is the correct
thing to do. Before, we were like “oh, haha, if you look at p mod 8 and go through all
the cases then it works, what a happy coincidence har har har” but here the 8 comes
naturally from this whole business with considering the eighth root of unity ζ.
Lemma 9.1
Let a ∈ Z. Then,

    Σ_{t=0}^{p−1} ζ^{at} = p if p | a, and 0 if (p, a) = 1.
Proof. We first consider the case when p | a. Then, since ζ^p = 1, it follows that ζ^a = 1
as well, so every term in the sum is 1 and the sum equals p.
Now suppose p ∤ a. This is similarly easy: the sum is just a finite geometric series
with starting term 1 and common ratio ζ^a ≠ 1, so we have

    Σ_{t=0}^{p−1} ζ^{at} = ((ζ^a)^p − 1)/(ζ^a − 1) = 0,

again because (ζ^a)^p = (ζ^p)^a = 1.
Corollary 9.2
Let x, y ∈ Z. Then,

    (1/p) Σ_{t=0}^{p−1} ζ^{t(x−y)} = 0 if p ∤ x − y, and 1 if x ≡ y (mod p).
Now, we will prove a useful lemma regarding sums of the Legendre symbol.
Lemma 9.3
    Σ_{t=0}^{p−1} (t/p) = 0.
Proposition 9.5
g_a = (a/p) g_1.

Proof. If p | a, then ζ^{at} = 1 for every t, so g_a = Σ_{t=0}^{p−1} (t/p) = 0 by Lemma 9.3.
Now, assume that p ∤ a. We have an isomorphism Z/pZ → Z/pZ where t ↦ at, so we can
change variables to get

    (a/p) g_a = Σ_{t=0}^{p−1} (a/p) (t/p) ζ^{at}
              = Σ_{t=0}^{p−1} (at/p) ζ^{at}
              = Σ_{s=0}^{p−1} (s/p) ζ^s = g_1
    ⟹ g_a = (a/p) (a/p) g_a = (a/p) g_1,

as desired.
Author's Note 9.6. Some notation remarks. We will let g = g_1, and I will
denote Z/pZ as F_p (the F stands for "field" because this is a field with p elements).
Whenever I suppress the indices of a sum (e.g., if I just write Σ_x), then assume x
goes from 0 to p − 1.
Moving forward,
Proposition 9.7
    g^2 = (−1)^{(p−1)/2} p.
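This is easy to check numerically in floating point before seeing a proof; a sketch (the Legendre symbols are evaluated with Euler's criterion, Lemma 7.5):

```python
import cmath

def gauss_sum(p):
    """g = sum_t (t/p) * zeta^t with zeta = e^(2 pi i / p), p an odd prime."""
    def legendre(t):
        r = pow(t, (p - 1) // 2, p)
        return -1 if r == p - 1 else r
    return sum(legendre(t) * cmath.exp(2j * cmath.pi * t / p)
               for t in range(p))

# g^2 should equal (-1)^((p-1)/2) * p, up to floating-point error.
for p in [3, 5, 7, 11, 13]:
    expected = (-1) ** ((p - 1) // 2) * p
    assert abs(gauss_sum(p) ** 2 - expected) < 1e-8
```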
On the other hand, we can use the definition of g to explicitly expand (using Freshman's
Dream, which states (a + b)^q ≡ a^q + b^q mod q):

    g^q = (Σ_t (t/p) ζ^t)^q ≡ Σ_t (t/p)^q ζ^{qt} ≡ g_q (mod q),

where the last equality follows because (t/p)^q = (t/p), since the symbol is either −1, 0, 1
and q is odd.
But Proposition 9.5 tells us that g_q = (q/p) g, so g^q ≡ g_q ≡ (q/p) g (mod q). At the
same time, we wrote above that g^q ≡ (p*/q) g, so we have

    (p*/q) g ≡ (q/p) g (mod q)
    ⟹ (p*/q) g · g ≡ (q/p) g · g
    ⟹ (p*/q) p* ≡ (q/p) p*    (Proposition 9.7)
    ⟹ (q/p) ≡ (p*/q).

Finally,

    (p*/q) = ((−1)^{(p−1)/2} p / q)    (again Prop 9.7)
           = (−1/q)^{(p−1)/2} (p/q)
           ≡ (−1)^{((q−1)/2) · ((p−1)/2)} (p/q),    (Corollary 7.6)

which is exactly the statement for Quadratic Reciprocity.
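With the proof complete, we can confirm Quadratic Reciprocity wholesale for small primes (a sketch, again evaluating symbols via Euler's criterion):

```python
def legendre(a, p):
    """(a/p) by Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
for i, p in enumerate(primes):
    for q in primes[i + 1:]:
        sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
        assert legendre(p, q) * legendre(q, p) == sign
```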
Proposition 9.8
Let P(X) = X^{p−1} + X^{p−2} + · · · + 1 = (X^p − 1)/(X − 1). Then, P is an irreducible polynomial
in Q[X].
Remark 9.9. We call P(X) above the pth cyclotomic polynomial. Note also
that if we take ζ = e^{2πi/p} again, then P(ζ) = 0, so P is the minimal polynomial of
ζ by the Proposition, as it is irreducible.
Taking this modulo p, we have X^{p−1} ≡ f(X + 1) g(X + 1) mod p, which means in mod
p, we have f(X + 1) ≡ X^r and g(X + 1) ≡ X^s for some r, s > 0. Taking X = 0,
we see that p | f(1), g(1), so p^2 | f(1)g(1) = P(1) = p, which is a contradiction. The
conclusion follows.
Proposition 9.11
Taking ζ = e^{2πi/p} again,^a

    ∏_{k=1}^{(p−1)/2} (ζ^{2k−1} − ζ^{−(2k−1)})^2 = (−1)^{(p−1)/2} p.

a Apparently Gauss thought about this almost every day for four years before being able to
prove it. Look at Gauss, man, so inspirational.
Proof. Consider again P(X) from the above Proposition; since every ζ^t is a root of P,
we can write P(X) = X^{p−1} + · · · + 1 = ∏_{t=1}^{p−1} (X − ζ^t). Plugging in X = 1, we have

    p = P(1) = ∏_{t=1}^{p−1} (1 − ζ^t).
Now we do something that is a bit ad hoc, which is only reasonable if it took a chad
like Gauss four years to come up with this. Observe that the set {±(4k − 2) : 1 ≤ k ≤ (p−1)/2}
is a complete set of nonzero residues mod p. Indeed, these p − 1 values are pairwise
incongruent: 4k − 2 ≡ 4k′ − 2 (mod p) forces k = k′, while 4k − 2 ≡ −(4k′ − 2) (mod p)
forces 4(k + k′) ≡ 4 (mod p), i.e., k + k′ ≡ 1 (mod p), which is impossible for
1 ≤ k, k′ ≤ (p − 1)/2.
Given this, though, we can now rewrite our product, splitting along whether we take
+(4k − 2) or −(4k − 2):

    p = ∏_{k=1}^{(p−1)/2} (1 − ζ^{4k−2}) · ∏_{k=1}^{(p−1)/2} (1 − ζ^{−(4k−2)})
      = ∏_{k=1}^{(p−1)/2} (2 − ζ^{4k−2} − ζ^{−(4k−2)})
      = ∏_{k=1}^{(p−1)/2} −(ζ^{2k−1} − ζ^{−(2k−1)})^2
      = (−1)^{(p−1)/2} ∏_{k=1}^{(p−1)/2} (ζ^{2k−1} − ζ^{−(2k−1)})^2,

and the Proposition follows.
Proposition 9.12
Let p be an odd prime, and ζ = e^{2πi/p}. Then,

    ∏_{k=1}^{(p−1)/2} (ζ^{2k−1} − ζ^{−(2k−1)}) = √p if p ≡ 1 (mod 4), and i√p if p ≡ 3 (mod 4).
Proof. From the above Proposition, we know that the magnitude of the product is going
to be √p, so we are only concerned with the sign/what power of i we multiply by.
We need to get a little bit more involved in the complex numbers this time. You
may have seen before e^{iθ} = cos θ + i sin θ; thus, we have e^{ix} − e^{−ix} = 2i · sin(x), so we
have

    ∏_{k=1}^{(p−1)/2} (ζ^{2k−1} − ζ^{−(2k−1)}) = ∏_{k=1}^{(p−1)/2} (e^{2iπ(2k−1)/p} − e^{−2iπ(2k−1)/p})
        = ∏_{k=1}^{(p−1)/2} 2i sin((4k − 2)π/p)
        = i^{(p−1)/2} ∏_{k=1}^{(p−1)/2} 2 sin((4k − 2)π/p).
Let's look at when sin is negative. In general, sin x is negative when π < x < 2π,
so sin((4k − 2)π/p) is negative if (p + 2)/4 < k ≤ (p − 1)/2. Counting by case, this gives
(p − 1)/4 negative terms if p ≡ 1 (mod 4) and (p − 3)/4 negative terms if p ≡ 3 (mod 4).
Now we proceed by case mod 4. If p ≡ 1 (mod 4), then the sign of the product is

    i^{(p−1)/2} (−1)^{(p−1)/4} = (−1)^{(p−1)/4} (−1)^{(p−1)/4} = (−1)^{(p−1)/2} = 1.

This agrees with what we have in our proposition statement.
If p ≡ 3 (mod 4), then the sign is equal to

    i^{(p−1)/2} (−1)^{(p−3)/4} = i (−1)^{(p−3)/4} (−1)^{(p−3)/4} = i,

which is also what we have in the proposition. This concludes the proof.
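The sign analysis above can also be confirmed numerically (a floating-point sketch):

```python
import cmath

def sign_product(p):
    """prod_{k=1}^{(p-1)/2} (zeta^(2k-1) - zeta^(-(2k-1))) for zeta = e^(2 pi i/p)."""
    z = cmath.exp(2j * cmath.pi / p)
    out = 1
    for k in range(1, (p - 1) // 2 + 1):
        out *= z ** (2 * k - 1) - z ** (-(2 * k - 1))
    return out

# sqrt(p) when p = 1 (mod 4), i*sqrt(p) when p = 3 (mod 4)
for p in [5, 13, 17]:
    assert abs(sign_product(p) - p ** 0.5) < 1e-8
for p in [3, 7, 11]:
    assert abs(sign_product(p) - 1j * p ** 0.5) < 1e-8
```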
We seem to have gone a long way away from our temporary home of Gauss sums,
but now we have found a road to circle back around. Recall Proposition 9.7 tells us
that g^2 = (−1)^{(p−1)/2} p. But this is exactly the expression we have from Proposition 9.11,
so we have

    g^2 = (−1)^{(p−1)/2} p = ∏_{k=1}^{(p−1)/2} (ζ^{2k−1} − ζ^{−(2k−1)})^2
    ⟹ g = ε ∏_{k=1}^{(p−1)/2} (ζ^{2k−1} − ζ^{−(2k−1)})

for some sign ε = ±1.
You are probably on the edge of your seat at this point, wondering whether ε is positive
or negative. Kronecker gives us the answer.
Proof. Let

    f(x) = Σ_{j=1}^{p−1} (j/p) x^j − ε ∏_{k=1}^{(p−1)/2} (x^{2k−1} − x^{p−(2k−1)}).
Note that when x = ζ, we are just computing f(ζ) = g − g = 0. One can also
use Lemma 9.3 to deduce f(1) = 0. This means that f is divisible by the minimal
polynomials of ζ and 1, respectively; in particular, X^{p−1} + · · · + 1 | f and X − 1 | f.
This means

    (X − 1)(X^{p−1} + · · · + 1) = X^p − 1 | f,

so we can write f(X) = (X^p − 1) g(X) for some g(X). Substituting x = e^z in the above
expression, we have

    Σ_{j=1}^{p−1} (j/p) e^{jz} − ε ∏_{k=1}^{(p−1)/2} (e^{z(2k−1)} − e^{z(p−(2k−1))}) = (e^{pz} − 1) g(e^z).    (2)
We can identify e^{jz} with its Taylor series expansion Σ_{k=0}^∞ (jz)^k / k!, in which case the
sum can be expressed as

    Σ_{j=1}^{p−1} (j/p) e^{jz} = Σ_{k=0}^∞ (z^k / k!) Σ_{j=1}^{p−1} (j/p) j^k.
We will now identify the z^{(p−1)/2} coefficient in Equation 2. The sum contributes a
coefficient of (1/((p−1)/2)!) Σ_{j=1}^{p−1} (j/p) j^{(p−1)/2}. But note that mod p, we have
j^{(p−1)/2} ≡ (j/p), so
1 Just as a note, a field is a set closed under addition and multiplication, and it has
the important property that every nonzero element has a multiplicative inverse. For
example, Q and Z/7Z are fields because, for instance, 2^{−1} = 1/2 ∈ Q and 2^{−1} = 4 in
Z/7Z (since 2 · 4 ≡ 1 (mod 7)), but 2^{−1} = 1/2 ∉ Z, so Z is not a field.
We first provide a result that tells us exactly what the size of a finite field can be.
Proof. As k is finite, the set under addition is a finite abelian group, so for any x,
    |k| · x = x + · · · + x  (|k| summands)  = 0.
8 Honestly I'm not sure if this covers all the vector space axioms, but the main point is that every-
thing is sunshine and rainbows because Fp literally lives in k, so k is an Fp-vector space.
9 This is just saying basis elements are linearly independent, for those who have seen this stuff
before.
10 Note this is not a field isomorphism; in fact, Fp × · · · × Fp not only fails to be a field, but it fails
to be an integral domain. Convince yourself of this!
This is really great news and a strong result off the bat. However, this tells us
practically nothing else about the field k. The only elements we have a hold on are
those in the copy of Fp living inside k, but otherwise we are a bit lost. (What even are
the basis elements of k?) We will now prove some things which tell us more information
about Fq (where q = p^n).
Lemma 10.2
Suppose k is a finite field with q = p^n elements. Then, the polynomial X^q − X ∈
Fp[X] factors as

    X^q − X = ∏_{α∈k} (X − α).
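For k = Fp, the lemma says ∏_{a∈Fp}(X − a) = X^p − X, which we can verify by multiplying out the product (a sketch; coefficients are stored lowest degree first):

```python
def poly_mul(f, g, p):
    """Multiply coefficient lists (lowest degree first) in Fp[X]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

p = 7
prod = [1]                       # the constant polynomial 1
for a in range(p):               # multiply together all (X - a), a in F_p
    prod = poly_mul(prod, [(-a) % p, 1], p)
# X^7 - X has coefficients 0, -1 = 6, 0, ..., 0, 1 (lowest degree first)
assert prod == [0, p - 1] + [0] * (p - 2) + [1]
```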
Whenever we talk about a new object, we always care about maps related to the
object. Finite fields have a really, really useful map that comes with them, called
the Frobenius automorphism. An automorphism is just a bijection which is also a
(field) homomorphism: if σ is an automorphism, then σ is a bijection and it satisfies
both σ(x + y) = σ(x) + σ(y) and σ(xy) = σ(x)σ(y).
    σ : k → k,  α ↦ α^p.
This is additive: σ(α + β) = (α + β)^p = α^p + β^p = σ(α) + σ(β), where the second
equality follows from the fact that p | (p choose m) for 1 ≤ m ≤ p − 1, so all of the
cross terms in the binomial expansion vanish.
Note that because of the existence of the Frobenius automorphism, k is not unique
up to canonical isomorphism. However, it turns out that any automorphism of k is
simply some power of the Frobenius automorphism. (This is why we say the Frobenius
automorphism is really important.) This is a taste of a beautiful study called Galois
theory, which on the ground is about these automorphisms of fields but can be used to
prove wildly vast things! (In fact, the study arose from proving that there is no quintic
formula.)
We now proceed with the construction of a finite field with q elements. Let k be a
field and f(x) ∈ k[x] an irreducible monic polynomial. We will consider k[x] modulo
f: we say g, h ∈ k[x] are equivalent (denoted g ∼_f h) if f | g − h. (Think of this as just
g ≡ h (mod f).) This now gives us a construction:
Proposition 10.5
The set of equivalence classes of k[x] under the equivalence relation ∼f is a field.
Proof. We first show that the set has addition and multiplication. I will also suppress
Okay, we constructed a field. What does this field look like? How many elements
does it have?
Corollary 10.6
If f has degree d, then k[X]/∼_f is a vector space over k of dimension d, with a
basis given by 1, X, X^2, . . . , X^{d−1}.
Proof. Since k[X] is a Euclidean domain, it has a Division Algorithm. Thus, for any
g ∈ k[X], we can write g = q · f + r for some q, r ∈ k[X] with deg r < deg f. This
means g ∼_f r, and r is written using just 1, X, . . . , X^{d−1} (as deg r < deg f = d), so
these elements span. (They are also linearly independent: a nontrivial relation would
be a nonzero polynomial of degree < d divisible by f.) So we are done.
10.2 Existence of Fq
Now we prove another quite remarkable fact. We showed earlier that X q − X = 0
factors into q linear factors in k[X], since every α ∈ k satisfies αq − α = 0. But what
if we just considered the factorization of X q − X = 0 in Fp [X]? A remarkable thing
happens:
Theorem 10.7
Let q = p^n. Then,

    X^q − X = X^{p^n} − X = ∏_{d|n} F_d(X) ∈ Fp[X],

where F_d(X) is the product of all monic irreducible polynomials of degree d in Fp[X].
This gives us access to irreducible polynomials! This may not seem so cool at first,
but I challenge you to find an irreducible polynomial in, say, F_7[X] of degree 4 and
see who's laughing by the end of your computation. We care about these irreducible
polynomials because, well, we use them to construct our finite fields. In line with this,
we have the existence of a finite field of order p^n:
Corollary 10.8
There exists an irreducible polynomial f ∈ Fp [X] of degree n.
This gives us an explicit form for N_n, the number of monic irreducible polynomials of
degree n in Fp[X], and one can check that the sum on the right is nonzero (it is a sum
of distinct powers of p, each taken with sign ±1, so the leading p^n cannot be cancelled).
Thus N_n ≠ 0 and indeed there exists an irreducible polynomial of degree n in Fp[X].
Corollary 10.9
There exists a finite field of order pn .
Proof. From the above corollary, there is an irreducible polynomial f ∈ Fp [X] of degree
n. Then, Fp [X]/ ∼f is your desired field.
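Here is the construction carried out concretely (a sketch; `make_field` is my name). Elements of Fp[X]/∼_f are represented by their remainders mod f, i.e., coefficient tuples of length deg f:

```python
from itertools import product

def make_field(p, f):
    """Fp[X]/(f) for monic irreducible f, given as a coefficient list
    (lowest degree first). Elements are coefficient tuples of length deg f."""
    d = len(f) - 1
    elems = [tuple(c) for c in product(range(p), repeat=d)]

    def mul(a, b):
        prod = [0] * (2 * d - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                prod[i + j] = (prod[i + j] + x * y) % p
        # reduce modulo f by monic long division
        for i in range(len(prod) - 1, d - 1, -1):
            c = prod[i]
            for j in range(len(f)):
                prod[i - d + j] = (prod[i - d + j] - c * f[j]) % p
        return tuple(prod[:d])

    return elems, mul

# F_9 = F_3[X]/(X^2 + 1); X^2 + 1 is irreducible mod 3 since -1 is not a QR.
elems, mul = make_field(3, [1, 0, 1])
one = (1, 0)
nonzero = [e for e in elems if e != (0, 0)]
assert all(any(mul(a, b) == one for b in nonzero) for a in nonzero)
```

The final assertion checks the field axiom that is hard to see by hand: every nonzero class is invertible.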
Proof of Theorem 10.7. First, we will show that no factor of X^q − X appears twice:
namely, if f ∈ Fp[X] with deg f > 0 is such that f | X^{p^n} − X, then f^2 ∤ X^{p^n} − X.
Suppose the contrary, so f^2 · g = X^{p^n} − X. We will reach a contradiction shortly, after
a brief interlude on derivatives.
In finite fields, we have a notion of a formal derivative. In high school calculus,
you learn the derivative via the limit definition, but afterwards, you forget about the
limit and just manipulate symbols. For example, you can prove (d/dx) x^2 = 2x via the limit
definition, but you probably know this as just "oh, the rule for the derivative of x^n is
just nx^{n−1}."
For finite fields, we define the derivative as just this symbolic manipulation. So in
Fp[X], we still write (d/dx) x^2 = 2x, except now it doesn't mean anything more than what
we just wrote down. Limits don't make sense here, anyways! But this derivative still
obeys all the things we expect from calculus (e.g. Chain Rule, Product Rule, etc.), so
we can work with this.
To summarize, we have a map d/dx : Fp[X] → Fp[X] where X^n ↦ n · X^{n−1}. Taking
the "derivative" on both sides of f^2 · g = X^{p^n} − X, we get

    2f′(X)f(X)g(X) + f(X)^2 g′(X) = p^n X^{p^n −1} − 1 = −1,

where the last equality follows because we are working over Fp. But f divides the left
hand side, so we must have f | −1, which is our contradiction. Thus, any factor of
X^{p^n} − X has multiplicity one.
Now, it remains to show that the only irreducible factors of X^{p^n} − X are those of
degree d | n. In other words, if f ∈ Fp[X] is an irreducible monic, then f | X^{p^n} − X
if and only if d = deg f divides n. We will approach this via the following lemma:
Lemma 11.1
Let ℓ, m ∈ N+. If F is a field, then X^ℓ − 1 | X^m − 1 in F[X] if and only if ℓ | m.
Likewise, if a ∈ N_{>1}, then a^ℓ − 1 | a^m − 1 if and only if ℓ | m.
Proof. Write m = qℓ + r, where 0 ≤ r < ℓ. Then

    (X^m − 1)/(X^ℓ − 1) = (X^{qℓ+r} − 1)/(X^ℓ − 1)
                        = X^r · (X^{qℓ} − 1)/(X^ℓ − 1) + (X^r − 1)/(X^ℓ − 1).

Note X^ℓ − 1 | X^{qℓ} − 1 (we have X^{qℓ} − 1 = (X^ℓ)^q − 1 ≡ 1^q − 1 = 0 (mod X^ℓ − 1)), so
the first term on the right is a polynomial. However, the last term on the right is not
a polynomial since r < ℓ, unless r = 0, in which case ℓ | m, as desired.
The proof for the second part of the statement follows from our work above (the
proof is exactly the same, with X replaced by a).
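The integer half of the lemma is easy to check numerically:

```python
# a^l - 1 divides a^m - 1 exactly when l divides m (Lemma 11.1, second part)
for a in (2, 3, 10):
    for l in range(1, 7):
        for m in range(1, 13):
            assert ((a**m - 1) % (a**l - 1) == 0) == (m % l == 0)
```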
so in particular β satisfies β^{p^d} − β = 0. But then any β ∈ K satisfies both X^{p^d} − X = 0
and X^{p^n} − X = 0, and the only roots of the former are the elements of K, so we have
X^{p^d} − X | X^{p^n} − X. In other words, X^{p^d −1} − 1 | X^{p^n −1} − 1, in which case Lemma 11.1
tells us that p^d − 1 | p^n − 1 and hence d | n.
The crux of the above proof relies on Lemma 11.1, which basically gives us a nice
divisibility criterion on the exponents given divisibility of polynomials.
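A quick numerical corroboration: counting degrees on both sides of the factorization of $X^{p^n} - X$ gives $\sum_{d \mid n} d\,N_d = p^n$, where $N_d$ is the number of monic irreducibles of degree $d$ over $\mathbb{F}_p$. The formula for $N_d$ below comes from Möbius inversion — a standard consequence of the theorem, not something proved in lecture — and the helper names are mine.

```python
# Check: sum over d | n of d * (number of monic irreducibles of degree d
# over F_p) equals p^n, i.e. the degrees in the factorization of
# X^{p^n} - X add up correctly.

def mu(n):
    """Moebius function via trial division."""
    res, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0               # square factor: mu vanishes
            res = -res
        d += 1
    return -res if n > 1 else res

def num_irreducibles(p, d):
    """Count of monic irreducibles of degree d over F_p (Moebius inversion)."""
    return sum(mu(e) * p ** (d // e) for e in range(1, d + 1) if d % e == 0) // d

for p in (2, 3, 5):
    for n in (1, 2, 3, 4, 6):
        total = sum(d * num_irreducibles(p, d) for d in range(1, n + 1) if n % d == 0)
        assert total == p ** n
print("degree count matches p^n for all tested p, n")
```

For instance, over $\mathbb{F}_2$ there are 2 irreducibles of degree 1 and 1 of degree 2, and indeed $1\cdot2 + 2\cdot1 = 2^2$.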
11.2 Uniqueness of Fq
We have almost completed the story of finite fields. We will now complete the promise
given by Theorem 10.4, which not only guarantees existence, but says that the field Fq
is unique up to isomorphism. We now prove the uniqueness.
Proof of uniqueness, Theorem 10.4. Let q = pn and suppose F is a finite field of order q.
We will show that it is isomorphic to $\mathbb{F}_p[X]/(f)$ for some monic irreducible $f \in \mathbb{F}_p[X]$ of degree $n$. Note Theorem 10.7 tells us that $f(X) \mid X^q - X$, but $X^q - X = \prod_{\alpha \in F}(X - \alpha)$, so $\exists\,\alpha \in F$ such that $f(\alpha) = 0$. We can now identify $F$ as $\mathbb{F}_p(\alpha)$ via the following
isomorphism:
$$\mathbb{F}_p[X]/(f) \xrightarrow{\ \simeq\ } F, \qquad X \mapsto \alpha.$$
One should be more careful than Kisin here by showing this is actually an isomorphism,
but if you know the First Isomorphism Theorem, this is not too bad. Since $f$ is irreducible, it is the minimal polynomial of $\alpha$. Thus, the map $\mathbb{F}_p[X] \to F$ sending $X \mapsto \alpha$ is clearly surjective, and its kernel is exactly the ideal $(f)$, so $\mathbb{F}_p[X]/(f) \to F$ is an isomorphism,
as desired.
Proposition 11.2
If |F | = q = pn , then the subfields of F are in bijection with the divisors of n.
just showed then that $d \mid n$), then we can identify $E$ as the subfield of $F$ fixed by $\sigma^d$. In short,
$$E = F^{\sigma^d} = \{\alpha \in F \mid \sigma^d(\alpha) = \alpha^{p^d} = \alpha\}.$$
What Galois theory does is it relates these subfields to subgroups of what we call
the Galois group, which is just the set of automorphisms of F fixing Fp . In the case
of F a field over Fp , it turns out that the only automorphisms of F fixing Fp are powers
of the Frobenius automorphism, so Aut(F/Fp ) = {1, σ, . . . , σ n−1 } ≃ Z/nZ. (You may
see this as Gal(F/Fp ); this is because F is what we call a Galois extension over Fp . If
you’re curious to learn more, take Kisin’s Math 123 next semester!)
But what are the subgroups of Z/nZ? They are simply the multiples of d in Z/nZ
for d | n. (For instance, {0, 3, 6} is a subgroup of Z/6Z.) The multiples of d correspond
to the subgroup {1, σ d , σ 2d , . . . , σ n−d }, or the subgroup generated by σ d . But we showed
that the field fixed by σ d (hence fixed by this subgroup) is E! So there is a bijection
between subgroups of Aut(F/Fp ) and subfields of F/Fp where a subgroup corresponds
to the subfield it fixes. This is really nice, because we have a lot of results in group
theory that we can now use.
To conclude this discussion on finite fields, we provide the following nice result:
Lemma 11.3
If |F | = q = pn , then F × is cyclic. (Hence it is isomorphic to Z/(q − 1)Z.)
The proof follows the one for when $q = p$ is prime (Theorem 5.12), so we omit it for brevity.
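For the prime-field case, the lemma is easy to observe computationally; the sketch below (function names mine) brute-forces a generator of $(\mathbb{Z}/p\mathbb{Z})^\times$:

```python
# Find a generator (primitive root) of (Z/pZ)^x by brute force,
# illustrating that the group is cyclic of order p - 1.

def order(a, p):
    """Multiplicative order of a mod p (assumes gcd(a, p) = 1)."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

def primitive_root(p):
    return next(g for g in range(2, p) if order(g, p) == p - 1)

print(primitive_root(7))     # 3: powers of 3 mod 7 are 3, 2, 6, 4, 5, 1
print(primitive_root(101))
```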
$(-1)^{(p-1)/2}p$. Denote $p^* := (-1)^{(p-1)/2} \cdot p$. We now track through the following equivalent statements:
Proposition 12.1
N : Z[i] → Z is a Euclidean function.
Proof. We use the fact that $-1$ is a quadratic residue mod $p$ when $p \equiv 1 \pmod 4$. (In general, $\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2}$, which is just $1$ when $p \equiv 1 \pmod 4$.) Therefore, there exist integers $s, k \in \mathbb{Z}$ such that $s^2 + 1 = pk$.
Now we claim $p$ is reducible in $\mathbb{Z}[i]$. Suppose not, so $p$ is irreducible. Then $p \mid s^2 + 1 = (s+i)(s-i)$ implies either $p \mid s+i$ or $p \mid s-i$. Neither is possible (comparing coefficients of $i$, $p \mid s \pm i$ would force $p \mid \pm1$), so $p$ is indeed reducible, meaning we can write $p = \alpha \cdot \beta$ where $\alpha, \beta \in \mathbb{Z}[i]$ are not units. Write $\alpha = a + bi$. Then,
Lemma 12.3
If N (a + bi) is a prime integer, then a + bi is irreducible.
Here, the fact that N (α) ̸= 1 follows from the fact that α is not a unit. We can
actually use this kind of idea to find all units of Z[i].
Lemma 12.4
The units in Z[i] are {±1, ±i}.
Going back to our original question, we determined that any Gaussian integer with
prime norm must be irreducible. But suppose I have an element with norm that is not
prime. Then, can it be irreducible? (Basically, is the converse of Lemma 12.3 true?)
To hint at the answer of the above question, let me ask a different question. If p ∈ Z
is prime, can p be irreducible in Z[i]? And if so, when? Fermat showed that p is not
irreducible when p ≡ 1 (mod 4), and we just proved it, so we only need to consider
primes 3 mod 4. It turns out that p ≡ 3 (mod 4) is always irreducible.
Proposition 12.5
Any prime p ≡ 3 (mod 4) is irreducible in Z[i].
Now, we have primes 3 mod 4 and Gaussian integers with prime norm as our irre-
ducibles. How many more are there? It turns out that these cover all irreducibles!
Lemma 12.6
If a + bi ∈ Z[i] is irreducible, where a, b ̸= 0, then N (a + bi) = p is prime.
Note that if one of a, b = 0, then we are just dealing with integers, from which we
concluded that the only integers which stay irreducible in Z[i] are primes 3 mod 4. I
guess we have to be a little careful and make sure to include p = 2 in our discussion,
but we can factor 2 = (1 + i)(1 − i) = i(1 − i)2 , so it is reducible in Z[i].
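This classification is easy to see in data. The sketch below (helper names mine) checks, for small primes, that $p$ is a sum of two squares — equivalently, reducible in $\mathbb{Z}[i]$ as $p = (a+bi)(a-bi)$ — exactly when $p = 2$ or $p \equiv 1 \pmod 4$:

```python
import math

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

def sum_of_two_squares(p):
    """Return (a, b) with a^2 + b^2 = p, or None if no such pair exists."""
    for a in range(math.isqrt(p) + 1):
        b = math.isqrt(p - a * a)
        if a * a + b * b == p:
            return (a, b)
    return None

for p in (q for q in range(2, 60) if is_prime(q)):
    rep = sum_of_two_squares(p)
    # reducible in Z[i] <=> p is a norm <=> p = 2 or p = 1 mod 4
    assert (rep is not None) == (p == 2 or p % 4 == 1)
    print(p, rep)
```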
Proof. Let π be an irreducible dividing x + iy. Suppose for the sake of contradiction
that π | x − iy. Then, π | (x + iy) + (x − iy) = 2x = (1 + i)(1 − i)x, so π must divide
one of those factors. If $\pi \mid 1+i$, then $N(\pi) \mid N(1+i) = 2$; as $\pi$ is a non-unit, we have $N(\pi) = 2$. But we have $\pi \mid x + iy$, and this means $\bar\pi \mid \overline{x+iy} = x - iy$; combining gives $2 = N(\pi) = \pi\bar\pi \mid (x+iy)(x-iy) = x^2 + y^2 = z^2$. Thus, $2 \mid z^2$, so $z$ is even.
But then x2 + y 2 = z 2 ≡ 0 (mod 4), which is only possible if x2 ≡ y 2 ≡ 0 (mod 4)
(if they were both odd, then x2 + y 2 ≡ 1 + 1 = 2 (mod 4)). In particular, that means
x, y are both even. This contradicts the assumption that x, y, z are coprime, so π ∤ 1+i.
Likewise, we can make the same argument for 1 − i to show π ∤ 1 − i.
Thus, π | (1 + i)(1 − i)x but π ∤ 1 + i, 1 − i, so π | x. Likewise, π | (x + iy) − (x − iy) =
2iy = i(1 + i)(1 − i)y, so π | y as well. But x, y are coprime as integers, so π cannot be
a prime integer itself. Lemma 12.6 tells us then that N (π) is some prime p. Taking the
norm of our divisibility conditions, we have p = N (π) | N (x) = x2 and p | y 2 similarly.
This contradicts again our assumption that x, y are coprime, so indeed π ∤ x − iy and
thus x + iy, x − iy are coprime.
Now we will use this claim to classify all Pythagorean triples. Recall we have $x^2 + y^2 = (x+iy)(x-iy) = z^2$, and the two factors in the middle are coprime. We can factor $z$ into irreducibles in $\mathbb{Z}[i]$: let $z = u \cdot \pi_1^{a_1} \cdots \pi_r^{a_r}$, where $u$ is a unit and the $\pi_j$'s are irreducibles. Then,
$$(x+iy)(x-iy) = (u \cdot \pi_1^{a_1} \cdots \pi_r^{a_r})^2.$$
But since x + iy and x − iy are coprime and their product is a square, they must
each be squares! (Up to units, of course.) Thus, x + iy = w · β 2 for some unit w and
β ∈ Z[i]. Write β = a + bi, and suppose w = 1. (It turns out that if you choose some
other unit for w, then we would get the same answer we are about to obtain, so we will
just do the w = 1 case here.) Then,
so any primitive Pythagorean triple is of the form (x, y, z) = (a2 − b2 , 2ab, a2 + b2 ) for
integers a, b ∈ Z.
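As a sanity check, here is a brute-force comparison (helper names mine) between all primitive triples and the parametrized ones, under the usual primitivity conditions $\gcd(a,b) = 1$ with $a, b$ of opposite parity, which the full argument with units would pin down:

```python
import math

def primitive_triples(limit):
    """All primitive (x, y, z) with x^2 + y^2 = z^2 and z < limit."""
    out = set()
    for z in range(2, limit):
        for x in range(1, z):
            y = math.isqrt(z * z - x * x)
            if y * y == z * z - x * x and math.gcd(x, y) == 1:
                out.add((min(x, y), max(x, y), z))
    return out

def parametrized(limit):
    """Triples (a^2 - b^2, 2ab, a^2 + b^2) for coprime a > b of opposite parity."""
    out = set()
    for a in range(2, limit):
        for b in range(1, a):
            if math.gcd(a, b) == 1 and (a - b) % 2 == 1:
                x, y, z = a * a - b * b, 2 * a * b, a * a + b * b
                if z < limit:
                    out.add((min(x, y), max(x, y), z))
    return out

assert primitive_triples(100) == parametrized(100)
print(sorted(parametrized(40)))
```

The two sets agree, e.g. $(3, 4, 5)$ comes from $(a, b) = (2, 1)$ and $(5, 12, 13)$ from $(3, 2)$.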
Given that the proof of this took more than 350 years to arise, it is clearly out of
the scope of this class, but we can investigate this problem for small values of n.
Suppose this is true. Then, Fermat's Last Theorem is reduced to the case when $n = p$ is an odd prime. To see why, let $n = mp$ for some odd prime $p$. Any solution to $x^n + y^n = z^n$ can be rewritten as $(x^m)^p + (y^m)^p = (z^m)^p$, so if the exponent-$p$ equation has no solutions, then neither does the exponent-$n$ equation. The only case not covered here is when $n$ is not divisible by an odd prime, i.e., when $n$ is a power of $2$. But this is covered by the $n = 4$ case above, which we will now prove.
Proof. We may assume (x, y, z) are pairwise coprime, as if p divided two of them, then
the equation forces p to divide the third, and then we could consider the smaller solution
(x/p, y/p, z/p2 ).
Suppose (x, y, z) is the smallest solution to this equation; by smallest, we mean |z| is
minimized. If (x, y, z) satisfies the equation, then (x2 , y 2 , z) is a primitive Pythagorean
triple. We found a general form for Pythagorean triples! This means that there exist
coprime k, ℓ ∈ N such that
$$x^2 = k^2 - \ell^2, \qquad y^2 = 2k\ell, \qquad z = k^2 + \ell^2.$$
$$z' \le (z')^2 = a^2 + b^2 = k \le k^2 < z,$$
so we found a smaller solution (x′ , y ′ , z ′ ), and the proof concludes by an infinite descent
argument.
From our remarks above, now we are just left with the cases of Fermat’s Last
Theorem when n = p is an odd prime. Again, we won’t do this in full generality, but
we have the tools to prove it for n = 3. We will do this next time.
Remark 13.6. We can list some primes p such that 2p + 1 is also prime: 3,5,11,
23, ... It is actually an open problem whether or not there are infinitely many such
primes, yet another exhibit for why primes are so elusive.
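Primes $p$ with $2p + 1$ also prime are nowadays called Sophie Germain primes; a two-liner extends the list from the remark (note $p = 2$ also qualifies, though the remark starts at $3$):

```python
# List primes p such that 2p + 1 is also prime ("Sophie Germain primes").

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

germain = [p for p in range(2, 200) if is_prime(p) and is_prime(2 * p + 1)]
print(germain)   # [2, 3, 5, 11, 23, 29, 41, 53, 83, 89, 113, 131, 173, 179, 191]
```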
but r ̸= p, so y p−1 ≡ 0 (mod r) which is only possible if r | y. But then this means
r | z as well, contradicting y, z being relatively prime.
Thus, since the two factors are coprime and their product $(-z)^p$ is a perfect $p$th power, each of the factors must be a perfect $p$th power. Write $x + y = A^p$ and $x^{p-1} - x^{p-2}y + \cdots + y^{p-1} = T^p$ for $A, T \in \mathbb{Z}$. But we can do this same process for the equivalent
equations xp +z p = (−y)p and y p +z p = (−x)p to get x+z = B p and y+z = C p for some
B, C ∈ Z. Letting q := 2p + 1, which by assumption is prime, we have p = (q − 1)/2,
so xp + y p + z p = 0 implies
$$x^{\frac{q-1}{2}} + y^{\frac{q-1}{2}} + z^{\frac{q-1}{2}} \equiv 0 \pmod q.$$
If $q \nmid xyz$, then each term in the sum is either $\pm1$ (recall $x^{\frac{q-1}{2}} = x^p$, which is $\pm1 \bmod q$ for $q \nmid x$), so the left hand side cannot possibly be $0$ mod $q$. WLOG suppose $q \mid z$
but q ∤ x, y. Using x + y = Ap , x + z = B p , y + z = C p , we have B p + C p − Ap = 2z,
which means
$$B^{\frac{q-1}{2}} + C^{\frac{q-1}{2}} - A^{\frac{q-1}{2}} \equiv 0 \pmod q.$$
By the same argument as in the above paragraph (each term is ±1 mod q), this
forces q | ABC. But looking at the definitions of A, B, C tells us q ∤ B and q ∤ C, so we
must have $q \mid A$. So now we return to $A$: we have $x + y = A^p \equiv 0 \pmod q$, meaning $y \equiv -x \pmod q$. We haven't talked about $T$ yet, so let's bring that in now: using $y \equiv -x$, we see $T^p = x^{p-1} - x^{p-2}y + \cdots + y^{p-1} \equiv p\,y^{p-1} \pmod q$. Noting that since $q \mid z$, we have $C^p = y + z \equiv y \pmod q$, so $T^p \equiv p\,(C^p)^{p-1} \pmod q$. Rewriting, we obtain
$$\frac{q-1}{2} = p \equiv \bigl(T \cdot (C^{p-1})^{-1}\bigr)^p \equiv \pm1 \pmod q,$$
where the last congruence holds because the $p$th power of any unit mod $q$ is $\pm1$ (as $p = \frac{q-1}{2}$),
so −1/2 ≡ ±1 (mod q). But p ≥ 3 implies q ≥ 7, and this equality cannot hold when
q ≥ 7, so we have reached a contradiction! Sophie Germain’s Theorem follows.
Note this is actually stronger than what we need for Fermat’s Last Theorem, but
hey, we will take stronger statements any day.
Lemma 14.2
Z[ω] is a Euclidean domain.
Before we start working with $\omega$, let's lay out a few identities we will consistently use. First, note $\bar\omega = \omega^2 = \omega^{-1}$, and $\omega^3 - 1 = 0 \implies 1 + \omega + \omega^2 = 0$ (since $\omega \ne 1$). This means $\omega + \bar\omega = \omega + \omega^{-1} = -1$.
$$\alpha\bar\alpha = (a + b\omega)(a + b\bar\omega) = a^2 + b^2 + ab(\omega + \bar\omega) = a^2 - ab + b^2 \in \mathbb{Q}.$$
Our statement (Theorem 14.1) mentions the units of Z[ω]; let us state what they
are.
Lemma 14.3
Z[ω]× = {±1, ±ω, ±ω 2 }.
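The lemma can be corroborated by a direct search: using $N(a + b\omega) = a^2 - ab + b^2$, the elements of norm $1$ in a small box are exactly six, matching $\{\pm1, \pm\omega, \pm\omega^2\}$. A small sketch (the box size is just large enough, since the norm grows quadratically):

```python
# Solve a^2 - a*b + b^2 = 1 over a small box of integers; the solutions
# are the units of Z[w], w a primitive cube root of unity.

def norm(a, b):
    return a * a - a * b + b * b

units = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if norm(a, b) == 1]
print(units)
# Exactly six: (±1, 0) = ±1, (0, ±1) = ±w, and ±(1, 1) = ±(1 + w) = ∓w^2,
# using 1 + w + w^2 = 0.
```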
14.2 Properties of λ = 1 − ω
We continue with proving more properties of elements in Z[ω] to build up machinery.
Lemma 14.4
Let λ = 1 − ω. Then,
1. N (λ) = 3,
2. λ is irreducible in Z[ω],
3. (λ2 ) = (3),
4. Z[ω]/λZ[ω] ≃ Z/3Z.
Proof. Let’s see how fast we can work through all of these.
The first one is literally just using the equation $N(a + b\omega) = a^2 - ab + b^2$: here $N(\lambda) = N(1 - \omega) = 1^2 - (1)(-1) + (-1)^2 = 3$. Let's move on to (2). If $\lambda = \alpha\beta$, then $3 = N(\lambda) = N(\alpha)N(\beta)$, which is only possible if one of $N(\alpha)$ or $N(\beta)$ is $1$, i.e., one of $\alpha, \beta$ is a unit.
For (3), note λ2 = (1 − ω)2 = 1 − 2ω + ω 2 = −3ω. Taking the ideals generated on
each side and noting −ω is a unit, we get the result.
The hardest part is (4), but it is really not that bad. In fact, you did most of this
work in a previous problem set! (Problem Set 2, maybe?) The homework problem
$$x^3 - 1 = (x-1)(x-\omega)(x-\omega^2) = \lambda t\,(1 - \omega + \lambda t)(1 - \omega^2 + \lambda t) = \lambda t\,(\lambda + \lambda t)\bigl(\lambda(1+\omega) + \lambda t\bigr) = \lambda^3\,t(1+t)(1+\omega+t).$$
Proof. We know from Lemma 14.4.2 that $\lambda$ is irreducible, so if $\lambda \nmid xyz$, then $\lambda$ divides none of $x, y, z$. Consider the equation $x^3 + y^3 = uz^3$ modulo $\lambda^4$. From Fact 14.5 above, the left hand side is of the form $\pm1 \pm 1$, which takes on values $\{0, \pm2\}$.
Meanwhile, the right hand side is congruent to ±u, but we found all the units of Z[ω]
in Lemma 14.3! So the right hand side takes on values {±1, ±ω, ±ω 2 }. There is no
overlap mod λ4 (note (λ4 ) = (9), and it is clear none of these values are congruent mod
9), so there are no solutions.
Lemma 14.7
If x3 + y 3 = uz 3 , with λ ∤ xy and λ | z, then λ2 | z, i.e., ordλ z ≥ 2.
Proof. Again, we use the incredibly useful Fact 14.5. We reduce our equation to mod
λ4 . Again, like above, the left hand side takes on values {0, ±2}. Let L be the value
of the left hand side. Then, we have L ≡ uz 3 (mod λ4 ), but λ | z, so we must
have L ≡ 0 (mod λ). This is only true when L = 0, so uz 3 ≡ 0 (mod λ4 ). Thus,
ordλ z 3 = 3 ordλ z ≥ 4, which means ordλ z ≥ 2, as desired.
We will again use the method of infinite descent. The following result activates the
descent step by finding a smaller solution given an initial one.
Lemma 14.8
If x3 + y 3 = uz 3 with λ ∤ xy and ordλ z ≥ 2, then there exist x1 , y1 , z1 ∈ Z[ω],
where x1 y1 z1 ̸= 0, and a unit u1 ∈ Z[ω]× such that x31 + y13 = u1 z13 with λ ∤ x1 y1
and ordλ z1 = ordλ z − 1.
As a consequence, the infinite descent method tells us that there are no solutions when
λ ∤ xy.
Proof. If α, β ∈ Z[ω] are nonzero, then in general we have the inequality ordλ (α + β) ≥
min(ordλ α, ordλ β). Equality holds when ordλ α ̸= ordλ β. WLOG let s = ordλ α <
ordλ β. Then, we can write α + β = λs (α/λs + β/λs ).
We may assume x, y, z have no common factors in Z[ω]. We can factor our equation
as
(x + y)(x + ωy)(x + ω 2 y) = uz 3 .
From our assumption ordλ z ≥ 2, we have ordλ uz 3 ≥ 6, so one of the factors on the
left hand side must have order ≥ 2. But the terms on the left are all “symmetric” (for
instance, we could replace y with ωy and get the same exact expression), so we can
assume without loss of generality that $\operatorname{ord}_\lambda(x + y) \ge 2$. Then, $(x + y) - (x + \omega y) = (1 - \omega)y = \lambda y$, and as $\lambda \nmid y$, this means $\operatorname{ord}_\lambda((x+y) - (x+\omega y)) = 1$. By the work we did in the beginning of the proof, it follows that $\operatorname{ord}_\lambda(x + \omega y) = 1$. Likewise, $\operatorname{ord}_\lambda(x + \omega^2 y) = 1$, so $\operatorname{ord}_\lambda(x + y) = 3\operatorname{ord}_\lambda z - 2$.
If π ∤ λ is an irreducible and π | x+y, x+ωy, then π | (x+y)−(x+ωy) = (1−ω)y =
λy, which is only possible if π | y. But then π | x, which contradicts our assumption
that x, y, z share no common factors. Likewise, we have that all three of the terms on
the left are pairwise coprime. But their product is a perfect cube up to some unit, so
we can write
x + y = u1 α3 λt (3)
x + ωy = u2 β 3 λ (4)
x + ω 2 y = u3 γ 3 λ, (5)
where $t = 3\operatorname{ord}_\lambda z - 2$, the $u_i$ are units, and $(\alpha, \beta, \gamma) = 1$. Note that $(x+y) + \omega(x+\omega y) + \omega^2(x+\omega^2 y) = x(1 + \omega + \omega^2) + y(1 + \omega^2 + \omega^4) = 0$ (as $\omega^4 = \omega$), so we have
Alright, we are in the thick of it, but we will reach the end of this tunnel soon.
We now want to construct our smaller solution, which completes the descent step.
Let $z_1 = \alpha\lambda^{(t-1)/3}$ (in particular, $\frac{t-1}{3} = \operatorname{ord}_\lambda z - 1$), $y_1 = \gamma$, and $x_1 = \beta$. Also, let $\varepsilon_2 = (-u_2\omega)^{-1}u_1$ and $\varepsilon_1 = (u_2\omega)^{-1}u_3\omega^2$; note that both $\varepsilon_1, \varepsilon_2$ are units. This leaves us with the equation $\varepsilon_2 z_1^3 = x_1^3 + \varepsilon_1 y_1^3$.
Now we may reduce this equation mod $\lambda^2$. Recall $z_1 = \alpha\lambda^{(t-1)/3}$, so $\operatorname{ord}_\lambda z_1^3 \ge 3 \cdot 1 > 2$, so the left hand side is congruent to $0$ mod $\lambda^2$. Following Fact 14.5, the right hand side reduces to $\pm1 \pm \varepsilon_1 \pmod{\lambda^2}$, and combining with $(\lambda^2) = (3)$ (Lemma 14.4.3), we get $\varepsilon_1 \equiv \pm1 \pmod 3$, which forces $\varepsilon_1 = \pm1$. Then, $\varepsilon_2 z_1^3 = x_1^3 \pm y_1^3$; replacing $y_1$ with $-y_1$ if the sign is negative, we get $\varepsilon_2 z_1^3 = x_1^3 + y_1^3$, as desired.
As mentioned before the proof, the infinite descent method tells us that there are
no solutions when λ ∤ xy, as we can continue decreasing ordλ z but the order must stay
non-negative.
Corollary 14.9
The equation $x^3 + y^3 = uz^3$, where $u \in \mathbb{Z}[\omega]^\times$ and $x, y, z \in \mathbb{Z}[\omega]$ with $xyz \ne 0$, has no solutions.
Proof. We may assume (x, y, z) = 1, else we can divide through by their common factor.
We proved this above for when λ ∤ xy. Suppose λ | x. Then, we must have λ ∤ yz for
Remark 15.2. Although $\mathbb{Z}[\sqrt d]$ does not have any imaginary part ($d \in \mathbb{N}$ so $\sqrt d \in \mathbb{R}$), we still have a notion akin to conjugation. In particular, the map $\mathbb{Z}[\sqrt d] \to \mathbb{Z}[\sqrt d]$ sending $a + b\sqrt d \mapsto a - b\sqrt d$ is an automorphism (bijective homomorphism)! Denoting $\overline{a + b\sqrt d} = a - b\sqrt d$, one can check that $\overline{\alpha + \beta} = \bar\alpha + \bar\beta$ and $\overline{\alpha \cdot \beta} = \bar\alpha \cdot \bar\beta$.
This is nice because it verifies that all of our claimed solutions for Pell's equation as described in Theorem 15.1 are indeed solutions! If $N(\alpha) = 1$ (so writing $\alpha = x_1 + y_1\sqrt d$, we have $x_1^2 - dy_1^2 = 1$), then by multiplicativity of the norm, we have $N(\alpha^n) = 1$, so $(x_n, y_n)$ is also a solution to the equation. Also, if we find some $\alpha$ such that $N(\alpha) = -1$, then we can recover a solution to Pell's equation by just looking at $\alpha^2$, since $N(\alpha^2) = (-1)^2 = 1$.
Example 15.3
Consider $x^2 - 5y^2 = 1$. We have $2^2 - 5 = -1$, so we look at $(2 + \sqrt5)^2 = 9 + 4\sqrt5$. Indeed, $9^2 - 5 \cdot 4^2 = 1$, so we found a solution!
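Continuing the example in exact integer arithmetic (the helper `mul` is mine): powers of $9 + 4\sqrt5$ generate further solutions, exactly as Theorem 15.1 promises.

```python
def mul(u, v, d):
    """Multiply a + b*sqrt(d) by c + e*sqrt(d), as integer pairs."""
    a, b = u
    c, e = v
    return (a * c + d * b * e, a * e + b * c)

d, alpha = 5, (9, 4)
x, y = 1, 0                        # start from 1 = alpha^0
for _ in range(4):
    x, y = mul((x, y), alpha, d)   # next power of alpha
    assert x * x - d * y * y == 1  # still a solution, since N(alpha^n) = 1
    print(x, y)
# 9 4, then 161 72, then 2889 1292, ...
```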
Lemma 15.4
Let ξ ∈ R be irrational. Then, there are infinitely many x/y ∈ Q (where x, y ∈ Z
and (x, y) = 1) such that
$$\left|\frac{x}{y} - \xi\right| < \frac{1}{y^2}.$$
This is quite a powerful statement! The error being bounded by 1/y 2 is forcing the
error to be impressively small.
Before we begin the proof, let us lay out some notation. For α ∈ R, denote [α] as
the largest integer less than or equal to α, and denote {α} = α − [α] ∈ [0, 1) as the
fractional part of α.
Proof. Choose some n ∈ N. We divide up the interval [0, 1) = [0, 1/n)∪[1/n, 2/n)∪· · ·∪
[(n − 1)/n, 1) into n equal subintervals. Consider the list 0, {ξ}, {2ξ}, . . . , {nξ}. This
has n + 1 terms, so by Pigeonhole Principle, two of them are in the same subintervals.
In other words, ∃ 0 ≤ j < k ≤ n such that |{jξ} − {kξ}| < 1/n. Rewriting this via
{ξ} = ξ − [ξ], we have
|jξ − kξ + [kξ] − [jξ]| < 1/n.
Letting x = [kξ] − [jξ] and y = k − j, we have |x − yξ| < 1/n. Note that both
0 ≤ j, k ≤ n, so we have y = k − j ≤ n, meaning
$$\left|\frac{x}{y} - \xi\right| < \frac{1}{ny} \le \frac{1}{y^2}.$$
Great, so this gives us one fraction x/y satisfying the inequality. Let us generate
infinitely many more! Note that we could run the same argument for any n, but we
must be careful in not producing the same fraction x/y over and over again. We will
be more careful in our choice of n.
Since $\xi$ is irrational, we know $|x/y - \xi| \ne 0$; choose $m \in \mathbb{N}$ such that $|x/y - \xi|^{-1} < m$.
Now, run the same argument as above but with m instead of n to get some fraction
x1 /y1 such that
$$\left|\frac{x_1}{y_1} - \xi\right| < \frac{1}{my_1} \le \frac{1}{m} < \left|\frac{x}{y} - \xi\right|,$$
and we know xy11 satisfies the inequality by construction. We can continue this process
to construct infinitely many xn /yn ∈ Q satisfying the inequality, as desired.
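The pigeonhole step is entirely constructive, so we can run it; below is a sketch (names mine) for $\xi = \sqrt2$, finding $x, y$ with $|x - y\xi| < 1/n$:

```python
import math

def dirichlet_approx(xi, n):
    """Pigeonhole on the fractional parts {0}, {xi}, ..., {n*xi}:
    returns (x, y) with 1 <= y <= n and |x/y - xi| < 1/(n*y)."""
    buckets = {}
    for k in range(n + 1):
        idx = int((k * xi - math.floor(k * xi)) * n)  # subinterval [i/n, (i+1)/n)
        if idx in buckets:
            j = buckets[idx]
            return math.floor(k * xi) - math.floor(j * xi), k - j
        buckets[idx] = k
    raise AssertionError("n + 1 values in n boxes must collide")

xi = math.sqrt(2)
for n in (10, 100, 1000):
    x, y = dirichlet_approx(xi, n)
    print(x, y, abs(x / y - xi))
    assert abs(x / y - xi) < 1 / y ** 2
```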
Proof. By the previous lemma (15.4) with $\xi = \sqrt d$, there are infinitely many $x/y \in \mathbb{Q}$, with $x, y \in \mathbb{N}$ and $(x, y) = 1$, such that $|x/y - \sqrt d| < 1/y^2$, or equivalently $|x - y\sqrt d| < 1/y$. Noting $x + y\sqrt d = (x - y\sqrt d) + 2y\sqrt d$, so by the Triangle Inequality $|x + y\sqrt d| \le |x - y\sqrt d| + |2y\sqrt d| < 1/y + 2y\sqrt d$, we have
$$|x^2 - dy^2| = |(x - y\sqrt d)(x + y\sqrt d)| < \frac1y\left(\frac1y + 2y\sqrt d\right) = \frac1{y^2} + 2\sqrt d \le 2\sqrt d + 1.$$
Taking $M = 2\sqrt d + 1$ will do the trick.
Proof of Theorem 15.1. By Lemma 15.5 above, there exists some $m \in \mathbb{Z}$ such that $x^2 - dy^2 = m$ has infinitely many solutions $x, y \in \mathbb{Z}$. Fix such an $m$, and let the solutions be $(x_n, y_n)$ for $n \in \mathbb{N}$. By the Pigeonhole Principle again (there are only finitely many pairs of residues mod $m$), there must be two solutions $(x_i, y_i)$ and $(x_j, y_j)$ such that $x_i \equiv x_j \pmod m$ and $y_i \equiv y_j \pmod m$.
Let $\alpha = x_i - y_i\sqrt d$ and $\beta = x_j - y_j\sqrt d$. Note $N(\alpha) = x_i^2 - dy_i^2 = m$, and likewise $N(\beta) = m$.
$$\bar\alpha \cdot \beta = \bar\alpha\bigl(\alpha + (\beta - \alpha)\bigr) = \bar\alpha \cdot \alpha + \bar\alpha(\beta - \alpha).$$
Let $A + B\sqrt d = \bar\alpha \cdot \beta$. Note that $\bar\alpha \cdot \alpha = m$ and, by our choice of $i, j$, we have $m \mid \beta - \alpha$, so $m \mid A$ and $m \mid B$. Thus, we can write $A + B\sqrt d = m(u + v\sqrt d)$ for some $u, v \in \mathbb{Z}$. We have $N(A + B\sqrt d) = N(\bar\alpha)N(\beta) = m \cdot m = m^2$, so
$$m^2 = N(m(u + v\sqrt d)) = N(m)N(u + v\sqrt d) = m^2 N(u + v\sqrt d),$$
so $N(u + v\sqrt d) = 1$. Aha, this is promising – we know elements with norm 1 correspond to a solution of Pell's equation! Furthermore, the coefficient of $\sqrt d$ in $\bar\alpha\beta$ is $mv = y_i x_j - x_i y_j$, which is nonzero: if $y_i x_j = x_i y_j$, then $(x_j, y_j) = \pm(x_i, y_i)$, and we may assume our infinitely many solutions are positive and distinct. In particular, $v \ne 0$.
Choose a solution $(x, y)$ of $x^2 - dy^2 = 1$ with $x, y \in \mathbb{N}$ and $x$ as small as possible. Let $\alpha = x + y\sqrt d$, and denote $\beta = u + v\sqrt d$. (I know we defined $\alpha$ and $\beta$ earlier in the proof, but we are repurposing them here. Sorry!) We observed above that $N(u + v\sqrt d) = u^2 - dv^2 = 1$. Since $\alpha$ is our minimal solution, we have $\beta > \alpha$. This
means either β lies between αn and αn+1 for some n ∈ N, or it is equal to αn on the
dot for some n. The second case is what we want, so we suppose the first case is true,
and we hope to reach a contradiction.
Suppose $\alpha^n < \beta < \alpha^{n+1}$. Multiplying by $\bar\alpha^n$ on both sides (note $\bar\alpha = 1/\alpha > 0$), we have
$$1 = N(\alpha)^n = \alpha^n \cdot \bar\alpha^n < \beta \cdot \bar\alpha^n < \alpha^{n+1} \cdot \bar\alpha^n = \alpha.$$
Let $\gamma = \beta \cdot \bar\alpha^n$. We have $N(\gamma) = N(\beta) \cdot N(\alpha)^n = 1$ and $1 < \gamma < \alpha$, but this contradicts the minimality of $\alpha$, so we must have $\beta = \alpha^n$ for some $n$, as desired.
$$(X - \alpha)(X - \bar\alpha) = X^2 - (\alpha + \bar\alpha)X + \alpha\bar\alpha = X^2 - X + \frac{1 - d}{4} \in \mathbb{Z}[X].$$
Thus, when $d \equiv 1 \pmod 4$, it may make sense to consider the ring $\mathbb{Z}\bigl[\frac{1+\sqrt d}{2}\bigr]$ instead of $\mathbb{Z}[\sqrt d]$. This is in fact what happens in number theory! To give a bit more explanation, what we actually want is to find the "integers" in the field $\mathbb{Q}(\sqrt d)$. In $\mathbb{Q}$, we can recover $\mathbb{Z}$ creatively via Proposition 8.3: the integers are the algebraic integers contained in $\mathbb{Q}$. Likewise, we can define the "integers" of $\mathbb{Q}(\sqrt d)$ as the algebraic integers contained in this field. It turns out that this ring of integers is $\mathbb{Z}\bigl[\frac{1+\sqrt d}{2}\bigr]$ when $d \equiv 1 \pmod 4$.
Proposition 17.2
There are infinitely many primes congruent to 3 (mod 4).
Proof. Suppose there are finitely many; let $\{p_1, \ldots, p_r\}$ be the complete list. Consider $N = 4p_1 \cdots p_r - 1$, so $N \equiv 3 \pmod 4$. None of the $p_i$'s divides $N$ (each $p_i$ divides $N + 1$), and $N$ is odd, so all of its prime factors must be $1$ mod $4$. But the product of numbers that are $1$ mod $4$ is still $1$ mod $4$, while $N \equiv 3 \pmod 4$ by construction, so we reach a contradiction.
For this class, we will just concern ourselves with s ∈ R>1 , but its real power comes
from taking s ∈ C. (This requires some knowledge of complex analysis, which is a really
fun subject (everything is so nice in complex analysis!) but goes beyond the scope of
this course. Take Math 113 if you’re interested, though!)
We can check the sum converges for $s > 1$. By the integral test, we can bound
$$(n+1)^{-s} < \int_n^{n+1} t^{-s}\,dt < n^{-s} \implies \zeta(s) - 1 < \int_1^\infty t^{-s}\,dt < \zeta(s),$$
where
$$\int_1^\infty t^{-s}\,dt = \left[\frac{-t^{-s+1}}{s-1}\right]_1^\infty = \frac{1}{s-1} < \infty.$$
Although ζ(s) diverges for s = 1 (it is then the harmonic series, which we showed diverges in one of the first lectures, on the Prime Number Theorem), we see that
the divergence is “not too bad.” (For people who know complex analysis, the following
says the pole of ζ(s) at s = 1 is simple with residue 1.)
Proposition 17.3
lims→1+ (s − 1)ζ(s) = 1.
Proof. From the second line of inequalities above, multiplying by s − 1 gives us ζ(s)(s −
1) − (s − 1) < 1 < ζ(s)(s − 1), so both 1 < ζ(s)(s − 1) and ζ(s)(s − 1) < s. As s → 1+ ,
we get the desired.
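Proposition 17.3 is also easy to see numerically; the sketch below approximates $\zeta(s)$ by a partial sum plus the integral tail $\int_N^\infty t^{-s}\,dt = N^{1-s}/(s-1)$ (the same integral-test estimate as above; the helper name is mine):

```python
def zeta(s, N=10 ** 5):
    """Approximate zeta(s) for s > 1: partial sum plus integral tail."""
    return sum(n ** -s for n in range(1, N)) + N ** (1 - s) / (s - 1)

for s in (1.5, 1.1, 1.01, 1.001):
    print(s, (s - 1) * zeta(s))   # the products approach 1 as s -> 1+
```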
I was trying to avoid discussing what happens when s ∈ C, but Kisin is going full
force with this complex discussion, so let me try to provide some explanation. We can
define the exponential function ez for z ∈ C by ex+iy = ex · eiy . The first part ex is
just the exponential function for the reals, which we know and love. The second is just
eiθ = cos θ + i sin θ; note, importantly, that |eiθ | = 1, so the magnitude of ez really
comes from just the $e^x = e^{\operatorname{Re} z}$ part. Writing $n = e^{\log n}$, we have, for $s = \alpha + i\beta$,
$$n^{-s} = e^{-s\log n} = e^{-\alpha\log n} \cdot e^{-i\beta\log n} = n^{-\alpha}e^{-i\beta\log n}.$$
Corollary 17.4
$$\lim_{s\to1^+} \frac{\log\zeta(s)}{\log\bigl((s-1)^{-1}\bigr)} = 1.$$
Proof. Let ρ(s) = (s − 1)ζ(s), so log ρ(s) = log(s − 1) + log ζ(s). Dividing by log(s −
1)−1 = − log(s − 1) on both sides, we have
Taking the limit as s → 1+ , we note from Proposition 17.3 that lims→1+ ρ(s) = 1, so
Before we begin the formal proof, I think it is useful to just convince yourself,
perhaps slightly non-rigorously, that this is true. A good place to start would be a
simpler problem such as:
Exercise 17.6. What is the sum
$$\sum_{n = 3^a 5^b} \frac{1}{n},$$
where the sum is taken over all $n$ with only $3$ and $5$ in their prime factorization?
Once you get this, it is not too difficult to see how this generalizes when we sum
over all n ∈ N.
Proof. Our good old sum of infinite geometric series tells us that
$$(1 - p^{-s})^{-1} = \frac{1}{1 - p^{-s}} = 1 + p^{-s} + p^{-2s} + \cdots.$$
Take some N ∈ N. We know, trivially, that any n ≤ N must factor into primes also
at most N . Thus, by unique factorization,
$$\prod_{p \le N}(1 - p^{-s})^{-1} = \prod_{p \le N}\bigl(1 + p^{-s} + p^{-2s} + \cdots\bigr) = \sum_{n \le N} n^{-s} + R_N(s),$$
as desired.
This is perhaps the simplest example of an Euler Product. There are many classes
of functions which exhibit an Euler Product similar to this; if it does, then because
this factorization is so nice, it suggests that the function has some really nice proper-
ties/deeper connections to number theory.
One upshot of expressing the zeta function as a product is that when we take the
logarithm, we can split it up based on the terms in the product (we can’t do anything
with log(x + y), but we know log(xy) = log x + log y). We see this here:
Proposition 17.7
For $\operatorname{Re} s > 1$, $\log\zeta(s) = \sum_p p^{-s} + R(s)$ for some function $R(s)$ bounded near $s = 1$.
$$-\log(1 - x) = x + \frac{x^2}{2} + \cdots = \sum_{n \ge 1}\frac{x^n}{n}.$$
for some function $\lambda_N$ with $\lim_{N\to\infty}\lambda_N(s) = 1$. Taking the logarithm on both sides and using our Taylor series expansion above gives
$$\log\zeta(s) = -\sum_{p\le N}\log(1 - p^{-s}) + \log\lambda_N(s) = \sum_{p\le N}\sum_{m\ge1}\frac{p^{-ms}}{m} + \log\lambda_N(s)$$
$$\implies \log\zeta(s) = \lim_{N\to\infty}\left(\sum_{p\le N}\sum_{m\ge1}\frac{p^{-ms}}{m} + \log\lambda_N(s)\right) = \sum_p\sum_{m\ge1}\frac{p^{-ms}}{m} + \log 1 = \sum_p p^{-s} + \sum_p\sum_{m\ge2}\frac{p^{-ms}}{m}.$$
The double sum at the very end is our $R(s)$. Looking at the inner sum separately, we can bound
$$\sum_{m\ge2}\frac{p^{-ms}}{m} \le \sum_{m\ge2}p^{-ms} = p^{-2s}\bigl(1 + p^{-s} + \cdots\bigr) = p^{-2s}(1 - p^{-s})^{-1}.$$
Therefore,
$$R(s) \le \sum_p p^{-2s}(1 - p^{-s})^{-1} \le (1 - 2^{-s})^{-1}\sum_p p^{-2s} \le (1 - 2^{-s})^{-1}\zeta(2) \le 2\zeta(2),$$
To illustrate what is going on, recall Corollary 17.4, which told us a certain limit is
1. We can rewrite it as
$$\lim_{s\to1^+}\frac{\log\zeta(s)}{\log\bigl((s-1)^{-1}\bigr)} = \lim_{s\to1^+}\frac{\sum_p p^{-s}}{\log\bigl((s-1)^{-1}\bigr)} = 1.$$
This is the simplest example of Dirichlet density: this tells us that the density of all
primes in, well, all primes is 1.
Dirichlet's Theorem cares about primes congruent to $n$ mod $m$ (for $(n, m) = 1$), so let us define $P(n, m)$ as the set of primes $p$ such that $p \equiv n \pmod m$. Then, we can reformulate
Dirichlet’s Theorem as:
In fact, this is stronger than our original statement of Dirichlet’s Theorem, as it not
only guarantees infinitely many primes in each residue class, but they are distributed
equally.
Example 17.10
Taking m = 4 and noting that the only possible residue classes are 1 and 3 mod 4
(excluding the prime p = 2), we have d(P (1, 4)) = d(P (3, 4)) = 1/2, i.e., “half” of
the primes are in each residue class.
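This "half and half" behavior is already visible in modest data; a quick sieve (names mine) counts the primes up to $10^5$ in each class. (This measures ordinary proportions rather than Dirichlet density, but the two agree here.)

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, ok in enumerate(sieve) if ok]

ps = primes_up_to(10 ** 5)
c1 = sum(p % 4 == 1 for p in ps)
c3 = sum(p % 4 == 3 for p in ps)
print(c1, c3, c1 / (c1 + c3))   # both classes hold close to half the primes
```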
Remark 17.11. A couple of remarks. If P1 , P2 are disjoint sets of primes, then the
definition tells us d(P1 ∪P2 ) = d(P1 )+d(P2 ), which should agree with our intuition.
If P is finite, then it also makes sense that d(P ) = 0, which we can observe from the
definition of the Dirichlet density. (If the set is finite, the numerator is bounded,
but the denominator goes to infinity.) Thus, since Dirichlet’s Theorem as described
above guarantees d(P (n, m)) = 1/ϕ(m) > 0, it follows that there are infinitely
many primes in P (n, m).
(For any m, we can define a Dirichlet character χ, and then the definition of L(s, χ)
would be the same.)
Since the nonzero values of $\chi$ are $\pm1$, we have $|\chi(n)n^{-s}| \le |n^{-s}|$, so the Triangle Inequality tells us
$$|L(s,\chi)| = \Bigl|\sum_{n\ge1}\chi(n)n^{-s}\Bigr| \le \sum_{n\ge1}|\chi(n)n^{-s}| \le \sum_{n\ge1}|n^{-s}|,$$
Proposition 17.13
d(P (1, 4)) = d(P (3, 4)) = 1/2.
Proof. Let $\zeta^*(s) = \sum_{2\nmid n} n^{-s}$. For any $n = 2^k m$ with $m$ odd, we make the simple observation:
(A more direct way to see this is that, just like how we can construct an Euler
product for ζ(s), we can do the same for ζ ∗ (s), except we omit p = 2 since we are only
summing over odd n.)
Akin to Proposition 17.7, with just omitting the p = 2 term in the sum, we have
$$\log\zeta^*(s) = \sum_{p\ne2}p^{-s} + R_2(s),$$
where R2 (s) is bounded near s = 1. We can do the same to L(s, χ), which is similar
to ζ ∗ (s) but where the coefficients in the sum expansion alternate between ±1. Again,
akin to Proposition 17.7, we have
$$\log L(s,\chi) = \sum_p \chi(p)p^{-s} + R_\chi(s),$$
Since $\chi(p) = 1$ iff $p \equiv 1 \pmod 4$ and the main sum in $\log\zeta^*(s)$ has coefficients all $1$, we have
$$2\sum_{p\equiv1(4)}p^{-s} = \sum_{p\ne2}p^{-s} + \sum_p\chi(p)p^{-s}, \qquad 2\sum_{p\equiv3(4)}p^{-s} = \sum_{p\ne2}p^{-s} - \sum_p\chi(p)p^{-s}$$
$$\implies \log\zeta^*(s) + \log L(s,\chi) = 2\sum_{p\equiv1(4)}p^{-s} + R(s) \tag{*}$$
$$\log\zeta^*(s) - \log L(s,\chi) = 2\sum_{p\equiv3(4)}p^{-s} + R(s),$$
where the R(s) error terms are bounded near s = 1. (Here, the two R(s)’s are different,
I am just abusing notation because they don’t really matter.)
We can construct crude bounds for $\log L(s, \chi)$ by grouping the terms of the sum in two simple ways:
so 2/3 < L(s, χ) < 1 for s > 1. Taking the log, we have log 2/3 < log L(s, χ) < 0. In
particular, this means log L(s, χ) is finite, so we have
so d(P (1, 4)) = 1/2 and consequently d(P (3, 4)) = 1/2 as well.
Dirichlet characters are very important, but developing the theory might feel a bit
like eating vegetables. Bear with us for a bit.
For $m = 4$, we defined this character $\chi : \mathbb{Z} \smallsetminus 2\mathbb{Z} \to (\mathbb{Z}/4\mathbb{Z})^\times \xrightarrow{\ \simeq\ } \{\pm1\}$, where an element maps to its residue mod $4$, and we could extend this to be $0$ on the even integers.
This is a baby example of the Dirichlet character for a general m. Consider a map
χ : (Z/mZ)× → C×
which is a group homomorphism (i.e. χ(ab) = χ(a)χ(b), and consequently χ(1) = 1).
Recall $|(\mathbb{Z}/m\mathbb{Z})^\times| = \phi(m)$, so by multiplicativity, $\chi(a)^{\phi(m)} = \chi(a^{\phi(m)}) = \chi(1) = 1$, hence
$$\chi(a) = e^{2\pi i k/\phi(m)}$$
for some $k \in \mathbb{Z}$ (depending on $a$).
For instance, let m = p be prime. We know (Z/pZ)× is cyclic, so (Z/pZ)× ≃
Z/(p − 1)Z. Thus, we can consider χ : (Z/pZ)× → C× as a function from Z/(p − 1)Z,
and the map would be
$$\chi : \mathbb{Z}/(p-1)\mathbb{Z} \to \mathbb{C}^\times, \qquad a \mapsto e^{2\pi i a k/(p-1)}$$
for some $k \in \mathbb{Z}$.
Like in the m = 4 case, we can extend these characters to be a Dirichlet character
mod m.
Definition 18.1 (Dirichlet character). A Dirichlet character mod $m$ is a map $\chi : \mathbb{Z} \to \mathbb{C}$ such that for $a \in \mathbb{Z}$,
1. if $(a, m) > 1$, then $\chi(a) = 0$;
2. otherwise, there exists some $\chi : (\mathbb{Z}/m\mathbb{Z})^\times \to \mathbb{C}^\times$ such that $\chi(a) = \chi(a \bmod m)$.
Example 18.2
The simplest character is the trivial one, where χ extends from the trivial character
χ : (Z/mZ)× → C× where χ(a) = 1 for all a ∈ (Z/mZ)× .
Example 18.3
Consider A ∼ = Z/nZ, and let r be a generator of Z/nZ. (This amounts to just
having (r, n) = 1, as we’ve seen repeatedly.) Like we worked out before, χ(r) ∈ C×
has to satisfy $\chi(r)^n = \chi(r^n) = \chi(1) = 1$, so $\chi(r) = e^{2\pi i k/n}$ for some $k \in \mathbb{Z}$. Denote $\zeta_n = e^{2\pi i k/n}$. Then, $\chi(r^j) = \zeta_n^j$ by multiplicativity, and this completely determines $\chi$. But all characters mod $n$ must be of this form, so a character really depends only on the choice of $k$ in the exponent. As $k$ ranges across $\mathbb{Z}/n\mathbb{Z}$ (technically $k$ ranges over all of $\mathbb{Z}$, but $k$ and $k + n$ produce the same character), we conclude $\hat A \simeq \mathbb{Z}/n\mathbb{Z}$, so in fact $A \simeq \hat A$. This may not seem so significant at first, but it is remarkably deep.
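These characters are concrete enough to build in a few lines; the sketch below (names mine) constructs all $n$ characters of $\mathbb{Z}/n\mathbb{Z}$ and verifies that they are pairwise orthogonal, previewing the relation of Proposition 18.6 below:

```python
import cmath

def character(n, k):
    """The character of Z/nZ sending j to e^(2*pi*i*k*j/n)."""
    return lambda j: cmath.exp(2j * cmath.pi * k * j / n)

n = 6
chars = [character(n, k) for k in range(n)]
for k, chi in enumerate(chars):
    for l, psi in enumerate(chars):
        s = sum(chi(a) * psi(a).conjugate() for a in range(n))
        # sum is n when the characters coincide, 0 otherwise
        assert abs(s - (n if k == l else 0)) < 1e-9
print("all", n * n, "pairs satisfy the orthogonality relation")
```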
Proof. We proved this already for cyclic groups, as any finite cyclic group is isomorphic
to some Z/nZ.
In general, the Classification of Finite Abelian Groups tells us that any finite abelian
group is of the form
A ≃ Z/n1 Z × · · · × Z/nr Z,
where the group operation is just addition component-wise.
Since these components are independent of each other, any $\chi \in \hat A$ restricts to a character on each of the components, and given characters on each component, we can patch them together via multiplication to produce a character on all of $A$. Thus, specifying $\chi \in \hat A$ is equivalent to giving characters $\chi_i \in \widehat{\mathbb{Z}/n_i\mathbb{Z}}$ for $1 \le i \le r$, as $\chi|_{\mathbb{Z}/n_i\mathbb{Z}} = \chi_i$ and we can reconstruct $\chi$ from the $\chi_i$'s via $\chi(a_1, \ldots, a_r) = \chi_1(a_1)\chi_2(a_2)\cdots\chi_r(a_r)$. Now, we use our work done for cyclic groups to get
$$\hat A \simeq \prod_{i=1}^r \widehat{\mathbb{Z}/n_i\mathbb{Z}} \simeq \prod_{i=1}^r \mathbb{Z}/n_i\mathbb{Z} \simeq A.$$
This might feel familiar if you've learned some linear algebra: the dual of a vector space $V$ is isomorphic to $V$, but not canonically (it requires the choice of a basis). Similarly, we call $\hat A$ the dual of $A$. But we do know from linear algebra that the dual of the dual of $V$ is canonically isomorphic to $V$. We replicate the same result here.
Proof. If $a \in A$, we wish to produce an element of $\hat{\hat A}$, which is a map $\hat A \to \mathbb{C}^\times$. We have a really choice-free way of doing so: we define $\psi_a : \chi \mapsto \chi(a)$. This gives us our isomorphism $A \xrightarrow{\ \simeq\ } \hat{\hat A}$ where $a \mapsto \psi_a$.
We check that this is a bijective homomorphism. We start with the latter: we have,
for a, b ∈ A,
ψab (χ) = χ(ab) = χ(a)χ(b) = ψa (χ)ψb (χ).
It now remains to show a 7→ ψa is injective. This suffices, since we know |A| =
ˆ ˆ
|A|
b = |Â|, so if we get an injective map A → Â, then it must be surjective as well.
Injectivity amounts to proving that for any 1 ̸= a ∈ A, then ψa is not the trivial map,
or equivalently there is some χ ∈ A b such that χ(a) ̸= 1.
Via the decomposition A ≃ Z/n1 Z × · · · × Z/nr Z (where a 7→ (a1 , . . . , ar )), we can
decompose χ into characters (χ1 , . . . , χr ). So now we have reduced this problem to the
cyclic group case. If a ̸= 1, then ai ̸= 1 for some i. Select χi ’s such that for j ̸= i, χj = 1
is the trivial character, and χi (a) ̸= 1. Then, χ(a) = χ1 (a1 ) · · · χr (ar ) = χi (ai ) ̸= 1,
and we win.
Proposition 18.6
Let A be a finite abelian group, and let n = |A|. If χ, ψ ∈ Â, then
Σ_{a∈A} χ(a)ψ̄(a) = n·δ_{χ,ψ} = { n if χ = ψ, 0 if χ ≠ ψ }.
Similarly, if a, b ∈ A, then Σ_{χ∈Â} χ(a)χ̄(b) = n·δ_{a,b}.
Remark 18.7. Note that for a ∈ A and χ ∈ Â, we have χ(a)^n = χ(a^n) = χ(1) = 1, so in particular |χ(a)| = 1. In this case, we have χ̄(a) = 1/χ(a). We will use this repeatedly.
Before we start the proof, we will prove the following lemma, which we have proven
before when the character is the Legendre symbol. In this lemma, 1 represents the
trivial character.
Lemma 18.8
If χ ∈ Â, then
Σ_{a∈A} χ(a) = { n if χ = 1, 0 if χ ≠ 1 }.
Proof of Proposition 18.6. Since ψ̄ = ψ^{−1}, the product χψ̄ is itself a character of A, so
Σ_{a∈A} χ(a)ψ̄(a) = Σ_{a∈A} (χψ̄)(a) = n·δ_{χψ̄,1},
where the last equality follows from Lemma 18.8 above. Since χψ̄ is trivial exactly when χ = ψ, the first statement follows.
For the second statement, we can apply the first relation to Â and use the isomorphism (Â)^ ≃ A to get our desired result.
Corollary 18.9
If χ, ψ are Dirichlet characters mod m, then
Σ_{a=0}^{m−1} χ(a)ψ̄(a) = ϕ(m)·δ_{χ,ψ}.
Proof. Uh oh, we have a notation conflict here with the overline bar, but we will close our eyes and push forward. Let χ and ψ be the extensions of characters χ′, ψ′ on (Z/mZ)×. Since χ(a) = ψ(a) = 0 whenever (a, m) > 1, the Proposition (with n = ϕ(m)) tells us that
Σ_{a=0}^{m−1} χ(a)ψ̄(a) = Σ_{a∈(Z/mZ)×} χ′(a)ψ′̄(a) = ϕ(m)·δ_{χ′,ψ′} = ϕ(m)·δ_{χ,ψ}.
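Here is a quick numerical check of these orthogonality relations (not from lecture) for m = 5, where (Z/5Z)× is cyclic of order 4 with generator 2; the bookkeeping via a discrete log table is my own:

```python
import cmath

m = 5
# Discrete log base 2 in (Z/5Z)^x: 2^0=1, 2^1=2, 2^2=4, 2^3=3.
dlog = {pow(2, j, m): j for j in range(4)}

def chi(k):
    """Dirichlet character mod 5 sending the generator 2 to i^k."""
    def ch(a):
        if a % m == 0:
            return 0  # characters vanish off the units
        return cmath.exp(2j * cmath.pi * k * dlog[a % m] / 4)
    return ch

phi_m = 4
for k in range(4):
    for l in range(4):
        s = sum(chi(k)(a) * chi(l)(a).conjugate() for a in range(m))
        expected = phi_m if k == l else 0
        assert abs(s - expected) < 1e-9
```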
Let us see what happens when χ = 1 is the trivial character. Then, we recover the Riemann zeta function missing finitely many Euler factors; specifically,
L(s, 1) = ∏_{p∤m} (1 − p^{−s})^{−1} = ζ(s) ∏_{p|m} (1 − p^{−s}).
Kisin starts this lecture with a recap of the definition of Dirichlet characters (§18),
the orthogonality conditions (§18.2), and Dirichlet L-functions (§18.3). So in short,
read the previous lecture notes!
We will pick up from the last equation line from the last lecture, namely the
L-function for the trivial character χ = 1. Recall Proposition 17.3, which stated
lims→1+ (s − 1)ζ(s) = 1. Then, we can evaluate
lim_{s→1⁺} (s − 1)L(s, 1) = lim_{s→1⁺} (s − 1)ζ(s) ∏_{p|m} (1 − p^{−s})
= 1 · ∏_{p|m} (1 − p^{−1})
= ϕ(m)/m,
where ϕ(m) is, as it always has been, the Euler totient function. (The last equality just
follows from the formula we gave for ϕ, see Lemma 4.7.)
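The last equality, ∏_{p|m}(1 − p^{−1}) = ϕ(m)/m, is easy to sanity-check exactly in Python (a sketch; `phi` and `prime_factors` are naive helpers of my own, not anything from the course):

```python
from fractions import Fraction
from math import gcd

def phi(m: int) -> int:
    """Naive Euler totient: count the units mod m."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

def prime_factors(m: int) -> set:
    """Set of prime divisors of m, by trial division."""
    ps, p = set(), 2
    while p * p <= m:
        while m % p == 0:
            ps.add(p)
            m //= p
        p += 1
    if m > 1:
        ps.add(m)
    return ps

# phi(m)/m = prod over p | m of (1 - 1/p), checked exactly with Fractions.
for m in range(2, 200):
    prod = Fraction(1)
    for p in prime_factors(m):
        prod *= Fraction(p - 1, p)
    assert Fraction(phi(m), m) == prod
```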
We want to consider log L(s, χ), in a spirit akin to Corollary 17.4. This is relevant because, ultimately, we care about the density of certain primes, and so we need an expression that reflects the Dirichlet density. Given our factorization of L(s, 1), we can write
lim_{s→1⁺} log L(s, 1)/log(s − 1)^{−1} = lim_{s→1⁺} log ζ(s)/log(s − 1)^{−1} + lim_{s→1⁺} log ∏_{p|m}(1 − p^{−s}) / log(s − 1)^{−1}.
The product inside the logarithm of the latter term is a finite product that tends to a nonzero value as s → 1⁺, so the latter term vanishes as s → 1⁺. (The denominator is unbounded.) Thus, we have
lim_{s→1⁺} log L(s, 1)/log(s − 1)^{−1} = 1.
For a general character χ, the role of log L(s, χ) will be played by the series
G(s, χ) = Σ_p Σ_{k=1}^∞ (1/k) χ(p)^k p^{−ks}.
Lemma 19.2
G(s, χ) converges absolutely for Re s > 1, and exp(G(s, χ)) = L(s, χ) for Re s > 1.
Proof. We first address convergence. It is not hard to see |(1/k)·χ(p)^k p^{−ks}| ≤ |p^{−ks}|. Therefore,
|G(s, χ)| ≤ Σ_p Σ_{k=1}^∞ |(1/k)·χ(p)^k p^{−ks}|
≤ Σ_p Σ_{k=1}^∞ |p^{−ks}|
= Σ_p |p^{−s}|(1 − |p^{−s}|)^{−1} ≤ 2 Σ_p |p^{−s}|   (since |p^{−s}| ≤ 1/2),
which we know converges for Re s > 1, completing the first part of the proof.
Now we show that exp(G(s, χ)) = L(s, χ). Using the Taylor series of the logarithm −log(1 − z) = z + z²/2 + z³/3 + · · · , we have, for |z| < 1,
exp(Σ_{k=1}^∞ z^k/k) = (1 − z)^{−1}.
Applying this with z = χ(p)p^{−s} for each prime p and multiplying over all p, we get exp(G(s, χ)) = ∏_p (1 − χ(p)p^{−s})^{−1} = L(s, χ), as desired.
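The key identity exp(Σ_{k≥1} z^k/k) = (1 − z)^{−1} for |z| < 1 is easy to check numerically (a sketch; the truncation at 200 terms is my own choice):

```python
import cmath

def exp_log_series(z, terms=200):
    """exp of the truncated series sum_{k=1}^{terms} z^k / k."""
    return cmath.exp(sum(z**k / k for k in range(1, terms + 1)))

# For |z| < 1 the truncated series is extremely close to -log(1 - z),
# so exponentiating recovers 1/(1 - z).
for z in [0.5, -0.3, 0.2 + 0.4j, 0.1 - 0.7j]:
    assert abs(exp_log_series(z) - 1 / (1 - z)) < 1e-9
```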
Now we reach a key step in our proof of Dirichlet’s Theorem, which describes the
behavior of G(s, χ). Again, we can think of G(s, χ) as the logarithm of L(s, χ) in some
sense.
Proposition 19.3
Define G(s, χ) as above.
(1) If χ ≠ 1, then G(s, χ) remains bounded as s → 1⁺.
(2) If χ = 1, then lim_{s→1⁺} G(s, 1)/log(s − 1)^{−1} = 1.
We will only prove (2) for now. We will assume (1) to prove Dirichlet’s Theorem,
then we will go back to prove (1) afterwards.
Proof of (2). Splitting off the k = 1 terms of G(s, 1), we can write
G(s, 1) = Σ_{p∤m} p^{−s} + Σ_{p∤m} Σ_{k=2}^∞ (1/k) p^{−ks},
where the latter sum is bounded near s = 1 by Lemma 19.2 above. Thus, taking the limit as s → 1⁺, we get
lim_{s→1⁺} G(s, 1)/log(s − 1)^{−1} = lim_{s→1⁺} (Σ_{p∤m} p^{−s})/log(s − 1)^{−1} = 1
as desired.
Theorem 19.4
If (a, m) = 1, then d(P (a, m)) = 1/ϕ(m).
denote the latter sum as R_χ(s), which is a bounded "error" term. Then, as χ ranges over all Dirichlet characters modulo m, we compute
Σ_χ χ̄(a)G(s, χ) = Σ_χ Σ_p χ̄(a)χ(p)p^{−s} + Σ_χ χ̄(a)R_χ(s)
= Σ_p p^{−s} Σ_χ χ̄(a)χ(p) + Σ_χ χ̄(a)R_χ(s)
= Σ_p p^{−s} ϕ(m) δ_{a,p} + R_a(s)
= ϕ(m) Σ_{p≡a (m)} p^{−s} + R_a(s),   (*)
where δ_{a,p} is 1 if p ≡ a (mod m) and 0 otherwise (by the orthogonality relations), and R_a(s) := Σ_χ χ̄(a)R_χ(s) is again bounded near s = 1.
Theorem 20.1
If χ ≠ 1, then the L-function
L(s, χ) = Σ_{n=1}^∞ χ(n)/n^s
can be analytically continued to {s ∈ C | Re s > 0}, and moreover L(1, χ) ≠ 0.
We will show that this theorem implies our desired Proposition 19.3. But first, we
should define exactly what it means for a function to be analytic. This is a term from
complex analysis.
Hidden beneath this definition are many incredible facts, which is kind of the reason
why complex analysis is such a beautiful subject. It is useful to think of analytic as the
complex analysis notion of differentiable. The magic is that, unlike in real analysis, if
a function is differentiable once, it is differentiable infinitely many times. This means
we can write f as an infinite power series, which are basically as good as one can get.
Assuming analytic continuation (Theorem 20.1), which we can now interpret as
meaning L(s, χ) can be extended to all of the right half-plane Re s > 0 as an analytic
function, we will prove Proposition 19.3.
Proof of 19.3. As L(1, χ) ≠ 0, there exists a small disc D ⊆ C centered at 1 such that L(s, χ)|_D does not take the value 0. Choose a bounded neighborhood D′ of L(1, χ), avoiding 0, such that L(·, χ)(D) ⊆ D′ (shrinking D if necessary).
Now, choose a branch of the complex logarithm defined on D′, and let G₁(s, χ) = log(L(s, χ)) for s ∈ D. Then, on D ∩ {s | Re s > 1}, we have exp(G(s, χ)) = L(s, χ) = exp(G₁(s, χ)). But exp is invariant under adding integer multiples of 2πi, so on D ∩ {s | Re s > 1}, we have
G(s, χ) − G₁(s, χ) = 2πin
for some n ∈ Z, which is constant by continuity (the region is connected). But since the image of L(s, χ) lies in the bounded set D′, G₁(s, χ) is bounded on D ∋ 1, which implies G(s, χ) is bounded around s = 1, as desired.
Proposition 20.3
ζ(s) − 1/(s − 1) can be analytically continued to {s ∈ C | Re s > 0}.
Before we prove this, we will first prove the following lemma, another result from complex analysis.
Lemma 20.4
Let {a_n}, {b_n} ⊆ C be sequences such that Σ_{n=1}^∞ a_n b_n converges. Let A_n = a₁ + a₂ + · · · + a_n. Suppose A_n b_n → 0 as n → ∞. Then,
Σ_{n=1}^∞ a_n b_n = Σ_{n=1}^∞ A_n(b_n − b_{n+1}).
If you sit down with this a little bit, this looks to be true: you can cancel a lot of terms on the right to just reduce to the a_n b_n terms, which remain on the left. The real content is that the sum converges.
Proof. Let S_N = Σ_{n=1}^N a_n b_n. (Also for formality, let A₀ = 0.) Then, we can write
S_N = Σ_{n=1}^N (A_n − A_{n−1}) b_n = Σ_{n=1}^N A_n b_n − Σ_{n=1}^N A_{n−1} b_n
= Σ_{n=1}^N A_n b_n − Σ_{n=1}^{N−1} A_n b_{n+1} = A_N b_N + Σ_{n=1}^{N−1} A_n(b_n − b_{n+1}).
(This is the "you can cancel a lot of terms on the right" I was talking about.) Letting N → ∞, the term A_N b_N → 0 by hypothesis, and the claimed identity follows.
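Lemma 20.4 (summation by parts) can be tested numerically; here is a sketch (the choice a_n = (−1)^{n+1}, b_n = 1/n is my own example), where Σ a_n b_n is the alternating harmonic series, converging to log 2:

```python
import math
from itertools import accumulate

N = 100000
a = [(-1) ** (n + 1) for n in range(1, N + 1)]   # a_n = (-1)^(n+1)
b = [1 / n for n in range(1, N + 1)]             # b_n = 1/n
A = list(accumulate(a))                          # A_n = a_1 + ... + a_n

lhs = sum(x * y for x, y in zip(a, b))                      # sum a_n b_n
rhs = sum(A[n] * (b[n] - b[n + 1]) for n in range(N - 1))   # sum A_n (b_n - b_{n+1})

# The partial sums differ by exactly A_N b_N, which tends to 0.
assert abs(lhs - (rhs + A[-1] * b[-1])) < 1e-9
assert abs(lhs - math.log(2)) < 1e-4
```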
Now we prove Proposition 20.3. This is our first time really proving a function can
be analytically continued, so it may be useful to lay out the general principle first. A
priori, ζ(s) is defined for Re s > 1. To extend to Re s > 0, we will take a point z with
Re z > 1, then find a ball around z which goes beyond {Re s > 1}, then show that our
function at hand can be analytically defined on this ball.
Proof of 20.3. Applying the above lemma with a_n = 1 and b_n = n^{−s} (so that A_n = n and Σ_{n=1}^∞ a_n b_n is exactly ζ(s)), we can write, for Re s > 1,
ζ(s) = Σ_{n=1}^∞ n(n^{−s} − (n + 1)^{−s}).
Let {x} = x − ⌊x⌋ ∈ [0, 1) denote the fractional part of x. Note that we can write
n^{−s} − (n + 1)^{−s} = s ∫_n^{n+1} x^{−s−1} dx,
so
ζ(s) = Σ_{n=1}^∞ n(n^{−s} − (n + 1)^{−s})
= Σ_{n=1}^∞ n · s ∫_n^{n+1} x^{−s−1} dx
= s Σ_{n=1}^∞ ∫_n^{n+1} ⌊x⌋ x^{−s−1} dx
= s ∫_1^∞ ⌊x⌋ x^{−s−1} dx
= s ∫_1^∞ x · x^{−s−1} dx − s ∫_1^∞ {x} x^{−s−1} dx
= s · [x^{1−s}/(1 − s)]_1^∞ − s ∫_1^∞ {x} x^{−s−1} dx
= 1 + 1/(s − 1) − s ∫_1^∞ {x} x^{−s−1} dx.
We see that the first term has a pole at s = 1 (we are dividing by s − 1). But 'tis merely a scratch, since it is a simple pole which goes away if we subtract 1/(s − 1). (In fact, we are then just left with 1.)
We should check that the integral in the second term indeed converges and is analytic for Re s > 0 (i.e., all of our problems lie in the first term). But we know {x} ∈ [0, 1), so |{x}| < 1, meaning
|∫_1^∞ {x} x^{−s−1} dx| ≤ ∫_1^∞ |x^{−s−1}| dx = ∫_1^∞ x^{−1−Re s} dx,
which converges for Re s > 0 by just integrating like a high schooler would: for s ∈ R⁺, we have (check this!)
s ∫_1^∞ x^{−s−1} dx = [−x^{−s}]_1^∞ = 1.
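As a sanity check (not from lecture) on the final formula, we can evaluate ∫_1^∞ {x}x^{−s−1} dx in closed form on each interval [n, n + 1] (where {x} = x − n) and compare against the known value ζ(2) = π²/6:

```python
import math

def frac_integral(s: float, N: int) -> float:
    """int_1^N {x} x^(-s-1) dx, computed exactly on each [n, n+1],
    where {x} = x - n."""
    total = 0.0
    for n in range(1, N):
        # int_n^(n+1) (x - n) x^(-s-1) dx, antidifferentiated by hand
        total += ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s) \
                 + (n / s) * ((n + 1) ** (-s) - n ** (-s))
    return total

s = 2.0
zeta_via_formula = 1 + 1 / (s - 1) - s * frac_integral(s, 20000)
assert abs(zeta_via_formula - math.pi ** 2 / 6) < 1e-6
```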
Lemma 21.1
Let χ ≠ 1 be a Dirichlet character modulo m. Then, for all N ≥ 0,
|Σ_{n=0}^N χ(n)| ≤ ϕ(m).
Proof. We will denote the trivial character also by χ₀ = 1, as it is less awkward to write χ₀(n) than 1(n). (This is the notation Kisin has maintained throughout the course, anyways.) As χ ≠ χ₀, by the Orthogonality Relations (Corollary 18.9), we have
0 = Σ_{n=0}^{m−1} χ(n)χ₀(n) = Σ_{n=0}^{m−1} χ(n),
where the second equality holds because χ(n) = 0 whenever χ₀(n) = 0. So the sum of χ(n) over any full period vanishes, and hence Σ_{n=0}^N χ(n) reduces to a sum over the final partial period. That partial sum has at most ϕ(m) nonzero terms (those n with (n, m) = 1), each of absolute value 1, so by the triangle inequality the whole sum is at most ϕ(m) in absolute value.
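For a concrete instance (my choice of example, not the course's), take the quadratic character mod 5, so χ(n) = ±1 according as n is a nonzero square mod 5 or not, and χ(n) = 0 for 5 | n; the partial sums indeed never exceed ϕ(5) = 4 in absolute value:

```python
m = 5
squares = {pow(a, 2, m) for a in range(1, m)}   # nonzero squares mod 5: {1, 4}

def chi(n: int) -> int:
    """Quadratic (Legendre-symbol) character mod 5."""
    if n % m == 0:
        return 0
    return 1 if n % m in squares else -1

# The running partial sums stay within phi(5) = 4 of zero.
partial = 0
for n in range(0, 1000):
    partial += chi(n)
    assert abs(partial) <= 4
```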
Proposition 21.2
If χ ̸= 1, then L(s, χ) has an analytic continuation to Re s > 0.
Proof. Let S(x) = Σ_{0≤n≤x} χ(n). We know L(s, χ) = Σ_{n=1}^∞ χ(n)n^{−s}. We will now invoke Lemma 20.4. Letting a_n = χ(n) and b_n = n^{−s}, we have that Σ_{n=1}^∞ a_n b_n = L(s, χ) is convergent (for Re s > 1) and A_n = a₁ + · · · + a_n = S(n). We can check, by the above lemma, that |A_n b_n| = |S(n)/n^s| ≤ ϕ(m)·n^{−Re s} → 0 for Re s > 0. Then, by Lemma 20.4, we have
L(s, χ) = Σ_{n=1}^∞ χ(n)n^{−s} = Σ_{n=1}^∞ S(n)(n^{−s} − (n + 1)^{−s})
= Σ_{n=1}^∞ S(n) · s ∫_n^{n+1} x^{−s−1} dx
= s ∫_1^∞ S(x) x^{−s−1} dx.   (using S(x) = S(⌊x⌋))
Again, by Lemma 21.1, which tells us |S(x)| ≤ ϕ(m) (in particular, S is bounded), we have that ∫_1^∞ S(x)x^{−s−1} dx converges absolutely for Re s > 0. (This is a generalization of the very last part of the proof of Proposition 20.3, replacing {x} with S(x), or in general any bounded function.)
Proposition 21.3
Let F(s) = ∏_{χ mod m} L(s, χ). For s ∈ R such that s > 1, we have F(s) > 1.
Proof. Recall that G(s, χ), which satisfies exp(G(s, χ)) = L(s, χ) for Re s > 1, can be written as
G(s, χ) = Σ_p Σ_{k=1}^∞ (1/k) χ(p^k) p^{−ks}.
Through some hard work, we can obtain
Σ_{χ mod m} G(s, χ) = Σ_{χ mod m} Σ_p Σ_{k=1}^∞ (1/k) χ(p^k) p^{−ks}
= Σ_p Σ_{k=1}^∞ (1/k) p^{−ks} Σ_{χ mod m} χ(p^k)
= ϕ(m) Σ_{p,k : p^k ≡ 1 (m)} (1/k) p^{−ks} > 0,
where the last equality follows because Σ_{χ mod m} χ(p^k) = 0 unless p^k ≡ 1 (mod m), in which case the sum is ϕ(m). (This can be seen from any of the orthogonality relations given in §18.2.) This implies
F(s) = ∏_{χ mod m} L(s, χ) = exp(Σ_{χ mod m} G(s, χ)) > 1
as desired.
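We can watch this happen numerically for m = 4 (a sketch; the truncation at 10⁵ terms is my own choice). The two characters mod 4 are the trivial χ₀ and the nontrivial χ₁ with χ₁(n) = ±1 for n ≡ ±1 (mod 4):

```python
def chi0(n):
    """Trivial character mod 4."""
    return 1 if n % 2 == 1 else 0

def chi1(n):
    """Nontrivial character mod 4: +1 at 1 mod 4, -1 at 3 mod 4."""
    return {1: 1, 3: -1}.get(n % 4, 0)

N = 10**5
L0 = sum(chi0(n) / n**2 for n in range(1, N))   # -> (1 - 2^-2) zeta(2) = pi^2/8
L1 = sum(chi1(n) / n**2 for n in range(1, N))   # -> Catalan's constant

F = L0 * L1   # F(2) = product of L(2, chi) over characters mod 4
assert F > 1
```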
Proposition 21.4
L(1, χ) = 0 for at most one nontrivial Dirichlet character χ ̸= 1.
We know ζ(s) − 1/(s − 1) has an analytic continuation at s = 1, so we can write ζ(s) = 1/(s − 1) + g(s) = (s − 1)^{−1}(1 + (s − 1)g(s)) for some g(s) which is analytic at s = 1.
To prove this Proposition, we will recall the very end of last lecture regarding the order of vanishing at s = 1 and writing h₂(s)/h₁(s) in terms of (s − 1) for analytic h₁, h₂. To recap, if j₁ = ord_{s=1} h₁(s) and j₂ = ord_{s=1} h₂(s), then
lim_{s→1⁺} h₂(s)/h₁(s) = { 0 if j₂ > j₁; c ≠ 0 if j₂ = j₁; ∞ if j₂ < j₁ }.
We know L(s, 1) = ζ(s) ∏_{p|m}(1 − p^{−s}); the finite product on the right behaves perfectly well at s = 1, so L(s, 1) just has a pole at s = 1 of order 1, like ζ(s). Thus, we write L(s, 1) = 1/h₂(s) with ord_{s=1} h₂(s) = 1.
If χ ≠ 1 is such that L(1, χ) = 0, then by definition ord_{s=1} L(s, χ) ≥ 1. If there were two such χ ≠ 1 with L(1, χ) = 0, then the order of ∏_{χ≠1} L(s, χ) at s = 1 would be ≥ 2. But then this would imply ord_{s=1} F(s) ≥ −ord_{s=1} h₂(s) + 2 = 1, meaning lim_{s→1⁺} F(s) = 0, which we know from the above proposition is false. The conclusion follows.
Corollary 21.5
If χ ≠ 1 is such that χ(Z) ⊄ R, then L(1, χ) ≠ 0.
Proof. Suppose L(1, χ) = 0, so ord_{s=1} L(s, χ) > 0; we can write L(s, χ) = (s − 1)g(s) for some g(s) analytic at s = 1. Note that we can interpret χ(Z) ⊄ R as saying χ ≠ χ̄, as α = ᾱ ⇐⇒ α ∈ R. So it makes sense to look at χ̄. For s ∈ R, s > 1, we have
L(s, χ̄) = Σ_{n=1}^∞ χ̄(n)n^{−s} = the complex conjugate of Σ_{n=1}^∞ χ(n)n^{−s}   (as n^{−s} is real)
=⇒ I didn't write this down in time rip
Proposition 22.1
If χ is a Dirichlet character modulo m and χ(Z) ⊆ {−1, 0, 1}, then L(1, χ) = Σ_{n=1}^∞ χ(n)/n ≠ 0.
Proof. Let c_n = Σ_{d|n} χ(d). (We will see in a bit why we are considering this sum.) Suppose (n, m) = 1; here n and m denote two coprime integers, an unfortunate clash with the modulus. Every d | nm factors uniquely as d = d₁d₂ with d₁ | n and d₂ | m, so
c_{nm} = Σ_{d|nm} χ(d) = (Σ_{d₁|n} χ(d₁))(Σ_{d₂|m} χ(d₂)) = c_n c_m.
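The multiplicativity of the c_n is easy to check by brute force (a sketch, using the quadratic character mod 5 as my own example):

```python
from math import gcd

mod = 5
squares = {pow(a, 2, mod) for a in range(1, mod)}   # {1, 4}

def chi(n: int) -> int:
    """Real character mod 5: +1 on squares, -1 on non-squares, 0 on multiples of 5."""
    if n % mod == 0:
        return 0
    return 1 if n % mod in squares else -1

def c(n: int) -> int:
    """c_n = sum of chi(d) over the divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

# c_{ab} = c_a * c_b whenever (a, b) = 1.
for a in range(1, 60):
    for b in range(1, 60):
        if gcd(a, b) == 1:
            assert c(a * b) == c(a) * c(b)
```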
so aha, this is why our c_n sums are useful. But we showed that the sum of the c_n is unbounded! This means lim_{t→1⁻} f(t) = ∞. We will rely on this to reach a contradiction, so keep this in mind.
where b_n = b_n(t) = 1/(n(1 − t)) − tⁿ/(1 − tⁿ). We can compute b₁ = 1/(1 − t) − t/(1 − t) = 1. Furthermore, lim_{n→∞} b_n(t) = lim_{n→∞} [1/(n(1 − t)) − tⁿ/(1 − tⁿ)] = 0, since for fixed t ∈ (0, 1) both terms tend to 0.
where the last line follows because we assumed the b_n's form a non-increasing sequence. But this means f(t) is bounded above by a constant for all t. This contradicts our conclusion that lim_{t→1⁻} f(t) = ∞, and the proposition follows.
It remains to prove the claim that b₁ ≥ b₂ ≥ · · · . Bear with me as we proceed with some computations:
(1 − t)(b_n − b_{n+1}) = 1/n − tⁿ/(1 + t + · · · + t^{n−1}) − 1/(n + 1) + t^{n+1}/(1 + t + · · · + tⁿ)
= 1/(n(n + 1)) − [tⁿ(1 + t + · · · + tⁿ) − t^{n+1}(1 + t + · · · + t^{n−1})] / [(1 + t + · · · + tⁿ)(1 + t + · · · + t^{n−1})]
= 1/(n(n + 1)) − tⁿ / [(1 + t + · · · + tⁿ)(1 + t + · · · + t^{n−1})],
where in the last step the bracketed numerator telescopes to tⁿ.
Now, we invoke the AM-GM inequality,¹³ which states that AM ≥ GM. Using this here, we can conclude
(1 + t + t² + · · · + t^{n−1})/n ≥ (t^{0+1+···+(n−1)})^{1/n} = t^{(n−1)/2}
=⇒ 1 + t + t² + · · · + t^{n−1} ≥ n · t^{(n−1)/2} ≥ n · t^{n/2}
=⇒ 1 + t + · · · + tⁿ ≥ (n + 1) · t^{n/2}
=⇒ (1 − t)(b_n − b_{n+1}) ≥ 1/(n(n + 1)) − tⁿ/(n(n + 1)tⁿ) = 0,
so indeed b_n − b_{n+1} ≥ 0. Hooray!
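One can also watch the claims about b_n numerically (a sketch; the choice t = 0.9 is arbitrary and my own):

```python
t = 0.9

def b(n: int) -> float:
    """b_n(t) = 1/(n(1-t)) - t^n/(1 - t^n)."""
    return 1 / (n * (1 - t)) - t**n / (1 - t**n)

vals = [b(n) for n in range(1, 200)]
assert abs(vals[0] - 1) < 1e-12                              # b_1 = 1
assert all(x >= y - 1e-12 for x, y in zip(vals, vals[1:]))   # non-increasing
assert vals[-1] < 0.1                                        # b_n -> 0
```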
At last, all parts of the proof of Dirichlet’s theorem are covered. Free at last, free
at last...
more sophisticated terms, a complex Riemann surface of genus 1 (meaning it has one
hole). We can realize the below graph as a vertical cross-section of the complex torus,
where the rightmost component of the graph can be realized as a circle passing through
the point at infinity.
But R and C are not special: we can consider this over any field. Q is of utmost importance because it is related to the integers in an obvious way; e.g. if we understand the solutions of xⁿ + yⁿ = 1 over Q, then (clearing denominators) we have understood aⁿ + bⁿ = cⁿ over Z.
This curve has even more structure than,
well, just being a curve. There is a way to add
two points on the curve to get another point.
This endows the points on the curve with an
operation, and so the points, one can show,
form a group! In fact, it is an abelian group.
Let E denote the elliptic curve, and let E(K) be the points of E over K.
are the points on E where both coordinates
are in C, and this forms a torus. Here is an
incredible result:
Now, how is this related to L-functions? Well, I'm going to construct an L-function for you. Let a_p = p + 1 − |E(F_p)| (yes, this seems a bit out of nowhere), and define L_p(X) = 1 − a_pX + pX² for "good" primes (ignore this for now, it is a technicality). Then, consider the L-function
L(E, s) := ∏_p L_p(p^{−s})^{−1} = ∏_p (1 − a_p p^{−s} + p^{1−2s})^{−1}.
This converges for Re s > 3/2. But even better, this exhibits very nice properties akin
to what we proved for our L-functions attached to Dirichlet characters:
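To make a_p concrete, here is a naive point count (not from lecture; the curve y² = x³ − x is my own arbitrary example). It also illustrates the Hasse bound |a_p| ≤ 2√p, which is why the Euler product converges for Re s > 3/2:

```python
import math

def count_points(p: int, a: int, b: int) -> int:
    """|E(F_p)| for E: y^2 = x^3 + a*x + b, including the point at infinity."""
    # sqrt_count[t] = number of y in F_p with y^2 = t.
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    affine = sum(sqrt_count[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1   # plus the point at infinity

# Odd primes of good reduction for y^2 = x^3 - x (only p = 2 is bad here).
for p in [5, 7, 11, 13, 17, 19, 23]:
    a_p = p + 1 - count_points(p, -1, 0)
    assert abs(a_p) <= 2 * math.sqrt(p)   # Hasse bound
```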
Let’s relate them more explicitly. Oh wait, we can’t actually, because nothing’s
been proven yet. But at least we can state some really important conjectures. Here’s
one of the Millennium Prize Problems:
What we did in class was a baby example of this: Dirichlet L-functions correspond to 0-dimensional algebraic varieties, in particular the ones defined by the equation z^m = 1, which give birth to the number field Q(ζ_m). So yeah, geometry is tied up with these L-functions, which mostly live in the realm of analysis but are completely intertwined with number theory. All of this is really beautiful, and I would really encourage you all to
take a look at some of these things at some point in your academic journey!