Geometry for High School Teachers
CHAPTER 1
The (planar) Euclidean Geometry
1. Why is (the High School) Geometry different from other
branches of Mathematics?
The first part of these notes discusses the well-known and familiar Euclidean geometry (mostly on the plane). The purpose is to describe it in terms that allow some alteration and natural generalization in order to discuss non-Euclidean geometries.
1.1. Axioms or definitions? A quick look shows that (at least in high school) the exposition of Geometry differs very strongly from that of other branches of Mathematics. Table 1 summarizes some of the main differences.
We start with “undefined notions” and “axioms”. The Euclidean geometry, as almost every science, was born out of experimental evidence and practical needs to operate with numbers and shapes (mostly land plots, whence the very name Geometry, literally “land measuring”). Lines were plotted as lines of vision or using tight ropes; circles were also plotted using ropes (for small circles a special device, the compass, could be designed to plot circles of different radii).
To avoid dealing with inaccurate measurements, stretching of wet ropes and other disturbances, the Greeks “invented” idealization: what will remain after we keep only the essential properties of the geometric shapes? Thus they arrived at the list of axioms which “ideal” lines and circles must satisfy. However, in terms of pure logic they could not answer the basic questions, e.g., what is a point? what is a straight line? what is a distance? So these were left as undefined notions. The Greek sophistication went so far as to claim that they did not have to plot geometric shapes to establish their properties: everything could be spelled out in plain words.
But which axioms to choose? They should, first, reflect the daily experience (or at least not contradict it), and second, be complete enough to deduce all the rest from them by pure logic. By careful selection, Euclid and his predecessors assembled a long list of axioms, part of them related to numbers, part to geometric shapes, that apparently satisfied both conditions above. One notable exception was the famous Fifth Postulate, which claims that there is only one line parallel (i.e., not intersecting, if we talk about planar geometry) to a given line ℓ through a given point P ∉ ℓ.
Geometry: “Undefined objects” (points, lines, distance, . . . ) having some intuitive meaning.
Other mathematics: The only “undefined objects” are sets. All other definitions are spelled out in the language of the set theory.

Geometry: Geometric constructions (lines through pairs of points, intersection of lines, circles with a given center passing through a point, . . . ).
Other mathematics: Set theoretic constructions (unions, intersections, Cartesian products, extrema, images and preimages by maps, . . . ).

Geometry: Axioms link “undefined objects” between themselves (uniqueness of the line through two distinct points, uniqueness of the line parallel to a given one through a point off it) and are purported to be “obvious” or “evident”.
Other mathematics: Definitions (linearity, continuity, limit, convexity) are given without any claim to being “obvious”. We know that they are good only if they are really useful. Anybody has the right to define anything.

Geometry: Axioms can be “debatable” (cf. the story around the Fifth Postulate).
Other mathematics: Definitions must be non-self-contradictory; this is all you need. If they are practical, they are adopted; if they are artificial, they will be ignored without Much Ado.

Geometry: There is an aura of Geometry as the absolute knowledge about our world, something not achievable in other sciences, where there is always a chance of error or an unnoticed super-weak effect that will eventually ruin all existing theories.
Other mathematics: Mathematical theories are formally “mental exercises” until sometimes, and for unknown reasons, they become surprisingly effective and useful in applications to real life (mostly in Physics, but also in Economics, Financial mathematics e.a.).

Geometry: Textbooks did not change their appearance for hundreds of years.
Other mathematics: Books written in the modern mathematical language would be inaccessible to readers from the 19th century and earlier.

Table 1. Geometry vs. the rest of Mathematics
In fact, this postulate is not much less obvious than another, apparently less controversial, postulate claiming that any two lines cannot intersect in more than one point. In both cases the problem is in the infinite size
of lines: one of the axioms explicitly demands that any line can be extended without any obstruction in both directions.1 Dealing with infinity is a very tricky business: the common, household intuition is mum when infinite quantities are involved, so one has to rely on pure logic. In particular, it is impossible to verify that two lines do not intersect by looking at their finite chunks. Euclid had to circumvent this problem by angle measurement and the Fifth Postulate.
On the other hand, in modern mathematics there is only one undefined notion, quite a universal one: the notion of a set, which consists of its elements.2 All other constructions can be described in the language of sets. For instance, there are set theoretic operations (union, intersection, complement, Cartesian product, ...) and the notions of maps between sets, which can be explicitly defined.
Example 1.1 (definition of functions). For two sets A, B a map from A to B is a subset Γ of the set A × B = {(a, b) : a ∈ A, b ∈ B} of ordered pairs with the following two properties:
(1) For any element a from A there exists an element b from B such that (a, b) ∈ Γ;
(2) If (a, b1) ∈ Γ and (a, b2) ∈ Γ, then necessarily b1 = b2.
Of course, you recognize instantly that this coincides with the “common” definition of a map f as a “rule” which associates with every element a ∈ A the unique element b = f(a), and Γ = {(a, f(a)) : a ∈ A} is simply the graph of this map.
This example shows that the word “rule” is superfluous: we can define what a map is without any explicit mention of any “rule”, using only notions from the Set theory. Of course, this definition is the result of a very long distillation of mathematical practice. Even at the times of Newton and Leibniz, that is, only yesterday in historical terms, only functions which could be defined by explicit formulas were allowed.
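To see that nothing beyond sets is needed, the two conditions of Example 1.1 can be checked mechanically for finite sets. A minimal Python sketch (the sets and pairs below are invented for illustration):

```python
def is_map(A, B, Gamma):
    """Is Gamma ⊆ A × B the graph of a map A → B (Example 1.1)?"""
    # (1) every a in A occurs as a first coordinate of some pair;
    total = all(any((a, b) in Gamma for b in B) for a in A)
    # (2) no a in A occurs with two different second coordinates.
    single_valued = all(b1 == b2
                        for (a1, b1) in Gamma
                        for (a2, b2) in Gamma
                        if a1 == a2)
    return total and single_valued

A, B = {1, 2, 3}, {"x", "y"}
print(is_map(A, B, {(1, "x"), (2, "x"), (3, "y")}))  # True: a genuine map
print(is_map(A, B, {(1, "x"), (1, "y"), (2, "x")}))  # False: 1 has two images, 3 has none
```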
Example 1.2 (definition of groups). Another example is the notion of a group.
We say that a set G is a group if there exists a map π : G × G → G which is denoted not by the familiar notation (g, h) ↦ π(g, h), but rather as a binary operation ∘ : (g, h) ↦ g ∘ h with the following properties:
(1) The associative law holds: f ∘ (g ∘ h) = (f ∘ g) ∘ h (warning: we do not require symmetry g ∘ h = h ∘ g!);
(2) There exists an element in G, denoted by e, such that g ∘ e = e ∘ g = g for any element g ∈ G.
1This axiom is remotely related to the fact that the natural numbers N = {1, 2, 3, . . . , 2020, 2021, . . . } come in infinite number: any n can be further increased by 1 to give n + 1.
2Very large sets can cause problems, but they should be indeed very large for that.
(3) For any element g ∈ G there exists a unique solution x ∈ G to the equation g ∘ x = e. This element is called the inverse of g and denoted g⁻¹.
These “group axioms” do not pretend to be obvious or evident in any way. The notion of a group is extremely useful in mathematics, because groups are literally everywhere. But it took humanity until the early 19th century, when Galois and Lagrange explicitly introduced this notion and showed how instrumental it is in solving the ancient problem of which algebraic equations are explicitly solvable (using the four arithmetic operations and extraction of radicals).
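The axioms (1)–(3) can be verified by brute force on any small finite set; a sketch in which the two toy operations, addition and multiplication modulo 3, are illustrative choices rather than examples from the text:

```python
from itertools import product

def is_group(G, op):
    """Brute-force check of the three group axioms on a finite set G."""
    # (1) associativity
    assoc = all(op(f, op(g, h)) == op(op(f, g), h)
                for f, g, h in product(G, repeat=3))
    # (2) a two-sided neutral element e
    neutrals = [e for e in G if all(op(g, e) == op(e, g) == g for g in G)]
    if not (assoc and neutrals):
        return False
    e = neutrals[0]
    # (3) every g has a unique solution x of the equation g ∘ x = e
    return all(len([x for x in G if op(g, x) == e]) == 1 for g in G)

G = {0, 1, 2}
print(is_group(G, lambda g, h: (g + h) % 3))  # True: addition mod 3 is a group
print(is_group(G, lambda g, h: (g * h) % 3))  # False: 0 has no inverse
```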
In contrast with this, the toolbox of constructions in Geometry appears very limited. The Greeks restricted themselves to using only the ruler and compass, that is:
(1) construct a line passing through any two different points,
(2) construct a circle with any center and of any (positive) radius,
(3) find points of intersection between the previously constructed lines and circles, provided that such intersections exist (e.g., two lines are not parallel to each other, or the distance between the centers of two circles is less than or equal to the sum of their radii).
1.2. Status of theorems. Because the axioms in the Greek geometry were purported to have an “absolute value”, being described as obvious and not subject to any debate, the theorems were considered as “absolute truths” about our world. In particular, once it was proved that the sum of the angles of any triangle is precisely equal to 180°, only an idiot would try to check this statement experimentally by measuring the angles of a given triangle.
Gauss, one of the greatest mathematicians of all time, suspected that the Fifth Postulate is not just independent of the rest of the “obvious” axioms, but may be “plain wrong” in our world. To avoid being called an idiot, he did not publish his findings and reflections, but it is known that he measured the angles of the biggest triangle he could assess, between three mountain peaks in Germany, all within direct visibility. Despite all his efforts, he was unable to find any deviation from 180°. Today we know that the accuracy of his measurements was certainly insufficient to find the deviation, but it indeed exists “in the real world”.
In contrast to that, in modern mathematics it is clear that if you take any theorem and expand all definitions down to statements in the language of the set theory (of course, they will be huge, with hundreds of quantifiers “for any” (or “for all”) and “there exists”), then any statement will be reduced to the trivial sentence A = A for some set A defined in two different ways. This observation, due to Poincaré, another great thinker and brilliant mathematician, might sound disappointing for mathematicians, but the key phrase here is “in two different ways”. It is the length of this journey through mathematical banalities that makes mathematical theorems meaningful and beautiful.
Figure 1. Desargues theorem
The last remark refers to the way Geometry is taught, in contrast to the modern way of teaching mathematics. Today’s textbooks of geometry are (with rare exceptions) new renderings of the same Books of Euclid: even if they use the “modern” mathematical notation with quantifiers and speak “in the language of the Set theory”, their translation into ancient Greek could easily be done by an app on today’s smartphones.
On the contrary, a modern textbook in algebra or topology would be completely incomprehensible to Euclid and his students. Even though the notion of a set looks so simple, the pyramid of constructions built upon it is so high that one should expect years of study before being able to speak the argot we speak today among ourselves.
1.3. Advertisement: two cherries on the cake. You can skip this section on first reading.
A beautiful example of a geometric theorem is the Desargues theorem, see Fig. 1. It has all the features of a miracle: seemingly no assumptions, but a very unexpected conclusion.
Let △ABC and △abc be two triangles on the plane. Draw the lines L = Aa, M = Bb and N = Cc. Consider the three points P = AC ∩ ac, Q = AB ∩ ab and R = BC ∩ bc. In general the lines L, M, N intersect each other pairwise at three different points. In a similar way, the three points P, Q, R form a triangle and the three lines PQ, QR, PR are all different. Yet some pairs of triangles △ABC and △abc produce exceptional configurations.
Theorem 1.1 (Desargues). The three lines L, M, N are concurrent (intersect
at a single point or are parallel) if and only if the three points P, Q, R are collinear
(belong to the same line).
Another example concerns counting rational points on various curves. The equation x² + y² = 1 defines the familiar unit circle on the plane. We say that a point (x, y) is rational if both its coordinates are rational numbers. How many rational points are there on the unit circle? The answer: infinitely many. Indeed, any Pythagorean triple of natural numbers l, m, n such that l² + m² = n² produces a rational point on the circle with the coordinates x = l/n, y = m/n.
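The classical parametrization of Pythagorean triples, l = p² − q², m = 2pq, n = p² + q², makes the “infinitely many” claim easy to illustrate with exact rational arithmetic; a minimal sketch:

```python
from fractions import Fraction

# Each Pythagorean triple l² + m² = n² yields the rational point (l/n, m/n)
# on the unit circle; letting p > q ≥ 1 grow produces infinitely many triples.
def rational_points(bound):
    for p in range(2, bound):
        for q in range(1, p):
            l, m, n = p * p - q * q, 2 * p * q, p * p + q * q
            x, y = Fraction(l, n), Fraction(m, n)
            assert x * x + y * y == 1  # exact arithmetic: the point is on the circle
            yield x, y

for x, y in rational_points(4):
    print(f"({x}, {y})")  # (3/5, 4/5), (4/5, 3/5), (5/13, 12/13)
```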
Figure 2. Fermat curves
Consider a slightly different equation
xⁿ + yⁿ = 1, n ≥ 3. (1.1)
One may draw the corresponding curve F: say, if n is even, it will be a smooth oval looking close to the unit square with sides on the lines |x| = 1 and |y| = 1, but with rounded corners. The same question about rational points on this curve is quite natural to ask. Clearly, there are at least 4 rational points on F: (±1, 0) and (0, ±1). Are there any other rational points?
Theorem 1.2 (Fermat conjecture, proved by A. Wiles with help from R. Taylor
only in 1995). The curve F described by the equation (1.1) carries no rational points
except for the four trivial ones.
Problem 1.1. Derive the above theorem from the standard formulation of Fermat’s Last Theorem.
2. The Euclidean plane as a metric space
Let’s restrict ourselves to planar geometry. To this end we introduce the notation Π for The Euclidean Plane, the plane we will live in until going up into the third dimension. The readers are assumed to be familiar with the “classical” geometry of Π, which we will refer to as the Euclidean geometry.
Any statement concerning the Euclidean geometry can be verified using all the known tools (lemmas, theorems e.a.) from your favorite high school textbook of geometry.
2.1. The wrong way. One can try to save on effort and mimic Euclid’s reasoning, involving the set theory language purely formally. We start by saying that Π is a set whose elements are points and which satisfies the following properties:
(1) Some subsets of Π are called “straight lines”, some are called “circles”.
(2) For any two points A, B ∈ Π there is a subset “straight line” that contains both points.
(3) For any two subsets ℓ, ℓ′ ⊂ Π which are “straight lines”, either the intersection ℓ ∩ ℓ′ is empty (and then they are called parallel), or it consists of a single point (an element of Π).
Figure 3. Compass
(4) If A, B, C are three points on the same line and KA, KC are two circles centered at A and C respectively, both containing (passing through) the point B between A and C, then the circles KA and KC have no other intersections: KA ∩ KC = {B}.3
(5) . . . (list all the other axioms of Euclid in the same “birds’ language”).
This “1:1 compilation” will bring you nowhere. You will only get a headache trying to translate from the birds’ language into your familiar geometric language supported by experience and intuition.
The right way is to distill the concepts and fundamental ideas from the Eu-
clidean geometry. We will do exactly that now.
2.2. Distance. Looking back and trying to reconstruct the way the Greeks arrived at their brilliant achievements, we can safely guess that the fundamental notion for them was the distance between points on the plane. We postpone the question of how and in what terms the distance can be measured and calculated, concentrating now on how this notion can be used to introduce the main concepts of the Euclidean geometry.
Definition 1.1 (naive). The distance is a function
dist : Π × Π → numbers, (A, B) ↦ dist(A, B),
which possesses the following three obvious properties (“axioms”):
(1) The distance is nonnegative: dist(A, B) ≥ 0 for any two points, and dist(A, B) = 0 if and only if A = B.
(2) The distance is a symmetric function: dist(A, B) = dist(B, A).
(3) For any three points A, B, C ∈ Π, the triangle inequality holds,
dist(A, C) ≤ dist(A, B) + dist(B, C). (2.1)
Remark 1.1. The three properties of the distance are (simple) theorems in the Euclidean geometry: the first two are so obvious that they are not even formulated as such, and the last property is formulated as the fact that in any triangle each side is shorter than the sum of the other two.
3Did you recognize the triangle inequality in this axiom?
It is natural to ask whether there exists more than one function dist satisfying these properties. The obvious answer is “yes”, since, e.g., the function dist′(A, B) = λ dist(A, B) will also satisfy all three conditions if λ is a positive number. But are there genuinely different, non-proportional examples? The positive answer will be given below. In fact, there are plenty of different possible distance functions even if we restrict ourselves to the same Euclidean plane Π.
The notion of distance turns out to be so important that we introduce
a special notion (concept) for it in the modern spirit.
Definition 1.2. An (abstract) set X equipped with a function
dist : X × X → numbers, (x, y) ↦ dist(x, y)
on it is called a metric space if this function satisfies the above three conditions.
Note that the same set X can be equipped with different distance func-
tions, making it into different metric spaces.
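On a finite set the three conditions of Definition 1.2 can be verified by sheer enumeration; a minimal sketch with two illustrative candidate functions:

```python
from itertools import product

def is_metric(X, dist):
    """Check the three axioms of a metric on a finite set X."""
    pos = all(dist(x, y) > 0 if x != y else dist(x, y) == 0
              for x, y in product(X, repeat=2))
    sym = all(dist(x, y) == dist(y, x) for x, y in product(X, repeat=2))
    tri = all(dist(x, z) <= dist(x, y) + dist(y, z)
              for x, y, z in product(X, repeat=3))
    return pos and sym and tri

X = {0, 1, 2}
print(is_metric(X, lambda x, y: abs(x - y)))    # True
print(is_metric(X, lambda x, y: (x - y) ** 2))  # False: dist(0,2) = 4 > 1 + 1
```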
An example of a “non-geometric” metric space is the Handshake space.
Example 1.3 (Handshake distance). Consider any finite set M and assume that there is a symmetric function
h : M × M → {0, 1}, h(x, y) = h(y, x).
We think of elements of M as people and interpret h(x, y) = 1 as meaning that x and y exchanged a handshake at least once in their life; otherwise h(x, y) = 0.
Call a sequence of points x0, . . . , xn a handshake chain if h(xi, xi+1) = 1 for all i = 0, 1, . . . , n − 1.
Call the handshake distance disth(x, y) the shortest length of a handshake chain such that x0 = x, xn = y. If there is no handshake chain, we write disth(x, y) = +∞ (this is only notation; don’t worry about the symbol ∞: what is important for the moment is that ∀n ∈ N we put by definition n < +∞).
The function disth is thus defined on M × M and takes values in Z+ ∪ {+∞}.
Prove that this function satisfies all three axioms (conditions) above. If disth takes only finite values, then it becomes a true metric on M.
Remark 1.2. There are strong reasons to believe that if M is the entire population of Earth, then for any two people x, y ∈ M we have disth(x, y) ≤ 7. But of course, this is an experimental fact that might be wrong if we look at some isolated tribes in Amazonia.
Problem 1.2. Prove the claim from Example 1.3. Suggest an experiment that would confirm the statement in the remark.
Hint. Can you estimate from above the handshake distance between yourself and the Israeli Prime Minister, who is surely at handshake distance at most 2 from all heads of state on Earth?
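The handshake distance of Example 1.3 is a graph distance and can be computed by breadth-first search; a sketch in which the handshake table is invented for illustration:

```python
from collections import deque

def handshake_dist(M, h, x, y):
    """Length of a shortest handshake chain from x to y; +inf if there is none."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return dist[u]
        for v in M:
            if v not in dist and h(u, v) == 1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("inf")

# A toy population: a-b and b-c shook hands; d never shook anybody's hand.
M = {"a", "b", "c", "d"}
shakes = {frozenset({"a", "b"}), frozenset({"b", "c"})}
h = lambda x, y: 1 if frozenset({x, y}) in shakes else 0

print(handshake_dist(M, h, "a", "c"))  # 2
print(handshake_dist(M, h, "a", "d"))  # inf
```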
Problem 1.3. Let x0 , x1 , . . . , xn ∈ Π be called a discrete path 4 between
two points x, y ∈ Π, if x0 = x, xn = y. The length of the discrete path is
4This notion is an approximation of the more general notion of a continuous path as
a map γ : [0, 1] → Π, such that γ(0) = x, γ(1) = y.
Figure 4. Manhattan distance
the sum ∑_{i=0}^{n−1} dist(xi, xi+1).
Prove that dist(x, y) is less than or equal to the length of the shortest discrete path between x and y.
Solution. Apply repeatedly the triangle inequality dist(xi, xi+2) ≤ dist(xi, xi+1) + dist(xi+1, xi+2) to reduce the number of terms.
What other metrics can exist on the plane Π?
Definition 1.3 (Manhattan distance). Let Q ⊆ Π be a square on the
plane. A taxicab path between two points x, y ∈ Q is a (finite) sequence of
points x0 , . . . , xn ∈ Q, such that x0 = x, xn = y and each segment [xi , xi+1 ]
is parallel to one of the two sides of Q, i.e., either goes North-South, or
East-West on the map of Q.
The Manhattan distance distM (·, ·) on Q is the (usual Euclidean) length
of a shortest taxicab path from x to y.
Obviously, this example can be generalized to the entire plane Π, if we choose on it two orthogonal directions, NS and EW. It is also called the taxicab geometry on the plane.
Problem 1.4. Show that a shortest path can always be chosen with
n = 1 or at most n = 2.
Remark 1.3. The same plane equipped with the “standard” Euclidean
distance and the Manhattan distance should be considered as two different
metric spaces.
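Granting the closed form suggested by Problem 1.4 (a shortest taxicab path needs at most one North-South and one East-West leg), the Manhattan distance reduces to |Δx| + |Δy|. A sketch contrasting it with the Euclidean distance on the same pair of points:

```python
import math

def dist_euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def dist_manhattan(p, q):
    # one North-South leg plus one East-West leg
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

p, q = (0.0, 0.0), (3.0, 4.0)
print(dist_euclid(p, q))     # 5.0
print(dist_manhattan(p, q))  # 7.0: same plane, a different metric space
```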
Example 1.4 (“Subterranean metric”). Let S be the standard sphere of radius 1 in the Euclidean 3D-space (centered at the origin, though this is not important). Then the usual Euclidean distance in 3D can be “restricted” to S: the distance between any two points P, Q ∈ S is the length of the straight line segment with endpoints at these points.
Note that this distance is impossible to measure while remaining on S: any such segment necessarily goes “underground” at all interior points. In order to measure this distance between two points on the surface of the Earth, one would have to dig a tunnel between these points.
The next obvious question is: what is “the straight line segment”? Today we can use laser beams for this, but what if the laser beams are not exactly straight? Could they be “curved” by the Earth’s gravity? Stay tuned till the last chapter of these notes. . .
In practice, we, inhabitants of a spherical world, use a different distance function, which will be discussed much later.
Problem 1.5. The “distance from Rehovot to Jerusalem” (as suggested by the Google Maps service) is about 50 km. The radius of the Earth is approximately 6400 km. How deep underground will the lowest point of the “straight” tunnel between these cities be?
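A back-of-the-envelope computation in the spirit of Problem 1.5: if the arc between the cities subtends the central angle θ = d/R, the tunnel is the chord of this arc, and its midpoint lies R(1 − cos(θ/2)) below the surface. A sketch with the approximate numbers from the statement:

```python
import math

R = 6400.0  # Earth radius, km (approximate)
d = 50.0    # surface distance Rehovot-Jerusalem, km (approximate)

theta = d / R                          # central angle of the arc, radians
depth = R * (1 - math.cos(theta / 2))  # sagitta: depth of the chord midpoint
chord = 2 * R * math.sin(theta / 2)    # the "subterranean" distance itself

print(round(depth * 1000), "m deep")   # about 49 m
print(round(chord, 5), "km chord")     # barely shorter than the 50 km arc
```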
2.3. Basic shapes through distance: circles. Before measuring distances, we should invent tools telling us when the distances between pairs of points are the same. In the case of the Euclidean distance this tool is well known under the name of the compass, see Fig. 3. You can rotate the wheel to change the distance between the legs of the compass, but there is no measuring scale to give you a number.
This hi-tech device, however, is a distant relative of a yardstick. Any
rigid piece of wood that is believed not to bend and hence preserve its shape
can replace the compass, even if the stick itself is not straight. Using the
compass (or the yardstick), you can draw in your sandbox the magic figure
called the circle.
Definition 1.4. A subset of Π is called a circle centered at a point A ∈ Π if it consists of all points P ∈ Π for which the distance dist(A, P) takes one and the same (positive) value.
Note that we cannot yet say what the radius of the circle is (as we have not yet discussed numbers), but for any point B ≠ A we can draw a circle centered at A and passing through B. For this you need to open up your compass to place its legs at A and B, or to cut out a suitable yardstick.
Is this definition an axiom or an “abstract definition”? Euclid claims that circles exist for any choice of A, B. He is obviously right, but the abstract definition above doesn’t care much and can be applied to any metric space.
Definition 1.5. A (metric) circle in a metric space X is a subset of
points x ∈ X such that r = dist(A, x) > 0 is the same for all points. The
point A is called the center of the metric circle.
Remark 1.4. Unlike the Euclidean case, we do not claim that circles of
any radius are non-empty. Look forward for examples!
Problem 1.6. Prove that two circles with the same center A are either
disjoint, or coincide.
Problem 1.7. Draw a circle with respect to the Manhattan distance
(Definition 1.3).
Problem 1.8. Prove that the Equator is a circle with respect to the subterranean distance, see Example 1.4. Find the center and the radius of this circle.
2.4. Straight line segments. Let’s pass to a more complicated notion, that of the line segment. Intuitively, this is the geometric shape realizing the shortest path (in the sense of the distance) between its two endpoints. It is convenient to think of a rope stretched taut: the shape of the rope then realizes such a path. This is a physical intuition that can be easily translated into the language of metric spaces.
Example 1.5. A line segment with two different endpoints A, B ∈ Π on
the Euclidean plane is the set of all points C ∈ Π such that
dist(A, B) = dist(A, C) + dist(C, B), (2.2)
that is, the segment is the set of points for which the triangle inequality
(2.1) degenerates into the equality.
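Once coordinates are available (the text deliberately postpones them), condition (2.2) can be tested numerically; a sketch assuming the usual Euclidean distance and a small tolerance for floating-point errors:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def on_segment(A, B, C, eps=1e-9):
    # C lies on [A, B] iff the triangle inequality degenerates into an equality
    return abs(dist(A, C) + dist(C, B) - dist(A, B)) < eps

A, B = (0.0, 0.0), (4.0, 0.0)
print(on_segment(A, B, (1.0, 0.0)))  # True: between A and B
print(on_segment(A, B, (5.0, 0.0)))  # False: on the line, outside the segment
print(on_segment(A, B, (2.0, 1.0)))  # False: off the line
```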
Motivated by this example, we give the following definition.
Definition 1.6. A segment with two endpoints A, B ∈ X in a metric
space X is the set of all points C ∈ X such that the equality (2.2) holds.
This definition can be recycled to define infinite rays and infinite straight
lines in an arbitrary metric space.
Definition 1.7. For any two different points A, B ∈ X in a metric
space X the two subsets
RB = {C : dist(A, C) = dist(A, B) + dist(B, C)},
(2.3)
RA = {C : dist(B, C) = dist(A, B) + dist(A, C)},
are called rays with vertices B and A respectively.
The union of the three sets,
RA ∪ [A, B] ∪ RB
is called a (metric) line passing through the points A, B.
You can easily see that if dist is your familiar distance on Π, then these definitions produce the usual segments, rays and lines (see Fig. 5).
Problem 1.9. Let A, B ∈ Q be two different points on the Manhattan
map Q. Describe the segment [A, B] and the two rays.
Problem 1.10. How does a line segment look for the subterranean metric? See Example 1.4.
Figure 5. Segment, rays and line through two points
Figure 6. Spherical segment
Solution. The “line segment” [P Q] for the subterranean distance con-
sists of only two endpoints P, Q. For any other point C ∈ S we have
dist(P, C) + dist(C, Q) > dist(P, Q).
2.5. Digression: As the crow flies. There is one very important example of a metric space, which we will study in detail over and over. This example comes from daily observation. The Manhattan distance, discussed above, is (roughly) a distance measured along paved roads. Yet in fields with no roads the more natural distance is the usual one, “as the crow flies”. If we were living on the infinite Geometric Plane, this would be the same Euclidean distance. Yet we know that people live on the surface of a huge sphere S with radius about 6,400 km, and considered on scales comparable with this radius, the crow-flight distance (also called the spherical distance) starts looking different.
We can apply the same ideology in “defining” the crow-flight distance as before, as the length of a shortest path x0, x1, . . . , xn between any two points x0 = x and xn = y, keeping in mind that all intermediate pairs of points xi, xi+1 are so close to each other that on such a scale we cannot feel that the Earth is not flat.
An accurate analysis proves that such shortest paths on the sphere are arcs of great circles, sections of S by 2-planes in 3D passing through the center O of the Earth. If the pair x, y does not consist of two opposite points (i.e., x, O, y do not belong to the same line in 3D), then such a circle is unique.
This definition allows us to define the notion of the segment on the sphere, and it will be something surprising-looking on the map (ask Google Maps to plot the segment from Tel Aviv to New York, for example), but still something one-dimensional. However, if you try to apply the definition of the segment to the pair (North Pole, South Pole), you will get a completely unexpected result.
The “spherical segments” are not straight from the 3D point of view, yet they
are good enough to serve as such for all our geodesic or cartographic purposes.
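In coordinates, the crow-flight distance between two points of a sphere of radius R is R times the central angle between their position vectors; a sketch with illustrative sample points:

```python
import math

def spherical_dist(p, q, R=1.0):
    """Great-circle (crow-flight) distance between p, q on a sphere of radius R."""
    dot = sum(a * b for a, b in zip(p, q)) / (R * R)
    dot = max(-1.0, min(1.0, dot))  # guard against rounding drift
    return R * math.acos(dot)       # arc of the great circle through p and q

north = (0.0, 0.0, 1.0)
south = (0.0, 0.0, -1.0)
equator_pt = (1.0, 0.0, 0.0)
print(spherical_dist(north, equator_pt))  # pi/2: a quarter of a great circle
print(spherical_dist(north, south))       # pi: opposite points, half a great circle
```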
Problem 1.11. How do the “infinite lines” look on the sphere?
Remark 1.5. Definition 1.7 was copy-pasted from the textbook of Eu-
clidean geometry. Later we will see that for general metric spaces a slightly
different definition is “better”, that is, can be applied with satisfactory re-
sults in a wider variety of cases.
3. “Equal” figures. Isometries
3.1. Equality. The language of the Set theory assumes that any two subsets U, V ⊆ X of a larger set X are equal if and only if they consist of the same elements of X. In application to subsets of the Euclidean plane Π this would imply, say, that two triangles ABC and A′B′C′ are equal if and only if A = A′, B = B′ and C = C′, that is, if they coincide. Clearly this is not what the Greeks (and today’s students) mean.
It is rather easy to restore the Greek meaning of “equality” from the
two facts:
(1) Two segments [A, B], [A′, B′] ⊂ Π are “equal” if and only if dist(A, B) = dist(A′, B′);
(2) Two circles centered at two points A, A′ and passing through the points B, B′ respectively, are “equal” if and only if they have the same radii, i.e., dist(A, B) = dist(A′, B′) exactly as above.
This leaves no freedom for the choice of the proper definition.
Definition 1.8. A map f : Π → Π is called a (Euclidean) isometry of the plane Π, if
∀A, B ∈ Π dist(A, B) = dist(f(A), f(B)). (3.1)
One can immediately introduce the general concept of isometry between
two possibly different metric spaces, which generalizes Definition 1.8.
Definition 1.9. Let X, X′ be two metric spaces (possibly the same) with the distance functions dist, dist′. A map f : X → X′ is called an isometry, if
∀A, B ∈ X dist(A, B) = dist′(A′, B′), where A′ = f(A), B′ = f(B).
Problem 1.12. Prove that any isometry is injective.
Solution. If x ≠ y in X, then dist(x, y) > 0. But then dist′(x′, y′) > 0 and hence x′ ≠ y′, where x′ = f(x), y′ = f(y).
However, an isometry need not be surjective (give examples!).
Problem 1.13. If f : X → X′ is an isometry, then f is a bijection between X and the image f(X) ⊆ X′, and the inverse map f⁻¹ : f(X) → X is also an isometry. Prove this.
Theorem 1.3. If f : Π → Π is an isometry (relative to the Euclidean distance), then f is necessarily surjective.
Moreover, for any two segments of equal length [A, B] and [A′, B′] there exist exactly two different isometries such that f(A) = A′, f(B) = B′.
Proof. Let C ∈ Π be an arbitrary point of the plane. Consider first the case where C ∉ ℓ, where ℓ is the line through A and B. Then the two distances dist(C, A) and dist(C, B) are both positive and together with dist(A, B) satisfy the triangle inequality.
Draw two circles, one centered at A′ with the radius dist(A, C) and another centered at B′ with the radius dist(B, C). These two circles intersect in two different points C′₊ and C′₋, symmetric with respect to the line ℓ′ passing through A′ and B′ (draw the picture!). Both triangles △A′B′C′₊ and △A′B′C′₋ are isometric (“equal”) to △ABC; the only difference between them is the mirror symmetry in ℓ′. By construction, f(C) can be only C′₊ or C′₋. Once you choose the sign, it can be maintained consistently over all points C ∉ ℓ by continuity; thus we have two maps f₊, f₋ : Π → Π which are the required isometries. They satisfy the identities
f₊ = σ ∘ f₋, f₋ = σ ∘ f₊,
where σ : Π → Π is the mirror symmetry in ℓ′.
These two maps can be uniquely extended to ℓ, on which they coincide (prove that!).
Remark 1.6. The symmetry σ : Π → Π is itself an isometry (prove it).
Remark 1.7. Let ℓ ⊂ Π be a line and σ : Π → Π an isometry such that ∀A ∈ ℓ, σ(A) = A (we say that ℓ is fixed by σ). Prove that either σ = id, or σ is the mirror symmetry in ℓ.
3.2. Intermediate conclusion. The above construction allows us to “introduce” the Euclidean plane Π formally, without any “undefined objects” and “obvious axioms”, using only the set theory and concise definitions: as a metric space, that is, an abstract set X equipped with a numeric distance function on X × X, (x, y) ↦ dist(x, y) ≥ 0. The “geometric shapes” like segments, lines, circles can be (at least in the case of Π) described (i.e., defined) in terms of the distance function.
However, this is only a template. We have seen that even on the same
set Π one can introduce more than one symmetric nonnegative function
satisfying the triangle inequality, and for other choices of the distance the
definitions for, say, segments may look completely unexpected.
Thus we need to find more specific descriptions of the set X and the
“proper” distance so that the result will coincide with the Euclidean plane
as we know it from the Greeks. In a surprising way, this first requires
examining accurately the notion of the numbers that we use to measure distances.
4. Which numbers do we use?
When discussing the notion of a metric space, we skipped over the notion
of a number, used to define the distance. Actually, there is a lot of subtle
details in this notion.
4.1. Natural and rational numbers. The Greeks were perfectly familiar
with rational numbers Q. But what is a rational number?
The natural numbers N = {1, 2, 3, . . . , 2020, 2021, . . . , 5781, . . . } were so
simple to understand, that they did not deserve any special explanation or
axiomatization. “You just count them” and get the result. Some ideolog-
ical inconvenience was caused by the fact that the set of natural numbers
is infinite, which means that in any natural language we will run out of
words to name large enough numbers. Archimedes in his Ψαμμίτης (“The
Sand Reckoner”) showed that the danger with names can be staved off for quite
some time and suggested a system of constructing names sufficient to talk
about astronomic numbers.
Remark 1.8. From the modern algebraic point of view the set N is a semigroup
with respect to the operation of addition +. By definition, this means that N is
closed under this binary operation which is associative, a + (b + c) = (a + b) + c,
and commutative, a + b = b + a. However, N is not a group: the neutral element
0 is not in N (although this is a matter of convention) and there is no well-defined
inverse operation: the equation a + x = b with a, b ∈ N is not always solvable with
x ∈ N. If it is solvable, we say that a < b (or b > a). In other words, N is an ordered
semigroup.
Remark 1.9. There is another familiar operation on N, called multiplication:
the product of two numbers is denoted by a×b, a·b or simply ab. In the specific case
of N multiplication is not an independent operation: the product can be defined
through addition, using the inductive rule
∀a ∈ N a · 1 = a, a · (b + 1) = (a · b) + a, b = 1, 2, 3, . . . . (4.1)
From this definition it follows that · is also a commutative associative binary oper-
ation on N which is related to the addition by the distributive law
a · (b + c) = (a · b) + (a · c).
In order to reduce the number of symbols, we often omit the sign · and avoid extra
parentheses, assuming that in their absence · takes precedence over +.
The operation · makes N into another semigroup, and again this is not a group
since the equation ax = b is not always solvable.
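The inductive rule (4.1) is literally executable. As a quick illustration (a sketch of ours, not from the notes; the function name `mul` is hypothetical), here is multiplication on N defined through addition alone:

```python
# A sketch of the inductive rule (4.1): multiplication on N defined
# through addition alone (the function name `mul` is ours).

def mul(a, b):
    if b == 1:
        return a                  # base case: a * 1 = a
    return mul(a, b - 1) + a      # inductive step: a * (b + 1) = (a * b) + a

assert mul(6, 7) == 42
# commutativity and the distributive law come out as theorems:
assert mul(3, 5) == mul(5, 3)
assert mul(4, 2 + 3) == mul(4, 2) + mul(4, 3)
```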
Definition 1.10. A set R with two binary operations + and · is called a
(commutative) ring, if it is a commutative group with respect to + (with the neutral
element denoted by 0 and the inverse operation denoted by −), the operation · is
commutative and associative, and the distributive law relates these two operations.
A ring R is called a field, if R ∖ {0}, R without the zero, is a group with respect
to ·, that is, any equation a · x = b is solvable as long as a ≠ 0.
Rational numbers are “new”, non-natural numbers, extending the “old”
natural numbers. They were introduced out of commercial needs: in modern
terms, the equations of the form nx = m with natural n, m required solutions
even for the cases where m was not divisible by n without remainder.
These new numbers were not very user-friendly for many reasons: ma-
nipulations with them required sophisticated rules. They are unsuitable for
counting, since there is no “next rational number” following the fraction
m/n, and between any two different rational numbers there are infinitely
many other numbers. Finally, one should always be aware that there is
no unique way to represent them: the fractions m/n and 2m/2n mean the
same. Yet the advantages of using them were too obvious to be outweighed
by these complications.
So what is, at the end, the set of rational numbers, denoted by Q?
Today’s answer is simple.
Theorem 1.4. The set of rational numbers Q is the minimal field containing
the natural numbers N, such that there is a transitive order > on Q
which agrees with the arithmetic operations and with the natural order on N.
There exists a “constructive” way to introduce the rational numbers as
pairs (m, n) of integers, but this is outside the scope of this course.
Remark 1.10. Besides being transitive (as any order by definition),
this order is complete: for any two numbers a, b ∈ Q, we have a trichotomy:
either a < b, or b < a, or a = b, all three cases mutually exclusive.
In addition, the order > agrees with the field operations +, ·, in the sense
that for any a > b and any c, we have a + c > b + c and ac > bc for c > 0.
We say that Q is an ordered field.
Problem 1.14. Write down a complete list of all axioms of an ordered field.
Start your solution as follows:
A set X with two binary operations + and · and a binary rela-
tion < is called an ordered field, if the following holds:
(1) ∀a, b ∈ X there exists a unique element of X, denoted by a + b;
moreover, ∀a, b, c ∈ X, a + (b + c) = (a + b) + c,
(2) ∀a, b ∈ X a + b = b + a,
(3) there exists an element in X denoted by 0 such that ∀a ∈ X
a + 0 = a,
(4) ∀a ∈ X there exists a unique element b ∈ X such that a +
b = 0. This element is denoted⁵ by −a, and we abbreviate
the expression a + (−b) to a − b.
(5) . . . . . . . . .
4.2. Numbers as lengths. The Greeks found a geometric way to deal
with fractions, using the notion of similarity, or proportionality of lengths.
Given a segment µ = [A, B] ⊂ Π of length |µ| = dist(A, B) > 0, it is
immediate how to construct a segment of length n|µ|, n ∈ N times the length
of µ: it is enough to stack n copies of µ back to back, making sure that
all of them belong to the same line through A and B. To construct a segment
of length |µ|/n, in geometric terms, to divide a segment into n equal parts,
one can use the Thales theorem, see Fig. 7.
Figure 7. Thales theorem
⁵One may introduce the subtraction independently as a binary operation − and list
conditions which define it uniquely relative to the addition +.
4.3. Units of measurement. We have already noticed that together with any
distance dist(·, ·) satisfying the axioms of a metric space, any multiple
dist′ = λ · dist, λ > 0, will again be a distance, and (a tedious but obvious check)
with respect to this “scaled” distance we will have the same segments, circles, lines,
etc.
To avoid such trivial multiplicity, we need to add an extra condition to the
definition of the distance: choose a pair of points such that the distance between
them would be declared equal to 1. This amounts to the choice of the measurement
unit.
Any nontrivial segment µ = [A, B] ⊆ Π can be assigned the length equal to 1.
One cannot live with a completely arbitrary choice, so different cultures standardized
different measures (foot, yard, furlong, mile, nautical mile, cable length, fathom,
inch, . . . in the British culture alone). Typically, these units (used to measure
lengths at different scales) were multiples of each other, thus having a common
measure. Yet in other cultures (Chinese, continental, etc.) other units (e.g.,
derivatives of the meter) are used, and the conversion coefficients between such
systems are far from simple rational numbers.
Having made such a choice, we can talk about lengths (of certain segments) as
natural or rational numbers. Will it be sufficient to measure every distance on the
plane?
4.4. Irrational lengths. Pythagoras discovered a shocking fact: the
length of the diagonal of a unit square is not a rational number! But if not
rational, then what sort of number is it? Much later the Greeks discovered
one more problem: they could not determine the length of the circumference of
a unit circle (a circle of radius 1). Unlike the diagonal, they could not prove
that this circumference is not rational, which puzzled them enormously.
As we now know, the puzzle could be resolved by two opposite
approaches. One was to rethink the concept of a number and extend
the number system by adding “irrational” numbers. This is the idea to
which Archimedes (who lived about a hundred years after Euclid and three
centuries after Pythagoras) came quite close, but developing his thoughts
would require working with infinity. And the Greeks abhorred (very wisely)
infinity, much more than sailing between Scylla and Charybdis!
Instead the Greeks followed the “human practice”. There would be
nothing extraordinary if together with the basic length 1 we had an extra
unit, the length of the diagonal of the unit square, in the same way as two
currencies can be simultaneously used for trade. As long as we remember
how they are related, we can measure distances in a mix of the two units in
the same way as we can measure them in a mix of meters and feet; this might
be a bit awkward but completely clear. However, in this case the exchange
rate was defined not by a rational number (or a finite decimal fraction, as
in today’s bank calculations), but rather in geometric terms as the ratio of
diagonal to side in a square.
This had a logical price. The Greeks decided to give up attempts to
understand what an irrational number is, how many such beasts there are,
and how one can harness them for human purposes. Instead they constrained
themselves to numbers that can be constructed geometrically, using
only ruler and compass.
Today the algebraists, sophisticated in abstract constructions, would say
that the Greek number universe consisted of quadratic irrationalities, that
is, all (real) numbers which can be constructed from Q by repeated arithmetic
operations and square root extraction from nonnegative numbers,
something like
√(7 + √(3 + √(11 + √(21 + · · · )))).
It is a tiny fraction of the set of real numbers R, and the number π is not
there (it is transcendental, but this was proved only in the 19th century).
4.5. Absolute value. Rational numbers as a metric space. Having
the order < on Q allows us to define a very important function | · | : Q → Q,
called the absolute value:
|a| = a if a ≥ 0,  |a| = −a if a < 0.  (4.2)
It satisfies the obvious properties:
(1) ∀a, |a| ≥ 0, and |a| = 0 ⇐⇒ a = 0,
(2) ∀a, b, |ab| = |a| |b|,
(3) ∀a, b, |a + b| ≤ |a| + |b|.
The last property is also called the triangle inequality, although there are
no triangles in Q.
Problem 1.15. Prove these properties directly from the definition of
the absolute value (4.2).
Theorem 1.5. The set Q becomes a metric space, if we define the dis-
tance between the “points” (rational numbers) as
dist(a, b) = |a − b| = |b − a|. (4.3)
Problem 1.16. Prove this Theorem!
4.6. Oriented length. The set Q as a metric space is so narrow, so to speak,
that one can introduce on this space an oriented distance. Unlike the general
distance, this is a function that satisfies a different set of axioms.
Definition 1.11. An oriented distance is a function which assigns to any
pair of points A, B a numeric value dist(A, B) and satisfies the following set of
axioms:
(1) dist(A, B) = 0 if and only if A = B; no nonnegativity is assumed.
(2) Antisymmetry: dist(A, B) = − dist(B, A);
(3) The “triangle equality”: for any three points A, B, C we have dist(A, B)+
dist(B, C) = dist(A, C).
Problem 1.17. Prove that any oriented distance on Q is one of the two func-
tions, dist(A, B) = ±(A − B).
Remark 1.11. The “triangle equality” can be formulated in a more symmetric
form:
dist(A, B) + dist(B, C) + dist(C, A) = 0.
In this form it can be generalized for any number n of points, including the axiom
of antisymmetry in the case n = 2.
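As a sanity check (a sketch of ours, not part of the notes; the name `odist` is hypothetical), the function dist(A, B) = B − A satisfies all three axioms of Definition 1.11, including the symmetric cyclic form above:

```python
import itertools

# A sketch: odist(A, B) = B - A is an oriented distance in the sense of
# Definition 1.11 (the name `odist` is ours); we verify the axioms on a sample.

def odist(A, B):
    return B - A

pts = [-2, -1, 0, 1, 3, 7]
for A, B, C in itertools.product(pts, repeat=3):
    assert (odist(A, B) == 0) == (A == B)                # axiom (1)
    assert odist(A, B) == -odist(B, A)                   # antisymmetry
    assert odist(A, B) + odist(B, C) == odist(A, C)      # triangle equality
    assert odist(A, B) + odist(B, C) + odist(C, A) == 0  # cyclic form
```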
4.7. What is wrong with the metric space Q? We have already
seen that the rational numbers do not suffice to measure all lengths on the
Euclidean plane Π. Can we formulate a more general complaint about the
metric space Q which would explain why it is not suitable for measurements?
Assume we have an infinite sequence of points a₁, a₂, . . . , aₙ, · · · ∈ Q,
such that the distances between them decrease fast, for instance,
|aₖ − aₖ₊₁| ≤ 10⁻ᵏ, k = 1, 2, 3, . . . . We would expect then that this sequence will
“accumulate” to some point a∗ ∈ Q in the sense that the distances |aₖ − a∗|
eventually become smaller than any positive number ε > 0 (we call
such a point a∗ the limit of the sequence {aₖ}).
Sometimes this is indeed the case: for instance, the sequence of finite
decimal fractions
0, 0.1, 0.11, 0.111, . . . , 0.11. . .1 (n units), . . .
clearly accumulates to the infinite decimal fraction 0.1111 . . . , which is equal
to 1/9 and hence is a rational number, that is, a point in Q (why?).
On the other hand, if we consider the sequence of decimal approximations
to the irrational “number” √2,
1, 1.4, 1.41, 1.414, . . . ,
then this sequence accumulates (in the sense of the distance) to something
that is certainly not rational. Such examples in fact occur overwhelmingly
often: any aperiodic sequence of decimal digits gives an example of a se-
quence of points that converges to a “hole” in the space Q. Such holes
are everywhere: the sets of positive rational solutions x ∈ Q, x > 0, of
the inequality x² < 2 and of the complementary inequality x² > 2 are
disjoint and “back to back”, yet there is no rational number
that separates these sets.
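Such a “hole” can be watched numerically. The following sketch (not from the notes; the use of Heron's iteration is our choice) produces exact rational numbers whose squares squeeze down to 2, although the sequence has no limit inside Q:

```python
from fractions import Fraction

# Heron's iteration x -> (x + 2/x)/2 stays inside Q (exact rational
# arithmetic), and the squares approach 2 from above; yet the sequence
# converges to a "hole": no rational number has square 2.

x = Fraction(2)
for _ in range(6):
    x = (x + 2 / x) / 2

assert isinstance(x, Fraction)          # still an exact rational number
assert x * x > 2                        # squeezing down to the hole from above
assert x * x - 2 < Fraction(1, 10**10)  # ...and already extremely close to it
```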
In such and similar cases we sadly acknowledge that the metric space Q
with the distance dist(a, b) = |a − b| is not complete. What is completeness?
4.8. Completeness (can be skipped on first reading). The accurate
definition of completeness of a metric space is logically rather involved. Fortunately,
we can replace it by a nonstandard, more transparent definition.
Definition 1.12. A sequence of points a₁, a₂, a₃, · · · ∈ X of a metric space X
is called progressive, if dist(aₙ, aₙ₊₁) ≤ 10⁻ⁿ for all n = 1, 2, 3, . . . .
In the same way as we instantly see that the geometric progression with the
denominator 1/10,
1, 0.1, 0.01, 0.001, . . .
converges, we expect that any other progressive sequence should converge unless
X has some “holes”.
Remark 1.12. The choice of the denominator 1/10 in Definition 1.12 is
obviously tailored to be convenient for use with “infinite decimal fractions”. Any
other denominator (e.g., 1/2) will lead to the same outcome.
Definition 1.13. A metric space X is called complete, if any progressive se-
quence {an } ⊆ X in it converges to a limit a∗ = lim an ∈ X.
In Mathematics there is a standard procedure of “completion”, which allows
one to construct from any (not necessarily complete) metric space X a larger
space X̄ ⊇ X which is complete. Yet this is too far from our main course.
4.9. Real numbers, Euclidean lines. What should we do to cure
the incompleteness of Q? The accurate construction of what we today call
the real numbers belongs to another area of Mathematics (Analysis) and is
properly discussed there. In the course of Geometry we will reap the fruits
of this work and use without a formal proof the following result.
Theorem 1.6. There exists a unique set denoted by R and called the set
of real numbers, which is an ordered field containing the field Q, whose
operations and order extend those of Q (including the absolute value), and
which is complete with respect to the corresponding metric (4.3).
The following theorem was not in Euclid’s books for obvious reasons
(the Greek abhorrence of infinite constructions), but today we can make
an accurate axiomatization of the Euclidean geometry and prove, after long
efforts, the following “obvious” result.
Remark 1.13. The procedure of completion can be applied to the set
Q with the “oriented distance” as in Definition 1.11. The result will be an
“oriented line” R, on which the oriented distance from A to B is equal to
B − A.
5. Geometry of the real line
There cannot be a duller “theory” than the geometry in one dimension.
However, it will be useful in the future.
Problem 1.18. Prove that the parallel translations
f : R → R, x ↦ f(x) = x + a, a ∈ R, (5.1)
are isometries of the metric space R.
What about the maps
x ↦ ax, a ∈ R; (5.2)
for which values of a are they isometries?
Example 1.6. The mirror symmetry
x ↦ −x
is an isometry of R.
Theorem 1.7. There are only two classes of isometries of the real line R:
(1) x ↦ x + a, a ∈ R,
(2) x ↦ −x + a, a ∈ R.
Proof. We need to describe all functions f : R → R which satisfy the
rule
|f (y) − f (x)| = |y − x|
for all x, y ∈ R. For that we have to treat different possibilities of signs
when expanding the absolute value. First, we can always assume that x < y
(otherwise just relabel the points). Then the right hand side is simply y − x.
If f(y₀) > f(x₀) for just one such pair x₀ < y₀ of points, then from injectivity it
follows that f(y) > f(x) for all y > x (supply the missing details!), therefore
our function f would satisfy the equation
∀x < y  f(y) − f(x) = y − x.
Denote by a the common value f(y) − y = f(x) − x, which is a constant (does
not depend on x, y). Then we see that
∀y ∈ R  f(y) = y + a.
If, on the contrary, f(y) < f(x) for all x < y, then
the “mirrored” function g(x) = −f(x) satisfies the condition g(x) < g(y)
for all x < y. By the above argument, g(x) = x + a′, hence f(x) = −x + a,
a = −a′.
Definition 1.14. An isometry of R is said to be direction preserving,
if it is of the first type, i.e., if f (x) is an increasing function of x ∈ R.
Otherwise it is called direction inverting.
5.1. Translation invariant additive metrics on R. One can invert
the problem and ask the following question: which metrics are possible on
R? How many different metrics can one put on the real line that would
satisfy the axioms of symmetry, nonnegativity and the triangle inequality?
The unexpected answer is that there are lots of non-equivalent metrics,
which can be obtained by the same trick. Let h : R → R be any continuous
strictly increasing function on the real line. Then the function
d(x, y) = |h(y) − h(x)|, x, y ∈ R,
will be a metric! Moreover, if the image of h is bounded, e.g., when
h(x) = arctan x, then this metric will possess the surprising property that
the “diameter” of R, the maximal possible distance between points in the
sense of this metric, will be finite (π in the above example).
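This can be tested numerically; in the sketch below (our own function names, not part of the notes) h = arctan, and the resulting distances never exceed π:

```python
import itertools
import math

# A sketch: d(x, y) = |h(y) - h(x)| with h = arctan is a metric on R
# whose "diameter" is finite (equal to pi); we check the axioms on a sample.

def d(x, y):
    return abs(math.atan(y) - math.atan(x))

pts = [-1e6, -3.0, -0.5, 0.0, 0.5, 3.0, 1e6]
for x, y, z in itertools.product(pts, repeat=3):
    assert d(x, y) == d(y, x)                        # symmetry
    assert d(x, z) <= d(x, y) + d(y, z) + 1e-12      # triangle inequality
assert all(d(x, y) > 0 for x in pts for y in pts if x != y)
assert d(-1e6, 1e6) < math.pi                        # bounded "diameter"
```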
However, a single additional condition makes the distance unique (up
to rescaling, as usual).
Definition 1.15. We say that a metric dist : R × R → R is translation
invariant, if dist(x, y) = dist(x + a, y + a) for any x, y, a ∈ R.
Definition 1.16. A symmetric nonnegative function d : R × R → R is
called additive, if it satisfies the following condition: for any three points
x, y, z ∈ R such that y lies between x and z, the equality d(x, z) = d(x, y) +
d(y, z) holds.
This equality is the degenerate case of the triangle inequality which holds
for points on the same Euclidean line ℓ ⊆ Π.
Geometrically this means that if we stack two segments so that the right
end of one coincides with the left end of the other, as [A, B] and [B, C], then
the length of their union [A, C] = [A, B] ∪ [B, C] is the sum of the lengths
of the segments (this explains the name)⁶.
⁶Actually, here we implicitly involve two interpretations of the expression |y − x|
on the real line. One is the distance, which makes R into a metric space. The other
interpretation is to treat |y − x| as the 1-dimensional measure (volume) of the segment
[x, y]. The two notions are closely related, but certainly different when we consider shapes
in more than one dimension.
Theorem 1.8. Any translation invariant additive metric on R is of the
form dist(x, y) = λ|y − x| for some λ > 0.
Lemma 1.9. Any continuous function f : R → R which satisfies the law
∀x, y ∈ R  f(x + y) = f(x) + f(y)
is linear: f(x) = λx for some λ ∈ R.
Proof of the Lemma. First, we instantly see that necessarily f(0) =
0 and hence f(−x) = −f(x). Denote f(1) by λ ∈ R. Then
f(n) = f(1 + · · · + 1) = f(1) + · · · + f(1) = λn  ∀n ∈ N
(with n summands in each sum). By the same argument,
m · f(1/m) = f(m · 1/m) = f(1) = λ, i.e., f(1/m) = λ/m,
so that f(n/m) = λ n/m. Together this means that f(x) = λx for all rational
values x ∈ Q. By the continuity assumption, f(x) = λx for all real values x ∈ R.
Proof of the Theorem. Consider the function f(x) = dist(0, x) of
one variable x ∈ R. Take any x, y > 0; then x lies between 0 and x + y, and
f(x + y) = dist(0, x + y) = dist(0, x) + dist(x, x + y) = f(x) + f(y)
by additivity, the definition of f, and the translation invariance dist(x, x + y) =
dist(0, y).
By Lemma 1.9, we see that
f (x) = λx, hence dist(x, y) = dist(0, y − x) = λ(y − x) ∀0 < x < y.
It remains to notice that, again by translation invariance,
f(−x) = dist(−x, 0) = dist(−x + x, 0 + x) = dist(0, x) = f(x) for x > 0,
hence for x < 0 we have f(x) = λ(−x) = λ|x|. Since the distance between
distinct points is strictly positive, λ itself must be strictly positive.
Remark 1.14. Similar arguments show that the oriented distance is of
the form λ(y − x), λ > 0.
6. Angles
In a surprising way, the “calculus” of angles differs from the “calculus”
of lengths in two essential features: first, unlike the case of lengths, there
is a natural and universal unit of angle measurement. The second fact is
also simple but somewhat unexpected: the angles are not measured by real
numbers!
6.1. Definitions in the Euclidean plane Π: preliminary discus-
sion. Let us recall the familiar definitions of an angle on the Euclidean
plane Π.
A (geometric) ray in Π is one of two half-lines, into which a line is
subdivided by any of its points, called the vertex. Unlike the lines that
always admit two possible directions, the choice of direction along a ray is
not symmetric anymore: driving one way, you get stopped by the end point,
while driving the other way is unconstrained.
Any segment [A, B] defines four rays: if we consider rays with the vertex
at A, then there are two possibilities: the half-line through A and B which
contains B, or the other half-line which does not contain B. The same is
true for rays with the vertex at B.
In the future we will use the notation “the ray [A, B]” to denote one of
these rays, namely, the ray with the vertex at A and passing through B.
Note that the rays [A, B] and [B, A] are quite different, in a sense, opposite.
Figure 8. Interior angles of triangle
An angle is a part of the plane, bounded by two different rays with the
common vertex. Note that any two such rays break the plane into two parts,
and one has to indicate explicitly, which of them is considered!
Remark 1.15. What happens if the two rays coincide? Formally one
could ignore such a situation and not call it an angle. Yet it turns out that
it is much more useful to allow this degenerate case and call it the
zero angle.
This ambiguity does not bother us as long as we consider only the angles
in triangles. Indeed, for any triangle △ABC the rays [AB] and [AC] divide
the plane into two parts, such that only one of them contains the interior
points of the triangle. We call this angle the interior angle of a triangle at
the vertex A.
Remark 1.16. One can explain the difference between the “smaller”
and “larger” parts of the plane bounded by two rays, in terms of convexity.
Except for the case where the three distinct points A, B, C lie on a straight
line, the complement in Π to the union of rays [AB] and [AC] consists of
one convex part and one non-convex part. In the exceptional case, both
parts are convex, but the “angles” are equal: a rotation of the plane Π with
center at A brings one of the angles to the other.
6.2. Angles and arcs. Let C ⊂ Π be a circle of any finite radius with
the center at a point O. Then any ray with the vertex at O crosses the circle
C at a unique point. A pair of rays gives us two points A, B ∈ C which
in turn define two circular arcs on C with endpoints at these points. One
can easily see that the “convex” angle between the two rays is supported by
the shorter arc of the circle. This choice is independent of the radius of the
circle.
A circle on the Euclidean plane Π naturally inherits the Euclidean distance:
for two points A, B ∈ C, distΠ(A, B) = |[A, B]|, the length of the
chord with endpoints A, B. One need not even bother checking the axioms
for this distance (why?). Moreover, this distance is rotation invariant:
if f : C → C is a rigid rotation of the plane with the center at O, then
dist(f (A), f (B)) = dist(A, B) for any two points A, B.
We shall see in a moment, why this distance is not the most convenient
one to measure the angles.
Figure 9. Euclidean distance on the circle C
6.3. How do we measure angles? We want to find a natural function
that would allow us to assign to any angle with a vertex at a fixed point O ∈ Π
a numeric measure ∠AOB, and we have a list of a priori properties that would
be desirable. Having acquired some experience with distances and lengths,
we would like to reduce the problem to that for pairs of points on a circle C.
Actually, all circles of different radii are equivalent for what will be discussed
below, so we reduce the degree of freedom by choosing the unit circle
U = {A ∈ Π : dist(O, A) = 1} ⊆ Π.
Next, we will tacitly sweep under the carpet an accurate definition of lengths of
arcs on U.
Remark 1.17. This definition requires transition to limits: from the
definition of distance we can only deduce the lengths of straight segments
and of their unions, polylines (piecewise straight continuous curves on Π). The
circumference is not straight even in its smallest part, so we need to work
out what happens, say, with perimeters of regular n-gons inscribed into U
as n → ∞. This is a separate and independent subject, which we will not
discuss here.
Instead we accept that the length of the unit circumference is equal to 2π.
This is the definition of the number π. Equally, we could declare⁷ that the
length of the unit circle is 360◦ . This will be the definition of the degree ◦ .
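To make the number 2π tangible without a full theory of arc length, one can compute perimeters of inscribed regular n-gons by doubling the number of sides; the side-halving formula below follows from the Pythagorean theorem (a Python sketch of ours, not part of the notes):

```python
import math

# Perimeters of regular n-gons inscribed in the unit circle U, obtained by
# doubling the number of sides starting from the square (side sqrt(2)).
# If s is the side of the inscribed n-gon, the 2n-gon has side
#   s' = sqrt(2 - sqrt(4 - s*s)) = s / sqrt(2 + sqrt(4 - s*s)),
# where the second form is numerically stable for small s.

n, s = 4, math.sqrt(2.0)
for _ in range(20):                # 20 doublings: n = 4 * 2**20 sides
    s = s / math.sqrt(2.0 + math.sqrt(4.0 - s * s))
    n *= 2

perimeter = n * s                  # approaches the circumference 2*pi
assert abs(perimeter - 2 * math.pi) < 1e-9
```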
Note that unlike the real line, a pair of points A, B ∈ U defines not
one, but two different arcs on U, one longer, one shorter (except for the
degenerate case where A, O, B lie on the same line, in which case the arcs
are equal in any sense). We need to find a consistent way to make the choice.
As a consequence of this ambiguity, on the circle we cannot use the intuitive
notion “to lie between”. Indeed, for any three distinct points A, B, C ∈ U
one can always assume that C lies between A and B (choosing one of the
two possible arcs).
⁷For any other circle C of radius r > 0 the length of the circumference will be
proportional to r. This means that the angular measure is indeed associated with the
angle rather than the arc-length, if we consider the ratio of the arc-length to the radius of
the circle. Choosing C = U simply makes the denominator equal to 1 for convenience.
Figure 10. Addition of angles
Having these subtleties in mind, we still can assemble a list of customer
demands/questions that we might consider in legislating the angle measurement
laws.
(1) Rotations of the plane Π map U into itself. Asking for the rotational
invariance of the angular measure is a very modest requirement
under any legislation.
(2) Continuity. Any definition of ∠AOB (resp., of the length ⌢AB of
“the” arc AB on U) should be stable under small variations of the
position of each endpoint A, B.
(3) Symmetry or asymmetry: we can adopt either the symmetry axiom
⌢AB = ⌢BA or its “oriented” version (cf. the oriented distance)
⌢AB = −⌢BA.
(4) Additivity: given two arcs ⌢AB and ⌢BC with a common endpoint
B ∈ U, we would love to have the identity
⌢AB + ⌢BC = ⌢AC, (6.1)
cf. Fig. 10.
6.4. Mutually incompatible demands. One can easily see that not
all of these demands are compatible with each other.
One natural way would be to treat U ⊆ Π as a metric space with the
Euclidean metric inherited from Π by setting
⌢AB = |[A, B]|,
where the right hand side is the Euclidean length of the segment [A, B] ⊆ Π,
ignoring the fact that except for the endpoints, this segment lies outside U.
This “arc-length” will be, of course, symmetric, nonnegative and rotationally
invariant, but additivity will be lost completely. The identity (6.1) will fail
for any three distinct points (by the strict triangle inequality).
Figure 11. Violation of additivity
A better choice would be to denote by ⌢AB the shortest of the two arcs
with endpoints A, B ∈ U and let ⌢AB be its arc-length. This will be a
nonnegative and symmetric function, and one can see that it will be additive,
at least for short arcs. More accurately, for this choice the identity (6.1)
holds for all three points such that B ∈ ⌢AC, that is, assuming that B “lies
between” A and C somewhere on the shortest way from A to C. But this
additional constraint is unstable under small perturbations.
Indeed, assume that A and B are fixed and C slowly moves along U,
increasing the arc-length ⌢AC. When the latter becomes equal to π, the
shortest path from A to C suddenly jumps off B, invalidating (6.1) (see
Fig. 11).
Yet these minor problems do not make this construction meaningless.
Definition 1.17. For any two points A, B ∈ U the geometric angle
∠AOB is the angle supported by the shortest of the two arcs on U with
endpoints A and B. The angular measure of the geometric angle is the
length of this arc.
By this definition, all geometric angles (measured by the lengths of the
corresponding arcs) lie between 0 and π and satisfy a restricted
form of additivity (as long as the sum of the angles is less than π = 180◦
and they are attached “outwards”, see Fig. 10).
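In coordinates this restricted additivity is easy to probe. In the sketch below (our own parametrization and names, not from the notes) points of U are given by a polar angle, and geom(a, b) is the length of the shortest arc:

```python
import math

# A sketch: points of U given by their polar angles; geom(a, b) is the
# angular measure of the geometric angle, i.e. the length of the
# SHORTEST of the two arcs between the points.

def geom(a, b):
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

# (6.1) holds while B lies on the short way from A to C...
A, B, C = 0.0, 0.5, 1.2
assert math.isclose(geom(A, B) + geom(B, C), geom(A, C))

# ...but fails once the arcs get long (sum of angles exceeding pi):
A, B, C = 0.0, 2.0, 4.0
assert not math.isclose(geom(A, B) + geom(B, C), geom(A, C))
```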
However, this is not the only possibility. We can follow the paradigm of
the oriented line (cf. Definition 1.11). Similarly to lines on the plane,
any circle C ⊆ Π with any center can be oriented. Unlike the lines, for which
the “direction of traffic” can be chosen arbitrarily, the orientation of circles
can be defined in “absolute terms”⁸.
⁸Actually, this definition depends on the orientation of the plane Π; we will talk about
this later.
When we travel along a circle C ⊆ Π, the two parts of Π ∖ C are
asymmetric: one (the inside) is bounded, the other (the outside) is unbounded.
We say that the travel is in the positive direction (counterclockwise), if the inside
is to the left of the traveller and the outside to the right⁹.
Definition 1.18. The oriented arc ⌢AB ⊆ U is the arc which connects
A to B in the positive direction, that is, B is obtained from A by a counterclockwise
rotation.
Definition 1.19. The oriented arc-length from A to B is the length
of the oriented arc ⌢AB. The oriented angular measure of ∠AOB is the
arc-length of the oriented arc ⌢AB.
By this definition, the oriented angular measure is a real number between
0 and 2π (the length of the full circumference of U).
However, this definition immediately contradicts the additivity assump-
tion, even for very small angles.
Example 1.7. Consider three points A, C, B ∈ U listed in their order
in the counterclockwise direction, see Fig. 12. Then the identity (6.1) is very
strongly violated. Indeed, the oriented arc-lengths ⌢AB and ⌢AC are small,
but the oriented length of ⌢BC is close to the full turn 2π, which makes the
identity impossible.
To save the identity (6.1), we would have to assign to the arc ⌢BC the
negative value −⌢CB of the oriented length, in the same way as we assigned
negative lengths to segments of the real line R oriented negatively (against
the selected orientation of R). But then we face a problem: the same
arc ⌢BC would have to be assigned two different values of the oriented length:
one a small negative number, the other a positive number slightly less than 2π.
How can one accommodate such multivaluedness of the oriented length?
The answer is,—replace the measurement scale.
9 The difference between left and right relative to a traveller is exactly the question
of how the orientation on the plane Π is chosen.
6.5. Digression: Quotient group. Recall that R as an algebraic
structure has two operations, + and ·, and with respect to the first operation
it is a commutative group. The set Z, or, for any real number λ ≠ 0, the set
λZ = {λx : x ∈ Z}, is a (commutative) subgroup of it.
With this in mind, we will denote by a + Z (resp., a + λZ) the set of
points {a + x : x ∈ Z} (resp., {a + λx : x ∈ Z}). This is an infinite (in both
directions) arithmetic progression with the difference 1 (resp., λ ≠ 0)
containing the number a.
If we have two such progressions, say, a + Z and b + Z, then these sets
coincide if and only if a − b ∈ Z, that is, when a and b have the same
fractional parts.
The idea of a quotient is to forget about the integer parts and focus
only on the fractional parts. We can define on the set of progressions as
above the operations of addition/subtraction by setting
(a + Z) + (b + Z) = (a + b) + Z, (a + Z) − (b + Z) = (a − b) + Z. (6.2)
Problem 1.19. What is the neutral "element" for this addition?
Prove that this operation + makes the set of progressions into a commutative
group.
Problem 1.20. Why can we not define the multiplication of "elements"
by the similar formula
(a + Z) · (b + Z) = (a · b) + Z? (6.3)
However, we can define multiplication by −1 ("change of sign"). How? Why
does it work?
The set of progressions as above is called the quotient group and denoted
by R/λZ. Elements of this group are sometimes denoted by
a mod λ or a mod λZ instead of a + λZ (remember the notation for integer
residues, e.g., 7 ≡ 1 mod 2).
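For readers who like to experiment, the rules (6.2) can be tried out in code. The following Python sketch (the class name QuotientR and the choice of the canonical representative in [0, λ) are ours, not part of the text) models elements a + λZ of R/λZ:

```python
import math

class QuotientR:
    """Element a + lambda*Z of the quotient group R/(lambda*Z).

    A sketch for illustration: the canonical representative is chosen
    in [0, lam), so two numbers define the same element exactly when
    their representatives agree (i.e., when a - b is in lam*Z).
    """
    def __init__(self, a, lam=1.0):
        self.lam = abs(lam)
        self.rep = a % self.lam      # canonical representative in [0, lam)

    def __add__(self, other):
        # (a + lam*Z) + (b + lam*Z) = (a + b) + lam*Z  -- formula (6.2)
        return QuotientR(self.rep + other.rep, self.lam)

    def __sub__(self, other):
        return QuotientR(self.rep - other.rep, self.lam)

    def __neg__(self):
        # multiplication by -1 ("change of sign") is well defined
        return QuotientR(-self.rep, self.lam)

    def __eq__(self, other):
        d = abs(self.rep - other.rep)
        return math.isclose(d, 0.0, abs_tol=1e-12) or \
               math.isclose(d, self.lam, abs_tol=1e-12)

# 0.7 and 2.7 define the same progression: they differ by an integer
assert QuotientR(0.7) == QuotientR(2.7)
# the neutral element of the addition is 0 + Z
assert QuotientR(0.7) + QuotientR(0.0) == QuotientR(0.7)
```

Only the fractional part of a survives in the representative, which is exactly the "forgetting of integer parts" described above.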
Remark 1.18. The quotient groups appear in a minimal disguise in
another familiar situation. Recall that the decimal logarithm lg possesses
the property that lg(10^n x) = n + lg x for any n ∈ Z and any x > 0. This
property allows one to assemble the table of decimal logarithms as follows. By
shifting the decimal point in any number y ∈ R∗ (the set of positive real
numbers), we can represent y in the form y = 10^n x, where 1 ≤ x < 10. The
decimal logarithm lg x is then a number between 0 and 1 (the mantissa), and
to restore the logarithm lg y, one has to add the integer n (the characteristic).
This "algorithm" means that we have two groups, (R, +) (the additive
group of real numbers) and (R∗, ·), the multiplicative group of positive
numbers. The sets Z ⊆ R and 10^Z = {. . . , 0.01, 0.1, 1, 10, 100, . . . } are
subgroups of these two groups, respectively. The "table of decimal logarithms"
is in fact the table of the mantissa map lg∗,
lg∗ : R∗/10^Z → R/Z.
Its advantage is that both the domain and the range of lg∗ are compact
bounded sets, in some sense equivalent to the unit circle U, hence for any
given precision the table will be of finite size. This is in general not true if
we would like to tabulate an arbitrary function from R∗ to R (or from R to
R).
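The mantissa/characteristic bookkeeping of this Remark is easy to reproduce numerically. In the Python sketch below the function name characteristic_and_mantissa is ours; the characteristic is the integer part of lg y and the mantissa its fractional part:

```python
import math

def characteristic_and_mantissa(y):
    """Split lg y into the characteristic (an integer) and the mantissa
    (a number in [0, 1)), the way log tables are organized."""
    lg = math.log10(y)
    n = math.floor(lg)          # characteristic
    return n, lg - n            # mantissa

# 314 = 10**2 * 3.14, so 3.14 and 314 share the same mantissa,
# while their characteristics differ by the shift of the decimal point.
n1, m1 = characteristic_and_mantissa(3.14)
n2, m2 = characteristic_and_mantissa(314.0)
assert (n1, n2) == (0, 2)
assert math.isclose(m1, m2, abs_tol=1e-9)
```

The mantissa depends only on the class of y in R∗/10^Z, which is why a finite table suffices for all positive numbers.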
6.6. Geometric interpretation of oriented arc-length. In application
to measuring oriented arcs, passing to the quotient group means that
together with an oriented arc AB ⊆ U we consider all arcs which can be
obtained from it by adding any number of full turns (counterclockwise), that is,
we treat the "null arc" AA as indistinguishable from the full circular arc U
starting and ending at the same point A ∈ U. Thus our oriented arcs are
actually equivalence classes of the form AB + nU, where n ∈ Z is an integer.
The length of the circumference is equal to 2π, therefore the oriented
arc-length takes values in the quotient group R mod 2πZ = {s + 2πZ}.
This resolves all contradictions: the right angle (supported by one quarter
of the full circumference) has angular measure π/2, and the same arc with
the opposite direction must be either −π/2 or 3π/2: as real numbers, these
answers are different, but as elements of the quotient group R mod 2πZ they
are the same:
0 = π/2 + (−π/2) = π/2 + 3π/2 = 2π = 0 (mod 2πZ).
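A quick numerical check confirms this identification. In the Python sketch below the helper oriented_measure is our name for the canonical representative of s + 2πZ in [0, 2π):

```python
import math

def oriented_measure(s):
    """Canonical representative in [0, 2*pi) of the class s + 2*pi*Z."""
    return s % (2 * math.pi)

# -pi/2 and 3*pi/2 are the same element of R mod 2*pi*Z ...
assert math.isclose(oriented_measure(-math.pi / 2),
                    oriented_measure(3 * math.pi / 2))
# ... and a full turn is indistinguishable from the null arc
assert math.isclose(oriented_measure(2 * math.pi), 0.0, abs_tol=1e-12)
```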
6.7. On the choice of units. We already know that there is no
natural unit for length measurements: any yardstick would work provided
it is the same for all people who do the measurements (the length of a finger
or of a foot varies from person to person). Unlike this, the universal unit for
angular measurements is obvious: it is the full angle. If we use different
circles for arc-length measurements of the same angle (say, the straight
angle), the arc-lengths will be proportional, but the angular measures as
fractions of the full turn will remain the same.
The more practical Babylonians chose another approach: they took as
the unit measure the arc-length of the full circumference of U and divided
the corresponding full angle into 360 equal parts, called degrees.10 In some
respects the Babylonians turned out to be smarter than the Greeks: the "full
period of circulation" is much more natural than the half-period, and the
number 2π occurs in mathematics and physics much more frequently than
π alone. Some people went so far as to suggest a special symbol
ππ ("double pi", or "three-legged pi") as a notation for the constant 2π,
viewed as the more basic one. Yet the Greek tradition is too firmly rooted in math.
10 The number 360 is especially convenient because it has many divisors, and hence
many useful angles are measured by integer numbers of degrees.
Figure 13. Plots of geometric and oriented angle
6.8. Visualization. To compare the two competing definitions, geo-
metric angle and oriented angle (or what is the same, usual arc-length and
oriented arc-length on the unit circle U), we will plot two different graphs.
Choose a point A ∈ U on the unit circle which will stay fixed,
and let B = B(t) ∈ U, t ∈ R, be the point that moves along U with unit
velocity in the counterclockwise (positive) direction, starting from B(0) = A
at the moment t = 0 (negative values of t are also legitimate). How will the
geometric and the oriented angle depend on t?
The answer is given in Fig. 13.
On the first graph, for values of t close to zero we see the familiar
pattern of the absolute value: for t and −t the points B(t) and B(−t) are
symmetric with respect to A, and the geometric arc-lengths are the same and
equal to |t|. But as t approaches the value t = π (half-circle), the
point B(t) approaches the point antipodal to A, and when it crosses the
antipodal position, the arc-length becomes a decreasing function of t. The
function decreases until t reaches 2π, when the distance vanishes. The rest
is periodic with period 2π.
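The two graphs of Fig. 13 are described by explicit formulas; here is a Python sketch (function names ours) of the oriented and the geometric measure as functions of t:

```python
import math

TWO_PI = 2 * math.pi

def oriented_arclength(t):
    # oriented measure: t reduced modulo a full turn, in [0, 2*pi)
    return t % TWO_PI

def geometric_arclength(t):
    # geometric measure: the shorter of the two arcs joining A to B(t)
    s = t % TWO_PI
    return min(s, TWO_PI - s)

# near t = 0 the geometric measure looks like |t| ...
assert math.isclose(geometric_arclength(0.3), 0.3)
assert math.isclose(geometric_arclength(-0.3), 0.3)
# ... it peaks at t = pi (the antipodal point) and then decreases
assert math.isclose(geometric_arclength(math.pi + 0.3), math.pi - 0.3)
```

The sawtooth of the oriented measure and the triangle wave of the geometric one are both visible from these two formulas.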
6.9. On measuring lengths of curves. The question of how we measure lengths
of circle arcs is not that innocent. By definition, the Euclidean length is originally
defined only for (genuine Euclidean) straight segments. Of course, it can be easily
extended to piecewise straight broken lines A0 A1 . . . An by summation,
|A0 A1 · · · An | = |A0 A1 | + |A1 A2 | + · · · + |An−1 An |.
Yet in the case of circular arcs no arc, even a tiny one, is a straight segment.
Thus the above formula cannot be applied directly.
The Greeks tried to approximate the circle by a regular inscribed n-gon and
a regular circumscribed n-gon, whose perimeters they showed to be always related
by an inequality and apparently approaching each other as n becomes larger and
larger. Having no accurate concept of a real number and no concept of a limit, they
had to sweep some delicate arguments under the carpet. Yet they felt that their
beautiful building of rigorous Mathematics was obviously lacking a firm foundation,
so they kept asking each other about "squaring the circle", in modern terms, about
an accurate construction of the number π.
The question (i.e., the impossibility of any meaningful "representation" of the
number π) was finally resolved only relatively recently, by F. Lindemann in 1882,
building on J. Liouville's first examples of transcendental numbers (1844), about
two thousand years after the problem was identified.
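The Greek squeeze is easy to reproduce numerically. In the Python sketch below (function names ours) the perimeters 2n sin(π/n) and 2n tan(π/n) of the regular inscribed and circumscribed n-gons bracket the circumference 2π and approach each other as n grows:

```python
import math

def inscribed_perimeter(n):
    # a regular n-gon inscribed in the unit circle has side 2*sin(pi/n)
    return 2 * n * math.sin(math.pi / n)

def circumscribed_perimeter(n):
    # a regular n-gon circumscribed about it has side 2*tan(pi/n)
    return 2 * n * math.tan(math.pi / n)

for n in (6, 12, 96):        # 96 is the n-gon Archimedes used
    assert inscribed_perimeter(n) < 2 * math.pi < circumscribed_perimeter(n)

# the gap shrinks as n grows, squeezing the value 2*pi in between
assert (circumscribed_perimeter(96) - inscribed_perimeter(96)
        < circumscribed_perimeter(6) - inscribed_perimeter(6))
```

What the Greeks lacked was not the computation but the limit concept justifying that the two sequences converge to a common value.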
But this spectacular success does not absolve us from answering the question:
what is the length of a curve? We will address it later and show that the answer is
given by a (definite) integral, at least for sufficiently well-behaved curves.
6.10. Intermediate summary. We introduced two definitions of angular
measure. The geometric measure is a nonnegative real number
between 0 and π. It has an obvious advantage: the unit circle U, equipped
with the corresponding arc-length, is a metric space with a rotationally
invariant metric which is "locally additive".
The other, the oriented angular measure, takes values not in real numbers,
but rather in elements of the quotient group R mod 2πZ, whose elements
are not numbers, but rather arithmetic progressions in R with the
same step 2π. Apart from the inconvenience caused by this "multivaluedness",
the oriented angle is better in the sense that it exhibits no extra
singularities.
However, there is no universal reason to prefer one measure to the other. For
"geometric" constructions (computation of elements of triangles, etc.) the
geometric definition is apparently more convenient. On the other hand, for
analytic constructions (e.g., computations with trigonometric functions) the
oriented measure is obviously better suited.
Beware!
7. Cartesian plane R2
7.1. Reminder: Cartesian product. The Cartesian product is a
very simple operation which is defined for any two sets. By definition, A×B
is the set of all ordered pairs (a, b) such that a ∈ A, b ∈ B.
Problem 1.21. “Visualize” the Cartesian products Z × Z, Z × R and
R × N.
Problem 1.22. Prove that A × (B × C) = (A × B) × C. This allows us to
write the products A × B × C without any braces.
Remark 1.19. In general, A × B and B × A are different sets. On
the other hand, there always exists the map ι : (a, b) 7→ (b, a), which is a
set-theoretic equivalence. However, this map may not preserve additional
structures on the Cartesian product. Example: the operation −, which
maps R × R → R and sends (a, b) into a − b. The map ι does not preserve
the result of this operation.
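The failure of ι to respect additional structure is easy to exhibit; a minimal Python sketch (the names iota and minus are ours):

```python
def iota(pair):
    """The swap map iota: (a, b) -> (b, a), a set-theoretic equivalence
    between A x B and B x A."""
    a, b = pair
    return (b, a)

def minus(pair):
    """The operation - : R x R -> R, sending (a, b) to a - b."""
    a, b = pair
    return a - b

p = (5.0, 3.0)
assert iota(iota(p)) == p            # iota is its own inverse, a bijection
assert minus(p) != minus(iota(p))    # but 5 - 3 differs from 3 - 5
```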
7.2. The brilliant insight of Descartes.
Definition 1.20. The real plane R2 is the Cartesian product R × R,
that is, the set of all ordered pairs (x, y), where x and y are real numbers.
Why is this insight so remarkable? Because we can use arithmetic
operations with real numbers to impose various conditions jointly on x and y
to describe subsets of the real plane. In particular, there is no problem
explaining what P ± Q and λP are for P, Q ∈ R2 and λ ∈ R. This opens the
way to describing subsets of R2 by algebraic formulas!
Example 1.8. The subsets {(x, y) : x = 0} and {(x, y) : y = 0} are
obviously faithful copies of R and should be considered as straight lines in
R2 , parameterized by the variables y ∈ R and x ∈ R respectively. These
lines are called the coordinate axes.
Moreover, each set {x = a} = {a} × R and {y = b} = R × {b}, a, b ∈ R,
is also a copy of the real line R, parallel to one of the coordinate axes. Why
parallel? Because, say, {x = a} and {x = 0} do not intersect unless a = 0.
But are there any other lines in R2 ?
Remark 1.20 (warning). The two "lines" {(x, y) : x = 0} and {(x, y) : y = 0},
plotted as usual, intersect at the origin. But this is only because we have chosen to
embed the two lines Rx and Ry into R2x,y this way. Formally, neither Rx nor Ry is
a subset of R2. For a pair of arbitrary sets X, Y there are no canonical maps
X → X × Y or Y → X × Y; there exist only canonical maps X × Y → X and
X × Y → Y. The difference may seem technical, but it is conceptual.
To complete the definition of R2 as a metric space, we need to define the
distance function. This definition cannot be arbitrary: for pairs of points
on the coordinate axes it should coincide with the 1-dimensional distance
dist(x1, x2) = |x1 − x2|, dist(y1, y2) = |y1 − y2|.
Absent alternative hypotheses, we assume that the coordinate axes are
orthogonal to each other, in the hope that this orthogonality will allow us to
define the distance function uniquely11.
Then by the Pythagoras theorem we have a unique choice to postulate.
Since |x| = √(x²) is valid for any x ∈ R, the following notation will be
self-explanatory.
Definition 1.21. For any element ("point") P = (p1, p2) ∈ R2 we
define |P| = √(p1² + p2²). For any two points P = (p1, p2), Q = (q1, q2) ∈ R2
the distance dist(P, Q), also denoted by |P − Q|, is defined as
dist(P, Q) = |P − Q| = √((p1 − q1)² + (p2 − q2)²). (7.1)
Remark 1.21. Denote by O = (0, 0) the “origin” in R2 . Then
|P | = dist(P, O) = |P − O|, for any P ∈ R2 .
The “modulus” notation is convenient to work with!
In the Euclidean plane Π we know that the function |P − Q| introduced
this way is indeed "the" distance. But in the Cartesian plane we have only
the set R2 and the formula (7.1). To show that this formula satisfies the
three axioms of the distance, we need to prove a certain inequality.
Theorem 1.10. The function dist(P, Q) = |P − Q| satisfies all axioms
of the distance (see Definition 1.1).
Proof. Nonnegativity and symmetry are obvious; the only difficult part
is the triangle inequality. Without loss of generality we may assume that
one of the points is at the origin O = (0, 0), so that we have to prove
|Q| ≤ |P| + |Q − P| for all P = (p1, p2), Q = (q1, q2).
It is equivalent to the inequality
|Q| − |P| ≤ |Q − P|.
We will transform this inequality by squaring it two times to get rid of the
square roots. After squaring the first time, we get
|Q|² − 2|Q||P| + |P|² ≤ |Q − P|²,
that is,
(q1² + q2²) + (p1² + p2²) − 2|Q||P| ≤ (q1 − p1)² + (q2 − p2)²
= (q1² + p1²) + (q2² + p2²) − 2(q1 p1 + q2 p2).
11 Recall that in our former discussion of metric spaces the procedure was the inverse
one: we started from the notion of distance and only later derived the notion of angles in
terms of this distance function.
This last inequality is equivalent to the inequality12 q1 p1 + q2 p2 ≤ |P||Q|.
After squaring, it yields the famous Cauchy–Bunyakovsky–Schwarz inequality
(p1 q1 + p2 q2)² ≤ (p1² + p2²) · (q1² + q2²).
Going back along this chain of transformations, we prove the triangle
inequality.
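Both the formula (7.1) and the Cauchy–Bunyakovsky–Schwarz inequality behind the proof are easy to test numerically; a Python sketch (the function name dist is ours):

```python
import math

def dist(P, Q):
    # formula (7.1): the Euclidean distance on R^2
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

P, Q, R = (1.0, 2.0), (4.0, -2.0), (0.0, 0.0)

# the triangle inequality, the only nontrivial distance axiom
assert dist(P, Q) <= dist(P, R) + dist(R, Q) + 1e-12

# the Cauchy-Bunyakovsky-Schwarz inequality used in its proof
p1, p2 = P
q1, q2 = Q
assert (p1*q1 + p2*q2)**2 <= (p1**2 + p2**2) * (q1**2 + q2**2) + 1e-12
```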
Remark 1.22. This computation seems to be a pure trick, although not
a very complicated one. This happens because we forcibly introduced the
distance by the explicit formula (7.1) without proper justification or analysis
of its meaning.
Problem 1.23. Show that the Manhattan distance between two points
P = (p1, p2) and Q = (q1, q2) is given by the formula
distM(P, Q) = |p1 − q1| + |p2 − q2|.
What assumption was tacitly made about the "Manhattan map" by suggesting
the above formula?
Problem 1.24. Show that the function
dist∞(P, Q) = max{|p1 − q1|, |p2 − q2|}
also defines a distance on R2.
Hint. Draw the “unit circle” of radius 1 centered at O in both cases
and compare the two pictures.
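Both candidate metrics of Problems 1.23 and 1.24 can be sketched directly (Python; the function names are ours):

```python
def dist_manhattan(P, Q):
    # sum of the coordinate displacements (travel along a street grid)
    return abs(P[0] - Q[0]) + abs(P[1] - Q[1])

def dist_max(P, Q):
    # the larger of the two coordinate displacements
    return max(abs(P[0] - Q[0]), abs(P[1] - Q[1]))

P, Q, R = (0.0, 0.0), (3.0, 4.0), (1.0, 1.0)
assert dist_manhattan(P, Q) == 7.0
assert dist_max(P, Q) == 4.0
# both satisfy the triangle inequality through the point R
assert dist_manhattan(P, Q) <= dist_manhattan(P, R) + dist_manhattan(R, Q)
assert dist_max(P, Q) <= dist_max(P, R) + dist_max(R, Q)
```

Plotting the set of points at distance 1 from O in each metric (a diamond and a square, respectively) makes the contrast with the round Euclidean circle visible at once.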
As we know, the distance in any metric space determines “straight lines”.
Example 1.9. Consider the segment [OA] joining O = (0, 0)
and A = (1, 0). What is the set of points B = (x, y) such that dist(O, B) =
dist(O, A) + dist(A, B)? Expanding the definition, we obtain the equation
for (x, y):
√(x² + y²) − 1 = √((x − 1)² + y²).
Transforming this equation by squaring it as in the proof of Theorem 1.10,
we obtain
x² + y² − 2√(x² + y²) + 1 = (x − 1)² + y²,
that is,
√(x² + y²) = x.
This gives us the solutions13 y = 0, x ≥ 0, and this ray lies (expectedly) on
the line {y = 0} ⊆ R2.
the line {y = 0} ⊆ R2 .
12 One can easily see on the left-hand side the scalar product of the two vectors p = (p1, p2)
and q = (q1, q2) on the Euclidean plane Π (note that we switched from the capitals used for
points to the lowercase letters used for "vectors", not yet properly defined). The inequality in
question can be geometrically interpreted as the fact that the cosine of the angle between
p and q on the plane is ≤ 1, which is obviously true.
13 Actually, the points {y = 0, 0 ≤ x < 1} were acquired accidentally when squaring
the equations and should be excluded from the solution; the set of points B satisfying the
original condition is the ray {y = 0, x ≥ 1} emanating from A.
Computations in the general case are more involved and we won't do
them, instead formulating only the result. Its "natural" and transparent
proof will be given in due time in terms of Linear Algebra.
Theorem 1.11. Let a, b, c ∈ R be three real numbers such that a² + b² ≠ 0.
Any set
{(x, y) ∈ R × R : ax + by + c = 0}
is a straight line in the plane R2 in the sense of the metric space R2. Two
such lines coincide if and only if (a : b : c) = (a′ : b′ : c′), that is, if there exists
a number λ ≠ 0 such that (a, b, c) = λ(a′, b′, c′).
Conversely, any line in the sense of the distance (7.1) on R2 is defined
by a linear equation as above.
Problem 1.25. Prove that each line ℓ = {ax + by + c = 0} is isometric to
the standard line R: there exists a bijective map γ : R → ℓ which preserves
the distance: |γ(t) − γ(s)| = |t − s| for any t, s ∈ R.
Sketch of solution. Let P = (x0, y0) be any point on ℓ. Consider
the map
ϕ : t ↦ (x0 − bt, y0 + at) ∈ R2. (7.2)
Prove that ϕ is a bijective map between R and ℓ.
Compare the distances |t − s| and |ϕ(t) − ϕ(s)|. Find a constant λ ∈ R+
such that γ(t) = ϕ(λt) is an isometry. This isometry solves the problem.
Remark 1.23. The formula (7.2) gives a parametric representation of
the line. It is in a sense dual to the representation of the line "by an
equation".
Indeed, the notation ℓ = {ax + by + c = 0} is a shortcut for describing
the following situation:
l : R2 → R, l(x, y) = ax + by + c, ℓ = l⁻¹(0), the preimage of the point 0 ∈ R. (7.3)
The set ℓ ⊆ R2 is the preimage of the point 0 under an explicit polynomial
(in this case linear) map l.
On the other hand, the formula (7.2) explicitly defines the map
ϕ : R → R2 , ϕ(t) = (x0 − bt, y0 + at), so that ` = ϕ(R), (7.4)
that is, ` is defined as the image of R by a linear map.
Quite obviously, these two representations are possible also for nonlinear
objects of other dimension. For instance, if we have two curves on R2 defined
by the equations {f (x, y) = 0} and {g(x, y) = 0} respectively, then their
intersection Z ⊆ R2 (which can be a set of isolated points, or a planar
curve, or both) is the preimage of the point (0, 0) under the map F : R2 → R2
given by the formula
F : (x, y) ↦ (f(x, y), g(x, y)), Z = F⁻¹(0, 0).
8. Geometry through algebra. Lines and circles
8.1. Geometry of lines and their configurations. Theorem 1.11
allows us to study all Euclid axioms concerning incidence of points and lines
based entirely on their equations.
Theorem 1.12. For any two distinct points P1 = (x1 , y1 ) and P2 =
(x2 , y2 ) there exists a unique (modulo the above convention) line passing
through these points.
Proof. Denote the coefficients of the unknown line ` by a, b, c. Then
we need to solve the system of two equations
ax1 + by1 + c = 0,
ax2 + by2 + c = 0,
with respect to the three unknowns a, b, c. Subtracting the first equation
from the second one, we get
a(x2 − x1) + b(y2 − y1) = 0,
which has a unique solution (a : b) unless P1 = P2, in which case any pair
(a : b) fits the equation a · 0 + b · 0 = 0. Substituting any such solution (a, b)
into either of the equations, we find a unique c, which proves the uniqueness
of the triple (a : b : c).
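The proof is constructive and translates into a few lines of code; a Python sketch (the function name line_through is ours) producing a triple (a, b, c) from two points:

```python
def line_through(P1, P2):
    """Coefficients (a, b, c) of a line a*x + b*y + c = 0 through P1, P2.

    (a, b) is chosen as one solution of a*(x2 - x1) + b*(y2 - y1) = 0,
    and c is then recovered from the first equation, as in the proof.
    """
    (x1, y1), (x2, y2) = P1, P2
    a = y2 - y1
    b = -(x2 - x1)
    c = -(a * x1 + b * y1)
    return a, b, c

a, b, c = line_through((0.0, 0.0), (1.0, 1.0))
# both points satisfy the resulting equation
assert a * 0.0 + b * 0.0 + c == 0.0
assert a * 1.0 + b * 1.0 + c == 0.0
```

Any other valid triple is proportional to this one, in accordance with the convention of Theorem 1.11.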
Theorem 1.13. Any two distinct lines defined by the triples (a, b, c) and
(a′, b′, c′) have a unique intersection point if and only if
| a  b  |
| a′ b′ | = ab′ − a′b ≠ 0.
Otherwise, when the determinant vanishes, the two lines do not intersect
and are called parallel.
Proof. Consider two functions l(x, y) = ax + by + c and l0 (x, y) =
a0 x+b0 y +c0 and solve the system of two linear equations with two unknowns
x, y
l(x, y) = 0, l0 (x, y) = 0,
using your favorite method (you may remember the Cramer rule or simply
express one of the variables x, y through the other using one of the equations,
and substitute the result into the second equation).
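The Cramer rule mentioned in the proof can be sketched as follows (Python; the function name intersect is ours), with the vanishing determinant signalling parallel lines:

```python
def intersect(l1, l2):
    """Intersection point of the lines a*x + b*y + c = 0; None if parallel."""
    a, b, c = l1
    a2, b2, c2 = l2
    det = a * b2 - a2 * b          # the determinant from Theorem 1.13
    if det == 0:
        return None                # parallel (or coinciding) lines
    # Cramer's rule for the system a*x + b*y = -c, a2*x + b2*y = -c2
    x = (b * c2 - b2 * c) / det
    y = (a2 * c - a * c2) / det
    return x, y

assert intersect((1, 1, -1), (1, -1, 0)) == (0.5, 0.5)
assert intersect((1, 0, 0), (1, 0, -1)) is None   # x = 0 and x = 1
```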
Corollary 1.14 (The Fifth Postulate of Euclid). For any line ℓ and any
point P ∉ ℓ there is a unique line that passes through P and is parallel to ℓ.
Proof. In our model the proverbial Fifth Postulate is not an axiom
but rather a simple theorem. Let (x0, y0) ∈ R2 be the coordinates of the
point P and {ax + by + c = 0} the equation of ℓ. Since P ∉ ℓ, we have
c0 = ax0 + by0 + c ≠ 0. The line ℓ′ = {ax + by + (c − c0) = 0} passes through
P and is parallel to ℓ by Theorem 1.13.
To prove the uniqueness, notice that all equations of lines parallel to ℓ (up to
proportionality) have the form {ax + by + c′ = 0}. The value c′ is determined
by the point P uniquely.
Theorem 1.15. Three lines defined by the triples (a, b, c),
(a′, b′, c′) and (a′′, b′′, c′′) pass through a single point only if
| a   b   c   |
| a′  b′  c′  | = 0.
| a′′ b′′ c′′ |
In the opposite direction, if the above determinant vanishes, the three
lines pass through a single point or are parallel to each other.
Proof. The determinant vanishes if and only if one of its rows (say,
the third one) is a linear combination of the two others (the first and the second).
Assume that the first two lines are not parallel and (x0, y0) ∈ R2 is their point
of intersection. By definition, the two functions ax + by + c and a′x + b′y + c′
both vanish at this point. But then the third function a′′x + b′′y + c′′ also
vanishes, being their linear combination.
Problem 1.26. Prove that the three points (x0, y0), (x1, y1) and (x2, y2)
on the plane are collinear (belong to a single line ℓ) if and only if
| x0 y0 1 |
| x1 y1 1 | = 0.
| x2 y2 1 |
Hint. The equation ax + by + c = 0 can be looked at from two opposite
points of view. One can assume a, b, c given and look for solutions x, y.
Alternatively, one can assume that x, y are coordinates of a known point,
and look for solutions a, b, c.
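The determinant test of the Problem is immediate to implement; a Python sketch (function names ours) using cofactor expansion:

```python
def det3(m):
    # cofactor expansion of a 3x3 determinant along the first row
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def collinear(P0, P1, P2):
    # the three points are collinear iff the determinant with the
    # column of ones vanishes
    return det3([[P0[0], P0[1], 1],
                 [P1[0], P1[1], 1],
                 [P2[0], P2[1], 1]]) == 0

assert collinear((0, 0), (1, 1), (2, 2))
assert not collinear((0, 0), (1, 1), (2, 3))
```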
Remark 1.24. A reader with good intuition should immediately notice
the similarity between this Problem and the assertion of Theorem 1.15. A
reader with good taste should be offended by the fact that the similarity is
not complete: there is a mysterious column of ones in the 3 × 3-matrix of the
Problem, but no exceptional subcase similar to that of parallel lines in the Theorem.
The explanation will be given later. It turns out that the Euclidean
axioms exclude everything that happens "at infinity" (on the horizon line), and
one has to assemble a different set of axioms describing a world in which
"infinity" is granted equal rights with the rest. This is called Projective
geometry, and the corresponding world is much more convenient for some
purposes. Yet it is certainly non-Euclidean: there are no parallel lines there!
We shall see that this world is almost identical to the spherical geometry,
which should be intuitively close to us humans, confined to the
Earth's surface.
This section contains, in a nutshell, the incidence theory of lines on the
plane. Using manipulations with systems of linear equations, one can prove
with more or less ease the theorems of Desargues (Fig. 1) and Thales (Fig. 7).
8.2. Cartesian circles. What about circles? By definition, the circle
of radius r > 0 centered at a point O is the set of all points X on the plane,
such that |OX| = r. Translating this into the Cartesian language, we have
the following.
Definition 1.22. The circle of radius r > 0 centered at a point (p, q) ∈ R2
is the set of all points (x, y) such that
√((x − p)² + (y − q)²) = r, that is, (x − p)² + (y − q)² − r² = 0. (8.1)
The advantage of the second form is its polynomiality: it can be reduced
to an equation of the form f(x, y) = 0, where f is a polynomial of degree 2
in x, y.
Problem 1.27. Is it true that any polynomial of the form
f(x, y) = (λx² + λy²) + (ax + by + c) (8.2)
with λ ≠ 0 defines a circle {(x, y) : f(x, y) = 0}?
Hint. When does an equation in one variable, λx² + ax + b = 0, define a
two-point set? a one-point set? an empty set? Try to separate "full squares" in
the equation (8.2).
Remark 1.25. Since the equations f(x, y) = 0 and λf(x, y) = 0 define the
same set in R2 for any λ ≠ 0, one can always assume that λ = 1!
Since the equations for circles are of degree 2 and the leading coefficients
of x² and y² are always equal to 1 (and the term xy is absent), the
theory of intersections between circles and lines reduces completely to the
study of quadratic equations in one auxiliary variable of the form ∆(λ) = 0,
where ∆(λ) = αλ² + βλ + γ is generally referred to as the discriminant
polynomial. It is imperative not to forget that sometimes this polynomial
may degenerate into a linear one, when α = 0.
Theorem 1.16. For any circle C and line ` on the plane, the intersection
C ∩ ` can be one of the following types:
(1) C ∩ ` = ∅;
(2) C ∩ ` is a single point (then the line is called tangent to C);
(3) C ∩ ` consists of two distinct points.
Any two circles C1 , C2 on the plane can be in one of the following positions:
(1) Disjoint, C1 ∩ C2 = ∅; one of the disjoint circles may be inside or
outside the other;
(2) Kissing each other (also from inside or outside);
(3) Intersecting by two distinct points;
(4) Coinciding identically.
Sketch of the proof. As for the intersection between circles and lines,
one has to study systems of two equations, one quadratic and one linear:
x² + y² + ax + by + c = 0 and a′x + b′y + c′ = 0.
Since a′, b′ cannot vanish simultaneously, either x can be expressed through
y, or vice versa; in any case the result can be substituted into the quadratic
equation, which then becomes an equation in one unknown.
With two quadratic equations for C1, C2 this trick does not work
straightforwardly (expressing x via y would involve the radical sign), but the
difference of the two quadratic equations with the same leading terms x² + y² is a
linear equation (possibly the trivial one 0 = 0, or c′′ = 0 with a nonzero
constant c′′ ∈ R). This reduces the nontrivial cases to the previously discussed
situation.
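The reduction to one quadratic equation can be sketched in code. In the following Python illustration (the function name and the choice of a base point on the line are ours), the sign of the discriminant distinguishes the three cases of the Theorem:

```python
import math

def circle_line_intersections(p, q, r, a, b, c):
    """Points of {(x-p)^2 + (y-q)^2 = r^2} meeting {a*x + b*y + c = 0}."""
    # a base point of the line, plus its direction (-b, a), give the
    # parametric form (x0 - b*t, y0 + a*t) as in formula (7.2)
    if b != 0:
        x0, y0 = 0.0, -c / b
    else:
        x0, y0 = -c / a, 0.0
    # substituting into the circle equation yields a quadratic in t
    A = a * a + b * b
    B = 2 * (-b * (x0 - p) + a * (y0 - q))
    C = (x0 - p) ** 2 + (y0 - q) ** 2 - r * r
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                          # no intersection
    roots = {(-B + math.sqrt(disc)) / (2 * A),
             (-B - math.sqrt(disc)) / (2 * A)}
    return sorted((x0 - b * t, y0 + a * t) for t in roots)

# unit circle vs. the lines y = 0, y = 1, y = 2: two, one, zero points
assert len(circle_line_intersections(0, 0, 1, 0, 1, 0)) == 2
assert len(circle_line_intersections(0, 0, 1, 0, 1, -1)) == 1   # tangent
assert len(circle_line_intersections(0, 0, 1, 0, 1, -2)) == 0
```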
Remark 1.26. The definition of tangency, implicitly given in this Theo-
rem, is bad. By tangency we intuitively understand something more intimate
than just uniqueness of the intersection point. For instance, in the course
of Calculus when we consider the graph of the parabola y = x2 , we do not
consider vertical lines x = c as tangent to the parabola, although the system
of two equations
y − x2 = 0, x = 1
has only one solution.
Once again, the situation would be less confusing in the “Projective
world”, where we could argue that actually there are two points of intersec-
tion, simply one of them escaped “to infinity” in the same sense as “escapes
to infinity” the point of intersection of parallel lines.
8.3. Conic sections. As long as the quadratic equations are “explic-
itly solvable”, it would be natural to consider not just the special cases of
equations for circles, but arbitrary quadratic equations of the general form
f (x, y) = 0, f (x, y) = (λx2 + µxy + νy 2 ) + (ax + by) + c = 0
and ask, what are the sets defined by these equations?
Asked naively, the question is meaningless: there are 6 parameters
(coefficients) in the equation, and even if we take into account that two equations
which differ by a nonzero scalar factor define the same set, the diversity is
still formally too large.
To deal with this, we involve our intuition, which suggests that two
figures which are transformed into each other by an isometry (rigid motion) of
the plane should be considered the same. This idea turned out to be immensely
helpful: with minimal effort we may "identify" shapes
(subsets of the plane) which differ by more general transformations (similarity
is by far the most popular, but there exist other classes: affine, projective,
topological transformations). Yet we leave the hands-on discussion of
this subject for the future and here formulate only the result concerning
"quadratic curves", known since the Greeks also under the name conic sections.
Theorem 1.17. Any set defined by a polynomial of degree 2 can be
brought by an isometry of the plane to one of the following sets:
(1) Ellipse λx² + µy² − 1 = 0, λ, µ > 0;
(2) Hyperbola λx² + µy² − 1 = 0 (the same equation, but λµ < 0, that
is, the nonzero coefficients have different signs);
(3) Parabola x² − y = 0 (a semi-degenerate case; can be considered as an
ellipse with one of its points escaped to infinity);
(4) Two crossed lines (x − ay)(x + ay) = 0, a ≠ 0 (another semi-degenerate
case, a hyperbola "saddling" its asymptotes);
(5) Two parallel lines x² − c = 0, c > 0, or one "double line" x² = 0;
(6) A single point x² + y² = 0;
(7) The empty set (which can come in several flavors, e.g., the "imaginary
ellipse" λx² + µy² + 1 = 0, λ, µ > 0).
Remark 1.27. This classification, familiar to the Greeks (although
formulated in a completely different language), is aesthetically unsatisfactory:
too many cases, too many different scenarios leading to the same case. For
instance, should we distinguish between the simple line {x = 0} and the
"double line" {x² = 0}? (The question makes sense already for roots of an
equation in one variable.) Some clarity appears if we consider the projective
classification: then the first three cases fuse into one single case of "the
real projective quadric", which may occupy different positions with respect
to the "infinite line" of the horizon. Another simplification can be achieved by
studying the equations not over the real field R, but rather over the field
C of complex numbers. Unfortunately, the margins of these notes are too
small to contain the story in detail, quoting Pierre Fermat.
9. Orthogonality
How can one measure angles on the Cartesian plane R2? The definition
through the arc-length of the unit circle is problematic mainly because we
lack the means to compute the arc-length explicitly in algebraic terms (recall
that the trigonometric functions are non-algebraic; we "see" them but cannot
yet compute them).
Yet some angles can be detected directly. The right angle is one such
example.
Definition 1.23. Let ℓ = {ax + by + c = 0} ⊂ R2 be a line and O ∉ ℓ
a point off ℓ. A perpendicular OA from O to ℓ is the segment (and, more
generally, the line containing this segment) such that
|OA| ≤ |OB| for all B ∈ ℓ. (9.1)
In other words, A is the point on ℓ which is closest to O.
A priori, it requires a proof that the minimum exists, but this will follow
from the explicit construction.
Theorem 1.18. The line ℓ′ perpendicular to ℓ = {ax + by + c = 0} ⊂ R2
has an equation
ℓ′ = {a′x + b′y + c′ = 0}, where a′ = −b, b′ = a, (9.2)
and c′ ∈ R is uniquely defined by the condition that O ∈ ℓ′.
By this theorem, all perpendiculars to ℓ from all points of the plane are
parallel to each other (or coincide).
The proof of this theorem is moved to the exam (Problem 3), as we
didn't have time to do this computation in class.
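Although the proof is left for the exam, the perpendicular itself is easy to compute; a Python sketch (the function name and the formula for the parameter t are ours) of the foot of the perpendicular dropped from a point P:

```python
def foot_of_perpendicular(a, b, c, P):
    """Point A on the line {a*x + b*y + c = 0} closest to P = (x0, y0).

    Moving from P in the direction (a, b) (perpendicular to the
    direction (-b, a) of the line, in line with Theorem 1.18) by the
    right amount t lands exactly on the line.
    """
    x0, y0 = P
    t = (a * x0 + b * y0 + c) / (a * a + b * b)
    return (x0 - a * t, y0 - b * t)

a, b, c = 1.0, 0.0, -1.0              # the vertical line x = 1
A = foot_of_perpendicular(a, b, c, (3.0, 5.0))
assert A == (1.0, 5.0)                # drop horizontally onto x = 1
assert a * A[0] + b * A[1] + c == 0   # A indeed lies on the line
```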
9.1. Algebraic geometry. The insight of Descartes made it possible to
infuse the classical geometry with much richer content. Why stop at
polynomials of the second degree? Newton tried to find a classification of all real
cubic curves, but his list was too long to be instructive. Today we understand
completely the complex geometry of nonsingular cubic curves, but
each unhappy curve is unhappy in its own way, quoting Leo Tolstoy.
Instead one can focus on general properties of algebraic sets, defined by
arbitrary polynomial equations in two variables, real (or, better, complex).
This turned out to be a Bright New World which today is definitely at the
center of mathematical research. We will occasionally touch a few subjects
from this world.
Example 1.10. Let f be a nonzero polynomial in two real variables of
degree n, and C = {(x, y) : f (x, y) = 0} the corresponding set (algebraic
curve) on the plane.
Then any real line ` is either part of C, ` ⊆ C, or intersects C in at
most n distinct points. Indeed, if
f(x, y) = Σ_{i+j≤n} aij x^i y^j , aij ∈ R, not all ai,n−i zeros,
is the expansion of f, then we can use the parameterized form of the equa-
tion for ` as the image of the map
ϕ : R → R2 , ϕ(t) = (x0 − bt, y0 + at),
to find the t-points corresponding to the intersections, as roots of the equation
(f ◦ ϕ)(t) = 0. The latter composition is a polynomial in t of degree ≤ n
and hence can have at most n real roots, but it can “accidentally” vanish
identically.
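The substitution of ϕ into f can be checked numerically. Below is a sketch (our own encoding of f as a coefficient dictionary, not the author's) that computes the coefficients of f ◦ ϕ by the binomial theorem and confirms that the degree does not exceed n for the unit circle:

```python
from math import comb

def restrict_to_line(f, x0, y0, a, b):
    """Coefficients (lowest degree first) of t -> f(x0 - b*t, y0 + a*t),
    where f is a dict {(i, j): a_ij} encoding the sum of a_ij x^i y^j."""
    n = max(i + j for (i, j) in f)
    out = [0.0] * (n + 1)
    for (i, j), aij in f.items():
        # expand (x0 - b*t)^i and (y0 + a*t)^j by the binomial theorem
        for p in range(i + 1):
            for q in range(j + 1):
                out[p + q] += (aij * comb(i, p) * comb(j, q)
                               * x0 ** (i - p) * (-b) ** p
                               * y0 ** (j - q) * a ** q)
    while len(out) > 1 and out[-1] == 0:   # drop vanishing top terms
        out.pop()
    return out

# the unit circle f = x^2 + y^2 - 1 (degree n = 2) restricted to the
# line y = x, i.e. the image of phi(t) = (t, t): here f(phi(t)) = 2t^2 - 1
circle = {(2, 0): 1.0, (0, 2): 1.0, (0, 0): -1.0}
poly = restrict_to_line(circle, 0.0, 0.0, 1.0, -1.0)
```

The resulting polynomial 2t² − 1 has degree 2 ≤ n and two real roots, matching the two intersection points of the line y = x with the unit circle.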
10. Conclusion
The Cartesian plane R2 with the distance dist(P, Q) = |P − Q| gives us
a metric space which is a model for the Euclidean geometry. The “Carte-
sian straight lines” and “Cartesian circles” behave exactly as the “Euclidean
lines” and “Euclidean circles”: all geometric statements (including the ax-
ioms of Euclid!) can be proved in this model by manipulations with algebraic
equations.
However, these manipulations (especially those dealing with radicals) are
sometimes tedious and produce an impression of trickery.
Example 1.11. While the notion of the Cartesian plane R2 = R × R
is more or less natural, the formula (7.1) taken as the definition of the
distance looks rather complicated. This complexity becomes all the more obvious
when we see how simple the formulas from Theorem 1.11 defining lines are.
Of course, one can argue that the distance was specially tailored, but still
the question of why lines are defined by linear equations and circles by
quadratic equations needs to be answered honestly.
In the next section we show how computations of the “classical” Eu-
clidean geometry, involving lines, circles and, more generally, conic sections,
can be considerably simplified if we approach the construction of the Eu-
clidean plane R2 with the distance (7.1) from the opposite end, starting from
algebraic structures known as vector spaces, and scalar products on them.
10.1. What about non-Euclidean geometry? In the Cartesian model
the Fifth Postulate (uniqueness of a line parallel to a given one through a
given point14) is an easy theorem. Yet this observation does not answer the
following question. Can there be other models (other metric spaces, per-
haps defined by a different sort of construction) in which the Fifth Postulate
fails while all the other axioms hold? This would show that the
Fifth Postulate is indeed independent of the other axioms, at the same time
opening a window onto non-Euclidean geometries.
Example 1.12. The “spherical geometry”, which deals with the spher-
ical distance on the sphere (say, of radius 1), is almost an example of such a
“non-Euclidean geometry”. Indeed, if we define the straight lines as great
circles on the sphere and “small circles” of radius less than π via the “El Al
distance”15, then almost all axioms but one will hold.
In spherical geometry we have two outrageously “non-Euclidean
facts”:
(1) Through some pairs of points there passes more than one straight line
(actually, infinitely many).
(2) There are no parallel lines at all: any two different lines intersect
in exactly two points.
These properties are so “anti-Euclidean” that you might reject calling the
great circles “spherical lines”, but one can slightly modify the construction and develop the
projective geometry, in which the “projective line” through any two points
is still unique, but there are no parallels and any two “projective lines”
intersect each other in a unique point.
14Existence of a parallel line is an “absolute” fact that can be derived from other
axioms without using the Fifth postulate.
15An allusion to the taxicab distance in Manhattan.
CHAPTER 2
Linear algebra and vector spaces
The construction of a Cartesian plane, upon a second thought, raises
some suspicions, the main one being how one could guess the formula (7.1) for
the distance so that the equations for lines and circles turned out so remarkably
simple. Of course, it was great to describe geometric shapes by algebraic
formulas and replace geometric demonstrations by computations, yet some
computations obviously appear simpler than others. In this chapter we
will try to explain why this happens. Very loosely, the idea of passing from
geometric shapes to algebraic formulas needs to become a two-way road.
We start with some basic, purely algebraic constructions and associate with
them geometric notions which allow us to bring in some geometric intuition.
The first bonus of this counter-move is the possibility to speak about
higher-dimensional spaces (of dimension 4 and up) in the same terms as we
speak about 1-, 2- and 3-dimensional spaces.
11. Vectors instead of points
In the past we repeatedly used the adjectives linear, quadratic, etc. in re-
lation to equations. This always referred to the degree
of the corresponding polynomials: linear means first degree, quadratic
second degree, and so on.
But it was clear that linear equations were much simpler to deal
with. In particular, the solution of any system of such equations was always
possible using only the four arithmetic operations of the field R, applied
to the equations.
This suggests a new twist: perhaps the Euclidean plane can be con-
structed starting from a certain algebraic structure (in other words, a set supplied
with operations like addition and multiplication, satisfying certain prop-
erties), rather than from a metric space? This would be an axiomatic way to
introduce the same object, but with a different set of axioms.
11.1. “Axioms” of the linear space. We will talk about sets called
linear spaces (defined over the field of real numbers R), whose elements
we will call vectors. For the moment “vector” is as undefined a notion as
a “point” in the Euclid’s system of axioms. The expression vector space
is fully synonymous with the term “linear space”, the difference is purely
dialectal.
Note that, unlike the Euclidean axioms, the Definition below does not
pretend to any absolute meaning. With one exception: once you formulate
this Definition, you instantly see how many linear spaces surround
you. Thus the real surprise is how much we can say about objects based
on so short and laconic a definition.
Definition 2.1. A set V is called a linear space (or a vector space)
defined over R, if there are two operations, u and · acting as follows:
(u) : V × V → V, (v, w) 7−→ v u w,
(·) : R × V → V, (λ, v) 7−→ λ · v,
which satisfy the usual commutativity and associativity rules similar to the
operations +, × in R (in particular, u has an inverse operation), and the
distributive law
λ · (v u w) = λ · v u λ · w
for all λ ∈ R, v, w ∈ V . In particular, each linear space contains an element
0 such that v u 0 = v for any v.
The term “vector space” is a complete synonym of the term “linear
space”: the two coexist without troubles, since elements of linear spaces are
vectors. Elements of the field R are sometimes referred to as “scalars”, to
stress that they are not vectors in V (in general).
Example 2.1 (the paradigm). The field R itself is a vector space (over
itself), if we take V = R, u = +, · = × (the usual multiplication of numbers,
often omitted in notation) and 0 = 0.
Note, however, that we can assign the meaning to the “product” ab,
a, b ∈ R, only if we consider a as “the number” and b as “the vector”. The
product of two vectors v · w is not defined in general!
Example 2.2 (Cartesian plane). Consider the Cartesian product R2 =
R × R, whose elements (“vectors”) are pairs (x, y), x, y ∈ R. Define
(x, y) u (u, v) = (x + u, y + v), λ · (x, y) = (λx, λy)
(we omit the symbol of multiplication in R, as usual).
All axioms of Definition 2.1 can be immediately verified and 0 = (0, 0).
Moreover, we see that the Cartesian product Rn = R × · · · × R of n different
copies of R is also a linear space.
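The componentwise operations of Example 2.2 can be written down directly. A minimal Python sketch (the function names vadd, smul are ours, and the ordinary symbols + and * play the roles of u and ·):

```python
def vadd(v, w):
    """Componentwise addition in R^n: the operation written "u" above."""
    return tuple(x + y for x, y in zip(v, w))

def smul(lam, v):
    """Multiplication of a vector by the scalar lam."""
    return tuple(lam * x for x in v)

v, w = (1.0, 2.0), (3.0, -1.0)
zero = (0.0, 0.0)
# spot checks of the axioms of Definition 2.1
assert vadd(v, w) == vadd(w, v)                                   # commutativity
assert vadd(v, zero) == v                                         # neutral element
assert smul(2.0, vadd(v, w)) == vadd(smul(2.0, v), smul(2.0, w))  # distributivity
```

The same two functions work verbatim for tuples of any length n, which is exactly the observation that Rn is a linear space.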
Remark 2.1. We again see the same ambiguity in the notations: the
symbol R may be used to denote both the field of real numbers and the
linear space (over the field R) as in Example 2.1. In the same way R2 may
stand for the Euclidean plane as a metric space or as the vector space from
Example 2.2. Using the notation R1 helps only partially: it may designate
either a 1-dimensional vector space, or 1-dimensional affine space.
One might indicate the difference by some special mark in the notation,
but the experience shows that it creates more problems instead of clarifying
the situation. You should imagine “the same” world, but assume that you
are allowed to play with different tools: measurement tape in the first case
(you can measure distance between the “geometric points” but have no way
to add them) or the algebraic operations in the other.
Remark 2.2 (notation). In Rn we have n distinguished vectors ei ∈ Rn
defined as elements (0, . . . , 1, . . . , 0) with 1 occurring at the ith place, i =
1, . . . , n, and zeros occupying all other positions.
Problem 2.1. Consider the set of all polynomials in one variable of
degree ≤ d with real coefficients. Show that it is a linear space after a suitable
definition of the operations.
Problem 2.2. Let A be an abstract set (finite or infinite). Consider the
space of all real functions on A, V = {f : A → R}. Define the operations u
and · on V that would make it a linear space. Is it important whether A is
finite or not?
Problem 2.3. Consider the set of all closed segments A = [a, a′] ⊆ R
and define the operations
A u B = {x + y : x ∈ A, y ∈ B}, λ · A = {λx : x ∈ A}.
Prove that these operations satisfy the axioms
A u B = B u A, A u (B u C) = (A u B) u C.
Why is this set not a linear space?
Remark 2.3 (notation). Once you get comfortable with the notion of
linear spaces and vectors, it is only natural to return to the standard “field”
notation: restore the operation + and omit the symbol · when multi-
plying vectors by scalars. For the same reason we use the symbol 0 for all
null vectors in all spaces (keeping in mind that 0 ∈ R and 0 ∈ R2 are quite
different!).
Problem 2.4. Show that the Cartesian product of two vector spaces
V, W over R is again a vector space.
Hint: look at Example 2.2.
12. “Linear structure”: maps between vector spaces
12.1. How do linear spaces differ from each other?
To answer this question, we first need to say which linear spaces should be
considered “the same”. Inspired by the definition of isometry, we can arrive
at the only meaningful definition.
Definition 2.2. Let V, W be two vector spaces over the same field R.
A map A : V → W is called a linear map, if
A(v + v′) = A(v) + A(v′), A(λv) = λA(v) ∀v, v′ ∈ V, λ ∈ R. (12.1)
Problem 2.5. Mark in red and blue the operations in V and W
respectively, including the dummy symbol for multiplication by a scalar.
Definition 2.3. Two linear spaces V, W are isomorphic (equivalent,
“the same”), if there exists an invertible linear map A : V → W , whose
inverse map A−1 : W → V is also linear.
The next natural question, “how many non-isomorphic linear spaces
are there?”, can be answered (to some extent) once we define the fundamental
notion of the dimension of a vector space.
12.2. Subspaces and their chains. Dimension of a vector space.
Definition 2.4. A subset L ⊆ V of a vector space V is called a subspace
(or a vector subspace), if L is itself a vector space, that is, it is closed under the
operations + and ·.
Problem 2.6. Let A : V → W be a map between two vector spaces over
R. Consider the graph of A,
graph A = {(v, Av) : v ∈ V } ⊆ V × W.
(1) Prove that A is linear if and only if graph A is a vector subspace of
V × W.
(2) Show that in Definition 2.3 it is enough to require that A is linear
and invertible: the fact that the inverse map A−1 will be again
linear, is an automatic consequence and can be dropped from the
Definition.
Hint. The above assumption is symmetric with respect to V
and W .
Example 2.3. Any vector space V contains two subspaces. One consists
of the single zero vector, L = {0}, the other is V itself. These subspaces are
called trivial.
Problem 2.7. Suppose that V has only trivial subspaces. Prove that it
is isomorphic to R, see Example 2.1.
Exhaustion of a linear space by subspaces. Now we will explain an al-
gorithm whose purpose is to measure how large a linear space V can be.
Problem 2.7 suggests both the induction base and the inductive step of this
algorithm.
If our space V has a nonzero vector v = v1 ∈ V , we can use it to construct
a subspace L1 = {λv1 : λ ∈ R}, which we denote by R · v1 . If this subspace
coincides with V , then V is isomorphic to R by Problem 2.7. Otherwise
L1 ⊊ V and there exists a (necessarily nonzero) vector v2 ∈ V ∖ L1 . Consider
the set1
R · v1 + R · v2 = {λ1 v1 + λ2 v2 : λ1 , λ2 ∈ R} = L1 + R · v2 .
1 Did you notice that we tacitly introduced the notation A + B for two subsets A, B of
a linear space? Expand the definition of this new “addition” and prove that it coincides
with the usual vector addition in the case when each of A and B consists of only one vector,
a and b respectively.
This is yet another subspace L2 in V which strictly contains L1 : L1 ⊊ L2 ⊆
V . If L2 = V , the process stops; otherwise there is a vector v3 ∈ V ∖ L2 and
we can use it to extend L2 to the new subspace L3 = L2 + R · v3 , etc.
The process can obviously be continued until at a certain step we
obtain a subspace Ln which coincides with the entire space V .
Definition 2.5. The space V is called finite-dimensional, if each in-
creasing chain of subspaces eventually stops at the trivial subspace Ln = V ,
{0} = L0 ⊊ L1 ⊊ L2 ⊊ L3 ⊊ · · · ⊊ Ln = V. (12.2)
If there exists an infinite chain of expanding subspaces Lk ⊊ Lk+1 , k =
1, 2, 3, . . . , the space V is called infinite-dimensional.
Problem 2.8. Let V = R[x] be the linear space of all polynomials in
one variable x without restriction on their degree, cf. Problem 2.1.
Consider the subspaces Ln = {p ∈ R[x] : deg p ≤ n − 1}, n = 1, 2, 3, . . .
Show that they form an infinite chain as in (12.2).
Infinite-dimensional vector spaces are extremely important for applica-
tions, for instance, for the description of physical processes like heat propagation,
quantum mechanics, etc., but the area of Mathematics that deals with them,
called Functional Analysis, is way too advanced to be discussed here. From
now on we will deal only with finite-dimensional vector spaces.
Yet even in this case there are many questions that immediately beg for
an answer. First, what is the number n? Does it depend on the intermediate
choice of vectors v1 , v2 , . . . , or is it the same regardless of the choices?
Example 2.4. Consider the vector space R2 and take v1 = e1 = (1, 0).
Then L1 = {(λ, 0) : λ ∈ R} is the line {y = 0} ⊆ R2 . The vector
v2 = e2 = (0, 1) ∈/ L1 can be used to produce L2 = {(λ, µ) : λ, µ ∈ R},
which obviously coincides with R2 .
Dimension.
Theorem 2.1. If V is a finite-dimensional vector space, then all se-
quences of vectors v1 , . . . , vn generating chains of subspaces (12.2) have
the same length.
Definition 2.6. The number n common for all chains (12.2), is called
the dimension of the space V and denoted by dim V .
Problem 2.9. Give the definition of the dimension of a subspace L ⊆ V .
Prove that dim L ≤ dim V . What happens if equality occurs?
Theorem 2.2 (The Principal Theorem of Linear Algebra). Any two
finite-dimensional spaces are equivalent in the sense of Definition 2.3 if their
dimensions coincide. In particular, any n-dimensional space over R is equiv-
alent to Rn .
Hint. This follows from the solution of Problem 6.
12.3. Bridge to the familiar notions. The process of selection of a
sequence of vectors 0 ≠ v1 , v2 , v3 , . . . , vn , . . . such that for each n ∈ N we
have
vn ∈/ R · v1 + R · v2 + · · · + R · vn−1
is traditionally described using the notion of linear (in)dependence.
Definition 2.7. A collection of vectors {v1 , . . . , vn } ⊆ V is said to
be linearly independent, if the only linear combination of these vectors that
represents 0 ∈ V is trivial: the equation
0 = λ1 v1 + · · · + λn vn , λ1 , . . . , λn ∈ R, (12.3)
with respect to the unknowns λ1 , . . . , λn , has only the trivial solution
λ1 = · · · = λn = 0.
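For vectors given by their coordinates in Rn, Definition 2.7 can be tested mechanically: eliminate vectors one against another, and independence fails exactly when some vector is reduced to zero. A sketch (our own helper, using a floating-point tolerance):

```python
def linearly_independent(vectors, tol=1e-12):
    """Test Definition 2.7 for coordinate vectors in R^n by elimination:
    the vectors are linearly independent iff none of them reduces
    to the zero vector."""
    rows = [list(v) for v in vectors]
    n = len(rows[0])
    r = 0                                      # number of surviving vectors
    for col in range(n):
        piv = next((i for i in range(r, len(rows))
                    if abs(rows[i][col]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):      # kill the col-th coordinate
            t = rows[i][col] / rows[r][col]
            rows[i] = [u - t * w for u, w in zip(rows[i], rows[r])]
        r += 1
    return r == len(rows)
```

For instance, (1, 0) and (0, 1) pass the test, while (1, 2) and (2, 4) do not, since the second is twice the first.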
Definition 2.8. For any subspace L ⊆ V a collection of vectors
{v1 , . . . , vk } ⊆ L
is called a basis for L, if this collection is linearly independent and k = dim L.
Remark 2.4. Infinite-dimensional spaces have no basis in our modest
world!
Definition 2.9 (Problem). Let {v1 , . . . , vn } ⊆ V be a basis in a vec-
tor space V , n = dim V . Prove that any vector x ∈ V can be uniquely
represented as a linear combination of the basis vectors,
x = x1 v1 + x2 v2 + · · · + xn vn , x1 , x2 , . . . , xn ∈ R. (12.4)
The numbers x1 , . . . , xn are called the coordinates of x in this basis. In a
different basis the same vector will have different coordinates.
Problem 2.10. Let V = Rn and {e1 , . . . , en } the standard basis in it,
see Remark 2.2. What are the coordinates of a vector (a1 , . . . , an ) ∈ Rn in this
basis? Assume that another basis is obtained by a permutation of the vectors ei ,
say, {en , . . . , e1 }. What happens to the coordinates of the same vector in
the new basis?
13. Computations: systems of linear equations at their best
We have seen that the structure of a vector space is very meager: ba-
sically, in any finite dimension n there exists “only one” model of a vector
space, Rn . On the other hand, the world of all linear maps (see Defini-
tion 2.2) is very rich, interesting and useful, especially if we consider self-
maps A : V → V of an n-space into itself.
Problem 2.11. If A : V → W is a linear map, then the two sets, the
kernel Ker A = {v ∈ V : A(v) = 0} ⊆ V and the image A(V ) = {A(v) : v ∈
V } ⊆ W are both subspaces in the respective spaces.
We won’t go deeply into Linear Algebra; we just give an example of a
typical question that can be effectively answered.
Problem 2.12. Assume that A : V → W , dim V = n, dim W = m,
dim Ker A = p ≤ n, dim A(V ) = q ≤ m. Is there any connection between
the four natural numbers n, m, p, q?
Answer. Read about the rank of a matrix.
13.1. Matrices. Matrices are usually introduced as rectangular tables
filled with real numbers. Then some absolutely mysterious-looking
rule for “matrix multiplication” is postulated, which leaves most first-
year college students perplexed. In fact, all the matrix definitions can be
explained immediately once we are ready to treat each vector space as Rn
(with different n). The legality of this approach is rooted in Theorem 2.2.
Consider a map A : Rn → Rm . In both spaces the choice of a maximal
linearly independent set of vectors is obvious: by Remark 2.2, these are the n-
tuples ei (resp., m-tuples fj )2 of real numbers, all of which are zeros except
for a unit in the ith (resp., jth) position, i = 1, . . . , n (resp., j = 1, . . . , m).
Therefore each image A(ei ) can be expanded as a linear combination of
the vectors fj :
A(ei ) = Σ_{j=1}^{m} aij fj , i = 1, . . . , n. (13.1)
The numbers {aij : i = 1, . . . , n, j = 1, . . . , m} can be “packed” into a
rectangular table with n rows and m columns (of course, this is purely a
convention: there is absolutely no deep reason to do it this way and not vice
versa). This table is called the matrix of the linear map A in the bases
(plural!) {ei }, {fj }.
Assume that there is yet another vector space Z, isomorphic to Rp and
equipped with a basis {gk }, k = 1, . . . , p, and a linear map B : W → Z. Then for the
map B one can repeat the same construction and form a rectangular table
packed with elements bjk , k = 1, . . . , p, such that
B(fj ) = Σ_{k=1}^{p} bjk gk , j = 1, . . . , m.
What does the composition B ◦ A : V → W → Z look like? One has to apply
the axioms of linearity and carefully expand the result of applying B
to the vectors A(ei ). Do It Yourself, to recognize the formulas of matrix
“multiplication”.
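Carrying out this Do-It-Yourself expansion gives (B ◦ A)(ei) = Σ_k (Σ_j aij bjk) gk, i.e., the entries cik = Σ_j aij bjk of the matrix product. A Python sketch (the function names are ours; each table follows the convention of (13.1), so row i lists the coordinates of the image of the ith basis vector):

```python
def apply(M, v):
    """Image of v = sum_i v_i e_i under the map whose matrix is M,
    where row i of M lists the coordinates of the image of e_i."""
    rows, cols = len(M), len(M[0])
    return [sum(v[i] * M[i][j] for i in range(rows)) for j in range(cols)]

def matmul(A, B):
    """c_ik = sum_j a_ij b_jk: with the row convention of (13.1) this
    table represents the composition "first A, then B"."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(m)) for k in range(p)]
            for i in range(n)]

A = [[1, 2], [0, 1], [3, 0]]   # a map R^3 -> R^2 (3 rows, 2 columns)
B = [[0, 1, 1], [1, 0, 2]]     # a map R^2 -> R^3 (2 rows, 3 columns)
v = [1, 1, 1]
# applying A and then B agrees with applying the product table
assert apply(matmul(A, B), v) == apply(B, apply(A, v))
```

Note that in this row convention the composition “first A, then B” is represented by the table matmul(A, B); with the more common column convention the order of the factors is reversed.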
Remark 2.5. Since time immemorial, in Linear Algebra the parentheses
are omitted: we write Av instead of A(v) and BA instead of B ◦ A. This
reduces the number of parentheses (always annoying) and has some inner logic
to it. For instance, if A : Rn → Rm and v = (x1 , . . . , xn ) ∈ V is the vector
with the coordinates x1 , . . . , xn in the sense of Definition 2.9, then we can
arrange the numbers xi into a single-column “matrix” of height n, and then
the matrix product Av will be a well-defined column matrix of height m whose
entries are the coordinates of the image A(v).
2 We use the different notations ei , fj to stress the fact that “the same” vectors, with
all but one zero coordinates, live in two different spaces and should not be confused
with each other. Psychologically this is easier to digest if the dimensions of the source
and the target space are different: then the matrices will be non-square.
Problem 2.13. Proving that the matrix product is associative, A(BC) =
(AB)C, starting from the formula for the entries of the matrix multipli-
cation is a tedious computation. On the other hand, the associativity of the
composition of arbitrary maps is obvious: A ◦ (B ◦ C) = (A ◦ B) ◦ C. This shows
why the “geometric” definition is more convenient.
13.2. Systems of linear equations and how to solve them. As-
sume that A : Rn → Rm is a linear map with the matrix A = (aij ), see
(13.1). Then the following geometric questions arise:
(1) Describe Ker A = {x ∈ Rn : Ax = 0 ∈ Rm },
(2) Describe Im A = A(Rn ) = {y ∈ Rm : ∃x ∈ Rn Ax = y}.
(3) Describe the maps A : Rn → Rm which are injective or surjective.
All these questions are reduced to solving systems of linear equations.
In the first case we look for the n-tuple of unknowns x = (x1 , . . . , xn ) such
that
a11 x1 + a12 x2 + · · · + a1n xn = 0,
a21 x1 + a22 x2 + · · · + a2n xn = 0,
· · ·
am1 x1 + am2 x2 + · · · + amn xn = 0,    x1 , . . . , xn ∈ R. (13.2)
In the second case the system of equations takes the non-homogeneous form
a11 x1 + a12 x2 + · · · + a1n xn = y1 ,
a21 x1 + a22 x2 + · · · + a2n xn = y2 ,
· · ·
am1 x1 + am2 x2 + · · · + amn xn = ym ,    x1 , . . . , xn ∈ R, (13.3)
and the question is for which tuples y = (y1 , . . . , ym ) of right hand sides the
system (13.3) is solvable with respect to the unknowns x. The last question
(3) becomes a question about the matrix entries aij : injectivity means that
the homogeneous system (13.2) has a unique solution x1 = · · · = xn = 0,
while surjectivity means that the non-homogeneous system (13.3) is solvable
for any right hand side y.
All these questions can be effectively answered by relatively simple algo-
rithms involving only arithmetic operations with the entries of the matrix A
(read about the Gauss elimination algorithm), but we don’t go there for
lack of time. Very loosely, the idea is to treat linear equations as vectors
themselves.
Example 2.5 (Gauss elimination algorithm). Consider, for instance, the ques-
tion of solvability of the system (13.3) for a given tuple (y1 , . . . , ym ).
Look at all the coefficients ai1 which appear in front of the variable x1 . Then one has
one of the following two cases.
(1) Trivial case: all the coefficients ai1 are zeros. This means that actually
the system (13.3) does not depend on the variable x1 and we can strike it
out from the list of variables. The reduced system is solvable if and only
if the initial system is, but solutions will never be unique (any value of
x1 is permitted).
(2) At least one of the coefficients ai1 is nonzero. By reordering the equations
(an operation that does not change solvability) we may assume that a11 ≠
0.
The inductive step transforms the system (13.3) as follows: each equation with
number j = 2, . . . , m is replaced by the equation obtained by subtracting from it
the first equation multiplied by aj1 /a11 . The effect is the disappearance of
the coefficient in front of x1 in the transformed equation. Obviously, the transformed
system is equivalent to the initial one.
By construction, the transformed system has the form
a11 x1 + a12 x2 + · · · + a1n xn = y1 ,
a′22 x2 + · · · + a′2n xn = y′2 ,
· · ·
a′m2 x2 + · · · + a′mn xn = y′m ,    x1 , . . . , xn ∈ R, (13.4)
with new coefficients a′ij and new right hand sides y′j , j = 2, . . . , m, which are
explicitly expressed through the initial parameters of the system.
The most important feature of the transformed system (13.4) is the reduced
number of variables: if we keep only the equations number 2, . . . , m, then they
depend only on the variables x2 , . . . , xn . The first variable is “eliminated”: if
the system of these m − 1 equations in n − 1 variables is solvable, the first equation
will be automatically solvable for any y1 : after substituting the solution (x2 , . . . , xn )
into it we get an equation a11 x1 + · · · = y1 with a11 ≠ 0.
The above elimination of unknowns can obviously be iterated. On each step
we have a dichotomy: either the next variable eliminates itself automatically
without our intervention, or we need to transform the system again to eliminate it
if it refuses to disappear voluntarily. The difference will influence the right hand
sides.
When does this process stop? Of course, when all variables have been eliminated. The
resulting reduced system takes the form
0 = y^{(p)}_{m−p} ,
· · ·
0 = y^{(p)}_m ,    (13.5)
where p is the number of nontrivial eliminations. The explicit expressions for the
transformed right hand sides are stored in the logs of the algorithm: they are explicit
linear combinations of the initial right hand sides y1 , . . . , ym with coefficients obtained
from the aij by legal rational operations not involving division by zero.
Expanding the right hand sides of (13.5) in the yj , we obtain necessary (and suffi-
cient, as one can easily see) conditions of solvability of the initial system (13.3).
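The procedure of Example 2.5 can be sketched in code. This is our own simplified implementation (floating-point tolerances replace the exact rational arithmetic of the text, and back substitution is added to actually produce a solution):

```python
def gauss_solve(A, y, tol=1e-12):
    """Forward elimination as in Example 2.5, then back substitution.
    Returns one solution x of A x = y, or None if the system is
    inconsistent.  A is an m x n matrix given as a list of rows."""
    m, n = len(A), len(A[0])
    M = [row[:] + [b] for row, b in zip(A, y)]   # augmented matrix
    pivots, r = [], 0
    for col in range(n):
        # case (2): look for a row with a nonzero coefficient of x_col
        piv = next((i for i in range(r, m) if abs(M[i][col]) > tol), None)
        if piv is None:
            continue                  # case (1): the variable is absent
        M[r], M[piv] = M[piv], M[r]   # reorder the equations
        for i in range(r + 1, m):
            t = M[i][col] / M[r][col]
            M[i] = [u - t * w for u, w in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    # leftover rows read "0 = ...": the solvability conditions (13.5)
    if any(abs(M[i][n]) > 1e-9 for i in range(r, m)):
        return None
    x = [0.0] * n
    for i in reversed(range(r)):      # back substitution
        col = pivots[i]
        s = sum(M[i][j] * x[j] for j in range(col + 1, n))
        x[col] = (M[i][n] - s) / M[i][col]
    return x

# x + y = 3, x - y = 1 has the unique solution x = 2, y = 1
sol = gauss_solve([[1.0, 1.0], [1.0, -1.0]], [3.0, 1.0])
```

On an inconsistent system such as x + y = 1, x + y = 2 the leftover row violates (13.5) and the function reports None.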
This example teaches us three things. First, if you absolutely need this, any
system of linear equations, homogeneous or not, can be solved with pen and paper.
Second, if you can, you’d better ask a computer to do this in any particular situation.
Drilling the solution of linear systems beyond size 2 × 2 is useless torture. Third, if your
system has any peculiar form (symmetry, a small number of nonzero elements
placed at some non-accidental positions, etc.), you’d better think or consult the literature;
perhaps the answer is known.
Problem 2.14. Suppose that the matrix of coefficients is square and the only
entries that can be nonzero are the diagonal elements aii = λi . Describe the kernel and the image
of this map (note that the λi themselves can be zero or nonzero: study all cases!).
Problem 2.15. How would you twist the Gauss elimination algorithm to describe
the nonzero solutions of the system (13.2)?
13.3. Expansion of a vector in a basis. Suppose we have a basis in
Rn which consists of n vectors v1 , . . . , vn ∈ Rn with the coordinates
vi = (bi1 , · · · , bin ), i = 1, . . . , n, bij ∈ R. (13.6)
Given an arbitrary vector v ∈ Rn with the coordinates
v = (c1 , . . . , cn ), ci ∈ R,
we look for a representation of v as a linear combination of the basis vectors,
v = x 1 v1 + · · · + x n vn , x1 , . . . , xn ∈ R,
with unknown (yet) coefficients (we use the boldface to distinguish between
scalars and vectors).
This vector equation, naturally, is equivalent to a system of n scalar
linear equations:
c1 = b11 x1 + b12 x2 + · · · + b1n xn ,
c2 = b21 x1 + b22 x2 + · · · + b2n xn ,
· · ·
cn = bn1 x1 + bn2 x2 + · · · + bnn xn ,    x1 , . . . , xn ∈ R. (13.7)
In other words, to find an expansion of a vector in a basis, one has to solve a
system of linear nonhomogeneous equations of the form (13.3) with n = m
and the square matrix B of coefficients. This means that we need to invert
a certain auxiliary linear self-map of Rn .
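For n = 2 this inversion can be carried out by an explicit formula (Cramer's rule). A sketch (our own helper, not the author's notation) expanding a vector in a basis of R2:

```python
def expand_in_basis_2d(v1, v2, v):
    """Coordinates (x1, x2) with v = x1*v1 + x2*v2 for a basis v1, v2
    of R^2, found by Cramer's rule applied to the 2x2 system (13.7)."""
    (a, c), (b, d) = v1, v2          # v1, v2 are the columns of the matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("v1, v2 do not form a basis")
    x1 = (v[0] * d - b * v[1]) / det
    x2 = (a * v[1] - v[0] * c) / det
    return x1, x2

# express v = (3, 5) in the basis v1 = (1, 1), v2 = (1, -1)
x1, x2 = expand_in_basis_2d((1.0, 1.0), (1.0, -1.0), (3.0, 5.0))
```

Here v = 4·(1, 1) − 1·(1, −1) = (3, 5): in the new basis the same vector has the coordinates (4, −1) instead of (3, 5), illustrating Definition 2.9.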
13.4. Determinant. The Gauss elimination algorithm above is indeed
an algorithm; that is, it is in general impossible to formulate its output by
means of a simple formula. There is one exceptional case when this can still
be done, although the answer may look complicated at first glance.
This is the case of a linear map from a space (say, Rn ) into itself,
A : Rn → Rn . This means that the basis {fj } in the target space should
be chosen the same as the basis {ei } when computing the matrix elements
of A as in (13.1).
Theorem 2.3. There exists a function, denoted by det and called the
determinant, which associates with any square matrix A the real number
det A ∈ R, with the following properties:
(1) det(· · · ) is a polynomial in the n² variables aij , all entries of the
matrix A,
(2) det AB = det A · det B,
(3) The linear map A is injective if and only if it is surjective, and this
happens if and only if det A ≠ 0.
Remark 2.6. The explicit formula for det A is given by a sum involving
n! terms, each of which is a product of n entries from among the collection aij .
Computation using this formula is quite impractical except for the lowest order
cases n = 1, 2, 3, where (with semicolons separating the rows of the matrix)
• det(a11 ) = a11 (one linear term),
• det ( a11 a12 ; a21 a22 ) = a11 a22 − a12 a21 (2 = 2! quadratic terms),
• det ( a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ) = a11 a22 a33 ± · · · − a13 a22 a31 (6 = 3! cubic terms).
This explicit formula can be replaced by the “recursive algorithm” which expresses
det A of order n via determinants of certain submatrices (“minors”) of A
of order n − 1, and which in some (rare) cases allows one to write down inductive formulas.
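The “recursive algorithm” just mentioned can be sketched directly (our own implementation, expanding along the first row; like the explicit formula it does n! work, so it is only practical for small matrices):

```python
def det(M):
    """Determinant by recursive (Laplace) expansion along the first row:
    det A = sum over j of (-1)^j * a_{1,j+1} * det(minor), where the
    minor deletes row 1 and column j+1.  Reproduces the n = 1, 2, 3
    formulas of Remark 2.6."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3              # the 2x2 formula
assert det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]) == 2 * 3 * 4  # diagonal case
```

The diagonal case also illustrates property (3) of Theorem 2.3: the map of Problem 2.14 is invertible exactly when no λi vanishes, i.e., when the product of the diagonal entries is nonzero.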
The easiest way is to describe det by its properties and use these properties when they
help. Note that the columns of the square matrix A can be considered as elements
of Rn , i.e., as vectors: the ith column consists of the coordinates of the vector
Aei . Thus any function of A can be considered as a function of n vector arguments
a1 = (a11 , . . . , an1 ), a2 = (a12 , . . . , an2 ), . . . , an = (a1n , . . . , ann ), all from Rn . In
particular,
det A = F (a1 , . . . , an ), ai ∈ Rn , i = 1, . . . , n. (13.8)
Theorem 2.4. The determinant is the unique function F of n vector argu-
ments such that:
(1) F is linear in each argument separately:
F (· · · , ai−1 , v + w, ai+1 , . . . )
= F (· · · , ai−1 , v, ai+1 , . . . ) + F (· · · , ai−1 , w, ai+1 , . . . ),
∀i = 1, . . . , n, ∀v, w ∈ Rn ,
F (· · · , ai−1 , λv, ai+1 , . . . ) = λF (· · · , ai−1 , v, ai+1 , . . . ),
∀λ ∈ R, v ∈ Rn .
(2) F is antisymmetric:
F (· · · , v, · · · , w, · · · ) = −F (· · · , w, · · · , v, · · · )
for any two vectors on any two positions.
(3) F (e1 , . . . , en ) = 1.
The determinant can be explained geometrically. Consider the space Rn and
introduce on it the notion of an oriented volume as the volume of the parallelepiped
built on n vectors: this volume can be axiomatically defined in a way close to the
way the oriented length is defined on R1 . Then this volume as a function of its
vector arguments will satisfy the three “axioms” listed in Theorem 2.4 (the last
condition is basically the choice of the unit measure for the volume).
Geometrically it is obvious that if the vectors are linearly dependent, so that the
parallelepiped degenerates into a shape inside a proper subspace, then this volume
vanishes.
14. Affine and vector spaces
Where is the hidden vector space in the Euclidean plane Π? And, con-
versely, how can we restore the classical terminology and make points out
of vectors? Recall that points in the Euclidean plane Π cannot be added or
multiplied by scalars!
We introduced the Cartesian plane R2 as a model for the geometric
Euclidean plane Π as a pure insight. The fact that R2 turned out to be
a vector space is somewhat unexpected: points A, B, C, · · · ∈ Π of the
Euclidean plane in general can neither be added to each other nor
multiplied by a scalar.
Is it possible to “reveal” the hidden structure of a vector space in the
Euclidean plane Π, if we know what to look for? Yes, we can3!
14.1. Translations of the plane Π. Translations are a particular case
of isometries of the plane Π.
Definition 2.10. A parallel translation is a self-map τ : Π → Π which is
an orientation-preserving isometry which keeps lines parallel to themselves:
for any line ℓ ⊆ Π we have τ(ℓ) ∥ ℓ.
It is very easy to check that all parallel translations of the plane form a
group T with respect to composition: if τ1 , τ2 ∈ T , then τ2 ◦ τ1 is indeed a
parallel translation. The identity self-map of Π is the neutral element.
To define a parallel translation uniquely, it suffices to know the image
τ(A) of a single point A ∈ Π: then for any other point B ∈ Π the image τ(B)
is defined uniquely. This allows us to define τ−1 for any τ ∈ T. Moreover, the
group T is commutative: τ2 ◦ τ1 = τ1 ◦ τ2 for any τ1, τ2 ∈ T. We apply the
standard convention and denote the commutative group operation by +.
Can T be made a vector space? For this, we need to define the multiplication by
scalars. This can be done in two ways.
The first one is purely algebraic (everything is described in terms of the
group operation on T ).
Note that any element of T can be “multiplied” by any integer number,
as follows4:
n · τ = τ + · · · + τ (n times) if n ∈ N, (−1) · τ = −τ (the inverse element). (14.1)
3This section can be skipped during the first reading.
4Compare this argument with its baby version as in Lemma 1.9.
Moreover, for any m, σ the equation
m · τ = σ, m ∈ N, σ ∈ T, (14.2)
is solvable with respect to τ, which allows us to define the product (1/m) · σ as
the (unique) solution of this equation. Since both operations were defined
in terms of the group operation in T, we have in fact defined the multiplication
(·) : Q × T → T, which by construction satisfies the distributive law
(λ, τ) ↦ λ · τ, λ ∈ Q, τ ∈ T, λ · (τ + σ) = λ · τ + λ · σ. (14.3)
In other words, T became a vector space over the field Q, a dense subset of
the field R. To define the result of the multiplication µ · τ for an irrational
µ ∈ R \ Q, one can choose any sequence of rational numbers λk converging
to µ, lim λk = µ, and set µ · τ = lim λk · τ.
The other, shorter way is to define λ · τ geometrically in terms of Π:
one has to recall how τ acts on a point A ∈ Π and use similarity of figures
on Π. This requires a special check of the distributive law (14.3) for
λ ∈ R \ Q, which is an easy geometric exercise with similar parallelograms.
The result of these manipulations can be described as follows.
Theorem 2.5. The set of parallel translations T of the Euclidean plane
is (that is, can be equipped with the structure of ) a 2-dimensional vector
space over R, isomorphic to R2 .
Looking back from the height of modern conceptual Mathematics, this
explains the insight of Descartes.
Problem 2.16. Identify and prove accurately all the claims made be-
tween the lines in this section.
14.2. Is there a way back, or how to forget things in an intel-
ligent way. Suppose we want to solve the inverse problem: given a vector
space V of dimension n, can we restore the familiar “geometric” plane Π
(for n = 2) or the Euclidean three-dimensional space (for n = 3) starting
from V ? The answer is “yes, partially”, and we explain how.
To explain what needs to be done, recall that a vector space is an algebraic
object with two operations: vectors can be added to each other
and multiplied by real numbers (scalars). Points (on the plane Π or in
space) do not admit natural algebraic operations between them. Moreover,
there is no distinguished geometric point which would correspond to the
neutral element 0 ∈ V. We need, in a sense, to “forget zero” and to “learn how to
add points”. The first is easy; the second uses a surprising trick: we rather
need to learn to “subtract” points!
Example 2.6. Consider the binary operation Π × Π → T which as-
sociates with every pair of points A, B the unique parallel translation τ =
τAB ∈ T that sends A to B: τAB (A) = B. One can obviously see that this
operation satisfies the following properties:
(1) τAA = 0,
(2) τAB = −τBA ,
(3) τAB + τBC + τCA = 0 (the triangle identity).
However, this operation is far from being injective: τAB = τCD if and only
if ABDC is a parallelogram, in which case τAC = τBD .
Usually this operation is denoted by −→AB rather than τAB, but algebraically
it would be more natural to use the minus sign and special brackets,
denoting τAB = [B − A] instead. Then the three properties above would
become “obvious algebraic identities”
[A − A] = 0, [A − B] = −[B − A], [B − A] + [C − B] + [A − C] = 0. (14.4)
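The identities (14.4) can be verified mechanically once we model points of Π by coordinate pairs and the translation τAB = [B − A] by the componentwise difference. A small Python sketch (the names are ours, for illustration only):

```python
def diff(B, A):
    # the "formal difference" [B - A]: the translation sending A to B
    return tuple(b - a for b, a in zip(B, A))

def neg(v):
    return tuple(-x for x in v)

def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

A, B, C = (1.0, 2.0), (4.0, -1.0), (0.5, 3.0)
assert diff(A, A) == (0.0, 0.0)                                     # [A - A] = 0
assert diff(A, B) == neg(diff(B, A))                                # [A - B] = -[B - A]
assert add(add(diff(B, A), diff(C, B)), diff(A, C)) == (0.0, 0.0)   # triangle identity
```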
Remark 2.7. Note that the notation [B − A] is just a notation: the
brackets [ , ] are not the usual parentheses ( , ), and a priori no distributivity law
is assumed. Thus the identities (14.4) are really independent conditions
(“axioms”) that need to be verified in each specific case. Good notation
helps a lot to simplify computations!
Definition 2.11. An (abstract) set X is called an affine space associated
with a vector space V, if there exists a “formal difference” map
X × X → V, (A, B) ↦ [B − A] ∈ V, (14.5)
which is surjective and satisfies the identities (14.4) (the sums in the right
hand side are taken in the vector space V).
The construction with parallel translations means that Π is an affine
space associated with the vector space R2 , but this will not yet determine Π
uniquely: it is the notion of distance (metric) that is absent in vector spaces.
Remark 2.8. One can instantly “transform” an affine space X into the
vector space V . Choose any element (“point”) O ∈ X and identify the
vector v with a point A ∈ X such that [A − O] = v. This identification
is a one-to-one map between X and V such that the point O is mapped
into the zero vector 0 ∈ V. Choosing a different point O′ results in another
one-to-one map. This is what it means to “forget zero”.
14.3. How to make an affine space into a metric space. Later we
introduce the notion of scalar product in an (abstract) vector space, which
will allow us to talk about lengths of vectors and angles between them,
see §??. These richer objects are called the Euclidean (vector) spaces.
The same construction of “affinization” (passing from vectors to points)
can be applied to Euclidean spaces also, producing the Euclidean (affine)
spaces. The Euclidean plane Π is exactly the Euclidean affine 2-space. In
the Euclidean affine spaces one can talk about distances between points and
about angles between straight lines passing through the same point. The
details will follow later, see §16.13.
15. Duality
The Gauss elimination algorithm is based on the treatment of “linear
equations” as vectors. In this section we show how this should be done in
an intelligent way.
15.1. “Equations” as vectors. Dual space. Consider a vector v =
(x1 , . . . , xn ) (written in some basis in V ) and a “string of coefficients” a =
(a1, . . . , an) ∈ Rn. The expression (a real number)
⟨⟨a, v⟩⟩ = a1 x1 + · · · + an xn ∈ R
can be considered as a bilinear function, that is, the map
Rn × V → R, (a, v) ↦ ⟨⟨a, v⟩⟩ ∈ R, (15.1)
which is linear separately in each of its two arguments a ∈ Rn and v ∈ V.
This observation is only a motivation for the following definitions.
Definition 2.12. Let V be a vector space. A linear functional on V is
a linear map A : V → W in the special case where W = R1 ≃ R.
To stress this particular case, we will denote linear functionals by the
Greek letters ξ, η, . . . .
Problem 2.17. Prove that the set of linear functionals on the same
vector space V is itself a vector space: linear functionals can be added
between themselves and multiplied by scalars.
Definition 2.13. The vector space of all linear functionals on V is called
the dual space (to V) and denoted by V∗. Elements of the dual space will
be also called co-vectors (or covectors), to stress the symmetry between V
and V∗. The value of a covector ξ ∈ V∗ on a vector v ∈ V is denoted5 by
⟨⟨ξ, v⟩⟩ ∈ R.
Remark 2.9 (important). For any vector v ∈ V the map
⟨⟨·, v⟩⟩ : V∗ → R, ξ ↦ ⟨⟨ξ, v⟩⟩
is a linear functional on V∗, i.e., a “co-co-vector”. Thus we see that any
v ∈ V represents an element from (V∗)∗ = V∗∗, the “second dual” space,
that is,
for every vector space V, V ⊆ V∗∗.
This might result in a nightmare: adding more stars would produce an
infinite chain of bigger and bigger spaces. Fortunately, at least for finite-dimensional
spaces this is not the case, and we always have the equality
V = V∗∗, that is, any linear functional on V∗ is the pairing ⟨⟨·, v⟩⟩ with some
vector v ∈ V.
5The notation is temporary, to avoid confusion with the scalar product ⟨ξ, v⟩ which
will be introduced later in §16. The difference is that both arguments of the scalar
product are from the same space V.
15.2. Dual basis. Let v1, . . . , vn be a basis in V. We say that covectors
(linear functionals) ξ1, . . . , ξn ∈ V∗ form a dual basis, if
⟨⟨ξi, vj⟩⟩ = 1 if i = j, and 0 if i ≠ j. (15.2)
One can immediately verify that the conditions (15.2) uniquely define {ξj}
given {vi} and vice versa. This shows that the two spaces V, V∗ have the
same dimension and hence V∗∗ must coincide with V.
Problem 2.18. Prove the last statement accurately. What conditions
should you check?
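Concretely, if the basis vectors v1, . . . , vn are written as the columns of a matrix, the dual basis covectors are the rows of the inverse matrix. A 2 × 2 sketch in Python (our notation; pair plays the role of ⟨⟨·, ·⟩⟩):

```python
def pair(xi, v):
    # the value <<xi, v>> of a covector (string of coefficients) on a vector
    return sum(a * x for a, x in zip(xi, v))

# a non-orthogonal basis of R^2
v1, v2 = (2.0, 1.0), (1.0, 1.0)

# rows of the inverse of the 2x2 matrix with columns v1, v2
d = v1[0]*v2[1] - v2[0]*v1[1]      # its determinant (here 1.0)
xi1 = ( v2[1]/d, -v2[0]/d)
xi2 = (-v1[1]/d,  v1[0]/d)

# the defining relations (15.2): <<xi_i, v_j>> = 1 if i = j, else 0
for i, xi in enumerate((xi1, xi2)):
    for j, v in enumerate((v1, v2)):
        assert abs(pair(xi, v) - (1.0 if i == j else 0.0)) < 1e-9
```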
15.3. From a space to its dual and back. The dual V ∗ to a vector
space naturally inherits many things that live in V .
Example 2.7. Let L ⊆ V be a subspace. Its dual (or annihilator) is the
subspace L◦ ⊆ V∗, defined as follows:
L◦ = {ξ ∈ V∗ : ∀v ∈ L ⟨⟨ξ, v⟩⟩ = 0}, (15.3)
the space of covectors that vanish on L.
By this definition, 0◦ = V∗, V◦ = 0 ⊆ V∗. You can instantly see that
L◦◦ = L.
Example 2.8. Let A : V → V be a linear operator (the linear self-map
of V) and ξ ∈ V∗ a covector. Then we can consider the mapping
η : V → R, v ↦ ⟨⟨ξ, Av⟩⟩ (15.4)
as a new linear functional, η ∈ V∗. A trivial check shows that the map that
sends ξ ∈ V∗ to η ∈ V∗ is a linear self-map of V∗ to itself. If we denote it by
A∗, then the characteristic property relating A and A∗ will be the identity
∀v ∈ V ∀ξ ∈ V∗ ⟨⟨A∗ξ, v⟩⟩ = ⟨⟨ξ, Av⟩⟩. (15.5)
The operator A∗ is called the adjoint operator to A.
This construction is in fact simpler than you may think. If {vi } ⊆ V is
a basis in V and {ξj } ⊆ V ∗ the dual basis in V ∗ , then the matrices of A and
A∗ can be obtained from each other by simple transposition (exchanging
places between the indices i and j):
a∗ij = aji , ∀i, j = 1, . . . , n. (15.6)
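The identity (15.5) together with the transposition rule (15.6) can be checked on a concrete example; a sketch (the matrix and vectors are our arbitrary sample data):

```python
def pair(xi, v):
    # the value <<xi, v>> of a covector on a vector, written in dual bases
    return sum(a * x for a, x in zip(xi, v))

def mat_vec(A, v):
    # the action of the matrix A on a vector v
    return tuple(sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A)))

def transpose(A):
    return [list(row) for row in zip(*A)]

A  = [[1.0, 2.0], [3.0, 4.0]]
xi = (5.0, -1.0)
v  = (2.0, 7.0)

# <<A* xi, v>> = <<xi, A v>>, with A* given by the transposed matrix
assert abs(pair(mat_vec(transpose(A), xi), v) - pair(xi, mat_vec(A, v))) < 1e-9
```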
Remark 2.10. Later we will study Euclidean spaces, which provide
a canonical way to identify each such space with its dual. Then the annihilator
L◦ will become a subspace of V, the orthogonal complement to L,
each operator can be compared with its adjoint, and operators equal to their
adjoints, A = A∗, will be called self-adjoint or symmetric, etc.
16. Scalar product. Euclidean vector spaces
To complete our algebraic study of the Euclidean plane, we need to
explain how the notions of distance (more precisely, length) and angle can
be generalized to arbitrary vector spaces. The tool for this is called the
scalar product, also known as the Euclidean structure (richer than the
general linear structure).
16.1. Linear and multilinear functions. Recall that we introduced
the notion of a linear functional (covector) as a linear map ξ : V → R1 = R.
This paves the way to the following definition.
Definition 2.14. A function of two arguments f (x, y) is called a bilinear
function or, in the classical language of the 19th century, a bilinear form, if
it is linear in each of the arguments separately.
Remark 2.11. Absent other conditions, the arguments x, y may be
taken over two different vector spaces x ∈ V , y ∈ W . However, under
some additional assumptions, like the symmetry (see below), we will have
to restrict to the case where V = W .
Example 2.9. The pairing
⟨⟨·, ·⟩⟩ : V × V∗ → R
is a bilinear functional on V × V∗. It cannot be symmetric, since the spaces
V and V∗ are distinct, and one cannot set ξ = v in the definition of ⟨⟨ξ, v⟩⟩.
Example 2.10. Any linear functional on R2 is necessarily of the form
(x1 , x2 ) 7−→ a1 x1 + a2 x2 .
Indeed, any vector from R2 is of the form x = x1 e1 + x2 e2 (why?). Applying
the linearity condition, we see that
f (x) = x1 f (e1 ) + x2 f (e2 ) = a1 x1 + a2 x2 , ai = f (ei ) ∈ R.
Example 2.11. If f(x, y) is a bilinear map on R2 × R2, then
f(x, y) = a11 x1 y1 + a12 x1 y2 + a21 x2 y1 + a22 x2 y2, (16.1)
where aij = f(ei, ej) ∈ R, i, j = 1, 2.
Problem 2.19. Prove this. Show that a bilinear map on Rn × Rn is
a polynomial of degree two in the 2n scalar variables (x1, . . . , xn, y1, . . . , yn),
linear in each of the two subcollections separately.
Definition 2.15. A bilinear function f : V ×V → R is called symmetric,
if f (x, y) = f (y, x).
This definition explicitly demands that both arguments x, y should take
values from the same vector space V , otherwise the condition makes no
sense.
Remark 2.12. If f : R2 × R2 → R is bilinear and symmetric, then in
(16.1) we have a12 = a21 .
Definition 2.16 (fundamental). The scalar product (or the Euclidean
structure) on a vector space V is a symmetric bilinear function f : V × V →
R, which is positive, that is,
∀x ∈ V f(x, x) ≥ 0, and f(x, x) = 0 ⟺ x = 0. (16.2)
Example 2.12. The function f : R2 × R2 → R defined by the formula
f((x1, x2), (y1, y2)) = x1 y1 + x2 y2
is a scalar product.
Indeed, in this case f(x, x) = x1² + x2², which obviously satisfies the
positivity condition.
Problem 2.20. Prove that the bilinear form (16.1) is a scalar product
if and only if a11 > 0 and det ( a11 a12 ; a21 a22 ) > 0 (don’t forget the condition
a12 = a21!).
16.2. Matrices again, but of a different sort. Consider three matrices:
(1) a matrix x which has only one row with the entries (x1, . . . , xn),
(2) a matrix y which has only one column with the entries y1, . . . , yn,
(3) a square n × n matrix G with the entries {gij}, i, j = 1, . . . , n.
Then we can multiply them as follows,
xGy = (x1, . . . , xn) G (y1, . . . , yn)⊤. (16.3)
The result is a 1 × 1 matrix with a single entry, hence can be considered
as a real number. The first and the last factors of this product are built from
vectors of the space Rn, hence the product is a bilinear form. This form will be
symmetric if the matrix G is symmetric, that is, gij = gji. In particular,
one can use G = E, the identity matrix, obtaining the bilinear form
(x1, . . . , xn) (y1, . . . , yn)⊤ = x1 y1 + · · · + xn yn.
This is a scalar product, since setting y = x yields the nonnegative function
x1² + · · · + xn² ≥ 0, the sum of squares.
16.3. The Gram matrix. Consider a Euclidean space with the scalar
product ⟨·, ·⟩ and an arbitrary basis {ei}ni=1 in it. Each vector can be
represented as x = x1 e1 + · · · + xn en, resp. y = y1 e1 + · · · + yn en. What does
the matrix G look like which allows us to compute the scalar product ⟨x, y⟩ as
the matrix product (16.3)? A simple computation shows that one has to take
gij = ⟨ei, ej⟩, i, j = 1, . . . , n, so that
G = ( ⟨e1, e1⟩ · · · ⟨e1, en⟩ ; . . . ; ⟨en, e1⟩ · · · ⟨en, en⟩ ). (16.4)
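The formula (16.4) is easy to confirm numerically: take a non-orthonormal basis of R2, build its Gram matrix with respect to the standard dot product, and check that the matrix product (16.3) applied to coordinate strings reproduces the scalar product of the actual vectors. A sketch (the basis and coordinates are our arbitrary choices):

```python
def dot(u, v):
    # the standard scalar product on R^2
    return sum(a * b for a, b in zip(u, v))

e = [(1.0, 1.0), (1.0, 2.0)]                   # a non-orthonormal basis
G = [[dot(ei, ej) for ej in e] for ei in e]    # the Gram matrix (16.4)

x, y = (3.0, -1.0), (2.0, 5.0)                 # coordinates in the basis e
xGy = sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(2))

# the same scalar product computed on the vectors themselves
vx = tuple(x[0]*e[0][k] + x[1]*e[1][k] for k in range(2))
vy = tuple(y[0]*e[0][k] + y[1]*e[1][k] for k in range(2))
assert abs(xGy - dot(vx, vy)) < 1e-9
```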
16.4. Scalar product in Π. In high school the scalar product of the
“vectors” −→OA and −→OB is defined using the following construction. Consider the
oriented line ℓ = OA and the orthogonal projection −→OB′ of −→OB on this line. Then
the vectors −→OA and −→OB′ belong to the same line, isometric to R, and hence can be
multiplied as usual numbers, taking orientation into account (if −→OB′ points
in the direction opposite to −→OA, then it should be treated as a negative number).
One can easily check that this construction is indeed bilinear, since the
orthogonal projection is a linear map, and the product R × R ∋ (a, b) ↦ ab ∈ R is
“bilinear” by distributivity. It is nonnegative, since for B = A the projection of −→OA
is the same vector, and the square of any number is nonnegative. The symmetry is
only a bit more difficult to see.
In other words, the algebraic properties of the scalar product are hidden rather
deeply in the geometric construction, and it is algebra that makes the scalar product
important for geometry.
16.5. Notation. The scalar product is usually denoted by angle brackets,
(v, w) ↦ f(v, w) = ⟨v, w⟩ ∈ R.
Bilinearity means that you can expand the angle brackets with respect to
the operations +, · on the vector spaces, exactly as you would manipulate
ordinary numbers with the usual operations.
A good mathematical symbolism (notation) does miracles to support
our intuition!
16.6. Scalar product and lengths. Regardless of the problems with
the “geometric definition” of the scalar product, it is so fundamental that
it can be used for the definition of “geometric notions” in the context of an
abstract vector space.
Definition 2.17. From that moment on, we say that E is a Euclidean
vector space, if:
(1) E is a finite-dimensional vector space over R, and
(2) there is a (fixed) Euclidean structure, the scalar product (v, w) 7→
hv, wi satisfying the axioms above.
Having in mind the geometric interpretation of the scalar product, the
following definitions should look quite natural.
Definition 2.18. In a Euclidean vector space we define:
(1) the length |v| of a vector v ∈ E as |v| = √⟨v, v⟩ (the positive value
of the square root),
(2) the angle between two nonzero vectors v, w ≠ 0 as
∠(v, w) = arccos ( ⟨v, w⟩ / (|v||w|) ) ∈ [0, π].
The second part of the definition depends on the assumption that the
fraction in the right hand side is between −1 and 1. This is guaranteed by
one of the most fundamental inequalities in analysis.
Theorem 2.6 (Cauchy–Schwarz–Bunyakovskii–. . . inequality).
In any Euclidean vector space,
⟨v, w⟩² ≤ ⟨v, v⟩ ⟨w, w⟩.
Corollary 2.7.
|⟨v, w⟩| ≤ |v| · |w|.
Proof. This looks like a formal trick, but it is in fact the shortest proof
of the above inequality, not using anything but the bilinearity of the scalar
product. I don’t know who was the first to suggest this proof, but no better
argument has been found ever since.
Consider the expression
0 ≤ ⟨v + tw, v + tw⟩ ∀t ∈ R,
with a parameter t ∈ R. Using bilinearity, we expand the right hand side as
⟨v, v⟩ + 2t ⟨v, w⟩ + t² ⟨w, w⟩,
which is a quadratic polynomial in t. The quadratic polynomial a + 2bt + ct² with
a, b, c ∈ R and the leading coefficient c = ⟨w, w⟩ > 0 has no distinct
real roots (the necessary and sufficient condition to take only nonnegative
values) if and only if the discriminant ∆ = b² − ac = ⟨v, w⟩² − ⟨v, v⟩ ⟨w, w⟩ is
non-positive.
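The proof can be traced numerically: the quadratic polynomial ⟨v + tw, v + tw⟩ stays nonnegative for every t, and its coefficients give the inequality. A sketch in Python (the sample vectors are ours):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v = (3.0, -1.0, 2.0)
w = (1.0, 4.0, -2.0)

# the inequality itself
assert dot(v, w)**2 <= dot(v, v) * dot(w, w)

# <v + tw, v + tw> = a + 2bt + ct^2 and it stays nonnegative for every sampled t
a, b, c = dot(v, v), dot(v, w), dot(w, w)
for t in (-2.0, -0.5, 0.0, 0.3, 1.7):
    vt = tuple(vi + t*wi for vi, wi in zip(v, w))
    q = dot(vt, vt)
    assert abs(q - (a + 2*b*t + c*t*t)) < 1e-9
    assert q >= 0.0
```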
16.7. Angles and orthogonality. We define two vectors v, w ≠ 0
to be orthogonal to each other, if and only if ⟨v, w⟩ = 0. In the
geometric interpretation of the scalar product (Definition 2.18), this means
that ∠(v, w), measured in the 2-plane spanned by v and w,
is equal to π/2.
If X is an affine space associated with a vector space V which has a
Euclidean structure, then we define
∀A, B ∈ X dist(A, B) = |[B − A]|
as the norm of the vector [B − A] ∈ V, “the length of the vector from
A to B”.
You are invited to see that in your familiar Euclidean plane Π this defi-
nition gives exactly the same distance function, hence the same angles.
16.8. Orthonormal systems. Expansion in the orthonormal ba-
sis. In a Euclidean space one can choose bases (and coordinates) in a special
way taking into account the scalar product.
Definition 2.19. A system of vectors v1, . . . , vk in a Euclidean n-space
E is called orthonormal, if
⟨vi, vj⟩ = 0 if i ≠ j, and 1 if i = j. (16.5)
Of course, this definition is intended to axiomatize the properties of the
basis vectors ei = (0, . . . , 1, . . . , 0) ∈ Rn .
Problem 2.21. Prove that any orthonormal system is linearly independent.
Therefore, the number of elements in any such system is ≤ dim E.
If we have the Euclidean space E with an orthonormal basis w1 , . . . , wn ,
then the problem of expanding an arbitrary vector v ∈ E in this basis is
much simpler than in general: the system of equations (13.7) can be solved
instantly.
Indeed, if
v = x1 w1 + · · · + xn wn,
then we can compute the scalar product of both sides of this equation
with any of the vectors wi. Because of the orthonormality, we have
⟨v, wi⟩ = xi · ⟨wi, wi⟩ = xi, i = 1, . . . , n.
In other words, the coefficients of the expansion can be computed without
inverting a matrix, simply by directly computing the scalar product of
the expanded vector v with all vectors of the orthonormal basis wi.
Proposition 2.8. For any orthonormal basis w1, . . . , wn in a Euclidean
space E and any vector v ∈ E,
v = ⟨v, w1⟩ w1 + · · · + ⟨v, wn⟩ wn.
Existence of orthonormal bases needs to be proved. In fact, any basis can be
explicitly “twisted” to produce an orthonormal basis.
Theorem 2.9. In any Euclidean linear n-space E one can “twist” any basis v1, . . . , vn
into an orthonormal basis w1, . . . , wn in such a way that
Rv1 = Rw1,
Rv1 + Rv2 = Rw1 + Rw2,
· · · (16.6)
Rv1 + · · · + Rvn = Rw1 + · · · + Rwn.
The assertion of this Theorem (sometimes called the Gram–Schmidt orthogonalization)
means that the chain of subspaces L1 ⊂ L2 ⊂ · · · ⊂ Ln = V constructed in §12.2 will be
the same for the two bases (of course, the enumeration is important).
Proof. Let L ⊂ E be a subspace of dimension k, w1, . . . , wk an orthonormal basis
in L, and v ∉ L a vector outside of this subspace. Consider the vector
w = v − ⟨v, w1⟩ w1 − · · · − ⟨v, wk⟩ wk.
Then one can see immediately that w ≠ 0 is orthogonal to all vectors w1, . . . , wk. We
set
wk+1 = w / |w|, so that |wk+1|² = ⟨wk+1, wk+1⟩ = 1.
The new collection w1, . . . , wk, wk+1 is an orthonormal basis in L + Rv. This process allows us to
construct the wi inductively, starting from w1 = v1 / |v1|.
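The inductive proof above is exactly the classical Gram–Schmidt algorithm, which fits in a dozen lines of code. A sketch for R3 with the standard scalar product (function names are ours):

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    # orthonormalize v1, ..., vn keeping the flag of subspaces of Theorem 2.9
    ws = []
    for v in basis:
        w = list(v)
        for u in ws:                       # subtract the projection on span(w1, ..., wk)
            c = dot(v, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        n = sqrt(dot(w, w))                # nonzero, since the v's form a basis
        ws.append([wi / n for wi in w])
    return ws

ws = gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)])
for i in range(3):
    for j in range(3):
        assert abs(dot(ws[i], ws[j]) - (1.0 if i == j else 0.0)) < 1e-9
```

Note that ws[0] is proportional to the first input vector, in agreement with Rv1 = Rw1 in (16.6).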
Problem 2.22. Let L ⊂ E be a subspace of a Euclidean space and v ∈ E a vector.
Give the definition of orthogonality between v and L. Give the definition of the orthog-
onal projection of v on L. Find an explicit formula for this projection, if you have an
orthonormal basis in L.
16.9. Orthogonal projections. Another computation that can be
easily performed in the Euclidean space is the orthogonal projection.
The notion of a projection (not necessarily orthogonal) is quite general. Assume
that V is a vector space and L, M ⊆ V two subspaces with the following properties:
(1) L ∩ M = {0}, and
(2) L + M = V , that is, any vector v ∈ V can be represented as a sum v = u + w,
u ∈ L, w ∈ M .
Because of the first condition, the decomposition in the second condition is unique.
Definition 2.20. For L, M ⊆ V as above, the projection of V on L parallel to M is
the map6 π : V → L which sends v to π(v) = u.
Of course, one has also a complementary projection π 0 : V → M parallel to L.
Problem 2.23. Prove that π + π 0 = idV (the identity map V → V ).
Proposition 2.10.
(1) The map π is linear.
(2) The restriction π|L is the identical map L → L.
(3) The kernel Ker π coincides with M .
(4) π 2 = π.
Problem 2.24. Prove the above proposition.
Proposition 2.11. Let A be a linear map, A : V → V . Assume that A2 = A. Then
there exist two subspaces L, M as above, such that A is a projection of V on L parallel to
M.
Proof. Denote by L = AV the image of A and by M = Ker A = {w : Aw = 0} its kernel. For any
v ∈ V let w = v − Av. Then Aw = Av − A²v = 0, that is, w ∈ M. For the same reason, if
u ∈ L, that is, u = Av ∈ AV, then Au = A²v = Av = u, that is, A|L is the identity map
L → L. The decomposition v = u + w, u = Av, w = v − Av, shows that A is the projection
as defined.
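Proposition 2.11 can be illustrated on a concrete idempotent matrix; a short sketch (the matrix is our arbitrary example):

```python
def mat_vec(A, v):
    return tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))

A = [[1.0, 1.0],       # an idempotent matrix: one checks directly that A A = A
     [0.0, 0.0]]
v = (3.0, 5.0)

u = mat_vec(A, v)                                   # u = Av lies in the image L
w = tuple(vi - ui for vi, ui in zip(v, u))          # w = v - Av lies in Ker A

assert mat_vec(A, u) == u                           # A restricted to L is the identity
assert mat_vec(A, w) == (0.0, 0.0)                  # w is killed by A
assert tuple(ui + wi for ui, wi in zip(u, w)) == v  # the decomposition v = u + w
```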
Now assume that we have a Euclidean vector space E, that is, a vector
space equipped with a scalar product.
6In linear algebra the choice of the letter π to denote projections is very widespread.
Unless you have reasons to expect a hidden circle or sphere, you should never assume that
π is the Archimedes constant 3.14159 . . . .
Definition 2.21. Two subspaces L, M ⊆ E are said to be orthogonal
to each other, if for any u ∈ L, w ∈ M, we have ⟨u, w⟩ = 0. We then write
⟨L, M⟩ = 0.
By this definition, {0}⊥ = E, E ⊥ = {0}. For a given subspace L ⊆ E
there is the largest subspace which is orthogonal to it.
Definition 2.22. The orthogonal complement L⊥ is the subspace
L⊥ = {w ∈ E : ⟨w, L⟩ = 0} = {w ∈ E : ∀u ∈ L ⟨w, u⟩ = 0}. (16.7)
By this definition, any subspace orthogonal to L is part of L⊥ .
Proposition 2.12.
(1) L ∩ L⊥ = {0}.
(2) L + L⊥ = E, i.e., ∀v ∈ E ∃u ∈ L, ∃w ∈ L⊥ such that v = u + w.
(3) dim L + dim L⊥ = dim E.
Proof. If 0 ≠ v ∈ L ∩ L⊥, then by definition v should be orthogonal to
itself, ⟨v, v⟩ = 0, which is impossible (why?).
Assume that L + L⊥ is a proper subspace L′ ⊊ E which does not coincide
with the whole E. Then there exists v ∈ E, v ∉ L′. By the procedure
analogous to that used in the proof of Theorem 2.9, we can construct a
nonzero vector v′ ∈ E orthogonal to L′. Being orthogonal to L ⊆ L′, this
vector must belong to L⊥ ⊆ L′; on the other hand, v′ ∉ L′ by construction.
The contradiction shows that L′ = E.
The last assertion follows immediately from the first two.
Definition 2.23. Let L ⊆ E be a subspace of a Euclidean space. The
orthogonal projection π : E → L is the projection in the sense of Defini-
tion 2.20 which is parallel to L⊥ .
Speaking informally, the orthogonal projection πL sends each vector v ∈
E into the first term of the expansion v = u + w, where u ∈ L and w ∈ L⊥ .
One can easily describe this projection in terms of orthonormal bases.
If E is a Euclidean space, then both L and L⊥ are subspaces of it, therefore
the scalar product makes each of them into a Euclidean space. Therefore we
can choose orthonormal bases {w1, . . . , wk} ⊆ L and {w′1, . . . , w′m} ⊆ L⊥,
k + m = n = dim E. Since L ∩ L⊥ = {0}, the union of these vectors is a basis
in E and each vector v can be expanded as follows,
v = (x1 w1 + · · · + xk wk) + (x′1 w′1 + · · · + x′m w′m),
with the coefficients x1, . . . , xk, x′1, . . . , x′m ∈ R given by Proposition 2.8.
The first term in this sum is the projection on L parallel to L⊥, the second is
the projection on L⊥ parallel to L.
Remark 2.13. The orthogonalization process which proves Theorem 2.9
can be described very easily in geometric terms. Each basis defines a growing
chain of subspaces
0 ⊊ L1 ⊊ L2 ⊊ · · · ⊊ Ln−1 ⊊ Ln = V.
The orthogonalization process is inductive: we construct orthonormal bases
in all subspaces from this chain, starting from L1. The induction step in
turn consists of the following operations:
(1) Take the basis vector vk+1 ∈ Lk+1 \ Lk, k = 0, . . . , n − 1;
(2) Compute its projection v′k+1 = πk(vk+1) on Lk, using the orthonormal
basis in Lk which was constructed on the previous step,
v′k+1 = ⟨vk+1, w1⟩ w1 + · · · + ⟨vk+1, wk⟩ wk;
(3) The difference w′k+1 = vk+1 − v′k+1 is then orthogonal to Lk (and
still nonzero);
(4) w′k+1 can be normalized (multiplied by a suitable scalar) so that
the result wk+1 has the unit scalar square, ⟨wk+1, wk+1⟩ = 1.
16.10. Isometries. Let E be a Euclidean space.
Definition 2.24. An isometry of E is a linear self-map A : E → E
which preserves the scalar product: for any two vectors v, w ∈ E
⟨Av, Aw⟩ = ⟨v, w⟩. (16.8)
Problem 2.25. Prove that any isometry preserves lengths of vectors
and angles between them.
Problem 2.26. Give the definition of an isometry between two different
Euclidean spaces.
Problem 2.27. Prove that all isometries of a Euclidean space form a
group with respect to composition.
Problem 2.28. Consider Rn with the standard orthonormal basis e1 , . . . , en
and assume that v1 , . . . , vn is another orthonormal basis there.
Show that the map
(x1 , . . . , xn ) 7→ x1 v1 + · · · + xn vn
is an isometry of Rn .
16.11. Diagonalizability of isometries. We have already seen that
arbitrary operators (matrices) may be non-diagonalizable, even if all their
eigenvalues (the roots of the characteristic polynomial) are real. With isometries
this cannot happen.
16.12. Isometries of R2. Consider the Euclidean 2-space R2 with the
standard scalar product. Any linear map A is uniquely defined by its matrix;
write (see (13.1))
Ae1 = ae1 + be2, Ae2 = ce1 + de2.
There are three conditions for A to be an isometry:
|Aei| = 1, i = 1, 2, ⟨Ae1, Ae2⟩ = 0.
Since the basis e1, e2 is orthonormal, these conditions are expressed as
a² + b² = 1, c² + d² = 1, ac + bd = 0.
The last equation defines (c : d) uniquely through a, b: necessarily c = −λb,
d = λa for some λ ∈ R (why?). The second equation implies that λ² = 1, i.e.,
λ = ±1.
Note that any two numbers such that a² + b² = 1 define uniquely an
angle ϕ ∈ R mod 2πZ such that a = cos ϕ, b = sin ϕ. Thus the matrix of
any isometry of R2 is one of the two types:
( cos ϕ   sin ϕ )        ( cos ϕ    sin ϕ )
( −sin ϕ  cos ϕ )   or   ( sin ϕ   −cos ϕ ).
Problem 2.29. Show that the first solution is orientation-preserving on
R2 , the second is orientation-reversing.
Problem 2.30. Prove that the first map is a rotation counterclockwise
by ϕ: if v = (x, y) = (r cos ψ, r sin ψ) for some r > 0 and ψ ∈ R mod 2πZ,
then
Av = (r cos(ϕ + ψ), r sin(ϕ + ψ)).
What does the second isometry do geometrically? Show that it is a composition
of a rotation with the mirror symmetry of the plane (x, y) ↦ (x, −y) (in either
order).
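The first matrix can be checked to act as claimed: a short sketch (names ours) verifies that it preserves lengths and scalar products and shifts the polar angle by ϕ.

```python
from math import cos, sin, pi, hypot, atan2, isclose

def rotate(phi, v):
    # the first matrix above: Ae1 = (cos phi, sin phi), Ae2 = (-sin phi, cos phi)
    x, y = v
    return (cos(phi)*x - sin(phi)*y, sin(phi)*x + cos(phi)*y)

phi = pi / 3
v, u = (1.0, 0.0), (0.5, 2.0)
rv, ru = rotate(phi, v), rotate(phi, u)

dot = lambda a, b: a[0]*b[0] + a[1]*b[1]
assert isclose(hypot(*rv), hypot(*v))            # lengths are preserved
assert isclose(dot(ru, rv), dot(u, v))           # so is the scalar product
assert isclose(atan2(rv[1], rv[0]), phi)         # the polar angle grew by phi
```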
16.13. Isometries of affine Euclidean spaces. We already defined the affine
space S associated with a Euclidean space E in §16.7: there is no scalar product be-
tween points A, B ∈ S, but there are distances (lengths of the vectors [B − A]) and angles
∠AOB as the angle (in E) between the vectors [A − O] and [B − O].
One can see that any translation A 7→ A + v, A ∈ S, v ∈ E, is trivially distance-
preserving, since [B − A] = [(B + v) − (A + v)]. On the other hand, if I : S → S is an
isometry and preserves some point O ∈ S, then it must preserve all angles ∠AOB with
vertex at this point or “reflect” them as ϕ 7→ −ϕ mod 2πZ.
17. Rows or columns? (a horror story)
The geometric language is perfect and unambiguous until we start calculating specific
examples with concrete matrices. Then you inevitably come across the question: we need
to write down some matrices, but how do we remember which numbers should be written in
rows and which in columns? The issue is a damned chicken-and-egg question. When
we say that A is an n × m-matrix, how many columns does it have, n or m (the other number
being the number of rows)? When we denote by aij a matrix entry, does i mean a row
or a column?
The source of confusion is twofold. Whether we want it or not, we are dealing with
elements of vector spaces and their duals. The definition of duality (at least in the finite-dimensional
case) is symmetric: it is a pairing ⟨⟨ξ, v⟩⟩ between a space V and its dual V∗
which does not distinguish who is who: it is a bilinear map V∗ × V → R or V × V∗ → R
according to our preferences.
On the other hand, the matrix product is asymmetric: to multiply A by B in the
order AB, we require that the number of columns in A be equal to the number of rows
in B, otherwise the product is undefined. Clearly, this rule is order-specific (to define the
product BA, we need to exchange rows and columns). How can one sort out these things
without looking into a textbook?
You can safely skip the rest of this section unless you are indeed going to do some
calculations with matrices.
17.1. Before you start any computations. Matrices are rectangular tables (in-
cluding the limit cases when a table reduces to just one row or one column) filled by
numbers, scalars, elements of R. But in linear algebra scalars (as opposed to vectors) may
appear only as coefficients of various (linear) combinations. Thus in order to have matrices,
besides vector spaces and objects on them (operators, linear functionals, bilinear and
quadratic forms), an initial supply of vectors in each space is necessary. In practical terms,
matrices will appear no sooner than a basis in each vector space is selected, and these
matrices depend on the choice of the basis. Choosing a different basis may considerably
simplify the matrices, see §18.
Sometimes the choice of basis is made tacitly, when the space is assumed to be the
Cartesian space Rⁿ and the basis is the standard collection of vectors
$$\{e_1, \dots, e_n\}, \qquad e_i = (0, \dots, 0, \underbrace{1}_{i}, 0, \dots, 0) \in \mathbb{R}^n, \quad i = 1, \dots, n.$$
In some other cases the choice of the basis is made explicit. For instance, in the linear
space R_{n−1}[t] of polynomials of degree ≤ n − 1 in one variable t with real coefficients, the
standard basis consists of the monomials
$$\{1, t, t^2, \dots, t^{n-1}\} \subseteq \mathbb{R}_{n-1}[t].$$
This basis is very convenient to deal with polynomials in the expanded form. However,
if we consider the linear operator of derivation D : R_{n−1}[t] → R_{n−1}[t], Dp = p′, then the
following basis will be more convenient,
$$f_k = \frac{1}{k!}\,t^k, \qquad k = 0, 1, \dots, n-1$$
(note that the enumeration starts from k = 0). In this basis D takes an especially simple
form, Dfk = fk−1 , Df0 = 0, so that it acts on the expansion
$$p(t) = \sum_{k=0}^{n-1} \frac{c_k}{k!}\,t^k \;\xmapsto{\;D\;}\; \sum_{k=0}^{n-2} \frac{c_{k+1}}{k!}\,t^k = p'(t)$$
as a simple coefficient shift, (c_0, c_1, c_2, …, c_{n−1}) ↦ (c_1, c_2, …, c_{n−1}, 0).
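For readers who like to experiment, here is a small numpy sketch of this coefficient shift (numpy is assumed available; the coefficients are illustrative):

```python
import numpy as np

n = 5  # polynomials of degree <= 4

# Matrix of the derivation D in the basis f_k = t^k / k!, k = 0, ..., n-1:
# D f_k = f_{k-1}, D f_0 = 0, i.e. units on the superdiagonal.
D = np.diag(np.ones(n - 1), k=1)

# Coefficients (c_0, ..., c_{n-1}) of p(t) = sum_k c_k t^k / k!
c = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Applying D shifts the coefficients: (c_0, ..., c_{n-1}) -> (c_1, ..., c_{n-1}, 0).
print(D @ c)  # -> [1. 4. 1. 5. 0.]
```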
17.2. Column vectors and row vectors. The easiest starting point to memorize
the formula for the matrix multiplication is as follows. Let p, x be two vectors in Rⁿ
with the coordinates p_1, …, p_n and x_1, …, x_n respectively. Then their scalar product (or
bilinear pairing of Rⁿ with Rⁿ* ≅ Rⁿ) is equal to ⟨p, x⟩ = ∑_{i=1}^{n} p_i x_i. For each of them we
can write down the corresponding matrix as a column (with many rows) or a row (with
many columns).
Definition 2.25. The mnemonic rule of matrix multiplication: column to the right,
row to the left.
$$\langle p, x\rangle = \begin{pmatrix} p_1 & p_2 & \cdots & p_{n-1} & p_n \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} \in \mathrm{Mat}_{1\times 1}(\mathbb{R}) \simeq \mathbb{R}. \tag{17.1}$$
If we multiply the row p and the column x in the opposite order, we will obtain an
n × n-matrix (write it explicitly!).
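The mnemonic rule (17.1) and the opposite-order product are easy to check numerically; a small numpy sketch (numpy assumed, numbers illustrative):

```python
import numpy as np

p = np.array([[1.0, 2.0, 3.0]])      # a row, i.e. a 1 x n matrix
x = np.array([[4.0], [5.0], [6.0]])  # a column, i.e. an n x 1 matrix

# "Column to the right, row to the left": a 1 x 1 matrix, the pairing <p, x>.
print(p @ x)  # -> [[32.]]  since 1*4 + 2*5 + 3*6 = 32

# The opposite order produces an n x n matrix with entries x_i * p_j.
print(x @ p)
```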
17.3. Matrices and “matrices”. We can extend this mnemonic rule to matrices
whose entries are vectors themselves.
Example 2.13. Let v_1, …, v_n be a basis in V and let v ∈ V be a vector which has
coordinates (x_1, …, x_n) in this basis, x_i ∈ R. Then by definition we have
v = x_1 v_1 + ⋯ + x_n v_n
(we love to write the scalar coefficients x_i to the left of the vectors v_i!). However, to
render this formula in the matrix form, we need to introduce the “row matrix”
$$\begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \tag{17.2}$$
so that
$$v = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in V. \tag{17.3}$$
Note that the rule of matrix multiplication forces us to write ∑ v_i x_i in the “wrong” order.
Well, as long as we understand what it means, no harm is done.
Of course, the “row matrix” to the left is not a genuine matrix: its entries are not
numbers, but vectors v_i. Each such vector is given by the n-tuple of its coordinates, say, in
the standard basis e_i ∈ Rⁿ ≅ V. Expanding each vector v_i as a column
$$v_i = \begin{pmatrix} v_{i1} \\ \vdots \\ v_{in} \end{pmatrix} \in \mathbb{R}^n, \qquad i = 1, \dots, n,$$
we replace the “row matrix” (17.2) by the square matrix
$$\begin{pmatrix} v_{11} & \dots & v_{n1} \\ \vdots & \ddots & \vdots \\ v_{1n} & \dots & v_{nn} \end{pmatrix} \tag{17.4}$$
Then the representation (17.3) of the vector v will take the form of the matrix product
$$v = \begin{pmatrix} v_{11} & \dots & v_{n1} \\ \vdots & \ddots & \vdots \\ v_{1n} & \dots & v_{nn} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n \simeq V. \tag{17.5}$$
Example 2.14. Let {v1 , . . . , vn } and {w1 , . . . , wn } be two different bases in the same
space V .
Show that there is an invertible n × n-square matrix C ∈ Mat_{n×n}(R) such that
$$\begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} = \begin{pmatrix} w_1 & \dots & w_n \end{pmatrix} \cdot C. \tag{17.6}$$
Assume that a certain vector v has coordinates (x1 , . . . , xn ) in the first basis {vi }. How
can we find its coordinates (y_1, …, y_n) in the second basis? The answer is obvious: we
have to satisfy the identity
$$\begin{pmatrix} w_1 & \dots & w_n \end{pmatrix} \cdot \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = v = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{17.7}$$
Substituting (17.6), we see that
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = C \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{17.8}$$
In other words, the same matrix C that described the relationship (17.6) between the
bases also performs the recalculation of the coordinates of any vector after the base
change.
Note, however, the critical circumstance! The formula (17.6) expresses the vectors v_i
through the vectors w_j, while the formula (17.8) expresses the coordinates with respect
to {w_j} through the coordinates with respect to {v_i}: the vectors and the coordinates are
transformed by C in opposite directions!
If you want to go the other way and express the x_i via the y_j, you would have to
compute the inverse matrix C⁻¹. Upon a second thought, this is not so big a surprise,
cf. with §13.3.
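A tiny numpy sketch of the coordinate rule that follows from (17.6) and (17.7) (numpy assumed; the bases below are hypothetical, chosen only to be invertible):

```python
import numpy as np

# Basis {w_i} = columns of W; a second basis {v_i} defined by (17.6): [v] = [w] C.
W = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([[2.0, 0.0],
              [1.0, 1.0]])
V = W @ C                     # columns are the vectors v_i

x = np.array([1.0, 0.0])      # coordinates of v in the basis {v_i}: here v = v_1
v = V @ x                     # the vector itself

y = C @ x                     # its coordinates in the basis {w_i}
assert np.allclose(W @ y, v)  # same vector, expanded in the basis {w_i}
print(y)  # -> [2. 1.]
```

Going back from y to x indeed requires the inverse matrix: `x = np.linalg.solve(C, y)`.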
17.4. Matrices of operators. Thus far, we see that coordinates of a single vector
should be arranged in a column, while vectors forming a basis should be initially arranged
in a row and only then expanded as columns to produce a square matrix.
How should the matrix elements of an operator be arranged into a rectangular table? We
start from Definition 13.1. Given our readiness to work with “matrices” whose entries
are themselves vectors (while preserving the thumb rule of “matrix” multiplication),
we can write the formula of how an operator acts on elements of the basis {e_i} in V,
expanding the result of the action in the basis {f_j} in W:
$$A(e_i) = \begin{pmatrix} f_1 & \dots & f_m \end{pmatrix} \cdot \begin{pmatrix} a_{i1} \\ \vdots \\ a_{im} \end{pmatrix}, \tag{17.9}$$
where the column of the coefficients a_{i1}, …, a_{im} has m = dim W entries. Collecting these
n columns, we obtain a rectangular (if n ≠ m) matrix with m = dim W rows and
n = dim V columns. Again, you can easily recall the answer by looking at the height of
the columns (equal to m = dim W, the number of elements in the W-basis) and at their
number (equal to n = dim V).
Arranging the coefficients of the expansions (17.9) in a rectangular table, we have the
“matrix” identity
$$\begin{pmatrix} Ae_1 & \dots & Ae_n \end{pmatrix} = \begin{pmatrix} f_1 & \dots & f_m \end{pmatrix} \cdot M, \tag{17.10}$$
where M = M_A is the matrix of the operator A in the chosen pair of bases.
If you want to replace the bases $\{e_i\}_1^n$ and $\{f_j\}_1^m$ by another pair of bases $\{e'_i\}_1^n$ and
$\{f'_j\}_1^m$, follow the prescription explained in Example 2.14.
Example 2.15. Assume that
$$\begin{pmatrix} e'_1 & \dots & e'_n \end{pmatrix} = \begin{pmatrix} e_1 & \dots & e_n \end{pmatrix} \cdot C, \tag{17.11}$$
$$\begin{pmatrix} f'_1 & \dots & f'_m \end{pmatrix} = \begin{pmatrix} f_1 & \dots & f_m \end{pmatrix} \cdot D, \tag{17.12}$$
where C, D are square matrices of the appropriate sizes, see Example 2.14.
Then for the same operator we obtain the new matrix M′ as follows. The basis
$$\begin{pmatrix} e'_1 & \dots & e'_n \end{pmatrix} = \begin{pmatrix} e_1 & \dots & e_n \end{pmatrix} \cdot C \qquad \text{by (17.11)}$$
is by (17.10) transformed by A into
$$\begin{pmatrix} f_1 & \dots & f_m \end{pmatrix} \cdot MC = \begin{pmatrix} f'_1 & \dots & f'_m \end{pmatrix} \cdot D^{-1}MC \qquad \text{by (17.12)}.$$
Thus
$$M' = D^{-1} M C,$$
where C and D are the invertible square matrices from (17.11) and (17.12), and M is the
initial matrix of A.
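A quick numerical sanity check of the rule M′ = D⁻¹MC (numpy assumed; random matrices are used for illustration and are invertible almost surely; we adopt the convention that the matrix of A has dim W rows and dim V columns):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
M = rng.standard_normal((m, n))  # matrix of A in the bases {e_i}, {f_j}
C = rng.standard_normal((n, n))  # base change in V (random, invertible a.s.)
D = rng.standard_normal((m, m))  # base change in W (ditto)

M_new = np.linalg.inv(D) @ M @ C  # matrix of A in the primed bases

# Consistency: a vector with {e'}-coordinates x has {e}-coordinates C x;
# its image has {f}-coordinates M C x, i.e. {f'}-coordinates D^{-1} M C x.
x = rng.standard_normal(n)
assert np.allclose(D @ (M_new @ x), M @ (C @ x))
```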
17.5. Dual vectors, dual basis. As soon as we pass from a given vector space V
with a basis {e_i} = (e_1, …, e_n) to the dual space V*, columns become rows and vice
versa. If {ξ_1, …, ξ_n} is the basis in V* dual to the basis {e_i}, then this duality condition in
the matrix form looks as follows,
$$\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} \cdot \begin{pmatrix} e_1 & \dots & e_n \end{pmatrix} = E, \tag{17.13}$$
where E is the square n × n-identity matrix (note! the column is at the left, the row at
the right, so the result is not a number).
A covector ξ ∈ V* with the coordinates (p_1, …, p_n), p_j ∈ R, in the basis {ξ_j}, is the
row matrix
$$\xi = \begin{pmatrix} p_1 & \dots & p_n \end{pmatrix} \cdot \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix},$$
and the value it takes on the vector v with the coordinates (x_1, …, x_n) in the basis {e_i}
is the number (or rather 1 × 1-matrix)
$$\begin{pmatrix} p_1 & \dots & p_n \end{pmatrix} \cdot \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} \cdot \begin{pmatrix} e_1 & \dots & e_n \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} p_1 & \dots & p_n \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \langle \xi, v\rangle,$$
because the middle product is the identity matrix E by (17.13).
17.6. Quadratic forms. As was explained, in the Euclidean space the scalar product
allows us to identify the space V with its dual V* by associating with each vector v ∈ V
the covector ξ ∈ V* such that
$$\xi = \langle v, \cdot\rangle, \quad\text{that is,}\quad \langle \xi, w\rangle = \langle v, w\rangle \quad \forall w \in V.$$
On the matrix level (assuming that all coordinates are calculated with respect to an
orthonormal basis), this legalizes the operation of transposition, which turns columns
into rows and rows into columns:
$$\begin{pmatrix} p_1 & \dots & p_n \end{pmatrix}^{\!*} = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix}, \qquad \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}^{\!*} = \begin{pmatrix} x_1 & \dots & x_n \end{pmatrix}. \tag{17.14}$$
Of course, this operation (again denoted by an asterisk) can be applied to all rectangular
and even square matrices; for the latter it amounts to a reflection of the matrix elements
in the diagonal.
A quadratic form q on V in an orthonormal basis is defined by its square n × n-matrix
Q such that
$$q(x) = \begin{pmatrix} x_1 & \dots & x_n \end{pmatrix} \cdot Q \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}.$$
Transposing both sides of this identity, we see that the natural assumption on Q is sym-
metry⁷, Q* = Q.
⁷Of course, formally one can consider quadratic forms defined by a non-symmetric
matrix R, but because of the symmetry of the product one can always replace R by
Q = ½(R + R*), which will be symmetric.
Example 2.16. Assume that R is a linear operator (self-map of a Euclidean space)
which is an isometry, that is, ⟨Rv, Rw⟩ = ⟨v, w⟩ for any two vectors v, w ∈ V. The
adjoint R* of such an operator by definition satisfies the assumption ⟨Rv, w⟩ = ⟨v, R*w⟩ for
any two vectors v, w ∈ V, and one can instantly see that the matrix of R* in the same
orthonormal basis is the transpose of the matrix of R. These two identities imply that
R is an isometry (i.e., preserves the scalar product, hence all lengths, angles, etc.) if and
only if RR* = R*R = E, the identity matrix (cf. with the condition (17.13)).
This means that, in contrast with the general case where computation of an inverse
matrix is a challenge, for the matrix of an isometry a simple transposition gives the
inverse.
In particular, after changing a basis in the Euclidean space by an isometry R, the new
matrix of a quadratic form Q will have the form Q′ = R*QR, see Example 2.15.
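These two facts are easy to check numerically; a small numpy sketch with a plane rotation (numpy assumed; the angle and the form are illustrative):

```python
import numpy as np

phi = 0.3
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])  # a rotation of R^2, hence an isometry

# For an isometry the transpose is the inverse: R R* = R* R = E.
assert np.allclose(R @ R.T, np.eye(2))
assert np.allclose(R.T @ R, np.eye(2))

# A quadratic form with matrix Q transforms into R* Q R; the value computed with
# the new matrix at the old coordinates equals the value at the rotated vector.
Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([1.0, -2.0])
assert np.isclose(v @ (R.T @ Q @ R) @ v, (R @ v) @ Q @ (R @ v))
```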
17.7. For future reading. All these mnemonic rules about rows and columns make
an impression of some shamanism, and rightly so. The choice between the rows and
columns (or between a vector space and its dual) is as meaningful as a choice between the
“original” and the “mirror image”. The choice is a matter of convention. We can “reflect
in the diagonal” everything, starting from the rule of matrix multiplication and the order
of terms it presumes, and the new “calculus” will be as legitimate and (il)logical, as the
accepted one.
In fact, the real difference between geometric objects can be revealed only by looking
at their behavior under linear maps.
Example 2.17. Let A : V → W be a linear map between two different vector spaces
(in general, of different dimensions). By definition, A takes vectors v from V and “pushes
them forward” to become vectors w = Av from W . How does this action extend to linear
functionals on V and W respectively?
The answer may look somewhat unexpected: the operator A carries (linear) function-
als on W backward to become functionals on V , and this action (“pullback”) is a linear
operator between the respective dual spaces. This action is defined by the tautological
formula (recall that ⟨ξ, v⟩ denotes the “canonical” pairing between linear functionals on
V and vectors from V):
$$A^* : W^* \to V^*, \qquad \langle A^*\xi, v\rangle = \langle \xi, Av\rangle \quad \forall v \in V.$$
The left hand side is the action of A*ξ on an arbitrary vector v ∈ V; thus A*ξ is an
element of V*.
Similar arguments show that a quadratic form q on W is naturally pulled back to a
quadratic form A*q on V using the formula
$$A^*q(v) = q(Av).$$
We see that at least some objects differ by the direction in which they are carried by
linear maps. The above examples do not assume any invertibility of the “carrier” operator
A. The next example explicitly assumes that the “carrier” is invertible (still the spaces V
and W are considered as distinct).
Example 2.18. Let A : V → W be an invertible operator between two different vector
spaces. Assume that B : V → V is an operator on V , that is, a linear self-map of V . Can
one associate with this data a self-map of W into itself? The answer is affirmative, but to
see it one needs to draw a diagram on the plane.
$$\begin{array}{ccc} V & \xrightarrow{\;A\;} & W \\ {\scriptstyle B}\downarrow & & \downarrow{\scriptstyle C} \\ V & \xrightarrow{\;A\;} & W \end{array} \tag{17.15}$$
On this diagram the two rows (upper and lower) represent the same carrier A: the notation
$V \xrightarrow{A} W$ means the same as the notation A : V → W.
The two vertical arrows represent two operators: one of them, B, is given. How should
we define the operator C which is a self-map of the space W?
The only meaningful answer is: make the diagram commutative! There are two
compositions that lead from the upper-left corner V to the bottom-right corner W: one
of them is AB = A ∘ B (in this order), the other CA = C ∘ A (ditto). We say that the
diagram (17.15) is commutative if the two compositions coincide,
$$AB = CA.$$
This identity, given the assumed invertibility of A with an inverse matrix A−1 , yields two
formulas:
B = A−1 CA and C = ABA−1 .
These formulas show that an invertible carrier A between two spaces may carry operators
acting on these spaces (B and C, respectively) in both directions.
Compare these formulas with Example 2.15.
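The commutativity AB = CA and the conjugation formulas can be verified directly; a numpy sketch (numpy assumed; random matrices are illustrative and invertible almost surely):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))  # the "carrier" V -> W (random, invertible a.s.)
B = rng.standard_normal((n, n))  # a given self-map of V

C = A @ B @ np.linalg.inv(A)     # the induced self-map of W

# Commutativity of the diagram (17.15): A B = C A.
assert np.allclose(A @ B, C @ A)
```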
The highbrow way to deal with these things is the Tensor Analysis for practitioners
and Calculus on Manifolds for artists and art lovers.
18. What else was in your Linear Algebra course?
Here is an ultra-brief survey of what usually enters the standard syllabus of the
Linear Algebra course. We mention these things here to help you use the newly acquired
geometric intuition.
The unifying theme can be described as “simplification of formulas”: in which basis
does the matrix of an object (linear map, linear self-map, quadratic form) look best? All
spaces here are assumed to be over R and finite-dimensional.
18.1. Linear maps between different spaces. Let A : V → W be a linear map
between two different spaces. For each choice of a pair of bases {v1 , . . . , vn } ⊆ V and
{w1 , . . . , wm } ⊆ W the operator gets encoded by the n × m-matrix {aij } as the result of
expansion of the images Avi in the basis {wj }:
$$Av_i = \sum_{j=1}^{m} a_{ij} w_j, \qquad i = 1, \dots, n, \quad a_{ij} \in \mathbb{R}.$$
Choosing other bases independently in V and W changes the “matrix elements” of A.
What is the best (simplest) choice?
Theorem 2.13 (on rank). There exists a number r ≤ min(n, m), called the rank of
A, such that in suitable bases A has the form
$$Av_1 = w_1,\ \dots,\ Av_r = w_r, \qquad Av_{r+1} = \dots = Av_n = 0. \tag{18.1}$$
Clearly, an operator of rank 0 is identically zero. An operator of maximal rank is
necessarily either injective (when n ≤ m) or surjective (when n ≥ m) or both (when
n = m). In the latter case the matrix has units on the diagonal and zeros off the diagonal;
in the intermediate (non-maximal) cases the diagonal also contains zeros (the off-diagonal
entries are still all zeros).
This is a very simple theorem; you are invited to prove it as a problem. The matrix
of A in these two bases is trivial:
$$\begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix} \tag{18.2}$$
(all blank entries are zeros, r is the number of units; zero columns and zero rows
should be added to make it up to the dimensions).
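In practice the rank can be computed without finding the special bases; a numpy sketch (numpy assumed; the matrix is an illustrative example with an obvious linear dependence):

```python
import numpy as np

# A 3 x 4 matrix whose third row is the sum of the first two, so the rank is 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])
r = np.linalg.matrix_rank(A)
print(r)  # -> 2
```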
18.2. Linear self-maps. The situation becomes considerably more difficult in the
case where W = V , that is, where A : V → V is a self-map of a space V into itself. In this
case we have to choose not two bases {vi } and {wj }, but only one. To stress the specific
nature of this case, we will refer to such linear maps as operators on V .
Eigenvalues and eigenvectors. The simplest example is, of course, the case of an
operator A : R1 → R1 of the real line to itself. We know that each such operator takes the
form
$$Av = \lambda v, \qquad \forall v \in V \simeq \mathbb{R}^1, \quad \lambda \in \mathbb{R}.$$
(In this case we have a 1 × 1-matrix (λ) which we identify with the real number λ ∈ R.)
Definition 2.26. A nonzero vector 0 ≠ v ∈ V is called an eigenvector of A if Av = λv
for some scalar λ ∈ R; the scalar λ is called the corresponding eigenvalue.
One can very easily verify that eigenvectors with different eigenvalues must be linearly
independent. Thus if we find n pairwise different eigenvalues λ_1, …, λ_n, we will auto-
matically construct a basis, consisting of the corresponding eigenvectors v_i, in which the
operator takes a very simple form,
$$Av_i = \lambda_i v_i, \qquad i = 1, \dots, n, \quad \lambda_i = a_{ii} \in \mathbb{R}. \tag{18.3}$$
The corresponding matrix will be diagonal with the eigenvalues λ1 , . . . , λn on the diagonal:
$$\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}. \tag{18.4}$$
How can one discover eigenvalues and eigenvectors? Note that if Av = λv, then
(A − λE)v = 0, where E is the identity operator, Ev = v for all v (its matrix has units on
the diagonal and zeros off the diagonal in any basis). Since v was assumed to be nonzero,
the operator A − λE has a nontrivial kernel and hence is non-invertible.
As we know, for square matrices there is an algebraic criterion of invertibility, ex-
pressed via the determinant, cf. with §13.4. The condition guaranteeing solvability of the
equation (A − λE)v = 0 with a nonzero v is
$$\det(A - \lambda E) = 0. \tag{18.5}$$
From the precious little that we know about the determinant from §13.4, we can still
conclude an important fact: the left hand side of the equation (18.5) is a polynomial in
λ of degree n with real coefficients, obtained by polynomial combinations of the matrix
entries a_{ij}. This polynomial is usually referred to as the characteristic polynomial.
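The characteristic polynomial and its roots are readily available numerically; a numpy sketch (numpy assumed; the matrix is an illustrative 2 × 2 example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.poly returns the coefficients of the characteristic polynomial
# (normalized to leading coefficient 1): here lambda^2 - 4 lambda + 3.
print(np.poly(A))  # -> [ 1. -4.  3.]

# Its roots are the eigenvalues, 1 and 3 (returned in some order).
print(np.linalg.eigvals(A))
```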
Diagonalizable operators. In general one can expect that a real polynomial of degree
n will have exactly n distinct roots, and if we are especially lucky, all of them will be
real. This would mean that the corresponding operator has n pairwise different
eigenvalues and its matrix will be diagonal in the basis assembled from
the eigenvectors.
Unfortunately, life is not so simple for two reasons: first, some eigenvalues may
be non-real, and second (a much more serious problem), the characteristic polynomial may
have multiple roots. The ultimate case is where this polynomial has only one root of
maximal multiplicity n, e.g., when det(A − λE) = (−λ)ⁿ.
The first case requires complex numbers to deal with, and the “simplest basis” will
no longer make the matrix diagonal, yet still rather simple. In §16.12 you can see how
the matrix of a linear operator with two complex conjugate eigenvalues looks (non-real
roots of a real polynomial always come in conjugate pairs),
$$\lambda_\pm = e^{\pm i\varphi} = \cos\varphi \pm i\sin\varphi, \qquad \lambda_+ = \bar\lambda_-.$$
The situation with multiple roots is more complicated.
Nilpotent operators. There is another class of operators, which are in a sense “maxi-
mally non-diagonalizable”.
Example 2.19. Consider the vector space V ≅ Rⁿ which consists of all polynomials
in t with real coefficients of degree ≤ n − 1. Let D : V → V be the operator of derivation
with respect to t. This is a linear operator, and one can easily compute its characteristic
polynomial. Indeed, the equation
$$p'(t) = \lambda p(t), \qquad p \in \mathbb{R}[t], \quad \deg p \le n - 1, \quad \lambda \in \mathbb{R},$$
has only one root λ = 0, with the only eigenvector (up to a factor) p_1(t) = 1. You can
check it directly or use your analytic skills and see that among the exponential functions
y = eλt which solve the differential equation y 0 = λy only the exponent e0t ≡ 1 is a
polynomial.
Thus we have an extreme situation, when there is a unique eigenvalue and unique
eigenvector, which is far from sufficient to build a basis of eigenvectors. Instead we notice
that there is a collection of vectors in V, namely,
$$v_n = t^{n-1}, \quad v_{n-1} = (n-1)t^{n-2}, \quad \dots, \quad v_2 = (n-1)\cdots 2\cdot t, \quad v_1 = (n-1)!\cdot 1,$$
such that D acts on them as the “index shift”,
$$Dv_n = v_{n-1}, \quad Dv_{n-1} = v_{n-2}, \quad \dots, \quad Dv_2 = v_1, \quad Dv_1 = 0. \tag{18.6}$$
As expected, $D^n = \underbrace{D \cdots D}_{n\text{ times}}$ is the identically zero operator (such operators are called
nilpotent). The matrix of the operator (18.6), in the basis ordered as (v_n, …, v_1), has
units just below the main diagonal (it could
well be made above-diagonal if we relabel the vectors in the reverse order). In this case
the entire subdiagonal is filled with units, but for a general nilpotent operator B the basis
vectors can be chosen to form several disjoint “vanishing chains”
$$v_n \to v_{n-1} \to \dots \to v_{n-k} \to 0,$$
$$v_{n-k-1} \to \dots \to v_{n-l} \to 0, \qquad \dots$$
$$v_{n-r} \to \dots \to v_2 \to v_1 \to 0.$$
Each separate chain has a matrix that is an elementary “Jordan block”,
$$\begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}, \tag{18.7}$$
and the overall matrix of N is block-diagonal with elementary Jordan blocks.
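A quick numerical illustration of nilpotence for a single elementary Jordan block (numpy assumed; here the units are placed above the diagonal, as in (18.7)):

```python
import numpy as np

n = 4
N = np.diag(np.ones(n - 1), k=1)  # an elementary Jordan block with eigenvalue 0

# Each power shifts the line of units one step further from the diagonal;
# N^{n-1} is still nonzero, while N^n vanishes identically.
assert not np.allclose(np.linalg.matrix_power(N, n - 1), 0.0)
assert np.allclose(np.linalg.matrix_power(N, n), 0.0)
```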
Jordan normal form. It turns out that the diagonal and nilpotent operators in a sense
are two extreme opposite cases, the general case being a sort of interpolation between them.
Theorem 2.14. For a general linear operator A : V → V one can find a basis
v_1, …, v_n of V (in general, over the complex numbers C, but ignore this for a moment)
in which the matrix of A will have the so-called Jordan normal form, namely,
A = Λ + N, (18.8)
where:
(1) Λ is a diagonal matrix with the diagonal entries λ1 , . . . , λn ∈ C, the roots of the
characteristic equation, listed with their multiplicities;
(2) N is a nilpotent matrix, N^r = 0 for some r ≤ n;
(3) Λ and N commute between themselves, ΛN = N Λ.
It is convenient to group the eigenvalues λi into tuples according to multiplicity. Then
the last condition of commutativity would imply that N has a block-upper-triangular
structure, the structure of each block depending on how the iterates N k converge to zero
(in the sense of, say, dimensions of their kernels). It is instructive to review Example 2.19
in the case where Λ = λE is the scalar matrix with the common diagonal value.
What is this theorem good for outside of mathematical research?
One immediate advantage is that solving linear systems with Jordan matrices is espe-
cially simple: they have a “triangular structure” which allows one to order the indeter-
minates so that they can be consecutively expressed via each other from top to bottom
(in the diagonal case where N = 0 the system reduces to n disjoint scalar equations).
Another advantage is the possibility to easily raise matrices to arbitrary powers (i.e.,
compute their composition with themselves any number of times). Indeed, for a diagonal
matrix Λ its kth power is the diagonal matrix with the numbers λᵢᵏ at the corresponding
places. If N ≠ 0, to compute the answer one also has to consider products of the form
Λ^{k−r}N^r with r ≤ n, but again this is a relatively moderate computational task. One can
easily estimate the growth rate of the powers and, in particular, prove that the exponential
series
$$e^{t(\Lambda + N)} = \sum_{k=0}^{\infty} \frac{t^k}{k!}(\Lambda + N)^k$$
converges for all values of t, which is a principal ingredient in solving systems of linear
ordinary differential equations with constant coefficients.
18.3. Bilinear and quadratic forms. A completely different problem appears
when we study Euclidean spaces with the scalar product ⟨v, w⟩. We know that in a
suitable basis the scalar product takes an especially simple form (16.5). Such a basis is by
no means unique, and one can use the freedom of choosing it to simplify certain objects.
Such objects are called quadratic forms. Let E be a Euclidean space and A : E → E
a linear operator. Such an operator defines a quadratic form
$$q : E \to \mathbb{R}, \qquad q(v) = \langle v, Av\rangle. \tag{18.9}$$
It is natural to assume that the operator A is self-adjoint: indeed,
$$q(v) = \langle v, Av\rangle = \langle A^*v, v\rangle = \langle v, A^*v\rangle.$$
(If this is not the case, then it makes sense to replace A by B = ½(A + A*); this will not
change the form q.)
Can one bring A to diagonal form by an isometry (orthogonal transformation)?
If so, then we would have the simple expression
$$q(v) = \lambda_1 x_1^2 + \dots + \lambda_n x_n^2, \qquad \forall v = (x_1, \dots, x_n).$$
From what we have learned about general linear operators, we might expect this
diagonalization to be quite problematic, for the same two reasons: it would mean
that we found an orthonormal basis (an additional restriction) of eigenvectors for A, and
the eigenvalues must be real.
Here a miracle happens: if A is symmetric (self-adjoint), then all the problems we faced
in discussing the Jordan normal form disappear.
Theorem 2.15.
(1) A symmetric operator A : E → E has only real eigenvalues: if Av = λv, then
λ ∈ R.
(2) Two eigenvectors with different eigenvalues λ, λ′ ∈ R are orthogonal to each
other:
$$Av = \lambda v, \quad Av' = \lambda' v', \quad \lambda \ne \lambda', \quad 0 \ne v, v' \in E \implies \langle v, v'\rangle = 0.$$
(3) If L ⊆ E is a subspace invariant by A, A(L) ⊆ L, and L⊥ is its orthogonal
complement, L⊥ = {w ∈ E : hw, vi = 0 ∀v ∈ L}, then L⊥ is also invariant,
A(L⊥ ) ⊆ L⊥ .
All these statements are relatively easy to check using the symmetry of the scalar
product and the condition A* = A. They immediately imply the following corollary.
Corollary 2.16 (diagonalization of a quadratic form). For any symmetric operator
A on a Euclidean space there exists an orthonormal basis {v_1, …, v_n} ⊆ E such that
Av_i = λ_i v_i, i.e., a basis in which the matrix of A is diagonal.
The corresponding quadratic form q(v) = ⟨Av, v⟩ takes the canonical form
$$q(v) = \sum_{i=1}^{n} \lambda_i x_i^2, \qquad v = x_1 v_1 + \dots + x_n v_n.$$
The eigenvalues λ_i are the same for any such orthonormal basis, up to permutation.
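The orthonormal diagonalizing basis is exactly what `numpy.linalg.eigh` computes for a symmetric matrix; a small sketch (numpy assumed; the form is illustrative):

```python
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric matrix of a quadratic form

lam, U = np.linalg.eigh(Q)  # real eigenvalues (ascending: 1 and 3), orthonormal U

# U is an isometry, and U* Q U is diagonal with the eigenvalues on the diagonal.
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(U.T @ Q @ U, np.diag(lam))
```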
One may ask a more relaxed question: if we do not restrict ourselves to orthonormal
bases, can we achieve a simpler result? The answer is also nontrivial.
Theorem 2.17 (the inertia law). Let q(v) be a quadratic form and Q = {q_{ij}} the
matrix of this form in an arbitrary basis {v_1, …, v_n}, so that
$$q(v) = \sum_{i,j=1}^{n} q_{ij} x_i x_j, \qquad \forall v = x_1 v_1 + \dots + x_n v_n.$$
Then there exists a basis in which the matrix Q has only ±1 and 0 on the diagonal:
$$q(v) = (x_1^2 + \dots + x_k^2) - (x_{k+1}^2 + \dots + x_r^2), \qquad k \le r \le n.$$
The numbers of positive and negative squares are invariants: they cannot be changed
by passing to another basis.
The numbers k and r − k ≤ n are equal to the numbers of positive (resp., negative)
eigenvalues λ_i in Theorem 2.15. The remaining n − r eigenvalues must be zeros.
18.4. How to deal with complex eigenvalues? All our constructions thus far
were over the field of real numbers R, although we mentioned occasionally that one can
do similar things over the field C. However, the roots of the characteristic polynomial
p(λ) = det(A − λE), see (18.5), may be non-real even if the matrix A has all real entries.
How should the corresponding complexification of the theory look?
Since R ⊆ C, any real operator A : Rⁿ → Rⁿ can be extended to an operator
A^C : Cⁿ → Cⁿ acting by the same formulas:
$$y = A^{\mathbb{C}} x \iff y_i = \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, \dots, n.$$
The only difference is that we allow the coordinates x_1, …, x_n to take complex values, so
that the image will also have complex coordinates y_1, …, y_n, while the matrix entries a_{ij}
remain the same⁸.
If λ is a real root of the characteristic polynomial of a real matrix, then
there exists a real eigenvector v ∈ Rⁿ such that Av = λv. If λ is non-real, then
the corresponding eigenvector may well be non-real as well.
Example 2.20. Let
$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} : \mathbb{R}^2 \to \mathbb{R}^2, \qquad p(\lambda) = \det\begin{pmatrix} -\lambda & 1 \\ -1 & -\lambda \end{pmatrix} = \lambda^2 + 1.$$
This polynomial has two complex conjugate roots λ = ±i ∈ C and the
corresponding two eigenvectors are v = (1, ±i) ∈ C².
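This example can be checked numerically, since `numpy.linalg.eig` happily returns complex eigenvalues of a real matrix (numpy assumed):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

lam, V = np.linalg.eig(A)  # the eigenvalues come out complex: +i and -i
assert np.allclose(sorted(lam, key=lambda z: z.imag), [-1j, 1j])

# Each eigenvector satisfies A v = lambda v over C.
for k in range(2):
    assert np.allclose(A @ V[:, k], lam[k] * V[:, k])
```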
We can immediately observe the following property.
Proposition 2.18. If A is a real matrix and λ ∈ C ∖ R a non-real
eigenvalue with a complex eigenvector v ∈ Cⁿ, then the complex conjugate
number λ̄ ≠ λ is also an eigenvalue and the corresponding eigenvector is
v̄ ≠ v (all coordinates of v are replaced by their conjugates).
Proof. If p ∈ R[λ] is a real polynomial and p(λ) = 0, then $p(\bar\lambda) = \overline{p(\lambda)} = \bar 0 = 0$. The
second claim is also obvious: since Ā = A, if Av = λv, then
$$A\bar v = \bar A\bar v = \overline{Av} = \overline{\lambda v} = \bar\lambda\bar v.$$
For the same reason, if Rⁿ is a Euclidean space with the standard scalar
product $\langle\cdot,\cdot\rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i$, then one can extend the
scalar product as a bilinear map
$$\langle\cdot,\cdot\rangle : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}, \qquad \langle x, y\rangle = \sum_{i=1}^{n} x_i y_i.$$
Of course, the symmetry and bilinearity persist over C as well, but the
positivity condition disappears: the scalar product now takes complex values,
and the scalar square ⟨x, x⟩ may well vanish or become negative for some complex
vectors x ∈ Cⁿ ∖ Rⁿ.
Example 2.21. If x = (1, i) ≠ 0, then ⟨x, x⟩ = 1 + (−1) = 0. In a similar
way, if y = (0, i) ≠ 0, then ⟨y, y⟩ = −1.
⁸We needed the notation A^C for a fraction of a second, only to stress the fact that we
changed the domain of the operator from Rⁿ to Cⁿ. After this we will, of course, denote
the complexified operator by the same letter A.
Remark 2.14. In fact, there is a generalization of the scalar product for the complex
spaces Cⁿ which does yield positive squares. If we set $h(x, y) = \sum_{i=1}^{n} x_i\bar y_i$, then
h(x, x) > 0 for all x ∈ Cⁿ, x ≠ 0. However, there is a price to be paid for using such
an object, called the Hermitian metric: it is not symmetric anymore and is linear only over R:
$$h(y, x) = \overline{h(x, y)}, \qquad h(\lambda x, y) = \lambda h(x, y), \qquad h(x, \lambda y) = \bar\lambda h(x, y) \quad \forall\lambda \in \mathbb{C}. \tag{18.10}$$
This remark should serve as a warning that real and complex geometry are not com-
pletely identical (surprise?).
Now we are fully prepared to prove the spectral properties⁹ of symmetric
and orthogonal operators (matrices).
Proposition 2.19. Let V ' Rn be a Euclidean vector space and A : V →
V a symmetric operator. Then all eigenvalues of A are real.
Proof. Assume that λ ∈ C is an eigenvalue with a (complex, in general)
eigenvector v ∈ V^C ≅ Cⁿ of a symmetric operator A, Av = λv. Since A is
real, $A\bar v = \overline{Av} = \overline{\lambda v} = \bar\lambda\bar v$. Then by the (bi)linearity we have
$$\lambda\langle v, \bar v\rangle = \langle \lambda v, \bar v\rangle = \langle Av, \bar v\rangle = \langle v, A\bar v\rangle = \langle v, \bar\lambda\bar v\rangle = \bar\lambda\langle v, \bar v\rangle.$$
However, $\langle v, \bar v\rangle = \sum v_i\bar v_i = \sum |v_i|^2 > 0$ (cf. with Remark 2.14), hence it can
be cancelled in the above equality, yielding the conclusion λ = λ̄, which
means that λ ∈ R.
Proposition 2.20. If A : V → V is an isometry of a real Euclidean
vector space, then all eigenvalues of A (in general, complex) have the unit
absolute value, |λ| = 1.
Proof. If A is an isometry, then hAu, Avi = hu, vi for all u, v ∈ V . If
λ ∈ C is an eigenvalue with an eigenvector v, then λ̄ is an eigenvalue with
the eigenvector v̄. Computing, we have
$$(\lambda\bar\lambda)\cdot\langle v, \bar v\rangle = \langle \lambda v, \bar\lambda\bar v\rangle = \langle Av, A\bar v\rangle = \langle v, \bar v\rangle.$$
Since ⟨v, v̄⟩ > 0, we conclude that λλ̄ = 1, that is, |λ|² = 1, and hence
|λ| = 1.
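Proposition 2.20 is easy to observe numerically for a plane rotation (numpy assumed; the angle is illustrative):

```python
import numpy as np

phi = 1.1
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])  # an isometry of R^2

lam = np.linalg.eigvals(R)  # the eigenvalues e^{i phi} and e^{-i phi}
assert np.allclose(np.abs(lam), 1.0)   # both lie on the unit circle
assert np.allclose(lam.real, np.cos(phi))
```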
⁹The collection of all eigenvalues of an operator (matrix), including the information on
their multiplicities, is called the spectrum of the operator. It is indeed not too distantly
related to the acoustic and optical spectra appearing in Physics.
CHAPTER 3
Non-Euclidean geometries
We have already seen that the language of vector spaces and the associ-
ated power of linear algebra transform Euclidean geometry into a simple
model which allows for effective calculations.
However, there is a “minor nuisance”. In the Euclidean geometry we
have to provide exceptions for situations that reduce to linear equations of
the form
0 · x = y, x, y ∈ R,
or even systems of equations of this form with several variables. If y = 0,
these equations are void of content and can be removed from the system,
but for nonzero values of y even one such equation makes the entire system
incompatible. To avoid this, we can consider only homogeneous systems of
equations. This leads naturally to the notion of a projective space.
19. Projective space Pn : models and visualization
19.1. Practical definitions.
Definition 3.1. The projective n-space Pⁿ is the set of all (n + 1)-
tuples (“points”) of real numbers (x_0, x_1, …, x_n) ∈ Rⁿ⁺¹, not all equal to
zero, considered up to a common nonzero factor. To stress this fact, such
tuples will be denoted by (x_0 : x_1 : ⋯ : x_n). The numbers x_i are called
homogeneous coordinates of a point x = (x_0 : x_1 : ⋯ : x_n) ∈ Pⁿ.
Note that unlike in the case of vector spaces, we start enumeration of
the coordinates xi from i = 0. Note also that the projective space is not a
Cartesian product anymore: the homogeneous coordinates are not indepen-
dent.
Example 3.1. This notion already appeared in the past when discussing
equations of real lines on the Cartesian plane R2 , cf. with Theorem 1.11. The
two equations ax + by + c = 0 and a′x + b′y + c′ = 0 are equivalent if and
only if (a : b : c) = (a′ : b′ : c′). The only difference with the Cartesian
case was the assumption that a² + b² > 0. The projective space P² would
consist of all triplets such that a² + b² + c² > 0, which allows for the “point”
(0 : 0 : 1) ∈ P². This “point” would correspond to the equation 1 = 0, which
involves neither x nor y and has no solutions.
The space P2 is called the (real ) projective plane.
Example 3.2. An even simpler object is the real projective line, which
consists of all pairs (x : y) with (x, y) ≠ (0, 0), again considered
up to proportionality.
These definitions introduce the projective spaces as abstract sets. In par-
ticular, they are not yet metric spaces, and certainly not vector spaces:
all these structures exist only in the “homogeneous” space Rn+1. But they
can be brought down to the projective space once we give a coordinate-free
version of the projectivization.
Definition 3.2. Let V be a vector space over R. Consider all nonzero
vectors v, w, · · · ∈ V \ {0} and define the following equivalence relation:
v ∼ w ⇐⇒ w = λv, λ ∈ R (by this definition, λ ≠ 0 automatically). The
quotient space
PV = (V \ {0})/(R \ {0})
is called the projectivization of V. The map sending a vector v ≠ 0 to
its equivalence class (a point in PV) is called the natural projection and
is denoted by π.
By this definition¹, Pn = P(Rn+1), n = 0, 1, 2, . . . . In these cases,
π(x0, . . . , xn) = (x0 : · · · : xn).
Problem 3.1. What is P0 ?
Solution. We need to look at R1 \ {0}: any two nonzero 1-vectors
are proportional to each other, thus there is only one equivalence class, say,
π(1). The space P0 is therefore a single point. Note that this point has no
relationship to the zero vector!
Remark 3.1. The upper indices in Rn+1 and Pn stand for the dimension of the
corresponding spaces. In the case of the vector space Rn+1 the dimension was defined
through bases. The projective space Pn is not a vector space, hence we cannot define its
dimension yet. But later, in §19.3, we will see that large parts of the projective space Pn
are in one-to-one correspondence with different copies of the (affine) space Rn. This makes
it natural to assign to Pn the dimension n.
How does it happen that the dimension goes down by one from Rn+1 to Pn? The answer
is simple: the projection map π of Definition 3.2 squeezes 1-dimensional
lines in Rn+1 into 0-dimensional projective points, in exactly the same way as the parallel
projection sending (x0, x1, . . . , xn) ∈ Rn+1 to the point (x1, . . . , xn) ∈ Rn
squeezes all lines parallel to the 0th coordinate axis to points. This is how one
unit of dimension is lost.
The general idea of equipping projective spaces PV with various struc-
tures becomes self-evident: we need to refer to properties of their preimages
in V , eventually adding the zero vector to these preimages.
We start with the linear objects in the vector space V and their images
in PV . For instance, the following definition should be self-explanatory.
¹In the future we will sometimes use the notations PnR = PRn+1 and PnC = PCn+1 for
the “canonical” projective spaces over R and C respectively, see Remark 3.3.
Definition 3.3. A set L ⊆ PV is called a projective subspace if its
preimage π−1(L) ⊆ V \ {0} becomes a vector subspace of V after adding
the zero vector. The dimension of L is the dimension of the subspace
π−1(L) ∪ {0} minus one, see Remark 3.1.
In fact, “addition of the zero vector” is done automatically in most cases.
Example 3.3. The preimage of a projective line in P2 is a 2-plane
in R3 passing through the origin. Such a plane is defined by a
homogeneous equation ax + by + cz = 0 with (a, b, c) ≠ (0, 0, 0). This gives us
the general form of a projective line in the homogeneous coordinates. Note
that, compared to Example 3.1, we no longer have any exceptional cases
to exclude: the triple (a : b : c) = (0 : 0 : 1) now corresponds to
the projective line with the equation z = 0 in the homogeneous coordinates
(x : y : z).
Of course, linear subspaces are not the only possible class of subsets that
can be defined inside the projective spaces.
Definition 3.4. A nonzero polynomial p ∈ R[x0 , x1 , . . . , xn ] is called
homogeneous of degree d ∈ N, if
p(λx0 , λx1 , . . . , λxn ) = λd p(x0 , x1 , . . . , xn ) ∀λ ∈ R, ∀x ∈ Rn+1 . (19.1)
Definition 3.5. A subset Z ⊆ Pn is called an algebraic projective set
if π−1(Z) ⊆ Rn+1 is defined by finitely many homogeneous equations of the
form p1(x) = 0, . . . , pk(x) = 0, x ∈ Rn+1.
Example 3.4. On the projective plane P2 there are several sets called
quadrics, cf. Theorem 2.17:
Zi = {±x2 ± y2 ± z2 = 0}, i = 1, . . . , 8 (all combinations of signs). (19.2)
Out of these 8 quadrics, two are empty (when all three signs coincide, the
corresponding equation defines only the single point x = y = z = 0 in R3, which
does not correspond to any projective point). The six remaining equations are
either proportional to each other or can be transformed into one another by
a permutation of the variables. Thus on the projective plane there exists
only one nondegenerate quadric up to trivial transformations.
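The count made in Example 3.4 can be checked mechanically. The following sketch (names are ours) encodes each quadric ±x2 ± y2 ± z2 = 0 by its tuple of signs and verifies that, modulo permutations of the variables and multiplication of the equation by −1, the six nonempty quadrics fall into a single class.

```python
from itertools import product, permutations

# All 8 diagonal quadrics ±x^2 ± y^2 ± z^2 = 0 on P^2, encoded by sign tuples.
quadrics = list(product((1, -1), repeat=3))

def equivalent(s, t):
    """True if s and t define the same projective quadric up to permuting
    the variables and multiplying the equation by -1."""
    return any(tuple(s[i] for i in perm) in (t, tuple(-c for c in t))
               for perm in permutations(range(3)))

# The 2 definite patterns (+,+,+) and (-,-,-) give empty quadrics;
# the 6 mixed-sign patterns remain:
nonempty = [s for s in quadrics if len(set(s)) > 1]
assert len(nonempty) == 6

# ...and all 6 of them are equivalent to each other:
rep = nonempty[0]
assert all(equivalent(rep, s) for s in nonempty)
```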
To summarize the working construction: a projective space is the
“space” whose subsets are defined by homogeneous equations (of different
degrees, finitely many conditions at once). The problem is how to
visualize this space and its subsets.
19.2. Visualizations. Working in the homogeneous coordinates is computationally convenient,
but psychologically it is easier to represent each equivalence class by a single geo-
metric point. This can be done in several ways.
Consider the projective line P1. By definition, points of this space are in one-to-one
correspondence with lines L = Lθ on the plane R2 passing through the origin. Each such
line crosses the unit circle at two points S, S′.
Figure 14. Projective line as a circle
If we want to keep only one of them, this should be done in a consistent way, e.g.,
by choosing the point S which lies in the upper half-plane {y > 0}. This allows us to identify P1 with
the arc {θ ∈ [0, π]} ⊆ U of the unit circle. Still one last ambiguity remains: the two points
θ = 0 and θ = π correspond to the same (horizontal) line, so one of them should be kept
(say, θ = 0) and the other (θ = π) discarded. But then one should “connect the loose ends”:
when the line Lθ rotates continuously counterclockwise across the horizontal position, the
point θ = π − ε suddenly jumps to θ = ε. Thus “topologically” we have to take the arc above
and glue together its endpoints, making it again a (“smaller”) circle.
Formally, our construction amounts to describing the quotient set U/± of the unit
circle, on which any two opposite points are identified.
The visualization of P2 is analogous, but meets more difficulties.
We consider lines through the origin in R3 and their (pairs of) intersection points with the
unit sphere S2 = {x2 + y2 + z2 = 1} ⊆ R3. Having “projective points” visualized
as pairs of antipodal “spherical points” is good in theory, but when you consider large
subsets of P2, you may encounter problems.
Another psychological obstruction is the “curvature” of the sphere S2. A point being
visualized by a pair of points is not that bad. It is worse when a “projective line”, which in
the homogeneous coordinates is a 2-dimensional linear subspace of the 3-dimensional space,
is visualized by the obviously curved great circle on S2.
As before, it is sufficient to keep only the upper (northern) hemisphere S2+ = S2 ∩ {z >
0}, but again the equator {z = 0} should be dealt with separately. The equator is isometric
to the unit circle U, and we already know how to proceed: one should glue the two
half-circles of the equator to each other, identifying the opposite points.
At the end we will have to assemble together the following pieces:
(1) the open northern hemisphere S2+ = S2 ∩ {z > 0},
(2) an open half-circle S2 ∩ {z = 0} ∩ {y > 0},
(3) the point (1, 0, 0) = {z = 0, y = 0, x > 0} ∩ S2.
These pieces have different dimensions (2, 1 and 0 respectively) and need to be stitched
(glued) together to form a surface without boundary. This is a challenging task, but you
can try. The result is topologically quite nontrivial: the resulting two-dimensional
“surface” is non-orientable!
In practical terms the situation is not that bad, however. The northern hemisphere
can be plotted as the interior of the unit disk bounded by the unit circle. All points inside the
disk are in one-to-one correspondence with points of P2, except for the points on the boundary
(the unit circle U). To follow what happens with inhabitants of the projective plane, we
should remember that they are not forbidden to cross the boundary in their travels; crossing
it, they re-emerge in the unit disk near the antipodal point.
Figure 15. Projective plane as a hemisphere
19.3. Affine charts. Consider the natural projection π : Rn+1 \ {0} →
Pn, π(x0, x1, . . . , xn) = (x0 : x1 : · · · : xn). The source is the union of n + 1
open sets Ui ⊆ Rn+1,
U0 = {x0 ≠ 0}, U1 = {x1 ≠ 0}, . . . , Un = {xn ≠ 0}. (19.3)
Any equivalence class x = (x0 : · · · : xn) (a point of Pn) belongs to at
least one (and most likely to all) of these sets, and in each set Ui it has a unique
representative of the form
Xi = (x0/xi, x1/xi, . . . , 1, . . . , xn/xi) ∈ Rni, i = 0, 1, . . . , n,
with the 1 in the ith place (we denote by Rni different copies of Rn in order
to avoid confusion between them).
Geometrically, each point Xi is the intersection point of the line
through x and the origin with the hyperplane {xi = 1}. We can treat the
points Xi as the images of the same point x seen through different “windows”
on Pn, called charts. Note that some points are visible in all windows, while
other points are visible only in some of the charts. For instance, the
point (1 : 0 : · · · : 0) is visible only in the chart U0, and in all other charts it
is invisible.
Of course, if a point is visible in several charts, the corresponding images
Xi are far from being independent.
Example 3.5. If (x : y) ∈ P1, then we need two charts, U0 = {x ≠ 0}
and U1 = {y ≠ 0}. In the first chart the point is seen as y/x ∈ R10, in the
second as x/y ∈ R11, assuming that both x and y are nonzero. Thus we may
think of P1 as a union of two copies of the real line with the coordinates
u and v respectively, two coordinates representing the same projective point if and
only if u = 1/v, v = 1/u.
In each of these charts only one point of P1 is invisible. The point invisible in U0 is
(0 : y), which corresponds to v = 0 in the second chart; conversely,
the point (x : 0) is invisible in U1, but in the first chart it corresponds to
the point u = 0.
Abusing the language, we usually say that in each chart u ∈ R1 (resp.,
v ∈ R1) there is only one invisible point “at infinity”, which we abbreviate
(in a horrible way!) by saying that P1 = R1 ∪ {∞}. The projective line P1
differs from the one-dimensional vector space R1 by just one extra point,
“the infinity”. In a moment we shall see how this extra point changes a lot
of things.
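The two charts of Example 3.5 and their gluing rule u = 1/v can be sketched in a few lines (the function names are ours; exact rational arithmetic avoids rounding issues):

```python
from fractions import Fraction

def chart_u(x, y):
    """Image of (x : y) in the window U0 = {x != 0}: u = y/x."""
    assert x != 0, "(0 : y) is the point at infinity for this chart"
    return Fraction(y, x)

def chart_v(x, y):
    """Image of (x : y) in the window U1 = {y != 0}: v = x/y."""
    assert y != 0, "(x : 0) is the point at infinity for this chart"
    return Fraction(x, y)

# A point visible in both windows obeys the gluing rule u = 1/v:
x, y = 3, 7
u, v = chart_u(x, y), chart_v(x, y)
assert u == 1 / v

# The point (0 : 1), invisible in U0, is simply v = 0 in the other chart:
assert chart_v(0, 1) == 0
```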
Problem 3.2. Show that for P2 we need three charts. Write down the
explicit conditions relating the three images X0 = (1, u, v), X1 = (p, 1, q)
and X2 = (z, w, 1) of the same point.
19.4. How do we work in different charts? Assume we have an
algebraic projective set Z
p(x0 , x1 , . . . , xn ) = 0, (19.4)
where p is a homogeneous polynomial of degree d as in (19.1). Consider the
chart U0 and the variables ui = xi/x0, i = 1, . . . , n. Then xi = x0ui and, by
homogeneity, the set (19.4) becomes defined by the equation
p(x0, x0u1, . . . , x0un) = xd0 p(1, u1, . . . , un) = 0 ⇐⇒ p(1, u1, . . . , un) = 0.
The latter equation defines a set Z0 ⊆ Rn0 as the set of zeros of a non-
homogeneous polynomial of degree ≤ d in the variables u1, . . . , un. All this
is possible since x0 ≠ 0, as we are inside U0.
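The passage from a homogeneous equation to its affine equation in a chart is a purely mechanical substitution xi = 1. A minimal sketch (the dict encoding of polynomials is our own convention, not the text's):

```python
# A homogeneous polynomial in (x0, ..., xn) as a dict {exponent tuple: coeff}.
# Example: x0^2 - x1*x2 is {(2, 0, 0): 1, (0, 1, 1): -1}.

def dehomogenize(p, i=0):
    """Restrict p = 0 to the chart U_i by setting x_i = 1; the remaining
    exponents become those of the affine chart variables."""
    q = {}
    for exps, c in p.items():
        affine = exps[:i] + exps[i + 1:]   # drop the i-th exponent
        q[affine] = q.get(affine, 0) + c
    return q

# The quadric x^2 + y^2 - z^2 in P^2, seen in the chart {x != 0},
# becomes 1 + u^2 - v^2 in the affine variables (u, v) = (y/x, z/x):
p = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): -1}
assert dehomogenize(p, i=0) == {(0, 0): 1, (2, 0): 1, (0, 2): -1}
```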
If we take another window, say, Un, and introduce the respective coordi-
nates vj = xj/xn, j = 0, 1, . . . , n − 1 (note the difference in indices compared
to the previous case!), then by the same arguments the image of Z visible
through Un will be the set Zn ⊆ Rnn defined by the equation
p(v0, v1, . . . , vn−1, 1) = 0.
Of course, choosing any of the remaining charts, we see the same set as an
algebraic subset Zi ⊆ Rni. This example shows how we can study projective
sets by looking at them through the different windows Ui and studying the
projections Zi.
Example 3.6. Consider the case d = 1 and n = 2. The corresponding
linear equations define projective lines on the projective plane. Let a line
L ⊆ P2 be defined by a general equation ax0 + bx1 + cx2 = 0 with the three
coefficients (a : b : c) not vanishing simultaneously.
In the windows U0 and U2 this line is defined by the equations
L0 = {a + bu1 + cu2 = 0}, resp., L2 = {av0 + bv1 + c = 0}. (19.5)
These equations, as we know, define a line in the (u1 , u2 )-plane (respectively,
in the (v0 , v1 )-plane), if these equations are nondegenerate:
b2 + c2 > 0 resp., a2 + b2 > 0. (19.6)
These explicit conditions mean that if the equation for L0 is degenerate,
b = c = 0, then a ≠ 0 and the second equation (for L2) is non-degenerate,
so L is seen as a genuine line in the part of P2 visible through U2. One can check
that the equation for the line L1 in this case is also non-degenerate.
In other words, any projective line L ⊆ P2 is visible as an affine line in
at least two of the three standard windows on P2.
Remark 3.2 (warning!). When we say that a line L is visible in the
window, say, U0, we don't mean that all points of L are visible in this win-
dow! Quite the opposite: every projective line has at least one invisible point.
To find this point, we need to solve the system of two linear homogeneous
equations
ax + by + cz = 0, x = 0.
This system defines the projective point (0 : −c : b) ∈ L, which is invisible in
the window U0.
What happens with lines which are invisible in a given chart? To un-
derstand this, we had better look at the case n = 1 and arbitrary d > 1.
19.5. Algebraic subsets of P1. A homogeneous polynomial p(x, y) of
degree d in two variables can be easily described:
p(x, y) = c0 yd + c1 xyd−1 + · · · + cd xd, ci ∈ R,
not all of the coefficients ci being zero, since p ≢ 0. How does the algebraic
projective set Z = {p = 0} ⊆ P1 look?
According to the above philosophy, we need to study the roots of this
equation separately in the two charts, taking into account that some of the
roots may be invisible in one of the charts (but not in both!). Dividing the
equation by xd (which is nonvanishing in U0), we arrive at the equation
c0 (y/x)d + c1 (y/x)d−1 + · · · + cd = 0, that is, c0 ud + · · · + cd = 0.
In the second chart the equation, after division by yd, takes the “reversed”
form
c0 + c1 (x/y) + · · · + cd (x/y)d = 0, that is, c0 + · · · + cd vd = 0.
Obviously, the nonzero roots of these two equations are related by the iden-
tity ui = 1/vi, as expected. In addition, if c0 or cd vanishes, then a number
of roots sit at v = 0 (resp., at u = 0) and are visible only in one
chart.
Example 3.7. Consider the extreme case p(x, y) = yd. In the
chart U0 this equation is equivalent to ud = 0, which has one
multiple root u = 0 of multiplicity d. In the other chart the same equation
takes the form 1 = 0. This means that in the chart v the equation has no
“finite” solutions: all of them have escaped “to infinity”, that is, into the first
chart.
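A small numerical illustration of this bookkeeping, written in the convention p(x, y) = c0 yd + · · · + cd xd used above (the helper name is ours): the cubic p = x2y − xy2 = xy(x − y) has two roots visible in the chart u and a third one visible only in the chart v.

```python
def eval_form(coeffs, x, y):
    """Evaluate p(x, y) = sum_{i=0}^d c_i x^i y^(d-i) at a point of R^2."""
    d = len(coeffs) - 1
    return sum(c * x ** i * y ** (d - i) for i, c in enumerate(coeffs))

# p(x, y) = x^2*y - x*y^2: here d = 3 and (c0, c1, c2, c3) = (0, -1, 1, 0).
p = (0, -1, 1, 0)
# In the chart u = y/x (points (1 : u)) the visible roots are u = 0, u = 1:
assert eval_form(p, 1, 0) == 0 and eval_form(p, 1, 1) == 0
# The third projective root (0 : 1) is invisible there; in the chart
# v = x/y (points (v : 1)) it is the root v = 0:
assert eval_form(p, 0, 1) == 0
# A non-root for comparison: p(2, 1) = 4 - 2 = 2.
assert eval_form(p, 2, 1) == 2
```

Counting with multiplicities, the two charts together account for all three projective roots (1 : 0), (1 : 1), (0 : 1) of this cubic form.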
Remark 3.3. The construction of projectivization can obviously be applied
to vector spaces over other fields, for instance, over the field C of complex numbers.
The results are complex projective spaces.
For instance, P1C = PC2 is called the complex projective line. Topologically it is
the complex plane C ≃ R2 with one extra point denoted by ∞, that is, the
2-sphere, called the Riemann sphere.
If 0 ≠ p(x, y) ∈ C[x, y] is a homogeneous polynomial of degree d with complex
coefficients, then it may have complex roots (x : y) visible in both charts u = y/x
and v = x/y. The roots (1 : 0) and (0 : 1) (if they are among the roots) will be
visible in only one of the charts.
What about the total number of roots? We know that the total number of
complex roots of a complex polynomial p(u) = c0 ud + · · · + cd with c0 ≠ 0 is exactly
d, if we count each root with its multiplicity. What happens with the total number
if we allow the coefficients to vary? Since p and λp, λ ≠ 0, have the same roots, the
parameter space of homogeneous polynomials of degree d coincides with the space
of proportional tuples (c0 : c1 : · · · : cd), that is, it is itself the projective space
PCd+1 = PdC. When c0 vanishes, the degree of p drops, and some of the roots
“escape to infinity”.
You can study this phenomenon yourself by looking attentively at the quadratic
equation au2 + bu + c = 0 with (a : b : c) ∈ P2. What happens with the roots as
a → 0? We see that on the projective line there is no room to escape: it is compact.
20. Intersection theory for the projective plane P2
20.1. Three charts, one plane. As was explained, all points (x : y :
z) ∈ P2 of the projective plane with z ≠ 0 are visible in the chart {z ≠ 0},
that is, they are in one-to-one correspondence with the real plane R2 via the
identification R2 ∋ (x, y) ↔ (x : y : 1) ∈ P2. What is not visible in this
chart is the projective line L∞ = {z = 0} ⊆ P2, which will be referred to as
the infinite line (relative to the chart (x, y)):
P2 = R2 ∪ L∞, where respectively L∞ ≃ P1 = R1 ∪ {∞}.
What is important is the following obvious observation: the visible part of
any projective line L ⊆ P2 is a straight line in the affine (Cartesian) plane
R2.
20.2. No parallel lines in the projective geometry! A surpris-
ing property of the projective geometry is the absence of parallel lines.
Theorem 3.1. Any two different projective lines on the projective plane
intersect at a unique projective point.
Geometric proof. By the definition of projective lines, each such line is
the image L = π(Π), where Π ⊆ R3 is a linear 2-plane passing through
the origin of the vector 3-space. Any two different such planes intersect in a
line in R3, also passing through the origin. The π-image of this line is
a single projective point.
Algebraic proof. In the homogeneous coordinates the two projective lines
have the equations
ax + by + cz = 0,  a′x + b′y + c′z = 0,  (a : b : c) ≠ (a′ : b′ : c′). (20.1)
This system of two equations with three unknowns always has a nonzero
solution (as follows from the Gauss algorithm). This solution may be non-
unique only if the two nonzero vectors (a, b, c), (a′, b′, c′) ∈ R3 are propor-
tional, which is excluded.
Corollary 3.2. Every projective line has exactly one invisible point,
except for the line L∞ which consists entirely of invisible points.
Problem 3.3. What happens if in the chart (x, y) we see two parallel
lines? Where is their intersection point?
Solution. The intersection point of two affine lines seen as parallel lies
on the infinite line L∞.
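In the homogeneous coordinates this intersection is computed by the cross product of the two coefficient vectors, which gives a short test of the solution above (a sketch; the function names are ours):

```python
def cross(p, q):
    """Cross product in R^3: homogeneous coordinates of the intersection
    point of the projective lines with coefficient vectors p and q."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

# The affine lines y = x and y = x + 1, homogenized as x - y = 0 and
# x - y + z = 0, look parallel in the window {z != 0} ...
P = cross((1, -1, 0), (1, -1, 1))
# ... yet they do intersect, namely on the infinite line L_inf = {z = 0}:
assert P[2] == 0 and P != (0, 0, 0)
# Their intersection (up to a factor) is the common direction (1 : 1 : 0):
assert P[0] == P[1]
```

The cross product is orthogonal to both coefficient vectors, so it satisfies both line equations; this is exactly the nonzero solution promised by the algebraic proof.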
We can now establish an easy criterion saying that the three projective
lines pass through a common point. Compare the following statement with
Theorem 1.15.
Theorem 3.3. The three lines
ax + by + cz = 0,  a′x + b′y + c′z = 0,  a″x + b″y + c″z = 0, (20.2)
pass through a common point (finite or infinite) if and only if
    | a   b   c  |
det | a′  b′  c′ | = 0. (20.3)
    | a″  b″  c″ |
Unless all three lines coincide, this common point is unique.
Proof. The system of homogeneous equations has only the zero solution
if and only if the determinant does not vanish; if the determinant vanishes,
a nonzero solution exists, which represents a point common to all three lines.
In particular, three parallel lines correspond to the case where the com-
mon point escapes to infinity.
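The determinant criterion of Theorem 3.3 is directly checkable (a sketch with a hand-rolled 3 × 3 determinant; names are ours):

```python
def det3(rows):
    """Determinant of a 3x3 matrix given as three row tuples."""
    (a, b, c), (p, q, r), (u, v, w) = rows
    return a * (q * w - r * v) - b * (p * w - r * u) + c * (p * v - q * u)

# Three distinct parallel lines y = x, y = x + 1, y = x + 2, homogenized:
lines = [(1, -1, 0), (1, -1, 1), (1, -1, 2)]
# They are concurrent in P^2 (at the infinite point (1 : 1 : 0)):
assert det3(lines) == 0

# The sides of a genuine triangle, e.g. x = 0, y = 0, x + y = z, are not:
assert det3([(1, 0, 0), (0, 1, 0), (1, 1, -1)]) != 0
```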
Consider the “dual” problem (more about the duality later). Assume
that we have three points
(x : y : z), (x′ : y′ : z′), (x″ : y″ : z″) ∈ P2.
Theorem 3.4.
(1) Through any two distinct points there always passes a unique projective
line.
(2) Three points as above belong to a common projective line (finite or
infinite) if and only if
    | x   y   z  |
det | x′  y′  z′ | = 0. (20.4)
    | x″  y″  z″ |
(3) The common line is unique unless all three points coincide.
Problem 3.4. Prove Theorem 3.4. Give two different proofs (geometric
and algebraic) for the first statement.
Problem 3.5. Formulate the Desargues theorem 1.1 in terms of projec-
tive points (two triplets of them) and projective lines (three pairs). Think
about convenient notations!
20.3. Projective duality. Examples from the previous section sug-
gest that there is some symmetry between projective points and projective
lines. This is indeed the case. To understand the full details, we would have to
discuss the projectivizations of a vector space V and its dual space V∗, see §15.
However, even explained in coordinates, the subject is fascinating enough.
Consider the projective plane P2 = {(x : y : z)} with the homogeneous
coordinates 0 ≠ (x, y, z) ∈ R3, and another such plane, denoted by P2∗, with
the coordinates 0 ≠ (a, b, c) ∈ R3.
Definition 3.6 (Duality). Any point P ∈ P2∗ with coordinates (a : b : c)
corresponds to a line L ⊆ P2 defined by the equation
ax + by + cz = 0. (20.5)
Any point Q = (x : y : z) ∈ P2 corresponds to a line M ⊆ P2∗ defined
by the same equation (20.5) with respect to the homogeneous coordinates
(a : b : c) ∈ P2∗ .
The equation (20.5) is called the incidence condition, to stress its sym-
metric nature with respect to P2∗ and P2 .
Proposition 3.5. The above duality is a one-to-one correspondence be-
tween points and lines in the respective projective planes.
Proposition 3.6. Two lines L, L′ ⊆ P2 intersect at a point P if and
only if the two points Q, Q′ ∈ P2∗ dual to the lines L, L′ belong to the
line M ⊆ P2∗ dual to the point P, and vice versa.
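Proposition 3.6 can be tested on concrete coefficient vectors: the intersection of two lines is the cross product of their coefficient vectors, and the incidence condition (20.5) is a plain dot product (a sketch; the names are ours):

```python
def cross(p, q):
    """Homogeneous coordinates of the intersection of two lines."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def incident(point, line):
    """The incidence condition (20.5): a*x + b*y + c*z = 0."""
    return sum(s * t for s, t in zip(point, line)) == 0

# Two lines in P^2 and their intersection point P:
L, L1 = (1, 2, 3), (4, 5, 6)
P = cross(L, L1)
# Under duality, L and L1 become the points Q = (1 : 2 : 3) and
# Q1 = (4 : 5 : 6) of the dual plane P^2*, while P becomes the line M
# with coefficient vector P.  Proposition 3.6: Q and Q1 lie on M.
assert incident(L, P) and incident(L1, P)
```

The symmetry of the dot product is what makes the statement and its converse a single computation.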
20.4. Dual curves. The projective duality is an amazing transformation. It sends
any (finite) configuration of points in P2 to a configuration of lines in P2∗ preserving the
incidence: if, say, some of the points align along a certain line, then the corresponding
dual lines cross each other at the corresponding point. One can extend this duality to
some “infinite configurations”.
Consider, for instance, a smooth curve C ⊆ P2: by definition, it is a map ϕ : [0, 1] →
P2, t ↦ (x(t) : y(t) : z(t)), whose image in R3 avoids the origin; for instance, one can consider
a parametrized straight line (x(t), y(t), 1) with x(t) = at + b, y(t) = a′t + b′ for some
a, a′, b, b′. Each point ϕ(t) ∈ P2 corresponds to a line M(t) ⊆ P2∗ on the dual plane. Does
this family of lines have any visible structure, assuming, say, that the initial curve was smooth?
20. INTERSECTION THEORY FOR THE PROJECTIVE PLANE P2 93
Figure 16. Envelope of a family of lines
If the curve C were a straight line L ⊆ P2, all the lines M(t) dual to the points P(t) = ϕ(t)
would have to pass through the same point Q ∈ P2∗ dual to L. But this is clearly an
exceptional case, and it is not obvious how it can be generalized to more general curves.
In general, if we consider two close points P, P′ ∈ C, then the line passing through
P and P′ (the chord, or secant, of C) should be very close to the tangent line
to C at the point P = ϕ(t0), provided that it exists². Since the duality correspondence
is continuous (in some natural sense, which can be made precise), the point dual to the
tangent line at P is close to the point dual to the line PP′. This heuristic observation
motivates the following definition.
Definition 3.7. Let ϕ : t ↦ P(t) ∈ P2 be a smooth curve with a uniquely defined
tangent line L(t) = LP(t) at each point P = P(t), and let Q = Q(t) be the point of P2∗ dual
to L(t). Then the map ψ : t ↦ Q(t) is called the dual curve of ϕ.
According to this definition, if C is a straight line, then its dual curve is a single
point.
Remark 3.4. Geometrically, the dual curve is the envelope of the family of lines M(t)
dual to the points P(t) on C. By definition, the envelope is a curve that is tangent to all
the lines M(t) at the respective points, see Figure 16.
Example 3.8. Assume that the curve C ⊆ P2 is defined not by a parametric equation,
but as the zero set of a quadratic form in the homogeneous coordinates,
π−1(C) = {v ∈ R3 : ⟨Av, v⟩ = 0}, v = (x, y, z),
where A is a symmetric nondegenerate 3 × 3-matrix.
Let P = (x0 : y0 : z0) be a point of this curve. The tangent line L = LP to C at
P ∈ P2 is defined by the homogeneous equation
π−1(LP) = {v ∈ R3 : ⟨AP, v⟩ = 0}.
The point Q = QP dual to the line LP in P2∗ has the homogeneous coordinates AP ∈ R3.
This means that A−1Q is equal to P, so that ⟨A−1Q, Q⟩ = 0 by the duality. The last
equation describes all points Q dual to tangent lines to C. This is again a quadric, albeit
in P2∗.
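Example 3.8 can be verified for the conic x2 + y2 − z2 = 0, where A = diag(1, 1, −1) is its own inverse, so the dual conic has the same diagonal equation (a sketch restricted to diagonal matrices; the names are ours):

```python
# The conic x^2 + y^2 - z^2 = 0 corresponds to A = diag(1, 1, -1);
# since A^2 = I here, A is its own inverse.
A = (1, 1, -1)                      # diagonal entries only

def on_conic(v, diag):
    """Check <diag * v, v> = 0 for a diagonal quadratic form."""
    return sum(d * c * c for d, c in zip(diag, v)) == 0

P = (3, 4, 5)                       # 3^2 + 4^2 - 5^2 = 0: a point of the conic
assert on_conic(P, A)

# Q = AP, the point dual to the tangent line at P:
Q = tuple(d * c for d, c in zip(A, P))
# Q lies on the dual conic <A^{-1}Q, Q> = 0 (with A^{-1} = A here):
assert on_conic(Q, A)
```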
One can study the correspondence between projective curves and their duals. This is
a fascinating story having practical applications in the theory of wavefronts.
²This condition is violated if the velocity vector of the curve in the affine window
(x, y, 1) vanishes.
21. Projective transformations
It should not be a surprise that we learn any geometric structure by
studying the transformations (maps, functions) preserving this structure.
Previous examples abound: metric spaces can be understood only by study-
ing isometries (transformations preserving the metric); linear spaces can
be fully understood only after we study linear maps (in particular, the self-
maps of Rn that arise when we change one basis for another). The projective
geometry is not an exception: until we understand what can and what can-
not be achieved by a transformation preserving the “projective structure”,
we will not really feel the projective geometry.
21.1. Projective maps. Let PV and PW be two projective spaces ob-
tained as projectivizations of two vector spaces V, W (in general, of different
dimensions). For any linear map A : V → W the image of a 1-subspace
Rv, 0 ≠ v ∈ V, is the subspace R·Av ⊆ W. The latter subspace is non-
trivial (i.e., does not reduce to {0}) if and only if Av ≠ 0. This makes the
following definition obvious and self-consistent at once.
Definition 3.8. A projective map between two projective spaces PV
and PW is any map α induced by an injective linear map A : V → W,

	      A
	 V -------> W
	 |          |
	 π          π
	 ↓          ↓
	      α
	PV -------> PW	(21.1)

where π is the natural projection, cf. Definition 3.2.
Example 3.9. Consider the case V = W = R2. What are the projective
maps of P1R = PR2 to itself?
Any linear map A : R2 → R2 is defined by its 2 × 2 matrix with rows
(a, b) and (c, d), so that
(x, y) ↦ (ax + by, cx + dy).
Let u = y/x be an affine window (chart) on the projective line with the coordi-
nates (x : y). Then the map above takes the form
u ↦ (c + du)/(a + bu),   det ( c d ; a b ) = − det ( a b ; c d ) ≠ 0. (21.2)
In the other chart v = 1/u the map (21.2) takes the form
v ↦ (av + b)/(cv + d).
Affine maps of the form u ↦ au + b, a ≠ 0, and in particular the shifts
u ↦ u + b, form a particular case of fractional linear maps. A non-affine
example is given by the inversion u ↦ 1/u.
Definition 3.9. A fractional linear map is any map P1 → P1 which in
any affine window (chart) u on P1 takes the form (21.2).
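Fractional linear maps of the form (21.2) compose according to matrix multiplication of their coefficient matrices, which can be checked numerically (a sketch; the encoding of a matrix as a 4-tuple and the function names are ours):

```python
from fractions import Fraction

def moebius(m, u):
    """Apply the map u -> (c + d*u)/(a + b*u) of (21.2), where
    m = (a, b, c, d) encodes the matrix rows (a, b) and (c, d)."""
    a, b, c, d = m
    u = Fraction(u)
    return (c + d * u) / (a + b * u)

def mat_mul(m, n):
    """Product of two such matrices, rows-of-M times columns-of-N."""
    a, b, c, d = m
    p, q, r, s = n
    return (a * p + b * r, a * q + b * s, c * p + d * r, c * q + d * s)

# Composition of fractional linear maps = matrix multiplication:
m, n = (1, 2, 3, 4), (5, 6, 7, 8)
u = Fraction(1, 3)
assert moebius(m, moebius(n, u)) == moebius(mat_mul(m, n), u)
```

This is the computational content of the group property stated in Theorem 3.7 below: composing maps never leaves the class, because matrix products of invertible matrices are invertible.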
Theorem 3.7.
(1) Fractional linear maps form a group: each such map is invertible
in this class, and composition of two fractional linear maps is again
fractional linear.
(2) Any three distinct points in P1 can be mapped to any other three
distinct points by a suitable fractional linear map.
Proof. The first claim follows immediately from the fact that the invertible
linear self-maps of R2 form a group.
To prove the second claim, note first that any two distinct points P1, P2 ∈
P1 can be sent to any two other distinct points Q1, Q2 ∈ P1 by a projective
map. It suffices to choose two nonzero non-collinear vectors v1, v2 ∈ R2
representing P1 and P2 (they necessarily form a basis of R2) and
another such pair w1, w2 representing Q1, Q2, and to construct the unique linear
map A : R2 → R2 which sends vi to wi, i = 1, 2.
This means that any two points P1, P2 can be sent to the two points of
an affine chart u that correspond to u = 0 and u = ∞. Once we add a
third point P3, different from both u = 0 and u = ∞, a suitable
transformation u ↦ λu with λ ≠ 0 sends it to u = 1.
Thus any three points on P1 can be sent to the triple {0, 1, ∞}
in any specified affine window. Because of the group property,
any three distinct points can be sent to any other three (complete the
argument!).
21.2. Double ratio: an invariant. Collections of four and more points
on the projective line P1 can be genuinely different: there are numerical
quantities that must coincide for two tuples of points to be projec-
tively equivalent.
Definition 3.10. Given four distinct points with finite affine coordi-
nates u1, u2, u3, u4 in P1, their cross-ratio or double ratio is the fraction
ρ = [(u1 − u3)(u2 − u4)] / [(u2 − u3)(u1 − u4)] ∈ R. (21.3)
Remark 3.5. Your efforts to memorize this formula are almost destined to fail.
There are four indices 1, 2, 3, 4 that have to be somehow distributed between the numerator
and the denominator; besides, you should remember how exactly they are distributed around
the minus signs.
Fortunately, the answer is almost independent of what you do. There are exactly two
ways to partition the four indices into the two pairs 2 + 2 (to be enclosed in parentheses in
the products); you put one in the numerator, the other in the denominator. Next, in each pair
you can exchange the order of the terms which you subtract from each other. Such alterations
will at worst replace ρ by 1/ρ, −ρ or −1/ρ.
But you cannot cheat: once you have decided what your preferred formula for the
cross-ratio is, you should stick to it from beginning to end.
Problem 3.6. The above formula (21.3) makes sense only if all four
values ui are finite. What happens if we allow one of the ui to become
infinite?
Solution. There is no question that if one of the ui, say, u1, is zero, the
formula still makes perfect sense:
ρ = u3(u2 − u4) / [(u2 − u3)u4], u2,3,4 ≠ 0.
If we pass to the other chart v = 1/u, the above expression takes the
form
ρ = [(1/v3)(1/v2 − 1/v4)] / [(1/v2 − 1/v3)(1/v4)] = (v4 − v2)/(v3 − v2).
The ratio has a finite value even if v4 = 0.
Theorem 3.8. If f : P1 → P1 is a projective (fractional linear) map,
then the cross-ratio of any four distinct points is preserved.
Remark 3.6. Dmitry proved this result in the broader context of four
points in P1C = PC2 and studied different configurations of points on the
complex projective line having the same constant cross-ratio.
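The invariance asserted in Theorem 3.8 is easy to test numerically with exact rational arithmetic (a sketch; the function names are ours):

```python
from fractions import Fraction

def cross_ratio(u1, u2, u3, u4):
    """The double ratio (21.3)."""
    u1, u2, u3, u4 = map(Fraction, (u1, u2, u3, u4))
    return ((u1 - u3) * (u2 - u4)) / ((u2 - u3) * (u1 - u4))

def moebius(u, a, b, c, d):
    """The fractional linear map u -> (c + d*u)/(a + b*u) of (21.2)."""
    u = Fraction(u)
    return (c + d * u) / (a + b * u)

pts = (0, 1, 2, 5)
rho = cross_ratio(*pts)
# Theorem 3.8: the cross-ratio survives any fractional linear map
# (here a = 2, b = 1, c = -3, d = 4, with ad - bc = 11 != 0):
images = [moebius(u, 2, 1, -3, 4) for u in pts]
assert cross_ratio(*images) == rho
```

The four points 0, 1, 2, 5 have cross-ratio 8/5 in this convention, and so do their four images.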
21.3. Projective transformations of the projective plane P2.
Formally, projective transformations of P2 are in simple correspondence with
the linear transformations of the vector 3-space, say, R3. For better or
worse, our 3-dimensional intuition is less developed than our 2-dimensional
ability to plot pictures (this might be related to the anatomy of
our vision, which mainly deals with the 2-dimensional projection onto our retina).
Example 3.10. Any line one the projective plane P2 can be mapped to
any other line by a suitable projective transformation of P2 .
Moreover, any two lines can be transformed into any two other lines.
Moreover, any three lines in a general position (i.e., not passing through
the same point) can be transformed to any other three lines with this prop-
erty.
Indeed, any three 2-planes in R3 (passing through the origin, but having
only the origin in common) define three lines of pairwise intersection;
choosing a nonzero vector on each line yields a basis of R3 . The two
configurations of planes thus give two bases in R3 , and obviously there
exists a linear self-map of R3 into itself which takes one basis into the
other.
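In coordinates, the last step is one line of linear algebra. The two bases below are arbitrary illustrative choices (think of them as direction vectors of the three intersection lines in the source and target configurations):

```python
import numpy as np

# two bases of R^3, written as the columns of invertible matrices
B1 = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0],
               [0.0, 0.0, 1.0]])
B2 = np.array([[2.0, 0.0, 0.0],
               [1.0, 1.0, 0.0],
               [0.0, 1.0, 3.0]])

# A sends the i-th vector of B1 to the i-th vector of B2:
A = B2 @ np.linalg.inv(B1)

print(np.allclose(A @ B1, B2))  # True
```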
This example provides a convenient tool for “simplifying” projective con-
figurations and proving theorems.
Example 3.11. Consider the configuration which appears in the De-
sargues Theorem 1.1, see §1.3. We can make a projective transformation
sending the line through P, Q, R (if they are collinear) to the infinite line.
Then each pair of the lines becomes a pair of parallel lines, AC ∥ ac, BC ∥ bc,
AB ∥ ab.
Then the two triangles △ABC and △abc become similar in R2 ,
with their corresponding sides parallel and hence all angles equal. For
two such triangles the assertion that the three lines through their respective
Figure 17. Proof of the Desargues theorem
vertices meet at a common point, is obvious, see Fig. 17. Since the assertion
of the Desargues theorem is projectively invariant, this argument proves the
theorem, at least in one direction.
Problem 3.7. Prove the Desargues theorem in the other direction.
Hint. You can either apply the same ideas, or construct a projective
dual configuration (cf. with Problem 3.5).
Remark 3.7. There are other pairs of “projectively dual” theorems in
the projective plane geometry. For instance, the Menelaus theorem is dual
to Ceva’s theorem.
21.4. Why “projective”? The very name “Projective geometry” in-
dicates that some sort of projection is involved. You are most familiar
with the orthogonal projection from the Euclidean 3-space to a 2-plane in
this space (or, even simpler, from the Euclidean 2-plane Π to a line in this
plane). Of course, the class of projections is much larger.
Example 3.12. Assume that L, M are two subspaces in a linear (not
necessarily Euclidean) n-space V of complementary dimensions p, q (i.e., p +
q = n), which are in a generic position: L ∩ M = {0}. (Think of two lines
in R2 or a line and a plane in R3 .)
Then every vector v ∈ V can be uniquely represented as a sum v = u+w,
u ∈ L, w ∈ M . Indeed, you can choose a basis in V such that its first p vectors
form a basis in L and the last q vectors—a basis in M (restore the details!).
Then the map A : V → L which sends v to u (well defined) is a linear
map such that:
(1) Ker A = M , AV = L;
(2) A2 = A.
Such a map is called a linear projection of V on L parallel to M . Clearly,
together with this projection there exists a similar projection B of V on M
parallel to L. Obviously, A + B : V → V is the identity map (why? how do
you understand the sum A + B?).
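A sketch of this construction in coordinates; the subspaces L (with p = 2) and M (with q = 1) in R3 are arbitrary examples satisfying L ∩ M = {0}, not taken from the text:

```python
import numpy as np

L = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])          # columns span L (p = 2)
M = np.array([[1.0],
              [1.0],
              [1.0]])               # column spans M (q = 1)

B = np.hstack([L, M])               # adapted basis of V = R^3
# In the adapted basis the projection on L keeps the first p coordinates
# and kills the last q; conjugate back to the standard basis:
D_A = np.diag([1.0, 1.0, 0.0])
D_B = np.diag([0.0, 0.0, 1.0])
A = B @ D_A @ np.linalg.inv(B)      # projection on L parallel to M
Bproj = B @ D_B @ np.linalg.inv(B)  # projection on M parallel to L

print(np.allclose(A @ A, A))              # A^2 = A
print(np.allclose(A @ M, 0))              # Ker A = M
print(np.allclose(A + Bproj, np.eye(3)))  # A + B is the identity map
```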
Figure 18. Projection in the projective spaces
In the projective space one has a similar situation. For simplicity we
consider only one case of projection of the projective plane to a projective
line from a point.
Let P ∈ P2 be a point on the projective plane, and ℓ ⊆ P2 a (projective)
line not passing through P .
Definition 3.11. The projection from P to ℓ is a map α : P2 ∖ {P } → ℓ
which sends any point Q ∈ P2 different from P to the unique intersection
point of the projective lines P Q and ℓ.
Problem 3.8. Prove that α is indeed a projective map, that is, there
exists a linear map A : R3 → R2 such that ℓ = π(R2 ), which “covers” α.
Assume now that m ⊂ P2 is another projective line, also not passing through
P . Then the restriction of α to m is a map between two projective
lines α|m : m → ℓ, defined everywhere on m.
Problem 3.9. Prove that this restriction is a projective map in the
sense of Definition 3.8.
This example is a baby version of the more interesting construction that
was discovered first by the Renaissance artists and only much later studied
and formalized by mathematicians.
Example 3.13. Consider the Euclidean space R3 (in which we live),
a point P in this space and a 2-plane Π not passing through this point.
Consider the “map” from R3 ∖ {P } to Π which sends any point Q to the
point of intersection of the line P Q with Π.
This map can be interpreted as the “photo map”, or “vision map” ϕ:
the retina of our eye or the pixel matrix of our smart phone captures the
rays of (reflected) light issued by a point Q and passing through our pupil
P (assumed to be a very small, practically a point).
This is not a well-defined map: if the line P Q is parallel to the plane
Π, then it is undefined. Yet if we add to R3 the “infinite plane” to make
it into P3 and do the same with Π, then the map ϕ becomes a well-defined
projective map.
Projective transformations correspond to a change of viewpoint (P and
Π simultaneously).
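A minimal sketch of the “photo map” in coordinates, under ad hoc assumptions: the pupil P sits at (0, 0, 2) and the screen is the plane Π = {z = 0}. The map fails exactly when P Q is parallel to Π, as described above:

```python
import numpy as np

P = np.array([0.0, 0.0, 2.0])       # the pupil (viewpoint), off the plane z = 0

def photo(Q):
    """Central projection from P onto the plane z = 0:
    the point x = P + t (Q - P) with vanishing z-coordinate."""
    d = Q[2] - P[2]
    if d == 0.0:                    # P Q is parallel to the screen plane:
        return None                 # the affine map is undefined here
    t = -P[2] / d
    return P + t * (Q - P)

print(photo(np.array([1.0, 1.0, 0.5])))   # a visible point on the screen
print(photo(np.array([1.0, 1.0, 2.0])))   # None: needs the infinite plane of P3
```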
21.5. What about metric (lengths, angles) in the projective ge-
ometry? Projective space (and projective geometry) is born from the
geometry of linear spaces, hence it naturally inherits the notions of points
(traces of lines spanned by nonzero vectors), lines (traces of 2-dimensional
subspaces) etc. Using the idea of homogeneous coordinates (and homoge-
neous equations), we can develop projective algebraic geometry, studying
subsets of Pn (and better of PnC , as always in algebra) defined by such equa-
tions. But what about distances and angles in projective geometry?
The answer seems to be straightforward. If we have a Euclidean vector
space E of dimension n+1, then within the group of linear self-maps of E we
have a well-defined subgroup of isometries, transformations preserving the
scalar product ⟨·, ·⟩ in E. These isometries after projectivization will form
a proper (and rather small) subgroup of the group of general projective
transformations, and one could study properties of various shapes in Pn
invariant under the action of this subgroup (exactly as in Euclidean
geometry we study properties invariant under isometries, which preserve lengths
and angles).
However, it turns out that it is much easier to replace the projective
space by a sphere embedded in the Euclidean space Rn+1 with the standard
scalar product. The only problem we will face is “doubling the entities”:
to pass from a spherical geometry to the projective one, we must remember
that any two antipodal points correspond to the same projective point.
But this is already the subject of the next section.
22. Elementary spherical geometry
22.1. Projective space vs. the sphere, lines vs. geodesics. Recall
that the unit sphere Sn ⊆ Rn+1 is the set defined by the equation
Sn = {⟨x, x⟩ = 1} = { x0² + x1² + · · · + xn² = 1 },   x = (x0 , x1 , . . . , xn ) ∈ Rn+1 ,   (22.1)
where Rn+1 is considered as a Euclidean vector (n + 1)-space.
This sphere crosses each ray R+ x ⊆ Rn+1 , x ≠ 0, exactly once. Thus we
can describe this sphere in a way similar to the description of the projective
spaces, cf. with Definition 3.2.
Definition 3.12. Two vectors v, w ≠ 0 in a vector space V are called
equivalent, v ∼ w, if w = λv, λ > 0. The quotient space
SV = (V ∖ {0})/(R+ ∖ {0})
is called the spherization of V .
Remark 3.8. This definition can be applied to any vector space V , not necessarily a
Euclidean one. By this definition, there always exists a map
ι : SV → PV, (22.2)
which sends each equivalence class R+ v (a ray in V ) to the equivalence class Rv (a line
in V containing this ray). The preimage ι−1 (x) of any point x ∈ PV (an equivalence class)
consists of the two equivalence classes ±x ∈ SV .
In other words, the sphere SV twice covers the projective space PV .
But now we are mostly interested in the spherization of Euclidean vector
spaces, initially equipped with the scalar product ⟨·, ·⟩ : V × V → R. In this
case one can associate the spherization of a space V , say, Rn+1 with the
natural scalar product, with the sphere as a subset of Rn+1 . This is a key
difference between projective spaces and spheres: we could not realize the
projective space Pn as a surface in Rn+1 .
Denote by σ : Rn+1 r {0} → Sn the map
σ(x) = x/|x| ∈ Sn ,   |x| = √⟨x, x⟩.   (22.3)
Geometrically, this is the unique point of intersection of the ray R+ x ⊆ Rn+1
with the sphere Sn and allows us to look at Sn as the spherization of Rn+1 ,
Sn = S(Rn+1 ) (again cf. with Definition 3.2).
We will deal almost exclusively with the case n = 2, that is, with the
geometry on the 2-sphere embedded in the usual 3-dimensional Euclidean
space.
Definition 3.13. A large circle on S2 is the intersection of S2 with a
2-plane Π 2 ⊆ R3 passing through the origin of the Euclidean space R3 .
Example 3.14. The equator of the Earth (assuming that the North and
South poles N , S are at the points (0, 0, ±1) ∈ R3 with the coordinates
x, y, z) is the intersection of the sphere with the horizontal plane z = 0. All meridians are
intersections of the Earth with different vertical planes containing the z-axis
{x = y = 0}. Hence all these circles are large circles.
The map ι allows us to “lift” any construction from P2 to S2 by “doubling”
(and this is a good reason to keep almost the same terms as we used when
discussing the projective plane). In particular:
(1) For any projective point P ∈ P2 its preimage ι−1 (P ) consists of two
antipodal points on S2 .
(2) For any projective line ℓ ⊆ P2 the preimage ι−1 (ℓ) is a circle on S2 .
This circle is the intersection of S2 with the 2-plane through the
origin in R3 which projects into the projective line ℓ ⊂ P2 .
Thus analogs of the “straight” projective lines are large circles on the sphere.
However, calling (curved) circles “straight lines” would be an abuse of
language, so we need to change the adjective, and use the word geodesic
instead of straight.3
3In English, “geodesic” is both an adjective and a noun. The combination “geodesic
line” is still a contradiction in terms, since “line” by default means a straight line. Yet,
Definition 3.14. A geodesic line on S2 is the intersection of any 2-
subspace Π2 ⊆ R3 with S2 . More generally, a geodesic line on Sn is the
intersection of any 2-subspace Π2 ⊆ Rn+1 with Sn .
Problem 3.10. Any two distinct and not antipodal points P, Q ∈ Sn
define a unique geodesic line passing through these points. Prove it.
Definition 3.15. The geodesic segment with endpoints P, Q ∈ Sn is the
shortest arc of the geodesic line (large circle) through P and Q.
Here we already see that the spherical geometry will not be Euclidean:
there are infinitely many different geodesic segments with antipodal end-
points, e.g., N and S. All of them have equal rights and cannot be preferred
to each other.
Remark 3.9. Besides large circles, there are other circles on S2 which
are intersections of the sphere with 2-planes not passing through the origin.
You can consider the family of circles Ch = S2 ∩ {z = h}, h ∈ [−1, 1].
These circles vary in size from a single point (a “zero radius circle”)
for h = ±1 to the large circle at h = 0. However, at this moment we don’t
know how to define the radius of these circles.
22.2. Spherical coordinates. To perform computations in the spherical geometry,
we need some coordinates, in the same way as we have Cartesian coordinates on the plane.
Unfortunately, we cannot reproduce the Cartesian construction in full. Instead, we will
use the standard geographical coordinates (latitude and longitude), which we briefly recall.
Denote the Euclidean coordinates in R3 by x, y (“horizontal”) and z (“vertical”), so
that the sphere S2 is given by the equation x2 + y 2 + z 2 = 1. The two points (0, 0, ±1) are
called the poles (N and S respectively). Every other point P projects to the horizontal
(x, y)-plane off the origin, hence has a uniquely defined polar angle ϕ = arctan y/x called
longitude, which is measured in degrees from −180◦ to +180◦ . Longitude 0 corresponds
to a large circle passing through N , S and London/Paris. The latitude is the angle
θ = arcsin z and varies from −90◦ (the South pole S) to +90◦ (the North pole N ).
Note that unlike the meridians ϕ = const, which are all arcs of large circles, the parallels
θ = const are not geodesics, except for the Equator θ = 0.
If you plot the grid of parallels and meridians on the Globe (say, with a step of 5◦
in both), then you will see a familiar pattern. This pattern near the point ϕ = 0, θ = 0
(somewhere in the Gulf of Guinea) looks very much like the usual Cartesian coordinate grid of
almost straight lines almost parallel to each other. On the other hand, near the poles the
pattern will look very much like polar coordinates with the radius r = 90◦ − θ and the
polar angle ϕ.
Accordingly, the geometric intuition associated with the geographic coordinates
should be different near the Equator and near the poles. At the same time we note that
geometrically nothing bad occurs with the sphere near the poles: a rotation about any
axis in R3 different from the axis SN will move the poles to a pair of antipodal “regular”
points.
22.3. Spherical distance. Recall how we define the distance between
points on the Euclidean plane Π. We start with the definition of distance
taking into account the mathematical proximity, using “geodesic” as an adjective with the
noun “line” is not that awful.
first on one-dimensional lines isometric to the real line. Then for two dis-
tinct points P, Q ∈ Π we draw the (unique) line ℓ ≃ R1 through P and Q
and then measure the distance dist(P, Q) as the length |P Q| along ℓ. The
corresponding function
dist : Π × Π → R+
satisfies then the conditions of Definition 1.1.
Now, since we have a reasonable replacement for lines and segments, we
can reproduce the same construction on the sphere.
Each geodesic segment (a part of a large circle U of unit radius) is a met-
ric space: the distance between the endpoints is the arclength of the shortest
of the two arcs connecting them. By construction, it is a positive number less
than or equal to π (since the total circumference of U is 2π).
Definition 3.16. The spherical distance is the function
distS : S2 × S2 → R+
such that distS (P, Q) is the arclength of the shorter geodesic segment with
the endpoints at P and Q.
Remark 3.10. Formally, this definition needs to be refined to address
the cases P = Q and P = −Q (a pair of antipodal points). In the former
case we simply set distS (P, P ) = 0; in the latter there are infinitely many
segments connecting P and Q, but all of them have the same length π.
Does this function satisfy the conditions of Definition 1.1? The first two
conditions (symmetry and nonnegativity) are obvious. However, the third
condition, the triangle inequality, turns out to be highly nontrivial.
Theorem 3.9. The spherical distance satisfies the triangle inequality:
for any three points P, Q, R ∈ S2 connected by (short) geodesic segments,
distS (P, Q) + distS (Q, R) ⩾ distS (P, R).
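On the unit sphere the arclength of the shorter arc equals the angle between the two unit radius-vectors, so distS (P, Q) = arccos⟨P, Q⟩ (this formula is a standard consequence of the definition, not spelled out in the text). A sketch testing the triangle inequality on random points:

```python
import numpy as np

def dist_s(P, Q):
    """Spherical distance on S^2: the angle between the unit radius-vectors,
    i.e., the arclength of the shorter geodesic segment."""
    return np.arccos(np.clip(np.dot(P, Q), -1.0, 1.0))

rng = np.random.default_rng(0)

def random_point():
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)    # the normalization map sigma of (22.3)

for _ in range(1000):
    P, Q, R = random_point(), random_point(), random_point()
    assert dist_s(P, Q) + dist_s(Q, R) >= dist_s(P, R) - 1e-12
print("triangle inequality holds on all samples")
```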
Remark 3.11. It is not simple to find an elementary proof of this in-
equality, and for good reason. It turns out that it is easier to consider all
rectifiable spherical curves γ : [0, 1] → S2 , γ(0) = P , γ(1) = R (for which
the notion of length can be defined) connecting the two endpoints P, R and
prove that the geodesic segment P R has the shortest length among all these
curves. The proof of this more general statement belongs to the so called
Calculus of variations, where it becomes a specific case of a more general
fact.
We can formulate this general result as follows. Consider all continuous
vector-functions γ : [0, 1] → R3 such that:
(1) the (vector) derivative v(t) = (d/dt) γ(t) ∈ R3 exists for all moments
of time t ∈ [0, 1] except possibly for finitely many points, and the
derivative t 7→ v(t) is continuous between these points (i.e., the
curve γ is rectifiable),
(2) γ(t) ∈ S2 for all t ∈ [0, 1],
(3) γ(0) = P , γ(1) = R.
Then the absolute value |v(t)| ⩾ 0 is well defined and the integral
|γ| = ∫₀¹ |v(t)| dt ⩾ 0,   v(t) = (d/dt) γ(t) ∈ R3 ,
is well defined. This integral is naturally interpreted as the length of the
curve γ, and one can compare the lengths of different curves satisfying the
above three conditions.
Then there is a theorem saying that for any such curve γ its length
|γ| is greater than or equal to the length of the geodesic segment, which equals
distS (P, R).
The result mentioned in this remark implies the following theorem.
Theorem 3.10. The spherical distance between any two points is the
length of a shortest rectifiable curve connecting these points and this shortest
curve is necessarily a geodesic segment.
This is another argument why geodesic segments should be considered
as “nonlinear” analogs of straight line segments.
22.4. Isometries of the sphere vs. isometries of R3 . Thus for a
sphere S2 ⊆ R3 we have two (obviously, closely related) objects:
(1) the Euclidean structure (scalar product) ⟨·, ·⟩ in R3 ;
(2) the spherical distance distS (·, ·) on S2 .
If F : R3 → R3 is an isometry of R3 as a Euclidean space (i.e., a linear map
sending 0 to 0 and preserving the scalar product), then this map preserves
the equation ⟨x, x⟩ = 1 and hence takes the sphere S2 to itself.
It is absolutely obvious that the restriction f = F |S2 : S2 → S2 is an
isometry in the sense that it preserves the spherical distance between points,
distS (f (P ), f (Q)) = distS (P, Q) for any two points P, Q. Conversely (this
requires a short and simple proof), any isometry f of S2 is induced by an
isometry F of R3 as above.
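A sketch checking the first half of this statement on one example; the orthogonal matrix F below (a rotation about the z-axis) is chosen ad hoc:

```python
import numpy as np

# an orthogonal matrix F: rotation about the z-axis by 0.7 radians
c, s = np.cos(0.7), np.sin(0.7)
F = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])

def dist_s(P, Q):
    """Spherical distance: the angle between unit radius-vectors."""
    return np.arccos(np.clip(np.dot(P, Q), -1.0, 1.0))

rng = np.random.default_rng(1)
for _ in range(100):
    P = rng.normal(size=3); P /= np.linalg.norm(P)
    Q = rng.normal(size=3); Q /= np.linalg.norm(Q)
    # F takes the sphere to itself and preserves the spherical distance
    assert abs(dist_s(F @ P, F @ Q) - dist_s(P, Q)) < 1e-12
print("F preserves the spherical distance")
```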
Isometries of the sphere, taken together, form a group (as usual). This
group can be easily described (actually, this could already be done earlier,
in §16).
Theorem 3.11.
(1) For any two points P, Q ∈ S2 there is an isometry f such that
f (P ) = Q.
(2) Any isometry has at least two antipodal fixed points ±P , such that
f (±P ) = ±P .
(3) Any isometry fixing two antipodal points ±P and preserving the
orientation of the sphere, is a rotation around the axis (−P, P ) in
R3 by some angle ϕ ∈ [0, 2π].
(4) A mirror reflection in any large circle is an isometry reversing the
orientation of the sphere.
Proof. The first claim follows from the well-known fact that any linear isometry
A : E → E of a Euclidean vector space has all eigenvalues (possibly complex) of modulus
1, λλ̄ = 1. Indeed, if A is an isometry, then ⟨Av, Av⟩ = ⟨v, A∗ Av⟩ = ⟨v, v⟩, that is,
A∗ = A−1 (that’s why it is so easy to invert orthogonal matrices). This identity implies
that all eigenvalues of A satisfy the condition λ̄ = λ−1 , equivalent to |λ| = 1, meaning
that they must be numbers, perhaps complex, but on the unit circle (of modulus 1).
Since the characteristic polynomial is real, this is possible only if the eigenvalues come
in complex conjugate pairs e±iϕ or are real numbers 1 or −1. Since the dimension n = 3
is an odd number, at least one eigenvalue must be real. An eigenvalue 1 means that
the corresponding vector is invariant under F , that is, the corresponding pair of antipodal
points is preserved by F . In the case λ = −1 the isometry F is a reflection in the 2-
plane orthogonal to this eigenvector. This plane is invariant under F and hence we are in the
situation discussed in §16.12.
22.5. Tangent planes. The notion of tangency to the sphere can be
defined in finite geometric terms, unlike the general case, which requires
passing to a limit.
Let P ∈ S2 be a point on the sphere S2 ⊆ R3 . Consider a straight line
in R3 passing through P and parallel to a vector v ∈ R3 : in the parametric
form the equation of this line looks like
ℓ = {x = P + tv},   t ∈ R,   x, P, v ∈ R3 ,   v ≠ 0,
(here we adopted the simplistic Cartesian point of view, when x, P, v are
considered as triples of real numbers).
Definition 3.17. The vector v ∈ R3 is called tangent to S2 at the point
P ∈ S2 , if it is orthogonal to the “radius-vector” P , ⟨P, v⟩ = 0. The set of
all tangent vectors is called the tangent plane (at P ) and denoted by TP S2 .
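In coordinates the tangency condition ⟨P, v⟩ = 0 is immediate to test. The splitting of an arbitrary vector into its tangent part and a multiple of the radius-vector, used below, is standard linear algebra rather than a formula from the text:

```python
import numpy as np

P = np.array([0.0, 0.0, 1.0])       # a point of S^2 (the North pole)

def is_tangent(v, P, eps=1e-12):
    """v is tangent to S^2 at P iff it is orthogonal to the radius-vector P."""
    return abs(np.dot(P, v)) < eps

def tangent_part(w, P):
    """Orthogonal splitting w = (w - <w,P> P) + <w,P> P for a unit vector P;
    the first summand lies in the tangent plane T_P S^2."""
    return w - np.dot(w, P) * P

w = np.array([1.0, 2.0, 3.0])
v = tangent_part(w, P)
print(v, is_tangent(v, P))          # [1. 2. 0.] True
```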
Remark 3.12 (important). The tangent vectors can be considered as
vectors tangent to (oriented) large circles on S2 passing through the point
P : each nonzero tangent vector v ∈ TP S2 uniquely defines a geodesic line
on S2 passing through P , to which it is tangent.
This observation is important for understanding how isometries of the sphere,
the maps f : S2 → S2 , act on tangent vectors (which are not points on
the sphere and hence are not in the domain of f ).
Let f be such an isometry and v ∈ TP S2 . Denote Q = f (P ) and
let γ be an oriented geodesic line (large circle) passing through P tangent
to v. Then f (γ) must again be an oriented large circle, necessarily passing
through Q, and there is a unique vector w ∈ TQ S2 which is tangent to f (γ)
(with the right orientation). By definition we say that w is the f -image of
v. In order to stress that v, w are only tangent to the sphere, we use the
notation4 w = f∗ (v).
It is very easy to see (prove it!) that the map
f∗ : TP S2 → TQ S2 , Q = f (P ), f∗ (v) = w,
4In more advanced courses of Geometry and Calculus it is explained why this map is
called the differential of f at the point P and is sometimes denoted by df (P ). Another
name for f∗ is the map tangent to f .
Figure 19. Tangent spaces
is an isometry between two Euclidean 2-dimensional vector spaces.
Problem 3.11. Assume that f is an isometry and f (P ) = P . Prove that
in this case f∗ is a self-map of the tangent plane TP S2 into itself and hence
is either a rotation or a composition of rotation with a mirror symmetry as
in §16.12.
Remark 3.13. To be formally correct, we need to treat the tangent planes TP S2 at
different points P as completely disjoint from each other, although the definition places
all of them as subspaces in the same Euclidean space R3 . The reason is simple: two such
subspaces TP S2 and TQ S2 , P ≠ Q, always have a non-empty intersection (a line) in R3 ,
but the corresponding tangent vectors should be “attached” to two different points on the
sphere; equality of two such vectors simply means that the corresponding lines are parallel in the
affine space R3 .
To formalize this, it is convenient to introduce the notion of a tangent space to an
abstract vector space V at different points. By definition, a tangent vector to V at a point
v ∈ V is a pair of vectors (v, w), w ∈ V . Formally, this is the Cartesian product V × V of V
with itself, the first argument considered as a point and the second as the tangent
vector. This corresponds to the representation
V × V = ⋃v∈V Tv V,   Tv V ≃ V.
In other words, tangent spaces to V at different points are considered as separate identical
copies of the vector spaces V , “attached” to different points v ∈ V .
In this more accurate sense, the tangent planes to the sphere are 2-subspaces not in
the same Euclidean space R3 , but rather in different tangent spaces,
TP S2 ⊆ TP R3 , P ∈ S2 ⊆ R3 .
Note that in the Euclidean case, where instead of the general vector space V we have a
Euclidean space E, the tangent spaces TP E are Euclidean spaces themselves; the scalar
product on each of them is inherited from the isomorphism TP E ≃ E.
This means, in particular, that each tangent space TP S2 is itself a Euclidean plane
with the scalar product computed by the same formulas as in the ambient Euclidean space
R3 .
Figure 20. Tangent lines
23. Curvature
23.1. Parallel translation (transport). In any affine space there ex-
ists a subgroup of transformations called the parallel translations. They are
not linear self-maps (as they move the zero vector to a nonzero one), but
are very nicely acting on the corresponding affine spaces of points. If the
vector space is Euclidean (i.e., has a scalar product), then translations are
isometric. What is important is that the translations keep straight lines
parallel to themselves.
Do we have an analogous notion in the spherical geometry, where the role
of lines is played by the geodesic lines (large circles)? Clearly, the parallel
translation in R3 cannot be used: after such translation to another point
the tangent vector stops being tangent to the sphere S2 .
However, for the 1-dimensional sphere such “parallel translation” is pos-
sible.
Example 3.15. Consider the case of 1-dimensional sphere S1 ⊆ R2
which is already familiar under the name “unit circle” U. In this case we
should replace the tangent 2-plane TP S2 by the tangent 1-line TP S1 = TP U
which is the usual tangent line as defined in the Euclidean plane Π. This
tangent line consists of vectors λv, where v is the unit vector attached to P
and orthogonal to P , see Figure 20. We can choose v to be “positive”, that
is, pointing in the counterclockwise direction on U.
There is only one way to map TP U to TQ U isometrically: map the
unit vector from TP U to the unit vector from TQ U and then extend this map
by linearity to the entire 1-dimensional spaces. In other words, apply the
rigid rotation of the Euclidean plane Π that takes P to Q, see Remark 3.12.
This observation makes the following definition quite natural. Let P, Q ∈
S2 be two points on the 2-sphere and γ a geodesic line (the smaller arc of a
large circle passing through P and Q).
Proposition 3.12. There is a unique map between the tangent spaces
∆γ : TP S2 → TQ S2 such that:
(1) ∆γ is an isometry with respect to the scalar products h·, ·iP and
h·, ·iQ respectively,
(2) ∆γ is orientation-preserving5,
(3) ∆γ sends the unit vector tangent to γ at P to the unit vector tangent
to γ at Q.
Proof. Since all tangent spaces are 2-dimensional Euclidean planes,
an isometry preserving orientation between these planes is a rotation, see
§16.12. The angle of this rotation is uniquely determined by the image of
only one vector, in this case the tangent vector to γ at P and at Q.
Definition 3.18. The map ∆γ is called the parallel transport from P
to Q along the geodesic segment connecting them.
This definition can be “iterated”: let P0 , P1 , . . . , Pn be a sequence of
different points on S2 , connected by geodesic segments. Together these seg-
ments form a continuous “broken geodesic line” γ from P0 to Pn . The composi-
tion of the parallel transports from Pi to Pi+1 is an isometry between the
respective tangent planes TP0 S2 and TPn S2 .
The most interesting case occurs when P0 = Pn , i.e., when the piecewise-
geodesic path is a closed loop starting and ending at the same point P0 = Pn .
Then the corresponding parallel transport is an isometric self-map of TP0 S2 .
Thus there is a uniquely defined (naturally, modulo 2π) angle ϕ such that
the overall composition is a rotation by this angle. Of course, this angle
a priori may depend on the path P0 , P1 , . . . , Pn−1 , Pn = P0 .
Example 3.16. Consider the geodesic triangle on the sphere which passes
through three points: P0 is the North pole, P1 is the point of inter-
section of the zeroth meridian with the equator (somewhere in the Gulf of
Guinea), P2 is the point on the equator having the longitude +90◦ (East), and
P3 = P0 is again the North pole.
Let v0 ∈ TP0 S2 be the vector tangent to the zeroth meridian P0 P1 . Since
the meridian is a geodesic line on S2 , the transport of v0 to the point P1 is
the vector v1 ∈ TP1 S2 which points strictly to the South and hence forms an
angle π/2 with the vector v1′ ∈ TP1 S2 tangent to the equator. The parallel
transport of v1′ to TP2 S2 will again be tangent to the equator at P2 , and since
the angles are preserved, the parallel transport of v1 to TP2 S2 will be the
vector v2 orthogonal to the equator at P2 and pointing south. Thus v2 is
tangent to the meridian P2 P0 , itself again a geodesic, hence it arrives after
the last parallel transport along P2 P0 back to TP0 S2 as a vector v3 tangent
5This condition is not as easy to formulate accurately as you may think. In
general, there is no way to compare orientations on two tangent planes. However, we may
consider the spaces Tγ(t) S2 along the path γ: these spaces depend continuously
on t, and hence one can choose orientations of these spaces in a consistent way.
to the meridian P2 P0 at P0 . The vectors v0 and v3 belong to the same
tangent plane TP0 S2 and can be compared.
In other words, after making a long journey along the geodesic path
P0 P1 P2 P0 we see that the transported vector v3 is the image
of v0 rotated in the positive direction (look at the picture!).
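The travel of Example 3.16 can be traced numerically. A sketch, under one standard assumption not proved here: the parallel transport along a geodesic arc from P to Q is realized by rotating R3 about the normal of the plane of the large circle by the arc angle (this rotation preserves orientation and takes the tangent vector of the arc at P to the one at Q, so it is the map ∆γ of Proposition 3.12).

```python
import numpy as np

def rotate(v, axis, theta):
    """Rodrigues rotation of v about the unit vector axis by angle theta."""
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(theta)
            + np.cross(axis, v) * np.sin(theta)
            + axis * np.dot(axis, v) * (1 - np.cos(theta)))

def transport(v, P, Q):
    """Parallel transport of a tangent vector v from P to Q along the
    shorter geodesic arc: rotate R^3 about the normal of the plane OPQ
    by the arc angle."""
    axis = np.cross(P, Q)
    theta = np.arccos(np.clip(np.dot(P, Q), -1.0, 1.0))
    return rotate(v, axis, theta)

# the geodesic triangle of Example 3.16
P0 = np.array([0.0, 0.0, 1.0])   # North pole
P1 = np.array([1.0, 0.0, 0.0])   # zeroth meridian meets the equator
P2 = np.array([0.0, 1.0, 0.0])   # longitude +90 degrees on the equator

v0 = np.array([1.0, 0.0, 0.0])   # tangent at P0 to the meridian P0 P1
v1 = transport(v0, P0, P1)
v2 = transport(v1, P1, P2)
v3 = transport(v2, P2, P0)

print(v3)                        # a rotated copy of v0 in T_{P0} S^2
# signed angle from v0 to v3 in the tangent plane at P0 (normal P0 = e_z):
angle = np.arctan2(np.cross(v0, v3)[2], np.dot(v0, v3))
print(angle)                     # pi/2: rotation in the positive direction
```

The recovered angle is π/2, the rotation “in the positive direction” described in the example.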
This example in fact suggests how intelligent ants living on the sphere
could discover that their geometry is not Euclidean and that what
they believed to be straight lines (the shortest paths) violate the axioms of
Euclidean geometry.
However, these intelligent ants could say that it is not the Fifth postu-
late that is violated in their world, but rather one of the more basic axioms
(claiming that through any two distinct points passes only one straight line),
which appears to be much less controversial. Not that this would simplify
their world view much, but there is a lesson for humans: even the most innocent-
looking axioms may fail in non-conventional settings.
Problem 3.12. Replace the geodesic triangle P0 P1 P2 from Example 3.16 by a very
“thin” triangle when the segment P1 P2 along the equator is a small arc of the “length”
ϕ > 0 (the difference of longitudes of the two meridians).
Compute, if you can, the spherical area of this triangle (if you can’t, compute the area
of an “infinitely small” triangle for 0 < ϕ ≪ 1) and compare this area with the rotation
of any vector along the perimeter of this triangle.
Any conjectures?
Problem 3.13. Consider a “very small” geodesic triangle. A parallel
transport along this triangle will result in a rotation close to zero. Guess in which direction
this rotation will go, and compare your answer with the conjecture made above.
Yet any geodesic triangle can also be considered “inside out”, as a huge triangle with
most of the sphere inside and only a tiny piece outside.
Thus you should expect that the rotation of any vector around this triangle should
be very large.
How could these apparently conflicting points of view be reconciled?
23.2. How the result of the parallel transport depends on the
path. The examples studied above suggest that the result of the parallel
transport around a closed path depends on the “size” of the piece of the sphere
S2 bounded by this path. Of course, the variety of paths is huge (even if we restrict
our attention to the “polygonal paths” formed by finitely many geodesic
segments): a path can be self-intersecting, include segments larger than a
half-circle, etc. The complete and accurate investigation is beyond the scope
of these lectures (if only for technical reasons), but we can study a
few instructive examples.
23.3. More non-Euclidean features. Having defined the spherical
distance, we can replicate some constructions from the Euclidean geometry
on the plane Π. For instance, choose the center at the north pole N = O
and consider the “spherical circle” of radius r > 0, the set of points
CN (r) = {P ∈ S2 : distS (P, N ) = r} ⊆ S2 .
Figure 21. Composition of paths
A simple inspection shows that this set is indeed a “circle”, that is, the
result of intersection of S2 with the horizontal plane {z = h} for suitable
h ∈ [−1, 1]. In the geographic coordinates these circles are parallels {θ =
const}. For small r > 0 they are small circles around the North pole; as
r grows to the value r = π/2 (corresponding to the Equator), these circles
have larger and larger radii in R3 . Once r gets bigger than π/2, the
“3D-radius” of the corresponding circles decreases to zero as r approaches
π, the maximal available value for the distance between any two points on
S2 .
One can ask how the length6 of C(r) = CN (r) and the area bounded
by this circle depend on r > 0. Recall that in the Euclidean geometry
the answer is given by the linear function |C(r)| = 2πr and the quadratic
function area C(r) = πr2 .
Problem 3.14. Compute explicitly the values of |C(r)| and area C(r)
on S2 for r ∈ [0, π].
Problem 3.15 (teaser). Could you suggest any reasonable definition for
these values for r > π?
Remark 3.14. These observations suggest that there are no similar figures in the
spherical geometry. This is indeed the case.
Theorem 3.13. Let △ABC and △PQR be two spherical triangles, whose sides are
geodesic segments. If the three angles of these triangles are pairwise equal, then there exists
an isometry f : S2 → S2 which maps one into the other. In the classical language, these
two triangles should be called equal.
In the spherical geometry there exist objects that are absent in the Euclidean geome-
try: diangles. These are rare beasts formed by two antipodal points ±P and two geodesic
segments connecting them (draw this!). The corresponding statement for the diangles is
way too obvious: if two diangles have the same angles, then they are isometric (equal).
This might be a hint to the triangular case.
⁶Obviously, the answer does not depend on the location of the center N of the circle.
23.4. What’s the buzz? Tell me what’s a-happening? We have
discovered a few facts from the spherical geometry. There is nothing un-
orthodox in them; probably all of them were known to the ancient Greeks,
who were extremely skillful in two- and three-dimensional geometry, although
they did not use our notation with trigonometric functions. None of the facts
mentioned above would bother them. Why, then, should we be so fascinated
by these facts?
The simple answer is that the Greeks never had an idea to compare the
large circles on the sphere with the straight lines on the plane. For them
the mere idea would be strange: it is obvious that the lines are straight and
the geodesics are curved, so no surprise that the results would be different.
It is the achievement of the New Times: rather than looking at the
appearance, look at the technical documentation. The geodesic lines on the
sphere satisfy almost all axioms of the Euclidean straight lines, especially
if we eliminate the curse of antipodal points by treating each such pair as a
single point. The only axiom that is violated in the projective (or spherical)
world is the existence of a line parallel to the given one.
For a formal logician this is already sufficient: one small difference in the
list of logical requirements is enough to throw away all “false analogies” and
proceed further.
Yet Mathematics is not reducible to formal Logic. If too many
similarities are discovered, then there should be a reason for that. Later we
will construct explicitly a “geometry” with its distance function, its supply
of “straight lines” and all other attributes of the Euclidean geometry, which
will drive the last nail into the coffin of the unique Euclidean world. In this
new, Hyperbolic World, there will be parallel lines, but they will be
non-unique, contrary to the Fifth postulate.
24. Geometry on surfaces in R3
The standard unit sphere S² ⊆ R³ in the Euclidean space (§22.1) is not the
only possible example where an “intrinsic” geometry of a surface can be studied
and turns out to be “non-Euclidean”. Yet the general case requires extra work.
24.1. Smooth surfaces in R3 . Let M = M 2 be a smooth surface in
R3 . We do not give here an accurate definition of what a smooth surface
is. Intuitively, you can think of it as a graph of the differentiable function
z = f (x, y) of two real variables x, y after a suitable choice of the orthogo-
nal coordinates (x, y, z) in R3 . Of course, this representation is only local,
i.e., valid for some restricted values of x, y, say, in the disk x2 + y 2 < 1.
Other pieces of M may require a different choice of the coordinates and the
function.
Example 3.17. The standard sphere S² is the graph of the function
z = √(1 − x² − y²), say, over the disk x² + y² < 1 (which defines only a
spherical cap around the North pole). The graph z = −√(1 − x² − y²) over the
same disk represents the symmetric cap around the South pole. To cover also the
area around the equator, we will need four more caps, y = ±√(1 − x² − z²)
and x = ±√(1 − y² − z²).
Problem 3.16. Why couldn’t we use the closed disk x² + y² ≤ 1 for the
first pair of caps? Check that the six caps are indeed enough to cover the
whole surface of the sphere.
Remark 3.15. After the exercises with affine windows (maps) on the pro-
jective plane P², each serving only a part of the plane, using different rep-
resentations of the sphere should not be very difficult conceptually.
Because of the smoothness of the surface M, there exists a unique tan-
gent plane T_P M to it at each point P. Unlike in the spherical case, this
tangent plane can no longer be defined as the plane orthogonal to the
radius-vector of the point P (there is no radius-vector!). Instead, we can
use Calculus to define the tangent plane to the graph of function z = f (x, y)
in the same way we define the tangent line to the graph of function y = f (x)
of one variable, using the derivative. In the case of surfaces we will have to
use partial derivatives, and the equation of the tangent plane at a point P
with the coordinates (x₀, y₀, z₀), z₀ = f(x₀, y₀), will take the form

z = z₀ + α(x − x₀) + β(y − y₀),   α = ∂f/∂x (x₀, y₀),   β = ∂f/∂y (x₀, y₀).   (24.1)
As in the case of the sphere, we will consider TP M as a 2-dimensional vector
space with the coordinates (u, v) ∈ R2 , u = x − x0 , v = y − y0 , different for
any point (x₀, y₀). Each vector v ∈ T_P M can be considered as belonging to
the 3-dimensional Euclidean vector space with the coordinates (u, v, w):

(u, v) ∈ T_P M ⟷ (u, v, w) ∈ R³,   w = αu + βv,
|v| = |(u, v, w)|_{R³} = √(u² + v² + (αu + βv)²).   (24.2)
Of course, besides computing lengths, one can measure angles between vec-
tors tangent to M at the same point, since the scalar product in Rn can be
restricted to the tangent plane TP M .
These data, namely the existence of a tangent plane at every point of M
and the existence of the scalar product on it, allow us to develop a “locally
Euclidean geometry” on M in purely intrinsic terms. The idea is straight-
forward: a very small neighborhood of P on M admits a very good model
as the Euclidean vector space TP M near the origin v = 0.
Example 3.18. Consider the surface H given by the equation

H = {z = y² − x²},   (x, y) ∈ R².   (24.3)

Let v = (u, v, w) ∈ R³ be the vector tangent to H at a point (x, y, z),
z = y² − x². Then the tangency condition (24.2) means that w = 2yv − 2xu,
and we have the formula for its length:

|v|² = |v|²_{x,y} = u² + v² + w² = u² + v² + 4(xu − yv)²
     = u²(1 + 4x²) + uv(−8xy) + v²(1 + 4y²).
Consider the symmetric matrix of the corresponding scalar product
(depending on (x, y)):

G = ( 1 + 4x²   −4xy
      −4xy    1 + 4y² ),   det G = 1 + 4(x² + y²),   tr G = 2 + 4(x² + y²).
The eigenvalues of this matrix are positive, of course, and can be explicitly
computed.
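The computation in Example 3.18 is easy to check numerically. The sketch below (plain Python; the helper names are ours) verifies at random points that the quadratic form with matrix G agrees with the length computed in R³, and evaluates det G and tr G:

```python
import math
import random

def metric_matrix(x, y):
    # First fundamental form of H = {z = y^2 - x^2} in the chart (x, y),
    # as computed in Example 3.18
    return ((1 + 4*x*x, -4*x*y),
            (-4*x*y, 1 + 4*y*y))

def ambient_sq_length(x, y, u, v):
    # |v|^2 computed in R^3, with w given by the tangency condition w = 2yv - 2xu
    w = 2*y*v - 2*x*u
    return u*u + v*v + w*w

random.seed(0)
for _ in range(100):
    x, y, u, v = (random.uniform(-2, 2) for _ in range(4))
    (a, b), (_, c) = metric_matrix(x, y)
    form = a*u*u + 2*b*u*v + c*v*v
    assert abs(form - ambient_sq_length(x, y, u, v)) < 1e-9

x, y = 0.3, -0.7
(a, b), (_, c) = metric_matrix(x, y)
print(a*c - b*b, 1 + 4*(x*x + y*y))  # det G = 1 + 4(x^2 + y^2)
print(a + c, 2 + 4*(x*x + y*y))      # tr G = 2 + 4(x^2 + y^2)
```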
We list briefly the main steps in construction of this intrinsic geometry
of M .
24.2. Rectifiable curves and their lengths. A rectifiable curve on
M is a differentiable map γ : [0, 1] → R³ such that for each t ∈ [0, 1] we
have γ(t) ∈ M. If

v(t) = dγ/dt (t) ∈ T_{γ(t)}M ⊆ T_{γ(t)}R³ ≅ R³
is the velocity vector of γ at the moment t, then the length of the curve is
by definition the integral

|γ| = ∫₀¹ |v(t)| dt.
The function |v(t)| is defined both in terms of the ambient space R³ and in
the intrinsic terms of T_{γ(t)}M:

|v(t)| = √⟨v(t), v(t)⟩_{R³} = √⟨v(t), v(t)⟩_{T_{γ(t)}M}.
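As a sanity check that the ambient and the intrinsic expressions for |v(t)| give the same length, take a concrete curve on the surface H of Example 3.18 and integrate both by the midpoint rule (the curve below is an arbitrary choice):

```python
import math

def gamma(t):
    # A curve on H = {z = y^2 - x^2}: x = t, y = t^2, z = t^4 - t^2
    return (t, t**2, t**4 - t**2)

def speed_ambient(t):
    # |gamma'(t)| computed directly in R^3
    dx, dy, dz = 1.0, 2*t, 4*t**3 - 2*t
    return math.sqrt(dx*dx + dy*dy + dz*dz)

def speed_intrinsic(t):
    # The same quantity via the metric G(x, y) of Example 3.18,
    # applied to the planar velocity (u, v) = (1, 2t)
    x, y, u, v = t, t**2, 1.0, 2*t
    return math.sqrt((1 + 4*x*x)*u*u - 8*x*y*u*v + (1 + 4*y*y)*v*v)

def length(speed, n=20000):
    # Midpoint-rule approximation of the length integral over [0, 1]
    h = 1.0 / n
    return sum(speed((k + 0.5) * h) for k in range(n)) * h

print(length(speed_ambient), length(speed_intrinsic))  # the two values agree
```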
24.3. Geodesic segments. Let P, Q ∈ M be two different points on
M . Then one can consider various rectifiable curves γ (all on M ) such that
γ(0) = P , γ(1) = Q and compare their lengths.
Of course, all these lengths are greater than or equal to the length of the
line segment [PQ] ⊆ R³, but the latter in general does not lie on M.
Definition 3.19. A geodesic segment is a rectifiable curve whose length
is minimal among all rectifiable curves connecting P and Q on M.
There are various reasons why this minimum may be unattainable.
Example 3.19.
(1) Nobody can guarantee that at least one rectifiable curve connecting
P and Q exists: think about a surface M which is a disjoint union
of two different pieces (say, two spheres). We can exclude such
surfaces from consideration, instead studying only their separate
pieces (connected components, in the language of Topology).
(2) It may be that there is a sequence of curves γₙ : [0, 1] → M con-
necting the points P, Q, such that their lengths |γₙ| go down to
some value s > 0, but there is no curve of length exactly equal
to s. The simplest example is when M is the Euclidean plane with
the origin deleted, R² ∖ {0}. Then, of course, the geodesic segments are
simply (Euclidean) straight line segments, but if P and Q = −P
are two “antipodal” points, then the respective segment is not in
M, and any other path is not minimal. This situation is described
by saying that M is not geodesically complete.
24.4. Geodesic lines. What if we want to continue geodesic segments
in the same way we can continue the Euclidean line segments to obtain in-
finite (straight) lines? On the sphere the question was trivial: geodesic seg-
ments naturally appeared as parts of the large circles which are unbounded
(could be continued indefinitely in each direction, although having finite
length).
In the case of a general surface one cannot expect such luck, and a
geodesic segment might be extendable across its ends only in a limited way.
For instance, if we consider M being a unit disk {x2 + y 2 < 1} on the
Euclidean plane, then geodesic segments are of course straight line segments
of lengths not exceeding 2 and cannot be extended indefinitely, as Euclid
required axiomatically for his straight lines.
However (as the spherical example shows), a geodesic line provides a short-
est rectifiable path between any two of its points only if these two points are
close enough: a geodesic segment larger than a half-circle is not a shortest
path anymore.
Remark 3.16. We need to develop analytic tools to find the shortest curves. This is
very much like the problem of finding an extremum of a function f(u) of one variable u ∈ R,
with one very important change. When looking for a shortest curve, we need to replace
the unknown point u by an unknown curve γ : [0, 1] → M, which can be represented by a
pair of differentiable functions x(t), y(t) (the z-coordinate is computed automatically).
On the other hand, instead of a function of one variable f we have the function “length”,
defined on various curves. The function “length” is not differentiable in any sense (in
the same way as the function “absolute value” is not differentiable). Instead one has to
replace the length by the “action” function, defined as

‖γ‖ = ∫₀¹ ⟨γ̇(t), γ̇(t)⟩ dt.

For instance, for the surface H as in Example 3.18 above, the action integral is

‖γ‖ = ∫₀¹ ⟨γ̇(t), γ̇(t)⟩ dt,   γ̇(t) = (ẋ(t), ẏ(t), 2y(t)ẏ(t) − 2x(t)ẋ(t)),

where the dot stands for the derivative in t. (One can easily see that γ̇(t) is the velocity
vector of γ in R³ at the moment t ∈ [0, 1].)
The “school-level” necessary condition f′(u∗) = 0 for an extremum of a function f at
a point u∗ should be replaced by a condition on the “extremal curve” t ↦ (x∗(t), y∗(t)).
Surprisingly, it took a historically negligible time from the day Newton legalized the
notion of a derivative in his Principia (1687) till Euler and Lagrange solved the apparently
immensely more complicated infinite-dimensional problem in the 1750s. This was really an
epoch of geniuses (Euler, the Bernoulli family, . . . ).

Figure 22. Paper cube and tetrahedron
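The variational idea behind the “action” admits a crude numerical illustration: discretize a curve on the surface z = y² − x² by a polyline with fixed endpoints and decrease the discrete action by gradient steps. This is only a sketch (the step sizes and iteration counts are ad hoc choices of ours), not the Euler–Lagrange machinery:

```python
import math

def f(x, y):
    # the saddle surface of Example 3.18
    return y*y - x*x

def action(pts):
    # Discrete action: sum of squared R^3-distances between consecutive nodes
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        dz = f(x1, y1) - f(x0, y0)
        s += (x1 - x0)**2 + (y1 - y0)**2 + dz**2
    return s

def minimize(pts, steps=500, h=1e-4, lr=0.05):
    # Numerical gradient descent; the endpoints stay fixed
    pts = [list(p) for p in pts]
    for _ in range(steps):
        for i in range(1, len(pts) - 1):
            for j in range(2):
                pts[i][j] += h
                up = action(pts)
                pts[i][j] -= 2*h
                down = action(pts)
                pts[i][j] += h
                pts[i][j] -= lr * (up - down) / (2*h)
    return [tuple(p) for p in pts]

n = 10
# a wiggly curve in the chart (x, y) from (0, 0) to (1, 0)
start = [(t/n, 0.3*math.sin(math.pi*t/n)) for t in range(n+1)]
end = minimize(start)
print(action(start), action(end))  # the action strictly decreases
```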
24.5. Paper and scissors. A sheet of paper is flat. Rolling this sheet
into a tube (cylinder) does not change this fact: lengths of any straight lines,
measured along the rolled surface, will remain the same. Even if we fold the
sheet along a line, this flatness persists: we will still be able to continue
straight lines and circles across the line of fold.
Thus the surface of a polyhedron in R³, made of flat faces attached to
each other along their rectilinear edges, is a metric space (think of a cube
and/or a tetrahedron, see Fig. 22; the flaps are for technical reasons and should
be attached to the corresponding flap-less edges).
Problem 3.17. A spider sits on the floor of a room, and a fly sleeps on
the ceiling of this room. Construct the shortest crawling path for the spider
to reach the fly.
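Problem 3.17 is attacked by unfolding: flatten a chain of faces into the plane and join the two points by a straight segment. The sketch below does this for a unit cube and only for paths crossing a single side face (a complete solution would try all chains of faces; the unit cube and the coordinate conventions are our simplifying assumptions):

```python
import math

def spider_fly_unit_cube(spider, fly):
    # spider = (x, y) on the bottom face, fly = (x, y) on the top face; both faces
    # share the same (x, y) chart, 0 <= x, y <= 1. Unfold bottom-side-top into a
    # plane for each of the four side faces and take the shortest straight segment.
    sx, sy = spider
    fx, fy = fly
    routes = [
        math.hypot(fx - sx, 3 - sy - fy),   # over the side face y = 1
        math.hypot(fx - sx, 1 + sy + fy),   # over the side face y = 0
        math.hypot(fy - sy, 3 - sx - fx),   # over the side face x = 1
        math.hypot(fy - sy, 1 + sx + fx),   # over the side face x = 0
    ]
    return min(routes)

print(spider_fly_unit_cube((0.5, 0.5), (0.5, 0.5)))  # 2.0: straight up over any side face
```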
One can say that the surface of the cube, the tetrahedron or any other
paper polyhedron inherits its metric from the embedding in R³, which
is piecewise flat. Of course, there is no tangent space to these polyhedra
at the non-smooth points (edges and vertices). Yet, as we have seen, non-
smoothness along straight edges is not a problem: any segment of a straight
line on any face which has an endpoint inside the edge (i.e., not at a
vertex) can be continued uniquely across the edge onto the adjacent face. It
is the vertices which constitute the real problem.
Problem 3.18. Construct as many different geodesic lines as you can on the
surfaces of the cube and the tetrahedron, not passing through their vertices.
Solution for the tetrahedron. Consider the unfolding of the tetra-
hedron on the plane and use the parallelogram from Fig. 22
as a tile to tile the entire plane. Draw a straight line across this tiling and
follow the order in which this line crosses the triangles which will become faces
of the tetrahedron after gluing.
Figure 23. Different geodesics on the tetrahedron
Some of the solutions are shown in Fig. 23.
Problem 3.19. Construct an infinite geodesic line on the tetrahedron.
The vertices of polyhedra constitute a real problem: there is no way to
continue a straight line once it hits a vertex (e.g., it is not clear on which
of the several adjacent faces it should reappear after passing through the
vertex). Yet for any closed piecewise linear path (crossing each face by a
straight segment) avoiding the vertices, one can define the result of the
parallel translation along this path. It will be a rigid rotation by a certain
angle, and using this angle we can define the curvature of the polyhedron at
each point.
Not surprisingly, the curvature at any point interior to a face or an
edge is zero (we expected this when making a paper polyhedron from
flat pieces of paper). What is more surprising is the nontrivial curvature at
the vertices. This curvature is “atomic”: each vertex makes a certain (positive)
contribution to the rotation of a vector after a parallel translation.
Example 3.20. Consider any vertex of a tetrahedron, cut a small piece
of it along one of the edges and flatten the slit surface. The three faces,
each with the angle of π/3 = 60°, will form a piece of the half-plane. This
means (why?) that the rotation of any vector along any small triangular
path around this vertex will be equal to π = 180°.
Doing the same with a vertex of the cube, instead of three equilateral
triangles we will have three squares with the angles π/2 = 90°, which after
flattening will form the complement (exterior part) of a right angle. This
means (why?) that the rotation of a vector after the parallel translation will
be π/2 = 90°.
It is not accidental that the total curvature of all vertices of the tetra-
hedron, 4 × π, and of all vertices of the cube, 8 × π/2, is the same and equal to
the total integral curvature of the sphere, 1 × (area of the sphere) = 4π.
There are only five Platonic solids, but one can approximate the standard
round sphere by less regular polyhedra (say, with faces of different types)
and consider the curvature of the corresponding surfaces. As in the exam-
ples above, this curvature will be identically zero outside the vertices, with an
“atomic” curvature at the vertices themselves: the effect of a parallel trans-
port around a vertex at which several flat angles meet is a rotation by the
angle 2π − (sum of the angles of the respective faces), a small positive number.
Adding these numbers over all vertices, we will get the same number 4π, and
the process can be described as follows. The constant curvature of the round
sphere means a constant density of the uniform distribution of the total (inte-
gral) curvature value 4π per unit area. The “atomic” curvature appears as
the result of approximation of this continuous density by “material points”
of small finite mass, located at the vertices.
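The bookkeeping of the last paragraphs is easy to verify for all five Platonic solids: the atomic curvature at a vertex is the angle deficit, 2π minus the sum of the face angles meeting there, and the deficits always add up to 4π:

```python
import math

# name -> (interior angle of a face, faces meeting at a vertex, number of vertices)
platonic = {
    'tetrahedron':  (math.pi/3, 3, 4),
    'cube':         (math.pi/2, 3, 8),
    'octahedron':   (math.pi/3, 4, 6),
    'dodecahedron': (3*math.pi/5, 3, 20),
    'icosahedron':  (math.pi/3, 5, 12),
}

for name, (angle, faces_at_vertex, vertices) in platonic.items():
    deficit = 2*math.pi - faces_at_vertex*angle    # atomic curvature at one vertex
    total = vertices * deficit
    print(f'{name}: deficit = {deficit:.4f}, total = {total:.4f}')  # total is always 4*pi
```

This is a discrete instance of the Gauss–Bonnet phenomenon: the total curvature of any convex polyhedron, like that of the round sphere, equals 4π.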
Figure 24. Truncated icosahedron: almost spherical shape
25. Abstract Riemannian geometry
An abstract Riemann surface is an abstract 2-dimensional smooth sur-
face which has a tangent plane at each point, equipped (as a 2-dimensional
vector space) with a scalar product which depends differentiably on the point.
The next section will be a translation of this catch-phrase into an accu-
rate mathematical construction.
25.1. What is an abstract surface? Thus far we have dealt with
several classes of surfaces, and used different ways to identify points on
these surfaces.
(1) The Cartesian plane R2 whose points are pairs of real numbers
(x, y), x, y ∈ R.
(2) The general 2-dimensional vector spaces V over the field R of real
numbers. In each such space one can choose a basis (pair of vectors
e₁, e₂ ∈ V) such that any vector v ∈ V is uniquely represented
as
v = x1 e1 + x2 e2 , x1 , x2 ∈ R.
The pair of numbers (x1 , x2 ) uniquely identifies “points” (vectors)
of V and shows that V is isomorphic to the Cartesian space R2 .
Yet this isomorphism depends on the choice of the basis and is
absolutely non-unique.
(3) The complex plane C. Each point z = x + iy ∈ C is uniquely
identified by (x, y) ∈ R2 . This identification is quite canonical.
(4) The sphere S² ⊆ R³ defined by the equation {x² + y² + z² = 1}.
We see that any point on S² can be identified by three numbers
(x, y, z) ∈ R³. But these three numbers are not independent: know-
ing x and y determines z (up to the sign). Thus one can use points
of the Cartesian plane (x, y) ∈ R² to identify points on the sphere,
yet with a few caveats. Besides the ambiguity with the sign (which
hemisphere, Northern or Southern, you have in mind), there are
two problems. First, only points (x, y) with x² + y² ≤ 1
(the unit disk D) can be used for identification. Second, the for-
mula z = ±√(1 − x² − y²) behaves very badly near the boundary of the
disk (e.g., it is non-differentiable there), whereas the points on the equator
{z = 0} ∩ S² are perfectly legitimate and do not differ from other
points on the sphere.
A popular alternative is to use the geographic coordinates θ ∈
[−π/2, +π/2] = [−90◦ , +90◦ ] and ϕ ∈ [−π, +π] = [−180◦ , +180◦ ].
These formulas work badly near the poles and near the Pacific
meridian (the International Date Line).
(5) The projective plane P². On this plane any point is identified by
three coordinates 0 ≠ (x, y, z) ∈ R³, but as with the sphere,
these three coordinates are not independent: they can be simultane-
ously multiplied by a nonzero multiplier 0 ≠ λ ∈ R (we denote this
equivalence by ∼). This freedom allows us to identify a point of P² by
one of the three pairs (X₃, Y₃) ∈ R²₃, (X₂, Z₂) ∈ R²₂, (Y₁, Z₁) ∈ R²₁:

(X₃, Y₃, 1) ∼ (X₂, 1, Z₂) ∼ (1, Y₁, Z₁).

Here we use the notation R²ᵢ, i = 1, 2, 3, for three different copies of
the Cartesian plane R². Each of these three ways is imperfect: for
instance, the first way, using the pair (X₃, Y₃), where X₃ = x/z,
Y₃ = y/z, does not work for points of P² with z = 0. Yet the
(disjoint) union of all three copies R²ᵢ, called maps, covers all of P².
(6) Surfaces in R³ which can be represented as graphs of differentiable
functions. This class is quite universal: any smooth surface M ⊆ R³
near a point P ∈ M is the graph of a differentiable function z =
f(x, y) after choosing suitable Euclidean (orthogonal) coordinates
in R³. Without loss of generality we may assume that

P = (0, 0, 0),   f(0, 0) = 0,   ∂f/∂x (0, 0) = ∂f/∂y (0, 0) = 0,

the representation being valid for x² + y² < ε, ε > 0.
In this situation all pairs (x, y) ∈ R² as above identify points of M
sufficiently close to P (a suitable choice of ε > 0 determines the
size of the piece of M on which this representation works).
All these examples motivate us to give the following definition of an
abstract surface.
Definition 3.20. An abstract surface is a set M such that near each
point P₀ ∈ M all points P ∈ M sufficiently close to P₀ can be uniquely
parameterized by two real numbers, say, (x₁, x₂) ∈ R², in such a way that P₀
corresponds to (0, 0) and the representation is valid for all {x₁² + x₂² < ε}.
We say that the numbers x1 , x2 are local coordinates on M near P0 .
Remark 3.17 (important). The choice of the local coordinates is by no
means unique or canonical. Coordinates on the vector spaces are defined
modulo linear invertible transformations if we choose a different basis. In
the general case we can consider another system of local coordinates, say,
(y1 , y2 ). Since the representation of the points of M is unique, each point P
has two unique pairs representing it, hence yi must be functions of xj (and
vice versa). We shall always assume that

y_i = f_i(x₁, x₂),   f_i(0, 0) = 0,   det( ∂f_i/∂x_j )|_{x₁=x₂=0} ≠ 0.   (25.1)

This condition, by one of the fundamental theorems of analysis (the inverse
function theorem), guarantees that there exist two functions h₁, h₂ = h₁,₂(y₁, y₂)
such that

x_i = h_i(y₁, y₂),   h_i(0, 0) = 0,   det( ∂h_i/∂y_j )|_{y₁=y₂=0} ≠ 0   (25.2)

for all sufficiently small y_i satisfying the inequality {y₁² + y₂² < ε′}, ε′ > 0.
Of course, the labels for local coordinates can be arbitrary: one can use
variables x, y, u, v, . . . , indexed variables xi , yi , . . . for i = 1, 2 or complex
variables z, w, . . . (having in mind that a complex variable is a pair of two
real numbers, the real and imaginary part).
Problem 3.20. Consider the North pole P0 = N on the sphere S2 .
The geographic coordinates θ, ϕ are bad in any neighborhood of this point
(θ = π/2, ϕ undefined at N ). Let x = ( π2 − θ) cos ϕ, y = ( π2 − θ) sin ϕ
be another system of local coordinates. Prove that they are an admissible
system near N .
Hint. The polar coordinates (r, ϕ), r > 0, |ϕ| ≤ π, are bad at the origin,
yet the Euclidean coordinates r cos ϕ, r sin ϕ are good.
Remark 3.18. Of course, this definition admits immediate generaliza-
tion for any dimension. We simply will not need these higher dimensional
objects, but just in case: they are called differentiable (or smooth) mani-
folds. The idea is the same: near each point they admit a local model by
a small ball in Rn , but nothing besides the smoothness (differentiability of
functions describing changes of admissible coordinates) is assumed.
In particular: on the abstract surfaces (manifolds) there are no:
• straight lines or linear subspaces, as in Rn ,
• distances, circles, spheres, lengths of curves, . . . —nothing like that
is defined.
To talk about these shapes, we need to define the necessary notions explicitly.
25.2. Tangent space(s). Despite the discouraging Remark 3.18, some
features of the Euclidean geometry can be restored in the case of abstract
surfaces. The differentiability is the key word.
Definition 3.21. A (piece of a) smooth curve on an abstract surface
M passing through a point P₀ ∈ M is a map

γ : R ⊇ (−ε, ε) → M,   t ↦ γ(t) = (x₁(t), x₂(t)),   (25.3)

such that both local coordinates x_i(t), i = 1, 2, are differentiable on (−ε, ε),
γ(0) = P₀, and the velocity vector

γ̇(t) = (ẋ₁(t), ẋ₂(t)),   ẋ_i = (d/dt) x_i(t),  i = 1, 2,   (25.4)

does not vanish: R² ∋ γ̇(0) ≠ 0.
Proposition 3.14. If γ is a smooth curve with respect to any given
system of smooth coordinates (x1 , x2 ), then it is again a smooth curve with
respect to any other system of smooth coordinates (y1 , y2 ).
Proof. By (25.1)–(25.2), the coordinates y_i will again be differentiable
functions of t as the compositions

γ : t ↦ ( f₁(x₁(t), x₂(t)), f₂(x₁(t), x₂(t)) ),   (25.5)

and the velocity vector (ẏ₁(0), ẏ₂(0)), computed by the chain rule,

ẏ_i(0) = Σ_{j=1,2} (∂f_i/∂x_j) ẋ_j(0),   (25.6)

does not vanish, since the matrices in (25.1)–(25.2) are nondegenerate and
cannot map the nonzero velocity vector (ẋ₁(0), ẋ₂(0)) to the zero vector
(ẏ₁(0), ẏ₂(0)).
Definition 3.22. The tangent space TP0 M to an abstract surface M
at a point P0 ∈ M is the vector space of all velocity vectors of all smooth
curves passing through P0 .
The tangent space is isomorphic to R2 (i.e., each vector has two real
coordinates ẋ1 (0), ẋ2 (0)), but this isomorphism depends on the chosen local
coordinates. There are two vectors that form a basis in the tangent space,
which are especially convenient to work with: they are tangent to the two
“coordinate curves”,
γ1 (t) = (t, 0), γ2 (t) = (0, t), ei = γ̇i (0) ∈ TP0 M. (25.7)
25.3. Self-maps of an abstract surface. Differential as a tangent
map. Assume that Φ : M → M is a self-map of an abstract surface M into
itself, which sends P₀ ∈ M to Q₀ = Φ(P₀) ∈ M. Let (x₁, x₂) be a local
coordinate system around P₀ and (y₁, y₂) another system around Q₀. Then
there exist two functions f₁, f₂ that describe Φ,

P = (x₁, x₂) ↦ Φ(P) = (y₁, y₂),   y_i = f_i(x₁, x₂),   f_i(0, 0) = 0.   (25.8)

We say that Φ is differentiable if the functions f₁, f₂ are differentiable near
the origin (0, 0).
Remark 3.19. One can change the local coordinates near P₀, near Q₀, or
simultaneously. Then the formulas for the functions f_i will change (one
should pre- and post-compose them with the formulas describing the coor-
dinate changes). The situation is similar to the change of the matrix of a
linear map, cf. §17.4.
This analogy is not merely superficial. If Φ is a differentiable map and
Q₀ = Φ(P₀), then Φ in a natural way defines its linearization, or
the differential: a linear map between the vector spaces T_{P₀}M and T_{Q₀}M.
In geometric terms (without referring to any local coordinates)
it is defined as follows.
Any smooth curve γ(t) passing through P₀ at t = 0 consists of
points of M, which are mapped by Φ; this yields the composition γ′ = Φ ∘ γ.
By definition, the curve γ′ is smooth and passes through the point γ′(0) =
Φ(γ(0)) = Q₀. This curve has a well-defined velocity vector.
Definition 3.23. Let Φ be a differentiable map, Q₀ = Φ(P₀). The
differential dΦ(P₀), also denoted by dΦ_{P₀}, is the linear map from T_{P₀}M to
T_{Q₀}M which maps the velocity vector v = γ̇(0) of any curve
passing through P₀ to the velocity vector v′ = γ̇′(0) of the curve γ′ = Φ ∘ γ.
If we choose the basis e_i ∈ T_{P₀}M with respect to the local coordinates
(x₁, x₂) near P₀ as in (25.7) and a basis f_i, i = 1, 2, in T_{Q₀}M with respect
to the coordinates (y₁, y₂) near Q₀, then the matrix of dΦ will be the Jacobian
2 × 2-matrix J with the entries

J = ( ∂f₁/∂x₁   ∂f₁/∂x₂
      ∂f₂/∂x₁   ∂f₂/∂x₂ )|_{x₁=x₂=0}.   (25.9)
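Definition 3.23 and the matrix (25.9) can be tested numerically: compute the Jacobian matrix of an (arbitrarily chosen) map Φ by finite differences, push a curve forward, and compare the velocity of the image curve with J applied to the original velocity:

```python
def phi(x1, x2):
    # An arbitrarily chosen differentiable self-map fixing the origin
    return (x1 + 2*x2 + x1*x2, x2 - x1**2)

def jacobian_at_origin(h=1e-6):
    # Matrix (25.9): partial derivatives of phi at (0, 0) by central differences
    j11 = (phi(h, 0)[0] - phi(-h, 0)[0]) / (2*h)
    j12 = (phi(0, h)[0] - phi(0, -h)[0]) / (2*h)
    j21 = (phi(h, 0)[1] - phi(-h, 0)[1]) / (2*h)
    j22 = (phi(0, h)[1] - phi(0, -h)[1]) / (2*h)
    return ((j11, j12), (j21, j22))

def gamma(t):
    # A smooth curve through P0 = (0, 0) with velocity (1, 2)
    return (t, 2*t + t**3)

def velocity(curve, h=1e-6):
    (a1, a2), (b1, b2) = curve(-h), curve(h)
    return ((b1 - a1) / (2*h), (b2 - a2) / (2*h))

(J11, J12), (J21, J22) = jacobian_at_origin()
v1, v2 = velocity(gamma)
image_v = velocity(lambda t: phi(*gamma(t)))   # velocity of the curve phi(gamma(t))
print((J11*v1 + J12*v2, J21*v1 + J22*v2), image_v)  # both are close to (5, 2)
```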
25.4. Notation. How to write linear and bilinear functions on the tan-
gent spaces? The standard way is to choose (and explicitly specify) a basis
in the tangent space and then use coordinates of vectors in this space after
expanding them in the basis of this space.
Choosing a local coordinate system near a point P0 dictates a natural
choice of the basis vectors: thinking of the coordinate system as a map from
M to R2 , in the image space we have the natural choice of the coordinate
basis (1, 0) and (0, 1), which corresponds to the choice of the basis vectors
e1,2 in TP0 M as in (25.7).
As we know, each choice of the basis of vectors e_i in a vector space
uniquely defines a basis ξ_i in the dual space V* (see §15) by the conditions
⟨⟨ξ_i, e_j⟩⟩ = δ_{ij}. Thus any linear functional can be expressed as a sum Σ_i c_i ξ_i.
Definition 3.24 (notation). Let P₀ ∈ M be a point on an abstract
surface and (x₁, x₂) a local coordinate system near P₀. The functionals
ξ₁, ξ₂ uniquely defined by the formulas

⟨⟨ξ_i, e_j⟩⟩ = 1 if i = j,  0 if i ≠ j,

are denoted by dx_i = ξ_i, i = 1, 2.
The advantage of this notation is in the traditional mnemonics. Assume
that a smooth map Φ : M → M sends P₀ to Q₀ and in a pair of local
coordinates (x₁, x₂) near P₀ and (y₁, y₂) near Q₀ = Φ(P₀) is given by a pair of
functions y_i = f_i(x₁, x₂), i = 1, 2, as above. Then the rule of transformation
between the linear spaces in the covector bases dx_i and dy_i, i = 1, 2, takes
a form which is easy to memorize (it follows from the identity (25.6)):

dy_i = Σ_{j=1,2} (∂f_i/∂x_j)(0) · dx_j,   i = 1, 2.   (25.10)

Involving the labels x_i explicitly refers these functionals to a chosen system
of local coordinates.
Remark 3.20. This formula in one dimension should be familiar to you:

dy = f′(0) dx,

where dx is a “tangent covector” to the real line R¹ (the only coordinate of a vector
v ∈ T₀R¹ as a vector space) at the point x = 0, dy is the tangent covector at 0 = f(0),
and f′(0) is the Jacobian 1 × 1-matrix.
This formula is a way to assign a meaning to the left hand side of the “definition”

dy/dx = f′,   y = f(x),

used by Leibniz, who assigned a mystical sense to the “infinitesimally small differences”
dx, dy. Today we continue to use his notation because it is convenient, but “solve the
mystery of the infinitesimals”. The functions dx, dy are linear functionals on the tangent
spaces T_{x₀}R and T_{y₀}R, both isomorphic to the only 1-dimensional vector space R¹,
while the genuine differences x − x₀ and y − y₀, y₀ = f(x₀), are measured along the local
coordinates on the abstract 1-dimensional “surface” (the “abstract line”) M = R¹ near x₀
and y₀ respectively.
25.5. Why is the tangent space an R-vector space? This section can be skipped
during the first reading, but the question begs to be asked. We defined abstract vector
spaces as algebraic structures whose elements can be added to each other and multi-
plied by real numbers. When we defined the tangent space T_{P₀}M as the set of the velocity
vectors of all smooth curves passing at t = 0 through P₀ (see Definition 3.22), we did not
explain how these velocity vectors can be added to each other and multiplied by
scalars from R. Of course, identifying these velocity vectors with pairs of real numbers
from R² in each local coordinate system gives a “computational” answer, but can we
describe the linear structure on the tangent spaces without referring to specific coordinates?
In fact, multiplication by a scalar is easier to explain. If γ : (−ε, ε) → M, γ(0) = P₀,
is a smooth curve with the velocity vector v = γ̇(0) ∈ T_{P₀}M, then we can consider another
curve γ′(t) = γ(λt) for λ ∈ R. This curve will also be smooth and geometrically will
represent the same curve as before, with the only exception: it will be run over at the
velocity γ̇′(t) = λγ̇(λt) (by the chain rule), so that γ̇′(0) = λv. This explains how to
multiply velocities by scalars.
Adding velocities is a more tricky business. Of course, given two curves γ′ and γ″
with the velocities v′, v″ ∈ T_{P₀}M, we can formally construct the “sum” of these curves
γ = γ′ + γ″ in each given local coordinate system by letting

γ(t) = ( x′₁(t) + x″₁(t), x′₂(t) + x″₂(t) ),

but this sum makes sense only in the chosen coordinate system (x₁, x₂). In general, there
is no way to add points of an abstract surface M.
Instead we may consider derivatives of arbitrary functions on M along different curves.
Let F : M → R be a smooth function defined in a neighborhood of the point P₀ ∈ M (note
that smoothness is a property that does not depend on the choice of the local coordinates).
Then for any smooth curve γ(t) passing through P₀ at t = 0 we can restrict F to this
curve and obtain a function of one variable F_γ(t) = F(γ(t)), again differentiable. It has a
derivative at the origin, which we denote by L_γ(F) ∈ R. In local coordinates (x₁, x₂) on
M we can compute it immediately:

L_γ(F) = ∂F/∂x₁ (P₀) ẋ₁(0) + ∂F/∂x₂ (P₀) ẋ₂(0),   γ̇(0) = (ẋ₁(0), ẋ₂(0)).   (25.11)
From this formula (or directly from the definition of the derivative) we see that the
map F ↦ L_γ(F) is a linear functional on the (infinite-dimensional) linear space of all
functions:

L_γ(F₁ + F₂) = L_γ(F₁) + L_γ(F₂),   L_γ(λF) = λ L_γ(F),   λ ∈ R.   (25.12)
Definition 3.25. The map Lγ : F 7→ Lγ (F ) is called the directional derivative or the
Lie derivative. Informally, this is the derivative of F “in the direction” determined by the
curve γ.
Remark 3.21. The Lie derivatives along the “basis curves” as in (25.7) turn into the plain partial derivatives ∂/∂x_j, j = 1, 2.
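Formula (25.11) is easy to sanity-check numerically: the derivative of F along the curve must equal the pairing of the gradient of F with the velocity. A minimal sketch (the function F and the curve are arbitrary choices, not from the text):

```python
import math

# F(x1, x2) = x1**2 + 3*x2, curve gamma(t) = (cos t, sin t), P0 = gamma(0) = (1, 0)
def F(x1, x2): return x1**2 + 3*x2
def gamma(t): return (math.cos(t), math.sin(t))

# left-hand side of (25.11): derivative at t = 0 of F restricted to the curve
h = 1e-6
lhs = (F(*gamma(h)) - F(*gamma(-h))) / (2*h)     # central difference

# right-hand side: partials of F at P0 paired with the velocity gamma'(0) = (0, 1)
dF_dx1, dF_dx2 = 2*1.0, 3.0
v1, v2 = 0.0, 1.0
rhs = dF_dx1*v1 + dF_dx2*v2

assert abs(lhs - rhs) < 1e-8                     # both equal L_gamma(F) = 3
```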
The space of all functions is infinite-dimensional and we do not know how to work
with such spaces. Yet the situation can be simplified and reduced to a finite-dimensional
case.
First, note that adding a constant to F does not affect Lγ(F). Thus from the very beginning we may consider only functions such that F(P0) = 0. The second observation follows from (25.11): the kernel of the map F ↦ Lγ(F) always contains the functions whose first order partial derivatives vanish at P0:
\[ \frac{\partial F}{\partial x_1}(P_0) = \frac{\partial F}{\partial x_2}(P_0) = 0 \;\Longrightarrow\; L_\gamma(F) = 0. \]
This observation does not depend on the choice of local coordinates! In another coordinate
system the partial derivatives will be given by two other numbers, but their simultaneous
vanishing occurs (or does not occur) in all local coordinates at the same time.
Thus we can form the quotient space
\[ T^*_{P_0}M = m_1/m_2, \qquad m_1 = \Bigl\{F : F(P_0) = 0\Bigr\}, \qquad m_2 = \Bigl\{F : F(P_0) = \frac{\partial F}{\partial x_1}(P_0) = \frac{\partial F}{\partial x_2}(P_0) = 0\Bigr\}. \tag{25.13} \]
This quotient space is a 2-dimensional vector space over R and is generated by the images x1 mod m2, x2 mod m2 of the functions F1 = x1, F2 = x2 ∈ m1 (recall that each local coordinate system consists of two functions x1(P), x2(P) defining the coordinates of a point P ∈ M).
Definition 3.26. The image of a smooth function F − c, c = F (P0 ), in the quotient
space (25.13), is called the differential of F (at P0 ) and denoted by dF (P0 ) or simply dF
when P0 is known from the context. The set TP∗0 M of differentials of all smooth functions
is called the co-tangent space (to M at P0 ). By construction, it is a vector space over R
of dimension 2 = dim M .
This definition was tailored to make trivially true the following statement.
Proposition 3.15. For any choice of a smooth curve γ, the Lie derivative F ↦ Lγ(F) defined by (25.11) on smooth functions “descends” to the quotient vector space T^*_{P0}M as a linear map (functional)
\[ L_\gamma : T^*_{P_0}M \to \mathbb R. \]
25. ABSTRACT RIEMANNIAN GEOMETRY 123
In terms of the dual spaces, each velocity vector v = γ̇(0) ∈ T_{P0}M is a linear functional on the differentials at P0:
\[ \forall v = \dot\gamma(0) \in T_{P_0}M, \quad \forall\, dF \in T^*_{P_0}M, \qquad \langle\!\langle dF, v\rangle\!\rangle = L_\gamma(F) \in \mathbb R. \tag{25.14} \]
But this means that the space of all velocities obtains the coveted structure of a vector space! We can add velocities at the point P0 by considering them as linear functionals on T^*_{P0}M and adding them as we did in linear algebra, see §15.
Remark 3.22. The differential notation is so convenient that we can extend it to the tangent spaces as well. Since any tangent vector is uniquely associated with a Lie derivative, we can use the notation ∂/∂x_j, j = 1, 2, for the partial derivatives to denote also the tangent vectors. Thus every tangent space T_{P0}M obtains a basis e_i = ∂/∂x_i, and the duality between the tangent and cotangent spaces takes the familiar form
\[ \Bigl\langle\!\!\Bigl\langle\, dx_i,\; \frac{\partial}{\partial x_j}\Bigr\rangle\!\!\Bigr\rangle = \begin{cases} 1, & i = j,\\ 0, & i \neq j. \end{cases} \]
25.6. Bilinear forms on the tangent spaces: the key element of
the construction. The hard work with notations being over, we can now
say how a Riemannian metric looks in the invariant form and how to write
it in the local coordinates.
Definition 3.27. Let M be an abstract surface. A Riemannian metric on M is a function g defined on all tangent spaces T_P M at all points, which is a symmetric bilinear form in the vector variables:
\[ g : \bigcup_{P \in M} T_P M \times T_P M \to \mathbb R, \qquad (P, u, v) \longmapsto \langle u, v\rangle_P \in \mathbb R, \quad u, v \in T_P M. \tag{25.15} \]
If (x1, x2) is a local coordinate system on M, then the differentials dx1, dx2 constitute a convenient basis in the dual space to T_P M at any point P in the area of validity of the coordinates, and any symmetric bilinear form can be written using a 2 × 2 matrix of coefficients:
\[ g = \sum_{i,j=1}^{2} g_{ij}(x_1, x_2)\, dx_i\, dx_j, \qquad \langle u, v\rangle_P = \sum_{i,j=1}^{2} g_{ij}(x_1, x_2)\, \langle\!\langle dx_i, u\rangle\!\rangle\, \langle\!\langle dx_j, v\rangle\!\rangle. \]
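In these coordinates the pairing is nothing but a matrix product: a point-dependent symmetric 2 × 2 matrix evaluated on two vectors. A minimal sketch (both example metrics are illustrative choices, not from the text):

```python
# <u, v>_P = sum_ij g_ij(P) u_i v_j : a point-dependent symmetric 2x2 matrix
def inner(gij, P, u, v):
    G = gij(*P)                      # the matrix {g_ij} evaluated at the point P
    return sum(G[i][j]*u[i]*v[j] for i in range(2) for j in range(2))

flat = lambda x1, x2: [[1.0, 0.0], [0.0, 1.0]]      # g^e = dx1^2 + dx2^2

u, v = (1.0, 2.0), (3.0, -1.0)
assert inner(flat, (5.0, 7.0), u, v) == 1*3 + 2*(-1)   # ordinary dot product

# a made-up point-dependent metric: g = dx1^2 + x1^2 dx2^2 ("polar-like")
polar_like = lambda x1, x2: [[1.0, 0.0], [0.0, x1**2]]
assert inner(polar_like, (2.0, 0.0), u, v) == 1*3 + 4.0*2*(-1)
```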
25.7. Examples. The flat Euclidean plane with the scalar product ⟨·,·⟩^e, whose quadratic form is the sum of squares of the coordinates (x, y) independently of the point, has the Riemannian metric
\[ g^e = (dx)^2 + (dy)^2. \tag{25.16} \]
The flat Euclidean three-dimensional space with the coordinates (x, y, z) has
the Riemannian metric g e = (dx)2 + (dy)2 + (dz)2 .
Figure 25. Earth as it looks in the geographic coordinates
25.7.1. Geographic coordinates on the round sphere. Let S² be the round sphere of unit radius in the Euclidean space R³ with the flat metric as above. A point with the geographic coordinates (θ, ϕ), θ ∈ [−π/2, +π/2], ϕ ∈ [−π, +π], is embedded into R³ using the formulas
\[ \begin{aligned} z &= \sin\theta \in [-1, 1],\\ y &= \cos\theta\,\sin\varphi \in [-\cos\theta, +\cos\theta] \subseteq [-1, 1],\\ x &= \cos\theta\,\cos\varphi \in [-\cos\theta, +\cos\theta] \subseteq [-1, 1]. \end{aligned} \tag{25.17} \]
Both θ, ϕ and x, y, z are differentiable functions on S², and we can compute their differentials using the chain and Leibniz rules:
\[ \begin{aligned} dz &= \cos\theta\, d\theta,\\ dy &= -\sin\theta\,\sin\varphi\, d\theta + \cos\theta\,\cos\varphi\, d\varphi,\\ dx &= -\sin\theta\,\cos\varphi\, d\theta - \cos\theta\,\sin\varphi\, d\varphi. \end{aligned} \tag{25.18} \]
Substituting these formulas into the metric (dx)² + (dy)² + (dz)² in R³, we obtain (after all trigonometric sums of squares are replaced by 1) the simple expression
\[ g(\theta, \varphi) = (d\theta)^2 + \cos^2\theta\, (d\varphi)^2. \tag{25.19} \]
What we see is the rotational symmetry of this metric under rotations around the z-axis (represented by the shifts ϕ ↦ ϕ + α mod 2π, α ∈ R).
The second term explicitly depends on the latitude: its coefficient cos²θ is equal to 1 only on the Equator θ = 0 (where the geographic coordinates are most “flat”). As a point P = (θ, ϕ) tends to one of the poles, the coefficient cos θ tends to zero, which means that on the (ϕ, θ) plane (in fact, the rectangle [−π, +π] × [−π/2, +π/2]) the horizontal coordinate looks much more stretched than its “real value” is, see Fig. 25.
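The computation (25.17)–(25.19) can be sanity-checked numerically: the ambient Euclidean speed of a curve on the sphere must equal its intrinsic speed measured by (25.19). A minimal check (the test curve is an arbitrary choice):

```python
import math

def embed(theta, phi):       # formulas (25.17)
    return (math.cos(theta)*math.cos(phi),
            math.cos(theta)*math.sin(phi),
            math.sin(theta))

# a test curve on the sphere: theta(t) = 0.3 + t, phi(t) = 2t, at t0 = 0.5
theta = lambda t: 0.3 + t
phi   = lambda t: 2.0*t

h, t0 = 1e-6, 0.5
p_plus  = embed(theta(t0 + h), phi(t0 + h))
p_minus = embed(theta(t0 - h), phi(t0 - h))
# squared ambient speed via central differences of the embedded curve
speed2_ambient = sum(((a - b)/(2*h))**2 for a, b in zip(p_plus, p_minus))

# squared intrinsic speed from g = d(theta)^2 + cos^2(theta) d(phi)^2, theta' = 1, phi' = 2
speed2_intrinsic = 1.0**2 + math.cos(theta(t0))**2 * 2.0**2

assert abs(speed2_ambient - speed2_intrinsic) < 1e-6
```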
Problem 3.21. Consider the cylindrical surface with the coordinates (ϕ, z), embedded into R³ using the formulas
\[ x = \cos\varphi, \qquad y = \sin\varphi \]
for the coordinates (x, y) (the coordinate z is kept unchanged). Compute the Riemannian metric.
Problem 3.22. The same problem for the cone embedded as
\[ x = z\cos\varphi, \qquad y = z\sin\varphi. \]
25.7.2. Projective plane. Recall that the projective plane P² is the quotient of the sphere by the equivalence relation P ∼ −P (identifying two antipodal points on the round sphere). Consider an affine chart X, Y on P² and the corresponding coordinates (θ, ϕ) on the sphere. To be more specific, consider the round sphere of radius 1 in R³ centered at the point (0, 0, 0). This sphere is tangent to the affine plane {(X, Y, 1)} ⊆ R³ at the point (0, 0, 1). The line in R³ passing through the origin and the point (X, Y, 1) crosses the sphere at the points with the Cartesian coordinates
\[ (x, y, z) \in \mathbb S^2, \qquad x = X/R, \quad y = Y/R, \quad z = 1/R, \qquad R = \pm\sqrt{1 + X^2 + Y^2}. \]
Choose the sign + to resolve the ambiguity.
The geographic coordinates of these points will be θ = arcsin z, ϕ = arctan(y/x) = arctan(Y/X). These formulas describe a one-to-one correspondence between the upper hemisphere in the geographic coordinates ϕ, θ and the projective plane with the affine coordinates X, Y. We can use these formulas to transform the spherical metric (dθ)² + cos²θ (dϕ)² to a metric on P².
One can write explicit formulas for the metric g = g^s (the superscript s stands for “spherical”) in the affine coordinates X, Y on P². They will be of the form
\[ g^s = \frac{(1 + X^2 + Y^2)\bigl((dX)^2 + (dY)^2\bigr) - \bigl(X\, dX + Y\, dY\bigr)^2}{(1 + X^2 + Y^2)^2}. \]
The advantage of this expression is its rationality (coefficients before the differentials are
rational functions of X, Y , which simplifies their study when performing rational changes of
variables, e.g., passing from one affine chart to another). We will not use this complicated
formula.
25.8. Complex coordinates.
25.9. Lengths of curves. The Riemann metric allows us to define and compute lengths of (parameterized) curves on abstract Riemann surfaces. If γ : [0, 1] → M is a parameterized differentiable curve, then at each point it has the velocity vector v(t) ∈ T_{γ(t)}M whose length (norm) is by definition equal to |v(t)|_{γ(t)} = √⟨v(t), v(t)⟩_{γ(t)} ∈ R. Integrating over time, we get the number
\[ |\gamma| = \int_0^1 \sqrt{\langle v(t), v(t)\rangle_{\gamma(t)}}\; dt. \tag{25.20} \]
In coordinates, if γ : t ↦ (x1(t), x2(t)), then v = (ẋ1(t), ẋ2(t)) and, using the explicit form of the Riemannian metric, we have
\[ |\gamma| = \int_0^1 \sqrt{g(t)}\; dt, \qquad g(t) = \sum_{i,j=1}^{2} g_{ij}\bigl(x_1(t), x_2(t)\bigr)\, \dot x_i(t)\, \dot x_j(t) \geq 0. \tag{25.21} \]
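As an illustration of (25.21), here is a minimal numeric length computation on the sphere with the metric (25.19); the parallel θ = π/4 is an arbitrary choice:

```python
import math

# length of the parallel theta = pi/4 on the unit sphere,
# metric g = d(theta)^2 + cos^2(theta) d(phi)^2
# curve: gamma(t) = (pi/4, 2*pi*t), t in [0, 1]; theta' = 0, phi' = 2*pi
theta0 = math.pi/4
n = 100_000
dt = 1.0/n
length = sum(math.sqrt(0.0**2 + math.cos(theta0)**2 * (2*math.pi)**2) * dt
             for _ in range(n))

# the integrand is constant, so the answer is 2*pi*cos(pi/4) exactly
assert abs(length - 2*math.pi*math.cos(theta0)) < 1e-9
```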
Definition 3.28. A curve γ is called a geodesic segment, if it has the shortest length among all curves with the same endpoints:
\[ \forall \gamma' : [0, 1] \to M, \qquad \gamma'(0) = \gamma(0),\ \gamma'(1) = \gamma(1) \;\Longrightarrow\; |\gamma'| \geq |\gamma|. \]
A curve γ : U → M, U ⊆ R, is called a geodesic curve, if for every point t0 ∈ U there exists ε > 0 such that the piece γ|_{[t0, t1]} is a geodesic segment for all t1 such that |t0 − t1| < ε.
The first part of this definition literally repeats the construction for surfaces in R³. The second part simply means that the notion of a geodesic curve is local: the property of being geodesic can be verified near each point of the curve, which must be locally shortest, while globally the curve need not be the shortest⁷. This was the case with arcs of large circles on the sphere longer than π.
Definition 3.29. The angle between two smooth curves γ, γ′ crossing each other at a point P ∈ M is the arccosine of the scalar product ⟨v, v′⟩_P of their unit tangent vectors (velocities) v, v′ ∈ T_P M, computed in T_P M.
Definition 3.30. The distance function dist : M × M → R₊ assigns to any two points the length of the shortest geodesic segment connecting them.
Thus, any abstract Riemann surface is automatically a metric space in
the sense of Definition 1.2.
25.10. Riemann surfaces and metric spaces: where is the difference? The distance function in any local coordinate system (x1, x2) on M is a function of four real arguments,
\[ \operatorname{dist}(P, Q) = \operatorname{dist}(x_1, x_2;\, y_1, y_2), \qquad \text{if } P = (x_1, x_2),\ Q = (y_1, y_2), \]
symmetric under swapping the two pairs between themselves. On the other hand, the Riemann metric is also a function of four arguments, but these arguments are of different nature: two of them are still coordinates of the point, and the two others are coordinates of the tangent vector.
Consider the function
\[ D(x; v) = \lim_{\varepsilon \to 0^+} \frac{1}{\varepsilon}\, \operatorname{dist}(x_1, x_2;\, x_1 + \varepsilon v_1,\, x_2 + \varepsilon v_2), \qquad x = (x_1, x_2), \quad v = (v_1, v_2), \tag{25.22} \]
assuming that this limit exists for all (v1, v2) ∈ R² (if the notion of the derivative comes to your mind, your intuition is correct). Note that the right hand side involves the distance between two “infinitesimally close points” (x1, x2) and (x1 + εv1, x2 + εv2) for 0 < ε ≪ 1.
Then from the axioms of distance you can derive the following properties
of the function D:
⁷A terminological catch: let γ : R → M be a geodesic curve passing through two points, P1 = γ(t1) and P2 = γ(t2), t1 < t2. What name should be used for the finite piece γ : [t1, t2] → M? The accurate name would be “a segment of the geodesic curve”, but it is way too long. Instead, the natural language asks for a permission to abbreviate it to “the geodesic segment”, but it might not be the shortest curve connecting P1 and P2. I don't know how to deal with this problem.
(1) D(x; v) ≥ 0, and D(x; v) = 0 if and only if v = 0.
(2) D(x; λv) = |λ| D(x; v), including the symmetry D(x; −v) = D(x; v).
(3) The set U_x = {v ∈ R² : D(x; v) ≤ 1} ⊆ R² is a convex set.
These properties follow respectively from the nonnegativity of the distance,
its symmetry and the triangle inequality.
In general, the distance function may be “non-differentiable” in the sense that the limit in (25.22) may fail to exist. But even if it exists, there are still plenty of possibilities (one can construct distance functions with any collection of convex centrally symmetric sets U_x ⊆ R², x ∈ M).
Which distance functions appear if we use a Riemannian metric g to construct them? The following two properties are necessary and sufficient:
(1) for any x ∈ M the square of the function D(x; ·) is a positive definite quadratic form, that is, the set U_x ⊆ R² is an ellipse defined by an equation ⟨G(x)v, v⟩ = 1 with a symmetric positive 2 × 2 matrix G(x) = {g_ij(x)}, i, j = 1, 2, depending on x;
(2) the coefficients g_ij(x) depend on x ∈ M in a smooth way.
These conditions mean that the vector space R2 with the “infinitesimal
distance” D is a Euclidean vector space in the sense of Definition 2.17, see
§16.6.
25.11. Isometries: local description. The most general definition of an isometry Φ of a metric space M to itself is preservation of the distance:
\[ \operatorname{dist}\bigl(\Phi(P), \Phi(Q)\bigr) = \operatorname{dist}(P, Q) \qquad \forall P, Q \in M. \]
On the (abstract) Riemann surfaces one can construct an infinitesimal ver-
sion of this definition. Informally, an isometry should preserve infinitesimal
distances near the point P turning them into infinitesimal distances near
Q = Φ(P ).
To do it formally, recall (see §25.3) how an arbitrary map Φ which acts on points of M (and hence on pairs of close points, curves in M, etc.) acts on (transforms) the tangent vectors at different points. This action is quite natural: if γ : (−ε, ε) → M is a smooth curve passing through a point P = γ(0), then Φ ◦ γ = γ′ : (−ε, ε) → M will be another curve passing through Q = γ′(0) = Φ(P). The velocities of the curves, v = γ̇(0) ∈ T_P M and v′ = γ̇′(0) ∈ T_Q M, are related by the chain rule of derivation: v′ = Φ′_P v, where Φ′_P : T_P M → T_Q M is a linear operator between the two tangent spaces.
In coordinates, the formulas coincide with the formulas that were once derived for the change of variables. Assume that (x1, x2) is a local coordinate system near P and (y1, y2) another such system near Q (we do not exclude the case where Q = P, but in general the domains of these coordinates are incomparable). Then y1, y2 are smooth functions of x1, x2 (and vice versa, as we can be assured if Φ is an isometry). Then the matrix of Φ′ in the natural pair of bases introduced in §25.3 is the Jacobi matrix
\[ J(x_1, x_2) = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2}\\[2ex] \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} \end{pmatrix}, \]
evaluated at P. For us it is more important to compute how the covectors dx_{1,2} and dy_{1,2} are transformed. The formulas are even easier to memorize:
\[ dy_i = \sum_{k=1}^{2} \frac{\partial y_i}{\partial x_k}\, dx_k, \qquad i = 1, 2, \tag{25.23} \]
(the formula is written on purpose in a way that stresses that it works in any number of dimensions).
So what happens with the expression for the Riemannian metric? Consider this metric near a point Q, that is, written with respect to the y-coordinates:
\[ g_Q = \sum_{i,j=1}^{2} g_{ij}(y_1, y_2)\, dy_i\, dy_j. \]
Then the substitution y = y(x), which is equivalent to the action of an arbitrary smooth map Φ, yields the metric near P, written with respect to the x-coordinates:
\[ g_P = \sum_{i,j=1}^{2} g_{ij}\bigl(y_1(x_1, x_2),\, y_2(x_1, x_2)\bigr) \Bigl(\sum_{k=1}^{2} \frac{\partial y_i}{\partial x_k}\, dx_k\Bigr) \Bigl(\sum_{l=1}^{2} \frac{\partial y_j}{\partial x_l}\, dx_l\Bigr). \tag{25.24} \]
This awful-looking formula (alas, all formulas in geometry are awful-looking; the only salvation comes from the fact that you need to check and apply them only once) explains how the Riemann metric is transformed.
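Formula (25.24) can be seen in action with a familiar change of variables. The sketch below pulls the flat metric back through polar coordinates (an illustrative choice, not an example from the text) and recovers dr² + r² dϕ²:

```python
import math

# pull back the flat metric g_Q = dy1^2 + dy2^2 through y1 = r*cos(phi), y2 = r*sin(phi)
# using (25.24): (g_P)_kl = sum_ij g_ij * (dy_i/dx_k) * (dy_j/dx_l)
def jacobian(r, phi):
    return [[math.cos(phi), -r*math.sin(phi)],    # dy1/dr, dy1/dphi
            [math.sin(phi),  r*math.cos(phi)]]    # dy2/dr, dy2/dphi

def pullback(r, phi):
    J = jacobian(r, phi)
    G = [[1.0, 0.0], [0.0, 1.0]]                  # g_ij of the flat metric
    return [[sum(G[i][j]*J[i][k]*J[j][l] for i in range(2) for j in range(2))
             for l in range(2)] for k in range(2)]

gP = pullback(2.0, 0.7)
# expected: dr^2 + r^2 dphi^2, i.e. the matrix diag(1, r^2) with r = 2
assert abs(gP[0][0] - 1.0) < 1e-12
assert abs(gP[1][1] - 4.0) < 1e-12
assert abs(gP[0][1]) < 1e-12 and abs(gP[1][0]) < 1e-12
```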
Definition 3.31 (obvious). A smooth map Φ is an isometry of the Riemannian metric, if for any point P and its image Q the expression (25.24) for g_P coincides with what it should be,
\[ g_P = \sum_{i,j=1}^{2} g_{ij}(x_1, x_2)\, dx_i\, dx_j. \]
Remark 3.23. In the invariant terms this means that the linear operator Φ′ : T_P M → T_Q M, Q = Φ(P), is an isometry between the two scalar products:
\[ \forall P \in M, \quad \forall u, v \in T_P M, \qquad \langle \Phi' u,\, \Phi' v\rangle_Q = \langle u, v\rangle_P, \qquad Q = \Phi(P). \tag{25.25} \]
Remark 3.24. An attentive reader will notice a strange phenomenon about the way substitutions work in the formulas involving derivatives.
A map Φ : M → M takes a point P together with its small neighborhood to the point Q = Φ(P) (also with its neighborhood). We call this direction the forward direction. What else does this map do?
Obviously, its linearization Φ′ maps the tangent spaces, Φ′ : T_P M → T_Q M, acting by the Jacobian matrix. What else?
It maps functions defined near Q to functions defined near P by the “simple substitution”. If f = f(y) is a function defined near Q, that is, written in the local coordinates y1, y2, then there is an obviously and uniquely defined function g = Φ*f, the composition g = f ◦ Φ, which in the coordinates x1, x2 near P takes the form
\[ g(x_1, x_2) = g(x) = f(\Phi(x)) = f\bigl(\Phi_1(x_1, x_2),\, \Phi_2(x_1, x_2)\bigr), \]
where y_i = Φ_i(x_1, x_2), i = 1, 2.
Note that the operator Φ* : f ↦ f ◦ Φ is a linear operator on the (infinite-dimensional) space of functions: it maps sums of functions into the sums of their images and commutes with the multiplication by a constant λ ∈ R. This operator acts in the backward direction, opposite to the direction of the map Φ and its differential Φ′. For this reason Φ* is sometimes called the pull-back operator associated with the map Φ.
For the same reason the pull-back Φ* acts on covectors dy_i and their linear combinations with variable coefficients. The formulas are obvious:
\[ \Phi^* dy_i = d(\Phi^* y_i) = d\Phi_i(x_1, x_2) = \frac{\partial \Phi_i}{\partial x_1}\, dx_1 + \frac{\partial \Phi_i}{\partial x_2}\, dx_2, \qquad i = 1, 2. \tag{25.26} \]
This is again a pull-back.
This observation makes it easy to control in which direction different types of geometric objects are moved. Points and tangent vectors are pushed forward, functions and “differential forms” are pulled back.
Example 3.21. On the Euclidean flat plane we have g e = dx21 + dx22 .
Any translation Φ : (x1 , x2 ) 7→ (x1 + c1 , x2 + c2 ) is an isometry, because
dyi = dxi .
If Φ is a linear map Φ : x ↦ Ax, where A is a matrix, then
\[ \begin{pmatrix} y_1\\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1\\ x_2 \end{pmatrix}, \qquad\text{hence}\qquad \begin{pmatrix} dy_1\\ dy_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} dx_1\\ dx_2 \end{pmatrix}, \]
since the coefficients a_ij ∈ R are constants. Thus we see that the map is an isometry if and only if the matrix A is orthogonal,
\[ \forall u, v \in \mathbb R^2, \qquad \langle Au, Av\rangle = \langle u, v\rangle, \]
where ⟨·,·⟩ is the standard scalar product in R².
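The orthogonality criterion is easy to test numerically; a small sketch (the rotation angle, the shear counterexample, and the vectors are arbitrary choices):

```python
import math

# a rotation matrix is orthogonal, hence an isometry of g^e = dx1^2 + dx2^2
a = 0.9
A = [[math.cos(a), -math.sin(a)],
     [math.sin(a),  math.cos(a)]]

def apply(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

u, v = (1.0, 2.0), (-3.0, 0.5)
assert abs(dot(apply(A, u), apply(A, v)) - dot(u, v)) < 1e-12   # <Au, Av> = <u, v>

# a non-orthogonal (shear) matrix fails the same test
S = [[1.0, 1.0], [0.0, 1.0]]
assert abs(dot(apply(S, u), apply(S, v)) - dot(u, v)) > 1e-3
```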
For the round sphere in geographic coordinates the Riemannian metric (25.19) is given by the formula g^s = dθ² + cos²θ dϕ², and we see that the rotations around the polar axis (corresponding to the translations ϕ ↦ ϕ + α mod 2π) are isometries. Of course, there are many more isometries, but they are difficult to discover in the geographic coordinates.
Example 3.22 (flat torus). Consider the unit circle U with the coordinate ϕ mod 2π. Its Cartesian square U × U, also known as the 2-dimensional flat torus, can be equipped with the Riemannian metric g^t = dϕ₁² + dϕ₂² in the natural coordinates (ϕ1, ϕ2). This torus is a “flat” Riemann surface (in the same sense as the Euclidean plane is flat), and the Riemannian metric g^t is invariant under all translations (ϕ1, ϕ2) ↦ (ϕ1 + α1, ϕ2 + α2).
25.12. So what? Introducing the concept of an abstract Riemann surface, we have come full circle from the Cartesian model of the Euclidean plane (see §7) to the much more general concept of a geometry that is ready to serve inhabitants of curved surfaces in R³ as well as eyeless extraterrestrials who need only accurate prescriptions of how to draw “lines” and “circles”.
Indeed, we have replaced the two main instruments of the Greeks, the ruler and the compass, by two powerful gadgets (call them tools, constructions, or definitions if you like).
The “generalized ruler” is a mathematical tool (algorithm⁸) that allows
a Riemann citizen to draw an equivalent of the straight line, called the
geodesic curve, through each point and (tangent to) each direction (specified
by a tangent vector at this point). Of course, the mere possibility of an
unlimited continuation is not guaranteed unless we impose some restrictions
on the Riemann surface. Moreover, the further behavior of the geodesic
curve may be very unpredictable: on the ideal round sphere S2 any such
curve closes up becoming a large circle. However, if we replace the sphere by
an ellipsoid {ax2 +by 2 +cz 2 = 1} with general positive coefficients a, b, c > 0,
then the geodesic curves will behave in an amazingly complicated manner,
see Fig. 26. Note that on each plot only one geodesic curve is plotted, which
winds many times around the ellipsoid!
The “generalized compass” is another “gadget”, which allows us first to define the distance function dist(P, Q) on the Riemann surface as the length of the shortest geodesic segment connecting P, Q ∈ M. It understandably depends on how well we have mastered the “ruler”, but theoretically we can speak freely about the “geodesic circles” of various radii in M, centered at different points.
Having regained the two main tools of the Greeks, one is obviously
tempted to try and develop various “alternative geometries”. There is an
unlimited supply of different egg-shaped surfaces in R3 , and each one has
an intrinsic geometry as individual as an egg-shaped surface could be.
What is really interesting and challenging is to find Riemann surfaces
which have a rich group of isometries, differentiable self-maps which preserve
the geodesic distance.
It turns out that there is one really remarkable example of a Riemann surface with a very rich group of isometries, comparable to the groups of isometries of the Euclidean plane R² and the round sphere S². This example is called the Lobachevsky plane, or the hyperbolic plane. The geodesic curves of this plane obey all the axioms of straight lines in the
⁸The “generalized ruler” in general takes the form of a second order ordinary differential equation involving second derivatives of x(t) and y(t) with respect to t. This equation is very complicated and can be explicitly solved only in very exceptional cases.
Figure 26. Geodesic curves on non-spherical ellipsoids.
Euclidean plane, with only one exception. There is a whole bunch of lines
parallel to a given “line” (geodesic curve) through a point off this line.
26. Riemann surfaces of negative curvature
Now we have prepared everything needed to describe really new examples of Riemann surfaces. Most of the statements below will be given without proofs: they can be obtained by rather straightforward computations for which we have no time.
26.1. The Lobachevsky (hyperbolic) plane. Consider the upper half-plane
\[ H \subseteq \mathbb R^2, \qquad H = \{(x, y) : y > 0\}. \]
The tangent space T_P H at any point P = (p, q) ∈ R², q > 0, is isomorphic to R² = {(v, w)} as a vector space.
Denote by ⟨·,·⟩_P the twisted scalar product, which differs from the standard (Euclidean) scalar product ⟨v, w⟩^e = ⟨v, w⟩ = v₁w₁ + v₂w₂ on R² by a scalar factor,
\[ \forall v, w \in T_P H \qquad \langle v, w\rangle_P = \frac{1}{y^2}\, \langle v, w\rangle^e \qquad \text{for } P = (x, y). \tag{26.1} \]
Remark 3.25. Another, essentially the same, example is as follows. Consider the unit disk D = {x² + y² < 1} on the plane R². For P ∈ D define the scalar product
\[ \forall v, w \in T_P D \qquad \langle v, w\rangle_P = \frac{4}{\bigl(1 - (x^2 + y^2)\bigr)^2}\, \langle v, w\rangle^e \qquad \text{for } P = (x, y). \tag{26.2} \]
Definition 3.32. The Riemann surface H is called the Lobachevsky
plane, or the hyperbolic plane. The Riemann surface D is called the Poincaré
disk. Note that in both cases the boundaries {y = 0}, resp., U = {x2 + y 2 =
1}, both called the absolute, are not included in the surfaces!
Remark 3.26 (important). In both examples it is easier (and morally
more correct) to use the complex notation z = x + iy ∈ C instead of the
real coordinates (x, y). Then each tangent space, say, Tw D, will become
isomorphic to the complex line C (as usual, a different copy of C should be
used for different points w ∈ D). Denoting the tangent vector by u ∈ T_w D, we can replace the real scalar product ⟨·,·⟩_w on R² ≃ C by the “variable absolute value” |u|_w. This “absolute value” will be given by the formulas
\[ |u|_w = \frac{2\,|u|}{1 - |w|^2} \quad \text{for } D, \qquad\text{resp.,}\qquad |v|_z = \frac{|v|}{\operatorname{Im} z} \quad \text{for } H. \tag{26.3} \]
Here z stands for a point in H, v ∈ T_z H, resp., w ∈ D and u ∈ T_w D.
Note that in both cases the length of a tangent vector differs from the standard length by a positive nonzero coefficient, that is, the unit circle in the Riemannian sense |u|_w = const again looks like a circle on the plane C (and not an ellipse, as it would in the general case). This property is called conformality of the respective Riemann metrics: though the lengths are distorted, the angles between the curves are preserved.
Problem 3.23. The unit disk D and the upper half-plane H look completely different as subsets of the real plane R². However, as subsets of the complex plane C they are related by a very simple formula:
\[ z \longmapsto \Phi(z) = \frac{z - i}{z + i}. \tag{26.4} \]
(This map takes the real line {Im z = 0} bounding H to the boundary {|w| = 1} of the unit disk D.) Prove that Φ is a bijection between H and D.
Remark 3.27. Both H and D are parts of the complex plane C, so one can (abusing the language) speak about their “boundaries”, the absolutes {Im z = 0} and {|z| = 1}. These boundaries should be considered as collections of “ideal” points, the points “at infinity”.
Talking about the points “on the absolute” is a convenient way to avoid tedious arguments with limits. This abuse of language should not be confused with the “infinity” in the projective spaces. Sorry for the inconsistency of the language.
Problem 3.24. Prove that the map Φ as above is an isometry between
the Riemann surfaces H and D with the Riemannian metrics introduced
above.
Solution. The (complex) derivative of the map Φ is
\[ \Phi'(z) = \frac{1}{z + i} - \frac{z - i}{(z + i)^2} = \frac{2i}{(z + i)^2}, \qquad |\Phi'(z)| = \frac{2}{|z + i|^2}. \]
The tangent spaces T_z H and T_w D, w = Φ(z), are both isomorphic to C, and for a tangent vector v ∈ T_z H its length (the complex modulus) |v| is multiplied⁹ by |Φ′(z)|: |u| = |Φ′(z)| · |v|. This equality between the Euclidean
⁹Recall that the differential of Φ at a point z is a linear map dΦ_z : C → C, v ↦ αv, where α ∈ C is the complex derivative of Φ at the point z. This linear map is a dilatation (stretch) by |α| accompanied by the rotation by Arg α.
lengths should be translated into the equality between the hyperbolic lengths computed at the points z and w = (z − i)/(z + i) respectively, using (26.3):
\[ \frac{1 - |w|^2}{2}\,|u|_w = |u| = |\Phi'(z)|\cdot|v| = \frac{2}{|z + i|^2}\cdot|v| = \frac{2}{|z + i|^2}\cdot(\operatorname{Im} z)\,|v|_z. \]
We leave it to the reader to verify that after all simplifications the above
condition reduces to the equality
|u|w = |v|z , w = Φ(z)
which exactly means that the map Φ is an isometry between H and D.
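A numerical spot-check of this isometry (the point and the tangent vector are arbitrary choices; the disk length is taken with the normalization |u|_w = 2|u|/(1 − |w|²), which is what makes the hyperbolic lengths on H and D match):

```python
# check that Phi(z) = (z - i)/(z + i) is an isometry H -> D:
# the hyperbolic length |v|_z = |v|/Im z must equal |u|_w = 2|u|/(1 - |w|^2)
# for the image vector u = Phi'(z)*v
z = 0.3 + 1.7j               # an arbitrary point of H (Im z > 0)
v = 2.0 - 0.5j               # an arbitrary tangent vector at z

w = (z - 1j)/(z + 1j)
dPhi = 2j/(z + 1j)**2        # the complex derivative computed in the solution
u = dPhi*v

hyp_len_H = abs(v)/z.imag
hyp_len_D = 2*abs(u)/(1 - abs(w)**2)
assert abs(hyp_len_H - hyp_len_D) < 1e-12
```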
By Problem 3.24, the two examples above describe the same Riemann surface. We will explain how they define a “non-Euclidean plane”, in which the Fifth postulate of Euclid fails. We will mostly be working with the upper half-plane H, where the computations are a bit easier; the Poincaré disk D, on the other hand, is more symmetric (rotationally).
26.2. Isometries of the hyperbolic plane. Consider transformations of the form
\[ H \ni z \overset{\Phi}{\longmapsto} \frac{az + b}{cz + d} \in H, \qquad \det\begin{pmatrix} a & b\\ c & d \end{pmatrix} = 1, \qquad a, b, c, d \in \mathbb R. \tag{26.5} \]
These transformations obviously map the real line R = {Im z = 0} ⊆ C into itself (each is a projective map in the sense of Definition 3.9) and the upper half-plane into itself.
Theorem 3.16. The maps of the form (26.5) are isometries of the hy-
perbolic plane.
Proof. The fractional linear map Φ : H → H is obviously complex analytic: this means that its differential at a point z0 ∈ H is a complex multiplication T_{z0}H → T_{z1}H, z1 = Φ(z0), by the complex number λ = Φ′(z0). Computing the derivative, we obtain (using the identity ad − bc = 1)
\[ \lambda = \Phi'(z_0) = \frac{a}{cz_0 + d} - \frac{c\,(az_0 + b)}{(cz_0 + d)^2} = \frac{1}{(cz_0 + d)^2}. \]
The modulus of this number is |λ| = |cz0 + d|⁻², and it should be compared with the ratio of Im z0 and Im Φ(z0). We leave it to the reader to check that
\[ \operatorname{Im}\Phi(z_0) = \frac{\operatorname{Im} z_0}{|cz_0 + d|^2}, \qquad \Phi(z_0) = \frac{az_0 + b}{cz_0 + d}, \]
taking into account that a, b, c, d are real and ad − bc = 1; together with |λ| = |cz0 + d|⁻² this shows that the hyperbolic lengths of tangent vectors are preserved. Compare this argument with the solution of Problem 3.24!
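A numeric spot-check of the proof for one concrete matrix with ad − bc = 1 (the matrix, the point, and the tangent vector are arbitrary choices):

```python
# Theorem 3.16 for one map: Phi(z) = (az + b)/(cz + d), ad - bc = 1
a, b, c, d = 2.0, 3.0, 1.0, 2.0          # det = 2*2 - 3*1 = 1
z = -0.4 + 0.9j                           # an arbitrary point of H
v = 1.0 + 2.0j                            # a tangent vector at z

Phi  = (a*z + b)/(c*z + d)
dPhi = 1.0/(c*z + d)**2                   # derivative, using ad - bc = 1
u = dPhi*v

# H is mapped to H, and the hyperbolic length |v|_z = |v|/Im z is preserved
assert Phi.imag > 0
assert abs(abs(u)/Phi.imag - abs(v)/z.imag) < 1e-12
```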
Remark 3.28. There is another transformation that obviously preserves H and is an isometry. This is the reflection in the axis {x = 0}, which can be written as
\[ \Psi : H \to H, \qquad z \mapsto \Psi(z) = -\bar z. \tag{26.6} \]
This map is non-analytic as a function of z, but it is nevertheless an isometry of H; it changes the orientation of all tangent spaces in the same way as the map z ↦ z̄ changes the orientation of the complex plane C.
Theorem 3.17. If L is a half-circle in H orthogonal to the absolute (i.e.,
with the center on the absolute), or a straight line orthogonal to the absolute,
and Φ is a hyperbolic map as above, then the image Φ(L) is again in the
same category (orthogonal to the absolute).
26.3. Digression: the structure of the group of isometries of H. This section is a brief summary of Dmitry's lectures.
Transformations of the complex plane of the form
\[ z \longmapsto \frac{az + b}{cz + d}, \qquad a, b, c, d \in \mathbb C, \quad ad - bc \neq 0, \tag{26.7} \]
form a group (i.e., the composition of two such transformations is again of the same form) which is called the Möbius group and denoted¹⁰ by Möb. In fact, it is the group of projective self-maps of the complex projective line P¹(C), cf. Definition 3.9 of the real projective transformations.
The real line is a rather dull object. On the contrary, the complex projective line (also known as the Riemann sphere, which is obtained by adding a single point z = ∞ to the complex line C = C¹) admits a rich geometry. In particular, the projective maps are circular in the following sense.
Theorem 3.18. Any map Φ ∈ Möb transforms a line or a circle in C into a line or
a circle.
The proof of this theorem follows from the following result describing the structure of the Möbius group.
Theorem 3.19. The Möbius group is generated (via compositions) by homotheties
z 7→ az, 0 6= a ∈ C (including rotations when |a| = 1), shifts z 7→ z + b, b ∈ C and the
inversion z 7→ 1/z.
The circular property obviously holds for homotheties and translations. For the in-
version it is sufficient to check it for circles |z − c| = r, c ∈ R (an easy computation, do
it!).
The Möbius group is rather rich: one can see that any point z0 ∈ H can be mapped to any other point z1 ∈ H by a Möbius transformation.
The group of isometries of the hyperbolic plane consists of the Möbius maps that preserve the real axis Im z = 0, the boundary of H ⊆ C in C (recall that from inside H we don't “see” this boundary).
26.4. Digression: spherical geometry on C. The Riemann sphere. Consider the complex plane C (which should be called the complex line C¹ if considered over the field of complex numbers C). Then one can define on it the Riemann metric by the following formula:
\[ \langle v, w\rangle_z = \frac{1}{(1 + |z|^2)^2}\, \langle v, w\rangle^e, \qquad z \in \mathbb C, \quad v, w \in T_z\mathbb C. \tag{26.8} \]
¹⁰The Möbius group consists of the fractional linear maps (26.5) without the requirement that the entries of the matrices be real numbers. The reality condition in (26.5) is required to map the absolute to itself. Note that the isometry (26.4) is an element of the Möbius group which sends one absolute, {Im z = 0}, to another, {|z| = 1}.
Figure 27. Isometry between C and S2
The corresponding length takes the form
\[ |u|_z = \frac{|u|}{1 + |z|^2}, \qquad z \in \mathbb C, \quad u \in T_z\mathbb C. \]
This metric actually admits an extension to the infinite point if we consider C as an affine chart on the complex projective line CP¹ = P¹(C). Indeed, choose a different chart w = z⁻¹ and consider the map Φ(z) = 1/z, whose derivative is −1/z². For the image v = Φ′(z)u of a tangent vector u, the length in the new chart is
\[ |v|_w = \frac{|v|}{|w|^2}\cdot\frac{1}{1 + |w|^{-2}} = \frac{|v|}{|w|^2 + 1}, \qquad w = 1/z. \]
In other words, the inversion z ↦ 1/z is an isometry of the metric (26.8), and this metric extends to the complex projective line (topologically a sphere).
Example 3.23. We give here the same computation in the notation involving the differentials, see §??. The Riemann metric is given by the formula
\[ g(z) = \frac{|dz|}{1 + |z|^2}, \]
and after the change of variables z = 1/w, |z| = |w|⁻¹, we have
\[ dz = -\frac{1}{w^2}\, dw, \qquad |dz| = |w|^{-2}\,|dw|, \qquad g(w) = \frac{|dw|}{|w|^2\,(1 + |w|^{-2})} = \frac{|dw|}{|w|^2 + 1}. \]
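The same invariance can be spot-checked numerically (the point and the tangent vector are arbitrary choices):

```python
# check that the inversion z -> 1/z preserves the spherical length |u|/(1 + |z|^2)
z = 1.5 - 0.8j
u = 0.7 + 2.0j                 # a tangent vector at z

w = 1/z
v = (-1/z**2)*u                # image of u under the derivative of z -> 1/z

assert abs(abs(u)/(1 + abs(z)**2) - abs(v)/(1 + abs(w)**2)) < 1e-12
```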
Problem 3.25. Use the same technique to prove all assertions from §26.1.
In the picture below we show how to construct graphically an isometry between the
round sphere S² and the projective line with the metric (26.8).
Problem 3.26. Write down an explicit formula for the stereographic projection de-
fined by this picture and prove that it is indeed an isometry.
Remark 3.29. One can verify that the isometries of the spherical metric on CP¹ form
a subgroup of the Möbius group Möb introduced in §26.3, the group of complex fractional
linear maps

z ↦ (az + b)/(cz + d),   a, b, c, d ∈ C, ad − bc = 1.
136 3. NON-EUCLIDEAN GEOMETRIES
In particular, among the affine maps z ↦ az + b the isometries are exactly the rotations
z ↦ az with |a| = 1; translations z ↦ z + b with b ≠ 0 preserve |dz| but change the factor
1 + |z|², so they are not isometries of this metric. If one allows reversing the orientation,
the symmetry z ↦ z̄ is also an isometry.
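A quick numerical experiment (our own sketch, not part of the text) shows which of the simplest Möbius maps preserve the spherical length |u|_z = |u|/(1 + |z|²) of a tangent vector u at the point z:

```python
# Test numerically which simple maps preserve the spherical length
# |u|_z = |u| / (1 + |z|^2); each map acts on z and on u by its derivative.
import cmath

def sph_len(z: complex, u: complex) -> float:
    return abs(u) / (1 + abs(z) ** 2)

z, u = 0.8 - 0.4j, 0.3 + 1.1j
a = cmath.exp(0.9j)                              # |a| = 1

rotation    = sph_len(a * z, a * u)              # z -> a z, derivative a
inversion   = sph_len(1 / z, -u / z ** 2)        # z -> 1/z, derivative -1/z^2
conjugation = sph_len(z.conjugate(), u.conjugate())  # z -> conj(z)
translation = sph_len(z + 0.5, u)                # z -> z + 1/2, derivative 1

assert abs(rotation - sph_len(z, u)) < 1e-12     # isometry
assert abs(inversion - sph_len(z, u)) < 1e-12    # isometry
assert abs(conjugation - sph_len(z, u)) < 1e-12  # isometry
assert abs(translation - sph_len(z, u)) > 1e-3   # not an isometry
```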
26.5. Geodesic curves on H. If Φ : H → H is an isometry of the
hyperbolic plane and γ is a geodesic curve, then Φ(γ) is also a geodesic
curve. This trivial observation (prove it, it is a simple theorem!) allows us
to construct lots of geodesic curves in H.
Proposition 3.20. Any vertical line {x = c ∈ R, y > 0} is a geodesic
curve.
Example 3.24. Consider the path γ : [0, 1) → H defined by the law of
motion x(t) = 0, y(t) = 1 − t. It starts at t = 0 at the point (0, 1) and
"reaches the absolute" at the moment t = 1, which seems impossible (one
cannot escape "to infinity" in finite time). The explanation is that the
hyperbolic speed (the hyperbolic length of the velocity vector) grows without
bound as t → 1; let us compute the hyperbolic length of this ray.
The visible (Euclidean) velocity vector of this motion is constant, |γ̇(t)|ᵉ ≡ 1,
so the visible (Euclidean) length of the path is given by the integral

|γ|ᵉ = ∫₀¹ 1 dt = 1.
The hyperbolic length, however, differs by the factor 1/y = 1/(1 − t), that
is, it is given by the integral

|γ|ʰ = ∫₀¹ dt/(1 − t) = +∞.

The integral above diverges: if we replace [0, 1) by [0, 1 − ε] and then pass
to the limit, we get

∫₀^{1−ε} dt/(1 − t) = −ln(1 − t) |₀^{1−ε} = −ln ε → +∞ as ε → 0.
In other words, we have proved that the length of the vertical "hyperbolic
ray" starting at (0, 1) is infinite, and so is the length of any other geodesic
line.
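The divergence can also be seen numerically; the sketch below (our own construction, a midpoint Riemann sum) confirms that the hyperbolic length of the truncated path over [0, 1 − ε] equals −ln ε:

```python
# Midpoint Riemann sum for the hyperbolic length of the truncated path,
# which should approach -ln(eps) as computed above.
import math

def hyperbolic_length(eps: float, n: int = 200000) -> float:
    """Approximate the integral of dt/(1 - t) over [0, 1 - eps]."""
    h = (1 - eps) / n
    return sum(h / (1 - (i + 0.5) * h) for i in range(n))

for eps in (1e-1, 1e-2, 1e-3):
    assert abs(hyperbolic_length(eps) - (-math.log(eps))) < 1e-3
```

The length grows without bound (about 6.9 already for ε = 10⁻³), matching the divergence of the integral.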
Proof. This follows from the fact that the projection of any vector
w ∈ T_z H onto the vertical line, say, {Re z = 0} ⊆ H (which keeps the
y-component and forgets the x-component), is no longer than the length |w|_z
of the vector itself; hence projecting a path onto this line does not increase
its hyperbolic length.
Corollary 3.21. All half-circles {|z − c| = r, Im z > 0} ⊆ C with c ∈ R
are geodesic curves in H.
Proof. The image of any circle with a real center c ∈ R under a
transformation (26.7) is again a circle or a straight line. To map a given
half-circle to a vertical line, it suffices to find a projective map Φ which
sends one of its two endpoints to infinity. Since Φ is an isometry of H and
vertical lines are geodesic, the half-circle itself is geodesic.
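The proof can be illustrated numerically. The concrete map Φ(z) = −1/(z − (c + r)) below is our own choice of a map of the form (26.7) (matrix [[0, −1], [1, −(c + r)]], determinant 1); it sends the endpoint c + r of the half-circle |z − c| = r to infinity, and sample points of the half-circle indeed land on a single vertical line:

```python
# The half-circle |z - c| = r maps to a vertical line under the real
# Moebius map phi(z) = -1/(z - (c + r)), which sends c + r to infinity.
import cmath

c, r = 2.0, 1.5

def phi(z: complex) -> complex:
    return -1 / (z - (c + r))

points = [c + r * cmath.exp(1j * t) for t in (0.3, 1.0, 2.0, 2.8)]
images = [phi(z) for z in points]

# all images share the same real part (one can compute it to be 1/(2r)),
# and they stay in the upper half-plane H
assert all(abs(w.real - 1 / (2 * r)) < 1e-12 for w in images)
assert all(w.imag > 0 for w in images)
```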
Figure 28. Family of ultra-parallel lines on the Poincaré
disc, having zero angle with each other, and a horocycle or-
thogonal to all of them.
Figure 29. Infinite triangles with all three sides parallel to
each other and zero internal angles
Remark 3.30. In the Poincaré model D of the hyperbolic plane the
description of geodesic lines is also relatively easy. Any such line is a (Eu-
clidean) circular arc orthogonal to the absolute |z| = 1, or a (Euclidean)
diameter of the disc. This follows from the circular property of the Möbius
transformations and the fact that the map (26.4) is a Möbius transformation
itself.
Consider three distinct real points x₁ < x₂ < x₃ ∈ R on the boundary of
H. There exist three circular arcs in H which are supported by the segments
x₁x₂, x₂x₃ and x₁x₃ as the diameters, see Fig. 29. Together they form a
"circular triangle" △P₁P₂P₃, P_i = (x_i, 0) ∈ C.
As we know, there exists a unique (real) projective transformation
Φ that sends the triplet (x₁, x₂, x₃) to the triplet (0, 1, ∞). You can see
that the same transformation sends two of the arcs to the parallel vertical
lines {x = 0}, {x = 1}, and the last arc to the half-circle supported by the
diameter [0, 1] ⊆ R.
Note that in terms of the intrinsic geometry of H all these lines/circles
are geodesic, i.e., they are the "straight lines" of this geometry. We have
constructed an exotic object: an infinite triangle, any two sides of which are
parallel to each other (they don't intersect in H!).
Problem 3.27. Compute the result of a parallel transport along this in-
finite triangle in the same way we computed the result of a parallel transport
along a spherical triangle. There is a technical difficulty, since the vertices
P_i of the infinite triangle △P₁P₂P₃ are "at infinity" and do not belong to the
hyperbolic plane, but you can surely overcome this problem by a simple limit
computation.
Remark 3.31 (important). One can easily compute the result of a par-
allel transport along any triangle, using Euclidean computations based on
counting angles between crossing circles. The computation will be tedious
(as most computations in geometry), but the answer will be simple. The
angle transported along a geodesic triangle will rotate in the direction op-
posite to the direction of rotation on the sphere. This is equivalent to the ob-
servation that the sum of the internal angles of any hyperbolic triangle is strictly
less than π = 180°. The angle of this rotation will be proportional to the
hyperbolic area¹¹ of the triangle. The proportionality coefficient is called the
curvature (exactly as in the spherical geometry), and from the computations
we conclude that both H and D are Riemann surfaces of constant negative
curvature. In the same way as the unit sphere can be (non-isometrically!)
mapped to any other sphere of radius R (simply by multiplying all lengths
by the same factor R), the standard models of the hyperbolic plane can be
similarly scaled. In the spherical case the curvature of a sphere of large radius
R is 1/R² → 0 as R → +∞; the scaled model of the Lobachevsky plane has
constant negative curvature −1/R². In the limit as R → ∞, both models
"converge" to the flat Euclidean plane with identically zero curvature.
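The value −1/R² for the scaled model can be verified directly. The sketch below uses the standard curvature formula for a conformal metric λ(x + iy)|dz|, namely K = −Δ(ln λ)/λ² (the formula is assumed here, not derived in the text), applied to the scaled half-plane density λ = R/y:

```python
# Finite-difference check that the metric (R/y)|dz| on H has curvature
# K = -(Laplacian of ln lambda) / lambda^2 = -1/R^2 at every point.
import math

def curvature(x0: float, y0: float, R: float, h: float = 1e-4) -> float:
    f = lambda x, y: math.log(R / y)              # ln(lambda), x-independent
    lap = (f(x0 + h, y0) + f(x0 - h, y0) + f(x0, y0 + h) + f(x0, y0 - h)
           - 4 * f(x0, y0)) / h ** 2              # 5-point Laplacian
    return -lap / (R / y0) ** 2

for R in (1.0, 2.0, 5.0):
    for y0 in (0.5, 1.0, 3.0):
        assert abs(curvature(0.0, y0, R) - (-1 / R ** 2)) < 1e-5
```

The answer does not depend on the point (x₀, y₀), in agreement with the constancy of the curvature.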
26.6. Circles in H. A hyperbolic circle of radius R, 0 < R < ∞, is the
set of points of the hyperbolic plane which are at the distance R from a
point O called the center of this circle.
Unlike the geodesic lines, hyperbolic circles are easier to describe in the
Poincaré model D.
Indeed, if the center O coincides with the center z = 0 of the unit
disk, then the geodesic lines through O in all directions look like the
(Euclidean) diameters of the disk D. Since the metric in D is invariant by
the (Euclidean) rotations around the origin, all hyperbolic circles with the
center at the origin are (Euclidean) circles centered at the origin. As a
corollary, we arrive at the following description.
11We did not introduce the notion of the hyperbolic area for lack of time, but its
construction is straightforward once we know how to measure small lengths and angles.
Figure 30. Concentric circles in the Poincaré disc. The
rightmost plot consists of concentric horocycles.
Theorem 3.22. Any hyperbolic circle of a finite radius R in D looks like
a Euclidean circle disjoint from the absolute.
By the conformal equivalence, the same is true about circles in the
Lobachevsky plane H: all of them look like Euclidean circles entirely above
the absolute.
In the flat (Euclidean) geometry the perimeter of the circle of radius
r > 0 is equal to 2πr. In the spherical geometry the perimeter of a circle
of (spherical) radius r is always less than 2πr: indeed, in R³ we measure
the perimeter in the usual Euclidean sense, but the length of the spherical
"radius" (think of a circle centered at the North pole) is measured along
a "curvy" meridian, whose Euclidean length is larger than the 3D-distance
from the North pole to the points of the circle. For instance, the equator has
the spherical radius π/2 and the perimeter 2π, so the ratio of perimeter to
radius is 4 instead of the flat value 2π. If we consider circles of radii greater
than π/2, their perimeters will go to zero as r → π (they will be small circles
around the South pole).
What will happen in the hyperbolic geometry?
Problem 3.28. Compute the hyperbolic length of a circle of hyperbolic
radius r.
Solution. This problem is most simply solved in the Poincaré disc:
we can consider concentric circles centered at the center w = 0 of the disk
D = {|w| < 1} ⊆ C. These hyperbolic circles are visible to the Euclidean
observer as concentric circles of radii ρ ∈ [0, 1), hence the only problem is
to compute the hyperbolic perimeter and the hyperbolic radius of the circle
C_ρ = {|w| = ρ}. The perimeter is obviously (why?) equal to

|C_ρ|ʰ = (2 / (1 − ρ²)) · |C_ρ|ᵉ = 4πρ / (1 − ρ²).

To compute the hyperbolic radius r = r_h(ρ) of C_ρ, we have to calculate the
integral

dist_h(O, C_ρ) = ∫₀^ρ 2 ds / (1 − s²) = r(ρ).

It remains to invert the latter formula, find the dependence ρ = ρ(r) (it
gives ρ = tanh(r/2)) and substitute it into the first formula. Omitting the
tedious computation (do it at your pleasure!), here is the answer¹²:

|C_{ρ(r)}| = 2π sinh r,   sinh r = (eʳ − e⁻ʳ)/2, r > 0.   (26.9)
In other words, instead of the constant flat value 2π, in the hyperbolic
geometry the ratio of perimeter to radius is given by the very fast growing
function 2π · r⁻¹ sinh r (for small radii 0 < r ≪ 1 the values of this function
are close to 2π). That is, in the hyperbolic universe the circles grow in
length much, much faster than in the flat (not to mention the spherical)
world.
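The answer (26.9) is easy to verify numerically. The sketch below assumes the Poincaré disc density 2/(1 − ρ²) (the normalization matching the metric |dz|/y on H), under which the hyperbolic radius of the circle |w| = ρ is r = 2 artanh ρ:

```python
# Check the identity: perimeter 4*pi*rho/(1 - rho^2) with rho = tanh(r/2)
# equals 2*pi*sinh(r), the answer (26.9).
import math

def perimeter(rho: float) -> float:
    """Hyperbolic length of the circle |w| = rho in the Poincare disc."""
    return 2 / (1 - rho ** 2) * (2 * math.pi * rho)

def rho_of_radius(r: float) -> float:
    """Invert r = integral of 2 ds/(1 - s^2) from 0 to rho = 2 artanh(rho)."""
    return math.tanh(r / 2)

for r in (0.1, 1.0, 3.0):
    assert abs(perimeter(rho_of_radius(r)) - 2 * math.pi * math.sinh(r)) < 1e-9
```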
26.7. Horocycles. In the hyperbolic geometry there exist monsters
that in principle cannot exist in the flat or spherical geometry. Consider a
curve C ⊆ H that looks like a Euclidean circle, but is tangent to the absolute
at some point O ∈ R ⊆ C (recall that this point does not belong to H). Being
circular (in C), it should be either a hyperbolic circle or a hyperbolic straight
line (a geodesic line). This curve is definitely not a geodesic: it touches the
absolute rather than crossing it orthogonally. So it should be considered as
some "ideal circle".
The circumference of this circle is infinite (you don't even have to com-
pute the integral to see it: just compare C, by horizontal projection, with
the vertical geodesic orthogonal to the absolute at O; the curve C is "much
longer" than this geodesic). Thus its radius (whatever it is) should also be
infinite. A simple argument shows that the (hyperbolic) center of this "ideal
circle" cannot be at any finite point O′ ∈ H: the points of C near the tangency
point lie arbitrarily far from O′, so they cannot all be at one and the same
distance from it. Thus we arrive at the following set of paradoxical properties:
(1) C is a limit case of a family of circles (just move C “slightly” above
the absolute and it will become a usual hyperbolic circle).
(2) C has an infinite radius and infinite perimeter (circumference) in
the hyperbolic metric.
(3) The center of C must be somewhere “at infinity”. The only place
in the world where this center might be is the point of tangency O
itself.
(4) This “infinite circle” passes through its own center! (of course, here
the term “passes” makes no intrinsic sense, only some equality in
the limit).
Definition 3.33. The curves as above (visible as Euclidean circles of
different radii tangent to the absolute) are called horocycles. They are “the
largest circles” in the hyperbolic geometry and, quite naturally, are all iso-
metric to each other.
Problem 3.29. Prove that any two horocycles are indeed isometric to
each other in H.
12The function sinh x is called the hyperbolic sine for some rather deep reasons: e.g.,
by the Euler formula, sin x = (e^{ix} − e^{−ix})/(2i).
Figure 31. Local classification of surfaces.
26.8. Does a hyperbolic geometry admit a three-dimensional
globus? Analytic simplicity of the hyperbolic plane (and disc) notwith-
standing, it is highly desirable to realize it as the geometry on a surface H
in R³, just like the sphere realizes the geometry of constant positive curvature.
One can try to start with a local analysis. We know how to guess the
sign of the curvature: one has to consider geodesic circles and compare their
circumference (perimeter) with the perimeter of a Euclidean circle of
the same radius. A hyperbolic circle must have perimeter larger than 2πr.
This means that the surface must be significantly non-flat near every point,
even if we consider circles of small radius on it.
A simple study of functions of two real variables x, y and their graphs
in R3 of the form z = f (x, y) suggests that there are three types of surfaces,
depending on their relative position with respect to the tangent plane at a
given point, see Fig. 31.
Assuming that the tangent plane at a given point P ∈ H is horizontal
and the point itself is at the origin (all this can immediately be achieved by
a suitable choice of the coordinates in R³), we can expand the function f in
the Taylor series. There will be no free term and no linear terms (because of
our choice of the coordinates), and the quadratic terms can be brought, by a
rotation of the coordinates (x, y) around the origin, into the form

f(x, y) = αx² + βy² + · · · ,   α, β ∈ R, |x|, |y| ≪ 1.   (26.10)
The shape of the surface depends on the combination of signs of α, β. If
they are of the same sign, αβ > 0, we have one of the two surfaces plotted
on the left. The corresponding function has a local maximum (or minimum)
and looks like a cup (perhaps bottom-up). If α = β, this cup will be almost
rotationally symmetric (perfectly symmetric in the absence of the higher terms
denoted by dots) and locally will resemble a small spherical cap. In the
general case such a point P is called elliptic.
The last case (of different signs, αβ < 0) is the geographical rendering
of a mountain pass from one valley to another across a mountain chain.
Points of the corresponding type are also called saddle points, for obvious
reasons. The function f attains neither maximum nor minimum at (0, 0),
and its graph crosses the tangent plane by two smooth curves¹³ given by the
local equations

√α · x ± √(−β) · y + · · · = 0,   assuming that α > 0, β < 0.
Locally a saddle point can be visually identified as follows: its normal sec-
tions (by different planes in R³ passing through the axis orthogonal to H at
the point P) are parabola-like (flat) curves with horns pointing some-
times to one side of the surface, sometimes to the other (cf. z = αx² + · · ·
and z = −|β|y² + · · · , obtained by the sections {y = 0} and {x = 0} respectively).
One may compute (again, the computation is technical and non-revealing)
that the curvature at the point P equals 4αβ, that is, it is positive
if P is an elliptic point and negative if it is a saddle point.
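The sign claim can be checked with the standard curvature formula for a graph z = f(x, y) (assumed here, the text does not state it): K = (f_xx f_yy − f_xy²)/(1 + f_x² + f_y²)². Applied to the model quadratic surfaces it gives K = 4αβ at the origin:

```python
# Gaussian curvature of a graph z = f(x, y) by central differences, using
# K = (f_xx f_yy - f_xy^2) / (1 + f_x^2 + f_y^2)^2  (standard formula).
def curvature(f, x: float, y: float, h: float = 1e-3) -> float:
    fx  = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy  = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return (fxx * fyy - fxy ** 2) / (1 + fx ** 2 + fy ** 2) ** 2

saddle = lambda x, y: x * x - y * y     # alpha = 1, beta = -1
cup    = lambda x, y: x * x + y * y     # alpha = beta = 1

assert abs(curvature(saddle, 0.0, 0.0) - (-4.0)) < 1e-6  # K = 4*alpha*beta < 0
assert abs(curvature(cup, 0.0, 0.0) - 4.0) < 1e-6        # K > 0 at an elliptic point
```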
Thus to build a hyperbolic “globus”, we need to construct a surface
H ⊆ R3 which entirely consists of saddle points and in addition has the
same curvature at all its points. Is this problem solvable?
To construct a surface entirely consisting of saddle points is not difficult.
The surface z = x² − y² (without any dots) possesses this property. A
more symmetric example is given by another quadric, x² + y² − z² = 1,
see Fig. 32, called the hyperboloid (of one sheet). Its curvature is obviously
the same at all points of any fixed height z = const, but varies with z. It
can be shown that the curvature, while staying negative, tends to zero from
below as z → ±∞. Indeed, as (x, y, z) → ∞, the hyperboloid tends¹⁴ to the
circular cone x² + y² − z² = 0 in the same way as the hyperbola x² − z² = 1,
obtained in the intersection with the plane {y = 0}, tends to the angle formed
by its two asymptotes z = ±x.
But the circular cone is flat! It can be isometrically mapped onto a
planar angle between two straight lines on the plane. For this it is suffi-
cient to cut the cone along any straight line passing through the origin and
"unwrap" the cut surface. One can transform this observation into an accu-
rate statement: the curvature of the hyperboloid is negative at every
point, but tends to zero as points on it tend to infinity. Thus the hyperboloid
is a bad candidate for the role of the hyperbolic "globus": its curvature is far
from being constant.
Yet the idea of constructing a suitable surface can be derived from this
example. Consider the Euclidean plane with coordinates (x, y) and a func-
tion ϕ = ϕ(x) such that:
(1) ϕ(x) > 0 for all x ∈ R,
(2) ϕ is convex: ϕ″(x) > 0.
Let us rotate the graph of this function around the x-axis in R³. The result
will be a surface H, rotationally symmetric with respect to this axis. Its
13Use the implicit function theorem to prove that these curves are indeed smooth.
14Let ε > 0 be a small parameter and make the rescaling (x, y, z) ↦ ε⁻¹(x, y, z) in
R³. After this rescaling the surface will be described by the equation x² + y² − z² = ε²,
which for ε = 0 becomes the equation of the cone.
Figure 32. Hyperboloid of rotation
curvature will be negative. Indeed, the section of this surface by any plane
x = const will be a (Euclidean) circle y² + z² = ϕ²(x), and it lies to one side
of the tangent plane, while the section by any plane containing the rotation
axis will be isometric to the graph y = ϕ(x) and, because of the convexity,
will lie to the other side of the tangent plane.
Of course, the above two conditions only imply that the curvature of
the surface is negative everywhere, but they do not guarantee that it is
constant. But we can compute this curvature. The (tedious as usual)
computation gives the answer in terms of the numbers ϕ(x), ϕ′(x) and
ϕ″(x).
And now comes the key step. By requiring that this curvature remains
constant for all values of x, we obtain an ordinary differential equation im-
posed on the function ϕ.
The general theory of ordinary differential equations is a huge area of
mathematics that rests upon many branches (Analysis, Linear algebra, Com-
plex variables, Topology etc.), and there is absolutely no room here to discuss
it even very vaguely. What is important to know at this moment is the
following.
(1) Solutions of reasonably good differential equations always exist lo-
cally: for any point a ∈ R there exists a small enough ε > 0 such
that a solution (one or many, depending on the order of the differ-
ential equation) ϕ(x) exists for |x − a| < ε.
(2) Solutions of even very simple differential equations may have finite
domains of definition: they exist on certain intervals (a, b) ⊆ R, but
cannot be extended outside of this interval for various reasons
(they lose differentiability, tend to infinity as x → a and/or x → b, etc.).
Figure 33. Tractrix and pseudosphere
(3) Explicit formulas for solutions of differential equations are excep-
tionally rare outside the class of linear ordinary equations. Solvable
differential equations are precious things and they are studied like
precious gems.
Having said that, we can give the readers the final report on our
attempt to construct a rotationally symmetric "globus" H of the hyperbolic
plane.
(1) The differential equation imposed on the unknown function ϕ to
produce a surface of constant negative curvature (by rotating its graph
as explained) is reasonably good. It admits a solution.
(2) In a miraculous way, this solution can be given in explicit form.
The corresponding curve {y = ϕ(x)} is called the tractrix.
(3) This solution is defined only on the half-line x > 0. Of course, one
can extend it to the negative values of x by parity, ϕ(−x) = ϕ(x),
but then at the point x = 0 we will have a singularity (a cuspidal
point, where the derivative becomes infinite).
The result of rotation of the tractrix is called the pseudosphere (some-
times, the Beltrami pseudosphere), see Fig. 33.
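The text does not write the tractrix down explicitly. Assuming its standard parametrization x(t) = t − tanh t, ϕ(t) = 1/cosh t (t > 0) and the rotation-surface curvature formula K = −ϕ″ / (ϕ (1 + ϕ′²)²) (both are assumptions of this sketch, not statements of the text), one can verify numerically that the curvature is indeed constant, K ≡ −1:

```python
# Numerical check that the surface of revolution of the tractrix has
# constant Gaussian curvature -1.
import math

x   = lambda t: t - math.tanh(t)        # tractrix, standard parametrization
phi = lambda t: 1 / math.cosh(t)

def curvature(t: float, h: float = 1e-4) -> float:
    dx   = lambda s: (x(s + h) - x(s - h)) / (2 * h)      # dx/dt
    dphi = lambda s: (phi(s + h) - phi(s - h)) / (2 * h)  # dphi/dt
    p1 = lambda s: dphi(s) / dx(s)                        # dphi/dx
    p2 = (p1(t + h) - p1(t - h)) / (2 * h) / dx(t)        # d^2 phi/dx^2
    # rotation-surface formula: K = -phi'' / (phi (1 + phi'^2)^2)
    return -p2 / (phi(t) * (1 + p1(t) ** 2) ** 2)

for t in (0.7, 1.5, 3.0):
    assert abs(curvature(t) - (-1.0)) < 1e-4
```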
Yet it is too early to celebrate the success. The pseudosphere H is indeed
a surface on which the metric inherited from the Euclidean metric in R³ has
constant negative curvature, but there are two problems:
(1) The surface H is not complete. There is an "equator" on it, which
consists of non-smooth (even "sharp"!) points, corresponding to
x = 0. A geodesic line cannot be continued across this equator. The
genuine hyperbolic plane H has no such unnatural obstructions
to the continuation of geodesics.
(2) The surface H represents not all points of H, but rather a vertical
semi-strip of certain finite width between two horocycles. If a geo-
desic on H does not fit completely into this semi-strip, then it will wind
around the neck of the pseudosphere, eventually self-intersecting
(which is impossible for genuine geodesics).
Figure 34. Geodesics on H and how they look on H and D.
The comparative global behavior of geodesics on the pseudosphere and of
geodesics on H and D is shown in Fig. 34.
Can one remove the last obstruction and find a complete and full real-
ization of the hyperbolic geometry on a surface in R³? Alas, no. In 1901 David
Hilbert proved that this is impossible: there is no surface of constant nega-
tive curvature in R³ isometric to the hyperbolic plane. It took almost three
quarters of a century to prove that all surfaces of (even non-constant) nega-
tive curvature in R³ are similar to the hyperboloid: their curvature cannot
be bounded from above by any negative number, so there are points
at which the curvature is arbitrarily close to zero (N. V. Efimov, 1975).
26.9. Non-convex polyhedra. In §24.5 we discussed polyhedral sur-
faces, which are technically easier to study than the smooth curved sur-
faces.
Figure 35. Maximal hyperbolic disc embeddable in R³.
Figure 36. Polyhedral "saddle point".
All convex polyhedra (in particular, all five Platonic solids) have positive
"atomic curvature" at all vertices. This follows from the fact that the sum
of the angles of all faces meeting at a given vertex is smaller than 2π = 360°.
Can one construct (using, as before, paper, scissors and glue) a polyhe-
dron with negative "atomic curvature" at a vertex? At first the problem
appears unsolvable: one cannot patch together faces with the total sum
of angles greater than the full rotation angle. At most the total sum of
angles can be equal to 2π, and then the vertex will in fact be a flat point.
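The "atomic curvature" (the angle defect, 2π minus the sum of the face angles at a vertex) is elementary to compute. The sketch below contrasts a cube vertex (positive defect) with a hypothetical saddle vertex glued from seven equilateral triangles (negative defect; the seven-triangle example is our own, not from the text):

```python
# Angle defect ("atomic curvature") at a polyhedral vertex.
import math

def atomic_curvature(face_angles) -> float:
    """2*pi minus the sum of the face angles meeting at the vertex."""
    return 2 * math.pi - sum(face_angles)

cube_vertex = [math.pi / 2] * 3            # three squares meet at a cube vertex
assert abs(atomic_curvature(cube_vertex) - math.pi / 2) < 1e-12
# eight such vertices carry the total curvature 4*pi, as for the sphere
assert abs(8 * atomic_curvature(cube_vertex) - 4 * math.pi) < 1e-12

saddle_vertex = [math.pi / 3] * 7          # seven equilateral triangles meet
assert atomic_curvature(saddle_vertex) < 0  # total angle 7*pi/3 > 2*pi
```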
Upon a second thought, you can see that this is locally possible. Look at
Fig. 36. Of course, this is only a local portrait, but it can give you an extra
insight into what a saddle point is and why the "atomic curvature" (the result
of parallel translation along a polygonal path around the vertex) is a rotation
in the negative direction. If you don't trust Hilbert or Efimov, you have a
practical way to construct a counterexample: use scissors, paper and glue to
construct a piecewise-flat polyhedral surface which will have only saddle-type
vertices in the above sense.
27. Conclusion. What is Geometry?
Mr. Jourdain
What! When I say, ”Nicole, bring me my slippers, and
give me my nightcap,” that’s prose?
Philosophy Master
Yes, Sir.
Mr. Jourdain
By my faith! For more than forty years I have been
speaking prose without knowing anything about it,
and I am much obliged to you for having taught me
that. I would like then to put into a note to her:
“Beautiful marchioness, your lovely eyes make me die
of love,” but I want that put in a gallant manner and
be nicely turned.
J.-B. Molière, Le Bourgeois Gentilhomme.
27.1. Euclidean plane as a metric space. So what, after all, is the
Euclidean (and non-Euclidean) geometry from the modern point of view?
What is the role of Euclid's axioms, and why do they look so different from the
way we build our mathematical notions today "from the bottom up", starting
with the simplest notion of sets without any additional structure on them
and gradually adding more and more features?
The first sections were aimed to convince the reader that the founda-
tions of the Euclidean geometry were "metric", based on measurements of
lengths (and of angles, which is almost automatic once we know how to measure
lengths). In metric terms one can easily describe the simplest shapes: cir-
cles and (somewhat indirectly) straight line segments, the latter having the
shortest length among all paths between their endpoints. In today's language,
we would say that the ancient Greeks saw their world as a metric space.
We know now that there are many different metrics and that the properties
of the corresponding metric spaces can be strikingly different (including
absurd answers to the question of what a shortest segment looks like). Of
course, the Greeks understood that the measurement results in the "real world"
are much more specific than an abstract function dist between pairs of points
which is nonnegative, symmetric and satisfies the triangle inequality.
Today's approach would be to try and specify the way the "right"
distance function is constructed, imposing extra conditions on it. The path
towards this description is very long and indirect, although very logical.
Oversimplifying things, one has first to introduce the notion of a vector
space in purely algebraic terms, then equip it with a symmetric positive
definite bilinear form (the scalar product) and use the norm |v| = √⟨v, v⟩
as the measure of the length of vectors, which immediately leads to the flat
affine Euclidean geometry, in which all axioms become easy theorems.
Inexplicably (?), Euclid preferred to conceal the fundamental role of the
metric, postulating instead certain properties of the objects (segments, lines,
circles) defined in terms of this metric. Some of these properties are indeed
"obvious", but others are less so. The most controversial are the properties
related to the infinite (or, better say, unbounded) size of straight lines.
The very first axiom, asserting that a line can be extended in each of its
directions without any constraint, may be questioned. Indeed, if we have a
finite line segment [A, B], then we can (using a regular hexagon) construct
by ruler and compass a third point C such that A, B, C are on the same
line and B is the midpoint of the segment [A, C]; this construction can be
repeated indefinitely, giving an infinite sequence of points C = C₁, C₂, C₃, . . . ,
all on the same line, such that Cᵢ is the midpoint of the segment [Cᵢ₋₁, Cᵢ₊₁].
But this construction in fact relies on a theorem about the sum of the angles
of a triangle, which requires the much more problematic Postulate on parallels.
Thus Euclid formulates the axiom on the extendability of a line independently.
But then there is a problem: any finite segment [A, B], say of length 1, can
be extended across the endpoint B by a line, but this extension may be
shorter, say, only of length 1/2. Again by the same axiom, another extension
is possible, but it might happen that the newly added part has length
1/4, and so on. As a result, we will have an "open line segment" [A, C)
which does not contain the endpoint, but still satisfies the Euclidean axiom
about the extendability of any line through any point on this line.
Well, this specific problem may be solved by adding that the line can
always be extended by another piece of length at least one. But still the
obtained infinite object may defy intuition. Why should parallel lines exist
at all? Perhaps any two lines will eventually intersect if we extend them far
enough? The example of "straight lines" drawn on a sphere of very large
radius shows that this might well be the case. Why is the line through any
two points unique? Again, if we think of two poles of the Earth, this is
no longer the case.
In short, the properties that were postulated by Euclid may well be
wrong if our daily intuition is the only justification for axiomatizing them.
The Fifth postulate was the most blatantly non-obvious axiom; no surprise
that for millennia different people tried to derive it from the rest of the "more
plausible" axioms. We know that this is impossible: in the hyperbolic ge-
ometry the geodesic metric produces lines and circles that obey all the other
axioms except for the Fifth postulate.
Why could not Euclid himself (or his successors) proceed this way, mutatis
mutandis translating the construction of "linear" (Cartesian) geometry into
their familiar terms? Perhaps one of the reasons was the problem with the
number system. To measure distances, the rational numbers, well familiar
to the Greeks, were insufficient. Pythagoras, who discovered that, ruined
any hope of following the modern path of thinking. This was a tragic (in a
sense) turn of History.
27.2. Construction of the Euclidean geometry. We will recall the
main steps of reconstruction of the Euclidean spaces.
27.2.1. From numbers to the numeric line. Assume that Euclid had
a number system (denote it by R, not to be confused with R at this
moment) that is rich enough. Namely, it should allow all four
arithmetic operations (including division by nonzero numbers) and carry the
order ">". Then we can instantly turn the set R itself into a metric space by
introducing the distance function

dist(x, y) = dist_R(x, y) = |x − y| = { y − x, if y ≥ x;  x − y, if x > y }.

Moreover, this will be the only distance function on R that is invari-
ant by shifts (translations), dist(x + a, y + a) = dist(x, y) for all a ∈ R.
In addition, this distance behaves well under similarity transformations:
dist(ax, ay) = dist(a, 0) dist(x, y), again for any a ∈ R. The resulting ob-
ject is the numeric line: it could be considered, e.g., if we take R to be the
field Q of rationals, so well familiar to the Greeks.
27.2.2. From line to plane and higher dimensions. Then one can make a
quantum leap and consider the Cartesian square R² made of all pairs (x, y) ∈
R × R. This is the "space" (set) on which one can consider sets defined by
polynomial equations (of the form P(x, y) = 0 with coefficients from R).
These equations can be transformed, arranged into systems of equations
and sometimes solved. Very soon one would see that the sets defined by
first degree polynomials (i.e., of the form ax + by + c = 0, a, b, c ∈ R, with
a and b not both zero) obey all the axioms required from lines in the Euclidean
axiomatics.
27.2.3. Distance. Choosing the shape of the unit circle. There is still one
key ingredient missing. We can measure lengths along each coordinate x or
y separately, using the absolute value distance dist_R, but not the lengths
of other line segments. One needs to find a way to mix x with y. One
can try one's luck guessing the equation for a circle, the set of points (x, y)
equidistant from, say, the origin (0, 0). By arguments of symmetry, one
could either build an explicit expression from |x| and |y|, or try an algebraic
curve of degree higher than one. The first way leads to a number of candidate
formulas for the distance:

dist_{R²}(P, O) = |x| + |y|,  or  max(|x|, |y|),  or  (|x|^p + |y|^p)^{1/p},   where P = (x, y), O = (0, 0).
Other combinations are less appropriate, since we want to keep the distance
positive for P ≠ O and homogeneous, dist_{R²}(λP, O) = |λ| dist_{R²}(P, O),
where λP = (λx, λy), λ ∈ R (including the case λ = −1). All these distance
functions satisfy the triangle inequality (for p ≥ 1), but only one of them, the
case p = 2, produces a circle C_r = {P : dist_{R²}(P, O) = r} which is a smooth
quadratic curve, symmetric by a sufficiently rich group of linear transforma-
tions of R². (The other distance functions are symmetric only by finitely
many reflections in different axes of the R²-plane.)
This leads us to the "right" answer: take as the unit circle the quadratic
algebraic curve

x² + y² = 1.   (27.1)
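A small numerical experiment (our own illustration) shows why, among the candidate distances, only the choice leading to (27.1) is invariant under all rotations of the plane:

```python
# Of the candidate distances, only the p = 2 norm is rotation-invariant.
import math

def p_norm(p: float, x: float, y: float) -> float:
    """The candidate distance (|x|^p + |y|^p)^(1/p) from the text."""
    return (abs(x) ** p + abs(y) ** p) ** (1 / p)

def rotate(theta: float, x: float, y: float):
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

x, y = 1.0, 0.0
xr, yr = rotate(math.pi / 4, x, y)         # rotation by 45 degrees

assert abs(p_norm(2, xr, yr) - p_norm(2, x, y)) < 1e-12        # invariant
assert abs(p_norm(1, xr, yr) - p_norm(1, x, y)) > 0.1          # |x|+|y| is not
assert abs(max(abs(xr), abs(yr)) - max(abs(x), abs(y))) > 0.1  # max is not
```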
27.2.4. Scalar product. No matter how you arrived to this answer, it
very quickly leads to a number of results exactly reproducing the funda-
mental theorems of Euclid. In particular, you can define the right angle and
fractions of it. The important step was to realize that the quadratic form
q : R2 → R, q(v) = v1^2 + v2^2, can be polarized: the inner (scalar) product R2 × R2 → R, (u, v) ↦ ⟨u, v⟩, defined by the formula

    ⟨u, v⟩ = (1/2)(q(u + v) − q(u) − q(v)),
    ⟨u, v⟩ = u1 v1 + u2 v2,    (27.2)

is a symmetric, positive definite bilinear form. This form allows us to talk about orthogonality: the angle formed by two rays Ru and Rv at the origin is right when ⟨u, v⟩ = 0. In general, the angle is defined in terms of the scalar product as

    ∠(Ru, Rv) = arccos √(⟨u, v⟩^2 / (⟨u, u⟩ ⟨v, v⟩)).
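The polarization identity and the angle formula can be checked numerically; a small sketch (function names are ours):

```python
import math

def q(v):
    """The quadratic form q(v) = v1^2 + v2^2."""
    return v[0] ** 2 + v[1] ** 2

def inner(u, v):
    """Scalar product recovered from q by polarization:
    <u, v> = (q(u + v) - q(u) - q(v)) / 2."""
    s = (u[0] + v[0], u[1] + v[1])
    return (q(s) - q(u) - q(v)) / 2

def angle(u, v):
    """The angle between the rays Ru and Rv, following the arccos formula."""
    return math.acos(math.sqrt(inner(u, v) ** 2 / (q(u) * q(v))))

# Polarization reproduces u1*v1 + u2*v2, and orthogonal rays give pi/2:
assert inner((1.0, 2.0), (3.0, 4.0)) == 11.0
assert math.isclose(angle((1.0, 0.0), (0.0, 1.0)), math.pi / 2)
```

Note that q(u) = ⟨u, u⟩, so the denominator is exactly the one in the displayed formula.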
27.2.5. Completeness of the number system. This choice, however, comes at a price. In order to compute distances using the formula dist(P, O) = √(x^2 + y^2), our number field R must admit root extraction from all positive numbers. As we know, the field Q of rational numbers is bad in this
sense. One can partially solve this problem by replacing Q by the field of
all quadratic irrationalities (the closure of Q by adding radicals of all positive elements) or by the broader field of real algebraic numbers. But this would not be sufficient to compute, say, the circumference (perimeter) of circles, since, as we know, the number π is not even algebraic (it satisfies no algebraic equation with rational coefficients). Mathematicians had fully
appreciated the depth of this problem only much, much later, in the 19th
century (the names of A.-L. Cauchy and K. Weierstrass must be mentioned
for their role in developing the accurate notions of numbers, limits etc.).
From this moment on we will replace the undefined "number system" by the field of real numbers R, which is complete.
It should be mentioned at this point that the field R, despite its completeness, is
not algebraically closed : there are equations with real coefficients without real roots, e.g.,
x2 +1 = 0. To overcome this, one has to make another step and consider the set of complex
numbers C which is algebraically closed. Working over C becomes important when we
consider orthogonal linear transformations which usually have non-real eigenvalues.
It turns out that C as a plane R2 (a point (x, y) corresponds to the complex number
x + iy ∈ C, where i is one of the two roots of the above equation) possesses a wonderful
geometry of its own. The geometry of the complex 1-dimensional line C1 equipped with
the Euclidean metric is flat, but when we change the metric to the Fubini–Study metric,
the geometry becomes spherical.
27. CONCLUSION. WHAT IS GEOMETRY? 151
27.2.6. The result. Thus from the modern point of view the Euclidean
spaces are (algebraically defined) vector spaces defined over the complete
field R and equipped with a scalar product (a symmetric, positive definite bilinear form). In these spaces all constructions of the Euclidean geometry can be
explicitly carried out and properties of geometric configurations translated
into identities of Linear Algebra.
In particular, one can verify that all axioms of Euclid hold in this geometry, including the sacramental Fifth postulate on parallels. It claims that there is only one line ℓ′ not intersecting a given line ℓ and passing through a point A ∉ ℓ, and this happens if and only if both lines cross any third line at equal angles.
In other words, the task of axiomatic definition of the Euclidean geom-
etry is completed. Although the definitions of real numbers, linear spaces,
bilinear forms would certainly look alien to the Greeks (and naive contem-
poraries), today’s students easily absorb them and find nothing artificial in
these constructions. Taken together, elements of this construction render
obsolete the most controversial axiom of Euclid.
27.2.7. What next? The existence (and accurate construction) of the
Euclidean geometry still did not answer the main question that was behind
efforts for thousands of years. Is this geometry unique? Can the contentious
Fifth postulate be derived from the rest of Euclid’s axioms? And if not, what
may happen if we drop this postulate and use only the remaining axioms?
After working out all elements of the modern construction of the Eu-
clidean space, one can relax or modify different elements of this construction.
The key idea was to relax the linearity assumption, replacing it by the lo-
cal linearity, or, in analytic terms, by differentiability of the geometry. This
idea was quite natural after mathematicians developed tools (together called
Calculus, or Mathematical analysis) to study nonlinear functions through
their linearization at different points of their domain.
27.3. Abstract Riemannian geometry. The key idea (whose origins are difficult to trace back) was to consider Geometry as an analytic rather
than synthetic branch of Mathematics. In other words, when exploring
geometric constructions, one has to use the tools provided by Analysis. The
key tool of Analysis is the concept of linearization: in the study of real functions of one real variable, a nonlinear function f(x) defined in some neighborhood of a point A ∈ R is approximated by its differential, the "linear" (in fact, affine) function L(x) = a(x − A) + b, where a = f′(A), b = f(A). Here the increment x − A is formally arbitrary, but the proximity between f and L holds with reasonable accuracy only when |x − A| is very small, "infinitesimal".
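This approximation is easy to see numerically; a small sketch (the cubic and the base point are arbitrary illustrative choices):

```python
def linearize(f, df, A):
    """Return the affine approximation L(x) = f(A) + f'(A) * (x - A)."""
    return lambda x: f(A) + df(A) * (x - A)

# An arbitrary illustrative example: f(x) = x^3 near A = 1.
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2     # the derivative, supplied by hand here
L = linearize(f, df, 1.0)

# The error decays quadratically as x approaches A:
print(f(1.1) - L(1.1))    # about 0.031
print(f(1.01) - L(1.01))  # about 0.000301
```

Shrinking |x − A| by a factor of 10 shrinks the error by a factor of about 100, which is the quadratic decay hiding behind the word "infinitesimal".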
The same idea can be applied to a metric space M. We can consider a point A ∈ M and the distance function dist(P, Q) in a small neighborhood of A. It is natural to assume that the "local geometry" near
A is Euclidean, that is, there exists a local system of coordinates (x, y) on M
near A = (A1, A2), and the small circle dist(P, A) = r of radius 0 < r ≪ 1 is approximated by the equation of the infinitely small Euclidean circle

    ⟨u, u⟩ = r^2,    u = (x − A1, y − A2) ∈ R2,    P = (x, y).
Here the scalar product is a symmetric positive bilinear form on R2 , but
there is no reason to assume that this form should be the same for all points
A ∈ M.
Any symmetric bilinear form in any coordinate system can be described by a symmetric 2 × 2-matrix G,

    G = ( g11 g12 ; g12 g22 ),    ⟨u, u⟩ = (u1 u2) G (u1 u2)^T = g11 u1^2 + 2 g12 u1 u2 + g22 u2^2.
The idea of Riemann was to consider only metric spaces for which the lin-
earization of the distance function for infinitesimally close points is given by
the matrix G which depends (in a differentiable way) on the point A. In
other words, the entries gij of G can be nonconstant (differentiable) func-
tions of the coordinates (x, y), G = G(x, y), gij = gij (x, y). To simplify
the notation, we write ⟨u, v⟩_A to stress that the scalar product should be computed at the point A ∈ M.
A pair of points (A, P) with P very close to A is identified with the tangent vector with coordinates u = P − A; thus the matrix G defines the scalar product on the (vector) space of vectors tangent to M at A. In this way we obtain a quadratic approximation to the function dist^2(A, P) via the matrix G(A).
27.4. Surfaces in R3 . The above abstract definition was distilled from
specific examples. The easiest of them appear as the intrinsic geometry
on smooth surfaces in R3 . For such surfaces the concept of linearization
can be visualized. At each point A ∈ M we can define (using analytic
tools) the tangent plane TA M to the surface M at this point. Intuitively, a
small piece of M near A is well approximated by the tangent plane. Since
M ⊆ R3, the tangent plane TA M is also a subset of the Euclidean space and inherits the metric from R3 (say, the scalar product of two tangent vectors u, v ∈ TA M is equal to |u| |v| cos α, where |u|, |v| are the Euclidean lengths of these vectors and α is the angle between them, measured in the tangent plane). If we introduce an arbitrary coordinate system on M (say, by projecting M on the coordinate plane (x, y) in R3), then one can compute the coefficients of the metric gij = gij(x, y) and thus obtain a Riemann surface.
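The coefficients gij for a concrete surface can be computed by hand or by machine. The sketch below uses the standard first-fundamental-form formula for a graph z = f(x, y) (namely g = I + ∇f ∇fᵀ) and an illustrative surface of our own choosing, not one discussed in the text:

```python
# Metric coefficients induced on the graph z = f(x, y) in R^3, by the
# standard first-fundamental-form formula g = I + grad(f) grad(f)^T.
# The concrete surface below is an illustrative choice.
def metric_coefficients(fx, fy):
    """Return (g11, g12, g22) at a point where f has partial derivatives fx, fy."""
    return (1 + fx * fx, fx * fy, 1 + fy * fy)

# Paraboloid z = (x^2 + y^2) / 2: here fx = x and fy = y, so at (x, y) = (1, 2):
print(metric_coefficients(1.0, 2.0))  # (2.0, 2.0, 5.0)

# At the bottom point (0, 0) the tangent plane is horizontal and the
# induced metric is the flat Euclidean one, (g11, g12, g22) = (1, 0, 1).
```

The point-dependence of (g11, g12, g22) is exactly the nonconstancy of G = G(x, y) described above.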
27.5. The distance function in Riemannian geometry. The Riemann metric allows us to define the length of tangent vectors only, that is, the distance function dist(A, B) between infinitely close points A, B. To pass from infinitely close pairs to arbitrary pairs of points on M, we apply the "quasi-Euclidean" approach. In the Euclidean geometry the distance between two points is the length of a shortest smooth path γ connecting these points, and it is proved that this path is a straight line segment.
In the general case we consider smooth paths γ : [0, 1] → M such that γ(0) = A, γ(1) = B, and for each path define its length as the integral of the Riemannian length of the velocity vector, computed at the current point of the path:

    |γ| = ∫₀¹ √(⟨v(t), v(t)⟩_γ(t)) dt,    v(t) = (d/dt) γ(t) ∈ T_γ(t) M.
Now we can define the Riemannian distance between any two points:

    distM(A, B) = min_γ {|γ| : γ(t) differentiable, γ(0) = A, γ(1) = B}.
Together with the distance function, this construction implicitly defines the
shortest curve γ, which is called a geodesic segment (more precisely, a geodesic segment with the endpoints A, B). Geodesic segments are nonlinear analogs of line segments in the Euclidean geometry. The notion of an infinite
geodesic line is derived from this metric property (any sufficiently short piece
of a geodesic line must be a geodesic segment with respect to its endpoints).
Geodesic lines are, of course, nonlinear generalizations of Euclidean straight
lines.
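The length integral above can be approximated numerically. As an illustrative sketch, we take the conformal metric ⟨u, u⟩_(x,y) = |u|^2/y^2 on the upper half-plane (the standard hyperbolic metric, consistent with the conformal description of the hyperbolic plane given later in the text, but not derived here); the vertical path from (0, 1) to (0, e) then has length exactly 1:

```python
import math

def path_length(gamma, velocity, g, n=2000):
    """|gamma| = integral_0^1 sqrt(<v(t), v(t)>_gamma(t)) dt (midpoint rule),
    for a conformal metric <u, u>_(x,y) = g(x, y) * |u|^2."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = gamma(t)
        vx, vy = velocity(t)
        total += math.sqrt(g(x, y) * (vx * vx + vy * vy)) * dt
    return total

# Hyperbolic metric of the upper half-plane (a standard example).
g = lambda x, y: 1.0 / y ** 2

# Vertical path from (0, 1) to (0, e); its hyperbolic length is exactly 1.
gamma = lambda t: (0.0, math.exp(t))
velocity = lambda t: (0.0, math.exp(t))
print(path_length(gamma, velocity, g))  # close to 1.0
```

Any other smooth path between the same endpoints yields a larger value of the same integral, which is how the minimum in the definition of distM singles out the geodesic segment.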
Besides the lengths and distances, one can define angles between smooth curves γ′, γ″ crossing at a point A, using the scalar product ⟨·, ·⟩_A at this point.
Thus any Riemann surface M , whether realized as a smooth surface in
R3 or defined in an abstract form by the matrix function G(A), A ∈ M ,
becomes a metric space. What is special about the Riemannian metric is the fact that it is locally Euclidean, that is, on each tangent space TA M it is defined by a symmetric bilinear form ⟨·, ·⟩_A.
Riemannian geometries are ubiquitous and diverse. Any closed surface
in R3 has its own intrinsic geometry with drastically different behavior of
geodesic lines (one can compare the round sphere and asymmetric ellipsoids).
Still, there is a way to single out more important examples.
27.6. Isometries. Recall that an isometry of a metric space M is a
self-map Φ : M → M that preserves distances,
    distM(Φ(A), Φ(B)) = distM(A, B)    ∀A, B ∈ M.
For general metric spaces one has to verify this property for all pairs A, B. If
the metric is Riemannian, one can simplify the job of checking this condition:
Φ is an isometry of M if and only if its differential

    dΦA = IA = I : TA M → TB M,    B = Φ(A),

is an isometry between two vector spaces which preserves the scalar product:

    ⟨Iu, Iv⟩_B = ⟨u, v⟩_A,    B = Φ(A),    I = dΦA.
This infinitesimal definition translates into a system of partial differential
equations imposed on the components of the vector-function Φ(x, y).
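For the flat Euclidean plane this infinitesimal criterion is easy to verify by hand: the differential of a rotation about the origin is the same rotation matrix at every point, and it preserves the scalar product. A quick numerical check (an illustrative sketch):

```python
import math

def rotate(theta, u):
    """Apply the rotation by angle theta; the differential of a Euclidean
    rotation is this same linear map at every point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * u[0] - s * u[1], s * u[0] + c * u[1])

def dot(u, v):
    """Euclidean scalar product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

u, v = (1.0, 2.0), (3.0, -1.0)
Iu, Iv = rotate(1.2, u), rotate(1.2, v)
assert math.isclose(dot(Iu, Iv), dot(u, v))  # <Iu, Iv> = <u, v>
```

For a general Riemannian metric the same condition, ⟨Iu, Iv⟩_B = ⟨u, v⟩_A with I = dΦ_A, must hold point by point, which is what produces the system of partial differential equations.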
All isometries of a given Riemann surface M obviously constitute a group
with respect to the operation of composition. If M is a completely asymmetric surface, then this group may be trivial. If M ⊆ R3 is a surface of revolution obtained by rotation of a planar curve around an axis, then by definition all rigid rotations of R3 around this axis are isometries of M. An ellipsoid with three different semiaxes has a very small isometry group (reflections in the three coordinate planes of R3). On the other hand, the round
sphere and the Euclidean plane have many isometries, the corresponding
groups are relatively large.
27.7. Homogeneous Riemann surfaces. Hyperbolic plane. Both
the spherical geometry (on the round sphere Σ2 of any radius, centered at the origin in R3) and the flat Euclidean geometry R2 have in common the following fact: for any two points A, B ∈ M and any two tangent vectors u ∈ TA M and v ∈ TB M of the same Riemannian length, ⟨u, u⟩_A = ⟨v, v⟩_B, there exists an isometry of M into itself which maps A to B and u to v. This means a very rich isometry group for both geometries, which is a rare property.
It turns out that there is another geometry, which is homogeneous (in
the same sense, possessing a large group of symmetries). The corresponding
Riemann surface, called alternatively the hyperbolic plane, the Lobachevsky
plane or the Poincaré disk, can be realized in the upper half-plane H or in
the unit disk D of the complex plane by twisting the usual Euclidean metric
on C.
This twist is conformal: the hyperbolic length |u|_z of a (complex) vector u ∈ Tz C differs from its Euclidean length |u| = |u|_e = √(u ū) by a positive coefficient depending on z. This coefficient tends to infinity as the point z
approaches the boundary of H (the real line R ⊆ C), resp., the boundary of
D (the unit circle U ⊆ C), called the absolute (the absolute itself is not a
part of the hyperbolic plane). This means that the small hyperbolic circles
are seen as the small Euclidean circles of the radius growing to infinity when
the center z approaches the absolute.
The geometry of the hyperbolic plane is very different from both the
spherical geometry and the flat geometry. The main striking difference is
the behavior of geodesic lines.
In each geometry we can consider a point O and shoot out the geodesic
lines in all possible directions. In the flat geometry these geodesics diverge from each other, but the rate of divergence is moderate (see below). In
the spherical geometry after the initial divergence the geodesic lines start to
converge, until all of them again meet at the point antipodal to O. In the
hyperbolic geometry the geodesics diverge exponentially, much faster than
in the flat case.
Analytically this difference can be expressed by the formula for the
perimeter p(R) (circumference) of the geodesic circle of radius R > 0 cen-
tered at O. In the flat case everybody knows that p(R) = 2πR for all R.
In the spherical case we have p(R) < 2πR for all R > 0, and this function is growing for R ∈ [0, π/2] and then starts to decrease on [π/2, π] until p(π) = 0.^15 In the hyperbolic geometry p(R) > 2πR for all R, and p(R) grows exponentially: p(R) = 2π sinh R ∼ π e^R for large R.
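The three growth regimes can be compared side by side. The explicit sin/sinh formulas below (for curvature +1 and −1) are standard facts quoted here only to illustrate the inequalities stated above:

```python
import math

# Circumference p(R) of the geodesic circle of radius R in the three model
# geometries (curvature 0, +1, -1); the sin/sinh formulas are standard facts.
p_flat = lambda R: 2 * math.pi * R
p_sph = lambda R: 2 * math.pi * math.sin(R)    # < 2*pi*R, vanishes at R = pi
p_hyp = lambda R: 2 * math.pi * math.sinh(R)   # > 2*pi*R, exponential growth

for R in (0.5, 1.0, 2.0):
    assert p_sph(R) < p_flat(R) < p_hyp(R)
```

Since sinh R = (e^R − e^{−R})/2, the hyperbolic circumference indeed grows like π e^R for large R, much faster than the linear flat growth 2πR.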
27.8. Curvature. The phenomena described above can be organized
into a common framework after introducing the notion of the curvature of
the Riemann metric. Without going into details, the curvature is a real
function defined on a Riemann surface that can be calculated at a point
A ∈ M as the deviation of the sum of interior angles of a small geodesic
triangle near A from the “Euclidean value” π = 180◦ , properly normalized
(divided by the area of the triangle).
This curvature is obviously preserved by isometries of the surface. For
homogeneous (highly symmetric) surfaces this function must be constant for
all points A ∈ M . The results of the computation are as follows:
(1) for the intrinsic geometry on the sphere ρ · S2 of radius ρ > 0 this
curvature is equal to 1/ρ2 ;
(2) for the Euclidean (flat) geometry, the curvature is 0;
(3) for the standard hyperbolic plane the curvature is constant and
equal to −1, but one can instantly construct a family of Riemann
surfaces with constant negative curvature −1/ρ2 by a simple rescal-
ing.
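One can sanity-check item (1) with the octant triangle on the unit sphere: three right angles and area π/2 (both standard facts, not derived in the text), so the normalized angle excess recovers the curvature exactly:

```python
import math

# Curvature as normalized angle excess: K ~ (sum of angles - pi) / area.
# Test triangle: the octant of the unit sphere, bounded by three quarter
# great circles -- three right angles, area 4*pi/8 = pi/2 (standard facts).
angles = (math.pi / 2, math.pi / 2, math.pi / 2)
area = math.pi / 2
K = (sum(angles) - math.pi) / area
print(K)  # close to 1, the curvature of the unit sphere (rho = 1)
```

Rescaling the sphere by a factor ρ multiplies the area by ρ^2 while leaving the angles unchanged, which is one way to see why the curvature scales as 1/ρ^2.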
27.9. How is all this related to the axioms of Euclid? Now we have come full circle and can return to the initial challenge: what is the
role of the Fifth postulate in the system of axioms of Euclid?
Proceeding by infinitesimally small steps, we reconstructed the original
idea of Euclid of defining the “Euclidean geometry” as the geometry of a
metric space and played around with different ways to restore the distance
function from its fundamental properties rather than from the properties of
objects defined in terms of this distance function. Along the way we rather
quickly found a “realization”, that is, a metric space (the 2-dimensional
linear space over R equipped with an inner product) that satisfies all axioms
of Euclid, including the Fifth postulate.
Having established the appropriate language (and following Gauss and
Riemann), we could develop a general context and introduce the notion of
Riemann surfaces as metric spaces, for which the distance function is “locally
Euclidean” (developing a suitable technique of linearization, similar to the
linearization of functions using their derivatives).
In this general Riemannian geometry one can follow Euclid's pattern
and introduce the basic geometric shapes (analogs of straight line segments,
straight lines and circles). In general, their behavior might be quite erratic,
but we could hope that in the case of highly symmetric Riemann surfaces
^15 Note that on the unit sphere S2 there are no circles of radius greater than π.
(homogeneous surfaces) one could observe some similarities with the flat
(Euclidean) case.
This hope turns out to be well justified. The projective geometry (a
twin sister of the spherical geometry) defines lines that satisfy almost all
Euclidean axioms. The most blatant exception is the absence of parallel lines in the projective geometry (but the existence of parallels in Euclid's list is
also controversial). Somewhat less obvious problems arise when, following
Hilbert, we have to add axioms of order on straight lines, and then the “post-
Euclidean” axiomatic starts contradicting the properties of the projective
geometry.
But the real change of the Zeitgeist occurred when the hyperbolic geom-
etry was discovered (originally in a synthetic manner by Lobachevsky and
J. Bolyai). Today we can summarize their work as follows: geodesic lines in
the hyperbolic geometry satisfy all Euclid and “post-Euclid” axioms except
for the Fifth postulate. The latter is explicitly violated: for any (geodesic)
line ℓ on the hyperbolic plane and any point A ∉ ℓ there exist infinitely many (geodesic) lines passing through A and parallel to (disjoint from) ℓ. Their angles with the perpendicular dropped from A to ℓ fill a positive interval, depending on the (negative) curvature of the hyperbolic geometry.
27.10. So why should we be fascinated by the discovery of non-Euclidean geometries? First, after these geometries had been discovered and described, they quickly became ubiquitous: many mathematical problems from all walks of life, from number theory to fluid dynamics, turned out to be intrinsically related to non-Euclidean geometries.
But by far the most important application happened in the 20th century
physics. In the early 20th century Albert Einstein and David Hilbert practi-
cally simultaneously and almost independently formulated the equations of
General Relativity. Unlike Newton's (ordinary) differential equations, the Einstein–Hilbert equations are partial differential equations, and what
was most remarkable about them was their geometric nature. In geometric
terms, they assert that the Space of the Universe is non-Euclidean, and its
curvature is proportional to the distribution of the matter in the Universe:
the denser the matter at a given point, the more curved this space is at this point. On the other hand, the matter moves along geodesics of the
corresponding Riemann metric.
Of course, necessary changes should be made to put this vague assertion
into the known context. Unlike 2-dimensional surfaces that we considered,
our Universe is 4-dimensional (with three spatial coordinates and one tem-
poral, the time axis). For 4-dimensional objects the curvature is not a scalar
function of a point (x, y) on the surface, but a 4 × 4-matrix function of the
point (x, y, z, t) in the space-time. Moreover, in another striking difference,
in absence of matter the space is flat, but the inner (scalar) product in it
is not positive: the (relativistic) “distance” from a point (x, y, z, t) to the
origin is given by the formula

    dist^2(O, P) = x^2 + y^2 + z^2 − c^2 t^2,    P = (x, y, z, t),    c the speed of light.
Written in the matrix form, the corresponding symmetric matrix G = {gij}, i, j = 1, . . . , 4, in the flat case will be constant, with three entries +1 on the diagonal and one huge negative number −c^2 in the last place. This means that
the “length” of some curves will be positive, while some other curves will
have imaginary lengths.
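This sign behavior is easy to exhibit; a small sketch (the function name is ours, and the examples set c = 1 so the middle point lands exactly on the light cone):

```python
# The relativistic "distance" squared with signature (+, +, +, -).
# Illustrative sketch; units with c = 1 are used in the examples below.
def interval2(x, y, z, t, c=1.0):
    """dist^2(O, P) = x^2 + y^2 + z^2 - c^2 t^2."""
    return x * x + y * y + z * z - c * c * t * t

print(interval2(1.0, 0.0, 0.0, 0.0))  # 1.0: positive, a real "length"
print(interval2(1.0, 0.0, 0.0, 1.0))  # 0.0: on the light cone
print(interval2(0.0, 0.0, 0.0, 1.0))  # -1.0: negative, an imaginary "length"
```

Curves whose velocity always has a negative "length" squared are exactly the ones along which material bodies can move.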
There is certainly no space-time to go into details, but one thing should be clear. If not for the development of the general concepts of Riemannian geometry (variable metric, geodesic lines, curvature, etc.), neither Einstein nor Hilbert would have been able to guess the equations, which were tested by the most accurate measurements and found to be experimentally confirmed to the last available significant digit.
Thus Gauss, who suspected that the real world might not be exactly
Euclidean (flat), was absolutely right when he measured the angles of (what he thought was) a gigantic triangle formed by three mountain peaks many kilometers apart. After Einstein we know why he could not find any deviation of their sum from 180°: the Earth is so light (and the size of the triangle is so small compared to the Universe) that even today we are unable (with all the progress in measurement instruments) to detect any non-flatness on such scales. All known experimental proofs come from astronomical observations.
APPENDIX A
Problems for the exam
Exam Problem 1. Let U ⊆ Π be the unit circle centered at the point
O ∈ Π. Define two different functions distU : U × U → R and distΠ : U × U → R that would make U a metric space, so that
• distU (A, B) is defined in terms of the arclength of the arcs AB
and BA on U (remember the counterclockwise orientation when
denoting the arcs!);
• distΠ (A, B) is the usual Euclidean distance between A and B on
the plane.
Answer the following questions.
(1) Check that all three axioms of the distance hold for distU . Why
don’t we have to check them for distΠ ?
(2) Find the biggest possible distance between two points on U in both
cases.
(3) Prove that distΠ(A, B) ≤ distU(A, B) for any two points A, B ∈ U. When is the equality achieved?
(4) Find an explicit relationship (formula) expressing distΠ through distU.
(5) Using the proof of Theorem 1.7 as an example, find all isometries
of U with respect to both distances.
(6) Let L ⊆ Π be any closed non-self-intersecting curve of the total
length 2π. Construct a distance function distL (·, ·) on L such that
L is isometric to U with the distance distU .
Exam Problem 2. Given two parametric representations,

    t ↦ (x0 + at, y0 + bt)    and    s ↦ (x′0 + a′s, y′0 + b′s),

find necessary and sufficient conditions guaranteeing that both represent the same line ℓ ⊆ R2.
Exam Problem 3. Derive the equation of a perpendicular to a line ℓ = {ax + by + c = 0} dropped from the point O = (0, 0), assuming that O ∉ ℓ.
(1) Write a parametric equation for ℓ. Denote by B(t) = (x(t), y(t)) the "variable point" on it.
(2) Note that the parametric representation is not unique, see Remark
(1.23). Show that we can always find a parametrization such that any given point A ∈ ℓ will correspond to t = 0.
(3) Show that |OB|^2 = dist^2(O, B(t)) is a quadratic polynomial in t. Write it explicitly: its coefficients will depend on a, b, x0, y0.
(4) When does a polynomial p(t) = αt^2 + βt + γ achieve its minimum at the point t = 0?
(5) Prove that A = B(0) = (x0, y0) ∈ ℓ, corresponding to t = 0, is the minimum point of dist^2(O, B(t)) if and only if ay0 − bx0 = 0.
(6) Show that any line ℓ′ = {a′x + b′y + c′ = 0} is perpendicular to ℓ (through any of its points) if and only if aa′ + bb′ = 0.
(7) This condition gives more: it also defines a perpendicular to ℓ in the case where O ∈ ℓ.
(8) Show that the coordinate axes {x = 0} and {y = 0} are orthogonal.^1
Exam Problem 4. Consider a point A = (0, 1).
(1) Find equations of two lines ℓ1, ℓ2 passing through A and orthogonal to each other, such that their intersections B1, B2 with the horizontal axis {y = 0} are symmetric: B1 = (x1, 0) and B2 = (x2, 0) with x1 = −x2.
(2) What are the angles of the triangle △B1AB2?
(3) Describe all lines which form the angle π/4 with the horizontal axis.
Of course, you know the answer from school, but try to derive it
algebraically using the formula for Cartesian distance.
(4) Find two points C1 = (x′1, 0), C2 = (x′2, 0) on the axis {y = 0} such that dist(C1, A) = dist(C2, A) = dist(C1, C2) (prove first that x′1 + x′2 = 0).
(5) Find equations of all lines ℓ′ forming the angle π/3 = 60° with the horizontal axis.
(6) Use a similar construction to determine all lines forming the angle
π/6 = 30◦ with the horizontal axis.
(*) Find an equation of lines forming the angle π/9 = 20◦ with the
horizontal axis.
Exam Problem 5.
1. Motivation. We have seen many times how convenient it is to have a parametric representation of lines on the plane. Can one find a similar parametrization for other curves? For instance, for the circle C = {x^2 + y^2 = 1}? The naive answer is tempting but essentially useless: ψ : t ↦ (cos t, sin t) does the job. But the answer
^1 This is not very surprising: we introduced the explicit formula (7.1) for the Cartesian
distance, based on the Pythagoras theorem valid only for rectangular triangles. Yet it is
always good to make sure that, making a long path, we arrived at exactly the same point,
not acquiring and not losing anything on the way.
Figure 37. Rational parametrization of the unit circle
does not help us much until we derive all properties of the trigonometric functions
sin x and cos x, which is a long story. On the other hand, we know that there are
no periodic polynomials or rational functions (why?), so what can we do? Here is
a very elegant trick which almost solves the problem.
2. The problem proper. Let U = {x^2 + y^2 = 1} be the unit circle and A = (−1, 0) ∈ U the point on it.
(1) Write the general equation of a line ` through A in the parametric
form x = x(t), y = y(t), t ∈ R.
(2) Find the equation of the line `s through A, which crosses the axis
I = {x = 0} ' R at the point (0, s), s ∈ R.
(3) Find the intersection Ps = `s ∩ U of this line with U and prove that
the coordinates (xs, ys) of Ps are rational functions of s ∈ R. Why could this be somewhat unexpected? (Hint: to find the intersection
of `s with U, one has to solve a quadratic equation with respect to
t, the parameter along the line.)
(4) Prove that any point on U, except for the point A itself, is the
intersection point Ps for a suitable value of s ∈ R.
(5) Prove that the map ϕ : I → U ∖ {A}, which sends (0, s) ∈ I to Ps ∈ U, i.e., ϕ(s) = (xs, ys), is a rational (1:1)-parametrization of the punctured circle U ∖ {A}. Write this map explicitly.
(6) Prove that if s ∈ Q is a rational number, then both coordinates of Ps are rational numbers (we call such points rational), and vice versa, any rational point on U (except for A) is of the form ϕ(s), s ∈ Q.
(7) Find all rational points on U.
(8) Find all Pythagorean triples, i.e., integer numbers satisfying the identity k^2 + n^2 = m^2.
(9) What should be done to find all rational points on any nondegen-
erate conic section (ellipse, parabola, hyperbola)?
Exam Problem 6. Let V be a finite-dimensional vector space (over
R), and {0} = L0 , L1 , . . . , Ln = V be any sequence of subspaces that are
embedded into each other as in (12.2). We do not assume that this sequence
is generated by a selection of vectors 0 ≠ v1, v2, . . . , vn−1, vn.
The chain (12.2) is called tight (the term is provisional, don’t look for
it on the Web), if between any two consecutive subspaces Lk ⊊ Lk+1 one cannot squeeze any intermediate subspace W such that

    Lk ⊊ W ⊊ Lk+1
(the definition mimics the property of natural numbers: there is no natural
number x ∈ N such that k < x < k + 1).
Prove the following statements.
(1) Prove that any tight chain L1 , . . . , Ln can be generated by a se-
quence of vectors vk such that Lk+1 = Lk + R · vk (Hint: use
Problem 2.7).
(2) Prove that each set of such vectors v1, . . . , vn is linearly independent.
(3) Prove that for any linearly independent system v1, . . . , vk of vectors in V (not necessarily the maximal one with k = n) and any vector v ∈ V the representation

    v = λ1 v1 + · · · + λk vk,    λ1, . . . , λk ∈ R,    (0.3)

if it exists, is unique.
(4) Prove that the dimension of a finite-dimensional space V is the maximal number of linearly independent vectors in V.
(5) Prove that for any two maximal systems of linearly independent vectors v1, . . . , vn and w1, . . . , wn there exists a unique linear map A : V → V such that A(vi) = wi, and this map is an isomorphism between V and itself.
Exam Problem 7. Consider the sphere S2 ⊆ R3, its upper hemisphere S2+ = S2 ∩ {z > 0}, and the vertical projection π : S2+ → D onto the unit disk D = {x^2 + y^2 < 1} acting as π(x, y, z) = (x, y).
(1) Prove that this projection is one-to-one.
(2) Prove that this projection can be applied also to vectors tangent
to S2+ so that for a vector v ∈ TP S2+ its image is a vector w ∈
Tπ(P ) D = Tπ(P ) R2 .
(3) Write an explicit formula relating the lengths |v| and |w|. The
answer should depend on π(P ) = (x, y) ∈ D and the direction of
the vector w on the horizontal plane.
(4) Define the Riemann metric ⟨·, ·⟩_(x,y) on T_(x,y) D in such a way that the projection sending v to w will be an isometry between the two Euclidean spaces T_(x,y,z) S2+ and T_(x,y) D.
(5) Is the Riemannian metric ⟨·, ·⟩_(x,y) conformal? Plot "small circles" of this metric on the unit disk.
Exam Problem 8. Consider the projective plane with the homogeneous
coordinates (x : y : z) and an affine window on it with the coordinates
X = x/z, Y = y/z. Let Q = {Y = X^2} be the standard parabola in these coordinates.
(1) Write down the equation of this parabola in the homogeneous co-
ordinates.
(2) Find the affine equations of the same parabola in the two other "canonical" affine windows.
(3) Formulate a general statement describing what you see (a quadratic
curve in an affine window) in terms of the relative position of the
curve and the infinite line (which depends on the choice of the affine
window).
(4) Which quadratic curves do not look like ellipses, parabolas or hy-
perbolas in any chart?
Hint: a quadratic homogeneous polynomial q(x, y, z) in three variables (x : y : z) can be reducible, i.e., factor as a product of two linear homogeneous polynomials, q = ℓ1 ℓ2, ℓi(x, y, z) = ai x + bi y + ci z, i = 1, 2. Or it can be irreducible.
(5) A general quadratic homogeneous polynomial q(x, y, z) is defined
by a symmetric 3 × 3 matrix Q. Such a matrix can be reduced to the diagonal form (see Theorem 2.17 in §18.3). Describe all types of quadratic curves from the projective point of view (how do the curves corresponding to different combinations of the coefficients ±1 and 0 look?).
Exam Problem 9. Consider the north pole N = (0, 0, 1) ∈ S2 and the
Euclidean plane
Π = {z = −1} = TS S2 ⊆ R3
tangent to the south pole S with the coordinates (x, y) on Π. Denote by
π : S2 ∖ {N} → Π the projection that sends a point P ∈ S2 to the point of intersection of the line NP ⊆ R3 with Π (this map is called the stereographic projection from the North pole).
(1) Prove that π is bijective.
(2) Prove that the image of any circle (not necessarily the large one)
that passes through N is a straight line on Π.
(3) Prove (or read somewhere the proof, as it is more difficult than usual) the following fact: the image of any other circle C ⊆ S2, not passing through N, is a circle on Π.
(4) Consider the similar projection π′ from the South pole S = (0, 0, −1) to the plane Π′ = {z = 1} = TN S2 with the coordinates (x′, y′). Compute the composition

    π′ ∘ π^(−1) : Π → S2 → Π′.

Hint. Use the polar coordinates (r, ϕ) and (r′, ϕ′) on Π and Π′.
(5) Write the above composition in the complex coordinates z = x + iy on Π and w = x′ + iy′ on Π′.
(6) Prove that the upper hemisphere S2 ∩{z > 0} (without the equator)
is in the canonical one-to-one correspondence with an open subset
H ⊆ P2. How does this subset H look in the three canonical affine charts on P2?
The pair of charts (windows) Π, Π′ covers the whole sphere S2. It has an advantage over the affine charts: we need only two of them to cover the sphere; moreover, in each chart only one point (the respective pole, North or South) is invisible ("at infinity"). However, it is less convenient in that geodesic lines are seen in these charts not as Euclidean lines, but as circles.