Sergei Yakovenko's blog: on Math and Teaching

Sunday, November 23, 2025

Lectures 4, 5 (Nov 11 and 18, 2025)

We completed the construction of the set of real numbers by completing the set of rational numbers \mathbb Q, adding solutions of all possible infinite systems of two-sided inequalities l \leqslant x \leqslant r, \ l\in L,\ r\in R, where (L,R) is a partition of all rational numbers into two non-empty sets so that each one of the above inequalities defines a nonempty set (eventually, a point). This is possible since the rationals \mathbb Q carry the natural order <.
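As an illustration (this particular partition is a standard textbook example and an assumption here, not quoted from the lecture notes), the system of inequalities that produces \sqrt 2 can be written using rational arithmetic only:

```latex
L=\{q\in\mathbb Q:\ q\leqslant 0\ \text{or}\ q^2<2\},\qquad
R=\{q\in\mathbb Q:\ q>0\ \text{and}\ q^2>2\}.
```

Every l\in L is smaller than every r\in R, and the two sets together exhaust \mathbb Q. No rational x satisfies all the inequalities l\leqslant x\leqslant r at once, since such an x would have to satisfy x^2=2; the new real number fills exactly this gap.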

The set of real numbers \mathbb R “seals” all gaps between rational numbers, guaranteeing that \sqrt 2, \pi, e, etc. are in it. The arithmetic operations on real numbers extend the corresponding rational operations, performed on the respective inequalities, and the order < can also be extended to the real numbers. The most surprising feature is that the set of the reals \mathbb R (the “holes between the rationals”) is much bigger (in a rather precise sense) than the set of the rationals themselves. Most of the real numbers are invisible in the sense that there is no way (even in theory) to list them all. Think of all infinite decimal fractions whose digits are randomly generated: if we talk about true randomness, then any hope of describing them effectively disappears. The real gain we achieve from the construction of the real numbers is a limited warranty: any “number” that we can constructively think of exists in the new universe, without any special effort needed to prove its existence.

The Cartesian powers \mathbb R^2,\mathbb R^3,\dots representing the geometric 2-plane, 3-space, etc., do not carry a linear order, so we have to replace it by the closely related notion of a distance between points of these powers. The distance is a symmetric nonnegative function of two arguments that satisfies the triangle inequality.

Once the distance is defined, we can talk about (round, open) balls of the form \{x\in\mathbb R^n:\ \mathrm{dist}(x,a)<\varepsilon\}, \varepsilon >0, \ a\in \mathbb R^n of the given radius and center. This paves the way to introducing the first notions of topology, those of open and closed sets. By definition, a set A\subseteq \mathbb R^n is open, if together with each point a\in A it contains a round open ball centered at a.

The rest of the class was devoted to developing an accurate formalization of the intuitive idea of continuity of a map f:A\to\mathbb R^m, \ A\subseteq\mathbb R^n. The informal idea of continuity of f at a is that for any open round ball B_\varepsilon of radius \varepsilon >0, no matter how small (but positive!) \varepsilon is, around the image b=f(a) one can find a small open round ball A_\delta of radius \delta >0 around a so that f(A_\delta)\subseteq B_\varepsilon (of course, wherever defined). This notion appears first as a local one (continuity at a point), but can be globalized (continuity on the entire domain).
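Written out with quantifiers, the description above takes the following form. This is a sketch in the standard notation, assuming that |\cdot| denotes the distance in both \mathbb R^n and \mathbb R^m:

```latex
f \text{ is continuous at } a\in A
\iff
\forall \varepsilon>0\ \ \exists \delta>0\ \ \forall x\in A:\quad
|x-a|<\delta \implies |f(x)-f(a)|<\varepsilon .
```

The global version is obtained by placing \forall a\in A in front of the formula; the radius \delta is then allowed to depend on both a and \varepsilon.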

Lecture notes (accumulated): https://drive.google.com/file/d/1knEouBLBIGD61dyYLU5NKx8MOHTEhziq/view?usp=sharing

Saturday, November 8, 2025

Analysis for high school teachers, 2025/2026. First three lectures (Oct 21, 28 and Nov 3, 2025).

Hello, first grade! Here is the first installment of the lecture notes and zoom recordings for the course. I apologize for the technical problems that occurred in the first three lectures, and hope that we will eventually overcome all these problems.

The subject of these first lectures was an initiation to the language of the (naive) set theory which today serves as the only spoken/written language of mathematics. Originally this language has the only undefined notion of a set which consists of elements and uses, in addition to letters denoting sets, the unique symbol (“predicate”) \in: the notation A \in B means that a set denoted by A is an element of another set denoted by B. We say that the two sets are equal if and only if they consist of the same elements. There is a unique set \varnothing which has no elements at all.

The predicate \in should not be confused with the symbol (predicate) \subseteq: we say that a set B is a subset of another set C and write B\subseteq C, if any element A\in B is also an element of C: \forall A\ A\in B \implies A\in C. By definition, \varnothing is a subset of any other set. Having said that, we define the union A\cup B and intersection A\cap B of any two sets. It is important to stress that there is no problem with defining infinite unions and intersections!

We discussed how these basic operations can be used to introduce the most fundamental notion, the set of natural numbers \mathbb N=\{1,2,3,\dots\} which may or may not include the number 0 (this is a convention). The set of Peano axioms which starts, say, with two elements 0,1 and for any element n introduces a unique element n^+, the successor of n (so that 0^+=1), allows us to define the set of natural numbers \mathbb N “inductively” (imitating the human process of counting) and immediately leads us to the conclusion that \mathbb N must be infinite.

The possibility of counting using the natural numbers naturally gives rise to the two arithmetic operations, addition and multiplication: addition is a result of repeated (iterated) counting, while multiplication is an iterated addition. These operations are partially invertible, that is, the equations a+x = b and a\times x =b are sometimes solvable with respect to x in the natural numbers. Yet the set of naturals \mathbb N may be extended first to the set \mathbb Z of integer numbers, closed under subtraction, and then to the set \mathbb Q of rational numbers, in which any equation of the form a\times x+b=c is always solvable if a\ne 0 (such a set is called a field with the four arithmetic operations) and which can be equipped with the total (linear) order >.

Yet the set of rational numbers, although sufficient for all arithmetic operations, is not sufficient for geometric applications. For instance, it does not contain a solution of the equation x^2 =2 expressing the length of the diagonal of a unit square. In the same way the circumference 2\pi of the unit circle is also not a rational number. To cope with this, we need to extend the rational numbers even more, this time adding to them solutions of all (infinite) systems of two-sided inequalities of the form a\leqslant x \leqslant b with a,b\in\mathbb Q. In this way we construct the set of real numbers \mathbb R.

The first set of lecture notes is available at the link: https://drive.google.com/file/d/1U0I0CQHaDHH1fDlqRhy5RoBlXLzliFRe/view?usp=sharing

Later I will share with you the link to the recorded lecture. Enjoy!

Sunday, May 18, 2025

Lectures 6-7 (April 22, 29)

Laws of Large Numbers

In these two lectures we discussed the Central Limit Theorem (de Moivre–Laplace theorem), which claims that for a sequence of independent identically distributed random variables with finite expectation \mu and finite positive variance \sigma^2, the normalized sum \dfrac{\sqrt n}{\sigma}\biggl(\frac1n\sum\limits_{i=1}^n X_i-\mu\biggr) converges in distribution to the so called standard normal distribution with the density function \frac1{\sqrt{2\pi}} \mathrm e^{-\frac{x^2}2}.
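The statement can be probed numerically. The following is a minimal sketch, not part of the lecture notes: the choice of the uniform distribution on [0,1] (with \mu=1/2 and \sigma^2=1/12), the sample sizes, the seed and the function names are all assumptions made for the illustration.

```python
import math
import random
import statistics


def normalized_sum(n, rng):
    """Return sqrt(n)/sigma * (sample mean - mu) for n Uniform(0,1) draws."""
    mu = 0.5                       # expectation of Uniform(0,1)
    sigma = math.sqrt(1.0 / 12.0)  # standard deviation of Uniform(0,1)
    sample_mean = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(n) / sigma * (sample_mean - mu)


def simulate(trials=2000, n=200, seed=0):
    """Generate `trials` independent copies of the normalized sum."""
    rng = random.Random(seed)
    return [normalized_sum(n, rng) for _ in range(trials)]


zs = simulate()
# For a standard normal variable the mean is 0, the variance is 1,
# and about 95% of the mass lies in the interval [-1.96, 1.96].
print("mean     :", statistics.mean(zs))
print("variance :", statistics.pvariance(zs))
print("inside   :", sum(1 for z in zs if abs(z) <= 1.96) / len(zs))
```

The printed values should come out close to 0, 1 and 0.95 respectively; a simulation of this kind illustrates the theorem but of course proves nothing.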

The lecture notes are available here

Saturday, April 5, 2025

Lecture 5 (April 1, 2025)

Filed under: lecture,Rothschild course "Probability" — Sergei Yakovenko @ 9:16

Expectation and variance

For a random variable X:\Omega\to\mathbb R we introduced two very important numeric characteristics, expectation and variance.

The expectation \mathrm E X is (in the simplest setting) the weighted sum of the values which the variable X takes, with weights equal to the corresponding probabilities; it is a finite number or can be \pm\infty. If the expectation is zero, this means that the positive and negative values of X, weighted by their probabilities, balance each other. The expectation depends linearly on X.

The variance \mathrm D X describes the spread of a random variable around its expected value \mathrm E X. If the latter is zero, then the variance is a nonnegative number (possibly 0 or +\infty) equal to the expectation of the square \mathrm E X^2. Small variance means that X deviates substantially from its expected value only rarely.
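For a variable with finitely many values both characteristics can be computed directly from the definitions. Here is a small sketch with exact rational arithmetic; the fair die and the helper names `expectation` and `variance` are assumptions chosen for the illustration, not taken from the lecture notes.

```python
from fractions import Fraction


def expectation(dist):
    """Weighted sum of the values, weights being the probabilities."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Expectation of the squared deviation from the expected value."""
    mean = expectation(dist)
    return sum((x - mean) ** 2 * p for x, p in dist.items())


# A fair die: the values 1..6, each taken with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

print(expectation(die))  # 7/2
print(variance(die))     # 35/12

# Shifting by the expectation gives a variable with zero expectation,
# whose variance is simply the expectation of its square.
centered = {x - expectation(die): p for x, p in die.items()}
print(expectation(centered))                         # 0
print(sum(x ** 2 * p for x, p in centered.items()))  # 35/12
```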

The lecture notes are available here, enjoy.

Sunday, March 30, 2025

Lectures 3-4 (March 18, 25) 2025

Filed under: Rothschild course "Probability" — Sergei Yakovenko @ 2:39

Conditional probability

This is one of the rather tricky constructions in Probability theory. Its formal definition is very simple: if we impose an additional assumption on the probability problem, then a new problem arises: some elementary events are excluded by the assumption, thus in the expression which “defines” the probability as the ratio of the number of all “favorable” cases to the number of all “possible” cases, both numerator and denominator are changed (made smaller).

However, the intuition linking the original and the conditional problems is by no means that straightforward. Several “paradoxes” are discussed.
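The counting recipe above can be tried on a small example. The sketch below is an illustration only: the two-children puzzle is a classical one and is not necessarily among the paradoxes discussed in the lecture, and the function name `conditional_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(outcomes, event, condition):
    """P(event | condition) for equally likely outcomes, by counting."""
    possible = [w for w in outcomes if condition(w)]
    if not possible:
        raise ValueError("the conditioning event has probability zero")
    favorable = [w for w in possible if event(w)]
    return Fraction(len(favorable), len(possible))


# Families with two children; the four outcomes are assumed equally likely.
families = list(product("BG", repeat=2))


def both_boys(w):
    return w == ("B", "B")


def at_least_one_boy(w):
    return "B" in w


def first_is_boy(w):
    return w[0] == "B"


print(conditional_probability(families, both_boys, at_least_one_boy))  # 1/3
print(conditional_probability(families, both_boys, first_is_boy))      # 1/2
```

The two conditions sound almost identical, yet they exclude different elementary events, so the answers differ.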

Random variables

Random variables are simply real valued functions X:\Omega\to\mathbb R on a given probability space, which are “measurable”: for each real interval U its preimage X^{-1}(U) must be an admissible (random) event.

Each random variable X is completely described by its distribution function F(x)=\mathrm P\{\omega: X(\omega)\leqslant x\}. If necessary, we add indication of the random variable and write F_X(x).
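For instance (an illustrative example, assuming a fair coin with X=1 for heads and X=0 for tails), the distribution function is a step function with two jumps:

```latex
F_X(x)=
\begin{cases}
0, & x<0,\\
\tfrac12, & 0\leqslant x<1,\\
1, & x\geqslant 1.
\end{cases}
```

The jumps occur exactly at the values taken by X, and the size of each jump is the probability of the corresponding value.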

The lecture notes are available here, enjoy.

Monday, March 17, 2025

Probability-2025: first two lectures

Filed under: Uncategorized — Sergei Yakovenko @ 2:28

I still struggle with my home internet, but here is the link to the first 17 pages of the lecture notes. The subsequent lectures will appear here in due time.

The first lecture was about the nature of the Probability theory. The second was dedicated to the first steps of its rigorous mathematical formalization.

Monday, March 25, 2024

Analysis for High School Teachers 2023/2024: Exam

Filed under: lecture,Rothschild course "Analysis for high school teachers" — Sergei Yakovenko @ 6:35

The exam problems, accompanied by some auxiliary matter (necessary definitions, remarks, hints), are available at this link.

It is a take-home exam and you have one month to solve the problems and submit solutions. More details in the file. If you have any questions, please leave them as comments below or mail to Peleg/Sergei.

Good luck and happy Purim!

Saturday, February 10, 2024

Intermezzo (summary of several lectures)

For various reasons I failed to publish the summaries of the lessons for the past few weeks. In these lessons we mainly addressed the idea of continuity of a function of one or several real variables (both globally on the entire domain of its definition A\subseteq\mathbb R^n and locally near a point a\in A). The intuitive notion of continuity is very simple: the images f(a),f(b)\in\mathbb R^m of two close points a,b\in A\subseteq\mathbb R^n in the domain are close to each other in the target space of a function f:A\to\mathbb R^m. Yet this formulation lacks quantifiers, and a careful look at how they can be placed reveals several close but not identical definitions (e.g., continuity at every point of the domain is weaker than the uniform continuity on the domain).

Having spelled that out, we can address the notion of a limit, which traditionally precedes the notion of continuity and is believed to be very technical. Yet in practice the existence of a limit (say, of a function f:A\to\mathbb R^m at a point a\notin A) is nothing more than the possibility of extending the “natural” domain of this function (given, say, by an explicit formula) by assigning the value f(a) so that the extended function retains continuity on the larger set A\cup\{a\}. Usually such a procedure is used to “avoid” division by zero in expressions like f(x,y)=\displaystyle \frac{p(x)-p(y)}{x-y} on the diagonal x=y or g(x)=\displaystyle \frac{\sin x}{x} at the origin, which will appear when introducing derivatives later.
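A worked instance of such an extension, with the polynomial p(x)=x^2 chosen here only for illustration:

```latex
p(x)=x^2:\qquad
f(x,y)=\frac{x^2-y^2}{x-y}=x+y\quad (x\ne y),
\qquad
\lim_{(x,y)\to(a,a)}f(x,y)=2a .
```

Assigning the value f(a,a)=2a therefore extends f from the complement of the diagonal to the entire plane so that the extended function is continuous.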

Continuity of functions is a rather strong property, especially when their domains of continuity possess certain properties. Yet to unleash the full power, it is convenient to raise the abstraction level one step more and talk about general topological spaces, not just the Euclidean spaces \mathbb R^n or their proper subsets. This is where topology with its specific language comes to help, and one can introduce and study various properties of continuous functions in the general context. Thus, for instance, one can describe “bounded” (compact) sets in the absence of any distance function, and prove that this class is closed under the action of continuous functions (maps). In a similar way one can formalize the notion of a connected set (meaning that this set consists of a “single piece”) and show that continuous functions cannot destroy connectedness by “tearing apart” such sets. The proofs in this “abstract context” are very simple, often one-liners, and may seem to be shallow play with abstract words, but it was the cautious crafting of numerous definitions that made the proofs easy, and in practical applications one needs to verify that all functions and sets satisfy exactly the appropriate definition. Henri Poincaré once quipped that any mathematical truth is born as a paradox and ends up as a triviality.

After playing around with continuity we shifted to the study of differentiability. Note that the notion of continuity of a function f(x) at a point a can be interpreted as a fact that this function can be reasonably approximated by the constant function c(x)\equiv c= f(a)\in\mathbb R^m: the closer a point x approaches a, the smaller is the error of the approximation |f(x)-c(x)|=|f(x)-f(a)|. Of course, this completely transparent phrase of a human language requires quantifiers to measure “proximity” and “smallness” and how one implies the other.

Constant functions (i.e., real numbers or vectors in the case of functions of several variables) form a very simple class of functions, yet they can be used to study some properties of continuous functions. One is naturally tempted to look for larger classes of functions which would be, on the one hand, easy to operate with and, on the other hand, would give finer and more detailed information about functions that admit approximation by these functions.

This class is called affine (sometimes linear, see below) functions. The origin of this notion is in the Algebra, more precisely, in the so called Linear Algebra. A linear (or vector) space over a field \mathbb R or \mathbb C is a set V equipped with two operations, the vector sum/difference \pm, a binary commutative invertible operation (making V into a commutative group) and the multiplication by the scalars (numbers) from the field. Of course, the distributive and associative laws are assumed. The standard examples are (arithmetic) vector spaces \mathbb R^n,\ n=\dim V\geqslant 1.

A function L:V\to W between two (in general, different) vector spaces is called linear, if it “respects” both linear operations. In particular, it must map the zero vector of V to the zero vector of W. If \dim V=\dim W=1, then all linear functions have the simplest form L(x)=\lambda x for some number \lambda\in\mathbb R (possibly zero). In the general multidimensional case a linear map L:\mathbb R^n\to\mathbb R^m is determined by mn real numbers which are naturally arranged in the form of a matrix, an m\times n-table.

However, the class of linear maps is not sufficiently large. For instance, the shift map T=T_c:\mathbb R^n\to\mathbb R^n, \ T_c(x)=x+c, is not linear (T(0)\ne 0 unless c=0, in which case the shift becomes the dull identity map). Thus we need to make one last step and consider the smallest class of maps that contains all linear maps and all shifts (parallel translations) and is closed under composition. This class is called the class of affine maps: reducing similar terms, one can show that an affine map A:\mathbb R^n\to\mathbb R^m is a composition, A=TL, where L is a linear map and T a translation in the target space. Explicitly, we have Ax=Lx+c (“linear non-homogeneous function”). Note that there is a well-established tradition not to enclose the argument of linear maps in parentheses and write Ax rather than A(x): this is because one can treat Ax as a result of some binary operation, “matrix multiplication” of a matrix A and a column vector x.
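The “reduction of similar terms” can be made explicit. As a short check (the indices 1 and 2 are introduced here only for the illustration), the composition of two maps of the form Lx+c is again of the same form:

```latex
A_1x=L_1x+c_1,\quad A_2y=L_2y+c_2
\ \Longrightarrow\
A_2A_1x=L_2(L_1x+c_1)+c_2=(L_2L_1)x+(L_2c_1+c_2).
```

The linear part of the composition is the composition L_2L_1 of the linear parts, while the translation vector is L_2c_1+c_2.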

How can one construct explicitly an affine approximation for a function f:\mathbb R^n\to\mathbb R^m? Let us start from the simplest case m=n=1 (functions of one real variable). If f were already an affine function, then we would have \forall x\in\mathbb R\quad f(x)=\lambda x +c for some two constants \lambda,c\in\mathbb R, which could be found using the values of f(x),f(a) at any two different points x\ne a\in\mathbb R by the formulas

\lambda = \displaystyle \frac{f(x)-f(a)}{x-a}, \qquad c= f(a)-\lambda a=f(x)-\lambda x.

Note that in the affine case the values of \lambda,c do not depend on the choice of the two points x,a.

Proposition. Assume that for a function f as above, and for some point a\in\mathbb R there exists the limit \lambda=\displaystyle\lim_{x\to a}\frac{f(x)-f(a)}{x-a}\in\mathbb R. Then the function f admits an affine approximation by the function A(x)=\lambda(x-a)+c, \quad c=f(a), in the following sense: the relative error of approximation E(x)=\displaystyle\frac{|f(x)-A(x)|}{|x-a|} tends to zero as x\to a, \displaystyle\lim_{x\to a}E(x)=0.

Definition. In this case the function f is said to be differentiable at the point a. The real number \lambda is called the derivative of f at a, and the linear map L(v)=\lambda v the differential of f at a.

Note that we denoted the argument of the differential by a new letter v on purpose. If f is differentiable at every point a of its domain, then the derivative \lambda and the differential will depend explicitly on a. Thus from a given function f(x) we have “derived” the derivative function (usually denoted by a\mapsto f'(a)), while the differential will become a function of two independent arguments a,v (usually denoted by df(a)v without extra braces, or simply df, omitting both arguments).
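A worked example, with the function f(x)=x^2 chosen here purely for illustration, shows all these objects at once:

```latex
\begin{aligned}
f(x)&=x^2:\qquad
\lambda=\lim_{x\to a}\frac{x^2-a^2}{x-a}=\lim_{x\to a}(x+a)=2a,\qquad
A(x)=2a(x-a)+a^2,\\
E(x)&=\frac{|x^2-2a(x-a)-a^2|}{|x-a|}=\frac{|x-a|^2}{|x-a|}=|x-a|\to 0
\quad\text{as } x\to a .
\end{aligned}
```

Thus the derivative function is a\mapsto f'(a)=2a, and the differential is df(a)v=2av, linear in v for each fixed a.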

There are no general reasons for functions to be differentiable, let alone to be differentiable at all points of their domain of definition: it is a strong condition. In some sense, “most” functions are nowhere differentiable. For instance, the stock exchange rates give an example of such functions, and in fact measurements of almost all microscopic physical quantities also behave like that.

Yet in an absolutely surprising way, the functions most important for the description of our World turn out to be differentiable everywhere or almost everywhere. Polynomials, algebraic, trigonometric, exponential functions, … all are differentiable, possibly except for some singular points. Moreover, the Laws of Nature, as discovered by Newton, are written in the language of Differential Equations, identities connecting unknown functions and their derivatives. And even the equations of General Relativity have the form of differential equations, though these equations are partial (i.e., involve partial derivatives of the unknown functions). Yet this subject is way aside from our main line of exposition.

Supplementary material

Here are links to the transparencies from the several past lectures.

Monday, January 15, 2024

Lectures 5-6, Jan 9-16,2024

The simplifying language: Topology

Constructions involving numerous quantifiers are difficult to grasp. A good theory splits such constructions, introducing appropriate notions and building a suitable language. Let A\subseteq\mathbb R^n be a subset in the Euclidean space and \mathrm{dist}(x,y) a metric on it which we (only for simplicity!) will assume translation invariant and denote |y-x|=|x-y|. Everywhere below B_r(a) will denote the open ball of radius r >0 centered at a point a\in\mathbb R^n: B_r(a)=\{x\in\mathbb R^n:|x-a|<r\}.

Definitions. A point a\in A is called an interior point, if \exists r>0 such that the open ball B_r(a)=\{x\in\mathbb R^n: |x-a|<r\}\subseteq\mathbb R^n lies in A, i.e., B_r(a)\subseteq A. This is the same as saying that \exists r>0\ \forall x\in\mathbb R^n\ |x-a|<r\implies x\in A.

A point b\notin A is called an exterior point, if it is interior for the complement \mathbb R^n\smallsetminus A. The points that are neither interior nor exterior are called the boundary points of A.

The sets of all interior and boundary points of A are denoted \mathrm{int}\, A and \partial A respectively.

A point a\in \mathbb R^n is called an accumulation point (for A), if \forall \varepsilon >0\ \exists x\in A such that |x-a|<\varepsilon. The set of all accumulation points for A is called its closure and denoted by \mathrm{clos}\,A. Sometimes the notation \overline{A} is used for the closure.

Exercise. Prove that \mathrm{int}\, A\cup\partial A=\mathrm{clos}\,A. Prove that the exterior of A is \mathbb R^n\smallsetminus \mathrm{clos}\, A.

Exercise. Prove that the notions of interior, exterior and boundary do not depend on the choice of the distance function in \mathbb R^n.

Definitions. A subset A\subseteq \mathbb R^n is called open, if it coincides with its own interior, A=\mathrm{int}\,A. The subset is closed, if it coincides with its own closure A=\mathrm{clos}\,A.

Exercise. Assume that n=1 and A=[0,1)=\{0\leqslant x<1\}\subseteq\mathbb R. Describe the interior, closure and boundary of this segment. Is it open? closed? neither?

Exercise. Show that the complement of an open subset is closed and vice versa, the complement of a closed subset is open. Are there subsets other than \mathbb R^n and \varnothing that are both open and closed?

Theorem 1.

  1. A=\mathbb R^n and A=\varnothing are both open and closed simultaneously.
  2. Union of any family (infinite or even uncountable) of open sets is open.
  3. Finite intersection of open sets is open.
  4. Intersection of any family (infinite or even uncountable) of closed sets is closed.
  5. Finite union of closed sets is closed.
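The finiteness assumptions in items 3 and 5 cannot be dropped. Two standard counterexamples on the real line (added here as an illustration, they are not part of the statement of the theorem):

```latex
\bigcap_{n=1}^{\infty}\bigl(-\tfrac1n,\tfrac1n\bigr)=\{0\},
\qquad
\bigcup_{n=1}^{\infty}\bigl[\tfrac1n,1\bigr]=(0,1].
```

The one-point set \{0\} is not open, although it is an intersection of open intervals, and the half-open interval (0,1] is not closed, although it is a union of closed segments.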

Continuity as a topological notion

Consider first the case of maps (functions) defined on the entire Euclidean space, f:\mathbb R^n\to\mathbb R^m.

Temporary Definition. A map as above will be called an O-map, if the preimage of any open set V\subseteq\mathbb R^m is an open subset U\subseteq\mathbb R^n.

Lemma 1. A map f continuous at all points of \mathbb R^n, is an O-map.

Proof. Consider any open set V\subseteq\mathbb R^m and its preimage U=f^{-1}(V)\subseteq\mathbb R^n. Let a\in U be any point in this preimage: by definition, this means that b=f(a)\in V. Since V is open in \mathbb R^m, there exists a ball B_\delta(b) of positive radius \delta>0 which lies in V. By continuity of f at a, there exists \varepsilon >0 such that |x-a|<\varepsilon\implies |f(x)-b|<\delta, that is, all points of the ball B_\varepsilon(a) are mapped inside B_\delta(b), hence inside V. Therefore the preimage U=f^{-1}(V) together with the point a contains a small ball around a, that is, a is an interior point for U. Since a was chosen arbitrarily, this means that all points of U are interior points, hence U is open. Since V was chosen arbitrarily, we have proved that f is an O-map. Q.E.D.

Lemma 2. An O-map is continuous at every point a\in \mathbb R^n.

Proof. Let a\in\mathbb R^n be an arbitrary point, and denote b=f(a)\in \mathbb R^m. Consider an arbitrary open ball V=B_\delta(b) of positive radius \delta>0. To prove the continuity of f, we need to find an open ball B_\varepsilon(a) around a such that its f-image is inside V. But since V is open, its preimage U=f^{-1}(V) is also open in \mathbb R^n by the definition of an O-map applied to f. The openness of U means that each of its points, in particular, the point a, is interior for U and hence the ball B_\varepsilon(a) with the required property exists. Q.E.D.

These two lemmas together prove that at least for maps whose domain is the entire Euclidean space, the property “Preimages of open sets are open” (as stated in the Temporary Definition) is fully equivalent to the property of being continuous on the entire domain.

How should this result be modified for maps f:A\to\mathbb R^m whose domains are only proper subsets of the Euclidean space, A\subseteq \mathbb R^n? The answer is simpler than you might imagine. You don’t need to modify the definition of O-maps, you need to twist the definition of open sets, making it relative to the arbitrary domain A of definition of the map f.

Definition. Let A\subseteq\mathbb R^n be an arbitrary (not necessarily open) subset of the Euclidean space. A subset U\subseteq A is called open relative to A, if there exists an open (in the original sense) subset U'\subseteq \mathbb R^n such that U=U'\cap A.
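A simple example on the real line (chosen here for illustration) shows how the relative notion differs from the absolute one:

```latex
A=[0,1]\subseteq\mathbb R,\qquad
U=[0,\tfrac12)=(-1,\tfrac12)\cap A .
```

The set U is open relative to A, although it is not open in \mathbb R: the point 0 is not an interior point of U in the original sense.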

One can immediately and easily check (by just passing from open sets to the relatively open sets obtained by intersection with any subset A) that:

  • Theorem 1 above remains valid in the relative sense, with the only required correction that the “absolute” A should replace \mathbb R^n as a set which is both relatively open and relatively closed.
  • The proofs of both Lemmas 1 and 2 remain literally true if we replace the (ordinary, absolute) openness by the openness relative to A: indeed, for x\notin A the value f(x) is simply undefined hence cannot violate any inequality or inclusion.

As a result, we obtain the following reformulation of continuity for maps defined on proper subsets of the Euclidean space.

Theorem 2. A map f:A\to \mathbb R^m defined on a subset A\subseteq \mathbb R^n is continuous on its domain of definition, if and only if the preimage f^{-1}(V) of any open subset V\subseteq\mathbb R^m is an open subset relative to A.

Note that this equivalent definition of continuity of a map at all points of its domain formally requires only one quantifier, assuming that the notion of an open set is sufficiently familiar to the reader: indeed, it asserts that f:A\to\mathbb R^m is continuous if and only if

\forall V\text{ open in }\mathbb R^m\quad f^{-1}(V)\text{ is relatively open in }A.

But there is much more to gain from the topological approach.

Topological spaces

The topological language that we introduced in a very particular setting (for subsets of Euclidean spaces) actually works in a much broader context. Indeed, Theorem 1 above is a pretty good motivation for the following definition.

Definition. A topological space X is an abstract set (possibly very large, much larger than subsets of \mathbb R^n) such that some of its subsets are distinguished by bearing the noble name of open sets U_\alpha. There are only three axioms these open sets must obey:

  • The total space X itself and the empty set \varnothing are open.
  • Union of any number of open sets is again open.
  • Intersection of any finite number of open sets is open.

Note that the axioms do not specify any concrete way in which open sets should be defined in any concrete example. Only their algebraic properties in the Boolean algebra are important. This is dangerous (examples may challenge our intuition) but provides great versatility. In particular, Theorem 2 above allows us to define continuity for any map f:X\to Y between any two topological spaces, with an immediate corollary that the composition of any two continuous maps (when defined) will again be continuous. This becomes a trivial observation (why?), although the proof in the “classical” case is also very easy.

What can we (at our rather down-to-earth level) gain from such abstract constructions? Quite a lot, even if we consider only topological spaces embedded in \mathbb R^n with the supply of open sets provided by the definition of relative openness.

Connected spaces

Using only topological terms, we can formulate one of the most basic properties of sets, the fact that they do not fall apart as unions of smaller sets. It is instrumental in the study: if something is built from smaller components that do not interact with each other, then one can study these components separately and then “mechanically” bring the results together.

Definition. A topological space X is called disconnected, if it can be represented as a disjoint union of two non-empty open sets, X=U\cup V with U\cap V=\varnothing. If such a representation is impossible, we call the space connected. Examples are numerous: the Euclidean spaces of all (finite) dimensions are connected, yet the set A=(-1,0)\cup (0,1)\subseteq\mathbb R^1 is disconnected, as the two relatively open subsets (-1,0) and (0,1) provide the partition.

Remark. The property of connectedness is very closely related to the completeness of the real numbers. One can consider the rational numbers \mathbb Q as a topological space and define open and closed sets relative to them. Then the sets \{q\in\mathbb Q: q^2<2\} and \{q\in\mathbb Q: q^2>2\} are obviously (relatively) open and disjoint from each other, but their union is the whole of \mathbb Q.

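Theorem 3. A continuous map f:X\to Y preserves connectedness: if X is connected, then so is its image f(X); in particular, Y is connected if f is onto.

Proof. Replacing Y by f(X) with the relative topology, we may assume that f is onto. Assume that U,V are two disjoint non-empty open subsets such that Y=U\cup V. Then their preimages f^{-1}(U) and f^{-1}(V) are open by continuity of f, non-empty since f is onto, obviously disjoint, and their union gives X, in contradiction with the assumption on X. Q.E.D.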

Exercise. Describe subsets of \mathbb R^1 which are connected topological spaces with respect to the relative topology inherited from \mathbb R^1. Derive from Theorem 3 the familiar Theorem on intermediate value: if a function continuous on a segment I\subseteq\mathbb R (finite or infinite, doesn’t matter) takes two different values y_1<y_2, then it also takes all intermediate values \{y: y_1\leqslant y \leqslant y_2\}.

Warning. One should be very careful and never confuse preimages with images. The preimage of the connected interval (1,4)\subseteq\mathbb R by the continuous map f:\mathbb R\to\mathbb R, f(x)=x^2, is the disconnected union (-2,-1)\cup(1,2).

Another example of a useful notion that is of purely topological nature, is that of an isolated point.

Definition. A point a\in X is an isolated point of a topological space X (e.g., a subset A\subseteq\mathbb R^n with the topology defined by the relatively open sets), if the one-point subset \{a\}\subseteq X is both open and closed.

Proposition. Any map f:X\to Y is automatically continuous at all isolated points of X. Q.E.D.

Compact sets

Another purely topological property of topological spaces (in particular, subsets of \mathbb R^n with the inherited relative topology) is a mighty generalization of some finiteness property. Recall that finite collections (say, of positive numbers) allow us to choose a minimal element, which will still be positive. Infinite collections of positive numbers, like the set \{1/n: n\in\mathbb N\}\subseteq\mathbb R^1, do not allow such a choice: the only nonnegative number that is smaller than all numbers in the above set is zero, which is not positive.

Definition. A collection (finite or infinite) of sets \{U_\alpha\subseteq X\}_{\alpha\in\mathscr A} is an open covering of the topological space X, if:

  • All sets U_\alpha, \alpha\in\mathscr A are open, and
  • X=\bigcup _{\alpha\in\mathscr A} U_\alpha.

When dealing with subsets of Euclidean spaces A\subseteq\mathbb R^n we can assume that a covering \mathscr U is a collection of open subsets U_\alpha\subseteq\mathbb R^n in \mathbb R^n, which contain A in their union, A\subseteq \bigcup_{\alpha\in \mathscr A} U_\alpha.

A subcovering is a subcollection \{U_\alpha:\alpha\in\mathscr B,\ \mathscr B\subseteq\mathscr A\}, that is, a collection of open sets which still cover X obtained by rarefying \mathscr A, that is, discarding (throwing away) some open sets from the initial covering.

Example. Let f:A\to\mathbb R^1 be a function continuous at all points of its domain. Then for every point a\in A there exists a (relatively) open set U_a\subseteq A containing a such that f(U_a)\subseteq B_1(f(a)). The collection \{U_a:a\in A\} is an open covering of A. Another example of a covering is the representation of the real line \mathbb R^1 as the union of open sets,

\mathbb R^1=\bigcup\limits_{n\in\mathbb Z}U_n,\qquad U_n=(n-\tfrac 13,n+1+\tfrac13).

The second covering is minimal in the sense that after removing any of the sets U_n the remaining collection is not a covering of \mathbb R^1 anymore: the middle third of the corresponding segment [n,n+1] will become uncovered. Yet some of the coverings are definitely non-minimal, and one can safely remove some of the open sets which were used to cover.

Definition. A topological space X (e.g., a set A\subseteq \mathbb R^n with the inherited topology) is called compact, if every open covering of X can be decimated to produce a finite open subcovering.

Make no mistake: compactness does not mean that there simply exists a finite open covering: any space can be covered by just one open set (e.g., the space X itself). Compactness means that a finite covering can be achieved by discarding all but finitely many open sets from any open covering. This definition is rather technical and somewhat difficult to digest (people rarely have any working intuition with coverings and their finite subcoverings), yet the idea is quite transparent: compact sets possess some hidden “finiteness”. Surprisingly, compactness can sometimes be achieved by adding some points to a non-compact space. For instance, the non-compact real interval \{0<x<1\} (it is non-compact because the infinite open covering (0,1)=\bigcup_{n\in\mathbb N}(\tfrac 1n, 1-\tfrac1n) cannot be reduced to a finite subcovering) can be compactified by adding the two endpoints x=0 and x=1. The explanation “on one leg” of this phenomenon is simple: adding the extra points imposes an additional requirement on a collection of open sets to be a covering, namely, to cover the extra points as well.
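The covering of the interval (0,1) by the sets (\tfrac1n, 1-\tfrac1n) can be probed on a computer. The Python sketch below is only an illustration (the helper names interval, uncovered_point and is_covered are ours, not part of the lecture): for any finite choice of indices it exhibits a point of (0,1) missed by all the chosen intervals, using exact rational arithmetic.

```python
from fractions import Fraction

def interval(n):
    """The n-th set of the covering: the open interval (1/n, 1 - 1/n)."""
    return Fraction(1, n), Fraction(n - 1, n)

def is_covered(x, indices):
    """True if x lies in at least one of the chosen open intervals."""
    return any(lo < x < hi for lo, hi in map(interval, indices))

def uncovered_point(indices):
    """A point of (0, 1) that none of the finitely many chosen intervals contains."""
    largest = max(indices)           # the intervals are nested, the largest index gives the widest one
    return Fraction(1, 2 * largest)  # lies to the left of every chosen interval

chosen = [3, 5, 10, 1000]            # an arbitrary finite subfamily
x = uncovered_point(chosen)
print(x, is_covered(x, chosen))
```

Since the intervals are nested, a finite subfamily covers no more than its widest member, which is why one explicit point is enough to defeat it.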

Exercise. An unbounded set A\subseteq\mathbb R^n cannot be compact. Indeed, consider the union of all open balls of radius 1 around all points of A, \bigcup_{a\in A}B_1(a). This is a covering, since each point belongs to “its” ball. Yet no finite subcovering can be selected from this covering: the union of finitely many balls of radius 1 must be bounded. A similar easy argument shows that a set which is not closed also cannot be compact.

Proposition. The real closed segment [0,1]\subseteq\mathbb R^1 is compact.

Proof. Consider an arbitrary open covering \mathscr U=\{U_\alpha\} and let M\subseteq [0,1] be the set of all points a\in[0,1] such that the subsegment [0,a] admits a finite subcovering selected from \mathscr U. Since 0 is covered by some open set from \mathscr U, which then contains also a small segment [0,\varepsilon], the set M contains some positive number. Denote by b the supremum of points in M, b=\sup\limits_{a\in M}a\leqslant 1. The point b belongs to some open set U_\beta\in\mathscr U, and since U_\beta is open, for some sufficiently small \varepsilon >0 it contains all points of [0,1] that are \varepsilon-close to b. By the definition of the supremum, there is a point a\in M with a>b-\varepsilon. Adding U_\beta to the finite subcovering of [0,a], we obtain a finite subcovering of [0,b], so b\in M. We claim that b=1. Indeed, if b<1, then the same finite collection would still “serve” some point b+\varepsilon'\in[0,1] with 0<\varepsilon'<\varepsilon. This contradicts our choice of b as the exact supremum. This leaves the only possibility that b=1\in M, that is, the entire segment [0,1] admits a finite subcovering selectable from \mathscr U. Q.E.D.

Remark. Compactness of the closed segment [0,1] uses the fact that any bounded set of real numbers has the exact supremum. Indeed, the “rational closed segment” I=\{q\in\mathbb Q: 0\leqslant q\leqslant 1\} is not compact. To see this, let us enumerate all points of I by natural numbers, I=\{q_1,q_2,\dots,q_n,\dots\} (this is possible, since \mathbb Q is countable!) and consider the open covering \mathscr U which covers each point q_n\in I by the interval (open ball) of radius \bigl(\tfrac13\bigr)^n >0 centered at this point. This infinite covering does not admit a finite subcovering of I: the sum of lengths of any finite number of intervals from \mathscr U is less than 2\sum_{n\geqslant 1}\bigl(\tfrac13\bigr)^n=1, so these intervals leave uncovered a subinterval of [0,1] of positive length, and at least one rational point of I will remain “unserved”.
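The argument of the Remark can be tried out numerically. The sketch below is an illustration under stated assumptions: the enumeration of the rationals (first 0 and 1, then fractions ordered by denominator) is just one possible choice, and all function names are hypothetical. For the first few intervals of radius \bigl(\tfrac13\bigr)^n it computes their total length, which stays below 1, and finds a rational point of I that none of them covers.

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_segment():
    """One possible enumeration q_1, q_2, ... of the rationals in [0, 1] (an assumption)."""
    yield Fraction(0)
    yield Fraction(1)
    d = 2
    while True:
        for num in range(1, d):
            if gcd(num, d) == 1:
                yield Fraction(num, d)
        d += 1

def first_intervals(count):
    """The open intervals (q_n - 3**-n, q_n + 3**-n) for n = 1, ..., count."""
    gen = rationals_in_unit_segment()
    result = []
    for n in range(1, count + 1):
        q, r = next(gen), Fraction(1, 3 ** n)
        result.append((q - r, q + r))
    return result

def total_length(intervals):
    return sum(hi - lo for lo, hi in intervals)

def uncovered_rational(intervals):
    """A rational point of [0, 1] lying in none of the open intervals, or None."""
    x = Fraction(0)                  # leftmost point not yet known to be covered
    for lo, hi in sorted(intervals):
        if lo >= x:
            break                    # no remaining interval starts to the left of x
        if hi > x:
            x = hi                   # (lo, hi) covers [x, hi) but not the endpoint hi
    return x if x <= 1 else None

family = first_intervals(10)
print(float(total_length(family)), uncovered_rational(family))
```

The sweep works because a finite union of open intervals always leaves its right endpoints uncovered unless another interval picks them up, and with rational endpoints the uncovered point is itself rational.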

Compactness and continuity

Let X,Y be topological spaces and f:X\to Y a continuous map.

Theorem. If X is compact, then its image f(X) is compact in Y.

Proof. Let \mathscr V=\{V_\alpha\} be an arbitrary covering of f(X) by open subsets of Y. Consider the sets U_\alpha=f^{-1}(V_\alpha). By continuity of f, these sets are open and together give a covering \mathscr U of X. Since X is assumed compact, there exists a finite subcovering of \mathscr U. The corresponding finitely many sets V_\alpha give a finite covering of f(X). Q.E.D.

Combining this Theorem with the Exercise above, we see that any continuous real-valued function is bounded on any compact topological space. Note that the preimage of a compact set may well be noncompact (consider any constant function on \mathbb R).

Problem. Prove that any closed subset of a compact topological space is also compact.

The following result, which we will not prove, describes all compact subsets of finite-dimensional Euclidean space.

Theorem. A subset A\subseteq\mathbb R^n is compact, if and only if it is bounded (i.e., \sup_{a\in A}|a|<+\infty) and closed (\mathrm{clos}\,A=A). Q.E.D.

Warning

The simplicity obtained by carefully crafting the definitions may well be misleading. Open and closed subsets of \mathbb R^n provide a rich basis for our finite-dimensional intuition. Yet the general notion of a topological space X, without assuming that its topology is inherited from an embedding of X in some space \mathbb R^n (recall that by “topology” we mean a rule that allows one to declare some subsets of X open), allows for some surprising results.

Example. Consider the real line \mathbb R with the origin x=0 deleted, but with two distinct imaginary points 0^\pm added instead. We can introduce a “perverse topology” on this set \mathbb R^{\bigstar} by declaring open the sets obtained from the open sets of the former \mathbb R by replacing 0 (whenever it occurs) by only one of the two artificially created points 0^\pm, together with all unions of such sets. This rule describes all open subsets of \mathbb R^\bigstar and (check it at home) is consistent with claiming that X=\mathbb R^\bigstar is a topological space.

What is “wrong” with this space? The answer is very easy: the two distinct points, 0^+ and 0^-, cannot be separated by disjoint open sets: any two sets U^\pm open in the topology of \mathbb R^\bigstar and containing the points 0^+ and 0^- respectively, necessarily intersect: U^+\cap U^-\ne\varnothing. This means that the topology of X=\mathbb R^\bigstar cannot be generated by any distance function on X. This is an example of a so-called non-Hausdorff topology: it happens quite often when dealing with topological spaces of algebraic origin.

Example. The same space \mathbb R^1 can be equipped with a non-standard topology, the so-called Zariski topology: namely, declare closed only the finite sets (and the line itself). Hence the open sets will be the complements of finite point sets (and the empty set). This topology is also non-Hausdorff: any two non-empty open sets intersect.

Footnotes

  1. Of course, the letter O should remind you about the Openness definition. ↩︎
  2. Here we use the obvious fact (can you prove it ;-)?) that an open ball in \mathbb R^n is an open set! ↩︎
  3. Such a space cannot be connected if it has at least one other point b\ne a. ↩︎

Sunday, January 7, 2024

Lecture 4, Jan 2, 2024. Happy New Year!

Continuity

Usually discussion of limits in different forms precedes discussion of continuity. I find this somewhat illogical and difficult to comprehend. Continuity is intuitively a very simple property, whereas the theory of limits with its cumbersome sequences of quantifiers is much less transparent.

Our main object of study will be functions (maps) defined on a subset of the Euclidean space of finite dimension A \subseteq\mathbb R^n and taking values in another space \mathbb R^m. The simplest case n=m=1 is “too narrow” to talk in proper geometric terms.

The intuitive definition of continuity of f is the following: “For any two points x,a\in A sufficiently close to each other, their images y=f(x) and b=f(a) in \mathbb R^m will also be close to each other”. Rephrasing, “Small change of the argument implies small change of the value of the function”. This definition, however, requires thorough inspection, since the words occurring have little sense by themselves and some quantifiers “for any” and “exists” are clearly missing.

Proximity and distance

To talk about close points one needs to make precise the notion of proximity. For this sake we need to select a distance function on pairs of points (x,y)\in \mathbb R^n\times\mathbb R^n. This function should be far from arbitrary:

  • \textrm{dist}(x,y)\geqslant 0 for any pair of points; the distance is zero if and only if the two points coincide, x=y;
  • \textrm{dist}(x,y)=\textrm{dist}(y,x), that is, the distance is a symmetric function;
  • The triangle inequality holds: for any three points x,y,z\in\mathbb R^n we have \textrm{dist}(x,z)\leqslant\textrm{dist}(x,y)+\textrm{dist}(y,z).

Obviously, for n=m=1 the function \textrm{dist}(x,y)=|y-x|=|x-y| satisfies all these axioms. (Don’t think that this is the only function with such properties! The function |x^3-y^3| also satisfies all of them.) Yet in \mathbb R^n with n\geqslant 2 there is a whole family of distance functions, \textrm{dist}(x,y)=\sqrt[p]{\sum_{i=1}^n|x_i-y_i|^p} for any real p\geqslant 1. The case \textrm{dist}(x,y)=\max_{i=1,\dots,n}|x_i-y_i| appears as the limit case when p grows to infinity; the case p=2 corresponds to the usual Euclidean metric.
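To get a feeling for this family of distances, one can experiment with a few lines of Python. This is a sketch, not a proof: the function names dist_p and dist_max are ours, the absolute values |x_i-y_i| are used in the sum, and the triangle inequality is only tested on randomly chosen points (with a tiny tolerance for floating-point rounding).

```python
import random

def dist_p(x, y, p):
    """The distance (sum of |x_i - y_i|**p) ** (1/p) for a real p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def dist_max(x, y):
    """The limit case: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(0)
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]

for p in (1, 2, 3.5):
    # Check dist(x, z) <= dist(x, y) + dist(y, z) on all sampled triples.
    ok = all(dist_p(x, z, p) <= dist_p(x, y, p) + dist_p(y, z, p) + 1e-12
             for x in points for y in points for z in points)
    print("p =", p, "triangle inequality on the sample:", ok)

x, y = (0, 0), (3, 4)
for p in (1, 2, 10, 100):
    print("p =", p, "dist =", dist_p(x, y, p), "max-distance =", dist_max(x, y))
```

For the points (0,0) and (3,4) the values decrease from 7 (p=1) through 5 (p=2) towards the max-distance 4 as p grows.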

Our constructions will not depend on the choice of any of these metrics, though some are more convenient for computations. In any case the sets of the form B_a(r)=\{x:\textrm{dist}(x,a) < r\} we will call open balls centered at a\in\mathbb R^n of radius r >0.

Notation. Moreover, in order to save on keystrokes, we will use the notation |x-y| for any of the above distance functions on \mathbb R^n. It is justified by the fact (that can be easily verified) that \textrm{dist}(x,y) for all p depends only on the difference, \textrm{dist}(x,y)=\textrm{dist}(x-y,0) (invariance by translations). Not all functions satisfying the above three properties possess this invariance.

Quantifiers

The pre-definition of continuity drafted above may now be written in the form of the implication,

|x-y| \text{ small }\implies |f(x)-f(y)|\text{ small}.

But there are no “small” and “large” numbers, even if we restrict to positive numbers. Of course, 0 is the smallest nonnegative number (the distance must be nonnegative!), but the claim |x-y| = 0 \implies |f(x)-f(y)|=0 is trivially valid for any function and the axiom of distance! Besides, it is not clear, for which pairs (x,y)\in A\times A\subseteq\mathbb R^n this claim must hold.

The smallness can be quantified. Let \varepsilon >0 be any positive number. Then the set of all numbers \{z: 0\le z\le \varepsilon\} can be called \varepsilon-small, if \varepsilon is chosen as a yardstick (a distance measured in miles may be small, but the same distance measured in yards or inches can be quite large, depending on what one is going to do, sail, swim or kiss).

We have two “small” quantities in the implication above. There is no reason to assume that they could be somehow related to each other: first of all, the corresponding distances are measured in different Euclidean spaces! So we replace the two instances of “smallness” by two quantified terms, \varepsilon-small and \delta-small, with two positive numbers \varepsilon,\delta >0, getting the implication

|x-y| < \delta \implies |f(x)-f(y)|<\varepsilon.

Now we have a logical claim \mathscr C(x,y,\varepsilon,\delta) that involves four “free variables”, x,y\in A,\ \varepsilon, \delta >0. What shall we do with them? There are two options: either to tie down each variable by one of the two quantifiers \forall, \exists, or to designate it as a “parameter” that has to be specified in advance.

Let us first address the variables x,y\in A. They clearly play a symmetric role, so it would be natural to assign the same quantifier in front of each of them, and this quantifier is obviously “for all”, \forall x,y\in A (warning: see below). As for the two remaining “scale variables” \varepsilon, \delta >0, we can play around with the two quantifiers \lozenge_1 \varepsilon and \lozenge_2 \delta, where \lozenge_{1,2} are independently either \forall or \exists, and placed in a different order: altogether this yields 8 possible combinations (some of them equivalent).

It is an excellent exercise to understand the meaning of all the resulting properties. It turns out that the only one that fits our intuitive understanding of continuity is the following:

\forall \varepsilon>0\ \exists\delta>0\text{ such that }\forall (x,y)\in A\times A \quad  |x-y| < \delta \implies |f(x)-f(y)|<\varepsilon.

The verbose definition corresponding to this phrase, using the terms \varepsilon-closeness (resp., \delta-closeness) in the sense of the yardsticks introduced earlier, is as follows.

Definition. A function f:A\to\mathbb R^m defined on the set A\subseteq\mathbb R^n is uniformly continuous on A, if for any requested proximity measure \varepsilon>0 in the target space one can find the proximity measure \delta >0 in the domain such that for any two \delta-close points x,y\in A their images f(x),f(y) are \varepsilon-close in the target space.

Variations: continuity at a point a\in A

The above definition is aesthetically nice but sometimes needs to be relaxed. It implicitly involves some dependence: to establish the continuity, given an arbitrary \varepsilon >0, one needs to present a suitable \delta>0 which will serve all pairs x,y\in A simultaneously. This might be a challenge.

Example. Consider the function f:\mathbb R\to \mathbb R,\ f(x)=x^2. Fix any \delta>0 and take two points with |x-y|=\tfrac12\delta<\delta. Then the distance |f(x)-f(y)|=|x-y|\cdot|x+y|=\tfrac12\delta|x+y| will eventually exceed any given \varepsilon>0, if the sum |x+y| is large enough. So the quadratic function, the nicest of all nonlinear functions, turns out to be not uniformly continuous on its natural domain. This happens because the natural domain \mathbb R is non-compact (unbounded): the choice of \delta for a given \varepsilon must depend also on where exactly the two “close” points x,y are located.
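The failure of uniform continuity for x^2 can be made completely explicit. In the sketch below (the function witness and the particular choice of points are ours, made for illustration) we take, for given \varepsilon and \delta, the points x=\varepsilon/\delta+1 and y=x+\tfrac12\delta. They are \delta-close, yet |f(x)-f(y)|=\tfrac12\delta(2x+\tfrac12\delta)=\varepsilon+\delta+\tfrac14\delta^2>\varepsilon.

```python
def f(x):
    return x * x

def witness(eps, delta):
    """Two points with |x - y| < delta whose images are at least eps apart."""
    h = delta / 2.0          # the distance between the points, strictly less than delta
    x = eps / delta + 1.0    # go far enough from the origin so that |x + y| is large
    return x, x + h

eps = 1.0
for delta in (1.0, 0.01, 0.0001):
    x, y = witness(eps, delta)
    print(delta, x, abs(x - y) < delta, abs(f(x) - f(y)) >= eps)
```

The smaller the proposed \delta, the farther from the origin the offending pair has to be, but it always exists.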

To deal with this problem, we need to localize the statement, tying down one of the points, say, y=a\in A, and considering it as a parameter. The corresponding definition (stripped of the word “uniformly”) looks as follows.

Definition. A function f:A\to\mathbb R^m defined on the set A\subseteq\mathbb R^n is continuous at a point a\in A, if for any requested proximity measure \varepsilon>0 in the target space one can find a proximity measure \delta >0 in the domain, depending in general on a\in A, such that the image f(x) of any point x\in A with |x-a|<\delta is \varepsilon-close in the target space to b=f(a), that is, |f(x)-f(a)|<\varepsilon.
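For the function f(x)=x^2 the dependence of \delta on the point a can be written down explicitly. One possible choice (ours, by no means the only one) is \delta=\min\bigl(1,\varepsilon/(2|a|+1)\bigr): if |x-a|<\delta\leqslant 1, then |x+a|<2|a|+1 and hence |x^2-a^2|<\delta(2|a|+1)\leqslant\varepsilon. The sketch below tests this choice on a grid of sample points; the names delta_at and check are hypothetical.

```python
def f(x):
    return x * x

def delta_at(a, eps):
    """A delta that serves the point a for f(x) = x**2: min(1, eps / (2|a| + 1))."""
    return min(1.0, eps / (2.0 * abs(a) + 1.0))

def check(a, eps, samples=200):
    """Test |f(x) - f(a)| < eps on a grid of points x with |x - a| < delta."""
    d = delta_at(a, eps)
    grid = (a + d * k / samples for k in range(-samples + 1, samples))
    return all(abs(f(x) - f(a)) < eps for x in grid)

eps = 0.01
for a in (0.0, 1.0, 10.0, 1000.0):
    print(a, delta_at(a, eps), check(a, eps))
```

The printed values of \delta shrink as |a| grows, which is exactly the dependence on the point that the uniform definition does not tolerate.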

Note that instead of the “homologous” notation x,y we switched to the notation x,a, stressing the difference between the variable point x and the point a\in A, which is now an external parameter of the statement (for some a the claim may be true, for some false).

Digression on quantifiers

Quantifiers are a subtle issue. If you think accurately, their meaning depends on a lot of hidden data. For instance, if you see a herd of black horses in a field, you might be prompted to conclude that all horses are black. This is patently not true, since there are other horses elsewhere. But even if you add a qualification that all horses in this herd are black, it will not be exactly true: I am not sure whether horses which are black on one side and white on the other exist in nature, but until this non-existence axiom is added to my logic, I can only conclude that all horses in this herd have a black side that was facing me at the moment of observation. Don’t think that this is a stupid game: mathematics requests that all arguments pertinent to a judgement are explicitly put on the table. Yet absent other considerations, for any statement \mathscr C(x,y) that depends on two logical “free variables” x,y, the two claims,

\forall x\ \exists y\quad \mathscr C(x,y) \qquad \text{or}\quad \exists y\ \forall x\quad \mathscr C (x,y)

are non-equivalent: one is stronger and implies the other (the second claim, in which a single y serves all x, implies the first). Sleep on this wisdom, it will help you a lot when parsing mathematical statements with numerous quantifiers.

The mnemonic is pretty simple: “For every person there is a hat suiting him” and “There is a hat that suits every person”. The first is (allegedly) true, the second is patently wrong. Mind that!
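On finite sets the two orders of quantifiers translate literally into nested all and any in Python, so the hat mnemonic can be checked mechanically. The persons, hats and the table of who fits what below are invented purely for illustration.

```python
persons = ["Ann", "Bob", "Eve"]
hats = ["small", "medium", "large"]

# Hypothetical data: each person is suited by exactly one hat size.
fits = {("Ann", "small"), ("Bob", "medium"), ("Eve", "large")}

def suits(hat, person):
    return (person, hat) in fits

# "For every person there is a hat suiting him": forall person, exists hat.
forall_exists = all(any(suits(h, p) for h in hats) for p in persons)

# "There is a hat that suits every person": exists hat, forall person.
exists_forall = any(all(suits(h, p) for p in persons) for h in hats)

print(forall_exists, exists_forall)
```

Swapping the nesting of all and any is the whole difference between the two statements; with this data the first holds and the second fails.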

The experience shows that three or more alternating quantifiers may cause problems even for professional mathematicians, not to mention beginning students. For instance, the full expansion of the sentence “the sequence \{a_n\} converges” takes the form involving the horrible four (!) quantifiers:

\exists A\in \mathbb R \ \forall \varepsilon >0\ \exists N\in\mathbb N\ \forall n>N\quad |a_n-A|<\varepsilon

No surprise that it takes such an effort to digest this construction when you see it for the first time. A proven way to grapple with this problem is to construct appropriate definitions involving the minimal possible number of quantifiers. For instance, the above sentence can be reformulated as follows, \exists A\ \lim_{n\to\infty}a_n=A, where the claim \lim_{n\to\infty}a_n=A, involving the number A as an external parameter, expands using only three quantifiers, the usual definition of the limit of a sequence.
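To see the remaining three quantifiers at work, take the sequence a_n=1/n with the candidate limit A=0 (an example chosen here for illustration, not taken from the lecture). For a given \varepsilon>0 the index N=\lceil 1/\varepsilon\rceil does the job, since n>N implies 1/n<1/N\leqslant\varepsilon. The sketch below computes this N; a computer can of course test the last quantifier \forall n>N only on finitely many indices.

```python
import math

def a(n):
    return 1.0 / n

def threshold(eps):
    """An index N such that |a(n) - 0| < eps for all n > N (here N = ceil(1/eps))."""
    return math.ceil(1.0 / eps)

for eps in (0.5, 0.1, 0.001):
    N = threshold(eps)
    # Only a finite range of n is tested; the inequality 1/n < 1/N <= eps covers the rest.
    ok = all(abs(a(n) - 0.0) < eps for n in range(N + 1, N + 1000))
    print(eps, N, ok)
```

Note how the order of the quantifiers is mirrored in the code: A is fixed first, then \varepsilon is given, then N is produced from \varepsilon, and only then n ranges over the indices beyond N.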

Enclosures

The marked slides with lecture notes: https://drive.google.com/file/d/1YozuLkx8hCP1d9PysXCS5ncFnL1xJQz4/view?usp=sharing

The zoom record: https://weizmann.zoom.us/rec/share/6cP-3gTgT0yWo16KYzKqslE7oyP7uysgQEY5qBgZDqFp_HjG1uJRT7Nc_LWOB_Eh.qavB11mPx7W8RSMx?startTime=1704183044000
Passcode: Prr6Kz
