
Constructive Mathematics

Developed by Andy Wathen,


minor edits by Raphael Hauser

May 5, 2022
Introduction
Much of Mathematics is abstract. However, a surprisingly large number of
algorithms – that is, procedures which one can carry out in practice or in
principle – are used throughout the subject. In Pure Mathematics, a procedure which
can be guaranteed to yield a particular structure for a particular problem state-
ment may provide an existence proof. In Statistics, algorithms are widely used
on carefully collected data sets to yield statistics, that is numbers which have
some representative meaning. In Applied Mathematics, the solutions of prob-
lems which model the physical world are very often found approximately using
computational procedures.
Some constructions of use in algebra, analysis and more broadly in diverse
applications will be introduced and analysed in this course. The ability to ac-
tually carry out these procedures is quite often provided only by computers;
humans are not always so good at repetitive or routine procedures especially
where a lifetime of ‘hand’ calculations can be done in milliseconds by an appro-
priate algorithm implemented on an electronic computer!

The Division Algorithm and Euclid’s Algorithm


Consider the natural numbers
N = {0, 1, 2, . . .}.
A basic operation we learned in early school is to divide: if a ∈ N and b ∈ N
with a > b > 0 then we can always find q ∈ N − {0} and r ∈ N with 0 ≤ r < b
such that
a = q × b + r. (1)
Thus 42 = 2 × 15 + 12 or 151 = 13 × 11 + 8. Note the clear stipulation that q
is a positive integer and that the remainder, r is a non-negative integer which
is less than b: we would introduce non-uniqueness (of q, r) were this not so.
This simple procedure is known as The Division Algorithm on the natural
numbers.
A simple matlab implementation would be the function
function[q,r]=division(a,b)
q=floor(a/b);r=a-q*b;
end
(which one would store in the file division.m).
We will not usually denote multiplication by ×, but rather, as is common,
write qb for the product of q and b; in matlab it is q*b.
Armed with such a procedure, it is natural to explore what can be achieved
by repeated application: since b > r it is clear that the Division Algorithm can
be applied to the pair (b, r) as it was to (a, b) provided r > 0. Since we might
otherwise run out of letters, let us introduce subscripts and rewrite (1) as
a = q1 b + r1 . (2)

Further applying the Division Algorithm to (b, r1 ), there must be q2 ∈ N − {0}
and r2 ∈ N with 0 ≤ r2 < r1 such that
b = q2 r1 + r2 .
Repeating we obtain altogether
a = q1 b + r1
b = q2 r1 + r2 (3)
r1 = q3 r2 + r3
r2 = q4 r3 + r4
..
.
rj−3 = qj−1 rj−2 + rj−1 (4)
rj−2 = qj rj−1 + rj (5)
..
.
with a > b > r1 > r2 > r3 > r4 > . . . > rj > . . . > 0. One quickly realises that
since a, b and the rj for every j are members of N then this repeated procedure
must terminate at some finite stage with rj = 0 since the Division Algorithm
can continue to be applied whenever the remainder is positive. Thus if it is rj
that is the first remainder to be zero, (5) must be
rj−2 = qj rj−1 (6)
and our repeated procedure must stop: the Division Algorithm is not defined
for application to (rj−1 , 0).
Now (6) tells us that rj−1 is a factor of rj−2 . Thus the right hand side of
the equals sign in (4) must also be exactly divisible (in the sense of the natural
numbers: no fractions allowed here!) by rj−1 . Therefore rj−3 (as the left hand
side) is also exactly divisible by rj−1 . Continuing to apply this argument back
through the ‘equations’ we find, by induction, that rj−1 > 0 is a factor of each
of the natural numbers rj−2 , rj−3 , . . . , r2 , r1 , b, a.
Thus the repeated use of the Division Algorithm described here yields a
(positive integer) common factor of a and b. That this procedure, widely known
as Euclid’s Algorithm, in fact yields the highest common factor of a and b is
proved by the following inductive argument. Suppose d > rj−1 is a larger
common factor of a and b, then since (2) may be restated as
a − q1 b = r1 ,
it follows that d also must be a factor of r1 . Similarly using (3), d must divide r2
and inductively d must divide rj−1 . That is, we cannot have a common factor
of a, b which is greater than rj−1 .
The outcome is that Euclid’s Algorithm must always compute the highest
common factor, hcf (a, b), of a and b as the last non-zero remainder.

Example.

1099 = 2 × 525 + 49
525 = 10 × 49 + 35
49 = 1 × 35 + 14
35 = 2 × 14 + 7
14 = 2×7+0

hence 7 is the highest common factor of 1099 and 525.

Note that if we divide each of 1099 and 525 by 7 to get 157 and 75, then the
two numbers are necessarily coprime; that is the highest common factor of 157
and 75 is 1.
In case the highest common factor of larger natural numbers is required, a
simple matlab implementation is something like

function[h]=euclid(a,b)
% a>b>0 are expected to be integers otherwise this will not work!
oldr = a; newr = b;
while newr>0, [q,r]=division(oldr,newr); oldr=newr; newr=r; end
h=oldr, end

(After storing this in the file euclid.m you might wish to experiment: I’ve just
discovered that hcf (987654321, 123456789) = 9 which might be obvious, but I
didn’t know it! matlab starts to automatically convert very large integers to
real (floating point) numbers when they are bigger than 2147483647, so you’ll
have to use some other computational environment if you want to go bigger
than this.)
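(As a quick, independent check of such results – an aside, not part of the algorithm developed here – matlab also has a built-in gcd function, so for instance gcd(987654321,123456789) returns 9.)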
The basis of currently used cryptography and hence data security (whether
you insert your cash card into a cash machine or order a book online) is the
difficulty of factorizing large integers. Factorization of a single natural number
is a very computationally intensive task, requiring much more arithmetic than
finding the highest common factor of two natural numbers by Euclid's Algorithm.
Current cryptographic methods (RSA, named after Rivest, Shamir and Adleman,
is the most widely used) would become insecure if a fast algorithm were found
for integer factorization.

Simple Linear Diophantine Equations


We are used to solving linear equations over R (or indeed over Q): thus 3x+2 = 4
yields x = 2/3 and

2x + 5y = 3
7x + 3y = 5

yields x = 16/29, y = 11/29. An interesting and different set of issues arise if
we seek only solutions in the integers, Z. To make this sensible we must have
more variables than equations; 3x + 2 = 4 will clearly have no solution in the
integers, for example!
The simplest linear Diophantine equation (named after Diophantus of Alexan-
dria who lived in the 3rd century) is of the form

ax + by = c, (7)

where a, b, c ∈ N are given and it is desired to find all solutions x, y ∈ Z. If


a = b then the question of existence of solutions simply reduces to whether a is
a factor of c, or not, and there is some arbitrariness in x and y since only x + y
is essentially determined even if this condition is satisfied. Thus we consider
(7) only when the two ‘coefficients’ are different and, without loss of further
generality, when a > b.
First consider the situation where c is not divisible by h = hcf (a, b), the
highest common factor of a and b. The left hand side is then divisible by h for
all integer values of x, y, but the right hand side is not: the conclusion must be
that there can be no integer solutions in this case. Thus 1099x + 525y = 12 can
have no integer solutions since 12 is not divisible by 7 = hcf (1099, 525).
Next, if c is divisible by hcf (a, b) then the whole equation can be divided by
hcf (a, b); thus 1099x+525y = 28 has the same integer solutions as 157x+75y =
4. We may thus assume without loss of generality that a and b are co-prime
(that is, hcf (a, b) = 1). To find the set of all solutions in this case, we can
first find a particular solution and then add the general solution of the associ-
ated homogeneous equation:

Lemma. If (xp , yp ) is a particular solution of (7), then (x, y) ∈ Z2 is a


solution of (7) if and only if it is of the form (x, y) = (xp + xh , yp + yh ), where
(xh , yh ) ∈ Z2 satisfies the homogeneous equation

ax + by = 0. (8)

Proof
Let (x, y) be a solution of (7), and define xh = x − xp , yh = y − yp . Then
(xh , yh ) ∈ Z2 and axh + byh = (ax + by) − (axp + byp ) = c − c = 0, so that
(x, y) = (xp + xh , yp + yh ) with (xh , yh ) a solution of (8). The reverse impli-
cation follows similarly.

To find a particular solution of (7), it is sufficient to find any integer solution


x̂, ŷ of the equation
ax̂ + bŷ = 1 (9)
and recover a particular solution of (7) by setting (xp , yp ) = c(x̂, ŷ). For ex-
ample, a particular solution of 157x + 75y = 4 can be found by identifying a
particular solution (x̂, ŷ) of 157x̂ + 75ŷ = 1, and by multiplying the result by
4, (xp , yp ) = (4x̂, 4ŷ). In what follows, we will see that Diophantine equations
of the form (9) always have a solution when a and b are co-prime. The proof is

constructive and yields an algorithm for the computation of such a solution.

Examples.
i) 12x + 9y = 4 has no integer solutions, because 3 = hcf (12, 9) does not
divide into 4.
ii) 12x + 9y = 15 has the same set of integer solutions as

4x + 3y = 5, (10)

and a particular solution of the latter can be found by multiplying a par-


ticular solution of 4x̂ + 3ŷ = 1 by 5. Taking for example (x̂, ŷ) = (1, −1),
we find (x, y) = (5, −5) as a particular solution of (10). Note however,
that not all solutions of the latter equation can be obtained in this fash-
ion. For example, (2, −1) solves (10) but is not of the form 5(x̂, ŷ) for any
(x̂, ŷ) ∈ Z2 .

Consider now Euclid’s Algorithm applied to a, b as in (9), and assuming that


hcf (a, b) = 1, as above. Since 1 is the highest common factor, the final equation
in the algorithm (that is the one with the last non-zero remainder) must be

rj−2 = qj rj−1 + 1

for some j (in case the remainder 1 arises at the first stage (2) or second stage
(3) we could just call a = r−1 and b = r0 ). Thus

1 = rj−2 − qj rj−1 . (11)

The previous equation (4) similarly defines

rj−1 = rj−3 − qj−1 rj−2

and hence rj−1 can be substituted in so that (11) becomes an equality between
1 and the sum of multiples of rj−2 and rj−3 . This process can always be con-
tinued (with integers only) until we have an equality between 1 and the sum of
multiples of b and a. An example demonstrates this best.

Example. Euclid’s Algorithm applied to (157, 75) gives

157 = 2 × 75 + 7
75 = 10 × 7 + 5
7 = 1×5+2
5 = 2×2+1
(2 = 2 × 1 + 0).

Thus,

1 = 5−2×2
= 5 − 2 × (7 − 1 × 5) = −2 × 7 + 3 × 5
= −2 × 7 + 3 × (75 − 10 × 7) = 3 × 75 − 32 × 7
= 3 × 75 − 32 × (157 − 2 × 75) = −32 × 157 + 67 × 75

so that x̂ = −32, ŷ = 67 is a solution of 157x̂ + 75ŷ = 1. A solution of 157x +


75y = 4, for example is then x = −128, y = 268.
Notice the similarity between this example and the example in the Euclid
Algorithm section above: the reduction to coprimality was not actually nec-
essary, but has been introduced here as a simplifying device. This is a very
common feature in general in the derivation and use of constructive methods:
define an algorithm for the simplest case from which a whole range of problems
can be solved.
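The back-substitution above can also be organised as a forward recurrence that carries a pair of integer coefficients along with each remainder, so that no back-substitution is needed at the end. The following matlab sketch illustrates this (the function name extended_euclid and the use of deal are my own illustrative choices, not notation from these notes); it assumes a > b > 0 and returns integers x, y with ax + by = hcf (a, b). For the example above, extended_euclid(157,75) returns x = −32, y = 67.

function[x,y] = extended_euclid(a,b)
% Sketch: returns integers x,y with a*x + b*y = hcf(a,b), assuming a>b>0.
% Each remainder is carried together with coefficients expressing it as an
% integer combination of a and b.
oldr = a; newr = b;
oldx = 1; x = 0;   % a = 1*a + 0*b
oldy = 0; y = 1;   % b = 0*a + 1*b
while newr > 0
    q = floor(oldr/newr);
    [oldr, newr] = deal(newr, oldr - q*newr);
    [oldx, x] = deal(x, oldx - q*x);
    [oldy, y] = deal(y, oldy - q*y);
end
x = oldx; y = oldy;   % now a*x + b*y = oldr = hcf(a,b)
end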
The question arises as to whether we have found the only solution by the
above procedure: in fact if solutions exist, then there are always infinitely many
of which the above procedure has found only one. For consider if x̂, ŷ are the
solution of (9) found by the procedure above (or in fact by any method) and
ξ, η ∈ Z satisfy
aξ + bη = 0 (12)
then by addition (i.e. by linearity)

a(x̂ + ξ) + b(ŷ + η) = 1.

One solution of (12) is clearly ξ = b, η = −a and since a and b are coprime


it must be the case that ξ = nb, η = −na, n ∈ Z are the set of all solu-
tions of the homogeneous equation (12). So the set of all solutions to (9) are
x = x̂ + nb, y = ŷ − na, n ∈ Z. Indeed, if x̃, ỹ is any other solution, then
x̃ − x̂, ỹ − ŷ must be a solution of the homogeneous equation (12), from which
it follows that x̃ = x̂ + nb, ỹ = ŷ − na for some n ∈ Z, so there are no solutions
outside this family.

Examples.
i) 157 and 75 are coprime, and the equation 157x̂ + 75ŷ = 1 is satisfied for
x̂ = −32, ŷ = 67. A particular solution of

157x + 75y = 4 (13)

is thus (−128, 268) = 4(−32, 67), and the general solution of (13) is given
by x = −128 + 75n, y = 268 − 157n, (n ∈ Z).
ii) We saw previously that Equation (10) has (5, −5) as particular solution.
The set of all solutions is thus given as {(5 + 3n, −5 − 4n) : n ∈ Z}. In
particular, the solution (2, −1) is found in this set by choosing n = −1.

The above constructions (do more than) provide a proof of the following
result.

Corollary. (Bézout’s Lemma)


If a, b ∈ N are both non-zero and d = hcf (a, b) then there exist x, y ∈ Z satis-
fying ax + by = d.

Note, in particular, that if a and b are coprime then there are integer solu-
tions of ax + by = 1.
As a historical note, it was in the margin of his copy of a Latin translation
of Diophantus’ book Arithmetica that Fermat scribbled the now famous obser-
vation that he had a beautiful proof of what became known as Fermat’s Last
Theorem, but that there was not enough room in the margin to write it down!

Polynomials
There is another context in which the Division Algorithm might have been en-
countered at school: if one considers the set of real polynomials then division
is possible. We will not cover this rigorously here – it will appear in a later
algebra course – however a couple of examples indicate the idea.

Examples.

x⁴ + 2x³ − 2x² + 3x = ((1/2)x² + 1/2)(2x² + 4x − 6) + (x + 3),

2x³ + 3x² + (π − 2)x − π/2 = (x − 1/2)(2x² + 4x + π) + 0,

that is, in both cases we have

a(x) = q(x)b(x) + r(x),

where b(x) has degree lower than or equal to the degree of a(x), and r(x) is
a polynomial of degree strictly less than the degree of b(x).
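If you wish to check such divisions numerically, matlab's built-in deconv function performs polynomial long division on coefficient vectors given with the highest degree first. The following small snippet (my own illustration, with arbitrarily chosen variable names, not part of the notes) reproduces the first example above.

% quotient q and remainder r of (x^4 + 2x^3 - 2x^2 + 3x) divided by (2x^2 + 4x - 6)
a = [1 2 -2 3 0];        % coefficients of x^4 + 2x^3 - 2x^2 + 3x
b = [2 4 -6];            % coefficients of 2x^2 + 4x - 6
[q, r] = deconv(a, b)    % expect q = [0.5 0 0.5] and r ending in ... 1 3, i.e. x + 3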

Note that the (ordered) ‘size’ of a number is what is important for N, but
it is the degree which determines ‘size’ in this context: a(x) > b(x) would here
have to mean that the degree of the polynomial a is greater than the degree of
the polynomial b. The coefficients of all of the polynomials are in general real
and not integers.
If this division can always be done (it can), then Euclid’s Algorithm can also
be applied since it is simply repeated application of the Division Algorithm.
This repeated application must eventually lead either to a zero remainder, in
which case a common factor – indeed the highest common factor – of the two
polynomials has been found as the previous remainder, or to a non-zero remainder
of degree zero, i.e. a non-zero constant, in which case there are no common
factors other than (arbitrary) constants. Any constant is a common factor of
any pair of polynomials just as 1 is a common factor of any two natural numbers.
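As a rough illustration (my own sketch, not from the notes), Euclid's Algorithm for polynomials can be carried out numerically with deconv acting on coefficient vectors; rounding error is handled only crudely here by chopping near-zero coefficients with a fixed tolerance. For the first example below, polyeuclid([1 -4 1 -4 -5 24 3],[1 0 1 0 -6 0]) returns [1 0 3], i.e. x² + 3.

function[h] = polyeuclid(a,b)
% Sketch of Euclid's Algorithm for polynomials given as coefficient vectors
% (highest degree first), with deg(a) >= deg(b); returns (a scalar multiple
% of) the highest common factor.
tol = 1e-10;
oldr = a; newr = b;
while any(abs(newr) > tol)
    [~, r] = deconv(oldr, newr);      % remainder of oldr divided by newr
    idx = find(abs(r) > tol, 1);      % strip leading (near-)zero coefficients
    if isempty(idx), r = 0; else r = r(idx:end); end
    oldr = newr; newr = r;
end
h = oldr;
end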

Example.

x⁶ − 4x⁵ + x⁴ − 4x³ − 5x² + 24x + 3 = (x − 4)(x⁵ + x³ − 6x) + (x² + 3)

x⁵ + x³ − 6x = (x³ − 2x)(x² + 3) + 0

so that (any scalar multiple of) x² + 3 is the highest common factor of the poly-
nomials x⁶ − 4x⁵ + x⁴ − 4x³ − 5x² + 24x + 3 and x⁵ + x³ − 6x.

Example.

x⁴ + 1 = x(x³ + 1) − (x − 1)

x³ + 1 = (−x² − x − 1)(−x + 1) + 2

so there are no common factors of x⁴ + 1 and x³ + 1 (other than constants).

Thus there is in principle a constructive method to find common factors and


hence, common roots of two polynomials. Just as with the natural numbers,
however, the problem of factoring a single polynomial, that is finding its roots,
is rather more difficult. We consider this problem next: it has a rather more
analytic and less algebraic flavour and the methods generally apply to smooth
enough functions and not just to polynomials.

Root Finding: The Bisection Algorithm


If one is given a function which is continuous on a real interval and two points
a, b within this interval for which the values f (a) and f (b) are of opposite sign,
then the Intermediate Value Theorem immediately guarantees the existence of
at least one point α between a and b where f (α) = 0.
This observation can be turned into a procedure for finding such an α to
any desired accuracy, ε > 0. The simplest way is to calculate the point mid way
between a and b, to check the sign of f at that point, to identify an interval of
half the length which must contain a root and to repeat:

Algorithm.
Starting from a = a0 , b = b0
for k = 0, 1, 2, . . .
calculate ck = (1/2)(ak + bk )
if f (ck ) is of the same sign as f (ak ) then set ak+1 = ck and bk+1 = bk
otherwise set ak+1 = ak and bk+1 = ck
repeat

If this procedure is stopped when |ak − bk | < 2ε then |ck − α| < ε and one has
the desired accuracy.
Analysing this procedure, one has
 
|ck − α| ≤ (1/2)|bk − ak | = (1/2)(1/2)|bk−1 − ak−1 | = . . . = |b0 − a0 |/2^(k+1) = |b − a|/2^(k+1)

and a required accuracy ε is guaranteed when 2^(k+1) ε ≥ |b − a|.
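For example, to guarantee an accuracy of ε = 10^(-6) starting from an interval of
length |b − a| = 1, it suffices to have 2^(k+1) ≥ 10^6, i.e. k ≥ 19 bisection steps
(since 2^20 = 1048576 ≥ 10^6).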


A simpler implementation just redefines a, b at each iteration:

Algorithm.
while |a − b| > ε, do the following
set m = (1/2)(a + b)
if f (m)f (a) < 0, redefine b = m
else redefine a = m
repeat

This algorithm is implemented in the following matlab function


function[m]=bisection(a,b,tol)
while b-a>tol, m=(a+b)/2;
if f(m)*f(a) <0, b=m, else a=m; end
end
end
which should be stored in the file bisection.m. The extra matlab function
function[value]=f(x)
value = x^2 -1;
end
(stored in the file f.m and set to x2 − 1 here) must simply define (that is,
evaluate) the function f .
This procedure is called the Bisection Method since one is bisecting (cutting
in half) the length of the interval containing a root at each stage. It is perhaps
the simplest root finding method. It illustrates the important idea that if one
has a procedure which generates a convergent sequence, then one can stop at
some finite stage to achieve an approximate solution to any desired accuracy:
with such a method, more computational work is required for better accuracy.

Fixed Point Methods


Given f : R → R the problem of finding a value α such that f (α) = 0 can be
recast in many different ways as the problem of finding a value α such that

α = g(α) (14)

for some g.

Examples.
(i) x⁵ − 3x + 1/2 = 0 ↔ x = (1/3)(x⁵ + 1/2)
(ii) x⁵ − 3x + 1/2 = 0 ↔ x = −1/(2x⁴ − 6)
(iii) sin x − x cos x = 0 ↔ x = tan x
(iv) f (x) = 0 ↔ x = x − µf (x) for any µ ∈ R, µ ≠ 0.
A value α satisfying (14) is called a fixed point of the function g.
The associated fixed point iteration is to select some particular x0 ∈ R and
then to iteratively define (i.e. compute!)

xk = g(xk−1 ) for k = 1, 2, . . . . (15)

Clearly if this iteration defines a convergent sequence, so that xk → α as k → ∞,


then also xk−1 → α as k → ∞, so that the limit, α, must be a fixed point of g.
Our first consideration is sufficient conditions for existence of fixed points.
We will say that g : [a, b] → [a, b] is a continuous function on [a, b] if g is
continuous on (a, b) as well as g(a) = lim x↓a g(x) and g(b) = lim x↑b g(x). This
is all that is needed for the existence of a fixed point to be guaranteed:

Lemma. (Existence of a fixed point)


If g : [a, b] → [a, b] is continuous on [a, b] then g has a fixed point α ∈ [a, b].
Proof
If g(a) = a or g(b) = b then we are done. Assuming a ≠ g(a) and b ≠ g(b) then
if we set
h(x) = x − g(x)
we have, since a ≤ g(x) ≤ b,

h(a) = a − g(a) < 0 and h(b) = b − g(b) > 0.
Now h is necessarily a continuous function on [a, b] since g is, so applying the
Intermediate Value Theorem, there exists α ∈ (a, b) with h(α) = 0, i.e., α =
g(α).

You may like to convince yourself that the above Lemma gives sufficient
but not necessary conditions for the existence of fixed points and also that
uniqueness cannot hold without further conditions. Such conditions are given
now:

Lemma. (Uniqueness of a fixed point)


If g : [a, b] → [a, b] is continuous on [a, b], differentiable on (a, b) and if there
exists γ ∈ R with 0 ≤ γ < 1 such that

|g′(s)| ≤ γ for all s ∈ (a, b)

then there is a unique fixed point of g in [a, b].

Proof
Assume g has 2 fixed points c ≠ d, c, d ∈ [a, b]; then we can employ the Mean
Value Theorem (or Taylor's Theorem if you prefer) which guarantees the exis-
tence of ξ ∈ R strictly between c and d with

g(c) − g(d) = (c − d)g′(ξ).

But c and d are fixed points of g, so g(c) = c and g(d) = d, thus

c − d = (c − d)g′(ξ)

and on taking absolute values we have

|c − d| = |c − d||g′(ξ)| ≤ γ|c − d| < |c − d|

which provides a contradiction and hence, completes the proof of uniqueness.

This Lemma is rather more important than one might expect. Consider if
a function g did have 2 fixed points, then the fixed point iteration (15) might
compute a sequence which simply oscillated between them and so not converge
or it might compute an even more complicated non-convergent (hence divergent)
sequence. This is a major consideration for constructive methods: problems which
have non-unique solutions are generally fraught with difficulties, whereas if any
solution is unique then at least there is a candidate which might be computed
or approximated by a constructive method.
In the context of fixed points and the fixed point iteration, precisely the
same conditions which guarantee existence and uniqueness of a fixed point also
guarantee convergence of the fixed point iteration:

Lemma. (Convergence of fixed point iteration)


If g : [a, b] → [a, b] is continuous on [a, b], differentiable on (a, b) and if there
exists γ ∈ R with 0 ≤ γ < 1 such that

|g′(s)| ≤ γ for all s ∈ (a, b)

then the fixed point iteration

xk = g(xk−1 ), k = 1, 2, . . .

gives a sequence {xk , k = 0, 1, 2, . . .} which converges to the unique fixed point


in [a, b] for any x0 ∈ [a, b].
Proof
x0 ∈ [a, b] ⇒ x1 = g(x0 ) ∈ [a, b] and inductively xk = g(xk−1 ) ∈ [a, b] for
k = 1, 2, . . ..
If α is the fixed point in [a, b] then

xk − α = g(xk−1 ) − g(α). (16)

By Taylor’s Theorem (or the Mean Value Theorem) there exists ξk strictly
between xk−1 and α such that

g(xk−1 ) = g(α) + (xk−1 − α)g′(ξk )

and using this in (16) gives

xk − α = (xk−1 − α)g′(ξk ).

On taking absolute values we have

|xk − α| = |xk−1 − α| |g′(ξk )| ≤ γ|xk−1 − α| ≤ γ^2 |xk−2 − α| ≤ . . . ≤ γ^k |x0 − α|. (17)

Thus xk → α as k → ∞ because γ < 1 ⇒ γ^k → 0 as k → ∞.

These results establish checkable conditions for fixed points and conver-
gence of fixed point iterations. Note that the proof here is non-constructive
(indeed the existence Lemma used the Intermediate Value Theorem which is
non-constructive), but the result allows for the reliable use of a constructive
method for the calculation of fixed points.
Example.
Determine if f (x) = x⁵ − 3x + 1/2 has any roots in [0, 1/2] and if so, find them.
As in the example above we can write this as the fixed point problem

x = (1/3)(x⁵ + 1/2) = g(x).

Now clearly for 0 ≤ x ≤ 1/2 we have 1/6 ≤ g(x) ≤ (1/3)(1/32 + 1/2), so g : [0, 1/2] → [0, 1/2],
and a root must exist here because g is clearly continuous. Further

g′(x) = (5/3)x⁴ ⇒ |g′(x)| ≤ 5/48 < 1

on the open interval. Hence there exists a unique fixed point of g and thus a
unique real root of f in [0, 1/2]. Starting with x0 = 0.25 we calculate

x1 = 0.16699219, x2 = 0.16670995, x3 = 0.16670959 = x4 = x5 = . . .

to the number of decimal places shown.


Example.
On [0, 3/2] verify if there are any roots of cos x − x = 0 and calculate any
such root to 3 decimal places.
Write as g(x) = cos x and note that g(x) ∈ [0, 1] ⊂ [0, 3/2] (as 3/2 < π/2)
and g is certainly a continuous function on the given interval, so at least one
fixed point in [0, 3/2] exists. Also |g′(x)| = |− sin(x)| ≤ sin(3/2) < 0.9975
for x ∈ (0, 3/2) so there is a unique fixed point in this interval which we may
compute by fixed point iteration: say we start with x0 = 0 ∈ [0, 3/2]

x1 = cos(x0 ) = 1
x2 = cos(x1 ) = 0.5403
x3 = cos(x2 ) = 0.8576
x4 = cos(x3 ) = 0.6543
x5 = cos(x4 ) = 0.7935
..
.
x21 = cos(x20 ) = 0.7392
x22 = cos(x21 ) = 0.7390
x23 = cos(x22 ) = 0.7391 = x24 = x25 = . . . .

Given the value γ = 0.9975 here, for which powers reduce pretty slowly (0.9975^23 ≈
0.9441), it is not so surprising that it takes many more iterations here to get con-
vergence than in the example above.
It is pretty clear from this last example that we can use ‘mathematics by
hand’ to verify that there is a unique root in the required interval, and that may
be all that we want to know. However if we wish to actually calculate this root,
then using a computer is worthwhile. A simple piece of matlab code (stored
in the file fixedpt.m) here will enable this:
function[x] = fixedpt(xin,tol)
xold=xin;x=g(xin),
while abs(x-xold)>tol,xold=x;x=g(xold), end
end
where the function g needs to be defined (and stored in the file g.m)
function[value]= g(x)
value=cos(x);
end
If you don’t want to see anything but the approximate root then one can simply
amend the last three lines as
xold=xin;x=g(xin);
while abs(x-xold)>tol,xold=x;x=g(xold); end
x, end
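(With g.m set to cos(x) as above, a call such as fixedpt(0, 1.e-6) should, as in the hand calculation, return a value close to 0.7391; exactly how many iterates are displayed depends on the tolerance chosen.)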

Though this may be the more common situation, it is not actually necessary
for the function g to be differentiable in order to guarantee uniqueness of fixed
points and convergence of fixed point iterations:

Theorem. (The Contraction Mapping Theorem)
If g : [a, b] → [a, b] is continuous and satisfies

|g(x) − g(y)| ≤ γ|x − y|

for some real value 0 ≤ γ < 1 and for all x, y ∈ [a, b] then there exists a unique
fixed point of g in [a, b] and for any x0 ∈ [a, b] the fixed point iteration (15) will
converge to it.
Proof. Similar to the above.

Whether g is differentiable or not, the value of γ is important; if γ is only


just smaller than 1 then γ^k reduces only slowly with increasing k, whereas if γ
is rather smaller than 1 then the reduction is rapid and (17) guarantees rapid
convergence to any desired accuracy of the corresponding fixed point iteration.
In practice this is often a significant consideration: existence of γ < 1 is suffi-
cient if existence of a unique fixed point is all one is interested in, but if one
actually wants to compute accurate approximations of fixed points then speed
of convergence of the fixed point iteration is important.

Newton’s Method
It follows from (17) that
0 < |xk − α| / |xk−1 − α| ≤ γ < 1. (18)
A convergent sequence {xk } satisfying (18) is said to converge linearly and the
corresponding fixed point iteration to have linear convergence. The question
arises as to whether more rapid convergence can be achieved with a fixed point
iteration. For example, do there exist methods for which
0 < |xk − α| / |xk−1 − α|^2 ≤ K (19)
for some constant, K? Any such fixed point method is called quadratically
convergent.
From Taylor’s Theorem, provided g ∈ C^2 on some appropriate interval con-
taining α, we have

g(xk ) = g(α) + (xk − α)g′(α) + (1/2)(xk − α)^2 g′′(ξk ), some ξk

i.e.

xk+1 − α = (xk − α)g′(α) + (1/2)(xk − α)^2 g′′(ξk )
and so quadratic convergence will be achieved if g′(α) = 0. So to find a root α
of f , look for φ(x) so that

x − φ(x)f (x) = g(x)

satisfies g′(α) = 0:

g′(α) = 1 − φ(α)f′(α) − φ′(α)f (α) ⇒ φ(x) = 1/f′(x)

(or indeed any sufficiently smooth function φ which happens to satisfy φ(α) =
1/f′(α)) since f (α) = 0. The resulting method is Newton’s method:

xk+1 = xk − f (xk )/f′(xk ).
It is widely used because of the quadratic convergence property, but it is only
locally convergent: it is necessary that x0 is sufficiently close to α to guarantee
convergence to α. Also note that xk+1 is only defined (bounded) if f′(xk ) ≠ 0.
Rigorous conditions guaranteeing the convergence of (the iterates computed by)
Newton’s method are somewhat technical and will not be covered in this course:
statements about the convergence of Newton’s method should be assumed to
implicitly include the condition: ‘if {xk } converges’.
Example.
f (x) = x⁵ − 3x + 1/2 clearly has f′(x) = 5x⁴ − 3 so Newton’s method gives
the following
(i) if x0 = 0, then
x1 = 0.166666666666667
x2 = 0.166709588805906
x3 = 0.166709588834381 = x4 = . . .
(ii) if x0 = 0.8 then
x1 = −0.851596638655463
x2 = 6.188323841208973
x3 = 4.952617138936713
x4 = 3.965882550556341
x5 = 3.180014757669606
x6 = 2.558042610355714
x7 = 2.073148949750824
x8 = 1.708602767234352
x9 = 1.457779524622984
x10 = 1.319367836130980
x11 = 1.274944679385050
x12 = 1.270653002026494
x13 = 1.270615088695053
x14 = 1.270615085755746 = x15 = . . .
Notice how in (ii) it takes several iterations before the iterates ‘settle down’
and ultimately converge rapidly (in fact quadratically!) to another root of this
quintic polynomial.

Example.
f (x) = cos(x) − x has f′(x) = − sin(x) − 1 so Newton’s method gives
(i) if x0 = 0, then

x1 = 1
x2 = 0.750363867840244
x3 = 0.739112890911362
x4 = 0.739085133385284
x5 = 0.739085133215161 = x6 = . . .

(ii) if x0 = −3, then


x1 = −0.6597
x2 = 3.0858
x3 = −0.7829
x4 = 4.2797
x5 = −46.7126
x6 = 29.5906
x7 = −896.8049
..
.
and the sequence of iterates converges only after 131 iterations, after a phase
of erratic behaviour.
The simple matlab functions used to obtain these numerical results are
function[x] = newton(xin,tol)
x=xin;funvalue = f(x),
while abs(funvalue)>tol,x=x-funvalue/fprime(x), funvalue = f(x), end
end

function[value]=f(x)
value = x^5 -3*x + 0.5;
% value = cos(x)-x;
end

function[value]= fprime(x)
value=5*x^4 - 3;
% value = -sin(x)-1;
end
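(With f.m and fprime.m defined as above, a call such as newton(0, 1.e-10) should reproduce the iterates of example (i); switching to the commented-out lines in f.m and fprime.m gives the cos(x) − x example instead.)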

For functions with enough derivatives, one can define iterative methods for which

0 < |xk − α| / |xk−1 − α|^p ≤ K (20)

for some constant, K. Any such method would have order of convergence p.
In practice there is generally little to be gained from consideration of meth-
ods with higher order convergence than p = 2 and Newton’s method (and its
generalisations, some of which we will come to later) is by far the most im-
portant practical method for solving nonlinear equations when derivatives are
available. If the derivative is not readily available, then the leading general,
practical method is probably Brent’s method which is essentially a clever com-
bination of the Bisection method and the Secant method (see problem 5 on
question sheet 3).

Horner’s Method
When seeking roots of polynomials expressed in the form

pn (z) = an z n + an−1 z n−1 + . . . + a1 z + a0 , an ≠ 0, (21)

Newton’s method can be carried out in a particularly efficient and elegant way
using Horner’s method or Horner’s rule which is sometimes also called the nested
multiplication scheme. This is approached most easily by considering an efficient
algorithm for evaluating a polynomial.
Theorem. Setting bn = an and for r = n − 1, n − 2, . . . 2, 1, 0 setting

br = br+1 σ + ar ,

then b0 = pn (σ). Moreover, if

qn−1 (z) = bn z n−1 + bn−1 z n−2 + . . . + b2 z + b1 ,

then
pn (z) = (z − σ)qn−1 (z) + b0 .

Proof

(z − σ)qn−1 (z) + b0 = (z − σ)(bn z n−1 + bn−1 z n−2 + . . . + b2 z + b1 ) + b0


= bn z n + (bn−1 − bn σ)z n−1 + · · · + (br − br+1 σ)z r +
· · · + (b1 − b2 σ)z + (b0 − b1 σ).

But bn = an , and for each r ∈ {n − 1, n − 2, . . . , 2, 1, 0}, br = br+1 σ + ar which


implies ar = br − br+1 σ, so that

(z − σ)qn−1 (z) + b0 = an z n + an−1 z n−1 + . . . + a1 z + a0 = pn (z).

Setting z = σ gives b0 = pn (σ).


Note that this nested multiplication requires only n multiplications and n
additions.

Differentiating
pn (z) = (z − σ)qn−1 (z) + b0
gives
p′n (z) = (z − σ)q′n−1 (z) + qn−1 (z)

so that p′n (σ) = qn−1 (σ). We may thus write Newton’s method for a root of a
polynomial as: choose x0 and for k = 0, 1, 2, . . . set

xk+1 = xk − pn (xk )/p′n (xk ),
so with xk = σ:
xk+1 = xk − pn (xk )/qn−1 (xk ).
So perform nested multiplication on pn (xk ) to give
bn = an and for r = n − 1, n − 2, . . . , 2, 1, 0 compute br = br+1 xk + ar
so that b0 = pn (xk ), and then perform nested multiplication on qn−1 (xk ) to give
cn−1 = bn and for r = n − 2, n − 3, . . . , 2, 1, 0 compute cr = cr+1 xk + br+1
so that c0 = qn−1 (xk ), and thus the next Newton iterate is
xk+1 = xk − b0 /c0 .
For a computational algorithm, perform the two evaluations consecutively:
function[x] = horner(xin,a,n,maxit,tol)
% n is degree of the polynomial, a is a vector of length n+1 containing
% the coefficients with constant term first, the coefficient of the
% linear term second, then quadratic etc
% xin is the starting value for the Newton iteration
x=xin;
for k=1:maxit,
b=a(n+1); % compute b_n
c=b; % compute c_{n-1}
b=b*x + a(n); %compute b_{n-1}
for r=n-2:-1:0, %steps of -1 down to zero
c=c*x + b; %compute c_r
b=b*x + a(r+1); %compute b_r
end,
if abs(b)<tol, return, end % convergence test
x=x-b/c; % Newton update
end
end
Notice that as matlab indexes vectors (and in fact all arrays) from 1 rather
than from zero, in the above matlab function the given polynomial is
pn (x) = a(1) + a(2)x + a(3)x2 + . . . + a(n)xn−1 + a(n + 1)xn .

Example.
For the cubic polynomial 6 − 5x − 2x2 + x3 ,
alpha=horner(0,[6,-5,-2,1],3,20,1.e-11) gives alpha = 1,
alpha=horner(20,[6,-5,-2,1],3,20,1.e-11) gives alpha = 3.000000000000099,
and
alpha=horner(0.2,[6,-5,-2,1],3,20,1.e-11) gives alpha = 0.999999999999997.

Example.

maxit=20; a=[1,-2,-3,-4,-5,-6,7]; tolerance = 1.e-9;


alpha=horner(2.9,a,6,maxit,tolerance)

computes the approximate root


alpha = 1.634200239846092
of the equation 1 − 2x − 3x2 − 4x3 − 5x4 − 6x5 + 7x6 = 0.
Example. Find the minimum value of the polynomial

1 + 2x + 3x2 + 4x3 + 5x4 + 6x5 + 7x6 = p(x)

on the real line.


Simple calculus gives

p′(x) = 2 + 6x + 12x2 + 20x3 + 30x4 + 42x5

and using the horner function in matlab

alpha=horner(pi,[2,6,12,20,30,42],5,20,1.e-9)
for this and several other different values of x0 always gives the approximate root
alpha =-0.508507238094213 for which p(alpha) = 0.484106445983341.

Newton’s Method for Systems of Nonlinear Equations

Having considered methods for the solution of the single equation f (x) = 0, a
natural (and commonly arising) extension is to consider the nonlinear system
of equations
f (x) = 0 where f : Rn → Rn ,
that is, the set of equations

fi (x1 , x2 , . . . , xn ) = 0, i = 1, 2, . . . , n. (22)

Fixed point methods can be applied in Rn and Newton’s method and its variants
are by far the most widely used.

If f (α) = 0 then by Taylor’s theorem in several variables

0 = f (α) = f (x) + J(α − x) + higher order terms

where J is the Jacobian matrix:


J = {Jij , i, j = 1, 2, . . . , n}, Jij = ∂fi /∂xj (x)

evaluated at x and the higher order terms involve 2nd partial derivatives and
quadratic terms in αj − xj . In component form as in (22) this is

0 = fi (α1 , α2 , . . . , αn ) = fi (x1 , x2 , . . . , xn ) + Σ_{j=1}^{n} Jij (αj − xj ) + higher order terms

for each i = 1, 2, . . . , n. Now, if we suppose that the higher order terms can be
ignored, then rearranging we obtain the (hopefully!) better estimate for α

xnew = x − J⁻¹ f (x).

Certainly this should be a better estimate of α if x is already reasonably close


to α. Newton’s method is therefore:
select x0 ∈ Rn and for k = 0, 1, 2, . . . until convergence compute

xk+1 = xk − J⁻¹ f (xk )

or in a form which is more useful for computation:


select x0 ∈ Rn and for k = 0, 1, 2, . . . until convergence solve the linear system

Jδ = −f (xk ) and set xk+1 = xk + δ (23)

where the Jacobian matrix is evaluated at xk .


Example.
Writing (x1 , x2 ) as (x, y) for the system

f1 (x, y) = xy + y 2 − 2 = 0
f2 (x, y) = x3 y − 3x − 1 = 0,

we have

J = [ y , x + 2y ; 3x²y − 3 , x³ ]

so that starting at x0 = 0, y0 = 1 we compute the next iterate via

[ 1 , 2 ; −3 , 0 ] [ δ1 ; δ2 ] = [ 1 ; 1 ]

giving δ = (−1/3, 2/3) so that x1 = (x1 , y1 ) = x0 + δ = (−1/3, 5/3).

The next iterate is calculated from the solution of

[ 5/3 , 3 ; −22/9 , −1/27 ] [ δ1 ; δ2 ] = [ −2/9 ; 5/81 ]

leading to

x2 = (−0.357668, 1.606112).
Though one could continue with rationals (and ‘hand’ calculation), this is clearly
a computational procedure; the matlab functions for this problem

function[x,y] = vecnewton(xin,yin,tol)
x=xin;y=yin; funvalue = fvec(x,y); %initial (vector) function value
while norm(funvalue)>tol, % whilst the Euclidean length (norm) of the
% vector function value is too large
delta=-jacobian(x,y)\funvalue; x=x+delta(1), y=y+delta(2),
% Newton iteration
% \ solves the linearised equation system
% J delta = -f
funvalue = fvec(x,y), %new (vector) function value
end
end

function[vecval]=fvec(x,y)
f1 = x*y+y^2-2; f2= x^3*y-3*x-1;
vecval = [f1;f2]; % column vector
end

function[jac]=jacobian(x,y)
jac=[ y, x+2*y ; 3*x^2*y-3, x^3];
end
compute further iterates:

x3 = (−0.357838, 1.604407), x4 = (−0.357838, 1.604406) = x5 = x6 = . . .

to the number of decimal places shown.


I must confess that I guessed that I would see convergence to the more
obviously checkable zero given by x = −1, y = 2 from these starting values, but
it takes something like
vecnewton(-1.3,2.8,1.e-10);
to obtain this other solution of this system of nonlinear equations.

Some comments are in order:
(i) as for a scalar equation, Newton’s method is only convergent if the starting
value is close enough to the desired zero (root).
(ii) the Newton iterates are only defined so long as the Jacobian remains non-
singular when evaluated at the iterates.

Optimization
One of the most significant applications of Newton’s method is in optimization:
one often desires to find the minimum (or at least a local minimum) value of
a function f : Rn → R. Provided the derivatives exist, the conditions for a
stationary point are well known:

g(α) = ∇f (α) = 0.
The vector g with entries gi = ∂f /∂xi is the gradient. To guarantee that α indeed
gives minimality, we require some conditions of convexity of f , at least locally
in a neighbourhood of α. Taylor’s theorem is again useful provided that the
partial derivatives exist:

0 = g(α) = g(x) + H(α − x) + O(kα − xk2 ), (24)

where

H = [hij ]i,j=1,2,...,n ,

hij = ∂gi /∂xj (x) = ∂/∂xj (∂f /∂xi )(x) = ∂²f /∂xi ∂xj (x), (i, j = 1, . . . , n).

Note that H is the Jacobian matrix of the (vector valued) function g(x), and
this is also the Hessian matrix of the (real valued) function f (x) at x. Taking
the Taylor development (24) around an approximate solution xk and dropping
the higher order terms, we obtain the system of linear equations

0 = g(xk ) + H(xk+1 − xk ),

where we now write xk+1 for α, as the solution

xk+1 = xk − H⁻¹ g(xk ) (25)

of this approximate system merely gives another approximation of α, albeit a


generally improved one. The updating rule (25) is the same as Newton updates
for the nonlinear system of equations 0 = g(α).
Another way to motivate the method (25) is to approximate the objective
function f (x) by a quadratic model function
mk (x) = f (xk ) + ∇f (xk )T (x − xk ) + (1/2)(x − xk )T Bk (x − xk ), (26)

where Bk is an appropriate symmetric matrix, chosen so that mk (x) ≈ f (x) in
a neighbourhood of xk , that is, mk (x) is a locally valid model for f (x). Instead
of minimising f (x), it is simpler to minimise the model function. Its minimiser

xk+1 = arg min mk (x) (27)

is our updated approximation of a minimiser of f (x). Setting up an updated


model function mk+1 (x) that is valid in a neighbourhood of xk+1 , we obtain
an iterative process. Note that Bk being symmetric, all its eigenvalues are real.
When all the eigenvalues are positive, we say that Bk is positive definite, and
it is then the case that mk (x) is a strictly convex function whose minimiser is
found at the unique stationary point

xk+1 = xk − Bk⁻¹ ∇f (xk ). (28)


An obvious choice of quadratic model is to set Bk = H = [∂gi /∂xj (xk )], so that
mk (x) approximates f (x) to second order at the current iterate xk . In this case,
the quasi-Newton update (28) coincides with the Newton update (25).
The above discussion leads to the following observations:
i) When Bk is positive definite, the quasi-Newton update (28) moves to a
point xk+1 , where f (xk+1 ) is potentially smaller than f (xk ), since xk+1
is the minimiser of a model function that approximates f (x) in a neigh-
bourhood of xk . However, the descent condition

f (xk+1 ) < f (xk ) (29)

is not guaranteed in a situation where xk+1 lies far away from xk , where
the model mk may be a bad fit for f . To impose the descent condition (29), one
often combines the quasi-Newton method with a line-search: take a step

xk+1 = xk + τk dk , (30)

where dk = −Bk⁻¹ ∇f (xk ) is the quasi-Newton step used as a search di-
rection, and τk > 0 is an approximate minimiser of the function

φ(t) = f (xk + tdk ).

Note that since Bk is positive definite, so is Bk⁻¹, and hence we have
∇f (xk )T Bk⁻¹ ∇f (xk ) > 0, so that

φ′(0) = ∇f (xk )T dk = −∇f (xk )T Bk⁻¹ ∇f (xk ) < 0,

showing that at least for small enough t, the descent condition f (xk +
tdk ) < f (xk ) is satisfied. The choice of τk thus guarantees that (29) holds
true.
ii) Applied to the special case Bk = H, the above discussion shows that New-
ton’s Method (25) can only be expected to converge to a local minimiser

x∗ of f when the Hessian matrix of f at x∗ is positive definite and the
method is started sufficiently close to x∗ . Indeed, since Newton’s Method
is designed to solve g(x) = 0, it can also converge to a local maximiser of
f when it is not started near a local minimiser. Line-searches prevent this
phenomenon from occurring.
iii) Simple choices of Bk that guarantee that Bk is positive definite include
the following, where σ1 denotes the smallest eigenvalue of H,

Bk = H + λ I, with λ > −σ1 , (regularised Newton),


Bk = I, (steepest descent).

On large scale problems, the first of these two choices is computationally


too expensive, while the second can lead to excessively slow convergence.
Among numerous clever methods that were developed with the aim of
combining the fast convergence of the first of these methods with the low
computational cost per iteration of the second, the so-called BFGS (Broy-
den–Fletcher–Goldfarb–Shanno) method is one of the most successful in
practice.
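To make these ideas concrete, here is a minimal matlab sketch (my own illustration, not part of the notes) of a damped quasi-Newton iteration that combines the regularised-Newton choice of Bk from iii) with the simple line-search idea from i). The function name quasi_newton and the handles f, grad and hess (returning f (x), the gradient and a symmetric matrix such as the Hessian) are illustrative assumptions.

function[x] = quasi_newton(f,grad,hess,x,maxit,tol)
% Sketch of a damped quasi-Newton iteration for minimising f: R^n -> R.
% f, grad and hess are function handles returning f(x), the gradient g(x)
% and a symmetric matrix (e.g. the Hessian H at x) respectively.
n = length(x);
for k = 1:maxit
    g = grad(x);
    if norm(g) < tol, return, end        % approximate stationary point found
    B = hess(x); lambda = 0;
    while min(eig(B + lambda*eye(n))) <= 0
        lambda = max(2*lambda, 1e-4);    % regularised Newton: B_k = H + lambda*I
    end
    d = -(B + lambda*eye(n))\g;          % search direction d_k = -B_k^{-1} g(x_k)
    t = 1;                               % crude backtracking choice of tau_k
    while f(x + t*d) >= f(x) && t > 1e-12
        t = t/2;
    end
    x = x + t*d;                         % x_{k+1} = x_k + tau_k d_k
end
end

For instance, quasi_newton(@(x) x(1)^2 + 3*x(2)^2, @(x) [2*x(1); 6*x(2)], @(x) [2 0; 0 6], [1;1], 50, 1e-8) should return a point close to the minimiser (0, 0) of this simple quadratic.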
