Constructive Mathematics Notes
May 5, 2022
Introduction
Much of Mathematics is abstract. However, a surprisingly large number of
algorithms - that is, procedures which one can carry out in practice or in
principle - are used throughout the subject. In Pure Mathematics, a procedure
which can be guaranteed to yield a particular structure for a particular problem
statement may provide an existence proof. In Statistics, algorithms are widely
used on carefully collected data sets to yield statistics, that is, numbers which
have some representative meaning. In Applied Mathematics, the solutions of
problems which model the physical world are very often found approximately
using computational procedures.
Some constructions of use in algebra, analysis and more broadly in diverse
applications will be introduced and analysed in this course. The ability to
actually carry out these procedures is quite often provided only by computers;
humans are not always so good at repetitive or routine procedures, especially
where a lifetime of 'hand' calculations can be done in milliseconds by an
appropriate algorithm implemented on an electronic computer!
Further applying the Division Algorithm to (b, r1 ), there must be q2 ∈ N − {0}
and r2 ∈ N with 0 ≤ r2 < r1 such that
b = q2 r1 + r2 .
Repeating we obtain altogether
a = q1 b + r1
b = q2 r1 + r2 (3)
r1 = q3 r2 + r3
r2 = q4 r3 + r4
. . .
rj−3 = qj−1 rj−2 + rj−1 (4)
rj−2 = qj rj−1 + rj (5)
. . .
with a > b > r1 > r2 > r3 > r4 > . . . ≥ 0. One quickly realises that, since a, b
and the rj for every j are members of N and the remainders strictly decrease, this
repeated procedure must terminate at some finite stage with rj = 0: the Division
Algorithm can be applied again whenever the remainder is positive, but a strictly
decreasing sequence of natural numbers cannot continue for ever. Thus if it is rj
that is the first remainder to be zero, (5) must be
rj−2 = qj rj−1 (6)
and our repeated procedure must stop: the Division Algorithm is not defined
for application to (rj−1 , 0).
Now (6) tells us that rj−1 is a factor of rj−2 . Thus the right hand side of
the equals sign in (4) must also be exactly divisible (in the sense of the natural
numbers: no fractions allowed here!) by rj−1 . Therefore rj−3 (as the left hand
side) is also exactly divisible by rj−1 . Continuing to apply this argument back
through the ‘equations’ we find, by induction, that rj−1 > 0 is a factor of each
of the natural numbers rj−2 , rj−3 , . . . , r2 , r1 , b, a.
Thus the repeated use of the Division Algorithm described here yields a
(positive integer) common factor of a and b. That this procedure, widely known
as Euclid’s Algorithm, in fact yields the highest common factor of a and b is
proved by the following inductive argument. Suppose d > rj−1 is a larger
common factor of a and b, then since (2) may be restated as
a − q1 b = r1 ,
it follows that d also must be a factor of r1 . Similarly using (3), d must divide r2
and inductively d must divide rj−1 . But a factor of the positive integer rj−1 cannot
exceed rj−1 , contradicting d > rj−1 . That is, we cannot have a common factor
of a, b which is greater than rj−1 .
The outcome is that Euclid’s Algorithm must always compute the highest
common factor, hcf (a, b), of a and b as the last non-zero remainder.
Example.
1099 = 2 × 525 + 49
525 = 10 × 49 + 35
49 = 1 × 35 + 14
35 = 2 × 14 + 7
14 = 2 × 7 + 0,
so that hcf (1099, 525) = 7, the last non-zero remainder.
Note that if we divide each of 1099 and 525 by 7 to get 157 and 75, then the
two numbers are necessarily coprime; that is the highest common factor of 157
and 75 is 1.
In case highest common factors of larger natural numbers are required, a
simple matlab implementation is something like
function [h] = euclid(a,b)
% a>b>0 are expected to be integers otherwise this will not work!
oldr = a; newr = b;
while newr > 0
    [q,r] = division(oldr,newr);   % Division Algorithm: oldr = q*newr + r, 0 <= r < newr
    oldr = newr; newr = r;
end
h = oldr
end
(After storing this in the file euclid.m you might wish to experiment: I’ve just
discovered that hcf (987654321, 123456789) = 9 which might be obvious, but I
didn’t know it! matlab starts to automatically convert very large integers to
real (floating point) numbers when they are bigger than 2147483647, so you’ll
have to use some other computational environment if you want to go bigger
than this.)
The basis of currently used cryptography and hence data security (whether
you insert your cash card into a cash machine or order a book online) is the
difficulty of factorizing large integers. Factorizing a single large natural
number is a very computationally intensive task, requiring much more
arithmetic than finding the highest common factor of two natural numbers by
Euclid's Algorithm. Current cryptographic methods (RSA, named after Rivest,
Shamir and Adleman, is the most widely used) would become insecure if a fast
algorithm were found for integer factorization.
2x + 5y = 3
7x + 3y = 5
yields x = 16/29, y = 11/29. An interesting and different set of issues arises if
we seek only solutions in the integers, Z. To make this sensible we must have
more variables than equations; 3x + 2 = 4 will clearly have no solution in the
integers, for example!
The simplest linear Diophantine equation (named after Diophantus of Alexan-
dria who lived in the 3rd century) is of the form
ax + by = c, (7)
ax + by = 0. (8)
Proof
Let (x, y) be a solution of (7), and define xh = xp − x, yh = yp − y. Then
(xh , yh ) ∈ Z2 and axh + byh = (axp + byp ) − (ax + by) = c − c = 0. The reverse
implication follows similarly.
constructive and yields an algorithm for the computation of such a solution.
Examples.
i) 12x + 9y = 4 has no integer solutions, because 3 = hcf (12, 9) does not
divide into 4.
ii) 12x + 9y = 15 has the same set of integer solutions as
4x + 3y = 5, (10)
rj−2 = qj rj−1 + 1
for some j (in case the remainder 1 arises at the first stage (2) or second stage
(3) we could just call a = r−1 and b = r0 ). Thus
and hence rj−1 can be substituted in so that (11) becomes an equality between
1 and the sum of multiples of rj−2 and rj−3 . This process can always be con-
tinued (with integers only) until we have an equality between 1 and the sum of
multiples of b and a. An example demonstrates this best.
157 = 2 × 75 + 7
75 = 10 × 7 + 5
7 = 1×5+2
5 = 2×2+1
(2 = 2 × 1 + 0).
Thus,
1 = 5−2×2
= 5 − 2 × (7 − 1 × 5) = −2 × 7 + 3 × 5
= −2 × 7 + 3 × (75 − 10 × 7) = 3 × 75 − 32 × 7
= 3 × 75 − 32 × (157 − 2 × 75) = −32 × 157 + 67 × 75
a(x̂ + ξ) + b(ŷ + η) = 1.
Examples.
i) 157 and 75 are coprime, and the equation 157x̂ + 75ŷ = 1 is satisfied for
x̂ = −32, ŷ = 67. A particular solution of
157x + 75y = 4 (13)
is thus (−128, 268) = 4(−32, 67), and the general solution of (13) is given
by x = −128 + 75n, y = 268 − 157n, (n ∈ Z).
ii) We saw previously that Equation (10) has (5, −5) as particular solution.
The set of all solutions is thus given as {(5 + 3n, −5 − 4n) : n ∈ Z}. In
particular, the solution (2, −1) is found in this set by choosing n = −1.
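The back-substitution carried out above can itself be automated. One possible
matlab sketch is given below (the function name extendedeuclid is merely a
choice made here, and the built-in floor is used in place of the division
function of euclid.m): it keeps track of multipliers for a and b alongside the
remainders, so that on return a*x + b*y = h = hcf (a, b); if h divides c, then
multiplying (x, y) by c/h gives a particular solution of ax + by = c.
function [h,x,y] = extendedeuclid(a,b)
% sketch of the 'extended' Euclid's Algorithm: as well as h = hcf(a,b)
% it returns integers x, y with a*x + b*y = h.
% a > b > 0 are expected to be integers, as in euclid.m.
oldr = a; newr = b;
oldx = 1; newx = 0;     % invariant: oldr = a*oldx + b*oldy
oldy = 0; newy = 1;     %            newr = a*newx + b*newy
while newr > 0
    q = floor(oldr/newr);                        % quotient from the Division Algorithm
    [oldr, newr] = deal(newr, oldr - q*newr);    % update the remainders
    [oldx, newx] = deal(newx, oldx - q*newx);    % update the multipliers of a
    [oldy, newy] = deal(newy, oldy - q*newy);    % update the multipliers of b
end
h = oldr; x = oldx; y = oldy;
end
For instance, extendedeuclid(157,75) should return h = 1, x = -32, y = 67, in
agreement with the calculation above.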
The above constructions (do more than) provide a proof of the following
result.
Note, in particular, that if a and b are coprime then there are integer solu-
tions of ax + by = 1.
As a historical note, it was in the margin of his copy of a Latin translation
of Diophantus’ book Arithmetica that Fermat scribbled the now famous obser-
vation that he had a beautiful proof of what became known as Fermat’s Last
Theorem, but that there was not enough room in the margin to write it down!
Polynomials
There is another context in which the Division Algorithm might have been en-
countered at school: if one considers the set of real polynomials then division
is possible. We will not cover this rigorously here – it will appear in a later
algebra course – however a couple of examples indicate the idea.
Examples.
x^4 + 2x^3 − 2x^2 + 3x = (2x^2 + 4x − 6)((1/2)x^2 + 1/2) + (x + 3),

2x^3 + 3x^2 + (π − 2)x − π/2 = (x − 1/2)(2x^2 + 4x + π) + 0.

In general, dividing a polynomial a(x) by a polynomial b(x) gives

a(x) = q(x) b(x) + r(x),

where b(x) has degree less than or equal to the degree of a(x), and r(x) is
a polynomial of degree strictly less than the degree of b(x).
Note that the (ordered) ‘size’ of a number is what is important for N, but
it is the degree which determines ‘size’ in this context: a(x) > b(x) would here
have to mean that the degree of the polynomial a is greater than the degree of
the polynomial b. The coefficients of all of the polynomials are in general real
and not integers.
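In matlab, the built-in function deconv performs exactly this polynomial
division on vectors of coefficients listed with the highest power first; as a
quick check of the first example above (the variable names are just for
illustration):
b = [1 2 -2 3 0];       % x^4 + 2x^3 - 2x^2 + 3x
a = [2 4 -6];           % 2x^2 + 4x - 6
[q, r] = deconv(b, a)   % should give q = [0.5 0 0.5], i.e. (1/2)x^2 + 1/2,
                        % and r = [0 0 0 1 3], i.e. the remainder x + 3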
If this division can always be done (it can), then Euclid’s Algorithm can also
be applied since it is simply repeated application of the Division Algorithm.
This repeated application must eventually lead to a remainder polynomial of
degree zero i.e. to a constant. If this constant is zero then a common factor
– indeed the highest common factor – of two polynomials has been found as
the previous remainder. If the constant is not zero then there are no common
factors other than (arbitrary) constants. Any constant is a common factor of
any pair of polynomials just as 1 is a common factor of any two natural numbers.
Example.
x^6 − 4x^5 + x^4 − 4x^3 − 5x^2 + 24x + 3 = (x − 4)(x^5 + x^3 − 6x) + (x^2 + 3)
x^5 + x^3 − 6x = (x^3 − 2x)(x^2 + 3) + 0
so that (any scalar multiple of) x^2 + 3 is the highest common factor of the poly-
nomials x^6 − 4x^5 + x^4 − 4x^3 − 5x^2 + 24x + 3 and x^5 + x^3 − 6x.
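Euclid's Algorithm for polynomials is equally easy to automate; a possible
matlab sketch (the name polyhcf is just a choice made here, with deconv again
doing the polynomial division) is
function [h] = polyhcf(p,q)
% sketch: highest common factor of two polynomials by Euclid's Algorithm.
% p, q are coefficient vectors, highest power first, degree(p) >= degree(q).
% With general real coefficients a tolerance test on r would be needed
% in place of the exact tests any(q) and find(r,1) used below.
while any(q)                      % while q is not the zero polynomial
    [s, r] = deconv(p, q);        % p = conv(q,s) + r
    idx = find(r, 1);             % first non-zero coefficient of the remainder
    if isempty(idx)
        r = 0;                    % the remainder is exactly zero
    else
        r = r(idx:end);           % strip the leading zero coefficients
    end
    p = q; q = r;
end
h = p;                            % the last non-zero remainder, up to scaling
end
so that polyhcf([1 -4 1 -4 -5 24 3],[1 0 1 0 -6 0]) should return [1 0 3],
i.e. x^2 + 3 as above.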
Example.
x^4 + 1 = (x)(x^3 + 1) − (x − 1)
x^3 + 1 = (−x^2 − x − 1)(−x + 1) + 2
so the remainder here is a non-zero constant: x^4 + 1 and x^3 + 1 have no common
factors other than constants.
Algorithm.
Starting from a = a0 , b = b0
for k = 0, 1, 2, . . .
calculate ck = (ak + bk )/2
if f (ck ) is of the same sign as f (ak ) then set ak+1 = ck and bk+1 = bk
otherwise set ak+1 = ak and bk+1 = ck
repeat
If this procedure is stopped when |ak − bk | < 2ε then |ck − α| < ε and one has
the desired accuracy.
Analysing this procedure, one has
|ck − α| ≤ (1/2)|bk − ak | = (1/2^2)|bk−1 − ak−1 | = . . . = (1/2^(k+1))|b0 − a0 | = (1/2^(k+1))|b − a|
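For example, if the initial interval is [0, 1] (so |b − a| = 1) and an accuracy of
ε = 10^(−6) is required, it suffices to have 2^(−(k+1)) < 10^(−6) ; since
2^20 = 1048576 > 10^6 , twenty bisection steps (k = 19) are certainly enough.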
Algorithm.
while |a − b| > 2ε, do the following
set m = (a + b)/2
if f (m)f (a) < 0, redefine b = m
else redefine a = m
repeat
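As for Euclid's Algorithm, a simple matlab realisation is possible; the sketch
below (stored, say, in bisect.m - the file name is just a choice made here)
assumes that f has been defined in its own file f.m, as is done for the fixed
point and Newton codes later, and that f(a)f(b) < 0 for the input values.
function [m] = bisect(a,b,tol)
% sketch of the Bisection method: f is assumed to be defined in f.m
% and to change sign on the input interval [a,b].
while abs(a-b) > tol
    m = (a+b)/2;                  % midpoint of the current interval
    if f(m)*f(a) < 0
        b = m;                    % the sign change (and so a root) lies in [a,m]
    else
        a = m;                    % the sign change lies in [m,b]
    end
end
m = (a+b)/2;                      % midpoint of the final (short) interval
end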
α = g(α) (14)
for some g.
Examples.
(i) x^5 − 3x + 1/2 = 0 ↔ x = (1/3)(x^5 + 1/2)
(ii) x^5 − 3x + 1/2 = 0 ↔ x = −1/(2x^4 − 6)
(iii) sin x − x cos x = 0 ↔ x = tan x
(iv) f (x) = 0 ↔ x = x − µf (x) for any µ ∈ R, µ ≠ 0.
A value α satisfying (14) is called a fixed point of the function g.
The associated fixed point iteration is to select some particular x0 ∈ R and
then to iteratively define (i.e. compute!)
xk = g(xk−1 ), k = 1, 2, . . . (15)
You may like to convince yourself that the above Lemma gives sufficient
but not necessary conditions for the existence of fixed points and also that
uniqueness cannot hold without further conditions. Such conditions are given
now:
Proof
Assume g has 2 fixed points c ≠ d, c, d ∈ [a, b]; then we can employ the Mean
Value Theorem (or Taylor's Theorem if you prefer) which guarantees the exis-
tence of ξ ∈ R strictly between c and d with
c − d = g(c) − g(d) = (c − d)g'(ξ),
which is impossible when |g'(ξ)| < 1. Hence there cannot be two distinct fixed points.
This Lemma is rather more important than one might expect. Consider if
a function g did have 2 fixed points, then the fixed point iteration (15) might
compute a sequence which simply oscillated between them and so not converge
or it might compute an even more complicated non-convergent (hence divergent)
sequence. This is a major consideration for constructive methods: problems which
have non-unique solutions are generally fraught with difficulties, whereas if any
solution is unique then at least there is a candidate which might be computed
or approximated by a constructive method.
In the context of fixed points and the fixed point iteration, precisely the
same conditions which guarantee existence and uniqueness of a fixed point also
guarantee convergence of the fixed point iteration:
xk = g(xk−1 ), k = 1, 2, . . .
By Taylor’s Theorem (or the Mean Value Theorem) there exists ξk strictly
between xk−1 and α such that
These results establish checkable conditions for fixed points and conver-
gence of fixed point iterations. Note that the proof here is non-constructive
(indeed the existence Lemma used the Intermediate Value Theorem which is
non-constructive), but the result allows for the reliable use of a constructive
method for the calculation of fixed points.
Example.
Determine if f (x) = x^5 − 3x + 1/2 has any roots in [0, 1/2] and if so, find them.
As in the example above we can write this as the fixed point problem
x = (1/3)(x^5 + 1/2) = g(x).
Now clearly for 0 ≤ x ≤ 1/2 we have 1/6 ≤ g(x) ≤ (1/3)(1/32 + 1/2), so
g : [0, 1/2] → [0, 1/2],
and a fixed point (equivalently a root of f ) must exist here because g is clearly
continuous. Further
g'(x) = (5/3)x^4 ⇒ |g'(x)| ≤ 5/48 < 1
on the open interval. Hence there exists a unique fixed point of g and thus a
unique real root of f in [0, 12 ]. Starting with x0 = 0.25 we calculate
for x ∈ (0, 3/2) so there is a unique fixed point in this interval which we may
compute by fixed point iteration: say we start with x0 = 0 ∈ [0, 3/2]
x1 = cos(x0 ) = 1
x2 = cos(x1 ) = 0.5403
x3 = cos(x2 ) = 0.8576
x4 = cos(x3 ) = 0.6543
x5 = cos(x4 ) = 0.7935
. . .
x21 = cos(x20 ) = 0.7392
x22 = cos(x21 ) = 0.7390
x23 = cos(x22 ) = 0.7391 = x24 = x25 = . . . .
Given the value γ = 0.9975 here, for which powers reduce pretty slowly (0.9975^23 ≈
0.9441), it is not so surprising that it takes many more iterations here to get con-
vergence than in the example above.
It is pretty clear from this last example that we can use ‘mathematics by
hand’ to verify that there is a unique root in the required interval, and that may
be all that we want to know. However if we wish to actually calculate this root,
then using a computer is worthwhile. A simple piece of matlab code (stored
in the file fixedpt.m) here will enable this:
function[x] = fixedpt(xin,tol)
xold=xin;x=g(xin),
while abs(x-xold)>tol,xold=x;x=g(xold), end
end
where the function g needs to be defined (and stored in the file g.m)
function[value]= g(x)
value=cos(x);
end
If you don’t want to see anything but the approximate root then one can simply
amend the last three lines as
xold=xin;x=g(xin);
while abs(x-xold)>tol,xold=x;x=g(xold); end
x, end
Though this may be the more common situation, it is not actually necessary
for the function g to be differentiable in order to guarantee uniqueness of fixed
points and convergence of fixed point iterations:
Theorem. (The Contraction Mapping Theorem)
If g : [a, b] → [a, b] is continuous and satisfies
|g(x) − g(y)| ≤ γ |x − y|
for some real value 0 ≤ γ < 1 and for all x, y ∈ [a, b] then there exists a unique
fixed point of g in [a, b] and for any x0 ∈ [a, b] the fixed point iteration (15) will
converge to it.
Proof. Similar to the above.
Newton’s Method
It follows from (17) that
0 < |xk − α| / |xk−1 − α| ≤ γ < 1. (18)
A convergent sequence {xk } satisfying (18) is said to converge linearly and the
corresponding fixed point iteration to have linear convergence. The question
arises as to whether more rapid convergence can be achieved with a fixed point
iteration. For example, do there exist methods for which
0 < |xk − α| / |xk−1 − α|^2 ≤ K (19)
for some constant, K? Any such fixed point method is called quadratically
convergent.
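To illustrate the difference, suppose the error |xk−1 − α| is 10^(−2). A linearly
convergent iteration with γ = 1/2 only halves the error at each step, so roughly
a further twenty iterations are needed to reach 10^(−8) ; a quadratically
convergent iteration with K = 1 reaches approximately 10^(−4), then 10^(−8), in
just two further steps, roughly doubling the number of correct digits at every
iteration.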
From Taylor’s Theorem, provided g ∈ C 2 on some appropriate interval con-
taining α, we have
g(xk ) = g(α) + (xk − α)g'(α) + (1/2)(xk − α)^2 g''(ξk ), some ξk
i.e.
xk+1 − α = (xk − α)g'(α) + (1/2)(xk − α)^2 g''(ξk )
and so quadratic convergence will be achieved if g'(α) = 0. So to find a root α
of f , look for φ(x) so that
g(x) = x − φ(x)f (x)
satisfies g'(α) = 0:
g'(α) = 1 − φ(α)f'(α) − φ'(α)f (α) ⇒ φ(x) = 1/f'(x)
(or indeed any sufficiently smooth function φ which happens to satisfy φ(α) =
1/f'(α)) since f (α) = 0. The resulting method is Newton's method:
xk+1 = xk − f (xk )/f'(xk ).
It is widely used because of the quadratic convergence property, but it is only
locally convergent: it is necessary that x0 is sufficiently close to α to guarantee
convergence to α. Also note that xk+1 is only defined (bounded) if f'(xk ) ≠ 0.
Rigorous conditions guaranteeing the convergence of (the iterates computed by)
Newton’s method are somewhat technical and will not be covered in this course:
statements about the convergence of Newton’s method should be assumed to
implicitly include the condition: ‘if {xk } converges’.
Example.
f (x) = x^5 − 3x + 1/2 clearly has f'(x) = 5x^4 − 3, so Newton's method gives
the following
(i) if x0 = 0, then
x1 = 0.166666666666667
x2 = 0.166709588805906
x3 = 0.166709588834381 = x4 = . . .
(ii) if x0 = 0.8 then
x1 = −0.851596638655463
x2 = 6.188323841208973
x3 = 4.952617138936713
x4 = 3.965882550556341
x5 = 3.180014757669606
x6 = 2.558042610355714
x7 = 2.073148949750824
x8 = 1.708602767234352
x9 = 1.457779524622984
x10 = 1.319367836130980
x11 = 1.274944679385050
x12 = 1.270653002026494
x13 = 1.270615088695053
x14 = 1.270615085755746 = x15 = . . .
Notice how in (ii) it takes several iterations before the iterates ‘settle down’
and ultimately converge rapidly (in fact quadratically!) to another root of this
quintic polynomial.
Example.
f (x) = cos(x) − x has f'(x) = − sin(x) − 1 so Newton's method gives
(i) if x0 = 0, then
x1 = 1
x2 = 0.750363867840244
x3 = 0.739112890911362
x4 = 0.739085133385284
x5 = 0.739085133215161 = x6 = . . .
function[value]=f(x)
value = x^5 -3*x + 0.5;
% value = cos(x)-x;
end
function[value]= fprime(x)
value=5*x^4 - 3;
% value = -sin(x)-1;
end
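A matlab driver actually carrying out the Newton iteration with these two files
is not listed above; a minimal sketch (the name newton.m and the use of a
maximum iteration count are simply choices made here) might be
function [x] = newton(xin,tol,maxit)
% sketch of Newton's method using the functions f.m and fprime.m above;
% the iteration stops when the Newton correction is smaller than tol,
% or after maxit iterations (fprime(x) is assumed non-zero at the iterates).
x = xin;
for k = 1:maxit
    step = f(x)/fprime(x);        % the Newton correction f(x)/f'(x)
    x = x - step                  % no semicolon: display each iterate
    if abs(step) < tol, return, end
end
end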
For functions with enough derivatives, one can define iterative methods for which
0 < |xk − α| / |xk−1 − α|^p ≤ K (20)
for some constant, K. Any such method would have order of convergence p.
In practice there is generally little to be gained from consideration of meth-
ods with higher order convergence than p = 2 and Newton’s method (and its
generalisations, some of which we will come to later) is by far the most im-
portant practical method for solving nonlinear equations when derivatives are
available. If the derivative is not readily available, then the leading general,
practical method is probably Brent’s method which is essentially a clever com-
bination of the Bisection method and the Secant method (see problem 5 on
question sheet 3).
Horner’s Method
When seeking roots of polynomials expressed in the form
pn (x) = a0 + a1 x + a2 x^2 + · · · + an x^n ,
Newton’s method can be carried out in a particularly efficient and elegant way
using Horner’s method or Horner’s rule which is sometimes also called the nested
multiplication scheme. This is approached most easily by considering an efficient
algorithm for evaluating a polynomial.
Theorem. Setting bn = an and for r = n − 1, n − 2, . . . 2, 1, 0 setting
br = br+1 σ + ar ,
and defining qn−1 (z) = bn z^(n−1) + bn−1 z^(n−2) + · · · + b2 z + b1 , then
pn (z) = (z − σ)qn−1 (z) + b0 .
Proof
Expanding, (z − σ)qn−1 (z) + b0 = bn z^n + (bn−1 − σbn )z^(n−1) + · · · + (b1 − σb2 )z + (b0 − σb1 );
by the recurrence, br − σbr+1 = ar for r = 0, 1, . . . , n − 1, while bn = an , so this
is exactly pn (z).
Differentiating
pn (z) = (z − σ)qn−1 (z) + b0
gives
p'n (z) = (z − σ)q'n−1 (z) + qn−1 (z)
so that p'n (σ) = qn−1 (σ). We may thus write Newton's method for a root of a
polynomial as: choose x0 and for k = 0, 1, 2, . . . set
xk+1 = xk − pn (xk )/p'n (xk ),
so with xk = σ:
xk+1 = xk − pn (xk )/qn−1 (xk ).
So perform nested multiplication on pn (xk ) to give
bn = an and for r = n − 1, n − 2, . . . , 2, 1, 0 compute br = br+1 xk + ar
so that b0 = pn (xk ), and then perform nested multiplication on qn−1 (xk ) to give
cn−1 = bn and for r = n − 2, n − 3, . . . , 2, 1, 0 compute cr = cr+1 xk + br+1
so that c0 = qn−1 (xk ), and thus the next Newton iterate is
xk+1 = xk − b0 /c0 .
For a computational algorithm, perform the two evaluations consecutively:
function[x] = horner(xin,a,n,maxit,tol)
% n is degree of the polynomial, a is a vector of length n+1 containing
% the coefficients with constant term first, the coefficient of the
% linear term second, then quadratic etc
% xin is the starting value for the Newton iteration
x=xin;
for k=1:maxit,
b=a(n+1); % compute b_n
c=b; % compute c_{n-1}
b=b*x + a(n); %compute b_{n-1}
for r=n-2:-1:0, %steps of -1 down to zero
c=c*x + b; %compute c_r
b=b*x + a(r+1); %compute b_r
end,
if abs(b)<tol, return, end % convergence test
x=x-b/c; % Newton update
end
end
Notice that as matlab indexes vectors (and in fact all arrays) from 1 rather
than from zero, in the above matlab function the given polynomial is
pn (x) = a(1) + a(2)x + a(3)x^2 + . . . + a(n)x^(n−1) + a(n + 1)x^n .
Example.
For the cubic polynomial 6 − 5x − 2x^2 + x^3 ,
alpha=horner(0,[6,-5,-2,1],3,20,1.e-11) gives alpha = 1,
alpha=horner(20,[6,-5,-2,1],3,20,1.e-11) gives alpha = 3.000000000000099,
and
alpha=horner(0.2,[6,-5,-2,1],3,20,1.e-11) gives alpha = 0.999999999999997.
Example.
alpha=horner(pi,[2,6,12,20,30,42],5,20,1.e-9)
for this and several other different values of x0 always gives the approximate root
alpha = -0.508507238094213, at which the computed value of p(alpha) is zero to
well within the tolerance of 10^(−9).
fi (x1 , x2 , . . . , xn ) = 0, i = 1, 2, . . . , n. (22)
Fixed point methods can be applied in Rn and Newton’s method and its variants
are by far the most widely used.
If f (α) = 0 then by Taylor's theorem in several variables
0 = f (α) = f (x) + J(α − x) + higher order terms,
where J is the Jacobian matrix with entries Jij = ∂fi /∂xj
evaluated at x and the higher order terms involve 2nd partial derivatives and
quadratic terms in αj − xj . In component form as in (22) this is
0 = fi (α1 , . . . , αn ) = fi (x1 , . . . , xn ) + Σ_{j=1}^{n} Jij (αj − xj ) + higher order terms
for each i = 1, 2, . . . , n. Now, if we suppose that the higher order terms can be
ignored, then rearranging we obtain the (hopefully!) better estimate for α
xnew = x − J^(−1) f (x).
xk+1 = xk − J^(−1) f (xk )
f1 (x, y) = xy + y 2 − 2 = 0
f2 (x, y) = x3 y − 3x − 1 = 0,
we have
J = (  y            x + 2y
       3x^2 y − 3    x^3   )
so that starting at x0 = 0, y0 = 1 we compute the next iterate via
(  1   2 ) ( δ1 )   ( 1 )
( −3   0 ) ( δ2 ) = ( 1 )
giving δ = (−1/3, 2/3)^T so that x1 = (x1 , y1 )^T = x0 + δ = (−1/3, 5/3)^T .
The next iterate is calculated from the solution of
(  5/3     3     ) ( δ1 )   ( −2/9 )
( −22/9   −1/27  ) ( δ2 ) = (  5/81 )
leading to
x2 = (−0.357668, 1.606112)^T .
Though one could continue with rationals (and ‘hand’ calculation), this is clearly
a computational procedure; the matlab functions for this problem
function[x,y] = vecnewton(xin,yin,tol)
x=xin;y=yin; funvalue = fvec(x,y); %initial (vector) function value
while norm(funvalue)>tol, % whilst the Euclidean length (norm) of the
% vector function value is too large
delta=-jacobian(x,y)\funvalue; x=x+delta(1), y=y+delta(2),
% Newton iteration
% \ solves the linearised equation system
% J delta = -f
funvalue = fvec(x,y), %new (vector) function value
end
end
function[vecval]=fvec(x,y)
f1 = x*y+y^2-2; f2= x^3*y-3*x-1;
vecval = [f1;f2]; % column vector
end
function[jac]=jacobian(x,y)
jac=[ y, x+2*y ; 3*x^2*y-3, x^3];
end
compute further iterates:
x3 = (−0.357838, 1.604407)^T , x4 = (−0.357838, 1.604406)^T = x5 = x6 = . . . .
Some comments are in order:
(i) as for a scalar equation, Newton’s method is only convergent if the starting
value is close enough to the desired zero (root).
(ii) the Newton iterates are only defined so long as the Jacobian remains non-
singular when evaluated at the iterates.
Optimization
One of the most significant applications of Newton’s method is in optimization:
one often desires to find the minimum (or at least a local minimum) value of
a function f : Rn → R. Provided the derivatives exist, the conditions for a
stationary point are well known:
g(α) = ∇f (α) = 0.
The vector g with entries gi = ∂f /∂xi is the gradient. To guarantee that α indeed
gives minimality, we require some conditions of convexity of f , at least locally
in a neighbourhood of α. Taylor’s theorem is again useful provided that the
partial derivatives exist:
where
H = [hij ]i,j=1,2,...,n ,
hij = ∂gi /∂xj (x) = ∂/∂xj (∂f /∂xi )(x) = ∂^2 f /(∂xi ∂xj )(x), (i, j = 1, . . . , n).
Note that H is the Jacobian matrix of the (vector valued) function g(x), and
this is also the Hessian matrix of the (real valued) function f (x) at x. Taking
the Taylor development (24) around an approximate solution xk and dropping
the higher order terms, we obtain the system of linear equations
0 = g(xk ) + H(xk+1 − xk ),
where Bk is an appropriate symmetric matrix, chosen so that mk (x) ≈ f (x) in
a neighbourhood of xk , that is, mk (x) is a locally valid model for f (x). Instead
of minimising f (x), it is simpler to minimise the model function. Its minimiser,
however, is not guaranteed to decrease f when xk+1 lies far away from xk , where
the model mk is a bad fit for f . To impose the descent condition (29), one
often combines the quasi-Newton method with a line-search: take a step
xk+1 = xk + τk dk , (30)
showing that at least for small enough t, the descent condition f (xk +
tdk ) < f (xk ) is satisfied. The choice of τk thus guarantees that (29) holds
true.
ii) Applied to the special case Bk = H, the above discussion shows that New-
ton’s Method (25) can only be expected to converge to a local minimiser
x∗ of f when the Hessian matrix of f at x∗ is positive definite and the
method is started sufficiently close to x∗ . Indeed, since Newton’s Method
is designed to solve g(x) = 0, it can also converge to a local maximiser of
f when it is not started near a local minimiser. Line-searches prevent this
phenomenon from occurring.
iii) Simple choices of Bk that guarantee that Bk is positive definite include
the following, where σ1 denotes the smallest eigenvalue of H,