Lec0general
Taylor’s theorem: Suppose f ∈ C^n([a, b]) and f^(n+1) exists on (a, b), and let x_0 ∈ (a, b).
Then for all x ∈ (a, b) there is a ξ(x) between x_0 and x such that
f(x) = P_n(x) + R_n(x),
where
P_n(x) = Σ_{j=0}^{n} [f^(j)(x_0)/j!] (x − x_0)^j,    R_n(x) = [f^(n+1)(ξ(x))/(n + 1)!] (x − x_0)^{n+1}.
The function P_n is the Taylor polynomial of degree n and R_n is the remainder term.
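To make the statement concrete, here is a small Python sketch (an added example, not part of the original notes) that builds the Taylor polynomial of sin x about x_0 = 0 and compares the actual error with the remainder bound |R_n(x)| ≤ |x − x_0|^{n+1}/(n + 1)!, which follows because every derivative of sin is bounded by 1.

    import math

    def taylor_sin(x, n, x0=0.0):
        # Degree-n Taylor polynomial of sin about x0, plus a bound on |R_n|.
        # The derivatives of sin cycle through sin, cos, -sin, -cos, so the
        # unknown factor |f^(n+1)(xi)| in the remainder is at most 1.
        derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
        p = sum(derivs[j % 4] / math.factorial(j) * (x - x0) ** j
                for j in range(n + 1))
        bound = abs(x - x0) ** (n + 1) / math.factorial(n + 1)
        return p, bound

    x = 0.5
    for n in (1, 3, 5):
        p, bound = taylor_sin(x, n)
        print(n, abs(math.sin(x) - p), bound)  # the actual error sits below the bound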
Intermediate value theorem (IVT): Suppose f ∈ C([a, b]). Then for all values y between
f(a) and f(b) there is a point c ∈ [a, b] such that f(c) = y. That is, if f is continuous on [a, b]
then f takes on all values between its minimum and maximum on the interval.
Mean value theorem (MVT): Suppose f ∈ C([a, b]) and f′ exists on (a, b). Then there
is a point ξ ∈ (a, b) such that
f′(ξ) = (f(b) − f(a))/(b − a).
Integral version: If f ∈ C([a, b]) and g is single-signed and integrable on [a, b], then there
exists a point ξ ∈ (a, b) such that
∫_a^b f(x) g(x) dx = f(ξ) ∫_a^b g(x) dx.
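As a quick illustration (an added example, not from the notes), take f(x) = e^x and g(x) = x^2 on [0, 1]; g is single-signed, both integrals are known in closed form, and the point ξ promised by the theorem can be found explicitly:

    import math

    # Integral mean value theorem with f(x) = exp(x), g(x) = x^2 on [0, 1]:
    #   int_0^1 x^2 e^x dx = e - 2   and   int_0^1 x^2 dx = 1/3.
    int_fg = math.e - 2          # integral of f*g
    int_g = 1.0 / 3.0            # integral of g
    f_xi = int_fg / int_g        # the value f(xi) the theorem guarantees
    xi = math.log(f_xi)          # solve exp(xi) = f_xi for this particular f
    print(f_xi, xi)              # xi ~ 0.77, which indeed lies in (0, 1)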
2 Order of convergence
Throughout this section, let us consider a sequence {x_n} of scalars with
lim_{n→∞} x_n = x.
The sequence x_n might be approximations from an iterative method or the value of an error
after n iterations (e.g. ‖e^(k)‖ from the Jacobi/Gauss–Seidel methods, converging to zero).
i) x_n = 1/n^2
To get an error of 10^-2, we need n = 10, 4 and 1 respectively. Now let us consider how many
more iterations are needed to reduce this error to 10^-4. For the quadratically convergent
sequence, only one more iteration is required: each iteration doubles the number of correct digits.
For (iii):
x_10 = 10^-2 → ··· 90 iterations ··· → x_100 = 10^-4.
Notice that to achieve an error on the order of machine precision, (i) takes about 4 iterations,
(ii) takes 32 and (iii) takes 100 billion iterations. The way in which the error decreases is
dramatically different for each sequence.
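The iteration counts above can be reproduced with a few lines of Python. The three decay patterns used below (1/n^2, a geometric sequence, and a sequence whose exponent doubles each step) are chosen for illustration and are not necessarily the exact sequences (i)–(iii) of the notes:

    import itertools

    def iterations_needed(x, tol):
        # First index n >= 1 with x(n) <= tol.
        for n in itertools.count(1):
            if x(n) <= tol:
                return n

    # Illustrative sequences: polynomial, linear (geometric) and quadratic decay.
    sequences = {
        "1/n^2":     lambda n: 1.0 / n ** 2,
        "2^-n":      lambda n: 2.0 ** -n,
        "10^(-2^n)": lambda n: 10.0 ** -(2 ** n),
    }
    for name, x in sequences.items():
        print(name, iterations_needed(x, 1e-2), iterations_needed(x, 1e-4))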
A sequence converges linearly to x if
lim_{n→∞} |x_{n+1} − x| / |x_n − x| = r
for a constant (the ‘rate’) r with 0 < r < 1. Informally, this means that
|x_{n+1} − x| ≈ r|x_n − x| as n → ∞,
i.e. the error decays by about a factor of r at each step when close to the limit. If the
above only holds with an inequality,
|x_{n+1} − x| ≤ r|x_n − x| as n → ∞,
then the sequence converges at least linearly, with rate at most r.
Notation: The statement ‘f(n) ≤ g(n) as n → ∞’ means that for any ε > 0 there is an
index N such that f(n) ≤ g(n) + ε for n ≥ N.
As we have seen in (i), to get one more correct digit in the approximation (i.e. to reduce
the error by a factor of ten), we need to take a fixed number of iterations. This property
makes linear convergence good in practice (for many problems) so long as the rate r is not
too close to 1.
It is often not too bad to get reasonably close to the limit. However, to get very high
accuracy, one requires a large number of iterations, as seen in example (iii).
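In practice the rate r can be estimated by looking at the ratios of successive errors. A minimal sketch, assuming for illustration a sequence x_n = 3 + 0.8^n that converges linearly to 3 with rate 0.8:

    # Ratios of successive errors e_n = |x_n - x| approach the linear rate r.
    limit = 3.0
    xs = [3 + 0.8 ** n for n in range(1, 15)]
    errs = [abs(x - limit) for x in xs]
    ratios = [errs[n + 1] / errs[n] for n in range(len(errs) - 1)]
    print(ratios[-3:])   # the ratios settle near the rate 0.8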
A sequence converges to x with order α > 1 if
lim_{n→∞} |x_{n+1} − x| / |x_n − x|^α = C
for a constant C that does not have to be less than one. Informally,
|x_{n+1} − x| ≈ C|x_n − x|^α as n → ∞,
i.e. the error gets raised to the power α at each step (and then multiplied by C).
The case where α = 2 is called quadratic convergence. As shown in example (ii), when
near the limit each iteration multiplies the number of correct digits by a factor α. Thus the
approximations become accurate very quickly, needing only a handful of iterations to reach
machine precision.
The limit C determines an ‘initial’ error of sorts that gets reduced as n → ∞. In contrast
to linear convergence, the value of the limit C is not too important because the effect
of doubling the number of correct digits (or multiplying by α) is an ‘exponential’ growth in
the number of digits that will quickly reduce any initial error, no matter how large.
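The order α can likewise be estimated from three consecutive errors via α ≈ log(e_{n+1}/e_n) / log(e_n/e_{n−1}). A short sketch, with errors generated (as an assumption, for illustration) by the quadratic recurrence e_{n+1} = C e_n^2:

    import math

    # Generate errors satisfying e_{n+1} = C * e_n^2 (quadratic convergence)
    # and estimate alpha from consecutive errors.
    C, e = 0.5, 0.1
    errs = [e]
    for _ in range(5):
        e = C * e ** 2
        errs.append(e)
    alphas = [math.log(errs[n + 1] / errs[n]) / math.log(errs[n] / errs[n - 1])
              for n in range(1, len(errs) - 1)]
    print(alphas)   # estimates cluster around alpha = 2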
¹ Such a sequence is said to have ‘polynomial decay’ because it goes to zero like 1 over a polynomial, in
contrast to ‘exponential decay’ in the linear example.
A technicality (optional): It is worth noting that the definition of convergence here is
a little too stringent, because we require that the error drops by a certain factor at each
iteration. This does not allow for some ‘oscillations’ in the sequence, even if they decay to
zero. For example, define a sequence c_n by
1/4, 1, 1/4, 1, ···
and let a_n = c_n 2^-n. Then
|a_n| ≤ 2^-n,
so a_n converges to zero, and we would like to say that it converges linearly with rate 1/2.
However, the ratio |a_{n+1}|/|a_n| oscillates and never settles down to a single value, so the
definition above does not strictly apply.
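A few lines of Python make the point; the construction a_n = c_n 2^-n used here is a reconstruction of the example (the notes give only the values of c_n and the bound |a_n| ≤ 2^-n):

    # c_n cycles 1/4, 1, 1/4, 1, ... and a_n = c_n * 2^-n, so |a_n| <= 2^-n.
    a = [(0.25 if n % 2 == 0 else 1.0) * 2.0 ** -n for n in range(12)]
    ratios = [a[n + 1] / a[n] for n in range(len(a) - 1)]
    print(ratios[:6])                                   # oscillates: 2.0, 0.125, 2.0, 0.125, ...
    print(all(a[n] <= 2.0 ** -n for n in range(12)))    # True: the bound holds
    # Over any two steps the error shrinks by 2.0 * 0.125 = 1/4, i.e. an
    # average rate of 1/2 per step, yet the one-step ratio never converges.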
2.3 Big-O Notation
A function f (x) (or sequence) is ‘Big-O’ of g(x) as x → a, written
f = O(g(x)) as x → a
or more succinctly as
f = O(g(x))
if there is a constant C > 0 such that
|f(x)| ≤ C|g(x)|
for x close enough to a.
The values of a are always either a = 0 or a = ±∞. The limit is usually implied by context
and is left unstated.
If a = ∞: there are values x0 and C such that the bound holds for all x > x0 .
If a = 0: There are values x0 and C such that the bound holds for |x| < x0 .
The notation expresses that f grows at most as fast as g (up to a constant factor) as
x → a. (Alternately, if g → 0 then f decays at least as fast as g).
The same definitions apply for a sequence a_n, where the only meaningful limit is n → ∞.
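A Big-O claim can be sanity-checked numerically by watching the ratio |f|/|g| as the limit is approached; it should remain bounded. A small sketch with an assumed example f(x) = 2x + x^2, which is O(x) as x → 0 and O(x^2) as x → ∞:

    # f = O(g) means |f(x)| / |g(x)| stays bounded as x approaches the limit.
    f = lambda x: 2 * x + x ** 2
    print([f(x) / x for x in (1e-1, 1e-3, 1e-6)])        # bounded (-> 2) as x -> 0
    print([f(x) / x ** 2 for x in (1e1, 1e3, 1e6)])      # bounded (-> 1) as x -> infinity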
Big-O provides an easy way to state the size of a term without being too specific. One can
choose when to stop being exact if there are many terms, e.g. it is also true that
x^3 + x^2 + x = x^3 + O(x^2) as x → ∞.
Since g is only an upper bound, a function that is O(g) is also O(anything larger than g).
For example, x^3 + x^2 + x is O(x^3), but it is also O(x^4), O(x^5) and so on as x → ∞.
Other examples:
• Similarly, a polynomial in 1/x is big-O of its smallest degree term as x → ∞, e.g.
1/x − 2/x^3 + 3/x^5 = O(1/x) as x → ∞.
A Big-O term in an equation eats all smaller terms (convenient when we don’t care about
terms of that size). If f_1 = O(g) and f_2 = O(h) and 0 < g ≤ h then
f_1 + f_2 = O(h).
Similarly, f_1 f_2 = O(gh). Take, for instance, calculating a product of power series near
x = 0 like
sin x cos x = (x − O(x^3))(1 − x^2/2 + O(x^4)).
The O(x^3) term will absorb the terms of order x^3 and above, so we only need to calculate
the terms that are x^2 and below:
sin x cos x = x + O(x^3).
Note that x · O(x^4) = O(x^5) and the product of O(x^3) and O(x^4) terms is O(x^7). The obvious
downside is that we lose some information (what is the x^3 term above?), so it is important
to only use O(···) when the details of those terms are not important.
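The bookkeeping above can be checked with a computer algebra system; the snippet below uses sympy (an added tool, not something the notes rely on) and also recovers the x^3 coefficient that the O(···) notation hides:

    import sympy as sp

    x = sp.symbols('x')
    # Expand sin(x)*cos(x) about x = 0 up to (but not including) order 4.
    print(sp.series(sp.sin(x) * sp.cos(x), x, 0, 4))
    # -> x - 2*x**3/3 + O(x**4), consistent with sin x cos x = x + O(x^3).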