
Math 361S Lecture Notes

Some background and theorems


January 17, 2019

Topics covered (to be expanded as needed)

• Calculus theorems
  ◦ Taylor’s theorem
  ◦ Mean value theorem
• Order and rate of convergence
  ◦ Linear and superlinear convergence
• Big-O notation

1 Results from calculus


In the following, $f \in C^n([a,b])$ means that $f$ is $n$-times continuously differentiable (i.e. $f^{(n)}$
is continuous) on the interval $[a,b]$, and $f \in C([a,b])$ means that $f$ is continuous on $[a,b]$.

Taylor’s theorem: Suppose $f \in C^n([a,b])$ and $f^{(n+1)}$ exists on $(a,b)$, and let $x_0 \in (a,b)$.
Then for all $x \in (a,b)$ there is a $\xi(x)$ between $x_0$ and $x$ such that
$$f(x) = P_n(x) + R_n(x)$$
where
$$P_n(x) = \sum_{j=0}^{n} \frac{f^{(j)}(x_0)}{j!}(x - x_0)^j, \qquad R_n(x) = \frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x - x_0)^{n+1}.$$

The function $P_n$ is the Taylor polynomial of degree $n$ and $R_n$ is the remainder term.

In more informal arguments, it is sometimes more convenient to write
$$f(x) = P_n(x) + O((x - x_0)^{n+1}).$$
If we set $h = x - x_0$ then the theorem can be written
$$f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{1}{2}h^2 f''(x_0) + \cdots + \frac{1}{n!}h^n f^{(n)}(x_0) + \frac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}.$$
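As a quick sanity check, the remainder estimate can be verified numerically. Here is a minimal Python sketch (the choices $f(x) = e^x$, $x_0 = 0$ and $n = 2$ are just for illustration) confirming that the error of $P_2$ behaves like $O(h^3)$: the ratio error$/h^3$ settles near $|f'''(0)|/3! = 1/6$.

```python
import numpy as np

# Degree-2 Taylor polynomial of f(x) = e^x about x0 = 0:
# P2(h) = 1 + h + h^2/2, with remainder O(h^3) by Taylor's theorem.
for h in [0.1, 0.05, 0.025, 0.0125]:
    p2 = 1 + h + h**2 / 2
    err = abs(np.exp(h) - p2)
    # err / h^3 should approach |f'''(0)| / 3! = 1/6 as h -> 0
    print(f"h = {h:7.4f}   error = {err:.3e}   error/h^3 = {err / h**3:.4f}")
```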

Intermediate value theorem (IVT): Suppose $f \in C([a,b])$. Then for all values $y$ between
$f(a)$ and $f(b)$ there is a point $c \in [a,b]$ such that $f(c) = y$. That is, if $f$ is continuous on $[a,b]$
then $f$ takes on all values between its minimum and maximum on the interval.

Mean value theorem (MVT): Suppose $f \in C([a,b])$ and $f'$ exists on $(a,b)$. Then there
is a point $\xi \in (a,b)$ such that
$$f'(\xi) = \frac{f(b) - f(a)}{b - a}.$$
Integral version: If $f \in C([a,b])$ and $g$ is single-signed and integrable on $[a,b]$ then there
exists a point $\xi \in (a,b)$ such that
$$\int_a^b f(x)g(x)\, dx = f(\xi) \int_a^b g(x)\, dx.$$
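For a concrete illustration, here is a minimal numerical sketch (the choices $f(x) = \cos x$ and $g(x) = x^2$ on $[0,1]$ are arbitrary): the weighted average $\int_a^b fg\,dx \big/ \int_a^b g\,dx$ must land in the range of $f$, i.e. it equals $f(\xi)$ for some $\xi$.

```python
import numpy as np

# Integral MVT with f(x) = cos(x) and g(x) = x**2 (single-signed) on [0, 1].
x = np.linspace(0.0, 1.0, 10001)
f, g = np.cos(x), x**2

# Riemann-sum approximation of (integral of f*g) / (integral of g);
# the uniform dx factors cancel in the ratio.
ratio = np.sum(f * g) / np.sum(g)
print(ratio)                          # approx 0.7174 = cos(xi) for some xi in (0, 1)
print(f.min() <= ratio <= f.max())    # True: the ratio lies in the range of f
```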

2 Order of convergence
Throughout this section, let us consider a sequence $\{x_n\}$ of scalars with
$$\lim_{n \to \infty} x_n = x.$$
The sequence $x_n$ might be approximations from an iterative method or the value of an error
after $n$ iterations (e.g. $\|e^{(k)}\|$ from the Jacobi/Gauss-Seidel methods, converging to zero).

We are interested in a few basic questions:

• Quantifying rates: How do we describe the speed at which a sequence converges?

• Practical perspective: Suppose we can generate the sequence $x_1, x_2, \cdots$ by some
iteration and want $N$ digits of accuracy for $x_n \approx x$. How many iterations must we do
to gain $k$ digits of accuracy (that is, to reduce the error by a factor of $10^{-k}$)?

• Inferring convergence from data: Given an output of errors, how do we determine
how fast the sequence is converging?

2.1 Motivating examples


To begin, let us consider three examples:

i) $x_n = 10^{-n/2}$

ii) $x_n = 10^{-2^n}$

iii) $x_n = 1/n^2$

To get an error of $10^{-2}$, we need $n = 4$, $1$ and $10$ respectively. Now let us consider how many
more iterations are needed to reduce this error to $10^{-4}$.

For (i), the error decreases by a factor of 10 every two iterations:
$$x_4 = 10^{-2}, \quad x_6 = 10^{-3}, \quad x_8 = 10^{-4}, \cdots$$
so it takes an additional 4 iterations. Each iteration gives us half a correct digit.

For (ii), the exponent doubles at each iteration:
$$10^{-2}, \; 10^{-4}, \; 10^{-8}, \; 10^{-16}, \cdots$$
so only one more iteration is required. Each iteration doubles the number of correct digits.

For (iii):
$$x_{10} = 10^{-2} \;\to\; \cdots \text{ 90 iterations } \cdots \;\to\; x_{100} = 10^{-4}.$$
Notice that to achieve an error on the order of machine precision ($\approx 10^{-16}$), (i) takes 32 iterations,
(ii) takes about 4, and (iii) takes about $10^8$ (100 million) iterations. The way in which the error
decreases is dramatically different for each sequence.
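A small script makes the comparison concrete, counting how many iterations each sequence needs to reach a given tolerance (a Python sketch for illustration):

```python
# How many iterations n until x_n <= tol, for each motivating example?
sequences = {
    "(i)   10^(-n/2)": lambda n: 10.0 ** (-n / 2),
    "(ii)  10^(-2^n)": lambda n: 10.0 ** (-(2.0 ** n)),
    "(iii) 1/n^2":     lambda n: 1.0 / n**2,
}

for tol in [1e-2, 1e-4]:
    for name, x in sequences.items():
        n = 1
        while x(n) > tol:
            n += 1
        print(f"tol = {tol:.0e}: {name} reaches it at n = {n}")
```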

2.2 Formal definitions


Linear: The sequence is said to converge linearly if
$$\lim_{n \to \infty} \frac{|x_{n+1} - x|}{|x_n - x|} = r$$
for a constant (the ‘rate’) $r$ with $0 < r < 1$. Informally, this means that
$$|x_{n+1} - x| \approx r|x_n - x| \quad \text{as } n \to \infty,$$
i.e. the error decays by about a factor of $r$ at each step when close to the limit. If the
above holds only with an inequality,
$$|x_{n+1} - x| \leq r|x_n - x| \quad \text{as } n \to \infty,$$
then we say the sequence converges at least linearly to $x$.

Notation: The statement ‘$f(n) \leq g(n)$ as $n \to \infty$’ means that for any $\epsilon > 0$ there is an
index $N$ such that $f(n) \leq g(n) + \epsilon$ for $n \geq N$.

An exponentially decaying sequence $x_n = Cr^n$ (with $0 < r < 1$) converges linearly to $0$ with rate $r$.

As we have seen in (i), to get one more correct digit in the approximation (i.e. to reduce
the error by a factor of ten), we need to take a fixed number of iterations. This property
makes linear convergence good in practice (for many problems) so long as the rate $r$ is not
too close to 1.
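This also suggests a simple recipe for the ‘inferring convergence from data’ question in the linear case: when the limit $x$ is known, the ratios of successive errors settle down to the rate $r$. A minimal sketch (using a made-up linearly convergent sequence):

```python
import numpy as np

# Errors of x_n = 3^(-n) + 5^(-n), which converges linearly to 0 with rate 1/3.
n = np.arange(1, 15)
err = 3.0 ** (-n) + 5.0 ** (-n)

# Successive error ratios |e_{n+1}| / |e_n| should approach r = 1/3.
print(err[1:] / err[:-1])
```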

Sublinear: The sequence converges sublinearly if $x_n \to x$ but
$$\lim_{n \to \infty} \frac{|x_{n+1} - x|}{|x_n - x|} = 1.$$
For example, a sequence¹ for which there is a $k > 0$ such that
$$x_n = \frac{1}{n^k} + \text{smaller terms}$$
will converge sublinearly to zero since
$$\lim_{n \to \infty} \frac{|x_{n+1}|}{|x_n|} = \lim_{n \to \infty} \frac{n^k}{(n+1)^k} = 1.$$
It is often not too bad to get reasonably close to the limit. However, to get very high
accuracy, one requires a large number of iterations, as seen in example (iii).

Superlinear: The sequence converges superlinearly if
$$\lim_{n \to \infty} \frac{|x_{n+1} - x|}{|x_n - x|} = 0.$$
In particular, the sequence converges with order $\alpha$ (for $\alpha > 1$) if
$$\lim_{n \to \infty} \frac{|x_{n+1} - x|}{|x_n - x|^\alpha} = C$$
for a constant $C$ that does not have to be less than one. Informally,
$$|x_{n+1} - x| \approx C|x_n - x|^\alpha \quad \text{as } n \to \infty,$$
i.e. the error gets raised to the power $\alpha$ at each step (and then multiplied by $C$).

The case where $\alpha = 2$ is called quadratic convergence. As shown in example (ii), when
near the limit each iteration multiplies the number of correct digits by a factor $\alpha$. Thus the
approximations become accurate very quickly, needing only a handful of iterations to reach
machine precision.

The limit $C$ determines an ‘initial’ error of sorts that gets reduced as $n \to \infty$. In contrast
to linear convergence, the value of the limit $C$ is not too important, because the effect
of doubling the number of correct digits (or multiplying by $\alpha$) is an ‘exponential’ growth in
the number of digits that will quickly reduce any initial error, no matter how large.

¹Such a sequence is said to have ‘polynomial decay’ because it goes to zero like 1 over a polynomial, in
contrast to ‘exponential decay’ in the linear example.
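As a concrete instance of quadratic convergence (a standard example, added here for illustration), consider Heron's iteration $x_{n+1} = \frac{1}{2}(x_n + 2/x_n)$ for $\sqrt{2}$. The sketch below also estimates the order from computed errors via $\alpha \approx \log(e_{n+1}/e_n) / \log(e_n/e_{n-1})$, a handy trick for the ‘inferring convergence from data’ question:

```python
import math

# Heron's method for sqrt(2): x_{n+1} = (x_n + 2/x_n)/2, quadratically convergent.
x, root = 1.0, math.sqrt(2.0)
errs = []
for _ in range(6):
    errs.append(abs(x - root))
    x = 0.5 * (x + 2.0 / x)

# Estimate the order from triples of successive errors:
# alpha ~ log(e_{n+1}/e_n) / log(e_n/e_{n-1}); the printed values approach 2.
for e0, e1, e2 in zip(errs, errs[1:], errs[2:]):
    if e2 > 1e-14:  # skip estimates contaminated by rounding error
        print(math.log(e2 / e1) / math.log(e1 / e0))
```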

A technicality (optional): It is worth noting that the definition of convergence here is
a little too stringent, because we require that the error drops by a certain factor at each
iteration. This does not allow for some ‘oscillations’ in the sequence, even if they decay to
zero. For example, define a sequence $c_n$ by
$$1/4, \; 1, \; 1/4, \; 1, \cdots$$
and $a_n = 2^{-n} c_n$. Then $a_{n+1}/a_n$ can be as large as $2$, but
$$|a_n| \leq 2^{-n},$$
so $a_n$ converges to zero, and we would like to say that it converges linearly with rate $1/2$.

The notion of convergence introduced previously is technically called Q-linear convergence
(etc.), the Q standing for ‘quotient’. A less restrictive definition is R-linear convergence:
$a_n \to 0$ at least linearly in this new sense if there is a sequence $b_n$ such that $b_n$ converges
Q-linearly to $0$ and $|a_n| \leq |b_n|$ (i.e. if $a_n$ is bounded above by a linearly convergent sequence).
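A quick check of the oscillating example above (a sketch for illustration): the quotients $a_{n+1}/a_n$ oscillate, so the Q-linear definition fails, but $a_n$ is dominated by the Q-linear sequence $2^{-n}$.

```python
import numpy as np

# a_n = 2^(-n) * c_n with c_n alternating 1/4, 1, 1/4, 1, ...
n = np.arange(1, 11)
c = np.where(n % 2 == 1, 0.25, 1.0)
a = 2.0 ** (-n) * c

print(a[1:] / a[:-1])             # quotients oscillate between 2 and 1/8: not Q-linear
print(np.all(a <= 2.0 ** (-n)))   # True: bounded by 2^(-n), hence R-linear, rate 1/2
```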

2.3 Big-O Notation
A function $f(x)$ (or sequence) is ‘Big-O’ of $g(x)$ as $x \to a$, written
$$f = O(g(x)) \quad \text{as } x \to a,$$
or more succinctly as
$$f = O(g(x)),$$
if there is a constant $C > 0$ such that
$$|f(x)| \leq C|g(x)| \quad \text{as } x \to a. \tag{1}$$
The values of $a$ are always either $a = 0$ or $a = \pm\infty$. The limit is usually implied by context
and is left unstated.

To be precise: The meaning of (1) is the following:

If $a = \infty$: there are values $x_0$ and $C$ such that the bound holds for all $x > x_0$.

If $a = 0$: there are values $x_0$ and $C$ such that the bound holds for $|x| < x_0$.

The notation expresses that $f$ grows at most as fast as $g$ (up to a constant factor) as
$x \to a$. (Alternately, if $g \to 0$, then $f$ decays at least as fast as $g$.)

The same definitions apply for a sequence $a_n$, where the only meaningful limit is $n \to \infty$.

For example (in the limit $x \to \infty$),
$$x^3 + x^2 + x = O(x^3), \qquad e^x + x^{10} = O(e^x).$$

Big-O provides an easy way to state the size of a term without being too specific. One can
choose when to stop being exact if there are many terms; e.g. it is also true that
$$x^3 + x^2 + x = x^3 + O(x^2).$$
Since $g$ is only an upper bound, a function that is $O(g)$ is also $O(\text{anything larger than } g)$.
For example,
$$x \to \infty: \quad x^m = O(x^n) \text{ for } 0 < m < n,$$
$$x \to 0: \quad x^m = O(x^n) \text{ for } 0 < n < m.$$

Other examples:

• A degree $n$ polynomial $P_n(x)$ is $O(x^n)$ as $x \to \infty$.

• A degree $n$ polynomial with lowest degree term $a_k x^k$ is $O(x^k)$ as $x \to 0$.

• Similarly, a polynomial in $1/x$ is big-O of its smallest degree term as $x \to \infty$, e.g.
$$\frac{1}{x} - \frac{2}{x^3} + \frac{3}{x^5} = O(1/x) \quad \text{as } x \to \infty.$$

• Every polynomial is $O(e^x)$ as $x \to \infty$.

A Big-O term in an equation eats all smaller terms (convenient when we don’t care about
terms of that size). If $f_1 = O(g)$ and $f_2 = O(h)$ and $0 < g \leq h$ then
$$f_1 + f_2 = O(h).$$
Similarly, $f_1 f_2 = O(gh)$. Take, for instance, calculating a product of power series near
$x = 0$ like
$$\sin x \cos x = (x - O(x^3))(1 - x^2/2 + O(x^4)).$$
The $O(x^3)$ term will absorb the $x^3$ and higher terms, so we only need to calculate the terms
that are $x^2$ and below:
$$\sin x \cos x = x + O(x^3).$$
Note that $x \cdot O(x^4) = O(x^5)$ and the product of the $O(x^3)$ and $O(x^4)$ terms is $O(x^7)$. The obvious
downside is that we lose some information (what is the $x^3$ term above?), so it is important
to only use $O(\cdots)$ when the details of those terms are not important.
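As a quick numerical check of the expansion (a sketch added for illustration): the ratio $|\sin x \cos x - x|/x^3$ should stay bounded as $x \to 0$. In fact it tends to $2/3$, since $\sin x \cos x = \frac{1}{2}\sin 2x = x - \frac{2}{3}x^3 + O(x^5)$, which also answers the question of what the $x^3$ term is.

```python
import numpy as np

# Verify sin(x)cos(x) - x = O(x^3) as x -> 0:
# the ratio |sin(x)cos(x) - x| / x^3 stays bounded (it approaches 2/3).
for x in [0.1, 0.01, 0.001]:
    remainder = np.sin(x) * np.cos(x) - x
    print(f"x = {x:6.3f}   |remainder|/x^3 = {abs(remainder) / x**3:.6f}")
```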
