MA2715
Advanced Calculus and Numerical Methods
Lecture Notes by M.K. Warby in 2019/20
Dept. of Mathematics, Brunel Univ., UK
Email: [Link]@[Link]
URL: [Link]

MA2715 and MA2895 – the week breakdown

The lectures for MA2715 run for weeks 17–27 with seminars starting in week 18.
The related assessment block MA2895 will run with teaching events for weeks 17–27
involving computer labs in which Matlab will be used.

Assessment dates and assessment information

Some of the assessment is under the code MA2895_CB, a 10 credit module with the title
“Numerical Analysis Project”, and for this the breakdown is as follows:

• Class test worth 30%: Planned for week 22.

• Assignment worth 70%: Deadline is likely to be the start of week 28. The assignment
tasks are likely to be given out in the next three to five weeks with some of the topics
related to the material of MA2715.

We will confirm later as to the precise arrangement of the test planned for week 22
and also the deadline for the assignment.

There will also be questions on the material taught on all parts of MA2715_SB in the
20 credit assessment block MA2815_CN, which has the title “Advanced Mathematics II
and Numerical Methods”. This exam will be in the April/May 2020 exam period.

Recommended reading and my sources

There is no essential text to obtain for this module although there are many texts which
cover at least most of the material and among the sources for the notes that I will generate
are the following books.

1. Kendall E. Atkinson. Elementary numerical analysis. Wiley, 3rd edition, 2004. QA297.A83.

2. Richard L. Burden and J. Douglas Faires. Numerical analysis. Brooks/Cole, 7th edition, 2001. QA297.B87. The Brunel library has later editions with the 9th edition being published in 2011.

3. Glyn James. Advanced Modern Engineering Mathematics. Pearson Prentice Hall, 4th edition, 2010. TA330.A48. ISBN: 9780130454256. (Chapter 7 of this book contains material about Fourier series which is planned to be the last chapter covered in MA2715.)

Books about Matlab related to MA2895

The following is repeated in the handouts for the Matlab sessions and for completeness I
include it here as well. There is no core Matlab book that you have to buy for the module
although I do myself own several books and among those that I consult sometimes are
the following.

1. Timothy A. Davis. MATLAB primer. CRC Press, 8th edition, 2011. QA297.D38.

2. Brian D. Hahn and D. T. Valentine. Essential MATLAB for engineers and scientists. Academic Press, 5th edition, 2013. QA297.H345.

3. D. J. Higham and Nicholas J. Higham. MATLAB guide. Society for Industrial and Applied Mathematics, 2nd edition, 2005. QA297.H525. (I also have the 3rd edition published in Feb 2017 but this is not yet in the Brunel library.)

Please note that Matlab itself has an extensive help system.



Chapter 1

Vectors and matrices: notation, revision and norms

1.1 Notation

In this module the following notation will be used.

R denotes the set of real numbers.


Rn denotes the set of vectors of length n with real entries.
Rm,n denotes the set of matrices with m rows and n columns with real entries.

For column vectors the notation x = (xi ) ∈ Rn is shorthand for a real column vector x
with n rows with the row i entry being xi, i.e.

    x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.

If x is a column vector then xT = (x1 , . . . , xn ) is a row vector. Here the superscript T


denotes transpose. Column vectors and row vectors are special cases of a matrix and
in the notation used we have reduced how much we write by just using 1 subscript for
the entries. When both m > 1 and n > 1 we have a matrix and we use the notation
A = (aij ) ∈ Rm,n as shorthand for a real matrix with m rows and n columns with the i, j
entry being aij, i.e.

    A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.
When we refer to the size of a matrix the terms m-by-n or m × n will sometimes be used.
In this module we mostly just consider the case when m = n, i.e. we mostly just consider
square matrices.
Other notation which will frequently be used are the following.


The zero vector is denoted by 0. The size of 0 will depend on the context.

The n × n identity matrix is denoted by

    I = (\delta_{ij}) = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix},
    \qquad\text{where}\qquad
    \delta_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}
The identity matrix is a square diagonal matrix with each diagonal entry being equal to 1.
When just I is written the size should be clear from the context but if the size is not
immediately clear then we will write In when the size is n × n.

The standard base vectors for Rn are denoted by e1, . . . , en which correspond respectively to the columns of I. Thus when n = 3 we have

    I = I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
    e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
    e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad
    e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

1.2 The vector space Rn

Rn is an example of a vector space and the vectors e1 , . . . , en give the standard basis for
this space and further these base vectors are orthonormal. (Vector spaces were discussed
at the start of level 2 as part of the module MA2721.) We consider next the definitions of
these terms using the column vector notation used in this module.

The inner product of vectors x and y is defined by

    x^T y = (x_1, x_2, \ldots, x_n) \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.

In books and other sources you may see this operation written as (x, y) or < x, y > or as
x · y but in these notes the matrix product notation of a row vector times a column vector
will be used.

Vectors x and y are orthogonal if

    x^T y = 0.

A vector x has unit length if

    x^T x = x_1^2 + \cdots + x_n^2 = 1.


(As we will see in section 1.5, for vectors x of any length the quantity xT x is also referred
to as the square of the 2-norm of x.)

Vectors v1, . . . , vn are orthonormal if the vectors each have unit length and they are orthogonal to each other, i.e.

    v_i^T v_i = 1, \quad i = 1, \ldots, n \qquad\text{and}\qquad v_i^T v_j = 0 \text{ when } i \neq j.

Vectors v1, . . . , vn are linearly independent if

    \alpha_1 v_1 + \cdots + \alpha_n v_n = 0

is only true when α1 = · · · = αn = 0. It follows almost immediately that if vectors are orthonormal then they are linearly independent. The opposite of linearly independent is linearly dependent. Vectors v1, . . . , vn are linearly dependent if there exists α = (αi) ≠ 0 such that

    \alpha_1 v_1 + \cdots + \alpha_n v_n = 0.
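As a quick numerical illustration (a sketch with an arbitrary example, not part of the printed notes), the following Matlab lines check these definitions for the columns of a small matrix: orthonormal columns give V'*V equal to the identity, and the number of linearly independent columns can be found with rank.

% Sketch: checking orthonormality and linear independence of the columns of V.
V = [1 1; 1 -1] / sqrt(2);   % two orthonormal vectors in R^2 (example values)

G = V' * V;                  % Gram matrix: the (i,j) entry is vi' * vj
disp(G)                      % equals the identity when the columns are orthonormal

r = rank(V);                 % number of linearly independent columns
fprintf('rank = %d (columns independent if this equals %d)\n', r, size(V, 2));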

1.3 The vector Ax and the solution of Ax = b

In the context of things done in MA2715 note that when we have a n × n matrix A and a
n × 1 column vector x the product gives another column vector, i.e.

b = Ax ∈ Rn .

This can be represented in a number of ways. With b = (bi) the matrix multiplication means that

    b_i = (\text{ith row of } A)\, x = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, \ldots, n.

Instead of considering things entry-by-entry we can consider representing A column-by-column, i.e.

    A = (a_1, \ldots, a_n) \quad\text{where}\quad a_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix} = \text{jth column of } A.

It is valid here to write

    Ax = (a_1, \ldots, a_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
       = a_1 x_1 + \cdots + a_n x_n
       = x_1 a_1 + \cdots + x_n a_n.

Provided the matrix sizes are compatible matrix multiplication works in “block form”
to explain how the second line follows from the first line. The final line follows because
aj xj = xj aj as a consequence of what we mean by a scalar times a vector. The point of


writing the expression in this form is that Ax is a linear combination of the n columns of
A. When the columns of A are linearly dependent there is a vector x ≠ 0 such that
Ax = 0
and this vector is an eigenvector of A with 0 as the eigenvalue. When the columns of A
are linearly dependent the matrix is singular, the determinant det A = 0, the matrix does
not have an inverse matrix (i.e. it is not invertible) and we cannot uniquely solve a linear
system
Ax = b.
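As a small Matlab illustration of this column view (a sketch using an arbitrary example matrix and vector), A*x agrees with the corresponding combination of the columns, and for a matrix with linearly dependent columns the function null returns a non-zero vector x with Ax = 0.

% Sketch: A*x as a linear combination of the columns of A (example data only).
A = [2 3 1; -2 -2 -2; -2 -4 4];
x = [1; 2; 3];

b1 = A * x;                                      % usual matrix-vector product
b2 = x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3);    % the same vector, column by column
disp(norm(b1 - b2))                              % 0 (up to rounding)

% Linearly dependent columns: the matrix is singular and has a non-trivial null space.
S = [1 2; 2 4];
z = null(S);                                     % a non-zero vector with S*z = 0
disp(S * z)                                      % approximately the zero vector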
In chapter 2 we consider how computer packages solve Ax = b and we hence need to
determine when there is a unique solution and when there is not, i.e. as part of the
computation we need to be able to decide if the matrix A is a singular matrix or not. In
your previous study you will have done many examples with small matrices (e.g. n = 2
or n = 3) where you attempt to obtain a solution or you detect that there is no unique
solution and you possibly also determine a general solution if it exists. For larger matrices
with computations done on a computer with floating point arithmetic used (which usually
has rounding errors) we cannot generally be so precise in deciding that a matrix is singular
but instead we can just determine that a matrix is “close to being singular”. This is
described briefly in section 1.5 when vector and matrix norms and the matrix condition
number are introduced.

1.4 Eigenvalues and eigenvectors

Eigenvalues and eigenvectors will appear a few times in this module and we give here
briefly a revision of the definition of these terms together with some of the basic properties.

Let A denote a real n × n matrix. A vector v ≠ 0 is an eigenvector of A with eigenvalue λ if

    Av = λv.
This can be equivalently written as
Av − λv = 0 or (A − λI)v = 0.
As v ≠ 0 this means that we have a non-trivial solution of (A − λI)v = 0 and this implies
that A − λI is singular and its determinant is 0. If we let
pA (t) = det(A − tI)
then this defines the characteristic polynomial of A, and the eigenvalues are the roots
of this polynomial. As a minor point, the characteristic polynomial is often defined as
det(tI − A), e.g. this is the case in Matlab, which only differs from the previous definition
by a factor of −1 when n is odd and both versions have the same roots which are the
solutions of
det(A − λI) = 0
which is known as the characteristic equation. There is a result known as the funda-
mental theorem of algebra which states that such a polynomial has n roots λ1 , . . . , λn
in the complex plane such that
pA(t) = (λ1 − t)(λ2 − t) · · · (λn − t)


which is a polynomial of degree n. The set of eigenvalues {λ1 , . . . , λn } is known as the


spectrum of A. The spectral radius of a matrix is defined by

ρ(A) = max {|λ1 |, . . . , |λn |} .

As a final comment here, as A is a real matrix the polynomial pA (t) has real coefficients
and as a consequence any non-real roots of pA (t) must occur in complex conjugate pairs.

1.4.1 Examples
1. Let

       A = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

   Then

       p_A(t) = \det(I_2 - tI_2) = \begin{vmatrix} 1-t & 0 \\ 0 & 1-t \end{vmatrix} = (1-t)^2.
The eigenvalues are λ1 = λ2 = 1, i.e. we have a repeated eigenvalue. Every non-zero
vector in R2 is an eigenvector.

2. Let

       A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

   Then

       p_A(t) = \begin{vmatrix} 2-t & 1 \\ 1 & 2-t \end{vmatrix} = (2-t)^2 - 1 = (1-t)(3-t).

   The eigenvalues are λ1 = 1 and λ2 = 3. To get an eigenvector corresponding to
   λ1 = 1 we need to obtain a non-zero solution to

       (A - \lambda_1 I)x = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

   A solution is x^T = (1, −1).

   Similarly to get an eigenvector corresponding to λ2 = 3 we need to obtain a non-zero
   solution to

       (A - \lambda_2 I)x = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

   A solution is x^T = (1, 1).
   Observe that the eigenvector corresponding to λ1 is orthogonal to the eigenvector
   corresponding to λ2, which is a consequence of the matrix being real and symmetric.
   The spectral radius of this matrix is max {1, 3} = 3. (A Matlab check of this example
   appears after the list of examples.)

3. Let

       A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.

   Then

       p_A(t) = \begin{vmatrix} -t & -1 \\ 1 & -t \end{vmatrix} = t^2 + 1 = (i-t)(-i-t).


The eigenvalues are the complex conjugate pair λ1 = i and λ2 = −i. The eigen-
vectors are also complex. The spectral radius of this matrix is 1. The matrix is
a rotation matrix corresponding to anti-clockwise rotation by angle π/2 about the
origin.

4. Let

       A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.

   Then

       p_A(t) = \begin{vmatrix} -t & 1 \\ 0 & -t \end{vmatrix} = t^2.

   The eigenvalues are λ1 = λ2 = 0. To obtain an eigenvector we need to obtain a
   non-zero solution of

       (A - 0I)x = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

The only information that these equations give is that x2 = 0 and thus the only
direction which is an eigenvector is xT = (1, 0). This is an example of a deficient
matrix in that it has an eigenvalue of algebraic multiplicity equal to 2 (this is from
considering the characteristic polynomial) but the dimension of the eigen-subspace
corresponding to this eigenvalue is only 1. The dimension of the eigen-subspace
corresponding to an eigenvalue is known as the geometric multiplicity.
Note that in this example the spectral radius is 0 but the matrix is not the zero
matrix although A^2 is the zero matrix. This example shows that for general matrices
the spectral radius cannot be used to define a norm for square matrices; norms are
considered later in this chapter.
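As a quick numerical check of examples 2 and 4 above (a sketch, not part of the original notes), Matlab's eig function returns the eigenvalues and eigenvectors.

% Sketch: checking examples 2 and 4 with eig (the eigenvectors returned by
% Matlab are normalised, so they may differ from the hand solutions by a scaling).
A = [2 1; 1 2];
[V, D] = eig(A);
disp(diag(D)')        % eigenvalues 1 and 3
disp(V' * V)          % the columns are orthonormal as A is real and symmetric

B = [0 1; 0 0];       % the deficient matrix of example 4
[W, E] = eig(B);
disp(diag(E)')        % both eigenvalues are 0
disp(W)               % the two columns are essentially parallel: only one eigenvector direction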

Later in the module we will need eigenvalues and eigenvectors when we consider the
solution of systems of differential equations of the form u′ = Au where u = u(x) ∈ Rn
and A is a n × n constant matrix.

1.4.2 A summary of key points about eigenvalues and eigenvectors

Let vi ≠ 0 denote an eigenvector associated with the eigenvalue λi , i.e.

Av i = λi v i , i = 1, 2, . . . , n.

The following are some important results about eigenvalues which are needed in this
module.

1. An n × n matrix A is non-singular if and only if λi ≠ 0 for i = 1, 2, . . . , n.


2. If v 1 , . . . , v n are linearly independent then they give a basis for Rn . In matrix form
we have

    A(v_1, \ldots, v_n) = (Av_1, \ldots, Av_n) = (\lambda_1 v_1, \ldots, \lambda_n v_n)
                        = (v_1, \ldots, v_n) \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.

If we let V = (v 1 , . . . , vn ) be the matrix with the columns as the eigenvectors and


we let D = diag(λ1 , . . . , λn ) then in matrix form we have

AV = V D.

As we are assuming that the columns of V are linearly independent it follows that
V has an inverse and we can write

V −1 AV = D and A = V DV −1 .

The first result is often referred to as diagonalising the matrix and the second version
gives a representation of the matrix in terms of the eigenvalues and eigenvectors.
Using this representation we can immediately get powers of A as

A2 = (V DV −1 )(V DV −1 ) = V D 2 V −1

and more generally


Ak = V D k V −1 , k = 1, 2, . . . .
When all the eigenvalues are non-zero we also have

A−1 = V D −1 V −1 .

3. If λ1 , . . . , λn are distinct then it can be shown that the eigenvectors v 1 , . . . , vn are


linearly independent and the conditions in the previous item hold.

4. If A is a real symmetric matrix then we can diagonalise A using an orthogonal


matrix, i.e. there exists an orthogonal matrix Q such that

QT AQ = D, A = QDQT . (1.4.1)

This is a special case of the situation above with V = Q as the inverse of an


orthogonal matrix is its transpose, i.e. Q−1 = QT .
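A short Matlab check of items 2 and 4 (a sketch using an arbitrary symmetric example matrix): eig returns V and D with AV = V D, powers satisfy A^k = V D^k V^{-1}, and for a real symmetric matrix the eigenvector matrix can be taken to be orthogonal.

% Sketch: checking A*V = V*D and A = V*D*inv(V) for an example matrix.
A = [2 1 0; 1 2 1; 0 1 2];      % real symmetric example
[V, D] = eig(A);

disp(norm(A*V - V*D))           % approximately 0
disp(norm(A - V*D/V))           % A = V*D*V^{-1}, approximately 0
disp(norm(V'*V - eye(3)))       % V is orthogonal here as A is symmetric
disp(norm(A^2 - V*D^2/V))       % powers: A^k = V*D^k*V^{-1}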

1.5 Vector and matrix norms

For a given object (vectors and matrices in our case) a norm is a way of giving a ‘size’ to
the object. For example the modulus of a real or complex number defines a norm and the
Euclidean length of a vector in three dimensional geometry defines a norm. With a way
of giving a size to objects like vectors and matrices we can then consider when a vector
x is close to another vector y by considering the size of x − y and we can consider when


a matrix A is close to another matrix B by considering the size of A − B. In the context


of solving Ax = b situations like these arise when we have an approximate solution x̂ and
we need to estimate or compute the size of the error x − x̂ and the size of the residual
term b − Ax̂.
In the n dimensional space Rn there are many different norms which can be defined
and we will consider the most popular ones shortly. In the applications that are considered
in this module it does not usually matter too much which norm is used as in a certain
sense all norms in Rn are equivalent which roughly means that if kxk is very small or
very big in one norm then it is also very small or very big respectively in another norm
provided n is not too large.

1.5.1 Vector norms – the axioms

We begin abstractly by giving the vector norm axioms.


A function f : Rn → R is a vector norm if the following conditions are satisfied:

(i) f (x) ≥ 0 for all x ∈ Rn with f (x) = 0 if and only if x = 0.

(ii) f (αx) = |α|f (x) for all α ∈ R, x ∈ Rn . (This is linearity.)

(iii) f (x + y) ≤ f (x) + f (y) for all x, y ∈ Rn . (This is the Triangle inequality.)

If f is a norm then we use the notation

    f(x) = ‖x‖.

(When more than one norm is being used or when the particular norm being used is
not clear from the context we will use subscripts to distinguish between different norms.)
In this ‖·‖ notation we repeat the norm axioms as follows:

(i) ‖x‖ ≥ 0 for all x ∈ Rn with ‖x‖ = 0 if and only if x = 0.

(ii) ‖αx‖ = |α| ‖x‖ for all α ∈ R and x ∈ Rn.

(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x and y in Rn.

1.5.2 Commonly used vector norms

The most commonly used vector norms are the 2-norm, the ∞-norm and the 1-norm
and for x ∈ Rn these are defined as follows.

    \|x\|_2 = \left(x_1^2 + x_2^2 + \cdots + x_n^2\right)^{1/2} = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2} = (x^T x)^{1/2}


is the 2-norm or Euclidean norm. (When we just refer to the length of a vector it is
usually understood that this means the two norm.) In usage the next most popular norm
is
    \|x\|_\infty = \max_{1 \le i \le n} |x_i|

which is the ∞-norm. The 1-norm is


    \|x\|_1 = |x_1| + |x_2| + \cdots + |x_n| = \sum_{i=1}^{n} |x_i|.

There is a function called norm() in Matlab which computes these norms. For example,
if you put

x=[1, -2, 2]
n2=norm(x)
n2b=norm(x, 2)
ninf=norm(x, inf)
n1=norm(x, 1)

Then the output is

x =
1 -2 2
n2 = 3
n2b = 3
ninf = 2
n1 = 5

norm(x) and norm(x, 2) both give the 2-norm whilst in the other cases a second argument
of inf or 1 has to be given.
As a final comment on the definition of the 2-norm, if we need to consider vectors with
non-real entries then we need to change slightly what is given above to be such that
    \|x\|_2^2 = |x_1|^2 + \cdots + |x_n|^2 = \sum_{i=1}^{n} |x_i|^2 = \bar{x}^T x.

Here the notation x̄ = (x̄i ), means that each entry is the complex conjugate of the
corresponding entry of x = (xi ). In a later chapter there will be examples involving real
matrices which have complex eigenvalues and eigenvectors but there will not be too many
other examples in this module when we have vectors with non-real entries.

1.5.3 Matrix norms induced by vector norms

Let A denote a n × n matrix. Given any vector norm for vectors in Rn we can define the
matrix norm of A induced by the vector norm as

    ‖A‖ = max {‖Ax‖ : ‖x‖ = 1} .        (1.5.1)


It is straightforward to show that a similar set of axioms to the vector case also hold here.
More precisely we have the following.

(i) ‖A‖ ≥ 0 for all A ∈ Rn,n with ‖A‖ = 0 if and only if A = 0.
    (A = 0 means A is the zero matrix.) This is known as the non-negative condition.

(ii) ‖αA‖ = |α| ‖A‖ for all scalars α ∈ R and all A ∈ Rn,n. This is known as the
    linearity condition.

(iii) ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ Rn,n. This is known as the triangle inequality.

We can add to these properties if we consider matrix-vector multiplication and matrix-matrix multiplication.

In the first case, an immediate consequence of the definition of the matrix norm is that if we consider a vector x ≠ 0 then

    y = x / ‖x‖

is a vector of norm 1 in the norm being used and

    x = ‖x‖ y

so that

    ‖Ax‖ = ‖x‖ ‖Ay‖ ≤ ‖x‖ ‖A‖.

Thus for all vectors x ∈ Rn we have

    ‖Ax‖ ≤ ‖A‖ ‖x‖        (1.5.2)

and for at least one direction there is a vector x with ‖Ax‖ = ‖A‖ ‖x‖.

In the case of matrix-matrix multiplication we have that if A and B are both n × n matrices then the products AB and BA are also n × n matrices. If x is such that ‖x‖ = 1 then ABx = A(Bx) and thus

    ‖ABx‖ ≤ ‖A‖ ‖Bx‖,      using (1.5.2) in the case of the matrix A,
          ≤ ‖A‖ ‖B‖ ‖x‖,   using (1.5.2) in the case of the matrix B,
          = ‖A‖ ‖B‖,       as the vector x has norm 1.

Thus

    ‖AB‖ = max {‖ABx‖ : ‖x‖ = 1} ≤ ‖A‖ ‖B‖.

Similarly

    ‖BA‖ ≤ ‖A‖ ‖B‖.

These indicate that we can bound the norm of the product of two matrices in terms of the norms of the matrices involved and in particular note that it follows that

    ‖A^k‖ ≤ ‖A‖^k,    k = 1, 2, . . . .
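A quick Matlab check of (1.5.2) and these product bounds, using random matrices and vectors (an illustrative sketch only, shown here for the ∞-norm):

% Sketch: checking ||Ax|| <= ||A|| ||x||, ||AB|| <= ||A|| ||B|| and ||A^k|| <= ||A||^k.
A = rand(4); B = rand(4); x = rand(4, 1);

fprintf('||Ax||  = %g <= ||A|| ||x||  = %g\n', norm(A*x, inf), norm(A, inf)*norm(x, inf));
fprintf('||AB||  = %g <= ||A|| ||B||  = %g\n', norm(A*B, inf), norm(A, inf)*norm(B, inf));
fprintf('||A^3|| = %g <= ||A||^3      = %g\n', norm(A^3, inf), norm(A, inf)^3);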


1.5.4 The matrix norms ‖A‖1, ‖A‖2 and ‖A‖∞

Each of the common vector norms generates an induced matrix norm and it turns out
that it is not too difficult to get explicit expressions for these, although we omit the details
of the derivations in these notes; some of them are considered in the exercise sheet
questions. We use the notation ‖A‖1, ‖A‖2 and ‖A‖∞ for the matrix norms induced by
the vector 1-norm, 2-norm and ∞-norm respectively. The results are as follows.

    \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| = maximum row sum of absolute values,

    \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| = maximum column sum of absolute values,

    \|A\|_2 = \left(\rho(A^T A)\right)^{1/2},

where, as in section 1.4,

    ρ(B) = spectral radius of B = largest eigenvalue of B in magnitude.

These results show that it is straightforward to compute ‖A‖∞ and ‖A‖1 but that the
matrix 2-norm involves computing the dominant eigenvalue of A^T A, i.e. the largest in
magnitude eigenvalue of A^T A, which can usually only be done numerically and it involves
much more work than the other two cases.

As an example, consider the following 3 × 3 matrix which appears on the Wikipedia page about matrix norms.

    A = \begin{pmatrix} 3 & 5 & 7 \\ 2 & 6 & 4 \\ 0 & 2 & 8 \end{pmatrix}.
The row sums of the absolute values are 15, 12 and 10 and thus ‖A‖∞ = 15. The column
sums of the absolute values are 5, 13 and 19 and thus ‖A‖1 = 19. We can use Matlab to
work out the 2-norm and this can be done by the following commands

A=[3, 5, 7; 2, 6, 4; 0, 2, 8]
ninf=norm(A, inf)
n1=norm(A, 1)
n2=norm(A)

which generate the following output

A =
3 5 7
2 6 4
0 2 8
ninf = 15
n1 = 19
n2 = 13.686


1.6 The spectral radius and matrix norms

There are some connections between the spectral radius ρ(A) of a matrix A and any of
the induced matrix norms ‖A‖.

Firstly, if v ≠ 0 is an eigenvector of A with eigenvalue λ, i.e. Av = λv, then any scaling
of v is also an eigenvector and thus there is an eigenvector corresponding to λ such that
‖v‖ = 1. In this case

    |λ| = ‖λv‖ = ‖Av‖ ≤ ‖A‖.

This shows that an induced matrix norm is a bound on the magnitude of all the eigenvalues
and thus in particular

    ρ(A) ≤ ‖A‖.

As we can easily compute ‖A‖1 and ‖A‖∞ we get that

    ρ(A) ≤ min {‖A‖1 , ‖A‖∞} .

Secondly, if the matrix is real and symmetric, i.e. A ∈ Rn×n and A^T = A, then
‖A‖1 = ‖A‖∞ and

    ‖A‖2^2 = ρ(A^T A) = ρ(A^2).

When we have a real symmetric matrix all the eigenvalues are real, and for any matrix

    Av = λv implies that A^2 v = A(λv) = λ^2 v.

The eigenvalues of A^2 are the squares of the eigenvalues of A and thus in the real symmetric
case

    ‖A‖2 = ρ(A) = max {|λi| : λi is an eigenvalue of A} .

As we still have bounds on ρ(A), by using ‖A‖1 = ‖A‖∞ it follows that in the real
symmetric case

    ‖A‖2 = ρ(A) ≤ ‖A‖1 = ‖A‖∞ .
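These bounds are easy to confirm numerically (a sketch; the matrices used are just examples, one of them being the 3 × 3 matrix from section 1.5.4):

% Sketch: rho(A) <= min(||A||_1, ||A||_inf), and ||A||_2 = rho(A) for symmetric A.
A = [3 5 7; 2 6 4; 0 2 8];            % the earlier example matrix
rhoA = max(abs(eig(A)));              % spectral radius
fprintf('rho(A) = %g, min of 1- and inf-norms = %g\n', ...
        rhoA, min(norm(A, 1), norm(A, inf)));

S = [2 1; 1 2];                       % real symmetric example
fprintf('||S||_2 = %g, rho(S) = %g\n', norm(S), max(abs(eig(S))));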

1.7 The matrix condition number

When A is an n × n matrix which is non-singular it has an inverse A−1 and the matrix
condition number is defined as

    κ(A) = ‖A‖ ‖A−1‖.

When A is singular we say that κ(A) = ∞. From the result that ‖AB‖ ≤ ‖A‖ ‖B‖, that
I = A−1A and that ‖I‖ = 1 it follows that for all square matrices

    1 ≤ κ(A) ≤ ∞.

This quantity can be expressed in terms of eigenvalues when we have a real symmetric
matrix since, when all the eigenvalues of A are non-zero, the relation

    Av = λv   rearranges to   A−1 v = (1/λ) v.


Thus if the eigenvalues of a non-singular matrix A are λ1, . . . , λn then the eigenvalues of
A−1 are 1/λ1, . . . , 1/λn. If for convenience we suppose that the labelling is such that

    0 < |λn| ≤ · · · ≤ |λ1|

then

    0 < 1/|λ1| ≤ · · · ≤ 1/|λn|

and with the 2-norm and a symmetric matrix

    ‖A‖2 = |λ1|,   ‖A−1‖2 = 1/|λn|

and the condition number using the 2-norm is

    κ(A) = |λ1| / |λn|.
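For a real symmetric matrix this is easy to check with Matlab's cond and eig (a sketch using an arbitrary example matrix):

% Sketch: for a symmetric matrix the 2-norm condition number is |lambda_1|/|lambda_n|.
A = [2 1 0; 1 2 1; 0 1 2];          % symmetric example
lam = sort(abs(eig(A)), 'descend'); % eigenvalue magnitudes, largest first

kappa_eig = lam(1) / lam(end);      % |lambda_1| / |lambda_n|
kappa_2   = cond(A, 2);             % Matlab's 2-norm condition number
fprintf('via eigenvalues: %g, via cond: %g\n', kappa_eig, kappa_2);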

The significance of the condition number to this module is that a large condition
number indicates that the matrix is close to a singular matrix and we cannot expect to
accurately solve a linear system Ax = b by any means. With some further analysis it can
be shown that 1/κ(A) gives the relative distance of the matrix A to a singular matrix,
i.e. for all singular matrices C of the same shape as A we have

    ‖A − C‖ / ‖A‖ ≥ 1/κ(A),

and there exists a singular matrix B such that

    ‖A − B‖ / ‖A‖ = 1/κ(A).

In the case of the 2-norm the nearest singular matrix B to the non-singular matrix A can
be quite easily described by using what is known as the singular value decomposition of
A which is a topic beyond this module. In the particular case that A is real and symmetric
the situation is slightly easier by noting that we have the representation of A in terms of
eigenvalues and eigenvectors given in (1.4.1)

    A = QDQ^T = λ1 v1 v1^T + · · · + λn−1 vn−1 vn−1^T + λn vn vn^T.

If we replace the smallest eigenvalue in magnitude (which is λn) by 0 then we get a singular
matrix. If we denote this singular matrix by B then it has the representation

    B = λ1 v1 v1^T + · · · + λn−1 vn−1 vn−1^T

and it can be shown that this is the nearest singular matrix to the matrix A with

    ‖A − B‖2 = |λn|,   ‖A‖2 = |λ1|   and   ‖A − B‖2 / ‖A‖2 = |λn| / |λ1|.


Chapter 2

Direct methods for solving Ax = b

2.1 Introduction

In this chapter we consider Gauss elimination type methods for solving a system of
linear equations
Ax = b,
where A is an n × n matrix and b is a n × 1 column vector. Methods of this type are used
in Matlab when you give the command

x=A\b;

when A and b have the appropriate shape in the Matlab workspace. In fact when you
put A\b in Matlab the software attempts to classify the type of matrix involved and
then selects the “best” technique to use to reliably solve the problem in an efficient way.
Different methods are used if for example the matrix is symmetric or if it is tri-diagonal
as compared to what is done if A is a full matrix with no special properties. (The
symmetric case also needs a property known as positive definite.) In this chapter we
just consider this general case and consider in general the Gauss elimination procedure
involving systematically reducing a general system to an equivalent system involving a
triangular matrix. A key result of the chapter is in showing that the Gauss elimination
process is equivalent to factorizing the matrix A in terms of triangular matrices, i.e. in
its basic form (when it works) we show that we get the factorization

A = LU,

where L is a unit lower triangular matrix and U is an upper triangular matrix.


In the case that n = 4 this means that we have a factorization of the form

    \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}
    = \begin{pmatrix} 1 & 0 & 0 & 0 \\ l_{21} & 1 & 0 & 0 \\ l_{31} & l_{32} & 1 & 0 \\ l_{41} & l_{42} & l_{43} & 1 \end{pmatrix}
    \begin{pmatrix} u_{11} & u_{12} & u_{13} & u_{14} \\ 0 & u_{22} & u_{23} & u_{24} \\ 0 & 0 & u_{33} & u_{34} \\ 0 & 0 & 0 & u_{44} \end{pmatrix}.

An upper triangular matrix is a matrix for which all the entries below the diagonal are 0,
a lower triangular matrix is a matrix for which all the entries above the diagonal are 0


and a unit lower triangular matrix is a lower triangular matrix with the diagonal entries
all being equal to 1. For Gauss elimination to work in its basic form, A needs to have
special properties. For a more stable way of factorizing a matrix packages instead create
a factorization of the form
P A = LU,
where P is a permutation matrix. In this form the matrix P A is obtained from A by
rearranging the rows. In Matlab you can get the matrices generated by using the command

[L, U, P]=lu(A);

Gauss elimination and implementations involving LU factorization are examples of


direct methods for solving Ax = b where the term direct method means that the
exact solution is obtained in a finite number of steps provided exact arithmetic is used.
However it should be realised that to do computations quickly on a computer floating
point arithmetic is used and this has rounding error with calculation typically done with
close to 16 decimal digit accuracy. Thus in practice we only get an approximate solution
although hopefully the method can generate an approximate solution which is close to
machine accuracy.

2.2 Forward and backward substitution algorithms

We start by considering systems with triangular matrices and we start with two examples
which can be done by hand calculations.
Consider the upper triangular system
    
1 1 0 0 x1 0
0 2 1 0 x2   3 
0 0 3 1 x3  = −2 .
    

0 0 0 4 x4 4

The last equation only involves x4 and hence we get this immediately. Then using x4 we
can use the previous equation to get x3 and continue in a similar way to get x2 and then
x1 . For the details

4x4 = 4, giving x4 = 1,
3x3 = −2 − x4 , giving x3 = −1,
2x2 = 3 − x3 , giving x2 = 2,
x1 = −x2 , giving x1 = −2.

Next consider the lower triangular system

    \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
    = \begin{pmatrix} 2 \\ 3 \\ 2 \\ 0 \end{pmatrix}.


The first equation only involves x1 and hence we get this immediately. Then using x1 we
can use the second equation to get x2 and continue in a similar way to get x3 and then
x4 . For the details

x1 = 2,
x2 = 3 − x1 , giving x2 = 1,
x3 = 2 − x1 − x2 , giving x3 = −1,
x4 = −x1 − x2 − x3 , giving x4 = −2.

The procedure used in the case of an upper triangular matrix is known as backward
substitution and the procedure used in the case of the lower triangular system is known
as forward substitution. We now consider how to do this generally for n × n triangular
matrices. For a general upper triangular system Ux = b, i.e.

    u_{11} x_1 + u_{12} x_2 + \cdots + u_{1n} x_n = b_1
                 u_{22} x_2 + \cdots + u_{2n} x_n = b_2
                                                  \vdots
              u_{n-1,n-1} x_{n-1} + u_{n-1,n} x_n = b_{n-1}
                                        u_{nn} x_n = b_n

backward substitution can be described as follows.

    x_n = b_n / u_{nn},
    x_i = \left( b_i - \sum_{k=i+1}^{n} u_{ik} x_k \right) / u_{ii}, \quad i = n-1, \ldots, 1.

For a general lower triangular system Lx = b, i.e.

    l_{11} x_1 = b_1
    l_{21} x_1 + l_{22} x_2 = b_2
                            \vdots
    l_{n1} x_1 + \cdots + l_{n,n-1} x_{n-1} + l_{nn} x_n = b_n

forward substitution can be described as follows.

    x_1 = b_1 / l_{11},
    x_i = \left( b_i - \sum_{k=1}^{i-1} l_{ik} x_k \right) / l_{ii}, \quad i = 2, \ldots, n.

In both cases the procedure requires that the diagonal entries are non-zero as we divide
by these entries.
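Both algorithms translate directly into Matlab. The following functions are a sketch (the function names are my own, each would be saved in its own file); in practice the backslash operator detects triangular matrices and does the same work.

function x = forward_subst(L, b)
% Forward substitution for a lower triangular system L*x = b.
n = length(b);
x = zeros(n, 1);
x(1) = b(1) / L(1, 1);
for i = 2:n
    x(i) = (b(i) - L(i, 1:i-1) * x(1:i-1)) / L(i, i);
end
end

function x = backward_subst(U, b)
% Backward substitution for an upper triangular system U*x = b.
n = length(b);
x = zeros(n, 1);
x(n) = b(n) / U(n, n);
for i = n-1:-1:1
    x(i) = (b(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
end
end

For the two 4 × 4 examples above, backward_subst and forward_subst reproduce the hand solutions (−2, 2, −1, 1)^T and (2, 1, −1, −2)^T respectively.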
In algorithms implemented on a computer using floating point arithmetic an operation
involving 1 multiplication and 1 addition/subtraction is sometimes referred to as a flop
although it seems now more common to refer to this as 2 flop operations as 2 operations
are involved and this is what will be done here. The computational cost of an algorithm
is often expressed in terms of how many such operations are needed to solve a problem.
The speed of a computer is also often expressed in terms of how many of such operations
can be done in 1 second (the terms flops is used in this case). In the case of the number


of operations involved in the backward or forward substitution this can be determined


without too much effort and in the case of forward substitution the computation of

    b_i - \sum_{k=1}^{i-1} l_{ik} x_k

involves 2(i − 1) flop operations and this must be done for i = 2, . . . , n. In total this gives

2(1 + 2 + 3 + · · · + (n − 1)) = n(n − 1).

When n is large there is a much smaller number of divisions and we say that in total there
are of the order of n^2 operations which is often written as O(n^2). The operation count
with backward substitution is the same. Thus in each case the triangular matrices involve
O(n2 /2) entries and the number of operations needed to solve the systems is O(n2 ).

2.3 Solving a system LU x = b

Before we consider how to obtain a factorization A = LU we consider briefly how this


helps us to solve a system of equations

Ax = LUx = b,

i.e. we start by assuming that we have a lower triangular matrix L and an upper
triangular U such that A = LU. All we do is to let y = Ux and first solve a system
Ly = b and once we have y we solve a second system Ux = y to get x. To express this as
an algorithm we have the following.

Solve Ly = b by forward substitution.


Solve Ux = y by backward substitution.

The number of operations involved is O(2n^2). This is far fewer operations than the
number of operations needed to get the triangular matrices L and U, which we show
needs O(2n^3/3) operations as we describe in the next section.
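As a sketch in Matlab (using an arbitrary example system; the lu call produces the pivoted factorization P A = LU described in section 2.5, and backslash applied to a triangular matrix performs the corresponding substitution):

% Sketch: solving A*x = b via an LU factorization, in two triangular solves.
A = [2 3 1; -2 -2 -2; -2 -4 4];
b = [1; 2; 3];

[L, U, P] = lu(A);   % P*A = L*U
y = L \ (P*b);       % forward substitution with the lower triangular L
x = U \ y;           % backward substitution with the upper triangular U

disp(norm(A*x - b))  % approximately 0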

2.4 Reduction to triangular form by Gauss elimination

In Gauss elimination you eliminate unknowns by subtracting appropriate multiples of one
equation from the other equations and in terms of the coefficients this involves subtracting
multiples of one row from other rows. In the case that n = 4 the sequence of matrices
generated by basic Gauss elimination has the following form.

    \begin{pmatrix} x&x&x&x \\ x&x&x&x \\ x&x&x&x \\ x&x&x&x \end{pmatrix} \to
    \begin{pmatrix} x&x&x&x \\ 0&x&x&x \\ 0&x&x&x \\ 0&x&x&x \end{pmatrix} \to
    \begin{pmatrix} x&x&x&x \\ 0&x&x&x \\ 0&0&x&x \\ 0&0&x&x \end{pmatrix} \to
    \begin{pmatrix} x&x&x&x \\ 0&x&x&x \\ 0&0&x&x \\ 0&0&0&x \end{pmatrix}.


In each case x just indicates an entry which is likely to be non-zero and in the above we
are just considering the changes in the matrix without considering the changes that are
also done to the right hand side vector. The first matrix refers to the starting matrix A
and the final matrix is an upper triangular matrix which we refer to as U. There are 3
steps here with each step associated with creating zero entries below the diagonal in a
given column. In the case of a general n × n matrix there are n − 1 such steps.
There is a choice at each stage as to which row you use to subtract multiples of from
the other rows and there is even the possibility of re-ordering the unknowns which involves
re-ordering the columns. In the basic version of Gauss elimination none of the above is
done and we always subtract multiples of the current top row from the rows below. Thus
in the basic form you are not also making decisions about which row to use to make the
hand computations as easy as possible and you are not changing one row by multiplying
it by a scalar as you may have done when dealing with small matrices in level one. Basic
Gauss elimination is hence one specific order of the operations that you were taught at
level one in the context of solving a linear system and we consider next how to describe
this in a matrix way and we show that it is equivalent to a factorization.
To describe the operations in a matrix way let A(0) = A and let A(k) , k = 1, . . . , n − 1
denote the intermediate matrices with U = A(n−1) denoting the final upper triangular
matrix assuming the procedure runs to completion. In the case n = 4 the matrices are

    A^{(0)} = A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}, \quad
    A^{(1)} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a^{(1)}_{22} & a^{(1)}_{23} & a^{(1)}_{24} \\ 0 & a^{(1)}_{32} & a^{(1)}_{33} & a^{(1)}_{34} \\ 0 & a^{(1)}_{42} & a^{(1)}_{43} & a^{(1)}_{44} \end{pmatrix},

    A^{(2)} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a^{(1)}_{22} & a^{(1)}_{23} & a^{(1)}_{24} \\ 0 & 0 & a^{(2)}_{33} & a^{(2)}_{34} \\ 0 & 0 & a^{(2)}_{43} & a^{(2)}_{44} \end{pmatrix}, \quad
    U = A^{(3)} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a^{(1)}_{22} & a^{(1)}_{23} & a^{(1)}_{24} \\ 0 & 0 & a^{(2)}_{33} & a^{(2)}_{34} \\ 0 & 0 & 0 & a^{(3)}_{44} \end{pmatrix}.
As a consequence of the order in which the reduction is done the first row never changes,
the second row never changes once A(1) has been computed, the third row never changes
once A(2) has been computed and the final entry to be computed is the 4, 4 entry of U.
We show next how to describe each reduction step as a matrix multiplication with
A(1) = M1 A(0) , A(2) = M2 A(1) , U = A(3) = M3 A(2)
so that
U = M3 M2 M1 A
and
A = (M3 M2 M1 )−1 U = (M1−1 M2−1 M3−1 )U = LU
where
L = M1−1 M2−1 M3−1 .
The matrices Mk are known as Gauss transformation matrices, which are described in a
moment, and we will show that the matrix L is a unit lower triangular matrix.
Firstly, to obtain A(1) we subtract multiples of the first row from rows 2, 3 and 4. Now
the first row at this stage is

    \left( a^{(0)}_{11}, a^{(0)}_{12}, a^{(0)}_{13}, a^{(0)}_{14} \right)


and the ith row, also at this stage, is

    \left( a^{(0)}_{i1}, a^{(0)}_{i2}, a^{(0)}_{i3}, a^{(0)}_{i4} \right)

and thus to eliminate x_1 the multipliers are

    m_{i1} = \frac{a^{(0)}_{i1}}{a^{(0)}_{11}}, \quad i = 2, 3, 4.

In terms of rows the ith row of the next matrix A^{(1)} is

    \left( a^{(0)}_{i1}, a^{(0)}_{i2}, a^{(0)}_{i3}, a^{(0)}_{i4} \right) - m_{i1} \left( a^{(0)}_{11}, a^{(0)}_{12}, a^{(0)}_{13}, a^{(0)}_{14} \right), \quad i = 2, 3, 4.

To describe this in a matrix way observe that

    e_1^T A^{(0)} = (1, 0, 0, 0) \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}
                  = (a_{11}, a_{12}, a_{13}, a_{14}) = \text{1st row of } A^{(0)},

and

    \begin{pmatrix} 0 \\ m_{21} \\ m_{31} \\ m_{41} \end{pmatrix} e_1^T A^{(0)}

is the 4 × 4 matrix to subtract from A^{(0)} to get A^{(1)}. Thus if we let

    m_1 = \begin{pmatrix} 0 \\ m_{21} \\ m_{31} \\ m_{41} \end{pmatrix}

then

    A^{(1)} = A^{(0)} - m_1 e_1^T A^{(0)} = (I - m_1 e_1^T) A^{(0)}

and the Gauss transformation matrix at the first step is

    M_1 = I - m_1 e_1^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -m_{21} & 1 & 0 & 0 \\ -m_{31} & 0 & 1 & 0 \\ -m_{41} & 0 & 0 & 1 \end{pmatrix}.

To get A(2) from A(1) we subtract multiples of the 2nd row from the rows below and the
vector of the multipliers is

    m_2 = \begin{pmatrix} 0 \\ 0 \\ m_{32} \\ m_{42} \end{pmatrix}, \quad\text{where}\quad m_{i2} = \frac{a^{(1)}_{i2}}{a^{(1)}_{22}}, \quad i = 3, 4.


The Gauss transformation matrix is

    M_2 = I - m_2 e_2^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -m_{32} & 1 & 0 \\ 0 & -m_{42} & 0 & 1 \end{pmatrix}.

Similarly to get U = A^{(3)} from A^{(2)} we have

    m_3 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ m_{43} \end{pmatrix}, \quad\text{where}\quad m_{43} = \frac{a^{(2)}_{43}}{a^{(2)}_{33}},

and the Gauss transformation matrix is

    M_3 = I - m_3 e_3^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -m_{43} & 1 \end{pmatrix}.

The next task is to describe the inverse matrices M1−1 , M2−1 and M3−1 and the product
of these which is L = M1−1 M2−1 M3−1 . This is very straightforward as

Mk−1 = I + mk eTk , k = 1, 2, 3.

This follows by noting that at the kth stage the kth row does not change and thus we can
reverse the reduction process by adding the same multiples of the kth row to the other
rows. Alternatively we can verify the expression for the inverse by just considering the
product
(I − mk eTk )(I + mk eTk ) = I − mk (eTk mk )eTk = I
where the last equality is because

eTk mk = kth entry of mk = 0

as mk has entries of 0 in positions 1, . . . , k. For the product of these matrices first note
that

M1−1 M2−1 = (I + m1 eT1 )(I + m2 eT2 )


= I + m1 eT1 + m2 eT2 + m1 (eT1 m2 )eT2
= I + m1 eT1 + m2 eT2

where the last equality is because

eT1 m2 = 1st entry of m2 = 0.

Similar reasoning gives

L = M1−1 M2−1 M3−1 = (I + m1 eT1 + m2 eT2 )(I + m3 eT3 )


= I + m1 eT1 + m2 eT2 + m3 eT3


where the last equality is because


eT1 m3 = eT2 m3 = 0.
Thus in full

    L = I + m_1 e_1^T + m_2 e_2^T + m_3 e_3^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ m_{21} & 1 & 0 & 0 \\ m_{31} & m_{32} & 1 & 0 \\ m_{41} & m_{42} & m_{43} & 1 \end{pmatrix}
which is a unit lower triangular matrix and the entries below the diagonal are exactly the
multipliers used in the basic Gauss elimination process.

The general n × n case

The above description easily generalises to the case of a general n × n matrix which
requires elimination below the diagonal in columns 1, 2, . . . , n−1 and this involves matrices
A^{(0)} = A, A^{(1)}, . . . , A^{(n−1)} = U. In the general case we have for k = 1, . . . , n − 1,

    A^{(k)} = M_k A^{(k-1)}, \quad M_k = I - m_k e_k^T, \quad
    m_k = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ m_{k+1,k} \\ \vdots \\ m_{nk} \end{pmatrix}.

In the above the multipliers are

    m_{ik} = \frac{a^{(k-1)}_{ik}}{a^{(k-1)}_{kk}}, \quad i = k+1, \ldots, n,        (2.4.1)
the inverse matrices are
Mk−1 = I + mk eTk
and for the product we have
M1−1 · · · Mr−1 = I + m1 eT1 + · · · + mr eTr , r = 2, . . . , n − 1.
This can be proved by induction and there is an exercise question about this. For the
algorithm to run to completion we need the matrix to be such that we never divide by 0,
i.e. we need that a^{(k-1)}_{kk} ≠ 0 for k = 1, . . . , n − 1 and also for U to be invertible we need
that a^{(n-1)}_{nn} ≠ 0. The terms u_{kk} = a^{(k-1)}_{kk} are the pivot elements and they are the diagonal
entries of U (assuming that the algorithm runs to completion to create U).
To count the number of operations involved note that to get A^{(1)} from A involves
computing (n − 1)^2 entries corresponding to positions 2 ≤ i, j ≤ n and there are 2
floating point operations associated with each entry. Similarly to get A^{(2)} from A^{(1)} requires
changing (n − 2)^2 entries with 2 operations in each case and continuing in this way the
total reduction process requires the following number of operations

    2\left( (n-1)^2 + (n-2)^2 + \cdots + 2^2 + 1 \right) = 2\,\frac{(n-1)n(2n-1)}{6} \sim \frac{2n^3}{3} \quad\text{(for large } n\text{)}.

It is not necessary to be too precise here other than to note that we have the n^3 term and
we can get the constant 2/3 by considering the integral 2\int_0^n x^2\,dx as the area under the
curve y = 2x^2 is similar in some sense to the sum. Hence as we double the size of a matrix
we get 4 times as many entries and we require about 8 times as much work to solve a
system Ax = b when A is a full matrix with no special properties. This operation count
should be compared with the forward and backward substitution algorithms which just
have a term involving n^2. The point to note here is that for large matrices the reduction
to triangular form is where most of the computation is done.
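The basic (no pivoting) procedure is short to write in Matlab. The following function is a sketch (the name lu_nopivot is my own, not a built-in); it fails with an error if a zero pivot is met, exactly as discussed above.

function [L, U] = lu_nopivot(A)
% Basic Gauss elimination without row interchanges: A = L*U with L unit
% lower triangular and U upper triangular (assumes all pivots are non-zero).
n = size(A, 1);
L = eye(n);
U = A;
for k = 1:n-1
    if U(k, k) == 0
        error('Zero pivot encountered: basic Gauss elimination fails.');
    end
    for i = k+1:n
        L(i, k) = U(i, k) / U(k, k);            % multiplier m_{ik}
        U(i, :) = U(i, :) - L(i, k) * U(k, :);  % subtract a multiple of row k
        U(i, k) = 0;                            % entry below the pivot is now exactly 0
    end
end
end

For the example worked out by hand below, lu_nopivot reproduces the same L and U.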

Example

Let

    A = \begin{pmatrix} 2 & 3 & 1 \\ -2 & -2 & -2 \\ -2 & -4 & 4 \end{pmatrix}.

The Gauss elimination procedure to construct the LU factorization involves the following
steps.

Elimination in column 1:

    m_1 = \begin{pmatrix} 0 \\ -1 \\ -1 \end{pmatrix}, \quad A^{(1)} = \begin{pmatrix} 2 & 3 & 1 \\ 0 & 1 & -1 \\ 0 & -1 & 5 \end{pmatrix}.

Elimination in column 2:

    m_2 = \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}, \quad U = A^{(2)} = \begin{pmatrix} 2 & 3 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \end{pmatrix}.

The unit lower triangular matrix is

    L = I + m_1 e_1^T + m_2 e_2^T = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & -1 & 1 \end{pmatrix}.

Thus we have shown that

    \begin{pmatrix} 2 & 3 & 1 \\ -2 & -2 & -2 \\ -2 & -4 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \end{pmatrix}.

To check the calculations using Matlab all that is needed is to put the following.

A=[2, 3, 1; -2, -2, -2; -2, -4, 4]


[L U]=lu(A)

The output generated is


A =
2 3 1
-2 -2 -2
-2 -4 4
L =
1 0 0
-1 1 0
-1 -1 1
U =
2 3 1
0 1 -1
0 0 4

It is worth noting here that the Matlab command lu will not necessarily give the LU
factorization when used in this way as it always uses pivoting which is described shortly. In
this particular example the pivoting decision was that nothing needed to be re-arranged.

The factorization of sub-matrices

For a n × n matrix A = (aij) the k × k principal sub-matrix is

    \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}.

In the previous example we can immediately note that for the sub-matrices of size
k = 1 and k = 2 we have
    2 = (1)(2)

and

    \begin{pmatrix} 2 & 3 \\ -2 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & 1 \end{pmatrix}.
This shows that in this example the factorization of the full matrix also gives the factor-
ization of all the principal sub-matrices. To prove that this is true in general we just need
to consider what matrix multiplication means when we have A = LU with A = (aij ),
L = (lij ) and U = (uij ).

aij = (LU)ij = (ith row of L)(jth column of U)


Xn
= lir urj
r=1
min{i,j}
X
= lir urj
r=1

where the last equality is because L and U are triangular and thus lir = 0 for r > i and
urj = 0 for r > j. If we restrict the indices to 1 ≤ i, j ≤ k then these entries of A are


entirely determined by the entries of L and U in this range, i.e.

    \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}
    = \begin{pmatrix} l_{11} & & \\ \vdots & \ddots & \\ l_{k1} & \cdots & l_{kk} \end{pmatrix}
    \begin{pmatrix} u_{11} & \cdots & u_{1k} \\ & \ddots & \vdots \\ & & u_{kk} \end{pmatrix}, \quad k = 1, 2, \ldots, n.

Performing the calculations in a different order

In computer packages the order in which the entries are determined when computing a
LU factorization is generally different to that which is done in the basic Gauss elimination
algorithm. To illustrate that a different order can be used consider again the previous
example.

    A = \begin{pmatrix} 2 & 3 & 1 \\ -2 & -2 & -2 \\ -2 & -4 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}.
Motivated by the observation that the factorization of A also means the factorization of
all principal sub-matrices we can calculate the 1st column of U, then the 2nd row of L,
then the 2nd column of U, then the 3rd row of L and finally the 3rd column of U. Thus

a11 = 2 = u11 ,
a21 = −2 = l21 u11 , giving l21 = −1,
a12 = 3 = u12 ,
a22 = −2 = l21 u12 + u22 = −3 + u22 , giving u22 = 1.

At this stage we hence have

    \begin{pmatrix} 2 & 3 & 1 \\ -2 & -2 & -2 \\ -2 & -4 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}.

Continuing to get the 3rd row of L and the 3rd column of U.

a31 = −2 = l31 u11 , giving l31 = −1,


a32 = −4 = l31 u12 + l32 u22 = −3 + l32 , giving l32 = −1,
a13 = 1 = u13 ,
a23 = −2 = l21 u13 + u23 = −1 + u23 , giving u23 = −1,
a33 = 4 = l31 u13 + l32 u23 + u33 = −1 + 1 + u33 , giving u33 = 4.

This procedure of starting with u11 = a11 and then successively computing a row of
L followed by a column of U can be generalised to the case of an n × n matrix A and
it can be implemented more efficiently on a computer than is the case when the Gauss
elimination order of operations is used. There is no saving when small problems are done
by hand calculations other than we more quickly determine that a principal sub-matrix
is singular as this is the case if we obtain a diagonal entry of U which is equal to 0. More
precisely, if the diagonal entries u11 , . . . , uk−1,k−1 are non-zero but ukk = 0 then the k × k
principal sub-matrix has zero determinant and is not invertible and we cannot continue.


2.5 Partial pivoting and the factorization P A = LU

Basic Gauss elimination for a non-singular matrix A only works if all the pivot elements
are non-zero and this is the case if and only if all the principal sub-matrices of A are
invertible. Situations do arise when the matrix A has such properties but otherwise we
need to consider re-arranging the rows at each stage. We consider some examples to
illustrate this.
The basic Gauss elimination algorithm fails for the 2 × 2 system

    \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}

although we can spot the solution x2 = 1 and x1 = 2. To modify the basic Gauss
elimination method we need to swap the equations to give

    \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix}

which is already in upper triangular form.


The basic Gauss elimination algorithm works in theory if we change slightly the previous example and have

    \begin{pmatrix} 10^{-20} & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}.

With hand computations

    m_1 = \begin{pmatrix} 0 \\ 10^{20} \end{pmatrix}, \quad L = \begin{pmatrix} 1 & 0 \\ 10^{20} & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 10^{-20} & 1 \\ 0 & 1 - 10^{20} \end{pmatrix}.

However with floating point arithmetic U is rounded to

    \begin{pmatrix} 10^{-20} & 1 \\ 0 & -10^{20} \end{pmatrix}

on a computer. The problem with doing this calculation on a computer is that although
A is not nearly singular, and the inverse is

    A^{-1} = \frac{1}{10^{-20} - 1} \begin{pmatrix} 1 & -1 \\ -1 & 10^{-20} \end{pmatrix} \approx \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix},

both L and U are very close to being singular and combining this with rounding errors
that occur leads to the linear system not being solved accurately on a computer with
floating point arithmetic. In this particular case solving Ly = b gives

    y_1 = 1, \quad y_2 = 3 - 10^{20} \text{ which rounds to } -10^{20}.

Then solving with the matrix U after rounding, i.e. solving Ux = y, gives

    x_2 = 1, \quad\text{and}\quad x_1 = 10^{20}(1 - 1) = 0

whereas the exact solution is very close to (2, 1)^T. It is the rounding error which leads to
the (1 − 1) = 0 part in the computation of x_1 and the inaccurate result for this component.


If we had done the calculation exactly then at this stage we would have had a very small
number here and the computation of the large number times the small number would
have given the correct value for x1 .
To get an accurate answer in this example we swap the rows to give

    \begin{pmatrix} 1 & 1 \\ 10^{-20} & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 10^{-20} & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 - 10^{-20} \end{pmatrix}
and everything works well.
Now consider the 3 × 3 matrix

    A = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.

Basic Gauss elimination starts and we get

    m_1 = \begin{pmatrix} 0 \\ 1/2 \\ 1/4 \end{pmatrix}, \quad A^{(1)} = \begin{pmatrix} 4 & 2 & 1 \\ 0 & 0 & 1/2 \\ 0 & 1/2 & 3/4 \end{pmatrix}.
As the entry in the 2, 2 position is 0 the basic procedure cannot continue but if we swap
the 2nd and 3rd rows to give

    \tilde{m}_1 = \begin{pmatrix} 0 \\ 1/4 \\ 1/2 \end{pmatrix}, \quad \tilde{A}^{(1)} = \begin{pmatrix} 4 & 2 & 1 \\ 0 & 1/2 & 3/4 \\ 0 & 0 & 1/2 \end{pmatrix}
then we are already in upper triangular form. For the factorization we have

    PA = \begin{pmatrix} 4 & 2 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1/4 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 4 & 2 & 1 \\ 0 & 1/2 & 3/4 \\ 0 & 0 & 1/2 \end{pmatrix}.
The permutation matrix P is obtained by swapping rows 2 and 3 of the identity matrix,
i.e.

    P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
To check if Matlab does the same re-arrangements and gives the same output you just
need to put the following.

A=[4, 2, 1; 2, 1, 1; 1, 1, 1]
[L, U, P]=lu(A)

This generates the following output.

A =
4 2 1
2 1 1
1 1 1


L =
1.00000 0.00000 0.00000
0.25000 1.00000 0.00000
0.50000 0.00000 1.00000
U =
4.00000 2.00000 1.00000
0.00000 0.50000 0.75000
0.00000 0.00000 0.50000
P =
1 0 0
0 0 1
0 1 0

In the examples given we have either had to swap rows to avoid dividing by 0 or it
was desirable to swap to avoid very large entries in L and U. The strategy in partial
pivoting is to swap so that the largest entry is put in the pivot position after considering
all the candidates for the pivot entries in the column being reduced. In the general n × n
case this means that when we are at the stage of creating A(k) from A(k−1) instead of just
computing the multipliers
    m_{ik} = \frac{a^{(k-1)}_{ik}}{a^{(k-1)}_{kk}}, \quad i = k+1, \ldots, n,

(as first given in (2.4.1)) the strategy involves first determining r ≥ k such that

    |a^{(k-1)}_{rk}| = \max\left\{ |a^{(k-1)}_{kk}|, \ldots, |a^{(k-1)}_{nk}| \right\}
and then swapping row r and row k of the relevant matrices if r > k. After the swapping
this guarantees that all the multipliers (which are entries of L) have magnitude of less
than or equal to 1. With slight variations this is what is done in practice and it works
well in practice in almost all cases although it is not guaranteed to work in every possible
case for matrices A which are well conditioned. The exercise sheets contain an example
illustrating when this pivoting strategy fails (at least when n is not small) but failure in
practice is actually rare, i.e. if matrices are generated randomly then the probability of
failure is extremely low. The analysis of this is beyond the scope of this module.

2.6 Remarks about the inverse matrix A−1

In this chapter Gauss elimination and the related LU factorization have been considered
as a way of solving a general linear system Ax = b. In level 1 you would have also been
taught methods for finding the inverse matrix A−1 and then you would possibly have used
it to construct the solution to the linear system by doing the multiplication

x = A−1 b.

In level 1 this is likely to have only been considered when the number of unknowns n = 2
or n = 3. This is not the most efficient approach for these small problems but not too


much work is required overall. For larger problems that you do on a computer there is
about 3 times as much work involved in solving a linear system by first computing the
inverse matrix as compared to using Gauss elimination. Thus when we write x = A−1 b it
should be considered as a way of describing the solution and not as a way of computing
the solution. There are not too many situations when you need to compute A−1 but in
cases that you do the methods to do this can make use of the methods described so far
in this chapter as follows.
Before an algorithm is given we note that if ej denotes the usual base vector (i.e. the
jth column of I) then
xj = A−1 ej = jth column of A−1
and thus the jth column of the inverse matrix satisfies the linear system

Axj = ej or equivalently P Axj = P ej

for any permutation matrix P . Hence if we successively choose b to be e1 , . . . , en and solve


Ax = b then we get A−1 column-by-column. To do this efficiently we first factorize A and
then repeatedly use the forward and backward substitution algorithms. An algorithm is
hence as follows.

Step 1: Construct the factorization P A = LU.

Step 2: For j = 1, 2, . . . , n solve


LUxj = P ej .

Note that P ej is just a re-arrangement of the entries in ej and thus it is one of the other
base vectors. In an efficient implementation there is no multiplication done in determining
the vector which is described as P ej .
To summarize the statements in this section, we should not first compute an inverse
matrix in order to help solve a linear system but we should instead use the techniques for
solving linear systems to compute the inverse matrix.
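A Matlab sketch of the algorithm above (for illustration only, with an arbitrary example matrix; in practice inv(A) does something similar internally):

% Sketch: computing A^{-1} column by column from one factorization P*A = L*U.
A = [4 2 1; 2 1 1; 1 1 1];         % example matrix
n = size(A, 1);
[L, U, P] = lu(A);                 % factorize once

Ainv = zeros(n);
for j = 1:n
    ej = zeros(n, 1); ej(j) = 1;   % jth column of the identity
    y  = L \ (P * ej);             % forward substitution
    Ainv(:, j) = U \ y;            % backward substitution
end

disp(norm(Ainv - inv(A)))          % approximately 0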
The following is a Matlab script to check the claim that using the inverse matrix takes
about 3 times longer to solve a large system.

% create a random n-by-n matrix


% and create a problem with a known solution y
n=8000;
A=rand(n, n);
y=ones(n, 1);
b=A*y;

% use \ to solve the system and time it and check the accuracy
tic;
x=A\b;
e=toc;
acc=norm(x-y, inf);
fprintf('n=%d, time taken to solve= %6.2f, accuracy=%10.2e\n', ...
n, e, acc);


% repeat via first computing the inverse


tic;
x=inv(A)*b;
e=toc;
acc=norm(x-y, inf);
fprintf('n=%d, time taken using inv(A)=%6.2f, accuracy=%10.2e\n', ...
n, e, acc);

The output generated on a laptop which was new in 2015 is given below with the time
taken being in seconds.

n=8000, time taken to solve= 6.29, accuracy= 6.49e-11


n=8000, time taken using inv(A)= 17.23, accuracy= 5.97e-10

If you try this then you will get different numbers as the random matrix is likely to be
different on each attempt and your computer speed is likely to be different. You should
however observe that the approach involving computing the inverse matrix is slightly less
accurate as well as taking about 3 times longer. The storage of the matrix with n = 8000
involves 64 × 10^6 entries with each entry requiring 8 bytes and thus it involves 512 × 10^6
bytes of memory. If we double n to 16000 then the storage increases by a factor of 4
to 2048 × 10^6 bytes and the time taken increases by a factor of about 8 and this took 47
and 143 seconds respectively on the same computer. Problems of this size are approaching
the limits of what can be done with such equipment in a reasonable time.
For some final comments about the inverse matrix recall that it was one of the terms
in the definition of the matrix condition number, i.e.

\[
\kappa(A) = \|A\|\,\|A^{-1}\|.
\]

(The matrix condition number was introduced in section 1.7.) The size of this number is
relevant to how accurately we can expect to solve a linear system and to partly explain
why this is the case consider the following linear systems.

\[
Ax = b, \qquad A(x + \Delta x) = b + \Delta b.
\]
When \(\|\Delta b\|\) is very small the two systems have the same matrix and nearly the same right hand side and we would expect the solutions to be close and by subtracting the equations we immediately get that
\[
A\,\Delta x = \Delta b
\]
and hence
\[
\Delta x = A^{-1}\Delta b \quad\text{and}\quad \|\Delta x\| \le \|A^{-1}\|\,\|\Delta b\|.
\]
To obtain the size of \(\|\Delta x\|\) relative to the size of \(\|x\|\) observe that as \(b = Ax\) it follows that
\[
\|b\| = \|Ax\| \le \|A\|\,\|x\|, \quad\text{and this implies that}\quad \frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}.
\]
Combining the last two results gives
\[
\frac{\|\Delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|} = \kappa(A)\,\frac{\|\Delta b\|}{\|b\|}.
\]


This shows that a small relative change in the right hand side vector b can lead to a much larger relative change in the solution when κ(A) is large. A similar conclusion is reached if there is also a small change to the matrix, although the details are much longer and are not given here.
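As a small illustrative sketch of this effect (the Hilbert matrix used here is just a convenient standard example of an ill-conditioned matrix, not something from the notes above), we can observe the loss of accuracy in Matlab.

% Sketch: effect of a large condition number on the accuracy of a solve.
% hilb(n) is a standard ill-conditioned example matrix.
n=12;
A=hilb(n);
y=ones(n, 1);             % known exact solution
b=A*y;
x=A\b;                    % computed solution
fprintf('cond(A)        = %10.2e\n', cond(A));
fprintf('relative error = %10.2e\n', norm(x-y)/norm(y));

With κ(A) of order 10^16 or more, essentially all the digits of accuracy can be lost even though the solve itself is done stably.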

2.7 Summary and some comments

If you have grasped the material in this chapter then it should have extended your knowl-
edge of the problem of solving Ax = b when n is large so that hand calculations are
no longer feasible and the computer has to be used. Some key theoretical points are as
follows.

1. Linear systems with triangular matrices can be solved using backward substitution
in the case of an upper triangular matrix and forward substitution in the case of a
lower triangular matrix.

2. We can solve LUx = b by solving Ly = b followed by solving Ux = y. Hence if


A = LU then we can solve Ax = b quickly. If the factorization is P A = LU then we similarly consider LUx = P b.

3. A Gauss transformation matrix is of the form Mk = I −mk eTk with the first k entries
of mk being 0. These matrices have the properties that

Mk−1 = I + mk eTk , M1−1 M2−1 · · · Mr−1 = I + m1 eT1 + · · · + mr eTr

with all the matrices being unit lower triangular.

4. Basic Gauss elimination involves no re-arrangement of the rows and when the
algorithm runs to completion it is equivalent to a factorization A = LU with
L = I + m1 eT1 + · · · + mn−1 eTn−1 being unit lower triangular and with U being
upper triangular. When the factorization is possible we also have the factorization
of all the principal sub-matrices. For an n × n matrix it takes O(2n^3/3) operations to compute the factorization, with an operation meaning a multiplication or an addition/subtraction.

5. Partial pivoting involves swapping rows to give a more stable procedure and it
corresponds to a factorization of the form P A = LU where P is a permutation
matrix.

6. Instead of getting A−1 and using this to solve a linear system we do the reverse and
solve linear systems if we need to compute A−1 as the jth column of A−1 satisfies
Axj = ej .

If more time was available then other direct methods involving factorising the matrix A
could be considered. As one example, there is complete pivoting which involves possibly
re-ordering both the rows of the matrix and the columns of the matrix and mathematically
this corresponds to a factorization of the form

P AQ = LU


where now both P and Q are permutation matrices. This is more stable than the partial
pivoting described earlier but the extra stability nearly doubles the time taken to solve a
linear system and it is rarely necessary.
It is also possible to factorize A in a more stable way using special orthogonal matrices
leading to a factorization of the form

A = QR

where here Q is an orthogonal matrix and R is an upper triangular matrix (the term right
triangular is also used here with the factorization known as the QR factorization). Again
the stability comes at a cost as it takes about twice as long to compute this compared
with the time taken to compute the factorization P A = LU.
Much of the description has been concerned with general square matrices with no
special properties and there was a comment that for basic Gauss elimination to work we
need all leading principal sub-matrices to be non-singular. A common situation when we
have this case is when the matrix A is symmetric and is positive definite which means that
xT Ax > 0 for all x ≠ 0. The symmetric and positive definite property implies that all the
eigenvalues of the matrix are positive and in particular such a matrix is non-singular. We
can quickly deduce from this that all leading principal sub-matrices are also symmetric
and positive definite, and hence non-singular, and we can factorize in a symmetric way as

A = LLT ,

where L now denotes a general lower triangular matrix with positive diagonal entries.
This is known as a Cholesky factorization.
As a final comment, when the matrix is banded we can often factorise in a banded way
and the case of a tri-diagonal linear system will be briefly considered in a later chapter
when the finite difference method is used to approximately solve two-point boundary value
problems.


Chapter 3

The problem u′ = Au, u(0) = u0

3.1 Introduction

In this chapter we consider how to solve a first order linear system of ordinary differential
equations (ODEs) with constant coefficients which we write in a matrix-vector form as
u′ = Au, u(0) = u0 . (3.1.1)
If A = (a_{ij}) is an n × n matrix then in full the differential equation part of this is
\[
\frac{d}{dx}\begin{pmatrix} u_1(x) \\ \vdots \\ u_n(x) \end{pmatrix}
= \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} u_1(x) \\ \vdots \\ u_n(x) \end{pmatrix}.
\]
The key result of the chapter is that in the cases that we consider the solution always
exists and it can be expressed in terms of the eigenvalues and eigenvectors of A when A
has a complete set of eigenvectors.

3.2 Using eigenvalues and eigenvectors

Let v ≠ 0 be an eigenvector of A with eigenvalue λ, i.e. Av = λv. Also let


y(x) = eλx v. (3.2.1)
If we differentiate with respect to x then we get
y ′ (x) = λeλx v. (3.2.2)
If we multiply the vector in (3.2.1) by the matrix A then we get
Ay(x) = eλx Av = eλx λv (3.2.3)
where the last equality follows from the eigenvector property. Both (3.2.2) and (3.2.3) are the same and thus the vector in (3.2.1) satisfies the differential equation. As every eigenvalue/eigenvector pair gives a solution of the system of differential equations and as the
differential equation
u′ − Au = 0


is linear it follows that any linear combination of solutions is also a solution. Thus if λ1, λ2, . . . , λn denote the eigenvalues of A and v_i ≠ 0 is an eigenvector associated with eigenvalue λi then
u(x) = c1 eλ1 x v 1 + · · · + cn eλn x v n (3.2.4)
satisfies the differential equations for all values of c1 , . . . , cn . What we have not yet done
is to determine if we can also satisfy the initial condition u(0) = u0 for some choice of
c = (c_i). To make progress here note that when x = 0 we have
\[
u(0) = c_1 v_1 + \cdots + c_n v_n
= \big(v_1, \ldots, v_n\big)\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = Vc,
\qquad\text{where}\quad V = \big(v_1, \ldots, v_n\big).
\]
(The matrix V with the eigenvectors as the columns is the same matrix as appeared in
section 1.4.2.) Provided we can solve for c the equations

V c = u0 (3.2.5)

then (3.2.4) is the solution to (3.1.1). The linear system with the matrix V has a unique
solution if V is invertible and this is the case if and only if the eigenvectors v 1 , . . . , vn are
linearly independent. This condition on the eigenvectors corresponds to the case that the
matrix A is diagonalisable and a sufficient condition which guarantees this is when the
eigenvalues λ1 , . . . , λn are distinct.
We summarise the method.

1. Determine the eigenvalues and eigenvectors of A which we denote by λ1 , . . . , λn and


v1, . . . , vn.

2. Form the matrix V = (v 1 , . . . , vn ) and, if V is invertible, solve

V c = u0 .

3. If we obtained c in step 2 then the solution is given by


n
X
u(x) = ci eλi x v i .
i=1

In the above we have a representation of the exact solution although numerical methods
are needed to determine the eigenvalues and eigenvectors of A and the use of a computer
helps to solve V c = u0 . When n is as small as 2 hand calculations can be done and we
illustrate this next with some examples.
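As a minimal computational sketch of these three steps (the matrix A and the initial vector u0 below are just arbitrary illustrative choices), the procedure can be carried out in Matlab as follows.

% Sketch: solve u' = A*u, u(0)=u0 via eigenvalues and eigenvectors.
A=[0 1; 1 0];
u0=[2; 4];
[V, D]=eig(A);             % step 1: columns of V are eigenvectors, D diagonal
c=V\u0;                    % step 2: solve V*c = u0
x=1.5;                     % evaluate the solution at some point x
u=V*(c.*exp(diag(D)*x));   % step 3: u(x) = sum_i c_i e^{lambda_i x} v_i
disp(u);
disp(expm(x*A)*u0);        % independent check using the exponential matrix

The two displayed vectors should agree to rounding error, which is a useful way to check hand calculations of the type done in the examples that follow.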

3.3 Examples with n = 2

In the following examples we start with two cases which can easily also be done by
techniques that you have already met and hence you can quite quickly confirm the solution.


1. Consider the system
\[
\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix},
\qquad u(0) = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.
\]
Here the matrix is
\[
A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]
and the characteristic equation is
\[
\det(A - \lambda I) = \lambda^2 - 1 = 0.
\]
The eigenvalues are \(\lambda_1 = -1\) and \(\lambda_2 = 1\). To get the eigenvectors first consider
\[
A - \lambda_1 I = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\quad\text{and}\quad
A - \lambda_2 I = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.
\]
By inspection we can take the eigenvectors as
\[
v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
The general solution is
\[
u(x) = c_1 e^{-x}\begin{pmatrix} 1 \\ -1 \end{pmatrix} + c_2 e^{x}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
To satisfy the condition at \(x = 0\) we need to solve
\[
u(0) = Vc = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.
\]
If we add the two equations we get \(2c_2 = 6\), \(c_2 = 3\) and \(c_1 = -1\).

If we consider the answer in components then
\[
u_1(x) = -e^{-x} + 3e^{x}, \qquad u_2(x) = e^{-x} + 3e^{x}.
\]
It is put in this form as you may note that the problem itself could have been tackled in a different way by noting that \(u_1' = u_2\) so that \(u_1'' = u_2'\) and from the other differential equation \(u_2' = u_1\) and thus \(u_1'' = u_1\). We similarly get \(u_2'' = u_2\). By methods that you have done previously you should know that the solution involves \(e^{-x}\) and \(e^{x}\).

2. Consider now the system
\[
\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}' = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix},
\qquad u(0) = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.
\]
Here the matrix is
\[
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\]


and the characteristic equation is
\[
\det(A - \lambda I) = \lambda^2 + 1 = 0.
\]
The eigenvalues are the complex conjugate pair \(\lambda_1 = -i\) and \(\lambda_2 = i\). To get the eigenvectors first consider
\[
A - \lambda_1 I = \begin{pmatrix} i & -1 \\ 1 & i \end{pmatrix}
\quad\text{and}\quad
A - \lambda_2 I = \begin{pmatrix} -i & -1 \\ 1 & -i \end{pmatrix}.
\]
By inspection we can take the eigenvectors as
\[
v_1 = \begin{pmatrix} 1 \\ i \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ -i \end{pmatrix}.
\]
The general solution is
\[
u(x) = c_1 e^{-ix}\begin{pmatrix} 1 \\ i \end{pmatrix} + c_2 e^{ix}\begin{pmatrix} 1 \\ -i \end{pmatrix}.
\]
To satisfy the condition at \(x = 0\) we need to solve
\[
u(0) = Vc = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.
\]
If we multiply the first equation by \(i\) and subtract the second equation we get
\[
2ic_2 = 2i - 4, \qquad c_2 = 1 + 2i.
\]
Then
\[
c_1 = 2 - c_2 = 1 - 2i.
\]
The coefficients are a complex conjugate pair which is because the solution is real. From what has been done so far we can write the solution as
\[
u(x) = (1 - 2i)e^{-ix}\begin{pmatrix} 1 \\ i \end{pmatrix} + (1 + 2i)e^{ix}\begin{pmatrix} 1 \\ -i \end{pmatrix}.
\]
To express it in a form only involving real quantities note that
\[
e^{-ix} = \cos x - i\sin x \quad\text{and}\quad e^{ix} = \cos x + i\sin x
\]
and
\[
(1 - 2i)(\cos x - i\sin x) = (\cos x - 2\sin x) - i(2\cos x + \sin x),
\]
\[
(1 + 2i)(\cos x + i\sin x) = (\cos x - 2\sin x) + i(2\cos x + \sin x).
\]
Hence
\[
u_1(x) = 2\cos x - 4\sin x, \qquad u_2(x) = 4\cos x + 2\sin x.
\]
As in the previous example this answer could have been obtained by noting that \(u_1' = -u_2\) so that \(u_1'' = -u_2' = -u_1\) and similarly \(u_2'' = -u_2\). You should know that the solutions to these equations involve \(\cos x\) and \(\sin x\).


3. Consider now the system
\[
\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}' = \begin{pmatrix} 6 & 6 \\ -2 & -7 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix},
\qquad u(0) = \begin{pmatrix} 20 \\ -7 \end{pmatrix}.
\]
Here the matrix is
\[
A = \begin{pmatrix} 6 & 6 \\ -2 & -7 \end{pmatrix}
\]
and the characteristic equation is
\[
\det(A - \lambda I) = (6 - \lambda)(-7 - \lambda) + 12 = \lambda^2 + \lambda - 30 = (\lambda + 6)(\lambda - 5) = 0.
\]
The eigenvalues are \(\lambda_1 = -6\) and \(\lambda_2 = 5\). To get the eigenvectors first consider
\[
A - \lambda_1 I = \begin{pmatrix} 12 & 6 \\ -2 & -1 \end{pmatrix}
\quad\text{and}\quad
A - \lambda_2 I = \begin{pmatrix} 1 & 6 \\ -2 & -12 \end{pmatrix}.
\]
By inspection we can take the eigenvectors as
\[
v_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} -6 \\ 1 \end{pmatrix}.
\]
The general solution is
\[
u(x) = c_1 e^{-6x}\begin{pmatrix} 1 \\ -2 \end{pmatrix} + c_2 e^{5x}\begin{pmatrix} -6 \\ 1 \end{pmatrix}.
\]
To satisfy the condition at \(x = 0\) we need to solve
\[
u(0) = Vc = \begin{pmatrix} 1 & -6 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 20 \\ -7 \end{pmatrix}.
\]
If we use basic Gauss elimination to solve then we have
\[
\left(\begin{array}{cc|c} 1 & -6 & 20 \\ -2 & 1 & -7 \end{array}\right)
\to
\left(\begin{array}{cc|c} 1 & -6 & 20 \\ 0 & -11 & 33 \end{array}\right)
\]
giving \(c_2 = -3\) and \(c_1 = 20 + 6c_2 = 2\). Thus to summarize, the solution is
\[
u(x) = e^{-6x}\begin{pmatrix} 2 \\ -4 \end{pmatrix} + e^{5x}\begin{pmatrix} 18 \\ -3 \end{pmatrix}.
\]

As some comments on the examples, in all cases we had distinct eigenvalues which guarantees that the matrix \(V = (v_1, v_2)\) has linearly independent columns and is invertible. In one of the examples we had a complex conjugate pair of eigenvalues and the corresponding eigenvectors occurred as a complex conjugate pair. Examples with complex conjugate eigenvalues tend to take longer to do by hand calculations to put the answer in real form and it does need knowledge of what the exponential means for a general complex number \(\lambda = p + iq\) with \(p, q \in \mathbb{R}\). We can take the following as the definition of the exponential term in this case:
\[
e^{\lambda x} = e^{(p+iq)x} = e^{px} e^{iqx} = e^{px}(\cos(qx) + i\sin(qx))
\]
and hence
\[
e^{\bar{\lambda} x} = e^{(p-iq)x} = e^{px} e^{-iqx} = e^{px}(\cos(qx) - i\sin(qx)) = \overline{e^{\lambda x}}.
\]
Note that the magnitude in both cases is \(e^{px}\), i.e. it just involves the real part of the eigenvalue.


3.4 The exponential matrix

If you refer to section 1.4 in the revision chapter then note that for a diagonalisable matrix
we had that there exists a matrix V = (v1 , . . . , vn ) such that

V −1 AV = D or equivalently A = V DV −1

with D = diag {λ1 , . . . , λn } containing the eigenvalues and with v 1 , . . . , v n being the
eigenvectors. We have a similar set-up here with

\[
u(x) = c_1 e^{\lambda_1 x} v_1 + \cdots + c_n e^{\lambda_n x} v_n
= \big(v_1, \ldots, v_n\big)\begin{pmatrix} c_1 e^{\lambda_1 x} \\ \vdots \\ c_n e^{\lambda_n x} \end{pmatrix}
= \big(v_1, \ldots, v_n\big)\begin{pmatrix} e^{\lambda_1 x} & & \\ & \ddots & \\ & & e^{\lambda_n x} \end{pmatrix}\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}
= V\exp(xD)\,c
\]
where
\[
\exp(xD) = \begin{pmatrix} e^{\lambda_1 x} & & \\ & \ddots & \\ & & e^{\lambda_n x} \end{pmatrix}
\]
is the exponential of the diagonal matrix D. Now as

V c = u0 , c = V −1 u0

and we get the representation

u(x) = V exp(xD)V −1 u0 .


In the case of a diagonalisable matrix the quantity in the brackets is known as the expo-
nential matrix exp(xA), i.e.

exp(xA) = V exp(xD)V −1

and the solution of the ODEs can be expressed in the form

u(x) = exp(xA)u0 .

It needs to be appreciated that this does not actually help us to solve the problem but
it does give a neat way of describing the solution which is consistent with the scalar case
u′1 = a11 u1 .
As a final comment, the exponential matrix is implemented in Matlab as the function expm() and it also exists for square matrices which are not diagonalisable. In fact if we let B = xA then exp(B) can be expressed in other ways and in particular it has the series expansion
\[
\exp(B) = I + B + \frac{1}{2}B^2 + \frac{1}{6}B^3 + \cdots + \frac{1}{n!}B^n + \cdots
\]
which can be shown to converge for all matrices B and we have that

u(x) = exp(xA)u0

in all cases, i.e. diagonalisable or not. However when the matrix is not diagonalisable we cannot compute things so neatly as above and we require what are known as Jordan blocks; for the solution u(x) this leads to terms such as \(x e^{\lambda_k x}\), \(x^2 e^{\lambda_k x}\), etc. when \(\lambda_k\) is a repeated eigenvalue with insufficient eigenvectors. These more complicated theoretical cases will not be considered in this module although it is possible to describe the following 2 × 2 case without the details, which are very long.
Let
\[
A = \begin{pmatrix} \lambda_1 & \alpha \\ 0 & \lambda_2 \end{pmatrix}
\]
which has eigenvalues of \(\lambda_1\) and \(\lambda_2\). It is deficient only when we have \(\lambda_2 = \lambda_1\) with \(\alpha \neq 0\). To determine the form of \(\exp(xA)\) in the deficient case we first consider the case when \(\lambda_2 \neq \lambda_1\) and then take the limit of the expression as \(\lambda_2 \to \lambda_1\). Suppose then that \(\lambda_2 \neq \lambda_1\). To get the eigenvectors note that
\[
A - \lambda_1 I = \begin{pmatrix} 0 & \alpha \\ 0 & \lambda_2 - \lambda_1 \end{pmatrix}
\quad\text{and}\quad
A - \lambda_2 I = \begin{pmatrix} \lambda_1 - \lambda_2 & \alpha \\ 0 & 0 \end{pmatrix}
\]
and thus for the eigenvectors we can take
\[
v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\quad\text{and}\quad
v_2 = \begin{pmatrix} \alpha \\ \lambda_2 - \lambda_1 \end{pmatrix} \neq 0.
\]
The matrix \(V = (v_1, v_2)\) and the inverse \(V^{-1}\) are
\[
V = \begin{pmatrix} 1 & \alpha \\ 0 & \lambda_2 - \lambda_1 \end{pmatrix}
\quad\text{and}\quad
V^{-1} = \begin{pmatrix} 1 & \dfrac{-\alpha}{\lambda_2 - \lambda_1} \\[2mm] 0 & \dfrac{1}{\lambda_2 - \lambda_1} \end{pmatrix}.
\]
If \(D = \operatorname{diag}\{\lambda_1, \lambda_2\}\) then
\[
V\exp(xD) = \begin{pmatrix} e^{\lambda_1 x} & \alpha e^{\lambda_2 x} \\ 0 & (\lambda_2 - \lambda_1)e^{\lambda_2 x} \end{pmatrix},
\qquad
V\exp(xD)V^{-1} = \begin{pmatrix} e^{\lambda_1 x} & \alpha\,\dfrac{e^{\lambda_2 x} - e^{\lambda_1 x}}{\lambda_2 - \lambda_1} \\[2mm] 0 & e^{\lambda_2 x} \end{pmatrix}.
\]
Letting \(\lambda_2 \to \lambda_1\) gives
\[
A = \begin{pmatrix} \lambda_1 & \alpha \\ 0 & \lambda_1 \end{pmatrix}
\quad\text{with}\quad
\exp(xA) = \begin{pmatrix} e^{\lambda_1 x} & \alpha x e^{\lambda_1 x} \\ 0 & e^{\lambda_1 x} \end{pmatrix}.
\]
The form of \(\exp(xA)\) when \(\lambda_2 = \lambda_1\) could also have been obtained with a similar length of workings using the series after first noting that
\[
A = \begin{pmatrix} \lambda_1 & \alpha \\ 0 & \lambda_1 \end{pmatrix}
\quad\text{gives}\quad
A^n = \begin{pmatrix} \lambda_1^n & n\alpha\lambda_1^{n-1} \\ 0 & \lambda_1^n \end{pmatrix}.
\]

There is a question on the first exercise sheet similar to this to get the expression for An .
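As a small numerical sketch (the matrices and the value of x below are arbitrary illustrative choices), we can check both the diagonalisable representation and the deficient 2 × 2 formula against Matlab's expm.

% Sketch: check exp(xA) = V*exp(xD)*inv(V) for a diagonalisable matrix
% and the formula for the deficient 2x2 case.
x=0.7;
A=[6 6; -2 -7];                      % diagonalisable example from above
[V, D]=eig(A);
disp(norm(expm(x*A) - V*diag(exp(x*diag(D)))/V));   % should be tiny

lambda=2; alpha=3;                   % deficient case: repeated eigenvalue
B=[lambda alpha; 0 lambda];
E=[exp(lambda*x) alpha*x*exp(lambda*x); 0 exp(lambda*x)];
disp(norm(expm(x*B) - E));           % should also be tiny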


3.5 Higher order systems of ODEs with constant coefficients

You should already be familiar with finding the general solution of differential equations
of the form
y ′′ + b1 y ′ + b0 y = 0, b0 and b1 being constants.
To solve this the technique to use is to consider a candidate solution of the form

y(x) = emx

and substituting into the equation gives

emx (m2 + b1 m + b0 ) = 0

and this is true provided m is such that

m2 + b1 m + b0 = 0,

which is known as the auxiliary equation. If we assume that the quadratic has distinct
roots m1 and m2 then the general solution is of the form

y(x) = B1 em1 x + B2 em2 x , (3.5.1)

where B1 and B2 are constants. This can be considered as a particular case of what has
been given earlier in this chapter as we can convert this second order ODE into a system
of first order ODEs by letting
u1 = y, u2 = y ′
and then \(u_1' = y' = u_2\) and we get \(u_2' = y'' = -b_1 y' - b_0 y = -b_1 u_2 - b_0 u_1\). If we write this in matrix vector notation then we have
\[
\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}' = \begin{pmatrix} u_2 \\ -b_0 u_1 - b_1 u_2 \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ -b_0 & -b_1 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.
\]
As we have already given the general solution in (3.5.1) we can put this in vector notation as
\[
u(x) = \begin{pmatrix} y(x) \\ y'(x) \end{pmatrix}
= B_1 e^{m_1 x}\begin{pmatrix} 1 \\ m_1 \end{pmatrix} + B_2 e^{m_2 x}\begin{pmatrix} 1 \\ m_2 \end{pmatrix}.
\]
This is of the form
\[
u(x) = c_1 e^{\lambda_1 x} v_1 + c_2 e^{\lambda_2 x} v_2
\]
with \(c_1 = B_1\) and \(c_2 = B_2\) and with \((1, m_k)^T\) being an eigenvector with \(\lambda_k = m_k\) as the eigenvalue for \(k = 1, 2\). To check the eigenvector part we just need to consider the product
\[
\begin{pmatrix} 0 & 1 \\ -b_0 & -b_1 \end{pmatrix}\begin{pmatrix} 1 \\ m_k \end{pmatrix}
= \begin{pmatrix} m_k \\ -b_0 - b_1 m_k \end{pmatrix}.
\]
As \(m_k\) is a root of the quadratic we have
\[
m_k^2 + b_1 m_k + b_0 = 0, \quad\text{which re-arranges to}\quad -b_0 - b_1 m_k = m_k^2,
\]
and it follows that
\[
\begin{pmatrix} 0 & 1 \\ -b_0 & -b_1 \end{pmatrix}\begin{pmatrix} 1 \\ m_k \end{pmatrix} = m_k\begin{pmatrix} 1 \\ m_k \end{pmatrix}
\]


which confirms the eigenvalue/eigenvector property as required. Although we did not need to construct the characteristic equation for \(A\) here it does immediately follow that
\[
\det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 \\ -b_0 & -b_1 - \lambda \end{vmatrix}
= \lambda(b_1 + \lambda) + b_0 = \lambda^2 + b_1\lambda + b_0
\]

and thus the characteristic equation is the same as the auxiliary equation.
The above can be generalised to an ODE of any order of the form

y (n) + bn−1 y (n−1) + · · · + b1 y ′ + b0 y = 0

where b0 , b1 , . . . , bn−1 are constants. We define u1 , u2 , . . . , un as follows.

u1 = y,
u2 = y ′ = u′1 ,
u3 = y ′′ = u′2 ,
··· ···
un = y (n−1) = u′n−1

and note that from the differential equation
\[
u_n' = y^{(n)} = -b_0 u_1 - b_1 u_2 - \cdots - b_{n-1} u_n.
\]
Thus we have
\[
u' = Au \quad\text{with}\quad
A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 1 \\
-b_0 & -b_1 & \cdots & \cdots & -b_{n-1}
\end{pmatrix}. \tag{3.5.2}
\]

The characteristic equation of the matrix is the auxiliary equation and is given by

λn + bn−1 λn−1 + · · · + b1 λ + b0 = 0. (3.5.3)

If you search through books and you search on the internet then you will find such a
matrix A in (3.5.2) referred to as the transpose of the companion matrix associated with
the polynomial given on the left hand side of (3.5.3). Finding the roots of a polynomial
can hence be converted to a problem of finding the eigenvalues of such a matrix and this
is what is often done in computer packages to determine all the roots of a polynomial.
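As a brief sketch of this connection (the cubic used below is an arbitrary illustrative choice with known roots 1, 2 and 3), we can compare the eigenvalues of a matrix of the form (3.5.2) with the roots returned by Matlab's roots function.

% Sketch: roots of lambda^3 + b2*lambda^2 + b1*lambda + b0 = 0 via the
% eigenvalues of the matrix in (3.5.2).
b0=-6; b1=11; b2=-6;                 % lambda^3-6*lambda^2+11*lambda-6, roots 1,2,3
A=[0 1 0; 0 0 1; -b0 -b1 -b2];
disp(sort(eig(A)));                  % eigenvalues of the matrix
disp(sort(roots([1 b2 b1 b0])));     % roots computed by Matlab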
As a final point here about converting one high order ODE into a system of first order
ODEs, the initial condition u(0) is a condition involving y(0), y ′ (0), . . . , y (n−1) (0), i.e. it
involves the function and derivatives at x = 0. This is an example of an initial value
problem. This term will be seen again when numerical techniques are considered and in
the next chapter the case of the extra conditions being at more than one point will also
be considered when we consider the two point boundary value problem.


3.6 Summary
1. The problem \(u'(x) = Au(x)\), \(u(0) = u_0\), where \(A = (a_{ij})\) is an \(n \times n\) constant matrix, can be solved by a procedure which involves finding the eigenvalues and eigenvectors of the matrix \(A\) when \(A\) is a diagonalisable matrix. With the eigenvalues and eigenvectors found the general solution is given by
\[
u(x) = \sum_{i=1}^{n} c_i e^{\lambda_i x} v_i, \tag{3.6.1}
\]

where Av i = λi v i , i = 1, . . . , n. To obtain the particular solution satisfying u(0) =


u0 we must solve
V c = u0 , where V = (v 1 , . . . , v n ).

2. With a real matrix A the eigenvalues and eigenvectors may be complex but they
occur in complex conjugate pairs and the solution u(x) is real when u0 is real. To
deal with the complex case note that

exp((p + iq)x) = epx eiqx = epx (cos(qx) + i sin(qx)),


exp((p − iq)x) = epx e−iqx = epx (cos(qx) − i sin(qx))
= exp((p + iq)x).

3. The behaviour of the general solution in (3.6.1) as x → ∞ depends on the eigenvalues


λ1 , . . . , λn . If the real part of λi is negative then eλi x → 0 as x → ∞ and if this is
the case for all the eigenvalues then u(x) → 0 as x → ∞ for all values of c1 , . . . , cn .

4. With
A = V DV −1 , D = diag {λ1 , . . . , λn }
and
exp(xA) = V exp(xD)V −1
the solution can be represented in the form

u(x) = exp(xA)u0 .

This does not help us solve the problem but it is neat way of expressing the solution
and it helps to understand the structure of the problem.
In all cases, diagonalisable or not,

\[
\exp(xA) = I + xA + \frac{x^2}{2!}A^2 + \frac{x^3}{3!}A^3 + \cdots + \frac{x^m}{m!}A^m + \cdots.
\]

5. A higher order ODE can be converted into a system of first order ODEs by letting
u1 = y, u2 = y ′ , u3 = y ′′, etc. so that u′1 = u2 , u′2 = u3 etc..


Chapter 4

The finite difference method for the


2-point boundary value problem

4.1 Introduction

In the previous chapter linear ordinary differential equations (ODEs) of the form
u′ = Au, u(0) = u0
were considered, with \(A\) being an \(n \times n\) matrix of constants, and it was shown that the solution could be expressed in the form
\[
u(x) = \exp(xA)u_0,
\]
where in the case of a diagonalisable matrix
\[
\exp(xA) = V\exp(xD)V^{-1}, \qquad V = (v_1, \ldots, v_n), \qquad \exp(xD) = \operatorname{diag}\{e^{\lambda_1 x}, \ldots, e^{\lambda_n x}\}.
\]
In the above \(v_i \neq 0\) is an eigenvector of \(A\) with eigenvalue \(\lambda_i\). It was also shown that a


linear higher order ODE with constant coefficients can also be converted into a system of
first order ODEs and we get a similar closed form expression for the solution.
Closed form expressions for the solution of differential equations are actually rare and
usually numerical methods are the only way to approximately solve a given problem. In
this chapter we consider the finite difference technique which can be used in the case of
the 2-point boundary value problem and comment at the end of the chapter on numerical
methods for the initial value problem for more general problems than were described in
the previous chapter.
The specific problem for most of this chapter is the following: Find u(x), a ≤ x ≤ b,
such that
u′′ (x) = p(x)u′ (x) + q(x)u(x) + r(x), a < x < b, (4.1.1)
u(a) = g1 , u(b) = g2 , (4.1.2)
where p(x), q(x) and r(x) are suitable functions defined in [a, b] and q(x) ≥ 0 on [a, b].
Suitable functions in this context means that they have sufficiently many bounded deriva-
tives on [a, b]. The condition q(x) ≥ 0 is a standard sufficient condition to ensure that a


solution does exist. The solution to (4.1.1)-(4.1.2) depends on the functions p(x), q(x),
r(x) and on the boundary values g1 and g2 and it cannot in general be expressed in closed
form.

4.2 Finite difference approximations

The method considered in the chapter to approximate the solution to (4.1.1)-(4.1.2) is the
finite difference method and we start here by considering how to approximate derivatives
using differences involving function values and the key mathematical tool to understand
this material is Taylor expansions.
For this section let u(x) denote any sufficiently differentiable function. We will consider
Taylor expansions about a point and for this it is convenient here to introduce the points
that we will consider later by introducing a uniform mesh of our interval [a, b] involving
N + 1 equally spaced points with spacing h = (b − a)/N with the points being given by
xi = a + ih, i = 0, 1, . . . , N. (4.2.1)
To shorten the expressions we let ui = u(xi ), u′i = u′ (xi ), u′′i = u′′ (xi ), etc.. With these
abbreviations we have the following Taylor expansions about the point xi :
\[
u_{i+1} = u(x_i + h) = u_i + hu_i' + \frac{h^2}{2!}u_i'' + \frac{h^3}{3!}u_i''' + \frac{h^4}{4!}u_i'''' + \cdots \tag{4.2.2}
\]
\[
u_{i-1} = u(x_i - h) = u_i - hu_i' + \frac{h^2}{2!}u_i'' - \frac{h^3}{3!}u_i''' + \frac{h^4}{4!}u_i'''' + \cdots \tag{4.2.3}
\]
Here we have related \(u_{i+1} = u(x_{i+1})\) and \(u_{i-1} = u(x_{i-1})\) to \(u\) and its derivatives at the nearby point \(x_i\). We can combine these in different ways as follows. If we add (4.2.2) and (4.2.3) then all the terms with odd order derivatives cancel and if we subtract (4.2.3) from (4.2.2) then all the terms with even order derivatives cancel. In the adding case we have
\[
u_{i+1} + u_{i-1} = 2u_i + h^2 u_i'' + \frac{h^4}{12}u_i'''' + \cdots
\]
and in the subtracting case we have
\[
u_{i+1} - u_{i-1} = 2hu_i' + \frac{h^3}{3}u_i''' + \cdots.
\]
By rearranging these we get expressions for \(u_i''\) and \(u_i'\) given by
\[
u_i'' = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} - \frac{h^2}{12}u_i'''' + \cdots \tag{4.2.4}
\]
and
\[
u_i' = \frac{u_{i+1} - u_{i-1}}{2h} - \frac{h^2}{6}u_i''' + \cdots. \tag{4.2.5}
\]
The central difference approximation to \(u_i''\) is given by
\[
\frac{u_{i+1} - 2u_i + u_{i-1}}{h^2}
\]

and the central difference approximation to \(u_i'\) is given by
\[
\frac{u_{i+1} - u_{i-1}}{2h}
\]
and the error in both of these is written as \(O(h^2)\). This order notation is used frequently in this context when the actual expression involved is of the form \(ch^2\), where \(c\) does not involve \(h\), and we are not too interested in the details of the term \(c\).
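As a quick numerical sketch of these error orders (the test function u(x) = e^x sin(x) and the point x = 1 are arbitrary choices, not taken from the notes), we can check that halving h reduces the central difference errors by about a factor of 4.

% Sketch: observed O(h^2) behaviour of the central difference formulas.
u  =@(x) exp(x).*sin(x);
u1 =@(x) exp(x).*(sin(x)+cos(x));        % exact first derivative
u2 =@(x) 2*exp(x).*cos(x);               % exact second derivative
xi=1;
for h=[0.1 0.05 0.025]
  d1=(u(xi+h)-u(xi-h))/(2*h);
  d2=(u(xi+h)-2*u(xi)+u(xi-h))/h^2;
  fprintf('h=%6.3f  err u''=%9.2e  err u''''=%9.2e\n', ...
          h, abs(d1-u1(xi)), abs(d2-u2(xi)));
end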
If you are someone who likes a bit more analysis then we can do all of the above a bit
more rigorously by using Taylor’s series with a remainder term at each step and we repeat
this next and start with the assumption that u(x) is 4-times continuously differentiable.
Instead of (4.2.2) and (4.2.3) we have

\[
u_{i+1} = u(x_i + h) = u_i + hu_i' + \frac{h^2}{2!}u_i'' + \frac{h^3}{3!}u_i''' + \frac{h^4}{4!}u''''(\xi_1) \tag{4.2.6}
\]
\[
u_{i-1} = u(x_i - h) = u_i - hu_i' + \frac{h^2}{2!}u_i'' - \frac{h^3}{3!}u_i''' + \frac{h^4}{4!}u''''(\xi_2) \tag{4.2.7}
\]

where ξ1 ∈ (xi , xi+1 ) and ξ2 ∈ (xi−1 , xi ). Then

\[
u_i'' = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} - \frac{h^2}{24}\big(u''''(\xi_1) + u''''(\xi_2)\big).
\]
The error term as given is a bit awkward but it can be tidied up by using the intermediate value theorem by considering the continuous function
\[
f(t) = u''''(t) - \frac{1}{2}\big(u''''(\xi_1) + u''''(\xi_2)\big).
\]
Observe that
\[
f(\xi_1) = \frac{1}{2}\big(u''''(\xi_1) - u''''(\xi_2)\big) = -f(\xi_2).
\]
The continuous function hence changes sign between \(\xi_2\) and \(\xi_1\) and thus there exists
\[
\xi \in (\xi_2, \xi_1) \subset (x_{i-1}, x_{i+1})
\]
with \(f(\xi) = 0\), i.e.
\[
u''''(\xi) = \frac{1}{2}\big(u''''(\xi_1) + u''''(\xi_2)\big),
\]
and we can write
\[
u_i'' = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} - \frac{h^2}{12}u''''(\xi).
\]
Similar reasoning leads to the existence of \(\alpha \in (x_{i-1}, x_{i+1})\) such that
\[
u_i' = \frac{u_{i+1} - u_{i-1}}{2h} - \frac{h^2}{6}u'''(\alpha).
\]


4.3 The FDM for the two-point BVP

In the following Ui will denote a finite difference approximation to ui = u(xi ), i.e. we use
upper case letters for the approximation and lower case letters for the exact solution to
the differential equation. For the uniform mesh a = x0 < x1 < · · · < xN = b described
in (4.2.1) we have N + 1 such values corresponding to i = 0, 1, . . . , N. To motivate how
these are going to be defined consider replacing the derivatives in (4.1.1) by difference
expressions for the interior mesh points corresponding to 1 ≤ i ≤ N − 1. With pi = p(xi ),
\(q_i = q(x_i)\) and \(r_i = r(x_i)\) we have
\[
\frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} = p_i\left(\frac{u_{i+1} - u_{i-1}}{2h}\right) + q_i u_i + r_i + O(h^2). \tag{4.3.1}
\]
The exact values satisfy this equation but it involves a term \(O(h^2)\) which we do not know precisely. In a finite difference scheme we omit the \(O(h^2)\) term and our main requirement on the terms \(U_0, U_1, \ldots, U_N\) is that they satisfy
\[
\frac{U_{i+1} - 2U_i + U_{i-1}}{h^2} = p_i\left(\frac{U_{i+1} - U_{i-1}}{2h}\right) + q_i U_i + r_i, \qquad i = 1, 2, \ldots, N-1. \tag{4.3.2}
\]

This gives N − 1 equations with N + 1 unknowns. For the two additional equations we
note (4.1.2) and impose the conditions that

U0 = g1 and UN = g2 ,

i.e. the approximate solution exactly satisfies the boundary conditions at x = a and at
x = b. Thus with U0 and UN known (4.3.2) give N − 1 equations which we need to solve
to get U1 , . . . , UN −1 and we consider this below. First however we note some terminology
associated with approximating (4.3.1) by (4.3.2). The quantity
\[
L_i = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} - \left(p_i\left(\frac{u_{i+1} - u_{i-1}}{2h}\right) + q_i u_i + r_i\right) = O(h^2)
\]
is often referred to as the local truncation error and it indicates how closely the exact
values satisfy the difference equations.
We now re-arrange (4.3.2) into the linear equations that we solve to get the approximations \(U_1, \ldots, U_{N-1}\). If we multiply (4.3.2) by \(h^2\) and collect together all the parts involving \(U_{i-1}\), \(U_i\) and \(U_{i+1}\) on the left hand side then we first get
\[
U_{i-1} - 2U_i + U_{i+1} = \frac{hp_i}{2}\big(U_{i+1} - U_{i-1}\big) + q_i h^2 U_i + h^2 r_i
\]
and finally we have
\[
\left(-1 - \frac{hp_i}{2}\right)U_{i-1} + \left(2 + h^2 q_i\right)U_i + \left(-1 + \frac{hp_i}{2}\right)U_{i+1} = -h^2 r_i, \qquad i = 1, 2, \ldots, N-1.
\]
When \(i = 1\) the equation involves \(U_0 = g_1\) and putting this on the right hand side gives
\[
\left(2 + h^2 q_1\right)U_1 + \left(-1 + \frac{hp_1}{2}\right)U_2 = -h^2 r_1 + \left(1 + \frac{hp_1}{2}\right)g_1.
\]


When \(i = N-1\) the equation involves \(U_N = g_2\) and putting this on the right hand side gives
\[
\left(-1 - \frac{hp_{N-1}}{2}\right)U_{N-2} + \left(2 + h^2 q_{N-1}\right)U_{N-1} = -h^2 r_{N-1} + \left(1 - \frac{hp_{N-1}}{2}\right)g_2.
\]
The equations when \(i = 1\) and \(i = N-1\) only involve 2 unknowns whilst the ones for \(i = 2, \ldots, N-2\) involve 3 unknowns. We have a tri-diagonal linear system for \(U = (U_1, \ldots, U_{N-1})^T\) of the form
\[
AU = c \tag{4.3.3}
\]
where the matrix is given by
\[
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & \cdots & 0 \\
a_{21} & \ddots & \ddots & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & \ddots & a_{N-2,N-1} \\
0 & \cdots & 0 & a_{N-1,N-2} & a_{N-1,N-1}
\end{pmatrix},
\qquad
a_{i,i-1} = -1 - \frac{hp_i}{2}, \quad a_{ii} = 2 + h^2 q_i, \quad a_{i,i+1} = -1 + \frac{hp_i}{2},
\tag{4.3.4}
\]
and the right hand side vector \(c = (c_i)\) is such that
\[
c_1 = -h^2 r_1 + \left(1 + \frac{hp_1}{2}\right)g_1, \tag{4.3.5}
\]
\[
c_i = -h^2 r_i, \qquad 2 \le i \le N-2, \tag{4.3.6}
\]
\[
c_{N-1} = -h^2 r_{N-1} + \left(1 - \frac{hp_{N-1}}{2}\right)g_2. \tag{4.3.7}
\]

One particular case of the above worth noting is when the function \(p(x) = 0\) so that in the numerical scheme \(p_1 = p_2 = \cdots = p_{N-1} = 0\). The matrix \(A\) simplifies to
\[
A = \begin{pmatrix}
2 + h^2 q_1 & -1 & & & \\
-1 & 2 + h^2 q_2 & -1 & & \\
& \ddots & \ddots & \ddots & \\
& & -1 & 2 + h^2 q_{N-2} & -1 \\
& & & -1 & 2 + h^2 q_{N-1}
\end{pmatrix}. \tag{4.3.8}
\]
Recall that it was stated that \(q(x) \ge 0\) is a sufficient condition to guarantee that the two-point boundary value problem has a solution; it can be shown that this condition also guarantees that the matrix above is non-singular and satisfies what is known as a positive definite property. We will not pursue these aspects here other than to note that there is a solution to the linear system determining \(U\).


4.3.1 Computational resources

We can store the tri-diagonal matrix A in an efficient way by just storing the sub-diagonal,
diagonal and super-diagonal entries and we can solve the system efficiently by Gauss
elimination by noting that when we eliminate below the diagonal in each column there
is only one entry to eliminate at each step. We show next how this can be done and we
then indicate what is already available in Matlab.

A tri-diagonal solver involving O(m) storage and operations

It is customary, when we have tri-diagonal matrices, to adopt a one-subscript notation (\(a_i\), \(d_i\) and \(b_i\)) for the entries instead of the double subscript notation \(a_{ij}\). In the following \(m\) is the size of the system (thus \(m = N-1\) when we relate this to the two-point boundary value problem). Thus we will write the system as

d1 U1 + b2 U2 = c1
ai−1 Ui−1 + di Ui + bi+1 Ui+1 = ci , 2 ≤ i ≤ m − 1,
am−1 Um−1 + dm Um = cm

or as
\[
\begin{pmatrix}
d_1 & b_2 & & & \\
a_1 & d_2 & b_3 & & \\
& a_2 & d_3 & \ddots & \\
& & \ddots & \ddots & b_m \\
& & & a_{m-1} & d_m
\end{pmatrix}
\begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_{m-1} \\ U_m \end{pmatrix}
=
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_{m-1} \\ c_m \end{pmatrix}.
\]

The first step in Gauss elimination involves eliminating below the diagonal in column 1
and thus in this case we need only subtract a multiple of equation 1 from equation 2 to
produce the modified equation

d′2 U2 + b3 U3 = c′2 , where l1 = a1 /d1 , d′2 = d2 − l1 b2 , c′2 = c2 − l1 c1 .

One step of Gauss elimination (to create \(d_2'\) and \(c_2'\) in this case) hence just involves 1 division, 2 multiplications and 2 subtractions. Continuing this process to eliminate below
the diagonal in column 2, then in column 3, then in · · · and finally in column m − 1
similarly gives

l2 = a2 /d′2 , d′3 = d3 − l2 b3 , c′3 = c3 − l2 c2 ,


l3 = a3 /d′3 , d′4 = d4 − l3 b4 , c′4 = c4 − l3 c3 .
··· ··· ···

With d′1 = d1 and c′1 = c1 we hence have

lk = ak /d′k , d′k+1 = dk+1 − lk bk+1 , c′k+1 = ck+1 − lk c′k ,


k = 1, 2, . . . , m − 1 .


In matrix form we have created the upper triangular system
\[
\begin{pmatrix}
d_1' & b_2 & & & \\
& d_2' & b_3 & & \\
& & d_3' & \ddots & \\
& & & \ddots & b_m \\
& & & & d_m'
\end{pmatrix}
\begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_{m-1} \\ U_m \end{pmatrix}
=
\begin{pmatrix} c_1' \\ c_2' \\ \vdots \\ c_{m-1}' \\ c_m' \end{pmatrix}
\]

which can then be solved by back substitution. We simply have

Um = c′m /d′m
Ui = (c′i − bi+1 Ui+1 )/d′i for i = m − 1, m − 2, . . . , 1.

Hence to summarize this as an algorithm we have the following.

d′1 = d1 , c′1 = c1 .
For k = 1, 2, . . . , m − 1.
lk = ak /d′k .
d′k+1 = dk+1 − lk bk+1 .
c′k+1 = ck+1 − lk c′k .
End For loop
Um = c′m /d′m .
For i = m − 1, . . . , 1.
Ui = (c′i − bi+1 Ui+1 )/d′i
End For loop

The point to note in the above is that the storage is O(m) and the number of operations
to obtain the solution is also O(m) in contrast to a general system which has respectively
O(m2 ) and O(m3 ).
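The algorithm above translates almost line-by-line into Matlab. The following is a minimal sketch (the function name tridiag_solve is my own choice and would be saved as tridiag_solve.m; no pivoting is done, so it assumes the elimination never produces a zero d′k).

function U=tridiag_solve(a, d, b, c)
% Minimal sketch of the tri-diagonal solver described above (no pivoting).
% a(1:m-1): sub-diagonal, d(1:m): diagonal, b(2:m): super-diagonal (b(1) unused),
% c(1:m): right hand side.
m=length(d);
dp=d; cp=c;                 % d' and c' in the notation of the notes
for k=1:m-1
  l=a(k)/dp(k);             % multiplier l_k
  dp(k+1)=dp(k+1)-l*b(k+1);
  cp(k+1)=cp(k+1)-l*cp(k);
end
U=zeros(m, 1);
U(m)=cp(m)/dp(m);           % back substitution
for i=m-1:-1:1
  U(i)=(cp(i)-b(i+1)*U(i+1))/dp(i);
end
end

With a, d and b stored as columns in the way described for spdiags below (the first m-1 entries of a below the diagonal and the last m-1 entries of b above), a call such as U=tridiag_solve(a, d, b, c) should produce the same result as A\c.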

Using spdiags in Matlab to have O(m) storage and O(m) operations

Whichever programming language is used there are not too many statements needed to
implement the algorithm in the previous subsection. Matlab has a number of functions
which deal with sparse matrices and in particular the function spdiags is available to
quickly set up banded matrices. If, as in the last subsection, m is the number of unknowns
(we write m in the statements) and in Matlab a, d and b are column vectors of length m
then we can set-up the matrix in sparse form by the statement

A=spdiags([a, d, b], -1:1, m, m);

Here -1:1 means [-1, 0, 1] and the statement indicates that the first m-1 entries of a
are on the band below the diagonal, d contains the m diagonal entries and the last m-1


entries of b are on the band above the diagonal. The part [a, d, b] is a m × 3 matrix
and note that the last entry of a and the first entry of b are not used. With A sparsely
stored and with c being a column vector of length m the Matlab statement

UU=A\c;

efficiently solves the equations with an algorithm which involves O(m) operations.

A less efficient implementation in Matlab using diag

Matlab also has a builtin command called diag which can be used to create diagonal ma-
trices or more generally a matrix with one non-zero band parallel with the main diagonal.
Thus if a, d and b are as in the last subsection then the tri-diagonal matrix can be created
with the command

A=diag(a(1:m-1), -1) + diag(d) + diag(b(2:m), 1);

which involves the sum of 3 matrices. When diag is used with 2 arguments the 2nd
argument indicates which band to use and when the band is the sub-diagonal or super-
diagonal it is necessary to give exactly m-1 entries.
The difference between the set-up with diag and the set-up using spdiags is that in
this case A is stored as a full matrix whereas in the previous case only the non-zero entries
are stored. Hence this version is wasteful in resources when m is large. The time taken
to solve the linear system is also much greater although it is not as long as you might
initially think from what was given in chapter 2 about solving linear equations. When
you use \ in a statement such as UU=A\c Matlab first does some checks on the matrix in
order to select what it considers to be the best technique to use. In this case it detects
that the matrix is tri-diagonal and it uses an efficient technique but this is after O(m2 )
checks. Thus, to summarize, this approach is inefficient in storage and in the number of
operations compared with the approach using spdiags as they are both O(m2 ) compared
with O(m) in the sparse version. This may not matter too much if you only wish to solve
one system and the speed of the computer is such that it is 0.01 seconds with the sparse
version and 1 second with the full version. However if this or similar linear systems are
to be solved a 1000 times in an application or if m needs to be much larger then the
difference becomes significant.
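A minimal sketch of this comparison (the value of m and the band entries below are arbitrary illustrative choices) is the following; the sparse solve should be noticeably faster while both give the same answer.

% Sketch: compare sparse (spdiags) and full (diag) storage for the same
% tri-diagonal system.
m=5000;
a=-ones(m, 1); d=4*ones(m, 1); b=-ones(m, 1);
c=rand(m, 1);
As=spdiags([a, d, b], -1:1, m, m);
tic; x1=As\c; t1=toc;
Af=diag(a(1:m-1), -1)+diag(d)+diag(b(2:m), 1);
tic; x2=Af\c; t2=toc;
fprintf('sparse: %8.4f s,  full: %8.4f s,  max diff %e\n', ...
        t1, t2, max(abs(x1-x2)));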

4.3.2 Experiments indicating the convergence of the method

In the derivation of the finite difference scheme there was equation (4.3.1) that the exact
values satisfied followed by the system (4.3.2) which defined the finite difference scheme
used in the remaining parts of this chapter. From this it is possible to get equations for
the errors ui − Ui and from this it can be shown that the method does converge as the
number of intervals N → ∞. It is usual to describe the convergence in terms of the mesh
parameter
b−a
h= > 0.
N


What can be shown is that as h → 0 the errors behave like

|ui − Ui | = O(h2 )

provided all the conditions mentioned earlier are satisfied and in particular that the exact
solution u(x) is 4-times continuously differentiable. If N is sufficiently large then when we
double N the error on average decreases to about 1/4 of what it was before and similarly
if we increase N by a factor of 10 then the error is about 1/100 of what it was, i.e. we gain
about two more decimal digits of accuracy. We can check these claims on a manufactured
problem with a known exact solution as follows.
We take the interval \([a, b] = [0, \pi]\), we take \(p(x) = q(x) = 0\) and we take \(r(x)\) so that the exact solution is
\[
u(x) = \exp(0.1x)\cos(3x).
\]
In this simplified case the equation is just
\[
u'' = r, \quad\text{with}\quad r = \exp(0.1x)\big((0.01 - 9)\cos(3x) - 0.6\sin(3x)\big)
\]
and the boundary conditions are \(u(0) = g_1 = 1\) and \(u(\pi) = g_2 = -\exp(0.1\pi)\). A complete short Matlab script which implements the finite difference scheme for \(N = 8, 16, \ldots, 2^{14} = 16384\) and compares the approximate solution with the known exact solution follows.


% Manufactured problem u''=r, 0<x<pi


% with known solution uex
uex =@(x) exp(0.1*x).*cos(3*x);
r =@(x) exp(0.1*x).*( (0.01-9)*cos(3*x)-0.6*sin(3*x) );

% set the interval [a, b] and boundary values of U


a=0;
b=pi;
g1=uex(a);
g2=uex(b);

fprintf('%5s %12s %10s\n', 'N', 'error', 'ratio');


er=zeros(12, 1);

% loop to get the solution for larger and larger N


for k=1:12
% set N, h and the uniform mesh points as x
N=2^(2+k);
x=linspace(a, b, N+1);
x=x(:);
xx=x(2:N);
h=(b-a)/N;

% set-up the tri-diagonal matrix A


N1=N-1;
o=ones(N1, 1);
A=spdiags([-o, 2*o, -o], -1:1, N1, N1);

% set-up the rhs vector c


c=-h*h*r(xx);
c(1)=c(1)+g1;
c(N1)=c(N1)+g2;

% get the fd solution U and save the error


U=[g1; zeros(N1, 1); g2];
U(2:N)=A\c;
er(k)=norm(U-uex(x), inf);

% show the error and the error ratio


if k>1
rat=er(k-1)/er(k);
fprintf('%5d %12.3e %10.7f\n', N, er(k), rat);
else
fprintf('%5d %12.3e\n', N, er(k));
end
end


The output generated is as follows.

N error ratio
8 2.152e-01
16 5.473e-02 3.9314256
32 1.350e-02 4.0528885
64 3.376e-03 4.0002883
128 8.435e-04 4.0024833
256 2.109e-04 3.9994355
512 5.272e-05 4.0002049
1024 1.318e-05 4.0000512
2048 3.295e-06 3.9999897
4096 8.238e-07 3.9999987
8192 2.059e-07 4.0000128
16384 5.147e-08 4.0009746

The output clearly shows that the ∞-norm of the exact error does indeed decrease
like O(h2 ) as the ratios of the errors are close to 4.

4.4 Remarks about other boundary conditions

In some applications we have ODEs of the form (4.1.1) with a boundary condition at one or more of the end points a and b involving a derivative of u, e.g. u′(a) = g3 or u′(b) = g3.
In full this type of problem is either of the form

u′′ (x) = p(x)u′ (x) + q(x)u(x) + r(x), a < x < b,


u′ (a) = g3 , u(b) = g2 ,

or

u′′ (x) = p(x)u′ (x) + q(x)u(x) + r(x), a < x < b,


u(a) = g1 , u′(b) = g3 .

This complicates things a little but it is still possible to adapt the method and retain the O(h2) accuracy provided we have a difference approximation for the derivative boundary condition of sufficient accuracy, which we can obtain as follows.
Consider again Taylor series expansions about xN = b for u(xN −1 ) = u(b − h) and now
also for u(xN −2 ) = u(b − 2h).

\[
u_{N-1} = u_N - hu_N' + \frac{h^2}{2}u_N'' - \frac{h^3}{6}u_N''' + \cdots
\]
\[
u_{N-2} = u_N - 2hu_N' + \frac{4h^2}{2}u_N'' - \frac{8h^3}{6}u_N''' + \cdots.
\]
We want to combine these to eliminate the \(u_N''\) term and hence we take
\[
4u_{N-1} - u_{N-2} = 3u_N - 2hu_N' + \frac{4h^3}{6}u_N''' + \cdots
\]
and this rearranges to give
\[
u_N' = \frac{u_{N-2} - 4u_{N-1} + 3u_N}{2h} + \frac{h^2}{3}u_N''' + \cdots.
\]
This motivates that to approximate the boundary condition \(u'(b) = g_3\) we take
\[
\frac{U_{N-2} - 4U_{N-1} + 3U_N}{2h} = g_3.
\]
To use this we need to add it to the equations to be solved as now UN is also an unknown
and if the other boundary condition is still U0 = g1 then we have a linear system for
U1 , . . . , UN although we do not consider the details further here in these notes.

4.5 Summary
1. An exact solution of an ODE in closed form is not usually possible and numerical techniques are needed to approximate the solution.

2. Using Taylor's series gives the following
\[
u_{i+1} = u(x_i + h) = u_i + hu_i' + \frac{h^2}{2!}u_i'' + \frac{h^3}{3!}u_i''' + \frac{h^4}{4!}u_i'''' + \cdots
\]
\[
u_{i-1} = u(x_i - h) = u_i - hu_i' + \frac{h^2}{2!}u_i'' - \frac{h^3}{3!}u_i''' + \frac{h^4}{4!}u_i'''' + \cdots.
\]
The central difference approximations for \(u_i'\) and \(u_i''\) that are used in the finite difference scheme are
\[
\frac{U_{i+1} - U_{i-1}}{2h} \quad\text{and}\quad \frac{U_{i+1} - 2U_i + U_{i-1}}{h^2}.
\]
Both of these have an accuracy which is written as \(O(h^2)\).

3. The finite difference scheme described in this chapter for the two-point boundary
value problem gives a tri-diagonal linear system to solve and the storage of the
matrix and the amount of computation are both O(N) when we have N + 1 mesh
points in [a, b]. The error in the basic scheme is O(h2 ) where h = (b − a)/N is the
mesh spacing in a uniform mesh. We can implement this efficiently in Matlab using
spdiags when setting up the matrix so that sparse storage mode is used.

4. We can also deal with derivative boundary conditions but we need to use a sufficient
number of points to retain the O(h2 ) accuracy. In the case of approximating u′ (b) =
g3 this involves
\[
\frac{U_{N-2} - 4U_{N-1} + 3U_N}{2h} = g_3.
\]
This was derived by considering the Taylor expansions of uN −1 = u(b − h) and
uN −2 = u(b − 2h) about the point b.


4.6 Remarks about initial value problems

If more time was available then numerical methods for the initial value problem

\[
u' = f(t, u(t)), \quad u(t_0) = u_0
\qquad\text{or}\qquad
\mathbf{u}' = \mathbf{f}(t, \mathbf{u}(t)), \quad \mathbf{u}(t_0) = \mathbf{u}_0 \tag{4.6.1}
\]
would be given. Here we are now using \(t\) for the variable as in applications this denotes time. In (4.6.1) the first case is the scalar case and the other case is the vector case. With suitable properties on \(f\) or \(\mathbf{f}\) it can be proved that a solution exists in a vicinity of \(t = t_0\) but it may not always exist for all \(t\). The chapter about \(u' = Au\) was a special case of the vector case corresponding to \(\mathbf{f}(t, \mathbf{u}(t)) = A\mathbf{u}(t)\) with \(A\) being a matrix of constants.
As this section just contains comments on methods in this context we restrict these
comments to the scalar case. Firstly, as an attempt to indicate the complexity that can
arise consider what is involved to get expressions for the time derivatives of u(t). For
example, we might wish to consider the Taylor expansion about t0 as we know u(t0 ) = u0 .
The first derivative is given by the differential equation, i.e.

u′ (t) = f (t, u(t)).

To get the second derivative with respect to \(t\) we need to use the chain rule and we get
\[
u''(t) = \frac{\partial f}{\partial t}(t, u(t)) + \frac{\partial f}{\partial u}(t, u(t))\,u'(t)
= \frac{\partial f}{\partial t}(t, u(t)) + \frac{\partial f}{\partial u}(t, u(t))\,f(t, u(t)).
\]
It is getting a bit cumbersome to show the evaluation point \((t, u(t))\) for each term and thus we shorten to just putting
\[
u'' = \frac{\partial f}{\partial t} + f\frac{\partial f}{\partial u}. \tag{4.6.2}
\]
To get the next derivative we need to now use both the chain rule and the product rule and this gives
\[
u''' = \frac{\partial^2 f}{\partial t^2} + \frac{\partial^2 f}{\partial u\,\partial t}u'
+ \left(\frac{\partial f}{\partial t} + \frac{\partial f}{\partial u}u'\right)\frac{\partial f}{\partial u}
+ f\left(\frac{\partial^2 f}{\partial t\,\partial u} + \frac{\partial^2 f}{\partial u^2}u'\right)
= \frac{\partial^2 f}{\partial t^2} + 2f\frac{\partial^2 f}{\partial u\,\partial t}
+ \left(\frac{\partial f}{\partial t} + f\frac{\partial f}{\partial u}\right)\frac{\partial f}{\partial u}
+ f^2\frac{\partial^2 f}{\partial u^2} \tag{4.6.3}
\]
where the last line is obtained by replacing \(u'\) by \(f\) and collecting together some of the terms. As this illustrates each new derivative gets progressively more complicated and we stop at this point.
There is a class of methods which are known as Taylor series methods which needs
the expressions for derivatives of u but they are not used in practice as they require that
partial derivatives of f are determined whereas other methods of comparable accuracy
only need that we can evaluate f (t, v) for any values of t and v. A particularly well known class of such methods are the Runge Kutta methods, which we just state.
In the following let h be small and let ti = t0 + ih so that t0 < t1 < t2 < · · · <
tn < tn+1 · · · denote equally spaced discrete times. In all the methods we start by taking
U0 = u0 .


Euler’s method involves defining U1 , U2 , . . . by


Un+1 = Un + hf (tn , Un ), n = 0, 1, 2, . . . . (4.6.4)
When we start we make an error of magnitude O(h2 ) when we obtain U1 and the termi-
nology is that this is the local truncation error. If we require N steps to reach a final time
tf = t0 + Nh then the accumulation of errors of magnitude O(h2 ) at each step is that the
error in the approximation UN ≈ un = u(tf ) has an error of order O(h) as
Nh2 = (tf − t0 )h.
To describe the operations in the form of an algorithm or pseudo code can be done as
follows.
For n = 0, 1, 2, . . .
k1 = f (tn , Un ),
Un+1 = Un + hk1 .
End For loop

One of the Runge Kutta methods of order 2, known as Heun's method, involves the following.
For n = 0, 1, 2, . . .
k1 = f (tn , Un ),
k2 = f (tn + h, Un + hk1 ),
Un+1 = Un + (h/2)(k1 + k2 ).
End For loop

This involves 2 function evaluations to get the approximation Un+1 once Un is already
known. If we require N steps to reach a final time tf = t0 + Nh then the accumulation
of the errors UN ≈ un = u(tf ) has an error of order O(h2 ).
The most well known of the Runge Kutta methods is the Runge Kutta method of
order 4 and this involves the following.
For n = 0, 1, 2, . . .
k1 = f (tn , Un ),
k2 = f (tn + h/2, Un + (h/2)k1),
k3 = f (tn + h/2, Un + (h/2)k2),
k4 = f (tn + h, Un + hk3 ),
Un+1 = Un + (h/6)(k1 + 2k2 + 2k3 + k4 ).
End For loop

This involves 4 function evaluations to get the approximation Un+1 once Un is already
known. If we require N steps to reach a final time tf = t0 + Nh then the accumulation
of the errors UN ≈ un = u(tf ) has an error of order O(h4 ). This method was developed
over 100 years ago and it is still used today.
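As a minimal sketch of the order 4 method (the test problem u′ = −u with u(0) = 1, whose exact solution is e^{−t}, and the values of tf and N are arbitrary illustrative choices), the loop above can be coded as follows.

% Sketch: classical Runge Kutta method of order 4 for u'=f(t,u), u(t0)=u0.
f =@(t, u) -u;                       % test problem u'=-u, exact solution exp(-t)
t0=0; tf=2; N=20; h=(tf-t0)/N;
U=1;                                 % U_0 = u0
t=t0;
for n=1:N
  k1=f(t, U);
  k2=f(t+h/2, U+(h/2)*k1);
  k3=f(t+h/2, U+(h/2)*k2);
  k4=f(t+h, U+h*k3);
  U=U+(h/6)*(k1+2*k2+2*k3+k4);
  t=t+h;
end
fprintf('error at t=%g is %e\n', tf, abs(U-exp(-tf)));

Doubling N in this sketch should reduce the final error by roughly a factor of 16, consistent with the O(h^4) accuracy stated above.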


Chapter 5

Fourier series

5.1 Introduction and periodic functions

Fourier series is concerned with representing a periodic function in terms of appropriate


sine and cosine terms and the series are named after Joseph Fourier (1768–1830). Joseph
Fourier developed the series in his work to solve the heat equation which is a partial
differential equation which governs how the temperature in a body changes in both space
and time. As part of MA2715 there is not enough time to consider this application but we
will consider how to obtain Fourier series for several different functions, we consider some
manipulations with the series such as term-by-term integration and differentiation and we
will state sufficient conditions for the series to be the same as the function it represents.
A periodic function f (x) is one whose graph repeats itself and in terms of a formula a function has period T > 0 if

f (x + T ) = f (x), for all x ∈ R.

The smallest value of T > 0 for which this is true is sometimes called the least period or
the fundamental period although when the context is clear some texts just use the term
period to mean the least period. For example, the least period of cos θ and sin θ is 2π,
i.e.
cos(θ + 2π) = cos(θ), sin(θ + 2π) = sin(θ),
and there is no smaller number such that we have these relations. As Fourier series
involve expansions in terms of cosines and sines it is convenient in terms of simplifying
the formulas to start with 2π-periodic functions and then to generalise later to the case
when the period can be any T = 2L > 0.

5.2 The Fourier series for a 2π-periodic function f (x)

We will need some revision and preliminary results about cosines and sines before we can
justify the formula that appear but first it helps to indicate what is a Fourier series for a
“suitable” 2π-periodic function f . The term “suitable” for this module will be piecewise
smooth which here means piecewise continuous and in an interval where it is continuous


it has a derivative except possibly at a finite number of points. For example, the functions
f1 and f2 defined on (−π, π] given by
( (
1, if 0 ≤ x ≤ π, x, if 0 ≤ x ≤ π,
f1 (x) = and f2 (x) = |x| = (5.2.1)
0, if −π < x < 0, −x, if −π < x < 0,

satisfy the requirements and are shown in figures 5.1 and 5.2 on page 5-3. In the context
of Fourier series we extend these 2π periodically so that they are defined for all x ∈ R
and a sketch of these in (−3π, 3π] are shown in figures 5.3 and 5.4 on page 5-4. Note that
f1 (x) has jump discontinuities at points kπ when k is an integer and f2 (x) is continuous
on R with the first derivative having jump discontinuities at kπ when k is an integer.
The Fourier series of a 2π-periodic function f (x) is written as

\[
f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n\cos(nx) + b_n\sin(nx)\big), \tag{5.2.2}
\]
\[
\text{where}\quad a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx
\quad\text{and}\quad
b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx. \tag{5.2.3}
\]

The meaning of the symbol ∼ here is that the series is determined by f and it represents f
in some way but it is not necessarily the case that the value
\[
\frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n\cos(nx) + b_n\sin(nx)\big)
= \lim_{N\to\infty}\left(\frac{a_0}{2} + \sum_{n=1}^{N}\big(a_n\cos(nx) + b_n\sin(nx)\big)\right)
\]

is the same as f (x) for all x ∈ (−π, π]. As we will indicate later, it is the same for all x
in this interval in the case of f2 (x) and in the case of the discontinuous function f1 (x) it
is the same at all points of continuity of f1 (x) but it is not the same at the points kπ,
k ∈ Z where we have the jump discontinuity.

5.3 The orthogonality properties of cos(nx) and sin(nx) on (−π, π)

In the case of column vectors \(x = (x_i) \neq 0\) and \(y = (y_i) \neq 0\) in \(\mathbb{R}^n\) we say that they are orthogonal if
\[
x^T y = x_1 y_1 + \cdots + x_n y_n = 0.
\]
The operation \(x^T y\) is also called the dot product, scalar product or inner product of the two vectors. We also have inner products for other types of objects and in the case of functions \(f(x)\) and \(g(x)\) defined on \((-\pi, \pi)\) we say that they are orthogonal if
\[
\int_{-\pi}^{\pi} f(x)g(x)\,dx = 0.
\]

We next show that we have this property when we consider the functions

1, cos(x), sin(x), cos(2x), sin(2x), . . . , cos(nx), sin(nx), . . . . (5.3.1)


Figure 5.1: Sketch of f1 (x) on −π < x ≤ π.

Figure 5.2: Sketch of f2 (x) = |x| on −π < x ≤ π.

To help evaluate the integrals that we encounter we need to recall some of the trig.
identities involving the addition formulas which in complex form can be written as
\[
e^{i(a+b)} = e^{ia} e^{ib}.
\]
In terms of cosines and sines we have
\[
\cos(a+b) + i\sin(a+b) = (\cos(a) + i\sin(a))(\cos(b) + i\sin(b))
= (\cos(a)\cos(b) - \sin(a)\sin(b)) + i(\sin(a)\cos(b) + \cos(a)\sin(b)).
\]
If we replace \(b\) by \(-b\) then we have
\[
\cos(a-b) + i\sin(a-b) = (\cos(a)\cos(b) + \sin(a)\sin(b)) + i(\sin(a)\cos(b) - \cos(a)\sin(b)). \tag{5.3.2}
\]
If we add (5.3.2) to the previous identity, take real and imaginary parts and divide by 2 then we get
\[
\cos(a)\cos(b) = \frac{\cos(a+b) + \cos(a-b)}{2}, \tag{5.3.3}
\]
\[
\sin(a)\cos(b) = \frac{\sin(a+b) + \sin(a-b)}{2}. \tag{5.3.4}
\]
Similarly if we subtract (5.3.2) from the previous identity and divide by 2 then we get
\[
\sin(a)\sin(b) = \frac{\cos(a-b) - \cos(a+b)}{2}, \tag{5.3.5}
\]
\[
\cos(a)\sin(b) = \frac{\sin(a+b) - \sin(a-b)}{2}. \tag{5.3.6}
\]

Figure 5.3: Sketch of the 2π periodic function f1 (x) on −3π < x ≤ 3π.

Figure 5.4: Sketch of the 2π periodic function f2 (x) on −3π < x ≤ 3π.

Now for the evaluation of the integrals we have that when \(p\) is an integer and \(p \neq 0\) we immediately obtain
\[
\int_{-\pi}^{\pi}\cos(px)\,dx = 0, \qquad \int_{-\pi}^{\pi}\sin(px)\,dx = 0,
\]
and when \(p = 0\) we trivially have \(\cos(px) = 1\) and
\[
\int_{-\pi}^{\pi} dx = 2\pi.
\]
We first consider taking \(f(x)\) and \(g(x)\) to be two different functions from the set of functions listed in (5.3.1). If \(f(x) = 1\) and \(g(x) = \cos(nx)\) (with \(n > 0\)) or \(g(x) = \sin(nx)\) then we have
\[
\int_{-\pi}^{\pi}\cos(nx)\,dx = \int_{-\pi}^{\pi}\sin(nx)\,dx = 0.
\]
If \(n > 0\), \(m > 0\) and \(n \neq m\) then \(n+m\) and \(n-m\) are both non-zero integers and we have
\[
\int_{-\pi}^{\pi}\cos(nx)\cos(mx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi}\cos((n+m)x) + \cos((n-m)x)\,dx = 0,
\]
\[
\int_{-\pi}^{\pi}\sin(nx)\sin(mx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi}\cos((n-m)x) - \cos((n+m)x)\,dx = 0.
\]
If \(n\) and \(m\) are now any integers then as \(n+m\) and \(n-m\) are both integers we have
\[
\int_{-\pi}^{\pi}\sin(nx)\cos(mx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi}\sin((n+m)x) + \sin((n-m)x)\,dx = 0,
\]
\[
\int_{-\pi}^{\pi}\cos(nx)\sin(mx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi}\sin((n+m)x) - \sin((n-m)x)\,dx = 0.
\]
The inner product of one function with any other function in the list (5.3.1) is hence zero. If we take \(f(x) = g(x)\) then when \(f(x) = g(x) = 1\) we have
\[
\int_{-\pi}^{\pi} dx = 2\pi \tag{5.3.7}
\]


and if we take f(x) = g(x) = cos(nx) or we take f(x) = g(x) = sin(nx) then we have

    \int_{-\pi}^{\pi} \cos^2(nx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi} \left( 1 + \cos(2nx) \right) dx = \pi,    (5.3.8)
    \int_{-\pi}^{\pi} \sin^2(nx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi} \left( 1 - \cos(2nx) \right) dx = \pi.    (5.3.9)
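
These orthogonality relations can also be checked numerically. The following Matlab lines are a minimal sketch of such a check using the built-in function integral; the particular choice n = 3 and m = 5 is just for illustration.

n = 3;
m = 5;
I1 = integral(@(x) cos(n*x).*cos(m*x), -pi, pi);   % should be close to 0
I2 = integral(@(x) sin(n*x).*sin(m*x), -pi, pi);   % should be close to 0
I3 = integral(@(x) sin(n*x).*cos(m*x), -pi, pi);   % should be close to 0
I4 = integral(@(x) cos(n*x).^2, -pi, pi);          % should be close to pi
I5 = integral(@(x) sin(n*x).^2, -pi, pi);          % should be close to pi
disp([I1, I2, I3, I4, I5]);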

5.4 The formula for the Fourier coefficients an and bn

We consider next justifying why an and bn are as given in (5.2.3) and we do this by
supposing that we have suitable numbers a0, a1, . . . , an, . . . and b0, b1, . . . , bn, . . . so that
the series converges to define a function f(x) given by

    f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(nx) + b_n \sin(nx) \right)    (5.4.1)

and we suppose that we can integrate this function. Firstly,

    \int_{-\pi}^{\pi} f(x)\,dx = \int_{-\pi}^{\pi} \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(nx) + b_n \sin(nx) \right) dx
                               = \frac{a_0}{2}\int_{-\pi}^{\pi} dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left( a_n \cos(nx) + b_n \sin(nx) \right) dx
                               = a_0 \pi

assuming that it is valid to interchange the summation and the integral. If we instead
multiply by cos(mx), m ≥ 1, and integrate then we get

    \int_{-\pi}^{\pi} f(x)\cos(mx)\,dx = \int_{-\pi}^{\pi} \left( \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(nx) + b_n \sin(nx) \right) \right) \cos(mx)\,dx
                                       = \frac{a_0}{2}\int_{-\pi}^{\pi} \cos(mx)\,dx
                                         + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left( a_n \cos(nx)\cos(mx) + b_n \sin(nx)\cos(mx) \right) dx
                                       = a_m \pi

by using the orthogonality properties given earlier. Similarly, if we instead multiply by
sin(mx) and integrate then we get

    \int_{-\pi}^{\pi} f(x)\sin(mx)\,dx = \int_{-\pi}^{\pi} \left( \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(nx) + b_n \sin(nx) \right) \right) \sin(mx)\,dx
                                       = \frac{a_0}{2}\int_{-\pi}^{\pi} \sin(mx)\,dx
                                         + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left( a_n \cos(nx)\sin(mx) + b_n \sin(nx)\sin(mx) \right) dx
                                       = b_m \pi


where again we use the orthogonality properties given earlier. Hence we have shown that
when we have (5.4.1) the coefficients are

    a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx \quad \text{and} \quad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx, \qquad n = 0, 1, 2, \ldots

We can include b0 here since trivially b0 = 0 (the integrand f(x) sin(0x) is 0), and note that
having a0/2 rather than a0 in the series means that the same expression applies to all the an terms.
It is worth making some additional comments about the formula for the coefficients.
Firstly, when a function is 2π-periodic we can use any interval of length 2π in the formula,
i.e. we have

    a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = \frac{1}{\pi}\int_{0}^{2\pi} f(x)\cos(nx)\,dx = \frac{1}{\pi}\int_{\alpha}^{\alpha+2\pi} f(x)\cos(nx)\,dx

and

    b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx = \frac{1}{\pi}\int_{0}^{2\pi} f(x)\sin(nx)\,dx = \frac{1}{\pi}\int_{\alpha}^{\alpha+2\pi} f(x)\sin(nx)\,dx

for all α ∈ R. In these notes we will use (−π, π], which is convenient when odd and even
functions are considered as it has 0 in the middle of the interval. Secondly, if we
write an(f) and bn(f) to indicate more explicitly the dependence of the coefficients on the
function then these relations are linear, i.e. if we have functions f(x) and g(x) then the
Fourier coefficients of the function αf(x) + βg(x) are given by

    a_n(\alpha f + \beta g) = \alpha a_n(f) + \beta a_n(g),
    b_n(\alpha f + \beta g) = \alpha b_n(f) + \beta b_n(g).

This is used in some examples, and the coefficient formulas can also be evaluated numerically
as in the sketch below.
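
The formula for the coefficients can be evaluated numerically, which gives a useful check on hand calculations. The following lines are a minimal sketch using integral with the illustrative choices f(x) = |x| and n = 3; the exact values quoted in the comments are those obtained for f2(x) = |x| in the next section.

f = @(x) abs(x);                                       % the function f2(x) = |x| considered later
n = 3;
an = (1/pi)*integral(@(x) f(x).*cos(n*x), -pi, pi);    % exact value is -4/(9*pi)
bn = (1/pi)*integral(@(x) f(x).*sin(n*x), -pi, pi);    % exact value is 0
fprintf('a_%d = %10.6f, b_%d = %10.6f\n', n, an, n, bn);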


5.5 Examples of the coefficients an and bn for various functions

In the following we determine the Fourier series for several different functions and illustrate
graphically some of the partial sums

    f_N(x) = \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos(nx) + b_n \sin(nx) \right)

to see how they compare with the function f(x) from which they are derived.

1. The Heaviside function is defined by H : R → R,

    H(x) = \begin{cases} 1, & \text{if } x \ge 0, \\ 0, & \text{if } x < 0. \end{cases}

   The function f1(x) = H(x) corresponds with this in (−π, π] which we then continue
   in a 2π-periodic way. The Fourier coefficient a0 of f1 is

    a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f_1(x)\,dx = \frac{1}{\pi}\int_{0}^{\pi} dx = 1.

   For n ≥ 1,

    a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f_1(x)\cos(nx)\,dx = \frac{1}{\pi}\int_{0}^{\pi} \cos(nx)\,dx = \frac{1}{\pi}\left[\frac{\sin(nx)}{n}\right]_0^{\pi} = 0,

    b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f_1(x)\sin(nx)\,dx = \frac{1}{\pi}\int_{0}^{\pi} \sin(nx)\,dx
        = \frac{1}{\pi}\left[\frac{-\cos(nx)}{n}\right]_0^{\pi} = \frac{-\cos(n\pi) + 1}{n\pi}
        = \begin{cases} \dfrac{2}{n\pi}, & \text{if } n \text{ is odd}, \\ 0, & \text{if } n \text{ is even}. \end{cases}

   The Fourier series is hence

    \frac{1}{2} + \frac{2}{\pi}\left( \sin(x) + \frac{\sin(3x)}{3} + \cdots + \frac{\sin((2n-1)x)}{2n-1} + \cdots \right).

   There are some mathematical questions to ask concerning whether or not this series
   converges and, if it does converge, does it converge to f1(x)? Although we do not
   provide any rigorous proofs, the series does converge for all values of x where f1(x) is
   continuous, but if we let x = 0 or x = π, which is where f1(x) is not continuous, then
   all the sine terms are zero and all the partial sums are 1/2, and thus in particular
   the series converges to 1/2, which is not the value of f1(0) = f1(π) = 1. However it does
   converge to f1(x) for all other x in the interval, i.e. it converges in (−π, 0) and in
   (0, π). We can illustrate this with the following Matlab statements which compute
   and display the partial sums defined by

    g_{2m-1}(x) = \frac{1}{2} + \frac{2}{\pi} \sum_{k=1}^{m} \frac{\sin((2k-1)x)}{2k-1}

   for m = 2, 8, 32, 128. The plots created are shown in Figure 5.5.

x=linspace(-pi, pi, 501);

mm=[2, 8, 32, 128];
g=zeros( size(x) );
for k=1:max(mm)
  g=g+sin( (2*k-1)*x )/(2*k-1);

  % create a plot if k is one of the values of mm
  if sum(k==mm)==1
    figure(k)
    y=0.5+(2/pi)*g;
    plot(x, y, 'LineWidth', 2);
    s=sprintf('Fourier series for f1 with terms up to sin(%dx)', 2*k-1);
    title(s, 'FontSize', 14);
  end
end

2. We now consider the function f2(x) = |x| which is even on (−π, π). As f2(x) is even
   and sin(nx) is odd it follows that

    b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f_2(x)\sin(nx)\,dx = 0, \qquad n = 1, 2, \ldots

   For a0 we have

    a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f_2(x)\,dx = \frac{2}{\pi}\int_{0}^{\pi} x\,dx = \frac{2}{\pi}\,\frac{\pi^2}{2} = \pi.

   For n ≥ 1 we use integration by parts to evaluate the integrals and we have

    a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f_2(x)\cos(nx)\,dx = \frac{2}{\pi}\int_{0}^{\pi} x\cos(nx)\,dx
        = \frac{2}{\pi}\left( \left[\frac{x\sin(nx)}{n}\right]_0^{\pi} - \int_0^{\pi} \frac{\sin(nx)}{n}\,dx \right).

   For the term in the square brackets the expression is 0 at both limits. Thus

    a_n = \frac{2}{n\pi}\int_0^{\pi} (-\sin(nx))\,dx = \frac{2}{n\pi}\left[\frac{\cos(nx)}{n}\right]_0^{\pi}
        = \frac{2}{n^2\pi}\left((-1)^n - 1\right)
        = \begin{cases} 0, & \text{if } n \text{ is even}, \\ \dfrac{-4}{n^2\pi}, & \text{if } n \text{ is odd}. \end{cases}


Figure 5.5: Plots of the partial sums of the Fourier series for f1(x) (terms up to sin(3x), sin(15x), sin(63x) and sin(255x)).

   In this case the Fourier series does converge to f2(x) for all x ∈ [−π, π] and we can
   write

    f_2(x) = |x| = \frac{\pi}{2} - \frac{4}{\pi}\left( \cos(x) + \frac{\cos(3x)}{3^2} + \cdots + \frac{\cos((2n-1)x)}{(2n-1)^2} + \cdots \right), \qquad |x| \le \pi.

   We can illustrate this with the following Matlab statements which compute and
   display the partial sums defined by

    g_{2m-1}(x) = \frac{\pi}{2} - \frac{4}{\pi}\left( \cos(x) + \frac{\cos(3x)}{3^2} + \cdots + \frac{\cos((2m-1)x)}{(2m-1)^2} \right)

   for m = 4, 16. The plots created are shown in Figure 5.6.


x=linspace(-pi, pi, 501);

mm=[4, 16];
g=zeros( size(x) );
for k=1:max(mm)
  g=g+cos( (2*k-1)*x )/((2*k-1)^2);

  % create plot if k is one of the values of mm
  if sum(k==mm)==1
    figure(k)
    y=0.5*pi-(4/pi)*g;
    plot(x, y, 'LineWidth', 2);
    axis equal
    s=sprintf('Fourier series for f2 with terms up to cos(%dx)', 2*k-1);
    title(s, 'FontSize', 14);
  end
end

Figure 5.6: Plots of the partial sums of the Fourier series for f2(x) (terms up to cos(7x) and cos(31x)).

   As a final point, as the Fourier series converges to f2(x) for all x ∈ [−π, π] we can
   write

    |x| = \frac{\pi}{2} - \frac{4}{\pi}\left( \cos(x) + \frac{\cos(3x)}{3^2} + \cdots + \frac{\cos((2n-1)x)}{(2n-1)^2} + \cdots \right), \qquad |x| \le \pi.

   In particular if we let x = 0 then

    0 = \frac{\pi}{2} - \frac{4}{\pi}\left( 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots \right),

   which rearranges to

    \sum_{n=1}^{\infty} \frac{1}{(2n-1)^2} = \frac{\pi^2}{8}.

   A quick numerical check of this value is sketched below.
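
   A minimal Matlab sketch of checking this value numerically is the following; the number of terms kept is an arbitrary choice.

   nterms = 100000;                    % arbitrary truncation level
   n = 1:nterms;
   s = sum(1./(2*n-1).^2);
   fprintf('truncated sum = %.8f, pi^2/8 = %.8f\n', s, pi^2/8);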


3. We now consider the Fourier series of the function

    f_3(x) = x, \qquad -\pi < x \le \pi,

   which we continue 2π-periodically. This function is odd in (−π, π) and as a consequence

    a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\cos(nx)\,dx = 0, \qquad n = 0, 1, 2, \ldots

   For the bn coefficients the integrands are even which means that we only need to
   consider [0, π], and to make further progress we need to use integration by parts:

    b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin(nx)\,dx = \frac{2}{\pi}\int_{0}^{\pi} x\sin(nx)\,dx
        = \frac{2}{\pi}\left( \left[\frac{-x\cos(nx)}{n}\right]_0^{\pi} - \int_0^{\pi} \frac{-\cos(nx)}{n}\,dx \right).

   For the term in the square brackets we have

    \left[\frac{-x\cos(nx)}{n}\right]_0^{\pi} = \frac{-\pi\cos(n\pi)}{n} = \frac{(-1)^{n+1}\pi}{n}.

   The part with the integral is 0 as

    \int_0^{\pi} \cos(nx)\,dx = \left[\frac{\sin(nx)}{n}\right]_0^{\pi} = 0

   and thus

    b_n = \frac{2}{\pi}\,\frac{(-1)^{n+1}\pi}{n} = \frac{2(-1)^{n+1}}{n}.
   The Fourier series does converge to f3(x) in −π < x < π where f3(x) is continuous
   and we can write

    x = 2\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\sin(nx) = 2\left( \sin(x) - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} - \cdots \right).

   In this example if we let x = π in the Fourier series then all the sine terms are 0
   and the value of the Fourier series is 0, which is not the same as f3(π) = π or as the
   limit limx↓−π f3(x) = −π. At the point of discontinuity we get the average of the
   two values just given, as was similarly the case when f1(x) was considered. This is
   actually generally the case as we state in the next section.
   As in the previous two examples, below is a short Matlab script to create and plot
   partial sums for the series with the plots being shown in Figure 5.7.


x=linspace(-pi, pi, 501);

mm=[4, 16, 64, 256];
g=zeros( size(x) );
pm=1;
for k=1:max(mm)
  g=g+pm*sin(k*x)/k;
  pm=-pm;

  % create a plot if k is one of the values of mm
  if sum(k==mm)==1
    figure(k)
    y=2*g;
    plot(x, y, 'LineWidth', 2);
    axis equal
    s=sprintf('Fourier series for f3 with terms up to sin(%dx)', k);
    title(s, 'FontSize', 14);
  end
end

4. As a final example, if we let

    f_4(x) = \frac{f_2(x) + f_3(x)}{2} = \frac{|x| + x}{2} = \begin{cases} 0, & -\pi < x < 0, \\ x, & 0 \le x \le \pi, \end{cases}

   then we can quickly obtain the Fourier series by combining the series for f2(x) and
   f3(x), giving

    f_4(x) = \frac{\pi}{4} - \frac{2}{\pi}\left( \cos(x) + \frac{\cos(3x)}{3^2} + \cdots + \frac{\cos((2n-1)x)}{(2n-1)^2} + \cdots \right)
             + \left( \sin(x) - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} - \cdots \right)

   for −π < x < π. The series does not converge to f4(x) at the points of discontinuity
   when we consider this as a 2π-periodic function, and in the interval (−π, π] this is the
   point x = π. A sketch of building the partial sums of f4(x) by this combination is
   given below.
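
   The following Matlab lines are a minimal sketch of exploiting the linearity of the coefficients in this way: the partial sum for f4(x) is obtained by combining partial sums for f2(x) and f3(x). The truncation level N is an arbitrary choice.

   x = linspace(-pi, pi, 501);
   N = 64;                                % arbitrary number of terms
   g2 = (pi/2)*ones(size(x));             % partial sum for f2(x) = |x|
   g3 = zeros(size(x));                   % partial sum for f3(x) = x
   for n = 1:N
     if mod(n, 2) == 1
       g2 = g2 - (4/pi)*cos(n*x)/n^2;
     end
     g3 = g3 + 2*(-1)^(n+1)*sin(n*x)/n;
   end
   g4 = (g2 + g3)/2;                      % partial sum for f4 = (f2 + f3)/2
   plot(x, g4, x, (abs(x) + x)/2, '--', 'LineWidth', 2);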

5.6 The pointwise convergence of Fourier series

In the examples of section 5.5 we claimed that in the case of the function f2(x), which
is continuous, the Fourier series converges to it for all x ∈ R. In the other examples
we claimed that the series converges at the points of continuity to the functions that
they represent, and although each series also converges at the points of discontinuity we
could observe that the convergence was to the average of the left and right values.


Figure 5.7: Plots of the partial sums of the Fourier series for f3(x) = x (panels with terms up to sin(4x), sin(16x) and sin(64x)).

It would take a few lectures to prove why this is the case and such proofs will not be done
in MA2715; we just restrict ourselves to stating sufficient conditions for these results.
One commonly used set of sufficient conditions for the pointwise convergence of Fourier
series is known as the Dirichlet conditions, although these are expressed in slightly
different ways in different texts. We have for example the following.

On page 584 of the book by James et al given at the start of the MA2715 notes the
conditions on f (x) are as follows. The function f (x) is a bounded periodic function
such that in any period it has a finite number of isolated maxima and minima and
a finite number of points of discontinuity.

On page 22 of Fourier Analysis by Murray R. Spiegel the condition is that f(x) and
f′(x) are piecewise continuous. In particular this means that at every point
x ∈ (−π, π) the following left and right limits must exist:

    f(x-) = \lim_{t \uparrow x} f(t), \qquad f(x+) = \lim_{t \downarrow x} f(t),
    f'(x-) = \lim_{t \uparrow x} f'(t), \qquad f'(x+) = \lim_{t \downarrow x} f'(t).


It is worth mentioning that these are sufficient conditions on f(x) for the following, but
they are not necessary conditions, i.e. the results may still hold in some cases when they
are not satisfied. When the sufficient conditions are satisfied we can say the following.

1. The Fourier series converges to f(x) if x is a point of continuity.

2. The Fourier series converges to

    \frac{f(x+) + f(x-)}{2} = \frac{\lim_{t \downarrow x} f(t) + \lim_{t \uparrow x} f(t)}{2}

   if x is a point of discontinuity.

Thus if the 2π-periodic function is continuous on R then we have convergence at all points,
and we had this case with the even function f2(x) in the examples. In the case of the function
f1(x) we had a point of discontinuity in the interior of the interval (−π, π] as well as at the end of the
interval, but in the case of f3(x) and f4(x) the point of discontinuity was just at the end
of the interval (−π, π], i.e. the value f(−π) := f(π) = π is not the same as the value
limx↓−π f(x) = −π.

5.7 Comments about the uniform convergence and the L2 convergence of Fourier series

In the previous subsection we just stated sufficient conditions for a Fourier series to
converge in a pointwise sense and we gave the limit value. The limit value was only
the same as f (x) at a point of continuity of f (x). No proofs were given as this would
go beyond what can be done in just a few lectures as well as being a bit more difficult
and more technical than the other material presented in this chapter. The purpose of
this subsection is just to mention that the convergence of a series of functions can be
considered in other senses than pointwise convergence.

5.7.1 Uniform convergence

If you are on the mathematics degree then uniform convergence is one of your topics. In
the context of Fourier series we have uniform convergence when we have the following.
Let

    s_N(x) = \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos(nx) + b_n \sin(nx) \right), \qquad N = 1, 2, \ldots    (5.7.1)

with the Fourier coefficients given as before. We say that sN tends to f uniformly on
[−π, π] if

    \max\{|f(x) - s_N(x)| : -\pi \le x \le \pi\} \to 0 \quad \text{as } N \to \infty.

This was only the case in our examples when the 2π-periodic function f(x) is continuous.
Thus, as our examples have shown, we can have pointwise convergence but not uniform
convergence, as the sketch below illustrates numerically.
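
The difference between the two cases can be seen numerically by estimating the maximum error on a fine grid. The following lines are a minimal sketch of this; they assume a recent enough Matlab version so that implicit array expansion is available, and the grid and the values of N are arbitrary choices.

x = linspace(-pi+1e-6, pi-1e-6, 20001);
f1 = double(x >= 0);                   % Heaviside on (-pi, pi), discontinuous
f2 = abs(x);                           % continuous
for N = [15, 63, 255]
  k = (1:2:N)';                        % only odd terms appear in both series
  s1 = 0.5 + (2/pi)*sum(sin(k*x)./k, 1);
  s2 = pi/2 - (4/pi)*sum(cos(k*x)./k.^2, 1);
  fprintf('N = %3d: max error for f1 = %.4f, max error for f2 = %.6f\n', ...
          N, max(abs(f1-s1)), max(abs(f2-s2)));
end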


5.7.2 L2 convergence

With the conditions on the type of functions being considered, these are more than
sufficient for us to have what is known as L2 convergence. With sN being as defined in (5.7.1)
this means that

    \int_{-\pi}^{\pi} (f(x) - s_N(x))^2\,dx \to 0 \quad \text{as } N \to \infty.    (5.7.2)

There are some relations connected with (5.7.2) which quite quickly follow. Firstly,
from the orthogonality properties of functions in the set

    V_N = \{1, \cos(x), \sin(x), \cos(2x), \sin(2x), \ldots, \cos(Nx), \sin(Nx)\}

we have

    \int_{-\pi}^{\pi} s_N(x)^2\,dx = \frac{a_0^2}{4}\int_{-\pi}^{\pi} dx + \sum_{n=1}^{N} \int_{-\pi}^{\pi} \left( a_n^2\cos^2(nx) + b_n^2\sin^2(nx) \right) dx
                                  = \pi\left( \frac{a_0^2}{2} + \sum_{n=1}^{N} (a_n^2 + b_n^2) \right).    (5.7.3)

Now if we consider again how we justified the expressions for the Fourier coefficients
we have for any function φ ∈ V_N that

    \int_{-\pi}^{\pi} (f(x) - s_N(x))\varphi(x)\,dx = 0.

As sN is a linear combination of functions in VN it follows that

    \int_{-\pi}^{\pi} (f(x) - s_N(x))s_N(x)\,dx = 0 \quad \text{which gives} \quad \int_{-\pi}^{\pi} f(x)s_N(x)\,dx = \int_{-\pi}^{\pi} s_N(x)^2\,dx.

Using both these results gives

    \int_{-\pi}^{\pi} (f(x) - s_N(x))^2\,dx = \int_{-\pi}^{\pi} (f(x) - s_N(x))f(x)\,dx = \int_{-\pi}^{\pi} \left( f(x)^2 - s_N(x)^2 \right) dx.

That is

    \int_{-\pi}^{\pi} f(x)^2\,dx = \int_{-\pi}^{\pi} s_N(x)^2\,dx + \int_{-\pi}^{\pi} (f(x) - s_N(x))^2\,dx.

Thus when we have (5.7.2) the use of (5.7.3) gives, as N → ∞,

    \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx = \frac{a_0^2}{2} + \sum_{n=1}^{\infty} (a_n^2 + b_n^2).    (5.7.4)

Equation (5.7.4) is known as Parseval’s identity.
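
As a minimal sketch, Parseval's identity can be checked numerically for f3(x) = x from the earlier examples, for which an = 0 and bn = 2(−1)^{n+1}/n; the truncation level used below is an arbitrary choice.

lhs = (1/pi)*integral(@(x) x.^2, -pi, pi);     % equals 2*pi^2/3
n = 1:100000;                                  % arbitrary truncation level
rhs = sum((2./n).^2);                          % a_0 = a_n = 0 for f3(x) = x
fprintf('lhs = %.8f, truncated rhs = %.8f\n', lhs, rhs);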


5.8 Half range series

In the material on Fourier series given so far we have considered representing a function
f (x) specified on (−π, π] by a Fourier series with the series representing the 2π-periodic
extension of f (x). If we keep with the period being 2π then we can instead start with
functions just given on (0, π) which we extend to (−π, π) as an odd function or we can
extend to (−π, π) as an even function and consider the series that we get in each case. The
Fourier series for the odd extension will just involve sine terms and the Fourier series for
the even extension will just involve cosine terms. The series created in this way are known
as the half range Fourier sine series or the half range Fourier cosine series. As the odd
extension generates an odd function on (−π, π) it follows that the integrand f (x) sin(nx)
in the formula for the coefficient is an even function. Similarly, as the even extension
generates an even function on (−π, π) the integrand f (x) cos(nx) is an even function. In
both cases the formula for the coefficients in the half range series just involve integrals on
(0, π) and we have the following.

The half range cosine series for f(x) defined on (0, π) is

    f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos(nx), \qquad a_n = \frac{2}{\pi}\int_0^{\pi} f(x)\cos(nx)\,dx.    (5.8.1)

The half range sine series for f(x) defined on (0, π) is

    f(x) \sim \sum_{n=1}^{\infty} b_n\sin(nx), \qquad b_n = \frac{2}{\pi}\int_0^{\pi} f(x)\sin(nx)\,dx.    (5.8.2)

The examples given already fit, or nearly fit, this category.
In the case of f1(x) involving the Heaviside function, if we instead take

    \tilde f_1(x) = 2f_1(x) - 1 = \begin{cases} 1, & \text{if } 0 \le x \le \pi, \\ -1, & \text{if } -\pi < x < 0, \end{cases}    (5.8.3)

then we have an odd function (at most points) and the series only involves sine terms, i.e.

    \tilde f_1(x) \sim \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\sin((2n-1)x)}{2n-1}.    (5.8.4)

We can use the equal sign here if x ∈ (0, π).


In the case of f2(x) = |x| the series is already the half range cosine series and in the
case of f3(x) = x the series is already the half range sine series. The functions f2(x) and
f3(x) are the same in (0, π) and as a consequence we have two representations of x in
[0, π), i.e. for 0 ≤ x < π,

    x = \frac{\pi}{2} - \frac{4}{\pi}\left( \cos(x) + \frac{\cos(3x)}{3^2} + \cdots + \frac{\cos((2n-1)x)}{(2n-1)^2} + \cdots \right)
      = 2\left( \sin(x) - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} + \cdots + (-1)^{n+1}\frac{\sin(nx)}{n} + \cdots \right).

A short numerical comparison of these two representations is sketched below.
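
The following Matlab lines are a minimal sketch comparing the two representations on (0, π); the grid and the truncation level N are arbitrary choices, and the cosine series can be seen to converge faster since its coefficients decay like 1/n^2 rather than 1/n.

x = linspace(0.01, pi-0.01, 400);
N = 99;                                % arbitrary truncation level
cosSum = (pi/2)*ones(size(x));         % half range cosine series for x
sinSum = zeros(size(x));               % half range sine series for x
for n = 1:N
  if mod(n, 2) == 1
    cosSum = cosSum - (4/pi)*cos(n*x)/n^2;
  end
  sinSum = sinSum + 2*(-1)^(n+1)*sin(n*x)/n;
end
fprintf('max|x - cosine sum| = %.2e, max|x - sine sum| = %.2e\n', ...
        max(abs(x - cosSum)), max(abs(x - sinSum)));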


5.9 Differentiation and integration of Fourier series

We have considered Fourier series for functions whose 2π-periodic extensions are dis-
continuous at certain points and for the example which was continuous everywhere the
derivative had discontinuities at certain points. We now consider when it is valid to inte-
grate or differentiate a Fourier series of a function to obtain the Fourier series of a related
function and although rigorous proofs are not given we attempt to partially justify the re-
sults. When the term-by-term operations are valid this will give one method of generating
one Fourier series from another.
In summary, as all the functions that we consider are piecewise continuous it is valid
to integrate term-by-term and the new function is “smoother” although care is needed
related to the a0 /2 term when this is non-zero. It is only valid to differentiate term-by-
term when the 2π periodic extension is continuous.

5.9.1 Integrating a Fourier series

With the summary already given stating that we can integrate term-by-term, we consider
some examples here.
The function f2(x) considered earlier is such that f2(x) = |x| when −π < x < π and
the derivative is the piecewise continuous function

    \tilde f_2'(x) = \begin{cases} 1, & \text{if } x > 0, \\ -1, & \text{if } x < 0. \end{cases}

The point x = 0 is a point of discontinuity of f̃2′. This function coincides with (5.8.3) given
earlier (at least at all points except x = 0) and we already have the Fourier series (5.8.4),
which we repeat again here. For x ∈ (−π, 0) ∪ (0, π)

    \tilde f_1(x) = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\sin((2n-1)x)}{2n-1}.

If we integrate 1 then we get x and if we integrate −1 then we get −x and thus the
function f2(x) can be written as

    f_2(x) = \int_0^x \tilde f_1(t)\,dt = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{2n-1}\int_0^x \sin((2n-1)t)\,dt
           = \frac{4}{\pi}\sum_{n=1}^{\infty} \left[\frac{-\cos((2n-1)t)}{(2n-1)^2}\right]_0^x
           = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{(2n-1)^2}\left(1 - \cos((2n-1)x)\right)    (5.9.1)
           = \frac{a_0}{2} - \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\cos((2n-1)x)}{(2n-1)^2}    (5.9.2)

if we define

    \frac{a_0}{2} = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{(2n-1)^2}.


Hence we get the series for f2(x) from the series for f̃1(x) by integrating term-by-term
although the constant term appears differently. To complete the details you still need to
consider

    \frac{1}{\pi}\int_{-\pi}^{\pi} f_2(x)\,dx = \frac{2}{\pi}\int_0^{\pi} x\,dx = \pi

to establish that a0 = π and

    \frac{\pi}{2} = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{(2n-1)^2}.

If we consider the integration of a Fourier series more generally then at points of
continuity we have

    f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n\cos(nt) + b_n\sin(nt) \right)

and we can shift the constant term to the left hand side to give

    f(t) - \frac{a_0}{2} = \sum_{n=1}^{\infty} \left( a_n\cos(nt) + b_n\sin(nt) \right)

and integrating from 0 to x gives

    \int_0^x f(t)\,dt - \frac{a_0 x}{2} = \sum_{n=1}^{\infty} \left( \frac{a_n}{n}\sin(nx) - \frac{b_n}{n}(\cos(nx) - 1) \right).
As an example, consider again the case f(x) = |x| in (−π, π), which is even so that the Fourier
series only involves the cosine terms. Recall that we had

    |t| = \frac{\pi}{2} - \frac{4}{\pi}\left( \cos(t) + \frac{\cos(3t)}{3^2} + \cdots + \frac{\cos((2n-1)t)}{(2n-1)^2} + \cdots \right)

so that

    |t| - \frac{\pi}{2} = -\frac{4}{\pi}\left( \cos(t) + \frac{\cos(3t)}{3^2} + \cdots + \frac{\cos((2n-1)t)}{(2n-1)^2} + \cdots \right).

Now |t| = t for t > 0 and |t| = −t for t < 0 and if we define g(x) by

    g(x) = \int_0^x |t|\,dt = \begin{cases} x^2/2, & x \ge 0, \\ -x^2/2, & x \le 0, \end{cases}

then term-by-term integration gives

    \phi(x) = g(x) - \frac{\pi x}{2} = -\frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\sin((2n-1)x)}{(2n-1)^3}.

The piecewise defined function φ(x) is continuously differentiable and to directly check
this and determine the values note that

    \phi(x) = \begin{cases} \tfrac{1}{2}(x^2 - \pi x) = \tfrac{1}{2}x(x - \pi), & \text{when } 0 \le x \le \pi, \\ \tfrac{1}{2}(-x^2 - \pi x) = -\tfrac{1}{2}x(x + \pi), & \text{when } -\pi \le x \le 0. \end{cases}

The function is 0 at −π, 0 and π where the pieces join, the left and right derivatives
agree at the join points, and we have

    \phi'(0) = -\frac{\pi}{2} \quad \text{and} \quad \phi'(-\pi) = \phi'(\pi) = \frac{\pi}{2}.

On [−π, π] the function φ(x) is piecewise defined with the two quadratic pieces giving a
continuously differentiable function when we consider the 2π periodic extension. A quick
numerical comparison of φ(x) with the sine series above is sketched below.
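
The following Matlab lines are a minimal sketch comparing the quadratic formula for φ(x) with a partial sum of the sine series above; the grid and the truncation level are arbitrary choices, and the 1/(2n − 1)^3 decay of the coefficients means that few terms are needed.

x = linspace(-pi, pi, 801);
phi = 0.5*sign(x).*x.^2 - 0.5*pi*x;    % the two quadratic pieces of phi(x)
N = 50;                                % arbitrary truncation level
s = zeros(size(x));
for n = 1:N
  s = s - (4/pi)*sin((2*n-1)*x)/(2*n-1)^3;
end
fprintf('max |phi(x) - partial sum| = %.2e\n', max(abs(phi - s)));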


5.9.2 Differentiating a Fourier series

We have already stated that for the piecewise smooth functions that we have considered
it is only valid to differentiate the series term-by-term if the 2π periodic extension of the
function is continuous. As we have taken (−π, π] as our interval this means that for such
a function f(x) we need to have

    f(\pi) = \lim_{x \uparrow \pi} f(x) = \lim_{x \downarrow -\pi} f(x).

As our function f(x) is continuous we have at all points in [−π, π]

    f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n\cos(nx) + b_n\sin(nx) \right).

Term-by-term differentiation gives

    \sum_{n=1}^{\infty} \left( n b_n\cos(nx) - n a_n\sin(nx) \right).

Now if we consider the function f′(x), which is piecewise defined and which we are able to
integrate, then its Fourier coefficients are as follows:

    \tilde a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\cos(nx)\,dx \quad \text{and} \quad \tilde b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\sin(nx)\,dx.

We can re-express both of these by integration by parts:

    \tilde a_n = \frac{1}{\pi}\left[f(x)\cos(nx)\right]_{-\pi}^{\pi} - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)(-n\sin(nx))\,dx = n b_n,
    \tilde b_n = \frac{1}{\pi}\left[f(x)\sin(nx)\right]_{-\pi}^{\pi} - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)(n\cos(nx))\,dx = -n a_n,

where in the first case we need to use that f(−π) = f(π). Thus the Fourier coefficients
that we obtain by term-by-term differentiation are the same as what we get when we use
the formula with the function as f′(x) and we have

    f'(x) \sim \sum_{n=1}^{\infty} \left( n b_n\cos(nx) - n a_n\sin(nx) \right).

The Fourier series for f′(x) is the same as f′(x) at points of continuity of f′(x) and it is
the average of the left and right limits at points of discontinuity of f′(x). The case
f(x) = f2(x) = |x| gives a quick check of this, as sketched below.
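
As a minimal sketch of this for f2(x) = |x|, the coefficients obtained by term-by-term differentiation, namely −n an with bn = 0, can be compared with the Fourier coefficients of f2′(x) = sign(x) computed directly from the formula; the range of n used is an arbitrary choice.

for n = 1:4
  an = (1/pi)*integral(@(x) abs(x).*cos(n*x), -pi, pi);    % a_n for |x|
  bt = (1/pi)*integral(@(x) sign(x).*sin(n*x), -pi, pi);   % b_n for sign(x)
  fprintf('n = %d: -n*a_n = %10.6f, b_n for sign(x) = %10.6f\n', n, -n*an, bt);
end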

5.10 The Fourier series for a function f(x) of period 2L

For the final part of these notes on Fourier series suppose that the least period of f (x) is
2L > 0, i.e.
f (x + 2L) = f (x), for all x ∈ R.


As we have already described the 2π periodic case we can adjust for this by letting

    t = \frac{\pi x}{L} \quad \text{or equivalently} \quad x = \frac{L t}{\pi}

so that x = ±L correspond respectively to t = ±π, and first consider the related 2π
periodic function

    g(t) = f\!\left(\frac{Lt}{\pi}\right).

Equivalently we can write

    f(x) = g\!\left(\frac{\pi x}{L}\right).

From what has already been done the Fourier series for g(t) is

    g(t) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n\cos(nt) + b_n\sin(nt) \right).

Hence the corresponding Fourier series for f(x) is

    f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n\cos\!\left(\frac{n\pi x}{L}\right) + b_n\sin\!\left(\frac{n\pi x}{L}\right) \right).

This is the form of a Fourier series when the period is 2L. The Fourier coefficients can be
expressed in terms of integrals involving f(x) by using the substitution indicated above, i.e.

    t = \frac{\pi x}{L}, \qquad \frac{dt}{dx} = \frac{\pi}{L}, \qquad g(t) = f(x).

Thus

    g(t)\cos(nt)\,\frac{dt}{dx} = f(x)\cos\!\left(\frac{n\pi x}{L}\right)\frac{\pi}{L}, \qquad g(t)\sin(nt)\,\frac{dt}{dx} = f(x)\sin\!\left(\frac{n\pi x}{L}\right)\frac{\pi}{L},

and the Fourier coefficients can be written as

    a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(t)\cos(nt)\,dt = \frac{1}{L}\int_{-L}^{L} f(x)\cos\!\left(\frac{n\pi x}{L}\right) dx,
    b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(t)\sin(nt)\,dt = \frac{1}{L}\int_{-L}^{L} f(x)\sin\!\left(\frac{n\pi x}{L}\right) dx.

Example

Let f(x) = |sin(x)|, which is periodic and continuous. As sin(x + π) = −sin(x) it follows
that f(x + π) = f(x) and the least period is 2L = π so that L = π/2. The function is
even and thus only cosine terms are involved in the Fourier series, which is of the form

    f(x) = |\sin(x)| = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos(2nx).

As the function is even we get the coefficients by just integrating over half the range, i.e.
[0, π/2]:

    a_0 = \frac{4}{\pi}\int_0^{\pi/2} \sin(x)\,dx = \frac{4}{\pi}\left[-\cos(x)\right]_0^{\pi/2} = \frac{4}{\pi}.


For n ≥ 1,

    a_n = \frac{4}{\pi}\int_0^{\pi/2} \sin(x)\cos(2nx)\,dx.

Now recall the trig. identity

    2\sin(x)\cos(2nx) = \sin((2n+1)x) - \sin((2n-1)x).

Thus

    \int_0^{\pi/2} 2\sin(x)\cos(2nx)\,dx = \int_0^{\pi/2} \left( \sin((2n+1)x) - \sin((2n-1)x) \right) dx
        = \left[\frac{-\cos((2n+1)x)}{2n+1} + \frac{\cos((2n-1)x)}{2n-1}\right]_0^{\pi/2}
        = \frac{1}{2n+1} - \frac{1}{2n-1} = \frac{-2}{4n^2-1}.

Hence

    a_n = \frac{4}{\pi}\left(\frac{-1}{4n^2-1}\right).

The Fourier series is

    |\sin(x)| = \frac{2}{\pi} - \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\cos(2nx)}{4n^2-1}.

A short numerical check of this series is sketched below.
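
A minimal Matlab sketch of checking this series numerically is given below; the grid and the truncation level are arbitrary choices.

x = linspace(-2*pi, 2*pi, 1001);
N = 200;                               % arbitrary truncation level
s = (2/pi)*ones(size(x));
for n = 1:N
  s = s - (4/pi)*cos(2*n*x)/(4*n^2 - 1);
end
fprintf('max | |sin(x)| - partial sum | = %.2e\n', max(abs(abs(sin(x)) - s)));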
