Probabilités Numériques
to
Numerical Probability for Finance
—–
2016-17
—–
Gilles Pagès
LPMA-Université Pierre et Marie Curie
—
E-mail: [email protected]
—
http://www.proba.jussieu.fr/pageperso/pages
—
3 Variance reduction 49
3.1 The Monte Carlo method revisited: static control variate 49
3.1.1 Jensen inequality and variance reduction 51
3.1.2 Negatively correlated variables, antithetic method 57
3.2 Regression based control variate 60
3.2.1 Optimal mean square control variate 60
3.2.2 Implementation of the variance reduction: batch vs adaptive 61
3.3 Application to option pricing: using parity equations to produce control variates 66
3.3.1 Complexity aspects in the general case 68
3.3.2 Examples of numerical simulations 68
3.3.3 Multidimensional case 71
3.4 Pre-conditioning 73
3.5 Stratified sampling 74
3.6 Importance sampling 78
3.6.1 The abstract paradigm of importance sampling 78
11 Miscellany 331
11.1 More on the normal distribution 331
11.2 Characteristic function 331
11.3 Numerical approximation of the distribution function 331
11.4 Table of the distribution function of the normal distribution 332
11.5 Uniform integrability as a domination property 333
11.6 Interchanging... 336
11.7 Measure Theory 337
11.8 Weak convergence of probability measures on a Polish space 337
11.9 Martingale Theory 339
Notations
General notations
• ⌊x⌋ denotes the integer part of the real number x, i.e. the greatest integer not greater than x;
{x} = x − ⌊x⌋ denotes the fractional part of the real number x.
• The notation u ∈ K^d will denote the column vector u of the vector space K^d, K = R or C.
The corresponding row vector will be denoted u^* or ^t u.
• (u|v) = ∑_{1≤i≤d} u^i v^i denotes the canonical inner product of the vectors u = (u^1, . . . , u^d) and
v = (v^1, . . . , v^d) of R^d.
• M(d, q, K) will denote the vector space of matrices with d rows and q columns with K-valued
entries.
• C_b(S, R^d) := {f : (S, δ) → R^d, continuous and bounded}, where (S, δ) denotes a metric space.
• For a function f : R^d → R^q,

  [f]_Lip = sup_{x ≠ y} |f(x) − f(y)| / |x − y|

and f is Lipschitz continuous with coefficient [f]_Lip if [f]_Lip < +∞, where | · | denotes the
selected norm on R^d (and on R^q).
Chapter 1

Simulation of random variables
But this naive and abstract definition is not satisfactory because the “scenario” ω ∈ Ω may
not be a “good” one, i.e. not a “generic” one. Many probabilistic properties (like the law of large
numbers, to quote the most basic one) are only satisfied P-a.s. Thus, if ω happens to lie in the
negligible set on which one of them fails, the induced sequence will not be “admissible”.
In any case, one usually cannot have access to an i.i.d. sequence of random variables (U_n) with
distribution U([0, 1])! Any physical device would be too slow and not reliable enough. Works by logicians
like Martin-Löf lead to the conclusion that a sequence (x_n) that can be generated by an algorithm cannot
be considered a “random” one. Thus the digits of π are not random in that sense. This is quite
embarrassing since an essential requested feature of such sequences is that they can be generated almost
instantly on a computer!
The approach coming from computer and algorithmic sciences is not really more tractable, since
there a sequence of random numbers is defined as one for which the complexity of the algorithm generating
the first n terms behaves like O(n). The rapidly growing need for good (pseudo-)random sequences,
with the explosion of Monte Carlo simulation in many fields of Science and Technology (not
only neutronics) after World War II, led to the adoption of a more pragmatic – say heuristic – approach
based on statistical tests. The idea is to submit candidate sequences to statistical tests (uniform
distribution, block non-correlation, rank tests, etc.).
For practical implementation, such sequences are finite, as is the accuracy of computers. One
considers sequences (x_n) of so-called pseudo-random numbers defined by

  x_n = y_n / N,   y_n ∈ {0, . . . , N − 1}.

One classical procedure is to generate the integers y_n by a congruential induction

  y_{n+1} ≡ a y_n + b   mod N
where gcd(a, N ) = 1, so that ā (class of a modulo N ) is invertible for the multiplication (modulo
N ). Let (Z/N Z)∗ denote the set of such invertible classes (modulo N ). The multiplication of
classes (modulo N) is an internal law on (Z/NZ)∗ and ((Z/NZ)∗, ×) is a commutative group for
this operation. By the very definition of the Euler function ϕ(N ) as the number of integers a in
{0, . . . , N − 1} such that gcd(a, N ) = 1, the cardinality of (Z/N Z)∗ is equal to ϕ(N ). Let us recall
that the Euler function is multiplicative and given by the following closed formula
  ϕ(N) = N ∏_{p | N, p prime} (1 − 1/p).
Thus ϕ(p) = p − 1 for every prime number p and ϕ(p^k) = p^k − p^{k−1} for primary numbers (prime powers), etc. In
particular, if N = p is prime, it shows that (Z/NZ)∗ = (Z/NZ) \ {0̄}.
If b = 0 (the most common case), one speaks of a homogeneous generator. We will focus on this type
of generator in what follows.
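As an illustration, here is a minimal sketch in Python of such a (homogeneous) congruential generator; the default parameters (N = 2^31 − 1, a = 7^5, b = 0, i.e. those quoted below for the IMSL generator) and the seed are purely illustrative, not a recommendation.

# Minimal sketch of a congruential pseudo-random number generator x_n = y_n / N
# with y_{n+1} = (a*y_n + b) mod N. Parameters and seed are illustrative only.
def congruential_generator(n, seed=12345, a=7 ** 5, b=0, N=2 ** 31 - 1):
    y, xs = seed, []
    for _ in range(n):
        y = (a * y + b) % N
        xs.append(y / N)
    return xs

print(congruential_generator(5))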
Homogeneous congruential generators. When b = 0, the period of the sequence (y_n) is given by the
multiplicative order of a in ((Z/NZ)∗, ×), i.e. τ_a := min{k ≥ 1 : a^k ≡ 1 mod N}.
Moreover, we know by Lagrange’s Theorem that τa divides the cardinality ϕ(N ) of (Z/N Z)∗ .
For pseudo-random number simulation purposes, we search for couples (N, a) such that the
period τ_a of a in ((Z/NZ)∗, ×) is very large. This requires an in-depth study of the multiplicative
groups ((Z/NZ)∗, ×), keeping in mind that N itself should be large to allow a to have a large period.
This suggests focusing on prime numbers or primary numbers since, as seen above, their Euler
function is itself large.
In fact the structure of these groups has been elucidated for a long time and we sum up these
results below.
What does this theorem say in connection with our pseudo-random number generation problem?
First, a piece of very good news: when N is a prime number the group ((Z/NZ)∗, ×) is cyclic, i.e. there
exists a ∈ {1, . . . , N − 1} such that (Z/NZ)∗ = {ā^n, 1 ≤ n ≤ N − 1}. The bad news is that we do
not know which a satisfy this property (not all do) and, even worse, we do not know how to find one.
For instance, if p = 7, ϕ(7) = 7 − 1 = 6 and o(3) = o(5) = 6, but o(2) = o(4) = 3 (which divides 6) and
o(6) = 2 (which again divides 6).
The second bad news is that the length of the period, though a necessary quality of a sequence
(y_n)_n, provides no guarantee or even clue that (x_n)_n is a good sequence of pseudo-random numbers!
Thus, the (homogeneous) generator of the FORTRAN IMSL library does not fit in the formerly
described setting: one sets N := 2^31 − 1 = 2 147 483 647 (a Mersenne prime number (see
below) discovered by Leonhard Euler in 1772), a := 7^5 and b := 0 (a ≢ 0 mod 8), the point being
that the period of 7^5 is not maximal.
Another approach to random number generation is based on shift registers and relies upon the
theory of finite fields.
At this stage, a sequence must successfully pass various statistical tests, keeping in
mind that such a sequence is finite by construction and consequently cannot satisfy asymptotic
properties such as the Law of Large Numbers, the Central Limit Theorem or the Law of the
Iterated Logarithm (see next chapter). Dedicated statistical toolboxes like DIEHARD (Marsaglia,
1998) have been devised to test and “certify” sequences of pseudo-random numbers.
The aim of this introductory section is just to give the reader the flavour of pseudo-random
number generation; in no case do we recommend the specific use of any of the above generators,
nor do we discuss the respective virtues of particular generators.
For more recent developments on random number generators (shift registers, etc.), we refer
e.g. to [42], [118]. Nevertheless, let us mention the Mersenne twister generators. This family
of generators was introduced in 1997 by Makoto Matsumoto and Takuji Nishimura in [111].
The first level of Mersenne Twister generators (denoted MT-p) are congruential generators whose
period N_p is a Mersenne prime number, i.e. an integer of the form N_p = 2^p − 1 where p is itself
prime. The most popular and now worldwide implemented one is the MT-19937, owing to its unrivaled
period 2^19937 − 1 ≈ 10^6000 (since 2^10 ≈ 10^3). It can simulate a uniform distribution in dimension 623
(i.e. on [0, 1]^623). A second “shuffling” device still improves it. For recent improvements and their
implementations in C++, use the link

  www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
Recently, new developments in massively parallel computing have drawn attention back to pseudo-
random number simulation, in particular GPU-based intensive computations, which use the
graphics device of a computer as a computing unit that may run hundreds of computations in parallel.
One can imagine that each pixel is a small (virtual) computing unit carrying out a
small chain of elementary computations (a thread). What is really new is that access to such inten-
sive parallel computing has become cheap, although it requires some specific programming language
(like CUDA on Nvidia GPUs). As concerns its use for intensively parallel Monte Carlo simulation,
some new questions arise, in particular the ability to generate in parallel many
“independent” sequences of pseudo-random numbers, since the computing units of a GPU never
“speak” to each other or to anybody else while running: each pixel is a separate (virtual) thread.
The Wichmann-Hill pseudo-random number generator (which is in fact a family of 273 different
generators) seems to be a good candidate for Monte Carlo simulation on GPUs. For more insight on
that topic we refer to [151] and the references therein.
  P_X = λ_{[0,1]} ∘ ϕ^{−1}

where λ_{[0,1]} ∘ ϕ^{−1} denotes the image of the Lebesgue measure λ_{[0,1]} on the unit interval by ϕ.
We will admit this theorem. For a proof we refer to [27] (Theorem A.3.1, p. 38). It also appears
as a “brick” in the proof of the Skorokhod representation theorem for random variables having
values in a Polish space.
As a consequence this means that, if U denotes any random variable with uniform distribution
on (0, 1) defined on a probability space (Ω, A, P), then

  X (d)= ϕ(U).
The interpretation is that any E-valued random variable can be simulated from a uniform
distribution. . . provided the function ϕ is computable. If this is the case, the yield of the simulation is
1 since every (pseudo-)random number u ∈ [0, 1] produces a P_X-distributed random number. Except
in very special situations (see below), this result turns out to be of purely theoretical nature and is
of little help for practical simulation. However the fundamental theorem of simulation is important
from a theoretical point of view in Probability Theory since it is the fundamental step of the Skorokhod
representation theorem.
In the three sections below we provide a short background on the most classical simulation meth-
ods (inversion of the distribution function, acceptance-rejection method, Box-Müller for Gaussian
vectors). This is of course far from being exhaustive. For an overview of the different aspects of
simulation of non uniform random variables or vectors, we refer to [42]. But in fact, a large part
of the results from Probability Theory can give rise to simulation methods.
Proposition 1.1 If U (d)= U((0, 1)), then X := F_l^{−1}(U) (d)= µ.

  {X ≤ x} = {F_l^{−1}(U) ≤ x} = {U ≤ F(x)}
Remarks. • When F is increasing and continuous on the real line, then F has an inverse function
denoted F^{−1}, defined on (0, 1) (increasing and continuous as well), such that F ∘ F^{−1} = Id_{(0,1)} and
F^{−1} ∘ F = Id_R. Clearly F^{−1} = F_l^{−1} by the very definition of F_l^{−1}. But the above proof can be
made even more straightforward since {F^{−1}(U) ≤ x} = {U ≤ F(x)} by simple left composition of
F^{−1} by the increasing function F.
• If µ has a probability density f such that {f = 0} has an empty interior, then F(x) = ∫_{−∞}^{x} f(u) du
is continuous and increasing.
• One can replace R by any interval [a, b] ⊂ R or R (with obvious conventions).
• One could also have considered the right continuous canonical inverse F_r^{−1} defined by F_r^{−1}(u) := inf{x ∈ R : F(x) > u}.
When X takes finitely many values in R, we will see in Example 4 below that this simulation
method corresponds to the standard simulation method of such random variables.
so that, if F is continuous (or equivalently µ has no atom: µ({x}) = 0 for every x ∈ R), then

  F ∘ F_l^{−1} = Id_{(0,1)}
(b) Show that if F is continuous, then F(X) (d)= U([0, 1]).
(c) Show that if F is (strictly) increasing, then F_l^{−1} is continuous and F_l^{−1} ∘ F = Id_R.
(d) One defines the survival function of µ by F̄(x) = 1 − F(x) = µ((x, +∞)), x ∈ R. One defines
the canonical right inverse of F̄ by
Show that F̄_r^{−1}(u) = F_l^{−1}(1 − u). Deduce that F̄_r^{−1} is right continuous on (0, 1) and that F̄_r^{−1}(U)
has distribution µ. Define F̄_l^{−1} and show that F̄_l^{−1}(U) has distribution µ. Finally establish for F̄_r^{−1}
properties similar to (a)-(b)-(c).
(Informal) Definition. The yield (often denoted r) of a simulation procedure is defined as the
inverse of the number of pseudo-random numbers used to generate one PX -distributed random
number.
One must keep in mind that the yield is attached to a simulation method, not to a probability
distribution (the fundamental theorem of simulation always provides a simulation method with
yield 1, except that it is usually not tractable).
Example. Typically, if X = ϕ(U1 , . . . , Um ) where ϕ is a Borel function defined on [0, 1]m and
U1 , . . . , Um are independent and uniformly distributed over [0, 1], the yield of this ϕ-based procedure
to simulate the distribution of X is r = 1/m.
Thus, the yield of the (inverse) distribution function method is consequently always equal to r = 1.
Examples: 1. Simulation of an exponential distribution. Let X (d)= E(λ), λ > 0. Then

  ∀ x ∈ (0, ∞),  F_X(x) = ∫_0^x λ e^{−λξ} dξ = 1 − e^{−λx}.

Consequently, for every u ∈ (0, 1), F_X^{−1}(u) = − log(1 − u)/λ. Now, using that U (d)= 1 − U if
U (d)= U((0, 1)) yields

  X = − log(U)/λ (d)= E(λ).
2. Simulation of a Cauchy(c), c > 0, distribution. We know that P_X(dx) = c/(π(x^2 + c^2)) dx. Then

  ∀ x ∈ R,  F_X(x) = ∫_{−∞}^{x} c du/(π(u^2 + c^2)) = (1/π) Arctan(x/c) + 1/2,

so that

  X = c tan(π(U − 1/2)) (d)= Cauchy(c).
3. Simulation of a Pareto(θ), θ > 0, distribution. We know that P_X(dx) = (θ/x^{1+θ}) 1_{{x≥1}} dx. The
distribution function is F_X(x) = 1 − x^{−θ}, x ≥ 1, so that, still using U (d)= 1 − U if U (d)= U((0, 1)),

  X = U^{−1/θ} (d)= Pareto(θ).
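As a minimal illustration of the inverse distribution function method on these three examples, here is a short Python sketch (the parameters lam, c and theta are illustrative):

import math, random

# Inverse distribution function method for the exponential, Cauchy and Pareto examples.
def exponential(lam):
    u = 1.0 - random.random()                     # u in (0, 1]
    return -math.log(u) / lam                     # F^{-1}(u) = -log(1-u)/lam, with U ~ 1-U

def cauchy(c):
    return c * math.tan(math.pi * (random.random() - 0.5))   # F^{-1}(u) = c tan(pi(u - 1/2))

def pareto(theta):
    u = 1.0 - random.random()                     # u in (0, 1]
    return u ** (-1.0 / theta)                    # F^{-1}(u) = (1-u)^{-1/theta}, with U ~ 1-U

print(exponential(2.0), cauchy(1.0), pareto(3.0))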
with the convention x_0 = −∞ and x_{N+1} = +∞. As a consequence, its left continuous canonical
inverse is given by

  ∀ u ∈ (0, 1),  F_{X,l}^{−1}(u) = ∑_{k=1}^{N} x_k 1_{{p_1+···+p_{k−1} < u ≤ p_1+···+p_k}}

so that

  X (d)= ∑_{k=1}^{N} x_k 1_{{p_1+···+p_{k−1} < U ≤ p_1+···+p_k}}.
The yield of the procedure is still r = 1. However, when implemented naively, its complexity –
which corresponds to (at most) N comparisons for every simulation – may be quite high. See [42]
for some considerations (in the spirit of quicksort algorithms) which lead to an O(log N) complexity.
Furthermore, this procedure assumes that one has access to the probability weights p_k with an
arbitrary accuracy. This is not always the case, even in a priori simple situations, as emphasized in
Example 6 below.
It should be noticed of course that the above simulation formula is still appropriate for a random
variable taking values in any set E, not only for subsets of R!
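As an illustration, here is a minimal Python sketch of this simulation by inversion of the cumulative weights (the values and weights below are purely illustrative):

import random

# Simulation of a random variable taking the values x_1, ..., x_N with probabilities p_1, ..., p_N,
# by comparing one uniform number with the cumulative sums p_1 + ... + p_k (at most N comparisons).
def simulate_discrete(values, probs):
    u, cum = random.random(), 0.0
    for x, p in zip(values, probs):
        cum += p
        if u <= cum:
            return x
    return values[-1]          # guard against rounding of the cumulative sum

print(simulate_discrete([1, 2, 5], [0.2, 0.5, 0.3]))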
5. Simulation of a Bernoulli random variable B(p), p ∈ (0, 1). This is the simplest application of
the previous method since
  X = 1_{{U ≤ p}} (d)= B(p).
6. Simulation of a Binomial random variable B(n, p), p ∈ (0, 1), n ≥ 1. One relies on the very
definition of the binomial distribution as the law of the sum of n independent B(p)-distributed
random variables, i.e.

  X = ∑_{k=1}^{n} 1_{{U_k ≤ p}} (d)= B(n, p)

where U_1, . . . , U_n are i.i.d. random variables, uniformly distributed over [0, 1].
Note that this procedure has a very bad yield, namely r = 1/n. Moreover, it needs n comparisons
like the standard method (without any shortcut).
Why not use the above standard method for random variables taking finitely many values?
Because the cost of the computation of the probability weights p_k is much too high as n grows.
7. Simulation of geometric random variables G(p) and G∗(p), p ∈ (0, 1). It is the distribution of the
first success instant when repeating independently the same Bernoulli experiment with parameter
p. Conventionally, G(p) starts at time 0 whereas G∗(p) starts at time 1.
To be precise, if (X_k)_{k≥0} denotes an i.i.d. sequence of random variables with distribution B(p),
p ∈ (0, 1), then

  τ∗ := min{k ≥ 1 | X_k = 1} (d)= G∗(p)
and
  τ := min{k ≥ 0 | X_k = 1} (d)= G(p).
Hence
  P(τ∗ = k) = p(1 − p)^{k−1},  k ∈ N∗ := {1, 2, . . . , n, . . .}

and

  P(τ = k) = p(1 − p)^k,  k ∈ N := {0, 1, 2, . . . , n, . . .}

(so that both random variables are P-a.s. finite since ∑_{k≥0} P(τ = k) = ∑_{k≥1} P(τ∗ = k) = 1).
Note that τ + 1 has the same G∗(p)-distribution as τ∗.
The (random) yields of the above two procedures are r∗ = 1/τ∗ and r = 1/(τ + 1) respectively. Their
common mean (average yield) r̄ = r̄∗ is given by

  E[1/(τ + 1)] = E[1/τ∗] = ∑_{k≥1} (1/k) p(1 − p)^{k−1}
               = (p/(1 − p)) ∑_{k≥1} (1 − p)^k / k
               = (p/(1 − p)) (− log(1 − x))|_{x=1−p}
               = − (p/(1 − p)) log(p).
Exercises. 1. Let X : (Ω, A, P) → (R, Bor(R)) be a real-valued random variable with distribu-
tion function F and left continuous canonical inverse F_l^{−1}. Let I = [a, b], −∞ ≤ a < b ≤ +∞, be
a nontrivial interval of R. Show that, if U (d)= U([0, 1]), then

  F_l^{−1}(F(a) + (F(b) − F(a))U) (d)= L(X | X ∈ I).
2. Negative binomial distributions The negative binomial distribution with parameter (n, p) is the
law µn,p of the nth success in an infinite sequence of independent Bernoulli trials, namely, with
above notations used for the geometric distributions, the distribution of
Show that

  µ_{n,p}(k) = 0, k ≤ n − 1,   µ_{n,p}(k) = C_{k−1}^{n−1} p^n (1 − p)^{k−n}, k ≥ n.
Compute the mean yield of its (natural and straightforward) simulation method.
we know that f is dominated by a probability distribution g.µ which can be simulated at “low
cost”. (Note that ν = (f / ∫_E f dµ) . µ.)
In most elementary applications (see below), E is either a Borel set of Rd equipped with its
Borel σ-field and µ is the trace of the Lebesgue measure on E or a subset E ⊂ Zd equipped with
the counting measure.
Let us be more precise. So, let µ be a non-negative σ-finite measure on (E, E) and let f, g :
(E, E) → R_+ be two Borel functions. Assume that f ∈ L^1_{R_+}(µ) with ∫_E f dµ > 0 and that g is a
probability density with respect to µ satisfying furthermore g > 0 µ-a.s. and that there exists a positive
real constant c > 0 such that

  f(x) ≤ c g(x)   µ(dx)-a.e.
Note that this implies c ≥ ∫_E f dµ. As mentioned above, the aim of this section is to show how to
simulate some random numbers distributed according to the probability distribution

  ν = (f / ∫_E f dµ) . µ

using some g.µ-distributed (pseudo-)random numbers. In particular, to make the problem consis-
tent, we will assume that ν ≠ g.µ, which in turn implies that

  c > ∫_E f dµ.
• we know how to simulate (at a reasonable cost) on a computer a sequence of i.i.d. random
vectors (Yk )k≥1 with the distribution g . µ
• we can compute on a computer the ratio (f/g)(x) at every x ∈ E (again at a reasonable cost).
As a first (not so) preliminary step, we will explore a natural connection (in distribution)
between an E-valued random variable X with distribution ν and an E-valued random variable Y
with distribution g.µ. We will see that the key idea is completely elementary. Let h : (E, E) → R be a
test function (measurable and bounded or non-negative). On the one hand

  E h(X) = (1/∫_E f dµ) ∫_E h(x) f(x) µ(dx)
         = (1/∫_E f dµ) ∫_E h(y) (f/g)(y) g(y) µ(dy)   since g > 0 µ-a.e.
         = E[h(Y) (f/g)(Y)].
We can also stay on the state space E and note in a somewhat artificial way that

  E h(X) = (c/∫_E f dµ) ∫_E h(y) (∫_0^1 1_{{u ≤ f(y)/(c g(y))}} du) g(y) µ(dy)
         = (c/∫_E f dµ) ∫_E ∫_0^1 h(y) 1_{{u ≤ f(y)/(c g(y))}} g(y) µ(dy) du
         = (c/∫_E f dµ) E[ h(Y) 1_{{c U g(Y) ≤ f(Y)}} ].
The proposition below takes advantage of this identity in distribution to propose a simulation
procedure. In fact it is simply a reverse way to make (and interpret) the above computations.
Remark. The (random) yield of the method is 1/τ. Hence we know that its mean yield is given by

  E[1/τ] = − (p log p)/(1 − p) = (∫_E f dµ / (c − ∫_E f dµ)) log(c / ∫_E f dµ).

Since lim_{p→1} −(p log p)/(1 − p) = 1, the closer the constant c is to ∫_E f dµ, the higher the yield of the simulation
is.
Proof. Step 1: Let (U, Y ) be a couple of random variables with distribution U ([0, 1]) ⊗ PY . Let
i.e.
L (Y |{c U g(Y ) ≤ f (Y )}) = ν.
Remark. An important point to be noticed is that we do not need to know the numerical value
of ∫_E f dµ to implement the above acceptance-rejection procedure.
(a) The sequence (τn − τn−1 )n≥1 (with the convention τ0 = 0) is i.i.d. with distribution G∗ (p) and
the sequence
Xn := Yτn
(b) Furthermore the random yield of the simulation of the first n PX -distributed random variables
Yτk , k = 1, . . . , n is
  ρ_n = n/τ_n  →  p  a.s. as n → +∞.
Before proposing first applications, let us briefly present a more applied point of view which is
closer to what is really implemented in practice when performing a Monte Carlo simulation based
on the acceptance-rejection method.
The user’s viewpoint (practitioner’s corner). The practical implementation of the acceptance-
rejection method is rather simple. Let h : E → R be a P_X-integrable Borel function. How to
compute E h(X) using Von Neumann’s acceptance-rejection method? It amounts to simulating
an n-sample (U_k, Y_k)_{1≤k≤n} on a computer and computing

  (∑_{k=1}^{n} 1_{{c U_k g(Y_k) ≤ f(Y_k)}} h(Y_k)) / (∑_{k=1}^{n} 1_{{c U_k g(Y_k) ≤ f(Y_k)}}).
Note that

  (∑_{k=1}^{n} 1_{{c U_k g(Y_k) ≤ f(Y_k)}} h(Y_k)) / (∑_{k=1}^{n} 1_{{c U_k g(Y_k) ≤ f(Y_k)}})
    = ((1/n) ∑_{k=1}^{n} 1_{{c U_k g(Y_k) ≤ f(Y_k)}} h(Y_k)) / ((1/n) ∑_{k=1}^{n} 1_{{c U_k g(Y_k) ≤ f(Y_k)}}),   n ≥ 1.
Hence, owing to the strong Law of Large Numbers (see next chapter if necessary), this quantity a.s.
converges, as n goes to infinity, toward

  (∫_E (∫_0^1 1_{{c u g(y) ≤ f(y)}} du) h(y) g(y) µ(dy)) / (∫_E (∫_0^1 1_{{c u g(y) ≤ f(y)}} du) g(y) µ(dy))
    = (∫_E (f(y)/(c g(y))) h(y) g(y) µ(dy)) / (∫_E (f(y)/(c g(y))) g(y) µ(dy))
    = (∫_E h(y) (f(y)/c) µ(dy)) / (∫_E (f(y)/c) µ(dy))
    = ∫_E h(y) (f(y) / ∫_E f dµ) µ(dy)
    = ∫_E h(y) ν(dy).
This third way to present the same computations shows that, in terms of practical implementation,
this method is in fact very elementary.
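As a minimal illustration (a sketch, not the author's own code), here is the acceptance-rejection procedure in Python in the simple case E = [0, 1], µ the Lebesgue measure, g ≡ 1 (uniform proposal) and the purely illustrative target f(x) = x(1 − x), for which one may take c = 1/4:

import random

# Von Neumann acceptance-rejection: target density proportional to f on [0,1],
# proposal Y ~ U([0,1]) (density g = 1), with f <= c*g for c = 1/4.
def f(x):
    return x * (1.0 - x)

c = 0.25

def acceptance_rejection():
    while True:
        u, y = random.random(), random.random()
        if c * u <= f(y):          # accept when c U g(Y) <= f(Y); the number of trials is G*(p)
            return y

print([acceptance_rejection() for _ in range(5)])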
Classical applications. Uniform distributions on a bounded domain D.
Let D ⊂ a + [−M, M]^d with λ_d(D) > 0 (where a ∈ R^d, M > 0), let Y (d)= U(a + [−M, M]^d) and let
τ_D := min{n | Y_n ∈ D} where (Y_n)_{n≥1} is an i.i.d. sequence defined on a probability space (Ω, A, P)
with the same distribution as Y. Then

  Y_{τ_D} (d)= U(D)
and

  f(x) = 1_D(x) ≤ (2M)^d g(x)   with c := (2M)^d,

so that (f / ∫_{R^d} f dµ) . µ = U(D).
As a matter of fact, with the notations of the above proposition,

  τ = min{ k ≥ 1 | c U_k ≤ (f/g)(Y_k) }.

However, (f/g)(y) = c 1_D(y), so that c U_k ≤ (f/g)(Y_k) if and only if Y_k ∈ D. Consequently τ = τ_D.
A standard application is to consider the unit ball of Rd , D := Bd (0; 1). When d = 2, this is
involved in the so-called polar method, see below, for the simulation of N (0; I2 ) random vectors.
The γ(α)-distribution.
Let α > 0 and P_X(dx) = f_α(x) dx / Γ(α), where
Lemma 1.1 Let X′ and X″ be two independent random variables with distributions γ(α′) and γ(α″)
respectively. Then X = X′ + X″ has a distribution γ(α′ + α″).
X = ξ_1 + · · · + ξ_n

where the ξ_k are i.i.d. with exponential distribution since γ(1) = E(1). Consequently, if U_1, . . . , U_n are
i.i.d. uniformly distributed random variables,

  X (d)= − log( ∏_{k=1}^{n} U_k ).
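A minimal Python sketch of this simulation of γ(n) for an integer n (purely illustrative):

import math, random

# gamma(n), n integer, simulated as -log of a product of n independent uniforms,
# i.e. as a sum of n independent E(1) random variables.
def gamma_integer(n):
    prod = 1.0
    for _ in range(n):
        prod *= 1.0 - random.random()     # uniform in (0, 1], avoids log(0)
    return -math.log(prod)

print(gamma_integer(3))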
To simulate a random variable with general distribution γ(α), one writes α = ⌊α⌋ + {α} where
⌊α⌋ := max{k ≤ α, k ∈ N} denotes the integer part of α and {α} ∈ [0, 1) its fractional part.
Let E = K, let µ = λ_d|_K be the reference measure and g = (1/λ_d(K)) 1_K the density of the uniform
distribution over K (this distribution is very easy to simulate as emphasized in a former example).
Then

  f(x) ≤ c g(x),  x ∈ K,  with  c = ‖f‖_sup λ_d(K) = ‖f‖_sup ∏_{i=1}^{d} (b_i − a_i).
Then, if (Y_n)_{n≥1} denotes an i.i.d. sequence defined on a probability space (Ω, A, P) with the uniform
distribution over K, the stopping strategy τ of the Von Neumann acceptance-rejection method
reads

  τ = min{ k | ‖f‖_sup U_k ≤ f(Y_k) }.
Equivalently this can be rewritten in a more intuitive way as follows: let (V_n)_{n≥1} = ((V_n^1, V_n^2))_{n≥1} be an i.i.d.
sequence of random vectors defined on a probability space (Ω, A, P) having a uniform distribution
over K × [0, ‖f‖_∞]. Then

  V_τ^1 (d)= ν   where   τ = min{ k ≥ 1 | V_k^2 ≤ f(V_k^1) }.
There are many other applications of Von Neumann’s acceptance-rejection method, e.g. in
Physics, to take advantage of the fact that the density to be simulated is only known up to a constant.
Several methods have been devised to speed it up, i.e. to increase its yield. Among them let us cite
the Ziggurat method, for which we refer to [112]. It was developed by Marsaglia and Tsang in
the early 2000’s.
1.5 Simulation of Poisson distributions (and Poisson processes)

The Poisson distribution P(λ), λ > 0, is defined by

  ∀ k ∈ N,  P(λ)({k}) = e^{−λ} λ^k / k!.
To simulate this distribution in an exact way, one relies on its close connection with the Poisson
counting process. The (normalized) Poisson counting process is the counting process induced by
the Exponential random walk (with parameter 1). It is defined by
  ∀ t ≥ 0,  N_t = ∑_{n≥1} 1_{{S_n ≤ t}} = min{ n | S_{n+1} > t }
Proposition 1.3 The process (Nt )t≥0 has càdlàg (1 ) paths, independent stationary increments. In
particular for every s, t ≥ 0, s ≤ t, Nt − Ns is independent of Ns and has the same distribution as
Nt−s . Furthermore, for every t ≥ 0, Nt has a Poisson distribution with parameter t.
Proof. Let (Xk )k≥1 be a sequence of i.i.d. random variables with an exponential distribution E(1).
Set, for every n ≥ 1,
Sn = X1 + · · · + Xn ,
Let t1 , t2 ∈ R+ , t1 < t2 and let k1 , k2 ∈ N. Assume temporarily k2 ≥ 1.
Now, if we set A = P(S_{k_1} ≤ t_1 < S_{k_1+1} ≤ S_{k_1+k_2} ≤ t_2 < S_{k_1+k_2+1}) for convenience, we get

  A = ∫_{R_+^{k_1+k_2+1}} 1_{{x_1+···+x_{k_1} ≤ t_1 < x_1+···+x_{k_1+1}, x_1+···+x_{k_1+k_2} ≤ t_2 < x_1+···+x_{k_1+k_2+1}}}
        e^{−(x_1+···+x_{k_1+k_2+1})} dx_1 · · · dx_{k_1+k_2+1}.
(1) French acronym for “right continuous with left limits” (continu à droite, limite à gauche).
Integrating downward from u_{k_1+k_2+1} down to u_1, we get, owing to Fubini’s Theorem,

  A = e^{−t_2} ∫_{{0 ≤ u_1 ≤ ··· ≤ u_{k_1} ≤ t_1 ≤ u_{k_1+1} ≤ ··· ≤ u_{k_1+k_2} ≤ t_2}} du_1 · · · du_{k_1+k_2}
Corollary 1.2 (Simulation of a Poisson distribution) Let (Un )n≥1 be an i.i.d. sequence of
uniformly distributed random variables on the unit interval. The process null at zero and defined
for every t > 0 by
  N_t = min{ k ≥ 0 | U_1 · · · U_{k+1} < e^{−t} }  (d)=  P(t)
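A minimal Python sketch of this exact simulation of the Poisson distribution P(t) (with the illustrative value t = 3):

import math, random

# Poisson(t) simulated as N = min{k >= 0 : U_1 ... U_{k+1} < exp(-t)}.
def poisson(t):
    threshold, prod, k = math.exp(-t), 1.0, -1
    while prod >= threshold:
        prod *= random.random()
        k += 1
    return k

print([poisson(3.0) for _ in range(5)])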
Proposition 1.4 Let R² and Θ : (Ω, A, P) → R be two independent random variables with distributions
L(R²) = E(1/2) and L(Θ) = U([0, 2π]) respectively. Then

  X := (R cos Θ, R sin Θ) (d)= N(0; I_2)

where R := √(R²).
using the standard change of variables x_1 = ρ cos θ, x_2 = ρ sin θ. We use the facts that (ρ, θ) ↦
(ρ cos θ, ρ sin θ) is a C¹-diffeomorphism from (0, ∞) × (0, 2π) onto R² \ (R_+ × {0}) and that λ_2(R_+ × {0}) = 0.
Setting now ρ = √r, one has:

  ∫∫_{R²} f(x_1, x_2) exp(−(x_1² + x_2²)/2) dx_1 dx_2/(2π)
    = ∫∫ f(√r cos θ, √r sin θ) 1_{R_+^*}(r) 1_{(0,2π)}(θ) e^{−r/2} (dr/2) (dθ/(2π))
    = E[ f(√(R²) cos Θ, √(R²) sin Θ) ]
    = E(f(X)). ♦
Corollary 1.3 (Box-Müller method) One can simulate a distribution N(0; I_2) from a couple
(U_1, U_2) of independent random variables with distribution U([0, 1]) by setting

  X := (√(−2 log(U_1)) cos(2πU_2), √(−2 log(U_1)) sin(2πU_2)).

The yield of the simulation is r = 1/2 with respect to the N(0; 1) distribution. . . and r = 1 when the aim is
to simulate an N(0; I_2)-distributed (pseudo-)random vector or, equivalently, two N(0; 1)-distributed
(pseudo-)random numbers.
Proof. Simulate the exponential distribution using the inverse distribution function with U_1 (d)=
U([0, 1]) and note that if U_2 (d)= U([0, 1]), then 2πU_2 (d)= U([0, 2π]) (where U_2 is taken independent of
U_1). ♦
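A minimal Python sketch of the Box-Müller method:

import math, random

# Box-Muller: two independent N(0;1) numbers from two independent U([0,1]) numbers.
def box_muller():
    u1 = 1.0 - random.random()                  # in (0, 1], avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

print(box_muller())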
We consider a random vector (U_1, . . . , U_d) (d)= U([0, 1]^d) (so that U_1, . . . , U_d are i.i.d. with
distribution U([0, 1])) and we set

  (X_{2i−1}, X_{2i}) = (√(−2 log(U_{2i−1})) cos(2πU_{2i}), √(−2 log(U_{2i−1})) sin(2πU_{2i})),  i = 1, . . . , d/2.   (1.1)
Exercise (Marsaglia’s polar method). (See [112].) Let (V_1, V_2) (d)= U(B(0; 1)) where B(0; 1)
denotes the canonical Euclidean unit ball in R². Set R² = V_1² + V_2² and

  X := ( V_1 √(−2 log(R²)/R²), V_2 √(−2 log(R²)/R²) ).

(a) Show that R² (d)= U([0, 1]), that (V_1/R, V_2/R) ∼ (cos(Θ), sin(Θ)) with Θ (d)= U([0, 2π]), and that R² and
(V_1/R, V_2/R) are independent.
Deduce that X (d)= N(0; I_2).
(b) Let (U_1, U_2) (d)= U([−1, 1]²). Show that L((U_1, U_2) | U_1² + U_2² ≤ 1) = U(B(0; 1)). Derive a sim-
ulation method for N(0; I_2) combining the above identity and an appropriate acceptance-rejection
algorithm. What is the yield of the resulting procedure?
(c) Compare the performances of this so-called Marsaglia’s polar method with those of the Box-
Müller algorithm (i.e. the acceptance-rejection rule versus the computation of trigonometric func-
tions). Conclude.
Lemma 1.2 Let Y be an Rd -valued square integrable random vector and let A ∈ M(q, d) be a q × d
matrix. Then the covariance matrix CAY of the random vector AY is given by
CAY = ACY A∗
where A∗ stands for the transpose of A.
The Cholesky-based approach performs better since it divides the complexity
of this phase of the simulation approximately by a factor 2.
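As an illustration (a minimal sketch, with a purely illustrative covariance matrix Σ), an N(0; Σ)-distributed vector can be simulated as TZ where T is the lower triangular Cholesky factor of Σ and Z (d)= N(0; I_d), consistently with Lemma 1.2:

import numpy as np

# Simulation of X ~ N(0, Sigma) via the Cholesky decomposition Sigma = T T*.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])               # illustrative positive definite covariance matrix
T = np.linalg.cholesky(Sigma)                # lower triangular factor
Z = np.random.standard_normal(2)             # Z ~ N(0, I_2)
X = T @ Z                                    # covariance of X is T I_2 T* = Sigma (Lemma 1.2)
print(X)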
Exercises. 1. Let Z = (Z_1, Z_2) be a Gaussian vector such that Z_1 (d)= Z_2 (d)= N(0; 1) and
Cov(Z_1, Z_2) = ρ ∈ [−1, 1].
(a) Compute for every u ∈ R² the Laplace transform L(u) = E e^{(u|Z)} of Z.
(b) Compute for every σ_1, σ_2 > 0 the correlation (2) ρ_{X_1,X_2} between the random variables X_1 =
e^{σ_1 Z_1} and X_2 = e^{σ_2 Z_2}.
(c) Show that inf_{ρ∈[−1,1]} ρ_{X_1,X_2} ∈ (−1, 0) and that, when σ_i = σ > 0, inf_{ρ∈[−1,1]} ρ_{X_1,X_2} = −e^{−σ²}.
2. Let Σ be a positive definite matrix. Show the existence of a unique lower triangular matrix T
and a diagonal matrix D such that both T and D have positive diagonal entries and ∑_j T_{ij}² = 1
for every i = 1, . . . , d. [Hint: change the reference Euclidean norm to perform the Hilbert-Schmidt
decomposition] (3).
is a first possibility.
However, it seems more natural to use the independence and the stationarity of the increments
i.e. that

  (W_{t_1}, W_{t_2} − W_{t_1}, . . . , W_{t_n} − W_{t_{n−1}}) (d)= N(0; Diag(t_1, t_2 − t_1, . . . , t_n − t_{n−1}))

so that

  (W_{t_1}, W_{t_2} − W_{t_1}, . . . , W_{t_n} − W_{t_{n−1}})^t (d)= Diag(√t_1, √(t_2 − t_1), . . . , √(t_n − t_{n−1})) (Z_1, . . . , Z_n)^t

where (Z_1, . . . , Z_n) (d)= N(0; I_n). The simulation of (W_{t_1}, . . . , W_{t_n}) follows by summing up the
increments.
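A minimal Python sketch of this simulation of (W_{t_1}, . . . , W_{t_n}) on a (purely illustrative) time grid:

import numpy as np

# Simulation of a Brownian path on a grid by cumulating independent Gaussian increments.
t = np.array([0.25, 0.5, 0.75, 1.0])            # illustrative grid t_1 < ... < t_n
dt = np.diff(np.concatenate(([0.0], t)))        # t_1, t_2 - t_1, ..., t_n - t_{n-1}
Z = np.random.standard_normal(len(t))           # (Z_1, ..., Z_n) ~ N(0, I_n)
W = np.cumsum(np.sqrt(dt) * Z)                  # W_{t_k} as cumulative sums of the increments
print(W)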
Remark. To be more precise, one derives from the above result that

  (W_{t_1}, W_{t_2}, . . . , W_{t_n})^t = L (W_{t_1}, W_{t_2} − W_{t_1}, . . . , W_{t_n} − W_{t_{n−1}})^t   where   L = [1_{{i≥j}}]_{1≤i,j≤n}.

Hence, if we set T = L Diag(√t_1, √(t_2 − t_1), . . . , √(t_n − t_{n−1})), one checks on the one hand that
T = [√(t_j − t_{j−1}) 1_{{i≥j}}]_{1≤i,j≤n} (with the convention t_0 := 0) and on the other hand that

  (W_{t_1}, W_{t_2}, . . . , W_{t_n})^t = T (Z_1, . . . , Z_n)^t.

We derive, owing to Lemma 1.2, that T T^* = T I_n T^* = Σ_{(W_{t_1},...,W_{t_n})}. The matrix T being lower
triangular, it provides the Cholesky decomposition of the covariance matrix Σ_{(W_{t_1},...,W_{t_n})}.
Practitioner’s corner: Warning! In quantitative finance, especially when modeling the dy-
namics of several risky assets, say d, the correlation between the Brownian sources of randomness
B = (B^1, . . . , B^d) attached to the log-returns is often misleading in terms of notations, since it is
usually written as

  ∀ i ∈ {1, . . . , d},  B_t^i = ∑_{j=1}^{q} σ_{ij} W_t^j,

where σ_{i.} is the column vector [σ_{ij}]_{1≤j≤q}, σ = [σ_{ij}]_{1≤i≤d, 1≤j≤q} and (.|.) denotes here the canonical
inner product on R^q. So one should perform the Cholesky decomposition on the symmetric non-
negative matrix Σ_{B_1} = σσ^*. Also have in mind that, if q < d, then Σ_{B_1} has rank at most q and
cannot be invertible.
This procedure is the core of the Monte Carlo simulation. We provided several examples of
such representations in the previous chapter. For further developments on that wide topic, we refer
to [42] and the references therein, but in some way an important part of the scientific activity in
Probability Theory is motivated by or can be applied to simulation.
Once these conditions are fulfilled, one can perform a Monte Carlo simulation. But this leads
to two important issues:
– what about the rate of convergence of the method?
and
– how can the resulting error be controlled?
The answer to these questions calls upon fundamental results in Probability Theory and Statis-
tics.
Rate(s) of convergence. The (weak) rate of convergence in the SLLN is ruled by the Central
Limit Theorem (CLT), which says that if X is square integrable (X ∈ L²(P)), then

  √M (X̄_M − m_X)  →(L)  N(0; σ_X²)  as M → +∞

where σ_X² = Var(X) := E((X − E X)²) = E(X²) − (E X)² is the variance (its square root σ_X is
called standard deviation) (1). Also note that the mean quadratic rate of convergence (i.e. the rate
in L²(P)) is exactly

  ‖X̄_M − m_X‖_2 = σ_X/√M.

If σ_X = 0, then X̄_M = X = m_X P-a.s. So, from now on, we may assume without loss of
generality that σ_X > 0.
The Law of the Iterated Logarithm (LIL) provides an a.s. rate of convergence, namely

  limsup_M √(M/(2 log(log M))) (X̄_M − m_X) = σ_X   and   liminf_M √(M/(2 log(log M))) (X̄_M − m_X) = −σ_X.

A proof of this (difficult) result can be found e.g. in [149]. All these rates stress the main drawback
of the Monte Carlo method: it is a slow method, since dividing the error by 2 requires increasing the
size of the simulation by 4.
(1) The symbol →(L) stands for the convergence in distribution (or “in law”, whence the notation “L”): a sequence
of random variables (Y_n)_{n≥1} converges in distribution toward Y_∞ if
It can be defined equivalently as the weak convergence of the distributions P_{Y_n} toward the distribution P_{Y_∞}. Con-
vergence in distribution is also characterized by the following property
The extension to R^d-valued random vectors is straightforward. See [24] for a presentation of weak convergence of
probability measures in a general framework.
Data driven control of the error: confidence level and confidence interval. Assume that σ_X > 0
(otherwise the problem is empty). As concerns the control of the error, one relies on the CLT. It
is obvious that the CLT also reads

  √M (X̄_M − m_X)/σ_X  →(L)  N(0; 1)  as M → +∞.

Furthermore, the normal distribution having a density, it has no atom. Consequently, this conver-
gence implies (in fact it is equivalent, see [24]) that for all real numbers a > b,

  lim_M P( √M (X̄_M − m_X)/σ_X ∈ [b, a] ) = P( N(0; 1) ∈ [b, a] )
                                          = Φ_0(a) − Φ_0(b)
where Φ_0 denotes the distribution function of the standard normal distribution, i.e.

  Φ_0(x) = ∫_{−∞}^{x} e^{−ξ²/2} dξ/√(2π).
Exercise. Show that, like the distribution function of any symmetric random variable
with no atom, the distribution function of the centered normal distribution on the real line satisfies

  ∀ x ∈ R,  Φ_0(x) + Φ_0(−x) = 1.
Deduce that P(|N (0; 1)| ≤ a) = 2Φ0 (a) − 1 for every a > 0.
In turn, if X_1 ∈ L³(P), the convergence rate in the Central Limit Theorem is ruled by the
Berry-Esseen Theorem (see [149]). It turns out to be again 1/√M (2), which is rather slow from a
statistical viewpoint but is not a real problem within the usual range of Monte Carlo simulations
(at least many thousands, usually one hundred thousand or one million paths). Consequently,
one can assume that √M (X̄_M − m_X)/σ_X does have a standard normal distribution. This means in
particular that one can design a probabilistic control of the error directly derived from statistical
concepts: let α ∈ (0, 1) denote a confidence level (close to 1) and let q_α be the two-sided α-quantile
defined as the unique solution to the equation P(|N(0; 1)| ≤ q_α) = α.
Then, setting a = q_α and b = −q_α, one defines the theoretical random confidence interval at level
α by

  J_M^α := [ X̄_M − q_α σ_X/√M , X̄_M + q_α σ_X/√M ],
(2) Namely, if σ_X > 0,

  ∀ M ≥ 1, ∀ x ∈ R,  | P( √M (X̄_M − m_X) ≤ x σ_X ) − Φ_0(x) | ≤ (C E|X_1 − E X_1|³) / (σ_X³ √M (1 + |x|³)).

See [149].
which satisfies

  P(m_X ∈ J_M^α) = P( √M |X̄_M − m_X|/σ_X ≤ q_α )  →  P(|N(0; 1)| ≤ q_α) = α  as M → +∞.
However, at this stage this procedure remains purely theoretical since the confidence interval
J_M^α involves the standard deviation σ_X of X which is usually unknown.
Here comes the “trick” which made the tremendous success of the Monte Carlo method:
the variance of X can in turn be evaluated on-line by simply adding a companion Monte Carlo
simulation to estimate the variance σ_X², namely

  V̄_M = (1/(M−1)) ∑_{k=1}^{M} (X_k − X̄_M)²
      = (1/(M−1)) ∑_{k=1}^{M} X_k² − (M/(M−1)) X̄_M²  →  Var(X) = σ_X²  as M → +∞.
The above convergence (3) follows from the SLLN applied to the i.i.d. sequence of integrable
random variables (X_k²)_{k≥1} (and from the convergence of X̄_M, which follows from the SLLN as well). It
is an easy exercise to show that moreover E(V̄_M) = σ_X², i.e., with the terminology of Statistics, V̄_M
is an unbiased estimator of σ_X². This remark is of little importance in practice due to the usual –
large – range of Monte Carlo simulations. Note that the above a.s. convergence is again ruled by
a CLT if X ∈ L⁴(P) (so that X² ∈ L²(P)).
  X̄_M = X̄_{M−1} + (1/M)(X_M − X̄_{M−1}) = ((M−1)/M) X̄_{M−1} + X_M/M,   M ≥ 1,

  V̄_M = V̄_{M−1} − (1/(M−1)) ( V̄_{M−1} − (X_M − X̄_{M−1})(X_M − X̄_M) )
      = ((M−2)/(M−1)) V̄_{M−1} + (X_M − X̄_{M−1})²/M,   M ≥ 2.
(3) When X “already” has an N(0; 1) distribution, then (M − 1) V̄_M has a χ²(M − 1)-distribution. The χ²(ν)-distribution,
known as the χ²-distribution with ν ∈ N∗ degrees of freedom, is the distribution of the sum Z_1² + · · · + Z_ν² where
Z_1, . . . , Z_ν are i.i.d. with N(0; 1) distribution. The loss of one degree of freedom for V̄_M comes from the fact that
X_1, . . . , X_M and X̄_M satisfy a linear equation which induces a linear constraint. This result is known as Cochran’s
Theorem.
  √M (X̄_M − m_X)/√(V̄_M) = √M ((X̄_M − m_X)/σ_X) × (σ_X/√(V̄_M))  →(L)  N(0; 1)  as M → +∞.

Of course, within the usual range of Monte Carlo simulations, one can always consider that, for
large M,

  √M (X̄_M − m_X)/√(V̄_M) ≈ N(0; 1).
Note that when X is itself normally distributed, one shows that the empirical mean X̄_M and the
empirical variance V̄_M are independent, so that the true distribution of the left hand side of the
above (approximate) equation is a Student distribution with M − 1 degrees of freedom (4), denoted
T(M − 1).
Finally, one defines the confidence interval at level α of the Monte Carlo simulation by

  I_M = [ X̄_M − q_α √(V̄_M/M) , X̄_M + q_α √(V̄_M/M) ].
For numerical implementation one often considers q_α = 2, which corresponds to the confidence
level α = 95.44 % ≈ 95 %.
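As a minimal illustration, here is a Python sketch of a Monte Carlo estimator using the recursive updates of the empirical mean and variance given above and the resulting 95 % confidence interval; the simulated quantity (E max(Z, 0) for Z (d)= N(0; 1)) is purely illustrative:

import math, random

# Monte Carlo estimation of m_X = E[X] with online empirical mean/variance and a ~95% CI.
# Illustrative example: X = max(Z, 0), Z ~ N(0,1), so that m_X = 1/sqrt(2*pi) ~ 0.3989.
def simulate_X():
    return max(random.gauss(0.0, 1.0), 0.0)

M, mean, var = 100_000, 0.0, 0.0
for m in range(1, M + 1):
    x = simulate_X()
    old_mean = mean
    mean += (x - old_mean) / m                                    # recursive empirical mean
    if m >= 2:
        var = (m - 2) / (m - 1) * var + (x - old_mean) ** 2 / m   # recursive empirical variance

half_width = 2.0 * math.sqrt(var / M)                             # q_alpha = 2, i.e. alpha ~ 95%
print(mean, [mean - half_width, mean + half_width])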
The “academic” birth of the Monte Carlo method started in 1949 with the publication of a
seemingly seminal paper by Metropolis and Ulam, “The Monte Carlo method”, in the Journal of the
American Statistical Association (44, 335-341). In fact, the method had already been extensively used for
several years as a secret project of the U.S. Defense Department.
One can also consider that, in fact, the Monte Carlo method goes back to the celebrated
Buffon’s needle experiment and should subsequently be credited to the French naturalist Georges
Louis Leclerc, Comte de Buffon (1707-1788).
As concerns Finance, and more precisely the pricing of derivatives, it seems difficult to trace the
origin of the implementation of the Monte Carlo method for the pricing and hedging of options. In the
academic literature, the first academic paper dealing in a systematic manner with a Monte Carlo
approach seems to go back to Boyle in [28].
(4) The Student distribution with ν degrees of freedom, denoted T(ν), is the law of the random variable
√ν Z_0 / √(Z_1² + · · · + Z_ν²) where Z_0, Z_1, . . . , Z_ν are i.i.d. with N(0; 1) distribution. As expected for the coherence of what
precedes, one shows that T(ν) converges in distribution to the normal distribution N(0; 1) as ν → +∞.
  X_t^0 = e^{rt},   X_t^i = x_0^i exp( (r − σ_i²/2) t + σ_i W_t^i ),   i = 1, 2.
A European vanilla option with maturity T > 0 is an option related to a European payoff
hT := h(XT )
which only depends on X at time T . In such a complete market the option premium at time 0 is
given by
V0 = e−rT E(h(XT ))
The fact that W has independent stationary increments implies that X^1 and X^2 have indepen-
dent stationary ratios, i.e.

  X_T^i / X_t^i (d)= X_{T−t}^i / x_0^i   is independent of F_t^W.
As a consequence, if
V0 := v(x0 , T ),
(5) One shows, owing to the 0-1 Kolmogorov law, that this filtration is right continuous, i.e. F_t = ∩_{s>t} F_s. A
right continuous filtration which contains the P-negligible sets satisfies the so-called “usual conditions”.
then
There is a closed form for this option which is but the celebrated Black-Scholes formula for an
option on a stock (without dividend):

  Call_0^BS = C(x_0, K, r, σ_1, T) = x_0 Φ_0(d_1) − e^{−rT} K Φ_0(d_2)    (2.2)

  with  d_1 = ( log(x_0/K) + (r + σ_1²/2) T ) / (σ_1 √T),   d_2 = d_1 − σ_1 √T.
  h_T = ( max(X_T^1, X_T^2) − K )_+.
A quasi-closed form is available involving the distribution function of the bivariate (correlated)
normal distribution. Laziness may lead to price it by MC (PDE is also appropriate but needs more
care) as detailed below.
  h_T = ( (X_T^1 − X_T^2) − K )_+.
For this payoff no closed form is available. One has the choice between a PDE approach (quite
appropriate in this 2-dimensional setting but requiring some specific developments) and a Monte
Carlo simulation.
We will illustrate below the regular Monte Carlo procedure on the example of a Best-of call.
functions of independent uniformly distributed random variables, namely a centered and normalized
Gaussian. In our case, it amounts to writing
  e^{−rT} h_T (d)= ϕ(Z^1, Z^2)
  := ( max( x_0^1 exp(−σ_1²T/2 + σ_1√T Z^1), x_0^2 exp(−σ_2²T/2 + σ_2√T (ρZ^1 + √(1−ρ²) Z^2)) ) − K e^{−rT} )_+

where Z = (Z^1, Z^2) (d)= N(0; I_2) (the dependence of ϕ on x_0^i, etc., is dropped). Then, simulating an
M-sample (Z_m)_{1≤m≤M} of the N(0; I_2) distribution using e.g. the Box-Müller method yields the estimate

  Best-of Call_0 = e^{−rT} E( (max(X_T^1, X_T^2) − K)_+ )
                 = E(ϕ(Z^1, Z^2)) ≈ ϕ̄_M := (1/M) ∑_{m=1}^{M} ϕ(Z_m).
One computes an estimate of the variance using the same sample:

  V̄_M(ϕ) = (1/(M−1)) ∑_{m=1}^{M} ϕ(Z_m)² − (M/(M−1)) ϕ̄_M²  ≈  Var(ϕ(Z))
since M is large enough. Then one designs a confidence interval for E ϕ(Z) at level α ∈ (0, 1) by
setting

  I_M^α = [ ϕ̄_M − q_α √(V̄_M(ϕ)/M) , ϕ̄_M + q_α √(V̄_M(ϕ)/M) ]

where q_α is defined by P(|N(0; 1)| ≤ q_α) = α (or equivalently 2Φ_0(q_α) − 1 = α).
Numerical Application: We consider a European “Best-of-Call” option with the following parameters:

  r = 0.1,  σ_i = 0.2 = 20 %,  ρ = 0,  X_0^i = 100,  T = 1,  K = 100

(being aware that the assumption ρ = 0 is not realistic). The confidence level is set (as usual) at
α = 0.95.
The Monte Carlo simulation parameters are M = 2^m, m = 10, . . . , 20 (have in mind that 2^10 =
1024). The (typical) results of a numerical simulation are reported in the table below.
  M                 ϕ̄_M       I_M
  2^10 = 1024       18.8523    [17.7570 ; 19.9477]
  2^11 = 2048       18.9115    [18.1251 ; 19.6979]
  2^12 = 4096       18.9383    [18.3802 ; 19.4963]
  2^13 = 8192       19.0137    [18.6169 ; 19.4105]
  2^14 = 16384      19.0659    [18.7854 ; 19.3463]
  2^15 = 32768      18.9980    [18.8002 ; 19.1959]
  2^16 = 65536      19.0560    [18.9158 ; 19.1962]
  2^17 = 131072     19.0840    [18.9849 ; 19.1831]
  2^18 = 262144     19.0359    [18.9660 ; 19.1058]
  2^19 = 524288     19.0765    [19.0270 ; 19.1261]
  2^20 = 1048576    19.0793    [19.0442 ; 19.1143]
Figure 2.1: Black-Scholes Best-of Call. The Monte Carlo estimator (×) and its confidence
interval (+) at level α = 95 %.
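As a minimal illustration (a sketch, with the parameters of the numerical application above; the N(0; 1) draws use Python's random.gauss for brevity instead of an explicit Box-Müller implementation), the Monte Carlo pricing of the Best-of Call with its 95 % confidence interval may be written as follows:

import math, random

# Monte Carlo pricing of the Best-of Call in the two-dimensional Black-Scholes model,
# with r=0.1, sigma_i=0.2, rho=0, x0_i=100, T=1, K=100 as in the numerical application.
r, sig1, sig2, rho, x01, x02, T, K = 0.1, 0.2, 0.2, 0.0, 100.0, 100.0, 1.0, 100.0
M = 2 ** 16

def phi(z1, z2):                               # discounted payoff written on (Z^1, Z^2) ~ N(0, I_2)
    z2c = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
    a = x01 * math.exp(-0.5 * sig1 ** 2 * T + sig1 * math.sqrt(T) * z1)
    b = x02 * math.exp(-0.5 * sig2 ** 2 * T + sig2 * math.sqrt(T) * z2c)
    return max(max(a, b) - K * math.exp(-r * T), 0.0)

s, s2 = 0.0, 0.0
for _ in range(M):
    p = phi(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
    s, s2 = s + p, s2 + p * p

price = s / M
var = (s2 - M * price ** 2) / (M - 1)          # empirical variance of phi(Z)
hw = 2.0 * math.sqrt(var / M)                  # half-width of the ~95% confidence interval
print(price, [price - hw, price + hw])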
Remark. Once the script is written for one option, i.e. one payoff function, it is almost instan-
taneous to modify it to price another option based on a new payoff function: the Monte Carlo
method is very flexible, much more so than a PDE approach.
We will see further on in Chapter 3 how to specify the size M of the simulation to comply with
an a priori accuracy level.
Remarks. • The local version of the above theorem can be necessary to prove the differentiability
of a function defined by an expectation over the whole real line (see exercise below).
• All that precedes remains true if one replaces the probability space (Ω, A, P) by any measure
space (E, E, µ) where µ is a non-negative measure (see again Chapter 8 in [30]). However, this ex-
tension is no longer true as stated when working under the uniform integrability assumption mentioned
in the exercises below.
• Some variants of the result can be established to get a theorem for right or left differentiability
of functions Φ = Eω ϕ(x, ω) defined on the real line, (partially) differentiable functions defined on
Rd , for holomorphic functions on C, etc.
• There exists a local continuity result for such functions Φ defined as an expectation, which is quite
similar to Claim (a). The domination property by an integrable non-negative random variable Y is
requested on ϕ(x, ω) itself. A precise statement is provided in Chapter 11 (with the same notations).
For a proof we still refer to [30], Chapter 8. This result is in particular useful to establish the
differentiability of a multi-variable function by combining the existence and the continuity of its
partial derivatives.
Exercise. Let Z (d)= N(0; 1) be defined on a probability space (Ω, A, P), ϕ(x, ω) = (x − Z(ω))_+ and
Φ(x) = E (x − Z)_+, x ∈ R.
(a) Show that Φ is differentiable on the real line by applying the local version of Theorem 2.1 and
compute its derivative.
(b) Let I denote a non-trivial interval of R. Show that if ω ∈ {Z ∈ I} (i.e. Z(ω) ∈ I), the
function x ↦ (x − Z(ω))_+ is not differentiable on the whole interval I.
Exercises (extension to uniform integrability). One can replace the domination property
(iii) in Claim (a) (local version) of the above Theorem 2.1 by the less stringent uniform integrability
assumption

  (iii)_ui   the family ( (ϕ(x, .) − ϕ(x_0, .)) / (x − x_0) )_{x ∈ I \ {x_0}}  is P-uniformly integrable on (Ω, A, P).
For the definition and some background on uniform integrability, see Chapter 11, Section 11.5.
1. Show that (iii) implies (iii)_ui.
2. Show that (i)-(ii)-(iii)ui implies the conclusion of Claim (a) (local version) in the above Theo-
rem 2.1.
3. State a “uniform integrable” counterpart of (iii)glob to extend Claim (b) (global version) of
Theorem 2.1.
4. Show that uniform integrability of the above family of random variables follows from its Lp -
boundedness for (any) p > 1.
where ϕ : (0, +∞) → R lies in L¹(P_{X_T^x}) for every x ∈ (0, ∞) and T > 0. This corresponds (when ϕ
is non-negative) to vanilla payoffs with maturity T. However, we skip on purpose the discounting
factor in what follows to alleviate notations: one can always imagine it is included as a constant in
the function ϕ below since we work at a fixed time T. The updating of formulas is obvious.
First, we are interested in computing the first two derivatives Φ′(x) and Φ″(x) of the function
Φ which correspond (up to the discounting factor) to the δ-hedge of the option and its γ parameter
respectively. The second parameter γ is involved in the so-called “tracking error”. But other
sensitivities are of interest to the practitioners like the vega i.e. the derivative of the (discounted)
function Φ with respect to the volatility parameter, the ρ (derivative with respect to the interest
rate r), etc. The aim is to derive some representation of these sensitivities as expectations in order
to compute them using a Monte Carlo simulation, in parallel with the premium computation. Of
course the Black-Scholes model is simply a toy model to illustrate such an approach since, for
this model, many alternative and more efficient methods can be implemented to carry out these
computations.
We will first work on the scenarii space (Ω, A, P), because this approach contains the “seed” of
methods that can be developed in much more general settings in which the SDE has no explicit
solution like it has in the Black-Scholes model. On the other hand, as soon as an explicit expression
is available for the density pT (x, y) of XTx , it is more efficient to use the next section 2.3.3.
Proposition 2.1 (a) If ϕ : (0, +∞) → R is differentiable and ϕ′ has polynomial growth, then the
function Φ defined by (2.3) is differentiable and

  ∀ x > 0,  Φ′(x) = E[ ϕ′(X_T^x) X_T^x / x ].    (2.4)

(b) If ϕ : (0, +∞) → R is differentiable outside a countable set and is locally Lipschitz continuous
with polynomial growth in the following sense

  ∃ m ≥ 0, ∀ u, v ∈ R_+,  |ϕ(u) − ϕ(v)| ≤ C|u − v|(1 + |u|^m + |v|^m),

then Φ is differentiable everywhere on (0, ∞) and Φ′ is given by (2.4).
(c) If ϕ : (0, +∞) → R is simply a Borel function with polynomial growth, then Φ is still differen-
tiable and

  ∀ x > 0,  Φ′(x) = E[ ϕ(X_T^x) W_T / (xσT) ].    (2.5)
Proof. (a) This straightforwardly follows from the explicit expression for X_T^x and the above
differentiation Theorem 2.1 (global version) since, for every x ∈ (0, ∞),

  ∂/∂x ϕ(X_T^x) = ϕ′(X_T^x) ∂X_T^x/∂x = ϕ′(X_T^x) X_T^x/x.
Now |ϕ′(u)| ≤ C(1 + |u|^m) (m ∈ N, C ∈ (0, ∞)) so that, if 0 < x ≤ L < +∞,

  | ∂/∂x ϕ(X_T^x) | ≤ C′_{r,σ,T} (1 + L^m) exp( (m + 1)σ|W_T| ) ∈ L¹(P)

where C′_{r,σ,T} is another positive real constant. This yields the domination condition for the derivative.
(b) This claim follows from Theorem 2.1(a) (local version) and the fact that, for every T > 0,
P(XTx = ξ) = 0 for every ξ ≥ 0.
(c) Now, still under the assumptions of (a) (with µ := r − σ²/2),

  Φ′(x) = ∫_R ϕ′( x exp(µT + σ√T u) ) exp(µT + σ√T u) e^{−u²/2} du/√(2π)
        = (1/(xσ√T)) ∫_R ∂/∂u [ ϕ( x exp(µT + σ√T u) ) ] e^{−u²/2} du/√(2π)
        = − (1/(xσ√T)) ∫_R ϕ( x exp(µT + σ√T u) ) ∂/∂u [ e^{−u²/2} ] du/√(2π)
        = (1/(xσ√T)) ∫_R ϕ( x exp(µT + σ√T u) ) u e^{−u²/2} du/√(2π)
        = (1/(xσT)) ∫_R ϕ( x exp(µT + σ√T u) ) √T u e^{−u²/2} du/√(2π)
        = E[ ϕ(X_T^x) W_T/(xσT) ]
where we used an integration by parts to obtain the third equality, taking advantage of the fact
that, owing to the polynomial growth assumption on ϕ,

  lim_{|u|→+∞} ϕ( x exp(µT + σ√T u) ) e^{−u²/2} = 0.  ♦
Remark. We will see in the next section a much quicker way to establish claim (c). The above
method of proof, based on an integration by parts, can be seen as a toy-introduction to a systematic
way to produce random weights like W_T/(xσT) as a substitute for the differentiation of the function
ϕ, especially when the differential does not exist. The most general extension of this approach,
developed on the Wiener space (6) for functionals of the Brownian motion, is known as the Malliavin-
Monte Carlo method.
Exercise (Extension to Borel functions with polynomial growth). (a) Show that as
soon as ϕ is a Borel function with polynomial growth, the function Φ defined by (2.3) is continuous.
[Hint: use that the distribution of X_T^x has a probability density p_T(x, y) on the positive real line which
continuously depends on x and apply the continuity theorem for functions defined by an integral,
see Theorem 11.3(a) in the “Miscellany” Chapter 11.]
(b) Show that (2.6) holds true as soon as ϕ is a bounded Borel function. [Hint: Apply the Functional
Monotone Class Theorem (see Theorem 11.5 in the “Miscellany” Chapter 11) to an appropriate
vector subspace of functions ϕ and use the Baire σ-field Theorem.]
(c) Extend the result to Borel functions ϕ with polynomial growth [Hint: use that ϕ(XTx ) ∈ L1 (P)
and ϕ = limn ϕn with ϕn = (n ∧ ϕ ∨ (−n))].
(d) Derive from what precedes a simple expression for Φ(x) when ϕ = 1I is the indicator function
of the interval I.
Comments: The extension to Borel functions ϕ always needs at some place an argument based
on the regularizing effect of the diffusion induced by the Brownian motion. As a matter of fact if
(6) The Wiener space C(R_+, R^d) and its Borel σ-field for the topology of uniform convergence on compact sets,
namely σ(ω ↦ ω(t), t ∈ R_+), endowed with the Wiener measure, i.e. the distribution of a standard d-dimensional
Brownian motion (W^1, . . . , W^d).
Xtx were the solution to a regular ODE this extension to non-continuous functions ϕ would fail.
We propose in the next section an approach (the log-likelihood method) directly based on this
regularizing effect through the direct differentiation of the probability density pT (x, y) of XTx .
Note that the assumptions of claim (b) are satisfied by usual payoff functions like ϕ_Call(x) = (x − K)_+
or ϕ_Put(x) := (K − x)_+ (when X_T^x has a continuous distribution). In particular, this shows that
The computation of this quantity – which is part of that of the Black-Scholes formula – finally
yields as expected:

  ∂ E(ϕ_Call(X_T^x)) / ∂x = e^{rT} Φ_0(d_1)

(keep in mind that the discounting factor is missing).
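As an illustration, here is a minimal Monte Carlo sketch in Python of the two representations of the δ for the Call payoff, the pathwise one (2.4) (with ϕ′(u) = 1_{{u>K}}) and the weighted one (2.5); the parameters are purely illustrative:

import math, random

# Monte Carlo estimators of Phi'(x) = d/dx E[(X_T^x - K)_+] (discounting omitted, as in the text):
#   pathwise estimator (2.4):  1_{X_T^x > K} * X_T^x / x
#   weighted estimator (2.5):  (X_T^x - K)_+ * W_T / (x * sigma * T)
x, K, r, sigma, T, M = 100.0, 100.0, 0.1, 0.2, 1.0, 100_000

s_path, s_weight = 0.0, 0.0
for _ in range(M):
    w = math.sqrt(T) * random.gauss(0.0, 1.0)                       # W_T
    xT = x * math.exp((r - 0.5 * sigma ** 2) * T + sigma * w)       # X_T^x
    s_path += xT / x if xT > K else 0.0
    s_weight += max(xT - K, 0.0) * w / (x * sigma * T)

print(s_path / M, s_weight / M)    # both approximate e^{rT} Phi_0(d_1)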
Exercises. 0. A comparison. Try a direct differentiation of the Black-Scholes formula (2.2) and
compare with a (formal) differentiation based on Theorem 2.1. You should find by both methods

  ∂ Call_0^BS / ∂x (x) = Φ_0(d_1).

But the true question is: “how long did it take you to proceed?”
1. Application to the computation of the γ (i.e. Φ″(x)). Show that, if ϕ is differentiable with a
derivative having polynomial growth,

  Φ″(x) = (1/(x²σT)) E[ (ϕ′(X_T^x) X_T^x − ϕ(X_T^x)) W_T ].

Extend this identity to the case where ϕ is simply Borel with polynomial growth. Note that a
(somewhat simpler) formula also exists when the function ϕ is itself twice differentiable but such
an assumption is not realistic for financial applications.
2. Variance reduction for the δ (7). The above formulae are clearly not the unique representations
of the δ as an expectation: using that E W_T = 0 and E X_T^x = x e^{rT}, one derives immediately that

  Φ′(x) = ϕ′(x e^{rT}) e^{rT} + E[ (ϕ′(X_T^x) − ϕ′(x e^{rT})) X_T^x / x ]

(7) In this exercise we slightly anticipate the next chapter, entirely devoted to variance reduction.
4. Testing the variance reduction if any. Although the former two exercises are entitled “variance
reduction” the above formulae do not guarantee a variance reduction at a fixed time T . It seems
intuitive that they do only when the maturity T is small. Do some numerical experiments to test
whether or not the above formulae induce some variance reduction.
When the maturity increases, test whether the regression method introduced in Chapter 3,
Section 3.2 works or not with these “control variates”.
5. Computation of the vega (8). Show likewise that E(ϕ(X_T^x)) is differentiable with respect to the
volatility parameter σ under the same assumptions on ϕ, namely

  ∂/∂σ E(ϕ(X_T^x)) = E[ ϕ′(X_T^x) X_T^x (W_T − σT) ]

if ϕ is differentiable with a derivative with polynomial growth. Derive without any further compu-
tations – but with the help of the previous exercises – that

  ∂/∂σ E(ϕ(X_T^x)) = E[ ϕ(X_T^x) ( W_T²/(σT) − W_T − 1/σ ) ]

if ϕ is simply Borel with polynomial growth. [Hint: use the former exercises.]
This derivative is known (up to an appropriate discounting) as the vega of the option related
to the payoff ϕ(X_T^x). Note that the γ and the vega of a Call satisfy

  vega(x, K, r, σ, T) = x²σT γ(x, K, r, σ, T).
In fact the beginning of this section can be seen as an introduction to the so-called tangent
process method (see the section that ends this chapter and section 9.2.2 in Chapter 9).
In fact, one may imagine in full generality that X_T^x depends on a real parameter of interest θ:
thus, X_T^x = X_T^x(θ) may be the solution at time T to a stochastic differential equation
whose coefficients depend on θ. This parameter can be the starting value x itself or, more specifically in
a Black-Scholes model, the volatility σ, the interest rate r, the maturity T or even the strike K of
a Call or Put option written on the risky asset X^x.
Formally, one has Z
Φ(θ) = E(ϕ(XTx (θ))) = ϕ(y)pT (θ, x, y)µ(dy)
R
so that, formally,
Z
0 ∂pT
Φ (θ) = ϕ(y) (θ, x, y)µ(dy)
R ∂θ
∂pT
(θ, x, y)
Z
∂θ
= ϕ(y) p (θ, x, y)µ(dy)
R pT (θ, x, y) T
x ∂ log pT x
= E ϕ(XT ) (θ, x, XT ) . (2.7)
∂θ
A multi-dimensional version of this result can be established the same way round. However, this
straightforward and simple approach to “greek” computation remains limited beyond the Black-
Scholes world by the fact that it needs to have access not only to the regularity of the probability
density pT (x, y, θ) of the asset at time T but also to its explicit expression in order to include it in
a simulation process.
A solution in practice is to replace the true process X by an approximation, typically its Euler
scheme, with step Tn , (X̄txn )0≤k≤n where X̄0x = x and tnk := kT
n (see Chapter 7 for details). Then,
k
under some natural ellipticity assumption (non degeneracy of the volatility) this Euler scheme
starting at x (with step = T /n) does have a density p̄ kT (x, y), and more important its increments
n
have conditional densities which makes possible to proceed using a Monte Carlo simulation.
An alternative is to “go back” to the “scenarii” space Ω. Then, some extensions of the first two
approaches are possible: if the function and the diffusion coefficients (when the risky asset prices
follow a Brownian diffusion) are smooth enough and the payoff function is smooth too, one relies on
the tangent process (derivative of the process with respect to its starting value, or more generally
with respect to one of its structure parameters, see below).
When the payoff function has no regularity, a more sophisticated method is to introduce some
Malliavin calculus methods which correspond to a differentiation theory with respect to the generic
Brownian paths. An integration by parts formula plays the role of the elementary integration by
parts used in Section 2.3.2. This second topic is beyond the scope of the present course, although
some aspects will be mentioned through Bismut and Clark-Occone formulas.
2.3. GREEKS (SENSITIVITY TO THE OPTION PARAMETERS): A FIRST APPROACH 47
Exercises: 0. Provide simple assumptions to justify the above formal computations in (2.7), at
some point θ0 or for all θ running over a non empty pen interval Θ of R (or domain of Rd if θ is
vector valued). [Hint: use the remark right below Theorem 2.1.]
1. Compute the probability density pT (σ, x, y) of XTx,σ in a Black-Scholes model (σ > 0 stands for
the volatility parameter).
2. Re-establish all the sensitivity formulae established in the former section 2.3.2 (including the
exercises at the end of the section) using this approach.
3. Apply these formulae to the case ϕ(x) := e−rT (x − K)+ and retrieve the classical expressions
for the greeks in a Black-Scholes model: the δ, the γ and the vega.
Variance reduction
mX = E X ∈ R
49
50 CHAPTER 3. VARIANCE REDUCTION
As a first conclusion, this shows that, a confidence level being fixed, the size of a Monte Carlo
simulation grows linearly with the variance of X for a given accuracy and quadratically as the
inverse of the prescribed accuracy for a given variance.
Variance reduction: (not so) naive approach. Assume now that we know two random
variables X, X 0 ∈ L2R (Ω, A, P) satisfying
Practitioner’s corner. Usually, the problem appears as follows: there exists a random
variable Ξ ∈ L2R (Ω, A, P) such that
(i) E Ξ can be computed at a very low cost by a deterministic method (closed form, numerical
analysis method),
(ii) the random variable X − Ξ can be simulated with the same cost (complexity) than X,
(iii) the variance Var(X − Ξ) < Var(X).
Definition 3.1 A random variable Ξ satisfying (i)-(ii)-(iii) is called a control variate for X.
Exercise. Show that if the simulation process of X and X − Ξ have complexity κ and κ0
respectively, then (iii) becomes
(iii)0 κ0 Var(X − Ξ) < κVar(X).
Example (Toy-): In the previous chapter, we showed that, in B-S model, if the payoff function
is differentiable outside a countable set and locally Lipschitz continuous with polynomial growth
at infinity then the function Φ(x) = E ϕ(XTx ) is differentiable on (0, +∞) and
Xx W
Φ0 (x) = E ϕ0 (XTx ) T = E ϕ(XTx ) T .
x xσT
3.1. THE MONTE CARLO METHOD REVISITED: STATIC CONTROL VARIATE 51
So we have at hand two formulas for Φ0 (x) that can be implemented in a Monte Carlo simulation.
Which one should we choose to compute Φ0 (x) (i.e. the δ-hedge up to an e−rT -factor)? Since they
have the same expectations, the two random variables should be discriminated through (the square
of) their L2 - norm, namely
X x 2
0 x x W T 2
E ϕ (XT ) T and E ϕ(XT ) .
x xσT
It is clear, owing to the LIL for the Brownian motion that at 0 that if ϕ(x) 6= 0,
W 2
lim inf E ϕ(XTx ) T = +∞
T →0 xσT
whereas
X x 2
lim inf E ϕ0 (XTx ) T = +∞.
T →+∞ x
As a consequence the first formula is more appropriate for short maturities whereas the second
formula has less variance for long maturities. This appears somewhat as an exception: usually
Malliavin-Monte Carlo weights tend to introduce variance, not only for short maturities. One way
to partially overcome this drawback is to introduce some localization methods (see e.g. [26] for an
in-depth analysis or Section 9.4.3 for an elementary introduction).
(iii)” 0≤Ξ≤X
can be considered as a good candidate to reduce the variance, especially if Ξ is close to X so that
X − Ξ is “small”.
However, note that it does not imply (iii). Here is a trivial counter-example: let X ≡ 1, then
Var(X) = 0 whereas a uniformly distributed random variable Ξ on [1 − η, 1], 0 < η < 1, will
(almost) satisfy (i)-(ii) but Var(X − Ξ) > 0. . . Consequently, this variant is only a heuristic method
to reduce the variance which often works, but with no a priori guarantee.
Proposition 3.1 (Jensen Inequality) Let X : (Ω, A, P) → R be a random variable and let
g : R → R be a convex function. Suppose X and g(X) are integrable. Then, for any sub-σ-field B
of A,
g (E(X | B)) ≤ E (g(X) | B) P-a.s.
In particular considering B = {∅, Ω} yields the above inequality for regular expectation i.e.
g E(X) ≤ E g(X).
Jensen inequality is an efficient tool to design control variate when dealing with path-dependent
or multi-asset options as emphasized by the following examples:
The motivationP for this example is that in a (possibly correlated) d-dimensional Black-Scholes
model (see below), 1≤i≤d αi log(XTi,xi ) still has a normal distribution so that the Call like European
option written on the payoff
P i,xi
kT := e 1≤i≤d αi log(XT ) − K
+
j=1
3.1. THE MONTE CARLO METHOD REVISITED: STATIC CONTROL VARIATE 53
where
q
X
σi.2 = 2
σij , i = 1, . . . , d.
j=1
Exercise. Show that if the matrix σσ ∗ is positive definite (then q ≥ d) and one may assume
without modifying the model that X i,xi only depends on the first i components of a d-dimensional
standard Brownian motion. [Hint: think to Cholesky decomposition in Section 1.6.2.]
Now, let us describe the two phases of the variance reduction procedure:
– Phase I: Ξ = e−rT kT as a pseudo-control variate and computation of its expectation E Ξ.
The vanilla call option has a closed form in a Black-Scholes model and elementary computations
show that
X d
1 X
αi log(XTi,xi /xi ) = N r − αi σi.2 T ; α∗ σσ ∗ α T
2
1≤i≤d 1≤i≤d
d
!
1 P 2 ∗ ∗
√
xαi i e− 2 1≤i≤d αi σi. −α σσ α T
Y
−rT
e E kT = CallBS , K, r, α∗ σσ ∗ α, T .
i=1
This task clearly amounts to simulating M independent copies WT(m) = (WT1,(m) , . . . , WTq,(m) ),
m = 1, . . . , M of the d-dimensional standard Brownian motion W at time T i.e. M independent
(m) (m) d √
copies Z (m) = (Z1 , . . . , Zq ) of the N (0; Iq ) distribution in order to set WT(m) = T Z (m) ,
m = 1, . . . , M .
The resulting pointwise estimator of the premium is given, with obvious notations, by
m d
!
e−rT X (m) 1 P 2 ∗ ∗
√
xαi i e− 2 1≤i≤d αi σi. −α σσ α T
Y
hT − kT(m) + CallBS
, K, r, α∗ σσ ∗ α, T .
M
m=1 i=1
54 CHAPTER 3. VARIANCE REDUCTION
P
Remark. The extension to more general payoffs of the form ϕ α X i,xi is straightfor-
1≤i≤d i T
ward provided ϕ is non-decreasing and a closed form exists for the vanilla option with payoff
P i,xi
αk log(XT )
ϕ e 1≤i≤d .
Exercise. Other ways to take advantage of the convexity of the exponential function can be
explored: thus one can start from
X
i,xi
X X XTi,xi
αi XT = αi xi α
ei
xi
1≤i≤d 1≤i≤d 1≤i≤d
αi xi
where α ei = P , i = 1, . . . , d. Compare on simulations the respective performances of these
k αk xk
different approaches.
2. Asian options and Kemna-Vorst control variate in a Black-Scholes model (see [80]).
Let Z T
1
hT = ϕ Xtx dt
T 0
be a generic Asian payoff where ϕ is non-negative, non-decreasing function defined on R+ and let
σ2
x
Xt = x exp r− t + σWt , x > 0, t ∈ [0, T ],
2
be a regular Black-Scholes dynamics with volatility σ > 0 and interest rate r. Then, (standard)
Jensen inequality applied to the probability measure T1 1[0,T ] (t)dt implies
Z T Z T
1 1
Xtx dt ≥ x exp (r − σ /2)t + σWt dt 2
T 0 T
0
σ T
Z
T
= x exp (r − σ 2 /2) + Wt dt .
2 T 0
Now Z T Z T Z T
Wt dt = T WT − sdWs = (T − s)dWs
0 0 0
so that Z T Z T
1 d 1 2 T
Wt dt = N 0; 2 s ds = N 0; .
T 0 T 0 3
This suggests to rewrite the right hand side of the above inequality in a “Black-Scholes asset” style,
namely:
1 T x 1 T
Z Z
αT 2
Xt dt ≥ x e exp (r − (σ /3)/2)T + σ Wt dt
T 0 T 0
where a straightforward computation shows that
r σ2
α=− + .
2 12
3.1. THE MONTE CARLO METHOD REVISITED: STATIC CONTROL VARIATE 55
1 σ2 1 T
2
Z
KV −( r2 + σ12 )T
kT := ϕ xe exp r − T +σ Wt dt
2 3 T 0
hT ≥ kTKV .
– Phase I: The random variable kTKV is an admissible control variate as soon as the vanilla
option related to the payoff ϕ(XTx ) has a closed form. Indeed if a vanilla option related to the payoff
ϕ(XTx ) has a closed form
e−rT E ϕ(XTx ) = Premiumϕ BS (x, r, σ, T ),
or any other numerical integration method, having in mind nevertheless that the (continuous)
functions f of interest are here given for the first and the second payoff function by
σ2
f (t) = ϕ x exp r− t + σWt (ω) and f (t) = Wt (ω)
2
respectively. Hence, their regularity is (almost) 21 -Hölder (in fact α-Hölder, for every α < 12 , locally
as for the payoff hT ). Finally, in practice it will amount to simulating independent copies of the
n-tuple
W T , . . . , W (2k−1)T , . . . , W (2n−1)T .
2n 2n 2n
from which one can reconstruct both a mid -point approximation of both integrals appearing in hT
and kT .
In fact one can improve this first approach by taking advantage of the fact that W is a Gaussian
process as detailed in the exercise below
56 CHAPTER 3. VARIANCE REDUCTION
Further developments to reduce the time discretization error are proposed in Section 8.2.6
(see [97] where an in-depth study of the Asian option pricing in a Black-Scholes model is carried
out).
with ak = 2(k−n)+1
2n T , k = 1, . . . , n. [Hint: we know that for gaussian vector conditional expectation
and affine regression coincide.]
(c) Propose a variance reduction method in which the pseudo-control variate e−rT kT will be simu-
lated exactly.
2. Check that what precedes can be applied to payoffs of the form
Z T
1 x x
ϕ Xt dt − XT
T 0
where ϕ is still a non-negative, non-decreasing function defined on R.
2. Best-of-call option. We consider the Best-of-call payoff given by
hT = max(XT1 , XT2 ) − K + .
(a) Using the convexity inequality (that can still be seen as an application of Jensen inequality)
√
ab ≤ max(a, b), a, b > 0,
show that q
kT := XT1 XT2 − K .
+
is a natural (pseudo-)control variate for hT .
(b) Show that, in a 2-dimensional Black-Scholes (possibly correlated) model (see expel in Sec-
tion 2.2), the premium of the option with payoff kT (known as geometric mean option) has a closed
form. Show that this closed form can be written as a Black-Scholes formula with appropriate
parameters (and maturity T ).
(c) Check on (at least one) simulation(s) that this procedure does reduce the variance (use the
parameters of the model specified in Section 2.2).
(d) When σ1 and σ2 are not equal, improve the above variance reduction protocol by considering a
parametrized family of (pseudo-)control variate, obtained from the more general inequality aθ b1−θ ≤
max(a, b) when θ ∈ (0, 1).
1
Var X + X 0
Var (χ) =
4
1
Var(X) + Var(X 0 ) + 2Cov(X, X 0 )
=
4
Var(X) + Cov(X, X 0 )
= .
2
The size M X (ε, α) and M χ (ε, α) of the simulation using X and χ respectively to enter a given
interval [m − ε, m + ε] with the same confidence level α is given, following (3.1), by
q 2 q 2
α α
MX = Var(X) with X and Mχ = X = Var(χ) with χ.
ε ε
Taking into account the complexity like in the exercise that follows Definition 3.1, that means in
terms of CP U computation time that one should better use χ if and only if
κχ M χ < κX M X ⇐⇒ 2κ M χ < κ M X
i.e.
2Var(χ) < Var(X).
Given the above inequality, this reads
Cov(X, X 0 ) < 0.
To use this remark in practice, one usually relies on the following result.
Proposition 3.2 (co-monotony) (a) Let Z : (Ω, A, P) → R be a random variable and let
ϕ, ψ : R → R be two monotone (hence Borel) functions with the same monotony. Assume that
ϕ(Z), ψ(Z) ∈ L2R (Ω, A, P). Then
Cov(ϕ(Z), ψ(Z)) ≥ 0.
If, mutatis mutandis, ϕ and ψ have opposite monotony, then
Cov(ϕ(Z), ψ(Z)) ≤ 0.
58 CHAPTER 3. VARIANCE REDUCTION
Furthermore, the inequity is strict holds as an equality if and only if ϕ(Z) = E ϕ(Z) P-a.s. or
ψ(Z) = E ψ(Z) P-a.s..
d
(b) Assume there exists a non-increasing (hence Borel) function T : R → R such that Z = T (Z).
Then X = ϕ(Z) and X 0 = ϕ(T (Z)) are identically distributed and satisfy
Cov(X, X 0 ) ≤ 0.
Proof. (a) Inequality. The functions Let Z, Z 0 be two independent random variables defined
on the same probability space (Ω, A, P) with distribution PZ (1 ). Then, using that ϕ and ψ are
monotone with the same monotony, we have
so that the expectation of this product is non-positive (and finite since all random variables are
square integrable). Consequently
d
Using that Z 0 = Z and that Z, Z 0 are independent, we get
that is
Cov ϕ(Z), ψ(Z) = E ϕ(Z)ψ(Z) − E ϕ(Z))E(ψ(Z) ≥ 0.
If the functions ϕ and ψ have opposite monotony, then
Now assume that ϕ(Z) is not P-a.s. constant. Then, ϕ cannot be constant on I and ϕ(a) < ϕ(b)
(with the above convention on atoms). Consequently, on Cε , ϕ(Z) − ϕ(Z 0 ) > 0 a.s. so that
ψ(Z) − Ψ(Z 0 ) = 0 P-a.s.; which in turn implies that ψ(a + ε) = ψ(b − ε). Then, letting ε and ε to
0, one derives that ψ(a) = ψ(b) (still having in mind the convention on atoms)). Finally this shows
that ψ(Z) is P-a.s. constant. ♦
(b) Set ψ = ϕ ◦ T so that ϕ and ψ have opposite monotony. Noting that X and X 0 do have the
same distribution and applying claim (a) completes the proof. ♦
Antithetic random variables method. This terminology is shared by two classical situations
in which the above approach applies:
d
– the symmetric random variable Z: Z = −Z (i.e. T (z) = −z).
d
– the [0, L]-valued random variable Z such that Z = L − Z (i.e. T (z) = L − z). This is satisfied
d
by U = U ([0, 1]) with L = 1.
The above one dimensional Proposition 3.2 admits a multi-dimensional extension that reads as
follows:
Theorem 3.1 Let d ∈ N∗ and let Φ, ϕ : Rd → R two functions satisfying the following joint
marginal monotony assumption: for every i ∈ {1, . . . , d}, for every (z1 , . . . , zi−1 , zi+1 , . . . , zd ) ∈
Rd−1 ,
which may depend on i and on (zi+1 , . . . , zd ). For every i ∈ {1, . . . , d}, let Z1 , . . . , Zd be independent
real valued random variables defined on a probability space (Ω, A, P) and let Ti : R → R, i =
d
1, . . . , d, be non-increasing functions such that Ti (Zi ) = Zi . Then, if Φ(Z1 , . . . , Zd ), ϕ(Z1 , . . . , Zd ) ∈
L2 (Ω, A, P),
Cov Φ(Z1 , . . . , Zd ), ϕ(T1 (Z1 ), . . . , Td (Zd )) ≤ 0.
Remarks. • This result may be successfully applied to functions f (X̄ T , . . . , X̄k T , . . . , X̄n T ) of
n n n
T
the Euler scheme with step n of a one-dimensional Brownian diffusion with a non-decreasing drift
60 CHAPTER 3. VARIANCE REDUCTION
and a deterministic strictly positive diffusion coefficient, provided f is “marginally monotonic” i.e.
monotonic in each of its variable with the same monotony. (We refer to Chapter 7 for an intro-
duction to the Euler scheme of a diffusion.) The idea is to rewrite the functional as a “marginally
monotonic” function of the n (independent) Brownian increments which play the role of the r.v.
Zi . Furthermore, passing to the limit as the step size goes to zero, yields some correlation results
for a class of monotone continuous functionals defined on the canonical space C([0, T ], R) of the
diffusion itself (the monotony should be understood with respect to the naive partial order.
• For more insight about this kind of co-monotony properties and their consequences for the pricing
of derivatives, we refer to [125].
Exercises. 1. Prove the above theorem by induction on d. [Hint: use Fubini’s Theorem.]
2. Show that if ϕ and ψ are non-negative Borel functions defined on R, monotone with opposite
monotony,
E ϕ(Z)ψ(Z) ≤ E ϕ(Z) E ψ(Z)
so that, if ϕ(Z), ψ(Z) ∈ L1 (P), then ϕ(Z)ψ(Z) ∈ L1 (P).
3. Use Proposition 3.2(a) and Proposition 2.1(b) to derive directly (in the Black-Scholes model)
from its representation as an expectation that the δ-hedge of a European option whose payoff
function is convex is non-negative.
The idea is simply to parametrize the impact of the control variate Ξ by a factor λ i.e. we set for
every λ ∈ R,
X λ = X − λ Ξ.
3.2. REGRESSION BASED CONTROL VARIATE 61
Cov(X, Ξ) E(X Ξ)
λmin := =
Var(Ξ) E Ξ2
0
Cov(X , Ξ) E(X 0 Ξ)
= 1+ =1+ .
Var(Ξ) E Ξ2
Consequently
so that
2
≤ min Var(X), Var(X 0 )
σmin
2
and σmin = Var(X) if and only if Cov(X, Ξ) = 0.
Remark. Note that Cov(X, Ξ) = 0 if and only if λmin = 0 i.e. Var(X) = min Φ(λ).
λ∈R
If we denote by ρX,Ξ the correlation coefficient between X and Ξ, one gets
2
σmin = Var(X)(1 − ρ2X,Ξ ) = Var(X 0 )(1 − ρ2X 0 ,Ξ ).
1 + ρX,X 0
≤ σX σX 0 .
2
where σX and σX 0 denote the standard deviations of X and X 0 respectively.
Ξk = Xk − Xk0 , Xkλ = Xk − λ Ξk .
62 CHAPTER 3. VARIANCE REDUCTION
M M
1 X 2 1 X
VM := Ξk , CM := Xk Ξk
M M
k=1 k=1
and
CM
λM := (3.2)
VM
with the convention λ0 = 0.
The “batch” approach. Definition of the batch estimator. The Strong Law of Large Numbers
implies that both
This suggests to introduce the batch estimator of m, defined for every size M ≥ 1 of the
simulation by
M
λM 1 X λM
XM = Xk .
M
k=1
M M
λ 1 X 1 X
X MM = Xk − λM Ξ̄k
M M
k=1 k=1
= X̄M − λM Ξ̄M (3.3)
M
λ 1 X λM a.s.
X MM = Xk −→ EX = m
M
k=1
2
and satisfies a CLT (asymptotic normality) with an optimal asymptotic variance σmin i.e.
√
λ
L 2
M X MM − m −→ N 0; σmin .
3.2. REGRESSION BASED CONTROL VARIATE 63
Remark. However, note that the batch estimator is a biased estimator of m since E λM Ξ̄M 6= 0.
Exercise. Let X̄m and Ξ̄m denote the empirical mean processes of the sequences (Xk )k≥ and
Ξk )k≥1 respectively. Show that the quadruplet (X̄M , Ξ̄M , CM , VM ) can be computed in a recursive
way from the sequence (Xk , Xk0 )k≥1 . Derive a recursive way to compute the batch estimator
Var(X λ )|λ=λ̂
M 0
whose asymptotic variance – given the first phase of the simulation – is . This
M − M0
approach is not recursive at all. On the other hand, the above resulting estimator satisfies a
CLT with asymptotic variance Φ(λ̂M 0 ) = Var(X λ )|λ=λ̂ . In particular we most likely observe a
M0
significant – although not optimal – variance reduction? So, from this point of view, you can stop
reading this section at this point. . . .
2
If Yn → c in probability and Zn → Z∞ in distribution then, Yn Zn → c Z∞ in distribution. In particular if c = 0
the last convergence holds in probability.
64 CHAPTER 3. VARIANCE REDUCTION
Theorem 3.2 Assume X, X 0 ∈ L2+δ (P) for some δ > 0. Let (Xk , Xk0 )k≥1 be an i.i.d. sequence
with the same distribution as (X, X 0 ). We set for every k ≥ 1
λ
e
is unbiased (E X M = m), convergent i.e.
λ
e a.s.
X M −→ m as M → +∞,
The rest of this section can be omitted at the occasion of a first reading although the method of proof
exposed below is quite standard when dealing with the efficiency of an estimator by martingale methods.
Proof. Step 1 (a.s. convergence): Let F0 = {∅, Ω} and let Fk := σ(X1 , X10 , . . . , Xk , Xk0 ), k ≥ 1, be the
filtration of the simulation.
First we show that (X ek − m)k≥1 is a sequence of square integrable (Fk , P)-martingale increments. It is
clear by construction that X ek is Fk -measurable. Moreover,
where we used that Ξk and λ̃k−1 are independent. Finally, for every k ≥ 1,
It follows from what precedes that the sequence (Nk )k≥1 is a square integrable ((Fk )k , P)-martingale since
(Xek − m)k≥1 , is a sequence of square integrable (Fk , P)-martingale increments. Its conditional variance
increment process (also known as “bracket process”) hN ik , k ≥ 1 given by
k
X E((Xe` − m)2 | F`−1 )
hN ik =
`2
`=1
k
X Φ(λ̃`−1 )
= .
`2
`=1
Now, the above series is a.s. converging since we know that Φ(λ̃k ) a.s. converges towards Φ(λmin ) as
kn → +∞ since λ̃k a.s. converges toward λmin and Φ is continuous. Consequently,
hN i∞ = a.s. lim hN iM < +∞ a.s.
M →+∞
i.e.
M
1 X e a.s.
X M :=
e Xk −→ m as M → ∞.
M
k=1
We will need this classical lemma (see Chapter 11 (Miscellany), Section 11.9 for a proof).
Lemma 3.1 (Kronecker Lemma) Let (an )n≥1 be a sequence of real numbers and let (bn )n≥1 be a non-
decreasing sequence of positive real numbers with limn bn = +∞. Then
!
X an n
1 X
converges in R as a series =⇒ ak −→ 0 as n → +∞ .
bn bn
n≥1 k=1
Step 2 (CLT , weak rate of convergence): It is a consequence of the Lindeberg Central Limit Theorem for
(square integrable) martingale increments (see Theorem 11.7 the Miscellany chapter or Theorem 3.2 and
its Corollary 3.1, p.58 in [70] refried to as “Lindeberg’s CLT ” in what follows). In our case, the array of
martingale increments is defined by
ek − m
X
XM,k := √ , 1 ≤ k ≤ M.
M
There are basically two assumptions to be checked. First the convergence of the conditional variance incre-
2
ment process toward σmin :
M M
X
2 1 X
E(XM,k | Fk−1 ) = E((Xek − m)2 | Fk−1 )
M
k=1 k=1
M
1 X
= Φ(λ̃k−1 )
M
k=1
2
−→ σmin := min Φ(λ).
λ
66 CHAPTER 3. VARIANCE REDUCTION
The second one is the so-called Lindeberg condition (see again Miscellany or [70], p.58) which reads in
our framework:
XM
2 P
∀ ε > 0, E(XM,` 1{|XM,` |>ε} | F`−1 ) −→ 0.
`=1
In turn, owing to conditional Markov inequality and the definition of XM,` , this condition classically follows
from the slightly stronger
sup E(|Xe − m|2+δ | F`−1 ) < +∞ P-a.s.
`
`≥1
since
M M
X
2 1 X
E(XM,` 1{|XM,` |>ε} | F`−1 ) ≤ δ E(|Xe` − m|2+δ | F`−1 ).
`=1 εδ M 1+ 2 `=1
This means that the expected variance reduction does occur if one implements the recursive approach
described above. ♦
e−rt St t∈[0,T ]
is a martingale on the scenarii space (Ω, A, P)
(with respect to the augmented filtration of (St )t∈[0,T ] ). This means that P is a risk-neutral proba-
bility (supposed to exist). furthermore, to comply with usual assumptions of AOA theory, we will
assume that this risk neutral probability is unique (complete market) to justify that we may price
any derivative under this probability. However this has no real impact on what follows.
Vanilla Call-Put parity (d = 1): We consider a Call and a Put with common maturity T
and strike K. We denote by
the premia of this Call and this Put option respectively. Since
(ST − K)+ − (K − ST )+ = ST − K
and e−rt St
t∈[0,T ]
is a martingale, one derives the classical Call-Put parity equation:
Asian Call-Put parity: We consider an :em asian Call and an Asian Put with common
maturity T , strike K and averaging period [T0 , T ], 0 ≤ T0 > T ].
Z T ! Z T !
−rT 1 −rT 1
CallAs
0 =e E St dt − K and PutAs
0 =e E K− St dt .
T − T0 T0 +
T − T0 T0 +
Still using that Set = e−rt St is a P-martingale and, this time, Fubini-Tonelli’s Theorem yield
1 − e−r(T −T0 )
CallAs As
0 − Put0 = s0 − e−rT K
r(T − T0 )
so that
0
CallAs
0 = E(X) = E(X )
with
Z T
−rT 1
X := e St dt − K
T − T0 T0 +
e−r(T −T0 ) T
1−
Z
0 −rT −rT 1
X := s0 −e K +e K− St dt .
r(T − T0 ) T − T0 T0 +
This leads to
T
1 − e−r(T −T0 )
Z
−rT 1
Ξ=e St dt − s0 .
T − T0 T0 r(T − T0 )
Remark. In both cases, the parity equation directly follows from the P-martingale property of
Set = e−rt St .
68 CHAPTER 3. VARIANCE REDUCTION
– Warning ! This is no longer true in general. . . and in a general setting the complexity of
the simulation of X and X 0 is double of that of X itself. Then the regression method is efficient iff
1
2
σmin < min(Var(X), Var(X 0 ))
2
(provided one neglects the cost of the estimation of the coefficient λmin ).
The exercise below shows the connection with antithetic variables which then appears as a
special case of regression methods.
Exercise (Connection with the antithetic variable methdod). Let X, X 0 ∈ L2 (P) such
that E X = E X 0 = m and Var(X) = Var(X 0 ).
(a) Show that
1
λmin = .
2
(b) Show that
X + X0
X λmin =
2
and
X + X0
1
Var(X) + Cov(X, X 0 ) .
Var =
2 2
Characterize the couple (X, X 0 ) with which the regression method does reduce the variance. Make
the connection with the antithetic method.
0.015
0.005
-0.005
-0.01
-0.015
-0.02
90 95 100 105 110 115 120
Strikes K=90,...,120
0.7
0.6
0.5
K ---> lambda(K)
0.4
0.3
0.2
0.1
90 95 100 105 110 115 120
Strikes K=90,...,120
Figure 3.2: Black-Scholes Calls: K 7→ 1 − λmin (K), K = 90, . . . , 120, for the Interpolated
synthetic Call.
The dynamics of the risky asset is this time a stochastic volatility model, namely the Heston
model, defined as follows. Let ϑ, k, a such that ϑ2 /(4ak) < 1 (so that vt remains a.s. positive,
see [91]).
√
dSt = St r dt + vt dWt1 ,
s0 = x0 > 0, (risky asset)
√
dvt = k(a − vt )dt + ϑ vt dWt2 , v0 > 0 with < W 1 , W 2 >t = ρ t, ρ ∈ [−1, 1].
Usually, no closed form are available for Asian payoffs even in the Black-Scholes model and so is
the case in the Heston model. Note however that (quasi-)closed forms do exist for vanilla European
70 CHAPTER 3. VARIANCE REDUCTION
18
14
12
10
4
90 95 100 105 110 115 120
Strikes K=90,...,120
options in this model (see [72]) which is the origin of its success. The simulation has been carried
out by replacing the above diffusion by an Euler scheme (see Chapter 7 for an introduction to
Euler scheme). In fact the dynamics of the stochastic volatility process does not fulfill the standard
Lipschitz continuous continuous assumptions required to make the Euler scheme converge at least
at its usual rate. In the present case it is even difficult to define this scheme because of the term
√
vt . Since our purpose here is to illustrate the efficiency of parity relations to reduce variance we
adopted a rather “basic” scheme, namely
r
rT q T p
S̄ kT = S̄ (k−1)T 1 + + |v̄ (k−1)T | ρUk2 + 1 − ρ2 Uk1 , S̄0 = s0 > 0
n n n n n
T q
v̄ kT = k a − v̄ (k−1)T + ϑ |v (k−1)T | Uk2 , v̄0 = v0 > 0
n n n n
where Uk = (Uk1 , Uk2 )k≥1 is an i.i.d. sequence of N (0, I2 )-distributed random vectors.
This scheme is consistent but its rate of convergence if not optimal. The simulation of the Heston
model has given rise to an extensive literature. Se e.g. among others Diop, Alfonsi, Andersen, etc.....
r = 0, σ = 0.3, x0 = 100, T = 1.
1. One considers a vanilla Call with strike K = 80. The random variable Ξ is defined as above.
Estimate the λmin (one should be not too far from 0.825). Then, compute a confidence interval for
3.3. APPLICATION TO OPTION PRICING: USING PARITY EQUATIONS TO PRODUCE CONTROL VARIAT
12
10
0
90 95 100 105 110 115 120
Strikes K=90,...,120
Figure 3.4: Heston Asian Calls. Standard Deviation (MC Premium). K = 90, . . . , 120.
–o–o–o– Crude Call. –∗–∗–∗– Synthetic Parity Call. –×–×–×– Interpolated synthetic Call.
0.9
0.8
0.7
K ---> lambda(K)
0.6
0.5
0.4
0.3
0.2
0.1
0
90 95 100 105 110 115 120
Strikes K=90,...,120
Figure 3.5: Heston Asian Calls. K 7→ 1 − λmin (K), K = 90, . . . , 120, for the Interpolated
Synthetic Asian Call.
the Monte Carlo pricing of the Call with and without the linear variance reduction for the following
sizes of the simulation: M = 5 000, 10 000, 100 000, 500 000.
2. Proceed as above but with K = 150 (true price 1.49). What do you observe ? Provide an
interpretation.
0.02
0.015
0.005
−0.005
−0.01
−0.015
−0.02
90 95 100 105 110 115 120
Strikes K=90,...,120
Figure 3.6: Heston Asian Calls. M = 106 (Reference: M C with M = 108 ). K = 90, . . . , 120.
–o–o–o– Crude Call. –∗–∗–∗– Parity Synthetic Call. –×–×–×– Interpolated Synthetic Call.
Remark. This approach also produces an optimal asset selection (since it is essentially a P CA)
which helps for hedging.
3.4 Pre-conditioning
The principle of the pre-conditioning method – also known as the Blackwell-Rao method – is
based on the very definition of conditional expectation: let (Ω, A, P) be probability space and
let X : (Ω, A, P) → R be a square integrable random variable. The practical constraint for
implementation is the ability to simulate at a competitive cost E(X | B). The examples below
show some typical situations where so is the case.
For every sub-σ-field B ⊂ A
E X = E (E(X | B))
and
where Z1 , Z2 are independent random vectors. Set B := σ(Z2 ). Then standard results on condi-
tional expectations show that
is a version of the conditional expectation of g(Z1 , Z2 ) given σ(Z2 ). At this stage, the pre-
conditionning method can be implemented as soon as the following conditions are satisfied:
– a closed form is available for the function G and
– the (distribution of) Z2 can be simulated with the same complexity as (the distribution of)
X.
σi2 i
Examples. 1. Exchange spread options. Let XTi = xi e(r− 2 )T +σi WT , xi , σi > 0, i = 1, 2, be two
“Black-Scholes” assets at time T related to two Brownian motions WTi , i = 1, 2, with correlation
ρ ∈ [−1, 1]. One considers an exchange spread options with strike K i.e. related to the payoff
ρ 2 σ1
2T √ 2 √
σ2 p
= CallBS x1 e− 2 +σ1 ρ T Z2 , x2 e(r− 2 )T +σ2 T Z2 + K, r, σ1 1 − ρ2 , T .
74 CHAPTER 3. VARIANCE REDUCTION
Then one takes advantage of the closed form available for vanilla call options in a Black-scholes
model to compute
PremiumBS (X01 , x20 , K, σ1 , σ2 , r, T ) = E E(e−rT hT | Z2 )
where U is uniformly distributed on [0, 1]ri (with ri ∈ N ∪ {∞}, the case ri = ∞ corresponding
to the acceptance-rejection method) and ϕi : [0, 1]ri → E is an (easily) computable function.
This second condition simply means that L(X | X ∈ Ai ) is easy to simulate on a computer. To
be more precise we implicitly assume in what follows that the the simulation of X and of the
conditional distribution L(X | X ∈ Ai ), i ∈ I, or equivalently the random vectors ϕi (U ) have the
same complexity. One must always keep that in mind since it is a major constraint for practical
implementations of stratification methods.
This simulability condition usually has a strong impact on the possible design of the strata.
For convenience, we will assume in what follows that ri = r.
Let F : (E, E) → (R, Bor(R)) such that E F 2 (X) < +∞,
X
E F (X) = E 1{X∈Ai } F (X)
i∈I
X
= pi E F (X) | X ∈ Ai
i∈I
X
= pi E F (ϕi (U )) .
i∈I
The stratification idea takes place now. Let M be the global “budget” allocated to the simula-
tion of E F (X). We split this budget into |I| groups by setting
Mi = qi M, i ∈ I,
3.5. STRATIFIED SAMPLING 75
be the allocated budget to compute E F (ϕi (U )) in each stratum “Ai ”. This leads to define the
following (unbiased) estimator
M
I X 1 Xi
(X)M :=
Fd pi F (ϕi (Uik ))
Mi
i∈I k=1
where (Uik )1≤k≤Mi ,i∈I are uniformly distributed on [0, 1]r , i.i.d. random variables. Then, elementary
computations show that
1 X p2i 2
I
Var F (X)M =
d σ
M qi F,i
i∈I
where, for every i ∈ I,
2
σF,i = Var F (ϕi (U )) = Var F (X) | X ∈ Ai
2
= E F (X) − E(F (X)|X ∈ Ai ) | X ∈ Ai
2
E F (X) − E(F (X)|X ∈ Ai ) 1{X∈Ai }
= .
pi
Optimizing the simulation allocation to each stratum amounts to solving the following minimization
problem ( )
X p2 X
2
min i
σF,i where PI := (qi )i∈I ∈ RI+ | qi = 1 .
(qi )∈PI qi
i∈I i∈I
A sub-optimal choice, but natural and simple, is to set
qi = pi , i ∈ I.
Such a choice is first motivated by the fact that the weights pi are known and of course because it
does reduce the variance since
X p2 X
i 2 2
σF,i = pi σF,i
qi
i∈I i∈I
X
E 1{X∈Ai } (F (X) − E(F (X)|X ∈ Ai ))2
=
i∈I
= kF (X) − E(F (X)|σ({X ∈ Ai }, i ∈ I))k22 . (3.4)
where σ({X ∈ Ai }, i ∈ I)) denotes the σ-field spanned by the partition {X ∈ Ai }, i ∈ I. Conse-
quently
X p2
σ ≤ kF (X) − E(F (X))k22 = Var(F (X))
i 2
(3.5)
qi F,i
i∈I
with equality if and only if E (F (X)|σ({X ∈ Ai }, i ∈ I)) = E F (X) P-a.s. Or, equivalently, equality
holds in this inequality if and only if
So this choice always reduces the variance of the estimator since we assumed that the stratification
is not trivial. It corresponds in the opinion poll world to the so-called quota method.
The optimal choice is the solution to the above constrained minimization problem. It follows
from a simple application of Schwarz Inequality (and its equality case) that
X X pi σF,i √
pi σF,i = √ qi
qi
i∈I i∈I
!1 !1
X p2i σF,i
2 2 X 2
≤ qi
qi
i∈I i∈I
!1
X p2i σF,i
2 2
= ×1
qi
i∈I
!1
X p2i σF,i
2 2
= .
qi
i∈I
Consequently, the optimal choice for the allocation parameters qi ’s i.e. the solution to the above
constrained minimization problem is given by
∗ pi σF,i
qF,i =P , i ∈ I,
j pj σF,j
d
Examples. Stratifications for the computation of E F (X), X = N (0; Id ), d ≥ 1.
(a) Stripes. Let v be a fixed unitary vector (a simple and natural choice for v is v = e1 =
(1, 0, 0, · · · , 0)): it is natural to define the strata as hyper-stripes perpendicular to the main axis
Re1 of X). So, we set, for a given size N of the stratification (I = {1, . . . , N }),
1
pi = P(X ∈ Ai ) = P(Z ∈ [yi−1 , yi ]) = Φ0 (yi ) − Φ0 (yi−1 ) =
N
where Φ0 denotes the distribution function of the N (0; 1)-distribution. Other choices are possible
for the yi , leading to a non uniform distribution of the pi ’s. The simulation of the conditional
distributions follows from the fact that
d d
where ξ1 = L(Z | Z ∈ [a, b]) is independent of ξ2 = N (0, Id−1 ),
d d
L(Z | Z ∈ [a, b]) = Φ−1
0 (Φ0 (b) − Φ0 (a))U + Φ0 (a) , U = U ([0, 1])
and πv⊥⊥ denotes the orthogonal projection on v ⊥ . When v = e1 , this reads simply
d
(b) Hyper-rectangles. We still consider X = (X 1 , . . . , X d ) = N (0; Id ), d ≥ 2. Let (e1 , . . . , ed ) denote
the canonical basis of Rd . We define the strata as hyper-rectangles. Let N1 , . . . , Nd ≥ 1.
d n
Y o d
Y
Ai := x ∈ Rd s.t. (e` |x) ∈ [yi`` −1 , yi`` ] , i∈ {1, . . . , N` }
`=1 `=1
i
where the yi` ∈ R are defined by Φ0 (yi` ) = N` , i = 0, . . . , N` . Then, for every multi-index i =
(i1 , . . . , id ) ∈ d`=1 {1, . . . , N` },
Q
d
O
L(X | X ∈ Ai ) = L(Z | Z ∈ [yi`` −1 , yi`` ]). (3.6)
`=1
Optimizing the allocation to each stratum in the simulation for a given function F in order to
to reduce the variance is of course interesting and can be highly efficient but with the drawback to
be strongly F -dependent, especially when this allocation needs an extra procedure like in [45]. An
alternative and somewhat dual approach is to try optimizing the strata themselves uniformly with
respect to a class of functions F (namely Lipschitz continuous continuous functions) prior to the
allocation of the allocation.
This approach emphasizes the connections between stratification and optimal quantization and
provides bounds on the best possible variance reduction factor that can be expected from a strati-
fication. Some elements are provided in Chapter 5, see also [38] for further developments in infinite
dimension.
78 CHAPTER 3. VARIANCE REDUCTION
PX = f.µ.
In practice, we will have to simulate several r.v., whose distributions are all absolutely continuous
with respect to this reference measure µ. For a first reading one may assume that E = R and µ is
the Lebesgue measure but what follows can also be applied to more general measured spaces like
the Wiener space (equipped with the Wiener measure), etc. Let h ∈ L1 (PX ). Then,
Z Z
E h(X) = h(x)PX (dx) = h(x)f (x)µ(dx).
E E
Now for any µ-a.s. positive probability density function g defined on (E, E) (with respect to µ),
one has Z Z
h(x)f (x)
E h(X) = h(x)f (x)µ(dx) = g(x)µ(dx).
E E g(x)
One can always enlarge (if necessary) the original probability space (Ω, A, P) to design a random
variable Y : (Ω, A, P) → (E, E) having g as a probability density with respect to µ. Then, going
back to the probability space yields for every non-negative or PX -integrable function h : E → R,
h(Y )f (Y )
E h(X) = E . (3.7)
g(Y )
So, in order to compute E h(X) one can also implement a Monte Carlo simulation based on the
simulation of independent copies of the random variable Y i.e.
M
h(Y )f (Y ) 1 X f (Yk )
E h(X) = E = lim h(Yk ) a.s.
g(Y ) M →+∞ M g(Yk )
k=1
Practitioner’s corner
Practical requirements (to undertake the simulation). To proceed it is necessary to simulate
independent copies of Y and to compute the ratio of density functions f /g at a reasonable cost (note
that only the ratio is needed which makes useless the computation of some “structural” constants
like (2π)d/2 when both f and g are Gaussian densities, e.g. with different means, see below). By
“reasonable cost” for the simulation of Y , we mean at the same cost as that of X (in terms of
complexity). As concerns the ratio f /g this means that its computation remains negligible with
respect to that of h.
Sufficient conditions (to undertake the simulation). Once the above conditions are fulfilled,
the question is: is it profitable to proceed like that? So is the case if the complexity of the
simulation for a given accuracy (in terms of confidence interval) is lower with the second method.
3.6. IMPORTANCE SAMPLING 79
If one assumes as above that,simulating X and Y on the one hand, and computing h(x) and
(hf /g)(x) on the other hand are both comparable in terms of complexity, the question amounts to
comparing the variances or equivalently the squared quadratic norm of the estimators since they
have the same expectation E h(X).
Now
" 2 # " 2 #
h(Y )f (Y ) hf
E = E (Y )
g(Y ) g
h(x)f (x) 2
Z
= g(x)µ(dx))2
g(x)
ZE
f (x)
= h(x)2 f (x)µ(dx)
E g(x)
2f
= E h(X) (X) .
g
hf
As a consequence simulating (Y ) rather than h(X) will reduce the variance if and only if
g
f
E h(X)2 (X) < E h(X)2 .
(3.8)
g
Remark. In fact, theoretically, as soon as h is non-negative and E h(X) 6= 0, one may reduce the
variance of the new simulation to. . . 0. As a matter of fact, using Schwarz Inequality one gets, as
h(Y )f (Y )
if trying to “reprove” that Var g(Y ) ≥ 0,
Z 2
2
E h(X) = h(x)f (x)µ(dx)
E
Z !2
h(x)f (x) p
= p g(x)µ(dx)
E g(x)
(h(x)f (x))2
Z Z
≤ µ(dx) × g dµ
E g(x) E
(h(x)f (x))2
Z
= µ(dx)
E g(x)
since g is a probability density
p function. Now(x)the equality case in Schwarz inequality says that the
variance is 0 if and only if g(x) and h(x)f
√ are proportional µ(dx)-a.s. i.e. h(x)f (x) = c g(x)
g(x)
µ(dx)-a.s. for a (non-negative) real constant c. Finally this leads, when h has a constant sign and
E h(X) 6= 0 to
h(x)
g(x) = f (x) µ(dx) a.s.
E h(X)
This choice is clearly impossible to make since this would mean that E h(X) is known since it
is involved in the formula. . . and would then be of no use. A contrario this may suggest a direction
to design the (distribution) of Y .
80 CHAPTER 3. VARIANCE REDUCTION
The problem becomes a parametric optimization problem, typically solving the minimization
problem ( " 2 # )
f 2 f
min E h(Yθ ) (Yθ ) = E h(X) (X) .
θ∈Θ gθ gθ
Of course there is no reason why the solution to the above problem should be θ0 (if so, such a
parametric model is inappropriate). At this stage one can follow two strategies:
– Try to solve by numerical means the above minimization problem,
– Use one’s intuition to select a priori a good (although sub-optimal) θ ∈ Θ by applying the
heuristic principle: “focus light where it is needed”.
σ2
with x > 0, σ > 0 and µ = r − 2 . Then, the premium of an option with payoff ϕ : (0, +∞) →
(0, +∞),
From now on we forget about the financial framework and deal with
Z z2
e2
E h(Z) = h(z)f (z)dz where f (z) = √
R 2π
and the random variable Z will play the role of X in the above theoretical part. The idea is to
introduce the parametric family
Yθ = Z + θ, θ ∈ Θ := R.
We consider the Lebesgue measure on the real line λ1 as a reference measure, so that
(y−θ)2
e− 2
gθ (y) = √ , y ∈ R, θ ∈ Θ := R.
2π
f θ2
(y) = e−θy+ 2 , y ∈ R, θ ∈ Θ := R.
gθ
82 CHAPTER 3. VARIANCE REDUCTION
θ2
= e 2 E h(Z + θ)e−θ(Z+θ)
θ2
= e− 2 E h(Z + θ)e−θZ .
In fact, a standard change of variable based on the invariance of the Lebesgue measure by
translation yields the same result in a much more straightforward way: setting z = u + θ shows
that
Z
θ2 u2 du
E h(Z) = h(u + θ)e− 2 −θu− 2 √
R 2π
θ 2
= e− 2 E e−θZ h(Z + θ)
θ2
= e 2 E h(Z + θ)e−θ(Z+θ) .
It is to be noticed again that there is no need to account for the normalization constants to
f g0
compute the ratio = .
g gθ
The next step is to choose a “good” θ which significantly reduces the variance i.e., following
Condition (3.8) (using the formulation involving with “Yθ = Z + θ”) such that
2 2
θ
−θ(Z+θ)
E e 2 h(Z + θ)e < E h2 (Z)
i.e.
−θ2 2 −2θZ
e E h (Z + θ)e < E h2 (Z)
or, equivalently if one uses the formulation of (3.8) based on the original random variable (here Z)
θ2
E h2 (Z)e 2 −θZ < E h2 (Z).
It is clear that the solution of this optimization problem and the resulting choice of θ choice
highly depends on the function h.
– Optimization approach: When h is smooth enough, an approach based on large deviation esti-
mates has been proposed by Glasserman and al. (see [57]). We propose a simple recursive/adaptive
3.6. IMPORTANCE SAMPLING 83
approach in Chapter 6 based on Stochastic Approximation methods which does not depend upon
the regularity of the unction h (see also [4] for a pioneering work in that direction).
would lead to
log(x/K)
θ := − √ . (3.10)
σ T
A third simple, intuitive idea is to search for θ such that
√ 1
P x exp µT + σ T (Z + θ) < K =
2
which yields
log(x/K) + µT
θ := − √ . (3.11)
σ T
√
This choice is also the solution to the equation xeµT +σ T θ = K, etc.
All these choices are suboptimal but reasonable when x K. However, if we need to price a
whole portfolio including many options with various strikes, maturities (and underlyings. . . ), the
above approach is no longer possible and a data driven optimization method like the one developed
in Chapter 6 becomes mandatory.
Other parametric methods can be introduced, especially in non-Gaussian frameworks, like for
example the so-called “exponential tilting” (or Esscher transform) for distributions having a Laplace
transform on the whole real line (see e.g. [102]). Thus, when dealing with the NIG distribution
(for Normal Inverse Gaussian) this transform has an impact on the thickness oft the tail of the
distribution. Of course there is no a priori limit to what can be designed on as specific problem.
When dealing with path-dependent options, one usually relies on the Girsanov theorem to modify
likewise the drift of the risky asset dynamics (see [102]). Of course all this can be adapted to
multi-dimensional models. . .
Equation (3.12) has at least one solution since F is continuous which may not be unique in
general. For convenience one often assumes that the lowest solution of Equation (3.12) is the
V @Rα,X . In fact Value-at-Risk (of X) is not consistent as a measure of risk (as emphasized
in [49]), but nowadays it is still widely used to measure financial risk.
One naive way to compute V @Rα,X is to estimate the empirical distribution function of a (large
enough) Monte Carlo simulation at some points ξ lying in a grid Γ := {ξi , i ∈ I}, namely
M
1 X
Fd
(ξ)M := 1{Xk ≤ξ} , ξ ∈ Γ,
M
k=1
3.6. IMPORTANCE SAMPLING 85
where (Xk )k≥1 is an i.i.d. sequence of X-distributed random variables. Then one solves the equation
Fb(ξ)M = α (using an interpolation step of course).
Such an approach based on empirical distribution of X needs to simulate extreme values of
X since α is usually close to 1. So implementing a Monte Carlo simulation directly on the above
equation is usually a slightly meaningless exercise. Importance sampling becomes the natural way
to “re-center” the equation.
Assume e.g. that
d
X := ϕ(Z), Z = N (0, 1).
Then, for every ξ ∈ R,
θ2
P(X ≤ ξ) = e− 2 E 1{ϕ(Z+θ)≤ξ} e−θZ
so that the above Equation (3.12) now reads (θ being fixed)
θ2
E 1{ϕ(Z+θ)≤V @Rα,X } e−θZ = e 2 α.
It remains to find good θ’s. Of course this choice of depends on ξ but in practice it should be
fitted to reduce the variance in the neighbourhood of V @Rα,X . We will see in Chapter 6 devoted
that more efficient methods based on Stochastic Approximation can be devised. But they also
need variance reduction to be implemented. Furthermore, similar ideas can be used to compute a
consistent measure of risk called the Conditional Value-at-Risk (or Averaged Value-at-Risk). All
these topics will be investigated in Chapter 6.
86 CHAPTER 3. VARIANCE REDUCTION
Chapter 4
In this chapter we present the so-called quasi-Monte Carlo (QM C) method which can be seen as
a deterministic alternative to the standard Monte Carlo method: the pseudo-random numbers are
replaced by deterministic computable sequences of [0, 1]d -valued vectors which, once substituted
mutatis mutandis to pseudo-random numbers in the Monte Carlo may speed up significantly its
rate of convergence making it almost independent of the structural dimension d of the simulation.
where the subset Ωf of P-probability 1 one which this convergence does hold depends on the
function f . In particular the above a.s.-convergence holds for any continuous function on [0, 1]d .
But in fact, taking advantage of the separability of the space of continuous functions, one shows
87
88 CHAPTER 4. THE QUASI-MONTE CARLO METHOD
that, we will show below that this convergence simultaneously holds for all continuous functions on
[0, 1]d and even on the larger class of functions of Riemann integrable functions on [0, 1]d .
First we briefly recall the basic definition of weak convergence of probability measures on metric
spaces (see [24] Chapter 1, for a general introduction to weak convergence of measures on metric
spaces).
Definition 4.1 (Weak convergence) Let (S, δ) be a metric space and S := Borδ (S) its Borel
σ-field. Let (µn )n≥1 be a sequence of probability measures on (S, S) and µ a probability measure on
(S)
the same space. The sequence (µn )n≥1 weakly converges to µ (denoted µn =⇒ µ throughout this
chapter) if, for every function f ∈ Cb (S, R),
Z Z
f dµn −→ f dµ as n → +∞. (4.2)
S S
One important result on weak convergence of probability measures that we will use in this
chapter is the following. The result below is important for applications.
(S)
Proposition 4.1 (See Theorem 5.1 in [24]) If µn =⇒ µ, then the above convergence (4.2) holds
for every bounded Borel function f : (S, S) → R such that
µ(Disc(f )) = 0 where Disc(f ) = x ∈ S, f is discontinuous at x ∈ S.
Theorem 4.1 (Glivenko-Cantelli) If (Un )n≥1 is an i.i.d. sequence of uniformly distributed ran-
dom variables on the unit hypercube [0, 1]d , then
n
1X ([0,1]d )
P(dω)-a.s. δUk (ω) =⇒ λd|[0,1]d = U ([0, 1]d )
n
k=1
i.e.
n Z
d 1X
P(dω)a.s. ∀ f ∈ C([0, 1] , R), f (Uk (ω)) −→ f (x)λd (dx). (4.3)
n [0,1]d
k=1
Proof. The vector space C([0, 1]d , R) endowed with the sup-norm kf k∞ := supx∈[0,1]d |f (x)| is
separable in the sense that there exists a sequence (fm )m≥1 of continuous functions on [0, 1]d which
is everywhere dense in C([0, 1]d , R) for the sup-norm (1 ). Then, a countable union of P-negligible
sets remaining P-negligible, one derives from (4.1) the existence of Ω0 ⊂ Ω such that P(Ω0 ) = 1
and
n Z
1X
∀ ω ∈ Ω0 , ∀ m ≥ 1, fm (Uk (ω)) −→ E(fm (U1 )) = fm (u)λd (du) as n → +∞.
n [0,1]d
k=1
1
When d = 1, an easy way to construct this sequence is to consider the countable family of piecewise affine
functions with monotony breaks at rational points of the unit interval and taking rational values at this break points
(and at 0 and 1). The density follows from that of the set Q of rational numbers. When d ≥ 2, one proceeds likewise
by considering functions which are affine on hyper-rectangles with rational vertices which tile the unit hypercube.
4.1. MOTIVATION AND DEFINITIONS 89
On the other hand, it is straightforward that, for any f ∈ C([0, 1]d , R), for every n ≥ 1 and every
ω ∈ Ω0 ,
1 Xn n
1X
fm (Uk (ω)) − f (Uk (ω)) ≤ kf − fm k∞ and |E(fm (U1 )) − E(f (U1 ))| ≤ kf − fm k∞
n n
k=1 k=1
Now, the fact that the sequence (fm )m≥1 is everywhere dense in C([0, 1]d , R) for the sup-norm
exactly means that
lim inf kf − fm k∞ = 0.
m
This completes the proof since it shows that, for every ω ∈ Ω0 , the expected convergence holds for
every continuous function on [0, 1]d . ♦
Corollary 4.1 Owing to Proposition 4.1, one may replace in (4.3) the set of continuous functions
on [0, 1]d by that of all bounded Borel λd -a.s. continuous functions on [0, 1]d
Remark. In fact, one may even replace the Borel measurability by the “Lebesgue”-measurable
d → R, replace the B([0, 1]d ), Bor(R) -measurability by the
namely, for a function f : [0, 1]
L([0, 1]d ), Bor(R) -measurability where L([0, 1]d ) denotes the completion of the Borel σ-field on
[0, 1]d by the λd -negligible sets (see [30], Chapter 13). Such functions are known as Riemann
integrable functions on [0, 1]d (see again [30], Chapter 13).
What precedes suggests that, as long as one wishes to compute some quantities E f (U ) for
(reasonably) smooth functions f , we only need to have access to a sequence that satisfies the above
convergence property for its empirical distribution. Furthermore, we know from the first chapter
devoted to simulation that this situation is generic since all distributions can be simulated from
a uniformly distributed random variable. This leads to formulate the following definition of a
uniformly distributed sequence .
Definition 4.2 A [0, 1]d -valued sequence (ξn )n≥1 is uniformly distributed (u.d.) on [0, 1]d if
n
1X ([0,1]d )
δξk =⇒ U ([0, 1]d ) as n → +∞.
n
k=1
We need some characterizations of uniform distribution which can be established more easily on
examples. These are provided by the proposition below. To this end we need to introduce further
definitions and notations.
Definition 4.3 (a) We define a partial order, denoted “≤”, on [0, 1]d is defined by: for every
x = (x1 , . . . , xd ), y = (y 1 , . . . , y d ) ∈ [0, 1]d
x≤y if xi ≤ y i , 1 ≤ i ≤ d.
90 CHAPTER 4. THE QUASI-MONTE CARLO METHOD
(b) The “box” [[x, y]] is defined for every x = (x1 , . . . , xd ), y = (y 1 , . . . , y d ) ∈ [0, 1]d , x ≤ y, by
Note that [[x, y]] 6= ∅ if and only if x ≤ y and, if so is the case, [[x, y]] = Πdi=1 [xi , y i ].
Notation. In particular the unit hypercube [0, 1]d can be denoted [[0, 1]].
Proposition 4.2 (Portemanteau Theorem) (See among others [86, 27]) Let (ξn )n≥1 be a [0, 1]d -
valued sequence. The following assertions are equivalent.
(vi) (Bounded Riemann integrable function) For every bounded λd -a.s. continuous Lebesgue-mea-
surable function f : [0, 1]d → R
n Z
1X
f (ξk ) −→ f (x)λd (dx) as n → +∞.
n [0,1]d
k=1
Remark. The two moduli introduced in items (iii) and (iv) define the discrepancy at the origin
and the extreme discrepancy respectively.
Sketch of proof. The ingredients of the proof come from the theory of weak convergence of
probability measures. For more details in the multi-dimensional setting we refer to [24] (Chapter 1
devoted to the general theory of weak convergence of probability measures on a Polish space)
4.1. MOTIVATION AND DEFINITIONS 91
or [86] (old but pleasant book devoted to u.d. sequences). We provide some elements of proof in
1-dimension.
The equivalence (i) ⇐⇒ (ii) is simply the characterization of weak convergence of probability
measures by the convergence of their distributions functions (2 ) since the distribution function Fµn
1 P 1 P
of µn = n 1≤k≤n δξk is given by Fµn (x) = n 1≤k≤n 1{0≤ξk ≤x} .
Owing to second Dini’s Lemma, this convergence of non-decreasing (distribution) functions
is uniform as soon as it holds pointwise since its pointwise limiting function is FU ([0,1]) (x) = x
is continuous. This remark yields the equivalence (ii) ⇐⇒ (iii). Although more technical, the
d-dimensional extension is remains elementary and relies on a similar principle.
The equivalence (iii) ⇐⇒ (iv) is trivial since Dn∗ (ξ) ≤ D∞ (ξ) ≤ 2Dn∗ (ξ) in 1-dimension. In
d-dimension the equality on the right reads with 2d instead of 2.
Item (v) is based on the fact that weak convergence of finite measures on [0, 1]d is characterized
by that of the sequences of their Fourier coefficients (the Fourier coefficients of a finite measure µ
on ([0, 1] , Bor([0, 1] ) are defined by cp (µ) := [0,1]d e2ı̃π(p|u) µ(du), p ∈ Nd , ı̃2 = −1). One checks
d d
R
that the Fourier coefficients cp (λd,[0,1]d ) p∈Nd are simply cp (λd,0,1]d ) = 0 if p 6= 0 and 1 if p = 0.
Item (vi) follows from (i) and Proposition 4.1 since for every x ∈ [[0, 1]], fc (ξ) := 1{ξ≤x} is
continuous outside {x} which is clearly Lebesque negligible. Conversely (vi) implies the pointwise
convergence of the distribution function Fµn as defined above toward FU ([0,1]) . ♦
The discrepancy at the origin Dn∗ (ξ) plays a central rôle in the theory of uniformly distributed
sequences: it idoes not only provide a criterion for uniform distribution, it also appears in as an
upper error modulus for numerical integration when the function f has the appropriate regularity
(see Koksma-Hlawka Inequality below).
Remark. One can define the discrepancy Dn∗ (ξ1 , . . . , ξn ) at the origin of a given n-tuple (ξ1 , . . . , ξn ) ∈
([0, 1]d )n by the expression in (iii) of the above Proposition.
The geometric interpretation of the discrepancy is the following: if x := (x1 , . . . , xd ) ∈ [[0, 1]],
the hyper-volume of [[0, x]] is equal to the product x1 · · · xd and
n
1X card {k ∈ {1, . . . , n}, ξk ∈ [[0, x]]}
1[[0,x]] (ξk ) =
n n
k=1
is but the frequency with which the first n points ξk ’s of the sequence fall into [[0, x]]. The discrepancy
measures the maximum induced error when x runs over [[0, 1]].
The first exercise below provides a first example of uniformly distributed sequence.
Exercises. 1. Rotations of the torus ([0, 1]d , +). Let (α1 , . . . , αd ) ∈ (R \ Q)d (irrational numbers)
such that (1, α1 , . . . , αd ) are linearly independent over Q (3 ). Let x = (x1 , . . . , xd ) ∈ Rd . Set for
every n ≥ 1
ξn := ({xi + n αi })1≤i≤d .
2
The distribution function Fµ of a probability measure µ on [0, 1] is defined by Fµ (x) = µ([0, x]). One shows
that a sequence of probability measures µn converges toward a probability measure µ if and only if their distribution
functions Fµn and Fµ satisfy Fµn (x) converges to Fµ (x) at every x ∈ R such that Fµ is continuous at x (see [24],
Chapter 1)
3
This means that if λi ∈ Q, i = 0, . . . , d satisfy λ0 + λ1 α1 + · · · + λd αd = 0 then λ0 = λ1 = · · · = λd = 0.
92 CHAPTER 4. THE QUASI-MONTE CARLO METHOD
where {x} denotes the fractional part of a real number x. Show that the sequence (ξn )n≥1 is
uniformly distributed on [0, 1]d (and can be recursively generated). [Hint: use the Weyl criterion.]
2.(a) Assume d = 1. Show that for every n-tuple (ξ1 , . . . , ξn ) ∈ [0, 1]n
k − 1 (n) k (n)
Dn∗ (ξ1 , . . . , ξn ) = max − ξk , − ξk
1≤k≤n n n
(n)
where (ξk )1≤k≤nP is the reordering of the n-tuple (ξ1 , . . . , ξn ). [Hint: How does the “càdlàg”
function x 7→ n1 nk=1 1{ξk ≤x} − x reach its supremum?].
(b) Deduce that
1 (n) 2k − 1
Dn∗ (ξ1 , . . . , ξn ) = + max ξ − .
2n 1≤k≤n k 2n
(c) Minimal discrepancy at the origin. Show that the n-tuple with the lowest discrepancy (at the
2k−1 1
origin) is 2n 1≤k≤n (the “mid-point” uniform n-tuple) with discrepancy 2n .
(or equivalently f (x) = f (0) − ν(c [[0, 1 − x]])). The variation V (f ) is defined by
V (f ) := |ν|([0, 1]d )
Exercises. 1. Show that f has finite variation in the measure sense if and only if there exists
a signed ν measure with ν({1}) = 0 such that
∀ x ∈ [0, 1]d , f (x) = f (1) + ν([[x, 1]]) = f (0) − ν(c [[x, 1]])
and that its variation is given by |ν|([0, 1]d ). This could of course be taken as the definition
equivalently to the above one.
2. Show that the function f defined on [0, 1]2 by
d
has finite variation in the measure sense [Hint: consider the distribution of (U, 1−U ), U = U ([0, 1])].
For the class of functions with finite variations, the Koksma-Hlawka Inequality provides an error
n Z
1X
bound for f (ξk ) − f (x)dx based on the star discrepancy.
n [0,1]d
k=1
where we used Fubini’s Theorem to interchange the integration order (which is possible since
|ν| ⊗ |e
µn | is a finite measure). Finally, using the extended triangular inequality for integrals with
respect to signed measures,
Z
1 X n Z
f (ξk ) − f (x)λd (dx) = en ([[0, 1 − v]])ν(dv)
µ
n
k=1 [0,1]d [0,1]d
Z
≤ |e
µn ([[0, 1 − v]])| |ν|(dv)
[0,1]d
Remarks. • The notion of finite variation in the measure sense has been introduced in [27]
and [133]. When d = 1, it coincides with the notion of left continuous functions with finite variation.
When d ≥ 2, it is a slightly more restrictive notion than “finite variation” in the Hardy and Krause
sense (see e.g. [86, 118] for a definition of this slightly more general notion). However, it is much
easier to handle, in particular to establish the above Koksma-Hlawka Inequality! Furthermore
V (f ) ≤ VH&K (f ). Conversely, one shows that a function with finite variation in the Hardy and
Krause sense is λd -a.s. equal to a function fe having finite variations in the measure sense satisfying
V (f˜) ≤ VH&K (f ). In one dimension, finite variation in the Hardy & Krause sense exactly coincides
with the standard definition of finite variation.
• A classical criterion (see [27, 133]) for finite variation in the measure sense is the following: if
f : [0, 1]^d → R has a cross derivative $\frac{\partial^d f}{\partial x^1\cdots\partial x^d}$ in the distribution sense which is an integrable
function, i.e.
$$\int_{[0,1]^d}\Big|\frac{\partial^d f}{\partial x^1\cdots\partial x^d}(x^1,\dots,x^d)\Big|\,dx^1\cdots dx^d < +\infty,$$
then f has finite variation in the measure sense. This class includes the functions f defined by
$$f(x) = f(1) + \int_{[[0,1-x]]}\varphi(u^1,\dots,u^d)\,du^1\cdots du^d,\qquad \varphi\in L^1([0,1]^d,\lambda_d). \tag{4.4}$$
Exercises. 1. Show that the function f defined on [0, 1]^3 by f(x^1, x^2, x^3) := (x^1 + x^2 + x^3) ∧ 1
does not have finite variation in the measure sense [Hint: its third derivative in the distribution sense is
not a measure] (5).
2. (a) Show directly that if f satisfies (4.4), then for any n-tuple (ξ_1, . . . , ξ_n),
$$\Big|\frac1n\sum_{k=1}^n f(\xi_k) - \int_{[0,1]^d} f(x)\,\lambda_d(dx)\Big| \le \|\varphi\|_{L^1(\lambda_{d|[0,1]^d})}\,D_n^*(\xi_1,\dots,\xi_n).$$
(b) Show that if ϕ also lies in L^p([0, 1]^d, λ_d), for an exponent p ∈ [1, +∞] with Hölder conjugate
q, then
$$\Big|\frac1n\sum_{k=1}^n f(\xi_k) - \int_{[0,1]^d} f(x)\,\lambda_d(dx)\Big| \le \|\varphi\|_{L^p(\lambda_{d|[0,1]^d})}\,D_n^{(q)}(\xi_1,\dots,\xi_n)$$
where
$$D_n^{(q)}(\xi_1,\dots,\xi_n) = \left(\int_{[0,1]^d}\Big|\frac1n\sum_{k=1}^n\mathbf 1_{[[0,x]]}(\xi_k) - \prod_{i=1}^d x^i\Big|^q\lambda_d(dx)\right)^{\frac1q}.$$
(c) Show that, when d = 1,
$$D_n^{(1)}(\xi_1,\dots,\xi_n) = \sum_{k=0}^n\int_{\xi^{(n)}_k}^{\xi^{(n)}_{k+1}}\Big|\frac kn - u\Big|\,du$$
(with the conventions $\xi^{(n)}_0 = 0$ and $\xi^{(n)}_{n+1} = 1$).
So, it is natural to evaluate its (random) discrepancy $D_n^*\big((U_k(\omega))_{k\ge1}\big)$ as a measure of its uniform
distribution, and one may wonder at which rate it goes to zero: to be more precise, is there a kind
of transfer of the Central Limit Theorem (CLT) and the Law of the Iterated Logarithm (LIL) –
which rule the weak and strong rates of convergence in the SLLN – to this discrepancy modulus?
The answer is positive since $D_n^*\big((U_k)_{k\ge1}\big)$ satisfies both a CLT and a LIL. These results are due
to Chung (see e.g. [33]).
$$\sqrt n\,D_n^*\big((U_k)_{k\ge1}\big)\xrightarrow{\ \mathcal L\ }\sup_{x\in[0,1]^d}|Z^d_x|\qquad\text{and}\qquad \mathbb E\,D_n^*\big((U_k)_{k\ge1}\big)\sim\frac{\mathbb E\big(\sup_{x\in[0,1]^d}|Z^d_x|\big)}{\sqrt n}\quad\text{as }n\to+\infty,$$
where $(Z^d_x)_{x\in[0,1]^d}$ denotes the centered Gaussian multi-index process (or “bridged hyper-sheet”)
with covariance structure given by
$$\forall\,x=(x^1,\dots,x^d),\ y=(y^1,\dots,y^d)\in[0,1]^d,\qquad \mathrm{Cov}(Z^d_x,Z^d_y)=\prod_{i=1}^d x^i\wedge y^i-\prod_{i=1}^d x^i\,\prod_{i=1}^d y^i.$$
When d = 1, Z 1 is simply the Brownian bridge over the unit interval [0, 1] and the distribution of
its sup-norm is given, using its tail (or survival) function, by
$$\forall\,z\in\mathbb R_+,\qquad \mathbb P\Big(\sup_{x\in[0,1]}|Z_x|\ge z\Big)=2\sum_{k\ge1}(-1)^{k+1}e^{-2k^2z^2}.$$
At this stage, one can provide a preliminary definition of a sequence with low discrepancy on
[0, 1]d as a [0, 1]d -valued sequence ξ := (ξn )n≥1 such that
$$D_n^*(\xi) = o\left(\sqrt{\frac{\log(\log n)}{n}}\right)\quad\text{as }n\to+\infty,$$
which means that its implementation with a function with finite variation will speed up the con-
vergence rate of numerical integration by the empirical measure with respect to the worst rate of
the Monte Carlo simulation.
Exercise. Show, using the standard LIL, the easy part of Chung's LIL, that is
$$\limsup_n\sqrt{\frac{2n}{\log(\log n)}}\;D_n^*\big((U_k(\omega))_{k\ge1}\big)\ \ge\ 2\sup_{x\in[0,1]^d}\sqrt{\lambda_d([[0,x]])\big(1-\lambda_d([[0,x]])\big)}\ =\ 1\qquad \mathbb P(d\omega)\text{-a.s.}$$
$$\forall\,n\ge1,\qquad D_n^*(\xi)\le C(\xi)\,\frac{(\log n)^d}{n}\quad\text{where }C(\xi)<+\infty.$$
Based on this, one can derive from the Hammersley procedure (see Section 4.3.4 below) the existence
of a real constant $C_d\in(0,+\infty)$ such that
$$\forall\,n\ge1,\ \exists\,(\xi_1,\dots,\xi_n)\in([0,1]^d)^n,\qquad D_n^*(\xi_1,\dots,\xi_n)\le C_d\,\frac{(\log n)^{d-1}}{n}.$$
In spite of more than fifty years of investigation, the gap between these asymptotic lower and
upper bounds has not been improved: it has still not been proved whether there exists a sequence
for which C(ξ) = 0, i.e. for which the rate $\frac{(\log n)^d}{n}$ is not optimal.
In fact it is widely believed in the QMC community that, in the above lower bounds, $\frac{d-1}{2}$ can be
replaced by d − 1 in (4.5) and $\frac d2$ by d in (4.6), so that the rate $O\big(\frac{(\log n)^d}{n}\big)$ seems to be the lowest
possible for a u.d. sequence. When d = 1, Schmidt proved that this conjecture is true.
This leads to a more convincing definition of a sequence with low discrepancy.
Definition 4.5 A [0, 1]d -valued sequence (ξn )n≥1 is a sequence with low discrepancy if
$$D_n^*(\xi) = O\left(\frac{(\log n)^d}{n}\right)\quad\text{as }n\to+\infty.$$
For more insight about the other measures of uniform distribution (L^p-discrepancy $D_n^{(p)}(\xi)$,
diaphony, etc.), we refer e.g. to [25].
The proof of this bound essentially relies on the Chinese Remainder Theorem (known as
“Théorème chinois” in French). Several improvements of this classical bound have been established:
some of numerical interest (see e.g. [120]), some more theoretical like those established by Faure
(see [46]):
$$D_n^*(\xi)\le\frac1n\left(d+\prod_{i=1}^d\Big(\frac{p_i-1}{2}\,\frac{\log n}{\log p_i}+\frac{p_i+2}{2}\Big)\right),\qquad n\ge1.$$
Remark. When d = 1, the sequence (Φ_p(n))_{n≥1} is called the p-adic Van der Corput sequence (and
the integer p need not be prime).
One easily checks that the first terms of the V dC(2) sequence are as follows:
$$\xi_1=\tfrac12,\ \xi_2=\tfrac14,\ \xi_3=\tfrac34,\ \xi_4=\tfrac18,\ \xi_5=\tfrac58,\ \xi_6=\tfrac38,\ \xi_7=\tfrac78,\ \dots$$
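A minimal sketch of the p-adic radical inverse Φ_p and of the resulting Halton sequence (function names are illustrative); it reproduces the first terms of V dC(2) listed above:

```python
def radical_inverse(n, p):
    """p-adic radical inverse Phi_p(n): reflect the base-p digits of n about the
    radix point, e.g. Phi_2(1)=1/2, Phi_2(2)=1/4, Phi_2(3)=3/4, ..."""
    phi, base = 0.0, 1.0 / p
    while n > 0:
        n, digit = divmod(n, p)
        phi += digit * base
        base /= p
    return phi

def halton(n, primes=(2, 3)):
    """First n terms of the Halton sequence with the given coordinate bases."""
    return [[radical_inverse(k, p) for p in primes] for k in range(1, n + 1)]

# First terms of the VdC(2) sequence: 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8
print([radical_inverse(k, 2) for k in range(1, 8)])
```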
(n) (n)
Exercise. Let ξ = (ξn )n≥1 denote the p-adic Van der Corput sequence and let ξ1 ≤ · · · ≤ ξn
be the reordering of the first n terms of ξ.
(a) Show that, for every k ∈ {1, . . . , n},
(n) k
ξk ≤ .
n+1
[Hint: Use an induction on q where n = qp + r, 0 ≤ r ≤ p − 1.]
(b) Derive that, for every n ≥ 1,
ξ1 + · · · + ξn 1
≤ .
n 2
(c) One considers the p-adic Van der Corput sequence (ξ˜n )n≥1 starting at 0 i.e.
˜ = 1 ξ1 + · · · + ξn
Dn(1) (ξ) − .
2 n
The Kakutani sequences. This denotes a family of sequences first obtained as a by-product
when trying to generate the Halton sequence as the orbit of an ergodic transform (see [90, 92, 121]).
This extension is based on the p-adic addition on [0, 1], also known as the Kakutani adding machine,
defined on regular p-adic expansions of reals of [0, 1] as the addition from the left to the right of the
regular p-adic expansions with carrying over (the p-adic regular expansion of 1 is conventionally
set to 1 = 0.(p − 1)(p − 1)(p − 1) . . . in base p). Let ⊕_p denote this addition. For y ∈ [0, 1], the associated translation is
$$T_{p,y}(x) := x\oplus_p y.$$
Proposition 4.4 Let p_1, . . . , p_d denote the first d prime numbers, y_1, . . . , y_d ∈ (0, 1), where y_i is a
p_i-adic rational number satisfying y_i ≥ 1/p_i, i = 1, . . . , d, and x_1, . . . , x_d ∈ [0, 1]. Then the sequence
(ξ_n)_{n≥1} defined by
$$\xi_n := \big(T^{\,n-1}_{p_i,y_i}(x_i)\big)_{1\le i\le d},\qquad n\ge1,$$
has a discrepancy at the origin D_n^*(ξ) satisfying the same upper-bound (4.7) as the Halton sequence,
see [92, 121].
Remark. Note that if y_i = x_i = 1/p_i = 0.1 (in base p_i), i = 1, . . . , d, the sequence ξ is simply the regular
Halton sequence.
One asset of this approach is to provide an easy recursive form for the computation of ξn since
ξn = ξn−1 ⊕p (y1 , . . . , yd )
where, with a slight abuse of notation, ⊕p denotes here the componentwise pseudo-addition.
Appropriate choices of the starting vector and the “angle” can significantly reduce the discrep-
ancy at least at a finite range (see remark below).
Practitioner's corner. Heuristic arguments not developed here suggest that a good choice for
the “angles” y_i and the starting values x_i is
$$y_i=1/p_i,\qquad x_i=1/5,\ i\ne3,4,\qquad x_i=\frac{2p_i-1-\sqrt{(p_i+2)^2+4p_i}}{3},\ i=3,4.$$
This specified Kakutani sequence is much easier to implement than the Sobol’ sequences and
behaves quite well up to medium dimensions d, say 1 ≤ d ≤ 20 (see [133, 27]).
The Faure sequences. These sequences were introduced in [47]. Let p be the smallest prime
integer not lower than d (i.e. p ≥ d). The d-dimensional Faure sequence is defined, for every n ≥ 1,
by
$$\xi_n=\big(\Phi_p(n-1),\,C_p(\Phi_p(n-1)),\,\dots,\,C_p^{d-1}(\Phi_p(n-1))\big).$$
It has been shown later on in [118] that these sequences share the so-called “P_{p,d}” property.
The prominent feature of Faure's estimate (4.8) is that the coefficient of the leading error term
(in the log n-scale) satisfies
$$\lim_d\frac{1}{d!}\left(\frac{p-1}{2\log p}\right)^d=0,$$
which seems to suggest that the rate is asymptotically better than $O\big(\frac{(\log n)^d}{n}\big)$.
A non-asymptotic upper-bound is provided in [133] (due to Y.J. Xiao in his PhD thesis [158]):
for every n ≥ 1,
$$D_n^*(\xi)\le\frac1n\left(\frac{1}{d!}\left(\frac{p-1}{2}\,\frac{\log(2n)}{\log p}\right)^d+d+1\right).$$
Note that this bound has the same coefficient in its leading term as the asymptotic error
bound obtained by Faure although, from a numerical point of view, it becomes efficient only for
very large n: thus if d = p = 5 and n = 1 000,
$$D_n^*(\xi)\le\frac1n\left(\frac{1}{d!}\left(\frac{p-1}{2}\,\frac{\log(2n)}{\log p}\right)^d+d+1\right)\approx1.18,$$
which is of little interest if one keeps in mind that, by construction, the discrepancy takes its values
in [0, 1]. This can be explained by the form of the “constant” term (in the log n-scale) since
$$\lim_{d\to+\infty}\frac{(d+1)^d}{d!}\left(\frac{p-1}{2}\right)^d=+\infty.$$
A better bound is provided in Xiao's PhD thesis, provided n ≥ p^{d+2}/2 (of less interest for
applications when d increases since p^{d+2}/2 ≥ d^{d+2}/2). Furthermore, it still suffers from the
same drawback mentioned above.
The Sobol' sequence. Sobol's sequence, although a pioneering and striking contribution to
sequences with low discrepancy, appears now as a particular sequence in the family of Niederreiter's
sequences defined below. The usual implemented sequence is a variant due to Antonov and Saleev
(see [3]). For recent developments on that topic, see e.g. [156].
The Niederreiter sequences. These sequences were designed as generalizations of Faure and
Sobol’ sequences (see [118]).
Let q ≥ d be the smallest primary integer not lower than d (a primary integer reads q = p^r with
p prime). The (0, d)-Niederreiter sequence is defined, for every integer n ≥ 1, by
$$\xi_n=\big(\Psi_{q,1}(n-1),\,\Psi_{q,2}(n-1),\,\dots,\,\Psi_{q,d}(n-1)\big)$$
where
$$\Psi_{q,i}(n):=\sum_j\Psi^{-1}\Big(\sum_k C^{(i)}_{(j,k)}\,\Psi(a_k)\Big)\,q^{-j}$$
(the a_k denoting the digits of the q-adic expansion of n) and Ψ : {0, . . . , q − 1} → F_q is a one-to-one correspondence between {0, . . . , q − 1} and the finite
field F_q with cardinal q satisfying Ψ(0) = 0 and
$$C^{(i)}_{(j,k)}=\binom{k}{j-1}\,\Psi(i-1).$$
These sequences coincide with the Faure sequence when q is the lowest prime number not lower
than d. When q = 2r , with 2r−1 < d ≤ 2r , the sequence coincides with the Sobol’ sequence (in its
original form).
The sequences of this family all share the P_{p,d} property and consequently have a discrepancy
satisfying an upper bound with a structure similar to that of the Faure sequence.
Proposition 4.6 Let d ≥ 2. Let (ζ_1, . . . , ζ_n) be a [0, 1]^{d−1}-valued n-tuple. Then the [0, 1]^d-valued
n-tuple defined by
$$(\xi_k)_{1\le k\le n}=\Big(\zeta_k,\frac kn\Big)_{1\le k\le n}$$
satisfies
$$\frac{\max_{1\le k\le n}k\,D_k^*(\zeta_1,\dots,\zeta_k)}{n}\ \le\ D_n^*\big((\xi_k)_{1\le k\le n}\big)\ \le\ \frac{1+\max_{1\le k\le n}k\,D_k^*(\zeta_1,\dots,\zeta_k)}{n}.$$
Proof. It follows from the very definition of the discrepancy at the origin that
$$D_n^*\big((\xi_k)_{1\le k\le n}\big)=\sup_{(x,y)\in[0,1]^{d-1}\times[0,1]}\Big|\frac1n\sum_{k=1}^n\mathbf 1_{\{\zeta_k\in[[0,x]],\,\frac kn\le y\}}-\Big(\prod_{i=1}^{d-1}x^i\Big)y\Big|=\sup_{x\in[0,1]^{d-1}}\,\sup_{y\in[0,1]}|\cdots|$$
$$=\max_{1\le k\le n}\left[\sup_{x\in[0,1]^{d-1}}\Big|\frac1n\sum_{\ell=1}^k\mathbf 1_{\{\zeta_\ell\in[[0,x]]\}}-\frac kn\prod_{i=1}^{d-1}x^i\Big|\;\vee\;\sup_{x\in[0,1]^{d-1}}\Big|\frac1n\sum_{\ell=1}^{k-1}\mathbf 1_{\{\zeta_\ell\in[[0,x]]\}}-\frac kn\prod_{i=1}^{d-1}x^i\Big|\right]$$
since a function $y\mapsto\frac1n\sum_{k=1}^n a_k\mathbf 1_{\{\frac kn\le y\}}-b\,y$ ($a_k, b\ge0$) reaches its supremum in absolute value at some $y=\frac kn$ or at its
left limit “$\frac kn-$”, $k\in\{1,\dots,n\}$. Consequently
$$D_n^*\big((\xi_k)_{1\le k\le n}\big)=\frac1n\max_{1\le k\le n}\left[k\,D_k^*(\zeta_1,\dots,\zeta_k)\;\vee\;(k-1)\sup_{x\in[0,1]^{d-1}}\Big|\frac1{k-1}\sum_{\ell=1}^{k-1}\mathbf 1_{\{\zeta_\ell\in[[0,x]]\}}-\frac{k}{k-1}\prod_{i=1}^{d-1}x^i\Big|\right]\tag{4.9}$$
$$\le\frac1n\max_{1\le k\le n}\Big[k\,D_k^*(\zeta_1,\dots,\zeta_k)\;\vee\;\big((k-1)D_{k-1}^*(\zeta_1,\dots,\zeta_{k-1})+1\big)\Big]\le\frac{1+\max_{1\le k\le n}k\,D_k^*(\zeta_1,\dots,\zeta_k)}{n}.$$
The lower bound is obvious from (4.9). ♦
Corollary 4.2 Let d ≥ 1. There exists a real constant Cd ∈ (0, ∞) such that, for every n ≥ 1,
there exists an n-tuple (ξ1n , . . . , ξnn ) ∈ ([0, 1]d )n satisfying
$$D_n^*(\xi_1^n,\dots,\xi_n^n)\le C_d\,\frac{1+(\log n)^{d-1}}{n}.$$
The main drawback of this procedure is that if one starts from a sequence with low discrepancy
(often defined recursively), one loses the “telescopic” feature of such a sequence. If one wishes,
for a given function f defined on [0, 1]^d, to increase n in order to improve the accuracy of the
approximation, all the terms of the sum in the empirical mean have to be re-computed.
Exercise. Derive the theoretical lower bound (4.6) for infinite sequences from the one for (4.5).
(6) Let (X, X, µ) be a probability space. A mapping T : (X, X) → (X, X) is ergodic if it leaves µ invariant (µ ◦ T^{-1} = µ) and if every A ∈ X such that T^{-1}(A) = A satisfies µ(A) ∈ {0, 1}.
Then Birkhoff's pointwise ergodic Theorem (see [85]) implies that, for every f ∈ L^1(µ),
$$\mu(dx)\text{-a.s.}\qquad\frac1n\sum_{k=1}^n f\big(T^{k-1}(x)\big)\longrightarrow\int f\,d\mu.$$
The mapping T is uniquely ergodic if µ is the only T-invariant probability measure. If X is a topological space, X = Bor(X)
and T is continuous, then, for any continuous function f : X → R,
$$\sup_{x\in X}\Big|\frac1n\sum_{k=1}^n f\big(T^{k-1}(x)\big)-\int_X f\,d\mu\Big|\longrightarrow0.$$
In particular it shows that any orbit $(T^{n-1}(x))_{n\ge1}$ is µ-distributed. When X = [0, 1]^d and µ = λ_d, one retrieves the
notion of uniformly distributed sequence. This provides a powerful tool to devise uniformly distributed sequences
(like Kakutani sequences).
The Kakutani transforms (rotations with respect to ⊕_p) and the automorphisms of the torus
are typical examples of ergodic transforms, the first ones being uniquely ergodic (keep in mind
that the p-adic Van der Corput sequence is an orbit of the transform T_{p,1/p}). The fact that the
Kakutani transforms – or, to be precise, their representation on [0, 1]^d – are not continuous can be
circumvented (see [121]) and one can take advantage of this unique ergodicity to characterize their
coboundaries. Easy-to-check criteria based on the rate of decay of the Fourier coefficients c_p(f),
p = (p_1, . . . , p_d) ∈ N^d, of a function f as ||p|| := p_1 × · · · × p_d goes to infinity have been provided
(see [121, 158, 159]) for the p-adic Van der Corput sequences in one dimension and other orbits of
the Kakutani transforms.
Extensive numerical tests on problems involving some smooth (periodic) functions on [0, 1]^d,
d ≥ 2, have been carried out, see e.g. [133, 27]. They suggest that this improvement still holds in
higher dimension, at least partially. So much for the pros.
The cons. As concerns the cons, the first one is that all the available non-asymptotic bounds
are very poor from a numerical point of view. We still refer to [133] for some
examples which show that these bounds cannot be used to provide any kind of (deterministic)
error intervals. By contrast, one must always have in mind that the regular Monte Carlo method
automatically provides a confidence interval.
The second drawback concerns the family of functions for which QMC speeds up the conver-
gence through the Koksma-Hlawka Inequality. This family – mainly the functions with some kind
of finite variation – somehow becomes smaller and smaller as the dimension d increases since the
requested condition becomes more and more stringent. If one is concerned with the usual regularity
of functions, like Lipschitz continuity, the following striking theorem due to Proïnov
([139]) shows that the curse of dimensionality comes into the game without possible escape.
Theorem 4.2 (Proïnov) Assume R^d is equipped with the ℓ^∞-norm defined by |(x^1, . . . , x^d)|_∞ :=
max_{1≤i≤d} |x^i|. Let
$$w(f,\delta):=\sup_{x,y\in[0,1]^d,\ |x-y|_\infty\le\delta}|f(x)-f(y)|,\qquad\delta\in(0,1).$$
Then there exists a real constant $C_d\in(0,+\infty)$, only depending on d, such that, for every continuous function f : [0, 1]^d → R and
every n-tuple (ξ_1, . . . , ξ_n) ∈ ([0, 1]^d)^n,
$$\Big|\frac1n\sum_{k=1}^n f(\xi_k)-\int_{[0,1]^d}f\,d\lambda_d\Big|\le C_d\,w\big(f,D_n^*(\xi_1,\dots,\xi_n)^{\frac1d}\big).$$
Exercise. Show using the Van der Corput sequences starting at 0 (defined in Section 4.3.3, see
the exercise in the paragraph devoted to Van der Corput and Halton sequences) and the function
f (x) = x on [0, 1] that the above Proı̈nov Inequality cannot be improved for Lipschitz continuous
functions even in 1-dimension. [Hint: Reformulate some results of the Exercise in Section 4.3.3.]
A third drawback of QM C for practical numerical integration is that all functions need to be
defined on the unit hypercube. One way to get partially rid of that may be to consider integration
on some domains C ⊂ [0, 1]d having a regular boundary in the Jordan sense (7 ). Then a Koksma-
Hlawka like inequality holds true :
$$\Big|\int_C f(x)\,\lambda_d(dx)-\frac1n\sum_{k=1}^n\mathbf 1_{\{\xi_k\in C\}}f(\xi_k)\Big|\le V(f)\,D_n^\infty(\xi)^{\frac1d}$$
where V (f ) denotes the variations of f (in the Hardy & Krause or measure sense) and Dn∞ (ξ)
denotes the extreme discrepancy of (ξ1 , . . . , ξn ) (see again [118]). The simple fact to integrate over
such a set annihilates the low discrepancy effect (at least from a theoretical point of view).
Exercise. Prove Proı̈nov’s Theorem when d = 1 [Hint: read the next chapter and compare
discrepancy and quantization error].
WARNING: the dimensional trap! A fourth drawback, undoubtedly the most danger-
ous for beginners, is of course that a given (one-dimensional) sequence (ξ_n)_{n≥1} does not “simulate”
independence, as emphasized by the classical exercise below.
Exercise. Let ξ = (ξn )n≥1 denote the dyadic Van der Corput sequence. Show that, for every
n≥0
$$\xi_{2n+1}=\xi_{2n}+\frac12\qquad\text{and}\qquad\xi_{2n}=\frac{\xi_n}{2}$$
with the convention ξ0 = 0. Deduce that
$$\lim_n\frac1n\sum_{k=1}^n\xi_{2k}\,\xi_{2k+1}=\frac{5}{24}.$$
k=1
Compare with E (U V ), where U, V are independent with uniform distribution over [0, 1]. Conclude.
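A quick numerical check of the above limit (a sketch; the function name and sample size are illustrative):

```python
import numpy as np

def vdc2(nmax):
    """Dyadic Van der Corput sequence xi_1, ..., xi_nmax (radical inverse in base 2)."""
    out = np.empty(nmax)
    for k in range(1, nmax + 1):
        x, n, base = 0.0, k, 0.5
        while n > 0:
            n, d = divmod(n, 2)
            x += d * base
            base /= 2
        out[k - 1] = x
    return out

n = 200_000
xi = np.concatenate(([0.0], vdc2(2 * n + 1)))       # xi[0] = 0 by convention
k = np.arange(1, n + 1)
avg = np.mean(xi[2 * k] * xi[2 * k + 1])
print(avg, 5 / 24, 0.25)   # empirical mean vs 5/24 ~ 0.2083 vs E(UV) = 1/4
```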
In fact this phenomenon is typical of the price to be paid for “filling up the gaps” faster than
random numbers do. This is the reason why it is absolutely mandatory to use d-dimensional
sequences with low discrepancy to perform QMC computations related to a random vector X of the
form X = Ψ(U_d), U_d ∼ U([0, 1]^d) (d is sometimes called the structural dimension of the simulation).
The d components of these d-dimensional sequences do simulate independence.
(7) Namely that for every ε > 0, λ_d({u ∈ [0, 1]^d | dist(u, ∂C) < ε}) ≤ κ_C ε.
This has important consequences on very standard simulation methods as illustrated below.
Application to the “QMC Box-Müller method”: to adapt the Box-Müller method of sim-
ulation of a normal distribution N(0; 1) introduced in Corollary 1.3, we proceed as follows: let
ξ = (ξ^1_n, ξ^2_n)_{n≥1} be a u.d. sequence over [0, 1]^2 (in practice chosen with low discrepancy). We set,
for every n ≥ 1,
$$\zeta_n=(\zeta_n^1,\zeta_n^2):=\Big(\sqrt{-2\log(\xi_n^1)}\,\sin(2\pi\xi_n^2),\ \sqrt{-2\log(\xi_n^1)}\,\cos(2\pi\xi_n^2)\Big),\qquad n\ge1.$$
The continuity assumptions on f_1 and f_2 can be relaxed, e.g. for f_2, into: the function f̃_2 defined
on (0, 1]^2 by
$$(\xi^1,\xi^2)\longmapsto\tilde f_2(\xi^1,\xi^2):=f_2\Big(\sqrt{-2\log(\xi^1)}\,\sin(2\pi\xi^2),\ \sqrt{-2\log(\xi^1)}\,\cos(2\pi\xi^2)\Big)$$
is Riemann integrable. Likewise, the Koksma-Hlawka inequality applies, provided f̃_2 has finite
variation.
The same holds for f_1 and admits a straightforward extension to functions f(Z), Z ∼ N(0; I_d).
Establishing that these functions do have finite variation in practice is usually quite
demanding (if true at all).
The extension to the multivariate Box-Müller method (1.1) should be performed following the
same rule of the structural dimension: it requires a u.d. sequence over [0, 1]^d.
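A minimal sketch of the QMC Box-Müller map above, using the Halton(2, 3) sequence as the two-dimensional u.d. sequence (an illustrative choice; function names are hypothetical):

```python
import numpy as np

def radical_inverse(n, p):
    x, base = 0.0, 1.0 / p
    while n > 0:
        n, d = divmod(n, p)
        x += d * base
        base /= p
    return x

def qmc_box_muller(n):
    """Map the Halton(2,3) sequence on (0,1)^2 through the Box-Muller transform."""
    xi1 = np.array([radical_inverse(k, 2) for k in range(1, n + 1)])
    xi2 = np.array([radical_inverse(k, 3) for k in range(1, n + 1)])
    r = np.sqrt(-2.0 * np.log(xi1))          # xi1 never hits 0 along the sequence
    return np.column_stack((r * np.sin(2 * np.pi * xi2), r * np.cos(2 * np.pi * xi2)))

# E f(Z), Z ~ N(0, I2), for f(z) = max(z1, 0); exact value 1/sqrt(2*pi) ~ 0.3989
zeta = qmc_box_muller(100_000)
print(np.mean(np.maximum(zeta[:, 0], 0.0)), 1.0 / np.sqrt(2 * np.pi))
```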
In particular, we will see further on in Chapter 7 that simulating the Euler scheme with step T/m of
a d-dimensional diffusion over [0, T] with an underlying q-dimensional Brownian motion consumes m
independent N(0; I_q)-distributed random vectors, i.e. m × q independent N(0; 1) random variables.
To perform a QMC simulation of a function of this Euler scheme at time T, we consequently should
consider a sequence with low discrepancy over [0, 1]^{mq}. Existing error bounds on sequences with
low discrepancy and the sparsity of functions with finite variation make essentially meaningless any
use of Koksma-Hlawka's inequality or Proïnov's theorem to produce error bounds. Not to mention that,
in the latter case, the curse of dimensionality will lead to extremely poor theoretical bounds for
Lipschitz functions (like f̃_2 in dimension 2).
on the sequence with low discrepancy of your choice) and organize a race M C vs QM C to compute
various calls, say CallBS (K, T ) (T = 1, K ∈ {95, 96, . . . , 104, 105}) in a Black-Scholes model (with
r = 2%, σ = 30%, x = 100, T = 1). To simulate this underlying Black-Scholes risky asset, first use
the closed expression
$$X_T^x = x\,e^{(r-\frac{\sigma^2}{2})T+\sigma\sqrt T\,Z},\qquad Z\sim\mathcal N(0;1).$$
(b) Anticipating on Chapter 7, implement the Euler scheme (7.3) of the Black-Scholes dynamics
$$\xi_n^i=\frac{n}{p_i},\ n\in\{1,\dots,p_i-1\},\qquad \xi_n^i=\frac{1}{p_i^2}+\frac{n-p_i}{p_i},\ n\in\{p_i,\dots,2p_i-1\},$$
so it is clear that the ith and the (i + 1)th components will remain highly correlated if i is close to
d and d is large: p_80 = 503 and p_81 = 509, . . .
To overcome this correlation observed for (not so) small values of n, the usual method is to
discard the first values of a sequence.
As a conclusion to this section, let us emphasize graphically in terms of texture the differences
between M C and QM C i.e. between (pseudo-)randomly generated points (say 60 000) and the
same number of terms of the Halton sequence (with p1 = 2 and p2 = 3).
Proposition 4.7 (a) Let a := (a^1, . . . , a^d) ∈ R^d; the sequence ({a + ξ_n})_{n≥1} is u.d., where
{x} denotes the component-wise fractional part of x = (x^1, . . . , x^d) ∈ R^d (defined as {x} =
({x^1}, . . . , {x^d})).
(b) Let U be a uniformly distributed random variable on [0, 1]^d. Then, for every a ∈ R^d,
$$\{a+U\}\overset{d}{=}U.$$
Figure 4.1: Left: 6·10^4 randomly generated points; Right: 6·10^4 terms of the Halton(2, 3) sequence.
Proof. (a) This follows from the (static) Weyl criterion: let p ∈ N^d \ {0},
$$\frac1n\sum_{k=1}^n e^{2\imath\pi(p|\{a+\xi_k\})}=\frac1n\sum_{k=1}^n e^{2\imath\pi(p^1(a^1+\xi_k^1)+\cdots+p^d(a^d+\xi_k^d))}=e^{2\imath\pi(p|a)}\,\frac1n\sum_{k=1}^n e^{2\imath\pi(p|\xi_k)}\longrightarrow0\quad\text{as }n\to+\infty.$$
(b) One easily checks that the random variable {a + U} (supported by [0, 1]^d) has the same characteristic
function as U, i.e. the same distribution. ♦
Consequently, if U is a uniformly distributed random variable on [0, 1]d and f : [0, 1]d → R is a
Riemann integrable function, the random variable
$$\chi=\chi(f,U):=\frac1n\sum_{k=1}^n f(\{U+\xi_k\})$$
satisfies
$$\mathbb E\,\chi=\frac1n\times n\,\mathbb E f(U)=\mathbb E f(U)$$
owing to claim (b). Then, starting from an M -sample U1 , . . . , UM , one defines the Monte Carlo
estimator of size M attached to χ, namely
$$\hat I(f,\xi)_{n,M}:=\frac1M\sum_{m=1}^M\chi(f,U_m)=\frac1{nM}\sum_{m=1}^M\sum_{k=1}^n f(\{U_m+\xi_k\}).$$
This estimator has a complexity approximately equal to κ × nM where κ is the unitary complexity
induced by the computation of one value of the function f. It straightforwardly satisfies, by the
Strong Law of Large Numbers,
$$\hat I(f,\xi)_{n,M}\xrightarrow{\ a.s.\ }\mathbb E f(U)$$
from what precedes, and a CLT
$$\sqrt M\,\big(\hat I(f,\xi)_{n,M}-\mathbb E f(U)\big)\xrightarrow{\ \mathcal L\ }\mathcal N\big(0;\sigma_n^2(f,\xi)\big)$$
with variance
$$\sigma_n^2(f,\xi)=\mathrm{Var}\Big(\frac1n\sum_{k=1}^n f(\{U+\xi_k\})\Big).$$
Hence, the specific rate of convergence of the QMC is irremediably lost. So, this hybrid method
should be compared to a regular Monte Carlo simulation of size nM through their respective variances. It is
clear that we will observe a variance reduction if and only if
$$\sigma_n^2(f,\xi)\le\frac{\mathrm{Var}\big(f(U)\big)}{n}.$$
One can show that f_u : v ↦ f({u + v}) has finite variation as soon as f has, with $\sup_{u\in[0,1]^d}V(f_u)<+\infty$
(more precise results can be established); then, by the Koksma-Hlawka inequality applied to f_U,
$$\big|\chi(f,U)-\mathbb E f(U)\big|\le\sup_{u\in[0,1]^d}V(f_u)\,D_n^*(\xi_1,\dots,\xi_n)\quad\text{a.s.}$$
so that, if ξ = (ξ_n)_{n≥1} is a sequence with low discrepancy (Halton, Kakutani, Sobol', etc.),
$$\sigma_n^2(f,\xi)\le C_\xi^2\,\frac{(\log n)^{2d}}{n^2},\qquad n\ge1.$$
Consequently, in that case, it is clear that randomized QMC provides a very significant variance
reduction (for the same complexity), of a magnitude proportional to $\frac{(\log n)^{2d}}{n}$ (with an impact of
magnitude $\frac{(\log n)^d}{\sqrt n}$ on the confidence interval). But one must keep in mind once again that such
functions become dramatically sparse as d increases.
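A minimal sketch of the random-shift estimator described above (uniform shifts of a fixed Halton point set; the smooth periodic test function and all names are illustrative assumptions):

```python
import numpy as np

def radical_inverse(n, p):
    x, base = 0.0, 1.0 / p
    while n > 0:
        n, d = divmod(n, p)
        x += d * base
        base /= p
    return x

def halton(n, primes):
    return np.array([[radical_inverse(k, p) for p in primes] for k in range(1, n + 1)])

def randomized_qmc(f, xi, M, d, rng):
    """Average over M i.i.d. uniform shifts U_m of the QMC means (1/n) sum_k f({U_m + xi_k});
    returns the estimate and the empirical standard error over the M replications."""
    chis = np.array([np.mean(f((rng.random(d) + xi) % 1.0)) for _ in range(M)])
    return chis.mean(), chis.std(ddof=1) / np.sqrt(M)

rng = np.random.default_rng(0)
d, n, M = 2, 1024, 64
xi = halton(n, (2, 3))
f = lambda u: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * u), axis=1)   # smooth, periodic, integral = 1
est, err = randomized_qmc(f, xi, M, d, rng)
print(est, "+/-", err)
```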
In fact, even better bounds can be obtained for some classes of functions whose Fourier coeffi-
cients c_p(f), p = (p_1, . . . , p_d) ∈ N^d, satisfy some decay rate assumptions as ||p|| := p_1 × · · · × p_d
goes to infinity since, in that case, one has $\sigma_n^2(f,\xi)\le\frac{C_{f,\xi}^2}{n^2}$, so that the gain in terms of variance
then becomes proportional to $\frac1n$ for such functions (a global budget/complexity being prescribed for
the simulation).
By contrast, if we consider the Lipschitz setting, things go radically differently: assume that
f : [0, 1]d → R is Lipschitz continuous and isotropically periodic, i.e. for every x ∈ [0, 1]d and
every vector ei = (δij )1≤j≤d , i = 1, . . . , d of the canonical basis of Rd (δij stands for the Kronecker
symbol) f (x+ei ) = f (x) as soon as x+ei ∈ [0, 1]d , then f can be extended as a Lipschitz continuous
function on the whole Rd with the same Lipschitz coefficient, say [f ]Lip . Furthermore, it satisfies
f (x) = f ({x}) for every x ∈ Rd . Then, it follows from Proı̈nov’s Theorem 4.2 that
$$\sup_{u\in[0,1]^d}\Big|\frac1n\sum_{k=1}^n f(\{u+\xi_k\})-\int_{[0,1]^d}f\,d\lambda_d\Big|^2\le[f]_{\rm Lip}^2\,D_n^*(\xi_1,\dots,\xi_n)^{\frac2d}\le C_d^2\,[f]_{\rm Lip}^2\,C_\xi^{\frac2d}\,\frac{(\log n)^2}{n^{\frac2d}}$$
(where C_d is Proïnov's constant). This time, still for a prescribed budget, the “gain” factor in
terms of variance is proportional to $n^{1-\frac2d}(\log n)^2$, which is not a gain. . . but a loss as soon as d ≥ 2!
For more results and details, we refer to the survey [153] on randomized QM C and the references
therein.
Finally, randomized QMC is a specific (and not so easy to handle) variance reduction method,
not a QMC speeding-up method. It also suffers from a drawback shared by all QMC-based
simulation methods: the sparsity of the class of functions with finite variation and the difficulty
to identify them in practice when d > 1.
$$I:(u^1,u^2)\mapsto\mathbf 1_{\{c\,u^1 g(\Psi(u^2))\le f(\Psi(u^2))\}}\ \text{is }\lambda_{r+1}\text{-a.s. continuous on }[0,1]^{r+1}\tag{4.10}$$
and ϕ ◦ Ψ is Riemann integrable.
This classically holds true if ϕ is continuous (see e.g. [30], Chapter 3).
Let ξ = (ξn1 , ξn2 )n≥1 be a [0, 1] × [0, 1]r -valued sequence, assumed to be with low discrepancy (or
simply uniformly distributed) over [0, 1]r+1 . Hence, (ξn1 )n≥1 and (ξn2 )n≥1 are in particular uniformly
distributed over [0, 1] and [0, 1]r respectively.
If (U, V) ∼ U([0, 1] × [0, 1]^r), then (U, Ψ(V)) ∼ (U, Y). Consequently, the product of two
Riemann integrable functions being Riemann integrable,
$$\frac{\sum_{k=1}^n\mathbf 1_{\{c\,\xi_k^1 g(\Psi(\xi_k^2))\le f(\Psi(\xi_k^2))\}}\,\varphi(\Psi(\xi_k^2))}{\sum_{k=1}^n\mathbf 1_{\{c\,\xi_k^1 g(\Psi(\xi_k^2))\le f(\Psi(\xi_k^2))\}}}\ \xrightarrow{\ n\to+\infty\ }\ \frac{\mathbb E\big(\mathbf 1_{\{c\,Ug(Y)\le f(Y)\}}\varphi(Y)\big)}{\mathbb P\big(c\,Ug(Y)\le f(Y)\big)}=\int_{\mathbb R^d}\varphi(x)f(x)\,dx=\mathbb E\,\varphi(X)\tag{4.11}$$
where the last two lines follow from computations carried out in Section 1.4.
The main obstacle to applying the method in a QMC framework is the a.s. continuity assump-
tion (4.10). The following proposition yields an easy and natural criterion.
Proposition 4.8 If the function $\frac fg\circ\Psi$ is λ_r-a.s. continuous on [0, 1]^r, then Assumption (4.10) is
satisfied.
In turn, this subset of [0, 1]^{r+1} is negligible for the Lebesgue measure λ_{r+1} since, coming back to
the independent random variables U and Y and having in mind that g(Y) > 0 P-a.s.,
$$\lambda_{r+1}\Big(\Big\{(\xi^1,\xi^2)\in[0,1]^{r+1}\ \text{s.t.}\ c\,\xi^1=\frac fg\circ\Psi(\xi^2)\Big\}\Big)=\mathbb P\big(c\,Ug(Y)=f(Y)\big)=\mathbb P\Big(U=\frac{f}{c\,g}(Y)\Big)=0$$
where we used (see exercise below) that U and Y are independent by construction and that U has
a diffuse distribution (no atom). ♦
Remark. The criterion of the proposition is trivially satisfied when $\frac fg$ and Ψ are continuous on
R^d and [0, 1]^r respectively.
Exercise. Show that if X and Y are independent and X or Y has no atom then
P(X = Y ) = 0.
As a conclusion, note that in this section we provide no rate of convergence for this acceptance-
rejection method by quasi-Monte Carlo. In fact there is no such error bound under realistic as-
sumptions on f, g, ϕ and Ψ. Only empirical evidence can justify its use in practice.
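A minimal sketch of the QMC acceptance-rejection ratio estimator (4.11), on a toy target chosen for illustration only (f(x) = 2x on [0, 1] dominated by c = 2 times the uniform density, Ψ(u) = u, ϕ(x) = x, so E ϕ(X) = 2/3):

```python
import numpy as np

def radical_inverse(n, p):
    x, base = 0.0, 1.0 / p
    while n > 0:
        n, d = divmod(n, p)
        x += d * base
        base /= p
    return x

n = 100_000
xi1 = np.array([radical_inverse(k, 2) for k in range(1, n + 1)])   # "uniform" coordinate
xi2 = np.array([radical_inverse(k, 3) for k in range(1, n + 1)])   # proposal coordinate
y = xi2                                   # Y = Psi(xi2), density g = 1 on [0,1]
accept = 2.0 * xi1 * 1.0 <= 2.0 * y       # c * xi1 * g(y) <= f(y), i.e. xi1 <= y
phi = y
print(np.sum(accept * phi) / np.sum(accept), 2.0 / 3.0)   # ratio estimator vs E phi(X)
```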
Optimal Vector Quantization is a method coming from Signal Processing devised to approximate a
continuous signal by a discrete one in an optimal way. Originally developed in the 1950’s (see [55]),
it was introduced as a quadrature formula for numerical integration in the early 1990’s (see [122]),
and for conditional expectation approximations in the early 2000’s, in order to price multi-asset
American style options [12, 13, 11]. In this brief chapter, we focus on the cubature formulas for
numerical integration with respect to the distribution of a random vector X taking values in Rd .
In view of applications, we only deal with the canonical Euclidean quadratic optimal quanti-
zation in R^d, although the general theory of optimal vector quantization can be developed in a much
more general framework, in finite dimension (general norms on R^d and general L^p-norms on the probability space (Ω, A, P))
and even in infinite dimension (so-called functional quantization, see [106, 131]).
We recall that the canonical Euclidean norm on the vector space Rd is denoted | . |.
The Borel sets C_i(x) are called Voronoi cells of the partition induced by x (note that, if x does not have
pairwise distinct components, then some of the cells C_i(x) are empty).
Remark. In the above definition | . | denotes the canonical Euclidean norm. However (see [66]), many
results in what follows can be established for any norm on R^d, except e.g. some differentiability
results on the quadratic distortion function (see Proposition 6.3.1 in Section 6.3.6).
Let {x1 , . . . , xN } be the set of values of the N -tuple x (whose cardinality is at most N ). The
nearest neighbour projection Projx : Rd → {x1 , . . . , xN } induced by a Voronoi partition is defined
by
$$\mathrm{Proj}_x(\xi):=\sum_{i=1}^N x_i\,\mathbf 1_{C_i(x)}(\xi),\qquad\xi\in\mathbb R^d.$$
Then, we define an x-quantization of X by
$$\hat X^x=\mathrm{Proj}_x(X)=\sum_{i=1}^N x_i\,\mathbf 1_{\{X\in C_i(x)\}}.\tag{5.1}$$
The pointwise error induced when replacing X by $\hat X^x$ is given by
$$|X-\hat X^x|=\mathrm{dist}\big(X,\{x_1,\dots,x_N\}\big)=\min_{1\le i\le N}|X-x_i|.$$
When X has a strongly continuous distribution , i.e. P(X ∈ H) = 0 for any hyperplane H of Rd ,
any two x-quantizations are P-a.s. equal.
Definition 5.1.2 The mean quadratic quantization error induced by the N-tuple x ∈ (R^d)^N is
defined as the quadratic norm of the pointwise error, i.e.
$$\|X-\hat X^x\|_2=\Big(\mathbb E\min_{1\le i\le N}|X-x_i|^2\Big)^{\frac12}=\Big(\int_{\mathbb R^d}\min_{1\le i\le N}|\xi-x_i|^2\,\mathbb P_X(d\xi)\Big)^{\frac12}.$$
We briefly recall some classical facts about theoretical and numerical aspects of Optimal Quan-
tization. For further details, we refer e.g. to [66, 129, 130, 124, 131].
Theorem 5.1.1 [66, 122] Let X ∈ L^2_{R^d}(P). The quadratic distortion function at level N (the
squared mean quadratic quantization error on (R^d)^N)
$$x=(x_1,\dots,x_N)\longmapsto\mathbb E\min_{1\le i\le N}|X-x_i|^2=\|X-\hat X^x\|_2^2\tag{5.2}$$
reaches a minimum at (at least) one N-tuple $x^{*,N}\in(\mathbb R^d)^N$.
(Proof to be added.)
This leads naturally to the following definition.
Definition 5.1.3 Any N -tuple solution to the above distortion minimization problem is called an
optimal N -quantizer or an optimal quantizer at level N .
Proposition 5.1.1 [122, 124] Any L^2-optimal N-quantizer x ∈ (R^d)^N is stationary in the follow-
ing sense: for every Voronoi quantization $\hat X^x$ of X,
$$\mathbb E\big(X\,|\,\hat X^x\big)=\hat X^x.\tag{5.3}$$
$$\mathbb E\big(X\,|\,\hat X^x\big)=\sum_{i=1}^N y_i\,\mathbf 1_{\{X\in C_i(x)\}}.$$
On the other hand, by the very definition of conditional expectation as an orthogonal projection
on A^* and the fact that $\hat X^x$ obviously lies in $L^2_{\mathbb R^d}(\Omega,\mathcal A^*,\mathbb P)$,
$$\big\|X-\mathbb E(X\,|\,\hat X^x)\big\|_2\le\big\|X-\hat X^x\big\|_2.$$
Remark. • It is shown in [66] (Theorem 4.2, p. 38) that optimal quantizers have an additional property:
the boundaries of any of their Voronoi partitions are P_X-negligible (even if P_X does have atoms).
• Let x ∈ (R^d)^N be an N-tuple with pairwise distinct components whose Voronoi partitions (all)
have a P-negligible boundary, i.e. $\mathbb P\big(\bigcup_{i=1}^N\partial C_i(x)\big)=0$. Then the Voronoi quantization $\hat X^x$ of
X is P-a.s. unique and, if x is a local minimum of the quadratic distortion function, then x is a
stationary quantizer, still in the sense that
$$\mathbb E\big(X\,|\,\hat X^x\big)=\hat X^x.$$
This is an easy consequence of the differentiability result of the quadratic distortion established
further on in Proposition 6.3.1 in Section 6.3.6.
Figure 5.1 shows a quadratic optimal – or at least close to optimality – quantization grid for a
bivariate normal distribution N (0, I2 ). The rate of convergence to 0 of the optimal quantization
error is ruled by the so-called Zador Theorem.
Figure 5.2 illustrates on the bi-variate normal distribution the intuitive fact that optimal quan-
tization does not produce quantizers whose Voronoi cells all have the same “weights” . In fact,
optimizing a quantizer tends to equalize the local inertia of each cell i.e. E 1{X∈Ci (x)} |X − xi |2 ,
i = 1, . . . , N .
Figure 5.1: Optimal quadratic quantization of size N = 200 of the bi-variate normal distribution
N (0, I2 ).
Figure 5.2: Voronoi tessellation of an optimal N-quantizer (N = 500). Color code: the heavier
the cell is for the normal distribution, the lighter the cell looks.
Theorem 5.1.2 (Zador's Theorem) Let d ≥ 1. (a) Sharp rate (see [66]). Let $X\in L^{2+\delta}_{\mathbb R^d}(\mathbb P)$
for some δ > 0. Assume $\mathbb P_X(d\xi)=\varphi(\xi)\,\lambda_d(d\xi)+\nu(d\xi)$, ν ⊥ λ_d (λ_d Lebesgue measure on R^d).
Then, there is a constant $\tilde J_{2,d}\in(0,\infty)$ such that
$$\lim_{N\to+\infty}N^{\frac1d}\min_{x\in(\mathbb R^d)^N}\|X-\hat X^x\|_2=\tilde J_{2,d}\left(\int_{\mathbb R^d}\varphi^{\frac{d}{d+2}}\,d\lambda_d\right)^{\frac12+\frac1d}.$$
(b) Non-asymptotic upper bound (see [107]). Let δ > 0. There exists a real constant $C_{d,\delta}\in(0,\infty)$ such that,
for every random vector $X\in L^{2+\delta}_{\mathbb R^d}(\mathbb P)$ and every N ≥ 1,
$$\min_{x\in(\mathbb R^d)^N}\|X-\hat X^x\|_2\le C_{d,\delta}\,\sigma_{2+\delta}(X)\,N^{-\frac1d},\qquad\sigma_{2+\delta}(X):=\min_{a\in\mathbb R^d}\|X-a\|_{2+\delta}.$$
Figure 5.3: Optimal N-quantization (N = 500) of (W_1, sup_{t∈[0,1]} W_t) depicted with its Voronoi
tessellation, W standard Brownian motion.
Remarks. • The $N^{\frac1d}$ factor is known as the curse of dimensionality: this is the optimal rate to
“fill” a d-dimensional space by 0-dimensional objects.
• The real constant $\tilde J_{2,d}$ clearly corresponds to the case of the uniform distribution U([0, 1]^d) over
the unit hypercube [0, 1]^d, for which the slightly more precise statement holds:
$$\lim_N N^{\frac1d}\min_{x\in(\mathbb R^d)^N}\|X-\hat X^x\|_2=\inf_N N^{\frac1d}\min_{x\in(\mathbb R^d)^N}\|X-\hat X^x\|_2=\tilde J_{2,d}.$$
One key of the proof is a self-similarity argument “à la Halton” which establishes the theorem for
the U([0, 1]^d) distributions.
• Zador's Theorem holds true for any general – possibly non-Euclidean – norm on R^d and the value
of $\tilde J_{2,d}$ depends on the reference norm on R^d. When d = 1, elementary computations show that
$\tilde J_{2,1}=\frac{1}{2\sqrt3}$. When d = 2, with the canonical Euclidean norm, one shows (see [117] for a proof,
see also [66]) that $\tilde J_{2,2}=\sqrt{\frac{5}{18\sqrt3}}$. Its exact value is unknown for d ≥ 3 but, still for the canonical
Euclidean norm, one has (see [66]), using some random quantization arguments,
$$\tilde J_{2,d}\sim\sqrt{\frac{d}{2\pi e}}\approx\sqrt{\frac{d}{17.08}}\quad\text{as }d\to+\infty.$$
The quantization-based cubature formula to approximate E(f(X)) then reads
$$\mathbb E\,f(\hat X^x)=\sum_{i=1}^N p_i\,f(x_i),\qquad p_i:=\mathbb P\big(X\in C_i(x)\big)$$
(see [122, 126]). As $\hat X^x$ is close to X, it is natural to estimate E(f(X)) by E(f($\hat X^x$)) when f is continuous. Furthermore, when f
is smooth enough, one can provide an upper bound for the resulting error using the quantization
error $\|X-\hat X^x\|_2$, or its square (when the quantizer x is stationary).
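A minimal one-dimensional sketch of this cubature formula for the N(0, 1) distribution, building a (near-)optimal quantizer by a Lloyd-type fixed point (this is an illustrative construction, not the pre-computed grids of the quoted web site; all names are hypothetical):

```python
import math

def norm_pdf(x): return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
def norm_cdf(x): return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_quantizer_normal(N, iters=200):
    """Lloyd fixed point for a near-optimal quadratic N-quantizer of N(0,1):
    each point is moved to the conditional mean of its Voronoi cell."""
    x = [-2.0 + 4.0 * (2 * i + 1) / (2 * N) for i in range(N)]    # initial grid
    for _ in range(iters):
        edges = [-math.inf] + [0.5 * (x[i] + x[i + 1]) for i in range(N - 1)] + [math.inf]
        new_x, p = [], []
        for i in range(N):
            a, b = edges[i], edges[i + 1]
            Fa = 0.0 if a == -math.inf else norm_cdf(a)
            Fb = 1.0 if b == math.inf else norm_cdf(b)
            fa = 0.0 if a == -math.inf else norm_pdf(a)
            fb = 0.0 if b == math.inf else norm_pdf(b)
            p.append(Fb - Fa)
            new_x.append((fa - fb) / (Fb - Fa))    # E(X | X in cell) for N(0,1)
        x = new_x
    return x, p

# Cubature: E f(X) ~ sum_i p_i f(x_i), here f(x) = max(x, 0), exact value 1/sqrt(2*pi)
x, p = lloyd_quantizer_normal(50)
print(sum(pi * max(xi, 0.0) for pi, xi in zip(p, x)), 1.0 / math.sqrt(2.0 * math.pi))
```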
The same idea can be used to approximate the conditional expectation E(f(X) | Y) by E(f(X̂) | Ŷ),
but one then also needs the transition probabilities of the pair (X̂, Ŷ).
If F : R^d → R is Lipschitz continuous, then
$$\big|\mathbb E\,F(X)-\mathbb E\,F(\hat X^x)\big|\le[F]_{\rm Lip}\,\|X-\hat X^x\|_1.$$
In fact, considering the Lipschitz continuous functional F(ξ) := dist(ξ, x) shows that
$$\|X-\hat X^x\|_1=\sup_{[F]_{\rm Lip}\le1}\big|\mathbb E\,F(X)-\mathbb E\,F(\hat X^x)\big|.\tag{5.5}$$
The Lipschitz continuous functionals making up a characterizing family for the weak convergence
of probability measures on R^d, one derives that, for any sequence of N-quantizers x^N satisfying
$\|X-\hat X^{x^N}\|_1\to0$ as N → +∞,
$$\sum_{1\le i\le N}\mathbb P\big(\hat X^{x^N}=x_i^N\big)\,\delta_{x_i^N}\ \overset{(\mathbb R^d)}{\Longrightarrow}\ \mathbb P_X$$
where $\overset{(\mathbb R^d)}{\Longrightarrow}$ denotes the weak convergence of probability measures on R^d.
so that
$$\mathbb E\,F(\hat X)\le\mathbb E\big(F(X)\big).$$
Taking conditional expectation given $\hat X^x$ yields
$$\Big|\mathbb E\big(F(X)\,|\,\hat X^x\big)-F(\hat X^x)-\mathbb E\big(DF(\hat X^x).(X-\hat X^x)\,|\,\hat X^x\big)\Big|\le[DF]_{\rm Lip}\,\mathbb E\big(|X-\hat X^x|^2\,|\,\hat X^x\big)$$
so that
$$\Big|\mathbb E\big(F(X)\,|\,\hat X^x\big)-F(\hat X^x)\Big|\le[DF]_{\rm Lip}\,\mathbb E\big(|X-\hat X^x|^2\,|\,\hat X^x\big).$$
In fact, the above inequality holds provided F is C 1 with Lipschitz continuous differential on
every Voronoi cell Ci (x). A similar characterization to (5.5) based on these functionals could be
established.
Some variant of these cubature formulae can be found in [129] or [67] for functions or functionals
F having only some local Lipschitz continuous regularity.
At this stage a natural question is to look for a priori estimates in L^p for the resulting error
given the L^p-quantization errors $\|X-\hat X\|_p$ and $\|Y-\hat Y\|_p$.
Write
$$\mathbb E\big(F(X)\,|\,Y\big)=\varphi_F(Y).$$
Usually, no closed form is available for the function ϕ_F but some regularity property can be es-
tablished, especially in a (Feller) Markovian framework. Thus assume that both F and ϕ_F are
Lipschitz continuous with Lipschitz coefficients [F]_Lip and [ϕ_F]_Lip.
Using this time the very definition of conditional expectation given Ŷ as the best quadratic ap-
proximation among σ(Ŷ)-measurable random variables, we get
$$\Big\|\mathbb E\big(F(X)\,|\,Y\big)-\mathbb E\big(\mathbb E(F(X)\,|\,Y)\,\big|\,\hat Y\big)\Big\|_2=\big\|\varphi_F(Y)-\mathbb E\big(\varphi_F(Y)\,|\,\hat Y\big)\big\|_2\le\big\|\varphi_F(Y)-\varphi_F(\hat Y)\big\|_2.$$
On the other hand, still using that $\mathbb E(\,.\,|\,\sigma(\hat Y))$ is an L^2-contraction and this time that F is Lipschitz
continuous yields
$$\big\|\mathbb E\big(F(X)-F(\hat X)\,|\,Y\big)\big\|_2\le\big\|F(X)-F(\hat X)\big\|_2\le[F]_{\rm Lip}\,\big\|X-\hat X\big\|_2.$$
Finally,
$$\Big\|\mathbb E\big(F(X)\,|\,Y\big)-\mathbb E\big(F(\hat X)\,|\,\hat Y\big)\Big\|_2^2\le[F]_{\rm Lip}^2\,\big\|X-\hat X\big\|_2^2+[\varphi_F]_{\rm Lip}^2\,\big\|Y-\hat Y\big\|_2^2.$$
L^p-case, p ≠ 2. In the non-quadratic case a counterpart of the above inequality remains valid for
the L^p-norm itself, provided [ϕ_F]_Lip is replaced by 2[ϕ_F]_Lip, namely
$$\Big\|\mathbb E\big(F(X)\,|\,Y\big)-\mathbb E\big(F(\hat X)\,|\,\hat Y\big)\Big\|_p\le[F]_{\rm Lip}\,\big\|X-\hat X\big\|_p+2[\varphi_F]_{\rm Lip}\,\big\|Y-\hat Y\big\|_p.$$
Several tests have been carried out and reported in [129, 127] to refine this a priori theoretical
bound. The benchmark was made of several options on a geometric index on d independent assets in a Black-
Scholes model: puts, put spreads and the same in a smoothed version, always without any control variate. Of
course, uncorrelated assets are not a realistic assumption but they are clearly more challenging as far as numerical
integration is concerned. Once the dimension d and the number of points N have been chosen, we compared
the resulting integration error with a one standard deviation confidence interval of the corresponding Monte
Carlo estimator for the same number of integration points N. The latter standard deviation was computed
thanks to a Monte Carlo simulation carried out using 10^4 trials.
The results turned out to be more favourable to quantization than predicted by the theoretical bounds,
mainly because we carried out our tests with rather small values of N whereas the curse of dimensionality is an
asymptotic phenomenon. Up to dimension 4, the larger N is, the more quantization outperforms Monte Carlo
simulation. When the dimension d ≥ 5, quantization outperforms Monte Carlo (in the above sense)
only up to a critical size N_c(d) which decreases as d increases.
In this section, we provide a method to push these critical sizes further, at least for smooth enough
functionals. Let F : R^d → R be a twice differentiable functional with Lipschitz continuous Hessian D^2F.
Let $(\hat X^{(N)})_{N\ge1}$ be a sequence of optimal quadratic quantizations. Then
$$\mathbb E(F(X))=\mathbb E\big(F(\hat X^{(N)})\big)+\frac12\,\mathbb E\,D^2F(\hat X^{(N)}).\big(X-\hat X^{(N)}\big)^{\otimes2}+O\big(\mathbb E|X-\hat X^{(N)}|^3\big).\tag{5.8}$$
Under some assumptions which are satisfied by most usual distributions (including the normal distribution),
it is proved in [67], as a special case of a more general result, that
$$\mathbb E|X-\hat X^{(N)}|^3=O\big(N^{-\frac3d}\big)$$
or at least (in particular when d = 2) $\mathbb E|X-\hat X^{(N)}|^3=O(N^{-\frac{3-\varepsilon}{d}})$, ε > 0. If furthermore we make the conjecture
that
$$\mathbb E\,D^2F(\hat X^{(N)}).\big(X-\hat X^{(N)}\big)^{\otimes2}=\frac{c_{F,X}}{N^{\frac2d}}+o\Big(\frac{1}{N^{\frac3d}}\Big),\tag{5.9}$$
one can use a Richardson-Romberg extrapolation to compute E(F(X)). Namely, one considers
two sizes N_1 and N_2 (in practice one often sets N_1 = N/2 and N_2 = N). Then, combining (5.8) with N_1 and
N_2,
$$\mathbb E(F(X))=\frac{N_2^{\frac2d}\,\mathbb E\big(F(\hat X^{(N_2)})\big)-N_1^{\frac2d}\,\mathbb E\big(F(\hat X^{(N_1)})\big)}{N_2^{\frac2d}-N_1^{\frac2d}}+O\left(\frac{1}{(N_1\wedge N_2)^{\frac1d}\big(N_2^{\frac2d}-N_1^{\frac2d}\big)}\right).$$
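A minimal sketch of this two-level combination, assuming the two quantization-based estimates are already available (the synthetic bias used for the check is an illustration, not data from the book's benchmark):

```python
def romberg_extrapolation(I_N1, I_N2, N1, N2, d):
    """Two-level Richardson-Romberg combination (N2 > N1) that removes the
    leading c / N^(2/d) bias term of the quantization-based cubature."""
    w1, w2 = N1 ** (2.0 / d), N2 ** (2.0 / d)
    return (w2 * I_N2 - w1 * I_N1) / (w2 - w1)

# Toy check with a synthetic bias I_N = I + c / N^(2/d) (hypothetical numbers)
I_true, c, d = 1.0, 0.3, 4
N1, N2 = 2048, 4096
I_N1 = I_true + c / N1 ** (2.0 / d)
I_N2 = I_true + c / N2 ** (2.0 / d)
print(I_N2, romberg_extrapolation(I_N1, I_N2, N1, N2, d))   # extrapolation recovers I_true
```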
Numerical illustration: In order to see the effect of the extrapolation technique described above,
numerical computations have been carried out in the case of the regularized version of some Put spread
options on geometric indices in dimension d = 4, 6, 8, 10. By “regularized”, we mean that the payoff at
maturity T has been replaced by its price function at time T' < T (usually, T' ≈ T). Numerical integration
was performed using the Gaussian optimal grids of size N = 2^k, k = 2, . . . , 12 (available at the web site
www.quantize.maths-fi.com).
We consider again one of the test functions implemented in [129] (pp. 152). These test functions were
borrowed from classical option pricing in mathematical finance, namely a Put spread option (on a geometric
index which is less classical). Moreover we will use a “regularized” version of the payoff. One considers d
independent traded assets S 1 , . . . , S d following a d-dimensional Black & Scholes dynamics (under its risk
neutral probability)
$$S^i_t=s_0^i\,\exp\Big(\big(r-\tfrac{\sigma^2}{2}\big)t+\sigma\sqrt t\,Z^{i,t}\Big),\qquad i=1,\dots,d,$$
where $Z^{i,t}=W^i_t/\sqrt t$ and W = (W^1, . . . , W^d) is a d-dimensional standard Brownian motion. Independence is
unrealistic but corresponds to the most unfavorable case for numerical experiments. We also assume that
S0i = s0 > 0, i = 1, . . . , d and that the d assets share the same volatility σ i = σ > 0. One considers,
the geometric index $I_t=\big(S^1_t\cdots S^d_t\big)^{\frac1d}$. One shows that $e^{-\frac{\sigma^2}{2}(\frac1d-1)t}\,I_t$ has itself a risk-neutral Black-Scholes
dynamics. We want to test the regularized Put spread option on this geometric index with strikes K_1 < K_2
(at time T/2). Let ψ(s_0, K_1, K_2, r, σ, T) denote the premium at time 0 of a Put spread on any of the assets
S^i.
Using the martingale property of the discounted value of the premium of a European option yields that the
premium $e^{-rT}\,\mathbb E\big((K_1-I_T)_+-(K_2-I_T)_+\big)$ of the Put spread option on I satisfies, on the one hand,
$$e^{-rT}\,\mathbb E\big((K_1-I_T)_+-(K_2-I_T)_+\big)=\psi\Big(s_0\,e^{\frac{\sigma^2}{2}(\frac1d-1)T},\,K_1,K_2,r,\sigma/\sqrt d,T\Big)$$
with $Z=(Z^{1,\frac T2},\dots,Z^{d,\frac T2})\sim\mathcal N(0;I_d)$. The numerical specifications of the function g are as follows:
s_0 = 100, K_1 = 98, K_2 = 102, r = 5%, σ = 20%, T = 2.
The results are displayed below (see Fig. 5.4) in a log-log scale for the dimensions d = 4, 6, 8, 10.
First, we recover the theoretical rates (namely −2/d) of convergence for the error bounds. Indeed, some
slopes β(d) can be derived (using a regression) for the quantization errors and we found β(4) = −0.48,
β(6) = −0.33, β(8) = −0.25 and β(10) = −0.23 (see Fig. 5.4). These rates plead for the
implementation of the Richardson-Romberg extrapolation. Also note that, as already reported in [129], when
d ≥ 5, quantization still outperforms MC simulations (in the above sense) up to a critical number N_c(d) of
points (N_c(6) ∼ 5000, N_c(7) ∼ 1000, N_c(8) ∼ 500, etc.).
As concerns the Richardson-Romberg extrapolation method itself, note first that it always gives better
results than “crude” quantization. As regards now the comparison with Monte Carlo simulation, no critical
number of points NRomb (d) comes out beyond which M C simulation outperforms Richardson-Romberg
extrapolation. This means that NRomb (d) is greater than the range of use of quantization based cubature
formulas in our benchmark, namely 5 000.
Romberg extrapolation techniques are commonly known to be unstable and, indeed, it has not always been
possible to estimate satisfactorily their rate of convergence on our benchmark. But when a significant
slope (in a log-log scale) can be estimated from the Richardson-Romberg errors (like for d = 8 and d = 10 in
Fig. 5.4 (c), (d)), its absolute value is larger than 1/2, and so, these extrapolations always outperform the M C
method even for large values of N . As a by-product, our results plead in favour of the conjecture (5.9) and
lead to think that Richardson-Romberg extrapolation is a powerful tool to accelerate numerical integration
by optimal quantization, even in higher dimension.
[Figure 5.4, four log-log panels: (a) d = 4, (b) d = 6, (c) d = 8 (QTF g4 slope −0.25, Romberg slope −1.2, MC), (d) d = 10 (QTF g4 slope −0.23, Romberg slope −0.8, MC).]
Figure 5.4: Errors and standard deviations as functions of the number of points N in a log-log-scale.
The quantization error is displayed by the cross + and the Richardson-Romberg extrapolation error
by the cross ×. The dashed line without crosses denotes the standard deviation of the Monte Carlo
estimator. (a) d = 4, (b) d = 6, (c) d = 8, (d) d = 10.
131] to deal with Lipschitz continuous functionals of the Brownian motion. Here, we will deal with
a finite-dimensional setting.
Let
$$\hat X^N=\mathrm{Proj}_x(X)$$
denote (one of) its (Borel) nearest neighbour projections. We also assume that we have access to the
numerical values of the “companion” probability distribution of x, that is the distribution of $\hat X^N$
given by
$$\mathbb P\big(\hat X^N=x_i\big)=\mathbb P\big(X\in C_i(x)\big),\qquad i=1,\dots,N,$$
where $(C_i(x))_{i=1,\dots,N}$ denotes the Voronoi tessellation of the N-quantizer induced by the above
nearest neighbour projection.
Let F : R^d → R^d be a Lipschitz continuous function such that F(X) ∈ L^2(P). In order to
compute E(F(X)), one writes
$$\mathbb E(F(X))=\mathbb E\big(F(\hat X^N)\big)+\mathbb E\big(F(X)-F(\hat X^N)\big)=\underbrace{\mathbb E\big(F(\hat X^N)\big)}_{(a)}+\underbrace{\frac1M\sum_{m=1}^M\Big(F\big(X^{(m)}\big)-F\big(\widehat{X^{(m)}}^N\big)\Big)}_{(b)}+R_{N,M}\tag{5.10}$$
where X^{(m)}, m = 1, . . . , M, are M independent copies of X, the symbol $\widehat{\ \cdot\ }^N$ denotes the nearest
neighbour projection on a fixed N-quantizer x ∈ (R^d)^N and R_{N,M} is a remainder term defined
by (5.10). Term (a) can be computed by quantization and Term (b) can be computed by a Monte
Carlo simulation. Then,
$$\|R_{N,M}\|_2=\frac{\sigma\big(F(X)-F(\hat X^N)\big)}{\sqrt M}\le\frac{\big\|F(X)-F(\hat X^N)\big\|_2}{\sqrt M}\le[F]_{\rm Lip}\,\frac{\big\|X-\hat X^N\big\|_2}{\sqrt M}.$$
Consequently, if F is simply a Lipschitz continuous function and if $(\hat X^N)_{N\ge1}$ is a sequence of
optimal quadratic quantizations of X, then
$$\|R_{N,M}\|_2\le\frac{\big\|F(X)-F(\hat X^N)\big\|_2}{\sqrt M}\le C_{2,\delta}\,[F]_{\rm Lip}\,\frac{\sigma_{2+\delta}(X)}{\sqrt M\,N^{\frac1d}}\tag{5.11}$$
where $C_{2,\delta}$ is the constant coming from the non-asymptotic version of Zador's Theorem.
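A minimal one-dimensional sketch of the hybrid decomposition (5.10) for X ∼ N(0, 1): term (a) is the cubature on a given grid with its companion weights, term (b) a Monte Carlo correction using nearest-neighbour projection (the crude illustrative grid below is an assumption, not an optimal quantizer):

```python
import numpy as np
from math import erf, sqrt, pi

def hybrid_estimator(F, grid, weights, M, rng):
    """Hybrid quantization-Monte Carlo estimator of E F(X), X ~ N(0,1):
    term (a) = sum_i p_i F(x_i); term (b) = MC average of F(X) - F(Xhat)."""
    x = np.asarray(grid)
    edges = 0.5 * (x[1:] + x[:-1])          # Voronoi cell boundaries of the sorted grid
    term_a = float(np.dot(weights, F(x)))
    X = rng.standard_normal(M)
    idx = np.searchsorted(edges, X)          # nearest-neighbour projection index
    term_b = float(np.mean(F(X) - F(x[idx])))
    return term_a + term_b

# Crude illustration grid with its companion weights p_i = P(X in C_i(x))
grid = np.linspace(-3.0, 3.0, 25)
mid = 0.5 * (grid[1:] + grid[:-1])
cdf = np.array([0.0] + [0.5 * (1 + erf(m / sqrt(2))) for m in mid] + [1.0])
weights = np.diff(cdf)
rng = np.random.default_rng(1)
print(hybrid_estimator(lambda x: np.maximum(x, 0.0), grid, weights, 10_000, rng),
      1.0 / sqrt(2 * pi))
```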
Proposition 5.1 (Universal stratification) Let (A_i)_{i∈I} be a stratification of R^d. For every
i ∈ I, define the local inertia of the random vector X on the stratum A_i by
$$\sigma_i^2:=\mathbb E\big(|X-\mathbb E(X\,|\,X\in A_i)|^2\,\big|\,X\in A_i\big),\qquad p_i:=\mathbb P(X\in A_i).$$
(a) Then, for every Lipschitz continuous function F : (R^d, | . |) → (R^d, | . |), the local inertias of F(X) satisfy
$$\sigma_{F,i}\le[F]_{\rm Lip}\,\sigma_i,\qquad i\in I.$$
Furthermore
$$\sum_{i\in I}p_i\,\sigma_i\ge\big\|X-\mathbb E\big(X\,|\,\sigma(\{X\in A_i\},\,i\in I)\big)\big\|_1.$$
Remark. Any real-valued Lipschitz continuous function can be seen as an R^d-valued
Lipschitz function, but then the above equalities (5.12), (5.13) and (5.14) only hold as inequalities.
owing to the very definition of conditional expectation as a minimizer w.r.t. the conditional
distribution. Now using that F is Lipschitz, it follows that
$$\sigma_{F,i}^2\le[F]_{\rm Lip}^2\,\frac1{p_i}\,\mathbb E\big(\mathbf 1_{\{X\in A_i\}}|X-\mathbb E(X\,|\,X\in A_i)|^2\big)=[F]_{\rm Lip}^2\,\sigma_i^2.$$
Equalities in (b) and (c) straightforwardly follow from (a). Finally, the monotony of the L^p-norms
implies
$$\sum_{i\in I}p_i\,\sigma_i\ge\big\|X-\mathbb E\big(X\,|\,\sigma(\{X\in A_i\},\,i\in I)\big)\big\|_1.\qquad\diamond$$
Ai = Ci (x), i ∈ I.
Then, for every i ∈ {1, . . . , N}, there exists a Borel function ϕ(x_i, .) : [0, 1]^q → R^d such that
$\varphi(x_i,U)\sim\mathcal L(X\,|\,X\in C_i(x))$ when $U\sim U([0,1]^q)$. In practice
we are led to consider situations where q ≥ 2 (or even infinite when considering a Von Neumann
acceptance-rejection method).
Now let (ξ, U) be a couple of independent random vectors such that $\xi\overset{d}{=}\hat X^x$ and $U\sim U([0,1]^q)$.
Then, one checks that
$$\varphi(\xi,U)\overset{d}{=}X$$
so that one may assume without loss of generality that X = ϕ(ξ, U), which in turn implies that
$\xi=\hat X^x$, i.e.
$$X=\varphi\big(\hat X^x,U\big),\qquad U\sim U([0,1]^q),\qquad U,\ \hat X^x\ \text{independent}.$$
In terms of implementation, as mentioned above, we need a closed formula for the function ϕ,
which induces some stringent constraints on the choice of the N-quantizers. In particular there is
no reasonable hope to use a true optimal quadratic quantizer for that purpose. A reasonable
compromise is to consider some optimal product quantization for which the function ϕ can easily
be made explicit (see Section 3.5).
Let Ad (N ) denote the family of all Borel partitions of Rd having (at most) N elements.
Proof. First we rewrite (5.13) and (5.14) in terms of quantization, i.e. with A_i = C_i(x):
$$\sup_{[F]_{\rm Lip}\le1}\sum_{i\in I}p_i\,\sigma_{F,i}^2\le\sum_{i\in I}p_i\,\sigma_i^2=\big\|X-\mathbb E(X\,|\,\hat X^x)\big\|_2^2\tag{5.15}$$
and
$$\sup_{[F]_{\rm Lip}\le1}\Big(\sum_{i\in I}p_i\,\sigma_{F,i}\Big)^2\le\Big(\sum_{i\in I}p_i\,\sigma_i\Big)^2,\qquad\sum_{i\in I}p_i\,\sigma_i\ge\big\|X-\mathbb E(X\,|\,\hat X^x)\big\|_1,\tag{5.16}$$
where we used the obvious fact that $\sigma(\{X\in C_i(x)\},\,i\in I)=\sigma(\hat X^x)$.
(a) It follows from (5.13) that
$$\inf_{(A_i)_{1\le i\le N}}\Big(\sum_{i\in I}p_i\,\sigma_i^2\Big)=\inf_{(A_i)_{1\le i\le N}}\big\|X-\mathbb E\big(X\,|\,\sigma(\{X\in A_i\},\,i\in I)\big)\big\|_2^2.$$
If $x^{*,N}$ denotes an optimal quadratic N-quantizer, then
$$\big\|X-\hat X^{x^{*,N}}\big\|_2=\min\Big\{\|X-Y\|_2,\ Y:(\Omega,\mathcal A,\mathbb P)\to\mathbb R^d,\ |Y(\Omega)|\le N\Big\}$$
and
$$\hat X^{x^{*,N}}=\mathbb E\big(X\,|\,\hat X^{x^{*,N}}\big).$$
$$\sum_{i\in I}p_i\,\sigma_i\ge\big\|X-\hat X^{x(A)}\big\|_1\ge\min_{x\in(\mathbb R^d)^N}\big\|X-\hat X^x\big\|_1.\qquad\diamond$$
As a conclusion, we see that the notion of universal stratification (with respect to Lipschitz con-
tinuous functions) and quantization are closely related since the variance reduction factor that can
be obtained by such an approach is essentially ruled by the quantization rate of the state space of
the random vector X by its distribution.
One dimension. In that case the method applies straightforwardly provided both the distribution
function F_X(u) := P(X ≤ u) of X (on R̄) and its right continuous (canonical) inverse on [0, 1],
denoted $F_X^{-1}$, are computable.
We also need the additional assumption that the N -quantizer x = (x1 , . . . , xN ) satisfies the
following continuity assumption
P(X = xi ) = 0, i = 1, . . . , N.
6.1 Motivation
In Finance, one often faces some optimization problems or zero search problems. The first ones
often reduce to the second one since, at least in a convex framework, minimizing a function amounts
to finding a zero of its gradient. The most commonly encountered examples are the extraction of
implicit parameters (implicit volatility of an option, implicit correlations in the credit markets
or for a single best-of-option), the calibration, the optimization of an exogenous parameters for
variance reduction (regression, Importance sampling, etc). All these situations share a common
feature: the involved functions all have a representation as an expectation, namely they read
h(y) = E H(y, Z) where Z is a q-dimensional random vector. The aim of this chapter is to provide
a toolbox – stochastic approximation – based on simulation to solve these optimization or zero
search problems. It can be seen as an extension of the Monte Carlo method.
Stochastic approximation can be presented as a probabilistic extension of Newton-Raphson-like
zero search recursive procedures of the form
$$y_{n+1}=y_n-\gamma_{n+1}\,h(y_n),\qquad n\ge0,\ y_0\in\mathbb R^d.\tag{6.1}$$
Let us make a connection with differential dynamical systems: when γ_n = γ > 0, Equa-
tion (6.1) is but the Euler scheme with step γ > 0 of the ODE
$$ODE_h\equiv\dot y=-h(y).$$
A Lyapunov function for $ODE_h$ is a function L : R^d → R_+ such that, for any solution t ↦ y(t)
of the equation, t ↦ L(y(t)) is non-increasing as t increases. If L is differentiable this is
mainly equivalent to the condition (∇L|h) ≥ 0 since
$$\frac{d}{dt}L(y(t))=\big(\nabla L(y(t))\,\big|\,\dot y(t)\big)=-(\nabla L|h)(y(t)).$$
If such a Lyapunov function does exist (which is not always the case!), the system is said to be
dissipative.
Basically one meets two frameworks:
– the function L is identified a priori, it is the object of interest for optimization purpose, and
one designs a function h from L e.g. by setting h = ∇L (or possibly h proportional to ∇L).
– The function of interest is h and one has to search for a Lyapunov function L (which may not
exist). This usually requires a deep understanding of the problem from a dynamical point of view.
This duality also occurs in discrete time Stochastic Approximation Theory from its very begin-
ning in the early 1950's (see [142, 82]).
As concerns the constraints on the Lyapunov function, due to the specificities of the discrete
time setting, we will require some further regularity assumption on ∇L, typically that ∇L is Lipschitz
continuous and that |∇L|^2 ≤ C(1 + L) ("essentially quadratic" property).
∀ x, y ∈ Rd , (h(y) − h(x)|y − x) ≥ 0
Now imagine that no straightforward access to numerical values of h(y) is available but that h
has an integral representation with respect to an R^q-valued random vector Z, say
$$h(y)=\mathbb E\,H(y,Z),\qquad H:\mathbb R^d\times\mathbb R^q\overset{\rm Borel}{\longrightarrow}\mathbb R^d,\qquad Z\overset{d}{=}\mu,\tag{6.2}$$
smoothen the chaotic (stochastic. . . ) effect induced by this “local” randomization. However one
should not make γ_n go to 0 too fast, so that an averaging effect occurs like in the Monte Carlo
method. In fact, one should impose that $\sum_n\gamma_n=+\infty$ to ensure that the initial value of the
procedure will be forgotten.
Based on this heuristic analysis, we can reasonably hope that the recursive procedure
$$Y_{n+1}=Y_n-\gamma_{n+1}\,H(Y_n,Z_{n+1}),\qquad n\ge0,\tag{6.3}$$
where
(Z_n)_{n≥1} is an i.i.d. sequence with distribution µ defined on (Ω, A, P)
and Y_0 is an R^d-valued random vector (independent of the sequence (Z_n)_{n≥1}) defined on the same
probability space, also converges to a zero y_* of h, at least under appropriate assumptions, to be
specified further on, on both H and the gain sequence γ = (γ_n)_{n≥1}.
What precedes can be seen as the “meta-theorem” of stochastic approximation. In this frame-
work, the Lyapunov functions mentioned above are called upon to ensure the stability of the
procedure.
A first toy-example: the Strong Law of Large Numbers. As a first example note that the sequence
of empirical means $(\bar Z_n)_{n\ge1}$ of an i.i.d. sequence (Z_n) of integrable random variables satisfies
$$\bar Z_{n+1}=\bar Z_n-\frac1{n+1}\big(\bar Z_n-Z_{n+1}\big),\qquad n\ge0,\ \bar Z_0=0,$$
i.e. a stochastic approximation procedure with H(y, Z) = y − Z and h(y) := y − z_* with z_* = E Z
(so that $Y_n=\bar Z_n$). Then the procedure converges a.s. (and in L^1) to the unique zero z_* of h.
The (weak) rate of convergence of $(\bar Z_n)_{n\ge1}$ is ruled by the CLT, which may suggest that the
generic rate of convergence of this kind of procedure is of the same type. In particular the deter-
ministic counterpart with the same gain parameter, $y_{n+1}=y_n-\frac1{n+1}(y_n-z_*)$, converges at a $\frac1n$-rate
to z_* (and this is clearly not the optimal choice for γ_n in this deterministic framework).
However, if we do not know the value of the mean z_* = E Z but if we are able to simulate µ-
distributed random vectors, the first, recursive stochastic procedure can easily be implemented
whereas the deterministic one cannot. The stochastic procedure we are speaking about is simply
the regular Monte Carlo method!
We know that $\sigma\mapsto Put_{BS}(x,K,\sigma,r,T)=e^{-rT}\,\mathbb E\,(K-X_T^{x,\sigma})_+$ is an even function, increasing on
(0, +∞), continuous, with $\lim_{\sigma\to0}Put_{BS}(x,K,\sigma,r,T)=(e^{-rT}K-x)_+$ and $\lim_{\sigma\to+\infty}Put_{BS}(x,K,\sigma,r,T)=e^{-rT}K$
(these bounds are model-free and can be directly derived by arbitrage arguments). Let
Pmarket (x, K, r, T ) ∈ [(e−rT K − x)+ , e−rT K] be a consistent mark-to-market price for the Put op-
tion with maturity T and strike price K. Then the implied volatility σimpl := σimpl (x, K, r, T ) is
defined as the unique (positive) solution to the equation
P utBS (x, K, σimpl , r, T ) = Pmarket (x, K, r, T )
i.e.
E e−rT (K − XTx,σ )+ − Pmarket (x, K, r, T ) = 0.
This naturally leads to devising the following stochastic algorithm to solve this equation numerically:
$$\sigma_{n+1}=\sigma_n-\gamma_{n+1}\Big(e^{-rT}\big(K-x\,e^{(r-\frac{\sigma_n^2}{2})T+\sigma_n\sqrt T\,Z_{n+1}}\big)_+-P_{\rm market}(x,K,r,T)\Big)$$
where (Z_n)_{n≥1} is an i.i.d. sequence of N(0; 1)-distributed random variables and the step sequence
γ = (γ_n)_{n≥1} is e.g. given by γ_n = c/n for some parameter c > 0. After a necessary “tuning” of the
constant c (try c = 2/(x + K)), one observes that
$$\sigma_n\longrightarrow\sigma_{\rm impl}\quad\text{a.s. as }n\to+\infty.$$
A priori one could imagine that σn could converge toward −σimpl (which would not be a real
problem) but it a.s. never happens because this negative solution is repulsive for the related ODEh
and “noisy”. This is an important topic often known in the literature as “how stochastic algorithms
never fall into noisy traps” (see [29, 96, 136], etc).
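A minimal sketch of this implied-volatility recursion (the "market" price is fabricated at a known volatility so that convergence can be checked; the Monte Carlo pricer, sample sizes and function names are illustrative assumptions):

```python
import numpy as np

def bs_put_mc(x, K, sigma, r, T, n_mc=400_000, rng=None):
    """Plain Monte Carlo Black-Scholes put price, used only to fabricate a
    'market' price at a known volatility for the check below."""
    rng = rng if rng is not None else np.random.default_rng(0)
    Z = rng.standard_normal(n_mc)
    ST = x * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.mean(np.maximum(K - ST, 0.0))

def implied_vol_sa(P_market, x, K, r, T, n_iter=200_000, sigma0=0.5, rng=None):
    """Robbins-Monro recursion sigma_{n+1} = sigma_n - gamma_{n+1} *
    (e^{-rT}(K - X_T^{x,sigma_n})_+ - P_market), with gamma_n = c/n, c = 2/(x+K)."""
    rng = rng if rng is not None else np.random.default_rng(1)
    c = 2.0 / (x + K)
    sigma = sigma0
    for n in range(1, n_iter + 1):
        Z = rng.standard_normal()
        payoff = np.exp(-r * T) * max(
            K - x * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z), 0.0)
        sigma -= (c / n) * (payoff - P_market)
    return sigma

x, K, r, T, sigma_true = 100.0, 100.0, 0.02, 1.0, 0.30
P_mkt = bs_put_mc(x, K, sigma_true, r, T)
print(implied_vol_sa(P_mkt, x, K, r, T), sigma_true)
```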
To conclude this introductory section, let us briefly come back to the case where h = ∇L and
L(y) = E(Λ(y, Z)), so that ∇L(y) = E H(y, Z) with $H(y,z):=\frac{\partial\Lambda(y,z)}{\partial y}$. The function H is sometimes
called a local gradient (of L) and the procedure (6.3) is known as a stochastic gradient procedure.
When Y_n converges to some zero y_* of h = ∇L at which the algorithm is “noisy enough” – say e.g.
E(H(y_*, Z) H(y_*, Z)^*) > 0 is a positive definite symmetric matrix – then y_* is necessarily a local minimum
of the potential L: y_* cannot be a trap. So, if L is strictly convex and $\lim_{|y|\to+\infty}L(y)=+\infty$, ∇L
has a single zero y_*, which is but the global minimum of L: the stochastic gradient turns out to be
a minimization procedure.
However, most recursive stochastic algorithms (6.3) are not stochastic gradients and the Lya-
punov function, if any, is not naturally associated to the algorithm: finding a Lyapunov function
to “stabilize” the algorithm (by bounding a.s. its paths, see Robbins-Siegmund Lemma below) is
often a difficult task which requires a deep understanding of the related ODEh .
As concerns the rate of convergence, one must keep in mind that it is usually ruled by a CLT at a 1/√γ_n rate, which can reach at most the √n-rate of the regular CLT. So, such a “toolbox” is clearly not competitive compared to a deterministic procedure when one is available, but this rate should be compared to that of the Monte Carlo method (i.e. the SLLN) since their fields of application are similar: stochastic approximation is the natural extension of the Monte Carlo method to solve inverse or optimization problems related to functions having a representation as an expectation of simulatable random functionals.
Recently, several contributions (see [4, 101, 102]) have drawn the attention of the quants world to stochastic approximation as a tool for variance reduction, extraction of implied parameters, model calibration, risk management, etc. It is also used in other fields of finance like algorithmic trading, as an online optimizing device for the execution of orders (see e.g. [94]). We will briefly discuss several (toy) examples of application.
(∇L|h) ≥ 0. (6.5)
Remarks and terminology. • The sequence (γn )n≥1 is called a step sequence or a gain parameter
sequence.
• If the function L satisfies (6.4), (6.5), (6.6) and moreover lim|y|→+∞ L(y) = +∞, then L is called
a Lyapunov function of the system like in Ordinary Differential Equation Theory.
• Note that the assumption (6.4) on L implies that ∇√(1+L) is bounded. Hence √L has at most a linear growth, so that L itself has at most a quadratic growth.
• In spite of the standard terminology, the step sequence does not need to be decreasing in As-
sumption (6.7).
• A careful reading of the proof below shows that the assumption ∑_{n≥1} γ_n = +∞ is not needed. However, we leave it in the statement because it is crucial for any application of this Lemma since it implies, combined with (iv), that
The key to the proof is the following convergence theorem for non-negative super-martingales (see [116]).
Theorem 6.2 Let (S_n)_{n≥0} be a non-negative super-martingale with respect to a filtration (F_n)_{n≥0} on a probability space (Ω, A, P) (i.e. for every n ≥ 0, S_n ∈ L¹(P) and E(S_{n+1}|F_n) ≤ S_n a.s.). Then S_n converges P-a.s. to an integrable (non-negative) random variable S_∞.
For general convergence theorems for sub-, super- and true martingales we refer to any standard course on Probability Theory or, preferably, to [116].
where
∆Mn+1 = H(Yn , Zn+1 ) − h(Yn ).
We aim at showing that ∆M_{n+1} is a (square integrable) F_n-martingale increment satisfying E(|∆M_{n+1}|² | F_n) ≤ C(1 + L(Y_n)) for an appropriate real constant C > 0.
First note that L(Y_n) ∈ L¹(P) and H(Y_n, Z_{n+1}) ∈ L²(P) for every n ≥ 0: this follows from (6.8) and an easy induction since
E|(∇L(Y_n)|H(Y_n, Z_{n+1}))| ≤ (1/2)(E|∇L(Y_n)|² + E|H(Y_n, Z_{n+1})|²) ≤ C(1 + E L(Y_n))
(where we used that |(a|b)| ≤ (1/2)(|a|² + |b|²), a, b ∈ R^d).
Now Yn being Fn -measurable and Zn+1 being independent of Fn
Now, one derives from the assumptions (6.7) and (6.9) that there exists a positive real constant C_L = C_{[∇L]_Lip} > 0 such that
S_n = ( L(Y_n) + ∑_{k=0}^{n−1} γ_{k+1}(∇L(Y_k)|h(Y_k)) + C_L ∑_{k≥n+1} γ_k² ) / ∏_{k=1}^{n} (1 + C_L γ_k²)
is a (non-negative) super-martingale with S_0 = L(Y_0) ∈ L¹(P); this uses that (∇L|h) ≥ 0. Hence it P-a.s. converges toward an integrable random variable S_∞. Consequently, using that ∑_{k≥n+1} γ_k² → 0, one gets
L(Y_n) + ∑_{k=0}^{n−1} γ_{k+1}(∇L|h)(Y_k) −→ S̃_∞ = S_∞ ∏_{n≥1}(1 + C_L γ_n²) ∈ L¹(P)  a.s. (6.10)
The super-martingale (S_n)_{n≥0} being L¹(P)-bounded, one derives likewise that (L(Y_n))_{n≥0} is L¹-bounded since
L(Y_n) ≤ ( ∏_{k=1}^{n} (1 + C_L γ_k²) ) S_n, n ≥ 0.
Now, a series with non-negative terms which is upper bounded by an (a.s.) converging sequence a.s. converges in R_+, so that
∑_{n≥0} γ_{n+1}(∇L|h)(Y_n) < +∞ P-a.s.
It follows from (6.10) that, P-a.s., L(Y_n) −→ L_∞ as n → +∞, where L_∞ is integrable since (L(Y_n))_{n≥0} is L¹-bounded.
Finally,
∑_{n≥1} E|∆Y_n|² ≤ ∑_{n≥1} γ_n² E(|H(Y_{n−1}, Z_n)|²) ≤ C ∑_{n≥1} γ_n² (1 + E L(Y_{n−1})) < +∞
so that ∑_{n≥1} |∆Y_n|² < +∞ a.s., which yields Y_n − Y_{n−1} → 0 a.s. ♦
Remark. The same argument which shows that (L(Y_n))_{n≥1} is L¹(P)-bounded shows that
∑_{n≥1} γ_n E(∇L|h)(Y_{n−1}) = E ∑_{n≥1} γ_n (∇L|h)(Y_{n−1}) < +∞.
Corollary 6.1 (a) Robbins-Monro algorithm. Assume that the mean function h of the algo-
rithm is continuous and satisfies
∀ y ∈ R^d, y ≠ y_∗, (y − y_∗ | h(y)) > 0 (6.11)
(which implies that {h = 0} = {y_∗}). Suppose furthermore that Y_0 ∈ L²(P) and that H satisfies
∀ y ∈ R^d, ‖H(y, Z)‖₂ ≤ C(1 + |y|).
Assume that the step sequence (γ_n)_{n≥1} satisfies (6.7). Then
Y_n −→ y_∗ a.s.
(b) Stochastic gradient. Assume that h = ∇L, where L is a Lyapunov function satisfying the assumptions of the Robbins-Siegmund Lemma with lim_{|y|→+∞} L(y) = +∞ and {∇L = 0} = {y_∗}, that ‖H(y, Z)‖₂² ≤ C(1 + L(y)) for every y ∈ R^d, and that the step sequence (γ_n)_{n≥1} satisfies (6.7). Then L(y_∗) = min_{R^d} L and
Y_n −→ y_∗ a.s.
Proof. (a) Assumption (6.11) is precisely the mean-reverting assumption related to the Lyapunov function L(y) = |y − y_∗|². The assumption on H is clearly the linear growth assumption (6.6) for this function L. Consequently, it follows from the above Robbins-Siegmund Lemma that
|Y_n − y_∗|² −→ L_∞ ∈ L¹(P) and ∑_{n≥1} γ_n (h(Y_{n−1})|Y_{n−1} − y_∗) < +∞ P-a.s.
If lim inf_n (Y_n(ω) − y_∗ | h(Y_n(ω))) > 0, the convergence of the above series would contradict ∑_{n≥1} γ_n = +∞. Hence there exists a subsequence (φ(n, ω))_n such that
(Y_{φ(n,ω)}(ω) − y_∗ | h(Y_{φ(n,ω)}(ω))) −→ 0 as n → +∞.
Now, since (Y_n(ω))_n is bounded, up to one further extraction one may assume that Y_{φ(n,ω)}(ω) → y_∞ = y_∞(ω). It follows that (y_∞ − y_∗ | h(y_∞)) = 0, which in turn implies that y_∞ = y_∗.
Now
lim_n |Y_n(ω) − y_∗|² = lim_n |Y_{φ(n,ω)}(ω) − y_∗|² = 0.
Finally, for every p ∈ (0, 2), (|Y_n − y_∗|^p)_{n≥0} is L^{2/p}(P)-bounded, hence uniformly integrable. As a consequence the a.s. convergence holds in L¹, i.e. Y_n → y_∗ in L^p(P).
(b) One may apply the Robbins-Siegmund Lemma with L as Lyapunov function since (h|∇L)(x) = |∇L(x)|² ≥ 0. The assumption on H is precisely the quadratic linear growth assumption. As a consequence
L(Y_n) −→ L_∞ ∈ L¹(P) and ∑_{n≥1} γ_n |∇L(Y_{n−1})|² < +∞ P-a.s.
Let ω ∈ Ω be such that L(Y_n(ω)) converges in R_+, ∑_{n≥1} γ_n |∇L(Y_{n−1}(ω))|² < +∞ and Y_n(ω) − Y_{n−1}(ω) → 0. The same argument as above shows that
lim inf_n |∇L(Y_n(ω))|² = 0.
From the convergence of L(Y_n(ω)) toward L_∞(ω) and lim_{|y|→∞} L(y) = +∞, one derives the boundedness of (Y_n(ω))_{n≥0}. Then there exists a subsequence (φ(n, ω))_{n≥1} such that Y_{φ(n,ω)} → ỹ, lim_n ∇L(Y_{φ(n,ω)}(ω)) = 0 and L(Y_{φ(n,ω)}(ω)) → L_∞(ω).
Then ∇L(ỹ) = 0, which implies ỹ = y_∗, so that L_∞(ω) = L(y_∗). The function L being non-negative, differentiable and going to infinity at infinity, it attains its unique global minimum at y_∗. In particular {L = L(y_∗)} = {∇L = 0} = {y_∗}. Consequently the only possible limiting value for the bounded sequence (Y_n(ω))_{n≥1} is y_∗, i.e. Y_n(ω) converges toward y_∗.
The L^p(P)-convergence to 0 of |∇L(Y_n)|, p ∈ (0, 2), follows by the same uniform integrability argument as in (a). ♦
Exercises. 1. Show that Claim (a) remains true if one only assumes that
y ↦ (h(y)|y − y_∗) is lower semi-continuous.
2 (Non-homogeneous L2 -strong law of large numbers by stochastic approximation).
Let (Zn )n≥1 be an i.i.d. sequence of square integrable random vectors. Let (γn )n≥1 be a sequence
of positive real numbers satisfying the decreasing step Assumption (6.7). Show that the recursive
procedure defined by
Yn+1 = Yn − γn+1 (Yn − Zn+1 )
a.s. converges toward y∗ = E Z1 .
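A minimal numerical illustration of this exercise (a sketch, not part of the text): with γ_n = c/n^ρ, ρ ∈ (1/2, 1], the recursion below estimates E Z_1; the step exponent, the constant and the sample size are arbitrary illustrative choices.

```python
import numpy as np

def recursive_mean(sample, rho=0.75, c=1.0):
    """Y_{n+1} = Y_n - gamma_{n+1} (Y_n - Z_{n+1}) with gamma_n = c / n**rho.

    For rho = 1, c = 1 this is exactly the empirical mean (the SLLN toy example);
    for rho in (1/2, 1) it is a genuine decreasing-step stochastic approximation."""
    y = 0.0
    for n, z in enumerate(sample, start=1):
        gamma = c / n**rho
        y -= gamma * (y - z)
    return y

# Example: estimate the mean of an exponential distribution (true value 1.0).
rng = np.random.default_rng(0)
print(recursive_mean(rng.exponential(1.0, size=100_000)))
```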
The above settings are in fact special cases of a more general result, the so-called “pseudo-gradient setting”, stated below. However its proof, in particular in a multi-dimensional setting, needs additional arguments, mainly the so-called ODE method (for Ordinary Differential Equation), originally introduced by Ljung (see [103]). The underlying idea is to consider a stochastic algorithm as a perturbed Euler scheme with decreasing step of the ODE ẏ = −h(y). For a detailed proof we refer to classical textbooks on Stochastic Approximation like [21, 43, 89].
Theorem 6.3 (Pseudo-Stochastic Gradient) Assume that L, h and the step sequence (γ_n)_{n≥1} satisfy all the assumptions of the Robbins-Siegmund Lemma. Assume furthermore that
lim_{|y|→+∞} L(y) = +∞ and (∇L|h) is lower semi-continuous.
Then, P(dω)-a.s. there exists ℓ = ℓ(ω) ≥ 0 and a connected component Y_∞(ω) of {(∇L|h) = 0} ∩ {L = ℓ} such that
Proof (One-dimensional case). We consider ω ∈ Ω for which all the conclusions of the Robbins-Siegmund Lemma hold. Combining Y_n(ω) − Y_{n−1}(ω) → 0 with the boundedness of the sequence (Y_n(ω))_{n≥1}, one can show that the set Y_∞(ω) of the limiting values of (Y_n(ω))_{n≥0} is a connected compact set (²).
On the other hand, Y_∞(ω) ⊂ {L = L_∞(ω)} since L(Y_n(ω)) → L_∞(ω). Furthermore, reasoning as in the proof of Claim (b) of the above Corollary shows that there exists a limiting value y_∗ ∈ Y_∞(ω) such that (∇L(y_∗)|h(y_∗)) = 0, so that y_∗ ∈ {(∇L|h) = 0} ∩ {L = L_∞(ω)}.
At this stage, we do assume that d = 1. Either Y_∞(ω) = {y_∗} and the proof is complete, or Y_∞(ω) is an interval, as a connected subset of R. The function L is constant on this interval, consequently the derivative L′ is zero on Y_∞(ω). Hence the conclusion. ♦
where ϕ : R^d → R is integrable with respect to the normalized Gaussian measure. In order to deal
with a consistent problem, we assume throughout this section that
P(ϕ(Z) 6= 0) > 0.
where A is a lower triangular matrix such that the covariance matrix R = AA∗ has diagonal entries
equal to 1 and φ is a (non-negative) continuous payoff function. This can also be the Gaussian
vector resulting from the Euler scheme of the diffusion dynamics of a risky asset, etc.
2
The method of proof is to first establish that Y∞ (ω) is a “bien enchaı̂né” set. A subset A ⊂ Rd is “bien enchaı̂né”
if for every a, a0 ∈ A, every ε > 0, there exists p ∈ N∗ , b0 , b1 , . . . , bp ∈ A such that b0 = a, bp = a0 , |bi − bi−1 | ≤ ε.
Any connected set A is “bien enchaı̂né” and the converse is true if A is compact.
One natural way to optimize the computation of the premium by Monte Carlo simulation is to choose, among the above representations depending on the parameter θ ∈ R^d, the one with the lowest variance. This means solving, at least roughly, the following minimization problem
min_{θ∈R^d} L(θ)
with
L(θ) = e^{−|θ|²} E( ϕ²(Z + θ) e^{−2(Z|θ)} )
since Var( e^{−|θ|²/2} ϕ(Z + θ) e^{−(θ|Z)} ) = L(θ) − (E ϕ(Z))². A reverse change of variable shows that
L(θ) = e^{|θ|²/2} E( ϕ²(Z) e^{−(Z|θ)} ). (6.13)
Hence, if E( ϕ(Z)² |Z| e^{a|Z|} ) < +∞ for every a ∈ (0, ∞), one can always differentiate L owing to Theorem 2.1(b), with
∇L(θ) = e^{|θ|²/2} E( ϕ²(Z) e^{−(θ|Z)} (θ − Z) ). (6.14)
Rewriting (6.13) as L(θ) = E( ϕ²(Z) e^{|θ−Z|²/2 − |Z|²/2} ) clearly shows that L is strictly convex since θ ↦ e^{|θ−z|²/2} is strictly convex for every z ∈ R^d (and ϕ(Z) is not identically 0). Furthermore, Fatou's Lemma implies lim_{|θ|→+∞} L(θ) = +∞.
Consequently, L has a unique global minimum θ_∗ which is also local and hence satisfies ∇L(θ_∗) = 0.
We now prove the classical lemma which shows that if L is strictly convex then θ ↦ |θ − θ_∗|² is a Lyapunov function (in the Robbins-Monro sense defined by (6.11)).
Proof. (a) One introduces the differentiable function defined on the unit interval
Consequently
∇L(θ) = 0 ⟺ E( ϕ²(Z − θ)(2θ − Z) ) = 0.
From now on, we assume that there exist two positive real constants a, C > 0 such that
0 ≤ ϕ(z) ≤ C e^{(a/2)|z|}, z ∈ R^d. (6.15)
Exercises. 1. Show that under this assumption, E ϕ(Z)2 |Z|e|θ||Z| < +∞ for every θ ∈ Rd
(in the sense of symmetric matrices) which proves again that L is strictly convex.
Taking this assumption into account, we set
H_a(θ, z) = e^{−a(|θ|²+1)^{1/2}} ϕ(z − θ)² (2θ − z). (6.16)
One checks that E|H_a(θ, Z)|² ≤ C′(1 + |θ|²), since |Z| has a Laplace transform defined on the whole real line, which in turn implies E( e^{a|Z|}|Z|^m ) < +∞ for every m ≥ 0, so that h_a is continuous, (θ − θ_∗ | h_a(θ)) > 0 for every θ ≠ θ_∗ and {h_a = 0} = {θ_∗}.
Applying Corollary 6.1(a) (known as the Robbins-Monro Theorem), one derives that for any step sequence γ = (γ_n)_{n≥1} satisfying (6.7), the sequence (θ_n)_{n≥0} defined by
θ_{n+1} = θ_n − γ_{n+1} H_a(θ_n, Z_{n+1}), n ≥ 0, (6.17)
where (Z_n)_{n≥1} is an i.i.d. sequence with distribution N(0; I_d) and θ_0 is an independent R^d-valued random vector, both defined on a probability space (Ω, A, P), satisfies
θ_n −→ θ_∗ a.s. as n → ∞.
Remarks. • The only reason for introducing √(|θ|²+1) is that this function behaves like |θ| (which can be successfully implemented in practice) but is also everywhere differentiable, which simplifies the discussion about the rate of convergence detailed further on.
• Note that no regularity assumption is made on the payoff ϕ.
• An alternative based on large deviation principle but which needs some regularity assumption on
the payoff ϕ is developed in [58]. See also [137].
• To prevent a possible “freezing” of the procedure, for example when the step sequence has been misspecified or when the payoff function is too anisotropic, one can replace the above procedure (6.17) by the following fully data-driven variant of the algorithm
∀ n ≥ 0, θ̃_{n+1} = θ̃_n − γ_{n+1} H̃_a(θ̃_n, Z_{n+1}), θ̃_0 = θ_0, (6.18)
where
H̃_a(θ, z) := ϕ(z − θ)² (2θ − z) / (1 + ϕ(−θ)²).
This procedure also converges a.s. under a sub-multiplicativity assumption on the payoff function ϕ (see [102]).
• A final – and often crucial – trick to boost the convergence when dealing with rare events, as is often the case with importance sampling, is to “drive” a parameter from a “regular” value to the value that makes the event rare. Typically, when trying to reduce the variance of a deep-out-of-the-money Call option as in the numerical illustrations below, a strategy can be to implement the above algorithm with a slowly varying strike K_n which goes from K_0 = x_0 to the “target” strike K (see below) during the first iterations.
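A sketch, in Python, of the data-driven variant (6.18) for a scalar payoff of a Gaussian variable; the payoff, step constant, iteration number and seed are illustrative assumptions, not prescriptions of the text.

```python
import numpy as np

def theta_star_search(phi, d=1, n_iter=100_000, c=1.0, seed=0):
    """Data-driven recursion (6.18):
    theta_{n+1} = theta_n - gamma_{n+1} * phi(Z-theta)^2 (2 theta - Z) / (1 + phi(-theta)^2),
    with gamma_n = c/n. Returns an approximation of the variance reducer theta*."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    for n in range(1, n_iter + 1):
        z = rng.standard_normal(d)
        h = phi(z - theta) ** 2 * (2.0 * theta - z) / (1.0 + phi(-theta) ** 2)
        theta -= (c / n) * h
    return theta

# Illustrative payoff: out-of-the-money call-type functional of a Gaussian variable.
phi = lambda z: max(np.exp(0.5 * z.sum()) - 3.0, 0.0)
print(theta_star_search(phi))
```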
– About the rate of convergence: Assume that the step sequence has the parametric form γ_n = α/(β + n), n ≥ 1, and that Dh_a(θ_∗) is positive in the following sense: all the eigenvalues of Dh_a(θ_∗) have a positive real part. Then, the rate of convergence of θ_n toward θ_∗ is ruled by a CLT (at rate √n) if and only if (see 6.4.3)
α > 1/(2 ℜe(λ_{a,min})) > 0
where λ_{a,min} is the eigenvalue of Dh_a(θ_∗) with the lowest real part. Moreover, one can show that the (theoretical. . . ) best choice for α is α_opt := 1/ℜe(λ_{a,min}). (We give in Section 6.4.3 some insight about the asymptotic variance.) Let us focus now on Dh_a(θ_∗).
Starting from the expression
h_a(θ) = e^{−a(|θ|²+1)^{1/2}} e^{−|θ|²} ∇L(θ)
       = e^{−a(|θ|²+1)^{1/2} − |θ|²/2} E( ϕ(Z)² (θ − Z) e^{−(θ|Z)} )   by (6.14)
       = g_a(θ) E( ϕ(Z)² (θ − Z) e^{−(θ|Z)} ).
Then
Dh_a(θ) = g_a(θ) E( ϕ(Z)² e^{−(θ|Z)} (I_d + ZZ^t − θZ^t) ) + e^{−|θ|²/2} ∇L(θ) ⊗ ∇g_a(θ)
(where u ⊗ v = [u_i v_j]_{i,j}). Using that ∇L(θ_∗) = h_a(θ_∗) = 0 (so that h_a(θ_∗)(θ_∗)^t = 0),
Dh_a(θ_∗) = g_a(θ_∗) E( ϕ(Z)² e^{−(θ_∗|Z)} (I_d + (Z − θ_∗)(Z − θ_∗)^t) ).
Hence Dh_a(θ_∗) is a positive definite symmetric matrix. Its lowest eigenvalue λ_{a,min} satisfies
λ_{a,min} ≥ g_a(θ_∗) E( ϕ(Z)² e^{−(θ_∗|Z)} ) > 0.
These computations show that if the behaviour of the payoff ϕ at infinity is mis-evaluated, this leads to a bad calibration of the algorithm. Indeed, if one considers two real numbers a, a′ satisfying (6.15) with 0 < a < a′, then one checks, with obvious notations, that
1/(2λ_{a,min}) = (g_{a′}(θ_∗)/g_a(θ_∗)) · 1/(2λ_{a′,min}) = e^{(a−a′)(|θ_∗|²+1)^{1/2}} · 1/(2λ_{a′,min}) < 1/(2λ_{a′,min}).
So the condition on α is more stringent with a′ than it is with a. Of course, in practice the user does not know these values (since she/he does not know the target θ_∗); however she/he will be led to consider higher values of α than requested, which will have the effect of deteriorating the asymptotic variance (see again 6.4.3).
The batch strategy is the simplest and most elementary one.
Phase 1: One first computes a hopefully good approximation of the optimal variance reducer, denoted θ_{n_0} for a large enough n_0 (it will remain fixed during the second phase devoted to the computation of E ϕ(Z)).
Phase 2: As a second step, one implements a Monte Carlo simulation based on ϕ(Z + θ_{n_0}) e^{−(θ_{n_0}|Z) − |θ_{n_0}|²/2}, i.e.
E ϕ(Z) = lim_M (1/M) ∑_{m=n_0+1}^{n_0+M} ϕ(Z_m + θ_{n_0}) e^{−(θ_{n_0}|Z_m) − |θ_{n_0}|²/2}
where (Z_m)_{m≥n_0+1} is an i.i.d. sequence of N(0, I_d)-distributed random vectors. This procedure satisfies a CLT with (conditional) variance L(θ_{n_0}) − (E ϕ(Z))² (given θ_{n_0}).
The adaptive strategy is to devise a procedure fully based on the simultaneous computation of the optimal variance reducer and of E ϕ(Z) from the same sequence (Z_n)_{n≥1} of i.i.d. N(0, I_d)-distributed random vectors used in (6.17). This approach was introduced in [4]. To be precise, this leads to devising the following adaptive estimator of E ϕ(Z):
(1/M) ∑_{m=1}^{M} ϕ(Z_m + θ_{m−1}) e^{−(θ_{m−1}|Z_m) − |θ_{m−1}|²/2}, M ≥ 1. (6.19)
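A sketch of the adaptive strategy in Python: the recursion on θ (here the data-driven variant (6.18)) and the estimator (6.19) are run on the same innovations Z_m, the m-th estimator term being computed with θ_{m−1} before θ is updated. The payoff and all numerical constants are illustrative assumptions.

```python
import numpy as np

def adaptive_is_estimator(phi, d=1, M=100_000, c=1.0, seed=0):
    """Adaptive importance sampling: theta follows the data-driven recursion and
    E[phi(Z)] is estimated as in (6.19) with the same innovations Z_m."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    acc = 0.0
    for m in range(1, M + 1):
        z = rng.standard_normal(d)
        # estimator term built with theta_{m-1} (i.e. before updating theta)
        acc += phi(z + theta) * np.exp(-np.dot(theta, z) - 0.5 * np.dot(theta, theta))
        # update of theta with the same innovation (variant (6.18)), gamma_m = c/m
        theta -= (c / m) * phi(z - theta) ** 2 * (2.0 * theta - z) / (1.0 + phi(-theta) ** 2)
    return acc / M, theta

phi = lambda z: max(np.exp(0.5 * z.sum()) - 3.0, 0.0)   # illustrative payoff
price, theta_star = adaptive_is_estimator(phi)
```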
As a first consequence, the estimator defined by (6.19) is unbiased. Now let us define the (F_M)-martingale
N_M := ∑_{m=1}^{M} ( ϕ(Z_m + θ_{m−1}) e^{−(θ_{m−1}|Z_m) − |θ_{m−1}|²/2} − E ϕ(Z) ) / m · 1_{{|θ_{m−1}|≤m}}.
It is clear that
E( ( ϕ(Z_m + θ_{m−1}) e^{−(θ_{m−1}|Z_m) − |θ_{m−1}|²/2} )² | F_{m−1} ) 1_{{|θ_{m−1}|≤m}} = L(θ_{m−1}) 1_{{|θ_{m−1}|≤m}} −→ L(θ_∗) a.s.
as m → +∞, which in turn implies that (N_m)_{m≥1} has square integrable increments, so that N_m ∈ L²(P) for every m ∈ N^∗ and
⟨N⟩_∞ ≤ sup_m L(θ_m) ∑_{m≥1} 1/m² < +∞ a.s.
Consequently (see Chapter 11, Proposition 11.4), N_M → N_∞ a.s., where N_∞ is an a.s. finite random variable. Finally, Kronecker's Lemma (see again Chapter 11, Lemma 11.1) implies
(1/M) ∑_{m=1}^{M} ( ϕ(Z_m + θ_{m−1}) e^{−(θ_{m−1}|Z_m) − |θ_{m−1}|²/2} − E ϕ(Z) ) 1_{{|θ_{m−1}|≤m}} −→ 0 as M → +∞.
Since θ_m → θ_∗ a.s., 1_{{|θ_{m−1}|≤m}} = 1 for large enough m, so that
(1/M) ∑_{m=1}^{M} ϕ(Z_m + θ_{m−1}) e^{−(θ_{m−1}|Z_m) − |θ_{m−1}|²/2} −→ E ϕ(Z) as M → +∞.
One can show, using the CLT for triangular arrays of martingale increments (see [70] and Chapter 11, Theorem ??), that
√M ( (1/M) ∑_{m=1}^{M} ϕ(Z_m + θ_{m−1}) e^{−(θ_{m−1}|Z_m) − |θ_{m−1}|²/2} − E ϕ(Z) ) −→^L N(0, (σ_∗)²).
Figure 6.1: B-S Vanilla Call option. T = 1, r = 0.10, σ = 0.5, X_0 = 100, K = 100. Left: convergence toward θ_∗ (up to n = 10 000). Right: Monte Carlo simulation of size M = 10⁶; dotted line: θ = 0, solid line: θ = θ_{10 000} ≈ θ_∗.
with the following parameters: T = 1, r = 0.10, σ = 0.5, x_0 = 100, K = 100. The Black-Scholes reference price of the Vanilla Call is 23.93.
The recursive optimization of θ has been achieved by running the data-driven version (6.18) with a sample (Z_n)_n of size 10 000. A first renormalization has been made prior to the computation: we considered the equivalent problem (as far as variance reduction is concerned) where the starting value of the asset is 1 and the strike is the moneyness K/X_0. The procedure was initialized at θ_0 = 1. (Using (3.9) would have led to setting θ_0 = −0.2.)
We did not try to optimize the choice of the step γ_n according to the results on the rate of convergence, but applied the heuristic rule that if the function H (here H_a) takes its (reasonable) values within a few units, then choosing γ_n = c · 20/(20+n) with c ≈ 1 (say e.g. c ∈ [1/2, 2]) leads to satisfactory performance of the algorithm.
The resulting value θ10 000 has been used in a standard Monte Carlo simulation of size M =
1 000 000 based on (6.12) and compared to a regular Monte Carlo simulation (i.e. with θ = 0). The
numerical results are as follows:
– θ = 0: Confidence interval (95 %) = [23.92, 24.11] (pointwise estimate: 24.02).
– θ = θ10 000 ≈ 1.51: Confidence interval (95 %) = [23.919, 23.967] (pointwise estimate: 23.94).
The gain ratio in terms of standard deviations is 42.69/11.01 = 3.88 ≈ 4. This is observed on most simulations we made, although the convergence of θ_n may be more chaotic than what is observed in the figure (where the convergence is almost instantaneous). The behaviour of the optimization of θ and of the Monte Carlo simulations is depicted in Figure 6.1. The alternative original “parametrized” version of the algorithm (H_a(θ, z)) with a = 2σ√T yields quite similar results (when implemented with the same step and the same starting value).
Further comments: As developed in [124], all that precedes can be extended to non-Gaussian random vectors Z provided their distribution has a log-concave probability density p satisfying, for some positive ρ,
log(p) + ρ| . |² is convex.
One can also replace the mean translation by other importance sampling procedures like those
based on the Esscher transform. This has applications e.g. when Z = XT is the value at time T
of a process belonging to the family of subordinated (to Brownian motion) Lévy processes i.e. of
the form Zt = WYt where Y is an increasing Lévy process independent of the standard Brownian
motion W (see [22, 147] for more insight on that topic).
for the two risky assets, where ⟨W¹, W²⟩_t = ρ t, ρ ∈ [−1, 1], denotes the correlation between W¹ and W² (that is, the correlation between the yields of the risky assets X¹ and X²).
In this market, we consider a best-of call option characterized by its payoff
( max(X_T^1, X_T^2) − K )_+.
A market of such best-of calls is a market of the correlation ρ (the respective volatilities being
obtained from the markets of vanilla options on each asset as implicit volatilities). In this 2-
dimensional B-S setting there is a closed formula for the premium involving the bi-variate standard
normal distribution (see [77]), but what follows can be applied as soon as the asset dynamics – or
their time discretization – can be simulated (at a reasonable cost, as usual. . . ).
We will use a stochastic recursive procedure to solve the inverse problem in ρ
P_BoC(x_0^1, x_0^2, K, σ_1, σ_2, r, ρ, T) = P_market (6.20)
where
P_BoC(x_0^1, x_0^2, K, σ_1, σ_2, r, ρ, T) := e^{−rT} E( max(X_T^1, X_T^2) − K )_+
  = e^{−rT} E( max( x_0^1 e^{µ_1 T + σ_1√T Z^1}, x_0^2 e^{µ_2 T + σ_2√T (ρ Z^1 + √(1−ρ²) Z^2)} ) − K )_+
where µ_i = r − σ_i²/2, i = 1, 2, and Z = (Z^1, Z^2) ∼ N(0; I_2).
We assume from now on that the mark-to-market premium Pmarket is consistent i.e. that Equa-
tion (6.20) in ρ has at least one solution, say ρ∗ . Once again, this is a toy example since in this
very setting, more efficient deterministic procedures can be called upon, based on the closed form
for the option premium. On the other hand, what we propose below is a universal approach.
The most convenient way to prevent edge effects due to the fact that ρ ∈ [−1, 1] is to use a
trigonometric parametrization of the correlation by setting
ρ = cos θ, θ ∈ R.
The function θ ↦ P_BoC(θ) := P_BoC(x_0^1, x_0^2, K, σ_1, σ_2, r, cos θ, T) is a 2π-periodic continuous function. Extracting the implied correlation from the market amounts to solving
P_BoC(θ) = P_market
where P_market is the quoted premium of the option (mark-to-market). We need an additional assumption (which is in fact necessary with almost any zero search procedure): we assume that
P_market ∈ ( min_θ P_BoC(θ), max_θ P_BoC(θ) ),
i.e. that the (consistent) value P_market is not an extremal value of P_BoC. So we are looking for a zero of the function h defined on R by
h(θ) = E H(θ, Z)
with
H(θ, z) := e^{−rT}( max( x_0^1 e^{µ_1 T + σ_1√T z^1}, x_0^2 e^{µ_2 T + σ_2√T (z^1 cos θ + z^2 sin θ)} ) − K )_+ − P_market
and Z = (Z^1, Z^2) ∼ N(0; I_2).
Proposition 6.1 Under the above assumptions made on P_market and the function P_BoC and if, moreover, the equation P_BoC(θ) = P_market has finitely many solutions on [0, 2π], the stochastic zero search recursive procedure defined by
θ_{n+1} = θ_n − γ_{n+1} H(θ_n, Z_{n+1}), n ≥ 0, θ_0 ∈ R,
with a step sequence satisfying the decreasing step assumption (6.7), a.s. converges toward a solution θ_∗ of P_BoC(θ) = P_market.
Proof. For every z ∈ R2 , θ 7→ H(θ, z) is continuous, 2π-periodic and dominated by a function g(z)
such that g(Z) ∈ L2 (P) (g is obtained by replacing z 1 cos θ + z 2 sin θ by |z 1 | + |z 2 | in the above
formula for H). One derives that the mean function h and θ 7→ E H 2 (θ, Z) are both continuous
and 2π-periodic as well (hence bounded).
The main difficulty in applying the Robbins-Siegmund Lemma is to find the appropriate Lyapunov function.
The quoted value P_market is not an extremum of the function P_BoC, hence ∫_0^{2π} h^±(θ) dθ > 0, where h^± := max(±h, 0). We consider any (fixed) solution θ_0 to the equation h(θ) = 0 and two real numbers β^± such that
0 < β^+ < ( ∫_0^{2π} h^+(θ) dθ ) / ( ∫_0^{2π} h^−(θ) dθ ) < β^−
and we set, for every θ ∈ R,
The function ℓ is clearly continuous and is 2π-periodic “on the right” on [θ_0, +∞) and “on the left” on (−∞, θ_0]. In particular it is a bounded function. Furthermore, owing to the definition of β^±,
∫_{θ_0}^{θ_0+2π} ℓ(θ) dθ > 0 and ∫_{θ_0−2π}^{θ_0} ℓ(θ) dθ < 0,
so that
lim_{θ→±∞} ∫_{θ_0}^{θ} ℓ(u) du = +∞.
As a consequence, there exists a real constant C > 0 such that the function
L(θ) = ∫_0^{θ} ℓ(u) du + C
is non-negative on the whole real line.
It remains to prove that L′ = ℓ is Lipschitz continuous. Calling upon the usual arguments to interchange expectation and differentiation (i.e. Theorem 2.1(b)), one shows that the function P_BoC is differentiable at every θ ∈ R \ 2πZ, with
P′_BoC(θ) = σ_2 √T E( 1_{{X_T^2 > max(X_T^1, K)}} X_T^2 (cos(θ) Z^2 − sin(θ) Z^1) ).
Furthermore,
|P′_BoC(θ)| ≤ σ_2 √T E( x_0^2 e^{µ_2 T + σ_2√T(|Z^1|+|Z^2|)} (|Z^2| + |Z^1|) ) < +∞,
so that P_BoC is clearly Lipschitz continuous on the interval [0, 2π], hence on the whole real line by periodicity. Consequently h and h^± are Lipschitz continuous, which in turn implies that ℓ is Lipschitz continuous as well.
Figure 6.2: B-S Best-of-Call option. T = 1, r = 0.10, σ_1 = σ_2 = 0.30, X_0^1 = X_0^2 = 100, K = 100. Left: convergence of θ_n toward a θ_∗ (up to n = 100 000). Right: convergence of ρ_n := cos(θ_n) toward −0.5.
Moreover, one can show that the equation PBoC (θ) = Pmarket has finitely many solutions on
every interval of length 2π.
One may apply Theorem 6.3 (for which we provide a self-contained proof in one dimension) to
derive that θn will converge toward a solution θ∗ of the equation PBoC (θ) = Pmarket . ♦
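A sketch of this implied-correlation search in Python, assuming the best-of call payoff written above; the quoted market price, the step constant, the starting angle and the number of iterations are illustrative assumptions.

```python
import numpy as np

def implied_correlation(x1, x2, K, sigma1, sigma2, r, T, p_market,
                        n_iter=200_000, c=1.0, theta0=np.pi / 2, seed=0):
    """Zero search theta_{n+1} = theta_n - gamma_{n+1} H(theta_n, Z_{n+1}), gamma_n = c/n,
    with H(theta, z) = exp(-rT)*(max(X_T^1(z1), X_T^2(z1*cos(theta)+z2*sin(theta))) - K)_+
    - P_market, and returns the implied correlation rho = cos(theta)."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = r - 0.5 * sigma1**2, r - 0.5 * sigma2**2
    sqT = np.sqrt(T)
    theta = theta0
    for n in range(1, n_iter + 1):
        z1, z2 = rng.standard_normal(2)
        xt1 = x1 * np.exp(mu1 * T + sigma1 * sqT * z1)
        xt2 = x2 * np.exp(mu2 * T + sigma2 * sqT * (z1 * np.cos(theta) + z2 * np.sin(theta)))
        h = np.exp(-r * T) * max(max(xt1, xt2) - K, 0.0) - p_market
        theta -= (c / n) * h
    return np.cos(theta)

# Parameters of the example in Figure 6.2; the market price 30.0 is purely illustrative.
rho = implied_correlation(100.0, 100.0, 100.0, 0.30, 0.30, 0.10, 1.0, p_market=30.0)
```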
Exercises. 1. Show that if x_0^1 ≠ x_0^2 or σ_1 ≠ σ_2, then P_BoC is continuously differentiable on the whole real line.
2. Extend what precedes to any payoff ϕ(X_T^1, X_T^2) where ϕ : R²_+ → R_+ is a Lipschitz continuous function. In particular, show without the help of differentiation that the corresponding function θ ↦ P(θ) is Lipschitz continuous.
where χ(σ) = (1 + |σ|) e^{−σ²T/2}. Justify carefully this choice for H and implement the algorithm with x = K = 100, r = 0.1 and a market price equal to 16.73. Choose the step parameter of the form γ_n = (c/x)(1/n) with c ∈ [0.5, 2] (this is simply a suggestion).
Warning. The above exercise is definitely a toy exercise! More efficient methods for extracting
standard implied volatility are available (see e.g. [109] which is based on a Newton algorithm; a
dichotomy approach is also very efficient).
Exercise (Extension to more general asset dynamics). One considers now a couple of risky assets following two correlated local volatility models,
where the functions σ_i : R²_+ → R_+ are Lipschitz continuous and bounded and the Brownian motions W¹ and W² are correlated with correlation ρ ∈ [−1, 1], so that
d⟨W¹, W²⟩_t = ρ dt.
(This ensures the existence and uniqueness of strong solutions for this SDE, see Chapter 7.)
Assume that we know how to simulate (X_T^1, X_T^2), either exactly or at least as an approximation by an Euler scheme, from a d-dimensional normal vector Z = (Z^1, . . . , Z^d) ∼ N(0; I_d).
Show that the above approach can be extended mutatis mutandis.
6.3.3 Application to correlation search (II): reducing variance using higher or-
der Richardson-Romberg extrapolation
This section requires the reading of Section 7.7, in which we introduced the higher order Richardson-Romberg extrapolation (of order R = 3). We showed that considering consistent increments in the (three) Euler schemes made it possible to control the variance of the Richardson-Romberg estimator of E(f(X_T)). However, we mentioned, as emphasized in [123], that this choice cannot be shown to be optimal as it can at the order R = 2 corresponding to standard Richardson-Romberg extrapolation.
Stochastic approximation can be a way to search for the optimal correlation structure between
the Brownian increments. For the sake of simplicity, we will start from the continuous time diffusion
dynamics. Let
dXti = b(Xti ) dt + σ(Xti ) dBti , i = 1, 2, 3,
where B = (B¹, B², B³) is a correlated Brownian motion with (symmetric) correlation matrix R = [r_{ij}]_{1≤i,j≤3} (r_{ii} = 1). We consider the 3rd-order Richardson-Romberg weights α_i, i = 1, 2, 3. The expectation E f(X_T) is estimated by E( α_1 f(X̄_T^1) + α_2 f(X̄_T^2) + α_3 f(X̄_T^3) ) using a Monte Carlo simulation. So our objective is to minimize the variance of this estimator, i.e. to solve the problem
min_R E( α_1 f(X̄_T^1) + α_2 f(X̄_T^2) + α_3 f(X̄_T^3) )²
The starting idea is to use the trigonometric representation of correlation matrices derived from the Cholesky decomposition (see e.g. [119]):
R = T T^∗, T lower triangular with unit-norm rows, parametrized by angles θ = (θ_{ij})_{1≤j<i≤3}.
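As an illustration (the exact parametrization of [119] is not reproduced here), a common trigonometric, spherical-coordinates parametrization of a 3×3 correlation matrix is sketched below; with this convention θ = 0 yields the all-ones matrix, i.e. consistent Brownian increments. The function name is an illustrative assumption.

```python
import numpy as np

def correlation_from_angles(theta21, theta31, theta32):
    """Lower triangular T(theta) with rows on the unit sphere; R = T T^t is then a
    correlation matrix (unit diagonal). theta = 0 gives the all-ones matrix."""
    T = np.array([
        [1.0, 0.0, 0.0],
        [np.cos(theta21), np.sin(theta21), 0.0],
        [np.cos(theta31), np.sin(theta31) * np.cos(theta32), np.sin(theta31) * np.sin(theta32)],
    ])
    return T @ T.T

print(correlation_from_angles(0.0, 0.0, 0.0))            # all-ones matrix (consistent increments)
print(correlation_from_angles(np.pi / 3, np.pi / 2, 0))  # a genuine correlation matrix
```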
In practice, at that stage one switches to the Euler schemes X̄^i of X^i, i = 1, 2, 3, with their respective steps. We proceed likewise with the θ-sensitivity processes (∂X_t^i/∂θ_{ij})_{1≤j≤i−1}, whose Euler schemes are denoted (∂X̄_t^i/∂θ_{ij})_{1≤j≤i−1}.
3
This means that the derivatives of the function are bounded and α-Hölder for some α > 0.
Suppose now that f : R → R is smooth enough (say differentiable, or differentiable λ-a.e. if L(X̄_T) has a density). Then, setting Φ_f(x) = ( ∑_{i=1}^{3} α_i f(x^i) )², we define the potential function L : R³ → R by
L(θ) = E Φ_f( X̄_T(θ) ), where X̄_T(θ) = (X̄_T^1, X̄_T^2, X̄_T^3) is simulated with the correlation matrix R(θ).
Our aim is to minimize this potential function, or at least to exhibit values of θ such that
L(θ) < L(0, 0, 0),
i.e. to do better than considering Euler schemes with consistent increments. Theoretical results from [123], as well as empirical evidence, strongly suggest that such θ do exist.
The function L is differentiable and its gradient at θ is given by
∇L(θ) := E( ∇Φ_f(X̄_T(θ))^t ( ∂X̄_T/∂θ_{ij}(θ) )_{1≤j<i≤3} ).
Then, one devises the resulting stochastic gradient procedure (Θ_m)_{m≥0} recursively defined by
Θ_{m+1} = Θ_m − γ_{m+1} ∇Φ_f( X̄_T^{(m)}(Θ_m) )^t ( ∂X̄_T^{(m)}/∂θ )(Θ_m), Θ_0 ∈ (0, 2π)³,
where ( X̄_T^{(m)}, ∂X̄_T^{(m)}/∂θ ), m ≥ 1, are independent copies of the joint Euler scheme at time T, computed using the correlation matrix associated with Θ_m.
By construction, the function L plays the role of a Lyapunov function for the algorithm (except for its behaviour at infinity. . . ).
Under natural assumptions, one shows that Θ_m a.s. converges toward a random vector Θ_∗ taking values in the zero set of ∇L. Although the function L is bounded, one can prove this convergence by taking advantage of the periodicity of L, by applying e.g. results from [50], and then by relying on results on traps (see [29, 96], etc.) to ensure that the limit lies in the subset of local minima of L. Hopefully it may converge to the global minimum of L, or at least induce a significant variance reduction with respect to the Richardson-Romberg extrapolation carried out with consistent Brownian increments (i.e. simulated from a unique underlying Brownian motion).
the “theoretical” price obtained with parameter θ and the quoted price. To make the problem consistent, we assume throughout this section that
Let S ∈ S_+(p, R) ∩ GL(p, R) be a (positive definite) matrix. The resulting inner product is defined by
∀ u, v ∈ R^p, ⟨u|v⟩_S := u^∗ S v
and the associated Euclidean norm | . |_S by |u|_S := √⟨u|u⟩_S.
A natural choice for the matrix S can be a simple diagonal matrix S = Diag(w1 , . . . , wp ) with
“weights” wi > 0, i = 1, . . . , p.
The paradigm of model calibration is to find the parameter θ_∗ that minimizes the “aggregated error” with respect to the | . |_S-norm. This leads to the following minimization problem
(C) ≡ argmin_θ |E Y_θ|_S = argmin_θ (1/2)|E Y_θ|²_S.
Here are two simple examples to illustrate this somewhat abstract definition.
where W is a standard Brownian motion. Then let (K_i, T_i)_{i=1,...,p} be p couples “maturity-strike price”. Set
θ := σ, Θ := (0, ∞)
and
Y_θ := ( e^{−rT_i}(X_{T_i}^{x,σ} − K_i)_+ − P_market(T_i, K_i) )_{i=1,...,p}
where P_market(T_i, K_i) is the mark-to-market price of the option with maturity T_i > 0 and strike price K_i.
2. Merton model (mini-krach). Now, for every x, σ, λ > 0, a ∈ (0, 1), set
X_t^{x,σ,λ,a} = x e^{(r − σ²/2 + λa)t + σW_t} (1 − a)^{N_t}, t ≥ 0,
where W is as above and N = (N_t)_{t≥0} is a standard Poisson process with jump intensity λ. Set
θ := (σ, λ, a), Θ := (0, ∞)² × (0, 1)
and
Y_θ = ( e^{−rT_i}(X_{T_i}^{x,σ,λ,a} − K_i)_+ − P_market(T_i, K_i) )_{i=1,...,p}.
We will also have to make simulability assumptions on Yθ and if necessary its derivatives with
respect to θ (see below) otherwise our simulation based approach would be meaningless.
At this stage, basically two approaches can be considered as far as solving this problem by
simulation is concerned:
• A more direct treatment based on the so-called Kiefer-Wolfowitz procedure, which is a kind of counterpart of the Robbins-Siegmund approach based on a finite difference method (with decreasing step) and which does not require the existence of a representation of ∇L as an expectation.
One checks – using the exercise “Extension to uniform integrability” that follows Theorem 2.1 –
that θ 7−→ EYθ is differentiable and that its Jacobian is given by
∂θ E Yθ = E ∂θ Yθ .
Then, the function L is differentiable everywhere on Θ and its gradient (with respect to the canonical
Euclidean norm) is given by
At this stage we need a representation of ∇L(θ) as an expectation. To this end, we construct, for every θ ∈ Θ, an independent copy Ỹ_θ of Y_θ defined as follows: we consider the product probability space (Ω², A^{⊗2}, P^{⊗2}) and set, for every (ω, ω̃) ∈ Ω², Y_θ(ω, ω̃) = Y_θ(ω) (extension of Y_θ to Ω², still denoted Y_θ) and Ỹ_θ(ω, ω̃) = Y_θ(ω̃). It is straightforward, by the product measure Theorem, that the two families (Y_θ)_{θ∈Θ} and (Ỹ_θ)_{θ∈Θ} are independent with the same distribution. From now on we will make the usual abuse of notation consisting in assuming that these two independent copies live on the probability space (Ω, A, P).
Now, one can write
The standard situation, as announced above, is that Yθ is a vector of payoffs written on d traded
risky assets, recentered by their respective quoted prices. The model dynamics of the d risky assets
depends on the parameter θ, say
where the price dynamics (X(θ)_t)_{t≥0} of the d traded assets is driven by a parametrized diffusion process
dX(θ)_t = b(θ, t, X_t(θ)) dt + σ(θ, t, X_t(θ)) dW_t, X_0(θ) = x_0(θ) ∈ R^d,
where W is an R^q-valued standard Brownian motion defined on a probability space (Ω, A, P), b is an R^d-valued vector field defined on Θ × [0, T] × R^d and σ is an M(d, q)-valued field defined on the same product space, both satisfying appropriate regularity assumptions.
The pathwise differentiability of Y_θ in θ needs that of X_t(θ) with respect to θ. This question is closely related to the θ-tangent process ( ∂X_t(θ)/∂θ )_{t≥0} of X(θ). A precise statement is provided in Section 9.2.2, which ensures that if b and σ are smooth enough with respect to the variable θ, then such a θ-tangent process does exist and is solution to a linear SDE (involving X(θ) in its coefficients).
Some differentiability properties are also required on the functions F_i in order to fulfill the above differentiability Assumption (i). In model calibrations on vanilla derivative products as performed in Finance, F_i is never everywhere differentiable – typically F_i(y) := e^{−rT_i}(y − K_i)_+ − P_market(T_i, K_i) – but, if X_t(θ) has an absolutely continuous distribution (i.e. a probability density) for every time t > 0 and every θ ∈ Θ, then F_i only needs to be differentiable outside a Lebesgue negligible subset of R_+. Finally, we can write formally
where W stands for an abstract random innovation taking values in an appropriate space. We denote the innovation by the capital letter W because, when the underlying dynamics is a Brownian diffusion or its Euler-Maruyama scheme, it refers to a finite-dimensional functional of (two independent copies of) the R^q-valued standard Brownian motion on an interval [0, T]: either (two independent copies of) (W_{T_1}, . . . , W_{T_p}) or (two independent copies of) the sequence (∆W_{kT/n})_{1≤k≤n} of Brownian increments with step T/n over the interval [0, T]. Thus, these increments naturally appear in the simulation of the Euler scheme (X̄_{kT/n}^n(θ))_{0≤k≤n} of the process (X(θ)_t)_{t∈[0,T]} when the latter cannot be simulated directly (see Chapter 7, entirely devoted to the Euler scheme of Brownian diffusions). Of course other situations may occur, especially when dealing with jump diffusions, where W usually becomes the increment process of the driving Lévy process.
In any case, we make the following reasonable meta-assumption:
– the process W is simulatable,
– the functional H(θ, w) can be easily computed for any input (θ, w).
Then, one may define recursively the following zero search algorithm for ∇L(θ) = E H(θ, W), by setting
θ_{n+1} = θ_n − γ_{n+1} H(θ_n, W^{n+1})
where (W^n)_{n≥1} is an i.i.d. sequence of copies of W and (γ_n)_{n≥1} is a sequence of steps satisfying the usual decreasing step assumptions
∑_n γ_n = +∞ and ∑_n γ_n² < +∞.
In such a general framework, of course, one cannot ensure that the functions L and H will satisfy the basic assumptions needed to make stochastic gradient algorithms converge, typically
∇L is Lipschitz continuous and ∀ θ ∈ Θ, ‖H(θ, .)‖₂ ≤ C(1 + √L(θ)),
or one of their numerous variants (see e.g. [21] for a large overview of possible assumptions). How-
ever, in many situations, one can make the problem fit into a converging setting by an appropriate
change of variable on θ or by modifying the function L and introducing an appropriate explicit
(strictly) positive “weight function ” χ(θ) that makes the product χ(θ)H(θ, W (ω)) fit with these
requirements.
Even so, the topological structure of the set {∇L = 0} can be nontrivial, in particular disconnected. Nonetheless, as seen in Proposition 6.3, one can show, under natural assumptions, that
The next step is that if ∇L has several zeros, they cannot all be local minima of L especially when
there are more than two of them (this is a consequence of the well-known Mountain-Pass Lemma,
see [79]). Some are local maxima or saddle points of various kinds. These equilibrium points
which are not local minima are called traps. An important fact is that, under some non-degeneracy
assumptions on H at such a parasitic equilibrium point θ∞ (typically E H(θ∞ , W )∗ H(θ∞ , W ) is
positive definite at least in the direction of an unstable manifold of h at θ∞ ), the algorithm will
a.s. never converge toward such a “trap”. This question has been extensively investigated in the
literature in various settings for many years (see [96, 29, 136, 20, 51]).
A final problem may arise from the incompatibility between the geometry of the parameter set Θ and the above recursive algorithm: for the recursion to be well defined, we need Θ to be left stable by (almost) all the mappings θ ↦ θ − γH(θ, w), at least for γ small enough. If this is not the case, we need to introduce some constraints on the algorithm by projecting it onto Θ whenever θ_n jumps outside Θ. This question was originally investigated in [31] when Θ is a convex set.
Once all these technical questions have been circumvented, we may state the following meta-
theorem which says that θn a.s. converges toward a local minimum of L.
At this stage it is clear that calibration looks like quite a generic problem for stochastic op-
timization and that almost all difficulties arising in the field of Stochastic Approximation can be
encountered when implementing such a (pseudo-)stochastic gradient to solve it.
Practical implementations of the Robbins-Siegmund approach point out a specific technical difficulty: the random functions θ ↦ Y_θ(ω) are not always pathwise differentiable (nor in the L^r(P)-sense, which could be enough). More importantly in some way, even if one shows that θ ↦ E Y_θ is differentiable, possibly by calling upon other techniques (log-likelihood method, Malliavin weights, etc.), the resulting representation for ∂_θ Y_θ may turn out to be difficult to simulate, requiring much programming care, whereas the random vectors Y_θ can be simulated in a standard way. In such a setting, an alternative is provided by the Kiefer-Wolfowitz algorithm (K-W), which combines the recursive stochastic approximation principle with a finite difference approach to differentiation.
The idea is simply to approximate the gradient ∇L by
∂L/∂θ^i (θ) ≈ ( L(θ + η^i e_i) − L(θ − η^i e_i) ) / (2η^i), 1 ≤ i ≤ p,
where (e_i)_{1≤i≤p} denotes the canonical basis of R^p and η = (η^i)_{1≤i≤p}. This finite difference term has an integral representation given by
(Y(θ, W) is related to the innovation W). Starting from this representation, we may derive a recursive updating formula for θ_n as follows:
θ_{n+1}^i = θ_n^i − γ_{n+1} ( Λ(θ_n + η_{n+1}^i e_i, W^{n+1}) − Λ(θ_n − η_{n+1}^i e_i, W^{n+1}) ) / (2η_{n+1}^i), 1 ≤ i ≤ p.
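A sketch of the Kiefer-Wolfowitz recursion in Python, for a generic simulatable criterion Λ(θ, W) with E Λ(θ, W) = L(θ); the step choices γ_n = c/n, η_n = c′/n^{1/4} and the toy criterion are illustrative assumptions.

```python
import numpy as np

def kiefer_wolfowitz(Lambda, theta0, n_iter=50_000, c=1.0, cp=1.0, seed=0):
    """Kiefer-Wolfowitz procedure: coordinate-wise finite differences of the simulated
    criterion Lambda(theta, W), with the same innovation W for all 2p evaluations.
    Steps: gamma_n = c/n, eta_n = cp/n**0.25 (one admissible choice)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = theta.size
    for n in range(1, n_iter + 1):
        gamma, eta = c / n, cp / n**0.25
        w = rng.standard_normal()                 # innovation shared by all evaluations
        grad = np.empty(p)
        for i in range(p):
            e = np.zeros(p); e[i] = eta
            grad[i] = (Lambda(theta + e, w) - Lambda(theta - e, w)) / (2.0 * eta)
        theta -= gamma * grad
    return theta

# Illustrative criterion: L(theta) = E sum_i (theta_i - W)^2 is minimized at theta_i = E W = 0.
Lambda = lambda theta, w: float(np.sum((theta - w) ** 2))
print(kiefer_wolfowitz(Lambda, theta0=[2.0, -1.0]))
```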
We reproduce below a typical convergence result for K-W procedures (see [21]) which is the natural
counterpart of the stochastic gradient framework.
Theorem 6.4 Assume that the function θ ↦ L(θ) is twice differentiable with a Lipschitz continuous Hessian. We assume that
∑_{n≥1} γ_n = ∑_{n≥1} η_n^i = +∞, ∑_{n≥1} γ_n² < +∞, η_n → 0, ∑_{n≥1} (γ_n/η_n^i)² < +∞.
A special case of this procedure in a linear framework is proposed in Chapter 9.1.2: the decreas-
ing step finite difference method for greeks computation. The traps problem for the K-W algorithm
(convergence toward a local minimum of L) has been more specifically investigated in [96].
Users must keep in mind that this procedure needs some care in the tuning of the step pa-
rameters γn and ηn . This may need some preliminary numerical experiments. Of course, all the
recommendations made for the R-Z procedures remain valid. For more details on the K-W proce-
dure we refer to [21].
Definition 6.1 The Value at Risk at level α ∈ (0, 1) is the (lowest) α-quantile of the distribution
of X i.e.
V @Rα (X) := inf{ξ | P(X ≤ ξ) ≥ α}. (6.22)
The Value at Risk does exist since lim_{ξ→+∞} P(X ≤ ξ) = 1, and it satisfies
V@R_α(X) = inf argmin_{ξ∈R} L(ξ).
Proof. The function L is clearly convex and Lipschitz continuous since both functions ξ ↦ ξ and ξ ↦ (x − ξ)_+ are convex and 1-Lipschitz continuous for every x ∈ R. If X has no atom then, calling upon Theorem 2.1(a), the function L is also differentiable on the whole real line with a derivative given, for every ξ ∈ R, by
L′(ξ) = 1 − (1/(1−α)) P(X > ξ) = (1/(1−α)) ( P(X ≤ ξ) − α ).
This follows from the interchange of differentiation and expectation allowed by Theorem 2.1(a), since ξ ↦ ξ + (1/(1−α))(X − ξ)_+ is differentiable at a given ξ_0 outside the event {X = ξ_0}, i.e. P-a.s. since X is atomless, and, on the other hand, is Lipschitz continuous in ξ with a ratio bounded by 1 + 1/(1−α). The second equality is obvious. Then L attains an absolute minimum, if any, at any solution ξ_α of the equation P(X > ξ_α) = 1 − α, i.e. P(X ≤ ξ_α) = α. Hence, L does attain a minimum at the value-at-risk (which is the lowest solution of this equation). Furthermore,
L(ξ_α) = ξ_α + E((X − ξ_α)_+) / P(X > ξ_α)
       = ( ξ_α E 1_{{X>ξ_α}} + E((X − ξ_α) 1_{{X>ξ_α}}) ) / P(X > ξ_α)
       = E( X 1_{{X>ξ_α}} ) / P(X > ξ_α)
       = E( X | {X > ξ_α} ).
and
lim_{ξ→+∞} L(−ξ)/ξ = lim_{ξ→+∞} ( −1 + (1/(1−α)) E(X/ξ + 1)_+ ) = −1 + 1/(1−α) = α/(1−α).
Exercise. Show that the conditional value-at-risk CV @Rα (X) is a consistent measure of risk i.e.
that it satisfies the following three properties
where
H(ξ, x) := 1 − (1/(1−α)) 1_{{x≥ξ}} = (1/(1−α)) ( 1_{{x<ξ}} − α ), (6.25)
a.s. converges toward the Value-at-Risk, i.e.
ξ_n −→ V@R_α(X) a.s.
Furthermore, the sequence (L(ξ_n))_{n≥0} is L¹-bounded, so that L(ξ_n) → CV@R_α(X) a.s. and in every L^p(P), p ∈ (0, 1].
Proof. First assume that ξ_0 ∈ L²(P). The sequence (ξ_n)_{n≥0} defined by (6.24) is the stochastic gradient related to the Lyapunov function L̃(ξ) = L(ξ) − E X but, owing to the convexity of L, it is more convenient to rely on Corollary 6.1(a) (Robbins-Monro algorithm) since it is clear that the function (ξ, x) ↦ H(ξ, x) is bounded by max(1, α/(1−α)), so that ξ ↦ ‖H(ξ, X)‖₂ is bounded as well. The conclusion directly follows from the Robbins-Monro setting.
In the general case – ξ_0 ∈ L¹(P) – one introduces the Lyapunov function L̃(ξ) = (ξ−ξ_α)² / √(1+(ξ−ξ_α)²), where we set ξ_α = V@R_α(X) for convenience. The derivative of L̃ is given by
L̃′(ξ) = (ξ−ξ_α)(2+(ξ−ξ_α)²) / (1+(ξ−ξ_α)²)^{3/2}.
One checks on the one hand that L̃′ is Lipschitz continuous over the real line (e.g. because L̃″ is bounded) and, on the other hand, that {L̃′ = 0} ∩ {L′ = 0} = {L′ = 0} = {V@R_α(X)}. Then Theorem 6.3 (Pseudo-Stochastic Gradient) applies and yields the announced conclusion. ♦
Exercises. 1. Show that if X has a bounded density f_X, then a direct application of the stochastic gradient convergence result (Corollary 6.1(b)) yields the announced result under the assumption ξ_0 ∈ L¹(P) [Hint: show that L′ is Lipschitz continuous].
2. Under the additional assumption of Exercise 1, show that the mean function h satisfies h′(x) = L″(x) = f_X(x)/(1−α). Deduce a way to optimize the step sequence of the algorithm based on the CLT for stochastic algorithms stated further on in Section 6.4.3.
Proposition 6.4 If X ∈ L^p(P) for some p > 0 and is atomless with a unique value-at-risk, and if ξ_0 ∈ L^p(P), then the algorithm (6.24) a.s. converges toward V@R_α(X).
Remark. The uniqueness of the value-at-risk can also be relaxed. The conclusion becomes that ξ_n a.s. converges to a random variable taking values in the “V@R_α(X) set” {ξ ∈ R | P(X ≤ ξ) = α} (see [19] for a statement in that direction).
Second step: adaptive computation of CV@R_α(X). The main aim of this section was to compute CV@R_α(X). How can we proceed? The idea is to devise a companion procedure of the above stochastic gradient. Still set temporarily ξ_α = V@R_α(X) for convenience. It follows from what precedes that
(L(ξ_0) + · · · + L(ξ_{n−1}))/n → CV@R_α(X) a.s.
owing to the Césaro principle. On the other hand, we know that, for every ξ ∈ R,
L(ξ) = E Λ(ξ, X) where Λ(ξ, x) = ξ + (x − ξ)_+/(1−α).
Using that X_{n+1} and (ξ_0, ξ_1, . . . , ξ_n) are independent, one can rewrite the above Césaro convergence as
E( (Λ(ξ_0, X_1) + · · · + Λ(ξ_{n−1}, X_n))/n ) −→ CV@R_α(X),
which naturally suggests that the sequence (C_n)_{n≥0} defined by C_0 = 0 and
C_n = (1/n) ∑_{k=0}^{n−1} Λ(ξ_k, X_{k+1}), n ≥ 1,
is a candidate to be an estimator of CV@R_α(X). This sequence can clearly be computed recursively since, for every n ≥ 0,
C_{n+1} = C_n − (1/(n+1)) ( C_n − Λ(ξ_n, X_{n+1}) ). (6.26)
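A sketch of the two companion recursions in Python – (6.24)–(6.25) for the V@R and (6.26) for the CV@R; the step sequence, the loss distribution and the sample size are illustrative assumptions.

```python
import numpy as np

def var_cvar_sa(sample_loss, alpha=0.95, c=1.0, rho=0.75, seed=0, n_iter=500_000):
    """Joint recursive estimation of V@R_alpha and CV@R_alpha:
    xi_{n+1} = xi_n - gamma_{n+1} * (1 - 1_{X_{n+1} >= xi_n}/(1-alpha))      # cf. (6.25)
    C_{n+1}  = C_n  - (C_n - Lambda(xi_n, X_{n+1})) / (n + 1)                # cf. (6.26)
    with Lambda(xi, x) = xi + (x - xi)_+/(1-alpha) and gamma_n = c/n**rho."""
    rng = np.random.default_rng(seed)
    xi, C = 0.0, 0.0
    for n in range(1, n_iter + 1):
        x = sample_loss(rng)
        Lam = xi + max(x - xi, 0.0) / (1.0 - alpha)
        C += (Lam - C) / n                         # Cesaro-type averaging of Lambda(xi_{n-1}, X_n)
        xi -= (c / n**rho) * (1.0 - float(x >= xi) / (1.0 - alpha))
    return xi, C

# Illustrative loss: standard normal (V@R_0.95 ~ 1.645, CV@R_0.95 ~ 2.06).
var_, cvar_ = var_cvar_sa(lambda rng: rng.standard_normal(), alpha=0.95)
```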
Proposition 6.5 Assume that X ∈ L^{1+ρ}(P) for some ρ ∈ (0, 1] and that ξ_n −→ V@R_α(X) a.s. Then
C_n −→ CV@R_α(X) a.s. as n → +∞.
Proof. We will prove this claim in detail in the quadratic case ρ = 1. The proof in the general case relies on the Chow Theorem (see [43] or the second exercise right after the proof). First, one decomposes
C_n − L(ξ_α) = (1/n) ∑_{k=0}^{n−1} ( L(ξ_k) − L(ξ_α) ) + (1/n) ∑_{k=1}^{n} Y_k
with Y_k := Λ(ξ_{k−1}, X_k) − L(ξ_{k−1}), k ≥ 1. It is clear that (1/n) ∑_{k=0}^{n−1} ( L(ξ_k) − L(ξ_α) ) → 0 as n → +∞ by Césaro's principle.
As concerns the second term, we first note that
Λ(ξ, x) − L(ξ) = (1/(1−α)) ( (x − ξ)_+ − E(X − ξ)_+ )
so that, x ↦ x_+ being 1-Lipschitz continuous,
|Λ(ξ, x) − L(ξ)| ≤ (1/(1−α)) E|X − ξ| ≤ (1/(1−α)) ( E|X| + |ξ| ).
Consequently, for every k ≥ 1,
E Y_k² ≤ (2/(1−α)²) ( (E X)² + E X² ).
We consider the natural filtration of the algorithm, F_n := σ(ξ_0, X_1, . . . , X_n). One checks that, for every k ≥ 1, E(Y_k | F_{k−1}) = 0, so that
N_n := ∑_{k=1}^{n} Y_k / k, n ≥ 1,
is an (F_n)-martingale. This martingale is in L²(P) for every n and its predictable bracket process is given by
⟨N⟩_n = ∑_{k=1}^{n} E(Y_k² | F_{k−1}) / k²
so that
E⟨N⟩_∞ ≤ sup_n E Y_n² × ∑_{k≥1} 1/k² < +∞.
Consequently N_n → N_∞ a.s. and in L² as n → +∞. Then the Kronecker Lemma (see Lemma 11.1) implies that
(1/n) ∑_{k=1}^{n} Y_k −→ 0 as n → +∞.
Remark. For practical implementation one may prefer estimating first the V @Rα (X) and, once
it is done, use a regular Monte Carlo procedure to evaluate the CV @Rα (X).
Exercises. 1. Show that an alternative method to compute CV@R_α(X) is to design the following recursive procedure
C_{n+1} = C_n − γ_{n+1} ( C_n − Λ(ξ_n, X_{n+1}) ),
where (γ_n)_{n≥1} is the step sequence implemented in the algorithm (6.24) which computes V@R_α(X).
2 (Proof of Proposition 6.5). Show that the conclusion of Proposition 6.5 remains valid if X ∈ L^{1+ρ}(P). [Hint: rely on the Chow theorem (⁴).]
The quadratic distortion function is defined as the squared quadratic mean-quantization error, i.e.
∀ x = (x_1, . . . , x_N) ∈ (R^d)^N, D_N^X(x) := ‖X − X̂^x‖₂² = E dis_{loc,N}(x, X).
⁴ Let (M_n)_{n≥0} be an (F_n, P)-martingale null at 0 and let ρ ∈ (0, 1]; then M_n a.s. converges to a finite limit M_∞ on the event { ∑_{n≥1} E( |∆M_n|^{1+ρ} | F_{n−1} ) < +∞ }.
N-tuples x ∈ (R^d)^N with pairwise distinct components, as a consequence of the local Lebesgue differentiation theorem (Theorem 2.1(a)), and one easily checks that
∂D_N^X/∂x^i (x) := E( ∂dis_{loc,N}/∂x^i (x, X) ) = ∫_{R^d} ∂dis_{loc,N}/∂x^i (x, ξ) P_X(dξ),
with a local gradient given by
∂dis_{loc,N}/∂x^i (x, ξ) := 2(x^i − ξ) 1_{{Proj_x(ξ)=x^i}}, 1 ≤ i ≤ N,
where Projx denotes a (Borel) projection following the nearest neighbour rule on the grid {x1 , . . . , xN }.
As emphasized in the introduction of this chapter, the gradient ∇D_N^X having an integral representation, it is formally possible to minimize D_N^X using a stochastic gradient descent.
Unfortunately, it is easy to check that lim inf_{|x|→+∞} D_N^X(x) < +∞ (although lim_{min_i |x^i|→+∞} D_N^X(x) = +∞). Consequently, it is hopeless to apply the standard convergence theorem “à la Robbins-Siegmund” for stochastic gradient procedures like the one established in Corollary 6.1(b). But of course, we can still write it down formally and implement it. . .
Ingredients:
– A sequence ξ 1 , . . . , ξ t , . . . of (simulated) independent copies of X,
– A step sequence γ1 , . . . , γt , . . . .
One usually chooses the step in the parametric family γ_t = A/(B+t) ↓ 0 (decreasing step) or γ_t = η ≈ 0 (small constant step).
One can easily check that if x(t) has pairwise distinct components (in R^d), this property is preserved by the “learning” phase, so that the above procedure is well-defined (up to the convention to be made in case of a conflict between several components x(t)^j in the “competitive” phase).
The name of the procedure – Competitive Learning Vector Quantization algorithm – is of course
inspired from these two phases.
• (Quadratic) Quantization error D_N^X(Γ^∗) = ‖X − X̂^{Γ^∗}‖₂²:
D_N^{X,t+1} := (1 − γ_{t+1}) D_N^{X,t} + γ_{t+1} |x^{i(t+1)}(t) − ξ^{t+1}|² −→ D_N^X(Γ^∗) a.s.
Note that, since the ingredients involved in the above computations are those used in the competitive learning phase, there is (almost) no extra CPU time cost induced by these additional terms, especially if one keeps in mind (see below) that the costly part of the algorithm (as well as of the Lloyd procedure below) is the “competition phase” (winner selection) since it amounts to a nearest neighbour search.
In some way the CLVQ algorithm can be seen as a Non Linear Monte Carlo Simulation devised
to design an optimal skeleton of the distribution of X.
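The two phases described above can be summarized by the following Python sketch (competitive phase = nearest-neighbour selection of the winner, learning phase = homothety of the winner toward the sample point), together with the online companion estimator of the distortion mentioned above; the step constants, the initialization and the sample size are illustrative assumptions.

```python
import numpy as np

def clvq(sample, N, A=1.0, B=10.0, seed=0):
    """Competitive Learning Vector Quantization (zero-neighbour Kohonen algorithm).

    Competitive phase: i(t+1) = argmin_i |x_i(t) - xi^{t+1}|   (winner selection).
    Learning phase:    x_{i(t+1)}(t+1) = x_{i(t+1)}(t) - gamma_{t+1} (x_{i(t+1)}(t) - xi^{t+1}),
                       the other components being left unchanged."""
    rng = np.random.default_rng(seed)
    x = sample[rng.choice(len(sample), size=N, replace=False)].astype(float)  # initial grid
    dist_est = 0.0
    for t, xi in enumerate(sample, start=1):
        gamma = A / (B + t)
        i = np.argmin(np.sum((x - xi) ** 2, axis=1))             # competitive phase
        dist_est = (1.0 - gamma) * dist_est + gamma * np.sum((x[i] - xi) ** 2)  # distortion estimator
        x[i] -= gamma * (x[i] - xi)                               # learning phase
    return x, dist_est

# Illustrative run: N = 50 quantizer of the bi-variate normal distribution.
rng = np.random.default_rng(1)
grid, distortion = clvq(rng.standard_normal((200_000, 2)), N=50)
```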
For (partial) theoretical results on the convergence of the CLVQ algorithm, we refer to [122] when X has a compactly supported distribution. No theoretical convergence results are known to us when X has an unbounded support. As for the convergence of the online adaptive companion procedures, which relies on classical martingale arguments, we refer again to [122], but also to [12].
have pairwise distinct components (see [66, 122] among others). Consequently, if x = (x_1, . . . , x_N) denotes a local minimum, the gradient of the squared quantization error at x must be zero, i.e.
∂D_N^X/∂x^i (x) = 2 E( (x^i − X) 1_{{Proj_x(X)=x^i}} ) = 0, 1 ≤ i ≤ N,
or equivalently
x^i = E( X | {X̂^x = x^i} ), 1 ≤ i ≤ N. (6.28)
This fixed point identity can also be rewritten in terms of conditional expectation as follows:
X̂^x = E( X | X̂^x )
since, by the characterization of the conditional expectation of a discrete random vector,
E( X | X̂^x ) = ∑_{i=1}^{N} E( X | {X̂^x = x^i} ) 1_{{X̂^x = x^i}}.
Regular Lloyd's procedure. The Lloyd procedure is simply the recursive procedure associated to the fixed point identity (6.28):
x^i(t+1) = E( X | {X̂^{x(t)} = x^i(t)} ), 1 ≤ i ≤ N, t ∈ N, x(0) ∈ (R^d)^N,
where x(0) has pairwise distinct components in R^d. We leave it as an exercise to show that this procedure is entirely determined by the distribution of the random vector X.
Exercise. Prove that this recursive procedure only involves the distribution µ = PX of the
random vector X.
Lloyd's algorithm can be viewed as a two-step procedure acting on random vectors as follows:
(i) Grid updating: X̃(t+1) = E( X | X̂^{x(t)} ), x(t+1) = X̃(t+1)(Ω), (6.29)
(ii) Distribution/weight updating: X̂^{x(t+1)} ← X̃(t+1). (6.30)
The first step updates the grid; the second step re-assigns to each element of the grid its Voronoi cell, which can be viewed as a weight updating.
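A sketch of a randomized (Monte Carlo) version of Lloyd's fixed-point iteration (6.28), the conditional expectations being replaced by sample means over a large i.i.d. sample; the sample size, number of iterations and initialization are illustrative assumptions.

```python
import numpy as np

def lloyd(sample, grid0, n_iter=30):
    """Randomized Lloyd procedure: x_i(t+1) = mean of the sample points whose nearest
    neighbour in the grid x(t) is x_i(t) (empirical version of (6.28))."""
    grid = grid0.copy().astype(float)
    for _ in range(n_iter):
        # competitive step: assign each sample point to its nearest grid point
        d2 = ((sample[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        # grid updating (6.29): empirical conditional mean on each Voronoi cell
        for i in range(len(grid)):
            cell = sample[idx == i]
            if len(cell) > 0:
                grid[i] = cell.mean(axis=0)
    weights = np.bincount(idx, minlength=len(grid)) / len(sample)  # weight updating (6.30)
    return grid, weights

rng = np.random.default_rng(2)
pts = rng.standard_normal((50_000, 2))
grid, w = lloyd(pts, pts[:50])        # N = 50 quantizer of N(0, I_2)
```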
Proposition 6.6 Lloyd's algorithm makes the quadratic quantization error decrease, i.e.
t ↦ ‖X − X̂^{x(t)}‖₂ is non-increasing.
Proof. It follows from the above decomposition of the procedure and the very definitions of nearest neighbour projection and conditional expectation (as an orthogonal projector in L²(P)) that, for every t ∈ N,
‖X − X̂^{x(t+1)}‖₂ = ‖dist(X, x(t+1))‖₂
                ≤ ‖X − X̃(t+1)‖₂ = ‖X − E(X | X̂^{x(t)})‖₂
                ≤ ‖X − X̂^{x(t)}‖₂. ♦
In particular, if we use the same sample at each iteration, we still have the property that the procedure makes a quantization error modulus (at level N) related to the distribution µ decrease.
This suggests that the random i.i.d. sample (ξ_m)_{m≥1} can also be replaced by deterministic copies obtained through a QMC procedure based on a representation of X of the form X = ψ(U), U ∼ U([0, 1]^r).
When computing larger and larger quantizers of the same distribution, a significant improvement of the method is to initialize the (randomized) Lloyd procedure at “level” N + 1 by adding one component to the N-quantizer resulting from the procedure applied at level N, namely to start from the (N+1)-tuple (x^{∗,(N)}, ξ) where x^{∗,(N)} denotes the limiting value of the procedure at level N (assumed to exist, which is the case in practice).
(x¹)² ≥ δ²_min ⟹ |x| ≥ δ_min
. . .
(x¹)² + · · · + (x^ℓ)² ≥ δ²_min ⟹ |x| ≥ δ_min
. . .
This is the simplest and easiest idea to implement, but it seems to be the only one that still works as d increases.
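A sketch of this “partial distance” trick in Python: the squared distance to a candidate centroid is accumulated coordinate by coordinate and the candidate is rejected as soon as the partial sum exceeds the current best squared distance δ²_min. The function name and test data are illustrative assumptions.

```python
import numpy as np

def nearest_neighbour_partial_distance(xi, grid):
    """Return the index of the nearest neighbour of xi in grid, rejecting a candidate
    as soon as its partial squared distance exceeds the current best delta_min^2."""
    best_i, best_d2 = 0, float("inf")
    for i, x in enumerate(grid):
        d2 = 0.0
        for l in range(len(xi)):              # accumulate (x^1-xi^1)^2 + ... + (x^l-xi^l)^2
            d2 += (x[l] - xi[l]) ** 2
            if d2 >= best_d2:                 # partial sum already too large: reject x
                break
        else:
            best_i, best_d2 = i, d2           # full distance computed and smaller: new minimum
    return best_i

rng = np.random.default_rng(3)
grid = rng.standard_normal((500, 10))
xi = rng.standard_normal(10)
assert nearest_neighbour_partial_distance(xi, grid) == np.argmin(((grid - xi) ** 2).sum(axis=1))
```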
The K-d tree (Friedman, Bentley, Finkel, 1977, see [53]): store the N points of R^d in a tree of depth O(log(N)) based on their coordinates in the canonical basis of R^d.
Further improvements are due to McNames (see [108]): the idea is to perform a pre-processing of the dataset of N points using a Principal Component Analysis (PCA) and then to implement the K-d tree method in the new orthogonal basis induced by the PCA.
Figure 6.3: An optimal quantization of the bi-variate normal distribution with size N = 500
algorithm (see [129] or [127]) and the Lloyd I procedure (see [129, 124, 56]) which have just been described and briefly analyzed above. More algorithmic details are also made available on the web site
www.quantize.maths-fi.com
For normal distributions a large scale optimization has been carried out, based on a mixed CLVQ-Lloyd procedure. To be precise, grids have been computed for d = 1 up to 10 and 1 ≤ N ≤ 5 000. Furthermore, several companion parameters have also been computed (still by simulation): weights, L¹-quantization error, (squared) L²-quantization error (also known as distortion), local L¹- and L²-pseudo-inertia of each Voronoi cell. All these grids can be downloaded from the above website.
Thus Figure 6.3 depicts an optimal quadratic N-quantization of the bi-variate normal distribution N(0; I_2) with N = 500.
where (Z_n)_{n≥1} is an i.i.d. innovation sequence of R^q-valued random vectors, Y_0 is independent of the innovation sequence, H : R^d × R^q → R^d is a Borel function such that. . . and (γ_n)_{n≥1} is a sequence of step parameters.
We saw that this algorithm can be represented in a canonical way as follows:
where h(y) = E H(y, Z_1) is the mean field or mean function of the algorithm and ∆M_{n+1} = H(Y_n, Z_{n+1}) − E(H(Y_n, Z_{n+1}) | F_n), n ≥ 0, is a sequence of martingale increments with respect to the filtration F_n = σ(Y_0, Z_1, . . . , Z_n), n ≥ 0.
In what precedes we established criterions (see Robins-Siegmund’s Lemma) based on the exis-
tence of Lyapunov functions which ensure that the sequence (Yn )n≥1 is a.s. bounded and that the
martingale X
Mnγ = γk ∆Mk is a.s. convergent in Rd .
k≥1
To derive the a.s. convergence of the algorithm itself, we used pathwise arguments based on
elementary topology and functional analysis. The main improvement provided by the ODE method
is to study the asymptotics of the sequence (Yn (ω))n≥0 , assumed a priori ) to be bounded, through
the sequence (Yk (ω))k≥n , n ≥ 1 represented as sequence of function in the scale of the cumulative
function of the step. We will also need an assumption on the paths of the martingale (Mnγ )n≥1 ,
however significantly lighter than the above a.s. convergence property. Let us be more specific:
first we consider a discrete time dynamics
where (πn )n≥1 is a sequence of Rd -valued vectors and h : Rd → Rd is a continuous Borel function.
We set Γ0 = 0 and, for every integer n ≥ 1,
n
X
Γn = γk .
k=1
(0)
Then we define the stepwise constant càdlàg function (Yt )t∈R+ by
(0)
yt = yn if t ∈ [Γn , Γn+1 )
so that N (t) = n if and only if t ∈ [Γn , Γn+1 ) (in particular N (Γn ) = n).
Developing the recursive Equation (6.32), we get
n
X n
X
yn = y0 − γk h(yk−1 ) − γk π k
k=1 k=1
Z ΓN (t) N (t)
(0) (0)
X
yt = y0 − h(ys(0) )ds − γk π k
0 k=1
Z t Z t N (t)
(0)
X
= y0 − h(ys(0) )ds + h(ys(0) )ds − γk π k (6.33)
0 ΓN (t) k=1
Then, by the very definition of the shifted function y (n) and taking advantage of the fact that
ΓN (Γn ) = Γn , we derive by subtracting (6.33) at times Γn + t and Γn , that for every t ∈ R+ ,
Z Γn +u N (Γn +t)
(0)
X
where we keep in mind that yΓn = yn . The term − γk πk is candidate to be a
ΓN (Γn +t) k=n+1
remainder term as n goes to infinity. Our aim is to make a connection between the asymptotic
behavior of the sequence of vectors (yn )n≥10 and that of the sequence of functions y (n) , n ≥ 0.
(a) The set Y ∞ := {limiting points of(yn )n≥0 } is a compact connected set.
(b) The sequence (y (n) )n≥1 is sequentially relatively compact (5 ) for the topology of the uniform
convergence on compacts sets on the space B(R+ , Rd ) of bounded functions from R+ to Rd (6 ) and
all its limiting points lie in C(R+ , Y ∞ ).
As a consequence (see [6] for details) the set Y ∞ is compact and bien enchaı̂né (7 ), hence connected.
Z .
(n)
(b) The sequence h(ys )ds is uniformly Lipschitz continuous with Lipschitz continuous
0 n≥0
coefficient L since, for every s, t ∈ R+ , s ≤ t,
Z t Z s Z t
h(yu(n) )du − (n)
|h(yu(n) )|du ≤ L(t − s).
h(yu )du ≤
0 0 s
hence, it follows from Arzela-Ascoli’s Theorem that (y (n) )n≥0 is relatively compact in C(R+ , Rd )
endowed with the topology, denoted UK , of the uniform convergence on compact sets. On the other
hand, for every T ∈ (0, +∞),
Z t N (Γn +t)
(n) (n) (n)
X
sup yt − y0 +
h(ys )ds ≤ sup γk L + sup
γk πk −→ 0 as n → +∞
t∈[0,T ] 0 k≥n+1 t∈[0,T ] k=n+1
Connection algorithm-ODE
To state the first theorem on the so-called ODE method theorem, we introduce the reverse differ-
ential equation
ODE ∗ ≡ ẏ = h(y).
Theorem 6.5 (ODE II) Assume H1 -H3 hold and that the mean field function h is continuous.
(a) Any limiting function of the sequence (y (n) )n≥0 is an Y ∞ -valued solution of
ODE ≡ ẏ = −h(y).
(b) Assume that ODE satisfies the following uniqueness property: for every y0 ∈ Y ∞ , ODE admits
a unique Y ∞ -valued solution (Φt (y0 ))t∈R+ starting at Φ0 (y0 ) = y0 . Also assume uniqueness for
ODE ∗ (defined likewise). Then, the set Y ∞ is a compact, connected set, and flow-invariant for
both ODE and ODE ∗ .
Proof: (a) Given the above proposition, one has to check that any limiting function y (∞) =
(ϕ(n)) (∞) (ϕ(n))
UK − limn→+∞ y (ϕ(n)) is solution the ODE. For every t ∈ R+ , yt → yt , hence h(yt )→
(∞)
h(yt ) since the mean field function h is continuous. Then by the Lebesgue dominated convergence
theorem, one derives that for every t ∈ R+ ,
Z t Z t
(ϕ(n))
h(ys )ds −→ h(ys(∞) )ds.
0 0
(ϕ(n)) (∞)
One also has y0 → y0 so that finally, letting ϕ(n) → +∞ in (6.34), we obtain
Z t
(∞) (∞)
yt = y0 − h(ys(∞) )ds.
0
(b) Any y0 ∈ Y ∞ is the limit of a sequence yϕ(n) . Up to a new extraction, still denoted ϕ(n) for
convenience, we may assume that y (ϕ(n)) → y (∞) as n → ∞, uniformly on compact sets of R+ .
(∞)
The function y (∞) is a Y ∞ -valued solution to ODE and y (∞) = Φ(y0 ) owing to the uniqueness
assumption which implies the invariance of Y ∞ under the flow of ODE.
For every p ∈ N, we consider for large enough n, say n ≥ np , the sequence (yN (Γϕ(n) −p) )n≥np . It
is clear by mimicking the proof of Proposition 6.7 that all sequences of functions (y (N (Γϕ(n) −p)) )n≥np
are UK -relatively compact. By a diagonal extraction procedure, we may assume that, for every
p ∈ N,
UK
y (N (Γϕ(n) −p) −→ y (∞),p as n → +∞.
(N (Γϕ(n) −p−1)) (N (Γϕ(n) −p))
Since yt+1 = yt for every t ∈ R+ and n ≥ np+1 , one has
(∞),p+1 (∞),p
∀ p ∈ N, ∀ t ∈ R+ , yt+1 = yt .
Furthermore, it follows from (a) that the functions y (∞),p are Y ∞ -valued solutions to ODE. One
defines
(∞) (∞),p
ỹt = yp−t , t ∈ [p − 1, p]
6.4. FURTHER RESULTS ON STOCHASTIC APPROXIMATION 175
(∞) (∞),p
which satisfies in fact, for every p ∈ N, ỹt = yp−t , t ∈ [0, p]. This implies that ỹ (∞) is
an Y ∞ -valued solution to ODE ∗ starting from y0 on ∪p≥0 [0, p] = R+ . Uniqueness implies that
ỹ (∞) = Φ∗t (y0 ) which completes the proof. ♦
Remark. If uniqueness fails either for ODE or for ODE ∗ , one still has that Y ∞ is left invariant
by ODE and ODE ∗ in the sense that, from every y0 ∈ Y ∞ , there exists Y ∞ -valued solutions of
ODE and ODE ∗ .
This property is the first step of the deep connection between the the asymptotic behavior of
recursive stochastic algorithm and its associated mean field ODE. Item (b) can be seen a first a
criterion to direct possible candidates to a set of limiting values of the algorithm. This any zero y ∗
of h, or equivalently equilibrium points of ODE satisfies the requested invariance condition since
Φt (y ∗ ) = Φt (y ∗ ) = y ∗ for every t ∈ R+ . No other single point can satisfy this invariance property.
More generally,we have the following result
Corollary 6.2 If the ODE ≡ ẏ = −h(y) has finitely many compact connected two-sided invariant
sets Xi , i ∈ I (I finite), then the sequence (yn )n≥1 converges toward one of these sets.
ẏ = (1 − |y|)y + ςy ⊥ , y0 ∈ R2
where y = (y1 , y2 ) and y ⊥ = (y2 , y1 ). Then the unit circle C(0; 1) is clearly a connect compact
set invariant by ODE and ODE ∗ . The singleton {0} also satisfies this invariant property. In fact
C(0; 1) is an attractor of ODE and {0} is a repeller. So, we know that any stochastic algorithm
which satisfies H1 -H3 with the above mean field function h(y1 , y2 ) = (1 − |y|)y + ςy ⊥ will converter
either toward C(0, 1) or 0.
Sharper characterizations of the possible set of limiting points of the sequence (yn )n≥0 have
been established in close connection with the theory of perturbed dynamical systems (see [20], see
also [50] when uniqueness fails and the ODE has no flow). To be slightly more precise it has
been shown that the set of limiting points of the sequence (yn )n≥0 is internally chain recurrent
or, equivalently, contains no strict attractor for the dynamics of the ODE i.e. as subset A ⊂ Y ∞ ,
A 6= Y ∞ such that φt (y) converges to A uniformly iny ∈ X ∞ .
Unfortunately, the above results are not able to discriminate between these two sets though it
seems more likely that the algorithm converges toward the unit circle, like the flow of the ODE
does (except when starting from 0). This intuition can be confirmed under additional assumptions
on the fact that 0 is a noisy for the algorithm, e.g. if it is at the origin a Markovian algorithm of
the form (6.3), that the symmetric nonnegative matrix
E H(0, Z1 )∗ H(0, Z1 ) 6= 0.
Practical aspects of assumption H3 To make the connection with the original form of stochas-
tic algorithms, we come back to hypothesis H3 in the following proposition. In particular, it em-
phasizes that this condition is less stringent than a standard convergence assumption on the series.
176 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
Proposition 6.8 Assumption H3 is satisfied in tow situations (or their “additive” combination):
(a) Remainder term: If πn → 0 and γn → 0 as n → +∞, then
N (Γn +t)
X
sup γk πk ≤ sup |πk |(ΓN (Γn +T − Γn ) → 0 as n → +∞.
t∈[0,T ] n+1 k≥n+1
(a) Converging martingale term: The series Mnγ = n≥1 γn ∆Mn converges in Rd . Consequently
P
it satisfies a Cauchy condition so that
N (Γn +t)
X X
sup γk πk ≤ sup γk πk as n → +∞.
t∈[0,T ] k=n+1 `≥n+1 `≥n+1
πn = ∆Mn + rn
where rn is a remainder term which goes to 0 a.s. and Mnγ is an a.s. convergent martingale.
This convergence property can even be relaxed for the martingale term. Thus we have the
following classical results where H3 is satisfied while the martingale Mnγ may diverge.
2(p−1)
This allows for the use of steps of the form γn ∼ c1 n−a , a > 2
2+q = 3(p−1)+2 .
(b) Assume that there exist a real number c > 0 such that,
λ2
∀ λ ∈ R, EeλδMn ≤ ec 2 .
X −c
Then for every sequence (γn )n≥0 such that e γn < +∞, Assumption H3 is satisfied with πn =
n≥1
∆Mn . This allows for the use of steps of the form γn ∼ c1 n−a , a > 0, and γn = c1 (log n)−1+a ,
a > 0.
Examples: Typical examples where the sub-Gaussian assumption is satisfied are the following:
λ2
K2
• |∆Mn | ≤ K ∈ R+ since, owing to Hoeffding Inequality, E eλ∆Mn ≤ e 8 .
λ2
K2
• ∆Mn ∼ N (0; σn2 ) with σn ≤ K, then E eλ∆Mn ≤ e 2
The first case is very important since in many situations the perturbation term is a martingale
term and is structurally bounded.
6.4. FURTHER RESULTS ON STOCHASTIC APPROXIMATION 177
Proposition 6.10 (G-Lemma, see [50]) Assume H1 -H3 . Let G : Rd → R+ be a function satis-
fying
(G) ≡ lim yn = y∞ and lim G(yn ) = 0 =⇒ G(y∞ ) = 0.
n→+∞ n→+∞
Then, there exists a connected component X ∗ of the set {G = 0} such that diet(yn , X ∗ ) = 0.
(n
Proof. First, it follows from Proposition (6.7) that the sequence (yn )n≥0 is UK -relatively compact
with limiting functions lying in C(R+ , Y ∞ ) where Y ∞ still denotes the compact connected set of
limiting values of (yn )n≥0 .
Set, for every y ∈ Rd , G(y)
e = lim inf G(x) = inf lim inf G(xn ), xn → y so that 0 ≤ G e ≤ G.
x→y n
The function G
e is the l.s.c. envelope of the function G i.e. the highest l.s.c. function not greater
than G. In particular, under Assumption (G)
{G = 0} = {G
e = 0} is closed.
Let y∞ ∈ Y ∞ . Up to at most two extractions, one may assume that y (ϕ(n)) → y (∞) for the UK
(∞)
topology where y0 = y∞ . It follows from Fatou’s Lemma that
Z +∞ Z +∞
(∞)
0≤ G(ys )ds =
e G(lim
e ys(ϕ(n)) )ds
0 0 n
Z +∞
≤ e (ϕ(n)) )ds (since G
lim G(y e is l.s.c.)
n s
0
Z +∞
≤ lim e s(ϕ(n)) )ds (owing to Fatou’s lemma)
G(y
n 0
Z +∞
≤ lim G(ys(ϕ(n)) )ds
n 0
Z +∞
= lim G(ys(0) ds = 0.
n Γϕ(n)
Z +∞
Consequently, G(y e s(∞) ) = 0 ds-a.s.. Now s 7→ ys(∞)
e s(∞) )ds = 0 which implies that G(y
0
e (∞) ) = 0 since G
is continuous it follows that G(y e is l.s.c. Which in turn implies G(y ∞ ) = 0 i.e.
0 0
178 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
G(y∞ ) = 0. As a consequence Y ∞ ⊂ {G = 0} which yields the result since on the other hand it is
a connected set. ♦
Now we are in position to prove the convergence of stochastic and stochastic pseudo-gradient
procedures in the multi-dimensional case.
Corollary 6.3
Proof. ♦
Assume Y0 ∈ L2 (P) is independent of the i.i.d. sequence (Zn )n≥1 . Assume there exists y ∗ ∈ Rd and
α > 0 such that both the strong mean-reverting assumption
Fionally, assume that the step sequence (γn )n≥1 satisfies the usual decreasing step assumption (6.7)
and the additonal assumption
1 γn
(G)α ≡ lim sup an = 1 − 2 α γn+1 − 1 = −κ∗ < 0.
n γ n+1 γ n+1
hold, then
a.s. √
Yn −→ y ∗ and kYn − y ∗ k2 = O( γn ).
γ 1
Examples. • If γn = , 2 < ϑ < 1, then (G)α is satisfied for any α > 0.
nϑ
γ 1 − 2αγ 1
• If γn = , Condition (G)α reads = −κ∗ < 0 or equivalently γ > .
n γ 2α
Likewise we get
E |H(Yn , Zn+1 )|2 ≤ C 2 (1 + |Yn |2 ).
6.4. FURTHER RESULTS ON STOCHASTIC APPROXIMATION 179
This implies
owing successively to the linear quadratic growth and the strong mean-reverting assumptions.
Finally,
E |Yn+1 − y ∗ |2 = E |Yn − y ∗ |2 1 − 2 α γn+1 + C 0 γn+1
2
+ C 0 γn+1
2
.
κ∗
Let n0 be an integer such that, for every n ≥ n0 , an ≤ − 43 κ∗ and C 0 γn ≤ 4 . For these integers n,
κ∗
un+1 ≤ un 1 − γn+1 + C 0 γn+1 .
2
Then, one derives by induction that
2C 0
∀ n ≥ n0 , 0 ≤ un ≤ max un0 , ∗
κ
Exercise. Show a similar result (under appropriate assumptions) for an algorithm of the form
Hence, L, being non-negative, attains its minimum at a point y ∗ ∈ Rd . In fact, the above inequality
straightforwardly implies that y ∗ is unique since {∇L = 0} is clearly reduced to {y ∗ }.
Moreover, for every y, u ∈ Rd ,
∗ 2 ∇L(y + tu) − ∇L(y)|u
0 ≤ u D L(y)u = lim ≤ [∇L]Lip |u|2 < +∞.
t→0 t
In that framework, the following proposition shows that the “value function” L(Yn ) of stochastic
1
gradient converges in L at a O 1/n -rate. This is a straightforward consequence of the above
Proposition 6.11.
Proof. It is clear that such a stochatic algorihm satifies the assumptions of the above Proposi-
tion 6.11, especially the strong mean-reverting assumption (6.36) owing to the preliminaries on L
that precede the propsition. By the fundamental theorem of calculus, for every n ≥ 0 there exists
Ξn ∈ (Yn−1 , Yn ) (geometric interval in Rd ) such that
1
L(Yn ) − L(y ∗ ) = ∇L(y ∗ )|Yn − y ∗ + (Yn − y ∗ )∗ D2 L(Ξn )(Yn − y ∗ )
2
2
= [∇L]1 Yn − y ∗ .
One concludes by taking the expectation in the above inequality and applying Proposition 6.11. ♦
Exercise. Show a similar results for a pseudo-stochastic gradient under appropriate assumptions
on the mean function h.
Theorem 6.6 (Pelletier (1995) [135]) We consider the stochastic procedure defined by (6.3).
Let y∗ be an equilibrium point of {h = 0}.
(i) The equilibrium point y∗ is an attractor for ODEh : Assume that y∗ is a “strongly” attractor
for the ODE ≡ ẏ = −h(y) in the following sense (8 ):
h is is differentiable at y∗ and all the eigenvalues of Dh(y∗ ) have positive real parts.
(ii) Regularity and growth control of H: Assume that the function H satisfies the following regu-
larity and growth control property
The “Lstably ” stable convergence in distribution mentioned in (6.39) means that for every
bounded continuous function and every A ∈ F,
√ n→∞ √ √
E 1{Yn →y∗ }∩A f n(Yn − y∗ ) −→ E 1{Yn →y∗ }∩A f ( α Σ ζ) , ζ ∼ N (0; Id ).
Remarks. • Other rates can be obtained when ϑ = 1 and c < 2<e(λ1 min ) or, more generally, when
the step go to 0 faster although it satisfies the usual decreasing step Assumption (6.7). So is the
c
case for example when γn = b log(n+1)n . Thus, in the latter case, one shows that there exists an
Rd -valued random vector Ξ such that
√ √
Yn = y ∗ + γn Ξ + o( γn ) a.s. as n → +∞.
8
This is but a locally uniform attractivity condition for y∗ viewed as an equilibrium the ODE ẏ = −h(y).
182 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
c2
c Σ = Var(H(y∗ , Z)) .
2c h0 (y∗ ) − 1
1 1
γn := or, more generally, γn :=
h0 (y∗ )n h0 (y∗ )(b + n)
where b can be tuned to “control” the step at the beginning of the simulation (when n is small).
At this stage, one encounters the same difficulties as with deterministic procedures since y∗
being unknown, h0 (y∗ ) is “more” known. One can imagine to estimate this quantity as a companion
procedure of the algorithm but this turns out to be not very efficient. A more efficient approach,
although not completely satisfactory in practice, is to implement the algorithm in its averaged
version (see Section 6.4.4 below).
However one must have in mind that this tuning of the step sequence is intended to optimize
the rate of convergence of the algorithm during its final convergence phase. In real applications,
this class of recursive procedures spends post of its time “exploring” the state space before getting
trapped in some attracting basin (which can be the basin of a local minimum in case of multiple
critical points). The CLT rate occurs once the algorithm is trapped.
An alternative to these procedures is to design some simulated annealing procedure which
“super-excites” the algorithm in order to improve the exploring phase. Thus, when the mean
function h is a gradient (h = ∇L), it finally converges – but only in probability – to the true
minimum of the potential/Lyapunov function L. The “final” convergence rate is worse owing
to the additional exciting noise. Practitioners often use the above Robbins-Monro or stochastic
gradient procedure with a sequence of steps (γn )n≥1 which decreases to a positive limit γ.
We now prove this CLT in the 1D-framework when the algorithm a.s. converges toward a
unique “target” y∗ . Our method of proof is the so-called SDE method which heavily relies on
weak functional convergence arguments. We will have to admit few important results about weak
convergence of processes for which we provide precise references. An alternative proof can be
carried out relying on the CLT for triangular arrays of martingale increments (see [70]). Thus, such
an alternative proof – in a one dimensional setting – can be found in [105].
We purpose below a partial proof of Theorem 6.6, dealing only withe case where the equilibrium
point y ∗ is unique. The extension to a multi-target algorithm is not really more involved and we
refer to the original paper [135].
6.4. FURTHER RESULTS ON STOCHASTIC APPROXIMATION 183
Proof of Theorem 6.6. We will not need the Markovian feature of stochastic algorithms we
are dealing with in this chapter. In fact it will be more useful to decompose the algorithm in its
canonical form
Yn+1 = Yn − γn+1 h(Yn ) + ∆Mn+1
where
∆Mn = H(Yn−1 , Zn ) − h(Yn−1 ), n ≥ 1,
is a sequence of Fn -martingale increments where Fn = σ(Y0 , Z1 , . . . , Zn ), n ≥ 0. The so-called
SDE method follows the same principle as the ODE method but with the quantity of interest
Yn − y∗
Υn := √ , n≥1
γn
(this normalization is strongly suggested by the above L2 -convergence rate theorem). The under-
lying idea is to write a recursion on Υn which appears as an Euler scheme with decreasing step γn
of an SDE having a stationary/steady regime.
Step 1(Toward the SDE): So, we assume that
a.s.
Yn −→ y ∗ ∈ {h = 0}.
y ∗ = 0.
Moreover the function η is bounded on the real line owing to the linear growth of h. For every
n ≥ 1, we have
√
r
γn
Υn+1 = Υn − γn+1 h(Yn ) + ∆Mn+1
γn+1
√ √
r
γn
= Υn − γn+1 Yn h0 (0) + η(Yn ) + γn+1 ∆Mn+1
γn+1
√ √ √
r
γn
= Υn − Υn + Υn − γn+1 γn Υn h0 (0) + η(Yn ) + γn+1 ∆Mn+1
γn+1
r r √
γn 0 1 γn
= Υn − γn+1 Υn h (0) + η(Yn ) − −1 + γn+1 ∆Mn+1 .
γn+1 γn+1 γn+1
Assume that the sequence (γn )n≥1 is such that there exists c ∈ (0, +∞] satisfying
r
0 1 γn 1
lim an = −1 = .
n γn+1 γn+1 2c
γn
Note that this implies limn γn+1 = 1. In fact it is satisfied by our two families of step sequences of
interest since
184 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
1
– if γn = c
b+n , c > 0, b ≥ 0, then lim a0n = > 0,
n 2c
– if γn = c
nϑ
, c > 0, 1
2 < ϑ < 1, then lim a0n = 0 i.e. c = +∞.
n
Consequently, for every n ≥ 1,
1 √
Υn+1 = Υn − γn+1 Υn h0 (0) − + αn1 + αn2 η(Yn ) + γn+1 ∆Mn+1
2c
where (αni )n≥1 i = 1, 2 are two deterministic sequences such that αn1 → 0 and αn2 → 1 as n → +∞
0.
Step 2 (Localisation(s)): Since Yn → 0 a.s., one can write the scenarios space Ω as follows
[ n o
∀ ε > 0, Ω= Ωε,N a.s. where Ωε,N := sup |Yn | ≤ ε .
N ≥1 n≥N
Let ε > 0 and N ≥ 1 being temporarily free parameters. We define the function e
h=e
hε by
Ye ε,N = Ye ε,N − γ
h ( enε,N ) + 1{|Y |≤ε} ∆Mn+1 , n ≥ N.
Y
n n+1 ε
e
n+1 n
To alleviate notations we will drop the exponent ε,N in what follows and denote Yen instead of Yenε,N .
The “characteristics” (mean function and Fn -martingale increments) of this new algorithm are
h;e
h and fn+1 = 1{|Y |≤ε} ∆Mn+1 , n ≥ N
∆M n
which satisfy
fn+1 |2+ρ |Fn ) ≤ 21+ρ sup (E|H(θ, X)|2+ρ ) ≤ A(ε) < +∞ a.s..
sup E(|∆M
n≥N |θ|≤ε
e n := √Yn , n ≥ N
e
Υ
γn
Step 3: (Specification of ε and K = K(ε)) Since h is differentiable at 0 (and h0 (0) > 0),
c
If γn = n+b with c > 2h01(0) and b ≥ 0, (as prescribed in the statement of the theorem), we may
choose ρ = ρ(h) > 0 small enough so that
1
c> .
2h0 (0)(1 − ρ)
c 1 0
If γn = (n+b)ϑ , 2 < ϑ < 1 and, more generally, as soon as limn a, = 0, any choice of ρ ∈ (0, 1) is
possible. Now let ε(ρ) > 0 such that |y| ≤ ε(ρ) implies |η(y)| ≤ ρh0 (0). It follows that
Now we specify
ε = ε(ρ) and K = (1 − ρ)h0 (0) > 0
As a consequence, the function e
h satisfies
∀ y ∈ R, h(y) ≥ Ky 2
ye
Γn = γ1 + · · · + γn e (0) = Υ
and Υ e n, n ≥ N,
(Γn )
and
∀ t ∈ R+ , N (t) = min {k | Γk+1 ≥ t} .
Hence, setting a = h0 (0) − 1
2c > 0, we get for every n ≥ N ,
Γn n
√
Z
e (0) = Υ (0)
X
e − 1 2
Υ (Γn ) N Υ(t) a + α
e eN (t) + α
eN (t) εe(YN (t) ) dt +
e γk ∆M
fk+1 .
ΓN k=N
t N (t)
Z X√
e (0) = Υ
Υ e − (0)
Υ(s) ρ + α
e 1
eN (s) + α 2
eN (s) εe(YN (s) ) ds +
e γk ∆M
fk+1 .
(t) N
ΓN k=N
| {z } | {z }
e(0)
=:A f(0)
=:M
(t) (t)
186 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
and
e(n) := A
A e(0) and f(n) = M
M f(0) .
(t) (Γn +t) (t) (Γn +t)
At this stage, we need two fundamental results about functional weak convergence. The first one
is a criterion which implies the functional tightness of the distributions of a sequence of right contin-
uous left limited (càdlàg) processes X (n) (viewed as probability measures on the space D(R+ , R) of
càdlàg functions from R+ to R. The second one is an extension of Donsker’s Theorem for sequences
of martingales.
We need to introduce the uniform continuity modulus defined for every function f : R+ → R
and δ, T > 0 by
w(f, δ, T ) = sup |f (t) − f (s)|.
s,t∈[0,T ],|s−t|≤δ
The terminology comes from the seminal property of this modulus: f is (uniformly) continuous
over [0, T ] if and only if limδ→0 w(f, δ, T ) = 0.
n )
Theorem 6.7 (A criterion for C-tightness) ([24], Theorem 15.5, p.127) (a) Let (X(t) t≥0 , n ≥
1, be a sequence of càdlàg processes processes null at t = 0. If, for every T > 0 and every ε > 0,
0
then the sequence (X n )n≥1 is C-tight in the following sense: from any subsequence (X n )n≥1 one
may extract a subsequence (X n” )n≥1 such that X n converges in distribution toward a process X ∞
for the weak topology on the space D(R+ , R) (endowed with the topology of uniform convergence on
compact sets) such that P(X ∞ ∈ C(R+ , R)) = 1.
(b) (See [24], proof of Theorem 8.3 p.56) If, for every T > 0 and every ε > 0,
(n)
lim sup lim sup P( sup |Xt − Xs(n) | ≥ ε) = 0
δ→0 s∈[0,T ] n s≤t≤t+δ
The second theorem provides a tightness criterion for a sequence of martingales based on the
sequence of their bracket processes.
Theorem 6.8 (Weak functional limit of a sequence of martingales) ([73]). Let (M(t) n )
t≥0 ,
n ≥ 1, be a sequence of càdlàg (local) martingales, null at 0, C-tight, with (existing) predictable
bracket process hM n i. If
n→+∞
∀ t ≥ 0, hM n i(t) −→ σ 2 t, σ > 0,
then
LD(R+ ,R)
Mn −→ σW,
6.4. FURTHER RESULTS ON STOCHASTIC APPROXIMATION 187
Hence
C i e n |2 δ 2
supn E|Υ
∀ n ≥ N, ∀ s ∈ R+ , P sup |A e(n) | ≥ ε ≤ keεksup ,keα ksup
e(n) − A .
(t) (s) ε2
s≤t≤s+δ
and one concludes that the sequence (A e(n) )n≥1 is C-tight by applying the above Theorem ??.
(n)
We can apply also this result to the weak asymptotics of the martingales (M(t) ). We start from
f(0) defined by
the definition of the martingale M
N (t)
X√
f(0) =
M γk ∆M
fk+1
(t)
k=1
(0)
(the related filtration is Ft = Fn , t ∈ [Γn , Γn+1 )). Since we know that
we get, owing to B.D.G. Inequality, that, for every ρ > 0 and every s ∈ R+ ,
1+ ρ
N (s+δ) 2
f(0) 2+ρ ≤ Cρ E
(0) X
2
E sup f −M
M γk (∆M
fk )
(t) (s)
s≤t≤s+δ
k=N (s)+1
1+ ρ P 1+ 2ρ
N (s+δ) 2 N (s+δ) f 2
k=N (s)+1 γk (∆Mk )
X
≤ Cρ γk E PN (s+δ)
k=N (s)+1 k=N (s)+1 γk
1+ ρ P
N (s+δ) 2 N (s+δ) f 2+ρ
k=N (s)+1 γk |∆Mk |
X
≤ Cρ γk E PN (s+δ)
k=N (s)+1 γ
k=N (s)+1 k
ρ
N (s+δ) 2 N (s+δ)
X X
≤ Cρ γk fk |2+ρ
γk E|∆M
k=N (s)+1 k=N (s)+1
9
This means that for every bounded functional F : D R+ , R → R, measurable with respect to σ-field spanned
by finite projection α 7→ α(t), t ∈ R+ , and continuous at every α ∈ C R+ , R), one has EF (M n ) → EF (σ W ) as
n → +∞. This rermains true in fact for measurable functionals F which are Pσ W (dα)-a.s. continuous on C(R+ , R),
such that (F (M n ))n≥1 is uniformly integrable.
188 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
where Cρ is a positive real constant. One finally derives that, for every s ∈ R+ ,
1+ ρ
N (s+δ) 2
(0) (0) 2+ρ
X
E sup M(t) − M(s) ≤ Cρ A(ε) γk
s≤t≤t+δ
k=N (s)+1
1+ ρ
≤ Cρ A(ε) δ + sup γk 2
k≥N (s)+1
N (Γn +t)
X
f(n) i(t) = fk )2 |Fk−1
hM γk E (∆M
k=n+1
N (Γn +t)
X
γk (E H(y, Zk )2 )|y=Yk−1 − h(Yk−1 )2
=
k=n+1
N (Γn +t)
X
2 2
∼ E H(0, Z) − h(0) γk
k=n+1
Hence
f(n) i(t) −→ E H(0, Z)2 × t
hM as n → +∞.
where we used here that h and θ 7→ E H(θ, Z)2 are both continuous en 0. Theorem 6.8 then implies
LC(R+ ,R)
f(n)
M −→ σ W, W standard Brownian motion.
(n)
Step 5 (Synthesis and conclusion): The sequence Υ(t) satisfies, for every n ≥ N ,
(n) e(n) − M
f(n) .
∀ t ≥ 0, en − A
Υ(t) = Υ (t) (t)
e (∞) , M
Let (Υ f(∞) ) be a weak functional limiting value. It will solve the Ornstein-Uhlenbeck
(t) (t)
(O.U.) SDE
dΥe (∞) = −aΥe (∞) dt + σ dWt (6.40)
(t) (t)
e ∞ k ≤ sup kΥ
kΥ e nk .
2 2
n
Let ν0 := L-limn Υ
e ϕ(n) be a weak (functional) limiting value of (Υ
e ϕ(n) )n≥1 .
For every t > 0, one considers the sequence of integers ψt (n) uniquely defined by
and
L e (∞,−t)
e (ψt (n)) −→ e (∞,−t) ∼ ν
Υ Υ starting from Υ (0) −t
One checks by strong uniqueness of solutions of the above Ornstein-Uhlenbeck SDE that
e (∞,−t) = Υ
Υ e (∞,0) .
(t) (0)
Now let (P t )t≥0 denote the semi-group of the Ornstein-Uhlenbeck process. From what precedes,
for every t ≥ 0,
ν0 = ν−t P t
Moreover, (ν−t )t≥0 is tight since it is L2 -bounded. Let ν−∞ be a weak limiting value of ν−t as
t → +∞.
Let Υµ(t) denote a solution to (6.40) starting from a µ-distributed random variable. We know
by the confluence property of O.U. paths that
0 0
|Υµt − Υµt | ≤ |Υµ0 − Υµ0 |e−at .
Consequently
σ2
t
ν0 = lim ν−∞ P = N 0; .
t→+∞ 2a
2
We have just proved that the distribution N 0; σ2a is the only possible limiting value hence
σ2
L
e n −→
Υ N 0; .
2a
Now we come back to Υn (prior to the localization). We have just proved that for ε = ε(ρ) and
for every N ≥ 1,
σ2
ε,N L
Υn −→ N 0;
e as n → +∞. (6.41)
2a
S
On the other hand, we already saw that Yn → 0 a.s. implies that Ω = N ≥1 Ωε,N a.s. where
Ωε,N = {Y ε,N = Yn , n ≥ N } = {Υe ε,N = Υn , n ≥ N }. Moreover the events Ωε,N ar non-decreasing
as N increases so that
lim P(Ωε,N ) = 1.
N →∞
where Ξ ∼ N (0; 1). One concludes by letting N go to infinity that for every bounded continuous
function f σ
lim Ef (Υn ) = E f √ Ξ
n 2a
2
L σ
i.e. Υn −→ N 0; . ♦
2a
Theorem 6.9 (Ruppert & Poliak, see [43, 146, 138]) Let H : Rd × Rq → Rd a Borel func-
tion and let (Zn )n≥1 be sequence of i.i.d. Rq -valued random vectors defined on a probability space
(Ω, A, P) such that, for every y ∈ Rd , H(y, Z) ∈ L2 (P). Then the recursively defined procedure
Yn+1 = Yn − γn+1 H(Yn , Zn+1 )
and the mean function h(y) = E H(y, Z) are well-defined. We make the following assumptions that:
(i) The function h has a unique zero a y∗ and is “fast” differentiable at y∗ in the sense that
∀ y ∈ Rd , h(y) = A(y − y∗ ) + O(|y − y∗ |2 )
where all eigenvalues of the Jacobian matrix A = Jh (y∗ ) of h at y∗ have a positive real part.
(ii) The algorithm Yn converges toward y∗ with positive probability.
(iii) There exists an exponent c > 2 and a real constant C > 0 such that
∀ K > 0, sup E(|H(y, Z)|c ) < +∞ and y 7→ E(H(y, Z)H(y, Z)t ) is continuous at y∗ (6.42)
|y|≤K
γ0
Then, if γn = na +b , n ≥ 1, where 1/2 < a < 1 and b ≥ 0, the empirical mean sequence defined
by
Y0 + · · · + Yn−1
Ȳn =
n
satisfies the CLT with the optimal asymptotic variance, on the event {Yn → y∗ }, namely
√ L
n (Ȳn − y∗ ) −→ N 0; A−1 Γ∗ A−1
on {Yn → y∗ }.
192 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
We will prove this result in a more restrictive setting and refer e.g. to [43] for the general case.
Fist we assume (only for convenience) that d = 1. Then, we assume that h satisfies the coercivity
assumption (6.36) from Proposition 6.11, has a Lipschitz continuous derivative . Finally, we assume
that Yn → y∗ a.s.. Note that the step sequences under consideration all satisfy the Condition (G)α
of Proposition 6.11.
Proof (one-dimensional case). We consider the case of a scalar algorithm (d = 1). We assume
without loss of generality that y∗ = 0. We start from the canonical Markov representation
∀ n ≥ 0, Yn+1 = Yn − γn+1 h(Yn ) − γn+1 ∆Mn+1 where ∆Mn+1 = H(Yn , Zn+1 ) − h(Yn )
Yn − Yn+1
h0 (0)Yn = − ∆Mn+1 + O(|Yn |2 )
γn+1
n n−1
√ 1 X Yk − Yk−1 1 1 X
h0 (0) n Ȳn = − √ − √ Mn − √ O(|Yk |2 ).
n γk n n
k=1 k=0
We will inspect successively the three sums on the right hand side of the equation. First, by an
Abel transform, we get
n n
X Yk − Yk−1 Yn Y0 X 1 1
= − + Yk−1 − .
γk γn γ1 γk γk−1
k=1 k=2
As for the second (martingale) term, it is straightforward that, if we set P si(y) = E H(y, Z) −
2
h(y) , then
n
hM in 1X a.s.
= Ψ(Yn ) −→ Ψ(y∗ ).
n n
k=1
It follows from the assumptions made on the function H and Lindeberg’s CLT (Theorem 11.7 in
the Miscellany chapter 11) that
1 Mn d Ψ(y )
∗
0
√ −→ N 0; 0 .
h (0) n (h (0))2
The third term is handled as follows. Under the Assumption (6.36) of Proposition 6.11, we
know that E Yn2 ≤ Cγn since the class of steps we consider satisfy the condition (G)α (see remark
below Proposition 6.11). On the other hand, it follows from the assumption made on the step
sequence that
+∞ n
X E O(|Yk |2 ) X E Yk2
√ ≤ [h0 ]Lip √
k=1
k k=1
k
+∞
γ
√k .
X
≤ C [h0 ]Lip
k=1
k
Remark. As far as the step sequence is concerned, we only used that (γn ) is decreasing, satisfies
(G)α and X γk √
√ < +∞ and lim nγn = +∞.
k n
n
Indeed, we have seen in the former section that this variance is the lowest possible asymptotic
copt
variance in the CLT when specifying the step parameter in an optimal way (γn = n+b ). In fact
this discussion and its conclusions can be easily extended to higher dimensions (if one considers
some matrix-valued step sequences) as emphasized e.g. in [43].
So, the Ruppert & Poliak principle performs as the regular stochastic algorithm with the lowest
asymptotic variance for free!
Exercise. Test the above averaging principle on the former exercises and “numerical illustra-
3
tions” by considering γn = α n− 4 , n ≥ 1. Compare with a direct approach with a step γ̃n = β+n
α
.
Practitioner’s corner. In practice, one should not start the averaging at the true beginning
of the procedure but rather wait for its stabilization, ideally once the “exploration/search” phase
is finished. On the other hand, the compromise consisting in using a moving window (typically of
length n after 2n iterations does not yield the optimal asymptotic variance as pointed out in [101].
194 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
6.4.5 Traps
In presence of multiple equilibrium points i.e. of points at which the mean function h of the
procedure vanishes, some of them can be seen as parasitic equilibrium. This is the case of saddle
points local maxima in the framework of stochastic gradient descent (to some extend local minima
are parasitic too but this is another story and usual stochastic approximation does not provide
satisfactory answers to this “second order problem”).
There is a wide literature on this problem which says, roughly speaking, that an excited enough
parasitic equilibrium point is a.s. not a possible limit point for a stochastic approximation pro-
cedure. Although natural and expected such a conclusion is far from being straightforward to
establish, as testified by the various works on that topic (see [96, 136, 43, 20, 51], etc).
6.4.6 (Back to) V @Rα and CV @Rα computation (II): weak rate
We can apply both above CLTs to the V @Rα and CV @Rα (X) algorithms (6.24) and (6.26). Since
1 1
EH(ξ, X)2 =
h(ξ) = F (ξ) − α and 2
F (ξ)(1 − F(ξ) ,
1−α (1 − α)
one easily derives from Theorems 6.6 and 6.9 the following results
Theorem 6.10 Assume that PX = f (x)dx where f is a continuous density function (at least at
the ξα∗ = V @Rα (X)).
κ 1
(a) If γn = na +b , 2 < a < 1, b ≥ 0 then
a L κα(1 − α)
n− 2 ξn − ξα∗ −→ N 0;
2f (ξα∗ )
κ 1−α
(b) If γn = n+b , b ≥ 0 and κ > ∗)
2f (ξα then
√ L κ2 α
n ξn − ξα∗ −→ N 0;
2κf (ξα∗ ) − (1 − α)
√ L α(1 − α)
n ξn − ξα∗ −→ N 0; .
f (ξα∗ )2
The algorithm for the CV @Rα (X) satisfies the same kind of CLT . In progress [. . . ]
This result is not satisfactory because we see that in general the asymptotic variance remains
huge since f (ξα∗ ) is usually very close to 0. Thus if X has a normal distribution N (0; 1), then it is
clear that ξα∗ → +∞ as α → 1. Consequently
f (ξα∗ )
1 − α = P(X ≥ ξα∗ ) ∼ as α→1
ξα∗
6.4. FURTHER RESULTS ON STOCHASTIC APPROXIMATION 195
so that
α(1 − α) 1
∗
∼ ∗ → +∞ as α → 1.
f (ξα ) 2 ξα f (ξα∗ )
This is simply an illustration of the “rare event” effect which implies that when α is close to 1
and the event {Xn+1 > ξn } is are especially when ξn gets close to its limit ξα∗ = V @Rα (X).
The way out is to add an importance sapling procedure to somewhat “re-center” the distribution
around its V @Rα (X). To proceed, we will take advantage of our recursive variance reduction by
importance sampling described and analyzed in Section 6.3.1. This is the object of the next section.
Hence, for a level α ∈ (0, 1], in a (temporarily) static framework (i.e. fixed ξ ∈ R), the function of
interest for variance reduction id defined by
1
ϕα,ξ (z) = 1{ϕ(z)≤ξ} − α , z ∈ Rd .
1−α
So, still following Section 6.3.1 and taking advantage of the fact that ϕα,ξ is bounded, we design
the following data driven procedure for the variance reducer (with the notations of this section),
In progress [. . . ]
Considering now a dynamic version of these procedure (i.e. which adapts recursively ξ leads to
design the following procedure
This procedure is a.s. converging toward it target (θα∗ , ξα ) and satisfies the averaged component
¯
(ξen )n≥0 of (ξn )n≥0 satisfies a CLT.
196 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
Theorem 6.11 (see [19]) (a) If the step sequence satisfies the decreasing step assumption (6.7),
then
n→+∞
(ξen , θen ) −→ (ξα , θα∗ ) with ξα = V @Rα (X).
κ 1
(b) If the step sequence satisfies γn = na +b , 2 < a < 1, b ≥ 0, κ > 0, then
√ Vα,ξα (θα∗ )
L
n ξ˜n − ξα −→ N 0,
f (ξα )2
where 2
2
Vα,ξ (θ) = e−|θ| E 1{χ(Zn+1 +θn )≤ξen } − α e−2(θ|Z) .
Note that
Vα,ξ (0) = F (ξ) 1 − F (ξ) .
d
y ∈ Rd , U = U ([0, 1]q ).
h(y) = E H(y, U ) ,
Suppose that
{h = 0} = {y∗ }, y∗ ∈ Rd
and that there exists a continuously differentiable function L : Rd → R+ with a Lipschitz continuous
continuous gradient ∇L satisfying √
|∇L| ≤ CL 1 + L
6.5. FROM QUASI-MONTE CARLO TO QUASI-STOCHASTIC APPROXIMATION 197
such that H satisfies the following pathwise mean reverting assumption: the function ΦH defined
by
∀ y ∈ Rd , ΦH (y) := inf (∇L(y)|H(y, u) − H(y∗ , u)) is l.s.c. and positive on Rd \ {y∗ }. (6.43)
u∈[0,1]q
Proof. (a) Step 1 (The regular part): The beginning of the proof is rather similar to the “regular”
stochastic case except that we will use as a Lyapounov function
√
Λ = 1 + L.
First note that ∇Λ = √∇L is bounded (by the constant CL ) so that Λ is C-Lipschitz continuous
2 1+L
continuous. Furthermore, for every x, y ∈ Rd ,
|∇L(y) − ∇L(y 0 )| 1 1
|∇Λ(y) − ∇Λ(y 0 )| ≤ + |∇L(y 0 )| p −p (6.47)
p
1 + L(y 0 )
1 + L(y) 1 + L(y)
|y − y 0 | C p p
≤ [∇L]Lip p +p L | 1 + L(y) − 1 + L(y 0 )|
1 + L(y) 1 + L(y)
|y − y 0 | C2
≤ [∇L]Lip p +p L |y − y 0 |
1 + L(y) 1 + L(y)
|y − y 0 |
≤ CΛ p (6.48)
1 + L(y)
198 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
≤ Λ(yn ) − γn+1 (∇Λ(yn ) | H(yn , ξn+1 )) + γn+1 ∇Λ(yn ) − ∇Λ(ζn+1 )H(yn , ξn+1 ).
Now, the above inequality (6.48) applied with with y = yn and y 0 = ζn+1 yields, knowing that
|ζn+1 − yn | ≤ |yn+1 − yn |,
2 CΛ
≤ Λ(yn ) − γn+1 (∇Λ(yn ) | H(yn , ξn+1 )) + γn+1 p |H(yn , ξn+1 )|2
1 + L(yn )
Λ(yn+1 ) ≤ Λ(yn ) − γn+1 (∇Λ(yn ) | H(yn , ξn+1 ) − H(y∗ , ξn+1 )) − γn+1 (∇Λ(yn ) | H(y∗ , ξn+1 ))
2
+γn+1 CΛ Λ(yn ).
2
Λ(yn+1 ) ≤ Λ(yn )(1 + CΛ γn+1 ) − γn+1 ΦH (yn ) − γn+1 (∇Λ(yn ) | H(y∗ , ξn+1 )). (6.49)
First note that (6.45) combined with the Koksma-Hlawka Inequality imply
where V (H(y∗ , .)) denotes the variation in the measure sense of H(y∗ , .). An Abel transform yields
(with the convention S0∗ = 0)
n−1
X
en (∇Λ(yn−1 ) | Sn∗ ) −
mn = γ ek ∇Λ(yk−1 ) | Sk∗ )
γk+1 ∇Λ(yk ) − γ
(e
k=1
n−1
X
= en (∇Λ(yn−1 ) | Sn∗ ) −
γ ek (∇Λ(yk ) − ∇Λ(yk−1 ) | Sk∗ )
γ
| {z }
(a) |k=1 {z }
(b)
n−1
X
− γk+1 (∇Λ(yk ) | Sk∗ ) .
∆e
|k=1 {z }
(c)
We aim at showing that mn converges in R toward a finite limit by inspecting the above three
terms.
One gets, using that γn ≤ γ
en ,
|e
γn+1 − γ 2
en | ≤ |γn+1 − γn | + CΛ γn+1 γn ≤ CΛ0 max(γn2 , |γn+1 − γn |).
One checks that the series (c) is also (absolutely) converging owing to the boundedness of ∇L,
Assumption (6.45) and the upper-bound (6.51) for Sn∗ .
Then mn converges toward a finite limit m∞ . This induces that the sequence (sn + mn ) is lower
bounded since (sn ) is non-negative. Now we know from (6.50) that (sn +mn ) is also non-increasing,
hence converging in R which in turn implies that the sequence (sn )n≥0 itself is converging toward
a finite limit. The same arguments as in the regular stochastic case yield
n→+∞
X
L(yn ) −→ L∞ and γn ΦH (yn−1 ) < +∞
n≥1
Once again, one concludes like in the stochasic case that (yn ) is bounded and eventually converges
toward the unique zero of ΦH i.e. y∗ .
200 CHAPTER 6. STOCHASTIC APPROXIMATION AND APPLICATIONS TO FINANCE
(b) is obvious. ♦
Comments for practical implementation • The step assumption (6.45) includes all the step
γn = ncα , α ∈ (0, 1]. Note that as soon as q ≥ 2, the condition γn (log n)q → 0 is redundant (it
follows from the convergence of the series on the right owing to an Abel transform).
• One can replace the (slightly unrealistic) assumption on H(y∗ , .) by a Lipschitz continuous as-
sumption provided one strengthens the step assumption (6.45) into
1− 1q 1− 1q
X X
γn = +∞, γn (log n)n →0 and max(γn − γn+1 , γn2 )(log n)n < +∞. (6.52)
n≥1 n≥1
This is a straightforward consequence of Proı̈nov’s Theorem (Theorem 4.2) which says that
1− 1q
|Sn∗ | ≤ C(log n) n .
c 1
Note the above assumptions are satisfied by the step sequences γn = nρ , 1− q < ρ ≤ 1.
• It is clear that the Lyapunov assumptions on H is much more stringent in this QM C setting.
• It remains that the theoretical spectrum of application of the above theorem is dramatically
more narrow than the original one. From a practical viewpoint, one observes on simulations a very
satisfactory behaviour of such quasi-stochastic procedures, including the improvement of the rate
of convergence with respect to the regular M C implementation.
Exercise. We assume now that the recursive procedure satisfied by the sequence (yn )n≥0 is
given by
∀ n ≥ 0, yn+1 = yn − γn+1 (H(yn , ξn+1 ) + rn+1 ), y0 ∈ Rd
P
where the sequence (rn )n≥1 is a disturbance term. Show that if, n≥1 γn rn is a converging series,
then the conclusion of the above theorem remains true.
Numerical experiment: We reproduced here (without even trying to check any kind of assump-
tion, indeed) the implicit correlation search recursive procedure tested in Section 6.3.2 implemented
this time with a sequence of some quasi-random normal numbers, namely
p p
(ζn1 , ζn2 ) = −2 log(ξn1 ) sin(2πξn2 ), −2 log(ξn1 ) cos(2πξn2 ) , n ≥ 1,
n ρn := cos(θn )
1000 -0.4964
10000 -0.4995
25000 -0.4995
50000 -0.4994
75000 -0.4996
100000 -0.4998
6.5. FROM QUASI-MONTE CARLO TO QUASI-STOCHASTIC APPROXIMATION 201
−0.25 −0.25
−0.3 −0.3
−0.35 −0.35
−0.4 −0.4
−0.45 −0.45
−0.5 −0.5
−0.55 −0.55
−0.6 −0.6
−0.65 −0.65
−0.7 −0.7
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
4 4
x 10 x 10
Figure 6.4: B-S Best-of-Call option.T = 1, r = 0.10, σ1 = σ2 = 0.30, X01 = X02 = 100,
K = 100. Convergence of ρn = cos(θn ) toward a ρ∗ = toward − 0.5 (up to n = 100 000). Left: M C
implementation. Right: QM C implementation.
Discretization scheme(s) of a
Brownian diffusion
One considers a d-dimensional Brownian diffusion process (Xt )t∈[0,T ] solution to the following
Stochastic Differential Equation (SDE)
∀ t ∈ [0, T ], ∀ x, y ∈ Rd , |b(t, x) − b(t, y)| + kσ(t, x) − σ(t, y)k ≤ K|x − y|. (7.2)
∀ t ∈ [0, T ], Ft := σ(X0 , NP , Ws , 0 ≤ s ≤ t)
where NP denotes the class of P-negligible sets of A (i.e. all negligible sets if the σ-algebra A is
supposed to be P-complete). One shows using the Kolmogorov 0-1 law that this completed filtration
is right continuous i.e. Ft = ∩s>t Fs for every t ∈ [0, T ). Such a combination of completeness and
right continuity of a filtration is also known as “usual conditions”.
Theorem 7.1 (see e.g. [78], theorem 2.9, p.289) Under the above assumptions on b, σ, X0 and
W , the above SDE has a unique (Ft )-adapted solution X = (Xt )t∈[0,T ] defined on the probability
space (Ω, A, P), starting from X0 at time 0, in the following sense:
Z t Z t
P-a.s. ∀ t ∈ [0, T ], Xt = X0 + b(s, Xs )ds + σ(s, Xs )dWs .
0 0
203
204 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
Notation. When X0 = x ∈ Rd , one denotes the solution of (SDE) on [0, T ] by X x or (Xtx )t∈[0,T ] .
Remark. • A solution as described in the above theorem is known as a strong solution in the sense
that it is defined on the probability space on which W lives.
• The continuity assumption onb and σ can be relaxed into Borel measurability, if we add the linear
growth assumption
∀ t ∈ [0, T ], ∀ x ∈ Rd , |b(t, x)| + kσ(t, x)k ≤ K 0 (1 + |x|).
In fact if b and σ are continuous this condition follows from (7.2) applied with (t, x) and (t, 0),
given the fact that t 7→ b(t, 0) is bounded on [0, T ].
• By adding the 0th component t to X i.e. be setting Yt := (t, Xt ) one may always assume that
the (SDE) is homogeneous i.e. that the coefficients b and σ only depend on the space variable.
This is often enough for applications although it induces some too stringent assumptions on the
time variable in many theoretical results. Furthermore, when some ellipticity assumptions are
required, this way of considering the equation no longer works since the equation dt = 1dt + 0dWt
is completely degenerate.
X
et = X̄t , t ∈ [0, T ]. (7.4)
7.2. STRONG ERROR RATE AND POLYNOMIAL MOMENTS (I) 205
∀ k ∈ {0, . . . , n − 1}, ∀ t ∈ [tnk , tnk+1 ), X̄t = X̄t + (t − t)b(t, X̄t ) + σ(t, X̄t )(Wt − Wt ). (7.5)
It is clear that lim n X̄t = X̄tnk+1 since W has continuous paths. Consequently, so defined,
t→tn
k+1 ,t<tk+1
Proposition 7.1 Assume that b and σare continuous functions on [0, T ] × Rd . The genuine Euler
scheme is a (continuous) Itô process satisfying the pseudo-SDE with frozen coefficients
that is Z t Z t
X̄t = X0 + b(s, X̄s )ds + σ(s, X̄s )dWs . (7.6)
0 0
Proof. It is clear from (7.5), the recursive definition (7.3) at the discretization dates tnk and the
continuity of b and σ that X̄t → X̄tnk+1 as t → tnk+1 . Consequently, for every t ∈ [tnk , tnk+1 ],
Z t Z t
X̄t = X̄tnk + b(s, X̄s )ds + σ(s, X̄s )dWs
tn
k tn
k
so that the conclusion follows by just concatenate the above identities between 0 and tn1 , . . . , tnk
and t. ♦
Notation: In the main statements, we will write X̄ n instead of X̄ to recall the dependence of the
Euler scheme in its step T /n. Idem for X,
e etc.
Then, the main (classical) result is that under the assumptions on the coefficients b and σ
mentioned above, supt∈[0,T ] |Xt − X̄t | goes to zero in every Lp (P), 0 < p < ∞ as n → +∞. Let us
be more specific on that topic by providing error rates under slightly more stringent assumptions.
How to use this continuous scheme for practical simulation seems not obvious, at least not as
obvious as the stepwise constant Euler scheme. However this turns out to be an important method
to improve the convergence rate of M C simulations e.g. for option pricing. Using this scheme in
simulation relies on the so-called diffusion bridge method and will be detailed further on.
Polynomial moment control. It is often useful to have at hand the following uniform bounds for
the solution(s) of (SDE) and its Euler schemes which first appears as a step of the proof of the
rate but has many other applications: thus it is an important step to prove the existence of global
strong solutions to (SDE).
Proposition 7.2 Assume that the coefficients b and σ of the SDE (7.1) are Borel functions that
simply satisfy the following linear growth assumption:
for some real constant C > 0 and a “horizon” T > 0. Then, for every p ∈ (0, +∞), there exists a
universal positive real constant κp such that every strong solution (Xt )t∈[0,T ] (if any) satisfies
sup |Xt |
≤ 2eκp CT (1 + kX0 kp )
t∈[0,T ] p
One noticeable consequence of this proposition is that, if b and σ satisfy (7.7) with the same real
constant C for every T > 0, then the conclusion holds true for every T > 0, providing a “rough”
exponential control in T of any solution.
Uniform convergence rate in Lp (P). First we introduce the following condition (HTβ ) which
strengthens Assumption (7.2) by adding a time regularity assumption of the Hölder type:
Theorem 7.2 (Strong Rate for the Euler scheme) (a) Continuous Euler scheme. Sup-
pose the coefficients b and σ of the SDE (7.1) satisfy the above regularity condition (HTβ ) for a
real constant Cb,σ,T > 0 and an exponent β ∈ (0, 1]. Then the continuous Euler scheme (X̄ n )t∈[0,T ]
1
converges toward (Xt )t∈[0,T ] in every Lp (P), p > 0, such that X0 ∈ Lp , at a O(n−( 2 ∧β) )-rate. To be
precise, there exists a universal constant κp > 0 only depending on p such that, for every n ≥ T ,
β∧ 1
T 2
sup |Xt − X̄tn |
≤ K(p, b, σ, T )(1 + kX0 kp ) (7.9)
t∈[0,T ] p n
where
0
0
K(p, b, σ, T ) = 2κp (Cb,σ,T )2 eκp (1+Cb,σ,T )T
and
0
Cb,σ,T = Cb,σ,T + sup |b(t, 0)| + sup kσ(t, 0)k < +∞. (7.10)
t∈[0,T ] t∈[0,T ]
7.2. STRONG ERROR RATE AND POLYNOMIAL MOMENTS (I) 207
In particular (7.9) is satisfied when the supremum is restricted to discretization instants, namely
β∧ 1
n
T 2
sup |Xtk − X̄tk |
≤ K(p, b, σ, T )(1 + kX0 kp ) . (7.11)
0≤k≤n p n
(a0 ) If b and σ are defined on the whole R+ × Rd and satisfy (HTβ ) with the same real constant Cb,σ
0
not depending on T and if b( . , 0) and σ( . , 0) are bounded on R+ , then Cb,σ,T does not depend on
T.
So will be the case in the homogeneous case i.e. if b(t, x) = b(x) and σ(t, x) = σ(x), t ∈ R+ ,
x ∈ Rd with b and σ Lipschitz continuous continuous on Rd .
(b) Stepwise constant Euler scheme. As soon as b and σ satisfy the linear growth assump-
tion (7.7) with a real constant Lb,σ,T > 0, then, for every p ∈ (0, +∞) and every n ≥ T ,
r r !
n
n
κ̃p Lb,σ,T T T (1 + log n) 1 + log n
sup |X̄t − X̄t |
≤ κ̃p e (1 + kX0 kp ) =O
t∈[0,T ]
n n
p
where κ̃p > 0 is a positive real constant only depending on p (and increasing in p).
In particular if b and σ satisfy the assumptions of item (a) then the stepwise constant Euler
scheme (Xe n )t∈[0,T ] converges toward (Xt )t∈[0,T ] in every Lp (P), p > 0, such that X0 ∈ Lp and for
every n ≥ T ,
r β∧ 1 !
e n |
≤ K(p,
T (1 + log n) T 2
sup |Xt − X e b, σ, T ) 1 + kX0 kp +
t
t∈[0,T ] p n n
β r !
1 1 + log n
= O +
n n
where
e b, σ, T ) = κ̃0 (1 + C 0 2 κ̃0p (1+Cb,σ,T
0 )T
K(p, p b,σ,T ) e ,
0
κ̃0p > 0 is a positive real constant only depending on p (increasing in p) and Cb,σ,T is given by (7.10).
Warning! The complete and detailed proof of this theorem in its full generality, i.e. including
the management of the constants, is postponed to Section 7.8. It makes use of stochastic calculus.
A first approach to the proof in the one-dimensional quadratic case is proposed in Section 7.2.2.
However, owing to its importance for applications, the optimality of the upper bound for the
stepwise constant Euler scheme will be discussed right after the remarks below.
Remarks. • When n ≤ T , the above explicit bounds still hold true with the same constants
provided one replaces Note that as soon as Tn ≤ 1,
β∧ 1 β 1 ! r r !
T 2 1 T T 2 T 1 T T
2 by + and 1 + log n by 1 + log n +
n 2 n n n 2 n n
which significantly simplifies the formulation of the error bound in item (b) of the above theorem.
208 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
• As a consequence, note that the time regularity exponent β rules the convergence rate of the
scheme as soon as β < 1/2. In fact, the method of proof itself will emphasize this fact: the idea
is to use a Gronwall Lemma to upper-bound the error X − X̄ in Lp (P) by the Lp (P)-norm of the
increments of Xs − Xs .
• If b(t, x) and σ(t, x) are globally Lipschitz continuous continuous on R+ × Rd with Lipschitz
continuous coefficient Cb,σ , one may consider time t as a (d + 1)th spatial component of X and
apply directly item (a0 ) of the above theorem.
The following corollary is a straightforward consequence of claims (a) of the theorem (to be
precise (7.11)): it yields a (first) convergence rate for the pricing of “vanilla” European options
(payoff ϕ(XT )).
Corollary 7.1 Let ϕ : Rd → R be an α-Hölder function for an exponent α ∈ (0, 1], i.e. a function
(y)|α
such that [ϕ]α := supx6=y |f (x)−f
|x−y| < +∞. Then, there exists a real constant Cb,σ,T ∈ (0, ∞) such
that, for every n ≥ 1,
T α
n n 2
|E ϕ(XT ) − E ϕ(X̄T )| ≤ E |ϕ(XT ) − ϕ(X̄T )| ≤ Cb,σ,T [ϕ]α .
n
We will see further on that this rate can be considerably improved when b, σ and ϕ share higher
regularity properties.
About the universality of the rate for the stepwise constant Euler scheme (p ≥ 2). Note that the
rate in claim (b) of the theorem is universal since it holds as a sharp rate for the Brownian motion
itself (here we deal with the case d = 1). Indeed, since W is its own continuous Euler scheme,
k sup |Wt − Wt |kp =
max sup |Wt − Wtnk−1 |
t∈[0,T ]
k=1,...,n n n
t∈[tk−1 ,tk )
p
r
T
= max sup |W − W |
t k−1
f f
n
k=1,...,n t∈[k−1,k)
p
pn
where W
ft :=
T WT t is a standard Brownian motion owing to the scaling property. Hence
n
r
T
sup |Wt − Wt |
=
max ζk
t∈[0,T ]
n
k=1,...,n
p
p
ζk ≥ Zk := |Wk − Wk−1 |.
since the Brownian motion Wt has continuous paths. The sequence (Zk )k≥1 is i.i.d. as well, with the
same distribution as |W1 |. Hence, the random variables Zk2 are still i.i.d. with a χ2 (1)-distribution
so that (see item (b) of the exercises below)
r p
∀ p ≥ 2,
max |Zk |
= k max Zk2 kp/2 ≥ cp log n.
k=1,...,n p k=1,...,n
7.2. STRONG ERROR RATE AND POLYNOMIAL MOMENTS (I) 209
so that, finally, r
Tp
∀ p ≥ 2,
sup |Wt − Wt |
≥ cp log n.
t∈[0,T ] p n
Upper bound. To establish the upper-bound, we proceed as follows. First, note that
ζ1 = max( sup Wt , sup − Wt ) .
t∈[0,1) t∈[0,1)
2
= 2 E eθ(supt∈[0,1) Wt )
!
u2
Z
θW12 du 2
= 2Ee =2 exp − 1
√ =√ < +∞
R 2( √1−2θ )2 2π 1 − 2θ
as long as θ ∈ (0, 12 ). Consequently, it follows from Lemma 7.1 below applied with the sequence
(ζn2 )n≥1 that r p
k max ζk kp = k max ζk2 kp/2 ≤ Cp 1 + log n
k=1,...,n k=1,...,n
n
!
p
X
≤ ϕp E ϕ−1
p (Yk )
k=0
p
= ϕp n E ϕ−1
p (Y1 )
≤ ϕp n E eY1
p
≤ log ep−1 + n E eY1 .
Hence
max Yk
≤ log ep−1 + n E eY1
k=1,...,n p
ep−1
= log n + log E eY1 +
n
≤ log n + Cp,Y1
where Cp,Y1 = log E eY1 + ep−1 .
Let us come back to the general case i.e. E eλY1 < +∞ for a λ > 0. Then
1
max(Y1 , . . . , Yn )
=
max(λY1 , . . . , λYn )
p λ p
1
≤ (log n + Cp,λ,Z1 ). ♦
λ
Exercises. (a) Let Z be a non-negative random variable with distribution function F (z) =
P(Z ≤ z) and a continuous probability density function f . Show that if the survival function
F̄ (z) := P(Z > z) satisfies
∀ z ≥ a > 0, F̄ (z) ≥ c f (z)
then, if (Zn )n≥1 is i.i.d. with distribution PZ (dz),
n
X 1 − F k (a)
∀ p ≥ 1,
max(Z1 , . . . , Zn )
≥ c = c(log n + log(1 − F (a))) + C + εn
p k
k=1
with limn εn = 0.
[Hint: one may assume p = 1. Then use the classical representation formula
Z +∞
EU = P(U ≥ u)du
0
7.2. STRONG ERROR RATE AND POLYNOMIAL MOMENTS (I) 211
for any non-negative random variable U and some basic facts about Stieljès integral like dF (z) =
f (z)dz, etc.]
−u
(b) Show that the χ2 (1) distribution defined by f (u) := √e 2πu
2
1{u>0} du satisfies the above inequality
for any η > 0 [Hint: use an integration by parts and usual comparison theorems on integrals to
show that
z Z +∞ − u z
2e− 2 e 2 2e− 2
F̄ (z) = √ − √ du ∼ √ as z → +∞ .]
2πz z u 2πu 2πz
A.s. convergence rate(s). The last important result of this section is devoted to the a.s. conver-
gence of the Euler schemes toward the diffusion process with a first (elementary) approach to its
rate of convergence.
Theorem 7.3 If b and σ satisfy (HTβ ) for a β ∈ (0, 1] and if X0 is a.s. finite, the continuous
Euler scheme X̄ n = (X̄tn )t∈[0,T ] a.s. converges toward the diffusion X for the sup-norm over [0, T ].
Furthermore, for every α ∈ [0, β ∧ 12 ),
a.s.
nα sup |Xt − X̄tn | −→ 0.
t∈[0,T ]
The proof follows from the Lp -convergence theorem by an approach “à la Borel-Cantelli”. The
details are deferred to Section 7.8.6.
Lemma 7.2 (Gronwall Lemma) Let f : R+ → R+ be a Borel non-negative locally bounded function
and let ψ : R+ → R+ be a non-decreasing function satisfying
Z t
(G) ≡ ∀ t ≥ 0, f (t) ≤ α f (s) ds + ψ(t)
0
Proof. It is clear that the non-decreasing (finite) function ϕ(t) := sup0≤s≤t f (s) satisfies (G)
Rt
instead of f . Now the function e−αt 0 ϕ(s) ds has a right derivative at every t ≥ 0 and that
Z t 0 Z t
−αt −αt
e ϕ(s)ds = e (ϕ(t+) − α ϕ(s) ds)
0 r 0
≤ e−αt ψ(t+)
where ϕ(t+) and ψ(t+) denote right limits of ϕ and ψ at t. Then, it follows from the fundamental
theorem of calculus that
Z t Z t
−αt
e ϕ(s)ds − e−αs ψ(s+) ds is non-increasing.
0 0
where we used successively that a monotone function is ds-a.s. continuous and that ψ is non-
decreasing. ♦
Now we recall the classical Doob’s Inequality that is needed to carry out the proof (instead of
the more sophisticated Burkhölder-Davis-Gundy Inequality which is necessary in the non-quadratic
case).
Doob’s Inequality. (see e.g. [91]) (a) Let M = (Mt )t≥0 be a continuous martingale with M0 = 0.
Then, for every T > 0, !
E sup Mt2 ≤ 4 E MT2 = 4 E hM iT .
t∈[0,T ]
(b) If M is simply a continuous local martingale with M0 = 0, then, for every T > 0,
!
E sup Mt2 ≤ 4 E hM iT .
t∈[0,T ]
Proof of Proposition 7.2 (A first partial). We may assume without loss of generality that
E X02 < +∞ (otherwise the inequality is trivially fulfilled). Let τL := min{t : |Xt − X0 | ≥ L},
7.2. STRONG ERROR RATE AND POLYNOMIAL MOMENTS (I) 213
L ∈ N \ {0} (with the usual convention min ∅ = +∞) . It is a positive F-stopping time as the
hitting time of a closed set by a process with continuous paths. Furthermore, for every t ∈ [0, T ],
τ
|Xt L | ≤ L + |X0 |, t ∈ [0, ∞).
Then,
Z t∧τL Z t∧τL
τ
Xt L = X0 + b(Xs )ds + σ(Xs )dWs
0 0
Z t∧τL Z t∧τL
τ τ
= X0 + b(Xs L )ds + σ(Xs L )dWs
0 0
owing to the local feature of (standard and) stochastic integral(s). The stochastic integral
Z t∧τ
(L) L τ
Mt := σ(Xs L )dWs
0
The elementary inequality (a + b + c)2 ≤ 3(a2 + b2 + c2 ) (a, b, c ≥ 0), combined with the Schwarz
Inequality successively yields
2 !
Z t
τ τ
sup (Xs L )2 ≤ 3 X02 + |b(Xs L )|ds + sup |Ms(L) |2
s∈[0,t] 0 s∈[0,t]
!
Z t
τ
≤ 3 X02 +t |b(Xs L )|2 ds + sup |Ms(L) |2 .
0 s∈[0,t]
as Lipschitz continuous continuous functions. Then, taking expectation and using Doob Inequality
for the local martingale M (L) yield for an appropriate real constant Cb,σ,T > 0 (that may vary from
line to line)
Z t Z t∧τ
τL 2 τL 2 L τL
2 2
E( sup (Xs ) ) ≤ 3 EX0 + T Cb,σ (1 + E|Xs | )ds + E σ (Xs )ds
s∈[0,t] 0 0
Z t Z t
τ τ
≤ Cb,σ,T EX02 + (1 + E|Xs L |2 )ds +E (1 + |Xs L |)2 ds
0 0
Z t
τ
= Cb,σ,T EX02 + (1 + E|Xs L |2 )ds
0
where we used again (in the first inequality) that τL ∧ t ≤ t. Finally, this can be rewritten
Z t
τ τ
E( sup (Xs L )2 ) ≤ Cb,σ,T 1+ EX02 + E(|Xs L |2 )ds
s∈[0,t] 0
for a new real constant Cb,σ,T . Then the Gronwall Lemma applied to the bounded function fL (t) :=
τ
E(sups∈[0,t] (Xs L )2 ) (at time t = T ) implies
τ
E( sup (Xs L )2 ) ≤ Cb,σ,T (1 + EX02 )eCb,σ,T T .
s∈[0,T ]
This holds or every L ≥ 1. Now τL ↑ +∞ a.s. as L ↑ +∞ since sup0≤s≤t |Xt | < +∞ for every t ≥ 0
a.s. Consequently,
τ
lim sup |Xs L | = sup |Xs |.
L→+∞ s∈[0,T ] s∈[0,T ]
As for the Euler scheme the same proof works perfectly if we introduce the stopping time
τ̄L = min t : |X̄t − X0 | ≥ L big}
and if we note that, for every s ∈ [0, T ], sup |X̄u | ≤ sup |X̄u |. Then one shows that
u∈[0,s] u∈[0,s]
!
sup E sup (X̄sn )2 ≤ Cb,σ,T (1 + EX02 )eCb,σ,T T . ♦
n≥1 s∈[0,T ]
Proof of Theorem 7.2 (partial) (a) (Convergence rate of the continuous Euler scheme). Com-
bining the equations satisfied by X and its (continuous) Euler scheme yields
Z t Z t
Xt − X̄t = (b(Xs ) − b(X̄s ))ds + (σ(Xs ) − σ(X̄s ))dWs .
0 0
7.2. STRONG ERROR RATE AND POLYNOMIAL MOMENTS (I) 215
Consequently, using that b and σ are Lipschitz, Schwartz and Doob Inequality lead to
Z t 2 Z s 2
2
E sup |Xs − X̄s | ≤ 2E [b]Lip |Xs − X̄s |ds + 2 E sup (σ(Xu ) − σ(X̄u ))dWu
s∈[0,t] 0 s∈[0,t] 0
Z t 2 Z t
≤ 2E [b]Lip |Xs − X̄s |ds + 8 E (σ(Xu ) − σ(X̄u ))2 du
0 0
Z t Z t
2 2 2
≤ 2T [b]Lip E |Xs − X̄s | ds + 8 [σ]Lip E|Xu − X̄u |2 du
0 0
Z t Z t
2 2
≤ Cb,σ,T E|Xs − X̄s | ds + 8[σ]Lip E|Xu − X̄u |2 du
0 0
Z t Z t
2
≤ Cb,σ,T E sup |Xu − X̄u | ds + Cb,σ,T E|X̄s − X̄s |2 ds.
0 u∈[0,s] 0
The function f (t) := E sups∈[0,t] |Xs − X̄s |2 is locally bounded owing to Step 1. Consequently, it
follows from Gronwall Lemma (at t = T ) that
Z T
E sup |Xs − X̄s |2 ≤ Cb,σ,T E|X̄s − X̄s |2 ds eCb,σ,T T .
s∈[0,T ] 0
Now
X̄s − X̄s = b(X̄s )(s − s) + σ(X̄s )(Ws − Ws ) (7.13)
so that, using Step 1 (for the Euler scheme) and the fact that Ws − Ws and X̄s are independent
2 T 2 2 2 2
E|X̄s − X̄s | ≤ Cb,σ E b (X̄s ) + E σ (X̄s ) E(Ws − Ws )
n
T 2 T
≤ Cb,σ (1 + E sup |X̄t |2 ) +
t∈[0,T ] n n
T
= Cb,σ 1 + E X02 .
n
(b) Stepwise constant Euler scheme. We assume here – for pure convenience – that X0 ∈ L4 . One
derives from (7.13) and the linear growth assumption satisfied by b and σ (since they are Lipschitz
continuous continuous) that
T
sup |X̄t − X̄t | ≤ Cb,σ 1 + sup |X̄t | + sup |Wt − Wt |
t∈[0,T ] t∈[0,T ] n t∈[0,T ]
so that,
T
k sup |X̄t − X̄t |k2 ≤ Cb,σ
1 + sup |X̄t | + sup |Wt − Wt |
.
t∈[0,T ]
t∈[0,T ] n t∈[0,T ]
2
Now, if U and V are real valued random variables, Schwarz Inequality implies
p p
kU V k2 = kU 2 V 2 k1 ≤ kU 2 k2 kV 2 k2 = kU k4 kV k4 .
216 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
Consequently
k sup |X̄t − X̄t |k2 ≤ Cb,σ (1 + k sup |X̄t |k4 )(T /n + k sup |Wt − Wt |k4 ).
t∈[0,T ] t∈[0,T ] t∈[0,T ]
Now, as already mentioned in the first remark that follows Theorem 7.2,
r
Tp
k sup |Wt − Wt |k4 ≤ CW 1 + log n
t∈[0,T ] n
which completes the proof (if we admit that k supt∈[0,T ] |X̄t |k4 ≤ Cb,σ,T (1 + kX0 k4 )). ♦
Remarks. • The proof in the general Lp framework follows exactly the same lines, except that
one replaces Doob’s Inequality for continuous (local) martingale (Mt )t≥0 by the so-called Burkh
”older-Davis-Gundy Inequality (see e.g. [141]) which holds for every exponent p > 0 (only in the
continuous setting)
∀ t ≥ 0, k sup |Ms |kp ≤ Cp khM it k2p
s∈[0,t] 2
where Cp is a positive real constant only depending on p. This general setting is developed in full
details in Section 7.8 (in the one dimensional case to alleviate notations).
• In some so-called mean-reverting situations one may even get boundedness over t ∈ (0, ∞).
We still consider a Brownian diffusion process solution with drift b and diffusion coefficient σ,
starting at X0 , solution to the SDE (7.1). Furthermore, we assume that b : [0, T ] × Rd → Rd and
σ : [0, T ] × Rd → Md,q (R) are Borel functions satisfying a Lipschitz continuous assumption in x
uniformly in t ∈ [0, T ] i.e.
|b(t, x) − b(t, y)| kσ(t, x) − σ(t, y)k
[b]Lip = sup < +∞ and [σ]Lip = sup < +∞.
t∈[0,T ],x6=y |x − y] t∈[0,T ],x6=y |x − y]
The (augmented) natural filtration of the driving standard q-dimensional Brownian motion W
defined on a probability space (Ω, A, P) will be denoted (FtW )t∈[0,T ] .
The definition of the Brownian Euler scheme with step T /n (starting at X0 ), is unchanged but
to alleviate notations we will use the notation X̄kn (rather than X̄tnn ). So we have: X̄0n = X0 and
k
r
n T n n T
X̄k+1 = X̄kn + b(t , X̄ ) + σ(tnk , X̄kn ) Zk+1 , k = 0, . . . , n − 1
n k k n
7.3. NON ASYMPTOTIC DEVIATION INEQUALITIES FOR THE EULER SCHEME 217
where tnk = kT
pn
n , k = 0, . . . , n and Zk = T (Wtk − Wtk−1 ), k = 1, . . . , n is an i.i.d. sequence of
n n
N (0, Iq ) random vectors. When X0 = x ∈ R , we may denote occasionally by (X̄kn,x )0≤k≤n the
d
Lemma 7.3 Under the above assumptions on b and σ, the transitions Pk satisfy
hT n i2 T 1
2
with [Pk ]Lip = b(tk , .)
1+ + [σ(tnk , .)]2Lip
n Lip n
T T 12
≤ 1+ Cb,σ + κb
n n
The key property is the following classical exponential inequality for the Gaussian measure for
which we refer to [100].
Proposition 7.3 Let f : Rq → R be Lipschitz continuous continuous function (with respect to the
canonical Euclidean measure) and let Z be an N (0; Iq ) distributed random vector. Then
λ2
λ f (Z)−Ef (Z) [f ]2Lip
∀ λ > 0, E e ≤e 2 . (7.14)
218 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
and, using the Wiener isometry, we derive the covariance matrix ΣΞxt of Ξxt given by
Z t Z t
s s
ΣΞxt = et E e 2 dWsk e 2 dWs`
0 0 1≤k,`≤q
Z t
= e−t es dsIq
0
= (1 − e−t )Iq .
(The time covariance structure of the process can also be computed easily but is of no use in this
proof). As a consequence, for every Borel function g : Rq → R with polynomial growth
L
p
Qt g(x) := E g Ξxt = E g x e−t/2 + 1 − e−t Z
where Z ∼ N (0; Iq ).
where L is the infinitesimal generator of the above equation (known as the Ornstein-Uhlenbeck
operator) which maps g to
1
ξ 7−→ Lg(ξ) = ∆g(ξ) − (ξ|∇g(ξ)) .
2
7.3. NON ASYMPTOTIC DEVIATION INEQUALITIES FOR THE EULER SCHEME 219
gx0 i (ξ)
..
where ∆g(ξ) = qi=1 g”x2 (ξ) denotes the Laplacian of g and ∇g(ξ) =
P
. Now, if h and g
i
.
0
gxq (ξ)
are both twice differentiable, one has
q Z
X |z|2 dz
gx0 k (z)h0xk (z)e−
E (∇g(Z)|∇h(Z) = 2
d
k=1 Rq (2π) 2
∂ 0 |z|2 |z|2
hxk (z)e− 2 = e− 2 h”x2 (z) − zk h0xk (z) ,
∂zk k
yields
E (∇g(Z)|∇h(Z) − 2E G(Z)Lh(Z) . (7.16)
since h and all its existing partial derivative have at lost polynomial growth. One also derive from
the above identity and the continuity of s 7→ E (Lg)(Ξxs ) that
d
Qt g(x) = E Lg(Ξxt ) = Qt Lg(x).
dt
Furthermore, one shows using the same arguments that Hλ is differentiable over the whole real
line and
1
= − λE ∇z (eλQt f (z) )|z=Z |Qt f (Z)
2
λ2
= − e−t E eλQt f (Z) |Qt ∇f (Z)|2
2
where we used successively (7.16) and (7.15) in the last two lines.
220 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
We are now in position to state the main result of this section and its application to the design
of confidence intervals (see also [54]).
7.3. NON ASYMPTOTIC DEVIATION INEQUALITIES FOR THE EULER SCHEME 221
Theorem 7.4 Assume |||σ|||sup := sup(t,x)∈[0,T ]×Rd |||σ(t, x)||| < +∞. Then, for every integer
n0 ≥ 1, there exists a real constant K(b, σ, T, n0 ) ∈ (0, +∞) such that, for every n ≥ n0 and
Lipschitz continuous continuous function f : Rd → R,
λ2 2 2
λ f (X̄nn )−E f (X̄nn )
∀ λ > 0, E e ≤ e 2 |||σ|||sup [f ]Lip K(b,σ,T,n) . (7.17)
Application. Let us briefly recall that such exponential inequalities yield concentration inequal-
ities in the strong law of large numbers. Let (X̄ n,` )`≥1 be independent copies of the Euler scheme
X̄ n = (X̄kn )0≤k≤n . Then, for every ε > 0, Markov inequality and independence imply, for every
integer n ≥ n0 ,
M
1 X λ2
|||σ|||2sup [f ]2Lip K(b,σ,T,n0 )
f X̄nn,` − E f (X̄nn ) > ε inf e−λM ε+M
P ≤ 2
M λ>0
`=1
ε2 M
−
2|||σ|||2 2
sup [f ]Lip K(b,σ,T,n0 )
= e
so that by symmetry,
M ε2 M
1 X −
n,` n 2|||σ|||2 2
sup [f ]Lip K(b,σ,T,n0 )
f X̄n − E f (X̄n ) > ε ≤ 2e .
P
M
`=1
The crucial facts in the above inequality, beyond the fact that it holds for possibly unbounded
Lipschitz continuous functions f , is that the right hand upper-bound does not depend on the time
step Tn of the Euler scheme. Consequently we can design confidence intervals for Monte Carlo
simulations based on the Euler schemes uniformly in the time discretization step Tn .
Doing so, we can design non asymptotic confidence interval when computing E f (XT ) by a
Monte Carlo simulation. We know that the bias is due to the discretization scheme and only
depends on the step Tn : usually (see Section 7.6 for the (expansion of the) weak error), one has
c
E f (X̄Tn ) = E f (XT ) + + o(1/nα ) with α = 1
2 or 1.
nα
Remark. Under the assumptions we make on b and σ, the Euler scheme converges a.s. and in
every Lp spaces (provided X0 lies in Lp for the sup norm over [0, T ] toward the diffusion process
X, so that we retrieve the result for (independent copies of) the diffusion itself, namely
M ε2 M
1 X −
` 2|||σ|||2 2
sup [f ]Lip K(b,σ,T )
f XT − E f (XT ) > ε ≤ 2e
P
M
`=1
Cb,σ T
e
with K(b, σ, T ) = Cb,σ (with Cb,σ = 2[b]Lip + [σ]2Lip ).
222 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
Proof of Theorem 7.4. The starting point is the following: let k ∈ {0, . . . , n − 1}, and let
r
T n n T
Ek (x, z) = x + b(tk , x) + σ(tk , x) z, x ∈ Rd , z ∈ Rq , k = 0, . . . , n − 1,
n n
denote the Euler scheme operator. Then, it follows from Proposition 7.3, that for every Lipschitz
continuous continuous function f : Rd → Rq
2
λ2
q
T
|||σ(tn [f ]2Lip
λ f (Ek (x,Z))−Pk f (x) k ,x)|||
∀ λ > 0, E e ≤e 2 n .
since z 7→ f (E(x, z)) is Lipschitz continuous continuous with respect to the canonical Euclidean
norm with a coefficient upper bounded by the trace norm of σ(tnk , x).
Consequently,
λ2 T
n
n 2 2
∀ λ > 0, E eλf (X̄k+1 ) | FtW
n ≤ eλPk f (X̄k )+ 2 n |||σ|||sup [f ]Lip .
k
whenever n ≥ n0 . ♦
Further comments. It seems hopeless to get a concentration inequality for the supremum of the
Monte Carlo error over all Lipschitz continuous functions (with Lipschitz continuous coefficients
bounded by 1) as emphasized below. Let Lip1 (Rd , R) be the set of Lipschitz continuous continuous
functions from Rd to R with a Lipschitz continuous coefficient [f ]Lip ≤ 1.
Let X` : (Ω, A, P) → R, ` ≥ 1, be independent copies of an integrable random vector X with
distribution denoted PX . For every ω ∈ Ω and every M ≥ 1, the function defined for every ξ ∈ Rd
by
fω,M (ξ) = min |X` (ω) − ξ|.
1≤`≤M
It is clear from its very definition that fω,M ∈ Lip1 (Rd , R). Then, for every ω ∈ Ω,
1 X M Z M Z
1 X
sup f (X` (ω)) − f (ξ)PX (dξ) ≥ fω,M (X` (ω)) − fω,M (ξ)PX (dξ)
f ∈Lip1 (Rd ,R) M k=1 M k=1 |
Rd {z } Rd
=0
Z
= fω,M (ξ)PX (dξ)
Rd
≥ e1,M (X) := inf E min |X − xi |.
x1 ,...,xM ∈Rd x1 ,...,xM
The lower bound in the last line is but the optimal L1 -quantization error of the (distribution of
the) random vector X. It follows from Zador’s Theorem (see e.g. [66]) that
1
lim inf M − d e1,M (X) ≥ J1,d kϕX k d
M d+1
where ϕX denotes the density – possibly equal to 0 – of the nonsingular part of the distribution
PX of X with respect to the Lebesgue measure, J1,d ∈ (0, ∞) is a universal constant and the
pseudo-norm kϕX k d is finite as soon as X ∈ L1+ = ∪η>0 Lp+η . Furthermore, it is clear that as
d+1
soon as the support of PX is infinite, e1,M (X) > 0. Consequently, for non-singular random vector
distributions, we have
1 XM
− d1 1
M sup f (X ) − f (X) ≥ inf M − d e1,M (X) > 0.
` E
f ∈Lip1 (Rd ,R) M
M ≥1
k=1
The right hand side of the above equation corresponds to the self-random quantization of the distri-
bution PX . It has been shown in [36, 37] that, under appropriate assumptions on the distributions
PX (satisfied e.g. by the normal distribution), one has, P-a.s.,
1
lim sup M − d (e1,M (X1 (ω), . . . , XM (ω), X) < +∞.
which shows that, for a wide class of random vector distributions, a.s.,
1 XM 1
sup f (X` (ω)) − E f (X)) C(f )M − d
M
f ∈Lip1 (Rd ,R)
k=1
This illustrates that the strong law of large numbers/Monte Carlo method is not as “dimension
free” as it is commonly admitted.
then r
1 + log n
|E(F ((Xt )t∈[0,T ] )) − E(F (X̄tn )t∈[0,T ] )| ≤ [F ]Lip Cb,σ,T
n
and
1
|E(F ((Xt )t∈[0,T ] ) − E(F (X̄tn )t∈[0,T ] )| ≤ Cn− 2 .
Typical example in option pricing. Assume that a one dimensional diffusion process X =
(Xt )t∈[0,T ] models the dynamics of a single risky asset (we do not take into account here the con-
sequences on the drift and diffusion coefficient term induced by the preservation of non-negativity
and the martingale property under a risk-neutral probability for the discounted asset. . . ).
The (partial) Lookback Lookback payoffs:
hT := XT − λ min Xt
t∈[0,T ] +
where λ = 1 in the regular Lookback case and λ > 1 in the so-called “partial Lookback” case.
“Vanilla” payoffs on extrema (like Calls and Puts)
hT = ϕ( sup Xt ),
t∈[0,T ]
where ϕ is Lipschitz continuous on R q.R In fact such Asian payoffs are continuous with respect to
T 2
the pathwise L2 -norm i.e. kf kL2 := 0 f (s)ds.
T
We wish to analyze the behaviour of the diffusion process Xtx for small values of t in order
to detect the term of order at most 1 i.e. that goes to 0 like t when t → 0 (with respect to the
L2 (P)-norm). Let us inspect the two integral terms successively. First,
Z t
b(Xsx )ds = b(x)t + o(t) as t → 0
0
so that
Z t Z tZ s
σ(Xsx )dWs = σ(x)Wt + σ(Xux )σ 0 (Xux )dWu dWs + OL2 (t3/2 ) (7.18)
0 0 0
Z t
= σ(x)Wt + σσ 0 (x) Ws dWs + oL2 (t) + OL2 (t3/2 ) (7.19)
0
1
= σ(x)Wt + σσ 0 (x)(Wt2 − t) + oL2 (t).
2
The OL2 (t3/2 ) in (7.18) comes from the fact that u 7→ σ 0 (Xux )b(Xux ) + 21 σ”(Xux )σ 2 (Xux ) is L2 (P)-
bounded in t in the neighbourhood of 0 (note that b and σ have at most linear growth and use
Proposition 7.2). Consequently, using the fundamental isometry property of stochastic integration,
Fubini-Tonnelli Theorem and Proposition 7.2,
Z tZ s 2 Z t Z s 2
0 1
0 1
E σ (Xux )b(Xux )+ σ”(Xux )σ 2 (Xux ) dudWs =E σ (Xux )b(Xux )+ σ”(Xux )σ 2 (Xux ) du ds
0 0 2 0 0 2
Z t Z s 2
≤ C(1 + x4 ) du ds = C(1 + x4 )t3 .
0 0
The oL2 (t) in Equation (7.19) also follows from the fundamental isometry property of stochastic
integral (twice) and Fubini-Tonnelli Theorem which yields
Z t Z s 2 Z tZ s
0 0
E σσ (Xux ) − σσ (x) dWu dWs = ε(u)duds
0 0 0 0
where ε(u) = E(σσ 0 (Xux ) − σσ 0 (x))2 goes to 0 as u → 0 by the Lebesgue dominated convergence
Theorem. Finally note that
starting at a random vector X0 , independent of the standard Brownian motion W . Using the
Markov property of the diffusion, one can reproduce the above reasoning on each time step [tnk , tnk+1 )
given the value of the scheme at time tnk . This leads to the following simulatable scheme:
e mil = X0 ,
X0
r
etmil etmil etmil 1 0 e mil T etn ) T Uk+1 + 1 σσ 0 (X
mil etmil T 2
X n = X n + b(X n ) − σσ (Xtn ) + σ(X n ) U , (7.20)
k+1 k k 2 k n k n 2 k n k+1
pn
where Uk = T (Wtk
n − Wtnk−1 ), k = 1, . . . , n.
7.5. MILSTEIN SCHEME (LOOKING FOR BETTER STRONG RATES. . . ) 227
By interpolating the above scheme between the discretization times we define the continuous
Milstein scheme defined (with our standard notations) by
e mil = X0 ,
X0
e mil = X
X e mil )− 1 σσ 0 (X
e mil +(b(X e mil )(Wt − Wt )+ 1 σσ 0 (X
e mil ))(t − t)+σ(X e mil )(Wt − Wt )2 . (7.21)
t t t t t t
2 2
The following theorem gives the rate of strong pathwise convergence of the Milstein scheme.
Theorem 7.5 (Strong rate for the Milstein scheme) (See e.g. [83]) (a) Assume that b and
σ are C 1 on R with bounded, αb0 and ασ0 -Hölder continuous first derivatives respectively, αb0 ,
ασ0 ∈ (0, 1]. Then, for every p ∈ (0, ∞), there exists a real constant Cb,σ,T,p > 0 such that, for every
X0 ∈ Lp (P), independent of the Brownian motion W , one has
1+α
mil,n
mil,n
T 2
max |Xtnk − Xtn |
≤
sup |Xt − Xt |
≤ Cb,σ,T,p (1 + kX0 kp )
e
e
0≤k≤n k p t∈[0,T ] p n
Remarks. • As soon as the derivatives of b0 and σ 0 are Hölder, the (continuous) Milstein scheme
converges faster than the Euler scheme.
• The O(1/n)-rate obtained here should be compared to the weak rate investigated in Section 7.6
which is also O(1/n). Comparing performances of both approaches for the computation of E f (XT )
should rely on numerical evidences and (may) depend on the specified diffusion or function.
• The second claim of the theorem shows that the stepwise constant Milstein scheme does not
converge faster than the stepwise constant Euler scheme. To get convinced of this rate without
rigorous proof, one has just to think of the Brownian motion itself: in that case σ 0 ≡ 0 so that
the stepswise constant Milstein and Euler schemes coincide and subsequently converge at the same
rate! As a consequence, since it is the only simulatable version of the Milstein scheme, its use
for the approximate computation (by Monte Carlo simulation) of the expectation of functionals
E F ((Xt )t∈[0,T ] ) of the diffusion should not provide better results than implementing the standard
stepwise Euler scheme as briefly described in Section 7.4.
By contrast, it happens that functionals of the continuous Euler scheme can be simulated: this
is the purpose of Chapter 8 devoted to diffusion bridges.
A detailed proof in the case X0 = x ∈ R and p ∈ [2, ∞) is provided in Section 7.8.8.
Exercise. Derive from these Lp -rates an a.s. rate of convergence for the Milstein scheme.
228 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
(b) Show that, under the assumption of (a), the continuous Milstein scheme also satisfies X etmil ≥ 0
for every t ∈ [0, T ].
(c) Let Yt = eκt Xt , t ∈ [0, T ]. Show that y is solution to a stochastic differential equation
dYt = eb(t, Yt ) + σ
e(t, Yt )dWt
e : [0, T ] × R → R are functions depending on b, σ and κ. Deduce by mimicking the proof
where eb, σ
b
of (a) that the Milstein scheme of Y is non-negative as soon as 2 0 ≥ η > 0 on the real line.
(σ )
k = 0, . . . , n − 1,
where ∆Wtnk+1 := Wtnk+1 − Wtnk , σ.k (x) denotes the k th column of the matrix σ and
d
d
X ∂σ.i
∀ x = (x1 , . . . , xd ) ∈ R , ∂σ.i σ.j (x) := (x)σ`j (x) ∈ Rd .
∂x`
`=1
Having a look at this formula when d = 1 and q = 2 shows that simulating the Milstein scheme
at times tnk in a general setting amounts to being able to simulate the joint distribution of the
triplet Z t
1 2 1 2
Wt , Wt , Ws dWs at time t = tn1
0
that is the joint distribution of two independent Brownian motions and their Lévy area. Then, the
simulation of
Z tn !
k
Wt1n − Wt1n Wt2n − Wt2n , (Ws1 − Wt1n )dWs2 , k = 1, . . . , n.
k k−1 k k−1 k
tn
k−1
7.5. MILSTEIN SCHEME (LOOKING FOR BETTER STRONG RATES. . . ) 229
No convincing (i.e. efficient) method to achieve that has been proposed so far in the literature at
least when q ≥ 3 or 4.
and Z tn
k+1 1 T
(Wsi − Wtin )dWsi = (∆Wtin )2 − .
tn
k
k 2 k+1 n
The announced form for the scheme follows. ♦
In this very special situation, the scheme (7.23) can still be simulated easily in dimension d
since it only involves some Brownian increments.
The rate of convergence of the Milstein scheme is formally the same in higher dimension as it is
in 1-dimension: Theorem 7.5 remains true with a d-dimensional diffusion driven by a q-dimensional
Brownian motion provided b : Rd → Rd and σ : Rd → M(d, q, R) are C 2 with bounded existing
partial derivatives.
Theorem 7.6 (See e.g. [83]) Assume that b and σ are C 2 on Rd with bounded existing partial
derivatives. Then, for every p ∈ (0, ∞) there exists a real constant Cp,b,σ,T > 0 such that for every
X0 ∈ Lp (P), independent of the q-dimensional Brownian motion W , the error bound established in
the former Theorem 7.5(a) in the Lipschitz continuous continuous case remains valid.
However, one must keep in mind that this result has nothing to do with the ability to simulate
this scheme.
In a way, one important consequence of the above theorem about the strong rate of convergence
of the Milstein scheme concerns the Euler scheme.
230 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
Corollary 7.2 If the drift b is C 2 (Rd , Rd ) with bounded existing partial derivatives and if σ(x) = Σ
is constant, then the Euler scheme and the Milstein scheme coincide at the discretization times tnk .
As a consequence, the strong rate of convergence of the Euler scheme is, in that very specific case,
O( n1 ). Namely, for every p ∈ (0, ∞), one has
max |Xtn − X̄tnn |
≤ Cb,σ,T T (1 + kX0 kp ).
0≤k≤n k k
p n
In fact the first inequality in this chain turns out to be highly non optimal since it switches from
a weak error (the difference only depending on the respective(marginal) distributions of XT and
X̄Tn ) to a pathwise approximation XT − X̄Tn . To improve asymptotically the other two inequalities
is hopeless since it has been shown (see the remark and comments in Section 7.8.6 for a brief
introduction) that under appropriate assumptions XT − X̄Tn satisfy a central limit theorem at rate
√
n with non-zero asymptotic variance. In fact one even has a functional form of this central limit
theorem for the whole process (Xt − X̄tn )t∈[0,T ] (see [88, 74]). As a consequence a rate faster than
1
n− 2 is possible in a L1 sense would not be consistent with this central limit behaviour.
Furthermore, numerical experiments confirm that the weak rate of convergence between the
1
above two expectations is usually much faster then n− 2 . This fact has been known for long and
has been extensively investigated in the literature, starting with the two seminal papers [152] by
Talay-Tubaro and [9] by Bally-Talay), leading to an expansion of the time discretization error
at an arbitrary accuracy when b and σ are smooth enough as functions. Two main settings have
drawn attention: the case where f is a smooth functions and the case where the diffusion itself is
“regularizing” i.e. propagate the regularizing effect of the driving Brownian motion (1 ) thanks to
a non-degeneracy assumption on the diffusion coefficient σ typically like uniform ellipticity for σ
(see below) or weaker assumption of Hörmander type.
The same kind of question has been investigated for specific classes of functionals F of the
whole path of the diffusion X with some applications to path-dependent option pricing. These
(although partial) results show that the resulting weak rate is the same as the strong rate that can
be obtained with the Milstein scheme for this type of functionals (when Lipschitz continuous with
respect to the sup-norm over [0, T ]).
1
The regularizing property of the Brownian motion should be understood as follows: if f is a Borel bounded
function on Rd , then fσ (x) := E(f (x + σW )) is a C ∞ function for every σ > 0 and converges towards f as σ → 0
in every Lp space, p > 0. This result is but a classical convolution result with a Gaussian kernel rewritten in a
probabilistic form.
7.6. WEAK ERROR FOR THE EULER SCHEME (I) 231
As a second step, we will show how the so-called Richardson-Romberg extrapolation methods
provides a systematic procedure to take optimally advantage of these weak rates, including their
higher order form.
7.6.1 Main results for E f (XT ): the Talay-Tubaro and Bally-Talay Theorems
We adopt the notations of the former section 7.1, except that we consider, mostly for convenience
an homogeneous SDE starting at x ∈ Rd ,
The notations (Xtx )t∈[0,T ] and (X̄t )t∈[0,T ] denote the diffusion and the Euler scheme of the diffusion
starting at x at time 0 (an exponent n will sometimes be added to emphasize that the step of the
scheme is Tn ).
Our first result is the first result on the weak error, the one which can be obtained with the less
stringent assumptions on b and σ.
Theorem 7.7 (see [152]) Assume b and σ are 4 times continuously differentiable on Rd with
bounded existing partial derivatives (this implies that b and σ are Lipschitz). Assume f : Rd → R
is 4 times differentiable with polynomial growth as well as its existing partial derivatives. Then, for
every x ∈ Rd ,
x n,x 1
E f (XT ) − E f (X̄T ) = O as n → +∞. (7.24)
n
Pt g(x) := E g(Xtx ).
On the other hand, the Euler scheme with step Tn starting at x ∈ R, denoted (X̄txn )0≤k ≤n , is a
k
discrete time homogeneous Markov (indexed by k) chain with transition
r !
T d
P̄ g(x) = E g x + σ(x) Z , Z = N (0; 1).
n
To be more precise this means for the diffusion process that, for any Borel bounded or non-negative
test function g
∀ s, t ≥ 0, Pt g(x) = E(g(Xs+t | Xs = x) = E g(Xtx )
kT
and for its Euler scheme (still with tnk = n )
Now
E f (XTx ) = PT f (x) = P Tn (f )(x)
n
and
E f (X̄Tx ) = P̄ n (f )(x)
232 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
so that,
n
X
x x
E f (XT ) − E f (X̄T ) = P Tk (P̄ n−k f )(x) − P Tk−1 (P̄ n−(k−1) f )(x)
n n
k=1
Xn
= P Tk−1 ((P T − P̄ )(P̄ n−k f ))(x). (7.25)
n n
k=1
P T g(x) − P̄ g(x),
n
with respect to the step Tn and the (four) first derivatives of the function g.
– the second one is to control the (four first) derivatives of the functions P̄ ` g for the sup norm
when g is regular, uniformly in ` ∈ {1, . . . , n} and n ≥ 1, in order to propagate the above local
error bound.
Let us deal with the first task. To be precise, let g : R → R be a four times differentiable
function g with bounded existing derivatives. First, Itô’s formula yields
Z t Z t
x 0 x 1
Pt g(x) := E g(XT ) = g(x) + E (g σ)(Xs )dWs + E g”(Xsx )σ 2 (Xsx )ds
0 2 0
| {z }
=0
P̄ g(x) = E g(X̄Tx )
!r r !2
T 1 T
= g(x) + g 0 (x)σ(x)E Z + (g”σ 2 )(x)E Z
n 2 n
r !3 r !4
σ 3 (x) T σ 4 (x) T
+g (3) (x) E Z + E g (4) (ξ) Z
3! n 4! n
r !4
T 4
σ (x) (4) T
= g(x) + (g”σ 2 )(x) + E g (ξ) Z
2n 4! n
T σ 4 (x)T 2
= g(x) + (g”σ 2 )(x) + cn (g),
2n 4!n2
where |cn (g)| ≤ 3kg (4) k∞ . This follows from the well-known facts that EZ = EZ 3 = 0 and E Z 2 = 1
and E Z 4 = 3. Consequently
T
σ 4 (x)T 2
Z
1 n
P T g(x) − P̄ g(x) = E((g”σ 2 )(Xsx ) − (g”σ 2 )(x))ds + cn (g). (7.26)
n 2 0 4!n2
7.6. WEAK ERROR FOR THE EULER SCHEME (I) 233
so that s
sup E((g”σ 2 )(Xsx ) − (g”σ 2 )(x)) ≤ kγ”σ 2 ksup .
∀ s ≥ 0,
x∈R 2
Elementary computations show that
where Cσ depends on kσ (k) k∞ , k = 0, 1, 2 but not on g (with the standard convention σ (0) = σ).
Consequently, we derive from (7.26) that
2
0 (k) T
|P T (g)(x) − P̄ (g)(x)| ≤ Cσ,T max kg k∞ .
n k=2,3,4 n
The fact that the first derivative g 0 is not involved in these bounds is simply the consequence
of our artificial assumption that b ≡ 0.
Now we pass to the second task. In order to plug this estimate in (7.25), we need now to control
the first four derivatives of P̄ ` f , ` = 1, . . . , n, uniformly with respect to k and n. In fact we do
not directly need to control the first derivative since b ≡ 0 but we will do it as a first example to
illustrate the method in a simpler case.
Let us consider again the generic function g and its four bounded derivatives.
r ! r !!
T T
(P̄ g)0 (x) = E g 0 x + σ(x) Z 1 + σ 0 (x) Z .
n n
so that
r
T
|(P̄ g)0 (x)| ≤ kg 0 k∞
1 + σ 0 (x) Z
n
r
1
T
≤ kg 0 k∞
1 + σ 0 (x) Z
n
2
v !2
u r
0
u
0
T
= kg k∞ E 1 + σ (x)
t Z
n
v !
u r
u T T
= kg 0 k∞ tE 1 + 2σ 0 (x) Z + σ 0 (x)2 Z 2
n n
r
T
= kg 0 k∞ 1 + (σ 0 )2 (x)
n
T
≤ kg 0 k∞ 1 + σ 0 (x)2
2n
234 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
√
since 1 + u ≤ 1 + u2 , u ≥ 0. Hence, we derive by induction that, for every n ≥ 1 and every
` ∈ {1, . . . , n},
0 2 T /2
∀ x ∈ R, |(P̄ ` f )0 (x)| ≤ kf 0 k∞ (1 + (σ 0 )2 (x)T /(2n))` ≤ kf 0 k∞ eσ (x) .
(where we used that (1 + u)` ≤ e`u , u ≥ 0). This yields
0 2
k(P̄ ` f )0 ksup ≤ kf 0 k∞ ekσ k∞ T /2 .
Let us deal now with the second derivative,
r ! r !!
d 0 T 0 T
(P̄ g)”(x) = E g x + σ(x) Z 1 + σ (x) Z
dx n n
r ! r !2 r ! r !
T T T T
= E g” x + σ(x) Z 1 + σ 0 (x) Z + E g0 x + σ(x) Z σ”(x) Z .
n n n n
Then ! !2
r r
E g” x + σ(x) T Z 0 T 0 2T
1 + σ (x) Z ≤ kg”k∞ 1 + σ (x) n
n n
q q
and, using that g 0 x + σ(x) Tn Z = g 0 (x) + g”(ζ)σ(x) Tn Z owing to the fundamental formula
of Calculus, we get
r ! r !
T T T
E g 0 x + σ(x) Z σ”(x) Z ≤ kg”k∞ kσσ”k∞ E(Z 2 )
n n n
so that
T
∀ x ∈ R, |(P̄ g)”(x)| ≤ kg”k∞ 1 + (2kσσ”k∞ + k(σ 0 )2 k∞ )
n
which implies the boundedness of |(P̄ ` f )”(x)|, ` = 0, . . . , n − 1, n ≥ 1.
The same reasoning yields the boundedness of all derivatives (P̄ ` f )(i) , i = 1, 2, 3, 4, ` = 1, . . . , n,
n ≥ 1.
Now we can combine our local error bound with the control of the derivatives. Plugging these
estimates in each term of (7.25), finally yields
n
0
X T2 T
|E f (XTx ) − E f (X̄Tx )| ≤ Cσ,T max k(P̄ ` f )(i) ksup ≤ Cσ,T,f T
1≤`≤n,i=1,...,4 n2 n
k=1
Exercises. 1. Complete the above proof by inspecting the case of higher order derivatives
(k = 3, 4).
2. Extend the proof to a (bounded) non zero drift b.
If one assumes more regularity on the coefficients or some uniform ellipticity on the diffusion
coefficient σ it is possible to obtain an expansion of the error at any order.
7.6. WEAK ERROR FOR THE EULER SCHEME (I) 235
Theorem 7.8 (a) Talay-Tubaro’s Theorem (see [152]). Assume b and σ are infinitely differen-
tiable with bounded partial derivatives. Assume f : Rd → R is infinitely differentiable with partial
derivative having polynomial growth. Then, for every integer R ∈ N∗ ,
R
X ck
x
(ER+1 ) ≡ E f (XT ) − E f (X̄T x,n
)= + O(n−(R+1) ) as n → +∞, (7.27)
nk
k=1
then the conclusion of (a) holds true for any bounded Borel function.
One method of proof for (a) is to rely on the P DE method i.e. considering the solution of the
parabolic equation
∂
+ L (u)(t, x) = 0, u(T, . ) = f
∂t
where
1
(Lg)(x) = g 0 (x)b(x) + g”(x)σ 2 (x)
2
denotes the infinitesimal generator of the diffusion. It follows from the Feynmann-Kac formula that
(under some appropriate regularity assumptions)
u(0, x) = E f (XTx ).
Formally (in one dimension), one assumes that u is regular enough to apply Itô’s formula so that
Z T Z T
x x ∂
f (XT ) = u(T, XT ) = u(0, x) + + L (u)(t, Xtx )dt + ∂x u(t, Xtx )σ(Xtx )dWt .
0 ∂t 0
Taking expectation (and assuming that the local martingale is a true martingale. . . ) and using
that u is solution to the above parabolic P DE yields the announced result.
Then, one uses a domino strategy based on the Euler scheme as follows
The core of the proof consists in applying Itô’s formula (to u, b and σ) to show that
E φ(tnk , Xtxnk )
E u(tnk , X̄txnk ) − u(tnk−1 , X̄tnk−1 ) = + o(n−2 ).
n2
236 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
for some continuous function φ. Then, one derives (after new computations) that
Remark. The last important information about weak error is that the weak error induced by
the Milstein scheme has exactly the same order as that of the Euler scheme i.e. O(1/n). So the
Milstein scheme seems of little interest as long as one wishes to compute E f (XT ) with a reasonable
framework like the ones described in the theorem, since, even when it can be implemented without
restriction, its complexity is higher than that of the standard Euler scheme. Furthermore, the next
paragraph about Richardson-Romberg extrapolation will show that it is possible to take advantage
of the higher order time discretization error expansion which become dramatically faster than
Milstein scheme.
1 c2
E(f (XT )) = E(2f (X̄T(2) ) − f (X̄T(1) )) − + O(n−3 ).
2 n2
Then, the new global (squared) quadratic error becomes
M
2 c 2 Var(2f (X̄ (2) ) − f (X̄ (1) ))
1 X 2
E(f (XT )) − 2f ((X̄T(2) )m ) − f ((X̄T(1) )m )
= + T T
+ O(n−5 ).
M 2 2n2 M
m=1
(7.29)
The structure of this quadratic error suggests the following natural question:
Is it possible to reduce the (asymptotic) time discretization error without increasing the Monte
Carlo error (at least asymptotically in n. . . )?
Or, put differently, to what extent is it possible to control the variance term Var(2f (X̄T(2) ) −
f (X̄T(1) ))?
– Lazy simulation. If one adopts a somewhat “lazy” approach by using the pseudo-random
number generator in a purely sequentially to simulate the two Euler schemes, this corresponds from
(1) (2)
a theoretical point of view to consider independent Gaussian white noises (Uk )k and (Uk )k to
simulate the Brownian increments in bot schemes or equivalently to assume that W (1) and W (2)
are two independent Brownian motions. Then
n→+∞
Var(2f (X̄T(2) ) − f (X̄T(1) )) = 4Var(f (X̄T(2) )) + Var(f (X̄T(1) )) = 5 Var(f (X̄T )) −→ 5 Var(f (X̄T )).
In this approach the cost of one order on the time discretization error (switch from n−1 to n−2 )
is the increase of the variance by a factor 5 and of the complexity by a factor (approximately) 3.
Exercise. Show that if X and Y ∈ L2 (Ω, A, P) have the same distribution. Then, for every
α ∈ [1, +∞)
Var(αX + (1 − α)Y ) ≥ Var(X).
M ∝ n2R .
For details we refer to [123]. The practical limitation of these results about Richardson-Romberg
extrapolation is that the control of the variance is only asymptotic (as n → +∞) whereas the
method is usually implemented for small values of n. However it is efficient up to R = 4 for Monte
Carlo simulation of sizes M = 106 to M = 108 .
7.8. FURTHER PROOFS AND RESULTS 239
Exercise: We consider the simplest option pricing model, the (risk-neutral) Black-Scholes dy-
namics, but with unusually high volatility. (We are aware that this model used to price Call options
does not fulfill the theoretical assumptions made above). To be precise
kT
where tk = n , k = 0, . . . , n. We want to price a vanilla Call option i.e. to compute
using a Monte Carlo simulation with M sample paths, M = 104 , M = 106 , etc.
Proof. If p = 1 the inequality is obvious. Assume now p ∈ (1, ∞). Let T ∈ (0, ∞) and let Y be a non-
negative random variable defined on the same probability space as (Xt )t∈[0,T ] . Let M > 0. It follows from
Fubini’s Theorem and Hölder Inequality that
!
Z T Z T
E (Xs ∧ M )ds Y = E((Xs ∧ M )Y )ds
0 0
Z T
p
≤ kXs ∧ M kp kY kq ds where q = p−1
0
Z T
= kY kq kXs ∧ M kp ds.
0
!p−1
Z T
The above inequality applied with Y := Xs ∧ M ds where M is a positive real number yields
0
!p !p !1− p1 Z
Z T Z T +∞
E Xs ∧ M ds ≤ E Xs ∧ M ds kXs kp ds.
0 0 0
R p
T
If E Xs ∧ Mn ds = 0 for any sequence Mn ↑ +∞, the inequality is obvious since, by Beppo Levi’s
0
RT
monotone convergence Theorem, 0 Xs ds = 0 P-a.s.. Otherwise, there is a sequence Mn ↑ ∞ such that all
these integrals are non zero (and finite since X is bounded by M and T is finite). Consequently, one can
divide both sides of the former inequality to obtain
!p ! p1
Z T Z +∞
∀ n ≥ 1, E Xs ∧ Mn ds ≤ kXs kp ds.
0 0
Now letting Mn ↑ +∞ yields exactly the expected result owing to two successive applications of Beppo
Levi’s monotone convergence Theorem, the first with respect to the Lebesgue measure ds, the second with
respect to d P. When T = +∞, one concludes by Fatou’s Lemma by letting T go to infinity in the inequality
obtained for finite T . ♦
Burkhölder-Davis-Gundy Inequality: For every p ∈ (0, ∞), there exists two positive real constants
cBDG
p and CpBDG such that for every non-negative continuous local martingale (Xt )t∈[0,T ] null at 0,
p
p
BDG
cp
hXiT
≤
sup |Xt |
≤ CpBDG
hXiT
.
p
t∈[0,T ]
p
p
For a detailed proof based on a stochastic calculus approach, we refer to [141], p.160.
then, every strong solution of Equation (7.1) starting from the finite random vector X0 (if any), satisfies
0
sup |Xs |
≤ 2 eκp CT 1 + kX0 kp .
∀ p ∈ (0, ∞),
s∈[0,T ]
p
7.8. FURTHER PROOFS AND RESULTS 241
T
(b) The same conclusion holds under the same assumptions for the continuous Euler schemes with step n,
n ≥ 1, as defined by (7.5) with the same constant κ0p (which does not depend n) i.e.
0
sup |X̄s |
≤ 2 eκp CT 1 + kX0 kp .
n
∀ p ∈ (0, ∞), ∀ n ≥ 1,
s∈[0,T ]
p
Remarks. • Note that this proposition makes no assumption neither on the existence of strong solutions
to (7.1) nor on some (strong) uniqueness assumption on a time interval or the whole real line. Furthermore,
/ Lp (P).
the inequality is meaningless when X0 ∈
• The case p ∈ (0, 2) will be discussed at the end of the proof.
τ
sup |Xt N | ≤ N + |X0 |
t∈[0,T ]
so that the non-decreasing function fN defined by fN (t) :=
sup |Xs∧τN |
, t ∈ [0, T ], is bounded by
s∈[0,t] p
It follows from successive applications of both the regular and the generalized Minkowski Inequalities and
of the BDG Inequality that
s
Z t
Z t∧τN
BDG
fN (t) ≤ kX0 kp + k1{s≤τN } b(s, Xs )kp ds + Cp 2
σ(s, Xs ) ds
0
0
p
s
Z t
Z t
kb(s ∧ τN , Xs∧τN )kp ds + CpBDG
≤ kX0 kp + σ(s ∧ τ , X ) 2 ds
N s∧τN
0
0
p
s
Z t
Z t
(1 + kXs∧τN kp )ds + CpBDG C
≤ kX0 kp +C
(1 + |Xs∧τN |)2 ds
0
0
p
s
√
Z t
Z t
(1 + kXs∧τN kp )ds + CpBDG C
≤ kX0 kp +C 2
t+ Xs∧τ N
ds
0
0
p
2
This holds true for any hitting time of an open set by an Ft -adapted càd process.
242 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
where we used in the last line that the Minkowski inequality on L2 [[0, T ], dt) endowed with its usual Hilbert
√ √ 1
norm. Hence, the Lp (P)-Minkowski Inequality and the obvious identity k .kp = k .k 2p yield
2
21
t √
Z
Z t
(1 + kXs∧τN kp )ds + CpBDG C t +
2
fN (t) ≤ kX0 kp + C
Xs∧τ N
ds
.
0 0 p
2
p p
Now the generalized L 2 (P)-Minkowski’s Inequality (7.30) (use 2 ≥ 1) yields
21 !
t √ t
Z Z
CpBDG C
2
fN (t) ≤ kX0 kp + C (1 + kXs∧τN kp )ds + t+
Xs∧τ
p ds
N
2
0 0
21 !
t √ t
Z Z
2
= kX0 kp + C (1 + kXs∧τN kp )ds + CpBDG C t+ kXs∧τN kp ds .
0 0
where √
ψ(t) = kX0 kp + C(t + CpBDG t).
Step 2: “À la Gronwall” Lemma.
Lemma 7.4 Let f : [0, T ] → R+ and ψ : [0, T ] → R+ be two non-negative non-decreasing functions
satisfying
Z t Z t 12
2
∀ t ∈ [0, T ], f (t) ≤ A f (s)ds + B f (s)ds + ψ(t)
0 0
where A, B are two positive real constants. Then
2
f (t) ≤ 2 e(2A+B )t ψ(t).
∀ t ∈ [0, T ],
√
Proof. First, it follows from the elementary inequality x y ≤ 12 (x/B + By), x, y ≥ 0, B > 0, that
t 21 Z t 21
f (t) B t
Z Z
2
f (s)ds ≤ f (t) f (s)ds ≤ + f (s)ds.
0 0 2B 2 0
Plugging this in the original inequality yields
Z t
f (t) ≤ (2A + B 2 ) f (s)ds + 2 ψ(t).
0
Step 3: Applying the above generalized Gronwall’s Lemma to the functions fN and ψ defined in Step 1,
leads to
BDG 2
√
∀ t ∈ [0, T ],
sup |Xs∧τN |
= fN (t) ≤ 2 e(2+(Cp ) )Ct kX0 kp + C(t + CpBDG t) .
s∈[0,t] p
The sequence of stopping times τN is non-decreasing and converges toward τ∞ taking values in [0, T ] ∪ {∞}.
On the event {τ∞ ≤ T }, |XτN − X0 | ≥ N so that |Xτ∞ − X0 | = limN →+∞ |XτN − X0 | = +∞ since Xt
7.8. FURTHER PROOFS AND RESULTS 243
has continuous paths. This is a.s. impossible since the process (Xt )t≥0 has continuous paths on [0, T ]. As a
consequence τ∞ = ∞ a.s. which in turn implies that
lim sup |Xs∧τN | = sup |Xs | a.s.
N s∈[0,t] s∈[0,t]
√
which finally yields, using that max( u, u) ≤ eu , u ≥ 0,
BDG 2
BDGp 2
∀ t ∈ [0, T ],
sup |Xs |
≤ 2 e(2+(Cp ) )Ct kX0 kp + eCt + e(C ) t
.
s∈[0,t] p
One derives the existence of a positive real constant κ0p > 0, only depending on p, such that
0
∀ t ∈ [0, T ],
sup |Xs |
≤ 2eκp Ct (1 + kX0 kp ).
s∈[0,t] p
Step 4 (p ∈ (0, 2)). The extension can be carried out as follows: for every x ∈ Rd , the diffusion process
starting at x, denoted (X̄tn,x )t∈[0,T ] , satisfies the following two obvious facts:
NP
– the process X x is FtW -adapted where FtW := σ(Ws , s ≤ t) .
d
– If X0 is an R -valued random vector defined on (Ω, A, P), independent of W , then the process X =
(Xt )t∈[0,T ] starting from X0 satisfies
Xt = XtX0 .
Consequently, using that p 7→ k . kp is non-decreasing, it follows that
0
sup |Xsx |
≤
sup |Xs |
≤ 2 eκ2 Ct (1 + |x|).
s∈[0,t] p s∈[0,t] 2
Now ! Z !
0
E sup |Xt |p = PX0 (dx)E sup |Xtx |p ≤ 2(p−1)+ 2p epκ2 CT (1 + E|X0 |p )
t∈[0,T ] R d t∈[0,T ]
p (p−1)+ p p
(where we used that (u + v) ≤ 2 (u + v ), u, v ≥ 0) so that
1 0 1
sup |Xt |
≤ 2(1− p )+ 2 eκ2 CT 2( p −1)+ 1 + kX0 kp
t∈[0,T ] p
1 0
2|1− p | 2 eκ2 CT 1 + kX0 kp
=
As concerns the SDE (7.1) itself, the same reasoning can be carried out only if (7.1) satisfies an existence
and uniqueness assumption for any starting value X0 .
(b) (Euler scheme) The proof follows the same lines as above. One starts from the integral form (7.6) of the
continuous Euler scheme and one introduces for every n, N ≥ 1 the stopping times
n
τ̄N = τ̄N := inf{t ∈ [0, T ] | |X̄tn − X0 | > N }.
At this stage, one notes that b (as well as σ 2 ) satisfy this type of inequality
!
∀ s ∈ [0, t], k1{s≤τ̄N } b(s, X̄s )kp ≤ C 1+
sup |X̄s |
.
s∈[0,t∧τ̄N ] p
which makes possible to reproduce formally the above proof to the continuous Euler scheme. ♦
244 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
RT
where H and G are (Ft )-progressively measurable satisfying 0
|Gs | + Hs2 ds < +∞ a.s.
(a) For every p ≥ 2,
1
∀ s, t ∈ [0, T ], kYt − Ys kp ≤ CpBDG sup kHt kp |t − s| 2 + sup kGt kp |t − s|
t∈[0,T ] t∈[0,T ]
√ 1
≤ (CpBDG sup kHt kp + T sup kGt kp )|t − s| 2 .
t∈[0,T ] t∈[0,T ]
1
In particular if supt∈[0,T ] kHt kp + supt∈[0,T ] kGt kp < +∞, the process t 7→ Yt is Hölder with exponent 2 from
[0, T ] into Lp (P).
(b) If p ∈ [1, 2), then
1
∀ s, t ∈ [0, T ], kYt − Ys kp ≤ CpBDG k sup |Ht |kp |t − s| 2 + sup kGt kp |t − s|
t∈[0,T ] t∈[0,T ]
√ 1
≤ (CpBDG k sup |Ht |kp + T sup kGt kp )|t − s| 2 .
t∈[0,T ] t∈[0,T ]
Proof. (a) Let 0 ≤ s ≤ t ≤ T . It follows from the standard and generalized Minkowski Inequalities and the
R s+u
BDG Inequality (applied to the continuous local martingale ( s Hr dWr )u≥0 ) that
Z t
Z t
kYt − Ys kp ≤
|G u |du
+
H u dW u
s p s p
s
Z t
Z t
BDG
≤ kGu kp du + Cp 2
Hu du
s
s
p
Z t
12
BDG
2
≤ sup kGt kp (t − s) + Cp
Hu du
t∈[0,T ] s p/2
1 1
≤ sup kGt kp (t − s) + CpBDG sup kHu2 kp/2
2
(t − s) 2
t∈[0,T ] u∈[0,T ]
1
= sup kGt kp (t − s) + CpBDG sup kHu kp (t − s) 2 .
t∈[0,T ] u∈[0,T ]
√ 1
The second inequality simply follows from |t − s| ≤ T |t − s| 2 .
(b) If p ∈ [1, 2], one simply uses that
Z t
21
12
1
1
2 2
Hu du
≤ |t − s|
sup Hu
= |t − s|
sup |Hu |
.
2
2
u∈[0,T ]
u∈[0,T ]
s p/2
p/2 p
Remark : If H, G and Y are defined on the whole real line R+ and sup (kGu kp + kHu kp ) < +∞ then
t∈R+
t 7→ Yt est locally 1/2-Hölder on R+ . If H = 0, the process is in fact Lipschitz continuous on [0, T ].
Combining the above result for Itô processes with those of Proposition 7.5 leads to the following result
on pathwise regularity of the diffusion solution to (7.1) (when it does exists) and the related Euler schemes.
Proposition 7.6 If the coefficients b and σ satisfy the linear growth assumption (7.31) over [0, T ] × Rd
with a real constant C > 0, then the Euler scheme with step T /n and any strong solution of (7.1) satisfy for
every p ≥ 1,
√ 1
kXt − Xs kp + kX̄tn − X̄sn kp ≤ κ”p Ceκ”p CT (1 + T ) 1 + kX0 kp |t − s| 2 .
∀ n ≥ 1, ∀ s, t ∈ [0, T ],
where κ”p ∈ (0, ∞) is real constant only depending on p (increasing in p).
Proof. As concerns the process X, this is a straightforward consequence of the above Lemma 7.5 by setting
Gt = b(t, Xt ) and Ht = σ(t, Xt )
since
max(k sup |Gt |kp , k sup |Ht |kp ) ≤ C(1 + k sup |Xt |kp ).
t∈[0,T ] t∈[0,T ] t∈[0,T ]
so that Z t Z s
sup |εs | ≤ |(b(s, Xs ) − b(s, X̄s )|ds + sup (σ(u, Xu ) − σ(u, X̄u ))dWu .
s∈[0,t] 0 s∈[0,t] 0
It follows from regular and generalized Minkowski Inequalities and BDG Inequality that
Z t
Z
t 12
BDG
2
f (t) ≤ kb(s, Xs ) − b(s, X̄s )kp ds + Cp σ(s, Xs ) − σ(s, X̄s ) ds
0
0
p
Z t
Z t
12
BDG
2
= kb(s, Xs ) − b(s, X̄s )kp ds + Cp
(σ(s, Xs ) − σ(s, X̄s )) ds
p
0 0 2
Z t Z t 21
≤ kb(s, Xs ) − b(s, X̄s )kp ds + CpBDG 2
kσ(s, Xs ) − σ(s, X̄s )kp ds
0 0
Z t Z t 21 !
β
≤ Cb,σ ((s − s) + kXs − X̄s kp )ds + CpBDG (s − s) 2β
+ kXs − X̄s k2p ds
0 0
β 21 !
√ √ Z t Z t
T
≤ Cb,σ ( t+ CpBDG ) t+ BDG
kXs − X̄s kp ds + Cp 2
kXs − X̄s kp ds
n 0 0
246 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
√ √ √
where we we used that 0 ≤ s − s ≤ Tn and the elementary inequality u + v ≤ u + v, u, v ≥ 0 and Cb,σ
denotes a positive real constant only depending on b and σ. Now, noting that
= kXs − Xs kp + kεs kp
≤ kXs − Xs kp + f (s),
it follows that
21 !
t √ t
Z Z
f (t) ≤ Cb,σ f (s)ds + 2 CpBDG 2
f (s) ds + ψ(t) (7.32)
0 0
where
β 12
√ √ Z t √ BDG
Z t
T
ψ(t) := ( t + CpBDG ) t+ kXs − Xs kp ds + 2 Cp 2
kXs − Xs kp ds . (7.33)
n 0 0
Now, we will use the path regularity of the diffusion process X in Lp (P) obtained in Proposition 7.6 to
provide an upper-bound for the function ψ. We first note that since b and σ satisfy (HTβ ) with a positive
real constant Cb,σ , they satisfy the linear growth assumption yields (7.31) with
0
Cb,σ,T := Cb,σ + sup (|b(t, 0)| + |σ(t, 0)|) < +∞
t∈[0,T ]
(b(., 0) and σ(., 0) are β-Hölder hence bounded on [0, T ]). It follows from (7.33) and Proposition 7.6 that
β 12
√ T √ 0 0 √ T √ √
( t + CpBDG ) κ”p Cb,σ t
(t + 2CpBDG t)
ψ(t) ≤ t + κ”p Cb,σ e 1 + kX0 kp (1 + t)
n n
β
T √ 0 0
≤ CpBDG (1 + t) + 2(1 + 2CpBDG )(1 + t)2 κ”p Cb,σ eκ”p Cb,σ t 1 + kX0 kp
n
√ √ √
where we used the elementary inequality u ≤ 1 + u which implies that ( t + CpBDG ) t ≤ CpBDG (1 + t)
√ √ BDG √ √ BDG
and (1 + t)(t + 2Cp t) ≤ 2(1 + 2Cp )(1 + t)2 .
Hence, there exists a real constant κ̃p > 0 such that
β
T 0 0
eκ̃p Cb,σ t 1 + kX0 kp (1 + t)2
ψ(t) ≤ κ̃p (1 + t) + κ̃p Cb,σ
n
β
T 0 0
≤ κ̃p et eκ̃p (1+Cb,σ )t 1 + kX0 kp .
+ 2κ̃p Cb,σ
n
since 1 + u ≤ eu and (1 + u)2 ≤ 2eu , u ≥ 0. Finally one plugs this bound in (7.34) (at time T ) to get the
announced upper-bound by setting the real constant κp at an appropriate value.
Step 2 (p ∈ (0, 2)): It remains to deal with the case p ∈ [1, 2). In fact, once noticed that Assumption
(HTβ ) ensures global existence and uniqueness of the solution X of (7.1) starting from a given random
variable X0 (independent of W ), it can be solved following the approach developed in Step 4 of the proof of
Proposition 7.5. We leave the details to the reader. ♦
7.8. FURTHER PROOFS AND RESULTS 247
Corollary 7.3 (Lipschitz continuous framework) If b and σ satisfy Condition (HT1 ) i.e.
∀ s, t ∈ [0, T ], ∀ x, y ∈ Rd , |b(s, x) − b(t, y)| + |σ(s, x) − σ(t, y)| ≤ Cb,σ,T (|t − s| + |x − y|)
It has been seen in Section 7.2.1 that when X = W , a log n factor comes out in the error rate. One must
again have in mind that this question is quite crucial since, at least in higher dimension, since the simulation
of the continuous Euler scheme may rise difficult problems whereas the simulation of the stepwise constant
Euler scheme remains straightforward in any dimension (provided b and σ are known).
Proof of Theorem 7.2(b). Step 1 (Deterministic X0 ): We first assume that X0 = x. Then, one may
assume without loss of generality that p ∈ [1, ∞). Then
Z t Z t
= b(s, X̄s )ds + σ(s, X̄s )dWs .
t t
On the other hand, using the extended Hölder Inequality: for every p ∈ (0, ∞),
1 1
∀ r, s ≥ 1, + = 1, kf gkp ≤ kf krp kgksp ,
r s
with r = 1 + η and s = 1 + 1/η, η > 0, leads to
sup |σ(t, X̄t )(Wt − Wt )|
≤
sup |σ(t, X̄t )| sup |Wt − Wt |
t∈[0,T ]
t∈[0,T ] t∈[0,T ]
p p
≤
sup |σ(t, X̄t )|
sup |Wt − Wt |
t∈[0,T ]
t∈[0,T ]
p(1+η) p(1+1/η)
248 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
(for convenience we set η = 1 in what follows). Now, like for the drift b, one has
0
sup |σ(t, X̄t )|
≤ 2eκ2p Cb,σ,T T (1 + |x|).
t∈[0,T ]
p(1+η)
owing to (7.12) in Section 7.2.1. Finally, plugging these estimates in (7.35), yields
T 0
sup |X̄tn − X
≤ 2 eκp Cb,σ,T T (1 + |x|)
etn |
t∈[0,T ]
n
p
r
κ02p Cb,σ,T T T
+2e (1 + |x|) × CW,2p (1 + log n),
n
r !
κ02p Cb,σ,T T T T
≤ 2(CW,2p + 1)e (1 + |x|) (1 + log n) + .
n n
q q
One concludes by noting that Tn 1 + log n + Tn ≤ 2 Tn 1 + log n for every integer n ≥ 1 and by setting
7.8.6 Application to the a.s.-convergence of the Euler schemes and its rate.
One can derive from the above Lp -rate of convergence an a.s.-convergence result. The main result is given
in the following theorem (which extends Theorem 7.3 stated in the homogeneous Lipschitz continuous con-
tinuous case).
Theorem 7.9 If (HTβ ) holds and if X0 is a.s. finite, the continuous Euler scheme X̄ n = (X̄tn )t∈[0,T ] a.s.
converges toward the diffusion X for the sup-norm over [0, T ]. Furthermore, for every α ∈ [0, β ∧ 21 ),
a.s.
nα sup |Xt − X̄tn | −→ 0.
t∈[0,T ]
The same convergence rate holds with the stepwise constant Euler scheme (X̃tn )t∈[0,T ] .
Proof: We make no a priori integrability assumption on X0 . We rely on the localization principle (at the
(N ) X0
origin). Let N > 1; set X0 := X0 1{|X0 |≤N } + N |X0|
1{|X0 |>N } . Stochastic integration being a local
(N ) (N )
operator, the solutions (Xt )t∈[0,T ] and (Xt )t∈[0,T ] of the SDE (7.1) are equal on {X0 = X0 }, namely
7.8. FURTHER PROOFS AND RESULTS 249
on {|X0 | ≤ N }. The same property is obvious for the Euler schemes X̄ n and X̄ n,(N ) starting from X0 and
(N )
X0 respectively. For a fixed N , we know from Theorem 7.2 (a) that, for every p ≥ 1,
p(β∧ 21 )
(N ) (N )
T
∃ Cp,b,σ,β,T > 0 such that, ∀ n ≥ 1, E sup |X̄t − Xt |p ≤ Cp,b,σ,β,T (1 + kX0 kp )p .
t∈[0,T ] n
In particular
E 1{|X0 |≤N } sup |X̄tn,(N ) − Xt |p = E 1{|X0 |≤N } sup |X̄tn,(N ) − Xt(N ) |p
t∈[0,T ] t∈[0,T ]
n,(N ) (N ) p
≤ E sup |X̄t − Xt |
t∈[0,T ]
p(β∧ 21 )
(N ) T
≤ Cp,b,σ,β,T (1 + kX0 kp )p .
n
1
X 1
On chooses p > β∧ 12
,so that 1 < +∞. Consequently Beppo Levi’s Theorem for series with
n≥1
np(β∧ 2 )
non-negative terms implies
X
E 1{|X0 |≤N } sup |X̄tn − Xt |p < +∞.
n≥1 t∈[0,T ]
Hence
a.s.
X [
sup |X̄tn − Xt |p < ∞, P a.s. on {|X0 | ≤ N } = {X0 ∈ Rd } = Ω
n≥1 t∈[0,T ] n≥1
n→∞
so that sup |X̄tn − Xt | −→ 0 P a.s.
t∈[0,T ]
One can improve the above approach to get a much more powerful result: let α ∈]0, β∧ 21 [, and p > 1
β∧ 12 −α
,
p(β∧ 12 )
pα
(N ) (N )
(N ) T
n E sup |X̄t − Xt |p ≤ Cp,b,σ,β,T (1 + kX0 kp )p npα
t∈[0,t] n
(N ) 1
0
= Cp,b,σ,β,T (1 + kX0 kp )p n−p(β∧ 2 −α)
Remarks and comments. • The above rate result strongly suggests that the critical index for the a.s. rate
of convergence is β ∧ 12 . The question is then: what happens when α = β ∧ 12 ? It is shown in [88, 74] that
√ L
(when β = 1), n(Xt − X̄t ) −→ Ξt where Ξ = (Ξt )t∈[0,T ] is a diffusion process. This weak convergence holds
in a functional sense, namely for the topology of the uniform convergence on C([0, T ], Rd ). This process Ξ is
not P-a.s. ≡ 0 if σ(x) 6≡ 0., even a.s non-zero if σ never vanishes. The “weak functional” feature means first
that we consider the processes as random variables taking values in their natural path space, namely the
separable Banach space (C([0, T ], Rd ), k . ksup ). Then, one may consider the weak convergence of probability
measures defined on (the Borel σ-field of) this space (see [24] for an introduction). The connection with the
above elementary results is the following:
250 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
L
If Yn −→ Y , then P-a.s. ∀ ε > 0
(This follows either from the Skokhorod representation theorem or from a direct approach)
√
• When β = 1 (Lipschitz continuous case), one checks that, for every ε > 0, limn ( n)1+ε sup |Xt − X̄tn | =
t∈[0,T ]
+∞ P-a.s. If one changes the time scale, it is of iterated logarithm type.
t
Exercise. One considers the geometric Brownian motion Xt = e− 2 +Wt solution to
dXt = Xt dWt , X0 = 1.
k
Y
`T
X̄tnk = 1 + ∆Wtn` where tn` = n , ∆Wtn` = Wtn` − Wtn`−1 , ` ≥ 1.
`=1
2
Theorem 7.10 There exists a function κ03 : R∗+ → R+ such that if the coefficient b and σ of (7.1) satisfy
Assumption (HTβ ) for a real constant C > 0, then the unique strong solution (Xtx )t∈[0,T ] starting from x ∈ Rd
sur [0, T ] and the continuous Euler scheme (X̄ n,x )t∈[0,T ] satisfy
0
y n,x n,y
∀ x, y ∈ Rd , ∀ n ≥ 1
sup |Xtx − Xt |
+
sup |X̄t − X̄t |
≤ κ03 (p, C)eκ3 (p,C)T |x − y|.
t∈[0,T ]
t∈[0,T ]
p p
Proof: We focus on the diffusion process (Xt )t∈[0,T ] . First note that if the above bound holds for some
p > 0 then it holds true for any p0 ∈ (0, p) since the k . kp -norm is non-decreasing in p. Starting from
Z t Z t
Xtx − Xty (b(s, Xsx ) − b(s, Xsy ))ds + σ(s, Xsx ) − σ(s, Xsy ) dWs
= (x − y) +
0 0
one gets
Z t Z s
sup |Xsx − Xsy | |b(s, Xsx ) − b(s, Xsy )|ds + sup σ(u, Xux ) − σ(u, Xuy ) dWu .
≤ |x − y| +
s∈[0,t] 0 s∈[0,t] 0
7.8. FURTHER PROOFS AND RESULTS 251
Then setting for every p ≥ 2, f (t) :=
sup |Xsx − Xsy |
, it follows from the generalized Minkowski and
s∈[0,t] p
Z t
Z t
21
x y BDG x y 2
≤ |x − y| + C kXs − Xs kp ds + Cp C
|Xs − Xs | ds
0 0 p
2
Z t Z t 21
2
≤ |x − y| + C kXsx − Xsy kp ds + CpBDG C kXsx − Xsy k ds .
0 0
7.8.8 Strong error rate for the Milstein scheme: proof of Theorem 7.5
In this section, we prove Theorem 7.5 i.e. the scalar case d = q = 1. Throughout this section Cb,σ,p,T and
Kb,σ,p,T are positive real constants that may vary from line to line.
First we note that the (interpolated) continuous Milstein scheme as defined by (7.21) can be written in
an integral form as follows
Z t Z t Z tZ s
Xetmil = x + esmil )ds +
b(X esmil )dWs +
σ(X (σσ 0 )(X
eumil )dWu dWs (7.36)
0 0 0 s
with our usual notation t (note that u = s on [s, s]). For notational convenience, we will also drop throughout
this section the superscript mil since we will deal exclusively with the Milstein scheme.
(a) Step 1 (Moment control): Our first aim is to prove that the Milstein scheme has uniformly controlled
moments at any order, namely that, for every p ∈ (0, ∞), there exists s a real constant Cp,b,σ,T > 0 such that
∀ n ≥ 1, sup
sup |X
etn |
< Cb,σ,T (1 + kX0 kp ). (7.37)
n≥1 t∈[0,T ] p
Set Z s
Hs = σ(X
es ) + (σσ 0 )(X es ) + (σσ 0 )(X
eu )dWu = σ(X es )(Ws − Ws )
s
so that Z t Z t
X
et = X0 + es )ds +
b(X Hs dWs .
0 0
It follows from the boundedness of b0 and σ 0 that b and σ satisfy a linear growth assumption.
We will follow the lines of the proof of Proposition 7.5, the specificity of the Milstein framework being
that the diffusion coefficient is replace by the process Hs . So, our task is to control the term
Z s∧τ̃N
sup Hu dWu
s∈[0,t] 0
252 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
Finally, following the lines of the step 1 of the proof of Proposition 7.5 leads to
! 21
Z s∧τ̃N
√ Z t
sup Hu dWu
≤ Cb,σ,T CpBDG t + k sup |X eu |k2 ds .
p
s∈[0,t] 0 p 0 u∈[0,s∧τ̃N ]
One concludes still following the lines of the proof Proposition 7.5 (including the step 4 to deal with the
case p ∈ (0, 2)).
Furthermore, as a by-product we get that, for every p > 0 and every n ≥ 1,
sup |Ht |
≤ Kb,σ,T,p 1 + kX0 kp < +∞ (7.38)
t∈[0,T ] p
where κb,σ,T,p does not depend on the discretization step n. As a matter of fact, this follows from
sup |Ht | ≤ Cb,σ 1 + sup |X etn | 1 + 2 sup |Wt |
t∈[0,T ] t∈[0,T ] t∈[0,T ]
Now, by Lemma 7.5 devoted to the Lp -regularity of Itô processes, one derives the existence of a real
constant Kb,σ,p,T ∈ (0, ∞) (not depending on n ≥ 1) such that
1
n n
T 2
∀ t ∈ [0, T ], ∀ n ≥ 1, kXt − Xt kp ≤ Kb,σ,p,T 1 + kX0 kp
e e . (7.39)
n
Step 2 (Decomposition and analysis of the error, p ∈ [2, +∞), X0 = x ∈ Rd ): Set εt := Xt − Xet , t ∈ [0, T ],
and, for every p ∈ [1, +∞),
f (t) := k sup |εs |kp , t ∈ [0, T ].
s∈[0,t]
Using the diffusion equation and the continuous Milstein scheme one gets
Z t Z t Z tZ s
εt = (b(Xs ) − b(Xs ))ds +
e (σ(Xs ) − σ(Xs ))dWs −
e (σσ 0 )(X
eu )dWu dWs
0 0 0 s
Z t Z t
= (b(Xs ) − b(Xes ))ds + (σ(Xs ) − σ(X es ))dWs
0 0
Z t Z t
+ es ) − b(X
(b(X es ))ds + σ(X es ) − σ(X es ) − σσ 0 (X
es )(Ws − Ws ) dWs .
0 0
First one derives that
Z t Z s
0
sup |εs | ≤ kb ksup sup |εu |ds + sup (σ(Xu ) − σ(Xu ))dWu
e
s∈[0,t] 0 u∈[0,s] s∈[0,t] 0
Z u
+ sup eu ) − b(X
b(X eu )du
u∈[0,s] 0
Z u
0
+ sup σ(Xu ) − σ(Xu ) − (σσ )(Xu )(Wu − Wu ) dWu
e e e
u∈[0,s] 0
so that, using the generalized Minkowski Inequality (twice) and the BDG Inequality, one gets classically
s
Z t Z t
0
f (t) ≤ kb ksup f (s)ds + CpBDG kσ 0 ksup
f (s)2 ds
0 0
Z s
Z s
eu ) − (σσ 0 )(X
+
sup
eu ) − b(X
b(X eu ) du
+
sup
σ(Xeu ) − σ(X eu )(Wu − Wu ) dWu
.
s∈[0,t] 0
s∈[0,t] 0
p p
| {z } | {z }
B C
where ρb (u) is defined by the above equation on the event {X eu 6= Xeu } and is equal to 0 otherwise. This
defines an (Fu )-adapted process, bounded by the Hölder coefficient [b0 ]αb0 of b0 . Using that for every x ∈ R,
|bb0 |(x) ≤ kb0 ksup (kb0 ksup + |b(0)|)|x| and (7.39) yields
1+α 0
b
T T 2
0 0 0
B ≤ kb ksup (kb ksup + |b(0)|)
sup |Xt |
+ [b ]αb0 Kb,σ,p,T 1 + |x|
e
t∈[0,T ] pn n
Z s Z u
+
sup b0 (X
eu ) Hv dWv du
.
s∈[0,t] 0 u p
254 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
kT kT
!
Z n
Z s Z n
Z s
Gu dWu ds = Gu dWu ds
(k−1)T (k−1)T (k−1)T
n s n n
Z kT
n kT
= − s− Gs dWs .
(k−1)T
n
n
h
(`−1)T
Likewise if t ∈ n , `T
n , then
`T
Z t Z s Z n
Gu dWu ds = (t − s)Gs dWs
(`−1)T (`−1)T
n s n
eu ) T + 1 σ(σ 0 )2 (X
σ(X eu )−(σσ 0 )(X
eu )−σ(X eu )(Wu −Wu ) = σ 0 b(X eu ) (Wu −Wu )2 −(u−u) +ρσ (s)|X eu |1+ασ0
e u −X
n 2
7.8. FURTHER PROOFS AND RESULTS 255
where ρσ (s) is an (Fu )-adapted process bounded by the Hölder coefficient [σ 0 ]ασ0 of σ 0 . Consequently for
every p ≥ 1,
σ(Xu ) − σ(X
e eu ) − (σσ 0 )(X
≤ kσ 0 b(X
eu )(Wu − Wu )
eu )kp (u − u)
p
1
+ kσ(σ 0 )2 (X eu )kp k(Wu − Wu )2 − (u − u)kp
2
+[σ 0 ]ασ0 k|X
eu − X eu |1+ασ0 k1+ασ0
p(1+α 0 )
σ
1+α2 σ0
T
≤ CpBDG Cb,σ,p,T (1 + |x|) .
n
Step 2 (Extension to p ∈ (0, 2) and random starting values X0 ): First one uses that p 7→ k . kp is non-
decreasing to extend the above bound to p ∈ (0, 2). Then one uses that, if X0 and W are independent, for
any non-negative functional Φ : C([0, T ], Rd )2 → R+ , one has with obvious notations
Z
EΦ(X, X) e = PX0 (dx) E Φ(X x , Xe x ).
Rd
(b) This second claim follows from the error bound established for the Brownian motion itself: as concerns
the Brownian motion, both stepwise constant and continuous versions of the Milstein and the Euler scheme
coincide. So a better convergence rate is hopeless. ♦
7.8.9 Weak error expansion for the Euler scheme by the P DE method
We make the following regularity assumption on b and σ on the one hand and on the function f on the other
hand:
k +k k +k
∞
∂ 1 2b ∂ 1 2σ
(R∞ ) ≡ b, σ ∈ C ([0, T ]×R) and ∀ k1 , k2 ∈ N, k1 +k2 ≥ 1, sup ∂tk1 ∂xk2 (t, x)+ ∂tk1 ∂xk2 (t, x) < +∞.
(t,x)∈[0,T ]×R
256 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
In particular, b et σ are Lipschitz continuous in (t, x) ∈ [0, T ] × R since, e.g., for every t, t0 ∈ [0, T ], and every
x, x0 ∈ R,
0 0
∂b 0 ∂b 0
|b(t , x ) − b(t, x)| ≤ sup ∂x (s, ξ) |x − x| +
sup ∂t (t, ξ) |t − t|.
(s,ξ)∈[0,T ]×R (s,ξ)∈[0,T ]×R
Consequently (see e.g. [27]), the SDE (7.1) always has a unique strong solution starting from any Rd -
valued random vector X0 (independent of the Brownian motion W ). Furthermore, since |b(t, x)| + |σ(t, x)| ≤
sup (|b(t, 0)|+|σ(t, 0)|)+C|x| ≤ C(1+|x|), any such strong solution (Xt )t∈[0,T ] satisfies (see Proposition 7.5):
t∈[0,T ]
The infinitesimal generator L of the diffusion is defined on every function g ∈ C 1,2 ([0, T ] × R) by
∂g 1 ∂2g
L(g)(t, x) = b(t, x) + σ 2 (t, x) 2 .
∂x 2 ∂x
Theorem 7.11 (a) If both Assumptions (R∞ ) and (F ) hold, the parabolic P DE
∂u
+ Lu = 0, u(T, x) = f (x) (7.41)
∂t
has a unique solution u ∈ C ∞ ([0, T ] × R, R). This solution satisfies
k
∂ u
sup k (t, x) ≤ Ck,T (1 + |x|r(k,T ) ).
∀ k ≥ 0,
t∈[0,T ] ∂x
where (Xsx,t )s∈[t,T ] denotes the unique solution of the SDE (7.1) starting at x at time t. If furthermore
b(t, x) = b(x) and σ(t, x) = σ(x) then
2
Notation: To alleviate notations we will use throughout this section the notations ∂x f , ∂t f , ∂xt f , etc, for
∂f ∂f 2
∂ u
the partial derivatives instead of ∂x , ∂t , ∂x∂t . . .
Exercise. Combining the above bound for the spatial partial derivatives with ∂t u = −Lu, show that
k +k
∂ 1 2u
∀ k = (k1 , k2 ) ∈ N2 , sup k1 k2 (t, x) ≤ Ck,T (1 + |x|r(k,T ) ).
t∈[0,T ] ∂t ∂x
Such a solution does exist since b and σ are Lipschitz continuous in x uniformly with respect to t and
continuous in (t, x).
For every t ∈ [0, T ],
Z T Z T
1 T 2
Z
u(T, XT ) = u(t, Xt ) + ∂t u(s, Xs )ds + ∂x u(s, Xs )dXs + ∂ u(s, Xs )dhXs i
t t 2 t x,x
Z T Z T
= u(t, Xt ) + (∂t u + Lu)(s, Xs )ds + ∂x u(s, Xs )σ(s, Xs )dWs
t t
Z T
= u(t, Xt ) + ∂x u(s, Xs )σ(s, Xs )dWs
t
Z t
since u satisfies the P DE (7.41). Now the local martingale Mt := ∂x u(s, Xs )σ(s, Xs )dWs is a true
0
martingale since
Z t
hM it = (∂x u(s, Xs ))2 σ 2 (s, Xs )ds ≤ C 1 + sup |Xt |θ ∈ L1 (P)
0 t∈[0,T ]
for an exponent θ ≥ 0. The above inequality follows from the assumptions on b and σ and the induced
growth properties of u. The integrability is a consequence of Proposition 7.31. Consequently (Mt )t∈[0,T ] is
a true martingale. As a consequence, using the assumption u(T, .) = f , one derives that
∀ t ∈ [0, T ], E f (XT ) | Ft = u(t, Xt ).
The announced representation follows from the Markov property satisfied by (Xt )t≥0 (with respect to the
filtration Ft = σ(X0 ) ∨ FtW , t ≥ 0).
One shows likewise that when b and σ do not depend on t, then (weak) uniqueness of the solutions
x,t
of (7.1) (3 ) implies that (Xt+s x
)s∈[0,T −t] and Xs∈[0,T −t] have the same distribution. ♦
Theorem 7.12 (Talay-Tubaro [152]): If (R∞ ) and (F ) hold, then there exists a non-decreasing function
Cb,σ,f : R+ → R+ such that
T
max E(f (X̄ n,x )) − (f (X x
)) ≤ Cb,σ,f (T ) .
E
0≤k≤n kT
n
kT
n n
Remark. The result also holds under some weaker smoothness assumptions on b, σ and f (say Cb5 ).
Proof. Step 1 (Representing and estimating E(f (XTx ))− E(f (X̄Tx ))): To alleviate notations we temporarily
drop the exponent x in Xtx or X̄tn,x . Noting that
We apply Itô’s formula between tk−1 and tk to the function u and use that the Euler scheme satisfies the
“frozen” SDE
dX̄tn = b(t, X̄tn )dt + σ(t, X̄tn )dWt .
3
which in turns follows from the strong existence and uniqueness for this SDE, see e.g. [141].
258 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
This yields
Z tk Z tk Z tk
1
u(tk , X̄tnk ) − u(tk−1 , X̄tnk−1 ) = ∂t u(s, X̄sn )ds + ∂x u(s, X̄sn )dX̄sn + ∂xx u(s, X̄sn )dhX̄ n is
tk−1 tk−1 2 tk−1
Z tk Z tk
= (∂t + L̄)u(s, s, X̄sn , X̄sn )ds + σ(s, X̄sn )∂x u(s, X̄sn )dWs
tk−1 tk−1
(the integrability of the integral term follows from ∂t u = −Lu which ensures the polynomial growth of
2
(∂t +L̄)u(s, s, x, x)). At this stage, the idea is to expand the above expectation into a term φ̄(s, X̄sn ) Tn +O( Tn ).
To this end we will use again Itô’s formula to ∂t u(s, X̄sn ), ∂x u(s, X̄sn ) and ∂xx u(s, X̄sn ), taking advantage of
the regularity of u.
– Term 1. The function ∂t u being C 1,2 ([0, T ] × R), Itô’s formula yields between s = tk−1 and s
Z s Z s
2 2
∂t u(s, X̄s ) = ∂t u(s, X̄s ) + ∂tt u(r, X̄r ) + L̄(∂t u)(r, X̄r ) dr + σ(r, X̄r )∂xt u(r, Xr )dWr .
s s
First let us show that the locale martingale term is the increment between s and s of a true martingale
(1) 2
(denoted (Mt ) from now on). Note that ∂t u = −L u so that ∂xt u = −∂x L u which is clearly a function
with polynomial growth in x uniformly in t ∈ [0, T ] since
∂x b(t, x)∂x u + 1 σ 2 (t, x)∂xx
2 ≤ C(1 + |x|θ0 ).
u (t, x)
2
Consequently, (Mt )t∈[0,T ] is a true martingale since E(hM iT ) < +∞. On the other hand, using (twice) that
∂t u = −L u, leads to
2
∂tt u(r, X̄r ) + L̄(∂t u)(r, r, X̄r , X̄r ) = −∂t L(u)(r, X̄r ) − L̄ ◦ Lu(r, r, X̄r , X̄r )
=: φ̄(1) (r, r, X̄r , X̄r )
This follows from the fact that φ̄(1) is defined as a linear combination products of b, ∂t b, ∂x b, ∂xx b, σ, ∂t σ,
∂x σ, ∂xx σ, ∂x u, ∂xx u at (t, x) or (t, x) (with “x = X̄r ” and “x = X̄r ”).
– Term 2. The function ∂x u being C 1,2 , Itô’s formula yields
Z s
2
∂x u(s, X̄s ) = ∂x u(s, X̄s ) + ∂tx u(r, X̄r ) + L̄(∂x u)(r, r, X̄r , X̄r ) dr
s
Z s
2
+ ∂xx u(r, X̄r )σ(r, X̄r )dWr
s
(2)
The stochastic integral is the increment of a true martingale (denoted (Mt ) in what follows) and using
2
that ∂tx u = ∂x (−L u), one shows likewise that
2
∂tx u(r, X̄r ) + L̄(∂x u)(r, r, X̄r , X̄r ) = L̄(∂x u) − ∂x (Lu) (r, r, X̄r , X̄r )
= φ̄(2) (r, r, X̄r , X̄r )
where (t, t, x, x) 7→ φ̄(2) (t, t, x, x) has a polynomial growth in (x, x) uniformly in t, t ∈ [0, T ].
– Term 3. Following the same lines one shows that
Z s
2 2
∂xx u(s, X̄s ) = ∂xx u(s, X̄s ) + φ̄(3) (r, r, X̄r , X̄r )dr + Ms(3) − Ms(3)
s
(3)
where (Mt ) is a martingale and φ̄(3) has a polynomial growth in (x, x) uniformly in (t, t).
Step 2: Collecting all the results obtained in Step 2 yields
where
1
φ̄(r, r, x, x) = φ̄(1) (r, x, x) + b(r, x)φ̄(2) (r, x, x) + σ 2 (r, x)φ̄(3) (r, x, x).
2
Hence the function φ̄ satisfies a polynomial growth assumption
0
|φ̄(t, x, y)| ≤ Cφ (1 + |x|θ + |y|θ ), θ, θ0 ∈ N.
The first term on the right hand side of the equality vanishes since ∂t u + Lu = 0.
As concerns the third term, we will show that it has a zero expectation. One can use Fubini’s Theorem
since sup |X̄t | ∈ Lp (P) for every p > 0 (this ensures the integrability of the integrand). Consequently
t∈[0,T ]
Z tk !
(1) 1 2 (2) (3)
E − Ms(1) + Mtk−1 b(tk−1 , X̄tk−1 )(Ms(2)
− (3)
+ σ (tk−1 , X̄tk−1 )(Ms − Mtk−1 ) ds =
Mtk−1 )
tk−1 2
Z tk 1
E (Ms(1) − Mt(1) (2) (3)
) + E b(tk−1 , X̄tk−1 )(Ms(2) − Mtk−1 ) + E σ 2 (tk−1 , X̄tk−1 )(Ms(3) − Mtk−1 ) ds.
tk−1
k−1
2
260 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
Now all the three expectations inside the integral are zero since the M (k) are true martingale. Thus
E b(tk−1 , X̄tk−1 )(Ms(2) − Mt(2)
k−1
) =E b(tk−1 , X̄tk−1 ) E(Ms(2) − Mtk−1 | Ftk−1 )
(2)
= 0,
| {z } | {z }
Ftk−1 -measurable =0
Z tk Z s
E u(tk , X̄tk ) − u(tk−1 , X̄tk−1 ) = E φ̄(r, X̄r , X̄r ) dr ds (7.42)
tk−1 tk−1
so that
Z tk Z s
E u(tk , X̄t ) − u(tk−1 , X̄t ) ≤ ds dr E |φ̄(r, X̄r , X̄r )|
k k−1
tk−1 tk−1
0 (tk − tk−1 )2
≤ Cφ 1 + 2 E sup |X̄tn |θ∨θ .
t∈[0,T ] 2
2
T
≤ Cb,σ,f (T )
n
where, owing to Proposition 6.6, the function Cb,σ,f (.) only depends on T (in a non-decreasing manner).
Summing over the terms for k = 1, . . . , n yields the result at time T .
T tk
Step 3: Let k ∈ {0, . . . , n − 1}. It follows from the obvious equalities n = k , k = 1, . . . , n, and what
precedes that
This function φ̄ can be written explicit as a polynomial of b, σ, u and (some of) its partial derivatives. So,
under suitable assumptions on b and σ and f (like (R∞ ) and (F )) one can show that φ̄ satisfies in turn
0
(iii) ∂tmm φ̄(t, t, x, x) ≤ CT (1 + |x|θ(m,T ) + |x|θ (m,T ) ) t ∈ [0, T ].
7.8. FURTHER PROOFS AND RESULTS 261
In fact, as above, a C 1,2 -regularity in (t, x) is sufficient to get a second order expansion. The idea is once
again to apply Itô’s formula, this time to φ̄. Let r ∈ [tk−1 , tk ] (so that r = tk−1 ).
Z r
φ̄(r, r, X̄rn , X̄rn ) = φ̄(r, r, X̄rn , X̄rn ) + ∂x φ̄(v, r, X̄vn , X̄rn )dX̄vn
tk−1
Z r
1 2
+ ∂t φ̄(v, r, X̄vn , X̄rn ) + ∂xx φ(v, r, X̄vn , X̄rn )σ 2 (v, X̄vn ) dv
tk−1 2
Z r
n n
= φ̄(r, r, X̄r , X̄r ) + ∂t φ̄(v, r, X̄vn , X̄rn ) + L̄φ̄(., r, ., X̄rn )(v, v, X̄vn , X̄vn ) dv
tk−1
Z r
+ ∂x φ̄(v, r, X̄vn , X̄rn )σ(v, X̄vn )dWv .
tk−1
The stochastic integral turns out to be a true square integrable martingale (increment) on [0, T ] since
sup ∂x φ̄(v, r, X̄vn , X̄rn )σ(v, X̄vn ) ≤ C(1 + sup |X̄vn |αT + sup |X̄rn |βT ) ∈ L1 (P).
v,r∈[0,T ] v∈[0,T ] r∈[0,T ]
Now, !
E sup |L̄φ̄(., r, ., X̄vn )(v, v, X̄vn , X̄vn )| < +∞
v,r∈[0,T ]
owing to (7.40) and the polynomial growth of b, σ, u and its partial derivatives. The same holds for
∂t φ̄(v, r, X̄vn , X̄rn ) so that
Z tk Z s Z r ! 3
n n n n n
1 T
∂t φ̄(v, r, X̄v , X̄r ) + L̄φ̄(., r, ., X̄r )(v, v, X̄v , X̄v ) dv dr ds ≤ Cb,σ,f,T .
E
tk−1 tk−1 tk−1 3 n
k=1
T T
Z
T 2
= E φ̄(s, s, X̄sn , X̄sn )ds + O
2n 0 n
Z T
2
T T
= E ψ̄(s, X̄sn )ds + O
2n 0 n
where ψ̄(t, x) := φ̄(t, t, x, x) is at least a C 1,2 -function. In turn, for every k ∈ {0, . . . , n}, the function ψ(tk , .)
satisfies Assumption (F ) with some uniform bounds in k (this follows from the fact that its time partial
derivative does exist continuously on [0, T ]). Consequently
T
max E(ψ̄(tk , X̄tn,x 0
)) − E(ψ̄(tk , Xtxk )) ≤ Cb,σ,f
(T )
0≤k≤n k
n
262 CHAPTER 7. DISCRETIZATION SCHEME(S) OF A BROWNIAN DIFFUSION
so that
Z T Z T ! Z
T
n n
ψ̄(s, X̄s )ds − ψ̄(s, Xs )ds = E ψ̄(s, X̄s ) − E ψ̄(s, Xs ) ds
E
0 0 0
0 T
≤ T Cb,σ,f .
n
Applying Itô’s formula to ψ̄(u, Xu ) between s and s shows that
Z s Z s
ψ̄(s, Xs ) = ψ̄(s, Xs ) + (∂t + L)(ψ̄)(r, Xr )dr + ∂x ψ̄(r, Xr )σ(Xr )dWr
s s
which implies
T
sup |E ψ̄(s, Xs ) − E ψ̄(s, Xs )| ≤ C”f,b,σ,T .
s∈[0,T ] n
Hence
Z T Z T T2
ψ̄(s, Xs )ds − E ψ̄(s, Xs )ds ≤ C”b,σ,f .
E
0 0 n
Finally, combining all these bounds yields
Z T
T 1
E(f (X̄Tn,x ) − f (XTx )) = E ψ̄(s, Xsx ) ds + O
2n 0 n2
Remark. One can make ψ̄ explicit and by induction we eventually obtain the following Theorem.
Theorem 7.13 (Talay-Tubaro, 1989) If both (F ) and (R∞ ) holds then the weak error can be expanded at
any order, that is
R
X ak 1
∀ R ∈ N∗ , E(f (X̄Tn )) − E(f (XT )) = k
+ O R+1 .
n n
k=1
The main application is of course Richardson-Romberg extrapolation developed in Section 7.7. For more
recent results on weak errors we refer to [69].
Remark. What happens when f is no longer smooth? In fact the result still holds provided the diffu-
sion coefficient satisfies a uniform ellipticity condition (or at least a uniform Hormander hypo-ellipticity
assumption). This is the purpose of Bally-Talay’s Theorem, see [9].
Chapter 8
In the recent past years, several papers provided weak rates of convergence for some families of
functionals F . These works were essentially motivated by the pricing of path-dependent (European)
options, like Asian or Lookback options in 1-dimension, corresponding to functionals defined for
every α ∈ D([0, T ], Rd ) by
Z T
F (α) := f α(s)ds , F (x) := f α(T ), sup α(t), inf α(t)
0 t∈[0,T ] t∈[0,T ]
and, this time possibly in higher dimension, to barrier options with functionals of the form
where D is an (open) domain of Rd and τD (α) := inf{s ∈ [0, T ], α(s) or α(s−) ∈ / D} is the escape
1
time from D by the generic càdlàg path α ( ). In both frameworks f is usually Lipschitz continuous
1
when α is continuous or stepwise constant and càdlàg, τD (α) := inf{s ∈ [0, T ], α(s) ∈
/ D}.
263
264CHAPTER 8. THE DIFFUSION BRIDGE METHOD : APPLICATION TO THE PRICING OF PATH-DEP
continuous. Let us quote two well-known examples of results (in a homogeneous framework i.e.
b(t, x) = b(x) and σ(t, x) = σ(x)):
– In [63] is established the following theorem.
Theorem 8.1 (a) If the domain D is bounded and has a smooth enough boundary (in fact C 3 ), if
b ∈ C 3 (Rd , Rd ), σ ∈ C 3 Rd , M(d, q, R) , σ uniformly elliptic on D (i.e. σσ ∗ (x) ≥ ε0 Id , ε0 > 0),
then for every bounded Borel function f vanishing in a neighbourhood of ∂D,
e n )1
1
E f (XT )1{τD (X)>T } − E f (X T e n )>T }
{τD (X
= O √ as n → +∞. (8.1)
n
(where X
e denotes the stepwise constant Euler scheme).
(b) If furthermore, b and σ are C 5 and D is half-space, then the continuous Euler scheme satisfies
n
1
E f (XT )1{τD (X)>T } − E f (X̄T )1{τD (X̄ n )>T } =O as n → +∞. (8.2)
n
Note however that these assumptions are not satisfied by usual barrier options (see below).
– It is suggested in [150] (including a rigorous proof when X = W ) that if b, σ ∈ Cb4 (R, R), σ is
4,2
uniformly elliptic and f ∈ Cpol (R2 ) (existing partial derivatives with polynomial growth), then
e n , max X
en ) = O √
1
E f (XT , max Xt ) − E f (XT tk as n → +∞. (8.3)
t∈[0,T ] 0≤k≤n n
A similar improvement – O( n1 ) rate – as above can be expected (but still holds a conjecture) when
replacing Xe by the continuous Euler scheme X̄, namely
1
E f (XT , max Xt ) − E f (X̄Tn , max X̄tnk ) = O as n → +∞.
t∈[0,T ] t∈[0,T ] n
If we forget about the regularity assumptions, the formal intersection between these two classes
of path-dependent options is not empty since the payoff a barrier option with domain D = (−∞, L)
can be written
f (XT )1{τD (X)>T } = g XT , sup Xt with g(x, y) = f (x)1{y<L} .
t∈[0,T ]
Unfortunately, such a function g is never a smooth function so that if the second result is true it
does not solve the first one.
By contrast with the “vanilla case”, these results are somewhat disappointing since they point
out that the weak error obtained with the stepwise√ constant Euler scheme is not significantly
better than the strong error (the only gain is a 1 + log n factor). The positive part is that we
can reasonably hope that using the continuous Euler scheme yield again the O(1/n)-rate and the
expansion of the time discretization error, provided we know how to simulate this scheme.
8.2. FROM BROWNIAN TO DIFFUSION BRIDGE: HOW TO SIMULATE FUNCTIONALS OF THE GENUINE
Likewise one shows that, for every u ≥ T , E(YtW,T Wu ) = 0 so that YtW,T ⊥ span(Wu , u ≥ T ).
Now W being a Gaussian process independence and no correlation coincide in the closed subspace of
L2 (Ω, A, P) spanned by W . The process Y W,T belongs to this space by construction. Consequently
Y W,T is independent of (WT +u )u≥0 .
(b) First note that for every t ∈ [T0 , T1 ],
(T )
Wt = WT0 + Wt−T0 0
(T )
where Wt 0 = WT0 +t − WT0 , t ≥ 0, is a standard Brownian motion, independent of FTW0 . Rewrit-
ing (8.4) for W (T0 ) leads to
s (T ) (T0 )
Ws(T0 ) = W 0 + YsW ,T− T0 .
T1 − T0 T1 −T0
Plugging this identity in the above equality (at time s = t − T0 ) leads to
t − T0 W (T0 ) ,T1 −T0
Wt = WT0 + (W − WT0 ) + Yt−T . (8.5)
T1 − T0 T1 0
[Hint: consider the finite dimensional marginal distributions (Wt1 , . . . , Wtn ) given (X, Ws1 , . . . , Wsp )
where ti ∈ [T0 , T1 ], i = 1, . . . , n, sj ∈ [T0 , T1 ]c , j = 1, . . . , p, and X = (WT0 , WT1 ) and use the
decomposition (8.5).]
3. The conditional distribution of (Wt )t∈[T0 ,T1 ] given WT0 , WT1 is that of a Gaussian process hence
it can also be characterized by its expectation and covariance structure. Show that they are given
respectively by
T1 − t t − T0
E (Wt | WT0 = x, WT1 = y) = x+ y, t ∈ [T0 , T1 ],
T1 − T0 T1 − T0
8.2. FROM BROWNIAN TO DIFFUSION BRIDGE: HOW TO SIMULATE FUNCTIONALS OF THE GENUINE
and (T − t)(s − T )
1 0
Cov Wt , Ws | WT0 = x, WT1 = y = , s ≤ t, s, t ∈ [T0 , T1 ].
T1 − T0
Proposition 8.2 (Bridge of the Euler scheme) Assume that σ(t, x) 6= 0 for every t ∈ [0, T ],
x ∈ R.
(a) The processes (X̄t )t∈[tnk ,tnk+1 ] , k = 0, . . . , n − 1, are conditionally independent given the σ-field
σ(X̄tnk , k = 0, . . . , n).
(b) Furthermore, the conditional distribution
!
t − tnk n B, T /n
L (X̄t )t∈[tnk ,tnk+1 ] | X̄tnk = xk , X̄tnk+1 = xk+1 = L xk + (x k+1 − x k ) + σ(tk , xk )Y t−t n
tnk+1 − tnk k t∈[tn n
k ,tk+1 ]
B, T /n
where (Ys )s∈[0,T /n] is a Brownian bridge (related to a generic Brownian motion B) as defined
by (8.4). The distribution of this Gaussian process (sometimes called a diffusion bridge) is entirely
characterized by:
– its expectation function
!
t − tnk
xk + n (xk+1 − xk )
tk+1 − tnk
t∈[tn n
k ,tk+1 ]
and
– its covariance operator
2 (s ∧ t − tnk )(tnk+1 − s ∨ t)
σ (tnk , xk ) .
tnk+1 − tnk
t − tnk n
n
W (tk ) ,T /n
X̄t = X̄tnk + ( X̄ n
tk+1 − X̄ n
tk ) + σ(t k , X̄ n
tk )Yt−tn
tnk+1 − tnk k
(with in mind that tnk+1 − tnk = T /n). Consequently the conditional independence claim will follow
n
W (tk ) ,T /n
if the processes (Yt )t∈[0,T /n] , k = 0, . . . , n − 1, are independent given σ(X̄tn` , ` = 0, . . . , n).
Now, it follows from the assumption on σ that
and W are independent (note all the above bridges are FTW -measurable). First note that all
n
W (tk ) ,T /n
the bridges (Yt )t∈[0,T /n] , k = 0, . . . , n − 1 and W live in a Gaussian space. We know from
n
W (tk ) ,T /n
Proposition 8.1(a) that each bridge (Yt )t∈[0,T /n] is independent of both FtWn and σ(Wtn
k+1 +s
−
k
Wtk , s ≥ 0) hence in particular of all σ({Wt` , ` = 1, . . . , n}) (we use here a specificity of Gaussian
n n
processes). On the other hand, all bridges are independent since they are built from independent
n
(tn ) W (tk ) ,T /n
Brownian motions (Wt k )t∈[0,T /n] . Hence, the bridges (Yt )t∈[0,T /n] , k = 0, . . . , n − 1 are
i.i.d. and independent of σ(Wtnk , k = 1, . . . , n).
Now X̄tnk is σ({Wtn` , ` = 1, . . . , k})-measurable consequently σ(Xtnk , Wtnk , X̄tnk+1 ) ⊂ σ({Wtn` , ` =
n
W (tk ) ,T /n
1, . . . , n}) so that (Yt )t∈[0,T /n] is independent of (Xtnk , Wtnk , X̄tnk+1 ). The conclusion follows. ♦
Now we know the distribution of the continuous Euler scheme between two successive discretiza-
tion tnk and tnk+1 times conditionally to the Euler scheme at its discretization times. Now, we are
in position to simulate some functionals of the continuous Euler scheme, namely its supremum.
Proposition 8.3 The distribution of the supremum of the Brownian bridge starting at 0 and ar-
riving at y at time T , defined by YtW,T,y = Tt y + Wt − Tt WT on [0, T ] is given by
2
1 − exp − T z(z − y) if z ≥ max(y, 0),
P sup YtW,T,y ≤ z =
t∈[0,T ]
=0 if z ≤ max(y, 0).
Proof. The key is to have in mind that the distribution of Y W,T,y is that of the conditional
distribution of W given WT = y. So, we can derive the result from an expression of the joint
distribution of (supt∈[0,T ] Wt , WT ), e.g. from
P sup Wt ≥ z, WT ≤ y .
t∈[0,T ]
It is well-known from the symmetry principle that, for every z ≥ max(y, 0),
P sup Wt ≥ z, WT ≤ y = P(WT ≥ 2z − y).
t∈[0,T ]
We briefly reproduce the proof for the reader’s convenience. One introduces the hitting time
τz := inf{s > 0 | Ws = z}. This random time is in fact a stopping time with respect to the filtration
(FtW ) of the Brownian motion W since it is the hitting time of the closed set [z, +∞) by the
continuous process W (this uses that z ≥ 0). Furthermore, τz is a.s. finite since lim supt Wt = +∞
a.s. Consequently, still by continuity of its paths, Wτz = z a.s. and Wτz +t − Wτz is independent of
FτWz . As a consequence, for every z ≥ max(y, 0),
P sup Wt ≥ z, WT ≤ y = P(τz ≤ T, WT − Wτz ≤ y − z)
t∈[0,T ]
= P(τz ≤ T, −(WT − Wτz ) ≤ y − z)
= P(τz ≤ T, WT ≥ 2z − y)
= P(WT ≥ 2z − y)
8.2. FROM BROWNIAN TO DIFFUSION BRIDGE: HOW TO SIMULATE FUNCTIONALS OF THE GENUINE
ξ2
e− 2T
where h(ξ) = √ . Hence, since the involved functions are differentiable one has
2πT
P(WT ≥ 2z − (y + η)) − P(WT ≥ 2z − y) /η
P( sup Wt ≥ z | WT = y) = lim
η→0
t∈[0,T ] P(WT ≤ y + η) − P(WT ≤ y) /η
∂ P(WT ≥2z−y)
∂y h(2z − y)
= ∂ P(WT ≤y)
=
h(y)
∂y
(2z−y)2 −y 2 2z(z−y)
= e− 2T = e− 2T . ♦
Corollary 8.1 Let λ > 0 and let x, y ∈ R. If Y W,T denotes the standard Brownian bridge related
to W between 0 and T , then for every
! 1 − exp − 2 (z − x)(z − y) if z ≥ max(x, y)
t T λ2
P sup x + (y − x) + λYtW,T ≤ z =
t∈[0,T ] T
0 if z < max(x, y).
(8.6)
Then one concludes by the previous proposition, using that for any real-valued random variable ξ,
every α ∈ R, and every β ∈ (0, ∞),
z − α z − α
P(α + β ξ ≤ z) = P ξ ≤ =1−P ξ > . ♦
β β
d
Exercise. Show using −W = W that
where n
n,k
nt W (tk ) ,T /n
Mx,y := sup x+ (y − x) + σ(tnk , x)Yt .
t∈[0,T /n] T
n,k
It follows from Proposition 8.3 that the random variables MX̄ n ,X̄ n
, k = 0, . . . , n − 1, are con-
t t
k k+1
ditionally independent given X̄ , k = 0, . . . , n. Following Corollary 8.1, the distribution function
tn
k
Gn,k n,k
x,y of Mx,y is given by
2n
Gn,k
x,y (z) = 1 − exp − (z − x)(z − y) 1{z≥max(x,y)} , z ∈ R.
T σ 2 (tnk , x)
d
where we used that U = 1 − U . From a computational viewpoint ζ := (Gn,k −1
x,y ) (1 − u) is the
admissible solution (i.e. such that ≥ max(x, y)) to the equation
2n
1 − exp − 2 n (ζ − x)(ζ − y) = 1 − u
T σ (tk , x)
or equivalently to
T 2 n
ζ 2 − (x + y)ζ + xy + σ (tk , x) log(u) = 0.
2n
This leads to
1 q
(Gnx,y )−1 (1 − u) = 2 2 n
x + y + (x − y) − 2T σ (tk , x) log(u)/n .
2
Finally
L max X̄tn | {X̄tnk = xk , k = 0, . . . , n} = L max (Gnxk ,xk+1 )−1 (1 − Uk )
t∈[0,T ] 0≤k≤n−1
where (Uk )0≤k≤n−1 are i.i.d. and uniformly distributed random variables over the unit interval.
We assume for the sake of simplicity that the interest rate r is 0. By Lookback like options we
mean the class of options whose payoff involve possibly both X̄T and the maximum of (Xt ) over
[0, T ] i.e.
E f (X̄T , sup X̄t ).
t∈[0,T ]
Regular Call on maximum is obtained by setting f (x, y) = (y − K)+ , the regular maximum
Lookback option by setting f (x, y) = y − x .
We want to compute a Monte Carlo approximation of E f (supt∈[0,T ] X̄t ) using the continuous
Euler scheme. We reproduce below a pseudo-script to illustrate how to use the above result on the
conditional distribution of the maximum of the Brownian bridge.
• Set S f = 0.
for m = 1 to M
(m)
• Simulate a path of the discrete time Euler scheme and set xk := X̄tn , k = 0, . . . , n.
k
(m) (m)
• Simulate Ξ(m) := max0≤k≤n−1 (Gnxk ,xk+1 )−1 (1 − Uk ), where (Uk )1≤k≤n are i.i.d. with
U ([0, 1])-distribution.
end.(m)
• Eventually,
f
SM
E f ( sup X̄t ) ≈ ,
t∈[0,T ] M
at least for large enough M (2 ).
Once one can simulate supt∈[0,T ] X̄t (and its minimum, see exercise below), it is easy to price
by simulation the exotic options mentioned in the former section (Lookback, options on maximum)
but also the barrier options since one can decide whether or not the continuous Euler scheme strikes
or not a barrier (up or down). Brownian bridge is also involved in the methods designed for pricing
Asian options.
Exercise. Derive a formula similar to (8.7) for the conditional distribution of the minimum of
the continuous Euler scheme using now the inverse distribution functions
n,k −1 1 q
2 2 n
(Fx,y ) (u) = x + y − (x − y) − 2T σ (tk , x) log(u)/n , u ∈ [0, 1].
2
2
. . . Of course one needs to compute the empirical variance (approximately) given by
M M
!2
1 X (m) 2 1 X (m)
f (Ξ ) − f (Ξ ) .
M M
k=1 k=1
in order to design a confidence interval without which the method is simply non sense. . . .
272CHAPTER 8. THE DIFFUSION BRIDGE METHOD : APPLICATION TO THE PRICING OF PATH-DEP
of !
t W,T /n
inf x+ T
(y − x) + σ(x)Yt .
t∈[0,T /n]
n
Proof of Equation (8.8). This formula is the typical application of pre-conditioning described
in Section 3.4. We start from the chaining identity for conditional expectation:
E f (X̄T )1{supt∈[0,T ] X̄t ≤L} = E E f (X̄T )1{supt∈[0,T ] X̄t ≤L} | X̄tk , k = 0, . . . , n
Then we use the conditional independence of the Brownian bridges given the values X̄tk , k =
0, . . . , n, of the Euler scheme which has been established in Proposition 8.2. It follows
!
E f (X̄T )1{supt∈[0,T ] X̄t ≤L} = E f (X̄T )P( sup X̄t ≤ L | X̄tk , k = 0, . . . , n)
t∈[0,T ]
n
!
Y
= E f (X̄T ) GnX̄t ,X̄t (L)
k k+1
k=1
n−1 (X̄t −L)(X̄t −L)
k k+1
− 2n
σ 2 (tn ,Xt )
Y T
= E f (X̄T )1{max0≤k≤n X̄t ≤L}
1 − e k k . ♦
k
k=0
Furthermore, we know that the random variable in the right hand side always has a lower
variance since it is a conditional expectation of the random variable in the left hand side, namely
n−1 (X̄t −L)(X̄t −L)
2n k k+1
−
σ 2 (tn ,Xt )
Y
Var f (X̄T )1{max0≤k≤n X̄t ≤L} 1 − e T k k ≤ Var f (X̄T )1{sup
k t∈[0,T ] X̄t ≤L}
k=0
8.2. FROM BROWNIAN TO DIFFUSION BRIDGE: HOW TO SIMULATE FUNCTIONALS OF THE GENUINE
Pn−1 (X̄t −L)(X̄t −L)
k k+1
− 2n
T k=0 σ 2 (tn ,Xt )
E f (X̄T )1{inf t∈[0,T ] X̄t ≥L} = E f (X̄T )1{min0≤k≤n X̄tk ≥L} e k k (8.9)
and that the expression in the second expectation has a lower variance.
2. Extend the above results to barriers of the form
L(t) := eat+b , a, b ∈ R.
still for some real constant ck depending on b, σ, F , etc For small values of R, one checks that
(1) √ (1) √ √
R=2: α1 2 = −(1 + 2), α2 2 = 2(1 + 2).
√ √ √ √
( 21 ) 3− 2 ( 21 ) 3−1 (1) 2−1
∗[.4em]R = 3 : α1 = √ √ , α2 = −2 √ √ , α3 2 =3 √ √ .
2 2− 3−1 2 2− 3−1 2 2− 3−1
Note that these coefficients have greater absolute values than in the standard case. Thus if R = 4,
P ( 21 ) 2
1≤r≤4 (αr ) ≈ 10 900! which induces an increase of the variance term for too small values of the time
discretization parameter n even when increments are consistently generated. The complexity computations of
274CHAPTER 8. THE DIFFUSION BRIDGE METHOD : APPLICATION TO THE PRICING OF PATH-DEP
the procedure needs to be updated but grosso modo the optimal choice for the time discretization parameter
n as a function of the M C size M is
M ∝ nR .
• The continuous Euler scheme: The conjecture is simply to assume that the expansion (ER ) now holds
for a vector space V of functionals F (with polynomial growth with respect to the sup-norm). The increase
of the complexity induced by the Brownian bridge method is difficult to quantize: it amounts to computing
−1
log(Uk ) and the inverse distribution functions Fx,y and G−1
x,y .
The second difficulty is that simulating (the extrema of) some of continuous Euler schemes using the
Brownian bridge in a consistent way is not straightforward at all. However, one can reasonably expect that
using independent Brownian bridges “relying” on stepwise constant Euler schemes with consistent Brownian
increments will have a small impact on the global variance (although slightly increasing it).
To illustrate and compare these approaches we carried some numerical tests on partial Lookback and
barrier options in the Black-Scholes model presented in the previous section.
Partial Lookback options: The partial Lookback Call option is defined by its payoff functional
F (x) = e−rT x(T ) − λ min x(s) , x ∈ C([0, T ], R),
s∈[0,T ] +
CallLkb
0 = e−rT E((XT − λ min Xt )+ )
t∈[0,T ]
is given by
σ2
2r 2r
CallLkb
0
BS
= X0 Call (1, λ, σ, r, T ) + λ X0 PutBS
λ , 1, , r, T .
σ 2
2r σ
We took the same values for the B-S parameters as in the former section and set the coefficient λ at λ = 1.1.
For this set of parameters CallLkb
0 = 57.475.
As concerns the MC simulation size, we still set M = 106 . We compared the following three methods
for every choice of n:
– A 3-step Richardson-Romberg extrapolation (R = 3) of the stepwise constant Euler scheme (for which
3
a O(n− 2 )-rate can be expected from the conjecture).
– A 3-step Richardson-Romberg extrapolation (R = 3) based on the continuous Euler scheme (Brownian
bridge method) for which a O( n13 )-rate can be conjectured (see [63]).
– A continuous Euler scheme (Brownian bridge method) of equivalent complexity i.e. with discretization
parameter 6n for which a O( n1 )-rate can be expected (see [63]).
The three procedures have the same complexity if one neglects the cost of the bridge simulation with
respect to that of the diffusion coefficients (note this is very conservative in favour of “bridged schemes”).
We do not reproduce the results obtained for the standard stepwise constant Euler scheme which are
clearly out of the game (as already emphasized in [63]). In Figure 8.2.5, the abscissas represent the size of
Euler scheme with equivalent complexity (i.e. 6n, n = 2, 4, 6, 8, 10). Figure 8.2.5(a) (left) shows that both
3-step Richardson-Romberg extrapolation methods converge significantly faster than the “bridged” Euler
scheme with equivalent complexity in this high volatility framework. The standard deviations depicted in
Figure 8.2.5(a) (right) show that the 3-step Richardson-Romberg extrapolation of the Brownian bridge is
controlled even for small values of n. This is not the case with the 3-step Richardson-Romberg extrapolation
method of the stepwise constant Euler scheme. Other simulations – not reproduced here – show this is
already true for the standard Romberg extrapolation and the bridged Euler scheme. In any case the multistep
Romberg extrapolation with R = 3 significantly outperforms the bridged Euler scheme.
8.2. FROM BROWNIAN TO DIFFUSION BRIDGE: HOW TO SIMULATE FUNCTIONALS OF THE GENUINE
(a)
4 350
3
300
2
Error = MC Premium − BS Premium
1 250
−1
150
−2
−3 100
−4
50
−5
−6 0
10 15 20 25 30 35 40 45 50 55 60 10 15 20 25 30 35 40 45 50 55 60
BS−LkbCall, S0 = 100, lambda = 1.1, sig = 100%, r = 15 %, R = 3 : Consist. Romberg & Brownian bridge, R = 3 BS−LkbCall, S0 = 100, lambda = 1.1, sig = 30%, r = 5 %, R = 3 : Consist. Romberg & Brownian bridge, R = 3
(b)
4 350
3
300
2
Error = MC Premium − BS Premium
1
Std Deviation MC Premium 250
0
200
−1
150
−2
−3 100
−4
50
−5
−6 0
10 15 20 25 30 35 40 45 50 55 60 10 15 20 25 30 35 40 45 50 55 60
BS−LkbCall, S0 = 100, lambda = 1.1, sig = 100%, r = 15 %, R = 3 : Consist. Romberg & Brownian bridge, R = 3 BS−LkbCall, S0 = 100, lambda = 1.1, sig = 30%, r = 5 %, R = 3 : Consist. Romberg & Brownian bridge, R = 3
Figure 8.1: B-S Euro Partial Lookback Call option. (a) M = 106 . Richardson-Romberg extrapola-
tion (R = 3) of the Euler scheme with Brownian bridge: o−−o−−o. Consistent Richardson-Romberg extrap-
olation (R = 3): ×—×—×. Euler scheme with Brownian bridge with equivalent complexity: + − − + − − +.
X0 = 100, σ = 100%, r = 15 %, λ = 1.1. Abscissas: 6n, n = 2, 4, 6, 8, 10. Left: Premia. Right: Standard
Deviations. (b) Idem with M = 108 .
When M = 108 , one verifies (see Figure 8.2.5(b)) that the time discretization error of the 3-step
Richardson-Romberg extrapolation vanishes like for the partial Lookback option. In fact for n = 10 the
3-step bridged Euler scheme yields a premium equal to 57.480 which corresponds to less than half a cent
error, i.e. 0.05 % accuracy! This result being obtained without any control variate variable.
The Richardson-Romberg extrapolation of the standard Euler scheme also provides excellent results. In
fact it seems difficult to discriminate them with those obtained with the bridged schemes, which is slightly
unexpected if one think about the natural conjecture about the time discretization error expansion.
As a theoretical conclusion, these results strongly support both conjectures about the existence of ex-
pansion for the weak error in the (n−p/2 )p≥1 and (n−p )p≥1 scales respectively.
Up & out Call option: Let 0 ≤ K ≤ L. The Up-and-Out Call option with strike K and barrier L is
defined by its payoff functional
with
2 2 σ2 x
log(X0 /L) + (r −
2 )T
Z
0 X0 0 X0 − ξ2 dξ
K =K , L =L , d (L) = √ and Φ(x) := e− 2 √
L L σ T −∞ 2π
2r
and µ = .
σ2
We took again the same values for the B-S parameters as for the vanilla call. We set the barrier value at
L = 300. For this set of parameters C0U O = 8.54. We tested the same three schemes. The numerical results
are depicted in Figure 8.2.5.
The conclusion (see Figure 8.2.5(a) (left)) is that, at this very high level of volatility, when M = 106
(which is a standard size given the high volatility setting) the (quasi-)consistent 3-step Romberg extrapolation
with Brownian bridge clearly outperforms the continuous Euler scheme (Brownian bridge) of equivalent
complexity while the 3-step Richardson-Romberg extrapolation based on the stepwise constant Euler schemes
with consistent Brownian increments is not competitive at all: it suffers from both a too high variance (see
Figure 8.2.5(a) (right)) for the considered sizes of the Monte Carlo simulation and from its too slow rate of
convergence in time.
When M = 108 (see Figure 8.2.5(b) (left)), one verifies again that the time discretization error of the
3-step Richardson-Romberg extrapolation almost vanishes like for the partial Lookback option. This no
longer the case with the 3-step Richardson-Romberg extrapolation of stepwise constant Euler schemes. It
seems clear that the discretization time error is more prominent for the barrier option: thus with n = 10,
the relative error is 9.09−8.54
8.54 ≈ 6.5% by this first Richardson-Romberg extrapolation whereas, the 3-step
Romberg method based on the quasi-consistent “bridged” method yields an approximate premium of 8.58
corresponding to a relative error of 8.58−8.54
8.54 ≈ 0.4%. These specific results (obtained without any control
variate) are representative of the global behaviour of the methods as emphasized by Figure 8.2.5(b)(left).
where h : R+ → R+ is a non-negative Borel function. This kind of options needs some specific
treatments to improve Rthe rate of convergence of its time discretization. This is due to the continuity
T
of the functional f 7→ 0 f (s)ds in (L1 ([0, T ], dt), k . k1 ).
This problem has been extensively investigated (essentially for a Black-Scholes dynamics) by
E. Temam in his PHD thesis (see [97]). What follows comes from this work.
σ2
Xtx = x exp (µt + σWt ), µ=r− , x > 0.
2
8.2. FROM BROWNIAN TO DIFFUSION BRIDGE: HOW TO SIMULATE FUNCTIONALS OF THE GENUINE
(a)
2 900
800
1.5
700
Error = MC Premium − BS Premium
0.5
500
400
0
300
−0.5
200
−1
100
−1.5 0
10 15 20 25 30 35 40 45 50 55 60 10 15 20 25 30 35 40 45 50 55 60
BS−UOCall, S0=K=100, L = 300, sig = 100%, r = 15 %, Consist. Romberg Brownian bridge R = 3 vs Equiv Brownian BS−UOCall, S0=K=100, L = 300, sig = 100%, r = 15 %, CConsist. Romberg & Brownian bridge, R = 3
(b)
1.5 900
1 800
700
0.5
Error = MC Premium − BS Premium
600
Std Deviation MC Premium
500
−0.5
400
−1
300
−1.5
200
−2 100
−2.5 0
10 15 20 25 30 35 40 45 50 55 60 10 15 20 25 30 35 40 45 50 55 60
BS−UOCall, S =K=100, L = 300, sig = 100%, r = 15 %, Consist. Romberg Brownian bridge R = 3 vs Equiv Brownian BS−UOCall, S =K=100, L = 300, sig = 100%, r = 15 %, CConsist. Romberg & Brownian bridge, R = 3
0 0
Figure 8.2: B-S Euro up-&-out Call option. (a) M = 106 . Richardson-Romberg extrapolation (R = 3)
of the Euler scheme with Brownian bridge: o−−o−−o. Consistent Richardson-Romberg extrapolation (R = 3):
—×—×—×—. Euler scheme with Brownian bridge and equivalent complexity: + − − + − − +. X0 = K = 100,
L = 300, σ = 100%, r = 15%. Abscissas: 6n, n = 2, 4, 6, 8, 10. Left: Premia. Right: Standard Deviations.
(b) Idem with M = 108 .
278CHAPTER 8. THE DIFFUSION BRIDGE METHOD : APPLICATION TO THE PRICING OF PATH-DEP
Then
Z T n−1
X Z tnk+1
Xsx ds = Xsx ds
0 k=0 tn
k
n−1 Z T /n
X (tn
k)
= Xtxn exp (µs + σWs )ds.
k
k=0 0
So, we need to approximate the integral coming out in the above equation.
p Let B be a standard
Brownian motion. Roughly speaking, sups∈[0,T /n] |Bs | is proportional to T /n. Although it is not
true in the a.s. (owing to a missing log log term), it holds true in any Lp (P), p ∈ [1, ∞) by a simple
scaling so that, we may write “almost” rigorously
σ2 2 3
∀ s ∈ [0, T /n], exp (µs + σBs ) = 1 + µs + σBs + Bs + “O(n− 2 )”.
2
Hence
Z T /n T /n
µ T2 σ2 T 2 σ 2 T /n 2
Z Z
T 5
exp (µs + σBs )ds = + +σ Bs ds + + (Bs − s)ds + “O(n− 2 )”
0 n 2 n2 0 2 2n 2 2 0
2 Z T /n 2
2 Z 1
T rT σ T eu2 − u)du + “O(n− 52 )”
= + +σ Bs ds + (B
n 2 n2 0 2 n 0
q
T
where B u , u ∈ [0, 1], is a standard Brownian motion (on the unit interval) since,
eu =
n BT n
combining a scaling and a change of variable,
Z T /n 2 Z 1
2 T e 2 − u)du.
(Bs − s)ds = (B u
0 n 0
R1 2
Owing to the fact that the random variable 0 (B e − u)du is centered and that, when B is replaced
u
(tn)
sucessively by W k , k = 1, . . . , n, the resulting random variables are i.i.d., one can in fact consider
5
that the contribution of this term is O(n− 2 ) (3 ). This leads us to use the following approximation
Z T /n Z T /n
(tn ) T r T2 (tn
k)
exp (µs + σWs k )ds ≈ Ikn := + 2
+ σ W s ds.
0 n 2 n 0
n
fu(tk ) )2 − u) du is independent of FtW
3
R1
To be more precise, the random variable 0 ((W n , k = 0, . . . , n − 1, so that
k
n−1 Z 1
2 n−1
Z 1
2
(tn (tn
X
k) 2 k) 2
x
X
x
X (( W ) − u) du = X (( W ) − u) du
n
n
u u
tk
f
f
tk
k=0 0
0k=0 2
2
n−1
Z 1
2
X
xn
2
n
fu(tk ) )2 − u) du
=
Xtk
(( W
2 0
k=0 2
4
Z 1
2
T
(Bu2 − u) du
kX x k2
≤ n T 2
n
0
2
Simulation phase: Now, it follows from Proposition 8.2 applied to the Brownian motion (which
is its own continuous Euler scheme), that the n-tuple of processes (Wt )t∈[tnk ,tnk+1 ] , k = 0, . . . , n − 1
(tn )
are independent processes given σ(Wtnk , k = 1, . . . , n). This also reads as (Wt k )t∈[0,T /n] , k =
0, . . . , n − 1 are independent processes given σ(Wtnk , k = 1, . . . , n). The same proposition implies
that for every ` ∈ {0, . . . , n − 1},
!
(tn )
nt W,T /n
L (Wt ` )t∈[0,T /n] | Wtnk = wk , k = 1, . . . , n = L (w`+1 − w` ) + Yt
T t∈[0,T /n]
are independent processes given σ({Wtnk , k = 1, . . . , n}). Consequently the random variables
R T /n (tn` )
0 Ws ds, ` = 1, . . . , n, are conditionally i.i.d. given σ(Wtnk = wk , k = 1, . . . , n) with a condi-
tional Gaussian distribution with (conditional mean) given by
Z T /n
nt T
(w`+1 − w` )dt = (w`+1 − w` ).
0 T 2n
As concerns the variance, we can use the exercise below Proposition 8.1 but, at this stage, we will
detail the computation for a generic Brownian motion, say W , between 0 and T /n.
! !2
Z T /n Z T /n
Var Ws ds | W T = E YsW,T /n ds
n
0 0
Z
= E(YsW,T /n YtW,T /n ) ds dt
[0,T /n]2
Z
n T
= (s ∧ t) − (s ∨ t) ds dt
T [0,T /n]2 n
3 Z
T
= (u ∧ v)(1 − (u ∨ v)) du dv
n [0,1]2
1 T 3
= .
12 n
Exercise. Use stochastic calculus to show directly that
Z T 2 ! Z T 2 Z T 2
W,T T T T3
E Ys ds =E Ws ds − WT = − s ds = .
0 0 2 0 2 12
R
T
No we can describe the pseudo-code for the pricing of an Asian option with payoff h 0 Xsx ds
by a Monte Carlo simulation.
for m := 1 to M
q
(m) d T (m) (m)
• Simulate the Brownian increments ∆Wtn := n Zk , k = 1, . . . , n, Zk i.i.d. with distri-
k+1
bution N (0; 1); set
q
(m) (m) (m)
– wk := Tn Z1 + · · · + Zn ,
(m) (m)
– xk := x exp (µtnk + σwk ), k = 0, . . . , n.
280CHAPTER 8. THE DIFFUSION BRIDGE METHOD : APPLICATION TO THE PRICING OF PATH-DEP
(m)
• Simulate independently ζk , k = 1, . . . , n, i.i.d. with distribution N (0; 1) and set
2 3 !
n,(m) d T r T T 1 T 2 (m)
Ik := + +σ (wk+1 − wk ) + √ ζk , k = 0, . . . , n − 1.
n 2 n 2n 12 n
• Compute
n−1
!
(m) n,(m)
X
(m)
hT =: h xk Ik
k=0
end.(m)
M
1 X (m)
Premium ≈ e−rT hT .
M
m=1
end.
Time discretization error estimates: Set
Z T n−1
X
AT = Xsx ds and ĀnT = Xtxn Ikn .
k
0 k=0
Proof. In progress. ♦
Remark. The main reason for not considering higher order expansions
Rt R t of exp(µt + σBt ) is that we
are not able to simulate at a reasonable cost the triplet (Bt , 0 Bs ds, 0 Bs2 ds) which is no longer a
Rt Rt
Gaussian vector and, consequently, ( 0 Bs ds, 0 Bs2 ds) given Bt .
Chapter 9
Let Z : (Ω, A, P) → (E, E) be an E-valued random variable where (E, E) is an abstract measurable
space, let I be a nonempty open interval of R and let F : I × E → R be a Bor(I) ⊗ E-measurable
function such that, for every x ∈ I, F (x, Z) ∈ L2 (P) (1 ). Then set
Assume that the function f is regular, at least at some points. Our aim is to devise a method to
compute by simulation f 0 (x) at such points (or higher derivatives f (k) (x), k ≥ 1). If the functional
F (x, z) is differentiable at x (with respect to its first variable) for PZ almost every z, if a domination
or uniform integrability property holds (like (9.1) or (9.5) below), if the partial derivative ∂F ∂x (x, z)
can be computed at a reasonable cost and Z is a simulatable random vector (still at a reasonable. . . ),
it is natural to compute f 0 (x) using a Monte Carlo simulation based on the representation formula
∂F
f 0 (x) = E (x, Z) .
∂x
This approach has already been introduced in Chapter 2 and will be more deeply developed further
on in Section 9.2 mainly devoted to the tangent process.
Otherwise, when ∂F ∂x (x, z) does not exist or cannot be compute easily (whereas F can), a natural
idea is to introduce a stochastic finite difference approach. Other methods based on the introduction
of an appropriate weight will be introduced in the last two sections of this chapter.
281
282 CHAPTER 9. BACK TO SENSITIVITY COMPUTATION
Proof. Let ε ∈ (0, ε0 ). It follows from the Taylor formula applied to f between x and x ± ε
respectively that
f (x + ε) − f (x − ε) ε2
|f 0 (x) − | ≤ [f ”]Lip . (9.4)
2ε 2
On the other hand
F (x + ε, Z) − F (x − ε, Z) f (x + ε) − f (x − ε)
E =
2ε 2ε
and
2 !
F (x + ε, Z) − F (x − ε, Z) F (x + ε, Z) − F (x − ε, Z) f (x + ε) − f (x − ε) 2
Var = E −
2ε 2ε 2ε
E(F (x + ε, Z) − F (x − ε, Z))2 f (x + ε) − f (x − ε) 2
= −
4ε2 2ε
f (x + ε) − f (x − ε) 2
2
≤ CF,Z − (9.5)
2ε
2
≤ CF,Z . (9.6)
9.1. FINITE DIFFERENCE METHOD(S) 283
Numerical recommendations. The above result suggests to choose M = M (ε) (and ε) so that
M (ε) = o ε−4 .
As a matter of fact, it is useless to pursue the Monte Carlo simulation so that the statistical error
becomes smaller than that induced by the approximation of the derivative. From a practical point
of view this means that, in order to reduce the error by a factor 2, we need to reduce ε and increase
M as follows: √
ε ; ε/ 2 and M ; 4 M.
Remark (what should never be done). Imagine that one uses two independent samples (Zk )k≥1
and (Zek )k≥1 to simulate F (x − ε, Z) and F (x + ε, Z). Then, it is straightforward that
M
!
1 X F (x + ε, Zk ) − F (x − ε, Zek ) 1
Var = Var(F (x + ε, Z)) + Var(F (x − ε, Z))
M 2ε 4M ε2
k=1
Var(F (x, Z))
≈ .
2M ε2
2
which formally simply reads E|X − a|2 = (a − E X)2 + Var(X).
284 CHAPTER 9. BACK TO SENSITIVITY COMPUTATION
f (x+ε)−f (x−ε)
Then the asymptotic variance of the estimator of 2 explodes as ε → 0 and the resulting
quadratic error read approximately
Examples: (a) Black-Scholes model. The basic fields of applications in finance is the Greeks
computation. This corresponds to functions F of the form
√
2
−rT (r− σ2 )T +σ T z
F (x, z) = e h xe , z ∈ R, x ∈ (0, ∞)
where b and ϑ are locally Lipschitz continuous functions (on the real line) with at most linear
growth (which implies the existence and uniqueness of a strong solution (Xtx )t∈[0,T ] starting from
X0x = x. In such a case one should rather writing
The Lipschitz continuous continuity of the flow of the above SDE (see Theorem 7.10) shows that
where Cb,ϑ is a positive constant only depending on the Lipschitz continuous coefficients of b and ϑ.
In fact this also holds for multi-dimensional diffusion processes and for path-dependent functionals.
The regularity of the function f is a less straightforward question. But the answer is positive in
two situations: either h, b and σ are regular enough to apply results on the flow of the SDE which
allows pathwise differentiation of x 7→ F (x, ω) (see Theorem 9.1 further on in Section 9.2.2) or ϑ
satisfies a uniform ellipticity assumption ϑ ≥ ε0 > 0.
(c) Euler scheme of a diffusion model with Lipschitz continuous coefficients. The same holds for
the Euler scheme. Furthermore Assumption (9.1) holds uniformly with respect to n if T /n is the
step size of the Euler scheme.
(d) F can also stand for a functional of the whole path of a diffusion provided F is Lipschitz
continuous with respect to the sup-norm over [0, T ].
As emphasized in the section devoted to the tangent process below, the generic parameter x
can be the maturity T (in practice the residual maturity T − t, also known as seniority ), or any
finite dimensional parameter on which the diffusion coefficient depend since they always can be
seen as a component or a starting value of the diffusion.
Exercises. 1. Adapt the results of this section to the case where f 0 (x) is estimated by its
“forward” approximation
f (x + ε) − f (x)
f 0 (x) ≈ .
ε
2. Apply the above method to approximate the γ-parameter by setting
f (x + ε) + f (x − ε) − 2f (x)
f ”(x) =
ε2
under suitable assumptions on f and its derivatives.
In the setting described in the above Proposition 9.1, we are close to a framework in which one
can interchange derivation
0 and expectation:
the (local) Lipschitz continuous assumption on x0 7→
F (x0 , Z) implies that F (x ,Z)−F
x0 −x
(x,Z)
0
is a uniformly integrable family. Hence as soon
x ∈(x−ε0 ,x+ε0 )
as x0 7→ F (x0 , Z) is P-a.s. pathwise differentiable at x (or even simply in L2 (P)), one has f 0 (x) =
E Fx0 (x, Z).
Consequently it is important to investigate the singular setting in which f is differentiable at x
and F (., Z) is not Lipschitz continuous in L2 . This is the purpose of the next proposition (whose
proof is quite similar to that the Lipschitz continuous setting and is subsequently left to the reader
as an exercise).
286 CHAPTER 9. BACK TO SENSITIVITY COMPUTATION
ε2 CHol,F,Z
≤ [f ”]Lip + √ .
2 (2ε)1−θ M
Once again, the variance of the finite difference estimator explodes as ε → 0 as soon as θ 6= 1.
As a consequence, in such a framework, to divide the quadratic error by a 2 factor, we need to
√
ε ; ε/ 2 and M ; 21−θ × 4 M.
A slightly different point of view in this singular case is to (roughly) optimize the parameter
ε = ε(M ), given a simulation size M in order to minimize the quadratic error, or at least its natural
upper bounds. Such an optimization performed in (9.7) yields
! 1
2θ CHol,F,Z 3−θ
εopt = √ .
[f ”]Lip M
which of course depends on M so that it breaks the recursiveness of the estimator. Moreover its
sensitivity to [f ”]Lip makes its use rather unrealistic in practice.
2−θ
− 3−θ
The resulting rate of decay of the quadratic error is O M . This rate show that when
θ ∈ (0, 1), the lack of L2 -regularity of x 7→ F (x, Z) slows down the convergence of the finite difference
method by contrast with the Lipschitz continuous case where the standard rate of convergence of
the Monte Carlo method is preserved.
Example of the digital option. A typical example of such a situation is the pricing of digital
options (or equivalently the computation of the δ-hedge of a Call or Put options).
Let us consider in the still in the standard risk neutral Black-Scholes model a digital Call option
with strike price K > 0 defined by its payoff
h(ξ) = 1{ξ≥K}
√
σ2
and set F (x, z) = e−rT h x e(r− 2 )T +σ T z , z ∈ R, x ∈ (0, ∞) (r denotes the constant interest
rate as usual). We know that the premium of this option is given for every initial price x > 0 of
the underlying risky asset by
d
f (x) = E F (x, Z) where Z = N (0; 1).
9.1. FINITE DIFFERENCE METHOD(S) 287
σ2
Set µ = r − 2 . It is clear (using that Z and −Z have the same distribution), that
√
f (x) = e−rT P x eµ T +σ T Z ≥ K
log(x/K) + µT
= e−rT P Z ≥ − √
σ T
log(x/K) + µT
= e−rT Φ0 √
σ T
where Φ0 denotes the distribution function of the N (0; 1) distribution. Hence the function f is
infinitely differentiable on (0, ∞) since the probability density of the standard normal distribution
is on the real line.
On the other hand, still using that Z and −Z have the same distribution, for every x, x0 ∈ R,
2
0 2 −2rT
n
kF (x, Z) − F (x , Z)k2 = e 1
Z≥− log(x/K)+µT
o − 1 n
0
log(x /K)+µT
o
Z≥−
√ √
σ T σ T
2
2
= e−2rT E 1n log(x/K)+µT o − 1n log(x0 /K)+µT o
Z≤ σ√T Z≤ √
σ T
log(max(x, x0 )/K) + µT log(min(x, x0 )/K) + µT
= e−2rT Φ0 √ − Φ0 √ .
σ T σ T
Using that that Φ00 is bounded (by κ0 = √1 ),
2π
we derive that
κ0 e−2rT
kF (x, Z) − F (x0 , Z)k22 ≤ log x − log x0 .
√
σ T
Consequently for every interval I ⊂ (0, ∞) bounded away from 0, there exists a real constant
Cr,σ,T,I > 0 such that
i.e. the functional F is 12 -Hölder in L2 (P) and the above proposition applies.
M
!2
0 0 (x)
2
0 1 X f (x + εk ) − f (x − εk )
f (x) − fd
= f (x) −
M
2 M 2εk
k=1
M
1 X Var(F (x + εk , Zk ) − F (x − εk , Zk ))
+
M2 4ε2k
k=1
M
!2
[f ”]2Lip X 2
2
CF,Z
≤ εk + (9.9)
4M 2 M
k=1
2
M
2
!
1 [f ”] X
= Lip
ε2k 2
+ CF,Z
M 4M
k=1
√
Standard M -L2 of convergence In order to prove a √1M rate (like in a standard Monte Carlo
simulation) we need the sequence (εm )m≥1 and the size M to satisfy
M
!2
X
ε2k = O(M ).
k=1
9.1. FINITE DIFFERENCE METHOD(S) 289
since
M M 1
√ 1 X k −2 √ √
Z 1
X 1 dx
1 = M ∼ M √ =2 M as M → +∞.
k2 M M 0 x
k=1 k=1
Erasing the asymptotic bias A more efficient way to take advantage of a decreasing step
approach is to erase the bias by considering steps of the form
1
εk = o k − 4
as k → +∞.
M
F (x + ε , Z ) − F (x − ε , Z )
2
M
X Var(F (x + εk , Zk ) − F (x − εk , Zk )) X k k k k 2
=
4ε2k 4ε2k
k=1 k=1
M 2
X f (x + εk ) − f (x − εk )
− .
4ε2k
k=1
f (x + εk ) − f (x − εk )
Now, since → f 0 (x) as k → +∞,
2εk
M
1 X f (x + εk ) − f (x − εk ) 2
2 → f 0 (x)2 as M → +∞.
M 4εk
k=1
Plugging this in the above computations yields the refined asymptotic upper-bound
√
q
lim sup M
f 0 (x) − fd
0 (x)
≤ 2
CF,Z − (f 0 (x))2 .
M
M →+∞ 2
This approach has the same quadratic rate of convergence as a regular Monte Carlo simulation
(e.g. a simulation carried out with ∂F
∂x (x, Z) if it exists).
Now, we show that the estimator fd 0 (x) is consistent i.e. converging toward f 0 (x).
M
Proposition 9.3 Under the assumptions of Proposition 9.1 and if εk goes to zero as k goes to
0 (x)
infinity, the estimator fd M
a.s. converges to its target f 0 (x).
290 CHAPTER 9. BACK TO SENSITIVITY COMPUTATION
M
X 1 F (x + εk , Zk ) − F (x − εk , Zk ) − (f (x + εk ) − f (x − εk ))
LM = , M ≥ 1.
k 2εk
k=1
so that
sup EL2M < +∞.
M
Consequently, LM a.s. converges to a square integrable (hence a.s. finite) random variable L∞ as
M → +∞. The announced a.s. convergence in (9.11) follows from the Kronecker Lemma (see
Lemma 11.1). ♦
Exercises. 1. (Central Limit Theorem) Assume that x 7→ F (x, Z) is Lipschitz continuous from
R to L2+η (P) for an η > 0. Show that the convergence of the finite difference estimator with
decreasing step fd 0 (x) defined in (9.8) satisfies the following property: from every subsequence
M
(M 0 ) of (M ) one may extract a subsequence (M ”) such that
√
0
L
0 (x)
M ” fd − f (x) −→ N (0; v), v ∈ [0, v̄], as M → +∞
M”
2 − f 0 (x) 2 .
where v̄ = CF,Z
[Hint: Note that the sequence Var F (x+εk ,Zk )−F
2εk
(x−εk ,Zk )
, k ≥ 1, is bounded and use the fol-
lowing Central Limit Theorem: if (Yn )n≥1 is a sequence of i.i.d. random variables such that there
exists η > 0 satisfying
Nn
1 X n→+∞
sup E|Yn |2+η < +∞ and ∃ Nn → +∞ with Var(Yk ) −→ σ 2 > 0
n Nn
k=1
9.2. PATHWISE DIFFERENTIATION METHOD 291
then
Nn
1 X L
√ Yk −→ N (0; σ 2 ) as n → +∞.]
Nn k=1
2. (Hölder framework) Assume that x 7→ F (x, Z) is only θ-Hölder from R to L2 (P) with θ ∈ (0, 1),
like in Proposition 9.2.
(a) Show that a natural upper-bound for the quadratic error induced by the symmetric finite
0 (x)
difference estimator with decreasing step fd defined in (9.8) is given by
M
v !2
M M
u 2 2
1ut [f ”]Lip CF,Z 1
X X
ε2k + .
M 4
k=1
22(1−θ) 2(1−θ)
k=1 εk
0 (x)
(b) Show that the resulting estimator fd M
a.s. converges to its target f 0 (x) as soon as
X 1
< +∞.
2 2(1−θ)
k≥1 k εk
(c) Assume that εk = kca , k ≥ 1, where c is a positive real constant and a ∈ (0, 1). Show that the
1
exponent a corresponds to an admissible step iff a ∈ (0, 2(1−θ) ). Justify the choice of a∗ = 2(3−θ)
1
for
√ − 2
the exponent a and derive that the resulting rate of decay of the quadratic error is O M 3−θ .
The above exercise shows that the (lack of) regularity of x 7→ F (x, Z) in L2 (P) does impact the
rate of convergence of the finite difference method.
∀ ξ ∈ (x − ε0 , x + ε0 ), F (ξ, Z) ∈ Lp (P).
f 0 (x) = E ∂x F (x, ω) .
292 CHAPTER 9. BACK TO SENSITIVITY COMPUTATION
The usual criterion to establish Lp -differentiability, especially when the underlying source of
randomness comes from a diffusion (see below), is to establish a pathwise differentiability of
ξ 7→ F (ξ, Z(ω)) combined with an Lp -uniform integrability property of the ratio F (x,Z)−F x−ξ
(ξ,Z)
(see Theorem 11.2 and the Corollary that follows in Chapter 11 for a short background on uniform
integrability).
Usually, this is applied with p = 2 since one needs ∂x F (x, ω) to be in L2 (P) to ensure that the
Central Limit Theorem applies to rule the rate of convergence in the Monte Carlo simulation.
This can be summed up in the following proposition which proof is obvious.
Theorem 9.1 (a) Let b : R+ × Rd → Rd , ϑ : R+ × Rd → M(d, q, R), with regularity Cb1 with
bounded α-Hölder partial derivatives for an α > 0. Let X x = (Xtx )t≥0 denote the unique strong
solution of the SDE
dXt = b(t, Xt )dt + ϑ(t, Xt )dWt , X0 = x ∈ Rd , (9.12)
∂(Xtx )i
h i
gradient Yt (x) := ∇x Xtx = ∂xj
satisfies the linear stochastic differential system
1≤i,j≤d
d Z t q Z t
d X
∂bi ∂ϑik
Ytij (x)
X X
x `j
∀ t ∈ R+ , = δij + `
(s, Xs )Ys (x)ds+ `
(s, Xsx )Ys`j (x)dWsk , 1 ≤ i, j ≤ d,
0 ∂y 0 ∂y
`=1 `=1 k=1
Remark. One easily derives from the above theorem the slightly more general result about the
tangent process to the solution (Xst,x )s∈[t,T ] starting from x at time t. This process, denoted
Y (tx)s )σ≥t can be deduced from Y (x) be
∀ s ≥ t, Y (t, x)s = Y (x)s Y −1 (x)t .
This is a consequence often unique ness of the solution of a linear SDE.
Remark. • Higher order differentiability properties hold true if b are ϑ are smoother. For a more
precise statement, see Section 9.2.2 below.
so that, in the Black-Scholes model (b(t, x) = rx, ϑ(t, x) = ϑx), one retrieves that
d x Xx
Xt = Yt (x) = t .
dx x
Exercise. Let d = q = 1. Show that under the assumptions of Theorem 9.1, the tangent process
d
at x Yt (x) = dx Xtx satisfies
Yt (x)
sup ∈ Lp (P), p > 0.
Y
s,t∈[0,T ] s (x)
Applications to δ-hedging. The tangent process and the δ hedge are closely related. Assume
that the interest rate is 0 (for convenience) and that a basket is made up of d risky assets whose
price dynamics (Xtx )t∈[0,T ] , X0x = x ∈ (0, +∞)d , (Xtx )t∈[0,T ] is solution to (9.12).
Then the premium of the payoff h(XTx ) on the basket is given by
f (x) := E h(XTx ).
The δ-hedge vector of this option (at time 0 and) at x = (x1 , . . . , xd ) ∈ (0, +∞)d is given by
∇f (x).
We have the following proposition that establishes the existence and the representation of
f 0 (x) as an expectation (with in view its computation by a Monte Carlo simulation). It is a
straightforward application of Theorem 2.1(b).
294 CHAPTER 9. BACK TO SENSITIVITY COMPUTATION
Remark. One can also consider a forward start payoff h(XTx1 , . . . , XTx ). Then, under similar
N
assumptions its premium v(x) is differentiable and
N
X D E
∇f (x) = E ∇yj h(XTx1 , . . . , XTx )|∇x XTx .
N j
j=1
Computation by simulation
One uses these formulae to compute some sensibility by Monte Carlo simulations since sensitivity
parameters are functions of the couple made by the diffusion process X x solution to the SDE
starting at x and its tangent process ∇x X x at x: it suffices to consider the Euler scheme of this
couple (Xtx , ∇x Xtx ) over [0, T ] with step Tn .
Assume d = 1 for notational convenience and set Yt = ∇x Xtx :
In 1-dimension, one can take advantage of the semi-closed formula (9.13) obtained in the above
exercise for the tangent process.
Set xe = (x, θ) and Xetx0 := (Xtx , θt ). Thus, following Theorems 3.1 and 3.3 of Section 3 in [87], if
(θ, x) 7→ b(θ, x) and (θ, x) 7→ ϑ(θ, x) are Cbk+α (0 < α < 1) with respect to x and θ (4 ) then the
solution of the SDE at a given time t will be C k+β (0 < β < α) as a function of (x and θ). A more
specific approach would show that some regularity in the sole variable θ would be enough but then
this result does not follow for free from the general theorem of the differentiability of the flows.
Assume b = b(θ, . ) and ϑ = ϑ(θ, . ) and the initial value x = x(θ), θ ∈ Θ, Θ ⊂ Rq (open set).
One can also differentiate a SDE with respect to this parameter θ. We can assume that q = 1 (by
considering a partial derivative if necessary). Then, one gets
Z t
∂Xt (θ) ∂x(θ) ∂b ∂b ∂Xs (θ)
= + θ, Xs (θ) + θ, Xs (θ) ds
∂θ ∂θ 0 ∂θ ∂x ∂θ
Z t
∂ϑ ∂ϑ ∂Xs (θ)
+ θ, Xs (θ) + θ, Xs (θ) dWs .
0 ∂θ ∂x ∂θ
As a conclusion let us mention that this tangent process approach is close to the finite difference
method applied to F (x, ω) = h(XTx (ω)): it appears as a limit case of the finite difference method.
measure µ on Rd . Assume this density is positive on a domain D of Rd for every θ ∈ Θ. Then one
could derive the sensitivity of functions
Z
f (θ) := E ϕ(X(θ)) = ϕ(y)p(θ, y)µ(dy)
D
(with respect to θ ∈ Θ, ϕ Borel function with appropriate integrability assumptions) provided the
density function p is smooth enough as a function of θ, regardless of the regularity of ϕ. In fact,
this was briefly developed in a one dimensional Black-Scholes framework but the extension to an
abstract framework is straightforward and yields the following result.
then
0
∂ log p
∀ θ ∈ Θ, f (θ) = E ϕ X(θ) (θ, X(θ)) .
∂θ
Let pT (θ, x, y) and p̄T (θ, x, y) denote the density of XTx (θ) and its Euler scheme X̄Tx (θ) (with
step size Tn ). Then one may naturally propose the following naive approximation
0 x ∂ log pT x x ∂ log p̄T x
f (θ) = E ϕ(XT (θ)) (θ, x, XT (θ)) ≈ E ϕ(X̄T (θ)) (θ, x, X̄T (θ)) .
∂θ ∂θ
In fact the story is not as straightforward because what can be made explicit and tractable is
the density of the whole n-tuple (X̄txn , . . . , X̄txn , . . . , X̄txnn ) (with tnn = T ).
1 k
(a) Then the distribution PX̄ x (dy) of X̄ xT has a probability density given by
T n
n
1 n T ∗ ∗ −1 T
p̄ T (θ, x, y) = d e− 2T (y−x− n b(θ,x)) (ϑϑ (θ,x)) (y−x− n b(θ,x)) .
(2π Tn ) 2
p
n ∗
det ϑϑ (θ, x)
(b) The distribution P(X̄ xn ,...,X̄ xn ,...,X̄ xn ) (dy1 , . . . , dyn ) of the n-tuple (X̄txn , . . . , X̄txn , . . . , X̄txnn ) has a
t1 t tn 1 k
k
probability density given by
n
Y
p̄tn ,...,tnn (θ, x, y1 , . . . , yn ) = p̄ T (θ, yk−1 , yk )
1 n
k=1
Proof. (a) is a straightforward consequence of the definition of the Euler scheme at time $\frac Tn$ and of the formula for the density of a Gaussian vector. Claim (b) follows from an easy induction based on the Markov property satisfied by the Euler scheme [Details in progress, to be continued...]. ♦
The above proposition shows that every marginal $\bar X^x_{t^n_k}$ has a density which, unfortunately, cannot be made explicit. So, to take advantage of the above closed form for the density of the $n$-tuple, we can write
$$ f'(\theta) \approx E\Big[\varphi\big(\bar X^x_T(\theta)\big)\,\frac{\partial\log\bar p_{t^n_1,\dots,t^n_n}}{\partial\theta}\big(\theta,x,\bar X^x_{t^n_1}(\theta),\dots,\bar X^x_{t^n_n}(\theta)\big)\Big]
= E\Big[\varphi\big(\bar X^x_T(\theta)\big)\sum_{k=1}^n\frac{\partial\log\bar p_{\frac Tn}}{\partial\theta}\big(\theta,\bar X^x_{t^n_{k-1}}(\theta),\bar X^x_{t^n_k}(\theta)\big)\Big]. $$
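As a toy illustration of this summed-score approximation, the sketch below estimates the sensitivity of $E\,\varphi(\bar X_T(\theta))$ with respect to $\theta=\sigma$ for the Euler scheme of a one-dimensional SDE, using the Gaussian transition densities $\bar p_{T/n}$ above. The coefficient functions, the choice $\theta=\sigma$ and the test payoff (a Black–Scholes-type vega) are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def vega_score_euler(x0, theta, b, vartheta, db_dtheta, dvth_dtheta,
                     phi, T, n, M, rng):
    """Estimate d/dtheta E[phi(X_T(theta))] by the log-likelihood (score) method
    applied to the Euler scheme: the weight is the sum over the Euler transitions
    of d/dtheta log pbar_{T/n}(theta, X_{k-1}, X_k) (1-dimensional case)."""
    dt = T / n
    X = np.full(M, x0, dtype=float)
    score = np.zeros(M)
    for _ in range(n):
        m, s = b(theta, X) * dt, vartheta(theta, X) * np.sqrt(dt)   # mean increment / std dev
        Xnew = X + m + s * rng.normal(size=M)
        z = (Xnew - X - m) / s
        # d/dtheta of log of the Gaussian transition density N(X + m, s^2) at Xnew
        dm = db_dtheta(theta, X) * dt
        ds = dvth_dtheta(theta, X) * np.sqrt(dt)
        score += z * dm / s + (z**2 - 1.0) * ds / s
        X = Xnew
    return np.mean(phi(X) * score)

# Illustrative test: Black-Scholes vega of a call, theta = sigma
r, K, T = 0.04, 50.0, 1.0
rng = np.random.default_rng(1)
est = vega_score_euler(
    x0=50.0, theta=0.2,
    b=lambda th, x: r * x, vartheta=lambda th, x: th * x,
    db_dtheta=lambda th, x: np.zeros_like(x), dvth_dtheta=lambda th, x: x,
    phi=lambda x: np.exp(-r * T) * np.maximum(x - K, 0.0),
    T=T, n=50, M=200_000, rng=rng)
print(est)   # should be close to the Black-Scholes vega
```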
At this stage it appears clearly that the method also works for path-dependent problems, i.e. when considering $\Phi\big((X_t(\theta))_{t\in[0,T]}\big)$ instead of $\varphi(X_T(\theta))$ (at least for specific functionals $\Phi$ involving time averaging, a finite number of instants, a supremum, an infimum, etc.). This leads to new difficulties, in connection with the Brownian bridge method for diffusions, that need to be overcome.
Finally, let us mention that evaluating the rate of convergence of these approximations from a theoretical point of view is quite a challenging problem since it involves not only the rate of convergence of the Euler scheme itself but also that of the probability density function of the scheme toward that of the diffusion (see [10]).
Theorem 9.2 (Bismut formula) Let $W=(W_t)_{t\in[0,T]}$ be a standard Brownian motion on a probability space $(\Omega,\mathcal A,P)$ and let $\mathcal F:=(\mathcal F_t)_{t\in[0,T]}$ be its augmented (hence càd) natural filtration. Let $X^x=(X^x_t)_{t\in[0,T]}$ be a diffusion process solution to the SDE
$$ dX^x_t = b(X^x_t)\,dt + \vartheta(X^x_t)\,dW_t,\qquad X^x_0=x, $$
where $b$ and $\vartheta$ are $C^1_b$ (hence Lipschitz continuous). Let $f:\mathbb{R}\to\mathbb{R}$ be a continuously differentiable function such that
$$ E\big[f^2(X^x_T)+(f')^2(X^x_T)\big]<+\infty. $$
Let $(H_t)_{t\in[0,T]}$ be an $\mathcal F$-progressively measurable ($^5$) process lying in $L^2([0,T]\times\Omega,dt\otimes dP)$, i.e. satisfying $E\int_0^TH_s^2\,ds<+\infty$. Then
$$ E\Big[f(X^x_T)\int_0^TH_s\,dW_s\Big] = E\Big[f'(X^x_T)\,Y_T\int_0^T\frac{\vartheta(X^x_s)H_s}{Y_s}\,ds\Big] $$
where $Y_t=\frac{dX^x_t}{dx}$ is the tangent process of $X^x$ at $x$.
Proof (Sketch). To simplify the arguments, we will assume that $|H_t|\le C<+\infty$, $C\in\mathbb{R}_+$, and that $f$ and $f'$ are bounded functions. Let $\varepsilon\ge0$. Set, on the probability space $(\Omega,\mathcal F_T,P)$,
$$ P_\varepsilon = L^{(\varepsilon)}_T\cdot P $$
where
$$ L^{(\varepsilon)}_t = \exp\Big(-\varepsilon\int_0^tH_s\,dW_s - \frac{\varepsilon^2}{2}\int_0^tH_s^2\,ds\Big),\qquad t\in[0,T], $$
is a $P$-martingale since $H$ is bounded. It follows from Girsanov's Theorem that
$$ W^\varepsilon_t := W_t+\varepsilon\int_0^tH_s\,ds,\qquad t\in[0,T], $$
is a $\big(P^{(\varepsilon)},(\mathcal F_t)_{t\in[0,T]}\big)$-Brownian motion. Consequently (see [141], Theorem 1.11, p. 372), $X$ has the same distribution under $P^{(\varepsilon)}$ as $X^{(\varepsilon)}$, solution to
$$ dX^{(\varepsilon)}_t = \big(b(X^{(\varepsilon)}_t)-\varepsilon H_t\vartheta(X^{(\varepsilon)}_t)\big)\,dt + \vartheta(X^{(\varepsilon)}_t)\,dW_t,\qquad X^{(\varepsilon)}_0=x. $$
$^5$ This means that, for every $t\in[0,T]$, $(H_s(\omega))_{(s,\omega)\in[0,t]\times\Omega}$ is $\mathcal Bor([0,t])\otimes\mathcal F_t$-measurable.
where we used once again Theorem 2.1 and the obvious fact that $X^{(0)}=X$.
Using the tangent process method with $\varepsilon$ as an auxiliary variable, one derives that the process $U_t:=\frac{\partial X^{(\varepsilon)}_t}{\partial\varepsilon}\Big|_{\varepsilon=0}$ satisfies
It is clear by the Lebesgue dominated convergence Theorem that $H^{(n)}$ converges to $H$ in $L^2([0,T]\times\Omega,dt\otimes dP)$. Then one checks that both sides of Bismut's identity are continuous with respect to this topology (using Hölder's Inequality), which yields the announced result. The extension to continuous functions with polynomial growth relies on an approximation argument. ♦
Remarks. • One retrieves, in the case of a Black–Scholes model, the formula (2.6) obtained for the $\delta$ in Section 2.3 by an elementary integration by parts, since $Y_t=\frac{X^x_t}{x}$ and $\vartheta(x)=\sigma x$; a numerical sketch is given below.
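As a quick numerical check of this remark, the sketch below estimates the delta of a Black–Scholes call by the weighted estimator $E\big[f(X^x_T)\,\frac{W_T}{x\sigma T}\big]$, which is what Bismut's formula yields with the constant choice $H_s=\frac{1}{x\sigma T}$ (in this model $\vartheta(X_s)/Y_s=\sigma x$, so $\int_0^T\vartheta(X_s)H_s/Y_s\,ds=1$). Whether this coincides literally with formula (2.6) referred to above is not restated here; the payoff and parameter values are illustrative.

```python
import numpy as np

# Black-Scholes delta of a call via the weight W_T/(x*sigma*T):
# E[f(X_T) W_T/(x sigma T)] = E[f'(X_T) Y_T] = d/dx E[f(X_T)] in this model.
x0, K, r, sigma, T, M = 50.0, 50.0, 0.04, 0.2, 1.0, 500_000
rng = np.random.default_rng(2)
W_T = np.sqrt(T) * rng.normal(size=M)
X_T = x0 * np.exp((r - 0.5 * sigma**2) * T + sigma * W_T)
f = np.exp(-r * T) * np.maximum(X_T - K, 0.0)                     # discounted payoff
delta_weighted = np.mean(f * W_T / (x0 * sigma * T))              # derivative-free estimator
delta_tangent = np.mean(np.exp(-r * T) * (X_T > K) * X_T / x0)    # tangent process, for comparison
print(delta_weighted, delta_tangent)
```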
• Note that the assumption
$$ \int_0^T\Big(\frac{Y_t}{\vartheta(X^x_t)}\Big)^2dt<+\infty $$
is basically an ellipticity assumption. Thus if $\vartheta^2(x)\ge\varepsilon_0>0$, one checks that the assumption is always satisfied.
Exercises. 1. Apply what precedes to get a formula for the $\gamma$-parameter in a general diffusion model. [Hint: apply the above “derivative free” formula to the $\delta$-formula obtained by the tangent process method.]
2. Show that if $b'-b\,\frac{\vartheta'}{\vartheta}-\frac12\vartheta''\vartheta=c\in\mathbb{R}$, then
$$ \frac{\partial}{\partial x}\,E\,f(X^x_T) = \frac{e^{cT}}{\vartheta(x)}\,E\big[f(X^x_T)\,W_T\big]. $$
with Lipschitz continuous coefficients $b$ and $\sigma$. We denote by $X^x=(X^x_t)_{t\in[0,T]}$ its unique solution starting at $x$ and by $(\mathcal F_t)_{t\in[0,T]}$ the (augmented) filtration of the Brownian motion $W$. We state the result in a one-dimensional setting for (at least notational) convenience.
Theorem 9.3 Let $F:\big(\mathcal C([0,T],\mathbb{R}),\|\,.\,\|_{\sup}\big)\to\mathbb{R}$ be a differentiable functional with differential $DF$. Then
$$ F(X^x) = E\,F(X^x) + \int_0^T E\Big(DF(X).\big(\mathbf 1_{[t,T]}Y^{(t)}_.\big)\,\Big|\,\mathcal F_t\Big)\,\vartheta(X_t)\,dW_t. $$
So, the Clark–Ocone–Haussmann formula provides a kind of closed form for the process $H$.
• The differential $DF(\xi)$ of the functional $F$ at an element $\xi\in\mathcal C([0,T],\mathbb{R})$ is a continuous linear form on $\mathcal C([0,T],\mathbb{R})$. Hence, by the Riesz representation Theorem (see e.g. [30]), it can be represented by a finite signed measure, say $\mu_{DF(\xi)}(ds)$, so that the term $DF(\xi).\big(\mathbf 1_{[t,T]}Y^{(t)}_.\big)$ reads
$$ DF(\xi).\big(\mathbf 1_{[t,T]}Y^{(t)}_.\big) = \int_0^T\mathbf 1_{[t,T]}(s)\,Y^{(t)}_s\,\mu_{DF(\xi)}(ds) = \int_t^TY^{(t)}_s\,\mu_{DF(\xi)}(ds). $$
What is called Malliavin calculus is a way to extend this notion of differentiation to more general functionals, using functional analysis arguments (closure of operators, etc.), e.g. via the domain of the operator $D_tF$ (see []).
and perform a (stochastic) integration by parts. Owing to (9.17), we get, under appropriate integrability conditions,
$$ E\Big[f(X^x_T)\int_0^TH_s\,dW_s\Big] = 0 + E\int_0^T[\dots]\,dM_t + E\int_0^T[\dots]\,dN_t + E\int_0^TE\big(f'(X_T)Y^{(s)}_T\,\big|\,\mathcal F_s\big)\,\vartheta(X^x_s)H_s\,ds
= E\int_0^TE\big(f'(X_T)Y^{(s)}_T\,\big|\,\mathcal F_s\big)\,\vartheta(X^x_s)H_s\,ds $$
owing to Fubini's Theorem. Then, using the characterization of conditional expectation to get rid of the conditioning, we obtain
$$ E\Big[f(X^x_T)\int_0^TH_s\,dW_s\Big] = E\int_0^Tf'(X_T)\,Y^{(s)}_T\,\vartheta(X^x_s)H_s\,ds. $$
Finally, a reverse application of Fubini's Theorem and the identity $Y^{(s)}_T=\frac{Y_T}{Y_s}$ lead to
$$ E\Big[f(X^x_T)\int_0^TH_s\,dW_s\Big] = E\Big[f'(X_T)\,Y_T\int_0^T\frac{\vartheta(X^x_s)}{Y_s}H_s\,ds\Big]. $$
$$ F(x,Z) = \big(1-\varphi(x-Z)\big)F(x,Z) + \varphi(x-Z)F(x,Z) := F_1(x,Z)+F_2(x,Z). $$
Functions $\varphi$ can be obtained as mollifiers from convolution theory, but other choices are possible, like simply Lipschitz continuous functions (see the numerical illustration in Section 9.4.4).
Set $f_i(x)=E\,F_i(x,Z)$ so that $f(x)=f_1(x)+f_2(x)$. Then one may use a direct differentiation to compute
$$ f_1'(x) = E\Big[\frac{\partial F_1(x,Z)}{\partial x}\Big] $$
(or a finite difference method with constant or decreasing increments). As concerns $f_2'(x)$, since $F_2(x,Z)$ is singular, it is natural to look for a weighted estimator
reproduced in Figures 9.1 and 9.2 (pay attention to the scale of the $y$-axis in each figure).
Figure 9.1: Payoff $h_1$ of the digital Call with strike $K=50$.
Figure 9.2: Payoff $h_S$ of the asset-or-nothing Call with strike $K=50$.
Denoting by $F(x,z)=e^{-rT}h_1\big(x\,e^{(r-\frac{\sigma^2}2)T+\sigma\sqrt T\,z}\big)$ and $F(x,z)=e^{-rT}h_S\big(x\,e^{(r-\frac{\sigma^2}2)T+\sigma\sqrt T\,z}\big)$ the functionals in the digital Call case and in the asset-or-nothing Call case respectively, we consider $f(x)=E\,F(x,Z)$ where $Z$ is a standard Gaussian variable. With both payoff functions, we are in the singular setting in which $F(\,.\,,Z)$ is not Lipschitz continuous but only $\frac12$-Hölder in $L^2$. As expected, we are interested in computing the delta of the two options, i.e. $f'(x)$.
In such a singular case, the variance of the finite difference estimator explodes as $\varepsilon\to0$ (see Proposition 9.2) and $\xi\mapsto F(\xi,Z)$ is not $L^p$-differentiable for $p\ge1$, so that the tangent process approach cannot be used (see Section 9.2.2).
We first illustrate the variance explosion in Figures 9.3 and 9.4, where the parameters have been set to $r=0.04$, $\sigma=0.1$, $T=1/12$ (one month), $x_0=K=50$ and $M=10^6$.
Figure 9.3: Variance of the two estimators (crude and RR extrapolation) as a function of $\varepsilon$ (digital Call).
Figure 9.4: Variance of the two estimators (crude and RR extrapolation) as a function of $\varepsilon$ (asset-or-nothing Call).
To avoid the explosion of the variance, one considers a smooth (namely Lipschitz continuous) approximation of both payoffs. Given a small parameter $\eta>0$, one defines
$$ h_{1,\eta}(\xi)=\begin{cases}h_1(\xi)&\text{if }|\xi-K|>\eta,\\ \frac{1}{2\eta}\,\xi+\frac12\big(1-\frac K\eta\big)&\text{if }|\xi-K|\le\eta,\end{cases}
\qquad
h_{S,\eta}(\xi)=\begin{cases}h_S(\xi)&\text{if }|\xi-K|>\eta,\\ \frac{K+\eta}{2\eta}\,\xi+\frac{K+\eta}2\big(1-\frac K\eta\big)&\text{if }|\xi-K|\le\eta.\end{cases} $$
The control of the variance in the smooth case is illustrated in Figures 9.5 and 9.6 when $\eta=2$, and in Figures 9.7 and 9.8 when $\eta=0.5$. The variance increases when $\eta$ decreases to $0$ but does not explode as $\varepsilon$ goes to $0$.
For a given ε, note that the variance is usually higher using the RR extrapolation. However,
in the Lipschitz continuous case the variance of the RR estimator and that of the crude finite
difference converge toward the same value when ε goes to 0. Moreover from (9.18) we deduce the
choice ε = O(M −1/8 ) to keep the balance between the bias term of the RR estimator and the
variance term.
As a consequence, for a given level of the L2 -error, we can choose a bigger ε with the RR
estimator, which reduces the bias without increasing the variance.
Figure 9.5: Variance of the two estimators as a function of $\varepsilon$ (digital Call). Smooth payoff with $\eta=1$.
Figure 9.6: Variance of the two estimators as a function of $\varepsilon$ (asset-or-nothing Call). Smooth payoff with $\eta=1$.
Figure 9.7: Variance of the two estimators as a function of $\varepsilon$ (digital Call). Smooth payoff with $\eta=0.5$.
Figure 9.8: Variance of the two estimators as a function of $\varepsilon$ (asset-or-nothing Call). Smooth payoff with $\eta=0.5$.
The parameters of the model are the following: $x_0=50$, $K=50$, $r=0.04$, $\sigma=0.1$ and $T=\frac1{52}$ (one week) or $T=\frac1{12}$ (one month). The number of Monte Carlo simulations is fixed to $M=10^6$. We now compare the following estimators for the two payoffs and the two maturities $T=1/12$ (one month) and $T=1/52$ (one week):
• Finite difference estimator on the non-smooth payoffs $h_1$ and $h_S$ with $\varepsilon=M^{-\frac14}\simeq0.03$.
• Crude weighted estimator (with standard Black–Scholes $\delta$-weight) on the non-smooth payoffs $h_1$ and $h_S$.
• Localization: finite difference estimator on the smooth payoffs $h_{1,\eta}$ and $h_{S,\eta}$ with $\eta=1.5$ and $\varepsilon=M^{-\frac14}$, combined with the weighted estimator on the (non-smooth) differences $h_1-h_{1,\eta}$ and $h_S-h_{S,\eta}$ (a sketch of this localized estimator is given below).
The results are summarized in Tables ?? and ?? for the delta of the digital Call option and in Tables ?? and ?? for that of the asset-or-nothing Call option.
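A minimal sketch of the localization idea in the Black–Scholes digital Call case is given below: the delta is split into a finite difference estimator applied to the Lipschitz part $h_{1,\eta}$ and a weighted estimator applied to the singular residual $h_1-h_{1,\eta}$, using the standard Black–Scholes $\delta$-weight $\frac{W_T}{x\sigma T}$. The function and variable names are illustrative; the parameter values are those used above.

```python
import numpy as np

x0, K, r, sigma, T, M, eta = 50.0, 50.0, 0.04, 0.1, 1.0 / 12, 10**6, 1.5
eps = M ** (-0.25)                                   # finite difference increment
rng = np.random.default_rng(3)

def h1(xi):                                          # digital Call payoff
    return (xi >= K).astype(float)

def h1_eta(xi):                                      # Lipschitz (smoothed) payoff
    return np.where(np.abs(xi - K) > eta, h1(xi),
                    xi / (2 * eta) + 0.5 * (1 - K / eta))

def S_T(x, Z):                                       # Black-Scholes terminal value
    return x * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

Z = rng.normal(size=M)
W_T = np.sqrt(T) * Z
disc = np.exp(-r * T)

# smooth part: symmetric finite difference on the Lipschitz payoff h_{1,eta}
fd_smooth = disc * (h1_eta(S_T(x0 + eps, Z)) - h1_eta(S_T(x0 - eps, Z))) / (2 * eps)
# singular residual: weighted estimator with the Black-Scholes delta weight W_T/(x sigma T)
residual = disc * (h1(S_T(x0, Z)) - h1_eta(S_T(x0, Z))) * W_T / (x0 * sigma * T)

delta_loc = np.mean(fd_smooth + residual)
print(delta_loc, np.std(fd_smooth + residual) / np.sqrt(M))   # estimate and standard error
```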
[In progress...]
• Multi-dimensional case
• Variance reduction?
Chapter 10
Multi-asset American/Bermuda
Options, swing options
10.1 Introduction
In this chapter devoted to numerical methods for multi-asset American and Bermuda options, we will consider a slightly more general framework than the non-negative dynamics of traded assets in complete or incomplete markets. That is, we will switch from a dynamics $(S_t)_{t\in[0,T]}$ to a more general $\mathbb{R}^d$-valued Brownian diffusion $(X_t)_{t\in[0,T]}$ satisfying the SDE
(SC) ≡ $f$ satisfies (Lip) and is semi-convex in the following sense: there exists $\delta_f:[0,T]\times\mathbb{R}^d\to\mathbb{R}^d$, Borel and bounded, and $\rho\in(0,\infty)$ such that
$$ h(t,y)-h(t,x) \ge \big(\nabla_xh(t,x)\,\big|\,y-x\big)-[\nabla_xh]_{\rm Lip}\,|\xi-x|\,|y-x|,\quad\xi\in[x,y],
\ \ge\ \big(\nabla_xh(t,x)\,\big|\,y-x\big)-[\nabla_xh]_{\rm Lip}\,|y-x|^2. $$
Proposition 10.1 (see [148]) There exists a continuous function $F:[0,T]\times\mathbb{R}^d\to\mathbb{R}_+$ such that
$$ F(t,X_t) = P\text{-}\mathrm{esssup}\,\Big\{E\big(e^{-r\tau}f(\tau,X_\tau)\,\big|\,\mathcal F_t\big),\ \tau\in\mathcal T^{\mathcal F}_t\Big\} $$
where $\mathcal T^{\mathcal F}_t$ denotes the set of $(\mathcal F_s)_{s\in[0,T]}$-stopping times having values in $[t,T]$.
Note that $F(0,x)$ is nothing but the premium of the American option with payoff $f(t,X^x_t)$, $t\in[0,T]$.
transitions
$$ P_k(x,dy) = P\big(X_{t^n_{k+1}}\in dy\,\big|\,X_{t^n_k}=x\big),\qquad k=0,\dots,n-1, $$
where $(X^{t^n_k,x}_s)_{s\in[t^n_k,T]}$ is the unique solution to the (SDE) starting from $x$ at time $t^n_k$.
(a) $\big(e^{-rt^n_k}F_n(t^n_k,X_{t^n_k})\big)_{k=0,\dots,n}$ is the $\big(P,(\mathcal F_{t^n_k})_k\big)$-Snell envelope of $\big(e^{-rt^n_k}f(t^n_k,X_{t^n_k})\big)_{k=0,\dots,n}$.
(b) $\big(F_n(t^n_k,X_{t^n_k})\big)_{k=0,\dots,n}$ satisfies the “pathwise” backward dynamic programming principle (denoted $BDPP$ in the sequel)
$$ F_n(T,X_T)=f(T,X_T) $$
and
$$ F_n(t^n_k,X_{t^n_k}) = \max\Big(f(t^n_k,X_{t^n_k}),\,e^{-r\frac Tn}\,E\big(F_n(t^n_{k+1},X_{t^n_{k+1}})\,\big|\,\mathcal F^W_{t^n_k}\big)\Big),\qquad k=0,\dots,n-1. $$
(c) The functions $F_n(t^n_k,\,.\,)$ satisfy the “functional” backward dynamic programming principle
$$ F_n(T,x)=f(T,x)\quad\text{and}\quad F_n(t^n_k,x)=\max\Big(f(t^n_k,x),\,e^{-r\frac Tn}\,P_k\big(F_n(t^n_{k+1},\,.\,)\big)(x)\Big),\qquad k=0,\dots,n-1. $$
Proof. We proceed backwards as well. We consider the functions $F_n(t^n_k,\,.\,)$, $k=0,\dots,n$, as defined in (c).
(c) ⇒ (b). The result follows from the Markov property, which implies, for every $k=0,\dots,n-1$,
$$ E\big(F_n(t^n_{k+1},X_{t^n_{k+1}})\,\big|\,\mathcal F^W_{t^n_k}\big) = P_k\big(F_n(t^n_{k+1},\,.\,)\big)(X_{t^n_k}). $$
(b) ⇒ (a). This is a trivial consequence of the ($BDPP$) since $\big(e^{-rt^n_k}F_n(t^n_k,X_{t^n_k})\big)_{k=0,\dots,n}$ is the Snell envelope associated with the obstacle sequence $\big(e^{-rt^n_k}f(t^n_k,X_{t^n_k})\big)_{k=0,\dots,n}$.
Applying what precedes to the case $X_0=x$, we derive from the general theory of optimal stopping that
$$ F_n(0,x) = \sup\Big\{E\,e^{-r\tau}f(\tau,X_\tau),\ \tau\in\mathcal T^n_{[0,T]}\Big\}. $$
The extension to the times $t^n_k$, $k=1,\dots,n$, follows likewise from the same reasoning carried out with $(X^{t^n_k,x}_{t^n_\ell})_{\ell=k,\dots,n}$. ♦
Remark-Exercise. Given that the flow of the SDE is Lipschitz continuous from $\mathbb{R}^d$ to any $L^p(P)$, $1\le p<+\infty$ (see Theorem 7.10), it is straightforward to establish that the functions $F_n(t^n_k,\,.\,)$ are Lipschitz continuous, uniformly in $t^n_k$, $k=0,\dots,n$, $n\ge1$.
Now we pass to the (discrete time) Euler scheme as defined by Equation (7.3) in Chapter 7. We recall its definition for convenience (with a slight change of notation concerning the Gaussian noise):
$$ \bar X_{t^n_{k+1}} = \bar X_{t^n_k} + \frac Tn\,b\big(t^n_k,\bar X_{t^n_k}\big) + \sqrt{\frac Tn}\,\sigma\big(t^n_k,\bar X_{t^n_k}\big)\,Z_{k+1},\qquad \bar X_0=X_0,\quad k=0,\dots,n-1,\tag{10.2} $$
where $(Z_k)_{1\le k\le n}$ denotes a sequence of i.i.d. $\mathcal N(0;I_q)$-distributed random vectors given by
$$ Z_k := \sqrt{\frac nT}\,\big(W_{t^n_k}-W_{t^n_{k-1}}\big),\qquad k=1,\dots,n. $$
Proposition 10.3 (Euler scheme) The above proposition remains true when replacing the sequence $(X_{t^n_k})_{0\le k\le n}$ by its Euler scheme with step $\frac Tn$, still with the filtration $\big(\sigma(X_0,\mathcal F^W_{t^n_k})\big)_{0\le k\le n}$. In both cases one just has to replace the transitions $P_k(x,dy)$ of the original process by those of its Euler scheme with step $\frac Tn$, namely $\bar P^n_{k,k+1}(x,dy)$ defined by
$$ \bar P^n_{k,k+1}f(x) = E\,f\Big(x+\frac Tn\,b(t^n_k,x)+\sqrt{\frac Tn}\,\sigma(t^n_k,x)\,Z\Big),\qquad Z\sim\mathcal N(0;I_q), $$
and $F_n$ by $\bar F_n$.
Remark. In fact the result holds true as well with the (smaller) natural innovation filtration
FkX0 ,Z = σ(X0 , Z1 , . . . , Zk ), 0 ≤ k ≤ n, since the Euler scheme remains a Markov chain with the
same transitions with respect to this filtration
Now we state a convergence rate result for the “réduites” and the value functions. In what follows we consider the value function of the original continuous time optimal stopping problem, denoted $F(t,x)$ and defined by
$$ F(t,x) = \sup\Big\{E\,e^{-r\tau}f(\tau,X^x_\tau),\ \tau:(\Omega,\mathcal A)\to[t,T],\ (\mathcal F^W_t)\text{-stopping time}\Big\}\tag{10.3} $$
$$ \phantom{F(t,x)} = E\Big[P\text{-}\mathrm{esssup}\,\Big\{E\big(e^{-r\tau}f(\tau,X^x_\tau)\,\big|\,\mathcal F^W_t\big),\ \tau:(\Omega,\mathcal A)\to[t,T],\ (\mathcal F^W_t)\text{-stopping time}\Big\}\Big].\tag{10.4} $$
$$ \forall\,t\in[0,T],\qquad \{\tau\le t\}\in\mathcal F^W_t. $$
This means that you decide to leave the game between $0$ and $t$ based on your observation of the process $W$ between $0$ and $t$.
Asking $\{\tau=t\}$ to lie in $\mathcal F^W_t$ (“I decide to stop exactly at time $t$”) seems a more natural assumption, but in a continuous time framework this condition turns out to be technically not strong enough to make the theory work.
Exercise. Show that for any $\mathcal F^W$-stopping time $\tau$ one has, for every $t\in[0,T]$, $\{\tau=t\}\in\mathcal F^W_t$ (the converse is not true in general in a continuous time setting).
In practice, the underlying assumption that the Brownian motion $W$ is observable by the player is highly unrealistic, or at least induces dramatic limitations for the modeling of $(X_t)_{t\in[0,T]}$, which in Finance models the vector of market prices of the traded assets.
In fact, since the Brownian diffusion process $X$ is an $\mathcal F^W$-Markov process, one shows (see [149]) that one can replace, in the above definition and characterizations of $F(t,x)$, the natural filtration of the Brownian motion $W$ by that of $X$. This is clearly more in accordance with usual models in Finance (and elsewhere) since $X$ represents in most models the observable structure process.
Theorem 10.1 (see [13], 2003) (a) Discretization of the stopping rules for the structure process $X$: If $f$ satisfies (Lip), i.e. is Lipschitz continuous in $x$ uniformly in $t\in[0,T]$, then so are the value functions $F_n(t^n_k,\,.\,)$ and $F(t^n_k,\,.\,)$, uniformly with respect to $t^n_k$, $k=0,\dots,n$, $n\ge1$. Furthermore $F(t^n_k,\,.\,)\ge F_n(t^n_k,\,.\,)$ and
$$ \Big\|\max_{0\le k\le n}\big|F(t^n_k,X_{t^n_k})-F_n(t^n_k,X_{t^n_k})\big|\Big\|_p \le \frac{C_{b,\sigma,f,T}}{\sqrt n}. $$
(b) If $f$ is semi-convex, then there exist real constants $C_{b,\sigma,f,T}$ and $C_{b,\sigma,f,T,K}>0$ such that
$$ \Big\|\max_{0\le k\le n}\big|F_n(t^n_k,X_{t^n_k})-F(t^n_k,X_{t^n_k})\big|\Big\|_p \le \frac{C_{b,\sigma,f,T}}{n}. $$
(c) Euler approximation scheme $\bar X^n$: There exist real constants $C_{b,\sigma,f,T}$ and $C_{b,\sigma,f,T,K}>0$ such that
$$ \Big\|\max_{0\le k\le n}\big|F_n(t^n_k,X_{t^n_k})-\bar F_n(t^n_k,X_{t^n_k})\big|\Big\|_p \le \frac{C_{b,\sigma,f,T}}{\sqrt n}. $$
$$ \forall\,t\in[0,T],\qquad X_t=\varphi(t,W_t) $$
where $\varphi(t,x)$ can be computed at a very low cost. When $d=1$ (although of smaller interest for applications in view of the available analytical methods based on variational inequalities), one can also rely on the exact simulation method for one-dimensional diffusions (see [23]).
In the general case, we will rely on two successive discretization steps: one for the optimal stopping rules and one for the underlying process, to make it simulatable. However, in both cases, as far as numerics are concerned, we will rely on the $BDPP$, which itself requires the computation of conditional expectations.
In both cases we now have access to a simulatable Markov chain (either the Euler scheme or the process itself sampled at the times $t^n_k$). This task requires a new discretization phase, making possible the computation of the discrete time Snell envelope (and of its réduite).
$$ 0\le Z_k=f_k(X_k),\qquad k=0,\dots,n. $$
We want to compute the so-called $\big(P,(\mathcal F_k)_{0\le k\le n}\big)$-Snell envelope $U=(U_k)_{0\le k\le n}$ defined by
$$ U_k = \mathrm{esssup}\Big\{E\big(Z_\tau\,\big|\,\mathcal F_k\big),\ \tau:(\Omega,\mathcal A)\to\{k,\dots,n\},\ (\mathcal F_\ell)_{0\le\ell\le n}\text{-stopping time}\Big\}. $$
Proposition 10.4 The Snell envelope $(U_k)_{0\le k\le n}$ is solution to the following Backward Dynamic Programming Principle
$$ U_n=Z_n\quad\text{and}\quad U_k=\max\big(Z_k,\,E(U_{k+1}\,|\,\mathcal F_k)\big),\qquad k=0,\dots,n-1. $$
Furthermore, for every $k\in\{0,\dots,n\}$ there exists a Borel function $u_k:\mathbb{R}^d\to\mathbb{R}$ such that
$$ U_k=u_k(X_k),\qquad k=0,\dots,n, $$
where
$$ u_n=f_n\quad\text{and}\quad u_k=\max\big(f_k,\,P_ku_{k+1}\big),\qquad k=0,\dots,n-1. $$
Moreover, general discrete time optimal stopping theory (with finite horizon) ensures that, for every $k\in\{0,\dots,n\}$, there exists an optimal stopping time $\tau_k$ for this problem when starting to play at time $k$, i.e. such that
$$ U_k=E\big(Z_{\tau_k}\,\big|\,\mathcal F_k\big). $$
Proof. We proceed by a backward induction on $k$. If $k=n$ the result is obvious. If $U_{k+1}=E\big(Z_{\tau_{k+1}}\,\big|\,\mathcal F_{k+1}\big)$, then
$$ E\big(Z_{\tau_{k+1}}\,\big|\,X_k\big) = E\Big(E\big(Z_{\tau_{k+1}}\,\big|\,\mathcal F_{k+1}\big)\,\Big|\,X_k\Big) = E\big(U_{k+1}\,\big|\,X_k\big) = E\big(u_{k+1}(X_{k+1})\,\big|\,X_k\big) = E\big(U_{k+1}\,\big|\,\mathcal F_k\big). $$
Furthermore,
$$ \tau_k = k\,\mathbf 1_{\{Z_k\ge E(U_{k+1}|\mathcal F_k)\}} + \tau_{k+1}\,\mathbf 1_{\{Z_k\ge E(U_{k+1}|\mathcal F_k)\}^c} $$
In practice, one may choose different functions at every time k, i.e. families ei,k so that
(ei,k (Xk ))i≥1 makes up a Hilbert basis of L2 (Ω, σ(Xk ), P).
and set
$$ \tau^{[N_n]}_n := n, $$
$$ \tau^{[N_k]}_k := k\,\mathbf 1_{\{Z_k>(\alpha_k|e_{[N_k]}(X_k))\}} + \tau^{[N_{k+1}]}_{k+1}\,\mathbf 1_{\{Z_k\le(\alpha_k|e_{[N_k]}(X_k))\}},\qquad k=0,\dots,n-1, $$
where
$$ \alpha^{[N_k]}_k := \mathrm{argmin}_{\alpha\in\mathbb{R}^{N_k}}\,E\Big(Z_{\tau^{[N_{k+1}]}_{k+1}} - \big(\alpha\,\big|\,e_{[N_k]}(X_k)\big)\Big)^2. $$
In fact this finite dimensional optimization problem has a well-known solution given by
$$ \alpha^{[N_k]}_k = \mathrm{Gram}\big(e_{[N_k]}(X_k)\big)^{-1}\Big(\big\langle Z_{\tau^{[N_{k+1}]}_{k+1}}\,\big|\,e^{[N_k]}_\ell(X_k)\big\rangle_{L^2(P)}\Big)_{1\le\ell\le N_k}. $$
This second approximation phase is itself decomposed into two successive phases:
Forward Monte Carlo simulation phase: Simulate (and store) $M$ independent copies $X^{(1)},\dots,X^{(m)},\dots,X^{(M)}$ of $X=(X_k)_{0\le k\le n}$ in order to have access to the empirical measure
$$ \frac1M\sum_{m=1}^M\delta_{X^{(m)}}. $$
Backward phase:
$$ \tau^{[N],m,M}_n := n $$
– for $k=n-1$ down to $0$, compute
$$ \alpha^{[N],M}_k := \mathrm{argmin}_{\alpha\in\mathbb{R}^N}\,\frac1M\sum_{m=1}^M\Big(Z^{(m)}_{\tau^{[N],m,M}_{k+1}} - \alpha\,.\,e_{[N]}\big(X^{(m)}_k\big)\Big)^2 $$
Finally, the resulting approximation of the mean value at the origin of the Snell envelope, also called “réduite”, reads
$$ E\,U_0 = E\big(Z_{\tau_0}\big) \approx E\big(Z_{\tau^{[N_0]}_0}\big) \approx \frac1M\sum_{m=1}^MZ^{(m)}_{\tau^{[N],m,M}_0}\qquad\text{as }M\to+\infty. $$
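The sketch below is a compact illustration of this two-phase regression procedure (forward simulation, then backward regression on a finite family of functions), in the spirit of the Longstaff–Schwartz algorithm, for a Bermudan put in a Black–Scholes model. The polynomial basis, the payoff, the in-the-money restriction of the regression (a usual practical refinement) and the parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative setup: Bermudan put, Black-Scholes dynamics, polynomial basis of size N
x0, K, r, sigma, T, n, M, N = 100.0, 100.0, 0.05, 0.2, 1.0, 50, 100_000, 4
dt = T / n
rng = np.random.default_rng(4)

# Forward phase: simulate and store M paths of X at times t_k = k T/n
Z = rng.normal(size=(M, n))
X = np.empty((M, n + 1)); X[:, 0] = x0
for k in range(n):
    X[:, k + 1] = X[:, k] * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z[:, k])

payoff = lambda x: np.maximum(K - x, 0.0)
basis = lambda x: np.vander(x / K, N)          # e_[N](x): monomials of degree < N

# Backward phase: tau_n = n, then regress Z_{tau_{k+1}} on e_[N](X_k) at each step
cash = payoff(X[:, n]) * np.exp(-r * T)        # discounted payoff at the current stopping time
for k in range(n - 1, 0, -1):
    exercise = payoff(X[:, k]) * np.exp(-r * k * dt)
    itm = exercise > 0                         # regress only on in-the-money paths
    A = basis(X[itm, k])
    alpha, *_ = np.linalg.lstsq(A, cash[itm], rcond=None)
    continuation = A @ alpha
    stop = exercise[itm] >= continuation       # stop where intrinsic value beats the regression
    idx = np.where(itm)[0][stop]
    cash[idx] = exercise[itm][stop]

price = max(payoff(x0), np.mean(cash))         # réduite at the origin
print(price, np.std(cash) / np.sqrt(M))
```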
Remarks. • One may rewrite formally the second approximation phase by simply replacing the distribution of the chain $(X_k)_{0\le k\le n}$ by the empirical measure
$$ \frac1M\sum_{m=1}^M\delta_{X^{(m)}}. $$
• The Gram matrices $\big[E\,e^{[N_k]}_i(X_k)\,e^{[N_k]}_j(X_k)\big]_{1\le i,j\le N_k}$ can be computed “off line”, in the sense that they are not “payoff dependent”, by contrast with the term $\big(\big\langle Z_{\tau^{[N_{k+1}]}_{k+1}}\,\big|\,e^{[N_k]}_\ell(X_k)\big\rangle_{L^2(P)}\big)_{1\le\ell\le N_k}$.
In various situations, it is even possible to have a closed form for this Gram matrix, e.g. when $(e_i(X_k))_{i\ge1}$ happens to be an orthonormal basis of $L^2(\Omega,\sigma(X_k),P)$. In that case the Gram matrix reduces to the identity matrix! So is the case, for example, when $X_k=\sqrt{\frac n{kT}}\,W_{\frac{kT}n}$, $k=0,\dots,n$, and $e_i=H_{i-1}$ where $(H_i)_{i\ge0}$ is the basis of Hermite polynomials (see [78], Chapter 3, p. 167).
• The algorithmic analysis of the above described procedure shows that its implementation requires
– a forward simulation of $M$ paths of the Markov chain,
– a backward nonlinear optimization phase in which all the (stored) paths have to interact through the computation of $\alpha^{[N],M}_k$, which depends on all the values $X^{(m)}_k$, $k\ge1$.
However, still in very specific situations, the forward phase can be skipped if a backward simulation method for the Markov chain $(X_k)_{0\le k\le n}$ is available. So is the case for the Brownian motion at times $\frac{kT}n$ using Brownian bridge simulation methods (see Chapter 8).
The rate of convergence of the Monte Carlo step of the procedure is ruled by a Central Limit Theorem stated below.
Theorem 10.2 (Clément–Lamberton–Protter (2002), see [35]) The Monte Carlo approximation satisfies a CLT:
$$ \sqrt M\Big(\frac1M\sum_{m=1}^MZ^{(m)}_{\tau^{[N],m,M}_k}-E\,Z_{\tau^{[N]}_k},\ \big(\alpha^{[N],M}_k-\alpha^{[N]}_k\big)\Big)_{0\le k\le n-1}\ \stackrel{\mathcal L}{\longrightarrow}\ \mathcal N(0;\Sigma). $$
Cons
• From a purely theoretical point of view, the regression approach does not provide error bounds or a rate of approximation for the convergence of $E\,Z_{\tau^{[N_0]}_0}$ toward $E\,Z_{\tau_0}=E\,U_0$, which is mostly ruled by the rate at which the family $e_{[N_k]}(X_k)$ “fills” $L^2(\Omega,\sigma(X_k),P)$ as $N_k$ goes to infinity. However, in practice this information would be of little interest since, especially in higher dimension, the possible choices for the size $N_k$ are limited by the storage capacity of the computing device.
• Almost all computations are made on-line since they are payoff dependent. However, note that the Gram matrix of $(e^{[N]}_\ell(X_k))_{1\le\ell\le N}$ can be computed off-line since it only depends on the structure process.
• The choice of the functions $e_i(x)$, $i\ge1$, is crucial and needs much care (and intuition). In practical implementations, it may vary at every time step. Furthermore, it may have a biased effect for options deep in or deep out of the money, since the coordinates on the functions $e_i(X_k)$ are computed locally, “where things happen most of the time”, inducing an effect on the prices at long range through their behaviour at infinity. On the other hand this choice of functions, if they are smooth, has a smoothing effect which can be interesting for users (if it does not induce hidden arbitrages. . . ). To overcome the first problem one may choose local functions like indicator functions of a Voronoi diagram (see the next Section 10.4 devoted to quantization tree methods, or Chapter ??), with the counterpart that no smoothing effect can be expected any more.
When there is a family of distributions “related” to the underlying Markov structure process, a
natural idea can be to consider an orthonormal basis of L2 (µ0 ) where µ0 is a normalized distribution
of the family. A typical example is the sequence of Hermite polynomials for the normal distribution
N (0, 1).
When no simple solution is available, considering the simple monomial basis $(t^\ell)_{\ell\ge0}$ remains a quite natural and efficient choice in one dimension.
In higher dimensions (in fact the only case of interest in practice, since the one-dimensional setting is usually solved by PDE methods!), this choice becomes more and more influenced by the payoff itself.
• Huge RAM capacities are needed to store all the paths of the simulated Markov chain (forward phase), except when a backward simulation procedure is available. This induces a stringent limitation on the size $M$ of the simulation, even with recent devices, to prevent a swapping effect which would dramatically slow down the procedure. By swapping effect we mean that when the quantity of data to be stored becomes too large, the computer uses its hard disk to store it, but access to this disk memory is incredibly slow compared to access to RAM.
• Regression methods are strongly payoff dependent in the sense that a significant part of the
procedure (product of the inverted Gram matrix by the projection of the payoff at every time k)
has to be done for each payoff.
For every $k\in\{0,\dots,n\}$, we replace the marginal $X_k$ by a function $\widehat X_k$ of $X_k$ taking values in a grid $\Gamma_k$, namely $\widehat X_k=\pi_k(X_k)$, where $\pi_k:\mathbb{R}^d\to\Gamma_k$ is a Borel function. The grid $\Gamma_k=\pi_k(\mathbb{R}^d)$ (also known as a codebook in Signal Processing or Information Theory) will always be supposed finite in practice, with size $|\Gamma_k|=N_k\in\mathbb{N}^*$, although the error bounds established below still hold if the grids are infinite, provided $\pi_k$ is sublinear ($|\pi_k(x)|\le C(1+|x|)$) so that $\widehat X_k$ has at least as many moments as $X_k$.
We saw in Chapter 5 an optimal way to specify the function $\pi_k$ (including $\Gamma_k$) by minimizing the induced $L^p$-mean error $\|X_k-\widehat X_k\|_p$. This is the purpose of optimal quantization theory. We will come back to these aspects further on.
The starting point, being aware that the sequence $(\widehat X_k)_{0\le k\le n}$ has no reason to share a Markov property, is to force this Markov property in the Backward Dynamic Programming Principle. This means defining by induction a quantized pseudo-Snell envelope of $\big(f_k(X_k)\big)_{0\le k\le n}$ (assumed to lie at least in $L^1$), namely
$$ \widehat U_n=f_n(\widehat X_n),\qquad \widehat U_k=\max\Big(f_k(\widehat X_k),\,E\big(\widehat U_{k+1}\,\big|\,\widehat X_k\big)\Big).\tag{10.5} $$
$$ \widehat U_k=\widehat u_k(\widehat X_k),\qquad \widehat u_k:\mathbb{R}^d\to\mathbb{R}_+\ \text{Borel function}. $$
Theorem 10.3 (see [12] (2001), [132] (2011)) Assume that all functions $f_k:\mathbb{R}^d\to\mathbb{R}_+$ are Lipschitz continuous and that the transitions $P_k(x,dy)=P(X_{k+1}\in dy\,|\,X_k=x)$ are Lipschitz continuous in the following sense: for every Lipschitz continuous function $g:\mathbb{R}^d\to\mathbb{R}$, $P_kg$ is Lipschitz continuous and $[P_k]_{\rm Lip}:=\sup_{[g]_{\rm Lip}\le1}[P_kg]_{\rm Lip}<+\infty$.
Set $[P]_{\rm Lip}=\max_{0\le k\le n-1}[P_k]_{\rm Lip}$ and $[f]_{\rm Lip}=\max_{0\le k\le n}[f_k]_{\rm Lip}$.
Let $p\in[1,+\infty)$. We assume that $\sum_{k=1}^n\big(\|X_k\|_p+\|\widehat X_k\|_p\big)<+\infty$.
$$ \|U_k-\widehat U_k\|_2 \le \sqrt2\,[f]_{\rm Lip}\Big(\sum_{\ell=k}^n\big([P]_{\rm Lip}\vee1\big)^{2(n-\ell)}\,\|X_\ell-\widehat X_\ell\|_2^2\Big)^{\frac12}. $$
Proof. Step 1. First, we control the Lipschitz constants of the functions $u_k$. It follows from the classical inequality $|\max(a,b)-\max(a',b')|\le\max(|a-a'|,|b-b'|)$ that
$$ [u_k]_{\rm Lip} \le \max\big([f_k]_{\rm Lip},[P_ku_{k+1}]_{\rm Lip}\big) \le \max\big([f]_{\rm Lip},[P_k]_{\rm Lip}[u_{k+1}]_{\rm Lip}\big) $$
with the convention $[u_{n+1}]_{\rm Lip}=0$. An easy backward induction yields $[u_k]_{\rm Lip}\le[f]_{\rm Lip}\big([P]_{\rm Lip}\vee1\big)^{n-k}$.
Step 2. We focus on claim (b) when $p=2$. First we show that
$$ \Big\|P_1g(X_1)-E\big(h(\widehat X_2)\,\big|\,\widehat X_1\big)\Big\|_2^2 \le [P_1g]_{\rm Lip}^2\,\big\|X_1-\widehat X_1\big\|_2^2 + \big\|g(X_2)-h(\widehat X_2)\big\|_2^2.\tag{10.6} $$
We have
$$ P_1g(X_1)-E\big(h(\widehat X_2)\,\big|\,\widehat X_1\big) = \Big(E\big(g(X_2)\,|\,X_1\big)-E\big(E\big(g(X_2)\,|\,X_1\big)\,\big|\,\widehat X_1\big)\Big) \stackrel{\perp}{+} \Big(E\big(g(X_2)\,|\,\widehat X_1\big)-E\big(h(\widehat X_2)\,|\,\widehat X_1\big)\Big), $$
where we used the chaining rule for conditional expectation $E\big(E(\,.\,|X_1)\,\big|\,\widehat X_1\big)=E(\,.\,|\widehat X_1)$ since $\widehat X_1$ is $\sigma(X_1)$-measurable. The orthogonality follows from the very definition of conditional expectation as a projection. Consequently
$$ \Big\|P_1g(X_1)-E\big(h(\widehat X_2)\,\big|\,\widehat X_1\big)\Big\|_2^2 \le \big\|P_1g(X_1)-E\big(P_1g(X_1)\,\big|\,\widehat X_1\big)\big\|_2^2 + \big\|E\big(g(X_2)\,|\,\widehat X_1\big)-E\big(h(\widehat X_2)\,|\,\widehat X_1\big)\big\|_2^2
\le \big\|P_1g(X_1)-P_1g(\widehat X_1)\big\|_2^2 + \big\|g(X_2)-h(\widehat X_2)\big\|_2^2, $$
where we used in the last line the very definition of the conditional expectation $E(\,.\,|\widehat X_1)$ as the best $L^2$-approximation by a $\sigma(\widehat X_1)$-measurable random vector and the fact that conditional expectation operators are projections (hence contractions with operator norm at most equal to $1$).
Now, it follows from both dynamic programming formulas (original and quantized) that
$$ |U_k-\widehat U_k| \le \max\Big(\big|f_k(X_k)-f_k(\widehat X_k)\big|,\ \big|E\big(U_{k+1}\,|\,X_k\big)-E\big(\widehat U_{k+1}\,|\,\widehat X_k\big)\big|\Big) $$
so that
$$ |U_k-\widehat U_k|^2 \le \big|f_k(X_k)-f_k(\widehat X_k)\big|^2 + \big|E\big(U_{k+1}\,|\,X_k\big)-E\big(\widehat U_{k+1}\,|\,\widehat X_k\big)\big|^2. $$
Plugging this inequality into the original one yields, for every $k\in\{0,\dots,n\}$,
$$ \big\|U_k-\widehat U_k\big\|_2^2 \le \big([f]_{\rm Lip}^2+[P]_{\rm Lip}^2[u_{k+1}]_{\rm Lip}^2\big)\,\big\|X_k-\widehat X_k\big\|_2^2 + \big\|U_{k+1}-\widehat U_{k+1}\big\|_2^2. $$
Consequently
$$ \big\|U_k-\widehat U_k\big\|_2^2 \le 2[f]_{\rm Lip}^2\sum_{\ell=k}^{n-1}\big(1\vee[P]_{\rm Lip}\big)^{2(n-\ell)}\big\|X_\ell-\widehat X_\ell\big\|_2^2 + [f]_{\rm Lip}^2\,\big\|X_n-\widehat X_n\big\|_2^2
\le 2[f]_{\rm Lip}^2\sum_{\ell=k}^{n}\big(1\vee[P]_{\rm Lip}\big)^{2(n-\ell)}\big\|X_\ell-\widehat X_\ell\big\|_2^2. $$
Remark. The above control emphasizes the interest of minimizing the “quantization” error $\|X_k-\widehat X_k\|_p$ at each time step of the Markov chain in order to reduce the final resulting error.
Example of application: the Euler scheme. Let $(\bar X_{t^n_k})_{0\le k\le n}$ be the Euler scheme with step $\frac Tn$ of the $d$-dimensional diffusion solution to the SDE (10.1). It defines a Markov chain with transitions
$$ \bar P^n_kg(x) = E\,g\Big(x+\frac Tn\,b(t^n_k,x)+\sqrt{\frac Tn}\,\sigma(t^n_k,x)\,Z\Big),\qquad Z\sim\mathcal N(0;I_q). $$
If $g$ is Lipschitz continuous,
$$ \big|\bar P^n_kg(x)-\bar P^n_kg(x')\big|^2 \le [g]^2_{\rm Lip}\,E\Big|x-x'+\frac Tn\big(b(t^n_k,x)-b(t^n_k,x')\big)+\sqrt{\frac Tn}\big(\sigma(t^n_k,x)-\sigma(t^n_k,x')\big)Z\Big|^2 $$
$$ \le [g]^2_{\rm Lip}\Big(\Big|x-x'+\frac Tn\big(b(t^n_k,x)-b(t^n_k,x')\big)\Big|^2+\frac Tn\big|\sigma(t^n_k,x)-\sigma(t^n_k,x')\big|^2\Big) $$
$$ \le [g]^2_{\rm Lip}\,|x-x'|^2\Big(1+\frac Tn[\sigma]^2_{\rm Lip}+\frac{T^2}{n^2}[b]^2_{\rm Lip}+2\frac Tn[b]_{\rm Lip}\Big). $$
As a consequence
$$ [\bar P^n_kg]_{\rm Lip} \le \Big(1+\frac{C_{b,\sigma,T}}{n}\Big)[g]_{\rm Lip},\qquad k=0,\dots,n-1, $$
i.e.
$$ [\bar P^n]_{\rm Lip} \le 1+\frac{C_{b,\sigma,T}}{n}. $$
Applying the control established in claim (b) of the above theorem yields, with obvious notations,
$$ \big\|U_k-\widehat U_k\big\|_2 \le \sqrt2\,[f]_{\rm Lip}\Big(\sum_{\ell=k}^n\Big(1+\frac{C_{b,\sigma,T}}{n}\Big)^{2(n-\ell)}\big\|\bar X_\ell-\widehat{\bar X}_\ell\big\|_2^2\Big)^{\frac12}
\le \sqrt2\,e^{C_{b,\sigma,T}}\,[f]_{\rm Lip}\Big(\sum_{\ell=k}^n\big\|\bar X_\ell-\widehat{\bar X}_\ell\big\|_2^2\Big)^{\frac12}. $$
Exercise. Derive a result in the case $p\neq2$ based on Claim (a) of the theorem.
$$ p^k_{ij} = P\big(\widehat X_{k+1}=x^{k+1}_j\,\big|\,\widehat X_k=x^k_i\big),\qquad 1\le i\le N_k,\ 1\le j\le N_{k+1},\ k=0,\dots,n-1.\tag{10.9} $$
Although the above super-matrix defines a family of Markov transitions, the sequence $(\widehat X_k)_{0\le k\le n}$ is definitely not a Markov chain since there is no reason why
$$ P\big(\widehat X_{k+1}=x^{k+1}_j\,\big|\,\widehat X_k=x^k_i\big) = P\big(\widehat X_{k+1}=x^{k+1}_j\,\big|\,\widehat X_k=x^k_i,\ \widehat X_\ell=x^\ell_{i_\ell},\ \ell=0,\dots,k-1\big). $$
In fact one should rather see the quantized transitions
$$ \widehat P_k(x^k_i,dy) = \sum_{j=1}^{N_{k+1}}p^k_{ij}\,\delta_{x^{k+1}_j}(dy),\qquad x^k_i\in\Gamma_k,\ k=0,\dots,n-1, $$
as spatial discretizations of the original transitions $P_k(x,dy)$ of the original Markov chain.
Definition 10.1 The family of grids $\Gamma_k$, $0\le k\le n$, and the transition super-matrix $[p^k_{ij}]$ defined by (10.9) define a quantization tree of size $N=N_0+\dots+N_n$.
Remark. A quantization tree in the sense of the above definition does not characterize the distribution of the sequence $(\widehat X_k)_{0\le k\le n}$.
Implementation. The implementation of the whole quantization tree method relies on the computation of this transition super-matrix. Once the grids (optimal or not) have been specified and the weights of the super-matrix have been computed or, to be more precise, estimated, the computation of the pseudo-réduite $E\,\widehat U_0$ at the origin amounts to an almost instantaneous “backward descent” of the quantization tree based on (10.8).
If we can simulate $M$ independent copies of the Markov chain $(X_k)_{0\le k\le n}$, denoted $X^{(1)},\dots,X^{(M)}$, then the weights $p^k_{ij}$ can be estimated by a standard Monte Carlo estimator; a sketch of this estimation and of the backward descent is given below.
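The following sketch illustrates, under simplifying assumptions, the two ingredients just described in dimension one: the Monte Carlo estimation of the transition weights $p^k_{ij}$ by nearest-neighbour projection of simulated paths onto the grids, and the backward descent of the quantized dynamic programming principle (10.5). The grids are taken here as plain i.i.d. samples of the marginals (in practice one would use optimal or optimized grids), and all names and parameter values are illustrative.

```python
import numpy as np

def nearest(grid, x):
    """Index of the nearest grid point (1-dimensional nearest-neighbour projection)."""
    return np.argmin(np.abs(grid[None, :] - x[:, None]), axis=1)

def quantization_tree_price(paths, grids, f, discount):
    """Backward descent of the quantized BDPP (10.5) on a quantization tree.

    paths    : array (M, n+1) of simulated chain values X_k^{(m)}
    grids    : list of n+1 grids Gamma_k (1-d arrays); grids[0] may be the single point x_0
    f        : obstacle/payoff function f(k, x)
    discount : one-step discount factor
    """
    M, n1 = paths.shape
    idx = [nearest(g, paths[:, k]) for k, g in enumerate(grids)]
    u = f(n1 - 1, grids[-1])                               # quantized value on the last grid
    for k in range(n1 - 2, -1, -1):
        counts = np.zeros((len(grids[k]), len(grids[k + 1])))
        np.add.at(counts, (idx[k], idx[k + 1]), 1.0)       # Monte Carlo counts of transitions
        p = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
        u = np.maximum(f(k, grids[k]), discount * p @ u)   # quantized BDPP step
    return u                                               # value on Gamma_0

# Illustrative use: Bermudan put on a Black-Scholes Euler scheme, grids sampled from the marginals
x0, K, r, sigma, T, n, M = 100.0, 100.0, 0.05, 0.2, 1.0, 20, 100_000
dt = T / n
rng = np.random.default_rng(5)
X = np.empty((M, n + 1)); X[:, 0] = x0
for k in range(n):
    X[:, k + 1] = X[:, k] + r * X[:, k] * dt + sigma * X[:, k] * np.sqrt(dt) * rng.normal(size=M)
grids = [np.array([x0])] + [np.sort(rng.choice(X[:, k], size=100)) for k in range(1, n + 1)]
price = quantization_tree_price(X, grids, lambda k, x: np.maximum(K - x, 0.0), np.exp(-r * dt))
print(float(price[0]))
```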
Remark. By contrast with the regression methods, for which theoretical results are mostly focused on the rate of convergence of the Monte Carlo phase, we will not analyze here this part of the procedure, for which we refer to [8], where the Monte Carlo simulation procedure to compute the transition super-matrix and its impact on the quantization tree is deeply analyzed.
Application. We can apply what precedes, still within the framework of an Euler scheme, to our original optimal stopping problem. We assume that all random vectors $\bar X_{t^n_k}$ lie in $L^{p'}(P)$ for a real exponent $p'>2$ and that they have been optimally quantized (in $L^2$) by grids of size $N_k$, $k=0,\dots,n$. Then, relying on the non-asymptotic Zador Theorem (claim (b) of Theorem 5.1.2), we get, with obvious notations,
$$ \big\|\bar U^n_0-\widehat{\bar U}^n_0\big\|_2 \le \sqrt2\,e^{C_{b,\sigma,T}}\,[f]_{\rm Lip}\,C_{p',d}\Big(\sum_{k=0}^n\sigma_{p'}\big(\bar X_{t^n_k}\big)^2\,N_k^{-\frac2d}\Big)^{\frac12}. $$
Quantization tree optimal design. At this stage, one can proceed to an optimization of the quantization tree. To be precise, one can optimize the sizes of the grids $\Gamma_k$ subject to a “budget” (or total allocation) constraint, typically
$$ \min\Big\{\sum_{k=0}^ne^{2C_{b,\sigma,T}}\,\sigma_{p'}\big(\bar X_{t^n_k}\big)^2\,N_k^{-\frac2d},\ N_k\ge1,\ N_0+\dots+N_n=N\Big\}. $$
(b) Derive an asymptotically optimal (suboptimal) choice for the grid size allocation (as $N\to+\infty$).
This optimization turns out to have a significant numerical impact, even if, in terms of rate, the uniform choice $N_k=\bar N=\frac Nn$ (doing so we assume implicitly that $X_0=x_0\in\mathbb{R}^d$ so that $\widehat X_0=X_0=x_0$, $N_0=1$ and $\widehat U_0=\widehat u_0(x_0)$) leads to a quantization error of the form
Remark. If we can simulate directly the sample $(X_{t^n_k})_{0\le k\le n}$ of the diffusion instead of its Euler scheme, and if the obstacle/payoff function is semi-convex in the sense of Condition (SC), then we get as a typical error bound
$$ \big|u_0(x_0)-\widehat u^n_0(x_0)\big| \le C_{b,\sigma,T,f,d}\Big(\frac1n+\frac{n^{\frac12}}{\bar N^{\frac1d}}\Big). $$
1
Remark. • The rate of decay N̄ − d which becomes worse and worse as the dimension d of the
structure Markov process increases can be encompassed by such tree methods. It is a consequence
of Zador’s Theorem and is known as the curse of dimensionality.
• These rates can be significantly improved by introducing a Romberg like extrapolation method
or/and some martingale corrections to the quantization tree (see [132]).
(see the exercise in the previous section) we (asymptotically) minimize the resulting quantization error induced by the BDPP descent. This error reads, with $\widetilde N=N_0+\dots+N_n$ (usually $>N$),
with $\bar N=\frac Nn$. Theoretically this choice may not look crucial since it has no impact on the convergence rate, but in practice it does influence the numerical performances.
• Stationary process: The process $(X_k)_{0\le k\le n}$ is stationary and $X_0\in L^{2+\delta}$ for some $\delta>0$. A typical example in the Gaussian world is, as expected, the stationary Ornstein–Uhlenbeck process (sampled at times $t^n_k=\frac{kT}n$, $k=0,\dots,n$). The essential feature of such a setting is that the quantization tree relies on only one optimal $\bar N$-grid (say $L^2$-optimal for the distribution of $X_0$), $\Gamma=\Gamma^*_{\bar N}=\{x_1,\dots,x_{\bar N}\}$, and one quantized transition matrix $\big[P\big(\widehat X^\Gamma_1=x_j\,\big|\,\widehat X^\Gamma_0=x_i\big)\big]_{1\le i,j\le\bar N}$.
– For every $k\in\{0,\dots,n\}$, $\|X_k\|_{2+\delta}=\|X_0\|_{2+\delta}$, hence $N_k=\big\lceil\frac N{n+1}\big\rceil$, $k=0,\dots,n$, and
$$ \big\|V_0-\widehat v_0(\widehat X_0)\big\|_2 \le C_{X,\delta}\,\frac{n^{\frac12+\frac1d}}{N^{\frac1d}} \le C_{X,\delta}\,\frac{n^{\frac12}}{\bar N^{\frac1d}}\qquad\text{with }\bar N=\frac N{n+1}. $$
Exercise (temporary). Make the assumption that the error in a quantization scheme admits a first order expansion of the form
$$ \mathrm{Err}(n,N) = \frac{c_1}{n^\alpha} + \frac{c_2\,n^{\frac12+\frac1d}}{N^{\frac1d}}. $$
Devise a Richardson–Romberg extrapolation based on two quantization trees with sizes $N^{(1)}$ and $N^{(2)}$ respectively.
Martingale correction. When $(X_k)_{0\le k\le n}$ is a martingale, one can force this martingale property on the quantized chain by freezing the transition weight super-matrix and by moving, in a backward way, the grids $\Gamma_k$ so that the resulting grids $\widetilde\Gamma_k$ satisfy
$$ E\big(\widehat X^{\widetilde\Gamma_k}_k\,\big|\,\widehat X^{\widetilde\Gamma_{k-1}}_{k-1}\big) = \widehat X^{\widetilde\Gamma_{k-1}}_{k-1}. $$
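One natural reading of this correction (an illustrative interpretation, not a prescription from the text) is the following backward recursion: keep the estimated weights $p^{k-1}_{ij}$ frozen and replace each point $x^{k-1}_i$ by the conditional mean of the already corrected grid at time $k$, so that the quantized chain becomes a martingale by construction.

```python
import numpy as np

def martingale_correction(grids, weights):
    """Backward correction of a 1-d quantization tree so that the quantized chain
    is a martingale: E[ Xhat_k | Xhat_{k-1} = x_i ] = x_i for the corrected grids.

    grids   : list of 1-d arrays Gamma_0, ..., Gamma_n
    weights : list of transition matrices p^k of shape (|Gamma_k|, |Gamma_{k+1}|)
    """
    corrected = [g.copy() for g in grids]
    for k in range(len(grids) - 1, 0, -1):
        # move the grid at time k-1 onto the conditional means of the corrected grid at time k
        corrected[k - 1] = weights[k - 1] @ corrected[k]
    return corrected
```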
www.quantize.maths-fi.com
The one-dimensional case. Although of little interest for applications (since other, deterministic, methods like the PDE approach are available), we propose below, for the sake of completeness, a Newton–Raphson method to compute optimal quantizers of scalar unimodal distributions, i.e. absolutely continuous distributions whose density is log-concave. The starting point is the following theorem due to Kieffer [81], which states uniqueness of the optimal quantizer in that setting.
Figure 10.1: An optimal quantization of the bi-variate normal distribution with size N = 500
set). This unique stationary quantizer is a global (and local) minimum of the distortion function, i.e.
$$ \forall\,N\ge1,\qquad \mathrm{argmin}_{\mathbb{R}^N}\,D^X_N = \{x^{(N)}\}. $$
Remark. Absolutely continuous distributions on the real line with a log-concave density are sometimes called unimodal distributions. The support of such a distribution is a (closed) interval $[a,b]$ with $-\infty\le a\le b\le+\infty$.
Examples of unimodal distributions: the normal distributions, the gamma distributions $\gamma(\alpha,\beta)$ with $\alpha\ge1$, the Beta $B(a,b)$-distributions with $a,b\ge1$, etc.
Specification of the Voronoi cells of $x=(x^1,\dots,x^N)$, $a<x^1<x^2<\dots<x^N<b$, where $[a,b]$ denotes the support of $P_X$ (which is that of $\varphi$ as well): $C_i(x)=\big[x^{i-\frac12},x^{i+\frac12}\big)$ with $x^{i+\frac12}=\frac{x^i+x^{i+1}}2$, $i=1,\dots,N-1$, and $x^{\frac12}=a$, $x^{N+\frac12}=b$.
Computation of the gradient:
$$ \nabla D^X_N(x) = \Big(2\int_{x^{i-\frac12}}^{x^{i+\frac12}}(x^i-\xi)\,\varphi(\xi)\,d\xi\Big)_{1\le i\le N}. $$
The Hessian $\nabla^2D^X_N(x^1,\dots,x^N)$ can in turn be computed (at least at $N$-tuples whose components are points where the density $\varphi$ is continuous, which is the case everywhere, except possibly at the endpoints of its support, which is an interval). It reads
$$ \nabla^2D^X_N(x^1,\dots,x^N) = \text{[to be completed]} $$
where $\Phi(u)=\int_{-\infty}^u\varphi(v)\,dv$ and $\Psi(u)=\int_{-\infty}^uv\,\varphi(v)\,dv$.
Thus if $X\sim\mathcal N(0;1)$, the function $\Phi$ is (tabulated or) computable (see Section 11.3) at low computational cost with high accuracy and
$$ \Psi(x) = -\frac{e^{-\frac{x^2}2}}{\sqrt{2\pi}},\qquad x\in\mathbb{R}. $$
This allows for an almost instantaneous search for the unique optimal $N$-quantizer using a Newton–Raphson descent on $\mathbb{R}^N$ with the requested accuracy; a sketch is given below.
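Below is a minimal sketch of such a search for the $\mathcal N(0;1)$ distribution. The gradient follows the formula above, $\nabla D^X_N(x)_i = 2\big[x^i\big(\Phi(x^{i+1/2})-\Phi(x^{i-1/2})\big)-\big(\Psi(x^{i+1/2})-\Psi(x^{i-1/2})\big)\big]$; since the closed-form Hessian is left “[to be completed]” in the text, the sketch approximates the Jacobian of the gradient by finite differences, which is an illustrative substitute rather than the author's exact procedure.

```python
import math
import numpy as np

SQRT2PI = math.sqrt(2.0 * math.pi)
Phi = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))   # N(0,1) cdf
Psi = lambda u: -np.exp(-0.5 * u**2) / SQRT2PI                              # int_{-inf}^u v phi(v) dv

def grad_distortion(x):
    """Gradient of the quadratic distortion for N(0,1): 2 int_{C_i} (x_i - xi) phi(xi) dxi."""
    xh = np.concatenate(([-np.inf], 0.5 * (x[1:] + x[:-1]), [np.inf]))      # Voronoi endpoints
    return 2.0 * (x * np.diff(Phi(xh)) - np.diff(Psi(xh)))

def optimal_quantizer(N, n_iter=30, h=1e-6):
    """Newton-Raphson search for the unique optimal N-quantizer of N(0,1);
    the Jacobian of the gradient is approximated by central finite differences."""
    x = np.sort(np.random.default_rng(0).normal(size=N))                    # ordered starting point
    for _ in range(n_iter):
        g = grad_distortion(x)
        J = np.empty((N, N))
        for j in range(N):                                                  # numerical Jacobian
            e = np.zeros(N); e[j] = h
            J[:, j] = (grad_distortion(x + e) - grad_distortion(x - e)) / (2 * h)
        x = np.sort(x - np.linalg.solve(J, g))
    return x

print(optimal_quantizer(5))   # e.g. the optimal 5-quantizer of the standard normal
```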
For the normal distribution $\mathcal N(0;1)$ and $N=1,\dots,500$, tabulations within $10^{-14}$ accuracy of both the optimal $N$-quantizers $x^{(N)}$ and their companion parameters
$$ P\big(X\in C_i(x^{(N)})\big),\ i=1,\dots,N,\qquad\text{and}\qquad \big\|X-\widehat X^{x^{(N)}}\big\|_2, $$
can also be downloaded at the website www.quantize.maths-fi.com.
Chapter 11
Miscellany
$$ \Phi_Z'(u) = i\int_{-\infty}^{+\infty}x\,e^{iux}\,e^{-\frac{x^2}2}\,\frac{dx}{\sqrt{2\pi}} = -u\,\Phi_Z(u), $$
so that
$$ \Phi_Z(u) = \Phi_Z(0)\,e^{-\frac{u^2}2} = e^{-\frac{u^2}2}.\qquad\diamond $$
where
$$ t := \frac1{1+px},\qquad p := 0.2316419, $$
and
$$ \Big|O\big(e^{-\frac{x^2}2}\,t^6\big)\Big| \le 7.5\times10^{-8}. $$
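The polynomial approximation alluded to above is the classical Abramowitz–Stegun formula 26.2.17 (see [1]); its coefficients are not reproduced in the surviving text, so the values used in the sketch below are the standard published ones (an assumption to be checked against the original formula).

```python
import math

# Abramowitz-Stegun 26.2.17 approximation of the N(0,1) distribution function Phi_0,
# with t = 1/(1 + p x) and the standard coefficients (absolute error <= 7.5e-8).
_P = 0.2316419
_B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def Phi0(x: float) -> float:
    if x < 0.0:
        return 1.0 - Phi0(-x)                       # symmetry for negative arguments
    t = 1.0 / (1.0 + _P * x)
    poly = sum(b * t**(k + 1) for k, b in enumerate(_B))
    return 1.0 - math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * poly

print(round(Phi0(1.23), 4))   # ~0.8907, in agreement with the table below
```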
The distribution function of the $\mathcal N(0;1)$ distribution is given, for every real number $t$, by
$$ \Phi_0(t) := \frac1{\sqrt{2\pi}}\int_{-\infty}^te^{-\frac{x^2}2}\,dx. $$
The following tables give the values of $\Phi_0(t)$ for $t=x_0.x_1x_2$ where $x_0\in\{0,1,2\}$, $x_1\in\{0,1,\dots,9\}$ and $x_2\in\{0,1,\dots,9\}$.
For example, if $t=1.23$ (i.e. row 1.2 and column 0.03) one has $\Phi_0(t)\approx0.8907$.
t 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6661 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7290 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1,0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1,1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1,2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1,3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1,4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1,5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1,6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1,7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1,8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1,9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2,0 0.9772 0.9779 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2,1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2,2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2,3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2,4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2,5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2,6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2,7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2,8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2,9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
One notes that $\Phi_0(t)=0.9986$ for $t=2.99$. This comes from the fact that the mass of the normal distribution is mainly concentrated on the interval $[-3,3]$, as emphasized by the table of “large” values hereafter (for instance, one remarks that $P(\{|X|\le4.5\})\ge0.99999$!).
t 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.8 4.0 4.5
Φ0 (t) .99865 .99904 .99931 .99952 .99966 .99976 .999841 .999928 .999968 .999997
what follows that each random variable Xi can be defined on its own probability space (Ωi , Ai , Pi )
with a straightforward adaptation of the statements.
Theorem 11.1 (Equivalent definitions of uniform integrability I) A family $(X_i)_{i\in I}$ of $\mathbb{R}^d$-valued random vectors, defined on a probability space $(\Omega,\mathcal A,P)$, is said to be uniformly integrable if it satisfies one of the following equivalent properties:
(i) $\displaystyle\lim_{R\to+\infty}\sup_{i\in I}E\big(|X_i|\mathbf 1_{\{|X_i|\ge R\}}\big)=0$.
(ii) $(\alpha)$ $\sup_{i\in I}E\,|X_i|<+\infty$, and
$(\beta)$ $\forall\,\varepsilon>0$, $\exists\,\eta=\eta(\varepsilon)>0$ such that, $\forall\,A\in\mathcal A$, $P(A)\le\eta\Longrightarrow\sup_{i\in I}\int_A|X_i|\,dP\le\varepsilon$.
Remark. • All norms being strongly equivalent on $\mathbb{R}^d$, claims (i) and (ii) do not depend on the selected norm on $\mathbb{R}^d$.
• $L^1$-uniform integrability of a family of probability distributions $(\mu_i)_{i\in I}$ defined on $(\mathbb{R}^d,\mathcal Bor(\mathbb{R}^d))$ can be defined accordingly by
$$ \lim_{R\to+\infty}\sup_{i\in I}\int_{\{|x|\ge R\}}|x|\,\mu_i(dx)=0. $$
at least for large enough $R\in(0,\infty)$. Now, for every $i\in I$ and every $A\in\mathcal A$,
$$ \int_A|X_i|\,dP \le R\,P(A) + \int_{\{|X_i|\ge R\}}|X_i|\,dP. $$
Owing to (i), there exists a real number $R=R(\varepsilon)>0$ such that $\sup_{i\in I}\int_{\{|X_i|\ge R\}}|X_i|\,dP\le\frac\varepsilon2$. Then setting $\eta=\eta(\varepsilon)=\frac\varepsilon{2R}$ yields (ii).
Conversely, for every real number $R>0$, the Markov inequality implies
$$ \sup_{i\in I}P\big(\{|X_i|\ge R\}\big) \le \frac{\sup_{i\in I}E\,|X_i|}{R}. $$
Let $\eta=\eta(\varepsilon)$ be given by (ii)($\beta$). As soon as $R>\frac{\sup_{i\in I}E\,|X_i|}{\eta}$, $\sup_{i\in I}P(\{|X_i|\ge R\})\le\eta$ and (ii)($\beta$) implies that
$$ \sup_{i\in I}E\big(|X_i|\mathbf 1_{\{|X_i|\ge R\}}\big)\le\varepsilon, $$
which completes the proof. ♦
P2. If $(X_i)_{i\in I}$ and $(Y_i)_{i\in I}$ are two families of uniformly integrable $\mathbb{R}^d$-valued random vectors, then $(X_i+Y_i)_{i\in I}$ is uniformly integrable.
P3. If $(X_i)_{i\in I}$ is a family of $\mathbb{R}^d$-valued random vectors dominated by a uniformly integrable family $(Y_i)_{i\in I}$ of random variables, in the sense that
$$ \forall\,i\in I,\quad |X_i|\le Y_i\quad P\text{-a.s.}, $$
then $(X_i)_{i\in I}$ is uniformly integrable.
The four properties follow from characterization (i). To be precise, P1 is a consequence of the Lebesgue dominated convergence Theorem whereas the second one follows from the obvious inequality
$$ E\big(|X_i+Y_i|\mathbf 1_{\{|X_i+Y_i|\ge 2R\}}\big) \le 2\Big(E\big(|X_i|\mathbf 1_{\{|X_i|\ge R\}}\big)+E\big(|Y_i|\mathbf 1_{\{|Y_i|\ge R\}}\big)\Big). $$
Corollary 11.1 Let $(X_i)_{i\in I}$ be a family of $\mathbb{R}^d$-valued random vectors defined on a probability space $(\Omega,\mathcal A,P)$ and let $\Phi:\mathbb{R}^d\to\mathbb{R}_+$ satisfy $\displaystyle\lim_{|x|\to+\infty}\frac{\Phi(x)}{|x|}=+\infty$. If
Theorem 11.2 (Uniform integrability II) Let $(X_n)_{n\ge1}$ be a sequence of $\mathbb{R}^d$-valued random vectors defined on a probability space $(\Omega,\mathcal A,P)$ and let $X$ be an $\mathbb{R}^d$-valued random vector defined on the same probability space. If
(i) $(X_n)_{n\ge1}$ is uniformly integrable,
(ii) $X_n\stackrel{P}{\longrightarrow}X$,
then
$$ E\,|X_n-X|\longrightarrow0\quad\text{as }n\to+\infty. $$
(In particular $E\,X_n\to E\,X$.)
Proof. One derives from (ii) the existence of a subsequence $(X_{n'})$ such that $X_{n'}\to X$ a.s. Hence, by Fatou's Lemma, $E\,|X|\le\liminf_nE\,|X_{n'}|<+\infty$. Hence $X\in L^1(P)$ and, owing to P1 and P2, $(X_n-X)_{n\ge1}$ is a uniformly integrable sequence. Now, for every integer $n\ge1$ and every $M>0$,
$$ E\,|X_n-X| \le E\big(|X_n-X|\wedge M\big) + E\big(|X_n-X|\mathbf 1_{\{|X_n-X|\ge M\}}\big). $$
The Lebesgue dominated convergence Theorem implies $\lim_nE\big(|X_n-X|\wedge M\big)=0$, so that
$$ \limsup_nE\,|X_n-X| \le \lim_{M\to+\infty}\sup_nE\big(|X_n-X|\mathbf 1_{\{|X_n-X|\ge M\}}\big)=0.\qquad\diamond $$
Corollary 11.2 ($L^p$-uniform integrability) Let $p\in[1,+\infty)$. Let $(X_n)_{n\ge1}$ be a sequence of $\mathbb{R}^d$-valued random vectors defined on a probability space $(\Omega,\mathcal A,P)$ and let $X$ be an $\mathbb{R}^d$-valued random vector defined on the same probability space. If
(i) $(X_n)_{n\ge1}$ is $L^p$-uniformly integrable (i.e. $(|X_n|^p)_{n\ge1}$ is uniformly integrable),
(ii) $X_n\stackrel{P}{\longrightarrow}X$,
then
$$ \|X_n-X\|_p\longrightarrow0\quad\text{as }n\to+\infty. $$
(In particular $E\,X_n\to E\,X$.)
Proof. By the same argument as above, $X\in L^p(P)$, so that $(|X_n-X|^p)_{n\ge1}$ is uniformly integrable by P2 and P3. One concludes by the above theorem since $|X_n-X|^p\to0$ in probability as $n\to+\infty$. ♦
11.6 Interchanging. . .
Theorem 11.3 (Interchanging continuity and expectation) (see [30], Chapter 8) (a) Let
(Ω, A, P) be a probability space, let I be a nontrivial interval of R and let Ψ : I × Ω → R be a
Bor(I) ⊗ A-measurable function. Let x0 ∈ I. If the function Ψ satisfies:
(i) for every x ∈ I, the random variable Ψ(x, . ) ∈ L1R (P),
(ii) P(dω)-a.s., x 7→ Ψ(x, ω) is continuous at x0 ,
(iii) There exists Y ∈ L1R+ (P) such that for every x ∈ I,
The same extension based on uniform integrability as Claim (b) holds true for the differentiation
Theorem 2.1 (see the exercise following the theorem).
where C(S, R) denotes the set of continuous functions from (S, dS ) to R. When S is σ-compact
(i.e. is a countable union of compact sets), one may replace the space C(S, R) by the space CK (S, R)
of continuous functions with compact support.
Theorem 11.5 (Functional monotone class Theorem) Let (S, S) be a measurable space. Let
V be a vector space of real valued bounded measurable functions defined on (S, S). Let C be a subset
of V , stable by the product of two functions. Assume furthermore that V satisfies
(i) 1 ∈ V ,
Theorem 11.6 Let (µn )n≥1 be a sequence of probability measures on a Polish (metric) space (S, δ)
equipped with its Borel σ-field S and let µ be a probability measure on the same space. The following
properties are equivalent:
(i) $\mu_n\Longrightarrow\mu$ weakly on $S$ as $n\to+\infty$,
(iii) for every Borel set $A\in\mathcal S$ such that $\mu(\partial A)=0$ (where $\partial A=\bar A\setminus\mathring A$ is the boundary of $A$),
(v) for every bounded Borel function $f:S\to\mathbb{R}$ such that $\mu\big(\mathrm{Disc}(f)\big)=0$,
$$ \lim_n\int_Sf\,d\mu_n=\int_Sf\,d\mu. $$
Proposition 11.2 Let $(\mu_n)_{n\ge1}$ be a sequence of probability measures on a Polish (metric) space $(S,\delta)$ weakly converging to $\mu$.
(a) Let $g:S\to\mathbb{R}_+$ be a (non-negative) $\mu$-integrable Borel function and let $f:S\to\mathbb{R}$ be a $\mu$-a.s. continuous Borel function. If
$$ 0\le|f|\le g\qquad\text{and}\qquad \int_Sg\,d\mu_n\longrightarrow\int_Sg\,d\mu\quad\text{as }n\to+\infty, $$
then $f\in L^1(\mu)$ and $\int_Sf\,d\mu_n\longrightarrow\int_Sf\,d\mu$ as $n\to+\infty$.
(b) The conclusion still holds if $(\mu_n)_{n\ge1}$ is $f$-uniformly integrable, i.e.
$$ \lim_{R\to+\infty}\sup_{n\ge1}\int_{\{|f|\ge R\}}|f|\,d\mu_n=0. $$
Proof. cqfd
If $S=\mathbb{R}^d$, weak convergence is also characterized by the Fourier transform $\widehat\mu$ defined on $\mathbb{R}^d$ by
$$ \widehat\mu(u) = \int_{\mathbb{R}^d}e^{i(u|x)}\,\mu(dx),\qquad u\in\mathbb{R}^d. $$
Proposition 11.3 Let $(\mu_n)_{n\ge1}$ be a sequence of probability measures on $(\mathbb{R}^d,\mathcal Bor(\mathbb{R}^d))$ and let $\mu$ be a probability measure on the same space. Then
$$ \mu_n\Longrightarrow\mu\ \text{weakly}\quad\iff\quad \forall\,u\in\mathbb{R}^d,\ \widehat\mu_n(u)\longrightarrow\widehat\mu(u). $$
For a proof we refer e.g. to [74], or to any textbook on a first course in Probability Theory.
Remark. The convergence in distribution of a sequence $(X_n)_{n\ge1}$ of random variables, defined on probability spaces $(\Omega_n,\mathcal A_n,P_n)$ and taking values in a Polish space, is defined as the weak convergence of their distributions $\mu_n=P_n^{X_n}=P_n\circ X_n^{-1}$ on $(S,\mathcal S)$.
$$ M_n\stackrel{n\to+\infty}{\longrightarrow}M_\infty\quad\text{on the event }\{\langle M\rangle_\infty<+\infty\} $$
where $M_\infty$ is a finite random variable. If furthermore $E\langle M\rangle_\infty<+\infty$, then $M_\infty\in L^2(P)$.
Lemma 11.1 (Kronecker Lemma) Let $(a_n)_{n\ge1}$ be a sequence of real numbers and let $(b_n)_{n\ge1}$ be a non-decreasing sequence of positive real numbers with $\lim_nb_n=+\infty$. Then
$$ \Big(\sum_{n\ge1}\frac{a_n}{b_n}\ \text{converges in }\mathbb{R}\text{ as a series}\Big)\Longrightarrow\Big(\frac1{b_n}\sum_{k=1}^na_k\longrightarrow0\ \text{as }n\to+\infty\Big). $$
Proof. Set $C_n=\sum_{k=1}^n\frac{a_k}{b_k}$, $n\ge1$, $C_0=0$. The assumption says that $C_n\to C_\infty=\sum_{k\ge1}\frac{a_k}{b_k}\in\mathbb{R}$. Now, for large enough $n$, $b_n>0$ and
$$ \frac1{b_n}\sum_{k=1}^na_k = \frac1{b_n}\sum_{k=1}^nb_k\,\Delta C_k = \frac1{b_n}\Big(b_nC_n-\sum_{k=1}^nC_{k-1}\,\Delta b_k\Big) = C_n-\frac1{b_n}\sum_{k=1}^n\Delta b_k\,C_{k-1}, $$
where we used the Abel transform for series. One concludes by the extended Césaro Theorem since $\Delta b_n\ge0$ and $\lim_nb_n=+\infty$. ♦
n
To establish Central Limit Theorems outside the ‘ô.i.d.” setting, we will rely on Lindberg’s
Central limit theorem for arrays of martingale increments (seeTheorem 3.2 and its corollary 3.1,
p.58 in [70]), stated below in a simple form involving only one square integral martingale.
Theorem 11.7 (Lindeberg Central Limit Theorem for martingale increments) ([70]) Let $(M_n)_{n\ge1}$ be a sequence of square integrable martingales with respect to a filtration $(\mathcal F_n)_{n\ge1}$ and let $(a_n)_{n\ge1}$ be a non-decreasing sequence of real numbers going to infinity as $n$ goes to infinity.
If the following conditions hold:
(i) $\displaystyle\frac1{a_n}\sum_{k=1}^nE\big(|\Delta M_k|^2\,\big|\,\mathcal F_{k-1}\big)\longrightarrow\sigma^2\in[0,+\infty)$ in probability,
(ii) $\displaystyle\forall\,\varepsilon>0,\quad\frac1{a_n}\sum_{k=1}^nE\big((\Delta M_k)^2\mathbf 1_{\{|\Delta M_k|\ge\sqrt{a_n}\,\varepsilon\}}\,\big|\,\mathcal F_{k-1}\big)\longrightarrow0$ in probability,
then
$$ \frac{M_n}{\sqrt{a_n}}\stackrel{\mathcal L}{\longrightarrow}\mathcal N\big(0;\sigma^2\big). $$
Derive from this result a d-dimensional theorem [Hint: Consider a linear combination of the
d-dimensional martingale under consideration.]
Bibliography
[1] M. Abramowitz, I.A. Stegun (1964). Handbook of Mathematical Functions, National Bureau of Standards, Washington.
[3] I.A. Antonov and V.M. Saleev (1979). An economic method of computing LPτ -sequences,
Zh. vȳ chisl. Mat. mat. Fiz., 19, 243-245. English translation: U.S.S.R. Comput. Maths. Math.
Phys., 19, 252-256.
[4] B. Arouna (2004). Adaptive Monte Carlo method, a variance reduction technique. Monte
Carlo Methods and Appl., 10(1):1-24.
[5] B. Arouna, O. Bardou (2004). Efficient variance reduction for functionals of diffusions by
relative entropy, CERMICS-ENPC, technical report.
[6] M.D. Ašić, D.D. Adamović (1970): Limit points of sequences in metric spaces, The American
Mathematical Monthly, 77(6):613-616.
[8] V. Bally (2004). The central limit theorem for a nonlinear algorithm based on quantization,
Proc. of The Royal Soc., 460:221-241.
[9] V. Bally, D. Talay (1996). The distribution of the Euler scheme for stochastic differential
equations: I. Convergence rate of the distribution function, Probab. Theory Related Fields,
104(1): 43-60 (1996).
[10] V. Bally, D. Talay (1996). The law of the Euler scheme for stochastic differential equations.
II. Convergence rate of the density, Monte Carlo Methods Appl., 2(2):93-128.
[11] V. Bally, G. Pagès and J. Printems (2001). A Stochastic quantization method for non-
linear problems, Monte Carlo Methods and Appl., 7(1):21-34.
[12] V. Bally, G. Pagès (2003). A quantization algorithm for solving discrete time multidimen-
sional optimal stopping problems, Bernoulli, 9(6):1003-1049.
[13] V. Bally, G. Pagès (2003). Error analysis of the quantization algorithm for obstacle prob-
lems, Stochastic Processes & Their Applications, 106(1):1-40.
[14] V. Bally, G. Pagès, J. Printems (2003). First order schemes in the numerical quantization
method, Mathematical Finance 13(1):1-16.
[15] V. Bally, G. Pagès, J. Printems (2005). A quantization tree method for pricing and
hedging multidimensional American options, Mathematical Finance, 15(1):119-168.
[16] C. Barrera-Esteve and F. Bergeret and C. Dossal and E. Gobet and A. Meziou
and R. Munos and D. Reboul-Salze (2006). Numerical methods for the pricing of Swing
options: a stochastic control approach, Methodology and Computing in Applied Probability.
[17] O. Bardou, S. Bouthemy, G. Pagès (2009). Pricing swing options using Optimal Quanti-
zation, Applied Mathematical Finance, 16(2):183-217.
[18] O. Bardou, S. Bouthemy, G. Pagès (2006). When are Swing options bang-bang?, Inter-
national Journal for Theoretical and Applied Finance, 13(6):867-899, 2010.
[19] O. Bardou, N. Frikha, G. Pagès (2009). Computing VaR and CVaR using Stochastic
Approximation and Adaptive Unconstrained Importance Sampling, Monte Carlo and Appli-
cations Journal, 15(3):173-210.
[21] A. Benveniste, M. Métivier and P. Priouret (1987). Algorithmes adaptatifs et approximations stochastiques. Masson, Paris, 1987.
[22] J. Bertoin, Lévy processes. (1996) Cambridge tracts in Mathematics, 121, Cambridge Uni-
versity Press, 262p.
[23] A. Beskos, G.O. Roberts (2005). Exact simulation of diffusions, Ann. appl. Prob., 15(4):
2422-2444.
[25] J.-P. Borel, G. Pagès, Y.J. Xiao (1992). Suites à discrépance faible et intégration numérique, Probabilités Numériques, N. Bouleau, D. Talay eds., Coll. didactique, INRIA, ISBN-10: 2726107087.
[26] B. Bouchard, I. Ekeland, N. Touzi (2004) On the Malliavin approach to Monte Carlo
approximation of conditional expectations. (English summary) Finance Stoch., 8 (1):45-71.
[27] N. Bouleau, D. Lépingle (1994). Numerical methods for stochastic processes, Wiley Se-
ries in Probability and Mathematical Statistics: Applied Probability and Statistics. A Wiley-
Interscience Publication. John Wiley & Sons, Inc., New York, 359 pp. ISBN: 0-471-54641-0.
[28] P. P. Boyle (1977). Options: A Monte Carlo approach, Journal of Financial Economics,
4(3):323-338
[29] O. Brandière, M. Duflo (1996). Les algorithmes stochastiques contournent-ils les pièges?
(French) [Do stochastic algorithms go around traps?], Ann. Inst. H. Poincaré Probab. Statist.,
32(3):395-427.
[30] M. Briane, G. Pagès, Théorie de l’intégration, Cours & exercices, Vuibert, 1999.
[31] Buche R., Kushner H.J. (2002). Rate of convergence for constrained stochastic approxima-
tion algorithm, SIAM J. Control Optim., 40(4):1001-1041.
[32] J.F. Carrière (1996) Valuation of the early-exercise price for options using simulations and
nonparametric regression. Insurance Math. Econ., 19:1930.
[33] K.L. Chung (1949) An estimate concerning the Kolmogorov limit distribution, Trans. Amer.
Math. Soc., 67, 36-50.
[34] E. Clément, A. Kohatsu-Higa, D. Lamberton (2006). A duality approach for the weak
approximation of Stochastic Differential Equations, Annals of Applied Probability, 16(3):449-
471.
[35] É. Clément, P. Protter, D. Lamberton (2002). An analysis of a least squares regression method for American option pricing, Finance & Stochastics, 6(2):449-471.
[36] P. Cohort (2000). Sur quelques problèmes de quantification, thèse de l'Université Paris 6, Paris, 187p.
[37] P. Cohort (2004). Limit theorems for random normalized distortion, Ann. Appl. Probab.,
14(1):118-143.
[38] S. Corlay, G. Pagès (2014). Functional quantization based stratified sampling methods,
pre-print LPMA, 1341, to appear in Monte Carlo and Applications Journal.
[39] R. Cranley, T.N.L. Patterson (1976). Randomization of number theoretic methods for
multiple integration,SIAM J. Numer. Anal., 13(6):904-914.
[41] R.B. Davis , D.S. Harte (1987) Tests for Hurst effect, Biometrika, 74:95-101.
[42] L. Devroye (1986). Non uniform random variate generation, Springer, New York.
[43] M. Duflo (1996) ( 1997). Algorithmes stochastiques, coll. Mathématiques & Applications,
23, Springer-Verlag, Berlin, 319p.
[44] M. Duflo, Random iterative models, translated from the 1990 French original by Stephen
S. Wilson and revised by the author. Applications of Mathematics (New York), 34 Springer-
Verlag, Berlin, 385 pp.
[45] P. Étoré, B. Jourdain (2010). Adaptive optimal allocation in stratified sampling methods,
Methodol. Comput. Appl. Probab., 12(3):335-360.
[46] H. Faure (1986). Suite à discrépance faible dans T s , Technical report, Univ. Limoges (France).
[47] H. Faure (1982). Discrépance associée à un système de numération (en dimension s), Acta
Arithmetica, 41:337-361.
[48] O. Faure (1992). Simulation du mouvement brownien et des diffusions, Thèse de l'ENPC (Marne-la-Vallée, France), 133p.
[50] J.C Fort, G. Pagès (1996). Convergence of stochastic algorithms: from the Kushner &
Clark Theorem to the Lyapunov functional, Advances in Applied Probability, 28:1072-1094.
[51] J.C. Fort, G. Pagès (2002). Decreasing step stochastic algorithms: a.s. behaviour of weighted empirical measures, Monte Carlo Methods & Applications, 8(3):237-270.
[52] A. Friedman (1975). Stochastic differential equations and applications. Vol. 1-2. Probability
and Mathematical Statistics, Vol. 28. Academic Press [Harcourt Brace Jovanovich, Publishers],
New York-London, 528p.
[53] J. H. Friedman, J.L. Bentley and R.A. Finkel (1977). An Algorithm for Finding
Best Matches in Logarithmic Expected Time, ACM Transactions on Mathematical Software,
3(3):209-226.
[54] N. Frikha, S. Menozzi (2012). Concentration bounds for Stochastic Approximations, Electron. Commun. Probab., 17(47):1-15.
[55] A. Gersho and R.M. Gray (1988). Special issue on Quantization, I-II (A. Gersho and R.M.
Gray eds.), IEEE Trans. Inform. Theory, 28.
[56] A. Gersho and R.M. Gray (1992). Vector Quantization and Signal Compression. Kluwer,
Boston.
[59] P. Glasserman, D.D. Yao (1992). Some guidelines and guarantees for common random
numbers, Manag. Sci., 36(6):884-908.
[60] P. Glynn (1989). Optimization of stochastic systems via simulation. In Proceedings of the
1989 Winter simulation Conference, Society for Computer Simulation, San Diego, 90-105.
[61] E. Gobet, G. Pagès, H. Pham, J. Printems (2007). Discretization and simulation for a
class of SPDE’s with applications to Zakai and McKean-Vlasov equations, SIAM Journal on
Numerical Analysis, 44(6):2505-2538.
[62] E. Gobet, S. Menozzi (2004). Exact approximation rate of killed hypo-elliptic diffusions
using the discrete Euler scheme, Stochastic Process. Appl., 112(2):201-223.
[63] E. Gobet (2000). Weak approximation of killed diffusion using Euler schemes, Stoch. Proc.
and their Appl., 87:167-197.
[64] E. Gobet, A. Kohatsu-Higa (2003). Computation of Greeks for barrier and look-back
options using Malliavin calculus, Electron. Comm. Probab., 8:51-62.
[65] E. Gobet, R. Munos (2005). Sensitivity analysis using Itô-Malliavin calculus and martin-
gales. Application to optimal control, SIAM J. Control Optim., 43(5):1676-1713.
[67] S. Graf, H. Luschgy, and G. Pagès (2008). Distortion mismatch in the quantization of
probability measures, ESAIM P&S, 12:127-154.
[69] J. Guyon (2006). Euler scheme and tempered distributions, Stochastic Processes and their
Applications, 116(6), 877-904.
[70] P. Hall, C.C. Heyde (1980). Martingale Limit Theory and Its Applications, Academic Press,
308p.
[71] U.G. Haussmann (1979). On the Integral Representation of Functionals of Itô Processes, ???,
3:17-27.
[72] S.L. Heston (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, The Review of Financial Studies, 6(2):327-343.
[73] J. Jacod, A.N. Shiryaev (2003). Limit theorems for stochastic processes. Second edition
(first edition 1987). Grundlehren der Mathematischen Wissenschaften [Fundamental Principles
of Mathematical Sciences], 288. Springer-Verlag, Berlin, 2003. xx+661 pp.
[74] J. Jacod, P. Protter (1998). Asymptotic error distributions for the Euler method for
stochastic differential equations. Annals of Probability, 26(1):267-307.
[76] P. Jaillet, D. Lamberton, and B. Lapeyre (1990). Variational inequalities and the
pricing of American options, Acta Appl. Math., 21:263-289.
[77] H. Johnson (1987). Call on the maximum or minimum of several assets, J. of Financial and
Quantitative Analysis, 22(3):277-283.
[78] I. Karatzas, S.E. Shreve (1998). Brownian Motion and Stochastic Calculus, Graduate
Texts in Mathematics, Springer-Verlag, New York, 470p.
[79] O. Kavian (1993). Introduction à la théorie des points critiques et applications aux problèmes
elliptiques. (French) [Introduction to critical point theory and applications to elliptic problems].
coll. Mathématiques & Applications [Mathematics & Applications], 13, Springer-Verlag, Paris
[Berlin], 325 pp. ISBN: 2-287-00410-6.
[80] A.G. Kemna, A.C. Vorst (1990). A pricing method for options based on average asset
value, J. Banking Finan., 14:113-129.
[81] J.C. Kieffer (1982) Exponential rate of convergence for Lloyd’s method I, IEEE Trans. on
Inform. Theory, Special issue on quantization, 28(2):205-210.
[83] P.E. Kloeden, E. Platen (1992). Numerical solution of stochastic differential equations.
Applications of Mathematics, 23, Springer-Verlag, Berlin (New York), 632 pp. ISBN: 3-540-
54062-8.
[85] U. Krengel (1987). Ergodic Theorems, De Gruyter Studies in Mathematics, 6, Berlin, 357p.
[87] H. Kunita (1982). Stochastic differential equations and stochastic flows of diffeomorphisms,
Cours d’école d’été de Saint-Flour, LN-1097, Springer-Verlag, 1982.
[88] T.G. Kurtz, P. Protter (1991). Weak limit theorems for stochastic integrals and stochastic
differential equations, Annals of Probability, 19:1035-1070.
[89] H.J. Kushner, G.G. Yin (2003). Stochastic approximation and recursive algorithms and ap-
plications, 2nd edition, Applications of Mathematics, Stochastic Modelling and Applied Proba-
bility, 35, Springer-Verlag, New York.
[90] J.P. Lambert (1985). Quasi-Monte Carlo, low-discrepancy, and ergodic transformations, J.
Computational and Applied Mathematics, 13/13, 419-423.
[92] B. Lapeyre, G. Pagès (1989). Familles de suites à discrépance faible obtenues par itération
d’une transformation de [0, 1], Comptes rendus de l’Académie des Sciences de Paris, Série I,
308:507-509.
[93] B. Lapeyre, G. Pagès, K. Sab (1990). Sequences with low discrepancy. Generalization and
application to Robbins-Monro algorithm, Statistics, 21(2):251-272.
[94] S. Laruelle, C.-A. Lehalle, G. Pagès (2009). Optimal split of orders across liquidity
pools: a stochastic algorithm approach, pre-pub LPMA 1316, to appear in SIAM J. on Fi-
nancial Mathematics.
[95] S. Laruelle, G. Pagès (2012). Stochastic Approximation with averaging innovation, Monte Carlo Methods and Appl., 18:1-51.
[96] V.A. Lazarev (1992). Convergence of stochastic approximation procedures in the case of a regression equation with several roots, (transl. from) Problemy Peredachi Informatsii, 28(1):66-78.
[97] B. Lapeyre, E. Temam (2001). Competitive Monte Carlo methods for the pricing of Asian
Options, J. of Computational Finance, 5(1):39-59.
[98] M. Ledoux, M. Talagrand (2011). Probability in Banach spaces. Isoperimetry and pro-
cesses. Reprint of the 1991 edition. Classics in Mathematics. Springer-Verlag, Berlin, 2011.
xii+480 pp.
[99] P. L’Ecuyer, G. Perron (1994). On the convergence rates of IPA and FDC derivative estimators, Oper. Res., 42:643-656.
[100] M. Ledoux (1997) Concentration of measure and logarithmic Sobolev inequalities, technical
report, http://www.math.duke.edu/ rtd/CPSS2007/Berlin.pdf.
[101] J. Lelong (2007). Algorithmes Stochastiques et Options parisiennes, Thèse de l’École Nationale des Ponts et Chaussées.
[103] L. Ljung (1977). Analysis of recursive stochastic algorithms, IEEE Trans. on Automatic
Control, AC-22(4):551-575.
[104] F.A. Longstaff, E.S. Schwartz (2001). Valuing American options by simulation: a simple
least-squares approach, Review of Financial Studies, 14:113-148.
[105] H. Luschgy (2013). Martingale in diskreter Zeit: Theorie und Anwendungen, Springer, Berlin, 452p.
[107] H. Luschgy, G. Pagès (2008). Functional quantization rate and mean regularity of pro-
cesses with an application to Lévy processes, Annals of Applied Probability, 18(2):427-469.
[108] J. McNames (2001). A fast nearest-neighbour algorithm based on a principal axis search tree, IEEE Trans. Pattern Anal. Mach. Intell., 23(9):964-976.
[109] S. Manaster, G. Koehler (1982). The calculation of Implied Variance from the Black-
Scholes Model: A Note, The Journal of Finance, 37(1):227-230.
[110] G. Marsaglia, T.A. Bray (1964) A convenient method for generating normal variables.
SIAM Rev., 6, 260-264.
[112] G. Marsaglia, W.W. Tsang (2000). The Ziggurat Method for Generating Random Vari-
ables, Journal of Statistical Software, 5(8):363-372.
[113] M. Métivier, P. Priouret (1987). Théorèmes de convergence presque sûre pour une classe d’algorithmes stochastiques à pas décroissant. (French. English summary) [Almost sure convergence theorems for a class of decreasing-step stochastic algorithms], Probab. Theory Related Fields, 74(3):403-428.
[114] G.N. Milstein (1976). A method of second order accuracy for stochastic differential equations, Theor. Probab. Appl. (USSR), 23:396-401.
[115] J. Neveu, (1964). Bases mathématiques du calcul des Probabilités, Masson, Paris; English
translation Mathematical Foundations of the Calculus of Probability (1965), Holden Day, San
Francisco.
[116] J. Neveu (1972). Martingales à temps discret, Masson, 1972, 218p. English translation:
Discrete-parameter martingales, North-Holland, New York, 1975, 236p.
[117] D.J. Newman (1982). The Hexagon Theorem, IEEE Trans. Inform. Theory (A. Gersho and
R.M. Gray eds.), 28:137-138.
[118] H. Niederreiter (1992) Random Number Generation and Quasi-Monte Carlo Methods,
CBMS-NSF regional conference series in Applied mathematics, SIAM, Philadelphia.
[120] G. Pagès (1987). Sur quelques problèmes de convergence, thèse Univ. Paris 6, Paris.
[121] G. Pagès (1992). Van der Corput sequences, Kakutani transform and one-dimensional nu-
merical integration, J. Computational and Applied Mathematics, 44:21-39.
[122] G. Pagès (1998). A space vector quantization method for numerical integration, J. Compu-
tational and Applied Mathematics, 89, 1-38 (Extended version of “Voronoi Tessellation, space
quantization algorithms and numerical integration”, in: M. Verleysen (Ed.), Proceedings of
the ESANN’ 93, Bruxelles, Quorum Editions, (1993), 221-228).
[124] G. Pagès (2007). Quadratic optimal functional quantization methods and numerical appli-
cations, Proceedings of MCQMC, Ulm’06, Springer, Berlin, 101-142.
[125] G. Pagès (2013). Functional co-monotony of processes with an application to peacocks and
barrier options, Séminaire de Probabilités XXVI, LNM 2078, Springer, Cham, 365-400.
[126] G. Pagès (2014). Introduction to optimal quantization for numerics, to appear in ESAIM
Proc. & Surveys, Paris, France.
[127] G. Pagès, H. Pham, J. Printems (2004). Optimal quantization methods and applications
to numerical problems in finance, Handbook on Numerical Methods in Finance (S. Rachev,
ed.), Birkhauser, Boston, 253-298.
[129] G. Pagès, J. Printems (2003). Optimal quadratic quantization for numerics: the Gaussian
case, Monte Carlo Methods and Appl. 9(2):135-165.
[131] G. Pagès, J. Printems (2009). Optimal quantization for finance: from random vectors to
stochastic processes, chapter from Mathematical Modeling and Numerical Methods in Finance
(special volume) (A. Bensoussan, Q. Zhang guest eds.), coll. Handbook of Numerical Analysis
(P.G. Ciarlet Editor), North Holland, 595-649.
[132] G. Pagès, B. Wilbertz (2012). Optimal Delaunay and Voronoi quantization methods for pricing American options, pré-pub PMA, 2011, Numerical Methods in Finance (R. Carmona, P. Hu, P. Del Moral, N. Oudjane eds.), Springer, New York, 171-217.
[133] G. Pagès, Y.J. Xiao (1988) Sequences with low discrepancy and pseudo-random numbers:
theoretical results and numerical tests, J. of Statist. Comput. Simul., 56:163-188.
[134] K.R. Parthasarathy (1967). Probability measures on metric spaces. Probability and Math-
ematical Statistics, No. 3 Academic Press, Inc., New York-London, xi+276 pp.
[135] M. Pelletier (1998). Weak convergence rates for stochastic approximation with application
to multiple targets and simulated annealing, Ann. Appl. Probab., 8(1):10-44.
[136] R. Pemantle (1990), Nonconvergence to unstable points in urn models and stochastic ap-
proximations, Annals of Probability, 18(2):698-712.
[137] H. Pham (2007). Some applications and methods of large deviations in finance and insurance
in Paris-Princeton Lectures on Mathematical Finance 2004, LNM 1919, New York, Springer,
191-244.
[138] B.T. Polyak (1990) A new method of stochastic approximation type. (Russian) Avtomat.
i Telemekh., 7:98-107; translation in Automat. Remote Control, 51(7), part 2, 937-946 (1991).
[139] P.D. Proĭnov (1988). Discrepancy and integration of continuous functions, J. of Approx.
Theory, 52:121-131.
[140] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery (2002). Numerical
recipes in C++. The art of scientific computing, Second edition, updated for C++. Cambridge
University Press, Cambridge, 2002. 1002p. ISBN: 0-521-75033-4.
[141] D. Revuz, M. Yor (1999). Continuous martingales and Brownian motion. Third edition.
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical
Sciences], 293, Springer-Verlag, Berlin, 602 p. ISBN: 3-540-64325-7.
[142] H. Robbins, S. Monro (1951). A stochastic approximation method, Ann. Math. Stat.,
22:400-407.
[143] R.T. Rockafellar, S. Uryasev (1999). Optimization of Conditional Value at Risk, pre-
print, Univ. Florida, http://www.ise.ufl.edu/uryasev.
[145] W. Rudin (1966). Real and complex analysis, McGraw-Hill, New York.
[146] W. Ruppert . . .
[147] K.-I. Sato (1999). Lévy processes and Infinitely Divisible Distributions, Cambridge Univer-
sity Press.
[148] A.N. Shiryaev (2008). Optimal stopping rules. Translated from the 1976 Russian second
edition by A. B. Aries. Reprint of the 1978 translation. Stochastic Modelling and Applied
Probability, 8. Springer-Verlag, Berlin, 2008. 217 pp.
[149] A.N. Shiryaev (1984). Probability, Graduate Texts in Mathematics, Springer, New York, 577p.
[152] D. Talay, L. Tubaro (1990). Expansion of the global error for numerical schemes solving
stochastic differential equations, Stoch. Anal. Appl., 8:94-120.
[153] B. Tuffin (2004). Randomization of Quasi-Monte Carlo Methods for error estimation: Sur-
vey and Normal Approximation, Monte Carlo Methods and Applications, 10, (3-4):617-628.
[154] S. Villeneuve, A. Zanette (2002). Parabolic A.D.I. methods for pricing American options on two stocks, Mathematics of Operations Research, 27(1):121-149.
[155] D. S. Watkins (2010) Fundamentals of matrix computations (3rd edition), Pure and Applied
Mathematics (Hoboken), John Wiley & Sons, Inc., Hoboken, NJ, xvi+644 pp.
[156] M. Winiarski (2006). Quasi-Monte Carlo Derivative valuation and Reduction of simulation
bias, Master thesis, Royal Institute of Technology (KTH), Stockholm (Sweden).
[157] A. Wood, G. Chan (1994). Simulation of stationary Gaussian processes in [0, 1]^d, Journal of Computational and Graphical Statistics, 3(4):409-432.
[158] Y.J. Xiao (1990). Contributions aux méthodes arithmétiques pour la simulation accélérée,
Thèse de l’ENPC, Paris.
[159] Y.J. Xiao (1990). Suites équiréparties associées aux automorphismes du tore, C.R. Acad.
Sci. Paris (Série I), 317:579-582.
[160] P.L. Zador (1963). Development and evaluation of procedures for quantizing multivariate
distributions. Ph.D. dissertation, Stanford Univ.
[161] P.L. Zador (1982). Asymptotic quantization error of continuous signals and the quantization
dimension, IEEE Trans. Inform. Theory, 28(2):139-149.
Index
Euler scheme (discrete time), 204
Euler scheme (genuine), 205
Euler scheme (stepwise constant), 204
Euler schemes, 204
Euler-Maruyama schemes, 204
exchange spread option, 73
exponential distribution, 14
extreme discrepancy, 90
Faure sequences, 99
Feynman-Kac's formula, 256
finite difference (constant step), 282
finite difference (decreasing step), 288
finite difference methods (greeks), 281
finite variation function (Hardy and Krause sense), 93
finite variation function (measure sense), 92
flow of an SDE, 250
fractional Brownian motion (simulation), 30
functional monotone class theorem, 337
gamma distribution, 22
generalized Minkowski inequality, 239
generator (homogeneous), 10
geometric distribution(s), 15
geometric mean option, 56
greek computation, 281
Halton sequence, 97
Hammersley procedure, 101
Haussmann-Clark-Ocone formula, 301
homogeneous generator, 10
implication (parameter), 134
importance sampling, 78
importance sampling (recursive), 140
index option, 52
integrability (uniform), 41
Jensen inequality, 52
Kakutani sequences, 98
Kakutani's adding machine, 98
Kemna-Vorst control variate, 54
Koksma-Hlawka's Inequality, 93
Kronecker Lemma, 65, 339
Lindeberg Central Limit Theorem, 340
Lloyd I procedure (randomized), 121
log-likelihood method, 45
low discrepancy (sequence with), 97
Malliavin-Monte Carlo method, 43
mean quadratic quantization error, 114
Milstein scheme, 225
Milstein scheme (higher dimensional), 228
Monte Carlo method, 31
Monte Carlo simulation, 31
nearest neighbour projection, 114
negative binomial distribution(s), 16
negatively correlated variables, 57
Niederreiter sequences, 100
normal distribution, 331
numerical integration (quantization), 121
optimal quantizer, 114
option on a basket, 52
option on an index, 52
Pareto distribution, 14
parity equation, 66
parity variance reduction method, 66
partial Lookback option, 224
Poisson distribution, 24
Poisson process, 24
polar method, 21, 27
Polish space, 12
Portmanteau Theorem, 337
pre-conditioning, 73
Proĭnov's Theorem, 103
quantile, 77, 160
quantile (two-sided α-), 33
quantization error (mean quadratic), 114
quantization tree, 320
quasi-Monte Carlo method, 87
quasi-Monte Carlo method (unbounded dimension), 109
quasi-stochastic approximation, 111, 196
Roth's lower bound (discrepancy), 96
residual maturity, 285